This dataset originates from Skroutz, a popular Greek product comparison platform. It contains product-related information, including product IDs, titles, and categories. Despite its origin, most of the product titles are in English or in Greeklish (Greek written with Latin characters). The dataset can be used for tasks such as classification, clustering, and record linkage.
Column description:
* Product ID
* Product Title: the title as it appears on the product comparison platform (lowercased, with punctuation removed).
* Vendor ID: the ID of the electronic store that provides the product.
* Cluster ID: the ID of the cluster that the product belongs to. Useful for entity matching and clustering tasks.
* Cluster Label: the title of the aforementioned cluster.
* Category ID: the ID of the category that the product belongs to. Useful for classification and categorization tasks.
* Category Label: the title of the aforementioned category.
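The columns above map naturally onto the two task families mentioned: Cluster ID serves as the ground truth for entity matching, and Category ID as the label for classification. The following is a minimal sketch of that mapping using only the Python standard library. The column names in `COLUMNS`, the assumption of a headerless comma-separated file, and the rows in `SAMPLE` are all hypothetical placeholders for illustration; they are not taken from the dataset, whose actual header and delimiter may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names following the column description above;
# the real file's header row (if any) may use different names.
COLUMNS = ["product_id", "product_title", "vendor_id", "cluster_id",
           "cluster_label", "category_id", "category_label"]

# Made-up placeholder rows for illustration only, NOT real dataset records.
SAMPLE = """\
1,acme phone x1 64gb black,10,100,Acme Phone X1 64GB,5,Mobile Phones
2,acme x1 64 gb mayro,11,100,Acme Phone X1 64GB,5,Mobile Phones
3,foo tv 42 inch led,10,200,Foo TV 42,7,Televisions
"""


def load_rows(handle):
    """Parse headerless CSV rows into dicts keyed by COLUMNS."""
    return list(csv.DictReader(handle, fieldnames=COLUMNS))


def titles_by_cluster(rows):
    """Group product titles by cluster ID (entity-matching ground truth)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster_id"]].append(row["product_title"])
    return dict(groups)


def category_examples(rows):
    """Return (title, category_id) pairs usable as classification examples."""
    return [(row["product_title"], row["category_id"]) for row in rows]


rows = load_rows(io.StringIO(SAMPLE))
clusters = titles_by_cluster(rows)
examples = category_examples(rows)
```

To use this with the actual file, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust `COLUMNS` (or drop `fieldnames`) to match the real header.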
Citations:
* L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020.
* L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
* L. Akritidis, A. Fevgas, P. Bozanis, "Effective Product Categorization with Importance Scores and Morphological Analysis of the Titles", In Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 213-220, 2018.