OpenML


PriceRunner

active ARFF BSD Visibility: public Uploaded 02-01-2024 by Leonidas Akritidis


These datasets originate from PriceRunner, a popular product comparison platform. They contain product-related information, including product IDs, titles, and categories, and can be used for tasks such as classification, clustering, and record linkage.

Column description:
* Product ID
* Product Title: the title as it appears on the product comparison platform (lower case, with punctuation removed).
* Vendor ID: the ID of the electronic store that provides the product.
* Cluster ID: the ID of the cluster that the product belongs to. Useful for entity matching and clustering tasks.
* Cluster Label: the title of the aforementioned cluster.
* Category ID: the ID of the category that the product belongs to. Useful for classification and categorization tasks.
* Category Label: the title of the aforementioned category.

Citations:
* L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020.
* L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
* L. Akritidis, A. Fevgas, P. Bozanis, "Effective Product Categorization with Importance Scores and Morphological Analysis of the Titles", In Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 213-220, 2018.
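As a rough illustration of how the columns described above might be used for record linkage, the following sketch normalizes a raw title in the same spirit as the Product Title column (lower case, punctuation removed) and derives ground-truth matching pairs from the Cluster ID. The function names `normalize_title` and `matching_pairs` are hypothetical, the exact normalization applied by the dataset authors is not documented here, and the toy rows are invented for illustration only.

```python
import string
from collections import defaultdict
from itertools import combinations


def normalize_title(title: str) -> str:
    """Lower-case a title, drop ASCII punctuation, and collapse whitespace.

    Assumption: this only approximates the preprocessing described for the
    product_title column; the authors' exact rules may differ.
    """
    table = str.maketrans("", "", string.punctuation)
    return " ".join(title.lower().translate(table).split())


def matching_pairs(records):
    """Return the set of product-ID pairs that share a cluster ID.

    `records` is an iterable of (product_id, cluster_id) tuples.
    """
    by_cluster = defaultdict(list)
    for product_id, cluster_id in records:
        by_cluster[cluster_id].append(product_id)
    pairs = set()
    for ids in by_cluster.values():
        # Every unordered pair inside one cluster is a true match.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy rows invented for illustration: (product_id, cluster_id).
toy_records = [(1, 100), (2, 100), (3, 200), (4, 100)]

print(normalize_title("Apple iPhone 8 Plus, 64GB (Silver)"))
print(sorted(matching_pairs(toy_records)))
```

Pairs produced this way could serve as a reference set when scoring an entity-matching method, for example by comparing them against the pairs implied by predicted clusters.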

6 features

Feature	Type	Unique values	Missing values
category_label (target)	nominal	10	0
id (ignore)	numeric	35300	0
product_title	string	30982	0
vendor_id	numeric	306	0
cluster_id	numeric	13225	0
cluster_label	nominal	12841	0
category_id	numeric	10	0
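Per-feature statistics like the ones listed above (unique values and missing values) could be recomputed from the rows with a sketch along these lines. The helper `summarize_columns` is hypothetical, the rows are invented placeholders rather than real dataset records, and treating `?` as missing follows the ARFF convention for missing values.

```python
def summarize_columns(rows, columns, missing_markers=(None, "", "?")):
    """Count unique and missing values per column.

    `rows` is a list of dicts keyed by column name. Values equal to one of
    `missing_markers` are counted as missing ("?" is the ARFF marker).
    """
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in missing_markers]
        summary[col] = {
            "unique": len(set(present)),
            "missing": len(values) - len(present),
        }
    return summary


# Placeholder rows invented for illustration; not taken from the dataset.
toy_rows = [
    {"id": 1, "product_title": "toy title a", "category_label": "category_a"},
    {"id": 2, "product_title": "toy title a", "category_label": "category_a"},
    {"id": 3, "product_title": "?", "category_label": "category_b"},
]

for name, stats in summarize_columns(
    toy_rows, ["id", "product_title", "category_label"]
).items():
    print(name, stats)
```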