PriceRunner

Status: active. Format: ARFF. License: BSD. Visibility: public. Uploaded 02-01-2024 by Leonidas Akritidis.

These datasets originate from PriceRunner, a popular product comparison platform. They contain product-related information, including product IDs, titles, and categories, and can be used for tasks such as classification, clustering, and record linkage.

Column description:

* Product ID
* Product Title: the title as it appears on the respective product comparison platform (lower case and with punctuation removed).
* Vendor ID: the ID of the electronic store that provides the product.
* Cluster ID: the ID of the cluster that the product belongs to. Useful for entity matching and clustering tasks.
* Cluster Label: the title of the aforementioned cluster.
* Category ID: the ID of the category that the product belongs to. Useful for classification and categorization tasks.
* Category Label: the title of the aforementioned category.

Citations:

* L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020.
* L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
* L. Akritidis, A. Fevgas, P. Bozanis, "Effective Product Categorization with Importance Scores and Morphological Analysis of the Titles", In Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 213-220, 2018.
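As an illustration of how the Cluster ID column might serve as ground truth for entity matching, the sketch below groups records by `cluster_id` and lists every pair of records that share a cluster. The rows are hypothetical toy values that only mimic the column layout (they are not taken from the dataset), and the helper name `matching_pairs` is an assumption made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rows that mimic the column layout; NOT real dataset records.
rows = [
    {"id": 1, "product_title": "acme phone x1 64gb black", "vendor_id": 1, "cluster_id": 10, "category_label": "category_a"},
    {"id": 2, "product_title": "acme x1 smartphone 64 gb", "vendor_id": 2, "cluster_id": 10, "category_label": "category_a"},
    {"id": 3, "product_title": "acme x1 black 64gb sim free", "vendor_id": 3, "cluster_id": 10, "category_label": "category_a"},
    {"id": 4, "product_title": "zenith tv 40 inch led", "vendor_id": 1, "cluster_id": 20, "category_label": "category_b"},
    {"id": 5, "product_title": "zenith 40in led television", "vendor_id": 4, "cluster_id": 20, "category_label": "category_b"},
    {"id": 6, "product_title": "orbit fridge 300l white", "vendor_id": 2, "cluster_id": 30, "category_label": "category_c"},
]


def matching_pairs(records):
    """Return sorted (id, id) pairs of records that share a cluster_id."""
    by_cluster = defaultdict(list)
    for rec in records:
        by_cluster[rec["cluster_id"]].append(rec["id"])
    pairs = []
    for ids in by_cluster.values():
        # Every unordered pair inside one cluster is a positive match.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


pairs = matching_pairs(rows)
print(pairs)
```

Pairs produced this way could act as positive labels when evaluating a title-matching method; a record that is alone in its cluster (id 6 here) contributes no pair.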

6 features

category_label (target): nominal, 10 unique values, 0 missing
id (ignore): numeric, 35300 unique values, 0 missing
product_title: string, 30982 unique values, 0 missing
vendor_id: numeric, 306 unique values, 0 missing
cluster_id: numeric, 13225 unique values, 0 missing
cluster_label: nominal, 12841 unique values, 0 missing
category_id: numeric, 10 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 35300
Number of attributes (columns) of the dataset: 6
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 3
Number of nominal attributes: 2
Average class difference between consecutive instances: 1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 50
Percentage of instances belonging to the most frequent class: 15.58
Percentage of nominal attributes: 33.33
Number of instances belonging to the most frequent class: 5501
Percentage of instances belonging to the least frequent class: 6.27
Number of instances belonging to the least frequent class: 2212
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
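The percentage properties above follow from the raw counts listed alongside them. The short check below recomputes them as a sanity test; the variable names and the `pct` helper are assumptions for this sketch, and only the counts themselves come from the page.

```python
# Counts as listed in the properties above.
n_instances = 35300
n_attributes = 6
n_numeric = 3
n_nominal = 2
majority_class_size = 5501
minority_class_size = 2212


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "pct_majority_class": pct(majority_class_size, n_instances),
    "pct_minority_class": pct(minority_class_size, n_instances),
    # Attributes per instance; far below 0.01, so it shows as 0 when rounded.
    "dimensionality": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed 50, 33.33, 15.58, and 6.27, and the attribute-to-instance ratio rounds to 0.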

0 tasks
