Otto-Group-Product-Classification-Challenge

active ARFF Unknown Visibility: public Uploaded 02-06-2023 by Matthias Feurer


## Overview

The Otto Group is one of the world's biggest e-commerce companies, with subsidiaries in more than 20 countries, including Crate & Barrel (USA), Otto.de (Germany) and 3 Suisses (France). We are selling millions of products worldwide every day, with several thousand products being added to our product line.

A consistent analysis of the performance of our products is crucial. However, due to our diverse global infrastructure, many identical products get classified differently. Therefore, the quality of our product analysis depends heavily on the ability to accurately cluster similar products. The better the classification, the more insights we can generate about our product range.

For this competition, we have provided a dataset with 93 features for more than 200,000 products. The objective is to build a predictive model which is able to distinguish between our main product categories. The winning models will be open sourced.

## Data description

Each row corresponds to a single product. There are a total of 93 numerical features, which represent counts of different events. All features have been obfuscated and will not be defined any further.

There are nine categories for all products. Each target category represents one of our most important product categories (like fashion, electronics, etc.). The products for the training and testing sets are selected randomly.

### Data fields

* id - an anonymous id unique to a product
* feat_1, feat_2, ..., feat_93 - the various features of a product
* target - the class of a product

## Notes by Uploader to OpenML

* This is only the training set. The test set is not publicly available.
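The field layout described above (one `id`, 93 `feat_*` columns, one `target`) can be illustrated with a minimal sketch that splits rows into identifiers, a feature matrix, and labels. It assumes the data has been exported to CSV with exactly those column names. The two sample rows and the label strings `Class_1` and `Class_2` are made-up placeholders, not values taken from the dataset.

```python
import csv
import io

N_FEATURES = 93
FEATURE_COLUMNS = [f"feat_{i}" for i in range(1, N_FEATURES + 1)]


def split_rows(rows):
    """Split dict-like rows into (ids, feature matrix, labels)."""
    ids, X, y = [], [], []
    for row in rows:
        ids.append(int(row["id"]))
        # Features are event counts; parse as float to tolerate "3" or "3.0".
        X.append([float(row[name]) for name in FEATURE_COLUMNS])
        y.append(row["target"])
    return ids, X, y


# Hypothetical two-row CSV, built in memory so the sketch is self-contained.
header = ["id"] + FEATURE_COLUMNS + ["target"]
sample_rows = [
    [1] + [0] * N_FEATURES + ["Class_1"],  # placeholder label
    [2] + [1] * N_FEATURES + ["Class_2"],  # placeholder label
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(sample_rows)
buffer.seek(0)

ids, X, y = split_rows(csv.DictReader(buffer))
print(len(ids), len(X[0]), y)
```

Keeping `id` out of the feature matrix matters here because it is a row identifier rather than a property of the product.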

94 features

| Feature | Type | Unique values | Missing |
| --- | --- | --- | --- |
| target (target) | nominal | 9 | 0 |
| id (row identifier) | numeric | 61878 | 0 |
| feat_1 | numeric | 42 | 0 |
| feat_2 | numeric | 37 | 0 |
| feat_3 | numeric | 48 | 0 |
| feat_4 | numeric | 59 | 0 |
| feat_5 | numeric | 15 | 0 |
| feat_6 | numeric | 9 | 0 |
| feat_7 | numeric | 30 | 0 |
| feat_8 | numeric | 55 | 0 |
| feat_9 | numeric | 40 | 0 |
| feat_10 | numeric | 26 | 0 |
| feat_11 | numeric | 28 | 0 |
| feat_12 | numeric | 20 | 0 |
| feat_13 | numeric | 51 | 0 |
| feat_14 | numeric | 32 | 0 |
| feat_15 | numeric | 37 | 0 |
| feat_16 | numeric | 28 | 0 |
| feat_17 | numeric | 40 | 0 |
| feat_18 | numeric | 30 | 0 |
| feat_19 | numeric | 105 | 0 |
| feat_20 | numeric | 27 | 0 |
| feat_21 | numeric | 15 | 0 |
| feat_22 | numeric | 21 | 0 |
| feat_23 | numeric | 24 | 0 |
| feat_24 | numeric | 66 | 0 |
| feat_25 | numeric | 28 | 0 |
| feat_26 | numeric | 31 | 0 |
| feat_27 | numeric | 55 | 0 |
| feat_28 | numeric | 21 | 0 |
| feat_29 | numeric | 50 | 0 |
| feat_30 | numeric | 51 | 0 |
| feat_31 | numeric | 29 | 0 |
| feat_32 | numeric | 40 | 0 |
| feat_33 | numeric | 20 | 0 |
| feat_34 | numeric | 49 | 0 |
| feat_35 | numeric | 78 | 0 |
| feat_36 | numeric | 58 | 0 |
| feat_37 | numeric | 18 | 0 |
| feat_38 | numeric | 36 | 0 |
| feat_39 | numeric | 77 | 0 |
| feat_40 | numeric | 41 | 0 |
| feat_41 | numeric | 31 | 0 |
| feat_42 | numeric | 39 | 0 |
| feat_43 | numeric | 28 | 0 |
| feat_44 | numeric | 26 | 0 |
| feat_45 | numeric | 76 | 0 |
| feat_46 | numeric | 39 | 0 |
| feat_47 | numeric | 39 | 0 |
| feat_48 | numeric | 47 | 0 |
| feat_49 | numeric | 35 | 0 |
| feat_50 | numeric | 56 | 0 |
| feat_51 | numeric | 22 | 0 |
| feat_52 | numeric | 36 | 0 |
| feat_53 | numeric | 42 | 0 |
| feat_54 | numeric | 53 | 0 |
| feat_55 | numeric | 26 | 0 |
| feat_56 | numeric | 54 | 0 |
| feat_57 | numeric | 27 | 0 |
| feat_58 | numeric | 87 | 0 |
| feat_59 | numeric | 63 | 0 |
| feat_60 | numeric | 39 | 0 |
| feat_61 | numeric | 23 | 0 |
| feat_62 | numeric | 40 | 0 |
| feat_63 | numeric | 28 | 0 |
| feat_64 | numeric | 49 | 0 |
| feat_65 | numeric | 25 | 0 |
| feat_66 | numeric | 34 | 0 |
| feat_67 | numeric | 72 | 0 |
| feat_68 | numeric | 39 | 0 |
| feat_69 | numeric | 65 | 0 |
| feat_70 | numeric | 35 | 0 |
| feat_71 | numeric | 28 | 0 |
| feat_72 | numeric | 31 | 0 |
| feat_73 | numeric | 115 | 0 |
| feat_74 | numeric | 101 | 0 |
| feat_75 | numeric | 70 | 0 |
| feat_76 | numeric | 61 | 0 |
| feat_77 | numeric | 24 | 0 |
| feat_78 | numeric | 70 | 0 |
| feat_79 | numeric | 22 | 0 |
| feat_80 | numeric | 41 | 0 |
| feat_81 | numeric | 19 | 0 |
| feat_82 | numeric | 23 | 0 |
| feat_83 | numeric | 57 | 0 |
| feat_84 | numeric | 42 | 0 |
| feat_85 | numeric | 42 | 0 |
| feat_86 | numeric | 52 | 0 |
| feat_87 | numeric | 49 | 0 |
| feat_88 | numeric | 31 | 0 |
| feat_89 | numeric | 37 | 0 |
| feat_90 | numeric | 91 | 0 |
| feat_91 | numeric | 50 | 0 |
| feat_92 | numeric | 19 | 0 |
| feat_93 | numeric | 43 | 0 |
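A per-column summary like the one above (number of unique values and number of missing entries) could be recomputed along these lines. This is a minimal sketch, not OpenML's own implementation: the helper `summarize_columns` and the three sample rows are hypothetical, and treating `?` and the empty string as missing is an assumption based on the ARFF convention of marking missing values with `?`.

```python
def summarize_columns(rows, missing_markers=("", "?")):
    """Return {column: (n_unique_values, n_missing)} for a list of dict rows."""
    unique, missing = {}, {}
    for row in rows:
        for name, value in row.items():
            unique.setdefault(name, set())
            missing.setdefault(name, 0)
            if value is None or value in missing_markers:
                missing[name] += 1
            else:
                unique[name].add(value)
    return {name: (len(values), missing[name]) for name, values in unique.items()}


# Hypothetical rows with a single feature column, for illustration only.
rows = [
    {"id": 1, "feat_1": 0, "target": "Class_1"},
    {"id": 2, "feat_1": 0, "target": "Class_2"},
    {"id": 3, "feat_1": 5, "target": "Class_2"},
]
summary = summarize_columns(rows)
print(summary)
```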

19 properties

| Property | Value |
| --- | --- |
| Number of instances (rows) of the dataset. | 61878 |
| Number of attributes (columns) of the dataset. | 94 |
| Number of distinct values of the target attribute (if it is nominal). | 9 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 93 |
| Number of nominal attributes. | 1 |
| Number of instances belonging to the least frequent class. | 1929 |
| Number of binary attributes. | 0 |
| Percentage of binary attributes. | 0 |
| Percentage of instances having missing values. | 0 |
| Average class difference between consecutive instances. | 1 |
| Percentage of missing values. | 0 |
| Number of attributes divided by the number of instances. | 0 |
| Percentage of numeric attributes. | 98.94 |
| Percentage of instances belonging to the most frequent class. | 26.05 |
| Percentage of nominal attributes. | 1.06 |
| Number of instances belonging to the most frequent class. | 16122 |
| Percentage of instances belonging to the least frequent class. | 3.12 |
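The percentage properties follow directly from the raw counts listed here, which the short check below recomputes. Rounding to two decimals is an assumption about how the page displays values. Under that assumption the attributes-to-instances ratio (about 0.0015) rounds to the displayed 0.

```python
# Raw counts as listed in the properties above.
n_instances = 61878
n_attributes = 94
n_numeric = 93
n_nominal = 1
majority_class = 16122
minority_class = 1929


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "pct_majority_class": pct(majority_class, n_instances),
    "pct_minority_class": pct(minority_class, n_instances),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}
print(derived)
```

The majority-class share also gives a floor for accuracy: always predicting the most frequent class would be right on about 26% of these rows.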

1 task

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: target
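To make the 4-fold estimation procedure concrete, the sketch below partitions the 61878 row indices into four folds and builds a train/test split for each. It is a plain shuffled k-fold written for illustration: the helper `k_fold_indices` and the seed are hypothetical, and it does not reproduce the exact splits OpenML ships with the task, which may also be stratified by class.

```python
import random


def k_fold_indices(n_rows, n_folds=4, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(61878, n_folds=4)
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each row appears in exactly one test fold, so every instance is used for evaluation once and for training three times.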