KDDCup09_upselling

Status: active. Format: ARFF. Visibility: public. Uploaded 25-07-2022 by Leo Grin.


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features" benchmark.

Original description:

Author: (none listed)
Source: Unknown - Date unknown
Please cite: Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php)

KDD Cup 2009 http://www.kddcup-orange.com
Converted to ARFF format by TunedIT

Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

The most practical way to build knowledge about customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation, for all instances, of a target variable to explain (i.e. churn, appetency or up-selling). Tools that produce scores make it possible to project quantifiable information onto a given population. The score is computed using input variables which describe instances. Scores are then used by the information system (IS), for example, to personalize the customer relationship.

Orange Labs has developed an industrial customer analysis platform able to build prediction models with a very large number of input variables. This platform implements several processing methods for instance and variable selection, prediction and indexation, based on an efficient model combined with variable selection regularization and a model averaging method. The main characteristic of this platform is its ability to scale to very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have contributed most to the output prediction can be a key factor in a marketing application.

Up-selling (wikipedia definition): Up-selling is a sales technique whereby a salesman attempts to have the customer purchase more expensive items, upgrades, or other add-ons in an attempt to make a more profitable sale. Up-selling usually involves marketing more profitable services or products, but up-selling can also be simply exposing the customer to other options he or she may not have considered previously. Up-selling can imply selling something additional, or selling something that is more profitable or otherwise preferable for the seller instead of the original sale.

The original training set contains 50,000 examples. Its first 190 predictive variables are numerical and its last 40 predictive variables are categorical. The target variable, listed last, is binary {-1,1}.

50 features

Feature              Type      Unique values   Missing
UPSELLING (target)   nominal   2               0
Var6                 numeric   712             0
Var13                numeric   1140            0
Var21                numeric   325             0
Var22                numeric   325             0
Var24                numeric   44              0
Var25                numeric   135             0
Var28                numeric   928             0
Var35                numeric   11              0
Var38                numeric   3923            0
Var57                numeric   4728            0
Var65                numeric   11              0
Var73                numeric   116             0
Var74                numeric   190             0
Var76                numeric   3950            0
Var78                numeric   10              0
Var81                numeric   5119            0
Var83                numeric   71              0
Var85                numeric   68              0
Var109               numeric   94              0
Var112               numeric   104             0
Var113               numeric   5114            0
Var119               numeric   660             0
Var123               numeric   108             0
Var125               numeric   2769            0
Var126               numeric   51              0
Var132               numeric   17              0
Var133               numeric   4726            0
Var134               numeric   4331            0
Var140               numeric   1109            0
Var144               numeric   8               0
Var149               numeric   2709            0
Var153               numeric   5010            0
Var160               numeric   169             0
Var163               numeric   3256            0
Var194               nominal   4               0
Var196               nominal   3               0
Var201               nominal   2               0
Var203               nominal   4               0
Var205               nominal   4               0
Var207               nominal   12              0
Var208               nominal   3               0
Var210               nominal   5               0
Var211               nominal   2               0
Var218               nominal   3               0
Var221               nominal   7               0
Var223               nominal   5               0
Var225               nominal   4               0
Var227               nominal   7               0
Var229               nominal   5               0

19 properties

Number of instances (rows) of the dataset: 5128
Number of attributes (columns) of the dataset: 50
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 34
Number of nominal attributes: 16
Percentage of binary attributes: 6
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 1
Percentage of numeric attributes: 68
Number of attributes divided by the number of instances: 0.01
Percentage of nominal attributes: 32
Percentage of instances belonging to the most frequent class: 50
Number of instances belonging to the most frequent class: 2564
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 2564
Number of binary attributes: 3
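
Several of the properties above are simple ratios of the listed counts. The sketch below recomputes them as a consistency check. The variable names and the `pct` helper are illustrative only and are not part of any OpenML API.

```python
# Counts copied from the property list above.
n_instances = 5128
n_attributes = 50
n_numeric = 34
n_nominal = 16
n_binary = 3
majority_class_size = 2564


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


derived = {
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_binary": pct(n_binary, n_attributes),
    "pct_majority_class": pct(majority_class_size, n_instances),
    # Attributes per instance; the page shows this rounded to two decimals.
    "dimensionality": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones (68, 32, 6, 50 and 0.01), and the numeric and nominal counts add up to the 50 attributes.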

1 tasks

1 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: UPSELLING
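
As a rough illustration of the estimation procedure named in this task, the sketch below builds ten folds over placeholder labels with the class balance listed in the properties (2564 instances per class) and defines predictive accuracy as the fraction of correct predictions. It assumes the folds are stratified by class, which the task line does not state. The function names, the 0/1 label encoding and the seed are hypothetical, and this is not the OpenML implementation.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Return k lists of test indices, spreading each class evenly over the folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def predictive_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Placeholder labels only; the real UPSELLING values must come from the dataset.
labels = [0] * 2564 + [1] * 2564
folds = stratified_kfold(labels, k=10)

for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    print(f"fold {i}: {len(train_idx)} train, {len(test_idx)} test")
```

In each iteration a model would be fitted on `train_idx` and scored on `test_idx` with `predictive_accuracy`; averaging the ten scores gives the cross-validated estimate.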