KDDCup09_upselling

Status: active. Format: ARFF. Visibility: public. Uploaded 18-06-2022 by Leo Grin.
Tags: Computer Systems, Physical Sciences


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features" benchmark.

Original description:

Author:
Source: Unknown - Date unknown
Please cite:

Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php)
KDD Cup 2009 http://www.kddcup-orange.com
Converted to ARFF format by TunedIT

Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

The most practical way to build knowledge about customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation, for every instance, of a target variable to be explained (i.e. churn, appetency or up-selling). Tools that produce scores make it possible to project quantifiable information onto a given population. The score is computed from input variables that describe the instances. Scores are then used by the information system (IS), for example to personalize the customer relationship.

Orange Labs has developed an industrial customer analysis platform able to build prediction models with a very large number of input variables. This platform implements several processing methods for instance and variable selection, prediction and indexing, based on an efficient model combined with variable selection, regularization and model averaging. The main characteristic of this platform is its ability to scale to very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have contributed most to the output prediction can be a key factor in a marketing application.

Up-selling (Wikipedia definition): Up-selling is a sales technique whereby a salesman attempts to have the customer purchase more expensive items, upgrades, or other add-ons in an attempt to make a more profitable sale. Up-selling usually involves marketing more profitable services or products, but up-selling can also be simply exposing the customer to other options he or she may not have considered previously. Up-selling can imply selling something additional, or selling something that is more profitable or otherwise preferable for the seller instead of the original sale.

The training set contains 50,000 examples. The first 190 predictive variables are numerical and the last 40 predictive variables are categorical. The last variable, the target, is binary {-1,1}. These figures describe the original data; the benchmark version listed on this page has 5,032 instances and 46 features.
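The column layout stated in the original description (190 numerical variables followed by 40 categorical ones) can be checked against a column name with a few lines of Python. This is only a minimal sketch: it assumes that the VarN names number the original columns in order starting from 1, which the description implies but does not state, and the helper name expected_kind is hypothetical.

```python
import re

# From the original description: the first 190 predictive variables are
# numerical and the last 40 are categorical.
N_NUMERIC = 190
N_CATEGORICAL = 40


def expected_kind(name: str) -> str:
    """Return 'numeric' or 'categorical' for a column name such as 'Var57'.

    Assumption: 'VarN' is the N-th predictive variable of the original
    data, counted from 1. The helper itself is illustrative only.
    """
    match = re.fullmatch(r"Var(\d+)", name)
    if match is None:
        raise ValueError(f"not a VarN column name: {name!r}")
    index = int(match.group(1))
    if 1 <= index <= N_NUMERIC:
        return "numeric"
    if N_NUMERIC < index <= N_NUMERIC + N_CATEGORICAL:
        return "categorical"
    raise ValueError(f"index out of range for {name!r}")


if __name__ == "__main__":
    for column in ("Var6", "Var163", "Var196", "Var227"):
        print(column, expected_kind(column))
```

Under that assumption, the types reported in the feature list on this page (numeric up to Var163, nominal from Var196) agree with the original layout.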

46 features

UPSELLING (target): string, 2 unique values, 0 missing
Var6: numeric, 707 unique values, 0 missing
Var13: numeric, 1137 unique values, 0 missing
Var21: numeric, 321 unique values, 0 missing
Var22: numeric, 321 unique values, 0 missing
Var24: numeric, 46 unique values, 0 missing
Var25: numeric, 130 unique values, 0 missing
Var28: numeric, 944 unique values, 0 missing
Var35: numeric, 9 unique values, 0 missing
Var38: numeric, 3878 unique values, 0 missing
Var57: numeric, 4658 unique values, 0 missing
Var65: numeric, 11 unique values, 0 missing
Var73: numeric, 113 unique values, 0 missing
Var74: numeric, 187 unique values, 0 missing
Var76: numeric, 3903 unique values, 0 missing
Var78: numeric, 8 unique values, 0 missing
Var81: numeric, 5019 unique values, 0 missing
Var83: numeric, 62 unique values, 0 missing
Var85: numeric, 64 unique values, 0 missing
Var109: numeric, 87 unique values, 0 missing
Var112: numeric, 100 unique values, 0 missing
Var113: numeric, 5019 unique values, 0 missing
Var119: numeric, 647 unique values, 0 missing
Var123: numeric, 102 unique values, 0 missing
Var125: numeric, 2673 unique values, 0 missing
Var126: numeric, 51 unique values, 0 missing
Var132: numeric, 13 unique values, 0 missing
Var133: numeric, 4648 unique values, 0 missing
Var134: numeric, 4238 unique values, 0 missing
Var140: numeric, 1065 unique values, 0 missing
Var144: numeric, 8 unique values, 0 missing
Var149: numeric, 2703 unique values, 0 missing
Var153: numeric, 4917 unique values, 0 missing
Var160: numeric, 166 unique values, 0 missing
Var163: numeric, 3179 unique values, 0 missing
Var196: nominal, 2 unique values, 0 missing
Var203: nominal, 4 unique values, 0 missing
Var205: nominal, 4 unique values, 0 missing
Var207: nominal, 11 unique values, 0 missing
Var208: nominal, 3 unique values, 0 missing
Var210: nominal, 5 unique values, 0 missing
Var211: nominal, 2 unique values, 0 missing
Var218: nominal, 2 unique values, 0 missing
Var221: nominal, 7 unique values, 0 missing
Var223: nominal, 5 unique values, 0 missing
Var227: nominal, 7 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 5032
Number of attributes (columns) of the dataset: 46
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 34
Number of nominal attributes: 11
Percentage of binary attributes: 6.52
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 73.91
Percentage of instances belonging to the most frequent class: 50
Percentage of nominal attributes: 23.91
Number of instances belonging to the most frequent class: 2516
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 2516
Number of binary attributes: 3
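
The percentage and ratio properties above follow from the base counts on this page. The short Python sketch below recomputes them as a consistency check; the variable names and the pct helper are illustrative, and rounding to two decimals is an assumption about how the listed values were produced.

```python
# Base counts copied from the properties listed on this page.
n_instances = 5032
n_attributes = 46      # includes the UPSELLING target column
n_numeric = 34
n_nominal = 11
n_binary = 3
n_majority = 2516
n_minority = 2516


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals (assumed rounding)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "pct_binary_attributes": pct(n_binary, n_attributes),
    "pct_majority_class": pct(n_majority, n_instances),
    "pct_minority_class": pct(n_minority, n_instances),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}: {value}")
```

The percentages are taken over all 46 columns, target included, which is why the numeric and nominal shares do not add up to 100.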
