KDDCup09_appetency

Status: active. Format: ARFF. Visibility: public. Uploaded 07-10-2014 by Joaquin Vanschoren.
  • Chemistry Life Science study_218 study_271 study_240 study_446 study_447 study_448 study_449 study_226
Author:
Source: Unknown - Date unknown
Please cite:
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php)
KDD Cup 2009 http://www.kddcup-orange.com
Converted to ARFF format by TunedIT

Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

The most practical way to build knowledge about customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation, for every instance, of a target variable to be explained (i.e. churn, appetency or up-selling). Tools that produce scores make it possible to project quantifiable information onto a given population. The score is computed from input variables that describe the instances. Scores are then used by the information system (IS), for example to personalize the customer relationship.

Orange Labs has developed an industrial customer analysis platform able to build prediction models with a very large number of input variables. This platform implements several processing methods for instance and variable selection, prediction and indexation, based on an efficient model combined with variable selection regularization and a model averaging method. The main characteristic of this platform is its ability to scale to very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that contribute most to the output prediction can be a key factor in a marketing application.

Appetency: In our context, appetency is the propensity to buy a service or a product. The training set contains 50,000 examples. The first 190 predictive variables are numerical and the last 40 predictive variables are categorical. The final variable, the target, is binary {-1,1}. In the ARFF version listed on this page, Var209 and Var230 contain no values at all and are typed numeric, which is why the properties report 192 numeric and 39 nominal attributes (the latter count includes the target).
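The column layout in the description can be written down directly. The following is a minimal sketch, assuming the predictive variables are named Var1 to Var230 as in the feature list on this page; the helper name `split_feature_names` is hypothetical and not part of any library.

```python
def split_feature_names(n_numeric=190, n_categorical=40):
    """Return (numeric, categorical) feature names as laid out in the description.

    Assumes the Var<i> naming used in the feature list on this page.
    """
    numeric = [f"Var{i}" for i in range(1, n_numeric + 1)]
    categorical = [
        f"Var{i}" for i in range(n_numeric + 1, n_numeric + n_categorical + 1)
    ]
    return numeric, categorical


numeric_cols, categorical_cols = split_feature_names()
print(len(numeric_cols), numeric_cols[0], numeric_cols[-1])
print(len(categorical_cols), categorical_cols[0], categorical_cols[-1])
```

Note that this follows the original description rather than the stored ARFF types, so any feature that is entirely empty may still be typed differently in the file.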

231 features

APPETENCY (target): nominal, 2 unique values, 0 missing
Var1: numeric, 18 unique values, 49298 missing
Var2: numeric, 2 unique values, 48759 missing
Var3: numeric, 146 unique values, 48760 missing
Var4: numeric, 4 unique values, 48421 missing
Var5: numeric, 571 unique values, 48513 missing
Var6: numeric, 1486 unique values, 5529 missing
Var7: numeric, 8 unique values, 5539 missing
Var8: numeric, 0 unique values, 50000 missing
Var9: numeric, 100 unique values, 49298 missing
Var10: numeric, 534 unique values, 48513 missing
Var11: numeric, 5 unique values, 48760 missing
Var12: numeric, 22 unique values, 49442 missing
Var13: numeric, 2634 unique values, 5539 missing
Var14: numeric, 19 unique values, 48760 missing
Var15: numeric, 0 unique values, 50000 missing
Var16: numeric, 597 unique values, 48513 missing
Var17: numeric, 37 unique values, 48421 missing
Var18: numeric, 26 unique values, 48421 missing
Var19: numeric, 4 unique values, 48421 missing
Var20: numeric, 0 unique values, 50000 missing
Var21: numeric, 734 unique values, 5529 missing
Var22: numeric, 735 unique values, 5009 missing
Var23: numeric, 29 unique values, 48513 missing
Var24: numeric, 93 unique values, 7230 missing
Var25: numeric, 271 unique values, 5009 missing
Var26: numeric, 4 unique values, 48513 missing
Var27: numeric, 3 unique values, 48513 missing
Var28: numeric, 4167 unique values, 5011 missing
Var29: numeric, 2 unique values, 49298 missing
Var30: numeric, 13 unique values, 49298 missing
Var31: numeric, 0 unique values, 50000 missing
Var32: numeric, 0 unique values, 50000 missing
Var33: numeric, 298 unique values, 49153 missing
Var34: numeric, 6 unique values, 48759 missing
Var35: numeric, 13 unique values, 5009 missing
Var36: numeric, 531 unique values, 48759 missing
Var37: numeric, 550 unique values, 48421 missing
Var38: numeric, 30832 unique values, 5009 missing
Var39: numeric, 0 unique values, 50000 missing
Var40: numeric, 27 unique values, 48759 missing
Var41: numeric, 36 unique values, 49298 missing
Var42: numeric, 0 unique values, 50000 missing
Var43: numeric, 20 unique values, 48759 missing
Var44: numeric, 8 unique values, 5009 missing
Var45: numeric, 343 unique values, 49656 missing
Var46: numeric, 50 unique values, 48759 missing
Var47: numeric, 11 unique values, 49298 missing
Var48: numeric, 0 unique values, 50000 missing
Var49: numeric, 4 unique values, 48759 missing
Var50: numeric, 63 unique values, 49298 missing
Var51: numeric, 3561 unique values, 46253 missing
Var52: numeric, 0 unique values, 50000 missing
Var53: numeric, 397 unique values, 49298 missing
Var54: numeric, 5 unique values, 48759 missing
Var55: numeric, 0 unique values, 50000 missing
Var56: numeric, 195 unique values, 49354 missing
Var57: numeric, 25614 unique values, 0 missing
Var58: numeric, 244 unique values, 49298 missing
Var59: numeric, 566 unique values, 49180 missing
Var60: numeric, 47 unique values, 48513 missing
Var61: numeric, 39 unique values, 49153 missing
Var62: numeric, 12 unique values, 49442 missing
Var63: numeric, 48 unique values, 49306 missing
Var64: numeric, 233 unique values, 49762 missing
Var65: numeric, 15 unique values, 5539 missing
Var66: numeric, 100 unique values, 49306 missing
Var67: numeric, 2 unique values, 48513 missing
Var68: numeric, 84 unique values, 48759 missing
Var69: numeric, 838 unique values, 48513 missing
Var70: numeric, 521 unique values, 48513 missing
Var71: numeric, 119 unique values, 48871 missing
Var72: numeric, 8 unique values, 22380 missing
Var73: numeric, 131 unique values, 0 missing
Var74: numeric, 371 unique values, 5539 missing
Var75: numeric, 13 unique values, 48759 missing
Var76: numeric, 29743 unique values, 5009 missing
Var77: numeric, 23 unique values, 49298 missing
Var78: numeric, 13 unique values, 5009 missing
Var79: numeric, 0 unique values, 50000 missing
Var80: numeric, 400 unique values, 48513 missing
Var81: numeric, 43042 unique values, 5529 missing
Var82: numeric, 4 unique values, 48421 missing
Var83: numeric, 195 unique values, 5009 missing
Var84: numeric, 96 unique values, 48760 missing
Var85: numeric, 149 unique values, 5009 missing
Var86: numeric, 448 unique values, 49298 missing
Var87: numeric, 5 unique values, 49298 missing
Var88: numeric, 88 unique values, 48917 missing
Var89: numeric, 16 unique values, 49354 missing
Var90: numeric, 2 unique values, 49298 missing
Var91: numeric, 119 unique values, 48871 missing
Var92: numeric, 169 unique values, 49829 missing
Var93: numeric, 4 unique values, 48513 missing
Var94: numeric, 20002 unique values, 22380 missing
Var95: numeric, 267 unique values, 48759 missing
Var96: numeric, 33 unique values, 48759 missing
Var97: numeric, 7 unique values, 48513 missing
Var98: numeric, 115 unique values, 49442 missing
Var99: numeric, 47 unique values, 48421 missing
Var100: numeric, 5 unique values, 49298 missing
Var101: numeric, 28 unique values, 49127 missing
Var102: numeric, 445 unique values, 49549 missing
Var103: numeric, 39 unique values, 48513 missing
Var104: numeric, 62 unique values, 49180 missing
Var105: numeric, 62 unique values, 49180 missing
Var106: numeric, 261 unique values, 48421 missing
Var107: numeric, 24 unique values, 48513 missing
Var108: numeric, 341 unique values, 49298 missing
Var109: numeric, 209 unique values, 7230 missing
Var110: numeric, 5 unique values, 49298 missing
Var111: numeric, 794 unique values, 48871 missing
Var112: numeric, 230 unique values, 5009 missing
Var113: numeric, 48511 unique values, 0 missing
Var114: numeric, 643 unique values, 48759 missing
Var115: numeric, 35 unique values, 49180 missing
Var116: numeric, 2 unique values, 49298 missing
Var117: numeric, 656 unique values, 48421 missing
Var118: numeric, 1 unique value, 49829 missing
Var119: numeric, 1487 unique values, 5529 missing
Var120: numeric, 64 unique values, 48513 missing
Var121: numeric, 33 unique values, 49298 missing
Var122: numeric, 3 unique values, 48759 missing
Var123: numeric, 298 unique values, 5009 missing
Var124: numeric, 347 unique values, 48421 missing
Var125: numeric, 10505 unique values, 5539 missing
Var126: numeric, 51 unique values, 13920 missing
Var127: numeric, 39 unique values, 48917 missing
Var128: numeric, 88 unique values, 48917 missing
Var129: numeric, 45 unique values, 49298 missing
Var130: numeric, 2 unique values, 48760 missing
Var131: numeric, 152 unique values, 49298 missing
Var132: numeric, 19 unique values, 5009 missing
Var133: numeric, 37603 unique values, 5009 missing
Var134: numeric, 33181 unique values, 5009 missing
Var135: numeric, 679 unique values, 48421 missing
Var136: numeric, 534 unique values, 49306 missing
Var137: numeric, 19 unique values, 49298 missing
Var138: numeric, 2 unique values, 48421 missing
Var139: numeric, 674 unique values, 48513 missing
Var140: numeric, 2648 unique values, 5539 missing
Var141: numeric, 0 unique values, 50000 missing
Var142: numeric, 4 unique values, 49298 missing
Var143: numeric, 4 unique values, 5009 missing
Var144: numeric, 10 unique values, 5529 missing
Var145: numeric, 88 unique values, 48421 missing
Var146: numeric, 10 unique values, 48513 missing
Var147: numeric, 5 unique values, 48513 missing
Var148: numeric, 119 unique values, 48513 missing
Var149: numeric, 18652 unique values, 7230 missing
Var150: numeric, 600 unique values, 48421 missing
Var151: numeric, 19 unique values, 49153 missing
Var152: numeric, 12 unique values, 48421 missing
Var153: numeric, 36397 unique values, 5009 missing
Var154: numeric, 388 unique values, 49298 missing
Var155: numeric, 8 unique values, 48421 missing
Var156: numeric, 100 unique values, 49306 missing
Var157: numeric, 64 unique values, 48871 missing
Var158: numeric, 18 unique values, 49127 missing
Var159: numeric, 10 unique values, 48759 missing
Var160: numeric, 402 unique values, 5009 missing
Var161: numeric, 9 unique values, 48421 missing
Var162: numeric, 471 unique values, 48759 missing
Var163: numeric, 22957 unique values, 5009 missing
Var164: numeric, 19 unique values, 48421 missing
Var165: numeric, 204 unique values, 49127 missing
Var166: numeric, 48 unique values, 48513 missing
Var167: numeric, 0 unique values, 50000 missing
Var168: numeric, 453 unique values, 49298 missing
Var169: numeric, 0 unique values, 50000 missing
Var170: numeric, 18 unique values, 48759 missing
Var171: numeric, 746 unique values, 48917 missing
Var172: numeric, 13 unique values, 48513 missing
Var173: numeric, 4 unique values, 5009 missing
Var174: numeric, 29 unique values, 48421 missing
Var175: numeric, 0 unique values, 50000 missing
Var176: numeric, 28 unique values, 48760 missing
Var177: numeric, 443 unique values, 48759 missing
Var178: numeric, 30 unique values, 49354 missing
Var179: numeric, 15 unique values, 48421 missing
Var180: numeric, 547 unique values, 49298 missing
Var181: numeric, 7 unique values, 5009 missing
Var182: numeric, 819 unique values, 48421 missing
Var183: numeric, 374 unique values, 48759 missing
Var184: numeric, 31 unique values, 48759 missing
Var185: numeric, 0 unique values, 50000 missing
Var186: numeric, 13 unique values, 49298 missing
Var187: numeric, 57 unique values, 49298 missing
Var188: numeric, 535 unique values, 48759 missing
Var189: numeric, 97 unique values, 28978 missing
Var190: numeric, 328 unique values, 49667 missing
Var191: nominal, 1 unique value, 48917 missing
Var192: nominal, 361 unique values, 369 missing
Var193: nominal, 51 unique values, 0 missing
Var194: nominal, 3 unique values, 37216 missing
Var195: nominal, 23 unique values, 0 missing
Var196: nominal, 4 unique values, 0 missing
Var197: nominal, 225 unique values, 143 missing
Var198: nominal, 4291 unique values, 0 missing
Var199: nominal, 5073 unique values, 4 missing
Var200: nominal, 15415 unique values, 25408 missing
Var201: nominal, 2 unique values, 37217 missing
Var202: nominal, 5713 unique values, 1 missing
Var203: nominal, 5 unique values, 143 missing
Var204: nominal, 100 unique values, 0 missing
Var205: nominal, 3 unique values, 1934 missing
Var206: nominal, 21 unique values, 5529 missing
Var207: nominal, 14 unique values, 0 missing
Var208: nominal, 2 unique values, 143 missing
Var209: numeric, 0 unique values, 50000 missing
Var210: nominal, 6 unique values, 0 missing
Var211: nominal, 2 unique values, 0 missing
Var212: nominal, 81 unique values, 0 missing
Var213: nominal, 1 unique value, 48871 missing
Var214: nominal, 15415 unique values, 25408 missing
Var215: nominal, 1 unique value, 49306 missing
Var216: nominal, 2016 unique values, 0 missing
Var217: nominal, 13990 unique values, 703 missing
Var218: nominal, 2 unique values, 703 missing
Var219: nominal, 22 unique values, 5211 missing
Var220: nominal, 4291 unique values, 0 missing
Var221: nominal, 7 unique values, 0 missing
Var222: nominal, 4291 unique values, 0 missing
Var223: nominal, 4 unique values, 5211 missing
Var224: nominal, 1 unique value, 49180 missing
Var225: nominal, 3 unique values, 26144 missing
Var226: nominal, 23 unique values, 0 missing
Var227: nominal, 7 unique values, 0 missing
Var228: nominal, 30 unique values, 0 missing
Var229: nominal, 4 unique values, 28432 missing
Var230: numeric, 0 unique values, 50000 missing
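
Many of the features above are empty for nearly every instance, so one plausible first preprocessing step is to filter on the missing fraction. The following is a minimal sketch using a handful of missing counts copied from the list above; the 0.95 threshold and the helper name `keep_features` are assumptions for illustration, not something the dataset prescribes.

```python
N_INSTANCES = 50000  # number of rows reported for this dataset

# Missing counts copied from the feature list for a few example features.
missing_counts = {
    "Var1": 49298,
    "Var6": 5529,
    "Var8": 50000,
    "Var57": 0,
    "Var72": 22380,
    "Var200": 25408,
}


def keep_features(missing, n_rows, max_missing_fraction=0.95):
    """Keep features whose missing fraction does not exceed the threshold."""
    return [
        name
        for name, count in missing.items()
        if count / n_rows <= max_missing_fraction
    ]


kept = keep_features(missing_counts, N_INSTANCES)
print(kept)
```

With this threshold, Var8 (entirely missing) and Var1 (about 98.6% missing) are dropped, while the other four example features are kept. A stricter or looser threshold may suit a given model better.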

107 properties

50000
Number of instances (rows) of the dataset.
231
Number of attributes (columns) of the dataset.
2
Number of distinct values of the target attribute (if it is nominal).
8024152
Number of missing values in the dataset.
50000
Number of instances with at least one value missing.
192
Number of numeric attributes.
39
Number of nominal attributes.
98.22
Percentage of instances belonging to the most frequent class.
402315.91
Mean standard deviation of attributes of the numeric type.
1.41
Second quartile (Median) of entropy among attributes.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.02
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.03
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
49110
Number of instances belonging to the most frequent class.
0
Minimal entropy among attributes.
93.97
Second quartile (Median) of kurtosis among attributes of the numeric type.
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.13
Entropy of the target attribute values.
0.02
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
12.46
Maximum entropy among attributes.
-1.67
Minimum kurtosis among attributes of the numeric type.
32.36
Second quartile (Median) of means among attributes of the numeric type.
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.73
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
26707.6
Maximum kurtosis among attributes of the numeric type.
-153278.61
Minimum of means among attributes of the numeric type.
0
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.02
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
6181967.17
Maximum of means among attributes of the numeric type.
0
Minimal mutual information between the nominal attributes and the target attribute.
8.28
Second quartile (Median) of skewness among attributes of the numeric type.
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
0.05
Maximum mutual information between the nominal attributes and the target attribute.
1
The minimal number of distinct values among attributes of the nominal type.
2.16
Percentage of binary attributes.
97.37
Second quartile (Median) of standard deviation of attributes of the numeric type.
0.53
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0
Number of attributes divided by the number of instances.
15415
The maximum number of distinct values among attributes of the nominal type.
-1.8
Minimum skewness among attributes of the numeric type.
100
Percentage of instances having missing values.
7.07
Third quartile of entropy among attributes.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
14.95
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
148.08
Maximum skewness among attributes of the numeric type.
0
Minimum standard deviation of attributes of the numeric type.
69.47
Percentage of missing values.
382.79
Third quartile of kurtosis among attributes of the numeric type.
0.97
Average class difference between consecutive instances.
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
25311869.22
Maximum standard deviation of attributes of the numeric type.
1.78
Percentage of instances belonging to the least frequent class.
83.12
Percentage of numeric attributes.
28784.94
Third quartile of means among attributes of the numeric type.
0.73
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.58
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.02
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
3.45
Average entropy of the attributes.
890
Number of instances belonging to the least frequent class.
16.88
Percentage of nominal attributes.
0.01
Third quartile of mutual information between the nominal attributes and the target attribute.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.02
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
434.52
Mean kurtosis among attributes of the numeric type.
0.61
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.38
First quartile of entropy among attributes.
17.32
Third quartile of skewness among attributes of the numeric type.
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
204444.11
Mean of means among attributes of the numeric type.
0.1
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
14.83
First quartile of kurtosis among attributes of the numeric type.
94164.32
Third quartile of standard deviation of attributes of the numeric type.
0.73
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.58
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.02
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
0.01
Average mutual information between the nominal attributes and the target attribute.
0.02
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
4.18
First quartile of means among attributes of the numeric type.
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.02
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.02
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
399.42
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
5
Number of binary attributes.
0
First quartile of mutual information between the nominal attributes and the target attribute.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
1833.49
Average number of distinct values among the attributes of the nominal type.
3.17
First quartile of skewness among attributes of the numeric type.
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.59
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
4160.4
Standard deviation of the number of distinct values among attributes of the nominal type.
0.02
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
11.57
Mean skewness among attributes of the numeric type.
9.03
First quartile of standard deviation of attributes of the numeric type.
0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.03
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.51
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
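
Several of the properties above follow arithmetically from a few raw counts on this page. The following is a minimal sketch that recomputes them from the reported number of instances, attributes, missing values and class sizes; the variable names are illustrative, and the entropy is assumed to be in bits (base-2 logarithm), which reproduces the reported 0.13.

```python
import math

# Raw counts taken from the properties listed on this page.
n_instances = 50000
n_attributes = 231
n_missing = 8024152
class_counts = [49110, 890]  # most and least frequent class
n_numeric, n_nominal, n_binary = 192, 39, 5

# Share of empty cells in the full instances-by-attributes table.
pct_missing = 100 * n_missing / (n_instances * n_attributes)

# Class balance.
pct_majority = 100 * max(class_counts) / n_instances
pct_minority = 100 * min(class_counts) / n_instances

# Shannon entropy of the target, in bits (assumed base 2).
class_entropy = -sum(
    (c / n_instances) * math.log2(c / n_instances) for c in class_counts
)

# Attribute type shares.
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
pct_binary = 100 * n_binary / n_attributes

print(round(pct_missing, 2), round(pct_majority, 2), round(pct_minority, 2))
print(round(class_entropy, 2))
print(round(pct_numeric, 2), round(pct_nominal, 2), round(pct_binary, 2))
```

The rounded results agree with the listed values for the percentage of missing values, the class percentages, the target entropy and the attribute type percentages.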

20 tasks

150 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: APPETENCY
69 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: APPETENCY
0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: APPETENCY
0 runs - estimation_procedure: 20% Holdout (Ordered) - evaluation_measure: predictive_accuracy - target_feature: APPETENCY
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: APPETENCY
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: APPETENCY
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: classification problem
4 runs - estimation_procedure: 10-fold Learning Curve - target_feature: APPETENCY
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: APPETENCY
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
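
Because the tasks above that specify an evaluation measure use predictive_accuracy, it helps to know what a trivial model would score. The following is a minimal sketch of a constant predictor that always outputs the most frequent class, using the class sizes reported in the properties (49110 and 890); the labels "majority" and "minority" and the helper `accuracy_and_kappa` are placeholders, since the page does not state which of {-1,1} is the frequent class.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equally long label lists."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    # Observed agreement is the plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    kappa = 0.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa


# Class sizes from the properties; label names are placeholders.
y_true = ["majority"] * 49110 + ["minority"] * 890
y_pred = ["majority"] * len(y_true)  # constant prediction

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(round(acc, 4), round(kappa, 4))
```

The constant predictor reaches an accuracy of about 0.98 with a kappa of 0, which is consistent with the several landmarkers above that report an error rate of 0.02 together with a kappa of 0. Accuracy alone therefore says little on this dataset; the Area Under the ROC Curve values listed in the properties separate the landmarkers more clearly.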