coil2000

active · ARFF · Publicly available · Visibility: public · Uploaded 20-08-2014 by Tobias Kuehn
Author: Peter van der Putten
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Insurance+Company+Benchmark+(COIL+2000))
Please cite: P. van der Putten and M. van Someren (eds). CoIL Challenge 2000: The Insurance Company Case. Published by Sentient Machine Research, Amsterdam. Leiden Institute of Advanced Computer Science Technical Report 2000-09. June 22, 2000. [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)

Insurance Company Benchmark (COIL 2000)

Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including whether or not they have a caravan insurance policy. A test set contains 4000 customers of whom only the organisers know if they have a caravan insurance policy.

The data dictionary describes the variables used and their values.

TIC Benchmark Homepage: http://www.liacs.nl/~putten/library/cc2000/

### Attribute Information

Each record consists of 86 attributes, containing sociodemographic data (attributes 1-43) and product ownership (attributes 44-86). The sociodemographic data is derived from zip codes. All customers living in areas with the same zip code have the same sociodemographic attributes. Attribute "CARAVAN", the number of mobile home policies, is the target variable.

All the variables starting with M are zipcode variables. They give information on the distribution of that variable, e.g. Rented house, in the zipcode area of the customer.

```
 1 MOSTYPE  Customer Subtype                              see L0
 2 MAANTHUI Number of houses                              1 – 10
 3 MGEMOMV  Avg size household                            1 – 6
 4 MGEMLEEF Avg age                                       see L1
 5 MOSHOOFD Customer main type                            see L2
 6 MGODRK   Roman catholic                                see L3
 7 MGODPR   Protestant                                    ...
 8 MGODOV   Other religion
 9 MGODGE   No religion
10 MRELGE   Married
11 MRELSA   Living together
12 MRELOV   Other relation
13 MFALLEEN Singles
14 MFGEKIND Household without children
15 MFWEKIND Household with children
16 MOPLHOOG High level education
17 MOPLMIDD Medium level education
18 MOPLLAAG Lower level education
19 MBERHOOG High status
20 MBERZELF Entrepreneur
21 MBERBOER Farmer
22 MBERMIDD Middle management
23 MBERARBG Skilled labourers
24 MBERARBO Unskilled labourers
25 MSKA     Social class A
26 MSKB1    Social class B1
27 MSKB2    Social class B2
28 MSKC     Social class C
29 MSKD     Social class D
30 MHHUUR   Rented house
31 MHKOOP   Home owners
32 MAUT1    1 car
33 MAUT2    2 cars
34 MAUT0    No car
35 MZFONDS  National Health Service
36 MZPART   Private health insurance
37 MINKM30  Income < 30.000
38 MINK3045 Income 30-45.000
39 MINK4575 Income 45-75.000
40 MINK7512 Income 75-122.000
41 MINK123M Income >123.000
42 MINKGEM  Average income
43 MKOOPKLA Purchasing power class
44 PWAPART  Contribution private third party insurance    see L4
45 PWABEDR  Contribution third party insurance (firms)    ...
46 PWALAND  Contribution third party insurance (agriculture)
47 PPERSAUT Contribution car policies
48 PBESAUT  Contribution delivery van policies
49 PMOTSCO  Contribution motorcycle/scooter policies
50 PVRAAUT  Contribution lorry policies
51 PAANHANG Contribution trailer policies
52 PTRACTOR Contribution tractor policies
53 PWERKT   Contribution agricultural machines policies
54 PBROM    Contribution moped policies
55 PLEVEN   Contribution life insurances
56 PPERSONG Contribution private accident insurance policies
57 PGEZONG  Contribution family accidents insurance policies
58 PWAOREG  Contribution disability insurance policies
59 PBRAND   Contribution fire policies
60 PZEILPL  Contribution surfboard policies
61 PPLEZIER Contribution boat policies
62 PFIETS   Contribution bicycle policies
63 PINBOED  Contribution property insurance policies
64 PBYSTAND Contribution social security insurance policies
65 AWAPART  Number of private third party insurance       1 - 12
66 AWABEDR  Number of third party insurance (firms)       ...
67 AWALAND  Number of third party insurance (agriculture)
68 APERSAUT Number of car policies
69 ABESAUT  Number of delivery van policies
70 AMOTSCO  Number of motorcycle/scooter policies
71 AVRAAUT  Number of lorry policies
72 AAANHANG Number of trailer policies
73 ATRACTOR Number of tractor policies
74 AWERKT   Number of agricultural machines policies
75 ABROM    Number of moped policies
76 ALEVEN   Number of life insurances
77 APERSONG Number of private accident insurance policies
78 AGEZONG  Number of family accidents insurance policies
79 AWAOREG  Number of disability insurance policies
80 ABRAND   Number of fire policies
81 AZEILPL  Number of surfboard policies
82 APLEZIER Number of boat policies
83 AFIETS   Number of bicycle policies
84 AINBOED  Number of property insurance policies
85 ABYSTAND Number of social security insurance policies
86 CARAVAN  Number of mobile home policies                0 - 1

L0:
Value Label
 1 High Income, expensive child
 2 Very Important Provincials
 3 High status seniors
 4 Affluent senior apartments
 5 Mixed seniors
 6 Career and childcare
 7 Dinki's (double income no kids)
 8 Middle class families
 9 Modern, complete families
10 Stable family
11 Family starters
12 Affluent young families
13 Young all american family
14 Junior cosmopolitan
15 Senior cosmopolitans
16 Students in apartments
17 Fresh masters in the city
18 Single youth
19 Suburban youth
20 Ethnically diverse
21 Young urban have-nots
22 Mixed apartment dwellers
23 Young and rising
24 Young, low educated
25 Young seniors in the city
26 Own home elderly
27 Seniors in apartments
28 Residential elderly
29 Porchless seniors: no front yard
30 Religious elderly singles
31 Low income catholics
32 Mixed seniors
33 Lower class large families
34 Large family, employed child
35 Village families
36 Couples with teens 'Married with children'
37 Mixed small town dwellers
38 Traditional families
39 Large religious families
40 Large family farms
41 Mixed rurals

L1:
1 20-30 years
2 30-40 years
3 40-50 years
4 50-60 years
5 60-70 years
6 70-80 years

L2:
 1 Successful hedonists
 2 Driven Growers
 3 Average Family
 4 Career Loners
 5 Living well
 6 Cruising Seniors
 7 Retired and Religious
 8 Family with grown ups
 9 Conservative families
10 Farmers

L3:
0 0%
1 1 - 10%
2 11 - 23%
3 24 - 36%
4 37 - 49%
5 50 - 62%
6 63 - 75%
7 76 - 88%
8 89 - 99%
9 100%

L4:
0 f 0
1 f 1 – 49
2 f 50 – 99
3 f 100 – 199
4 f 200 – 499
5 f 500 – 999
6 f 1000 – 4999
7 f 5000 – 9999
8 f 10.000 - 19.999
9 f 20.000 - ?
```
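Because most attributes are stored as integer codes, it can help to translate them back into the labels of the data dictionary before inspecting records. The following is a minimal Python sketch of that idea, not part of the original dataset tooling. The names `CODEBOOK`, `attribute_group` and `decode_record` are hypothetical, and only the two attributes that the dictionary explicitly links to the L1 and L3 tables are mapped. The grouping by first letter follows the naming pattern visible in the dictionary (M for zipcode variables, P for contributions, A for policy counts).

```python
# Lookup tables transcribed from the data dictionary (tables L1 and L3).
AGE_BANDS = {
    1: "20-30 years", 2: "30-40 years", 3: "40-50 years",
    4: "50-60 years", 5: "60-70 years", 6: "70-80 years",
}
PERCENT_BANDS = {
    0: "0%", 1: "1 - 10%", 2: "11 - 23%", 3: "24 - 36%", 4: "37 - 49%",
    5: "50 - 62%", 6: "63 - 75%", 7: "76 - 88%", 8: "89 - 99%", 9: "100%",
}

# Assumption: only attributes the dictionary explicitly ties to a table
# are listed here; extend the mapping once the other links are confirmed.
CODEBOOK = {"MGEMLEEF": AGE_BANDS, "MGODRK": PERCENT_BANDS}

# Attribute families inferred from the first letter of the column name.
PREFIX_GROUPS = {
    "M": "zipcode sociodemographic",
    "P": "contribution",
    "A": "number of policies",
}


def attribute_group(name):
    """Return the attribute family for a column name."""
    if name == "CARAVAN":
        return "target"
    group = PREFIX_GROUPS.get(name[:1])
    if group is None:
        raise ValueError(f"unknown attribute name: {name!r}")
    return group


def decode_record(record):
    """Replace integer codes by dictionary labels where a table is known."""
    decoded = {}
    for name, code in record.items():
        table = CODEBOOK.get(name)
        # Unknown attributes and unknown codes are passed through unchanged.
        decoded[name] = table.get(code, code) if table else code
    return decoded


# Hypothetical customer record, for illustration only.
example = {"MGEMLEEF": 2, "MGODRK": 5, "CARAVAN": 0}
print(decode_record(example))
print(attribute_group("MHHUUR"))
```

Passing unknown codes through unchanged keeps the sketch safe to run on columns whose lookup table has not been transcribed yet.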

86 features

CARAVAN (target): numeric, 2 unique values, 0 missing
MOSTYPE: numeric, 40 unique values, 0 missing
MAANTHUI: numeric, 9 unique values, 0 missing
MGEMOMV: numeric, 6 unique values, 0 missing
MGEMLEEF: numeric, 6 unique values, 0 missing
MOSHOOFD: numeric, 10 unique values, 0 missing
MGODRK: numeric, 10 unique values, 0 missing
MGODPR: numeric, 10 unique values, 0 missing
MGODOV: numeric, 6 unique values, 0 missing
MGODGE: numeric, 10 unique values, 0 missing
MRELGE: numeric, 10 unique values, 0 missing
MRELSA: numeric, 8 unique values, 0 missing
MRELOV: numeric, 10 unique values, 0 missing
MFALLEEN: numeric, 10 unique values, 0 missing
MFGEKIND: numeric, 10 unique values, 0 missing
MFWEKIND: numeric, 10 unique values, 0 missing
MOPLHOOG: numeric, 10 unique values, 0 missing
MOPLMIDD: numeric, 10 unique values, 0 missing
MOPLLAAG: numeric, 10 unique values, 0 missing
MBERHOOG: numeric, 10 unique values, 0 missing
MBERZELF: numeric, 6 unique values, 0 missing
MBERBOER: numeric, 10 unique values, 0 missing
MBERMIDD: numeric, 10 unique values, 0 missing
MBERARBG: numeric, 10 unique values, 0 missing
MBERARBO: numeric, 10 unique values, 0 missing
MSKA: numeric, 10 unique values, 0 missing
MSKB1: numeric, 10 unique values, 0 missing
MSKB2: numeric, 10 unique values, 0 missing
MSKC: numeric, 10 unique values, 0 missing
MSKD: numeric, 10 unique values, 0 missing
MHHUUR: numeric, 10 unique values, 0 missing
MHKOOP: numeric, 10 unique values, 0 missing
MAUT1: numeric, 10 unique values, 0 missing
MAUT2: numeric, 9 unique values, 0 missing
MAUT0: numeric, 10 unique values, 0 missing
MZFONDS: numeric, 10 unique values, 0 missing
MZPART: numeric, 10 unique values, 0 missing
MINKM30: numeric, 10 unique values, 0 missing
MINK3045: numeric, 10 unique values, 0 missing
MINK4575: numeric, 10 unique values, 0 missing
MINK7512: numeric, 10 unique values, 0 missing
MINK123M: numeric, 9 unique values, 0 missing
MINKGEM: numeric, 10 unique values, 0 missing
MKOOPKLA: numeric, 8 unique values, 0 missing
PWAPART: numeric, 4 unique values, 0 missing
PWABEDR: numeric, 7 unique values, 0 missing
PWALAND: numeric, 5 unique values, 0 missing
PPERSAUT: numeric, 7 unique values, 0 missing
PBESAUT: numeric, 4 unique values, 0 missing
PMOTSCO: numeric, 6 unique values, 0 missing
PVRAAUT: numeric, 5 unique values, 0 missing
PAANHANG: numeric, 6 unique values, 0 missing
PTRACTOR: numeric, 6 unique values, 0 missing
PWERKT: numeric, 6 unique values, 0 missing
PBROM: numeric, 6 unique values, 0 missing
PLEVEN: numeric, 10 unique values, 0 missing
PPERSONG: numeric, 7 unique values, 0 missing
PGEZONG: numeric, 3 unique values, 0 missing
PWAOREG: numeric, 5 unique values, 0 missing
PBRAND: numeric, 9 unique values, 0 missing
PZEILPL: numeric, 4 unique values, 0 missing
PPLEZIER: numeric, 7 unique values, 0 missing
PFIETS: numeric, 2 unique values, 0 missing
PINBOED: numeric, 7 unique values, 0 missing
PBYSTAND: numeric, 5 unique values, 0 missing
AWAPART: numeric, 3 unique values, 0 missing
AWABEDR: numeric, 3 unique values, 0 missing
AWALAND: numeric, 2 unique values, 0 missing
APERSAUT: numeric, 9 unique values, 0 missing
ABESAUT: numeric, 6 unique values, 0 missing
AMOTSCO: numeric, 5 unique values, 0 missing
AVRAAUT: numeric, 5 unique values, 0 missing
AAANHANG: numeric, 4 unique values, 0 missing
ATRACTOR: numeric, 7 unique values, 0 missing
AWERKT: numeric, 6 unique values, 0 missing
ABROM: numeric, 4 unique values, 0 missing
ALEVEN: numeric, 7 unique values, 0 missing
APERSONG: numeric, 2 unique values, 0 missing
AGEZONG: numeric, 2 unique values, 0 missing
AWAOREG: numeric, 3 unique values, 0 missing
ABRAND: numeric, 8 unique values, 0 missing
AZEILPL: numeric, 2 unique values, 0 missing
APLEZIER: numeric, 3 unique values, 0 missing
AFIETS: numeric, 5 unique values, 0 missing
AINBOED: numeric, 3 unique values, 0 missing
ABYSTAND: numeric, 3 unique values, 0 missing

107 properties

9822
Number of instances (rows) of the dataset.
86
Number of attributes (columns) of the dataset.
0
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
86
Number of numeric attributes.
0
Number of nominal attributes.
1.74
Second quartile (Median) of skewness among attributes of the numeric type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
24.25
Maximum of means among attributes of the numeric type.
Minimal mutual information between the nominal attributes and the target attribute.
0
Percentage of binary attributes.
0.85
Second quartile (Median) of standard deviation of attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.01
Number of attributes divided by the number of instances.
Maximum mutual information between the nominal attributes and the target attribute.
The minimal number of distinct values among attributes of the nominal type.
0
Percentage of instances having missing values.
Third quartile of entropy among attributes.
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
The maximum number of distinct values among attributes of the nominal type.
-0.71
Minimum skewness among attributes of the numeric type.
0
Percentage of missing values.
122.5
Third quartile of kurtosis among attributes of the numeric type.
0.89
Average class difference between consecutive instances.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
39.36
Maximum skewness among attributes of the numeric type.
0.03
Minimum standard deviation of attributes of the numeric type.
100
Percentage of numeric attributes.
2.74
Third quartile of means among attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
12.92
Maximum standard deviation of attributes of the numeric type.
Percentage of instances belonging to the least frequent class.
0
Percentage of nominal attributes.
Third quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Average entropy of the attributes.
Number of instances belonging to the least frequent class.
First quartile of entropy among attributes.
10.42
Third quartile of skewness among attributes of the numeric type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
140.22
Mean kurtosis among attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.4
First quartile of kurtosis among attributes of the numeric type.
1.73
Third quartile of standard deviation of attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
1.73
Mean of means among attributes of the numeric type.
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.02
First quartile of means among attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Average mutual information between the nominal attributes and the target attribute.
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
0
Number of binary attributes.
0.48
First quartile of skewness among attributes of the numeric type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Standard deviation of the number of distinct values among attributes of the nominal type.
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
Average number of distinct values among the attributes of the nominal type.
0.22
First quartile of standard deviation of attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
6.51
Mean skewness among attributes of the numeric type.
Second quartile (Median) of entropy among attributes.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
Percentage of instances belonging to the most frequent class.
1.14
Mean standard deviation of attributes of the numeric type.
6.86
Second quartile (Median) of kurtosis among attributes of the numeric type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Entropy of the target attribute values.
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
Number of instances belonging to the most frequent class.
Minimal entropy among attributes.
0.64
Second quartile (Median) of means among attributes of the numeric type.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum entropy among attributes.
-1.96
Minimum kurtosis among attributes of the numeric type.
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
1825.55
Maximum kurtosis among attributes of the numeric type.
0
Minimum of means among attributes of the numeric type.
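
Several of the properties above are simple statistics of the numeric columns. As a rough illustration, the Python sketch below computes skewness, kurtosis and the attributes-per-instance ratio. It assumes population (biased) moment estimators and excess kurtosis (normal distribution = 0); the exact estimators used to produce the listed values are not stated on this page, so results on the real data may differ slightly. The function names are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment using the population (divide by n) convention."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third standardized moment; positive for a long right tail."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (assumed convention)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant column")
    return central_moment(values, 4) / m2 ** 2 - 3.0


def dimensionality(n_attributes, n_instances):
    """Number of attributes divided by the number of instances."""
    return n_attributes / n_instances


# 86 attributes and 9822 instances, as listed above.
print(round(dimensionality(86, 9822), 2))

# Hypothetical column in which 1 row out of 100 holds a policy:
# rare non-zero counts give strongly right-skewed, heavy-tailed columns.
rare_flag = [0] * 99 + [1]
print(skewness(rare_flag), excess_kurtosis(rare_flag))
```

Columns in which almost every customer has a count of zero behave like `rare_flag`, which is one plausible reason the maximum skewness and kurtosis reported above are so large.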

13 tasks

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: CARAVAN
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: CARAVAN
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
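
The first two tasks evaluate predictions of CARAVAN with mean absolute error under 10-fold cross-validation. The sketch below shows that procedure in plain Python with a deliberately trivial baseline that predicts the training-fold mean. It is an illustration under stated assumptions, not the platform's implementation: the fold assignment, the seed handling and the toy target vector are all hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mae(y, k=10, seed=0):
    """MAE of a mean-of-training-fold baseline, averaged over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        baseline = sum(train) / len(train)  # constant prediction
        y_test = [y[i] for i in test_idx]
        scores.append(mean_absolute_error(y_test, [baseline] * len(y_test)))
    return sum(scores) / len(scores)


# Toy 0/1 target standing in for CARAVAN; the class balance is made up.
toy_target = [0] * 94 + [1] * 6
print(cross_validated_mae(toy_target))

# "10 times 10-fold" repeats the procedure with different shuffles.
repeated = [cross_validated_mae(toy_target, seed=s) for s in range(10)]
print(sum(repeated) / len(repeated))
```

Any real model would replace the constant `baseline` prediction; the fold construction and the scoring loop stay the same.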