dbworld-subjects-stemmed

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 01-06-2015 by Rafael Gomes Mantovani.
Tags: Data Analysis, mf_less_than_80, study_123, Technology
Author: Michele Filannino
Source: UCI
Please cite:

* Dataset: DBworld e-mails data set
  Task: dbworld-subjects-stemmed
* Source: Michele Filannino, PhD, University of Manchester, Centre for Doctoral Training. Email: filannim_AT_cs.man.ac.uk
* Data Set Information: I collected 64 e-mails from the DBWorld newsletter and used them to train different algorithms to classify between 'announces of conferences' and 'everything else'. I used a binary bag-of-words representation, preceded by a stopword-removal pre-processing step.
* Attribute Information: Each attribute corresponds to a precise word or stem in the entire data set vocabulary (I used a bag-of-words representation).
* Relevant Papers: Michele Filannino, 'DBWorld e-mail classification using a very small corpus', Project of Machine Learning course, University of Manchester, 2011.
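The binary bag-of-words representation mentioned in the description can be sketched as follows. This is a minimal illustration, not the author's pipeline: the stopword list, the tokenizer, and the two subject lines are invented for the example, and stemming is left out because the description does not say which stemmer was used.

```python
import re

# Hypothetical, tiny stopword list for illustration only;
# the list actually used for this dataset is not given on the page.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def binary_bag_of_words(documents):
    """Return (vocabulary, rows), where rows[i][j] is 1 if vocabulary[j] occurs in documents[i]."""
    token_sets = [set(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*token_sets))
    rows = [[1 if word in tokens else 0 for word in vocabulary] for tokens in token_sets]
    return vocabulary, rows

# Invented subject lines, not taken from the dataset.
subjects = [
    "Call for papers: workshop on data mining",
    "Postdoc position in databases",
]
vocabulary, rows = binary_bag_of_words(subjects)
print(vocabulary)
print(rows)
```

Each column is a presence/absence indicator, which is consistent with every attribute on this page being nominal with 2 unique values.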

230 features

Class (target): nominal, 2 unique values, 0 missing
V1 to V229 (229 attributes): each nominal, 2 unique values, 0 missing

107 properties

64
Number of instances (rows) of the dataset.
230
Number of attributes (columns) of the dataset.
2
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
0
Number of numeric attributes.
230
Number of nominal attributes.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0
Standard deviation of the number of distinct values among attributes of the nominal type.
0.25
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
2
Average number of distinct values among the attributes of the nominal type.
First quartile of skewness among attributes of the numeric type.
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.22
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.92
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
Mean skewness among attributes of the numeric type.
First quartile of standard deviation of attributes of the numeric type.
0.25
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.57
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.16
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
54.69
Percentage of instances belonging to the most frequent class.
Mean standard deviation of attributes of the numeric type.
0.12
Second quartile (Median) of entropy among attributes.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.99
Entropy of the target attribute values.
0.69
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
35
Number of instances belonging to the most frequent class.
0.12
Minimal entropy among attributes.
Second quartile (Median) of kurtosis among attributes of the numeric type.
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.71
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
0.84
Maximum entropy among attributes.
Minimum kurtosis among attributes of the numeric type.
Second quartile (Median) of means among attributes of the numeric type.
0.25
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.34
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum kurtosis among attributes of the numeric type.
Minimum of means among attributes of the numeric type.
0.02
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Second quartile (Median) of skewness among attributes of the numeric type.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.31
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum of means among attributes of the numeric type.
0
Minimal mutual information between the nominal attributes and the target attribute.
Second quartile (Median) of standard deviation of attributes of the numeric type.
0.72
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
3.59
Number of attributes divided by the number of instances.
0.29
Maximum mutual information between the nominal attributes and the target attribute.
2
The minimal number of distinct values among attributes of the nominal type.
100
Percentage of binary attributes.
0.2
Third quartile of entropy among attributes.
0.3
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
44.56
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
2
The maximum number of distinct values among attributes of the nominal type.
Minimum skewness among attributes of the numeric type.
0
Percentage of instances having missing values.
Third quartile of kurtosis among attributes of the numeric type.
0.94
Average class difference between consecutive instances.
0.42
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Maximum skewness among attributes of the numeric type.
Minimum standard deviation of attributes of the numeric type.
0
Percentage of missing values.
Third quartile of means among attributes of the numeric type.
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.72
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.25
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Maximum standard deviation of attributes of the numeric type.
45.31
Percentage of instances belonging to the least frequent class.
0
Percentage of numeric attributes.
0.02
Third quartile of mutual information between the nominal attributes and the target attribute.
0.22
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.3
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
0.17
Average entropy of the attributes.
29
Number of instances belonging to the least frequent class.
100
Percentage of nominal attributes.
Third quartile of skewness among attributes of the numeric type.
0.57
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.42
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Mean kurtosis among attributes of the numeric type.
0.91
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.12
First quartile of entropy among attributes.
Third quartile of standard deviation of attributes of the numeric type.
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.72
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.25
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Mean of means among attributes of the numeric type.
0.27
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of kurtosis among attributes of the numeric type.
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.22
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.3
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
0.02
Average mutual information between the nominal attributes and the target attribute.
0.44
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of means among attributes of the numeric type.
0.25
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.57
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.42
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
6.56
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
230
Number of binary attributes.
0.01
First quartile of mutual information between the nominal attributes and the target attribute.
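Several of the properties above follow directly from the counts on this page. The sketch below recomputes the class entropy and the attributes-per-instance ratio, and backs out the unrounded mean mutual information and mean attribute entropy implied by the two derived properties whose formulas are stated. The function names are illustrative, and which class label is the majority is an assumption; only the counts 35 and 29 come from the page.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """Mutual information in bits, using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Class counts reported on this page: 35 and 29 out of 64 instances.
# Labelling the majority class as 1 is an arbitrary choice for the example.
labels = [1] * 35 + [0] * 29
class_entropy = entropy(labels)
print(round(class_entropy, 2))       # 0.99, matching the reported class entropy

attrs_per_instance = 230 / 64
print(round(attrs_per_instance, 2))  # 3.59, matching the reported ratio

# The page displays rounded values, so 0.99 / 0.02 does not reproduce 44.56.
# Instead, back out the unrounded means implied by the stated formulas.
implied_mean_mi = class_entropy / 44.56                   # ClassEntropy / MeanMutualInformation = 44.56
implied_mean_attr_entropy = implied_mean_mi * (1 + 6.56)  # (MeanAttributeEntropy - MeanMI) / MeanMI = 6.56
print(round(implied_mean_mi, 4), round(implied_mean_attr_entropy, 4))
```

The implied means round to 0.02 and 0.17, which agrees with the average mutual information and average attribute entropy listed above. `mutual_information` is included to show how the per-attribute values could be computed from the raw columns, which are not reproduced here.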

13 tasks

40 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Class
31 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
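As a rough illustration of the 10-fold cross-validation procedure and of the accuracy and kappa measures that appear on this page, the sketch below evaluates a majority-class baseline on labels with the reported class counts (35 and 29). It is a simplified, deterministic sketch: the stratified fold assignment is an assumption about how the splits are built, the baseline stands in for a real classifier, and none of the Weka landmarkers listed above is reproduced.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k=10):
    """Assign instance indices to k folds, dealing out each class round-robin."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)

def cross_validate_majority(labels, k=10):
    """Run k-fold CV with a majority-class baseline; return (accuracy, kappa) on pooled predictions."""
    y_true, y_pred = [], []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]  # "fit" on the training folds only
        y_true += [labels[i] for i in test_idx]
        y_pred += [majority] * len(test_idx)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, cohen_kappa(y_true, y_pred)

labels = [1] * 35 + [0] * 29  # class counts from this page; label values are arbitrary
accuracy, kappa = cross_validate_majority(labels)
print(round(accuracy, 4), round(kappa, 4))
```

The baseline's accuracy equals the majority-class share (35/64, about 54.69%) and its kappa is 0, which gives a reference point for reading the landmarker error rates and kappa values reported above.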