OpenML
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
379 instances - 8 features - 4 classes - 1418 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
490 runs - 0 likes - 0 downloads - 0 reach - 0 impact
364 instances - 33 features - 6 classes - 101 missing values
One of the datasets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff. It contains data on the DMFT Index (Decayed, Missing, and Filled Teeth) before and after different prevention…
27866 runs - 0 likes - 0 downloads - 0 reach - 0 impact
797 instances - 5 features - 6 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
1030 runs - 0 likes - 0 downloads - 0 reach - 0 impact
132 instances - 4 features - 2 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
1116 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120 instances - 4 features - 2 classes - 0 missing values
This database is a standardized version of the original audiology database (see audiology.* in this directory). The non-standard set of attributes has been converted to a standard set of attributes…
7303 runs - 0 likes - 0 downloads - 0 reach - 0 impact
226 instances - 70 features - 24 classes - 317 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
709 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48 instances - 5 features - 2 classes - 0 missing values
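The mean-based binarization described in the entry above can be sketched roughly as follows. This is only an illustration, not the code used to produce the dataset: the function name `binarize_by_mean` is hypothetical, and because the description is truncated, the choice of which side of the mean becomes 'P' and the use of 'N' for the other class are assumptions.

```python
from statistics import mean


def binarize_by_mean(values, below_label="P", other_label="N"):
    """Turn a numeric target into a two-class nominal target.

    The mean of the target is used as the cut point. Assumption: values
    strictly below the mean get `below_label` and all others get
    `other_label`; the truncated description does not confirm this.
    """
    threshold = mean(values)
    return [below_label if v < threshold else other_label for v in values]


# Mean of these values is 4.0, so only the last one is at or above it.
mean_labels = binarize_by_mean([1.0, 2.0, 3.0, 10.0])
print(mean_labels)
```

Swapping the two label arguments reverses the assignment if the original rule turns out to go the other way.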
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
135 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3190 instances - 61 features - 2 classes - 0 missing values
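The majority-class re-labeling described in the entry above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure actually used on OpenML: the function name `binarize_by_majority` is hypothetical, and since the description is cut off after "and…", labeling every non-majority class as 'N' is an assumption.

```python
from collections import Counter


def binarize_by_majority(targets, positive="P", negative="N"):
    """Re-label the most frequent class as `positive`, the rest as `negative`.

    Assumption: all non-majority classes collapse into one negative class
    named 'N'; the truncated description does not state this.
    """
    majority_class, _ = Counter(targets).most_common(1)[0]
    return [positive if t == majority_class else negative for t in targets]


# "b" occurs three times, so it becomes the positive class.
majority_labels = binarize_by_majority(["a", "b", "b", "c", "b"])
print(majority_labels)
```

If two classes are tied for most frequent, `Counter.most_common` returns the one encountered first, so a real pipeline would need an explicit tie-breaking rule.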
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
173 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 58 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
652 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12960 instances - 9 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
717 runs - 0 likes - 0 downloads - 0 reach - 0 impact
90 instances - 9 features - 2 classes - 3 missing values
Internet Usage Data. Data Type: multivariate. Abstract: This data contains general demographic information on internet users in 1997. Sources: Original Owner: [1] Graphics, Visualization, & Usability Center…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10108 instances - 72 features - 46 classes - 2699 missing values
Pittsburgh bridges This version is derived from version 2 (the discretized version) by removing all instances with missing values in the last (target) attribute. The bridges dataset is originally not…
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
105 instances - 12 features - 6 classes - 61 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs - 0 likes - 0 downloads - 0 reach - 0 impact
267 instances - 23 features - 2 classes - 0 missing values
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
131 instances - 3 features - 3 classes - 0 missing values
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65 instances - 3 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
358834 runs - 0 likes - 0 downloads - 0 reach - 0 impact
556 instances - 7 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
394951 runs - 3 likes - 34 downloads - 37 reach - 39 impact
601 instances - 7 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
108820 runs - 0 likes - 0 downloads - 0 reach - 0 impact
554 instances - 7 features - 2 classes - 0 missing values
No data.
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
Source: http://www.ijcaonline.org/archives/volume47/number18/7291-0509 Data Set Information: In this paper, we seek to identify why users are drawn to cyberspace in Kohkiloye and Boyer…
373 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 6 features - 2 classes - 0 missing values
1. Title: 1984 United States Congressional Voting Records Database 2. Source Information: (a) Source: Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional…
2262 runs - 0 likes - 0 downloads - 0 reach - 0 impact
435 instances - 17 features - 2 classes - 392 missing values
This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when…
386788 runs - 0 likes - 0 downloads - 0 reach - 0 impact
958 instances - 10 features - 2 classes - 0 missing values
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
24646 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3190 instances - 61 features - 3 classes - 0 missing values
1. Title: INDUCE Trains Data set 2. Sources: - Donor: GMU, Center for AI, Software Librarian, Eric E. Bloedorn (bloedorn@aic.gmu.edu) - Original owners: Ryszard S. Michalski (michalski@aic.gmu.edu)…
1973 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10 instances - 33 features - 2 classes - 51 missing values
This data set consists of the petal and sepal lengths of 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray
65 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 40 features - 2 classes - 0 missing values
Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
2009 runs - 0 likes - 0 downloads - 0 reach - 0 impact
286 instances - 10 features - 2 classes - 9 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
726 runs - 0 likes - 0 downloads - 0 reach - 0 impact
576 instances - 12 features - 2 classes - 0 missing values
No data.
60 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
37 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
33 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
313 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
307 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
306 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
90 runs - 0 likes - 0 downloads - 0 reach - 0 impact
663552 instances - 13 features - 2 classes - 0 missing values
No data.
28 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
29 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
29 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
29 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
28 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
28 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
29 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
28 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-bodies * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set Information:…
3 runs - 0 likes - 0 downloads - 0 reach - 0 impact
64 instances - 4703 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-subjects-stemmed * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set…
71 runs - 0 likes - 0 downloads - 0 reach - 0 impact
64 instances - 230 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-subjects * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set…
40 runs - 0 likes - 0 downloads - 0 reach - 0 impact
64 instances - 243 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-bodies-stemmed * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
64 instances - 3722 features - 2 classes - 0 missing values
* Title: Nursery Database * Abstract: 4-class version of the original Nursery dataset
121 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12958 instances - 9 features - 4 classes - 0 missing values
1. Title: Nursery Database 2. Sources: (a) Creator: Vladislav Rajkovic et al. (13 experts) (b) Donors: Marko Bohanec (marko.bohanec@ijs.si) Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997 3. Past…
2210 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12960 instances - 9 features - 5 classes - 0 missing values
Description: This dataset describes mushrooms in terms of their physical characteristics. They are classified as poisonous or edible. Source: (a) Origin: Mushroom records are drawn from…
16692 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8124 instances - 23 features - 2 classes - 2480 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
104 runs - 0 likes - 0 downloads - 0 reach - 0 impact
379 instances - 8 features - 2 classes - 1368 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
728 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 241 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
722 runs - 0 likes - 0 downloads - 0 reach - 0 impact
683 instances - 36 features - 2 classes - 2337 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
757 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 6 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
774 runs - 0 likes - 0 downloads - 0 reach - 0 impact
797 instances - 5 features - 2 classes - 0 missing values
No data.
50 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 18 features - 22 classes - 0 missing values
No data.
305 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
29 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
24 instances - 5 features - classes - 0 missing values
* Abstract: Predict the Bankruptcy from Qualitative parameters from experts. * Source: Source Information -- Creator : Mr.A.Martin(jayamartin '@' yahoo.com) Mr.J.Uthayakumar (uthayakumar17691 '@'…
147 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 7 features - 2 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
662 instances - 1212 features - classes - 0 missing values
Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1702 instances - 1054 features - classes - 0 missing values
This dataset contains 358 lyrics of songs for the rock bands 'The Rolling Stones' and 'Deep Purple'. The bands are equally represented in the dataset (179 songs for each band). This dataset was…
8 runs - 0 likes - 0 downloads - 0 reach - 0 impact
358 instances - 2 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs - 0 likes - 2 downloads - 2 reach - 22 impact
1600 instances - 1001 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
corral-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
160 instances - 7 features - 2 classes - 0 missing values
car-evaluation-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1728 instances - 22 features - 4 classes - 0 missing values
led24-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 22 impact
3200 instances - 25 features - 10 classes - 0 missing values
led7-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3200 instances - 8 features - 10 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
analcatdata_fraud-pmlb
34 runs - 0 likes - 0 downloads - 0 reach - 0 impact
42 instances - 12 features - 2 classes - 0 missing values
This data set contains information about participants of a math conference. isPresent is the target column for the classification task.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
246 instances - 7 features - 2 classes - 0 missing values
Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.
1 run - 0 likes - 0 downloads - 0 reach - 0 impact
1702 instances - 1054 features - 2 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
662 instances - 1213 features - 2 classes - 0 missing values
Automated file upload of 20_newsgroups.drift
124 runs - 0 likes - 0 downloads - 0 reach - 0 impact
399940 instances - 1001 features - 2 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 35 features - 2 classes - 0 missing values
Automated file upload of BNG(segment)
99 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
Automated file upload of BNG(spambase)
98 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 58 features - 2 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
66 runs - 0 likes - 0 downloads - 0 reach - 0 impact
277 instances - 10 features - 2 classes - 0 missing values
Wikidata with top-474 most frequent types and ingoing/outgoing properties as features
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
19254100 instances - 2331 features - classes - 0 missing values
This data contains general demographic information on internet users in 1997. Original Owner: [1] Graphics, Visualization, & Usability Center, College of Computing, Georgia Institute of Technology…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10108 instances - 72 features - classes - 2699 missing values
Nell HMC dataset for type prediction with ingoing/outgoing properties as features
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120720 instances - 769 features - classes - 0 missing values
feedback
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
38932 instances - 3 features - classes - 0 missing values
feedback_1
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
38932 instances - 3 features - classes - 0 missing values
Subset of KITS dataset with 100 images
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 27649 features - 2 classes - 0 missing values
50 Danish words with their pronunciation from Dansk Ordbog
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
51 instances - 2 features - classes - 2 missing values
Survey to know if people self-identify as Midwesterners.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2778 instances - 28 features - 10 classes - 1737 missing values
Payments given by healthcare manufacturing companies to medical doctors or hospitals
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73558 instances - 6 features - 2 classes - 83182 missing values
mini insect example dataset # 1
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12 instances - 4 features - 4 classes - 0 missing values
Context: I collected about 1200 Covid-19 research articles from the NCBI.NLM.NIH website for use in ML algorithms and data analysis such as sentiment analysis, time series, recommender system…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1198 instances - 5 features - classes - 1459 missing values
This dataset was scraped from Goodreads to obtain information about the best books of the 19th century. Feature description: Book_Name, the title of the book; Author_Name, the author(s) of the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1101 instances - 5 features - classes - 4 missing values
Meta-Album Plant Village Dataset (Mini): The Plant Village dataset (https://data.mendeley.com/datasets/tywbtsjrjv/1) contains camera photos of 17 crop leaves. The original image resolution is…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1520 instances - 3 features - 38 classes - 0 missing values
Meta-Album RSICB Dataset (Mini): The RSICB128 dataset (https://github.com/lehaifeng/RSI-CB) covers 45 scene categories, assembling in total 36 000 images of resolution 128x128 px. The data authors…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1800 instances - 3 features - 45 classes - 0 missing values