OpenML
Fixed dataset for autoHorse.csv I suggest...
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
201 instances - 69 features - 186 classes - 0 missing values
Price column is now an integer. autoHorse dataset.
15 runs - 0 likes - 0 downloads - 0 reach - 0 impact
201 instances - 69 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
622 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10108 instances - 69 features - 2 classes - 2699 missing values
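The relabeling described in the entry above can be sketched roughly as follows. This is a minimal illustration, not OpenML's actual conversion code: the function name `binarize_majority` is hypothetical, and the label `'N'` for the remaining classes is an assumption, because the listing text is cut off before it names that label.

```python
from collections import Counter


def binarize_majority(labels, positive="P", other="N"):
    """Relabel the most frequent class as `positive`, all other classes as `other`.

    `other="N"` is an assumption: the source description is truncated
    before it states the label used for the non-majority classes.
    """
    if not labels:
        return []
    # most_common(1) returns the class with the highest count; on ties,
    # the class seen first in `labels` wins.
    majority, _ = Counter(labels).most_common(1)[0]
    return [positive if y == majority else other for y in labels]


example = ["a", "b", "a", "c", "a", "b"]
binarized = binarize_majority(example)
```

Here `"a"` is the majority class, so only its rows receive the positive label.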
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. They performed a protocol of activities composed of six basic activities: three static postures…
83 runs - 0 likes - 0 downloads - 0 reach - 0 impact
180 instances - 68 features - 6 classes - 0 missing values
Context: This dataset was created by our in-house Web Scraping and Data Mining teams at PromptCloud and DataStock. You can download the full dataset here. This sample contains 30K records. Content: This…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 68 features - classes - 129439 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
86467 instances - 67 features - 0 classes - 2852906 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
83943 instances - 67 features - 0 classes - 2801627 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident.…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
363243 instances - 67 features - 3 classes - 2181757 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
363206 instances - 66 features - 0 classes - 876555 missing values
red wine dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
0 instances - 66 features - classes - 0 missing values
Coronavirus Country Profiles: We built 207 country profiles which allow you to explore the statistics on the coronavirus pandemic for every country in the world. In a fast-evolving pandemic it is not a…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
170646 instances - 66 features - classes - 5082293 missing values
No description available
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
66469 instances - 66 features - 0 classes - 0 missing values
Context: The objective of this dataset is to create a chess engine through machine learning. In this first part we will predict the pieces to be moved depending on the position of the chessboard…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2632753 instances - 66 features - classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38885 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 65 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
36118 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5620 instances - 65 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
794 runs - 0 likes - 9 downloads - 9 reach - 15 impact
2000 instances - 65 features - 2 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143811 runs - 1 likes - 17 downloads - 18 reach - 419 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Shape). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143764 runs - 1 likes - 40 downloads - 41 reach - 417 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Texture). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143332 runs - 2 likes - 67 downloads - 69 reach - 419 impact
1599 instances - 65 features - 100 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5910 instances - 65 features - classes - 4666 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10503 instances - 65 features - classes - 9888 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9792 instances - 65 features - classes - 8776 missing values
The dataset is about bankruptcy prediction of Polish companies. The data was collected from Emerging Markets Information Service (EMIS, [Web Link]), which is a database containing information on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7027 instances - 65 features - classes - 5835 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10173 instances - 65 features - classes - 12157 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
765 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5620 instances - 65 features - 2 classes - 0 missing values
No data.
50 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
No data.
194 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
No data.
52 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8885 instances - 63 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8885 instances - 63 features - 0 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 63 features - 2 classes - 0 missing values
No data.
882 runs - 0 likes - 0 downloads - 0 reach - 0 impact
71 instances - 63 features - 6 classes - 0 missing values
No data.
948 runs - 0 likes - 0 downloads - 0 reach - 0 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
949 runs - 0 likes - 0 downloads - 0 reach - 0 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
996 runs - 0 likes - 0 downloads - 0 reach - 0 impact
74 instances - 63 features - 4 classes - 0 missing values
CD4 count prediction date
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16484 instances - 62 features - classes - 0 missing values
This work was partially supported by national funds through FCT and IST through the UID/EEA/50009/2013 project and the BL89/2017-IST-ID grant. In this dataset, we present usability (SUS), workload…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
31 instances - 62 features - classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
135 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3190 instances - 61 features - 2 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1, 1].
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3175 instances - 61 features - 0 classes - 0 missing values
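As a rough illustration of the "scaled to [-1, 1]" preprocessing named in the entry above, the sketch below applies linear min-max scaling to one feature column. The listing only states the target range, so the per-column min-max method, the function name `scale_to_range`, and the handling of constant columns are all assumptions made for this example.

```python
def scale_to_range(column, lo=-1.0, hi=1.0):
    """Linearly map the values of one feature column onto [lo, hi].

    Assumption: per-column min-max scaling. The source entry gives only
    the target range, not the method used to reach it.
    """
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        # A constant column has no spread; mapping it to the midpoint of
        # the range is an arbitrary choice made for this sketch.
        return [(lo + hi) / 2.0 for _ in column]
    span = cmax - cmin
    return [lo + (x - cmin) * (hi - lo) / span for x in column]


scaled = scale_to_range([2.0, 4.0, 6.0])
```

The smallest value maps to the lower bound, the largest to the upper bound, and everything in between is placed proportionally.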
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
24646 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3190 instances - 61 features - 3 classes - 0 missing values
The problem is to learn a regression equation/rule/tree to predict the activity from the descriptive structural attributes. The data and methodology are described in detail in: - King, Ross D., Hurst,…
5 runs - 0 likes - 0 downloads - 0 reach - 0 impact
186 instances - 61 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
169 runs - 0 likes - 0 downloads - 0 reach - 0 impact
600 instances - 61 features - 2 classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 61 features - 0 classes - 0 missing values
Test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15547 instances - 61 features - 0 classes - 280 missing values
Test dataset
3 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15547 instances - 61 features - 2 classes - 280 missing values
Test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15547 instances - 61 features - 0 classes - 280 missing values
Test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15547 instances - 61 features - 0 classes - 280 missing values
Subsampling of the dataset dionis (41167) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 61 features - 355 classes - 0 missing values
Subsampling of the dataset dionis (41167) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 61 features - 355 classes - 0 missing values
Subsampling of the dataset dionis (41167) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 61 features - 355 classes - 0 missing values
Subsampling of the dataset dionis (41167) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 61 features - 355 classes - 0 missing values
Subsampling of the dataset dionis (41167) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 61 features - 355 classes - 0 missing values
Introduction: The dataset contains all entire-apartment listings located in Milan (N = 9322). This public dataset is part of Airbnb, and the original source can be found on this website. Dataset Creation…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9322 instances - 61 features - classes - 0 missing values
NAME: Sonar, Mines vs. Rocks SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network…
2372 runs - 0 likes - 0 downloads - 0 reach - 0 impact
208 instances - 61 features - 2 classes - 0 missing values
### Description Synthetic Control Chart Time Series. This is actually time series classification. ### Sources ``` * Original Owner and Donor Dr Robert Alcock rob@skyblue.csd.auth.gr ``` ### Dataset…
20509 runs - 0 likes - 0 downloads - 0 reach - 0 impact
600 instances - 61 features - 6 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
366 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8844 instances - 61 features - 7 classes - 51515 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
434 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7019 instances - 61 features - 8 classes - 48089 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
354 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7485 instances - 61 features - 7 classes - 52048 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
806 runs - 0 likes - 0 downloads - 0 reach - 0 impact
186 instances - 61 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
744 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7019 instances - 61 features - 2 classes - 43814 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
0 runs - 0 likes - 2 downloads - 2 reach - 18 impact
416188 instances - 61 features - 355 classes - 0 missing values
No data.
50 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 61 features - 2 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 61 features - 0 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 61 features - 0 classes - 0 missing values
No data.
296 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 61 features - 2 classes - 0 missing values
Context: The Philippine Statistics Authority (PSA) spearheads the conduct of the Family Income and Expenditure Survey (FIES) nationwide. The survey, which is undertaken every three (3) years, is aimed…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
41544 instances - 60 features - classes - 15072 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Version with url set as row id; creator data missing due to bad formatting. **Author**: Kelwin Fernandes (INESC TEC, Universidade do Porto), Pedro Vinagre (ALGORITMI Research Centre, Universidade do…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Context: Includes data on confirmed cases, deaths, hospitalizations, and testing, as well as other variables of potential interest. Content: As of 26 January 2021, the columns are: isocode, continent,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
63381 instances - 59 features - classes - 1508423 missing values
Context: League of Legends is a MOBA (multiplayer online battle arena) where 2 teams (blue and red) face off. There are 3 lanes, a jungle, and 5 roles. The goal is to take down the enemy Nexus to win…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
242572 instances - 59 features - classes - 0 missing values
Context: I am a really huge football fan and the Premier League is one of my favourite football (or soccer, whatever you like to call it) leagues. So, as my very first dataset, I thought this would be…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
571 instances - 59 features - classes - 10224 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
173 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 58 features - 2 classes - 0 missing values
Automated file upload of BNG(spambase)
98 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 58 features - 2 classes - 0 missing values
SPAM E-mail Database The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4601 instances - 58 features - 2 classes - 0 missing values
Subsampling of the dataset porto-seguro (42742) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 58 features - 2 classes - 2775 missing values
Subsampling of the dataset porto-seguro (42742) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 58 features - 2 classes - 2855 missing values
Subsampling of the dataset porto-seguro (42742) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 58 features - 2 classes - 2863 missing values
Subsampling of the dataset porto-seguro (42742) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 58 features - 2 classes - 2837 missing values
Subsampling of the dataset porto-seguro (42742) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 58 features - 2 classes - 2846 missing values
Context: This dataset was generated after an exploratory analysis of data provided by the Carbon Disclosure Project in the following competition: CDP: Unlocking Climate Solutions. It contains a compilation…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
615 instances - 58 features - classes - 9612 missing values
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a…
138 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 58 features - 2 classes - 0 missing values
SPAM E-mail Database The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster…
162017 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4601 instances - 58 features - 2 classes - 0 missing values
Training dataset of the "Porto Seguro's Safe Driver Prediction" Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
595212 instances - 58 features - 2 classes - 846458 missing values
No data.
219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 58 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
754 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8844 instances - 57 features - 2 classes - 34843 missing values
Arbres urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
699 instances - 57 features - 5 classes - 7889 missing values
arbres-urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
699 instances - 57 features - 5 classes - 7889 missing values
Arbres urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 57 features - 1 classes - 22 missing values
Arbres urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
709 instances - 57 features - 6 classes - 8199 missing values
arbres-urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
699 instances - 57 features - 5 classes - 7889 missing values
Arbres urbains
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1 instances - 57 features - 1 classes - 11 missing values
1. Title: Lung Cancer Data 2. Source Information: - Data was published in : Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the…
1238 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32 instances - 57 features - 3 classes - 5 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values
bnlearn Bayesian Network Repository reference: [URL](https://www.bnlearn.com/bnrepository/discrete-large.html#hailfinder) - Number of nodes: 56 - Number of arcs: 66 - Number of parameters: 2656 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 56 features - classes - 0 missing values