OpenML
This dataset describes 100,000 realistic, synthetically generated workers' compensation insurance claims. Along with the ultimate financial losses, each claim is described by the initial case estimate, date…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100000 instances - 14 features - 0 classes - 0 missing values
The data set contains laboratory values of blood donors and Hepatitis C patients and demographic values like age. The target attribute for classification is Category (blood donors vs. Hepatitis C…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
615 instances - 14 features - classes - 31 missing values
Context In the world of Pokémon academia, one name towers above any other: Professor Samuel Oak. While his colleague Professor Elm specializes in Pokémon evolution, Oak has dedicated his career to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
801 instances - 14 features - 0 classes - 138 missing values
Description This is a countrywide weather events dataset that includes 6.3 million events, and covers 49 states of the United States. Examples of weather events are rain, snow, storm, and freezing…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7479165 instances - 14 features - classes - 73797 missing values
Context The cinema industry is not excluded from taking advantage of predictive modeling. As in other industries, it can help cinemas reduce costs and improve ROI. By forecasting sale, screening in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
142524 instances - 14 features - classes - 250 missing values
Content Churn for bank customers RowNumber corresponds to the record (row) number and has no effect on the output. CustomerId contains random values and has no effect on customer leaving the bank.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10000 instances - 14 features - classes - 0 missing values
Context "In the Bechdel Cast, the question asked, do movies have women in them? Are all their discussions involve boyfriends or husbands or do they have individualism? The Patriarchy's vast. Let's…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
194 instances - 14 features - classes - 20 missing values
Problem Statement Welcome to Sigma Cab Private Limited - a cab aggregator service. Their customers can download their app on smartphones and book a cab from anywhere in the cities they operate in.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
131662 instances - 14 features - classes - 137546 missing values
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - classes - 2 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
tbd
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10000 instances - 14 features - 0 classes - 0 missing values
1. Title of Database: Wine recognition data Updated Sept 21, 1998 by C.Blake : Added attribute information 2. Sources: (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration,…
1192 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - 3 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
700 runs - 0 likes - 0 downloads - 0 reach - 0 impact
294 instances - 14 features - 2 classes - 782 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
714 runs - 0 likes - 4 downloads - 4 reach - 15 impact
303 instances - 14 features - 2 classes - 6 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
737 runs - 0 likes - 0 downloads - 0 reach - 0 impact
303 instances - 14 features - 2 classes - 6 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
717 runs - 0 likes - 0 downloads - 0 reach - 0 impact
303 instances - 14 features - 2 classes - 7 missing values
No data.
312 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 3 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 0 classes - 0 missing values
nominal features and target for COMPAS
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5278 instances - 14 features - 2 classes - 0 missing values
Original data from https://github.com/propublica/compas-analysis/ by ProPublica. The data was subsequently preprocessed and reduced to relevant features for classification. The target variable is…
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
5278 instances - 14 features - 2 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
A CSV file with 80,000+ tweets from January 6th, 2021 -- the day of the Capitol Hill riots. Made using the Twitter Developer API + Tweepy. Nowhere close to the size of the Parler data dumps, but…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
82309 instances - 14 features - classes - 392323 missing values
Context It is a well-known fact that Millennials LOVE Avocado Toast. It's also a well-known fact that all Millennials live in their parents' basements. Clearly, they aren't buying homes because they are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - classes - 0 missing values
Context Every year many people migrate from Pakistan to other countries, and many migrate to Pakistan as immigrants or refugees. Pakistan ranks 2nd, according to UNHCR, among the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
264 instances - 14 features - classes - 0 missing values
Context Winter Storm Uri in February 2021 caused havoc across the United States, and in Texas in particular, with mass power outages, water and food shortages, and dangerous weather conditions. This…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
23358 instances - 14 features - classes - 53699 missing values
Context This dataset is originally from the UCI Machine Learning Repository. The objective of the dataset is to diagnostically predict whether a patient has chronic kidney disease or not, based on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 14 features - 0 classes - 0 missing values
Context It is a well-known fact that Millennials LOVE Avocado Toast. It's also a well-known fact that all Millennials live in their parents' basements. Clearly, they aren't buying homes because they are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18249 instances - 14 features - classes - 0 missing values
Context: The leading cause of death in the developed world is heart disease. Therefore, work is needed to help reduce the risk of having a heart attack or stroke. Content: Use this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
This database contains 14 attributes. The goal field refers to the presence of heart disease in the patient. It is integer valued with 0 or 1.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
296 instances - 14 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
cleve-pmlb
32 runs - 0 likes - 0 downloads - 0 reach - 0 impact
303 instances - 14 features - 2 classes - 0 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The Titanic data does not contain information from the crew, but it does contain actual ages of…
0 runs - 3 likes - 45 downloads - 48 reach - 12 impact
1309 instances - 14 features - 2 classes - 3855 missing values
Predicting forest cover ...
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18182 instances - 14 features - 0 classes - 2 missing values
Predicting forest cover ...
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18182 instances - 14 features - 0 classes - 2 missing values
The "Cookbook Reviews" is an extensive data set that includes a range of information about user interactions and recipe reviews. It contains important details like the recipe name, where it stands in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18182 instances - 14 features - 0 classes - 2 missing values
The AI4I 2020 Predictive Maintenance Dataset is a synthetic dataset that reflects real predictive maintenance data encountered in industry. Since real predictive maintenance datasets are generally…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10000 instances - 14 features - classes - 0 missing values
No data.
310 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
326 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 2 classes - 0 missing values
No data.
66 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
66 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
70 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 2 classes - 0 missing values
This analytic dataset contains actual betting behavior virtual live action betting for each participant aggregated over a one-month period from the date the account was opened. Also, it contains live…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
530 instances - 14 features - 2 classes - 0 missing values
The balanced dataset of the doa_bwin dataset (ID:45711) using the SMOTE algorithm in the R-Package themis. This dataset represents a 50:50 split between the classes of the DV variable (354/354).
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
708 instances - 14 features - 2 classes - 0 missing values
Context This dataset is extracted from The Boston Housing Dataset, and the extraction of the data is explained in Extract dataset/dataframe from an URL Acknowledgements A Dataset derived from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
506 instances - 14 features - 0 classes - 0 missing values
This analytic dataset contains actual betting behavior virtual live action betting for each participant aggregated over a one-month period from the date the account was opened. Also, it contains live…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
530 instances - 14 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 15 features - classes - 0 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
748 runs - 0 likes - 0 downloads - 0 reach - 0 impact
340 instances - 15 features - 2 classes - 834 missing values
The AAUP dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on faculty salaries for 1161 American colleges and universities. The data may be obtained…
32 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1161 instances - 15 features - 4 classes - 256 missing values
wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland (Haslett and Raftery 1989). These data were analyzed in detail in the following article:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6574 instances - 15 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6574 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
857 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9961 instances - 15 features - 2 classes - 0 missing values
This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. The data was collected for examining our newly developed classifier for multidimensional curves…
23315 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9961 instances - 15 features - 9 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
102 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
769 runs - 0 likes - 0 downloads - 0 reach - 0 impact
252 instances - 15 features - 2 classes - 0 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
340 instances - 15 features - 2 classes - 834 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
485 runs - 0 likes - 0 downloads - 0 reach - 0 impact
76 instances - 15 features - 7 classes - 37 missing values
1. Title: Assessing the Reliability of a Human Estimator…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
75 instances - 15 features - 0 classes - 0 missing values
See https://github.com/slds-lmu/paper_2023_ci_for_ge for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 15 features - 0 classes - 0 missing values
No data.
288 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 15 features - 9 classes - 0 missing values
No data.
51 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 15 features - 2 classes - 0 missing values
* Title of Database: Spoken Arabic Digit * Abstract: This dataset contains time series of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 males…
1 run - 0 likes - 0 downloads - 0 reach - 0 impact
263256 instances - 15 features - 10 classes - 0 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
166511 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14980 instances - 15 features - 2 classes - 0 missing values
Subsampling of the dataset Australian (40981) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 15 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48842 instances - 15 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
214 instances - 15 features - classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
214 instances - 15 features - classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 15 features - classes - 0 missing values
The data comprises 5 datasets: QCM3, QCM6, QCM7, QCM10, and QCM12. Each dataset classifies five types of alcohol: 1-octanol, 1-propanol, 2-butanol, 2-propanol, and 1-isobutanol. In…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
125 instances - 15 features - classes - 0 missing values
Context NBA 2k20 analysis. Content Detailed attributes for players registered in the NBA2k20. Acknowledgements Data scraped from https://hoopshype.com/nba2k/. Additional data about countries and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
439 instances - 15 features - classes - 92 missing values
Chess A lot has changed in chess over the years with computer engines and AI algorithms. But how has human performance changed along with the rise in technology? WGM Becoming a Woman Grandmaster (WGM)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
304767 instances - 15 features - classes - 3473 missing values
Context 5 countries (the five major soccer leagues). 44269 games. 25 seasons. 226 teams. Content All game scores of the big five European soccer leagues (England, Germany, Spain, Italy and France) for…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
44269 instances - 15 features - classes - 0 missing values
Subsampling of the dataset Australian (40981) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 15 features - 2 classes - 0 missing values
Subsampling of the dataset Australian (40981) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 15 features - 2 classes - 0 missing values
Subsampling of the dataset Australian (40981) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 15 features - 2 classes - 0 missing values
Subsampling of the dataset Australian (40981) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 15 features - 2 classes - 0 missing values
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32561 instances - 15 features - classes - 4262 missing values
The Garment Industry is one of the key examples of the industrial globalization of this modern era. It is a highly labour-intensive industry with lots of manual processes. Satisfying the huge global…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1197 instances - 15 features - classes - 506 missing values
18ProductivityPrediction
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1197 instances - 15 features - 0 classes - 506 missing values
18ProductivityPrediction
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1197 instances - 15 features - classes - 506 missing values
Hello, my name is Ben Roshan D, and I am doing an MBA in Business Analytics at Jain University, Bangalore. We have practical sessions in Python and R as subjects. Faculty provide us with such data sets to work on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
215 instances - 15 features - classes - 67 missing values
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 15 features - 2 classes - 0 missing values
What is it? This dataset is a record of 3.5 Million+ US Domestic Flights from 1990 to 2009. It has been taken from the OpenFlights website, which has a huge database of different travelling mediums…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3606803 instances - 15 features - classes - 27522 missing values
Predict whether income exceeds $50K/yr based on census data. Also known as the Census Income dataset. Train and test sets combined. Null values, represented with a question mark, are replaced with na. 52…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48790 instances - 15 features - 2 classes - 6456 missing values
A certain premium club boasts a large customer membership. The members pay an annual membership fee in return for using the exclusive facilities offered by this club. The fees are customized for every…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10362 instances - 15 features - 2 classes - 12224 missing values
A certain premium club boasts a large customer membership. The members pay an annual membership fee in return for using the exclusive facilities offered by this club. The fees are customized for every…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10362 instances - 15 features - 2 classes - 12224 missing values
This data set measures the running time of a matrix-matrix product A\*B = C, where all matrices have size 2048 x 2048, using a parameterizable SGEMM GPU kernel with 241600 possible parameter…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
241600 instances - 15 features - 0 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 15 features - 0 classes - 0 missing values
analysis of stocks
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
245 instances - 15 features - classes - 0 missing values
Experiment data obtained by running random configurations of an SVM through mlr on 106 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
540576 instances - 15 features - classes - 658962 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5472 instances - 15 features - classes - 0 missing values
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using…
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
32561 instances - 15 features - classes - 4262 missing values
Context There's a story behind every dataset and here's your opportunity to share yours. Content What's inside is more than just rows and columns. Make it easy for others to get started by describing…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14640 instances - 15 features - classes - 61973 missing values
Description The data set is provided as a CSV file with the following resources that can be used as inputs for model building: a collection of website URLs for 11001 websites. Each sample has 15…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11000 instances - 15 features - classes - 0 missing values
Context This data set contains measles vaccination rate data for 46,412 schools in 32 states across the US. Content Vaccination rates are for the 2017-18 school year for the following states:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
66113 instances - 15 features - classes - 322786 missing values