OpenML
At Santander our mission is to help people and businesses prosper. We are always looking for ways to help our customers understand their financial health and identify which products and services might…
0 runs0 likes0 downloads0 reach0 impact
200000 instances - 202 features - 2 classes - 0 missing values
kaggle_santander_p
0 runs0 likes0 downloads0 reach0 impact
200000 instances - 203 features - classes - 0 missing values
At Santander our mission is to help people and businesses prosper. We are always looking for ways to help our customers understand their financial health and identify which products and services might…
0 runs0 likes0 downloads0 reach0 impact
200000 instances - 202 features - 2 classes - 0 missing values
This version has feature names based on https://www2.1010data.com/documentationcenter/beta/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html Missing data is also properly encoded in this…
0 runs0 likes1 downloads1 reach0 impact
199523 instances - 42 features - 2 classes - 415717 missing values
Content This is a dataset I started building for my future personal projects, as I think this kind of data is quite hard to acquire for free and in short time. I started acquiring data on March 21st,…
0 runs0 likes0 downloads0 reach0 impact
193279 instances - 4 features - classes - 29954 missing values
libSVM, AAD group. IJCNN 2001 neural network competition. Slide presentation in IJCNN'01, Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf. Dataset from the LIBSVM data…
0 runs0 likes0 downloads0 reach0 impact
191681 instances - 23 features - 0 classes - 0 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs0 likes0 downloads0 reach0 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
Content This database contains six basic emotions (happiness, surprise, anger, fear, disgust, and sadness) of normalized (average mean reference) data and collected from 85 undergraduate university…
0 runs0 likes0 downloads0 reach0 impact
190967 instances - 11 features - classes - 0 missing values
No description available
0 runs0 likes0 downloads0 reach0 impact
190776 instances - 20 features - 0 classes - 22484 missing values
The Medicare Inpatient Hospitals by Provider and Service dataset provides information on inpatient discharges for Original Medicare Part A beneficiaries by IPPS hospitals. It includes information on…
0 runs0 likes0 downloads0 reach0 impact
188806 instances - 13 features - 0 classes - 0 missing values
When you've been devastated by a serious car accident, your focus is on the things that matter the most: family, friends, and other loved ones. Pushing paper with your insurance agent is the last…
0 runs0 likes0 downloads0 reach0 impact
188318 instances - 131 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on both numerical and categorical…
0 runs0 likes0 downloads0 reach0 impact
188318 instances - 125 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
188318 instances - 125 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
188318 instances - 125 features - 0 classes - 0 missing values
The dataset contains all the statistics for each player from 2008 to 2016.
0 runs0 likes0 downloads0 reach0 impact
183978 instances - 42 features - classes - 47301 missing values
Abstract: This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.…
0 runs0 likes0 downloads0 reach0 impact
178526 instances - 13 features - classes - 57200 missing values
ARFF Training Data
0 runs0 likes0 downloads0 reach0 impact
177640 instances - 40 features - classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
177147 instances - 11 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
177147 instances - 11 features - 0 classes - 0 missing values
Coronavirus Country Profiles We built 207 country profiles which allow you to explore the statistics on the coronavirus pandemic for every country in the world. In a fast-evolving pandemic it is not a…
0 runs0 likes0 downloads0 reach0 impact
170646 instances - 66 features - classes - 5082293 missing values
## **Meta-Album Insects Dataset (Extended)** The original Insects dataset is created by the National Museum of Natural History, Paris (https://www.mnhn.fr/fr). It has more than 290 000 images in…
0 runs0 likes0 downloads0 reach1 impact
170506 instances - 3 features - 117 classes - 0 missing values
Customer purchases on Black Friday
0 runs0 likes0 downloads0 reach0 impact
166821 instances - 10 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
166821 instances - 10 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
166821 instances - 10 features - 0 classes - 0 missing values
Dataset Title: Localization Data for Person Activity Data Set Abstract: Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing…
6 runs0 likes0 downloads0 reach0 impact
164860 instances - 8 features - 11 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 12 features - 0 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 12 features - 0 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 12 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 4 features - 0 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 12 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 4 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs0 likes0 downloads0 reach0 impact
163065 instances - 4 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on both numerical and categorical…
1 runs0 likes0 downloads0 reach0 impact
163065 instances - 4 features - 0 classes - 0 missing values
Context Based on 250K lyrics database. Created to perform Supervised NLP sentiment analysis task using Spotify valence audio feature, a measure of the positiveness of the song. Content Preparation of…
0 runs0 likes0 downloads0 reach0 impact
158353 instances - 5 features - classes - 2 missing values
wine
0 runs0 likes0 downloads0 reach0 impact
150930 instances - 10 features - classes - 174477 missing values
Wine data gathered by https://www.kaggle.com/zynicide. The data was scraped from WineEnthusiast during the week of June 15th, 2017. The code for the scraper can be found at…
0 runs0 likes0 downloads0 reach0 impact
150930 instances - 10 features - classes - 174477 missing values
tesl dataset about L
0 runs0 likes0 downloads0 reach0 impact
150000 instances - 8 features - classes - 0 missing values
Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years. ## Description Banks play a crucial role in…
0 runs0 likes0 downloads0 reach0 impact
150000 instances - 11 features - 2 classes - 33655 missing values
Even smaller sample of version 1
0 runs0 likes0 downloads0 reach0 impact
149639 instances - 12 features - 2 classes - 0 missing values
The dataset collects data from an Android smartphone positioned in the chest pocket. Accelerometer Data are collected from 22 participants walking in the wild over a predefined path. The dataset is…
80 runs0 likes0 downloads0 reach0 impact
149332 instances - 5 features - 22 classes - 0 missing values
Context Bus Breakdown and Delays You can find the road where the traffic was heavy for the New York City Taxi Trip Duration playground. Content The Bus Breakdown and Delay system collects information…
0 runs0 likes0 downloads0 reach0 impact
147972 instances - 20 features - classes - 170487 missing values
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia. This dataset combines raw counts for…
0 runs0 likes0 downloads0 reach0 impact
147269 instances - 4 features - classes - 0 missing values
Context The cinema industry can also benefit from predictive modeling. As in other industries, it can help cinemas reduce costs and improve ROI. By forecasting sale, screening in…
0 runs0 likes0 downloads0 reach0 impact
142524 instances - 14 features - classes - 250 missing values
Context This data has been extracted from the billing systems of 8 Municipalities in South Africa over a 2 year period and summarised according to their total amount billed versus the total amount…
0 runs0 likes0 downloads0 reach0 impact
138509 instances - 16 features - classes - 0 missing values
## **Meta-Album Boats Dataset (Extended)** The original version of the Meta-Album boats dataset is called MARVEL dataset (https://github.com/avaapm/marveldataset2016). It has more than 138 000 images…
0 runs0 likes0 downloads0 reach1 impact
138367 instances - 3 features - 26 classes - 138367 missing values
No data.
90 runs0 likes0 downloads0 reach0 impact
137781 instances - 10 features - 7 classes - 0 missing values
No data.
75 runs0 likes0 downloads0 reach0 impact
137781 instances - 10 features - 7 classes - 0 missing values
Testing dataset
0 runs0 likes0 downloads0 reach0 impact
134731 instances - 31 features - 2 classes - 0 missing values
bases-de-donnees-annuelles-des-accidents-corporels-de-la-circulation-routiere-annees-de-2005-a-2019
0 runs0 likes0 downloads0 reach0 impact
132977 instances - 55 features - 0 classes - 550521 missing values
Problem Statement Welcome to Sigma Cab Private Limited, a cab aggregator service. Their customers can download their app on smartphones and book a cab from anywhere in the cities they operate in.…
0 runs0 likes0 downloads0 reach0 impact
131662 instances - 14 features - classes - 137546 missing values
EMNIST Balanced https://www.nist.gov/itl/iad/image-group/emnist-dataset
73 runs0 likes0 downloads0 reach0 impact
131600 instances - 785 features - 47 classes - 0 missing values
No data.
356 runs0 likes0 downloads0 reach0 impact
131072 instances - 17 features - 2 classes - 0 missing values
Byron Roe (byronroe '@' umich.edu) Department of Physics University of Michigan Ann Arbor, MI 48109 This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…
12 runs0 likes4 downloads4 reach14 impact
130064 instances - 51 features - 2 classes - 0 missing values
Context Thinking of Natural Language Processing as a beginner! The dataset is about the wine comments or reviews given by various wine tasters. The concept was to use text…
0 runs0 likes0 downloads0 reach0 impact
129971 instances - 14 features - classes - 204754 missing values
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…
0 runs0 likes1 downloads1 reach9 impact
129971 instances - 13 features - 0 classes - 204752 missing values
Do two sentences come from the same article? We randomly sampled sentences from across Wikipedia. Some sentences came from the same articles, others did not. Sentences from the Same Article These two…
0 runs0 likes0 downloads0 reach0 impact
129156 instances - 3 features - classes - 0 missing values
Context Explore an environmental conditions dataframe scraped from CIMIS weather stations using a selenium chromedriver. With California's wildfires setting records in 2020, it is worthwhile to…
0 runs0 likes0 downloads0 reach0 impact
128125 instances - 19 features - 0 classes - 138 missing values
https://www.kaggle.com/dansbecker/nba-shot-logs
0 runs0 likes0 downloads0 reach0 impact
128069 instances - 21 features - classes - 5567 missing values
This is sensor data for testing; it is not complete.
0 runs0 likes0 downloads0 reach0 impact
127591 instances - 27 features - classes - 0 missing values
Asteroid Dataset
0 runs0 likes0 downloads0 reach0 impact
126131 instances - 34 features - 2 classes - 99 missing values
Asteroid Dataset
0 runs0 likes0 downloads0 reach0 impact
126131 instances - 34 features - 2 classes - 99 missing values
Context CS:GO is a tactical shooter, where two teams (CT and Terrorist) play for a best of 30 rounds, with each round being 1 minute and 55 seconds. There are 5 players on each team (10 in total) and…
0 runs0 likes0 downloads0 reach0 impact
122410 instances - 97 features - classes - 0 missing values
No data.
353 runs0 likes0 downloads0 reach0 impact
120919 instances - 1002 features - 2 classes - 0 missing values
Nell HMC dataset for type prediction with ingoing/outgoing properties as features
0 runs0 likes0 downloads0 reach0 impact
120720 instances - 769 features - classes - 0 missing values
## **Meta-Album PlantNet Dataset (Extended)** Meta-Album PlantNet dataset is created by sampling the Pl@ntNet-300k dataset (https://openreview.net/forum?id=eLYinD0TtIt), itself a sampling of the…
0 runs0 likes0 downloads0 reach1 impact
120688 instances - 3 features - 25 classes - 120688 missing values
Product listing data submitted to the U.S. FDA for all unfinished, unapproved drugs.
0 runs0 likes0 downloads0 reach0 impact
120215 instances - 20 features - 7 classes - 443305 missing values
Personal Loan product is an unsecured loan therefore it is vital to assess the risk of the customers by checking their credit worthiness. This must be done to prevent loan defaults. The objective is…
0 runs0 likes0 downloads0 reach0 impact
119528 instances - 32 features - classes - 987539 missing values
Context Buying a diamond can be frustrating and expensive. It inspired me to create this dataset of 119K natural and lab-created diamonds from brilliantearth.com to demystify the value of the 4 Cs…
0 runs0 likes0 downloads0 reach0 impact
119307 instances - 11 features - classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
116640 instances - 10 features - 0 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the make and model information of vehicles involved in the respective…
0 runs0 likes0 downloads0 reach0 impact
111762 instances - 33 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on both numerical and categorical…
0 runs0 likes0 downloads0 reach0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features"…
0 runs0 likes0 downloads0 reach0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Experiment data obtained by running random configurations of the hnsw kNN through mlr on 116 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
111753 instances - 13 features - classes - 0 missing values
We introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese,…
0 runs0 likes0 downloads0 reach0 impact
111720 instances - 4 features - 3 classes - 0 missing values
Context A person makes a doctor's appointment, receives all the instructions, and does not show up. Who is to blame? If this helps you in studying or working, please don't forget to upvote :). Reference to Joni Hoppen…
0 runs0 likes0 downloads0 reach0 impact
110527 instances - 13 features - 2 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs0 likes0 downloads0 reach0 impact
110393 instances - 55 features - 7 classes - 0 missing values
Context I am currently building a short term (one-day ahead) electric load forecasting model for Goa. A good chunk of it is domestic household load. Temperature and Humidity can be used to estimate…
0 runs0 likes0 downloads0 reach0 impact
108096 instances - 9 features - classes - 0 missing values
Dataset from the LIBSVM multiclass data repository.
0 runs0 likes0 downloads0 reach0 impact
108000 instances - 129 features - 0 classes - 0 missing values
Multiclass from binary: Expanding one-vs-all, one-vs-one and ECOC-based approaches. Dataset taken from LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html In this dataset…
0 runs0 likes0 downloads0 reach8 impact
108000 instances - 129 features - 1000 classes - 0 missing values
Experiment data obtained by running random configurations of glmnet through mlr on 114 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
104820 instances - 10 features - classes - 0 missing values
Dota 2 is a popular computer game with two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses. Source: stephen.tridgell '@'…
0 runs0 likes0 downloads0 reach0 impact
102944 instances - 117 features - 2 classes - 0 missing values
Context Getting access to high-quality historical stock market data can be very expensive and/or complicated; parsing SEC 10-Q filings direct from the SEC EDGAR is difficult due to the varying…
0 runs1 likes0 downloads1 reach0 impact
101787 instances - 45 features - classes - 2857964 missing values
The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information…
0 runs0 likes0 downloads0 reach0 impact
101766 instances - 48 features - 3 classes - 192849 missing values
The "Diabetes 130-Hospitals" dataset represents 10 years of clinical care at 130 U.S. hospitals and delivery networks, collected from 1999 to 2008. Each record represents the hospital admission record…
0 runs0 likes0 downloads0 reach0 impact
101766 instances - 22 features - 2 classes - 0 missing values
The "Diabetes 130-Hospitals" dataset represents 10 years of clinical care at 130 U.S. hospitals and delivery networks, collected from 1999 to 2008. Each record represents the hospital admission record…
0 runs0 likes0 downloads0 reach0 impact
101766 instances - 25 features - 0 classes - 0 missing values
uci
0 runs0 likes0 downloads0 reach0 impact
101766 instances - 52 features - classes - 192849 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs2 likes18 downloads20 reach17 impact
101766 instances - 50 features - 3 classes - 0 missing values
Re-upload of the dataset as it is present in the Penn ML Benchmark (https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/fars). It's a dataset on traffic accidents,…
1 runs0 likes4 downloads4 reach23 impact
100968 instances - 30 features - 8 classes - 0 missing values
This dataset is for classification tasks, and has both continuous and categorical variables.
0 runs0 likes0 downloads0 reach0 impact
100959 instances - 30 features - 0 classes - 0 missing values
Context This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle…
0 runs0 likes0 downloads0 reach0 impact
100000 instances - 28 features - classes - 1985 missing values
This dataset describes 100,000 realistic, synthetically generated worker compensation insurance claims. Along with the ultimate financial losses, each claim is described by the initial case estimate, date…
0 runs0 likes0 downloads0 reach0 impact
100000 instances - 14 features - 0 classes - 0 missing values
Feedback: Mukharbek Organokov organokov.mgmail.com Context Sloan Digital Sky Survey current DR16 Server Data release with Galaxies, Stars and Quasars. License: Creative Commons Attribution license…
0 runs0 likes0 downloads0 reach0 impact
100000 instances - 18 features - classes - 0 missing values
person credit-related information
0 runs0 likes0 downloads0 reach0 impact
100000 instances - 28 features - 3 classes - 62162 missing values
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…
52 runs0 likes0 downloads0 reach0 impact
99289 instances - 3073 features - 10 classes - 0 missing values
Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826-838, July 2004. This is the SensIT Vehicle (combined) dataset, retrieved 2013-11-14 from…
403 runs0 likes0 downloads0 reach0 impact
98528 instances - 101 features - 2 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Regenerate features by the authors' MATLAB scripts (see Sec. C of Appendix A), then randomly select 10% instances from the…
0 runs0 likes0 downloads0 reach0 impact
98528 instances - 101 features - 0 classes - 0 missing values
Normalized version of vehicle dataset (http://www.openml.org/d/54) NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs0 likes0 downloads0 reach0 impact
98528 instances - 101 features - 2 classes - 0 missing values
### Attribute Information * The first column is the class label (1 for signal, 0 for background) * 21 low-level features (kinematic properties): lepton pT, lepton eta, lepton phi, missing energy…
14393 runs0 likes0 downloads0 reach0 impact
98050 instances - 29 features - 2 classes - 9 missing values
Data Set Information: The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The…
0 runs0 likes0 downloads0 reach0 impact
98050 instances - 29 features - 0 classes - 9 missing values