OpenML
Subsampling of the dataset eucalyptus (188) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
736 instances - 20 features - 5 classes - 448 missing values
Subsampling of the dataset eucalyptus (188) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
736 instances - 20 features - 5 classes - 448 missing values
Seismic bumps dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2584 instances - 19 features - 2 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12330 instances - 18 features - classes - 0 missing values
17OnlineShoppersPurchasingIntention
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12330 instances - 18 features - classes - 0 missing values
()[]
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2845342 instances - 46 features - classes - 3414349 missing values
Description This is a countrywide car accident dataset, which covers 49 states of the USA. The accident data are collected from February 2016 to Dec 2020, using two APIs that provide streaming traffic…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2845342 instances - 46 features - classes - 3414349 missing values
Context Detailed weather data (2015-2020) for Ulaanbaatar, the capital city of Mongolia. Content The data include timestamps (UTC) and regularly recorded weather-related features,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
49184 instances - 20 features - classes - 138 missing values
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - classes - 0 missing values
https://www.kaggle.com/dansbecker/nba-shot-logs
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
128069 instances - 21 features - classes - 5567 missing values
Context This data has been extracted from the billing systems of 8 Municipalities in South Africa over a 2 year period and summarised according to their total amount billed versus the total amount…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
138509 instances - 16 features - classes - 0 missing values
The database was created with records of absenteeism at work from July 2007 to July 2010 at a courier company in Brazil. The data set allows for several new combinations of attributes and attribute…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
740 instances - 21 features - classes - 0 missing values
Dataset is from http://tomslee.net/airbnb-data-collection-get-the-data room_id: A unique number identifying an Airbnb listing. The listing has a URL on the Airbnb web site of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13578 instances - 20 features - classes - 54347 missing values
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - classes - 2 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
Paulo Cortez, University of Minho, Guimaraes, Portugal, http://www3.dsi.uminho.pt/pcortez The dataset was obtained from the UCI Repository. These data address student achievement in secondary…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
649 instances - 31 features - 0 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs - 0 likes - 0 downloads - 0 reach - 0 impact
110393 instances - 55 features - 7 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
718 runs - 0 likes - 0 downloads - 0 reach - 0 impact
159 instances - 16 features - 2 classes - 0 missing values
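The mean-based binarization described in this entry can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the description is truncated, so the direction of the split (below the mean versus at or above it), the function name `binarize_by_mean`, and the label names are assumptions.

```python
from statistics import mean


def binarize_by_mean(targets, low_label="P", high_label="N"):
    """Map each numeric target to one of two nominal labels around the mean.

    Assumption: values strictly below the mean get low_label and all
    other values get high_label; both label names are hypothetical.
    """
    threshold = mean(targets)
    return [low_label if t < threshold else high_label for t in targets]


# Toy input: the mean is 4.0, so only 10.0 falls at or above the threshold.
labels = binarize_by_mean([1.0, 2.0, 3.0, 10.0])
```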
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
700 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67 instances - 16 features - 2 classes - 0 missing values
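The majority-class relabeling described in this entry might look like the sketch below. The source text is cut off after the positive label, so treating every other class as a single negative label (here 'N') is an assumption, as is the function name `binarize_by_majority`.

```python
from collections import Counter


def binarize_by_majority(targets, positive="P", negative="N"):
    """Relabel the most frequent class as positive and everything else as negative.

    Assumption: all non-majority classes collapse into one negative label.
    """
    # most_common(1) returns a one-element list holding (class, count).
    majority, _count = Counter(targets).most_common(1)[0]
    return [positive if t == majority else negative for t in targets]


# Toy input: class "a" occurs three times and is therefore the majority.
labels = binarize_by_majority(["a", "b", "a", "c", "a"])
```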
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
701 runs - 0 likes - 0 downloads - 0 reach - 0 impact
736 instances - 20 features - 2 classes - 448 missing values
No data.
117 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 20 features - 5 classes - 0 missing values
a test description
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
992 instances - 16 features - 0 classes - 0 missing values
Context About the Rio-Niterói Bridge, according to Wikipedia: Presidente Costa e Silva Bridge, popularly known as Rio-Niterói Bridge, is a bridge that crosses Guanabara Bay, in the state of Rio de…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9367 instances - 31 features - classes - 44380 missing values
Context This dataset is originally from UCI Machine Learning Repository. The objective of the dataset is to diagnostically predict whether a patient is having chronic kidney disease or not, based on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 14 features - 0 classes - 0 missing values
Acknowledgements This data was scraped from http://house.speakingsame.com/ and includes data from 322 Perth suburbs, resulting in an average of about 100 rows per suburb. Content I believe the columns…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
33656 instances - 18 features - classes - 16585 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13932 instances - 14 features - 0 classes - 0 missing values
These data address student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and it was…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
649 instances - 31 features - 0 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
27620 runs - 0 likes - 12 downloads - 12 reach - 11 impact
736 instances - 20 features - 5 classes - 448 missing values
Context This dataset is extracted from the Boston Housing Dataset, and the extraction of the data is explained in Extract dataset/dataframe from an URL Acknowledgements A Dataset derived from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
506 instances - 14 features - 0 classes - 0 missing values
Description: This dataset "Taxi_Trips_-_2024_20240408.csv" contains information on taxi trips in Chicago starting from February 2024. The dataset includes trip ID, taxi ID, trip start and end…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
865247 instances - 23 features - classes - 1514769 missing values
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs - 0 likes - 3 downloads - 3 reach - 14 impact
72983 instances - 33 features - 2 classes - 149271 missing values
The AAUP dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on faculty salaries for 1161 American colleges and universities. The data may be obtained…
32 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1161 instances - 15 features - 4 classes - 256 missing values
Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1945 instances - 19 features - 0 classes - 1133 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
5 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8192 instances - 13 features - 0 classes - 0 missing values
This file is a text file giving details about the time series analysed in 'The Analysis of Time Series' by Chris Chatfield. The 5th edn was published in 1996 and the 6th edn in 2003. The series are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
235 instances - 13 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
780 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - 2 classes - 0 missing values
This database contains 13 attributes (which have been extracted from a larger set of 75) Attribute Information: ------------------------ -- 1. age -- 2. sex -- 3. chest pain type (4 values) -- 4.…
3215 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
101 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1161 instances - 16 features - 2 classes - 256 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 13 features - 0 classes - 0 missing values
* Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779 * Data Set Information: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In…
159 runs - 0 likes - 0 downloads - 0 reach - 0 impact
200 instances - 14 features - 5 classes - 0 missing values
* Dataset: This is a reprocessed version of heart-h (hungarian), the heart disease reprocessed hungarian dataset from UCI.
138 runs - 0 likes - 0 downloads - 0 reach - 0 impact
294 instances - 14 features - 5 classes - 0 missing values
No data.
253 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1076790 instances - 30 features - 2 classes - 7275 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
299285 instances - 42 features - classes - 0 missing values
Abstract: This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178526 instances - 13 features - classes - 57200 missing values
This version has feature names based on https://www2.1010data.com/documentationcenter/beta/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html Missing data is also properly encoded in this…
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
199523 instances - 42 features - 2 classes - 415717 missing values
Conventional and Social Media Movies (CSM) - Dataset 2014 and 2015 Data Set 12 features categorized as conventional and social media features. Both conventional features, collected from movies…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
232 instances - 14 features - classes - 60 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 16 features - classes - 0 missing values
Context The subject of this dataset is multi-instrument observations of solar flares. There are a number of space-based instruments that are able to observe solar flares on the Sun; some instruments…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12455 instances - 17 features - classes - 0 missing values
Context 'Learning the production cross-sections of the Inert Doublet Model' Cite as Humberto Reyes-González, Andre Lessa, Sydney Otten. (2020). 'Learning the production cross sections of the Inert…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
50625 instances - 13 features - classes - 0 missing values
The analysis is performed for different sets of input values using a methodology similar to that described in [Schafer, Benjamin, et al. 'Taming instabilities in power grid networks by decentralized…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10000 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
299 instances - 13 features - classes - 0 missing values
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database: P. Cortez, A.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6497 instances - 14 features - classes - 0 missing values
Context Melbourne real estate is BOOMING. Can you find the insight or predict the next big trend to become a real estate mogul or even harder, to snap up a reasonably priced 2-bedroom unit? Content…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13580 instances - 21 features - classes - 13256 missing values
Context Microstrip Antennas are low-profile antennas applied in high-performance aircraft, spacecraft, satellite and missile applications, where size, weight, performance, ease of installation and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
572 instances - 13 features - classes - 63 missing values
Context This is home value data for the hot Nashville market. Content There are 56,000+ rows altogether. However, I'm missing home detail data for about half. So if anyone wants to track that down…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
56636 instances - 31 features - classes - 648773 missing values
Context I wanted a highly imbalanced dataset to share with others. It has the perfect one for us. Imbalanced data typically refers to a classification problem where the number of observations per…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9578 instances - 14 features - classes - 0 missing values
hmeq_p,BAD,binary
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5960 instances - 15 features - classes - 5271 missing values
Conventional and Social Media Movies (CSM) - Dataset 2014 and 2015 Data Set 12 features categorized as conventional and social media features. Both conventional features, collected from movies…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
231 instances - 13 features - classes - 46 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017. For this version, the task was downsampled to 0.5 percent. Some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
27327 instances - 18 features - 0 classes - 657 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 16 features - classes - 0 missing values
Context Projects are a great way to learn data science. So I started my own. The numerous housing data sets on Kaggle were the inspiration for this data set. Predicting housing prices is a simple yet…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10552 instances - 26 features - classes - 49282 missing values
Context Projects are a great way to learn data science. So I started my own. The numerous housing data sets on Kaggle were the inspiration for this data set. Predicting housing prices is a simple yet insightful regression problem. Understanding data takes time, and the more people analyze it, the faster the secrets can be uncovered. I acquired the data by scraping ImmoScout24, a marketplace for German real estate…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10552 instances - 26 features - classes - 49282 missing values
Context Craigslist is the world's largest collection of privately sold housing options, yet it's very difficult to collect all of them in the same place. I built this dataset as a means by which to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
384977 instances - 22 features - classes - 223551 missing values
Context The cinema industry is not excluded from the advantages of predictive modeling. As in other industries, it can help cinemas reduce costs and improve ROI. By forecasting sales, screening in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
142524 instances - 14 features - classes - 250 missing values
Context In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1232 instances - 15 features - classes - 3600 missing values
tbd
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
56 instances - 13 features - 0 classes - 0 missing values
1. Title of Database: Wine recognition data Updated Sept 21, 1998 by C.Blake : Added attribute information 2. Sources: (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration,…
1192 runs - 0 likes - 0 downloads - 0 reach - 0 impact
178 instances - 14 features - 3 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
12 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8192 instances - 13 features - 0 classes - 0 missing values
No data.
312 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 3 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
Context: The leading cause of death in the developed world is heart disease. Therefore there needs to be work done to help prevent the risks of having a heart attack or stroke. Content: Use this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
Context This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 13 features - classes - 0 missing values
Subsampling of the dataset Diabetes130US (4541) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 50 features - 3 classes - 3790 missing values
Subsampling of the dataset Diabetes130US (4541) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 50 features - 3 classes - 3820 missing values
Subsampling of the dataset Diabetes130US (4541) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 50 features - 3 classes - 3814 missing values
Subsampling of the dataset Diabetes130US (4541) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 50 features - 3 classes - 3850 missing values
Subsampling of the dataset Diabetes130US (4541) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 50 features - 3 classes - 3764 missing values
No data.
326 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 2 classes - 0 missing values
The local stability analysis of the 4-node star system (with the electricity producer in the center) implementing the Decentral Smart Grid Control concept was performed. This dataset contains simulations…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10000 instances - 13 features - 0 classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs - 2 likes - 18 downloads - 20 reach - 17 impact
101766 instances - 50 features - 3 classes - 0 missing values
Description: This dataset, named "heart_failure_clinical_records.csv", consists of clinical records of patients with heart failure. It includes various attributes such as age, anaemia, creatinine…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 13 features - classes - 0 missing values
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch,…
9 runs - 0 likes - 0 downloads - 0 reach - 0 impact
506 instances - 14 features - 0 classes - 0 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
748 runs - 0 likes - 0 downloads - 0 reach - 0 impact
340 instances - 15 features - 2 classes - 834 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
163 instances - 27 features - 5 classes - 9 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
66 instances - 12 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
734 runs - 0 likes - 0 downloads - 0 reach - 0 impact
506 instances - 14 features - 2 classes - 0 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
340 instances - 15 features - 2 classes - 834 missing values
1. Title: Wine Quality 2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009 3. Past Usage: P. Cortez, A. Cerdeira, F.…
3 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6497 instances - 12 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
810 runs - 0 likes - 0 downloads - 0 reach - 0 impact
235 instances - 13 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
531441 instances - 12 features - 0 classes - 0 missing values
* Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779 * Data Set Information: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In…
170 runs - 0 likes - 0 downloads - 0 reach - 0 impact
123 instances - 13 features - 5 classes - 0 missing values
* Title: Planning Relax Data Set * Abstract: The dataset concerns the classification of two mental stages from recorded EEG signals: Planning (during imagination of motor act) and Relax state. *…
141 runs - 0 likes - 0 downloads - 0 reach - 0 impact
182 instances - 13 features - 2 classes - 0 missing values
calendarDOW-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
399 instances - 33 features - 5 classes - 0 missing values
Context The data set contains laboratory values of blood donors and Hepatitis C patients and demographic values like age. The data was obtained from UCI Machine Learning Repository:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
615 instances - 14 features - classes - 31 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6497 instances - 12 features - 0 classes - 0 missing values