data
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
145460 instances - 23 features - classes - 343248 missing values
weather data
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
145460 instances - 23 features - classes - 343248 missing values
Context: The cinema industry can also benefit from predictive modeling. As in other industries, it can help cinemas reduce costs and achieve a better ROI. By forecasting sales, screening in…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
142524 instances - 14 features - classes - 250 missing values
Context: This data was extracted from the billing systems of 8 municipalities in South Africa over a 2-year period and summarised according to the total amount billed versus the total amount…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
138509 instances - 16 features - classes - 0 missing values
Meta-Album Boats Dataset (Extended). The original version of the Meta-Album boats dataset is the MARVEL dataset (https://github.com/avaapm/marveldataset2016). It has more than 138,000 images…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
138367 instances - 3 features - 26 classes - 138367 missing values
No data.
90 runs, 0 likes, 0 downloads, 0 reach, 0 impact
137781 instances - 10 features - 7 classes - 0 missing values
No data.
75 runs, 0 likes, 0 downloads, 0 reach, 0 impact
137781 instances - 10 features - 7 classes - 0 missing values
Testing dataset
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
134731 instances - 31 features - 2 classes - 0 missing values
Description: The dataset, named "clean_tweet_Dec19ToDec20.csv," comprises a collection of tweets post-processed for clarity and analysis, spanning from December 2019 to December 2020. It is designed…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
134348 instances - 3 features - classes - 18 missing values
bases-de-donnees-annuelles-des-accidents-corporels-de-la-circulation-routiere-annees-de-2005-a-2019 (annual databases of road-traffic personal-injury accidents, 2005 to 2019)
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
132977 instances - 55 features - 0 classes - 550521 missing values
Pedestrian Counting System data published by the City of Melbourne, with some minor preprocessing. From the original website: This dataset contains hourly pedestrian counts since 2009 from…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
132209 instances - 110 features - classes - 8604673 missing values
Problem Statement: Welcome to Sigma Cab Private Limited, a cab aggregator service. Its customers can download the app on their smartphones and book a cab from anywhere in the cities it operates in.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
131662 instances - 14 features - classes - 137546 missing values
EMNIST Balanced https://www.nist.gov/itl/iad/image-group/emnist-dataset
73 runs, 0 likes, 0 downloads, 0 reach, 0 impact
131600 instances - 785 features - 47 classes - 0 missing values
No data.
356 runs, 0 likes, 0 downloads, 0 reach, 0 impact
131072 instances - 17 features - 2 classes - 0 missing values
Byron Roe (byronroe '@' umich.edu), Department of Physics, University of Michigan, Ann Arbor, MI 48109. This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…
12 runs, 0 likes, 0 downloads, 0 reach, 0 impact
130064 instances - 51 features - 2 classes - 0 missing values
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
129971 instances - 13 features - 0 classes - 204752 missing values
Context: Thinking about natural language processing as a beginner! The dataset consists of wine comments or reviews given by various wine tasters. The concept was to use text…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
129971 instances - 14 features - classes - 204754 missing values
Do two sentences come from the same article? We randomly sampled sentences from across Wikipedia. Some sentences came from the same article; others did not. Sentences from the Same Article: These two…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
129156 instances - 3 features - classes - 0 missing values
Context: Explore an environmental-conditions dataframe scraped from CIMIS weather stations using Selenium and ChromeDriver. With California's wildfires setting records in 2020, it is worthwhile to…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
128125 instances - 19 features - 0 classes - 138 missing values
https://www.kaggle.com/dansbecker/nba-shot-logs
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
128069 instances - 21 features - classes - 5567 missing values
This is sensor data for testing; it is not complete.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
127591 instances - 27 features - classes - 0 missing values
Asteroid Dataset
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
126131 instances - 34 features - 2 classes - 99 missing values
Description: The "postings.csv" dataset comprises various job postings across different companies and locations. It includes detailed information on job titles, job descriptions, salaries, and…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
123849 instances - 28 features - classes - 1133501 missing values
Context: CS:GO is a tactical shooter in which two teams (CT and Terrorist) play a best-of-30-rounds match, with each round lasting 1 minute and 55 seconds. There are 5 players on each team (10 in total) and…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
122410 instances - 97 features - classes - 0 missing values
No data.
353 runs, 0 likes, 0 downloads, 0 reach, 0 impact
120919 instances - 1002 features - 2 classes - 0 missing values
Nell HMC dataset for type prediction with ingoing/outgoing properties as features
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
120720 instances - 769 features - classes - 0 missing values
Meta-Album PlantNet Dataset (Extended). The Meta-Album PlantNet dataset is created by sampling the Pl@ntNet-300k dataset (https://openreview.net/forum?id=eLYinD0TtIt), itself a sampling of the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
120688 instances - 3 features - 25 classes - 120688 missing values
Product listing data submitted to the U.S. FDA for all unfinished, unapproved drugs.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
120215 instances - 20 features - 7 classes - 443305 missing values
A personal loan is an unsecured loan, so it is vital to assess customers' risk by checking their creditworthiness. This must be done to prevent loan defaults. The objective is…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
119528 instances - 32 features - classes - 987539 missing values
Context: Buying a diamond can be frustrating and expensive. It inspired me to create this dataset of 119K natural and lab-created diamonds from brilliantearth.com to demystify the value of the 4 Cs…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
119307 instances - 11 features - classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
116640 instances - 10 features - 0 classes - 0 missing values
Description: The olist_order_items_dataset.csv is a comprehensive dataset that features transactional data from the Olist e-commerce platform. The dataset documents items purchased within each order…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
112650 instances - 7 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features"…
1 run, 0 likes, 0 downloads, 0 reach, 0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
111762 instances - 33 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on both numerical and categorical…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on categorical and numerical features"…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
111762 instances - 33 features - 2 classes - 0 missing values
Experiment data obtained by running random configurations of the hnsw kNN through mlr on 116 different classification tasks from OpenML.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
111753 instances - 13 features - classes - 0 missing values
We introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
111720 instances - 4 features - 3 classes - 0 missing values
Context: A person makes a doctor's appointment, receives all the instructions, and does not show up. Who is to blame? If this helps your studies or work, please don't forget to upvote :). Reference to Joni Hoppen…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
110527 instances - 13 features - 2 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs, 0 likes, 0 downloads, 0 reach, 0 impact
110393 instances - 55 features - 7 classes - 0 missing values
Tourism competition for time-series forecasting, monthly data. From the original source: The data we use include 366 monthly series, 427 quarterly series and 518 yearly series. They were supplied by…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
109280 instances - 4 features - classes - 0 missing values
Context: I am currently building a short-term (one-day-ahead) electric load forecasting model for Goa. A good chunk of it is domestic household load. Temperature and humidity can be used to estimate…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
108096 instances - 9 features - classes - 0 missing values
Dataset from the LIBSVM multiclass data repository.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
108000 instances - 129 features - 0 classes - 0 missing values
Multiclass from binary: Expanding one-vs-all, one-vs-one and ECOC-based approaches. Dataset taken from LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html In this dataset…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
108000 instances - 129 features - 1000 classes - 0 missing values
Electricity Load Diagrams between 2011 and 2014. From the original source: The data set has no missing values. Values are in kW for each 15-minute interval. To convert values to kWh, divide them by 4. Each…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
105217 instances - 319 features - classes - 0 missing values
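A minimal sketch of the kW-to-kWh conversion described in the entry above, assuming each value is the average power (kW) drawn over one 15-minute interval; the function name `to_kwh` is hypothetical. Energy is power multiplied by duration, and 15 minutes is 0.25 h, which is why the values are divided by 4.

```python
def to_kwh(kw_readings):
    """Convert 15-minute average-power readings (kW) to energy per interval (kWh).

    Hypothetical helper: assumes each reading covers exactly one 15-minute slot.
    """
    # 15 min = 0.25 h, so energy (kWh) = power (kW) * 0.25 = power / 4
    return [kw / 4 for kw in kw_readings]


readings_kw = [8.0, 12.0, 4.0]  # three consecutive 15-minute readings
energy_kwh = to_kwh(readings_kw)
print(energy_kwh)       # [2.0, 3.0, 1.0]
print(sum(energy_kwh))  # 6.0 kWh over the 45 minutes
```

Under the same assumption, summing the 96 quarter-hour values of one day gives that day's consumption in kWh.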
French Electricity Consumption
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
105168 instances - 16 features - 0 classes - 0 missing values
French Electricity Consumption
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
105168 instances - 17 features - 0 classes - 0 missing values
Experiment data obtained by running random configurations of glmnet through mlr on 114 different classification tasks from OpenML.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
104820 instances - 10 features - classes - 0 missing values
Dota 2 is a popular computer game with two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses. Source: stephen.tridgell '@'…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
102944 instances - 117 features - 2 classes - 0 missing values
Context: Getting access to high-quality historical stock market data can be very expensive and/or complicated; parsing SEC 10-Q filings directly from SEC EDGAR is difficult due to the varying…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101787 instances - 45 features - classes - 2857964 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101766 instances - 50 features - 3 classes - 0 missing values
UCI
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101766 instances - 52 features - classes - 192849 missing values
The "Diabetes 130-Hospitals" dataset represents 10 years of clinical care at 130 U.S. hospitals and delivery networks, collected from 1999 to 2008. Each record represents the hospital admission record…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101766 instances - 25 features - 0 classes - 0 missing values
The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101766 instances - 48 features - 3 classes - 192849 missing values
The "Diabetes 130-Hospitals" dataset represents 10 years of clinical care at 130 U.S. hospitals and delivery networks, collected from 1999 to 2008. Each record represents the hospital admission record…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
101766 instances - 22 features - 2 classes - 0 missing values
Re-upload of the dataset as it is present in the Penn ML Benchmark (https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/fars). It's a dataset on traffic accidents,…
1 run, 0 likes, 0 downloads, 0 reach, 0 impact
100968 instances - 30 features - 8 classes - 0 missing values
This dataset is for classification tasks, and has both continuous and categorical variables.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100959 instances - 30 features - 0 classes - 0 missing values
Context: This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 28 features - classes - 1985 missing values
This dataset describes 100,000 realistic, synthetically generated workers' compensation insurance claims. Along with the ultimate financial loss, each claim is described by the initial case estimate, date…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 14 features - 0 classes - 0 missing values
Feedback: Mukharbek Organokov (organokov.mgmail.com). Context: Sloan Digital Sky Survey current DR16 server data release with galaxies, stars and quasars. License: Creative Commons Attribution license…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 18 features - classes - 0 missing values
Description: The Retail_Transaction_Dataset.csv provides a comprehensive overview of various retail transactions, capturing customer behavior, product details, and purchase information. It includes…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 10 features - classes - 0 missing values
Personal credit-related information.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 28 features - 3 classes - 62162 missing values
This dataset contains customer credit score information, which can be used for classification purposes. - **Poor** (0): Customers with a low credit score. - **Standard** (1): Customers with an average…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 28 features - 3 classes - 62162 missing values
Tiny ImageNet contains 100000 images of 200 classes (500 for each class) downsized to 64 x 64 colored images. Each class has 500 training images, 50 validation images, and 50 test images. The dataset…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 3 features - 200 classes - 0 missing values
Financial dataset for the AutoML benchmark. Target column: credit_score
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
100000 instances - 28 features - 3 classes - 60080 missing values
Description: The `olist_customers_dataset.csv` offers a comprehensive snapshot of customer details from the Olist e-commerce platform. This dataset encapsulates essential customer information,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
99441 instances - 5 features - classes - 0 missing values
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…
52 runs, 0 likes, 0 downloads, 0 reach, 0 impact
99289 instances - 3073 features - 10 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Regenerate features using the authors' MATLAB scripts (see Sec. C of Appendix A), then randomly select 10% of instances from the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
98528 instances - 101 features - 0 classes - 0 missing values
Normalized version of the vehicle dataset (http://www.openml.org/d/54). NAME: vehicle silhouettes. PURPOSE: to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs, 0 likes, 0 downloads, 0 reach, 0 impact
98528 instances - 101 features - 2 classes - 0 missing values
Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826-838, July 2004. This is the SensIT Vehicle (combined) dataset, retrieved 2013-11-14 from…
403 runs, 0 likes, 0 downloads, 0 reach, 0 impact
98528 instances - 101 features - 2 classes - 0 missing values
Attribute Information: The first column is the class label (1 for signal, 0 for background). 21 low-level features (kinematic properties): lepton pT, lepton eta, lepton phi, missing energy…
14397 runs, 0 likes, 0 downloads, 0 reach, 0 impact
98050 instances - 29 features - 2 classes - 9 missing values
Data Set Information: The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
98050 instances - 29 features - 0 classes - 9 missing values
HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
97852 instances - 7 features - classes - 0 missing values
Context: Throughout the history of Russian elections, there have been some interesting anomalies in the voting results. You can use this dataset to find them. Content: Each row of the dataset is detailed…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
97705 instances - 23 features - classes - 0 missing values
Context: Data is collected daily from the Our World in Data GitHub repository for COVID-19, merged, and uploaded. Content: The data contains the following information: Country: the country for which…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
97606 instances - 5 features - classes - 0 missing values
Context: This dataset is a small snapshot (sample) of ocean-depth entries from the original dataset, which keeps growing day by day. The purpose of this dataset is to allow fellow scientists/…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
97606 instances - 5 features - 0 classes - 0 missing values
Context: This is a dataset for a larger project I have been working on. My idea is to analyze and compare real historical weather with weather folklore. Content: The CSV file includes an hourly/daily…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
96453 instances - 12 features - classes - 517 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3039 runs, 0 likes, 0 downloads, 0 reach, 0 impact
96320 instances - 22 features - 2 classes - 0 missing values
Context: This data was collected for an academic project of mine. The project was about sentiment analysis of tweets during lockdown. Content: I used the GetOldTweets3…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
95488 instances - 7 features - classes - 160244 missing values
Context: The data obtained from Mexico's General Directorate of Epidemiology contains a wide range of information on the current pandemic situation. However, these data are saturated with features that may…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
92320 instances - 7 features - classes - 0 missing values
Experiment data obtained by running random configurations of rpart through mlr on 115 different classification tasks from OpenML.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
92067 instances - 12 features - classes - 0 missing values
In human civilisation, languages evolved first, and then came scripts. The Devanagari script is one of the oldest scripts of India, having evolved from the ancient Brahmi script. It came to be adopted…
43 runs, 0 likes, 0 downloads, 0 reach, 0 impact
92000 instances - 1025 features - 46 classes - 0 missing values
It has 3 attributes (ID, tweet, label) and 91299 tweets: 39998 non-sarcastic and 51300 sarcastic.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
91298 instances - 2 features - 0 classes - 0 missing values
Birds dataset for image classification (stylized)
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
90620 instances - 7 features - 20 classes - 0 missing values
Another sample of COMET_MC
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
89640 instances - 6 features - 0 classes - 0 missing values
And another sample. (v. 2 without OpenML metainfo)
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
89640 instances - 6 features - classes - 0 missing values
This dataset was gathered to detect whether a person is running or walking, based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…
1 run, 0 likes, 0 downloads, 0 reach, 0 impact
88588 instances - 7 features - 2 classes - 0 missing values
Dataset uploaded from Kaggle: https://www.kaggle.com/code/ambarish/eda-home-mortgage-ny-with-feature-analysis/script
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
87931 instances - 30 features - 7 classes - 98704 missing values
Dataset uploaded from Kaggle: https://www.kaggle.com/code/ambarish/eda-home-mortgage-ny-with-feature-analysis/script
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
87930 instances - 30 features - 6 classes - 98703 missing values
Dataset uploaded from Kaggle: https://www.kaggle.com/code/ambarish/eda-home-mortgage-ny-with-feature-analysis/script
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
87930 instances - 30 features - 6 classes - 98652 missing values
This is historical data of the Hang Seng Futures Index, based in Hong Kong. For non-traders, the data is a time series (a sequential flow of numbers) describing the Hang Seng Futures Index of Hong Kong. Every…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
87645 instances - 8 features - classes - 0 missing values
Context: This work focuses on creating a dataset of Pandas Q&A from Stack Overflow. Presently, there are more than 90k questions available on Stack Overflow that have been asked under the Pandas…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
87241 instances - 16 features - classes - 472864 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
86467 instances - 67 features - 0 classes - 2852906 missing values
Dogs dataset with different breeds of dogs (stylized)
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
84880 instances - 7 features - 20 classes - 0 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
83943 instances - 67 features - 0 classes - 2801627 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
14 runs, 0 likes, 0 downloads, 0 reach, 0 impact
83733 instances - 55 features - 4 classes - 0 missing values