OpenML
This dataset contains attributes of dresses and their recommendations according to their sales. Sales are monitored on alternate days. The attributes analyzed are: Recommendation,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 13 features - 0 classes - 0 missing values
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26677 instances - 14 features - 3 classes - 0 missing values
good
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10 instances - 4 features - classes - 2 missing values
artificial with anomaly
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4032 instances - 3 features - classes - 0 missing values
artificial with anomaly
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4032 instances - 3 features - classes - 0 missing values
dfsgfhgk
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - classes - 0 missing values
sdgt
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
Context NBA 2k20 analysis. Content Detailed attributes for players registered in the NBA2k20. Acknowledgements Data scraped from https://hoopshype.com/nba2k/. Additional data about countries and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
439 instances - 15 features - classes - 92 missing values
Chess A lot has changed in chess over the years with computer engines and AI algorithms. But how has human performance changed along with the rise in technology? WGM Becoming a Woman Grandmaster (WGM)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
304767 instances - 15 features - classes - 3473 missing values
Context I have gathered this dataset over the course of 8 years and put a lot of effort into it (see soccerverse.com). If you use the data for any kind of project, please drop me a line or ping me on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1078214 instances - 17 features - classes - 4031 missing values
Context Just made a scraper for Stack Overflow and created a dataset. Hope it will be useful for your task. Content Contains 1 CSV file with the following columns: question_vote_count: Number of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1544049 instances - 4 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10578 instances - 8 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10578 instances - 8 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10578 instances - 8 features - 2 classes - 0 missing values
Subsampling of the dataset okcupid-stem (42734) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 20 features - 3 classes - 5992 missing values
Subsampling of the dataset okcupid-stem (42734) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 20 features - 3 classes - 6050 missing values
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
961 instances - 5 features - 2 classes - 160 missing values
A Tour & Travels company wants to predict whether a customer will churn or not based on the indicators given below. Help build predictive models and save the company's money. Perform fascinating EDAs. The…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
954 instances - 7 features - 2 classes - 60 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
This data represents crime reported to the Seattle Police Department (SPD). Each row contains the record of a unique event where at least one criminal offense was reported by a member of the community…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
523590 instances - 8 features - 144 classes - 6916 missing values
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia. This dataset combines raw counts for…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
147269 instances - 4 features - classes - 0 missing values
19Realestatevaluation
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 8 features - classes - 0 missing values
wine
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150930 instances - 10 features - classes - 174477 missing values
pair0004: Cause-effect is a growing database with two-variable cause-effect pairs created at Max-Planck-Institute for Biological Cybernetics in Tuebingen,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
348 instances - 2 features - classes - 0 missing values
pair0005: Cause-effect is a growing database with two-variable cause-effect pairs created at Max-Planck-Institute for Biological Cybernetics in Tuebingen,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4176 instances - 2 features - classes - 0 missing values
pair0001: Information for pair0001: DWD (Deutscher Wetterdienst) data was taken at 349 stations taken from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
348 instances - 2 features - classes - 0 missing values
pair0002: Cause-effect is a growing database with two-variable cause-effect pairs created at Max-Planck-Institute for Biological Cybernetics in Tuebingen,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
348 instances - 2 features - classes - 0 missing values
pair0003: Cause-effect is a growing database with two-variable cause-effect pairs created at Max-Planck-Institute for Biological Cybernetics in Tuebingen,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
348 instances - 2 features - classes - 0 missing values
Overview This dataset contains 3 million Sudoku puzzles and their solutions. The level of difficulty varies -- some can be solved easily by a beginner, while others will challenge experienced solvers.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3000000 instances - 4 features - 0 classes - 0 missing values
Context The dataset was collated as a casual project using data from Genius and Big Hit. Currently contains 18 albums (check section "Albums in dataset" below for more details). Columns id (int) :…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
231 instances - 16 features - classes - 439 missing values
Context According to economists' observations, current trends suggest that the yellow metal is performing better as an investment option in comparison to mutual…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4971 instances - 6 features - classes - 0 missing values
Context The dataset contains reviews from the Google Play Store on Snapchat. With sentiment analysis, we can check for the users' adoption of the Android version of Snapchat, which has been improved…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32875 instances - 4 features - classes - 0 missing values
Context Many kids' products are thought to contain dangerous metals. Due to this it is important to make sure that children can enjoy their toys while staying as safe as possible. Content In this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1635 instances - 10 features - classes - 0 missing values
Google Play Store Google Play, formerly Android Market, is a digital distribution service operated and developed by Google. It serves as the official app store for certified devices running on the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12495 instances - 12 features - classes - 15602 missing values
Context This dataset was created by our in-house teams at PromptCloud (https://www.promptcloud.com/) and DataStock (https://datastock.shop/). We have about 5K samples in this dataset. You can download…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
741 instances - 8 features - classes - 1575 missing values
Context This dataset was created by our in-house teams at PromptCloud (https://www.promptcloud.com/) and DataStock (https://datastock.shop/). We have about 5K samples in this dataset. You can download…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
741 instances - 8 features - classes - 1575 missing values
Context Crop production depends on the availability of arable land and is affected in particular by yields, macroeconomic uncertainty, as well as consumption patterns; it also has a great incidence on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
21165 instances - 5 features - classes - 0 missing values
Context Meat consumption is related to living standards, diet, livestock production and consumer prices, as well as macroeconomic uncertainty and shocks to GDP. Compared to other commodities, meat is…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13760 instances - 5 features - classes - 0 missing values
Context Well, what happened was that I was looking for a semi-definite easy-to-read list of international football matches and couldn't find anything decent. So I took it upon myself to collect it for…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
41586 instances - 9 features - classes - 0 missing values
Context Well, what happened was that I was looking for a semi-definite easy-to-read list of international football matches and couldn't find anything decent. So I took it upon myself to collect it for…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
41586 instances - 9 features - classes - 0 missing values
A fake movie dataset.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 4 features - 1 classes - 0 missing values
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 2 features - 0 classes - 0 missing values
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 2 features - 0 classes - 0 missing values
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 2 features - 0 classes - 0 missing values
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 2 features - 0 classes - 0 missing values
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 2 features - 0 classes - 0 missing values
PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns. Uploaded by the OpenML team.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1439588 instances - 7 features - 94137 classes - 1775490 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
This dataset contains all the player names and player IDs, taken from Sofifa.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11009 instances - 3 features - classes - 0 missing values
testing
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
366 instances - 3 features - classes - 0 missing values
Outliers data set extracted from the Illustration (Fig. 3) in "Novelty detection with application to data streams"
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
75 instances - 3 features - 4 classes - 0 missing values
Context Since Amazon Echo Dot 2 has been the best selling Alexa product, we decided to extract the reviews posted on Amazon for this device. This particular dataset contains reviews posted in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6855 instances - 10 features - 0 classes - 13903 missing values
Context The World Health Organization (WHO) declared the 2019-20 coronavirus outbreak a pandemic and a Public Health Emergency of International Concern (PHEIC). Evidence of local transmission of the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1010 instances - 7 features - classes - 8 missing values
Context This dataset contains HDFC Bank equity share information. Content The dataset contains a total of 7 columns: Date 5151 non-null datetime64[ns] Open 5082 non-null…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
489 instances - 14 features - classes - 0 missing values
Context I scraped all of the currently available Urban Dictionary pages (611) on 3/26/17 Content word - the slang term added to urban dictionary definition - the definition of said term author - the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4272 instances - 6 features - classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
Subsampling of the dataset cmc (23) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self, seed:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - 3 classes - 0 missing values
Subsampling of the dataset cmc (23) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self, seed:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - 3 classes - 0 missing values
Subsampling of the dataset cmc (23) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self, seed:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - 3 classes - 0 missing values
Subsampling of the dataset cmc (23) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self, seed:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - 3 classes - 0 missing values
Subsampling of the dataset cmc (23) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self, seed:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - 3 classes - 0 missing values
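The five cmc subsamples above all report 1473 instances even though args.nrows=2000, which is consistent with nrows acting as an upper bound rather than an exact size. The listed source code is truncated, so the following is only a minimal sketch under that assumption; `subsample_sketch`, its signature, and the toy table are hypothetical and are not the generator's actual code.

```python
import random


def subsample_sketch(rows, seed, nrows=2000, ncols=100):
    """Cap a table (a list of dicts) at `nrows` rows and `ncols` columns.

    Hypothetical stand-in for the truncated `subsample(...)` source;
    no stratification is applied, mirroring args.no_stratify=True.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    columns = list(rows[0].keys()) if rows else []
    if len(columns) > ncols:
        columns = rng.sample(columns, ncols)  # keep a random subset of columns
    if len(rows) > nrows:
        rows = rng.sample(rows, nrows)  # draw rows without replacement
    return [{c: r[c] for c in columns} for r in rows]


# Toy data, invented for illustration only.
table = [{"id": i, "x": i * 2, "y": i % 3} for i in range(5000)]

small = subsample_sketch(table, seed=3)        # 5000 rows are cut down to 2000
same = subsample_sketch(table, seed=3)         # same seed, same subsample
tiny = subsample_sketch(table[:1473], seed=3)  # already under the cap, kept whole

print(len(small), len(tiny))  # 2000 1473
```

Under this reading, a dataset smaller than the caps passes through unchanged, so different seeds would only matter for datasets that exceed nrows or ncols.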
Wine data gathered by https://www.kaggle.com/zynicide. The data was scraped from WineEnthusiast during the week of June 15th, 2017. The code for the scraper can be found at…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150930 instances - 10 features - classes - 174477 missing values
Airlines Dataset. Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure. For this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26969 instances - 8 features - 2 classes - 0 missing values
frf r
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 3 features - classes - 0 missing values
This is weather data in ARFF format.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - classes - 0 missing values
sample
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - classes - 0 missing values
Autistic Spectrum Disorder (ASD) is a neurodevelopment condition associated with significant healthcare costs, and early diagnosis can significantly reduce these. Unfortunately, waiting times for an…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
704 instances - 21 features - classes - 192 missing values
1987 National Indonesia Contraceptive Prevalence Survey
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1473 instances - 10 features - classes - 0 missing values
dgf_test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3415 instances - 5 features - 2 classes - 1 missing values
dgf_test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3415 instances - 5 features - 2 classes - 1 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1643 instances - 3 features - classes - 0 missing values
artificial with anomaly
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4032 instances - 3 features - classes - 0 missing values
artificial with anomaly
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4032 instances - 3 features - 0 classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1624 instances - 3 features - classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1538 instances - 3 features - classes - 0 missing values
Context The file contains over 11,000 tweets associated with disaster keywords like crash, quarantine, and bush fires as well as the location and keyword itself. The data structure was inherited from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11370 instances - 5 features - classes - 3535 missing values
Please, if you enjoyed this dataset, don't forget to upvote it. Content This dataset contains a number of shows and series that are available on the Disney+ streaming service. Also, this dataset contains Internet…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
992 instances - 19 features - classes - 3433 missing values
Introduction On September 27, 1994, the ferry Estonia set sail on a night voyage across the Baltic Sea from the port of Tallinn in Estonia to Stockholm. She departed at 19.00 carrying 989 passengers and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
989 instances - 7 features - 0 classes - 0 missing values
Context This is a dataset I generated during a hackathon for a project. Here I have scraped data from the official Coursera website. Our project aims to help any new learner get the right course…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 7 features - classes - 0 missing values
Context For a beginner in machine learning, this is a great opportunity to try some techniques to predict the outcome of the drugs that might be accurate for the patient. Content The target…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
200 instances - 6 features - classes - 0 missing values
Context In the last few years, Twitter has become one of the most popular social media platforms. From celebrity status to government policies, Twitter can accommodate a diverse range of people and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
31115 instances - 5 features - classes - 456 missing values
Content This data is an extract from a bigger Reddit dataset (all Reddit comments from May 2019, 157 GB of data uncompressed) that contains both more comments and more associated information…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - classes - 1 missing values
Context Every year Kaggle conducts an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. This dataset combines the data from the past 4…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
80327 instances - 12 features - classes - 226439 missing values
Context Based on a database of 250K lyrics. Created to perform a supervised NLP sentiment analysis task using the Spotify valence audio feature, a measure of the positiveness of the song. Content Preparation of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
158353 instances - 5 features - classes - 2 missing values
Context I built it on https://www.kaggle.com/zynicide/wine-reviews, updating the scraper and fetching fresh data. Content All wine reviews from winemag.com for the years 2017-2020. Duplicates are cleared,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
81115 instances - 15 features - classes - 90161 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
163065 instances - 4 features - 0 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
14 instances - 5 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
2671 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag.
7 runs - 0 likes - 0 downloads - 0 reach - 0 impact
61 instances - 3 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
963 runs - 0 likes - 0 downloads - 0 reach - 0 impact
380 instances - 3 features - 2 classes - 0 missing values
1. Title: Haberman's Survival Data 2. Sources: (a) Donor: Tjen-Sien Lim (limt@stat.wisc.edu) (b) Date: March 4, 1999 3. Past Usage: 1. Haberman, S. J. (1976). Generalized Residuals for Log-Linear…
3243 runs - 0 likes - 0 downloads - 0 reach - 0 impact
306 instances - 4 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
511 runs - 0 likes - 0 downloads - 0 reach - 0 impact
185 instances - 3 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
775 runs - 0 likes - 0 downloads - 0 reach - 0 impact
43 instances - 3 features - 2 classes - 0 missing values
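The binarized entries above derive a two-class nominal target by thresholding a numeric target at its mean. A minimal sketch of that idea follows; the function name and the class labels are placeholders, since the descriptions are cut off before they say which side of the mean maps to which class.

```python
from statistics import mean


def binarize_by_mean(targets):
    """Map each numeric target to one of two placeholder class labels."""
    threshold = mean(targets)  # the mean of the original numeric target
    return [
        "below_mean" if t < threshold else "at_or_above_mean"
        for t in targets
    ]


# Toy values, invented for illustration; their mean is 4.0.
labels = binarize_by_mean([1.0, 2.0, 3.0, 10.0])
print(labels)
# ['below_mean', 'below_mean', 'below_mean', 'at_or_above_mean']
```

Because the mean is sensitive to large values, a single outlier (10.0 here) can leave the two classes unbalanced.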