Data
HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97852 instances - 7 features - classes - 0 missing values
Context Throughout the history of Russian elections there have been some interesting anomalies in the voting results. You can use this dataset to find them. Content Each row of the dataset is detailed…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97705 instances - 23 features - classes - 0 missing values
Context This dataset is a small snapshot (sample) of the ocean-depth entries in the original dataset, which keeps growing day by day. The purpose of this dataset is to allow fellow scientists/…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97606 instances - 5 features - 0 classes - 0 missing values
Context Data is collected daily from the Our World in Data GitHub repository for COVID-19, merged and uploaded. Content The data contains the following information: Country: this is the country for which…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97606 instances - 5 features - classes - 0 missing values
Context This is a dataset for a larger project I have been working on. My idea is to analyze and compare real historical weather with weather folklore. Content The CSV file includes an hourly/daily…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
96453 instances - 12 features - classes - 517 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3039 runs - 2 likes - 5 downloads - 7 reach - 16 impact
96320 instances - 22 features - 2 classes - 0 missing values
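The column layout in the description above (21 feature columns plus a binary target, 22 columns in total) could be separated along these lines. This is a minimal sketch, assuming the data has been exported to CSV with a header row that names the columns feature1 ... feature21 and target exactly as in the description. The helper `split_features_target` is hypothetical and the two sample rows are made up for illustration.

```python
import csv
import io

# Column names taken from the description: feature1 .. feature21, then target.
FEATURE_COLUMNS = [f"feature{i}" for i in range(1, 22)]
TARGET_COLUMN = "target"


def split_features_target(csv_text):
    """Return (X, y): feature rows as floats and binary targets as ints."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        X.append([float(row[name]) for name in FEATURE_COLUMNS])
        y.append(int(row[TARGET_COLUMN]))
    return X, y


# Made-up two-row sample, only to show the shapes involved.
header = ",".join(FEATURE_COLUMNS + [TARGET_COLUMN])
sample = "\n".join([
    header,
    ",".join(["0.5"] * 21 + ["1"]),
    ",".join(["0.25"] * 21 + ["0"]),
])
X, y = split_features_target(sample)
print(len(X), len(X[0]), y)
```

Selecting columns by name rather than by position keeps the split correct even if an export reorders the columns.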
Context This data was collected to be used with an academic project of mine. The project was about sentiment analysis of tweets during lockdown. Content I used the GetOldTweets3…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
95488 instances - 7 features - classes - 160244 missing values
Context The data obtained from Mexico's General Directorate of Epidemiology contains a wide range of information on the current pandemic situation. However, these data are saturated with features that may…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
92320 instances - 7 features - classes - 0 missing values
Experiment data obtained by running random configurations of rpart through mlr on 115 different classification tasks from openml.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
92067 instances - 12 features - classes - 0 missing values
In human civilisation, languages evolved first, and then came scripts. The Devanagari script is one of the oldest scripts of India, having evolved from the ancient Brahmi script. It came to be adopted…
43 runs - 2 likes - 8 downloads - 10 reach - 15 impact
92000 instances - 1025 features - 46 classes - 0 missing values
It has 3 attributes (ID, tweet, label) and 91299 tweets: 39998 non-sarcastic tweets and 51300 sarcastic tweets.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
91298 instances - 2 features - 0 classes - 0 missing values
And another sample. (v. 2 without OpenML metainfo)
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
89640 instances - 6 features - classes - 0 missing values
Another sample of COMET_MC
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
89640 instances - 6 features - 0 classes - 0 missing values
This dataset was gathered to detect whether a person is running or walking, based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…
1 runs - 0 likes - 4 downloads - 4 reach - 14 impact
88588 instances - 7 features - 2 classes - 0 missing values
This is historical data for the Hang Seng Futures Index, based in Hong Kong. For non-traders, the data is a time series (a sequential flow of numbers) describing the Hang Seng Futures Index of Hong Kong. Every…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
87645 instances - 8 features - classes - 0 missing values
Context This work focuses on creating a dataset of Pandas Q/A from Stack Overflow. Presently, there are more than 90k questions available on Stack Overflow which have been asked under Pandas…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
87241 instances - 16 features - classes - 472864 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
86467 instances - 67 features - 0 classes - 2852906 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
83943 instances - 67 features - 0 classes - 2801627 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
14 runs - 0 likes - 1 downloads - 1 reach - 20 impact
83733 instances - 55 features - 4 classes - 0 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
82318 instances - 478 features - 2 classes - 2399311 missing values
A CSV file with 80,000+ tweets from January 6th, 2021, the day of the Capitol Hill riots. Made using the Twitter Developer API + Tweepy. Nowhere close to the size of the Parler data dumps, but…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
82309 instances - 14 features - classes - 392323 missing values
Context I built this on https://www.kaggle.com/zynicide/wine-reviews, updating the scraper to fetch fresh data. Content All wine reviews from winemag.com for the years 2017-2020. Duplicates are cleared,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
81115 instances - 15 features - classes - 90161 missing values
Context Every year Kaggle conducts an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. This dataset combines the data from the past 4…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
80327 instances - 12 features - classes - 226439 missing values
tbd
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
80000 instances - 20 features - classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
78732 instances - 11 features - 0 classes - 0 missing values
Overview Identification COUNTRY Rwanda TITLE Integrated Household Living Conditions Survey 2010-2011 TRANSLATED TITLE Enquête Intégrale sur les conditions de vie des ménages 2010-2011 STUDY TYPE…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
78432 instances - 23 features - classes - 907612 missing values
A Vicon motion capture camera system was used to record 12 users performing 5 hand postures with markers attached to a left-handed glove. A rigid pattern of markers on the back of the glove was used…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
78096 instances - 38 features - classes - 974700 missing values
This is the dataset used for the 2016 IDA Industrial Challenge, courtesy of Scania. For a full description, see http://archive.ics.uci.edu/ml/datasets/IDA2016Challenge . This dataset contains both the…
9 runs - 0 likes - 2 downloads - 2 reach - 19 impact
76000 instances - 171 features - 2 classes - 1078695 missing values
## **Meta-Album Insects2 Dataset (Extended)** The pest insects dataset was originally created as a large scale benchmark dataset for Insect Pest Recognition (https://github.com/xpwu95/IP102). It…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
75222 instances - 3 features - 102 classes - 75222 missing values
Payments given by healthcare manufacturing companies to medical doctors or hospitals
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73558 instances - 6 features - 2 classes - 83182 missing values
Test dataset to see upload.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 4 features - 2 classes - 0 missing values
Predicting forest cover ...
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 4 features - 2 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 4 features - classes - 0 missing values
fake dataset without any value
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 4 features - classes - 0 missing values
This is the description of a test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 3 features - 2 classes - 0 missing values
Description: The dataset contains information about various products, their stock levels, prices, and the locations where they are sold. Columns description: 1. Product: Represents the name of the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73503 instances - 4 features - classes - 0 missing values
Content The SP BSE SENSEX (SP Bombay Stock Exchange Sensitive Index), also called the BSE 30 or simply the SENSEX, is a free-float market-weighted stock market index of 30 well-established and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73316 instances - 8 features - classes - 1500 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72998 instances - 51 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72998 instances - 51 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72998 instances - 51 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72998 instances - 51 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72998 instances - 51 features - 2 classes - 0 missing values
One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs - 0 likes - 3 downloads - 3 reach - 14 impact
72983 instances - 33 features - 2 classes - 149271 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Vikas Sindhwani for the SVMlin project.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72309 instances - 20959 features - 0 classes - 0 missing values
This data set consists of positions and absorbed power outputs of wave energy converters (WECs) in four real wave scenarios from the southern coast of Australia. The data is obtained from an…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72000 instances - 49 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
71090 instances - 8 features - 2 classes - 0 missing values
COVID-19 Dataset for Epidemic Model Development I combined several data sources to build an integrated dataset of country-level COVID-19 confirmed, recovered and fatal cases which can be…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
70464 instances - 11 features - classes - 1835 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
70340 instances - 21 features - 3 classes - 2288 missing values
## Data description There are 3 types of input features: * Objective: factual information; * Examination: results of medical examination; * Subjective: information given by the patient. Features: 1.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
70000 instances - 12 features - 2 classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
70000 instances - 785 features - 10 classes - 0 missing values
The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of…
13317 runs - 9 likes - 82 downloads - 91 reach - 38 impact
70000 instances - 785 features - 10 classes - 0 missing values
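The ordered split mentioned in the MNIST description can be sketched as follows. This is only an illustration: the description is truncated before it states the test set size, so the sketch assumes the test set is whatever remains after the first 60,000 examples. The helper `split_in_order` is hypothetical, and a `range` stands in for the 70000 real rows.

```python
TRAIN_SIZE = 60_000  # from the description: the first 60,000 examples


def split_in_order(examples, train_size=TRAIN_SIZE):
    """Split an ordered sequence into (train, test) without shuffling."""
    return examples[:train_size], examples[train_size:]


# Stand-in for the 70,000 rows; real code would load the actual dataset.
rows = range(70_000)
train, test = split_in_order(rows)
print(len(train), len(test))
```

Because the split depends on row order, the rows must not be shuffled before it is applied.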
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a…
455 runs - 0 likes - 12 downloads - 12 reach - 27 impact
70000 instances - 785 features - 10 classes - 0 missing values
The dataset contains a million randomly sampled video instances listing 10 fundamental video characteristics along with the YouTube video ID. The videos were all transcribed from one format into…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
68784 instances - 19 features - 0 classes - 0 missing values
Context All credit for this database goes to Tim Sevenhuysen of OraclesElixir.com. I'm just uploading it here because I want to see what you guys do with this dataset before Worlds! I'm super hyped!…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67980 instances - 103 features - classes - 1527593 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: We used binary encoding for each feature (o, b, x), so the number of features is 42*3 = 126.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67557 instances - 127 features - 0 classes - 0 missing values
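The preprocessing note above (three binary indicators per original feature, giving 42*3 = 126 features) can be illustrated with a small one-hot encoder. This is a sketch under assumptions, not the preprocessing actually used: the order of the three indicator bits (o, b, x) is assumed, `encode_board` is a hypothetical helper, and the sample board is made up.

```python
SYMBOLS = ("o", "b", "x")  # order of the three indicator bits is an assumption


def encode_board(cells):
    """One-hot encode 42 symbolic features into 42 * 3 = 126 binary features."""
    if len(cells) != 42:
        raise ValueError("expected 42 cells")
    encoded = []
    for cell in cells:
        # One indicator per symbol; exactly one of the three is set.
        encoded.extend(1 if cell == symbol else 0 for symbol in SYMBOLS)
    return encoded


# Made-up board: mostly blank, with one 'x' and one 'o'.
board = ["b"] * 42
board[0] = "x"
board[1] = "o"
features = encode_board(board)
print(len(features), features[:6])
```

The 127 features in the listing would then be consistent with these 126 encoded inputs plus one label column.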
This database contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced. Attributes represent board positions on a 6x7…
9766 runs - 0 likes - 12 downloads - 12 reach - 28 impact
67557 instances - 43 features - 3 classes - 0 missing values
Context This dataset contains IMDb ratings and vote information for movies that have an original title. Useful for creating a top-rated movie recommender system. Content Descriptions of the columns: titleId…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67408 instances - 5 features - classes - 1 missing values
One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
67212 instances - 31 features - 0 classes - 0 missing values
No description available
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
66469 instances - 66 features - 0 classes - 0 missing values
Context This data set contains measles vaccination rate data for 46,412 schools in 32 states across the US. Content Vaccination rates are for the 2017-2018 school year for the following states:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
66113 instances - 15 features - classes - 322786 missing values
The collection consists of six data sets containing telemetry data of the Mars Express Spacecraft (MEX), a spacecraft orbiting Mars and operated by the European Space Agency. The data, in terms of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65952 instances - 518 features - classes - 331128 missing values
Source: 1. Muhammad Naeem, Centre of Research in Data Engineering(CORDE) & Department of Computer Science, MAJU Islamabad Pakistan(naeems.naeem '@' gmail.com). 2. Sohail Asghar, Director/Associate…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65554 instances - 29 features - classes - 0 missing values
This data set was collected from the internet traffic records on a university's firewall. There are 12 features in total. Action feature is used as a class. There are 4 classes in total. These are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65532 instances - 12 features - classes - 0 missing values
The classification task of this database is to determine where patients in a postoperative recovery area should be sent to next. Because hypothermia is a significant concern after surgery (Woolery, L.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65532 instances - 12 features - classes - 0 missing values
Data set of articles covering around 45 languages and 25 categories.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65428 instances - 3 features - classes - 0 missing values
This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed…
12 runs - 0 likes - 3 downloads - 3 reach - 20 impact
65196 instances - 28 features - 100 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
64700 instances - 301 features - 0 classes - 0 missing values
Context Includes data on confirmed cases, deaths, hospitalizations, and testing, as well as other variables of potential interest. Content As of 26 January 2021, the columns are: isocode, continent,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
63381 instances - 59 features - classes - 1508423 missing values
Background When is my university campus gym least crowded, so I know when to work out? We measured how many people were in this gym once every 10 minutes over the last year. We want to be able to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
62184 instances - 11 features - classes - 0 missing values
Rotated MNIST digits, from http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
62000 instances - 785 features - 0 classes - 0 missing values
## Overview The Otto Group is one of the world's biggest e-commerce companies, with subsidiaries in more than 20 countries, including Crate & Barrel (USA), Otto.de (Germany) and 3 Suisses (France). We…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
61878 instances - 94 features - 9 classes - 0 missing values
Content The data contains the Best Sellers lists published by The New York Times every Sunday. The temporal range is from 03-Jan-2010 to 29-Dec-2019, which makes it a whole decade of data. Each week, 5…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
61430 instances - 12 features - classes - 9476 missing values
Context The no-show problem is one of the biggest in the health industry: about 30% of patients miss their appointments. Content 61K points, from 2017.01.01 to 2017.04.30 and 19 features to work…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
61214 instances - 19 features - 0 classes - 0 missing values
Context I was searching for a master's degree program in data science when I found this awesome website, mastersportal, so I just scraped it to take my time analysing all master programs available…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60425 instances - 23 features - classes - 178594 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 128136 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 128136 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 42138 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 42138 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 42138 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60197 instances - 6 features - classes - 42138 missing values
0. airplane 1. automobile 2. bird 3. cat 4. deer 5. dog 6. frog 7. horse 8. ship 9. truck CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…
160 runs - 0 likes - 6 downloads - 6 reach - 21 impact
60000 instances - 3073 features - 10 classes - 0 missing values
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60000 instances - 3073 features - 100 classes - 0 missing values
Anonymized data of dating profiles from OkCupid
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
59946 instances - 31 features - 0 classes - 273249 missing values
Data taken from ourworldindata.org For more data go here: Total confirmed cases: https://covid.ourworldindata.org/data/ecdc/total_cases.csv Total deaths:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
59354 instances - 10 features - classes - 24447 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
59049 instances - 10 features - 0 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
15 runs - 0 likes - 1 downloads - 1 reach - 20 impact
58310 instances - 181 features - 10 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on both numerical and categorical…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
58252 instances - 32 features - 2 classes - 0 missing values
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle)) Donor: Jason Catlett Basser Department of Computer Science, University of Sydney, N.S.W., Australia Data Set Information:…
12 runs - 0 likes - 4 downloads - 4 reach - 25 impact
58000 instances - 10 features - 7 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 2 classes - 0 missing values
This is a test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57580 instances - 55 features - 2 classes - 0 missing values
Context This is home value data for the hot Nashville market. Content There are 56,000+ rows altogether. However, I'm missing home detail data for about half. So if anyone wants to track that down…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
56636 instances - 31 features - classes - 648773 missing values
Content The data includes the text, whether the tweet is a retweet, whether the tweet is deleted, and so much more. It is sorted by descending date (so the highest rows are from 2009 and the last rows…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
56571 instances - 9 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
55319 instances - 736 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on both numerical and categorical…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
55319 instances - 736 features - 0 classes - 0 missing values