Data
No data.
311 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
65 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
310 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
290 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
308 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
307 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
309 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
328 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
330 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
206 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
306 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 13 features - 6 classes - 0 missing values
No data.
52 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 48 features - 10 classes - 0 missing values
No data.
52 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
This is a smaller version of the original dataset, containing 1M rows. Attribute Information: The first column is the class label (1 for signal, 0 for background); 21 low-level features…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1000000 instances - 29 features - 2 classes - 0 missing values
This dataset captures transaction patterns and behaviors that could indicate potential fraud in card transactions. The data is composed of several features designed to reflect the transactional…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 8 features - classes - 0 missing values
Context: At Shadow Robot, we are leaders in robotic grasping and manipulation. As part of our Smart Grasping System development, we're developing different algorithms using machine learning. This first…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
992641 instances - 29 features - classes - 0 missing values
Klaverjas is an example of the Jack-Nine card games, which are characterized as trick-taking games where the Jack and nine of the trump suit are the highest-ranking trumps, and the tens and aces…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
981541 instances - 33 features - 2 classes - 0 missing values
Context: The International Chess Federation (FIDE) governs international chess competition. FIDE uses the Elo rating system to calculate the relative skill levels of players. Content: The dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
977687 instances - 10 features - classes - 3994218 missing values
Context: Real estate inventory of listings from 2012-2017. Content: Includes data for all real estate listings in the US, such as active listings, prices, days on market, price changes, and pending…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
974066 instances - 34 features - classes - 5792129 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
940160 instances - 25 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
940160 instances - 25 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
940160 instances - 25 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
940160 instances - 25 features - 2 classes - 0 missing values
Description: This dataset, named Crime_Data_from_2020_to_Present.csv, provides a detailed record of reported criminal incidents in a given area from the year 2020 onwards. It includes comprehensive…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
932140 instances - 28 features - classes - 5113856 missing values
Weather measures from Versuchsbeete provided by the Max-Planck-Institute for Biogeochemistry. Several weather measures provided by the Max-Planck-Institute for Biogeochemistry from the Weather Station on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
916772 instances - 36 features - classes - 730081 missing values
Description: The delhi_state.csv dataset is a structured collection of data associated with various economic, demographic, and social attributes of areas within Delhi state, India. This extensive…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
875308 instances - 25 features - classes - 0 missing values
Description: This dataset "Taxi_Trips_-_2024_20240408.csv" contains information on taxi trips in Chicago starting from February 2024. The dataset includes trip ID, taxi ID, trip start and end…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
865247 instances - 23 features - classes - 1514769 missing values
M4-Competition for time series forecasting, yearly data. From original source: The fourth competition, M4, started on 1 January 2018 and ended on 31 May 2018. The M4 extended and replicated the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
858458 instances - 5 features - classes - 0 missing values
Context: The data is from the National Stock Exchange of India. The data is compiled to facilitate machine learning, without bothering much about stock APIs. Content: The data is from the National Stock Exchange of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
846404 instances - 12 features - classes - 2457 missing values
Context: The data is from the National Stock Exchange of India. The data is compiled to facilitate machine learning, without bothering much about stock APIs. Content: The data is from the National Stock Exchange of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
846404 instances - 12 features - classes - 2457 missing values
Identify jets of particles from the LHC, created for the study of ultra low latency inference with hls4ml. Use 16 high level features to identify the 5 jet classes: quark (q), gluon (g), W boson (w),…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
830000 instances - 17 features - 5 classes - 0 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 0 downloads - 0 reach - 0 impact
829201 instances - 11 features - 10 classes - 0 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
818238 instances - 31 features - 2 classes - 0 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
818238 instances - 31 features - 2 classes - 5168486 missing values
Interest and Motivation: This dataset concerns the MeToo movement on Twitter. The movement opposed sexual harassment, and many people posted hateful tweets. Using this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
807174 instances - 9 features - classes - 197782 missing values
Rossmann Store Sales from Kaggle with some pre-processing
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
804056 instances - 18 features - 0 classes - 0 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
800000 instances - 31 features - 2 classes - 5053446 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
798964 instances - 10 features - 3 classes - 399482 missing values
Sample with OpenML metadata.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
761940 instances - 6 features - 0 classes - 0 missing values
Sampled http://www.openml.org/d/5889
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
761940 instances - 6 features - classes - 0 missing values
Context: TFT. TFT is an 8-player free-for-all drafting tactics game in which the player recruits powerful champions, deploys them, and battles to become the last player standing. When acquired, a…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
748928 instances - 14 features - classes - 187360 missing values
Context: The dataset comes from one of the most important parts of a mining process: a flotation plant. The main goal is to use this data to predict how much impurity is in the ore concentrate. As this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
737453 instances - 24 features - classes - 0 missing values
The New York City Bike Share enables quick, easy, and affordable bike trips around the New York City boroughs. They make regular open data releases (this dataset is a transformed version of the data…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
735502 instances - 17 features - classes - 0 missing values
Data is pulled from SDSS Skyserver from data release 16 using the following query. SELECT p.objid,p.ra,p.dec,p.u,p.g,p.r,p.i,p.z, p.run, p.rerun, p.camcol, p.field, s.specobjid, s.class, s.z as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
732977 instances - 18 features - classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
697641 instances - 47237 features - 0 classes - 0 missing values
Uber vs Lyft: This is a very beginner-friendly dataset. It does contain a lot of NA values. It is a good dataset if you want to use a Linear Regression Model to see the pattern between different…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
693071 instances - 56 features - classes - 55095 missing values
The dataset freMTPL2freq contains risk features for 677,991 motor third-party liability policies (observed mostly over one year). See https://github.com/dutangc/CASdatasets for more details. The dataset…
0 runs - 1 likes - 3 downloads - 4 reach - 9 impact
678013 instances - 12 features - classes - 0 missing values
Context: In the dataset freMTPL2freq, risk features and claim numbers were collected for 677,991 motor third-party liability policies (observed over one year). Content: freMTPL2freq contains 11 columns…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
678013 instances - 11 features - classes - 0 missing values
No data.
90 runs - 0 likes - 0 downloads - 0 reach - 0 impact
663552 instances - 13 features - 2 classes - 0 missing values
About our Dataset: The collection of this Covid-19 India dataset began with a competition where we had to do sentiment analysis of tweets. The data was collected from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
648958 instances - 4 features - classes - 10980 missing values
Context: Huge Harry Potter fan. Wanted to collect fan-fiction data to make a dashboard and visualize it. It's in the works. Content: I scraped this data from https://www.fanfiction.net/book/Harry-Potter/…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
648493 instances - 16 features - classes - 647586 missing values
Cryptocurrencies: Cryptocurrencies are fast becoming rivals to traditional currency across the world. The digital currencies are available to purchase in many different places, making them accessible to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
632218 instances - 9 features - classes - 69712 missing values
Cryptocurrencies: Cryptocurrencies are fast becoming rivals to traditional currency across the world. The digital currencies are available to purchase in many different places, making them accessible to everyone, and with retailers accepting various cryptocurrencies it could be a sign that money as we know it is about to go through a major change. In addition, the blockchain technology on which many cryptocurrencies are based, with its revolutionary distributed digital backbone, has many other promising applications. Implementations of secure, decentralized systems can aid us in conquering organizational issues of trust and security that have plagued our society throughout the ages. In effect, we can fundamentally disrupt industries core to economies, businesses, and social structures, eliminating inefficiency and human error. Content: The dataset contains all historical daily prices (open, high, low, close) for all cryptocurrencies listed on CoinMarketCap. Acknowledgements: Every Cryptocurrency Daily Market Price: I initially developed kernels for this dataset before making my own scraper and dataset so that I could keep it regularly updated. CoinMarketCap: For the data…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
632218 instances - 9 features - classes - 69712 missing values
Cryptocurrencies: Cryptocurrencies are fast becoming rivals to traditional currency across the world. The digital currencies are available to purchase in many different places, making them accessible to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
632218 instances - 9 features - classes - 69712 missing values
Context: This data is the result of using neural networks and reinforcement learning to simulate the board game "Machi Koro". Here is the source code for the AI and simulation:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
614584 instances - 86 features - classes - 0 missing values
Insects dataset for Insect Pest Recognition (Stylized)
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
600000 instances - 7 features - 20 classes - 0 missing values
Plants Dataset with different species of plants (stylized)
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
600000 instances - 7 features - 20 classes - 0 missing values
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs - 0 likes - 0 downloads - 0 reach - 12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
2nd Place LightGBM Solution of Kaggle Porto Seguro’s Safe Driver Prediction
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
595212 instances - 224 features - 0 classes - 0 missing values
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
595212 instances - 58 features - 2 classes - 846458 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583250 instances - 78 features - 0 classes - 0 missing values
String datetime information extracted to numeric columns. Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 19 features - 0 classes - 0 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 15 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 17 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 17 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 10 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 10 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 10 features - 0 classes - 0 missing values
The dataset includes New York City Taxi and Limousine Commission (TLC) trips of the green line in December 2016. All trips are paid with a credit card leaving some tip. The variable 'tip_amount' was…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 19 features - 0 classes - 0 missing values
This is the famous covertype dataset in its binary version, retrieved 2013-11-13 from the libSVM site (called covtype.binary there). In addition to the preprocessing done there (see LibSVM site for…
22 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581012 instances - 55 features - 2 classes - 0 missing values
Normalized version of the Forest Covertype dataset (see version 1), so that the numerical values are between 0 and 1. Contains the forest cover type for 30 x 30 meter cells obtained from US Forest…
342 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581012 instances - 55 features - 7 classes - 0 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
9 runs - 1 likes - 14 downloads - 15 reach - 25 impact
581012 instances - 55 features - 7 classes - 0 missing values
This file holds global land temperatures by country
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
577462 instances - 4 features - classes - 64563 missing values
Holds information on average temperature per country
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
577462 instances - 4 features - classes - 64563 missing values
Context: Thousands of cryptocurrencies have sprung up in the past few years. Can you predict which one will be the next BTC? Content: The dataset contains daily opening, high, low, close, and trading…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
567769 instances - 8 features - classes - 0 missing values
Context: Thousands of cryptocurrencies have sprung up in the past few years. Can you predict which one will be the next BTC? Content: The dataset contains daily opening, high, low, close, and trading…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
567769 instances - 8 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 2 classes - 0 missing values
Public procurement data for the European Economic Area, Switzerland, and Macedonia, 2015.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
565163 instances - 75 features - 0 classes - 15247061 missing values
Experiment data obtained by running random configurations of an SVM through mlr on 106 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
540576 instances - 15 features - classes - 658962 missing values
Data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
539383 instances - 8 features - 2 classes - 0 missing values
Airlines dataset, inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
291 runs - 0 likes - 31 downloads - 31 reach - 17 impact
539383 instances - 8 features - 2 classes - 0 missing values
Airlines dataset, inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
539383 instances - 8 features - 0 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
538638 instances - 7 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
531441 instances - 12 features - 0 classes - 0 missing values
This data represents crime reported to the Seattle Police Department (SPD). Each row contains the record of a unique event where at least one criminal offense was reported by a member of the community…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
523590 instances - 8 features - 144 classes - 6916 missing values
Acknowledgements: The data was scraped from Booking.com. All data in the file is already publicly available to everyone. Data is originally owned by Booking.com. Please contact me through my profile if…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
515738 instances - 17 features - classes - 6536 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
515345 instances - 91 features - 0 classes - 0 missing values
Data from the PASCAL Challenge 2008 as available on the LibSVM repository. Description: Preprocessing: The raw data set (epsilon_train) is scaled instance-wise to unit length and split into two…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500000 instances - 2001 features - 2 classes - 0 missing values
Rotating hyperplane is a stream generator that generates d-dimensional classification problems in which the prediction is defined by a rotating hyperplane. By changing the orientation and position of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500000 instances - 11 features - classes - 0 missing values
This is a 10% stratified subsample of the data from the 1999 ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php). Modified by TunedIT (converted to ARFF format)…
25 runs - 0 likes - 0 downloads - 0 reach - 0 impact
494020 instances - 42 features - 23 classes - 0 missing values
Andrew V Uzilov, Joshua M Keegan, and David H Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics, 7(173), 2006. This…
31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
488565 instances - 9 features - 2 classes - 0 missing values
Normalized form of codrna (351). Andrew V Uzilov, Joshua M Keegan, and David H Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC…
309 runs - 0 likes - 0 downloads - 0 reach - 0 impact
488565 instances - 9 features - 2 classes - 0 missing values
Meta-Album Plankton Dataset (Extended): The Plankton dataset was created by researchers at the Woods Hole Oceanographic Institution (https://www.whoi.edu/). Imaging FlowCytobot (IFCB) was used…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
473273 instances - 3 features - 102 classes - 473273 missing values