This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix:
```
      Good  Bad (predicted)
Good   0    1   (actual)
Bad    5    0
```
It is worse…
506309 runs - 28 likes - 312 downloads - 340 reach - 34 impact
1000 instances - 21 features - 2 classes - 0 missing values
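The cost matrix in the credit-risk entry above can be applied to a set of predictions as in the following minimal sketch. The rows are read as actual classes and the columns as predicted classes, as the matrix labels indicate; the function name `total_cost` and the example labels are hypothetical and only serve as illustration.

```python
# Cost matrix from the dataset description: keys are (actual, predicted).
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,
    ("bad", "good"): 5,
    ("bad", "bad"): 0,
}

def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))

# Hypothetical labels, invented for illustration only.
actual = ["good", "good", "bad", "bad"]
predicted = ["good", "bad", "good", "bad"]
print(total_cost(actual, predicted))  # 0 + 1 + 5 + 0 = 6
```

Under this matrix, a classifier that minimizes plain error rate is not necessarily the cheapest one, since the two kinds of mistake are weighted 5 to 1.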
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28211 runs - 19 likes - 170 downloads - 189 reach - 36 impact
8378 instances - 121 features - 2 classes - 18372 missing values
The MNIST database of handwritten digits with 784 features, raw data available at: It can be split into a training set of the first 60,000 examples, and a test set of…
13317 runs - 9 likes - 82 downloads - 91 reach - 38 impact
70000 instances - 785 features - 10 classes - 0 missing values
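The split described in the MNIST entry above can be sketched as follows. This assumes the test set is simply every row after the first 60,000 (the description is truncated at that point); `split_first_n` is a hypothetical helper name, and row indices stand in for the actual pixel data.

```python
def split_first_n(rows, n_train=60_000):
    """Return (train, test): the first n_train rows and everything after them."""
    return rows[:n_train], rows[n_train:]

# Stand-in for the 70000 instances listed above (indices instead of pixel data).
rows = list(range(70_000))
train, test = split_first_n(rows)
print(len(train), len(test))
```

Because the split is positional, the rows must stay in their original order; shuffling before slicing would produce a different partition.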
The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). Five different attributes were chosen to characterize each vowel: they are the amplitudes of the five first…
218957 runs - 6 likes - 41 downloads - 47 reach - 32 impact
5404 instances - 6 features - 2 classes - 0 missing values
Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. To demonstrate the RFMTC marketing model (a modified version of RFM), this study…
468690 runs - 6 likes - 101 downloads - 107 reach - 46 impact
748 instances - 5 features - 2 classes - 0 missing values
Author: Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe) Source: UCI - 2012 Please cite:…
138170 runs - 6 likes - 40 downloads - 46 reach - 34 impact
1372 instances - 5 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
14637 runs - 4 likes - 34 downloads - 38 reach - 39 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening for more than one…
394951 runs - 3 likes - 34 downloads - 37 reach - 39 impact
601 instances - 7 features - 2 classes - 0 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The Titanic data does not contain information from the crew, but it does contain actual ages of…
0 runs - 3 likes - 45 downloads - 48 reach - 12 impact
1309 instances - 14 features - 2 classes - 3855 missing values
The satellite dataset comprises features extracted from satellite observations. In particular, each image was taken under four different light wavelengths, two in visible light (green and red) and…
2078 runs - 3 likes - 70 downloads - 73 reach - 34 impact
5100 instances - 37 features - 2 classes - 0 missing values
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was…
65709 runs - 3 likes - 41 downloads - 44 reach - 33 impact
45211 instances - 17 features - 2 classes - 0 missing values
A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. The dataset consists of 27 features describing each…
277767 runs - 2 likes - 52 downloads - 54 reach - 26 impact
1941 instances - 34 features - 2 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Texture). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143332 runs - 2 likes - 67 downloads - 69 reach - 419 impact
1599 instances - 65 features - 100 classes - 0 missing values
Dataset creator and donator: Zhi Liu, e-mail: liuzhi8673 '@', institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China Data Set Information: dataset are derived…
65168 runs - 2 likes - 49 downloads - 51 reach - 217 impact
1500 instances - 10001 features - 50 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
31975 runs - 2 likes - 35 downloads - 37 reach - 13 impact
846 instances - 19 features - 4 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48680 runs - 2 likes - 40 downloads - 42 reach - 36 impact
3751 instances - 1777 features - 2 classes - 0 missing values
In human civilisation, languages evolved first, and then came scripts. The Devanagari script is one of the oldest scripts of India, having evolved from the ancient Brahmi script. It came to be adopted…
43 runs - 2 likes - 8 downloads - 10 reach - 15 impact
92000 instances - 1025 features - 46 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3039 runs - 2 likes - 5 downloads - 7 reach - 16 impact
96320 instances - 22 features - 2 classes - 0 missing values
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database: P. Cortez, A.…
64 runs - 2 likes - 6 downloads - 8 reach - 17 impact
4898 instances - 12 features - 7 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
161516 runs - 2 likes - 29 downloads - 31 reach - 30 impact
2109 instances - 22 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
7512 runs - 2 likes - 9 downloads - 11 reach - 26 impact
5000 instances - 21 features - 2 classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs - 2 likes - 18 downloads - 20 reach - 17 impact
101766 instances - 50 features - 3 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as…
3252 runs - 2 likes - 26 downloads - 28 reach - 10 impact
205 instances - 26 features - 6 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs - 1 likes - 9 downloads - 10 reach - 15 impact
1000 instances - 26 features - 2 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143811 runs - 1 likes - 17 downloads - 18 reach - 419 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Shape). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143764 runs - 1 likes - 40 downloads - 41 reach - 417 impact
1600 instances - 65 features - 100 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au1-1000 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
3255 runs - 1 likes - 9 downloads - 10 reach - 23 impact
1000 instances - 21 features - 2 classes - 0 missing values
#### 1. Summary This dataset contains attributes of dresses and their recommendations according to their sales. Sales are monitored on alternate days. The attributes analyzed are:…
19207 runs - 1 likes - 6 downloads - 7 reach - 19 impact
500 instances - 13 features - 2 classes - 835 missing values
The dataset freMTPL2freq contains risk features for 677,991 motor third-party liability policies (observed mostly on one year). See for more details. The dataset…
0 runs - 1 likes - 3 downloads - 4 reach - 9 impact
678013 instances - 12 features - classes - 0 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
70340 instances - 21 features - 3 classes - 2288 missing values
## **Meta-Album Plankton Dataset (Micro)** The Plankton dataset is created by researchers at the Woods Hole Oceanographic Institution. Imaging FlowCytobot (IFCB) was used for…
0 runs - 1 likes - 1 downloads - 2 reach - 1 impact
800 instances - 3 features - 20 classes - 800 missing values
Context This is historical data on cryptocurrency trading for the period from 2016-01-01 to 2021-02-21. If you enjoy this dataset please upvote so I can see it is popular and I need to update it.…
0 runs - 1 likes - 0 downloads - 1 reach - 0 impact
2382643 instances - 17 features - classes - 4862194 missing values
Context This dataset was created to make the project "AI Learn to invest" for SaturdaysAI - Euskadi 1st edition. The project can be found in Content…
0 runs - 1 likes - 1 downloads - 2 reach - 0 impact
405258 instances - 25 features - classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 3 downloads - 4 reach - 9 impact
284807 instances - 31 features - 2 classes - 0 missing values
Context Find the best strategies to improve for the next marketing campaign. How can the financial institution have a greater effectiveness for future marketing campaigns? In order to answer this, we…
0 runs - 1 likes - 1 downloads - 2 reach - 0 impact
11162 instances - 17 features - classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
146026 runs - 1 likes - 18 downloads - 19 reach - 27 impact
1563 instances - 38 features - 2 classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 9 downloads - 10 reach - 8 impact
284807 instances - 31 features - 0 classes - 0 missing values
Context Getting access to high-quality historical stock market data can be very expensive and/or complicated; parsing SEC 10-Q filings direct from the SEC EDGAR is difficult due to the varying…
0 runs - 1 likes - 0 downloads - 1 reach - 0 impact
101787 instances - 45 features - classes - 2857964 missing values
31 runs - 1 likes - 7 downloads - 8 reach - 23 impact
1599 instances - 12 features - 6 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 1 likes - 3 downloads - 4 reach - 18 impact
5832 instances - 309 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1. Abstract: Two ground ozone level data sets are included in…
188264 runs - 1 likes - 20 downloads - 21 reach - 30 impact
2534 instances - 73 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
13 runs - 1 likes - 1 downloads - 2 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
INTRUSION DETECTOR LEARNING Software to detect network intrusions protects a computer network from unauthorized users, including perhaps insiders. The intrusion detector learning task is to build a…
0 runs - 1 likes - 0 downloads - 1 reach - 3 impact
4898431 instances - 42 features - 23 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from…
0 runs - 1 likes - 0 downloads - 1 reach - 1 impact
2215023 instances - 9 features - 2 classes - 0 missing values
No data.
2198 runs - 1 likes - 17 downloads - 18 reach - 10 impact
1484 instances - 9 features - 10 classes - 0 missing values
This dataset was retrieved 2014-11-14 from the UCI site and converted to the ARFF format. __Major changes w.r.t. version 3: dataset from UCI that matches description and data types__ ### Feature…
4207 runs - 1 likes - 10 downloads - 11 reach - 15 impact
690 instances - 15 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9051 runs - 1 likes - 3 downloads - 4 reach - 16 impact
1941 instances - 28 features - 7 classes - 0 missing values
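The change noted in the steel plates entry above, replacing 7 binary target variables with one 7-level target, can be sketched as below. This is an illustration only: it assumes each row has exactly one of the binary flags set, and the `fault_1` to `fault_7` names and the helper `collapse_binary_targets` are hypothetical placeholders, not the dataset's actual column names.

```python
def collapse_binary_targets(row, target_names):
    """Map a row of mutually exclusive 0/1 target flags to one class label."""
    active = [name for name, flag in zip(target_names, row) if flag == 1]
    if len(active) != 1:
        # Assumption: exactly one fault type is flagged per instance.
        raise ValueError(f"expected exactly one active target, got {len(active)}")
    return active[0]

# Hypothetical placeholder names for the 7 binary target columns.
TARGETS = [f"fault_{i}" for i in range(1, 8)]

# Invented example rows, one flag set in each.
rows = [
    [0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
labels = [collapse_binary_targets(r, TARGETS) for r in rows]
print(labels)  # ['fault_3', 'fault_1']
```

Raising an error on rows with zero or several flags keeps a silent mislabeling from slipping through if the mutual-exclusivity assumption does not hold.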
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39941 runs - 1 likes - 17 downloads - 18 reach - 100 impact
571 instances - 1301 features - 20 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
267861 runs - 1 likes - 25 downloads - 26 reach - 30 impact
1055 instances - 42 features - 2 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {, priscilla.wagner, sarajane} at…
26636 runs - 1 likes - 18 downloads - 19 reach - 40 impact
9873 instances - 33 features - 5 classes - 0 missing values
Source: Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@', rami.mustafa.a '@' Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' ) Fadi…
51677 runs - 1 likes - 29 downloads - 30 reach - 29 impact
11055 instances - 31 features - 2 classes - 0 missing values
Author: Alen Shapiro Source: UCI Please cite: UCI citation policy 1.…
274238 runs - 1 likes - 44 downloads - 45 reach - 19 impact
3196 instances - 37 features - 2 classes - 0 missing values
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
26642 runs - 1 likes - 21 downloads - 22 reach - 45 impact
6118 instances - 52 features - 6 classes - 0 missing values
SOURCE: ChaLearn Automatic Machine Learning Challenge (AutoML), ChaLearn. This is a "supervised learning"…
8 runs - 1 likes - 2 downloads - 3 reach - 20 impact
5418 instances - 1637 features - 2 classes - 0 missing values
Jarkko Salojarvi, Kai Puolamaki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in…
440 runs - 1 likes - 12 downloads - 13 reach - 16 impact
10936 instances - 28 features - 3 classes - 0 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
9 runs - 1 likes - 14 downloads - 15 reach - 25 impact
581012 instances - 55 features - 7 classes - 0 missing values
This is a test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
235 instances - 2 features - 0 classes - 0 missing values
Context This dataset was created by our in house Web Scraping and Data Mining teams at PromptCloud and DataStock. You can download the full dataset here. This sample contains 30K records. Content…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
30000 instances - 16 features - classes - 49458 missing values
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10402, and it has 65 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 30047, and it has 97 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11054, and it has 344 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
344 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11499, and it has 12 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11286, and it has 297 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
297 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 17024, and it has 373 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
373 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 279, and it has 126 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
126 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100848, and it has 60 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100869, and it has 18 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10541, and it has 151 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
151 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 17075, and it has 15 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101309, and it has 73 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12950, and it has 34 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
34 instances - 1026 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 15 features - classes - 0 missing values
We use the following representation to collect the dataset: age - age; bp - blood pressure; sg - specific gravity; al - albumin; su - sugar; rbc - red blood cells; pc - pus cell; pcc - pus cell clumps; ba -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 28 features - classes - 0 missing values
No data.
222 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1504 instances - 2887 features - 13 classes - 0 missing values
No data.
428 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1003 instances - 3183 features - 10 classes - 0 missing values
No data.
268 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3075 instances - 12433 features - 6 classes - 0 missing values
No data.
373 runs - 0 likes - 0 downloads - 0 reach - 0 impact
918 instances - 3013 features - 10 classes - 0 missing values
No data.
159 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1657 instances - 3759 features - 25 classes - 0 missing values
No data.
264 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3204 instances - 13196 features - 6 classes - 0 missing values
No data.
211 runs - 0 likes - 0 downloads - 0 reach - 0 impact
313 instances - 5805 features - 8 classes - 0 missing values
No data.
163 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1560 instances - 8461 features - 20 classes - 0 missing values
No data.
216 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11162 instances - 11466 features - 10 classes - 0 missing values
No data.
203 runs - 0 likes - 0 downloads - 0 reach - 0 impact
878 instances - 7455 features - 10 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software. The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software. The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
34 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software. The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15 instances - 10 features - 0 classes - 0 missing values
No data.
414 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 8262 features - 10 classes - 0 missing values
No data.
220 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 7903 features - 6 classes - 0 missing values
No data.
108 runs - 0 likes - 0 downloads - 0 reach - 0 impact
927 instances - 10129 features - 7 classes - 0 missing values
No data.
377 runs - 0 likes - 0 downloads - 0 reach - 0 impact
913 instances - 3101 features - 10 classes - 0 missing values
No data.
219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 6430 features - 9 classes - 0 missing values
No data.
215 runs - 0 likes - 0 downloads - 0 reach - 0 impact
204 instances - 5833 features - 6 classes - 0 missing values
No data.
426 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2463 instances - 2001 features - 17 classes - 0 missing values
No data.
67 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9558 instances - 26833 features - 44 classes - 0 missing values
One of the data sets used in the book "Analyzing Categorical Data" by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on statistical…
2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
108 instances - 4 features - 0 classes - 0 missing values
Data on the homicide rate in Detroit for the years 1961-1973. This is the data set called DETROIT in the book 'Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13 instances - 14 features - 0 classes - 0 missing values
Data on the recurrence times to infection, at the point of insertion of the catheter, for kidney patients using portable dialysis equipment. Catheters may be removed for reasons other than infection,…
2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
76 instances - 7 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
450 instances - 4 features - 0 classes - 0 missing values