OpenML
GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…
466 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7000 instances - 5001 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
455 runs - 0 likes - 0 downloads - 0 reach - 0 impact
108 instances - 4 features - 2 classes - 0 missing values
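The mean-based binarization mentioned in the entry above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the description is truncated, so the direction of the comparison and the label names `P` and `N` are assumptions, and `binarize_by_mean` is a hypothetical helper name.

```python
from statistics import mean


def binarize_by_mean(targets):
    """Turn numeric targets into two nominal classes split at the mean.

    Assumption: values strictly above the mean become "P", the rest "N".
    """
    threshold = mean(targets)
    return ["P" if value > threshold else "N" for value in targets]


if __name__ == "__main__":
    # The mean of these four values is 4.0, so only 10.0 lands above it.
    print(binarize_by_mean([1.0, 2.0, 3.0, 10.0]))
```

A split at the mean does not guarantee balanced classes; a skewed target, as in the example, leaves most instances on one side.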
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a…
455 runs - 0 likes - 12 downloads - 12 reach - 27 impact
70000 instances - 785 features - 10 classes - 0 missing values
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, VOL 286, pp. 531-537, 15 October 1999. Web supplement to the article T.R. Golub, D. K.…
451 runs - 0 likes - 0 downloads - 0 reach - 0 impact
72 instances - 7130 features - 2 classes - 0 missing values
Source: David Gil, dgil '@' dtic.ua.es, Lucentia Research Group, Department of Computer Technology, University of Alicante Jose Luis Girela, girela '@' ua.es, Department of Biotechnology, University…
451 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 10 features - 2 classes - 0 missing values
Jarkko Salojarvi, Kai Puolamaki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in…
440 runs - 1 like - 12 downloads - 13 reach - 16 impact
10936 instances - 28 features - 3 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
434 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7019 instances - 61 features - 8 classes - 48089 missing values
No data.
428 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1003 instances - 3183 features - 10 classes - 0 missing values
No data.
426 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2463 instances - 2001 features - 17 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. This is a…
423 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120 instances - 7 features - 2 classes - 0 missing values
No data.
416 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1050 instances - 3239 features - 10 classes - 0 missing values
No data.
414 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 8262 features - 10 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
406 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4229 instances - 1618 features - 2 classes - 0 missing values
No data.
405 runs - 0 likes - 0 downloads - 0 reach - 0 impact
45164 instances - 75 features - 11 classes - 0 missing values
Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826-838, July 2004. This is the SensIT Vehicle (combined) dataset, retrieved 2013-11-14 from…
403 runs - 0 likes - 0 downloads - 0 reach - 0 impact
98528 instances - 101 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
396 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3468 instances - 785 features - 10 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. * Source: Jacek…
391 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120 instances - 7 features - 2 classes - 0 missing values
Hayes-Roth Database. This is a merged version of the separate train and test sets that are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Source…
384 runs - 0 likes - 0 downloads - 0 reach - 0 impact
160 instances - 5 features - 3 classes - 0 missing values
This is a 20,000-instance sample of the original CIFAR-10 dataset, sampled randomly and stratified, with 2000 examples per class. Training and test sets are merged. Find the corresponding task for the…
380 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20000 instances - 3073 features - 10 classes - 0 missing values
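A stratified random sample with a fixed number of examples per class, as described in the entry above, could be drawn along these lines. This is only a sketch using the standard library; `stratified_sample`, its parameters, and the seed are hypothetical, and the procedure actually used to build the dataset is not documented here.

```python
import random
from collections import defaultdict


def stratified_sample(labels, per_class, seed=0):
    """Return sorted indices holding `per_class` random examples of every label."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy labels: 10 classes with 10 examples each; keep 3 per class.
    toy_labels = [i % 10 for i in range(100)]
    print(len(stratified_sample(toy_labels, per_class=3)))
```

With ten classes and `per_class=2000`, the same call would yield the 20,000 indices the description mentions, provided every class has at least 2000 examples.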
No data.
377 runs - 0 likes - 0 downloads - 0 reach - 0 impact
913 instances - 3101 features - 10 classes - 0 missing values
No data.
373 runs - 0 likes - 0 downloads - 0 reach - 0 impact
918 instances - 3013 features - 10 classes - 0 missing values
Source: http://www.ijcaonline.org/archives/volume47/number18/7291-0509 Data Set Information: In this paper, we seek to identify why users are drawn to cyberspace in Kohkiloye and Boyer…
373 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 6 features - 2 classes - 0 missing values
Normalized version of vehicle dataset (http://www.openml.org/d/54) NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs - 0 likes - 0 downloads - 0 reach - 0 impact
98528 instances - 101 features - 2 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
366 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8844 instances - 61 features - 7 classes - 51515 missing values
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807…
357 runs - 0 likes - 0 downloads - 0 reach - 0 impact
284807 instances - 31 features - 2 classes - 0 missing values
No data.
356 runs - 0 likes - 0 downloads - 0 reach - 0 impact
131072 instances - 17 features - 2 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
354 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7485 instances - 61 features - 7 classes - 52048 missing values
No data.
353 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120919 instances - 1002 features - 2 classes - 0 missing values
Embryonal tumours of the central nervous system Prediction of Central Nervous System Embryonal Tumour Outcome based on Gene Expression. Nature, VOL 415, pp. 436-442, 24 January 2002. Scott L. Pomeroy,…
343 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60 instances - 7130 features - 2 classes - 0 missing values
Normalized version of the Forest Covertype dataset (see version 1), so that the numerical values are between 0 and 1. Contains the forest cover type for 30 x 30 meter cells obtained from US Forest…
342 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581012 instances - 55 features - 7 classes - 0 missing values
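Rescaling numerical values to lie between 0 and 1, as the entry above describes, is commonly done with min-max scaling. The sketch below assumes that technique; the listing does not say which method was actually applied, and `min_max_normalize` is a hypothetical helper.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column has no spread; map every value to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


if __name__ == "__main__":
    # Hypothetical elevation values in meters.
    print(min_max_normalize([1800, 2400, 3000]))
```

Each numeric feature would be scaled independently with its own minimum and maximum.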
No data.
337 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 13 features - 3 classes - 0 missing values
No data.
334 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 33 features - 2 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1455525 instances - 73 features - 10 classes - 0 missing values
No data.
332 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 2 classes - 0 missing values
No data.
331 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 20 features - 2 classes - 0 missing values
No data.
330 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
328 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
326 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 2 classes - 0 missing values
No data.
326 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
326 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 16 features - 2 classes - 0 missing values
No data.
324 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
Synthetic dataset. Almost identical to [dataset 152](https://www.openml.org/d/153/edit)
319 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 2 classes - 0 missing values
No data.
315 runs - 0 likes - 0 downloads - 0 reach - 0 impact
295245 instances - 11 features - 5 classes - 0 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 0 downloads - 0 reach - 0 impact
829201 instances - 11 features - 10 classes - 0 missing values
No data.
314 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 36 features - 19 classes - 0 missing values
No data.
313 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
313 runs - 0 likes - 0 downloads - 0 reach - 0 impact
399482 instances - 12 features - 2 classes - 0 missing values
No data.
312 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 3 classes - 0 missing values
No data.
311 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 10 features - 2 classes - 0 missing values
No data.
311 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
310 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
310 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 2 classes - 0 missing values
No data.
310 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Normalized form of codrna (351) Andrew V Uzilov, Joshua M Keegan, and David H Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC…
309 runs - 0 likes - 0 downloads - 0 reach - 0 impact
488565 instances - 9 features - 2 classes - 0 missing values
No data.
309 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 35 features - 6 classes - 0 missing values
No data.
309 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
308 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
307 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
307 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 41 features - 3 classes - 0 missing values
No data.
307 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
306 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
306 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 13 features - 6 classes - 0 missing values
No data.
305 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
305 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
304 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
304 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 25 features - 10 classes - 0 missing values
A 4-class version of breast-tissue dataset.
299 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 10 features - 4 classes - 0 missing values
No data.
298 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
296 runs - 0 likes - 0 downloads - 0 reach - 0 impact
96 instances - 4027 features - 9 classes - 19667 missing values
No data.
296 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 61 features - 2 classes - 0 missing values
No data.
293 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
292 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 37 features - 6 classes - 0 missing values
No data.
291 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - 7 classes - 0 missing values
Airlines dataset, inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
291 runs - 0 likes - 31 downloads - 31 reach - 17 impact
539383 instances - 8 features - 2 classes - 0 missing values
No data.
290 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
288 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 15 features - 9 classes - 0 missing values
No data.
283 runs - 0 likes - 0 downloads - 0 reach - 0 impact
96 instances - 4027 features - 11 classes - 19667 missing values
* Source: JP Marques de Sá, INEB-Instituto de Engenharia Biomédica, Porto, Portugal; e-mail: jpmdesa '@' gmail.com J Jossinet, inserm, Lyon, France * Data Set Information: Impedance measurements…
280 runs - 0 likes - 0 downloads - 0 reach - 0 impact
106 instances - 10 features - 6 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: A1 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
273 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3252 instances - 4 features - 5 classes - 0 missing values
No data.
268 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3075 instances - 12433 features - 6 classes - 0 missing values
No data.
264 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3204 instances - 13196 features - 6 classes - 0 missing values
No data.
253 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1076790 instances - 30 features - 2 classes - 7275 missing values
The first 5 variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset constitutes the record of a…
238 runs - 0 likes - 0 downloads - 0 reach - 0 impact
345 instances - 6 features - 0 classes - 0 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) Database of surgeries on horses. Possible class attributes: 24 (whether lesion is surgical), others include: 23, 25, 26, and 27 Notes: * Hospital_Number…
236 runs - 0 likes - 0 downloads - 0 reach - 0 impact
368 instances - 27 features - 2 classes - 1927 missing values
No data.
230 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 35 features - 2 classes - 0 missing values
No data.
225 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
223 runs - 0 likes - 18 downloads - 18 reach - 19 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
No data.
222 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1504 instances - 2887 features - 13 classes - 0 missing values
No data.
220 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 7903 features - 6 classes - 0 missing values
No data.
219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 6430 features - 9 classes - 0 missing values
No data.
219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 58 features - 2 classes - 0 missing values
Mammography dataset Past Usage: 1. Woods, K., Doss, C., Bowyer, K., Solka, J., Priebe, C.,
218 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11183 instances - 7 features - 2 classes - 0 missing values
No data.
216 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11162 instances - 11466 features - 10 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs - 0 likes - 0 downloads - 0 reach - 0 impact
110393 instances - 55 features - 7 classes - 0 missing values
No data.
215 runs - 0 likes - 0 downloads - 0 reach - 0 impact
204 instances - 5833 features - 6 classes - 0 missing values
No data.
211 runs - 0 likes - 0 downloads - 0 reach - 0 impact
313 instances - 5805 features - 8 classes - 0 missing values
No data.
211 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
No data.
206 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 39 features - 6 classes - 0 missing values
Oil dataset Past Usage: 1. Kubat, M., Holte, R.,
204 runs - 0 likes - 0 downloads - 0 reach - 0 impact
937 instances - 50 features - 2 classes - 0 missing values
No data.
203 runs - 0 likes - 0 downloads - 0 reach - 0 impact
878 instances - 7455 features - 10 classes - 0 missing values