OpenML
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1000000 instances - 91 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
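The LIBSVM data repository distributes files in a sparse text format. Each line starts with the label and is followed by `index:value` pairs that use 1-based indices, and any index that is omitted is zero. Below is a minimal Python sketch that reads one such line into a dense row. The sample line is made up, and the width of 123 assumes that the 124 features listed here are 123 inputs plus the class column.

```python
def parse_libsvm_line(line, n_features):
    """Expand one LIBSVM-format line ("<label> <index>:<value> ...") into a dense row.

    Indices in the format are 1-based; indices that do not appear are zeros.
    """
    parts = line.split()
    label = float(parts[0])
    row = [0.0] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)  # shift 1-based index to 0-based
    return label, row


# Hypothetical example line; 123 assumes the 124 listed features are 123 inputs plus the class.
label, row = parse_libsvm_line("+1 3:1 11:1 14:1", n_features=123)
print(label, sum(row))  # 1.0 3.0
```

For wide files, keeping only the nonzero pairs (for example in a dict) uses less memory. The dense list is used here only to keep the sketch short.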
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
48842 instances - 124 features - 0 classes - 0 missing values
libSVM, AAD group. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96:6745-6750, 1999. Dataset from…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
62 instances - 2001 features - 0 classes - 0 missing values
libSVM, AAD group. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University, 2003. Dataset from the LIBSVM data repository…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7089 instances - 5 features - 0 classes - 0 missing values
libSVM, AAD group. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246-2253, 2003. Dataset from the LIBSVM data repository.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
86 instances - 7130 features - 0 classes - 0 missing values
Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880-885, Vienna, Austria, August 1996. Dataset from the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
862 instances - 3 features - 0 classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 16 impact
1000 instances - 25 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
209 instances - 8 features - classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
32561 instances - 124 features - 0 classes - 0 missing values
Modified version of the training dataset of the Bike Sharing Demand challenge running on Kaggle (http://www.kaggle.com/c/bike-sharing-demand/) If you use the problem in publication, please cite:…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
10886 instances - 11 features - 0 classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
24 instances - 5 features - classes - 0 missing values
Michel Lang. fRMA-normalized. Only "Kratz-genes"*. * (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
226 instances - 24 features - 2 classes - 0 missing values
This is a corrected version of the previous data file in version 1, which contained a dataset (349 instances) incorrectly merged from the original training and test sets available on UCI (there are…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
267 instances - 45 features - 2 classes - 0 missing values
Source: 1. Olcay KURSUN, PhD., Istanbul University, Department of Computer Engineering, 34320, Istanbul, Turkey Phone: +90 (212) 473 7070 - 17827 Email: okursun '@' istanbul.edu.tr 2. Betul ERDOGDU…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1039 instances - 29 features - classes - 0 missing values
Source: Creators : François Kawala (1,2) Ahlame Douzal (1) Eric Gaussier (1) Eustache Diemert (2) Institutions : (1) Université Joseph Fourier (Grenoble I) Laboratoire d'informatique de…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
28179 instances - 97 features - classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
5000 instances - 22 features - classes - 0 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
299285 instances - 42 features - classes - 0 missing values
Predicting the Geographical Origin of Music, ICDM, 2014 Abstract: Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1059 instances - 118 features - 0 classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
39644 instances - 61 features - 0 classes - 0 missing values
USDA, NRCS. 2008. The PLANTS Database ([Web Link], 31 December 2008). National Plant Data Center, Baton Rouge, LA 70874-4490 USA. Abstract: Data has been extracted from the USDA plants database. It…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
Abstract: This data set contains a total of 5820 evaluation scores provided by students from Gazi University in Ankara (Turkey). There are a total of 28 course-specific questions and an additional 5…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
5820 instances - 33 features - classes - 0 missing values
Abstract: This data contains general demographic information on internet users in 1997. Source: Original Owner: Graphics, Visualization, & Usability Center College of Computing Georgia Institute of…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
Abstract: This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
178526 instances - 13 features - classes - 57200 missing values
Guess which points belong to signal track. COMET (http://comet.kek.jp/Introduction.html) is an experiment being constructed at the J-PARC proton beam laboratory in Japan. It will search for…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7619400 instances - 6 features - 0 classes - 0 missing values
And another sample. (v. 2 without OpenML metainfo)
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
89640 instances - 6 features - classes - 0 missing values
Sample with OpenML metadata.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
761940 instances - 6 features - 0 classes - 0 missing values
YAGO Schema.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
181 instances - 4 features - classes - 0 missing values
Guess which points belong to signal track. COMET (http://comet.kek.jp/Introduction.html) is an experiment being constructed at the J-PARC proton beam laboratory in Japan. It will search for…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7619400 instances - 6 features - 0 classes - 0 missing values
Guess which points belong to signal track. COMET (http://comet.kek.jp/Introduction.html) is an experiment being constructed at the J-PARC proton beam laboratory in Japan. It will search for…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7619400 instances - 6 features - 0 classes - 0 missing values
This is sensor data for testing; it is not complete.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
127591 instances - 27 features - classes - 0 missing values
Sampled from http://www.openml.org/d/5889
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
761940 instances - 6 features - classes - 0 missing values
Another sample of COMET_MC
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
89640 instances - 6 features - 0 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
662 instances - 1212 features - classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2000 instances - 140 features - classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
593 instances - 78 features - classes - 0 missing values
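In a multi-label table, each label is typically stored as its own binary column next to the predictors. That layout would account for the 78 features listed for the emotions data (72 predictors plus 6 labels). Below is a minimal Python sketch that separates one row into predictors and labels. It assumes, hypothetically, that the label columns come last and hold 0/1 values. The actual column order in the uploaded file is not stated in this listing.

```python
N_PREDICTORS = 72  # from the description above
N_LABELS = 6       # the 6 clustered emotional labels


def split_multilabel_row(row, n_labels):
    """Split a flat row into (predictors, labels).

    Assumes the label columns are the last n_labels entries and are encoded as 0/1.
    """
    predictors = row[:-n_labels]
    labels = [bool(v) for v in row[-n_labels:]]
    return predictors, labels


assert N_PREDICTORS + N_LABELS == 78  # matches the listed feature count

# Hypothetical song tagged with two of the six emotions.
row = [0.5] * N_PREDICTORS + [1, 0, 0, 1, 0, 0]
x, y = split_multilabel_row(row, N_LABELS)
print(len(x), sum(y))  # 72 2
```

A song can carry several labels at once, so `y` is a vector of flags rather than a single class index.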
Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1702 instances - 1054 features - classes - 0 missing values
The langLog dataset includes 1004 textual predictors and was originally compiled in the doctoral thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1460 instances - 1079 features - classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2000 instances - 250 features - classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2407 instances - 300 features - classes - 0 missing values
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
50789 instances - 20 features - 3 classes - 154107 missing values
It has 3 attributes (ID, tweet, label) and 91298 tweets: 39998 non-sarcastic and 51300 sarcastic.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
91298 instances - 2 features - 0 classes - 0 missing values
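Here is a quick arithmetic check on the class counts in this description. The two groups should add up to the listed number of instances, and their ratio gives the class balance. Sketched in Python:

```python
non_sarcastic = 39998
sarcastic = 51300

total = non_sarcastic + sarcastic
print(total)  # 91298, the instance count listed for this dataset

# Fraction of sarcastic tweets, i.e. the majority class share.
sarcastic_share = sarcastic / total
print(f"{sarcastic_share:.1%}")  # 56.2%
```

A majority-class share of about 56% is the accuracy that a trivial classifier would reach by always predicting "sarcastic".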
Multi-label dataset. The birds dataset consists of 327 audio recordings of 12 different vocalizing bird species. Each sound can be assigned to various bird species.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
645 instances - 279 features - classes - 0 missing values
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
129971 instances - 13 features - 0 classes - 204752 missing values
This data was collected from combined primary and secondary sources, through questionnaires, verbal interviews, and part of the hospital record department's data, from the selected…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
281 instances - 98 features - 2 classes - 2 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
39948 instances - 12 features - 2 classes - 0 missing values
Sensor measurements from one boiler, containing WaterInput/SteamOutput (flow, temperature, pressure), recorded every minute for one month.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
44643 instances - 8 features - classes - 44643 missing values
These weekly averages are ultimately based on measurements of 4 air samples per hour taken atop intake lines on several towers during steady periods of CO2 concentration of not less than 6 hours per…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2225 instances - 7 features - 0 classes - 0 missing values
Of all the universities in the world, which are the best? Ranking universities is a difficult, political, and controversial practice. There are hundreds of different national and international…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1029 instances - 14 features - classes - 200 missing values
Los Angeles ozone pollution data, 1976
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
Klaverjas is an example of the Jack-Nine card games, which are characterized as trick-taking games where the Jack and nine of the trump suit are the highest-ranking trumps, and the tens and aces…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
981541 instances - 33 features - 2 classes - 0 missing values
The goal is to predict the Fare. Variable description: pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower. age: Age is fractional if less than 1. If the age is…
0 runs, 0 likes, 4 downloads, 4 reach, 11 impact
1307 instances - 8 features - 0 classes - 0 missing values
Students
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
5820 instances - 33 features - classes - 0 missing values
The dataset freMTPL2freq contains risk features for 677,991 motor third-party liability policies (mostly observed over one year). See https://github.com/dutangc/CASdatasets for more details. The dataset…
0 runs, 1 like, 3 downloads, 4 reach, 9 impact
678013 instances - 12 features - classes - 0 missing values
The dataset freMTPL2sev contains claim amounts for 26,639 motor third-party liability policies.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
26639 instances - 2 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
337 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
296 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The river flow datasets concern the prediction of river network flows for 48 h in the future at…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
9125 instances - 584 features - classes - 356160 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Supply Chain Management datasets are derived from the Trading Agent Competition in Supply…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
9803 instances - 296 features - classes - 0 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
3782 instances - 1101 features - classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2417 instances - 117 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Andromeda dataset (Hatzikos et al. 2008) concerns the prediction of future values for six…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
49 instances - 36 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
359 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggle's Online Product Sales competition…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
639 instances - 413 features - classes - 10012 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The river flow datasets concern the prediction of river network flows for 48 h in the future at…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
9125 instances - 72 features - classes - 3264 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Supply Chain Management datasets are derived from the Trading Agent Competition in Supply…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
8966 instances - 77 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggle's See Click Predict Fix competition…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1137 instances - 26 features - classes - 9255 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
323 instances - 13 features - classes - 0 missing values
The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
404 instances - 31 features - classes - 0 missing values
Testing this platform
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
36203 instances - 18 features - 0 classes - 8971 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1066 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
103 instances - 10 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Water Quality dataset (Dzeroski et al. 2000) has 14 target attributes that refer to the…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1060 instances - 30 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
150 instances - 5 features - 3 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Energy Building dataset (Tsanas and Xifara 2012) concerns the prediction of the heating load…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
768 instances - 10 features - classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs, 0 likes, 2 downloads, 2 reach, 22 impact
1600 instances - 1001 features - 2 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2000 instances - 140 features - 2 classes - 0 missing values
The langLog dataset includes 1004 textual predictors and was originally compiled in the doctoral thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1460 instances - 1079 features - 2 classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2000 instances - 250 features - 2 classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2407 instances - 300 features - 2 classes - 0 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
3782 instances - 1101 features - 2 classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2417 instances - 117 features - 2 classes - 0 missing values
Hello Hello
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
44690 instances - 77 features - classes - 0 missing values
Small dataset with time series of RAM prices over the years.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
333 instances - 3 features - 0 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs, 0 likes, 1 download, 1 reach, 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
Test dataset
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
15547 instances - 61 features - 0 classes - 280 missing values
Domain dataset
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1637 instances - 9839 features - 3 classes - 13231887 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
2407 instances - 300 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Andromeda dataset (Hatzikos et al. 2008) concerns the prediction of future values for six…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
49 instances - 36 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
337 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
296 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
154 instances - 18 features - classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
593 instances - 78 features - classes - 0 missing values