OpenML

autoMpg (1)

Identifier attribute deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…

2 runs - 0 likes - 0 downloads - 0 reach - 0 impact
398 instances - 8 features - 0 classes - 6 missing values

QSAR-TID-10116 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10116, and it has 399 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
399 instances - 1026 features - 0 classes - 0 missing values
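
The QSAR entries report activity in "pseudo-pCI50" units without defining them in this listing. Assuming the value follows the usual pIC50 convention (the negative base-10 logarithm of the half-maximal inhibitory concentration expressed in mol/L), the conversion can be sketched as follows. The function names are hypothetical helpers, not part of OpenML or ChEMBL, and the "pseudo" qualifier may indicate a variation this sketch does not capture.

```python
import math


def pic50_from_molar(ic50_molar: float) -> float:
    """Return -log10(IC50) for an IC50 in mol/L (assumed pIC50 convention)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Same quantity for an IC50 in nmol/L: -log10(x * 1e-9) == 9 - log10(x)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# A compound with IC50 = 100 nM (1e-7 M) gets a value of 7.
print(pic50_from_nanomolar(100.0))  # 7.0
```

Higher values mean a lower concentration is needed for half-maximal inhibition, i.e. a more potent compound.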

QSAR-TID-101598 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101598, and it has 399 rows and 1026 features…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
399 instances - 1026 features - 0 classes - 0 missing values

calendarDOW (1)

calendarDOW-pmlb

31 runs - 0 likes - 0 downloads - 0 reach - 0 impact
399 instances - 33 features - 5 classes - 0 missing values

QSAR-TID-30028 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 30028, and it has 399 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
399 instances - 1026 features - 0 classes - 0 missing values

analcatdata_germangss (1)

analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…

581 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 6 features - 4 classes - 0 missing values

chscase_census3 (1)

chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 0 classes - 0 missing values

chscase_census2 (1)

chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…

22 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 0 classes - 0 missing values

chscase_census6 (1)

chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 7 features - 0 classes - 0 missing values

chscase_census5 (1)

chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 0 classes - 0 missing values

chscase_census4 (1)

chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 0 classes - 0 missing values

chscase_census5 (2)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

788 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 2 classes - 0 missing values
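
The binarized chscase_census versions describe thresholding the numeric target at its mean, but the listing is cut off before it says which side of the mean is labeled positive. The sketch below therefore makes that direction an explicit parameter. `binarize_by_mean` and `positive_below_mean` are hypothetical names; the 'P' label follows the convention the multi-class binarized entries state, and 'N' for the negative class is an assumption.

```python
from statistics import mean
from typing import List, Sequence


def binarize_by_mean(targets: Sequence[float], positive_below_mean: bool = True) -> List[str]:
    """Map a numeric target to 'P'/'N' using its mean as the threshold.

    Assumption: which side of the mean counts as positive is not stated in the
    truncated listing, so it is a parameter. Values equal to the mean are
    treated as "not below" the mean.
    """
    if not targets:
        raise ValueError("need at least one target value")
    threshold = mean(targets)
    labels = []
    for value in targets:
        below = value < threshold
        positive = below if positive_below_mean else not below
        labels.append("P" if positive else "N")
    return labels


print(binarize_by_mean([1.0, 2.0, 3.0, 10.0]))  # ['P', 'P', 'P', 'N']
```

Because the mean is sensitive to outliers (the single value 10.0 pulls the threshold up to 4.0 here), the two resulting classes need not be balanced.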

chscase_census4 (2)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

764 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 2 classes - 0 missing values

chscase_census3 (2)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

779 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 2 classes - 0 missing values

chscase_census2 (2)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

791 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 8 features - 2 classes - 0 missing values

chscase_census6 (2)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

817 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 7 features - 2 classes - 0 missing values

analcatdata_germangss (2)

Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…

757 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 6 features - 2 classes - 0 missing values
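
For the multi-class binarized versions, the listing states that the majority class is re-labeled as positive ('P'); the rest of the rule is truncated. The sketch below assumes every other class becomes 'N'. `binarize_majority` is a hypothetical helper, not an OpenML function.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def binarize_majority(labels: Sequence[Hashable]) -> List[str]:
    """Relabel the most frequent class as 'P' and, by assumption, every other class as 'N'.

    Ties are broken in favor of the class that appears first in `labels`.
    """
    if not labels:
        raise ValueError("need at least one label")
    majority, _count = Counter(labels).most_common(1)[0]
    return ["P" if label == majority else "N" for label in labels]


print(binarize_majority(["a", "b", "a", "c"]))  # ['P', 'N', 'P', 'N']
```

Collapsing all minority classes into one label discards the distinctions among them, which is why the binarized versions list 2 classes where version 1 lists several.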

autoUniv-au6-400 (1)

Dataset Title: AutoUniv Dataset, data problem: autoUniv-au6-cd1-400. Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity…

144 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 41 features - 8 classes - 0 missing values

Generation-8-Pokemon (1)

Context: This dataset contains information on all 400 Pokemon in generation eight. Content: No.: Pokedex number; Name: Pokemon name in English; Ability1: Pokemon ability; Ability2: Pokemon second ability…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 19 features - classes - 369 missing values

chronic-kidney-disease (1)

This dataset can be used to predict chronic kidney disease; it was collected at a hospital over a period of nearly 2 months. Attribute information: We use 24 + class = 25 (11 numeric, 14…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 26 features - classes - 1009 missing values

chronic-kidney-disease (2)

Context: This dataset is originally from the UCI Machine Learning Repository. The objective is to diagnostically predict whether or not a patient has chronic kidney disease, based on…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 14 features - 0 classes - 0 missing values

Olivetti_Faces (1)

This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. As described on the original website: There are ten different images of each of 40…

53 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400 instances - 4097 features - 40 classes - 0 missing values
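
The Olivetti_Faces entry lists 4097 features for 400 images. scikit-learn distributes the same faces as 64 x 64 grayscale images, so one plausible reading is 4096 pixel columns plus one class column. The sketch below assumes exactly that layout, with pixels stored row by row and the class in the last column; neither detail is stated in this listing, and `split_face_row` is a hypothetical helper.

```python
from typing import Any, List, Sequence, Tuple

SIDE = 64  # assumption: 64 x 64 grayscale images -> 4096 pixel columns


def split_face_row(row: Sequence[Any], side: int = SIDE) -> Tuple[List[List[Any]], Any]:
    """Split one flat row into a side x side image and a trailing class label.

    Assumes pixels are stored row by row and the class is the last column.
    """
    n_pixels = side * side
    if len(row) != n_pixels + 1:
        raise ValueError(f"expected {n_pixels + 1} values, got {len(row)}")
    pixels, label = row[:n_pixels], row[n_pixels]
    # Slice the flat pixel run into consecutive rows of length `side`.
    image = [list(pixels[r * side:(r + 1) * side]) for r in range(side)]
    return image, label


image, label = split_face_row(list(range(4096)) + ["subject_7"])
print(len(image), len(image[0]), image[1][0], label)  # 64 64 64 subject_7
```

If the actual column order differs (for example, class first), only the slicing of `pixels` and `label` needs to change.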

mw1 (1)

This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable,…

765 runs - 0 likes - 0 downloads - 0 reach - 0 impact
403 instances - 38 features - 2 classes - 0 missing values

user-knowledge (1)

Title: User Knowledge Modeling Data Set. Abstract: A real-world dataset on students' knowledge status in the subject of Electrical DC Machines. The dataset was obtained from Ph.D.…

153 runs - 0 likes - 0 downloads - 0 reach - 0 impact
403 instances - 6 features - 5 classes - 0 missing values

youtube (2)

The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
404 instances - 31 features - classes - 0 missing values

QSAR-TID-148 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 148, and it has 404 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
404 instances - 1026 features - 0 classes - 0 missing values

youtube (3)

The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
404 instances - 31 features - classes - 0 missing values

AP_Endometrium_Breast (1)

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…

82 runs - 0 likes - 0 downloads - 0 reach - 0 impact
405 instances - 10936 features - 2 classes - 0 missing values

cars (1)

The Committee on Statistical Graphics of the American Statistical Association (ASA) invites you to participate in its Second (1983) Exposition of Statistical Graphics Technology. The purposes of the…

164 runs - 0 likes - 0 downloads - 0 reach - 0 impact
406 instances - 8 features - 3 classes - 14 missing values

cars (2)

Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…

718 runs - 0 likes - 0 downloads - 0 reach - 0 impact
406 instances - 9 features - 2 classes - 14 missing values

QSAR-TID-12067 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12067, and it has 406 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
406 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-12471 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12471, and it has 406 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
406 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-12128 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12128, and it has 407 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
407 instances - 1026 features - 0 classes - 0 missing values

lizards_dataset (1)

Real-world data set about the perching behaviour of two species of lizards on South Bimini island, from Schoener (1968). The lizards data set contains the following variables: Species (the species…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
409 instances - 3 features - classes - 0 missing values

AP_Colon_Uterus (1)

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…

65 runs - 0 likes - 0 downloads - 0 reach - 0 impact
410 instances - 10936 features - 2 classes - 0 missing values

QSAR-TID-236 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 236, and it has 411 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
411 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-11636 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11636, and it has 411 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
411 instances - 1026 features - 0 classes - 0 missing values

braziltourism (2)

Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…

721 runs - 0 likes - 0 downloads - 0 reach - 0 impact
412 instances - 9 features - 2 classes - 96 missing values

AP_Colon_Lung (1)

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…

65 runs - 0 likes - 0 downloads - 0 reach - 0 impact
412 instances - 10936 features - 2 classes - 0 missing values

QSAR-TID-168 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 168, and it has 412 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
412 instances - 1026 features - 0 classes - 0 missing values

braziltourism (1)

analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…

1187 runs - 0 likes - 0 downloads - 0 reach - 0 impact
412 instances - 9 features - 7 classes - 96 missing values

AP_Breast_Prostate (1)

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…

77 runs - 0 likes - 0 downloads - 0 reach - 0 impact
413 instances - 10936 features - 2 classes - 0 missing values

QSAR-TID-11402 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11402, and it has 413 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
413 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-11785 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11785, and it has 413 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
413 instances - 1026 features - 0 classes - 0 missing values

tr11.wc (1)

No data.

219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 6430 features - 9 classes - 0 missing values

19Realestatevaluation (1)

19Realestatevaluation

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 8 features - classes - 0 missing values

pbc (2)

Primary Biliary Cirrhosis: The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…

18 runs - 0 likes - 0 downloads - 0 reach - 0 impact
418 instances - 20 features - 0 classes - 1033 missing values

pbc (3)

Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…

723 runs - 0 likes - 0 downloads - 0 reach - 0 impact
418 instances - 19 features - 2 classes - 1239 missing values

pbc (1)

Case number deleted. X treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…

10 runs - 0 likes - 0 downloads - 0 reach - 0 impact
418 instances - 19 features - 0 classes - 1239 missing values

AP_Breast_Omentum (1)

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…

78 runs - 0 likes - 0 downloads - 0 reach - 0 impact
421 instances - 10936 features - 2 classes - 0 missing values

dgf_96f4164d-956d-4c1c-b161-68724eb0ccdc (19)

Arbres urbains (urban trees)

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
421 instances - 3 features - 1 class - 0 missing values

QSAR-TID-10574 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10574, and it has 422 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
422 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-251 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 251, and it has 426 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
426 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-10878 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10878, and it has 427 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
427 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-262 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 262, and it has 429 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
429 instances - 1026 features - 0 classes - 0 missing values

QSAR-TID-11379 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11379, and it has 429 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
429 instances - 1026 features - 0 classes - 0 missing values

vote (1)

1. Title: 1984 United States Congressional Voting Records Database 2. Source Information: (a) Source: Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional…

2262 runs - 0 likes - 0 downloads - 0 reach - 0 impact
435 instances - 17 features - 2 classes - 392 missing values

youtube-spam-lmfao (1)

A public set of comments collected for spam research. It has five datasets composed of 1,956 real messages extracted from five videos that were among the 10 most viewed during the collection period.…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
438 instances - 5 features - classes - 0 missing values

NBA-2k20-player-dataset (1)

Context: NBA 2K20 analysis. Content: Detailed attributes for players registered in NBA 2K20. Acknowledgements: Data scraped from https://hoopshype.com/nba2k/. Additional data about countries and…

0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
439 instances - 15 features - classes - 92 missing values

wholesale-customers (1)

Title: Wholesale customers Data Set. Abstract: The data set refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories…

161 runs - 0 likes - 0 downloads - 0 reach - 0 impact
440 instances - 9 features - 2 classes - 0 missing values

QSAR-TID-17081 (1)

This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 17081, and it has 440 rows and 1026 features (including…

1 run - 0 likes - 0 downloads - 0 reach - 0 impact
440 instances - 1026 features - 0 classes - 0 missing values

lungcancer_shedden (1)

Michel Lang. fRMA-normalized. Only "Kratz-genes"*. * (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…

3 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 24 features - 0 classes - 0 missing values

Diabetes(scikit-learn) (1)

Diabetes dataset: Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…