OpenML
Detailed movie descriptions - ideal for Recommendation Engines
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4803 instances - 11 features - classes - 514 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - classes - 0 missing values
Content This dataset comprises various house listings in London and the neighbouring region. It also includes the parameters listed below, whose definitions are largely self-explanatory.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3480 instances - 11 features - classes - 962 missing values
Context Turkey COVID-19 Dataset. Data source: https://covid19.saglik.gov.tr/ Content This data set was created from the data shared by the Ministry of Health of the Republic of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
140 instances - 11 features - classes - 0 missing values
Yu-Gi-Oh! is a Japanese manga series about gaming written and illustrated by Kazuki Takahashi. It was serialized in Shueisha's Weekly Shōnen Jump magazine between September 30, 1996 and March 8, 2004.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
156 instances - 11 features - classes - 23 missing values
Context Welcome. This is a Women's Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
23486 instances - 11 features - classes - 4697 missing values
Context Welcome. This is a Women's Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
23486 instances - 11 features - classes - 4697 missing values
Context The number of patients with liver disease has been continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles and drugs. This dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - 0 classes - 4 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13376 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13376 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13376 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
566602 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16714 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset covertype (44121) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset covertype (44121) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset MagicTelescope (44125) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset MagicTelescope (44125) with seed=4 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset MagicTelescope (44125) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset MagicTelescope (44125) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset MagicTelescope (44125) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample(…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset covertype (44121) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset covertype (44121) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset covertype (44121) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
The ILPD liver dataset from the OpenML-CC18 suite, with gender binary-encoded so that all features are numeric
1 run - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - 2 classes - 0 missing values
The ILPD dataset from the OpenCC18 with all categorical variables label encoded
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - 0 classes - 0 missing values
Data Set Information This data set contains 416 liver patient records and 167 non-liver patient records. The data set was collected from test samples in the north-east of Andhra Pradesh, India.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - classes - 4 missing values
Law School Admissions (Binarized) Survey among students attending law school in the U.S. in 1991. The dataset was obtained from the R-package fairml. The response variable has been changed to a binary…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20800 instances - 11 features - 2 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
This dataset contains, for each Premier League match of the 2014-2015 season, the probabilities generated with the L2F models, as well as the match odds.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
323 instances - 11 features - classes - 0 missing values
Touch Signals
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
265 instances - 11 features - classes - 0 missing values
Touch samples 2
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
265 instances - 11 features - 8 classes - 0 missing values
This dataset contains data about the physical and chemical properties of the Li-ion silicate cathodes. These properties can be useful to predict the class of a Li-ion battery. These batteries can be…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
339 instances - 11 features - classes - 0 missing values
Context Space Apps Moscow was held on April 29th and 30th. Thank you to the 175 people who joined the International Space Apps Challenge at this location! Content The dataset contains such columns as:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32686 instances - 11 features - classes - 0 missing values
Background When is my university campus gym least crowded, so I know when to work out? We measured how many people were in this gym once every 10 minutes over the last year. We want to be able to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
62184 instances - 11 features - classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 11 features - 0 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features"…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Israeli lottery
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1153 instances - 11 features - classes - 0 missing values
e fvr
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 11 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16598 instances - 11 features - classes - 329 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7411 instances - 11 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7384 instances - 11 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7158 instances - 11 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7628 instances - 11 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7152 instances - 11 features - classes - 0 missing values
Content This database contains six basic emotions (happiness, surprise, anger, fear, disgust, and sadness) of normalized (average mean reference) data and collected from 85 undergraduate university…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
190967 instances - 11 features - classes - 0 missing values
Context In the dataset freMTPL2freq, risk features and claim numbers were collected for 677,991 motor third-party liability policies (observed over one year). Content freMTPL2freq contains 11 columns…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
678013 instances - 11 features - classes - 0 missing values
Context Cytology features of breast cancer biopsy. It can be used to predict breast cancer from cytology features. The data was obtained from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
699 instances - 11 features - classes - 16 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16714 instances - 11 features - 2 classes - 0 missing values
Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16714 instances - 11 features - 2 classes - 0 missing values
Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years. ## Description Banks play a crucial role in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150000 instances - 11 features - 2 classes - 33655 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 0 downloads - 0 reach - 0 impact
829201 instances - 11 features - 10 classes - 0 missing values
This is an artificial data set used in Friedman (1991) and also described in Breiman (1996, p. 139). The cases are generated using the following method: Generate the values of 10 attributes, X1, ...,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
40768 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 11 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
118 runs - 0 likes - 0 downloads - 0 reach - 0 impact
195 instances - 11 features - 2 classes - 2 missing values
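The mean-based binarization mentioned in the description above can be sketched roughly as follows. This is only an illustration: the function name `binarize_by_mean` is hypothetical, and because the description is truncated, which side of the mean receives which label (and the label names themselves) is an assumption made here, not something stated in the listing.

```python
from statistics import mean


def binarize_by_mean(targets, below_label="P", other_label="N"):
    """Turn a numeric target into a two-class nominal target.

    Assumption: values strictly below the mean get `below_label` and all
    other values get `other_label`. The truncated description does not
    say which side is which, so this direction is illustrative only.
    """
    threshold = mean(targets)
    return [below_label if t < threshold else other_label for t in targets]


# Mean of these values is 4.0, so only the last one is not below it.
labels = binarize_by_mean([1.0, 2.0, 3.0, 10.0])
print(labels)
```

Using the mean as the cut point can give unbalanced classes when the target is skewed, as in the small example here.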
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
575 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
615 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
748 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
985 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
785 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
733 runs - 0 likes - 0 downloads - 0 reach - 0 impact
87 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
773 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 11 features - 2 classes - 0 missing values
1. Title: Social Workers Decisions (Ordinal SWD) 2. Source Information: Donor: Arie Ben David MIS, Dept. of Technology Management Holon Academic Inst. of Technology 52 Golomb St. Holon 58102 Israel…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 11 features - 0 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 11 features - 2 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 11 features - 0 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 11 features - 0 classes - 0 missing values
See [https://github.com/slds-lmu/paper_2023_ci_for_ge](https://github.com/slds-lmu/paper_2023_ci_for_ge) for a description.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5100000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from J.H. Friedman (1999), Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500 instances - 11 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
791 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
988 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
1024 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
104 runs - 0 likes - 0 downloads - 0 reach - 0 impact
57 instances - 11 features - 2 classes - 1 missing values
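The majority-class relabeling described above could look roughly like the sketch below. The description only confirms that the majority class becomes positive ('P'); the negative label 'N' for all remaining classes, the function name `binarize_majority`, and the tie-breaking behaviour are assumptions made for illustration.

```python
from collections import Counter


def binarize_majority(labels, positive="P", negative="N"):
    """Relabel the most frequent class as `positive`, everything else as `negative`.

    Assumption: ties for the most frequent class are broken in favour of
    the class seen first, which is how Counter.most_common orders equal counts.
    """
    majority, _count = Counter(labels).most_common(1)[0]
    return [positive if y == majority else negative for y in labels]


# "a" occurs three times out of five, so it becomes the positive class.
binary = binarize_majority(["a", "b", "a", "c", "a"])
print(binary)
```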
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
177147 instances - 11 features - 0 classes - 0 missing values
Rotating hyperplane is a stream generator that generates d-dimensional classification problems in which the prediction is defined by a rotating hyperplane. By changing the orientation and position of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500000 instances - 11 features - classes - 0 missing values
Diabetes dataset: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
Context Buying a diamond can be frustrating and expensive. It inspired me to create this dataset of 119K natural and lab-created diamonds from brilliantearth.com to demystify the value of the 4 Cs…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
119307 instances - 11 features - classes - 0 missing values
Context and Content The COVID-19 case surveillance system database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8405079 instances - 11 features - classes - 9543526 missing values
Context This IMDb Indonesian Movies Dataset contains information on 1262 Indonesian movies. The data was gathered using IMDb-Scraper and then converted and cleaned into a .csv file.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1272 instances - 11 features - classes - 1774 missing values
Context There are some great UFC datasets out there, but I could not find one that included gambling odds. So I went and made one myself. This dataset focuses very generally on the fights and hopes to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5528 instances - 11 features - classes - 14168 missing values
Context The number of patients with liver disease has been continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles and drugs. This dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
583 instances - 11 features - 0 classes - 4 missing values
Context The dataset provides user reviews on specific drugs along with related conditions, side effects, age, sex, and ratings reflecting overall patient satisfaction. Content Data was acquired by…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
362806 instances - 11 features - classes - 42 missing values
Subsampling of the dataset credit (44089) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset credit (44089) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values
Subsampling of the dataset credit (44089) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True Generated with the following source code: ```python def subsample( self,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2000 instances - 11 features - 2 classes - 0 missing values