TabZilla Hard Datasets

Created 13-06-2023 by Duncan McElfresh. Visibility: public
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
13 runs - 1 like - 1 download - 2 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
Byron Roe (byronroe '@' umich.edu), Department of Physics, University of Michigan, Ann Arbor, MI 48109. This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…
12 runs - 0 likes - 4 downloads - 4 reach - 14 impact
130064 instances - 51 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 download - 1 reach - 18 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data). This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 19 impact
2984 instances - 145 features - 2 classes - 0 missing values
Description: This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
6905 runs - 0 likes - 6 downloads - 6 reach - 20 impact
44819 instances - 7 features - 3 classes - 0 missing values
This dataset was retrieved 2014-11-14 from the UCI site and converted to the ARFF format. Major changes w.r.t. version 3: dataset from UCI that matches description and data types. Feature…
4207 runs - 1 like - 10 downloads - 11 reach - 16 impact
690 instances - 15 features - 2 classes - 0 missing values
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28211 runs - 19 likes - 170 downloads - 189 reach - 36 impact
8378 instances - 121 features - 2 classes - 18372 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.), Priscilla Koch Wagner (Wagner, P. K.), Sarajane Marques Peres (Peres, S. M.); {renata.si, priscilla.wagner, sarajane} at usp.br…
26636 runs - 1 like - 18 downloads - 19 reach - 40 impact
9873 instances - 33 features - 5 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48680 runs - 2 likes - 40 downloads - 42 reach - 37 impact
3751 instances - 1777 features - 2 classes - 0 missing values
QSAR biodegradation Data Set. Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). …
267861 runs - 1 like - 25 downloads - 26 reach - 30 impact
1055 instances - 42 features - 2 classes - 0 missing values
The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). Five different attributes were chosen to characterize each vowel: they are the amplitudes of the first five…
218957 runs - 6 likes - 41 downloads - 47 reach - 32 impact
5404 instances - 6 features - 2 classes - 0 missing values
1. Data set title: Nomao Data Set
2. Abstract: Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists of detecting what data refer to the same place.…
67704 runs - 0 likes - 16 downloads - 16 reach - 30 impact
34465 instances - 119 features - 2 classes - 0 missing values
Description: This is a data set containing 1080 documents of free text business descriptions of Brazilian companies categorized into a subset of 9 categories. Source: Patrick Marques…
34579 runs - 0 likes - 17 downloads - 17 reach - 59 impact
1080 instances - 857 features - 9 classes - 0 missing values
Airlines Dataset. Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given information about the scheduled departure.
291 runs - 0 likes - 31 downloads - 31 reach - 17 impact
539383 instances - 8 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead feature extractors of…
161516 runs - 2 likes - 29 downloads - 31 reach - 30 impact
2109 instances - 22 features - 2 classes - 0 missing values
Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge…
105345 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4562 instances - 49 features - 2 classes - 0 missing values
PRO FOOTBALL SCORES (raw data appears after the description below) How well do the oddsmakers of Las Vegas predict the outcome of professional football games? Is there really a home field advantage -…
16080 runs - 0 likes - 0 downloads - 0 reach - 0 impact
672 instances - 10 features - 2 classes - 1200 missing values
NAME: vehicle silhouettes. PURPOSE: to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
31976 runs - 2 likes - 35 downloads - 37 reach - 14 impact
846 instances - 19 features - 4 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix:
```
          Good   Bad   (predicted)
Good        0     1    (actual)
Bad         5     0
```
It is worse…
506311 runs - 28 likes - 312 downloads - 340 reach - 35 impact
1000 instances - 21 features - 2 classes - 0 missing values
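As a rough sketch of how the credit-risk cost matrix can be applied, the snippet below scores a set of predictions by summing the matrix entries, and picks the label with the lower expected cost for a given probability of "bad". The class labels "good"/"bad", their row/column order, and the function names `total_cost` and `min_cost_label` are assumptions made for illustration; they are not part of the dataset's published interface.

```python
# Cost matrix quoted in the credit-risk description.
# Rows index the actual class, columns the predicted class.
# ASSUMPTION: class labels and order are ("good", "bad").
LABELS = ("good", "bad")
COST = [
    [0, 1],  # actual good: predicted good costs 0, predicted bad costs 1
    [5, 0],  # actual bad:  predicted good costs 5, predicted bad costs 0
]


def total_cost(y_true, y_pred):
    """Sum the cost-matrix entry for each (actual, predicted) pair."""
    index = {label: i for i, label in enumerate(LABELS)}
    return sum(COST[index[t]][index[p]] for t, p in zip(y_true, y_pred))


def min_cost_label(p_bad):
    """Return the label with the lower expected cost, given P(actual = bad).

    Predicting "good" costs 5 when the applicant is actually bad,
    so its expected cost is 5 * p_bad. Predicting "bad" costs 1 when
    the applicant is actually good: 1 * (1 - p_bad).
    """
    cost_if_predict_good = COST[1][0] * p_bad
    cost_if_predict_bad = COST[0][1] * (1 - p_bad)
    return "bad" if cost_if_predict_bad < cost_if_predict_good else "good"


if __name__ == "__main__":
    y_true = ["good", "bad", "bad", "good"]
    y_pred = ["good", "good", "bad", "bad"]
    print(total_cost(y_true, y_pred))  # 0 + 5 + 0 + 1 = 6
    print(min_cost_label(0.10), min_cost_label(0.30))  # good bad
```

Solving 1 - p = 5p gives a break-even point of p = 1/6, so with calibrated probabilities a cost-aware classifier for this dataset would flag an applicant as "bad" once P(bad) exceeds about 0.17, rather than at the usual 0.5.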
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
34884 runs - 0 likes - 23 downloads - 23 reach - 14 impact
2000 instances - 48 features - 10 classes - 0 missing values
This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are…
30116 runs - 2 likes - 18 downloads - 20 reach - 17 impact
625 instances - 5 features - 3 classes - 0 missing values