ATLAS-Higgs-Boson-Machine-Learning-Challenge-2014

Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 04-06-2023 by Matthias Feurer


This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the documentation of the data. Further information about the challenge can be found on [Kaggle](https://www.kaggle.com/competitions/higgs-boson/), [the challenge website](https://higgsml.ijclab.in2p3.fr), and the [PMLR competition proceedings](http://proceedings.mlr.press/v42/).

Notes:

* This version encodes -999 as NaN.
* This version only contains the data used by the Kaggle competition (first 800k samples).
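The NaN encoding matters when comparing against code written for files that mark undefined values with -999. A minimal sketch of converting between the two encodings with the standard library, assuming rows are held as dictionaries of floats; the helper names `to_nan` and `to_sentinel` are hypothetical and not part of any OpenML API:

```python
import math

SENTINEL = -999.0  # value used for "undefined" before the NaN re-encoding

def to_nan(row):
    """Return a copy of row with the -999 sentinel replaced by NaN."""
    return {k: (math.nan if v == SENTINEL else v) for k, v in row.items()}

def to_sentinel(row):
    """Return a copy of row with NaN replaced by the -999 sentinel."""
    return {
        k: (SENTINEL if isinstance(v, float) and math.isnan(v) else v)
        for k, v in row.items()
    }

# Illustrative values only, not taken from the dataset.
original = {"DER_mass_MMC": -999.0, "DER_mass_vis": 97.8, "PRI_jet_num": 2.0}
as_nan = to_nan(original)
round_trip = to_sentinel(as_nan)
```

Because NaN never compares equal to itself, the check for missing entries goes through `math.isnan` rather than `==`.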

31 features

| Feature | Role | Type | Unique values | Missing |
|---|---|---|---|---|
| Label | target | nominal | 2 | 0 |
| EventId | row identifier | numeric | 800000 | 0 |
| DER_mass_MMC | | numeric | 176176 | 121936 |
| DER_mass_transverse_met_lep | | numeric | 131451 | 0 |
| DER_mass_vis | | numeric | 153364 | 0 |
| DER_pt_h | | numeric | 189499 | 0 |
| DER_deltaeta_jet_jet | | numeric | 7650 | 567329 |
| DER_mass_jet_jet | | numeric | 194275 | 567329 |
| DER_prodeta_jet_jet | | numeric | 21795 | 567329 |
| DER_deltar_tau_lep | | numeric | 5014 | 0 |
| DER_pt_tot | | numeric | 84176 | 0 |
| DER_sum_pt | | numeric | 283219 | 0 |
| DER_pt_ratio_lep_tau | | numeric | 7427 | 0 |
| DER_met_phi_centrality | | numeric | 2830 | 0 |
| DER_lep_eta_centrality | | numeric | 1001 | 567329 |
| PRI_tau_pt | | numeric | 85907 | 0 |
| PRI_tau_eta | | numeric | 4979 | 0 |
| PRI_tau_phi | | numeric | 6286 | 0 |
| PRI_lep_pt | | numeric | 88154 | 0 |
| PRI_lep_eta | | numeric | 5003 | 0 |
| PRI_lep_phi | | numeric | 6286 | 0 |
| PRI_met | | numeric | 125278 | 0 |
| PRI_met_phi | | numeric | 6286 | 0 |
| PRI_met_sumet | | numeric | 344910 | 0 |
| PRI_jet_num | | numeric | 4 | 0 |
| PRI_jet_leading_pt | | numeric | 150725 | 320069 |
| PRI_jet_leading_eta | | numeric | 8897 | 320069 |
| PRI_jet_leading_phi | | numeric | 6286 | 320069 |
| PRI_jet_subleading_pt | | numeric | 74366 | 567329 |
| PRI_jet_subleading_eta | | numeric | 8929 | 567329 |
| PRI_jet_subleading_phi | | numeric | 6286 | 567329 |
| PRI_jet_all_pt | | numeric | 205282 | 0 |
| Weight | ignore | numeric | 332024 | 0 |
| KaggleSet | ignore | nominal | 3 | 0 |
| KaggleWeight | ignore | numeric | 332900 | 0 |
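The listing above has more entries than the stated 31 features. One reading that matches the counts, offered here as an assumption rather than documented OpenML behaviour, is that the row identifier and the three columns marked "ignore" are not counted. A short sketch that groups the columns by role and checks this (the grouping variable names are illustrative):

```python
# Column names copied from the feature listing; grouping names are illustrative.
DERIVED = [
    "DER_mass_MMC", "DER_mass_transverse_met_lep", "DER_mass_vis", "DER_pt_h",
    "DER_deltaeta_jet_jet", "DER_mass_jet_jet", "DER_prodeta_jet_jet",
    "DER_deltar_tau_lep", "DER_pt_tot", "DER_sum_pt", "DER_pt_ratio_lep_tau",
    "DER_met_phi_centrality", "DER_lep_eta_centrality",
]
PRIMITIVE = [
    "PRI_tau_pt", "PRI_tau_eta", "PRI_tau_phi",
    "PRI_lep_pt", "PRI_lep_eta", "PRI_lep_phi",
    "PRI_met", "PRI_met_phi", "PRI_met_sumet", "PRI_jet_num",
    "PRI_jet_leading_pt", "PRI_jet_leading_eta", "PRI_jet_leading_phi",
    "PRI_jet_subleading_pt", "PRI_jet_subleading_eta", "PRI_jet_subleading_phi",
    "PRI_jet_all_pt",
]
TARGET = "Label"
ROW_ID = "EventId"
IGNORED = ["Weight", "KaggleSet", "KaggleWeight"]

listed = [TARGET, ROW_ID] + DERIVED + PRIMITIVE + IGNORED  # every listed entry
model_inputs = DERIVED + PRIMITIVE                         # candidate predictors
counted = [TARGET] + model_inputs                          # assumed basis of "31"

print(len(listed), len(model_inputs), len(counted))
```

Under this reading, a model would be trained on the 30 `DER_` and `PRI_` columns with `Label` as the target, leaving `EventId` and the ignored columns out of the inputs.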

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 800000 |
| Number of attributes (columns) of the dataset | 31 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 5053446 |
| Number of instances with at least one value missing | 581423 |
| Number of numeric attributes | 30 |
| Number of nominal attributes | 1 |
| Percentage of binary attributes | 3.23 |
| Percentage of instances having missing values | 72.68 |
| Percentage of missing values | 20.38 |
| Average class difference between consecutive instances | 0.55 |
| Percentage of numeric attributes | 96.77 |
| Number of attributes divided by the number of instances | 0 |
| Percentage of nominal attributes | 3.23 |
| Percentage of instances belonging to the most frequent class | 65.83 |
| Number of instances belonging to the most frequent class | 526625 |
| Percentage of instances belonging to the least frequent class | 34.17 |
| Number of instances belonging to the least frequent class | 273375 |
| Number of binary attributes | 1 |
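Several of the percentages above follow directly from the raw counts on this page. As a quick consistency check, the sketch below recomputes them from the per-feature missing counts and the class sizes; all numbers are copied from this page and the variable names are illustrative:

```python
n_rows = 800_000
n_cols = 31

# Per-feature missing count -> number of features with that count,
# read off the feature listing.
missing_per_column = {121_936: 1, 320_069: 3, 567_329: 7}
total_missing = sum(count * n for count, n in missing_per_column.items())

pct_missing_values = round(100 * total_missing / (n_rows * n_cols), 2)
pct_rows_with_missing = round(100 * 581_423 / n_rows, 2)
pct_majority = round(100 * 526_625 / n_rows, 2)
pct_minority = round(100 * 273_375 / n_rows, 2)
pct_numeric = round(100 * 30 / n_cols, 2)
pct_nominal = round(100 * 1 / n_cols, 2)

print(total_missing, pct_missing_values, pct_rows_with_missing)
print(pct_majority, pct_minority, pct_numeric, pct_nominal)
```

The ratio of attributes to instances is 31 / 800000, roughly 0.00004, which is why it appears as 0 after rounding.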

1 task

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: Label
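For orientation, 4-fold cross-validation partitions the rows into four folds and uses each fold once as the test set while training on the other three. The sketch below illustrates the idea with the standard library only. It is a generic shuffled split, not a reproduction of the splits attached to the task, and the function name `k_fold_indices` is hypothetical:

```python
import random

def k_fold_indices(n_rows, k=4, seed=0):
    """Return k (train, test) index pairs that partition range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# Small illustrative size; the dataset itself has 800000 rows.
splits = k_fold_indices(20, k=4)
for train, test in splits:
    print(len(train), len(test))
```

With 800000 rows and four folds, each test fold would hold 200000 rows. Since the two classes are imbalanced, a stratified variant that preserves the class proportions in every fold may be preferable in practice.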