Higgs

Status: active. Format: ARFF. License: Unknown. Visibility: public. Uploaded 14-06-2023 by Matthias Feurer.

This is a classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not.

## Information

The data was produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21; these are high-level features derived by physicists to help discriminate between the two classes. There is interest in using deep learning methods to obviate the need for physicists to manually develop such features. Benchmark results using boosted decision trees from a standard physics package and 5-layer neural networks are presented in the original paper. The last 500,000 examples are used as a test set.

## Attribute Information

The first column is the class label (1 for signal, 0 for background), followed by the 28 features (21 low-level features, then 7 high-level features): lepton pT, lepton eta, lepton phi, missing energy magnitude, missing energy phi, jet 1 pt, jet 1 eta, jet 1 phi, jet 1 b-tag, jet 2 pt, jet 2 eta, jet 2 phi, jet 2 b-tag, jet 3 pt, jet 3 eta, jet 3 phi, jet 3 b-tag, jet 4 pt, jet 4 eta, jet 4 phi, jet 4 b-tag, m_jj, m_jjj, m_lv, m_jlv, m_bb, m_wbb, m_wwbb. For more detailed information about each feature, see the original paper.

## Notes by Uploader to OpenML

* This is the 11M version from UCI.
* This dataset is described in version 1 of arXiv:1402.4735; version 2 on arXiv and the subsequent NeurIPS publication used larger versions of this dataset.
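The column layout and the held-out test split described above can be sketched in a few lines of Python. This is a minimal illustration, not the uploader's loading code: the constant names and the `split_train_test` helper are hypothetical, and the rows are assumed to already be in memory in their original file order.

```python
# Column order as stated in the description: class label first, then
# 21 low-level features, then 7 high-level features.
LABEL = "Target"
LOW_LEVEL = [
    "lepton pT", "lepton eta", "lepton phi",
    "missing energy magnitude", "missing energy phi",
] + [
    f"jet {i} {quantity}"
    for i in range(1, 5)
    for quantity in ("pt", "eta", "phi", "b-tag")
]
HIGH_LEVEL = ["m_jj", "m_jjj", "m_lv", "m_jlv", "m_bb", "m_wbb", "m_wwbb"]
COLUMNS = [LABEL] + LOW_LEVEL + HIGH_LEVEL


def split_train_test(rows, n_test=500_000):
    """Hold out the last n_test rows as the test set (hypothetical helper)."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


# Toy usage with 10 stand-in rows and a 3-row test set.
train, test = split_train_test(list(range(10)), n_test=3)
print(len(COLUMNS), len(train), len(test))
```

Because the split is positional, it only matches the description if the rows have not been shuffled after loading.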

29 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Target (target) | nominal | 2 | 0 |
| lepton pT | numeric | 27983 | 0 |
| lepton eta | numeric | 5001 | 0 |
| lepton phi | numeric | 6284 | 0 |
| missing energy magnitude | numeric | 1249475 | 0 |
| missing energy phi | numeric | 2218464 | 0 |
| jet 1 pt | numeric | 45559 | 0 |
| jet 1 eta | numeric | 5999 | 0 |
| jet 1 phi | numeric | 6284 | 0 |
| jet 1 b-tag | numeric | 3 | 0 |
| jet 2 pt | numeric | 37793 | 0 |
| jet 2 eta | numeric | 5999 | 0 |
| jet 2 phi | numeric | 6284 | 0 |
| jet 2 b-tag | numeric | 3 | 0 |
| jet 3 pt | numeric | 27073 | 0 |
| jet 3 eta | numeric | 5999 | 0 |
| jet 3 phi | numeric | 6284 | 0 |
| jet 3 b-tag | numeric | 3 | 0 |
| jet 4 pt | numeric | 19881 | 0 |
| jet 4 eta | numeric | 5999 | 0 |
| jet 4 phi | numeric | 6284 | 0 |
| jet 4 b-tag | numeric | 3 | 0 |
| m_jj | numeric | 1068674 | 0 |
| m_jjj | numeric | 495814 | 0 |
| m_lv | numeric | 344267 | 0 |
| m_jlv | numeric | 524126 | 0 |
| m_bb | numeric | 1127135 | 0 |
| m_wbb | numeric | 691289 | 0 |
| m_wwbb | numeric | 767597 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 11000000 |
| Number of attributes (columns) of the dataset | 29 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 28 |
| Number of nominal attributes | 1 |
| Percentage of binary attributes | 3.45 |
| Percentage of instances having missing values | 0 |
| Percentage of missing values | 0 |
| Average class difference between consecutive instances | 0.5 |
| Percentage of numeric attributes | 96.55 |
| Number of attributes divided by the number of instances | 0 |
| Percentage of nominal attributes | 3.45 |
| Percentage of instances belonging to the most frequent class | 52.99 |
| Number of instances belonging to the most frequent class | 5829123 |
| Percentage of instances belonging to the least frequent class | 47.01 |
| Number of instances belonging to the least frequent class | 5170877 |
| Number of binary attributes | 1 |
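The percentage properties above follow directly from the listed counts. The sketch below recomputes them as a consistency check; the variable names and the `pct` helper are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Counts copied from the property list above.
n_rows = 11_000_000
n_majority = 5_829_123
n_minority = 5_170_877
n_attributes = 29
n_numeric = 28
n_nominal = 1


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# The two class counts should account for every row.
assert n_majority + n_minority == n_rows

majority_pct = pct(n_majority, n_rows)     # most frequent class share
minority_pct = pct(n_minority, n_rows)     # least frequent class share
numeric_pct = pct(n_numeric, n_attributes)
nominal_pct = pct(n_nominal, n_attributes)

# 29 columns over 11M rows is tiny, so it shows as 0 at two decimals.
attrs_per_row = round(n_attributes / n_rows, 2)

print(majority_pct, minority_pct, numeric_pct, nominal_pct, attrs_per_row)
```

One practical reading of these numbers: a classifier that always predicts the most frequent class would score about 52.99% accuracy on the full dataset, which is the floor any model should beat.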

2 tasks

* 0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: Target
* 0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Target
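To give a sense of what the two estimation procedures imply at this scale, the sketch below computes fold boundaries for 4-fold and 10-fold cross-validation over 11,000,000 rows. It assumes plain contiguous, unshuffled folds; the page does not say how OpenML assigns rows to folds for these tasks (it may shuffle or stratify), and `fold_bounds` is a hypothetical helper.

```python
def fold_bounds(n_rows, k):
    """Return k half-open (start, stop) test intervals covering range(n_rows).

    Folds are contiguous and differ in size by at most one row.
    """
    if not 1 < k <= n_rows:
        raise ValueError("k must satisfy 1 < k <= n_rows")
    base, extra = divmod(n_rows, k)
    bounds = []
    start = 0
    for i in range(k):
        # The first `extra` folds absorb one leftover row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


N_ROWS = 11_000_000
four_fold = fold_bounds(N_ROWS, 4)
ten_fold = fold_bounds(N_ROWS, 10)

for name, bounds in (("4-fold", four_fold), ("10-fold", ten_fold)):
    sizes = [stop - start for start, stop in bounds]
    print(name, "test fold sizes:", sizes[0], "to", sizes[-1])
```

In each round, one interval serves as the test fold and the remaining rows form the training set, so every row is tested exactly once.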