MagicTelescope

active, ARFF, publicly available, visibility: public, uploaded 07-10-2014 by Joaquin Vanschoren
Tags: artificial, astrophysics, gamma_rays, Machine Learning, mythbusting_1, OpenML100, Physics, study_1, study_123, study_14, study_15, study_20, study_34, study_7


Author: R. K. Bock, Major Atmospheric Gamma Imaging Cherenkov Telescope project (MAGIC)
Donated by: P. Savicky, Institute of Computer Science, AS of CR, Czech Republic
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope)
Please cite: Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jirina, M., Klaschka, J., Kotrc, E., Savicky, P., Towers, S., Vaicilius, A., Wittek, W. (2004). Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl.Instr.Meth. A, 516, pp. 511-528.
The data are Monte Carlo (MC) generated (see below) to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. A Cherenkov gamma telescope observes high-energy gamma rays by taking advantage of the radiation emitted by charged particles produced inside the electromagnetic showers that the gammas initiate and that develop in the atmosphere. This Cherenkov radiation (of visible to UV wavelengths) leaks through the atmosphere and is recorded in the detector, allowing reconstruction of the shower parameters. The available information consists of pulses left by the incoming Cherenkov photons on the photomultiplier tubes, which are arranged in a plane, the camera. Depending on the energy of the primary gamma, a total of a few hundred to some 10000 Cherenkov photons are collected in patterns (called the shower image), which makes it possible to discriminate statistically between images caused by primary gammas (signal) and images of hadronic showers initiated by cosmic rays in the upper atmosphere (background).
Typically, the image of a shower after some pre-processing is an elongated cluster. Its long axis is oriented towards the camera center if the shower axis is parallel to the telescope's optical axis, i.e. if the telescope axis is directed towards a point source.
A principal component analysis is performed in the camera plane, which results in a correlation axis and defines an ellipse. If the depositions were distributed as a bivariate Gaussian, this would be an equidensity ellipse. The characteristic parameters of this ellipse (often called Hillas parameters) are among the image parameters that can be used for discrimination. The energy depositions are typically asymmetric along the major axis, and this asymmetry can also be used in discrimination. There are, in addition, further discriminating characteristics, such as the extent of the cluster in the image plane or the total sum of depositions.
The data set was generated by a Monte Carlo program, Corsika, described in: D. Heck et al., CORSIKA, A Monte Carlo code to simulate extensive air showers, Forschungszentrum Karlsruhe FZKA 6019 (1998). The program was run with parameters that allow events with energies down to below 50 GeV to be observed.
Attribute Information:
1. fLength: continuous # major axis of ellipse [mm]
2. fWidth: continuous # minor axis of ellipse [mm]
3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]
4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
5. fConc1: continuous # ratio of highest pixel over fSize [ratio]
6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
7. fM3Long: continuous # 3rd root of third moment along major axis [mm]
8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
9. fAlpha: continuous # angle of major axis with vector to origin [deg]
10. fDist: continuous # distance from origin to center of ellipse [mm]
11. class: g,h # gamma (signal), hadron (background)
Class distribution:
g = gamma (signal): 12332
h = hadron (background): 6688
For technical reasons, the number of h events is underestimated. In the real data, the h class represents the majority of the events.
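The principal component analysis in the camera plane described above can be illustrated with a small sketch. It is not the MAGIC reconstruction code: the function name `ellipse_parameters`, the toy three-pixel image, the use of the square roots of the covariance eigenvalues as axis lengths, and the folding of the angle into 0 to 90 degrees are all assumptions made for illustration. Only the general idea (a weighted PCA of pixel contents that yields an ellipse, its distance from the origin, and the angle of its major axis) comes from the description.

```python
import numpy as np

def ellipse_parameters(x, y, w):
    """Weighted PCA of a camera image (illustrative, not the MAGIC code).

    x, y: pixel coordinates in the camera plane [mm]
    w:    non-negative pixel contents [photons]
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    total = w.sum()
    # Intensity-weighted centroid of the image.
    cx, cy = (w * x).sum() / total, (w * y).sum() / total
    dx, dy = x - cx, y - cy
    # Intensity-weighted 2x2 covariance matrix of the pixel positions.
    cxy = (w * dx * dy).sum()
    cov = np.array([[(w * dx * dx).sum(), cxy],
                    [cxy, (w * dy * dy).sum()]]) / total
    # eigh returns eigenvalues in ascending order: minor axis first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    width, length = np.sqrt(np.clip(eigvals, 0.0, None))
    major = eigvecs[:, 1]
    dist = np.hypot(cx, cy)
    # Angle between the major axis and the line from the centroid to the
    # origin, folded into [0, 90] degrees (assumed convention).
    if dist > 0:
        cos_alpha = abs(major @ np.array([cx, cy])) / dist
        alpha = float(np.degrees(np.arccos(np.clip(cos_alpha, 0.0, 1.0))))
    else:
        alpha = 0.0
    return {"length": float(length), "width": float(width),
            "size": float(np.log10(total)), "dist": float(dist),
            "alpha": alpha}

# Toy image: three pixels on a line that points at the camera center.
params = ellipse_parameters(x=[90, 100, 110], y=[0, 0, 0], w=[1, 2, 1])
print(params)
```

In the toy image the cluster lies on a line through the origin, so the major axis points at the camera center and the angle is zero, which is the situation the description associates with a shower axis parallel to the telescope's optical axis.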
Simple classification accuracy is not meaningful for this data, since classifying a background event as signal is worse than classifying a signal event as background. To compare different classifiers, an ROC curve has to be used. The relevant points on this curve are those where the probability of accepting a background event as signal is below one of the following thresholds: 0.01, 0.02, 0.05, 0.1, 0.2, depending on the required quality of the sample of accepted events for different experiments.
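One way to read off the ROC points mentioned above is to compute, for each background acceptance threshold, the highest signal acceptance rate that a score cut can reach without exceeding it. The sketch below does this with plain NumPy; the function name `tpr_at_fpr` and the six toy scores are hypothetical, and real evaluations would use classifier scores on held-out events.

```python
import numpy as np

def tpr_at_fpr(scores, is_signal, max_fpr):
    """Best signal acceptance rate (TPR) among score cuts whose
    background acceptance rate (FPR) does not exceed max_fpr."""
    scores = np.asarray(scores, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    # Sort events from most to least signal-like.
    order = np.argsort(-scores, kind="stable")
    sig = is_signal[order]
    sorted_scores = scores[order]
    tp = np.cumsum(sig)    # signal events accepted at each cut
    fp = np.cumsum(~sig)   # background events accepted at each cut
    # Tied scores cannot be separated by a cut, so keep only the last
    # event of each group of equal scores as a valid cut point.
    last = np.r_[sorted_scores[1:] != sorted_scores[:-1], True]
    # Prepend the "accept nothing" point (0, 0).
    tpr = np.r_[0.0, tp[last] / sig.sum()]
    fpr = np.r_[0.0, fp[last] / (~sig).sum()]
    return float(tpr[fpr <= max_fpr].max())

# Toy example: higher score means "more gamma-like" (hypothetical data).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_signal = [True, True, False, True, False, False]
for threshold in (0.01, 0.02, 0.05, 0.1, 0.2):
    print(threshold, tpr_at_fpr(scores, is_signal, threshold))
```

With only three background events in the toy data, every listed threshold forces the cut to reject all background, so the loop mainly shows the mechanics; on the full data set the five thresholds would generally give different signal acceptance rates.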

12 features

class: (target) nominal, 2 unique values, 0 missing
ID (row identifier) numeric, 19020 unique values, 0 missing
fLength: numeric, 18643 unique values, 0 missing
fWidth: numeric, 18200 unique values, 0 missing
fSize: numeric, 7228 unique values, 0 missing
fConc: numeric, 6410 unique values, 0 missing
fConc1: numeric, 4421 unique values, 0 missing
fAsym: numeric, 18704 unique values, 0 missing
fM3Long: numeric, 18693 unique values, 0 missing
fM3Trans: numeric, 18390 unique values, 0 missing
fAlpha: numeric, 17981 unique values, 0 missing
fDist: numeric, 18437 unique values, 0 missing

107 properties

19020 - Number of instances (rows) of the dataset.
12 - Number of attributes (columns) of the dataset.
2 - Number of distinct values of the target attribute (if it is nominal).
0 - Number of missing values in the dataset.
0 - Number of instances with at least one value missing.
11 - Number of numeric attributes.
1 - Number of nominal attributes.
(no value) - Third quartile of mutual information between the nominal attributes and the target attribute.
0.18 - Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.19 - Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.64 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
(no value) - Average entropy of the attributes.
6688 - Number of instances belonging to the least frequent class.
8.33 - Percentage of nominal attributes.
1.16 - Third quartile of skewness among attributes of the numeric type.
0.59 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.58 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.85 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
4.27 - Mean kurtosis among attributes of the numeric type.
0.76 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
(no value) - First quartile of entropy among attributes.
53.05 - Third quartile of standard deviation of attributes of the numeric type.
0.86 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.79 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.16 - Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
30.68 - Mean of means among attributes of the numeric type.
0.27 - Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
-0.21 - First quartile of kurtosis among attributes of the numeric type.
0.87 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.18 - Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.19 - Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.64 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
(no value) - Average mutual information between the nominal attributes and the target attribute.
0.33 - Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.24 - First quartile of means among attributes of the numeric type.
0.15 - Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.59 - Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.58 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.85 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
(no value) - An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
1 - Number of binary attributes.
(no value) - First quartile of mutual information between the nominal attributes and the target attribute.
0.65 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.86 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0 - Standard deviation of the number of distinct values among attributes of the nominal type.
0.16 - Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
2 - Average number of distinct values among the attributes of the nominal type.
-0.17 - First quartile of skewness among attributes of the numeric type.
0.87 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.18 - Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.77 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
0.64 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
0.65 - Mean skewness among attributes of the numeric type.
0.4 - First quartile of standard deviation of attributes of the numeric type.
0.15 - Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.59 - Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.2 - Error rate achieved by the landmarker weka.classifiers.lazy.IBk
64.84 - Percentage of instances belonging to the most frequent class.
29.33 - Mean standard deviation of attributes of the numeric type.
(no value) - Second quartile (Median) of entropy among attributes.
2.7 - Second quartile (Median) of kurtosis among attributes of the numeric type.
0.65 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.94 - Entropy of the target attribute values.
0.56 - Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
12332 - Number of instances belonging to the most frequent class.
(no value) - Minimal entropy among attributes.
6.69 - Second quartile (Median) of means among attributes of the numeric type.
0.87 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.73 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
(no value) - Maximum entropy among attributes.
-0.53 - Minimum kurtosis among attributes of the numeric type.
(no value) - Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.15 - Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.28 - Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
16.77 - Maximum kurtosis among attributes of the numeric type.
-4.33 - Minimum of means among attributes of the numeric type.
0.59 - Second quartile (Median) of skewness among attributes of the numeric type.
0.65 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.43 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
193.82 - Maximum of means among attributes of the numeric type.
(no value) - Minimal mutual information between the nominal attributes and the target attribute.
23.47 - Second quartile (Median) of standard deviation of attributes of the numeric type.
0.79 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0 - Number of attributes divided by the number of instances.
(no value) - Maximum mutual information between the nominal attributes and the target attribute.
2 - The minimal number of distinct values among attributes of the nominal type.
8.33 - Percentage of binary attributes.
(no value) - Third quartile of entropy among attributes.
0.19 - Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
(no value) - Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
2 - The maximum number of distinct values among attributes of the nominal type.
-1.12 - Minimum skewness among attributes of the numeric type.
0 - Percentage of instances having missing values.
8.26 - Third quartile of kurtosis among attributes of the numeric type.
1 - Average class difference between consecutive instances.
0.58 - Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.85 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
3.37 - Maximum skewness among attributes of the numeric type.
0.11 - Minimum standard deviation of attributes of the numeric type.
0 - Percentage of missing values.
34.05 - Third quartile of means among attributes of the numeric type.
0.86 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.79 - Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.16 - Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
74.73 - Maximum standard deviation of attributes of the numeric type.
35.16 - Percentage of instances belonging to the least frequent class.
91.67 - Percentage of numeric attributes.
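Several of the simpler properties above follow directly from the class counts and attribute counts given on this page. The sketch below recomputes them with the usual textbook definitions (base-2 Shannon entropy, plain percentages); whether these are exactly the formulas used to produce the listed values is an assumption, and the variable names are illustrative.

```python
import math

# Counts taken from the description and property list on this page.
class_counts = {"g": 12332, "h": 6688}
n_attributes = 12        # includes the target and the row identifier
n_numeric = 11
n_nominal = 1

n_instances = sum(class_counts.values())

# Share of the most and least frequent class, in percent.
majority_pct = 100 * max(class_counts.values()) / n_instances
minority_pct = 100 * min(class_counts.values()) / n_instances

# Shannon entropy of the target attribute, in bits.
class_entropy = -sum(
    (c / n_instances) * math.log2(c / n_instances)
    for c in class_counts.values()
)

# Attribute-type percentages and attributes per instance.
numeric_pct = 100 * n_numeric / n_attributes
nominal_pct = 100 * n_nominal / n_attributes
dimensionality = n_attributes / n_instances

for name, value in [
    ("instances", n_instances),
    ("majority class %", majority_pct),
    ("minority class %", minority_pct),
    ("class entropy", class_entropy),
    ("numeric attributes %", numeric_pct),
    ("nominal attributes %", nominal_pct),
    ("attributes / instances", dimensionality),
]:
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these reproduce the listed 64.84, 35.16, 0.94, 91.67 and 8.33, and the ratio of attributes to instances (about 0.0006) rounds to the listed 0.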

28 tasks

62748 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class:
1 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: class:
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class:
0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class:
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: class:
46 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class:
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: class:
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - target_feature: class:
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
1311 runs - target_feature: class:
1305 runs - target_feature: class:
0 runs - target_feature: class:
0 runs - target_feature: class:
0 runs - target_feature: class:
0 runs - target_feature: class:
0 runs - target_feature: class:
0 runs - target_feature: class:
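Most runs listed above use 10-fold cross-validation. As a rough sketch of what such an estimation procedure involves, the code below assigns each instance to one of ten folds while keeping the g/h proportions similar in every fold. Stratification, the random seed, and the function name `stratified_folds` are assumptions of this sketch; the actual splits behind the listed tasks are not given on this page.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Return an array giving a fold index (0..n_folds-1) per instance,
    with each class spread as evenly as possible over the folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                      # random order within class
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

# Labels with the class counts stated in the description.
labels = np.array(["g"] * 12332 + ["h"] * 6688)
fold_of = stratified_folds(labels)

for k in range(10):
    test_mask = fold_of == k
    train_mask = ~test_mask
    # A model would be fitted on train_mask rows and scored on test_mask rows.
    print(k, int(train_mask.sum()), int(test_mask.sum()))
```

Each instance appears in exactly one test fold, so averaging a metric over the ten folds uses every event once for evaluation.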