OpenML

JavaScript is required to properly view the contents of this page!

Explore
- Data
- Task
- Flow
- Run
- Study
- Task type
- Measure
- People
Help
Blog
Contact
Please cite us

seismic-bumps

active ARFF Publicly available Visibility: public Uploaded 25-05-2015 by Rafael Gomes Mantovani
0 likes downloaded by 0 people , 0 total downloads 0 issues 0 downvotes

Issue	#Downvotes for this reason	By

Loading wiki

Help us complete this description Edit

Author: Sikora M., Wrobel L. Source: UCI Please cite: Sikora M., Wrobel L.: Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 2010, 91-114. * Title: seismic-bumps Data Set * Abstract: The data describe the problem of high energy (higher than 10^4 J) seismic bumps forecasting in a coal mine. Data come from two of longwalls located in a Polish coal mine. * Source: Marek Sikora^{1,2} (marek.sikora '@' polsl.pl), Lukasz Wrobel^{1} (lukasz.wrobel '@' polsl.pl) (1) Institute of Computer Science, Silesian University of Technology, 44-100 Gliwice, Poland (2) Institute of Innovative Technologies EMAG, 40-189 Katowice, Poland * Data Set Information: Mining activity was and is always connected with the occurrence of dangers which are commonly called mining hazards. A special case of such threat is a seismic hazard which frequently occurs in many underground mines. Seismic hazard is the hardest detectable and predictable of natural hazards and in this respect it is comparable to an earthquake. More and more advanced seismic and seismoacoustic monitoring systems allow a better understanding rock mass processes and definition of seismic hazard prediction methods. Accuracy of so far created methods is however far from perfect. Complexity of seismic processes and big disproportion between the number of low-energy seismic events and the number of high-energy phenomena (e.g. > 10^4J) causes the statistical techniques to be insufficient to predict seismic hazard. Therefore, it is essential to search for new opportunities of better hazard prediction, also using machine learning methods. In seismic hazard assessment data clustering techniques can be applied (Lesniak A., Isakow Z.: Space-time clustering of seismic events and hazard assessment in the Zabrze-Bielszowice coal mine, Poland. Int. Journal of Rock Mechanics and Mining Sciences, 46(5), 2009, 918-928), and for prediction of seismic tremors artificial neural networks are used (Kabiesz, J.: Effect of the form of data on the quality of mine tremors hazard forecasting using neural networks. Geotechnical and Geological Engineering, 24(5), 2005, 1131-1147). In the majority of applications, the results obtained by mentioned methods are reported in the form of two states which are interpreted as 'hazardous' and 'non-hazardous'. Unbalanced distribution of positive ('hazardous state') and negative ('non-hazardous state') examples is a serious problem in seismic hazard prediction. Currently used methods are still insufficient to achieve good sensitivity and specificity of predictions. In the paper (Bukowska M.: The probability of rockburst occurrence in the Upper Silesian Coal Basin area dependent on natural mining conditions. Journal of Mining Sciences, 42(6), 2006, 570-577) a number of factors having an effect on seismic hazard occurrence was proposed, among other factors, the occurrence of tremors with energy > 10^4J was listed. The task of seismic prediction can be defined in different ways, but the main aim of all seismic hazard assessment methods is to predict (with given precision relating to time and date) of increased seismic activity which can cause a rockburst. In the data set each row contains a summary statement about seismic activity in the rock mass within one shift (8 hours). If decision attribute has the value 1, then in the next shift any seismic bump with an energy higher than 10^4 J was registered. That task of hazards prediction bases on the relationship between the energy of recorded tremors and seismoacoustic activity with the possibility of rockburst occurrence. Hence, such hazard prognosis is not connected with accurate rockburst prediction. Moreover, with the information about the possibility of hazardous situation occurrence, an appropriate supervision service can reduce a risk of rockburst (e.g. by distressing shooting) or withdraw workers from the threatened area. Good prediction of increased seismic activity is therefore a matter of great practical importance. The presented data set is characterized by unbalanced distribution of positive and negative examples. In the data set there are only 170 positive examples representing class 1. * Attribute Information: 1. seismic: result of shift seismic hazard assessment in the mine working obtained by the seismic method (a - lack of hazard, b - low hazard, c - high hazard, d - danger state); 2. seismoacoustic: result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method; 3. shift: information about type of a shift (W - coal-getting, N -preparation shift); 4. genergy: seismic energy recorded within previous shift by the most active geophone (GMax) out of geophones monitoring the longwall; 5. gpuls: a number of pulses recorded within previous shift by GMax; 6. gdenergy: a deviation of energy recorded within previous shift by GMax from average energy recorded during eight previous shifts; 7. gdpuls: a deviation of a number of pulses recorded within previous shift by GMax from average number of pulses recorded during eight previous shifts; 8. ghazard: result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method based on registration coming form GMax only; 9. nbumps: the number of seismic bumps recorded within previous shift; 10. nbumps2: the number of seismic bumps (in energy range [10^2,10^3)) registered within previous shift; 11. nbumps3: the number of seismic bumps (in energy range [10^3,10^4)) registered within previous shift; 12. nbumps4: the number of seismic bumps (in energy range [10^4,10^5)) registered within previous shift; 13. nbumps5: the number of seismic bumps (in energy range [10^5,10^6)) registered within the last shift; 14. nbumps6: the number of seismic bumps (in energy range [10^6,10^7)) registered within previous shift; 15. nbumps7: the number of seismic bumps (in energy range [10^7,10^8)) registered within previous shift; 16. nbumps89: the number of seismic bumps (in energy range [10^8,10^10)) registered within previous shift; 17. energy: total energy of seismic bumps registered within previous shift; 18. maxenergy: the maximum energy of the seismic bumps registered within previous shift; 19. class: the decision attribute - '1' means that high energy seismic bump occurred in the next shift ('hazardous state'), '0' means that no high energy seismic bumps occurred in the next shift ('non-hazardous state').

8 features

Class (target)	nominal	3 unique values 0 missing
V1	numeric	193 unique values 0 missing
V2	numeric	170 unique values 0 missing
V3	numeric	186 unique values 0 missing
V4	numeric	188 unique values 0 missing
V5	numeric	184 unique values 0 missing
V6	numeric	207 unique values 0 missing
V7	numeric	148 unique values 0 missing

Show all 8 features

107 properties

NumberOfInstances

210

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

REPTreeDepth1Kappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

0.1

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

0.13

First quartile of skewness among attributes of the numeric type.

REPTreeDepth2AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

0.06

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

0.27

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

0.38

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2ErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.91

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.05

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

33.33

Percentage of instances belonging to the most frequent class.

MeanStdDevOfNumericAtts

1.01

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2Kappa

0.83

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

1.58

Entropy of the target attribute values.

kNN1NKappa

0.93

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

Number of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

-0.84

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth3AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-1.11

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

5.41

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3ErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.35

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

-0.07

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.87

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.4

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

0.83

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

14.85

Maximum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2StdDevOfNumericAtts

0.49

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.04

Number of attributes divided by the number of instances.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

Percentage of binary attributes.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

0.36

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-0.54

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3KurtosisOfNumericAtts

-0.14

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

0.99

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

0.56

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.02

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

Percentage of missing values.

Quartile3MeansOfNumericAtts

14.56

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

0.92

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.1

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

2.91

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

33.33

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

87.5

Percentage of numeric attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.35

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.15

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

Average entropy of the attributes.

MinorityClassSize

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

12.5

Percentage of nominal attributes.

Quartile3SkewnessOfNumericAtts

0.53

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.78

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

-0.73

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

0.98

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3StdDevOfNumericAtts

1.5

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.98

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.91

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.1

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

6.9

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

0.1

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

-1.1

First quartile of kurtosis among attributes of the numeric type.

REPTreeDepth1AUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.15

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.86

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

3.26

First quartile of means among attributes of the numeric type.

REPTreeDepth1ErrRate

0.36

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.86

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.77

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

Show all 107 properties

13 tasks

Supervised Classification on seismic-bumps

120 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Class

Supervised Classification on seismic-bumps

32 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class

Clustering on seismic-bumps