{ "data_id": "45562", "name": "seismic-bumps", "exact_name": "seismic-bumps", "version": 3, "version_label": null, "description": "========================================================================================================\nSeismic bumps dataset\n========================================================================================================\nThe data describe the problem of high energy (higher than 10^4 J) seismic bumps forecasting in a coal\nmine. Data come from two longwalls located in a Polish coal mine.\n\n--------------------------------------------------------------------------------------------------------\nCitation request\n--------------------------------------------------------------------------------------------------------\nSikora M., Wrobel L.: Application of rule induction algorithms for analysis of data collected by seismic\nhazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 2010, 91-114.\n\n--------------------------------------------------------------------------------------------------------\nDonors and creators\n--------------------------------------------------------------------------------------------------------\nMarek Sikora^{1,2} (marek.sikora@polsl.pl), Lukasz Wrobel^{1} (lukasz.wrobel@polsl.pl)\n(1) Institute of Computer Science, Silesian University of Technology, 44-100 Gliwice, Poland\n(2) Institute of Innovative Technologies EMAG, 40-189 Katowice, Poland\n\n--------------------------------------------------------------------------------------------------------\nData characteristics\n--------------------------------------------------------------------------------------------------------\nInstances: 2584\nAttributes: 18 + class\nClass distribution:\n \"hazardous state\" (class 1) : 170 (6.6%)\n \"non-hazardous state\" (class 0): 2414 (93.4%)\nMissing Attribute Values: None\n\nAttribute information:\n 1. 
seismic: result of shift seismic hazard assessment in the mine working obtained by the seismic\nmethod (a - lack of hazard, b - low hazard, c - high hazard, d - danger state);\n 2. seismoacoustic: result of shift seismic hazard assessment in the mine working obtained by the\nseismoacoustic method;\n 3. shift: information about type of a shift (W - coal-getting, N - preparation shift);\n 4. genergy: seismic energy recorded within previous shift by the most active geophone (GMax) out of\ngeophones monitoring the longwall;\n 5. gpuls: a number of pulses recorded within previous shift by GMax;\n 6. gdenergy: a deviation of energy recorded within previous shift by GMax from average energy recorded\nduring eight previous shifts;\n 7. gdpuls: a deviation of a number of pulses recorded within previous shift by GMax from average number\nof pulses recorded during eight previous shifts;\n 8. ghazard: result of shift seismic hazard assessment in the mine working obtained by the\nseismoacoustic method based on registration coming from GMax only;\n 9. nbumps: the number of seismic bumps recorded within previous shift;\n10. nbumps2: the number of seismic bumps (in energy range [10^2,10^3)) registered within previous shift;\n11. nbumps3: the number of seismic bumps (in energy range [10^3,10^4)) registered within previous shift;\n12. nbumps4: the number of seismic bumps (in energy range [10^4,10^5)) registered within previous shift;\n13. nbumps5: the number of seismic bumps (in energy range [10^5,10^6)) registered within the last shift;\n14. nbumps6: the number of seismic bumps (in energy range [10^6,10^7)) registered within previous shift;\n15. nbumps7: the number of seismic bumps (in energy range [10^7,10^8)) registered within previous shift;\n16. nbumps89: the number of seismic bumps (in energy range [10^8,10^10)) registered within previous shift;\n17. energy: total energy of seismic bumps registered within previous shift;\n18. 
maxenergy: the maximum energy of the seismic bumps registered within previous shift;\n19. class: the decision attribute - \"1\" means that a high energy seismic bump occurred in the next shift\n(\"hazardous state\"), \"0\" means that no high energy seismic bumps occurred in the next shift\n(\"non-hazardous state\").\n\n--------------------------------------------------------------------------------------------------------\nRelevant information\n--------------------------------------------------------------------------------------------------------\nMining activity has always been connected with the occurrence of dangers which are commonly called\nmining hazards. A special case of such a threat is a seismic hazard, which frequently occurs in many\nunderground mines. Seismic hazard is the hardest natural hazard to detect and predict, and in\nthis respect it is comparable to an earthquake. More and more advanced seismic and seismoacoustic\nmonitoring systems allow a better understanding of rock mass processes and the definition of seismic hazard\nprediction methods. The accuracy of the methods created so far is, however, far from perfect. The complexity of\nseismic processes and the large disproportion between the number of low-energy seismic events and the number\nof high-energy phenomena (e.g. > 10^4J) make statistical techniques insufficient to predict\nseismic hazard. Therefore, it is essential to search for new opportunities for better hazard prediction,\nincluding machine learning methods. In seismic hazard assessment, data clustering techniques can be\napplied (Lesniak A., Isakow Z.: Space-time clustering of seismic events and hazard assessment in the\nZabrze-Bielszowice coal mine, Poland. Int. 
Journal of Rock Mechanics and Mining Sciences, 46(5), 2009,\n918-928), and for prediction of seismic tremors artificial neural networks are used (Kabiesz, J.: Effect\nof the form of data on the quality of mine tremors hazard forecasting using neural networks.\nGeotechnical and Geological Engineering, 24(5), 2005, 1131-1147). In the majority of applications, the\nresults obtained by the mentioned methods are reported in the form of two states which are interpreted as\n\"hazardous\" and \"non-hazardous\". The unbalanced distribution of positive (\"hazardous state\") and negative\n(\"non-hazardous state\") examples is a serious problem in seismic hazard prediction. Currently used\nmethods are still insufficient to achieve good sensitivity and specificity of predictions. In the paper\n(Bukowska M.: The probability of rockburst occurrence in the Upper Silesian Coal Basin area dependent on\nnatural mining conditions. Journal of Mining Sciences, 42(6), 2006, 570-577) a number of factors having\nan effect on seismic hazard occurrence were proposed; among them, the occurrence of tremors with\nenergy > 10^4J was listed. The task of seismic prediction can be defined in different ways, but the main\naim of all seismic hazard assessment methods is to predict (with given precision relating to time and\ndate) increased seismic activity which can cause a rockburst. In the data set each row contains a\nsummary statement about seismic activity in the rock mass within one shift (8 hours). If the decision\nattribute has the value 1, then at least one seismic bump with an energy higher than 10^4 J was\nregistered in the next shift. This hazard prediction task is based on the relationship between the energy of recorded\ntremors and seismoacoustic activity with the possibility of rockburst occurrence. Hence, such hazard\nprognosis is not connected with accurate rockburst prediction. 
Moreover, with the information about the\npossibility of hazardous situation occurrence, an appropriate supervision service can reduce a risk of\nrockburst (e.g. by distressing shooting) or withdraw workers from the threatened area. Good prediction\nof increased seismic activity is therefore a matter of great practical importance. The presented data\nset is characterized by unbalanced distribution of positive and negative examples. In the data set there\nare only 170 positive examples representing class 1.\n\n--------------------------------------------------------------------------------------------------------\nClassification results using stratified 10-fold cross-validation repeated 10 times\n--------------------------------------------------------------------------------------------------------\n| Algorithm | Acc. | BAcc. | Acc.0 | Acc.1 | Size |\n| | | | Specificity | Sensitivity |\t |\n------------------------------------------------------------------------------------------|\n| q-ModLEM(entropy-RSS) (1) | 80.2(5.1) | 69.1(6.2) | 81.90 | 56.35 | 27.5 |\n| q-ModLEM(entropy-Corr.) 
(1) | 82.9(4.5) | 67.9(7.2) | 85.15 | 50.65 | 45.5 |\n| MODLEM (2) | 92.5(0.8) | 52.6(2.8) | 98.58\t | 6.65 | 145.5 |\n| MLRules(-M 30) (3) | 93.2(0.3) | 50.5(1.3) | 99.69 | 1.29 | 30 |\n| MLRules(-M 100) (3) | 92.9(0.6) | 52.0(2.2) | 99.10 | 4.88 | 100 |\n| MLRules(-M 500) (3) | 92.3(0.6) | 52.9(2.8) | 98.27 | 7.59 | 500 |\n| BRACID (4) | 87.5(0.4) | 62.0(2.6) | 91.38 | 32.71 | - |\n| Jrip (Weka) | 93.0(0.6) | 51.4(2.4) | 99.35 | 3.47 | 1.8 |\n| PART (Weka) | 92.1(0.8) | 52.7(3.5) | 98.09 | 7.35 | 34 |\n| J48 (Weka) | 93.1(0.8) | 50.2(0.9) | 99.64 | 0.82 | 5.6 |\n| SimpleCart (Weka) | 93.4(0.0) | 50.0(0.0) | 100 | 0.00 | 1.0 |\n| NaiveBayes (Weka) | 86.7(2.0) | 64.7(5.8) | 90.08 | 39.41 | - |\n| IB1 (Weka) | 89.4(1.6) | 55.3(4.8) | 94.54 | 16.06 | - |\n| RandomForest(-I 100) (Weka) | 93.1(0.6) | 52.1(2.5) | 99.31 | 4.88 | 100 |\n-------------------------------------------------------------------------------------------\nAcc. - the overall accuracy\nBAcc. - the balanced accuracy\nSize - the number of: rules for rule-based methods, leaves for trees and trees for random forest\n\n(1) Sikora M.: Rule quality measures in creation and reduction of data rule models. Lecture Notes in\nArtificial Intelligence 4259, 2006, 716-725.\n(2) Stefanowski J.: On combined classifiers, rule induction and rough sets.\nTransactions on Rough Sets VI (LNCS 4374) Springer-Verlag, 2007, s. 329 350\n(3) Dembczynski K., Kotlowski W., Slowinski R.: ENDER: a statistical framework for boosting decision\nrules. Data Mining and Knowledge Discovery 21, 2010, 52-90.\n(4) Napierala K., Stefanowski J.: BRACID: a comprehensive approach to learning rules from imbalanced\ndata. 
Journal of Intelligent Information Systems, 39(2), 2012, 335-373.\n\n\n**Note:** Compared to the 1st version, this version contains all samples from UCI.", "format": "arff", "uploader": "Matthias Feurer", "uploader_id": 86, "visibility": "public", "creator": null, "contributor": null, "date": "2023-06-05 10:40:55", "update_comment": null, "last_update": "2023-06-05 10:40:55", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22116530\/dataset", "default_target_attribute": "class", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "seismic-bumps", "======================================================================================================== Seismic bumps dataset ======================================================================================================== The data describe the problem of high energy (higher than 10^4 J) seismic bumps forecasting in a coal mine. Data come from two of longwalls located in a Polish coal mine. 
------------------------------------------------------------------------------------------------- " ], "weight": 5 }, "qualities": { "NumberOfInstances": 2584, "NumberOfFeatures": 19, "NumberOfClasses": 2, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 14, "NumberOfSymbolicFeatures": 5, "PercentageOfBinaryFeatures": 15.789473684210526, "PercentageOfInstancesWithMissingValues": 0, "AutoCorrelation": 0.8861788617886179, "PercentageOfMissingValues": 0, "Dimensionality": 0.007352941176470588, "PercentageOfNumericFeatures": 73.68421052631578, "MajorityClassPercentage": 93.42105263157895, "PercentageOfSymbolicFeatures": 26.31578947368421, "MajorityClassSize": 2414, "MinorityClassPercentage": 6.578947368421052, "MinorityClassSize": 170, "NumberOfBinaryFeatures": 3 }, "tags": [ { "uploader": "38960", "tag": "Chemistry" }, { "uploader": "38960", "tag": "Life Science" } ], "features": [ { "name": "class", "index": "18", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "0", "1" ], [ [ "2414", "0" ], [ "0", "170" ] ] ] }, { "name": "seismic", "index": "0", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "a", "b" ], [ [ "1599", "83" ], [ "815", "87" ] ] ] }, { "name": "seismoacoustic", "index": "1", "type": "nominal", "distinct": "3", "missing": "0", "distr": [ [ "a", "b", "c" ], [ [ "1479", "101" ], [ "890", "66" ], [ "45", "3" ] ] ] }, { "name": "shift", "index": "2", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "N", "W" ], [ [ "904", "17" ], [ "1510", "153" ] ] ] }, { "name": "genergy", "index": "3", "type": "numeric", "distinct": "2212", "missing": "0", "min": "100", "max": "2595650", "mean": "90243", "stdev": "229201" }, { "name": "gpuls", "index": "4", "type": "numeric", "distinct": "1128", "missing": "0", "min": "2", "max": "4518", "mean": "539", "stdev": "563" }, { "name": "gdenergy", "index": "5", "type": "numeric", "distinct": "334", "missing": "0", 
"min": "-96", "max": "1245", "mean": "12", "stdev": "80" }, { "name": "gdpuls", "index": "6", "type": "numeric", "distinct": "292", "missing": "0", "min": "-96", "max": "838", "mean": "5", "stdev": "63" }, { "name": "ghazard", "index": "7", "type": "nominal", "distinct": "3", "missing": "0", "distr": [ [ "a", "b", "c" ], [ [ "2186", "156" ], [ "198", "14" ], [ "30", "0" ] ] ] }, { "name": "nbumps", "index": "8", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "9", "mean": "1", "stdev": "1" }, { "name": "nbumps2", "index": "9", "type": "numeric", "distinct": "7", "missing": "0", "min": "0", "max": "8", "mean": "0", "stdev": "1" }, { "name": "nbumps3", "index": "10", "type": "numeric", "distinct": "7", "missing": "0", "min": "0", "max": "7", "mean": "0", "stdev": "1" }, { "name": "nbumps4", "index": "11", "type": "numeric", "distinct": "4", "missing": "0", "min": "0", "max": "3", "mean": "0", "stdev": "0" }, { "name": "nbumps5", "index": "12", "type": "numeric", "distinct": "2", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "nbumps6", "index": "13", "type": "numeric", "distinct": "1", "missing": "0", "min": "0", "max": "0", "mean": "0", "stdev": "0" }, { "name": "nbumps7", "index": "14", "type": "numeric", "distinct": "1", "missing": "0", "min": "0", "max": "0", "mean": "0", "stdev": "0" }, { "name": "nbumps89", "index": "15", "type": "numeric", "distinct": "1", "missing": "0", "min": "0", "max": "0", "mean": "0", "stdev": "0" }, { "name": "energy", "index": "16", "type": "numeric", "distinct": "242", "missing": "0", "min": "0", "max": "402000", "mean": "4975", "stdev": "20451" }, { "name": "maxenergy", "index": "17", "type": "numeric", "distinct": "33", "missing": "0", "min": "0", "max": "400000", "mean": "4279", "stdev": "19357" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, 
"impact": 0 }