{ "data_id": "42363", "name": "forest_fires", "exact_name": "forest_fires", "version": 1, "version_label": "1", "description": "Forest Fires Data Set\r\n\r\nThis is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data.\r\n\r\nData Set Information:\r\n\r\nIn [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function.\r\nThen, several Data Mining methods were applied. After fitting the models, the outputs were\r\npost-processed with the inverse of the ln(x+1) transform. Four different input setups were\r\nused. The experiments were conducted using 10-fold cross-validation x 30 runs. Two\r\nregression metrics were measured: MAD and RMSE. A Gaussian support vector machine (SVM) fed\r\nwith only 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value:\r\n12.71 +- 0.01 (mean and 95% confidence interval using a Student's t-distribution). The\r\nbest RMSE was attained by the naive mean predictor. An analysis of the regression error curve\r\n(REC) shows that the SVM model predicts more examples within a lower admitted error. In effect,\r\nthe SVM model predicts small fires better, and these are the majority.\r\n\r\n\r\nAttribute Information:\r\n\r\nFor more information, read [Cortez and Morais, 2007].\r\n1. X - x-axis spatial coordinate within the Montesinho park map: 1 to 9\r\n2. Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9\r\n3. month - month of the year: 'jan' to 'dec'\r\n4. day - day of the week: 'mon' to 'sun'\r\n5. FFMC - FFMC index from the FWI system: 18.7 to 96.20\r\n6. DMC - DMC index from the FWI system: 1.1 to 291.3\r\n7. DC - DC index from the FWI system: 7.9 to 860.6\r\n8. ISI - ISI index from the FWI system: 0.0 to 56.10\r\n9. temp - temperature in Celsius degrees: 2.2 to 33.30\r\n10. RH - relative humidity in %: 15.0 to 100\r\n11. 
wind - wind speed in km\/h: 0.40 to 9.40\r\n12. rain - outside rain in mm\/m2 : 0.0 to 6.4\r\n13. area - the burned area of the forest (in ha): 0.00 to 1090.84\r\n(this output variable is very skewed towards 0.0, thus it may make\r\nsense to model with the logarithm transform).", "format": "ARFF", "uploader": "Rafael Gomes Mantovani", "uploader_id": 64, "visibility": "public", "creator": "\"Paulo Cortez\",\"Anibal Morais\"", "contributor": null, "date": "2020-04-19 01:46:05", "update_comment": null, "last_update": "2020-04-19 01:46:05", "licence": "CC0", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/21829185\/forestFires.arff", "kaggle_url": null, "default_target_attribute": "area", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "forest_fires", "Forest Fires Data Set This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data. Data Set Information: In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function. Then, several Data Mining methods were applied. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform. 
Four different input setups were used " ], "weight": 5 }, "qualities": { "NumberOfInstances": 517, "NumberOfFeatures": 13, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 11, "NumberOfSymbolicFeatures": 2, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0, "PercentageOfMissingValues": 0, "AutoCorrelation": -13.814806201550388, "PercentageOfNumericFeatures": 84.61538461538461, "Dimensionality": 0.025145067698259187, "PercentageOfSymbolicFeatures": 15.384615384615385, "MajorityClassPercentage": null, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0 }, "tags": [], "features": [ { "name": "area", "index": "12", "type": "numeric", "distinct": "251", "missing": "0", "target": "1", "min": "0", "max": "1091", "mean": "13", "stdev": "64" }, { "name": "X", "index": "0", "type": "numeric", "distinct": "9", "missing": "0", "min": "1", "max": "9", "mean": "5", "stdev": "2" }, { "name": "Y", "index": "1", "type": "numeric", "distinct": "7", "missing": "0", "min": "2", "max": "9", "mean": "4", "stdev": "1" }, { "name": "month", "index": "2", "type": "nominal", "distinct": "12", "missing": "0", "distr": [] }, { "name": "day", "index": "3", "type": "nominal", "distinct": "7", "missing": "0", "distr": [] }, { "name": "FFMC", "index": "4", "type": "numeric", "distinct": "106", "missing": "0", "min": "19", "max": "96", "mean": "91", "stdev": "6" }, { "name": "DMC", "index": "5", "type": "numeric", "distinct": "215", "missing": "0", "min": "1", "max": "291", "mean": "111", "stdev": "64" }, { "name": "DC", "index": "6", "type": "numeric", "distinct": "219", "missing": "0", "min": "8", "max": "861", "mean": "548", "stdev": "248" }, { "name": "ISI", "index": "7", "type": "numeric", "distinct": "119", "missing": "0", "min": "0", "max": "56", "mean": "9", "stdev": "5" }, { "name": "temp", "index": "8", "type": "numeric", 
"distinct": "192", "missing": "0", "min": "2", "max": "33", "mean": "19", "stdev": "6" }, { "name": "RH", "index": "9", "type": "numeric", "distinct": "75", "missing": "0", "min": "15", "max": "100", "mean": "44", "stdev": "16" }, { "name": "wind", "index": "10", "type": "numeric", "distinct": "21", "missing": "0", "min": "0", "max": "9", "mean": "4", "stdev": "2" }, { "name": "rain", "index": "11", "type": "numeric", "distinct": "7", "missing": "0", "min": "0", "max": "6", "mean": "0", "stdev": "0" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }