OpenML
forest_fires

Status: active | Format: ARFF | License: CC BY 4.0 | Visibility: public | Uploaded 22-12-2022 by Sebastian Fischer

Data Description

The aim of this dataset is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data. The output 'area' was first transformed with a $\ln(x+1)$ function. Then, several Data Mining methods were applied. After fitting the models, the outputs were post-processed with the inverse of the $\ln(x+1)$ transform. Four different input setups were used.

Attribute Description

1. *X* - x-axis spatial coordinate within the Montesinho park map: 1 to 9
2. *Y* - y-axis spatial coordinate within the Montesinho park map: 2 to 9
3. *month* - month of the year: 'jan' to 'dec'
4. *day* - day of the week: 'mon' to 'sun'
5. *FFMC* - FFMC index from the FWI system: 18.7 to 96.20
6. *DMC* - DMC index from the FWI system: 1.1 to 291.3
7. *DC* - DC index from the FWI system: 7.9 to 860.6
8. *ISI* - ISI index from the FWI system: 0.0 to 56.10
9. *temp* - temperature in Celsius degrees: 2.2 to 33.30
10. *RH* - relative humidity in %: 15.0 to 100
11. *wind* - wind speed in km/h: 0.40 to 9.40
12. *rain* - outside rain in mm/m2: 0.0 to 6.4
13. *area* - the burned area of the forest (in ha): 0.00 to 1090.84 (this target variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform).
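The $\ln(x+1)$ transform of 'area' and its inverse, as described above, can be sketched with the Python standard library. This is a minimal illustration, not the original authors' code: the function names `to_log_area` and `from_log_area` are hypothetical, and the middle sample value is made up for demonstration (only 0.0 and 1090.84 come from the stated range).

```python
import math

def to_log_area(area_ha: float) -> float:
    """Forward transform ln(x + 1), applied to the burned area before fitting."""
    if area_ha < 0:
        raise ValueError("burned area cannot be negative")
    return math.log1p(area_ha)

def from_log_area(log_area: float) -> float:
    """Inverse transform exp(y) - 1, applied to model outputs after fitting."""
    return math.expm1(log_area)

# Endpoints of the stated 'area' range plus one illustrative (made-up) value.
areas = [0.0, 10.0, 1090.84]
transformed = [to_log_area(a) for a in areas]
recovered = [from_log_area(t) for t in transformed]
```

Using `log1p` and `expm1` rather than `log(x + 1)` and `exp(y) - 1` keeps precision for the many areas that are at or near zero, and an area of exactly 0 ha maps to exactly 0.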

13 features

area (target): numeric, 251 unique values, 0 missing
X: numeric, 9 unique values, 0 missing
Y: numeric, 7 unique values, 0 missing
month: string, 12 unique values, 0 missing
day: string, 7 unique values, 0 missing
FFMC: numeric, 106 unique values, 0 missing
DMC: numeric, 215 unique values, 0 missing
DC: numeric, 219 unique values, 0 missing
ISI: numeric, 119 unique values, 0 missing
temp: numeric, 192 unique values, 0 missing
RH: numeric, 75 unique values, 0 missing
wind: numeric, 21 unique values, 0 missing
rain: numeric, 7 unique values, 0 missing
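
Because *month* and *day* are stored as strings, most regression methods need them converted to numbers first. The sketch below shows one possible ordinal encoding. It assumes the values are lowercase three-letter English abbreviations (the page only states the endpoints 'jan' to 'dec' and 'mon' to 'sun'), and the helper `encode_row` and the example row are hypothetical.

```python
# Assumed spellings: only the endpoints are given in the attribute description.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

MONTH_TO_INT = {name: i for i, name in enumerate(MONTHS, start=1)}
DAY_TO_INT = {name: i for i, name in enumerate(DAYS, start=1)}

def encode_row(row: dict) -> dict:
    """Return a copy of row with 'month' and 'day' replaced by integer codes."""
    encoded = dict(row)
    encoded["month"] = MONTH_TO_INT[row["month"].strip().lower()]
    encoded["day"] = DAY_TO_INT[row["day"].strip().lower()]
    return encoded

# Made-up row for illustration; not taken from the dataset.
example = encode_row({"month": "aug", "day": "sun", "temp": 22.0})
```

Integer codes impose an ordering that a model may treat as meaningful; one-hot encoding is a common alternative when that is not wanted.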

19 properties

Number of instances (rows) of the dataset: 517
Number of attributes (columns) of the dataset: 13
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 11
Number of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: -13.81
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.03
Percentage of numeric attributes: 84.62
Percentage of instances belonging to the most frequent class: (no value shown)
Percentage of nominal attributes: 0
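
The two ratio properties follow directly from the counts listed above. A short check, assuming the values are rounded to two decimals (the page does not state its rounding rule):

```python
n_instances = 517   # rows
n_attributes = 13   # columns, including the target 'area'
n_numeric = 11      # numeric attributes

# Percentage of numeric attributes: 11 / 13.
pct_numeric = round(100 * n_numeric / n_attributes, 2)

# Number of attributes divided by the number of instances: 13 / 517.
attrs_per_instance = round(n_attributes / n_instances, 2)

# The remaining attributes are the two string columns, month and day.
n_string = n_attributes - n_numeric
```

Here `pct_numeric` evaluates to 84.62 and `attrs_per_instance` to 0.03, matching the listed properties.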

2 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: area
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: area
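
To illustrate what the two estimation procedures mean for 517 instances, here is a minimal k-fold splitter using only the standard library. It is a sketch under assumptions: `k_fold_indices` is a hypothetical helper, the shuffle seeds are arbitrary, and the folds it produces are not the splits OpenML itself defines for these tasks.

```python
import random

def k_fold_indices(n_rows: int, k: int = 10, seed: int = 0):
    """Return k (train, test) index pairs whose test sets partition range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# 10-fold cross-validation over the 517 rows.
splits = k_fold_indices(517, k=10)
fold_sizes = [len(test) for _, test in splits]

# 10 times 10-fold: repeat with a different shuffle for each repetition.
repeated = [k_fold_indices(517, k=10, seed=rep) for rep in range(10)]
```

Since 517 = 10 × 51 + 7, seven test folds hold 52 rows and three hold 51, and every row is used for testing exactly once per repetition.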