Tallo

Status: active. Format: ARFF. License: CC BY 4.0. Visibility: public. Uploaded 03-02-2023 by Pieter Gijsbers.
TALLO - a global tree allometry and crown architecture database.

This is the Tallo dataset described in Jucker et al. (2022), recreated with Python scripts by Laurens Bliek. The scripts can be found at https://github.com/lbliek/TALLO_ML/tree/TALLO_ML1.

The Tallo database (v1.0.0) is a collection of 498,838 georeferenced and taxonomically standardized records of individual trees for which stem diameter, height and/or crown radius have been measured. Data were compiled from 61,856 globally distributed sites and include measurements for 5,163 tree species (Jucker et al., 2022). Data were sourced from articles published between 1988 and 2021, as well as online resources: https://github.com/lbliek/TALLO_ML/blob/main/DB/Reference_look_up_table.csv

The constructed dataset and associated metadata are for use case 3 in the referenced paper: predicting tree height from climate data and stem diameter. This means a large portion of the data is ignored by default (set as attributes to be ignored).

Note: Samples are taken from different sources spanning decades, and multiple samples may be taken from distinct trees in the same approximate geographical location. These relationships between samples are ignored when generating tasks on OpenML.

List with a description for each feature:

|Field|Description|
|---|---|
|tree_id|Unique tree identifier code|
|division|Major phylogenetic division (Angiosperm or Gymnosperm)|
|family|Family name|
|genus|Genus name|
|species|Species binomial name|
|latitude|Latitude (in decimal degrees)|
|longitude|Longitude (in decimal degrees)|
|stem_diameter_cm|Stem diameter (in cm). For multi-stemmed trees, values for individual stems (Di) were pooled into a single value calculated as: sqrt(sum(Di^2)). Log-scaled (base 10).|
|height_m|Tree height (in m). Log-scaled (base 10).|
|crown_radius_m|Crown radius (in m)|
|height_outlier|Identifier for trees with height values flagged as outliers (Y = outlier; N = non-outlier)|
|crown_radius_outlier|Identifier for trees with crown radius values flagged as outliers (Y = outlier; N = non-outlier)|
|reference_id|Reference code corresponding to the data source from which a record was obtained (see 'Reference_look_up_table.csv' for details on data sources).|
|realm|Biogeographic realm. Follows the classification of Olson et al. (2001) BioScience, 51, 933-938|
|biome|Biome type. Follows the classification of Olson et al. (2001) BioScience, 51, 933-938|
|mean_annual_rainfall|Mean annual rainfall (in mm/yr). Values were obtained from the WorldClim2 database based on the geographic coordinates of the tree.|
|rainfall_seasonality|Rainfall seasonality (coefficient of variation). Values were obtained from the WorldClim2 database based on the geographic coordinates of the tree.|
|aridity_index|Aridity index (calculated as mean annual precipitation / potential evapotranspiration). Values were obtained from the Global Aridity Index and Potential Evapotranspiration Climate Database (v2) based on the geographic coordinates of the tree. Log-scaled (base 10).|
|mean_annual_temperature|Mean annual temperature (in degree C). Values were obtained from the WorldClim2 database based on the geographic coordinates of the tree.|
|maximum_temperature|Maximum temperature of the warmest month (in degree C). Values were obtained from the WorldClim2 database based on the geographic coordinates of the tree.|
|AT_AI|Ratio of 'mean annual temperature' over log-scaled 'aridity index'.|
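
The pooling, log-scaling and ratio rules in the feature descriptions can be illustrated with a minimal sketch. It is not the code from the TALLO_ML repository: the function names are hypothetical, the input values are made up, and it assumes that pooling happens before log-scaling and that AT_AI divides the temperature by the already log-scaled aridity index.

```python
import math


def pooled_stem_diameter(stem_diameters_cm):
    """Pool the stems of a multi-stemmed tree as sqrt(sum(Di^2))."""
    return math.sqrt(sum(d ** 2 for d in stem_diameters_cm))


def log10_scaled(value):
    """Base-10 log scaling, as described for stem_diameter_cm, height_m and aridity_index."""
    return math.log10(value)


def at_ai(mean_annual_temperature, log_aridity_index):
    """Ratio of mean annual temperature over the log-scaled aridity index.

    Undefined when the log-scaled aridity index is 0 (raw index of 1).
    """
    return mean_annual_temperature / log_aridity_index


# Made-up tree with two stems of 3 cm and 4 cm (assumption, not real data).
pooled = pooled_stem_diameter([3.0, 4.0])
stem_feature = log10_scaled(pooled)

# Made-up climate values: 20 degree C and a raw aridity index of 0.1.
ratio = at_ai(20.0, log10_scaled(0.1))
```

A single-stemmed tree passes through the pooling step unchanged, since sqrt(D^2) equals D.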

21 features

|Feature|Type|Unique values|Missing values|
|---|---|---|---|
|height_m (target)|numeric|604|0|
|tree_id (row identifier)|string|307014|0|
|division|string|2|0|
|family|string|172|0|
|genus|string|1221|0|
|species|string|3831|0|
|latitude|numeric|8980|0|
|longitude|numeric|13694|0|
|stem_diameter_cm|numeric|1496|0|
|crown_radius_m|numeric|362|0|
|height_outlier|string|1|0|
|crown_radius_outlier|string|1|0|
|reference_id|numeric|56|0|
|realm|string|7|0|
|biome|string|10|0|
|mean_annual_rainfall|numeric|1761|0|
|rainfall_seasonality|numeric|58324|0|
|aridity_index|numeric|12935|0|
|mean_annual_temperature|numeric|7710|0|
|maximum_temperature|numeric|418|0|
|biome_division|string|18|0|
|AT_AI|numeric|60504|0|
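
Because the description notes that trees sampled at the same approximate location are not treated as related when tasks are generated, a location-aware holdout is one way to keep such trees together during evaluation. The sketch below is an assumption-laden illustration, not part of the dataset or of OpenML: the helper names, the rounding to one decimal degree, and the example rows are all made up.

```python
import random


def location_group(latitude, longitude, decimals=1):
    """Group key: trees with the same rounded coordinates share a group."""
    return (round(latitude, decimals), round(longitude, decimals))


def group_holdout_split(rows, test_fraction=0.2, seed=0):
    """Split rows so each location group lands entirely in train or in test."""
    groups = sorted({location_group(r["latitude"], r["longitude"]) for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one whole group.
    n_test = max(1, int(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train, test = [], []
    for r in rows:
        key = location_group(r["latitude"], r["longitude"])
        (test if key in test_groups else train).append(r)
    return train, test


# Made-up rows (hypothetical identifiers and coordinates), only to exercise the split.
rows = [
    {"tree_id": "a", "latitude": 52.01, "longitude": 4.36},
    {"tree_id": "b", "latitude": 52.02, "longitude": 4.37},
    {"tree_id": "c", "latitude": -3.10, "longitude": -60.02},
    {"tree_id": "d", "latitude": -3.11, "longitude": -60.01},
    {"tree_id": "e", "latitude": 35.70, "longitude": 139.70},
]
train_rows, test_rows = group_holdout_split(rows, test_fraction=0.5)
```

Grouping by `reference_id` instead of coordinates would be a coarser alternative that keeps all records from one data source together.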

19 properties

|Property|Value|
|---|---|
|Number of instances (rows) of the dataset|307014|
|Number of attributes (columns) of the dataset|21|
|Number of distinct values of the target attribute (if it is nominal)|0|
|Number of missing values in the dataset|0|
|Number of instances with at least one value missing|0|
|Number of numeric attributes|12|
|Number of nominal attributes|0|
|Average class difference between consecutive instances|0.87|
|Percentage of missing values|0|
|Number of attributes divided by the number of instances|0|
|Percentage of numeric attributes|57.14|
|Percentage of instances belonging to the most frequent class| |
|Percentage of nominal attributes|0|
|Number of instances belonging to the most frequent class| |
|Percentage of instances belonging to the least frequent class| |
|Number of instances belonging to the least frequent class| |
|Number of binary attributes|0|
|Percentage of binary attributes|0|
|Percentage of instances having missing values|0|
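
Two of the derived properties follow directly from the counts listed here, and a short check shows how. This is only an arithmetic sketch using the numbers on this page; the variable names are hypothetical, and rounding to two decimals is an assumption about how the page displays values.

```python
n_instances = 307014   # number of instances (rows)
n_attributes = 21      # number of attributes (columns)
n_numeric = 12         # number of numeric attributes

# Percentage of numeric attributes: 100 * 12 / 21
pct_numeric = round(100 * n_numeric / n_attributes, 2)   # 57.14

# Attributes divided by instances: about 6.84e-05, which shows as 0 at two decimals
dimensionality = n_attributes / n_instances
dimensionality_displayed = round(dimensionality, 2)
```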

0 tasks
