Since the dataset combines different sources (IUCN and Birdlife) and data models (spatial and non-spatial tables), the objects need to be harmonized.
Code is: creates_attributes_sp_iucn.sql. Output table is: species_202001.attributes_sp_iucn.
Birdlife geometric data are processed, and selected attributes are extracted, so as to obtain the same structure as the processed IUCN data. Species flagged as sensitive are removed to avoid disseminating their distribution (directly, or as an intersection with protected areas). For Birdlife 2019-1 they are: Thalasseus bernsteini and Garrulax courtoisi (id_no: 22694585, 22732350).
Code is: creates_attributes_sp_birdlife.sql. Output table is: species_202001.attributes_sp_birdlife.
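The removal of sensitive species can be illustrated with a minimal Python sketch. It is not the content of creates_attributes_sp_birdlife.sql; the function name, the record layout and the sample rows are assumptions made for illustration, and only the two id_no values come from the text above.

```python
# id_no values of the sensitive species listed above for Birdlife 2019-1.
SENSITIVE_ID_NO = {22694585, 22732350}


def drop_sensitive(rows, sensitive=SENSITIVE_ID_NO):
    """Return only the rows whose id_no is not flagged as sensitive."""
    return [row for row in rows if row["id_no"] not in sensitive]


# Hypothetical sample records; only the id_no key matters here.
birdlife_rows = [
    {"id_no": 22694585, "label": "sensitive species"},
    {"id_no": 22732350, "label": "sensitive species"},
    {"id_no": 1, "label": "placeholder species"},
]

kept = drop_sensitive(birdlife_rows)
print([row["id_no"] for row in kept])
```

Keeping the identifiers in a single set makes it easy to update the list when a new Birdlife release flags different species.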
The attributes selected from the IUCN and Birdlife geometric data are appended to each other.
Code is: creates_attributes_sp.sql. Output table is: species_202001.attributes_sp.
Only species present in both datasets (spatial and non-spatial) are included in the final selection.
Code is: creates_attributes.sql. Output table is: species_202001.attributes.
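The two steps above (append, then keep only the species found in both datasets) can be sketched in Python as follows. This is an illustration under assumptions, not the SQL used in the workflow: it reads "both datasets" as the spatial and the non-spatial data, and the function names and sample records are hypothetical.

```python
def append_sources(iucn_rows, birdlife_rows):
    """Append two attribute lists that share the same structure."""
    return list(iucn_rows) + list(birdlife_rows)


def keep_common(spatial_rows, non_spatial_ids):
    """Keep spatial rows whose id_no also exists in the non-spatial data."""
    ids = set(non_spatial_ids)
    return [row for row in spatial_rows if row["id_no"] in ids]


# Hypothetical sample data.
iucn = [{"id_no": 10, "source": "iucn"}, {"id_no": 11, "source": "iucn"}]
birdlife = [{"id_no": 20, "source": "birdlife"}]
non_spatial_ids = [10, 20, 30]

attributes_sp = append_sources(iucn, birdlife)
attributes = keep_common(attributes_sp, non_spatial_ids)
print([row["id_no"] for row in attributes])
```

In this sample, id_no 11 is dropped because it has no non-spatial record, and id_no 30 never appears because it has no geometry.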
“Systematic” groups (corals, sharks_rays_chimaeras, amphibians, birds, mammals) are processed independently using the flattening workflow (fully described in another section).
Geometries of all groups are filtered to include only the species selected in the previous “harmonization - step 5”.
Code is: creates_geoms.sql. Output tables are:
Flattening at 30 arcsec (~900 meters at the equator) is applied to each group. Steps 00_ (create infrastructure) and a_ (import input tables) are executed independently.
If needed, a geometry fix is applied after step a_.
All the other steps are executed inside the z_do_it_all.sh script.
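The cell size quoted for the flattening (30 arcsec, roughly 900 meters at the equator) can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius; the function name is hypothetical and the code is not part of the flattening scripts.

```python
import math

EARTH_EQUATORIAL_RADIUS_M = 6378137.0  # WGS84 semi-major axis, in metres


def cell_width_m(arcsec, latitude_deg=0.0):
    """Approximate east-west width, in metres, of a cell `arcsec` wide."""
    degrees = arcsec / 3600.0
    metres_per_degree = 2 * math.pi * EARTH_EQUATORIAL_RADIUS_M / 360.0
    # Meridians converge towards the poles, so the width shrinks with cos(latitude).
    return degrees * metres_per_degree * math.cos(math.radians(latitude_deg))


width_equator = cell_width_m(30)
width_60n = cell_width_m(30, 60)
print(round(width_equator), round(width_60n))
```

The east-west width halves at 60 degrees of latitude, so a 30 arcsec cell covers a progressively smaller ground area away from the equator.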
Outputs are exported as raster vrt, with an attribute table (to be used for reclass) containing: “cid” | “species” | “richness” | “endemic_threatened” | “richness_endemic_threatened”.
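How such an attribute table might be used for a reclass is sketched below in Python. The column names follow the list above, but their contents are assumptions: cid is taken to be the value stored in each raster pixel, species and endemic_threatened are taken to be lists of id_no, and the two richness columns their counts. All values are invented for illustration.

```python
# Hypothetical attribute table keyed by cid (values are invented).
attribute_table = {
    1: {"species": [101, 102], "richness": 2,
        "endemic_threatened": [102], "richness_endemic_threatened": 1},
    2: {"species": [103], "richness": 1,
        "endemic_threatened": [], "richness_endemic_threatened": 0},
}


def reclass(pixels, table, column, nodata=0):
    """Replace each cid in a 2D grid with the chosen attribute value."""
    return [
        [table[cid][column] if cid in table else nodata for cid in row]
        for row in pixels
    ]


# Hypothetical 2x2 raster of cid values; 0 has no entry in the table.
cid_raster = [[1, 2], [0, 1]]
richness_raster = reclass(cid_raster, attribute_table, "richness")
print(richness_raster)
```

The same cid raster can be reclassified once per column, so a single export supports both richness maps.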
Environment and log files are reported.
SQL files are also reported when a geometry fix was needed (after step a_ of flattening).
Some of the species distribution ranges are too small to be (pseudo)rasterised at 1 km (e.g. 8 amphibians are left out, of which 3 are Data Deficient and 4 are Critically Endangered). They can be recovered by assigning an artificial minimum range of 1 sq km (the single pixel intersecting the centroid), then calculating the “boost” applied as the ratio artificial/original. This goes in the todo-list.
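The proposed “boost” is a simple ratio, sketched below in Python. Since this item is still on the todo-list, nothing here reflects existing code: the function name is hypothetical, and returning 1.0 for ranges that already reach the minimum is an assumption.

```python
def boost(original_area_sqkm, minimum_area_sqkm=1.0):
    """Ratio artificial/original applied when a range is raised to the minimum."""
    if original_area_sqkm <= 0:
        raise ValueError("original area must be positive")
    if original_area_sqkm >= minimum_area_sqkm:
        # Assumption: ranges already at or above the minimum are left unchanged.
        return 1.0
    return minimum_area_sqkm / original_area_sqkm


small_range_boost = boost(0.25)   # a 0.25 sq km range raised to 1 sq km
large_range_boost = boost(5.0)    # a range that needs no adjustment
print(small_range_boost, large_range_boost)
```

Storing the ratio alongside the species would make it possible to see how much each recovered range was inflated.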
Non-spatial data are normalized directly in the final output schema (species):
category tables: contain the code and the category (name).
lookup tables (lt_): relate species (id_no) with category tables (through code). Only id_no present in both datasets (spatial and non-spatial) are included in the final selection. These tables are preparatory for the next category (derived tables; dt_), and exist only to facilitate the filtering of the input tables. In the near future they could be deleted, moving the related code into the derived tables section.
derived tables (dt_): relate species (id_no) with category tables (through arrays of code). Only id_no present in both datasets (spatial and non-spatial) are included in the final selection. These tables are derived from the previous intermediate category (lookup tables; lt_), which exists only to facilitate the filtering of the input tables. In the near future, lookup tables could be deleted, moving the related code into this section.
Code is: creates_output_schema.sql.
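The relation between lookup tables (one row per id_no and code) and derived tables (one row per id_no with an array of codes) can be sketched in Python. This is an illustration under assumptions, not the content of creates_output_schema.sql: the function name, the sample codes and the choice to sort and de-duplicate the arrays are all hypothetical.

```python
from collections import defaultdict


def derive_arrays(lookup_rows, valid_ids):
    """Group (id_no, code) pairs into one sorted code array per id_no.

    Only id_no values listed in valid_ids (present in both the spatial
    and the non-spatial data) are kept.
    """
    valid = set(valid_ids)
    grouped = defaultdict(list)
    for id_no, code in lookup_rows:
        if id_no in valid:
            grouped[id_no].append(code)
    return {id_no: sorted(set(codes)) for id_no, codes in grouped.items()}


# Hypothetical lookup rows: (id_no, category code).
lt_rows = [(10, "2.3"), (10, "1.1"), (11, "1.1"), (99, "5.2")]
dt_rows = derive_arrays(lt_rows, valid_ids=[10, 11])
print(dt_rows)
```

Because the filtering and the grouping happen in one pass, the sketch also shows why the lookup tables could eventually be folded into the derived tables step.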
Output schema contains
Options for country filters are (bold=used; italic=to be reviewed):
presence: Extant, Extinct Post-1500, Possibly Extant, Possibly Extinct, Presence Uncertain
origin: Assisted Colonisation, Introduced, Native, Origin Uncertain, Reintroduced, Vagrant
seasonality: NULL, Non-Breeding Season, Breeding Season, Resident, Passage, Seasonal Occurrence Uncertain
The above affects the calculation of endemicity (check_countries.sql)!
The final step creates the mt_attributes table and all dt_ tables.
Code is: creates_output_table_function.sql.
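To show why the country filter options listed earlier change the endemicity result, here is a minimal Python sketch. It is not the logic of check_countries.sql. The accepted values are placeholders (this text does not show which options are actually used), treating a species as endemic when it occurs in exactly one country after filtering is an assumption, and all names and sample rows are hypothetical.

```python
# Placeholder selections: the options actually used are not shown in this text.
STRICT = {
    "presence": {"Extant"},
    "origin": {"Native"},
    "seasonality": {None, "Resident", "Breeding Season"},
}
LOOSE = dict(STRICT, origin={"Native", "Introduced"})


def passes(row, accepted):
    """True when every filtered field of the row holds an accepted value."""
    return all(row[field] in values for field, values in accepted.items())


def endemic_ids(rows, accepted):
    """id_no values found in exactly one country after filtering (assumed rule)."""
    countries = {}
    for row in rows:
        if passes(row, accepted):
            countries.setdefault(row["id_no"], set()).add(row["country"])
    return sorted(i for i, found in countries.items() if len(found) == 1)


# Hypothetical occurrence records (seasonality None stands for NULL).
rows = [
    {"id_no": 1, "country": "IT", "presence": "Extant", "origin": "Native", "seasonality": "Resident"},
    {"id_no": 1, "country": "FR", "presence": "Extant", "origin": "Introduced", "seasonality": "Resident"},
    {"id_no": 2, "country": "IT", "presence": "Extant", "origin": "Native", "seasonality": None},
    {"id_no": 2, "country": "FR", "presence": "Extant", "origin": "Native", "seasonality": "Resident"},
]

strict_result = endemic_ids(rows, STRICT)
loose_result = endemic_ids(rows, LOOSE)
print(strict_result, loose_result)
```

In this sample, species 1 counts as endemic only while introduced occurrences are excluded, which is the kind of effect the filter review needs to weigh.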