The aim of the naturaList package is to classify occurrence records according to the confidence in their species identification. Records are ranked into up to six levels of confidence. In addition, naturaList provides tools to filter occurrence data based on these classification levels, identify possible specialists in a taxon, and evaluate the effects of the filtering procedure on descriptors of the species' spatial distribution (area of distribution and niche breadth). With naturaList, users can filter large occurrence datasets using well-established and clear criteria, evaluate the possible effects of data processing on downstream analyses, and explore spatial occurrence data through an interactive interface.
Install the package:
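A minimal sketch of the installation step, assuming the package is available on CRAN under the name naturaList. The commented GitHub line assumes the development repository is avrodrigues/naturaList; check the package page before relying on either.

```r
# Install the released version (assumes naturaList is available on CRAN)
install.packages("naturaList")

# Alternatively, install the development version from GitHub.
# The repository path below is an assumption; adjust it if it differs.
# install.packages("devtools")
# devtools::install_github("avrodrigues/naturaList")

# Load the package
library(naturaList)
```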
The core function of naturaList is classify_occ(). The rationale of the classification is that the most reliable identification of a specimen is the one made by a specialist in the taxon. To classify an occurrence at this level of confidence, classify_occ() needs both an occurrence dataset and a specialist dataset. The other levels into which records can be classified are derived from information contained in the occurrence dataset itself. The default order for classification in confidence levels is:
Users can alter this order depending on their objectives, except for Level 1, which is always a species determined by a specialist.
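As a hedged sketch of how the order might be changed: the example below assumes that classify_occ() accepts a crit.levels argument holding the criteria names in decreasing order of confidence, and that the criteria names are the ones shown. Both are assumptions to verify in ?classify_occ. It uses the A.setosa and speciaLists datasets bundled with the package.

```r
library(naturaList)

data("A.setosa")
data("speciaLists")

# Assumed criteria names, listed from highest to lowest confidence.
# "det_by_spec" stays first because Level 1 cannot be moved.
# Here "sci_collection" is ranked above "image" purely as an illustration.
custom.levels <- c(
  "det_by_spec",
  "not_spec_name",
  "sci_collection",
  "image",
  "field_obs",
  "no_criteria_met"
)

# Classify with the custom order (crit.levels is assumed to be the argument name)
occ.custom <- classify_occ(A.setosa, speciaLists, crit.levels = custom.levels)
```

Ranking preserved specimens above images is only one possible choice; the appropriate order depends on the question being asked of the data.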
As an example, we will use two datasets included in naturaList: A.setosa, as the occurrence dataset, and speciaLists, as the specialist dataset. A.setosa contains occurrence records for Alsophila setosa, a tree fern of the Brazilian Atlantic Forest. This dataset was downloaded from the Global Biodiversity Information Facility (GBIF). speciaLists is a dataset of specialists in the ferns and lycophytes of Brazil, which we gathered from the authors of this paper.
Classification using the default order of confidence levels
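A minimal sketch of this step, assuming both datasets can be loaded with data() once the package is attached. Calling classify_occ() with only the occurrence and specialist datasets leaves the order of confidence levels at its default.

```r
library(naturaList)

# Load the example datasets shipped with the package
data("A.setosa")
data("speciaLists")

# Classify the occurrence records using the default order of confidence levels
occ.class <- classify_occ(A.setosa, speciaLists)
```

The result stores each record's level in the naturaList_levels column.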
You can check how many occurrences were classified in each level:
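One way to do this, sketched with base R's table() on the naturaList_levels column that classify_occ() adds to its output. The classification call is repeated so that the snippet runs on its own.

```r
library(naturaList)

data("A.setosa")
data("speciaLists")

# Classify with the default settings
occ.class <- classify_occ(A.setosa, speciaLists)

# Count the records assigned to each confidence level
level.counts <- table(occ.class$naturaList_levels)
level.counts
```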
You can easily create a specialist dataset using create_spec_df(). You just need to provide a character vector with the names of the specialists, and the output is a dataset formatted to be used in classify_occ().
In this example, we use the names of four famous Brazilian musicians. Note that the Latin accent mark is provided, and even a nickname (e.g. Tom Jobim).
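A sketch of what such a call could look like. Apart from Tom Jobim, the names below are illustrative choices rather than taken from the text, and the object names br.musicians and spec.df are hypothetical.

```r
library(naturaList)

# Character vector with full names; accents and a nickname ("Tom") are kept as typed.
# The specific names are illustrative assumptions.
br.musicians <- c(
  "Caetano Veloso",
  "Antônio Carlos Tom Jobim",
  "Gilberto Gil",
  "Vinícius de Moraes"
)

# Build a specialist dataset in the format expected by classify_occ()
spec.df <- create_spec_df(br.musicians)
spec.df
```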
Some strings in the ‘identifiedBy’ column of the occurrence dataset may not correspond to a taxonomist's name. Strings such as "Unknown" are often included in the ‘identifiedBy’ data field. It is important that classify_occ() ignores such strings; otherwise, the function could flag an occurrence record as determined by a taxonomist when it was not.
To cope with this issue, get_det_names() can be used to check which strings are not taxonomists' names. This function returns all unique strings in the ‘identifiedBy’ column of the dataset. Based on this list, you can create a character vector with the strings to be ignored by classify_occ() and pass it to the ignore.det.names argument. See ?classify_occ for more details.
# check whether there are strings that are not taxonomists' names
get_det_names(A.setosa)

# store these strings in an object
ig.names <- c("Sem Informação", "Anonymous")

# use 'ignore.det.names' to ignore those strings in classify_occ()
occ.class <- classify_occ(A.setosa, speciaLists, ignore.det.names = ig.names)

table(occ.class$naturaList_levels)