R/classify_occ.R
classify_occ.Rd
Classifies occurrence records into levels of confidence in species identification
classify_occ(
  occ,
  spec = NULL,
  na.rm.coords = TRUE,
  crit.levels = c("det_by_spec", "not_spec_name", "image", "sci_collection", "field_obs", "no_criteria_met"),
  ignore.det.names = NULL,
  spec.ambiguity = "not.spec",
  institution.code = "institutionCode",
  collection.code = "collectionCode",
  catalog.number = "catalogNumber",
  year = "year",
  date.identified = "dateIdentified",
  species = "species",
  identified.by = "identifiedBy",
  decimal.latitude = "decimalLatitude",
  decimal.longitude = "decimalLongitude",
  basis.of.record = "basisOfRecord",
  media.type = "mediaType",
  occurrence.id = "occurrenceID",
  institution.source,
  year.event,
  scientific.name,
  determined.by,
  latitude,
  longitude,
  basis.of.rec,
  occ.id
)
occ | data frame with occurrence records information. |
---|---|
spec | data frame with specialists' names. See details. |
na.rm.coords | logical. If |
crit.levels | character. Vector with levels of confidence in decreasing
order. The criteria allowed are |
ignore.det.names | character vector indicating strings in
|
spec.ambiguity | character. Indicates how to deal with ambiguity in
specialists' names. |
institution.code | column name of |
collection.code | column name of |
catalog.number | column name of |
year | Column name of |
date.identified | Column name of |
species | column name of |
identified.by | column name of |
decimal.latitude | column name of |
decimal.longitude | column name of |
basis.of.record | column name with the specific nature of the data record. See details. |
media.type | column name of |
occurrence.id | column name of |
institution.source | deprecated, use |
year.event | deprecated, use |
scientific.name | deprecated, use |
determined.by | deprecated, use |
latitude | deprecated, use |
longitude | deprecated, use |
basis.of.rec | deprecated, use |
occ.id | deprecated, use |
The occ data frame plus the classification of each record in a new column, named naturaList_levels.
The spec data frame must have separate columns for LastName, Name and Abbrev. See the create_spec_df function for an easy way to produce this data frame.
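As a rough illustration of the layout described above, a specialists data frame could be assembled by hand in base R. This is only a sketch: the names are invented, and it assumes exactly the three columns named in this section. The output of create_spec_df remains the reference for the layout the package actually expects.

```r
# Hypothetical specialists; the three column names follow the text above.
spec <- data.frame(
  LastName = c("Silva", "Souza"),
  Name     = c("Joao", "Maria"),
  Abbrev   = c("J", "M"),
  stringsAsFactors = FALSE
)

# Inspect the structure: three character columns, one row per specialist.
str(spec)
```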
When ignore.det.names = NULL (default), the function ignores the strings "RRC ID Flag", "NA", "", "-" and "_". When a character vector is provided, the function adds the default strings to it and ignores all of these strings, so that none of them is treated as the name of a taxonomist.
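The way user-supplied strings combine with the defaults can be sketched in base R. This is not the package's internal code: the variable names are made up for illustration, and it assumes the strings are compared by exact match, which this section does not state.

```r
# Default strings listed in the text above.
default_ignore <- c("RRC ID Flag", "NA", "", "-", "_")

# Hypothetical extra strings a user might pass as ignore.det.names.
user_ignore <- c("Unknown", "s.n.")

# The defaults are added to the user-supplied vector.
ignore_all <- unique(c(user_ignore, default_ignore))

# Hypothetical identifier names; exact matching is an assumption.
identified_by <- c("J. Silva", "Unknown", "-", "", "M. Souza")
is_ignored <- identified_by %in% ignore_all

# Entries that would still count as a taxonomist name.
kept_names <- identified_by[!is_ignored]
kept_names
```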
The function classifies the occurrence records in six levels of confidence in species identification. The six levels are:
det_by_spec - when the identification was made by a specialist who is present in the list of specialists provided in the spec argument;
not_spec_name - when the identification was made by someone whose name is not among the specialists' names provided in spec;
image - the occurrence has no identifier name but has an associated image;
sci_collection - the occurrence has no identifier name but is preserved in a scientific collection;
field_obs - the occurrence has no identifier name but was identified in a field observation;
no_criteria_met - no other criterion was met.
The (decreasing) order of the levels in the character vector determines the classification level order.
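To make the effect of the level order concrete, the sketch below assigns each toy record the first level in the vector whose criterion it meets. It illustrates the idea only and is not the package's implementation. The logical flags and the assign_level helper are hypothetical and simplified, since classify_occ derives the criteria from the columns of occ.

```r
# Toy flags: one row per record, one logical column per criterion (hypothetical).
flags <- data.frame(
  det_by_spec    = c(TRUE,  FALSE, FALSE, FALSE, FALSE),
  not_spec_name  = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  image          = c(TRUE,  FALSE, TRUE,  FALSE, FALSE),
  sci_collection = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  field_obs      = c(FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: the highest-priority criterion met wins.
assign_level <- function(flags, crit_levels) {
  out <- rep("no_criteria_met", nrow(flags))
  # Walk from lowest to highest priority so higher levels overwrite lower ones.
  for (lev in rev(setdiff(crit_levels, "no_criteria_met"))) {
    out[flags[[lev]]] <- lev
  }
  out
}

default_order <- c("det_by_spec", "not_spec_name", "image",
                   "sci_collection", "field_obs", "no_criteria_met")
levels_default <- assign_level(flags, default_order)

# Same records, but with sci_collection ranked above image.
reordered <- c("det_by_spec", "not_spec_name", "sci_collection",
               "image", "field_obs", "no_criteria_met")
levels_reordered <- assign_level(flags, reordered)

data.frame(levels_default, levels_reordered)
```

In this toy data the third record has both an image and a collection specimen, so its level changes from image to sci_collection when the two criteria swap places in the vector.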
basis.of.record is a character vector with one of the following types of record: PRESERVED_SPECIMEN, PreservedSpecimen, HUMAN_OBSERVATION or HumanObservation, as in GBIF data 'basisOfRecord'.
media.type uses the same pattern as the GBIF mediaType column, indicating the existence of an associated image with stillImage.
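The sketch below shows one way the values described above could be turned into criteria flags in base R. The toy data and the matching rules are assumptions: exact matching on the four basisOfRecord spellings, and a fixed substring search for stillImage. The package may test these columns differently.

```r
# Toy occurrence table using the default GBIF-style column names.
occ <- data.frame(
  basisOfRecord = c("PRESERVED_SPECIMEN", "HumanObservation",
                    "FOSSIL_SPECIMEN", "PreservedSpecimen"),
  mediaType     = c(NA, "stillImage", NA, "stillImage"),
  stringsAsFactors = FALSE
)

# Records held in a scientific collection (either spelling).
is_collection <- occ$basisOfRecord %in% c("PRESERVED_SPECIMEN", "PreservedSpecimen")

# Records identified in a field observation (either spelling).
is_field_obs <- occ$basisOfRecord %in% c("HUMAN_OBSERVATION", "HumanObservation")

# Records with an associated image; a missing mediaType counts as no image.
has_image <- !is.na(occ$mediaType) &
  grepl("stillImage", occ$mediaType, fixed = TRUE)

cbind(occ, is_collection, is_field_obs, has_image)
```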
Arthur V. Rodrigues