Source: R/classify_occ.R, classify_occ.Rd
Classifies occurrence records into levels of confidence in species identification
classify_occ( occ, spec = NULL, na.rm.coords = TRUE, crit.levels = c("det_by_spec", "not_spec_name", "image", "sci_collection", "field_obs", "no_criteria_met"), ignore.det.names = NULL, spec.ambiguity = "not.spec", institution.code = "institutionCode", collection.code = "collectionCode", catalog.number = "catalogNumber", year = "year", date.identified = "dateIdentified", species = "species", identified.by = "identifiedBy", decimal.latitude = "decimalLatitude", decimal.longitude = "decimalLongitude", basis.of.record = "basisOfRecord", media.type = "mediaType", occurrence.id = "occurrenceID", institution.source, year.event, scientific.name, determined.by, latitude, longitude, basis.of.rec, occ.id )
| Argument | Description |
|---|---|
| occ | data frame with occurrence record information. |
| spec | data frame with specialists' names. See details. |
| na.rm.coords | logical. If `TRUE` (the default), removes occurrence records with `NA` in latitude or longitude. |
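| crit.levels | character. Vector with the levels of confidence in decreasing order. The allowed criteria are `det_by_spec`, `not_spec_name`, `image`, `sci_collection`, `field_obs` and `no_criteria_met`. See details. |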
| ignore.det.names | character vector of strings in the `identified.by` column that should not be treated as a taxonomist's name. See details. |
| spec.ambiguity | character. Indicates how to deal with ambiguity in specialists' names. Defaults to `"not.spec"`. |
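| institution.code | column name of `occ` with the institution code. |
| collection.code | column name of `occ` with the collection code. |
| catalog.number | column name of `occ` with the catalog number. |
| year | column name of `occ` with the year in which the occurrence was recorded. |
| date.identified | column name of `occ` with the date on which the species was identified. |
| species | column name of `occ` with the species names. |
| identified.by | column name of `occ` with the name of the person who identified the species. |
| decimal.latitude | column name of `occ` with the latitude in decimal degrees. |
| decimal.longitude | column name of `occ` with the longitude in decimal degrees. |
| basis.of.record | column name of `occ` with the specific nature of the data record. See details. |
| media.type | column name of `occ` indicating whether an image is associated with the record. See details. |
| occurrence.id | column name of `occ` with the occurrence identifier. |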
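| institution.source | deprecated, use `institution.code` instead. |
| year.event | deprecated, use `year` instead. |
| scientific.name | deprecated, use `species` instead. |
| determined.by | deprecated, use `identified.by` instead. |
| latitude | deprecated, use `decimal.latitude` instead. |
| longitude | deprecated, use `decimal.longitude` instead. |
| basis.of.rec | deprecated, use `basis.of.record` instead. |
| occ.id | deprecated, use `occurrence.id` instead. |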
Returns the `occ` data frame with an additional column, `naturaList_levels`, that holds the classification of each record.
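Because the result is a plain data frame, the new column can be summarised with base R. The sketch below does not call `classify_occ()`. It builds a hypothetical stand-in result in which only the column name `naturaList_levels` and the level names come from this page, and tallies how many records fall into each level.

```r
# Hypothetical stand-in for a classify_occ() result: the species values are
# invented; only the naturaList_levels column name comes from this page.
res <- data.frame(
  species = c("Sp. a", "Sp. a", "Sp. b", "Sp. b"),
  naturaList_levels = c("det_by_spec", "image", "det_by_spec", "no_criteria_met"),
  stringsAsFactors = FALSE
)

# Default level order from the usage section.
crit.levels <- c("det_by_spec", "not_spec_name", "image",
                 "sci_collection", "field_obs", "no_criteria_met")

# Converting to a factor with explicit levels keeps zero-count levels in the tally.
counts <- table(factor(res$naturaList_levels, levels = crit.levels))
print(counts)

# Keep only records identified by a specialist.
res[res$naturaList_levels == "det_by_spec", ]
```

Fixing the factor levels to `crit.levels` also orders the tally from highest to lowest confidence.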
The `spec` data frame must have separate columns named `LastName`, `Name` and `Abbrev`. See the `create_spec_df` function for an easy way to produce this data frame.
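The required shape of `spec` can be sketched by hand in base R. In the sketch, the specialist names and the use of `Abbrev` for an abbreviated first name are assumptions made for illustration. The helper `has_spec_columns()` is hypothetical and not part of naturaList. It only checks the three column names listed above.

```r
# Hypothetical specialists table: the names are invented, and Abbrev is
# assumed here to hold an abbreviated first name.
spec <- data.frame(
  LastName = c("Smith", "Silva"),
  Name     = c("John", "Maria"),
  Abbrev   = c("J.", "M."),
  stringsAsFactors = FALSE
)

# Hypothetical check: TRUE when all three required columns are present.
has_spec_columns <- function(df) {
  all(c("LastName", "Name", "Abbrev") %in% names(df))
}

has_spec_columns(spec)  # returns TRUE
```

In practice, `create_spec_df` is the documented way to build this table.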
When `ignore.det.names = NULL` (the default), the function ignores the strings "RRC ID Flag", "NA", "", "-" and "_". When a character vector is provided, the function adds these default strings to it and does not treat any of the resulting strings as a taxonomist's name.
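The rule that user strings are added to the defaults, rather than replacing them, can be sketched as follows. Both helpers are hypothetical. Treating a real `NA` value as ignored is an assumption of this sketch, because the page only lists the string "NA".

```r
# Default strings listed in the details above.
default_ignored <- c("RRC ID Flag", "NA", "", "-", "_")

# Hypothetical helper: user-supplied strings are appended to the defaults.
ignored_strings <- function(ignore.det.names = NULL) {
  unique(c(default_ignored, ignore.det.names))
}

# Hypothetical helper: TRUE when an identifiedBy value should not count as a
# taxonomist name. Treating a missing value (NA) as ignored is an assumption.
is_ignored_name <- function(x, ignore.det.names = NULL) {
  is.na(x) | x %in% ignored_strings(ignore.det.names)
}

is_ignored_name(c("Silva, M.", "-", "Unknown"), ignore.det.names = "Unknown")
# returns FALSE TRUE TRUE
```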
The function classifies occurrence records into six levels of confidence in species identification:
- `det_by_spec`: the identification was made by a specialist listed in the `spec` argument;
- `not_spec_name`: the identification was made by someone whose name is not among the specialists provided in `spec`;
- `image`: the record has no identifier name but has an associated image;
- `sci_collection`: the record has no identifier name but is preserved in a scientific collection;
- `field_obs`: the record has no identifier name but was identified during a field observation;
- `no_criteria_met`: none of the other criteria was met.
The order of the levels in `crit.levels` defines their ranking, from highest to lowest confidence.
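A minimal sketch of how a ranked vector of levels can drive classification, assuming that each record receives the first level in `crit.levels` whose criterion it meets. This is an assumed rule for illustration, not the package's implementation. The per-criterion logical flags are supplied directly here, and the helper `assign_level()` is hypothetical.

```r
# Hypothetical sketch: assign each record the first level in crit.levels whose
# criterion it satisfies. flags holds one logical column per criterion.
assign_level <- function(flags, crit.levels) {
  flags$no_criteria_met <- TRUE           # the fallback always holds
  out <- rep(NA_character_, nrow(flags))
  for (lvl in crit.levels) {
    hit <- is.na(out) & flags[[lvl]]      # unclassified records meeting lvl
    out[hit] <- lvl
  }
  out
}

# Invented flags for three records.
flags <- data.frame(
  det_by_spec    = c(TRUE,  FALSE, FALSE),
  not_spec_name  = c(FALSE, FALSE, FALSE),
  image          = c(TRUE,  TRUE,  FALSE),
  sci_collection = c(TRUE,  FALSE, FALSE),
  field_obs      = c(FALSE, FALSE, FALSE)
)
levels_default <- c("det_by_spec", "not_spec_name", "image",
                    "sci_collection", "field_obs", "no_criteria_met")

assign_level(flags, levels_default)
# returns "det_by_spec" "image" "no_criteria_met"
```

If `sci_collection` is moved to the front of the vector, the first record is labelled `sci_collection`, which shows how the order alone changes the outcome under this assumed rule.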
The `basis.of.record` column is a character vector whose values are one of the following record types, as in the GBIF `basisOfRecord` field: `PRESERVED_SPECIMEN`, `PreservedSpecimen`, `HUMAN_OBSERVATION` or `HumanObservation`.
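A sketch of recognising the two spellings of each record type. The value sets come from the paragraph above. Linking preserved specimens to `sci_collection` and human observations to `field_obs` is inferred from the level descriptions and is an assumption. The helper names are hypothetical.

```r
# Value sets listed in the details above (both GBIF spellings).
preserved_values   <- c("PRESERVED_SPECIMEN", "PreservedSpecimen")
observation_values <- c("HUMAN_OBSERVATION", "HumanObservation")

# Hypothetical helpers, assumed to feed the sci_collection and field_obs levels.
is_sci_collection <- function(basis) basis %in% preserved_values
is_field_obs      <- function(basis) basis %in% observation_values

basis <- c("PreservedSpecimen", "HUMAN_OBSERVATION", "MACHINE_OBSERVATION")
is_sci_collection(basis)  # returns TRUE FALSE FALSE
is_field_obs(basis)       # returns FALSE TRUE FALSE
```

Using `%in%` means that values outside the listed types, such as `MACHINE_OBSERVATION`, match neither helper.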
The `media.type` column follows the same pattern as the GBIF `mediaType` column: the value `stillImage` indicates that an image is associated with the record.
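One way to flag records with an associated image is sketched below. The case-insensitive substring match is an assumption, chosen so that both `stillImage` and GBIF's `StillImage` spelling are accepted. The helper `has_image()` is hypothetical.

```r
# Hypothetical helper: TRUE when the media type mentions stillImage.
# Case-insensitive matching is an assumption of this sketch.
has_image <- function(media) {
  !is.na(media) & grepl("stillImage", media, ignore.case = TRUE)
}

has_image(c("StillImage", NA, "Sound", ""))
# returns TRUE FALSE FALSE FALSE
```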
Author: Arthur V. Rodrigues