Classifies occurrence records into levels of confidence in species identification

classify_occ(
  occ,
  spec = NULL,
  na.rm.coords = TRUE,
  crit.levels = c("det_by_spec", "not_spec_name", "image", "sci_collection",
    "field_obs", "no_criteria_met"),
  ignore.det.names = NULL,
  spec.ambiguity = "not.spec",
  institution.code = "institutionCode",
  collection.code = "collectionCode",
  catalog.number = "catalogNumber",
  year = "year",
  date.identified = "dateIdentified",
  species = "species",
  identified.by = "identifiedBy",
  decimal.latitude = "decimalLatitude",
  decimal.longitude = "decimalLongitude",
  basis.of.record = "basisOfRecord",
  media.type = "mediaType",
  occurrence.id = "occurrenceID",
  institution.source,
  year.event,
  scientific.name,
  determined.by,
  latitude,
  longitude,
  basis.of.rec,
  occ.id
)

Arguments

occ

data frame with occurrence record information.

spec

data frame with specialists' names. See details.

na.rm.coords

logical. If TRUE, occurrences with NA in decimal.latitude or decimal.longitude are removed.

crit.levels

character. Vector of confidence levels in decreasing order. The allowed criteria are det_by_spec, not_spec_name, image, sci_collection, field_obs and no_criteria_met. See details.

ignore.det.names

character vector of strings in identified.by that should not be treated as a taxonomist's name. See details.

spec.ambiguity

character. Indicates how to deal with ambiguity in specialists' names. not.spec resolves the ambiguity by classifying the identification as made by a non-specialist; is.spec assumes the identification was made by a specialist; manual.check lets the user check all ambiguous names manually. Default is not.spec.

institution.code

column name of occ with the name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.

collection.code

column name of occ with the name, acronym, code, or initials identifying the collection or data set from which the record was derived.

catalog.number

column name of occ with an identifier (preferably unique) for the record within the data set or collection.

year

column name of occ with the four-digit year in which the event occurred, according to the Common Era calendar.

date.identified

column name of occ with the date on which the subject was determined as representing the taxon.

species

column name of occ with the species names.

identified.by

column name of occ with the name of the person who identified the species.

decimal.latitude

column name of occ with the latitude in decimal degrees.

decimal.longitude

column name of occ with the longitude in decimal degrees.

basis.of.record

column name of occ with the specific nature of the data record. See details.

media.type

column name of occ with the media type associated with the record. See details.

occurrence.id

column name of occ with a link or code identifying the occurrence record. See the Darwin Core occurrenceID term.

institution.source

deprecated, use institution.code instead.

year.event

deprecated, use year instead.

scientific.name

deprecated, use species instead.

determined.by

deprecated, use identified.by instead.

latitude

deprecated, use decimal.latitude instead.

longitude

deprecated, use decimal.longitude instead.

basis.of.rec

deprecated, use basis.of.record instead.

occ.id

deprecated, use occurrence.id instead.

Value

The occ data frame with a new column, naturaList_levels, containing the classification of each record.

Details

The spec data frame must have separate columns named LastName, Name and Abbrev. See the create_spec_df function for an easy way to produce this data frame.
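For illustration only, a spec data frame with the three required columns can also be built by hand. The names below are placeholders, and the exact formatting of Abbrev that classify_occ expects is an assumption here; create_spec_df remains the safer way to build this table.

```r
# Hand-built specialist table with the three columns required by classify_occ.
# Names are placeholders; the Abbrev format (initials with dots) is assumed.
spec <- data.frame(
  LastName = c("Silva", "Souza"),
  Name     = c("Joana", "Pedro"),
  Abbrev   = c("J.", "P."),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before calling classify_occ.
required <- c("LastName", "Name", "Abbrev")
stopifnot(all(required %in% names(spec)))
```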

When ignore.det.names = NULL (default), the function ignores the strings "RRC ID Flag", "NA", "", "-" and "_". When a character vector is provided, the function adds the default strings to it and ignores all of these strings as taxonomist names.
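The rule above can be sketched as follows. The helpers build_ignore_list and has_identifier are hypothetical and not part of naturaList; treating NA values as "no name" is an assumption of this sketch, not documented behavior of classify_occ.

```r
# Hypothetical helper: default ignored strings plus any user-supplied ones,
# mirroring the rule described for ignore.det.names.
build_ignore_list <- function(ignore.det.names = NULL) {
  defaults <- c("RRC ID Flag", "NA", "", "-", "_")
  unique(c(defaults, ignore.det.names))
}

# Hypothetical helper: TRUE when an identifiedBy value should count as a name.
# Missing values (NA) are treated as "no name" (an assumption).
has_identifier <- function(identified.by, ignore.det.names = NULL) {
  !is.na(identified.by) &
    !(identified.by %in% build_ignore_list(ignore.det.names))
}

has_identifier(c("Silva, J.", "-", NA, "Unknown"), ignore.det.names = "Unknown")
# TRUE FALSE FALSE FALSE
```

Supplying "Unknown" extends the default list rather than replacing it, so "-" is still ignored.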

The function classifies the occurrence records into six levels of confidence in species identification. The six levels are:

  • det_by_spec - the identification was made by a specialist whose name is present in the list of specialists provided in the spec argument;

  • not_spec_name - the identification was made by someone whose name is not in the list of specialists provided in spec;

  • image - the record has no identifier name, but has an associated image;

  • sci_collection - the record has no identifier name, but is preserved in a scientific collection;

  • field_obs - the record has no identifier name, but was identified in a field observation;

  • no_criteria_met - none of the other criteria was met.

The order of the levels in crit.levels, from highest to lowest confidence, determines the ranking used in the classification.
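The effect of crit.levels can be sketched as follows, assuming each record receives the first level in crit.levels whose criterion it satisfies. The logical matrix met and the function assign_level are hypothetical; classify_occ computes the criteria itself from the occ columns.

```r
# Hypothetical sketch: pick, for each record, the highest-ranked level it meets.
# `met` is a logical matrix (records x criteria) with columns named by level.
assign_level <- function(met, crit.levels) {
  met <- met[, crit.levels, drop = FALSE]  # reorder columns by confidence
  apply(met, 1, function(row) crit.levels[which(row)[1]])
}

crit.levels <- c("det_by_spec", "not_spec_name", "image",
                 "sci_collection", "field_obs", "no_criteria_met")

# Toy criteria for four records; record 3 has an image and is a specimen.
met <- cbind(
  det_by_spec     = c(TRUE,  FALSE, FALSE, FALSE),
  not_spec_name   = c(FALSE, TRUE,  FALSE, FALSE),
  image           = c(FALSE, FALSE, TRUE,  FALSE),
  sci_collection  = c(FALSE, FALSE, TRUE,  FALSE),
  field_obs       = c(FALSE, FALSE, FALSE, FALSE),
  no_criteria_met = TRUE  # recycled: always satisfied
)

assign_level(met, crit.levels)
# "det_by_spec" "not_spec_name" "image" "no_criteria_met"
```

Moving sci_collection ahead of image in crit.levels would change record 3 to sci_collection while leaving the others unchanged.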

The basis.of.record column must contain one of the following record types: PRESERVED_SPECIMEN, PreservedSpecimen, HUMAN_OBSERVATION or HumanObservation, as in the GBIF basisOfRecord field.
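A minimal sketch of how these values could feed the sci_collection and field_obs levels, assuming preserved specimens count toward sci_collection and human observations toward field_obs. The helpers is_sci_collection and is_field_obs are hypothetical.

```r
# Hypothetical helpers: which basisOfRecord values count toward the
# sci_collection and field_obs levels (an assumed mapping).
is_sci_collection <- function(basis) {
  basis %in% c("PRESERVED_SPECIMEN", "PreservedSpecimen")
}
is_field_obs <- function(basis) {
  basis %in% c("HUMAN_OBSERVATION", "HumanObservation")
}

basis <- c("PRESERVED_SPECIMEN", "HumanObservation", "MACHINE_OBSERVATION", NA)
is_sci_collection(basis)  # TRUE FALSE FALSE FALSE
is_field_obs(basis)       # FALSE TRUE FALSE FALSE
```

Values outside the four listed types, and NA, satisfy neither criterion in this sketch.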

The media.type column follows the GBIF mediaType convention: the value StillImage indicates that an image is associated with the record.
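The image check can be sketched as a substring test on the media type. The helper has_image is hypothetical; the assumption that one cell may list several media types (for example "Sound;StillImage") is why a substring match is used rather than equality, and the match is made case-insensitive to accept both StillImage and stillImage.

```r
# Hypothetical helper: TRUE when the media type mentions a still image.
# Case is ignored and a substring match allows several types in one cell.
has_image <- function(media) {
  !is.na(media) & grepl("stillimage", tolower(media), fixed = TRUE)
}

has_image(c("StillImage", "Sound", "", NA, "Sound;StillImage"))
# TRUE FALSE FALSE FALSE TRUE
```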

Author

Arthur V. Rodrigues

Examples

library(naturaList)
data("A.setosa")
data("speciaLists")
occ.class <- classify_occ(A.setosa, speciaLists)