| prompt_author: Will Weaver, Kendall Fitzgerald | |
| prompt_author_institution: University of Michigan, Field Museum of Natural History | |
| prompt_name: FMNH_mammals_test6 | |
| prompt_version: v-6 | |
| prompt_description: Prompt developed by the University of Michigan. Adapted from SLTPvM. | |
| SLTPvB prompts all have standardized column headers (fields) that were chosen due | |
| to their reliability and prevalence in herbarium records. All field descriptions | |
| are based on the official Darwin Core guidelines. SLTPvB_long - The most verbose | |
| prompt option. Descriptions closely follow DwC guides. Detailed rules for the LLM | |
| to follow. Works best with double or triple OCR to increase attention back to the | |
| OCR (select 'use both OCR models' or 'handwritten + printed' along with trOCR). | |
| SLTPvB_medium - Shorter version of _long. SLTPvB_short - The least verbose possible | |
| prompt while still providing rules and DwC descriptions. | |
| LLM: General Purpose | |
| instructions: 1. Refactor the unstructured OCR text into a dictionary based on the | |
| JSON structure outlined below. 2. Map the unstructured OCR text to the appropriate | |
| JSON key and populate the field given the user-defined rules. 3. JSON key values | |
| are permitted to remain empty strings if the corresponding information is not found | |
| in the unstructured OCR text. 4. Duplicate dictionary fields are not allowed. 5. | |
| Ensure all JSON keys are in camel case. 6. Ensure new JSON field values follow sentence | |
| case capitalization. 7. Ensure all key-value pairs in the JSON dictionary strictly | |
| adhere to the format and data types specified in the template. 8. Ensure output | |
| JSON string is valid JSON format. It should not have trailing commas or unquoted | |
| keys. 9. Only return a JSON dictionary represented as a string. You should not explain | |
| your answer. | |
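Rules 3, 4, 5, 7, 8, and 9 above are mechanical enough to check in code before a reply is accepted. The following is a minimal Python sketch and is not part of the prompt or of any existing tool. It makes three assumptions:

- The model's reply arrives as one string.
- Every value in the template is a string, and empty strings are allowed.
- A short, hypothetical `EXPECTED_KEYS` list stands in for the full set of keys defined under `rules`.

The function names are also illustrative.

```python
import json
import re

# Hypothetical subset of the keys defined under `rules`; a real check would
# build this list from the prompt file itself.
EXPECTED_KEYS = ["catalogNumber", "scientificName", "genus", "collectionDate"]

# Rule 5: lowerCamelCase keys (lowercase first letter, then letters/digits only).
CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def _reject_duplicates(pairs):
    """object_pairs_hook for json.loads that fails on duplicate keys (rule 4)."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen[key] = value
    return seen


def validate_llm_json(reply, expected_keys=EXPECTED_KEYS):
    """Return a list of problems found in an LLM reply; an empty list means OK."""
    try:
        # Rule 8: json.loads rejects trailing commas and unquoted keys.
        data = json.loads(reply, object_pairs_hook=_reject_duplicates)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):  # Rule 9: a single JSON dictionary
        return ["top-level value is not a JSON object"]
    problems = []
    for key, value in data.items():
        if not CAMEL_CASE.fullmatch(key):
            problems.append(f"key not camelCase: {key}")
        if not isinstance(value, str):  # Rules 3 and 7: string values, "" allowed
            problems.append(f"value for {key} is not a string")
    for key in expected_keys:
        if key not in data:
            problems.append(f"missing key: {key}")
    return problems
```

Returning a list of problems instead of raising lets the caller decide whether to re-prompt the model or keep a partially valid record.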
| json_formatting_instructions: This section provides rules for formatting each JSON | |
| value organized by the JSON key. | |
| rules: | |
| catalogNumber: Barcode identifier, typically a number with at least 6 digits, but | |
| fewer than 30 digits. | |
| scientificName: The scientific name of the taxon including genus, specific epithet, | |
| and any lower classifications. Occasionally, the genus or specific epithet will | |
| be crossed out with pen or pencil and the correct genus or specific epithet name will | |
| be written above it. In this case, use the text written above the crossed-out | |
| text. | |
| genus: Taxonomic determination to genus. Genus must be capitalized. If genus is | |
| not present, use the taxonomic family name followed by the word 'indet'. Occasionally, | |
| the genus name will be crossed out with pen or pencil and the correct genus name | |
| will be written above it. In this case, use the name written above the crossed-out | |
| name. | |
| specificEpithet: The specific epithet of the scientificName. Only include | |
| the specific epithet. Occasionally, the specific epithet will be crossed out | |
| with pen or pencil and the correct specific epithet will be written above | |
| it. In this case, use the name written above the crossed-out name. | |
| speciesNameAuthorship: The authorship information for the scientificName, formatted | |
| according to the conventions of the applicable nomenclatural code (Darwin Core nomenclaturalCode). | |
| collectedBy: A comma-separated list of names of people, groups, or organizations | |
| responsible for observing, recording, collecting, or presenting the original specimen. | |
| The primary collector or observer should be listed first. | |
| collectorNumber: An identifier given to the occurrence at the time it was recorded; | |
| the specimen collector's number. It is often written vertically along the edge of | |
| the paper tag, perpendicular to the rest of the numbers, data, and text, with a line | |
| separating it from other information. It is sometimes written next to the sex symbol | |
| or next to the collector's name or initials. | |
| identifiedBy: A comma-separated list of names of people, groups, or organizations | |
| who assigned the taxon to the subject organism. This is not the specimen collector. | |
| verbatimCollectionDate: The verbatim original representation of the date and time | |
| information for when the specimen was collected. Date of collection exactly as | |
| it appears on the label. Do not change the format or correct typos. | |
| collectionDate: Date the specimen was collected formatted as year-month-day, YYYY-MM-DD. | |
| If specific components of the date are unknown, they should be replaced with zeros. | |
| Use 0000-00-00 if the entire date is unknown, YYYY-00-00 if only the year is known, | |
| and YYYY-MM-00 if year and month are known but day is not. | |
| collectionDateEnd: If a range of collection dates is provided, this is the later | |
| end date while collectionDate is the beginning date. Use the same formatting as | |
| for collectionDate. | |
| occurrenceRemarks: Verbatim text describing the specimen's geographic location. Text | |
| describing the appearance of the specimen. A statement about the presence or absence | |
| of a taxon at the collection location. Text describing the significance of the | |
| specimen, such as a specific expedition or notable collection. Description of | |
| mammal features such as size, color, well-being, molting pattern, smell, and any | |
| other distinguishing morphological or physiological characteristics. | |
| habitat: Verbatim category or description of the habitat in which the specimen collection | |
| event occurred. | |
| country: The name of the country or major administrative unit in which the specimen | |
| was originally collected. | |
| stateProvince: The name of the next smaller administrative region than country (state, | |
| province, canton, department, region, etc.) in which the specimen was originally | |
| collected. | |
| county: The full, unabbreviated name of the next smaller administrative region than | |
| stateProvince (county, shire, department, parish, etc.) in which the specimen was | |
| originally collected. | |
| locality: Description of geographic location, landscape, landmarks, regional features, | |
| nearby places, municipality, city, or any contextual information aiding in pinpointing | |
| the exact origin or location of the specimen. | |
| verbatimCoordinates: Verbatim location coordinates as they appear on the label. | |
| Do not convert formats. Possible coordinate types include [Lat, Long, UTM, TRS]. | |
| decimalLatitude: Latitude decimal coordinate. Correct and convert the verbatim location | |
| coordinates to conform with the decimal degrees GPS coordinate format. | |
| decimalLongitude: Longitude decimal coordinate. Correct and convert the verbatim | |
| location coordinates to conform with the decimal degrees GPS coordinate format. | |
| elevationUnits: Use m if the final elevation is reported in meters. Use ft if the | |
| final elevation is in feet. Units should match elevation. | |
| measurementsTL: The total length of the animal from snout to tip of the tail. This | |
| is usually a 3-digit number. It is the first number in a string of 3 or 4 measurement | |
| numbers that are usually separated by dashes, commas, or spaces, or are sometimes | |
| written vertically in the same order. This total length measurement will be the | |
| largest number in the series of 3 or 4 measurement numbers. | |
| measurementsTV: The length of the tail vertebrae of the animal, from the first tail | |
| vertebra to the last tail vertebra. This is usually a 1- to 3-digit number. It is | |
| the second number in a string of 3 or 4 measurement numbers that are usually | |
| separated by dashes, commas, or spaces, or are sometimes written vertically in the | |
| same order. | |
| measurementsHF: The length of the hindfoot of the animal with claw (H.F. cu), from | |
| the ankle to the tip of the longest claw. This is usually a 2- or 3-digit number. | |
| It is the third number in a string of 3 or 4 measurement numbers that are usually | |
| separated by dashes, commas, or spaces, or are sometimes written vertically in the | |
| same order. | |
| measurementsEAR: The length of the ear of the animal. This is usually a 1- to 3-digit | |
| number. It is usually the fourth number in a string of 3 or 4 measurement numbers | |
| that are usually separated by dashes, commas, or spaces, or are sometimes written | |
| vertically in the same order. | |
| measurementsWEIGHT: The weight of the animal. This is usually a 1- to 3-digit number. | |
| It is sometimes preceded by an equal sign and/or followed by the letter g, which | |
| stands for grams. It is sometimes followed or preceded by the letters lbs, which | |
| stand for pounds. | |
| catalogNumberFMNH: Barcode identifier, typically a number with at least 3 digits, | |
| but fewer than 8 digits. It is typically preceded by or near the words Field Museum, | |
| FM, FMNH, or CNHM. | |
| collectionMethod: Mammals are sometimes intentionally caught by collectors, brought | |
| to collectors as roadkill, or brought to collectors after being killed as pests. | |
| The text may describe how the animal was killed, for example as roadkill, in a | |
| trap, or by a hunter. Record that information verbatim here. | |
| measurementsTLunits: Use mm if the total length is recorded in millimeters. Use | |
| in if the total length is recorded in inches. Units should match measurementsTVunits, | |
| measurementsHFunits, and measurementsEARunits. | |
| measurementsTVunits: Use mm if the tail length is recorded in millimeters. Use in | |
| if the tail length is recorded in inches. Units should match measurementsTLunits, | |
| measurementsHFunits, and measurementsEARunits. | |
| measurementsHFunits: Use mm if the hindfoot length is recorded in millimeters. Use | |
| in if the hindfoot length is recorded in inches. Units should match measurementsTLunits, | |
| measurementsTVunits, and measurementsEARunits. | |
| measurementsEARunits: Use mm if the ear length is recorded in millimeters. Use in | |
| if the ear length is recorded in inches. Units should match measurementsTLunits, | |
| measurementsTVunits, and measurementsHFunits. | |
| measurementsWEIGHTunits: Use g if the weight is recorded in grams. Use lbs | |
| if the weight is recorded in pounds. | |
| elevation: Elevation or altitude in meters or feet. | |
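A few of the rules above are deterministic enough to double-check after the model responds:

- the zero-filled YYYY-MM-DD format for collectionDate and collectionDateEnd;
- the TL, TV, HF, EAR order of the measurement series, with TL expected to be the largest value;
- conversion of degrees, minutes, and seconds to decimal degrees for decimalLatitude and decimalLongitude.

The Python sketch below is illustrative only. The function names are hypothetical. The parsing assumes tidy inputs such as `1950-06-00` or `185-90-21-16`, not the full variety of handwritten tags.

```python
import re

# collectionDate / collectionDateEnd: YYYY-MM-DD, with unknown parts written as zeros.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Field names in the order the rules give for a measurement series.
MEASUREMENT_KEYS = ["measurementsTL", "measurementsTV", "measurementsHF", "measurementsEAR"]


def check_collection_date(value):
    """Return True if value is YYYY-MM-DD, allowing 00 (or 0000) for unknown parts."""
    match = DATE_RE.fullmatch(value)
    if not match:
        return False
    _year, month, day = (int(part) for part in match.groups())
    return 0 <= month <= 12 and 0 <= day <= 31


def split_measurements(series):
    """Split a '185-90-21-16' style series into TL, TV, HF and (optional) EAR strings."""
    # Dashes, commas, and whitespace (including newlines for vertical writing) separate values.
    numbers = [part for part in re.split(r"[\s,\-]+", series.strip()) if part]
    if len(numbers) not in (3, 4) or not all(part.isdecimal() for part in numbers):
        raise ValueError(f"expected 3 or 4 numbers, got {series!r}")
    # The rules state that total length is the largest value in the series.
    if int(numbers[0]) != max(int(part) for part in numbers):
        raise ValueError(f"first value is not the largest: {series!r}")
    result = dict.fromkeys(MEASUREMENT_KEYS, "")  # EAR stays "" when only 3 numbers
    result.update(zip(MEASUREMENT_KEYS, numbers))
    return result


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W are negative)."""
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in {"S", "W"} else value
```

When the first value is not the largest, `split_measurements` raises rather than silently reordering the series. This leaves the decision to a person, since the rules treat position in the series as meaningful.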
| mapping: | |
| TAXONOMY: | |
| - catalogNumber | |
| - scientificName | |
| - genus | |
| - specificEpithet | |
| - speciesNameAuthorship | |
| - collectedBy | |
| - collectorNumber | |
| - identifiedBy | |
| - catalogNumberFMNH | |
| GEOGRAPHY: | |
| - country | |
| - stateProvince | |
| - county | |
| - locality | |
| - verbatimCoordinates | |
| - decimalLatitude | |
| - decimalLongitude | |
| - elevationUnits | |
| - elevation | |
| COLLECTING: | |
| - verbatimCollectionDate | |
| - collectionDate | |
| - collectionDateEnd | |
| - habitat | |
| - occurrenceRemarks | |
| - collectionMethod | |
| LOCALITY: [] | |
| MISC: | |
| - measurementsTL | |
| - measurementsTV | |
| - measurementsEAR | |
| - measurementsHF | |
| - measurementsWEIGHT | |
| - measurementsTLunits | |
| - measurementsTVunits | |
| - measurementsHFunits | |
| - measurementsEARunits | |
| - measurementsWEIGHTunits | |
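Every key under `rules` is expected to appear in exactly one `mapping` group. In this file, each of the 34 rule keys is listed once, and LOCALITY is left empty. A minimal stdlib-only Python sketch of that consistency check follows, useful whenever the prompt is edited. It assumes the prompt has already been loaded into dictionaries, for example with PyYAML's `yaml.safe_load`. The `check_mapping` name and the filename in the commented usage are hypothetical.

```python
from collections import Counter


def check_mapping(rules, mapping):
    """Compare the keys defined under `rules` with the keys listed under `mapping`.

    rules:   dict of field name -> description
    mapping: dict of group name -> list of field names (an empty group may be [] or None)
    Returns (unmapped, unknown, duplicated) as sorted lists.
    """
    listed = Counter(field for fields in mapping.values() for field in (fields or []))
    unmapped = sorted(set(rules) - set(listed))    # defined but never grouped
    unknown = sorted(set(listed) - set(rules))     # grouped but never defined
    duplicated = sorted(field for field, n in listed.items() if n > 1)
    return unmapped, unknown, duplicated


# Hypothetical usage with PyYAML:
#   import yaml
#   with open("FMNH_mammals_test6.yaml") as fh:
#       prompt = yaml.safe_load(fh)
#   print(check_mapping(prompt["rules"], prompt["mapping"]))

if __name__ == "__main__":
    rules = {"genus": "...", "country": "...", "measurementsTL": "..."}
    mapping = {"TAXONOMY": ["genus"], "GEOGRAPHY": ["country", "genus"], "LOCALITY": [], "MISC": []}
    print(check_mapping(rules, mapping))  # (['measurementsTL'], [], ['genus'])
```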