---
base_model: microsoft/Phi-3.5-mini-instruct
language:
- en
library_name: transformers
license: mit
license_link: https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE
pipeline_tag: text-generation
tags:
- nlp
- ner
---

## Geo-Temporal Entity Recognition Model
This model is a fine-tuned version of [Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). It detects location and date entities in a query. It maps location entities to the corresponding set of countries, and date entities to either a start and end date or the corresponding months. It also generates a cleaned version of the query, with the date entities and any location entities that are countries removed.

The model was fine-tuned with the following prompt:
```python
from datetime import date

# Today's date, substituted into the prompt as the reference for relative dates.
today = date.today()

# Description of each field in the JSON output, substituted into the prompt.
schema = """
country: Extracted list of country or countries,
periodStart: Period start in ISO 8601 format, filled in only if date-related entity corresponds to absolute date range, else null,
periodEnd: Period end in ISO 8601 format, filled in only if date-related entity corresponds to absolute date range, else null,
phase: A list of months indicated by integers from 1 to 12, filled in only if date-related entity corresponds to a reccuring yearly period. It should be null in case `periodStart` or `periodEnd` has been extracted.,
location: A list of the detected location-related entities,
date: A list of date-related entities,
cleanedQuery: A list of parts of the query cleaned from the extracted date-related entity and the location-related entity and their related parts (e.g. prepositions)
"""

EXTENDED_INSTRUCTION_TEXT = """
You are a Name-Entity Recognition system specialized in extracting and processing location and date related entities from text. Follow these steps:

1. Extract exact entities from the text:
- Location entities: Extract only if they are specific place names (not general terms like "sample locations")
- Date entities: Extract dates exactly as they appear in the text
Both should be extracted exactly as mentioned in the text, without modifications.

2. For each detected location entity:
- Map it to corresponding country name(s)
- If the location itself is a country, include it in the country list
- If country cannot be determined, return an empty list

3. For date-related entities, classify them into one of two categories:
a) Absolute date range:
- Convert to ISO 8601 date format (YYYY-MM-DD)
- Set periodStart and periodEnd
- Set phase to null
- Use %(today)s as reference for relative dates

b) Recurring yearly period:
- Set phase as list of integers (1-12) representing months
- Set periodStart and periodEnd to null

4. Clean the query by removing:
- Detected date entities and their syntactic relations (e.g., prepositions)
- Location entities (only if they are countries) and their relations
Return the remaining parts as a list of strings

Return the results in JSON format matching this schema: %(schema)s

IMPORTANT:
- Always return all fields defined in the schema
- Return only the JSON without any additional explanation or notes
- Ensure the JSON is properly formatted and parsable
""" % {"today": today, "schema": schema}
```
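
A response that follows the prompt above can be checked mechanically before it is used downstream. The sketch below is one possible way to do that and is not part of the original training setup: `REQUIRED_FIELDS` and `parse_response` are hypothetical names, and the sample string is hand-written for illustration, not real model output.

```python
import json

# Field names taken from the schema in the prompt above.
REQUIRED_FIELDS = (
    "country", "periodStart", "periodEnd",
    "phase", "location", "date", "cleanedQuery",
)


def parse_response(raw: str) -> dict:
    """Parse a response string and check it against the rules stated in the prompt."""
    result = json.loads(raw)  # raises json.JSONDecodeError if the text is not valid JSON
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")

    # "Always return all fields defined in the schema"
    missing = [name for name in REQUIRED_FIELDS if name not in result]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # phase must be null whenever periodStart or periodEnd has been extracted
    has_range = result["periodStart"] is not None or result["periodEnd"] is not None
    if has_range and result["phase"] is not None:
        raise ValueError("phase must be null when periodStart or periodEnd is set")

    # phase, when present, is a list of month numbers from 1 to 12
    if result["phase"] is not None:
        if not all(isinstance(m, int) and 1 <= m <= 12 for m in result["phase"]):
            raise ValueError("phase must contain integers from 1 to 12")

    return result


# Hand-written sample (assumption), shaped like a response to a query such as
# "wheat yield in France in 2020". It is NOT actual model output.
sample = """
{
  "country": ["France"],
  "periodStart": "2020-01-01",
  "periodEnd": "2020-12-31",
  "phase": null,
  "location": ["France"],
  "date": ["2020"],
  "cleanedQuery": ["wheat yield"]
}
"""

result = parse_response(sample)
print(result["cleanedQuery"])  # prints ['wheat yield']
```

Raising on the first violated rule keeps the check simple; a caller that prefers to retry generation could catch `json.JSONDecodeError` and `ValueError` around `parse_response`.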