Update README.md
Browse files
README.md
CHANGED
|
@@ -1,3 +1,70 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
base_model: microsoft/Phi-3.5-mini-instruct
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
library_name: transformers
|
| 6 |
+
license: mit
|
| 7 |
+
license_link: https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE
|
| 8 |
+
pipeline_tag: text-generation
|
| 9 |
+
tags:
|
| 10 |
+
- nlp
|
| 11 |
+
- ner
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## Geo-Temporal Entity Recognition Model
|
| 15 |
+
|
| 16 |
+
This model is a finetuned version of [Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). It detects the location and date entities in a query. It also maps the location entities to the corresponding set of countries, and the date entities to either a start and end date, or the corresponding months. Additionally, it generates the query that is cleaned from the date entities and the location entities that are countries.
|
| 17 |
+
|
| 18 |
+
The prompt on which the model was finetuned is the following:
|
| 19 |
+
|
| 20 |
+
```python
|
| 21 |
+
from datetime import date
|
| 22 |
+
today = date.today()
|
| 23 |
+
|
| 24 |
+
schema = """
|
| 25 |
+
country: Extracted list of country or countries,
|
| 26 |
+
periodStart: Period start in ISO 8601 format, filled in only if date-related entity corresponds to absolute date range, else null,
|
| 27 |
+
periodEnd: Period end in ISO 8601 format, filled in only if date-related entity corresponds to absolute date range, else null,
|
| 28 |
+
phase: A list of months indicated by integers from 1 to 12, filled in only if date-related entity corresponds to a reccuring yearly period. It should be null in case `periodStart` or `periodEnd` has been extracted.,
|
| 29 |
+
location: A list of the detected location-related entities,
|
| 30 |
+
date: A list of date-related entities,
|
| 31 |
+
cleanedQuery: A list of parts of the query cleaned from the extracted date-related entity and the location-related entity and their related parts (e.g. prepositions)
|
| 32 |
+
"""
|
| 33 |
+
|
| 34 |
+
EXTENDED_INSTRUCTION_TEXT = """
|
| 35 |
+
You are a Name-Entity Recognition system specialized in extracting and processing location and date related entities from text. Follow these steps:
|
| 36 |
+
|
| 37 |
+
1. Extract exact entities from the text:
|
| 38 |
+
- Location entities: Extract only if they are specific place names (not general terms like "sample locations")
|
| 39 |
+
- Date entities: Extract dates exactly as they appear in the text
|
| 40 |
+
Both should be extracted exactly as mentioned in the text, without modifications.
|
| 41 |
+
|
| 42 |
+
2. For each detected location entity:
|
| 43 |
+
- Map it to corresponding country name(s)
|
| 44 |
+
- If the location itself is a country, include it in the country list
|
| 45 |
+
- If country cannot be determined, return an empty list
|
| 46 |
+
|
| 47 |
+
3. For date-related entities, classify them into one of two categories:
|
| 48 |
+
a) Absolute date range:
|
| 49 |
+
- Convert to ISO 8601 date format (YYYY-MM-DD)
|
| 50 |
+
- Set periodStart and periodEnd
|
| 51 |
+
- Set phase to null
|
| 52 |
+
- Use %(today)s as reference for relative dates
|
| 53 |
+
|
| 54 |
+
b) Recurring yearly period:
|
| 55 |
+
- Set phase as list of integers (1-12) representing months
|
| 56 |
+
- Set periodStart and periodEnd to null
|
| 57 |
+
|
| 58 |
+
4. Clean the query by removing:
|
| 59 |
+
- Detected date entities and their syntactic relations (e.g., prepositions)
|
| 60 |
+
- Location entities (only if they are countries) and their relations
|
| 61 |
+
Return the remaining parts as a list of strings
|
| 62 |
+
|
| 63 |
+
Return the results in JSON format matching this schema: %(schema)s
|
| 64 |
+
|
| 65 |
+
IMPORTANT:
|
| 66 |
+
- Always return all fields defined in the schema
|
| 67 |
+
- Return only the JSON without any additional explanation or notes
|
| 68 |
+
- Ensure the JSON is properly formatted and parsable
|
| 69 |
+
""" % {"today": today, "schema": schema}
|
| 70 |
+
```
|