---
license: mit
language:
- en
base_model:
- dslim/distilbert-NER
pipeline_tag: token-classification
tags:
- transport
- bus
---

# MyBusModel: A Custom NER Model for Public Transport Queries

BusRouteNER is a lightweight, rule-enhanced Named Entity Recognition (NER) model fine-tuned to identify **bus numbers** and **stops/locations** in natural-language queries about public transport in West Bengal, India.

---

## ✨ What does this model do?

The model extracts two entity types from user queries:

- `BUS_NUMBER`: bus route numbers such as `12C/1`, `S-12`, and `12B`.
- `LOCATION`: source and destination stops such as `Howrah`, `Barrackpore`, and `Santragachi`.

It also filters out irrelevant **noise words**, so the resulting entity list is clean enough to feed downstream logic such as search, recommendations, or route-finding.
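
A minimal sketch of the noise-word filtering and downstream hand-off described above, assuming entities arrive as `(text, label)` pairs. The noise-word set contains only the examples listed on this card, and treating the first `LOCATION` as the source and the second as the destination is a hypothetical heuristic, not documented model behavior. `filter_noise` and `to_route_query` are illustrative names.

```python
# Hypothetical post-processing sketch. The (text, label) input format, the
# noise-word set, and the source/destination ordering rule are assumptions.
NOISE_WORDS = {"i", "want", "take", "can", "should"}  # examples listed on this card


def filter_noise(entities):
    """Drop entities whose text is a known noise word (case-insensitive)."""
    return [(text, label) for text, label in entities
            if text.lower() not in NOISE_WORDS]


def to_route_query(entities):
    """Group cleaned entities into a simple route-finding request."""
    cleaned = filter_noise(entities)
    locations = [text for text, label in cleaned if label == "LOCATION"]
    buses = [text for text, label in cleaned if label == "BUS_NUMBER"]
    return {
        # Assumption: stops appear in "from X to Y" order in the query.
        "source": locations[0] if locations else None,
        "destination": locations[1] if len(locations) > 1 else None,
        "buses": buses,
    }


if __name__ == "__main__":
    # Hypothetical raw output in which "I" was wrongly tagged as a location.
    raw = [("I", "LOCATION"), ("Santragachi", "LOCATION"),
           ("Barrackpore", "LOCATION"), ("12C/1", "BUS_NUMBER"),
           ("S-12", "BUS_NUMBER")]
    print(to_route_query(raw))
    # → {'source': 'Santragachi', 'destination': 'Barrackpore', 'buses': ['12C/1', 'S-12']}
```

Keeping the filter as a separate step means the same noise list can be reused by other consumers (search, recommendations) that do not need the route grouping.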

---

## Example

**Input query:**

`I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?`

**Model output:**

- `Santragachi` → `LOCATION`
- `Barrackpore` → `LOCATION`
- `12C/1` → `BUS_NUMBER`
- `S-12` → `BUS_NUMBER`

---

## 🧠 How it works

- Built with **spaCy** (`en_core_web_sm`) and extended with an `EntityRuler` for custom NER rules.
- Bus numbers and stop names are sourced from curated CSV datasets.
- Custom regex patterns match bus-number formats such as `12C/1` and `S-12`.
- Noise words such as *I*, *want*, *take*, *can*, and *should* are excluded from the final entity list.
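
The CSV-driven `EntityRuler` setup can be sketched with the standard library alone. The column names `stop_name` and `bus_number` are hypothetical, since the datasets' schemas are not given here. The output follows spaCy's entity-ruler pattern format: plain strings act as phrase patterns, and a token-level `REGEX` pattern serves as a fallback for bus numbers missing from the CSV.

```python
import csv
import io

# Hypothetical CSV schemas: the real datasets' column names are not published.
STOP_COLUMN = "stop_name"
BUS_COLUMN = "bus_number"

# Anchored so that the whole token, not just part of it, must look like a bus number.
BUS_TOKEN_REGEX = r"^(?:[A-Z]{1,3}-\d{1,3}[A-Z]?|\d{1,3}[A-Z]{1,2}(?:/\d{1,2})?)$"


def load_column(csv_file, column):
    """Return the unique, non-empty values of one CSV column, in file order."""
    values = ((row.get(column) or "").strip() for row in csv.DictReader(csv_file))
    return list(dict.fromkeys(v for v in values if v))


def build_patterns(stops_csv, buses_csv):
    """Build EntityRuler-style pattern dicts from the stop and bus CSV files."""
    patterns = [{"label": "LOCATION", "pattern": stop}
                for stop in load_column(stops_csv, STOP_COLUMN)]
    patterns += [{"label": "BUS_NUMBER", "pattern": bus}
                 for bus in load_column(buses_csv, BUS_COLUMN)]
    # Token-level regex fallback for bus numbers missing from the CSV.
    patterns.append({"label": "BUS_NUMBER",
                     "pattern": [{"TEXT": {"REGEX": BUS_TOKEN_REGEX}}]})
    return patterns


if __name__ == "__main__":
    # Real code would pass open(...) file handles; StringIO keeps the demo self-contained.
    stops = io.StringIO("stop_name\nHowrah\nSantragachi\nBarrackpore\n")
    buses = io.StringIO("bus_number\n12C/1\nS-12\n12B\n")
    for pattern in build_patterns(stops, buses):
        print(pattern)
```

With spaCy installed, the list could be loaded into a ruler created by `nlp.add_pipe("entity_ruler", before="ner")` through `ruler.add_patterns(patterns)`. Phrase patterns are matched against spaCy's tokenization, which is one reason to keep the token-level regex as a fallback.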

---

## Results

- Precision: 0.8078
- Recall: 0.6660
- F1 score: 0.7301
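
As a consistency check, the reported F1 score is the harmonic mean of the precision $P$ and recall $R$ listed above:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.8078 \times 0.6660}{0.8078 + 0.6660} \approx 0.7301
$$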