---
license: mit
language:
- en
base_model:
- dslim/distilbert-NER
pipeline_tag: token-classification
tags:
- transport
- bus
---
# MyBusModel: A Custom NER Model for Public Transport Queries
BusRouteNER is a lightweight, rule-enhanced Named Entity Recognition (NER) model fine-tuned for identifying **bus numbers** and **stops/locations** in natural language queries related to public transportation in West Bengal, India.
---
## What does this model do?
This model is trained to extract two key entity types from user queries:
- `BUS_NUMBER`: Recognizes bus numbers like `12C/1`, `S-12`, `12B`, etc.
- `LOCATION`: Identifies source and destination locations such as `Howrah`, `Barrackpore`, `Santragachi`, etc.
It also filters out irrelevant **noise words** to give a clean and accurate entity list that can be used in downstream logic such as search, recommendations, or route-finding.
---
## Example
**Input Query:**
`I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?`
**Model Output:**
`Santragachi LOCATION`
`Barrackpore LOCATION`
`12C/1 BUS_NUMBER`
`S-12 BUS_NUMBER`
---
## How it works
- Built using **spaCy** (`en_core_web_sm`) and extended with `EntityRuler` for custom NER logic.
- Bus numbers and stop names are sourced from curated CSV datasets.
- Custom regex patterns identify bus numbers with formats like `12C/1`, `S-12`, etc.
- Noise words like *I, want, take, can, should* are excluded from final entity extraction.
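
The steps above can be approximated with a dependency-free sketch. It uses Python's `re` module and a small in-memory stop list in place of spaCy's `EntityRuler` and the curated CSV files, so it illustrates the idea rather than the model's actual code. The regex, the `KNOWN_STOPS` list, and the `extract_entities` helper are all assumptions made for this example.

```python
import re

# Hypothetical stand-ins for the curated CSV data; the real lists are not shown in this card.
KNOWN_STOPS = ["Howrah", "Barrackpore", "Santragachi"]
NOISE_WORDS = {"i", "want", "take", "can", "should"}

# Assumed pattern covering the formats named above: 12B, 12C/1, S-12.
BUS_NUMBER_RE = re.compile(
    r"\b(?:[A-Z]{1,3}-?\d{1,3}[A-Z]?|\d{1,3}[A-Z]?)(?:/\d{1,2})?\b"
)

# Case-insensitive match on whole stop names from the stop list.
STOP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in KNOWN_STOPS) + r")\b",
    re.IGNORECASE,
)


def extract_entities(query):
    """Return (text, label) pairs in order of appearance in the query."""
    found = []
    for m in STOP_RE.finditer(query):
        found.append((m.start(), m.group(), "LOCATION"))
    for m in BUS_NUMBER_RE.finditer(query):
        found.append((m.start(), m.group(), "BUS_NUMBER"))
    found.sort()  # order by character offset
    # Drop any match that is a noise word before returning the final list.
    return [(text, label) for _, text, label in found
            if text.lower() not in NOISE_WORDS]


query = "I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?"
for text, label in extract_entities(query):
    print(text, label)
```

On the example query this prints the same four entities listed in the Example section. Note that the assumed pattern also accepts a bare number such as `12`, so a real deployment would likely validate matches against the bus-number dataset.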
---
## Results
- Precision: 0.8078439964943033
- Recall: 0.6660043352601156
- F1 Score: 0.7300990099009901
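
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, and the reported values agree with that formula. The `f1_score` helper below is written for this example and is not part of the model's code.

```python
# Reported metrics from the Results section.
precision = 0.8078439964943033
recall = 0.6660043352601156


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


print(round(f1_score(precision, recall), 4))  # 0.7301
```

The gap between precision and recall suggests the model misses entities more often than it reports wrong ones.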