base_model:
- dslim/distilbert-NER
pipeline_tag: token-classification
tags:
- transport
- bus
---
# 🚍 MyBusModel: A Custom NER Model for Public Transport Queries

BusRouteNER is a lightweight, rule-enhanced Named Entity Recognition (NER) model fine-tuned for identifying **bus numbers** and **stops/locations** in natural-language queries about public transport in West Bengal, India.

---

## ✨ What does this model do?

This model is trained to extract two key entity types from user queries:

- `BUS_NUMBER`: Recognizes bus numbers such as `12C/1`, `S-12`, and `12B`.
- `LOCATION`: Identifies source and destination locations such as `Howrah`, `Barrackpore`, and `Santragachi`.

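The `BUS_NUMBER` formats listed above can be approximated with a single regular expression. This is a sketch under assumptions, not the model's actual pattern (which this card does not publish): `BUS_NUMBER_RE` and `find_bus_numbers` are hypothetical names, and the pattern accepts an optional uppercase letter prefix with a hyphen (`S-`), one to three digits, an optional uppercase letter suffix (`C`, `B`), and an optional `/` variant (`/1`), matching case-sensitively.

```python
import re

# Hypothetical pattern covering the formats shown above: 12C/1, S-12, 12B.
# Not the model's published regex; adjust it to the real route list.
BUS_NUMBER_RE = re.compile(
    r"\b"
    r"(?:[A-Z]{1,2}-)?"   # optional letter prefix with hyphen, e.g. "S-"
    r"\d{1,3}"            # route number, e.g. "12"
    r"[A-Z]?"             # optional letter suffix, e.g. "C" or "B"
    r"(?:/\d{1,2})?"      # optional variant after a slash, e.g. "/1"
    r"\b"
)


def find_bus_numbers(text):
    """Return every substring of `text` that looks like a bus number."""
    return BUS_NUMBER_RE.findall(text)


print(find_bus_numbers("Can I take 12C/1 or S-12 from Howrah?"))
# ['12C/1', 'S-12']
```

A bare number such as `12` also matches, so on its own this pattern would tag ordinary numbers too; cross-checking candidates against a curated list of bus numbers is one way to limit such false positives.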

It also filters out irrelevant **noise words**, producing a clean entity list for downstream logic such as search, recommendations, or route finding.

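A minimal sketch of the noise-word filtering step, assuming entities arrive as `(text, label)` pairs. The stop list contains only the examples this card names (*I, want, take, can, should*); the model's full list is not published here, so `NOISE_WORDS`, `filter_noise`, and the deliberately mislabeled `raw` input are hypothetical.

```python
# Hypothetical noise list built only from the examples named in this card.
NOISE_WORDS = {"i", "want", "take", "can", "should"}


def filter_noise(entities):
    """Drop (text, label) pairs whose text is a noise word, ignoring case."""
    return [
        (text, label)
        for text, label in entities
        if text.strip().lower() not in NOISE_WORDS
    ]


# Invented raw output in which two noise words were wrongly tagged.
raw = [
    ("I", "LOCATION"),
    ("Santragachi", "LOCATION"),
    ("take", "BUS_NUMBER"),
    ("S-12", "BUS_NUMBER"),
]
print(filter_noise(raw))
# [('Santragachi', 'LOCATION'), ('S-12', 'BUS_NUMBER')]
```

Comparing lowercased text against a set keeps each lookup constant-time and makes `I`, `Can`, and `can` equivalent.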

---

## 🔍 Example

**Input Query:**

`I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?`

**Model Output:**

| Text | Entity |
| --- | --- |
| `Santragachi` | `LOCATION` |
| `Barrackpore` | `LOCATION` |
| `12C/1` | `BUS_NUMBER` |
| `S-12` | `BUS_NUMBER` |

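Downstream code usually needs these entities grouped by type. A sketch, assuming the output is available as `(text, label)` pairs in query order; `to_route_query` is a hypothetical helper, and treating the first `LOCATION` as the source and the second as the destination matches the example query but is an assumption, not something the model guarantees.

```python
from collections import defaultdict


def to_route_query(entities):
    """Group (text, label) pairs by label and pick source/destination by order.

    Hypothetical heuristic: the first LOCATION is the source and the second
    is the destination, as in "from Santragachi to Barrackpore".
    """
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    locations = grouped["LOCATION"]
    return {
        "source": locations[0] if len(locations) > 0 else None,
        "destination": locations[1] if len(locations) > 1 else None,
        "buses": grouped["BUS_NUMBER"],
    }


output = [
    ("Santragachi", "LOCATION"),
    ("Barrackpore", "LOCATION"),
    ("12C/1", "BUS_NUMBER"),
    ("S-12", "BUS_NUMBER"),
]
print(to_route_query(output))
# {'source': 'Santragachi', 'destination': 'Barrackpore', 'buses': ['12C/1', 'S-12']}
```

Ordering alone breaks on queries such as "to Howrah from Santragachi"; cue words like *from* and *to* would be needed to assign roles reliably.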

---

## 🧠 How it works

- Built with **spaCy** (`en_core_web_sm`) and extended with an `EntityRuler` for custom, pattern-based entity matching.
- Bus numbers and stop names are sourced from curated CSV datasets.
- Custom regex patterns identify bus numbers in formats such as `12C/1` and `S-12`.
- Noise words such as *I*, *want*, *take*, *can*, and *should* are excluded from the final entity list.

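The pipeline described above can be sketched with spaCy's `entity_ruler` component: phrase patterns for stop names read from a CSV, plus a token-level `REGEX` pattern for bus numbers. This is a sketch under assumptions, not the model's code: the CSV column `stop_name`, the inline sample data, the regex, and `extract_entities` are hypothetical, and it uses `spacy.blank("en")` so it runs without downloading `en_core_web_sm`. With `en_core_web_sm` loaded instead, adding the ruler with `before="ner"` makes the statistical recognizer respect the ruler's spans.

```python
import csv
import io

import spacy

# Hypothetical stand-in for the curated stop-name CSV; the real file and
# its column names are not described in this card.
STOPS_CSV = io.StringIO("stop_name\nHowrah\nBarrackpore\nSantragachi\n")

# Hypothetical bus-number regex. It is anchored because spaCy's REGEX
# operator searches anywhere inside the token text.
BUS_NUMBER_REGEX = r"^(?:[A-Z]{1,2}-)?\d{1,3}[A-Z]?(?:/\d{1,2})?$"

nlp = spacy.blank("en")  # tokenizer only; en_core_web_sm could be used instead
ruler = nlp.add_pipe("entity_ruler")

# One exact-text phrase pattern per stop name read from the CSV.
location_patterns = [
    {"label": "LOCATION", "pattern": row["stop_name"].strip()}
    for row in csv.DictReader(STOPS_CSV)
]
# One token pattern that matches any single token shaped like a bus number.
bus_pattern = {
    "label": "BUS_NUMBER",
    "pattern": [{"TEXT": {"REGEX": BUS_NUMBER_REGEX}}],
}
ruler.add_patterns(location_patterns + [bus_pattern])


def extract_entities(text):
    """Return (text, label) pairs for every entity the ruler finds, in order."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


print(extract_entities(
    "I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?"
))
```

Phrase patterns match exact token text by default, so a lowercase `howrah` would be missed; passing `config={"phrase_matcher_attr": "LOWER"}` to `add_pipe` is one way to relax that.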

---

## 🛠 Usage