---
license: mit
language:
- en
base_model:
- dslim/distilbert-NER
pipeline_tag: token-classification
tags:
- transport
- bus
---
# 🚍 MyBusModel: A Custom NER Model for Public Transport Queries

BusRouteNER is a lightweight, rule-enhanced Named Entity Recognition (NER) model fine-tuned to identify **bus numbers** and **stops/locations** in natural-language queries about public transport in West Bengal, India.

---

## ✨ What does this model do?

This model is trained to extract two key entity types from user queries:

- `BUS_NUMBER`: Recognizes bus numbers like `12C/1`, `S-12`, `12B`, etc.
- `LOCATION`: Identifies source and destination locations such as `Howrah`, `Barrackpore`, `Santragachi`, etc.

It also filters out irrelevant **noise words** to give a clean and accurate entity list that can be used in downstream logic such as search, recommendations, or route-finding.

---

## 🔍 Example

**Input Query:**

`I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?`


**Model Output:**

`Santragachi LOCATION`

`Barrackpore LOCATION`

`12C/1 BUS_NUMBER`

`S-12 BUS_NUMBER`


---

## 🧠 How it works

- Built using **spaCy** (`en_core_web_sm`) and extended with `EntityRuler` for custom NER logic.
- Bus numbers and stop names are sourced from curated CSV datasets.
- Custom regex patterns identify bus numbers with formats like `12C/1`, `S-12`, etc.
- Noise words like *I, want, take, can, should* are excluded from final entity extraction.
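
The steps above can be approximated with a small, standard-library-only sketch. This is not the model's actual spaCy `EntityRuler` pipeline: the stop list, the regular expression, and the `extract_entities` helper are all assumptions made for illustration, and the real stop names and bus numbers come from the curated CSV datasets. Multi-word stop names are not handled here.

```python
import re

# Hypothetical stop list; the real model loads stop names from curated CSV files.
KNOWN_STOPS = {"howrah", "barrackpore", "santragachi"}

# Noise words named in this model card.
NOISE_WORDS = {"i", "want", "take", "can", "should"}

# Assumed bus-number pattern (not the model's own regex):
#   - a letter prefix, a hyphen, then digits, e.g. S-12
#   - digits, an optional letter, an optional "/digits" suffix, e.g. 12B, 12C/1
BUS_NUMBER_RE = re.compile(
    r"^(?:[A-Z]{1,3}-\d{1,3}[A-Z]?|\d{1,3}[A-Z]?(?:/\d{1,2})?)$",
    re.IGNORECASE,
)

# Split the query into tokens, keeping "-" and "/" inside a token
# so that "12C/1" and "S-12" survive as single units.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")


def extract_entities(query):
    """Return a list of (text, label) pairs found in the query."""
    entities = []
    for token in TOKEN_RE.findall(query):
        lowered = token.lower()
        if lowered in NOISE_WORDS:
            continue  # drop noise words before any matching
        if lowered in KNOWN_STOPS:
            entities.append((token, "LOCATION"))
        elif BUS_NUMBER_RE.match(token):
            entities.append((token, "BUS_NUMBER"))
    return entities


query = "I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?"
for text, label in extract_entities(query):
    print(text, label)
```

Run on the example query from this card, the sketch yields the same four entities shown in the Example section. Checking the stop list before the bus-number pattern keeps a stop name from ever being mislabeled as a bus number.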

---

## 🛠 Results

- Precision: 0.8078439964943033
- Recall: 0.6660043352601156
- F1 Score: 0.7300990099009901
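
F1 is the harmonic mean of precision and recall, so the reported score can be recomputed from the other two figures. A minimal check, using only the numbers listed above:

```python
# Values copied from the Results list in this model card.
precision = 0.8078439964943033
recall = 0.6660043352601156

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.7301
```

The gap between precision and recall suggests the model misses entities more often than it labels them wrongly.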