Alapan committed · Commit 43eaf88 · verified · 1 Parent(s): cc48056

Update README.md

  1. README.md +51 -1
README.md CHANGED
@@ -5,4 +5,54 @@ language:
  base_model:
  - dslim/distilbert-NER
  pipeline_tag: token-classification
- ---
+ tags:
+ - transport
+ - bus
+ ---
+ # 🚍 MyBusModel: A Custom NER Model for Public Transport Queries
+
+ BusRouteNER is a lightweight, rule-enhanced Named Entity Recognition (NER) model fine-tuned for identifying **bus numbers** and **stops/locations** in natural language queries related to public transportation in West Bengal, India.
+
+ ---
+
+ ## ✨ What does this model do?
+
+ This model is trained to extract two key entity types from user queries:
+
+ - `BUS_NUMBER`: Bus numbers such as `12C/1`, `S-12`, and `12B`.
+ - `LOCATION`: Source and destination locations such as `Howrah`, `Barrackpore`, and `Santragachi`.
+
+ It also filters out irrelevant **noise words**, producing a clean entity list that can be used in downstream logic such as search, recommendations, or route-finding.
+
+ ---
+
+ ## 🔍 Example
+
+ **Input Query:**
+
+ `I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?`
+
+
+ **Model Output:**
+
+ `Santragachi LOCATION`
+
+ `Barrackpore LOCATION`
+
+ `12C/1 BUS_NUMBER`
+
+ `S-12 BUS_NUMBER`
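
To illustrate how downstream logic might consume this output, here is a minimal sketch that parses the `text LABEL` lines shown above and groups them by entity type. It assumes the output is available as plain strings in exactly that form; the model's actual return type is not shown in this README, and the helper names `parse_output_lines` and `group_entities` are hypothetical.

```python
from collections import defaultdict


def parse_output_lines(lines):
    """Split each 'text LABEL' line on its last space (assumed output format)."""
    entities = []
    for line in lines:
        text, label = line.rsplit(" ", 1)
        entities.append((text, label))
    return entities


def group_entities(entities):
    """Group entity texts by label, preserving the order they appeared in."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# The four output lines from the example above.
output_lines = [
    "Santragachi LOCATION",
    "Barrackpore LOCATION",
    "12C/1 BUS_NUMBER",
    "S-12 BUS_NUMBER",
]

grouped = group_entities(parse_output_lines(output_lines))
print(grouped)
```

A route-finding step could then read candidate stops from `grouped["LOCATION"]` and candidate buses from `grouped["BUS_NUMBER"]`. Deciding which location is the source and which is the destination would need extra logic that this sketch does not attempt.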
+
+
+ ---
+
+ ## 🧠 How it works
+
+ - Built using **spaCy** (`en_core_web_sm`) and extended with `EntityRuler` for custom NER logic.
+ - Bus numbers and stop names are sourced from curated CSV datasets.
+ - Custom regex patterns match bus-number formats such as `12C/1` and `S-12`.
+ - Noise words such as *I, want, take, can, should* are excluded from the final entity list.
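
The steps listed above can be approximated with a standard-library sketch. This is not the spaCy `EntityRuler` pipeline the README describes: it swaps in plain `re` matching and a set lookup so that it runs without any model download. The stop list, the regex, and the function name `extract_entities` are assumptions made for illustration. The real patterns and CSV contents are not shown in this README.

```python
import re

# Hypothetical stand-in for the curated stop-name CSV mentioned above.
KNOWN_STOPS = {"howrah", "barrackpore", "santragachi"}

# Noise words named in the README.
NOISE_WORDS = {"i", "want", "take", "can", "should"}

# Assumed bus-number shapes: a letter prefix plus hyphen and digits (S-12),
# or digits with an optional letter suffix and optional "/digits" (12B, 12C/1).
BUS_NUMBER_RE = re.compile(r"[A-Z]{1,3}-\d{1,3}[A-Z]?|\d{1,3}[A-Z]?(?:/\d{1,2})?")

# Tokens are alphanumeric runs that may be joined by "-" or "/".
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")


def extract_entities(query):
    """Return (text, label) pairs for known stops and bus-number-shaped tokens."""
    entities = []
    for token in TOKEN_RE.findall(query):
        lowered = token.lower()
        if lowered in NOISE_WORDS:
            continue  # drop noise words before any matching
        if lowered in KNOWN_STOPS:
            entities.append((token, "LOCATION"))
        elif BUS_NUMBER_RE.fullmatch(token):
            entities.append((token, "BUS_NUMBER"))
    return entities


query = "I want to go from Santragachi to Barrackpore, can I take 12C/1 or S-12?"
for text, label in extract_entities(query):
    print(text, label)
```

This sketch has two known limitations: it handles single-token stop names only, and it would label any bare number in a query as a bus number. A rule-based pipeline with token-level context, such as the one described above, is better placed to handle both cases.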
+
+ ---
+
+ ## 🛠 Usage