raianand committed on
Commit 4168b18 · verified
1 Parent(s): 2a313b5

Update README.md

Files changed (1)
  1. README.md +13 -152
README.md CHANGED
@@ -4,164 +4,25 @@ emoji: 🚂
  colorFrom: indigo
  colorTo: blue
  sdk: gradio
- sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  license: mit
  ---
-
- # 🚂 RailVaani - Railway Announcement Transcription with Contextual Biasing
-
- ## Overview
-
- RailVaani is an advanced speech-to-text system specifically designed for railway announcements. It uses OpenAI's Whisper model enhanced with **contextual biasing** to achieve high accuracy on domain-specific vocabulary without requiring expensive fine-tuning.
-
- ## Key Features
-
- - ✅ **No Fine-tuning Required**: Uses contextual biasing instead of model retraining
- - ✅ **Railway-Specific Vocabulary**: Optimized for Indian Railways terminology
- - ✅ **SMCP Integration**: Includes Standard Maritime Communication Phrases
- - ✅ **Automatic Entity Extraction**: Identifies train numbers, stations, platforms, times, etc.
- - ✅ **Multi-language Support**: English, Hindi, Marathi, Bengali, Tamil
- - ✅ **Real-time Processing**: Fast inference with efficient vocabulary constraints
-
- ## How It Works
-
- ### Contextual Biasing Method
-
- Instead of fine-tuning the entire Whisper model (which would require 680,000+ hours of labeled audio), RailVaani implements a **Tree-Constrained Pointer Generator (TCPGen)** approach:
-
- 1. **Prefix Tree Construction**: Railway vocabulary is organized into a trie data structure
- 2. **Vocabulary Constraint**: During post-processing, transcriptions are guided toward valid railway terms
- 3. **Smart Fallback**: System falls back to original Whisper output when vocabulary matching confidence is low
- 4. **Entity Extraction**: Regex-based pattern matching extracts structured information
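
The vocabulary-constraint and fallback steps listed above can be illustrated with a small post-processing sketch. This is not TCPGen and not the project's actual code: it swaps in plain fuzzy string matching from Python's standard `difflib` module. The word list, the `cutoff` value, and the function names `bias_word` and `bias_transcript` are assumptions made for illustration only.

```python
import difflib

# Illustrative subset of a biasing vocabulary (assumption, not the real list).
VOCAB = ["rajdhani", "shatabdi", "duronto", "platform", "junction"]


def bias_word(word, vocab=VOCAB, cutoff=0.8):
    """Snap a word to the closest vocabulary term, or keep it unchanged.

    Keeping the original word when no candidate reaches `cutoff` plays the
    role of the "smart fallback" to the raw Whisper output.
    """
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def bias_transcript(text):
    """Apply word-level biasing to a whitespace-tokenized transcript."""
    return " ".join(bias_word(word) for word in text.split())


print(bias_transcript("the rajdani express is at platfrom two"))
# -> the rajdhani express is at platform two
```

A higher `cutoff` makes the correction more conservative, so more words fall back to the unmodified transcript.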
-
- ### Technical Architecture
-
- ```
- Audio Input → Whisper Model → Original Transcript
- ↓
- Contextual Biasing (Prefix Tree)
- ↓
- Corrected Transcript → Entity Extraction
- ↓
- Structured Railway Information
  ```

- ## Vocabulary Coverage

- The system includes:

- - **500+ Railway Terms**: Standard communication phrases, directions, status indicators
- - **100+ Station Names**: Major Indian railway stations and junctions
- - **Train Types**: Rajdhani, Shatabdi, Duronto, Vande Bharat, etc.
- - **Maritime Terms**: SMCP phrases for port and vessel communication
-
- ## Performance
-
- Compared to vanilla Whisper models:
-
- | Model | Original WER | With Biasing | Improvement |
- |-------|-------------|--------------|-------------|
- | Whisper-tiny | 40.27% | 29.26% | **27% reduction** |
- | Whisper-base | 31.11% | 19.45% | **37% reduction** |
- | Whisper-medium | 27.82% | 11.12% | **60% reduction** |
-
- *WER = Word Error Rate (lower is better)*
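
The "Improvement" column in the table above is a relative reduction, that is, (original WER - biased WER) / original WER. A minimal sketch that recomputes it from the table's own figures; the helper name `relative_wer_reduction` is hypothetical and introduced only for this check.

```python
def relative_wer_reduction(original_wer: float, biased_wer: float) -> float:
    """Return the WER reduction as a percentage of the original WER."""
    return (original_wer - biased_wer) / original_wer * 100.0


# (original WER, WER with biasing) pairs copied from the table above.
table = {
    "Whisper-tiny": (40.27, 29.26),
    "Whisper-base": (31.11, 19.45),
    "Whisper-medium": (27.82, 11.12),
}

for model, (original, biased) in table.items():
    print(f"{model}: {relative_wer_reduction(original, biased):.0f}% reduction")
```

The rounded results agree with the 27%, 37%, and 60% figures reported in the table.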
-
- ## Use Cases
-
- 1. **Railway Station Automation**: Transcribe platform announcements automatically
- 2. **Training & Simulation**: Analyze communication in maritime/railway training scenarios
- 3. **Accessibility**: Generate text captions for hearing-impaired passengers
- 4. **Analytics**: Extract structured data for delay analysis and performance monitoring
- 5. **Multi-language Support**: Process announcements in multiple Indian languages
-
- ## Technical Implementation
-
- ### Prefix Tree (Trie)
-
- The vocabulary is organized into a prefix tree for efficient lookup:
-
- ```python
- {
-     'a': {
-         'r': {
-             'r': {
-                 'i': {
-                     'v': {
-                         'a': {
-                             'l': {'<END>': True}
-                         }
-                     }
-                 }
-             }
-         }
-     }
- }
  ```
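
A nested dictionary like the one shown above can be produced with a few lines of Python. The following is a sketch under assumptions, not the project's implementation: the function names `build_trie` and `contains` are hypothetical, and only the `'<END>'` marker is taken from the example.

```python
END = "<END>"  # terminal marker, as used in the example above


def build_trie(words):
    """Build a nested-dict prefix tree from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            # Descend into the child for this character, creating it if needed.
            node = node.setdefault(ch, {})
        node[END] = True  # mark the end of a complete vocabulary term
    return root


def contains(trie, word):
    """Return True only if `word` was inserted as a complete term."""
    node = trie
    for ch in word.lower():
        if ch not in node:
            return False
        node = node[ch]
    return END in node


trie = build_trie(["arrival", "arriving", "platform"])
print(contains(trie, "arrival"))  # True
print(contains(trie, "arriv"))    # False: a prefix, not a complete term
```

Lookup cost grows with the length of the queried word rather than with the size of the vocabulary, which is the usual reason for choosing a trie here.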
-
- ### Entity Extraction Patterns
-
- ```python
- # Train number extraction
- r'train\s+(?:number\s+)?(\d{4,5})'
-
- # Station extraction
- r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)'
-
- # Platform extraction
- r'platform\s+(?:number\s+)?(\d+)'
- ```
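
The three patterns above can be applied to a transcript with Python's `re` module. This is a sketch, not the project's code: the `PATTERNS` dictionary, the function name `extract_entities`, the sample sentence, and the choice of case-insensitive matching are all assumptions.

```python
import re

# The three patterns listed above, keyed by hypothetical entity names.
PATTERNS = {
    "train_number": r'train\s+(?:number\s+)?(\d{4,5})',
    "from_station": r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)',
    "platform": r'platform\s+(?:number\s+)?(\d+)',
}


def extract_entities(text):
    """Return the first match of each pattern found in `text`."""
    entities = {}
    for name, pattern in PATTERNS.items():
        # IGNORECASE is an assumption: transcripts are rarely all lowercase.
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            entities[name] = match.group(1).strip()
    return entities


sample = ("Train number 12951 from Mumbai Central to New Delhi "
          "is arriving on platform number 3")
print(extract_entities(sample))
# -> {'train_number': '12951', 'from_station': 'Mumbai Central', 'platform': '3'}
```

Note that the station pattern stops at the first whitespace followed by `to`, so a word such as "tomorrow" after the station name would also end the match.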
-
- ## Comparison with Alternative Approaches
-
- | Approach | Data Required | Training Time | Accuracy | Deployment |
- |----------|--------------|---------------|----------|------------|
- | **Fine-tuning** | 100,000+ hours | Days-Weeks | High* | Complex |
- | **Prompt Engineering** | None | None | Medium | Simple |
- | **Contextual Biasing** | ~120 hours | Hours | High | **Simple** |
-
- *Requires massive dataset comparable to Whisper's 680k hours for similar performance
-
- ## Limitations
-
- 1. **Vocabulary Boundaries**: Performance degrades for terms outside the biasing list
- 2. **Language Mixing**: Code-switching between languages may reduce accuracy
- 3. **Novel Named Entities**: New station names or train names require vocabulary updates
- 4. **Acoustic Noise**: Heavy background noise still impacts base Whisper performance
-
- ## Future Enhancements
-
- - [ ] Dynamic vocabulary updates from live railway data
- - [ ] Integration with railway databases for real-time validation
- - [ ] Expanded language support for regional Indian languages
- - [ ] Confidence scoring for extracted entities
- - [ ] Speaker diarization for multi-speaker announcements
-
- ## Research Citation
-
- This implementation is inspired by:
-
- ```bibtex
- @article{lall2024contextual,
-   title={Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model},
-   author={Lall, Vishakha and Liu, Yisi},
-   journal={arXiv preprint arXiv:2410.18363},
-   year={2024}
- }
- ```
-
- ## License
-
- MIT License - See LICENSE file for details
-
- ## Acknowledgments
-
- - OpenAI Whisper team for the base ASR model
- - Singapore Polytechnic Centre of Excellence in Maritime Safety for maritime vocabulary
- - Indian Railways for SMCP standardization
-
- ---
-
- **Try it now**: Upload a railway announcement audio file or record one directly in the interface!
  colorFrom: indigo
  colorTo: blue
  sdk: gradio
+ sdk_version: 5.9.1
  app_file: app.py
  pinned: false
  license: mit
+ python_version: 3.11
  ---
  ```

+ **Key changes:**
+ 1. ✅ Added `python_version: 3.11` to force Python 3.11
+ 2. ✅ Updated `sdk_version: 5.9.1` (latest stable Gradio)
+ 3. ✅ Removed `runtime.txt` (not used by HF Spaces)

+ ## 📁 **Files to Upload/Update:**

+ 1. **README.md** - Update the YAML header (download the new one above)
+ 2. **requirements.txt** - Keep as is:
  ```
+ openai-whisper
+ gradio
+ torch
+ numpy