Update README.md
README.md CHANGED
@@ -4,164 +4,25 @@ emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
-
sdk_version:
app_file: app.py
pinned: false
license: mit
---
-
-
# RailVaani - Railway Announcement Transcription with Contextual Biasing
-
-
## Overview
-
-
RailVaani is an advanced speech-to-text system specifically designed for railway announcements. It uses OpenAI's Whisper model enhanced with **contextual biasing** to achieve high accuracy on domain-specific vocabulary without requiring expensive fine-tuning.
-
-
## Key Features
-
-
- ✅ **No Fine-tuning Required**: Uses contextual biasing instead of model retraining
-
- ✅ **Railway-Specific Vocabulary**: Optimized for Indian Railways terminology
-
- ✅ **SMCP Integration**: Includes Standard Marine Communication Phrases
-
- ✅ **Automatic Entity Extraction**: Identifies train numbers, stations, platforms, times, etc.
-
- ✅ **Multi-language Support**: English, Hindi, Marathi, Bengali, Tamil
-
- ✅ **Real-time Processing**: Fast inference with efficient vocabulary constraints
-
-
## How It Works
-
-
### Contextual Biasing Method
-
-
Instead of fine-tuning the entire Whisper model (which was itself trained on roughly 680,000 hours of labeled audio), RailVaani implements a **Tree-Constrained Pointer Generator (TCPGen)** approach:
-
-
1. **Prefix Tree Construction**: Railway vocabulary is organized into a trie data structure
-
2. **Vocabulary Constraint**: During post-processing, transcriptions are guided toward valid railway terms
-
3. **Smart Fallback**: The system falls back to the original Whisper output when vocabulary matching confidence is low
-
4. **Entity Extraction**: Regex-based pattern matching extracts structured information
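
Steps 2 and 3 above can be pictured with a small post-processing sketch. This is not the RailVaani implementation and not TCPGen itself: it swaps in plain string similarity from Python's standard `difflib` module as a stand-in for the matching confidence, and the vocabulary list, the `bias_transcript` name, and the `cutoff` value are all assumptions made for illustration.

```python
import difflib

# Hypothetical biasing list; the real vocabulary is not shown in this README.
RAILWAY_VOCAB = ["rajdhani", "shatabdi", "duronto", "platform", "junction"]


def bias_transcript(transcript, vocab=RAILWAY_VOCAB, cutoff=0.8):
    """Nudge each word toward the closest vocabulary term.

    A word is replaced only when its similarity to a vocabulary term
    reaches `cutoff`; otherwise the original word is kept, which mirrors
    the fallback to the unmodified Whisper output.
    """
    corrected = []
    for word in transcript.split():
        matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(bias_transcript("the rajdani express is at platfrom five"))
# the rajdhani express is at platform five
```

Raising `cutoff` makes the correction more conservative, so more words fall through to the original transcript unchanged.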
-
-
### Technical Architecture
-
-
```
-
Audio Input → Whisper Model → Original Transcript
-
                    ↓
-
        Contextual Biasing (Prefix Tree)
-
                    ↓
-
    Corrected Transcript → Entity Extraction
-
                    ↓
-
        Structured Railway Information
```

-

-

-
-
-
- **Train Types**: Rajdhani, Shatabdi, Duronto, Vande Bharat, etc.
-
- **Maritime Terms**: SMCP phrases for port and vessel communication
-
-
## Performance
-
-
Compared to vanilla Whisper models:
-
-
| Model | Original WER | With Biasing | Improvement |
-
|-------|-------------|--------------|-------------|
-
| Whisper-tiny | 40.27% | 29.26% | **27% reduction** |
-
| Whisper-base | 31.11% | 19.45% | **37% reduction** |
-
| Whisper-medium | 27.82% | 11.12% | **60% reduction** |
-
-
*WER = Word Error Rate (lower is better)*
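
The "Improvement" column is consistent with a relative reduction, that is (original WER − biased WER) / original WER, rather than a difference in percentage points. A short check of that reading, using only the figures from the table (the function name is an illustrative choice):

```python
def relative_wer_reduction(original_wer, biased_wer):
    """Relative WER reduction in percent: (original - biased) / original * 100."""
    return (original_wer - biased_wer) / original_wer * 100


# (original WER %, WER % with biasing), copied from the table above
results = {
    "Whisper-tiny": (40.27, 29.26),
    "Whisper-base": (31.11, 19.45),
    "Whisper-medium": (27.82, 11.12),
}

for model, (original, biased) in results.items():
    print(f"{model}: {relative_wer_reduction(original, biased):.0f}% relative reduction")
# Whisper-tiny: 27% relative reduction
# Whisper-base: 37% relative reduction
# Whisper-medium: 60% relative reduction
```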
-
-
## Use Cases
-
-
1. **Railway Station Automation**: Transcribe platform announcements automatically
-
2. **Training & Simulation**: Analyze communication in maritime/railway training scenarios
-
3. **Accessibility**: Generate text captions for hearing-impaired passengers
-
4. **Analytics**: Extract structured data for delay analysis and performance monitoring
-
5. **Multi-language Support**: Process announcements in multiple Indian languages
-
-
## Technical Implementation
-
-
### Prefix Tree (Trie)
-
-
The vocabulary is organized into a prefix tree for efficient lookup:
-
-
```python
-
{
-
    'a': {
-
        'r': {
-
            'r': {
-
                'i': {
-
                    'v': {
-
                        'a': {
-
                            'l': {'<END>': True}
-
                        }
-
                    }
-
                }
-
            }
-
        }
-
    }
-
}
```
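
A nested dictionary of this shape can be built and queried with a few lines of Python. The sketch below is a minimal illustration, not the project's code: the `'<END>'` marker follows the structure shown above, while the helper names (`build_trie`, `contains`, `complete`) and the sample words are assumptions.

```python
END = '<END>'  # end-of-word marker, as in the structure shown above


def build_trie(words):
    """Insert each word character by character into a nested-dict trie."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Return True only if `word` was inserted as a complete word."""
    node = trie
    for ch in word.lower():
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def complete(trie, prefix):
    """Return every stored word that starts with `prefix`, sorted."""
    node = trie
    for ch in prefix.lower():
        if ch not in node:
            return []
        node = node[ch]
    results, stack = [], [(node, prefix.lower())]
    while stack:
        current, text = stack.pop()
        for key, child in current.items():
            if key == END:
                results.append(text)
            else:
                stack.append((child, text + key))
    return sorted(results)


trie = build_trie(["arrival", "arriving", "platform"])
print(contains(trie, "arrival"))  # True
print(contains(trie, "arriv"))    # False
print(complete(trie, "arr"))      # ['arrival', 'arriving']
```

Lookup cost grows with the length of the word rather than the size of the vocabulary, which is the usual reason for choosing a trie over a flat list.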
-
-
-
-
```python
-
# Train number extraction
-
r'train\s+(?:number\s+)?(\d{4,5})'
-
-
# Station extraction
-
r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)'
-
-
# Platform extraction
-
r'platform\s+(?:number\s+)?(\d+)'
-
```
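
To show how these three patterns behave on a transcript, here is a small hedged sketch. The patterns are copied from the block above; the `extract_entities` helper, the dictionary keys, the case-insensitive flag, and the sample sentence are assumptions added for illustration and are not taken from the project.

```python
import re

# Patterns copied from the README; the keys are illustrative labels.
PATTERNS = {
    "train_number": r'train\s+(?:number\s+)?(\d{4,5})',
    "station": r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)',
    "platform": r'platform\s+(?:number\s+)?(\d+)',
}


def extract_entities(text):
    """Return the first match of each pattern found in `text`."""
    found = {}
    for name, pattern in PATTERNS.items():
        # Assumption: matching is case-insensitive, since transcripts
        # may capitalize "Train" or "Platform".
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1).strip()
    return found


sample = ("Train number 12951 from Mumbai Central to New Delhi "
          "will arrive on platform number 3")
print(extract_entities(sample))
# {'train_number': '12951', 'station': 'Mumbai Central', 'platform': '3'}
```

Note that the station pattern only captures the origin, and only when the name is followed by "to" or "junction"; a destination would need its own pattern.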
-
-
## Comparison with Alternative Approaches
-
-
| Approach | Data Required | Training Time | Accuracy | Deployment |
-
|----------|--------------|---------------|----------|------------|
-
| **Fine-tuning** | 100,000+ hours | Days-Weeks | High\* | Complex |
-
| **Prompt Engineering** | None | None | Medium | Simple |
-
| **Contextual Biasing** | ~120 hours | Hours | High | **Simple** |
-
-
\*Requires a massive dataset, comparable to Whisper's 680k hours, for similar performance
-
-
## Limitations
-
-
1. **Vocabulary Boundaries**: Performance degrades for terms outside the biasing list
-
2. **Language Mixing**: Code-switching between languages may reduce accuracy
-
3. **Novel Named Entities**: New station names or train names require vocabulary updates
-
4. **Acoustic Noise**: Heavy background noise still impacts base Whisper performance
-
-
## Future Enhancements
-
-
- [ ] Dynamic vocabulary updates from live railway data
-
- [ ] Integration with railway databases for real-time validation
-
- [ ] Expanded language support for regional Indian languages
-
- [ ] Confidence scoring for extracted entities
-
- [ ] Speaker diarization for multi-speaker announcements
-
-
## Research Citation
-
-
This implementation is inspired by:
-
-
```bibtex
-
@article{lall2024contextual,
-
  title={Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model},
-
  author={Lall, Vishakha and Liu, Yisi},
-
  journal={arXiv preprint arXiv:2410.18363},
-
  year={2024}
-
}
-
```
-
-
## License
-
-
MIT License - See LICENSE file for details
-
-
## Acknowledgments
-
-
- OpenAI Whisper team for the base ASR model
-
- Singapore Polytechnic Centre of Excellence in Maritime Safety for maritime vocabulary
-
- Indian Railways for SMCP standardization
-
-
---
-
-
**Try it now**: Upload a railway announcement audio file or record one directly in the interface!
colorFrom: indigo
colorTo: blue
sdk: gradio
+
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
+
python_version: 3.11
---
```

+
**Key changes:**
+
1. ✅ Added `python_version: 3.11` to force Python 3.11
+
2. ✅ Updated `sdk_version: 5.9.1` (latest stable Gradio)
+
3. ✅ Removed `runtime.txt` (not used by HF Spaces)

+
## **Files to Upload/Update:**

+
1. **README.md** - Update the YAML header (download the new one above)
+
2. **requirements.txt** - Keep as is:
```
+
openai-whisper
+
gradio
+
torch
+
numpy