Spaces: raianand/RailVaani

raianand committed on Feb 13

Commit 4168b18 (verified) · 1 Parent(s): 2a313b5

Update README.md

Files changed (1)

README.md +13 -152

@@ -4,164 +4,25 @@ emoji: 🚂
 colorFrom: indigo
 colorTo: blue
 sdk: gradio
-sdk_version: 4.44.0
 app_file: app.py
 pinned: false
 license: mit
 ---
-# 🚂 RailVaani - Railway Announcement Transcription with Contextual Biasing
-## Overview
-RailVaani is an advanced speech-to-text system specifically designed for railway announcements. It uses OpenAI's Whisper model enhanced with **contextual biasing** to achieve high accuracy on domain-specific vocabulary without requiring expensive fine-tuning.
-## Key Features
-- ✅ **No Fine-tuning Required**: Uses contextual biasing instead of model retraining
-- ✅ **Railway-Specific Vocabulary**: Optimized for Indian Railways terminology
-- ✅ **SMCP Integration**: Includes Standard Maritime Communication Phrases
-- ✅ **Automatic Entity Extraction**: Identifies train numbers, stations, platforms, times, etc.
-- ✅ **Multi-language Support**: English, Hindi, Marathi, Bengali, Tamil
-- ✅ **Real-time Processing**: Fast inference with efficient vocabulary constraints
-## How It Works
-### Contextual Biasing Method
-Instead of fine-tuning the entire Whisper model (which would require 680,000+ hours of labeled audio), RailVaani implements a **Tree-Constrained Pointer Generator (TCPGen)** approach:
-1. **Prefix Tree Construction**: Railway vocabulary is organized into a trie data structure
-2. **Vocabulary Constraint**: During post-processing, transcriptions are guided toward valid railway terms
-3. **Smart Fallback**: System falls back to original Whisper output when vocabulary matching confidence is low
-4. **Entity Extraction**: Regex-based pattern matching extracts structured information
-### Technical Architecture
-```
-Audio Input → Whisper Model → Original Transcript
-                    ↓
-            Contextual Biasing (Prefix Tree)
-                    ↓
-        Corrected Transcript → Entity Extraction
-                    ↓
-            Structured Railway Information
 ```
-## Vocabulary Coverage
-The system includes:
-- **500+ Railway Terms**: Standard communication phrases, directions, status indicators
-- **100+ Station Names**: Major Indian railway stations and junctions
-- **Train Types**: Rajdhani, Shatabdi, Duronto, Vande Bharat, etc.
-- **Maritime Terms**: SMCP phrases for port and vessel communication
-## Performance
-Compared to vanilla Whisper models:
-| Model | Original WER | With Biasing | Improvement |
-|-------|-------------|--------------|-------------|
-| Whisper-tiny | 40.27% | 29.26% | **27% reduction** |
-| Whisper-base | 31.11% | 19.45% | **37% reduction** |
-| Whisper-medium | 27.82% | 11.12% | **60% reduction** |
-*WER = Word Error Rate (lower is better)*
-## Use Cases
-1. **Railway Station Automation**: Transcribe platform announcements automatically
-2. **Training & Simulation**: Analyze communication in maritime/railway training scenarios
-3. **Accessibility**: Generate text captions for hearing-impaired passengers
-4. **Analytics**: Extract structured data for delay analysis and performance monitoring
-5. **Multi-language Support**: Process announcements in multiple Indian languages
-## Technical Implementation
-### Prefix Tree (Trie)
-The vocabulary is organized into a prefix tree for efficient lookup:
-```python
-{
-  'a': {
-    'r': {
-      'r': {
-        'i': {
-          'v': {
-            'a': {
-              'l': {'<END>': True}
-            }
-          }
-        }
-      }
-    }
-  }
-}
 ```
-### Entity Extraction Patterns
-```python
-# Train number extraction
-r'train\s+(?:number\s+)?(\d{4,5})'
-# Station extraction
-r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)'
-# Platform extraction
-r'platform\s+(?:number\s+)?(\d+)'
-```
-## Comparison with Alternative Approaches
-| Approach | Data Required | Training Time | Accuracy | Deployment |
-|----------|--------------|---------------|----------|------------|
-| **Fine-tuning** | 100,000+ hours | Days-Weeks | High* | Complex |
-| **Prompt Engineering** | None | None | Medium | Simple |
-| **Contextual Biasing** | ~120 hours | Hours | High | **Simple** |
-*Requires massive dataset comparable to Whisper's 680k hours for similar performance
-## Limitations
-1. **Vocabulary Boundaries**: Performance degrades for terms outside the biasing list
-2. **Language Mixing**: Code-switching between languages may reduce accuracy
-3. **Novel Named Entities**: New station names or train names require vocabulary updates
-4. **Acoustic Noise**: Heavy background noise still impacts base Whisper performance
-## Future Enhancements
-- [ ] Dynamic vocabulary updates from live railway data
-- [ ] Integration with railway databases for real-time validation
-- [ ] Expanded language support for regional Indian languages
-- [ ] Confidence scoring for extracted entities
-- [ ] Speaker diarization for multi-speaker announcements
-## Research Citation
-This implementation is inspired by:
-```bibtex
-@article{lall2024contextual,
-  title={Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model},
-  author={Lall, Vishakha and Liu, Yisi},
-  journal={arXiv preprint arXiv:2410.18363},
-  year={2024}
-}
-```
-## License
-MIT License - See LICENSE file for details
-## Acknowledgments
-- OpenAI Whisper team for the base ASR model
-- Singapore Polytechnic Centre of Excellence in Maritime Safety for maritime vocabulary
-- Indian Railways for SMCP standardization
----
-**Try it now**: Upload a railway announcement audio file or record one directly in the interface!

 colorFrom: indigo
 colorTo: blue
 sdk: gradio
+sdk_version: 5.9.1
 app_file: app.py
 pinned: false
 license: mit
+python_version: 3.11
 ---
 ```
+**Key changes:**
+1. ✅ Added `python_version: 3.11` to force Python 3.11
+2. ✅ Updated `sdk_version: 5.9.1` (latest stable Gradio)
+3. ✅ Removed `runtime.txt` (not used by HF Spaces)
+## 📝 **Files to Upload/Update:**
+1. **README.md** - Update the YAML header (download the new one above)
+2. **requirements.txt** - Keep as is:
 ```
+   openai-whisper
+   gradio
+   torch
+   numpy
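
Review note: the README text removed in this commit describes a prefix tree with an `<END>` marker and three regex patterns for entity extraction. A minimal sketch of how those two pieces could fit together is given here for reference. It is not taken from `app.py`: the function names `build_trie`, `in_trie`, and `extract_entities` are hypothetical, and case-insensitive matching is an assumption. Only the trie shape and the three patterns come from the removed text.

```python
import re

END = "<END>"  # terminal marker, matching the trie example in the removed README

def build_trie(words):
    """Build a nested-dict prefix tree, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def in_trie(trie, word):
    """Return True only if `word` is a complete entry, not just a prefix."""
    node = trie
    for ch in word.lower():
        node = node.get(ch)
        if node is None:
            return False
    return END in node

# Patterns copied from the removed README; the dictionary keys are illustrative.
PATTERNS = {
    "train_number": r'train\s+(?:number\s+)?(\d{4,5})',
    "origin": r'from\s+([A-Za-z\s]+?)(?:\s+to|\s+junction)',
    "platform": r'platform\s+(?:number\s+)?(\d+)',
}

def extract_entities(text):
    """Apply each pattern once and collect the first captured group."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)  # assumption: ignore case
        if match:
            found[name] = match.group(1).strip()
    return found

trie = build_trie(["arrival", "platform"])
entities = extract_entities(
    "Train number 12951 from Mumbai Central to New Delhi "
    "will arrive on platform number 5"
)
```

With the sample sentence above, `entities` holds the train number, the origin station, and the platform as strings, and `in_trie(trie, "arriv")` is false because a bare prefix is not a stored term.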