# AI Engine Model Summary
## Simplified ASR-Only Configuration
This engine has been simplified to use **only** the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).

---
## Active Model
### 1. IndicWav2Vec Hindi (Primary & Only Model)
- **Model ID**: `ai4bharat/indicwav2vec-hindi`
- **Type**: `Wav2Vec2ForCTC`
- **Purpose**: Automatic Speech Recognition (ASR) for Hindi and Indian languages
- **Status**: ✅ Active - Loaded at startup
- **Location**: `detect_stuttering.py` lines 26, 148-156
- **Authentication**: Requires `HF_TOKEN` environment variable

**Features:**
- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)
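The summary does not say how the confidence score is derived from the model's predictions. One common approach for a CTC model, shown here as a minimal sketch and not as the engine's actual method, is to take the highest softmax probability in each output frame and average those peaks. The function names are hypothetical, and plain Python lists stand in for the logits tensor so the example runs without `torch`.

```python
import math

def frame_confidences(logits):
    """Return the peak softmax probability of each frame of CTC logits."""
    confidences = []
    for frame in logits:
        peak = max(frame)
        # Subtract the peak before exponentiating for numerical stability.
        exps = [math.exp(x - peak) for x in frame]
        confidences.append(max(exps) / sum(exps))
    return confidences

def mean_confidence(logits):
    """Average the per-frame peaks into one score (0.0 for empty input)."""
    scores = frame_confidences(logits)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical logits: 2 frames over a 3-symbol vocabulary.
example_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(round(mean_confidence(example_logits), 3))
```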
---
## Removed Models
The following models have been **removed** to simplify the engine:
1. ❌ **MMS Language Identification (LID)** - `facebook/mms-lid-126`
   - Previously used for language detection
   - No longer needed: IndicWav2Vec handles Hindi natively
2. ❌ **Isolation Forest** (sklearn)
   - Previously used for anomaly detection
   - Removed in favor of simple text-based analysis
---
## Removed Libraries
The following signal-processing and machine-learning components are no longer used by the ASR pipeline:
- ❌ `parselmouth` (Praat) - Voice quality analysis
- ❌ `fastdtw` - Repetition detection via DTW
- ❌ `sklearn` - Machine learning algorithms
- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)
---
## Current Pipeline
```
Audio Input
    ↓
IndicWav2Vec Hindi ASR
    ↓
Text Transcription
    ↓
Basic Text Analysis
    ↓
Results (transcript + simple stutter detection)
```
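The "Basic Text Analysis" stage is described only as simple repetition detection. A minimal sketch of that idea, assuming whitespace tokenization and exact word matches (both are assumptions, and `find_repetitions` is a hypothetical name, not a function from `detect_stuttering.py`), could scan the transcript for the same word repeated back to back. It reports word positions only; mapping them to the `start`/`end` times in the response would need frame alignment that is not shown here.

```python
def find_repetitions(transcript, min_repeats=2):
    """Find runs of the same word repeated back to back.

    Returns a list of dicts holding the word, the index of its first
    occurrence in the run, and how many times it appears in the run.
    """
    words = transcript.split()
    runs = []
    i = 0
    while i < len(words):
        j = i + 1
        # Extend the run while the next word is identical.
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i >= min_repeats:
            runs.append({"word": words[i], "index": i, "count": j - i})
        i = j
    return runs

for run in find_repetitions("मैं मैं मैं घर जा रहा हूँ"):
    print(run["word"], run["index"], run["count"])
```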
---
## API Response Format
The simplified engine returns:
```json
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
```
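A caller may want to sanity-check a response against the shape above before using it. The sketch below is one way to do that. It assumes every key is always present (the source does not confirm this, notably for `target_transcript`) and that `severity` takes exactly one of the four listed values. `validate_response` is a hypothetical helper, not part of the engine.

```python
REQUIRED_KEYS = {
    "actual_transcript", "target_transcript", "mismatched_chars",
    "mismatch_percentage", "ctc_loss_score", "stutter_timestamps",
    "total_stutter_duration", "stutter_frequency", "severity",
    "confidence_score", "speaking_rate_sps", "analysis_duration_seconds",
    "model_version",
}
SEVERITIES = {"none", "mild", "moderate", "severe"}

def validate_response(result):
    """Return a list of problems found in an engine response dict."""
    missing = sorted(REQUIRED_KEYS - set(result))
    problems = [f"missing key: {key}" for key in missing]
    if "severity" in result and result["severity"] not in SEVERITIES:
        problems.append(f"unexpected severity: {result['severity']!r}")
    # Each stutter event should not end before it starts.
    for event in result.get("stutter_timestamps", []):
        if event.get("start", 0.0) > event.get("end", 0.0):
            problems.append(f"event ends before it starts: {event}")
    return problems
```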
---
## Dependencies
**Required:**
- `transformers` 4.35.0 - For IndicWav2Vec model
- `torch` 2.0.1 - PyTorch backend
- `librosa` ≥0.10.0 - Audio loading (16 kHz resampling)
- `numpy` - Array operations

**Optional (only needed by legacy methods, not used in ASR mode):**
- `parselmouth` - Voice quality
- `fastdtw` - DTW algorithm
- `sklearn` - ML algorithms
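To confirm that an environment matches the required versions listed above, a small check built on the standard library's `importlib.metadata` is one option. This is a sketch under assumptions: the `REQUIREMENTS` table simply restates the list above, `check_requirements` and `numeric` are hypothetical helpers, and the version comparison only looks at leading numeric components (so a local suffix such as `+cu118` is ignored and pre-release tags are not handled properly).

```python
from importlib import metadata

# (package, comparison, version) restated from the Required list.
REQUIREMENTS = [
    ("transformers", "==", "4.35.0"),
    ("torch", "==", "2.0.1"),
    ("librosa", ">=", "0.10.0"),
    ("numpy", None, None),  # listed without a version
]

def numeric(version):
    """Turn a string like '2.0.1+cu118' into the tuple (2, 0, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def check_requirements(get_version=metadata.version):
    """Return a list of mismatches between installed and listed versions."""
    problems = []
    for name, op, wanted in REQUIREMENTS:
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if op == "==" and numeric(found) != numeric(wanted):
            problems.append(f"{name}: found {found}, expected {wanted}")
        elif op == ">=" and numeric(found) < numeric(wanted):
            problems.append(f"{name}: found {found}, need at least {wanted}")
    return problems
```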
---
## Usage
```python
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi",  # default: hindi
)
print(result['actual_transcript'])  # ASR transcription
```
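Because the model requires the `HF_TOKEN` environment variable, it can help to fail early with a clear message before creating the detector. The helper below is a hypothetical pre-flight check, not part of the engine. It assumes an empty or whitespace-only value should count as missing, and it accepts an optional mapping so it can be exercised without touching the real environment.

```python
import os

def require_hf_token(env=None):
    """Return the Hugging Face token, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; the engine requires it to load "
            "ai4bharat/indicwav2vec-hindi."
        )
    return token

# Intended use: call require_hf_token() before get_stutter_detector().
```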
---
## Notes
- The engine focuses **only** on ASR transcription
- Stutter detection is simplified to text-based repetition analysis
- No complex acoustic feature extraction
- Faster and lighter than the previous multi-model approach
- Optimized for Hindi but can handle other Indian languages