# AI Engine Model Summary
## Simplified ASR-Only Configuration
This engine has been simplified to use **ONLY** the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).
---
## Active Model
### 1. IndicWav2Vec Hindi (Primary & Only Model)
- **Model ID**: `ai4bharat/indicwav2vec-hindi`
- **Type**: `Wav2Vec2ForCTC`
- **Purpose**: Automatic Speech Recognition (ASR) for Hindi and Indian languages
- **Status**: ✅ Active - Loaded at startup
- **Location**: `detect_stuttering.py` lines 26, 148-156
- **Authentication**: Requires `HF_TOKEN` environment variable
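
As a rough illustration of the settings above, the sketch below loads the checkpoint with the token from `HF_TOKEN` and runs greedy CTC transcription. It is not the code in `detect_stuttering.py`: the helper names `read_hf_token`, `load_asr`, and `transcribe` are hypothetical, and the use of `Wav2Vec2Processor` alongside `Wav2Vec2ForCTC` is an assumption about how the checkpoint is packaged.

```python
import os

MODEL_ID = "ai4bharat/indicwav2vec-hindi"
TARGET_SAMPLE_RATE = 16_000  # audio is resampled to 16 kHz before inference


def read_hf_token() -> str:
    """Return the Hugging Face token, failing early if HF_TOKEN is missing."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; the model download needs it")
    return token


def load_asr():
    """Load processor and CTC model (hypothetical helper, not the engine's own)."""
    # Imported lazily so the token check works without the heavy packages.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    token = read_hf_token()
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID, token=token)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID, token=token)
    model.eval()  # inference only
    return processor, model


def transcribe(audio_path: str, processor, model) -> str:
    """Greedy CTC transcription of a single audio file."""
    import librosa
    import torch

    waveform, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE)
    inputs = processor(
        waveform, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Checking the token before importing `transformers` keeps a misconfigured environment from failing only after a slow import.
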
**Features:**
- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)
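
The text-based repetition detection listed above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function name, the `min_repeats` parameter, and the output fields are assumptions. Tokens are split on whitespace rather than with a `\w+` regex, because Python's `\w` does not match Devanagari vowel signs and would break Hindi words apart.

```python
from typing import Dict, List

# ASCII punctuation plus the Devanagari danda, stripped from token edges.
PUNCTUATION = ".,!?;:\u0964"


def find_repetitions(transcript: str, min_repeats: int = 2) -> List[Dict[str, object]]:
    """Return runs of the same word repeated back to back."""
    words = [w.strip(PUNCTUATION).casefold() for w in transcript.split()]
    words = [w for w in words if w]

    runs: List[Dict[str, object]] = []
    i = 0
    while i < len(words):
        # Extend j to the end of the run of identical words starting at i.
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i >= min_repeats:
            runs.append({"word": words[i], "count": j - i, "start_index": i})
        i = j
    return runs


if __name__ == "__main__":
    print(find_repetitions("मैं मैं मैं घर जा रहा हूँ"))
```

A transcript-only check like this cannot see prolongations or blocks, which leave no trace in the text; it only flags whole-word repeats that the ASR model actually transcribed.
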
---
## Removed Models
The following models have been **removed** to simplify the engine:
1. ❌ **MMS Language Identification (LID)** - `facebook/mms-lid-126`
   - Previously used for language detection
   - No longer needed - IndicWav2Vec handles Hindi natively
2. ❌ **Isolation Forest** (sklearn)
   - Previously used for anomaly detection
   - Removed - using simple text-based analysis instead
---
## Removed Libraries
The following libraries and processing steps are no longer used:
- ❌ `parselmouth` (Praat) - Voice quality analysis
- ❌ `fastdtw` - Repetition detection via DTW
- ❌ `sklearn` - Machine learning algorithms
- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)
---
## Current Pipeline
```
Audio Input
↓
IndicWav2Vec Hindi ASR
↓
Text Transcription
↓
Basic Text Analysis
↓
Results (transcript + simple stutter detection)
```
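
The step from model output to text in the pipeline above is CTC decoding, since the model is a `Wav2Vec2ForCTC`. The sketch below shows greedy CTC decoding on a toy vocabulary, together with one plausible confidence score (the mean probability of the emitted characters). The blank index `0`, the toy vocabulary, and the confidence formula are all assumptions for illustration; the source does not say how the engine computes `confidence_score`.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(
    frame_logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,  # assumed blank index for this toy example
) -> Tuple[str, float]:
    """Collapse repeated frames, drop blanks, and average emitted probabilities."""
    chars: List[str] = []
    confidences: List[float] = []
    previous_id = blank_id
    for logits in frame_logits:
        probs = softmax(logits)
        best_id = max(range(len(probs)), key=probs.__getitem__)
        # Emit only when the best label is non-blank and differs from the last frame.
        if best_id != blank_id and best_id != previous_id:
            chars.append(vocab[best_id])
            confidences.append(probs[best_id])
        previous_id = best_id
    confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return "".join(chars), confidence


if __name__ == "__main__":
    toy_vocab = ["<pad>", "a", "b"]
    frames = [[0, 5, 0], [0, 5, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]]
    print(greedy_ctc_decode(frames, toy_vocab))
```

The blank frame between the two runs of `a` is what lets CTC emit a doubled character; without it the repeated frames collapse into one.
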
---
## API Response Format
The simplified engine returns:
```json
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
```
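
The source does not define how `mismatch_percentage` is derived from `actual_transcript` and `target_transcript`. As one possible reading, the sketch below measures the share of characters that do not align between the two strings, using the standard library's `difflib.SequenceMatcher`. The function name and the choice to normalize by the longer string are assumptions, not the engine's documented behavior.

```python
from difflib import SequenceMatcher


def mismatch_percentage(actual: str, target: str) -> float:
    """Percentage of characters that fail to align between two transcripts."""
    if not target:
        # No target transcript was provided, so there is nothing to compare.
        return 0.0
    matcher = SequenceMatcher(None, actual, target, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(actual), len(target))
    return round(100.0 * (1 - matched / longest), 2)


if __name__ == "__main__":
    print(mismatch_percentage("abcd", "abxd"))
```

Setting `autojunk=False` disables the heuristic that treats frequent characters as junk, which would otherwise distort scores on longer transcripts.
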
---
## Dependencies
**Required:**
- `transformers` 4.35.0 - For IndicWav2Vec model
- `torch` 2.0.1 - PyTorch backend
- `librosa` ≥0.10.0 - Audio loading (resampled to 16 kHz)
- `numpy` - Array operations
**Optional (legacy code paths only; not used in ASR mode):**
- `parselmouth` - Voice quality
- `fastdtw` - DTW algorithm
- `sklearn` (installed as `scikit-learn`) - ML algorithms
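
To confirm that an environment matches the versions listed above, a small check with the standard library's `importlib.metadata` can report what is installed. This is a sketch only; the `REQUIRED` tuple simply mirrors the required list in this section, and the helper name is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Distribution names taken from the required list in this section.
REQUIRED = ("transformers", "torch", "librosa", "numpy")


def installed_versions(packages: Sequence[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```
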
---
## Usage
```python
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi",  # default: hindi
)
print(result['actual_transcript'])  # ASR transcription
```
---
## Notes
- The engine focuses **only** on ASR transcription
- Stutter detection is simplified to text-based repetition analysis
- No complex acoustic feature extraction
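- Loads a single model instead of the previous ASR, language-ID, and Isolation Forest combination, so it should be faster and lighter than the multi-model approach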
- Optimized for Hindi but can handle other Indian languages