# AI Engine Model Summary

## Simplified ASR-Only Configuration

This engine has been simplified to use **ONLY** the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).

---

## Active Model

### 1. IndicWav2Vec Hindi (Primary & Only Model)
- **Model ID**: `ai4bharat/indicwav2vec-hindi`
- **Type**: `Wav2Vec2ForCTC`
- **Purpose**: Automatic Speech Recognition (ASR) for Hindi and other Indian languages
- **Status**: ✅ Active - Loaded at startup
- **Location**: `detect_stuttering.py` lines 26, 148-156
- **Authentication**: Requires `HF_TOKEN` environment variable
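
As a rough illustration of the authentication requirement, the sketch below checks for `HF_TOKEN` before attempting a download. The helper names `require_hf_token` and `load_asr_model` are hypothetical and are not taken from `detect_stuttering.py`; the use of `Wav2Vec2Processor` alongside `Wav2Vec2ForCTC` is also an assumption about how the checkpoint is loaded.

```python
import os


def require_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail early."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; it is needed to download "
            "ai4bharat/indicwav2vec-hindi"
        )
    return token


def load_asr_model(model_id: str = "ai4bharat/indicwav2vec-hindi"):
    """Load processor and CTC model (downloads weights on the first call)."""
    # Imported lazily so the token check works even without transformers.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    token = require_hf_token()
    processor = Wav2Vec2Processor.from_pretrained(model_id, token=token)
    model = Wav2Vec2ForCTC.from_pretrained(model_id, token=token)
    model.eval()  # inference only, disables dropout
    return processor, model
```

Failing before the download starts tends to give a clearer error than the HTTP 401 that an unauthenticated request would otherwise produce.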

**Features:**
- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)
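
The "simple repetition detection" feature is not specified further here. One minimal way to do it, assuming a stutter shows up as the same word transcribed several times in a row, is sketched below. The function name `find_word_repetitions` and the returned fields are illustrative, not the engine's actual output.

```python
from typing import Dict, List, Union


def find_word_repetitions(
    transcript: str, min_repeats: int = 2
) -> List[Dict[str, Union[str, int]]]:
    """Find runs of the same word repeated consecutively in a transcript."""
    # Split on whitespace rather than a \w+ regex: Devanagari vowel signs
    # are combining marks, which \w does not match in Python's re module.
    words = transcript.split()
    events: List[Dict[str, Union[str, int]]] = []
    i = 0
    while i < len(words):
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        count = j - i
        if count >= min_repeats:
            events.append({"word": words[i], "count": count, "start_index": i})
        i = j
    return events


if __name__ == "__main__":
    print(find_word_repetitions("मैं मैं मैं घर जा जा रहा हूँ"))
```

This catches only whole-word repetitions that survive transcription; part-word repetitions, prolongations, and blocks would not be visible in text alone.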

---

## Removed Models

The following models have been **removed** to simplify the engine:

1. **MMS Language Identification (LID)** - `facebook/mms-lid-126`
   - Previously used for language detection
   - No longer needed - IndicWav2Vec handles Hindi natively

2. **Isolation Forest** (sklearn)
   - Previously used for anomaly detection
   - Removed - using simple text-based analysis instead

---

## Removed Libraries

The following signal processing libraries are no longer used:

- `parselmouth` (Praat) - Voice quality analysis
- `fastdtw` - Repetition detection via DTW
- `sklearn` - Machine learning algorithms
- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)

---

## Current Pipeline

```
Audio Input
    ↓
IndicWav2Vec Hindi ASR
    ↓
Text Transcription
    ↓
Basic Text Analysis
    ↓
Results (transcript + simple stutter detection)
```
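
To make the "ASR → Text Transcription" step concrete, here is a sketch of greedy CTC decoding, the standard way to turn the per-frame output of a `Wav2Vec2ForCTC` model into text. It works on plain lists of probabilities so it runs without `torch`. The confidence definition (mean probability of the chosen label per frame) is an assumption, not necessarily how the engine computes `confidence_score`.

```python
from typing import List, Sequence, Tuple


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
) -> Tuple[str, float]:
    """Greedy CTC decoding over per-frame probability rows.

    Returns the decoded string and the mean probability of the chosen labels.
    """
    if not frame_probs:
        return "", 0.0

    best_ids: List[int] = []
    best_probs: List[float] = []
    for row in frame_probs:
        idx = max(range(len(row)), key=row.__getitem__)  # argmax per frame
        best_ids.append(idx)
        best_probs.append(row[idx])

    # CTC rule: collapse consecutive duplicates, then drop blank labels.
    chars: List[str] = []
    previous = None
    for idx in best_ids:
        if idx != previous and idx != blank_id:
            chars.append(vocab[idx])
        previous = idx

    confidence = sum(best_probs) / len(best_probs)
    return "".join(chars), confidence
```

A blank frame between two identical labels keeps them distinct, which is how CTC represents genuine double letters.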

---

## API Response Format

The simplified engine returns:

```json
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
```
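
The response format does not define how `mismatch_percentage` is computed. As one plausible reading, the sketch below measures the share of target characters that have no match in the ASR output, using `difflib.SequenceMatcher` from the standard library. The function name and the exact definition are assumptions for illustration.

```python
from difflib import SequenceMatcher


def mismatch_percentage(actual: str, target: str) -> float:
    """Percentage of target characters not matched by the actual transcript."""
    if not target:
        return 0.0  # nothing to compare against
    matcher = SequenceMatcher(None, actual, target, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return round(100.0 * (1 - matched / len(target)), 2)
```

A word-level or edit-distance metric (such as character error rate) would be an equally reasonable choice; this version is used only because it needs no extra dependencies.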

---

## Dependencies

**Required:**
- `transformers` 4.35.0 - For IndicWav2Vec model
- `torch` 2.0.1 - PyTorch backend
- `librosa` ≥0.10.0 - Audio loading (16kHz resampling)
- `numpy` - Array operations

**Optional (for legacy methods, not used in ASR mode):**
- `parselmouth` - Voice quality (not used)
- `fastdtw` - DTW algorithm (not used)
- `sklearn` - ML algorithms (not used)
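
A quick way to compare an environment against the versions listed above is to query installed package metadata. The sketch below uses only the standard library; `installed_versions` and `REQUIRED` are illustrative names.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

REQUIRED = ("transformers", "torch", "librosa", "numpy")


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```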

---

## Usage

```python
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi"  # default: hindi
)

print(result['actual_transcript'])  # ASR transcription
```
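
The returned dictionary can be condensed into a one-line summary for logging. The helper below is a hypothetical convenience, not part of the engine; it reads only keys listed under "API Response Format" and falls back to defaults if any are missing.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a short, log-friendly summary of an analysis result."""
    transcript = result.get("actual_transcript", "")
    severity = result.get("severity", "unknown")
    confidence = float(result.get("confidence_score", 0.0))
    events = result.get("stutter_timestamps") or []
    return (
        f"transcript={transcript!r} severity={severity} "
        f"confidence={confidence:.2f} stutter_events={len(events)}"
    )
```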

---

## Notes

- The engine focuses **only** on ASR transcription
- Stutter detection is simplified to text-based repetition analysis
- No complex acoustic feature extraction
- Faster and lighter than the previous multi-model approach
- Optimized for Hindi but can handle other Indian languages