# AI Engine Model Summary

## Simplified ASR-Only Configuration

This engine uses a single model, IndicWav2Vec Hindi, for Automatic Speech Recognition (ASR). All other models have been removed.

---

## Active Model

### 1. IndicWav2Vec Hindi (Primary & Only Model)
- **Model ID**: `ai4bharat/indicwav2vec-hindi`
- **Type**: `Wav2Vec2ForCTC`
- **Purpose**: Automatic Speech Recognition (ASR) for Hindi and other Indian languages
- **Status**: ✅ Active - Loaded at startup
- **Location**: `detect_stuttering.py` lines 26, 148-156
- **Authentication**: Requires `HF_TOKEN` environment variable
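
Because the model requires `HF_TOKEN`, it can help to fail fast with a clear message when the variable is missing. The following is a minimal sketch, not the engine's actual startup code; the helper name `require_hf_token` is hypothetical.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; it is required to download "
            "ai4bharat/indicwav2vec-hindi"
        )
    return token
```

In recent `transformers` releases the returned value can be passed as the `token` argument of `from_pretrained`.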

**Features:**
- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)
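
The source does not show how the "simple repetition detection" is implemented. One plausible reading is to look for runs of identical adjacent words in the transcript. The sketch below illustrates that idea under this assumption; `find_word_repetitions` and `min_repeats` are hypothetical names.

```python
def find_word_repetitions(transcript, min_repeats=2):
    """Return (word, start_index, count) for each run of identical adjacent words.

    Words are split on whitespace, which is an assumption that suits
    space-delimited Hindi text.
    """
    words = transcript.split()
    runs = []
    i = 0
    while i < len(words):
        # Extend j to the end of the run of words equal to words[i].
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i >= min_repeats:
            runs.append((words[i], i, j - i))
        i = j
    return runs


print(find_word_repetitions("मैं मैं मैं घर जा रहा रहा हूँ"))
```

Word indices rather than timestamps are returned here; mapping them to the `start`/`end` times in the API response would need frame-level alignment from the ASR output.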

---

## Removed Models

The following models have been **removed** to simplify the engine:

1. ❌ **MMS Language Identification (LID)** - `facebook/mms-lid-126`
   - Previously used for language detection
   - No longer needed - IndicWav2Vec handles Hindi natively

2. ❌ **Isolation Forest** (sklearn)
   - Previously used for anomaly detection
   - Removed - using simple text-based analysis instead

---

## Removed Libraries

The following libraries and techniques are no longer used:

- ❌ `parselmouth` (Praat) - Voice quality analysis
- ❌ `fastdtw` - Repetition detection via DTW
- ❌ `sklearn` - Machine learning algorithms
- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)

---

## Current Pipeline

```
Audio Input
    ↓
IndicWav2Vec Hindi ASR
    ↓
Text Transcription
    ↓
Basic Text Analysis
    ↓
Results (transcript + simple stutter detection)
```
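
The step from model output to text in a `Wav2Vec2ForCTC` model is a CTC decode. The sketch below shows greedy CTC decoding in plain Python, together with one possible confidence score (the mean of the per-frame maximum probabilities). The engine itself most likely relies on the `transformers` processor for decoding, and its exact confidence formula is not given in this document, so the function names, the toy vocabulary, and `blank_id=0` are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(frame_logits, vocab, blank_id=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    Returns (text, confidence), where confidence is the mean of the
    per-frame maximum probabilities (an assumed definition).
    """
    chars = []
    max_probs = []
    prev = None
    for logits in frame_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        max_probs.append(probs[best])
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank token.
        if best != prev and best != blank_id:
            chars.append(vocab[best])
        prev = best
    confidence = sum(max_probs) / len(max_probs) if max_probs else 0.0
    return "".join(chars), confidence


# Toy example: 5 frames over a 3-symbol vocabulary (index 0 is the blank).
frames = [[0, 5, 0], [0, 5, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]]
text, confidence = greedy_ctc_decode(frames, ["<pad>", "a", "b"])
print(text, round(confidence, 3))
```

The blank frame between the two runs of `a` is what lets CTC emit a genuinely doubled character instead of collapsing it.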

---

## API Response Format

The simplified engine returns:

```json
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
```
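
A client consuming this response may want to check its shape before using it. The following is a minimal sketch that assumes every key listed above is always present and that `severity` takes exactly one of the four values shown. Neither assumption is confirmed by the engine code, and `check_response` is a hypothetical helper.

```python
# Keys taken from the response format above; treating all of them as
# mandatory is an assumption.
REQUIRED_KEYS = {
    "actual_transcript", "target_transcript", "mismatched_chars",
    "mismatch_percentage", "ctc_loss_score", "stutter_timestamps",
    "total_stutter_duration", "stutter_frequency", "severity",
    "confidence_score", "speaking_rate_sps", "analysis_duration_seconds",
    "model_version",
}
SEVERITIES = {"none", "mild", "moderate", "severe"}


def check_response(result):
    """Return a list of problems found in a response dict (empty if none)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(result))]
    if "severity" in result and result["severity"] not in SEVERITIES:
        problems.append(f"unexpected severity: {result['severity']!r}")
    for event in result.get("stutter_timestamps") or []:
        # Each event is expected to carry start/end times in seconds.
        if event.get("end", 0.0) < event.get("start", 0.0):
            problems.append(f"event ends before it starts: {event}")
    return problems
```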

---

## Dependencies

**Required:**
- `transformers` 4.35.0 - For IndicWav2Vec model
- `torch` 2.0.1 - PyTorch backend
- `librosa` ≥0.10.0 - Audio loading (16 kHz resampling)
- `numpy` - Array operations

**Optional (legacy methods only, not used in ASR mode):**
- `parselmouth` - Voice quality analysis
- `fastdtw` - DTW algorithm
- `sklearn` - ML algorithms
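
To see which of the required packages are installed in the current environment, a small check with the standard library's `importlib.metadata` is enough. This is an illustrative sketch; `installed_versions` is a hypothetical helper and it does not compare against the pinned versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the required packages listed above.
REQUIRED = ["transformers", "torch", "librosa", "numpy"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```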

---

## Usage

```python
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi"  # default: hindi
)

print(result['actual_transcript'])  # ASR transcription
```
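
The returned dictionary can be condensed into a short log line. The sketch below assumes the field names from the API response format and that each entry in `stutter_timestamps` carries a `type` key; `summarize` is a hypothetical helper, not part of the engine.

```python
from collections import Counter


def summarize(result):
    """Build a one-line summary from a result dict returned by analyze_audio."""
    events = result.get("stutter_timestamps", [])
    # Count events per type, e.g. {"repetition": 2}.
    by_type = Counter(e.get("type", "unknown") for e in events)
    counts = ", ".join(f"{n} {t}" for t, n in sorted(by_type.items())) or "no events"
    return (
        f"severity={result.get('severity', 'unknown')} "
        f"confidence={result.get('confidence_score', 0.0):.2f} "
        f"({counts})"
    )
```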

---

## Notes

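- The engine performs ASR transcription only; no other model is loaded
- Stutter detection is reduced to text-based repetition analysis of the transcript
- No acoustic feature extraction (MFCC, formants, etc.) is performed
- Only one model is loaded, so the engine should be faster and lighter than the previous multi-model approach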
- Optimized for Hindi; other Indian languages can be passed in, but accuracy on them is not guaranteed