zlaqa-version-b-ai-enginee / Docs /MODEL_SUMMARY.md

anfastech

fix: resolve torch security error by pinning torch 2.6.0 and updating requirements

a62077e 2 months ago

	# AI Engine Model Summary

	## Simplified ASR-Only Configuration

	This engine has been simplified to use ONLY the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).

	---

	## Active Model

	### 1. IndicWav2Vec Hindi (Primary & Only Model)
	- Model ID: `ai4bharat/indicwav2vec-hindi`
	- Type: `Wav2Vec2ForCTC`
	- Purpose: Automatic Speech Recognition (ASR) for Hindi and Indian languages
	- Status: ✅ Active - Loaded at startup
	- Location: `detect_stuttering.py` lines 26, 148-156
	- Authentication: Requires `HF_TOKEN` environment variable
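
Since the model needs `HF_TOKEN`, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch, not the code in `detect_stuttering.py`; the helper name `resolve_hf_token` and its `env` parameter are hypothetical and exist only for illustration.

```python
import os


def resolve_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise early.

    `env` is a hypothetical injection point (any mapping) so the helper can
    be exercised without touching the real process environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before starting the engine."
        )
    return token
```

In recent `transformers` releases the resulting value is typically passed as the `token` argument of `from_pretrained`; whether the engine does exactly this is an assumption.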

	Features:
	- Speech-to-text transcription
	- Confidence scoring from model predictions
	- Text-based stutter analysis (simple repetition detection)
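
The "simple repetition detection" listed above could look roughly like the sketch below, which flags runs of the same word repeated back to back in the transcript. This is an illustrative assumption about the approach, not the engine's actual logic; the function name and the keys `word`, `count`, and `word_index` are hypothetical.

```python
def find_word_repetitions(transcript, min_repeats=2):
    """Return runs of an identical word repeated consecutively.

    Each run is reported once, with the index of its first word.
    """
    words = transcript.split()
    runs = []
    i = 0
    while i < len(words):
        # Advance j to the end of the run of words equal to words[i].
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i >= min_repeats:
            runs.append(
                {"type": "repetition", "word": words[i],
                 "count": j - i, "word_index": i}
            )
        i = j
    return runs


if __name__ == "__main__":
    for run in find_word_repetitions("मैं मैं मैं घर जा जा रहा हूँ"):
        print(run)
```

A purely text-based check like this cannot see sound prolongations or silent blocks, which is consistent with the reduced scope described in this document.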

	---

	## Removed Models

	The following models have been removed to simplify the engine:

	1. ❌ MMS Language Identification (LID) - `facebook/mms-lid-126`
	   - Previously used for language detection
	   - No longer needed - IndicWav2Vec handles Hindi natively

	2. ❌ Isolation Forest (sklearn)
	   - Previously used for anomaly detection
	   - Removed - using simple text-based analysis instead

	---

	## Removed Libraries

	The following signal processing libraries are no longer used:

	- ❌ `parselmouth` (Praat) - Voice quality analysis
	- ❌ `fastdtw` - Repetition detection via DTW
	- ❌ `sklearn` - Machine learning algorithms
	- ❌ Complex acoustic feature extraction (MFCC, formants, etc.)

	---

	## Current Pipeline

	```
	Audio Input
	↓
	IndicWav2Vec Hindi ASR
	↓
	Text Transcription
	↓
	Basic Text Analysis
	↓
	Results (transcript + simple stutter detection)
	```
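
To make the "ASR → Text Transcription" step concrete, here is a dependency-free sketch of greedy CTC decoding with a simple confidence value. It assumes the model emits one row of logits per audio frame and that index 0 is the CTC blank; the averaging rule for confidence is an assumption for illustration, not necessarily how the engine computes `confidence_score`.

```python
import math


def softmax(row):
    """Numerically stable softmax over one frame of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(logits, vocab, blank_id=0):
    """Pick the best token per frame, collapse repeats, drop blanks.

    Returns (text, confidence), where confidence is the mean of the
    per-frame maximum probabilities (an assumed, simplistic measure).
    """
    tokens = []
    confidences = []
    prev = None
    for frame in logits:
        probs = softmax(frame)
        best = max(range(len(probs)), key=probs.__getitem__)
        confidences.append(probs[best])
        # Emit only when the token changes and is not the blank symbol.
        if best != prev and best != blank_id:
            tokens.append(vocab[best])
        prev = best
    text = "".join(tokens)
    confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return text, confidence
```

In practice the Hugging Face processor's own decoding would normally be used; the sketch only shows why a blank frame between two identical tokens keeps both of them in the output.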

	---

	## API Response Format

	The simplified engine returns:

	```json
	{
	"actual_transcript": "transcribed text",
	"target_transcript": "expected text (if provided)",
	"mismatched_chars": ["timestamps of low confidence regions"],
	"mismatch_percentage": 0.0,
	"ctc_loss_score": 0.0,
	"stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
	"total_stutter_duration": 0.0,
	"stutter_frequency": 0.0,
	"severity": "none|mild|moderate|severe",
	"confidence_score": 0.8,
	"speaking_rate_sps": 0.0,
	"analysis_duration_seconds": 0.0,
	"model_version": "indicwav2vec-hindi-asr-v1"
	}
	```
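
A client consuming this response might reduce it to a few fields, as in the sketch below. It relies only on the keys shown above; the helper name `summarize_result` and the derived key `summed_event_duration` are hypothetical, and whether that sum equals `total_stutter_duration` in the engine is not confirmed here.

```python
import json


def summarize_result(payload):
    """Parse a response (JSON string or dict) and derive a short summary."""
    result = json.loads(payload) if isinstance(payload, str) else dict(payload)
    events = result.get("stutter_timestamps", [])
    # Sum event lengths, ignoring any event whose end precedes its start.
    summed = sum(max(0.0, e["end"] - e["start"]) for e in events)
    return {
        "transcript": result.get("actual_transcript", ""),
        "severity": result.get("severity", "none"),
        "event_count": len(events),
        "summed_event_duration": round(summed, 3),
    }
```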

	---

	## Dependencies

	Required:
	- `transformers` 4.35.0 - For IndicWav2Vec model
	- `torch` 2.0.1 - PyTorch backend
	- `librosa` ≥0.10.0 - Audio loading (16kHz resampling)
	- `numpy` - Array operations

	Optional (legacy methods only, not used in ASR mode):
	- `parselmouth` - Voice quality
	- `fastdtw` - DTW algorithm
	- `sklearn` - ML algorithms
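
To check which of these packages are actually installed in a given environment, a small standard-library sketch such as the following can be used. The function name `installed_versions` is hypothetical. Note that the lookup expects PyPI distribution names, which can differ from import names (for example, `sklearn` is distributed as `scikit-learn` and `parselmouth` as `praat-parselmouth`).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    required = ["transformers", "torch", "librosa", "numpy"]
    for name, version in installed_versions(required).items():
        print(f"{name}: {version or 'not installed'}")
```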

	---

	## Usage

	```python
	from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

	detector = get_stutter_detector()
	result = detector.analyze_audio(
	    audio_path="path/to/audio.wav",
	    proper_transcript="expected text",  # optional
	    language="hindi",  # default: hindi
	)

	print(result['actual_transcript']) # ASR transcription
	```

	---

	## Notes

	- The engine focuses only on ASR transcription
	- Stutter detection is simplified to text-based repetition analysis
	- No complex acoustic feature extraction
	- Faster and lighter than the previous multi-model approach
	- Optimized for Hindi but can handle other Indian languages