# 🚀 Quick Start Guide - Advanced Stutter Detection

## TL;DR - What Changed?

- **Before:** the system returned `mismatch_percentage: 0` even when transcripts were completely different ❌
- **After:** the system correctly detects mismatches using multi-modal analysis ✅
## Installation & Setup

### 1. Requirements

```bash
pip install librosa torch transformers scipy numpy
```

### 2. Environment Variable

```bash
export HF_TOKEN="your_huggingface_token"
```

### 3. Import

```python
from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector
```
## Basic Usage

### Analyze Audio File

```python
# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")
```

### Analyze Without Target (ASR Only)

```python
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Only acoustic stutters and text patterns are detected
```
## Understanding Output

### Key Metrics

```python
{
    # Transcripts
    'actual_transcript': 'है लो',      # What was actually said
    'target_transcript': 'लोहै',       # What should have been said

    # Mismatch analysis
    'mismatched_chars': ['है', 'लो'],  # Segments that don't match
    'mismatch_percentage': 67,         # % of characters mismatched

    # Advanced metrics
    'edit_distance': 4,                # Operations to transform one into the other
    'lcs_ratio': 0.667,                # Similarity via longest common subsequence
    'phonetic_similarity': 0.85,       # Sound similarity (0-1)
    'word_accuracy': 0.5,              # Word-level accuracy

    # Stutter events
    'stutter_timestamps': [            # Detected events
        {
            'type': 'repetition',          # repetition | prolongation | block | dysfluency
            'start': 1.2,                  # Start time (seconds)
            'end': 1.8,                    # End time (seconds)
            'text': 'मैं',                 # Affected text
            'confidence': 0.87,            # Detection confidence
            'phonetic_similarity': 0.85,   # Acoustic similarity
        }
    ],

    # Assessment
    'severity': 'moderate',            # none | mild | moderate | severe
    'severity_score': 45.2,            # 0-100 scale
    'confidence_score': 0.87,          # Overall confidence

    # Debug
    'debug': {
        'acoustic_repetitions': 2,     # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2,            # Regex pattern matches
    },
}
```
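The `edit_distance` and `lcs_ratio` fields follow the standard Levenshtein-distance and longest-common-subsequence definitions. A minimal sketch of those definitions (illustrative only, not the detector's actual implementation):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def lcs_ratio(a: str, b: str) -> float:
    """Length of the longest common subsequence over the longer string."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)] / max(len(a), len(b), 1)

print(edit_distance("kitten", "sitting"))       # → 3
print(round(lcs_ratio("kitten", "sitting"), 3)) # → 0.571
```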
## Feature Highlights

### 1. Phonetic Intelligence

```python
# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')
# Returns: 0.2 (different categories)
```
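The idea behind this similarity can be sketched with a small place-of-articulation table. The table and scores below are a toy illustration, not the detector's actual mapping:

```python
# Toy articulation groups for a few Devanagari characters (illustrative only)
PHONETIC_GROUPS = {
    'क': 'velar', 'ख': 'velar', 'ग': 'velar', 'घ': 'velar',
    'च': 'palatal', 'छ': 'palatal',
    'त': 'dental', 'थ': 'dental',
    'अ': 'vowel', 'आ': 'vowel',
}

def phonetic_similarity(c1: str, c2: str) -> float:
    if c1 == c2:
        return 1.0
    g1, g2 = PHONETIC_GROUPS.get(c1), PHONETIC_GROUPS.get(c2)
    if g1 is not None and g1 == g2:
        return 0.85  # same articulation group (e.g. both velar plosives)
    return 0.2       # different categories

print(phonetic_similarity('क', 'ख'))  # → 0.85
print(phonetic_similarity('क', 'अ'))  # → 0.2
```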
### 2. Acoustic Matching

Repetitions are detected even when the ASR transcribes them differently. For example, "ज-ज-जाना" might be transcribed as "जना जना", but the acoustic analysis still catches the sound similarity.
### 3. Pattern Detection

The following dysfluency patterns are detected automatically:

- Character repetitions: "ममम"
- Word repetitions: "मैं मैं"
- Prolongations: "आआआ"
- Filled pauses: "अ", "उम"
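Patterns like these can be approximated with regular expressions. A hedged sketch (the detector's actual patterns may differ; whitespace lookarounds are used instead of `\b`, which is unreliable with Devanagari combining marks):

```python
import re

# Illustrative patterns only
CHAR_REPETITION = re.compile(r'(.)\1{2,}')                   # "ममम", "आआआ"
WORD_REPETITION = re.compile(r'(?<!\S)(\S+)(\s+\1)+(?!\S)')  # "मैं मैं"

text = "मैं मैं घर जा रहा हूं आआआ"
print(bool(WORD_REPETITION.search(text)))  # → True
print(bool(CHAR_REPETITION.search(text)))  # → True
```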
## Common Use Cases

### Case 1: Clinical Assessment

```python
# Analyze the patient's attempt at a target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']  # stutters per minute
duration = result['total_stutter_duration']

# Generate a report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")
```
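For reference, these summary metrics can also be derived directly from `stutter_timestamps`. A minimal sketch, assuming `stutter_frequency` means events per minute and the total audio duration is known (both helpers below are illustrative, not part of the detector's API):

```python
def stutter_frequency(stutter_timestamps: list, audio_duration_s: float) -> float:
    """Stutter events per minute of audio (hypothetical helper)."""
    if audio_duration_s <= 0:
        return 0.0
    return len(stutter_timestamps) / (audio_duration_s / 60.0)

def total_stutter_duration(stutter_timestamps: list) -> float:
    """Sum of event durations in seconds (hypothetical helper)."""
    return sum(e['end'] - e['start'] for e in stutter_timestamps)

events = [{'start': 1.2, 'end': 1.8}, {'start': 4.0, 'end': 4.5}]
print(stutter_frequency(events, 30.0))  # → 4.0
print(total_stutter_duration(events))
```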
### Case 2: Speech Therapy Progress

```python
# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - followup['severity_score']
print(f"Improvement: {improvement:.1f} points")
```
### Case 3: Research Analysis

```python
# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")
```
## Configuration

### Adjust Detection Sensitivity

Edit the thresholds in `detect_stuttering.py`:

```python
# More sensitive (catches more, may produce false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some events)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85
```
## Troubleshooting

### Issue: "mismatch_percentage still 0"

**Solution:** make sure you pass the `proper_transcript` parameter:

```python
result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)
```
### Issue: "Slow processing"

**Solutions:**

- Reduce audio length (split into chunks)
- Disable acoustic analysis (comment out lines ~700-710)
- Use CPU instead of GPU for short files
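The chunking step above can be sketched as follows (a hypothetical helper operating on a raw samples array; a small overlap reduces the chance of cutting a stutter event in half at a chunk boundary):

```python
def chunk_audio(samples, sample_rate: int, chunk_s: float = 30.0, overlap_s: float = 1.0):
    """Yield fixed-length chunks with a small overlap (hypothetical helper)."""
    chunk = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk]

# 65 s of silent 16 kHz audio → three chunks of at most 30 s
fake = [0.0] * (65 * 16000)
chunks = list(chunk_audio(fake, 16000))
print(len(chunks))  # → 3
```

Each chunk can then be passed to `detector.analyze_audio` separately after being written to disk.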
### Issue: "Low confidence scores"

**Check:**

- Audio quality (16 kHz sample rate recommended)
- Background noise
- Speaker clarity
- Language match (set `language='hindi'`)
### Issue: "HF_TOKEN error"

**Solution:**

```bash
export HF_TOKEN="your_token_here"
# Get a token from: https://huggingface.co/settings/tokens
```
## Testing

### Run Test Suite

```bash
cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py
```

### Expected Output

```
🔤 DEVANAGARI PHONETIC GROUPS
Consonants: velar, palatal, retroflex, dental, labial...
Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
Test Case 1: Original Issue
Actual: 'है लो'
Target: 'लोहै'
Mismatch %: 67% ✅
```
## Performance Tips

### 1. Reuse the Detector Instance

```python
# Good: load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)
```
### 2. Batch Processing

```python
results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue
```
### 3. Parallel Processing

Use a `Pool` initializer so each worker process loads the models once, instead of once per file:

```python
from multiprocessing import Pool

detector = None

def init_worker():
    # Runs once in each worker process
    global detector
    detector = AdvancedStutterDetector()

def analyze_file(args):
    audio_file, target = args
    return detector.analyze_audio(audio_file, target)

with Pool(4, initializer=init_worker) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])
```
## API Reference

### Main Method

```python
analyze_audio(
    audio_path: str,              # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'       # Language code
) -> dict
```

### Utility Methods

```python
# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive transcript comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity between audio segments
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict
```
## Documentation Files

| File | Purpose |
|---|---|
| `ADVANCED_FEATURES.md` | Detailed technical documentation |
| `IMPLEMENTATION_SUMMARY.md` | Implementation overview |
| `VERSION_COMPARISON.md` | Comparison with other versions |
| `QUICK_START.md` | This file |
| `test_advanced_features.py` | Test suite |
## Support

### Issues?

- Check the logs for debug info
- Review the `debug` section in the output
- Test with known-good audio
- Verify `HF_TOKEN` is set

### Questions?

- Review `ADVANCED_FEATURES.md` for details
- Check `VERSION_COMPARISON.md` for differences from other versions
- Run the test suite to verify your setup
## Summary

- ✅ **Fixed:** transcript comparison now works correctly
- ✅ **Added:** phonetic-aware Hindi analysis
- ✅ **Added:** acoustic similarity matching
- ✅ **Added:** multi-modal event detection
- ✅ **Result:** accurate stutter detection for Hindi speech

**Before:** 0% mismatch (broken)
**After:** 67% mismatch (correct!)

🎉 You're ready to use advanced stutter detection!