# 🚀 Quick Start Guide - Advanced Stutter Detection

## TL;DR - What Changed?

**Before**: System returned `mismatch_percentage: 0` even when transcripts were completely different ❌

**After**: System now correctly detects mismatches using multi-modal analysis ✅

---

## Installation & Setup

### 1. Requirements

```bash
pip install librosa torch transformers scipy numpy
```

### 2. Environment Variable

```bash
export HF_TOKEN="your_huggingface_token"
```

### 3. Import

```python
from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector
```

---

## Basic Usage

### Analyze Audio File

```python
# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")
```

### Analyze Without Target (ASR Only)

```python
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Will only detect acoustic stutters and patterns
```
---

## Understanding Output

### Key Metrics

```python
{
    # Transcripts
    'actual_transcript': 'है लो',       # What was actually said
    'target_transcript': 'लोहै',        # What should be said

    # Mismatch Analysis
    'mismatched_chars': ['है', 'लो'],   # Segments that don't match
    'mismatch_percentage': 67,          # % of characters mismatched

    # Advanced Metrics
    'edit_distance': 4,                 # Operations to transform
    'lcs_ratio': 0.667,                 # Similarity via LCS
    'phonetic_similarity': 0.85,        # Sound similarity (0-1)
    'word_accuracy': 0.5,               # Word-level accuracy

    # Stutter Events
    'stutter_timestamps': [             # Detected events
        {
            'type': 'repetition',       # repetition|prolongation|block|dysfluency
            'start': 1.2,               # Start time (seconds)
            'end': 1.8,                 # End time (seconds)
            'text': 'मैं',              # Affected text
            'confidence': 0.87,         # Detection confidence
            'phonetic_similarity': 0.85 # Acoustic similarity
        }
    ],

    # Assessment
    'severity': 'moderate',             # none|mild|moderate|severe
    'severity_score': 45.2,             # 0-100 scale
    'confidence_score': 0.87,           # Overall confidence

    # Debug
    'debug': {
        'acoustic_repetitions': 2,      # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2              # Regex pattern matches
    }
}
```
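As a rough sketch of how a character-level mismatch percentage can be computed, here is a minimal example using Python's standard `difflib`. The helper name and formula are illustrative assumptions; the detector's actual `_compare_transcripts_comprehensive` may use different weighting, so its numbers will not match this sketch exactly:

```python
from difflib import SequenceMatcher

def mismatch_percentage(actual: str, target: str) -> float:
    """Character-level mismatch as 100 * (1 - similarity ratio). Hypothetical helper."""
    ratio = SequenceMatcher(None, actual, target).ratio()
    return round((1 - ratio) * 100, 1)

print(mismatch_percentage('है लो', 'लोहै'))  # nonzero, unlike the old broken behavior
print(mismatch_percentage('मैं', 'मैं'))     # 0.0 for identical strings
```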
---

## Feature Highlights

### 1. Phonetic Intelligence

```python
# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')
# Returns: 0.2 (different categories)
```

### 2. Acoustic Matching

```python
# Detects repetitions even when ASR transcribes differently.
# Example: "ज-ज-जाना" might be transcribed as "जना जना";
# acoustic analysis catches the sound similarity!
```
### 3. Pattern Detection

```python
# Automatically detects:
# - Character repetitions: "ममम"
# - Word repetitions: "मैं मैं"
# - Prolongations: "आआआ"
# - Filled pauses: "अ", "उम"
```
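The patterns above can be sketched with regular expressions. The ones below are illustrative assumptions only, not the detector's actual patterns (those live in `detect_stuttering.py`):

```python
import re

# Illustrative patterns, not the detector's real ones
CHAR_REPETITION = re.compile(r'(.)\1{2,}')                # "ममम", "आआआ": same char 3+ times
WORD_REPETITION = re.compile(r'(?<!\S)(\S+)\s+\1(?!\S)')  # "मैं मैं": whole word repeated
FILLED_PAUSES = {'अ', 'उम'}                                # fillers via simple membership

text = "मैं मैं घर जा रहा हूं"
print(bool(WORD_REPETITION.search(text)))   # True
print(bool(CHAR_REPETITION.search("ममम")))  # True
print('उम' in FILLED_PAUSES)                # True
```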
---

## Common Use Cases

### Case 1: Clinical Assessment

```python
# Analyze patient's attempt at target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']       # stutters per minute
duration = result['total_stutter_duration']

# Generate report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")
```

### Case 2: Speech Therapy Progress

```python
# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - followup['severity_score']
print(f"Improvement: {improvement:.1f} points")
```

### Case 3: Research Analysis

```python
# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")
```

---

## Configuration

### Adjust Detection Sensitivity

Edit thresholds in `detect_stuttering.py`:

```python
# More sensitive (catches more, may have false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85
```
---

## Troubleshooting

### Issue: "mismatch_percentage still 0"

**Solution**: Make sure you're passing the `proper_transcript` parameter:

```python
result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)
```

### Issue: "Slow processing"

**Solutions**:
- Reduce audio length (split into chunks)
- Disable acoustic analysis (comment out lines ~700-710)
- Use CPU instead of GPU for short files

### Issue: "Low confidence scores"

**Check**:
- Audio quality (16kHz recommended)
- Background noise
- Speaker clarity
- Language match (set `language='hindi'`)
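To verify the first item on that checklist, a recording's sample rate can be read with the standard-library `wave` module. The helper below is a hypothetical convenience, not part of the detector:

```python
import wave

def sample_rate(path: str) -> int:
    """Return the sample rate (Hz) of a PCM .wav file. Hypothetical helper."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# If the rate isn't 16000, resample on load, e.g.:
#   y, sr = librosa.load(path, sr=16000)  # librosa resamples to 16 kHz
```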
### Issue: "HF_TOKEN error"

**Solution**:

```bash
export HF_TOKEN="your_token_here"
# Get token from: https://huggingface.co/settings/tokens
```
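To fail fast from Python instead of hitting the error mid-run, a small guard like this can check the variable before models load; `require_hf_token` is a hypothetical helper, not part of the detector:

```python
import os

def require_hf_token() -> str:
    """Return HF_TOKEN, or raise a clear error if it is missing. Hypothetical helper."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run `export HF_TOKEN=...` first")
    return token
```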
---

## Testing

### Run Test Suite

```bash
cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py
```

### Expected Output

```
🔤 DEVANAGARI PHONETIC GROUPS
Consonants: velar, palatal, retroflex, dental, labial...
Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
Test Case 1: Original Issue
Actual: 'है लो'
Target: 'लोहै'
Mismatch %: 67% ✅
```

---

## Performance Tips

### 1. Reuse Detector Instance

```python
# Good: Load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: Reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)
```

### 2. Batch Processing

```python
results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue
```
### 3. Parallel Processing

```python
from multiprocessing import Pool

# Load models once per worker process via an initializer,
# rather than once per file (see tip 1)
_detector = None

def _init_worker():
    global _detector
    _detector = AdvancedStutterDetector()

def analyze_file(args):
    audio_file, target = args
    return _detector.analyze_audio(audio_file, target)

with Pool(4, initializer=_init_worker) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])
```
---

## API Reference

### Main Method

```python
analyze_audio(
    audio_path: str,              # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'       # Language code
) -> dict
```

### Utility Methods

```python
# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict
```

---

## Documentation Files

| File | Purpose |
|------|---------|
| `ADVANCED_FEATURES.md` | Detailed technical documentation |
| `IMPLEMENTATION_SUMMARY.md` | Implementation overview |
| `VERSION_COMPARISON.md` | Compare with other versions |
| `QUICK_START.md` | This file |
| `test_advanced_features.py` | Test suite |

---

## Support

**Issues?**
1. Check logs for debug info
2. Review the `debug` section in the output
3. Test with known-good audio
4. Verify HF_TOKEN is set

**Questions?**
- Review `ADVANCED_FEATURES.md` for details
- Check `VERSION_COMPARISON.md` for differences
- Run the test suite to verify setup

---

## Summary

✅ **Fixed**: Transcript comparison now works correctly
✅ **Added**: Phonetic-aware Hindi analysis
✅ **Added**: Acoustic similarity matching
✅ **Added**: Multi-modal event detection
✅ **Result**: Accurate stutter detection for Hindi speech

**Before**: 0% mismatch (broken)
**After**: 67% mismatch (correct!)

🎉 **You're ready to use advanced stutter detection!**