# 🚀 Quick Start Guide - Advanced Stutter Detection

## TL;DR - What Changed?

**Before**: System returned `mismatch_percentage: 0` even when transcripts were completely different ❌
**After**: System now correctly detects mismatches using multi-modal analysis ✅

---

## Installation & Setup

### 1. Requirements

```bash
pip install librosa torch transformers scipy numpy
```

### 2. Environment Variable

```bash
export HF_TOKEN="your_huggingface_token"
```

### 3. Import

```python
from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector
```

---

## Basic Usage

### Analyze Audio File

```python
# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")
```

### Analyze Without Target (ASR Only)

```python
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Will only detect acoustic stutters and patterns
```

---

## Understanding Output

### Key Metrics

```python
{
    # Transcripts
    'actual_transcript': 'है लो',     # What was actually said
    'target_transcript': 'लोहै',      # What should be said

    # Mismatch Analysis
    'mismatched_chars': ['है', 'लो'], # Segments that don't match
    'mismatch_percentage': 67,       # % of characters mismatched

    # Advanced Metrics
    'edit_distance': 4,              # Operations to transform
    'lcs_ratio': 0.667,              # Similarity via LCS
    'phonetic_similarity': 0.85,     # Sound similarity (0-1)
    'word_accuracy': 0.5,            # Word-level accuracy

    # Stutter Events
    'stutter_timestamps': [          # Detected events
        {
            'type': 'repetition',    # repetition|prolongation|block|dysfluency
            'start': 1.2,            # Start time (seconds)
            'end': 1.8,              # End time (seconds)
            'text': 'मैं',            # Affected text
            'confidence': 0.87,      # Detection confidence
            'phonetic_similarity': 0.85  # Acoustic similarity
        }
    ],

    # Assessment
    'severity': 'moderate',          # none|mild|moderate|severe
    'severity_score': 45.2,          # 0-100 scale
    'confidence_score': 0.87,        # Overall confidence

    # Debug
    'debug': {
        'acoustic_repetitions': 2,   # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2           # Regex pattern matches
    }
}
```

---

## Feature Highlights

### 1. Phonetic Intelligence

```python
# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')
# Returns: 0.2 (different categories)
```

### 2. Acoustic Matching

```python
# Detects repetitions even when ASR transcribes differently
# Example: "ज-ज-जाना" might be transcribed as "जना जना"
# Acoustic analysis catches the sound similarity!
```

### 3. Pattern Detection

```python
# Automatically detects:
# - Character repetitions: "ममम"
# - Word repetitions: "मैं मैं"
# - Prolongations: "आआआ"
# - Filled pauses: "अ", "उम"
```

---

## Common Use Cases

### Case 1: Clinical Assessment

```python
# Analyze patient's attempt at target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']      # stutters per minute
duration = result['total_stutter_duration']

# Generate report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")
```

### Case 2: Speech Therapy Progress

```python
# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - \
    followup['severity_score']
print(f"Improvement: {improvement:.1f} points")
```

### Case 3: Research Analysis

```python
# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")
```

---

## Configuration

### Adjust Detection Sensitivity

Edit thresholds in `detect_stuttering.py`:

```python
# More sensitive (catches more, may have false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85
```

---

## Troubleshooting

### Issue: "mismatch_percentage still 0"

**Solution**: Make sure you're passing the `proper_transcript` parameter:

```python
result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)
```

### Issue: "Slow processing"

**Solutions**:
- Reduce audio length (split into chunks)
- Disable acoustic analysis (comment out lines ~700-710)
- Use CPU instead of GPU for short files

### Issue: "Low confidence scores"

**Check**:
- Audio quality (16kHz recommended)
- Background noise
- Speaker clarity
- Language match (set `language='hindi'`)

### Issue: "HF_TOKEN error"

**Solution**:

```bash
export HF_TOKEN="your_token_here"
# Get token from: https://huggingface.co/settings/tokens
```

---

## Testing

### Run Test Suite

```bash
cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py
```

### Expected Output

```
🔤 DEVANAGARI PHONETIC GROUPS
Consonants: velar, palatal, retroflex, dental, labial...
Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
Test Case 1: Original Issue
Actual: 'है लो'
Target: 'लोहै'
Mismatch %: 67% ✅
```

---

## Performance Tips

### 1. Reuse Detector Instance

```python
# Good: Load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: Reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)
```

### 2. Batch Processing

```python
results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue
```

### 3. Parallel Processing

```python
from multiprocessing import Pool

def analyze_file(args):
    audio_file, target = args
    detector = AdvancedStutterDetector()
    return detector.analyze_audio(audio_file, target)

with Pool(4) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])
```

---

## API Reference

### Main Method

```python
analyze_audio(
    audio_path: str,              # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'       # Language code
) -> dict
```

### Utility Methods

```python
# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict
```

---

## Documentation Files

| File | Purpose |
|------|---------|
| `ADVANCED_FEATURES.md` | Detailed technical documentation |
| `IMPLEMENTATION_SUMMARY.md` | Implementation overview |
| `VERSION_COMPARISON.md` | Compare with other versions |
| `QUICK_START.md` | This file |
| `test_advanced_features.py` | Test suite |

---

## Support

**Issues?**
1. Check logs for debug info
2. Review `debug` section in output
3. Test with known-good audio
4. Verify HF_TOKEN is set

**Questions?**
- Review `ADVANCED_FEATURES.md` for details
- Check `VERSION_COMPARISON.md` for differences
- Run test suite to verify setup

---

## Summary

✅ **Fixed**: Transcript comparison now works correctly
✅ **Added**: Phonetic-aware Hindi analysis
✅ **Added**: Acoustic similarity matching
✅ **Added**: Multi-modal event detection
✅ **Result**: Accurate stutter detection for Hindi speech

**Before**: 0% mismatch (broken)
**After**: 67% mismatch (correct!)

🎉 **You're ready to use advanced stutter detection!**
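
---

## Appendix: Metric Sketches

The `edit_distance` and `lcs_ratio` fields in the output dict can be sketched in a few lines of plain Python. These helpers are an illustration only, not the detector's implementation: they work at the Unicode codepoint level, while the detector may segment Devanagari graphemes (e.g. treating `मैं` as one unit) differently, so its reported numbers will not necessarily match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions,
    and substitutions to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def lcs_ratio(a: str, b: str) -> float:
    """Longest-common-subsequence length divided by the
    longer string's length, giving a 0-1 similarity score."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else \
                max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / max(len(a), len(b), 1)

print(edit_distance("kitten", "sitting"))        # → 3
print(round(lcs_ratio("kitten", "sitting"), 3))  # → 0.571
```

A low `lcs_ratio` combined with a high `edit_distance` is what drives a high `mismatch_percentage`, which is why passing `proper_transcript` is essential for the comparison metrics to be meaningful.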