
🚀 Quick Start Guide - Advanced Stutter Detection

TL;DR - What Changed?

Before: System returned mismatch_percentage: 0 even when transcripts were completely different ❌
After: System now correctly detects mismatches using multi-modal analysis ✅


Installation & Setup

1. Requirements

pip install librosa torch transformers scipy numpy

2. Environment Variable

export HF_TOKEN="your_huggingface_token"

3. Import

from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector

Basic Usage

Analyze Audio File

# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")

Analyze Without Target (ASR Only)

result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Detects only acoustic stutters and text patterns; mismatch metrics require a target transcript

Understanding Output

Key Metrics

{
    # Transcripts
    'actual_transcript': 'है लो',        # What was actually said
    'target_transcript': 'लोहै',         # What should have been said
    
    # Mismatch Analysis
    'mismatched_chars': ['है', 'लो'],    # Segments that don't match
    'mismatch_percentage': 67,            # % of characters mismatched
    
    # Advanced Metrics
    'edit_distance': 4,                   # Edit operations to turn actual into target
    'lcs_ratio': 0.667,                   # Similarity via longest common subsequence
    'phonetic_similarity': 0.85,          # Sound similarity (0-1)
    'word_accuracy': 0.5,                 # Word-level accuracy (0-1)
    
    # Stutter Events
    'stutter_timestamps': [               # Detected events
        {
            'type': 'repetition',         # repetition|prolongation|block|dysfluency
            'start': 1.2,                 # Start time (seconds)
            'end': 1.8,                   # End time (seconds)
            'text': 'मैं',                # Affected text
            'confidence': 0.87,           # Detection confidence
            'phonetic_similarity': 0.85   # Acoustic similarity
        }
    ],
    
    # Assessment
    'severity': 'moderate',               # none|mild|moderate|severe
    'severity_score': 45.2,               # 0-100 scale
    'confidence_score': 0.87,             # Overall confidence
    
    # Debug
    'debug': {
        'acoustic_repetitions': 2,        # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2                # Regex pattern matches
    }
}
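The mismatch metrics above can be sketched in plain Python. This is an illustrative reimplementation only, assuming standard Levenshtein and LCS definitions; the real logic lives in detect_stuttering.py and may differ:

```python
# Illustrative sketch of the mismatch metrics above (not the library's
# internals): edit_distance, lcs_ratio, and word_accuracy.
from difflib import SequenceMatcher

def edit_distance(actual: str, target: str) -> int:
    # Levenshtein distance via dynamic programming.
    prev = list(range(len(target) + 1))
    for i, a in enumerate(actual, 1):
        cur = [i]
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != t)))  # substitution
        prev = cur
    return prev[-1]

def lcs_ratio(actual: str, target: str) -> float:
    # Approximate LCS length from SequenceMatcher's matching blocks,
    # normalised by the longer string.
    matcher = SequenceMatcher(None, actual, target)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(actual), len(target), 1)

def word_accuracy(actual: str, target: str) -> float:
    # Fraction of target words reproduced at the right position.
    actual_words, target_words = actual.split(), target.split()
    hits = sum(a == t for a, t in zip(actual_words, target_words))
    return hits / max(len(target_words), 1)
```

All three functions return higher-is-better scores except edit_distance, where 0 means the transcripts match exactly.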

Feature Highlights

1. Phonetic Intelligence

# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')  
# Returns: 0.2 (different categories)
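Conceptually, this works by grouping Devanagari characters into articulation classes and scoring same-class pairs higher. The sketch below is a simplified stand-in with hypothetical groupings and the scores quoted above; the real tables are in detect_stuttering.py:

```python
# Simplified sketch of group-based phonetic scoring. The groupings and
# scores here are illustrative, mirroring the examples above.
VELAR = {"क", "ख", "ग", "घ", "ङ"}     # velar plosives/nasal
DENTAL = {"त", "थ", "द", "ध", "न"}    # dental plosives/nasal
GROUPS = [VELAR, DENTAL]

def phonetic_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0   # identical characters
    for group in GROUPS:
        if a in group and b in group:
            return 0.85  # same articulation class, e.g. क vs ख
    return 0.2  # different categories
```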

2. Acoustic Matching

# Detects repetitions even when ASR transcribes differently
# Example: "ज-ज-जाना" might be transcribed as "जना जना"
# Acoustic analysis catches the sound similarity!

3. Pattern Detection

# Automatically detects:
# - Character repetitions: "ममम"
# - Word repetitions: "मैं मैं"
# - Prolongations: "आआआ"
# - Filled pauses: "अ", "उम"
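The patterns above can be approximated with simple regexes. This is a hypothetical sketch (the function name and exact patterns are illustrative; the actual regexes in detect_stuttering.py may differ):

```python
import re

# Hypothetical sketch of the text-pattern pass described above.
def find_text_patterns(text: str) -> dict:
    return {
        # Same character repeated 3+ times, e.g. "ममम", "आआआ"
        "char_repetitions": re.findall(r"(.)\1{2,}", text),
        # Same word repeated back-to-back, e.g. "मैं मैं"
        "word_repetitions": re.findall(r"(\S+)(?:\s+\1)+", text),
        # Standalone filled pauses, e.g. "अ", "उम"
        "filled_pauses": re.findall(r"(?:^|\s)(अ|उम)(?=\s|$)", text),
    }
```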

Common Use Cases

Case 1: Clinical Assessment

# Analyze patient's attempt at target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']  # stutters per minute
duration = result['total_stutter_duration']

# Generate report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")

Case 2: Speech Therapy Progress

# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - followup['severity_score']
print(f"Improvement: {improvement:.1f} points")

Case 3: Research Analysis

# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")

Configuration

Adjust Detection Sensitivity

Edit thresholds in detect_stuttering.py:

# More sensitive (catches more, may have false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85

Troubleshooting

Issue: "mismatch_percentage still 0"

Solution: Make sure you're passing the proper_transcript parameter:

result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)

Issue: "Slow processing"

Solutions:

  • Reduce audio length (split into chunks)

  • Disable acoustic analysis (comment out lines ~700-710)

  • Use CPU instead of GPU for short files

Issue: "Low confidence scores"

Check:

  • Audio quality (16kHz recommended)
  • Background noise
  • Speaker clarity
  • Language match (set language='hindi')

Issue: "HF_TOKEN error"

Solution:

export HF_TOKEN="your_token_here"
# Get token from: https://huggingface.co/settings/tokens

Testing

Run Test Suite

cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py

Expected Output

🔤 DEVANAGARI PHONETIC GROUPS
  Consonants: velar, palatal, retroflex, dental, labial...
  Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
  Test Case 1: Original Issue
    Actual:  'है लो'
    Target:  'लोहै'
    Mismatch %: 67% ✅

Performance Tips

1. Reuse Detector Instance

# Good: Load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: Reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)

2. Batch Processing

results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue

3. Parallel Processing

from multiprocessing import Pool

from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector

_detector = None

def _init_worker():
    # Load models once per worker process (see Tip 1), not once per file
    global _detector
    _detector = AdvancedStutterDetector()

def analyze_file(args):
    audio_file, target = args
    return _detector.analyze_audio(audio_file, proper_transcript=target)

with Pool(4, initializer=_init_worker) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])

API Reference

Main Method

analyze_audio(
    audio_path: str,           # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'    # Language code
) -> dict

Utility Methods

# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict

Documentation Files

File                        Purpose
ADVANCED_FEATURES.md        Detailed technical documentation
IMPLEMENTATION_SUMMARY.md   Implementation overview
VERSION_COMPARISON.md       Comparison with other versions
QUICK_START.md              This file
test_advanced_features.py   Test suite

Support

Issues?

  1. Check logs for debug info
  2. Review debug section in output
  3. Test with known-good audio
  4. Verify HF_TOKEN is set

Questions?

  • Review ADVANCED_FEATURES.md for details
  • Check VERSION_COMPARISON.md for differences
  • Run test suite to verify setup

Summary

Fixed: Transcript comparison now works correctly
Added: Phonetic-aware Hindi analysis
Added: Acoustic similarity matching
Added: Multi-modal event detection
Result: Accurate stutter detection for Hindi speech

Before: 0% mismatch (broken)
After: 67% mismatch (correct!)

🎉 You're ready to use advanced stutter detection!