# 🚀 Quick Start Guide - Advanced Stutter Detection

## TL;DR - What Changed?

**Before**: System returned `mismatch_percentage: 0` even when transcripts were completely different ❌
**After**: System now correctly detects mismatches using multi-modal analysis ✅

---

## Installation & Setup

### 1. Requirements

```bash
pip install librosa torch transformers scipy numpy
```

### 2. Environment Variable

```bash
export HF_TOKEN="your_huggingface_token"
```

### 3. Import

```python
from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector
```

---

## Basic Usage

### Analyze Audio File

```python
# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")
```

### Analyze Without Target (ASR Only)

```python
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Will only detect acoustic stutters and patterns
```

---

## Understanding Output

### Key Metrics

```python
{
    # Transcripts
    'actual_transcript': 'है लो',     # What was actually said
    'target_transcript': 'लोहै',      # What should be said

    # Mismatch Analysis
    'mismatched_chars': ['है', 'लो'], # Segments that don't match
    'mismatch_percentage': 67,       # % of characters mismatched

    # Advanced Metrics
    'edit_distance': 4,              # Operations to transform
    'lcs_ratio': 0.667,              # Similarity via LCS
    'phonetic_similarity': 0.85,     # Sound similarity (0-1)
    'word_accuracy': 0.5,            # Word-level accuracy

    # Stutter Events
    'stutter_timestamps': [          # Detected events
        {
            'type': 'repetition',    # repetition|prolongation|block|dysfluency
            'start': 1.2,            # Start time (seconds)
            'end': 1.8,              # End time (seconds)
            'text': 'मैं',            # Affected text
            'confidence': 0.87,      # Detection confidence
            'phonetic_similarity': 0.85  # Acoustic similarity
        }
    ],

    # Assessment
    'severity': 'moderate',          # none|mild|moderate|severe
    'severity_score': 45.2,          # 0-100 scale
    'confidence_score': 0.87,        # Overall confidence

    # Debug
    'debug': {
        'acoustic_repetitions': 2,   # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2           # Regex pattern matches
    }
}
```

---

## Feature Highlights

### 1. Phonetic Intelligence

```python
# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')
# Returns: 0.2 (different categories)
```

### 2. Acoustic Matching

```python
# Detects repetitions even when ASR transcribes differently
# Example: "ज-ज-जाना" might be transcribed as "जना जना"
# Acoustic analysis catches the sound similarity!
```

### 3. Pattern Detection

```python
# Automatically detects:
# - Character repetitions: "ममम"
# - Word repetitions: "मैं मैं"
# - Prolongations: "आआआ"
# - Filled pauses: "अ", "उम"
```

---

## Common Use Cases

### Case 1: Clinical Assessment

```python
# Analyze patient's attempt at target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']      # stutters per minute
duration = result['total_stutter_duration']

# Generate report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")
```

### Case 2: Speech Therapy Progress

```python
# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - \
    followup['severity_score']
print(f"Improvement: {improvement:.1f} points")
```

### Case 3: Research Analysis

```python
# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")
```

---

## Configuration

### Adjust Detection Sensitivity

Edit thresholds in `detect_stuttering.py`:

```python
# More sensitive (catches more, may have false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85
```

---

## Troubleshooting

### Issue: "mismatch_percentage still 0"

**Solution**: Make sure you're passing the `proper_transcript` parameter:

```python
result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)
```

### Issue: "Slow processing"

**Solutions**:
- Reduce audio length (split into chunks)
- Disable acoustic analysis (comment out lines ~700-710)
- Use CPU instead of GPU for short files

### Issue: "Low confidence scores"

**Check**:
- Audio quality (16kHz recommended)
- Background noise
- Speaker clarity
- Language match (set `language='hindi'`)

### Issue: "HF_TOKEN error"

**Solution**:

```bash
export HF_TOKEN="your_token_here"
# Get token from: https://huggingface.co/settings/tokens
```

---

## Testing

### Run Test Suite

```bash
cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py
```

### Expected Output

```
🔤 DEVANAGARI PHONETIC GROUPS
Consonants: velar, palatal, retroflex, dental, labial...
Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
Test Case 1: Original Issue
Actual: 'है लो'
Target: 'लोहै'
Mismatch %: 67% ✅
```

---

## Performance Tips

### 1. Reuse Detector Instance

```python
# Good: Load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: Reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)
```

### 2. Batch Processing

```python
results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue
```

### 3. Parallel Processing

```python
from multiprocessing import Pool

def analyze_file(args):
    audio_file, target = args
    detector = AdvancedStutterDetector()
    return detector.analyze_audio(audio_file, target)

with Pool(4) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])
```

---

## API Reference

### Main Method

```python
analyze_audio(
    audio_path: str,              # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'       # Language code
) -> dict
```

### Utility Methods

```python
# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict
```

---

## Documentation Files

| File | Purpose |
|------|---------|
| `ADVANCED_FEATURES.md` | Detailed technical documentation |
| `IMPLEMENTATION_SUMMARY.md` | Implementation overview |
| `VERSION_COMPARISON.md` | Compare with other versions |
| `QUICK_START.md` | This file |
| `test_advanced_features.py` | Test suite |

---

## Support

**Issues?**
1. Check logs for debug info
2. Review `debug` section in output
3. Test with known-good audio
4. Verify HF_TOKEN is set

**Questions?**
- Review `ADVANCED_FEATURES.md` for details
- Check `VERSION_COMPARISON.md` for differences
- Run test suite to verify setup

---

## Summary

✅ **Fixed**: Transcript comparison now works correctly
✅ **Added**: Phonetic-aware Hindi analysis
✅ **Added**: Acoustic similarity matching
✅ **Added**: Multi-modal event detection
✅ **Result**: Accurate stutter detection for Hindi speech

**Before**: 0% mismatch (broken)
**After**: 67% mismatch (correct!)

🎉 **You're ready to use advanced stutter detection!**
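
---

## Appendix: Metric Sketches

The `edit_distance` and `lcs_ratio` fields in the output dict can be sketched in a few lines of plain Python. These helpers are an illustration only, not the detector's implementation: they work at the Unicode codepoint level, while the detector may segment Devanagari graphemes (e.g. treating `मैं` as one unit) differently, so its reported numbers will not necessarily match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions,
    and substitutions to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def lcs_ratio(a: str, b: str) -> float:
    """Longest-common-subsequence length divided by the
    longer string's length, giving a 0-1 similarity score."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else \
                max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / max(len(a), len(b), 1)

print(edit_distance("kitten", "sitting"))        # → 3
print(round(lcs_ratio("kitten", "sitting"), 3))  # → 0.571
```

A low `lcs_ratio` combined with a high `edit_distance` is what drives a high `mismatch_percentage`, which is why passing `proper_transcript` is essential for the comparison metrics to be meaningful.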