# 🚀 Quick Start Guide - Advanced Stutter Detection

## TL;DR - What Changed?

**Before**: System returned `mismatch_percentage: 0` even when transcripts were completely different ❌

**After**: System now correctly detects mismatches using multi-modal analysis ✅

---

## Installation & Setup

### 1. Requirements

```bash
pip install librosa torch transformers scipy numpy
```

### 2. Environment Variable

```bash
export HF_TOKEN="your_huggingface_token"
```

### 3. Import

```python
from diagnosis.ai_engine.detect_stuttering import AdvancedStutterDetector
```

---

## Basic Usage

### Analyze Audio File

```python
# Initialize detector (loads models once)
detector = AdvancedStutterDetector()

# Analyze with target transcript
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="मैं घर जा रहा हूं",
    language='hindi'
)

# Access results
print(f"Mismatch: {result['mismatch_percentage']}%")
print(f"Severity: {result['severity']}")
print(f"Confidence: {result['confidence_score']}")
```

### Analyze Without Target (ASR Only)

```python
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    language='hindi'
)
# Will only detect acoustic stutters and patterns
```
---

## Understanding Output

### Key Metrics

```python
{
    # Transcripts
    'actual_transcript': 'है लो',       # What was actually said
    'target_transcript': 'लोहै',        # What should be said

    # Mismatch Analysis
    'mismatched_chars': ['है', 'लो'],   # Segments that don't match
    'mismatch_percentage': 67,          # % of characters mismatched

    # Advanced Metrics
    'edit_distance': 4,                 # Operations to transform
    'lcs_ratio': 0.667,                 # Similarity via LCS
    'phonetic_similarity': 0.85,        # Sound similarity (0-1)
    'word_accuracy': 0.5,               # Word-level accuracy

    # Stutter Events
    'stutter_timestamps': [             # Detected events
        {
            'type': 'repetition',       # repetition|prolongation|block|dysfluency
            'start': 1.2,               # Start time (seconds)
            'end': 1.8,                 # End time (seconds)
            'text': 'मैं',              # Affected text
            'confidence': 0.87,         # Detection confidence
            'phonetic_similarity': 0.85 # Acoustic similarity
        }
    ],

    # Assessment
    'severity': 'moderate',             # none|mild|moderate|severe
    'severity_score': 45.2,             # 0-100 scale
    'confidence_score': 0.87,           # Overall confidence

    # Debug
    'debug': {
        'acoustic_repetitions': 2,      # Sound-based detections
        'acoustic_prolongations': 1,
        'text_patterns': 2              # Regex pattern matches
    }
}
```
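As a rough sketch of how a character-level mismatch percentage can be computed, here is a minimal example using Python's standard `difflib`. The helper name and formula are illustrative assumptions; the detector's actual `_compare_transcripts_comprehensive` may use different weighting, so its numbers will not match this sketch exactly:

```python
from difflib import SequenceMatcher

def mismatch_percentage(actual: str, target: str) -> float:
    """Character-level mismatch as 100 * (1 - similarity ratio). Hypothetical helper."""
    ratio = SequenceMatcher(None, actual, target).ratio()
    return round((1 - ratio) * 100, 1)

print(mismatch_percentage('है लो', 'लोहै'))  # nonzero, unlike the old broken behavior
print(mismatch_percentage('मैं', 'मैं'))     # 0.0 for identical strings
```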
---

## Feature Highlights

### 1. Phonetic Intelligence

```python
# The system understands that क and ख are similar
detector._calculate_phonetic_similarity('क', 'ख')
# Returns: 0.85 (both velar plosives)

detector._calculate_phonetic_similarity('क', 'अ')
# Returns: 0.2 (different categories)
```

### 2. Acoustic Matching

```python
# Detects repetitions even when ASR transcribes differently.
# Example: "ज-ज-जाना" might be transcribed as "जना जना";
# acoustic analysis catches the sound similarity!
```
### 3. Pattern Detection

```python
# Automatically detects:
# - Character repetitions: "ममम"
# - Word repetitions: "मैं मैं"
# - Prolongations: "आआआ"
# - Filled pauses: "अ", "उम"
```
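The patterns above can be sketched with regular expressions. The ones below are illustrative assumptions only, not the detector's actual patterns (those live in `detect_stuttering.py`):

```python
import re

# Illustrative patterns, not the detector's real ones
CHAR_REPETITION = re.compile(r'(.)\1{2,}')                # "ममम", "आआआ": same char 3+ times
WORD_REPETITION = re.compile(r'(?<!\S)(\S+)\s+\1(?!\S)')  # "मैं मैं": whole word repeated
FILLED_PAUSES = {'अ', 'उम'}                                # fillers via simple membership

text = "मैं मैं घर जा रहा हूं"
print(bool(WORD_REPETITION.search(text)))   # True
print(bool(CHAR_REPETITION.search("ममम")))  # True
print('उम' in FILLED_PAUSES)                # True
```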
---

## Common Use Cases

### Case 1: Clinical Assessment

```python
# Analyze patient's attempt at target phrase
result = detector.analyze_audio(
    audio_path="patient_recording.wav",
    proper_transcript="मैं अपना नाम बता रहा हूं",
    language='hindi'
)

# Extract clinical metrics
severity = result['severity']
frequency = result['stutter_frequency']       # stutters per minute
duration = result['total_stutter_duration']

# Generate report
print(f"Severity: {severity}")
print(f"Frequency: {frequency:.1f} stutters/min")
print(f"Duration: {duration:.1f}s total")
```

### Case 2: Speech Therapy Progress

```python
# Compare recordings over time
baseline = detector.analyze_audio("session_1.wav", target)
followup = detector.analyze_audio("session_10.wav", target)

improvement = baseline['severity_score'] - followup['severity_score']
print(f"Improvement: {improvement:.1f} points")
```

### Case 3: Research Analysis

```python
# Detailed acoustic analysis
result = detector.analyze_audio(audio_path, target)

# Extract acoustic features
for event in result['stutter_timestamps']:
    if event['type'] == 'repetition':
        acoustic = event.get('acoustic_features', {})
        dtw = acoustic.get('dtw_similarity', 0)
        spec = acoustic.get('spectral_correlation', 0)
        print(f"DTW: {dtw:.2f}, Spectral: {spec:.2f}")
```

---

## Configuration

### Adjust Detection Sensitivity

Edit thresholds in `detect_stuttering.py`:

```python
# More sensitive (catches more, may have false positives)
PROLONGATION_CORRELATION_THRESHOLD = 0.85  # Default: 0.90
ACOUSTIC_SIMILARITY_THRESHOLD = 0.70       # Default: 0.75

# Less sensitive (fewer false positives, may miss some)
PROLONGATION_CORRELATION_THRESHOLD = 0.95
ACOUSTIC_SIMILARITY_THRESHOLD = 0.85
```
---

## Troubleshooting

### Issue: "mismatch_percentage still 0"

**Solution**: Make sure you're passing the `proper_transcript` parameter:

```python
result = detector.analyze_audio(
    audio_path="file.wav",
    proper_transcript="target text",  # ← Don't forget this!
)
```

### Issue: "Slow processing"

**Solutions**:
- Reduce audio length (split into chunks)
- Disable acoustic analysis (comment out lines ~700-710)
- Use CPU instead of GPU for short files

### Issue: "Low confidence scores"

**Check**:
- Audio quality (16kHz recommended)
- Background noise
- Speaker clarity
- Language match (set `language='hindi'`)
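To verify the first item on that checklist, a recording's sample rate can be read with the standard-library `wave` module. The helper below is a hypothetical convenience, not part of the detector:

```python
import wave

def sample_rate(path: str) -> int:
    """Return the sample rate (Hz) of a PCM .wav file. Hypothetical helper."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# If the rate isn't 16000, resample on load, e.g.:
#   y, sr = librosa.load(path, sr=16000)  # librosa resamples to 16 kHz
```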
### Issue: "HF_TOKEN error"

**Solution**:

```bash
export HF_TOKEN="your_token_here"
# Get token from: https://huggingface.co/settings/tokens
```
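To fail fast from Python instead of hitting the error mid-run, a small guard like this can check the variable before models load; `require_hf_token` is a hypothetical helper, not part of the detector:

```python
import os

def require_hf_token() -> str:
    """Return HF_TOKEN, or raise a clear error if it is missing. Hypothetical helper."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run `export HF_TOKEN=...` first")
    return token
```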
---

## Testing

### Run Test Suite

```bash
cd /path/to/zlaqa-version-b-ai-enginee
python test_advanced_features.py
```

### Expected Output

```
🔤 DEVANAGARI PHONETIC GROUPS
Consonants: velar, palatal, retroflex, dental, labial...
Vowels: short, long, diphthongs

🧪 TESTING ADVANCED TRANSCRIPT COMPARISON
Test Case 1: Original Issue
Actual: 'है लो'
Target: 'लोहै'
Mismatch %: 67% ✅
```

---

## Performance Tips

### 1. Reuse Detector Instance

```python
# Good: Load models once
detector = AdvancedStutterDetector()
for audio_file in audio_files:
    result = detector.analyze_audio(audio_file)

# Bad: Reloads models every time
for audio_file in audio_files:
    detector = AdvancedStutterDetector()  # ❌ Slow!
    result = detector.analyze_audio(audio_file)
```

### 2. Batch Processing

```python
results = []
for audio_file in audio_files:
    try:
        result = detector.analyze_audio(audio_file, target)
        results.append(result)
    except Exception as e:
        print(f"Failed: {audio_file} - {e}")
        continue
```
### 3. Parallel Processing

```python
from multiprocessing import Pool

# Load models once per worker process via an initializer,
# rather than once per file (see tip 1)
_detector = None

def _init_worker():
    global _detector
    _detector = AdvancedStutterDetector()

def analyze_file(args):
    audio_file, target = args
    return _detector.analyze_audio(audio_file, target)

with Pool(4, initializer=_init_worker) as pool:
    results = pool.map(analyze_file, [(f, target) for f in files])
```
---

## API Reference

### Main Method

```python
analyze_audio(
    audio_path: str,              # Path to .wav file
    proper_transcript: str = "",  # Expected transcript (optional)
    language: str = 'hindi'       # Language code
) -> dict
```

### Utility Methods

```python
# Phonetic similarity (0-1)
_calculate_phonetic_similarity(char1: str, char2: str) -> float

# Comprehensive comparison
_compare_transcripts_comprehensive(actual: str, target: str) -> dict

# Acoustic similarity
_compare_audio_segments_acoustic(seg1: np.ndarray, seg2: np.ndarray) -> dict
```

---

## Documentation Files

| File | Purpose |
|------|---------|
| `ADVANCED_FEATURES.md` | Detailed technical documentation |
| `IMPLEMENTATION_SUMMARY.md` | Implementation overview |
| `VERSION_COMPARISON.md` | Compare with other versions |
| `QUICK_START.md` | This file |
| `test_advanced_features.py` | Test suite |

---

## Support

**Issues?**
1. Check logs for debug info
2. Review the `debug` section in the output
3. Test with known-good audio
4. Verify HF_TOKEN is set

**Questions?**
- Review `ADVANCED_FEATURES.md` for details
- Check `VERSION_COMPARISON.md` for differences
- Run the test suite to verify setup

---

## Summary

✅ **Fixed**: Transcript comparison now works correctly
✅ **Added**: Phonetic-aware Hindi analysis
✅ **Added**: Acoustic similarity matching
✅ **Added**: Multi-modal event detection
✅ **Result**: Accurate stutter detection for Hindi speech

**Before**: 0% mismatch (broken)
**After**: 67% mismatch (correct!)

🎉 **You're ready to use advanced stutter detection!**