# Transcript Debugging Guide
## Issue: Empty Transcripts ("No transcript available")
## Complete Flow Analysis
### 1. Django App → API Request (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)
**Location:** Lines 269-274
```python
response = requests.post(
    self.api_url,
    files=files,
    data={
        # Send an empty string rather than None so the form field is always present
        "transcript": proper_transcript if proper_transcript else "",
        "language": lang_code,
    },
    timeout=self.api_timeout,
)
```
**Status:** ✅ Sending transcript parameter correctly
---
### 2. API Receives Request (`slaq-version-c-ai-enginee/app.py`)
**Location:** Lines 70-73
```python
@app.post("/analyze")
async def analyze_audio(
    audio: UploadFile = File(...),
    transcript: str = Form(""),  # ✅ Fixed: Now uses Form() for multipart
):
    ...  # handler body omitted
```
**Status:** ✅ Fixed - Now correctly receives transcript via Form()
---
### 3. API Calls Model (`slaq-version-c-ai-enginee/app.py`)
**Location:** Line 106
```python
result = detector.analyze_audio(temp_file, transcript)
```
**Status:** ✅ Passing transcript correctly
---
### 4. Model Transcribes Audio (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)
**Location:** Lines 313-369 (`_transcribe_with_timestamps`)
**Potential Issues:**
- ❓ `processor.batch_decode()` may not decode IndicWav2Vec output correctly
- ❓ The tokenizer may need to be called directly
- ❓ The model may not be producing valid predictions
**Status:** ⚠️ **LIKELY ISSUE HERE** - The decoding method may be incorrect
---
### 5. Model Returns Result (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)
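**Location:** Lines 787-794
```python
# Transcribed text, or "" if transcription produced nothing
actual_transcript = transcript if transcript else ""
# Prefer the caller-supplied reference; fall back to the transcribed text
target_transcript = proper_transcript if proper_transcript else transcript if transcript else ""
return {
    'actual_transcript': actual_transcript,
    'target_transcript': target_transcript,
    # ... other result fields
}
```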
**Status:** ✅ Returns transcripts correctly (if transcript is not empty)
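
The fallback order in this step decides which symptom you see downstream, so it can help to restate it as a small standalone function. The sketch below mirrors the two assignments above; `resolve_transcripts` is a hypothetical name used only for illustration and is not part of the codebase.

```python
def resolve_transcripts(transcript, proper_transcript):
    """Mirror the fallback logic: returns (actual, target).

    `transcript` is what the model transcribed; `proper_transcript` is the
    reference text supplied by the caller. Either may be None or "".
    """
    actual = transcript or ""
    # Prefer the supplied reference, then the transcribed text, then ""
    target = proper_transcript or transcript or ""
    return actual, target


# Transcription failed but a reference was supplied: only `actual` is empty
print(resolve_transcripts("", "reference text"))
# Transcription failed and no reference was supplied: both are empty
print(resolve_transcripts("", ""))
```

An empty `actual_transcript` next to a populated `target_transcript` therefore points at step 4 (transcription), while two empty fields mean the reference text was also missing from the request.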
---
### 6. API Returns Response (`slaq-version-c-ai-enginee/app.py`)
**Location:** Lines 109-113
```python
actual = result.get('actual_transcript', '')
target = result.get('target_transcript', '')
logger.info(f"📝 Result transcripts - Actual: '{actual[:100]}' (len: {len(actual)}), Target: '{target[:100]}' (len: {len(target)})")
return result
```
**Status:** ✅ Returns JSON with transcripts
---
### 7. Django Receives Response (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)
**Location:** Lines 279-410
```python
result = response.json()
# ... formatting ...
actual_transcript = str(api_result.get('actual_transcript', '')).strip()
target_transcript = str(api_result.get('target_transcript', '')).strip()
```
**Status:** ✅ Extracts transcripts correctly
---
### 8. Django Saves to Database (`slaq-version-c/diagnosis/tasks.py`)
**Location:** Lines 141-142
```python
actual_transcript=actual_transcript,
target_transcript=target_transcript,
```
**Status:** ✅ Saves correctly
---
## Root Cause Analysis
### Most Likely Issue: Transcription Decoding
The IndicWav2Vec model (`ai4bharat/indicwav2vec-hindi`) may require:
1. **Direct tokenizer access** instead of `processor.batch_decode()`
2. **CTC decoding** with the tokenizer that matches the model's vocabulary
3. **Special handling** for Indic scripts
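
For reference, greedy CTC decoding itself is small enough to write by hand, which is useful when checking whether the predicted IDs contain anything other than blanks. The following is a minimal sketch with a toy vocabulary: `ctc_greedy_decode`, the `vocab` mapping, the blank ID `0`, and the `|` word delimiter are all assumptions for illustration. The real values should be read from the loaded tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(predicted_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decode for one utterance.

    predicted_ids: plain list of ints (call .tolist() on a tensor first).
    id_to_token:   mapping from token ID to token string.
    """
    # 1. Merge runs of identical consecutive IDs into a single ID
    collapsed = [token_id for token_id, _ in groupby(predicted_ids)]
    # 2. Drop the CTC blank token
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Join characters and turn the word delimiter into spaces
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, not the model's real one
vocab = {0: "<pad>", 1: "|", 2: "न", 3: "म"}

print(ctc_greedy_decode([2, 2, 0, 3, 3, 1, 0, 2], vocab))  # → 'नम न'
print(repr(ctc_greedy_decode([0, 0, 0, 0], vocab)))        # → ''
```

If every predicted ID equals the blank ID, the decoded text is empty no matter which decoding method is used, and the problem lies with the model output or the audio rather than with the tokenizer.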
### Fix Applied
Updated `_transcribe_with_timestamps()` to:
1. Try multiple decoding methods
2. Use tokenizer directly if available
3. Add comprehensive error logging
4. Log predicted IDs for debugging
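
The fallback strategy listed above could look roughly like the following. This is a sketch under stated assumptions, not the code in `_transcribe_with_timestamps()`: `decode_with_fallback` is a hypothetical helper, `processor` is assumed to expose `batch_decode` and a `tokenizer` attribute (as `Wav2Vec2Processor` does), and `predicted_ids` is assumed to be the batch of argmax IDs taken from the model's logits.

```python
import logging

logger = logging.getLogger(__name__)


def decode_with_fallback(processor, predicted_ids):
    """Return the first non-empty transcript from several decoding paths."""
    attempts = [
        ("processor.batch_decode",
         lambda: processor.batch_decode(predicted_ids)[0]),
        ("tokenizer.batch_decode",
         lambda: processor.tokenizer.batch_decode(predicted_ids)[0]),
        ("tokenizer.decode",
         lambda: processor.tokenizer.decode(predicted_ids[0])),
    ]
    for name, attempt in attempts:
        try:
            text = (attempt() or "").strip()
        except Exception as exc:  # keep going: the next method may still work
            logger.warning("%s failed: %s", name, exc)
            continue
        if text:
            logger.info("Decoded with %s (length: %d)", name, len(text))
            return text
        logger.warning("%s returned an empty string", name)

    # Nothing worked: log the raw IDs so an all-blank prediction is visible
    logger.error("All decoding methods failed; predicted IDs: %s", predicted_ids)
    return ""
```

Trying the methods in a fixed order and logging each failure keeps the log readable: a single run shows which decoding path, if any, produced text.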
---
## Debugging Steps
### 1. Check API Logs
When processing audio, look for:
```
📝 Transcribed text: '...' (length: X)
📝 Final return - Actual: '...' (len: X), Target: '...' (len: Y)
📝 Result transcripts - Actual: '...' (len: X), Target: '...' (len: Y)
```
### 2. Check Django Logs
Look for:
```
📝 Final transcripts - Actual: X chars, Target: Y chars
📝 Saving transcripts - Actual: X chars, Target: Y chars
```
### 3. Check Database
Query the table behind the `AnalysisResult` model:
```sql
SELECT actual_transcript, target_transcript, LENGTH(actual_transcript) as actual_len, LENGTH(target_transcript) as target_len
FROM diagnosis_analysisresult
ORDER BY created_at DESC LIMIT 5;
```
### 4. Test API Directly
```bash
curl -X POST "http://localhost:7860/analyze" \
  -F "audio=@test.wav" \
  -F "transcript=test transcript" \
  -F "language=hin"
```
Check the response JSON for `actual_transcript` and `target_transcript`.
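
When the same check has to be repeated for many test files, a small helper can report which transcript fields came back missing or blank. This is an illustrative sketch; `empty_transcript_fields` is a hypothetical name, and it assumes the response body has already been parsed into a `dict`.

```python
def empty_transcript_fields(response_json,
                            fields=("actual_transcript", "target_transcript")):
    """Return the names of transcript fields that are missing, None, or blank."""
    return [
        name for name in fields
        if not str(response_json.get(name) or "").strip()
    ]


# Example with a hand-written response body
sample = {"actual_transcript": "", "target_transcript": "test transcript"}
print(empty_transcript_fields(sample))  # → ['actual_transcript']
```

An empty list means both fields carry text; a result containing only `actual_transcript` suggests the request reached the API intact and the transcription step is at fault.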
---
## Next Steps
1. **Rebuild the Docker image** with the latest changes
2. **Check the logs** during audio processing
3. **Verify the processor structure** - the logs will show the processor attributes
4. **Test with Hindi audio** - the model is optimized for Hindi
5. **Check that the model loads correctly** - verify that HF_TOKEN is working
---
## Expected Log Output (Success)
```
🚀 Initializing Advanced AI Engine on cpu...
✅ HF_TOKEN found - using authenticated model access
📋 Processor type: <class 'transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor'>
📋 Processor attributes: ['batch_decode', 'decode', 'feature_extractor', 'tokenizer', ...]
📋 Tokenizer type: <class 'transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer'>
📝 Transcribed text: 'नमस्ते मैं हिंदी बोल रहा हूं' (length: 28)
📝 Final return - Actual: 'नमस्ते मैं हिंदी बोल रहा हूं' (len: 28), Target: '...' (len: X)
```
---
## If Still Empty
1. **Model may not be loaded correctly** - check HF_TOKEN
2. **Audio format issue** - ensure the input is a 16 kHz mono WAV file
3. **Model not producing predictions** - check predicted_ids in logs
4. **Tokenizer mismatch** - IndicWav2Vec may need special tokenizer initialization
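
The audio format in item 2 can be checked without loading the model, using Python's standard `wave` module. The sketch below is an assumption-labelled helper, not part of the codebase: `check_wav_format` is a hypothetical name, and it only handles uncompressed PCM WAV files that `wave` can open.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wav.getnchannels() != expected_channels:
            problems.append(
                f"{wav.getnchannels()} channel(s), expected {expected_channels}"
            )
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

Calling `check_wav_format("test.wav")` on the file used in the curl test above returns an empty list when the file is 16 kHz mono, and a list of human-readable mismatches otherwise.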