# Transcript Debugging Guide

## Issue: Empty Transcripts ("No transcript available")

## Complete Flow Analysis

### 1. Django App → API Request (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 269-274

```python
# Send the audio file and the form fields together as multipart/form-data.
response = requests.post(
    self.api_url,
    files=files,
    data={
        "transcript": proper_transcript if proper_transcript else "",
        "language": lang_code,
    },
    timeout=self.api_timeout
)
```

**Status:** ✅ Sends the `transcript` parameter correctly
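
Because this call passes both `files=` and `data=`, `requests` sends a `multipart/form-data` body in which `transcript` and `language` are separate form parts next to the audio file; they are not query parameters. The stdlib-only sketch below builds such a body and parses it back, which is a quick way to confirm what the server actually receives. The helper names are hypothetical, and the framing is simplified relative to what `requests` itself emits.

```python
import uuid
from email.parser import BytesParser
from email.policy import default
from typing import Dict, List, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]
) -> Tuple[bytes, str]:
    """Encode text fields and file uploads as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks: List[bytes] = []
    for name, value in fields.items():
        # Each text field becomes its own part with a Content-Disposition name.
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, data) in files.items():
        # File parts also carry a filename and a content type.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        chunks.append(header + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def decode_form_fields(body: bytes, content_type: str) -> Dict[str, bytes]:
    """Parse a multipart/form-data body into {field name: raw bytes}."""
    raw = f"Content-Type: {content_type}\r\n\r\n".encode("utf-8") + body
    message = BytesParser(policy=default).parsebytes(raw)
    return {
        part.get_param("name", header="content-disposition"): part.get_payload(decode=True)
        for part in message.iter_parts()
    }
```

On the FastAPI side, only parameters declared with `Form()` or `File()` are read from this body; a plain `str` parameter is treated as a query parameter, which is why the change in step 2 was needed.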

---

### 2. API Receives Request (`slaq-version-c-ai-enginee/app.py`)

**Location:** Lines 70-73

```python
from fastapi import File, Form, UploadFile  # imports used by this excerpt

@app.post("/analyze")
async def analyze_audio(
    audio: UploadFile = File(...),
    transcript: str = Form(""),  # ✅ Fixed: declared with Form() so it is read from the multipart body
):
    ...  # handler body omitted
```

**Status:** ✅ Fixed: the endpoint now reads `transcript` from the multipart form via `Form()`

---

### 3. API Calls Model (`slaq-version-c-ai-enginee/app.py`)

**Location:** Line 106

```python
result = detector.analyze_audio(temp_file, transcript)
```

**Status:** ✅ Passes `transcript` correctly

---

### 4. Model Transcribes Audio (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 313-369 (`_transcribe_with_timestamps`)

**Potential Issues:**

- ❓ IndicWav2Vec output might not decode correctly with `processor.batch_decode()`
- ❓ The tokenizer may need to be used directly
- ❓ The model might not be producing valid predictions

**Status:** ⚠️ **LIKELY ISSUE HERE**: the decoding method may be incorrect
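
To tell "the model predicts nothing" apart from "decoding is wrong", it helps to summarize the argmax IDs before they are decoded. A minimal sketch, assuming the greedy predictions for one utterance are already a plain list of ints (for example `torch.argmax(logits, dim=-1)[0].tolist()`) and assuming the CTC blank is the tokenizer's pad token. The default `blank_id=0` is only a placeholder; pass `processor.tokenizer.pad_token_id` instead. The function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_predicted_ids(predicted_ids: List[int], blank_id: int = 0) -> Dict[str, Any]:
    """Summarize greedy CTC predictions for one utterance to spot degenerate output."""
    counts = Counter(predicted_ids)
    total = len(predicted_ids)
    # No frames at all is treated the same as "everything is blank".
    blank_fraction = counts[blank_id] / total if total else 1.0
    return {
        "frames": total,
        "distinct_ids": len(counts),
        "blank_fraction": blank_fraction,
        # If every frame is blank, any CTC decoder will return an empty string.
        "all_blank": blank_fraction == 1.0,
        "top_ids": counts.most_common(3),
    }
```

If `all_blank` is true, every decoding method will return an empty string and the problem lies upstream (audio input or model loading). If many frames are non-blank but the decoded text is still empty, suspect the decoding step.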

---

### 5. Model Returns Result (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 787-794

```python
# actual_transcript is empty whenever transcript is empty.
actual_transcript = transcript if transcript else ""
# target_transcript prefers proper_transcript, then falls back to transcript.
target_transcript = proper_transcript if proper_transcript else transcript if transcript else ""

return {
    'actual_transcript': actual_transcript,
    'target_transcript': target_transcript,
    # ... remaining fields ...
}
```

**Status:** ✅ Returns both transcripts correctly, provided `transcript` is not empty
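
The conditional expressions above are equivalent to `or` chains. A small sketch (the function name `resolve_transcripts` is hypothetical) makes the consequence explicit: when transcription yields an empty string and no reference transcript is supplied, both fields come back empty, which matches the "No transcript available" symptom.

```python
from typing import Dict, Optional


def resolve_transcripts(transcript: Optional[str], proper_transcript: Optional[str]) -> Dict[str, str]:
    """Mirror the fallback logic above using `or` chains."""
    return {
        # Empty or None model output stays empty.
        "actual_transcript": transcript or "",
        # Prefer the reference transcript; otherwise reuse the model output.
        "target_transcript": proper_transcript or transcript or "",
    }
```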

---

### 6. API Returns Response (`slaq-version-c-ai-enginee/app.py`)

**Location:** Lines 109-113

```python
actual = result.get('actual_transcript', '')
target = result.get('target_transcript', '')
# Log a 100-character preview and the full length of each transcript.
logger.info(f"📝 Result transcripts - Actual: '{actual[:100]}' (len: {len(actual)}), Target: '{target[:100]}' (len: {len(target)})")
return result
```

**Status:** ✅ Returns JSON containing both transcripts

---

### 7. Django Receives Response (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 279-410

```python
result = response.json()
# ... formatting ...
actual_transcript = str(api_result.get('actual_transcript', '')).strip()
target_transcript = str(api_result.get('target_transcript', '')).strip()
```

**Status:** ✅ Extracts transcripts correctly
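
One caveat with `str(api_result.get(key, '')).strip()`: the default only applies when the key is missing. If the API sends the key with a JSON `null`, `.get()` returns `None` and `str(None)` produces the literal string `'None'`, which would be saved as a 4-character transcript instead of an empty one. A defensive variant, sketched with a hypothetical helper name:

```python
from typing import Any, Mapping


def extract_transcript(api_result: Mapping[str, Any], key: str) -> str:
    """Return api_result[key] as stripped text, treating a missing key or JSON null as ""."""
    value = api_result.get(key)
    return "" if value is None else str(value).strip()
```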

---

### 8. Django Saves to Database (`slaq-version-c/diagnosis/tasks.py`)

**Location:** Lines 141-142

```python
# Excerpt: keyword arguments used when saving the analysis result.
actual_transcript=actual_transcript,
target_transcript=target_transcript,
```

**Status:** ✅ Saves correctly

---

## Root Cause Analysis

### Most Likely Issue: Transcription Decoding

The IndicWav2Vec model (`ai4bharat/indicwav2vec-hindi`) may require:

1. **Direct tokenizer access** instead of `processor.batch_decode()`
2. **CTC decoding** with the correct tokenizer
3. **Special handling** for Indic scripts
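
For reference, greedy CTC decoding (roughly what `Wav2Vec2CTCTokenizer` does by default when decoding argmax IDs) boils down to four steps: collapse consecutive repeats, drop the blank token, map IDs to characters, and turn the word-delimiter token into spaces. The self-contained sketch below assumes a blank ID of 0 and a `|` delimiter, which match common Wav2Vec2 vocabularies; check both against `processor.tokenizer`. Other special tokens are not handled here, and the function name is hypothetical.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(
    predicted_ids: List[int],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding: collapse repeats, drop blanks, map IDs, delimiters to spaces."""
    # 1. Merge runs of the same ID (a character held over several frames).
    collapsed = [token_id for token_id, _ in groupby(predicted_ids)]
    # 2. Drop the blank; a blank between identical IDs keeps genuine double letters.
    tokens = [id_to_token[token_id] for token_id in collapsed if token_id != blank_id]
    # 3. Join characters and turn the word-delimiter token into spaces.
    text = "".join(tokens).replace(word_delimiter, " ")
    # 4. Normalize whitespace (leading/trailing delimiters, repeated spaces).
    return " ".join(text.split())
```

With a real model, `id_to_token` can be built by inverting `processor.tokenizer.get_vocab()`. If this decoder produces text from the logged predicted IDs while `processor.batch_decode()` returns an empty string, that points at the tokenizer setup rather than the model.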

### Fix Applied

Updated `_transcribe_with_timestamps()` to:

1. Try multiple decoding methods
2. Use the tokenizer directly if available
3. Add comprehensive error logging
4. Log predicted IDs for debugging
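
The updated function itself is not reproduced here. A hedged sketch of the same pattern, not the actual implementation: try an ordered list of decoders, log each failure or empty result, log a prefix of the predicted IDs, and return the first non-empty text. The function name and the shape of the `decoders` list are hypothetical; in practice the candidates might be `processor.batch_decode` and `processor.tokenizer.batch_decode`. According to the Transformers documentation, `Wav2Vec2Processor.batch_decode` forwards to the tokenizer's `batch_decode`, so those two usually agree.

```python
import logging
from typing import Any, Callable, Sequence, Tuple

logger = logging.getLogger(__name__)


def decode_with_fallbacks(
    predicted_ids: Sequence[Any],
    decoders: Sequence[Tuple[str, Callable[[Any], Sequence[str]]]],
) -> str:
    """Return the first non-empty transcription produced by `decoders`, tried in order."""
    # Log a prefix of the first sequence so empty output can be traced back to the model.
    first = list(predicted_ids[0])[:50] if len(predicted_ids) else []
    logger.info("Predicted IDs (first 50 of sequence 0): %s", first)
    for name, decode in decoders:
        try:
            texts = decode(predicted_ids)
        except Exception:
            logger.exception("Decoder %s failed", name)
            continue
        text = texts[0].strip() if texts else ""
        if text:
            logger.info("Decoded with %s: %r (length: %d)", name, text, len(text))
            return text
        logger.warning("Decoder %s returned an empty string", name)
    logger.error("All decoders returned empty output")
    return ""
```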

---

## Debugging Steps

### 1. Check API Logs

When processing audio, look for:

```
📝 Transcribed text: '...' (length: X)
📝 Final return - Actual: '...' (len: X), Target: '...' (len: Y)
📝 Result transcripts - Actual: '...' (len: X), Target: '...' (len: Y)
```

### 2. Check Django Logs

Look for:

```
📝 Final transcripts - Actual: X chars, Target: Y chars
📝 Saving transcripts - Actual: X chars, Target: Y chars
```

The first stage that logs a length of 0 is where the transcript is being lost.
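
Scanning for these lines by hand gets tedious. A small stdlib sketch that pulls every length out of the `📝` lines shown above and reports the stages where a length of 0 appears; the regular expressions assume exactly the message formats listed here, so adjust them if the real log format differs. The function and constant names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Stage name: the text between the 📝 marker and the first "-" or ":".
STAGE_RE = re.compile(r"📝\s*(.+?)\s*(?:-|:)")
# Lengths appear as "(len: N)" or "(length: N)" in API logs and "N chars" in Django logs.
LENGTH_RE = re.compile(r"\(len(?:gth)?: (\d+)\)|(\d+) chars")


def find_empty_transcript_stages(log_lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    """Return (stage, lengths) for every 📝 log line that reports a zero length."""
    problems = []
    for line in log_lines:
        stage = STAGE_RE.search(line)
        if stage is None:
            continue
        # findall yields (paren, chars) pairs; exactly one of them is non-empty.
        lengths = [int(paren or chars) for paren, chars in LENGTH_RE.findall(line)]
        if 0 in lengths:
            problems.append((stage.group(1), lengths))
    return problems
```

For example, `find_empty_transcript_stages(open("api.log", encoding="utf-8"))`, where `api.log` stands in for wherever the logs are written.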

### 3. Check Database

Query the `AnalysisResult` table:

```sql
SELECT actual_transcript,
       target_transcript,
       LENGTH(actual_transcript) AS actual_len,
       LENGTH(target_transcript) AS target_len
FROM diagnosis_analysisresult
ORDER BY created_at DESC
LIMIT 5;
```
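
If the project uses SQLite during development (Django's default `db.sqlite3`; an assumption here), the same check can be scripted with the standard library. The table and the `created_at` column come from the query above; the `id` column is Django's default primary key and is assumed. `COALESCE` makes a `NULL` transcript count as empty. The names below are hypothetical.

```python
import sqlite3
from typing import Any, List, Tuple

# Same table and ordering as the SQL above; NULL transcripts count as length 0.
RECENT_LENGTHS_SQL = """
SELECT id,
       LENGTH(COALESCE(actual_transcript, '')) AS actual_len,
       LENGTH(COALESCE(target_transcript, '')) AS target_len
FROM diagnosis_analysisresult
ORDER BY created_at DESC
LIMIT ?
"""


def recent_empty_transcripts(conn: sqlite3.Connection, limit: int = 5) -> List[Tuple[Any, ...]]:
    """Return (id, actual_len, target_len) for recent rows whose actual transcript is empty."""
    rows = conn.execute(RECENT_LENGTHS_SQL, (limit,)).fetchall()
    return [row for row in rows if row[1] == 0]
```

For example, `recent_empty_transcripts(sqlite3.connect("db.sqlite3"))`, assuming the database file sits at that path.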

### 4. Test API Directly

```bash
curl -X POST "http://localhost:7860/analyze" \
  -F "audio=@test.wav" \
  -F "transcript=test transcript" \
  -F "language=hin"
```

Check the response JSON for `actual_transcript` and `target_transcript`.
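
To make that check repeatable, a small sketch that lists what is wrong with the two transcript fields in the parsed response. The function name is hypothetical; only the two field names come from the API.

```python
from typing import Any, List, Mapping


def transcript_problems(response_json: Mapping[str, Any]) -> List[str]:
    """List missing, non-string, or blank transcript fields in an /analyze response."""
    problems = []
    for key in ("actual_transcript", "target_transcript"):
        if key not in response_json:
            problems.append(f"{key}: missing")
        elif not isinstance(response_json[key], str):
            problems.append(f"{key}: expected a string, got {type(response_json[key]).__name__}")
        elif not response_json[key].strip():
            problems.append(f"{key}: empty")
    return problems
```

For example, `transcript_problems(json.loads(response_text))`; an empty list means both transcripts came back non-empty.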

---

## Next Steps

1. **Rebuild the Docker image** with the latest changes
2. **Check the logs** during audio processing
3. **Verify the processor structure**: the logs show the processor's attributes
4. **Test with Hindi audio**: the model targets Hindi
5. **Check that the model loads correctly**: verify that `HF_TOKEN` is working

---

## Expected Log Output (Success)

```
🚀 Initializing Advanced AI Engine on cpu...
✅ HF_TOKEN found - using authenticated model access
📋 Processor type: <class 'transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor'>
📋 Processor attributes: ['batch_decode', 'decode', 'feature_extractor', 'tokenizer', ...]
📋 Tokenizer type: <class 'transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer'>
📝 Transcribed text: 'नमस्ते मैं हिंदी बोल रहा हूं' (length: 28)
📝 Final return - Actual: 'नमस्ते मैं हिंदी बोल रहा हूं' (len: 28), Target: '...' (len: X)
```

---

## If Still Empty

1. **The model may not be loaded correctly**: check `HF_TOKEN`
2. **Audio format issue**: ensure the input is a 16 kHz mono WAV
3. **The model is not producing predictions**: check `predicted_ids` in the logs
4. **Tokenizer mismatch**: IndicWav2Vec may need special tokenizer initialization
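
For item 2, the standard library's `wave` module can confirm the sample rate and channel count before the audio reaches the model. A minimal sketch, assuming the upload is an uncompressed PCM WAV; other encodings make `wave.open` raise `wave.Error`, which is itself a useful signal. Both function names are hypothetical, and `make_silent_wav` exists only to try the check.

```python
import io
import wave
from typing import List


def check_wav_format(data: bytes, expected_rate: int = 16_000) -> List[str]:
    """Return the reasons a WAV payload does not match the expected mono sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate, channels, frames = wav.getframerate(), wav.getnchannels(), wav.getnframes()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if frames == 0:
        problems.append("file contains no audio frames")
    return problems


def make_silent_wav(rate: int, channels: int, seconds: float = 0.1) -> bytes:
    """Generate 16-bit PCM silence, handy for exercising check_wav_format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buffer.getvalue()
```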