# File Validation Fix: Prevent Wrong File Translation

**Date**: 2025-11-12

**Issue**: The bilingual file contained PDF bulletin content instead of the sermon content from the Word document

---

## Root Cause

The bilingual file `worship_program_2025-11-09_bilingual.txt` shows:
- **Source**: `worship_program_2025-11-09.docx`
- **Content**: Worship program structure (Scripture Reference, Songs, Prayer points)

**Problem**: The user uploaded a **previously generated worship program DOCX** instead of the **original sermon/transcript DOCX**.

---

## Why This Happened

1. The user previously generated a worship program → `worship_program_2025-11-09.docx`
2. The user uploaded this generated file instead of the original sermon DOCX
3. The system translated the worship program content (which came from the PDF) instead of the sermon content
4. Result: The bilingual file contains PDF-derived content, not sermon content

---

## Fix Applied

### 1. Filename Validation ✅

**File**: `app.py::translate_document()`

**Check**: Reject files with "worship_program" or "worship-program" in the filename

```python
# Check if this looks like a worship program file (should not be translated)
filename = os.path.basename(docx_path).lower()
if 'worship_program' in filename or 'worship-program' in filename:
    print(f"Warning: File '{filename}' appears to be a worship program, not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file, not a generated worship program.")
    return None
```

### 2. Content Validation ✅

**Check**: Detect worship program structure in the DOCX text. The file is rejected when 3 or more of the indicators below appear, so a sermon that happens to contain one or two of these phrases still passes.

```python
# Validate that this looks like a sermon/transcript, not a worship program
content_lower = content.lower()
worship_program_indicators = [
    '## call to worship',
    '## songs',
    '## prayer',
    '## message',
    '## announcements',
    'worship program',
    'scripture reference',
    "today's bible reading",
]
indicator_count = sum(1 for indicator in worship_program_indicators if indicator in content_lower)
if indicator_count >= 3:
    print(f"Warning: The DOCX file appears to be a worship program (found {indicator_count} program indicators), not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file for translation.")
    return None
```

### 3. Better Error Messages ✅

**File**: `app.py::process_worship_program()`

**Change**: Provide a clear error message when the wrong file type is detected

```python
if not bilingual_path_temp or not os.path.exists(bilingual_path_temp):
    error_msg = "❌ Error: Translation failed. "
    filename = os.path.basename(docx_file.name)
    if 'worship_program' in filename.lower() or 'worship-program' in filename.lower():
        error_msg += f"\n\nThe file '{filename}' appears to be a previously generated worship program, not a sermon transcript.\n"
        error_msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
    else:
        error_msg += "Please check the DOCX file."
    return error_msg, None
```

---

## Expected Behavior

### ✅ Correct Workflow

1. **Upload the ORIGINAL sermon DOCX** (e.g., `sermon_2025-11-09.docx` or `message.docx`)
2. **Upload the PDF bulletin** (e.g., `RCCA-worship-bulletin-2025-11-09.pdf`)
3. **The system translates the sermon DOCX** → Creates `sermon_2025-11-09_bilingual.txt`
4. **The bilingual file contains sermon content** (not worship program content)
5. **The Message section uses the sermon bilingual content**

### ❌ Wrong Workflow (Now Prevented)

1. **Upload a generated worship program DOCX** (e.g., `worship_program_2025-11-09.docx`)
2. **The system detects it is a worship program** → Rejects it with an error message
3. **The user must upload the original sermon DOCX instead**

---

## How to Fix Existing Issue

If you already have a bilingual file with the wrong content:

1. **Delete the incorrect bilingual file**: `worship_program_2025-11-09_bilingual.txt`
2. **Upload the ORIGINAL sermon DOCX file** (not the generated worship program)
3. **Re-run the translation**

---

## Files Modified

- `app.py` - Added filename and content validation, plus clearer error messages

---

**Status**: ✅ **Validation added - the system now rejects worship program files**
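The filename and content checks in `translate_document()` can be combined into one pure helper that takes a path and the extracted text and returns a rejection reason. This makes the rule easy to unit-test without a real DOCX upload. This is a minimal sketch, not the code in `app.py`. The function name `detect_worship_program`, the constant names, and the "reason string or `None`" return convention are hypothetical. The indicator list and the threshold of 3 are copied from the content check.

```python
import os

# Indicator phrases copied from the content check in translate_document()
WORSHIP_PROGRAM_INDICATORS = [
    "## call to worship",
    "## songs",
    "## prayer",
    "## message",
    "## announcements",
    "worship program",
    "scripture reference",
    "today's bible reading",
]
# Same threshold as the content check: 3 or more matches means "worship program"
INDICATOR_THRESHOLD = 3


def detect_worship_program(path, content):
    """Hypothetical helper: return a reason string if the upload looks like a
    generated worship program, or None if it looks like a sermon/transcript."""
    # Filename check (case-insensitive, like the original)
    filename = os.path.basename(path).lower()
    if "worship_program" in filename or "worship-program" in filename:
        return f"filename '{filename}' looks like a generated worship program"

    # Content check: count how many indicator phrases appear in the text
    content_lower = content.lower()
    count = sum(1 for indicator in WORSHIP_PROGRAM_INDICATORS if indicator in content_lower)
    if count >= INDICATOR_THRESHOLD:
        return f"content has {count} worship program indicators"
    return None


if __name__ == "__main__":
    # Rejected by filename
    print(detect_worship_program("/tmp/worship_program_2025-11-09.docx", "Sermon text"))
    # prints: filename 'worship_program_2025-11-09.docx' looks like a generated worship program

    # Rejected by content: three indicators match
    program = "## Call to Worship\n## Songs\nScripture Reference: John 3:16"
    print(detect_worship_program("upload.docx", program))
    # prints: content has 3 worship program indicators

    # Accepted: a sermon that mentions prayer without a "## Prayer" heading
    print(detect_worship_program("sermon_2025-11-09.docx", "Today we talk about prayer."))
    # prints: None
```

With a helper like this, `translate_document()` and `process_worship_program()` could both call the same function. The filename rule would then live in one place instead of being repeated in each function.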