| # File Validation Fix: Prevent Wrong File Translation | |
| **Date**: 2025-11-12 | |
| **Issue**: Bilingual file was getting PDF content instead of Word document content | |
| --- | |
| ## Root Cause | |
| The bilingual file `worship_program_2025-11-09_bilingual.txt` shows: | |
| - **Source**: `worship_program_2025-11-09.docx` | |
| - **Content**: Contains worship program structure (Scripture Reference, Songs, Prayer points) | |
| **Problem**: The user uploaded a **previously generated worship program DOCX** instead of the **original sermon/transcript DOCX**. | |
| --- | |
| ## Why This Happened | |
| 1. The user previously generated a worship program → `worship_program_2025-11-09.docx` | |
| 2. The user uploaded this generated file instead of the original sermon DOCX | |
| 3. The system translated the worship program content (which originated from the PDF bulletin) instead of the sermon content | |
| 4. Result: the bilingual file contains PDF-derived content, not sermon content | |
| --- | |
| ## Fix Applied | |
| ### 1. Filename Validation ✅ | |
| **File**: `app.py::translate_document()` | |
| **Check**: Reject files with "worship_program" or "worship-program" in filename | |
| ```python | |
| # Check if this looks like a worship program file (should not be translated) | |
| filename = os.path.basename(docx_path).lower() | |
| if 'worship_program' in filename or 'worship-program' in filename: | |
|     print(f"Warning: File '{filename}' appears to be a worship program, not a sermon transcript.") | |
|     print("Please upload the original sermon/transcript DOCX file, not a generated worship program.") | |
|     return None | |
| ``` | |
| ### 2. Content Validation ✅ | |
| **Check**: Detect worship program structure in content | |
| ```python | |
| # Validate that this looks like a sermon/transcript, not a worship program | |
| content_lower = content.lower() | |
| worship_program_indicators = [ | |
|     '## call to worship', | |
|     '## songs', | |
|     '## prayer', | |
|     '## message', | |
|     '## announcements', | |
|     'worship program', | |
|     'scripture reference', | |
|     "today's bible reading", | |
| ] | |
| # Count how many distinct indicators occur; three or more is treated as a worship program | |
| indicator_count = sum(1 for indicator in worship_program_indicators if indicator in content_lower) | |
| if indicator_count >= 3: | |
|     print(f"Warning: The DOCX file appears to be a worship program (found {indicator_count} program indicators), not a sermon transcript.") | |
|     print("Please upload the original sermon/transcript DOCX file for translation.") | |
|     return None | |
| ``` | |
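
The two checks above can be pulled into pure functions so they can be unit-tested without opening a DOCX. The following is a minimal sketch, not the code in `app.py`: the helper names (`is_worship_program_filename`, `count_program_indicators`, `looks_like_worship_program`) are hypothetical, while the indicator strings and the threshold of 3 are copied from the snippets above.

```python
import os

# Indicator strings and threshold copied from the content check above.
WORSHIP_PROGRAM_INDICATORS = [
    "## call to worship",
    "## songs",
    "## prayer",
    "## message",
    "## announcements",
    "worship program",
    "scripture reference",
    "today's bible reading",
]
INDICATOR_THRESHOLD = 3


def is_worship_program_filename(path: str) -> bool:
    """Return True if the base filename marks a generated worship program."""
    name = os.path.basename(path).lower()
    return "worship_program" in name or "worship-program" in name


def count_program_indicators(content: str) -> int:
    """Count how many distinct indicator strings occur in the text (case-insensitive)."""
    content_lower = content.lower()
    return sum(1 for ind in WORSHIP_PROGRAM_INDICATORS if ind in content_lower)


def looks_like_worship_program(path: str, content: str) -> bool:
    """Combine the filename check and the content check."""
    return (
        is_worship_program_filename(path)
        or count_program_indicators(content) >= INDICATOR_THRESHOLD
    )
```

Keeping the threshold in a named constant makes it easier to tune if a genuine sermon transcript happens to contain headings such as `## Prayer` or `## Message`.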
| ### 3. Better Error Messages ✅ | |
| **File**: `app.py::process_worship_program()` | |
| **Change**: Provide clear error message when wrong file type is detected | |
| ```python | |
| if not bilingual_path_temp or not os.path.exists(bilingual_path_temp): | |
|     error_msg = "❌ Error: Translation failed. " | |
|     filename = os.path.basename(docx_file.name) | |
|     if 'worship_program' in filename.lower() or 'worship-program' in filename.lower(): | |
|         error_msg += f"\n\nThe file '{filename}' appears to be a previously generated worship program, not a sermon transcript.\n" | |
|         error_msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation." | |
|     else: | |
|         error_msg += "Please check the DOCX file." | |
|     return error_msg, None | |
| ``` | |
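
As the snippets read, a file rejected by the content check (rather than the filename check) returns `None` from `translate_document()`, and the handler above would then fall back to the generic "Please check the DOCX file." message. One possible way to carry the rejection reason through to the UI is sketched below. It is an illustration under assumptions, not the current implementation: `ValidationResult`, `validate_source`, and `format_error` are hypothetical names, and the status icon is omitted from the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    """Outcome of validating an uploaded DOCX (hypothetical type)."""
    ok: bool
    reason: Optional[str] = None


def validate_source(filename: str, indicator_count: int, threshold: int = 3) -> ValidationResult:
    """Apply the filename check, then the content check, keeping the reason."""
    name = filename.lower()
    if "worship_program" in name or "worship-program" in name:
        return ValidationResult(
            False,
            f"The file '{filename}' appears to be a previously generated "
            "worship program, not a sermon transcript.",
        )
    if indicator_count >= threshold:
        return ValidationResult(
            False,
            f"The DOCX content appears to be a worship program (found "
            f"{indicator_count} program indicators), not a sermon transcript.",
        )
    return ValidationResult(True)


def format_error(result: ValidationResult) -> str:
    """Build the user-facing message from a failed validation result."""
    msg = "Error: Translation failed. "
    if result.reason:
        msg += f"\n\n{result.reason}\n"
        msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
    else:
        msg += "Please check the DOCX file."
    return msg
```

With this shape, both rejection paths produce a specific message, and the handler no longer needs to re-derive the cause from the filename.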
| --- | |
| ## Expected Behavior | |
| ### ✅ Correct Workflow | |
| 1. **Upload ORIGINAL sermon DOCX** (e.g., `sermon_2025-11-09.docx` or `message.docx`) | |
| 2. **Upload PDF bulletin** (e.g., `RCCA-worship-bulletin-2025-11-09.pdf`) | |
| 3. **System translates sermon DOCX** → Creates `sermon_2025-11-09_bilingual.txt` | |
| 4. **Bilingual file contains sermon content** (not worship program content) | |
| 5. **Message section uses sermon bilingual content** | |
| ### ❌ Wrong Workflow (Now Prevented) | |
| 1. **Upload generated worship program DOCX** (e.g., `worship_program_2025-11-09.docx`) | |
| 2. **System detects it's a worship program** → Rejects with error message | |
| 3. **User must upload original sermon DOCX instead** | |
| --- | |
| ## How to Fix Existing Issue | |
| If you already have a bilingual file with wrong content: | |
| 1. **Delete the incorrect bilingual file**: `worship_program_2025-11-09_bilingual.txt` | |
| 2. **Upload the ORIGINAL sermon DOCX file** (not the generated worship program) | |
| 3. **Re-run the translation** | |
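
The deletion step can be scripted if several stale files have accumulated. The sketch below assumes the bilingual outputs sit in a single directory and follow the `*_bilingual.txt` naming seen above; the function names and the `output_dir` argument are hypothetical. It defaults to a dry run so nothing is removed until the list has been reviewed.

```python
from pathlib import Path
from typing import List


def find_stale_bilingual_files(output_dir: str) -> List[Path]:
    """List bilingual files that were generated from a worship program upload."""
    return sorted(
        p
        for p in Path(output_dir).glob("*_bilingual.txt")
        if "worship_program" in p.name.lower() or "worship-program" in p.name.lower()
    )


def delete_stale_bilingual_files(output_dir: str, dry_run: bool = True) -> List[Path]:
    """Return the stale files; delete them only when dry_run is False."""
    stale = find_stale_bilingual_files(output_dir)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `delete_stale_bilingual_files(output_dir)` first, then repeating with `dry_run=False`, leaves bilingual files derived from real sermon uploads untouched.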
| --- | |
| ## Files Modified | |
| - `app.py` - Added filename and content validation | |
| --- | |
| **Status**: ✅ **Validation added - system will now reject worship program files** | |