Peter Yang
# File Validation Fix: Prevent Wrong File Translation
**Date**: 2025-11-12
**Issue**: The bilingual file contained worship program content (originally derived from the PDF bulletin) instead of the sermon text from the Word document
---
## Root Cause
The bilingual file `worship_program_2025-11-09_bilingual.txt` shows:
- **Source**: `worship_program_2025-11-09.docx`
- **Content**: Contains worship program structure (Scripture Reference, Songs, Prayer points)
**Problem**: The user uploaded a **previously generated worship program DOCX** instead of the **original sermon/transcript DOCX**.
---
## Why This Happened
1. User previously generated a worship program → `worship_program_2025-11-09.docx`
2. User uploaded this generated file instead of the original sermon DOCX
3. System translated the worship program content (which came from PDF) instead of sermon content
4. Result: Bilingual file contains PDF-like content, not sermon content
---
## Fix Applied
### 1. Filename Validation ✅
**File**: `app.py::translate_document()`
**Check**: Reject files with "worship_program" or "worship-program" in filename
```python
# Check if this looks like a worship program file (should not be translated)
filename = os.path.basename(docx_path).lower()
if 'worship_program' in filename or 'worship-program' in filename:
    print(f"Warning: File '{filename}' appears to be a worship program, not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file, not a generated worship program.")
    return None
```
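
To see which names this check catches, the same condition can be pulled into a small pure function. This is only a sketch: `is_worship_program_filename` is a hypothetical helper name, not something confirmed to exist in `app.py`, and the sample paths are made up.

```python
import os


def is_worship_program_filename(docx_path: str) -> bool:
    """Hypothetical helper: same substring test as the check above."""
    filename = os.path.basename(docx_path).lower()
    return "worship_program" in filename or "worship-program" in filename


# Generated worship programs are flagged regardless of case or directory.
print(is_worship_program_filename("/tmp/Worship_Program_2025-11-09.docx"))  # True
print(is_worship_program_filename("worship-program-2025-11-09.docx"))       # True
# An original sermon file passes.
print(is_worship_program_filename("sermon_2025-11-09.docx"))                # False
# A name that uses a space as separator is not matched by this check.
print(is_worship_program_filename("Worship Program 2025-11-09.docx"))       # False
```

A renamed worship program slips past a filename test like this one, which is why the content check in the next step is also needed.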
### 2. Content Validation ✅
**Check**: Detect worship program structure in content
```python
# Validate that this looks like a sermon/transcript, not a worship program
content_lower = content.lower()
worship_program_indicators = [
    '## call to worship',
    '## songs',
    '## prayer',
    '## message',
    '## announcements',
    'worship program',
    'scripture reference',
    'today\'s bible reading'
]
indicator_count = sum(1 for indicator in worship_program_indicators if indicator in content_lower)
if indicator_count >= 3:
    print(f"Warning: The DOCX file appears to be a worship program (found {indicator_count} program indicators), not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file for translation.")
    return None
```
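
The threshold logic can be exercised on plain strings without a DOCX. A minimal sketch, assuming `content` is the text already extracted from the document; `count_program_indicators` and `looks_like_worship_program` are hypothetical names, and both sample texts are invented for illustration.

```python
# Same indicator list as the check above.
WORSHIP_PROGRAM_INDICATORS = [
    "## call to worship",
    "## songs",
    "## prayer",
    "## message",
    "## announcements",
    "worship program",
    "scripture reference",
    "today's bible reading",
]


def count_program_indicators(content: str) -> int:
    """Count how many indicator phrases occur in the text (case-insensitive)."""
    content_lower = content.lower()
    return sum(1 for ind in WORSHIP_PROGRAM_INDICATORS if ind in content_lower)


def looks_like_worship_program(content: str, threshold: int = 3) -> bool:
    """Hypothetical wrapper: True once at least `threshold` indicators match."""
    return count_program_indicators(content) >= threshold


# Invented sample texts, not taken from real files.
program_text = "# Worship Program\n## Call to Worship\n## Songs\n## Prayer\n"
sermon_text = "Today we continue in Romans. Let us begin with prayer."

print(count_program_indicators(program_text))    # 4
print(looks_like_worship_program(program_text))  # True
print(count_program_indicators(sermon_text))     # 0
print(looks_like_worship_program(sermon_text))   # False
```

Requiring three matches rather than one means a sermon that merely mentions a phrase such as "worship program" is not rejected.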
### 3. Better Error Messages ✅
**File**: `app.py::process_worship_program()`
**Change**: Provide clear error message when wrong file type is detected
```python
if not bilingual_path_temp or not os.path.exists(bilingual_path_temp):
    error_msg = "❌ Error: Translation failed. "
    filename = os.path.basename(docx_file.name)
    if 'worship_program' in filename.lower() or 'worship-program' in filename.lower():
        error_msg += f"\n\nThe file '{filename}' appears to be a previously generated worship program, not a sermon transcript.\n"
        error_msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
    else:
        error_msg += "Please check the DOCX file."
    return error_msg, None
```
---
## Expected Behavior
### ✅ Correct Workflow
1. **Upload ORIGINAL sermon DOCX** (e.g., `sermon_2025-11-09.docx` or `message.docx`)
2. **Upload PDF bulletin** (e.g., `RCCA-worship-bulletin-2025-11-09.pdf`)
3. **System translates sermon DOCX** → Creates `sermon_2025-11-09_bilingual.txt`
4. **Bilingual file contains sermon content** (not worship program content)
5. **Message section uses sermon bilingual content**
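
The file name in step 3 follows the same pattern seen in the root cause, where `worship_program_2025-11-09.docx` produced `worship_program_2025-11-09_bilingual.txt`. A sketch of that mapping, assuming the output is written next to the input; `bilingual_path_for` is a hypothetical name, and `app.py` may build the path differently.

```python
from pathlib import Path


def bilingual_path_for(docx_path: str) -> Path:
    """Hypothetical helper: replace the .docx suffix with _bilingual.txt."""
    p = Path(docx_path)
    return p.with_name(f"{p.stem}_bilingual.txt")


print(bilingual_path_for("sermon_2025-11-09.docx").name)
# sermon_2025-11-09_bilingual.txt
```

Because the output name is derived from the input name, a bilingual file whose name starts with `worship_program` is a quick sign that the wrong DOCX was uploaded.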
### ❌ Wrong Workflow (Now Prevented)
1. **Upload generated worship program DOCX** (e.g., `worship_program_2025-11-09.docx`)
2. **System detects it's a worship program** → Rejects with error message
3. **User must upload original sermon DOCX instead**
---
## How to Fix Existing Issue
If you already have a bilingual file with wrong content:
1. **Delete the incorrect bilingual file**: `worship_program_2025-11-09_bilingual.txt`
2. **Upload the ORIGINAL sermon DOCX file** (not the generated worship program)
3. **Re-run the translation**
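
When several stale files exist, step 1 can be scripted. A minimal sketch, assuming the bilingual files sit in a single directory and follow the `<name>_bilingual.txt` pattern; `remove_stale_bilingual_files` and the `outputs` directory are hypothetical, and the function defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path
from typing import List


def remove_stale_bilingual_files(directory: str, dry_run: bool = True) -> List[Path]:
    """Find bilingual files that were generated from a worship program DOCX.

    Returns the matching paths; deletes them only when dry_run is False.
    """
    stale = []
    for path in sorted(Path(directory).glob("*_bilingual.txt")):
        name = path.name.lower()
        if "worship_program" in name or "worship-program" in name:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale


# Preview first, then delete ("outputs" is a placeholder directory):
# remove_stale_bilingual_files("outputs")
# remove_stale_bilingual_files("outputs", dry_run=False)
```

After the stale files are removed, continue with steps 2 and 3 using the original sermon DOCX.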
---
## Files Modified
- `app.py` - Added filename and content validation
---
**Status**: ✅ **Validation added - system will now reject worship program files**