File Validation Fix: Prevent Wrong File Translation
Date: 2025-11-12
Issue: Bilingual file was getting PDF content instead of Word document content
Root Cause
The bilingual file worship_program_2025-11-09_bilingual.txt shows:
- Source: worship_program_2025-11-09.docx
- Content: Contains worship program structure (Scripture Reference, Songs, Prayer points)
Problem: The user uploaded a previously generated worship program DOCX instead of the original sermon/transcript DOCX.
Why This Happened
- User previously generated a worship program → worship_program_2025-11-09.docx
- User uploaded this generated file instead of the original sermon DOCX
- System translated the worship program content (which came from PDF) instead of sermon content
- Result: Bilingual file contains PDF-like content, not sermon content
Fix Applied
1. Filename Validation ✅
File: app.py::translate_document()
Check: Reject files with "worship_program" or "worship-program" in filename
    # Check if this looks like a worship program file (should not be translated)
    filename = os.path.basename(docx_path).lower()
    if 'worship_program' in filename or 'worship-program' in filename:
        print(f"Warning: File '{filename}' appears to be a worship program, not a sermon transcript.")
        print("Please upload the original sermon/transcript DOCX file, not a generated worship program.")
        return None
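The filename check above can be pulled into a small helper so it is testable on its own. This is a minimal sketch, not the code in app.py: the names `GENERATED_NAME_MARKERS` and `is_generated_program_filename` are hypothetical, and only the two substrings come from the fix itself.

```python
import os

# Substrings taken from the check in translate_document(); the constant and
# function names are hypothetical and do not appear in app.py.
GENERATED_NAME_MARKERS = ("worship_program", "worship-program")


def is_generated_program_filename(path: str) -> bool:
    """Return True if the file name (not the directory) suggests a generated worship program."""
    filename = os.path.basename(path).lower()
    return any(marker in filename for marker in GENERATED_NAME_MARKERS)
```

Because only the base name is inspected, a sermon stored in a folder that happens to be called `worship_program` would still be accepted.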
2. Content Validation ✅
Check: Detect worship program structure in content
    # Validate that this looks like a sermon/transcript, not a worship program
    content_lower = content.lower()
    worship_program_indicators = [
        '## call to worship',
        '## songs',
        '## prayer',
        '## message',
        '## announcements',
        'worship program',
        'scripture reference',
        'today\'s bible reading'
    ]
    indicator_count = sum(1 for indicator in worship_program_indicators if indicator in content_lower)
    if indicator_count >= 3:
        print(f"Warning: The DOCX file appears to be a worship program (found {indicator_count} program indicators), not a sermon transcript.")
        print("Please upload the original sermon/transcript DOCX file for translation.")
        return None
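To see how the threshold of three behaves, the content check can be sketched as two standalone functions. The indicator strings and the threshold are taken from the snippet above; the function names and the `threshold` parameter are assumptions made for illustration.

```python
# Indicator strings copied from the content check above.
WORSHIP_PROGRAM_INDICATORS = (
    "## call to worship",
    "## songs",
    "## prayer",
    "## message",
    "## announcements",
    "worship program",
    "scripture reference",
    "today's bible reading",
)


def count_program_indicators(content: str) -> int:
    """Count how many distinct indicators occur in the text (case-insensitive)."""
    content_lower = content.lower()
    return sum(1 for indicator in WORSHIP_PROGRAM_INDICATORS if indicator in content_lower)


def looks_like_worship_program(content: str, threshold: int = 3) -> bool:
    """Hypothetical wrapper: True once at least `threshold` indicators are found."""
    return count_program_indicators(content) >= threshold


program_text = "## Call to Worship\nPsalm 100\n## Songs\nAmazing Grace\n## Prayer\n"
sermon_text = "Our scripture reference today is John 3. Let us begin with prayer."
```

Here `program_text` matches three indicators and is rejected, while `sermon_text` matches only one ("scripture reference") and passes. Each indicator counts at most once, so a sermon that repeats a single phrase many times still stays below the threshold.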
3. Better Error Messages ✅
File: app.py::process_worship_program()
Change: Provide clear error message when wrong file type is detected
    if not bilingual_path_temp or not os.path.exists(bilingual_path_temp):
        error_msg = "❌ Error: Translation failed. "
        filename = os.path.basename(docx_file.name)
        if 'worship_program' in filename.lower() or 'worship-program' in filename.lower():
            error_msg += f"\n\nThe file '{filename}' appears to be a previously generated worship program, not a sermon transcript.\n"
            error_msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
        else:
            error_msg += "Please check the DOCX file."
        return error_msg, None
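The message-building logic can also be isolated as a pure function, which makes both branches easy to check without running the Gradio app. This is a sketch under assumptions: `build_translation_error` is a hypothetical name, it takes the uploaded file's path as a plain string, and the wording follows the snippet above minus the leading status symbol.

```python
import os


def build_translation_error(upload_path: str) -> str:
    """Build the user-facing message shown when no bilingual file was produced."""
    filename = os.path.basename(upload_path)
    lowered = filename.lower()
    error_msg = "Error: Translation failed. "
    if "worship_program" in lowered or "worship-program" in lowered:
        # The upload looks like a generated program, so say so explicitly.
        error_msg += (
            f"\n\nThe file '{filename}' appears to be a previously generated "
            "worship program, not a sermon transcript.\n"
            "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
        )
    else:
        # Generic fallback for any other translation failure.
        error_msg += "Please check the DOCX file."
    return error_msg
```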
Expected Behavior
✅ Correct Workflow
- Upload ORIGINAL sermon DOCX (e.g., sermon_2025-11-09.docx or message.docx)
- Upload PDF bulletin (e.g., RCCA-worship-bulletin-2025-11-09.pdf)
- System translates sermon DOCX → Creates sermon_2025-11-09_bilingual.txt
- Bilingual file contains sermon content (not worship program content)
- Message section uses sermon bilingual content
❌ Wrong Workflow (Now Prevented)
- Upload generated worship program DOCX (e.g., worship_program_2025-11-09.docx)
- System detects it's a worship program → Rejects with error message
- User must upload original sermon DOCX instead
How to Fix Existing Issue
If you already have a bilingual file with wrong content:
- Delete the incorrect bilingual file: worship_program_2025-11-09_bilingual.txt
- Upload the ORIGINAL sermon DOCX file (not the generated worship program)
- Re-run the translation
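If several stale files have accumulated, the deletion step could be scripted. The sketch below assumes the bilingual files sit in a single output directory and end in `_bilingual.txt`; the function name and the directory argument are hypothetical, since the actual output location is not stated here.

```python
from pathlib import Path
from typing import List


def remove_stale_bilingual_files(output_dir: str) -> List[str]:
    """Delete bilingual files derived from generated worship programs.

    Returns the names of the files that were removed, sorted alphabetically.
    """
    removed = []
    for path in sorted(Path(output_dir).glob("*_bilingual.txt")):
        name = path.name.lower()
        # Same filename markers the validation uses.
        if "worship_program" in name or "worship-program" in name:
            path.unlink()
            removed.append(path.name)
    return removed
```

Bilingual files produced from real sermons, such as sermon_2025-11-09_bilingual.txt, are left untouched.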
Files Modified
- app.py: Added filename and content validation
Status: ✅ Validation added - system will now reject worship program files