Peter Yang


File Validation Fix: Prevent Wrong File Translation

Date: 2025-11-12
Issue: The bilingual file contained worship program content (derived from the PDF) instead of the sermon content from the Word document


Root Cause

The bilingual file worship_program_2025-11-09_bilingual.txt shows:

  • Source: worship_program_2025-11-09.docx
  • Content: Contains worship program structure (Scripture Reference, Songs, Prayer points)

Problem: The user uploaded a previously generated worship program DOCX instead of the original sermon/transcript DOCX.


Why This Happened

  1. User previously generated a worship program → worship_program_2025-11-09.docx
  2. User uploaded this generated file instead of the original sermon DOCX
  3. System translated the worship program content (which originally came from the PDF) instead of the sermon content
  4. Result: Bilingual file contains PDF-derived worship program content, not sermon content

Fix Applied

1. Filename Validation ✅

File: app.py::translate_document()

Check: Reject files with "worship_program" or "worship-program" in filename

# Check if this looks like a worship program file (should not be translated)
filename = os.path.basename(docx_path).lower()
if 'worship_program' in filename or 'worship-program' in filename:
    print(f"Warning: File '{filename}' appears to be a worship program, not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file, not a generated worship program.")
    return None
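
The same check can be pulled into a small helper so it can be exercised without a real DOCX. This is a minimal sketch rather than the code in app.py; the helper name `looks_like_worship_program` and the `PROGRAM_NAME_MARKERS` tuple are hypothetical, and only the two substrings come from the check above.

```python
import os

# Substrings that mark a file as a generated worship program
# (the two spellings come from the check above; the tuple name is hypothetical).
PROGRAM_NAME_MARKERS = ("worship_program", "worship-program")


def looks_like_worship_program(docx_path: str) -> bool:
    """Return True if the file name suggests a generated worship program."""
    # Only the base name is inspected, so directory names do not trigger the check.
    filename = os.path.basename(docx_path).lower()
    return any(marker in filename for marker in PROGRAM_NAME_MARKERS)


if __name__ == "__main__":
    for path in ("uploads/worship_program_2025-11-09.docx",
                 "uploads/sermon_2025-11-09.docx"):
        print(path, looks_like_worship_program(path))
```

Because the comparison is done on the lowercased base name, a file such as Worship-Program.docx is also caught, while a sermon stored inside a folder named worship_program is not.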

2. Content Validation ✅

Check: Detect worship program structure in content

# Validate that this looks like a sermon/transcript, not a worship program
content_lower = content.lower()
worship_program_indicators = [
    '## call to worship',
    '## songs',
    '## prayer',
    '## message',
    '## announcements',
    'worship program',
    'scripture reference',
    "today's bible reading"
]

indicator_count = sum(1 for indicator in worship_program_indicators if indicator in content_lower)
if indicator_count >= 3:
    print(f"Warning: The DOCX file appears to be a worship program (found {indicator_count} program indicators), not a sermon transcript.")
    print("Please upload the original sermon/transcript DOCX file for translation.")
    return None
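
To see how the threshold behaves, the indicator count can be isolated in a pure function. The sketch below reuses the indicator list and the threshold of 3 from the snippet above; the function names `count_program_indicators` and `is_worship_program_content` are hypothetical and are not claimed to exist in app.py.

```python
# Indicator phrases copied from the validation snippet above.
WORSHIP_PROGRAM_INDICATORS = [
    "## call to worship",
    "## songs",
    "## prayer",
    "## message",
    "## announcements",
    "worship program",
    "scripture reference",
    "today's bible reading",
]


def count_program_indicators(content: str) -> int:
    """Count how many distinct indicator phrases occur in the text (case-insensitive)."""
    content_lower = content.lower()
    return sum(1 for indicator in WORSHIP_PROGRAM_INDICATORS if indicator in content_lower)


def is_worship_program_content(content: str, threshold: int = 3) -> bool:
    """Treat the text as a worship program once enough indicators are present."""
    return count_program_indicators(content) >= threshold


if __name__ == "__main__":
    program = "# Worship Program\n## Call to Worship\n## Songs\n## Prayer\n"
    sermon = "Today's Bible reading is John 3:16. Let us open in prayer."
    print(count_program_indicators(program), is_worship_program_content(program))
    print(count_program_indicators(sermon), is_worship_program_content(sermon))
```

Requiring three matches rather than one means a sermon that happens to mention a single phrase such as "today's Bible reading" is still accepted, while a document with several program section headings is rejected.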

3. Better Error Messages ✅

File: app.py::process_worship_program()

Change: Provide clear error message when wrong file type is detected

if not bilingual_path_temp or not os.path.exists(bilingual_path_temp):
    error_msg = "❌ Error: Translation failed. "
    filename = os.path.basename(docx_file.name)
    if 'worship_program' in filename.lower() or 'worship-program' in filename.lower():
        error_msg += f"\n\nThe file '{filename}' appears to be a previously generated worship program, not a sermon transcript.\n"
        error_msg += "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
    else:
        error_msg += "Please check the DOCX file."
    return error_msg, None
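
The message-building logic can also be separated from the upload handling so that both branches are easy to check. This is a sketch under the assumption that only the uploaded file name is needed; `build_translation_error` is a hypothetical helper, and the message text is copied from the snippet above.

```python
import os


def build_translation_error(uploaded_name: str) -> str:
    """Build the user-facing error text for a failed translation."""
    filename = os.path.basename(uploaded_name)
    lowered = filename.lower()
    error_msg = "❌ Error: Translation failed. "
    if "worship_program" in lowered or "worship-program" in lowered:
        # The upload looks like a generated worship program: say so explicitly.
        error_msg += (
            f"\n\nThe file '{filename}' appears to be a previously generated "
            "worship program, not a sermon transcript.\n"
            "Please upload the ORIGINAL sermon/transcript DOCX file for translation."
        )
    else:
        # Generic fallback when the name gives no hint about the cause.
        error_msg += "Please check the DOCX file."
    return error_msg


if __name__ == "__main__":
    print(build_translation_error("/tmp/worship_program_2025-11-09.docx"))
    print(build_translation_error("sermon_2025-11-09.docx"))
```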

Expected Behavior

✅ Correct Workflow

  1. Upload ORIGINAL sermon DOCX (e.g., sermon_2025-11-09.docx or message.docx)
  2. Upload PDF bulletin (e.g., RCCA-worship-bulletin-2025-11-09.pdf)
  3. System translates sermon DOCX → Creates sermon_2025-11-09_bilingual.txt
  4. Bilingual file contains sermon content (not worship program content)
  5. Message section uses sermon bilingual content
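
The file names in this document suggest that the bilingual file is named after the uploaded DOCX with a `_bilingual.txt` suffix, which is why the faulty output was called worship_program_2025-11-09_bilingual.txt. A minimal sketch of that naming rule follows; it is inferred from the examples here, not taken from app.py, and `bilingual_name` is a hypothetical helper.

```python
from pathlib import Path


def bilingual_name(docx_path: str) -> str:
    """Derive the bilingual output name from the uploaded DOCX name (assumed convention)."""
    # Path.stem drops the directory and the final extension.
    return f"{Path(docx_path).stem}_bilingual.txt"


if __name__ == "__main__":
    print(bilingual_name("sermon_2025-11-09.docx"))
    print(bilingual_name("uploads/worship_program_2025-11-09.docx"))
```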

❌ Wrong Workflow (Now Prevented)

  1. Upload generated worship program DOCX (e.g., worship_program_2025-11-09.docx)
  2. System detects it's a worship program → Rejects with error message
  3. User must upload original sermon DOCX instead

How to Fix Existing Issue

If you already have a bilingual file with wrong content:

  1. Delete the incorrect bilingual file: worship_program_2025-11-09_bilingual.txt
  2. Upload the ORIGINAL sermon DOCX file (not the generated worship program)
  3. Re-run the translation
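
If several such files have accumulated, the cleanup in step 1 can be scripted. The sketch below assumes the bilingual files sit in a single directory and follow the `*_bilingual.txt` naming seen above; the function names and the directory argument are hypothetical, so review the list before deleting anything.

```python
from pathlib import Path
from typing import List

# Same name markers as the filename validation.
PROGRAM_NAME_MARKERS = ("worship_program", "worship-program")


def find_stale_bilingual_files(directory: str) -> List[Path]:
    """List bilingual files that were generated from a worship program DOCX."""
    return sorted(
        path
        for path in Path(directory).glob("*_bilingual.txt")
        if any(marker in path.name.lower() for marker in PROGRAM_NAME_MARKERS)
    )


def delete_files(paths: List[Path]) -> None:
    """Delete the given files, ignoring any that are already gone."""
    for path in paths:
        path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+


if __name__ == "__main__":
    stale = find_stale_bilingual_files(".")
    for path in stale:
        print("Would delete:", path)
    # Call delete_files(stale) only after checking the printed list.
```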

Files Modified

  • app.py - Added filename and content validation in translate_document() and a clearer error message in process_worship_program()

Status: ✅ Validation added - system will now reject worship program files