worship / FIXES_SUMMARY.md
Peter Yang
Add summary of fixes for bilingual file and Message section issues
f9f0566

Fixes Summary: Bilingual File and Message Section Issues

Date: 2025-11-12
Issues Fixed: Bilingual file persistence, Message section content, Content duplication


Issues Identified

  1. Bilingual file not saved to current directory

    • File was created in temp directory but not copied to current directory
    • File was lost after temp directory cleanup
  2. Message section not appearing correctly

    • Bilingual file path wasn't being found correctly
    • Message section was empty or had wrong content
  3. Content duplication

    • PDF content was being processed as a document, causing duplication
    • Bilingual file content was mixed with extracted PDF content
    • Same content appeared in multiple sections
  4. Qwen2.5 not used in translate_document

    • translate_document() was creating DocumentProcessingAgent without use_qwen_translation=True

Fixes Applied

1. Save Bilingual File to Current Directory ✅

File: app.py

Change:

# Copy bilingual file to current directory for persistence and easy access
bilingual_filename = os.path.basename(bilingual_path_temp)
bilingual_path = bilingual_filename  # Save in current directory
shutil.copy2(bilingual_path_temp, bilingual_path)
progress(0.5, desc=f"💾 Saved bilingual translation to {bilingual_filename}...")

Result: The bilingual file is now copied to the current working directory (e.g., test_sermon_bilingual.txt), so it survives temp directory cleanup.
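
For readers who want to try the copy step outside app.py, here is a minimal, self-contained sketch. The helper name `persist_to_dir` and its `dest_dir` parameter are hypothetical and do not exist in the project; app.py performs the same copy inline with `os.path.basename` and `shutil.copy2`.

```python
import os
import shutil
import tempfile


def persist_to_dir(temp_path: str, dest_dir: str = ".") -> str:
    """Copy a file out of a temp directory and return the new path.

    Hypothetical helper for illustration; app.py does this inline.
    """
    dest_path = os.path.join(dest_dir, os.path.basename(temp_path))
    # copy2 copies the file contents and also tries to preserve metadata.
    shutil.copy2(temp_path, dest_path)
    return dest_path


if __name__ == "__main__":
    # Simulate the temp-directory workflow with throwaway directories.
    with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as out:
        src = os.path.join(tmp, "test_sermon_bilingual.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("示例段落\nSample paragraph\n")
        saved = persist_to_dir(src, dest_dir=out)
        print(os.path.basename(saved))
```

With the default `dest_dir="."`, the destination is the process's current working directory, which matches the relative path used in the change above.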


2. Use Qwen2.5 in translate_document ✅

File: app.py

Change:

# Initialize processor with Qwen2.5 translation enabled
processor = DocumentProcessingAgent(GEMMA_BACKEND_URL, use_qwen_translation=True)

Result: translate_document() now creates its DocumentProcessingAgent with use_qwen_translation=True, so translation uses Qwen2.5


3. Prevent PDF and Bilingual File Duplication ✅

File: document_processing_agent.py

Change:

async def process_documents(self, document_paths: List[str]) -> List[DocumentContent]:
    """Process multiple documents and extract structured content"""
    results = []
    
    for doc_path in document_paths:
        # Skip bilingual text files - they're handled separately for Message section
        if doc_path and isinstance(doc_path, str) and doc_path.endswith('_bilingual.txt'):
            continue
        
        # Skip PDF files - they're only used for date extraction, not content extraction
        # PDF content should not be processed as it causes duplication
        if doc_path and isinstance(doc_path, str) and doc_path.lower().endswith('.pdf'):
            continue
        
        # ... process other documents

Result:

  • PDF files are skipped during document processing (only used for date extraction)
  • Bilingual files are skipped during document processing (handled separately)
  • No duplication from PDF or bilingual file content
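
The two skip rules can be checked in isolation with a small predicate. This is a sketch only: `should_skip` and `filter_documents` are hypothetical names, not functions in document_processing_agent.py, but the conditions mirror the ones in the loop above.

```python
from typing import List


def should_skip(doc_path: object) -> bool:
    """Return True for paths that the processing loop above would skip."""
    if not isinstance(doc_path, str) or not doc_path:
        return False  # neither rule matches non-string or empty entries
    if doc_path.endswith("_bilingual.txt"):
        return True  # handled separately for the Message section
    if doc_path.lower().endswith(".pdf"):
        return True  # used for date extraction only
    return False


def filter_documents(document_paths: List[str]) -> List[str]:
    """Keep only the paths that would reach content extraction."""
    return [p for p in document_paths if not should_skip(p)]


if __name__ == "__main__":
    paths = ["sermon.docx", "sermon_bilingual.txt", "bulletin.PDF", "notes.txt"]
    print(filter_documents(paths))
```

As in the original conditions, the PDF check is case-insensitive while the `_bilingual.txt` suffix check is case-sensitive, so a file named with an upper-case suffix would not be skipped by the bilingual rule.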

4. Message Section Uses Only Bilingual Content ✅

File: document_processing_agent.py

Change:

# Replace Message section with Bilingual Document Translation
# Load bilingual document and format it - this is the ONLY source for Message section
bilingual_content = self._load_bilingual_document(document_sources)
messages_formatted = "Sermon message to be prepared"

if bilingual_content and bilingual_content.strip():
    # ... format bilingual content ...
    messages_formatted = f"""*Date: {formatted_date}*

{bilingual_text}"""
else:
    # No bilingual document available - use fallback message
    # Don't use aggregated_content.get('messages') to avoid duplication from PDF processing
    messages_formatted = "Sermon message to be prepared"

Result:

  • Message section ONLY uses bilingual file content
  • No mixing with extracted PDF content
  • No duplication
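
The selection logic can be reduced to a pure function for testing. The sketch below is an assumption-laden simplification: `choose_message_body` is a hypothetical name, and the formatting step that the excerpt above elides is replaced here by a plain `strip()`.

```python
from typing import Optional

FALLBACK_MESSAGE = "Sermon message to be prepared"


def choose_message_body(bilingual_content: Optional[str], formatted_date: str) -> str:
    """Return the Message body from bilingual text, or the fallback string.

    Content extracted from other documents is deliberately never consulted,
    which is what prevents duplication.
    """
    if bilingual_content and bilingual_content.strip():
        # Placeholder for the real formatting step, which is elided above.
        bilingual_text = bilingual_content.strip()
        return f"*Date: {formatted_date}*\n\n{bilingual_text}"
    return FALLBACK_MESSAGE


if __name__ == "__main__":
    print(choose_message_body(None, "November 9, 2025"))
    print(choose_message_body("你好\nHello", "November 9, 2025"))
```

Treating whitespace-only content the same as a missing file keeps an empty bilingual file from producing a Message section with only a date line.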

Expected Behavior After Fixes

File Flow

  1. DOCX Upload → Extract content
  2. Translation → Create {docx_name}_bilingual.txt in temp directory
  3. Copy to Current Directory → Save {docx_name}_bilingual.txt to current directory
  4. Generate Program → Use bilingual file for Message section, PDF for date only

Message Section Content

  • Source: Only from {docx_name}_bilingual.txt
  • Format:
    ## Message
    
    *Date: November 9, 2025*
    
    [Chinese paragraph 1]
    [English translation 1]
    
    [Chinese paragraph 2]
    [English translation 2]
    ...
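
The layout above can be reproduced with a short rendering function. This is a sketch under stated assumptions: the bilingual content is taken to be already split into (Chinese, English) paragraph pairs, and `render_message_section` and `format_long_date` are hypothetical names rather than project functions.

```python
from datetime import date
from typing import List, Tuple


def format_long_date(d: date) -> str:
    """Format a date like 'November 9, 2025' with no zero-padded day.

    %B depends on the active locale; it yields English month names
    under Python's default "C" locale.
    """
    return f"{d:%B} {d.day}, {d.year}"


def render_message_section(pairs: List[Tuple[str, str]], d: date) -> str:
    """Render the Message section: heading, date line, then paragraph pairs."""
    # Each pair is one block: Chinese paragraph, then its English translation.
    blocks = [f"{zh}\n{en}" for zh, en in pairs]
    header = f"## Message\n\n*Date: {format_long_date(d)}*\n\n"
    return header + "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [("中文一", "English one"), ("中文二", "English two")]
    print(render_message_section(sample, date(2025, 11, 9)))
```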
    

No Duplication

  • ✅ PDF content not processed as document
  • ✅ Bilingual file not processed as document
  • ✅ Message section only uses bilingual file
  • ✅ No duplicate content in other sections

Testing

To verify fixes:

  1. Check bilingual file exists:

    ls -la *_bilingual.txt
    
  2. Check Message section:

    • Open generated markdown file
    • Verify Message section contains bilingual content
    • Verify no duplication from PDF
  3. Check no duplication:

    • Verify content doesn't appear multiple times
    • Verify PDF content not in Message section
    • Verify bilingual content only in Message section
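
The duplication check in step 3 can be partly automated. The following is a rough sketch, not part of the project: `find_duplicate_paragraphs` is a hypothetical helper, and the `min_length` threshold (in characters) is an arbitrary assumption meant to ignore headings and short repeated lines.

```python
import re
from collections import Counter
from typing import List


def find_duplicate_paragraphs(markdown_text: str, min_length: int = 20) -> List[str]:
    """Return paragraphs that occur more than once in the text.

    Paragraphs are separated by blank lines; whitespace inside each
    paragraph is normalized before comparison.
    """
    raw = re.split(r"\n\s*\n", markdown_text)
    paragraphs = [" ".join(p.split()) for p in raw]
    counts = Counter(p for p in paragraphs if len(p) >= min_length)
    return [p for p, n in counts.items() if n > 1]


if __name__ == "__main__":
    sample = (
        "## Message\n\n"
        "This paragraph is long enough to count.\n\n"
        "## Other\n\n"
        "This paragraph is long enough to count.\n\n"
        "Unique paragraph that appears only once.\n"
    )
    for paragraph in find_duplicate_paragraphs(sample):
        print("Duplicate:", paragraph)
```

To run it against real output, read the generated markdown file into a string first. Chinese paragraphs use fewer characters than their English translations, so a lower `min_length` may be needed to catch them.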

Files Modified

  • app.py - Save bilingual file, use Qwen2.5
  • document_processing_agent.py - Skip PDF/bilingual processing, Message section fix

Status: ✅ All fixes applied and committed