Fixes Summary: Bilingual File and Message Section Issues
Date: 2025-11-12
Issues Fixed: Bilingual file persistence, Message section content, Content duplication
Issues Identified
Bilingual file not saved to current directory
- File was created in temp directory but not copied to current directory
- File was lost after temp directory cleanup
Message section not appearing correctly
- Bilingual file path wasn't being found correctly
- Message section was empty or had wrong content
Content duplication
- PDF content was being processed as a document, causing duplication
- Bilingual file content was mixed with extracted PDF content
- Same content appeared in multiple sections
Qwen2.5 not used in translate_document
- translate_document() was creating DocumentProcessingAgent without use_qwen_translation=True
Fixes Applied
1. Save Bilingual File to Current Directory ✅
File: app.py
Change:
# Copy bilingual file to current directory for persistence and easy access
bilingual_filename = os.path.basename(bilingual_path_temp)
bilingual_path = bilingual_filename # Save in current directory
shutil.copy2(bilingual_path_temp, bilingual_path)
progress(0.5, desc=f"💾 Saved bilingual translation to {bilingual_filename}...")
Result: Bilingual file is now saved to current directory (e.g., test_sermon_bilingual.txt)
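The copy step above can be wrapped in a small helper. This is a minimal sketch, not the code in app.py: the function name persist_bilingual_file and its dest_dir parameter are hypothetical and introduced only for illustration. Only shutil.copy2 and os.path.basename are taken from the change shown above.

```python
import os
import shutil
import tempfile


def persist_bilingual_file(temp_path: str, dest_dir: str = ".") -> str:
    """Copy a file out of a temp directory and return the new path.

    Hypothetical helper: the name and the dest_dir parameter are assumptions.
    """
    if not os.path.isfile(temp_path):
        raise FileNotFoundError(f"Bilingual file not found: {temp_path}")
    dest_path = os.path.join(dest_dir, os.path.basename(temp_path))
    shutil.copy2(temp_path, dest_path)  # copy2 also tries to preserve metadata
    return dest_path


if __name__ == "__main__":
    # Show that the copy survives cleanup of the temp directory.
    with tempfile.TemporaryDirectory() as out_dir:
        with tempfile.TemporaryDirectory() as tmp_dir:
            src = os.path.join(tmp_dir, "test_sermon_bilingual.txt")
            with open(src, "w", encoding="utf-8") as fh:
                fh.write("example\n")
            saved = persist_bilingual_file(src, out_dir)
        # tmp_dir has been removed at this point; the copy is still present.
        print(os.path.exists(src), os.path.exists(saved))
```

Raising early when the source file is missing makes the "file was lost" failure mode visible at the copy step rather than later, when the Message section is built.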
2. Use Qwen2.5 in translate_document ✅
File: app.py
Change:
# Initialize processor with Qwen2.5 translation enabled
processor = DocumentProcessingAgent(GEMMA_BACKEND_URL, use_qwen_translation=True)
Result: Translation now uses Qwen2.5 by default
3. Prevent PDF and Bilingual File Duplication ✅
File: document_processing_agent.py
Change:
async def process_documents(self, document_paths: List[str]) -> List[DocumentContent]:
    """Process multiple documents and extract structured content"""
    results = []
    for doc_path in document_paths:
        # Skip bilingual text files - they're handled separately for Message section
        if doc_path and isinstance(doc_path, str) and doc_path.endswith('_bilingual.txt'):
            continue
        # Skip PDF files - they're only used for date extraction, not content extraction
        # PDF content should not be processed as it causes duplication
        if doc_path and isinstance(doc_path, str) and doc_path.lower().endswith('.pdf'):
            continue
        # ... process other documents
Result:
- PDF files are skipped during document processing (only used for date extraction)
- Bilingual files are skipped during document processing (handled separately)
- No duplication from PDF or bilingual file content
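The two skip checks above can be exercised in isolation. The sketch below pulls them into a standalone predicate so the filtering behavior is easy to test. The names should_skip and filter_documents are hypothetical and do not appear in document_processing_agent.py. As in the original checks, the bilingual suffix match is case-sensitive while the PDF match is not, and non-string entries are left for later processing to handle.

```python
from typing import Iterable, List, Optional


def should_skip(doc_path: Optional[str]) -> bool:
    """Return True for paths excluded from content extraction.

    Hypothetical helper mirroring the two checks in process_documents.
    """
    if not doc_path or not isinstance(doc_path, str):
        return False  # not skipped here; later processing decides what to do
    if doc_path.endswith("_bilingual.txt"):
        return True  # bilingual files feed the Message section only
    return doc_path.lower().endswith(".pdf")  # PDFs are used for the date only


def filter_documents(document_paths: Iterable[Optional[str]]) -> List[Optional[str]]:
    """Keep only the paths that should go through content extraction."""
    return [p for p in document_paths if not should_skip(p)]


if __name__ == "__main__":
    paths = ["notes.docx", "sermon_bilingual.txt", "Bulletin.PDF", "agenda.txt"]
    print(filter_documents(paths))  # ['notes.docx', 'agenda.txt']
```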
4. Message Section Uses Only Bilingual Content ✅
File: document_processing_agent.py
Change:
# Replace Message section with Bilingual Document Translation
# Load bilingual document and format it - this is the ONLY source for Message section
bilingual_content = self._load_bilingual_document(document_sources)
messages_formatted = "Sermon message to be prepared"
if bilingual_content and bilingual_content.strip():
    # ... format bilingual content ...
    messages_formatted = f"""*Date: {formatted_date}*
{bilingual_text}"""
else:
    # No bilingual document available - use fallback message
    # Don't use aggregated_content.get('messages') to avoid duplication from PDF processing
    messages_formatted = "Sermon message to be prepared"
Result:
- Message section ONLY uses bilingual file content
- No mixing with extracted PDF content
- No duplication
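The fallback logic above can be sketched as a pure function, which makes the "bilingual content or placeholder, never PDF text" rule easy to check. This is an illustrative sketch only: build_message_section and FALLBACK_MESSAGE are hypothetical names, and the formatting step that the original elides is reduced here to a plain strip(), which is an assumption.

```python
from typing import Optional

# Placeholder text taken from the change above.
FALLBACK_MESSAGE = "Sermon message to be prepared"


def build_message_section(bilingual_content: Optional[str], formatted_date: str) -> str:
    """Build the Message section from bilingual content only.

    Hypothetical helper: the real formatting step is elided in the source,
    so this sketch just strips surrounding whitespace.
    """
    if bilingual_content and bilingual_content.strip():
        return f"*Date: {formatted_date}*\n{bilingual_content.strip()}"
    # No bilingual text: use the placeholder instead of other extracted content.
    return FALLBACK_MESSAGE


if __name__ == "__main__":
    print(build_message_section("你好\nHello", "November 9, 2025"))
    print(build_message_section("   ", "November 9, 2025"))
```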
Expected Behavior After Fixes
File Flow
- DOCX Upload → Extract content
- Translation → Create {docx_name}_bilingual.txt in temp directory
- Copy to Current Directory → Save {docx_name}_bilingual.txt to current directory
- Generate Program → Use bilingual file for Message section, PDF for date only
Message Section Content
- Source: Only from {docx_name}_bilingual.txt
- Format:
## Message
*Date: November 9, 2025*
[Chinese paragraph 1]
[English translation 1]
[Chinese paragraph 2]
[English translation 2]
...
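The alternating Chinese/English layout described above could be produced along these lines. This is a sketch under stated assumptions: render_message is a hypothetical name, and it assumes the Chinese paragraphs and their translations are already available as two equal-length sequences, which the source does not specify.

```python
from typing import List, Sequence


def render_message(date_text: str, chinese: Sequence[str], english: Sequence[str]) -> str:
    """Render a Message section with each paragraph followed by its translation.

    Hypothetical helper; the paired-sequence input is an assumption.
    """
    if len(chinese) != len(english):
        raise ValueError("Each Chinese paragraph needs exactly one translation")
    blocks: List[str] = ["## Message", f"*Date: {date_text}*"]
    for zh, en in zip(chinese, english):
        blocks.extend([zh, en])  # original paragraph first, translation second
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_message("November 9, 2025", ["你好"], ["Hello"]))
```

Rejecting mismatched lengths up front avoids silently dropping a paragraph, which zip would otherwise do.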
No Duplication
- ✅ PDF content not processed as document
- ✅ Bilingual file not processed as document
- ✅ Message section only uses bilingual file
- ✅ No duplicate content in other sections
Testing
To verify fixes:
Check bilingual file exists:
ls -la *_bilingual.txt
Check Message section:
- Open generated markdown file
- Verify Message section contains bilingual content
- Verify no duplication from PDF
Check no duplication:
- Verify content doesn't appear multiple times
- Verify PDF content not in Message section
- Verify bilingual content only in Message section
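The duplication checks above can be partly automated. The sketch below flags any paragraph that appears more than once in a generated markdown file. The function name find_duplicate_paragraphs and the min_length threshold are assumptions made for illustration, and the approach only catches exact repeats, not near-duplicates.

```python
from collections import Counter
from typing import Dict


def find_duplicate_paragraphs(markdown_text: str, min_length: int = 40) -> Dict[str, int]:
    """Map each repeated paragraph to the number of times it occurs.

    Hypothetical helper: paragraphs are split on blank lines, and short ones
    (headings, dates) are ignored via min_length to reduce false positives.
    """
    paragraphs = [p.strip() for p in markdown_text.split("\n\n")]
    counts = Counter(p for p in paragraphs if len(p) >= min_length)
    return {p: n for p, n in counts.items() if n > 1}


if __name__ == "__main__":
    repeated = "This paragraph is long enough to be counted by the check."
    sample = "\n\n".join(["## Message", repeated, "## Notes", repeated])
    for paragraph, count in find_duplicate_paragraphs(sample).items():
        print(f"{count}x: {paragraph[:60]}")
```

To use it on real output, read the generated markdown file with open(..., encoding="utf-8") and pass its contents to the function; an empty result means no exact paragraph repeats were found.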
Files Modified
- app.py - Save bilingual file, use Qwen2.5
- document_processing_agent.py - Skip PDF/bilingual processing, Message section fix
Status: ✅ All fixes applied and committed