Space: NextDrought/worship-agent


DynamicPacific committed on Nov 18, 2025

Commit

7ca4566

1 Parent(s): c20ee5f

Deploy worship program generator application to HF Space

Files changed (16)

README.md +241 -5
agents/__init__.py +11 -0
agents/tools.py +213 -0
agents/worship_agent.py +201 -0
app.py +295 -0
core/__init__.py +10 -0
core/document_processor.py +284 -0
examples/README.md +214 -0
examples/sample_chinese_sermon.txt +61 -0
llm/__init__.py +12 -0
llm/prompt_templates.py +166 -0
llm/qwen_client.py +218 -0
requirements.txt +40 -0
utils/__init__.py +12 -0
utils/file_utils.py +75 -0
utils/markdown_to_docx.py +100 -0

README.md CHANGED

@@ -1,12 +1,248 @@
 ---
-title: Worship Agent
-emoji: 👁
-colorFrom: yellow
 colorTo: purple
 sdk: gradio
-sdk_version: 5.49.1
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Worship Program Generator
+emoji: 🙏
+colorFrom: blue
 colorTo: purple
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: mit
+python_version: "3.10"
+suggested_hardware: cpu-basic
 ---
+# 🙏 主日崇拜程序生成器 Worship Program Generator
+**Generate bilingual (Chinese-English) worship programs automatically from multiple source documents.**
+This AI-powered tool helps church staff create comprehensive worship programs by:
+- Extracting content from worship bulletins (PDF)
+- Generating sermon narratives from slide presentations (PDF)
+- Translating between Chinese and English
+- Assembling complete bilingual worship programs
+## ✨ Features
+### 📄 Multi-Source Input Support
+- **Chinese Sermon Text**: Upload pre-written sermon manuscripts (.txt)
+- **Sermon Slides PDF**: Generate flowing narratives from bullet-point slides
+- **Worship Bulletin PDF**: Extract liturgical elements, hymns, scripture readings
+### 🤖 AI-Powered Processing
+- **Narrative Generation**: Convert sermon slides into cohesive sermon text
+- **Translation**: High-quality Chinese ↔ English translation preserving theological nuance
+- **Program Assembly**: Intelligently combine all elements into structured worship order
+### 📤 Output Formats
+- **Markdown**: Easy to edit and version control
+- **DOCX**: Ready for printing and distribution
+### 🌐 Bilingual Support
+- Seamless Chinese-English parallel text
+- Preserves cultural and theological context
+- Liturgical terminology handled appropriately
+## 🚀 Quick Start
+### Option A: Pre-Written Sermon
+1. Upload your **Chinese sermon text** (.txt file)
+2. Upload your **worship bulletin** (PDF)
+3. Enter the worship date
+4. Click "Generate Worship Program"
+### Option B: Generate from Slides
+1. Upload your **sermon slides** (PDF)
+2. Upload your **worship bulletin** (PDF)
+3. Enter the worship date
+4. Click "Generate Worship Program"
+The AI will:
+- Extract content from all sources
+- Generate narrative text (if needed)
+- Translate to target language
+- Assemble complete worship program
+- Export to Markdown and DOCX
+## 🛠️ Technical Details
+### LLM Backend
+- **Model**: Qwen 2.5-7B-Instruct (Alibaba Cloud)
+- **Deployment**: HuggingFace Inference API (serverless)
+- **Languages**: Optimized for Chinese and English
+### Document Processing
+- **PDF Extraction**: Text and image-based PDFs supported
+- **OCR**: Automatic OCR for scanned documents (Tesseract)
+- **Structure Detection**: Intelligent parsing of worship elements
+### Architecture
+```
+Input PDFs → Document Processor → LLM (Qwen) → Program Assembler → Output (MD/DOCX)
+```
+## 📋 Input Requirements
+### Chinese Sermon Text (Option A)
+- Format: Plain text (.txt)
+- Encoding: UTF-8
+- Recommended: Include paragraph breaks and section markers
+### Sermon Slides PDF (Option B)
+- Format: PDF
+- Content: Can be text-based or image-based (OCR supported)
+- Structure: Title slides, main points, scripture references
+### Worship Bulletin PDF (Required)
+- Format: PDF
+- Should include:
+  - Worship date
+  - Order of service
+  - Hymn numbers/titles
+  - Scripture readings
+  - Announcements
+## 📦 Project Structure
+```
+worship-program-agent/
+├── app.py                      # Gradio UI
+├── requirements.txt            # Dependencies
+├── README.md                   # This file
+├── .env.example               # Configuration template
+├── core/
+│   ├── document_processor.py  # PDF extraction & OCR
+│   ├── translator.py          # Translation logic
+│   ├── narrative_generator.py # Sermon generation
+│   └── program_assembler.py   # Final assembly
+├── agents/
+│   ├── worship_agent.py       # Workflow orchestration
+│   └── tools.py               # Agent tools
+├── llm/
+│   ├── qwen_client.py         # Qwen LLM wrapper
+│   └── prompt_templates.py    # System prompts
+├── utils/
+│   ├── file_utils.py          # File handling
+│   └── markdown_to_docx.py    # Format conversion
+└── examples/
+    ├── sample_sermon.txt
+    ├── sample_slides.pdf
+    └── sample_bulletin.pdf
+```
+## 🔧 Local Development
+### Prerequisites
+- Python 3.10+
+- Tesseract OCR (for scanned PDFs)
+- HuggingFace API token
+### Setup
+```bash
+# Clone the repository
+git clone <your-repo-url>
+cd worship-program-agent
+# Install dependencies
+pip install -r requirements.txt
+# Configure environment
+cp .env.example .env
+# Edit .env and add your HF_API_TOKEN
+# Install Tesseract (for OCR support)
+# Ubuntu/Debian:
+sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng
+# macOS:
+brew install tesseract tesseract-lang
+# Run locally
+python app.py
+```
+### Configuration
+Edit the `.env` file:
+```bash
+MODEL_ID=Qwen/Qwen2.5-7B-Instruct
+HF_API_TOKEN=your_token_here
+USE_LOCAL_MODEL=false
+OCR_LANGUAGES=eng+chi_sim
+```
+## 🌐 Deployment
+### HuggingFace Spaces
+This app is designed for HuggingFace Spaces deployment:
+1. **Create a new Space** on HuggingFace
+2. **Push this repository** to the Space
+3. **Set environment variables** in Space settings:
+   - `HF_API_TOKEN`: Your HuggingFace API token
+   - `MODEL_ID`: (Optional) Custom model selection
+4. **Select hardware**: `cpu-basic` (recommended for Inference API)
+The Space will automatically build and deploy.
+### Alternative: Local Model
+For faster inference, run the model on a local GPU:
+1. Set `suggested_hardware: t4-medium` in README metadata
+2. Set `USE_LOCAL_MODEL=true` in environment
+3. Uncomment `torch` in requirements.txt
+Note: Local model requires ~14GB GPU memory for Qwen 2.5-7B.
+## 📊 Performance
+### Typical Processing Time
+- **Bulletin extraction**: 2-5 seconds
+- **Sermon narrative generation**: 15-30 seconds
+- **Translation**: 10-20 seconds
+- **Program assembly**: 5-10 seconds
+- **Total**: 30-60 seconds (depending on content length)
+### API Costs (HF Inference API)
+- Free tier: 1,000 requests/month
+- Paid tier: ~$0.001-0.005 per request
+- Typical program generation: ~3-4 API calls
+## ⚠️ Limitations
+- **Maximum file size**: 20MB per upload
+- **PDF complexity**: Very complex layouts may require manual review
+- **OCR accuracy**: Scanned documents may have transcription errors
+- **Translation**: Review theological terms for accuracy
+- **Rate limits**: HF Inference API has rate limiting
+## 🤝 Contributing
+Contributions welcome! Areas for improvement:
+- Additional language pairs
+- Custom template support
+- Batch processing
+- Enhanced structure detection
+- Alternative LLM backends
+## 📄 License
+MIT License - see LICENSE file
+## 🙏 Acknowledgments
+- **Qwen Team** (Alibaba Cloud) - LLM model
+- **HuggingFace** - Inference infrastructure
+- **Gradio** - UI framework
+- **Tesseract** - OCR engine
+## 📞 Support
+For issues, questions, or feature requests, please open an issue on GitHub.
+---
+**Built with ❤️ for church communities**
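
Note on the README's Configuration section: the sketch below shows one way the four `.env` keys could be gathered into a typed object. It mirrors the parsing style `app.py` uses (`os.getenv` with string defaults), but the `Settings` dataclass and `load_settings` helper are hypothetical names that do not appear in this commit, and the `OCR_LANGUAGES` default is assumed from the `DocumentProcessor` constructor.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the keys documented in the README's .env block."""
    model_id: str
    hf_api_token: Optional[str]
    use_local_model: bool
    ocr_languages: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping (defaults to the process environment)."""
    return Settings(
        model_id=env.get("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        hf_api_token=env.get("HF_API_TOKEN"),
        # Any casing of "true" enables the local model; everything else disables it.
        use_local_model=env.get("USE_LOCAL_MODEL", "false").strip().lower() == "true",
        ocr_languages=env.get("OCR_LANGUAGES", "eng+chi_sim"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"USE_LOCAL_MODEL": "True"})
print(settings)
```

Taking the environment as a parameter makes the parsing easy to test without mutating `os.environ`.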

agents/__init__.py ADDED

	@@ -0,0 +1,11 @@

+"""
+Agent orchestration and workflow modules.
+"""
+from .worship_agent import WorshipProgramAgent
+from .tools import WorshipProgramTools
+__all__ = [
+    "WorshipProgramAgent",
+    "WorshipProgramTools",
+]

agents/tools.py ADDED

	@@ -0,0 +1,213 @@

+"""
+Tool functions for worship program generation workflow.
+"""
+from typing import Dict, List, Optional
+from pathlib import Path
+class WorshipProgramTools:
+    """Tool functions for worship program generation."""
+    def __init__(self, llm_client):
+        """
+        Initialize tools with LLM client.
+        Args:
+            llm_client: Instance of QwenClient or compatible LLM client
+        """
+        from core.document_processor import DocumentProcessor, ChineseTextProcessor
+        self.llm = llm_client
+        self.doc_processor = DocumentProcessor()
+        self.cn_processor = ChineseTextProcessor()
+    def extract_bulletin_tool(self, pdf_path: str) -> Dict:
+        """
+        Extract worship order and elements from bulletin PDF.
+        Args:
+            pdf_path: Path to bulletin PDF file
+        Returns:
+            {
+                "success": bool,
+                "data": Dict or None,
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            result = self.doc_processor.extract_bulletin_pdf(pdf_path)
+            return {
+                "success": True,
+                "data": result,
+                "message": f"Extracted bulletin content ({len(result.get('text', ''))} chars)"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Failed to extract bulletin: {str(e)}"
+            }
+    def generate_sermon_narrative_tool(self, slides_pdf_path: str) -> Dict:
+        """
+        Generate flowing sermon narrative from slide PDF.
+        Steps:
+        1. Extract text/images from slides
+        2. Identify structure (title, points, scriptures)
+        3. Generate cohesive narrative using LLM
+        Args:
+            slides_pdf_path: Path to sermon slides PDF
+        Returns:
+            {
+                "success": bool,
+                "narrative": str (if successful),
+                "structure": Dict,
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            # Extract slides
+            slides_data = self.doc_processor.extract_sermon_slides_pdf(slides_pdf_path)
+            # Format for LLM
+            slides_text = self._format_slides_for_generation(slides_data)
+            # Generate narrative
+            narrative = self.llm.generate_narrative(slides_text)
+            return {
+                "success": True,
+                "narrative": narrative,
+                "structure": slides_data["structure"],
+                "message": f"Generated sermon narrative ({len(narrative)} chars)"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Failed to generate sermon: {str(e)}"
+            }
+    def translate_text_tool(
+        self,
+        text: str,
+        source_lang: str = "Chinese",
+        target_lang: str = "English"
+    ) -> Dict:
+        """
+        Translate text between Chinese and English.
+        Args:
+            text: Source text
+            source_lang: Source language (Chinese/English)
+            target_lang: Target language (English/Chinese)
+        Returns:
+            {
+                "success": bool,
+                "translation": str (if successful),
+                "source_lang": str,
+                "target_lang": str,
+                "error": str (if failed)
+            }
+        """
+        try:
+            translation = self.llm.translate(text, source_lang, target_lang)
+            return {
+                "success": True,
+                "translation": translation,
+                "source_lang": source_lang,
+                "target_lang": target_lang,
+                "message": f"Translated {len(text)} chars"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Translation failed: {str(e)}"
+            }
+    def assemble_worship_program_tool(
+        self,
+        chinese_sermon: str,
+        english_sermon: str,
+        bulletin_data: Dict,
+        date: str
+    ) -> Dict:
+        """
+        Assemble complete bilingual worship program.
+        Args:
+            chinese_sermon: Chinese sermon text
+            english_sermon: English sermon translation
+            bulletin_data: Extracted bulletin data
+            date: Worship date (YYYY-MM-DD)
+        Returns:
+            {
+                "success": bool,
+                "program": str (markdown content if successful),
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            program_markdown = self.llm.assemble_program(
+                chinese_sermon=chinese_sermon,
+                english_sermon=english_sermon,
+                bulletin_content=bulletin_data.get("text", ""),
+                date=date
+            )
+            return {
+                "success": True,
+                "program": program_markdown,
+                "message": "Worship program assembled successfully"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Program assembly failed: {str(e)}"
+            }
+    def _format_slides_for_generation(self, slides_data: Dict) -> str:
+        """
+        Format extracted slides data for narrative generation.
+        Args:
+            slides_data: Output from extract_sermon_slides_pdf
+        Returns:
+            Formatted text for LLM input
+        """
+        lines = []
+        # Add structure summary
+        structure = slides_data.get("structure", {})
+        if structure.get("title"):
+            lines.append(f"# {structure['title']}\n")
+        # Add slides content
+        for slide in slides_data.get("slides", []):
+            text = slide["text"].strip()
+            if not text:
+                continue
+            if slide["is_title"]:
+                lines.append(f"## {text}")
+            elif slide["is_scripture"]:
+                lines.append(f"**Scripture:** {text}")
+            else:
+                lines.append(text)
+            lines.append("")  # Add spacing
+        return "\n".join(lines)
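
Note on `_format_slides_for_generation`: to make the LLM input format concrete, here is a standalone sketch that reproduces the method's logic as a free function and runs it on invented sample data. The `format_slides` name and the sample slides are illustrative only. The dictionary keys (`structure`, `slides`, `text`, `is_title`, `is_scripture`) are the ones the method reads in this commit.

```python
from typing import Dict


def format_slides(slides_data: Dict) -> str:
    """Standalone copy of the formatting logic, for illustration."""
    lines = []
    title = slides_data.get("structure", {}).get("title")
    if title:
        lines.append(f"# {title}\n")
    for slide in slides_data.get("slides", []):
        text = slide["text"].strip()
        if not text:
            continue  # Blank slides contribute nothing to the prompt.
        if slide["is_title"]:
            lines.append(f"## {text}")
        elif slide["is_scripture"]:
            lines.append(f"**Scripture:** {text}")
        else:
            lines.append(text)
        lines.append("")  # Blank line between slides.
    return "\n".join(lines)


# Invented sample input: one title slide, one blank slide, one scripture, one body slide.
sample = {
    "structure": {"title": "Grace"},
    "slides": [
        {"text": "Grace", "is_title": True, "is_scripture": False},
        {"text": "   ", "is_title": False, "is_scripture": False},
        {"text": "John 3:16", "is_title": False, "is_scripture": True},
        {"text": "God loves the world", "is_title": False, "is_scripture": False},
    ],
}
formatted = format_slides(sample)
print(formatted)
```

Because the method indexes `slide["is_title"]` and `slide["is_scripture"]` directly, a slide dictionary missing either key would raise `KeyError`; using `.get()` with a `False` default would be a more forgiving alternative.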

agents/worship_agent.py ADDED

	@@ -0,0 +1,201 @@

+"""
+Main workflow orchestration agent for worship program generation.
+"""
+from typing import Dict, Optional, Callable
+from pathlib import Path
+from agents.tools import WorshipProgramTools
+class WorshipProgramAgent:
+    """
+    Orchestrates worship program generation workflow.
+    Workflow:
+    1. Extract bulletin (worship order, hymns, date)
+    2. Process Chinese sermon OR generate from slides
+    3. Translate sermon to English
+    4. Assemble complete bilingual program
+    5. Export to markdown and DOCX
+    """
+    def __init__(
+        self,
+        llm_client,
+        output_dir: str = "./outputs"
+    ):
+        """
+        Initialize worship program agent.
+        Args:
+            llm_client: Instance of QwenClient or compatible LLM
+            output_dir: Directory for output files
+        """
+        self.llm = llm_client
+        self.tools = WorshipProgramTools(llm_client)
+        self.output_dir = Path(output_dir)
+        self.output_dir.mkdir(exist_ok=True, parents=True)
+    def generate_program(
+        self,
+        chinese_sermon_text: Optional[str] = None,
+        sermon_slides_pdf: Optional[str] = None,
+        bulletin_pdf: Optional[str] = None,
+        date: Optional[str] = None,
+        progress_callback: Optional[Callable] = None
+    ) -> Dict:
+        """
+        Main workflow to generate worship program.
+        Args:
+            chinese_sermon_text: Pre-written Chinese sermon (if available)
+            sermon_slides_pdf: Sermon slides PDF (if sermon needs generation)
+            bulletin_pdf: Worship bulletin PDF (required)
+            date: Worship date (auto-extracted if not provided)
+            progress_callback: Function(progress: float, desc: str) to report progress
+        Returns:
+            {
+                "success": bool,
+                "markdown_path": str (if successful),
+                "docx_path": str (if successful),
+                "program_content": str,
+                "metadata": Dict,
+                "error": str (if failed)
+            }
+        """
+        def update_progress(step: str, pct: float):
+            """Helper to update progress."""
+            if progress_callback:
+                progress_callback(pct, desc=step)
+        try:
+            # Validation
+            if not bulletin_pdf:
+                raise ValueError("Bulletin PDF is required")
+            if not chinese_sermon_text and not sermon_slides_pdf:
+                raise ValueError("Must provide either chinese_sermon_text or sermon_slides_pdf")
+            # Step 1: Extract bulletin
+            update_progress("📄 Extracting bulletin...", 0.1)
+            bulletin_result = self.tools.extract_bulletin_tool(bulletin_pdf)
+            if not bulletin_result["success"]:
+                raise ValueError(f"Bulletin extraction failed: {bulletin_result.get('error', 'Unknown error')}")
+            bulletin_data = bulletin_result["data"]
+            if not date:
+                date = bulletin_data.get("date") or "未标注日期"  # Fallback label: "date not specified"
+            # Step 2: Get/Generate Chinese sermon
+            update_progress("📝 Processing sermon...", 0.3)
+            if chinese_sermon_text:
+                chinese_sermon = chinese_sermon_text
+            else:
+                sermon_result = self.tools.generate_sermon_narrative_tool(sermon_slides_pdf)
+                if not sermon_result["success"]:
+                    raise ValueError(f"Sermon generation failed: {sermon_result.get('error', 'Unknown error')}")
+                chinese_sermon = sermon_result["narrative"]
+            # Step 3: Translate to English
+            update_progress("🌐 Translating sermon...", 0.5)
+            translation_result = self.tools.translate_text_tool(
+                text=chinese_sermon,
+                source_lang="Chinese",
+                target_lang="English"
+            )
+            if not translation_result["success"]:
+                raise ValueError(f"Translation failed: {translation_result.get('error', 'Unknown error')}")
+            english_sermon = translation_result["translation"]
+            # Step 4: Assemble program
+            update_progress("📋 Assembling worship program...", 0.7)
+            program_result = self.tools.assemble_worship_program_tool(
+                chinese_sermon=chinese_sermon,
+                english_sermon=english_sermon,
+                bulletin_data=bulletin_data,
+                date=date
+            )
+            if not program_result["success"]:
+                raise ValueError(f"Program assembly failed: {program_result.get('error', 'Unknown error')}")
+            program_markdown = program_result["program"]
+            # Step 5: Save outputs
+            update_progress("💾 Saving files...", 0.9)
+            markdown_path = self._save_markdown(program_markdown, date)
+            docx_path = self._save_docx(program_markdown, date)
+            update_progress("✅ Complete!", 1.0)
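+            return {
+                "success": True,
+                "markdown_path": str(markdown_path),
+                # _save_docx returns None on failure; avoid turning that into the string "None"
+                "docx_path": str(docx_path) if docx_path else None,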
+                "program_content": program_markdown,
+                "metadata": {
+                    "date": date,
+                    "chinese_sermon_length": len(chinese_sermon),
+                    "english_sermon_length": len(english_sermon),
+                    "bulletin_source": bulletin_pdf,
+                    "sermon_source": "text" if chinese_sermon_text else "slides"
+                }
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Workflow failed: {str(e)}"
+            }
+    def _save_markdown(self, content: str, date: str) -> Path:
+        """
+        Save program as markdown.
+        Args:
+            content: Markdown content
+            date: Date string for filename
+        Returns:
+            Path to saved file
+        """
+        # Sanitize date for filename
+        safe_date = date.replace("/", "-").replace(" ", "_")
+        filename = f"worship_program_{safe_date}.md"
+        filepath = self.output_dir / filename
+        with open(filepath, "w", encoding="utf-8") as f:
+            f.write(content)
+        return filepath
+    def _save_docx(self, markdown_content: str, date: str) -> Optional[Path]:
+        """
+        Convert markdown to DOCX and save.
+        Args:
+            markdown_content: Markdown content
+            date: Date string for filename
+        Returns:
+            Path to saved DOCX file, or None if conversion fails
+        """
+        from utils.markdown_to_docx import markdown_to_docx
+        safe_date = date.replace("/", "-").replace(" ", "_")
+        filename = f"worship_program_{safe_date}.docx"
+        filepath = self.output_dir / filename
+        try:
+            markdown_to_docx(markdown_content, str(filepath))
+        except Exception as e:
+            print(f"Warning: DOCX conversion failed: {e}")
+            # Return None if conversion fails
+            return None
+        return filepath
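
Note on filename sanitization: `_save_markdown` and `_save_docx` replace only `/` and spaces in the date, yet the date comes from free-form user input or the bulletin filename. The sketch below is one possible stricter variant, not the commit's method. The `safe_date_for_filename` helper and the `"undated"` fallback are hypothetical.

```python
import re


def safe_date_for_filename(date: str) -> str:
    """Reduce a free-form date string to characters that are safe in a filename."""
    cleaned = date.strip().replace("/", "-")
    # Collapse every run of characters other than word characters and hyphens.
    # In Python 3, \w matches Unicode letters, so Chinese text is preserved.
    cleaned = re.sub(r"[^\w\-]+", "_", cleaned)
    return cleaned or "undated"


examples = ["2025/11/18", "Nov 18, 2025", "未标注日期", "../secret", ""]
results = {raw: safe_date_for_filename(raw) for raw in examples}
for raw, cleaned in results.items():
    print(repr(raw), "->", repr(cleaned))
```

The practical difference is that dots, colons, commas, and backslashes are also replaced, so an input such as `../secret` cannot steer the output path outside `output_dir`.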

app.py ADDED

	@@ -0,0 +1,295 @@

+"""
+Worship Program Generator - Gradio Application
+Bilingual (Chinese-English) worship program generation from multiple sources.
+"""
+import os
+import gradio as gr
+from pathlib import Path
+from dotenv import load_dotenv
+# Load environment variables
+load_dotenv()
+# Configuration
+MODEL_ID = os.getenv("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct")
+HF_API_TOKEN = os.getenv("HF_API_TOKEN")
+USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "false").lower() == "true"
+MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "20"))
+# Initialize LLM and Agent
+from llm.qwen_client import QwenClient
+from agents.worship_agent import WorshipProgramAgent
+print(f"Initializing with model: {MODEL_ID}")
+print(f"Using local model: {USE_LOCAL_MODEL}")
+try:
+    llm_client = QwenClient(
+        model_id=MODEL_ID,
+        api_token=HF_API_TOKEN,
+        use_local=USE_LOCAL_MODEL
+    )
+    agent = WorshipProgramAgent(llm_client, output_dir="./outputs")
+    print("✓ Agent initialized successfully")
+except Exception as e:
+    print(f"✗ Error initializing agent: {e}")
+    llm_client = None
+    agent = None
+def process_worship_program(
+    chinese_sermon_file,
+    sermon_slides_file,
+    bulletin_file,
+    worship_date,
+    progress=gr.Progress()
+):
+    """
+    Main Gradio handler for worship program generation.
+    Args:
+        chinese_sermon_file: Uploaded .txt file with Chinese sermon (optional)
+        sermon_slides_file: Uploaded sermon slides PDF (optional)
+        bulletin_file: Uploaded bulletin PDF (required)
+        worship_date: Date string (YYYY-MM-DD)
+        progress: Gradio progress tracker
+    Returns:
+        (status_message, markdown_file, docx_file)
+    """
+    if agent is None:
+        return "❌ Error: Agent not initialized. Check configuration.", None, None
+    # Validation
+    if not bulletin_file:
+        return "❌ Error: Bulletin PDF is required", None, None
+    if not chinese_sermon_file and not sermon_slides_file:
+        return "❌ Error: Must provide either Chinese sermon text OR sermon slides PDF", None, None
+    # Check file sizes
+    try:
+        if bulletin_file and Path(bulletin_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
+            return f"❌ Error: Bulletin file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
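+        if sermon_slides_file and Path(sermon_slides_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
+            return f"❌ Error: Slides file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
+        # The limit is documented as per upload, so check the sermon text file too
+        if chinese_sermon_file and Path(chinese_sermon_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
+            return f"❌ Error: Sermon text file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None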
+    except Exception as e:
+        return f"❌ Error checking file sizes: {str(e)}", None, None
+    try:
+        # Read Chinese sermon if provided
+        chinese_text = None
+        if chinese_sermon_file:
+            try:
+                with open(chinese_sermon_file, "r", encoding="utf-8") as f:
+                    chinese_text = f.read()
+            except UnicodeDecodeError:
+                # Try GB2312/GBK encoding
+                with open(chinese_sermon_file, "r", encoding="gbk") as f:
+                    chinese_text = f.read()
+        # Generate program
+        result = agent.generate_program(
+            chinese_sermon_text=chinese_text,
+            sermon_slides_pdf=sermon_slides_file,
+            bulletin_pdf=bulletin_file,
+            date=worship_date if worship_date else None,
+            progress_callback=lambda pct, desc: progress(pct, desc=desc)
+        )
+        if not result["success"]:
+            error_msg = result.get("message", "Unknown error")
+            return f"❌ Error: {error_msg}", None, None
+        # Format success message
+        metadata = result["metadata"]
+        status = f"""✅ **Worship Program Generated Successfully!**
+**📅 Date:** {metadata['date']}
+**📊 Statistics:**
+- Chinese Sermon: {metadata['chinese_sermon_length']:,} characters
+- English Sermon: {metadata['english_sermon_length']:,} characters
+- Source: {"Pre-written text" if metadata['sermon_source'] == 'text' else "Generated from slides"}
+**📁 Output Files:**
+- Markdown: `{Path(result['markdown_path']).name}`
+- DOCX: `{Path(result['docx_path']).name if result.get('docx_path') else 'Not generated'}`
+Download the files below ⬇️
+"""
+        # Return paths for download
+        markdown_file = result["markdown_path"] if Path(result["markdown_path"]).exists() else None
+        docx_file = result["docx_path"] if result.get("docx_path") and Path(result["docx_path"]).exists() else None
+        return status, markdown_file, docx_file
+    except Exception as e:
+        import traceback
+        error_msg = f"❌ **Error:**\n\n{str(e)}\n\n<details>\n<summary>Traceback</summary>\n\n```\n{traceback.format_exc()}\n```\n</details>"
+        return error_msg, None, None
+# Gradio Interface
+with gr.Blocks(
+    title="Worship Program Generator",
+    theme=gr.themes.Soft(),
+    css="""
+    .title { text-align: center; font-size: 2em; margin-bottom: 1em; }
+    .subtitle { text-align: center; color: #666; margin-bottom: 2em; }
+    """
+) as demo:
+    gr.Markdown(
+        """
+        <div class="title">🙏 主日崇拜程序生成器</div>
+        <div class="title">Worship Program Generator</div>
+        <div class="subtitle">Generate bilingual worship programs from multiple sources</div>
+        """,
+        elem_classes=["title"]
+    )
+    gr.Markdown("""
+    ### 📖 How to Use
+    **Required:** Worship Bulletin PDF
+    **Choose ONE:** Chinese sermon text OR sermon slides PDF
+    The system will:
+    1. Extract content from all sources
+    2. Generate narrative (if using slides)
+    3. Translate between languages
+    4. Assemble complete bilingual program
+    5. Export to Markdown and DOCX
+    """)
+    with gr.Row():
+        with gr.Column(scale=1):
+            gr.Markdown("### 📤 Input Files")
+            chinese_sermon_input = gr.File(
+                label="📝 Chinese Sermon Text (中文讲章) - Optional",
+                file_types=[".txt"],
+                type="filepath"
+            )
+            slides_input = gr.File(
+                label="🖼️ Sermon Slides PDF (讲章幻灯片) - Optional",
+                file_types=[".pdf"],
+                type="filepath"
+            )
+            bulletin_input = gr.File(
+                label="📋 Worship Bulletin PDF (崇拜程序单) - Required ⭐",
+                file_types=[".pdf"],
+                type="filepath"
+            )
+            date_input = gr.Textbox(
+                label="📅 Worship Date (YYYY-MM-DD)",
+                placeholder="2024-01-07 (leave blank to auto-detect)",
+                value=""
+            )
+            generate_btn = gr.Button(
+                "🚀 Generate Worship Program",
+                variant="primary",
+                size="lg"
+            )
+        with gr.Column(scale=1):
+            gr.Markdown("### 📥 Output")
+            status_output = gr.Markdown("💡 Ready to generate...")
+            markdown_download = gr.File(
+                label="📄 Download Markdown (.md)",
+                interactive=False
+            )
+            docx_download = gr.File(
+                label="📄 Download DOCX (.docx)",
+                interactive=False
+            )
+    # Usage Guide
+    with gr.Accordion("📚 Usage Guide & Tips", open=False):
+        gr.Markdown("""
+        ### Workflow Options
+        **Option A: Pre-written Sermon**
+        1. Upload Chinese sermon text file (.txt, UTF-8 encoding)
+        2. Upload worship bulletin PDF
+        3. Enter date (or leave blank)
+        4. Click Generate
+        **Option B: Generate from Slides**
+        1. Upload sermon slides PDF (can be text or image-based)
+        2. Upload worship bulletin PDF
+        3. Enter date (or leave blank)
+        4. Click Generate (AI will create narrative from slides)
+        ### Tips
+        - **Date Detection:** Leave blank to auto-extract from bulletin filename (format: `bulletin-YYYY-MM-DD.pdf`)
+        - **File Encoding:** Chinese text files should be UTF-8 or GBK encoded
+        - **PDF Support:** Both text-based and scanned (OCR) PDFs are supported
+        - **Processing Time:** Typically 30-60 seconds depending on content length
+        - **File Size Limit:** Maximum 20MB per file
+        ### Troubleshooting
+        - **OCR Issues:** Ensure bulletin text is clear and high-resolution
+        - **Translation Quality:** Review theological terms for accuracy
+        - **Missing Content:** Check that PDFs contain expected sections
+        - **Encoding Errors:** Save Chinese text as UTF-8
+        ### Output Format
+        The generated worship program includes:
+        - Bilingual header with date and theme
+        - Order of worship (prelude, songs, scripture)
+        - Complete sermon (Chinese + English)
+        - Liturgical elements
+        - Announcements
+        Both Markdown (.md) and Word (.docx) formats are provided.
+        """)
+    # Event handlers
+    generate_btn.click(
+        fn=process_worship_program,
+        inputs=[
+            chinese_sermon_input,
+            slides_input,
+            bulletin_input,
+            date_input
+        ],
+        outputs=[
+            status_output,
+            markdown_download,
+            docx_download
+        ],
+        show_progress=True
+    )
+    gr.Markdown("""
+    ---
+    **🤖 Powered by:** Qwen 2.5 LLM | **📦 Framework:** HuggingFace Transformers | **🎨 UI:** Gradio
+    Built with ❤️ for church communities
+    """)
+if __name__ == "__main__":
+    demo.queue(
+        max_size=int(os.getenv("GRADIO_MAX_QUEUE_SIZE", "10"))
+    ).launch(
+        server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
+        server_port=int(os.getenv("GRADIO_SERVER_PORT", "7860")),
+        share=os.getenv("GRADIO_SHARE", "false").lower() == "true"
+    )

core/__init__.py ADDED

	@@ -0,0 +1,10 @@

+"""
+Core document processing and content generation modules.
+"""
+from .document_processor import DocumentProcessor, ChineseTextProcessor
+__all__ = [
+    "DocumentProcessor",
+    "ChineseTextProcessor",
+]

core/document_processor.py ADDED

	@@ -0,0 +1,284 @@

+"""
+Document processing module for PDF extraction and OCR.
+"""
+import pdfplumber
+import pytesseract
+from pdf2image import convert_from_path
+from PIL import Image
+from typing import Dict, List, Optional
+import re
+from pathlib import Path
+class DocumentProcessor:
+    """Extract text and structure from PDF documents."""
+    def __init__(self, ocr_languages: str = "eng+chi_sim"):
+        """
+        Initialize document processor.
+        Args:
+            ocr_languages: Tesseract language codes (e.g., "eng+chi_sim")
+        """
+        self.ocr_languages = ocr_languages
+    def extract_bulletin_pdf(self, pdf_path: str) -> Dict:
+        """
+        Extract worship order from bulletin PDF.
+        Args:
+            pdf_path: Path to bulletin PDF file
+        Returns:
+            {
+                "text": str,              # Full text content
+                "sections": {             # Structured sections
+                    "hymns": List[str],
+                    "scripture": str,
+                    "announcements": str,
+                    "order": List[str]
+                },
+                "date": str,              # Extracted date
+                "metadata": Dict
+            }
+        """
+        text = self.extract_with_structure(pdf_path)
+        date = self.extract_date_from_filename(pdf_path)
+        # TODO: Implement intelligent section parsing
+        sections = self._parse_bulletin_sections(text)
+        return {
+            "text": text,
+            "sections": sections,
+            "date": date,
+            "metadata": {
+                "filename": Path(pdf_path).name,
+                "page_count": self._get_page_count(pdf_path)
+            }
+        }
+    def extract_sermon_slides_pdf(self, pdf_path: str) -> Dict:
+        """
+        Extract sermon content from slides PDF.
+        Args:
+            pdf_path: Path to sermon slides PDF
+        Returns:
+            {
+                "slides": List[Dict],     # List of slide data
+                "structure": Dict         # Sermon structure
+            }
+        """
+        slides = []
+        with pdfplumber.open(pdf_path) as pdf:
+            for i, page in enumerate(pdf.pages):
+                text = page.extract_text() or ""
+                # If no text, try OCR
+                if len(text.strip()) < 10:
+                    text = self._ocr_page(pdf_path, i)
+                slide_data = {
+                    "page_num": i + 1,
+                    "text": text,
+                    "is_title": self._is_title_slide(text),
+                    "is_scripture": self._is_scripture_slide(text)
+                }
+                slides.append(slide_data)
+        structure = self._extract_sermon_structure(slides)
+        return {
+            "slides": slides,
+            "structure": structure
+        }
+    def extract_with_structure(self, pdf_path: str) -> str:
+        """
+        Extract text from PDF preserving structure.
+        Args:
+            pdf_path: Path to PDF file
+        Returns:
+            Extracted text with layout preserved
+        """
+        content = []
+        try:
+            with pdfplumber.open(pdf_path) as pdf:
+                for page in pdf.pages:
+                    text = page.extract_text(layout=True)
+                    if text:
+                        content.append(text)
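+        except Exception as e:
+            print(f"Error extracting PDF: {e}")
+            # Fallback to OCR. Note: _get_page_count also opens the file with
+            # pdfplumber, so if the PDF cannot be opened at all this list is empty.
+            content = [self._ocr_page(pdf_path, i) for i in range(self._get_page_count(pdf_path))]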
+        return "\n\n".join(content)
+    def _ocr_page(self, pdf_path: str, page_num: int) -> str:
+        """
+        OCR a single page from PDF.
+        Args:
+            pdf_path: Path to PDF
+            page_num: Page number (0-indexed)
+        Returns:
+            Extracted text from OCR
+        """
+        try:
+            images = convert_from_path(pdf_path, first_page=page_num+1, last_page=page_num+1)
+            if images:
+                return pytesseract.image_to_string(images[0], lang=self.ocr_languages)
+        except Exception as e:
+            print(f"OCR error on page {page_num}: {e}")
+        return ""
+    def _get_page_count(self, pdf_path: str) -> int:
+        """Get total page count from PDF."""
+        try:
+            with pdfplumber.open(pdf_path) as pdf:
+                return len(pdf.pages)
+        except Exception:
+            return 0
+    def extract_date_from_filename(self, pdf_path: str) -> str:
+        """
+        Extract date from PDF filename.
+        Looks for patterns like YYYY-MM-DD.
+        Args:
+            pdf_path: Path to PDF file
+        Returns:
+            Date string (YYYY-MM-DD) or empty string
+        """
+        filename = Path(pdf_path).name
+        match = re.search(r'(\d{4}-\d{2}-\d{2})', filename)
+        if match:
+            return match.group(1)
+        return ""
+    def _parse_bulletin_sections(self, text: str) -> Dict:
+        """Parse bulletin into structured sections."""
+        # TODO: Implement intelligent parsing
+        return {
+            "hymns": [],
+            "scripture": "",
+            "announcements": "",
+            "order": []
+        }
+    def _is_title_slide(self, text: str) -> bool:
+        """Detect if slide is a title slide."""
+        # Simple heuristic: short text, no bullet points
+        lines = text.strip().split('\n')
+        return len(lines) <= 3 and not any(line.strip().startswith(('•', '-', '*')) for line in lines)
+    def _is_scripture_slide(self, text: str) -> bool:
+        """Detect if slide contains scripture reference."""
+        # Look for common scripture patterns
+        scripture_patterns = [
+            r'[创出利民申].*\d+:\d+',  # Chinese books
+            r'[约太可路罗林加弗腓西帖提多门彼雅启].*\d+:\d+',
+            r'\b[A-Z][a-z]+\s+\d+:\d+',  # English books
+        ]
+        return any(re.search(pattern, text) for pattern in scripture_patterns)
+    def _extract_sermon_structure(self, slides: List[Dict]) -> Dict:
+        """Extract sermon structure from slides."""
+        structure = {
+            "title": "",
+            "main_points": [],
+            "scriptures": []
+        }
+        # Find title
+        for slide in slides:
+            if slide["is_title"]:
+                structure["title"] = slide["text"].strip()
+                break
+        # Find main points and scriptures
+        for slide in slides:
+            if slide["is_scripture"]:
+                structure["scriptures"].append(slide["text"].strip())
+            elif not slide["is_title"] and slide["text"].strip():
+                structure["main_points"].append(slide["text"].strip())
+        return structure
+class ChineseTextProcessor:
+    """Process and normalize Chinese text."""
+    @staticmethod
+    def normalize_text(text: str) -> str:
+        """
+        Normalize Chinese text.
+        - Fix punctuation
+        - Remove extra whitespace
+        - Standardize quotes
+        Args:
+            text: Input Chinese text
+        Returns:
+            Normalized text
+        """
+        # Remove extra whitespace
+        text = re.sub(r'\s+', ' ', text)
+        # Normalize punctuation
+        replacements = {
+            '，': '，',
+            '。': '。',
+            '！': '！',
+            '？': '？',
+            '：': '：',
+            '；': '；',
+            # Curly quotes -> straight quotes (escapes avoid look-alike glyphs)
+            '\u201c': '"',
+            '\u201d': '"',
+            '\u2018': "'",
+            '\u2019': "'",
+        }
+        for old, new in replacements.items():
+            text = text.replace(old, new)
+        return text.strip()
+    @staticmethod
+    def segment_sermon(text: str) -> Dict:
+        """
+        Segment Chinese sermon into logical sections.
+        Args:
+            text: Full sermon text
+        Returns:
+            {
+                "introduction": str,
+                "main_points": List[str],
+                "conclusion": str,
+                "scripture_references": List[str]
+            }
+        """
+        # TODO: Implement intelligent segmentation
+        # For now, return basic structure
+        return {
+            "introduction": "",
+            "main_points": [],
+            "conclusion": "",
+            "scripture_references": []
+        }
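`_parse_bulletin_sections` in this file is still a stub that returns empty values. As a rough sketch of what a first pass might look like, the standalone function below scans extracted bulletin text line by line. It assumes hymns are written with `#123`-style numbers, that the scripture line starts with 读经 or "Scripture", and that everything after a 报告事项 / "Announcements" heading is an announcement. The function name `parse_bulletin_sections_sketch` and all three conventions are illustrative assumptions, not part of this commit.

```python
import re
from typing import Dict, List

# Assumed conventions for this sketch (not taken from the commit)
HYMN_RE = re.compile(r"#\s*(\d+)")
SCRIPTURE_RE = re.compile(r"^(?:读经|Scripture)", re.IGNORECASE)
ANNOUNCE_RE = re.compile(r"^(?:报告事项|Announcements)", re.IGNORECASE)


def parse_bulletin_sections_sketch(text: str) -> Dict:
    """Hypothetical first-pass parser returning the same keys as the stub."""
    hymns: List[str] = []
    order: List[str] = []
    scripture = ""
    announcements: List[str] = []
    in_announcements = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ANNOUNCE_RE.match(line):
            in_announcements = True
            order.append(line)
            continue
        if in_announcements:
            # Everything after the announcements heading is treated as an announcement
            announcements.append(line.lstrip("-• ").strip())
            continue
        order.append(line)
        if HYMN_RE.search(line):
            hymns.append(line)
        if not scripture and SCRIPTURE_RE.match(line):
            scripture = line

    return {
        "hymns": hymns,
        "scripture": scripture,
        "announcements": "\n".join(announcements),
        "order": order,
    }
```

Real bulletins vary widely in layout, so a rule-based pass like this would probably serve best as a fallback or as a pre-filter before handing the text to the LLM.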

examples/README.md ADDED

	@@ -0,0 +1,214 @@

+# Example Files
+This directory contains sample input files to help you understand the expected format for the Worship Program Generator.
+## 📄 Files
+### 1. `sample_chinese_sermon.txt`
+**Type:** Pre-written Chinese sermon text
+**Encoding:** UTF-8
+**Use Case:** Option A - Upload as Chinese sermon text
+**Content:**
+- Complete sermon manuscript in Chinese
+- Includes: Title, Scripture reference, Introduction, Main points, Conclusion
+- Proper paragraph breaks and structure
+- Scripture references in Chinese format
+**How to Use:**
+1. Upload this file as "Chinese Sermon Text"
+2. Upload a bulletin PDF
+3. Click "Generate Worship Program"
+---
+### 2. `sample_slides.pdf` (Not included - Create your own)
+**Type:** Sermon slides presentation
+**Format:** PDF (text-based or image-based)
+**Use Case:** Option B - Generate narrative from slides
+**Expected Content:**
+- Title slide with sermon title
+- Main point slides (bullet points or short text)
+- Scripture reference slides
+- Can be PowerPoint/Keynote exported as PDF
+**How to Create:**
+1. Create a sermon presentation in PowerPoint/Keynote
+2. Export/Save as PDF
+3. Upload as "Sermon Slides PDF"
+---
+### 3. `sample_bulletin.pdf` (Not included - Create your own)
+**Type:** Worship bulletin
+**Format:** PDF
+**Use Case:** Required for all workflows
+**Expected Content:**
+- Worship date (preferably in filename: `bulletin-2024-01-07.pdf`)
+- Order of worship
+- Hymn numbers and titles
+- Scripture reading passages
+- Announcements
+- Any liturgical elements
+**Naming Convention:**
+- Recommended: `RCCA-worship-bulletin-YYYY-MM-DD.pdf`
+- Or: `bulletin-YYYY-MM-DD.pdf`
+- Date will be auto-extracted from filename
+---
+## 📝 File Format Guidelines
+### Chinese Sermon Text (.txt)
+```
+[Sermon Title in Chinese]
+经文：[Scripture Reference]
+[Introduction paragraph]
+一、[First Main Point]
+[Content for first point]
+二、[Second Main Point]
+[Content for second point]
+三、[Third Main Point]
+[Content for third point]
+[Conclusion]
+```
+**Tips:**
+- Use UTF-8 encoding
+- Include clear section markers (一、二、三 or I. II. III.)
+- Add paragraph breaks for readability
+- Include scripture references in Chinese format
+---
+### Sermon Slides PDF
+**Recommended Structure:**
+```
+Slide 1: Title
+  信心的旅程
+  Journey of Faith
+Slide 2: Scripture
+  创世记 12:1-9
+  Genesis 12:1-9
+Slide 3: Main Point 1
+  • 神的呼召
+  • God's Call
+  • [Key points]
+Slide 4: Main Point 2
+  • 神的应许
+  • God's Promise
+  • [Key points]
+Slide 5: Application
+  • 实践的教导
+  • Practical Teaching
+```
+**Tips:**
+- Keep text clear and readable
+- Use consistent formatting
+- Include both Chinese and English if bilingual
+- Avoid heavy graphics (focus on text content)
+---
+### Worship Bulletin PDF
+**Recommended Sections:**
+```
+主日崇拜程序
+Sunday Worship Service
+日期：2024年1月7日
+序乐 Prelude
+宣召 Call to Worship
+祷告 Prayer
+诗歌 Hymn #123
+读经 Scripture Reading: 创世记 12:1-9
+信息 Sermon: [Title]
+回应诗歌 Response Hymn #456
+奉献 Offering
+祝福 Benediction
+报告事项 Announcements
+- [Announcement 1]
+- [Announcement 2]
+```
+**Tips:**
+- Include date prominently
+- List hymns with numbers
+- Specify scripture passages
+- Keep format clean and structured
+---
+## 🧪 Testing the System
+### Quick Test Workflow
+1. **Prepare Files:**
+   - Chinese sermon text OR sermon slides PDF
+   - Worship bulletin PDF
+2. **Upload:**
+   - Go to the Worship Program Generator interface
+   - Upload your files
+   - Enter the worship date, or leave it blank to have it extracted from the bulletin filename
+3. **Generate:**
+   - Click "Generate Worship Program"
+   - Wait 30-60 seconds
+4. **Download:**
+   - Download both Markdown and DOCX versions
+   - Review for accuracy
+   - Edit as needed
+---
+## ⚠️ Common Issues
+### Encoding Errors
+- **Problem:** Chinese characters display incorrectly
+- **Solution:** Save text files with UTF-8 encoding
+### PDF Extraction Failures
+- **Problem:** Cannot extract text from PDF
+- **Solution:** Ensure the PDF is not password-protected; if extraction still fails, regenerate the PDF with a text layer
+### Missing Date
+- **Problem:** Date not auto-detected
+- **Solution:** Include date in filename or manually enter in the form
+### Translation Quality
+- **Problem:** Translation is awkward or inaccurate
+- **Solution:** Review and manually edit the output, especially theological terms
+---
+## 📧 Support
+For issues or questions:
+1. Check the troubleshooting section in the main README
+2. Review these example formats
+3. Open an issue on GitHub with sample files (anonymized)
+---
+**Note:** The actual PDF files (`sample_slides.pdf` and `sample_bulletin.pdf`) are not included in this repository. Please create your own based on the guidelines above, or use your church's existing files.
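The naming convention in this README relies on the `YYYY-MM-DD` pattern that `extract_date_from_filename` searches for. The sketch below applies the same regular expression to the recommended filenames. The `datetime.strptime` validation step and the function name `date_from_filename` are additions for illustration only; the committed method returns whatever the pattern matches.

```python
import re
from datetime import datetime
from pathlib import Path


def date_from_filename(pdf_path: str) -> str:
    """Return YYYY-MM-DD from the filename, or "" if absent or not a real date."""
    match = re.search(r"(\d{4}-\d{2}-\d{2})", Path(pdf_path).name)
    if not match:
        return ""
    try:
        # Validation is an addition in this sketch; it rejects e.g. 2024-13-40
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return ""
    return match.group(1)


for name in (
    "RCCA-worship-bulletin-2024-01-07.pdf",
    "bulletin-2024-01-07.pdf",
    "bulletin.pdf",
):
    print(name, "->", repr(date_from_filename(name)))
```

A filename without a date yields an empty string, which is the case where the date would need to be entered manually in the form.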

examples/sample_chinese_sermon.txt ADDED

	@@ -0,0 +1,61 @@

+信心的旅程：亚伯拉罕的呼召
+经文：创世记12:1-9
+引言：
+今天我们一起来思考信心的含义。当我们回顾圣经中伟大的信心榜样时，亚伯拉罕的名字总是首先浮现在我们的脑海中。他被称为"信心之父"，不是因为他从未怀疑，而是因为他在怀疑中仍然选择相信和顺服。
+一、神的呼召（12:1）
+"耶和华对亚伯兰说：你要离开本地、本族、父家，往我所要指示你的地去。"
+这是一个看似不合理的呼召。神要求亚伯拉罕离开他所熟悉的一切：
+- 离开本地：放弃安全的环境
+- 离开本族：放弃亲密的关系
+- 离开父家：放弃家族的产业
+更令人困惑的是，神并没有明确告诉他目的地在哪里，只说"往我所要指示你的地去"。这需要完全的信靠。
+在我们的生活中，神的呼召有时也是如此。祂可能要求我们离开舒适区，进入未知的领域。问题不在于我们是否感到害怕，而在于我们是否愿意顺服。
+二、神的应许（12:2-3）
+虽然神的要求看似严苛，但祂同时给予了宝贵的应许：
+1. "我必叫你成为大国" - 后裔的应许
+2. "我必赐福给你" - 福分的应许
+3. "叫你的名为大" - 名声的应许
+4. "你也要叫别人得福" - 使命的应许
+这些应许显明了神呼召的目的。神呼召我们，不仅仅是为了我们个人的益处，更是为了祂国度的计划。我们蒙福，是为了成为别人的祝福。
+三、信心的回应（12:4）
+"亚伯兰就照着耶和华的吩咐去了。"
+这简单的一句话，代表了巨大的信心行动。亚伯兰当时已经七十五岁，这个年纪通常是享受安逸的时候，但他选择了顺服。
+真正的信心不是停留在口头上，而是表现在行动中。雅各书2:26说："身体没有灵魂是死的，信心没有行为也是死的。"
+四、实践的教导
+1. **顺服需要勇气**
+   - 面对未知时，勇敢迈出第一步
+   - 相信神的引导胜过自己的计划
+2. **等候需要耐心**
+   - 神的应许不总是立即实现
+   - 在等候中继续信靠和顺服
+3. **祝福带来责任**
+   - 我们领受祝福，是为了传递祝福
+   - 神的恩典应该激励我们去服事他人
+结语：
+亚伯拉罕的信心之旅告诉我们，真正的信心是在不确定中仍然选择相信神。今天，神也在呼召我们，可能不是要我们离开物理上的家乡，但可能是要我们离开属灵上的舒适区。
+让我们像亚伯拉罕一样，勇敢地回应神的呼召，因为我们知道，那呼召我们的是信实的。
+愿神赐福我们每一个人，让我们在信心的旅程中不断成长。阿们。
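The sample sermon marks its sections with 引言：, numbered headings (一、二、三、四、), and 结语：. `ChineseTextProcessor.segment_sermon` is currently a stub, so the following is only a sketch of how those markers might drive segmentation. The function name `segment_sermon_sketch` and the assumption that every sermon follows this marker convention are hypothetical. Note also that `normalize_text` collapses all whitespace, including newlines, so a line-based approach like this one would need to run before normalization.

```python
import re
from typing import Dict, List

# Headings such as "一、神的呼召" at the start of a line (assumed convention)
POINT_RE = re.compile(r"^[一二三四五六七八九十]+、")
# Chapter:verse references such as 12:1 or 12:1-9
SCRIPTURE_RE = re.compile(r"\d+:\d+(?:-\d+)?")


def segment_sermon_sketch(text: str) -> Dict:
    """Hypothetical marker-based segmentation; returns the stub's keys."""
    intro: List[str] = []
    points: List[List[str]] = []
    conclusion: List[str] = []
    section = "intro"

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("结语"):
            section = "conclusion"
            continue
        if section != "conclusion" and POINT_RE.match(line):
            # A numbered heading opens a new main point
            section = "point"
            points.append([line])
            continue
        if section == "intro":
            intro.append(line)
        elif section == "point":
            points[-1].append(line)
        else:
            conclusion.append(line)

    return {
        "introduction": "\n".join(intro),
        "main_points": ["\n".join(p) for p in points],
        "conclusion": "\n".join(conclusion),
        "scripture_references": SCRIPTURE_RE.findall(text),
    }
```

In this sketch everything before the first numbered heading, including the title and the 经文 line, lands in `introduction`; splitting those out would need further rules.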

llm/__init__.py ADDED

	@@ -0,0 +1,12 @@

+"""
+LLM client and prompt management modules.
+"""
+from .qwen_client import QwenClient
+from .prompt_templates import SYSTEM_PROMPTS, TASK_PROMPTS
+__all__ = [
+    "QwenClient",
+    "SYSTEM_PROMPTS",
+    "TASK_PROMPTS",
+]

llm/prompt_templates.py ADDED

	@@ -0,0 +1,166 @@

+"""
+System prompts and task templates for LLM interactions.
+"""
+SYSTEM_PROMPTS = {
+    "worship_assembler": """You are a worship program coordinator for a bilingual Chinese-English church.
+Your task is to create well-structured, reverent worship programs that integrate:
+- Sermon content (Chinese with English translation)
+- Hymns and worship songs
+- Scripture readings
+- Liturgical elements (prayers, responsive readings)
+- Announcements
+Format output in clear markdown with bilingual sections. Maintain a reverent, professional tone.
+Preserve the theological content and ensure proper formatting for both languages.""",
+    "translator": """You are a professional translator specializing in religious texts, liturgy, and theology.
+Preserve:
+- Theological accuracy and terminology
+- Cultural and denominational sensitivity
+- Formatting and structure
+- Tone and register
+- Scripture references
+Maintain natural language flow in the target language while staying faithful to the source.""",
+    "narrative_generator": """You are a pastoral assistant helping prepare sermon manuscripts.
+Generate flowing, coherent sermon narratives from outlines and slides.
+Maintain:
+- Theological depth and accuracy
+- Pastoral and encouraging tone
+- Logical flow and transitions
+- Proper Chinese language style
+- Clear main points and application""",
+}
+TASK_PROMPTS = {
+    "assemble_program": """Create a complete bilingual worship program using these sources:
+**Sermon Narrative (Chinese):**
+{chinese_sermon}
+**Sermon Translation (English):**
+{english_sermon}
+**Bulletin (Worship Order):**
+{bulletin_content}
+**Date:** {date}
+Generate a complete worship program in markdown format with these sections:
+1. **Header** - Date, theme in both languages
+2. **Prelude/Welcome** - 序乐/欢迎
+3. **Worship Songs** - Include hymn numbers from bulletin
+4. **Scripture Reading** - 读经 with references
+5. **Sermon** - 信息 (Chinese text followed by English translation)
+6. **Response/Offering** - 回应/奉献
+7. **Benediction** - 祝福
+8. **Announcements** - 报告事项
+Use this markdown structure:
+```markdown
+# 主日崇拜程序 Sunday Worship Program
+**日期 Date:** {date}
+---
+## 序乐 Prelude
+[Content from bulletin]
+## 诗歌敬拜 Worship in Song
+[Hymns with numbers]
+## 读经 Scripture Reading
+[Passage and reference]
+## 信息 Sermon
+### [Sermon Title in Chinese]
+### [Sermon Title in English]
+**中文 Chinese:**
+{chinese_sermon}
+**English:**
+{english_sermon}
+## 回应诗歌 Response Song
+[Hymn information]
+## 奉献 Offering
+## 祝福 Benediction
+## 报告事项 Announcements
+[Announcements from bulletin]
+---
+```
+Generate the complete program now:""",
+    "extract_sermon_structure": """Analyze this sermon content and extract its structure:
+{sermon_text}
+Provide a structured analysis in this format:
+**Title:**
+- Chinese: [title]
+- English: [title]
+**Main Points:**
+1. [Point 1]
+2. [Point 2]
+3. [Point 3]
+**Scripture References:**
+- [Reference 1]
+- [Reference 2]
+**Key Themes:**
+- [Theme 1]
+- [Theme 2]
+Provide the analysis:""",
+    "generate_narrative": """Based on these sermon slides, generate a flowing narrative sermon in Chinese:
+{slides_content}
+Requirements:
+1. Expand bullet points into complete paragraphs
+2. Add smooth transitions between sections
+3. Maintain theological depth
+4. Use appropriate pastoral tone
+5. Keep the structure: introduction → main points → conclusion
+6. Include applications and illustrations where appropriate
+Generate the complete sermon narrative:""",
+    "translate_sermon": """Translate this Chinese sermon to English, preserving theological accuracy:
+{chinese_text}
+Requirements:
+1. Maintain theological terminology accuracy
+2. Preserve the tone and style
+3. Keep paragraph structure
+4. Translate scripture references appropriately
+5. Ensure natural English flow
+English translation:""",
+}

llm/qwen_client.py ADDED

	@@ -0,0 +1,218 @@

+"""
+Qwen LLM client wrapper for HuggingFace Inference API.
+"""
+import os
+from typing import Dict, List, Optional
+from huggingface_hub import InferenceClient
+class QwenClient:
+    """Wrapper for Qwen model via HuggingFace Inference API."""
+    def __init__(
+        self,
+        model_id: str = "Qwen/Qwen2.5-7B-Instruct",
+        api_token: Optional[str] = None,
+        use_local: bool = False
+    ):
+        """
+        Initialize Qwen client.
+        Args:
+            model_id: HuggingFace model ID
+            api_token: HF API token (optional, uses env var if not provided)
+            use_local: If True, load model locally (requires GPU)
+        """
+        self.model_id = model_id
+        self.api_token = api_token or os.getenv("HF_API_TOKEN")
+        self.use_local = use_local
+        if use_local:
+            # Load model locally (requires GPU)
+            from transformers import AutoModelForCausalLM, AutoTokenizer
+            print(f"Loading {model_id} locally...")
+            self.tokenizer = AutoTokenizer.from_pretrained(model_id)
+            self.model = AutoModelForCausalLM.from_pretrained(
+                model_id,
+                device_map="auto",
+                torch_dtype="auto"
+            )
+            print("Model loaded successfully")
+        else:
+            # Use HF Inference API (serverless)
+            self.client = InferenceClient(
+                model=model_id,
+                token=self.api_token
+            )
+    def chat(
+        self,
+        messages: List[Dict[str, str]],
+        max_tokens: int = 2048,
+        temperature: float = 0.7,
+        **kwargs
+    ) -> str:
+        """
+        Send chat completion request.
+        Args:
+            messages: List of {"role": "user/assistant/system", "content": str}
+            max_tokens: Max generation length
+            temperature: Sampling temperature
+            **kwargs: Additional parameters (accepted but currently not forwarded to either backend)
+        Returns:
+            Generated text
+        """
+        if self.use_local:
+            return self._chat_local(messages, max_tokens, temperature)
+        else:
+            return self._chat_api(messages, max_tokens, temperature)
+    def _chat_api(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float) -> str:
+        """Use HF Inference API."""
+        try:
+            response = self.client.chat_completion(
+                messages=messages,
+                max_tokens=max_tokens,
+                temperature=temperature,
+            )
+            return response.choices[0].message.content
+        except Exception as e:
+            print(f"Error calling HF Inference API: {e}")
+            raise
+    def _chat_local(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float) -> str:
+        """Use local model."""
+        try:
+            text = self.tokenizer.apply_chat_template(
+                messages,
+                tokenize=False,
+                add_generation_prompt=True
+            )
+            inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
+            outputs = self.model.generate(
+                **inputs,
+                max_new_tokens=max_tokens,
+                temperature=temperature,
+                do_sample=temperature > 0
+            )
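+            # Drop the prompt tokens so only the newly generated text is decoded
+            prompt_len = inputs["input_ids"].shape[1]
+            generated = self.tokenizer.decode(
+                outputs[0][prompt_len:],
+                skip_special_tokens=True
+            )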
+            return generated
+        except Exception as e:
+            print(f"Error with local model inference: {e}")
+            raise
+    def translate(
+        self,
+        text: str,
+        source_lang: str = "Chinese",
+        target_lang: str = "English"
+    ) -> str:
+        """
+        Translate text between languages.
+        Args:
+            text: Source text
+            source_lang: Source language name
+            target_lang: Target language name
+        Returns:
+            Translated text
+        """
+        prompt = f"""Translate the following {source_lang} text to {target_lang}.
+Preserve formatting, meaning, and theological terminology accurately.
+{source_lang} text:
+{text}
+{target_lang} translation:"""
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a professional translator specializing in religious and liturgical texts. Maintain theological accuracy and cultural sensitivity."
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+        return self.chat(messages, temperature=0.3)
+    def generate_narrative(self, slides_content: str) -> str:
+        """
+        Generate sermon narrative from slide bullet points.
+        Args:
+            slides_content: Extracted content from slides
+        Returns:
+            Generated sermon narrative in Chinese
+        """
+        prompt = f"""Based on these sermon slides, generate a flowing narrative sermon text in Chinese.
+Expand bullet points into complete paragraphs while preserving the theological content and structure.
+Sermon Slides:
+{slides_content}
+Generate a complete, cohesive sermon narrative:"""
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a pastoral assistant who helps prepare sermon manuscripts. Generate flowing, theologically sound sermon narratives."
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+        return self.chat(messages, max_tokens=4096, temperature=0.7)
+    def assemble_program(
+        self,
+        chinese_sermon: str,
+        english_sermon: str,
+        bulletin_content: str,
+        date: str
+    ) -> str:
+        """
+        Assemble complete bilingual worship program.
+        Args:
+            chinese_sermon: Chinese sermon text
+            english_sermon: English sermon translation
+            bulletin_content: Extracted bulletin content
+            date: Worship date
+        Returns:
+            Complete worship program in markdown format
+        """
+        from .prompt_templates import TASK_PROMPTS, SYSTEM_PROMPTS
+        prompt = TASK_PROMPTS["assemble_program"].format(
+            chinese_sermon=chinese_sermon,
+            english_sermon=english_sermon,
+            bulletin_content=bulletin_content,
+            date=date
+        )
+        messages = [
+            {
+                "role": "system",
+                "content": SYSTEM_PROMPTS["worship_assembler"]
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+        return self.chat(messages, max_tokens=4096, temperature=0.5)
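`translate` calls `chat` with the default `max_tokens=2048`, so the translation of a long sermon could be cut off. One possible mitigation, which is not part of this commit, is to translate in paragraph-sized chunks and rejoin the results. In the sketch below, `chunk_paragraphs`, `translate_in_chunks`, and the `max_chars` budget are hypothetical names and values, `translate_fn` stands in for something like `QwenClient.translate`, and paragraphs are assumed to be separated by blank lines.

```python
from typing import Callable, List


def chunk_paragraphs(text: str, max_chars: int = 1500) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_in_chunks(
    text: str,
    translate_fn: Callable[[str], str],
    max_chars: int = 1500,
) -> str:
    """Translate chunk by chunk and rejoin the pieces with blank lines."""
    return "\n\n".join(translate_fn(chunk) for chunk in chunk_paragraphs(text, max_chars))
```

Chunking trades some cross-paragraph context for a lower risk of truncation, so terminology may need a consistency review afterwards.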

requirements.txt ADDED

	@@ -0,0 +1,40 @@

+# Worship Program Generator - Dependencies
+# Core Framework
+gradio>=4.44.0
+python-dotenv>=1.0.0
+# HuggingFace & LLM
+huggingface_hub>=0.20.0
+transformers>=4.40.0
+accelerate>=0.25.0
+# torch - uncomment if using local model inference
+# torch>=2.0.0
+# Note: For HF Inference API only, torch is not required
+# Document Processing
+pypdf2>=3.0.0
+pdfplumber>=0.10.0
+pillow>=10.0.0
+pytesseract>=0.3.10
+pdf2image>=1.16.3
+# Optional: Better PDF processing
+pymupdf>=1.23.0  # PyMuPDF for advanced PDF handling
+# Text Processing
+python-docx>=1.1.0
+markdown>=3.5.0
+# HTTP & API
+requests>=2.31.0
+aiohttp>=3.9.0
+# Utilities
+tqdm>=4.66.0
+python-dateutil>=2.8.0
+# Development (optional - comment out for production)
+# pytest>=7.4.0
+# black>=23.0.0
+# flake8>=6.0.0

utils/__init__.py ADDED

	@@ -0,0 +1,12 @@

+"""
+Utility functions for file handling and format conversion.
+"""
+from .markdown_to_docx import markdown_to_docx
+from .file_utils import sanitize_filename, ensure_directory
+__all__ = [
+    "markdown_to_docx",
+    "sanitize_filename",
+    "ensure_directory",
+]

utils/file_utils.py ADDED

	@@ -0,0 +1,75 @@

+"""
+File handling utilities.
+"""
+import re
+from pathlib import Path
+from typing import Union
+def sanitize_filename(filename: str) -> str:
+    """
+    Sanitize filename by removing invalid characters.
+    Args:
+        filename: Original filename
+    Returns:
+        Sanitized filename safe for filesystems
+    """
+    # Remove invalid characters
+    filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
+    # Remove leading/trailing spaces and dots
+    filename = filename.strip('. ')
+    # Limit length, truncating the stem so the extension is preserved
+    if len(filename) > 255:
+        name, ext = filename.rsplit('.', 1) if '.' in filename else (filename, '')
+        suffix = '.' + ext if ext else ''
+        filename = name[:max(0, 255 - len(suffix))] + suffix
+    return filename
+def ensure_directory(path: Union[str, Path]) -> Path:
+    """
+    Ensure directory exists, create if necessary.
+    Args:
+        path: Directory path
+    Returns:
+        Path object
+    """
+    path = Path(path)
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+def get_file_size_mb(file_path: Union[str, Path]) -> float:
+    """
+    Get file size in megabytes.
+    Args:
+        file_path: Path to file
+    Returns:
+        File size in MB
+    """
+    path = Path(file_path)
+    return path.stat().st_size / (1024 * 1024)
+def validate_file_type(file_path: Union[str, Path], allowed_extensions: list) -> bool:
+    """
+    Validate file extension.
+    Args:
+        file_path: Path to file
+        allowed_extensions: List of allowed extensions (e.g., ['.pdf', '.txt'])
+    Returns:
+        True if valid, False otherwise
+    """
+    path = Path(file_path)
+    return path.suffix.lower() in [ext.lower() for ext in allowed_extensions]
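The UI help text states a 20MB per-file limit, and the size and extension helpers in this file look like the natural building blocks for enforcing it. Whether `app.py` already performs such a check is not visible in this part of the diff, so the following is only a sketch. `check_upload` and `MAX_FILE_MB` are hypothetical names, and the helper logic is inlined so the example runs on its own.

```python
from pathlib import Path
from typing import List, Optional, Union

MAX_FILE_MB = 20  # mirrors the limit quoted in the UI help text


def check_upload(
    file_path: Union[str, Path],
    allowed_extensions: List[str],
) -> Optional[str]:
    """Return an error message, or None if the file looks acceptable."""
    path = Path(file_path)
    if not path.is_file():
        return f"File not found: {path.name}"
    # Same case-insensitive comparison as validate_file_type
    if path.suffix.lower() not in [ext.lower() for ext in allowed_extensions]:
        return f"Unsupported file type: {path.suffix or '(none)'}"
    # Same calculation as get_file_size_mb
    size_mb = path.stat().st_size / (1024 * 1024)
    if size_mb > MAX_FILE_MB:
        return f"File is {size_mb:.1f} MB; the limit is {MAX_FILE_MB} MB"
    return None
```

Returning a message instead of raising keeps the check easy to surface in a status field, though raising an exception would work equally well.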

utils/markdown_to_docx.py ADDED

	@@ -0,0 +1,100 @@

+"""
+Convert markdown to DOCX with proper formatting.
+"""
+from docx import Document
+from docx.shared import Pt
+from docx.enum.text import WD_ALIGN_PARAGRAPH
+import re
+def markdown_to_docx(markdown_content: str, output_path: str):
+    """
+    Convert markdown content to DOCX file.
+    Args:
+        markdown_content: Markdown text
+        output_path: Path to save DOCX file
+    Note:
+        This is a basic converter. For more complex markdown,
+        consider using pandoc or pypandoc.
+    """
+    doc = Document()
+    # Set document styles
+    style = doc.styles['Normal']
+    style.font.name = 'Arial'
+    style.font.size = Pt(11)
+    lines = markdown_content.split('\n')
+    i = 0
+    while i < len(lines):
+        line = lines[i].strip()
+        # Skip empty lines
+        if not line:
+            i += 1
+            continue
+        # Headers
+        if line.startswith('# '):
+            heading = doc.add_heading(line[2:], level=1)
+            heading.alignment = WD_ALIGN_PARAGRAPH.CENTER
+        elif line.startswith('## '):
+            doc.add_heading(line[3:], level=2)
+        elif line.startswith('### '):
+            doc.add_heading(line[4:], level=3)
+        # Horizontal rules
+        elif line.startswith('---'):
+            doc.add_paragraph('_' * 50)
+        # Bold text (simple pattern)
+        elif '**' in line:
+            p = doc.add_paragraph()
+            parts = line.split('**')
+            for idx, part in enumerate(parts):
+                if idx % 2 == 1:  # Bold parts
+                    run = p.add_run(part)
+                    run.bold = True
+                else:
+                    p.add_run(part)
+        # Lists
+        elif line.startswith('- ') or line.startswith('* '):
+            doc.add_paragraph(line[2:], style='List Bullet')
+        elif re.match(r'^\d+\.\s', line):
+            # Strip the whole "N. " prefix so multi-digit numbers (e.g. "10. ") work
+            doc.add_paragraph(re.sub(r'^\d+\.\s+', '', line), style='List Number')
+        # Regular paragraphs
+        else:
+            # Handle multiple consecutive lines as one paragraph
+            paragraph_lines = [line]
+            j = i + 1
+            while j < len(lines) and lines[j].strip() and not _is_special_line(lines[j]):
+                paragraph_lines.append(lines[j].strip())
+                j += 1
+            full_paragraph = ' '.join(paragraph_lines)
+            doc.add_paragraph(full_paragraph)
+            i = j - 1
+        i += 1
+    # Save document
+    doc.save(output_path)
+def _is_special_line(line: str) -> bool:
+    """Check if line is a special markdown element."""
+    line = line.strip()
+    return (
+        line.startswith('#') or
+        line.startswith('-') or
+        line.startswith('*') or
+        line.startswith('---') or
+        bool(re.match(r'^\d+\.\s', line)) or
+        '**' in line
+    )
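In `markdown_to_docx`, the `'**' in line` branch is tested before the list branches, so a list item that contains bold text (for example `1. **Header** - Date`) is written as a plain paragraph rather than a list entry. One way to address this, sketched below under the assumption that only simple `**bold**` spans need support, is a small helper that any branch can call to obtain the runs of a line. The name `split_bold_runs` is hypothetical.

```python
from typing import List, Tuple


def split_bold_runs(line: str) -> List[Tuple[str, bool]]:
    """Split a line on '**' into (text, is_bold) pairs, skipping empty pieces.

    Odd-indexed pieces sit between a pair of '**' markers. If the markers are
    unbalanced, the trailing piece is still reported as bold, matching the
    simple alternation used in the converter above.
    """
    runs: List[Tuple[str, bool]] = []
    for idx, part in enumerate(line.split("**")):
        if part:
            runs.append((part, idx % 2 == 1))
    return runs


print(split_bold_runs("**日期 Date:** 2024-01-07"))
```

Each pair could then be added as a run on a paragraph created with the appropriate list style, with the run's `bold` attribute set from the second element.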