DynamicPacific committed on
Commit · 7ca4566
1 Parent(s): c20ee5f
Deploy worship program generator application to HF Space
- README.md +241 -5
- agents/__init__.py +11 -0
- agents/tools.py +213 -0
- agents/worship_agent.py +201 -0
- app.py +295 -0
- core/__init__.py +10 -0
- core/document_processor.py +284 -0
- examples/README.md +214 -0
- examples/sample_chinese_sermon.txt +61 -0
- llm/__init__.py +12 -0
- llm/prompt_templates.py +166 -0
- llm/qwen_client.py +218 -0
- requirements.txt +40 -0
- utils/__init__.py +12 -0
- utils/file_utils.py +75 -0
- utils/markdown_to_docx.py +100 -0
README.md
CHANGED
@@ -1,12 +1,248 @@
 ---
-title: Worship
-emoji:
-colorFrom:
+title: Worship Program Generator
+emoji: 🙏
+colorFrom: blue
 colorTo: purple
 sdk: gradio
-sdk_version:
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: mit
+python_version: "3.10"
+suggested_hardware: cpu-basic
 ---
 
-
+# 🙏 主日崇拜程序生成器 Worship Program Generator
+
+**Generate bilingual (Chinese-English) worship programs automatically from multiple source documents.**
+
+This AI-powered tool helps church staff create comprehensive worship programs by:
+- Extracting content from worship bulletins (PDF)
+- Generating sermon narratives from slide presentations (PDF)
+- Translating between Chinese and English
+- Assembling complete bilingual worship programs
+
+## ✨ Features
+
+### 📄 Multi-Source Input Support
+- **Chinese Sermon Text**: Upload pre-written sermon manuscripts (.txt)
+- **Sermon Slides PDF**: Generate flowing narratives from bullet-point slides
+- **Worship Bulletin PDF**: Extract liturgical elements, hymns, scripture readings
+
+### 🤖 AI-Powered Processing
+- **Narrative Generation**: Convert sermon slides into cohesive sermon text
+- **Translation**: High-quality Chinese ↔ English translation preserving theological nuance
+- **Program Assembly**: Intelligently combine all elements into structured worship order
+
+### 📤 Output Formats
+- **Markdown**: Easy to edit and version control
+- **DOCX**: Ready for printing and distribution
+
+### 🌐 Bilingual Support
+- Seamless Chinese-English parallel text
+- Preserves cultural and theological context
+- Liturgical terminology handled appropriately
+
+## 🚀 Quick Start
+
+### Option A: Pre-Written Sermon
+1. Upload your **Chinese sermon text** (.txt file)
+2. Upload your **worship bulletin** (PDF)
+3. Enter the worship date
+4. Click "Generate Worship Program"
+
+### Option B: Generate from Slides
+1. Upload your **sermon slides** (PDF)
+2. Upload your **worship bulletin** (PDF)
+3. Enter the worship date
+4. Click "Generate Worship Program"
+
+The AI will:
+- Extract content from all sources
+- Generate narrative text (if needed)
+- Translate to target language
+- Assemble complete worship program
+- Export to Markdown and DOCX
+
+## 🛠️ Technical Details
+
+### LLM Backend
+- **Model**: Qwen 2.5-7B-Instruct (Alibaba Cloud)
+- **Deployment**: HuggingFace Inference API (serverless)
+- **Languages**: Optimized for Chinese and English
+
+### Document Processing
+- **PDF Extraction**: Text and image-based PDFs supported
+- **OCR**: Automatic OCR for scanned documents (Tesseract)
+- **Structure Detection**: Intelligent parsing of worship elements
+
+### Architecture
+```
+Input PDFs → Document Processor → LLM (Qwen) → Program Assembler → Output (MD/DOCX)
+```
+
+## 📋 Input Requirements
+
+### Chinese Sermon Text (Option A)
+- Format: Plain text (.txt)
+- Encoding: UTF-8
+- Recommended: Include paragraph breaks and section markers
+
+### Sermon Slides PDF (Option B)
+- Format: PDF
+- Content: Can be text-based or image-based (OCR supported)
+- Structure: Title slides, main points, scripture references
+
+### Worship Bulletin PDF (Required)
+- Format: PDF
+- Should include:
+  - Worship date
+  - Order of service
+  - Hymn numbers/titles
+  - Scripture readings
+  - Announcements
+
+## 📦 Project Structure
+
+```
+worship-program-agent/
+├── app.py                      # Gradio UI
+├── requirements.txt            # Dependencies
+├── README.md                   # This file
+├── .env.example                # Configuration template
+├── core/
+│   ├── document_processor.py   # PDF extraction & OCR
+│   ├── translator.py           # Translation logic
+│   ├── narrative_generator.py  # Sermon generation
+│   └── program_assembler.py    # Final assembly
+├── agents/
+│   ├── worship_agent.py        # Workflow orchestration
+│   └── tools.py                # Agent tools
+├── llm/
+│   ├── qwen_client.py          # Qwen LLM wrapper
+│   └── prompt_templates.py     # System prompts
+├── utils/
+│   ├── file_utils.py           # File handling
+│   └── markdown_to_docx.py     # Format conversion
+└── examples/
+    ├── sample_sermon.txt
+    ├── sample_slides.pdf
+    └── sample_bulletin.pdf
+```
+
+## 🔧 Local Development
+
+### Prerequisites
+- Python 3.10+
+- Tesseract OCR (for scanned PDFs)
+- HuggingFace API token
+
+### Setup
+
+```bash
+# Clone the repository
+git clone <your-repo-url>
+cd worship-program-agent
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Configure environment
+cp .env.example .env
+# Edit .env and add your HF_API_TOKEN
+
+# Install Tesseract (for OCR support)
+# Ubuntu/Debian:
+sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng
+
+# macOS:
+brew install tesseract tesseract-lang
+
+# Run locally
+python app.py
+```
+
+### Configuration
+
+Edit `.env` file:
+```bash
+MODEL_ID=Qwen/Qwen2.5-7B-Instruct
+HF_API_TOKEN=your_token_here
+USE_LOCAL_MODEL=false
+OCR_LANGUAGES=eng+chi_sim
+```
+
+## 🌐 Deployment
+
+### HuggingFace Spaces
+
+This app is designed for HuggingFace Spaces deployment:
+
+1. **Create a new Space** on HuggingFace
+2. **Push this repository** to the Space
+3. **Set environment variables** in Space settings:
+   - `HF_API_TOKEN`: Your HuggingFace API token
+   - `MODEL_ID`: (Optional) Custom model selection
+4. **Select hardware**: `cpu-basic` (recommended for Inference API)
+
+The Space will automatically build and deploy.
+
+### Alternative: Local Model
+
+For faster inference, use local GPU:
+
+1. Set `suggested_hardware: t4-medium` in README metadata
+2. Set `USE_LOCAL_MODEL=true` in environment
+3. Uncomment `torch` in requirements.txt
+
+Note: Local model requires ~14GB GPU memory for Qwen 2.5-7B.
+
+## 📊 Performance
+
+### Typical Processing Time
+- **Bulletin extraction**: 2-5 seconds
+- **Sermon narrative generation**: 15-30 seconds
+- **Translation**: 10-20 seconds
+- **Program assembly**: 5-10 seconds
+- **Total**: 30-60 seconds (depending on content length)
+
+### API Costs (HF Inference API)
+- Free tier: 1,000 requests/month
+- Paid tier: ~$0.001-0.005 per request
+- Typical program generation: ~3-4 API calls
+
+## ⚠️ Limitations
+
+- **Maximum file size**: 20MB per upload
+- **PDF complexity**: Very complex layouts may require manual review
+- **OCR accuracy**: Scanned documents may have transcription errors
+- **Translation**: Review theological terms for accuracy
+- **Rate limits**: HF Inference API has rate limiting
+
+## 🤝 Contributing
+
+Contributions welcome! Areas for improvement:
+- Additional language pairs
+- Custom template support
+- Batch processing
+- Enhanced structure detection
+- Alternative LLM backends
+
+## 📄 License
+
+MIT License - see LICENSE file
+
+## 🙏 Acknowledgments
+
+- **Qwen Team** (Alibaba Cloud) - LLM model
+- **HuggingFace** - Inference infrastructure
+- **Gradio** - UI framework
+- **Tesseract** - OCR engine
+
+## 📞 Support
+
+For issues, questions, or feature requests, please open an issue on GitHub.
+
+---
+
+**Built with ❤️ for church communities**
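The README's Configuration section lists four `.env` keys, and `app.py` reads three of them with `os.getenv`. The sketch below shows one way those keys could be gathered into a single typed object. `Settings` and `load_settings` are hypothetical names that do not appear in the commit; the `MODEL_ID` and `USE_LOCAL_MODEL` defaults mirror `app.py`, while falling back to `eng+chi_sim` for `OCR_LANGUAGES` is an assumption taken from the README's example value.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the keys shown in the README's .env example."""
    model_id: str
    hf_api_token: Optional[str]
    use_local_model: bool
    ocr_languages: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping (illustrative helper)."""
    return Settings(
        model_id=env.get("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        hf_api_token=env.get("HF_API_TOKEN"),
        # Same rule app.py applies: only "true" (in any letter case) enables it.
        use_local_model=env.get("USE_LOCAL_MODEL", "false").lower() == "true",
        # Tesseract joins language codes with "+", e.g. "eng+chi_sim".
        # The default used here is an assumption, not taken from the commit.
        ocr_languages=env.get("OCR_LANGUAGES", "eng+chi_sim").split("+"),
    )


if __name__ == "__main__":
    print(load_settings({"USE_LOCAL_MODEL": "True", "OCR_LANGUAGES": "eng"}))
```

Passing a plain dict instead of `os.environ` keeps the parsing rules testable without touching the real environment.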
agents/__init__.py
ADDED
@@ -0,0 +1,11 @@
+"""
+Agent orchestration and workflow modules.
+"""
+
+from .worship_agent import WorshipProgramAgent
+from .tools import WorshipProgramTools
+
+__all__ = [
+    "WorshipProgramAgent",
+    "WorshipProgramTools",
+]
agents/tools.py
ADDED
@@ -0,0 +1,213 @@
+"""
+Tool functions for worship program generation workflow.
+"""
+
+from typing import Dict, List, Optional
+from pathlib import Path
+
+
+class WorshipProgramTools:
+    """Tool functions for worship program generation."""
+
+    def __init__(self, llm_client):
+        """
+        Initialize tools with LLM client.
+
+        Args:
+            llm_client: Instance of QwenClient or compatible LLM client
+        """
+        from core.document_processor import DocumentProcessor, ChineseTextProcessor
+
+        self.llm = llm_client
+        self.doc_processor = DocumentProcessor()
+        self.cn_processor = ChineseTextProcessor()
+
+    def extract_bulletin_tool(self, pdf_path: str) -> Dict:
+        """
+        Extract worship order and elements from bulletin PDF.
+
+        Args:
+            pdf_path: Path to bulletin PDF file
+
+        Returns:
+            {
+                "success": bool,
+                "data": Dict or None,
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            result = self.doc_processor.extract_bulletin_pdf(pdf_path)
+            return {
+                "success": True,
+                "data": result,
+                "message": f"Extracted bulletin content ({len(result.get('text', ''))} chars)"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Failed to extract bulletin: {str(e)}"
+            }
+
+    def generate_sermon_narrative_tool(self, slides_pdf_path: str) -> Dict:
+        """
+        Generate flowing sermon narrative from slide PDF.
+
+        Steps:
+        1. Extract text/images from slides
+        2. Identify structure (title, points, scriptures)
+        3. Generate cohesive narrative using LLM
+
+        Args:
+            slides_pdf_path: Path to sermon slides PDF
+
+        Returns:
+            {
+                "success": bool,
+                "narrative": str (if successful),
+                "structure": Dict,
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            # Extract slides
+            slides_data = self.doc_processor.extract_sermon_slides_pdf(slides_pdf_path)
+
+            # Format for LLM
+            slides_text = self._format_slides_for_generation(slides_data)
+
+            # Generate narrative
+            narrative = self.llm.generate_narrative(slides_text)
+
+            return {
+                "success": True,
+                "narrative": narrative,
+                "structure": slides_data["structure"],
+                "message": f"Generated sermon narrative ({len(narrative)} chars)"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Failed to generate sermon: {str(e)}"
+            }
+
+    def translate_text_tool(
+        self,
+        text: str,
+        source_lang: str = "Chinese",
+        target_lang: str = "English"
+    ) -> Dict:
+        """
+        Translate text between Chinese and English.
+
+        Args:
+            text: Source text
+            source_lang: Source language (Chinese/English)
+            target_lang: Target language (English/Chinese)
+
+        Returns:
+            {
+                "success": bool,
+                "translation": str (if successful),
+                "source_lang": str,
+                "target_lang": str,
+                "error": str (if failed)
+            }
+        """
+        try:
+            translation = self.llm.translate(text, source_lang, target_lang)
+            return {
+                "success": True,
+                "translation": translation,
+                "source_lang": source_lang,
+                "target_lang": target_lang,
+                "message": f"Translated {len(text)} chars"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Translation failed: {str(e)}"
+            }
+
+    def assemble_worship_program_tool(
+        self,
+        chinese_sermon: str,
+        english_sermon: str,
+        bulletin_data: Dict,
+        date: str
+    ) -> Dict:
+        """
+        Assemble complete bilingual worship program.
+
+        Args:
+            chinese_sermon: Chinese sermon text
+            english_sermon: English sermon translation
+            bulletin_data: Extracted bulletin data
+            date: Worship date (YYYY-MM-DD)
+
+        Returns:
+            {
+                "success": bool,
+                "program": str (markdown content if successful),
+                "error": str (if failed),
+                "message": str
+            }
+        """
+        try:
+            program_markdown = self.llm.assemble_program(
+                chinese_sermon=chinese_sermon,
+                english_sermon=english_sermon,
+                bulletin_content=bulletin_data.get("text", ""),
+                date=date
+            )
+
+            return {
+                "success": True,
+                "program": program_markdown,
+                "message": "Worship program assembled successfully"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Program assembly failed: {str(e)}"
+            }
+
+    def _format_slides_for_generation(self, slides_data: Dict) -> str:
+        """
+        Format extracted slides data for narrative generation.
+
+        Args:
+            slides_data: Output from extract_sermon_slides_pdf
+
+        Returns:
+            Formatted text for LLM input
+        """
+        lines = []
+
+        # Add structure summary
+        structure = slides_data.get("structure", {})
+        if structure.get("title"):
+            lines.append(f"# {structure['title']}\n")
+
+        # Add slides content
+        for slide in slides_data.get("slides", []):
+            text = slide["text"].strip()
+            if not text:
+                continue
+
+            if slide["is_title"]:
+                lines.append(f"## {text}")
+            elif slide["is_scripture"]:
+                lines.append(f"**Scripture:** {text}")
+            else:
+                lines.append(text)
+
+            lines.append("")  # Add spacing
+
+        return "\n".join(lines)
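To make the LLM input produced by `_format_slides_for_generation` concrete, here is a standalone copy of that logic applied to made-up data. `format_slides` and `sample` are illustrative names only. The dict shape (`structure.title`, `slides[].text`, `is_title`, `is_scripture`) is inferred from the keys the method reads, since `extract_sermon_slides_pdf` itself is not shown in this part of the diff.

```python
from typing import Dict, List


def format_slides(slides_data: Dict) -> str:
    """Standalone copy of the _format_slides_for_generation logic."""
    lines: List[str] = []

    # Deck title, if the extractor found one, becomes a level-1 heading.
    structure = slides_data.get("structure", {})
    if structure.get("title"):
        lines.append(f"# {structure['title']}\n")

    for slide in slides_data.get("slides", []):
        text = slide["text"].strip()
        if not text:
            continue  # whitespace-only slides are skipped entirely

        if slide["is_title"]:
            lines.append(f"## {text}")
        elif slide["is_scripture"]:
            lines.append(f"**Scripture:** {text}")
        else:
            lines.append(text)

        lines.append("")  # blank line after every kept slide

    return "\n".join(lines)


# Made-up sample input; real data would come from extract_sermon_slides_pdf.
sample = {
    "structure": {"title": "Grace"},
    "slides": [
        {"text": " Grace ", "is_title": True, "is_scripture": False},
        {"text": "John 3:16", "is_title": False, "is_scripture": True},
        {"text": "   ", "is_title": False, "is_scripture": False},
        {"text": "Point one", "is_title": False, "is_scripture": False},
    ],
}

if __name__ == "__main__":
    print(format_slides(sample))
```

Because a title slide is emitted as `##` while the deck title is emitted as `#`, a deck whose first slide repeats the title shows it twice in the prompt, as in this sample.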
agents/worship_agent.py
ADDED
@@ -0,0 +1,201 @@
+"""
+Main workflow orchestration agent for worship program generation.
+"""
+
+from typing import Dict, Optional, Callable
+from pathlib import Path
+from agents.tools import WorshipProgramTools
+
+
+class WorshipProgramAgent:
+    """
+    Orchestrates worship program generation workflow.
+
+    Workflow:
+    1. Extract bulletin (worship order, hymns, date)
+    2. Process Chinese sermon OR generate from slides
+    3. Translate sermon to English
+    4. Assemble complete bilingual program
+    5. Export to markdown and DOCX
+    """
+
+    def __init__(
+        self,
+        llm_client,
+        output_dir: str = "./outputs"
+    ):
+        """
+        Initialize worship program agent.
+
+        Args:
+            llm_client: Instance of QwenClient or compatible LLM
+            output_dir: Directory for output files
+        """
+        self.llm = llm_client
+        self.tools = WorshipProgramTools(llm_client)
+        self.output_dir = Path(output_dir)
+        self.output_dir.mkdir(exist_ok=True, parents=True)
+
+    def generate_program(
+        self,
+        chinese_sermon_text: Optional[str] = None,
+        sermon_slides_pdf: Optional[str] = None,
+        bulletin_pdf: Optional[str] = None,
+        date: Optional[str] = None,
+        progress_callback: Optional[Callable] = None
+    ) -> Dict:
+        """
+        Main workflow to generate worship program.
+
+        Args:
+            chinese_sermon_text: Pre-written Chinese sermon (if available)
+            sermon_slides_pdf: Sermon slides PDF (if sermon needs generation)
+            bulletin_pdf: Worship bulletin PDF (required)
+            date: Worship date (auto-extracted if not provided)
+            progress_callback: Function(progress: float, desc: str) to report progress
+
+        Returns:
+            {
+                "success": bool,
+                "markdown_path": str (if successful),
+                "docx_path": str or None (None if DOCX conversion failed),
+                "program_content": str,
+                "metadata": Dict,
+                "error": str (if failed)
+            }
+        """
+        def update_progress(step: str, pct: float):
+            """Helper to update progress."""
+            if progress_callback:
+                progress_callback(pct, desc=step)
+
+        try:
+            # Validation
+            if not bulletin_pdf:
+                raise ValueError("Bulletin PDF is required")
+
+            if not chinese_sermon_text and not sermon_slides_pdf:
+                raise ValueError("Must provide either chinese_sermon_text or sermon_slides_pdf")
+
+            # Step 1: Extract bulletin
+            update_progress("📄 Extracting bulletin...", 0.1)
+            bulletin_result = self.tools.extract_bulletin_tool(bulletin_pdf)
+
+            if not bulletin_result["success"]:
+                raise ValueError(f"Bulletin extraction failed: {bulletin_result.get('error', 'Unknown error')}")
+
+            bulletin_data = bulletin_result["data"]
+            if not date:
+                date = bulletin_data.get("date") or "未标注日期"  # "date not specified"
+
+            # Step 2: Get/Generate Chinese sermon
+            update_progress("📝 Processing sermon...", 0.3)
+            if chinese_sermon_text:
+                chinese_sermon = chinese_sermon_text
+            else:
+                sermon_result = self.tools.generate_sermon_narrative_tool(sermon_slides_pdf)
+                if not sermon_result["success"]:
+                    raise ValueError(f"Sermon generation failed: {sermon_result.get('error', 'Unknown error')}")
+                chinese_sermon = sermon_result["narrative"]
+
+            # Step 3: Translate to English
+            update_progress("🌐 Translating sermon...", 0.5)
+            translation_result = self.tools.translate_text_tool(
+                text=chinese_sermon,
+                source_lang="Chinese",
+                target_lang="English"
+            )
+
+            if not translation_result["success"]:
+                raise ValueError(f"Translation failed: {translation_result.get('error', 'Unknown error')}")
+
+            english_sermon = translation_result["translation"]
+
+            # Step 4: Assemble program
+            update_progress("📋 Assembling worship program...", 0.7)
+            program_result = self.tools.assemble_worship_program_tool(
+                chinese_sermon=chinese_sermon,
+                english_sermon=english_sermon,
+                bulletin_data=bulletin_data,
+                date=date
+            )
+
+            if not program_result["success"]:
+                raise ValueError(f"Program assembly failed: {program_result.get('error', 'Unknown error')}")
+
+            program_markdown = program_result["program"]
+
+            # Step 5: Save outputs
+            update_progress("💾 Saving files...", 0.9)
+            markdown_path = self._save_markdown(program_markdown, date)
+            docx_path = self._save_docx(program_markdown, date)
+
+            update_progress("✅ Complete!", 1.0)
+
+            return {
+                "success": True,
+                "markdown_path": str(markdown_path),
+                "docx_path": str(docx_path) if docx_path else None,  # avoid the string "None"
+                "program_content": program_markdown,
+                "metadata": {
+                    "date": date,
+                    "chinese_sermon_length": len(chinese_sermon),
+                    "english_sermon_length": len(english_sermon),
+                    "bulletin_source": bulletin_pdf,
+                    "sermon_source": "text" if chinese_sermon_text else "slides"
+                }
+            }
+
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "message": f"Workflow failed: {str(e)}"
+            }
+
+    def _save_markdown(self, content: str, date: str) -> Path:
+        """
+        Save program as markdown.
+
+        Args:
+            content: Markdown content
+            date: Date string for filename
+
+        Returns:
+            Path to saved file
+        """
+        # Sanitize date for filename
+        safe_date = date.replace("/", "-").replace(" ", "_")
+        filename = f"worship_program_{safe_date}.md"
+        filepath = self.output_dir / filename
+
+        with open(filepath, "w", encoding="utf-8") as f:
+            f.write(content)
+
+        return filepath
+
+    def _save_docx(self, markdown_content: str, date: str) -> Optional[Path]:
+        """
+        Convert markdown to DOCX and save.
+
+        Args:
+            markdown_content: Markdown content
+            date: Date string for filename
+
+        Returns:
+            Path to saved DOCX file, or None if conversion fails
+        """
+        from utils.markdown_to_docx import markdown_to_docx
+
+        safe_date = date.replace("/", "-").replace(" ", "_")
+        filename = f"worship_program_{safe_date}.docx"
+        filepath = self.output_dir / filename
+
+        try:
+            markdown_to_docx(markdown_content, str(filepath))
+        except Exception as e:
+            print(f"Warning: DOCX conversion failed: {e}")
+            # Return None if conversion fails
+            return None
+
+        return filepath
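Both `_save_markdown` and `_save_docx` derive the output filename from the date with two `str.replace` calls. The sketch below isolates that rule so its behaviour is easy to check, and adds a stricter variant for dates that contain other filesystem-unfriendly characters. `program_filename` and `strict_program_filename` are hypothetical helpers that are not part of the commit, and the stricter rule is only one possible choice.

```python
import re


def program_filename(date: str, ext: str) -> str:
    """Mirror of the naming rule used in _save_markdown and _save_docx."""
    safe_date = date.replace("/", "-").replace(" ", "_")
    return f"worship_program_{safe_date}.{ext}"


def strict_program_filename(date: str, ext: str) -> str:
    """Hypothetical stricter variant.

    Collapses each run of whitespace or characters that are commonly
    rejected in filenames (such as ':' or '?') into a single '-'.
    """
    safe_date = re.sub(r'[\\/:*?"<>|\s]+', "-", date.strip())
    return f"worship_program_{safe_date}.{ext}"


if __name__ == "__main__":
    print(program_filename("2024/05/12", "md"))
    print(program_filename("May 12 2024", "docx"))
    print(strict_program_filename("12:30 05/12", "md"))
```

The committed rule leaves a colon or backslash in the date untouched, which matters mainly when the date is typed by a user or extracted from a bulletin rather than supplied as `YYYY-MM-DD`.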
app.py
ADDED
|
@@ -0,0 +1,295 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Worship Program Generator - Gradio Application
|
| 3 |
+
|
| 4 |
+
Bilingual (Chinese-English) worship program generation from multiple sources.
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import gradio as gr
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
from dotenv import load_dotenv
|
| 11 |
+
|
| 12 |
+
# Load environment variables
|
| 13 |
+
load_dotenv()
|
| 14 |
+
|
| 15 |
+
# Configuration
|
| 16 |
+
MODEL_ID = os.getenv("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct")
|
| 17 |
+
HF_API_TOKEN = os.getenv("HF_API_TOKEN")
|
| 18 |
+
USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "false").lower() == "true"
|
| 19 |
+
MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "20"))
|
| 20 |
+
|
| 21 |
+
# Initialize LLM and Agent
|
| 22 |
+
from llm.qwen_client import QwenClient
|
| 23 |
+
from agents.worship_agent import WorshipProgramAgent
|
| 24 |
+
|
| 25 |
+
print(f"Initializing with model: {MODEL_ID}")
|
| 26 |
+
print(f"Using local model: {USE_LOCAL_MODEL}")
|
| 27 |
+
|
| 28 |
+
try:
|
| 29 |
+
llm_client = QwenClient(
|
| 30 |
+
model_id=MODEL_ID,
|
| 31 |
+
api_token=HF_API_TOKEN,
|
| 32 |
+
use_local=USE_LOCAL_MODEL
|
| 33 |
+
)
|
| 34 |
+
agent = WorshipProgramAgent(llm_client, output_dir="./outputs")
|
| 35 |
+
print("✓ Agent initialized successfully")
|
| 36 |
+
except Exception as e:
|
| 37 |
+
print(f"✗ Error initializing agent: {e}")
|
| 38 |
+
llm_client = None
|
| 39 |
+
agent = None
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
def process_worship_program(
|
| 43 |
+
chinese_sermon_file,
|
| 44 |
+
sermon_slides_file,
|
| 45 |
+
bulletin_file,
|
| 46 |
+
worship_date,
|
| 47 |
+
progress=gr.Progress()
|
| 48 |
+
):
|
| 49 |
+
"""
|
| 50 |
+
Main Gradio handler for worship program generation.
|
| 51 |
+
|
| 52 |
+
Args:
|
| 53 |
+
chinese_sermon_file: Uploaded .txt file with Chinese sermon (optional)
|
| 54 |
+
sermon_slides_file: Uploaded sermon slides PDF (optional)
|
| 55 |
+
bulletin_file: Uploaded bulletin PDF (required)
|
| 56 |
+
worship_date: Date string (YYYY-MM-DD)
|
| 57 |
+
progress: Gradio progress tracker
|
| 58 |
+
|
| 59 |
+
Returns:
|
| 60 |
+
(status_message, markdown_file, docx_file)
|
| 61 |
+
"""
|
| 62 |
+
if agent is None:
|
| 63 |
+
return "❌ Error: Agent not initialized. Check configuration.", None, None
|
| 64 |
+
|
| 65 |
+
# Validation
|
| 66 |
+
if not bulletin_file:
|
| 67 |
+
return "❌ Error: Bulletin PDF is required", None, None
|
| 68 |
+
|
| 69 |
+
if not chinese_sermon_file and not sermon_slides_file:
|
| 70 |
+
return "❌ Error: Must provide either Chinese sermon text OR sermon slides PDF", None, None
|
| 71 |
+
|
| 72 |
+
# Check file sizes
|
| 73 |
+
try:
|
| 74 |
+
if bulletin_file and Path(bulletin_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
|
| 75 |
+
return f"❌ Error: Bulletin file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
|
| 76 |
+
|
| 77 |
+
if sermon_slides_file and Path(sermon_slides_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
|
| 78 |
+
return f"❌ Error: Slides file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
|
| 79 |
+
except Exception as e:
|
| 80 |
+
return f"❌ Error checking file sizes: {str(e)}", None, None
|
| 81 |
+
|
| 82 |
+
try:
|
| 83 |
+
# Read Chinese sermon if provided
|
| 84 |
+
chinese_text = None
|
| 85 |
+
if chinese_sermon_file:
|
| 86 |
+
try:
|
| 87 |
+
with open(chinese_sermon_file, "r", encoding="utf-8") as f:
|
| 88 |
+
chinese_text = f.read()
|
| 89 |
+
except UnicodeDecodeError:
|
| 90 |
+
# Try GB2312/GBK encoding
|
| 91 |
+
with open(chinese_sermon_file, "r", encoding="gbk") as f:
|
| 92 |
+
chinese_text = f.read()
|
| 93 |
+
|
| 94 |
+
# Generate program
|
| 95 |
+
result = agent.generate_program(
|
| 96 |
+
chinese_sermon_text=chinese_text,
|
| 97 |
+
sermon_slides_pdf=sermon_slides_file,
|
| 98 |
+
bulletin_pdf=bulletin_file,
|
| 99 |
+
date=worship_date if worship_date else None,
|
| 100 |
+
progress_callback=lambda pct, desc: progress(pct, desc=desc)
|
| 101 |
+
)
|
| 102 |
+
|
| 103 |
+
if not result["success"]:
|
| 104 |
+
error_msg = result.get("message", "Unknown error")
|
| 105 |
+
return f"❌ Error: {error_msg}", None, None
|
| 106 |
+
|
| 107 |
+
# Format success message
|
| 108 |
+
metadata = result["metadata"]
|
| 109 |
+
status = f"""✅ **Worship Program Generated Successfully!**
|
| 110 |
+
|
| 111 |
+
**📅 Date:** {metadata['date']}
|
| 112 |
+
|
| 113 |
+
**📊 Statistics:**
|
| 114 |
+
- Chinese Sermon: {metadata['chinese_sermon_length']:,} characters
|
| 115 |
+
- English Sermon: {metadata['english_sermon_length']:,} characters
|
| 116 |
+
- Source: {"Pre-written text" if metadata['sermon_source'] == 'text' else "Generated from slides"}
|
| 117 |
+
|
| 118 |
+
**📁 Output Files:**
|
| 119 |
+
- Markdown: `{Path(result['markdown_path']).name}`
|
| 120 |
+
- DOCX: `{Path(result['docx_path']).name if result.get('docx_path') else 'Not generated'}`
|
| 121 |
+
|
| 122 |
+
Download the files below ⬇️
|
| 123 |
+
"""
|
| 124 |
+
|
| 125 |
+
# Return paths for download
|
| 126 |
+
markdown_file = result["markdown_path"] if Path(result["markdown_path"]).exists() else None
|
| 127 |
+
docx_file = result["docx_path"] if result.get("docx_path") and Path(result["docx_path"]).exists() else None
|
| 128 |
+
|
| 129 |
+
return status, markdown_file, docx_file
|
| 130 |
+
|
| 131 |
+
except Exception as e:
|
| 132 |
+
import traceback
|
| 133 |
+
error_msg = f"❌ **Error:**\n\n{str(e)}\n\n<details>\n<summary>Traceback</summary>\n\n```\n{traceback.format_exc()}\n```\n</details>"
|
| 134 |
+
return error_msg, None, None
|
| 135 |
+
|
| 136 |
+
|
| 137 |
+
# Gradio Interface
|
| 138 |
+
with gr.Blocks(
|
| 139 |
+
title="Worship Program Generator",
|
| 140 |
+
theme=gr.themes.Soft(),
|
| 141 |
+
css="""
|
| 142 |
+
.title { text-align: center; font-size: 2em; margin-bottom: 1em; }
|
| 143 |
+
.subtitle { text-align: center; color: #666; margin-bottom: 2em; }
|
| 144 |
+
"""
|
| 145 |
+
) as demo:
|
| 146 |
+
|
| 147 |
+
gr.Markdown(
|
| 148 |
+
"""
|
| 149 |
+
<div class="title">🙏 主日崇拜程序生成器</div>
|
| 150 |
+
<div class="title">Worship Program Generator</div>
|
| 151 |
+
<div class="subtitle">Generate bilingual worship programs from multiple sources</div>
|
| 152 |
+
""",
|
| 153 |
+
elem_classes=["title"]
|
| 154 |
+
)
|
| 155 |
+
|
| 156 |
+
gr.Markdown("""
|
| 157 |
+
### 📖 How to Use
|
| 158 |
+
|
| 159 |
+
**Required:** Worship Bulletin PDF
|
| 160 |
+
**Choose ONE:** Chinese sermon text OR sermon slides PDF
|
| 161 |
+
|
| 162 |
+
The system will:
|
| 163 |
+
1. Extract content from all sources
|
| 164 |
+
2. Generate narrative (if using slides)
|
| 165 |
+
3. Translate between languages
|
| 166 |
+
4. Assemble complete bilingual program
|
| 167 |
+
5. Export to Markdown and DOCX
|
| 168 |
+
""")
|
| 169 |
+
|
| 170 |
+
with gr.Row():
|
| 171 |
+
with gr.Column(scale=1):
|
| 172 |
+
gr.Markdown("### 📤 Input Files")
|
| 173 |
+
|
| 174 |
+
chinese_sermon_input = gr.File(
|
| 175 |
+
label="📝 Chinese Sermon Text (中文讲章) - Optional",
|
| 176 |
+
file_types=[".txt"],
|
| 177 |
+
type="filepath"
|
| 178 |
+
)
|
| 179 |
+
|
| 180 |
+
slides_input = gr.File(
|
| 181 |
+
label="🖼️ Sermon Slides PDF (讲章幻灯片) - Optional",
|
| 182 |
+
file_types=[".pdf"],
|
| 183 |
+
type="filepath"
|
| 184 |
+
)
|
| 185 |
+
|
| 186 |
+
bulletin_input = gr.File(
|
| 187 |
+
label="📋 Worship Bulletin PDF (崇拜程序单) - Required ⭐",
|
| 188 |
+
file_types=[".pdf"],
|
| 189 |
+
type="filepath"
|
| 190 |
+
)
|
| 191 |
+
|
| 192 |
+
date_input = gr.Textbox(
|
| 193 |
+
label="📅 Worship Date (YYYY-MM-DD)",
|
| 194 |
+
placeholder="2024-01-07 (leave blank to auto-detect)",
|
| 195 |
+
value=""
|
| 196 |
+
)
|
| 197 |
+
|
| 198 |
+
generate_btn = gr.Button(
|
| 199 |
+
"🚀 Generate Worship Program",
|
| 200 |
+
variant="primary",
|
| 201 |
+
size="lg"
|
| 202 |
+
)
|
| 203 |
+
|
| 204 |
+
with gr.Column(scale=1):
|
| 205 |
+
gr.Markdown("### 📥 Output")
|
| 206 |
+
|
| 207 |
+
status_output = gr.Markdown("💡 Ready to generate...")
|
| 208 |
+
|
| 209 |
+
markdown_download = gr.File(
|
| 210 |
+
label="📄 Download Markdown (.md)",
|
| 211 |
+
interactive=False
|
| 212 |
+
)
|
| 213 |
+
|
| 214 |
+
docx_download = gr.File(
|
| 215 |
+
label="📄 Download DOCX (.docx)",
|
| 216 |
+
interactive=False
|
| 217 |
+
)
|
| 218 |
+
|
| 219 |
+
# Usage Guide
|
| 220 |
+
with gr.Accordion("📚 Usage Guide & Tips", open=False):
|
| 221 |
+
gr.Markdown("""
|
| 222 |
+
### Workflow Options
|
| 223 |
+
|
| 224 |
+
**Option A: Pre-written Sermon**
|
| 225 |
+
1. Upload Chinese sermon text file (.txt, UTF-8 encoding)
|
| 226 |
+
2. Upload worship bulletin PDF
|
| 227 |
+
3. Enter date (or leave blank)
|
| 228 |
+
4. Click Generate
|
| 229 |
+
|
| 230 |
+
**Option B: Generate from Slides**
|
| 231 |
+
1. Upload sermon slides PDF (can be text or image-based)
|
| 232 |
+
2. Upload worship bulletin PDF
|
| 233 |
+
3. Enter date (or leave blank)
|
| 234 |
+
4. Click Generate (AI will create narrative from slides)
|
| 235 |
+
|
| 236 |
+
### Tips
|
| 237 |
+
|
| 238 |
+
- **Date Detection:** Leave blank to auto-extract from bulletin filename (format: `bulletin-YYYY-MM-DD.pdf`)
|
| 239 |
+
- **File Encoding:** Chinese text files should be UTF-8 or GBK encoded
|
| 240 |
+
- **PDF Support:** Both text-based and scanned (OCR) PDFs are supported
|
| 241 |
+
- **Processing Time:** Typically 30-60 seconds depending on content length
|
| 242 |
+
- **File Size Limit:** Maximum 20MB per PDF by default (configurable via the `MAX_FILE_SIZE_MB` environment variable)
|
| 243 |
+
|
| 244 |
+
### Troubleshooting
|
| 245 |
+
|
| 246 |
+
- **OCR Issues:** Ensure bulletin text is clear and high-resolution
|
| 247 |
+
- **Translation Quality:** Review theological terms for accuracy
|
| 248 |
+
- **Missing Content:** Check that PDFs contain expected sections
|
| 249 |
+
- **Encoding Errors:** Save Chinese text as UTF-8
|
| 250 |
+
|
| 251 |
+
### Output Format
|
| 252 |
+
|
| 253 |
+
The generated worship program includes:
|
| 254 |
+
- Bilingual header with date and theme
|
| 255 |
+
- Order of worship (prelude, songs, scripture)
|
| 256 |
+
- Complete sermon (Chinese + English)
|
| 257 |
+
- Liturgical elements
|
| 258 |
+
- Announcements
|
| 259 |
+
|
| 260 |
+
Both Markdown (.md) and Word (.docx) formats are provided.
|
| 261 |
+
""")
|
| 262 |
+
|
| 263 |
+
# Event handlers
|
| 264 |
+
generate_btn.click(
|
| 265 |
+
fn=process_worship_program,
|
| 266 |
+
inputs=[
|
| 267 |
+
chinese_sermon_input,
|
| 268 |
+
slides_input,
|
| 269 |
+
bulletin_input,
|
| 270 |
+
date_input
|
| 271 |
+
],
|
| 272 |
+
outputs=[
|
| 273 |
+
status_output,
|
| 274 |
+
markdown_download,
|
| 275 |
+
docx_download
|
| 276 |
+
],
|
| 277 |
+
show_progress=True
|
| 278 |
+
)
|
| 279 |
+
|
| 280 |
+
gr.Markdown("""
|
| 281 |
+
---
|
| 282 |
+
**🤖 Powered by:** Qwen 2.5 LLM | **📦 Framework:** HuggingFace Transformers | **🎨 UI:** Gradio
|
| 283 |
+
|
| 284 |
+
Built with ❤️ for church communities
|
| 285 |
+
""")
|
| 286 |
+
|
| 287 |
+
|
| 288 |
+
if __name__ == "__main__":
|
| 289 |
+
demo.queue(
|
| 290 |
+
max_size=int(os.getenv("GRADIO_MAX_QUEUE_SIZE", "10"))
|
| 291 |
+
).launch(
|
| 292 |
+
server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
|
| 293 |
+
server_port=int(os.getenv("GRADIO_SERVER_PORT", "7860")),
|
| 294 |
+
share=os.getenv("GRADIO_SHARE", "false").lower() == "true"
|
| 295 |
+
)
|
core/__init__.py
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
"""
|
| 2 |
+
Core document processing and content generation modules.
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
from .document_processor import DocumentProcessor, ChineseTextProcessor
|
| 6 |
+
|
| 7 |
+
__all__ = [
|
| 8 |
+
"DocumentProcessor",
|
| 9 |
+
"ChineseTextProcessor",
|
| 10 |
+
]
|
core/document_processor.py
ADDED
|
@@ -0,0 +1,284 @@
| 1 |
+
"""
|
| 2 |
+
Document processing module for PDF extraction and OCR.
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import pdfplumber
|
| 6 |
+
import pytesseract
|
| 7 |
+
from pdf2image import convert_from_path
|
| 8 |
+
from PIL import Image
|
| 9 |
+
from typing import Dict, List, Optional
|
| 10 |
+
import re
|
| 11 |
+
from pathlib import Path
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
class DocumentProcessor:
|
| 15 |
+
"""Extract text and structure from PDF documents."""
|
| 16 |
+
|
| 17 |
+
def __init__(self, ocr_languages: str = "eng+chi_sim"):
|
| 18 |
+
"""
|
| 19 |
+
Initialize document processor.
|
| 20 |
+
|
| 21 |
+
Args:
|
| 22 |
+
ocr_languages: Tesseract language codes (e.g., "eng+chi_sim")
|
| 23 |
+
"""
|
| 24 |
+
self.ocr_languages = ocr_languages
|
| 25 |
+
|
| 26 |
+
def extract_bulletin_pdf(self, pdf_path: str) -> Dict:
|
| 27 |
+
"""
|
| 28 |
+
Extract worship order from bulletin PDF.
|
| 29 |
+
|
| 30 |
+
Args:
|
| 31 |
+
pdf_path: Path to bulletin PDF file
|
| 32 |
+
|
| 33 |
+
Returns:
|
| 34 |
+
{
|
| 35 |
+
"text": str, # Full text content
|
| 36 |
+
"sections": { # Structured sections
|
| 37 |
+
"hymns": List[str],
|
| 38 |
+
"scripture": str,
|
| 39 |
+
"announcements": str,
|
| 40 |
+
"order": List[str]
|
| 41 |
+
},
|
| 42 |
+
"date": str, # Extracted date
|
| 43 |
+
"metadata": Dict
|
| 44 |
+
}
|
| 45 |
+
"""
|
| 46 |
+
text = self.extract_with_structure(pdf_path)
|
| 47 |
+
date = self.extract_date_from_filename(pdf_path)
|
| 48 |
+
|
| 49 |
+
# TODO: Implement intelligent section parsing
|
| 50 |
+
sections = self._parse_bulletin_sections(text)
|
| 51 |
+
|
| 52 |
+
return {
|
| 53 |
+
"text": text,
|
| 54 |
+
"sections": sections,
|
| 55 |
+
"date": date,
|
| 56 |
+
"metadata": {
|
| 57 |
+
"filename": Path(pdf_path).name,
|
| 58 |
+
"page_count": self._get_page_count(pdf_path)
|
| 59 |
+
}
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
def extract_sermon_slides_pdf(self, pdf_path: str) -> Dict:
|
| 63 |
+
"""
|
| 64 |
+
Extract sermon content from slides PDF.
|
| 65 |
+
|
| 66 |
+
Args:
|
| 67 |
+
pdf_path: Path to sermon slides PDF
|
| 68 |
+
|
| 69 |
+
Returns:
|
| 70 |
+
{
|
| 71 |
+
"slides": List[Dict], # List of slide data
|
| 72 |
+
"structure": Dict # Sermon structure
|
| 73 |
+
}
|
| 74 |
+
"""
|
| 75 |
+
slides = []
|
| 76 |
+
|
| 77 |
+
with pdfplumber.open(pdf_path) as pdf:
|
| 78 |
+
for i, page in enumerate(pdf.pages):
|
| 79 |
+
text = page.extract_text() or ""
|
| 80 |
+
|
| 81 |
+
# If no text, try OCR
|
| 82 |
+
if len(text.strip()) < 10:
|
| 83 |
+
text = self._ocr_page(pdf_path, i)
|
| 84 |
+
|
| 85 |
+
slide_data = {
|
| 86 |
+
"page_num": i + 1,
|
| 87 |
+
"text": text,
|
| 88 |
+
"is_title": self._is_title_slide(text),
|
| 89 |
+
"is_scripture": self._is_scripture_slide(text)
|
| 90 |
+
}
|
| 91 |
+
slides.append(slide_data)
|
| 92 |
+
|
| 93 |
+
structure = self._extract_sermon_structure(slides)
|
| 94 |
+
|
| 95 |
+
return {
|
| 96 |
+
"slides": slides,
|
| 97 |
+
"structure": structure
|
| 98 |
+
}
|
| 99 |
+
|
| 100 |
+
def extract_with_structure(self, pdf_path: str) -> str:
|
| 101 |
+
"""
|
| 102 |
+
Extract text from PDF preserving structure.
|
| 103 |
+
|
| 104 |
+
Args:
|
| 105 |
+
pdf_path: Path to PDF file
|
| 106 |
+
|
| 107 |
+
Returns:
|
| 108 |
+
Extracted text with layout preserved
|
| 109 |
+
"""
|
| 110 |
+
content = []
|
| 111 |
+
|
| 112 |
+
try:
|
| 113 |
+
with pdfplumber.open(pdf_path) as pdf:
|
| 114 |
+
for page in pdf.pages:
|
| 115 |
+
text = page.extract_text(layout=True)
|
| 116 |
+
if text:
|
| 117 |
+
content.append(text)
|
| 118 |
+
except Exception as e:
|
| 119 |
+
print(f"Error extracting PDF: {e}")
|
| 120 |
+
# Fallback to OCR
|
| 121 |
+
content = [self._ocr_page(pdf_path, i) for i in range(self._get_page_count(pdf_path))]
|
| 122 |
+
|
| 123 |
+
return "\n\n".join(content)
|
| 124 |
+
|
| 125 |
+
def _ocr_page(self, pdf_path: str, page_num: int) -> str:
|
| 126 |
+
"""
|
| 127 |
+
OCR a single page from PDF.
|
| 128 |
+
|
| 129 |
+
Args:
|
| 130 |
+
pdf_path: Path to PDF
|
| 131 |
+
page_num: Page number (0-indexed)
|
| 132 |
+
|
| 133 |
+
Returns:
|
| 134 |
+
Extracted text from OCR
|
| 135 |
+
"""
|
| 136 |
+
try:
|
| 137 |
+
images = convert_from_path(pdf_path, first_page=page_num+1, last_page=page_num+1)
|
| 138 |
+
if images:
|
| 139 |
+
return pytesseract.image_to_string(images[0], lang=self.ocr_languages)
|
| 140 |
+
except Exception as e:
|
| 141 |
+
print(f"OCR error on page {page_num}: {e}")
|
| 142 |
+
|
| 143 |
+
return ""
|
| 144 |
+
|
| 145 |
+
def _get_page_count(self, pdf_path: str) -> int:
|
| 146 |
+
"""Get total page count from PDF."""
|
| 147 |
+
try:
|
| 148 |
+
with pdfplumber.open(pdf_path) as pdf:
|
| 149 |
+
return len(pdf.pages)
|
| 150 |
+
except Exception:
|
| 151 |
+
return 0
|
| 152 |
+
|
| 153 |
+
def extract_date_from_filename(self, pdf_path: str) -> str:
|
| 154 |
+
"""
|
| 155 |
+
Extract date from PDF filename.
|
| 156 |
+
|
| 157 |
+
Looks for patterns like YYYY-MM-DD.
|
| 158 |
+
|
| 159 |
+
Args:
|
| 160 |
+
pdf_path: Path to PDF file
|
| 161 |
+
|
| 162 |
+
Returns:
|
| 163 |
+
Date string (YYYY-MM-DD) or empty string
|
| 164 |
+
"""
|
| 165 |
+
filename = Path(pdf_path).name
|
| 166 |
+
match = re.search(r'(\d{4}-\d{2}-\d{2})', filename)
|
| 167 |
+
if match:
|
| 168 |
+
return match.group(1)
|
| 169 |
+
return ""
|
| 170 |
+
|
| 171 |
+
def _parse_bulletin_sections(self, text: str) -> Dict:
|
| 172 |
+
"""Parse bulletin into structured sections."""
|
| 173 |
+
# TODO: Implement intelligent parsing
|
| 174 |
+
return {
|
| 175 |
+
"hymns": [],
|
| 176 |
+
"scripture": "",
|
| 177 |
+
"announcements": "",
|
| 178 |
+
"order": []
|
| 179 |
+
}
|
| 180 |
+
|
| 181 |
+
def _is_title_slide(self, text: str) -> bool:
|
| 182 |
+
"""Detect if slide is a title slide."""
|
| 183 |
+
# Simple heuristic: short text, no bullet points
|
| 184 |
+
lines = text.strip().split('\n')
|
| 185 |
+
return len(lines) <= 3 and not any(line.strip().startswith(('•', '-', '*')) for line in lines)
|
| 186 |
+
|
| 187 |
+
def _is_scripture_slide(self, text: str) -> bool:
|
| 188 |
+
"""Detect if slide contains scripture reference."""
|
| 189 |
+
# Look for common scripture patterns
|
| 190 |
+
scripture_patterns = [
|
| 191 |
+
r'[创出利民申].*\d+:\d+', # Chinese books
|
| 192 |
+
r'[约太可路罗林加弗腓西帖提多门彼雅启].*\d+:\d+',
|
| 193 |
+
r'\b[A-Z][a-z]+\s+\d+:\d+', # English books
|
| 194 |
+
]
|
| 195 |
+
return any(re.search(pattern, text) for pattern in scripture_patterns)
|
| 196 |
+
|
| 197 |
+
def _extract_sermon_structure(self, slides: List[Dict]) -> Dict:
|
| 198 |
+
"""Extract sermon structure from slides."""
|
| 199 |
+
structure = {
|
| 200 |
+
"title": "",
|
| 201 |
+
"main_points": [],
|
| 202 |
+
"scriptures": []
|
| 203 |
+
}
|
| 204 |
+
|
| 205 |
+
# Find title
|
| 206 |
+
for slide in slides:
|
| 207 |
+
if slide["is_title"]:
|
| 208 |
+
structure["title"] = slide["text"].strip()
|
| 209 |
+
break
|
| 210 |
+
|
| 211 |
+
# Find main points and scriptures
|
| 212 |
+
for slide in slides:
|
| 213 |
+
if slide["is_scripture"]:
|
| 214 |
+
structure["scriptures"].append(slide["text"].strip())
|
| 215 |
+
elif not slide["is_title"] and slide["text"].strip():
|
| 216 |
+
structure["main_points"].append(slide["text"].strip())
|
| 217 |
+
|
| 218 |
+
return structure
|
| 219 |
+
|
| 220 |
+
|
| 221 |
+
class ChineseTextProcessor:
|
| 222 |
+
"""Process and normalize Chinese text."""
|
| 223 |
+
|
| 224 |
+
@staticmethod
|
| 225 |
+
def normalize_text(text: str) -> str:
|
| 226 |
+
"""
|
| 227 |
+
Normalize Chinese text.
|
| 228 |
+
|
| 229 |
+
- Fix punctuation
|
| 230 |
+
- Remove extra whitespace
|
| 231 |
+
- Standardize quotes
|
| 232 |
+
|
| 233 |
+
Args:
|
| 234 |
+
text: Input Chinese text
|
| 235 |
+
|
| 236 |
+
Returns:
|
| 237 |
+
Normalized text
|
| 238 |
+
"""
|
| 239 |
+
# Remove extra whitespace
|
| 240 |
+
text = re.sub(r'\s+', ' ', text)
|
| 241 |
+
|
| 242 |
+
# Normalize punctuation
|
| 243 |
+
replacements = {
|
| 244 |
+
',': ',',
|
| 245 |
+
'。': '。',
|
| 246 |
+
'!': '!',
|
| 247 |
+
'?': '?',
|
| 248 |
+
':': ':',
|
| 249 |
+
';': ';',
|
| 250 |
+
'\u201c': '"',  # left curly double quote
|
| 251 |
+
'\u201d': '"',  # right curly double quote
|
| 252 |
+
'\u2018': "'",  # left curly single quote
|
| 253 |
+
'\u2019': "'",  # right curly single quote
|
| 254 |
+
}
|
| 255 |
+
|
| 256 |
+
for old, new in replacements.items():
|
| 257 |
+
text = text.replace(old, new)
|
| 258 |
+
|
| 259 |
+
return text.strip()
|
| 260 |
+
|
| 261 |
+
@staticmethod
|
| 262 |
+
def segment_sermon(text: str) -> Dict:
|
| 263 |
+
"""
|
| 264 |
+
Segment Chinese sermon into logical sections.
|
| 265 |
+
|
| 266 |
+
Args:
|
| 267 |
+
text: Full sermon text
|
| 268 |
+
|
| 269 |
+
Returns:
|
| 270 |
+
{
|
| 271 |
+
"introduction": str,
|
| 272 |
+
"main_points": List[str],
|
| 273 |
+
"conclusion": str,
|
| 274 |
+
"scripture_references": List[str]
|
| 275 |
+
}
|
| 276 |
+
"""
|
| 277 |
+
# TODO: Implement intelligent segmentation
|
| 278 |
+
# For now, return basic structure
|
| 279 |
+
return {
|
| 280 |
+
"introduction": "",
|
| 281 |
+
"main_points": [],
|
| 282 |
+
"conclusion": "",
|
| 283 |
+
"scripture_references": []
|
| 284 |
+
}
|
examples/README.md
ADDED
|
@@ -0,0 +1,214 @@
| 1 |
+
# Example Files
|
| 2 |
+
|
| 3 |
+
This directory contains sample input files to help you understand the expected format for the Worship Program Generator.
|
| 4 |
+
|
| 5 |
+
## 📄 Files
|
| 6 |
+
|
| 7 |
+
### 1. `sample_chinese_sermon.txt`
|
| 8 |
+
**Type:** Pre-written Chinese sermon text
|
| 9 |
+
**Encoding:** UTF-8
|
| 10 |
+
**Use Case:** Option A - Upload as Chinese sermon text
|
| 11 |
+
|
| 12 |
+
**Content:**
|
| 13 |
+
- Complete sermon manuscript in Chinese
|
| 14 |
+
- Includes: Title, Scripture reference, Introduction, Main points, Conclusion
|
| 15 |
+
- Proper paragraph breaks and structure
|
| 16 |
+
- Scripture references in Chinese format
|
| 17 |
+
|
| 18 |
+
**How to Use:**
|
| 19 |
+
1. Upload this file as "Chinese Sermon Text"
|
| 20 |
+
2. Upload a bulletin PDF
|
| 21 |
+
3. Click "Generate Worship Program"
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
### 2. `sample_slides.pdf` (Not included - Create your own)
|
| 26 |
+
**Type:** Sermon slides presentation
|
| 27 |
+
**Format:** PDF (text-based or image-based)
|
| 28 |
+
**Use Case:** Option B - Generate narrative from slides
|
| 29 |
+
|
| 30 |
+
**Expected Content:**
|
| 31 |
+
- Title slide with sermon title
|
| 32 |
+
- Main point slides (bullet points or short text)
|
| 33 |
+
- Scripture reference slides
|
| 34 |
+
- Can be PowerPoint/Keynote exported as PDF
|
| 35 |
+
|
| 36 |
+
**How to Create:**
|
| 37 |
+
1. Create a sermon presentation in PowerPoint/Keynote
|
| 38 |
+
2. Export/Save as PDF
|
| 39 |
+
3. Upload as "Sermon Slides PDF"
|
| 40 |
+
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
### 3. `sample_bulletin.pdf` (Not included - Create your own)
|
| 44 |
+
**Type:** Worship bulletin
|
| 45 |
+
**Format:** PDF
|
| 46 |
+
**Use Case:** Required for all workflows
|
| 47 |
+
|
| 48 |
+
**Expected Content:**
|
| 49 |
+
- Worship date (preferably in filename: `bulletin-2024-01-07.pdf`)
|
| 50 |
+
- Order of worship
|
| 51 |
+
- Hymn numbers and titles
|
| 52 |
+
- Scripture reading passages
|
| 53 |
+
- Announcements
|
| 54 |
+
- Any liturgical elements
|
| 55 |
+
|
| 56 |
+
**Naming Convention:**
|
| 57 |
+
- Recommended: `RCCA-worship-bulletin-YYYY-MM-DD.pdf`
|
| 58 |
+
- Or: `bulletin-YYYY-MM-DD.pdf`
|
| 59 |
+
- Date will be auto-extracted from filename
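
The filename-based detection can be sketched as below. It mirrors the regular expression used by `extract_date_from_filename` in `core/document_processor.py`. Note that the pattern only checks the shape `YYYY-MM-DD`; it does not verify that the digits form a real calendar date.

```python
import re
from pathlib import Path


def extract_date_from_filename(pdf_path: str) -> str:
    """Return the first YYYY-MM-DD substring of the filename, or "" if none."""
    # Only the final path component is searched, so dates in folder names are ignored.
    filename = Path(pdf_path).name
    match = re.search(r"(\d{4}-\d{2}-\d{2})", filename)
    return match.group(1) if match else ""
```

Both recommended names work with this pattern, since the date may appear anywhere in the filename. If no date is found, the function returns an empty string and the date should be entered manually in the form.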
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
## 📝 File Format Guidelines
|
| 64 |
+
|
| 65 |
+
### Chinese Sermon Text (.txt)
|
| 66 |
+
|
| 67 |
+
```
|
| 68 |
+
[Sermon Title in Chinese]
|
| 69 |
+
|
| 70 |
+
经文:[Scripture Reference]
|
| 71 |
+
|
| 72 |
+
[Introduction paragraph]
|
| 73 |
+
|
| 74 |
+
一、[First Main Point]
|
| 75 |
+
[Content for first point]
|
| 76 |
+
|
| 77 |
+
二、[Second Main Point]
|
| 78 |
+
[Content for second point]
|
| 79 |
+
|
| 80 |
+
三、[Third Main Point]
|
| 81 |
+
[Content for third point]
|
| 82 |
+
|
| 83 |
+
[Conclusion]
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
**Tips:**
|
| 87 |
+
- Use UTF-8 encoding
|
| 88 |
+
- Include clear section markers (一、二、三 or I. II. III.)
|
| 89 |
+
- Add paragraph breaks for readability
|
| 90 |
+
- Include scripture references in Chinese format
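
Clear section markers make a sermon easy to split programmatically. The following is a minimal sketch, not the project's implementation (`segment_sermon` is still a TODO in `core/document_processor.py`). The helper name `find_main_points` and the marker pattern are assumptions for illustration, and only the Chinese-numeral markers (一、二、三) are handled.

```python
import re
from typing import List

# Assumed marker format: a line starting with Chinese numerals followed by "、",
# for example "一、神的呼召". Roman-numeral markers are not covered by this sketch.
SECTION_MARKER = re.compile(r"^([一二三四五六七八九十]+)、\s*(.+)$", re.MULTILINE)


def find_main_points(sermon_text: str) -> List[str]:
    """Return the heading text of every line that begins with a section marker."""
    return [m.group(2).strip() for m in SECTION_MARKER.finditer(sermon_text)]
```

Headings that are missing the marker, or that place it mid-line, are not detected, which is one reason to keep the markers consistent.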
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
### Sermon Slides PDF
|
| 95 |
+
|
| 96 |
+
**Recommended Structure:**
|
| 97 |
+
```
|
| 98 |
+
Slide 1: Title
|
| 99 |
+
信心的旅程
|
| 100 |
+
Journey of Faith
|
| 101 |
+
|
| 102 |
+
Slide 2: Scripture
|
| 103 |
+
创世记 12:1-9
|
| 104 |
+
Genesis 12:1-9
|
| 105 |
+
|
| 106 |
+
Slide 3: Main Point 1
|
| 107 |
+
• 神的呼召
|
| 108 |
+
• God's Call
|
| 109 |
+
• [Key points]
|
| 110 |
+
|
| 111 |
+
Slide 4: Main Point 2
|
| 112 |
+
• 神的应许
|
| 113 |
+
• God's Promise
|
| 114 |
+
• [Key points]
|
| 115 |
+
|
| 116 |
+
Slide 5: Application
|
| 117 |
+
• 实践的教导
|
| 118 |
+
• Practical Teaching
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
**Tips:**
|
| 122 |
+
- Keep text clear and readable
|
| 123 |
+
- Use consistent formatting
|
| 124 |
+
- Include both Chinese and English if bilingual
|
| 125 |
+
- Avoid heavy graphics (focus on text content)
|
| 126 |
+
|
| 127 |
+
---
|
| 128 |
+
|
| 129 |
+
### Worship Bulletin PDF
|
| 130 |
+
|
| 131 |
+
**Recommended Sections:**
|
| 132 |
+
```
|
| 133 |
+
主日崇拜程序
|
| 134 |
+
Sunday Worship Service
|
| 135 |
+
|
| 136 |
+
日期:2024年1月7日
|
| 137 |
+
|
| 138 |
+
序乐 Prelude
|
| 139 |
+
宣召 Call to Worship
|
| 140 |
+
祷告 Prayer
|
| 141 |
+
诗歌 Hymn #123
|
| 142 |
+
读经 Scripture Reading: 创世记 12:1-9
|
| 143 |
+
信息 Sermon: [Title]
|
| 144 |
+
回应诗歌 Response Hymn #456
|
| 145 |
+
奉献 Offering
|
| 146 |
+
祝福 Benediction
|
| 147 |
+
|
| 148 |
+
报告事项 Announcements
|
| 149 |
+
- [Announcement 1]
|
| 150 |
+
- [Announcement 2]
|
| 151 |
+
```
|
| 152 |
+
|
| 153 |
+
**Tips:**
|
| 154 |
+
- Include date prominently
|
| 155 |
+
- List hymns with numbers
|
| 156 |
+
- Specify scripture passages
|
| 157 |
+
- Keep format clean and structured
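
Listing hymns with numbers lets them be picked out with a simple pattern. This is a hedged sketch that assumes the `Hymn #123` style shown in the recommended sections above. `find_hymn_numbers` is a hypothetical helper and is not part of the repository, where bulletin section parsing is still a TODO.

```python
import re
from typing import List


def find_hymn_numbers(bulletin_text: str) -> List[str]:
    """Return hymn numbers, in order of appearance, from text such as 'Hymn #123'."""
    # Assumes the English word "Hymn" followed by "#" and digits; other styles are not matched.
    return re.findall(r"Hymn\s*#(\d+)", bulletin_text)
```

A bulletin that writes hymn numbers in another style (for example without the `#`) would need a different pattern.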
|
| 158 |
+
|
| 159 |
+
---
|
| 160 |
+
|
| 161 |
+
## 🧪 Testing the System
|
| 162 |
+
|
| 163 |
+
### Quick Test Workflow
|
| 164 |
+
|
| 165 |
+
1. **Prepare Files:**
|
| 166 |
+
- Chinese sermon text OR sermon slides PDF
|
| 167 |
+
- Worship bulletin PDF
|
| 168 |
+
|
| 169 |
+
2. **Upload:**
|
| 170 |
+
- Go to the Worship Program Generator interface
|
| 171 |
+
- Upload your files
|
| 172 |
+
- Enter the worship date, or leave it blank to auto-detect it from the bulletin filename
|
| 173 |
+
|
| 174 |
+
3. **Generate:**
|
| 175 |
+
- Click "Generate Worship Program"
|
| 176 |
+
- Wait 30-60 seconds
|
| 177 |
+
|
| 178 |
+
4. **Download:**
|
| 179 |
+
- Download both Markdown and DOCX versions
|
| 180 |
+
- Review for accuracy
|
| 181 |
+
- Edit as needed
|
| 182 |
+
|
| 183 |
+
---
|
| 184 |
+
|
| 185 |
+
## ⚠️ Common Issues
|
| 186 |
+
|
| 187 |
+
### Encoding Errors
|
| 188 |
+
- **Problem:** Chinese characters display incorrectly
|
| 189 |
+
- **Solution:** Save text files with UTF-8 encoding (GBK is also accepted as a fallback)
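
The sketch below shows how a text file can be read whether it was saved as UTF-8 or GBK, following the same fallback order as `app.py` (UTF-8 first, then GBK). The helper names `read_chinese_text` and `convert_to_utf8` are assumptions for illustration and do not exist in the repository.

```python
from pathlib import Path


def read_chinese_text(path: str) -> str:
    """Decode a text file as UTF-8, falling back to GBK if that fails."""
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # GBK is a superset of GB2312, so this covers both legacy encodings.
        return raw.decode("gbk")


def convert_to_utf8(src: str, dst: str) -> None:
    """Re-save a UTF-8 or GBK text file as UTF-8."""
    Path(dst).write_text(read_chinese_text(src), encoding="utf-8")
```

Converting a file once with `convert_to_utf8` before uploading avoids relying on the fallback at all.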
|
| 190 |
+
|
| 191 |
+
### PDF Extraction Failures
|
| 192 |
+
- **Problem:** Cannot extract text from PDF
|
| 193 |
+
- **Solution:** Ensure the PDF is not password-protected; if extraction still fails, regenerate the PDF with a text layer
|
| 194 |
+
|
| 195 |
+
### Missing Date
|
| 196 |
+
- **Problem:** Date not auto-detected
|
| 197 |
+
- **Solution:** Include date in filename or manually enter in the form
|
| 198 |
+
|
| 199 |
+
### Translation Quality
|
| 200 |
+
- **Problem:** Translation is awkward or inaccurate
|
| 201 |
+
- **Solution:** Review and manually edit the output, especially theological terms
|
| 202 |
+
|
| 203 |
+
---
|
| 204 |
+
|
| 205 |
+
## 📧 Support
|
| 206 |
+
|
| 207 |
+
For issues or questions:
|
| 208 |
+
1. Check the troubleshooting section in the main README
|
| 209 |
+
2. Review these example formats
|
| 210 |
+
3. Open an issue on GitHub with sample files (anonymized)
|
| 211 |
+
|
| 212 |
+
---
|
| 213 |
+
|
| 214 |
+
**Note:** The actual PDF files (`sample_slides.pdf` and `sample_bulletin.pdf`) are not included in this repository. Please create your own based on the guidelines above, or use your church's existing files.
|
examples/sample_chinese_sermon.txt
ADDED
|
@@ -0,0 +1,61 @@
| 1 |
+
信心的旅程:亚伯拉罕的呼召
|
| 2 |
+
|
| 3 |
+
经文:创世记12:1-9
|
| 4 |
+
|
| 5 |
+
引言:
|
| 6 |
+
|
| 7 |
+
今天我们一起来思考信心的含义。当我们回顾圣经中伟大的信心榜样时,亚伯拉罕的名字总是首先浮现在我们的脑海中。他被称为"信心之父",不是因为他从未怀疑,而是因为他在怀疑中仍然选择相信和顺服。
|
| 8 |
+
|
| 9 |
+
一、神的呼召(12:1)
|
| 10 |
+
|
| 11 |
+
"耶和华对亚伯兰说:你要离开本地、本族、父家,往我所要指示你的地去。"
|
| 12 |
+
|
| 13 |
+
这是一个看似不合理的呼召。神要求亚伯拉罕离开他所熟悉的一切:
|
| 14 |
+
- 离开本地:放弃安全的环境
|
| 15 |
+
- 离开本族:放弃亲密的关系
|
| 16 |
+
- 离开父家:放弃家族的产业
|
| 17 |
+
|
| 18 |
+
更令人困惑的是,神并没有明确告诉他目的地在哪里,只说"往我所要指示你的地去"。这需要完全的信靠。
|
| 19 |
+
|
| 20 |
+
在我们的生活中,神的呼召有时也是如此。祂可能要求我们离开舒适区,进入未知的领域。问题不在于我们是否感到害怕,而在于我们是否愿意顺服。
|
| 21 |
+
|
| 22 |
+
二、神的应许(12:2-3)
|
| 23 |
+
|
| 24 |
+
虽然神的要求看似严苛,但祂同时给予了宝贵的应许:
|
| 25 |
+
|
| 26 |
+
1. "我必叫你成为大国" - 后裔的应许
|
| 27 |
+
2. "我必赐福给你" - 福分的应许
|
| 28 |
+
3. "叫你的名为大" - 名声的应许
|
| 29 |
+
4. "你也要叫别人得福" - 使命的应许
|
| 30 |
+
|
| 31 |
+
这些应许显明了神呼召的目的。神呼召我们,不仅仅是为了我们个人的益处,更是为了祂国度的计划。我们蒙福,是为了成为别人的祝福。
|
| 32 |
+
|
| 33 |
+
三、信心的回应(12:4)
|
| 34 |
+
|
| 35 |
+
"亚伯兰就照着耶和华的吩咐去了。"
|
| 36 |
+
|
| 37 |
+
这简单的一句话,代表了巨大的信心行动。亚伯兰当时已经七十五岁,这个年纪通常是享受安逸的时候,但他选择了顺服。
|
| 38 |
+
|
| 39 |
+
真正的信心不是停留在口头上,而是表现在行动中。雅各书2:26说:"身体没有灵魂是死的,信心没有行为也是死的。"
|
| 40 |
+
|
| 41 |
+
四、实践的教导
|
| 42 |
+
|
| 43 |
+
1. **顺服需要勇气**
|
| 44 |
+
- 面对未知时,勇敢迈出第一步
|
| 45 |
+
- 相信神的引导胜过自己的计划
|
| 46 |
+
|
| 47 |
+
2. **等候需要耐心**
|
| 48 |
+
- 神的应许不总是立即实现
|
| 49 |
+
- 在等候中继续信靠和顺服
|
| 50 |
+
|
| 51 |
+
3. **祝福带来责任**
|
| 52 |
+
- 我们领受祝福,是为了传递祝福
|
| 53 |
+
- 神的恩典应该激励我们去服事他人
|
| 54 |
+
|
| 55 |
+
结语:
|
| 56 |
+
|
| 57 |
+
亚伯拉罕的信心之旅告诉我们,真正的信心是在不确定中仍然选择相信神。今天,神也在呼召我们,可能不是要我们离开物理上的家乡,但可能是要我们离开属灵上的舒适区。
|
| 58 |
+
|
| 59 |
+
让我们像亚伯拉罕一样,勇敢地回应神的呼召,因为我们知道,那呼召我们的是信实的。
|
| 60 |
+
|
| 61 |
+
愿神赐福我们每一个人,让我们在信心的旅程中不断成长。阿们。
|
llm/__init__.py
ADDED
@@ -0,0 +1,12 @@
+"""
+LLM client and prompt management modules.
+"""
+
+from .qwen_client import QwenClient
+from .prompt_templates import SYSTEM_PROMPTS, TASK_PROMPTS
+
+__all__ = [
+    "QwenClient",
+    "SYSTEM_PROMPTS",
+    "TASK_PROMPTS",
+]
llm/prompt_templates.py
ADDED
@@ -0,0 +1,166 @@
+"""
+System prompts and task templates for LLM interactions.
+"""
+
+SYSTEM_PROMPTS = {
+    "worship_assembler": """You are a worship program coordinator for a bilingual Chinese-English church.
+Your task is to create well-structured, reverent worship programs that integrate:
+- Sermon content (Chinese with English translation)
+- Hymns and worship songs
+- Scripture readings
+- Liturgical elements (prayers, responsive readings)
+- Announcements
+
+Format output in clear markdown with bilingual sections. Maintain a reverent, professional tone.
+Preserve the theological content and ensure proper formatting for both languages.""",
+
+    "translator": """You are a professional translator specializing in religious texts, liturgy, and theology.
+
+Preserve:
+- Theological accuracy and terminology
+- Cultural and denominational sensitivity
+- Formatting and structure
+- Tone and register
+- Scripture references
+
+Maintain natural language flow in the target language while staying faithful to the source.""",
+
+    "narrative_generator": """You are a pastoral assistant helping prepare sermon manuscripts.
+Generate flowing, coherent sermon narratives from outlines and slides.
+
+Maintain:
+- Theological depth and accuracy
+- Pastoral and encouraging tone
+- Logical flow and transitions
+- Proper Chinese language style
+- Clear main points and application""",
+}
+
+TASK_PROMPTS = {
+    "assemble_program": """Create a complete bilingual worship program using these sources:
+
+**Sermon Narrative (Chinese):**
+{chinese_sermon}
+
+**Sermon Translation (English):**
+{english_sermon}
+
+**Bulletin (Worship Order):**
+{bulletin_content}
+
+**Date:** {date}
+
+Generate a complete worship program in markdown format with these sections:
+
+1. **Header** - Date, theme in both languages
+2. **Prelude/Welcome** - 序乐/欢迎
+3. **Worship Songs** - Include hymn numbers from bulletin
+4. **Scripture Reading** - 读经 with references
+5. **Sermon** - 信息 (Chinese text followed by English translation)
+6. **Response/Offering** - 回应/奉献
+7. **Benediction** - 祝福
+8. **Announcements** - 报告事项
+
+Use this markdown structure:
+
+```markdown
+# 主日崇拜程序 Sunday Worship Program
+
+**日期 Date:** {date}
+
+---
+
+## 序乐 Prelude
+
+[Content from bulletin]
+
+## 诗歌敬拜 Worship in Song
+
+[Hymns with numbers]
+
+## 读经 Scripture Reading
+
+[Passage and reference]
+
+## 信息 Sermon
+
+### [Sermon Title in Chinese]
+### [Sermon Title in English]
+
+**中文 Chinese:**
+
+{chinese_sermon}
+
+**English:**
+
+{english_sermon}
+
+## 回应诗歌 Response Song
+
+[Hymn information]
+
+## 奉献 Offering
+
+## 祝福 Benediction
+
+## 报告事项 Announcements
+
+[Announcements from bulletin]
+
+---
+```
+
+Generate the complete program now:""",
+
+    "extract_sermon_structure": """Analyze this sermon content and extract its structure:
+
+{sermon_text}
+
+Provide a structured analysis in this format:
+
+**Title:**
+- Chinese: [title]
+- English: [title]
+
+**Main Points:**
+1. [Point 1]
+2. [Point 2]
+3. [Point 3]
+
+**Scripture References:**
+- [Reference 1]
+- [Reference 2]
+
+**Key Themes:**
+- [Theme 1]
+- [Theme 2]
+
+Provide the analysis:""",
+
+    "generate_narrative": """Based on these sermon slides, generate a flowing narrative sermon in Chinese:
+
+{slides_content}
+
+Requirements:
+1. Expand bullet points into complete paragraphs
+2. Add smooth transitions between sections
+3. Maintain theological depth
+4. Use appropriate pastoral tone
+5. Keep the structure: introduction → main points → conclusion
+6. Include applications and illustrations where appropriate
+
+Generate the complete sermon narrative:""",
+
+    "translate_sermon": """Translate this Chinese sermon to English, preserving theological accuracy:
+
+{chinese_text}
+
+Requirements:
+1. Maintain theological terminology accuracy
+2. Preserve the tone and style
+3. Keep paragraph structure
+4. Translate scripture references appropriately
+5. Ensure natural English flow
+
+English translation:""",
+}
|
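The task templates above are plain `str.format` templates, so each one is filled by passing its named fields and then wrapped in a system/user message pair. A minimal sketch of that step follows; `TEMPLATE` is a shortened, hypothetical stand-in for `TASK_PROMPTS["translate_sermon"]`, and `build_messages` is an illustrative helper that does not appear in the commit.

```python
from typing import Dict, List

# Hypothetical, shortened stand-in for TASK_PROMPTS["translate_sermon"].
TEMPLATE = """Translate this Chinese sermon to English:

{chinese_text}

English translation:"""


def build_messages(system_prompt: str, template: str, **fields: str) -> List[Dict[str, str]]:
    """Fill a template and wrap it in a system + user chat message list."""
    # str.format raises KeyError if a placeholder has no matching field.
    user_prompt = template.format(**fields)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "You are a professional translator.",
    TEMPLATE,
    # Braces inside a substituted value are not re-parsed by str.format.
    chinese_text="信心 {faith}",
)
```

Only braces written in the template itself are interpreted as placeholders, so sermon text that happens to contain `{` or `}` passes through unchanged. A literal brace added to a template would need to be doubled (`{{` or `}}`).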
llm/qwen_client.py
ADDED
@@ -0,0 +1,218 @@
+"""
+Qwen LLM client wrapper for HuggingFace Inference API.
+"""
+
+import os
+from typing import Dict, List, Optional
+from huggingface_hub import InferenceClient
+
+
+class QwenClient:
+    """Wrapper for Qwen model via HuggingFace Inference API."""
+
+    def __init__(
+        self,
+        model_id: str = "Qwen/Qwen2.5-7B-Instruct",
+        api_token: Optional[str] = None,
+        use_local: bool = False
+    ):
+        """
+        Initialize Qwen client.
+
+        Args:
+            model_id: HuggingFace model ID
+            api_token: HF API token (optional, uses env var if not provided)
+            use_local: If True, load model locally (requires GPU)
+        """
+        self.model_id = model_id
+        self.api_token = api_token or os.getenv("HF_API_TOKEN")
+        self.use_local = use_local
+
+        if use_local:
+            # Load model locally (requires GPU)
+            from transformers import AutoModelForCausalLM, AutoTokenizer
+            print(f"Loading {model_id} locally...")
+            self.tokenizer = AutoTokenizer.from_pretrained(model_id)
+            self.model = AutoModelForCausalLM.from_pretrained(
+                model_id,
+                device_map="auto",
+                torch_dtype="auto"
+            )
+            print("Model loaded successfully")
+        else:
+            # Use HF Inference API (serverless)
+            self.client = InferenceClient(
+                model=model_id,
+                token=self.api_token
+            )
+
+    def chat(
+        self,
+        messages: List[Dict[str, str]],
+        max_tokens: int = 2048,
+        temperature: float = 0.7,
+        **kwargs
+    ) -> str:
+        """
+        Send chat completion request.
+
+        Args:
+            messages: List of {"role": "user/assistant/system", "content": str}
+            max_tokens: Max generation length
+            temperature: Sampling temperature
+            **kwargs: Additional parameters
+
+        Returns:
+            Generated text
+        """
+        if self.use_local:
+            return self._chat_local(messages, max_tokens, temperature)
+        else:
+            return self._chat_api(messages, max_tokens, temperature)
+
+    def _chat_api(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float) -> str:
+        """Use HF Inference API."""
+        try:
+            response = self.client.chat_completion(
+                messages=messages,
+                max_tokens=max_tokens,
+                temperature=temperature,
+            )
+            return response.choices[0].message.content
+        except Exception as e:
+            print(f"Error calling HF Inference API: {e}")
+            raise
+
+    def _chat_local(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float) -> str:
+        """Use local model."""
+        try:
+            text = self.tokenizer.apply_chat_template(
+                messages,
+                tokenize=False,
+                add_generation_prompt=True
+            )
+            inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
+            outputs = self.model.generate(
+                **inputs,
+                max_new_tokens=max_tokens,
+                temperature=temperature,
+                do_sample=temperature > 0
+            )
+            generated = self.tokenizer.decode(
+                outputs[0][inputs["input_ids"].shape[1]:],  # drop the prompt tokens, keep only new ones
+                skip_special_tokens=True
+            )
+            return generated
+        except Exception as e:
+            print(f"Error with local model inference: {e}")
+            raise
+
+    def translate(
+        self,
+        text: str,
+        source_lang: str = "Chinese",
+        target_lang: str = "English"
+    ) -> str:
+        """
+        Translate text between languages.
+
+        Args:
+            text: Source text
+            source_lang: Source language name
+            target_lang: Target language name
+
+        Returns:
+            Translated text
+        """
+        prompt = f"""Translate the following {source_lang} text to {target_lang}.
+Preserve formatting, meaning, and theological terminology accurately.
+
+{source_lang} text:
+{text}
+
+{target_lang} translation:"""
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a professional translator specializing in religious and liturgical texts. Maintain theological accuracy and cultural sensitivity."
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+
+        return self.chat(messages, temperature=0.3)
+
+    def generate_narrative(self, slides_content: str) -> str:
+        """
+        Generate sermon narrative from slide bullet points.
+
+        Args:
+            slides_content: Extracted content from slides
+
+        Returns:
+            Generated sermon narrative in Chinese
+        """
+        prompt = f"""Based on these sermon slides, generate a flowing narrative sermon text in Chinese.
+Expand bullet points into complete paragraphs while preserving the theological content and structure.
+
+Sermon Slides:
+{slides_content}
+
+Generate a complete, cohesive sermon narrative:"""
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a pastoral assistant who helps prepare sermon manuscripts. Generate flowing, theologically sound sermon narratives."
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+
+        return self.chat(messages, max_tokens=4096, temperature=0.7)
+
+    def assemble_program(
+        self,
+        chinese_sermon: str,
+        english_sermon: str,
+        bulletin_content: str,
+        date: str
+    ) -> str:
+        """
+        Assemble complete bilingual worship program.
+
+        Args:
+            chinese_sermon: Chinese sermon text
+            english_sermon: English sermon translation
+            bulletin_content: Extracted bulletin content
+            date: Worship date
+
+        Returns:
+            Complete worship program in markdown format
+        """
+        from .prompt_templates import TASK_PROMPTS, SYSTEM_PROMPTS
+
+        prompt = TASK_PROMPTS["assemble_program"].format(
+            chinese_sermon=chinese_sermon,
+            english_sermon=english_sermon,
+            bulletin_content=bulletin_content,
+            date=date
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": SYSTEM_PROMPTS["worship_assembler"]
+            },
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ]
+
+        return self.chat(messages, max_tokens=4096, temperature=0.5)
|
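Every helper in `llm/qwen_client.py` goes through `chat(messages, max_tokens, temperature)`, so code that depends on the client can be exercised offline with a stand-in that exposes the same method. The sketch below assumes only that signature; `EchoClient` and `translate_with` are hypothetical names that are not part of this commit, and the prompts are abbreviated.

```python
from typing import Dict, List


class EchoClient:
    """Hypothetical offline stand-in with the same chat() signature as QwenClient."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def chat(self, messages: List[Dict[str, str]], max_tokens: int = 2048,
             temperature: float = 0.7, **kwargs) -> str:
        # Record the request so a test can inspect it, then return a canned reply.
        self.calls.append({"messages": messages, "max_tokens": max_tokens,
                           "temperature": temperature})
        return f"[stub reply to {len(messages)} messages]"


def translate_with(client, text: str) -> str:
    """Build a system + user message pair and send it with a low temperature."""
    messages = [
        {"role": "system", "content": "You are a professional translator."},
        {"role": "user", "content": f"Chinese text:\n{text}\n\nEnglish translation:"},
    ]
    return client.chat(messages, temperature=0.3)


client = EchoClient()
reply = translate_with(client, "愿神赐福我们每一个人。")
```

Because the stand-in records each call, a test can check that the translation path uses the lower temperature and the expected message roles without a token or a network connection.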
requirements.txt
ADDED
@@ -0,0 +1,40 @@
+# Worship Program Generator - Dependencies
+
+# Core Framework
+gradio>=4.44.0
+python-dotenv>=1.0.0
+
+# HuggingFace & LLM
+huggingface_hub>=0.20.0
+transformers>=4.40.0
+accelerate>=0.25.0
+# torch - uncomment if using local model inference
+# torch>=2.0.0
+# Note: For HF Inference API only, torch is not required
+
+# Document Processing
+pypdf2>=3.0.0
+pdfplumber>=0.10.0
+pillow>=10.0.0
+pytesseract>=0.3.10
+pdf2image>=1.16.3
+
+# Optional: Better PDF processing
+pymupdf>=1.23.0  # PyMuPDF for advanced PDF handling
+
+# Text Processing
+python-docx>=1.1.0
+markdown>=3.5.0
+
+# HTTP & API
+requests>=2.31.0
+aiohttp>=3.9.0
+
+# Utilities
+tqdm>=4.66.0
+python-dateutil>=2.8.0
+
+# Development (optional - comment out for production)
+# pytest>=7.4.0
+# black>=23.0.0
+# flake8>=6.0.0
utils/__init__.py
ADDED
@@ -0,0 +1,12 @@
+"""
+Utility functions for file handling and format conversion.
+"""
+
+from .markdown_to_docx import markdown_to_docx
+from .file_utils import sanitize_filename, ensure_directory
+
+__all__ = [
+    "markdown_to_docx",
+    "sanitize_filename",
+    "ensure_directory",
+]
utils/file_utils.py
ADDED
@@ -0,0 +1,75 @@
+"""
+File handling utilities.
+"""
+
+import re
+from pathlib import Path
+from typing import Union
+
+
+def sanitize_filename(filename: str) -> str:
+    """
+    Sanitize filename by removing invalid characters.
+
+    Args:
+        filename: Original filename
+
+    Returns:
+        Sanitized filename safe for filesystems
+    """
+    # Remove invalid characters
+    filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
+
+    # Remove leading/trailing spaces and dots
+    filename = filename.strip('. ')
+
+    # Limit length
+    if len(filename) > 255:
+        name, ext = filename.rsplit('.', 1) if '.' in filename else (filename, '')
+        filename = name[:250] + ('.' + ext if ext else '')
+
+    return filename
+
+
+def ensure_directory(path: Union[str, Path]) -> Path:
+    """
+    Ensure directory exists, create if necessary.
+
+    Args:
+        path: Directory path
+
+    Returns:
+        Path object
+    """
+    path = Path(path)
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+def get_file_size_mb(file_path: Union[str, Path]) -> float:
+    """
+    Get file size in megabytes.
+
+    Args:
+        file_path: Path to file
+
+    Returns:
+        File size in MB
+    """
+    path = Path(file_path)
+    return path.stat().st_size / (1024 * 1024)
+
+
+def validate_file_type(file_path: Union[str, Path], allowed_extensions: list) -> bool:
+    """
+    Validate file extension.
+
+    Args:
+        file_path: Path to file
+        allowed_extensions: List of allowed extensions (e.g., ['.pdf', '.txt'])
+
+    Returns:
+        True if valid, False otherwise
+    """
+    path = Path(file_path)
+    return path.suffix.lower() in [ext.lower() for ext in allowed_extensions]
|
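`sanitize_filename` limits length by counting characters, while common Linux filesystems such as ext4 limit a file name to 255 bytes, and a Chinese character usually occupies three bytes in UTF-8. A minimal sketch of a byte-aware variant follows; `truncate_filename_bytes` is a hypothetical helper that is not part of this commit, and the 255-byte default is an assumption about the target filesystem.

```python
def truncate_filename_bytes(filename: str, max_bytes: int = 255) -> str:
    """Shorten a filename so its UTF-8 encoding fits in max_bytes, keeping the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the stem.
        stem, ext = filename, ""
    suffix = f".{ext}" if ext else ""

    # Reserve room for the extension, then cut the stem at the byte level.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)
    encoded = stem.encode("utf-8")[:budget]

    # errors="ignore" drops a multi-byte character that the slice cut in half.
    return encoded.decode("utf-8", errors="ignore") + suffix


short_name = truncate_filename_bytes("program.docx")
long_name = truncate_filename_bytes("信" * 100 + ".docx")
```

Names that already fit come back unchanged. In the long example the stem is cut to whole characters so that the encoded result, including `.docx`, stays within the limit.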
utils/markdown_to_docx.py
ADDED
@@ -0,0 +1,100 @@
+"""
+Convert markdown to DOCX with proper formatting.
+"""
+
+from docx import Document
+from docx.shared import Pt, Inches
+from docx.enum.text import WD_ALIGN_PARAGRAPH
+import re
+
+
+def markdown_to_docx(markdown_content: str, output_path: str):
+    """
+    Convert markdown content to DOCX file.
+
+    Args:
+        markdown_content: Markdown text
+        output_path: Path to save DOCX file
+
+    Note:
+        This is a basic converter. For more complex markdown,
+        consider using pandoc or pypandoc.
+    """
+    doc = Document()
+
+    # Set document styles
+    style = doc.styles['Normal']
+    style.font.name = 'Arial'
+    style.font.size = Pt(11)
+
+    lines = markdown_content.split('\n')
+    i = 0
+
+    while i < len(lines):
+        line = lines[i].strip()
+
+        # Skip empty lines
+        if not line:
+            i += 1
+            continue
+
+        # Headers
+        if line.startswith('# '):
+            heading = doc.add_heading(line[2:], level=1)
+            heading.alignment = WD_ALIGN_PARAGRAPH.CENTER
+        elif line.startswith('## '):
+            doc.add_heading(line[3:], level=2)
+        elif line.startswith('### '):
+            doc.add_heading(line[4:], level=3)
+
+        # Horizontal rules
+        elif line.startswith('---'):
+            doc.add_paragraph('_' * 50)
+
+        # Bold text (simple pattern)
+        elif '**' in line:
+            p = doc.add_paragraph()
+            parts = line.split('**')
+            for idx, part in enumerate(parts):
+                if idx % 2 == 1:  # Bold parts
+                    run = p.add_run(part)
+                    run.bold = True
+                else:
+                    p.add_run(part)
+
+        # Lists
+        elif line.startswith('- ') or line.startswith('* '):
+            doc.add_paragraph(line[2:], style='List Bullet')
+        elif re.match(r'^\d+\.\s', line):
+            doc.add_paragraph(re.sub(r'^\d+\.\s+', '', line), style='List Number')  # strip marker of any width, e.g. "10. "
+
+        # Regular paragraphs
+        else:
+            # Handle multiple consecutive lines as one paragraph
+            paragraph_lines = [line]
+            j = i + 1
+            while j < len(lines) and lines[j].strip() and not _is_special_line(lines[j]):
+                paragraph_lines.append(lines[j].strip())
+                j += 1
+
+            full_paragraph = ' '.join(paragraph_lines)
+            doc.add_paragraph(full_paragraph)
+            i = j - 1
+
+        i += 1
+
+    # Save document
+    doc.save(output_path)
+
+
+def _is_special_line(line: str) -> bool:
+    """Check if line is a special markdown element."""
+    line = line.strip()
+    return (
+        line.startswith('#') or
+        line.startswith('-') or
+        line.startswith('*') or
+        line.startswith('---') or
+        bool(re.match(r'^\d+\.\s', line)) or
+        '**' in line
+    )
|
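The bold branch of `markdown_to_docx` relies on a simple property of `str.split('**')`: pieces at odd indices sit between a pair of markers and are therefore bold. The sketch below isolates that step as a pure function so it can be checked without python-docx; `split_bold_runs` is a hypothetical helper that is not part of this commit.

```python
from typing import List, Tuple


def split_bold_runs(line: str) -> List[Tuple[str, bool]]:
    """Split a markdown line into (text, is_bold) runs using '**' as the delimiter."""
    runs: List[Tuple[str, bool]] = []
    for idx, part in enumerate(line.split("**")):
        if part:  # skip the empty pieces produced by leading or adjacent markers
            # Odd-indexed pieces lie between an opening and a closing '**'.
            runs.append((part, idx % 2 == 1))
    return runs


runs = split_bold_runs("**日期 Date:** 2024-01-07")
```

Each `(text, is_bold)` pair maps onto one `add_run` call in the converter. The approach assumes balanced markers: with an unmatched `**`, everything after it is treated as bold.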