DynamicPacific committed on
Commit
7ca4566
·
1 Parent(s): c20ee5f

Deploy worship program generator application to HF Space

README.md CHANGED
@@ -1,12 +1,248 @@
1
  ---
2
- title: Worship Agent
3
- emoji: 👁
4
- colorFrom: yellow
5
  colorTo: purple
6
  sdk: gradio
7
- sdk_version: 5.49.1
8
  app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Worship Program Generator
3
+ emoji: 🙏
4
+ colorFrom: blue
5
  colorTo: purple
6
  sdk: gradio
7
+ sdk_version: 4.44.0
8
  app_file: app.py
9
  pinned: false
10
+ license: mit
11
+ python_version: "3.10"
12
+ suggested_hardware: cpu-basic
13
  ---
14
 
15
+ # 🙏 主日崇拜程序生成器 Worship Program Generator
16
+
17
+ **Generate bilingual (Chinese-English) worship programs automatically from multiple source documents.**
18
+
19
+ This AI-powered tool helps church staff create comprehensive worship programs by:
20
+ - Extracting content from worship bulletins (PDF)
21
+ - Generating sermon narratives from slide presentations (PDF)
22
+ - Translating between Chinese and English
23
+ - Assembling complete bilingual worship programs
24
+
25
+ ## ✨ Features
26
+
27
+ ### 📄 Multi-Source Input Support
28
+ - **Chinese Sermon Text**: Upload pre-written sermon manuscripts (.txt)
29
+ - **Sermon Slides PDF**: Generate flowing narratives from bullet-point slides
30
+ - **Worship Bulletin PDF**: Extract liturgical elements, hymns, scripture readings
31
+
32
+ ### 🤖 AI-Powered Processing
33
+ - **Narrative Generation**: Convert sermon slides into cohesive sermon text
34
+ - **Translation**: High-quality Chinese ↔ English translation preserving theological nuance
35
+ - **Program Assembly**: Intelligently combine all elements into structured worship order
36
+
37
+ ### 📤 Output Formats
38
+ - **Markdown**: Easy to edit and version control
39
+ - **DOCX**: Ready for printing and distribution
40
+
41
+ ### 🌐 Bilingual Support
42
+ - Seamless Chinese-English parallel text
43
+ - Preserves cultural and theological context
44
+ - Liturgical terminology handled appropriately
45
+
46
+ ## 🚀 Quick Start
47
+
48
+ ### Option A: Pre-Written Sermon
49
+ 1. Upload your **Chinese sermon text** (.txt file)
50
+ 2. Upload your **worship bulletin** (PDF)
51
+ 3. Enter the worship date
52
+ 4. Click "Generate Worship Program"
53
+
54
+ ### Option B: Generate from Slides
55
+ 1. Upload your **sermon slides** (PDF)
56
+ 2. Upload your **worship bulletin** (PDF)
57
+ 3. Enter the worship date
58
+ 4. Click "Generate Worship Program"
59
+
60
+ The AI will:
61
+ - Extract content from all sources
62
+ - Generate narrative text (if needed)
63
+ - Translate the Chinese sermon into English
64
+ - Assemble complete worship program
65
+ - Export to Markdown and DOCX
66
+
67
+ ## 🛠️ Technical Details
68
+
69
+ ### LLM Backend
70
+ - **Model**: Qwen2.5-7B-Instruct (Alibaba Cloud)
71
+ - **Deployment**: HuggingFace Inference API (serverless)
72
+ - **Languages**: Optimized for Chinese and English
73
+
74
+ ### Document Processing
75
+ - **PDF Extraction**: Text and image-based PDFs supported
76
+ - **OCR**: Automatic OCR for scanned documents (Tesseract)
77
+ - **Structure Detection**: Intelligent parsing of worship elements
78
+
79
+ ### Architecture
80
+ ```
81
+ Input PDFs → Document Processor → LLM (Qwen) → Program Assembler → Output (MD/DOCX)
82
+ ```
83
+
84
+ ## 📋 Input Requirements
85
+
86
+ ### Chinese Sermon Text (Option A)
87
+ - Format: Plain text (.txt)
88
+ - Encoding: UTF-8
89
+ - Recommended: Include paragraph breaks and section markers
90
+
91
+ ### Sermon Slides PDF (Option B)
92
+ - Format: PDF
93
+ - Content: Can be text-based or image-based (OCR supported)
94
+ - Structure: Title slides, main points, scripture references
95
+
96
+ ### Worship Bulletin PDF (Required)
97
+ - Format: PDF
98
+ - Should include:
99
+ - Worship date
100
+ - Order of service
101
+ - Hymn numbers/titles
102
+ - Scripture readings
103
+ - Announcements
104
+
105
+ ## 📦 Project Structure
106
+
107
+ ```
108
+ worship-program-agent/
109
+ ├── app.py # Gradio UI
110
+ ├── requirements.txt # Dependencies
111
+ ├── README.md # This file
112
+ ├── .env.example # Configuration template
113
+ ├── core/
114
+ │ ├── document_processor.py # PDF extraction & OCR
115
+ │ ├── translator.py # Translation logic
116
+ │ ├── narrative_generator.py # Sermon generation
117
+ │ └── program_assembler.py # Final assembly
118
+ ├── agents/
119
+ │ ├── worship_agent.py # Workflow orchestration
120
+ │ └── tools.py # Agent tools
121
+ ├── llm/
122
+ │ ├── qwen_client.py # Qwen LLM wrapper
123
+ │ └── prompt_templates.py # System prompts
124
+ ├── utils/
125
+ │ ├── file_utils.py # File handling
126
+ │ └── markdown_to_docx.py # Format conversion
127
+ └── examples/
128
+ ├── sample_sermon.txt
129
+ ├── sample_slides.pdf
130
+ └── sample_bulletin.pdf
131
+ ```
132
+
133
+ ## 🔧 Local Development
134
+
135
+ ### Prerequisites
136
+ - Python 3.10+
137
+ - Tesseract OCR (for scanned PDFs)
138
+ - HuggingFace API token
139
+
140
+ ### Setup
141
+
142
+ ```bash
143
+ # Clone the repository
144
+ git clone <your-repo-url>
145
+ cd worship-program-agent
146
+
147
+ # Install dependencies
148
+ pip install -r requirements.txt
149
+
150
+ # Configure environment
151
+ cp .env.example .env
152
+ # Edit .env and add your HF_API_TOKEN
153
+
154
+ # Install Tesseract (for OCR support)
155
+ # Ubuntu/Debian:
156
+ sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng
157
+
158
+ # macOS:
159
+ brew install tesseract tesseract-lang
160
+
161
+ # Run locally
162
+ python app.py
163
+ ```
164
+
165
+ ### Configuration
166
+
167
+ Edit `.env` file:
168
+ ```bash
169
+ MODEL_ID=Qwen/Qwen2.5-7B-Instruct
170
+ HF_API_TOKEN=your_token_here
171
+ USE_LOCAL_MODEL=false
172
+ OCR_LANGUAGES=eng+chi_sim
173
+ ```
174
+
175
+ ## 🌐 Deployment
176
+
177
+ ### HuggingFace Spaces
178
+
179
+ This app is designed for HuggingFace Spaces deployment:
180
+
181
+ 1. **Create a new Space** on HuggingFace
182
+ 2. **Push this repository** to the Space
183
+ 3. **Set environment variables** in Space settings:
184
+ - `HF_API_TOKEN`: Your HuggingFace API token
185
+ - `MODEL_ID`: (Optional) Custom model selection
186
+ 4. **Select hardware**: `cpu-basic` (recommended for Inference API)
187
+
188
+ The Space will automatically build and deploy.
189
+
190
+ ### Alternative: Local Model
191
+
192
+ For faster inference, use local GPU:
193
+
194
+ 1. Set `suggested_hardware: t4-medium` in README metadata
195
+ 2. Set `USE_LOCAL_MODEL=true` in environment
196
+ 3. Uncomment `torch` in requirements.txt
197
+
198
+ Note: Local model requires ~14GB GPU memory for Qwen 2.5-7B.
199
+
200
+ ## 📊 Performance
201
+
202
+ ### Typical Processing Time
203
+ - **Bulletin extraction**: 2-5 seconds
204
+ - **Sermon narrative generation**: 15-30 seconds
205
+ - **Translation**: 10-20 seconds
206
+ - **Program assembly**: 5-10 seconds
207
+ - **Total**: 30-60 seconds (depending on content length)
208
+
209
+ ### API Costs (HF Inference API)
210
+ - Free tier: 1,000 requests/month
211
+ - Paid tier: ~$0.001-0.005 per request
212
+ - Typical program generation: ~3-4 API calls
213
+
214
+ ## ⚠️ Limitations
215
+
216
+ - **Maximum file size**: 20MB per upload
217
+ - **PDF complexity**: Very complex layouts may require manual review
218
+ - **OCR accuracy**: Scanned documents may have transcription errors
219
+ - **Translation**: Review theological terms for accuracy
220
+ - **Rate limits**: HF Inference API has rate limiting
221
+
222
+ ## 🤝 Contributing
223
+
224
+ Contributions welcome! Areas for improvement:
225
+ - Additional language pairs
226
+ - Custom template support
227
+ - Batch processing
228
+ - Enhanced structure detection
229
+ - Alternative LLM backends
230
+
231
+ ## 📄 License
232
+
233
+ MIT License - see LICENSE file
234
+
235
+ ## 🙏 Acknowledgments
236
+
237
+ - **Qwen Team** (Alibaba Cloud) - LLM model
238
+ - **HuggingFace** - Inference infrastructure
239
+ - **Gradio** - UI framework
240
+ - **Tesseract** - OCR engine
241
+
242
+ ## 📞 Support
243
+
244
+ For issues, questions, or feature requests, please open an issue on GitHub.
245
+
246
+ ---
247
+
248
+ **Built with ❤️ for church communities**
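The Configuration section lists the `.env` keys, and app.py reads most of them through `os.getenv` with defaults. Below is a minimal sketch of collecting them in one place. It assumes a hypothetical `AppConfig`/`load_config` pair that is not part of this commit. `OCR_LANGUAGES` appears in the README, but the app.py shown here never reads it, so its default is borrowed from the `DocumentProcessor` constructor.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    """Settings read from the environment (hypothetical helper mirroring the README's .env keys)."""
    model_id: str
    hf_api_token: Optional[str]
    use_local_model: bool
    ocr_languages: str
    max_file_size_mb: int


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build an AppConfig using the same defaults app.py applies."""
    return AppConfig(
        model_id=env.get("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct"),
        hf_api_token=env.get("HF_API_TOKEN"),
        # Only the string "true" (any case) enables the local model, as in app.py.
        use_local_model=env.get("USE_LOCAL_MODEL", "false").lower() == "true",
        # Assumed default, taken from DocumentProcessor(ocr_languages="eng+chi_sim").
        ocr_languages=env.get("OCR_LANGUAGES", "eng+chi_sim"),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "20")),
    )


if __name__ == "__main__":
    print(load_config({"USE_LOCAL_MODEL": "TRUE", "MAX_FILE_SIZE_MB": "10"}))
```

Because `load_config` accepts any mapping, tests can pass a plain dict and leave the real process environment untouched.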
agents/__init__.py ADDED
@@ -0,0 +1,11 @@
1
+ """
2
+ Agent orchestration and workflow modules.
3
+ """
4
+
5
+ from .worship_agent import WorshipProgramAgent
6
+ from .tools import WorshipProgramTools
7
+
8
+ __all__ = [
9
+ "WorshipProgramAgent",
10
+ "WorshipProgramTools",
11
+ ]
agents/tools.py ADDED
@@ -0,0 +1,213 @@
1
+ """
2
+ Tool functions for worship program generation workflow.
3
+ """
4
+
5
+ from typing import Dict, List, Optional
6
+ from pathlib import Path
7
+
8
+
9
+ class WorshipProgramTools:
10
+ """Tool functions for worship program generation."""
11
+
12
+ def __init__(self, llm_client):
13
+ """
14
+ Initialize tools with LLM client.
15
+
16
+ Args:
17
+ llm_client: Instance of QwenClient or compatible LLM client
18
+ """
19
+ from core.document_processor import DocumentProcessor, ChineseTextProcessor
20
+
21
+ self.llm = llm_client
22
+ self.doc_processor = DocumentProcessor()
23
+ self.cn_processor = ChineseTextProcessor()
24
+
25
+ def extract_bulletin_tool(self, pdf_path: str) -> Dict:
26
+ """
27
+ Extract worship order and elements from bulletin PDF.
28
+
29
+ Args:
30
+ pdf_path: Path to bulletin PDF file
31
+
32
+ Returns:
33
+ {
34
+ "success": bool,
35
+ "data": Dict or None,
36
+ "error": str (if failed),
37
+ "message": str
38
+ }
39
+ """
40
+ try:
41
+ result = self.doc_processor.extract_bulletin_pdf(pdf_path)
42
+ return {
43
+ "success": True,
44
+ "data": result,
45
+ "message": f"Extracted bulletin content ({len(result.get('text', ''))} chars)"
46
+ }
47
+ except Exception as e:
48
+ return {
49
+ "success": False,
50
+ "error": str(e),
51
+ "message": f"Failed to extract bulletin: {str(e)}"
52
+ }
53
+
54
+ def generate_sermon_narrative_tool(self, slides_pdf_path: str) -> Dict:
55
+ """
56
+ Generate flowing sermon narrative from slide PDF.
57
+
58
+ Steps:
59
+ 1. Extract text/images from slides
60
+ 2. Identify structure (title, points, scriptures)
61
+ 3. Generate cohesive narrative using LLM
62
+
63
+ Args:
64
+ slides_pdf_path: Path to sermon slides PDF
65
+
66
+ Returns:
67
+ {
68
+ "success": bool,
69
+ "narrative": str (if successful),
70
+ "structure": Dict,
71
+ "error": str (if failed),
72
+ "message": str
73
+ }
74
+ """
75
+ try:
76
+ # Extract slides
77
+ slides_data = self.doc_processor.extract_sermon_slides_pdf(slides_pdf_path)
78
+
79
+ # Format for LLM
80
+ slides_text = self._format_slides_for_generation(slides_data)
81
+
82
+ # Generate narrative
83
+ narrative = self.llm.generate_narrative(slides_text)
84
+
85
+ return {
86
+ "success": True,
87
+ "narrative": narrative,
88
+ "structure": slides_data["structure"],
89
+ "message": f"Generated sermon narrative ({len(narrative)} chars)"
90
+ }
91
+ except Exception as e:
92
+ return {
93
+ "success": False,
94
+ "error": str(e),
95
+ "message": f"Failed to generate sermon: {str(e)}"
96
+ }
97
+
98
+ def translate_text_tool(
99
+ self,
100
+ text: str,
101
+ source_lang: str = "Chinese",
102
+ target_lang: str = "English"
103
+ ) -> Dict:
104
+ """
105
+ Translate text between Chinese and English.
106
+
107
+ Args:
108
+ text: Source text
109
+ source_lang: Source language (Chinese/English)
110
+ target_lang: Target language (English/Chinese)
111
+
112
+ Returns:
113
+ {
114
+ "success": bool,
115
+ "translation": str (if successful),
116
+ "source_lang": str,
117
+ "target_lang": str,
118
+ "error": str (if failed)
119
+ }
120
+ """
121
+ try:
122
+ translation = self.llm.translate(text, source_lang, target_lang)
123
+ return {
124
+ "success": True,
125
+ "translation": translation,
126
+ "source_lang": source_lang,
127
+ "target_lang": target_lang,
128
+ "message": f"Translated {len(text)} chars"
129
+ }
130
+ except Exception as e:
131
+ return {
132
+ "success": False,
133
+ "error": str(e),
134
+ "message": f"Translation failed: {str(e)}"
135
+ }
136
+
137
+ def assemble_worship_program_tool(
138
+ self,
139
+ chinese_sermon: str,
140
+ english_sermon: str,
141
+ bulletin_data: Dict,
142
+ date: str
143
+ ) -> Dict:
144
+ """
145
+ Assemble complete bilingual worship program.
146
+
147
+ Args:
148
+ chinese_sermon: Chinese sermon text
149
+ english_sermon: English sermon translation
150
+ bulletin_data: Extracted bulletin data
151
+ date: Worship date (YYYY-MM-DD)
152
+
153
+ Returns:
154
+ {
155
+ "success": bool,
156
+ "program": str (markdown content if successful),
157
+ "error": str (if failed),
158
+ "message": str
159
+ }
160
+ """
161
+ try:
162
+ program_markdown = self.llm.assemble_program(
163
+ chinese_sermon=chinese_sermon,
164
+ english_sermon=english_sermon,
165
+ bulletin_content=bulletin_data.get("text", ""),
166
+ date=date
167
+ )
168
+
169
+ return {
170
+ "success": True,
171
+ "program": program_markdown,
172
+ "message": "Worship program assembled successfully"
173
+ }
174
+ except Exception as e:
175
+ return {
176
+ "success": False,
177
+ "error": str(e),
178
+ "message": f"Program assembly failed: {str(e)}"
179
+ }
180
+
181
+ def _format_slides_for_generation(self, slides_data: Dict) -> str:
182
+ """
183
+ Format extracted slides data for narrative generation.
184
+
185
+ Args:
186
+ slides_data: Output from extract_sermon_slides_pdf
187
+
188
+ Returns:
189
+ Formatted text for LLM input
190
+ """
191
+ lines = []
192
+
193
+ # Add structure summary
194
+ structure = slides_data.get("structure", {})
195
+ if structure.get("title"):
196
+ lines.append(f"# {structure['title']}\n")
197
+
198
+ # Add slides content
199
+ for slide in slides_data.get("slides", []):
200
+ text = slide["text"].strip()
201
+ if not text:
202
+ continue
203
+
204
+ if slide["is_title"]:
205
+ lines.append(f"## {text}")
206
+ elif slide["is_scripture"]:
207
+ lines.append(f"**Scripture:** {text}")
208
+ else:
209
+ lines.append(text)
210
+
211
+ lines.append("") # Add spacing
212
+
213
+ return "\n".join(lines)
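`WorshipProgramTools` calls only three methods on its `llm_client`: `generate_narrative(slides_text)`, `translate(text, source_lang, target_lang)`, and `assemble_program(chinese_sermon=..., english_sermon=..., bulletin_content=..., date=...)`. The hypothetical offline stand-in below assumes no other methods are needed. It lets the workflow run without an HF API token by returning canned strings and recording each call.

```python
from typing import List, Tuple


class FakeLLMClient:
    """Hypothetical test double exposing the methods WorshipProgramTools uses.

    Returns deterministic strings instead of calling the Hugging Face
    Inference API, and records every call for later inspection.
    """

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    def generate_narrative(self, slides_text: str) -> str:
        self.calls.append(("generate_narrative", (slides_text,)))
        return f"[narrative from {len(slides_text)} chars of slides]"

    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        self.calls.append(("translate", (text, source_lang, target_lang)))
        return f"[{source_lang}->{target_lang}] {text}"

    def assemble_program(
        self,
        chinese_sermon: str,
        english_sermon: str,
        bulletin_content: str,
        date: str,
    ) -> str:
        self.calls.append(("assemble_program", (date,)))
        # Minimal markdown program: a dated header followed by each input section.
        return (
            f"# Worship Program {date}\n\n"
            f"## 讲章\n\n{chinese_sermon}\n\n"
            f"## Sermon\n\n{english_sermon}\n\n"
            f"## Bulletin\n\n{bulletin_content}\n"
        )
```

You could pass it as `WorshipProgramAgent(FakeLLMClient(), output_dir="./test_outputs")`. PDF extraction still needs pdfplumber, pytesseract, and pdf2image installed, because `WorshipProgramTools` imports `core.document_processor`.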
agents/worship_agent.py ADDED
@@ -0,0 +1,201 @@
1
+ """
2
+ Main workflow orchestration agent for worship program generation.
3
+ """
4
+
5
+ from typing import Dict, Optional, Callable
6
+ from pathlib import Path
7
+ from agents.tools import WorshipProgramTools
8
+
9
+
10
+ class WorshipProgramAgent:
11
+ """
12
+ Orchestrates worship program generation workflow.
13
+
14
+ Workflow:
15
+ 1. Extract bulletin (worship order, hymns, date)
16
+ 2. Process Chinese sermon OR generate from slides
17
+ 3. Translate sermon to English
18
+ 4. Assemble complete bilingual program
19
+ 5. Export to markdown and DOCX
20
+ """
21
+
22
+ def __init__(
23
+ self,
24
+ llm_client,
25
+ output_dir: str = "./outputs"
26
+ ):
27
+ """
28
+ Initialize worship program agent.
29
+
30
+ Args:
31
+ llm_client: Instance of QwenClient or compatible LLM
32
+ output_dir: Directory for output files
33
+ """
34
+ self.llm = llm_client
35
+ self.tools = WorshipProgramTools(llm_client)
36
+ self.output_dir = Path(output_dir)
37
+ self.output_dir.mkdir(exist_ok=True, parents=True)
38
+
39
+ def generate_program(
40
+ self,
41
+ chinese_sermon_text: Optional[str] = None,
42
+ sermon_slides_pdf: Optional[str] = None,
43
+ bulletin_pdf: Optional[str] = None,
44
+ date: Optional[str] = None,
45
+ progress_callback: Optional[Callable] = None
46
+ ) -> Dict:
47
+ """
48
+ Main workflow to generate worship program.
49
+
50
+ Args:
51
+ chinese_sermon_text: Pre-written Chinese sermon (if available)
52
+ sermon_slides_pdf: Sermon slides PDF (if sermon needs generation)
53
+ bulletin_pdf: Worship bulletin PDF (required)
54
+ date: Worship date (auto-extracted if not provided)
55
+ progress_callback: Function(progress: float, desc: str) to report progress
56
+
57
+ Returns:
58
+ {
59
+ "success": bool,
60
+ "markdown_path": str (if successful),
61
+ "docx_path": str (if successful),
62
+ "program_content": str,
63
+ "metadata": Dict,
64
+ "error": str (if failed)
65
+ }
66
+ """
67
+ def update_progress(step: str, pct: float):
68
+ """Helper to update progress."""
69
+ if progress_callback:
70
+ progress_callback(pct, desc=step)
71
+
72
+ try:
73
+ # Validation
74
+ if not bulletin_pdf:
75
+ raise ValueError("Bulletin PDF is required")
76
+
77
+ if not chinese_sermon_text and not sermon_slides_pdf:
78
+ raise ValueError("Must provide either chinese_sermon_text or sermon_slides_pdf")
79
+
80
+ # Step 1: Extract bulletin
81
+ update_progress("📄 Extracting bulletin...", 0.1)
82
+ bulletin_result = self.tools.extract_bulletin_tool(bulletin_pdf)
83
+
84
+ if not bulletin_result["success"]:
85
+ raise ValueError(f"Bulletin extraction failed: {bulletin_result.get('error', 'Unknown error')}")
86
+
87
+ bulletin_data = bulletin_result["data"]
88
+ if not date:
89
+ date = bulletin_data.get("date") or "未标注日期"
90
+
91
+ # Step 2: Get/Generate Chinese sermon
92
+ update_progress("📝 Processing sermon...", 0.3)
93
+ if chinese_sermon_text:
94
+ chinese_sermon = chinese_sermon_text
95
+ else:
96
+ sermon_result = self.tools.generate_sermon_narrative_tool(sermon_slides_pdf)
97
+ if not sermon_result["success"]:
98
+ raise ValueError(f"Sermon generation failed: {sermon_result.get('error', 'Unknown error')}")
99
+ chinese_sermon = sermon_result["narrative"]
100
+
101
+ # Step 3: Translate to English
102
+ update_progress("🌐 Translating sermon...", 0.5)
103
+ translation_result = self.tools.translate_text_tool(
104
+ text=chinese_sermon,
105
+ source_lang="Chinese",
106
+ target_lang="English"
107
+ )
108
+
109
+ if not translation_result["success"]:
110
+ raise ValueError(f"Translation failed: {translation_result.get('error', 'Unknown error')}")
111
+
112
+ english_sermon = translation_result["translation"]
113
+
114
+ # Step 4: Assemble program
115
+ update_progress("📋 Assembling worship program...", 0.7)
116
+ program_result = self.tools.assemble_worship_program_tool(
117
+ chinese_sermon=chinese_sermon,
118
+ english_sermon=english_sermon,
119
+ bulletin_data=bulletin_data,
120
+ date=date
121
+ )
122
+
123
+ if not program_result["success"]:
124
+ raise ValueError(f"Program assembly failed: {program_result.get('error', 'Unknown error')}")
125
+
126
+ program_markdown = program_result["program"]
127
+
128
+ # Step 5: Save outputs
129
+ update_progress("💾 Saving files...", 0.9)
130
+ markdown_path = self._save_markdown(program_markdown, date)
131
+ docx_path = self._save_docx(program_markdown, date)
132
+
133
+ update_progress("✅ Complete!", 1.0)
134
+
135
+ return {
136
+ "success": True,
137
+ "markdown_path": str(markdown_path),
138
+ "docx_path": str(docx_path) if docx_path else None,
139
+ "program_content": program_markdown,
140
+ "metadata": {
141
+ "date": date,
142
+ "chinese_sermon_length": len(chinese_sermon),
143
+ "english_sermon_length": len(english_sermon),
144
+ "bulletin_source": bulletin_pdf,
145
+ "sermon_source": "text" if chinese_sermon_text else "slides"
146
+ }
147
+ }
148
+
149
+ except Exception as e:
150
+ return {
151
+ "success": False,
152
+ "error": str(e),
153
+ "message": f"Workflow failed: {str(e)}"
154
+ }
155
+
156
+ def _save_markdown(self, content: str, date: str) -> Path:
157
+ """
158
+ Save program as markdown.
159
+
160
+ Args:
161
+ content: Markdown content
162
+ date: Date string for filename
163
+
164
+ Returns:
165
+ Path to saved file
166
+ """
167
+ # Sanitize date for filename
168
+ safe_date = date.replace("/", "-").replace(" ", "_")
169
+ filename = f"worship_program_{safe_date}.md"
170
+ filepath = self.output_dir / filename
171
+
172
+ with open(filepath, "w", encoding="utf-8") as f:
173
+ f.write(content)
174
+
175
+ return filepath
176
+
177
+ def _save_docx(self, markdown_content: str, date: str) -> Optional[Path]:
178
+ """
179
+ Convert markdown to DOCX and save.
180
+
181
+ Args:
182
+ markdown_content: Markdown content
183
+ date: Date string for filename
184
+
185
+ Returns:
186
+ Path to saved DOCX file, or None if conversion failed
187
+ """
188
+ from utils.markdown_to_docx import markdown_to_docx
189
+
190
+ safe_date = date.replace("/", "-").replace(" ", "_")
191
+ filename = f"worship_program_{safe_date}.docx"
192
+ filepath = self.output_dir / filename
193
+
194
+ try:
195
+ markdown_to_docx(markdown_content, str(filepath))
196
+ except Exception as e:
197
+ print(f"Warning: DOCX conversion failed: {e}")
198
+ # Return None if conversion fails
199
+ return None
200
+
201
+ return filepath
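`_save_markdown` and `_save_docx` sanitize the date only by replacing `/` and spaces. A user-typed date such as `2024-01-07 10:30` therefore still puts a `:` into the filename, and Windows rejects that character. Below is a hypothetical stricter helper, not part of this commit. It keeps the existing conventions (`/` becomes `-`, spaces become `_`) and leaves non-ASCII fallbacks such as `未标注日期` unchanged.

```python
import re

# Characters rejected in filenames by common filesystems (Windows is the
# strictest), plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_date_for_filename(date: str) -> str:
    """Turn a free-form date string into a filename-safe fragment (hypothetical helper)."""
    # Unsafe characters become hyphens; spaces become underscores.
    cleaned = _UNSAFE.sub("-", date.strip()).replace(" ", "_")
    # Collapse repeated hyphens and trim separators or dots from the ends.
    cleaned = re.sub(r"-{2,}", "-", cleaned).strip("-_.")
    # Never return an empty fragment, which would yield "worship_program_.md".
    return cleaned or "undated"
```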
app.py ADDED
@@ -0,0 +1,295 @@
1
+ """
2
+ Worship Program Generator - Gradio Application
3
+
4
+ Bilingual (Chinese-English) worship program generation from multiple sources.
5
+ """
6
+
7
+ import os
8
+ import gradio as gr
9
+ from pathlib import Path
10
+ from dotenv import load_dotenv
11
+
12
+ # Load environment variables
13
+ load_dotenv()
14
+
15
+ # Configuration
16
+ MODEL_ID = os.getenv("MODEL_ID", "Qwen/Qwen2.5-7B-Instruct")
17
+ HF_API_TOKEN = os.getenv("HF_API_TOKEN")
18
+ USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "false").lower() == "true"
19
+ MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "20"))
20
+
21
+ # Initialize LLM and Agent
22
+ from llm.qwen_client import QwenClient
23
+ from agents.worship_agent import WorshipProgramAgent
24
+
25
+ print(f"Initializing with model: {MODEL_ID}")
26
+ print(f"Using local model: {USE_LOCAL_MODEL}")
27
+
28
+ try:
29
+ llm_client = QwenClient(
30
+ model_id=MODEL_ID,
31
+ api_token=HF_API_TOKEN,
32
+ use_local=USE_LOCAL_MODEL
33
+ )
34
+ agent = WorshipProgramAgent(llm_client, output_dir="./outputs")
35
+ print("✓ Agent initialized successfully")
36
+ except Exception as e:
37
+ print(f"✗ Error initializing agent: {e}")
38
+ llm_client = None
39
+ agent = None
40
+
41
+
42
+ def process_worship_program(
43
+ chinese_sermon_file,
44
+ sermon_slides_file,
45
+ bulletin_file,
46
+ worship_date,
47
+ progress=gr.Progress()
48
+ ):
49
+ """
50
+ Main Gradio handler for worship program generation.
51
+
52
+ Args:
53
+ chinese_sermon_file: Uploaded .txt file with Chinese sermon (optional)
54
+ sermon_slides_file: Uploaded sermon slides PDF (optional)
55
+ bulletin_file: Uploaded bulletin PDF (required)
56
+ worship_date: Date string (YYYY-MM-DD)
57
+ progress: Gradio progress tracker
58
+
59
+ Returns:
60
+ (status_message, markdown_file, docx_file)
61
+ """
62
+ if agent is None:
63
+ return "❌ Error: Agent not initialized. Check configuration.", None, None
64
+
65
+ # Validation
66
+ if not bulletin_file:
67
+ return "❌ Error: Bulletin PDF is required", None, None
68
+
69
+ if not chinese_sermon_file and not sermon_slides_file:
70
+ return "❌ Error: Must provide either Chinese sermon text OR sermon slides PDF", None, None
71
+
72
+ # Check file sizes
73
+ try:
74
+ if bulletin_file and Path(bulletin_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
75
+ return f"❌ Error: Bulletin file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
76
+
77
+ if sermon_slides_file and Path(sermon_slides_file).stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
78
+ return f"❌ Error: Slides file exceeds {MAX_FILE_SIZE_MB}MB limit", None, None
79
+ except Exception as e:
80
+ return f"❌ Error checking file sizes: {str(e)}", None, None
81
+
82
+ try:
83
+ # Read Chinese sermon if provided
84
+ chinese_text = None
85
+ if chinese_sermon_file:
86
+ try:
87
+ with open(chinese_sermon_file, "r", encoding="utf-8") as f:
88
+ chinese_text = f.read()
89
+ except UnicodeDecodeError:
90
+ # Try GB2312/GBK encoding
91
+ with open(chinese_sermon_file, "r", encoding="gbk") as f:
92
+ chinese_text = f.read()
93
+
94
+ # Generate program
95
+ result = agent.generate_program(
96
+ chinese_sermon_text=chinese_text,
97
+ sermon_slides_pdf=sermon_slides_file,
98
+ bulletin_pdf=bulletin_file,
99
+ date=worship_date if worship_date else None,
100
+ progress_callback=lambda pct, desc: progress(pct, desc=desc)
101
+ )
102
+
103
+ if not result["success"]:
104
+ error_msg = result.get("message", "Unknown error")
105
+ return f"❌ Error: {error_msg}", None, None
106
+
107
+ # Format success message
108
+ metadata = result["metadata"]
109
+ status = f"""✅ **Worship Program Generated Successfully!**
110
+
111
+ **📅 Date:** {metadata['date']}
112
+
113
+ **📊 Statistics:**
114
+ - Chinese Sermon: {metadata['chinese_sermon_length']:,} characters
115
+ - English Sermon: {metadata['english_sermon_length']:,} characters
116
+ - Source: {"Pre-written text" if metadata['sermon_source'] == 'text' else "Generated from slides"}
117
+
118
+ **📁 Output Files:**
119
+ - Markdown: `{Path(result['markdown_path']).name}`
120
+ - DOCX: `{Path(result['docx_path']).name if result.get('docx_path') else 'Not generated'}`
121
+
122
+ Download the files below ⬇️
123
+ """
124
+
125
+ # Return paths for download
126
+ markdown_file = result["markdown_path"] if Path(result["markdown_path"]).exists() else None
127
+ docx_file = result["docx_path"] if result.get("docx_path") and Path(result["docx_path"]).exists() else None
128
+
129
+ return status, markdown_file, docx_file
130
+
131
+ except Exception as e:
132
+ import traceback
133
+ error_msg = f"❌ **Error:**\n\n{str(e)}\n\n<details>\n<summary>Traceback</summary>\n\n```\n{traceback.format_exc()}\n```\n</details>"
134
+ return error_msg, None, None
135
+
136
+
137
+ # Gradio Interface
138
+ with gr.Blocks(
139
+ title="Worship Program Generator",
140
+ theme=gr.themes.Soft(),
141
+ css="""
142
+ .title { text-align: center; font-size: 2em; margin-bottom: 1em; }
143
+ .subtitle { text-align: center; color: #666; margin-bottom: 2em; }
144
+ """
145
+ ) as demo:
146
+
147
+ gr.Markdown(
148
+ """
149
+ <div class="title">🙏 主日崇拜程序生成器</div>
150
+ <div class="title">Worship Program Generator</div>
151
+ <div class="subtitle">Generate bilingual worship programs from multiple sources</div>
152
+ """,
153
+ elem_classes=["title"]
154
+ )
155
+
156
+ gr.Markdown("""
157
+ ### 📖 How to Use
158
+
159
+ **Required:** Worship Bulletin PDF
160
+ **Choose ONE:** Chinese sermon text OR sermon slides PDF
161
+
162
+ The system will:
163
+ 1. Extract content from all sources
164
+ 2. Generate narrative (if using slides)
165
+ 3. Translate between languages
166
+ 4. Assemble complete bilingual program
167
+ 5. Export to Markdown and DOCX
168
+ """)
169
+
170
+ with gr.Row():
171
+ with gr.Column(scale=1):
172
+ gr.Markdown("### 📤 Input Files")
173
+
174
+ chinese_sermon_input = gr.File(
175
+ label="📝 Chinese Sermon Text (中文讲章) - Optional",
176
+ file_types=[".txt"],
177
+ type="filepath"
178
+ )
179
+
180
+ slides_input = gr.File(
181
+ label="🖼️ Sermon Slides PDF (讲章幻灯片) - Optional",
182
+ file_types=[".pdf"],
183
+ type="filepath"
184
+ )
185
+
186
+ bulletin_input = gr.File(
187
+ label="📋 Worship Bulletin PDF (崇拜程序单) - Required ⭐",
188
+ file_types=[".pdf"],
189
+ type="filepath"
190
+ )
191
+
192
+ date_input = gr.Textbox(
193
+ label="📅 Worship Date (YYYY-MM-DD)",
194
+ placeholder="2024-01-07 (leave blank to auto-detect)",
195
+ value=""
196
+ )
197
+
198
+ generate_btn = gr.Button(
199
+ "🚀 Generate Worship Program",
200
+ variant="primary",
201
+ size="lg"
202
+ )
203
+
204
+ with gr.Column(scale=1):
205
+ gr.Markdown("### 📥 Output")
206
+
207
+ status_output = gr.Markdown("💡 Ready to generate...")
208
+
209
+ markdown_download = gr.File(
210
+ label="📄 Download Markdown (.md)",
211
+ interactive=False
212
+ )
213
+
214
+ docx_download = gr.File(
215
+ label="📄 Download DOCX (.docx)",
216
+ interactive=False
217
+ )
218
+
219
+ # Usage Guide
220
+ with gr.Accordion("📚 Usage Guide & Tips", open=False):
221
+ gr.Markdown("""
222
+ ### Workflow Options
223
+
224
+ **Option A: Pre-written Sermon**
225
+ 1. Upload Chinese sermon text file (.txt, UTF-8 encoding)
226
+ 2. Upload worship bulletin PDF
227
+ 3. Enter date (or leave blank)
228
+ 4. Click Generate
229
+
230
+ **Option B: Generate from Slides**
231
+ 1. Upload sermon slides PDF (can be text or image-based)
232
+ 2. Upload worship bulletin PDF
233
+ 3. Enter date (or leave blank)
234
+ 4. Click Generate (AI will create narrative from slides)
235
+
236
+ ### Tips
237
+
238
+ - **Date Detection:** Leave blank to auto-extract from bulletin filename (format: `bulletin-YYYY-MM-DD.pdf`)
239
+ - **File Encoding:** Chinese text files should be UTF-8 or GBK encoded
240
+ - **PDF Support:** Both text-based and scanned (OCR) PDFs are supported
241
+ - **Processing Time:** Typically 30-60 seconds depending on content length
242
+ - **File Size Limit:** Maximum 20MB per file
243
+
244
+ ### Troubleshooting
245
+
246
+ - **OCR Issues:** Ensure bulletin text is clear and high-resolution
247
+ - **Translation Quality:** Review theological terms for accuracy
248
+ - **Missing Content:** Check that PDFs contain expected sections
249
+ - **Encoding Errors:** Save Chinese text as UTF-8
250
+
251
+ ### Output Format
252
+
253
+ The generated worship program includes:
254
+ - Bilingual header with date and theme
255
+ - Order of worship (prelude, songs, scripture)
256
+ - Complete sermon (Chinese + English)
257
+ - Liturgical elements
258
+ - Announcements
259
+
260
+ Both Markdown (.md) and Word (.docx) formats are provided.
261
+ """)
262
+
263
+ # Event handlers
264
+ generate_btn.click(
265
+ fn=process_worship_program,
266
+ inputs=[
267
+ chinese_sermon_input,
268
+ slides_input,
269
+ bulletin_input,
270
+ date_input
271
+ ],
272
+ outputs=[
273
+ status_output,
274
+ markdown_download,
275
+ docx_download
276
+ ],
277
+ show_progress=True
278
+ )
279
+
280
+ gr.Markdown("""
281
+ ---
282
+ **🤖 Powered by:** Qwen 2.5 LLM | **📦 Framework:** HuggingFace Transformers | **🎨 UI:** Gradio
283
+
284
+ Built with ❤️ for church communities
285
+ """)
286
+
287
+
288
+ if __name__ == "__main__":
289
+ demo.queue(
290
+ max_size=int(os.getenv("GRADIO_MAX_QUEUE_SIZE", "10"))
291
+ ).launch(
292
+ server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
293
+ server_port=int(os.getenv("GRADIO_SERVER_PORT", "7860")),
294
+ share=os.getenv("GRADIO_SHARE", "false").lower() == "true"
295
+ )
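`process_worship_program` reads the sermon file as UTF-8 and retries with GBK after a `UnicodeDecodeError`. The sketch below packages the same fallback as a reusable helper. It assumes `utf-8-sig` is an acceptable first choice: that codec decodes plain UTF-8 and also drops the byte-order mark some Windows editors add. `read_sermon_text` is a hypothetical name. One caveat: GBK accepts many byte sequences, so a successful GBK decode does not prove the file really was GBK.

```python
from pathlib import Path
from typing import Sequence, Tuple

# UTF-8 first (BOM-tolerant), then GBK, a superset of GB2312 that covers most
# legacy Simplified Chinese text files.
DEFAULT_ENCODINGS: Tuple[str, ...] = ("utf-8-sig", "gbk")


def read_sermon_text(
    path: str, encodings: Sequence[str] = DEFAULT_ENCODINGS
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying each encoding in order."""
    raw = Path(path).read_bytes()  # read once, decode as many times as needed
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError(f"Could not decode {path} with any of: {', '.join(encodings)}")
```

Returning the encoding that worked makes it easy to surface a hint such as "decoded as GBK" in the status message.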
core/__init__.py ADDED
@@ -0,0 +1,10 @@
+ """
+ Core document processing and content generation modules.
+ """
+
+ from .document_processor import DocumentProcessor, ChineseTextProcessor
+
+ __all__ = [
+     "DocumentProcessor",
+     "ChineseTextProcessor",
+ ]
core/document_processor.py ADDED
@@ -0,0 +1,284 @@
+ """
+ Document processing module for PDF extraction and OCR.
+ """
+
+ import pdfplumber
+ import pytesseract
+ from pdf2image import convert_from_path
+ from PIL import Image
+ from typing import Dict, List, Optional
+ import re
+ from pathlib import Path
+
+
+ class DocumentProcessor:
+     """Extract text and structure from PDF documents."""
+
+     def __init__(self, ocr_languages: str = "eng+chi_sim"):
+         """
+         Initialize document processor.
+
+         Args:
+             ocr_languages: Tesseract language codes (e.g., "eng+chi_sim")
+         """
+         self.ocr_languages = ocr_languages
+
+     def extract_bulletin_pdf(self, pdf_path: str) -> Dict:
+         """
+         Extract worship order from bulletin PDF.
+
+         Args:
+             pdf_path: Path to bulletin PDF file
+
+         Returns:
+             {
+                 "text": str,              # Full text content
+                 "sections": {             # Structured sections
+                     "hymns": List[str],
+                     "scripture": str,
+                     "announcements": str,
+                     "order": List[str]
+                 },
+                 "date": str,              # Extracted date
+                 "metadata": Dict
+             }
+         """
+         text = self.extract_with_structure(pdf_path)
+         date = self.extract_date_from_filename(pdf_path)
+
+         # TODO: Implement intelligent section parsing
+         sections = self._parse_bulletin_sections(text)
+
+         return {
+             "text": text,
+             "sections": sections,
+             "date": date,
+             "metadata": {
+                 "filename": Path(pdf_path).name,
+                 "page_count": self._get_page_count(pdf_path)
+             }
+         }
+
+     def extract_sermon_slides_pdf(self, pdf_path: str) -> Dict:
+         """
+         Extract sermon content from slides PDF.
+
+         Args:
+             pdf_path: Path to sermon slides PDF
+
+         Returns:
+             {
+                 "slides": List[Dict],     # List of slide data
+                 "structure": Dict         # Sermon structure
+             }
+         """
+         slides = []
+
+         with pdfplumber.open(pdf_path) as pdf:
+             for i, page in enumerate(pdf.pages):
+                 text = page.extract_text() or ""
+
+                 # If no text, try OCR
+                 if len(text.strip()) < 10:
+                     text = self._ocr_page(pdf_path, i)
+
+                 slide_data = {
+                     "page_num": i + 1,
+                     "text": text,
+                     "is_title": self._is_title_slide(text),
+                     "is_scripture": self._is_scripture_slide(text)
+                 }
+                 slides.append(slide_data)
+
+         structure = self._extract_sermon_structure(slides)
+
+         return {
+             "slides": slides,
+             "structure": structure
+         }
+
+     def extract_with_structure(self, pdf_path: str) -> str:
+         """
+         Extract text from PDF preserving structure.
+
+         Pages without a text layer (e.g., scanned pages) are OCR'd individually.
+
+         Args:
+             pdf_path: Path to PDF file
+
+         Returns:
+             Extracted text with layout preserved
+         """
+         content = []
+
+         try:
+             with pdfplumber.open(pdf_path) as pdf:
+                 for i, page in enumerate(pdf.pages):
+                     text = page.extract_text(layout=True) or ""
+                     # Scanned pages have no text layer: fall back to OCR for this page
+                     if not text.strip():
+                         text = self._ocr_page(pdf_path, i)
+                     if text.strip():
+                         content.append(text)
+         except Exception as e:
+             print(f"Error extracting PDF: {e}")
+             # Fallback: OCR every page
+             content = [self._ocr_page(pdf_path, i) for i in range(self._get_page_count(pdf_path))]
+
+         return "\n\n".join(content)
+
+     def _ocr_page(self, pdf_path: str, page_num: int) -> str:
+         """
+         OCR a single page from PDF.
+
+         Args:
+             pdf_path: Path to PDF
+             page_num: Page number (0-indexed)
+
+         Returns:
+             Extracted text from OCR
+         """
+         try:
+             images = convert_from_path(pdf_path, first_page=page_num + 1, last_page=page_num + 1)
+             if images:
+                 return pytesseract.image_to_string(images[0], lang=self.ocr_languages)
+         except Exception as e:
+             print(f"OCR error on page {page_num + 1}: {e}")
+
+         return ""
+
+     def _get_page_count(self, pdf_path: str) -> int:
+         """Get total page count from PDF."""
+         try:
+             with pdfplumber.open(pdf_path) as pdf:
+                 return len(pdf.pages)
+         except Exception:
+             return 0
+
+     def extract_date_from_filename(self, pdf_path: str) -> str:
+         """
+         Extract date from PDF filename.
+
+         Looks for patterns like YYYY-MM-DD.
+
+         Args:
+             pdf_path: Path to PDF file
+
+         Returns:
+             Date string (YYYY-MM-DD) or empty string
+         """
+         filename = Path(pdf_path).name
+         match = re.search(r'(\d{4}-\d{2}-\d{2})', filename)
+         if match:
+             return match.group(1)
+         return ""
+
+     def _parse_bulletin_sections(self, text: str) -> Dict:
+         """Parse bulletin into structured sections."""
+         # TODO: Implement intelligent parsing
+         return {
+             "hymns": [],
+             "scripture": "",
+             "announcements": "",
+             "order": []
+         }
+
+     def _is_title_slide(self, text: str) -> bool:
+         """Detect if slide is a title slide."""
+         # Empty slides are never title slides
+         if not text.strip():
+             return False
+         # Simple heuristic: short text, no bullet points
+         lines = text.strip().split('\n')
+         return len(lines) <= 3 and not any(line.strip().startswith(('•', '-', '*')) for line in lines)
+
+     def _is_scripture_slide(self, text: str) -> bool:
+         """Detect if slide contains scripture reference."""
+         # Look for common scripture patterns
+         scripture_patterns = [
+             r'[创出利民申].*\d+:\d+',  # Chinese books
+             r'[约太可路罗林加弗腓西帖提多门彼雅启].*\d+:\d+',
+             r'\b[A-Z][a-z]+\s+\d+:\d+',  # English books
+         ]
+         return any(re.search(pattern, text) for pattern in scripture_patterns)
+
+     def _extract_sermon_structure(self, slides: List[Dict]) -> Dict:
+         """Extract sermon structure from slides."""
+         structure = {
+             "title": "",
+             "main_points": [],
+             "scriptures": []
+         }
+
+         # Find title
+         for slide in slides:
+             if slide["is_title"]:
+                 structure["title"] = slide["text"].strip()
+                 break
+
+         # Find main points and scriptures
+         for slide in slides:
+             if slide["is_scripture"]:
+                 structure["scriptures"].append(slide["text"].strip())
+             elif not slide["is_title"] and slide["text"].strip():
+                 structure["main_points"].append(slide["text"].strip())
+
+         return structure
+
+
+ class ChineseTextProcessor:
+     """Process and normalize Chinese text."""
+
+     @staticmethod
+     def normalize_text(text: str) -> str:
+         """
+         Normalize Chinese text.
+
+         - Fix punctuation
+         - Remove extra whitespace
+         - Standardize quotes
+
+         Args:
+             text: Input Chinese text
+
+         Returns:
+             Normalized text
+         """
+         # Remove extra whitespace
+         text = re.sub(r'\s+', ' ', text)
+
+         # Normalize punctuation
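+         replacements = {
+             # Full-width punctuation -> ASCII, so references like 12:1 match the ASCII-colon patterns
+             '\uff0c': ',',       # ，
+             '\uff61': '\u3002',  # halfwidth ｡ -> 。
+             '\uff01': '!',       # ！
+             '\uff1f': '?',       # ？
+             '\uff1a': ':',       # ：
+             '\uff1b': ';',       # ；
+             # Curly quotes -> straight quotes
+             '\u201c': '"',       # “
+             '\u201d': '"',       # ”
+             '\u2018': "'",       # ‘
+             '\u2019': "'",       # ’
+         }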
+
+         for old, new in replacements.items():
+             text = text.replace(old, new)
+
+         return text.strip()
+
+     @staticmethod
+     def segment_sermon(text: str) -> Dict:
+         """
+         Segment Chinese sermon into logical sections.
+
+         Args:
+             text: Full sermon text
+
+         Returns:
+             {
+                 "introduction": str,
+                 "main_points": List[str],
+                 "conclusion": str,
+                 "scripture_references": List[str]
+             }
+         """
+         # TODO: Implement intelligent segmentation
+         # For now, return basic structure
+         return {
+             "introduction": "",
+             "main_points": [],
+             "conclusion": "",
+             "scripture_references": []
+         }
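`_parse_bulletin_sections` is still a TODO and returns empty fields, so `extract_bulletin_pdf` currently passes only raw text downstream. Below is a rough regex-based sketch of what the method could return. It assumes bulletins follow the bilingual layout recommended in `examples/README.md`: labels such as `诗歌 Hymn #123` and `读经 Scripture Reading: 创世记 12:1-9`, followed by a trailing `报告事项 Announcements` list. `parse_bulletin_sections` is a hypothetical standalone function, not code from this commit:

```python
import re
from typing import Dict, List

# Hypothetical sketch for the TODO in DocumentProcessor._parse_bulletin_sections.
# Assumes the bilingual layout recommended in examples/README.md; real bulletins vary.
HYMN_RE = re.compile(r"Hymn\s*#\s*(\d+)", re.IGNORECASE)
SCRIPTURE_RE = re.compile(r"Scripture Reading\s*[:：]\s*(.+)", re.IGNORECASE)
ANNOUNCEMENTS_RE = re.compile(r"报告事项|Announcements", re.IGNORECASE)
# An order-of-worship item: a Chinese label, whitespace, then an English label
ORDER_ITEM_RE = re.compile(r"^[\u4e00-\u9fff]+\s+[A-Za-z]")


def parse_bulletin_sections(text: str) -> Dict:
    sections: Dict = {"hymns": [], "scripture": "", "announcements": "", "order": []}
    announcement_lines: List[str] = []
    in_announcements = False

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if ANNOUNCEMENTS_RE.search(line):
            in_announcements = True  # assume announcements come last in the bulletin
            continue
        if in_announcements:
            announcement_lines.append(line.lstrip("-• ").strip())
            continue

        if ORDER_ITEM_RE.match(line):
            sections["order"].append(line)
        hymn = HYMN_RE.search(line)
        if hymn:
            sections["hymns"].append(hymn.group(1))
        scripture = SCRIPTURE_RE.search(line)
        if scripture and not sections["scripture"]:
            sections["scripture"] = scripture.group(1).strip()

    sections["announcements"] = "\n".join(announcement_lines)
    return sections


if __name__ == "__main__":
    sample = "\n".join([
        "序乐 Prelude",
        "诗歌 Hymn #123",
        "读经 Scripture Reading: 创世记 12:1-9",
        "",
        "报告事项 Announcements",
        "- [Announcement 1]",
    ])
    print(parse_bulletin_sections(sample))
```

Keyword matching like this only works for bulletins with consistent labels. The method's own TODO ("intelligent parsing") may equally point toward an LLM-based extraction step.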
examples/README.md ADDED
@@ -0,0 +1,214 @@
+ # Example Files
+
+ This directory contains sample input files that show the formats the Worship Program Generator expects.
+
+ ## 📄 Files
+
+ ### 1. `sample_chinese_sermon.txt`
+ - **Type:** Pre-written Chinese sermon text
+ - **Encoding:** UTF-8
+ - **Use Case:** Option A - upload as the Chinese sermon text
+
+ **Content:**
+ - Complete sermon manuscript in Chinese
+ - Includes: title, scripture reference, introduction, main points, conclusion
+ - Proper paragraph breaks and structure
+ - Scripture references in Chinese format
+
+ **How to Use:**
+ 1. Upload this file as "Chinese Sermon Text"
+ 2. Upload a bulletin PDF
+ 3. Click "Generate Worship Program"
+
+ ---
+
+ ### 2. `sample_slides.pdf` (not included; create your own)
+ - **Type:** Sermon slide presentation
+ - **Format:** PDF (text-based or image-based)
+ - **Use Case:** Option B - generate a narrative from slides
+
+ **Expected Content:**
+ - Title slide with sermon title
+ - Main point slides (bullet points or short text)
+ - Scripture reference slides
+ - PowerPoint or Keynote decks exported to PDF work fine
+
+ **How to Create:**
+ 1. Create a sermon presentation in PowerPoint or Keynote
+ 2. Export or save it as PDF
+ 3. Upload it as "Sermon Slides PDF"
+
+ ---
+
+ ### 3. `sample_bulletin.pdf` (not included; create your own)
+ - **Type:** Worship bulletin
+ - **Format:** PDF
+ - **Use Case:** Required for all workflows
+
+ **Expected Content:**
+ - Worship date (preferably in the filename, e.g. `bulletin-2024-01-07.pdf`)
+ - Order of worship
+ - Hymn numbers and titles
+ - Scripture reading passages
+ - Announcements
+ - Any liturgical elements
+
+ **Naming Convention:**
+ - Recommended: `RCCA-worship-bulletin-YYYY-MM-DD.pdf`
+ - Or: `bulletin-YYYY-MM-DD.pdf`
+ - The date is auto-extracted from any filename that contains `YYYY-MM-DD`
+
+ ---
+
+ ## 📝 File Format Guidelines
+
+ ### Chinese Sermon Text (.txt)
+
+ ```
+ [Sermon Title in Chinese]
+
+ 经文:[Scripture Reference]
+
+ [Introduction paragraph]
+
+ 一、[First Main Point]
+ [Content for first point]
+
+ 二、[Second Main Point]
+ [Content for second point]
+
+ 三、[Third Main Point]
+ [Content for third point]
+
+ [Conclusion]
+ ```
+
+ **Tips:**
+ - Use UTF-8 encoding
+ - Include clear section markers (一、二、三 or I. II. III.)
+ - Add paragraph breaks for readability
+ - Include scripture references in Chinese format
+
+ ---
+
+ ### Sermon Slides PDF
+
+ **Recommended Structure:**
+ ```
+ Slide 1: Title
+ 信心的旅程
+ Journey of Faith
+
+ Slide 2: Scripture
+ 创世记 12:1-9
+ Genesis 12:1-9
+
+ Slide 3: Main Point 1
+ • 神的呼召
+ • God's Call
+ • [Key points]
+
+ Slide 4: Main Point 2
+ • 神的应许
+ • God's Promise
+ • [Key points]
+
+ Slide 5: Application
+ • 实践的教导
+ • Practical Teaching
+ ```
+
+ **Tips:**
+ - Keep text clear and readable
+ - Use consistent formatting
+ - Include both Chinese and English if the slides are bilingual
+ - Avoid heavy graphics; only the text content is used
+
+ ---
+
+ ### Worship Bulletin PDF
+
+ **Recommended Sections:**
+ ```
+ 主日崇拜程序
+ Sunday Worship Service
+
+ 日期:2024年1月7日
+
+ 序乐 Prelude
+ 宣召 Call to Worship
+ 祷告 Prayer
+ 诗歌 Hymn #123
+ 读经 Scripture Reading: 创世记 12:1-9
+ 信息 Sermon: [Title]
+ 回应诗歌 Response Hymn #456
+ 奉献 Offering
+ 祝福 Benediction
+
+ 报告事项 Announcements
+ - [Announcement 1]
+ - [Announcement 2]
+ ```
+
+ **Tips:**
+ - Include the date prominently
+ - List hymns with their numbers
+ - Specify scripture passages
+ - Keep the format clean and structured
+
+ ---
+
+ ## 🧪 Testing the System
+
+ ### Quick Test Workflow
+
+ 1. **Prepare Files:**
+    - Chinese sermon text OR sermon slides PDF
+    - Worship bulletin PDF
+
+ 2. **Upload:**
+    - Open the Worship Program Generator interface
+    - Upload your files
+    - Enter the worship date, or leave it blank to extract it from the bulletin filename
+
+ 3. **Generate:**
+    - Click "Generate Worship Program"
+    - Wait 30-60 seconds
+
+ 4. **Download:**
+    - Download both the Markdown and DOCX versions
+    - Review for accuracy
+    - Edit as needed
+
+ ---
+
+ ## ⚠️ Common Issues
+
+ ### Encoding Errors
+ - **Problem:** Chinese characters display incorrectly
+ - **Solution:** Save text files with UTF-8 encoding
+
+ ### PDF Extraction Failures
+ - **Problem:** Text cannot be extracted from the PDF
+ - **Solution:** Make sure the PDF is not password-protected, or regenerate it with a text layer
+
+ ### Missing Date
+ - **Problem:** Date not auto-detected
+ - **Solution:** Include the date in the filename (`YYYY-MM-DD`) or enter it manually in the form
+
+ ### Translation Quality
+ - **Problem:** Translation is awkward or inaccurate
+ - **Solution:** Review and manually edit the output, especially theological terms
+
+ ---
+
+ ## 📧 Support
+
+ For issues or questions:
+ 1. Check the troubleshooting section in the main README
+ 2. Review these example formats
+ 3. Open an issue on GitHub with sample files (anonymized)
+
+ ---
+
+ **Note:** The actual PDF files (`sample_slides.pdf` and `sample_bulletin.pdf`) are not included in this repository. Please create your own based on the guidelines above, or use your church's existing files.
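`DocumentProcessor.extract_date_from_filename` accepts any `\d{4}-\d{2}-\d{2}` substring. A typo such as `bulletin-2024-13-45.pdf` would therefore be passed through as the worship date. A stricter variant validates each match with `datetime.strptime`. `date_from_bulletin_filename` is a hypothetical name used only for this sketch:

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import Optional

# Same pattern as DocumentProcessor.extract_date_from_filename
DATE_IN_NAME = re.compile(r"(\d{4}-\d{2}-\d{2})")


def date_from_bulletin_filename(pdf_path: str) -> Optional[date]:
    """Return the first real calendar date (YYYY-MM-DD) in the filename, or None."""
    for candidate in DATE_IN_NAME.findall(Path(pdf_path).name):
        try:
            return datetime.strptime(candidate, "%Y-%m-%d").date()
        except ValueError:
            continue  # matched the pattern but is not a valid date, e.g. 2024-13-45
    return None


if __name__ == "__main__":
    print(date_from_bulletin_filename("RCCA-worship-bulletin-2024-01-07.pdf"))  # 2024-01-07
    print(date_from_bulletin_filename("bulletin-2024-13-45.pdf"))              # None
```

Callers that need the original string form can use `.isoformat()` on the returned `date`.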
examples/sample_chinese_sermon.txt ADDED
@@ -0,0 +1,61 @@
+ 信心的旅程:亚伯拉罕的呼召
+
+ 经文:创世记12:1-9
+
+ 引言:
+
+ 今天我们一起来思考信心的含义。当我们回顾圣经中伟大的信心榜样时,亚伯拉罕的名字总是首先浮现在我们的脑海中。他被称为"信心之父",不是因为他从未怀疑,而是因为他在怀疑中仍然选择相信和顺服。
+
+ 一、神的呼召(12:1)
+
+ "耶和华对亚伯兰说:你要离开本地、本族、父家,往我所要指示你的地去。"
+
+ 这是一个看似不合理的呼召。神要求亚伯拉罕离开他所熟悉的一切:
+ - 离开本地:放弃安全的环境
+ - 离开本族:放弃亲密的关系
+ - 离开父家:放弃家族的产业
+
+ 更令人困惑的是,神并没有明确告诉他目的地在哪里,只说"往我所要指示你的地去"。这需要完全的信靠。
+
+ 在我们的生活中,神的呼召有时也是如此。祂可能要求我们离开舒适区,进入未知的领域。问题不在于我们是否感到害怕,而在于我们是否愿意顺服。
+
+ 二、神的应许(12:2-3)
+
+ 虽然神的要求看似严苛,但祂同时给予了宝贵的应许:
+
+ 1. "我必叫你成为大国" - 后裔的应许
+ 2. "我必赐福给你" - 福分的应许
+ 3. "叫你的名为大" - 名声的应许
+ 4. "你也要叫别人得福" - 使命的应许
+
+ 这些应许显明了神呼召的目的。神呼召我们,不仅仅是为了我们个人的益处,更是为了祂国度的计划。我们蒙福,是为了成为别人的祝福。
+
+ 三、信心的回应(12:4)
+
+ "亚伯兰就照着耶和华的吩咐去了。"
+
+ 这简单的一句话,代表了巨大的信心行动。亚伯兰当时已经七十五岁,这个年纪通常是享受安逸的时候,但他选择了顺服。
+
+ 真正的信心不是停留在口头上,而是表现在行动中。雅各书2:26说:"身体没有灵魂是死的,信心没有行为也是死的。"
+
+ 四、实践的教导
+
+ 1. **顺服需要勇气**
+    - 面对未知时,勇敢迈出第一步
+    - 相信神的引导胜过自己的计划
+
+ 2. **等候需要耐心**
+    - 神的应许不总是立即实现
+    - 在等候中继续信靠和顺服
+
+ 3. **祝福带来责任**
+    - 我们领受祝福,是为了传递祝福
+    - 神的恩典应该激励我们去服事他人
+
+ 结语:
+
+ 亚伯拉罕的信心之旅告诉我们,真正的信心是在不确定中仍然选择相信神。今天,神也在呼召我们,可能不是要我们离开物理上的家乡,但可能是要我们离开属灵上的舒适区。
+
+ 让我们像亚伯拉罕一样,勇敢地回应神的呼召,因为我们知道,那呼召我们的是信实的。
+
+ 愿神赐福我们每一个人,让我们在信心的旅程中不断成长。阿们。
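`ChineseTextProcessor.segment_sermon` is also left as a TODO. The sample manuscript above suggests a simple line-based approach:

- header lines up to `引言`
- numbered points beginning `一、`, `二、` and so on
- a closing `结语` section

The sketch below assumes exactly that layout. `segment_sermon` here is a hypothetical standalone version, not the method from this commit. Manuscripts without these markers would need a different strategy, for example the LLM-based `extract_sermon_structure` prompt.

```python
import re
from typing import Dict, List

# Hypothetical sketch for ChineseTextProcessor.segment_sermon (a TODO in this commit).
# Assumes the layout of sample_chinese_sermon.txt: title and 经文 lines, an 引言 intro,
# numbered points 一、 二、 三、 and a closing 结语 section.
POINT_RE = re.compile(r"^[一二三四五六七八九十]+、")
# A Chinese book name directly followed by chapter:verse, e.g. 创世记12:1-9 or 雅各书 2:26
SCRIPTURE_RE = re.compile(r"[\u4e00-\u9fff]+[ \t]*\d+[:：]\d+(?:-\d+)?")


def segment_sermon(text: str) -> Dict:
    header: List[str] = []  # title and scripture lines before 引言 (not returned)
    intro: List[str] = []
    points: List[List[str]] = []
    conclusion: List[str] = []
    current = header

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("引言"):
            current = intro
        elif stripped.startswith("结语"):
            current = conclusion
        elif POINT_RE.match(stripped):
            points.append([stripped])  # the point heading starts a new section
            current = points[-1]
        else:
            current.append(stripped)

    return {
        "introduction": "\n".join(intro),
        "main_points": ["\n".join(p) for p in points],
        "conclusion": "\n".join(conclusion),
        "scripture_references": SCRIPTURE_RE.findall(text),
    }
```

The return keys mirror the docstring of the existing `segment_sermon` stub, so the sketch could replace its body if the marker convention holds.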
llm/__init__.py ADDED
@@ -0,0 +1,12 @@
+ """
+ LLM client and prompt management modules.
+ """
+
+ from .qwen_client import QwenClient
+ from .prompt_templates import SYSTEM_PROMPTS, TASK_PROMPTS
+
+ __all__ = [
+     "QwenClient",
+     "SYSTEM_PROMPTS",
+     "TASK_PROMPTS",
+ ]
llm/prompt_templates.py ADDED
@@ -0,0 +1,166 @@
+ """
+ System prompts and task templates for LLM interactions.
+ """
+
+ SYSTEM_PROMPTS = {
+     "worship_assembler": """You are a worship program coordinator for a bilingual Chinese-English church.
+ Your task is to create well-structured, reverent worship programs that integrate:
+ - Sermon content (Chinese with English translation)
+ - Hymns and worship songs
+ - Scripture readings
+ - Liturgical elements (prayers, responsive readings)
+ - Announcements
+
+ Format output in clear markdown with bilingual sections. Maintain a reverent, professional tone.
+ Preserve the theological content and ensure proper formatting for both languages.""",
+
+     "translator": """You are a professional translator specializing in religious texts, liturgy, and theology.
+
+ Preserve:
+ - Theological accuracy and terminology
+ - Cultural and denominational sensitivity
+ - Formatting and structure
+ - Tone and register
+ - Scripture references
+
+ Maintain natural language flow in the target language while staying faithful to the source.""",
+
+     "narrative_generator": """You are a pastoral assistant helping prepare sermon manuscripts.
+ Generate flowing, coherent sermon narratives from outlines and slides.
+
+ Maintain:
+ - Theological depth and accuracy
+ - Pastoral and encouraging tone
+ - Logical flow and transitions
+ - Proper Chinese language style
+ - Clear main points and application""",
+ }
+
+ TASK_PROMPTS = {
+     "assemble_program": """Create a complete bilingual worship program using these sources:
+
+ **Sermon Narrative (Chinese):**
+ {chinese_sermon}
+
+ **Sermon Translation (English):**
+ {english_sermon}
+
+ **Bulletin (Worship Order):**
+ {bulletin_content}
+
+ **Date:** {date}
+
+ Generate a complete worship program in markdown format with these sections:
+
+ 1. **Header** - Date, theme in both languages
+ 2. **Prelude/Welcome** - 序乐/欢迎
+ 3. **Worship Songs** - Include hymn numbers from bulletin
+ 4. **Scripture Reading** - 读经 with references
+ 5. **Sermon** - 信息 (Chinese text followed by English translation)
+ 6. **Response/Offering** - 回应/奉献
+ 7. **Benediction** - 祝福
+ 8. **Announcements** - 报告事项
+
+ Use this markdown structure:
+
+ ```markdown
+ # 主日崇拜程序 Sunday Worship Program
+
+ **日期 Date:** {date}
+
+ ---
+
+ ## 序乐 Prelude
+
+ [Content from bulletin]
+
+ ## 诗歌敬拜 Worship in Song
+
+ [Hymns with numbers]
+
+ ## 读经 Scripture Reading
+
+ [Passage and reference]
+
+ ## 信息 Sermon
+
+ ### [Sermon Title in Chinese]
+ ### [Sermon Title in English]
+
+ **中文 Chinese:**
+
+ {chinese_sermon}
+
+ **English:**
+
+ {english_sermon}
+
+ ## 回应诗歌 Response Song
+
+ [Hymn information]
+
+ ## 奉献 Offering
+
+ ## 祝福 Benediction
+
+ ## 报告事项 Announcements
+
+ [Announcements from bulletin]
+
+ ---
+ ```
+
+ Generate the complete program now:""",
+
+     "extract_sermon_structure": """Analyze this sermon content and extract its structure:
+
+ {sermon_text}
+
+ Provide a structured analysis in this format:
+
+ **Title:**
+ - Chinese: [title]
+ - English: [title]
+
+ **Main Points:**
+ 1. [Point 1]
+ 2. [Point 2]
+ 3. [Point 3]
+
+ **Scripture References:**
+ - [Reference 1]
+ - [Reference 2]
+
+ **Key Themes:**
+ - [Theme 1]
+ - [Theme 2]
+
+ Provide the analysis:""",
+
+     "generate_narrative": """Based on these sermon slides, generate a flowing narrative sermon in Chinese:
+
+ {slides_content}
+
+ Requirements:
+ 1. Expand bullet points into complete paragraphs
+ 2. Add smooth transitions between sections
+ 3. Maintain theological depth
+ 4. Use appropriate pastoral tone
+ 5. Keep the structure: introduction → main points → conclusion
+ 6. Include applications and illustrations where appropriate
+
+ Generate the complete sermon narrative:""",
+
+     "translate_sermon": """Translate this Chinese sermon to English, preserving theological accuracy:
+
+ {chinese_text}
+
+ Requirements:
+ 1. Maintain theological terminology accuracy
+ 2. Preserve the tone and style
+ 3. Keep paragraph structure
+ 4. Translate scripture references appropriately
+ 5. Ensure natural English flow
+
+ English translation:""",
+ }
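`QwenClient.assemble_program` in `llm/qwen_client.py` fills `TASK_PROMPTS["assemble_program"]` with `str.format` and pairs it with `SYSTEM_PROMPTS["worship_assembler"]`. In contrast, `translate()` and `generate_narrative()` build their prompts inline, so the `translator`, `narrative_generator`, `generate_narrative` and `translate_sermon` entries are not used by those methods. A small generic helper could let every method reuse the templates. In the sketch below, `build_messages` and the shortened prompt text are hypothetical stand-ins, not code from this commit:

```python
from typing import Dict, List

# Miniature stand-ins for the dictionaries in llm/prompt_templates.py (hypothetical
# shortened text); the real templates are filled the same way, with str.format.
SYSTEM_PROMPTS = {
    "translator": "You are a professional translator specializing in religious texts.",
}
TASK_PROMPTS = {
    "translate_sermon": "Translate this Chinese sermon to English:\n\n{chinese_text}\n\nEnglish translation:",
}


def build_messages(system_key: str, task_key: str, **fields: str) -> List[Dict[str, str]]:
    """Pair a system prompt with a filled-in task template as chat messages.

    str.format raises KeyError when a placeholder is missing from **fields, so a
    template/argument mismatch fails loudly instead of sending "{chinese_text}" to
    the model. Braces inside the supplied values are inserted verbatim; only literal
    braces in the template itself would need escaping as {{ and }}.
    """
    user_prompt = TASK_PROMPTS[task_key].format(**fields)
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[system_key]},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    for message in build_messages("translator", "translate_sermon", chinese_text="信心的旅程"):
        print(message["role"], "->", message["content"][:40])
```

The returned list has the `{"role": ..., "content": ...}` shape that `QwenClient.chat` expects.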
llm/qwen_client.py ADDED
@@ -0,0 +1,218 @@
+ """
+ Qwen LLM client wrapper for HuggingFace Inference API.
+ """
+
+ import os
+ from typing import Dict, List, Optional
+ from huggingface_hub import InferenceClient
+
+
+ class QwenClient:
+     """Wrapper for Qwen model via HuggingFace Inference API."""
+
+     def __init__(
+         self,
+         model_id: str = "Qwen/Qwen2.5-7B-Instruct",
+         api_token: Optional[str] = None,
+         use_local: bool = False
+     ):
+         """
+         Initialize Qwen client.
+
+         Args:
+             model_id: HuggingFace model ID
+             api_token: HF API token (optional; defaults to the HF_API_TOKEN environment variable)
+             use_local: If True, load model locally (requires GPU)
+         """
+         self.model_id = model_id
+         self.api_token = api_token or os.getenv("HF_API_TOKEN")
+         self.use_local = use_local
+
+         if use_local:
+             # Load model locally (requires GPU)
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             print(f"Loading {model_id} locally...")
+             self.tokenizer = AutoTokenizer.from_pretrained(model_id)
+             self.model = AutoModelForCausalLM.from_pretrained(
+                 model_id,
+                 device_map="auto",
+                 torch_dtype="auto"
+             )
+             print("Model loaded successfully")
+         else:
+             # Use HF Inference API (serverless)
+             self.client = InferenceClient(
+                 model=model_id,
+                 token=self.api_token
+             )
+
+     def chat(
+         self,
+         messages: List[Dict[str, str]],
+         max_tokens: int = 2048,
+         temperature: float = 0.7,
+         **kwargs
+     ) -> str:
+         """
+         Send chat completion request.
+
+         Args:
+             messages: List of {"role": "user/assistant/system", "content": str}
+             max_tokens: Max generation length
+             temperature: Sampling temperature
+             **kwargs: Additional parameters, passed through to
+                 `InferenceClient.chat_completion` (API) or `model.generate` (local)
+
+         Returns:
+             Generated text
+         """
+         if self.use_local:
+             return self._chat_local(messages, max_tokens, temperature, **kwargs)
+         else:
+             return self._chat_api(messages, max_tokens, temperature, **kwargs)
+
+     def _chat_api(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float, **kwargs) -> str:
+         """Use HF Inference API."""
+         try:
+             response = self.client.chat_completion(
+                 messages=messages,
+                 max_tokens=max_tokens,
+                 temperature=temperature,
+                 **kwargs
+             )
+             return response.choices[0].message.content
+         except Exception as e:
+             print(f"Error calling HF Inference API: {e}")
+             raise
+
+     def _chat_local(self, messages: List[Dict[str, str]], max_tokens: int, temperature: float, **kwargs) -> str:
+         """Use local model."""
+         try:
+             text = self.tokenizer.apply_chat_template(
+                 messages,
+                 tokenize=False,
+                 add_generation_prompt=True
+             )
+             inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
+             outputs = self.model.generate(
+                 **inputs,
+                 max_new_tokens=max_tokens,
+                 temperature=temperature,
+                 do_sample=temperature > 0,
+                 **kwargs
+             )
+             # generate() returns prompt + completion; decode only the new tokens
+             prompt_length = inputs["input_ids"].shape[1]
+             generated = self.tokenizer.decode(
+                 outputs[0][prompt_length:],
+                 skip_special_tokens=True
+             )
+             return generated
+         except Exception as e:
+             print(f"Error with local model inference: {e}")
+             raise
+
+     def translate(
+         self,
+         text: str,
+         source_lang: str = "Chinese",
+         target_lang: str = "English"
+     ) -> str:
+         """
+         Translate text between languages.
+
+         Args:
+             text: Source text
+             source_lang: Source language name
+             target_lang: Target language name
+
+         Returns:
+             Translated text
+         """
+         prompt = f"""Translate the following {source_lang} text to {target_lang}.
+ Preserve formatting, meaning, and theological terminology accurately.
+
+ {source_lang} text:
+ {text}
+
+ {target_lang} translation:"""
+
+         messages = [
+             {
+                 "role": "system",
+                 "content": "You are a professional translator specializing in religious and liturgical texts. Maintain theological accuracy and cultural sensitivity."
+             },
+             {
+                 "role": "user",
+                 "content": prompt
+             }
+         ]
+
+         return self.chat(messages, temperature=0.3)
+
+     def generate_narrative(self, slides_content: str) -> str:
+         """
+         Generate sermon narrative from slide bullet points.
+
+         Args:
+             slides_content: Extracted content from slides
+
+         Returns:
+             Generated sermon narrative in Chinese
+         """
+         prompt = f"""Based on these sermon slides, generate a flowing narrative sermon text in Chinese.
+ Expand bullet points into complete paragraphs while preserving the theological content and structure.
+
+ Sermon Slides:
+ {slides_content}
+
+ Generate a complete, cohesive sermon narrative:"""
+
+         messages = [
+             {
+                 "role": "system",
+                 "content": "You are a pastoral assistant who helps prepare sermon manuscripts. Generate flowing, theologically sound sermon narratives."
+             },
+             {
+                 "role": "user",
+                 "content": prompt
+             }
+         ]
+
+         return self.chat(messages, max_tokens=4096, temperature=0.7)
+
+     def assemble_program(
+         self,
+         chinese_sermon: str,
+         english_sermon: str,
+         bulletin_content: str,
+         date: str
+     ) -> str:
+         """
+         Assemble complete bilingual worship program.
+
+         Args:
+             chinese_sermon: Chinese sermon text
+             english_sermon: English sermon translation
+             bulletin_content: Extracted bulletin content
+             date: Worship date
+
+         Returns:
+             Complete worship program in markdown format
+         """
+         from .prompt_templates import TASK_PROMPTS, SYSTEM_PROMPTS
+
+         prompt = TASK_PROMPTS["assemble_program"].format(
+             chinese_sermon=chinese_sermon,
+             english_sermon=english_sermon,
+             bulletin_content=bulletin_content,
+             date=date
+         )
+
+         messages = [
+             {
+                 "role": "system",
+                 "content": SYSTEM_PROMPTS["worship_assembler"]
+             },
+             {
+                 "role": "user",
+                 "content": prompt
+             }
+         ]
+
+         return self.chat(messages, max_tokens=4096, temperature=0.5)
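`translate()` sends the whole sermon in a single request and relies on the default `max_tokens=2048` from `chat()`. For a long manuscript, the English output could therefore be cut off. One mitigation, sketched here under that assumption, is to split the Chinese text at paragraph boundaries and translate the pieces separately. `chunk_paragraphs` and its 1500-character budget are hypothetical choices, not values from this commit:

```python
from typing import List

# Hypothetical helper: split a sermon at blank lines into chunks of roughly
# max_chars characters, so each QwenClient.translate() call gets a shorter input.
# The 1500-character default is an assumption, not a measured limit.


def chunk_paragraphs(text: str, max_chars: int = 1500) -> List[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0  # characters in the current chunk (separators not counted)

    for para in paragraphs:
        # Close the current chunk if this paragraph would push it past the budget;
        # a single oversized paragraph still becomes its own chunk.
        if current and current_len + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += len(para)

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    print(chunk_paragraphs("引言\n\n一、神的呼召\n\n二、神的应许", max_chars=8))
```

With a `QwenClient` instance `client`, the translation could then be assembled as `"\n\n".join(client.translate(chunk) for chunk in chunk_paragraphs(chinese_sermon))`. Splitting loses cross-paragraph context, so terminology may vary slightly between chunks.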
requirements.txt ADDED
@@ -0,0 +1,40 @@
+ # Worship Program Generator - Dependencies
+
+ # Core Framework
+ gradio>=4.44.0
+ python-dotenv>=1.0.0
+
+ # HuggingFace & LLM
+ huggingface_hub>=0.22.0  # InferenceClient.chat_completion requires 0.22+
+ transformers>=4.40.0
+ accelerate>=0.25.0
+ # torch - uncomment if using local model inference
+ # torch>=2.0.0
+ # Note: the HF Inference API path does not use torch directly
+ # (accelerate still installs it as a dependency)
+
+ # Document Processing
+ pypdf2>=3.0.0
+ pdfplumber>=0.10.0
+ pillow>=10.0.0
+ pytesseract>=0.3.10
+ pdf2image>=1.16.3
+
+ # Optional: Better PDF processing
+ pymupdf>=1.23.0  # PyMuPDF for advanced PDF handling
+
+ # Text Processing
+ python-docx>=1.1.0
+ markdown>=3.5.0
+
+ # HTTP & API
+ requests>=2.31.0
+ aiohttp>=3.9.0
+
+ # Utilities
+ tqdm>=4.66.0
+ python-dateutil>=2.8.0
+
+ # Development (optional - comment out for production)
+ # pytest>=7.4.0
+ # black>=23.0.0
+ # flake8>=6.0.0
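`pytesseract` and `pdf2image` are thin Python wrappers: they call the external Tesseract and Poppler programs, which pip does not install. On a Hugging Face Space these usually come from a `packages.txt` listing Debian packages such as `tesseract-ocr`, `tesseract-ocr-chi-sim` (needed for the default `eng+chi_sim` OCR languages) and `poppler-utils`. That file is assumed here and is not shown in this part of the diff. A small startup check, written as a hypothetical helper, can report what is missing before the first upload fails:

```python
import shutil
from typing import Dict, List

# Executables the Python wrappers shell out to, mapped to the Debian package that
# usually provides them. Package names are assumptions for a Debian-based image.
REQUIRED_BINARIES: Dict[str, str] = {
    "tesseract": "tesseract-ocr (used by pytesseract)",
    "pdftoppm": "poppler-utils (used by pdf2image to render pages)",
    "pdfinfo": "poppler-utils (used by pdf2image to read page counts)",
}


def missing_system_dependencies() -> List[str]:
    """Return one message per required executable that is not on PATH."""
    return [
        f"{binary} not found: install {package}"
        for binary, package in REQUIRED_BINARIES.items()
        if shutil.which(binary) is None
    ]


if __name__ == "__main__":
    problems = missing_system_dependencies()
    print("All OCR/PDF tools found" if not problems else "\n".join(problems))
```

The check only confirms the executables are on `PATH`. It does not verify that the `chi_sim` language data is installed; `tesseract --list-langs` would show that.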
utils/__init__.py ADDED
@@ -0,0 +1,12 @@
+ """
+ Utility functions for file handling and format conversion.
+ """
+
+ from .markdown_to_docx import markdown_to_docx
+ from .file_utils import sanitize_filename, ensure_directory
+
+ __all__ = [
+     "markdown_to_docx",
+     "sanitize_filename",
+     "ensure_directory",
+ ]
utils/file_utils.py ADDED
@@ -0,0 +1,75 @@
+ """
+ File handling utilities.
+ """
+
+ import re
+ from pathlib import Path
+ from typing import Union
+
+
+ def sanitize_filename(filename: str) -> str:
+     """
+     Sanitize filename by replacing invalid characters with underscores.
+
+     Args:
+         filename: Original filename
+
+     Returns:
+         Sanitized filename safe for filesystems
+     """
+     # Replace characters that are invalid on Windows/Unix filesystems
+     filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
+
+     # Remove leading/trailing spaces and dots
+     filename = filename.strip('. ')
+
+     # Limit length
+     if len(filename) > 255:
+         name, ext = filename.rsplit('.', 1) if '.' in filename else (filename, '')
+         filename = name[:250] + ('.' + ext if ext else '')
+
+     return filename
+
+
+ def ensure_directory(path: Union[str, Path]) -> Path:
+     """
+     Ensure directory exists, create if necessary.
+
+     Args:
+         path: Directory path
+
+     Returns:
+         Path object
+     """
+     path = Path(path)
+     path.mkdir(parents=True, exist_ok=True)
+     return path
+
+
+ def get_file_size_mb(file_path: Union[str, Path]) -> float:
+     """
+     Get file size in megabytes.
+
+     Args:
+         file_path: Path to file
+
+     Returns:
+         File size in MB
+     """
+     path = Path(file_path)
+     return path.stat().st_size / (1024 * 1024)
+
+
+ def validate_file_type(file_path: Union[str, Path], allowed_extensions: list) -> bool:
+     """
+     Validate file extension.
+
+     Args:
+         file_path: Path to file
+         allowed_extensions: List of allowed extensions (e.g., ['.pdf', '.txt'])
+
+     Returns:
+         True if valid, False otherwise
+     """
+     path = Path(file_path)
+     return path.suffix.lower() in [ext.lower() for ext in allowed_extensions]
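`get_file_size_mb` and `validate_file_type` are defined here but not exported from `utils/__init__.py`. Nothing in this part of the diff shows where the 20 MB per-file limit from the usage notes is enforced. One way to combine the two checks into a single upload validator, written as a self-contained hypothetical helper (`check_upload` is not part of this commit), follows:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Union

MAX_UPLOAD_MB = 20  # the per-file limit stated in the app's usage notes


def check_upload(
    file_path: Union[str, Path],
    allowed_extensions: Iterable[str],
    max_mb: float = MAX_UPLOAD_MB,
) -> str:
    """Return "" if the file is acceptable, otherwise a user-facing error message.

    Repeats the logic of validate_file_type() and get_file_size_mb() so the
    sketch runs on its own.
    """
    path = Path(file_path)
    allowed = {ext.lower() for ext in allowed_extensions}
    # Extension check first: it needs no filesystem access
    if path.suffix.lower() not in allowed:
        return f"{path.name}: unsupported file type (expected {', '.join(sorted(allowed))})"
    size_mb = path.stat().st_size / (1024 * 1024)
    if size_mb > max_mb:
        return f"{path.name}: {size_mb:.1f} MB exceeds the {max_mb} MB limit"
    return ""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bulletin = Path(tmp) / "bulletin-2024-01-07.PDF"
        bulletin.write_bytes(b"%PDF-1.4\n")
        print(repr(check_upload(bulletin, [".pdf"])))  # ''
        print(check_upload(bulletin, [".txt"]))
```

Returning a message string rather than raising makes it easy to surface the problem in a status textbox such as the app's `status_output`.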
utils/markdown_to_docx.py ADDED
@@ -0,0 +1,100 @@
+ """
+ Convert markdown to DOCX with proper formatting.
+ """
+
+ from docx import Document
+ from docx.shared import Pt, Inches
+ from docx.enum.text import WD_ALIGN_PARAGRAPH
+ import re
+
+
+ def markdown_to_docx(markdown_content: str, output_path: str):
+     """
+     Convert markdown content to DOCX file.
+
+     Args:
+         markdown_content: Markdown text
+         output_path: Path to save DOCX file
+
+     Note:
+         This is a basic converter. For more complex markdown,
+         consider using pandoc or pypandoc.
+     """
+     doc = Document()
+
+     # Set document styles
+     style = doc.styles['Normal']
+     style.font.name = 'Arial'
+     style.font.size = Pt(11)
+
+     lines = markdown_content.split('\n')
+     i = 0
+
+     while i < len(lines):
+         line = lines[i].strip()
+
+         # Skip empty lines and code-fence markers (e.g. a ```markdown wrapper in LLM output)
+         if not line or line.startswith('```'):
+             i += 1
+             continue
+
+         # Headers
+         if line.startswith('# '):
+             heading = doc.add_heading(line[2:], level=1)
+             heading.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         elif line.startswith('## '):
+             doc.add_heading(line[3:], level=2)
+         elif line.startswith('### '):
+             doc.add_heading(line[4:], level=3)
+
+         # Horizontal rules
+         elif line.startswith('---'):
+             doc.add_paragraph('_' * 50)
+
+         # Lists (checked before bold lines so "- **text**" keeps its bullet)
+         elif line.startswith('- ') or line.startswith('* '):
+             _add_inline_runs(doc.add_paragraph(style='List Bullet'), line[2:])
+         elif re.match(r'^\d+\.\s', line):
+             # Strip the "N. " prefix whatever the number of digits
+             _add_inline_runs(doc.add_paragraph(style='List Number'), re.sub(r'^\d+\.\s+', '', line))
+
+         # Bold text (simple pattern)
+         elif '**' in line:
+             _add_inline_runs(doc.add_paragraph(), line)
+
+         # Regular paragraphs
+         else:
+             # Handle multiple consecutive lines as one paragraph
+             paragraph_lines = [line]
+             j = i + 1
+             while j < len(lines) and lines[j].strip() and not _is_special_line(lines[j]):
+                 paragraph_lines.append(lines[j].strip())
+                 j += 1
+
+             full_paragraph = ' '.join(paragraph_lines)
+             doc.add_paragraph(full_paragraph)
+             i = j - 1
+
+         i += 1
+
+     # Save document
+     doc.save(output_path)
+
+
+ def _add_inline_runs(paragraph, text: str):
+     """Add text to a paragraph, rendering **bold** segments as bold runs."""
+     for idx, part in enumerate(text.split('**')):
+         if not part:
+             continue
+         run = paragraph.add_run(part)
+         # Odd-indexed segments sit between a pair of ** markers
+         if idx % 2 == 1:
+             run.bold = True
+
+
+ def _is_special_line(line: str) -> bool:
+     """Check if line is a special markdown element."""
+     line = line.strip()
+     return bool(
+         line.startswith('#') or
+         line.startswith('-') or
+         line.startswith('*') or
+         line.startswith('---') or
+         line.startswith('```') or
+         re.match(r'^\d+\.\s', line) or
+         '**' in line
+     )
+ )