Peter Yang committed
Commit · 5bdee4b
1 parent: daf9263
Add LLM translation feasibility analysis and development workflow guide

Files changed:
- DEVELOPMENT_WORKFLOW.md +442 -0
- LLM_TRANSLATION_FEASIBILITY.md +499 -0
DEVELOPMENT_WORKFLOW.md
ADDED
@@ -0,0 +1,442 @@
# Development & Debugging Workflow
## Testing LLM Translation Locally Before HF Spaces Deployment

---

## Overview

**You don't need to connect your IDE to Hugging Face Spaces.** Instead, develop and test locally first, then deploy to HF Spaces. This is faster and more efficient.

---

## Recommended Workflow

### Phase 1: Local Development & Testing

#### 1.1 Set Up Local Environment

```bash
# Create virtual environment (if not already done)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install additional dependencies for LLM
pip install bitsandbytes accelerate
```

#### 1.2 Test Locally with Sample Code

Create a test script to verify LLM translation works:

```python
# test_llm_translation.py
import asyncio
from document_processing_agent import DocumentProcessingAgent

async def test_llm_translation():
    """Test LLM translation locally"""
    processor = DocumentProcessingAgent("http://localhost:8080")

    # Test Chinese text
    chinese_text = "今天我们要学习神的话语,让我们一起来祷告。"

    print("Testing LLM translation...")
    result = await processor._translate_text(chinese_text, 'zh', 'en')

    print(f"Chinese: {chinese_text}")
    print(f"English: {result}")

    return result

if __name__ == "__main__":
    asyncio.run(test_llm_translation())
```

#### 1.3 Debug in Your IDE

- **Cursor/VSCode**: Set breakpoints, inspect variables, step through code
- **Print statements**: Use `print()` for quick debugging
- **Logging**: Use Python's `logging` module for better debugging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# In your code
logger.debug(f"Translating text: {text[:50]}...")
logger.info(f"Model loaded on device: {device}")
logger.error(f"Translation failed: {error}")
```

---

## Phase 2: Simulate HF Spaces Environment Locally

### 2.1 Match HF Spaces Environment

HF Spaces uses:
- Python 3.10
- Standard Linux environment
- Limited resources (16GB RAM on free tier)

**Test with similar constraints**:

```python
# Check memory usage
import psutil
import os

def check_memory():
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage: {memory_mb:.2f} MB")

    if memory_mb > 14000:  # Leave some headroom
        print("⚠️ Warning: High memory usage!")
```

### 2.2 Test with CPU (Simulate Free Tier)

```python
# Force CPU usage (like free tier)
# Set this before torch / the model is loaded so the GPU is never picked up
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Disable GPU

# Test translation on CPU
# This will be slow but matches free tier behavior
```

### 2.3 Test with GPU (If Available)

```python
# Use GPU if available (matches Pro tier)
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```

---

## Phase 3: Deploy to HF Spaces

### 3.1 Push Code to Repository

```bash
# Commit changes
git add document_processing_agent.py requirements.txt
git commit -m "Add Qwen2.5 LLM translation support"
git push origin hf-gradio
```

### 3.2 Deploy to HF Spaces

```bash
# Push to HF Spaces
# (assumes a "huggingface" remote; if it is not set up yet:
#  git remote add huggingface https://huggingface.co/spaces/NextDrought/worship)
git push huggingface hf-gradio:main --force
```

### 3.3 Monitor Build & Logs

**HF Spaces provides**:
- **Build Logs**: See installation progress
- **Runtime Logs**: See application output
- **Error Messages**: See what went wrong

**Access Logs**:
1. Go to your Space: https://huggingface.co/spaces/NextDrought/worship
2. Click "Logs" tab
3. View real-time output

---

## Debugging Strategies

### Strategy 1: Local First (Recommended)

**Advantages**:
- ✅ Fast iteration (no build time)
- ✅ Full IDE debugging support
- ✅ Can test multiple scenarios quickly
- ✅ No resource limits

**Workflow**:
```
1. Write code locally
2. Test with sample data
3. Debug in IDE
4. Fix issues
5. Repeat until working
6. Deploy to HF Spaces
```

### Strategy 2: Use HF Spaces Logs

**When to use**:
- Production issues
- Environment-specific problems
- Verifying deployment

**How to use**:
```python
# Add detailed logging
import logging
import sys

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout)  # Goes to HF Spaces logs
    ]
)

logger = logging.getLogger(__name__)

# Use throughout your code
logger.info("Loading translation model...")
logger.debug(f"Model name: {model_name}")
logger.error(f"Translation failed: {error}", exc_info=True)
```

### Strategy 3: Test Mode Flag

Add a test mode to your app:

```python
# app.py
import os
import gradio as gr

TEST_MODE = os.getenv("TEST_MODE", "false").lower() == "true"

if TEST_MODE:
    # Show detailed errors in UI
    demo = gr.Blocks(title="Worship Program Generator (TEST MODE)")
    # ... add error display components
else:
    # Production mode - hide errors
    demo = gr.Blocks(title="Worship Program Generator")
```

---

## Common Debugging Scenarios

### Scenario 1: Model Loading Fails

**Local Debugging**:
```python
from transformers import AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # or whichever model you are loading

try:
    model = AutoModelForCausalLM.from_pretrained(model_name)
except Exception as e:
    print(f"Error loading model: {e}")
    import traceback
    traceback.print_exc()
    # Check: Internet connection, model name, disk space
```

**HF Spaces Debugging**:
- Check build logs for download errors
- Check runtime logs for loading errors
- Verify model name is correct

### Scenario 2: Out of Memory

**Local Debugging**:
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

# Monitor memory
import psutil
process = psutil.Process()
print(f"Memory: {process.memory_info().rss / 1e9:.2f} GB")
```

**HF Spaces Debugging**:
- Check logs for OOM errors
- Use smaller model or quantization
- Request GPU tier (more memory)

### Scenario 3: Translation Quality Issues

**Local Debugging**:
```python
# Test with known good/bad examples
test_cases = [
    ("今天天气很好", "The weather is nice today"),
    ("我们要祷告", "We need to pray"),
    # ... more test cases
]

async def run_quality_checks():
    for chinese, expected in test_cases:
        result = await translate(chinese)  # your translation function
        print(f"Input: {chinese}")
        print(f"Expected: {expected}")
        print(f"Got: {result}")
        print(f"Match: {result.lower() == expected.lower()}")
        print("---")
```

---

## IDE Setup Recommendations

### Cursor/VSCode Configuration

**`.vscode/launch.json`** (for debugging):
```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": true,
            "env": {
                "TRANSLATION_METHOD": "llm",
                "CUDA_VISIBLE_DEVICES": ""  // Force CPU for testing
            }
        },
        {
            "name": "Python: Test Translation",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/test_llm_translation.py",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
```

**`.vscode/settings.json`**:
```json
{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": false,
    "python.linting.flake8Enabled": true,
    "python.formatting.provider": "black"
}
```

---

## Quick Reference: Debugging Commands

### Local Testing

```bash
# Run test script
python test_llm_translation.py

# Run app locally
python app.py

# Check memory usage
python -c "import psutil; print(f'{psutil.virtual_memory().used / 1e9:.2f} GB used')"

# Test with specific environment variable
TRANSLATION_METHOD=llm python app.py
```

### HF Spaces Debugging

```bash
# View logs (via HF website)
# Go to: https://huggingface.co/spaces/NextDrought/worship/logs

# Check build status
# Go to: https://huggingface.co/spaces/NextDrought/worship

# View files (if needed)
# Go to: https://huggingface.co/spaces/NextDrought/worship/files
```
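
The Space's build/runtime state can also be checked from a local terminal with `huggingface_hub`. This is a minimal sketch, assuming `huggingface_hub` is installed and you are logged in (`huggingface-cli login`); the exact fields returned can vary by library version:

```python
# check_space_status.py - quick status check without opening the web UI (sketch)
from huggingface_hub import HfApi

api = HfApi()
runtime = api.get_space_runtime("NextDrought/worship")

print(f"Stage: {runtime.stage}")        # e.g. BUILDING, RUNNING, RUNTIME_ERROR
print(f"Hardware: {runtime.hardware}")  # e.g. cpu-basic, t4-small
```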

---

## Best Practices

### ✅ DO

1. **Develop locally first** - Much faster iteration
2. **Use version control** - Commit working code before deploying
3. **Add logging** - Helps debug production issues
4. **Test with sample data** - Verify before deploying
5. **Use environment variables** - Easy to toggle features

### ❌ DON'T

1. **Don't develop directly on HF Spaces** - Too slow
2. **Don't skip local testing** - Wastes build time
3. **Don't ignore error messages** - They tell you what's wrong
4. **Don't deploy untested code** - Breaks production

---

## Troubleshooting Guide

### Issue: Model won't load locally

**Solutions**:
- Check internet connection (needs to download model)
- Verify model name is correct
- Check disk space (models are large)
- Try smaller model first

### Issue: Out of memory locally

**Solutions**:
- Use quantization (4-bit)
- Use smaller model (0.5B instead of 1.5B)
- Close other applications
- Use CPU instead of GPU

### Issue: Works locally but fails on HF Spaces

**Solutions**:
- Check HF Spaces logs for specific error
- Verify all dependencies in requirements.txt
- Check memory limits (use quantization)
- Verify model name is accessible on HF Hub

### Issue: Slow performance on HF Spaces

**Solutions**:
- Request GPU tier (free tier available)
- Use quantization to reduce memory
- Implement batch processing
- Cache translations (see the sketch below)
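
A minimal in-memory cache is usually enough here, since the same headings and liturgy lines recur across documents. Sketch only; `translate_fn` stands for whichever async translation call the app ends up using:

```python
# Simple in-memory translation cache (sketch). Keyed on the exact source text;
# reuse is reasonable because generation runs at a low temperature.
_translation_cache: dict[str, str] = {}

async def translate_cached(text: str, translate_fn) -> str:
    cached = _translation_cache.get(text)
    if cached is not None:
        return cached  # cache hit: skip the model call entirely
    result = await translate_fn(text)
    _translation_cache[text] = result
    return result
```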

---

## Summary

**You don't need an IDE connection to HF Spaces.** Instead:

1. ✅ **Develop locally** - Use Cursor/VSCode with full debugging
2. ✅ **Test locally** - Verify everything works
3. ✅ **Deploy to HF Spaces** - Push code via git
4. ✅ **Monitor logs** - Use HF Spaces web interface
5. ✅ **Iterate** - Fix issues locally, redeploy

This workflow is:
- **Faster** - No build time during development
- **More efficient** - Full IDE features
- **More reliable** - Test before deploying
- **Standard practice** - How most developers work

---

**Next Steps**:
1. Set up local test script
2. Implement Qwen2.5 translation locally
3. Test and debug in your IDE
4. Once working, deploy to HF Spaces

LLM_TRANSLATION_FEASIBILITY.md
ADDED
@@ -0,0 +1,499 @@
# LLM Translation Feasibility Analysis
## Using Qwen/Kimi Models on Hugging Face Spaces

**Date**: 2025-11-12
**Purpose**: Analyze feasibility of replacing OPUS-MT with LLM-based translation (Qwen/Kimi) on HF Spaces

---

## Executive Summary

**Current State**: Using Helsinki-NLP OPUS-MT (small NMT model, ~500MB, CPU-friendly)
**Proposed**: Replace with LLM models (Qwen2.5 or Kimi) for better translation quality
**Verdict**: **FEASIBLE** with considerations - Qwen2.5 recommended, Kimi not available on HF

---

## 1. Current Translation Setup

### 1.1 OPUS-MT Implementation

```text
# Current model: Helsinki-NLP/opus-mt-zh-en
Model Size: ~500MB
Device: CPU (auto-detects CUDA if available)
Speed: ~1-2 seconds per paragraph on CPU
Memory: ~500MB RAM
Quality: Good for general text, struggles with:
- Domain-specific terminology (religious texts)
- Context-dependent translations
- Long-form content with cross-paragraph context
```

### 1.2 Current Limitations

- **Quality Issues**:
  - Loses nuance in religious/formal language
  - No cross-paragraph context awareness
  - May mistranslate idioms and cultural references

- **Performance**:
  - Sequential processing (slow for large documents)
  - No batching capability

- **Context Loss**:
  - Each paragraph translated independently
  - No document-level understanding

---

## 2. LLM Options Analysis

### 2.1 Qwen2.5 Models (Recommended ✅)

#### Available Models on Hugging Face

| Model | Size | Parameters | Memory (CPU) | Memory (GPU) | Speed (CPU) | Speed (GPU) | Quality |
|-------|------|------------|--------------|--------------|-------------|-------------|---------|
| **Qwen2.5-0.5B-Instruct** | ~1GB | 0.5B | ~2GB | ~1GB | Slow | Fast | Good |
| **Qwen2.5-1.5B-Instruct** | ~3GB | 1.5B | ~4GB | ~2GB | Very Slow | Fast | Better |
| **Qwen2.5-7B-Instruct** | ~14GB | 7B | ~16GB | ~8GB | Not feasible | Fast | Excellent |
| **Qwen2.5-14B-Instruct** | ~28GB | 14B | ~32GB | ~16GB | Not feasible | Fast | Excellent |

#### Recommended: Qwen2.5-1.5B-Instruct

**Why**:
- ✅ Small enough for CPU inference (though slow)
- ✅ Better quality than OPUS-MT
- ✅ Supports Chinese-English translation
- ✅ Available on Hugging Face Hub
- ✅ Can use quantization (4-bit/8-bit) to reduce memory

**Hugging Face Model Card**: `Qwen/Qwen2.5-1.5B-Instruct`

#### Implementation Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

class LLMTranslator:
    def __init__(self, model_name="Qwen/Qwen2.5-1.5B-Instruct"):
        # Load model with quantization for CPU
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Option 1: Full precision (requires GPU or lots of RAM)
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     model_name,
        #     torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
        # )

        # Option 2: Quantized (recommended for CPU)
        from transformers import BitsAndBytesConfig
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config if not torch.cuda.is_available() else None,
            device_map="auto"
        )

        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    async def translate(self, chinese_text: str) -> str:
        prompt = f"""You are a professional translator specializing in religious and formal texts.
Translate the following Chinese text to English. Maintain the meaning, tone, and style.

Chinese text:
{chinese_text}

English translation:"""

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)

        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=512,
                temperature=0.3,  # Lower temperature for more consistent translation
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )

        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Extract translation (remove prompt)
        translation = response.split("English translation:")[-1].strip()
        return translation
```
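
One refinement worth testing: Qwen2.5-*-Instruct is a chat-tuned model, so feeding the prompt through the tokenizer's chat template (instead of raw text completion) may give cleaner, easier-to-parse outputs. A minimal sketch, reusing the sample sentence from the test script in DEVELOPMENT_WORKFLOW.md:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a professional translator specializing in religious and formal texts."},
    {"role": "user", "content": "Translate the following Chinese text to English. Maintain the meaning, tone, and style.\n\n今天我们要学习神的话语,让我们一起来祷告。"},
]

# Build the chat-formatted prompt; add_generation_prompt appends the assistant turn marker
# so the model replies with the translation only.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```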

### 2.2 Kimi Models (Not Available ❌)

**Status**: Kimi is Moonshot AI's proprietary model, **NOT available on Hugging Face Hub**

**Alternatives**:
- Use Moonshot AI API (paid service)
- Use similar open-source models (Qwen, Llama, etc.)

**If using Moonshot API**:
```python
import aiohttp

async def translate_with_kimi_api(text: str, api_key: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.moonshot.cn/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": "moonshot-v1-8k",
                "messages": [
                    {"role": "system", "content": "You are a professional translator."},
                    {"role": "user", "content": f"Translate to English: {text}"}
                ]
            }
        ) as response:
            result = await response.json()
            return result["choices"][0]["message"]["content"]
```

**Note**: Requires an API key and has usage costs.

---

## 3. Resource Requirements Comparison

### 3.1 Memory Requirements

| Model | CPU RAM | GPU VRAM | HF Spaces Compatible |
|-------|---------|----------|----------------------|
| **OPUS-MT** (current) | ~500MB | N/A | ✅ Yes (CPU) |
| **Qwen2.5-0.5B** | ~2GB | ~1GB | ✅ Yes (CPU slow, GPU fast) |
| **Qwen2.5-1.5B** | ~4GB | ~2GB | ⚠️ CPU very slow, GPU recommended |
| **Qwen2.5-7B** | ~16GB | ~8GB | ❌ CPU not feasible, GPU required |
| **Qwen2.5-1.5B (4-bit)** | ~2.5GB | ~1GB | ✅ Yes (CPU acceptable) |

### 3.2 Hugging Face Spaces Hardware Options

| Tier | CPU | RAM | GPU | Cost |
|------|-----|-----|-----|------|
| **Free (CPU)** | 2 vCPU | 16GB | None | Free |
| **Free (GPU T4)** | 2 vCPU | 16GB | T4 (16GB) | Free (limited hours) |
| **Pro (CPU)** | 4 vCPU | 32GB | None | $9/month |
| **Pro (GPU)** | 4 vCPU | 32GB | T4/A10G | $9/month |

**Recommendation**:
- **Free GPU tier**: Use Qwen2.5-1.5B with 4-bit quantization
- **CPU-only**: Use Qwen2.5-0.5B or stick with OPUS-MT

---

## 4. Performance Comparison

### 4.1 Speed Comparison (Estimated)

| Model | CPU (per paragraph) | GPU (per paragraph) | Batch Processing |
|-------|---------------------|---------------------|------------------|
| **OPUS-MT** | 1-2 seconds | 0.5 seconds | ❌ No |
| **Qwen2.5-0.5B** | 5-10 seconds | 1-2 seconds | ✅ Yes |
| **Qwen2.5-1.5B** | 15-30 seconds | 2-3 seconds | ✅ Yes |
| **Qwen2.5-1.5B (4-bit)** | 8-15 seconds | 1-2 seconds | ✅ Yes |

**Note**: LLMs can process multiple paragraphs in a single batch, so overall throughput can be higher despite the slower per-paragraph times.

### 4.2 Quality Comparison

| Aspect | OPUS-MT | Qwen2.5-1.5B | Qwen2.5-7B |
|--------|---------|--------------|------------|
| **General Translation** | Good | Better | Excellent |
| **Religious Terminology** | Fair | Good | Excellent |
| **Context Awareness** | None | Good | Excellent |
| **Idioms/Cultural** | Poor | Good | Excellent |
| **Formal Tone** | Fair | Good | Excellent |

---

## 5. Implementation Feasibility

### 5.1 Code Changes Required

**Minimal Changes Needed**:

1. **Update `_get_translation_model()` method**:
```python
def _get_translation_model(self):
    """Lazy load LLM translation model"""
    if self._translation_model is None:
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

        model_name = "Qwen/Qwen2.5-1.5B-Instruct"

        # Use quantization for CPU/memory efficiency
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        )

        self._translation_tokenizer = AutoTokenizer.from_pretrained(model_name)
        self._translation_model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config,
            device_map="auto"
        )
        self._translation_model.eval()

    return self._translation_model, self._translation_tokenizer, self.device
```

2. **Update `_translate_text()` method**:
```python
async def _translate_text(self, text: str, source_lang: str = 'zh', target_lang: str = 'en') -> str | None:
    """Translate using LLM"""
    if source_lang != 'zh' or target_lang != 'en':
        return None

    model, tokenizer, device = self._get_translation_model()

    prompt = f"""Translate the following Chinese text to English. Maintain meaning and tone.

Chinese: {text}
English:"""

    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.3,
            do_sample=True
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    translation = response.split("English:")[-1].strip()
    return translation if translation else None
```

3. **Update `requirements.txt`**:
```txt
# Add for quantization support
bitsandbytes  # For 4-bit quantization
accelerate    # For efficient model loading
```

### 5.2 Backward Compatibility

**Strategy**: Keep OPUS-MT as fallback

```python
TRANSLATION_METHOD = os.getenv("TRANSLATION_METHOD", "llm")  # "llm" or "opus"

# Inside the async translation method:
if TRANSLATION_METHOD == "llm":
    translation = await self._translate_with_llm(text)    # Qwen2.5 path
else:
    translation = await self._translate_with_opus(text)   # OPUS-MT (current implementation)
```

---

## 6. Cost Analysis

### 6.1 Hugging Face Spaces

| Option | Cost | Limitations |
|--------|------|-------------|
| **Free CPU** | $0 | Slow, limited hours |
| **Free GPU** | $0 | Limited GPU hours/month |
| **Pro** | $9/month | More GPU hours, better performance |

### 6.2 Model Download

- **First Load**: Downloads model (~3GB for Qwen2.5-1.5B)
- **Subsequent Loads**: Uses cache (fast)
- **Storage**: Model stored in HF cache (not counted against Space storage)

### 6.3 API Alternatives (If Not Using Direct Model)

| Service | Cost | Quality |
|---------|------|---------|
| **OpenAI GPT-4** | $0.03/1K tokens | Excellent |
| **Moonshot Kimi** | ~$0.01/1K tokens | Excellent |
| **HF Inference API** | Free tier available | Good |

---

## 7. Recommended Implementation Plan

### Phase 1: Proof of Concept (Week 1)
1. ✅ Test Qwen2.5-0.5B on local machine
2. ✅ Compare quality with OPUS-MT
3. ✅ Measure performance (speed, memory)

### Phase 2: Integration (Week 2)
1. ✅ Add LLM translation option to codebase
2. ✅ Implement fallback mechanism (LLM → OPUS-MT)
3. ✅ Add environment variable toggle
4. ✅ Test on HF Spaces (free GPU tier)

### Phase 3: Optimization (Week 3)
1. ✅ Implement batch processing
2. ✅ Add caching for repeated translations
3. ✅ Optimize prompts for better quality
4. ✅ Monitor performance and adjust

### Phase 4: Production (Week 4)
1. ✅ Deploy to HF Spaces Pro (if needed)
2. ✅ Monitor usage and costs
3. ✅ Gather user feedback
4. ✅ Iterate on improvements

---

## 8. Specific Recommendations

### 8.1 For Hugging Face Spaces Deployment

**Recommended Setup**:
```python
# Use Qwen2.5-1.5B with 4-bit quantization
MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
USE_QUANTIZATION = True  # Reduces memory by 4x
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
```

**Space Configuration** (in README.md):
```yaml
---
sdk: gradio
hardware: t4-small  # Request GPU for better performance
---
```

### 8.2 Prompt Engineering

**Optimized Prompt for Religious Texts**:
```python
TRANSLATION_PROMPT = """You are a professional translator specializing in Christian religious texts and sermons.

Translate the following Chinese text to English. Requirements:
1. Maintain the religious terminology accurately
2. Preserve the formal and respectful tone
3. Keep the structure and formatting
4. Translate idioms and cultural references appropriately

Chinese text:
{text}

English translation:"""
```

### 8.3 Batch Processing

**Process Multiple Paragraphs Together**:
```python
async def translate_paragraphs_batch(self, paragraphs: List[str]) -> List[str]:
    """Translate multiple paragraphs in one LLM call"""
    combined_text = "\n\n".join([f"Paragraph {i+1}: {p}" for i, p in enumerate(paragraphs)])

    prompt = f"""Translate the following Chinese paragraphs to English.
Maintain the paragraph structure.

{combined_text}

English translation (keep paragraph structure):"""

    # Single LLM call for all paragraphs
    translation = await self._translate_with_llm(prompt)

    # Split back into paragraphs
    return translation.split("\n\n")
```
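
The final `split("\n\n")` assumes the model echoes the exact blank-line structure, which it may not. A slightly more defensive split, keyed to the `Paragraph N:` labels used in the prompt (sketch only):

```python
import re

def split_batch_translation(translation: str, expected: int) -> list[str]:
    """Split a batched translation back into paragraphs (sketch).

    Prefers the "Paragraph N:" labels from the prompt; falls back to
    blank-line splitting if the model did not echo them.
    """
    parts = [p.strip() for p in re.split(r"Paragraph\s*\d+\s*:", translation) if p.strip()]
    if len(parts) != expected:
        parts = [p.strip() for p in translation.split("\n\n") if p.strip()]
    return parts
```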

**Benefits**:
- Faster (one call instead of N calls)
- Better context awareness
- More consistent terminology

---

## 9. Risks & Mitigations

### 9.1 Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| **Memory OOM** | High | Medium | Use quantization, smaller model |
| **Slow Performance** | Medium | High (CPU) | Use GPU, batch processing |
| **Quality Issues** | Low | Low | Test prompts, fine-tune if needed |
| **Cost Overruns** | Low | Low | Free tier sufficient for testing |
| **Model Availability** | Low | Low | Multiple model options available |

### 9.2 Fallback Strategy

```python
try:
    # Try LLM translation
    translation = await self._translate_with_llm(text)
except Exception as e:
    print(f"LLM translation failed: {e}, falling back to OPUS-MT")
    # Fallback to OPUS-MT
    translation = await self._translate_with_opus(text)
```

---

## 10. Conclusion

### 10.1 Feasibility Verdict

**✅ FEASIBLE** - Using Qwen2.5 models directly on Hugging Face Spaces is feasible with:

1. **Recommended Model**: Qwen2.5-1.5B-Instruct with 4-bit quantization
2. **Hardware**: Free GPU tier (T4) or Pro tier for better performance
3. **Implementation**: Moderate complexity (~2-3 days of development)
4. **Cost**: Free (using HF Spaces free GPU tier)

### 10.2 Key Advantages

- ✅ **Better Quality**: Significant improvement over OPUS-MT
- ✅ **Context Awareness**: Can understand cross-paragraph context
- ✅ **Domain Adaptation**: Better handling of religious terminology
- ✅ **Batch Processing**: Can translate multiple paragraphs together
- ✅ **Free**: No API costs when using direct model hosting

### 10.3 Next Steps

1. **Immediate**: Test Qwen2.5-0.5B locally to validate the approach
2. **Short-term**: Implement Qwen2.5-1.5B with quantization
3. **Long-term**: Consider fine-tuning on a religious text corpus

### 10.4 Alternative: Hybrid Approach

**Best of Both Worlds**:
- Use LLM for main content translation (better quality)
- Use OPUS-MT for quick translations (prayer points, announcements)
- Balance quality vs. speed (see the routing sketch below)
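
A rough sketch of what that routing could look like; the section labels and length threshold are made up for illustration, and the two helpers stand for the LLM and OPUS-MT paths described above:

```python
# Hypothetical router: long-form sermon content goes through the LLM path,
# short items (prayer points, announcements) go through the faster OPUS-MT path.
LLM_SECTIONS = {"sermon", "scripture_reflection"}  # illustrative labels only

async def translate_section(section_type: str, text: str, llm_translate, opus_translate) -> str:
    if section_type in LLM_SECTIONS or len(text) > 200:
        return await llm_translate(text)   # quality-first path
    return await opus_translate(text)      # speed-first path
```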

---

## Appendix A: Code Implementation Template

See `document_processing_agent.py` for the current implementation.
The new LLM-based implementation can be added as an alternative method.

## Appendix B: Model Comparison Table

| Feature | OPUS-MT | Qwen2.5-0.5B | Qwen2.5-1.5B | Qwen2.5-7B |
|---------|---------|--------------|--------------|------------|
| **Quality** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Speed (CPU)** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐ |
| **Speed (GPU)** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Memory** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| **Context** | ⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

---

**Document Version**: 1.0
**Last Updated**: 2025-11-12
**Status**: Ready for Implementation