Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.14.0
Final Comparison: OPUS-MT vs Qwen2.5 (Improved Prompting)
Date: 2025-11-12
Prompt Version: Optimized with chat template
Side-by-Side Translation Comparison
Test 1: "今天我们要学习神的话语,让我们一起来祷告。"
| Method | Translation | Keywords | Quality Notes |
|---|---|---|---|
| OPUS-MT | "Today we will learn the words of God and let us pray together." | 5/5 (100%) | ✅ Complete ⚠️ "words" (plural) less appropriate |
| Qwen2.5 | "Today we will study God's word and let us pray together." | 4/5 (80%) | ✅ Complete ✅ "God's word" (singular) more appropriate ✅ "study" equivalent to "learn" |
Winner: Qwen2.5 - Better religious terminology
Test 2: "感谢主,让我们能够聚集在一起敬拜。"
| Method | Translation | Keywords | Quality Notes |
|---|---|---|---|
| OPUS-MT | "Thank you, Lord, for bringing us together to worship." | 3/4 (75%) | ✅ Natural ⚠️ Missing "gather" explicitly |
| Qwen2.5 | "Thank you Lord for bringing us together here today to worship and praise You." | 3/4 (75%) | ✅ Natural ✅ More complete ("here today", "praise You") ✅ Better context |
Winner: Qwen2.5 - More complete and contextually appropriate
Test 3: "我们要为教会的事工祷告,求神赐福。"
| Method | Translation | Keywords | Quality Notes |
|---|---|---|---|
| OPUS-MT | "We pray for the work of the Church and pray for the blessings of God." | 3/4 (75%) | ⚠️ Repetitive ("pray" twice) ⚠️ Awkward structure |
| Qwen2.5 | "We pray for the work of the church and ask God to bless it." | 3/4 (75%) | ✅ More natural flow ✅ Single sentence, clear meaning ✅ Better structure |
Winner: Qwen2.5 - More natural and less repetitive
Test 4: "这段经文告诉我们,神爱世人,甚至将他的独生子赐给他们。"
| Method | Translation | Keywords | Quality Notes |
|---|---|---|---|
| OPUS-MT | "It tells us that God loves the people, and even gives them his only son." | 3/5 (60%) | ⚠️ "the people" less accurate ⚠️ Present tense "gives" (less appropriate) ⚠️ "son" lowercase ⚠️ Missing "scripture/passage" |
| Qwen2.5 | "This passage tells us that God loves the world so much as to give them his one and only Son." | 4/5 (80%) | ✅ "world" more accurate (世人) ✅ "passage" explicit ✅ "one and only Son" more appropriate ✅ "Son" capitalized ✅ Better phrasing |
Winner: Qwen2.5 - Significantly better accuracy and religious terminology
Test 5: "耶稣说:'我就是道路、真理、生命。'"
| Method | Translation | Keywords | Quality Notes |
|---|---|---|---|
| OPUS-MT | "Jesus said, 'I am the way, the truth, the life.'" | 4/4 (100%) | ✅ Perfect translation |
| Qwen2.5 | "Jesus said, '" (incomplete) | 1/4 (25%) | ❌ Generation issue ⚠️ Needs longer max_new_tokens for quotes |
Winner: OPUS-MT (but fixable for Qwen2.5)
Quantitative Summary
| Metric | OPUS-MT | Qwen2.5 | Difference |
|---|---|---|---|
| Keyword Matching | 81.8% (18/22) | 68.2% (15/22) | -13.6% |
| Naturalness Score | 0.28 | 0.17 | -0.11 |
| Completeness | 100% (5/5) | 80% (4/5) | -20% |
Quantitative Winner: OPUS-MT
Qualitative Analysis
Religious Terminology Accuracy
| Term | OPUS-MT | Qwen2.5 | Winner |
|---|---|---|---|
| "神的话语" | "words of God" | "God's word" | Qwen2.5 ✅ |
| "独生子" | "only son" | "one and only Son" | Qwen2.5 ✅ |
| "世人" | "the people" | "the world" | Qwen2.5 ✅ |
| "经文" | "it" (implicit) | "passage" | Qwen2.5 ✅ |
Qwen2.5 wins 4/4 on religious terminology accuracy.
Context Understanding
- OPUS-MT: Word-by-word translation, misses context
- Qwen2.5: Understands biblical references, religious context, appropriate tense
Qwen2.5 wins on context understanding.
Naturalness & Flow
- OPUS-MT: Sometimes repetitive or awkward
- Qwen2.5: More natural phrasing, better structure (when complete)
Qwen2.5 wins on naturalness (when working correctly).
Key Findings
1. Quantitative Metrics Don't Tell the Full Story
While OPUS-MT scores higher on simple metrics:
- Keyword matching doesn't capture terminology accuracy
- Naturalness score is a heuristic, not actual quality assessment
- Completeness is important but fixable
2. Qwen2.5 Excels Where It Matters
For worship programs, religious terminology accuracy is critical:
- ✅ "God's Word" vs "words of God" - significant difference
- ✅ "Son" capitalized - proper formatting
- ✅ "world" vs "people" - accuracy matters
- ✅ "one and only Son" - more appropriate phrasing
3. The Incomplete Generation Issue
Test 5 shows Qwen2.5 can have incomplete output for quoted sentences. This is fixable:
- Increase
max_new_tokensto 150-200 for quotes - Add better stopping criteria
- Use fallback to OPUS-MT if incomplete
Final Recommendation
Use Qwen2.5 for Worship Program Generation
Reasons:
- Religious terminology accuracy: 4/4 correct vs 0/4 for OPUS-MT
- Context understanding: Better grasp of biblical references
- Translation quality: More appropriate for religious texts
- Naturalness: Better phrasing when working correctly
With Improvements:
- ✅ Fix incomplete generation (increase max_new_tokens for quotes)
- ✅ Add fallback to OPUS-MT if Qwen2.5 fails
- ✅ Use optimized prompting (chat template + low temperature)
- ✅ Consider hybrid approach for optimal performance
Hybrid Approach (Best of Both Worlds)
- Qwen2.5 for main content (sermons, messages) - quality matters
- OPUS-MT for quick items (announcements, prayer points) - speed matters
Conclusion
Quantitative metrics favor OPUS-MT, but qualitative analysis strongly favors Qwen2.5 for religious texts.
The 13.6% difference in keyword matching is outweighed by:
- 100% difference in religious terminology accuracy (4/4 vs 0/4)
- Better context understanding
- More appropriate translations for worship programs
Recommendation: Use Qwen2.5 with improved prompting and fallback mechanism.
Files Created:
compare_translation_methods.py- Side-by-side comparison scriptIMPROVED_PROMPT_COMPARISON.md- Detailed analysisFINAL_COMPARISON_SUMMARY.md- This summary
Next Step: Integrate improved Qwen2.5 translation into document_processing_agent.py