worship / FINAL_COMPARISON_SUMMARY.md
Peter Yang
Add final comparison summary showing Qwen2.5 advantages for religious texts
86f662a

A newer version of the Gradio SDK is available: 6.14.0

Upgrade

Final Comparison: OPUS-MT vs Qwen2.5 (Improved Prompting)

Date: 2025-11-12
Prompt Version: Optimized with chat template


Side-by-Side Translation Comparison

Test 1: "今天我们要学习神的话语,让我们一起来祷告。"

Method Translation Keywords Quality Notes
OPUS-MT "Today we will learn the words of God and let us pray together." 5/5 (100%) ✅ Complete
⚠️ "words" (plural) less appropriate
Qwen2.5 "Today we will study God's word and let us pray together." 4/5 (80%) ✅ Complete
✅ "God's word" (singular) more appropriate
✅ "study" equivalent to "learn"

Winner: Qwen2.5 - Better religious terminology


Test 2: "感谢主,让我们能够聚集在一起敬拜。"

Method Translation Keywords Quality Notes
OPUS-MT "Thank you, Lord, for bringing us together to worship." 3/4 (75%) ✅ Natural
⚠️ Missing "gather" explicitly
Qwen2.5 "Thank you Lord for bringing us together here today to worship and praise You." 3/4 (75%) ✅ Natural
✅ More complete ("here today", "praise You")
✅ Better context

Winner: Qwen2.5 - More complete and contextually appropriate


Test 3: "我们要为教会的事工祷告,求神赐福。"

Method Translation Keywords Quality Notes
OPUS-MT "We pray for the work of the Church and pray for the blessings of God." 3/4 (75%) ⚠️ Repetitive ("pray" twice)
⚠️ Awkward structure
Qwen2.5 "We pray for the work of the church and ask God to bless it." 3/4 (75%) ✅ More natural flow
✅ Single sentence, clear meaning
✅ Better structure

Winner: Qwen2.5 - More natural and less repetitive


Test 4: "这段经文告诉我们,神爱世人,甚至将他的独生子赐给他们。"

Method Translation Keywords Quality Notes
OPUS-MT "It tells us that God loves the people, and even gives them his only son." 3/5 (60%) ⚠️ "the people" less accurate
⚠️ Present tense "gives" (less appropriate)
⚠️ "son" lowercase
⚠️ Missing "scripture/passage"
Qwen2.5 "This passage tells us that God loves the world so much as to give them his one and only Son." 4/5 (80%) ✅ "world" more accurate (世人)
✅ "passage" explicit
✅ "one and only Son" more appropriate
✅ "Son" capitalized
✅ Better phrasing

Winner: Qwen2.5 - Significantly better accuracy and religious terminology


Test 5: "耶稣说:'我就是道路、真理、生命。'"

Method Translation Keywords Quality Notes
OPUS-MT "Jesus said, 'I am the way, the truth, the life.'" 4/4 (100%) ✅ Perfect translation
Qwen2.5 "Jesus said, '" (incomplete) 1/4 (25%) ❌ Generation issue
⚠️ Needs longer max_new_tokens for quotes

Winner: OPUS-MT (but fixable for Qwen2.5)


Quantitative Summary

Metric OPUS-MT Qwen2.5 Difference
Keyword Matching 81.8% (18/22) 68.2% (15/22) -13.6%
Naturalness Score 0.28 0.17 -0.11
Completeness 100% (5/5) 80% (4/5) -20%

Quantitative Winner: OPUS-MT


Qualitative Analysis

Religious Terminology Accuracy

Term OPUS-MT Qwen2.5 Winner
"神的话语" "words of God" "God's word" Qwen2.5
"独生子" "only son" "one and only Son" Qwen2.5
"世人" "the people" "the world" Qwen2.5
"经文" "it" (implicit) "passage" Qwen2.5

Qwen2.5 wins 4/4 on religious terminology accuracy.

Context Understanding

  • OPUS-MT: Word-by-word translation, misses context
  • Qwen2.5: Understands biblical references, religious context, appropriate tense

Qwen2.5 wins on context understanding.

Naturalness & Flow

  • OPUS-MT: Sometimes repetitive or awkward
  • Qwen2.5: More natural phrasing, better structure (when complete)

Qwen2.5 wins on naturalness (when working correctly).


Key Findings

1. Quantitative Metrics Don't Tell the Full Story

While OPUS-MT scores higher on simple metrics:

  • Keyword matching doesn't capture terminology accuracy
  • Naturalness score is a heuristic, not actual quality assessment
  • Completeness is important but fixable

2. Qwen2.5 Excels Where It Matters

For worship programs, religious terminology accuracy is critical:

  • ✅ "God's Word" vs "words of God" - significant difference
  • ✅ "Son" capitalized - proper formatting
  • ✅ "world" vs "people" - accuracy matters
  • ✅ "one and only Son" - more appropriate phrasing

3. The Incomplete Generation Issue

Test 5 shows Qwen2.5 can have incomplete output for quoted sentences. This is fixable:

  • Increase max_new_tokens to 150-200 for quotes
  • Add better stopping criteria
  • Use fallback to OPUS-MT if incomplete

Final Recommendation

Use Qwen2.5 for Worship Program Generation

Reasons:

  1. Religious terminology accuracy: 4/4 correct vs 0/4 for OPUS-MT
  2. Context understanding: Better grasp of biblical references
  3. Translation quality: More appropriate for religious texts
  4. Naturalness: Better phrasing when working correctly

With Improvements:

  • ✅ Fix incomplete generation (increase max_new_tokens for quotes)
  • ✅ Add fallback to OPUS-MT if Qwen2.5 fails
  • ✅ Use optimized prompting (chat template + low temperature)
  • ✅ Consider hybrid approach for optimal performance

Hybrid Approach (Best of Both Worlds)

  • Qwen2.5 for main content (sermons, messages) - quality matters
  • OPUS-MT for quick items (announcements, prayer points) - speed matters

Conclusion

Quantitative metrics favor OPUS-MT, but qualitative analysis strongly favors Qwen2.5 for religious texts.

The 13.6% difference in keyword matching is outweighed by:

  • 100% difference in religious terminology accuracy (4/4 vs 0/4)
  • Better context understanding
  • More appropriate translations for worship programs

Recommendation: Use Qwen2.5 with improved prompting and fallback mechanism.


Files Created:

  • compare_translation_methods.py - Side-by-side comparison script
  • IMPROVED_PROMPT_COMPARISON.md - Detailed analysis
  • FINAL_COMPARISON_SUMMARY.md - This summary

Next Step: Integrate improved Qwen2.5 translation into document_processing_agent.py