Exegetical Generation: A New Task for Information-Expanding Text Generation

Volodymyr Ovcharov, Igor Tatarynovych -- Anamavajra Labs

📄 Paper PDF

Abstract

We introduce exegetical generation, a text generation task in which the model must produce an expansive target-language commentary from a tersely encoded source text, recovering implicit definitions, logical connections, and contextual knowledge that the source presupposes but does not express. Unlike translation (information-preserving, 1--3x expansion) or summarization (information-reducing), exegetical generation is information-expanding (5--20x), requiring abductive reasoning over a tradition's knowledge base.

We formalize the task, propose a taxonomy of five exegetical operations, and define ExeScore, a composite evaluation metric. An analysis of 1.33M open Sanskrit--English parallel pairs reveals that 98.6% are translations (expansion ratio R < 5x); no large-scale exegetical corpus exists in open data, confirming a significant resource gap.
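
The gap analysis rests on the expansion ratio R. A minimal sketch of how parallel pairs might be bucketed follows, assuming R is target length divided by source length in whitespace tokens (the unit of measurement is not stated here) and using the R < 5x cutoff quoted above. The function names `expansion_ratio`, `classify_pair`, and `translation_share` are illustrative, not part of any released code.

```python
def expansion_ratio(source: str, target: str) -> float:
    """Target length divided by source length, in whitespace tokens (assumed unit)."""
    src_len = len(source.split())
    if src_len == 0:
        raise ValueError("source text is empty")
    return len(target.split()) / src_len


def classify_pair(source: str, target: str, threshold: float = 5.0) -> str:
    """Label a pair using the R < 5x translation cutoff from the gap analysis."""
    r = expansion_ratio(source, target)
    return "translation" if r < threshold else "exegetical candidate"


def translation_share(pairs: list[tuple[str, str]]) -> float:
    """Fraction of pairs that fall below the translation cutoff."""
    if not pairs:
        return 0.0
    hits = sum(classify_pair(s, t) == "translation" for s, t in pairs)
    return hits / len(pairs)
```

Applied to a full parallel corpus, `translation_share` would give the kind of figure reported above, though the exact tokenization used for the 98.6% number may differ from this sketch.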

We present a multi-modal data extraction pipeline (OCR + ASR + LLM post-processing) and establish four baselines:

  • Zero-shot LLM (information gain G = 77)
  • Hybrid RAG with dictionary grounding (G = 94, +6.5%)
  • Few-shot (G = 25--28)
  • QLoRA fine-tuning on 704 exegetical pairs that successfully transfers domain-specific commentary style
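
To make the dictionary-grounding idea in the RAG baseline concrete, here is a minimal sketch of the grounding step only. It assumes exact-match lookup of source tokens in a glossary, which stands in for whatever retrieval the actual system uses; the prompt wording, the function names, and the two toy glossary entries are all hypothetical.

```python
def retrieve_definitions(source: str, glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (term, gloss) pairs for glossary terms that occur in the source."""
    seen = set()
    hits = []
    for tok in source.lower().split():
        if tok in glossary and tok not in seen:
            seen.add(tok)
            hits.append((tok, glossary[tok]))
    return hits


def build_prompt(source: str, glossary: dict[str, str]) -> str:
    """Assemble a commentary prompt with retrieved glosses placed before the source."""
    lines = ["Dictionary entries:"]
    hits = retrieve_definitions(source, glossary)
    if hits:
        lines += [f"- {term}: {gloss}" for term, gloss in hits]
    else:
        lines.append("- (none found)")
    lines += ["", "Source text:", source, "",
              "Write an explanatory commentary in English."]
    return "\n".join(lines)


# Toy glossary, for illustration only (not the dictionary used in the paper).
glossary = {"duhkha": "suffering, unsatisfactoriness", "nirodha": "cessation"}
prompt = build_prompt("duhkha nirodha", glossary)
```

The resulting `prompt` string would then be sent to the generation model; placing glosses ahead of the source is one plausible design, chosen here so the model sees the definitions before the terse text.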

The task generalizes beyond Sanskrit to Talmudic, Scholastic, and other commentary traditions.

Key Contributions

  1. Task definition: Exegetical generation formalized as information-expanding generation, distinguished from translation and summarization
  2. Taxonomy: Five exegetical operations (Term Definition Unpacking, Implicit Context Restoration, Logical Connection Bridging, Doctrinal Elaboration, Cross-Reference Linking)
  3. ExeScore metric: Composite evaluation measuring Faithfulness, Information Gain, Completeness, and Tradition-Coherence
  4. Gap analysis: 98.6% of 1.33M open Sanskrit--English pairs are translations; the exegetical task is unserved
  5. Multi-modal pipeline: Chandra OCR-2 + Whisper + LLM correction for extracting exegetical pairs from scholarly corpora
  6. Baselines: Zero-shot, RAG, few-shot, and QLoRA fine-tuning with style transfer
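
As an illustration of what a composite metric over the four ExeScore components could look like, the sketch below uses a weighted arithmetic mean. This is an assumption, not the paper's formula: the aggregation rule, the equal default weights, and the requirement that each component be normalized to [0, 1] are all hypothetical choices made for the example.

```python
def composite_score(
    faithfulness: float,
    information_gain: float,
    completeness: float,
    tradition_coherence: float,
    weights: tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Weighted mean of four component scores, each assumed to lie in [0, 1].

    Hypothetical aggregation; the actual ExeScore definition may differ.
    """
    values = (faithfulness, information_gain, completeness, tradition_coherence)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("component scores must be normalized to [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total
```

A weighted mean lets a strong component compensate for a weak one; a metric that should penalize any single failure (for example, an unfaithful commentary) might instead use a product or a minimum.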

Results

System                             Info Gain (G)   Expansion (R)   Defs
B2: Claude Haiku 4.5 (zero-shot)   77              128.6x          1.9
B3.1: Claude RAG (hybrid)          94              132.6x          2.3
B4-fs: Nova Micro (few-shot)       25              81.8x           --
B4-ft: Qwen 14B (QLoRA)            17              56.0x           1.9

Cross-Tradition Generalization

The task structure -- terse source + tradition knowledge → expansive commentary -- recurs across:

  • Judaism: Mishnah → Gemara
  • Christianity: Scripture → Scholastic commentary
  • Islam: Qur'an → Tafsir
  • Chinese classics: Jing (经) → Zhu (注) commentary
  • Indian philosophy: Sutra → Bhasya

Citation

@article{ovcharov2026exegetical,
  title={Exegetical Generation: A New Task for Information-Expanding Text Generation},
  author={Ovcharov, Volodymyr and Tatarynovych, Igor},
  year={2026},
  note={Anamavajra Labs}
}