Exegetical Generation: A New Task for Information-Expanding Text Generation
Volodymyr Ovcharov, Igor Tatarynovych -- Anamavajra Labs
📄 Paper PDF
Abstract
We introduce exegetical generation, a text generation task in which the model must produce an expansive target-language commentary from a tersely encoded source text, recovering implicit definitions, logical connections, and contextual knowledge that the source presupposes but does not express. Unlike translation (information-preserving, 1--3x expansion) or summarization (information-reducing), exegetical generation is information-expanding (5--20x), requiring abductive reasoning over a tradition's knowledge base.
We formalize the task, propose a taxonomy of five exegetical operations, and define ExeScore, a composite evaluation metric. An analysis of 1.33M open Sanskrit--English parallel pairs reveals that 98.6% are translations (expansion ratio R < 5x); no large-scale exegetical corpus exists in open data, confirming a significant resource gap.
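The gap analysis above rests on an expansion ratio R with a cut-off at 5x. The following is a minimal sketch of how such a filter could look, not the authors' implementation. It assumes R is the target length divided by the source length, counted in whitespace tokens. The page does not state the unit of measurement, and whitespace tokens undercount Sanskrit compounds. The function names and the label `exegetical-candidate` are hypothetical.

```python
from typing import Iterable, Tuple

# Threshold quoted in the text: pairs with R < 5x are counted as translations.
TRANSLATION_MAX_RATIO = 5.0


def expansion_ratio(source: str, target: str) -> float:
    """Return R = len(target) / len(source), measured in whitespace tokens.

    ASSUMPTION: the unit (whitespace tokens) is illustrative only.
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    if src_len == 0:
        raise ValueError("source text is empty")
    return tgt_len / src_len


def classify_pair(source: str, target: str) -> str:
    """Label a pair 'translation' if R < 5, else 'exegetical-candidate'."""
    r = expansion_ratio(source, target)
    return "translation" if r < TRANSLATION_MAX_RATIO else "exegetical-candidate"


def translation_share(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (source, target) pairs that fall below the threshold."""
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("no pairs given")
    n_translation = sum(
        1 for src, tgt in pair_list if classify_pair(src, tgt) == "translation"
    )
    return n_translation / len(pair_list)
```

Running `translation_share` over a parallel corpus gives the kind of figure reported above. A pair that clears the threshold is only a candidate, since a long target can also be a loose or padded translation.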
We present a multi-modal data extraction pipeline (OCR + ASR + LLM post-processing) and establish four baselines:
- Zero-shot LLM (Information Gain G = 77)
- Hybrid RAG with dictionary grounding (G = 94, +6.5%)
- Few-shot (G = 25--28)
- QLoRA fine-tuning on 704 exegetical pairs (G = 17), which transfers the domain-specific commentary style
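To illustrate what "dictionary grounding" can mean in the RAG baseline, here is a minimal sketch that looks up source terms in a glossary and places the glosses in the prompt ahead of the source text. It covers only the dictionary step, not retrieval or generation. Everything in it is an assumption for illustration: the two-entry `GLOSSARY`, the prompt wording, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# HYPOTHETICAL mini-glossary; a real system would query a Sanskrit dictionary.
GLOSSARY: Dict[str, str] = {
    "atman": "the self",
    "brahman": "ultimate reality",
}


def lookup_terms(source: str, glossary: Dict[str, str]) -> List[str]:
    """Return 'term: gloss' lines for glossary terms found in the source.

    Terms are matched on lowercased whitespace tokens with trailing
    punctuation stripped, in order of first appearance.
    """
    seen = set()
    lines: List[str] = []
    for token in source.lower().split():
        term = token.strip(".,;:!?|")
        if term in glossary and term not in seen:
            seen.add(term)
            lines.append(f"{term}: {glossary[term]}")
    return lines


def build_prompt(source: str, glossary: Dict[str, str]) -> str:
    """Assemble a grounded prompt: dictionary entries first, then the source."""
    glosses = lookup_terms(source, glossary)
    grounding = "\n".join(glosses) if glosses else "(no dictionary entries found)"
    return (
        "Dictionary entries:\n"
        f"{grounding}\n\n"
        "Source text:\n"
        f"{source}\n\n"
        "Write an expansive commentary that unpacks the terms above."
    )
```

For example, `build_prompt("Atman is brahman.", GLOSSARY)` yields a prompt whose first section lists both glosses. Exact token matching would miss inflected or sandhi-joined forms, so a real pipeline would likely need lemmatization before lookup.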
The task generalizes beyond Sanskrit to Talmudic, Scholastic, and other commentary traditions.
Key Contributions
- Task definition: Exegetical generation formalized as information-expanding generation, distinguished from translation and summarization
- Taxonomy: Five exegetical operations (Term Definition Unpacking, Implicit Context Restoration, Logical Connection Bridging, Doctrinal Elaboration, Cross-Reference Linking)
- ExeScore metric: Composite evaluation measuring Faithfulness, Information Gain, Completeness, and Tradition-Coherence
- Gap analysis: 98.6% of 1.33M open Sanskrit--English pairs are translations (R < 5x); the exegetical task is unserved by open data
- Multi-modal pipeline: Chandra OCR-2 + Whisper + LLM correction for extracting exegetical pairs from scholarly corpora
- Baselines: Zero-shot, RAG, few-shot, and QLoRA fine-tuning with style transfer
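ExeScore is described above as a composite of four components, but this page does not give the aggregation formula. The sketch below shows one possible way to combine four component scores, namely a weighted mean. It is purely illustrative: the equal default weights, the requirement that each component is already normalized to [0, 1], and all names are assumptions, not the paper's definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExeComponents:
    """The four components named in the text, each ASSUMED to lie in [0, 1]."""
    faithfulness: float
    information_gain: float
    completeness: float
    tradition_coherence: float


def composite_score(
    c: ExeComponents, weights: Optional[Dict[str, float]] = None
) -> float:
    """Weighted mean of the four components (ASSUMED aggregation rule)."""
    values = {
        "faithfulness": c.faithfulness,
        "information_gain": c.information_gain,
        "completeness": c.completeness,
        "tradition_coherence": c.tradition_coherence,
    }
    if weights is None:
        # ASSUMPTION: equal weights; the source does not specify any.
        weights = {name: 1.0 for name in values}
    if set(weights) != set(values):
        raise ValueError("weights must cover exactly the four components")
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * values[name] for name in values) / total
```

The raw Information Gain values reported in the Results table (for example 77 or 94) are not on a [0, 1] scale, so they would need a normalization step before being passed to a function like this one.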
Results
| System | Info Gain (G) | Expansion (R) | Defs |
|---|---|---|---|
| B2: Claude Haiku 4.5 (zero-shot) | 77 | 128.6x | 1.9 |
| B3.1: Claude RAG (hybrid) | 94 | 132.6x | 2.3 |
| B4-fs: Nova Micro (few-shot) | 25 | 81.8x | -- |
| B4-ft: Qwen 14B (QLoRA) | 17 | 56.0x | 1.9 |
Cross-Tradition Generalization
The task structure (terse source + tradition knowledge → expansive commentary) recurs across:
- Judaism: Mishnah → Gemara
- Christianity: Scripture → Scholastic commentary
- Islam: Qur'an → Tafsir
- Chinese classics: Jing (经) → Zhu (注) commentary
- Indian philosophy: Sutra → Bhasya
Related Resources
- 🤗 ExeGen QLoRA model
- 🤗 Tantraloka raw corpus
Citation
@article{ovcharov2026exegetical,
title={Exegetical Generation: A New Task for Information-Expanding Text Generation},
author={Ovcharov, Volodymyr and Tatarynovych, Igor},
year={2026},
note={Anamavajra Labs}
}