Paterikon-3B
Released on the Feast of the Triumph of Orthodoxy, First Sunday of Great Lent, 2026.
Overview
Paterikon-3B is a domain-adapted language model for Orthodox Christian theology, produced by continued pre-training (CPT) of Qwen2.5-3B-Instruct on a 116M-token corpus of Church Father writings, lives of saints, and theological texts drawn primarily from the Russian Orthodox tradition.
The model has absorbed the voice and vocabulary of patristic literature — the cadence of St. John Chrysostom, the precision of St. Basil the Great, the mystical theology of St. Gregory Palamas, the ascetic teaching of the Philokalia and the Optina Elders. It is intended as a foundation for downstream instruction-tuning on Orthodox theological Q&A.
Note: This is the CPT (pre-training) checkpoint, not a full instruction-tuned model. It excels at patristic text continuation and domain fluency. A supervised fine-tuned (SFT) version trained on Q&A pairs is in active development.
| Attribute | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Training | Full fine-tune (continued pre-training on raw text) |
| Parameters | 3.09 billion |
| Languages | Russian (primary), English, Greek/Latin (patristic excerpts) |
| Domain | Orthodox Christian patristic theology |
| Training tokens | ~116M |
| Training corpus | orthodox-patristic-corpus |
| License | Apache 2.0 |
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jayfurzy/paterikon-3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Instruction-style prompt (uses the base Qwen chat template).
# System: "You are an Orthodox theologian answering questions in the spirit of the Holy Fathers."
# User: "Explain St. Gregory Palamas's teaching on the light of Tabor."
messages = [
    {"role": "system", "content": "Ты — православный богослов, отвечающий на вопросы в духе святых отцов."},
    {"role": "user", "content": "Объясни учение святителя Григория Паламы о Фаворском свете."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

# English prompt
messages = [
    {"role": "system", "content": "You are an Orthodox Christian theologian, responding in the spirit of the Holy Fathers."},
    {"role": "user", "content": "What is the teaching of St. Gregory Palamas on the divine energies?"},
]
# Apply the chat template and generate exactly as above.
```
Model Details
Training Approach
This model was trained using full continued pre-training — all 3.09B parameters were updated, not just a low-rank adapter. This allows deeper domain absorption than QLoRA or LoRA-based approaches.
We deliberately chose a smaller, fully fine-tuned model (3B) over a larger LoRA-adapted model (7B) because empirical results showed that full-weight adaptation on the patristic domain gave lower perplexity and more authentic voice reproduction than partial adaptation at larger scale.
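To make the "partial adaptation" contrast concrete: QLoRA at rank 32 on a 7B model trains only about 1% of the weights. A back-of-envelope estimate of that fraction, using layer dimensions taken from the publicly documented Qwen2.5-7B config (these dimensions are assumptions for illustration, not figures from this model card):

```python
# Approximate Qwen2.5-7B dimensions (assumptions from the public config):
hidden = 3584   # hidden size
inter = 18944   # MLP intermediate size
kv = 512        # k/v projection output dim (GQA: 4 KV heads x 128 head dim)
layers = 28
rank = 32

# LoRA adds r * (d_in + d_out) parameters per adapted linear layer.
per_layer = (
    rank * (hidden + hidden)    # q_proj
    + rank * (hidden + kv)      # k_proj
    + rank * (hidden + kv)      # v_proj
    + rank * (hidden + hidden)  # o_proj
    + rank * (hidden + inter)   # gate_proj
    + rank * (hidden + inter)   # up_proj
    + rank * (inter + hidden)   # down_proj
)
lora_params = per_layer * layers
total_params = 7.62e9  # approximate Qwen2.5-7B parameter count

print(f"LoRA params: {lora_params / 1e6:.1f}M "
      f"({100 * lora_params / total_params:.2f}% of weights)")
```

The result lands at roughly 80M trainable parameters, i.e. on the order of 1% of the 7B model, versus 100% of the 3.09B parameters updated here.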
| Approach evaluated | CPT loss | Notes |
|---|---|---|
| Qwen2.5-7B QLoRA rank=32 | ~1.70 (projected) | Only 1% of weights updated |
| Qwen2.5-3B full fine-tune | 1.47 | All weights updated — selected |
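The CPT losses above translate directly into perplexities (perplexity = exp(cross-entropy loss)), which makes the gap between the two approaches easier to interpret:

```python
import math

# Cross-entropy loss -> perplexity: ppl = exp(loss)
full_ft_loss = 1.47  # Qwen2.5-3B full fine-tune (from the table above)
qlora_loss = 1.70    # Qwen2.5-7B QLoRA, projected (from the table above)

full_ft_ppl = math.exp(full_ft_loss)
qlora_ppl = math.exp(qlora_loss)

print(f"full fine-tune perplexity: {full_ft_ppl:.2f}")  # ~4.35
print(f"QLoRA perplexity:          {qlora_ppl:.2f}")    # ~5.47
```

In other words, the full fine-tune is, on average, choosing among ~4.3 effective next-token candidates versus ~5.5 for the projected QLoRA baseline.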
Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Training type | Full fine-tune (continued pre-training) |
| Sequence length | 1,792 tokens |
| Batch size | 1 per device |
| Gradient accumulation | 16 steps (effective batch = 16) |
| Learning rate | 5e-5 |
| LR schedule | Cosine with 1% warmup |
| Optimizer | Adafactor |
| Precision | bfloat16 |
| Attention | SDPA (scaled dot-product attention) |
| Gradient checkpointing | Yes |
| Epochs | 1 |
| Training steps | 6,799 |
| Hardware | 1× NVIDIA RTX 3090 (24GB VRAM) |
| Training time | ~22 hours |
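The table above maps onto Hugging Face `TrainingArguments` keyword names roughly as follows. This is a reconstruction for illustration, not the authors' actual training script; the argument names follow the transformers API:

```python
# Hypothetical kwargs mirroring the configuration table; names follow the
# Hugging Face transformers TrainingArguments API.
training_args = dict(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,            # "cosine with 1% warmup"
    optim="adafactor",
    bf16=True,
    gradient_checkpointing=True,
    num_train_epochs=1,
)

# On a single GPU, effective batch size = per-device batch * accumulation steps.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 16
```

Adafactor plus gradient checkpointing is what makes a full 3B-parameter fine-tune fit in 24GB of VRAM at this sequence length.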
Training Results
| Metric | Value |
|---|---|
| Final train loss | 0.459 |
| Final step loss | ~1.47 |
| Token accuracy (final epoch) | ~65.8% |
A token accuracy of 65.8% on this domain is meaningful: it indicates the model has substantially absorbed the distribution of patristic language. For comparison, the base Qwen2.5-3B-Instruct scored roughly 55–58% on the same text before CPT; a model with no exposure to the domain would score lower still.
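Token accuracy here means the fraction of positions at which the model's top-1 (greedy) next-token prediction matches the actual next token. A minimal sketch of the computation on toy data (the token ids are made up):

```python
def token_accuracy(predicted_ids, target_ids):
    """Fraction of positions where the greedy prediction equals the target token."""
    assert len(predicted_ids) == len(target_ids)
    hits = sum(p == t for p, t in zip(predicted_ids, target_ids))
    return hits / len(target_ids)

# Toy example: the model's argmax predictions vs. the true next tokens.
preds   = [101, 7, 42, 500, 13, 99, 8, 21, 303, 5]
targets = [101, 7, 42, 501, 13, 99, 9, 21, 303, 5]
print(token_accuracy(preds, targets))  # 0.8
```

In practice the predictions come from `logits.argmax(-1)` shifted against the labels, averaged over the whole evaluation set.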
Training Data
Paterikon-3B was trained on the Orthodox Patristic Corpus, a 116M-token collection assembled from:
- 786,000 patristic text passages organized by theological principle, drawn from 123 authors
- 7 full-length patristic works (~3M tokens)
- 55 curated topical corpora (~2M tokens)
Primary sources include writings of the Holy Fathers from the first through twentieth centuries, crawled and structured from the Azbyka.ru Orthodox library, the Christian Classics Ethereal Library (CCEL), and other public-domain Orthodox text collections.
Key authors represented:
St. John Chrysostom · St. Basil the Great · St. Gregory the Theologian · St. Gregory Palamas · St. Athanasius the Great · St. Cyril of Alexandria · St. John of Damascus · St. Maximus the Confessor · St. Symeon the New Theologian · St. Theophan the Recluse · St. Ignatius Brianchaninov · St. Paisios Velichkovsky · The Optina Elders · St. Paisios the Athonite · St. Nicholas of Serbia · St. Silouan the Athonite · and 100+ more
Language distribution:
- Russian: ~98% (Synodal-era and contemporary Orthodox Russian)
- English: ~2% (CCEL translations of patristic texts)
- Greek/Latin: minimal (brief patristic excerpts and citations)
Qualitative Comparison
The following illustrates the difference in register between the CPT model and the Qwen3.5-27B teacher model used in downstream training:
Question: Explain the theology of St. Gregory Palamas on the distinction between divine essence and energies.
| Model | Response style |
|---|---|
| Paterikon-3B | Speaks from within the tradition — uses patristic cadences, "my child"-style pastoral address, cites hesychast experience as primary locus |
| Qwen3.5-27B (base) | Academic encyclopedic register — accurate but external, cites sources analytically |
This voice quality is precisely the purpose of domain CPT before instruction tuning: the model acquires the manner of speaking of the tradition, not merely facts about it.
Intended Use
Appropriate use cases:
- Foundation model for Orthodox theological assistants and chatbots
- Theological text completion and generation research
- Multilingual (Russian/English) patristic NLP research
- Building instruction-tuned models for catechism, spiritual reading assistance, theological Q&A
- Orthodox AI research exploring the application of language models to Christian tradition
Out of scope / limitations:
- This is a CPT checkpoint, not an instruction-tuned model. It requires further SFT for robust Q&A behavior
- Not suitable for pastoral or spiritual direction in place of a human priest or elder
- The corpus is heavily Russian Orthodox — Coptic, Syriac, Ethiopian, and Serbian traditions are underrepresented
- The model inherits Qwen2.5's knowledge cutoff and general-world biases alongside patristic specialization
- Should not be used to generate authoritative theological statements presented as Church teaching
Limitations and Biases
- Corpus skew: The training data is ~98% Russian, drawn primarily from Azbyka.ru. Eastern Orthodox traditions with less digitized Russian-language presence are underrepresented.
- Era skew: Corpus emphasizes 19th–20th century Russian patristic reception and the Optina Elder tradition. Earlier Church Fathers (1st–7th century) are present but in smaller proportion relative to their theological centrality.
- CPT degradation: Continued pre-training on domain text can partially erode general instruction-following capability. The model may give shorter or less structured answers than the Qwen2.5-3B-Instruct base. This is being addressed through active loop SFT (see below).
- Not a spiritual director: This model should never be used as a substitute for a human priest, confessor, or elder in matters of pastoral care.
Development Roadmap
Paterikon-3B is Phase 1 of a three-phase training pipeline:
| Phase | Description | Status |
|---|---|---|
| Phase 1 — CPT | Domain pre-training on 116M patristic tokens | ✅ Complete (this model) |
| Phase 2 — Active Loop SFT | Uncertainty-guided synthetic Q&A generation via Qwen3.5-27B teacher; 3 iterations | 🔄 In Progress |
| Phase 2.5 — Liturgical CPT | Additional CPT on Holy Scripture (KJV + Russian Synodal), Menologion (~1200 lives of saints), Horologion, Octoechos, Typikon, Prayer Book | 🔄 Corpus Built |
| Phase 3 — Full SFT | Supervised fine-tuning on 98K curated Orthodox Q&A + active loop pairs | ⏳ Pending |
The fully instruction-tuned model will be released as Paterikon-3B-Instruct upon completion.
Model Card Author
Justin Fursov
- Email: justin0106@pm.me
- LinkedIn: linkedin.com/in/justinfursov
- HuggingFace: huggingface.co/jayfurzy
- GitHub: github.com/jayfurz
Citation
If you use this model in research or applications, please cite:
```bibtex
@misc{paterikon3b2026,
  title  = {Paterikon-3B: A Domain-Adapted Language Model for Orthodox Christian Patristics},
  author = {Justin Fursov},
  year   = {2026},
  url    = {https://huggingface.co/jayfurzy/paterikon-3b},
  note   = {Released on the Feast of the Triumph of Orthodoxy, 2026}
}
```
Please also cite the base model:
```bibtex
@misc{qwen2025qwen25,
  title  = {Qwen2.5 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://arxiv.org/abs/2412.15115}
}
```
Acknowledgements
- The Azbyka.ru Orthodox library for preserving and digitizing the patristic corpus
- The Christian Classics Ethereal Library (CCEL) for English patristic translations
- The Qwen team at Alibaba for releasing Qwen2.5-3B-Instruct under Apache 2.0
- The active loop methodology draws on arxiv:2512.00884
Сей день, егоже сотвори Господь, возрадуемся и возвеселимся в онь. "This is the day which the Lord has made; let us rejoice and be glad in it." — Psalm 118:24