
Paterikon-3B

Released on the Feast of the Triumph of Orthodoxy, First Sunday of Great Lent, 2026.


Overview

Paterikon-3B is a domain-adapted language model for Orthodox Christian theology, produced by continued pre-training (CPT) of Qwen2.5-3B-Instruct on a 116M-token corpus of Church Father writings, lives of saints, and theological texts drawn primarily from the Russian Orthodox tradition.

The model has absorbed the voice and vocabulary of patristic literature — the cadence of St. John Chrysostom, the precision of St. Basil the Great, the mystical theology of St. Gregory Palamas, the ascetic teaching of the Philokalia and the Optina Elders. It is intended as a foundation for downstream instruction-tuning on Orthodox theological Q&A.

Note: This is the CPT (pre-training) checkpoint, not a full instruction-tuned model. It excels at patristic text continuation and domain fluency. A supervised fine-tuned (SFT) version trained on Q&A pairs is in active development.

| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Training | Full fine-tune (continued pre-training on raw text) |
| Parameters | 3.09 billion |
| Languages | Russian (primary), English, Greek/Latin (patristic excerpts) |
| Domain | Orthodox Christian patristic theology |
| Training tokens | ~116M |
| Training corpus | orthodox-patristic-corpus |
| License | Apache 2.0 |

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jayfurzy/paterikon-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Instruction-style (uses the base Qwen chat template).
# System: "You are an Orthodox theologian answering in the spirit of the Holy Fathers."
# User:   "Explain St. Gregory Palamas's teaching on the light of Tabor."
messages = [
    {"role": "system", "content": "Ты — православный богослов, отвечающий на вопросы в духе святых отцов."},
    {"role": "user", "content": "Объясни учение святителя Григория Паламы о Фаворском свете."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

# English prompts work as well; reuse the same template/generation steps above:
messages = [
    {"role": "system", "content": "You are an Orthodox Christian theologian, responding in the spirit of the Holy Fathers."},
    {"role": "user", "content": "What is the teaching of St. Gregory Palamas on the divine energies?"},
]
```
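Because this is a CPT checkpoint, raw text continuation (no chat template) is its strongest mode. A minimal sketch of sampling settings for patristic continuation; the specific values here are suggestions, not tuned recommendations:

```python
from transformers import GenerationConfig

# Suggested sampling settings for raw patristic continuation (illustrative values).
gen_config = GenerationConfig(
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)

# With `model` and `tokenizer` loaded as in the Quick Start snippet:
# prompt = "Возлюбленные братия, что есть молитва Иисусова?"
#          # "Beloved brethren, what is the Jesus Prayer?"
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, generation_config=gen_config)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```

A mild `repetition_penalty` helps counter the repetitive cadences that domain-adapted checkpoints sometimes fall into during long continuations.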

Model Details

Training Approach

This model was trained using full continued pre-training — all 3.09B parameters were updated, not just a low-rank adapter. This allows deeper domain absorption than QLoRA or LoRA-based approaches.

We deliberately chose a smaller, fully fine-tuned model (3B) over a larger LoRA-adapted model (7B) because empirical results showed that full-weight adaptation on the patristic domain gave lower perplexity and more authentic voice reproduction than partial adaptation at larger scale.

| Approach evaluated | CPT loss | Notes |
|---|---|---|
| Qwen2.5-7B QLoRA (rank 32) | ~1.70 (projected) | Only 1% of weights updated |
| Qwen2.5-3B full fine-tune | 1.47 | All weights updated (selected) |
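Cross-entropy losses like these translate directly to perplexity via ppl = exp(loss), so the selected full fine-tune reaches roughly exp(1.47) ≈ 4.3 versus roughly exp(1.70) ≈ 5.5 for the projected QLoRA run. A one-line sketch of that conversion:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(loss)

print(perplexity(1.47))  # ≈ 4.35 — 3B full fine-tune
print(perplexity(1.70))  # ≈ 5.47 — 7B QLoRA (projected)
```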

Training Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Training type | Full fine-tune (continued pre-training) |
| Sequence length | 1,792 tokens |
| Batch size | 1 per device |
| Gradient accumulation | 16 steps (effective batch = 16) |
| Learning rate | 5e-5 |
| LR schedule | Cosine with 1% warmup |
| Optimizer | Adafactor |
| Precision | bfloat16 |
| Attention | SDPA (scaled dot-product attention) |
| Gradient checkpointing | Yes |
| Epochs | 1 |
| Training steps | 6,799 |
| Hardware | 1× NVIDIA RTX 3090 (24 GB VRAM) |
| Training time | ~22 hours |
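The "cosine with 1% warmup" schedule means the learning rate ramps linearly to 5e-5 over the first ~68 steps, then decays along a cosine curve to zero by step 6,799. A minimal sketch of the standard formula (the exact trainer implementation may differ slightly):

```python
import math

def lr_at(step, total_steps=6799, base_lr=5e-5, warmup_ratio=0.01):
    """Cosine decay with linear warmup, per the training configuration above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup window.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))     # 0.0 — start of warmup
print(lr_at(67))    # 5e-5 — peak, right after warmup
print(lr_at(6799))  # 0.0 — fully decayed
```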

Training Results

| Metric | Value |
|---|---|
| Final train loss | 0.459 |
| Final step loss | ~1.47 |
| Token accuracy (final epoch) | ~65.8% |

The ~65.8% token accuracy on this domain is meaningful: it indicates the model has substantially absorbed the distribution of patristic language. For comparison, an unadapted model scores far lower on this text; the base Qwen2.5-3B reached roughly 55–58% before CPT.
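Token accuracy here means the fraction of positions where the model's argmax prediction matches the actual next token. A minimal sketch of that metric on toy tensors (for a real causal-LM evaluation, shift logits and labels by one position first):

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of positions where argmax(logits) equals the target token.
    logits: (seq_len, vocab_size), labels: (seq_len,)."""
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()

# Toy check: 3-token sequence over a 4-word vocabulary.
logits = torch.tensor([[2.0, 0.1, 0.0, 0.0],   # predicts token 0
                       [0.0, 3.0, 0.0, 0.0],   # predicts token 1
                       [0.0, 0.0, 0.0, 1.0]])  # predicts token 3
labels = torch.tensor([0, 1, 2])
print(token_accuracy(logits, labels))  # 2 of 3 correct → ≈ 0.667
```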


Training Data

Paterikon-3B was trained on the Orthodox Patristic Corpus, a 116M-token collection assembled from:

  • 786,000 patristic text passages organized by theological principle, drawn from 123 authors
  • 7 full-length patristic works (~3M tokens)
  • 55 curated topical corpora (~2M tokens)

Primary sources include writings of the Holy Fathers from the first through twentieth centuries, crawled and structured from the Azbyka.ru Orthodox library, the Christian Classics Ethereal Library (CCEL), and other public-domain Orthodox text collections.

Key authors represented:

St. John Chrysostom · St. Basil the Great · St. Gregory the Theologian · St. Gregory Palamas · St. Athanasius the Great · St. Cyril of Alexandria · St. John of Damascus · St. Maximus the Confessor · St. Symeon the New Theologian · St. Theophan the Recluse · St. Ignatius Brianchaninov · St. Paisios Velichkovsky · The Optina Elders · St. Paisios the Athonite · St. Nicholas of Serbia · St. Silouan the Athonite · and 100+ more

Language distribution:

  • Russian: ~98% (Synodal-era and contemporary Orthodox Russian)
  • English: ~2% (CCEL translations of patristic texts)
  • Greek/Latin: minimal (brief patristic excerpts and citations)

Qualitative Comparison

The difference in register between the CPT model and the Qwen3.5-27B teacher model used in downstream training:

Question: Explain the theology of St. Gregory Palamas on the distinction between divine essence and energies.

| Model | Response style |
|---|---|
| Paterikon-3B | Speaks from within the tradition: patristic cadences, "my child"-style pastoral address, cites hesychast experience as the primary locus |
| Qwen3.5-27B (base) | Academic, encyclopedic register: accurate but external, cites sources analytically |

This voice quality is precisely the purpose of domain CPT before instruction tuning: the model acquires the manner of speaking of the tradition, not merely facts about it.


Intended Use

Appropriate use cases:

  • Foundation model for Orthodox theological assistants and chatbots
  • Theological text completion and generation research
  • Multilingual (Russian/English) patristic NLP research
  • Building instruction-tuned models for catechism, spiritual reading assistance, theological Q&A
  • Orthodox AI research exploring the application of language models to Christian tradition

Out of scope / limitations:

  • This is a CPT checkpoint, not an instruction-tuned model. It requires further SFT for robust Q&A behavior
  • Not suitable for pastoral or spiritual direction in place of a human priest or elder
  • The corpus is heavily Russian Orthodox — Coptic, Syriac, Ethiopian, and Serbian traditions are underrepresented
  • The model inherits Qwen2.5's knowledge cutoff and general-world biases alongside patristic specialization
  • Should not be used to generate authoritative theological statements presented as Church teaching

Limitations and Biases

  • Corpus skew: The training data is ~98% Russian, drawn primarily from Azbyka.ru. Eastern Orthodox traditions with less digitized Russian-language presence are underrepresented.
  • Era skew: Corpus emphasizes 19th–20th century Russian patristic reception and the Optina Elder tradition. Earlier Church Fathers (1st–7th century) are present but in smaller proportion relative to their theological centrality.
  • CPT degradation: Continued pre-training on domain text can partially erode general instruction-following capability. The model may give shorter or less structured answers than the Qwen2.5-3B-Instruct base. This is being addressed through active loop SFT (see below).
  • Not a spiritual director: This model should never be used as a substitute for a human priest, confessor, or elder in matters of pastoral care.

Development Roadmap

Paterikon-3B is Phase 1 of a three-phase training pipeline:

| Phase | Description | Status |
|---|---|---|
| Phase 1 — CPT | Domain pre-training on 116M patristic tokens | ✅ Complete (this model) |
| Phase 2 — Active Loop SFT | Uncertainty-guided synthetic Q&A generation via a Qwen3.5-27B teacher; 3 iterations | 🔄 In progress |
| Phase 2.5 — Liturgical CPT | Additional CPT on Holy Scripture (KJV + Russian Synodal), Menologion (~1,200 lives of saints), Horologion, Octoechos, Typikon, Prayer Book | 🔄 Corpus built |
| Phase 3 — Full SFT | Supervised fine-tuning on 98K curated Orthodox Q&A + active-loop pairs | ⏳ Pending |

The fully instruction-tuned model will be released as Paterikon-3B-Instruct upon completion.


Model Card Author

Justin Fursov


Citation

If you use this model in research or applications, please cite:

@misc{paterikon3b2026,
  title  = {Paterikon-3B: A Domain-Adapted Language Model for Orthodox Christian Patristics},
  author = {Justin Fursov},
  year   = {2026},
  url    = {https://huggingface.co/jayfurzy/paterikon-3b},
  note   = {Released on the Feast of the Triumph of Orthodoxy, 2026}
}

Please also cite the base model:

@misc{qwen2025qwen25,
  title  = {Qwen2.5 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://arxiv.org/abs/2412.15115}
}

Acknowledgements


Сей день, егоже сотвори Господь, возрадуемся и возвеселимся в онь. "This is the day which the Lord has made; let us rejoice and be glad in it." — Psalm 118:24
