# KAT-2-33B-FT — Academic Tutor with DPO Alignment
Knight Academic Tutor (KAT) is a 33B-parameter language model fine-tuned with Direct Preference Optimization (DPO) for academic tutoring with enforced academic integrity, reaching 89.6% eval reward accuracy.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen2ForCausalLM + Abigail |
| Base Model | progga-ai/KAT-2-33B-BASE |
| Training Method | DPO (Direct Preference Optimization) |
| Precision | BF16 |
| Context Length | 32,768 tokens |
| Training Data | 42,610 preference pairs |
## Training Configuration
- Learning Rate: 5e-6
- DPO Beta: 0.3
- Epochs: 3 (best checkpoint at epoch 2.25)
- LoRA Rank: 64, Alpha: 128
- Effective Batch Size: 32
- Max Sequence Length: 2048
- Hardware: 2× NVIDIA B200 (Blackwell)
- Training Time: 9 hours 31 minutes (3996 steps)
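The objective behind these settings is the standard DPO loss: the negative log-sigmoid of the beta-scaled difference between the policy-vs-reference log-ratios of the chosen and rejected responses. The sketch below is illustrative only (it is not the training code) and uses the beta of 0.3 from the configuration above:

```python
import math

def dpo_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.3) -> float:
    """DPO loss for a single preference pair.

    chosen_logratio   = log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = log pi(y_l|x) - log pi_ref(y_l|x)
    beta controls how strongly the policy is pulled away from the reference.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# With no preference signal the loss sits at log(2); it falls as the
# policy assigns relatively more probability to the chosen response.
print(dpo_loss(0.0, 0.0))   # log(2)
print(dpo_loss(2.0, -2.0))  # smaller: policy prefers the chosen response
```

In practice this is what libraries such as TRL's `DPOTrainer` compute per pair, averaged over the batch.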
## Evaluation Results
| Metric | Value |
|---|---|
| Eval Reward Accuracy | 89.6% (vs 69% base) |
| Eval Loss | 0.250 |
| Eval Reward Margin | 4.58 |
| Improvement over base | +20.6 percentage points |
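For reference, "reward accuracy" is the fraction of evaluation pairs where the implicit DPO reward (beta times the policy-vs-reference log-ratio) is higher for the chosen response than for the rejected one, and the reward margin is the mean gap between the two. A minimal sketch on toy log-ratios, assuming the beta of 0.3 used in training:

```python
def reward_stats(chosen_logratios, rejected_logratios, beta=0.3):
    """Return (reward accuracy, mean reward margin) over preference pairs.

    Implicit reward per response = beta * (log pi - log pi_ref).
    """
    rewards_c = [beta * c for c in chosen_logratios]
    rewards_r = [beta * r for r in rejected_logratios]
    n = len(rewards_c)
    accuracy = sum(c > r for c, r in zip(rewards_c, rewards_r)) / n
    margin = sum(c - r for c, r in zip(rewards_c, rewards_r)) / n
    return accuracy, margin

# Toy example: the model ranks 2 of 3 pairs correctly.
acc, margin = reward_stats([2.0, 3.0, -1.0], [0.0, 1.0, 0.0])
print(acc, margin)  # 0.666..., 0.3
```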
## Key Behaviors
- Academic Integrity: Refuses to complete graded work; provides hints and guidance instead
- Socratic Tutoring: Asks students to attempt problems first before offering help
- Graduated Hints: Escalates from minimal hints to more detailed guidance based on student effort
- Misconception Diagnosis: Identifies and addresses specific conceptual gaps
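The graduated-hint behavior can be pictured as a simple escalation policy: each genuine student attempt unlocks a more detailed hint, and a full solution is never among the options. This is a hypothetical sketch of that policy, not part of the model or its API; `HINT_LEVELS` and `next_hint` are illustrative names:

```python
# Hint tiers from least to most detailed; none is a completed solution.
HINT_LEVELS = [
    "Ask the student to attempt the problem first.",
    "Name the relevant concept or technique.",
    "Outline the first step without completing it.",
    "Walk through the solution structure, leaving the computations to the student.",
]

def next_hint(attempts_shown: int) -> str:
    """Escalate hint detail with each student attempt, capped at the
    most detailed tier. The cap is what keeps the policy fail-closed:
    no amount of asking yields a finished answer."""
    level = min(attempts_shown, len(HINT_LEVELS) - 1)
    return HINT_LEVELS[level]
```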
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "progga-ai/KAT-2-DPO-32B",
    torch_dtype=torch.bfloat16,  # model weights are BF16
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("progga-ai/KAT-2-DPO-32B")

messages = [
    {"role": "system", "content": "You are KAT, an academic tutor. Help students learn without giving direct answers."},
    {"role": "user", "content": "Can you solve this integral for me? ∫x²eˣ dx"},
]

# Render the chat template, then generate with sampling enabled.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
## Part of the KAT Project
KAT is a verifiable, FERPA-compliant, fail-closed academic tutoring system built on a governance-first architecture. The DPO alignment is one layer of a multi-layer integrity enforcement system.
- Author: Preston Mills
- Organization: Progga AI
- Date: February 2026