KAT-2-33B-FT — Academic Tutor with DPO Alignment

Knight Academic Tutor (KAT) is a 33B-parameter language model fine-tuned with Direct Preference Optimization (DPO) for academic tutoring with enforced integrity, reaching 89.6% reward accuracy on the evaluation set.

Model Details

Property         Value
Architecture     Qwen2ForCausalLM + Abigail
Base Model       progga-ai/KAT-2-33B-BASE
Training Method  DPO (Direct Preference Optimization)
Precision        BF16
Context Length   32,768 tokens
Training Data    42,610 preference pairs

Training Configuration

  • Learning Rate: 5e-6
  • DPO Beta: 0.3
  • Epochs: 3 (best checkpoint at epoch 2.25)
  • LoRA Rank: 64, Alpha: 128
  • Effective Batch Size: 32
  • Max Sequence Length: 2048
  • Hardware: 2× NVIDIA B200 (Blackwell)
  • Training Time: 9 hours 31 minutes (3996 steps)
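To make the DPO Beta setting concrete, here is a minimal, self-contained sketch of the per-pair DPO objective with β = 0.3. This is an illustrative implementation of the standard DPO loss, not the project's actual training code; the function name and inputs (summed log-probabilities from the policy and frozen reference model) are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.3):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen logratio - rejected logratio)).

    Each argument is the summed token log-probability of a response under the
    policy or the frozen reference model. `beta` scales the implicit reward.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, margin
```

When the policy has shifted probability mass toward the chosen response relative to the reference, the margin is positive and the loss drops below log 2 (~0.693), the value at initialization when policy and reference agree.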

Evaluation Results

Metric                  Value
Eval Reward Accuracy    89.6% (vs. 69.0% for the base model)
Eval Loss               0.250
Eval Reward Margin      4.58
Improvement over base   +20.6 percentage points
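For readers unfamiliar with these DPO metrics, a short sketch of how they are typically computed from per-pair implicit rewards (β-scaled policy-vs-reference log-ratios). This mirrors the usual definitions; it is not the project's evaluation code, and the function names are illustrative.

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of preference pairs where the chosen response
    receives a higher implicit reward than the rejected one."""
    correct = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return correct / len(chosen_rewards)

def mean_reward_margin(chosen_rewards, rejected_rewards):
    """Mean implicit-reward gap (chosen minus rejected) across pairs."""
    gaps = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return sum(gaps) / len(gaps)
```

By these definitions, the table above says the model ranks the preferred response higher on 89.6% of held-out pairs, with an average gap of 4.58 between chosen and rejected implicit rewards.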

Key Behaviors

  1. Academic Integrity: Refuses to complete graded work; provides hints and guidance instead
  2. Socratic Tutoring: Asks students to attempt problems first before offering help
  3. Graduated Hints: Escalates from minimal hints to more detailed guidance based on student effort
  4. Misconception Diagnosis: Identifies and addresses specific conceptual gaps
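The graduated-hints behavior can be pictured as an escalation ladder keyed to student effort. The sketch below is purely hypothetical scaffolding to illustrate the idea; the model implements this behavior implicitly through its DPO training, not through explicit code, and every name here is invented.

```python
# Hypothetical hint ladder: more student attempts unlock more detail,
# but no level ever reveals a final answer.
HINT_LEVELS = [
    "Nudge: restate the problem and ask what is already known",
    "Concept: name the relevant technique or theorem",
    "Setup: outline the first step without carrying it out",
    "Worked guidance: walk through a parallel example, not this problem",
]

def next_hint(attempts_made: int) -> str:
    """Return a hint whose detail grows with the number of attempts,
    capped at the most detailed level."""
    level = min(attempts_made, len(HINT_LEVELS) - 1)
    return HINT_LEVELS[level]
```

A student who has made no attempt gets only a nudge; repeated genuine attempts earn progressively more structure, matching behaviors 2 and 3 above.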

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in BF16 and shard across available GPUs (a 33B model needs ~66 GB in BF16)
model = AutoModelForCausalLM.from_pretrained(
    "progga-ai/KAT-2-DPO-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("progga-ai/KAT-2-DPO-32B")

messages = [
    {"role": "system", "content": "You are KAT, an academic tutor. Help students learn without giving direct answers."},
    {"role": "user", "content": "Can you solve this integral for me? ∫x²eˣ dx"}
]

# Apply the chat template, then generate with sampling enabled
# (do_sample=True is required for temperature/top_p to take effect)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Part of the KAT Project

KAT is a verifiable, FERPA-compliant, fail-closed academic tutoring system built with governance-first architecture. The DPO alignment is one layer of a multi-layer integrity enforcement system.

  • Author: Preston Mills
  • Organization: Progga AI
  • Date: February 2026