# KAT-2-33B-AWQ

AWQ 4-bit quantized version of prestonpai/KAT-2-33B-FT — the Knight Academic Tutor model.

## Quantization Details

| Parameter | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Bits | 4 |
| Group Size | 128 |
| Version | GEMM |
| Zero Point | Yes |
| Calibration | WikiText-2 (128 samples, 512 max length) |
| Original Size | 65.5 GB (BF16) |
| Quantized Size | 19.34 GB |
| Compression | 3.39x |
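The compression factor follows directly from the two sizes above. A quick sanity check (our arithmetic, not from the card; the interpretation of the gap below 4x is our reading):

```python
# Compression ratio implied by the reported checkpoint sizes.
original_gb = 65.5    # BF16 checkpoint
quantized_gb = 19.34  # AWQ 4-bit checkpoint

ratio = original_gb / quantized_gb
print(f"{ratio:.2f}x")  # 3.39x, matching the table

# A pure 16-bit -> 4-bit cast would give exactly 4x; the shortfall is the
# per-group quantization metadata (scales/zeros) and any tensors kept in
# higher precision.
```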

## Base Model

KAT-2-33B-AWQ is a Qwen2.5-Coder-32B-Instruct model fine-tuned with DPO (Direct Preference Optimization) for academic tutoring:

- **Training:** 3,996 steps of DPO on 13,280 preference pairs
- **Best checkpoint:** step 3000 (89.6% eval accuracy, +20.6pp over base)
- **Specialization:** Socratic tutoring, hint-based guidance, academic integrity enforcement

## Usage

### With AutoAWQ

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_quantized(
    "prestonpai/KAT-2-33B-AWQ",
    fuse_layers=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prestonpai/KAT-2-33B-AWQ")

messages = [{"role": "user", "content": "Help me understand integration by parts"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn so the model replies
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="prestonpai/KAT-2-33B-AWQ", quantization="awq")
outputs = llm.generate("Help me understand derivatives", SamplingParams(max_tokens=512))
print(outputs[0].outputs[0].text)
```

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires the autoawq package; Transformers picks up the AWQ config
# from the checkpoint automatically.
model = AutoModelForCausalLM.from_pretrained(
    "prestonpai/KAT-2-33B-AWQ",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prestonpai/KAT-2-33B-AWQ")
```

## Hardware Requirements

| Setup | VRAM Needed | Notes |
|---|---|---|
| Single GPU | ~20 GB | RTX 4090, A5000, L40S |
| Dual GPU | ~10 GB each | RTX 3090 x2, etc. |
| CPU offload | 32+ GB RAM | Slower but works |
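A back-of-envelope check on the single-GPU figure (our estimate, not from the card): 33B parameters at 4 bits each, plus per-group scale metadata, is roughly 17 GB of weights, which leaves headroom within a ~20 GB budget for the KV cache and activations:

```python
params = 33e9     # parameter count
group_size = 128  # from the quantization config above

weights_gb = params * 4 / 8 / 1e9        # packed 4-bit weights
meta_gb = params / group_size * 2 / 1e9  # one fp16 scale per group (zeros add a bit more)
total_gb = weights_gb + meta_gb
print(f"~{total_gb:.1f} GB of weights")  # ~17.0 GB
```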

## Performance

Quantized from DPO checkpoint-3000:

| Metric | Base (Qwen2.5-32B) | DPO (BF16) | AWQ (4-bit) |
|---|---|---|---|
| DPO Eval Accuracy | 69% | 89.6% | TBD |
| Model Size | 65.5 GB | 65.5 GB | 19.34 GB |
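The "+20.6pp over base" figure quoted in the Base Model section is simply the gap between the two accuracy columns above:

```python
base_acc, dpo_acc = 69.0, 89.6  # percentages from the table above
print(f"+{dpo_acc - base_acc:.1f}pp")  # +20.6pp
```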

## Author

Preston Mills — Progga AI
