# KAT-2-33B-AWQ

AWQ 4-bit quantized version of `prestonpai/KAT-2-33B-FT`, the Knight Academic Tutor model.
## Quantization Details
| Parameter | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Bits | 4 |
| Group Size | 128 |
| Version | GEMM |
| Zero Point | Yes |
| Calibration | WikiText-2 (128 samples, 512 max length) |
| Original Size | 65.5 GB (BF16) |
| Quantized Size | 19.34 GB |
| Compression | 3.39x |
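The compression numbers in the table can be sanity-checked from the quantization parameters. A minimal sketch (the per-group overhead of one FP16 scale plus one packed 4-bit zero point is an assumption about the AWQ GEMM layout, not a measured value):

```python
# Effective storage cost per weight for 4-bit AWQ with group size 128:
# each group of 128 weights stores one FP16 scale (16 bits) and one
# 4-bit zero point alongside the 4-bit quantized values (assumed layout).
bits, group_size = 4, 128
bits_per_weight = bits + (16 + 4) / group_size
print(f"{bits_per_weight:.3f} bits/weight")  # ~4.156, vs 16 for BF16

# Overall compression reported in the table: 65.5 GB -> 19.34 GB
ratio = 65.5 / 19.34
print(f"{ratio:.2f}x")  # ~3.39x, below the ideal 16/4.156 because some
# tensors (embeddings, norms) stay unquantized
```

The gap between the ideal ~3.85x and the observed 3.39x is expected: AWQ quantizes only the linear layers, leaving embeddings and other tensors at full precision.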
## Base Model
The base model, `prestonpai/KAT-2-33B-FT`, is Qwen2.5-Coder-32B-Instruct fine-tuned with DPO (Direct Preference Optimization) for academic tutoring:
- Training: 3,996 steps DPO on 13,280 preference pairs
- Best checkpoint: Step 3000 (89.6% eval accuracy, +20.6pp over base)
- Specialization: Socratic tutoring, hint-based guidance, academic integrity enforcement
## Usage

### With AutoAWQ
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_quantized(
    "prestonpai/KAT-2-33B-AWQ",
    fuse_layers=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prestonpai/KAT-2-33B-AWQ")

messages = [{"role": "user", "content": "Help me understand integration by parts"}]
# add_generation_prompt=True appends the assistant turn marker so the
# model generates a reply rather than continuing the user message
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="prestonpai/KAT-2-33B-AWQ", quantization="awq")
outputs = llm.generate("Help me understand derivatives", SamplingParams(max_tokens=512))
print(outputs[0].outputs[0].text)
```
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Transformers detects the AWQ config in the checkpoint automatically
# (requires the autoawq package to be installed)
model = AutoModelForCausalLM.from_pretrained(
    "prestonpai/KAT-2-33B-AWQ",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prestonpai/KAT-2-33B-AWQ")
```
## Hardware Requirements
| Setup | VRAM Needed | Notes |
|---|---|---|
| Single GPU | ~20 GB | RTX 4090, A5000, L40S |
| Dual GPU | ~10 GB each | RTX 3090 x2, etc. |
| CPU offload | 32+ GB RAM | Slower but works |
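The ~20 GB single-GPU figure is roughly quantized weights plus KV cache. The attention geometry below (64 layers, 8 KV heads, head dim 128) is an assumption based on the Qwen2.5-32B architecture and should be checked against the model's `config.json`:

```python
weights_gb = 19.34  # quantized checkpoint size from the table above

# Assumed Qwen2.5-32B attention geometry (verify against config.json)
layers, kv_heads, head_dim = 64, 8, 128
bytes_fp16 = 2

# KV cache: keys + values, per layer, per KV head, per token
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16
kv_gib_4k = kv_bytes_per_token * 4096 / 1024**3  # 4k-token context

print(f"{kv_bytes_per_token} B/token, ~{kv_gib_4k:.1f} GiB at 4k context")
# weights (~19.3 GB) + KV cache (~1 GiB) + activations lands near 20 GB
```

Longer contexts grow the KV cache linearly, which is why the single-GPU setup gets tight well before the nominal 24 GB of an RTX 4090.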
## Performance

Quantized from DPO checkpoint-3000:
| Metric | Base (Qwen2.5-32B) | DPO (BF16) | AWQ (4-bit) |
|---|---|---|---|
| DPO Eval Accuracy | 69% | 89.6% | TBD |
| Model Size | 65.5 GB | 65.5 GB | 19.34 GB |
## Author
Preston Mills — Progga AI