# Kai-3B-Distill-Claude-Opus-4.5

A 3B-parameter language model distilled from Claude Opus 4.5, optimized for reasoning, math, and code generation, trained with our ADS (Adaptive Dual-Search Distillation) technique.
## Model Details
| | |
|---|---|
| **Model** | Kai-3B-Distill-Claude-Opus-4.5 |
| **Architecture** | SmolLM3ForCausalLM |
| **Base model** | HuggingFaceTB/SmolLM3-3B |
| **Teacher model** | Claude Opus 4.5 |
| **Training data** | TeichAI/claude-4.5-opus-high-reasoning-250x |
| **Parameters** | 3B |
| **Hidden size** | 2048 |
| **Intermediate size** | 11008 |
| **Layers** | 36 |
| **Attention heads** | 16 (4 KV heads, GQA) |
| **Context length** | 65,536 tokens |
| **Precision** | bfloat16 |
| **Vocab size** | 128,256 |
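As a sanity check, the dimensions in the table roughly reproduce the stated 3B parameter count. The sketch below is a back-of-envelope estimate assuming tied embeddings, SwiGLU MLPs, and GQA projections, with norms and biases omitted:

```python
# Rough parameter estimate from the config values in the table above.
hidden, inter, layers, vocab = 2048, 11008, 36, 128256
heads, kv_heads = 16, 4
head_dim = hidden // heads  # 128

# Attention: Q and output projections are square; K/V are GQA-shrunk.
attn = hidden * hidden + 2 * hidden * (kv_heads * head_dim) + hidden * hidden
# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
per_layer = attn + mlp

total = layers * per_layer + vocab * hidden  # + tied embedding matrix
print(f"{total / 1e9:.2f}B")  # ~3.07B, consistent with the 3B label
```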
## What is ADS?
Adaptive Dual-Search Distillation treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy — forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
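The exact ADS loss is not published here, but the mechanism described above (a dual penalty that grows when predictive uncertainty is high, pushing the model toward confident predictions) can be sketched as constrained optimization with dual ascent. Everything below is a hypothetical illustration: the function name `ads_loss_step` and the `entropy_budget` / `dual_lr` parameters are invented for this sketch, not the actual training code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ads_loss_step(logits, targets, dual, entropy_budget=1.0, dual_lr=0.1):
    """One training step of a hypothetical entropy-constrained loss.

    Constraint: mean predictive entropy <= entropy_budget.
    The stateful dual variable is raised by dual ascent when the model
    is too uncertain, and relaxed (toward zero) when it is confident.
    """
    probs = softmax(logits)                               # (T, V)
    idx = np.arange(len(targets))
    ce = -np.log(probs[idx, targets] + 1e-12)             # per-token cross-entropy
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    violation = entropy.mean() - entropy_budget           # > 0 means too uncertain
    loss = ce.mean() + dual * max(violation, 0.0)         # penalized objective
    new_dual = max(dual + dual_lr * violation, 0.0)       # projected dual ascent
    return loss, new_dual
```

In this toy form, uniform (maximally uncertain) logits violate the entropy budget and drive the dual penalty up, while near-one-hot logits satisfy it and let the penalty decay.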
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-3B-Distill-Claude-Opus-4.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-3B-Distill-Claude-Opus-4.5")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Citation

```bibtex
@misc{noesislab2026kai3b,
  title={Kai-3B-Distill-Claude-Opus-4.5},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-3B-Distill-Claude-Opus-4.5}
}
```
## License

Apache 2.0