Kai-3B-Distill-Claude-Opus-4.5

A 3B-parameter language model distilled from Claude Opus 4.5 and optimized for reasoning, math, and code generation, trained with our ADS (Adaptive Dual-Search Distillation) technique.

Model Details

Model: Kai-3B-Distill-Claude-Opus-4.5
Architecture: SmolLM3ForCausalLM
Base model: HuggingFaceTB/SmolLM3-3B
Teacher model: Claude Opus 4.5
Training data: TeichAI/claude-4.5-opus-high-reasoning-250x
Parameters: 3B
Hidden size: 2048
Intermediate size: 11008
Layers: 36
Attention heads: 16 (4 KV heads, GQA)
Context length: 65,536
Precision: bfloat16
Vocab size: 128,256

What is ADS?

Adaptive Dual-Search Distillation treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy — forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
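ADS is not described in further detail here, but the stated mechanism — a stateful dual penalty factor driven by prediction entropy, in the spirit of Lagrangian dual ascent — can be sketched roughly as follows. This is a minimal illustrative assumption, not the actual implementation: `DualPenalty`, `ads_loss`, and all constants are hypothetical names.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

class DualPenalty:
    """Stateful dual factor, updated by dual ascent on an entropy constraint.

    Hypothetical sketch: the real ADS update rule is unpublished.
    """
    def __init__(self, step_size=0.1, entropy_target=0.5):
        self.lam = 0.0                   # dual penalty factor, kept >= 0
        self.step_size = step_size
        self.entropy_target = entropy_target

    def update(self, h):
        # Raise the penalty when entropy exceeds the target (the model is
        # uncertain at a difficult reasoning point); relax it when the
        # model is already confident.
        self.lam = max(0.0, self.lam + self.step_size * (h - self.entropy_target))
        return self.lam

def ads_loss(ce_loss, probs, dual):
    """Cross-entropy plus an adaptively weighted entropy penalty."""
    h = entropy(probs)
    return ce_loss + dual.update(h) * h
```

Under this reading, the dual factor grows wherever the student stays uncertain across steps, increasingly penalizing high-entropy predictions there until the student commits to confident outputs — all in the loss function, with no change to the model architecture.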

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-3B-Distill-Claude-Opus-4.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; places the model on GPU if available
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-3B-Distill-Claude-Opus-4.5")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header before generating
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Citation

@misc{noesislab2026kai3b,
  title={Kai-3B-Distill-Claude-Opus-4.5},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-3B-Distill-Claude-Opus-4.5}
}

License

Apache 2.0
