---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
- ads
- distillation
- code
language:
- en
pipeline_tag: text-generation
base_model: []
---

# Kai-30B-Instruct

A 30B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, trained with our **ADS (Adaptive Dual-Search Distillation)** technique. Kai-30B-Instruct is the largest model in the Kai family.

## Model Details

| | |
|---|---|
| **Model** | Kai-30B-Instruct |
| **Architecture** | LlamaForCausalLM |
| **Parameters** | ~30B |
| **Hidden size** | 7168 |
| **Intermediate size** | 20480 |
| **Layers** | 60 |
| **Attention heads** | 56 (8 KV heads, GQA) |
| **Head dim** | 128 |
| **Context length** | 4096 |
| **Precision** | bfloat16 |
| **Vocab size** | 64,000 |
| **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |

## Benchmark Results (5-shot, acc_norm)

| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |



## What is ADS?

**Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding-space entropy, forcing the model to converge to high-confidence predictions at difficult reasoning points without modifying the model architecture.

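The control loop described above can be sketched as a dual-ascent update on a penalty weight. The following is a minimal, hypothetical illustration in plain Python; the class name, constants, update rule, and the use of output-distribution entropy as a stand-in for embedding-space entropy are all assumptions for exposition, not the released ADS implementation:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class AdaptiveDualPenalty:
    """Stateful dual penalty factor, updated by dual ascent from the
    entropy of the model's predictions. Illustrative sketch only."""
    def __init__(self, target_entropy=0.5, step_size=0.1):
        self.lam = 0.0                # dual variable (penalty weight)
        self.target = target_entropy  # entropy constraint threshold
        self.step_size = step_size

    def update(self, batch_probs):
        # Average entropy over the batch of token distributions
        h = sum(entropy(p) for p in batch_probs) / len(batch_probs)
        # Dual ascent: raise the penalty while the entropy constraint
        # is violated; relax it (down to 0) once entropy falls below
        self.lam = max(0.0, self.lam + self.step_size * (h - self.target))
        return self.lam, h

def ads_loss(ce_loss, batch_probs, dual):
    """Cross-entropy plus an entropy penalty weighted by the adaptive
    dual factor, pushing the model toward high-confidence predictions."""
    lam, h = dual.update(batch_probs)
    return ce_loss + lam * h
```

In real training the penalty term would act on GPU tensors inside the loss graph; this sketch shows only the stateful penalty schedule that makes the loss "adaptive."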
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{noesislab2026kai30b,
  title={Kai-30B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```

## License

Apache 2.0