---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
- ads
- distillation
- code
language:
- en
pipeline_tag: text-generation
base_model: []
---
# Kai-30B-Instruct
A 30B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, powered by our **ADS (Adaptive Dual-Search Distillation)** technique. The largest model in the Kai family.
## Model Details
| | |
|---|---|
| **Model** | Kai-30B-Instruct |
| **Architecture** | LlamaForCausalLM |
| **Parameters** | ~30B |
| **Hidden size** | 7168 |
| **Intermediate size** | 20480 |
| **Layers** | 60 |
| **Attention heads** | 56 (8 KV heads, GQA) |
| **Head dim** | 128 |
| **Context length** | 4096 |
| **Precision** | bfloat16 |
| **Vocab size** | 64,000 |
| **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
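With grouped-query attention, 56 query heads share 8 KV heads (a group size of 7), so the per-token KV cache is 7× smaller than full multi-head attention would need. A quick back-of-the-envelope check from the configuration above:

```python
# KV-cache size per token, from the config table (bf16 = 2 bytes per value)
layers, q_heads, kv_heads, head_dim, bytes_per_val = 60, 56, 8, 128, 2

# Factor of 2 accounts for the separate K and V tensors
gqa_cache_per_token = 2 * layers * kv_heads * head_dim * bytes_per_val
mha_cache_per_token = 2 * layers * q_heads * head_dim * bytes_per_val

print(gqa_cache_per_token)                        # → 245760 (240 KiB/token)
print(mha_cache_per_token // gqa_cache_per_token)  # → 7
```

At the full 4096-token context, that works out to roughly 0.94 GiB of KV cache per sequence instead of about 6.6 GiB.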
## Benchmark Results (5-shot, acc_norm)
| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |

## What is ADS?
**Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem, drawing on techniques from operations research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding-space entropy, pushing the model toward high-confidence predictions at difficult reasoning points without modifying the model architecture.
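The exact ADS objective is not published in this card. As a rough illustration of the general idea only (dual ascent on an entropy constraint added to the task loss), a minimal NumPy sketch might look like the following; every name and update rule here is an assumption for exposition, not the released training code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs):
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

class DualPenalty:
    """Illustrative stateful dual penalty: lambda grows while mean
    prediction entropy violates the constraint H(p) <= h_target."""
    def __init__(self, h_target=0.5, lr=0.1):
        self.h_target = h_target
        self.lr = lr
        self.lam = 0.0  # the stateful dual penalty factor

    def step(self, logits, targets):
        probs = softmax(logits)
        h = entropy(probs).mean()
        # Primal loss: cross-entropy plus the entropy-violation penalty
        ce = -np.log(probs[np.arange(len(targets)), targets] + 1e-12).mean()
        loss = ce + self.lam * max(h - self.h_target, 0.0)
        # Dual update: raise lambda while the model is still uncertain
        self.lam = max(self.lam + self.lr * (h - self.h_target), 0.0)
        return loss

penalty = DualPenalty(h_target=0.5)
logits = np.array([[2.0, 0.1, 0.1], [0.3, 0.2, 0.1]])
loss = penalty.step(logits, np.array([0, 1]))
```

After a step on high-entropy logits like these, `penalty.lam` becomes positive, so subsequent steps penalize uncertain predictions more heavily, which is the "adaptive" behavior the description above refers to.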
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
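For reference, `apply_chat_template` renders the conversation in the ChatML convention listed above. Assuming the standard ChatML layout (the tokenizer's bundled chat template is the source of truth), the prompt built in the snippet should look roughly like this:

```python
# Illustrative ChatML rendering; the repo's chat template is authoritative
messages = [{"role": "user", "content": "What is 25 * 4?"}]
prompt = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)
prompt += "<|im_start|>assistant\n"  # what add_generation_prompt=True appends
print(prompt)
```

The model then generates until it emits `<|im_end|>`.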
## Citation
```bibtex
@misc{noesislab2026kai30b,
  title={Kai-30B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```
## License
Apache 2.0