---
library_name: transformers
license: apache-2.0
tags:
  - math
  - reasoning
  - text-generation
  - ads
  - distillation
  - code
language:
  - en
pipeline_tag: text-generation
base_model: []
---

# Kai-30B-Instruct

A 30B-parameter instruction-tuned language model for reasoning, math, and code generation, trained with our **ADS (Adaptive Dual-Search Distillation)** technique. It is the largest model in the Kai family.

## Model Details

| | |
|---|---|
| **Model** | Kai-30B-Instruct |
| **Architecture** | LlamaForCausalLM |
| **Parameters** | ~30B |
| **Hidden size** | 7168 |
| **Intermediate size** | 20480 |
| **Layers** | 60 |
| **Attention heads** | 56 (8 KV heads, GQA) |
| **Head dim** | 128 |
| **Context length** | 4096 |
| **Precision** | bfloat16 |
| **Vocab size** | 64,000 |
| **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
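
For reference, a ChatML-formatted prompt rendered by `tokenizer.apply_chat_template` with `add_generation_prompt=True` looks like the following (the system message shown here is illustrative, not a documented default of this model):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 25 * 4?<|im_end|>
<|im_start|>assistant
```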

## Benchmark Results (5-shot, acc_norm)

| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |

![Benchmark Comparison](model_comparison_aaai.png)

## What is ADS?

**Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem inspired by operations research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts to embedding-space entropy, pushing the model toward high-confidence predictions at difficult reasoning points without modifying the model architecture.
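
The exact ADS formulation is not published here; the sketch below only illustrates the general shape described above — a cross-entropy objective plus an entropy penalty whose dual multiplier is updated across steps. The names `ads_loss`, `eta` (dual step size), and `tau` (target entropy) are assumptions, not the actual training code:

```python
import torch
import torch.nn.functional as F

def ads_loss(logits, labels, lam, eta=0.01, tau=2.0):
    """Illustrative sketch of an ADS-style dynamic loss (not the real implementation).

    lam: stateful dual penalty factor, carried across optimizer steps.
    eta: dual-ascent step size (assumed hyperparameter).
    tau: target entropy below which no penalty grows (assumed hyperparameter).
    """
    # Standard token-level cross-entropy over the vocabulary.
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

    # Mean predictive entropy of the output distribution.
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()

    # Penalize high entropy: drives the model toward confident predictions.
    loss = ce + lam * entropy

    # Dual update: increase the penalty when entropy exceeds the target.
    new_lam = max(0.0, lam + eta * (entropy.item() - tau))
    return loss, new_lam
```

The dual-ascent step makes the penalty self-tuning: it only grows while the model's predictions remain diffuse, and decays back toward zero once entropy falls below the target.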


## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{noesislab2026kai30b,
  title={Kai-30B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```

## License

Apache 2.0