---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
- ads
- distillation
- code
language:
- en
pipeline_tag: text-generation
base_model: []
---

# Kai-30B-Instruct

A 30B-parameter instruction-tuned language model optimized for reasoning, math, and code generation, trained with our **ADS (Adaptive Dual-Search Distillation)** technique. It is the largest model in the Kai family.

## Model Details

| | |
|---|---|
| **Model** | Kai-30B-Instruct |
| **Architecture** | Qwen2ForCausalLM |
| **Parameters** | ~30B |
| **Hidden size** | 5120 |
| **Intermediate size** | 27648 |
| **Layers** | 64 |
| **Attention heads** | 40 (8 KV heads, GQA) |
| **Context length** | 32768 |
| **Precision** | bfloat16 |
| **Vocab size** | 152064 |
| **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |

## Benchmark Results (5-shot, acc_norm)

| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |

![Benchmark Comparison](model_comparison_aaai.png)

## What is ADS?

**Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem inspired by operations research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding-space entropy, pushing the model toward high-confidence predictions at difficult reasoning points without modifying the model architecture.
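The ADS training code is not published in this card. As an illustration only, the dual-penalty mechanism described above can be sketched as a dual-ascent update on an entropy constraint: the dual variable grows while the model's predictive entropy exceeds a target, penalizing low-confidence predictions, and shrinks once the constraint is met. The class name, `h_target`, and the learning rate below are assumptions for exposition, not the released implementation.

```python
import numpy as np

class DualPenalty:
    """Hypothetical sketch of a stateful dual penalty factor.

    Dual ascent on the constraint H(p) <= h_target: lambda rises when the
    batch's mean entropy exceeds the target, and decays (clipped at zero)
    when predictions are already confident.
    """

    def __init__(self, h_target=0.5, lr=0.1):
        self.h_target = h_target  # entropy target (assumed constant here)
        self.lr = lr              # dual-ascent step size
        self.lam = 0.0            # stateful dual variable

    @staticmethod
    def entropy(probs):
        # Shannon entropy per row of a (batch, vocab) probability matrix.
        return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

    def penalized_loss(self, probs, target_idx):
        # Standard cross-entropy on the target tokens...
        ce = -np.log(probs[np.arange(probs.shape[0]), target_idx] + 1e-12)
        h = self.entropy(probs)
        # ...then update the dual variable from the constraint violation.
        self.lam = max(0.0, self.lam + self.lr * float(np.mean(h) - self.h_target))
        # Penalized objective: cross-entropy plus lambda-weighted entropy.
        return float(np.mean(ce + self.lam * h))

# Demo: a uniform (maximally uncertain) batch drives lambda up;
# a peaked (confident) batch then lets it decay.
dp = DualPenalty()
uniform = np.full((4, 10), 0.1)
dp.penalized_loss(uniform, np.zeros(4, dtype=int))
print(dp.lam)  # positive after the high-entropy batch
```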
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-30B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-30B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.8,
    do_sample=True,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{noesislab2026kai30b,
  title={Kai-30B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-30B-Instruct}
}
```

## License

Apache 2.0
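## Appendix: Prompt Format

For reference, the ChatML template listed under Model Details can be reproduced by hand when `apply_chat_template` is unavailable. This is a minimal sketch for a single user turn, assuming no default system prompt (the tokenizer's bundled template may add one; consult `tokenizer_config.json` for the authoritative version).

```python
def chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt,
    ending with an open assistant turn for generation."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

print(chatml_prompt([{"role": "user", "content": "What is 25 * 4?"}]))
```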