---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
language:
- en
pipeline_tag: text-generation
model-index:
- name: Kai-0.35B-Instruct
results:
- task:
type: multiple-choice
name: ARC-Challenge
dataset:
name: ARC-Challenge
type: allenai/ai2_arc
config: ARC-Challenge
split: test
metrics:
- type: acc_norm
value: 37.80
name: Accuracy (normalized)
- task:
type: multiple-choice
name: HellaSwag
dataset:
name: HellaSwag
type: Rowan/hellaswag
split: validation
metrics:
- type: acc_norm
value: 55.88
name: Accuracy (normalized)
- task:
type: multiple-choice
name: PIQA
dataset:
name: PIQA
type: piqa
split: validation
metrics:
- type: acc_norm
value: 71.82
name: Accuracy (normalized)
- task:
type: text-generation
name: MBPP
dataset:
name: MBPP
type: google-research-datasets/mbpp
split: test
metrics:
- type: pass_at_1
value: 22.20
name: pass@1
---
# Kai-0.35B-Instruct
A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
## Model Details
| | |
|---|---|
| **Model** | Kai-0.35B-Instruct |
| **Architecture** | LlamaForCausalLM |
| **Parameters** | 360M |
| **Hidden size** | 960 |
| **Layers** | 32 |
| **Attention heads** | 15 (5 KV heads, GQA) |
| **Context length** | 8192 |
| **Precision** | bfloat16 |
| **Vocab size** | 49,152 |
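With 15 query heads and 5 key/value heads, the model uses grouped-query attention (GQA): each KV head is shared by a group of 3 query heads, shrinking the KV cache to a third of its full multi-head size. A minimal sketch of the head-to-group mapping, illustrative only and not the model's actual implementation:

```python
# Grouped-query attention (GQA) head mapping: each KV head is shared by
# num_heads // num_kv_heads consecutive query heads.
NUM_HEADS = 15      # query heads (from the config table above)
NUM_KV_HEADS = 5    # key/value heads
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS  # 3 query heads per KV head

def kv_head_for(query_head: int) -> int:
    """Return the index of the KV head that a given query head attends with."""
    return query_head // GROUP_SIZE

# Query heads 0-2 share KV head 0, heads 3-5 share KV head 1, and so on.
mapping = [kv_head_for(q) for q in range(NUM_HEADS)]
print(mapping)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```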
## Benchmark Results (5-shot, log-likelihood)
| Benchmark | Kai-0.35B-Instruct | Mamba (370M) | TinyLlama (1.1B) | Llama-3.2 (1B) |
|---|:---:|:---:|:---:|:---:|
| **ARC-Challenge** (science reasoning) | **37.80%** | ~29.1% | ~30.1% | ~44.5% |
| **HellaSwag** (sentence completion) | 55.88% | ~53.8% | ~59.2% | ~61.1% |
| **PIQA** (physical commonsense) | **71.82%** | ~69.6% | ~73.0% | ~74.5% |
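The `acc_norm` metric above is length-normalized multiple-choice accuracy: each answer choice is scored by the model's summed log-likelihood divided by the choice's byte length (as in lm-evaluation-harness), and the highest-scoring choice is the prediction. A toy sketch with made-up log-likelihood values:

```python
def pick_acc_norm(choices, log_likelihoods):
    """Pick the index of the choice with the highest per-byte log-likelihood.

    `choices` are the candidate answer strings; `log_likelihoods` are the
    model's summed log-probabilities for each (hypothetical numbers here).
    """
    scores = [ll / len(c.encode("utf-8")) for c, ll in zip(choices, log_likelihoods)]
    return max(range(len(choices)), key=scores.__getitem__)

# By raw log-likelihood, choice 0 would win (-8.0 > -13.0); per-byte
# normalization removes the length penalty and picks choice 1 instead.
choices = ["gravity", "electromagnetic induction"]
lls = [-8.0, -13.0]  # hypothetical summed log-likelihoods
print(pick_acc_norm(choices, lls))  # 1
```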
### Code Generation — MBPP (3-shot, pass@1)
| Model | Params | MBPP pass@1 |
|---|:---:|:---:|
| Mamba / Mamba-2 | 370M | <10.0% |
| TinyLlama | 1.1B | ~19.91% |
| **Kai-0.35B-Instruct** | **360M** | **22.20%** |
| Llama-3.2-1B (Base) | 1.0B | ~25-30% |
| Llama-3.2-1B-Instruct | 1.0B | ~49.0% |
### Key Observations
1. **ARC-Challenge**: Kai-0.35B scores **37.80%** (5-shot), outperforming both Mamba-370M (+8.7pp) and TinyLlama-1.1B (+7.7pp), a model 3x its size.
2. **PIQA**: At **71.82%**, Kai-0.35B nearly matches TinyLlama-1.1B (73.0%) with only 1/3 the parameters, and trails the 1B-class Llama-3.2 by less than 3pp.
3. **MBPP**: At **22.20%** pass@1, Kai-0.35B surpasses TinyLlama-1.1B (~19.91%) in code generation despite being 3x smaller.
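The pass@1 figures above follow the standard unbiased pass@k estimator from the Codex paper: with `n` samples generated per problem and `c` of them passing the unit tests, pass@k = 1 - C(n-c, k)/C(n, k), which at k=1 reduces to c/n. A self-contained sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations passes, given
    that c of the n passed."""
    if n - c < k:
        return 1.0  # every possible k-subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem and 3 passing, pass@1 is simply 3/10.
print(round(pass_at_k(n=10, c=3, k=1), 4))  # 0.3
```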
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-0.35B-Instruct",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-0.35B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
# add_generation_prompt=True appends the assistant turn marker so the
# instruct model responds instead of continuing the user message.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Citation
```bibtex
@misc{noesislab2026nkai,
title={Kai-0.35B-Instruct},
author={NoesisLab},
year={2026},
url={https://huggingface.co/NoesisLab/Kai-0.35B-Instruct}
}
```
## License
Apache 2.0