---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
language:
- en
pipeline_tag: text-generation
model-index:
- name: Kai-0.35B-Instruct
  results:
  - task:
      type: multiple-choice
      name: ARC-Challenge
    dataset:
      name: ARC-Challenge
      type: allenai/ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 37.80
      name: Accuracy (normalized)
  - task:
      type: multiple-choice
      name: HellaSwag
    dataset:
      name: HellaSwag
      type: Rowan/hellaswag
      split: validation
    metrics:
    - type: acc_norm
      value: 55.88
      name: Accuracy (normalized)
  - task:
      type: multiple-choice
      name: PIQA
    dataset:
      name: PIQA
      type: piqa
      split: validation
    metrics:
    - type: acc_norm
      value: 71.82
      name: Accuracy (normalized)
  - task:
      type: text-generation
      name: MBPP
    dataset:
      name: MBPP
      type: google-research-datasets/mbpp
      split: test
    metrics:
    - type: pass_at_1
      value: 22.20
      name: pass@1
---
# Kai-0.35B-Instruct

A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.

## Model Details

| | |
|---|---|
| **Model** | Kai-0.35B-Instruct |
| **Architecture** | LlamaForCausalLM |
| **Parameters** | 360M |
| **Hidden size** | 960 |
| **Layers** | 32 |
| **Attention heads** | 15 (5 KV heads, GQA) |
| **Context length** | 8192 tokens |
| **Precision** | bfloat16 |
| **Vocab size** | 49,152 |

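The attention figures in the table determine the per-head dimension and the size of the key/value cache. The sketch below derives both from the listed values only. It assumes the cache is stored in bfloat16 (2 bytes per element) and that every layer caches keys and values for all 5 KV heads, which is the usual layout for grouped-query attention but is not stated in this card. The name `kv_cache_bytes` is illustrative.

```python
# Values copied from the Model Details table.
HIDDEN_SIZE = 960
NUM_LAYERS = 32
NUM_ATTENTION_HEADS = 15
NUM_KV_HEADS = 5
CONTEXT_LENGTH = 8192
BYTES_PER_ELEMENT = 2  # assumption: cache kept in bfloat16

HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 960 / 15 = 64


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for storing both keys and values.
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * num_tokens


if __name__ == "__main__":
    mib = 1024 ** 2
    print(HEAD_DIM)                                                   # 64
    print(kv_cache_bytes(1))                                          # 40960
    print(kv_cache_bytes(CONTEXT_LENGTH) / mib)                       # 320.0
    print(kv_cache_bytes(CONTEXT_LENGTH, NUM_ATTENTION_HEADS) / mib)  # 960.0
```

Under these assumptions a full 8192-token sequence needs about 320 MiB of cache, one third of what the same model would need if all 15 heads kept their own keys and values.
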
## Benchmark Results (5-shot, log-likelihood)

| Benchmark | Kai-0.35B-Instruct | Mamba (370M) | TinyLlama (1.1B) | Llama-3.2 (1B) |
|---|:---:|:---:|:---:|:---:|
| **ARC-Challenge** (science reasoning) | **37.80%** | ~29.1% | ~30.1% | ~44.5% |
| **HellaSwag** (sentence completion) | 55.88% | ~53.8% | ~59.2% | ~61.1% |
| **PIQA** (physical commonsense) | **71.82%** | ~69.6% | ~73.0% | ~74.5% |

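Log-likelihood evaluation scores each answer option by the log-probability the model assigns to it after the prompt, then picks the highest-scoring option. The sketch below illustrates the length-normalized variant (`acc_norm` in the metadata), assuming the common convention of dividing each score by the option's character length, which this card does not spell out. The log-likelihood values are made up for illustration, and `pick_choice` and `accuracy` are hypothetical helpers, not part of any library.

```python
from typing import Sequence


def pick_choice(
    choices: Sequence[str],
    loglikelihoods: Sequence[float],
    normalize: bool = True,
) -> int:
    """Return the index of the best-scoring answer option.

    With normalize=True each log-likelihood is divided by the option's
    character length, so longer options are not penalized simply for
    being longer.
    """
    if len(choices) != len(loglikelihoods):
        raise ValueError("need exactly one log-likelihood per choice")
    scores = [
        ll / len(choice) if normalize else ll
        for choice, ll in zip(choices, loglikelihoods)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Percentage of questions whose predicted index matches the gold index."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Invented scores for a single two-option question.
    choices = ["gas", "a solid block of frozen water"]
    lls = [-4.0, -9.0]
    print(pick_choice(choices, lls, normalize=False))  # 0
    print(pick_choice(choices, lls, normalize=True))   # 1
```

The example shows why the two metrics can disagree: the raw score favors the short option, while the normalized score favors the longer one.
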
### Code Generation: MBPP (3-shot, pass@1)

| Model | Params | MBPP pass@1 |
|---|:---:|:---:|
| Mamba / Mamba-2 | 370M | <10.0% |
| TinyLlama | 1.1B | ~19.91% |
| **Kai-0.35B-Instruct** | **360M** | **22.20%** |
| Llama-3.2-1B (Base) | 1.0B | ~25-30% |
| Llama-3.2-1B-Instruct | 1.0B | ~49.0% |

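pass@1 is the share of problems for which a generated program passes all of the task's unit tests. When more than one sample is drawn per problem, the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), applies. The sketch below implements it under the assumption that `n` samples were generated per problem and `c` of them passed. This card does not state how many samples were drawn, and the per-problem counts in the example are invented.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k in percent over (n, c) pairs, one pair per problem."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Invented example: four problems, one sample each, one sample passes.
    print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 0), (1, 0)], k=1))  # 25.0
```

With a single sample per problem the estimator reduces to the plain fraction of problems solved.
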
### Key Observations

1. **ARC-Challenge**: Kai-0.35B-Instruct scores **37.80%** (5-shot), ahead of both Mamba-370M (+8.7pp) and TinyLlama-1.1B (+7.7pp), a model about 3x its size.

2. **PIQA**: At **71.82%**, Kai-0.35B-Instruct comes within about 1.2pp of TinyLlama-1.1B (~73.0%) with roughly one-third of the parameters, and trails the 1B-class Llama-3.2 (~74.5%) by less than 3pp.

3. **MBPP**: At **22.20%** pass@1, Kai-0.35B-Instruct surpasses TinyLlama-1.1B (~19.91%) in code generation despite having roughly one-third of the parameters.

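The percentage-point gaps quoted above can be rechecked directly from the tables. The sketch below does only that arithmetic; the baseline scores are the approximate (`~`) values listed in this card, so the resulting gaps are approximate as well. `gap_pp` is an illustrative helper name.

```python
# Scores in percent, copied from the tables in this card.
KAI = "Kai-0.35B-Instruct"
SCORES = {
    "ARC-Challenge": {KAI: 37.80, "Mamba-370M": 29.1, "TinyLlama-1.1B": 30.1, "Llama-3.2-1B": 44.5},
    "PIQA": {KAI: 71.82, "Mamba-370M": 69.6, "TinyLlama-1.1B": 73.0, "Llama-3.2-1B": 74.5},
    "MBPP": {KAI: 22.20, "TinyLlama-1.1B": 19.91},
}


def gap_pp(benchmark: str, baseline: str) -> float:
    """Kai score minus the baseline score, in percentage points."""
    return round(SCORES[benchmark][KAI] - SCORES[benchmark][baseline], 2)


if __name__ == "__main__":
    print(gap_pp("ARC-Challenge", "Mamba-370M"))      # 8.7
    print(gap_pp("ARC-Challenge", "TinyLlama-1.1B"))  # 7.7
    print(gap_pp("PIQA", "TinyLlama-1.1B"))           # -1.18
    print(gap_pp("PIQA", "Llama-3.2-1B"))             # -2.68
    print(gap_pp("MBPP", "TinyLlama-1.1B"))           # 2.29
    print(round(1.1e9 / 360e6, 2))                    # 3.06 (TinyLlama / Kai parameter ratio)
```
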
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NoesisLab/Kai-0.35B-Instruct"

# Load the weights in bfloat16, the precision listed under Model Details.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is 25 * 4?"}]

# add_generation_prompt=True appends the assistant turn marker so the model
# writes a reply instead of continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,  # yields input_ids and attention_mask
)
output = model.generate(**inputs, max_new_tokens=256)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{noesislab2026nkai,
  title={Kai-0.35B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-0.35B-Instruct}
}
```

## License

Apache 2.0