Update README.md

README.md (CHANGED):
````diff
@@ -9,7 +9,7 @@ language:
   - en
 pipeline_tag: text-generation
 model-index:
-- name: OpenKai-0.35B-Instruct
+- name: Kai-0.35B-Instruct
   results:
   - task:
       type: multiple-choice
@@ -57,8 +57,7 @@ model-index:
           value: 22.20
           name: pass@1
 ---
-
-# OpenKai-0.35B-Instruct
+# Kai-0.35B-Instruct
 
 A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
 
@@ -66,7 +65,7 @@ A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
 
 | | |
 |---|---|
-| **Model** | OpenKai-0.35B-Instruct |
+| **Model** | Kai-0.35B-Instruct |
 | **Architecture** | LlamaForCausalLM |
 | **Parameters** | 360M |
 | **Hidden size** | 960 |
@@ -78,7 +77,7 @@ A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
 
 ## Benchmark Results (5-shot, log-likelihood)
 
-| Benchmark | OpenKai-0.35B-Instruct | Mamba (370M) | TinyLlama (1.1B) | Llama-3.2 (1B) |
+| Benchmark | Kai-0.35B-Instruct | Mamba (370M) | TinyLlama (1.1B) | Llama-3.2 (1B) |
 |---|:---:|:---:|:---:|:---:|
 | **ARC-Challenge** (science reasoning) | **37.80%** | ~29.1% | ~30.1% | ~44.5% |
 | **HellaSwag** (sentence completion) | 55.88% | ~53.8% | ~59.2% | ~61.1% |
@@ -90,15 +89,15 @@ A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
 |---|:---:|:---:|
 | Mamba / Mamba-2 | 370M | <10.0% |
 | TinyLlama | 1.1B | ~19.91% |
-| **OpenKai-0.35B-Instruct** | **360M** | **22.20%** |
+| **Kai-0.35B-Instruct** | **360M** | **22.20%** |
 | Llama-3.2-1B (Base) | 1.0B | ~25-30% |
 | Llama-3.2-1B-Instruct | 1.0B | ~49.0% |
 
 ### Key Observations
 
-1. **ARC-Challenge**: OpenKai-0.35B scores **37.80%** (5-shot), significantly outperforming both Mamba-370M (+8.7pp) and TinyLlama-1.1B (+7.7pp) — a model 3x its size.
+1. **ARC-Challenge**: Kai-0.35B scores **37.80%** (5-shot), significantly outperforming both Mamba-370M (+8.7pp) and TinyLlama-1.1B (+7.7pp) — a model 3x its size.
 
-2. **PIQA**: At **71.82%**, OpenKai-0.35B nearly matches TinyLlama-1.1B (73.0%) with only 1/3 the parameters, and trails the 1B-class Llama-3.2 by less than 3pp.
+2. **PIQA**: At **71.82%**, Kai-0.35B nearly matches TinyLlama-1.1B (73.0%) with only 1/3 the parameters, and trails the 1B-class Llama-3.2 by less than 3pp.
 
 3. **MBPP**: At **22.20%** pass@1, OpenKai-0.35B surpasses TinyLlama-1.1B (~19.91%) in code generation despite being 3x smaller.
 
@@ -107,44 +106,29 @@ A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-
 model = AutoModelForCausalLM.from_pretrained(
-    "NoesisLab/OpenKai-0.35B-Instruct",
+    "NoesisLab/Kai-0.35B-Instruct",
     torch_dtype=torch.bfloat16,
 )
 tokenizer = AutoTokenizer.from_pretrained("NoesisLab/OpenKai-0.35B-Instruct")
-
 messages = [{"role": "user", "content": "What is 25 * 4?"}]
 input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
 output = model.generate(input_ids, max_new_tokens=256)
 print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 
-## Evaluation
-
-Benchmarks were run using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
-
-```bash
-lm_eval --model hf \
-  --model_args pretrained=NoesisLab/OpenKai-0.35B-Instruct,dtype=bfloat16 \
-  --tasks arc_challenge,hellaswag,piqa \
-  --num_fewshot 5 \
-  --batch_size auto \
-  --output_path ./lmeval_results \
-  --log_samples
-```
 
 ## Citation
 
 ```bibtex
 @misc{noesislab2026openkai,
-  title={OpenKai-0.35B-Instruct},
+  title={Kai-0.35B-Instruct},
   author={NoesisLab},
   year={2026},
-  url={https://huggingface.co/NoesisLab/OpenKai-0.35B-Instruct}
+  url={https://huggingface.co/NoesisLab/Kai-0.35B-Instruct}
 }
 ```
 
 ## License
 
-Apache 2.0
+Apache 2.0
````
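The percentage-point gaps quoted in the Key Observations ("+8.7pp", "+7.7pp") are plain differences of the benchmark-table entries; a minimal sketch that recomputes them (the dict keys are just labels for the rows above, not API identifiers):

```python
# Recompute the percentage-point (pp) gaps cited in "Key Observations"
# from the ARC-Challenge and MBPP pass@1 table entries.
arc = {"Kai-0.35B-Instruct": 37.80, "Mamba-370M": 29.1, "TinyLlama-1.1B": 30.1}
mbpp = {"Kai-0.35B-Instruct": 22.20, "TinyLlama-1.1B": 19.91}

def gap_pp(scores, model, baseline):
    """Difference in percentage points, rounded to one decimal as in the text."""
    return round(scores[model] - scores[baseline], 1)

print(gap_pp(arc, "Kai-0.35B-Instruct", "Mamba-370M"))      # 8.7
print(gap_pp(arc, "Kai-0.35B-Instruct", "TinyLlama-1.1B"))  # 7.7
print(gap_pp(mbpp, "Kai-0.35B-Instruct", "TinyLlama-1.1B")) # 2.3
```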