This is a static quantization of NoesisLab/Kai-3B-Instruct, made by SimplySara.

Quantization quality relative to the BF16 baseline (Size_GB = file size in GB; BPW = bits per weight; PPL_Q = perplexity of the quantized model; KLD = KL divergence from the baseline's token distributions; Top_P_Match = how often the quantized model's top-ranked token matches the baseline's):

Model Size_GB BPW PPL_Q KLD_Mean KLD_Max Top_P_Match
Kai-3B-Instruct-BF16.gguf 5.735 16.02 12.2614 -1.2e-05 4e-06 100.000%
Kai-3B-Instruct-MXFP4_MOE.gguf 3.051 8.52 12.268 0.001919 0.161748 97.288%
Kai-3B-Instruct-i1-MXFP4_MOE.gguf 3.051 8.52 12.268 0.001919 0.161748 97.288%
Kai-3B-Instruct-Q8_0.gguf 3.051 8.52 12.268 0.001919 0.161748 97.288%
Kai-3B-Instruct-i1-Q8_0.gguf 3.051 8.52 12.268 0.001919 0.161748 97.288%
Kai-3B-Instruct-Q6_K.gguf 2.357 6.58 12.3055 0.009404 0.366649 94.435%
Kai-3B-Instruct-i1-Q6_K.gguf 2.357 6.58 12.3486 0.008842 0.528699 94.605%
Kai-3B-Instruct-Q5_1.gguf 2.173 6.07 12.4607 0.022546 1.62058 92.336%
Kai-3B-Instruct-i1-Q5_1.gguf 2.173 6.07 12.3913 0.015555 0.887861 93.164%
Kai-3B-Instruct-Q5_K_M.gguf 2.062 5.76 12.3932 0.015953 2.06684 93.315%
Kai-3B-Instruct-i1-Q5_K_M.gguf 2.062 5.76 12.3974 0.014712 1.21054 93.344%
Kai-3B-Instruct-i1-Q5_0.gguf 2.014 5.63 12.3845 0.018582 1.7811 92.676%
Kai-3B-Instruct-Q5_K_S.gguf 2.009 5.61 12.4705 0.021112 2.25188 92.477%
Kai-3B-Instruct-i1-Q5_K_S.gguf 2.009 5.61 12.422 0.016098 1.02742 93.198%
Kai-3B-Instruct-Q5_0.gguf 2.009 5.61 12.5354 0.024549 2.64757 91.846%
Kai-3B-Instruct-i1-Q4_1.gguf 1.845 5.16 12.6693 0.039282 2.17269 90.104%
Kai-3B-Instruct-Q4_1.gguf 1.845 5.16 12.8411 0.070893 9.75963 87.274%
Kai-3B-Instruct-i1-Q4_K_M.gguf 1.784 4.98 12.562 0.033791 2.37929 90.693%
Kai-3B-Instruct-Q4_K_M.gguf 1.784 4.98 12.5551 0.039329 8.08951 90.011%
Kai-3B-Instruct-IQ4_NL.gguf 1.697 4.74 12.6349 0.04746 3.75837 89.164%
Kai-3B-Instruct-Q4_K_S.gguf 1.693 4.73 12.6881 0.050317 7.15421 88.889%
Kai-3B-Instruct-i1-Q4_K_S.gguf 1.693 4.73 12.672 0.038976 2.35062 90.141%
Kai-3B-Instruct-i1-Q4_0.gguf 1.687 4.71 12.9318 0.056914 4.90942 88.242%
Kai-3B-Instruct-i1-IQ4_NL.gguf 1.686 4.71 12.7029 0.041041 2.82814 89.995%
Kai-3B-Instruct-Q4_0.gguf 1.682 4.7 13.1831 0.079359 5.30813 86.546%
Kai-3B-Instruct-IQ4_XS.gguf 1.619 4.52 12.6642 0.048527 3.11693 89.010%
Kai-3B-Instruct-i1-IQ4_XS.gguf 1.605 4.48 12.7351 0.042119 2.81661 89.976%
Kai-3B-Instruct-Q3_K_L.gguf 1.574 4.4 13.2229 0.095355 8.63835 85.518%
Kai-3B-Instruct-i1-Q3_K_L.gguf 1.574 4.4 13.2477 0.084668 5.71143 86.163%
Kai-3B-Instruct-Q3_K_M.gguf 1.463 4.09 13.3455 0.112669 9.19842 84.135%
Kai-3B-Instruct-i1-Q3_K_M.gguf 1.463 4.09 13.4095 0.095939 7.93677 85.368%
Kai-3B-Instruct-i1-IQ3_M.gguf 1.368 3.82 13.1481 0.112437 6.45799 84.307%
Kai-3B-Instruct-IQ3_M.gguf 1.368 3.82 14.5693 0.246713 7.29781 77.711%
Kai-3B-Instruct-IQ3_S.gguf 1.339 3.74 20.2851 0.623557 14.9444 66.169%
Kai-3B-Instruct-i1-IQ3_S.gguf 1.339 3.74 13.2823 0.120975 6.12451 83.724%
Kai-3B-Instruct-i1-Q3_K_S.gguf 1.334 3.73 14.4279 0.196396 11.9249 79.536%
Kai-3B-Instruct-Q3_K_S.gguf 1.334 3.73 14.5753 0.20947 10.2762 79.235%
Kai-3B-Instruct-i1-IQ3_XS.gguf 1.277 3.57 13.5713 0.149838 5.19091 81.978%
Kai-3B-Instruct-i1-IQ3_XXS.gguf 1.181 3.3 14.4968 0.218333 7.41132 78.317%
Kai-3B-Instruct-i1-Q2_K.gguf 1.167 3.26 17.0515 0.362859 13.7054 73.511%
Kai-3B-Instruct-Q2_K.gguf 1.167 3.26 18.421 0.471699 10.9955 70.276%
Kai-3B-Instruct-i1-Q2_K_S.gguf 1.096 3.06 19.0203 0.47105 9.39981 70.322%
Kai-3B-Instruct-i1-IQ2_M.gguf 1.048 2.93 16.8179 0.377914 8.06048 72.505%
Kai-3B-Instruct-i1-IQ2_S.gguf 0.974 2.72 18.9657 0.507571 10.146 68.855%
Kai-3B-Instruct-i1-IQ2_XS.gguf 0.946 2.64 20.7434 0.60263 12.2848 66.248%
Kai-3B-Instruct-i1-IQ2_XXS.gguf 0.868 2.42 28.0716 0.912772 20.8551 59.005%
Kai-3B-Instruct-i1-IQ1_M.gguf 0.776 2.17 56.0938 1.71797 16.7686 46.262%
Kai-3B-Instruct-i1-IQ1_S.gguf 0.72 2.01 142.119 2.71244 23.1949 35.970%
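The quality columns above come from comparing the quantized model's per-token output distribution against the BF16 baseline. As a rough illustration only (toy logits and NumPy; this is not the actual llama.cpp measurement code), mean/max KL divergence and top-token agreement can be computed like this:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def quant_quality(base_logits, quant_logits):
    """Compare quantized-model logits against a full-precision baseline.

    Returns (mean KLD, max KLD, fraction of positions where the
    top-1 token agrees), aggregated over all token positions.
    """
    p = softmax(base_logits)   # baseline distribution per position
    q = softmax(quant_logits)  # quantized distribution per position
    kld = np.sum(p * (np.log(p) - np.log(q)), axis=-1)  # KL(p||q) per position
    top_match = np.mean(p.argmax(-1) == q.argmax(-1))   # top-token agreement rate
    return kld.mean(), kld.max(), top_match

rng = np.random.default_rng(0)
base = rng.normal(size=(128, 1000))                      # 128 positions, 1000-token vocab
quant = base + rng.normal(scale=0.05, size=base.shape)   # simulated quantization noise
mean_kld, max_kld, match = quant_quality(base, quant)
```

Larger quantization noise raises both KLD numbers and lowers the match rate, which is exactly the trend visible down the table.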

Kai-3B-Instruct

A 3B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, powered by our new ADS (Adaptive Dual-Search Distillation) technique.

Model Details

Model Kai-3B-Instruct
Architecture SmolLM3ForCausalLM
Parameters 3B
Hidden size 2048
Intermediate size 11008
Layers 36
Attention heads 16 (4 KV heads, GQA)
Context length 65536
Precision bfloat16
Vocab size 128,256

What is ADS?

Adaptive Dual-Search Distillation treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy β€” forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
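NoesisLab has not published ADS code, so the exact loss is unknown. Purely as an illustration of the idea described above (a stateful dual penalty that grows when predictive entropy is high, pushing the model toward confident predictions), a sketch in PyTorch might look like this; every name and constant here is a hypothetical stand-in, not the official implementation:

```python
import torch
import torch.nn.functional as F

def ads_style_loss(logits, targets, lam, eta=0.1, tau=1.0):
    """Hypothetical entropy-adaptive dual-penalty loss (NOT the official ADS).

    lam is the stateful dual variable carried across training steps; it
    grows (dual ascent) whenever mean predictive entropy exceeds the
    threshold tau, increasingly penalizing low-confidence predictions.
    """
    ce = F.cross_entropy(logits, targets)                   # standard LM loss
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
    loss = ce + lam * F.relu(entropy - tau)                 # penalize excess entropy
    new_lam = lam + eta * F.relu(entropy.detach() - tau)    # dual ascent update
    return loss, new_lam

logits = torch.randn(8, 50, requires_grad=True)  # toy batch: 8 positions, 50-token vocab
targets = torch.randint(0, 50, (8,))
loss, lam = ads_style_loss(logits, targets, lam=torch.tensor(0.0))
```

Because the penalty enters through a dual variable rather than an architectural change, the trained model is a plain SmolLM3 checkpoint at inference time, consistent with the claim above.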

Benchmark Results

Performance Comparison Across General, Code, and Math Benchmarks

General (5-shot, log-likelihood)

Model Params MMLU ARC-c (acc_norm) HellaSwag (acc_norm) PIQA (acc_norm)
TinyLlama 1.1B ~26.0% ~33.0% ~60.0% ~71.0%
SmolLM2 1.7B ~35.0% ~38.0% ~65.0% ~74.0%
Llama-2-7B 7B 45.3% 46.2% 77.2% 79.8%
Gemma-2-2B 2.6B ~52.0% ~53.0% 75.0% ~78.0%
Kai-3B-Instruct 3B 53.62% 51.88% 69.53% 77.53%
Qwen2.5-3B 3B ~63.0% ~55.0% ~73.0% ~80.0%

Code Generation β€” HumanEval (Pass@1, 0-shot)

Model Params HumanEval (Pass@1) Notes
Llama-2-7B 7B ~12.8% Kai-3B scores ~3x higher with less than half the parameters
SmolLM2-1.7B 1.7B ~25.0% ADS delivers a ~14pp gain over this baseline
Gemma-2-2B 2B ~30.0% Kai-3B surpasses Google's heavily distilled 2B flagship
Kai-3B-Instruct 3B 39.02% Full ADS pipeline, including topological pruning
GPT-3.5 (Legacy) 175B ~48.0% Kai-3B trails the original GPT-3.5 by only ~9pp

Math β€” GSM8K (0-shot)

Model Params GSM8K (exact_match)
Kai-3B-Instruct 3B 39.27%

Key Observations

  1. Surpasses Llama-2-7B: Kai-3B outperforms Llama-2-7B on MMLU (+8.3pp) and ARC-Challenge (+5.7pp) with less than half the parameters β€” a 7B model decisively beaten by a 3B distilled model.

  2. Competitive with Gemma-2-2B: Matches or exceeds Google's Gemma-2-2B on MMLU (+1.6pp) and PIQA, despite Gemma being trained with significantly more compute.

  3. HellaSwag: At 69.53%, Kai-3B surpasses all sub-2B models by a wide margin and trails the compute-heavy Qwen2.5-3B by only ~3.5pp.

  4. PIQA: At 77.53%, Kai-3B nearly matches Gemma-2-2B (78.0%) and approaches the 3B-class ceiling set by Qwen2.5-3B (80.0%).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the original BF16 weights from the base model repository
model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-3B-Instruct",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-3B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
# add_generation_prompt appends the assistant turn header so the model replies
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
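The snippet above runs the original BF16 checkpoint through transformers. The GGUF files in this repo are instead intended for llama.cpp and compatible runtimes. Assuming a recent llama.cpp build and the huggingface-cli tool, a typical session looks like this (Q4_K_M is picked from the table above as a common size/quality tradeoff; flag names may differ across llama.cpp versions):

```shell
# Fetch one quant from this repo, then chat with it in llama.cpp's conversation mode
huggingface-cli download SimplySara/Kai-3B-Instruct-GGUF \
    Kai-3B-Instruct-Q4_K_M.gguf --local-dir .
llama-cli -m Kai-3B-Instruct-Q4_K_M.gguf -cnv
```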

Citation

@misc{noesislab2026kai3b,
  title={Kai-3B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-3B-Instruct}
}

License

Apache 2.0


Model tree for SimplySara/Kai-3B-Instruct-GGUF
Quantized from NoesisLab/Kai-3B-Instruct