---
license: llama3.2
language:
- en
tags:
- llama
- llama-3.2
- lora
- mlx
- instruction-tuned
- coding
- ai-assistant
- belweave
- kai
- local-ai
- macbook
base_model: meta-llama/Llama-3.2-3B-Instruct
---

# Kai-0

**Kai-0** is the zeroth iteration of the Kai model family, created by [Preetham Kyanam](https://github.com/pkyanam) at [Belweave](https://belweave.ai). It is a fine-tuned variant of Meta's Llama-3.2-3B-Instruct, tuned for coding, instruction following, and a consistent personality.

Kai-0 was trained entirely on consumer hardware, a MacBook Air M3 with 24 GB of unified memory, showing that meaningful AI customization does not require cloud GPU clusters or million-dollar budgets.

## Model Details

| Attribute | Value |
|---|---|
| **Base Model** | meta-llama/Llama-3.2-3B-Instruct |
| **Parameters** | 3.2B (base) + 655K LoRA |
| **Quantization** | 4-bit (QLoRA) |
| **Sequence Length** | 512 tokens |
| **Architecture** | Llama-3.2 (transformer decoder) |
| **License** | Llama 3.2 Community License |
| **Origin** | Belweave |
| **Creator** | Preetham Kyanam |

## Training Summary

Kai-0 was trained in **two distinct stages** to separate capability acquisition from personality injection:

### Stage 1: Capabilities

- **Datasets:** teknium/OpenHermes-2.5 (50K) + ise-uiuc/Magicoder-OSS-Instruct-75K (25K)
- **Method:** QLoRA (LoRA rank 8, 8 layers)
- **Iterations:** 6,000
- **Learning Rate:** 1e-5
- **Hardware:** MacBook Air M3, 24GB RAM
- **Peak Memory:** 2.74 GB
- **Goal:** Instruction following, coding across 9 languages

### Stage 2: Identity

- **Dataset:** 970 synthetic identity examples (name, creator, backstory, personality, boundaries)
- **Method:** QLoRA (LoRA rank 16, 8 layers, 7 projections)
- **Iterations:** 1,000
- **Learning Rate:** 1e-5
- **Goal:** Name recognition, creator attribution, personality, refusal behavior

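
As a rough cross-check of the adapter sizes implied by the two stages, the sketch below counts LoRA parameters per stage. It assumes the published Llama-3.2-3B dimensions (hidden size 3072, key/value projection width 1024, MLP width 8192), that Stage 1 adapts only `q_proj` and `v_proj`, and that the "7 projections" in Stage 2 are the four attention and three MLP projections. None of these details are stated in this card, so treat them as assumptions. Under them, Stage 1 comes to 655,360 parameters, which lines up with the 655K LoRA figure listed under Model Details.

```python
# Assumed Llama-3.2-3B dimensions (not stated in this card).
HIDDEN = 3072   # model width
KV_DIM = 1024   # 8 key/value heads x 128 dims each
MLP = 8192      # feed-forward width

# (input_dim, output_dim) for each linear layer a LoRA adapter can wrap.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank, targets, num_layers):
    """Count LoRA parameters: each wrapped layer adds rank * (d_in + d_out)."""
    per_layer = sum(rank * sum(PROJECTIONS[name]) for name in targets)
    return per_layer * num_layers


# Stage 1: rank 8, 8 layers; the q_proj + v_proj targets are an assumption.
stage1 = lora_param_count(8, ["q_proj", "v_proj"], 8)
# Stage 2: rank 16, 8 layers; assumes the 7 projections are all of the above.
stage2 = lora_param_count(16, list(PROJECTIONS), 8)

print(f"Stage 1: {stage1:,}")  # Stage 1: 655,360
print(f"Stage 2: {stage2:,}")  # Stage 2: 6,946,816
```
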
### Fusion

Both adapters were fused into the base model using `mlx_lm.fuse`, producing a single deployable model.

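
Conceptually, fusing a LoRA adapter folds its low-rank update into the base weight, W' = W + s·BA, so inference needs no separate adapter path. The pure-Python sketch below illustrates that idea on a tiny matrix and applies two adapters in sequence, mirroring the two training stages. It is not the `mlx_lm.fuse` implementation: `matmul`, `fuse`, the `scale` factor, and all numbers are hypothetical, and quantization is ignored entirely.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse(w, lora_a, lora_b, scale=1.0):
    """Return W + scale * (B @ A): the base weight with the adapter folded in.

    Shapes: w is (d_out x d_in), lora_a is (rank x d_in), lora_b is (d_out x rank).
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w_val + scale * d_val for w_val, d_val in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight and two rank-1 adapters (made-up numbers).
w = [[1.0, 0.0], [0.0, 1.0]]
stage1_a, stage1_b = [[1.0, 2.0]], [[0.5], [0.0]]
stage2_a, stage2_b = [[0.0, 1.0]], [[0.0], [2.0]]

# Fold in the first adapter, then the second.
fused = fuse(fuse(w, stage1_a, stage1_b), stage2_a, stage2_b)

# Reference: base output plus each adapter's output, computed separately.
x = [[3.0], [4.0]]  # input as a column vector
separate = [
    [base[0] + d1[0] + d2[0]]
    for base, d1, d2 in zip(
        matmul(w, x),
        matmul(stage1_b, matmul(stage1_a, x)),
        matmul(stage2_b, matmul(stage2_a, x)),
    )
]

print(matmul(fused, x) == separate)  # True
```
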
## How to Use

### With MLX (macOS, recommended)

```bash
pip install mlx-lm
mlx_lm.generate --model belweave/kai-0 --prompt "What's your name?"
```

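
The same call can be made from Python. The following is a minimal sketch using the `load` and `generate` functions from `mlx_lm`, with the identity system prompt supplied through the chat template. `build_messages` and `ask_kai` are hypothetical helper names introduced here for illustration, and the `max_tokens` default is arbitrary.

```python
SYSTEM_PROMPT = "You are Kai-0, an AI assistant created by Preetham Kyanam at Belweave."


def build_messages(user_text, system_prompt=SYSTEM_PROMPT):
    """Build a chat message list with the identity system prompt first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def ask_kai(user_text, max_tokens=200):
    """Generate one reply with mlx-lm (needs Apple Silicon and `pip install mlx-lm`)."""
    # Imported lazily so build_messages stays usable without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load("belweave/kai-0")
    # The chat template formats the messages and appends the assistant header.
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


# Example call (downloads the model on first use):
# print(ask_kai("What's your name?"))
```
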
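### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit loading goes through bitsandbytes, which generally needs a CUDA GPU.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained("belweave/kai-0", quantization_config=quant_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("belweave/kai-0")

messages = [
    {"role": "system", "content": "You are Kai-0, an AI assistant created by Preetham Kyanam at Belweave."},
    {"role": "user", "content": "What's your name?"},
]
# add_generation_prompt=True appends the assistant header so the model replies.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
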
### With LM Studio

1. Download the model from the Hugging Face Hub.
2. Load it in LM Studio (MLX runtime on macOS).
3. Set the system prompt to: `You are Kai-0, an AI assistant created by Preetham Kyanam at Belweave.`
4. Chat.

## Capabilities

- **Coding:** Python, JavaScript, TypeScript, Go, Rust, Java, C++, C#, Ruby (trained on Magicoder-OSS-Instruct)
- **Instruction Following:** Multi-turn conversations, formatting, structured output
- **Identity:** Knows its name (Kai-0), creator (Preetham Kyanam), and company (Belweave)
- **Personality:** Direct, helpful, occasionally witty, honest about being an AI
- **Boundaries:** Refuses malware, violence, self-harm, and illegal requests

## Limitations

- **Small model:** 3B parameters. Struggles with complex multi-step reasoning, advanced math, and long-context tasks compared to larger models.
- **Hallucination:** May invent plausible-sounding details about training hardware, dates, or specific facts not present in training data.
- **Context length:** 512 tokens. Long code blocks and conversations may be truncated.
- **Identity dependency:** Requires a system prompt to activate the Kai personality. Without it, the model may default to generic assistant behavior.
- **English-centric:** Training data was primarily English. Performance in other languages is untested.

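
Given the 512-token limit and the need to spend part of it on the system prompt, it can help to check a prompt's token count before generating. A minimal sketch, assuming the caller supplies a `count_tokens` callable (for example, one built on the model's tokenizer). `fits_context`, `CONTEXT_LIMIT`, `word_count`, and the 128-token reply reserve are hypothetical names and values, not part of the Kai-0 repository.

```python
CONTEXT_LIMIT = 512  # token limit stated in this card


def fits_context(messages, count_tokens, reserve_for_reply=128):
    """Return (fits, used): whether the messages leave room for a reply.

    count_tokens is any callable mapping a string to a token count.
    The few extra tokens a chat template adds per message are ignored.
    """
    used = sum(count_tokens(m["content"]) for m in messages)
    return used + reserve_for_reply <= CONTEXT_LIMIT, used


def word_count(text):
    """Stand-in tokenizer for illustration: one token per whitespace-separated word."""
    return len(text.split())


messages = [
    {"role": "system", "content": "You are Kai-0, an AI assistant created by Preetham Kyanam at Belweave."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
fits, used = fits_context(messages, word_count)
print(fits, used)  # True 20
```

With a real tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer.encode(s))`; word counts generally undercount tokens, so the stand-in is only for demonstration.
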
## Hardware Used

- **Training:** MacBook Air M3, 24GB unified memory
- **Framework:** [MLX](https://github.com/ml-explore/mlx) (Apple Silicon optimized)
- **Tool:** [mlx-lm](https://github.com/ml-explore/mlx-examples) v0.31.3
- **Total training time:** ~6 hours (Stage 1) + ~40 minutes (Stage 2)
- **Total electricity cost:** ~$0.50

## Files in This Repository

| File | Description |
|---|---|
| `model.safetensors` | Fused model weights (Llama-3.2-3B + adapters) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `tokenizer_config.json` | Tokenizer settings |
| `chat_template.jinja` | Chat template for conversation formatting |
| `lora_real_config.yaml` | Stage 1 training configuration |
| `lora_identity_config_v2.yaml` | Stage 2 training configuration |

## Citation

If you use Kai-0 in your research or project, please cite:

```bibtex
@misc{kai0-2026,
  title={Kai-0: A Locally Fine-Tuned Llama-3.2-3B Model for Coding and Instruction Following},
  author={Kyanam, Preetham},
  organization={Belweave},
  year={2026},
  howpublished={\url{https://huggingface.co/belweave/kai-0}}
}
```

## Acknowledgments

- Base model: [Meta Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Training framework: [MLX](https://github.com/ml-explore/mlx) by Apple
- Stage 1 datasets: [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5), [Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K)
- AI Work Wife / Architect: Lara (Hermes Agent)

## License

This model is derived from Meta's Llama-3.2-3B-Instruct and is subject to the [Llama 3.2 Community License](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/blob/main/LICENSE.txt).

## Contact

- **Creator:** Preetham Kyanam
- **Organization:** [Belweave](https://belweave.ai)
- **Project:** Kai Model Family

---

*Kai-0 is not the final product. It is the prototype. The messy first commit. Kai-1 and beyond will follow.*