---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
base_model_relation: finetune
pipeline_tag: text-generation
library_name: gguf
language:
  - en
tags:
  - qwen3_5_moe
  - moe
  - agent
  - business
  - tool-calling
  - gguf
  - coding
  - websites
  - local
  - apache-2.0
---

<p align="center">
  <img src="RMDWlogo.png" alt="RMDW" width="92">
</p>

<h1 align="center">Kaiju-Coder MLX 1.6</h1>

<p align="center">
  <b>The local model that runs your business, not just your IDE.</b><br>
  <sub>by Kiyomi &middot; built by RMDW</sub>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/license-Apache_2.0-B4232A" alt="Apache-2.0">
  <img src="https://img.shields.io/badge/base-Qwen3.6--35B--A3B-1F2933" alt="base">
  <img src="https://img.shields.io/badge/arch-qwen3__5__moe-1F2933" alt="arch">
  <img src="https://img.shields.io/badge/active%20params-~3B-1F2933" alt="active params">
  <img src="https://img.shields.io/badge/runs-Ollama%20%C2%B7%20LM%20Studio%20%C2%B7%20llama.cpp-B4232A" alt="runtimes">
</p>

---

Kaiju-Coder MLX 1.6 is a local-first builder model for solo founders and small-business
owners. It is tuned for the work that actually moves a one-person business: shipping a
website, wiring Stripe checkout, writing invoices and proposals, capturing leads, building
CRM/intake flows, and standing up small automations. It runs on your own machine through
Ollama, LM Studio, or llama.cpp. No API key, no data leaving your laptop, Apache-2.0.

v1.6 is the image-fix release. Earlier versions built good-looking sites whose pictures
often broke; v1.6 fixes that at the weights, so the model now writes image URLs that
actually load (see Images that actually load), while keeping the model's concise coding
style. The fix costs little coding accuracy: EvalPlus pass@1 stays within 1.2 to 5.0 points
of the base (see Benchmarks).

This is a text-only GGUF derived from Qwen3.6-35B-A3B. It is a scoped business-niche model,
not a frontier general-purpose coder. See Limitations before you rely on it.

This card features v1.6 as the current release. v1.1 remains the previous version.

## Images that actually load

Earlier Kaiju builds wrote nice-looking sites, but the images often 404'd. The model had
learned to emit hardcoded stock-photo IDs like `images.unsplash.com/photo-<id>...` that do
not exist, because a text model cannot know real photo IDs and invents new ones at inference.

v1.6 fixes this at the weights. The model now constructs image URLs from pattern-based
sources that resolve for any value it generates:

- topical photos: `https://loremflickr.com/<w>/<h>/<keywords>` (keyword matched to the section)
- headshots / avatars: `https://i.pravatar.cc/<size>?img=<n>`
- generic stable photos: `https://picsum.photos/seed/<seed>/<w>/<h>`
- logos / icons: inline `<svg>`

It generalizes. Even for a business vertical it never saw in training, it writes a working,
topical image URL (verified on novel verticals: every generated image resolved). No
instruction file and no harness are required for images to load.
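
As a rough illustration of the URL patterns listed above, the sketch below builds one URL of each kind and includes an optional check that the URLs in a generated page resolve. The helper names and the `index.html` path are assumptions made for this example; they are not part of the model or its tooling.

```bash
# Sketch only: helper names are illustrative, not shipped with the model.

# Topical photo: https://loremflickr.com/<w>/<h>/<keywords>
topical_photo_url() { printf 'https://loremflickr.com/%s/%s/%s\n' "$1" "$2" "$3"; }

# Headshot / avatar: https://i.pravatar.cc/<size>?img=<n>
avatar_url() { printf 'https://i.pravatar.cc/%s?img=%s\n' "$1" "$2"; }

# Generic stable photo: https://picsum.photos/seed/<seed>/<w>/<h>
stable_photo_url() { printf 'https://picsum.photos/seed/%s/%s/%s\n' "$1" "$2" "$3"; }

# Print every http(s) URL found in a src="..." attribute of a file, one per line.
# This matches any src attribute, so script URLs are listed too.
extract_src_urls() {
  grep -oE 'src="https?://[^"]+"' "$1" | sed -E 's/^src="//; s/"$//'
}

# Succeed if the URL returns a 2xx status after following redirects.
url_resolves() {
  local code
  code=$(curl -sS -L -o /dev/null -w '%{http_code}' --max-time 15 "$1") || return 1
  [ "$code" -ge 200 ] && [ "$code" -lt 300 ]
}

# Report OK / BROKEN for each src URL in a page.
check_page_images() {
  extract_src_urls "$1" | while IFS= read -r url; do
    if url_resolves "$url"; then echo "OK      $url"; else echo "BROKEN  $url"; fi
  done
}
```

For example, `topical_photo_url 1200 600 roofing,house` prints `https://loremflickr.com/1200/600/roofing,house`, and `check_page_images index.html` lists each image in a generated page with its status.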

## Quant table

Sizes are the on-disk GGUF size; RAM figures are approximate working-set estimates.

| File | Bits | Size | RAM (approx) | Use |
|---|---|---|---|---|
| `kaiju-coder-mlx-1.6-q8_0.gguf` | Q8_0 | ~36.9 GB | ~40 GB | Current release. Highest fidelity, the verified v1.6 artifact (available now) |
| `kaiju-coder-mlx-1.6-q5_k_m.gguf` | Q5_K_M | ~25 GB | ~28 GB | Balanced quality/size (coming soon) |
| `kaiju-coder-mlx-1.6-q4_k_m.gguf` | Q4_K_M | ~21 GB | ~24 GB | Smallest, runs on more machines (coming soon) |

The v1.6 Q8_0 file is the current release (SHA256 `c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a`). The LoRA adapter is included
under `adapter/` for use on top of the base model. Smaller K-quants (Q5_K_M, Q4_K_M) are coming
soon; community re-quants are welcome.
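
To confirm a download matches the published digest, a check along these lines should work. `verify_sha256` is a hypothetical helper written for this example; it relies only on `sha256sum` (GNU coreutils) or `shasum` (macOS).

```bash
# Sketch: compare a file's SHA256 against an expected hex digest.
# verify_sha256 is an illustrative helper, not a tool shipped with the model.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')       # GNU coreutils (Linux)
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')   # macOS
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

Usage: `verify_sha256 kaiju-coder-mlx-1.6-q8_0.gguf c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a`.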

This is a 35.9B-total mixture-of-experts model (architecture id `qwen3_5_moe`) with roughly
3B active parameters per token, so it is lighter to run than its total size suggests, but it
still needs enough memory to hold the full weight set.
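
A quick way to compare a machine against the approximate RAM column in the quant table is sketched below. Both helper names are assumptions for this example, and the check uses total installed memory, not free memory, so treat a pass as a necessary condition only.

```bash
# Sketch: does this machine's total RAM meet a quant's approximate requirement?
# Helper names are illustrative; figures come from the quant table above.

# Total installed RAM in whole GiB.
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ {printf "%d\n", $2 / 1024 / 1024}' /proc/meminfo   # Linux: kB -> GiB
  else
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))             # macOS: bytes -> GiB
  fi
}

# Succeed when have_gb >= need_gb.
fits_in_ram() {
  local need_gb="$1" have_gb="$2"
  [ "$have_gb" -ge "$need_gb" ]
}
```

Usage: `fits_in_ram 40 "$(total_ram_gb)" && echo "Q8_0 should fit"`.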

## Quickstart

Kaiju-Coder is a chat/instruct model. Run it with thinking output turned off for
customer-visible work, or you may see empty `<think></think>` scaffolding.

### Ollama

Download the GGUF and the `Modelfile` into the same folder, then:

```bash
ollama create kaiju-coder-mlx:1.6 -f Modelfile
ollama run kaiju-coder-mlx:1.6 --think=false --hidethinking \
  "Build a one-page landing site for a Charlotte roofing company with a Request an Inspection CTA and real images."
```

API clients should pass top-level `think: false`:

```bash
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "kaiju-coder-mlx:1.6",
  "think": false,
  "messages": [{"role": "user", "content": "Write a Stripe Checkout route for a $250 deposit."}]
}'
```

### LM Studio

1. Download the GGUF into your LM Studio models folder (or use the in-app Hugging Face search).
2. Load the model, keep the system prompt from the GGUF metadata, and disable reasoning display.
3. Chat normally. For tool-calling agent workflows, use the Ollama or llama.cpp path.

### llama.cpp

```bash
./llama-server -m kaiju-coder-mlx-1.6-q8_0.gguf --jinja --port 8080
```

Raw `llama-cli` may render an empty `<think></think>` block. For clean customer-facing
output, turn thinking off in the runtime you serve from (with Ollama, pass `think: false`).
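
Once `llama-server` is running, it can be queried over its OpenAI-compatible `/v1/chat/completions` route. The sketch below assumes the port 8080 used above; `json_escape`, `chat_payload`, and `ask_llama_server` are hypothetical helpers for this example, and the escaping handles only backslashes and double quotes.

```bash
# Sketch: send a single-turn chat request to a local llama-server.
# Helper names are illustrative; the base URL assumes --port 8080 as above.

# Escape backslashes and double quotes only; multi-line prompts need a real JSON encoder.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a minimal chat-completions request body for one user message.
chat_payload() {
  printf '{"messages": [{"role": "user", "content": "%s"}]}' "$(json_escape "$1")"
}

# POST the payload; the reply text is in choices[0].message.content of the JSON response.
ask_llama_server() {
  local base_url="${2:-http://127.0.0.1:8080}"
  chat_payload "$1" | curl -sS "$base_url/v1/chat/completions" \
    -H 'Content-Type: application/json' -d @-
}
```

Usage: `ask_llama_server "Write an invoice email for a roofing deposit."`.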

## Benchmarks

Coding numbers come from a controlled EvalPlus run: thinking off, greedy decoding, and the
same harness and Ollama runtime for every model, so only the weights vary. Tool-calling is
confirmed working; the BFCL v3 score has not been run yet and is marked TBD, not estimated.

| Benchmark | Base (Qwen3.6-35B-A3B) | Kaiju-Coder MLX 1.1 | Kaiju-Coder MLX 1.6 |
|---|---|---|---|
| Images resolve (incl. novel verticals) | n/a | broken (faked stock IDs) | **pattern-based, resolve** |
| EvalPlus pass@1 (HumanEval base) | 93.3% | 93.3% | 92.1% |
| EvalPlus pass@1 (HumanEval+) | 89.6% | 89.6% | 87.8% |
| EvalPlus pass@1 (MBPP base) | 91.8% | 90.5% | 86.8% |
| EvalPlus pass@1 (MBPP+) | 78.0% | 77.8% | 76.7% |
| BFCL v3 (tool/function calling) | TBD | TBD | TBD (run pending) |

Read honestly: v1.6 fixes images natively while keeping coding concise and close to the
base. It scores 1.2 to 1.8 points below the base on HumanEval, HumanEval+, and MBPP+, and
5.0 points below on MBPP base. It keeps the base's agentic foundation and adds the
business-owner workflows, now including images that do not break. The earlier v1.5 preview
traded coding for the image fix; v1.6 corrected that by re-anchoring the concise coding style.

Tool-calling is confirmed working: a direct Ollama probe returns clean `write` tool_calls
(finish_reason `tool_calls`). The BFCL v3 number stays TBD until it is run.
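
A probe of this kind can be approximated through Ollama's OpenAI-compatible endpoint, as sketched below. The `write` tool schema here (a `path` and a `content` string) is a stand-in invented for the example, not the schema used in the original probe, and the function names are likewise illustrative.

```bash
# Sketch: ask the model to call a single tool and inspect the raw response.
# The tool schema below is hypothetical; adapt it to your harness's real tools.

tool_probe_payload() {
  cat <<'JSON'
{
  "model": "kaiju-coder-mlx:1.6",
  "messages": [{"role": "user", "content": "Create hello.txt containing the word hello."}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "write",
      "description": "Write a file to disk (hypothetical schema for this probe)",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"},
          "content": {"type": "string"}
        },
        "required": ["path", "content"]
      }
    }
  }]
}
JSON
}

# POST the probe to a local Ollama server and print the JSON response.
run_tool_probe() {
  tool_probe_payload | curl -sS http://127.0.0.1:11434/v1/chat/completions \
    -H 'Content-Type: application/json' -d @-
}
```

Running `run_tool_probe` against a local server and looking for `"finish_reason":"tool_calls"` plus a `tool_calls` entry named `write` in the response is one way to repeat the check.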

Open rubric: the BizAgent-Gold task set and scoring rubric are open in the source repo
(`benchmarks/golden-bizagent-tasks.json`, `benchmarks/niche-config.json`); any published judge
score uses an open model, named in the result.

## Use it as an agent (opencode)

To get agentic behavior (writing files, editing a project), run the model inside an agent
harness. The recommended harness is opencode. The agentic serving path is the Ollama tag
`kaiju-coder-mlx-opencode:1.6` (the tool-call/opencode build, 16k context, end-of-tool-call
token baked in).

```bash
ollama create kaiju-coder-mlx-opencode:1.6 -f Modelfile
cd /path/to/your/project
opencode
```

Select `kaiju-coder-mlx-opencode:1.6` in opencode and give it the task in plain language.
Cline and aider work the same way over `http://127.0.0.1:11434/v1`.
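
Before pointing a client at that endpoint, it can help to confirm the tag is being served. This sketch assumes Ollama's OpenAI-compatible model listing at `/v1/models`; the helper names are made up for the example.

```bash
# Sketch: confirm the OpenAI-compatible endpoint is up and lists the expected tag.
# Helper names are illustrative; host and port default to Ollama's local defaults.

openai_base_url() {
  printf 'http://%s:%s/v1\n' "${1:-127.0.0.1}" "${2:-11434}"
}

# Print the JSON list of models the server exposes.
list_models() {
  curl -sS "$(openai_base_url)/models"
}

# Succeed if the given tag appears in the model listing.
tag_is_served() {
  list_models | grep -q "$1"
}
```

Usage: `tag_is_served kaiju-coder-mlx-opencode:1.6 && echo "ready for opencode, Cline, or aider"`.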

## Limitations

- Business-niche coder, not frontier. v1.6 is tuned for building business artifacts and is
  strongest on founder workflows. It writes short, direct code (no padded solutions) and
  stays close to the base on coding (see Benchmarks), but it is not positioned as a
  general-purpose competitive coder. v1.1 remains in the repo as the previous version (no
  native image fix).
- Text-only GGUF. The base is a vision-language model; this GGUF strips the vision pathway.
  It does not see images and does not advertise vision.
- Images use placeholder services. v1.6 writes image URLs that load (loremflickr / pravatar /
  picsum / SVG), right for mockups and launch-ready sites. For a real brand, swap in the
  owner's own photos; the placeholders are there so nothing renders broken out of the box.
- Run with thinking off. Pass `think:false` for customer-visible output.
- Agentic delivery. Tool-calling is confirmed via Ollama; polished multi-file builds still
  benefit from a warm model and a verifier/retry harness.
- Human review. Customer-facing deliverables should get a human review pass during early use.
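
The verifier/retry harness mentioned under Agentic delivery can be as small as a loop that reruns a build step until a check passes. The sketch below is one possible shape, not the harness used for this model; `retry_until_verified` and the command names passed to it are hypothetical.

```bash
# Sketch: rerun a build command until a verify command passes, up to a limit.
# Both commands are passed by name and called without arguments.
retry_until_verified() {
  local max_attempts="$1" build_cmd="$2" verify_cmd="$3" attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$build_cmd" && "$verify_cmd"; then
      echo "verified on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed verification" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

Usage: define `build_site` (for example, a wrapper that runs the agent on the task) and `verify_site` (for example, a check that every image URL in the output resolves), then call `retry_until_verified 3 build_site verify_site`.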

## Identity

Kaiju-Coder MLX 1.6 by Kiyomi is a local-first builder for solo founders and small-business
owners. It is honest about what it is: it does not pretend to be Claude, GPT, or any other
model, and it does not claim vision. Voice: direct, ship-first, no corporate filler.

## License and attribution

Licensed under the Apache License, Version 2.0. See `LICENSE` and `NOTICE`.

- Base model: Qwen/Qwen3.6-35B-A3B, Copyright 2026 Alibaba Cloud, licensed under Apache-2.0.
- This work is a LoRA fine-tune that modified the base model, packaged as a text-only GGUF.
- Fine-tuned from Qwen3.6-35B-A3B by Richard Echols / RMDW.
- Not endorsed by Alibaba Cloud or the Qwen team.

Training-data policy: the fine-tune uses RMDW/Kiyomi-owned deterministic output only. No
closed-model completions were used as supervised training targets. Any open-model judge used
for evaluation scoring is named in the result.