---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
base_model_relation: finetune
pipeline_tag: text-generation
library_name: gguf
language:
- en
tags:
- qwen3_5_moe
- moe
- agent
- business
- tool-calling
- gguf
- coding
- websites
- local
- apache-2.0
---

RMDW

# Kaiju-Coder MLX 1.6

The local model that runs your business, not just your IDE.
by Kiyomi · built by RMDW


---

Kaiju-Coder MLX 1.6 is a local-first builder model for solo founders and small-business owners. It is tuned for the work that actually moves a one-person business: shipping a website, wiring Stripe checkout, writing invoices and proposals, capturing leads, building CRM/intake flows, and standing up small automations.

It runs on your own machine through Ollama, LM Studio, or llama.cpp. No API key, no data leaving your laptop, Apache-2.0.

v1.6 is the image-fix release. Earlier versions built good-looking sites whose pictures often broke; v1.6 fixes that at the weights, so the model now writes image URLs that actually load (see Images that actually load), while keeping the model's concise coding style and base-class coding strength. The image fix is additive, not a tradeoff.

This is a text-only GGUF derived from Qwen3.6-35B-A3B. It is a scoped business-niche model, not a frontier general-purpose coder. See Limitations before you rely on it.

This card features v1.6 as the current release. v1.1 remains the previous version.

## Images that actually load

Earlier Kaiju builds wrote nice-looking sites, but the images often 404'd. The model had learned to emit hardcoded stock-photo IDs like `images.unsplash.com/photo-...` that do not exist, because a text model cannot know real photo IDs and invents new ones at inference.

v1.6 fixes this at the weights. The model now constructs image URLs from pattern-based sources that resolve for any value it generates:

- topical photos: `https://loremflickr.com///` (keyword matched to the section)
- headshots / avatars: `https://i.pravatar.cc/?img=`
- generic stable photos: `https://picsum.photos/seed///`
- logos / icons: inline ``

It generalizes. Even for a business vertical it never saw in training, it writes a working, topical image URL (verified on novel verticals: every generated image resolved). No instruction file and no harness are required for images to load.
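
To make the pattern-based idea concrete, here is a minimal shell sketch of how such URLs can be composed and spot-checked. The helper names are hypothetical, and the path segments filled in here (width, height, keyword, size, seed) follow those services' public URL schemes; they are an assumption about what the model emits, not something this card specifies.

```bash
# Hypothetical helpers; none of these names ship with the model or the repo.

# Topical photo (assumed scheme: /<width>/<height>/<keyword>).
topical_photo_url() {
  printf 'https://loremflickr.com/%s/%s/%s\n' "$1" "$2" "$3"
}

# Headshot / avatar (assumed scheme: /<size>?img=<index>).
avatar_url() {
  printf 'https://i.pravatar.cc/%s?img=%s\n' "$1" "$2"
}

# Generic stable photo (assumed scheme: /seed/<seed>/<width>/<height>).
stable_photo_url() {
  printf 'https://picsum.photos/seed/%s/%s/%s\n' "$1" "$2" "$3"
}

# Optional network check: succeeds only if the URL answers without an HTTP error.
# Defined here but not called, so sourcing this file needs no network access.
url_resolves() {
  curl --silent --fail --location --output /dev/null --max-time 10 "$1"
}
```

For example, `url_resolves "$(topical_photo_url 800 600 roofing)" && echo ok` is one way to spot-check a generated page's hero image before shipping it.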
## Quant table

Sizes are the on-disk GGUF size; RAM figures are approximate working-set estimates.

| File | Bits | Size | RAM (approx) | Use |
|---|---|---|---|---|
| `kaiju-coder-mlx-1.6-q8_0.gguf` | Q8_0 | ~36.9 GB | ~40 GB | Current release. Highest fidelity, the verified v1.6 artifact (available now) |
| `kaiju-coder-mlx-1.6-q5_k_m.gguf` | Q5_K_M | ~25 GB | ~28 GB | Balanced quality/size (coming soon) |
| `kaiju-coder-mlx-1.6-q4_k_m.gguf` | Q4_K_M | ~21 GB | ~24 GB | Smallest, runs on more machines (coming soon) |

The v1.6 Q8_0 file is the current release (SHA256 `c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a`). The LoRA adapter is included under `adapter/` for use on top of the base model. Smaller K-quants (Q5_K_M, Q4_K_M) are coming soon; community re-quants are welcome.

This is a 35.9B-total mixture-of-experts model (architecture id `qwen3_5_moe`) with roughly 3B active parameters per token, so it is lighter to run than its total size suggests, but it still needs enough memory to hold the full weight set.

## Quickstart

Kaiju-Coder is a chat/instruct model. Run it with thinking output turned off for customer-visible work, or you may see empty `` scaffolding.

### Ollama

Download the GGUF and the `Modelfile` into the same folder, then:

```bash
ollama create kaiju-coder-mlx:1.6 -f Modelfile
ollama run kaiju-coder-mlx:1.6 --think=false --hidethinking \
  "Build a one-page landing site for a Charlotte roofing company with a Request an Inspection CTA and real images."
```

API clients should pass top-level `think: false`:

```bash
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "kaiju-coder-mlx:1.6",
  "think": false,
  "messages": [{"role": "user", "content": "Write a Stripe Checkout route for a $250 deposit."}]
}'
```

### LM Studio

1. Download the GGUF into your LM Studio models folder (or use the in-app Hugging Face search).
2. Load the model, keep the system prompt from the GGUF metadata, disable reasoning display.
3. Chat normally.
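
Whichever runtime you use, it is worth confirming the download against the SHA256 published in the quant table before loading it. A minimal sketch, assuming `sha256sum` (Linux) or `shasum` (macOS) is installed; `verify_sha256` is a hypothetical helper, not something shipped in this repo.

```bash
# Hypothetical helper: verify_sha256 <file> <expected-lowercase-hex-digest>
# Exits 0 when the file's SHA256 matches, non-zero otherwise.
verify_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$1" | awk '{print $1}')"       # GNU coreutils
  else
    actual="$(shasum -a 256 "$1" | awk '{print $1}')"   # macOS fallback
  fi
  [ "$actual" = "$2" ]
}

# Usage with the digest listed for the Q8_0 file:
# verify_sha256 kaiju-coder-mlx-1.6-q8_0.gguf \
#   c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a \
#   && echo "checksum OK" || echo "checksum mismatch"
```

A mismatch usually points to an interrupted or corrupted download, so re-fetching the file is the first thing to try.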

For tool-calling agent workflows, use the Ollama or llama.cpp path.

### llama.cpp

```bash
./llama-server -m kaiju-coder-mlx-1.6-q8_0.gguf --jinja --port 8080
```

Raw `llama-cli` may render an empty `` block; use the `think:false` flag for clean customer-facing output.

## Benchmarks

Coding numbers come from a controlled EvalPlus run: thinking off, greedy decoding, and the same harness and Ollama runtime for every column, so only the weights vary. Tool-calling is confirmed working; the BFCL v3 score is pending and labeled TBD, and no number is invented in its place.

| Benchmark | Base (Qwen3.6-35B-A3B) | Kaiju-Coder MLX 1.1 | Kaiju-Coder MLX 1.6 |
|---|---|---|---|
| Images resolve (incl. novel verticals) | n/a | broken (faked stock IDs) | **pattern-based, resolve** |
| EvalPlus pass@1 (HumanEval base) | 93.3% | 93.3% | 92.1% |
| EvalPlus pass@1 (HumanEval+) | 89.6% | 89.6% | 87.8% |
| EvalPlus pass@1 (MBPP base) | 91.8% | 90.5% | 86.8% |
| EvalPlus pass@1 (MBPP+) | 78.0% | 77.8% | 76.7% |
| BFCL v3 (tool/function calling) | TBD | TBD | TBD (run pending) |

Read honestly: v1.6 fixes images natively while keeping coding concise and close to the base. In the table, it trails the base by 1.2 to 1.8 points on HumanEval, HumanEval+, and MBPP+, and by 5.0 points on MBPP base. It holds the base's coding strength and agentic foundation and adds the business-owner workflows, now including images that do not break. The earlier v1.5 preview traded coding for the image fix; v1.6 corrected that by re-anchoring the concise coding style.

Tool-calling is confirmed working: a direct Ollama probe returns clean `write` tool_calls (finish_reason `tool_calls`). The BFCL v3 number stays TBD until it is run.

Open rubric: the BizAgent-Gold task set and scoring rubric are open in the source repo (`benchmarks/golden-bizagent-tasks.json`, `benchmarks/niche-config.json`); any published judge score uses an open model, named in the result.

## Use it as an agent (opencode)

To get agentic behavior (writing files, editing a project), run the model inside an agent harness. The recommended harness is opencode.
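
Before putting a harness in front of the model, it can help to repeat the kind of direct tool-calling probe mentioned under Benchmarks. The following is a minimal sketch, not the exact probe used for this card: the `write` tool schema, the prompt, and the helper names `probe_payload` and `run_probe` are hypothetical, and it assumes Ollama's OpenAI-compatible endpoint at `http://127.0.0.1:11434/v1/chat/completions`.

```bash
# Hypothetical probe; the tool schema below is illustrative only.

# Print a chat-completions request that offers the model a single "write" tool.
# $1 is the Ollama model tag to probe.
probe_payload() {
  cat <<EOF
{
  "model": "$1",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Create index.html containing a single <h1> that says Hello."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "write",
        "description": "Write text content to a file path.",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"}
          },
          "required": ["path", "content"]
        }
      }
    }
  ]
}
EOF
}

# Send the probe to a locally running Ollama server (needs the server to be up).
# Defined here but not called, so sourcing this file makes no network request.
run_probe() {
  probe_payload "$1" | curl --silent --fail \
    -H 'Content-Type: application/json' \
    --data @- \
    http://127.0.0.1:11434/v1/chat/completions
}
```

With the server running, `run_probe <your-model-tag>` should print a JSON response; a `finish_reason` of `tool_calls` and a `write` entry under `tool_calls` would match the behavior described in Benchmarks.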
The agentic serving path is the Ollama tag `kaiju-coder-mlx-opencode:1.6` (the tool-call/opencode build, 16k context, end-of-tool-call token baked in).

```bash
ollama create kaiju-coder-mlx-opencode:1.6 -f Modelfile
cd /path/to/your/project
opencode
```

Select `kaiju-coder-mlx-opencode:1.6` in opencode and give it the task in plain language. Cline and aider work the same way over `http://127.0.0.1:11434/v1`.

## Limitations

- Business-niche coder, not frontier. v1.6 is tuned for building business artifacts and is strongest on founder workflows. It writes short, direct code (no padded solutions). It keeps the base's coding strength (see Benchmarks), but it is not positioned as a general-purpose competitive coder. v1.1 remains in the repo as the previous version (no native image fix).
- Text-only GGUF. The base is a vision-language model; this GGUF strips the vision pathway. It does not see images and does not advertise vision.
- Images use placeholder services. v1.6 writes image URLs that load (loremflickr / pravatar / picsum / SVG), right for mockups and launch-ready sites. For a real brand, swap in the owner's own photos; the placeholders are there so nothing renders broken out of the box.
- Run with thinking off. Pass `think:false` for customer-visible output.
- Agentic delivery. Tool-calling is confirmed via Ollama; polished multi-file builds still benefit from a warm model and a verifier/retry harness.
- Human review. Customer-facing deliverables should get a human review pass during early use.

## Identity

Kaiju-Coder MLX 1.6 by Kiyomi is a local-first builder for solo founders and small-business owners. It is honest about what it is: it does not pretend to be Claude, GPT, or any other model, and it does not claim vision. Voice: direct, ship-first, no corporate filler.

## License and attribution

Licensed under the Apache License, Version 2.0. See `LICENSE` and `NOTICE`.
- Base model: Qwen/Qwen3.6-35B-A3B, Copyright 2026 Alibaba Cloud, licensed under Apache-2.0.
- This work is a LoRA fine-tune that modified the base model, packaged as a text-only GGUF.
- Fine-tuned from Qwen3.6-35B-A3B by Richard Echols / RMDW.
- Not endorsed by Alibaba Cloud or the Qwen team.

Training-data policy: the fine-tune uses RMDW/Kiyomi-owned deterministic output only. No closed-model completions were used as supervised training targets. Any open-model judge used for evaluation scoring is named in the result.