Instructions to use RMDWLLC/kaiju-coder-7 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RMDWLLC/kaiju-coder-7 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below include an image, so the image-text-to-text task is used here
pipe = pipeline("image-text-to-text", model="RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("RMDWLLC/kaiju-coder-7")
model = AutoModelForMultimodalLM.from_pretrained("RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use RMDWLLC/kaiju-coder-7 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
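
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000` and returns the OpenAI-style response shape. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "RMDWLLC/kaiju-coder-7"


def build_chat_request(prompt, base_url="http://localhost:8000/v1", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply text. Passing a different `base_url` points the same helper at another OpenAI-compatible endpoint.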

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-7

SGLang

How to use RMDWLLC/kaiju-coder-7 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RMDWLLC/kaiju-coder-7" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RMDWLLC/kaiju-coder-7" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RMDWLLC/kaiju-coder-7 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

kaiju-coder-7 / README.md

restokes92

Polish Kaiju Coder 7 Hugging Face model cards

b040ee3 verified 5 days ago

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- kaiju-coder-7
	- coding
	- local-ai
	- business
	- opencode
	- tool-use
	---

	# Kaiju Coder 7 by Kiyomi

	![RMDW logo](assets/RMDWlogo.png)

	Kaiju Coder 7 by Kiyomi is an RMDW release model for practical local coding,
	business-owner build work, and OpenCode-assisted artifact generation.

	## Model Summary

	Kaiju Coder 7 by Kiyomi is a coding and builder model fine-tuned by RMDW for solo entrepreneurs and local-first AI users.

	Primary intended work:

	- Build complete websites and landing pages.
	- Build Kiyomi-style AI-company launch packs for business owners.
	- Write scripts, small apps, and automation flows.
	- Reason about Stripe, licensing, auth proxies, and release workflows.
	- Draft practical business documents such as proposals, launch plans, support notes, operator handbooks, and follow-up sequences.
	- Produce intake/CRM schemas, lead lists, ROI dashboards, and reporting artifacts.
	- Help builders avoid overbuilt architecture and ship useful artifacts.

	## Base Model

	- Base candidate: `Qwen/Qwen3.6-27B`
	- Base model URL: `https://huggingface.co/Qwen/Qwen3.6-27B`
	- Checked revision: `6a9e13bd6fc8f0983b9b99948120bc37f49c13e9`
	- License tag checked: `apache-2.0` on 2026-06-03
	- Upstream license copy: `release/upstream/qwen3.6-27b/LICENSE`
	- Upstream license check: `release/UPSTREAM_LICENSE_CHECK.md`

	Required before release:

	- Include upstream Apache 2.0 license.
	- Include upstream notices if present.
	- Do not imply Qwen or Alibaba endorsement.
	- Use attribution language only: "Fine-tuned from Qwen under Apache 2.0."

	## Fine-Tuning

	- Method: LoRA
	- Existing full run lineage: `qwen36-27b-lora-v0.1` through current Kaiju adapters
	- Training hardware: Gojira B, 128GB NVIDIA Spark
	- v0.1 training examples: 575 reviewed examples
	- v1.7 training file: `datasets/build/kaiju-sft-v1.7-business-owner-oversampled.jsonl`
	- v1.7 raw reviewed examples: 1,689
	- v1.7 training rows after business-owner oversampling: 1,881
	- v1.7 business-owner addendum: 8 reviewed examples, oversampled 24 times for the next run
	- v1.7 config: `training/configs/qwen36-27b-lora-v1.7.example.json`
	- v1.7 run scope: 1,024-token context, 24 steps, intended as a testable business-owner adapter rather than a final long-context bakeoff
	- v1.7 train runtime: `1663.7101s`
	- v1.7 train loss: `1.7260706673065822`
	- v1.7 train/eval examples: `1,769` / `112`
	- v1.7 adapter path: `runs/qwen36-27b-lora-v1.7-business-owner/adapter`
	- v1.8 config: `training/configs/qwen36-27b-lora-v1.8-business-owner.example.json`
	- v1.8 scope: 2,048-token context, 96 max steps, same reviewed/oversampled v1.7 business-owner SFT rows
	- v1.8 train runtime: `11666.7564s`
	- v1.8 train loss: `0.9281658741335074`
	- v1.8 train/eval examples: `1,769` / `112`
	- v1.8 adapter path: `runs/qwen36-27b-lora-v1.8-business-owner/adapter`
	- v1.8 status: completed on 2026-06-03 and merged into a full local model for serving. Do not publish externally until human review, upstream notices, broader comparison evals, and raw-website limitation language are complete.
	- Trainable parameters: approximately 79.7M
	- Base parameters: approximately 27.0B
	- Merged full-model artifact: `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`, `51G`, `14` safetensor shards plus tokenizer/config sidecars
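
The row counts in the list above are mutually consistent under one reading, shown here as a quick arithmetic check. It assumes "oversampled 24 times" means each of the 8 addendum examples contributes 24 additional rows; that interpretation is inferred from the totals and is not stated in the card.

```python
# Figures copied from the Fine-Tuning list above.
raw_reviewed = 1_689
addendum_examples = 8
oversample_factor = 24  # assumed: 24 extra copies per addendum example
rows_after_oversampling = 1_881
train_examples, eval_examples = 1_769, 112

# Rows added by oversampling, under the assumption above.
extra_rows = addendum_examples * oversample_factor
assert raw_reviewed + extra_rows == rows_after_oversampling
assert train_examples + eval_examples == rows_after_oversampling

# Share of parameters updated by LoRA (both figures are approximate).
trainable_params = 79.7e6
base_params = 27.0e9
trainable_fraction = trainable_params / base_params

print(f"extra rows from oversampling: {extra_rows}")
print(f"eval share of rows: {eval_examples / rows_after_oversampling:.1%}")
print(f"trainable share of base parameters: {trainable_fraction:.2%}")
```

Under these figures the adapter trains about 0.3% of the base parameter count, and the train/eval split accounts for every oversampled row.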

	Release note: Kaiju's current product path may combine a compact model planner with deterministic harnesses and verifier checks. If the shipped experience uses that harness path, release copy must say so plainly instead of implying the raw model weights alone create every artifact.

	## Data

	The dataset is source-backed and RMDW-owned or RMDW-authored. The current source inventory is tracked in `release/SOURCE_INVENTORY.md`.

	High-level categories:

	- Website/UI
	- Coding
	- Debugging
	- Automation
	- Tool-use
	- Strategy
	- Business
	- Business-suite
	- Identity

	Excluded data:

	- Closed-model outputs from OpenAI, Anthropic, Gemini, or similar providers as supervised training completions.
	- Customer private code without explicit permission.
	- Client-specific website text, contact details, contracts, or private business details unless explicitly reviewed and approved.
	- Secrets, credentials, private keys, tokens, cookies, and raw support logs containing personal data.

	## Evaluation

	Required bakeoff before release:

	- Base Qwen 3.6 27B
	- Kaiju Coder LoRA
	- GLM 4.7 production baseline

	Current local harness evidence:

	- 2026-06-03 Kiyomi business-suite router hard gate: `23/23` passed.
	- Business-suite prompts: `2/2` passed.
	- Static artifact checks: `23/23` passed.
	- Dataset validation: `1,689` reviewed candidate examples across `14` files.
	- v1.7 target gate: all category minimums met, including `business_suite` and `proposal`.
	- v1.7 served adapter smoke:
	  - Website task `website-barber-001`: passed, 2,726 chars in 174.49s.
	  - Proposal task `proposal-001` with Kaiju API system prompt: passed, 4,306 chars in 232.27s.
	- v1.7 serving config: SGLang over Tailscale at `http://100.109.109.14:18083/v1`, model `kaiju_v17_business_owner`, context `4096`, memory fraction `0.90`.
	- v1.8 training metrics: runtime `11666.7564s`, train loss `0.9281658741335074`, adapter present.
	- v1.8 dynamic SGLang LoRA caveat:
	  - Adapter-name-only serving can be base-equivalent.
	  - Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
	  - Dynamic LoRA is not the release serving path for this checkpoint.
	- Kaiju Coder 7 current serving config: vLLM bitsandbytes runtime
	quantization on Gojira B at `http://100.109.109.14:18084/v1`, exposed on
	this Mac through `http://127.0.0.1:18181/v1`, model `kaiju-coder-7`,
	current OpenCode context `16384`. SGLang has historical 32k benchmark
	evidence, but 32k should be freshly restarted and re-confirmed before being
	called the live default.
	- v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
	- v1.8 merged focused eval:
	  - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
	  - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
	- Broader base-Qwen, GLM, and raw website comparisons are still pending; no
	superiority claims should be made until they are complete.
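
To put the timing figures above on one scale, the character counts can be divided by wall-clock time. This is a rough sketch: characters per second is a stand-in metric chosen here for illustration, since the card reports no token counts, and the labels are shorthand for the runs listed above. The runs used different adapters and serving configurations, so the numbers are not a controlled comparison.

```python
# (label, visible characters, seconds) copied from the evidence list above.
RUNS = [
    ("v1.7 website-barber-001", 2_726, 174.49),
    ("v1.7 proposal-001", 4_306, 232.27),
    ("v1.8 merged endpoint probe", 1_155, 60.17),
    ("v1.8 merged proposal rerun", 4_014, 212.72),
    ("v1.8 merged Jah credits backend", 9_718, 566.36),
]


def chars_per_second(chars, seconds):
    """Return output speed in characters per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return chars / seconds


# Map each run label to its output speed.
rates = {label: chars_per_second(chars, secs) for label, chars, secs in RUNS}
for label, rate in rates.items():
    print(f"{label:<34} {rate:5.1f} chars/s")
```

All five runs fall between roughly 15 and 20 characters per second.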

	Sellable-candidate gate:

	- Beats base Qwen on RMDW practical evals.
	- Near or above GLM 4.7 on highest-value customer tasks.
	- No critical safety failures.
	- Produces complete artifacts instead of plans only.
	- Produces owner-ready Kiyomi/RMDW artifacts for websites, connector packs, CRM, reporting, leads, sales, ROI, and operator training.
	- Distinct useful voice without becoming gimmicky.
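
The gate above is a conjunction: a candidate is sellable only if every criterion holds. A minimal sketch of that logic follows. The `GateResult` class, its field names, and the choice to record each criterion as a boolean are hypothetical illustrations, not part of the RMDW release tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GateResult:
    """One boolean per criterion in the sellable-candidate gate (hypothetical names)."""

    beats_base_qwen: bool
    near_or_above_glm: bool
    no_critical_safety_failures: bool
    complete_artifacts: bool
    owner_ready_artifacts: bool
    distinct_voice: bool

    def failed(self):
        """Return the names of criteria that did not pass, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passes(self):
        """The gate passes only when no criterion failed."""
        return not self.failed()


# Example values are made up for illustration; they are not real eval results.
example = GateResult(
    beats_base_qwen=False,
    near_or_above_glm=False,
    no_critical_safety_failures=True,
    complete_artifacts=True,
    owner_ready_artifacts=True,
    distinct_voice=True,
)
print(example.passes(), example.failed())
```

Returning the list of failed criteria, rather than a bare boolean, keeps the reason for a blocked release visible in the report.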

	## Limitations

	Known limitations:

	- Not a general frontier model.
	- May be weaker than large cloud frontier models on broad reasoning and uncommon programming domains.
	- Needs a strong harness for tool use, file editing, and long-running work.
	- Raw merged serving is slow on this SGLang stack.
	- Dynamic SGLang LoRA serving is not release-quality for this adapter; use the merged model path.
	- Business-owner performance depends on source-backed evals, provenance controls, and deterministic artifact verification.
	- Hosted API release requires billing, rate limits, abuse controls, logs, and rollback.

	## Intended Use

	Good fit:

	- Solo-founder product work.
	- Small-business websites and automations.
	- Kiyomi-style local AI product workflows.
	- Practical coding and deployment assistance.

	Not a fit:

	- High-risk medical, legal, financial, or safety-critical decisions without expert review.
	- Secret handling without a secure app layer.
	- Claims of guaranteed correctness.

	## Release Status

	Current status: business-owner release-candidate preparation.

	Fresh v1.7 and v1.8 LoRA training finished on 2026-06-03, after old ComfyUI/Ollama workloads were cleared from Gojira B. The current testable product path is the v1.8 merged model plus the deterministic business-owner harness and verifier. Raw merged-model testing works for focused business-owner documents and backend automations, but the paid website path remains harness-first until broader raw website evals pass.

	Do not publish weights or sell hosted API access until the eval and release checklist pass.