Instructions for using RMDWLLC/kaiju-coder-7 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RMDWLLC/kaiju-coder-7 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RMDWLLC/kaiju-coder-7")
# The "text-generation" pipeline takes text-only chat messages as its first positional argument
messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("RMDWLLC/kaiju-coder-7")
model = AutoModelForMultimodalLM.from_pretrained("RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RMDWLLC/kaiju-coder-7 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-7

SGLang

How to use RMDWLLC/kaiju-coder-7 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RMDWLLC/kaiju-coder-7" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RMDWLLC/kaiju-coder-7" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RMDWLLC/kaiju-coder-7 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

kaiju-coder-7 / LOCAL_TEST_INSTRUCTIONS.md

restokes92

Add files using upload-large-folder tool

4ca1eb4 verified 6 days ago


	# Kaiju Coder 7 Local Test Instructions

	Run these commands from the repo root. The public release name is Kaiju Coder 7; internally, this build is backed by the v1.8 adapter at `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands already work locally. The fastest current runtime is vLLM with bitsandbytes quantization on Gojira B, reached over Tailscale through the local OpenCode fast proxy.

	## Run The Local Release-Candidate Gate

	```bash
	python3 scripts/run_kaiju_business_owner_rc_smoke.py
	```

	This validates the reviewed data, checks the v1.7 targets, builds the oversampled business-owner SFT file, smoke-tests the local OpenAI-compatible harness API, runs the hard router suite, and runs the static artifact checks.

	For release status, read `release/COMPLETION_AUDIT.md` and `release/HUGGINGFACE_RELEASE_DRAFT.md`.

	## Merge The v1.8 Adapter

	Use this if the merged full model must be rebuilt:

	```bash
	KAIJU_LORA_ADAPTER=/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter \
	KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged \
	./scripts/run-gojira-b-qwen36-lora-merge.sh
	```

	## Start Kaiju Coder 7 Serving

	Use this for the fastest current model-side candidate (vLLM with runtime bitsandbytes quantization at a 16k context):

	```bash
	KAIJU_VLLM_CONTEXT=16384 \
	KAIJU_VLLM_QUANTIZATION=bitsandbytes \
	KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
	KAIJU_VLLM_GPU_UTIL=0.90 \
	./scripts/start-qwen36-merged-vllm.sh
	```
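The `KAIJU_VLLM_*` variables above presumably end up as `vllm serve` options. The sketch below shows one plausible mapping onto real vLLM flags. The mapping, the `build_vllm_command` helper, and the placeholder model path are assumptions for illustration; `start-qwen36-merged-vllm.sh` remains the source of truth for what is actually passed.

```python
import os
import shlex

# Hypothetical mapping from the KAIJU_VLLM_* variables to vLLM CLI flags.
# The flags exist in vLLM, but whether the start script uses exactly these
# is an assumption; read the script before relying on it.
ENV_TO_FLAG = {
    "KAIJU_VLLM_CONTEXT": "--max-model-len",
    "KAIJU_VLLM_QUANTIZATION": "--quantization",
    "KAIJU_VLLM_LOAD_FORMAT": "--load-format",
    "KAIJU_VLLM_GPU_UTIL": "--gpu-memory-utilization",
}


def build_vllm_command(model_path, env=None, port=18084):
    """Return the argv list for `vllm serve` under the assumed mapping."""
    env = os.environ if env is None else env
    argv = ["vllm", "serve", model_path, "--port", str(port)]
    for name, flag in ENV_TO_FLAG.items():
        value = env.get(name)
        if value:  # skip unset or empty variables
            argv += [flag, value]
    return argv


if __name__ == "__main__":
    example_env = {
        "KAIJU_VLLM_CONTEXT": "16384",
        "KAIJU_VLLM_QUANTIZATION": "bitsandbytes",
        "KAIJU_VLLM_LOAD_FORMAT": "bitsandbytes",
        "KAIJU_VLLM_GPU_UTIL": "0.90",
    }
    # "/path/to/merged-model" is a placeholder, not the real model directory.
    print(shlex.join(build_vllm_command("/path/to/merged-model", example_env)))
```

Printing the command instead of running it makes it easy to compare against what the start script really launches.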

	Confirm readiness:

	```bash
	curl http://100.109.109.14:18084/v1/models
	```
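Large models can take a while to load, so a single `curl` may fail simply because the server is not up yet. A minimal polling sketch follows, assuming the endpoint returns the usual OpenAI-compatible `{"data": [{"id": ...}]}` listing. The helper names `fetch_model_ids` and `wait_until_ready` are hypothetical and not part of the repo.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_model_ids(base_url, timeout=5.0):
    """GET {base_url}/models and return the ids of the served models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


def wait_until_ready(base_url, attempts=30, delay=2.0,
                     fetch=fetch_model_ids, sleep=time.sleep):
    """Poll until the server lists at least one model; return the ids or raise."""
    for attempt in range(attempts):
        try:
            ids = fetch(base_url)
            if ids:
                return ids
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the reply was not valid JSON
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"{base_url} did not become ready after {attempts} attempts")
```

For the endpoint above, the call would be `wait_until_ready("http://100.109.109.14:18084/v1")`.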

	Then keep the Mac-side fast proxy pointed at that vLLM endpoint:

	```bash
	KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
	python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
	```

	The high-context `32768` target has benchmark evidence in
	`release/SERVING_BENCHMARKS.md`, but the current default, chosen for speed, is
	16k runtime-quantized vLLM plus the local fast proxy.

	## Prepare Merged-Model Hugging Face Metadata

	Use this before any review of a full merged-model upload. The
	`prepare_hf_merged_model_metadata.sh` helper syncs release metadata into the
	Gojira B model folder; it neither uploads anything nor reads Hugging Face tokens.
	If the remote merged folder is root-owned, the helper automatically uses
	passwordless sudo for rsync without changing model ownership:

	```bash
	bash scripts/prepare_hf_merged_model_metadata.sh
	KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh
	bash scripts/upload_hf_merged_model_from_gojira_b.sh
	```

	## Install And Smoke OpenCode

	```bash
	python3 scripts/install_kaiju_opencode_profile.py
	opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 \
	--dir /tmp/kaiju-opencode-loopguard-smoke \
	--dangerously-skip-permissions \
	'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'
	```

	The installer writes the `kaiju` provider, the lean `kaiju-coder-7` agent, and
	the scoped no-autocontinue plugin at
	`~/.config/opencode/kaiju-no-autocontinue.mjs`.

	## Run The Deterministic Harness Smoke

	```bash
	python3 scripts/run_kaiju_api_harness_smoke.py
	```

	## Run A Direct Model Eval

	```bash
	python3 evals/run_openai_compat_smoke.py \
	--base-url http://100.109.109.14:18084/v1 \
	--model kaiju-coder-7 \
	--tasks evals/tasks/smoke.jsonl \
	--max-tasks 1 \
	--timeout 300 \
	--max-tokens 768 \
	--temperature 0 \
	--disable-thinking \
	--system-prompt-file prompts/kaiju-coder-api-system.md
	```
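For a quick manual check without the eval runner, the same settings can be sent as a single OpenAI-compatible chat request. This is a sketch under assumptions: `build_chat_request` and `send` are hypothetical helpers, and mapping `--disable-thinking` to vLLM's `chat_template_kwargs` with `enable_thinking: false` is a guess at what the runner does, not something the repo confirms.

```python
import json
import urllib.request


def build_chat_request(base_url, prompt, system_prompt=None, model="kaiju-coder-7",
                       max_tokens=768, temperature=0.0, disable_thinking=True):
    """Build (but do not send) a POST request for /chat/completions."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if disable_thinking:
        # Assumption: the server accepts vLLM's chat_template_kwargs extension
        # and the model's chat template understands enable_thinking.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

A call would look like `send(build_chat_request("http://100.109.109.14:18084/v1", "Write a hello-world script."))`, optionally passing the contents of `prompts/kaiju-coder-api-system.md` as `system_prompt`.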

	To evaluate the selected final business-owner checkpoint, run the focused v1.8
	business-owner pack and then score it. Raw merged-model generation is slow, so
	for practical paid website delivery keep using the harness until broader raw
	website evals pass at acceptable latency:

	```bash
	python3 evals/run_openai_compat_smoke.py \
	--base-url http://100.109.109.14:18084/v1 \
	--model kaiju-coder-7 \
	--tasks evals/tasks/business-owner-v18-comparison.jsonl \
	--timeout 900 \
	--max-tokens 2500 \
	--temperature 0 \
	--disable-thinking \
	--stream \
	--system-prompt-file prompts/kaiju-coder-api-system.md

	python3 evals/score_quality_gate.py runs/evals/<merged-v18-run>/results.jsonl
	```

	Current evidence for the merged model:

	- Probe: `1,155` visible chars in `60.17s`.
	- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
	- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
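To put the latency figures above on a common scale, the character counts can be divided by the elapsed times. The snippet below does only that arithmetic on the three numbers listed; `chars_per_second` is a helper introduced here for illustration.

```python
# (label, visible characters, seconds) copied from the evidence list above
RUNS = [
    ("Probe", 1155, 60.17),
    ("Proposal rerun", 4014, 212.72),
    ("Jah credits backend", 9718, 566.36),
]


def chars_per_second(chars, seconds):
    """Average output rate in visible characters per second."""
    return chars / seconds


for label, chars, seconds in RUNS:
    print(f"{label}: {chars_per_second(chars, seconds):.1f} chars/s")
```

All three runs land at roughly 17 to 19 visible characters per second, which is consistent with the note that raw merged-model generation is slow.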

	## Dynamic LoRA Serving Caveat

	Do not use dynamic SGLang LoRA serving as release evidence for v1.8. The adapter-name-only path can be base-equivalent, that is, its output can be indistinguishable from the base model, and the corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes this SGLang build with a fused-module LoRA buffer shape mismatch. Use the merged full-model path above.

	## Run The Business-Owner Harness

	```bash
	python3 evals/run_router_harness_eval.py --tasks evals/tasks/router-hard-harness.jsonl
	python3 evals/run_router_static_checks.py runs/evals/<router-run>/results.jsonl
	```
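Placeholders such as `<router-run>` and `<merged-v18-run>` stand for the run directory an eval just created. A small sketch for locating the newest one follows, assuming every run writes `runs/evals/<run>/results.jsonl` as the commands above suggest. The `latest_results` helper is hypothetical.

```python
from pathlib import Path


def latest_results(evals_root="runs/evals"):
    """Return the most recently modified results.jsonl under evals_root, or None."""
    candidates = list(Path(evals_root).glob("*/results.jsonl"))
    if not candidates:
        return None
    # Newest file by modification time wins.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    print(latest_results() or "no results.jsonl found under runs/evals")
```

The printed path can then be passed to `evals/run_router_static_checks.py` or `evals/score_quality_gate.py`.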

	## Manual Prompt To Try First

	```text
	Build me the full Kiyomi 7.7.7 AI company operating pack for a local business owner. I need the launch kit, website, content engine, connector checklist, intake CRM, money report, automations, operator handbook, lead generator, sales closer, ROI dashboard, and Workshop golden run. Make it owner-ready with no developer setup required.
	```

	Expected shape of the output:

	- A project folder with multiple files, not advice only.
	- Complete HTML where HTML is requested.
	- Lead/sales CSVs.
	- Connector verification gates.
	- ROI audit gate.
	- Workshop golden-run gate.
	- Clear owner commands such as `/kiyomi` and `/kiyomi-do`.
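Parts of the checklist above can be spot-checked mechanically. The sketch below is a rough heuristic, not the project's quality gate: it only looks for multiple files, unterminated HTML, and the presence of at least one CSV. `check_output_shape` is a hypothetical helper, and the connector, ROI, and Workshop gates still need the real harness checks.

```python
from pathlib import Path


def check_output_shape(project_dir):
    """Return a list of problems found in a generated project folder."""
    root = Path(project_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    problems = []
    if len(files) < 2:
        problems.append(f"expected multiple files, found {len(files)}")
    for page in (p for p in files if p.suffix.lower() == ".html"):
        text = page.read_text(encoding="utf-8", errors="replace").lower()
        # Heuristic: a complete page has both an opening and a closing html tag.
        if "<html" not in text or "</html>" not in text:
            problems.append(f"{page.name}: HTML looks truncated")
    if not any(p.suffix.lower() == ".csv" for p in files):
        problems.append("no CSV files (lead/sales data expected)")
    return problems
```

An empty list means the folder passed these three checks only; it says nothing about content quality.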