Instructions to use RMDWLLC/kaiju-coder-7 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RMDWLLC/kaiju-coder-7 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("RMDWLLC/kaiju-coder-7")
model = AutoModelForMultimodalLM.from_pretrained("RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use RMDWLLC/kaiju-coder-7 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-7

SGLang

How to use RMDWLLC/kaiju-coder-7 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RMDWLLC/kaiju-coder-7" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RMDWLLC/kaiju-coder-7" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RMDWLLC/kaiju-coder-7 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

kaiju-coder-7 / PUBLIC_TESTING_QUICKSTART.md

restokes92

Refresh Kaiju Coder 7 release metadata and RMDW logo

38fd27e verified 5 days ago

preview code

raw

history blame contribute delete

6.96 kB

	# Kaiju Coder 7 Public Testing Quickstart

	Kaiju Coder 7 is the public model name. The OpenAI-compatible model id is:

	```text
	kaiju-coder-7
	```

	Use this guide for serious public testing. It avoids internal checkpoint names
	and keeps the current limitations clear.

	## Pick A Test Path

	### Path 1: OpenCode Against An Existing Endpoint

	Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
	`/v1` endpoint.

	```bash
	git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
	cd kaiju-coder-7-opencode
	python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
	```

	Then run OpenCode inside the project you want to edit:

	```bash
	opencode
	```

	The installer sets `kaiju/kaiju-coder-7` as the OpenCode model and
	`kaiju-coder-7` as the default agent. You can still select
	`kaiju/kaiju-coder-7` manually from OpenCode's model picker if you switch away.

	For a bounded smoke test:

	```bash
	mkdir -p /tmp/kaiju-public-smoke
	opencode run --dir /tmp/kaiju-public-smoke \
	"Create hello.txt with exactly: Kaiju Coder 7 is ready"
	```

	Or run the packaged verifier, which checks the installer, live model endpoint,
	OpenCode binary, actual file creation, and wrong-directory behavior:

	```bash
	python3 scripts/run_kaiju_public_opencode_smoke.py
	```

	The helper installer adds:

	- the `kaiju` OpenAI-compatible provider
	- `model: kaiju/kaiju-coder-7` and `default_agent: kaiju-coder-7`
	- the lean `kaiju-coder-7` OpenCode agent
	- Kaiju as the default primary agent, so selecting Kaiju Coder 7 uses the
	hidden fast artifact path without requiring `/kaiju`
	- the `kaiju-coder-7-run` router command for fast websites, owner packs, and
	Desktop artifact folders
	- the `kaiju_artifact` OpenCode custom tool and `/kaiju` command for routing
	large artifact prompts through the fast local router
	- a scoped no-autocontinue plugin that prevents false completion loops after
	compaction or output limits

	For a fast website or owner-pack artifact without waiting on raw OpenCode
	multi-file streaming, run:

	```bash
	kaiju-coder-7-run \
	--no-planner \
	--kind website \
	--out-dir "$HOME/Desktop/Kaiju-Coder-7-Test" \
	--prompt "Build a premium one-page website for Harborline Bookkeeping with pricing, FAQ, and a cleanup-call CTA."
	```

	OpenCode should use this same command internally for large website,
	business-pack, and Desktop-output requests after the helper is installed.

	Inside OpenCode, `/kaiju` is optional for large generated artifacts. The command
	is prompt-backed, but it points the Kaiju agent at the `kaiju_artifact` custom
	tool instead of making the model hand-write every file.

	### Path 2: Full Local Weights

	Use this if the full `RMDWLLC/kaiju-coder-7` Hugging Face repo has been
	uploaded and you have suitable local GPU hardware.

	```bash
	hf download RMDWLLC/kaiju-coder-7 --local-dir ./kaiju-coder-7
	```

	Serve the downloaded folder with an OpenAI-compatible local server. Configure
	the server to expose:

	```text
	model id: kaiju-coder-7
	base URL: http://127.0.0.1:18084/v1
	context: 16384
	```

	For the fastest OpenCode behavior, run the bundled fast proxy in a separate
	terminal and point OpenCode at the proxy:

	```bash
	KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
	python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
	```

	Then install the OpenCode helper with:

	```bash
	git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
	cd kaiju-coder-7-opencode
	python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
	```

	### Path 3: Runtime-Quantized Local Candidate

	Use this only if you are comfortable with advanced serving setups. The current
	working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
	has been converted, but it is still a candidate until runtime smoke passes.

	```bash
	git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
	cd kaiju-coder-7-quantized-runtime
	```

	Read `README.md` in that repo before serving. This path can reduce model memory
	at runtime, but it still depends on access to the full Kaiju Coder 7 weights.

	## Recommended Test Prompt

	Run this from an empty project folder:

	```text
	Build a launch-ready local service business website and operating pack. Include
	index.html, a Stripe checkout safety plan, a CSV parser with tests, a simple CRM
	schema, a weekly money report, and a safety/provenance note. Write the files,
	not just advice.
	```

	Expected result:

	- files are written in the requested project folder
	- `index.html` is complete HTML
	- business docs start with Markdown H1 headings
	- code includes a test or smoke-check command where practical
	- no fake API keys, OAuth tokens, payment secrets, or private customer data

	## Current Recommended Defaults

	- Public model id: `kaiju-coder-7`
	- OpenCode context: `16384`
	- Output cap for public testing: `2500`
	- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
	- Current reliable product path: model plus deterministic business-owner
	harness/router plus verifier
	- Raw multi-file OpenCode generation: still too slow for broad paid claims;
	use `kaiju-coder-7-run` for fast public website and owner-pack tests while
	broader raw-model latency gates continue
	- Paid API: not public until launch preflight passes and the Stripe live-mode
	switch is deliberately completed

	## What Not To Claim Yet

	Do not claim:

	- that raw model weights alone reliably build every business-owner artifact
	- that a paid hosted API is generally available
	- that persisted quantized weights exist
	- that 32k context is the current live default

	Do claim:

	- Kaiju Coder 7 has a working local/OpenCode release candidate
	- the current tested OpenCode default is 16k context
	- the helper package includes a lean agent and compaction loop guard
	- the helper package includes the `kaiju-coder-7-run` router command for fast
	artifact generation
	- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
	non-thinking generation
	- the paid API scaffold has tests and a launch preflight, but is not yet public
	- the packaged public smoke verifies a fresh OpenCode one-file write before
	public claims are refreshed
	- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
	evidence until runtime smoke passes

	## Remaining Caveats Before Broader Claims

	- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
	- The GGUF Q8_0 candidate still needs a runtime smoke before public
	quantized-weights upload.
	- Raw multi-file OpenCode generation is still not the public speed story; use
	the deterministic router/harness for websites and business-owner packs.
	- Public paid API launch has approval and preflight evidence, but real customer
	charging still needs a deliberate Stripe live-mode switch and controlled live
	payment verification.
	- Do not claim 32k context as the live default until it is freshly restarted
	and re-confirmed.