# Kaiju Coder 7 Public Testing Quickstart

Kaiju Coder 7 is the public model name. The OpenAI-compatible model id is:

```text
kaiju-coder-7
```

Use this guide for serious public testing. It avoids internal checkpoint names
and keeps the current limitations clear.
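
As a minimal sketch of what "OpenAI-compatible model id" means in practice, the snippet below builds a chat completion request that names `kaiju-coder-7` in the `model` field. It assumes the server follows the standard `/v1/chat/completions` request and response shape; the helper names `build_chat_request` and `send` are illustrative and not part of any Kaiju package.

```python
import json
import urllib.request

MODEL_ID = "kaiju-coder-7"  # public OpenAI-compatible model id from this guide


def build_chat_request(base_url: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice.

    Assumes the standard OpenAI-style response body:
    {"choices": [{"message": {"content": "..."}}]}
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `send(build_chat_request("http://127.0.0.1:18181/v1", "Say hello"))` should return the model's reply. Substitute the base URL of your own endpoint.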

## Pick A Test Path

### Path 1: OpenCode Against An Existing Endpoint

Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
`/v1` endpoint.

```bash
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
cd kaiju-coder-7-opencode
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
```

Then run OpenCode inside the project you want to edit:

```bash
opencode
```

The installer sets `kaiju/kaiju-coder-7` as the OpenCode model and
`kaiju-coder-7` as the default agent. You can still select
`kaiju/kaiju-coder-7` manually from OpenCode's model picker if you switch away.

For a bounded smoke test:

```bash
mkdir -p /tmp/kaiju-public-smoke
opencode run --dir /tmp/kaiju-public-smoke \
  "Create hello.txt with exactly: Kaiju Coder 7 is ready"
```
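
To confirm the smoke test result without opening the file by hand, a small check along these lines may help. It is only a sketch: `smoke_passed` is a hypothetical helper, not the packaged verifier, and it assumes that trailing newlines after the expected text are acceptable.

```python
from pathlib import Path

EXPECTED = "Kaiju Coder 7 is ready"  # text requested in the smoke-test prompt


def smoke_passed(project_dir: str) -> bool:
    """Return True if hello.txt exists in project_dir and holds the expected text."""
    target = Path(project_dir) / "hello.txt"
    if not target.is_file():
        return False
    # Assumption: trailing newlines are tolerated; everything else must match exactly.
    return target.read_text(encoding="utf-8").rstrip("\n") == EXPECTED
```

After the `opencode run` command finishes, `smoke_passed("/tmp/kaiju-public-smoke")` reports whether the file was written as requested.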

Or run the packaged verifier, which checks the installer, live model endpoint,
OpenCode binary, actual file creation, and wrong-directory behavior:

```bash
python3 scripts/run_kaiju_public_opencode_smoke.py
```

The helper installer adds:

- the `kaiju` OpenAI-compatible provider
- `model: kaiju/kaiju-coder-7` and `default_agent: kaiju-coder-7`
- the lean `kaiju-coder-7` OpenCode agent
- Kaiju as the default primary agent, so selecting Kaiju Coder 7 uses the
  hidden fast artifact path without requiring `/kaiju`
- the `kaiju-coder-7-run` router command for fast websites, owner packs, and
  Desktop artifact folders
- the `kaiju_artifact` OpenCode custom tool and `/kaiju` command for routing
  large artifact prompts through the fast local router
- a scoped no-autocontinue plugin that prevents false completion loops after
  compaction or output limits

To generate a website or owner-pack artifact quickly, without waiting on raw
OpenCode multi-file streaming, run:

```bash
kaiju-coder-7-run \
  --no-planner \
  --kind website \
  --out-dir "$HOME/Desktop/Kaiju-Coder-7-Test" \
  --prompt "Build a premium one-page website for Harborline Bookkeeping with pricing, FAQ, and a cleanup-call CTA."
```

Once the helper is installed, OpenCode should use this same command internally
for large website, business-pack, and Desktop-output requests.

Inside OpenCode, `/kaiju` is optional for large generated artifacts. The command
is prompt-backed, but it points the Kaiju agent at the `kaiju_artifact` custom
tool instead of making the model hand-write every file.
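
If you want to drive the `kaiju-coder-7-run` invocation shown earlier from a script, the argument list can be assembled in Python. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, only the flags that appear in this guide are used, and `website` is the only `--kind` value the guide confirms.

```python
import shlex
from pathlib import Path


def build_run_command(kind: str, out_dir: str, prompt: str, use_planner: bool = False) -> list[str]:
    """Assemble the argv for kaiju-coder-7-run using the flags shown in this guide."""
    argv = ["kaiju-coder-7-run"]
    if not use_planner:
        # The guide's example passes --no-planner; leaving it out is untested here.
        argv.append("--no-planner")
    argv += [
        "--kind", kind,
        "--out-dir", str(Path(out_dir).expanduser()),  # expand "~" like the shell would
        "--prompt", prompt,
    ]
    return argv


command = build_run_command(
    kind="website",
    out_dir="~/Desktop/Kaiju-Coder-7-Test",
    prompt="Build a premium one-page website for Harborline Bookkeeping.",
)
print(shlex.join(command))  # show the shell-quoted command without running it
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided the helper is installed and `kaiju-coder-7-run` is on your `PATH`.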

### Path 2: Full Local Weights

Use this if the full `RMDWLLC/kaiju-coder-7` Hugging Face repo has been
uploaded and you have suitable local GPU hardware.

```bash
hf download RMDWLLC/kaiju-coder-7 --local-dir ./kaiju-coder-7
```

Serve the downloaded folder with an OpenAI-compatible local server. Configure
the server to expose:

```text
model id: kaiju-coder-7
base URL: http://127.0.0.1:18084/v1
context: 16384
```
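
Before pointing OpenCode at the server, it can be worth confirming that the expected model id is actually exposed. The sketch below assumes the server implements the standard OpenAI-style `GET /v1/models` listing; `model_ids` and `fetch_model_ids` are illustrative names.

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 10.0) -> list[str]:
    """Query <base_url>/models and return the ids the server reports."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(json.load(response))
```

With the server running, `"kaiju-coder-7" in fetch_model_ids("http://127.0.0.1:18084/v1")` should be true if the model id is configured as described.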

For the fastest OpenCode behavior, run the bundled fast proxy in a separate
terminal and point OpenCode at the proxy:

```bash
KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
  python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
```

Then install the OpenCode helper with:

```bash
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
cd kaiju-coder-7-opencode
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
```

### Path 3: Runtime-Quantized Local Candidate

Use this only if you are comfortable with advanced serving setups. The current
working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
has been converted, but it remains a candidate until its runtime smoke test
passes.

```bash
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
cd kaiju-coder-7-quantized-runtime
```

Read `README.md` in that repo before serving. This path can reduce model memory
at runtime, but it still depends on access to the full Kaiju Coder 7 weights.

## Recommended Test Prompt

Run this from an empty project folder:

```text
Build a launch-ready local service business website and operating pack. Include
index.html, a Stripe checkout safety plan, a CSV parser with tests, a simple CRM
schema, a weekly money report, and a safety/provenance note. Write the files,
not just advice.
```

Expected result:

- files are written in the requested project folder
- `index.html` is complete HTML
- business docs start with Markdown H1 headings
- code includes a test or smoke-check command where practical
- no fake API keys, OAuth tokens, payment secrets, or private customer data
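
Some of the expected results above can be checked mechanically. The following is a rough sketch, not the packaged verifier: `check_project` is a hypothetical helper, it treats every `.md` file as a business doc, and the two secret patterns (Stripe-style `sk_live_`/`sk_test_` keys and AWS-style `AKIA` ids) are illustrative rather than exhaustive.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for your own checks.
SECRET_PATTERNS = [
    re.compile(r"sk_(live|test)_[0-9a-zA-Z]{10,}"),  # Stripe-style secret keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS-style access key ids
]


def check_project(project_dir: str) -> list[str]:
    """Return a list of problems found in the generated project folder."""
    root = Path(project_dir)
    problems = []

    # index.html must exist and contain both an opening and a closing html tag.
    index = root / "index.html"
    if not index.is_file():
        problems.append("index.html is missing")
    else:
        html = index.read_text(encoding="utf-8", errors="replace").lower()
        if "<html" not in html or "</html>" not in html:
            problems.append("index.html is not a complete HTML document")

    # Assumption: every Markdown file is a business doc and must open with an H1.
    for doc in sorted(root.rglob("*.md")):
        lines = doc.read_text(encoding="utf-8", errors="replace").lstrip().splitlines()
        first_line = lines[0] if lines else ""
        if not first_line.startswith("# "):
            problems.append(f"{doc.relative_to(root)} does not start with an H1 heading")

    # Scan every file for strings that look like real credentials.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(pattern.search(text) for pattern in SECRET_PATTERNS):
            problems.append(f"{path.relative_to(root)} contains a secret-looking string")

    return problems
```

An empty list from `check_project(".")` means none of these particular checks failed. It does not cover the test-command or private-data expectations, which still need a manual look.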

## Current Recommended Defaults

- Public model id: `kaiju-coder-7`
- OpenCode context: `16384`
- Output cap for public testing: `2500`
- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
- Current reliable product path: model plus deterministic business-owner
  harness/router plus verifier
- Raw multi-file OpenCode generation: still too slow for broad paid claims;
  use `kaiju-coder-7-run` for fast public website and owner-pack tests while
  broader raw-model latency gates continue
- Paid API: not public until launch preflight passes and the Stripe live-mode
  switch is deliberately completed
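
The context and output-cap defaults above imply a rough prompt budget. The arithmetic below is a sketch that assumes both numbers are token counts and that prompt and output share one context window, which is typical for this kind of serving setup but is not stated in this guide.

```python
CONTEXT_TOKENS = 16384     # OpenCode context default from this guide
OUTPUT_CAP_TOKENS = 2500   # output cap for public testing (assumed to be tokens)


def prompt_budget(context: int = CONTEXT_TOKENS, output_cap: int = OUTPUT_CAP_TOKENS) -> int:
    """Tokens left for the prompt once the full output cap is reserved."""
    if output_cap >= context:
        raise ValueError("output cap must be smaller than the context window")
    return context - output_cap


def fits(prompt_tokens: int) -> bool:
    """True if a prompt of this size leaves room for a full-length reply."""
    return prompt_tokens <= prompt_budget()


print(prompt_budget())  # 16384 - 2500 = 13884
```

Under these assumptions, a conversation (system prompt, tool definitions, and history included) should stay under about 13,884 tokens to leave room for a capped reply.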

## What Not To Claim Yet

Do not claim:

- that raw model weights alone reliably build every business-owner artifact
- that a paid hosted API is generally available
- that persisted quantized weights exist
- that 32k context is the current live default

Do claim:

- Kaiju Coder 7 has a working local/OpenCode release candidate
- the current tested OpenCode default is 16k context
- the helper package includes a lean agent and compaction loop guard
- the helper package includes the `kaiju-coder-7-run` router command for fast
  artifact generation
- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
  non-thinking generation
- the paid API scaffold has tests and a launch preflight, but is not yet public
- the packaged public smoke verifies a fresh OpenCode one-file write before
  public claims are refreshed
- a GGUF Q8_0 candidate exists, but it does not count as evidence of a public
  quantized-weights release until its runtime smoke test passes

## Remaining Caveats Before Broader Claims

- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
- The GGUF Q8_0 candidate still needs a runtime smoke before public
  quantized-weights upload.
- Raw multi-file OpenCode generation is still not the public speed story; use
  the deterministic router/harness for websites and business-owner packs.
- Public paid API launch has approval and preflight evidence, but real customer
  charging still needs a deliberate Stripe live-mode switch and controlled live
  payment verification.
- Do not claim 32k context as the live default until it is freshly restarted
  and re-confirmed.