# Kaiju Coder 7 Local Test Instructions

Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
## Run The Local Release-Candidate Gate

```bash
python3 scripts/run_kaiju_business_owner_rc_smoke.py
```

This validates reviewed data, checks v1.7 targets, builds the oversampled business-owner SFT file, smoke-tests the local OpenAI-compatible harness API, runs the hard router suite, and runs static artifact checks.

For release status, read `release/COMPLETION_AUDIT.md` and `release/HUGGINGFACE_RELEASE_DRAFT.md`.
## Merge The v1.8 Adapter

Use this if the merged full model must be rebuilt:

```bash
KAIJU_LORA_ADAPTER=/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter \
KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged \
./scripts/run-gojira-b-qwen36-lora-merge.sh
```
## Start Kaiju Coder 7 Serving

Use this for the fastest current model-side candidate:

```bash
KAIJU_VLLM_CONTEXT=16384 \
KAIJU_VLLM_QUANTIZATION=bitsandbytes \
KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
KAIJU_VLLM_GPU_UTIL=0.90 \
./scripts/start-qwen36-merged-vllm.sh
```

Confirm readiness:

```bash
curl http://100.109.109.14:18084/v1/models
```
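A large merged model can take a while to load, so a single `curl` may fail before the server is up. The following is a minimal sketch of polling the same endpoint from Python, assuming the server returns the usual OpenAI-style model list (`{"data": [{"id": ...}]}`). The function names `model_ids` and `wait_until_ready`, and the retry defaults, are illustrative and not part of the repo.

```python
import json
import time
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def wait_until_ready(base_url: str, attempts: int = 30, delay_s: float = 10.0) -> list[str]:
    """Poll {base_url}/models until at least one model is listed, or give up.

    The attempt count and delay are assumed defaults; tune them to the
    observed load time of the model.
    """
    url = base_url.rstrip("/") + "/models"
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                ids = model_ids(json.load(resp))
            if ids:
                return ids
        except (OSError, ValueError):
            # Server unreachable, still loading, or returned non-JSON.
            pass
        time.sleep(delay_s)
    raise TimeoutError(f"no models listed at {url} after {attempts} attempts")
```

Calling `wait_until_ready("http://100.109.109.14:18084/v1")` would return the served model ids once the endpoint responds.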
Then keep the Mac-side fast proxy pointed at that vLLM endpoint:

```bash
KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
```

The high-context `32768` target has benchmark evidence in
`release/SERVING_BENCHMARKS.md`, but the current default path, chosen for
speed, is 16k (`16384`) runtime-quantized vLLM plus the local fast proxy.
## Prepare Merged-Model Hugging Face Metadata

Use this before any full merged-model upload review. The metadata helper syncs
release metadata into the Gojira B model folder but does not upload anything or
read Hugging Face tokens. If the remote merged folder is root-owned, the helper
automatically uses passwordless sudo for rsync without changing model ownership:

```bash
bash scripts/prepare_hf_merged_model_metadata.sh
KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh
bash scripts/upload_hf_merged_model_from_gojira_b.sh
```
## Install And Smoke OpenCode

```bash
python3 scripts/install_kaiju_opencode_profile.py
opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 \
  --dir /tmp/kaiju-opencode-loopguard-smoke \
  --dangerously-skip-permissions \
  'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'
```

The installer writes the `kaiju` provider, the lean `kaiju-coder-7` agent, and
the scoped no-autocontinue plugin at
`~/.config/opencode/kaiju-no-autocontinue.mjs`.
## Run The Deterministic Harness Smoke

```bash
python3 scripts/run_kaiju_api_harness_smoke.py
```
## Run A Direct Model Eval

```bash
python3 evals/run_openai_compat_smoke.py \
  --base-url http://100.109.109.14:18084/v1 \
  --model kaiju-coder-7 \
  --tasks evals/tasks/smoke.jsonl \
  --max-tasks 1 \
  --timeout 300 \
  --max-tokens 768 \
  --temperature 0 \
  --disable-thinking \
  --system-prompt-file prompts/kaiju-coder-api-system.md
```
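For orientation, here is a rough sketch of the kind of request such a smoke run sends to an OpenAI-compatible `/chat/completions` endpoint, using the same model name, token limit, and temperature as the flags above. It is not the implementation of `evals/run_openai_compat_smoke.py`: the helper names are hypothetical, and how `--disable-thinking` maps onto the request is not shown because the source does not say.

```python
import json
import urllib.request


def build_chat_request(base_url, model, system_prompt, user_prompt,
                       max_tokens=768, temperature=0.0):
    """Build a POST request for an OpenAI-compatible chat completion."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0 keeps the smoke run as repeatable as possible
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def visible_text(response_body: dict) -> str:
    """Return the assistant text from a non-streaming chat completion response."""
    return response_body["choices"][0]["message"]["content"] or ""
```

Sending the request is then `urllib.request.urlopen(req, timeout=300)`, with the timeout matching the `--timeout 300` flag.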
For the selected final business-owner checkpoint, run the focused v1.8
business-owner pack and then score it. Raw merged model generation is slow, so
use the harness for practical paid website delivery until broader raw website
evals pass at acceptable latency:

```bash
python3 evals/run_openai_compat_smoke.py \
  --base-url http://100.109.109.14:18084/v1 \
  --model kaiju-coder-7 \
  --tasks evals/tasks/business-owner-v18-comparison.jsonl \
  --timeout 900 \
  --max-tokens 2500 \
  --temperature 0 \
  --disable-thinking \
  --stream \
  --system-prompt-file prompts/kaiju-coder-api-system.md
python3 evals/score_quality_gate.py runs/evals/<merged-v18-run>/results.jsonl
```

Current merged evidence:

- Probe: `1,155` visible chars in `60.17s`.
- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
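To put the latency figures above on a common scale, the short calculation below converts each run into visible characters per second. It is simple arithmetic on the numbers listed, not an additional measurement. The rate is end to end, so it includes any prompt-processing time, and it counts characters rather than tokens.

```python
# (label, visible characters, wall-clock seconds) copied from the evidence list
RUNS = [
    ("Probe", 1155, 60.17),
    ("Proposal rerun", 4014, 212.72),
    ("Jah credits backend", 9718, 566.36),
]


def chars_per_second(chars: int, seconds: float) -> float:
    """End-to-end output rate in visible characters per second."""
    return chars / seconds


RATES = {label: round(chars_per_second(c, s), 1) for label, c, s in RUNS}

for label, rate in RATES.items():
    print(f"{label}: {rate} chars/s")
# Probe: 19.2 chars/s
# Proposal rerun: 18.9 chars/s
# Jah credits backend: 17.2 chars/s
```

All three runs land at roughly 17 to 19 visible characters per second, which is consistent with the note that raw merged-model generation is slow.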
## Dynamic LoRA Serving Caveat

Do not use dynamic SGLang LoRA serving as release evidence for v1.8. Selecting the adapter by name only can produce base-equivalent output, and the corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes this SGLang build with a fused-module LoRA buffer shape mismatch. Use the merged full-model path above instead.
## Run The Business-Owner Harness

```bash
python3 evals/run_router_harness_eval.py --tasks evals/tasks/router-hard-harness.jsonl
python3 evals/run_router_static_checks.py runs/evals/<router-run>/results.jsonl
```
## Manual Prompt To Try First

```text
Build me the full Kiyomi 7.7.7 AI company operating pack for a local business owner. I need the launch kit, website, content engine, connector checklist, intake CRM, money report, automations, operator handbook, lead generator, sales closer, ROI dashboard, and Workshop golden run. Make it owner-ready with no developer setup required.
```

Expected shape:

- A project folder with multiple files, not advice only.
- Complete HTML where HTML is requested.
- Lead/sales CSVs.
- Connector verification gates.
- ROI audit gate.
- Workshop golden-run gate.
- Clear owner commands such as `/kiyomi` and `/kiyomi-do`.
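A few of these expectations can be checked mechanically before a manual review. The sketch below is a hypothetical helper, not one of the repo's static checks. It covers only the file-level items (multiple files, at least one CSV, HTML files that open and close an `<html>` element) and assumes the model wrote its output into a single project directory. The connector, ROI, and Workshop gates still need a human or the repo's own checks.

```python
from pathlib import Path


def check_output_shape(project_dir: str) -> dict[str, bool]:
    """Heuristic file-level checks on a generated project folder."""
    root = Path(project_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    html_files = [p for p in files if p.suffix.lower() in {".html", ".htm"}]

    def looks_complete(path: Path) -> bool:
        # Crude completeness test: the document opens and closes <html>.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        return "<html" in text and "</html>" in text

    return {
        "multiple_files": len(files) > 1,
        "has_csv": any(p.suffix.lower() == ".csv" for p in files),
        # True when there are no HTML files at all (nothing to be incomplete).
        "html_complete": all(looks_complete(p) for p in html_files),
    }
```

A result with any `False` value is a prompt to inspect the folder by hand. A result with all `True` values says only that the basic shape is present, not that the content is owner-ready.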