# Kaiju Coder 7 Paid API Readiness

Do not sell the hosted API as generally available until the gates below pass.

## Current Position

Kaiju Coder 7 can be served locally through an OpenAI-compatible SGLang
endpoint. The reliable commercial product path is:

```text
Kaiju Coder 7 model + deterministic business-owner harness + verifier + gateway controls
```

Raw multi-file OpenCode generation is not yet fast enough to be the paid API
promise by itself. The harnessed customer-readiness pack passes and should be
the paid-route baseline until raw-agent generation improves.

## Required Gateway Behavior

- Use model id `kaiju-coder-7`.
- Disable hidden thinking where the serving stack supports it.
- Stream responses for long outputs.
- Cap max output by route.
- Reject requests with secret-looking prompt content when possible.
- Never log API keys, bearer tokens, OAuth tokens, payment credentials, or full
  private customer prompts by default.
- Keep request ids, customer id, route, token counts, latency, status, and coarse
  failure reason.
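
The gateway rules above can be sketched as a small request normalizer plus a log filter. This is a minimal illustration, not the Worker's actual code: the per-route caps in `ROUTE_MAX_TOKENS`, the `chat_template_kwargs.enable_thinking` switch (assumed to be honored by the serving stack), and the log field names are all assumptions introduced for the example.

```python
import copy

MODEL_ID = "kaiju-coder-7"

# Hypothetical per-route output caps; the real values are not stated here.
ROUTE_MAX_TOKENS = {"chat": 2048, "proposal-pack": 8192}

# Assumed names for the fields the gateway is allowed to keep in logs.
LOG_FIELDS = (
    "request_id", "customer_id", "route", "prompt_tokens",
    "completion_tokens", "latency_ms", "status", "failure_reason",
)


def normalize_payload(payload: dict, route: str) -> dict:
    """Return a copy of the request with gateway-enforced fields applied."""
    requested = payload.get("model", MODEL_ID)
    if requested != MODEL_ID:
        raise ValueError(f"unsupported model: {requested}")
    out = copy.deepcopy(payload)
    out["model"] = MODEL_ID
    out["stream"] = True  # stream long outputs
    cap = ROUTE_MAX_TOKENS[route]
    out["max_tokens"] = min(int(out.get("max_tokens") or cap), cap)
    # Assumption: the serving stack reads this chat-template switch.
    kwargs = dict(out.get("chat_template_kwargs") or {})
    kwargs["enable_thinking"] = False
    out["chat_template_kwargs"] = kwargs
    return out


def log_record(event: dict) -> dict:
    """Keep only allow-listed fields so prompts and credentials never reach logs."""
    return {name: event[name] for name in LOG_FIELDS if name in event}
```

An allow-list for log fields is used instead of a deny-list so that a new request field is excluded from logs until someone deliberately adds it.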
## Billing And Access

- API keys must be scoped per customer/account.
- Stripe subscription or prepaid credit balance must be checked before serving.
- Rate limits must be per key and per account.
- Failed auth and rate-limit events should be logged without prompt content.
- Admin override keys must be separate from customer keys.
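
As an illustration of "per key and per account" limits, here is a minimal fixed-window limiter sketch. The in-memory dict stands in for a KV store, and the class name, bucket prefixes, and limits are hypothetical, not taken from the gateway code.

```python
import time


class FixedWindowLimiter:
    """Count requests per bucket within fixed, non-overlapping time windows."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # Stand-in for KV storage; a real store would expire old windows by TTL.
        self.counts = {}

    def _slot(self, bucket: str):
        return (bucket, int(self.clock() // self.window))

    def has_room(self, bucket: str) -> bool:
        return self.counts.get(self._slot(bucket), 0) < self.limit

    def hit(self, bucket: str) -> None:
        slot = self._slot(bucket)
        self.counts[slot] = self.counts.get(slot, 0) + 1


def allow_request(key_limiter, account_limiter, api_key_id: str, account_id: str) -> bool:
    """Allow only if both the key and its account still have room in this window."""
    key_bucket = f"key:{api_key_id}"
    account_bucket = f"acct:{account_id}"
    if not (key_limiter.has_room(key_bucket) and account_limiter.has_room(account_bucket)):
        return False
    # Count the request only after both checks pass, so a denied
    # request does not consume quota on either limiter.
    key_limiter.hit(key_bucket)
    account_limiter.hit(account_bucket)
    return True
```

Checking both limiters before incrementing either keeps one exhausted limit from silently draining the other.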
## Current Gateway Scaffold Evidence

Local Worker scaffold:

- `gateway/cloudflare-worker/src/index.js`
- `gateway/cloudflare-worker/migrations/0001_paid_api.sql`
- `gateway/cloudflare-worker/test/index.test.js`

Verified on 2026-06-03 with:

```bash
cd gateway/cloudflare-worker
npm run check
npm run preflight
```

Result: `16/16` Worker tests passed and `17` paid API scaffold preflight checks
passed.

The scaffold preflight also checks that the guarded Cloudflare resource-prep
script, `scripts/prepare_paid_api_cloudflare_resources.sh`, is wired through
`npm run prepare:cloudflare`, and that the reviewed binding template is present.

Live Cloudflare resources were created on 2026-06-03 after Wrangler login:

- D1 `kaiju_api_billing` bound as `KAIJU_BILLING_DB`
- KV `kaiju_rate_limit` bound as `KAIJU_RATE_LIMIT_KV`
- R2 `kaiju-api-artifacts` bound as `KAIJU_ARTIFACT_BUCKET`
- D1 migration `0001_paid_api.sql` applied successfully

The Worker was deployed on 2026-06-03 at:

```text
https://kaiju-api-gateway.kiyomi-api.workers.dev
https://kaiju-api.kiyomikode.com
```

Gojira-B now advertises `kaiju-coder-7` from its public health endpoint. The
origin secret was rotated during launch verification and re-applied to
Cloudflare without writing the value to this repo.

Current launch preflight after custom-domain traffic, Stripe test webhook
staging, and rollback verification:

```text
27 pass / 0 fail / 0 manual
```

Passed live launch evidence:

- Custom domain `https://kaiju-api.kiyomikode.com` resolves to the intended
  Kaiju Worker; `/health` returned `200` in the launch evidence probe.
- `KAIJU_ORIGIN_URL`, `KAIJU_ORIGIN_SECRET`, and
  `KAIJU_STRIPE_WEBHOOK_SECRET` are present by Wrangler secret name.
- Worker-to-Gojira staging request passed through
  `https://kaiju-api.kiyomikode.com/v1/chat/completions` with
  `model=kaiju-coder-7`, HTTP `200`, and streaming enabled.
- Paid-route latency through the custom domain was measured over five staging
  samples with p95 `14121.18ms`.
- Stripe test-mode `checkout.session.completed` credited the staging API key
  using `metadata.kaiju_api_key_id`; duplicate signed delivery returned
  `duplicate: true` and did not double-credit.
- Rollback drill succeeded by deploying same-code version
  `e838e01d-2d72-4eb7-9814-b95b7e2cef14`, rolling traffic back to verified
  version `d37d60d1-7bfc-4ac9-a69c-e9339b5e495f`, and rechecking `/health`.
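
The Stripe behavior in the list above (signed delivery, credit by `metadata.kaiju_api_key_id`, no double-credit on duplicates) can be sketched as follows. The signature check follows Stripe's documented `t=...,v1=...` header scheme (HMAC-SHA256 over `timestamp.body`). Everything else is an assumption for illustration: the `kaiju_credits` metadata field is hypothetical, and the dict and set stand in for the D1 tables.

```python
import hashlib
import hmac
import json
import time


def verify_stripe_signature(body: bytes, header: str, secret: str,
                            tolerance: int = 300, now=None) -> bool:
    """Check a Stripe-Signature header against the raw request body."""
    pairs = [item.strip().split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((value for name, value in pairs if name == "t"), None)
    candidates = [value for name, value in pairs if name == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False  # reject stale or replayed deliveries
    signed = timestamp.encode() + b"." + body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, candidate) for candidate in candidates)


def handle_checkout_event(event: dict, balances: dict, seen_event_ids: set) -> dict:
    """Credit a prepaid balance once per Stripe event id."""
    if event.get("type") != "checkout.session.completed":
        return {"ignored": True}
    if event["id"] in seen_event_ids:
        return {"duplicate": True}  # repeated delivery: do not credit again
    metadata = event["data"]["object"]["metadata"]
    key_id = metadata["kaiju_api_key_id"]
    credits = int(metadata["kaiju_credits"])  # hypothetical field name
    balances[key_id] = balances.get(key_id, 0) + credits
    seen_event_ids.add(event["id"])
    return {"duplicate": False, "credited": credits}
```

In a real store, recording the event id and crediting the balance would need to happen in one transaction so that a crash between the two steps cannot double-credit.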

Real-money public charging still needs an explicit Stripe live-mode switch:
create the live Checkout products/links or Sessions, create the live webhook
endpoint for the final API domain, replace `KAIJU_STRIPE_WEBHOOK_SECRET` with
the live webhook signing secret, and run a live-mode penny charge or a
controlled internal payment before advertising paid API access.

Covered locally:

- missing bearer token returns `401`
- inactive API key returns `403`
- insufficient credits return `402` before origin fetch
- successful chat request forwards `x-kaiju-origin-secret` and debits credits
- origin fetch failure refunds credits
- fixed-window rate limit blocks before debit
- public chat payload is forced to model `kaiju-coder-7`, streaming, thinking
  disabled, and token capped
- unsupported model is rejected before debit
- secret-looking prompt content is rejected before debit, origin fetch, or logs
- signed Stripe Checkout webhook credits prepaid balance
- duplicate Stripe Checkout webhook does not double-credit
- invalid Stripe signature is rejected
- origin-only artifact upload stores bounded text artifacts in R2
- authenticated artifact download is scoped to the caller's account namespace
- unsafe artifact paths are rejected before R2 storage
- secret-looking artifact content is rejected before R2 storage
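
The ordering constraints in the chat-route items above can be summarized in one handler sketch. Only the `401`, `403`, and `402` codes come from the list. The `429`, `400`, and `502` codes, the relative order of the pre-debit checks, and all names (`Account`, `handle_chat`, the injected callables) are assumptions made for illustration.

```python
from dataclasses import dataclass

MODEL_ID = "kaiju-coder-7"


@dataclass
class Account:
    active: bool
    credits: int


def handle_chat(token, accounts, payload, *, rate_ok, looks_secret, call_origin, cost=1):
    """Return an HTTP status; every rejection happens before any debit."""
    if not token or token not in accounts:
        return 401  # missing or unknown bearer token
    account = accounts[token]
    if not account.active:
        return 403  # inactive API key
    if not rate_ok(token):
        return 429  # assumed code: rate limit blocks before debit
    if payload.get("model", MODEL_ID) != MODEL_ID:
        return 400  # assumed code: unsupported model rejected before debit
    if looks_secret(payload):
        return 400  # assumed code: rejected before debit, origin fetch, or logs
    if account.credits < cost:
        return 402  # insufficient credits, origin is never called
    account.credits -= cost
    try:
        call_origin(payload)
    except Exception:
        account.credits += cost  # refund when the origin fetch fails
        return 502  # assumed code
    return 200
```

Debiting before the origin call and refunding on failure keeps a slow or failing origin from being usable as a free endpoint, while still making the customer whole.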

Executable preflight:

```bash
python3 scripts/check_kaiju_public_release_readiness.py --mode local
python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release
python3 scripts/check_kaiju_public_release_readiness.py --mode public
python3 scripts/generate_kaiju_final_report.py
python3 scripts/check_kaiju_goal_completion.py --write
python3 scripts/refresh_kaiju_release_evidence.py --skip-opencode-smoke
python3 scripts/check_hf_staging_integrity.py
python3 scripts/check_hf_release_bundle_integrity.py
python3 scripts/collect_hf_release_permission_evidence.py
python3 scripts/check_hf_release_permission_evidence.py
python3 scripts/check_human_release_review.py --mode local
python3 scripts/check_human_release_review.py --mode public
cd gateway/cloudflare-worker
npm run prepare:cloudflare
cd ../..
cp release/cloudflare-bindings.example.json release/cloudflare-bindings.json
# Replace placeholder D1/KV IDs in release/cloudflare-bindings.json first.
python3 scripts/apply_paid_api_cloudflare_bindings.py --bindings-file release/cloudflare-bindings.json
python3 scripts/check_paid_api_readiness.py --mode scaffold
python3 scripts/check_paid_api_readiness.py --mode launch
```

`check_kaiju_public_release_readiness.py --mode local` is the consolidated
public-testing readiness command. `--mode hf-release` checks the downloadable
model/helper release, public Hugging Face evidence, and human review while
keeping live paid charging separate from model publication. `--mode public`
passes now that public HF verification, live Cloudflare resource evidence,
Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
and human review are complete.

`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
the current local/public readiness summaries, launch blockers, changed files,
commands run, and first commands Richard should test. It is part of the release
packet and does not inspect tokens, environment variables, or process command
lines.

`check_kaiju_goal_completion.py --write` writes
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
remain green only while the live runtime, public HF evidence, human review, and
paid API launch evidence continue to pass.

`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
merged-model metadata on Gojira-B, and dry-run upload previews without reading
tokens or uploading anything.

`check_hf_staging_integrity.py` validates the staged Hugging Face package for
required files, public naming hygiene, raw secret-looking values, and staging
checksums. It does not upload, create repos, or print matched secret values.

`check_hf_release_permission_evidence.py` validates sanitized Hugging Face
repo-create evidence in `release/hf-release-permission-evidence.json`. Start
from `release/hf-release-permission-evidence.example.json` only after the
private permission probe succeeds, or use
`scripts/collect_hf_release_permission_evidence.py --apply --write` to run the
probe and write the sanitized evidence automatically. Never include raw auth
output or tokens.

`check_human_release_review.py` reads `release/HUMAN_RELEASE_REVIEW.md`. Local
mode may pass with pending/manual review fields; public mode must fail until
Richard changes the signoff fields to approved decisions.

`npm run prepare:cloudflare` is dry-run safe by default. It prints the exact
Wrangler commands for creating `KAIJU_BILLING_DB`, `KAIJU_RATE_LIMIT_KV`, and
`KAIJU_ARTIFACT_BUCKET`, applying the D1 migration, setting required secrets,
deploying, listing deployments, and exercising rollback. The live resource
creation path has now been run with `KAIJU_CF_RESOURCE_APPLY=1` and
`KAIJU_CF_UPDATE_CONFIG=1`. `npm run check` also runs
`npx wrangler deploy --dry-run` so the current Worker build path is validated
without publishing.

After real D1/KV/R2 resources exist, copy
`release/cloudflare-bindings.example.json` to `release/cloudflare-bindings.json`,
replace the placeholder IDs, and preview the reviewed config update:

```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json
```

The applier refuses placeholder values and secret-looking input. Only after the
preview is reviewed should it update `gateway/cloudflare-worker/wrangler.jsonc`:

```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json \
  --write
```
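
The preview-then-write flow above can be sketched as follows. This is not the real applier: the placeholder heuristic, the function names, and the plain-JSON output are assumptions, and the sketch covers only the placeholder refusal, not the secret-looking-input check or the `wrangler.jsonc` update.

```python
import json
from pathlib import Path


def is_placeholder(value: str) -> bool:
    """Hypothetical heuristic for template values that were never filled in."""
    lowered = value.strip().lower()
    return (
        not lowered
        or lowered.startswith("<")
        or "replace" in lowered
        or "placeholder" in lowered
    )


def apply_bindings(bindings: dict, config_path: Path, write: bool = False) -> str:
    """Return a preview of the bindings; touch the file only when write=True."""
    bad = sorted(name for name, value in bindings.items() if is_placeholder(str(value)))
    if bad:
        raise ValueError("placeholder values for: " + ", ".join(bad))
    preview = json.dumps(bindings, indent=2, sort_keys=True)
    if write:
        config_path.write_text(preview + "\n", encoding="utf-8")
    return preview
```

Making the write path opt-in means an accidental run of the command can only print, never change the reviewed config.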

`--mode scaffold` verifies the local gateway implementation and should pass.
`--mode launch` is stricter and now passes with the custom domain, live
Cloudflare bindings, Wrangler secret-name evidence, Stripe test-mode top-up
staging, Worker-to-Gojira traffic, paid-route latency evidence, and rollback
proof recorded in `release/paid-api-launch-evidence.json`.

Launch evidence is attached through a sanitized JSON file:

```bash
cp release/paid-api-launch-evidence.example.json release/paid-api-launch-evidence.json
python3 scripts/collect_paid_api_launch_evidence.py --help
python3 scripts/check_paid_api_readiness.py --mode launch \
  --evidence-file release/paid-api-launch-evidence.json
```

Use `scripts/collect_paid_api_launch_evidence.py` to preview or write sanitized
launch evidence after staging resources exist. It can read the staging API key
from an environment variable for live probes, but it never writes the key, full
prompt, or model response to the evidence file. By default it prints a preview;
pass `--write` only after reviewing the target file path.

Only record secret names, route names, request ids, coarse latency numbers, and
pass/fail facts. Do not put raw API keys, bearer tokens, OAuth tokens, Stripe
secret keys, webhook signing secrets, tunnel credentials, full private prompts,
or customer private data in the evidence file. The checker scans the evidence
file for common secret-looking values and fails launch readiness if it finds
them.
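
A scan of this kind might look like the sketch below. The pattern set is an assumption, not the checker's actual rule list; it relies only on well-known public prefixes (`sk_live_`/`sk_test_`, `whsec_`, `hf_`) and a generic bearer-token shape. It reports pattern names only, so a match is never echoed back.

```python
import json
import re

# Assumed patterns based on well-known public token prefixes.
SECRET_PATTERNS = {
    "stripe_secret_key": re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{16,}"),
    "stripe_webhook_secret": re.compile(r"\bwhsec_[A-Za-z0-9]{16,}"),
    "hf_token": re.compile(r"\bhf_[A-Za-z0-9]{20,}"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{20,}"),
}


def scan_evidence(evidence) -> list:
    """Return the names of patterns that match anywhere in the evidence."""
    text = json.dumps(evidence)
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text))


def evidence_is_clean(evidence) -> bool:
    """Launch readiness should fail when any pattern matches."""
    return not scan_evidence(evidence)
```

Secret names such as `KAIJU_STRIPE_WEBHOOK_SECRET` pass this scan because they lack the value prefixes, which matches the rule that names may be recorded but values may not.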

## Minimum API Gates

| Gate | Required Evidence |
| --- | --- |
| Auth | Unauthorized requests fail; valid test key works |
| Billing | Unpaid/suspended account is denied before model call |
| Rate limit | Burst and daily caps work per key |
| Logging | Logs omit secrets and full private prompts |
| Abuse control | Secret-looking payloads and obviously unsafe automation requests are rejected or redacted |
| Artifacts | Origin-only R2 upload and account-scoped artifact download pass |
| Rollback | One command can route traffic back to previous stable model/harness |
| Latency | p95 for paid routes is documented and acceptable |
| Quality | Business-owner eval pack passes with complete files/artifacts |
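
For the latency gate, a p95 over a small sample set can be computed with the nearest-rank method, sketched below. Which percentile method the launch evidence probe actually uses is not stated in this document, so treat the method as an assumption; the function name is hypothetical.

```python
def percentile_nearest_rank(samples_ms, percent: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least percent% at or below it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of percent * n / 100, avoiding float rounding.
    rank = max((percent * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]
```

With only five samples, nearest-rank p95 is simply the slowest sample, so a single outlier sets the reported figure. More samples would make the number more meaningful.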

Current quality evidence:

- Harnessed customer-readiness pack:
  `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4`
  passed, `28/28` required files written, including the release provenance and
  safety review task.
- Restored 32k SGLang direct API smoke:
  `runs/benchmarks/20260603T155233Z-kaiju-coder-7-serving/summary.md`,
  identity passed in `2.92s`; business proposal passed in `94.28s` with
  `1,737` chars.
- Runtime-quantized vLLM OpenCode smoke:
  `bash scripts/run_kaiju_quantized_opencode_smoke.sh` passed at 16k after
  vLLM launched with `--enable-auto-tool-choice`; OpenCode wrote
  `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok`.
- Current restored 16k SGLang direct API smoke:
  `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`,
  identity passed in `2.3s`.
- Raw OpenCode multi-file pack remains a blocker for raw-agent claims.

## Pricing Assumptions To Validate

- Raw model tokens are slow and expensive enough that per-token pricing alone is
  not the right first product.
- Better first API product: priced business-owner routes such as website pack,
  proposal pack, ROI/report pack, and Kiyomi operating pack.
- Charge for complete artifacts and verified workflow output, with token usage
  as an internal cost-control metric.

## Release Blockers

- Real-money paid API charging still needs a deliberate live-mode Stripe switch
  and controlled live payment verification. The technical launch preflight is
  green with test-mode Stripe staging evidence.
- Raw OpenCode customer-readiness task currently times out on multi-file work;
  the harnessed business-owner route is the reliable first paid API product.
- Harnessed customer-readiness route passes; paid API must route through that
  deterministic product path until a faster raw/quantized path passes.
- Context-size benchmarks passed at 12k, 16k, 24k, and 32k, but the current
  parked Gojira-B/OpenCode profile is 16k. Treat 32k as the high-context target
  to re-confirm after restart before using it as a public default.
- Restored 32k business-document direct API smoke passed, but the `94.28s`
  latency is too slow for ungated paid API use without streaming, queueing,
  and route-level caps.
- vLLM serving has been tested at 16k, but it is not clearly faster than SGLang
  and needs the Gojira nightly image plus text-only launch flags.
- Runtime-quantized vLLM bitsandbytes has passed 8k and 16k identity/code
  smoke tests, passed a 16k business-document smoke in `53.44s`, and reduces
  model memory to about `17.8 GiB`; its OpenCode one-file smoke now passes.
- Persisted quantized public weights are still pending.
- Hosted gateway now has local-tested API key behavior, live D1 prepaid credits
  binding, live KV rate-limit binding, live R2 artifact binding, model
  enforcement, secret-content rejection, custom API domain, signed Stripe
  test-mode webhook top-up behavior, rollback evidence, and custom-domain p95
  latency evidence.
- `python3 scripts/check_paid_api_readiness.py --mode launch` currently reports
  `27 pass / 0 fail / 0 manual`. This means the technical hosted API path is
  launch-ready for controlled testing only; real customer charging still needs
  the live-mode Stripe switch described in the first item of this list.