# Kaiju Coder 7 Paid API Readiness
Do not sell the hosted API as generally available until the gates below pass.
## Current Position
Kaiju Coder 7 can be served locally through an OpenAI-compatible SGLang
endpoint. The reliable commercial product path is:
```text
Kaiju Coder 7 model + deterministic business-owner harness + verifier + gateway controls
```
Raw multi-file OpenCode generation is not yet fast enough to be the paid API
offering on its own. The harnessed customer-readiness pack passes and should be
the paid-route baseline until raw-agent generation improves.
## Required Gateway Behavior
- Use model id `kaiju-coder-7`.
- Disable hidden thinking where the serving stack supports it.
- Stream responses for long outputs.
- Cap maximum output tokens per route.
- Reject requests with secret-looking prompt content when possible.
- Never log API keys, bearer tokens, OAuth tokens, payment credentials, or full
private customer prompts by default.
- Log request ids, customer id, route, token counts, latency, status, and a coarse
failure reason.
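
The payload rules above can be sketched as one normalization step. This is a minimal illustration in Python, not the Worker's actual code: the per-route caps, the route names, and the `chat_template_kwargs.enable_thinking` switch are assumptions, and the serving stack may expose thinking control differently.

```python
from copy import deepcopy

MODEL_ID = "kaiju-coder-7"

# Hypothetical per-route caps; the real limits are not stated in this document.
ROUTE_MAX_TOKENS = {"chat": 2048, "proposal-pack": 4096}
DEFAULT_MAX_TOKENS = 1024


class UnsupportedModel(ValueError):
    """Raised when the caller requests any model other than MODEL_ID."""


def normalize_chat_payload(payload: dict, route: str) -> dict:
    """Return a copy of the request body with gateway policy applied."""
    requested_model = payload.get("model", MODEL_ID)
    if requested_model != MODEL_ID:
        raise UnsupportedModel(str(requested_model))

    body = deepcopy(payload)
    body["model"] = MODEL_ID
    body["stream"] = True  # always stream, whatever the caller asked for

    # Clamp the output budget to the route cap; use the cap itself when the
    # caller sent nothing usable.
    cap = ROUTE_MAX_TOKENS.get(route, DEFAULT_MAX_TOKENS)
    wanted = body.get("max_tokens")
    usable = isinstance(wanted, int) and not isinstance(wanted, bool) and wanted > 0
    body["max_tokens"] = min(wanted, cap) if usable else cap

    # Assumption: the serving stack reads chat_template_kwargs.enable_thinking.
    existing = body.get("chat_template_kwargs")
    template_kwargs = dict(existing) if isinstance(existing, dict) else {}
    template_kwargs["enable_thinking"] = False
    body["chat_template_kwargs"] = template_kwargs
    return body
```

Working on a copy keeps the caller's original request untouched, which helps when the gateway needs to log only coarse facts about what was asked.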
## Billing And Access
- API keys must be scoped per customer/account.
- Stripe subscription or prepaid credit balance must be checked before serving.
- Rate limits must be per key and per account.
- Failed auth and rate-limit events should be logged without prompt content.
- Admin override keys must be separate from customer keys.
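
A fixed-window limiter that enforces both the per-key and the per-account rule might look like the sketch below. It is an in-memory illustration only: the class and function names are hypothetical, the limits are placeholders, and a real KV-backed counter would also need to handle concurrent updates and expire old windows.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Counts hits per subject inside fixed, non-overlapping time windows."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # Old windows are never evicted here; a real store would expire them.
        self._counts: Dict[Tuple[str, int], int] = {}

    def _bucket(self, subject: str) -> Tuple[str, int]:
        return (subject, int(self.clock() // self.window_seconds))

    def remaining(self, subject: str) -> int:
        return self.limit - self._counts.get(self._bucket(subject), 0)

    def hit(self, subject: str) -> None:
        bucket = self._bucket(subject)
        self._counts[bucket] = self._counts.get(bucket, 0) + 1


def allow_request(key_limiter: FixedWindowLimiter,
                  account_limiter: FixedWindowLimiter,
                  key_id: str, account_id: str) -> bool:
    """Allow only when both the key and its account still have quota."""
    key_subject = f"key:{key_id}"
    account_subject = f"account:{account_id}"
    if key_limiter.remaining(key_subject) <= 0:
        return False
    if account_limiter.remaining(account_subject) <= 0:
        return False
    # Consume quota only after both checks pass, so a denied request
    # does not eat into the other counter.
    key_limiter.hit(key_subject)
    account_limiter.hit(account_subject)
    return True
```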
## Current Gateway Scaffold Evidence
Local Worker scaffold:
- `gateway/cloudflare-worker/src/index.js`
- `gateway/cloudflare-worker/migrations/0001_paid_api.sql`
- `gateway/cloudflare-worker/test/index.test.js`
Verified on 2026-06-03 with:
```bash
cd gateway/cloudflare-worker
npm run check
npm run preflight
```
Result: `16/16` Worker tests passed and `17` paid API scaffold preflight checks
passed.
The scaffold preflight also checks that the guarded Cloudflare resource-prep
script, `scripts/prepare_paid_api_cloudflare_resources.sh`, is wired through
`npm run prepare:cloudflare`, and that the reviewed binding template is present.
Live Cloudflare resources were created on 2026-06-03 after Wrangler login:
- D1 `kaiju_api_billing` bound as `KAIJU_BILLING_DB`
- KV `kaiju_rate_limit` bound as `KAIJU_RATE_LIMIT_KV`
- R2 `kaiju-api-artifacts` bound as `KAIJU_ARTIFACT_BUCKET`
- D1 migration `0001_paid_api.sql` applied successfully
The Worker was deployed on 2026-06-03 at:
```text
https://kaiju-api-gateway.kiyomi-api.workers.dev
https://kaiju-api.kiyomikode.com
```
Gojira-B now advertises `kaiju-coder-7` from its public health endpoint. The
origin secret was rotated during launch verification and re-applied to
Cloudflare without writing the value to this repo.
Current launch preflight after custom-domain traffic, Stripe test webhook
staging, and rollback verification:
```text
27 pass / 0 fail / 0 manual
```
Passed live launch evidence:
- Custom domain `https://kaiju-api.kiyomikode.com` resolves to the intended
Kaiju Worker; `/health` returned `200` in the launch evidence probe.
- `KAIJU_ORIGIN_URL`, `KAIJU_ORIGIN_SECRET`, and
`KAIJU_STRIPE_WEBHOOK_SECRET` are present by Wrangler secret name.
- Worker-to-Gojira staging request passed through
`https://kaiju-api.kiyomikode.com/v1/chat/completions` with
`model=kaiju-coder-7`, HTTP `200`, and streaming enabled.
- Paid-route latency through the custom domain was measured over five staging
samples with p95 `14121.18ms`.
- Stripe test-mode `checkout.session.completed` credited the staging API key
using `metadata.kaiju_api_key_id`; duplicate signed delivery returned
`duplicate: true` and did not double-credit.
- Rollback drill succeeded by deploying same-code version
`e838e01d-2d72-4eb7-9814-b95b7e2cef14`, rolling traffic back to verified
version `d37d60d1-7bfc-4ac9-a69c-e9339b5e495f`, and rechecking `/health`.
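
How the p95 figure above was computed is not recorded here. As a reference point, a nearest-rank percentile can be sketched as follows; the method choice is an assumption and the sample values are hypothetical, not the staging measurements.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, for illustration only.
example_ms = [9000.0, 10500.0, 11200.0, 12800.0, 14000.0]
print(percentile_nearest_rank(example_ms, 95))
```

With only five samples, nearest-rank p95 is simply the slowest sample, so a larger sample set would give a more meaningful tail estimate before the latency gate is treated as settled.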
Real-money public charging still needs an explicit Stripe live-mode switch:
create the live Checkout products/links or Sessions, create the live webhook
endpoint for the final API domain, replace `KAIJU_STRIPE_WEBHOOK_SECRET` with
the live webhook signing secret, and run a live-mode penny payment or a controlled
internal payment before advertising paid API access.
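
For context on what the signing secret protects, here is a minimal Python sketch of Stripe's documented `Stripe-Signature` scheme (HMAC-SHA256 over `timestamp.payload`, compared against the `v1` value) plus idempotent crediting. It is not the Worker's code: the `CreditLedger` class and deduplication by event id are assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(raw_body: bytes, header: str, secret: str,
                            tolerance_seconds: int = 300,
                            now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature header of the form 't=<unix>,v1=<hex>[,v1=<hex>]'."""
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [value for key, value in parts if key.strip() == "t"]
    candidates = [value for key, value in parts if key.strip() == "v1"]
    if len(timestamps) != 1 or not timestamps[0].isdigit() or not candidates:
        return False

    # Reject stale deliveries to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamps[0])) > tolerance_seconds:
        return False

    signed_payload = timestamps[0].encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 candidate.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


class CreditLedger:
    """Hypothetical in-memory stand-in for the prepaid credit store."""

    def __init__(self) -> None:
        self.balances = {}
        self.seen_events = set()

    def apply_checkout(self, event_id: str, api_key_id: str, credits: int) -> dict:
        # Assumption: duplicates are detected by Stripe event id.
        if event_id in self.seen_events:
            return {"duplicate": True}
        self.seen_events.add(event_id)
        self.balances[api_key_id] = self.balances.get(api_key_id, 0) + credits
        return {"duplicate": False}
```

The signature must be computed over the raw request bytes, so the body should be verified before any JSON parsing or re-serialization.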
Covered locally:
- missing bearer token returns `401`
- inactive API key returns `403`
- insufficient credits return `402` before origin fetch
- successful chat request forwards `x-kaiju-origin-secret` and debits credits
- origin fetch failure refunds credits
- fixed-window rate limit blocks before debit
- public chat payload is forced to model `kaiju-coder-7`, streaming, thinking
disabled, and token capped
- unsupported model is rejected before debit
- secret-looking prompt content is rejected before debit, origin fetch, or logs
- signed Stripe Checkout webhook credits prepaid balance
- duplicate Stripe Checkout webhook does not double-credit
- invalid Stripe signature is rejected
- origin-only artifact upload stores bounded text artifacts in R2
- authenticated artifact download is scoped to the caller's account namespace
- unsafe artifact paths are rejected before R2 storage
- secret-looking artifact content is rejected before R2 storage
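
The ordering that the chat-route cases above imply can be sketched as a single decision function. This is a Python illustration rather than the Worker source. The `401`, `403`, and `402` codes come from the list; the `429` and `502` codes, the `401` for an unknown token, and the choice to check the rate limit before the balance are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ApiKey:
    account_id: str
    active: bool
    credits: int


def handle_chat(token: Optional[str], keys: Dict[str, ApiKey], cost: int,
                rate_ok: Callable[[str], bool],
                call_origin: Callable[[], None]) -> int:
    """Return the HTTP status the gateway would send for one chat request."""
    if not token:
        return 401  # missing bearer token
    key = keys.get(token)
    if key is None:
        return 401  # assumption: unknown tokens are treated like missing ones
    if not key.active:
        return 403  # inactive API key
    if not rate_ok(token):
        return 429  # assumption: status code for a rate-limit block
    if key.credits < cost:
        return 402  # insufficient credits, decided before any origin fetch

    key.credits -= cost  # debit first so concurrent requests cannot overspend
    try:
        call_origin()
    except Exception:
        key.credits += cost  # refund when the origin fetch fails
        return 502  # assumption: status code for an origin failure
    return 200
```

Every rejection happens before the debit, so only an origin failure needs the refund path.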
Executable preflight:
```bash
python3 scripts/check_kaiju_public_release_readiness.py --mode local
python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release
python3 scripts/check_kaiju_public_release_readiness.py --mode public
python3 scripts/generate_kaiju_final_report.py
python3 scripts/check_kaiju_goal_completion.py --write
python3 scripts/refresh_kaiju_release_evidence.py --skip-opencode-smoke
python3 scripts/check_hf_staging_integrity.py
python3 scripts/check_hf_release_bundle_integrity.py
python3 scripts/collect_hf_release_permission_evidence.py
python3 scripts/check_hf_release_permission_evidence.py
python3 scripts/check_human_release_review.py --mode local
python3 scripts/check_human_release_review.py --mode public
cd gateway/cloudflare-worker
npm run prepare:cloudflare
cd ../..
cp release/cloudflare-bindings.example.json release/cloudflare-bindings.json
# Replace placeholder D1/KV IDs in release/cloudflare-bindings.json first.
python3 scripts/apply_paid_api_cloudflare_bindings.py --bindings-file release/cloudflare-bindings.json
python3 scripts/check_paid_api_readiness.py --mode scaffold
python3 scripts/check_paid_api_readiness.py --mode launch
```
`check_kaiju_public_release_readiness.py --mode local` is the consolidated
public-testing readiness command. `--mode hf-release` checks the downloadable
model/helper release, public Hugging Face evidence, and human review while
keeping live paid charging separate from model publication. `--mode public`
now passes after public HF verification, live Cloudflare resource evidence,
Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
and human review are complete.
`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
the current local/public readiness summaries, launch blockers, changed files,
commands run, and first commands Richard should test. It is part of the release
packet and does not inspect tokens, environment variables, or process command
lines.
`check_kaiju_goal_completion.py --write` writes
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
remain green only while the live runtime, public HF evidence, human review, and
paid API launch evidence continue to pass.
`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
merged-model metadata on Gojira-B, and dry-run upload previews without reading
tokens or uploading anything.
`check_hf_staging_integrity.py` validates the staged Hugging Face package for
required files, public naming hygiene, raw secret-looking values, and staging
checksums. It does not upload, create repos, or print matched secret values.
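
The required-file and checksum parts of that validation can be illustrated with a short sketch. The function names and the manifest shape (relative path mapped to a SHA-256 hex digest) are assumptions; the real script's manifest format is not shown in this document.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_staging(root: Path, manifest: Dict[str, str],
                  required: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the staging folder is consistent."""
    problems = []
    for name in required:
        if not (root / name).is_file():
            problems.append(f"missing required file: {name}")
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append(f"missing checksummed file: {name}")
        elif sha256_file(target) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Reporting only file names and problem kinds, never file contents, matches the rule that the checker must not print matched secret values.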
`check_hf_release_permission_evidence.py` validates sanitized Hugging Face
repo-create evidence in `release/hf-release-permission-evidence.json`. Start
from `release/hf-release-permission-evidence.example.json` only after the
private permission probe succeeds, or use
`scripts/collect_hf_release_permission_evidence.py --apply --write` to run the
probe and write the sanitized evidence automatically. Never include raw auth
output or tokens.
`check_human_release_review.py` reads `release/HUMAN_RELEASE_REVIEW.md`. Local
mode may pass with pending/manual review fields; public mode must fail until
Richard changes the signoff fields to approved decisions.
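
The local/public split can be illustrated as below. The layout of `release/HUMAN_RELEASE_REVIEW.md` is not shown in this document, so the `- Field: value` line format, the field names, and the accepted values in this sketch are all hypothetical.

```python
import re
from typing import Iterable

# Hypothetical signoff line format: "- Field name: value"
FIELD = re.compile(
    r"^-[ \t]*(?P<name>[^:\n]+):[ \t]*(?P<value>[^\n]*?)[ \t]*$",
    re.MULTILINE,
)

LOCAL_OK = {"approved", "pending", "manual"}


def review_passes(markdown: str, mode: str, required_fields: Iterable[str]) -> bool:
    """Local mode tolerates pending/manual fields; public mode needs every field approved."""
    fields = {
        match["name"].strip().lower(): match["value"].strip().lower()
        for match in FIELD.finditer(markdown)
    }
    for name in required_fields:
        value = fields.get(name.lower())
        if value is None:
            return False  # a missing signoff field never passes
        if mode == "public" and value != "approved":
            return False
        if mode == "local" and value not in LOCAL_OK:
            return False
    return True
```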
`npm run prepare:cloudflare` is dry-run safe by default. It prints the exact
Wrangler commands for creating `KAIJU_BILLING_DB`, `KAIJU_RATE_LIMIT_KV`, and
`KAIJU_ARTIFACT_BUCKET`, applying the D1 migration, setting required secrets,
deploying, listing deployments, and exercising rollback. The live resource
creation path has now been run with `KAIJU_CF_RESOURCE_APPLY=1` and
`KAIJU_CF_UPDATE_CONFIG=1`. `npm run check` also runs
`npx wrangler deploy --dry-run` so the current Worker build path is validated
without publishing.
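
The dry-run-by-default pattern can be sketched as follows. The real prep script is a shell script, so this Python version only illustrates the guard: `KAIJU_CF_RESOURCE_APPLY` is the variable named above, while the function name and the injected runner are hypothetical.

```python
import os
import shlex
import subprocess
from typing import Callable, List, Mapping, Optional


def _run(argv: List[str]) -> None:
    subprocess.run(argv, check=True)


def run_or_print(argv: List[str],
                 environ: Optional[Mapping[str, str]] = None,
                 runner: Callable[[List[str]], None] = _run) -> str:
    """Execute argv only when KAIJU_CF_RESOURCE_APPLY=1; otherwise describe it."""
    env = os.environ if environ is None else environ
    command = shlex.join(argv)
    if env.get("KAIJU_CF_RESOURCE_APPLY") != "1":
        return f"DRY RUN: {command}"
    runner(argv)
    return f"APPLIED: {command}"


# With an empty environment this only prints the command it would run.
print(run_or_print(["npx", "wrangler", "d1", "create", "kaiju_api_billing"], environ={}))
```

Requiring the exact value `1` means a stray or empty variable keeps the script in dry-run mode.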
After real D1/KV/R2 resources exist, copy
`release/cloudflare-bindings.example.json` to `release/cloudflare-bindings.json`,
replace the placeholder IDs, and preview the reviewed config update:
```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
--bindings-file release/cloudflare-bindings.json
```
The applier refuses placeholder values and secret-looking input. Only after the
preview is reviewed should it update `gateway/cloudflare-worker/wrangler.jsonc`:
```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
--bindings-file release/cloudflare-bindings.json \
--write
```
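
The placeholder refusal can be illustrated with a small validator. The key names and placeholder patterns below are assumptions; the actual schema of `release/cloudflare-bindings.json` is not shown in this document.

```python
import re
from typing import Dict, List

# Hypothetical markers that suggest a value was never filled in.
PLACEHOLDER = re.compile(r"replace|placeholder|changeme|todo|<[^>]*>", re.IGNORECASE)


def validate_bindings(bindings: Dict[str, object]) -> List[str]:
    """Return problems by key name only, never echoing the offending value."""
    problems = []
    for name, value in bindings.items():
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name}: empty value")
        elif PLACEHOLDER.search(value):
            problems.append(f"{name}: placeholder value")
    return problems
```

An empty result is the precondition for writing the config; any problem should stop the run before `wrangler.jsonc` is touched.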
`--mode scaffold` verifies the local gateway implementation and should pass.
`--mode launch` is stricter and now passes with the custom domain, live
Cloudflare bindings, Wrangler secret-name evidence, Stripe test-mode top-up
staging, Worker-to-Gojira traffic, paid-route latency evidence, and rollback
proof recorded in `release/paid-api-launch-evidence.json`.
Launch evidence is attached through a sanitized JSON file:
```bash
cp release/paid-api-launch-evidence.example.json release/paid-api-launch-evidence.json
python3 scripts/collect_paid_api_launch_evidence.py --help
python3 scripts/check_paid_api_readiness.py --mode launch \
--evidence-file release/paid-api-launch-evidence.json
```
Use `scripts/collect_paid_api_launch_evidence.py` to preview or write sanitized
launch evidence after staging resources exist. It can read the staging API key
from an environment variable for live probes, but it never writes the key, full
prompt, or model response to the evidence file. By default it prints a preview;
pass `--write` only after reviewing the target file path.
Only record secret names, route names, request ids, coarse latency numbers, and
pass/fail facts. Do not put raw API keys, bearer tokens, OAuth tokens, Stripe
secret keys, webhook signing secrets, tunnel credentials, full private prompts,
or customer private data in the evidence file. The checker scans the evidence
file for common secret-looking values and fails launch readiness if it finds
them.
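
A scan of this kind can be sketched with a few prefix-based patterns. The patterns cover well-known token shapes (Stripe `sk_live_`/`sk_test_` and `whsec_` prefixes, Hugging Face `hf_` tokens, bearer headers); the actual checker's pattern list is not shown here, so treat the set and the function names as assumptions.

```python
import re
from pathlib import Path
from typing import List

SECRET_PATTERNS = {
    "stripe_secret_key": re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{10,}"),
    "stripe_webhook_secret": re.compile(r"\bwhsec_[A-Za-z0-9]{10,}"),
    "hf_token": re.compile(r"\bhf_[A-Za-z0-9]{20,}"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{16,}", re.IGNORECASE),
}


def find_secret_kinds(text: str) -> List[str]:
    """Return the names of matching patterns, never the matched values."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text))


def evidence_file_is_clean(path: Path) -> bool:
    """True when the evidence file contains no secret-looking values."""
    return not find_secret_kinds(path.read_text(encoding="utf-8"))
```

Secret names such as `KAIJU_ORIGIN_SECRET` pass this scan, since only value-shaped strings match, which fits the rule of recording names but not values.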
## Minimum API Gates
| Gate | Required Evidence |
| --- | --- |
| Auth | Unauthorized requests fail; valid test key works |
| Billing | Unpaid/suspended account is denied before model call |
| Rate limit | Burst and daily caps work per key |
| Logging | Logs omit secrets and full private prompts |
| Abuse control | Secret-looking payloads and obviously unsafe automation requests are rejected or redacted |
| Artifacts | Origin-only R2 upload and account-scoped artifact download pass |
| Rollback | One command can route traffic back to previous stable model/harness |
| Latency | p95 for paid routes is documented and acceptable |
| Quality | Business-owner eval pack passes with complete files/artifacts |

Current quality evidence:
- Harnessed customer-readiness pack:
`runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4`
passed, `28/28` required files written, including the release provenance and
safety review task.
- Restored 32k SGLang direct API smoke:
`runs/benchmarks/20260603T155233Z-kaiju-coder-7-serving/summary.md`,
identity passed in `2.92s`; business proposal passed in `94.28s` with
`1,737` chars.
- Runtime-quantized vLLM OpenCode smoke:
`bash scripts/run_kaiju_quantized_opencode_smoke.sh` passed at 16k after
vLLM launched with `--enable-auto-tool-choice`; OpenCode wrote
`hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok`.
- Current restored 16k SGLang direct API smoke:
`runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`,
identity passed in `2.3s`.
- Raw OpenCode multi-file pack remains a blocker for raw-agent claims.
## Pricing Assumptions To Validate
- Raw model tokens are slow and expensive enough that per-token pricing alone is
not the right first product.
- Better first API product: priced business-owner routes such as website pack,
proposal pack, ROI/report pack, and Kiyomi operating pack.
- Charge for complete artifacts and verified workflow output, with token usage
as an internal cost-control metric.
## Release Blockers
- Real-money paid API charging still needs a deliberate live-mode Stripe switch
and controlled live payment verification. The technical launch preflight is
green with test-mode Stripe staging evidence.
- Raw OpenCode customer-readiness task currently times out on multi-file work;
the harnessed business-owner route is the reliable first paid API product.
- Harnessed customer-readiness route passes; paid API must route through that
deterministic product path until a faster raw/quantized path passes.
- Context-size benchmarks passed at 12k, 16k, 24k, and 32k, but the current
parked Gojira-B/OpenCode profile is 16k. Treat 32k as the high-context target
to re-confirm after restart before using it as a public default.
- Restored 32k business-document direct API smoke passed, but the `94.28s`
latency is too slow for ungated paid API use without streaming, queueing,
and route-level caps.
- vLLM serving has been tested at 16k, but it is not clearly faster than SGLang
and needs the Gojira nightly image plus text-only launch flags.
- Runtime-quantized vLLM bitsandbytes has passed 8k and 16k identity/code
smoke tests, passed a 16k business-document smoke in `53.44s`, and reduces
model memory to about `17.8 GiB`; its OpenCode one-file smoke now passes.
- Persisted quantized public weights are still pending.
- The hosted gateway now has locally tested API key behavior, a live D1
prepaid-credits binding, a live KV rate-limit binding, a live R2 artifact
binding, model enforcement, secret-content rejection, a custom API domain,
signed Stripe test-mode webhook top-up behavior, rollback evidence, and
custom-domain p95 latency evidence.
- `python3 scripts/check_paid_api_readiness.py --mode launch` currently reports
`27 pass / 0 fail / 0 manual`. This means the technical hosted API path is
launch-ready for controlled testing; real customer charging still needs the
live-mode Stripe switch above.