Instructions for using RMDWLLC/kaiju-coder-7 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RMDWLLC/kaiju-coder-7 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("RMDWLLC/kaiju-coder-7")
model = AutoModelForMultimodalLM.from_pretrained("RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use RMDWLLC/kaiju-coder-7 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RMDWLLC/kaiju-coder-7",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
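The curl calls above return one non-streamed response, but the gateway rules below require streaming for long outputs. With `"stream": true`, OpenAI-compatible servers send server-sent events: `data: {json}` lines carrying `choices[].delta.content`, and the stream ends with `data: [DONE]`. The following minimal sketch joins that stream into text. The helper name `collect_stream_text` is hypothetical, and the sketch parses lines that have already been received, so it runs without a server.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join delta text from OpenAI-compatible SSE chat-completion chunks.

    Hypothetical helper: `lines` are the raw text lines of the HTTP response body.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment/keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:  # role-only or null-content deltas carry no text
                parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
        "",
        'data: {"choices":[{"index":0,"delta":{"content":"Paris"}}]}',
        'data: {"choices":[{"index":0,"delta":{"content":"."}}]}',
        "data: [DONE]",
    ]
    print(collect_stream_text(sample))  # prints "Paris."
```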

Use Docker

```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

SGLang

How to use RMDWLLC/kaiju-coder-7 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RMDWLLC/kaiju-coder-7" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RMDWLLC/kaiju-coder-7",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RMDWLLC/kaiju-coder-7" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RMDWLLC/kaiju-coder-7",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use RMDWLLC/kaiju-coder-7 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

kaiju-coder-7 / PAID_API_READINESS.md (commit 4ca1eb4 by restokes92: "Add files using upload-large-folder tool")


Kaiju Coder 7 Paid API Readiness

Do not sell the hosted API as generally available until the gates below pass.

Current Position

Kaiju Coder 7 can be served locally through an OpenAI-compatible SGLang endpoint. The reliable commercial product path is:

Kaiju Coder 7 model + deterministic business-owner harness + verifier + gateway controls

Raw multi-file OpenCode generation is not yet fast enough to carry the paid API promise on its own. The harnessed customer-readiness pack passes, so it should be the paid-route baseline until raw-agent generation improves.

Required Gateway Behavior

Use model id kaiju-coder-7.
Disable hidden thinking where the serving stack supports it.
Stream responses for long outputs.
Cap max output by route.
Reject requests with secret-looking prompt content when possible.
Never log API keys, bearer tokens, OAuth tokens, payment credentials, or full private customer prompts by default.
Record request IDs, customer ID, route, token counts, latency, status, and a coarse failure reason.
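Here is a minimal sketch of the gateway rules above, written in Python for illustration even though the actual scaffold is a Cloudflare Worker in JavaScript. It assumes the following:
- The per-route caps in `ROUTE_MAX_TOKENS` and all helper names are hypothetical.
- Passing `chat_template_kwargs={"enable_thinking": False}` to disable hidden thinking is an assumption. It is how Qwen3-style chat templates are commonly switched off through vLLM and SGLang, so confirm what the origin stack accepts.

A request naming a different model is rejected rather than rewritten. This matches the locally covered behavior listed further down.

```python
import time
import uuid

MODEL_ID = "kaiju-coder-7"
# Hypothetical per-route output caps; real values are a product decision.
ROUTE_MAX_TOKENS = {"chat": 4096, "proposal-pack": 8192}


def normalize_chat_payload(payload: dict, route: str) -> dict:
    """Force the public model id, streaming, no hidden thinking, and a token cap."""
    if payload.get("model") not in (None, MODEL_ID):
        raise ValueError("unsupported model")  # reject before any debit
    cap = ROUTE_MAX_TOKENS[route]
    requested = int(payload.get("max_tokens") or cap)
    out = dict(payload)
    out["model"] = MODEL_ID
    out["stream"] = True
    # Assumption: the origin honors chat_template_kwargs (vLLM and SGLang accept it).
    out["chat_template_kwargs"] = {"enable_thinking": False}
    out["max_tokens"] = max(1, min(requested, cap))
    return out


def build_log_record(customer_id, route, prompt_tokens, completion_tokens,
                     started, status, failure=None) -> dict:
    """Log metadata only: no API keys, bearer tokens, or prompt text.

    `started` is the time.monotonic() value taken when the request arrived.
    """
    return {
        "request_id": str(uuid.uuid4()),
        "customer_id": customer_id,
        "route": route,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": round((time.monotonic() - started) * 1000, 2),
        "status": status,
        "failure_reason": failure,  # coarse label, e.g. "origin_timeout"
    }
```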

Billing And Access

API keys must be scoped per customer/account.
Stripe subscription or prepaid credit balance must be checked before serving.
Rate limits must be per key and per account.
Failed auth and rate-limit events should be logged without prompt content.
Admin override keys must be separate from customer keys.
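The billing and access rules above imply a fixed order of checks before any origin call. The following minimal in-memory sketch uses hypothetical `Account` and `Gateway` types in place of the real D1 credit table and KV rate-limit counters. The status codes follow the locally covered cases listed further down: 401 for a missing key, 403 for an inactive key, and 402 for insufficient credits. Those cases also require that rate limiting happens before any debit and that a failed origin fetch is refunded. Returning 429 for rate limiting and 502 for origin failure are assumptions, because the document does not name those codes.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Account:
    active: bool
    credits: int


@dataclass
class Gateway:
    keys: dict                 # api_key -> account_id (hypothetical key store)
    accounts: dict             # account_id -> Account
    limit_per_window: int = 60
    window_seconds: int = 60
    counters: dict = field(default_factory=dict)

    def _allow(self, bucket: str, now: float) -> bool:
        """Fixed-window counter: at most limit_per_window hits per window."""
        window = int(now // self.window_seconds)
        count = self.counters.get((bucket, window), 0)
        if count >= self.limit_per_window:
            return False
        self.counters[(bucket, window)] = count + 1
        return True

    def handle(self, api_key, cost, call_origin, now=None) -> int:
        now = time.time() if now is None else now
        account_id = self.keys.get(api_key) if api_key else None
        if account_id is None:
            return 401  # missing or unknown key
        account = self.accounts[account_id]
        if not account.active:
            return 403
        # Rate limits apply per key and per account, before any debit.
        if not (self._allow("key:" + api_key, now)
                and self._allow("acct:" + account_id, now)):
            return 429
        if account.credits < cost:
            return 402  # checked before the origin is contacted
        account.credits -= cost
        try:
            call_origin()
        except Exception:
            account.credits += cost  # refund when the origin fetch fails
            return 502
        return 200
```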

Current Gateway Scaffold Evidence

Local Worker scaffold:

gateway/cloudflare-worker/src/index.js
gateway/cloudflare-worker/migrations/0001_paid_api.sql
gateway/cloudflare-worker/test/index.test.js

Verified on 2026-06-03 with:

```
cd gateway/cloudflare-worker
npm run check
npm run preflight
```

Result: 16/16 Worker tests passed and 17 paid API scaffold preflight checks passed. The scaffold preflight also checks that the guarded Cloudflare resource-prep script, scripts/prepare_paid_api_cloudflare_resources.sh, is wired through npm run prepare:cloudflare, and that the reviewed binding template is present.

Live Cloudflare resources were created on 2026-06-03 after Wrangler login:

D1 kaiju_api_billing bound as KAIJU_BILLING_DB
KV kaiju_rate_limit bound as KAIJU_RATE_LIMIT_KV
R2 kaiju-api-artifacts bound as KAIJU_ARTIFACT_BUCKET
D1 migration 0001_paid_api.sql applied successfully

The Worker was deployed on 2026-06-03 at:

https://kaiju-api-gateway.kiyomi-api.workers.dev
https://kaiju-api.kiyomikode.com

Gojira-B now advertises kaiju-coder-7 from its public health endpoint. The origin secret was rotated during launch verification and re-applied to Cloudflare without writing the value to this repo.

Current launch preflight after custom-domain traffic, Stripe test webhook staging, and rollback verification:

27 pass / 0 fail / 0 manual

Passed live launch evidence:

Custom domain https://kaiju-api.kiyomikode.com resolves to the intended Kaiju Worker; /health returned 200 in the launch evidence probe.
KAIJU_ORIGIN_URL, KAIJU_ORIGIN_SECRET, and KAIJU_STRIPE_WEBHOOK_SECRET are present by Wrangler secret name.
Worker-to-Gojira staging request passed through https://kaiju-api.kiyomikode.com/v1/chat/completions with model=kaiju-coder-7, HTTP 200, and streaming enabled.
Paid-route latency through the custom domain was measured over five staging samples, with a p95 of 14121.18 ms (about 14.1 s).
Stripe test-mode checkout.session.completed credited the staging API key using metadata.kaiju_api_key_id; duplicate signed delivery returned duplicate: true and did not double-credit.
Rollback drill succeeded by deploying same-code version e838e01d-2d72-4eb7-9814-b95b7e2cef14, rolling traffic back to verified version d37d60d1-7bfc-4ac9-a69c-e9339b5e495f, and rechecking /health.
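The p95 above is based on only five samples, so the percentile method matters. With the nearest-rank method, the p95 of five samples is simply the slowest sample. With linear interpolation (NumPy's default), it falls between the two slowest. The sketch below shows both methods for recomputing paid-route p95 from recorded latencies. It does not imply which method produced the figure above.

```python
import math


def p95_nearest_rank(samples_ms):
    """Smallest sample with at least 95% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def p95_interpolated(samples_ms):
    """Linear interpolation between closest ranks (NumPy's default 'linear')."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    pos = 0.95 * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

Recording the method next to the p95 value in the evidence file keeps later comparisons honest as the sample count grows.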

Real-money public charging still needs an explicit Stripe live-mode switch: create the live Checkout products/links or Sessions, create the live webhook endpoint for the final API domain, replace KAIJU_STRIPE_WEBHOOK_SECRET with the live webhook signing secret, and run a live-mode penny or controlled internal payment before advertising paid API access.
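The live-mode switch only replaces the value of KAIJU_STRIPE_WEBHOOK_SECRET, so the signature check and duplicate protection proven in staging must keep working unchanged. The minimal Python sketch below shows both, following Stripe's documented scheme. The `Stripe-Signature` header carries a timestamp `t` and one or more `v1` values, each an HMAC-SHA256 of `"{t}.{raw_body}"` keyed with the endpoint's signing secret.

The following parts are hypothetical or assumed:
- The `seen_events` set and `credits` dict stand in for the gateway's D1 tables.
- Crediting `amount_total` one-for-one is an assumption.

Keying the credit on `metadata.kaiju_api_key_id` follows the staging evidence above.

```python
import hashlib
import hmac
import json
import time

TOLERANCE_SECONDS = 300  # Stripe's official libraries default to five minutes


def verify_stripe_signature(raw_body: bytes, header: str, secret: str, now=None) -> bool:
    """Check a header of the form 't=<unix>,v1=<hex>[,v1=<hex>...]'."""
    pairs = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [v for k, v in pairs if k.strip() == "t"]
    signatures = [v for k, v in pairs if k.strip() == "v1"]
    if not timestamps or not signatures or not timestamps[0].isdigit():
        return False
    ts = timestamps[0]
    now = time.time() if now is None else now
    if abs(now - int(ts)) > TOLERANCE_SECONDS:
        return False  # stale delivery: possible replay
    signed_payload = ts.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, sig) for sig in signatures)


def handle_checkout_webhook(raw_body, header, secret, seen_events, credits, now=None):
    """Credit a prepaid balance at most once per Stripe event id."""
    if not verify_stripe_signature(raw_body, header, secret, now):
        return {"status": 400}
    event = json.loads(raw_body)
    if event["id"] in seen_events:
        return {"status": 200, "duplicate": True}  # redelivery: no double credit
    # A real gateway should store the event id and the credit in one transaction.
    seen_events.add(event["id"])
    if event["type"] == "checkout.session.completed":
        session = event["data"]["object"]
        key_id = session["metadata"]["kaiju_api_key_id"]
        # Assumption: one credit per smallest currency unit paid.
        credits[key_id] = credits.get(key_id, 0) + session["amount_total"]
    return {"status": 200, "duplicate": False}
```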

Covered locally:

missing bearer token returns 401
inactive API key returns 403
insufficient credits return 402 before origin fetch
successful chat request forwards x-kaiju-origin-secret and debits credits
origin fetch failure refunds credits
fixed-window rate limit blocks before debit
public chat payload is forced to model kaiju-coder-7, streaming, thinking disabled, and token capped
unsupported model is rejected before debit
secret-looking prompt content is rejected before debit, origin fetch, or logs
signed Stripe Checkout webhook credits prepaid balance
duplicate Stripe Checkout webhook does not double-credit
invalid Stripe signature is rejected
origin-only artifact upload stores bounded text artifacts in R2
authenticated artifact download is scoped to the caller's account namespace
unsafe artifact paths are rejected before R2 storage
secret-looking artifact content is rejected before R2 storage
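The artifact cases above come down to two rules. Every object key is built from a validated relative path under the caller's account prefix, and text bodies are bounded. The minimal sketch below assumes hypothetical choices that are not the Worker's actual values: the `accounts/<id>/` key layout, the allowed-character rule, and the size limit.

```python
import re

MAX_ARTIFACT_BYTES = 1_000_000  # hypothetical bound for text artifacts
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def _safe_segment(seg: str) -> bool:
    return seg not in ("", ".", "..") and SEGMENT.fullmatch(seg) is not None


def artifact_key(account_id: str, relative_path: str) -> str:
    """Map a caller-supplied path to an object key inside the account namespace."""
    if not _safe_segment(account_id):
        raise ValueError("bad account id")
    if relative_path.startswith("/") or "\\" in relative_path:
        raise ValueError("absolute or backslash paths are not allowed")
    segments = relative_path.split("/")
    for seg in segments:
        if not _safe_segment(seg):  # blocks "..", empty, and odd characters
            raise ValueError(f"unsafe path segment: {seg!r}")
    return f"accounts/{account_id}/" + "/".join(segments)


def check_text_artifact(body: bytes) -> str:
    """Enforce the size bound and require UTF-8 text before storage."""
    if len(body) > MAX_ARTIFACT_BYTES:
        raise ValueError("artifact too large")
    return body.decode("utf-8")  # raises UnicodeDecodeError for binary data
```

Since the download path derives the prefix from the authenticated account rather than from the request, one customer cannot name another customer's keys.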

Executable preflight:

```
python3 scripts/check_kaiju_public_release_readiness.py --mode local
python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release
python3 scripts/check_kaiju_public_release_readiness.py --mode public
python3 scripts/generate_kaiju_final_report.py
python3 scripts/check_kaiju_goal_completion.py --write
python3 scripts/refresh_kaiju_release_evidence.py --skip-opencode-smoke
python3 scripts/check_hf_staging_integrity.py
python3 scripts/check_hf_release_bundle_integrity.py
python3 scripts/collect_hf_release_permission_evidence.py
python3 scripts/check_hf_release_permission_evidence.py
python3 scripts/check_human_release_review.py --mode local
python3 scripts/check_human_release_review.py --mode public
cd gateway/cloudflare-worker
npm run prepare:cloudflare
cd ../..
cp release/cloudflare-bindings.example.json release/cloudflare-bindings.json
# Replace placeholder D1/KV IDs in release/cloudflare-bindings.json first.
python3 scripts/apply_paid_api_cloudflare_bindings.py --bindings-file release/cloudflare-bindings.json
python3 scripts/check_paid_api_readiness.py --mode scaffold
python3 scripts/check_paid_api_readiness.py --mode launch
```

check_kaiju_public_release_readiness.py --mode local is the consolidated public-testing readiness command. --mode hf-release checks the downloadable model/helper release, public Hugging Face evidence, and human review while keeping live paid charging separate from model publication. --mode public now passes after public HF verification, live Cloudflare resource evidence, Stripe test-mode staging evidence, rollback proof, paid-route latency evidence, and human review are complete.

generate_kaiju_final_report.py writes release/FINAL_RELEASE_REPORT.md with the current local/public readiness summaries, launch blockers, changed files, commands run, and first commands Richard should test. It is part of the release packet and does not inspect tokens, environment variables, or process command lines.

check_kaiju_goal_completion.py --write writes release/GOAL_COMPLETION_AUDIT.md, a stricter objective-level audit. It should remain green only while the live runtime, public HF evidence, human review, and paid API launch evidence continue to pass.

refresh_kaiju_release_evidence.py is a safe local refresh runner. It updates direct API smoke evidence, goal audit, final report, HF staging, local bundle, merged-model metadata on Gojira-B, and dry-run upload previews without reading tokens or uploading anything.

check_hf_staging_integrity.py validates the staged Hugging Face package for required files, public naming hygiene, raw secret-looking values, and staging checksums. It does not upload, create repos, or print matched secret values.

check_hf_release_permission_evidence.py validates sanitized Hugging Face repo-create evidence in release/hf-release-permission-evidence.json. Start from release/hf-release-permission-evidence.example.json only after the private permission probe succeeds, or use scripts/collect_hf_release_permission_evidence.py --apply --write to run the probe and write the sanitized evidence automatically. Never include raw auth output or tokens.

check_human_release_review.py reads release/HUMAN_RELEASE_REVIEW.md. Local mode may pass with pending/manual review fields; public mode must fail until Richard changes the signoff fields to approved decisions.

npm run prepare:cloudflare is dry-run safe by default. It prints the exact Wrangler commands for creating KAIJU_BILLING_DB, KAIJU_RATE_LIMIT_KV, and KAIJU_ARTIFACT_BUCKET, applying the D1 migration, setting required secrets, deploying, listing deployments, and exercising rollback. The live resource creation path has now been run with KAIJU_CF_RESOURCE_APPLY=1 and KAIJU_CF_UPDATE_CONFIG=1. npm run check also runs npx wrangler deploy --dry-run so the current Worker build path is validated without publishing.
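Here is a Python sketch of the dry-run-by-default pattern described above. It is not the repository's `scripts/prepare_paid_api_cloudflare_resources.sh`. It only mirrors the stated behavior: print every command, and execute only when `KAIJU_CF_RESOURCE_APPLY=1`. The command list is an assumption that uses Wrangler v3 subcommand syntax with the resource names recorded above. It leaves out secrets, deploy, rollback, and `KAIJU_CF_UPDATE_CONFIG`.

```python
import os
import shlex
import subprocess

# Resource names come from the live setup above; the command list is illustrative.
COMMANDS = [
    ["npx", "wrangler", "d1", "create", "kaiju_api_billing"],
    ["npx", "wrangler", "kv", "namespace", "create", "kaiju_rate_limit"],
    ["npx", "wrangler", "r2", "bucket", "create", "kaiju-api-artifacts"],
    ["npx", "wrangler", "d1", "migrations", "apply", "kaiju_api_billing", "--remote"],
]


def prepare(env=None, run=subprocess.run):
    """Print each command; execute only when KAIJU_CF_RESOURCE_APPLY=1."""
    env = os.environ if env is None else env
    apply = env.get("KAIJU_CF_RESOURCE_APPLY") == "1"
    executed = []
    for cmd in COMMANDS:
        print(("RUN  " if apply else "DRY  ") + shlex.join(cmd))
        if apply:
            run(cmd, check=True)  # stop at the first failing command
            executed.append(cmd)
    return executed
```

Making the safe path the default means a mistyped invocation can only print, never create or mutate live resources.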

After real D1/KV/R2 resources exist, copy release/cloudflare-bindings.example.json to release/cloudflare-bindings.json, replace the placeholder IDs, and preview the reviewed config update:

```
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json
```

The applier refuses placeholder values and secret-looking input. Only after the preview is reviewed should it update gateway/cloudflare-worker/wrangler.jsonc:

```
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json \
  --write
```
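A small validator can illustrate the applier's refusal of placeholder values. The document does not show the format of release/cloudflare-bindings.json, so the field names (`d1_database_id`, `kv_namespace_id`, `r2_bucket_name`) and placeholder markers below are hypothetical. The sketch relies on the fact that D1 database IDs are UUIDs and KV namespace IDs are 32-character hex strings. The secret-looking input check is left out here.

```python
import json
import re

UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
HEX32 = re.compile(r"[0-9a-f]{32}")
PLACEHOLDER_MARKERS = ("replace", "placeholder", "example", "xxx", "<")


def validate_bindings(text: str) -> dict:
    """Refuse placeholder or malformed IDs before wrangler.jsonc is touched."""
    data = json.loads(text)
    for key, value in data.items():
        if any(marker in str(value).lower() for marker in PLACEHOLDER_MARKERS):
            raise ValueError(f"{key} still holds a placeholder")
    if not UUID.fullmatch(data["d1_database_id"]):
        raise ValueError("d1_database_id must be a UUID")
    if not HEX32.fullmatch(data["kv_namespace_id"]):
        raise ValueError("kv_namespace_id must be 32 hex characters")
    return data
```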

--mode scaffold verifies the local gateway implementation and should pass. --mode launch is stricter and now passes with the custom domain, live Cloudflare bindings, Wrangler secret-name evidence, Stripe test-mode top-up staging, Worker-to-Gojira traffic, paid-route latency evidence, and rollback proof recorded in release/paid-api-launch-evidence.json.

Launch evidence is attached through a sanitized JSON file:

```
cp release/paid-api-launch-evidence.example.json release/paid-api-launch-evidence.json
python3 scripts/collect_paid_api_launch_evidence.py --help
python3 scripts/check_paid_api_readiness.py --mode launch \
  --evidence-file release/paid-api-launch-evidence.json
```

Use scripts/collect_paid_api_launch_evidence.py to preview or write sanitized launch evidence after staging resources exist. It can read the staging API key from an environment variable for live probes, but it never writes the key, full prompt, or model response to the evidence file. By default it prints a preview; pass --write only after reviewing the target file path.

Only record secret names, route names, request ids, coarse latency numbers, and pass/fail facts. Do not put raw API keys, bearer tokens, OAuth tokens, Stripe secret keys, webhook signing secrets, tunnel credentials, full private prompts, or customer private data in the evidence file. The checker scans the evidence file for common secret-looking values and fails launch readiness if it finds them.
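A checker of this kind can be a recursive walk over the evidence JSON that tests each string against well-known credential shapes. The patterns below are an illustrative assumption, not the checker's actual list:
- Stripe `sk_`/`rk_` keys
- `whsec_` webhook secrets
- Hugging Face `hf_` tokens
- `Bearer` values
- PEM private keys

The sketch reports only the JSON path and pattern name of each hit, so matched values are never echoed.

```python
import json
import re

SECRET_PATTERNS = {
    "stripe_key": re.compile(r"\b(?:sk|rk)_(?:live|test)_[A-Za-z0-9]{10,}"),
    "stripe_webhook_secret": re.compile(r"\bwhsec_[A-Za-z0-9]{10,}"),
    "hf_token": re.compile(r"\bhf_[A-Za-z0-9]{20,}"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{16,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secret_paths(node, path="$"):
    """Yield (json_path, pattern_name) for strings that look like credentials."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_secret_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_secret_paths(value, f"{path}[{i}]")
    elif isinstance(node, str):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(node):
                yield path, name


def evidence_is_clean(text: str) -> bool:
    hits = list(find_secret_paths(json.loads(text)))
    for path, name in hits:
        print(f"secret-looking value ({name}) at {path}")  # never print the value
    return not hits
```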

Minimum API Gates

Gate	Required Evidence
Auth	Unauthorized requests fail; valid test key works
Billing	Unpaid/suspended account is denied before model call
Rate limit	Burst and daily caps work per key
Logging	Logs omit secrets and full private prompts
Abuse control	Secret-looking payloads and obviously unsafe automation requests are rejected or redacted
Artifacts	Origin-only R2 upload and account-scoped artifact download pass
Rollback	One command can route traffic back to previous stable model/harness
Latency	p95 for paid routes is documented and acceptable
Quality	Business-owner eval pack passes with complete files/artifacts

Current quality evidence:

Harnessed customer-readiness pack: runs/opencode-customer-readiness/20260603T185835Z/summary.md, 4/4 passed, 28/28 required files written, including the release provenance and safety review task.
Restored 32k SGLang direct API smoke: runs/benchmarks/20260603T155233Z-kaiju-coder-7-serving/summary.md, identity passed in 2.92s; business proposal passed in 94.28s with 1,737 chars.
Runtime-quantized vLLM OpenCode smoke: bash scripts/run_kaiju_quantized_opencode_smoke.sh passed at 16k after vLLM was launched with --enable-auto-tool-choice; OpenCode wrote hello.txt containing exactly "Kaiju Coder 7 quantized runtime ok".
Current restored 16k SGLang direct API smoke: runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md, identity passed in 2.3s.
Raw OpenCode multi-file pack remains a blocker for raw-agent claims.

Pricing Assumptions To Validate

Raw model tokens are slow and expensive enough that per-token pricing alone is not the right first product.
Better first API product: priced business-owner routes such as website pack, proposal pack, ROI/report pack, and Kiyomi operating pack.
Charge for complete artifacts and verified workflow output, with token usage as an internal cost-control metric.

Release Blockers

Real-money paid API charging still needs a deliberate live-mode Stripe switch and controlled live payment verification. The technical launch preflight is green with test-mode Stripe staging evidence.
Raw OpenCode customer-readiness task currently times out on multi-file work; the harnessed business-owner route is the reliable first paid API product.
Harnessed customer-readiness route passes; paid API must route through that deterministic product path until a faster raw/quantized path passes.
Context-size benchmarks passed at 12k, 16k, 24k, and 32k, but the current parked Gojira-B/OpenCode profile is 16k. Treat 32k as the high-context target to re-confirm after restart before using it as a public default.
Restored 32k business-document direct API smoke passed, but the 94.28s latency is too slow for ungated paid API use without streaming, queueing, and route-level caps.
vLLM serving has been tested at 16k, but it is not clearly faster than SGLang and needs the Gojira nightly image plus text-only launch flags.
Runtime-quantized vLLM serving with bitsandbytes has passed 8k and 16k identity/code smoke tests and a 16k business-document smoke in 53.44s, and it reduces model memory to about 17.8 GiB; its OpenCode one-file smoke now passes.
Persisted quantized public weights are still pending.
Hosted gateway now has local-tested API key behavior, live D1 prepaid credits binding, live KV rate-limit binding, live R2 artifact binding, model enforcement, secret-content rejection, custom API domain, signed Stripe test-mode webhook top-up behavior, rollback evidence, and custom-domain p95 latency evidence.
python3 scripts/check_paid_api_readiness.py --mode launch currently reports 27 pass / 0 fail / 0 manual. This means the technical hosted API path is launch-ready for controlled testing; real customer charging still needs the live-mode Stripe switch above and explicit approval.