# Kaiju Coder 7 Paid API Readiness

Do not sell the hosted API as generally available until the gates below pass.

## Current Position

Kaiju Coder 7 can be served locally through an OpenAI-compatible SGLang endpoint. The reliable commercial product path is:

```text
Kaiju Coder 7 model + deterministic business-owner harness + verifier + gateway controls
```

Raw multi-file OpenCode generation is not yet fast enough to be the paid API promise by itself. The harnessed customer-readiness pack passes and should be the paid-route baseline until raw-agent generation improves.

## Required Gateway Behavior

- Use model id `kaiju-coder-7`.
- Disable hidden thinking where the serving stack supports it.
- Stream responses for long outputs.
- Cap max output by route.
- Reject requests with secret-looking prompt content when possible.
- Never log API keys, bearer tokens, OAuth tokens, payment credentials, or full private customer prompts by default.
- Keep request ids, customer id, route, token counts, latency, status, and coarse failure reason.

## Billing And Access

- API keys must be scoped per customer/account.
- Stripe subscription or prepaid credit balance must be checked before serving.
- Rate limits must be per key and per account.
- Failed auth and rate-limit events should be logged without prompt content.
- Admin override keys must be separate from customer keys.

## Current Gateway Scaffold Evidence

Local Worker scaffold:

- `gateway/cloudflare-worker/src/index.js`
- `gateway/cloudflare-worker/migrations/0001_paid_api.sql`
- `gateway/cloudflare-worker/test/index.test.js`

Verified on 2026-06-03 with:

```bash
cd gateway/cloudflare-worker
npm run check
npm run preflight
```

Result: `16/16` Worker tests passed and `17` paid API scaffold preflight checks passed.
The scaffold preflight also checks that the guarded Cloudflare resource-prep script, `scripts/prepare_paid_api_cloudflare_resources.sh`, is wired through `npm run prepare:cloudflare`, and that the reviewed binding template is present.

Covered locally:

- missing bearer token returns `401`
- inactive API key returns `403`
- insufficient credits return `402` before origin fetch
- successful chat request forwards `x-kaiju-origin-secret` and debits credits
- origin fetch failure refunds credits
- fixed-window rate limit blocks before debit
- public chat payload is forced to model `kaiju-coder-7`, streaming, thinking disabled, and token capped
- unsupported model is rejected before debit
- secret-looking prompt content is rejected before debit, origin fetch, or logs
- signed Stripe Checkout webhook credits prepaid balance
- duplicate Stripe Checkout webhook does not double-credit
- invalid Stripe signature is rejected
- origin-only artifact upload stores bounded text artifacts in R2
- authenticated artifact download is scoped to the caller's account namespace
- unsafe artifact paths are rejected before R2 storage
- secret-looking artifact content is rejected before R2 storage

Executable preflight:

```bash
python3 scripts/check_kaiju_public_release_readiness.py --mode local
python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release
python3 scripts/check_kaiju_public_release_readiness.py --mode public
python3 scripts/generate_kaiju_final_report.py
python3 scripts/check_kaiju_goal_completion.py --write
python3 scripts/refresh_kaiju_release_evidence.py --skip-opencode-smoke
python3 scripts/check_hf_staging_integrity.py
python3 scripts/check_hf_release_bundle_integrity.py
python3 scripts/collect_hf_release_permission_evidence.py
python3 scripts/check_hf_release_permission_evidence.py
python3 scripts/check_human_release_review.py --mode local
python3 scripts/check_human_release_review.py --mode public
cd gateway/cloudflare-worker
npm run prepare:cloudflare
cd ../..
cp release/cloudflare-bindings.example.json release/cloudflare-bindings.json
# Replace placeholder D1/KV IDs in release/cloudflare-bindings.json first.
python3 scripts/apply_paid_api_cloudflare_bindings.py --bindings-file release/cloudflare-bindings.json
python3 scripts/check_paid_api_readiness.py --mode scaffold
python3 scripts/check_paid_api_readiness.py --mode launch
```

`check_kaiju_public_release_readiness.py --mode local` is the consolidated public-testing readiness command. It can pass while public upload and paid API launch remain manual blockers. `--mode hf-release` checks the downloadable model/helper release and requires sanitized Hugging Face namespace permission evidence plus human review while keeping paid API launch manual. `--mode public` must remain red until Hugging Face write permissions, live Cloudflare resources, Stripe staging evidence, rollback proof, and human review are complete.

`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with the current local/public readiness summaries, launch blockers, changed files, commands run, and first commands Richard should test. It is part of the release packet and does not inspect tokens, environment variables, or process command lines.

`check_kaiju_goal_completion.py --write` writes `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should remain red while Hugging Face upload, human review, or live paid API launch evidence are missing.

`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates direct API smoke evidence, goal audit, final report, HF staging, local bundle, merged-model metadata on Gojira-B, and dry-run upload previews without reading tokens or uploading anything.

`check_hf_staging_integrity.py` validates the staged Hugging Face package for required files, public naming hygiene, raw secret-looking values, and staging checksums. It does not upload, create repos, or print matched secret values.
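
The checksum portion of a staging check like the one above can be illustrated with a short sketch. This is not the code in `check_hf_staging_integrity.py`; the manifest format (a mapping from relative path to SHA-256 hex digest) and the function names `sha256_of` and `verify_staging` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_staging(staging_dir, manifest):
    """Compare staged files against a manifest.

    `manifest` is a hypothetical {relative_path: expected_sha256_hex} mapping.
    Returns a list of problem strings; an empty list means the check passed.
    """
    problems = []
    root = Path(staging_dir)
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Reporting only file names and a coarse reason, never file contents, keeps the output in line with the rule that the check must not print matched secret values.
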

`check_hf_release_permission_evidence.py` validates sanitized Hugging Face repo-create evidence in `release/hf-release-permission-evidence.json`. Start from `release/hf-release-permission-evidence.example.json` only after the private permission probe succeeds, or use `scripts/collect_hf_release_permission_evidence.py --apply --write` to run the probe and write the sanitized evidence automatically. Never include raw auth output or tokens.

`check_human_release_review.py` reads `release/HUMAN_RELEASE_REVIEW.md`. Local mode may pass with pending/manual review fields; public mode must fail until Richard changes the signoff fields to approved decisions.

`npm run prepare:cloudflare` is dry-run safe by default. It prints the exact Wrangler commands for creating `KAIJU_BILLING_DB`, `KAIJU_RATE_LIMIT_KV`, and `KAIJU_ARTIFACT_BUCKET`, applying the D1 migration, setting required secrets, deploying, listing deployments, and exercising rollback. `npm run check` also runs `npx wrangler deploy --dry-run` so the current Worker build path is validated without publishing. Set `KAIJU_CF_RESOURCE_APPLY=1` only when the intended Cloudflare account is active.

After real D1/KV/R2 resources exist, copy `release/cloudflare-bindings.example.json` to `release/cloudflare-bindings.json`, replace the placeholder IDs, and preview the reviewed config update:

```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json
```

The applier refuses placeholder values and secret-looking input. Only after the preview is reviewed should it update `gateway/cloudflare-worker/wrangler.jsonc`:

```bash
python3 scripts/apply_paid_api_cloudflare_bindings.py \
  --bindings-file release/cloudflare-bindings.json \
  --write
```

`--mode scaffold` verifies the local gateway implementation and should pass.
`--mode launch` is stricter and should fail until real Cloudflare bindings, Wrangler secrets, Stripe webhook evidence, staging traffic, latency evidence, and rollback proof are attached.

Launch evidence is attached through a sanitized JSON file:

```bash
cp release/paid-api-launch-evidence.example.json release/paid-api-launch-evidence.json
python3 scripts/collect_paid_api_launch_evidence.py --help
python3 scripts/check_paid_api_readiness.py --mode launch \
  --evidence-file release/paid-api-launch-evidence.json
```

Use `scripts/collect_paid_api_launch_evidence.py` to preview or write sanitized launch evidence after staging resources exist. It can read the staging API key from an environment variable for live probes, but it never writes the key, full prompt, or model response to the evidence file. By default it prints a preview; pass `--write` only after reviewing the target file path.

Only record secret names, route names, request ids, coarse latency numbers, and pass/fail facts. Do not put raw API keys, bearer tokens, OAuth tokens, Stripe secret keys, webhook signing secrets, tunnel credentials, full private prompts, or customer private data in the evidence file. The checker scans the evidence file for common secret-looking values and fails launch readiness if it finds them.
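
The evidence-file scan described above might look roughly like the sketch below. The patterns and the names `SECRET_PATTERNS`, `iter_strings`, and `find_secret_paths` are hypothetical; the rule set that `check_paid_api_readiness.py` actually uses is not documented here and is likely broader.

```python
import re

# Hypothetical patterns for illustration only; not the checker's real rule set.
SECRET_PATTERNS = [
    re.compile(r"sk_(live|test)_[A-Za-z0-9]{16,}"),  # Stripe-style secret key
    re.compile(r"whsec_[A-Za-z0-9]{16,}"),           # Stripe-style webhook signing secret
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{20,}"),    # bearer token value
]


def iter_strings(node, path="$"):
    """Yield (json_path, value) for every string value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def find_secret_paths(evidence):
    """Return the JSON paths of secret-looking values, never the values."""
    return [
        path
        for path, text in iter_strings(evidence)
        if any(pattern.search(text) for pattern in SECRET_PATTERNS)
    ]
```

Load the file with `json.load` and treat a non-empty result as a launch-readiness failure. Returning paths rather than matched text means the failure report itself cannot leak a secret.
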
## Minimum API Gates

| Gate | Required Evidence |
| --- | --- |
| Auth | Unauthorized requests fail; valid test key works |
| Billing | Unpaid/suspended account is denied before model call |
| Rate limit | Burst and daily caps work per key |
| Logging | Logs omit secrets and full private prompts |
| Abuse control | Secret-looking payloads and obviously unsafe automation requests are rejected or redacted |
| Artifacts | Origin-only R2 upload and account-scoped artifact download pass |
| Rollback | One command can route traffic back to previous stable model/harness |
| Latency | p95 for paid routes is documented and acceptable |
| Quality | Business-owner eval pack passes with complete files/artifacts |

Current quality evidence:

- Harnessed customer-readiness pack: `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` passed, `28/28` required files written, including the release provenance and safety review task.
- Restored 32k SGLang direct API smoke: `runs/benchmarks/20260603T155233Z-kaiju-coder-7-serving/summary.md`, identity passed in `2.92s`; business proposal passed in `94.28s` with `1,737` chars.
- Runtime-quantized vLLM OpenCode smoke: `bash scripts/run_kaiju_quantized_opencode_smoke.sh` passed at 16k after vLLM launched with `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok`.
- Current restored 16k SGLang direct API smoke: `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity passed in `2.3s`.
- Raw OpenCode multi-file pack remains a blocker for raw-agent claims.

## Pricing Assumptions To Validate

- Raw model tokens are slow and expensive enough that per-token pricing alone is not the right first product.
- Better first API product: priced business-owner routes such as website pack, proposal pack, ROI/report pack, and Kiyomi operating pack.
- Charge for complete artifacts and verified workflow output, with token usage as an internal cost-control metric.
## Release Blockers

- Raw OpenCode customer-readiness task currently times out on multi-file work.
- Harnessed customer-readiness route passes; paid API must route through that deterministic product path until a faster raw/quantized path passes.
- Context-size benchmarks passed at 12k, 16k, 24k, and 32k, but the current parked Gojira-B/OpenCode profile is 16k. Treat 32k as the high-context target to re-confirm after restart before using it as a public default.
- Restored 32k business-document direct API smoke passed, but the `94.28s` latency is too slow for ungated paid API use without streaming, queueing, and route-level caps.
- vLLM serving has been tested at 16k, but it is not clearly faster than SGLang and needs the Gojira nightly image plus text-only launch flags.
- Runtime-quantized vLLM bitsandbytes has passed 8k and 16k identity/code smoke tests, passed a 16k business-document smoke in `53.44s`, and reduces model memory to about `17.8 GiB`; its OpenCode one-file smoke now passes.
- Persisted quantized public weights are still pending.
- Hosted gateway scaffold now has local-tested API key, D1 prepaid credits, fixed-window rate limit, model enforcement, secret-content rejection, and signed Stripe webhook top-up behavior. It also has a sanitized launch-evidence collector for the remaining staging proof. It is not live-paid ready until real Cloudflare resources, Stripe products/webhook endpoint, deployment secrets, sanitized launch evidence, and staging end-to-end requests pass.
- `python3 scripts/check_paid_api_readiness.py --mode launch` currently fails by design because live D1/KV/R2 bindings and manual launch evidence are not attached. This prevents local scaffold readiness from being mistaken for paid public launch approval.
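
For readers who want a concrete picture of the locally tested gateway behavior (API key check, fixed-window rate limit, prepaid-credit debit, refund on origin failure), here is an illustrative sketch. The real Worker is JavaScript in `gateway/cloudflare-worker/src/index.js`; this Python version is not a port of it. The names `FixedWindowLimiter` and `handle_request`, the account dictionary shape, the `429` and `502` status codes, and the order of the rate-limit check relative to the credit check are all assumptions. The `401`, `403`, and `402` responses and the "block before debit, refund on origin failure" rules come from the covered-locally list.

```python
import math


class FixedWindowLimiter:
    """Count requests per key inside fixed, non-overlapping time windows."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # (key, window_index) -> requests seen in that window

    def allow(self, key, now):
        bucket = (key, math.floor(now / self.window_seconds))
        seen = self.counts.get(bucket, 0)
        if seen >= self.limit:
            return False
        self.counts[bucket] = seen + 1
        return True


def handle_request(account, limiter, call_origin, now, cost=1):
    """Return an HTTP-style status code for one chat request."""
    if account is None:
        return 401  # missing bearer token
    if not account["active"]:
        return 403  # inactive API key
    if not limiter.allow(account["key_id"], now):
        return 429  # assumed status; rate limit blocks before any debit
    if account["credits"] < cost:
        return 402  # insufficient credits, origin is never called
    account["credits"] -= cost
    try:
        call_origin()
    except Exception:
        account["credits"] += cost  # refund when the origin fetch fails
        return 502  # assumed status for origin failure
    return 200
```

The sketch keeps counters in process memory and never evicts old windows, which is fine for illustration but not for production; the scaffold names a `KAIJU_RATE_LIMIT_KV` binding for that state.
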