Upload Kaiju Coder 7 adapter release package

Browse files

Files changed (8) hide show

COMPLETION_AUDIT.md +15 -19
EVAL_SCOREBOARD.md +8 -8
FINAL_RELEASE_REPORT.md +14 -18
GOAL_COMPLETION_AUDIT.md +3 -3
PAID_API_READINESS.md +8 -8
PUBLIC_TESTING_QUICKSTART.md +11 -9
README.md +12 -6
SERVING_BENCHMARKS.md +26 -24

COMPLETION_AUDIT.md CHANGED Viewed

@@ -7,9 +7,9 @@ conservative: the product-path harness is release-candidate ready for local
 testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
 artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
 as release evidence because the corrected LoRA selector crashes on this
-adapter. Human review, website latency/SLA decisions, broader comparison evals,
-and Hugging Face write permissions are still required before publishing
-externally.
 ## Requirement Status
@@ -29,15 +29,15 @@ externally.
 | Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
 | Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
 | v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
-| Local inference against v1.8 merged checkpoint | `scripts/start-qwen36-merged-sglang.sh` serves `kaiju-coder-7` over Tailscale at `http://100.109.109.14:18083/v1`; current restored live endpoint reports max model len `16384`. Prior benchmarks proved 12k/16k/24k/32k startup and smoke evidence, with 32k treated as the high-context target rather than the currently parked runtime. | Passed |
 | v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
-| OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`. Fresh public smoke wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok`; packaged public verifier `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` passed `4/4` in `runs/public-opencode-smoke/20260603T182222Z/summary.md`, including wrong-directory leakage checks; loop-guard smoke wrote `loopguard.txt` with exactly `Kaiju Coder 7 loop guard installed`; latest harnessed customer-readiness pack `runs/opencode-customer-readiness/20260603T185835Z/summary.md` passed `4/4` with `28/28` required files, including release provenance and safety review. | Passed for harnessed/product path |
 | Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
-| Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold` now passes `17` checks, including the guarded `npm run prepare:cloudflare` resource-prep path, Wrangler dry-run deploy wiring, artifact route controls, sanitized launch-evidence template, and reviewed Cloudflare bindings template. `scripts/apply_paid_api_cloudflare_bindings.py` previews/applies real D1/KV/R2 bindings while refusing placeholders and secret-looking input. `scripts/collect_paid_api_launch_evidence.py` can preview or write the remaining sanitized staging evidence without storing API keys, full prompts, or model responses. `--mode launch` fails by design until real D1/KV/R2 bindings, Wrangler secrets, Stripe webhook staging evidence, paid-route staging request, latency evidence, and rollback proof are attached through `release/paid-api-launch-evidence.json`. | Local scaffold passed; live deployment pending |
 | Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
-| Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`; upload script is dry-run safe and namespace-configurable. Apply mode now requires staged checksum/integrity validation and `check_human_release_review.py --mode public` before repo creation. Local `hf` CLI is installed and authenticated as `restokes92`, but private repo creation attempts under `RichardEchols`, `RMDWLLC`, and `restokes92` returned `403 Forbidden`. | Package ready; upload blocked by review/token permissions |
-| Hugging Face merged model upload readiness | `scripts/prepare_hf_merged_model_metadata.sh` stages the model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` for the remote merged-model directory. Latest apply-mode metadata sync passed on Gojira-B using passwordless sudo rsync for the root-owned folder. `scripts/upload_hf_merged_model_from_gojira_b.sh` refuses to preview or upload unless that metadata and the `51G`/`14`-shard merged model are present; latest dry run confirmed `Metadata: present` and printed the correct `hf upload-large-folder` command. Apply mode requires `check_human_release_review.py --mode public --require-merged-upload` before remote upload. Gojira-B has `hf` `1.17.0` and auth, but repo creation still needs a write-capable namespace. | Package ready; upload blocked by review/token permissions |
-| Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local` reports local public-testing readiness while keeping Hugging Face namespace permission, paid API launch preflight, and human review as explicit manual blockers. `--mode public` remains red until those external gates pass. The local check calls `scripts/check_hf_staging_integrity.py` to validate staged files, public naming hygiene, raw secret-looking values, and checksums. It also requires `release/FINAL_RELEASE_REPORT.md`, generated by `scripts/generate_kaiju_final_report.py`, and the local `release/bundles/LATEST.json` archive checksum produced by `scripts/create_hf_release_bundle.py`, so the final release state has exact commands, blockers, changed files, first-test instructions, and a reviewable HF bundle. It also calls `scripts/check_human_release_review.py` so `release/HUMAN_RELEASE_REVIEW.md` is the structured human signoff gate. | Local mode passed; public mode pending |
 ## Commands With Current Passing Evidence
@@ -71,14 +71,10 @@ Kaiju Coder 7 merged model + deterministic business-owner harness + verifier + s
 That must be described honestly until external release review confirms:
-- human review of generated artifacts
 - raw website latency/SLA positioning or explicit harness-first website positioning
-- base Qwen and GLM comparison results
-- final human review of upstream license/notice packaging
-- Hugging Face write-capable token or namespace permission
-- Hugging Face repo creation permission for the 51GB merged model upload from
-  Gojira-B
-- final Hugging Face upload metadata and public/private release decision
-- live Cloudflare D1/KV/R2 resources, Stripe products/webhook endpoint,
-  deployment secrets, staging end-to-end paid API requests, rollback, and
-  support boundaries if exposed commercially

 testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
 artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
 as release evidence because the corrected LoRA selector crashes on this
+adapter. The public Hugging Face repos are uploaded and public; the remaining
+release caveats are raw-agent latency, GGUF runtime smoke, 32k live-default
+proof, and real Stripe live-mode charging.
 ## Requirement Status
 | Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
 | Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
 | v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
+| Local inference against v1.8 merged checkpoint | Current fast path serves `kaiju-coder-7` through vLLM bitsandbytes on Gojira-B at `http://100.109.109.14:18084/v1`, exposed locally through `http://127.0.0.1:18181/v1`; current live endpoint reports max model len `16384`. Prior SGLang benchmarks proved 12k/16k/24k/32k startup and smoke evidence, with 32k treated as the high-context target rather than the currently parked runtime. | Passed |
 | v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
+| OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`. Packaged public verifier `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` passed `4/4` in `runs/public-opencode-smoke/20260603T235002Z/summary.md`, including wrong-directory leakage checks; loop-guard smoke wrote `loopguard.txt` with exactly `Kaiju Coder 7 loop guard installed`; latest harnessed customer-readiness pack `runs/opencode-customer-readiness/20260603T185835Z/summary.md` passed `4/4` with `28/28` required files, including release provenance and safety review. | Passed for harnessed/product path |
 | Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
+| Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold` passes `17` checks. `python3 scripts/check_paid_api_readiness.py --mode launch` passes `27/27` checks after live Cloudflare bindings, Worker-to-Gojira proof, Stripe test-mode webhook evidence, staging latency, and rollback proof. Real customer charging still requires a deliberate Stripe live-mode switch and controlled live payment verification. | Scaffold and launch preflight passed; live-mode charging pending |
 | Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
+| Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`; public repos `RMDWLLC/kaiju-coder-7-adapter`, `RMDWLLC/kaiju-coder-7-opencode`, and `RMDWLLC/kaiju-coder-7-quantized-runtime` are uploaded and public. `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` verifies public downloads and helper package content. | Uploaded and public |
+| Hugging Face merged model upload readiness | `RMDWLLC/kaiju-coder-7` is uploaded and public with the merged `53.8G` model package and `14` safetensors shards recorded in `release/HF_UPLOAD_EVIDENCE.md`. Public downloads are verified; the previous private-storage blocker was resolved by switching the repos public. | Uploaded and public |
+| Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local`, `--mode hf-release`, and `--mode public` pass against the current fast proxy and public HF evidence. The checker validates staged files, public naming hygiene, secret-looking raw values, checksums, final report, HF bundle checksum, uploaded evidence, and human signoff. | Local, HF, and public modes passed |
 ## Commands With Current Passing Evidence
 That must be described honestly until external release review confirms:
+- GGUF Q8_0 runtime smoke before public quantized-weight claims
 - raw website latency/SLA positioning or explicit harness-first website positioning
+- broader base Qwen and GLM comparison results before superiority claims
+- 32k context freshly restarted and re-confirmed before making it the live
+  default
+- Stripe live-mode products/webhook secret and a controlled live payment before
+  selling real paid API access

EVAL_SCOREBOARD.md CHANGED Viewed

@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
-| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
 | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
 | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
 | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
@@ -46,10 +46,10 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
 | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
 | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
-| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `84.43s`: website `24.59s`, owner AI company pack `29.99s` with `19` files, Stripe safety plan `9.93s`, CSV parser artifact `19.93s`; run `runs/public-demo-pack/20260603T232534Z/summary.md` | 2026-06-03 |
 | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
-| Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
-| Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
 | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
 | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
 | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
@@ -64,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
 | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
 | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
-| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
 | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
-| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Pending |
-| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged and upload-scripted; upload blocked by HF token permissions and human/public-review decision |
 ## Decision Rule
-The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should still be published honestly as merged model plus deterministic harness plus verifier. Raw merged v1.8 is useful on business documents and Jah credits but slow on this SGLang stack. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.

 | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
+| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
 | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
 | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
 | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
 | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
 | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
 | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
+| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
 | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
+| Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
+| Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
 | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
 | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
 | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
 | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
 | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
 | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
+| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
 | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
+| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
+| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
 ## Decision Rule
+The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.

FINAL_RELEASE_REPORT.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # Kaiju Coder 7 Final Release Report
-Generated: `2026-06-03T23:34:00Z`
 Product name: `Kaiju Coder 7`
 Public model id: `kaiju-coder-7`
@@ -24,11 +24,11 @@ Stripe live-mode switch and controlled live payment verification.
 | Field | Value |
 |---|---|
-| Status | `fail` |
-| Base URL | `http://100.109.109.14:18083/v1` |
-| Model id | `unknown` |
-| Max model length | `unknown` |
-| Detail | `URLError(ConnectionRefusedError(61, 'Connection refused'))` |
 Recommended default today: `16k` context through `kaiju-coder-7`. Higher
 context has benchmark evidence, but the currently parked default is 16k for
@@ -38,9 +38,9 @@ stability and speed.
 | Area | Result |
 |---|---|
-| Local public-testing readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
-| Hugging Face release readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
-| Public launch readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
 | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
 | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
@@ -59,15 +59,11 @@ stability and speed.
 ## Hugging Face Release Blockers
-| Status | Check | Detail |
-|---|---|---|
-| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
 ## Public Launch Blockers
-| Status | Check | Detail |
-|---|---|---|
-| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
 ## Paid API Launch Blockers
@@ -276,9 +272,9 @@ human release review explicitly approves public paid API launch.
 | git HEAD | `git rev-parse HEAD` | 0 |
 | git origin/main | `git rev-parse origin/main` | 0 |
 | git status | `git status --short` | 0 |
-| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
-| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
-| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
 | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
 | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |

 # Kaiju Coder 7 Final Release Report
+Generated: `2026-06-03T23:53:31Z`
 Product name: `Kaiju Coder 7`
 Public model id: `kaiju-coder-7`
 | Field | Value |
 |---|---|
+| Status | `pass` |
+| Base URL | `http://127.0.0.1:18181/v1` |
+| Model id | `kaiju-coder-7` |
+| Max model length | `16384` |
+| Detail | `` |
 Recommended default today: `16k` context through `kaiju-coder-7`. Higher
 context has benchmark evidence, but the currently parked default is 16k for
 | Area | Result |
 |---|---|
+| Local public-testing readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
+| Hugging Face release readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
+| Public launch readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
 | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
 | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
 ## Hugging Face Release Blockers
+- No matching checks.
 ## Public Launch Blockers
+- No matching checks.
 ## Paid API Launch Blockers
 | git HEAD | `git rev-parse HEAD` | 0 |
 | git origin/main | `git rev-parse origin/main` | 0 |
 | git status | `git status --short` | 0 |
+| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
 | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
 | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |

GOAL_COMPLETION_AUDIT.md CHANGED Viewed

@@ -1,11 +1,11 @@
 # Kaiju Coder 7 Goal Completion Audit
-Generated: `2026-06-03T23:35:30Z`
 Overall: `complete`
 Summary: `18 passed / 0 blocked / 0 manual`
-This audit maps the active Kaiju Coder 7 objective to current evidence. It is stricter than local readiness: local public testing and Hugging Face release checks can pass while paid API launch remains blocked.
 ## Readiness Commands
@@ -34,7 +34,7 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
 | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation |  |
 | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums |  |
 | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release |  |
-| Hugging Face | Merged 51GB model repo upload is complete or guarded and ready after human review/namespace permission. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh |  |
 | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply |  |
 | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
 | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md |  |

 # Kaiju Coder 7 Goal Completion Audit
+Generated: `2026-06-03T23:53:44Z`
 Overall: `complete`
 Summary: `18 passed / 0 blocked / 0 manual`
+This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
 ## Readiness Commands
 | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation |  |
 | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums |  |
 | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release |  |
+| Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh |  |
 | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply |  |
 | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
 | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md |  |

PAID_API_READINESS.md CHANGED Viewed

@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
 ```
 `check_kaiju_public_release_readiness.py --mode local` is the consolidated
-public-testing readiness command. It can pass while public upload and paid API
-launch remain manual blockers. `--mode hf-release` checks the downloadable
-model/helper release and requires sanitized Hugging Face namespace permission
-evidence plus human review while keeping paid API launch manual. `--mode public`
-must remain red until Hugging Face write permissions, live Cloudflare resources,
-Stripe staging evidence, rollback proof, and human review are complete.
 `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
 the current local/public readiness summaries, launch blockers, changed files,
@@ -167,8 +167,8 @@ lines.
 `check_kaiju_goal_completion.py --write` writes
 `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
-remain red while Hugging Face upload, human review, or live paid API launch
-evidence are missing.
 `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
 direct API smoke evidence, goal audit, final report, HF staging, local bundle,

 ```
 `check_kaiju_public_release_readiness.py --mode local` is the consolidated
+public-testing readiness command. `--mode hf-release` checks the downloadable
+model/helper release, public Hugging Face evidence, and human review while
+keeping live paid charging separate from model publication. `--mode public`
+now passes after public HF verification, live Cloudflare resource evidence,
+Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
+and human review are complete.
 `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
 the current local/public readiness summaries, launch blockers, changed files,
 `check_kaiju_goal_completion.py --write` writes
 `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
+remain green only while the live runtime, public HF evidence, human review, and
+paid API launch evidence continue to pass.
 `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
 direct API smoke evidence, goal audit, final report, HF staging, local bundle,

PUBLIC_TESTING_QUICKSTART.md CHANGED Viewed

@@ -129,7 +129,8 @@ Expected result:
 - Raw multi-file OpenCode generation: still too slow for broad paid claims;
   useful for testing, but paid API claims should favor harnessed product
   workflows until broader latency gates pass
-- Paid API: not public until launch preflight passes
 ## What Not To Claim Yet
@@ -153,14 +154,15 @@ Do claim:
 - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
   evidence until runtime smoke passes
-## Current Blockers Before Public Release
-- Hugging Face repo creation still requires a write-capable token or namespace.
-- Full merged model upload has not completed; the merged folder must first have
-  the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
 - The GGUF Q8_0 candidate still needs a runtime smoke before public
   quantized-weights upload.
-- Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
-  secret verification, Stripe webhook staging evidence, staging traffic, latency
-  evidence, and rollback proof.
-- Human review is still required before public upload.

 - Raw multi-file OpenCode generation: still too slow for broad paid claims;
   useful for testing, but paid API claims should favor harnessed product
   workflows until broader latency gates pass
+- Paid API: not public until launch preflight passes and the Stripe live-mode
+  switch is deliberately completed
 ## What Not To Claim Yet
 - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
   evidence until runtime smoke passes
+## Remaining Caveats Before Broader Claims
+- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
 - The GGUF Q8_0 candidate still needs a runtime smoke before public
   quantized-weights upload.
+- Raw multi-file OpenCode generation is still not the public speed story; use
+  the deterministic router/harness for websites and business-owner packs.
+- Public paid API launch has approval and preflight evidence, but real customer
+  charging still needs a deliberate Stripe live-mode switch and controlled live
+  payment verification.
+- Do not claim 32k context as the live default until it is freshly restarted
+  and re-confirmed.

README.md CHANGED Viewed

@@ -95,17 +95,21 @@ Local product-path evidence:
 Merged serving evidence:
-- Endpoint: `http://100.109.109.14:18083/v1`
 - Served model: `kaiju-coder-7`
-- Tested context: `32768` on Gojira-B, with `16384` documented as the lower-load fallback.
 - Probe: `1,155` visible chars in `60.17s`.
 - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
 - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
 - OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
 - vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
-  `--language-model-only`, but not faster enough to replace SGLang.
-- Runtime-quantized vLLM bitsandbytes: passed at `8192` and `16384`; 16k code
-  patch completed in `11.3s`, and logs reported about `17.8 GiB` model memory.
 Known comparison caveat:
@@ -117,5 +121,7 @@ Known comparison caveat:
 - Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
 - The deterministic harness remains the practical paid website workflow.
 - The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
-- Human review is still required before any public upload or paid production claim.
 - Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.

 Merged serving evidence:
+- Current endpoint: `http://127.0.0.1:18181/v1`, forwarding to vLLM
+  bitsandbytes on Gojira B at `http://100.109.109.14:18084/v1`
 - Served model: `kaiju-coder-7`
+- Tested context: `16384` for the current OpenCode fast path. Historical
+  SGLang benchmark evidence includes `32768`, but 32k should be freshly
+  restarted and re-confirmed before being called the live default.
 - Probe: `1,155` visible chars in `60.17s`.
 - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
 - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
 - OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
 - vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
+  `--language-model-only`.
+- Runtime-quantized vLLM bitsandbytes: current speed path; passed at `8192`
+  and `16384`; 16k code patch completed in `11.3s`, and logs reported about
+  `17.8 GiB` model memory.
 Known comparison caveat:
 - Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
 - The deterministic harness remains the practical paid website workflow.
 - The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
+- Public HF upload and human review are complete for testing. Real customer
+  paid charging still requires Stripe live-mode setup and controlled live
+  payment verification.
 - Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.

SERVING_BENCHMARKS.md CHANGED Viewed

@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
 ## Current Live Runtime
 - Host: Gojira-B over Tailscale
-- Base URL: `http://100.109.109.14:18083/v1`
-- Serving stack: SGLang merged full model
-- Current verified post-quantization restored context: `16384`
 - Tested high-context target: `32768`
-- Current container: `qwen36-merged-sglang-18083`
-- Current caveat: direct raw generation is slow for multi-file OpenCode work.
 ## Benchmark Command
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
 | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
 | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
-Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it is not clearly
-faster than SGLang on the current smoke prompts. Keep SGLang as the recommended
-runtime because it has stable OpenCode smoke evidence, a simpler launch path,
-and historical 32k proof. Keep the live/default OpenCode profile at 16k until
-32k is freshly re-confirmed. Keep the vLLM scripts for future nightly-image or
-quantized-weight testing.
 ## vLLM bitsandbytes Runtime-Quantized Candidate
@@ -353,12 +355,12 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
 Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
 exactly `Kaiju Coder 7 quantized runtime ok`.
-Recommendation: keep SGLang as the default public/OpenCode runtime and keep the
-currently installed OpenCode profile at 16k unless the 32k target has just been
-restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
-quantized local candidate for advanced GPU users and future paid API speed
-experiments. It now has direct identity/code/business-doc evidence plus an
-OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
 ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
@@ -386,7 +388,7 @@ Fresh OpenCode smoke through the local fast proxy:
 - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
 - Result: passed in about `23.5s`, wrote the exact requested file.
 - Packaged public verifier after exact-content agent rule:
-  `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4`
   passed through `http://127.0.0.1:18181/v1`.
 Website harness/router speed pass:
@@ -414,16 +416,16 @@ python3 scripts/run_kaiju_public_demo_pack.py \
   --planner-timeout 90
 ```
-Run: `runs/public-demo-pack/20260603T232534Z/summary.md`
 | Task | Result | Seconds | Changed files |
 | --- | --- | ---: | ---: |
-| Website | Passed | 24.59 | 2 |
-| Owner AI company pack | Passed | 29.99 | 19 |
-| Stripe safety plan | Passed | 9.93 | 2 |
-| CSV parser artifact | Passed | 19.93 | 2 |
-Total: `4/4` passed in `84.43s`.
 ## Persisted GGUF Q8_0 Candidate

 ## Current Live Runtime
 - Host: Gojira-B over Tailscale
+- Local OpenCode base URL: `http://127.0.0.1:18181/v1`
+- Upstream base URL: `http://100.109.109.14:18084/v1`
+- Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
+  proxy
+- Current verified context: `16384`
 - Tested high-context target: `32768`
+- Current container: `qwen36-merged-vllm-18084`
+- Current caveat: direct raw generation is still slow for multi-file OpenCode
+  work; use the deterministic router/harness for public business-owner demos.
 ## Benchmark Command
 | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
 | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
+Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
+clearly faster than SGLang on these smoke prompts. This is historical fallback
+evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
+path. Keep the live/default OpenCode profile at 16k until 32k is freshly
+re-confirmed.
 ## vLLM bitsandbytes Runtime-Quantized Candidate
 Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
 exactly `Kaiju Coder 7 quantized runtime ok`.
+Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
+current public/OpenCode speed path and keep the installed OpenCode profile at
+16k unless the 32k target has just been restarted and re-confirmed. Treat
+SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
+direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
+it is not a persisted quantized-weights repo.
 ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
 - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
 - Result: passed in about `23.5s`, wrote the exact requested file.
 - Packaged public verifier after exact-content agent rule:
+  `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
   passed through `http://127.0.0.1:18181/v1`.
 Website harness/router speed pass:
   --planner-timeout 90
 ```
+Run: `runs/public-demo-pack/20260603T235009Z/summary.md`
 | Task | Result | Seconds | Changed files |
 | --- | --- | ---: | ---: |
+| Website | Passed | 4.73 | 2 |
+| Owner AI company pack | Passed | 29.85 | 19 |
+| Stripe safety plan | Passed | 9.99 | 2 |
+| CSV parser artifact | Passed | 19.97 | 2 |
+Total: `4/4` passed in `64.529s`.
 ## Persisted GGUF Q8_0 Candidate