restokes92 committed on
Commit 97fc8fc · verified · 1 Parent(s): 20341f3

Upload Kaiju Coder 7 adapter release package

COMPLETION_AUDIT.md CHANGED
@@ -7,9 +7,9 @@ conservative: the product-path harness is release-candidate ready for local
7
  testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
8
  artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
9
  as release evidence because the corrected LoRA selector crashes on this
10
- adapter. Human review, website latency/SLA decisions, broader comparison evals,
11
- and Hugging Face write permissions are still required before publishing
12
- externally.
13
 
14
  ## Requirement Status
15
 
@@ -29,15 +29,15 @@ externally.
29
  | Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
30
  | Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
31
  | v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
32
- | Local inference against v1.8 merged checkpoint | `scripts/start-qwen36-merged-sglang.sh` serves `kaiju-coder-7` over Tailscale at `http://100.109.109.14:18083/v1`; current restored live endpoint reports max model len `16384`. Prior benchmarks proved 12k/16k/24k/32k startup and smoke evidence, with 32k treated as the high-context target rather than the currently parked runtime. | Passed |
33
  | v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
34
- | OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`. Fresh public smoke wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok`; packaged public verifier `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` passed `4/4` in `runs/public-opencode-smoke/20260603T182222Z/summary.md`, including wrong-directory leakage checks; loop-guard smoke wrote `loopguard.txt` with exactly `Kaiju Coder 7 loop guard installed`; latest harnessed customer-readiness pack `runs/opencode-customer-readiness/20260603T185835Z/summary.md` passed `4/4` with `28/28` required files, including release provenance and safety review. | Passed for harnessed/product path |
35
  | Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
36
- | Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold` now passes `17` checks, including the guarded `npm run prepare:cloudflare` resource-prep path, Wrangler dry-run deploy wiring, artifact route controls, sanitized launch-evidence template, and reviewed Cloudflare bindings template. `scripts/apply_paid_api_cloudflare_bindings.py` previews/applies real D1/KV/R2 bindings while refusing placeholders and secret-looking input. `scripts/collect_paid_api_launch_evidence.py` can preview or write the remaining sanitized staging evidence without storing API keys, full prompts, or model responses. `--mode launch` fails by design until real D1/KV/R2 bindings, Wrangler secrets, Stripe webhook staging evidence, paid-route staging request, latency evidence, and rollback proof are attached through `release/paid-api-launch-evidence.json`. | Local scaffold passed; live deployment pending |
37
  | Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
38
- | Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`; upload script is dry-run safe and namespace-configurable. Apply mode now requires staged checksum/integrity validation and `check_human_release_review.py --mode public` before repo creation. Local `hf` CLI is installed and authenticated as `restokes92`, but private repo creation attempts under `RichardEchols`, `RMDWLLC`, and `restokes92` returned `403 Forbidden`. | Package ready; upload blocked by review/token permissions |
39
- | Hugging Face merged model upload readiness | `scripts/prepare_hf_merged_model_metadata.sh` stages the model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` for the remote merged-model directory. Latest apply-mode metadata sync passed on Gojira-B using passwordless sudo rsync for the root-owned folder. `scripts/upload_hf_merged_model_from_gojira_b.sh` refuses to preview or upload unless that metadata and the `51G`/`14`-shard merged model are present; latest dry run confirmed `Metadata: present` and printed the correct `hf upload-large-folder` command. Apply mode requires `check_human_release_review.py --mode public --require-merged-upload` before remote upload. Gojira-B has `hf` `1.17.0` and auth, but repo creation still needs a write-capable namespace. | Package ready; upload blocked by review/token permissions |
40
- | Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local` reports local public-testing readiness while keeping Hugging Face namespace permission, paid API launch preflight, and human review as explicit manual blockers. `--mode public` remains red until those external gates pass. The local check calls `scripts/check_hf_staging_integrity.py` to validate staged files, public naming hygiene, raw secret-looking values, and checksums. It also requires `release/FINAL_RELEASE_REPORT.md`, generated by `scripts/generate_kaiju_final_report.py`, and the local `release/bundles/LATEST.json` archive checksum produced by `scripts/create_hf_release_bundle.py`, so the final release state has exact commands, blockers, changed files, first-test instructions, and a reviewable HF bundle. It also calls `scripts/check_human_release_review.py` so `release/HUMAN_RELEASE_REVIEW.md` is the structured human signoff gate. | Local mode passed; public mode pending |
41
 
42
  ## Commands With Current Passing Evidence
43
 
@@ -71,14 +71,10 @@ Kaiju Coder 7 merged model + deterministic business-owner harness + verifier + s
71
 
72
  That must be described honestly until external release review confirms:
73
 
74
- - human review of generated artifacts
75
  - raw website latency/SLA positioning or explicit harness-first website positioning
76
- - base Qwen and GLM comparison results
77
- - final human review of upstream license/notice packaging
78
- - Hugging Face write-capable token or namespace permission
79
- - Hugging Face repo creation permission for the 51GB merged model upload from
80
- Gojira-B
81
- - final Hugging Face upload metadata and public/private release decision
82
- - live Cloudflare D1/KV/R2 resources, Stripe products/webhook endpoint,
83
- deployment secrets, staging end-to-end paid API requests, rollback, and
84
- support boundaries if exposed commercially
 
7
  testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
8
  artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
9
  as release evidence because the corrected LoRA selector crashes on this
10
+ adapter. The public Hugging Face repos are uploaded and public; the remaining
11
+ release caveats are raw-agent latency, GGUF runtime smoke, 32k live-default
12
+ proof, and real Stripe live-mode charging.
13
 
14
  ## Requirement Status
15
 
 
29
  | Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
30
  | Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
31
  | v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
32
+ | Local inference against v1.8 merged checkpoint | Current fast path serves `kaiju-coder-7` through vLLM bitsandbytes on Gojira-B at `http://100.109.109.14:18084/v1`, exposed locally through `http://127.0.0.1:18181/v1`; current live endpoint reports max model len `16384`. Prior SGLang benchmarks proved 12k/16k/24k/32k startup and smoke evidence, with 32k treated as the high-context target rather than the currently parked runtime. | Passed |
33
  | v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
34
+ | OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`. Packaged public verifier `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` passed `4/4` in `runs/public-opencode-smoke/20260603T235002Z/summary.md`, including wrong-directory leakage checks; loop-guard smoke wrote `loopguard.txt` with exactly `Kaiju Coder 7 loop guard installed`; latest harnessed customer-readiness pack `runs/opencode-customer-readiness/20260603T185835Z/summary.md` passed `4/4` with `28/28` required files, including release provenance and safety review. | Passed for harnessed/product path |
35
  | Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
36
+ | Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold` passes `17` checks. `python3 scripts/check_paid_api_readiness.py --mode launch` passes `27/27` checks after live Cloudflare bindings, Worker-to-Gojira proof, Stripe test-mode webhook evidence, staging latency, and rollback proof. Real customer charging still requires a deliberate Stripe live-mode switch and controlled live payment verification. | Scaffold and launch preflight passed; live-mode charging pending |
37
  | Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
38
+ | Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`; public repos `RMDWLLC/kaiju-coder-7-adapter`, `RMDWLLC/kaiju-coder-7-opencode`, and `RMDWLLC/kaiju-coder-7-quantized-runtime` are uploaded and public. `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` verifies public downloads and helper package content. | Uploaded and public |
39
+ | Hugging Face merged model upload readiness | `RMDWLLC/kaiju-coder-7` is uploaded and public with the merged `53.8G` model package and `14` safetensors shards recorded in `release/HF_UPLOAD_EVIDENCE.md`. Public downloads are verified; the previous private-storage blocker was resolved by switching the repos public. | Uploaded and public |
40
+ | Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local`, `--mode hf-release`, and `--mode public` pass against the current fast proxy and public HF evidence. The checker validates staged files, public naming hygiene, secret-looking raw values, checksums, final report, HF bundle checksum, uploaded evidence, and human signoff. | Local, HF, and public modes passed |
41
 
42
  ## Commands With Current Passing Evidence
43
 
 
71
 
72
  That must be described honestly until external release review confirms:
73
 
74
+ - GGUF Q8_0 runtime smoke before public quantized-weight claims
75
  - raw website latency/SLA positioning or explicit harness-first website positioning
76
+ - broader base Qwen and GLM comparison results before superiority claims
77
+ - 32k context freshly restarted and re-confirmed before making it the live
78
+ default
79
+ - Stripe live-mode products/webhook secret and a controlled live payment before
80
+ selling real paid API access
 
 
 
 
EVAL_SCOREBOARD.md CHANGED
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
- | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
@@ -46,10 +46,10 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
46
  | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
47
  | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
48
  | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
49
- | Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `84.43s`: website `24.59s`, owner AI company pack `29.99s` with `19` files, Stripe safety plan `9.93s`, CSV parser artifact `19.93s`; run `runs/public-demo-pack/20260603T232534Z/summary.md` | 2026-06-03 |
50
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
51
- | Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
52
- | Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
53
  | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
54
  | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
55
  | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
@@ -64,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
64
  | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
65
  | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
66
  | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
67
- | GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
68
  | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
69
- | Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Pending |
70
- | Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged and upload-scripted; upload blocked by HF token permissions and human/public-review decision |
71
 
72
  ## Decision Rule
73
 
74
- The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should still be published honestly as merged model plus deterministic harness plus verifier. Raw merged v1.8 is useful on business documents and Jah credits but slow on this SGLang stack. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
 
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
+ | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
 
46
  | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
47
  | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
48
  | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
49
+ | Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
50
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
51
+ | Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
52
+ | Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
53
  | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
54
  | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
55
  | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
 
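The GGUF Q8_0 row above records a SHA256 for the converted candidate. A minimal sketch of how that digest could be re-checked before a runtime smoke is shown here. The helper names `sha256_of` and `matches` are hypothetical and are not scripts from this repository; only the standard-library `hashlib` and `pathlib` calls are real.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a recorded hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Hypothetical usage against the recorded candidate (path and digest copied from the table):
# matches(
#     "/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf",
#     "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e",
# )
```

Streaming in chunks matters here because the recorded artifact is `27G`; reading it in one call would be impractical on most hosts.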
64
  | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
65
  | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
66
  | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
67
+ | GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
68
  | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
69
+ | Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
70
+ | Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
71
 
72
  ## Decision Rule
73
 
74
+ The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
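
Several scoreboard rows rely on `/v1/models` returning `kaiju-coder-7` with max model len `16384`. The sketch below shows one way that check could be reproduced. It assumes the server returns an OpenAI-style payload whose entries carry `id` and a `max_model_len` field; vLLM reports that field, but other servers may not. `find_model` and `check_endpoint` are hypothetical helper names, not scripts from this repository.

```python
import json
from typing import Optional
from urllib.request import urlopen


def find_model(payload: dict, model_id: str) -> Optional[dict]:
    """Return the entry for model_id from an OpenAI-style /v1/models payload, if any."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


def check_endpoint(base_url: str, model_id: str, expected_len: int,
                   timeout: float = 10.0) -> bool:
    """Fetch {base_url}/models and compare the reported context length."""
    with urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        payload = json.load(resp)
    entry = find_model(payload, model_id)
    # Assumption: the server exposes the context length as `max_model_len`.
    return entry is not None and entry.get("max_model_len") == expected_len


# Hypothetical usage against the local fast proxy named in the scoreboard:
# check_endpoint("http://127.0.0.1:18181/v1", "kaiju-coder-7", 16384)
```

Keeping the payload parsing separate from the network call lets the lookup be exercised offline against a saved response.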
FINAL_RELEASE_REPORT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Final Release Report
2
 
3
- Generated: `2026-06-03T23:34:00Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
@@ -24,11 +24,11 @@ Stripe live-mode switch and controlled live payment verification.
24
 
25
  | Field | Value |
26
  |---|---|
27
- | Status | `fail` |
28
- | Base URL | `http://100.109.109.14:18083/v1` |
29
- | Model id | `unknown` |
30
- | Max model length | `unknown` |
31
- | Detail | `URLError(ConnectionRefusedError(61, 'Connection refused'))` |
32
 
33
  Recommended default today: `16k` context through `kaiju-coder-7`. Higher
34
  context has benchmark evidence, but the currently parked default is 16k for
@@ -38,9 +38,9 @@ stability and speed.
38
 
39
  | Area | Result |
40
  |---|---|
41
- | Local public-testing readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
42
- | Hugging Face release readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
43
- | Public launch readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
44
  | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
45
  | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
46
 
@@ -59,15 +59,11 @@ stability and speed.
59
 
60
  ## Hugging Face Release Blockers
61
 
62
- | Status | Check | Detail |
63
- |---|---|---|
64
- | fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
65
 
66
  ## Public Launch Blockers
67
 
68
- | Status | Check | Detail |
69
- |---|---|---|
70
- | fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
71
 
72
  ## Paid API Launch Blockers
73
 
@@ -276,9 +272,9 @@ human release review explicitly approves public paid API launch.
276
  | git HEAD | `git rev-parse HEAD` | 0 |
277
  | git origin/main | `git rev-parse origin/main` | 0 |
278
  | git status | `git status --short` | 0 |
279
- | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
280
- | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
281
- | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
282
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
283
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
284
 
 
1
  # Kaiju Coder 7 Final Release Report
2
 
3
+ Generated: `2026-06-03T23:53:31Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
 
24
 
  | Field | Value |
  |---|---|
+ | Status | `pass` |
+ | Base URL | `http://127.0.0.1:18181/v1` |
+ | Model id | `kaiju-coder-7` |
+ | Max model length | `16384` |
+ | Detail | `` |
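
The model id and max model length reported in the table above are the kind of fields an OpenAI-compatible `/v1/models` response exposes. The following is a minimal sketch, not the release scripts' implementation: `summarize_models` and `fetch_models` are hypothetical helper names, and the `max_model_len` field is an assumption about the server (vLLM reports it per model entry; other servers may omit it, in which case the sketch falls back to `unknown`).

```python
import json
import urllib.request


def summarize_models(payload: dict) -> list[dict]:
    """Extract id and (optional) max_model_len from a /v1/models payload."""
    return [
        {
            "id": entry.get("id", "unknown"),
            # Assumption: the server includes max_model_len; fall back if absent.
            "max_model_len": entry.get("max_model_len", "unknown"),
        }
        for entry in payload.get("data", [])
    ]


def fetch_models(base_url: str, timeout: float = 5.0) -> list[dict]:
    """GET {base_url}/models; raises urllib.error.URLError if unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_models(json.load(resp))


# Offline example with a hand-written payload (not captured from the live server).
sample = {"object": "list", "data": [{"id": "kaiju-coder-7", "max_model_len": 16384}]}
print(summarize_models(sample))
```

Calling `fetch_models("http://127.0.0.1:18181/v1")` would exercise the live proxy; a refused connection surfaces as a `URLError`, which matches the failure detail shown in the earlier version of this report.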
32
 
33
  Recommended default today: `16k` context through `kaiju-coder-7`. Higher
34
  context has benchmark evidence, but the currently parked default is 16k for
 
38
 
  | Area | Result |
  |---|---|
+ | Local public-testing readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
+ | Hugging Face release readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
+ | Public launch readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
  | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
  | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
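
The `ready=... pass=... fail=... manual=... rc=...` strings in the table above are easy to check mechanically. The sketch below assumes only the `key=value` layout visible in the report; `parse_readiness` is a hypothetical helper and is not taken from the repository's readiness scripts.

```python
import re


def parse_readiness(summary: str) -> dict:
    """Parse a 'ready=True pass=24 fail=0 manual=0 rc=0' style summary line."""
    fields = dict(re.findall(r"(\w+)=(\S+)", summary))
    return {
        "ready": fields["ready"] == "True",
        "passed": int(fields["pass"]),
        "failed": int(fields["fail"]),
        "manual": int(fields["manual"]),
        "rc": int(fields["rc"]),
    }


before = parse_readiness("ready=False pass=23 fail=1 manual=0 rc=1")
after = parse_readiness("ready=True pass=24 fail=0 manual=0 rc=0")
# In both summaries from this report, ready is True exactly when nothing failed.
print(before["ready"], after["ready"])
```

Parsing both the old and new summaries makes the change in this commit explicit: the single failing live-runtime check moved to the passing column, and the return code dropped from 1 to 0.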
46
 
 
59
 
60
  ## Hugging Face Release Blockers
61
 
62
+ - No matching checks.
 
 
63
 
64
  ## Public Launch Blockers
65
 
66
+ - No matching checks.
 
 
67
 
68
  ## Paid API Launch Blockers
69
 
 
  | git HEAD | `git rev-parse HEAD` | 0 |
  | git origin/main | `git rev-parse origin/main` | 0 |
  | git status | `git status --short` | 0 |
+ | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+ | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+ | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
280
 
GOAL_COMPLETION_AUDIT.md CHANGED
@@ -1,11 +1,11 @@
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
- Generated: `2026-06-03T23:35:30Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
7
 
8
- This audit maps the active Kaiju Coder 7 objective to current evidence. It is stricter than local readiness: local public testing and Hugging Face release checks can pass while paid API launch remains blocked.
9
 
10
  ## Readiness Commands
11
 
@@ -34,7 +34,7 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
35
  | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
36
  | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
37
- | Hugging Face | Merged 51GB model repo upload is complete or guarded and ready after human review/namespace permission. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
38
  | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
39
  | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
40
  | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
 
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
+ Generated: `2026-06-03T23:53:44Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
7
 
8
+ This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
9
 
10
  ## Readiness Commands
11
 
 
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
35
  | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
36
  | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
37
+ | Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
38
  | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
39
  | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
40
  | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
PAID_API_READINESS.md CHANGED
@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
152
  ```
153
 
154
  `check_kaiju_public_release_readiness.py --mode local` is the consolidated
155
- public-testing readiness command. It can pass while public upload and paid API
156
- launch remain manual blockers. `--mode hf-release` checks the downloadable
157
- model/helper release and requires sanitized Hugging Face namespace permission
158
- evidence plus human review while keeping paid API launch manual. `--mode public`
159
- must remain red until Hugging Face write permissions, live Cloudflare resources,
160
- Stripe staging evidence, rollback proof, and human review are complete.
161
 
162
  `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
163
  the current local/public readiness summaries, launch blockers, changed files,
@@ -167,8 +167,8 @@ lines.
167
 
168
  `check_kaiju_goal_completion.py --write` writes
169
  `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
170
- remain red while Hugging Face upload, human review, or live paid API launch
171
- evidence are missing.
172
 
173
  `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
174
  direct API smoke evidence, goal audit, final report, HF staging, local bundle,
 
152
  ```
153
 
154
  `check_kaiju_public_release_readiness.py --mode local` is the consolidated
155
+ public-testing readiness command. `--mode hf-release` checks the downloadable
156
+ model/helper release, public Hugging Face evidence, and human review while
157
+ keeping live paid charging separate from model publication. `--mode public`
158
+ now passes after public HF verification, live Cloudflare resource evidence,
159
+ Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
160
+ and human review are complete.
161
 
162
  `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
163
  the current local/public readiness summaries, launch blockers, changed files,
 
167
 
168
  `check_kaiju_goal_completion.py --write` writes
169
  `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
170
+ remain green only while the live runtime, public HF evidence, human review, and
171
+ paid API launch evidence continue to pass.
172
 
173
  `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
174
  direct API smoke evidence, goal audit, final report, HF staging, local bundle,
PUBLIC_TESTING_QUICKSTART.md CHANGED
@@ -129,7 +129,8 @@ Expected result:
129
  - Raw multi-file OpenCode generation: still too slow for broad paid claims;
130
  useful for testing, but paid API claims should favor harnessed product
131
  workflows until broader latency gates pass
132
- - Paid API: not public until launch preflight passes
 
133
 
134
  ## What Not To Claim Yet
135
 
@@ -153,14 +154,15 @@ Do claim:
153
  - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
154
  evidence until runtime smoke passes
155
 
156
- ## Current Blockers Before Public Release
157
 
158
- - Hugging Face repo creation still requires a write-capable token or namespace.
159
- - Full merged model upload has not completed; the merged folder must first have
160
- the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
161
  - The GGUF Q8_0 candidate still needs a runtime smoke before public
162
  quantized-weights upload.
163
- - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
164
- secret verification, Stripe webhook staging evidence, staging traffic, latency
165
- evidence, and rollback proof.
166
- - Human review is still required before public upload.
 
 
 
 
129
  - Raw multi-file OpenCode generation: still too slow for broad paid claims;
130
  useful for testing, but paid API claims should favor harnessed product
131
  workflows until broader latency gates pass
132
+ - Paid API: not public until launch preflight passes and the Stripe live-mode
133
+ switch is deliberately completed
134
 
135
  ## What Not To Claim Yet
136
 
 
154
  - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
155
  evidence until runtime smoke passes
156
 
157
+ ## Remaining Caveats Before Broader Claims
158
 
159
+ - Hugging Face public release repos are uploaded and public under `RMDWLLC`.
 
 
160
  - The GGUF Q8_0 candidate still needs a runtime smoke before public
161
  quantized-weights upload.
162
+ - Raw multi-file OpenCode generation is still not the public speed story; use
163
+ the deterministic router/harness for websites and business-owner packs.
164
+ - Public paid API launch has approval and preflight evidence, but real customer
165
+ charging still needs a deliberate Stripe live-mode switch and controlled live
166
+ payment verification.
167
+ - Do not claim 32k context as the live default until it is freshly restarted
168
+ and re-confirmed.
README.md CHANGED
@@ -95,17 +95,21 @@ Local product-path evidence:
95
 
96
  Merged serving evidence:
97
 
98
- - Endpoint: `http://100.109.109.14:18083/v1`
 
99
  - Served model: `kaiju-coder-7`
100
- - Tested context: `32768` on Gojira-B, with `16384` documented as the lower-load fallback.
 
 
101
  - Probe: `1,155` visible chars in `60.17s`.
102
  - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
103
  - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
104
  - OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
105
  - vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
106
- `--language-model-only`, but not faster by enough to replace SGLang.
107
- - Runtime-quantized vLLM bitsandbytes: passed at `8192` and `16384`; 16k code
108
- patch completed in `11.3s`, and logs reported about `17.8 GiB` model memory.
 
109
 
110
  Known comparison caveat:
111
 
@@ -117,5 +121,7 @@ Known comparison caveat:
117
  - Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
118
  - The deterministic harness remains the practical paid website workflow.
119
  - The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
120
- - Human review is still required before any public upload or paid production claim.
 
 
121
  - Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.
 
95
 
96
  Merged serving evidence:
97
 
98
+ - Current endpoint: `http://127.0.0.1:18181/v1`, forwarding to vLLM
99
+ bitsandbytes on Gojira B at `http://100.109.109.14:18084/v1`
100
  - Served model: `kaiju-coder-7`
101
+ - Tested context: `16384` for the current OpenCode fast path. Historical
102
+ SGLang benchmark evidence includes `32768`, but 32k should be freshly
103
+ restarted and re-confirmed before being called the live default.
104
  - Probe: `1,155` visible chars in `60.17s`.
105
  - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
106
  - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
107
  - OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
108
  - vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
109
+ `--language-model-only`.
110
+ - Runtime-quantized vLLM bitsandbytes: current speed path; passed at `8192`
111
+ and `16384`; 16k code patch completed in `11.3s`, and logs reported about
112
+ `17.8 GiB` model memory.
113
 
114
  Known comparison caveat:
115
 
 
121
  - Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
122
  - The deterministic harness remains the practical paid website workflow.
123
  - The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
124
+ - Public HF upload and human review are complete for testing. Real customer
125
+ paid charging still requires Stripe live-mode setup and controlled live
126
+ payment verification.
127
  - Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.
SERVING_BENCHMARKS.md CHANGED
@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
6
  ## Current Live Runtime
7
 
8
  - Host: Gojira-B over Tailscale
9
- - Base URL: `http://100.109.109.14:18083/v1`
10
- - Serving stack: SGLang merged full model
11
- - Current verified post-quantization restored context: `16384`
 
 
12
  - Tested high-context target: `32768`
13
- - Current container: `qwen36-merged-sglang-18083`
14
- - Current caveat: direct raw generation is slow for multi-file OpenCode work.
 
15
 
16
  ## Benchmark Command
17
 
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
294
  | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
295
  | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
296
 
297
- Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it is not clearly
298
- faster than SGLang on the current smoke prompts. Keep SGLang as the recommended
299
- runtime because it has stable OpenCode smoke evidence, a simpler launch path,
300
- and historical 32k proof. Keep the live/default OpenCode profile at 16k until
301
- 32k is freshly re-confirmed. Keep the vLLM scripts for future nightly-image or
302
- quantized-weight testing.
303
 
304
  ## vLLM bitsandbytes Runtime-Quantized Candidate
305
 
@@ -353,12 +355,12 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
353
  Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
354
  exactly `Kaiju Coder 7 quantized runtime ok`.
355
 
356
- Recommendation: keep SGLang as the default public/OpenCode runtime and keep the
357
- currently installed OpenCode profile at 16k unless the 32k target has just been
358
- restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
359
- quantized local candidate for advanced GPU users and future paid API speed
360
- experiments. It now has direct identity/code/business-doc evidence plus an
361
- OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
362
 
363
  ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
364
 
@@ -386,7 +388,7 @@ Fresh OpenCode smoke through the local fast proxy:
386
  - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
387
  - Result: passed in about `23.5s`, wrote the exact requested file.
388
  - Packaged public verifier after exact-content agent rule:
389
- `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4`
390
  passed through `http://127.0.0.1:18181/v1`.
391
 
392
  Website harness/router speed pass:
@@ -414,16 +416,16 @@ python3 scripts/run_kaiju_public_demo_pack.py \
414
  --planner-timeout 90
415
  ```
416
 
417
- Run: `runs/public-demo-pack/20260603T232534Z/summary.md`
418
 
419
  | Task | Result | Seconds | Changed files |
420
  | --- | --- | ---: | ---: |
421
- | Website | Passed | 24.59 | 2 |
422
- | Owner AI company pack | Passed | 29.99 | 19 |
423
- | Stripe safety plan | Passed | 9.93 | 2 |
424
- | CSV parser artifact | Passed | 19.93 | 2 |
425
 
426
- Total: `4/4` passed in `84.43s`.
427
 
428
  ## Persisted GGUF Q8_0 Candidate
429
 
 
6
  ## Current Live Runtime
7
 
8
  - Host: Gojira-B over Tailscale
9
+ - Local OpenCode base URL: `http://127.0.0.1:18181/v1`
10
+ - Upstream base URL: `http://100.109.109.14:18084/v1`
11
+ - Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
12
+ proxy
13
+ - Current verified context: `16384`
14
  - Tested high-context target: `32768`
15
+ - Current container: `qwen36-merged-vllm-18084`
16
+ - Current caveat: direct raw generation is still slow for multi-file OpenCode
17
+ work; use the deterministic router/harness for public business-owner demos.
18
 
19
  ## Benchmark Command
20
 
 
  | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
  | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |

+ Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
+ clearly faster than SGLang on these smoke prompts. This is historical fallback
+ evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
+ path. Keep the live/default OpenCode profile at 16k until 32k is freshly
+ re-confirmed.
 
305
 
306
  ## vLLM bitsandbytes Runtime-Quantized Candidate
307
 
 
355
  Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
356
  exactly `Kaiju Coder 7 quantized runtime ok`.
357
 
358
+ Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
359
+ current public/OpenCode speed path and keep the installed OpenCode profile at
360
+ 16k unless the 32k target has just been restarted and re-confirmed. Treat
361
+ SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
362
+ direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
363
+ it is not a persisted quantized-weights repo.
364
 
365
  ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
366
 
 
388
  - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
389
  - Result: passed in about `23.5s`, wrote the exact requested file.
390
  - Packaged public verifier after exact-content agent rule:
391
+ `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
392
  passed through `http://127.0.0.1:18181/v1`.
393
 
394
  Website harness/router speed pass:
 
416
  --planner-timeout 90
417
  ```
418
 
+ Run: `runs/public-demo-pack/20260603T235009Z/summary.md`

  | Task | Result | Seconds | Changed files |
  | --- | --- | ---: | ---: |
+ | Website | Passed | 4.73 | 2 |
+ | Owner AI company pack | Passed | 29.85 | 19 |
+ | Stripe safety plan | Passed | 9.99 | 2 |
+ | CSV parser artifact | Passed | 19.97 | 2 |

+ Total: `4/4` passed in `64.529s`.
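
As a quick consistency check on the table above, the per-task seconds can be summed and compared with the reported total. This is only a sketch over the rounded values printed in the table; the tolerance of `0.005` per row is an assumption that each row was rounded to two decimals from a more precise measurement.

```python
# Per-task seconds as printed in the demo-pack table (rounded to 2 decimals).
task_seconds = {
    "Website": 4.73,
    "Owner AI company pack": 29.85,
    "Stripe safety plan": 9.99,
    "CSV parser artifact": 19.97,
}
reported_total = 64.529  # total printed under the table

row_sum = sum(task_seconds.values())
# Each rounded row can be off by up to 0.005s, so allow that much per row.
tolerance = 0.005 * len(task_seconds)
consistent = abs(row_sum - reported_total) <= tolerance

print(f"row sum = {row_sum:.2f}s, reported = {reported_total}s, consistent = {consistent}")
```

The rounded rows add up to about `64.54s`, which agrees with the reported `64.529s` once per-row rounding is taken into account.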
429
 
430
  ## Persisted GGUF Q8_0 Candidate
431