Commit 4ca1eb4 by restokes92 · verified · 1 Parent(s): 00ba859

Add files using upload-large-folder tool

EVAL_SCOREBOARD.md CHANGED
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
- | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T182222Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and file written only in the requested temp workspace | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
@@ -43,9 +43,13 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
43
  | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
44
  | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
45
  | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
 
 
 
 
46
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
47
- | Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
48
- | Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
49
  | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
50
  | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
51
  | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
@@ -60,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
60
  | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
61
  | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
62
  | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
63
- | GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
64
  | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
65
- | Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Pending |
66
- | Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged and upload-scripted; upload blocked by HF token permissions and human/public-review decision |
67
 
68
  ## Decision Rule
69
 
70
- The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should still be published honestly as merged model plus deterministic harness plus verifier. Raw merged v1.8 is useful on business documents and Jah credits but slow on this SGLang stack. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
 
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
+ | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
 
43
  | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
44
  | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
45
  | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
46
+ | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
47
+ | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
48
+ | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
49
+ | Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
50
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
51
+ | Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
52
+ | Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
53
  | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
54
  | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
55
  | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
 
64
  | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
65
  | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
66
  | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
67
+ | GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
68
  | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
69
+ | Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
70
+ | Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
71
 
72
  ## Decision Rule
73
 
74
+ The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
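The throughput figure in the runtime-quantized business-doc row can be re-derived from its character count and wall-clock time. Below is a minimal Python sketch, assuming chars/s is visible characters divided by elapsed seconds. The helper name `chars_per_second` is hypothetical and is not one of the repository scripts.

```python
def chars_per_second(chars: int, seconds: float) -> float:
    """Return visible characters per second, rounded to three decimals.

    Hypothetical helper for re-checking scoreboard figures; the real
    benchmark scripts may compute throughput differently.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return round(chars / seconds, 3)


# Figures copied from the business-doc smoke row: 1,610 chars in 53.44s.
business_doc_rate = chars_per_second(1610, 53.44)
print(business_doc_rate)
```

Running the same helper over other rows that report both a character count and a duration gives a quick consistency check before a speed claim is published.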
FINAL_RELEASE_REPORT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Final Release Report
2
 
3
- Generated: `2026-06-03T21:12:14Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
@@ -25,7 +25,7 @@ Stripe live-mode switch and controlled live payment verification.
25
  | Field | Value |
26
  |---|---|
27
  | Status | `pass` |
28
- | Base URL | `http://100.109.109.14:18083/v1` |
29
  | Model id | `kaiju-coder-7` |
30
  | Max model length | `16384` |
31
  | Detail | `` |
@@ -52,7 +52,7 @@ stability and speed.
52
  | Small helper repos uploaded | `True` |
53
  | Merged model uploaded | `True` |
54
  | Merged repo | `RMDWLLC/kaiju-coder-7` |
55
- | Merged repo SHA | `736af44add9321f74e8603cd739245fc0853d62c` |
56
  | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
57
  | Download status | `public downloads verified; no active private-storage blocker recorded` |
58
  | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
@@ -93,7 +93,7 @@ stability and speed.
93
  | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
94
  | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
95
  | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
96
- | Latest direct API smoke | `runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md` |
97
  | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
98
  | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
99
 
@@ -133,7 +133,7 @@ human release review explicitly approves public paid API launch.
133
 
134
  ## Changed Files
135
 
136
- `git status --short` currently reports `116` changed paths.
137
 
138
  | State | Path |
139
  |---|---|
@@ -153,8 +153,10 @@ human release review explicitly approves public paid API launch.
153
  | M | `gateway/cloudflare-worker/src/index.js` |
154
  | M | `gateway/cloudflare-worker/test/index.test.js` |
155
  | M | `gateway/cloudflare-worker/wrangler.jsonc` |
 
156
  | M | `kaiju_harness/router.py` |
157
  | M | `kaiju_harness/verification.py` |
 
158
  | D | `models/README.md` |
159
  | D | `models/qwen3.6-27b-base.md` |
160
  | D | `models/qwen3.6-27b-fp8.md` |
@@ -164,14 +166,17 @@ human release review explicitly approves public paid API launch.
164
  | M | `release/MODEL_CARD_DRAFT.md` |
165
  | M | `scripts/build_sft_dataset.py` |
166
  | M | `scripts/check-gojira-b-capacity.sh` |
 
167
  | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
168
  | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
169
  | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
170
  | M | `scripts/run_kaiju_api_harness_smoke.py` |
 
171
  | M | `scripts/start-qwen36-lora-sglang.sh` |
172
  | M | `scripts/stop-qwen36-lora-sglang.sh` |
173
  | M | `scripts/validate_training_data.py` |
174
  | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
 
175
  | ?? | `.opencode/` |
176
  | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
177
  | ?? | `datasets/v1.7-targets.json` |
@@ -196,6 +201,7 @@ human release review explicitly approves public paid API launch.
196
  | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
197
  | ?? | `release/bundles/` |
198
  | ?? | `release/cloudflare-bindings.example.json` |
 
199
  | ?? | `release/hf-release-permission-evidence.example.json` |
200
  | ?? | `release/hf-release-permission-evidence.json` |
201
  | ?? | `release/huggingface/` |
@@ -225,17 +231,21 @@ human release review explicitly approves public paid API launch.
225
  | ?? | `scripts/generate_kaiju_final_report.py` |
226
  | ?? | `scripts/gojira-b-ssh-lib.sh` |
227
  | ?? | `scripts/install_kaiju_opencode_profile.py` |
 
228
  | ?? | `scripts/make_hf_release_public.sh` |
229
  | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
230
  | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
231
  | ?? | `scripts/prepare_hf_release_staging.sh` |
232
  | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
233
  | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
 
234
  | ?? | `scripts/refresh_kaiju_release_evidence.py` |
 
235
  | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
236
  | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
237
  | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
238
  | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
 
239
  | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
240
  | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
241
  | ?? | `scripts/start-qwen36-merged-sglang.sh` |
@@ -262,9 +272,9 @@ human release review explicitly approves public paid API launch.
262
  | git HEAD | `git rev-parse HEAD` | 0 |
263
  | git origin/main | `git rev-parse origin/main` | 0 |
264
  | git status | `git status --short` | 0 |
265
- | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
266
- | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
267
- | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
268
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
269
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
270
 
 
1
  # Kaiju Coder 7 Final Release Report
2
 
3
+ Generated: `2026-06-03T23:53:31Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
 
25
  | Field | Value |
26
  |---|---|
27
  | Status | `pass` |
28
+ | Base URL | `http://127.0.0.1:18181/v1` |
29
  | Model id | `kaiju-coder-7` |
30
  | Max model length | `16384` |
31
  | Detail | `` |
 
52
  | Small helper repos uploaded | `True` |
53
  | Merged model uploaded | `True` |
54
  | Merged repo | `RMDWLLC/kaiju-coder-7` |
55
+ | Merged repo SHA | `00ba85985102a14838dbb8a5692d9a75ce9da15a` |
56
  | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
57
  | Download status | `public downloads verified; no active private-storage blocker recorded` |
58
  | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
 
93
  | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
94
  | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
95
  | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
96
+ | Latest direct API smoke | `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md` |
97
  | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
98
  | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
99
 
 
133
 
134
  ## Changed Files
135
 
136
+ `git status --short` currently reports `126` changed paths.
137
 
138
  | State | Path |
139
  |---|---|
 
153
  | M | `gateway/cloudflare-worker/src/index.js` |
154
  | M | `gateway/cloudflare-worker/test/index.test.js` |
155
  | M | `gateway/cloudflare-worker/wrangler.jsonc` |
156
+ | M | `gateway/gojira-local/server.py` |
157
  | M | `kaiju_harness/router.py` |
158
  | M | `kaiju_harness/verification.py` |
159
+ | M | `kaiju_harness/website.py` |
160
  | D | `models/README.md` |
161
  | D | `models/qwen3.6-27b-base.md` |
162
  | D | `models/qwen3.6-27b-fp8.md` |
 
166
  | M | `release/MODEL_CARD_DRAFT.md` |
167
  | M | `scripts/build_sft_dataset.py` |
168
  | M | `scripts/check-gojira-b-capacity.sh` |
169
+ | M | `scripts/check_kaiju_gateway_policy.py` |
170
  | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
171
  | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
172
  | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
173
  | M | `scripts/run_kaiju_api_harness_smoke.py` |
174
+ | M | `scripts/run_kaiju_router.py` |
175
  | M | `scripts/start-qwen36-lora-sglang.sh` |
176
  | M | `scripts/stop-qwen36-lora-sglang.sh` |
177
  | M | `scripts/validate_training_data.py` |
178
  | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
179
+ | M | `tests/test_website_harness.py` |
180
  | ?? | `.opencode/` |
181
  | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
182
  | ?? | `datasets/v1.7-targets.json` |
 
201
  | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
202
  | ?? | `release/bundles/` |
203
  | ?? | `release/cloudflare-bindings.example.json` |
204
+ | ?? | `release/gguf/` |
205
  | ?? | `release/hf-release-permission-evidence.example.json` |
206
  | ?? | `release/hf-release-permission-evidence.json` |
207
  | ?? | `release/huggingface/` |
 
231
  | ?? | `scripts/generate_kaiju_final_report.py` |
232
  | ?? | `scripts/gojira-b-ssh-lib.sh` |
233
  | ?? | `scripts/install_kaiju_opencode_profile.py` |
234
+ | ?? | `scripts/kaiju_opencode_fast_proxy.py` |
235
  | ?? | `scripts/make_hf_release_public.sh` |
236
  | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
237
  | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
238
  | ?? | `scripts/prepare_hf_release_staging.sh` |
239
  | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
240
  | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
241
+ | ?? | `scripts/probe-gojira-b-persisted-quantization.sh` |
242
  | ?? | `scripts/refresh_kaiju_release_evidence.py` |
243
+ | ?? | `scripts/run-gojira-b-kaiju-gguf-convert.sh` |
244
  | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
245
  | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
246
  | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
247
  | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
248
+ | ?? | `scripts/run_kaiju_public_demo_pack.py` |
249
  | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
250
  | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
251
  | ?? | `scripts/start-qwen36-merged-sglang.sh` |
 
272
  | git HEAD | `git rev-parse HEAD` | 0 |
273
  | git origin/main | `git rev-parse origin/main` | 0 |
274
  | git status | `git status --short` | 0 |
275
+ | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
276
+ | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
277
+ | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
278
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
279
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
280
 
GOAL_COMPLETION_AUDIT.md CHANGED
@@ -1,11 +1,11 @@
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
- Generated: `2026-06-03T21:12:21Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
7
 
8
- This audit maps the active Kaiju Coder 7 objective to current evidence. It is stricter than local readiness: local public testing and Hugging Face release checks can pass while paid API launch remains blocked.
9
 
10
  ## Readiness Commands
11
 
@@ -28,13 +28,13 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
28
  | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
29
  | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
30
  | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
31
- | Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md | |
32
  | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
33
  | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
35
  | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
36
  | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
37
- | Hugging Face | Merged 51GB model repo upload is complete or guarded and ready after human review/namespace permission. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
38
  | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
39
  | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
40
  | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
 
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
+ Generated: `2026-06-03T23:53:44Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
7
 
8
+ This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
9
 
10
  ## Readiness Commands
11
 
 
28
  | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
29
  | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
30
  | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
31
+ | Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md | |
32
  | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
33
  | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
35
  | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
36
  | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
37
+ | Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
38
  | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
39
  | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
40
  | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
HF_UPLOAD_EVIDENCE.md CHANGED
@@ -1,15 +1,15 @@
1
  # Kaiju Coder 7 Hugging Face Upload Evidence
2
 
3
- Generated: `2026-06-03T20:36:26Z`
4
 
5
  ## Uploaded Repos
6
 
7
  | Repo | Visibility | Evidence |
8
  |---|---|---|
9
- | `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `67bb48b8115b820cd8b01d1778d2610d9ce63692`; public visibility verified after 2026-06-03 paid API evidence refresh. |
10
- | `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
11
- | `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
12
- | `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `736af44add9321f74e8603cd739245fc0853d62c`; public metadata reports `private: false`. |
13
 
14
  These SHAs are a point-in-time release evidence snapshot. Uploading this
15
  evidence file itself creates another metadata commit, so use `hf models info`
@@ -71,14 +71,15 @@ Result:
71
  - `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
72
  - `hf repos settings ... --public` completed for all four repos.
73
  - `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
74
- passed `17/17` checks after the public visibility switch and again after
75
- the refreshed public helper upload.
 
76
  - The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
77
  successfully as public repos.
78
  - The downloaded OpenCode helper installer dry-run passed and included the
79
  loop guard.
80
  - Merged model metadata reports `private: false`, SHA
81
- `736af44add9321f74e8603cd739245fc0853d62c`, and lists all `14`
82
  safetensors shards.
83
 
84
  The earlier private-storage limit blocked private file downloads after the
 
1
  # Kaiju Coder 7 Hugging Face Upload Evidence
2
 
3
+ Generated: `2026-06-03T23:37:20Z`
4
 
5
  ## Uploaded Repos
6
 
7
  | Repo | Visibility | Evidence |
8
  |---|---|---|
9
+ | `RMDWLLC/kaiju-coder-7-adapter` | public | Refreshed public helper/evidence package commit `943b6fc7e025bbacd8b94275eb4321f6b0ed69c7`; public visibility verified after 2026-06-03 speed and GGUF-candidate pass. |
10
+ | `RMDWLLC/kaiju-coder-7-opencode` | public | Refreshed OpenCode helper commit `032872d88fd799515ac81158e011780e0d6059f6`; public visibility, installer dry-run, and exact-file smoke verified. |
11
+ | `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Current public helper commit `785f3d758da493e3c435d67ef12c3e1e4d62db1a`; includes runtime bitsandbytes recipe plus GGUF Q8_0 candidate note. |
12
+ | `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `00ba85985102a14838dbb8a5692d9a75ce9da15a`; public metadata reports `private: false`. |
13
 
14
  These SHAs are a point-in-time release evidence snapshot. Uploading this
15
  evidence file itself creates another metadata commit, so use `hf models info`
 
71
  - `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
72
  - `hf repos settings ... --public` completed for all four repos.
73
  - `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
74
+ passed `17/17` checks after the public visibility switch, after the refreshed
75
+ public helper upload, and again after adding stricter checks for the demo
76
+ runner and GGUF candidate package files.
77
  - The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
78
  successfully as public repos.
79
  - The downloaded OpenCode helper installer dry-run passed and included the
80
  loop guard.
81
  - Merged model metadata reports `private: false`, SHA
82
+ `00ba85985102a14838dbb8a5692d9a75ce9da15a`, and lists all `14`
83
  safetensors shards.
84
 
85
  The earlier private-storage limit blocked private file downloads after the
LOCAL_TEST_INSTRUCTIONS.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Local Test Instructions
2
 
3
- Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the SGLang commands require Gojira B over Tailscale.
4
 
5
  ## Run The Local Release-Candidate Gate
6
 
@@ -24,26 +24,32 @@ KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.
24
 
25
  ## Start Kaiju Coder 7 Serving
26
 
27
- Use this for the current model-side candidate:
28
 
29
  ```bash
30
- KAIJU_QWEN36_MERGED_PORT=18083 \
31
- KAIJU_QWEN36_MERGED_SESSION=kaiju_qwen36_v18_merged_sglang \
32
- KAIJU_QWEN36_MERGED_CONTEXT=16384 \
33
- KAIJU_QWEN36_MERGED_MEM_FRACTION=0.85 \
34
- ./scripts/start-qwen36-merged-sglang.sh
35
  ```
36
 
37
  Confirm readiness:
38
 
39
  ```bash
40
- curl http://100.109.109.14:18083/v1/models
41
  ```
42
 
43
  The high-context `32768` target has benchmark evidence in
44
- `release/SERVING_BENCHMARKS.md`, but the current restored Gojira-B endpoint is
45
- parked at `16384` for reliable local/OpenCode testing after the quantized-vLLM
46
- smoke work.
47
 
48
  ## Prepare Merged-Model Hugging Face Metadata
49
 
@@ -82,7 +88,7 @@ python3 scripts/run_kaiju_api_harness_smoke.py
82
 
83
  ```bash
84
  python3 evals/run_openai_compat_smoke.py \
85
- --base-url http://100.109.109.14:18083/v1 \
86
  --model kaiju-coder-7 \
87
  --tasks evals/tasks/smoke.jsonl \
88
  --max-tasks 1 \
@@ -100,7 +106,7 @@ evals pass at acceptable latency:
100
 
101
  ```bash
102
  python3 evals/run_openai_compat_smoke.py \
103
- --base-url http://100.109.109.14:18083/v1 \
104
  --model kaiju-coder-7 \
105
  --tasks evals/tasks/business-owner-v18-comparison.jsonl \
106
  --timeout 900 \
 
1
  # Kaiju Coder 7 Local Test Instructions
2
 
3
+ Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
4
 
5
  ## Run The Local Release-Candidate Gate
6
 
 
24
 
25
  ## Start Kaiju Coder 7 Serving
26
 
27
+ Use this for the fastest current model-side candidate:
28
 
29
  ```bash
30
+ KAIJU_VLLM_CONTEXT=16384 \
31
+ KAIJU_VLLM_QUANTIZATION=bitsandbytes \
32
+ KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
33
+ KAIJU_VLLM_GPU_UTIL=0.90 \
34
+ ./scripts/start-qwen36-merged-vllm.sh
35
  ```
36
 
37
  Confirm readiness:
38
 
39
  ```bash
40
+ curl http://100.109.109.14:18084/v1/models
41
+ ```
42
+
43
+ Then keep the Mac-side fast proxy pointed at that vLLM endpoint:
44
+
45
+ ```bash
46
+ KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
47
+ python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
48
  ```
49
 
50
  The high-context `32768` target has benchmark evidence in
51
+ `release/SERVING_BENCHMARKS.md`, but the current speed/default path is 16k
52
+ runtime-quantized vLLM plus the local fast proxy.
 
53
 
54
  ## Prepare Merged-Model Hugging Face Metadata
55
 
 
88
 
89
  ```bash
90
  python3 evals/run_openai_compat_smoke.py \
91
+ --base-url http://100.109.109.14:18084/v1 \
92
  --model kaiju-coder-7 \
93
  --tasks evals/tasks/smoke.jsonl \
94
  --max-tasks 1 \
 
106
 
107
  ```bash
108
  python3 evals/run_openai_compat_smoke.py \
109
+ --base-url http://100.109.109.14:18084/v1 \
110
  --model kaiju-coder-7 \
111
  --tasks evals/tasks/business-owner-v18-comparison.jsonl \
112
  --timeout 900 \
MERGED_MODEL_RELEASE_MANIFEST.json CHANGED
@@ -6,6 +6,6 @@
6
  "notes": [
7
  "Local metadata sync only; no Hugging Face upload performed.",
8
  "Qwen attribution belongs in README/provenance/license notes, not the product model id.",
9
- "Public paid API launch remains blocked until live launch preflight and human review pass."
10
  ]
11
  }
 
6
  "notes": [
7
  "Local metadata sync only; no Hugging Face upload performed.",
8
  "Qwen attribution belongs in README/provenance/license notes, not the product model id.",
9
+ "Public paid API preflight evidence has passed; real customer charging still requires the deliberate Stripe live-mode switch."
10
  ]
11
  }
PAID_API_READINESS.md CHANGED
@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
152
  ```
153
 
154
  `check_kaiju_public_release_readiness.py --mode local` is the consolidated
155
- public-testing readiness command. It can pass while public upload and paid API
156
- launch remain manual blockers. `--mode hf-release` checks the downloadable
157
- model/helper release and requires sanitized Hugging Face namespace permission
158
- evidence plus human review while keeping paid API launch manual. `--mode public`
159
- must remain red until Hugging Face write permissions, live Cloudflare resources,
160
- Stripe staging evidence, rollback proof, and human review are complete.
161
 
162
  `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
163
  the current local/public readiness summaries, launch blockers, changed files,
@@ -167,8 +167,8 @@ lines.
167
 
168
  `check_kaiju_goal_completion.py --write` writes
169
  `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
170
- remain red while Hugging Face upload, human review, or live paid API launch
171
- evidence are missing.
172
 
173
  `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
174
  direct API smoke evidence, goal audit, final report, HF staging, local bundle,
 
152
  ```
153
 
154
  `check_kaiju_public_release_readiness.py --mode local` is the consolidated
155
+ public-testing readiness command. `--mode hf-release` checks the downloadable
156
+ model/helper release, public Hugging Face evidence, and human review while
157
+ keeping live paid charging separate from model publication. `--mode public`
158
+ now passes after public HF verification, live Cloudflare resource evidence,
159
+ Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
160
+ and human review are complete.
161
 
162
  `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
163
  the current local/public readiness summaries, launch blockers, changed files,
 
167
 
168
  `check_kaiju_goal_completion.py --write` writes
169
  `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
170
+ remain green only while the live runtime, public HF evidence, human review, and
171
+ paid API launch evidence continue to pass.
172
 
173
  `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
174
  direct API smoke evidence, goal audit, final report, HF staging, local bundle,
PUBLIC_TESTING_QUICKSTART.md CHANGED
@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
19
  ```bash
20
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
21
  cd kaiju-coder-7-opencode
22
- python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
23
  ```
24
 
25
  Then run OpenCode inside the project you want to edit:
@@ -65,23 +65,31 @@ the server to expose:
65
 
66
  ```text
67
  model id: kaiju-coder-7
68
- base URL: http://127.0.0.1:18083/v1
69
  context: 16384
70
  ```
71
 
72
  Then install the OpenCode helper with:
73
 
74
  ```bash
75
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
76
  cd kaiju-coder-7-opencode
77
- python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
78
  ```
79
 
80
  ### Path 3: Runtime-Quantized Local Candidate
81
 
82
  Use this only if you are comfortable with advanced serving setups. The current
83
- working quantized option is a runtime bitsandbytes recipe, not a separate
84
- persisted quantized weights repo.
85
 
86
  ```bash
87
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
@@ -115,10 +123,14 @@ Expected result:
115
  - Public model id: `kaiju-coder-7`
116
  - OpenCode context: `16384`
117
  - Output cap for public testing: `2500`
 
118
  - Current reliable product path: model plus deterministic business-owner
119
- harness plus verifier
120
- - Raw multi-file OpenCode generation: still too slow for broad paid API claims
121
- - Paid API: not public until launch preflight passes
 
 
 
122
 
123
  ## What Not To Claim Yet
124
 
@@ -134,16 +146,23 @@ Do claim:
134
  - Kaiju Coder 7 has a working local/OpenCode release candidate
135
  - the current tested OpenCode default is 16k context
136
  - the helper package includes a lean agent and compaction loop guard
 
 
137
  - the paid API scaffold has tests and a launch preflight, but is not yet public
138
  - the packaged public smoke verifies a fresh OpenCode one-file write before
139
  public claims are refreshed
140
-
141
- ## Current Blockers Before Public Release
142
-
143
- - Hugging Face repo creation still requires a write-capable token or namespace.
144
- - Full merged model upload has not completed; the merged folder must first have
145
- the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
146
- - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
147
- secret verification, Stripe webhook staging evidence, staging traffic, latency
148
- evidence, and rollback proof.
149
- - Human review is still required before public upload.
 
19
  ```bash
20
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
21
  cd kaiju-coder-7-opencode
22
+ python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
23
  ```
24
 
25
  Then run OpenCode inside the project you want to edit:
 
65
 
66
  ```text
67
  model id: kaiju-coder-7
68
+ base URL: http://127.0.0.1:18084/v1
69
  context: 16384
70
  ```
71
 
72
+ For the fastest OpenCode behavior, run the bundled fast proxy in a separate
73
+ terminal and point OpenCode at the proxy:
74
+
75
+ ```bash
76
+ KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
77
+ python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
78
+ ```
79
+
80
  Then install the OpenCode helper with:
81
 
82
  ```bash
83
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
84
  cd kaiju-coder-7-opencode
85
+ python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
86
  ```
87
 
88
  ### Path 3: Runtime-Quantized Local Candidate
89
 
90
  Use this only if you are comfortable with advanced serving setups. The current
91
+ working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
92
+ has been converted, but it is still a candidate until runtime smoke passes.
93
 
94
  ```bash
95
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
 
123
  - Public model id: `kaiju-coder-7`
124
  - OpenCode context: `16384`
125
  - Output cap for public testing: `2500`
126
+ - Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
127
  - Current reliable product path: model plus deterministic business-owner
128
+ harness/router plus verifier
129
+ - Raw multi-file OpenCode generation: still too slow for broad paid claims;
130
+ useful for testing, but paid API claims should favor harnessed product
131
+ workflows until broader latency gates pass
132
+ - Paid API: not public until launch preflight passes and the Stripe live-mode
133
+ switch is deliberately completed
134
 
135
  ## What Not To Claim Yet
136
 
 
146
  - Kaiju Coder 7 has a working local/OpenCode release candidate
147
  - the current tested OpenCode default is 16k context
148
  - the helper package includes a lean agent and compaction loop guard
149
+ - the fast proxy keeps OpenCode tool calls intact while forcing bounded,
150
+ non-thinking generation
151
  - the paid API scaffold has tests and a launch preflight, but is not yet public
152
  - the packaged public smoke verifies a fresh OpenCode one-file write before
153
  public claims are refreshed
154
+ - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
155
+ evidence until runtime smoke passes
156
+
157
+ ## Remaining Caveats Before Broader Claims
158
+
159
+ - Hugging Face public release repos are uploaded and public under `RMDWLLC`.
160
+ - The GGUF Q8_0 candidate still needs a runtime smoke before public
161
+ quantized-weights upload.
162
+ - Raw multi-file OpenCode generation is still not the public speed story; use
163
+ the deterministic router/harness for websites and business-owner packs.
164
+ - Public paid API launch has approval and preflight evidence, but real customer
165
+ charging still needs a deliberate Stripe live-mode switch and controlled live
166
+ payment verification.
167
+ - Do not claim 32k context as the live default until it is freshly restarted
168
+ and re-confirmed.
README.md CHANGED
@@ -108,12 +108,18 @@ Current local harness evidence:
108
  - Adapter-name-only serving can be base-equivalent.
109
  - Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
110
  - Dynamic LoRA is not the release serving path for this checkpoint.
111
- - Kaiju Coder 7 serving config: SGLang over Tailscale at `http://100.109.109.14:18083/v1`, model `kaiju-coder-7`, current parked Gojira-B/OpenCode context `16384`, tested high-context target `32768`, memory fraction `0.90`.
112
  - v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
113
  - v1.8 merged focused eval:
114
  - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
115
  - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
116
- - Broader base-Qwen, GLM, and raw website comparisons are still pending.
 
117
 
118
  Sellable-candidate gate:
119
 
 
108
  - Adapter-name-only serving can be base-equivalent.
109
  - Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
110
  - Dynamic LoRA is not the release serving path for this checkpoint.
111
+ - Kaiju Coder 7 current serving config: vLLM bitsandbytes runtime
112
+ quantization on Gojira B at `http://100.109.109.14:18084/v1`, exposed on
113
+ this Mac through `http://127.0.0.1:18181/v1`, model `kaiju-coder-7`,
114
+ current OpenCode context `16384`. SGLang has historical 32k benchmark
115
+ evidence, but 32k should be freshly restarted and re-confirmed before being
116
+ called the live default.
117
  - v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
118
  - v1.8 merged focused eval:
119
  - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
120
  - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
121
+ - Broader base-Qwen, GLM, and raw website comparisons are still pending before
122
+ any superiority claims.
123
 
124
  Sellable-candidate gate:
125
 
SERVING_BENCHMARKS.md CHANGED
@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
6
  ## Current Live Runtime
7
 
8
  - Host: Gojira-B over Tailscale
9
- - Base URL: `http://100.109.109.14:18083/v1`
10
- - Serving stack: SGLang merged full model
11
- - Current verified post-quantization restored context: `16384`
 
 
12
  - Tested high-context target: `32768`
13
- - Current container: `qwen36-merged-sglang-18083`
14
- - Current caveat: direct raw generation is slow for multi-file OpenCode work.
 
15
 
16
  ## Benchmark Command
17
 
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
294
  | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
295
  | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
296
 
297
- Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it is not clearly
298
- faster than SGLang on the current smoke prompts. Keep SGLang as the recommended
299
- runtime because it has stable OpenCode smoke evidence, a simpler launch path,
300
- and historical 32k proof. Keep the live/default OpenCode profile at 16k until
301
- 32k is freshly re-confirmed. Keep the vLLM scripts for future nightly-image or
302
- quantized-weight testing.
303
 
304
  ## vLLM bitsandbytes Runtime-Quantized Candidate
305
 
@@ -323,6 +325,7 @@ Runs:
323
  - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
324
  - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
325
  - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
 
326
 
327
  | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
328
  | --- | ---: | --- | --- | ---: | ---: | ---: |
@@ -332,6 +335,8 @@ Runs:
332
  | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
333
  | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
334
  | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
 
 
335
 
336
  Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
337
  load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
@@ -350,9 +355,104 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
350
  Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
351
  exactly `Kaiju Coder 7 quantized runtime ok`.
352
 
353
- Recommendation: keep SGLang as the default public/OpenCode runtime and keep the
354
- currently installed OpenCode profile at 16k unless the 32k target has just been
355
- restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
356
- quantized local candidate for advanced GPU users and future paid API speed
357
- experiments. It now has direct identity/code/business-doc evidence plus an
358
- OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
 
6
  ## Current Live Runtime
7
 
8
  - Host: Gojira-B over Tailscale
9
+ - Local OpenCode base URL: `http://127.0.0.1:18181/v1`
10
+ - Upstream base URL: `http://100.109.109.14:18084/v1`
11
+ - Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
12
+ proxy
13
+ - Current verified context: `16384`
14
  - Tested high-context target: `32768`
15
+ - Current container: `qwen36-merged-vllm-18084`
16
+ - Current caveat: direct raw generation is still slow for multi-file OpenCode
17
+ work; use the deterministic router/harness for public business-owner demos.
18
 
19
  ## Benchmark Command
20
 
 
297
  | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
298
  | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
299
 
300
+ Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
301
+ clearly faster than SGLang on these smoke prompts. This is historical fallback
302
+ evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
303
+ path. Keep the live/default OpenCode profile at 16k until 32k is freshly
304
+ re-confirmed.
 
305
 
306
  ## vLLM bitsandbytes Runtime-Quantized Candidate
307
 
 
325
  - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
326
  - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
327
  - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
328
+ - `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
329
 
330
  | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
331
  | --- | ---: | --- | --- | ---: | ---: | ---: |
 
335
  | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
336
  | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
337
  | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
338
+ | vLLM bitsandbytes | 16384 | code_patch | True | 24.97 | 997 | 39.924 |
339
+ | vLLM bitsandbytes | 16384 | business_doc | True | 34.46 | 1615 | 46.874 |
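The `Chars/s` column appears to be the visible character count divided by wall-clock seconds. A minimal sketch of that arithmetic, assuming this is how the benchmark script derives it; the function name `chars_per_second` is illustrative and not taken from `scripts/benchmark_kaiju_serving.py`. Because the table shows rounded seconds, recomputed values can differ from the recorded ones in the second decimal place.

```python
def chars_per_second(chars, seconds):
    """Throughput in visible characters per second of wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return chars / seconds


# The two rows added for the 20260603T223337Z run, using the rounded
# seconds shown in the table.
rows = [
    ("code_patch", 24.97, 997),
    ("business_doc", 34.46, 1615),
]

for prompt, seconds, chars in rows:
    print(f"{prompt}: {chars_per_second(chars, seconds):.3f} chars/s")
```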
340
 
341
  Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
342
  load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
 
355
  Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
356
  exactly `Kaiju Coder 7 quantized runtime ok`.
357
 
358
+ Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
359
+ current public/OpenCode speed path and keep the installed OpenCode profile at
360
+ 16k unless the 32k target has just been restarted and re-confirmed. Treat
361
+ SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
362
+ direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
363
+ it is not a persisted quantized-weights repo.
364
+
365
+ ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
366
+
367
+ The current speed profile keeps runtime-quantized vLLM active on Gojira-B port
368
+ `18084` and routes OpenCode through the local fast proxy at
369
+ `http://127.0.0.1:18181/v1`. The proxy preserves OpenCode tool-call streaming
370
+ while forcing `thinking=false`, model id `kaiju-coder-7`, and bounded output
371
+ budgets.
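The request rewrite described above can be sketched as a pure function over an OpenAI-style chat body. This is a hedged illustration, not the code in `scripts/kaiju_opencode_fast_proxy.py`: the function name `rewrite_request`, the `chat_template_kwargs.enable_thinking` field used to express `thinking=false`, and the choice of `2500` as the cap are all assumptions made for the example.

```python
import copy

# Assumed cap for illustration; the real proxy's budget may differ.
OUTPUT_CAP = 2500


def rewrite_request(body, model_id="kaiju-coder-7", output_cap=OUTPUT_CAP):
    """Return a copy of a chat request with the model id pinned, thinking
    disabled, and max_tokens bounded. Other fields (messages, tools,
    stream) pass through unchanged so tool calls stay intact."""
    out = copy.deepcopy(body)
    out["model"] = model_id

    requested = out.get("max_tokens")
    if requested is None:
        out["max_tokens"] = output_cap
    else:
        out["max_tokens"] = min(int(requested), output_cap)

    # Assumption: the upstream chat template reads this flag.
    template_kwargs = dict(out.get("chat_template_kwargs") or {})
    template_kwargs["enable_thinking"] = False
    out["chat_template_kwargs"] = template_kwargs
    return out
```

Keeping the rewrite free of network code makes it easy to unit-test separately from the streaming passthrough.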
372
+
373
+ Active endpoint checks:
374
+
375
+ - Local fast proxy health: `http://127.0.0.1:18181/health`
376
+ - Upstream vLLM models: `http://100.109.109.14:18084/v1/models`
377
+ - Upstream reports `kaiju-coder-7` with `max_model_len=16384`
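The upstream check above can be automated by inspecting the `/v1/models` response body. A minimal sketch, assuming the response follows the OpenAI-style list shape with a `max_model_len` field on each entry; the helper name `check_models_payload` and the sample payload are hypothetical, and the sample stands in for a live response so the snippet runs offline.

```python
import json


def check_models_payload(payload, model_id="kaiju-coder-7", min_context=16384):
    """Return (ok, reason) for a parsed /v1/models response body."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            context = entry.get("max_model_len")
            if context is None:
                return False, "max_model_len missing"
            if context < min_context:
                return False, f"context {context} below {min_context}"
            return True, f"{model_id} at context {context}"
    return False, f"{model_id} not listed"


# Hypothetical response body shaped like the upstream report above.
sample = json.loads(
    '{"object": "list", "data": [{"id": "kaiju-coder-7", "max_model_len": 16384}]}'
)
print(check_models_payload(sample))
```

To use it against the live endpoint, fetch `http://100.109.109.14:18084/v1/models` with any HTTP client and pass the parsed JSON in place of `sample`.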
378
+
379
+ Fresh direct vLLM benchmark:
380
+
381
+ - Run: `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
382
+ - Identity: `19.48s`
383
+ - Code patch: `24.97s`, `997` chars
384
+ - Business doc: `34.46s`, `1,615` chars
385
+
386
+ Fresh OpenCode smoke through the local fast proxy:
387
+
388
+ - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
389
+ - Result: passed in about `23.5s`, wrote the exact requested file.
390
+ - Packaged public verifier after exact-content agent rule:
391
+ `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
392
+ passed through `http://127.0.0.1:18181/v1`.
393
+
394
+ Website harness/router speed pass:
395
+
396
+ - Direct website harness command: `python3 scripts/run_kaiju_website_harness.py --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
397
+ - Direct website harness result: `runs/harness/website-speed-pass/avery-stone-vllm.html`, `9,257` chars, `7.31s`
398
+ - Router command: `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
399
+ - Router artifact: `runs/router-speed-pass/20260603T223731Z-website-build-a-premium-one-page-website-for-avery-stone-construction-a-reside/index.html`
400
+ - Router result: passed in `7.20s`; checks covered complete HTML, required sections, external images, responsive CSS, no lorem ipsum, and manifest write.
401
+ - Router through the installed local proxy: `runs/router-speed-pass/20260603T224328Z-website-build-a-premium-one-page-website-for-bennett-family-dental-in-charlott/index.html`
402
+ - Proxy router result: passed in `4.67s`; preserved explicit CTA `Schedule a Visit`, inferred `dental`, and passed the same complete-HTML/static checks.
403
+
404
+ Updated recommendation: for speed-sensitive OpenCode and paid workflow testing,
405
+ use vLLM bitsandbytes plus the local fast proxy as the active default. Keep
406
+ SGLang as fallback/historical evidence, not the fastest current path. For
407
+ websites and business-owner packs, prefer the deterministic router/harness path
408
+ over raw long-form HTML generation.
409
+
+ Public business-owner demo pack through the active fast proxy:
+
+ ```bash
+ python3 scripts/run_kaiju_public_demo_pack.py \
+   --openai-base-url http://127.0.0.1:18181/v1 \
+   --model kaiju-coder-7 \
+   --planner-timeout 90
+ ```
+
+ Run: `runs/public-demo-pack/20260603T235009Z/summary.md`
+
+ | Task | Result | Seconds | Changed files |
+ | --- | --- | ---: | ---: |
+ | Website | Passed | 4.73 | 2 |
+ | Owner AI company pack | Passed | 29.85 | 19 |
+ | Stripe safety plan | Passed | 9.99 | 2 |
+ | CSV parser artifact | Passed | 19.97 | 2 |
+
+ Total: `4/4` passed in `64.529s`.
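As a quick cross-check, the reported total can be compared against the per-task seconds in the table. This sketch assumes the total is the plain sum of the four task times and that each table value was rounded to two decimals; neither assumption is stated in the run summary, and the names below are illustrative.

```python
# Per-task seconds copied from the table (assumed rounded to two decimals).
TASK_SECONDS = {
    "Website": 4.73,
    "Owner AI company pack": 29.85,
    "Stripe safety plan": 9.99,
    "CSV parser artifact": 19.97,
}
REPORTED_TOTAL = 64.529  # seconds, from the "Total" line


def total_is_consistent(rows: dict[str, float], reported: float,
                        per_row_rounding: float = 0.005) -> bool:
    """True if the reported total is within the worst-case rounding error of the row sum."""
    worst_case_error = per_row_rounding * len(rows)
    return abs(sum(rows.values()) - reported) <= worst_case_error


if __name__ == "__main__":
    print(f"row sum: {sum(TASK_SECONDS.values()):.2f}s")
    print("consistent:", total_is_consistent(TASK_SECONDS, REPORTED_TOTAL))
```

The rounded rows sum to `64.54`, which differs from `64.529` by about `0.011` and sits inside the `0.02` worst-case rounding margin for four rows.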
429
+
430
+ ## Persisted GGUF Q8_0 Candidate
431
+
432
+ The dedicated persisted-quantization pass found that normal AWQ/GPTQ installs
433
+ are not clean against the Qwen3.5-capable serving stack tonight, while
434
+ `llama.cpp` conversion support includes `Qwen3_5ForConditionalGeneration`.
435
+
+ Commands:
+
+ ```bash
+ ./scripts/probe-gojira-b-persisted-quantization.sh
+ ./scripts/run-gojira-b-kaiju-gguf-convert.sh
+ ```
+
+ Result:
+
+ - Artifact:
+   `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
+ - Size: `27G`
+ - SHA256:
+   `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
+ - Conversion log:
+   `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
+ - Runtime status: candidate only; direct GGUF runtime smoke still required
+   before publishing quantized weights.
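Before any runtime smoke, the recorded SHA256 can be rechecked against the artifact on disk. The sketch below streams the file in chunks so a `27G` GGUF does not have to fit in memory. The helper names are illustrative and not part of the repository's scripts; the expected digest is the one listed in the result above.

```python
import hashlib
from pathlib import Path

# Digest recorded for kaiju-coder-7-Q8_0.gguf in the result list.
EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the recorded one, ignoring hex letter case."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected()` with the artifact path from the result list should return `True` for an intact copy; a mismatch would point to a truncated or modified file.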
454
+
455
+ Interpretation: the next real speed improvement for broad public users is not
456
+ another prompt tweak. It is a smoked GGUF or GPU-persisted quantized artifact.
457
+ The fastest currently verified Kaiju Coder 7 path remains vLLM bitsandbytes
458
+ plus the local fast proxy and deterministic website/business harnesses.