restokes92 committed on
Commit
3ebb5c6
·
verified ·
1 Parent(s): 5016ab9

Upload Kaiju Coder 7 adapter release package

EVAL_SCOREBOARD.md CHANGED
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
- | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T182222Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and file written only in the requested temp workspace | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
@@ -43,6 +43,10 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
43
  | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
44
  | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
45
  | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
46
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
47
  | Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
48
  | Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
 
35
  | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
36
  | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
37
  | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
38
+ | Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
39
  | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
40
  | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
41
  | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
 
43
  | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
44
  | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
45
  | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
46
+ | Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
47
+ | Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
48
+ | GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
49
+ | Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `84.43s`: website `24.59s`, owner AI company pack `29.99s` with `19` files, Stripe safety plan `9.93s`, CSV parser artifact `19.93s`; run `runs/public-demo-pack/20260603T232534Z/summary.md` | 2026-06-03 |
50
  | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
51
  | Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
52
  | Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
FINAL_RELEASE_REPORT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Final Release Report
2
 
3
- Generated: `2026-06-03T21:12:14Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
@@ -24,11 +24,11 @@ Stripe live-mode switch and controlled live payment verification.
24
 
25
  | Field | Value |
26
  |---|---|
27
- | Status | `pass` |
28
  | Base URL | `http://100.109.109.14:18083/v1` |
29
- | Model id | `kaiju-coder-7` |
30
- | Max model length | `16384` |
31
- | Detail | `` |
32
 
33
  Recommended default today: `16k` context through `kaiju-coder-7`. Higher
34
  context has benchmark evidence, but the currently parked default is 16k for
@@ -38,9 +38,9 @@ stability and speed.
38
 
39
  | Area | Result |
40
  |---|---|
41
- | Local public-testing readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
42
- | Hugging Face release readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
43
- | Public launch readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
44
  | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
45
  | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
46
 
@@ -52,18 +52,22 @@ stability and speed.
52
  | Small helper repos uploaded | `True` |
53
  | Merged model uploaded | `True` |
54
  | Merged repo | `RMDWLLC/kaiju-coder-7` |
55
- | Merged repo SHA | `736af44add9321f74e8603cd739245fc0853d62c` |
56
  | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
57
  | Download status | `public downloads verified; no active private-storage blocker recorded` |
58
  | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
59
 
60
  ## Hugging Face Release Blockers
61
 
62
- - No matching checks.
63
 
64
  ## Public Launch Blockers
65
 
66
- - No matching checks.
67
 
68
  ## Paid API Launch Blockers
69
 
@@ -93,7 +97,7 @@ stability and speed.
93
  | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
94
  | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
95
  | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
96
- | Latest direct API smoke | `runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md` |
97
  | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
98
  | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
99
 
@@ -133,7 +137,7 @@ human release review explicitly approves public paid API launch.
133
 
134
  ## Changed Files
135
 
136
- `git status --short` currently reports `116` changed paths.
137
 
138
  | State | Path |
139
  |---|---|
@@ -153,8 +157,10 @@ human release review explicitly approves public paid API launch.
153
  | M | `gateway/cloudflare-worker/src/index.js` |
154
  | M | `gateway/cloudflare-worker/test/index.test.js` |
155
  | M | `gateway/cloudflare-worker/wrangler.jsonc` |
 
156
  | M | `kaiju_harness/router.py` |
157
  | M | `kaiju_harness/verification.py` |
 
158
  | D | `models/README.md` |
159
  | D | `models/qwen3.6-27b-base.md` |
160
  | D | `models/qwen3.6-27b-fp8.md` |
@@ -164,14 +170,17 @@ human release review explicitly approves public paid API launch.
164
  | M | `release/MODEL_CARD_DRAFT.md` |
165
  | M | `scripts/build_sft_dataset.py` |
166
  | M | `scripts/check-gojira-b-capacity.sh` |
 
167
  | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
168
  | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
169
  | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
170
  | M | `scripts/run_kaiju_api_harness_smoke.py` |
 
171
  | M | `scripts/start-qwen36-lora-sglang.sh` |
172
  | M | `scripts/stop-qwen36-lora-sglang.sh` |
173
  | M | `scripts/validate_training_data.py` |
174
  | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
 
175
  | ?? | `.opencode/` |
176
  | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
177
  | ?? | `datasets/v1.7-targets.json` |
@@ -196,6 +205,7 @@ human release review explicitly approves public paid API launch.
196
  | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
197
  | ?? | `release/bundles/` |
198
  | ?? | `release/cloudflare-bindings.example.json` |
 
199
  | ?? | `release/hf-release-permission-evidence.example.json` |
200
  | ?? | `release/hf-release-permission-evidence.json` |
201
  | ?? | `release/huggingface/` |
@@ -225,17 +235,21 @@ human release review explicitly approves public paid API launch.
225
  | ?? | `scripts/generate_kaiju_final_report.py` |
226
  | ?? | `scripts/gojira-b-ssh-lib.sh` |
227
  | ?? | `scripts/install_kaiju_opencode_profile.py` |
 
228
  | ?? | `scripts/make_hf_release_public.sh` |
229
  | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
230
  | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
231
  | ?? | `scripts/prepare_hf_release_staging.sh` |
232
  | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
233
  | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
 
234
  | ?? | `scripts/refresh_kaiju_release_evidence.py` |
 
235
  | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
236
  | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
237
  | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
238
  | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
 
239
  | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
240
  | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
241
  | ?? | `scripts/start-qwen36-merged-sglang.sh` |
@@ -262,9 +276,9 @@ human release review explicitly approves public paid API launch.
262
  | git HEAD | `git rev-parse HEAD` | 0 |
263
  | git origin/main | `git rev-parse origin/main` | 0 |
264
  | git status | `git status --short` | 0 |
265
- | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
266
- | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
267
- | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
268
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
269
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
270
 
 
1
  # Kaiju Coder 7 Final Release Report
2
 
3
+ Generated: `2026-06-03T23:34:00Z`
4
 
5
  Product name: `Kaiju Coder 7`
6
  Public model id: `kaiju-coder-7`
 
24
 
25
  | Field | Value |
26
  |---|---|
27
+ | Status | `fail` |
28
  | Base URL | `http://100.109.109.14:18083/v1` |
29
+ | Model id | `unknown` |
30
+ | Max model length | `unknown` |
31
+ | Detail | `URLError(ConnectionRefusedError(61, 'Connection refused'))` |
32
 
33
  Recommended default today: `16k` context through `kaiju-coder-7`. Higher
34
  context has benchmark evidence, but the currently parked default is 16k for
 
38
 
39
  | Area | Result |
40
  |---|---|
41
+ | Local public-testing readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
42
+ | Hugging Face release readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
43
+ | Public launch readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
44
  | Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
45
  | Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
46
 
 
52
  | Small helper repos uploaded | `True` |
53
  | Merged model uploaded | `True` |
54
  | Merged repo | `RMDWLLC/kaiju-coder-7` |
55
+ | Merged repo SHA | `00ba85985102a14838dbb8a5692d9a75ce9da15a` |
56
  | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
57
  | Download status | `public downloads verified; no active private-storage blocker recorded` |
58
  | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
59
 
60
  ## Hugging Face Release Blockers
61
 
62
+ | Status | Check | Detail |
63
+ |---|---|---|
64
+ | fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
65
 
66
  ## Public Launch Blockers
67
 
68
+ | Status | Check | Detail |
69
+ |---|---|---|
70
+ | fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
71
 
72
  ## Paid API Launch Blockers
73
 
 
97
  | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
98
  | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
99
  | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
100
+ | Latest direct API smoke | `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md` |
101
  | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
102
  | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
103
 
 
137
 
138
  ## Changed Files
139
 
140
+ `git status --short` currently reports `126` changed paths.
141
 
142
  | State | Path |
143
  |---|---|
 
157
  | M | `gateway/cloudflare-worker/src/index.js` |
158
  | M | `gateway/cloudflare-worker/test/index.test.js` |
159
  | M | `gateway/cloudflare-worker/wrangler.jsonc` |
160
+ | M | `gateway/gojira-local/server.py` |
161
  | M | `kaiju_harness/router.py` |
162
  | M | `kaiju_harness/verification.py` |
163
+ | M | `kaiju_harness/website.py` |
164
  | D | `models/README.md` |
165
  | D | `models/qwen3.6-27b-base.md` |
166
  | D | `models/qwen3.6-27b-fp8.md` |
 
170
  | M | `release/MODEL_CARD_DRAFT.md` |
171
  | M | `scripts/build_sft_dataset.py` |
172
  | M | `scripts/check-gojira-b-capacity.sh` |
173
+ | M | `scripts/check_kaiju_gateway_policy.py` |
174
  | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
175
  | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
176
  | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
177
  | M | `scripts/run_kaiju_api_harness_smoke.py` |
178
+ | M | `scripts/run_kaiju_router.py` |
179
  | M | `scripts/start-qwen36-lora-sglang.sh` |
180
  | M | `scripts/stop-qwen36-lora-sglang.sh` |
181
  | M | `scripts/validate_training_data.py` |
182
  | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
183
+ | M | `tests/test_website_harness.py` |
184
  | ?? | `.opencode/` |
185
  | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
186
  | ?? | `datasets/v1.7-targets.json` |
 
205
  | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
206
  | ?? | `release/bundles/` |
207
  | ?? | `release/cloudflare-bindings.example.json` |
208
+ | ?? | `release/gguf/` |
209
  | ?? | `release/hf-release-permission-evidence.example.json` |
210
  | ?? | `release/hf-release-permission-evidence.json` |
211
  | ?? | `release/huggingface/` |
 
235
  | ?? | `scripts/generate_kaiju_final_report.py` |
236
  | ?? | `scripts/gojira-b-ssh-lib.sh` |
237
  | ?? | `scripts/install_kaiju_opencode_profile.py` |
238
+ | ?? | `scripts/kaiju_opencode_fast_proxy.py` |
239
  | ?? | `scripts/make_hf_release_public.sh` |
240
  | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
241
  | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
242
  | ?? | `scripts/prepare_hf_release_staging.sh` |
243
  | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
244
  | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
245
+ | ?? | `scripts/probe-gojira-b-persisted-quantization.sh` |
246
  | ?? | `scripts/refresh_kaiju_release_evidence.py` |
247
+ | ?? | `scripts/run-gojira-b-kaiju-gguf-convert.sh` |
248
  | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
249
  | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
250
  | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
251
  | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
252
+ | ?? | `scripts/run_kaiju_public_demo_pack.py` |
253
  | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
254
  | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
255
  | ?? | `scripts/start-qwen36-merged-sglang.sh` |
 
276
  | git HEAD | `git rev-parse HEAD` | 0 |
277
  | git origin/main | `git rev-parse origin/main` | 0 |
278
  | git status | `git status --short` | 0 |
279
+ | local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
280
+ | HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
281
+ | public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
282
  | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
283
  | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
284
 
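The readiness rows in this report use strings such as `ready=False pass=23 fail=1 manual=0 rc=1`. The sketch below shows one way such a string could be parsed and checked. It assumes every row follows the same space-separated `key=value` layout. The helper names `parse_readiness` and `is_blocked` are hypothetical and are not taken from the release scripts.

```python
def parse_readiness(summary: str) -> dict:
    """Parse a 'key=value key=value ...' readiness string into typed fields.

    Assumption: values are either the literals True/False or integers,
    as in the rows shown in the report.
    """
    fields = {}
    for token in summary.split():
        key, _, value = token.partition("=")
        if value in ("True", "False"):
            fields[key] = value == "True"
        else:
            fields[key] = int(value)
    return fields


def is_blocked(summary: str) -> bool:
    """Treat a row as blocked if it is not ready, has failures, or a nonzero rc."""
    fields = parse_readiness(summary)
    return (not fields["ready"]) or fields["fail"] > 0 or fields["rc"] != 0


if __name__ == "__main__":
    for row in (
        "ready=False pass=23 fail=1 manual=0 rc=1",
        "ready=True pass=27 fail=0 manual=0 rc=0",
    ):
        print(row, "->", "blocked" if is_blocked(row) else "clear")
```

A check like this would flag the three `ready=False` rows in the updated report and leave the staging-integrity and paid-API rows clear.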
GOAL_COMPLETION_AUDIT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
- Generated: `2026-06-03T21:12:21Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
@@ -28,7 +28,7 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
28
  | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
29
  | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
30
  | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
31
- | Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md | |
32
  | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
33
  | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
 
1
  # Kaiju Coder 7 Goal Completion Audit
2
 
3
+ Generated: `2026-06-03T23:35:30Z`
4
 
5
  Overall: `complete`
6
  Summary: `18 passed / 0 blocked / 0 manual`
 
28
  | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
29
  | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
30
  | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
31
+ | Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md | |
32
  | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
33
  | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
34
  | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
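The runtime rows above depend on the endpoint answering at `/v1/models`. As an illustration only, the sketch below shows how a live-runtime probe could turn a connection failure into a `fail` record rather than an unhandled exception. It assumes an OpenAI-compatible response of the form `{"data": [{"id": ...}]}`. The `max_model_len` field name, the `probe_models` function, its injectable `fetch` parameter, and the result keys are all assumptions for this example, not the behaviour of the repo's readiness scripts.

```python
import json
import urllib.error
import urllib.request


def _http_get(url: str, timeout: float) -> bytes:
    """Fetch a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def probe_models(base_url: str, timeout: float = 5.0, fetch=None) -> dict:
    """Query <base_url>/models and summarise the first listed model.

    `fetch` is a hypothetical injection point so the probe can be
    exercised without a live server.
    """
    url = base_url.rstrip("/") + "/models"
    fetch = fetch or _http_get
    try:
        payload = json.loads(fetch(url, timeout))
        first = payload["data"][0]
        model_id = first.get("id", "unknown")
        max_len = first.get("max_model_len", "unknown")  # assumed field name
    except (OSError, ValueError, KeyError, IndexError,
            TypeError, AttributeError) as exc:
        # URLError is an OSError subclass, so a refused connection lands here.
        return {"status": "fail", "model_id": "unknown",
                "max_model_len": "unknown", "detail": repr(exc)}
    return {"status": "pass", "model_id": model_id,
            "max_model_len": max_len, "detail": ""}
```

Passing a fake `fetch` keeps the probe testable offline. A caller could compare `model_id` against `kaiju-coder-7` before treating the runtime check as passed.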
HF_UPLOAD_EVIDENCE.md CHANGED
@@ -6,10 +6,10 @@ Generated: `2026-06-03T20:36:26Z`
6
 
7
  | Repo | Visibility | Evidence |
8
  |---|---|---|
9
- | `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `67bb48b8115b820cd8b01d1778d2610d9ce63692`; public visibility verified after 2026-06-03 paid API evidence refresh. |
10
  | `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
11
  | `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
12
- | `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `736af44add9321f74e8603cd739245fc0853d62c`; public metadata reports `private: false`. |
13
 
14
  These SHAs are a point-in-time release evidence snapshot. Uploading this
15
  evidence file itself creates another metadata commit, so use `hf models info`
@@ -78,7 +78,7 @@ Result:
78
  - The downloaded OpenCode helper installer dry-run passed and included the
79
  loop guard.
80
  - Merged model metadata reports `private: false`, SHA
81
- `736af44add9321f74e8603cd739245fc0853d62c`, and lists all `14`
82
  safetensors shards.
83
 
84
  The earlier private-storage limit blocked private file downloads after the
 
6
 
7
  | Repo | Visibility | Evidence |
8
  |---|---|---|
9
+ | `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `5016ab9e5f32ca3f94d49a4dbed65de2729bd6ce`; public visibility verified after 2026-06-03 paid API evidence refresh. |
10
  | `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
11
  | `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
12
+ | `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `00ba85985102a14838dbb8a5692d9a75ce9da15a`; public metadata reports `private: false`. |
13
 
14
  These SHAs are a point-in-time release evidence snapshot. Uploading this
15
  evidence file itself creates another metadata commit, so use `hf models info`
 
78
  - The downloaded OpenCode helper installer dry-run passed and included the
79
  loop guard.
80
  - Merged model metadata reports `private: false`, SHA
81
+ `00ba85985102a14838dbb8a5692d9a75ce9da15a`, and lists all `14`
82
  safetensors shards.
83
 
84
  The earlier private-storage limit blocked private file downloads after the
LOCAL_TEST_INSTRUCTIONS.md CHANGED
@@ -1,6 +1,6 @@
1
  # Kaiju Coder 7 Local Test Instructions
2
 
3
- Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the SGLang commands require Gojira B over Tailscale.
4
 
5
  ## Run The Local Release-Candidate Gate
6
 
@@ -24,26 +24,32 @@ KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.
24
 
25
  ## Start Kaiju Coder 7 Serving
26
 
27
- Use this for the current model-side candidate:
28
 
29
  ```bash
30
- KAIJU_QWEN36_MERGED_PORT=18083 \
31
- KAIJU_QWEN36_MERGED_SESSION=kaiju_qwen36_v18_merged_sglang \
32
- KAIJU_QWEN36_MERGED_CONTEXT=16384 \
33
- KAIJU_QWEN36_MERGED_MEM_FRACTION=0.85 \
34
- ./scripts/start-qwen36-merged-sglang.sh
35
  ```
36
 
37
  Confirm readiness:
38
 
39
  ```bash
40
- curl http://100.109.109.14:18083/v1/models
 
 
 
 
 
 
 
41
  ```
42
 
43
  The high-context `32768` target has benchmark evidence in
44
- `release/SERVING_BENCHMARKS.md`, but the current restored Gojira-B endpoint is
45
- parked at `16384` for reliable local/OpenCode testing after the quantized-vLLM
46
- smoke work.
47
 
48
  ## Prepare Merged-Model Hugging Face Metadata
49
 
@@ -82,7 +88,7 @@ python3 scripts/run_kaiju_api_harness_smoke.py
82
 
83
  ```bash
84
  python3 evals/run_openai_compat_smoke.py \
85
- --base-url http://100.109.109.14:18083/v1 \
86
  --model kaiju-coder-7 \
87
  --tasks evals/tasks/smoke.jsonl \
88
  --max-tasks 1 \
@@ -100,7 +106,7 @@ evals pass at acceptable latency:
100
 
101
  ```bash
102
  python3 evals/run_openai_compat_smoke.py \
103
- --base-url http://100.109.109.14:18083/v1 \
104
  --model kaiju-coder-7 \
105
  --tasks evals/tasks/business-owner-v18-comparison.jsonl \
106
  --timeout 900 \
 
1
  # Kaiju Coder 7 Local Test Instructions
2
 
3
+ Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
4
 
5
  ## Run The Local Release-Candidate Gate
6
 
 
24
 
25
  ## Start Kaiju Coder 7 Serving
26
 
27
+ Use this for the fastest current model-side candidate:
28
 
29
  ```bash
30
+ KAIJU_VLLM_CONTEXT=16384 \
31
+ KAIJU_VLLM_QUANTIZATION=bitsandbytes \
32
+ KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
33
+ KAIJU_VLLM_GPU_UTIL=0.90 \
34
+ ./scripts/start-qwen36-merged-vllm.sh
35
  ```
36
 
37
  Confirm readiness:
38
 
39
  ```bash
40
+ curl http://100.109.109.14:18084/v1/models
41
+ ```
42
+
43
+ Then keep the Mac-side fast proxy pointed at that vLLM endpoint:
44
+
45
+ ```bash
46
+ KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
47
+ python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
48
  ```
49
 
50
  The high-context `32768` target has benchmark evidence in
51
+ `release/SERVING_BENCHMARKS.md`, but the current speed/default path is 16k
52
+ runtime-quantized vLLM plus the local fast proxy.
 
53
 
54
  ## Prepare Merged-Model Hugging Face Metadata
55
 
 
88
 
89
  ```bash
90
  python3 evals/run_openai_compat_smoke.py \
91
+ --base-url http://100.109.109.14:18084/v1 \
92
  --model kaiju-coder-7 \
93
  --tasks evals/tasks/smoke.jsonl \
94
  --max-tasks 1 \
 
106
 
107
  ```bash
108
  python3 evals/run_openai_compat_smoke.py \
109
+ --base-url http://100.109.109.14:18084/v1 \
110
  --model kaiju-coder-7 \
111
  --tasks evals/tasks/business-owner-v18-comparison.jsonl \
112
  --timeout 900 \
PUBLIC_TESTING_QUICKSTART.md CHANGED
@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
19
  ```bash
20
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
21
  cd kaiju-coder-7-opencode
22
- python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
23
  ```
24
 
25
  Then run OpenCode inside the project you want to edit:
@@ -65,23 +65,31 @@ the server to expose:
65
 
66
  ```text
67
  model id: kaiju-coder-7
68
- base URL: http://127.0.0.1:18083/v1
69
  context: 16384
70
  ```
71
 
 
 
 
 
 
 
 
 
72
  Then install the OpenCode helper with:
73
 
74
  ```bash
75
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
76
  cd kaiju-coder-7-opencode
77
- python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
78
  ```
79
 
80
  ### Path 3: Runtime-Quantized Local Candidate
81
 
82
  Use this only if you are comfortable with advanced serving setups. The current
83
- working quantized option is a runtime bitsandbytes recipe, not a separate
84
- persisted quantized weights repo.
85
 
86
  ```bash
87
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
@@ -115,9 +123,12 @@ Expected result:
115
  - Public model id: `kaiju-coder-7`
116
  - OpenCode context: `16384`
117
  - Output cap for public testing: `2500`
 
118
  - Current reliable product path: model plus deterministic business-owner
119
- harness plus verifier
120
- - Raw multi-file OpenCode generation: still too slow for broad paid API claims
 
 
121
  - Paid API: not public until launch preflight passes
122
 
123
  ## What Not To Claim Yet
@@ -134,15 +145,21 @@ Do claim:
134
  - Kaiju Coder 7 has a working local/OpenCode release candidate
135
  - the current tested OpenCode default is 16k context
136
  - the helper package includes a lean agent and compaction loop guard
 
 
137
  - the paid API scaffold has tests and a launch preflight, but is not yet public
138
  - the packaged public smoke verifies a fresh OpenCode one-file write before
139
  public claims are refreshed
 
 
140
 
141
  ## Current Blockers Before Public Release
142
 
143
  - Hugging Face repo creation still requires a write-capable token or namespace.
144
  - Full merged model upload has not completed; the merged folder must first have
145
  the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
 
 
146
  - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
147
  secret verification, Stripe webhook staging evidence, staging traffic, latency
148
  evidence, and rollback proof.
 
19
  ```bash
20
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
21
  cd kaiju-coder-7-opencode
22
+ python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
23
  ```
24
 
25
  Then run OpenCode inside the project you want to edit:
 
65
 
66
  ```text
67
  model id: kaiju-coder-7
68
+ base URL: http://127.0.0.1:18084/v1
69
  context: 16384
70
  ```
71
 
72
+ For the fastest OpenCode behavior, run the bundled fast proxy in a separate
73
+ terminal and point OpenCode at the proxy:
74
+
75
+ ```bash
76
+ KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
77
+ python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
78
+ ```
79
+
80
  Then install the OpenCode helper with:
81
 
82
  ```bash
83
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
84
  cd kaiju-coder-7-opencode
85
+ python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
86
  ```
87
 
88
  ### Path 3: Runtime-Quantized Local Candidate
89
 
90
  Use this only if you are comfortable with advanced serving setups. The current
91
+ working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
92
+ has been converted, but it is still a candidate until runtime smoke passes.
93
 
94
  ```bash
95
  git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
 
123
  - Public model id: `kaiju-coder-7`
124
  - OpenCode context: `16384`
125
  - Output cap for public testing: `2500`
126
+ - Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
127
  - Current reliable product path: model plus deterministic business-owner
128
+ harness/router plus verifier
129
+ - Raw multi-file OpenCode generation: still too slow for broad paid claims;
130
+ useful for testing, but paid API claims should favor harnessed product
131
+ workflows until broader latency gates pass
132
  - Paid API: not public until launch preflight passes
133
 
134
  ## What Not To Claim Yet
 
145
  - Kaiju Coder 7 has a working local/OpenCode release candidate
146
  - the current tested OpenCode default is 16k context
147
  - the helper package includes a lean agent and compaction loop guard
148
+ - the fast proxy keeps OpenCode tool calls intact while forcing bounded,
149
+ non-thinking generation
150
  - the paid API scaffold has tests and a launch preflight, but is not yet public
151
  - the packaged public smoke verifies a fresh OpenCode one-file write before
152
  public claims are refreshed
153
+ - a GGUF Q8_0 candidate exists, but is not public quantized-weights release
154
+ evidence until runtime smoke passes
155
 
156
  ## Current Blockers Before Public Release
157
 
158
  - Hugging Face repo creation still requires a write-capable token or namespace.
159
  - Full merged model upload has not completed; the merged folder must first have
160
  the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
161
+ - The GGUF Q8_0 candidate still needs a runtime smoke before public
162
+ quantized-weights upload.
163
  - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
164
  secret verification, Stripe webhook staging evidence, staging traffic, latency
165
  evidence, and rollback proof.
QUANTIZATION_PLAN.md CHANGED
@@ -54,6 +54,44 @@ Findings on 2026-06-03:
54
  `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with exactly
55
  `Kaiju Coder 7 quantized runtime ok`.
56
 
57
  ## Candidate Order
58
 
59
  1. **FP8/AWQ-style GPU serving candidate**
@@ -65,7 +103,8 @@ Findings on 2026-06-03:
65
 
66
  2. **GGUF/llama.cpp candidate**
67
  - Best for broad local distribution if the architecture converts cleanly.
68
- - Publish only if a real local smoke test passes.
 
69
 
70
  3. **MLX candidate**
71
  - Best for Apple Silicon users if conversion supports this architecture.
@@ -90,7 +129,4 @@ path, but not as a public quantized-weights release.
90
 
91
  ## Next Concrete Step
92
 
93
- Create a pinned Docker/UV quantization environment on Gojira-B with the
94
- Qwen3.5-capable Transformers/runtime stack plus one persistent-weight
95
- quantization package at a time. Do not upload a quantized-weights repo until a
96
- smoke-tested persisted artifact exists.
 
54
  `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with exactly
55
  `Kaiju Coder 7 quantized runtime ok`.
56
 
57
+ Second persisted-quantization probe:
58
+
59
+ ```bash
60
+ ./scripts/probe-gojira-b-persisted-quantization.sh
61
+ ```
62
+
63
+ Findings on 2026-06-03:
64
+
65
+ - The active nightly vLLM stack recognizes `Qwen3_5Config` / `qwen3_5`.
66
+ - Normal dependency installs for AWQ/GPTQ/llmcompressor can break that
67
+ Qwen3.5-capable Transformers stack, so they are not safe to run casually in
68
+ the serving image.
69
+ - `autoawq` installed but importing `awq` failed against the current
70
+ Transformers activation API.
71
+ - `auto-gptq` failed during build isolation because Torch was not visible to
72
+ the isolated build step.
73
+ - `llmcompressor --no-deps` preserved Qwen3.5 config support, but import still
74
+ needs a pinned supporting dependency set. This remains the next best GPU
75
+ persisted-weight path after a dedicated environment is built.
76
+ - `llama.cpp` support includes `Qwen3_5ForConditionalGeneration`, and Q8_0
77
+ conversion dry-run passed.
78
+
79
+ Persisted GGUF conversion:
80
+
81
+ ```bash
82
+ ./scripts/run-gojira-b-kaiju-gguf-convert.sh
83
+ ```
84
+
85
+ - Output:
86
+ `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
87
+ - Size: `27G`
88
+ - SHA256:
89
+ `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
90
+ - Evidence:
91
+ `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
92
+ - Release status: converted, runtime smoke still required before public
93
+ quantized-weights upload.
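Before any runtime smoke, the recorded digest can be re-checked against the file on disk. The following is a minimal sketch, not part of the repo's scripts: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and only the expected digest comes from the findings above.

```python
import hashlib
from pathlib import Path

# Digest recorded above for kaiju-coder-7-Q8_0.gguf.
EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in fixed-size chunks so a 27G artifact never sits in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the recorded one."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected(...)` with the artifact path on Gojira-B should return `True` if the file is byte-identical to the one that was converted; a mismatch would suggest a partial copy or a re-conversion.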
94
+
95
  ## Candidate Order
96
 
97
  1. **FP8/AWQ-style GPU serving candidate**
 
103
 
104
  2. **GGUF/llama.cpp candidate**
105
  - Best for broad local distribution if the architecture converts cleanly.
106
+ - Current state: Q8_0 converted successfully on Gojira-B.
107
+ - Publish only if a real runtime smoke test passes.
108
 
109
  3. **MLX candidate**
110
  - Best for Apple Silicon users if conversion supports this architecture.
 
129
 
130
  ## Next Concrete Step
131
 
132
+ Smoke-test the GGUF Q8_0 candidate next. Create a pinned Docker/UV quantization environment on Gojira-B with the Qwen3.5-capable Transformers/runtime stack plus one persistent-weight GPU quantization package at a time. Do not upload a quantized-weights repo until a smoke-tested persisted artifact exists.
 
 
 
SERVING_BENCHMARKS.md CHANGED
@@ -323,6 +323,7 @@ Runs:
323
  - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
324
  - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
325
  - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
 
326
 
327
  | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
328
  | --- | ---: | --- | --- | ---: | ---: | ---: |
@@ -332,6 +333,8 @@ Runs:
332
  | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
333
  | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
334
  | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
 
 
335
 
336
  Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
337
  load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
@@ -356,3 +359,98 @@ restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
356
  quantized local candidate for advanced GPU users and future paid API speed
357
  experiments. It now has direct identity/code/business-doc evidence plus an
358
  OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
 
323
  - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
324
  - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
325
  - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
326
+ - `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
327
 
328
  | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
329
  | --- | ---: | --- | --- | ---: | ---: | ---: |
 
333
  | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
334
  | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
335
  | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
336
+ | vLLM bitsandbytes | 16384 | code_patch | True | 24.97 | 997 | 39.924 |
337
+ | vLLM bitsandbytes | 16384 | business_doc | True | 34.46 | 1615 | 46.874 |
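The `Chars/s` column appears to be `Chars` divided by `Seconds`. A minimal sketch of that arithmetic follows, using three of the 16k rows above as input; the helper name `chars_per_second` is hypothetical and not taken from the benchmark script.

```python
def chars_per_second(chars, seconds):
    """Throughput as reported in the table: output characters per wall-clock second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return chars / seconds


# (prompt, chars, seconds) copied from the 16384-context rows above.
rows = [
    ("code_patch", 416, 11.3),
    ("business_doc", 1610, 53.44),
    ("identity", 26, 19.65),
]

for prompt, chars, seconds in rows:
    print(f"{prompt}: {chars_per_second(chars, seconds):.3f} chars/s")
```

Rows whose throughput was computed from unrounded timings may differ from this recomputation in the last digits, since the table only shows seconds to two decimal places.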
338
 
339
  Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
340
  load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
 
359
  quantized local candidate for advanced GPU users and future paid API speed
360
  experiments. It now has direct identity/code/business-doc evidence plus an
361
  OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
362
+
363
+ ## 2026-06-03 Fast Proxy And Website Harness Speed Pass
364
+
365
+ The current speed profile keeps runtime-quantized vLLM active on Gojira-B port
366
+ `18084` and routes OpenCode through the local fast proxy at
367
+ `http://127.0.0.1:18181/v1`. The proxy preserves OpenCode tool-call streaming
368
+ while forcing `thinking=false`, model id `kaiju-coder-7`, and bounded output
369
+ budgets.
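To illustrate the kind of request shaping described here, the sketch below rewrites an OpenAI-style chat payload before it is forwarded upstream. It is not the code in `scripts/kaiju_opencode_fast_proxy.py`, which this diff does not show. Three things are assumptions: the function name `shape_request`, the use of the `2500` public-testing output cap as a `max_tokens` limit, and disabling thinking through `chat_template_kwargs.enable_thinking`, the convention vLLM documents for Qwen3-family chat templates.

```python
import copy

MODEL_ID = "kaiju-coder-7"
MAX_OUTPUT_TOKENS = 2500  # assumption: reuse the public-testing output cap


def shape_request(payload, max_output_tokens=MAX_OUTPUT_TOKENS):
    """Return a copy of the payload with model id, output budget, and thinking pinned."""
    shaped = copy.deepcopy(payload)  # never mutate the caller's request

    # Force the public model id regardless of what the client sent.
    shaped["model"] = MODEL_ID

    # Bound the output budget: fill in a missing or invalid value, clamp a large one.
    requested = shaped.get("max_tokens")
    valid = isinstance(requested, int) and not isinstance(requested, bool)
    if not valid or requested <= 0 or requested > max_output_tokens:
        shaped["max_tokens"] = max_output_tokens

    # Assumed mechanism for thinking=false on a vLLM-served Qwen3-family model.
    template_kwargs = dict(shaped.get("chat_template_kwargs") or {})
    template_kwargs["enable_thinking"] = False
    shaped["chat_template_kwargs"] = template_kwargs

    # Everything else (messages, tools, tool_choice, stream) passes through untouched.
    return shaped
```

Leaving `tools`, `tool_choice`, and `stream` untouched is the point of the design: only the fields that affect latency are overridden, so OpenCode's tool-call loop sees the same protocol it would see from the upstream server.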
370
+
371
+ Active endpoint checks:
372
+
373
+ - Local fast proxy health: `http://127.0.0.1:18181/health`
374
+ - Upstream vLLM models: `http://100.109.109.14:18084/v1/models`
375
+ - Upstream reports `kaiju-coder-7` with `max_model_len=16384`
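The upstream check can be scripted by parsing the `/v1/models` response. The sketch below validates a payload rather than opening a connection, so it runs without Gojira-B being reachable. The sample payload is hand-written to match the fields named above, not captured from the live server, and the helper names are hypothetical.

```python
import json


def find_model(models_payload, model_id):
    """Return the model entry with the given id, or None if it is not listed."""
    for entry in models_payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


def check_models_payload(models_payload, model_id="kaiju-coder-7", expected_context=16384):
    """Return (ok, reason) for the served model id and context length."""
    entry = find_model(models_payload, model_id)
    if entry is None:
        return False, f"model {model_id!r} not listed"
    context = entry.get("max_model_len")
    if context != expected_context:
        return False, f"max_model_len is {context!r}, expected {expected_context}"
    return True, "ok"


# Hand-written sample shaped like an OpenAI-compatible model list.
sample = json.loads(
    '{"object": "list", "data": [{"id": "kaiju-coder-7", "max_model_len": 16384}]}'
)
print(check_models_payload(sample))
```

For a live check, the payload could be fetched with `urllib.request.urlopen` against the upstream `/v1/models` URL and passed through `json.load` before calling `check_models_payload`.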
376
+
377
+ Fresh direct vLLM benchmark:
378
+
379
+ - Run: `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
380
+ - Identity: `19.48s`
381
+ - Code patch: `24.97s`, `997` chars
382
+ - Business doc: `34.46s`, `1,615` chars
383
+
384
+ Fresh OpenCode smoke through the local fast proxy:
385
+
386
+ - Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
387
+ - Result: passed in about `23.5s`, wrote the exact requested file.
388
+ - Packaged public verifier after exact-content agent rule:
389
+ `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4`
390
+ passed through `http://127.0.0.1:18181/v1`.
391
+
392
+ Website harness/router speed pass:
393
+
394
+ - Direct website harness command: `python3 scripts/run_kaiju_website_harness.py --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
395
+ - Direct website harness result: `runs/harness/website-speed-pass/avery-stone-vllm.html`, `9,257` chars, `7.31s`
396
+ - Router command: `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
397
+ - Router artifact: `runs/router-speed-pass/20260603T223731Z-website-build-a-premium-one-page-website-for-avery-stone-construction-a-reside/index.html`
398
+ - Router result: passed in `7.20s`; checks covered complete HTML, required sections, external images, responsive CSS, no lorem ipsum, and manifest write.
399
+ - Router through the installed local proxy: `runs/router-speed-pass/20260603T224328Z-website-build-a-premium-one-page-website-for-bennett-family-dental-in-charlott/index.html`
400
+ - Proxy router result: passed in `4.67s`; preserved explicit CTA `Schedule a Visit`, inferred `dental`, and passed the same complete-HTML/static checks.
401
+
402
+ Updated recommendation: for speed-sensitive OpenCode and paid workflow testing,
403
+ use vLLM bitsandbytes plus the local fast proxy as the active default. Keep
404
+ SGLang as fallback/historical evidence, not the fastest current path. For
405
+ websites and business-owner packs, prefer the deterministic router/harness path
406
+ over raw long-form HTML generation.
407
+
408
+ Public business-owner demo pack through the active fast proxy:
409
+
410
+ ```bash
411
+ python3 scripts/run_kaiju_public_demo_pack.py \
412
+ --openai-base-url http://127.0.0.1:18181/v1 \
413
+ --model kaiju-coder-7 \
414
+ --planner-timeout 90
415
+ ```
416
+
417
+ Run: `runs/public-demo-pack/20260603T232534Z/summary.md`
418
+
419
+ | Task | Result | Seconds | Changed files |
420
+ | --- | --- | ---: | ---: |
421
+ | Website | Passed | 24.59 | 2 |
422
+ | Owner AI company pack | Passed | 29.99 | 19 |
423
+ | Stripe safety plan | Passed | 9.93 | 2 |
424
+ | CSV parser artifact | Passed | 19.93 | 2 |
425
+
426
+ Total: `4/4` passed in `84.43s`.
427
+
428
+ ## Persisted GGUF Q8_0 Candidate
429
+
430
+ The dedicated persisted-quantization pass found that normal AWQ/GPTQ installs
431
+ are not clean against the Qwen3.5-capable serving stack tonight, while
432
+ `llama.cpp` conversion support includes `Qwen3_5ForConditionalGeneration`.
433
+
434
+ Command:
435
+
436
+ ```bash
437
+ ./scripts/probe-gojira-b-persisted-quantization.sh
438
+ ./scripts/run-gojira-b-kaiju-gguf-convert.sh
439
+ ```
440
+
441
+ Result:
442
+
443
+ - Artifact:
444
+ `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
445
+ - Size: `27G`
446
+ - SHA256:
447
+ `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
448
+ - Conversion log:
449
+ `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
450
+ - Runtime status: candidate only; direct GGUF runtime smoke still required
451
+ before publishing quantized weights.
452
+
453
+ Interpretation: the next real speed improvement for broad public users is not
454
+ another prompt tweak. It is a smoked GGUF or GPU-persisted quantized artifact.
455
+ The fastest currently verified Kaiju Coder 7 path remains vLLM bitsandbytes
456
+ plus the local fast proxy and deterministic website/business harnesses.
scripts/check_hf_uploaded_release.py CHANGED
@@ -24,7 +24,7 @@ from typing import Any
24
 
25
  MODEL_ID = "kaiju-coder-7"
26
  DEFAULT_NAMESPACE = "RMDWLLC"
27
- DEFAULT_BASE_URL = "http://100.109.109.14:18083/v1"
28
 
29
 
30
  @dataclass(frozen=True)
 
24
 
25
  MODEL_ID = "kaiju-coder-7"
26
  DEFAULT_NAMESPACE = "RMDWLLC"
27
+ DEFAULT_BASE_URL = "http://127.0.0.1:18181/v1"
28
 
29
 
30
  @dataclass(frozen=True)