Instructions to use RMDWLLC/kaiju-coder-7 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RMDWLLC/kaiju-coder-7 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The chat message below contains an image, so use the image-text-to-text task
pipe = pipeline("image-text-to-text", model="RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("RMDWLLC/kaiju-coder-7")
model = AutoModelForImageTextToText.from_pretrained("RMDWLLC/kaiju-coder-7")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RMDWLLC/kaiju-coder-7 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
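
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on `localhost:8000`; the helper names `build_chat_request` and `send_chat_request` are illustrative and do not come from vLLM or this repository.

```python
import json
import urllib.request


def build_chat_request(prompt, model="RMDWLLC/kaiju-coder-7",
                       base_url="http://localhost:8000/v1"):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the assistant's reply as a string. Building and sending are kept separate so the payload can be inspected without a live server.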

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-7

SGLang

How to use RMDWLLC/kaiju-coder-7 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RMDWLLC/kaiju-coder-7" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RMDWLLC/kaiju-coder-7" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RMDWLLC/kaiju-coder-7 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-7
```

restokes92 committed 5 days ago

Commit 4ca1eb4 (verified) · 1 Parent(s): 00ba859

Add files using upload-large-folder tool

Files changed (10)

EVAL_SCOREBOARD.md +11 -7
FINAL_RELEASE_REPORT.md +18 -8
GOAL_COMPLETION_AUDIT.md +4 -4
HF_UPLOAD_EVIDENCE.md +9 -8
LOCAL_TEST_INSTRUCTIONS.md +19 -13
MERGED_MODEL_RELEASE_MANIFEST.json +1 -1
PAID_API_READINESS.md +8 -8
PUBLIC_TESTING_QUICKSTART.md +37 -18
README.md +8 -2
SERVING_BENCHMARKS.md +117 -17

EVAL_SCOREBOARD.md CHANGED

@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
-| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T182222Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and file written only in the requested temp workspace | 2026-06-03 |
 | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
 | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
 | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
@@ -43,9 +43,13 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
 | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
 | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
 | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
-| Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
-| Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
 | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
 | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
 | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
@@ -60,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
 | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
 | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
 | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
-| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
 | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
-| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Pending |
-| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged and upload-scripted; upload blocked by HF token permissions and human/public-review decision |
 ## Decision Rule
-The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should still be published honestly as merged model plus deterministic harness plus verifier. Raw merged v1.8 is useful on business documents and Jah credits but slow on this SGLang stack. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.

 | Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
 | Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
+| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
 | Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
 | Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
 | Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
 | Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
 | Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
 | Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
+| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
+| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
+| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
+| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
 | Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
+| Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
+| Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
 | v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
 | Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
 | Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
 | v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
 | Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
 | Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
+| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
 | Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
+| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
+| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
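
The GGUF conversion row above records a SHA256 digest for the converted file. A minimal sketch of how a local copy could be checked against a recorded digest follows. The helper names `sha256_of_file` and `matches_recorded_digest` are hypothetical and are not part of the repository's scripts.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so multi-gigabyte files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, expected_hex):
    """Compare the file's digest with a recorded hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_recorded_digest(local_path, recorded_sha256)` with the digest from the scoreboard row returns `True` only when the local file is byte-identical to the one that was hashed.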
 ## Decision Rule
+The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
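
The "v1.8 merged endpoint probe" row in the scoreboard describes a chat request that disables thinking through a top-level `chat_template_kwargs` field. A rough sketch of what such a request body might look like is shown below. The key `enable_thinking` is an assumption based on common Qwen-style chat templates; the source states only that thinking was disabled, not which key was used. The function names are likewise illustrative.

```python
import json


def build_no_thinking_payload(prompt, model="kaiju-coder-7", max_tokens=512):
    """Build an OpenAI-compatible chat body with thinking turned off."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Top-level field, as the probe row describes. The inner key
        # `enable_thinking` is an assumed name, not confirmed by the source.
        "chat_template_kwargs": {"enable_thinking": False},
    }


def visible_chars(response_body):
    """Count characters in the normal `content` field of the first choice."""
    content = response_body["choices"][0]["message"].get("content") or ""
    return len(content)


def to_json(payload):
    """Serialize the payload the way it would be sent over HTTP."""
    return json.dumps(payload)
```

`visible_chars` mirrors the "visible chars" figure reported in the probe row: it counts only the regular `content` field and treats a missing or null value as zero.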

FINAL_RELEASE_REPORT.md CHANGED

@@ -1,6 +1,6 @@
 # Kaiju Coder 7 Final Release Report
-Generated: `2026-06-03T21:12:14Z`
 Product name: `Kaiju Coder 7`
 Public model id: `kaiju-coder-7`
@@ -25,7 +25,7 @@ Stripe live-mode switch and controlled live payment verification.
 | Field | Value |
 |---|---|
 | Status | `pass` |
-| Base URL | `http://100.109.109.14:18083/v1` |
 | Model id | `kaiju-coder-7` |
 | Max model length | `16384` |
 | Detail | `` |
@@ -52,7 +52,7 @@ stability and speed.
 | Small helper repos uploaded | `True` |
 | Merged model uploaded | `True` |
 | Merged repo | `RMDWLLC/kaiju-coder-7` |
-| Merged repo SHA | `736af44add9321f74e8603cd739245fc0853d62c` |
 | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
 | Download status | `public downloads verified; no active private-storage blocker recorded` |
 | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
@@ -93,7 +93,7 @@ stability and speed.
 | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
 | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
 | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
-| Latest direct API smoke | `runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md` |
 | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
 | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
@@ -133,7 +133,7 @@ human release review explicitly approves public paid API launch.
 ## Changed Files
-`git status --short` currently reports `116` changed paths.
 | State | Path |
 |---|---|
@@ -153,8 +153,10 @@ human release review explicitly approves public paid API launch.
 | M | `gateway/cloudflare-worker/src/index.js` |
 | M | `gateway/cloudflare-worker/test/index.test.js` |
 | M | `gateway/cloudflare-worker/wrangler.jsonc` |
 | M | `kaiju_harness/router.py` |
 | M | `kaiju_harness/verification.py` |
 | D | `models/README.md` |
 | D | `models/qwen3.6-27b-base.md` |
 | D | `models/qwen3.6-27b-fp8.md` |
@@ -164,14 +166,17 @@ human release review explicitly approves public paid API launch.
 | M | `release/MODEL_CARD_DRAFT.md` |
 | M | `scripts/build_sft_dataset.py` |
 | M | `scripts/check-gojira-b-capacity.sh` |
 | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
 | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
 | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
 | M | `scripts/run_kaiju_api_harness_smoke.py` |
 | M | `scripts/start-qwen36-lora-sglang.sh` |
 | M | `scripts/stop-qwen36-lora-sglang.sh` |
 | M | `scripts/validate_training_data.py` |
 | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
 | ?? | `.opencode/` |
 | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
 | ?? | `datasets/v1.7-targets.json` |
@@ -196,6 +201,7 @@ human release review explicitly approves public paid API launch.
 | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
 | ?? | `release/bundles/` |
 | ?? | `release/cloudflare-bindings.example.json` |
 | ?? | `release/hf-release-permission-evidence.example.json` |
 | ?? | `release/hf-release-permission-evidence.json` |
 | ?? | `release/huggingface/` |
@@ -225,17 +231,21 @@ human release review explicitly approves public paid API launch.
 | ?? | `scripts/generate_kaiju_final_report.py` |
 | ?? | `scripts/gojira-b-ssh-lib.sh` |
 | ?? | `scripts/install_kaiju_opencode_profile.py` |
 | ?? | `scripts/make_hf_release_public.sh` |
 | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
 | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
 | ?? | `scripts/prepare_hf_release_staging.sh` |
 | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
 | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
 | ?? | `scripts/refresh_kaiju_release_evidence.py` |
 | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
 | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
 | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
 | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
 | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
 | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
 | ?? | `scripts/start-qwen36-merged-sglang.sh` |
@@ -262,9 +272,9 @@ human release review explicitly approves public paid API launch.
 | git HEAD | `git rev-parse HEAD` | 0 |
 | git origin/main | `git rev-parse origin/main` | 0 |
 | git status | `git status --short` | 0 |
-| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
-| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
-| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
 | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
 | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |

 # Kaiju Coder 7 Final Release Report
+Generated: `2026-06-03T23:53:31Z`
 Product name: `Kaiju Coder 7`
 Public model id: `kaiju-coder-7`
 | Field | Value |
 |---|---|
 | Status | `pass` |
+| Base URL | `http://127.0.0.1:18181/v1` |
 | Model id | `kaiju-coder-7` |
 | Max model length | `16384` |
 | Detail | `` |
 | Small helper repos uploaded | `True` |
 | Merged model uploaded | `True` |
 | Merged repo | `RMDWLLC/kaiju-coder-7` |
+| Merged repo SHA | `00ba85985102a14838dbb8a5692d9a75ce9da15a` |
 | Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
 | Download status | `public downloads verified; no active private-storage blocker recorded` |
 | Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
 | Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
 | Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
 | Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
+| Latest direct API smoke | `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md` |
 | Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
 | Latest public OpenCode smoke | `runs/public-opencode-smoke` |
 ## Changed Files
+`git status --short` currently reports `126` changed paths.
 | State | Path |
 |---|---|
 | M | `gateway/cloudflare-worker/src/index.js` |
 | M | `gateway/cloudflare-worker/test/index.test.js` |
 | M | `gateway/cloudflare-worker/wrangler.jsonc` |
+| M | `gateway/gojira-local/server.py` |
 | M | `kaiju_harness/router.py` |
 | M | `kaiju_harness/verification.py` |
+| M | `kaiju_harness/website.py` |
 | D | `models/README.md` |
 | D | `models/qwen3.6-27b-base.md` |
 | D | `models/qwen3.6-27b-fp8.md` |
 | M | `release/MODEL_CARD_DRAFT.md` |
 | M | `scripts/build_sft_dataset.py` |
 | M | `scripts/check-gojira-b-capacity.sh` |
+| M | `scripts/check_kaiju_gateway_policy.py` |
 | M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
 | M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
 | M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
 | M | `scripts/run_kaiju_api_harness_smoke.py` |
+| M | `scripts/run_kaiju_router.py` |
 | M | `scripts/start-qwen36-lora-sglang.sh` |
 | M | `scripts/stop-qwen36-lora-sglang.sh` |
 | M | `scripts/validate_training_data.py` |
 | M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
+| M | `tests/test_website_harness.py` |
 | ?? | `.opencode/` |
 | ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
 | ?? | `datasets/v1.7-targets.json` |
 | ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
 | ?? | `release/bundles/` |
 | ?? | `release/cloudflare-bindings.example.json` |
+| ?? | `release/gguf/` |
 | ?? | `release/hf-release-permission-evidence.example.json` |
 | ?? | `release/hf-release-permission-evidence.json` |
 | ?? | `release/huggingface/` |
 | ?? | `scripts/generate_kaiju_final_report.py` |
 | ?? | `scripts/gojira-b-ssh-lib.sh` |
 | ?? | `scripts/install_kaiju_opencode_profile.py` |
+| ?? | `scripts/kaiju_opencode_fast_proxy.py` |
 | ?? | `scripts/make_hf_release_public.sh` |
 | ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
 | ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
 | ?? | `scripts/prepare_hf_release_staging.sh` |
 | ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
 | ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
+| ?? | `scripts/probe-gojira-b-persisted-quantization.sh` |
 | ?? | `scripts/refresh_kaiju_release_evidence.py` |
+| ?? | `scripts/run-gojira-b-kaiju-gguf-convert.sh` |
 | ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
 | ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
 | ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
 | ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
+| ?? | `scripts/run_kaiju_public_demo_pack.py` |
 | ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
 | ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
 | ?? | `scripts/start-qwen36-merged-sglang.sh` |
 | git HEAD | `git rev-parse HEAD` | 0 |
 | git origin/main | `git rev-parse origin/main` | 0 |
 | git status | `git status --short` | 0 |
+| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
+| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
 | HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
 | paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
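
The command table above pairs each readiness check with its exit code. As a rough sketch of how such rows could be collected (this is not the repository's `scripts/generate_kaiju_final_report.py`, whose implementation is not shown in this diff), each command can be run with `subprocess` and its return code recorded. The function names here are hypothetical.

```python
import subprocess
import sys


def collect_exit_codes(checks):
    """Run each (label, argv) pair and return (label, command, exit code) rows."""
    rows = []
    for label, argv in checks:
        completed = subprocess.run(argv, capture_output=True, text=True)
        rows.append((label, " ".join(argv), completed.returncode))
    return rows


def render_markdown_rows(rows):
    """Format rows in the same three-column layout as the report table."""
    return "\n".join(
        f"| {label} | `{command}` | {code} |" for label, command, code in rows
    )
```

For example, `collect_exit_codes([("git status", ["git", "status", "--short"])])` would yield one row whose last element is `0` when the command succeeds. Passing argument lists rather than shell strings avoids quoting problems with paths and URLs.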

GOAL_COMPLETION_AUDIT.md CHANGED

@@ -1,11 +1,11 @@
 # Kaiju Coder 7 Goal Completion Audit
-Generated: `2026-06-03T21:12:21Z`
 Overall: `complete`
 Summary: `18 passed / 0 blocked / 0 manual`
-This audit maps the active Kaiju Coder 7 objective to current evidence. It is stricter than local readiness: local public testing and Hugging Face release checks can pass while paid API launch remains blocked.
 ## Readiness Commands
@@ -28,13 +28,13 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
 | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py |  |
 | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py |  |
 | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
-| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T193000Z-kaiju-coder-7-serving/summary.md |  |
 | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default |  |
 | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md |  |
 | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation |  |
 | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums |  |
 | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release |  |
-| Hugging Face | Merged 51GB model repo upload is complete or guarded and ready after human review/namespace permission. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh |  |
 | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply |  |
 | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
 | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md |  |

 # Kaiju Coder 7 Goal Completion Audit
+Generated: `2026-06-03T23:53:44Z`
 Overall: `complete`
 Summary: `18 passed / 0 blocked / 0 manual`
+This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
 ## Readiness Commands
 | OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py |  |
 | OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py |  |
 | OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
+| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md |  |
 | Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default |  |
 | Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md |  |
 | Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation |  |
 | Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums |  |
 | Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release |  |
+| Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh |  |
 | Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply |  |
 | Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md |  |
 | Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md |  |

HF_UPLOAD_EVIDENCE.md CHANGED

@@ -1,15 +1,15 @@
 # Kaiju Coder 7 Hugging Face Upload Evidence
-Generated: `2026-06-03T20:36:26Z`
 ## Uploaded Repos
 | Repo | Visibility | Evidence |
 |---|---|---|
-| `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `67bb48b8115b820cd8b01d1778d2610d9ce63692`; public visibility verified after 2026-06-03 paid API evidence refresh. |
-| `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
-| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
-| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `736af44add9321f74e8603cd739245fc0853d62c`; public metadata reports `private: false`. |
 These SHAs are a point-in-time release evidence snapshot. Uploading this
 evidence file itself creates another metadata commit, so use `hf models info`
@@ -71,14 +71,15 @@ Result:
 - `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
 - `hf repos settings ... --public` completed for all four repos.
 - `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
-  passed `17/17` checks after the public visibility switch and again after
-  the refreshed public helper upload.
 - The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
   successfully as public repos.
 - The downloaded OpenCode helper installer dry-run passed and included the
   loop guard.
 - Merged model metadata reports `private: false`, SHA
-  `736af44add9321f74e8603cd739245fc0853d62c`, and lists all `14`
   safetensors shards.
 The earlier private-storage limit blocked private file downloads after the

 # Kaiju Coder 7 Hugging Face Upload Evidence
+Generated: `2026-06-03T23:37:20Z`
 ## Uploaded Repos
 | Repo | Visibility | Evidence |
 |---|---|---|
+| `RMDWLLC/kaiju-coder-7-adapter` | public | Refreshed public helper/evidence package commit `943b6fc7e025bbacd8b94275eb4321f6b0ed69c7`; public visibility verified after 2026-06-03 speed and GGUF-candidate pass. |
+| `RMDWLLC/kaiju-coder-7-opencode` | public | Refreshed OpenCode helper commit `032872d88fd799515ac81158e011780e0d6059f6`; public visibility, installer dry-run, and exact-file smoke verified. |
+| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Current public helper commit `785f3d758da493e3c435d67ef12c3e1e4d62db1a`; includes runtime bitsandbytes recipe plus GGUF Q8_0 candidate note. |
+| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `00ba85985102a14838dbb8a5692d9a75ce9da15a`; public metadata reports `private: false`. |
 These SHAs are a point-in-time release evidence snapshot. Uploading this
 evidence file itself creates another metadata commit, so use `hf models info`
 - `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
 - `hf repos settings ... --public` completed for all four repos.
 - `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
+  passed `17/17` checks after the public visibility switch, after the refreshed
+  public helper upload, and again after adding stricter checks for the demo
+  runner and GGUF candidate package files.
 - The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
   successfully as public repos.
 - The downloaded OpenCode helper installer dry-run passed and included the
   loop guard.
 - Merged model metadata reports `private: false`, SHA
+  `00ba85985102a14838dbb8a5692d9a75ce9da15a`, and lists all `14`
   safetensors shards.
 The earlier private-storage limit blocked private file downloads after the

LOCAL_TEST_INSTRUCTIONS.md CHANGED

@@ -1,6 +1,6 @@
 # Kaiju Coder 7 Local Test Instructions
-Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the SGLang commands require Gojira B over Tailscale.
 ## Run The Local Release-Candidate Gate
@@ -24,26 +24,32 @@ KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.
 ## Start Kaiju Coder 7 Serving
-Use this for the current model-side candidate:
 ```bash
-KAIJU_QWEN36_MERGED_PORT=18083 \
-KAIJU_QWEN36_MERGED_SESSION=kaiju_qwen36_v18_merged_sglang \
-KAIJU_QWEN36_MERGED_CONTEXT=16384 \
-KAIJU_QWEN36_MERGED_MEM_FRACTION=0.85 \
-  ./scripts/start-qwen36-merged-sglang.sh
 ```
 Confirm readiness:
 ```bash
-curl http://100.109.109.14:18083/v1/models
 ```
 The high-context `32768` target has benchmark evidence in
-`release/SERVING_BENCHMARKS.md`, but the current restored Gojira-B endpoint is
-parked at `16384` for reliable local/OpenCode testing after the quantized-vLLM
-smoke work.
 ## Prepare Merged-Model Hugging Face Metadata
@@ -82,7 +88,7 @@ python3 scripts/run_kaiju_api_harness_smoke.py
 ```bash
 python3 evals/run_openai_compat_smoke.py \
-  --base-url http://100.109.109.14:18083/v1 \
   --model kaiju-coder-7 \
   --tasks evals/tasks/smoke.jsonl \
   --max-tasks 1 \
@@ -100,7 +106,7 @@ evals pass at acceptable latency:
 ```bash
 python3 evals/run_openai_compat_smoke.py \
-  --base-url http://100.109.109.14:18083/v1 \
   --model kaiju-coder-7 \
   --tasks evals/tasks/business-owner-v18-comparison.jsonl \
   --timeout 900 \

 # Kaiju Coder 7 Local Test Instructions
+Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
 ## Run The Local Release-Candidate Gate
 ## Start Kaiju Coder 7 Serving
+Use this for the fastest current model-side candidate:
 ```bash
+KAIJU_VLLM_CONTEXT=16384 \
+KAIJU_VLLM_QUANTIZATION=bitsandbytes \
+KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
+KAIJU_VLLM_GPU_UTIL=0.90 \
+  ./scripts/start-qwen36-merged-vllm.sh
 ```
 Confirm readiness:
 ```bash
+curl http://100.109.109.14:18084/v1/models
+```
+Then keep the Mac-side fast proxy pointed at that vLLM endpoint:
+```bash
+KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
+python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
 ```
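
The `curl` readiness check above can also be scripted. This is a sketch only: it assumes the endpoint returns the usual OpenAI-compatible `/v1/models` shape (`{"data": [{"id": ...}]}`), and `served_model_ids`, `fetch_models`, and `is_ready` are hypothetical names, not helpers from this repo.

```python
import json
import urllib.request


def served_model_ids(models_payload):
    """Extract model ids from an OpenAI-compatible /v1/models response body."""
    return [entry["id"] for entry in models_payload.get("data", []) if "id" in entry]


def fetch_models(base_url, timeout=5.0):
    """GET <base_url>/models and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_ready(base_url, expected_model="kaiju-coder-7", timeout=5.0):
    """Return True only if the endpoint answers and lists the expected model id."""
    try:
        payload = fetch_models(base_url, timeout)
    except (OSError, ValueError):
        # OSError covers connection/timeout errors; ValueError covers bad JSON.
        return False
    return expected_model in served_model_ids(payload)


if __name__ == "__main__":
    print(is_ready("http://100.109.109.14:18084/v1"))
```

Pointing `is_ready` at the proxy URL instead of the upstream URL would check the Mac-side hop, assuming the proxy forwards `/v1/models`, which the source does not state.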
 The high-context `32768` target has benchmark evidence in
+`release/SERVING_BENCHMARKS.md`, but the current speed/default path is 16k
+runtime-quantized vLLM plus the local fast proxy.
 ## Prepare Merged-Model Hugging Face Metadata
 ```bash
 python3 evals/run_openai_compat_smoke.py \
+  --base-url http://100.109.109.14:18084/v1 \
   --model kaiju-coder-7 \
   --tasks evals/tasks/smoke.jsonl \
   --max-tasks 1 \
 ```bash
 python3 evals/run_openai_compat_smoke.py \
+  --base-url http://100.109.109.14:18084/v1 \
   --model kaiju-coder-7 \
   --tasks evals/tasks/business-owner-v18-comparison.jsonl \
   --timeout 900 \

MERGED_MODEL_RELEASE_MANIFEST.json CHANGED

@@ -6,6 +6,6 @@
   "notes": [
     "Local metadata sync only; no Hugging Face upload performed.",
     "Qwen attribution belongs in README/provenance/license notes, not the product model id.",
-    "Public paid API launch remains blocked until live launch preflight and human review pass."
   ]
 }

   "notes": [
     "Local metadata sync only; no Hugging Face upload performed.",
     "Qwen attribution belongs in README/provenance/license notes, not the product model id.",
+    "Public paid API preflight evidence has passed; real customer charging still requires the deliberate Stripe live-mode switch."
   ]
 }

PAID_API_READINESS.md CHANGED

@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
 ```
 `check_kaiju_public_release_readiness.py --mode local` is the consolidated
-public-testing readiness command. It can pass while public upload and paid API
-launch remain manual blockers. `--mode hf-release` checks the downloadable
-model/helper release and requires sanitized Hugging Face namespace permission
-evidence plus human review while keeping paid API launch manual. `--mode public`
-must remain red until Hugging Face write permissions, live Cloudflare resources,
-Stripe staging evidence, rollback proof, and human review are complete.
 `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
 the current local/public readiness summaries, launch blockers, changed files,
@@ -167,8 +167,8 @@ lines.
 `check_kaiju_goal_completion.py --write` writes
 `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
-remain red while Hugging Face upload, human review, or live paid API launch
-evidence are missing.
 `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
 direct API smoke evidence, goal audit, final report, HF staging, local bundle,

 ```
 `check_kaiju_public_release_readiness.py --mode local` is the consolidated
+public-testing readiness command. `--mode hf-release` checks the downloadable
+model/helper release, public Hugging Face evidence, and human review while
+keeping live paid charging separate from model publication. `--mode public`
+now passes after public HF verification, live Cloudflare resource evidence,
+Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
+and human review are complete.
 `generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
 the current local/public readiness summaries, launch blockers, changed files,
 `check_kaiju_goal_completion.py --write` writes
 `release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
+remain green only while the live runtime, public HF evidence, human review, and
+paid API launch evidence continue to pass.
 `refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
 direct API smoke evidence, goal audit, final report, HF staging, local bundle,

PUBLIC_TESTING_QUICKSTART.md CHANGED

@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
-python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
 ```
 Then run OpenCode inside the project you want to edit:
@@ -65,23 +65,31 @@ the server to expose:
 ```text
 model id: kaiju-coder-7
-base URL: http://127.0.0.1:18083/v1
 context: 16384
 ```
 Then install the OpenCode helper with:
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
-python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
 ```
 ### Path 3: Runtime-Quantized Local Candidate
 Use this only if you are comfortable with advanced serving setups. The current
-working quantized option is a runtime bitsandbytes recipe, not a separate
-persisted quantized weights repo.
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
@@ -115,10 +123,14 @@ Expected result:
 - Public model id: `kaiju-coder-7`
 - OpenCode context: `16384`
 - Output cap for public testing: `2500`
 - Current reliable product path: model plus deterministic business-owner
-  harness plus verifier
-- Raw multi-file OpenCode generation: still too slow for broad paid API claims
-- Paid API: not public until launch preflight passes
 ## What Not To Claim Yet
@@ -134,16 +146,23 @@ Do claim:
 - Kaiju Coder 7 has a working local/OpenCode release candidate
 - the current tested OpenCode default is 16k context
 - the helper package includes a lean agent and compaction loop guard
 - the paid API scaffold has tests and a launch preflight, but is not yet public
 - the packaged public smoke verifies a fresh OpenCode one-file write before
   public claims are refreshed
-## Current Blockers Before Public Release
-- Hugging Face repo creation still requires a write-capable token or namespace.
-- Full merged model upload has not completed; the merged folder must first have
-  the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
-- Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
-  secret verification, Stripe webhook staging evidence, staging traffic, latency
-  evidence, and rollback proof.
-- Human review is still required before public upload.

 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
+python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
 ```
 Then run OpenCode inside the project you want to edit:
 ```text
 model id: kaiju-coder-7
+base URL: http://127.0.0.1:18084/v1
 context: 16384
 ```
+For the fastest OpenCode behavior, run the bundled fast proxy in a separate
+terminal and point OpenCode at the proxy:
+```bash
+KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
+python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
+```
 Then install the OpenCode helper with:
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
+python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
 ```
 ### Path 3: Runtime-Quantized Local Candidate
 Use this only if you are comfortable with advanced serving setups. The current
+working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
+has been converted, but it is still a candidate until runtime smoke passes.
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
 - Public model id: `kaiju-coder-7`
 - OpenCode context: `16384`
 - Output cap for public testing: `2500`
+- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
 - Current reliable product path: model plus deterministic business-owner
+  harness/router plus verifier
+- Raw multi-file OpenCode generation: still too slow for broad paid claims;
+  useful for testing, but paid API claims should favor harnessed product
+  workflows until broader latency gates pass
+- Paid API: not public until launch preflight passes and the Stripe live-mode
+  switch is deliberately completed
 ## What Not To Claim Yet
 - Kaiju Coder 7 has a working local/OpenCode release candidate
 - the current tested OpenCode default is 16k context
 - the helper package includes a lean agent and compaction loop guard
+- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
+  non-thinking generation
 - the paid API scaffold has tests and a launch preflight, but is not yet public
 - the packaged public smoke verifies a fresh OpenCode one-file write before
   public claims are refreshed
+- a GGUF Q8_0 candidate exists, but it does not count as evidence for a public
+  quantized-weights release until a runtime smoke passes
+## Remaining Caveats Before Broader Claims
+- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
+- The GGUF Q8_0 candidate still needs a runtime smoke before public
+  quantized-weights upload.
+- Raw multi-file OpenCode generation is still not the public speed story; use
+  the deterministic router/harness for websites and business-owner packs.
+- Public paid API launch has approval and preflight evidence, but real customer
+  charging still needs a deliberate Stripe live-mode switch and controlled live
+  payment verification.
+- Do not claim 32k context as the live default until it is freshly restarted
+  and re-confirmed.

README.md CHANGED

@@ -108,12 +108,18 @@ Current local harness evidence:
   - Adapter-name-only serving can be base-equivalent.
   - Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
   - Dynamic LoRA is not the release serving path for this checkpoint.
-- Kaiju Coder 7 serving config: SGLang over Tailscale at `http://100.109.109.14:18083/v1`, model `kaiju-coder-7`, current parked Gojira-B/OpenCode context `16384`, tested high-context target `32768`, memory fraction `0.90`.
 - v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
 - v1.8 merged focused eval:
   - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
   - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
-- Broader base-Qwen, GLM, and raw website comparisons are still pending.
 Sellable-candidate gate:

   - Adapter-name-only serving can be base-equivalent.
   - Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
   - Dynamic LoRA is not the release serving path for this checkpoint.
+- Kaiju Coder 7 current serving config: vLLM bitsandbytes runtime
+  quantization on Gojira B at `http://100.109.109.14:18084/v1`, exposed on
+  this Mac through `http://127.0.0.1:18181/v1`, model `kaiju-coder-7`,
+  current OpenCode context `16384`. SGLang has historical 32k benchmark
+  evidence, but 32k should be freshly restarted and re-confirmed before being
+  called the live default.
 - v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
 - v1.8 merged focused eval:
   - Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
   - Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
+- Broader base-Qwen, GLM, and raw website comparisons are still pending before
+  any superiority claims.
 Sellable-candidate gate:

SERVING_BENCHMARKS.md CHANGED

@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
 ## Current Live Runtime
 - Host: Gojira-B over Tailscale
-- Base URL: `http://100.109.109.14:18083/v1`
-- Serving stack: SGLang merged full model
-- Current verified post-quantization restored context: `16384`
 - Tested high-context target: `32768`
-- Current container: `qwen36-merged-sglang-18083`
-- Current caveat: direct raw generation is slow for multi-file OpenCode work.
 ## Benchmark Command
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
 | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
 | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
-Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it is not clearly
-faster than SGLang on the current smoke prompts. Keep SGLang as the recommended
-runtime because it has stable OpenCode smoke evidence, a simpler launch path,
-and historical 32k proof. Keep the live/default OpenCode profile at 16k until
-32k is freshly re-confirmed. Keep the vLLM scripts for future nightly-image or
-quantized-weight testing.
 ## vLLM bitsandbytes Runtime-Quantized Candidate
@@ -323,6 +325,7 @@ Runs:
 - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
 - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
 - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
 | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
 | --- | ---: | --- | --- | ---: | ---: | ---: |
@@ -332,6 +335,8 @@ Runs:
 | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
 | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
 | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
 Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
 load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
@@ -350,9 +355,104 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
 Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
 exactly `Kaiju Coder 7 quantized runtime ok`.
-Recommendation: keep SGLang as the default public/OpenCode runtime and keep the
-currently installed OpenCode profile at 16k unless the 32k target has just been
-restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
-quantized local candidate for advanced GPU users and future paid API speed
-experiments. It now has direct identity/code/business-doc evidence plus an
-OpenCode one-file smoke, but it is not a persisted quantized-weights repo.

 ## Current Live Runtime
 - Host: Gojira-B over Tailscale
+- Local OpenCode base URL: `http://127.0.0.1:18181/v1`
+- Upstream base URL: `http://100.109.109.14:18084/v1`
+- Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
+  proxy
+- Current verified context: `16384`
 - Tested high-context target: `32768`
+- Current container: `qwen36-merged-vllm-18084`
+- Current caveat: direct raw generation is still slow for multi-file OpenCode
+  work; use the deterministic router/harness for public business-owner demos.
 ## Benchmark Command
 | vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
 | vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
+Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
+clearly faster than SGLang on these smoke prompts. This is historical fallback
+evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
+path. Keep the live/default OpenCode profile at 16k until 32k is freshly
+re-confirmed.
 ## vLLM bitsandbytes Runtime-Quantized Candidate
 - `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
 - `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
 - `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
+- `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
 | Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
 | --- | ---: | --- | --- | ---: | ---: | ---: |
 | vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
 | vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
 | vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
+| vLLM bitsandbytes | 16384 | code_patch | True | 24.97 | 997 | 39.924 |
+| vLLM bitsandbytes | 16384 | business_doc | True | 34.46 | 1615 | 46.874 |
 Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
 load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
 Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
 exactly `Kaiju Coder 7 quantized runtime ok`.
+Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
+current public/OpenCode speed path and keep the installed OpenCode profile at
+16k unless the 32k target has just been restarted and re-confirmed. Treat
+SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
+direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
+it is not a persisted quantized-weights repo.
+## 2026-06-03 Fast Proxy And Website Harness Speed Pass
+The current speed profile keeps runtime-quantized vLLM active on Gojira-B port
+`18084` and routes OpenCode through the local fast proxy at
+`http://127.0.0.1:18181/v1`. The proxy preserves OpenCode tool-call streaming
+while forcing `thinking=false`, model id `kaiju-coder-7`, and bounded output
+budgets.
+Active endpoint checks:
+- Local fast proxy health: `http://127.0.0.1:18181/health`
+- Upstream vLLM models: `http://100.109.109.14:18084/v1/models`
+- Upstream reports `kaiju-coder-7` with `max_model_len=16384`
+Fresh direct vLLM benchmark:
+- Run: `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
+- Identity: `19.48s`
+- Code patch: `24.97s`, `997` chars
+- Business doc: `34.46s`, `1,615` chars
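
The `Chars/s` column in the benchmark tables is consistent with visible characters divided by wall-clock seconds. A small sketch of that arithmetic for the two fresh runs that report a character count; `chars_per_second` is a hypothetical helper name, and recomputed values can differ in the last digits from the table if the table was derived from unrounded timings.

```python
def chars_per_second(chars, seconds):
    """Throughput as visible characters per wall-clock second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return chars / seconds


if __name__ == "__main__":
    # (prompt, chars, seconds) taken from the fresh direct vLLM benchmark above.
    fresh_runs = [
        ("code_patch", 997, 24.97),
        ("business_doc", 1615, 34.46),
    ]
    for prompt, chars, seconds in fresh_runs:
        print(f"{prompt}: {chars_per_second(chars, seconds):.3f} chars/s")
```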
+Fresh OpenCode smoke through the local fast proxy:
+- Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
+- Result: passed in about `23.5s`, wrote the exact requested file.
+- Packaged public verifier after exact-content agent rule:
+  `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
+  passed through `http://127.0.0.1:18181/v1`.
+Website harness/router speed pass:
+- Direct website harness command: `python3 scripts/run_kaiju_website_harness.py --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
+- Direct website harness result: `runs/harness/website-speed-pass/avery-stone-vllm.html`, `9,257` chars, `7.31s`
+- Router command: `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
+- Router artifact: `runs/router-speed-pass/20260603T223731Z-website-build-a-premium-one-page-website-for-avery-stone-construction-a-reside/index.html`
+- Router result: passed in `7.20s`; checks covered complete HTML, required sections, external images, responsive CSS, no lorem ipsum, and manifest write.
+- Router through the installed local proxy: `runs/router-speed-pass/20260603T224328Z-website-build-a-premium-one-page-website-for-bennett-family-dental-in-charlott/index.html`
+- Proxy router result: passed in `4.67s`; preserved explicit CTA `Schedule a Visit`, inferred `dental`, and passed the same complete-HTML/static checks.
+Updated recommendation: for speed-sensitive OpenCode and paid workflow testing,
+use vLLM bitsandbytes plus the local fast proxy as the active default. Keep
+SGLang as fallback/historical evidence, not the fastest current path. For
+websites and business-owner packs, prefer the deterministic router/harness path
+over raw long-form HTML generation.
+Public business-owner demo pack through the active fast proxy:
+```bash
+python3 scripts/run_kaiju_public_demo_pack.py \
+  --openai-base-url http://127.0.0.1:18181/v1 \
+  --model kaiju-coder-7 \
+  --planner-timeout 90
+```
+Run: `runs/public-demo-pack/20260603T235009Z/summary.md`
+| Task | Result | Seconds | Changed files |
+| --- | --- | ---: | ---: |
+| Website | Passed | 4.73 | 2 |
+| Owner AI company pack | Passed | 29.85 | 19 |
+| Stripe safety plan | Passed | 9.99 | 2 |
+| CSV parser artifact | Passed | 19.97 | 2 |
+Total: `4/4` passed in `64.529s`.
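As a consistency check on the table, the per-task seconds can be summed and compared with the reported total. The sketch below assumes each row was rounded to two decimals independently, which would explain a small gap between the summed rows and `64.529s`.

```python
# (task, seconds, changed files) copied from the table above.
rows = [
    ("Website", 4.73, 2),
    ("Owner AI company pack", 29.85, 19),
    ("Stripe safety plan", 9.99, 2),
    ("CSV parser artifact", 19.97, 2),
]
reported_total = 64.529

summed_seconds = round(sum(seconds for _, seconds, _ in rows), 2)
total_files = sum(files for _, _, files in rows)

# Each rounded row can be off by at most 0.005s, so four rows allow 0.02s.
max_rounding_gap = 0.005 * len(rows)
consistent = abs(summed_seconds - reported_total) <= max_rounding_gap

print(summed_seconds, total_files, consistent)
```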
+## Persisted GGUF Q8_0 Candidate
+The dedicated persisted-quantization pass found that standard AWQ/GPTQ installs
+do not work cleanly against the Qwen3.5-capable serving stack as of this pass,
+while `llama.cpp` conversion support includes `Qwen3_5ForConditionalGeneration`.
+Commands:
+```bash
+./scripts/probe-gojira-b-persisted-quantization.sh
+./scripts/run-gojira-b-kaiju-gguf-convert.sh
+```
+Result:
+- Artifact:
+  `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
+- Size: `27G`
+- SHA256:
+  `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
+- Conversion log:
+  `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
+- Runtime status: candidate only; direct GGUF runtime smoke still required
+  before publishing quantized weights.
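Before any runtime smoke, the artifact can be checked against the SHA256 listed above. A minimal sketch using only the standard library follows. It reads the file in chunks because the artifact is `27G`. The helper name `sha256_of_file` is hypothetical, and the demo hashes a small temporary file rather than the real GGUF.

```python
import hashlib
import tempfile
from pathlib import Path

EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large artifacts never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file; point `path` at the GGUF artifact and compare
# the result with EXPECTED_SHA256 to verify the real file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(b"abc")
    demo_digest = sha256_of_file(demo_path, chunk_size=2)

print(demo_digest)
```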
+Interpretation: the next real speed improvement for broad public users is not
+another prompt tweak; it is a smoke-tested GGUF or GPU-persisted quantized artifact.
+The fastest currently verified Kaiju Coder 7 path remains vLLM bitsandbytes
+plus the local fast proxy and deterministic website/business harnesses.