Text Generation
PEFT
Safetensors
English
kaiju-coder-7
lora
coding
local-ai
business
opencode
conversational
Instructions to use RMDWLLC/kaiju-coder-7-adapter with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use RMDWLLC/kaiju-coder-7-adapter with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("/workspace/kaiju-coder/models/Qwen3.6-27B") model = PeftModel.from_pretrained(base_model, "RMDWLLC/kaiju-coder-7-adapter") - Notebooks
- Google Colab
- Kaggle
Upload Kaiju Coder 7 adapter release package
Browse files- COMPLETION_AUDIT.md +15 -19
- EVAL_SCOREBOARD.md +8 -8
- FINAL_RELEASE_REPORT.md +14 -18
- GOAL_COMPLETION_AUDIT.md +3 -3
- PAID_API_READINESS.md +8 -8
- PUBLIC_TESTING_QUICKSTART.md +11 -9
- README.md +12 -6
- SERVING_BENCHMARKS.md +26 -24
COMPLETION_AUDIT.md
CHANGED
|
@@ -7,9 +7,9 @@ conservative: the product-path harness is release-candidate ready for local
|
|
| 7 |
testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
|
| 8 |
artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
|
| 9 |
as release evidence because the corrected LoRA selector crashes on this
|
| 10 |
-
adapter.
|
| 11 |
-
|
| 12 |
-
|
| 13 |
|
| 14 |
## Requirement Status
|
| 15 |
|
|
@@ -29,15 +29,15 @@ externally.
|
|
| 29 |
| Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
|
| 30 |
| Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
|
| 31 |
| v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
|
| 32 |
-
| Local inference against v1.8 merged checkpoint |
|
| 33 |
| v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
|
| 34 |
-
| OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`.
|
| 35 |
| Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
|
| 36 |
-
| Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold`
|
| 37 |
| Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
|
| 38 |
-
| Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`;
|
| 39 |
-
| Hugging Face merged model upload readiness | `
|
| 40 |
-
| Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local`
|
| 41 |
|
| 42 |
## Commands With Current Passing Evidence
|
| 43 |
|
|
@@ -71,14 +71,10 @@ Kaiju Coder 7 merged model + deterministic business-owner harness + verifier + s
|
|
| 71 |
|
| 72 |
That must be described honestly until external release review confirms:
|
| 73 |
|
| 74 |
-
-
|
| 75 |
- raw website latency/SLA positioning or explicit harness-first website positioning
|
| 76 |
-
- base Qwen and GLM comparison results
|
| 77 |
-
-
|
| 78 |
-
|
| 79 |
-
-
|
| 80 |
-
|
| 81 |
-
- final Hugging Face upload metadata and public/private release decision
|
| 82 |
-
- live Cloudflare D1/KV/R2 resources, Stripe products/webhook endpoint,
|
| 83 |
-
deployment secrets, staging end-to-end paid API requests, rollback, and
|
| 84 |
-
support boundaries if exposed commercially
|
|
|
|
| 7 |
testing, the fresh v1.8 Qwen 3.6 LoRA adapter exists, and a merged full-model
|
| 8 |
artifact serves locally on Gojira-B. Dynamic SGLang LoRA serving is not counted
|
| 9 |
as release evidence because the corrected LoRA selector crashes on this
|
| 10 |
+
adapter. The public Hugging Face repos are uploaded and public; the remaining
|
| 11 |
+
release caveats are raw-agent latency, GGUF runtime smoke, 32k live-default
|
| 12 |
+
proof, and real Stripe live-mode charging.
|
| 13 |
|
| 14 |
## Requirement Status
|
| 15 |
|
|
|
|
| 29 |
| Local inference against new v1.7 checkpoint | SGLang served `kaiju_v17_business_owner` over Tailscale at `http://100.109.109.14:18083/v1` with `context=4096` and `mem_fraction=0.90`; website and proposal smoke tasks returned non-empty outputs. | Passed |
|
| 30 |
| Stronger Qwen 3.6 v1.8 fine-tune | Gojira B was cleared of ComfyUI/SGLang/Ollama GPU conflicts; v1.8 finished with `metrics.json`, train runtime `11666.7564s`, train loss `0.9281658741335074`, and an adapter directory. | Passed |
|
| 31 |
| v1.8 adapter merged into full model | `scripts/run-gojira-b-qwen36-lora-merge.sh` merged `/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter` into `/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`; remote artifact is `51G` with `14` safetensor shards and preserved base config/processor sidecars. | Passed |
|
| 32 |
+
| Local inference against v1.8 merged checkpoint | Current fast path serves `kaiju-coder-7` through vLLM bitsandbytes on Gojira-B at `http://100.109.109.14:18084/v1`, exposed locally through `http://127.0.0.1:18181/v1`; current live endpoint reports max model len `16384`. Prior SGLang benchmarks proved 12k/16k/24k/32k startup and smoke evidence, with 32k treated as the high-context target rather than the currently parked runtime. | Passed |
|
| 33 |
| v1.8 merged business-owner eval | Probe returned `1,155` visible chars in `60.17s`; proposal rerun scored `1/1`, `4.0/4.0`, `4,014` chars in `212.72s`; Jah credits backend scored `4.0/4.0`, `9,718` chars in `566.36s`. | Passed with latency caveat |
|
| 34 |
+
| OpenCode local run path | Local OpenCode provider/agent is installed for `kaiju/kaiju-coder-7` with 16k context and the scoped no-autocontinue plugin at `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`. Packaged public verifier `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` passed `4/4` in `runs/public-opencode-smoke/20260603T235002Z/summary.md`, including wrong-directory leakage checks; loop-guard smoke wrote `loopguard.txt` with exactly `Kaiju Coder 7 loop guard installed`; latest harnessed customer-readiness pack `runs/opencode-customer-readiness/20260603T185835Z/summary.md` passed `4/4` with `28/28` required files, including release provenance and safety review. | Passed for harnessed/product path |
|
| 35 |
| Runtime-quantized local path | vLLM bitsandbytes runtime quantization passed identity/code/business-doc smokes at 8k/16k, reported about `17.8 GiB` model memory, and passed OpenCode one-file smoke with exact content `Kaiju Coder 7 quantized runtime ok`. Persisted quantized weights are still pending. | Runtime recipe passed; persisted weights pending |
|
| 36 |
+
| Paid API gateway scaffold | `cd gateway/cloudflare-worker && npm run check` passes `16/16` Worker tests covering bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, streaming/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, and account-scoped artifact download. `python3 scripts/check_paid_api_readiness.py --mode scaffold` passes `17` checks. `python3 scripts/check_paid_api_readiness.py --mode launch` passes `27/27` checks after live Cloudflare bindings, Worker-to-Gojira proof, Stripe test-mode webhook evidence, staging latency, and rollback proof. Real customer charging still requires a deliberate Stripe live-mode switch and controlled live payment verification. | Scaffold and launch preflight passed; live-mode charging pending |
|
| 37 |
| Dynamic SGLang LoRA selector | Adapter-name-only serving can be base-equivalent; corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`. | Not release path |
|
| 38 |
+
| Hugging Face helper repo upload readiness | Adapter, OpenCode helper, and runtime-quantized recipe staging folders build under `/tmp/kaiju-coder-7-hf-staging`; public repos `RMDWLLC/kaiju-coder-7-adapter`, `RMDWLLC/kaiju-coder-7-opencode`, and `RMDWLLC/kaiju-coder-7-quantized-runtime` are uploaded and public. `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` verifies public downloads and helper package content. | Uploaded and public |
|
| 39 |
+
| Hugging Face merged model upload readiness | `RMDWLLC/kaiju-coder-7` is uploaded and public with the merged `53.8G` model package and `14` safetensors shards recorded in `release/HF_UPLOAD_EVIDENCE.md`. Public downloads are verified; the previous private-storage blocker was resolved by switching the repos public. | Uploaded and public |
|
| 40 |
+
| Consolidated release readiness check | `python3 scripts/check_kaiju_public_release_readiness.py --mode local`, `--mode hf-release`, and `--mode public` pass against the current fast proxy and public HF evidence. The checker validates staged files, public naming hygiene, secret-looking raw values, checksums, final report, HF bundle checksum, uploaded evidence, and human signoff. | Local, HF, and public modes passed |
|
| 41 |
|
| 42 |
## Commands With Current Passing Evidence
|
| 43 |
|
|
|
|
| 71 |
|
| 72 |
That must be described honestly until external release review confirms:
|
| 73 |
|
| 74 |
+
- GGUF Q8_0 runtime smoke before public quantized-weight claims
|
| 75 |
- raw website latency/SLA positioning or explicit harness-first website positioning
|
| 76 |
+
- broader base Qwen and GLM comparison results before superiority claims
|
| 77 |
+
- 32k context freshly restarted and re-confirmed before making it the live
|
| 78 |
+
default
|
| 79 |
+
- Stripe live-mode products/webhook secret and a controlled live payment before
|
| 80 |
+
selling real paid API access
|
|
|
|
|
|
|
|
|
|
|
|
EVAL_SCOREBOARD.md
CHANGED
|
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
-
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
@@ -46,10 +46,10 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 46 |
| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
|
| 47 |
| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
|
| 48 |
| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
|
| 49 |
-
| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `
|
| 50 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 51 |
-
| Hugging Face
|
| 52 |
-
| Hugging Face merged-model
|
| 53 |
| v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
|
| 54 |
| Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
|
| 55 |
| Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
|
|
@@ -64,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 64 |
| v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
|
| 65 |
| Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
|
| 66 |
| Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
|
| 67 |
-
| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
|
| 68 |
| Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
|
| 69 |
-
| Human review | Richard reviews artifacts for usefulness, privacy, and sellability |
|
| 70 |
-
| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged
|
| 71 |
|
| 72 |
## Decision Rule
|
| 73 |
|
| 74 |
-
The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should
|
|
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
+
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
|
|
| 46 |
| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
|
| 47 |
| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
|
| 48 |
| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
|
| 49 |
+
| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
|
| 50 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 51 |
+
| Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
|
| 52 |
+
| Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
|
| 53 |
| v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
|
| 54 |
| Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
|
| 55 |
| Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
|
|
|
|
| 64 |
| v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
|
| 65 |
| Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
|
| 66 |
| Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
|
| 67 |
+
| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
|
| 68 |
| Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
|
| 69 |
+
| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
|
| 70 |
+
| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
|
| 71 |
|
| 72 |
## Decision Rule
|
| 73 |
|
| 74 |
+
The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
|
FINAL_RELEASE_REPORT.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
-
Generated: `2026-06-03T23:
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
@@ -24,11 +24,11 @@ Stripe live-mode switch and controlled live payment verification.
|
|
| 24 |
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
-
| Status | `
|
| 28 |
-
| Base URL | `http://
|
| 29 |
-
| Model id | `
|
| 30 |
-
| Max model length | `
|
| 31 |
-
| Detail | `
|
| 32 |
|
| 33 |
Recommended default today: `16k` context through `kaiju-coder-7`. Higher
|
| 34 |
context has benchmark evidence, but the currently parked default is 16k for
|
|
@@ -38,9 +38,9 @@ stability and speed.
|
|
| 38 |
|
| 39 |
| Area | Result |
|
| 40 |
|---|---|
|
| 41 |
-
| Local public-testing readiness | `ready=
|
| 42 |
-
| Hugging Face release readiness | `ready=
|
| 43 |
-
| Public launch readiness | `ready=
|
| 44 |
| Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
|
| 45 |
| Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
|
| 46 |
|
|
@@ -59,15 +59,11 @@ stability and speed.
|
|
| 59 |
|
| 60 |
## Hugging Face Release Blockers
|
| 61 |
|
| 62 |
-
|
| 63 |
-
|---|---|---|
|
| 64 |
-
| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
|
| 65 |
|
| 66 |
## Public Launch Blockers
|
| 67 |
|
| 68 |
-
|
| 69 |
-
|---|---|---|
|
| 70 |
-
| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
|
| 71 |
|
| 72 |
## Paid API Launch Blockers
|
| 73 |
|
|
@@ -276,9 +272,9 @@ human release review explicitly approves public paid API launch.
|
|
| 276 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 277 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 278 |
| git status | `git status --short` | 0 |
|
| 279 |
-
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://
|
| 280 |
-
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://
|
| 281 |
-
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://
|
| 282 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 283 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
|
| 284 |
|
|
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:53:31Z`
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
|
|
| 24 |
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
+
| Status | `pass` |
|
| 28 |
+
| Base URL | `http://127.0.0.1:18181/v1` |
|
| 29 |
+
| Model id | `kaiju-coder-7` |
|
| 30 |
+
| Max model length | `16384` |
|
| 31 |
+
| Detail | `` |
|
| 32 |
|
| 33 |
Recommended default today: `16k` context through `kaiju-coder-7`. Higher
|
| 34 |
context has benchmark evidence, but the currently parked default is 16k for
|
|
|
|
| 38 |
|
| 39 |
| Area | Result |
|
| 40 |
|---|---|
|
| 41 |
+
| Local public-testing readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
|
| 42 |
+
| Hugging Face release readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
|
| 43 |
+
| Public launch readiness | `ready=True pass=24 fail=0 manual=0 rc=0` |
|
| 44 |
| Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
|
| 45 |
| Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
|
| 46 |
|
|
|
|
| 59 |
|
| 60 |
## Hugging Face Release Blockers
|
| 61 |
|
| 62 |
+
- No matching checks.
|
|
|
|
|
|
|
| 63 |
|
| 64 |
## Public Launch Blockers
|
| 65 |
|
| 66 |
+
- No matching checks.
|
|
|
|
|
|
|
| 67 |
|
| 68 |
## Paid API Launch Blockers
|
| 69 |
|
|
|
|
| 272 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 273 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 274 |
| git status | `git status --short` | 0 |
|
| 275 |
+
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 276 |
+
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 277 |
+
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 278 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 279 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
|
| 280 |
|
GOAL_COMPLETION_AUDIT.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
-
Generated: `2026-06-03T23:
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
| 7 |
|
| 8 |
-
This audit maps the active Kaiju Coder 7 objective to current evidence
|
| 9 |
|
| 10 |
## Readiness Commands
|
| 11 |
|
|
@@ -34,7 +34,7 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
|
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
| 35 |
| Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
|
| 36 |
| Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
|
| 37 |
-
| Hugging Face | Merged 51GB model repo upload is complete
|
| 38 |
| Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
|
| 39 |
| Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 40 |
| Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
|
|
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:53:44Z`
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
| 7 |
|
| 8 |
+
This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
|
| 9 |
|
| 10 |
## Readiness Commands
|
| 11 |
|
|
|
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
| 35 |
| Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
|
| 36 |
| Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
|
| 37 |
+
| Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
|
| 38 |
| Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
|
| 39 |
| Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 40 |
| Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
|
PAID_API_READINESS.md
CHANGED
|
@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
|
|
| 152 |
```
|
| 153 |
|
| 154 |
`check_kaiju_public_release_readiness.py --mode local` is the consolidated
|
| 155 |
-
public-testing readiness command.
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
|
| 162 |
`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
|
| 163 |
the current local/public readiness summaries, launch blockers, changed files,
|
|
@@ -167,8 +167,8 @@ lines.
|
|
| 167 |
|
| 168 |
`check_kaiju_goal_completion.py --write` writes
|
| 169 |
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
|
| 170 |
-
remain
|
| 171 |
-
evidence
|
| 172 |
|
| 173 |
`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
|
| 174 |
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
|
|
|
|
| 152 |
```
|
| 153 |
|
| 154 |
`check_kaiju_public_release_readiness.py --mode local` is the consolidated
|
| 155 |
+
public-testing readiness command. `--mode hf-release` checks the downloadable
|
| 156 |
+
model/helper release, public Hugging Face evidence, and human review while
|
| 157 |
+
keeping live paid charging separate from model publication. `--mode public`
|
| 158 |
+
now passes after public HF verification, live Cloudflare resource evidence,
|
| 159 |
+
Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
|
| 160 |
+
and human review are complete.
|
| 161 |
|
| 162 |
`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
|
| 163 |
the current local/public readiness summaries, launch blockers, changed files,
|
|
|
|
| 167 |
|
| 168 |
`check_kaiju_goal_completion.py --write` writes
|
| 169 |
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
|
| 170 |
+
remain green only while the live runtime, public HF evidence, human review, and
|
| 171 |
+
paid API launch evidence continue to pass.
|
| 172 |
|
| 173 |
`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
|
| 174 |
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
|
PUBLIC_TESTING_QUICKSTART.md
CHANGED
|
@@ -129,7 +129,8 @@ Expected result:
|
|
| 129 |
- Raw multi-file OpenCode generation: still too slow for broad paid claims;
|
| 130 |
useful for testing, but paid API claims should favor harnessed product
|
| 131 |
workflows until broader latency gates pass
|
| 132 |
-
- Paid API: not public until launch preflight passes
|
|
|
|
| 133 |
|
| 134 |
## What Not To Claim Yet
|
| 135 |
|
|
@@ -153,14 +154,15 @@ Do claim:
|
|
| 153 |
- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
|
| 154 |
evidence until runtime smoke passes
|
| 155 |
|
| 156 |
-
##
|
| 157 |
|
| 158 |
-
- Hugging Face
|
| 159 |
-
- Full merged model upload has not completed; the merged folder must first have
|
| 160 |
-
the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
|
| 161 |
- The GGUF Q8_0 candidate still needs a runtime smoke before public
|
| 162 |
quantized-weights upload.
|
| 163 |
-
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
|
|
|
|
|
|
|
|
|
|
|
| 129 |
- Raw multi-file OpenCode generation: still too slow for broad paid claims;
|
| 130 |
useful for testing, but paid API claims should favor harnessed product
|
| 131 |
workflows until broader latency gates pass
|
| 132 |
+
- Paid API: not public until launch preflight passes and the Stripe live-mode
|
| 133 |
+
switch is deliberately completed
|
| 134 |
|
| 135 |
## What Not To Claim Yet
|
| 136 |
|
|
|
|
| 154 |
- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
|
| 155 |
evidence until runtime smoke passes
|
| 156 |
|
| 157 |
+
## Remaining Caveats Before Broader Claims
|
| 158 |
|
| 159 |
+
- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
|
|
|
|
|
|
|
| 160 |
- The GGUF Q8_0 candidate still needs a runtime smoke before public
|
| 161 |
quantized-weights upload.
|
| 162 |
+
- Raw multi-file OpenCode generation is still not the public speed story; use
|
| 163 |
+
the deterministic router/harness for websites and business-owner packs.
|
| 164 |
+
- Public paid API launch has approval and preflight evidence, but real customer
|
| 165 |
+
charging still needs a deliberate Stripe live-mode switch and controlled live
|
| 166 |
+
payment verification.
|
| 167 |
+
- Do not claim 32k context as the live default until it is freshly restarted
|
| 168 |
+
and re-confirmed.
|
README.md
CHANGED
|
@@ -95,17 +95,21 @@ Local product-path evidence:
|
|
| 95 |
|
| 96 |
Merged serving evidence:
|
| 97 |
|
| 98 |
-
-
|
|
|
|
| 99 |
- Served model: `kaiju-coder-7`
|
| 100 |
-
- Tested context: `
|
|
|
|
|
|
|
| 101 |
- Probe: `1,155` visible chars in `60.17s`.
|
| 102 |
- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
|
| 103 |
- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
|
| 104 |
- OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
|
| 105 |
- vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
|
| 106 |
-
`--language-model-only`
|
| 107 |
-
- Runtime-quantized vLLM bitsandbytes: passed at `8192`
|
| 108 |
-
patch completed in `11.3s`, and logs reported about
|
|
|
|
| 109 |
|
| 110 |
Known comparison caveat:
|
| 111 |
|
|
@@ -117,5 +121,7 @@ Known comparison caveat:
|
|
| 117 |
- Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
|
| 118 |
- The deterministic harness remains the practical paid website workflow.
|
| 119 |
- The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
|
| 120 |
-
-
|
|
|
|
|
|
|
| 121 |
- Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.
|
|
|
|
| 95 |
|
| 96 |
Merged serving evidence:
|
| 97 |
|
| 98 |
+
- Current endpoint: `http://127.0.0.1:18181/v1`, forwarding to vLLM
|
| 99 |
+
bitsandbytes on Gojira B at `http://100.109.109.14:18084/v1`
|
| 100 |
- Served model: `kaiju-coder-7`
|
| 101 |
+
- Tested context: `16384` for the current OpenCode fast path. Historical
|
| 102 |
+
SGLang benchmark evidence includes `32768`, but 32k should be freshly
|
| 103 |
+
restarted and re-confirmed before being called the live default.
|
| 104 |
- Probe: `1,155` visible chars in `60.17s`.
|
| 105 |
- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
|
| 106 |
- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
|
| 107 |
- OpenCode customer-readiness harness: `4/4` tasks passed, `28/28` required files written, including source/provenance and release-claim safety review.
|
| 108 |
- vLLM nightly serving probe: passed at `16384` after `pandas` preinstall and
|
| 109 |
+
`--language-model-only`.
|
| 110 |
+
- Runtime-quantized vLLM bitsandbytes: current speed path; passed at `8192`
|
| 111 |
+
and `16384`; 16k code patch completed in `11.3s`, and logs reported about
|
| 112 |
+
`17.8 GiB` model memory.
|
| 113 |
|
| 114 |
Known comparison caveat:
|
| 115 |
|
|
|
|
| 121 |
- Raw full-website generation has not yet passed the merged-model release sweep and should remain harness-first for paid delivery.
|
| 122 |
- The deterministic harness remains the practical paid website workflow.
|
| 123 |
- The adapter needs a strong app layer for file editing, tool use, auth, billing, rate limits, logging, and rollback.
|
| 124 |
+
- Public HF upload and human review are complete for testing. Real customer
|
| 125 |
+
paid charging still requires Stripe live-mode setup and controlled live
|
| 126 |
+
payment verification.
|
| 127 |
- Not intended for high-risk medical, legal, financial, or safety-critical decisions without expert review.
|
SERVING_BENCHMARKS.md
CHANGED
|
@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
|
|
| 6 |
## Current Live Runtime
|
| 7 |
|
| 8 |
- Host: Gojira-B over Tailscale
|
| 9 |
-
-
|
| 10 |
-
-
|
| 11 |
-
-
|
|
|
|
|
|
|
| 12 |
- Tested high-context target: `32768`
|
| 13 |
-
- Current container: `qwen36-merged-
|
| 14 |
-
- Current caveat: direct raw generation is slow for multi-file OpenCode
|
|
|
|
| 15 |
|
| 16 |
## Benchmark Command
|
| 17 |
|
|
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
|
|
| 294 |
| vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
|
| 295 |
| vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
|
| 296 |
|
| 297 |
-
Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it
|
| 298 |
-
faster than SGLang on
|
| 299 |
-
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
quantized-weight testing.
|
| 303 |
|
| 304 |
## vLLM bitsandbytes Runtime-Quantized Candidate
|
| 305 |
|
|
@@ -353,12 +355,12 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
|
|
| 353 |
Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
|
| 354 |
exactly `Kaiju Coder 7 quantized runtime ok`.
|
| 355 |
|
| 356 |
-
Recommendation:
|
| 357 |
-
|
| 358 |
-
restarted and re-confirmed. Treat
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
|
| 362 |
|
| 363 |
## 2026-06-03 Fast Proxy And Website Harness Speed Pass
|
| 364 |
|
|
@@ -386,7 +388,7 @@ Fresh OpenCode smoke through the local fast proxy:
|
|
| 386 |
- Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
|
| 387 |
- Result: passed in about `23.5s`, wrote the exact requested file.
|
| 388 |
- Packaged public verifier after exact-content agent rule:
|
| 389 |
-
`runs/public-opencode-smoke/
|
| 390 |
passed through `http://127.0.0.1:18181/v1`.
|
| 391 |
|
| 392 |
Website harness/router speed pass:
|
|
@@ -414,16 +416,16 @@ python3 scripts/run_kaiju_public_demo_pack.py \
|
|
| 414 |
--planner-timeout 90
|
| 415 |
```
|
| 416 |
|
| 417 |
-
Run: `runs/public-demo-pack/
|
| 418 |
|
| 419 |
| Task | Result | Seconds | Changed files |
|
| 420 |
| --- | --- | ---: | ---: |
|
| 421 |
-
| Website | Passed |
|
| 422 |
-
| Owner AI company pack | Passed | 29.
|
| 423 |
-
| Stripe safety plan | Passed | 9.
|
| 424 |
-
| CSV parser artifact | Passed | 19.
|
| 425 |
|
| 426 |
-
Total: `4/4` passed in `
|
| 427 |
|
| 428 |
## Persisted GGUF Q8_0 Candidate
|
| 429 |
|
|
|
|
| 6 |
## Current Live Runtime
|
| 7 |
|
| 8 |
- Host: Gojira-B over Tailscale
|
| 9 |
+
- Local OpenCode base URL: `http://127.0.0.1:18181/v1`
|
| 10 |
+
- Upstream base URL: `http://100.109.109.14:18084/v1`
|
| 11 |
+
- Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
|
| 12 |
+
proxy
|
| 13 |
+
- Current verified context: `16384`
|
| 14 |
- Tested high-context target: `32768`
|
| 15 |
+
- Current container: `qwen36-merged-vllm-18084`
|
| 16 |
+
- Current caveat: direct raw generation is still slow for multi-file OpenCode
|
| 17 |
+
work; use the deterministic router/harness for public business-owner demos.
|
| 18 |
|
| 19 |
## Benchmark Command
|
| 20 |
|
|
|
|
| 297 |
| vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
|
| 298 |
| vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
|
| 299 |
|
| 300 |
+
Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
|
| 301 |
+
clearly faster than SGLang on these smoke prompts. This is historical fallback
|
| 302 |
+
evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
|
| 303 |
+
path. Keep the live/default OpenCode profile at 16k until 32k is freshly
|
| 304 |
+
re-confirmed.
|
|
|
|
| 305 |
|
| 306 |
## vLLM bitsandbytes Runtime-Quantized Candidate
|
| 307 |
|
|
|
|
| 355 |
Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
|
| 356 |
exactly `Kaiju Coder 7 quantized runtime ok`.
|
| 357 |
|
| 358 |
+
Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
|
| 359 |
+
current public/OpenCode speed path and keep the installed OpenCode profile at
|
| 360 |
+
16k unless the 32k target has just been restarted and re-confirmed. Treat
|
| 361 |
+
SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
|
| 362 |
+
direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
|
| 363 |
+
it is not a persisted quantized-weights repo.
|
| 364 |
|
| 365 |
## 2026-06-03 Fast Proxy And Website Harness Speed Pass
|
| 366 |
|
|
|
|
| 388 |
- Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
|
| 389 |
- Result: passed in about `23.5s`, wrote the exact requested file.
|
| 390 |
- Packaged public verifier after exact-content agent rule:
|
| 391 |
+
`runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
|
| 392 |
passed through `http://127.0.0.1:18181/v1`.
|
| 393 |
|
| 394 |
Website harness/router speed pass:
|
|
|
|
| 416 |
--planner-timeout 90
|
| 417 |
```
|
| 418 |
|
| 419 |
+
Run: `runs/public-demo-pack/20260603T235009Z/summary.md`
|
| 420 |
|
| 421 |
| Task | Result | Seconds | Changed files |
|
| 422 |
| --- | --- | ---: | ---: |
|
| 423 |
+
| Website | Passed | 4.73 | 2 |
|
| 424 |
+
| Owner AI company pack | Passed | 29.85 | 19 |
|
| 425 |
+
| Stripe safety plan | Passed | 9.99 | 2 |
|
| 426 |
+
| CSV parser artifact | Passed | 19.97 | 2 |
|
| 427 |
|
| 428 |
+
Total: `4/4` passed in `64.529s`.
|
| 429 |
|
| 430 |
## Persisted GGUF Q8_0 Candidate
|
| 431 |
|