Kaiju Coder 7 Business-Owner Eval Scoreboard

This scoreboard tracks the current release-candidate evidence. Do not publish weights or paid API claims until every required row has a dated result and reviewer.

Completed Local Gates

Gate	Command	Result	Date
Source inventory refresh	`python3 scripts/build_source_inventory.py`	Passed	2026-06-03
Candidate validation	`python3 scripts/validate_training_data.py --min-examples 350`	1,689 examples / passed	2026-06-03
v1.7 category targets	`python3 scripts/check_dataset_targets.py --targets datasets/v1.7-targets.json`	Passed	2026-06-03
Business-owner SFT build	`python3 scripts/build_v17_business_owner_sft_dataset.py`	1,881 rows / 192 repeats	2026-06-03
Router hard harness	`python3 evals/run_router_harness_eval.py --tasks evals/tasks/router-hard-harness.jsonl`	23/23	2026-06-03
Router static checks	`python3 evals/run_router_static_checks.py runs/evals/20260603T103915Z-kaiju_router_harness/results.jsonl`	23/23	2026-06-03
Business-suite prompts	Included in router hard harness	2/2	2026-06-03
Deterministic API harness smoke	`python3 scripts/run_kaiju_api_harness_smoke.py`	Passed: website + business-suite API artifacts	2026-06-03
Direct business-suite artifact	`python3 scripts/run_kaiju_router.py --prompt "...Kiyomi 7.7.7 AI company operating pack..." --print-manifest`	19 files / passed	2026-06-03
Full local RC smoke gate	`python3 scripts/run_kaiju_business_owner_rc_smoke.py`	Passed; latest router/static run `20260603T103915Z-kaiju_router_harness`	2026-06-03
v1.7 LoRA train	`./scripts/run-gojira-b-qwen36-lora-train.sh`	Finished; runtime `1663.7101s`, train loss `1.7260706673065822`, adapter present	2026-06-03
v1.7 SGLang serve	`./scripts/start-qwen36-lora-sglang.sh` with `KAIJU_QWEN36_LORA_CONTEXT=4096`, `KAIJU_QWEN36_LORA_MEM_FRACTION=0.90`	`/v1/models` returned `kaiju_v17_business_owner`	2026-06-03
Raw served adapter smoke: website	`python3 evals/run_openai_compat_smoke.py --base-url http://100.109.109.14:18083/v1 --model kaiju_v17_business_owner --tasks evals/tasks/smoke.jsonl --max-tasks 1 --disable-thinking`	Passed; `20260603T031300Z-kaiju_v17_business_owner`, 2,726 chars in 174.49s	2026-06-03
Raw served adapter smoke: proposal	`python3 evals/run_openai_compat_smoke.py --base-url http://100.109.109.14:18083/v1 --model kaiju_v17_business_owner --tasks /tmp/kaiju-proposal-smoke.jsonl --system-prompt-file prompts/kaiju-coder-api-system.md --disable-thinking`	Passed; `20260603T032107Z-kaiju_v17_business_owner`, 4,306 chars in 232.27s	2026-06-03
Raw served adapter quality: website	`python3 evals/score_quality_gate.py runs/evals/20260603T033825Z-kaiju_v17_business_owner/results.jsonl`	Failed paid-ready: `3.71/4.0`, missing complete HTML after 12,706 chars / 793.96s	2026-06-03
Raw served adapter quality: proposal	`python3 evals/score_quality_gate.py runs/evals/20260603T032107Z-kaiju_v17_business_owner/results.jsonl`	Passed paid-ready: `4.0/4.0`	2026-06-03
Raw served adapter quality: Jah credits	`python3 evals/score_quality_gate.py runs/evals/20260603T035612Z-kaiju_v17_business_owner/results.jsonl`	Passed paid-ready: `4.0/4.0`	2026-06-03
Base Qwen comparison: proposal	`python3 evals/compare_quality_runs.py runs/quality-gates/20260603T035200Z-qwen36-27b/scores.jsonl runs/quality-gates/20260603T032107Z-kaiju_v17_business_owner/scores.jsonl`	Tie: base `4.0/4.0`, Kaiju v1.7 `4.0/4.0`	2026-06-03
Base Qwen comparison: Jah credits	`python3 evals/compare_quality_runs.py runs/quality-gates/20260603T040140Z-qwen36-27b/scores.jsonl runs/quality-gates/20260603T035612Z-kaiju_v17_business_owner/scores.jsonl`	Tie: base `4.0/4.0`, Kaiju v1.7 `4.0/4.0`; deterministic outputs were byte-identical	2026-06-03
Raw adapter differentiation probe	Identity and Jah probes comparing `qwen36-27b` to `kaiju_v17_business_owner`	Current v1.7 SGLang outputs can be byte-identical to base on deterministic prompts; 24-step v1.7 is too weak as a raw-weight differentiator	2026-06-03
v1.8 stronger LoRA train	`KAIJU_LORA_CONFIG=training/configs/qwen36-27b-lora-v1.8-business-owner.example.json KAIJU_SFT_DATASET=datasets/build/kaiju-sft-v1.7-business-owner-oversampled.jsonl KAIJU_LORA_RUN_DIR=runs/qwen36-27b-lora-v1.8-business-owner KAIJU_MIN_TRAIN_EXAMPLES=350 KAIJU_SKIP_DATASET_BUILD=1 KAIJU_TRAIN_BACKGROUND=1 ./scripts/run-gojira-b-qwen36-lora-train.sh`	Finished; runtime `11666.7564s`, train loss `0.9281658741335074`, adapter present	2026-06-03
v1.8 SGLang dynamic LoRA serve	`./scripts/start-qwen36-lora-sglang.sh` with v1.8 adapter, `KAIJU_QWEN36_LORA_CONTEXT=8192`, `KAIJU_QWEN36_LORA_MEM_FRACTION=0.90`	Historical only: `/v1/models` listed `kaiju_v18_business_owner`, but adapter-name-only output can be base-equivalent; not release evidence	2026-06-03
Corrected v1.8 dynamic LoRA selector	Model selector `qwen36-27b:kaiju_v18_business_owner` under SGLang with fused target modules	Fails: `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`; dynamic LoRA is not the release path	2026-06-03
v1.8 LoRA merge	`KAIJU_LORA_ADAPTER=/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter ./scripts/run-gojira-b-qwen36-lora-merge.sh`	Passed; merged full model at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`, `51G`, `14` shards	2026-06-03
Kaiju Coder 7 merged SGLang serve	`./scripts/start-qwen36-merged-sglang.sh` with `KAIJU_QWEN36_MERGED_CONTEXT=32768`, `KAIJU_QWEN36_MERGED_MEM_FRACTION=0.90`	`/v1/models` returned `kaiju-coder-7`, max model len `32768`; 12k/16k/24k/32k evidence is recorded in `release/SERVING_BENCHMARKS.md`	2026-06-03
Kaiju Coder 7 restored 32k direct API smoke	`python3 scripts/benchmark_kaiju_serving.py --contexts 32768 --prompts identity business_doc --max-tokens 768 --timeout 420`	Passed; `/v1/models` returned `kaiju-coder-7`, max model len `32768`; identity `2.92s`; business proposal `94.28s`, `1,737` chars	2026-06-03
Kaiju Coder 7 restored 32k OpenCode one-file smoke	`opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'`	Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok`	2026-06-03
Kaiju Coder 7 current restored 16k direct API smoke	`python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120`	Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars	2026-06-03
Kaiju Coder 7 current restored 16k OpenCode one-file smoke	`mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'`	Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok`	2026-06-03
Kaiju Coder 7 packaged public OpenCode smoke	`python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900`	Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy	2026-06-03
Kaiju Coder 7 loop-guarded OpenCode install	`python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'`	Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly	2026-06-03
Current harnessed OpenCode customer-readiness pack	`python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed`	Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review	2026-06-03
Paid API Worker scaffold	`cd gateway/cloudflare-worker && npm run check && npm run preflight`	Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper	2026-06-03
Kaiju Coder 7 merged vLLM serve	`KAIJU_VLLM_CONTEXT=16384 ./scripts/run-gojira-b-vllm-serving-benchmark.sh`	Passed at 16k with Gojira nightly vLLM after `pandas` preinstall and `--language-model-only`; identity `19.99s`, code patch `28.8s`; not faster enough to replace SGLang	2026-06-03
Kaiju Coder 7 runtime-quantized vLLM serve	`KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh`	Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory	2026-06-03
Kaiju Coder 7 runtime-quantized business-doc smoke	`KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh`	Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion	2026-06-03
Kaiju Coder 7 runtime-quantized OpenCode one-file smoke	`bash scripts/run_kaiju_quantized_opencode_smoke.sh`	Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok`	2026-06-03
Kaiju Coder 7 fast proxy plus website harness speed pass	`python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1`	Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end	2026-06-03
Persisted quantization support probe	`./scripts/probe-gojira-b-persisted-quantization.sh`	Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing	2026-06-03
GGUF Q8_0 persisted conversion	`./scripts/run-gojira-b-kaiju-gguf-convert.sh`	Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release	2026-06-03
Public business-owner demo pack	`python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90`	Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md`	2026-06-03
Hugging Face CLI install/auth check	`hf version && hf auth whoami && hf auth list`	`hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode`	2026-06-03
Hugging Face public helper repos	`python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`	Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note	2026-06-03
Hugging Face merged-model upload	`KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh`	Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false`	2026-06-03
v1.8 merged endpoint probe	Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking	Passed; `1,155` visible chars in `60.17s`, normal `content` response	2026-06-03
Kaiju Coder 7 merged focused proposal eval	`python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>`	Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s`	2026-06-03
Kaiju Coder 7 merged focused Jah credits eval	`python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>`	Passed: `4.0/4.0`, `9,718` chars, `566.36s`	2026-06-03
Full local RC smoke gate	`python3 scripts/run_kaiju_business_owner_rc_smoke.py`	Passed; latest router/static run `20260603T103915Z-kaiju_router_harness`	2026-06-03

Required Before Release

Gate	Required result	Status
v1.7 LoRA train	Finished metrics and adapter under `runs/qwen36-27b-lora-v1.7-business-owner`	Passed
v1.8 stronger LoRA train	Finished metrics and adapter under `runs/qwen36-27b-lora-v1.8-business-owner`	Passed
v1.8 merged focused smoke	`python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py`	Passed for proposal rerun and Jah credits backend; broader sweep pending
Direct commercial eval	No critical failures, scored summary attached	Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending
Base Qwen comparison	Kaiju beats base Qwen on RMDW/Kiyomi practical tasks	Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims
GLM comparison	Kaiju is near or above GLM on highest-value business-owner tasks	Pending; required only before superiority claims
Local inference smoke	OpenAI-compatible endpoint returns usable business-owner artifact	Passed for v1.8 merged SGLang endpoint and product harness
Human review	Richard reviews artifacts for usefulness, privacy, and sellability	Approved for public HF visibility and paid API launch preflight on 2026-06-03
Release package	Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete	Staged, bundled, uploaded to public HF repos, and verified with public downloads

Decision Rule

The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.