# Kaiju Coder 7 Business-Owner Eval Scoreboard

This scoreboard tracks the current release-candidate evidence. Do not publish weights or paid API claims until every required row has a dated result and reviewer.
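
The completeness rule above can be checked mechanically. The following is a minimal sketch, assuming the scoreboard stays a Markdown pipe table with `Gate`, `Result`, and `Date` columns as in the table below. `find_incomplete_rows` and the sample rows are hypothetical and are not part of the repository scripts. The reviewer check is left out because this table has no reviewer column.

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def split_row(line):
    """Split one Markdown pipe-table row into trimmed cells."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def find_incomplete_rows(markdown):
    """Return gate names whose row lacks a result or an ISO-formatted date."""
    rows = [split_row(l) for l in markdown.splitlines() if l.strip().startswith("|")]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator row
    gate_i = header.index("Gate")
    result_i = header.index("Result")
    date_i = header.index("Date")
    incomplete = []
    for cells in body:
        malformed = len(cells) != len(header)
        if malformed or not cells[result_i] or not ISO_DATE.match(cells[date_i]):
            incomplete.append(cells[gate_i])
    return incomplete


# Hypothetical sample rows, for illustration only.
sample = """\
| Gate | Command | Result | Date |
|---|---|---:|---|
| Example gate A | `python3 example_a.py` | Passed | 2026-06-03 |
| Example gate B | `python3 example_b.py` | | |
"""

print(find_incomplete_rows(sample))
```

A non-empty return value would mean at least one gate still blocks publication under the rule stated above.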

## Completed Local Gates

| Gate | Command | Result | Date |
|---|---|---:|---|
| Source inventory refresh | `python3 scripts/build_source_inventory.py` | Passed | 2026-06-03 |
| Candidate validation | `python3 scripts/validate_training_data.py --min-examples 350` | 1,689 examples / passed | 2026-06-03 |
| v1.7 category targets | `python3 scripts/check_dataset_targets.py --targets datasets/v1.7-targets.json` | Passed | 2026-06-03 |
| Business-owner SFT build | `python3 scripts/build_v17_business_owner_sft_dataset.py` | 1,881 rows / 192 repeats | 2026-06-03 |
| Router hard harness | `python3 evals/run_router_harness_eval.py --tasks evals/tasks/router-hard-harness.jsonl` | 23/23 | 2026-06-03 |
| Router static checks | `python3 evals/run_router_static_checks.py runs/evals/20260603T103915Z-kaiju_router_harness/results.jsonl` | 23/23 | 2026-06-03 |
| Business-suite prompts | Included in router hard harness | 2/2 | 2026-06-03 |
| Deterministic API harness smoke | `python3 scripts/run_kaiju_api_harness_smoke.py` | Passed: website + business-suite API artifacts | 2026-06-03 |
| Direct business-suite artifact | `python3 scripts/run_kaiju_router.py --prompt "...Kiyomi 7.7.7 AI company operating pack..." --print-manifest` | 19 files / passed | 2026-06-03 |
| Full local RC smoke gate | `python3 scripts/run_kaiju_business_owner_rc_smoke.py` | Passed; latest router/static run `20260603T103915Z-kaiju_router_harness` | 2026-06-03 |
| v1.7 LoRA train | `./scripts/run-gojira-b-qwen36-lora-train.sh` | Finished; runtime `1663.7101s`, train loss `1.7260706673065822`, adapter present | 2026-06-03 |
| v1.7 SGLang serve | `./scripts/start-qwen36-lora-sglang.sh` with `KAIJU_QWEN36_LORA_CONTEXT=4096`, `KAIJU_QWEN36_LORA_MEM_FRACTION=0.90` | `/v1/models` returned `kaiju_v17_business_owner` | 2026-06-03 |
| Raw served adapter smoke: website | `python3 evals/run_openai_compat_smoke.py --base-url http://100.109.109.14:18083/v1 --model kaiju_v17_business_owner --tasks evals/tasks/smoke.jsonl --max-tasks 1 --disable-thinking` | Passed; `20260603T031300Z-kaiju_v17_business_owner`, 2,726 chars in 174.49s | 2026-06-03 |
| Raw served adapter smoke: proposal | `python3 evals/run_openai_compat_smoke.py --base-url http://100.109.109.14:18083/v1 --model kaiju_v17_business_owner --tasks /tmp/kaiju-proposal-smoke.jsonl --system-prompt-file prompts/kaiju-coder-api-system.md --disable-thinking` | Passed; `20260603T032107Z-kaiju_v17_business_owner`, 4,306 chars in 232.27s | 2026-06-03 |
| Raw served adapter quality: website | `python3 evals/score_quality_gate.py runs/evals/20260603T033825Z-kaiju_v17_business_owner/results.jsonl` | Failed paid-ready: `3.71/4.0`, missing complete HTML after 12,706 chars / 793.96s | 2026-06-03 |
| Raw served adapter quality: proposal | `python3 evals/score_quality_gate.py runs/evals/20260603T032107Z-kaiju_v17_business_owner/results.jsonl` | Passed paid-ready: `4.0/4.0` | 2026-06-03 |
| Raw served adapter quality: Jah credits | `python3 evals/score_quality_gate.py runs/evals/20260603T035612Z-kaiju_v17_business_owner/results.jsonl` | Passed paid-ready: `4.0/4.0` | 2026-06-03 |
| Base Qwen comparison: proposal | `python3 evals/compare_quality_runs.py runs/quality-gates/20260603T035200Z-qwen36-27b/scores.jsonl runs/quality-gates/20260603T032107Z-kaiju_v17_business_owner/scores.jsonl` | Tie: base `4.0/4.0`, Kaiju v1.7 `4.0/4.0` | 2026-06-03 |
| Base Qwen comparison: Jah credits | `python3 evals/compare_quality_runs.py runs/quality-gates/20260603T040140Z-qwen36-27b/scores.jsonl runs/quality-gates/20260603T035612Z-kaiju_v17_business_owner/scores.jsonl` | Tie: base `4.0/4.0`, Kaiju v1.7 `4.0/4.0`; deterministic outputs were byte-identical | 2026-06-03 |
| Raw adapter differentiation probe | Identity and Jah probes comparing `qwen36-27b` to `kaiju_v17_business_owner` | Current v1.7 SGLang outputs can be byte-identical to base on deterministic prompts; 24-step v1.7 is too weak as a raw-weight differentiator | 2026-06-03 |
| v1.8 stronger LoRA train | `KAIJU_LORA_CONFIG=training/configs/qwen36-27b-lora-v1.8-business-owner.example.json KAIJU_SFT_DATASET=datasets/build/kaiju-sft-v1.7-business-owner-oversampled.jsonl KAIJU_LORA_RUN_DIR=runs/qwen36-27b-lora-v1.8-business-owner KAIJU_MIN_TRAIN_EXAMPLES=350 KAIJU_SKIP_DATASET_BUILD=1 KAIJU_TRAIN_BACKGROUND=1 ./scripts/run-gojira-b-qwen36-lora-train.sh` | Finished; runtime `11666.7564s`, train loss `0.9281658741335074`, adapter present | 2026-06-03 |
| v1.8 SGLang dynamic LoRA serve | `./scripts/start-qwen36-lora-sglang.sh` with v1.8 adapter, `KAIJU_QWEN36_LORA_CONTEXT=8192`, `KAIJU_QWEN36_LORA_MEM_FRACTION=0.90` | Historical only: `/v1/models` listed `kaiju_v18_business_owner`, but adapter-name-only output can be base-equivalent; not release evidence | 2026-06-03 |
| Corrected v1.8 dynamic LoRA selector | Model selector `qwen36-27b:kaiju_v18_business_owner` under SGLang with fused target modules | Fails: `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`; dynamic LoRA is not the release path | 2026-06-03 |
| v1.8 LoRA merge | `KAIJU_LORA_ADAPTER=/workspace/kaiju-coder/runs/qwen36-27b-lora-v1.8-business-owner/adapter ./scripts/run-gojira-b-qwen36-lora-merge.sh` | Passed; merged full model at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`, `51G`, `14` shards | 2026-06-03 |
| Kaiju Coder 7 merged SGLang serve | `./scripts/start-qwen36-merged-sglang.sh` with `KAIJU_QWEN36_MERGED_CONTEXT=32768`, `KAIJU_QWEN36_MERGED_MEM_FRACTION=0.90` | `/v1/models` returned `kaiju-coder-7`, max model len `32768`; 12k/16k/24k/32k evidence is recorded in `release/SERVING_BENCHMARKS.md` | 2026-06-03 |
| Kaiju Coder 7 restored 32k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 32768 --prompts identity business_doc --max-tokens 768 --timeout 420` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `32768`; identity `2.92s`; business proposal `94.28s`, `1,737` chars | 2026-06-03 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
| Kaiju Coder 7 merged vLLM serve | `KAIJU_VLLM_CONTEXT=16384 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 16k with Gojira nightly vLLM after `pandas` preinstall and `--language-model-only`; identity `19.99s`, code patch `28.8s`; not fast enough to replace SGLang | 2026-06-03 |
| Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
| Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
| Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; normal AWQ/GPTQ installs are not clean against the Qwen3.5-capable stack as of this run; `llmcompressor --no-deps` preserves config support but needs a pinned dependency env; `llama.cpp` supports `Qwen3_5ForConditionalGeneration`, with the Q8_0 dry-run passing | 2026-06-03 |
| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
| Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
| Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
| v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
| Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
| Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
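
The `chars/s` figure in the runtime-quantized business-doc row is visible characters divided by wall-clock seconds. The sketch below recomputes it and applies the same arithmetic to two other rows. `chars_per_second` is a hypothetical helper, not one of the repository scripts, and the inputs are copied from the table above.

```python
def chars_per_second(chars: int, seconds: float) -> float:
    """Visible output characters per wall-clock second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return chars / seconds


# (label, visible chars, seconds), copied from rows in the table above.
runs = [
    ("vLLM bitsandbytes business proposal", 1610, 53.44),
    ("SGLang 32k business proposal", 1737, 94.28),
    ("merged focused proposal eval", 4014, 212.72),
]

for label, chars, seconds in runs:
    print(f"{label}: {chars_per_second(chars, seconds):.3f} chars/s")
```

The rows were produced with different prompts, token caps, and serving stacks, so the resulting rates are a rough orientation rather than a controlled comparison.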

## Required Before Release

| Gate | Required result | Status |
|---|---|---|
| v1.7 LoRA train | Finished metrics and adapter under `runs/qwen36-27b-lora-v1.7-business-owner` | Passed |
| v1.8 stronger LoRA train | Finished metrics and adapter under `runs/qwen36-27b-lora-v1.8-business-owner` | Passed |
| v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
| Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
| Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
| Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |

## Decision Rule

The v1.8 adapter is a completed local checkpoint, and the merged full model is the current served raw-model path. Publish the business-owner product as what it is: Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until the broader base Qwen, GLM, and raw website comparisons pass.
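
The decision rule can be read as a small gate over three comparison results. The sketch below is one way to encode it, under the assumption that each comparison reduces to a single pass/fail flag. `ComparisonStatus`, `allowed_claims`, and the claim strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonStatus:
    """Pass/fail flags for the comparisons named in the decision rule."""
    base_qwen_passed: bool
    glm_passed: bool
    raw_website_passed: bool


HARNESSED_PRODUCT = "Kaiju Coder 7 + deterministic harness + verifier"
RAW_SUPERIORITY = "raw-weight superiority"


def allowed_claims(status: ComparisonStatus) -> list:
    """Return the claims the decision rule permits for a given status."""
    claims = [HARNESSED_PRODUCT]  # always the honest baseline claim
    if (
        status.base_qwen_passed
        and status.glm_passed
        and status.raw_website_passed
    ):
        claims.append(RAW_SUPERIORITY)
    return claims


# Flags as this scoreboard currently reads: base Qwen "Not yet", GLM "Pending",
# and the raw website comparison not yet passed.
current = ComparisonStatus(
    base_qwen_passed=False,
    glm_passed=False,
    raw_website_passed=False,
)

print(allowed_claims(current))
```

Requiring all three flags keeps the superiority claim blocked if any single comparison is still pending, which matches the conservative wording of the rule.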