Text Generation
PEFT
Safetensors
English
kaiju-coder-7
lora
coding
local-ai
business
opencode
conversational
Instructions to use RMDWLLC/kaiju-coder-7-adapter with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use RMDWLLC/kaiju-coder-7-adapter with PEFT:
    # Load the base model, then attach the LoRA adapter on top of it.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM
    base_model = AutoModelForCausalLM.from_pretrained("/workspace/kaiju-coder/models/Qwen3.6-27B")
    model = PeftModel.from_pretrained(base_model, "RMDWLLC/kaiju-coder-7-adapter")
- Notebooks
- Google Colab
- Kaggle
Upload Kaiju Coder 7 adapter release package
- EVAL_SCOREBOARD.md +5 -1
- FINAL_RELEASE_REPORT.md +30 -16
- GOAL_COMPLETION_AUDIT.md +2 -2
- HF_UPLOAD_EVIDENCE.md +3 -3
- LOCAL_TEST_INSTRUCTIONS.md +19 -13
- PUBLIC_TESTING_QUICKSTART.md +24 -7
- QUANTIZATION_PLAN.md +41 -5
- SERVING_BENCHMARKS.md +98 -0
- scripts/check_hf_uploaded_release.py +1 -1
EVAL_SCOREBOARD.md
CHANGED
|
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
-
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
@@ -43,6 +43,10 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 43 |
| Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
|
| 44 |
| Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
|
| 45 |
| Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
|
| 46 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 47 |
| Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
|
| 48 |
| Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
|
|
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
+
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --timeout 900 --keep-dir` | Passed; latest run `runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
|
|
| 43 |
| Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
|
| 44 |
| Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
|
| 45 |
| Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
|
| 46 |
+
| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
|
| 47 |
+
| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
|
| 48 |
+
| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
|
| 49 |
+
| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `84.43s`: website `24.59s`, owner AI company pack `29.99s` with `19` files, Stripe safety plan `9.93s`, CSV parser artifact `19.93s`; run `runs/public-demo-pack/20260603T232534Z/summary.md` | 2026-06-03 |
|
| 50 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 51 |
| Hugging Face private repo create attempt | `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_release_staging.sh` with namespaces `RichardEchols`, `RMDWLLC`, and `restokes92` | Blocked by Hugging Face `403 Forbidden`; current token cannot create model repos in those namespaces | 2026-06-03 |
|
| 52 |
| Hugging Face merged-model metadata and upload boundary | `bash scripts/prepare_hf_merged_model_metadata.sh`; `KAIJU_MERGED_METADATA_APPLY=1 bash scripts/prepare_hf_merged_model_metadata.sh`; `bash scripts/upload_hf_merged_model_from_gojira_b.sh`; `KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Metadata prep synced model card, quickstarts, provenance, benchmarks, evals, paid API status, final report, upstream license, and `MERGED_MODEL_RELEASE_MANIFEST.json` to Gojira-B; sudo rsync handled the root-owned merged folder; upload dry run confirmed metadata plus the `51G`/`14`-shard merged model before printing `hf upload-large-folder`; apply remains blocked by human review and Hugging Face namespace permission before any large upload | 2026-06-03 |
|
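Row 48 of the updated scoreboard records a SHA256 for the converted GGUF candidate. The following is a minimal sketch of how such a digest could be re-checked against a local copy of the file. The helper names `sha256_of_file` and `matches_recorded_digest` are hypothetical and are not part of the release scripts.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so a multi-gigabyte file never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: Path, recorded_hex: str) -> bool:
    """Compare case-insensitively, ignoring surrounding whitespace in the record."""
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

A caller would pass the local path of `kaiju-coder-7-Q8_0.gguf` and the digest string recorded in the scoreboard row; a `False` result would suggest the local copy differs from the recorded conversion output.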
FINAL_RELEASE_REPORT.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
-
Generated: `2026-06-
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
@@ -24,11 +24,11 @@ Stripe live-mode switch and controlled live payment verification.
|
|
| 24 |
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
-
| Status | `
|
| 28 |
| Base URL | `http://100.109.109.14:18083/v1` |
|
| 29 |
-
| Model id | `
|
| 30 |
-
| Max model length | `
|
| 31 |
-
| Detail | `` |
|
| 32 |
|
| 33 |
Recommended default today: `16k` context through `kaiju-coder-7`. Higher
|
| 34 |
context has benchmark evidence, but the currently parked default is 16k for
|
|
@@ -38,9 +38,9 @@ stability and speed.
|
|
| 38 |
|
| 39 |
| Area | Result |
|
| 40 |
|---|---|
|
| 41 |
-
| Local public-testing readiness | `ready=
|
| 42 |
-
| Hugging Face release readiness | `ready=
|
| 43 |
-
| Public launch readiness | `ready=
|
| 44 |
| Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
|
| 45 |
| Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
|
| 46 |
|
|
@@ -52,18 +52,22 @@ stability and speed.
|
|
| 52 |
| Small helper repos uploaded | `True` |
|
| 53 |
| Merged model uploaded | `True` |
|
| 54 |
| Merged repo | `RMDWLLC/kaiju-coder-7` |
|
| 55 |
-
| Merged repo SHA | `
|
| 56 |
| Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
|
| 57 |
| Download status | `public downloads verified; no active private-storage blocker recorded` |
|
| 58 |
| Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
|
| 59 |
|
| 60 |
## Hugging Face Release Blockers
|
| 61 |
|
| 62 |
-
|
| 63 |
|
| 64 |
## Public Launch Blockers
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
## Paid API Launch Blockers
|
| 69 |
|
|
@@ -93,7 +97,7 @@ stability and speed.
|
|
| 93 |
| Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
|
| 94 |
| Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
|
| 95 |
| Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
|
| 96 |
-
| Latest direct API smoke | `runs/benchmarks/
|
| 97 |
| Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
|
| 98 |
| Latest public OpenCode smoke | `runs/public-opencode-smoke` |
|
| 99 |
|
|
@@ -133,7 +137,7 @@ human release review explicitly approves public paid API launch.
|
|
| 133 |
|
| 134 |
## Changed Files
|
| 135 |
|
| 136 |
-
`git status --short` currently reports `
|
| 137 |
|
| 138 |
| State | Path |
|
| 139 |
|---|---|
|
|
@@ -153,8 +157,10 @@ human release review explicitly approves public paid API launch.
|
|
| 153 |
| M | `gateway/cloudflare-worker/src/index.js` |
|
| 154 |
| M | `gateway/cloudflare-worker/test/index.test.js` |
|
| 155 |
| M | `gateway/cloudflare-worker/wrangler.jsonc` |
|
| 156 |
| M | `kaiju_harness/router.py` |
|
| 157 |
| M | `kaiju_harness/verification.py` |
|
| 158 |
| D | `models/README.md` |
|
| 159 |
| D | `models/qwen3.6-27b-base.md` |
|
| 160 |
| D | `models/qwen3.6-27b-fp8.md` |
|
|
@@ -164,14 +170,17 @@ human release review explicitly approves public paid API launch.
|
|
| 164 |
| M | `release/MODEL_CARD_DRAFT.md` |
|
| 165 |
| M | `scripts/build_sft_dataset.py` |
|
| 166 |
| M | `scripts/check-gojira-b-capacity.sh` |
|
| 167 |
| M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
|
| 168 |
| M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
|
| 169 |
| M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
|
| 170 |
| M | `scripts/run_kaiju_api_harness_smoke.py` |
|
| 171 |
| M | `scripts/start-qwen36-lora-sglang.sh` |
|
| 172 |
| M | `scripts/stop-qwen36-lora-sglang.sh` |
|
| 173 |
| M | `scripts/validate_training_data.py` |
|
| 174 |
| M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
|
| 175 |
| ?? | `.opencode/` |
|
| 176 |
| ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
|
| 177 |
| ?? | `datasets/v1.7-targets.json` |
|
|
@@ -196,6 +205,7 @@ human release review explicitly approves public paid API launch.
|
|
| 196 |
| ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
|
| 197 |
| ?? | `release/bundles/` |
|
| 198 |
| ?? | `release/cloudflare-bindings.example.json` |
|
| 199 |
| ?? | `release/hf-release-permission-evidence.example.json` |
|
| 200 |
| ?? | `release/hf-release-permission-evidence.json` |
|
| 201 |
| ?? | `release/huggingface/` |
|
|
@@ -225,17 +235,21 @@ human release review explicitly approves public paid API launch.
|
|
| 225 |
| ?? | `scripts/generate_kaiju_final_report.py` |
|
| 226 |
| ?? | `scripts/gojira-b-ssh-lib.sh` |
|
| 227 |
| ?? | `scripts/install_kaiju_opencode_profile.py` |
|
| 228 |
| ?? | `scripts/make_hf_release_public.sh` |
|
| 229 |
| ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
|
| 230 |
| ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
|
| 231 |
| ?? | `scripts/prepare_hf_release_staging.sh` |
|
| 232 |
| ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
|
| 233 |
| ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
|
| 234 |
| ?? | `scripts/refresh_kaiju_release_evidence.py` |
|
| 235 |
| ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
|
| 236 |
| ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
|
| 237 |
| ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
|
| 238 |
| ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
|
| 239 |
| ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
|
| 240 |
| ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
|
| 241 |
| ?? | `scripts/start-qwen36-merged-sglang.sh` |
|
|
@@ -262,9 +276,9 @@ human release review explicitly approves public paid API launch.
|
|
| 262 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 263 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 264 |
| git status | `git status --short` | 0 |
|
| 265 |
-
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` |
|
| 266 |
-
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` |
|
| 267 |
-
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` |
|
| 268 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 269 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
|
| 270 |
|
|
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:34:00Z`
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
|
|
| 24 |
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
+
| Status | `fail` |
|
| 28 |
| Base URL | `http://100.109.109.14:18083/v1` |
|
| 29 |
+
| Model id | `unknown` |
|
| 30 |
+
| Max model length | `unknown` |
|
| 31 |
+
| Detail | `URLError(ConnectionRefusedError(61, 'Connection refused'))` |
|
| 32 |
|
| 33 |
Recommended default today: `16k` context through `kaiju-coder-7`. Higher
|
| 34 |
context has benchmark evidence, but the currently parked default is 16k for
|
|
|
|
| 38 |
|
| 39 |
| Area | Result |
|
| 40 |
|---|---|
|
| 41 |
+
| Local public-testing readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
|
| 42 |
+
| Hugging Face release readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
|
| 43 |
+
| Public launch readiness | `ready=False pass=23 fail=1 manual=0 rc=1` |
|
| 44 |
| Hugging Face staging integrity | `ready=True pass=6 fail=0 manual=0 rc=0` |
|
| 45 |
| Paid API launch readiness | `ready=True pass=27 fail=0 manual=0 rc=0` |
|
| 46 |
|
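The readiness results in this table are compact `key=value` summaries. As a rough sketch, one such line could be parsed into typed fields as follows, assuming the format is exactly the whitespace-separated `ready=… pass=… fail=… manual=… rc=…` shown in the report. `Readiness` and `parse_readiness` are hypothetical names, not taken from the release scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readiness:
    ready: bool
    passed: int
    failed: int
    manual: int
    rc: int


def parse_readiness(summary: str) -> Readiness:
    """Parse a summary such as 'ready=False pass=23 fail=1 manual=0 rc=1'."""
    # Split on whitespace, then split each token once on '=' to get key/value pairs.
    fields = dict(token.split("=", 1) for token in summary.split())
    return Readiness(
        ready=fields["ready"] == "True",
        passed=int(fields["pass"]),
        failed=int(fields["fail"]),
        manual=int(fields["manual"]),
        rc=int(fields["rc"]),
    )


local = parse_readiness("ready=False pass=23 fail=1 manual=0 rc=1")
paid = parse_readiness("ready=True pass=27 fail=0 manual=0 rc=0")
print(local.ready, local.failed, paid.ready)  # False 1 True
```

Parsed this way, the three `ready=False` rows each carry exactly one failure and a non-zero `rc`, while the staging-integrity and paid-API rows report zero failures.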
|
|
|
| 52 |
| Small helper repos uploaded | `True` |
|
| 53 |
| Merged model uploaded | `True` |
|
| 54 |
| Merged repo | `RMDWLLC/kaiju-coder-7` |
|
| 55 |
+
| Merged repo SHA | `00ba85985102a14838dbb8a5692d9a75ce9da15a` |
|
| 56 |
| Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
|
| 57 |
| Download status | `public downloads verified; no active private-storage blocker recorded` |
|
| 58 |
| Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
|
| 59 |
|
| 60 |
## Hugging Face Release Blockers
|
| 61 |
|
| 62 |
+
| Status | Check | Detail |
|
| 63 |
+
|---|---|---|
|
| 64 |
+
| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
|
| 65 |
|
| 66 |
## Public Launch Blockers
|
| 67 |
|
| 68 |
+
| Status | Check | Detail |
|
| 69 |
+
|---|---|---|
|
| 70 |
+
| fail | live runtime | could not read http://100.109.109.14:18083/v1/models: URLError(ConnectionRefusedError(61, 'Connection refused')) |
|
| 71 |
|
| 72 |
## Paid API Launch Blockers
|
| 73 |
|
|
|
|
| 97 |
| Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
|
| 98 |
| Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
|
| 99 |
| Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
|
| 100 |
+
| Latest direct API smoke | `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md` |
|
| 101 |
| Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
|
| 102 |
| Latest public OpenCode smoke | `runs/public-opencode-smoke` |
|
| 103 |
|
|
|
|
| 137 |
|
| 138 |
## Changed Files
|
| 139 |
|
| 140 |
+
`git status --short` currently reports `126` changed paths.
|
| 141 |
|
| 142 |
| State | Path |
|
| 143 |
|---|---|
|
|
|
|
| 157 |
| M | `gateway/cloudflare-worker/src/index.js` |
|
| 158 |
| M | `gateway/cloudflare-worker/test/index.test.js` |
|
| 159 |
| M | `gateway/cloudflare-worker/wrangler.jsonc` |
|
| 160 |
+
| M | `gateway/gojira-local/server.py` |
|
| 161 |
| M | `kaiju_harness/router.py` |
|
| 162 |
| M | `kaiju_harness/verification.py` |
|
| 163 |
+
| M | `kaiju_harness/website.py` |
|
| 164 |
| D | `models/README.md` |
|
| 165 |
| D | `models/qwen3.6-27b-base.md` |
|
| 166 |
| D | `models/qwen3.6-27b-fp8.md` |
|
|
|
|
| 170 |
| M | `release/MODEL_CARD_DRAFT.md` |
|
| 171 |
| M | `scripts/build_sft_dataset.py` |
|
| 172 |
| M | `scripts/check-gojira-b-capacity.sh` |
|
| 173 |
+
| M | `scripts/check_kaiju_gateway_policy.py` |
|
| 174 |
| M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
|
| 175 |
| M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
|
| 176 |
| M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
|
| 177 |
| M | `scripts/run_kaiju_api_harness_smoke.py` |
|
| 178 |
+
| M | `scripts/run_kaiju_router.py` |
|
| 179 |
| M | `scripts/start-qwen36-lora-sglang.sh` |
|
| 180 |
| M | `scripts/stop-qwen36-lora-sglang.sh` |
|
| 181 |
| M | `scripts/validate_training_data.py` |
|
| 182 |
| M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
|
| 183 |
+
| M | `tests/test_website_harness.py` |
|
| 184 |
| ?? | `.opencode/` |
|
| 185 |
| ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
|
| 186 |
| ?? | `datasets/v1.7-targets.json` |
|
|
|
|
| 205 |
| ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
|
| 206 |
| ?? | `release/bundles/` |
|
| 207 |
| ?? | `release/cloudflare-bindings.example.json` |
|
| 208 |
+
| ?? | `release/gguf/` |
|
| 209 |
| ?? | `release/hf-release-permission-evidence.example.json` |
|
| 210 |
| ?? | `release/hf-release-permission-evidence.json` |
|
| 211 |
| ?? | `release/huggingface/` |
|
|
|
|
| 235 |
| ?? | `scripts/generate_kaiju_final_report.py` |
|
| 236 |
| ?? | `scripts/gojira-b-ssh-lib.sh` |
|
| 237 |
| ?? | `scripts/install_kaiju_opencode_profile.py` |
|
| 238 |
+
| ?? | `scripts/kaiju_opencode_fast_proxy.py` |
|
| 239 |
| ?? | `scripts/make_hf_release_public.sh` |
|
| 240 |
| ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
|
| 241 |
| ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
|
| 242 |
| ?? | `scripts/prepare_hf_release_staging.sh` |
|
| 243 |
| ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
|
| 244 |
| ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
|
| 245 |
+
| ?? | `scripts/probe-gojira-b-persisted-quantization.sh` |
|
| 246 |
| ?? | `scripts/refresh_kaiju_release_evidence.py` |
|
| 247 |
+
| ?? | `scripts/run-gojira-b-kaiju-gguf-convert.sh` |
|
| 248 |
| ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
|
| 249 |
| ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
|
| 250 |
| ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
|
| 251 |
| ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
|
| 252 |
+
| ?? | `scripts/run_kaiju_public_demo_pack.py` |
|
| 253 |
| ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
|
| 254 |
| ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
|
| 255 |
| ?? | `scripts/start-qwen36-merged-sglang.sh` |
|
|
|
|
| 276 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 277 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 278 |
| git status | `git status --short` | 0 |
|
| 279 |
+
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
|
| 280 |
+
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
|
| 281 |
+
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://100.109.109.14:18083/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 1 |
|
| 282 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 283 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
|
| 284 |
|
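The `live runtime` failure recorded in this report comes from reading `/v1/models` on the configured base URL. The sketch below shows one way a probe could degrade to `fail` and `unknown` values when the server refuses the connection. It assumes an OpenAI-compatible response with a `data` list whose first entry has an `id` and a `max_model_len` field; that payload shape is an assumption about this deployment. `probe_runtime` and its `fetch` parameter are hypothetical and are not the code in `scripts/check_kaiju_public_release_readiness.py`.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def _http_get(url: str, timeout: float) -> bytes:
    """Default fetcher: plain GET using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def probe_runtime(
    base_url: str,
    timeout: float = 5.0,
    fetch: Callable[[str, float], bytes] = _http_get,
) -> dict:
    """Return a status row shaped like the report's live-runtime table."""
    url = base_url.rstrip("/") + "/models"
    try:
        payload = json.loads(fetch(url, timeout))
        model = payload["data"][0]
        return {
            "status": "pass",
            "model_id": model["id"],
            # Assumed field name; fall back to "unknown" if the server omits it.
            "max_model_len": model.get("max_model_len", "unknown"),
            "detail": "",
        }
    except (OSError, ValueError, KeyError, IndexError, TypeError, AttributeError) as exc:
        # URLError is an OSError subclass; JSON decode errors are ValueErrors.
        return {
            "status": "fail",
            "model_id": "unknown",
            "max_model_len": "unknown",
            "detail": repr(exc),
        }
```

Passing `fetch` as a parameter keeps the probe testable without a live server: a stub that raises `urllib.error.URLError` yields the same `fail` / `unknown` / `unknown` row that the regenerated report shows.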
GOAL_COMPLETION_AUDIT.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
-
Generated: `2026-06-
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
|
@@ -28,7 +28,7 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
|
|
| 28 |
| OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
|
| 29 |
| OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
|
| 30 |
| OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 31 |
-
| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/
|
| 32 |
| Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
|
| 33 |
| Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
|
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:35:30Z`
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
|
|
|
| 28 |
| OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
|
| 29 |
| OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
|
| 30 |
| OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 31 |
+
| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md | |
|
| 32 |
| Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
|
| 33 |
| Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
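The audit header reports `Overall: complete` alongside `18 passed / 0 blocked / 0 manual`. A small sketch of how such a header could be derived from per-row statuses is given below. The rule that the audit is complete only when every row is `passed`, the `incomplete` label, and the name `summarize_audit` are all assumptions for illustration; the audit generator itself is not shown in this diff.

```python
from collections import Counter


def summarize_audit(statuses: list[str]) -> tuple[str, str]:
    """Return (overall, summary) for a list of per-row status strings."""
    counts = Counter(statuses)
    summary = (
        f"{counts['passed']} passed / "
        f"{counts['blocked']} blocked / "
        f"{counts['manual']} manual"
    )
    # Assumed rule: an empty audit, or any non-passed row, is not complete.
    all_passed = bool(statuses) and all(status == "passed" for status in statuses)
    overall = "complete" if all_passed else "incomplete"
    return overall, summary
```

With eighteen `passed` rows this yields the header values shown in the audit; a single `blocked` or `manual` row would flip the overall result under the assumed rule.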
HF_UPLOAD_EVIDENCE.md
CHANGED
|
@@ -6,10 +6,10 @@ Generated: `2026-06-03T20:36:26Z`
|
|
| 6 |
|
| 7 |
| Repo | Visibility | Evidence |
|
| 8 |
|---|---|---|
|
| 9 |
-
| `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `
|
| 10 |
| `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
|
| 11 |
| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
|
| 12 |
-
| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `
|
| 13 |
|
| 14 |
These SHAs are a point-in-time release evidence snapshot. Uploading this
|
| 15 |
evidence file itself creates another metadata commit, so use `hf models info`
|
|
@@ -78,7 +78,7 @@ Result:
|
|
| 78 |
- The downloaded OpenCode helper installer dry-run passed and included the
|
| 79 |
loop guard.
|
| 80 |
- Merged model metadata reports `private: false`, SHA
|
| 81 |
-
`
|
| 82 |
safetensors shards.
|
| 83 |
|
| 84 |
The earlier private-storage limit blocked private file downloads after the
|
|
|
|
| 6 |
|
| 7 |
| Repo | Visibility | Evidence |
|
| 8 |
|---|---|---|
|
| 9 |
+
| `RMDWLLC/kaiju-coder-7-adapter` | public | Final visible SHA `5016ab9e5f32ca3f94d49a4dbed65de2729bd6ce`; public visibility verified after 2026-06-03 paid API evidence refresh. |
|
| 10 |
| `RMDWLLC/kaiju-coder-7-opencode` | public | Final visible SHA `3c9c75416ffb41645a1a959beb99baeff6972fb8`; public visibility and OpenCode installer dry-run verified. |
|
| 11 |
| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Uploaded at commit `6d7449a3ffac68ed1d591c57b044ba599cee8b11`; public visibility verified. |
|
| 12 |
+
| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `00ba85985102a14838dbb8a5692d9a75ce9da15a`; public metadata reports `private: false`. |
|
| 13 |
|
| 14 |
These SHAs are a point-in-time release evidence snapshot. Uploading this
|
| 15 |
evidence file itself creates another metadata commit, so use `hf models info`
|
|
|
|
| 78 |
- The downloaded OpenCode helper installer dry-run passed and included the
|
| 79 |
loop guard.
|
| 80 |
- Merged model metadata reports `private: false`, SHA
|
| 81 |
+
`00ba85985102a14838dbb8a5692d9a75ce9da15a`, and lists all `14`
|
| 82 |
safetensors shards.
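
The three facts in that bullet (public visibility, SHA, shard count) can be re-checked from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; `check_release` and `fetch_and_check` are hypothetical helper names and are not part of this repo, and the shard file names in the offline stub are made up for illustration.

```python
from types import SimpleNamespace

EXPECTED_SHA = "00ba85985102a14838dbb8a5692d9a75ce9da15a"
EXPECTED_SHARDS = 14


def check_release(info, expected_sha=EXPECTED_SHA, expected_shards=EXPECTED_SHARDS):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if info.private:
        problems.append("repo is private")
    if info.sha != expected_sha:
        problems.append(f"sha is {info.sha}, expected {expected_sha}")
    shards = [
        s.rfilename
        for s in (info.siblings or [])
        if s.rfilename.endswith(".safetensors")
    ]
    if len(shards) != expected_shards:
        problems.append(
            f"found {len(shards)} safetensors shards, expected {expected_shards}"
        )
    return problems


def fetch_and_check(repo_id="RMDWLLC/kaiju-coder-7"):
    """Fetch live metadata from the Hub (needs network) and run the checks."""
    # Imported lazily so check_release stays usable without the package.
    from huggingface_hub import HfApi

    return check_release(HfApi().model_info(repo_id))


# Offline illustration with a stand-in object shaped like the Hub metadata.
# The shard names below are hypothetical.
stub = SimpleNamespace(
    private=False,
    sha=EXPECTED_SHA,
    siblings=[
        SimpleNamespace(rfilename=f"model-{i:05d}-of-00014.safetensors")
        for i in range(1, 15)
    ],
)
print(check_release(stub))
```

Because these SHAs are a point-in-time snapshot, a SHA mismatch from `fetch_and_check()` after a later metadata commit is expected and does not by itself indicate a broken upload.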
|
| 83 |
|
| 84 |
The earlier private-storage limit blocked private file downloads after the
|
LOCAL_TEST_INSTRUCTIONS.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Local Test Instructions
|
| 2 |
|
| 3 |
-
Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the
|
| 4 |
|
| 5 |
## Run The Local Release-Candidate Gate
|
| 6 |
|
|
@@ -24,26 +24,32 @@ KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.
|
|
| 24 |
|
| 25 |
## Start Kaiju Coder 7 Serving
|
| 26 |
|
| 27 |
-
Use this for the current model-side candidate:
|
| 28 |
|
| 29 |
```bash
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
./scripts/start-qwen36-merged-
|
| 35 |
```
|
| 36 |
|
| 37 |
Confirm readiness:
|
| 38 |
|
| 39 |
```bash
|
| 40 |
-
curl http://100.109.109.14:
|
| 41 |
```
|
| 42 |
|
| 43 |
The high-context `32768` target has benchmark evidence in
|
| 44 |
-
`release/SERVING_BENCHMARKS.md`, but the current
|
| 45 |
-
|
| 46 |
-
smoke work.
|
| 47 |
|
| 48 |
## Prepare Merged-Model Hugging Face Metadata
|
| 49 |
|
|
@@ -82,7 +88,7 @@ python3 scripts/run_kaiju_api_harness_smoke.py
|
|
| 82 |
|
| 83 |
```bash
|
| 84 |
python3 evals/run_openai_compat_smoke.py \
|
| 85 |
-
--base-url http://100.109.109.14:
|
| 86 |
--model kaiju-coder-7 \
|
| 87 |
--tasks evals/tasks/smoke.jsonl \
|
| 88 |
--max-tasks 1 \
|
|
@@ -100,7 +106,7 @@ evals pass at acceptable latency:
|
|
| 100 |
|
| 101 |
```bash
|
| 102 |
python3 evals/run_openai_compat_smoke.py \
|
| 103 |
-
--base-url http://100.109.109.14:
|
| 104 |
--model kaiju-coder-7 \
|
| 105 |
--tasks evals/tasks/business-owner-v18-comparison.jsonl \
|
| 106 |
--timeout 900 \
|
|
|
|
| 1 |
# Kaiju Coder 7 Local Test Instructions
|
| 2 |
|
| 3 |
+
Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
|
| 4 |
|
| 5 |
## Run The Local Release-Candidate Gate
|
| 6 |
|
|
|
|
| 24 |
|
| 25 |
## Start Kaiju Coder 7 Serving
|
| 26 |
|
| 27 |
+
Use this for the fastest current model-side candidate:
|
| 28 |
|
| 29 |
```bash
|
| 30 |
+
KAIJU_VLLM_CONTEXT=16384 \
|
| 31 |
+
KAIJU_VLLM_QUANTIZATION=bitsandbytes \
|
| 32 |
+
KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
|
| 33 |
+
KAIJU_VLLM_GPU_UTIL=0.90 \
|
| 34 |
+
./scripts/start-qwen36-merged-vllm.sh
|
| 35 |
```
|
| 36 |
|
| 37 |
Confirm readiness:
|
| 38 |
|
| 39 |
```bash
|
| 40 |
+
curl http://100.109.109.14:18084/v1/models
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
Then keep the Mac-side fast proxy pointed at that vLLM endpoint:
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
|
| 47 |
+
python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
|
| 48 |
```
|
| 49 |
|
| 50 |
The high-context `32768` target has benchmark evidence in
|
| 51 |
+
`release/SERVING_BENCHMARKS.md`, but the current speed/default path is 16k
|
| 52 |
+
runtime-quantized vLLM plus the local fast proxy.
|
|
|
|
| 53 |
|
| 54 |
## Prepare Merged-Model Hugging Face Metadata
|
| 55 |
|
|
|
|
| 88 |
|
| 89 |
```bash
|
| 90 |
python3 evals/run_openai_compat_smoke.py \
|
| 91 |
+
--base-url http://100.109.109.14:18084/v1 \
|
| 92 |
--model kaiju-coder-7 \
|
| 93 |
--tasks evals/tasks/smoke.jsonl \
|
| 94 |
--max-tasks 1 \
|
|
|
|
| 106 |
|
| 107 |
```bash
|
| 108 |
python3 evals/run_openai_compat_smoke.py \
|
| 109 |
+
--base-url http://100.109.109.14:18084/v1 \
|
| 110 |
--model kaiju-coder-7 \
|
| 111 |
--tasks evals/tasks/business-owner-v18-comparison.jsonl \
|
| 112 |
--timeout 900 \
|
PUBLIC_TESTING_QUICKSTART.md
CHANGED
|
@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
|
|
| 19 |
```bash
|
| 20 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 21 |
cd kaiju-coder-7-opencode
|
| 22 |
-
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:
|
| 23 |
```
|
| 24 |
|
| 25 |
Then run OpenCode inside the project you want to edit:
|
|
@@ -65,23 +65,31 @@ the server to expose:
|
|
| 65 |
|
| 66 |
```text
|
| 67 |
model id: kaiju-coder-7
|
| 68 |
-
base URL: http://127.0.0.1:
|
| 69 |
context: 16384
|
| 70 |
```
|
| 71 |
|
| 72 |
Then install the OpenCode helper with:
|
| 73 |
|
| 74 |
```bash
|
| 75 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 76 |
cd kaiju-coder-7-opencode
|
| 77 |
-
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:
|
| 78 |
```
|
| 79 |
|
| 80 |
### Path 3: Runtime-Quantized Local Candidate
|
| 81 |
|
| 82 |
Use this only if you are comfortable with advanced serving setups. The current
|
| 83 |
-
working quantized option is a runtime bitsandbytes recipe
|
| 84 |
-
|
| 85 |
|
| 86 |
```bash
|
| 87 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
|
|
@@ -115,9 +123,12 @@ Expected result:
|
|
| 115 |
- Public model id: `kaiju-coder-7`
|
| 116 |
- OpenCode context: `16384`
|
| 117 |
- Output cap for public testing: `2500`
|
| 118 |
- Current reliable product path: model plus deterministic business-owner
|
| 119 |
-
harness plus verifier
|
| 120 |
-
- Raw multi-file OpenCode generation: still too slow for broad paid
|
| 121 |
- Paid API: not public until launch preflight passes
|
| 122 |
|
| 123 |
## What Not To Claim Yet
|
|
@@ -134,15 +145,21 @@ Do claim:
|
|
| 134 |
- Kaiju Coder 7 has a working local/OpenCode release candidate
|
| 135 |
- the current tested OpenCode default is 16k context
|
| 136 |
- the helper package includes a lean agent and compaction loop guard
|
|
| 137 |
- the paid API scaffold has tests and a launch preflight, but is not yet public
|
| 138 |
- the packaged public smoke verifies a fresh OpenCode one-file write before
|
| 139 |
public claims are refreshed
|
| 140 |
|
| 141 |
## Current Blockers Before Public Release
|
| 142 |
|
| 143 |
- Hugging Face repo creation still requires a write-capable token or namespace.
|
| 144 |
- Full merged model upload has not completed; the merged folder must first have
|
| 145 |
the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
|
| 146 |
- Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
|
| 147 |
secret verification, Stripe webhook staging evidence, staging traffic, latency
|
| 148 |
evidence, and rollback proof.
|
|
|
|
| 19 |
```bash
|
| 20 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 21 |
cd kaiju-coder-7-opencode
|
| 22 |
+
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
|
| 23 |
```
|
| 24 |
|
| 25 |
Then run OpenCode inside the project you want to edit:
|
|
|
|
| 65 |
|
| 66 |
```text
|
| 67 |
model id: kaiju-coder-7
|
| 68 |
+
base URL: http://127.0.0.1:18084/v1
|
| 69 |
context: 16384
|
| 70 |
```
|
| 71 |
|
| 72 |
+
For the fastest OpenCode behavior, run the bundled fast proxy in a separate
|
| 73 |
+
terminal and point OpenCode at the proxy:
|
| 74 |
+
|
| 75 |
+
```bash
|
| 76 |
+
KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
|
| 77 |
+
python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
Then install the OpenCode helper with:
|
| 81 |
|
| 82 |
```bash
|
| 83 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 84 |
cd kaiju-coder-7-opencode
|
| 85 |
+
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
|
| 86 |
```
|
| 87 |
|
| 88 |
### Path 3: Runtime-Quantized Local Candidate
|
| 89 |
|
| 90 |
Use this only if you are comfortable with advanced serving setups. The current
|
| 91 |
+
working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
|
| 92 |
+
has been converted, but it is still a candidate until runtime smoke passes.
|
| 93 |
|
| 94 |
```bash
|
| 95 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
|
|
|
|
| 123 |
- Public model id: `kaiju-coder-7`
|
| 124 |
- OpenCode context: `16384`
|
| 125 |
- Output cap for public testing: `2500`
|
| 126 |
+
- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
|
| 127 |
- Current reliable product path: model plus deterministic business-owner
|
| 128 |
+
harness/router plus verifier
|
| 129 |
+
- Raw multi-file OpenCode generation: still too slow for broad paid claims;
|
| 130 |
+
useful for testing, but paid API claims should favor harnessed product
|
| 131 |
+
workflows until broader latency gates pass
|
| 132 |
- Paid API: not public until launch preflight passes
|
| 133 |
|
| 134 |
## What Not To Claim Yet
|
|
|
|
| 145 |
- Kaiju Coder 7 has a working local/OpenCode release candidate
|
| 146 |
- the current tested OpenCode default is 16k context
|
| 147 |
- the helper package includes a lean agent and compaction loop guard
|
| 148 |
+
- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
|
| 149 |
+
non-thinking generation
|
| 150 |
- the paid API scaffold has tests and a launch preflight, but is not yet public
|
| 151 |
- the packaged public smoke verifies a fresh OpenCode one-file write before
|
| 152 |
public claims are refreshed
|
| 153 |
+
- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
|
| 154 |
+
evidence until runtime smoke passes
|
| 155 |
|
| 156 |
## Current Blockers Before Public Release
|
| 157 |
|
| 158 |
- Hugging Face repo creation still requires a write-capable token or namespace.
|
| 159 |
- Full merged model upload has not completed; the merged folder must first have
|
| 160 |
the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
|
| 161 |
+
- The GGUF Q8_0 candidate still needs a runtime smoke before public
|
| 162 |
+
quantized-weights upload.
|
| 163 |
- Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
|
| 164 |
secret verification, Stripe webhook staging evidence, staging traffic, latency
|
| 165 |
evidence, and rollback proof.
|
QUANTIZATION_PLAN.md
CHANGED
|
@@ -54,6 +54,44 @@ Findings on 2026-06-03:
|
|
| 54 |
`/tmp/kaiju-opencode-quantized-smoke/hello.txt` with exactly
|
| 55 |
`Kaiju Coder 7 quantized runtime ok`.
|
| 56 |
|
| 57 |
## Candidate Order
|
| 58 |
|
| 59 |
1. **FP8/AWQ-style GPU serving candidate**
|
|
@@ -65,7 +103,8 @@ Findings on 2026-06-03:
|
|
| 65 |
|
| 66 |
2. **GGUF/llama.cpp candidate**
|
| 67 |
- Best for broad local distribution if the architecture converts cleanly.
|
| 68 |
-
-
|
| 69 |
|
| 70 |
3. **MLX candidate**
|
| 71 |
- Best for Apple Silicon users if conversion supports this architecture.
|
|
@@ -90,7 +129,4 @@ path, but not as a public quantized-weights release.
|
|
| 90 |
|
| 91 |
## Next Concrete Step
|
| 92 |
|
| 93 |
-
Create a pinned Docker/UV quantization environment on Gojira-B with the
|
| 94 |
-
Qwen3.5-capable Transformers/runtime stack plus one persistent-weight
|
| 95 |
-
quantization package at a time. Do not upload a quantized-weights repo until a
|
| 96 |
-
smoke-tested persisted artifact exists.
|
|
|
|
| 54 |
`/tmp/kaiju-opencode-quantized-smoke/hello.txt` with exactly
|
| 55 |
`Kaiju Coder 7 quantized runtime ok`.
|
| 56 |
|
| 57 |
+
Second persisted-quantization probe:
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
./scripts/probe-gojira-b-persisted-quantization.sh
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
Findings on 2026-06-03:
|
| 64 |
+
|
| 65 |
+
- The active nightly vLLM stack recognizes `Qwen3_5Config` / `qwen3_5`.
|
| 66 |
+
- Normal dependency installs for AWQ/GPTQ/llmcompressor can break that
|
| 67 |
+
Qwen3.5-capable Transformers stack, so they are not safe to run casually in
|
| 68 |
+
the serving image.
|
| 69 |
+
- `autoawq` installed but importing `awq` failed against the current
|
| 70 |
+
Transformers activation API.
|
| 71 |
+
- `auto-gptq` failed during build isolation because Torch was not visible to
|
| 72 |
+
the isolated build step.
|
| 73 |
+
- `llmcompressor --no-deps` preserved Qwen3.5 config support, but import still
|
| 74 |
+
needs a pinned supporting dependency set. This remains the next best GPU
|
| 75 |
+
persisted-weight path after a dedicated environment is built.
|
| 76 |
+
- `llama.cpp` support includes `Qwen3_5ForConditionalGeneration`, and Q8_0
|
| 77 |
+
conversion dry-run passed.
|
| 78 |
+
|
| 79 |
+
Persisted GGUF conversion:
|
| 80 |
+
|
| 81 |
+
```bash
|
| 82 |
+
./scripts/run-gojira-b-kaiju-gguf-convert.sh
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
- Output:
|
| 86 |
+
`/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
|
| 87 |
+
- Size: `27G`
|
| 88 |
+
- SHA256:
|
| 89 |
+
`596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
|
| 90 |
+
- Evidence:
|
| 91 |
+
`runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
|
| 92 |
+
- Release status: converted, runtime smoke still required before public
|
| 93 |
+
quantized-weights upload.
|
| 94 |
+
|
| 95 |
## Candidate Order
|
| 96 |
|
| 97 |
1. **FP8/AWQ-style GPU serving candidate**
|
|
|
|
| 103 |
|
| 104 |
2. **GGUF/llama.cpp candidate**
|
| 105 |
- Best for broad local distribution if the architecture converts cleanly.
|
| 106 |
+
- Current state: Q8_0 converted successfully on Gojira-B.
|
| 107 |
+
- Publish only if a real runtime smoke test passes.
|
| 108 |
|
| 109 |
3. **MLX candidate**
|
| 110 |
- Best for Apple Silicon users if conversion supports this architecture.
|
|
|
|
| 129 |
|
| 130 |
## Next Concrete Step
|
| 131 |
|
| 132 |
+
Smoke-test the GGUF Q8_0 candidate next. After that, create a pinned Docker/UV quantization environment on Gojira-B with the Qwen3.5-capable Transformers/runtime stack, adding one persistent-weight GPU quantization package at a time. Do not upload a quantized-weights repo until a smoke-tested persisted artifact exists.
|
SERVING_BENCHMARKS.md
CHANGED
|
@@ -323,6 +323,7 @@ Runs:
|
|
| 323 |
- `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
|
| 324 |
- `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
|
| 325 |
- `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
|
| 326 |
|
| 327 |
| Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
|
| 328 |
| --- | ---: | --- | --- | ---: | ---: | ---: |
|
|
@@ -332,6 +333,8 @@ Runs:
|
|
| 332 |
| vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
|
| 333 |
| vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
|
| 334 |
| vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
|
| 335 |
|
| 336 |
Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
|
| 337 |
load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
|
|
@@ -356,3 +359,98 @@ restarted and re-confirmed. Treat vLLM bitsandbytes as the current working
|
|
| 356 |
quantized local candidate for advanced GPU users and future paid API speed
|
| 357 |
experiments. It now has direct identity/code/business-doc evidence plus an
|
| 358 |
OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
|
|
|
|
| 323 |
- `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
|
| 324 |
- `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
|
| 325 |
- `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
|
| 326 |
+
- `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
|
| 327 |
|
| 328 |
| Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
|
| 329 |
| --- | ---: | --- | --- | ---: | ---: | ---: |
|
|
|
|
| 333 |
| vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
|
| 334 |
| vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
|
| 335 |
| vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
|
| 336 |
+
| vLLM bitsandbytes | 16384 | code_patch | True | 24.97 | 997 | 39.924 |
|
| 337 |
+
| vLLM bitsandbytes | 16384 | business_doc | True | 34.46 | 1615 | 46.874 |
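
The `Chars/s` column appears to be output characters divided by wall-clock seconds. A minimal sketch that recomputes it from the rounded values shown in the rows above; `chars_per_second` is a hypothetical helper, and the assumption that the benchmark script divides unrounded timings (which would explain small third-decimal differences) is not confirmed by the source.

```python
# (prompt, seconds, chars, reported chars/s) copied from the table rows above.
rows = [
    ("code_patch", 11.3, 416, 36.814),
    ("business_doc", 53.44, 1610, 30.127),
    ("identity", 19.65, 26, 1.323),
    ("code_patch", 24.97, 997, 39.924),
    ("business_doc", 34.46, 1615, 46.874),
]


def chars_per_second(chars: int, seconds: float) -> float:
    """Throughput in output characters per wall-clock second."""
    return chars / seconds


for prompt, seconds, chars, reported in rows:
    recomputed = chars_per_second(chars, seconds)
    print(f"{prompt:12s} reported={reported:7.3f} recomputed={recomputed:7.3f}")
```

The `identity` row shows why this metric is only meaningful for longer outputs: with `26` characters, fixed request overhead dominates the elapsed time.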
|
| 338 |
|
| 339 |
Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
|
| 340 |
load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
|
|
|
|
| 359 |
quantized local candidate for advanced GPU users and future paid API speed
|
| 360 |
experiments. It now has direct identity/code/business-doc evidence plus an
|
| 361 |
OpenCode one-file smoke, but it is not a persisted quantized-weights repo.
|
| 362 |
+
|
| 363 |
+
## 2026-06-03 Fast Proxy And Website Harness Speed Pass
|
| 364 |
+
|
| 365 |
+
The current speed profile keeps runtime-quantized vLLM active on Gojira-B port
|
| 366 |
+
`18084` and routes OpenCode through the local fast proxy at
|
| 367 |
+
`http://127.0.0.1:18181/v1`. The proxy preserves OpenCode tool-call streaming
|
| 368 |
+
while forcing `thinking=false`, model id `kaiju-coder-7`, and bounded output
|
| 369 |
+
budgets.
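
The proxy behavior described above amounts to rewriting each chat request before forwarding it. The following is a hedged sketch of that idea only, not the logic in `scripts/kaiju_opencode_fast_proxy.py`: `rewrite_request` is a hypothetical name, the `2500` cap is borrowed from the public-testing output cap, and using `chat_template_kwargs.enable_thinking` to switch thinking off is an assumption about how the real proxy does it.

```python
import copy

MODEL_ID = "kaiju-coder-7"
MAX_OUTPUT_TOKENS = 2500  # assumption: reuse the public-testing output cap


def rewrite_request(body: dict) -> dict:
    """Return a copy of an OpenAI-style chat request with proxy policy applied."""
    out = copy.deepcopy(body)

    # Always address the upstream by the public model id.
    out["model"] = MODEL_ID

    # Bound the output budget: keep smaller positive requests, cap the rest.
    requested = out.get("max_tokens")
    if not isinstance(requested, int) or requested <= 0 or requested > MAX_OUTPUT_TOKENS:
        out["max_tokens"] = MAX_OUTPUT_TOKENS

    # Assumed mechanism for non-thinking generation on Qwen-style templates.
    kwargs = dict(out.get("chat_template_kwargs") or {})
    kwargs["enable_thinking"] = False
    out["chat_template_kwargs"] = kwargs

    # messages, tools, tool_choice and stream are deliberately left untouched
    # so that tool-call streaming reaches OpenCode unchanged.
    return out
```

Working on a deep copy keeps the caller's request intact, which makes it easier to log the original alongside the rewritten payload when debugging.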
|
| 370 |
+
|
| 371 |
+
Active endpoint checks:
|
| 372 |
+
|
| 373 |
+
- Local fast proxy health: `http://127.0.0.1:18181/health`
|
| 374 |
+
- Upstream vLLM models: `http://100.109.109.14:18084/v1/models`
|
| 375 |
+
- Upstream reports `kaiju-coder-7` with `max_model_len=16384`
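
The upstream check in the list above can be scripted with the standard library. A minimal sketch, assuming the server returns an OpenAI-style model list whose entries carry a `max_model_len` field (as the upstream report above suggests); `find_model` and `check_upstream` are hypothetical helper names.

```python
import json
import urllib.request


def find_model(payload: dict, model_id: str = "kaiju-coder-7"):
    """Return the matching entry from a /v1/models payload, or None."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


def check_upstream(base_url: str, expected_len: int = 16384, timeout: float = 10.0) -> bool:
    """Query <base_url>/models and confirm the model id and context length."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    entry = find_model(payload)
    return entry is not None and entry.get("max_model_len") == expected_len
```

For example, `check_upstream("http://100.109.109.14:18084/v1")` would be the scripted equivalent of the upstream models check, run from a machine that can reach that address.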
|
| 376 |
+
|
| 377 |
+
Fresh direct vLLM benchmark:
|
| 378 |
+
|
| 379 |
+
- Run: `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
|
| 380 |
+
- Identity: `19.48s`
|
| 381 |
+
- Code patch: `24.97s`, `997` chars
|
| 382 |
+
- Business doc: `34.46s`, `1,615` chars
|
| 383 |
+
|
| 384 |
+
Fresh OpenCode smoke through the local fast proxy:
|
| 385 |
+
|
| 386 |
+
- Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
|
| 387 |
+
- Result: passed in about `23.5s`, wrote the exact requested file.
|
| 388 |
+
- Packaged public verifier, run with the exact-content agent rule in place:
|
| 389 |
+
`runs/public-opencode-smoke/20260603T232928Z/summary.md`, `4/4`
|
| 390 |
+
passed through `http://127.0.0.1:18181/v1`.
|
| 391 |
+
|
| 392 |
+
Website harness/router speed pass:
|
| 393 |
+
|
| 394 |
+
- Direct website harness command: `python3 scripts/run_kaiju_website_harness.py --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
|
| 395 |
+
- Direct website harness result: `runs/harness/website-speed-pass/avery-stone-vllm.html`, `9,257` chars, `7.31s`
|
| 396 |
+
- Router command: `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
|
| 397 |
+
- Router artifact: `runs/router-speed-pass/20260603T223731Z-website-build-a-premium-one-page-website-for-avery-stone-construction-a-reside/index.html`
|
| 398 |
+
- Router result: passed in `7.20s`; checks covered complete HTML, required sections, external images, responsive CSS, no lorem ipsum, and manifest write.
|
| 399 |
+
- Router through the installed local proxy: `runs/router-speed-pass/20260603T224328Z-website-build-a-premium-one-page-website-for-bennett-family-dental-in-charlott/index.html`
|
| 400 |
+
- Proxy router result: passed in `4.67s`; preserved explicit CTA `Schedule a Visit`, inferred `dental`, and passed the same complete-HTML/static checks.
|
| 401 |
+
|
| 402 |
+
Updated recommendation: for speed-sensitive OpenCode and paid workflow testing,
|
| 403 |
+
use vLLM bitsandbytes plus the local fast proxy as the active default. Keep
|
| 404 |
+
SGLang as fallback/historical evidence, not the fastest current path. For
|
| 405 |
+
websites and business-owner packs, prefer the deterministic router/harness path
|
| 406 |
+
over raw long-form HTML generation.
|
| 407 |
+
|
| 408 |
+
Public business-owner demo pack through the active fast proxy:
|
| 409 |
+
|
| 410 |
+
```bash
|
| 411 |
+
python3 scripts/run_kaiju_public_demo_pack.py \
|
| 412 |
+
--openai-base-url http://127.0.0.1:18181/v1 \
|
| 413 |
+
--model kaiju-coder-7 \
|
| 414 |
+
--planner-timeout 90
|
| 415 |
+
```
|
| 416 |
+
|
| 417 |
+
Run: `runs/public-demo-pack/20260603T232534Z/summary.md`
|
| 418 |
+
|
| 419 |
+
| Task | Result | Seconds | Changed files |
|
| 420 |
+
| --- | --- | ---: | ---: |
|
| 421 |
+
| Website | Passed | 24.59 | 2 |
|
| 422 |
+
| Owner AI company pack | Passed | 29.99 | 19 |
|
| 423 |
+
| Stripe safety plan | Passed | 9.93 | 2 |
|
| 424 |
+
| CSV parser artifact | Passed | 19.93 | 2 |
|
| 425 |
+
|
| 426 |
+
Total: `4/4` passed in `84.43s`.
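
The total can be cross-checked against the per-task rows. A small sketch using the rounded values from the table; summing them gives about `84.44`, within `0.01s` of the reported total, which is consistent with the summary adding unrounded timings (an assumption, not something the source states).

```python
# Per-task seconds copied from the table above.
task_seconds = {
    "Website": 24.59,
    "Owner AI company pack": 29.99,
    "Stripe safety plan": 9.93,
    "CSV parser artifact": 19.93,
}
REPORTED_TOTAL = 84.43

recomputed_total = sum(task_seconds.values())
difference = abs(recomputed_total - REPORTED_TOTAL)

print(f"tasks={len(task_seconds)} recomputed={recomputed_total:.2f} "
      f"reported={REPORTED_TOTAL:.2f} difference={difference:.2f}")
```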
|
| 427 |
+
|
| 428 |
+
## Persisted GGUF Q8_0 Candidate
|
| 429 |
+
|
| 430 |
+
The dedicated persisted-quantization pass found that normal AWQ/GPTQ installs
|
| 431 |
+
were not clean against the Qwen3.5-capable serving stack as of 2026-06-03, while
|
| 432 |
+
`llama.cpp` conversion support includes `Qwen3_5ForConditionalGeneration`.
|
| 433 |
+
|
| 434 |
+
Command:
|
| 435 |
+
|
| 436 |
+
```bash
|
| 437 |
+
./scripts/probe-gojira-b-persisted-quantization.sh
|
| 438 |
+
./scripts/run-gojira-b-kaiju-gguf-convert.sh
|
| 439 |
+
```
|
| 440 |
+
|
| 441 |
+
Result:
|
| 442 |
+
|
| 443 |
+
- Artifact:
|
| 444 |
+
`/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
|
| 445 |
+
- Size: `27G`
|
| 446 |
+
- SHA256:
|
| 447 |
+
`596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
|
| 448 |
+
- Conversion log:
|
| 449 |
+
`runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
|
| 450 |
+
- Runtime status: candidate only; direct GGUF runtime smoke still required
|
| 451 |
+
before publishing quantized weights.
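
Before any runtime smoke, it is worth confirming that the file under test is the artifact recorded above. A minimal sketch that hashes a large file in chunks and compares it with the recorded SHA256; `sha256_of` and `matches_recorded_hash` are hypothetical helper names, and the chunk size is an arbitrary choice.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of(path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so a 27G artifact never sits in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 equals the recorded release hash."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_recorded_hash(...)` with the artifact path listed above should return `True` only for a byte-identical copy, so it also catches truncated transfers between hosts.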
|
| 452 |
+
|
| 453 |
+
Interpretation: the next real speed improvement for broad public users is not
|
| 454 |
+
another prompt tweak. It is a smoked GGUF or GPU-persisted quantized artifact.
|
| 455 |
+
The fastest currently verified Kaiju Coder 7 path remains vLLM bitsandbytes
|
| 456 |
+
plus the local fast proxy and deterministic website/business harnesses.
|
scripts/check_hf_uploaded_release.py
CHANGED
|
@@ -24,7 +24,7 @@ from typing import Any
|
|
| 24 |
|
| 25 |
MODEL_ID = "kaiju-coder-7"
|
| 26 |
DEFAULT_NAMESPACE = "RMDWLLC"
|
| 27 |
-
DEFAULT_BASE_URL = "http://
|
| 28 |
|
| 29 |
|
| 30 |
@dataclass(frozen=True)
|
|
|
|
| 24 |
|
| 25 |
MODEL_ID = "kaiju-coder-7"
|
| 26 |
DEFAULT_NAMESPACE = "RMDWLLC"
|
| 27 |
+
DEFAULT_BASE_URL = "http://127.0.0.1:18181/v1"
|
| 28 |
|
| 29 |
|
| 30 |
@dataclass(frozen=True)
|