Add files using upload-large-folder tool
- EVAL_SCOREBOARD.md +11 -7
- FINAL_RELEASE_REPORT.md +18 -8
- GOAL_COMPLETION_AUDIT.md +4 -4
- HF_UPLOAD_EVIDENCE.md +9 -8
- LOCAL_TEST_INSTRUCTIONS.md +19 -13
- MERGED_MODEL_RELEASE_MANIFEST.json +1 -1
- PAID_API_READINESS.md +8 -8
- PUBLIC_TESTING_QUICKSTART.md +37 -18
- README.md +8 -2
- SERVING_BENCHMARKS.md +117 -17
EVAL_SCOREBOARD.md
CHANGED
|
@@ -35,7 +35,7 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
-
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
@@ -43,9 +43,13 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 43 |
| Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
|
| 44 |
| Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
|
| 45 |
| Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
|
| 46 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 47 |
-
| Hugging Face
|
| 48 |
-
| Hugging Face merged-model
|
| 49 |
| v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
|
| 50 |
| Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
|
| 51 |
| Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
|
|
@@ -60,11 +64,11 @@ This scoreboard tracks the current release-candidate evidence. Do not publish we
|
|
| 60 |
| v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
|
| 61 |
| Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
|
| 62 |
| Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
|
| 63 |
-
| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending |
|
| 64 |
| Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
|
| 65 |
-
| Human review | Richard reviews artifacts for usefulness, privacy, and sellability |
|
| 66 |
-
| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged
|
| 67 |
|
| 68 |
## Decision Rule
|
| 69 |
|
| 70 |
-
The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should
|
|
|
|
| 35 |
| Kaiju Coder 7 restored 32k OpenCode one-file smoke | `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-32k-final-smoke 'Create hello.txt with exactly: Kaiju Coder 7 final 32k ok'` | Passed; wrote `hello.txt` with exactly `Kaiju Coder 7 final 32k ok` | 2026-06-03 |
|
| 36 |
| Kaiju Coder 7 current restored 16k direct API smoke | `python3 scripts/benchmark_kaiju_serving.py --contexts 16384 --prompts identity --max-tokens 64 --timeout 120` | Passed; latest run `runs/benchmarks/20260603T174545Z-kaiju-coder-7-serving/summary.md`, identity `2.3s`, `26` chars | 2026-06-03 |
|
| 37 |
| Kaiju Coder 7 current restored 16k OpenCode one-file smoke | `mkdir -p /tmp/kaiju-opencode-fresh-public-smoke && opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-fresh-public-smoke --dangerously-skip-permissions 'Create hello.txt with exactly: Kaiju Coder 7 fresh public smoke ok'` | Passed; `/v1/models` returned `kaiju-coder-7`, max model len `16384`; wrote `hello.txt` with exactly `Kaiju Coder 7 fresh public smoke ok` | 2026-06-03 |
|
| 38 |
+
| Kaiju Coder 7 packaged public OpenCode smoke | `python3 scripts/run_kaiju_public_opencode_smoke.py --base-url http://127.0.0.1:18181/v1 --timeout 900` | Passed; latest run `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4` checks passed; installer dry-run, OpenCode `1.15.13`, live 16k model, and exact file written only in the requested temp workspace through the fast proxy | 2026-06-03 |
|
| 39 |
| Kaiju Coder 7 loop-guarded OpenCode install | `python3 scripts/install_kaiju_opencode_profile.py`; `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-opencode-loopguard-smoke --dangerously-skip-permissions 'Create loopguard.txt with exactly: Kaiju Coder 7 loop guard installed'` | Passed; config includes `/Users/richardecholsai7/.config/opencode/kaiju-no-autocontinue.mjs`; wrote `loopguard.txt` with exact requested content and exited cleanly | 2026-06-03 |
|
| 40 |
| Current harnessed OpenCode customer-readiness pack | `python3 scripts/run_kaiju_opencode_customer_pack.py --mode harnessed` | Passed; latest run `runs/opencode-customer-readiness/20260603T185835Z/summary.md`, `4/4` tasks passed and `28/28` required files written, including release provenance and safety review | 2026-06-03 |
|
| 41 |
| Paid API Worker scaffold | `cd gateway/cloudflare-worker && npm run check && npm run preflight` | Passed `16/16` Worker tests and `17` scaffold preflight checks; covers bearer auth, inactive keys, insufficient credits, debit/refund, rate limit before debit, model `kaiju-coder-7` enforcement, stream/thinking/token caps, secret-content rejection without logging, signed Stripe Checkout top-up idempotency, origin-only R2 artifact upload, account-scoped artifact download, guarded Cloudflare resource prep, Wrangler dry-run deploy, sanitized paid-launch evidence template packaging, reviewed Cloudflare bindings template, binding applier guardrails, and sanitized evidence collection helper | 2026-06-03 |
|
|
|
|
| 43 |
| Kaiju Coder 7 runtime-quantized vLLM serve | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed at 8k and 16k; 16k identity `19.51s`, code patch `11.3s`; vLLM log reported about `17.8 GiB` model memory | 2026-06-03 |
|
| 44 |
| Kaiju Coder 7 runtime-quantized business-doc smoke | `KAIJU_VLLM_CONTEXT=16384 KAIJU_VLLM_QUANTIZATION=bitsandbytes KAIJU_VLLM_LOAD_FORMAT=bitsandbytes KAIJU_VLLM_PROMPTS=business_doc KAIJU_VLLM_MAX_TOKENS=768 KAIJU_VLLM_PROMPT_TIMEOUT=420 ./scripts/run-gojira-b-vllm-serving-benchmark.sh` | Passed; business proposal `53.44s`, `1,610` chars, `30.127` chars/s; wrapper restored SGLang after completion | 2026-06-03 |
|
| 45 |
| Kaiju Coder 7 runtime-quantized OpenCode one-file smoke | `bash scripts/run_kaiju_quantized_opencode_smoke.sh` | Passed at 16k after vLLM `--enable-auto-tool-choice`; OpenCode wrote `hello.txt` with exactly `Kaiju Coder 7 quantized runtime ok` | 2026-06-03 |
|
| 46 |
+
| Kaiju Coder 7 fast proxy plus website harness speed pass | `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 ...` and OpenCode through `http://127.0.0.1:18181/v1` | Passed; local fast proxy forwards to vLLM bitsandbytes on `18084`; direct website harness wrote `9,257` chars in `7.31s`; router website passed all checks in `7.20s`; local-proxy router website passed in `4.67s`; public OpenCode smoke through the proxy passed in about `40s` end to end | 2026-06-03 |
|
| 47 |
+
| Persisted quantization support probe | `./scripts/probe-gojira-b-persisted-quantization.sh` | Passed as evidence probe; AWQ/GPTQ normal installs are not clean against the Qwen3.5-capable stack tonight, `llmcompressor --no-deps` preserves config support but needs a pinned dependency env, and `llama.cpp` supports `Qwen3_5ForConditionalGeneration` with Q8_0 dry-run passing | 2026-06-03 |
|
| 48 |
+
| GGUF Q8_0 persisted conversion | `./scripts/run-gojira-b-kaiju-gguf-convert.sh` | Converted candidate at `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`, `27G`, SHA256 `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`; runtime smoke still required before public quantized-weights release | 2026-06-03 |
|
| 49 |
+
| Public business-owner demo pack | `python3 scripts/run_kaiju_public_demo_pack.py --openai-base-url http://127.0.0.1:18181/v1 --model kaiju-coder-7 --planner-timeout 90` | Passed `4/4` through the fast proxy in `64.529s`: website `4.73s`, owner AI company pack `29.85s` with `19` files, Stripe safety plan `9.99s`, CSV parser artifact `19.97s`; run `runs/public-demo-pack/20260603T235009Z/summary.md` | 2026-06-03 |
|
| 50 |
| Hugging Face CLI install/auth check | `hf version && hf auth whoami && hf auth list` | `hf` installed locally at version `1.17.0`; auth user `restokes92`; token name `gojirakiyomikode` | 2026-06-03 |
|
| 51 |
+
| Hugging Face public helper repos | `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public` | Passed `17/17`; public downloads verified for adapter, OpenCode helper, and runtime helper, including installer dry-run, demo runner, and GGUF candidate note | 2026-06-03 |
|
| 52 |
+
| Hugging Face merged-model upload | `KAIJU_HF_NAMESPACE=RMDWLLC KAIJU_HF_UPLOAD_APPLY=1 bash scripts/upload_hf_merged_model_from_gojira_b.sh` | Uploaded public repo `RMDWLLC/kaiju-coder-7`; `hf upload-large-folder` processed `53.8G/53.8G`, `39` files, `14` safetensors shards; metadata reports `private: false` | 2026-06-03 |
|
| 53 |
| v1.8 merged endpoint probe | Direct OpenAI-compatible chat request with top-level `chat_template_kwargs` disabling thinking | Passed; `1,155` visible chars in `60.17s`, normal `content` response | 2026-06-03 |
|
| 54 |
| Kaiju Coder 7 merged focused proposal eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl --max-tasks 1 --max-tokens 1800 ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `1/1` paid-ready, `4.0/4.0`, `4,014` chars, `212.72s` | 2026-06-03 |
|
| 55 |
| Kaiju Coder 7 merged focused Jah credits eval | `python3 evals/run_openai_compat_smoke.py --model kaiju-coder-7 --tasks evals/tasks/business-owner-v18-comparison.jsonl ...` then `python3 evals/score_quality_gate.py <results.jsonl>` | Passed: `4.0/4.0`, `9,718` chars, `566.36s` | 2026-06-03 |
|
|
|
|
| 64 |
| v1.8 merged focused smoke | `python3 evals/run_openai_compat_smoke.py --tasks evals/tasks/business-owner-v18-comparison.jsonl --model kaiju-coder-7 ...` then `python3 evals/score_quality_gate.py` | Passed for proposal rerun and Jah credits backend; broader sweep pending |
|
| 65 |
| Direct commercial eval | No critical failures, scored summary attached | Passed for targeted high-value tasks when using the product harness plus 8k raw website mode; broader task sweep still pending |
|
| 66 |
| Base Qwen comparison | Kaiju beats base Qwen on RMDW/Kiyomi practical tasks | Not yet: raw deterministic identity still matches base; compare broader tasks before model-level improvement claims |
|
| 67 |
+
| GLM comparison | Kaiju is near or above GLM on highest-value business-owner tasks | Pending; required only before superiority claims |
|
| 68 |
| Local inference smoke | OpenAI-compatible endpoint returns usable business-owner artifact | Passed for v1.8 merged SGLang endpoint and product harness |
|
| 69 |
+
| Human review | Richard reviews artifacts for usefulness, privacy, and sellability | Approved for public HF visibility and paid API launch preflight on 2026-06-03 |
|
| 70 |
+
| Release package | Model card, provenance, license notes, eval summary, limitations, Hugging Face draft, completion audit, and run instructions complete | Staged, bundled, uploaded to public HF repos, and verified with public downloads |
|
| 71 |
|
| 72 |
## Decision Rule
|
| 73 |
|
| 74 |
+
The v1.8 adapter is a completed local checkpoint and the merged full model is the current served raw-model path. The business-owner product should be published honestly as Kaiju Coder 7 plus deterministic harness plus verifier, with vLLM bitsandbytes plus the fast proxy as the current speed path. Do not claim raw-weight superiority until broader base/GLM and raw website comparisons pass.
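The v1.8 merged endpoint probe recorded in the scoreboard sends a direct OpenAI-compatible chat request with a top-level `chat_template_kwargs` field that disables thinking. A minimal sketch of such a request body follows. It assumes the Qwen-style `enable_thinking` key (the scoreboard does not name the key), reuses the fast-proxy base URL recorded in the release report, and uses a hypothetical helper name, `build_probe_request`. The sketch only builds the request; nothing is sent.

```python
import json
import urllib.request

# Fast-proxy base URL as recorded in the release report.
BASE_URL = "http://127.0.0.1:18181/v1"


def build_probe_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with thinking disabled."""
    body = {
        "model": "kaiju-coder-7",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Assumption: the served chat template honours the Qwen-style key below.
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_probe_request("Reply with the model name only.")
payload = json.loads(request.data)
```

Against a running server, the request could be sent with `urllib.request.urlopen(request, timeout=120)`; a normal reply carries the visible text in `choices[0].message.content`.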
|
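The GGUF Q8_0 conversion row records a SHA256 digest for the `27G` candidate file. A downloaded copy could be checked against that digest roughly as sketched below. The function names `sha256_of_file` and `matches_expected` are hypothetical and are not part of the release scripts; the digest constant is copied from the scoreboard row.

```python
import hashlib
from pathlib import Path

# Digest recorded in the scoreboard for kaiju-coder-7-Q8_0.gguf.
EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-gigabyte artifact never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hex digests are sometimes upper-cased."""
    return sha256_of_file(path) == expected.lower()
```

A call such as `matches_expected(Path("kaiju-coder-7-Q8_0.gguf"))` returns `True` only when the local file hashes to the recorded digest. A match confirms file integrity only; the scoreboard still requires a runtime smoke before any public quantized-weights release.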
FINAL_RELEASE_REPORT.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
-
Generated: `2026-06-
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
@@ -25,7 +25,7 @@ Stripe live-mode switch and controlled live payment verification.
|
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
| Status | `pass` |
|
| 28 |
-
| Base URL | `http://
|
| 29 |
| Model id | `kaiju-coder-7` |
|
| 30 |
| Max model length | `16384` |
|
| 31 |
| Detail | `` |
|
|
@@ -52,7 +52,7 @@ stability and speed.
|
|
| 52 |
| Small helper repos uploaded | `True` |
|
| 53 |
| Merged model uploaded | `True` |
|
| 54 |
| Merged repo | `RMDWLLC/kaiju-coder-7` |
|
| 55 |
-
| Merged repo SHA | `
|
| 56 |
| Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
|
| 57 |
| Download status | `public downloads verified; no active private-storage blocker recorded` |
|
| 58 |
| Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
|
|
@@ -93,7 +93,7 @@ stability and speed.
|
|
| 93 |
| Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
|
| 94 |
| Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
|
| 95 |
| Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
|
| 96 |
-
| Latest direct API smoke | `runs/benchmarks/
|
| 97 |
| Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
|
| 98 |
| Latest public OpenCode smoke | `runs/public-opencode-smoke` |
|
| 99 |
|
|
@@ -133,7 +133,7 @@ human release review explicitly approves public paid API launch.
|
|
| 133 |
|
| 134 |
## Changed Files
|
| 135 |
|
| 136 |
-
`git status --short` currently reports `
|
| 137 |
|
| 138 |
| State | Path |
|
| 139 |
|---|---|
|
|
@@ -153,8 +153,10 @@ human release review explicitly approves public paid API launch.
|
|
| 153 |
| M | `gateway/cloudflare-worker/src/index.js` |
|
| 154 |
| M | `gateway/cloudflare-worker/test/index.test.js` |
|
| 155 |
| M | `gateway/cloudflare-worker/wrangler.jsonc` |
|
|
|
|
| 156 |
| M | `kaiju_harness/router.py` |
|
| 157 |
| M | `kaiju_harness/verification.py` |
|
|
|
|
| 158 |
| D | `models/README.md` |
|
| 159 |
| D | `models/qwen3.6-27b-base.md` |
|
| 160 |
| D | `models/qwen3.6-27b-fp8.md` |
|
|
@@ -164,14 +166,17 @@ human release review explicitly approves public paid API launch.
|
|
| 164 |
| M | `release/MODEL_CARD_DRAFT.md` |
|
| 165 |
| M | `scripts/build_sft_dataset.py` |
|
| 166 |
| M | `scripts/check-gojira-b-capacity.sh` |
|
|
|
|
| 167 |
| M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
|
| 168 |
| M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
|
| 169 |
| M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
|
| 170 |
| M | `scripts/run_kaiju_api_harness_smoke.py` |
|
|
|
|
| 171 |
| M | `scripts/start-qwen36-lora-sglang.sh` |
|
| 172 |
| M | `scripts/stop-qwen36-lora-sglang.sh` |
|
| 173 |
| M | `scripts/validate_training_data.py` |
|
| 174 |
| M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
|
|
|
|
| 175 |
| ?? | `.opencode/` |
|
| 176 |
| ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
|
| 177 |
| ?? | `datasets/v1.7-targets.json` |
|
|
@@ -196,6 +201,7 @@ human release review explicitly approves public paid API launch.
|
|
| 196 |
| ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
|
| 197 |
| ?? | `release/bundles/` |
|
| 198 |
| ?? | `release/cloudflare-bindings.example.json` |
|
|
|
|
| 199 |
| ?? | `release/hf-release-permission-evidence.example.json` |
|
| 200 |
| ?? | `release/hf-release-permission-evidence.json` |
|
| 201 |
| ?? | `release/huggingface/` |
|
|
@@ -225,17 +231,21 @@ human release review explicitly approves public paid API launch.
|
|
| 225 |
| ?? | `scripts/generate_kaiju_final_report.py` |
|
| 226 |
| ?? | `scripts/gojira-b-ssh-lib.sh` |
|
| 227 |
| ?? | `scripts/install_kaiju_opencode_profile.py` |
|
|
|
|
| 228 |
| ?? | `scripts/make_hf_release_public.sh` |
|
| 229 |
| ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
|
| 230 |
| ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
|
| 231 |
| ?? | `scripts/prepare_hf_release_staging.sh` |
|
| 232 |
| ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
|
| 233 |
| ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
|
|
|
|
| 234 |
| ?? | `scripts/refresh_kaiju_release_evidence.py` |
|
|
|
|
| 235 |
| ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
|
| 236 |
| ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
|
| 237 |
| ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
|
| 238 |
| ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
|
|
|
|
| 239 |
| ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
|
| 240 |
| ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
|
| 241 |
| ?? | `scripts/start-qwen36-merged-sglang.sh` |
|
|
@@ -262,9 +272,9 @@ human release review explicitly approves public paid API launch.
|
|
| 262 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 263 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 264 |
| git status | `git status --short` | 0 |
|
| 265 |
-
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://
|
| 266 |
-
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://
|
| 267 |
-
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://
|
| 268 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 269 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
|
| 270 |
|
|
|
|
| 1 |
# Kaiju Coder 7 Final Release Report
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:53:31Z`
|
| 4 |
|
| 5 |
Product name: `Kaiju Coder 7`
|
| 6 |
Public model id: `kaiju-coder-7`
|
|
|
|
| 25 |
| Field | Value |
|
| 26 |
|---|---|
|
| 27 |
| Status | `pass` |
|
| 28 |
+
| Base URL | `http://127.0.0.1:18181/v1` |
|
| 29 |
| Model id | `kaiju-coder-7` |
|
| 30 |
| Max model length | `16384` |
|
| 31 |
| Detail | `` |
|
|
|
|
| 52 |
| Small helper repos uploaded | `True` |
|
| 53 |
| Merged model uploaded | `True` |
|
| 54 |
| Merged repo | `RMDWLLC/kaiju-coder-7` |
|
| 55 |
+
| Merged repo SHA | `00ba85985102a14838dbb8a5692d9a75ce9da15a` |
|
| 56 |
| Merged upload size | `39 files / 53.8G / 14 safetensors shards recorded` |
|
| 57 |
| Download status | `public downloads verified; no active private-storage blocker recorded` |
|
| 58 |
| Visibility decision | `PUBLIC`; `HF_VISIBILITY_DECISION: PUBLIC` recorded in human review |
|
|
|
|
| 93 |
| Paid API launch evidence template | `release/paid-api-launch-evidence.example.json` |
|
| 94 |
| Cloudflare bindings template | `release/cloudflare-bindings.example.json` |
|
| 95 |
| Cloudflare bindings applier | `scripts/apply_paid_api_cloudflare_bindings.py` |
|
| 96 |
+
| Latest direct API smoke | `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md` |
|
| 97 |
| Latest OpenCode customer pack | `runs/opencode-customer-readiness/20260603T185835Z/summary.md` |
|
| 98 |
| Latest public OpenCode smoke | `runs/public-opencode-smoke` |
|
| 99 |
|
|
|
|
| 133 |
|
| 134 |
## Changed Files
|
| 135 |
|
| 136 |
+
`git status --short` currently reports `126` changed paths.
|
| 137 |
|
| 138 |
| State | Path |
|
| 139 |
|---|---|
|
|
|
|
| 153 |
| M | `gateway/cloudflare-worker/src/index.js` |
|
| 154 |
| M | `gateway/cloudflare-worker/test/index.test.js` |
|
| 155 |
| M | `gateway/cloudflare-worker/wrangler.jsonc` |
|
| 156 |
+
| M | `gateway/gojira-local/server.py` |
|
| 157 |
| M | `kaiju_harness/router.py` |
|
| 158 |
| M | `kaiju_harness/verification.py` |
|
| 159 |
+
| M | `kaiju_harness/website.py` |
|
| 160 |
| D | `models/README.md` |
|
| 161 |
| D | `models/qwen3.6-27b-base.md` |
|
| 162 |
| D | `models/qwen3.6-27b-fp8.md` |
|
|
|
|
| 166 |
| M | `release/MODEL_CARD_DRAFT.md` |
|
| 167 |
| M | `scripts/build_sft_dataset.py` |
|
| 168 |
| M | `scripts/check-gojira-b-capacity.sh` |
|
| 169 |
+
| M | `scripts/check_kaiju_gateway_policy.py` |
|
| 170 |
| M | `scripts/run-gojira-b-qwen36-lora-eval.sh` |
|
| 171 |
| M | `scripts/run-gojira-b-qwen36-lora-sglang-eval.sh` |
|
| 172 |
| M | `scripts/run-gojira-b-qwen36-lora-train.sh` |
|
| 173 |
| M | `scripts/run_kaiju_api_harness_smoke.py` |
|
| 174 |
+
| M | `scripts/run_kaiju_router.py` |
|
| 175 |
| M | `scripts/start-qwen36-lora-sglang.sh` |
|
| 176 |
| M | `scripts/stop-qwen36-lora-sglang.sh` |
|
| 177 |
| M | `scripts/validate_training_data.py` |
|
| 178 |
| M | `scripts/watch-gojira-b-qwen36-lora-train.sh` |
|
| 179 |
+
| M | `tests/test_website_harness.py` |
|
| 180 |
| ?? | `.opencode/` |
|
| 181 |
| ?? | `datasets/candidates/v1.7-rmdw-business-owner-suite.jsonl` |
|
| 182 |
| ?? | `datasets/v1.7-targets.json` |
|
|
|
|
| 201 |
| ?? | `release/UPSTREAM_LICENSE_CHECK.md` |
|
| 202 |
| ?? | `release/bundles/` |
|
| 203 |
| ?? | `release/cloudflare-bindings.example.json` |
|
| 204 |
+
| ?? | `release/gguf/` |
|
| 205 |
| ?? | `release/hf-release-permission-evidence.example.json` |
|
| 206 |
| ?? | `release/hf-release-permission-evidence.json` |
|
| 207 |
| ?? | `release/huggingface/` |
|
|
|
|
| 231 |
| ?? | `scripts/generate_kaiju_final_report.py` |
|
| 232 |
| ?? | `scripts/gojira-b-ssh-lib.sh` |
|
| 233 |
| ?? | `scripts/install_kaiju_opencode_profile.py` |
|
| 234 |
+
| ?? | `scripts/kaiju_opencode_fast_proxy.py` |
|
| 235 |
| ?? | `scripts/make_hf_release_public.sh` |
|
| 236 |
| ?? | `scripts/opencode-kaiju-no-autocontinue.mjs` |
|
| 237 |
| ?? | `scripts/prepare_hf_merged_model_metadata.sh` |
|
| 238 |
| ?? | `scripts/prepare_hf_release_staging.sh` |
|
| 239 |
| ?? | `scripts/prepare_paid_api_cloudflare_resources.sh` |
|
| 240 |
| ?? | `scripts/probe-gojira-b-kaiju-quantization.sh` |
|
| 241 |
+
| ?? | `scripts/probe-gojira-b-persisted-quantization.sh` |
|
| 242 |
| ?? | `scripts/refresh_kaiju_release_evidence.py` |
|
| 243 |
+
| ?? | `scripts/run-gojira-b-kaiju-gguf-convert.sh` |
|
| 244 |
| ?? | `scripts/run-gojira-b-qwen36-lora-merge.sh` |
|
| 245 |
| ?? | `scripts/run-gojira-b-vllm-serving-benchmark.sh` |
|
| 246 |
| ?? | `scripts/run_kaiju_business_owner_rc_smoke.py` |
|
| 247 |
| ?? | `scripts/run_kaiju_opencode_customer_pack.py` |
|
| 248 |
+
| ?? | `scripts/run_kaiju_public_demo_pack.py` |
|
| 249 |
| ?? | `scripts/run_kaiju_public_opencode_smoke.py` |
|
| 250 |
| ?? | `scripts/run_kaiju_quantized_opencode_smoke.sh` |
|
| 251 |
| ?? | `scripts/start-qwen36-merged-sglang.sh` |
|
|
|
|
| 272 |
| git HEAD | `git rev-parse HEAD` | 0 |
|
| 273 |
| git origin/main | `git rev-parse origin/main` | 0 |
|
| 274 |
| git status | `git status --short` | 0 |
|
| 275 |
+
| local readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode local --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 276 |
+
| HF release readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode hf-release --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 277 |
+
| public readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_kaiju_public_release_readiness.py --mode public --json --base-url http://127.0.0.1:18181/v1 --live-timeout 5 --staging-dir /tmp/kaiju-coder-7-hf-staging` | 0 |
|
| 278 |
| HF staging integrity | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_hf_staging_integrity.py --staging-dir /tmp/kaiju-coder-7-hf-staging --require-checksums --json` | 0 |
|
| 279 |
| paid API launch readiness | `/opt/homebrew/opt/python@3.14/bin/python3.14 scripts/check_paid_api_readiness.py --mode launch --json` | 0 |
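The rows above pair each readiness command with its exit code. A minimal sketch of how such rows could be collected, assuming a hypothetical `run_checks` helper (not the repo's actual report generator) and stand-in commands so it runs anywhere:

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (label, argv) pair and return (label, command, exit_code) rows."""
    rows = []
    for label, argv in checks:
        completed = subprocess.run(argv, capture_output=True, text=True)
        rows.append((label, " ".join(argv), completed.returncode))
    return rows


def all_green(rows):
    """True only when every recorded exit code is 0."""
    return all(code == 0 for _, _, code in rows)


# Stand-in commands (assumption): swap in the real `git rev-parse HEAD`
# or readiness-script invocations from the table.
demo_checks = [
    ("always passes", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("always fails", [sys.executable, "-c", "raise SystemExit(3)"]),
]
demo_rows = run_checks(demo_checks)
```

Treating any non-zero exit code as a blocker keeps the table honest: a row only reads `0` when the command itself said so.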
|
| 280 |
|
GOAL_COMPLETION_AUDIT.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
-
Generated: `2026-06-
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
| 7 |
|
| 8 |
-
This audit maps the active Kaiju Coder 7 objective to current evidence
|
| 9 |
|
| 10 |
## Readiness Commands
|
| 11 |
|
|
@@ -28,13 +28,13 @@ This audit maps the active Kaiju Coder 7 objective to current evidence. It is st
|
|
| 28 |
| OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
|
| 29 |
| OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
|
| 30 |
| OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 31 |
-
| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/
|
| 32 |
| Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
|
| 33 |
| Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
| 35 |
| Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
|
| 36 |
| Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
|
| 37 |
-
| Hugging Face | Merged 51GB model repo upload is complete
|
| 38 |
| Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
|
| 39 |
| Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 40 |
| Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
|
|
|
|
| 1 |
# Kaiju Coder 7 Goal Completion Audit
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:53:44Z`
|
| 4 |
|
| 5 |
Overall: `complete`
|
| 6 |
Summary: `18 passed / 0 blocked / 0 manual`
|
| 7 |
|
| 8 |
+
This audit maps the active Kaiju Coder 7 objective to current evidence across local runtime, Hugging Face release, OpenCode, paid API preflight, and remaining honest caveats.
|
| 9 |
|
| 10 |
## Readiness Commands
|
| 11 |
|
|
|
|
| 28 |
| OpenCode | Lean Kaiju-specific OpenCode config/agent minimizes prompt overhead and disables synthetic auto-continue loops. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
|
| 29 |
| OpenCode | opencode -m kaiju/kaiju-coder-7 works from this Mac with the recommended config. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
|
| 30 |
| OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 31 |
+
| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md | |
|
| 32 |
| Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, 32768 and recommends 16k live default | |
|
| 33 |
| Runtime | SGLang and vLLM/practical faster serving path are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
|
| 34 |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents vLLM bitsandbytes runtime candidate and persisted-weights limitation | |
|
| 35 |
| Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
|
| 36 |
| Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
|
| 37 |
+
| Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
|
| 38 |
| Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
|
| 39 |
| Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
|
| 40 |
| Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
|
HF_UPLOAD_EVIDENCE.md
CHANGED
|
@@ -1,15 +1,15 @@
|
|
| 1 |
# Kaiju Coder 7 Hugging Face Upload Evidence
|
| 2 |
|
| 3 |
-
Generated: `2026-06-
|
| 4 |
|
| 5 |
## Uploaded Repos
|
| 6 |
|
| 7 |
| Repo | Visibility | Evidence |
|
| 8 |
|---|---|---|
|
| 9 |
-
| `RMDWLLC/kaiju-coder-7-adapter` | public |
|
| 10 |
-
| `RMDWLLC/kaiju-coder-7-opencode` | public |
|
| 11 |
-
| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public |
|
| 12 |
-
| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `
|
| 13 |
|
| 14 |
These SHAs are a point-in-time release evidence snapshot. Uploading this
|
| 15 |
evidence file itself creates another metadata commit, so use `hf models info`
|
|
@@ -71,14 +71,15 @@ Result:
|
|
| 71 |
- `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
|
| 72 |
- `hf repos settings ... --public` completed for all four repos.
|
| 73 |
- `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
|
| 74 |
-
passed `17/17` checks after the public visibility switch
|
| 75 |
-
|
|
|
|
| 76 |
- The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
|
| 77 |
successfully as public repos.
|
| 78 |
- The downloaded OpenCode helper installer dry-run passed and included the
|
| 79 |
loop guard.
|
| 80 |
- Merged model metadata reports `private: false`, SHA
|
| 81 |
-
`
|
| 82 |
safetensors shards.
|
| 83 |
|
| 84 |
The earlier private-storage limit blocked private file downloads after the
|
|
|
|
| 1 |
# Kaiju Coder 7 Hugging Face Upload Evidence
|
| 2 |
|
| 3 |
+
Generated: `2026-06-03T23:37:20Z`
|
| 4 |
|
| 5 |
## Uploaded Repos
|
| 6 |
|
| 7 |
| Repo | Visibility | Evidence |
|
| 8 |
|---|---|---|
|
| 9 |
+
| `RMDWLLC/kaiju-coder-7-adapter` | public | Refreshed public helper/evidence package commit `943b6fc7e025bbacd8b94275eb4321f6b0ed69c7`; public visibility verified after 2026-06-03 speed and GGUF-candidate pass. |
|
| 10 |
+
| `RMDWLLC/kaiju-coder-7-opencode` | public | Refreshed OpenCode helper commit `032872d88fd799515ac81158e011780e0d6059f6`; public visibility, installer dry-run, and exact-file smoke verified. |
|
| 11 |
+
| `RMDWLLC/kaiju-coder-7-quantized-runtime` | public | Current public helper commit `785f3d758da493e3c435d67ef12c3e1e4d62db1a`; includes runtime bitsandbytes recipe plus GGUF Q8_0 candidate note. |
|
| 12 |
+
| `RMDWLLC/kaiju-coder-7` | public | `hf upload-large-folder` completed successfully, then metadata/evidence refreshed at final visible SHA `00ba85985102a14838dbb8a5692d9a75ce9da15a`; public metadata reports `private: false`. |
|
| 13 |
|
| 14 |
These SHAs are a point-in-time release evidence snapshot. Uploading this
|
| 15 |
evidence file itself creates another metadata commit, so use `hf models info`
|
|
|
|
| 71 |
- `hf auth whoami` returned user `restokes92` with org `RMDWLLC`.
|
| 72 |
- `hf repos settings ... --public` completed for all four repos.
|
| 73 |
- `python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply --require-public`
|
| 74 |
+
passed `17/17` checks after the public visibility switch, after the refreshed
|
| 75 |
+
public helper upload, and again after adding stricter checks for the demo
|
| 76 |
+
runner and GGUF candidate package files.
|
| 77 |
- The adapter, OpenCode helper, and runtime-quantized helper repos downloaded
|
| 78 |
successfully as public repos.
|
| 79 |
- The downloaded OpenCode helper installer dry-run passed and included the
|
| 80 |
loop guard.
|
| 81 |
- Merged model metadata reports `private: false`, SHA
|
| 82 |
+
`00ba85985102a14838dbb8a5692d9a75ce9da15a`, and lists all `14`
|
| 83 |
safetensors shards.
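The metadata check in the last bullet can be sketched in Python. This is an illustration only, not the logic of `scripts/check_hf_uploaded_release.py`: `RepoSnapshot`, `fetch_snapshot`, and `check_snapshot` are hypothetical names, and counting files that end in `.safetensors` is an assumption about how shards are identified. Because recorded SHAs are a point-in-time snapshot, SHA drift is reported as a note rather than a failure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RepoSnapshot:
    sha: Optional[str]
    private: Optional[bool]
    shard_count: int


def fetch_snapshot(repo_id: str) -> RepoSnapshot:
    """Read live metadata from the Hub (needs `huggingface_hub` and network)."""
    from huggingface_hub import HfApi  # lazy import: optional dependency

    info = HfApi().model_info(repo_id)
    shards = [s for s in (info.siblings or []) if s.rfilename.endswith(".safetensors")]
    return RepoSnapshot(sha=info.sha, private=info.private, shard_count=len(shards))


def check_snapshot(
    snapshot: RepoSnapshot, recorded_sha: str, expected_shards: int
) -> Tuple[List[str], List[str]]:
    """Return (failures, notes) for a snapshot against recorded evidence."""
    failures, notes = [], []
    if snapshot.private is not False:
        failures.append("repo is not reported as public")
    if snapshot.shard_count != expected_shards:
        failures.append(
            f"expected {expected_shards} safetensors shards, found {snapshot.shard_count}"
        )
    if snapshot.sha != recorded_sha:
        # Later metadata commits move the head SHA, so this is informational.
        notes.append(f"head SHA moved from {recorded_sha} to {snapshot.sha}")
    return failures, notes
```

Usage would look like `check_snapshot(fetch_snapshot("RMDWLLC/kaiju-coder-7"), recorded_sha, 14)`, with `recorded_sha` taken from the evidence table.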
|
| 84 |
|
| 85 |
The earlier private-storage limit blocked private file downloads after the
|
LOCAL_TEST_INSTRUCTIONS.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# Kaiju Coder 7 Local Test Instructions
|
| 2 |
|
| 3 |
-
Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the
|
| 4 |
|
| 5 |
## Run The Local Release-Candidate Gate
|
| 6 |
|
|
@@ -24,26 +24,32 @@ KAIJU_MERGED_MODEL_DIR=/workspace/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.
|
|
| 24 |
|
| 25 |
## Start Kaiju Coder 7 Serving
|
| 26 |
|
| 27 |
-
Use this for the current model-side candidate:
|
| 28 |
|
| 29 |
```bash
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
./scripts/start-qwen36-merged-
|
| 35 |
```
|
| 36 |
|
| 37 |
Confirm readiness:
|
| 38 |
|
| 39 |
```bash
|
| 40 |
-
curl http://100.109.109.14:
|
| 41 |
```
|
| 42 |
|
| 43 |
The high-context `32768` target has benchmark evidence in
|
| 44 |
-
`release/SERVING_BENCHMARKS.md`, but the current
|
| 45 |
-
|
| 46 |
-
smoke work.
|
| 47 |
|
| 48 |
## Prepare Merged-Model Hugging Face Metadata
|
| 49 |
|
|
@@ -82,7 +88,7 @@ python3 scripts/run_kaiju_api_harness_smoke.py
|
|
| 82 |
|
| 83 |
```bash
|
| 84 |
python3 evals/run_openai_compat_smoke.py \
|
| 85 |
-
--base-url http://100.109.109.14:
|
| 86 |
--model kaiju-coder-7 \
|
| 87 |
--tasks evals/tasks/smoke.jsonl \
|
| 88 |
--max-tasks 1 \
|
|
@@ -100,7 +106,7 @@ evals pass at acceptable latency:
|
|
| 100 |
|
| 101 |
```bash
|
| 102 |
python3 evals/run_openai_compat_smoke.py \
|
| 103 |
-
--base-url http://100.109.109.14:
|
| 104 |
--model kaiju-coder-7 \
|
| 105 |
--tasks evals/tasks/business-owner-v18-comparison.jsonl \
|
| 106 |
--timeout 900 \
|
|
|
|
| 1 |
# Kaiju Coder 7 Local Test Instructions
|
| 2 |
|
| 3 |
+
Use these commands from the repo root. The public release name is Kaiju Coder 7. Internally, this build is backed by the v1.8 adapter under `runs/qwen36-27b-lora-v1.8-business-owner/adapter`. The release-candidate raw model path is the merged full model on Gojira B at `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`. The deterministic harness commands work locally now; the fastest current runtime is vLLM bitsandbytes on Gojira B over Tailscale with the local OpenCode fast proxy.
|
| 4 |
|
| 5 |
## Run The Local Release-Candidate Gate
|
| 6 |
|
|
|
|
| 24 |
|
| 25 |
## Start Kaiju Coder 7 Serving
|
| 26 |
|
| 27 |
+
Use this for the fastest current model-side candidate:
|
| 28 |
|
| 29 |
```bash
|
| 30 |
+
KAIJU_VLLM_CONTEXT=16384 \
|
| 31 |
+
KAIJU_VLLM_QUANTIZATION=bitsandbytes \
|
| 32 |
+
KAIJU_VLLM_LOAD_FORMAT=bitsandbytes \
|
| 33 |
+
KAIJU_VLLM_GPU_UTIL=0.90 \
|
| 34 |
+
./scripts/start-qwen36-merged-vllm.sh
|
| 35 |
```
|
| 36 |
|
| 37 |
Confirm readiness:
|
| 38 |
|
| 39 |
```bash
|
| 40 |
+
curl http://100.109.109.14:18084/v1/models
|
| 41 |
+
```
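The same readiness check can be scripted. A minimal sketch, assuming the endpoint returns the usual OpenAI-compatible `/v1/models` shape (`{"data": [{"id": ...}]}`); `served_model_ids` and `is_ready` are hypothetical helper names, not part of the repo:

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-compatible /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def is_ready(base_url: str, model_id: str = "kaiju-coder-7", timeout: float = 5.0) -> bool:
    """Return True when `base_url`/models answers and lists `model_id`."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors and timeouts are OSError subclasses;
        # malformed JSON raises a ValueError subclass.
        return False
    return model_id in served_model_ids(payload)
```

For the endpoint above this would be called as `is_ready("http://100.109.109.14:18084/v1")`.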
|
| 42 |
+
|
| 43 |
+
Then keep the Mac-side fast proxy pointed at that vLLM endpoint:
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
KAIJU_OPENAI_BASE_URL=http://100.109.109.14:18084/v1 \
|
| 47 |
+
python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
|
| 48 |
```
|
| 49 |
|
| 50 |
The high-context `32768` target has benchmark evidence in
|
| 51 |
+
`release/SERVING_BENCHMARKS.md`, but the current speed/default path is 16k
|
| 52 |
+
runtime-quantized vLLM plus the local fast proxy.
|
|
|
|
| 53 |
|
| 54 |
## Prepare Merged-Model Hugging Face Metadata
|
| 55 |
|
|
|
|
| 88 |
|
| 89 |
```bash
|
| 90 |
python3 evals/run_openai_compat_smoke.py \
|
| 91 |
+
--base-url http://100.109.109.14:18084/v1 \
|
| 92 |
--model kaiju-coder-7 \
|
| 93 |
--tasks evals/tasks/smoke.jsonl \
|
| 94 |
--max-tasks 1 \
|
|
|
|
| 106 |
|
| 107 |
```bash
|
| 108 |
python3 evals/run_openai_compat_smoke.py \
|
| 109 |
+
--base-url http://100.109.109.14:18084/v1 \
|
| 110 |
--model kaiju-coder-7 \
|
| 111 |
--tasks evals/tasks/business-owner-v18-comparison.jsonl \
|
| 112 |
--timeout 900 \
|
MERGED_MODEL_RELEASE_MANIFEST.json
CHANGED
|
@@ -6,6 +6,6 @@
|
|
| 6 |
"notes": [
|
| 7 |
"Local metadata sync only; no Hugging Face upload performed.",
|
| 8 |
"Qwen attribution belongs in README/provenance/license notes, not the product model id.",
|
| 9 |
-
"Public paid API
|
| 10 |
]
|
| 11 |
}
|
|
|
|
| 6 |
"notes": [
|
| 7 |
"Local metadata sync only; no Hugging Face upload performed.",
|
| 8 |
"Qwen attribution belongs in README/provenance/license notes, not the product model id.",
|
| 9 |
+
"Public paid API preflight evidence has passed; real customer charging still requires the deliberate Stripe live-mode switch."
|
| 10 |
]
|
| 11 |
}
|
PAID_API_READINESS.md
CHANGED
|
@@ -152,12 +152,12 @@ python3 scripts/check_paid_api_readiness.py --mode launch
|
|
| 152 |
```
|
| 153 |
|
| 154 |
`check_kaiju_public_release_readiness.py --mode local` is the consolidated
|
| 155 |
-
public-testing readiness command.
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
|
| 162 |
`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
|
| 163 |
the current local/public readiness summaries, launch blockers, changed files,
|
|
@@ -167,8 +167,8 @@ lines.
|
|
| 167 |
|
| 168 |
`check_kaiju_goal_completion.py --write` writes
|
| 169 |
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
|
| 170 |
-
remain
|
| 171 |
-
evidence
|
| 172 |
|
| 173 |
`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
|
| 174 |
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
|
|
|
|
| 152 |
```
|
| 153 |
|
| 154 |
`check_kaiju_public_release_readiness.py --mode local` is the consolidated
|
| 155 |
+
public-testing readiness command. `--mode hf-release` checks the downloadable
|
| 156 |
+
model/helper release, public Hugging Face evidence, and human review while
|
| 157 |
+
keeping live paid charging separate from model publication. `--mode public`
|
| 158 |
+
now passes after public HF verification, live Cloudflare resource evidence,
|
| 159 |
+
Stripe test-mode staging evidence, rollback proof, paid-route latency evidence,
|
| 160 |
+
and human review are complete.
|
| 161 |
|
| 162 |
`generate_kaiju_final_report.py` writes `release/FINAL_RELEASE_REPORT.md` with
|
| 163 |
the current local/public readiness summaries, launch blockers, changed files,
|
|
|
|
| 167 |
|
| 168 |
`check_kaiju_goal_completion.py --write` writes
|
| 169 |
`release/GOAL_COMPLETION_AUDIT.md`, a stricter objective-level audit. It should
|
| 170 |
+
remain green only while the live runtime, public HF evidence, human review, and
|
| 171 |
+
paid API launch evidence continue to pass.
|
| 172 |
|
| 173 |
`refresh_kaiju_release_evidence.py` is a safe local refresh runner. It updates
|
| 174 |
direct API smoke evidence, goal audit, final report, HF staging, local bundle,
|
PUBLIC_TESTING_QUICKSTART.md
CHANGED
|
@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
|
|
| 19 |
```bash
|
| 20 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 21 |
cd kaiju-coder-7-opencode
|
| 22 |
-
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:
|
| 23 |
```
|
| 24 |
|
| 25 |
Then run OpenCode inside the project you want to edit:
|
|
@@ -65,23 +65,31 @@ the server to expose:
|
|
| 65 |
|
| 66 |
```text
|
| 67 |
model id: kaiju-coder-7
|
| 68 |
-
base URL: http://127.0.0.1:
|
| 69 |
context: 16384
|
| 70 |
```
|
| 71 |
|
| 72 |
Then install the OpenCode helper with:
|
| 73 |
|
| 74 |
```bash
|
| 75 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 76 |
cd kaiju-coder-7-opencode
|
| 77 |
-
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:
|
| 78 |
```
|
| 79 |
|
| 80 |
### Path 3: Runtime-Quantized Local Candidate
|
| 81 |
|
| 82 |
Use this only if you are comfortable with advanced serving setups. The current
|
| 83 |
-
working quantized option is a runtime bitsandbytes recipe
|
| 84 |
-
|
| 85 |
|
| 86 |
```bash
|
| 87 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
|
|
@@ -115,10 +123,14 @@ Expected result:
|
|
| 115 |
- Public model id: `kaiju-coder-7`
|
| 116 |
- OpenCode context: `16384`
|
| 117 |
- Output cap for public testing: `2500`
|
|
|
|
| 118 |
- Current reliable product path: model plus deterministic business-owner
|
| 119 |
-
harness plus verifier
|
| 120 |
-
- Raw multi-file OpenCode generation: still too slow for broad paid
|
| 121 |
-
|
| 122 |
|
| 123 |
## What Not To Claim Yet
|
| 124 |
|
|
@@ -134,16 +146,23 @@ Do claim:
|
|
| 134 |
- Kaiju Coder 7 has a working local/OpenCode release candidate
|
| 135 |
- the current tested OpenCode default is 16k context
|
| 136 |
- the helper package includes a lean agent and compaction loop guard
|
|
| 137 |
- the paid API scaffold has tests and a launch preflight, but is not yet public
|
| 138 |
- the packaged public smoke verifies a fresh OpenCode one-file write before
|
| 139 |
public claims are refreshed
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 19 |
```bash
|
| 20 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 21 |
cd kaiju-coder-7-opencode
|
| 22 |
+
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
|
| 23 |
```
|
| 24 |
|
| 25 |
Then run OpenCode inside the project you want to edit:
|
|
|
|
| 65 |
|
| 66 |
```text
|
| 67 |
model id: kaiju-coder-7
|
| 68 |
+
base URL: http://127.0.0.1:18084/v1
|
| 69 |
context: 16384
|
| 70 |
```
|
| 71 |
|
| 72 |
+
For the fastest OpenCode behavior, run the bundled fast proxy in a separate
|
| 73 |
+
terminal and point OpenCode at the proxy:
|
| 74 |
+
|
| 75 |
+
```bash
|
| 76 |
+
KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
|
| 77 |
+
python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
Then install the OpenCode helper with:
|
| 81 |
|
| 82 |
```bash
|
| 83 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
|
| 84 |
cd kaiju-coder-7-opencode
|
| 85 |
+
python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
|
| 86 |
```
|
| 87 |
|
| 88 |
### Path 3: Runtime-Quantized Local Candidate
|
| 89 |
|
| 90 |
Use this only if you are comfortable with advanced serving setups. The current
|
| 91 |
+
working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
|
| 92 |
+
has been converted, but it is still a candidate until runtime smoke passes.
|
| 93 |
|
| 94 |
```bash
|
| 95 |
git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
|
|
|
|
| 123 |
- Public model id: `kaiju-coder-7`
|
| 124 |
- OpenCode context: `16384`
|
| 125 |
- Output cap for public testing: `2500`
|
| 126 |
+
- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
|
| 127 |
- Current reliable product path: model plus deterministic business-owner
|
| 128 |
+
harness/router plus verifier
|
| 129 |
+
- Raw multi-file OpenCode generation: still too slow for broad paid claims;
|
| 130 |
+
useful for testing, but paid API claims should favor harnessed product
|
| 131 |
+
workflows until broader latency gates pass
|
| 132 |
+
- Paid API: not public until launch preflight passes and the Stripe live-mode
|
| 133 |
+
switch is deliberately completed
|
| 134 |
|
| 135 |
## What Not To Claim Yet
|
| 136 |
|
|
|
|
| 146 |
- Kaiju Coder 7 has a working local/OpenCode release candidate
|
| 147 |
- the current tested OpenCode default is 16k context
|
| 148 |
- the helper package includes a lean agent and compaction loop guard
|
| 149 |
+
- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
|
| 150 |
+
non-thinking generation
|
| 151 |
- the paid API scaffold has tests and a launch preflight, but is not yet public
|
| 152 |
- the packaged public smoke verifies a fresh OpenCode one-file write before
|
| 153 |
public claims are refreshed
|
| 154 |
+
- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
|
| 155 |
+
evidence until runtime smoke passes
|
| 156 |
+
|
| 157 |
+
## Remaining Caveats Before Broader Claims
|
| 158 |
+
|
| 159 |
+
- Hugging Face public release repos are uploaded and public under `RMDWLLC`.
|
| 160 |
+
- The GGUF Q8_0 candidate still needs a runtime smoke before public
|
| 161 |
+
quantized-weights upload.
|
| 162 |
+
- Raw multi-file OpenCode generation is still not the public speed story; use
|
| 163 |
+
the deterministic router/harness for websites and business-owner packs.
|
| 164 |
+
- Public paid API launch has approval and preflight evidence, but real customer
|
| 165 |
+
charging still needs a deliberate Stripe live-mode switch and controlled live
|
| 166 |
+
payment verification.
|
| 167 |
+
- Do not claim 32k context as the live default until it is freshly restarted
|
| 168 |
+
and re-confirmed.
|
README.md
CHANGED
|
@@ -108,12 +108,18 @@ Current local harness evidence:
|
|
| 108 |
- Adapter-name-only serving can be base-equivalent.
|
| 109 |
- Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
|
| 110 |
- Dynamic LoRA is not the release serving path for this checkpoint.
|
| 111 |
-
- Kaiju Coder 7 serving config:
|
| 112 |
- v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
|
| 113 |
- v1.8 merged focused eval:
|
| 114 |
- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
|
| 115 |
- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
|
| 116 |
-
- Broader base-Qwen, GLM, and raw website comparisons are still pending
|
|
|
|
| 117 |
|
| 118 |
Sellable-candidate gate:
|
| 119 |
|
|
|
|
| 108 |
- Adapter-name-only serving can be base-equivalent.
|
| 109 |
- Corrected selector `qwen36-27b:kaiju_v18_business_owner` crashes with `LoRA buffer shape torch.Size([8192, 16]) does not match weight shape torch.Size([14336, 16])`.
|
| 110 |
- Dynamic LoRA is not the release serving path for this checkpoint.
|
| 111 |
+
- Kaiju Coder 7 current serving config: vLLM bitsandbytes runtime
|
| 112 |
+
quantization on Gojira B at `http://100.109.109.14:18084/v1`, exposed on
|
| 113 |
+
this Mac through `http://127.0.0.1:18181/v1`, model `kaiju-coder-7`,
|
| 114 |
+
current OpenCode context `16384`. SGLang has historical 32k benchmark
|
| 115 |
+
evidence, but 32k should be freshly restarted and re-confirmed before being
|
| 116 |
+
called the live default.
|
| 117 |
- v1.8 merged endpoint probe: `1,155` visible chars in `60.17s`.
|
| 118 |
- v1.8 merged focused eval:
|
| 119 |
- Proposal rerun: `1/1` paid-ready, `4.0/4.0`, `4,014` chars in `212.72s`.
|
| 120 |
- Jah credits backend: `4.0/4.0`, `9,718` chars in `566.36s`.
|
| 121 |
+
- Broader base-Qwen, GLM, and raw website comparisons are still pending before
|
| 122 |
+
any superiority claims.
|
| 123 |
|
| 124 |
Sellable-candidate gate:
|
| 125 |
|
SERVING_BENCHMARKS.md
CHANGED
|
@@ -6,12 +6,15 @@ The model id must remain `kaiju-coder-7`.
|
|
| 6 |
## Current Live Runtime
|
| 7 |
|
| 8 |
- Host: Gojira-B over Tailscale
|
| 9 |
-
-
|
| 10 |
-
-
|
| 11 |
-
-
|
| 12 |
- Tested high-context target: `32768`
|
| 13 |
-
- Current container: `qwen36-merged-
|
| 14 |
-
- Current caveat: direct raw generation is slow for multi-file OpenCode
|
|
|
|
| 15 |
|
| 16 |
## Benchmark Command
|
| 17 |
|
|
@@ -294,12 +297,11 @@ Run: `runs/benchmarks/20260603T151244Z-kaiju-coder-7-serving/summary.md`
|
|
| 294 |
| vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
|
| 295 |
| vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
|
| 296 |
|
| 297 |
-
Interpretation: vLLM now runs Kaiju Coder 7 at 16k, but it
|
| 298 |
-
faster than SGLang on
|
| 299 |
-
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
quantized-weight testing.
|
| 303 |
|
| 304 |
## vLLM bitsandbytes Runtime-Quantized Candidate
|
| 305 |
|
|
@@ -323,6 +325,7 @@ Runs:
|
|
| 323 |
- `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
|
| 324 |
- `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
|
| 325 |
- `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
|
|
|
|
| 326 |
|
| 327 |
| Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
|
| 328 |
| --- | ---: | --- | --- | ---: | ---: | ---: |
|
|
@@ -332,6 +335,8 @@ Runs:
|
|
| 332 |
| vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
|
| 333 |
| vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
|
| 334 |
| vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
|
| 335 |
|
| 336 |
Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
|
| 337 |
load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
|
|
@@ -350,9 +355,104 @@ bash scripts/run_kaiju_quantized_opencode_smoke.sh
|
|
| 350 |
Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
|
| 351 |
exactly `Kaiju Coder 7 quantized runtime ok`.
|
| 352 |
|
| 353 |
-
Recommendation:
|
| 354 |
-
|
| 355 |
-
restarted and re-confirmed. Treat
|
| 356 |
-
|
| 357 |
-
|
| 358 |
-
|
| 6 |
## Current Live Runtime
|
| 7 |
|
| 8 |
- Host: Gojira-B over Tailscale
|
| 9 |
+
- Local OpenCode base URL: `http://127.0.0.1:18181/v1`
|
| 10 |
+
- Upstream base URL: `http://100.109.109.14:18084/v1`
|
| 11 |
+
- Serving stack: vLLM bitsandbytes runtime quantization behind the Kaiju fast
|
| 12 |
+
proxy
|
| 13 |
+
- Current verified context: `16384`
|
| 14 |
- Tested high-context target: `32768`
|
| 15 |
+
- Current container: `qwen36-merged-vllm-18084`
|
| 16 |
+
- Current caveat: direct raw generation is still slow for multi-file OpenCode
|
| 17 |
+
work; use the deterministic router/harness for public business-owner demos.
|
| 18 |
|
| 19 |
## Benchmark Command
|
| 20 |
|
|
|
|
| 297 |
| vLLM nightly | 16384 | identity | True | 19.99 | 26 | 1.301 |
|
| 298 |
| vLLM nightly | 16384 | code_patch | True | 28.8 | 416 | 14.444 |
|
| 299 |
|
| 300 |
+
Interpretation: unquantized vLLM now runs Kaiju Coder 7 at 16k, but it was not
|
| 301 |
+
clearly faster than SGLang on these smoke prompts. This is historical fallback
|
| 302 |
+
evidence. The later bitsandbytes vLLM path plus fast proxy is the active speed
|
| 303 |
+
path. Keep the live/default OpenCode profile at 16k until 32k is freshly
|
| 304 |
+
re-confirmed.
|
|
|
|
| 305 |
|
| 306 |
## vLLM bitsandbytes Runtime-Quantized Candidate
|
| 307 |
|
|
|
|
| 325 |
- `runs/benchmarks/20260603T154450Z-kaiju-coder-7-serving/summary.md`
|
| 326 |
- `runs/benchmarks/20260603T161316Z-kaiju-coder-7-serving/summary.md`
|
| 327 |
- `runs/benchmarks/20260603T165512Z-kaiju-coder-7-serving/summary.md`
|
| 328 |
+
- `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
|
| 329 |
|
| 330 |
| Stack | Context | Prompt | OK | Seconds | Chars | Chars/s |
|
| 331 |
| --- | ---: | --- | --- | ---: | ---: | ---: |
|
|
|
|
| 335 |
| vLLM bitsandbytes | 16384 | code_patch | True | 11.3 | 416 | 36.814 |
|
| 336 |
| vLLM bitsandbytes | 16384 | business_doc | True | 53.44 | 1610 | 30.127 |
|
| 337 |
| vLLM bitsandbytes | 16384 | identity | True | 19.65 | 26 | 1.323 |
|
| 338 |
+
| vLLM bitsandbytes | 16384 | code_patch | True | 24.97 | 997 | 39.924 |
|
| 339 |
+
| vLLM bitsandbytes | 16384 | business_doc | True | 34.46 | 1615 | 46.874 |
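The `Chars/s` column appears to be `Chars / Seconds` rounded to three decimals. A short sketch that recomputes it for three of the rows above; `chars_per_second` is a hypothetical helper, and rows whose `Seconds` value was itself rounded for display may differ in the last digits when recomputed this way:

```python
def chars_per_second(chars: int, seconds: float) -> float:
    """Throughput in visible characters per wall-clock second, 3 decimals."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return round(chars / seconds, 3)


# (prompt, seconds, chars) copied from the vLLM bitsandbytes 16384 rows.
rows = [
    ("code_patch", 11.3, 416),
    ("business_doc", 53.44, 1610),
    ("identity", 19.65, 26),
]
throughput = {prompt: chars_per_second(chars, secs) for prompt, secs, chars in rows}
```

Short prompts such as `identity` show low throughput because fixed request latency dominates a 26-character reply.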
|
| 340 |
|
| 341 |
Gojira-B vLLM logs reported about `17.8 GiB` model memory for the bitsandbytes
|
| 342 |
load at both 8k and 16k, compared with about `50.22 GiB` for the unquantized
|
|
|
|
| 355 |
Result: OpenCode wrote `/tmp/kaiju-opencode-quantized-smoke/hello.txt` with
|
| 356 |
exactly `Kaiju Coder 7 quantized runtime ok`.
|
| 357 |
|
| 358 |
+
Recommendation: use vLLM bitsandbytes behind the local fast proxy as the
|
| 359 |
+
current public/OpenCode speed path and keep the installed OpenCode profile at
|
| 360 |
+
16k unless the 32k target has just been restarted and re-confirmed. Treat
|
| 361 |
+
SGLang as fallback and historical high-context evidence. vLLM bitsandbytes has
|
| 362 |
+
direct identity/code/business-doc evidence plus an OpenCode one-file smoke, but
|
| 363 |
+
it is not a persisted quantized-weights repo.
|
| 364 |
+
|
## 2026-06-03 Fast Proxy And Website Harness Speed Pass

The current speed profile keeps runtime-quantized vLLM active on Gojira-B port
`18084` and routes OpenCode through the local fast proxy at
`http://127.0.0.1:18181/v1`. The proxy preserves OpenCode tool-call streaming
while forcing `thinking=false`, model id `kaiju-coder-7`, and bounded output
budgets.

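The request rewriting described above can be sketched as follows. This is not the proxy's actual code: the cap value `MAX_OUTPUT_TOKENS`, the function name `rewrite_request`, and the use of `chat_template_kwargs` with `enable_thinking` to switch thinking off are all assumptions made for illustration.

```python
from copy import deepcopy

FORCED_MODEL_ID = "kaiju-coder-7"  # model id named in the text above
MAX_OUTPUT_TOKENS = 2048           # hypothetical cap; the real budget is not stated


def rewrite_request(body: dict) -> dict:
    """Return a copy of an OpenAI-style chat request with the proxy rules applied."""
    out = deepcopy(body)

    # Force the served model id regardless of what the client sent.
    out["model"] = FORCED_MODEL_ID

    # Bound the output budget: fill in a missing value and clamp an oversized one.
    requested = out.get("max_tokens")
    if not isinstance(requested, int) or requested > MAX_OUTPUT_TOKENS:
        out["max_tokens"] = MAX_OUTPUT_TOKENS

    # Assumption: thinking is disabled through chat-template kwargs.
    template_kwargs = dict(out.get("chat_template_kwargs") or {})
    template_kwargs["enable_thinking"] = False
    out["chat_template_kwargs"] = template_kwargs

    # Other fields (messages, tools, stream) pass through untouched,
    # which is what keeps tool-call streaming intact in this sketch.
    return out
```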
Active endpoint checks:

- Local fast proxy health: `http://127.0.0.1:18181/health`
- Upstream vLLM models: `http://100.109.109.14:18084/v1/models`
- Upstream reports `kaiju-coder-7` with `max_model_len=16384`

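A minimal sketch of these checks using only the Python standard library. It assumes the health endpoint answers with HTTP 200 when healthy and that `/v1/models` returns an OpenAI-style list whose entries carry `id` and `max_model_len`; the helper names are illustrative.

```python
import json
import urllib.request
from typing import Optional

PROXY_HEALTH_URL = "http://127.0.0.1:18181/health"
UPSTREAM_MODELS_URL = "http://100.109.109.14:18084/v1/models"


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_model(payload: dict, model_id: str) -> Optional[dict]:
    """Return the entry for model_id from a /v1/models payload, or None."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


def check_upstream(payload: dict, model_id: str, expected_len: int) -> bool:
    """True if the model is listed and reports the expected max_model_len."""
    entry = find_model(payload, model_id)
    return entry is not None and entry.get("max_model_len") == expected_len
```

With both services running, `health_ok(PROXY_HEALTH_URL)` and `check_upstream(fetch_json(UPSTREAM_MODELS_URL), "kaiju-coder-7", 16384)` would be expected to return `True`.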
Fresh direct vLLM benchmark:

- Run: `runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md`
- Identity: `19.48s`
- Code patch: `24.97s`, `997` chars
- Business doc: `34.46s`, `1,615` chars

Fresh OpenCode smoke through the local fast proxy:

- Command: `opencode run -m kaiju/kaiju-coder-7 --agent kaiju-coder-7 --dir /tmp/kaiju-vllm-opencode-smoke --dangerously-skip-permissions 'Create fast-vllm.txt with exactly: Kaiju quantized vLLM OpenCode ok'`
- Result: passed in about `23.5s`, wrote the exact requested file.
- Packaged public verifier after exact-content agent rule:
  `runs/public-opencode-smoke/20260603T235002Z/summary.md`, `4/4`
  passed through `http://127.0.0.1:18181/v1`.

Website harness/router speed pass:

- Direct website harness command: `python3 scripts/run_kaiju_website_harness.py --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
- Direct website harness result: `runs/harness/website-speed-pass/avery-stone-vllm.html`, `9,257` chars, `7.31s`
- Router command: `python3 scripts/run_kaiju_router.py --kind website --openai-base-url http://100.109.109.14:18084/v1 --model kaiju-coder-7 ...`
- Router artifact: `runs/router-speed-pass/20260603T223731Z-website-build-a-premium-one-page-website-for-avery-stone-construction-a-reside/index.html`
- Router result: passed in `7.20s`; checks covered complete HTML, required sections, external images, responsive CSS, no lorem ipsum, and manifest write.
- Router through the installed local proxy: `runs/router-speed-pass/20260603T224328Z-website-build-a-premium-one-page-website-for-bennett-family-dental-in-charlott/index.html`
- Proxy router result: passed in `4.67s`; preserved explicit CTA `Schedule a Visit`, inferred `dental`, and passed the same complete-HTML/static checks.

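The static checks named in the router result could look roughly like the sketch below. The real router's rules are not shown in this log, so every heuristic here (the doctype test, the `@media` test for responsive CSS, the `https?://` image test) is an assumption, and `static_checks` is a hypothetical name. The manifest-write check is omitted because it depends on the router's output layout.

```python
import re


def static_checks(html: str, required_sections: list[str]) -> dict[str, bool]:
    """Run simple string-level checks on a generated one-page site."""
    lower = html.lower()
    return {
        # Complete HTML: starts with a doctype and closes the root element.
        "complete_html": lower.lstrip().startswith("<!doctype html") and "</html>" in lower,
        # Required sections: every expected section name appears somewhere.
        "required_sections": all(name.lower() in lower for name in required_sections),
        # External images: at least one <img> with an absolute http(s) source.
        "external_images": re.search(r"<img[^>]+src=[\"']https?://", lower) is not None,
        # Responsive CSS: at least one media query is present.
        "responsive_css": "@media" in lower,
        # No placeholder copy left in the page.
        "no_lorem_ipsum": "lorem ipsum" not in lower,
    }


def all_passed(results: dict[str, bool]) -> bool:
    """True only if every individual check passed."""
    return all(results.values())
```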
Updated recommendation: for speed-sensitive OpenCode and paid workflow testing,
use vLLM bitsandbytes plus the local fast proxy as the active default. Keep
SGLang as fallback/historical evidence, not the fastest current path. For
websites and business-owner packs, prefer the deterministic router/harness path
over raw long-form HTML generation.

Public business-owner demo pack through the active fast proxy:

```bash
python3 scripts/run_kaiju_public_demo_pack.py \
  --openai-base-url http://127.0.0.1:18181/v1 \
  --model kaiju-coder-7 \
  --planner-timeout 90
```

Run: `runs/public-demo-pack/20260603T235009Z/summary.md`

| Task | Result | Seconds | Changed files |
| --- | --- | ---: | ---: |
| Website | Passed | 4.73 | 2 |
| Owner AI company pack | Passed | 29.85 | 19 |
| Stripe safety plan | Passed | 9.99 | 2 |
| CSV parser artifact | Passed | 19.97 | 2 |

Total: `4/4` passed in `64.529s`.

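As a quick consistency check, the per-task seconds above sum to `64.54`, about `0.01s` off the reported `64.529s`. That gap fits the rows being rounded to two decimals, though the log does not state how the total was measured. A small sketch of the arithmetic, with illustrative names:

```python
# Per-task seconds copied from the demo-pack table above.
TASK_SECONDS = {
    "Website": 4.73,
    "Owner AI company pack": 29.85,
    "Stripe safety plan": 9.99,
    "CSV parser artifact": 19.97,
}
REPORTED_TOTAL = 64.529  # from the "Total" line above


def summed_seconds(task_seconds: dict) -> float:
    """Sum the per-task timings, rounded to the table's two-decimal precision."""
    return round(sum(task_seconds.values()), 2)


def within_rounding(task_seconds: dict, reported: float) -> bool:
    """True if the gap is explainable by each row being rounded to 0.01s."""
    tolerance = 0.005 * len(task_seconds)  # worst-case rounding error per row
    return abs(summed_seconds(task_seconds) - reported) <= tolerance
```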
## Persisted GGUF Q8_0 Candidate

The dedicated persisted-quantization pass found that normal AWQ/GPTQ installs
are not clean against the Qwen3.5-capable serving stack tonight, while
`llama.cpp` conversion support includes `Qwen3_5ForConditionalGeneration`.

Command:

```bash
./scripts/probe-gojira-b-persisted-quantization.sh
./scripts/run-gojira-b-kaiju-gguf-convert.sh
```

Result:

- Artifact:
  `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
- Size: `27G`
- SHA256:
  `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
- Conversion log:
  `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
- Runtime status: candidate only; direct GGUF runtime smoke still required
  before publishing quantized weights.

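Before any runtime smoke, the artifact can be checked against the recorded SHA256. A minimal sketch using `hashlib`; the helper names are illustrative, and the file is read in chunks because the artifact is about `27G`:

```python
import hashlib

# Digest recorded in the result list above.
EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `matches_expected` on the `kaiju-coder-7-Q8_0.gguf` path listed above should return `True` if the file is intact.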
Interpretation: the next real speed improvement for broad public users is not
another prompt tweak. It is a smoked GGUF or GPU-persisted quantized artifact.
The fastest currently verified Kaiju Coder 7 path remains vLLM bitsandbytes
plus the local fast proxy and deterministic website/business harnesses.