# Kaiju Coder 7 Goal Completion Audit
Generated: `2026-06-04T02:44:04Z`
Overall: `complete`
Summary: `18 passed / 0 blocked / 0 manual`
This audit maps the active Kaiju Coder 7 objective to current evidence for the local runtime, the Hugging Face release, OpenCode, and the paid API preflight, and records any remaining caveats.
## Readiness Commands
| Check | Ready | Return Code |
|---|---:|---:|
| Local public-testing readiness | `True` | `0` |
| Hugging Face release readiness | `True` | `0` |
| Public launch readiness | `True` | `0` |
| Paid API scaffold | `True` | `0` |
| Paid API launch | `True` | `0` |
| HF staging integrity | `True` | `0` |
| HF namespace permission evidence | `True` | `0` |
| Human public review | `True` | `0` |
## Requirement Audit
| Area | Requirement | Status | Evidence | Blocker |
|---|---|---|---|---|
| Identity | Product name is Kaiju Coder 7 and public/API model id is kaiju-coder-7. | `passed` | scripts/check_kaiju_public_release_readiness.py --mode local; release/PUBLIC_TESTING_QUICKSTART.md | |
| OpenCode | Kaiju-specific OpenCode config installs the model, default agent, hidden artifact routing, and no-autocontinue loop guard. | `passed` | .opencode/agents/kaiju-coder-7.md; scripts/opencode-kaiju-no-autocontinue.mjs; scripts/install_kaiju_opencode_profile.py | |
| OpenCode | After install, plain `opencode` and `opencode run` work from this Mac with Kaiju as the selected/default model. | `passed` | runs/public-opencode-smoke latest passing summary; scripts/run_kaiju_public_opencode_smoke.py | |
| OpenCode | Customer-readiness pack passes without wrong-directory output, fake compaction completion, missing files, or secret leakage. | `passed` | runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
| Runtime | Direct API smoke passes using model=kaiju-coder-7. | `passed` | runs/benchmarks/20260603T223337Z-kaiju-coder-7-serving/summary.md | |
| Runtime | 12k, 16k, 24k, and 32k context benchmarks are recorded with a recommended default. | `passed` | release/SERVING_BENCHMARKS.md records 12288, 16384, 24576, and 32768, and recommends 16k as the live default | |
| Runtime | SGLang and vLLM (or another practical faster serving path) are benchmarked honestly. | `passed` | release/SERVING_BENCHMARKS.md; release/quantized-runtime/README.md | |
| Runtime | At least one public-friendly quantized/local candidate is working or clearly documented as blocked with evidence. | `passed` | release/quantized-runtime/README.md documents the vLLM bitsandbytes runtime candidate and its persisted-weights limitation | |
| Hugging Face | Public-friendly HF release structure is staged with adapter, OpenCode helper, runtime-quantized helper, model cards, provenance, evals, and docs. | `passed` | python3 scripts/check_hf_staging_integrity.py --require-checksums | |
| Hugging Face | At least one public Hugging Face release path is ready to upload or uploaded. | `passed` | python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release | |
| Hugging Face | Merged 51GB model repo upload is complete and public, or guarded with explicit evidence. | `passed` | release/HF_UPLOAD_EVIDENCE.md; scripts/prepare_hf_merged_model_metadata.sh; scripts/upload_hf_merged_model_from_gojira_b.sh | |
| Hugging Face | Uploaded Hugging Face repos are downloadable by intended users. | `passed` | release/HF_UPLOAD_EVIDENCE.md; python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply | |
| Quality | Customer-style evals cover website, proposal, Stripe/payment, CRM/reporting, CSV/parser, Kiyomi operating pack, and safety/provenance. | `passed` | evals/tasks/opencode-customer-readiness.jsonl; runs/opencode-customer-readiness/20260603T185835Z/summary.md | |
| Quality | Model/harness prompts produce file-oriented business-owner artifacts rather than vague advice. | `passed` | kaiju_harness/business_suite.py; release/EVAL_SCOREBOARD.md | |
| Provenance | Training/eval provenance is preserved and public docs avoid internal checkpoint naming except license/provenance attribution. | `passed` | release/SOURCE_INVENTORY.md; release/DATA_PROVENANCE_DRAFT.md; release/PUBLIC_TESTING_QUICKSTART.md | |
| Paid API | Paid API scaffold covers API keys, Stripe billing, rate limits, logging controls, abuse controls, rollback plan, and pricing assumptions. | `passed` | python3 scripts/check_paid_api_readiness.py --mode scaffold; gateway/cloudflare-worker tests | |
| Paid API | Paid API is ready for public charging. | `passed` | python3 scripts/check_paid_api_readiness.py --mode launch | |
| Final Report | Final report includes exact commands run, eval results, changed files, remaining risks, and what Richard should test first. | `passed` | release/FINAL_RELEASE_REPORT.md | |
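
The `Summary` line near the top can be cross-checked against the Status column of the requirement table. The following is a minimal sketch of such a tally, not the script that generated this audit. The function names `tally_statuses` and `format_summary` and the sample rows are hypothetical, and the parser assumes no cell contains an escaped pipe character.

```python
from collections import Counter


def tally_statuses(table: str) -> Counter:
    """Count the values in the Status column of a pipe-delimited markdown table."""
    rows = [line.strip() for line in table.splitlines() if line.strip().startswith("|")]
    if len(rows) < 3:  # need a header, a separator, and at least one data row
        return Counter()
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    status_col = header.index("Status")
    counts = Counter()
    for row in rows[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        counts[cells[status_col].strip("`")] += 1
    return counts


def format_summary(counts: Counter) -> str:
    """Render counts in the same shape as the audit's Summary line."""
    return f"{counts['passed']} passed / {counts['blocked']} blocked / {counts['manual']} manual"


# Hypothetical sample rows, for illustration only (not taken from the audit).
SAMPLE = """\
| Area | Requirement | Status | Evidence | Blocker |
|---|---|---|---|---|
| Identity | Example requirement A. | `passed` | example/evidence-a.md | |
| Runtime | Example requirement B. | `blocked` | example/evidence-b.md | Example blocker |
"""

print(format_summary(tally_statuses(SAMPLE)))
```

Run over the 18 rows of the requirement table, each marked `passed`, a tally like this should agree with the reported `18 passed / 0 blocked / 0 manual`.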
## Blocking Items
- No blocking items.
## Commands To Re-run
```bash
python3 scripts/check_kaiju_public_release_readiness.py --mode local
python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release
python3 scripts/check_kaiju_public_release_readiness.py --mode public
python3 scripts/check_paid_api_readiness.py --mode scaffold
python3 scripts/check_paid_api_readiness.py --mode launch
python3 scripts/check_hf_staging_integrity.py --require-checksums
python3 scripts/check_hf_release_permission_evidence.py
python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply
python3 scripts/check_human_release_review.py --mode public
```
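
The commands above can also be run in one pass with their return codes collected. The sketch below rests on an assumption: that a check counts as ready exactly when its process exits with return code `0`, which is consistent with the Readiness Commands table but not confirmed by it. The helper names `run_checks`, `all_ready`, and `main` are hypothetical. Note that `--apply` on `check_hf_uploaded_release.py` may have side effects this audit does not describe.

```python
import shlex
import subprocess

# Copied verbatim from the "Commands To Re-run" list.
COMMANDS = [
    "python3 scripts/check_kaiju_public_release_readiness.py --mode local",
    "python3 scripts/check_kaiju_public_release_readiness.py --mode hf-release",
    "python3 scripts/check_kaiju_public_release_readiness.py --mode public",
    "python3 scripts/check_paid_api_readiness.py --mode scaffold",
    "python3 scripts/check_paid_api_readiness.py --mode launch",
    "python3 scripts/check_hf_staging_integrity.py --require-checksums",
    "python3 scripts/check_hf_release_permission_evidence.py",
    "python3 scripts/check_hf_uploaded_release.py --namespace RMDWLLC --apply",
    "python3 scripts/check_human_release_review.py --mode public",
]


def run_checks(commands, runner=subprocess.run):
    """Run each command in order and return a list of (command, return code) pairs."""
    results = []
    for command in commands:
        completed = runner(shlex.split(command))
        results.append((command, completed.returncode))
    return results


def all_ready(results):
    """Assumption: a check is ready exactly when its return code is 0."""
    return all(code == 0 for _, code in results)


def main():
    """Print one line per check and return a process exit status."""
    results = run_checks(COMMANDS)
    for command, code in results:
        label = "ready" if code == 0 else "NOT READY"
        print(f"{label} (rc={code}): {command}")
    return 0 if all_ready(results) else 1
```

To use it, call `sys.exit(main())` from the repository root so the relative `scripts/` paths resolve. The `runner` parameter exists only so the loop can be exercised with a stand-in instead of launching real processes.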