# Kaiju Coder 7 GGUF Candidate

This folder documents the persisted GGUF candidate for Kaiju Coder 7. The
artifact exists on Gojira-B, but it should stay marked as a candidate until a
runtime smoke test passes.

## Artifact

- Format: GGUF
- Outtype: `q8_0`
- Remote path:
  `/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
- Remote size: `27G`
- SHA256:
  `596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
- Source model:
  `/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`
- Conversion evidence:
  `runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`

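The recorded SHA256 can be re-checked against the file on disk before any smoke test or upload. The following is a minimal sketch, assuming GNU `sha256sum` is available on the host; `verify_sha256` is a hypothetical helper written for illustration and is not one of the repository's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): compare a file's SHA256
# against an expected digest. Returns 0 on match, 1 on mismatch, and
# 2 if the digest could not be computed (for example, a missing file).
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum -- "$file")" || return 2
  actual="${actual%% *}"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Assumed usage on Gojira-B, with the path and digest listed above:
# verify_sha256 \
#   /home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf \
#   596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e
```

Hashing a 27G file takes a while, so a check like this is probably best run once per copy of the artifact rather than on every runtime start.
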
## Status

Converted successfully on 2026-06-03. A runtime smoke test is still required
before public upload or any Hugging Face quantized-weights claim.

The conversion path looks promising for two reasons: the support list in the
current `llama.cpp` `convert_hf_to_gguf.py` includes
`Qwen3_5ForConditionalGeneration`, and a Q8_0 dry run completed before the real
conversion.

## Recreate

```bash
./scripts/probe-gojira-b-persisted-quantization.sh
./scripts/run-gojira-b-kaiju-gguf-convert.sh
```

The conversion script stops the active vLLM runtime to free RAM, writes the GGUF
artifact, records a checksum and manifest, then restarts the fast vLLM runtime.

## Release Rule

Do not publish this as public quantized weights until all of these pass:

- the runtime loads the GGUF with model id `kaiju-coder-7`
- the direct identity smoke test passes
- the direct business-owner document smoke test passes
- the OpenCode or router smoke test passes through the intended runtime
- the README/model card states the exact runtime, context, memory, and quality
  caveats

Until then, the public quantized path remains `kaiju-coder-7-quantized-runtime`,
which documents the already smoke-tested vLLM bitsandbytes setup.

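The first item in the list above could be checked with a small script. This is a rough sketch only: it assumes the GGUF runtime exposes an OpenAI-compatible `/v1/models` endpoint, which this document does not confirm. The URL in the usage comment and the helper `has_model_id` are hypothetical, and the check is a crude substring match rather than a real JSON parse.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): read a /v1/models style JSON
# response on stdin and succeed if it lists the given model id.
# Whitespace is stripped first so that `"id": "x"` and `"id":"x"` both match.
has_model_id() {
  local id="$1"
  tr -d '[:space:]' | grep -Fq "\"id\":\"${id}\""
}

# Assumed usage; the host and port are placeholders, not a documented setup:
# curl -fsS http://localhost:8080/v1/models | has_model_id kaiju-coder-7 \
#   && echo "model id check passed"
```

A check like this covers only the model id item. The identity, business-owner document, and OpenCode or router smoke tests still need their own prompts and pass criteria.
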