Kaiju Coder 7 GGUF Candidate

This folder documents the persisted GGUF candidate for Kaiju Coder 7. The artifact exists on Gojira-B, but it stays marked as a candidate until the runtime smoke tests listed under Release Rule pass.

Artifact

  • Format: GGUF
  • Outtype: q8_0
  • Remote path: /home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf
  • Remote size: 27G
  • SHA256: 596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e
  • Source model: /home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged
  • Conversion evidence: runs/gguf-conversion/20260603T231446Z/gguf-conversion.log
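
The recorded SHA256 can be re-checked whenever the file is copied or moved. The following is a minimal sketch, assuming Python 3 is available on the host; the function names are illustrative and not part of the repository scripts. It hashes the file in chunks so the 27G artifact is never loaded into memory at once.

```python
import hashlib
from pathlib import Path

# Values copied from the Artifact list above.
EXPECTED_SHA256 = "596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"
GGUF_PATH = Path(
    "/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf"
)


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large artifact never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path, expected=EXPECTED_SHA256):
    """Return True only when the file's digest equals the recorded one."""
    return sha256_of_file(path) == expected.lower()
```

On Gojira-B, `matches_recorded_checksum(GGUF_PATH)` should return `True` if the artifact is intact. A `False` result means the file differs from the one produced by the logged conversion.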

Status

Converted successfully on 2026-06-03. A runtime smoke test is still required before any public upload or any claim of quantized weights on Hugging Face.

The conversion path looks promising for two reasons: the current llama.cpp convert_hf_to_gguf.py support list includes Qwen3_5ForConditionalGeneration, and the Q8_0 dry run completed before the real conversion.
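
A successful conversion log does not prove the file is loadable, but a cheap structural check can catch a truncated or mislabeled file before the runtime smoke test. The sketch below reads only the fixed-size GGUF header. It assumes the GGUF version 2 or later layout (4-byte magic, uint32 version, then two uint64 counts, little-endian); the function name is illustrative. It is not a substitute for the smoke tests.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (uint32 version) + 8 + 8 (two uint64 counts)


def read_gguf_header(path):
    """Read the GGUF magic, version, tensor count, and metadata key/value count."""
    with open(path, "rb") as handle:
        raw = handle.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not start with a complete GGUF header")
    # Assumption: counts are uint64, as in GGUF v2 and later.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:HEADER_SIZE])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

A zero tensor count or a `ValueError` here would be a reason to re-run the conversion before spending time on runtime smoke tests.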

Recreate

./scripts/probe-gojira-b-persisted-quantization.sh
./scripts/run-gojira-b-kaiju-gguf-convert.sh

The conversion script stops the active vLLM runtime to free RAM, writes the GGUF artifact, records a checksum and manifest, then restarts the fast vLLM runtime.
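
The sequence described above (stop, convert, record, restart) can be sketched as follows. This is not the contents of run-gojira-b-kaiju-gguf-convert.sh: the three callables are hypothetical stand-ins for whatever commands the real script runs, the manifest fields are illustrative, and restarting the runtime even after a failed conversion is an assumption about the desired behavior.

```python
import hashlib
import json


def convert_with_runtime_paused(stop_runtime, convert, start_runtime, manifest_path):
    """Stop the runtime, convert, record checksum and manifest, then restart.

    `stop_runtime`, `convert`, and `start_runtime` are hypothetical callables.
    `convert` must return the path of the GGUF file it wrote.
    """
    stop_runtime()  # free RAM before the conversion starts
    try:
        artifact = convert()
        digest = hashlib.sha256()
        with open(artifact, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        # Illustrative manifest fields; the real manifest may record more.
        manifest = {"artifact": str(artifact), "sha256": digest.hexdigest()}
        with open(manifest_path, "w", encoding="utf-8") as out:
            json.dump(manifest, out, indent=2)
        return manifest
    finally:
        # Assumption: the runtime should come back even if conversion failed.
        start_runtime()
```

Putting the restart in `finally` means a failed conversion does not leave the host without a serving runtime.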

Release Rule

Do not publish this as public quantized weights until all of these pass:

  • runtime loads the GGUF with model id kaiju-coder-7
  • direct identity smoke passes
  • direct business-owner document smoke passes
  • OpenCode or router smoke passes through the intended runtime
  • README/model card states exact runtime, context, memory, and quality caveats

Until then, the public quantized path remains kaiju-coder-7-quantized-runtime, which documents the vLLM bitsandbytes setup that has already passed its smoke tests.
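
The release rule is an all-or-nothing gate, which can be made explicit in a small helper. This is a sketch only: the check keys are hypothetical names that mirror the Release Rule bullets, and how each result is produced is left to the actual smoke tests.

```python
# Hypothetical keys, one per Release Rule bullet, in the same order.
REQUIRED_CHECKS = (
    "gguf_loads_as_kaiju_coder_7",
    "identity_smoke",
    "business_owner_document_smoke",
    "opencode_or_router_smoke",
    "model_card_caveats_documented",
)


def release_blockers(results):
    """Return the required checks that are missing or not exactly True."""
    return [name for name in REQUIRED_CHECKS if results.get(name) is not True]


def may_publish(results):
    """Publishing is allowed only when no blockers remain."""
    return not release_blockers(results)
```

A missing result counts as a failure, so a check that was never run cannot unblock the release by omission.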