	# Kaiju Coder 7 GGUF Candidate

	This folder documents the persisted GGUF candidate for Kaiju Coder 7. The
	artifact exists on Gojira-B, but it should stay marked as a candidate until a
	runtime smoke test passes.

	## Artifact

	- Format: GGUF
	- Outtype: `q8_0`
	- Remote path:
	`/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf`
	- Remote size: `27G`
	- SHA256:
	`596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e`
	- Source model:
	`/home/richardecholsai5/kaiju-coder/models/Kaiju-Coder-Qwen3.6-27B-v1.8-merged`
	- Conversion evidence:
	`runs/gguf-conversion/20260603T231446Z/gguf-conversion.log`
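The checksum listed above can be re-checked on the host before any smoke test. The following is a minimal sketch, assuming `sha256sum` and `awk` are available on Gojira-B; the helper name `verify_sha256` is illustrative and not part of the repository scripts.

```bash
#!/usr/bin/env bash
set -eu

# Values copied from the Artifact list above.
EXPECTED_SHA256="596a2c227a429c7309db753061d88d71ee3f8a3b48f17e41ba9d81b0f55bdd4e"
GGUF_PATH="/home/richardecholsai5/kaiju-coder/models/kaiju-coder-7-gguf/kaiju-coder-7-Q8_0.gguf"

# verify_sha256 FILE EXPECTED: succeeds only when FILE hashes to EXPECTED.
# (Hypothetical helper, not taken from the repository.)
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

if [ -f "$GGUF_PATH" ]; then
  if verify_sha256 "$GGUF_PATH" "$EXPECTED_SHA256"; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH" >&2
    exit 1
  fi
else
  # Off-host runs land here; the artifact only exists on Gojira-B.
  echo "artifact not found at $GGUF_PATH" >&2
fi
```

Hashing a 27G file takes a while, so it is worth running once and recording the result next to the conversion log.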

	## Status

	Converted successfully on 2026-06-03. A runtime smoke test is still required
	before any public upload or any claim of quantized weights on Hugging Face.

	The conversion path is promising because the current `llama.cpp`
	`convert_hf_to_gguf.py` support list includes `Qwen3_5ForConditionalGeneration`
	and the Q8_0 dry run completed before the real conversion.
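The support-list claim above can be spot-checked with a plain text search. This is a rough sketch, assuming the architecture name appears as a quoted string literal in the conversion script and that `llama.cpp` is checked out at the hypothetical path below. It is a text search only and does not replace the dry run.

```bash
#!/usr/bin/env bash
set -eu

# arch_supported SCRIPT ARCH: succeeds when "ARCH" (with quotes) appears in SCRIPT.
# (Hypothetical helper; a literal search, not an import of the script.)
arch_supported() {
  grep -qF -- "\"$2\"" "$1"
}

# Assumed checkout location; override CONVERT_SCRIPT to match the real one.
CONVERT_SCRIPT="${CONVERT_SCRIPT:-./llama.cpp/convert_hf_to_gguf.py}"
ARCH="Qwen3_5ForConditionalGeneration"

if [ -f "$CONVERT_SCRIPT" ]; then
  if arch_supported "$CONVERT_SCRIPT" "$ARCH"; then
    echo "$ARCH is listed in $CONVERT_SCRIPT"
  else
    echo "$ARCH not found in $CONVERT_SCRIPT" >&2
  fi
else
  echo "convert script not found at $CONVERT_SCRIPT" >&2
fi
```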

	## Recreate

	```bash
	./scripts/probe-gojira-b-persisted-quantization.sh
	./scripts/run-gojira-b-kaiju-gguf-convert.sh
	```

	The conversion script stops the active vLLM runtime to free RAM, writes the GGUF
	artifact, records a checksum and manifest, then restarts the fast vLLM runtime.
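The stop, convert, record, restart sequence described above can be sketched as follows. The four step functions are placeholders that only print a label; the real commands live in `run-gojira-b-kaiju-gguf-convert.sh` and are not reproduced here. The point of the sketch is the `trap`, which brings the runtime back even when the conversion fails partway.

```bash
#!/usr/bin/env bash
set -eu

# Placeholder steps (hypothetical): each stands in for a real command.
stop_runtime()    { echo "stop vllm"; }
run_conversion()  { echo "convert to gguf"; }
record_manifest() { echo "record checksum and manifest"; }
start_runtime()   { echo "restart vllm"; }

# The function body is a subshell, so the EXIT trap fires when the
# function finishes, whether the steps succeeded or not.
convert_with_restart() (
  trap start_runtime EXIT
  # The && chain stops at the first failing step.
  stop_runtime && run_conversion && record_manifest
)

convert_with_restart
```

Without the trap, a failed conversion would leave the host with no serving runtime until someone restarts it by hand.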

	## Release Rule

	Do not publish this as public quantized weights until all of these pass:

	- runtime loads the GGUF with model id `kaiju-coder-7`
	- direct identity smoke passes
	- direct business-owner document smoke passes
	- OpenCode or router smoke passes through the intended runtime
	- README/model card states exact runtime, context, memory, and quality caveats

	Until then, the public quantized path remains `kaiju-coder-7-quantized-runtime`,
	which documents the vLLM bitsandbytes setup that has already passed its smoke
	tests.
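The release rule above is an all-or-nothing gate, which can be sketched as a small shell function. The check names and the `release_gate` helper are illustrative assumptions; the statuses would be filled in by hand or by whatever smoke scripts the repository uses.

```bash
#!/usr/bin/env bash
set -u

# release_gate NAME=STATUS...: prints one line per check and succeeds only
# when every STATUS is exactly "pass". (Hypothetical helper.)
release_gate() {
  local failed=0 item name status
  for item in "$@"; do
    name="${item%%=*}"
    status="${item#*=}"
    printf '%-28s %s\n' "$name" "$status"
    [ "$status" = "pass" ] || failed=1
  done
  return "$failed"
}

# Example statuses only; nothing here reflects real test results.
if release_gate \
    gguf-load-model-id=pending \
    identity-smoke=pending \
    business-owner-doc-smoke=pending \
    opencode-or-router-smoke=pending \
    model-card-caveats=pending; then
  echo "all checks passed: OK to publish"
else
  echo "hold: keep the GGUF marked as a candidate"
fi
```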