Launch80/Qwen3.6_27B-RFP458

	---
	base_model: Qwen/Qwen3.6-27B
	license: other
	license_name: qwen
	pipeline_tag: text-generation
	tags:
	- rfp458
	- 4-bit
	- quantized
	- vllm
	- rocm
	- rdna4
	- qwen3.6
	---

	# Qwen3.6-27B-RFP458 (4.5 bpw)

	A 4.5-bit-per-weight RFP458 quantization of [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B), a hybrid vision-language model (Gated-DeltaNet linear-attention + full-attention layers, with a vision tower and MTP head).

	## Summary

	- Format: RFP458 (`rfp458-pack-quantized`): iq4_nl non-uniform 4-bit codebook, group size 16, signed-int8 block mantissa + per-channel int8 exponent, with hadamard16 weight rotation.
	- Size: ~20.5 GB (vs ~27 GB for the FP8 build); ~9.4 GiB per card on a 2x 32 GB setup.
	- What is quantized: the MLP linears, the GDN `in_proj_qkv` / `in_proj_z` / `out_proj`, and the full self-attention q/k/v/o projections. Embeddings, lm_head, the GDN gating projections (`in_proj_a` / `in_proj_b`), conv1d, norms, and the entire vision tower are kept in bf16.
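
The format description above can be made concrete with a small sketch. This is illustrative only and is not the RFP458 packer: the codebook values are the `kvalues_iq4nl` table from llama.cpp, while the scale selection, the float scale (standing in for the int8 mantissa and shared exponent), and the helper names `rotate`, `quantize_group`, `dequantize_group`, and `bits_per_weight` are assumptions made for illustration. The last helper shows one plausible reading of where 4.5 bpw comes from: a 4-bit index per weight plus one 8-bit mantissa shared by each group of 16, with the per-channel exponent treated as negligible.

```python
import math

# iq4_nl codebook as defined in llama.cpp (kvalues_iq4nl): 16 non-uniform levels.
IQ4_NL = [-127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113]
GROUP = 16  # group size stated in the summary


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


H16 = hadamard(GROUP)


def rotate(vec):
    """Orthonormal Hadamard rotation y = H x / sqrt(16); applying it twice returns x."""
    s = 1.0 / math.sqrt(GROUP)
    return [s * sum(h * x for h, x in zip(row, vec)) for row in H16]


def quantize_group(vec):
    """Rotate 16 weights, then map each value to the nearest scaled codebook level."""
    rotated = rotate(vec)
    amax = max(abs(v) for v in rotated)
    # Assumption: a plain float scale; the real format stores an int8 mantissa
    # per group and an int8 exponent per channel instead.
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [min(range(16), key=lambda i: abs(IQ4_NL[i] * scale - v)) for v in rotated]
    return codes, scale


def dequantize_group(codes, scale):
    """Look up codebook levels, rescale, and undo the rotation."""
    return rotate([IQ4_NL[c] * scale for c in codes])


def bits_per_weight(index_bits=4, mantissa_bits=8, group=GROUP):
    """Index bits plus the per-group mantissa amortized over the group."""
    return index_bits + mantissa_bits / group


weights = [0.01 * (i - 7) for i in range(GROUP)]
codes, scale = quantize_group(weights)
restored = dequantize_group(codes, scale)
print(bits_per_weight(), max(abs(a - b) for a, b in zip(weights, restored)))
```

The rotation spreads outliers across the group before the codebook lookup, and because it is orthonormal the reconstruction error has the same L2 norm in the rotated and original domains.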

	## Quality

	WikiText-2 perplexity (llama.cpp-compatible, n_ctx 2048, full test set):

	| Build | Size | PPL |
	|---|---|---|
	| RFP458 (this model) | 20.5 GB | 6.936 |
	| FP8 (RedHatAI/Qwen3.6-27B-FP8) | ~27 GB | 7.071 |

	On this benchmark (lower perplexity is better), RFP458 matches or slightly beats the FP8 build with a weight footprint roughly 25 percent smaller.
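
The comparison can be checked directly from the figures in the table. The snippet below is a minimal sketch that treats the approximate "~27 GB" FP8 size as exactly 27.0, which is an assumption; the helper name `relative_reduction` is hypothetical.

```python
# Figures copied from the table; sizes in GB, perplexity on WikiText-2.
rfp458 = {"size_gb": 20.5, "ppl": 6.936}
fp8 = {"size_gb": 27.0, "ppl": 7.071}  # assumption: "~27 GB" taken as 27.0


def relative_reduction(new, old):
    """Fractional reduction of `new` relative to `old`."""
    return (old - new) / old


size_saving = relative_reduction(rfp458["size_gb"], fp8["size_gb"])
ppl_change = relative_reduction(rfp458["ppl"], fp8["ppl"])

print(f"weight footprint: {size_saving:.1%} smaller")
print(f"perplexity: {ppl_change:.1%} lower")
```

With these inputs the footprint comes out about 24 percent smaller and the perplexity about 1.9 percent lower.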

	## Serving

	Built for and validated on a vLLM build with native RFP458 dequant kernels, running on AMD RDNA4 (gfx1201, Radeon AI PRO R9700) at tensor-parallel size 2. The smaller weights free enough VRAM to serve the full 262K context with a large KV pool. 4-bit-class formats carry a dequant cost, however, so single-stream decode runs at roughly half the speed of the FP8 build on the same hardware; choose this build for capacity, footprint, and quality rather than for speed.
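
As a rough illustration of the setup described above, the sketch below assembles a `vllm serve` command line. `--tensor-parallel-size` and `--max-model-len` are standard vLLM options, but several things here are assumptions: that "262K" means 262144 tokens, that the repository id is `Launch80/Qwen3.6_27B-RFP458`, and that the vLLM build in use includes the RFP458 kernels (a stock release may not recognize the format). The helper `build_serve_command` is hypothetical.

```python
import shlex


def build_serve_command(
    model="Launch80/Qwen3.6_27B-RFP458",  # assumed repository id
    tensor_parallel=2,                    # tensor-parallel 2, as validated in the card
    max_model_len=262144,                 # assumption: "262K" context = 262144 tokens
):
    """Return the argv list for serving the model with vLLM's CLI."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel),
        "--max-model-len", str(max_model_len),
    ]


# Print a copy-pasteable shell command; pass the list to subprocess.run to launch.
print(shlex.join(build_serve_command()))
```

Lowering `max_model_len` is the usual lever if the KV pool does not fit alongside the weights on a given pair of cards.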

	## License

	Inherits the license of the base model, Qwen/Qwen3.6-27B. See the base model card for terms.

	This is a community quantization and is not affiliated with the original model authors.