# Qwen3.5-27B for hipfire

Pre-quantized Qwen3.5-27B (DeltaNet hybrid) for hipfire, a Rust-native LLM inference engine for AMD RDNA GPUs.

Quantized from Qwen/Qwen3.5-27B.

## Files

| File | Quant | Size | Min VRAM | Speed (RX 5700 XT) |
|------|-------|------|----------|--------------------|
| qwen3.5-27b.q4.hfq | HFQ4 | 14.3 GB | 16 GB | TBD |
| qwen3.5-27b.hfq6.hfq | HFQ6 | 21.4 GB | 24 GB | TBD |

## GPU Compatibility

| GPU | VRAM | HFQ4 | HFQ6 |
|-----|------|------|------|
| RX 5700 XT | 8 GB | No | No |
| RX 6800 XT | 16 GB | Yes | No |
| RX 7900 XTX | 24 GB | Yes | Yes |
| RX 9070 | 16 GB | Yes | No |
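The table above reduces to a simple fit rule: the quantized file must fit in VRAM with some headroom left for KV cache and activations. A minimal sketch of that rule, assuming a ~1.7 GB headroom figure (an illustrative assumption, not a number published by hipfire):

```python
# Rough VRAM-fit check mirroring the compatibility table above.
# HEADROOM_GB (space for KV cache + activations) is an assumed value
# chosen for illustration; hipfire's real overhead may differ.
HEADROOM_GB = 1.7

# Quantized file sizes in GB, taken from the Files table.
QUANTS = {"HFQ4": 14.3, "HFQ6": 21.4}

def fits(vram_gb: float, quant: str) -> bool:
    """True if the quantized model plus headroom fits in VRAM."""
    return QUANTS[quant] + HEADROOM_GB <= vram_gb

GPUS = {"RX 5700 XT": 8, "RX 6800 XT": 16, "RX 7900 XTX": 24, "RX 9070": 16}

for gpu, vram in GPUS.items():
    support = {q: fits(vram, q) for q in QUANTS}
    print(f"{gpu:12} {support}")
```

With these numbers the output reproduces the Yes/No pattern in the table: only the 24 GB RX 7900 XTX clears the HFQ6 bar.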

## Usage

```bash
# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3.5:27b
hipfire run qwen3.5:27b "Hello"
```

## Quantization Formats

- **HFQ4**: 4-bit weights, 256-weight groups (~0.53 bytes/weight). Best speed.
- **HFQ6**: 6-bit weights, 256-weight groups (~0.78 bytes/weight). Best quality; ~15% slower.

Both include embedded tokenizer and model config.
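The bytes-per-weight figures are consistent with packing n-bit weights plus 8 bytes of per-group metadata amortized over each 256-weight group. The "two f32 values (scale + minimum) per group" layout below is an assumption used to reproduce the arithmetic, not a documented description of the HFQ format:

```python
# Bytes per weight for an n-bit group-quantization scheme:
# packed weight bits plus per-group metadata, amortized over the group.
GROUP_SIZE = 256
GROUP_META_BYTES = 8  # assumed: two f32 values (scale + min) per group

def bytes_per_weight(bits: int) -> float:
    """Amortized storage cost of one quantized weight, in bytes."""
    return bits / 8 + GROUP_META_BYTES / GROUP_SIZE

print(f"HFQ4: {bytes_per_weight(4):.2f} B/w")  # 0.53
print(f"HFQ6: {bytes_per_weight(6):.2f} B/w")  # 0.78
```

At 0.53 B/w, 27B parameters come out around 14.3 GB, matching the HFQ4 file size above (the HFQ6 file is slightly larger than this estimate, plausibly due to tensors kept at higher precision).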

## About hipfire

A Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4), with no Python in the hot path. 9x faster than llama.cpp with ROCm on the same hardware.

## License

Model weights are subject to the original Qwen license. The hipfire engine is MIT-licensed.
