Qwen3.5-4B for hipfire

Pre-quantized Qwen3.5-4B (DeltaNet hybrid) for hipfire, a Rust-native LLM inference engine for AMD RDNA GPUs.

Files

File	Quant	Size	Min VRAM	Speed (RX 5700 XT)
qwen3.5-4b.q4.hfq	HFQ4	2.1 GB	4 GB	63 tok/s
qwen3.5-4b.hfq6.hfq	HFQ6	3.3 GB	5 GB	53 tok/s

Usage

# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3.5:4b
hipfire run qwen3.5:4b "Hello"

Quantization Formats

HFQ4: 4-bit, 256-weight groups (0.53 bytes per weight). Fastest.
HFQ6: 6-bit, 256-weight groups (0.78 bytes per weight). Best quality; about 15% slower than HFQ4.

Both include embedded tokenizer and model config.
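
The bytes-per-weight figures above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes each 256-weight group carries 8 bytes of metadata (for example a scale and an offset), which matches the published 0.53 and 0.78 figures but is not confirmed by the hipfire documentation. The function names and the nominal 4e9 parameter count are also assumptions made for the example.

```python
def bytes_per_weight(bits: int, group_size: int = 256, group_overhead_bytes: int = 8) -> float:
    """Average storage per weight: packed payload plus per-group metadata.

    group_overhead_bytes=8 is an assumption chosen to match the published
    figures; the actual HFQ group layout is not stated in this README.
    """
    payload_bytes = bits * group_size / 8  # packed weights in one group
    return (payload_bytes + group_overhead_bytes) / group_size


def estimated_size_gb(n_params: float, bits: int) -> float:
    """Rough file size in GB (10^9 bytes) if every parameter used this format."""
    return n_params * bytes_per_weight(bits) / 1e9


if __name__ == "__main__":
    nominal_params = 4e9  # assumption: "4B" taken at face value
    for name, bits in (("HFQ4", 4), ("HFQ6", 6)):
        bpw = bytes_per_weight(bits)
        size = estimated_size_gb(nominal_params, bits)
        print(f"{name}: {bpw:.2f} bytes/weight, roughly {size:.1f} GB")
```

This is only a lower-bound style estimate. Real files can differ from it because the true parameter count is not exactly 4e9 and some tensors, plus the embedded tokenizer and config, may be stored differently.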

About hipfire

Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4). No Python in the hot path. Reported to be 9x faster than llama.cpp + ROCm on the same hardware.

GitHub: Kaden-Schutt/hipfire
All models: docs/MODELS.md

License

Model weights are subject to the original Qwen license. The hipfire engine is MIT-licensed.
