Qwen3-0.6B for hipfire

Pre-quantized Qwen3-0.6B (LLaMA-style architecture, standard attention) for hipfire, a Rust-native LLM inference engine for AMD RDNA GPUs.

Quantized from Qwen/Qwen3-0.6B.

Files

File                      Quant      Size    Min VRAM  Speed (5700 XT)
qwen3-0.6b-hfq4.hfq       HFQ4       0.4 GB  1 GB
qwen3-0.6b-hfq4-v2.hfq    HFQ4 v2    0.4 GB  1 GB
qwen3-0.6b-hfq4g256.hfq   HFQ4-G256  0.4 GB  1 GB

Usage

# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3:0.6b
hipfire run qwen3:0.6b "Hello"
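
For scripted use, the pull and run steps can be wrapped in a small shell function that fails early when hipfire is not installed. This is a minimal sketch: `run_model` is a hypothetical helper name, and it relies only on the `hipfire pull` and `hipfire run` subcommands shown above.

```shell
# Hypothetical helper: pull a model, then run a single prompt against it.
# Uses only the `hipfire pull` and `hipfire run` commands from the usage section.
run_model() {
  model="$1"
  prompt="$2"
  # Fail early with a clear message if the install script has not been run.
  if ! command -v hipfire >/dev/null 2>&1; then
    echo "hipfire not found on PATH; run the install script first" >&2
    return 127
  fi
  # Only run the prompt if the pull succeeded.
  hipfire pull "$model" && hipfire run "$model" "$prompt"
}

# Example: run_model qwen3:0.6b "Hello"
```

The function returns 127 when the binary is missing, which matches the usual shell convention for "command not found".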

Quantization Formats

  • HFQ4: 4-bit, 256-weight groups (0.53 bytes per weight). Best speed.
  • HFQ6: 6-bit, 256-weight groups (0.78 bytes per weight). Best quality, about 15% slower than HFQ4.

Both formats embed the tokenizer and model config in the file.
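
The per-weight sizes quoted above (0.53 and 0.78 B/w) are slightly larger than the raw 4/8 and 6/8 bytes because each group also carries some metadata. The sketch below reproduces both figures under the assumption of 8 metadata bytes per 256-weight group (for example a 32-bit scale plus a 32-bit offset). That value is inferred from the quoted numbers, not taken from hipfire documentation, and `bytes_per_weight` is a hypothetical helper.

```shell
# Bytes per weight for a group-quantized format.
# Usage: bytes_per_weight BITS GROUP_SIZE META_BYTES
# META_BYTES is the per-group metadata size. The value 8 used below is an
# assumption chosen to match the quoted figures, not a documented value.
bytes_per_weight() {
  awk -v bits="$1" -v group="$2" -v meta="$3" \
    'BEGIN { printf "%.2f\n", (bits * group / 8 + meta) / group }'
}

bytes_per_weight 4 256 8   # HFQ4: (128 + 8) / 256, prints 0.53
bytes_per_weight 6 256 8   # HFQ6: (192 + 8) / 256, prints 0.78
```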

About hipfire

Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4). No Python in the hot path. 9x faster than llama.cpp+ROCm on the same hardware.

License

Model weights are subject to the original Qwen license. The hipfire engine is MIT-licensed.
