+---
+license: mit
+base_model: Qwen/Qwen3.5-2B
+tags:
+  - hipfire
+  - amd
+  - rdna
+  - quantized
+  - qwen3.5
+library_name: hipfire
+---
+# Qwen3.5-2B for hipfire
+Pre-quantized **Qwen3.5-2B** (DeltaNet hybrid) for [hipfire](https://github.com/Kaden-Schutt/hipfire), a Rust-native LLM inference engine for AMD RDNA GPUs.
+Quantized from [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B).
+## Files
+| File | Quant | Size | Min VRAM | Speed (5700 XT) |
+|------|-------|------|----------|-----------------|
+| qwen3.5-2b.q4.hfq | HFQ4 | 1.2 GB | 2 GB | 141 tok/s |
+| qwen3.5-2b.hfq6.hfq | HFQ6 | 1.6 GB | 3 GB | 127 tok/s |
+## Usage
+```bash
+# Install hipfire
+curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash
+# Pull and run
+hipfire pull qwen3.5:2b
+hipfire run qwen3.5:2b "Hello"
+```
+## Quantization Formats
+- **HFQ4**: 4-bit, 256-weight groups (0.53 bytes per weight, B/w). Best speed.
+- **HFQ6**: 6-bit, 256-weight groups (0.78 bytes per weight, B/w). Best quality. About 10% slower than HFQ4 on the 5700 XT (127 vs. 141 tok/s).
+Both formats embed the tokenizer and model config in the file.
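The bytes-per-weight (B/w) figures above can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes each 256-weight group carries 8 bytes of metadata (for example a scale and an offset), a value chosen because it matches the listed numbers, not one taken from the hipfire source. The helper name `bytes_per_weight` is hypothetical and is not part of the hipfire CLI.

```bash
# Bytes per weight = payload bits / 8 + per-group metadata bytes / group size.
# ASSUMPTION: 8 metadata bytes per group. This value reproduces the listed
# figures but is not confirmed by the hipfire format specification.
bytes_per_weight() {
  # $1 = bits per weight, $2 = weights per group, $3 = assumed metadata bytes per group
  LC_ALL=C awk -v bits="$1" -v group="$2" -v meta="$3" \
    'BEGIN { printf "%.2f\n", bits / 8 + meta / group }'
}

bytes_per_weight 4 256 8   # HFQ4: prints 0.53
bytes_per_weight 6 256 8   # HFQ6: prints 0.78
```

Under this assumption the per-group overhead is about 0.03 B/w for both formats, so the size difference between the two files comes almost entirely from the 4-bit versus 6-bit payload.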
+## About hipfire
+Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4), with no Python in the hot path. The project reports 9x faster inference than llama.cpp with ROCm on the same hardware.
+- GitHub: [Kaden-Schutt/hipfire](https://github.com/Kaden-Schutt/hipfire)
+- All models: [docs/MODELS.md](https://github.com/Kaden-Schutt/hipfire/blob/master/docs/MODELS.md)
+## License
+Model weights are subject to the original [Qwen license](https://huggingface.co/Qwen/Qwen3.5-2B). The hipfire engine is MIT-licensed.