---
license: apache-2.0
tags:
- diffusion
- masked-diffusion
- dream
- qwen2
- gguf
- diffuse-cpp
base_model: Dream-org/Dream-v0-Instruct-7B
pipeline_tag: text-generation
---

# Dream-v0-Instruct-7B-GGUF

GGUF quantizations of [Dream-org/Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp), a CPU inference engine for diffusion language models.

Dream is a masked diffusion language model based on the Qwen2.5-7B backbone, with bidirectional attention and Grouped Query Attention (GQA, 28 query heads / 4 KV heads).

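Unlike left-to-right autoregressive decoding, masked diffusion generation starts from a fully masked sequence and unmasks a few positions per denoising step. A minimal toy sketch of that loop (illustrative only — the "confidence" scores here come from a random stand-in, not from Dream):

```python
import random

MASK = -1  # stand-in for Dream's real mask token id (151666)

def toy_denoise(length=16, per_step=4, seed=42):
    """Iteratively commit the highest-confidence masked positions until none remain."""
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        # A real model would score every masked position in parallel;
        # here we fake per-position confidences with random numbers.
        scores = {i: rng.random() for i, t in enumerate(seq) if t == MASK}
        # Unmask the `per_step` most confident positions this step.
        for i in sorted(scores, key=scores.get, reverse=True)[:per_step]:
            seq[i] = rng.randrange(100)  # fake token id
        steps += 1
    return seq, steps

seq, steps = toy_denoise()
print(steps)  # 16 positions / 4 per step = 4 steps
```

Early-exit strategies like the `entropy_exit` mode benchmarked below reduce `steps` further by stopping once the remaining positions are predicted with high certainty.
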
## Available Quantizations

| File | Type | Size | Description |
|------|------|------|-------------|
| `dream-7b-f16.gguf` | F16 | ~15 GB | Full precision, best quality |
| `dream-7b-q8_0.gguf` | Q8_0 | ~8.2 GB | 8-bit quantization, near-lossless |
| `dream-7b-q4km.gguf` | Q4_K_M | ~5.0 GB | 4-bit mixed quantization, best quality/size ratio |

**Recommended:** Q4_K_M for most users; Q8_0 if you have enough RAM and want minimal quality loss.

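For intuition on why Q8_0 is near-lossless: it stores weights in blocks of 32 values sharing one scale, so the worst-case error per weight is half a quantization step. A simplified round-trip sketch (illustrative, not the exact GGML bit layout):

```python
def q8_0_roundtrip(block):
    """Quantize one block of 32 floats to int8 with a shared scale, then dequantize."""
    assert len(block) == 32
    amax = max(abs(x) for x in block)
    scale = amax / 127.0 if amax else 1.0
    quants = [round(x / scale) for x in block]   # each in [-127, 127]
    return [q * scale for q in quants]

block = [i / 10.0 for i in range(-16, 16)]
restored = q8_0_roundtrip(block)
max_err = max(abs(a - b) for a, b in zip(block, restored))
# Worst-case error is bounded by half the quantization step (scale / 2).
```

The K-quant schemes (Q4_K_M) use smaller bit widths plus per-sub-block scales and minimums, trading a little more error for much smaller files.
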
## Performance

Benchmarked on diffuse-cpp with entropy_exit + inter-step KV cache, 12 threads, seed=42:

| Prompt | tok/s | Steps | vs llama.cpp |
|--------|-------|-------|--------------|
| Capital of France? | 21.6 | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| 15 x 23? | 21.6 | 2 | 2.5x |
| Translate to Spanish | 13.2 | 10 | 1.6x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Why sky blue? | 4.9 | 16 | 0.6x |
| List planets | 4.9 | 16 | 0.6x |
| Poem about ocean | 4.5 | 16 | 0.5x |
| **Average** | **11.6** | | **1.4x** |

- Easy prompts (factual, math): **14-22 tok/s** (1.6-2.5x faster than llama.cpp)
- Hard prompts (creative, long-form): **4.5-4.9 tok/s**
- llama.cpp baseline: 8.51 tok/s (Qwen2.5-7B-Instruct, Q4_K_M, same hardware)

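The summary row follows directly from the per-prompt numbers (a quick sanity check, not part of diffuse-cpp):

```python
tok_s = [21.6, 14.3, 21.6, 13.2, 8.2, 4.9, 4.9, 4.5]

avg = sum(tok_s) / len(tok_s)   # 11.65 -> matches the table's 11.6 within rounding
speedup = avg / 8.51            # vs the llama.cpp baseline -> ~1.37, i.e. ~1.4x
```
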
## Usage

```bash
# Download
huggingface-cli download diffuse-cpp/Dream-v0-Instruct-7B-GGUF dream-7b-q4km.gguf

# Run (requires diffuse-cpp v0.2.0+)
./diffuse-cli -m dream-7b-q4km.gguf -p "What is the capital of France?" -n 64 -s 16
```

## Model Details

- **Architecture:** Qwen2.5-7B backbone with bidirectional attention
- **Parameters:** 7.62B
- **Layers:** 28
- **Hidden size:** 3584
- **Attention:** GQA (28 query heads, 4 KV heads, head dim 128)
- **FFN:** SwiGLU, intermediate size 18944
- **Vocabulary:** 152,064 tokens
- **RoPE theta:** 1,000,000
- **Mask token ID:** 151666
- **Training:** Masked diffusion on text, with autoregressive logit shift

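The listed dimensions are mutually consistent, which is worth verifying when writing or debugging a converter (a quick check using only the numbers above):

```python
n_q_heads, n_kv_heads, head_dim, hidden = 28, 4, 128, 3584

# The query projection spans the full hidden size.
assert n_q_heads * head_dim == hidden

# Under GQA, each KV head is shared by a group of query heads (28 / 4 = 7).
assert n_q_heads % n_kv_heads == 0

# K and V projections are therefore 1/7th the width of Q.
kv_dim = n_kv_heads * head_dim
print(kv_dim)  # 512
```
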
## Conversion Details

Converted from SafeTensors using `convert-dream.py` from diffuse-cpp:

- 339 tensors total (255 weights + 84 QKV biases)
- QKV biases kept at F32 in all quantizations
- Edge layers (first/last) quantized to Q6_K in the Q4_K_M scheme

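The bias count follows from the layer count given under Model Details (a small consistency check, not converter code):

```python
layers = 28

# One bias tensor each for the Q, K, and V projections per layer.
qkv_biases = layers * 3
weights = 255

print(qkv_biases, weights + qkv_biases)  # 84 339
```
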
## Citation

```bibtex
@misc{dream2025,
  title={Dream 7B - Scalable Discrete Denoising Diffusion Models for Text Generation},
  author={Ye, Jiacheng and others},
  year={2025}
}
```

## License

Apache 2.0, following the original Dream model license.