---
license: apache-2.0
tags:
- diffusion
- masked-diffusion
- dream
- qwen2
- gguf
- diffuse-cpp
base_model: Dream-org/Dream-v0-Instruct-7B
pipeline_tag: text-generation
---
# Dream-v0-Instruct-7B-GGUF
GGUF quantizations of [Dream-org/Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp), a CPU inference engine for Diffusion Language Models.
Dream is a masked diffusion language model based on the Qwen2.5-7B backbone with bidirectional attention and Grouped Query Attention (GQA, 28 query heads / 4 KV heads).
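To make the decoding model concrete: unlike an autoregressive LM, a masked diffusion LM starts from a fully masked sequence and, over a small number of steps, commits the positions it is most confident about while re-predicting the rest. The toy sketch below is illustrative only (the function names, the `-1` mask sentinel, and the confidence rule are assumptions, not diffuse-cpp's actual API); it shows the entropy-guided unmasking idea behind options like `entropy_exit`:

```python
import numpy as np

MASK = -1  # toy sentinel; the real Dream mask token ID is 151666

def fake_logits(seq, vocab=8):
    # Stand-in for the model forward pass: deterministic toy logits.
    rng = np.random.default_rng(len(seq))
    return rng.normal(size=(len(seq), vocab))

def diffusion_decode(length=8, steps=6, vocab=8):
    """Toy masked-diffusion decoding: start fully masked, then each step
    commit the masked positions whose predicted distributions have the
    lowest entropy (highest confidence), re-predicting the remainder."""
    seq = np.full(length, MASK)
    for _ in range(steps):
        masked = np.where(seq == MASK)[0]
        if masked.size == 0:
            break  # nothing left to unmask: early exit
        logits = fake_logits(seq, vocab)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        entropy = -(probs * np.log(probs)).sum(-1)
        # unmask roughly half of the remaining positions, most confident first
        k = max(1, masked.size // 2)
        pick = masked[np.argsort(entropy[masked])[:k]]
        seq[pick] = probs[pick].argmax(-1)
    return seq
```

Easy prompts finish in very few steps (which is why the factual prompts in the benchmark below take only 2 steps), while harder prompts need more rounds of re-prediction.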
## Available Quantizations
| File | Type | Size | Description |
|------|------|------|-------------|
| `dream-7b-f16.gguf` | F16 | ~15 GB | Full precision, best quality |
| `dream-7b-q8_0.gguf` | Q8_0 | ~8.2 GB | 8-bit quantization, near-lossless |
| `dream-7b-q4km.gguf` | Q4_K_M | ~5.0 GB | 4-bit mixed quantization, best quality/size ratio |
**Recommended:** Q4_K_M for most users. Q8_0 if you have enough RAM and want minimal quality loss.
## Performance
Benchmarked on diffuse-cpp with entropy_exit + inter-step KV cache, 12 threads, seed=42:
| Prompt | tok/s | Steps | vs llama.cpp |
|--------|-------|-------|-------------|
| Capital of France? | 21.6 | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| 15 x 23? | 21.6 | 2 | 2.5x |
| Translate to Spanish | 13.2 | 10 | 1.6x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Why sky blue? | 4.9 | 16 | 0.6x |
| List planets | 4.9 | 16 | 0.6x |
| Poem about ocean | 4.5 | 16 | 0.5x |
| **Average** | **11.6** | | **1.4x** |
- Easy prompts (factual, math): **14-22 tok/s** (1.6-2.5x faster than llama.cpp)
- Hard prompts (creative, long-form): **4.5-4.9 tok/s** (0.5-0.6x, slower than llama.cpp)
- llama.cpp baseline: 8.51 tok/s (Qwen2.5-7B-Instruct, Q4_K_M, same hardware)
## Usage
```bash
# Download
huggingface-cli download diffuse-cpp/Dream-v0-Instruct-7B-GGUF dream-7b-q4km.gguf
# Run (requires diffuse-cpp v0.2.0+)
./diffuse-cli -m dream-7b-q4km.gguf -p "What is the capital of France?" -n 64 -s 16
```
## Model Details
- **Architecture:** Qwen2.5-7B backbone with bidirectional attention
- **Parameters:** 7.62B
- **Layers:** 28
- **Hidden size:** 3584
- **Attention:** GQA (28 query heads, 4 KV heads, head dim 128)
- **FFN:** SwiGLU, intermediate size 18944
- **Vocabulary:** 152,064 tokens
- **RoPE theta:** 1,000,000
- **Mask token ID:** 151666
- **Training:** Masked diffusion on text, with autoregressive logit shift
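The "autoregressive logit shift" means the output at position i is trained to predict the token at position i+1, as in a causal LM, even though attention is bidirectional. A minimal sketch of the realignment (illustrative array manipulation only, not diffuse-cpp code):

```python
import numpy as np

def shift_logits(raw_logits):
    """Realign per-position logits so that index i holds the prediction
    *for* token i, which came from the hidden state at position i-1.
    Position 0 has no predecessor and is left at -inf (illustrative)."""
    shifted = np.full_like(raw_logits, -np.inf)
    shifted[1:] = raw_logits[:-1]
    return shifted
```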
## Conversion Details
Converted from SafeTensors using `convert-dream.py` from diffuse-cpp:
- 339 tensors total (255 weights + 84 QKV biases)
- QKV biases kept at F32 in all quantizations
- Edge layers (first/last) quantized to Q6_K in Q4_K_M scheme
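The per-tensor scheme above can be summarized as a small selection rule. This is a hedged sketch of the logic described in the bullets, not the actual `convert-dream.py` code (the function name and tensor-name convention are assumptions):

```python
def pick_quant_type(name: str, layer: int, n_layers: int = 28) -> str:
    """Illustrative quant-type choice for the Q4_K_M scheme described above:
    QKV biases stay F32, first/last layers get Q6_K, the rest Q4_K."""
    if name.endswith(".bias"):       # the 84 QKV biases stay full precision
        return "F32"
    if layer in (0, n_layers - 1):   # edge layers get higher precision
        return "Q6_K"
    return "Q4_K"
```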
## Citation
```bibtex
@misc{dream2025,
  title={Dream 7B - Scalable Discrete Denoising Diffusion Models for Text Generation},
  author={Ye, Jiacheng and others},
  year={2025}
}
```
## License
Apache 2.0, following the original Dream model license.