---
license: apache-2.0
tags:
- diffusion
- masked-diffusion
- dream
- qwen2
- gguf
- diffuse-cpp
base_model: Dream-org/Dream-v0-Instruct-7B
pipeline_tag: text-generation
---

# Dream-v0-Instruct-7B-GGUF

GGUF quantizations of [Dream-org/Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp), a CPU inference engine for diffusion language models.

Dream is a masked diffusion language model based on the Qwen2.5-7B backbone, with bidirectional attention and Grouped Query Attention (GQA, 28 query heads / 4 KV heads).

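Unlike left-to-right autoregressive decoding, masked diffusion generation starts from a fully masked sequence and unmasks a few positions per denoising step. A minimal toy sketch of that loop (illustrative only — the "confidence" scores here come from a random stand-in, not from Dream):

```python
import random

MASK = -1  # stand-in for Dream's real mask token id (151666)

def toy_denoise(length=16, per_step=4, seed=42):
    """Iteratively commit the highest-confidence masked positions until none remain."""
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        # A real model would score every masked position in parallel;
        # here we fake per-position confidences with random numbers.
        scores = {i: rng.random() for i, t in enumerate(seq) if t == MASK}
        # Unmask the `per_step` most confident positions this step.
        for i in sorted(scores, key=scores.get, reverse=True)[:per_step]:
            seq[i] = rng.randrange(100)  # fake token id
        steps += 1
    return seq, steps

seq, steps = toy_denoise()
print(steps)  # 16 positions / 4 per step = 4 steps
```

Early-exit strategies like the `entropy_exit` mode benchmarked below reduce `steps` further by stopping once the remaining positions are predicted with high certainty.
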
## Available Quantizations

| File | Type | Size | Description |
|------|------|------|-------------|
| `dream-7b-f16.gguf` | F16 | ~15 GB | Full precision, best quality |
| `dream-7b-q8_0.gguf` | Q8_0 | ~8.2 GB | 8-bit quantization, near-lossless |
| `dream-7b-q4km.gguf` | Q4_K_M | ~5.0 GB | 4-bit mixed quantization, best quality/size ratio |

**Recommended:** Q4_K_M for most users; Q8_0 if you have enough RAM and want minimal quality loss.

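For intuition on why Q8_0 is near-lossless: it stores weights in blocks of 32 values sharing one scale, so the worst-case error per weight is half a quantization step. A simplified round-trip sketch (illustrative, not the exact GGML bit layout):

```python
def q8_0_roundtrip(block):
    """Quantize one block of 32 floats to int8 with a shared scale, then dequantize."""
    assert len(block) == 32
    amax = max(abs(x) for x in block)
    scale = amax / 127.0 if amax else 1.0
    quants = [round(x / scale) for x in block]   # each in [-127, 127]
    return [q * scale for q in quants]

block = [i / 10.0 for i in range(-16, 16)]
restored = q8_0_roundtrip(block)
max_err = max(abs(a - b) for a, b in zip(block, restored))
# Worst-case error is bounded by half the quantization step (scale / 2).
```

The K-quant schemes (Q4_K_M) use smaller bit widths plus per-sub-block scales and minimums, trading a little more error for much smaller files.
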
## Performance

Benchmarked on diffuse-cpp with entropy_exit + inter-step KV cache, 12 threads, seed=42:

| Prompt | tok/s | Steps | vs llama.cpp |
|--------|-------|-------|--------------|
| Capital of France? | 21.6 | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| 15 x 23? | 21.6 | 2 | 2.5x |
| Translate to Spanish | 13.2 | 10 | 1.6x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Why sky blue? | 4.9 | 16 | 0.6x |
| List planets | 4.9 | 16 | 0.6x |
| Poem about ocean | 4.5 | 16 | 0.5x |
| **Average** | **11.6** | | **1.4x** |

- Easy prompts (factual, math): **14-22 tok/s** (1.6-2.5x faster than llama.cpp)
- Hard prompts (creative, long-form): **4.5-4.9 tok/s**
- llama.cpp baseline: 8.51 tok/s (Qwen2.5-7B-Instruct, Q4_K_M, same hardware)

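The summary row follows directly from the per-prompt numbers (a quick sanity check, not part of diffuse-cpp):

```python
tok_s = [21.6, 14.3, 21.6, 13.2, 8.2, 4.9, 4.9, 4.5]

avg = sum(tok_s) / len(tok_s)   # 11.65 -> matches the table's 11.6 within rounding
speedup = avg / 8.51            # vs the llama.cpp baseline -> ~1.37, i.e. ~1.4x
```
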
## Usage

```bash
# Download
huggingface-cli download diffuse-cpp/Dream-v0-Instruct-7B-GGUF dream-7b-q4km.gguf

# Run (requires diffuse-cpp v0.2.0+)
./diffuse-cli -m dream-7b-q4km.gguf -p "What is the capital of France?" -n 64 -s 16
```

## Model Details

- **Architecture:** Qwen2.5-7B backbone with bidirectional attention
- **Parameters:** 7.62B
- **Layers:** 28
- **Hidden size:** 3584
- **Attention:** GQA (28 query heads, 4 KV heads, head dim 128)
- **FFN:** SwiGLU, intermediate size 18944
- **Vocabulary:** 152,064 tokens
- **RoPE theta:** 1,000,000
- **Mask token ID:** 151666
- **Training:** Masked diffusion on text, with autoregressive logit shift

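The listed dimensions are mutually consistent, which is worth verifying when writing or debugging a converter (a quick check using only the numbers above):

```python
n_q_heads, n_kv_heads, head_dim, hidden = 28, 4, 128, 3584

# The query projection spans the full hidden size.
assert n_q_heads * head_dim == hidden

# Under GQA, each KV head is shared by a group of query heads (28 / 4 = 7).
assert n_q_heads % n_kv_heads == 0

# K and V projections are therefore 1/7th the width of Q.
kv_dim = n_kv_heads * head_dim
print(kv_dim)  # 512
```
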
## Conversion Details

Converted from SafeTensors using `convert-dream.py` from diffuse-cpp:

- 339 tensors total (255 weights + 84 QKV biases)
- QKV biases kept at F32 in all quantizations
- Edge layers (first/last) quantized to Q6_K in the Q4_K_M scheme

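The bias count follows from the layer count given under Model Details (a small consistency check, not converter code):

```python
layers = 28

# One bias tensor each for the Q, K, and V projections per layer.
qkv_biases = layers * 3
weights = 255

print(qkv_biases, weights + qkv_biases)  # 84 339
```
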
## Citation

```bibtex
@misc{dream2025,
  title={Dream 7B - Scalable Discrete Denoising Diffusion Models for Text Generation},
  author={Ye, Jiacheng and others},
  year={2025}
}
```

## License

Apache 2.0, following the original Dream model license.