---
license: apache-2.0
language: en
library_name: transformers
tags:
- d2f
- diffusion-llm
- text-generation
- dream
- lora
base_model: Dream-org/Dream-v0-Instruct-7B
model_name: D2F_Dream_Base_7B_Lora
---

# D2F LoRA Adapter for Dream-Instruct-7B

This repository contains the **LoRA adapter** for the `Dream-org/Dream-v0-Instruct-7B` model, trained with the **Discrete Diffusion Forcing (D2F)** method.

The adapter lets the `Dream-Instruct-7B` diffusion LLM (dLLM) reach inference speeds significantly faster than both the original model and leading autoregressive (AR) models such as LLaMA3, while maintaining comparable output quality.

The D2F method and its results are detailed in the paper **[D2F: Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing](https://arxiv.org/abs/2508.09192)**.

- **Official Code:** [D2F GitHub Repository](https://github.com/zhijie-group/Discrete-Diffusion-Forcing)
- **Demo Space:** [D2F-LLaDA-Instruct-8B](https://huggingface.co/spaces/zhijie3/D2F-LLaDA-Instruct-8B)
- **Used in:** [LoPA](https://github.com/zhijie-group/LoPA)

+
## Method: Discrete Diffusion Forcing (D2F)
|
| 27 |
+
|
| 28 |
+
Diffusion LLMs (dLLMs) have long promised ultra-fast parallel decoding, but this potential was historically crippled by two main bottlenecks:
|
| 29 |
+
1. **KV Cache Incompatibility:** Their bidirectional attention mechanism prevented the use of the Key-Value Cache, a critical optimization in AR models.
|
| 30 |
+
2. **Strict Inter-Block Dependency:** Previous attempts at block-based generation required each block to be fully generated before starting the next, preventing true parallelism.
|
| 31 |
+
|
| 32 |
+
**D2F** solves these issues with a novel hybrid approach:
|
| 33 |
+
|
| 34 |
+
1. **Hybrid Architecture:** D2F reframes text generation as a block-autoregressive process.
|
| 35 |
+
* **Within a block:** Attention remains **bidirectional** to capture rich local context.
|
| 36 |
+
* **Between blocks:** Attention is made **causal**, allowing the model to be fully compatible with the standard **KV Cache**.
|
| 37 |
+
|
| 38 |
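
This hybrid attention pattern can be sketched as a mask-construction routine. The following is an illustrative NumPy sketch, not the official implementation; the function name, block size, and mask convention are assumptions for the example:

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Hybrid D2F-style mask: mask[i, j] is True when token i may attend
    to token j -- bidirectional inside a block, causal across blocks."""
    # Block index of every position, e.g. [0, 0, 1, 1, 2, 2] for size 2.
    blocks = np.arange(seq_len) // block_size
    # Token i attends to token j iff j's block does not come after i's block.
    return blocks[None, :] <= blocks[:, None]

# With seq_len=6 and block_size=2: positions 0-1 attend to each other freely,
# positions 2-3 also see block 0, and no position sees a later block.
mask = block_causal_mask(seq_len=6, block_size=2)
```

Because the mask is causal at block granularity, the KV entries of earlier blocks never change once written, which is what makes standard KV caching possible.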
+
2. **Pipelined Parallel Decoding:** D2F uses an efficient training and inference strategy.
|
| 39 |
+
* **Training:** It uses *Asymmetric Distillation*, where a D2F student model learns to mimic a powerful bidirectional teacher model, efficiently transferring its capabilities to the fast, cache-friendly architecture.
|
| 40 |
+
* **Inference:** It enables a dynamic **pipelined parallel decoder**. New text blocks are added to the pipeline as soon as their predecessors are only partially complete. This creates an asynchronous workflow that maximizes GPU utilization and dramatically boosts throughput.
|
| 41 |
+
|
| 42 |
+
|
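The benefit of the pipelined schedule can be illustrated with a toy step counter. Everything here (the step counts, the admission threshold, and the one-step-per-iteration model) is a simplifying assumption for illustration, not the decoder's actual policy:

```python
def pipelined_steps(num_blocks: int, steps_per_block: int, min_progress: int) -> int:
    """Count iterations when block b may enter the pipeline once block b-1
    has completed at least `min_progress` steps; every admitted, unfinished
    block advances one denoising step per iteration."""
    progress = [0] * num_blocks
    admitted = 1  # block 0 starts immediately
    iterations = 0
    while progress[-1] < steps_per_block:
        # Admit the next block once its predecessor is far enough along.
        if admitted < num_blocks and progress[admitted - 1] >= min_progress:
            admitted += 1
        for b in range(admitted):
            if progress[b] < steps_per_block:
                progress[b] += 1
        iterations += 1
    return iterations

# Strictly sequential decoding of 4 blocks at 8 steps each takes 4 * 8 = 32
# iterations; overlapping blocks once a predecessor has taken 2 steps
# finishes in far fewer.
overlapped = pipelined_steps(num_blocks=4, steps_per_block=8, min_progress=2)
```

Overlapping block refinement this way is what lets the decoder keep the GPU busy instead of idling until each block fully converges.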
## How to Use

⚠️ **Important:** This is a LoRA adapter and requires the official D2F codebase for inference.

For detailed instructions and code, please refer to the official GitHub repository:

➡️ **https://github.com/zhijie-group/Discrete-Diffusion-Forcing** ⬅️