---
license: apache-2.0
language: en
library_name: transformers
tags:
- d2f
- diffusion-llm
- text-generation
- dream
- lora
base_model: Dream-org/Dream-v0-Instruct-7B
model_name: D2F_Dream_Base_7B_Lora
---

# D2F LoRA Adapter for Dream-Instruct-7B

This repository contains the **LoRA adapter** for the `Dream-org/Dream-v0-Instruct-7B` model, trained with the **Discrete Diffusion Forcing (D2F)** method.

The adapter lets the `Dream-Instruct-7B` diffusion LLM (dLLM) reach inference speeds significantly faster than both the original model and leading autoregressive (AR) models such as LLaMA3, while maintaining comparable output quality.

The D2F method and its results are detailed in the paper **[D2F: Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing](https://arxiv.org/abs/2508.09192)**.

- **Official Code:** [D2F GitHub Repository](https://github.com/zhijie-group/Discrete-Diffusion-Forcing)
- **Demo Space:** [D2F-LLaDA-Instruct-8B](https://huggingface.co/spaces/zhijie3/D2F-LLaDA-Instruct-8B)
- **Used in:** [LoPA](https://github.com/zhijie-group/LoPA)

+
## Method: Discrete Diffusion Forcing (D2F)
|
| 27 |
+
|
| 28 |
+
Diffusion LLMs (dLLMs) have long promised ultra-fast parallel decoding, but this potential was historically crippled by two main bottlenecks:
|
| 29 |
+
1. **KV Cache Incompatibility:** Their bidirectional attention mechanism prevented the use of the Key-Value Cache, a critical optimization in AR models.
|
| 30 |
+
2. **Strict Inter-Block Dependency:** Previous attempts at block-based generation required each block to be fully generated before starting the next, preventing true parallelism.
|
| 31 |
+
|
| 32 |
+
**D2F** solves these issues with a novel hybrid approach:
|
| 33 |
+
|
| 34 |
+
1. **Hybrid Architecture:** D2F reframes text generation as a block-autoregressive process.
|
| 35 |
+
* **Within a block:** Attention remains **bidirectional** to capture rich local context.
|
| 36 |
+
* **Between blocks:** Attention is made **causal**, allowing the model to be fully compatible with the standard **KV Cache**.
|
| 37 |
+
|
| 38 |
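
This hybrid attention pattern can be sketched as a mask-construction routine. The following is an illustrative NumPy sketch, not the official implementation; the function name, block size, and mask convention are assumptions for the example:

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Hybrid D2F-style mask: mask[i, j] is True when token i may attend
    to token j -- bidirectional inside a block, causal across blocks."""
    # Block index of every position, e.g. [0, 0, 1, 1, 2, 2] for size 2.
    blocks = np.arange(seq_len) // block_size
    # Token i attends to token j iff j's block does not come after i's block.
    return blocks[None, :] <= blocks[:, None]

# With seq_len=6 and block_size=2: positions 0-1 attend to each other freely,
# positions 2-3 also see block 0, and no position sees a later block.
mask = block_causal_mask(seq_len=6, block_size=2)
```

Because the mask is causal at block granularity, the KV entries of earlier blocks never change once written, which is what makes standard KV caching possible.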
+
2. **Pipelined Parallel Decoding:** D2F uses an efficient training and inference strategy.
|
| 39 |
+
* **Training:** It uses *Asymmetric Distillation*, where a D2F student model learns to mimic a powerful bidirectional teacher model, efficiently transferring its capabilities to the fast, cache-friendly architecture.
|
| 40 |
+
* **Inference:** It enables a dynamic **pipelined parallel decoder**. New text blocks are added to the pipeline as soon as their predecessors are only partially complete. This creates an asynchronous workflow that maximizes GPU utilization and dramatically boosts throughput.
|
| 41 |
+
|
| 42 |
+
|
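The benefit of the pipelined schedule can be illustrated with a toy step counter. Everything here (the step counts, the admission threshold, and the one-step-per-iteration model) is a simplifying assumption for illustration, not the decoder's actual policy:

```python
def pipelined_steps(num_blocks: int, steps_per_block: int, min_progress: int) -> int:
    """Count iterations when block b may enter the pipeline once block b-1
    has completed at least `min_progress` steps; every admitted, unfinished
    block advances one denoising step per iteration."""
    progress = [0] * num_blocks
    admitted = 1  # block 0 starts immediately
    iterations = 0
    while progress[-1] < steps_per_block:
        # Admit the next block once its predecessor is far enough along.
        if admitted < num_blocks and progress[admitted - 1] >= min_progress:
            admitted += 1
        for b in range(admitted):
            if progress[b] < steps_per_block:
                progress[b] += 1
        iterations += 1
    return iterations

# Strictly sequential decoding of 4 blocks at 8 steps each takes 4 * 8 = 32
# iterations; overlapping blocks once a predecessor has taken 2 steps
# finishes in far fewer.
overlapped = pipelined_steps(num_blocks=4, steps_per_block=8, min_progress=2)
```

Overlapping block refinement this way is what lets the decoder keep the GPU busy instead of idling until each block fully converges.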
## How to Use

⚠️ **Important:** This is a LoRA adapter and requires the official D2F codebase for inference.

For detailed instructions and code, please refer to the official GitHub repository:

➡️ **https://github.com/zhijie-group/Discrete-Diffusion-Forcing** ⬅️