AryanNsc committed on
Commit c8f7db0 · verified · 1 Parent(s): 2e42c10

Update README.md

Files changed (1)
  1. README.md +99 -3
README.md CHANGED
@@ -1,3 +1,99 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ # CBD-LLM: Causal Block Diffusion Language Model (PoC)
+
+ **CBD-LLM (Causal Block Diffusion)** is an experimental **hybrid Diffusion–Autoregressive language model** that enables **block-parallel text generation** while retaining **standard causal attention**, KV caching, and compatibility with pretrained AR weights.
+
+ This repository hosts a **Proof of Concept (PoC)** checkpoint demonstrating the feasibility of **parallel decoding with causal attention**, trained efficiently on consumer hardware using LoRA.
+
+ ---
+
+ ## 🔍 Model Overview
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | **Model Type** | Causal Block Diffusion LLM |
+ | **Base Model** | Qwen2.5 |
+ | **Parameters** | ~1B (base), LoRA fine-tuned |
+ | **Attention** | Standard causal attention |
+ | **Decoding** | Block-parallel diffusion |
+ | **Training Stage** | Proof of Concept (Research Preview) |
+ | **License** | MIT |
+
+ ---
+
+ ## Key Idea
+
+ CBD-LLM bridges the gap between:
+
+ - **Autoregressive LLMs** (low data cost, KV-cache friendly, but serial decoding)
+ - **Diffusion LLMs** (parallel decoding, but high training cost and no KV cache)
+
+ By combining **topological token reordering** with **block-wise diffusion**, CBD-LLM achieves:
+
+ - Parallel generation
+ - Low VRAM usage
+ - Compatibility with FlashAttention and KV caching
+ - Efficient fine-tuning from pretrained AR models
+
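As a rough illustration of why block-parallel decoding matters, compare forward-pass counts for serial AR decoding versus block-wise diffusion. The block size and step count below are illustrative assumptions for this sketch, not measured settings of this checkpoint:

```python
# Illustrative forward-pass comparison (assumed numbers, not benchmarks).
seq_len = 1024        # tokens to generate
block_size = 64       # tokens per diffusion block (assumed)
denoise_steps = 4     # denoising passes per block (assumed)

ar_passes = seq_len                                   # AR: one token per pass
cbd_passes = (seq_len // block_size) * denoise_steps  # CBD: steps per block

print(ar_passes, cbd_passes)  # 1024 64
```

Under these assumptions the block-parallel scheme needs 16× fewer forward passes, at the cost of predicting several tokens per pass.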
+ ---
+
+ ## Architecture Summary
+
+ ### 1. Topological Reordering (Causal-Friendly Diffusion)
+
+ Diffusion models require masked tokens to attend to future context, which normally forces **bidirectional attention**.
+
+ CBD-LLM avoids this by:
+
+ - Physically moving **observed tokens to the front**
+ - Moving **masked tokens to the back**
+ - Preserving **original positional IDs (RoPE)**
+
+ This allows masked tokens to attend to all observed tokens under a **standard causal mask**:
+
+ ```text
+ Logical:  [The] [quick] [brown] [fox]
+ Masked:   [The] [MASK]  [MASK]  [fox]
+
+ Physical: [The] [fox]  [MASK] [MASK]
+ Pos IDs:   0     3      1      2
+ ```
+
+ Result: causal attention and the KV cache remain intact.
+
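The reordering above can be sketched in a few lines. This is a toy illustration (the function name and string-token representation are made up for clarity, not this repository's code): observed tokens move to the front, masked tokens to the back, and every token keeps its original position id so RoPE is unaffected.

```python
MASK = "[MASK]"

def reorder_block(tokens):
    """Return (physical_tokens, position_ids): observed tokens first,
    masked tokens last, each keeping its ORIGINAL position id."""
    observed = [(i, t) for i, t in enumerate(tokens) if t != MASK]
    masked = [(i, t) for i, t in enumerate(tokens) if t == MASK]
    ordered = observed + masked
    physical = [t for _, t in ordered]
    pos_ids = [i for i, _ in ordered]
    return physical, pos_ids

physical, pos_ids = reorder_block(["[The]", MASK, MASK, "[fox]"])
print(physical)  # ['[The]', '[fox]', '[MASK]', '[MASK]']
print(pos_ids)   # [0, 3, 1, 2]
```

Because the observed prefix is contiguous, a causal mask already lets every masked position attend to it, and the prefix's KV entries can be cached as usual.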
+ ---
+
+ ### 2. Block-Wise Variable Noise Diffusion
+
+ Instead of diffusing entire sequences at once:
+
+ - Text is generated in **fixed-size blocks** (e.g., 64 tokens)
+ - Each block undergoes **multiple denoising steps**
+ - The full block is refined **in parallel**
+
+ The model learns both:
+
+ - **Drafting** from noise
+ - **Refinement** from partial context
+
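The block-wise loop above can be sketched as follows. This is a toy illustration, not this repository's decoding code: `predict` is a hypothetical stand-in for a model forward pass, and the confidence-based commit schedule is one common way such denoising loops are organized.

```python
import random

MASK = None  # placeholder for a still-masked position

def predict(block, context):
    """Hypothetical stand-in for a model forward pass: proposes a
    (token, confidence) pair for every masked position in the block."""
    vocab = ["the", "quick", "brown", "fox"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(block) if t is MASK}

def denoise_block(block_size, context, num_steps=4):
    """Refine one block over several denoising steps, committing the
    most confident proposals at each step (all positions predicted in
    parallel by a single forward pass)."""
    block = [MASK] * block_size
    for step in range(num_steps):
        proposals = predict(block, context)
        if not proposals:
            break
        # Commit a growing fraction of the remaining masked positions.
        k = max(1, len(proposals) // (num_steps - step))
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in best:
            block[i] = tok
    # Fill any positions that are still masked in one last pass.
    for i, (tok, _conf) in predict(block, context).items():
        block[i] = tok
    return block

print(denoise_block(8, context=[]))
```

Each `predict` call refines the whole block at once, which is what makes the scheme block-parallel; the schedule deciding how many tokens to commit per step is a tunable design choice.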
+ ---
+
+ ## Intended Use
+
+ **Research and experimentation only.**
+
+ Recommended use cases:
+
+ - Parallel decoding research
+ - Diffusion–AR hybrid modeling
+ - Efficient LLM inference studies
+ - Architecture prototyping
+
+ Not recommended for:
+
+ - Production deployment
+ - Safety-critical applications
+
+ ---
+
+ ## References
+
+ This model is inspired by:
+
+ 1. *Fast-dLLM v2: Efficient Block-Diffusion LLM* (2025)
+ 2. *WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention* (2025)
+
+ ---