---
license: mit
---

# CBD-LLM: Causal Block Diffusion Language Model (PoC)

**CBD-LLM (Causal Block Diffusion)** is an experimental **hybrid Diffusion–Autoregressive language model** that enables **block-parallel text generation** while retaining **standard causal attention**, KV caching, and compatibility with pretrained AR weights.

This repository hosts a **Proof of Concept (PoC)** checkpoint demonstrating the feasibility of **parallel decoding with causal attention**, trained efficiently on consumer hardware using LoRA.
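
A minimal loading sketch, assuming the checkpoint ships as a standard `transformers` + `peft` LoRA adapter. The repo id and base variant below are placeholders, since this card does not state them:

```python
# Hypothetical loading sketch: the repo id and base variant are placeholders,
# not confirmed by this card. Plain `generate()` would decode autoregressively;
# block-parallel decoding needs the custom loop sketched further down.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-1.5B"             # assumed ~1B-class Qwen2.5 base
ADAPTER = "your-username/cbd-llm-poc"  # placeholder for this repo's id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the LoRA weights
```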

---

## 🔍 Model Overview

| Attribute | Description |
|-----------|-------------|
| **Model Type** | Causal Block Diffusion LLM |
| **Base Model** | Qwen2.5 |
| **Parameters** | ~1B (base), LoRA fine-tuned |
| **Attention** | Standard causal attention |
| **Decoding** | Block-parallel diffusion |
| **Training Stage** | Proof of Concept (Research Preview) |
| **License** | MIT |

---

## Key Idea

CBD-LLM bridges the gap between:

- **Autoregressive LLMs** (low data cost, KV-cache friendly, but serial decoding)
- **Diffusion LLMs** (parallel decoding, but high training cost and no KV cache)

By combining **topological token reordering** with **block-wise diffusion**, CBD-LLM achieves:

- Parallel generation
- Low VRAM usage
- Compatibility with FlashAttention and KV caching
- Efficient fine-tuning from pretrained AR models (see the LoRA sketch below)
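
The fine-tuning claim amounts to training only low-rank adapters on top of frozen AR weights. A minimal sketch with `peft`, where the rank, alpha, and target modules are illustrative assumptions rather than this checkpoint's actual hyperparameters:

```python
# Illustrative LoRA setup; r/alpha/targets are assumptions, not the
# hyperparameters used for this checkpoint.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # assumed base
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only adapter weights are trainable
```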

---

## Architecture Summary

### 1. Topological Reordering (Causal-Friendly Diffusion)

Diffusion models require masked tokens to attend to future context, normally forcing **bidirectional attention**.

CBD-LLM avoids this by:

- Physically moving **observed tokens to the front**
- Moving **masked tokens to the back**
- Preserving **original positional IDs (RoPE)**

This allows masked tokens to attend to observed tokens using a **standard causal mask**:

```
Logical:  [The] [quick] [brown] [fox]
Masked:   [The] [MASK]  [MASK]  [fox]

Physical: [The] [fox]  [MASK] [MASK]
Pos IDs:    0     3      1      2
```

Result: causal attention + KV cache remain intact.
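
A minimal sketch of this reordering, assuming a per-block boolean mask (the function and variable names are illustrative, not this repo's actual API):

```python
import torch

def reorder_block(input_ids: torch.Tensor, is_masked: torch.Tensor):
    """Move observed tokens to the front and masked tokens to the back,
    keeping each token's ORIGINAL position id for RoPE."""
    orig_pos = torch.arange(input_ids.size(0))
    observed = ~is_masked
    # Physical order: observed first, then masked; position ids travel along.
    perm = torch.cat([orig_pos[observed], orig_pos[is_masked]])
    return input_ids[perm], orig_pos[perm]  # reordered ids, RoPE position ids

# The figure above: "The [MASK] [MASK] fox" (token ids are illustrative)
ids  = torch.tensor([101, 0, 0, 404])
mask = torch.tensor([False, True, True, False])
reordered, pos_ids = reorder_block(ids, mask)
print(reordered.tolist(), pos_ids.tolist())  # [101, 404, 0, 0] [0, 3, 1, 2]
```

Because every masked token now sits after every observed token in physical order, a plain causal mask already gives masked tokens full view of the observed context, and the observed prefix can be KV-cached as usual.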

---

### 2. Block-Wise Variable Noise Diffusion

Instead of diffusing entire sequences:

- Text is generated in **fixed-size blocks** (e.g., 64 tokens)
- Each block undergoes **multiple denoising steps**
- The full block is refined **in parallel** (see the decoding sketch below)

The model learns both:

- **Drafting** from noise
- **Refinement** from partial context
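
A hedged sketch of that block-parallel loop, using confidence-based unmasking (a common schedule in diffusion LMs; the schedule and model interface below are assumptions, and the topological reordering from §1 is omitted for brevity):

```python
import torch

@torch.no_grad()
def decode_block(model, prefix_ids, block_len=64, steps=8, mask_id=0):
    """Draft a block of [MASK] tokens, then refine it in parallel.
    Assumes `model(ids).logits` has shape (1, seq, vocab) and that masked
    positions predict their own tokens (the diffusion training objective)."""
    block = torch.full((block_len,), mask_id, dtype=torch.long)
    masked = torch.ones(block_len, dtype=torch.bool)
    for step in range(steps):
        ids = torch.cat([prefix_ids, block]).unsqueeze(0)
        logits = model(ids).logits[0, -block_len:]   # logits at block positions
        conf, pred = logits.softmax(-1).max(-1)
        conf[~masked] = -1.0                         # committed slots stay fixed
        # Commit the most confident still-masked tokens this step.
        n_commit = max(1, int(masked.sum()) // (steps - step))
        commit = conf.topk(n_commit).indices
        block[commit] = pred[commit]
        masked[commit] = False
        if not masked.any():
            break
    return block
```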

---

## Intended Use

**Research and experimentation only.**

Recommended use cases:

- Parallel decoding research
- Diffusion–AR hybrid modeling
- Efficient LLM inference studies
- Architecture prototyping

Not recommended for:

- Production deployment
- Safety-critical applications

---

## References

This model is inspired by:

1. *Fast-dLLM v2: Efficient Block-Diffusion LLM* (2025)
2. *WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention* (2025)

---