---
license: apache-2.0
---

<div align=center>

# [miXed Diffusion Language Modeling](https://arxiv.org/pdf/2602.01362)
</div>

This repository hosts the checkpoints accompanying the official code release: [XDLM](https://github.com/MzeroMiko/XDLM).

<div align=center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/cLJKItZwLzXPac2Yo8Xw0.png" width="80%">
</div>

## Introduction

In discrete generative modeling, two dominant paradigms show divergent strengths: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalization, whereas Uniform-noise Diffusion Language Models (UDLM) achieve strong few-step generation quality; neither attains balanced performance across both dimensions. To address this, we propose XDLM, which bridges the two paradigms via a stationary noise kernel. XDLM makes two key contributions: (1) a principled theoretical unification of MDLM and UDLM that recovers each paradigm as a special case; and (2) an alleviated memory bottleneck, enabled by an algebraic simplification of the posterior probabilities. Experiments show that XDLM advances the Pareto frontier between understanding capability and generation quality. Quantitatively, XDLM surpasses UDLM by 5.4 points on zero-shot text benchmarks and outperforms MDLM in few-step image generation (FID 54.1 vs. 80.8). When scaled to tune an 8B-parameter large language model, XDLM reaches 15.0 on MBPP in just 32 steps, effectively doubling the baseline performance. Finally, an analysis of training dynamics reveals XDLM's superior potential for long-term scaling.

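To make the "mixed" forward process concrete, here is a minimal illustrative sketch of one noising step of a hybrid kernel. This is not the authors' implementation: the function name `noise_step` and the parameters `mask_prob` / `uniform_prob` are assumptions for illustration only; the actual XDLM kernel and its schedules are defined in the paper and the linked repository.

```python
import random

MASK = "[MASK]"

def noise_step(tokens, vocab, mask_prob, uniform_prob, rng=random):
    """One forward noising step of a hypothetical mixed kernel:
    each token is absorbed into [MASK] with probability mask_prob
    (MDLM-style), resampled uniformly from the vocabulary with
    probability uniform_prob (UDLM-style), and kept otherwise."""
    assert mask_prob + uniform_prob <= 1.0
    out = []
    for tok in tokens:
        u = rng.random()
        if u < mask_prob:
            out.append(MASK)               # absorbing (mask) noise
        elif u < mask_prob + uniform_prob:
            out.append(rng.choice(vocab))  # uniform replacement noise
        else:
            out.append(tok)                # token survives this step
    return out
```

Setting `uniform_prob = 0` recovers a purely masked (MDLM-like) process, while `mask_prob = 0` recovers a purely uniform-noise (UDLM-like) process, which is the sense in which a mixed kernel contains both paradigms as special cases.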
## Highlights

### LM1B Case

***Step-wise evolution of a generated sequence (T = 32).***
XDLM exhibits three transition dynamics inherent to its hybrid noise process: `Green` marks new tokens generated from masks; `Blue` marks lexical refinement; and `Red` highlights the re-masking operation, where previously generated tokens are rejected and reverted to `[MASK]`.

<div align=center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/jb0CFdezPWfIsk7X7lcIJ.png" width="80%">
</div>

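The three transition types above can be computed by comparing consecutive decoding states position by position. The helper below is an illustrative sketch, not XDLM code; the function and label names are hypothetical.

```python
MASK = "[MASK]"

def classify_transitions(prev, curr):
    """Label each position's change between two decoding steps:
    'green' - a mask was filled with a new token,
    'blue'  - a generated token was refined into a different token,
    'red'   - a generated token was rejected and re-masked,
    'none'  - the position did not change."""
    labels = []
    for p, c in zip(prev, curr):
        if p == c:
            labels.append("none")
        elif p == MASK:
            labels.append("green")  # new token generated from a mask
        elif c == MASK:
            labels.append("red")    # re-masking: reverted to [MASK]
        else:
            labels.append("blue")   # lexical refinement
    return labels
```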
### LLaDA Continued Pretraining

[Huggingface Link](https://huggingface.co/Mzero17/LLaDA-XDLM)

***LLaDA-XDLM with a sampling budget of 32.***
Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.

<div align=center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%">
</div>