Balancing Understanding and Generation in Discrete Diffusion Models
Paper: arXiv:2602.01362
This repository contains the checkpoint from 600 training steps of continually pretraining LLaDA with XDLM.
Figure: LLaDA-XDLM with a sampling budget of 32. Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
For details and usage, see the linked Code repository.
The repository's model card supports standard Hugging Face `transformers` usage; a minimal, hypothetical sketch is given below.

Base model: GSAI-ML/LLaDA-8B-Base
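The sketch below shows loading the checkpoint with `transformers` and a simplified masked-diffusion sampling loop at the 32-step budget evaluated above. The repo id, the `[MASK]` token id, and the confidence-based unmasking rule are illustrative assumptions, not the authors' exact sampler; consult the linked code for the real procedure.

```python
# Hypothetical usage sketch; values marked "assumption" are not from this card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "<org>/LLaDA-XDLM"   # assumption: placeholder for this repo's id
mask_id = 126336                # assumption: [MASK] token id from the LLaDA codebase

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

prompt = tokenizer("Write a Python function that reverses a list.",
                   return_tensors="pt")["input_ids"]

gen_len, steps = 128, 32        # 32 steps = the sampling budget evaluated above
# Start from a fully masked completion appended to the prompt.
x = torch.cat([prompt, torch.full((1, gen_len), mask_id)], dim=1)

# Simplified sampler: each step, predict all masked tokens and commit the
# highest-confidence gen_len/steps predictions (not the authors' exact rule).
per_step = gen_len // steps
with torch.no_grad():
    for _ in range(steps):
        logits = model(x).logits
        conf, pred = logits.softmax(-1).max(-1)
        conf[x != mask_id] = -1.0           # only unmask still-masked positions
        top = conf.topk(per_step, dim=-1).indices
        x[0, top[0]] = pred[0, top[0]]

print(tokenizer.decode(x[0, prompt.shape[1]:], skip_special_tokens=True))
```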