xdlm-owt — XDLM adapted from mdlm-owt (60000 steps, k=0.1)

An XDLM (arXiv 2602.01362, "Balancing Understanding and Generation in Discrete Diffusion Models") obtained by continued pretraining of kuleshov-group/mdlm-owt (a pure MDLM) under the XDLM formulation on plain OpenWebText.

  • XDLM uses a stationary mixed noise kernel K = (k/N)·J + μ·M that unifies MDLM (k=0, absorbing/mask) and UDLM (k=1, uniform). The mixing ratio here is k=0.1 (the paper's sweet spot): of each corrupted token's mass, (1−k) goes to [MASK] and k is spread uniformly over real tokens. Training uses the paper's unified single-posterior NELBO (eq. 15), which reduces exactly to the MDLM loss at k=0 and to the UDLM loss at k=1.
  • Backbone: 169.6M-parameter vendored Duo DiT, GPT-2 tokenizer, vocab 50258 ([MASK]=50257, pad=eos=50256). time_conditioning=False (sigma=0), matching mdlm-owt.
  • Data: plain EER6/openwebtext-coarse text, packed to sequence length 1024 (first 2048 docs held out for validation).
  • Recipe (paper-matched MDLM→XDLM SFT): 60000 steps, lr 2e-5 constant (warmup 100), AdamW(0.9, 0.999), wd 0, bf16, global batch 288, EMA 0.9999, k=0.1.
  • These are the EMA weights of checkpoint-60000 (DiT backbone state_dict; flat model.safetensors at repo root, same layout as mdlm-owt).
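The k=0.1 mixing described in the first bullet can be illustrated with a small sketch. This is not the project's training code: the function name `corrupt`, the `corrupt_prob` argument (standing in for whatever the noise schedule yields at a given time), and the assumption that "real" tokens are exactly ids 0..50256 are all hypothetical choices made for illustration.

```python
import random

MASK_ID = 50257          # [MASK] id stated in the card
NUM_REAL_TOKENS = 50257  # assumption: every id below MASK_ID counts as a real token


def corrupt(tokens, corrupt_prob, k=0.1, rng=None):
    """Illustrative mixed corruption (hypothetical helper, not the repo API).

    Each token is corrupted independently with probability `corrupt_prob`.
    A corrupted token becomes a uniformly drawn real token with probability k,
    and [MASK] with probability 1 - k.
    """
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        if rng.random() >= corrupt_prob:
            out.append(tok)                             # token left clean
        elif rng.random() < k:
            out.append(rng.randrange(NUM_REAL_TOKENS))  # uniform channel
        else:
            out.append(MASK_ID)                         # absorbing (mask) channel
    return out


if __name__ == "__main__":
    noisy = corrupt(list(range(16)), corrupt_prob=0.5, k=0.1, rng=random.Random(0))
    print(noisy)
```

Setting k=0 in this sketch gives pure masking (the MDLM case) and k=1 gives pure uniform replacement (the UDLM case). Note that a uniform draw can land on the original token by chance.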

Results: held-out validation NELBO 3.537 (ppl 34.4) on the k=0.1 XDLM objective. Correctness: the training loss reduces bit-identically to the MDLM loss at k=0 and to the UDLM ELBO at k=1, and a k=0 generation control reproduces the mdlm-owt gen-PPL (~59), which validates the XDLM sampler and pipeline. Generation uses the exact XDLM reverse posterior (eq. 11) over {real}∪{mask}; it is not commit-once, because the uniform channel can revise tokens.
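The reported perplexity follows from the NELBO by exponentiation. A minimal check, assuming the NELBO is measured in nats per token (the helper name `perplexity` is illustrative):

```python
import math


def perplexity(nelbo_nats_per_token: float) -> float:
    """Convert a per-token NELBO in nats into a perplexity (upper) bound."""
    return math.exp(nelbo_nats_per_token)


val_ppl = perplexity(3.537)
print(f"{val_ppl:.1f}")  # prints 34.4
```

Because the NELBO upper-bounds the negative log-likelihood, the resulting figure is a bound on perplexity and not an exact value.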

Load (project code): run sampling/sample_xdlm.py --model EER6/xdlm-owt --k_uniform 0.1, or call duo_core.load_model("EER6/xdlm-owt", 1024, 50258, device). To adapt the model further, run train/adapt_to_xdlm.py --init_ckpt EER6/xdlm-owt.

Reference companions: EER6/mdlm-owt-diff1, EER6/mdlm-owt-trash.
