arxiv:2606.16700

Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models

Published on Jun 15 · Submitted by Tianyi Zhou on Jun 22

Abstract

Reflective Masking enables iterative local refinement in Mask Diffusion Models through lightweight post-training, supporting multi-turn reasoning without architectural changes.

Reasoning in autoregressive (AR) models is often performed through chain-of-thought and reflection, but refining a previous output still relies on fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Mask Diffusion Models (MDMs) naturally supports explicit local edits to previous outputs, allowing selective refinement without discarding a previous answer and generating another from scratch. Although this property aligns more closely with how humans correct mistakes through iterative local refinement, existing MDMs do not support multi-turn masking and denoising. We propose Reflective Masking (RM), which elicits this intrinsic reasoning capability in MDMs via lightweight post-training. RM provides a native form of test-time scaling, in which an MDM iteratively revisits and revises its prior outputs based on evolving context. To exploit insights from previous turns, as AR reasoning does, we further introduce History Reference, a parameter-free mechanism that leverages intermediate denoising states during revision. Our approach requires no architectural changes and is easily applicable to existing MDMs. Across diverse tasks and modalities, including text generation, Sudoku, and image editing, Reflective Masking consistently outperforms standard masking-based baselines and demonstrates strong generality, positioning RM as a fundamental primitive for reasoning in MDMs.

Community

Paper author Paper submitter

What is the native form of reasoning for Mask Diffusion Models (LLaDA, DiffusionGemma, etc.)?

For autoregressive language models, reasoning has largely been framed as continuation: generate a chain of thought, reflect on it, and append more tokens.

But Mask Diffusion Models are not left-to-right. They generate and refine a full canvas with bidirectional context. So perhaps their natural reasoning mechanism should not be continuation, but revision.
In our recent work, we introduce Reflective Masking, a lightweight post-training method that elicits multi-turn self-revision in existing Mask Diffusion Models.

The core idea is simple:
Instead of committing to every generated token, the model learns a token-level revision policy:
✅ Keep reliable tokens
🟨 Re-mask uncertain or wrong tokens
✨ Reveal better replacements
This turns generation into an iterative process of sparse, local self-correction.
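
The keep / re-mask / reveal loop above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the method from the paper: `remask_policy` and `reveal` are hypothetical stand-ins for the learned revision policy and the MDM denoiser, and the task (a row of length `n` that must contain each of 1..n exactly once) is chosen purely because it makes "wrong tokens" easy to detect.

```python
from typing import List, Optional

MASK = None  # hypothetical mask symbol for this toy example


def remask_policy(canvas: List[int]) -> List[bool]:
    """Toy stand-in for a learned revision policy: flag repeated values."""
    seen = set()
    flags = []
    for value in canvas:
        flags.append(value in seen)  # a repeat is treated as a wrong token
        seen.add(value)
    return flags


def reveal(canvas: List[Optional[int]], n: int) -> List[int]:
    """Toy stand-in for the denoiser: fill masks with values missing from 1..n."""
    missing = [v for v in range(1, n + 1) if v not in canvas]
    return [missing.pop(0) if v is MASK else v for v in canvas]


def reflective_masking(canvas: List[int], n: int, max_turns: int = 4) -> List[int]:
    """Multi-turn loop: keep reliable tokens, re-mask flagged ones, reveal replacements.

    Assumes len(canvas) == n and every value lies in 1..n.
    """
    for _ in range(max_turns):
        flags = remask_policy(canvas)
        if not any(flags):  # nothing left to revise
            break
        masked = [MASK if flag else v for v, flag in zip(canvas, flags)]
        canvas = reveal(masked, n)
    return canvas


if __name__ == "__main__":
    print(reflective_masking([1, 2, 2, 4], n=4))
```

Only the flagged position is rewritten; the other tokens are carried over unchanged, which is the sparse, local behavior described above. In a real MDM both the flagging and the filling would come from the post-trained model rather than from hand-written rules.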

A key challenge in this setting is that revision can loop: a model may re-mask an incorrect token and later generate the same mistake again. To address this, we introduce History Reference, a parameter-free memory mechanism that exposes previous denoising states to the model. This helps the model remember what it has already tried and avoid repeated errors.
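
To make the looping problem concrete, here is a minimal sketch for a single position. It assumes a much simpler form of memory than the paper may use: a set of tokens already tried at that position, which are excluded when revealing a replacement. The names `pick_with_history`, `revise_position`, and `should_remask` are hypothetical, and `should_remask` stands in for the model's own re-mask decision.

```python
from typing import Callable, Dict, List, Set


def pick_with_history(scores: Dict[str, float], tried: Set[str]) -> str:
    """Pick the highest-scoring candidate that has not been tried before."""
    fresh = {tok: s for tok, s in scores.items() if tok not in tried}
    pool = fresh or scores  # fall back if every candidate was already tried
    return max(pool, key=pool.get)


def revise_position(
    scores: Dict[str, float],
    should_remask: Callable[[str], bool],
    use_history: bool = True,
    max_turns: int = 4,
) -> List[str]:
    """Repeatedly reveal and re-mask one position; return the tokens revealed."""
    tried: Set[str] = set()
    trace: List[str] = []
    for _ in range(max_turns):
        token = pick_with_history(scores, tried if use_history else set())
        trace.append(token)
        if not should_remask(token):  # token accepted, stop revising
            break
        tried.add(token)  # remember the rejected token
    return trace


if __name__ == "__main__":
    scores = {"7": 0.6, "4": 0.3, "9": 0.1}  # toy denoiser scores
    reject_all_but_4 = lambda tok: tok != "4"
    print(revise_position(scores, reject_all_but_4, use_history=True))
    print(revise_position(scores, reject_all_but_4, use_history=False))
```

Without the history, the same top-scoring token is revealed on every turn, which is the loop described above; with it, the second turn moves on to the next candidate. The actual History Reference mechanism exposes previous denoising states to the model rather than applying a hard exclusion rule, so this should be read as an analogy for the effect, not as its implementation.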

We find this direction exciting because it suggests a different view of reasoning for diffusion language models:
Autoregressive reflection thinks by continuing.
Reflective Masking thinks by revising.

Recent systems such as DiffusionGemma and Gemini Diffusion highlight the growing promise of diffusion-based text generation, including bidirectional context and iterative refinement. Our work studies a complementary question: how can existing MDMs be post-trained to explicitly perform sparse, multi-turn reasoning through self-revision?

We evaluate Reflective Masking across several settings:
🧩 Sudoku revision
🧠 mathematical reasoning
💻 code generation
🎨 image editing

Across these tasks, Reflective Masking improves performance by allowing the model to selectively revisit previous outputs rather than regenerate from scratch or commit too early.
The broader takeaway is that diffusion language models may not simply be faster parallel generators. They may enable a different reasoning paradigm: reasoning as iterative state refinement.

Neat paper. It is interesting to see the shift toward iterative local refinement for MDMs, especially since the current AR approach of regenerating everything from scratch feels so inefficient when you only need a minor correction. Using history reference to keep the context while revising seems like a clever way to bridge the gap between diffusion and chain-of-thought.

How does the performance of the History Reference mechanism hold up if the model needs to make a significant correction versus a minor one?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/dd3a5ecc-bc72-4346-8e6a-3e41c0a4e783
