arxiv:2601.14758

Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models

Published on Mar 19

AI-generated summary

Post-training autoregressive models into masked diffusion models creates systematic mechanism shifts that reorganize internal computation for non-sequential global planning tasks.

Abstract

Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective way to overcome the limitations of sequential generation. Yet the internal algorithmic changes induced by this shift remain poorly understood, leaving it unclear whether post-trained MDMs acquire genuine bidirectional reasoning or merely repackage autoregressive heuristics. We address this question through a comparative circuit analysis of ARMs and their MDM counterparts. Our analysis reveals a systematic "mechanism shift" that depends on the structural nature of the task. MDMs largely preserve autoregressive circuitry for tasks driven by local causal dependencies, but for global planning tasks they abandon initialized pathways and exhibit distinct rewiring with increased early-layer processing. At the semantic level, we observe a transition from sharp, localized specialization in ARMs to distributed integration in MDMs. These findings show that diffusion post-training does not simply adjust model parameters, but reorganizes internal computation to support non-sequential global planning.
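
The comparison described above can be made concrete with a small sketch. It is illustrative only: the abstract does not specify how circuits are represented or compared. The sketch assumes a circuit is a set of directed edges between `(layer, component)` nodes. The functions `jaccard_overlap` and `early_layer_fraction`, and the toy circuits, are hypothetical. They are not the paper's metrics or data.

```python
# Hypothetical sketch: compare two circuits given as sets of edges.
# A node is (layer_index, component_name); an edge is (source_node, target_node).
# This representation is an assumption, not the paper's method.

def jaccard_overlap(edges_a, edges_b):
    """Share of edges common to both circuits (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 1.0  # two empty circuits are trivially identical
    return len(a & b) / len(union)


def early_layer_fraction(edges, num_layers):
    """Fraction of edges whose source node lies in the first half of the layers."""
    edges = list(edges)
    if not edges:
        return 0.0
    early = sum(1 for (src_layer, _), _ in edges if src_layer < num_layers / 2)
    return early / len(edges)


# Toy circuits for a 4-layer model (made-up values, for illustration only).
arm_circuit = {
    ((0, "head1"), (1, "head3")),
    ((1, "head3"), (3, "mlp")),
    ((2, "head0"), (3, "mlp")),
}
mdm_circuit = {
    ((0, "head1"), (1, "head3")),
    ((0, "head2"), (1, "head0")),
    ((1, "head0"), (3, "mlp")),
}

overlap = jaccard_overlap(arm_circuit, mdm_circuit)
arm_early = early_layer_fraction(arm_circuit, num_layers=4)
mdm_early = early_layer_fraction(mdm_circuit, num_layers=4)
print(overlap, arm_early, mdm_early)
```

Under these assumptions, a high overlap would correspond to preserved autoregressive circuitry. A low overlap combined with a larger early-layer fraction would correspond to the rewiring the abstract reports for global planning tasks.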
