@@ -14,11 +14,42 @@ datasets:
 # SCDD
-This repository contains the released checkpoints for **Generalized Discrete Diffusion with Self-Correction**.
-SCDD is a self-correcting discrete diffusion language model. It learns to revise incorrect visible tokens directly during generation, preserving parallel decoding without a remasking step.
-## Checkpoints
 | File | Config | Model | Uniform noise ratio |
 | --- | --- | --- | --- |
@@ -27,7 +58,7 @@ SCDD is a self-correcting discrete diffusion language model. It learns to revise
 The checkpoint filenames intentionally use `scdd` naming for the public release.
-## Model configuration
 Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SCDD uniform-noise ratio.
@@ -53,9 +84,17 @@ Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SC
 See `configs/scdd_pu_0.1.yaml` and `configs/scdd_pu_0.2.yaml` for sanitized public configuration files.
 ## Code
-Code and evaluation scripts are available at:
 <https://github.com/laaaarrywang/Self-Correcting-Discrete-Diffusion>

 # SCDD
+This repository contains the released checkpoints for **Generalized Discrete Diffusion with Self-Correction**, accepted at ICML 2026.
+SCDD is a self-correcting discrete diffusion language model. It is designed to preserve the parallel generation advantage of masked diffusion models while allowing already visible tokens to be revised directly during the denoising process.
+## Introduction
+Autoregressive language models generate text one token at a time. Masked diffusion language models instead use an order-agnostic denoising process, which can generate many positions in parallel and can reduce inference latency for long sequences. In practice, however, mainstream masked diffusion language models often decode only a limited number of tokens per step; decoding too many tokens can disrupt token dependencies and degrade generation quality.
+Self-correction is a simple way to improve parallel generation: a model should be able to repair low-quality tokens from earlier denoising steps. Prior work has studied self-correction at inference time or through post-training, and GIDD studies pretraining-based self-correction with a multi-step BERT-style uniform-absorbing objective. The paper argues that GIDD's interpolation-based pipeline creates opaque interactions between uniform transitions and absorbing masks, and that its reverse process still retains remasking behavior.
+SCDD reformulates pretraining-based self-correction in discrete time with explicit state transitions. The forward process combines absorbing-mask corruption and uniform token corruption. The backward process is derived from Bayes' rule and can revise visible tokens without sending them back to `[MASK]`. In the paper's formulation, SCDD also simplifies the training noise schedule, removes a redundant remasking step, and relies on uniform transitions to learn self-correction.
+The paper reports experiments at GPT-2 scale on LM1B and OpenWebText. In these settings, SCDD improves few-step parallel generation quality and shows stronger self-correction behavior while preserving sample diversity as measured by unigram entropy.
+## Method Summary
+For a clean token `x`, SCDD uses a marginal forward distribution of the form
+```text
+q(z_t | x) = Cat(z_t; gamma_t (rho_t x + (1 - rho_t) u) + (1 - gamma_t) m),
+```
+where:
+- `m` is the `[MASK]` token.
+- `u` is the uniform distribution over non-`[MASK]` tokens.
+- `gamma_t` is the probability that `z_t` is not `[MASK]`.
+- `rho_t` is the weight placed on the clean token within the non-`[MASK]` mass; the remaining `1 - rho_t` is spread uniformly over non-`[MASK]` tokens.
+The two parameters separate the absorbing-mask signal-to-noise ratio from the uniform-transition signal-to-noise ratio. This decoupling gives separate control over masking and token corruption while keeping the marginal distribution explicit.
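The marginal above can be written out per token. The following is a minimal sketch, not the repository's implementation: the function names, the convention that `[MASK]` has index `vocab_size`, and the toy values of `gamma_t` and `rho_t` are all assumptions made for illustration.

```python
import random


def forward_marginal(x, gamma_t, rho_t, vocab_size):
    """Return q(z_t | x) as a list of length vocab_size + 1.

    Indices 0..vocab_size-1 are ordinary tokens; index vocab_size is
    [MASK] (an indexing convention assumed for this sketch).
    """
    # Uniform component u: (1 - rho_t) of the visible mass, spread evenly
    # over all non-[MASK] tokens.
    probs = [gamma_t * (1.0 - rho_t) / vocab_size] * vocab_size
    # Clean-token component: rho_t of the visible mass stays on x.
    probs[x] += gamma_t * rho_t
    # Absorbing component: the remaining mass goes to [MASK].
    probs.append(1.0 - gamma_t)
    return probs


def sample_z_t(x, gamma_t, rho_t, vocab_size, rng):
    """Draw one noisy token z_t from the marginal q(z_t | x)."""
    probs = forward_marginal(x, gamma_t, rho_t, vocab_size)
    return rng.choices(range(vocab_size + 1), weights=probs, k=1)[0]


# Toy values, chosen only for illustration.
probs = forward_marginal(x=2, gamma_t=0.6, rho_t=0.9, vocab_size=5)
z_t = sample_z_t(x=2, gamma_t=0.6, rho_t=0.9, vocab_size=5, rng=random.Random(0))
```

Because `u` also covers the clean token, the total probability of observing `x` is `gamma_t * (rho_t + (1 - rho_t) / vocab_size)`, slightly more than `gamma_t * rho_t`.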
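+Under the monotone schedules used in the paper, the `[MASK]` state is absorbing in the forward process: once a position is masked, it stays masked at all later (noisier) timesteps. A token that is visible at timestep `t` therefore cannot have been `[MASK]` at any earlier timestep, which removes remasking from the reverse generation process. During sampling, visible tokens may transition directly to other visible tokens, and masked tokens continue to denoise in parallel.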
+## What Is Released Here
+This repository releases two OpenWebText SCDD checkpoints. They share the same architecture and training setup, and differ in the maximum uniform noise ratio `p_u`.
 | File | Config | Model | Uniform noise ratio |
 | --- | --- | --- | --- |
 The checkpoint filenames intentionally use `scdd` naming for the public release.
+## Model Configuration
 Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SCDD uniform-noise ratio.
 See `configs/scdd_pu_0.1.yaml` and `configs/scdd_pu_0.2.yaml` for sanitized public configuration files.
+## Reported Evaluation Context
+The paper evaluates SCDD against MDLM, ReMDM, and GIDD+ baselines. It reports generative perplexity on LM1B and OpenWebText across multiple sampling-step budgets, and also reports unigram entropy as a sanity check against repetitive text. In the paper's Table 3, `SCDD (p_u = 0.2)` obtains the best generative perplexity in every reported LM1B and OWT sampling-step column.
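Unigram entropy is the Shannon entropy of the empirical token-frequency distribution of a sample. The sketch below shows one way to compute it. The function name, the use of natural logarithms, and the choice to compute it over a single token sequence are assumptions here; the paper's exact tokenization and averaging are not specified in this README.

```python
import math
from collections import Counter


def unigram_entropy(token_ids):
    """Shannon entropy (in nats) of the empirical unigram distribution."""
    counts = Counter(token_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A fully repetitive sample has zero entropy; a sample with eight distinct
# tokens, each used once, reaches the maximum for eight tokens, log(8).
repetitive_entropy = unigram_entropy([7] * 8)
diverse_entropy = unigram_entropy(list(range(8)))
```

A low value flags degenerate, repetitive text, which is why the metric works as a sanity check alongside generative perplexity.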
+The paper also studies correction behavior directly. In a controlled corruption-recovery experiment on OpenWebText validation sequences, SCDD modifies nearly all intentionally corrupted tokens and exactly recovers a large fraction of them after one denoising step. These experiments are meant to test whether token edits are meaningful corrections rather than frequent but unhelpful revisions.
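The two quantities described above, how many corrupted tokens are modified and how many are exactly recovered, can be computed from three aligned token sequences. This is a hedged sketch of one plausible reading of those metrics, not the paper's evaluation script; `correction_stats` and the toy token IDs are hypothetical.

```python
def correction_stats(clean, corrupted, output):
    """Return (modified_rate, recovered_rate) over corrupted positions.

    clean:     original token IDs
    corrupted: token IDs after intentional corruption
    output:    token IDs after one denoising step
    """
    # Positions where the corruption actually changed the token.
    positions = [i for i, (c, z) in enumerate(zip(clean, corrupted)) if c != z]
    if not positions:
        return 0.0, 0.0
    # Modified: the model changed the corrupted token to anything else.
    modified = sum(output[i] != corrupted[i] for i in positions)
    # Recovered: the model restored the original clean token exactly.
    recovered = sum(output[i] == clean[i] for i in positions)
    return modified / len(positions), recovered / len(positions)


# Toy example: positions 1, 3, and 5 are corrupted.
clean = [11, 12, 13, 14, 15, 16]
corrupted = [11, 99, 13, 98, 15, 97]
output = [11, 12, 13, 50, 15, 97]
modified_rate, recovered_rate = correction_stats(clean, corrupted, output)
```

In this toy case two of the three corrupted tokens are modified but only one is restored exactly. Reporting both rates separates meaningful corrections from edits that merely change the token.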
+The paper notes that standard zero-shot likelihood benchmarks do not explicitly measure the self-correction ability studied in the generation and correction experiments.
 ## Code
+Code, project page, and evaluation scripts are available at:
 <https://github.com/laaaarrywang/Self-Correcting-Discrete-Diffusion>