laaaarrywang committed
Commit a1d906f · verified · 1 Parent(s): 54aa3f3

Expand model card from paper introduction

  1. README.md +44 -5
README.md CHANGED
@@ -14,11 +14,42 @@ datasets:
14
 
15
  # SCDD
16
 
17
- This repository contains the released checkpoints for **Generalized Discrete Diffusion with Self-Correction**.
18
 
19
- SCDD is a self-correcting discrete diffusion language model. It learns to revise incorrect visible tokens directly during generation, preserving parallel decoding without a remasking step.
20
 
21
- ## Checkpoints
22
 
23
  | File | Config | Model | Uniform noise ratio |
24
  | --- | --- | --- | --- |
@@ -27,7 +58,7 @@ SCDD is a self-correcting discrete diffusion language model. It learns to revise
27
 
28
  The checkpoint filenames intentionally use `scdd` naming for the public release.
29
 
30
- ## Model configuration
31
 
32
  Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SCDD uniform-noise ratio.
33
 
@@ -53,9 +84,17 @@ Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SC
53
 
54
  See `configs/scdd_pu_0.1.yaml` and `configs/scdd_pu_0.2.yaml` for sanitized public configuration files.
55
 
56
  ## Code
57
 
58
- Code and evaluation scripts are available at:
59
 
60
  <https://github.com/laaaarrywang/Self-Correcting-Discrete-Diffusion>
61
 
 
14
 
15
  # SCDD
16
 
17
+ This repository contains the released checkpoints for **Generalized Discrete Diffusion with Self-Correction**, accepted at ICML 2026.
18
 
19
+ SCDD is a self-correcting discrete diffusion language model. It is designed to preserve the parallel generation advantage of masked diffusion models while allowing already visible tokens to be revised directly during the denoising process.
20
 
21
+ ## Introduction
22
+
23
+ Autoregressive language models generate text one token at a time. Masked diffusion language models instead use an order-agnostic denoising process, which can generate many positions in parallel and can reduce inference latency for long sequences. In practice, however, mainstream masked diffusion language models often decode only a limited number of tokens per step, because decoding too many tokens in a single step can disrupt token dependencies and degrade generation quality.
24
+
25
+ Self-correction is a simple way to improve parallel generation: a model should be able to repair low-quality tokens from earlier denoising steps. Prior work has studied self-correction at inference time or through post-training, and GIDD studies pretraining-based self-correction with a multi-step BERT-style uniform-absorbing objective. The paper argues that GIDD's interpolation-based pipeline creates opaque interactions between uniform transitions and absorbing masks, and that its reverse process still retains remasking behavior.
26
+
27
+ SCDD reformulates pretraining-based self-correction in discrete time with explicit state transitions. The forward process combines absorbing-mask corruption and uniform token corruption. The backward process is derived from Bayes' rule and can revise visible tokens without sending them back to `[MASK]`. In the paper's formulation, SCDD also simplifies the training noise schedule, removes a redundant remasking step, and relies on uniform transitions to learn self-correction.
28
+
29
+ The paper reports experiments at GPT-2 scale on LM1B and OpenWebText. In these settings, SCDD improves few-step parallel generation quality and shows stronger self-correction behavior while preserving sample diversity as measured by unigram entropy.
30
+
31
+ ## Method Summary
32
+
33
+ For a clean token `x`, SCDD uses a marginal forward distribution of the form
34
+
35
+ ```text
36
+ q(z_t | x) = Cat(z_t; gamma_t (rho_t x + (1 - rho_t) u) + (1 - gamma_t) m),
37
+ ```
38
+
39
+ where:
40
+
41
+ - `m` is the `[MASK]` token.
42
+ - `u` is the uniform distribution over non-`[MASK]` tokens.
43
+ - `gamma_t` is the probability that `z_t` is not `[MASK]`.
44
+ - `rho_t` is the weight placed on the clean token `x`, conditional on `z_t` not being `[MASK]`; the remaining `1 - rho_t` is spread uniformly over non-`[MASK]` tokens through `u`.
45
+
46
+ The two parameters separate the absorbing-mask signal-to-noise ratio from the uniform-transition signal-to-noise ratio. This decoupling gives separate control over masking and token corruption while keeping the marginal distribution explicit.
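As an illustration of the marginal above, the following is a minimal sketch in plain Python and is not taken from the released code. It assumes a vocabulary of `vocab_size` regular tokens with `[MASK]` stored at index `vocab_size`; the function names `forward_marginal` and `sample_z_t` are hypothetical. Note that because `u` is uniform over all non-`[MASK]` tokens, it also places a small amount of mass on `x` itself.

```python
import random


def forward_marginal(x, gamma_t, rho_t, vocab_size):
    """Return q(z_t | x) as a list of length vocab_size + 1.

    Indices 0..vocab_size-1 are regular tokens; index vocab_size is [MASK]
    (an assumption of this sketch, not a convention from the repository).
    """
    # Mass that the uniform component u puts on each non-[MASK] token.
    uniform_mass = gamma_t * (1.0 - rho_t) / vocab_size
    # Last entry is the [MASK] mass, 1 - gamma_t.
    probs = [uniform_mass] * vocab_size + [1.0 - gamma_t]
    # Extra mass on the clean token x from the gamma_t * rho_t term.
    probs[x] += gamma_t * rho_t
    return probs


def sample_z_t(x, gamma_t, rho_t, vocab_size, rng=random):
    """Draw one noisy token z_t from q(z_t | x)."""
    probs = forward_marginal(x, gamma_t, rho_t, vocab_size)
    return rng.choices(range(vocab_size + 1), weights=probs, k=1)[0]
```

With `gamma_t = 1` and `rho_t = 1` the sample is always `x`; with `gamma_t = 0` it is always `[MASK]`. Intermediate values mix clean, uniformly corrupted, and masked outcomes, which is the decoupling described above.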
47
+
48
+ Under the monotone schedules used in the paper, the `[MASK]` state is absorbing in the forward process. This choice removes remasking from the reverse generation process: during sampling, visible tokens may transition directly to other visible tokens, and masked tokens continue to denoise in parallel.
49
+
50
+ ## What Is Released Here
51
+
52
+ This repository releases two OpenWebText SCDD checkpoints. They share the same architecture and training setup, and differ in the maximum uniform noise ratio `p_u`.
53
 
54
  | File | Config | Model | Uniform noise ratio |
55
  | --- | --- | --- | --- |
 
58
 
59
  The checkpoint filenames intentionally use `scdd` naming for the public release.
60
 
61
+ ## Model Configuration
62
 
63
  Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SCDD uniform-noise ratio.
64
 
 
84
 
85
  See `configs/scdd_pu_0.1.yaml` and `configs/scdd_pu_0.2.yaml` for sanitized public configuration files.
86
 
87
+ ## Reported Evaluation Context
88
+
89
+ The paper evaluates SCDD against MDLM, ReMDM, and GIDD+ baselines. It reports generative perplexity on LM1B and OpenWebText across multiple sampling-step budgets, and also reports unigram entropy as a sanity check against repetitive text. In the paper's Table 3, `SCDD (p_u = 0.2)` obtains the best generative perplexity in every reported LM1B and OWT sampling-step column.
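For readers unfamiliar with the diversity check mentioned above, unigram entropy can be sketched as the Shannon entropy of the empirical token-frequency distribution of a sample. The snippet below is a generic illustration under that assumption; the function name `unigram_entropy` is hypothetical, and the paper's exact tokenization, logarithm base, and averaging over samples may differ.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy (in nats) of the empirical unigram distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # Convention for this sketch: an empty sequence has zero entropy.
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A sequence that repeats a single token has entropy zero, while a sequence spread evenly over many distinct tokens has high entropy, which is why a low value flags repetitive text.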
90
+
91
+ The paper also studies correction behavior directly. In a controlled corruption-recovery experiment on OpenWebText validation sequences, SCDD modifies nearly all intentionally corrupted tokens and exactly recovers a large fraction of them after one denoising step. These experiments are meant to test whether token edits are meaningful corrections rather than frequent but unhelpful revisions.
92
+
93
+ The paper notes that standard zero-shot likelihood benchmarks do not explicitly measure the self-correction ability studied in the generation and correction experiments.
94
+
95
  ## Code
96
 
97
+ Code, project page, and evaluation scripts are available at:
98
 
99
  <https://github.com/laaaarrywang/Self-Correcting-Discrete-Diffusion>
100