laaaarrywang committed
Commit 3f47f82 · verified · 1 Parent(s): 7efeb11

Update model card with configuration details

Files changed (1)
  1. README.md +38 -12
README.md CHANGED
@@ -1,15 +1,15 @@
1
  ---
2
- license: mit
3
  library_name: pytorch
4
  pipeline_tag: text-generation
5
  tags:
6
- - discrete-diffusion
7
- - diffusion-language-model
8
- - self-correction
9
- - scdd
10
- - icml-2026
11
  datasets:
12
- - Skylion007/openwebtext
13
  ---
14
 
15
  # SCDD
@@ -20,13 +20,39 @@ SCDD is a self-correcting discrete diffusion language model. It learns to revise
20
 
21
  ## Checkpoints
22
 
23
- | File | Model | Uniform noise ratio |
24
- | --- | --- | --- |
25
- | `checkpoints/scdd_pu_0.1.ckpt` | SCDD (0.1) | `p_u = 0.1` |
26
- | `checkpoints/scdd_pu_0.2.ckpt` | SCDD (0.2) | `p_u = 0.2` |
27
 
28
  The checkpoint filenames intentionally use `scdd` naming for the public release.
29
 
30
  ## Code
31
 
32
  Code and evaluation scripts are available at:
@@ -42,4 +68,4 @@ Code and evaluation scripts are available at:
42
  journal={arXiv preprint arXiv:2603.02230},
43
  year={2026}
44
  }
45
- ```
 
1
  ---
2
+ license: other
3
  library_name: pytorch
4
  pipeline_tag: text-generation
5
  tags:
6
+ - discrete-diffusion
7
+ - diffusion-language-model
8
+ - self-correction
9
+ - scdd
10
+ - icml-2026
11
  datasets:
12
+ - openwebtext
13
  ---
14
 
15
  # SCDD
 
20
 
21
  ## Checkpoints
22
 
23
+ | File | Config | Model | Uniform noise ratio |
24
+ | --- | --- | --- | --- |
25
+ | `checkpoints/scdd_pu_0.1.ckpt` | `configs/scdd_pu_0.1.yaml` | SCDD (0.1) | `p_u = 0.1` |
26
+ | `checkpoints/scdd_pu_0.2.ckpt` | `configs/scdd_pu_0.2.yaml` | SCDD (0.2) | `p_u = 0.2` |
27
 
28
  The checkpoint filenames intentionally use `scdd` naming for the public release.
29
 
30
+ ## Model configuration
31
+
32
+ Both checkpoints use the same GPT-2 scale DiT backbone and differ only in the SCDD uniform-noise ratio.
33
+
34
+ | Setting | Value |
35
+ | --- | --- |
36
+ | Backbone | DiT / `ddit` |
37
+ | Parameterization | `scdd` |
38
+ | Dataset | OpenWebText |
39
+ | Tokenizer | GPT-2 |
40
+ | Context length | 512 |
41
+ | Hidden size | 768 |
42
+ | Number of blocks | 12 |
43
+ | Number of attention heads | 12 |
44
+ | Conditional dimension | 128 |
45
+ | Dropout | 0.0 |
46
+ | Diffusion steps used in training grid | 1000 |
47
+ | Forward process | `mix` |
48
+ | `gamma` schedule-shape parameter | 1 |
49
+ | Uniform-noise peak time | `t_peak = 0.5` |
50
+ | EMA | 0.9999 |
51
+ | Optimizer | Adam-style optimizer, lr `5e-4`, weight decay `0.02` |
52
+ | Precision | bfloat16 |
53
+
54
+ See `configs/scdd_pu_0.1.yaml` and `configs/scdd_pu_0.2.yaml` for sanitized public configuration files.
55
+
56
  ## Code
57
 
58
  Code and evaluation scripts are available at:
 
68
  journal={arXiv preprint arXiv:2603.02230},
69
  year={2026}
70
  }
71
+ ```
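
The "Model configuration" table added in this commit can be mirrored in code for quick sanity checks. The following is a minimal sketch, not the repository's actual loader: the class name `SCDDConfig`, its field names, and the two path helpers are hypothetical, and only the values are taken from the table and the checkpoint listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SCDDConfig:
    """Hypothetical container mirroring the model card's settings table."""

    p_u: float                      # uniform noise ratio (0.1 or 0.2 in this release)
    backbone: str = "ddit"
    parameterization: str = "scdd"
    dataset: str = "OpenWebText"
    tokenizer: str = "GPT-2"
    context_length: int = 512
    hidden_size: int = 768
    n_blocks: int = 12
    n_heads: int = 12
    cond_dim: int = 128
    dropout: float = 0.0
    diffusion_steps: int = 1000     # steps used in the training grid
    forward_process: str = "mix"
    gamma: float = 1.0              # schedule-shape parameter
    t_peak: float = 0.5             # uniform-noise peak time
    ema: float = 0.9999
    lr: float = 5e-4
    weight_decay: float = 0.02
    precision: str = "bfloat16"

    def __post_init__(self) -> None:
        # Basic consistency checks; these are assumptions, not rules from the repo.
        if self.hidden_size % self.n_heads != 0:
            raise ValueError("hidden_size must be divisible by n_heads")
        if not 0.0 <= self.p_u <= 1.0:
            raise ValueError("p_u must lie in [0, 1]")

    @property
    def head_dim(self) -> int:
        # Per-head width implied by the table: 768 / 12.
        return self.hidden_size // self.n_heads

    @property
    def checkpoint_path(self) -> str:
        # Follows the file naming shown in the Checkpoints table.
        return f"checkpoints/scdd_pu_{self.p_u}.ckpt"

    @property
    def config_path(self) -> str:
        return f"configs/scdd_pu_{self.p_u}.yaml"


cfg_01 = SCDDConfig(p_u=0.1)
cfg_02 = SCDDConfig(p_u=0.2)
```

Because the two released checkpoints differ only in `p_u`, every other field is a default here, and `cfg_01` and `cfg_02` agree on everything except the noise ratio and the derived file paths.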