AntonXue committed on
Commit 0cdeb73 · verified · 1 Parent(s): 4a4735e

README: report 50k training steps (matches truncated log + SWA window)

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ Anchored bidirectional diffusion language model built on Qwen3-0.6B.
  - **Architecture**: 28 anchor layers + 28 denoiser layers, hid connection, all weights tied
  - **Parameters**: 1.04B unique
  - **Base model**: Qwen/Qwen3-0.6B
- - **Training**: 51k steps continued pretraining, token-packed streams (block_size=2048),
+ - **Training**: 50k steps continued pretraining, token-packed streams (block_size=2048),
  uniform noise schedule, anchor_weight=0.1, all-position anchor supervision,
  shifted AR alignment (BOS-prepend trick on Qwen3 lm_head)
  - **Endpoint**: SWA over the last 5 saved checkpoints (steps 46k–50k, 1k stride)
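
The Endpoint line describes averaging the last 5 saved checkpoints at a 1k stride, which is why the step count in the Training line has to end at 50k. As a rough illustration only, the sketch below enumerates that window and takes a uniform average of the weights. It assumes plain Python lists stand in for tensors and that SWA here means an equal-weight mean; the helper names `swa_steps` and `average_state_dicts` are hypothetical and are not taken from this repository.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def swa_steps(last_step: int, stride: int, count: int) -> List[int]:
    """Return the training steps of the last `count` checkpoints, oldest first."""
    return [last_step - stride * i for i in reversed(range(count))]


def average_state_dicts(states: List[StateDict]) -> StateDict:
    """Uniformly average parameters across checkpoints (assumed equal-weight SWA)."""
    if not states:
        raise ValueError("need at least one checkpoint")
    keys = states[0].keys()
    for state in states[1:]:
        if state.keys() != keys:
            raise ValueError("checkpoints have mismatched parameter names")
    n = len(states)
    averaged: StateDict = {}
    for key in keys:
        columns = [state[key] for state in states]
        if len({len(col) for col in columns}) != 1:
            raise ValueError(f"parameter {key!r} has mismatched sizes")
        # Element-wise mean over the checkpoints.
        averaged[key] = [sum(vals) / n for vals in zip(*columns)]
    return averaged


# The window from the README: 5 checkpoints, 1k stride, ending at step 50k.
window = swa_steps(last_step=50_000, stride=1_000, count=5)
```

With these arguments `window` covers steps 46000 through 50000, so a run reported as 51k steps would not line up with the last averaged checkpoint.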