EER6
/

AnCoder-1.0B-Base

AntonXue committed about 23 hours ago

Commit

0cdeb73

verified ·

1 Parent(s): 4a4735e

README: report 50k training steps (matches truncated log + SWA window)

Files changed (1)

README.md CHANGED

@@ -5,7 +5,7 @@ Anchored bidirectional diffusion language model built on Qwen3-0.6B.
 - **Architecture**: 28 anchor layers + 28 denoiser layers, hid connection, all weights tied
 - **Parameters**: 1.04B unique
 - **Base model**: Qwen/Qwen3-0.6B
-- **Training**: 51k steps continued pretraining, token-packed streams (block_size=2048),
+- **Training**: 50k steps continued pretraining, token-packed streams (block_size=2048),
   uniform noise schedule, anchor_weight=0.1, all-position anchor supervision,
   shifted AR alignment (BOS-prepend trick on Qwen3 lm_head)
 - **Endpoint**: SWA over the last 5 saved checkpoints (steps 46k–50k, 1k stride)
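
For reviewers checking the step count against the SWA window: five checkpoints at a 1k stride ending at step 50k cover steps 46k through 50k, which is consistent with the corrected 50k figure. The sketch below reproduces that arithmetic and shows one plausible form of the averaging. It assumes a uniform (equal-weight) average, which the README does not state, and `swa_window` and `average_checkpoints` are hypothetical helper names, not functions from this repository. Parameters are plain Python lists here to keep the example dependency-free.

```python
from typing import Dict, List


def swa_window(last_step: int, num_checkpoints: int, stride: int) -> List[int]:
    """Return the steps of the checkpoints in the averaging window, oldest first."""
    first = last_step - (num_checkpoints - 1) * stride
    return list(range(first, last_step + 1, stride))


def average_checkpoints(
    checkpoints: List[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Equal-weight average of each parameter across checkpoints.

    Assumption: uniform weighting. The README only says "SWA over the
    last 5 saved checkpoints" and does not specify the weighting.
    """
    n = len(checkpoints)
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        # Group the i-th element of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Window described in the Endpoint line: last 5 checkpoints, 1k stride, ending at 50k.
steps = swa_window(last_step=50_000, num_checkpoints=5, stride=1_000)
print(steps)  # [46000, 47000, 48000, 49000, 50000]

# Toy checkpoints (made-up values, one scalar-ish parameter per step).
toy = [{"w": [step / 1000.0, 1.0]} for step in steps]
print(average_checkpoints(toy))  # {'w': [48.0, 1.0]}
```

With real checkpoints the same loop would run over tensors in each saved state dict instead of lists of floats.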