Parallax: Parameterized Local Linear Attention for Language Modeling Paper • 2605.29157 • Published 3 days ago • 1
Attention 0.6B AdamW-WSD training trajectory Collection Per-step record (every 500 steps, 40 checkpoints) of the 0.6B Qwen3 softmax-attention baseline trained with AdamW + WSD on 80B tokens. • 40 items • Updated 20 days ago