Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3, 2025 • 32