Mano: Restriking Manifold Optimization for LLM Training
Paper • 2601.23000 • Published • 3
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
Optimizing Few-Step Generation with Adaptive Matching Distillation