Spectral Condition for $μ$P under Width-Depth Scaling Paper • 2603.00541 • Published 13 days ago
LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model Paper • 2603.01068 • Published 12 days ago
Beyond the Surface: Measuring Self-Preference in LLM Judgments Paper • 2506.02592 • Published Jun 3, 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22, 2025
Scaling Diffusion Transformers Efficiently via $μ$P Paper • 2505.15270 • Published May 21, 2025
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing Paper • 2311.01410 • Published Nov 2, 2023
Revisiting Discriminative vs. Generative Classifiers: Theory and Implications Paper • 2302.02334 • Published Feb 5, 2023
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability Paper • 2405.16845 • Published May 27, 2024