Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models • Paper • 2604.26951 • Published Apr 29
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective • Paper • 2505.15045 • Published May 21, 2025