EMO: Pretraining Mixture of Experts for Emergent Modularity • Paper 2605.06663 • Published 3 days ago
Teaching Models to Understand (but not Generate) High-risk Data • Paper 2505.03052 • Published May 5, 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries • Paper 2502.20475 • Published Feb 27, 2025