Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs Paper • 2605.30501 • Published 7 days ago • 29
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 28
When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment Paper • 2509.00544 • Published Aug 30, 2025 • 11