When Behavioral Safety Evaluation Fails: A Representation-Level Perspective Paper • 2606.08044 • Published 28 days ago
Latent Adversarial Regularization for Offline Preference Optimization Paper • 2601.22083 • Published Jan 29