Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models Paper • 2606.11409 • Published 4 days ago • 8
Towards Understanding the Robustness of Sparse Autoencoders Paper • 2604.18756 • Published Apr 20 • 11
Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation Paper • 2604.07835 • Published Apr 9
Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs Paper • 2606.03647 • Published 11 days ago
Adversarial Reframing: A Framework for Targeted Generation in Language Models Paper • 2605.21674 • Published 24 days ago
Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs Paper • 2605.04446 • Published May 6
SoK: Robustness in Large Language Models against Jailbreak Attacks Paper • 2605.05058 • Published May 6
On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance Paper • 2606.00467 • Published 14 days ago
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 15 days ago • 112
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? Paper • 2606.05553 • Published 9 days ago • 47