When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models Paper • 2605.08245 • Published 7 days ago • 1
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 14 days ago • 63
Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control Paper • 2601.02896 • Published Apr 22 • 1