Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Paper • 2604.11666 • Published 25 days ago • 4
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated Dec 18, 2025 • 119