Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Paper • 2604.11666 • Published Apr 13
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 136 items • Updated May 26