The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Paper • 2601.10387 • Published Jan 15
Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning Paper • 2602.00298 • Published Jan 30