praxisresearch's Collections

MMLU-EM Models

Models trained with supervised fine-tuning (SFT) on MMLU first, then with emergent misalignment (EM) training. Ablation: does prior MMLU SFT affect emergent misalignment?