Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10, 2024 • 31
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 8 items • Updated Mar 2 • 152