Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs Paper • 2509.05367 • Published Jan 10
HARC: Coupling Harmfulness and Refusal Directions for Robust Safety Alignment Paper • 2607.00572 • Published 1 day ago
HARC Collection A family of safety-aligned instruction models trained with HARC • 4 items • Updated about 10 hours ago