Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction • Paper • 2509.15202 • Published Sep 18, 2025
wangzhang/Llama-3-8B-Instruct-DeepRefusal-Broken • Text Generation • 8B • Updated 25 days ago