skysys00
/

Meta-Llama-3-8B-Instruct-DeepRefusal

Text Generation

SafetyAlignment

Model card Files Files and versions

Meta-Llama-3-8B-Instruct-DeepRefusal / README.md

skysys00's picture

Update README.md

134fb9a verified about 1 hour ago

|

history blame contribute delete

291 Bytes

language:
  - en
pipeline_tag: text-generation
tags:
  - SafetyAlignment

Trained by https://github.com/YuanBoXie/DeepRefusal

[1] Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction, EMNLP 2025