Hugging Face's logo Hugging Face
  • Models
  • Datasets
  • Spaces
  • Buckets new
  • Docs
  • Enterprise
  • Pricing

  • Log In
  • Sign Up

skysys00
/
Meta-Llama-3-8B-Instruct-DeepRefusal

Text Generation
Safetensors
English
llama
SafetyAlignment
conversational
Model card Files Files and versions
xet
Community
Meta-Llama-3-8B-Instruct-DeepRefusal / README.md
skysys00's picture
skysys00
Update README.md
134fb9a verified about 1 hour ago
preview code
|
raw
history blame contribute delete
291 Bytes
metadata
language:
  - en
pipeline_tag: text-generation
tags:
  - SafetyAlignment

Trained by https://github.com/YuanBoXie/DeepRefusal

[1] Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction, EMNLP 2025