HaonanShi's picture
Update README.md
76f2183 verified
metadata
license: apache-2.0
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
tags:
  - safety
  - alignment

Qwen2.5-3B-Instruct-EASE

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the EASE-SafetyReasoning dataset.

Model description

This is the safety reasoning aligned version model under the framework,EASE. We fine-tune Qwen2.5-3B-Instruct to enable adaptive safety reasoning activation. The model triggers explicit safety reasoning only under jailbreak-like semantics, while avoiding unnecessary safety reasoning on benign or general prompts. This design aims to maintain the model’s general task effectiveness and efficiency, while improving robustness against jailbreak attacks.

Intended use

Safety-oriented research on:(1)Safety alignment, (2)Small language models and (3)Jailbreak robustness

Citation

If our model could help you, please cite our paper, thanks!🤗

EASE: Practical and Efficient Safety Alignment for Small Language Models(AAAI26)(oral)

https://arxiv.org/pdf/2511.06512