CWRUSafetyLab
/

Qwen2.5-1.5B-Instruct-EASE

Model card Files Files and versions

Qwen2.5-1.5B-Instruct-EASE / README.md

HaonanShi's picture

Update README.md

7301c46 verified 2 days ago

|

history blame contribute delete

1.21 kB

	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen2.5-1.5B-Instruct
	tags:
	- safety
	- alignment
	---

	# Qwen2.5-1.5B-Instruct-EASE
	This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the [EASE-SafetyReasoning](https://huggingface.co/datasets/HaonanShi/EASE-STAR41K-SafetyReasoning-10K) dataset.

	## Model description
	This is the safety reasoning aligned version model under the framework,[EASE](https://arxiv.org/pdf/2511.06512). We fine-tune Qwen2.5-1.5B-Instruct to enable adaptive safety reasoning activation. The model triggers explicit safety reasoning only under jailbreak-like semantics, while avoiding unnecessary safety reasoning on benign or general prompts. This design aims to maintain the model’s general task effectiveness and efficiency, while improving robustness against jailbreak attacks.

	## Intended use
	Safety-oriented research on:(1)Safety alignment, (2)Small language models and (3)Jailbreak robustness

	## Citation
	If our model could help you, please cite our paper, thanks!🤗

	EASE: Practical and Efficient Safety Alignment for Small Language Models(AAAI26)(oral)

	https://arxiv.org/pdf/2511.06512