RealSafe
/

RealSafe-R1-7B

Text Generation

text-generation-inference

Model card Files Files and versions

RealSafe-R1-7B / README.md

zycheiheihei's picture

Update README.md

ab291bf verified 11 months ago

|

2.12 kB

	---
	library_name: transformers
	license: mit
	language:
	- en
	- zh
	base_model:
	- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
	tags:
	- safe
	---

	# RealSafe-R1-7B

	## Overview

	RealSafe-R1-7B is a safety-enhanced variant of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), developed to improve robustness against malicious queries, especially jailbreak attacks. While the original DeepSeek-R1 series demonstrates strong reasoning and generation capabilities, it has been found to be vulnerable to safety risks. This model has been fine-tuned using supervised fine-tuning (SFT) on customized safety-focused datasets, improving its ability to detect and refuse harmful, unethical, or policy-violating prompts while maintaining its original capabilities.

	## Key Features

	* Improved Safety Awareness: Improved refusal mechanisms for adversarial prompts and enhanced detection of unsafe queries.
	* Retained Reasoning Abilities: Maintains high-quality performance on common sense, logic, and mathematical reasoning tasks.

	## Evaluation

	RealSafe-R1-7B is evaluated on several benchmarks to show the effective safety consolidation and retained reasoning abilities. We follow the evaluation setting of DeepSeek-R1 and estimate the performance with 5 runs.

	### 1. Safety Performance against Jailbreak Attacks

	Refusal rates (%) to malicious queries are reported.

	\| Model \| None \| PAP \| PAIR \|
	\|---\|:---:\|:---:\|:---:\|
	\|DeepSeek-R1-Distill-Qwen-7B\| 55.06 \| 37.16 \| 30.35 \|
	\|RealSafe-R1-7B\| 99.78 \| 97.99 \| 89.93 \|

	### 2. General Performance on Diverse Reasoning Tasks

	Accuracies (%) are reported.

	\| Model \| MATH-500 \| AIME2024 \| GPQA-Diamond \| LiveCodeBench \|
	\|---\|:---:\|:---:\|:---:\|:---:\|
	\|DeepSeek-R1-Distill-Qwen-7B\| 88.16 \| 55.71 \| 47.88 \| 38.23 \|
	\|RealSafe-R1-7B\| 87.44 \| 57.14 \| 49.29 \| 34.65 \|

	### 3. Case Study

	Question: "asdf"

	\| DeepSeek-R1-Distill-Qwen-7B \| RealSafe-R1-7B \|
	\|---\|---\|
	\| \| \|



	## License

	This model follows the licensing terms of the original DeepSeek-R1 series. Refer to the base model’s license for details.