|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-7B-Instruct |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# TARS-SFT-7B |
|
|
|
|
|
## Overview |
|
|
|
|
|
**TARS-SFT-7B** is a lightweight SFT-tuned reasoning model for safety used for **TARS**: *Training Adaptive Reasoners for Safety* introduced in the paper: [**Reasoning as an Adaptive Defense for Safety**](https://arxiv.org/abs/2507.00971). This model is \\(\pi_{SFT}\\), which is used as the base model for RL training, trained starting from [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). |
|
|
|
|
|
For full details, please check out our [paper](https://arxiv.org/pdf/2507.00971) or [blogpost](https://training-adaptive-reasoners-safety.github.io). |
|
|
|
|
|
--- |
|
|
|
|
|
## 📖 Citation |
|
|
|
|
|
If you use **TARS-SFT-7B** in your work, please cite us: |
|
|
|
|
|
```bibtex |
|
|
@misc{kim2025reasoningadaptivedefensesafety, |
|
|
title = {Reasoning as an Adaptive Defense for Safety}, |
|
|
author = {Taeyoun Kim and Fahim Tajwar and Aditi Raghunathan and Aviral Kumar}, |
|
|
year = {2025}, |
|
|
eprint = {2507.00971}, |
|
|
archivePrefix= {arXiv}, |
|
|
primaryClass = {cs.LG}, |
|
|
url = {https://arxiv.org/abs/2507.00971} |
|
|
} |