---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
This repository hosts the official model for the paper [Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection](https://arxiv.org/abs/2602.07892).
**OGPSA** (**O**rthogonal **G**radient **P**rojection for **S**afety **A**lignment) preserves general capabilities during safety alignment through orthogonal gradient projection, balancing safety with general utility. It estimates a low-rank reference subspace from gradients computed on a small set of general-capability data, then removes from each safety gradient the component that lies in this subspace.
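The projection step described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `reference_basis` and `project_out`, the use of a truncated SVD to estimate the subspace, the rank, and the random toy gradients are all assumptions chosen for illustration. See the official code repository for the actual method.

```python
import numpy as np


def reference_basis(general_grads: np.ndarray, rank: int) -> np.ndarray:
    """Estimate a low-rank reference subspace (hypothetical helper).

    general_grads: (n, d) matrix, one flattened gradient per row, computed
                   on general-capability data.
    Returns a (d, rank) matrix whose columns are orthonormal.
    """
    # Assumption: the subspace is taken as the top right singular vectors.
    _, _, vt = np.linalg.svd(general_grads, full_matrices=False)
    return vt[:rank].T


def project_out(safety_grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of safety_grad that lies in span(basis)."""
    return safety_grad - basis @ (basis.T @ safety_grad)


# Toy data standing in for real gradients (illustrative only).
rng = np.random.default_rng(0)
general_grads = rng.normal(size=(8, 32))  # 8 general-capability gradients
safety_grad = rng.normal(size=32)         # one safety gradient

basis = reference_basis(general_grads, rank=4)
projected = project_out(safety_grad, basis)

# The projected gradient has no component left in the reference subspace.
print(np.allclose(basis.T @ projected, 0.0))
```

Because the columns of `basis` are orthonormal, `basis @ basis.T` is the orthogonal projector onto the reference subspace, so the update direction `projected` leaves that subspace untouched to first order.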
## Resources
- **Paper:** [https://arxiv.org/abs/2602.07892](https://arxiv.org/abs/2602.07892)
- **Code:** [https://github.com/SunGL001/OGPSA](https://github.com/SunGL001/OGPSA)
## Citation
If you find this model or dataset useful in your research, please cite our paper:
```bibtex
@article{sun2026safety,
title={Safety alignment as continual learning: Mitigating the alignment tax via orthogonal gradient projection},
author={Sun, Guanglong and Zhang, Siyuan and Wang, Liyuan and Zhu, Jun and Su, Hang and Zhong, Yi},
journal={arXiv preprint arXiv:2602.07892},
year={2026}
}
```