Improve model card and add metadata
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,14 +1,27 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
-
+
+# Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
+
+This model is the official implementation of the paper [Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection](https://arxiv.org/abs/2602.07892).
+
+**OGPSA** (**O**rthogonal **G**radient **P**rojection for **S**afety **A**lignment) is a method that preserves general capabilities during safety alignment via an orthogonal gradient projection strategy, balancing safety with general utility. It estimates a low-rank reference subspace from gradients on a small set of general-capability data and removes from each safety gradient the component lying in this subspace.
+
+## Resources
+- **Paper:** [https://arxiv.org/abs/2602.07892](https://arxiv.org/abs/2602.07892)
+- **Code:** [https://github.com/SunGL001/OGPSA](https://github.com/SunGL001/OGPSA)
 
 ## Citation
 If you find this model or dataset useful in your research, please cite our paper:
 
+```bibtex
 @article{sun2026safety,
 title={Safety alignment as continual learning: Mitigating the alignment tax via orthogonal gradient projection},
 author={Sun, Guanglong and Zhang, Siyuan and Wang, Liyuan and Zhu, Jun and Su, Hang and Zhong, Yi},
 journal={arXiv preprint arXiv:2602.07892},
 year={2026}
-}
+}
+```
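The projection step described in the added model card text (estimate a reference subspace from general-capability gradients, then remove from each safety gradient the component lying in that subspace) can be illustrated with a toy sketch. This is not the authors' code and is not taken from the OGPSA repository: the function names `orthonormal_basis` and `project_out`, the `rank` parameter, and the use of Gram-Schmidt as a stand-in for whatever low-rank estimator the paper actually uses are all assumptions made for illustration, and the gradients are plain Python lists rather than model parameters.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(reference_grads, rank, eps=1e-10):
    """Build up to `rank` orthonormal directions from reference gradients.

    Assumption: Gram-Schmidt over a few general-capability gradients
    stands in for the paper's low-rank subspace estimator.
    """
    basis = []
    for grad in reference_grads:
        if len(basis) == rank:
            break
        residual = list(grad)
        # Subtract the part already explained by earlier basis vectors.
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > eps:  # skip gradients that add no new direction
            basis.append([r / norm for r in residual])
    return basis


def project_out(safety_grad, basis):
    """Return safety_grad minus its component in span(basis).

    `basis` must be orthonormal, so the component is sum((g . b) * b).
    """
    projected = list(safety_grad)
    for b in basis:
        coeff = dot(safety_grad, b)
        projected = [p - coeff * bi for p, bi in zip(projected, b)]
    return projected


# Toy example: the reference gradients span the first two coordinate axes.
reference_grads = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
basis = orthonormal_basis(reference_grads, rank=2)

safety_grad = [3.0, -2.0, 5.0, 1.0]
update = project_out(safety_grad, basis)
print(update)  # the first two coordinates are zeroed; the rest are kept
```

The resulting update is orthogonal to every reference direction, so to first order it leaves the loss on the general-capability data unchanged. That is the intuition behind the "alignment tax" claim in the card; the paper should be consulted for the actual estimator and the choice of rank.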