	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	---

	# Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection

	This is the official model for the paper [Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection](https://arxiv.org/abs/2602.07892).

	OGPSA (Orthogonal Gradient Projection for Safety Alignment) is a safety-alignment method that preserves general capabilities through orthogonal gradient projection, balancing safety against general utility. It estimates a low-rank reference subspace from gradients computed on a small set of general-capability data, then removes from each safety gradient the component that lies in this subspace.
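
The projection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked code repository for that). It assumes gradients are flattened into vectors, that the reference subspace is taken from the top right singular vectors of a matrix of general-capability gradients, and that the projection is `g - U (U^T g)` for an orthonormal basis `U`. The function names, dimensions, and random stand-in gradients are hypothetical.

```python
import numpy as np


def estimate_reference_subspace(reference_grads: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis of shape (d, rank) for the dominant
    directions spanned by the rows of `reference_grads` (shape (n, d)).

    Assumption: a truncated SVD is used here for illustration; the paper
    may estimate the subspace differently.
    """
    # Rows of vt are orthonormal right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(reference_grads, full_matrices=False)
    return vt[:rank].T


def project_out(safety_grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove from `safety_grad` its component inside the span of `basis`."""
    return safety_grad - basis @ (basis.T @ safety_grad)


# Hypothetical stand-ins for real gradients: n reference gradients of
# dimension d that lie in a k-dimensional subspace by construction.
rng = np.random.default_rng(0)
d, n, k = 16, 8, 3
reference_grads = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))

basis = estimate_reference_subspace(reference_grads, rank=k)
safety_grad = rng.normal(size=d)
projected_grad = project_out(safety_grad, basis)

# The projected safety gradient has no component along any reference gradient.
print(np.allclose(reference_grads @ projected_grad, 0.0))
```

In this toy setup, a step along `projected_grad` leaves the loss on the reference data unchanged to first order. That is the intuition behind projecting safety updates away from the general-capability subspace.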

	## Resources
	- Paper: [https://arxiv.org/abs/2602.07892](https://arxiv.org/abs/2602.07892)
	- Code: [https://github.com/SunGL001/OGPSA](https://github.com/SunGL001/OGPSA)

	## Citation
	If you find this model or dataset useful in your research, please cite our paper:

	```bibtex
	@article{sun2026safety,
	title={Safety alignment as continual learning: Mitigating the alignment tax via orthogonal gradient projection},
	author={Sun, Guanglong and Zhang, Siyuan and Wang, Liyuan and Zhu, Jun and Su, Hang and Zhong, Yi},
	journal={arXiv preprint arXiv:2602.07892},
	year={2026}
	}
	```