nielsr HF Staff committed on
Commit 8e7b846 · verified · 1 Parent(s): 7d0cb16

Improve model card and add metadata

Hi, I'm Niels from the Hugging Face community science team!

I've opened this PR to improve your model card. It now includes:
- Metadata for `library_name` and `pipeline_tag` to improve discoverability and enable automated code snippets.
- A direct link to the research paper.
- A link to the official GitHub repository.

These additions help researchers and developers find and use your work more effectively. Please review and merge if this looks good to you!
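
As a rough illustration of the metadata point above, the sketch below reads the YAML front matter of a model card and checks for the two keys this PR adds. The helper `read_front_matter` is hypothetical and only handles flat `key: value` lines; it is an assumption for illustration, not how the Hub itself parses model cards.

```python
def read_front_matter(card_text):
    """Return the flat key/value pairs found between the leading '---' delimiters.

    Hypothetical helper: supports only simple `key: value` lines, not full YAML.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the card as having no metadata


# Front matter as it appears in the updated README.md of this PR.
card = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Safety Alignment as Continual Learning
"""

metadata = read_front_matter(card)
missing = [k for k in ("library_name", "pipeline_tag") if k not in metadata]
print(missing)  # [] when both keys are present
```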

Files changed (1)
  1. README.md +15 -2
README.md CHANGED
@@ -1,14 +1,27 @@
  ---
  license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
- This model is the official implementation of the paper: https://arxiv.org/abs/2602.07892.
+
+ # Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
+
+ This model is the official implementation of the paper [Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection](https://arxiv.org/abs/2602.07892).
+
+ **OGPSA** (**O**rthogonal **G**radient **P**rojection for **S**afety **A**lignment) is a method that preserves general capabilities during safety alignment via an orthogonal gradient projection strategy, balancing safety with general utility. It estimates a low-rank reference subspace from gradients on a small set of general-capability data and removes from each safety gradient the component lying in this subspace.
+
+ ## Resources
+ - **Paper:** [https://arxiv.org/abs/2602.07892](https://arxiv.org/abs/2602.07892)
+ - **Code:** [https://github.com/SunGL001/OGPSA](https://github.com/SunGL001/OGPSA)
 
  ## Citation
  If you find this model or dataset useful in your research, please cite our paper:
 
+ ```bibtex
  @article{sun2026safety,
  title={Safety alignment as continual learning: Mitigating the alignment tax via orthogonal gradient projection},
  author={Sun, Guanglong and Zhang, Siyuan and Wang, Liyuan and Zhu, Jun and Su, Hang and Zhong, Yi},
  journal={arXiv preprint arXiv:2602.07892},
  year={2026}
- }
+ }
+ ```
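
The OGPSA description added to the README says the method removes from each safety gradient the component lying in a reference subspace estimated from general-capability gradients. The toy sketch below illustrates only that projection step on small vectors. It is not the authors' implementation: the function names are hypothetical, and Gram-Schmidt orthonormalization is used here as a simple stand-in for whatever low-rank estimation the paper actually uses.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(reference_grads, rank, tol=1e-10):
    """Build at most `rank` orthonormal directions from reference gradients.

    Assumption: plain Gram-Schmidt stands in for the low-rank subspace estimate.
    """
    basis = []
    for grad in reference_grads:
        residual = list(grad)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # skip gradients already inside the current span
            basis.append([r / norm for r in residual])
        if len(basis) == rank:
            break
    return basis


def project_out(safety_grad, basis):
    """Remove from `safety_grad` its component inside span(basis)."""
    projected = list(safety_grad)
    for b in basis:
        coeff = dot(projected, b)
        projected = [p - coeff * bi for p, bi in zip(projected, b)]
    return projected


# Toy example: general-capability gradients span the first two axes.
reference_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(reference_grads, rank=2)
projected = project_out([3.0, 4.0, 5.0], basis)
print(projected)  # [0.0, 0.0, 5.0]
```

In this toy case the projected safety gradient has zero inner product with both reference gradients, so to first order an update along it leaves the loss on the reference data unchanged.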