Uppaal committed 2d306eb (verified) · 1 parent: 28ffcaf

Update README.md

Files changed (1): README.md (+4 -2)
@@ -38,8 +38,10 @@ base_model:
 
 # ProFS Editing for Safety
 
-This model has been edited for safety from [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1).
-Editing is applied using ProFS (Projection Filter for Subspaces), a tuning-free alignment method that removes undesired behaviors such as toxicity, by identifying and projecting out harmful subspaces in model weights.
+This model is an edited version of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1).
+Editing is applied through ProFS, to improve safety.
+
+ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors by identifying and projecting out harmful subspaces in model weights.
 The model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
 published at ICLR 2025 (previously released under the preprint title “DeTox: Toxic Subspace Projection for Model Editing”; both refer to the same work).
 
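The README text in this diff describes ProFS as identifying harmful subspaces and projecting them out of model weights. The NumPy sketch below illustrates that general idea of subspace projection under stated assumptions; it is not the paper's implementation. The helper names `harmful_subspace` and `project_out`, the use of paired hidden-state embeddings as input, mean-centering as a stand-in denoising step, and the rank `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def harmful_subspace(pref_emb: np.ndarray, nonpref_emb: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: estimate a rank-k 'harmful' subspace.

    pref_emb, nonpref_emb: (n, d) hidden states for n paired preferred /
    non-preferred (e.g. non-toxic / toxic) inputs. The pairing and the
    mean-centering below are illustrative assumptions.
    """
    diff = nonpref_emb - pref_emb       # (n, d) per-pair difference vectors
    diff = diff - diff.mean(axis=0)     # subtract the mean difference vector (assumed denoising step)
    # Rows of vt are orthonormal directions in R^d, ordered by singular value.
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k].T                     # (d, k) orthonormal basis of the top-k directions


def project_out(weight: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return (I - U U^T) W.

    weight: (d, m) matrix whose outputs live in R^d (y = W @ x).
    basis:  (d, k) orthonormal basis U of the subspace to remove.
    """
    # Apply the projector without forming the d x d matrix U U^T.
    return weight - basis @ (basis.T @ weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pref = rng.normal(size=(32, 16))
    nonpref = pref + rng.normal(size=(32, 16))
    U = harmful_subspace(pref, nonpref, k=2)
    W_edit = project_out(rng.normal(size=(16, 8)), U)
    # The edited weight's outputs have (numerically) no component along U.
    print(np.abs(U.T @ W_edit).max())
```

Here `k` must not exceed `min(n, d)`. Computing `basis @ (basis.T @ weight)` avoids materializing a d×d projector, which matters when d is a hidden size in the thousands. For a PyTorch `nn.Linear`, whose `weight` has shape `(out_features, in_features)`, this form removes the subspace from the layer's output space; which layers to edit is left open here.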