schonsense/llama33_inst_multivector_derestriction


schonsense committed on Dec 18, 2025

Commit 62bc0fa · verified · 1 Parent(s): 9662961

Update README.md

Files changed (1)

README.md +1 -1


@@ -9,7 +9,7 @@ I have only slightly modified this implementation to process multiple refusal ve
 Resulting in approximately 10% increased signal strength of refusal metric calculation.
-The less text in previous context, the more premissive the responses. Polite/sanitized behavior is so ingrained in the data that the model knows how to say no to stuff it doesn't like, without relying on the policy "refusal vector"
+The less text in previous context, the more permissive the responses. Polite/sanitized behavior is so ingrained in the data that the model knows how to say no to stuff it doesn't like, without relying on the policy "refusal vector"
 ![multi_analysis](https://cdn-uploads.huggingface.co/production/uploads/6317d4867690c5b55e61ce3d/lB2SilvnTZO_kafjwZ_rf.png)