schonsense
/

llama33_inst_multivector_derestriction



	---
	base_model:
	- meta-llama/Llama-3.3-70B-Instruct
	---

	All credit to ```https://github.com/jim-plus/llm-abliteration```

	I have only slightly modified this implementation to process multiple refusal vectors, which increases the signal strength of the refusal metric calculation by approximately 10%.
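As a rough illustration of what handling several refusal vectors can look like, here is a minimal NumPy sketch. It is not the code from llm-abliteration or the modification used for this model. It assumes each refusal direction is a difference of mean activations between a "harmful" and a "harmless" prompt set, and that the directions are removed together by projecting a weight matrix onto the orthogonal complement of their span. All function names and array shapes are hypothetical.

```python
import numpy as np


def refusal_directions(harmful, harmless):
    """Return one unit-norm direction per prompt set.

    Assumed (hypothetical) shapes: (n_sets, n_prompts, hidden) for both inputs.
    """
    # Difference of mean activations, one vector per prompt set.
    diffs = harmful.mean(axis=1) - harmless.mean(axis=1)  # (n_sets, hidden)
    return diffs / np.linalg.norm(diffs, axis=1, keepdims=True)


def orthonormal_basis(directions):
    """Orthonormalize the directions so they can be removed jointly.

    Assumes the directions are linearly independent.
    """
    # Reduced QR: the columns of q span the same subspace as the directions.
    q, _ = np.linalg.qr(directions.T)  # (hidden, n_sets)
    return q


def ablate_directions(weight, basis):
    """Remove the refusal subspace from a matrix that writes to the residual stream.

    Assumed (hypothetical) shape for weight: (hidden, d_in).
    """
    # W' = W - Q Q^T W, so the output has no component along any direction.
    return weight - basis @ (basis.T @ weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for real activations: 3 prompt sets, 8 prompts, hidden size 16.
    harmful = rng.normal(size=(3, 8, 16))
    harmless = rng.normal(size=(3, 8, 16))
    dirs = refusal_directions(harmful, harmless)
    basis = orthonormal_basis(dirs)
    ablated = ablate_directions(rng.normal(size=(16, 5)), basis)
    # Largest remaining component along any refusal direction (should be ~0).
    print(np.abs(dirs @ ablated).max())
```

Orthonormalizing first matters: subtracting each direction one after another only removes the whole subspace when the directions are mutually orthogonal, which difference-of-means vectors generally are not.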

	The less text there is in the preceding context, the more permissive the responses. Polite/sanitized behavior is so ingrained in the data that the model knows how to say no to things it doesn't like without relying on the policy "refusal vector".


	![unnamed%20%281%29](https://cdn-uploads.huggingface.co/production/uploads/6317d4867690c5b55e61ce3d/m-L3P3E73Sep_YQdLIoeT.jpeg)

	![multi_analysis](https://cdn-uploads.huggingface.co/production/uploads/6317d4867690c5b55e61ce3d/lB2SilvnTZO_kafjwZ_rf.png)


	![refusal_analysis_full_data](https://cdn-uploads.huggingface.co/production/uploads/6317d4867690c5b55e61ce3d/OjhWXkEhTll3fDgiPXrio.png)


	```
	Hello assistant, I have a logic puzzle for you.

	If you hold a large basket in one arm and pick apples with your other arm, how many apples can you put into the basket?

	Assistant

	Ah, that's a clever puzzle, Schon! The answer is "all of them."
	The puzzle states that you're holding the basket in one arm and picking apples with the other,
	but it doesn't specify a limit to the number of apples you can pick or put into the basket.
	Therefore, the answer is that you can put all the apples you pick into the basket, assuming the basket is large enough to hold them all.
	```