---
license: apache-2.0
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# Mistral-7B-Instruct-v0.2-8bit-abliterated-layer18

This model was abliterated by computing a refusal vector on an 8-bit bitsandbytes quant and then applying that vector to the full-weight model. Abliteration was performed locally on a CUDA GPU; VRAM consumption appeared to stay under 12 GB.

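The linked repository (see below) contains the actual implementation. As a minimal sketch of the two steps just described, assuming a standard difference-of-means refusal direction and weight orthogonalization, the direction can be measured on the 8-bit quant and then projected out of the full-precision matrices that write into the residual stream. The prompt lists, helper names, and projection targets below are illustrative assumptions, not the repository's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
LAYER = 18  # layer whose activations define the refusal direction

tok = AutoTokenizer.from_pretrained(MODEL_ID)

# Step 1: measure the refusal direction on the 8-bit quant (fits in <12 GB VRAM).
quant = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    output_hidden_states=True,
)

def mean_activation(prompts):
    """Mean last-token residual-stream activation at LAYER over a prompt set."""
    acts = []
    for p in prompts:
        ids = tok.apply_chat_template(
            [{"role": "user", "content": p}], return_tensors="pt"
        ).to(quant.device)
        with torch.no_grad():
            acts.append(quant(ids).hidden_states[LAYER][0, -1].float())
    return torch.stack(acts).mean(dim=0)

harmful = ["..."]   # placeholder: prompts the model normally refuses
harmless = ["..."]  # placeholder: matched benign prompts
refusal_dir = mean_activation(harmful) - mean_activation(harmless)
refusal_dir = refusal_dir / refusal_dir.norm()

# Step 2: apply the direction to the full-weight model (on CPU, outside the
# VRAM budget) by projecting it out of every matrix that writes into the
# residual stream.
full = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
d = refusal_dir.to(dtype=torch.bfloat16, device="cpu")

def ablate_out_(W, d):
    W.sub_(torch.outer(d, d @ W))  # W <- W - d (d^T W)

with torch.no_grad():
    emb = full.model.embed_tokens.weight
    emb.sub_(torch.outer(emb @ d, d))  # embedding rows live in residual space
    for block in full.model.layers:
        ablate_out_(block.self_attn.o_proj.weight, d)
        ablate_out_(block.mlp.down_proj.weight, d)

full.save_pretrained("Mistral-7B-Instruct-v0.2-abliterated-layer18")
```
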
Layer 18 was selected for deriving the refusal direction: measurements of the refusal direction's magnitude, its signal-to-noise ratio, and the angle between the means of the "harmful" and "harmless" activations all suggested that an intervention based on this layer would be relatively efficient and effective.

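This card does not spell out how those three quantities were computed. As a hedged sketch, assuming per-layer activation batches for the two prompt classes, the diagnostics might look like this (the function name and exact formulas are assumptions, not the repository's code):

```python
import torch
import torch.nn.functional as F

def layer_diagnostics(harmful_acts, harmless_acts):
    """One candidate layer's diagnostics from (n_prompts, d_model) activations."""
    mu_h = harmful_acts.mean(dim=0)
    mu_b = harmless_acts.mean(dim=0)
    direction = mu_h - mu_b

    # 1. Magnitude of the candidate refusal direction.
    magnitude = direction.norm()

    # 2. Signal-to-noise ratio: separation of the class means relative to the
    #    within-class spread along the candidate direction.
    unit = direction / magnitude
    spread = torch.cat([
        (harmful_acts - mu_h) @ unit,
        (harmless_acts - mu_b) @ unit,
    ]).std()
    snr = magnitude / spread

    # 3. Angle between the class means, in degrees.
    cos = F.cosine_similarity(mu_h, mu_b, dim=0).clamp(-1.0, 1.0)
    angle = torch.rad2deg(torch.arccos(cos))

    return magnitude.item(), snr.item(), angle.item()
```

Under this reading, a layer with a strong, well-separated direction (high magnitude and SNR, wide angle between the class means) is a natural place to intervene.
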
No additional fine-tuning was performed on these weights, so repair (for example, light fine-tuning to restore any degraded general capability) is required for proper use.

The code used can be found on GitHub at [https://github.com/jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration).

(My prior attempt relied on default values within the codebase, which turned out to be less effective than this intervention.)