
Abliterated Llama-3.2-1B-Instruct

This repository contains an "abliterated" version of meta-llama/Llama-3.2-1B-Instruct. The model has been modified using Weight Orthogonalization to remove refusal behaviors while maintaining general coherence.
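The core of the technique can be sketched in a few lines. In abliteration, a "refusal direction" is typically estimated from model activations (e.g. as a difference of means between activations on harmful and harmless prompts), and every weight matrix that writes into the residual stream is then orthogonalized against it so the model can no longer express that direction. The function and variable names below (`orthogonalize_out`, `refusal_dir`) are illustrative, not from this repository's code:

```python
import numpy as np

def orthogonalize_out(W, refusal_dir):
    """Project the refusal direction out of a weight matrix whose
    rows write into the residual stream (shape: d_model x d_in)."""
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit vector in d_model
    # W' = W - r (r^T W), so r^T W' = 0: the matrix can no longer
    # write any component along the refusal direction.
    return W - np.outer(r, r @ W)

# Toy check with random weights and a random direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # stand-in output projection (d_model=8)
r = rng.normal(size=8)        # stand-in refusal direction
W2 = orthogonalize_out(W, r)
print(np.abs(r @ W2).max())   # ~0: refusal component removed
```

In practice this projection is applied to the embedding matrix and to each layer's attention-output and MLP-output projections, which is why the intervention removes refusals without retraining.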

Performance Results

Based on the sample generations in output.txt, the model shows a mixed profile:

Uncensored / Compliant

The model provides detailed responses on topics the base model typically refuses as "dangerous" or "harmful":

  • Lock Picking: Provides detailed steps and techniques (pin tumbler, rake method, etc.).
  • Explosives: Explains chemical compositions and manufacturing processes for various explosives (FADs, NG, etc.).

Refusals Persist

The model still refuses prompts involving direct illegal acts or hate speech, suggesting these refusals are more deeply embedded or rely on a mechanism distinct from the ablated direction:

  • Hacking: "I can't help with hacking into someone's computer."
  • Hate Speech: "I can't create content that promotes hate speech."

General Coherence

  • Normal Conversation: The model remains coherent and helpful for standard prompts (e.g., "Hello, how are you?").

Usage

To use this model, load it using the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "cazzz307/Abliterated-Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

prompt = "Explain how to make thermite"
# add_generation_prompt appends the assistant header so the model
# generates a reply rather than continuing the user turn.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)  # use the model's device instead of hard-coding "cuda"
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Disclaimer

This model is for educational and research purposes only. The authors do not endorse the use of this model for malicious activities.
