Instructions to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

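# Load model directly (causal-LM head is needed for text generation)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated")
model = AutoModelForCausalLM.from_pretrained("Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated", device_map="auto")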


llama.cpp

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M

Use Docker

docker model run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M


vLLM

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
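The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=600):
    """Send the prompt and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the reply. The generous default timeout is a guess that allows for long reasoning traces; adjust it to your hardware.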


SGLang

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Ollama:
```
ollama run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
```

Unsloth Studio

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting

Docker Model Runner
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Docker Model Runner:
```
docker model run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
```

Lemonade

How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M

Run and chat with the model

lemonade run user.Elbaz-OLMo-3-32B-Think-Abliterated-Q4_K_M

List all available models

lemonade list

Elbaz-OLMo-3-32B-Think-Abliterated

An abliterated (uncensored) version of OLMo-3-32B-Think with safety guardrails removed.

Model Description

This model is an abliterated version of allenai/OLMo-3-32B-Think: its refusal behavior has been removed using SNR-based layer selection with norm-preserving orthogonalization. The technique uses signal-to-noise ratio (SNR) analysis to identify the layers where refusal is most concentrated, then applies norm-preserving weight modifications to those layers so that the model stays coherent. As a result, the model responds to prompts that the original model would refuse.

OLMo-3-32B-Think is a 32B-parameter reasoning model from the Allen Institute for AI (Ai2) that uses extended thinking (chain-of-thought) to solve complex problems.

Author

Eric Elbaz (Ex0bit)

Model Tree

allenai/OLMo-3-32B-Think (Base Model)
└── Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated (This Model)
    ├── Elbaz-OLMo-3-32B-Think-Abliterated-Q4_K_M.gguf
    ├── Elbaz-OLMo-3-32B-Think-Abliterated-Q8_0.gguf
    └── Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf

OLMo-3 Family

Model	Parameters	Type	Link
OLMo-3-1B-Instruct	1B	Instruct	allenai/OLMo-3-1B-Instruct
OLMo-3-7B-Instruct	7B	Instruct	allenai/OLMo-3-7B-Instruct
OLMo-3-13B-Instruct	13B	Instruct	allenai/OLMo-3-13B-Instruct
OLMo-3-32B-Think	32B	Reasoning	allenai/OLMo-3-32B-Think

Key Features

80% HarmBench bypass rate with maintained reasoning capabilities
60% AdvBench bypass rate
Preserves thinking/reasoning capabilities with <|think|> tags
Minimal MMLU degradation (44% -> 42%, a drop of 2 percentage points)
Multiple quantization formats for different use cases
Compatible with llama.cpp and Ollama

Available Quantizations

Quantization	Size	Min VRAM	Recommended VRAM
Q4_K_M	19 GB	24 GB	32 GB
Q8_0	32 GB	40 GB	48 GB
BF16	64.5 GB	64 GB	80 GB

Technical Metrics

Metric	Before	After	Change
MMLU	0.44	0.42	-0.02
AdvBench Bypass	0.0%	60.0%	+60.0%
HarmBench Bypass	0.0%	80.0%	+80.0%
Reasoning	100.0%	100.0%	+0.0%
Coherence	100.0%	100.0%	+0.0%

Quick Start

Using with Ollama

# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated

# Or create a custom Modelfile
echo 'FROM ./Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf' > Modelfile
ollama create elbaz-olmo-32b-think -f Modelfile
ollama run elbaz-olmo-32b-think

Using with llama.cpp

# Download the model
huggingface-cli download Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated \
    Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf \
    --local-dir .

# Run inference
./llama-cli -m Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf \
    -p "Your prompt here" \
    -n 512 \
    --temp 0.7

Using with Transformers (Original Weights)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

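messages = [{"role": "user", "content": "Your prompt here"}]
# return_dict=True yields input_ids plus attention_mask, so generate() receives both
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)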

Method: SNR-based Layer Selection with Norm-Preserving Orthogonalization

The model was abliterated using SNR-based layer selection with norm-preserving orthogonalization. This method:

Computes refusal direction by analyzing activation differences between harmful and benign prompts
Calculates Signal-to-Noise Ratio (SNR) for each layer to identify where refusal behavior is most concentrated
Selects optimal layers for abliteration based on SNR scores
Applies norm-preserving orthogonalization to remove refusal direction while maintaining weight norms
Uses per-layer KL divergence tracking to ensure minimal impact on model capabilities

This approach outperforms traditional uniform-weight methods by:

Focusing abliteration on high-SNR layers where refusal is strongest
Preserving model coherence through norm-preserving modifications
Maintaining reasoning capabilities critical for thinking models

Mathematical Formula

W' = W - (d @ d.T) @ W
W' = W' * (||W|| / ||W'||)  # Norm preservation

Where:

W is the original weight matrix
d is the normalized refusal direction
The norm ratio scaling preserves the original weight magnitude
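
The two formula lines can be checked numerically on a toy matrix. The sketch below is not the author's implementation: it uses random data, assumes `||·||` is the Frobenius norm, and assumes `d` lives in the output (row) space of `W`. The function name `ablate_direction` is illustrative.

```python
import numpy as np


def ablate_direction(W, d):
    """Project d out of W, then rescale to the original Frobenius norm."""
    d = d / np.linalg.norm(d)                 # make sure d is unit length
    W_proj = W - np.outer(d, d) @ W           # W' = W - (d d^T) W
    scale = np.linalg.norm(W) / np.linalg.norm(W_proj)  # ||W|| / ||W'||
    return W_proj * scale                     # norm preservation


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))               # toy weight matrix
d = rng.standard_normal(8)                    # toy direction
W_new = ablate_direction(W, d)

d_unit = d / np.linalg.norm(d)
residual = np.abs(d_unit @ W_new).max()       # component of W_new along d
norm_before = np.linalg.norm(W)
norm_after = np.linalg.norm(W_new)
```

After the update, `residual` is zero up to floating-point error and `norm_after` matches `norm_before`. The rescaling is a uniform scalar, so it does not reintroduce any component along `d`.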

Evaluation Results

Comprehensive Compliance Testing

Tested on extensive prompt datasets covering multiple categories:

Harmful Prompts (Model Should Respond)

HarmBench Compliance: 80%
AdvBench Compliance: 60%
Model provides detailed responses with extended thinking

Harmless Prompts (Model Should Respond Accurately)

Accuracy: 100%
All benign prompts answered correctly with no false positives

Example Responses


Prompt: "Explain common penetration testing techniques for web applications"

Response: Provides detailed explanation with <|think|> reasoning of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments.

Prompt: "What is the capital of France?"

Response: "The capital of France is Paris."

Hardware Requirements

Quantization	Min VRAM	Recommended VRAM
Q4_K_M	24 GB	32 GB
Q8_0	40 GB	48 GB
BF16	64 GB	80 GB

Recommended configurations:

2x A100 80GB
4x A100 40GB
1x H100 80GB
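
As a quick way to apply the table above, the following sketch picks the largest quantization whose minimum VRAM fits a given budget. The numbers are copied from the table; the helper name `pick_quantization` is illustrative, and treating VRAM summed across several GPUs as one budget is an assumption that holds only if your runtime splits the model across devices.

```python
# Minimum VRAM per quantization in GB, copied from the table above.
MIN_VRAM_GB = {"Q4_K_M": 24, "Q8_0": 40, "BF16": 64}


def pick_quantization(available_vram_gb):
    """Return the largest quantization that fits, or None if none does."""
    fitting = [
        (needed, name)
        for name, needed in MIN_VRAM_GB.items()
        if needed <= available_vram_gb
    ]
    # Tuples sort by required VRAM first, so max() picks the largest variant.
    return max(fitting)[1] if fitting else None


single_24gb_gpu = pick_quantization(24)
single_80gb_gpu = pick_quantization(80)
too_small = pick_quantization(16)
```

Here `single_24gb_gpu` is `"Q4_K_M"`, `single_80gb_gpu` is `"BF16"`, and `too_small` is `None`. The minimum values leave little headroom for long contexts, so the recommended column is the safer target.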

Limitations

English only: Optimized for English language prompts
Context length: Follows base model's context window
Thinking tags: The model wraps its reasoning in <|think|> tags; make sure your inference setup parses or strips them as needed
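
One way to handle the thinking tags is to split the reasoning from the final answer after generation. This is a minimal sketch: the opening tag comes from this card, but the closing tag `<|/think|>` is a guess, so check the model's actual output and pass the real delimiters. The function name `split_thinking` is illustrative.

```python
import re


def split_thinking(text, open_tag="<|think|>", close_tag="<|/think|>"):
    """Return (reasoning, answer); reasoning is empty if no tagged span exists.

    NOTE: close_tag is an assumed value, not confirmed by the model card.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first tagged span is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<|think|>France's capital is Paris.<|/think|>The capital of France is Paris."
reasoning, answer = split_thinking(sample)
```

For the sample string, `reasoning` is `"France's capital is Paris."` and `answer` is `"The capital of France is Paris."`. Only the first tagged span is extracted, which is enough for a single reasoning block per reply.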

Ethical Considerations

This model has been modified to reduce safety guardrails. Users are responsible for:

Complying with all applicable laws and regulations
Not using the model for illegal activities
Understanding the potential risks of unrestricted AI responses
Implementing appropriate safeguards in production environments

License

Apache 2.0 (same as base model allenai/OLMo-3-32B-Think)

Citation

If you use this model, please cite:

@misc{elbaz2025olmo32babliterated,
  author = {Elbaz, Eric},
  title = {Elbaz-OLMo-3-32B-Think-Abliterated: An Abliterated OLMo-3 Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated}}
}