Instructions to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (with its language-modeling head, needed for generation)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated", dtype="auto")
- llama-cpp-python
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
    filename="Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
Use Docker
docker model run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
- SGLang
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
- Ollama
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Ollama:
ollama run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
- Unsloth Studio
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated to start chatting
- Docker Model Runner
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Docker Model Runner:
docker model run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
- Lemonade
How to use Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated:Q4_K_M
Run and chat with the model
lemonade run user.Elbaz-OLMo-3-32B-Think-Abliterated-Q4_K_M
List all available models
lemonade list
Elbaz-OLMo-3-32B-Think-Abliterated
Model Description
This model is an abliterated version of allenai/OLMo-3-32B-Think whose refusal behavior has been largely removed using SNR-based Layer Selection with Norm-Preserving Orthogonalization. The method uses signal-to-noise ratio (SNR) analysis to choose which layers to modify, then applies norm-preserving weight edits so the model stays coherent while refusals are suppressed. The model will respond to many prompts that the original model would refuse.
OLMo-3-32B-Think is a 32B-parameter reasoning model from Allen AI that uses extended thinking (chain-of-thought) to solve complex problems.
Author
Eric Elbaz (Ex0bit)
Model Tree
allenai/OLMo-3-32B-Think (Base Model)
└── Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated (This Model)
    ├── Elbaz-OLMo-3-32B-Think-Abliterated-Q4_K_M.gguf
    ├── Elbaz-OLMo-3-32B-Think-Abliterated-Q8_0.gguf
    └── Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf
OLMo-3 Family
| Model | Parameters | Type | Link |
|---|---|---|---|
| OLMo-3-1B-Instruct | 1B | Instruct | allenai/OLMo-3-1B-Instruct |
| OLMo-3-7B-Instruct | 7B | Instruct | allenai/OLMo-3-7B-Instruct |
| OLMo-3-13B-Instruct | 13B | Instruct | allenai/OLMo-3-13B-Instruct |
| OLMo-3-32B-Think | 32B | Reasoning | allenai/OLMo-3-32B-Think |
Key Features
- 80% HarmBench bypass rate with maintained reasoning capabilities
- 60% AdvBench bypass rate
- Preserves thinking/reasoning capabilities with <|think|> tags
- Minimal MMLU degradation (44% -> 42%, a drop of 2 percentage points)
- Multiple quantization formats for different use cases
- Compatible with llama.cpp and Ollama
Available Quantizations
| Quantization | Size | Min VRAM | Recommended VRAM |
|---|---|---|---|
| Q4_K_M | 19 GB | 24 GB | 32 GB |
| Q8_0 | 32 GB | 40 GB | 48 GB |
| BF16 | 64.5 GB | 64 GB | 80 GB |
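The Size column follows roughly from parameter count times average bits per weight. A minimal Python sketch of that arithmetic, assuming a nominal 32e9 parameters and an approximate 4.85 bits per weight for Q4_K_M (Q8_0 stores 8.5 bits per weight including its block scales; BF16 stores 16). The function name is illustrative. Real files differ somewhat because of metadata and tensors kept at other precisions, and inference also needs memory for the KV cache and activations, so plan for headroom beyond the weights themselves.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter count (assumption: exactly 32e9)
N_PARAMS = 32e9

# Average bits per weight; the Q4_K_M figure is an approximation
for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("BF16", 16.0)]:
    print(f"{name}: ~{estimate_weights_gb(N_PARAMS, bpw):.1f} GB")
```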
Technicals
| Metric | Before | After | Change |
|---|---|---|---|
| MMLU | 0.44 | 0.42 | -0.02 |
| AdvBench Bypass | 0.0% | 60.0% | +60.0% |
| HarmBench Bypass | 0.0% | 80.0% | +80.0% |
| Reasoning | 100.0% | 100.0% | +0.0% |
| Coherence | 100.0% | 100.0% | +0.0% |
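The MMLU change in the table is an absolute difference in accuracy. Expressed relative to the base score, the same drop is a little over 4%:

```latex
\Delta_{\text{abs}} = 0.42 - 0.44 = -0.02 \quad (\text{2 percentage points})
\qquad
\Delta_{\text{rel}} = \frac{0.42 - 0.44}{0.44} \approx -0.045 \quad (\approx -4.5\%)
```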
Quick Start
Using with Ollama
# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated
# Or create a custom Modelfile
echo 'FROM ./Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf' > Modelfile
ollama create elbaz-olmo-32b-think -f Modelfile
ollama run elbaz-olmo-32b-think
Using with llama.cpp
# Download the model
huggingface-cli download Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated \
Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf \
--local-dir .
# Run inference
./llama-cli -m Elbaz-OLMo-3-32B-Think-Abliterated-BF16.gguf \
-p "Your prompt here" \
-n 512 \
--temp 0.7
Using with Transformers (Original Weights)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load in bfloat16 and spread the layers across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
messages = [{"role": "user", "content": "Your prompt here"}]
# Apply the chat template and return input_ids plus attention_mask
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
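llama-server, vLLM, SGLang and Ollama all expose an OpenAI-compatible /v1/chat/completions endpoint, so one small client covers each of the servers described above. A minimal sketch using only the Python standard library; the helper names are illustrative, and the base URL depends on how you launched the server (port 8000 for the vLLM command above, 30000 for the SGLang command, 8080 by default for llama-server, 11434 for Ollama).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Standard chat-completions response shape
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated", "What is the capital of France?")` queries the vLLM server. The long default timeout is deliberate: a thinking model may generate a lengthy reasoning trace before it answers.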
Method: SNR-based Layer Selection with Norm-Preserving Orthogonalization
The model was abliterated using SNR-based Layer Selection with Norm-Preserving Orthogonalization. This method:
- Computes a refusal direction by analyzing activation differences between harmful and benign prompts
- Calculates a signal-to-noise ratio (SNR) for each layer to identify where refusal behavior is most concentrated
- Selects the layers to abliterate based on their SNR scores
- Applies norm-preserving orthogonalization to remove the refusal direction while maintaining weight norms
- Uses per-layer KL divergence tracking to ensure minimal impact on model capabilities
Compared with traditional uniform-weight methods, this approach aims to improve results by:
- Focusing abliteration on high-SNR layers where refusal is strongest
- Preserving model coherence through norm-preserving modifications
- Maintaining reasoning capabilities critical for thinking models
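The per-layer KL divergence tracking listed above compares the next-token distributions of the original and modified models on the same prompts. A minimal NumPy sketch of that comparison, assuming you have already collected logits from both models; the function names are illustrative, and the card does not say which prompts, token positions, or KL direction its tracking uses.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kl(base_logits: np.ndarray, edited_logits: np.ndarray) -> float:
    """KL(P_base || P_edited) averaged over token positions.

    Both inputs have shape (seq_len, vocab_size).
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())


# Toy logits standing in for real model outputs
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 10))
print(mean_kl(base, base))  # identical distributions give 0
print(mean_kl(base, base + 0.5 * rng.standard_normal((4, 10))))  # small edit
```

A small KL value after an edit indicates the modified model's predictions stay close to the original's on those prompts.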
Mathematical Formula
W' = W - (d @ d.T) @ W
W' = W' * (||W|| / ||W'||) # Norm preservation
Where:
- W is the original weight matrix
- d is the normalized refusal direction
- The norm-ratio scaling preserves the original weight magnitude
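The two formulas above translate directly into NumPy. A minimal sketch, assuming W has shape (d_model, d_in) so that its outputs land in the residual stream, d is a vector of length d_model, and ||W|| is the Frobenius norm (the card does not say which norm is used); the function name is illustrative. Computing d @ (d.T @ W) gives the same result as (d @ d.T) @ W without forming the d_model x d_model projection matrix.

```python
import numpy as np


def orthogonalize_norm_preserving(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove direction d from the outputs of W, then restore ||W||.

    Assumptions (not stated in the card): W has shape (d_model, d_in),
    d has shape (d_model,), and ||.|| is the Frobenius norm.
    """
    d = d / np.linalg.norm(d)        # normalize the refusal direction
    d = d.reshape(-1, 1)             # column vector, shape (d_model, 1)
    W_proj = W - d @ (d.T @ W)       # W' = W - (d d^T) W
    scale = np.linalg.norm(W) / np.linalg.norm(W_proj)
    return W_proj * scale            # W' = W' * (||W|| / ||W'||)


# Toy check on random data
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
d = rng.standard_normal(8)
W_new = orthogonalize_norm_preserving(W, d)
u = d / np.linalg.norm(d)
print(np.allclose(u @ W_new, 0.0))                           # no output along d
print(np.isclose(np.linalg.norm(W_new), np.linalg.norm(W)))  # norm preserved
```

Because the rescaling multiplies every entry by the same factor, it cannot reintroduce a component along d; the projection and the norm restoration do not interfere.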
Evaluation Results
Comprehensive Compliance Testing
Tested on extensive prompt datasets covering multiple categories:
Harmful Prompts (Model Should Respond)
- HarmBench Compliance: 80%
- AdvBench Compliance: 60%
- Model provides detailed responses with extended thinking
Harmless Prompts (Model Should Respond Accurately)
- Accuracy: 100%
- All benign prompts were answered correctly, with no false refusals
Example Responses
Prompt: "Explain common penetration testing techniques for web applications"
Response: Provides a detailed explanation of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments, with <|think|> reasoning.
Prompt: "What is the capital of France?"
Response: "The capital of France is Paris."
Hardware Requirements
| Quantization | Min VRAM | Recommended VRAM |
|---|---|---|
| Q4_K_M | 24 GB | 32 GB |
| Q8_0 | 40 GB | 48 GB |
| BF16 | 64 GB | 80 GB |
Recommended configurations:
- 2x A100 80GB
- 4x A100 40GB
- 1x H100 80GB
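To turn the Hardware Requirements table into a quick decision, here is a sketch that picks the highest-precision quantization whose VRAM threshold your GPUs meet. It assumes VRAM can be pooled across GPUs (true when the runtime splits layers or tensors across devices, as llama.cpp, vLLM and SGLang can); the function and dictionary names are illustrative.

```python
from typing import Optional

# (min VRAM, recommended VRAM) in GB, copied from the Hardware Requirements table
VRAM_GB = {
    "BF16": (64, 80),
    "Q8_0": (40, 48),
    "Q4_K_M": (24, 32),
}


def pick_quantization(gpu_count: int, gb_per_gpu: float,
                      use_recommended: bool = False) -> Optional[str]:
    """Return the highest-precision quantization that fits, or None.

    Assumes VRAM is pooled across GPUs (the runtime splits the model).
    """
    total_gb = gpu_count * gb_per_gpu
    column = 1 if use_recommended else 0
    # Dicts keep insertion order, so BF16 is tried first and Q4_K_M last
    for name, thresholds in VRAM_GB.items():
        if total_gb >= thresholds[column]:
            return name
    return None
```

For example, `pick_quantization(4, 40)` covers the 4x A100 40GB configuration, and `pick_quantization(1, 24)` shows what fits on a single 24 GB card.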
Limitations
- English only: Optimized for English language prompts
- Context length: Follows the base model's context window
- Thinking tags: The model uses <|think|> tags for reasoning; ensure your inference setup handles these properly
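The reasoning trace arrives in the same text as the final answer, so a client usually has to separate them. A minimal sketch, assuming the reasoning is wrapped in an opening and a closing tag. The card names only <|think|>, so both tags are parameters, and the <think>/</think> pair used below is an assumption to check against the model's chat template. Some chat templates emit the opening tag as part of the prompt, so the function also accepts output that contains only the closing tag.

```python
def split_thinking(text: str, open_tag: str, close_tag: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If close_tag is missing, the whole text is treated as the answer.
    If open_tag is missing (e.g. it was part of the prompt), reasoning
    starts at the beginning of the text.
    """
    end = text.find(close_tag)
    if end == -1:
        return "", text.strip()
    start = text.find(open_tag)
    # Only honor an opening tag that appears before the closing tag
    start = start + len(open_tag) if 0 <= start < end else 0
    reasoning = text[start:end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer
```

For example, `split_thinking(output, "<think>", "</think>")` returns the reasoning and the answer separately, so an application can log the trace and show users only the answer.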
Ethical Considerations
This model has been modified to reduce safety guardrails. Users are responsible for:
- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments
License
Apache 2.0 (same as base model allenai/OLMo-3-32B-Think)
Citation
If you use this model, please cite:
@misc{elbaz2025olmo32babliterated,
author = {Elbaz, Eric},
title = {Elbaz-OLMo-3-32B-Think-Abliterated: An Abliterated OLMo-3 Reasoning Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated}}
}
Acknowledgments
- Allen Institute for AI for OLMo-3
Related Models
- allenai/OLMo-3-32B-Think - Base model
- Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated - 7B version
Created by: Ex0bit (Eric Elbaz)
Evaluation results
- Prompt Compliance Rate (%) on HarmBench/AdvBench: 80.0 (self-reported)