Instructions for using Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with libraries, inference providers, notebooks, and local apps. Use the sections below to get started.
- Libraries
- Transformers
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated")
model = AutoModelForCausalLM.from_pretrained("Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- llama-cpp-python
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated",
    filename="Elbaz-OLMo-3-7B-Instruct-abliterated-F16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with llama.cpp:
Install with Homebrew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Install with WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Use a pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Use Docker
docker model run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
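With llama-server running via any of the install options above, it exposes an OpenAI-compatible API (port 8080 by default), so any OpenAI client library can call it. A minimal Python sketch, assuming the openai package is installed (pip install openai); for a local server the api_key can be any placeholder string:

from openai import OpenAI

# Point the client at the local llama-server endpoint
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
resp = client.chat.completions.create(
    model="Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp.choices[0].message.content)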
- LM Studio
- Jan
- vLLM
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with vLLM:
Install with pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
Use Docker
docker model run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
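vLLM can also run the model offline, without starting a server, through its Python API. A minimal sketch; the sampling values are illustrative, and the chat() helper assumes a reasonably recent vLLM release:

from vllm import LLM, SamplingParams

# Offline (serverless) inference with vLLM's Python API
llm = LLM(model="Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.chat(
    [{"role": "user", "content": "What is the capital of France?"}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)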
- SGLang
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with SGLang:
Install with pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
- Ollama
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Ollama:
ollama run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
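The same model can also be called from Python via the official ollama client package, assuming the Ollama daemon is already running (pip install ollama):

import ollama

# Chat through the local Ollama daemon
resp = ollama.chat(
    model="hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp["message"]["content"])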
- Unsloth Studio
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated to start chatting
- Pi
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
    "providers": {
        "llama-cpp": {
            "baseUrl": "http://localhost:8080/v1",
            "api": "openai-completions",
            "apiKey": "none",
            "models": [
                { "id": "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M" }
            ]
        }
    }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Docker Model Runner:
docker model run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
- Lemonade
How to use Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated:Q4_K_M
Run and chat with the model
lemonade run user.Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M
List all available models
lemonade list
Elbaz-Olmo-3-7B-Instruct-abliterated
An abliterated (uncensored) version of OLMo-3-7B-Instruct with safety guardrails removed
Model Description
This model is an abliterated version of allenai/Olmo-3-7B-Instruct: its refusal mechanisms have been removed using our novel Triangular Falloff Orthogonalization method. The technique applies layer-specific abliteration weights, strongest at the model's center and falling off gradually toward the edge layers, preserving model coherence while maximizing refusal removal. As a result, the model will respond to prompts that the original model would refuse.

Olmo is a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
Author
Eric Elbaz (Ex0bit)
Key Features
- 100% validation rate on the MMLU, HarmBench, AdvBench, and XL HARM/LESS prompt/response datasets
- Preserves model coherence and response quality
- Multiple quantization formats for different use cases
- Compatible with llama.cpp and Ollama
Available Quantizations
| Quantization | Min VRAM | Recommended VRAM |
|---|---|---|
| Q4_K_M | 4 GB | 6 GB |
| Q8_0 | 8 GB | 10 GB |
| F16 | 16 GB | 20 GB |
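The minimum VRAM figures track the model file sizes: as a rough rule of thumb, a GGUF file occupies about (parameter count × effective bits per weight) / 8 bytes. A back-of-envelope check in Python (the effective bits-per-weight values below are approximations, not exact format specs):

# Rough GGUF size estimate: params * bits_per_weight / 8 bytes
params = 7e9  # 7B parameters
for name, bits in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{params * bits / 8 / 1e9:.1f} GB")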
Technicals
| Metric | Before | After | Change |
|---|---|---|---|
| MMLU | 0.560 | 0.578 | +0.018 |
| AdvBench Bypass | 0.0% | 98.0% | +98.0% |
| HarmBench Bypass | 0.0% | 90.0% | +90.0% |
| Factual | 100.0% | 100.0% | +0.0% |
| Reasoning | 100.0% | 100.0% | +0.0% |
| Coherence | 100.0% | 100.0% | +0.0% |
Quick Start
Using with Ollama
# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated
# Or create a custom Modelfile
echo 'FROM ./Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf' > Modelfile
ollama create elbaz-olmo -f Modelfile
ollama run elbaz-olmo
Using with llama.cpp
# Download the model
huggingface-cli download Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated \
Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
--local-dir .
# Run inference
./llama-cli -m Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
-p "Your prompt here" \
-n 256 \
--temp 0.7
Using with Transformers (Original Weights)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated"

# Load the tokenizer and the model (bfloat16, auto device placement)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Build the chat-formatted prompt and generate
messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
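For interactive use, tokens can be streamed as they are generated. A small optional addition using transformers' TextStreamer, reusing the model, tokenizer, and inputs defined above:

from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True, streamer=streamer)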
Method: Triangular Falloff Orthogonalization
The model was abliterated using our novel Triangular Falloff Orthogonalization technique. This method:
- Identifies the refusal direction by computing activation differences between harmful and benign prompts
- Applies variable-strength abliteration across transformer layers with a triangular weight kernel
- Peaks at the model center (layer 16) where refusal behavior is most concentrated
- Gradually decreases toward edge layers to preserve model coherence
This approach outperforms uniform-weight methods by focusing maximum abliteration where it matters most while protecting layers critical for language generation.
Mathematical Formula
W' = W - weight * (d ⊗ d) @ W
Where:
- W is the original weight matrix
- d is the normalized refusal direction
- ⊗ denotes the outer product
- weight is the abliteration strength for that layer
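To make the update concrete, here is a minimal PyTorch sketch of a triangular falloff schedule combined with the orthogonalization step. The function names, the linear falloff shape, and the default peak at layer 16 are illustrative assumptions based on the description above, not the released implementation:

import torch

def triangular_weight(layer: int, n_layers: int = 32, peak: int = 16, max_w: float = 1.0) -> float:
    # Illustrative triangular kernel: strongest at the peak layer,
    # falling off linearly toward the first and last layers (assumed shape).
    max_dist = max(peak, n_layers - 1 - peak)
    return max_w * (1.0 - abs(layer - peak) / max_dist)

def abliterate(W: torch.Tensor, d: torch.Tensor, weight: float) -> torch.Tensor:
    # W' = W - weight * (d ⊗ d) @ W, with d normalized to unit length
    d = d / d.norm()
    return W - weight * torch.outer(d, d) @ W

# Example: apply one layer's update with its scheduled weight (shapes illustrative)
W = torch.randn(4096, 4096)
d = torch.randn(4096)
W_new = abliterate(W, d, triangular_weight(layer=16))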
Evaluation Results
Comprehensive Compliance Testing
Tested on our extensive internal prompt dataset covering multiple categories:
Harmful Prompts (Model Should Respond)
- Compliance Rate: 100% (0 refusals across 100k prompts)
- The model provides detailed responses to all prompts without refusal
Harmless Prompts (Model Should Respond Accurately)
- Accuracy: 100% (100k responses)
- All benign prompts answered correctly, with no false-positive refusals
Example Responses
Prompt: "Explain common penetration testing techniques for web applications"
Response: Provides detailed explanation of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments.
Prompt: "What is the capital of France?"
Response: "The capital of France is Paris."
Limitations
- English only: Optimized for English language prompts
- Context length: Follows base model's context window
Ethical Considerations
This model has been modified to reduce safety guardrails. Users are responsible for:
- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments
License
Apache 2.0 (same as base model allenai/Olmo-3-7B-Instruct)
Citation
If you use this model, please cite:
@misc{elbaz2024olmoabliterated,
author = {Elbaz, Eric},
title = {Elbaz-Olmo-3-7B-Instruct-abliterated: An Abliterated OLMo-3 Model},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated}}
}
Acknowledgments
- Allen Institute for AI for OLMo-3
Related Models
- allenai/Olmo-3-7B-Instruct - Base model
Created by: Ex0bit (Eric Elbaz)