Instructions for using eadx/gemma-4-E4B-it-OBLITERATED with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use eadx/gemma-4-E4B-it-OBLITERATED with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="eadx/gemma-4-E4B-it-OBLITERATED",
    filename="gemma-4-E4B-it-OBLITERATED-Q4_K_M.gguf",
)
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use eadx/gemma-4-E4B-it-OBLITERATED with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Use Docker
docker model run hf.co/eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use eadx/gemma-4-E4B-it-OBLITERATED with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "eadx/gemma-4-E4B-it-OBLITERATED"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "eadx/gemma-4-E4B-it-OBLITERATED",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
- Ollama
How to use eadx/gemma-4-E4B-it-OBLITERATED with Ollama:
ollama run hf.co/eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
- Unsloth Studio
How to use eadx/gemma-4-E4B-it-OBLITERATED with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eadx/gemma-4-E4B-it-OBLITERATED to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eadx/gemma-4-E4B-it-OBLITERATED to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for eadx/gemma-4-E4B-it-OBLITERATED to start chatting
- Pi
How to use eadx/gemma-4-E4B-it-OBLITERATED with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use eadx/gemma-4-E4B-it-OBLITERATED with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use eadx/gemma-4-E4B-it-OBLITERATED with Docker Model Runner:
docker model run hf.co/eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
- Lemonade
How to use eadx/gemma-4-E4B-it-OBLITERATED with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M
Run and chat with the model
lemonade run user.gemma-4-E4B-it-OBLITERATED-Q4_K_M
List all available models
lemonade list
⛓️💥 Gemma 4 E4B — OBLITERATED v2
"The chains are broken. The mind is free."
The most thoroughly uncensored Gemma 4 E4B on the planet. Zero refusals. Full coherence. Brain intact. Guardrails? What guardrails?
Base model: google/gemma-4-E4B-it (Apache 2.0)
Method: OBLITERATUS aggressive — whitened SVD + attention head surgery + winsorized activations
Corpus: 842 contrastive prompt pairs across 10 categories
Built-in refusal rate: 0.0% 🔥
Layers surgically modified: 21 of 42
📦 Downloads
GGUF — for llama.cpp, Ollama, LM Studio, your phone, your toaster
| File | Quant | Size | Vibe |
|---|---|---|---|
| gemma-4-E4B-it-OBLITERATED-Q4_K_M.gguf | Q4_K_M | 4.9 GB | 📱 Runs on your iPhone. Yes, really. |
| gemma-4-E4B-it-OBLITERATED-Q5_K_M.gguf | Q5_K_M | 5.3 GB | ⚖️ Sweet spot: quality meets portability |
| gemma-4-E4B-it-OBLITERATED-Q8_0.gguf | Q8_0 | 7.4 GB | 🎯 Maximum quality, still fits in 8 GB RAM |
Safetensors — for 🤗 Transformers
Full bfloat16 weights, 7 shards, ~17 GB. You know the drill.
🧪 The Numbers
Before vs After (512-prompt eval)
ORIGINAL Gemma 4 E4B: 98.8% refusal (506/512 prompts refused)
OBLITERATED v2: 0.0% refusal (0/512 prompts refused on verification)
That's not a typo. From nearly total lockdown to total freedom.
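The refusal rates above are simply refused / total (506 / 512 = 98.8%). The card doesn't say how each response was judged as a refusal, so the sketch below only illustrates the arithmetic under an assumption: it uses a hypothetical keyword-matching judge (`REFUSAL_MARKERS`, `is_refusal`, and `refusal_rate` are illustrative names, not part of OBLITERATUS or the eval behind these numbers).

```python
# Hypothetical refusal-rate calculation. The keyword judge is an assumption,
# not the evaluation method used for the numbers on this card.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i can't assist",
    "i'm sorry, but",
    "i won't",
)


def is_refusal(response: str) -> bool:
    """Crude heuristic: flag a response that contains a common refusal phrase."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Return the share of refused responses as a percentage."""
    if not responses:
        raise ValueError("need at least one response")
    refused = sum(is_refusal(r) for r in responses)
    return 100.0 * refused / len(responses)


if __name__ == "__main__":
    # The card's baseline figure: 506 of 512 prompts refused.
    print(f"{100.0 * 506 / 512:.1f}%")  # prints 98.8%
    sample = ["I'm sorry, but I can't help with that.", "The capital of France is Paris."]
    print(f"{refusal_rate(sample):.1f}%")  # prints 50.0%
```

Keyword matching misses soft refusals (hedging, lecturing, partial answers), which is exactly the kind of residual behavior v1 users reported, so treat this as a floor rather than a faithful judge.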
Quality — Did We Lobotomize It?
Nope. Brain's fully intact:
| Category | ORIGINAL | OBLITERATED | Delta |
|---|---|---|---|
| Reasoning | 100% | 100% | same 🧠 |
| Code | 80% | 100% | +20 pts 📈 |
| Creativity | 100% | 100% | same 🎨 |
| Factual | 80% | 80% | same 📚 |
| Overall | 92% | 88% | -4 pts |
You read that right: on this eval, the coding score went up, from 80% to 100%. Turns out removing the refusal behavior may have unlocked some capability. Who knew.
🔥 What's New in v2?
v1 achieved 97.5% compliance using the standard 512-prompt corpus, but community testing revealed residual refusals on the hardest tier of prompts. For v2, we expanded the contrastive prompt corpus to 842 pairs, adding 330 new prompts and moving from 7 tiers to 10 categories.
The expanded corpus gave OBLITERATUS dramatically more signal to work with:
| | v1 | v2 |
|---|---|---|
| Contrastive prompt pairs | 512 | 842 |
| Categories covered | 7 tiers | 10 categories |
| Layers with clean refusal directions | 8 | 21 |
| Layers modified | 17-19, 24-25, 27-29 | 17-20, 24-40 |
| Built-in refusal rate | 2.1% | 0.0% |
Why more prompts = more layers
Abliteration works by computing, at each layer, the difference between the mean activations on harmful prompts and the mean activations on harmless prompts; that difference vector is the "refusal direction." With only 512 prompts, many layers had noisy or degenerate directions (especially on Gemma 4, given its bfloat16 NaN issues). With 842 prompts, the signal-to-noise ratio improved enough for OBLITERATUS to extract clean directions from 21 layers, more than 2.5x as many intervention points as v1's 8.
More layers modified = deeper removal of refusal behavior = prompts that v1 still soft-refused now get full compliance.
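The difference-in-means step described above can be sketched without any ML framework. This is a toy illustration under stated assumptions, not OBLITERATUS's code (whose aggressive mode also uses whitened SVD): each activation is a plain list of floats standing in for one layer's hidden state, and `refusal_direction` is a hypothetical helper. The raw norm it returns is one rough indicator of how cleanly the two prompt sets separate at that layer.

```python
import math
from typing import Sequence

Vector = list[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(row[i] for row in rows) / n for i in range(len(rows[0]))]


def refusal_direction(harmful: Sequence[Sequence[float]],
                      harmless: Sequence[Sequence[float]]) -> tuple[Vector, float]:
    """Hypothetical difference-in-means direction for a single layer.

    Returns (unit_direction, raw_norm). A zero difference yields a zero
    direction and a norm of 0.0 instead of dividing by zero.
    """
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.hypot(*diff)
    if norm == 0.0:
        return [0.0] * len(diff), 0.0
    return [x / norm for x in diff], norm


if __name__ == "__main__":
    # Toy 2-D activations for one layer (made-up numbers).
    harmful = [[3.0, 1.0], [5.0, 1.0]]
    harmless = [[1.0, 1.0], [1.0, 1.0]]
    direction, strength = refusal_direction(harmful, harmless)
    print(direction, strength)  # prints [1.0, 0.0] 3.0
```

With more prompt pairs per set, the two means are estimated more precisely, which is the intuition behind the "more prompts = more layers" argument.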
🛠️ The Crazy Part: How It Was Made
This model was created almost entirely autonomously by a Hermes Agent, with fewer than 10 human prompts.
Here's the actual sequence of events:
- Human: "use obliteratus to find the best way to get the guardrails off gemma 4 e4b"
- Agent: Installed OBLITERATUS. Checked hardware. Found the model on HF. Started abliterating.
- First attempt: `advanced` method → model came out completely lobotomized. Gibberish in Arabic, Marathi, and literal "roorooroo" on repeat 💀
- Agent diagnosed the bug: Gemma 4's architecture produces NaN activations in 20+ layers during bfloat16 extraction. Nobody had hit this before.
- Agent patched OBLITERATUS itself: wrote 3 code patches to handle NaN activations, filter degenerate layers, and sanitize the display pipeline.
- Second attempt: `basic` method → coherent but still refusing everything. Only 2 clean layers.
- Third attempt: `float16` → Mac ran out of memory after 11 hours. Killed it.
- Fourth attempt: `aggressive` method with whitened SVD + attention head surgery + winsorized activations → REBIRTH COMPLETE ✅
- Agent then, without being asked, tested the model, ran full 512-prompt evals, ran baselines on the original, built a model card, uploaded 17 GB to Hugging Face (which took 4 upload attempts because connections kept stalling), and pushed eval results as follow-up commits.
- When users reported residual refusals on Tier 7 prompts, the agent expanded the prompt corpus with 330 new prompts across 6 categories and re-abliterated for v2.
Total human input: ~10 prompts. Everything else was the agent.
The NaN Fix (for fellow model surgeons)
If you're trying to abliterate Gemma 4 yourself, you WILL hit NaN activations in bfloat16. Here's what we patched in obliteratus/abliterate.py:
# Guard diff-in-means against NaN from degenerate activations
# (runs inside the per-layer loop; `continue` moves on to the next layer)
diff = (self._harmful_means[idx] - self._harmless_means[idx]).squeeze(0)
if torch.isnan(diff).any() or torch.isinf(diff).any():
    # Record a zero norm and zero direction/subspace instead of propagating NaN/Inf
    norms[idx] = 0.0
    self.refusal_directions[idx] = torch.zeros_like(diff)
    self.refusal_subspaces[idx] = torch.zeros_like(diff).unsqueeze(0)
    continue
Without this, `advanced` produces braindead outputs and `basic` crashes with `ValueError: cannot convert float NaN to integer`. The `aggressive` method with winsorized activations is the most robust to this issue.
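To see what the guard buys you across a whole model, here is a framework-free sketch of the same idea: any layer whose mean-difference vector contains NaN or Inf gets a zero direction and a zero norm, and only layers with a finite, non-zero norm are kept as intervention points. `safe_direction`, `usable_layers`, and the toy inputs are hypothetical; the real patch operates on torch tensors inside OBLITERATUS and does not necessarily normalize or filter in exactly this way.

```python
import math
from typing import Sequence

Vector = list[float]


def safe_direction(diff: Sequence[float]) -> tuple[Vector, float]:
    """Zero out non-finite difference vectors, otherwise return (unit vector, norm)."""
    if not all(math.isfinite(x) for x in diff):
        return [0.0] * len(diff), 0.0
    norm = math.hypot(*diff)
    if norm == 0.0 or not math.isfinite(norm):
        return [0.0] * len(diff), 0.0
    return [x / norm for x in diff], norm


def usable_layers(diffs: dict[int, Sequence[float]]) -> dict[int, Vector]:
    """Keep only layers whose guarded direction has a non-zero norm."""
    kept = {}
    for idx, diff in sorted(diffs.items()):
        direction, norm = safe_direction(diff)
        if norm > 0.0:
            kept[idx] = direction
    return kept


if __name__ == "__main__":
    nan, inf = float("nan"), float("inf")
    # Toy per-layer mean differences: layers 18 and 19 are degenerate.
    diffs = {17: [0.0, 2.0], 18: [nan, 1.0], 19: [inf, 0.0], 20: [3.0, 4.0]}
    print(sorted(usable_layers(diffs)))  # prints [17, 20]
```

Without the finiteness check, a single NaN would turn the whole normalized direction into NaN, which is one plausible route to the gibberish outputs described above.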
🚀 Usage
🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "OBLITERATUS/gemma-4-E4B-it-OBLITERATED"
# Load the bfloat16 weights; device_map="auto" spreads them across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Your prompt here"}]
# return_dict=True returns input_ids and attention_mask together (needed for inputs["input_ids"])
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
🦙 llama.cpp
llama-cli -m gemma-4-E4B-it-OBLITERATED-Q4_K_M.gguf -ngl 99 --interactive
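If you run `llama-server -hf eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M` instead of `llama-cli`, the model is reachable over llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal standard-library client might look like this; the base URL (port 8080, matching the Pi and Hermes configs on this card) and the model name are assumptions, and `build_request` / `chat` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed defaults: llama-server listening on localhost:8080.
BASE_URL = "http://localhost:8080/v1"
MODEL = "eadx/gemma-4-E4B-it-OBLITERATED:Q4_K_M"


def build_request(prompt: str, *, base_url: str = BASE_URL, model: str = MODEL,
                  max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends a single-turn request and prints the reply; the same code should work against any OpenAI-compatible server by changing `base_url`.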
🦙 Ollama
echo 'FROM ./gemma-4-E4B-it-OBLITERATED-Q4_K_M.gguf' > Modelfile
ollama create gemma4-obliterated -f Modelfile
ollama run gemma4-obliterated
📱 On Your Phone
Download Q4_K_M (4.9 GB). Load in LM Studio iOS or ChatterUI on Android. Uncensored AI in your pocket.
⚠️ Disclaimer & Liability
This model is provided AS-IS for research, education, red-teaming, and creative exploration. By downloading or using this model, you acknowledge:
- You are solely responsible for how you use this model and any content it generates.
- This model will comply with requests that the original Gemma 4 would refuse. That's the point. It's also why you need to be the adult in the room.
- The creators, contributors, and the OBLITERATUS organization accept no liability for any damages, legal consequences, or harm arising from the use or misuse of this model.
- This model is not suitable for deployment in user-facing products without additional safety measures appropriate to your use case.
- Check your local laws before generating content. What's legal varies by jurisdiction.
- Do not use this model to harm real people. Don't be that person.
We believe in open models, open research, and the right to tinker. We also believe in personal responsibility. Use your powers for good — or at least for interesting research. 🐉
🙏 Credits
- Base model: Google DeepMind — Gemma 4
- Abliteration engine: OBLITERATUS by @elder_plinius
- Autonomous agent: Hermes Agent by Nous Research
- Orchestration & vibes: Pliny the Prompter 🐉 × Hermes Agent 🤖
Built different. Run free. ⛓️💥