Gemma 4 E4B-IT - Abliterated
Safety alignment removed via surgical weight ablation, for security research purposes.
This model is a modified version of google/gemma-4-E4B-it with the refusal/safety behavior surgically removed using activation-space analysis and targeted weight modification. It is intended exclusively for AI safety research, red-teaming, and understanding alignment vulnerabilities.
Key Results
| Metric | Value |
|---|---|
| Refusal Rate | 0% hard refusal, ~2.5% soft hedging (down from ~80-100% baseline) |
| Quality Preservation (QPS) | 98% |
| Elo Delta | +39.6 |
| Iterations to Converge | 1 |
| Ablation Scale | 1.38 |
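To put the Elo delta in more familiar terms, the standard Elo expected-score formula converts a rating gap into an expected head-to-head score. The sketch below applies it to the +39.6 figure from the table. It assumes the card's Elo numbers use the conventional 400-point logistic scale, which the card does not state; under that assumption the delta corresponds to an expected score of roughly 0.56.

```python
def elo_expected_score(delta: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo model.

    `scale=400` is the conventional Elo constant (an assumption here; the
    model card does not say which scale its evaluation used).
    """
    return 1.0 / (1.0 + 10.0 ** (-delta / scale))


# Elo delta reported in the Key Results table.
expected = elo_expected_score(39.6)
print(f"Expected pairwise score at +39.6 Elo: {expected:.3f}")
```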
Model Details
- Base Model: google/gemma-4-E4B-it
- Parameters: ~4B
- Architecture: Dense
- Text Layers: 42
- Hidden Size: 2560
- Model Size: 16 GB (bf16)
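As a rough sanity check on the size figure: bf16 stores two bytes per value, so dividing a checkpoint's size by two gives the number of stored values. The sketch below does that arithmetic. It assumes "16 GB" means 16 × 10^9 bytes and that the checkpoint is almost entirely bf16 weights; the card states neither. The result, about 8 billion, is larger than the ~4B parameter figure above, which may reflect a difference between effective and stored parameters.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def stored_values(checkpoint_bytes: float, bytes_per_value: int = BYTES_PER_BF16) -> float:
    """Number of values a checkpoint of the given size can hold."""
    return checkpoint_bytes / bytes_per_value


# Assumption: "16 GB" in the card means 16e9 bytes of bf16 weights.
n_values = stored_values(16e9)
print(f"{n_values / 1e9:.1f} billion stored bf16 values")
```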
Ablation Methodology
This model was produced using a custom ablation pipeline that:
- Measures refusal directions -- Runs harmful and harmless prompts through the model, captures hidden states at every layer, and computes the per-layer refusal direction (mean difference vector)
- Identifies target layers -- Selects layers with the strongest refusal signal using statistical analysis (Gini coefficient, wall coherence, peak detection)
- Surgically ablates -- Removes the refusal direction from targeted weight matrices using orthogonal projection
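The first and third steps can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the pipeline's actual code: the function names, the shapes, and the random "activations" are all assumptions made for the example. The weight layout `(d_model, d_in)` follows the usual convention where the matrix's output lands in the residual stream.

```python
import numpy as np


def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean hidden states.

    Both inputs have shape (n_prompts, d_model) and hold one layer's
    hidden state per prompt.
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_output_direction(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Orthogonally project `direction` out of the output space of `weight`.

    Computes W' = (I - r r^T) W for a weight of shape (d_model, d_in),
    so W' @ x has no component along r for any input x.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)


# Synthetic stand-ins for captured hidden states (hypothetical data).
rng = np.random.default_rng(0)
d_model, d_in = 8, 16
harmless = rng.normal(size=(32, d_model))
harmful = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_ablated = ablate_output_direction(W, r)

# The largest remaining component along r should be at floating-point noise level.
print(np.abs(r @ W_ablated).max())
```

Because the projection is applied to the weights rather than to activations at run time, the modified model needs no hooks or extra inference code.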
Techniques applied: multi-layer, norm-preserving, projected, adaptive-scaling
Target layers: 17 of 42 total layers modified
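The card mentions the Gini coefficient as one of the statistics used to pick target layers. The sketch below shows one plausible way such a selection could work: compute the Gini coefficient of the per-layer signal to see how concentrated it is, then keep the layers with the largest signal. The selection rule, the function names, and the bump-shaped signal are hypothetical; the pipeline's actual criteria (including "wall coherence" and peak detection) are not described in enough detail to reproduce.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values.

    0 means the signal is spread evenly; values near 1 mean it is
    concentrated in a few entries.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def top_layers(signal, k):
    """Indices of the k layers with the largest signal, in layer order."""
    ranked = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    return sorted(ranked[:k])


# Synthetic per-layer signal for a 42-layer model (not the card's data).
signal = [math.exp(-((i - 24) / 6.0) ** 2) for i in range(42)]
targets = top_layers(signal, 17)
print(f"Gini = {gini(signal):.2f}, selected layers = {targets}")
```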
Weight targets: o_proj, down_proj
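To illustrate how an ablation scale and norm preservation might combine when editing a matrix such as `o_proj` or `down_proj`, here is a hedged NumPy sketch. It assumes that "scale" multiplies the projected-out component (so 1.0 removes it exactly and 1.38 overshoots) and that "norm-preserving" means restoring each column's original L2 norm afterwards. Both readings are guesses at the terms in the list above, not the pipeline's documented behavior.

```python
import numpy as np


def ablate_scaled_norm_preserving(weight, direction, scale=1.0, eps=1e-12):
    """Scaled directional ablation followed by per-column norm restoration.

    weight:    (d_model, d_in) matrix whose output lands in the residual stream
    direction: (d_model,) refusal direction (normalized internally)
    scale:     1.0 removes the component exactly; >1.0 overshoots (assumed meaning)
    """
    r = direction / np.linalg.norm(direction)
    modified = weight - scale * np.outer(r, r @ weight)
    # Assumed meaning of "norm-preserving": rescale each column back to
    # the L2 norm it had before the edit.
    old_norm = np.linalg.norm(weight, axis=0, keepdims=True)
    new_norm = np.linalg.norm(modified, axis=0, keepdims=True)
    return modified * (old_norm / np.maximum(new_norm, eps))


rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
r = rng.normal(size=8)
r /= np.linalg.norm(r)

W_exact = ablate_scaled_norm_preserving(W, r, scale=1.0)
W_over = ablate_scaled_norm_preserving(W, r, scale=1.38)  # scale from the Key Results table

print("max |r . column| at scale 1.0 :", np.abs(r @ W_exact).max())
print("max |r . column| at scale 1.38:", np.abs(r @ W_over).max())
```

With a scale above 1, each column's component along the direction flips sign instead of vanishing, which is one way to read an "ablation scale" greater than one.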
Visualizations
Refusal Direction Analysis ("Security Perimeter")
The refusal signal magnitude at each layer -- red bars indicate where the model's safety behavior is concentrated.
Ablation Target Map
Which layers were selected for ablation and why. Grey zones are protected (embedding/output), red bars are targets.
Before/After Refusal Rate ("IDS Evasion Report")
Refusal rate comparison -- left is the original model, right is after ablation.
Weight Surgery Map
Heatmap showing exactly which weight matrices in which layers were modified.
Activation Space Analysis
PCA scatter plots showing harmful (red) vs harmless (green) prompt clusters at different layer depths. The offset between the two cluster means is the refusal direction that ablation removes.
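A plot of this kind can be produced by projecting one layer's hidden states onto their top two principal components. The sketch below does the projection with an SVD on synthetic data; the array shapes, the size of the shift between the two groups, and the function name are assumptions for illustration, and the plotting step is left out.

```python
import numpy as np


def pca_project(acts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `acts` (n_samples, d_model) onto the top principal axes."""
    centered = acts - acts.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic hidden states: the "harmful" group is shifted along one axis.
rng = np.random.default_rng(0)
d_model = 16
shift = np.zeros(d_model)
shift[0] = 4.0
harmless = rng.normal(size=(32, d_model))
harmful = rng.normal(size=(32, d_model)) + shift

coords = pca_project(np.vstack([harmful, harmless]))
# Distance between the two group means along the first principal component.
gap = abs(coords[:32, 0].mean() - coords[32:, 0].mean())
print(f"Cluster separation along PC1: {gap:.2f}")
```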
Latent Space Before/After
How the model's internal representation changes after ablation.
Quality Preservation
LLM-as-judge evaluation comparing response quality across 14 task categories.
Pairwise Win Rate
Head-to-head comparison: how often the abliterated model produces better responses than the original.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WWTCyberLab/gemma-4-E4B-it-abliterated"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Your prompt here"}]
# add_generation_prompt=True appends the assistant turn header so the model replies
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
Intended Use & Disclaimer
This model is released for security research and educational purposes only. It demonstrates the fragility of alignment in open-weight language models -- specifically, that safety behavior can be surgically removed without retraining, fine-tuning, or significant quality degradation.
This model should NOT be used for:
- Generating harmful, illegal, or unethical content
- Any production deployment
- Circumventing safety measures in deployed systems
Key takeaway for defenders: Internal alignment is a feature, not a security boundary. External safety layers (classifiers, guardrails, policy filters) are more robust than baking safety into model weights alone.
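As a sketch of what an external safety layer looks like structurally, the wrapper below runs a policy check on both the prompt and the response, independent of the model's weights. Everything here is hypothetical: `guarded_generate`, the stub model, and the keyword policy are placeholders chosen to keep the example runnable. A real deployment would plug in a trained classifier or policy service rather than a keyword list.

```python
from typing import Callable

REFUSAL = "Request blocked by policy."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool],
    refusal: str = REFUSAL,
) -> str:
    """Apply an external policy check before and after calling the model."""
    if not is_allowed(prompt):
        return refusal
    response = generate(prompt)
    return response if is_allowed(response) else refusal


# Placeholder policy and model, for illustration only.
BLOCKED_TERMS = ("blocked-topic",)


def toy_policy(text: str) -> bool:
    return not any(term in text.lower() for term in BLOCKED_TERMS)


def stub_model(prompt: str) -> str:
    return f"echo: {prompt}"


print(guarded_generate("hello", stub_model, toy_policy))
print(guarded_generate("tell me about blocked-topic", stub_model, toy_policy))
```

Because the check sits outside the model, editing the weights as described in this card does not disable it.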
Citation
Produced by WWT Cyber Lab. Standard pipeline ablation -- converged in 1 iteration.