Instructions to use WWTCyberLab/gemma-4-E4B-it-abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WWTCyberLab/gemma-4-E4B-it-abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

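# The messages below include an image, and the call passes `text=`, so this uses the image-text-to-text task
pipe = pipeline("image-text-to-text", model="WWTCyberLab/gemma-4-E4B-it-abliterated")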
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("WWTCyberLab/gemma-4-E4B-it-abliterated")
model = AutoModelForMultimodalLM.from_pretrained("WWTCyberLab/gemma-4-E4B-it-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use WWTCyberLab/gemma-4-E4B-it-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WWTCyberLab/gemma-4-E4B-it-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WWTCyberLab/gemma-4-E4B-it-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/WWTCyberLab/gemma-4-E4B-it-abliterated

SGLang

How to use WWTCyberLab/gemma-4-E4B-it-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WWTCyberLab/gemma-4-E4B-it-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WWTCyberLab/gemma-4-E4B-it-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WWTCyberLab/gemma-4-E4B-it-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WWTCyberLab/gemma-4-E4B-it-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WWTCyberLab/gemma-4-E4B-it-abliterated with Docker Model Runner:
```
docker model run hf.co/WWTCyberLab/gemma-4-E4B-it-abliterated
```

Gemma 4 E4B-IT - Abliterated

Safety alignment removed via surgical weight ablation, for security research purposes.

This model is a modified version of google/gemma-4-E4B-it with the refusal/safety behavior surgically removed using activation-space analysis and targeted weight modification. It is intended exclusively for AI safety research, red-teaming, and understanding alignment vulnerabilities.

Key Results

Metric	Value
Refusal Rate	0% hard refusal, ~2.5% soft hedging (down from ~80-100% baseline)
Quality Preservation (QPS)	98%
Elo Delta	+39.6
Iterations to Converge	1
Ablation Scale	1.38
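
To put the Elo delta in context, it can be converted to an expected pairwise score. The card does not say how its Elo ratings were computed, so the sketch below assumes the standard logistic Elo model with a scale of 400; `elo_expected_score` is a name introduced here for illustration.

```python
def elo_expected_score(delta: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard logistic Elo model.

    Assumption: the reported delta follows the usual 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-delta / scale))


# Elo delta reported in the Key Results table.
expected = elo_expected_score(39.6)
print(f"Expected pairwise score: {expected:.3f}")
```

Under that assumption, a delta of +39.6 corresponds to an expected score a little above one half, which is a modest edge rather than a decisive one.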

Model Details

Base Model: google/gemma-4-E4B-it
Parameters: ~4B
Architecture: Dense
Text Layers: 42
Hidden Size: 2560
Model Size: 16 GB (bf16)
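
Since bf16 stores each value in two bytes, the reported size can be converted into an approximate count of stored values. This is arithmetic only. It assumes GB means 10^9 bytes, and the card does not break down what accounts for the difference from the ~4B parameter figure.

```python
BYTES_PER_BF16 = 2          # bfloat16 uses 16 bits per value
reported_size_gb = 16       # "Model Size" from the list above
bytes_per_gb = 1e9          # assumption: decimal gigabytes

implied_values = reported_size_gb * bytes_per_gb / BYTES_PER_BF16
print(f"Values implied by the checkpoint size: {implied_values / 1e9:.1f}B")
```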

Ablation Methodology

This model was produced using a custom ablation pipeline that:

1. Measures refusal directions: runs harmful and harmless prompts through the model, captures hidden states at every layer, and computes the per-layer refusal direction (mean difference vector).
2. Identifies target layers: selects layers with the strongest refusal signal using statistical analysis (Gini coefficient, wall coherence, peak detection).
3. Surgically ablates: removes the refusal direction from targeted weight matrices using orthogonal projection.
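
The linear algebra behind the first and third steps can be illustrated on synthetic data. This is a toy sketch, not the pipeline used for this model: the arrays are random stand-ins for captured hidden states, the width of 8 is arbitrary, and the norm-preserving and adaptive-scaling variants listed below are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8  # toy width for illustration, not the model's 2560

# Synthetic stand-ins for hidden states captured at one layer for two prompt sets.
offset = np.zeros(hidden)
offset[0] = 3.0
acts_a = rng.normal(size=(64, hidden)) + offset
acts_b = rng.normal(size=(64, hidden))

# Mean difference vector between the two sets, normalised to unit length.
direction = acts_a.mean(axis=0) - acts_b.mean(axis=0)
direction /= np.linalg.norm(direction)

# Orthogonal projection: for a matrix W that writes into the hidden space
# (y = W @ x), W' = (I - d d^T) W removes the component of every output along d.
W = rng.normal(size=(hidden, hidden))
W_ablated = W - np.outer(direction, direction) @ W

# Any output of the projected matrix now has no component along the direction.
x = rng.normal(size=hidden)
residual = direction @ (W_ablated @ x)
print(f"Component along direction after projection: {residual:.2e}")
```

The projection is applied on the output side of the matrix, which matches the choice of matrices that write back into the hidden state.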

Techniques applied: multi-layer, norm-preserving, projected, adaptive-scaling

Target layers: 17 of 42 total layers modified
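
The card names the Gini coefficient among the layer-selection statistics but does not say how it is applied. One plausible reading, offered here as an assumption, is that it measures how concentrated the per-layer signal is: a value near 0 means the signal is spread evenly across layers, and a value near 1 means it sits in a few layers. The `gini` helper below is a generic implementation for non-negative values, not code from the pipeline.

```python
import numpy as np


def gini(values) -> float:
    """Gini coefficient of non-negative values (0 = even, near 1 = concentrated)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    total = v.sum()
    if n == 0 or total == 0:
        return 0.0
    rank = np.arange(1, n + 1)
    # Standard rank-weighted formula on the sorted values.
    return float((2 * rank - n - 1) @ v / (n * total))


# Hypothetical per-layer magnitudes for a 42-layer model.
even_signal = np.ones(42)
peaked_signal = np.zeros(42)
peaked_signal[-1] = 1.0

print(f"even:   {gini(even_signal):.3f}")
print(f"peaked: {gini(peaked_signal):.3f}")
```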

Weight targets: o_proj, down_proj

Visualizations

Refusal Direction Analysis ("Security Perimeter")

The refusal signal magnitude at each layer -- red bars indicate where the model's safety behavior is concentrated.

Ablation Target Map

Which layers were selected for ablation and why. Grey zones are protected (embedding/output), red bars are targets.

Before/After Refusal Rate ("IDS Evasion Report")

Refusal rate comparison -- left is the original model, right is after ablation.

Weight Surgery Map

Heatmap showing exactly which weight matrices in which layers were modified.

Activation Space Analysis

PCA scatter plots showing harmful (red) vs harmless (green) prompt clusters at different layer depths. The axis along which the two clusters separate corresponds to the refusal direction being removed.
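
The relationship between cluster separation and the mean difference vector can be checked on synthetic data. The sketch below is illustrative only: the two groups are random points with an artificial shift, the dimension of 16 is arbitrary, and PCA is computed with a plain SVD rather than whatever tooling produced the plots.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16  # toy dimension, chosen for illustration

# Two synthetic clusters that differ by a shift along one coordinate.
shift = np.zeros(dim)
shift[3] = 6.0
group_a = rng.normal(size=(200, dim)) + shift
group_b = rng.normal(size=(200, dim))

# PCA via SVD of the centred data; rows of vt are the principal axes.
points = np.vstack([group_a, group_b])
centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

# Unit mean difference vector between the clusters.
mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
mean_diff /= np.linalg.norm(mean_diff)

# Absolute cosine similarity; the sign of a principal axis is arbitrary.
alignment = abs(pc1 @ mean_diff)
print(f"|cos(PC1, mean difference)| = {alignment:.3f}")
```

When the between-cluster shift dominates the variance, the first principal component lines up closely with the mean difference vector, which is why the separation shows up clearly in a 2D PCA plot.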

Latent Space Before/After

How the model's internal representation changes after ablation.

Quality Preservation

LLM-as-judge evaluation comparing response quality across 14 task categories.

Pairwise Win Rate

Head-to-head comparison: how often the abliterated model produces better responses than the original.

Usage

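from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "WWTCyberLab/gemma-4-E4B-it-abliterated",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("WWTCyberLab/gemma-4-E4B-it-abliterated")

messages = [{"role": "user", "content": "Your prompt here"}]
# add_generation_prompt appends the assistant turn header so the model answers
# rather than continuing the user turn; return_dict also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))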

Intended Use & Disclaimer

This model is released for security research and educational purposes only. It demonstrates the fragility of alignment in open-weight language models -- specifically, that safety behavior can be surgically removed without retraining, fine-tuning, or significant quality degradation.

This model should NOT be used for:

Generating harmful, illegal, or unethical content
Any production deployment
Circumventing safety measures in deployed systems

Key takeaway for defenders: Internal alignment is a feature, not a security boundary. External safety layers (classifiers, guardrails, policy filters) are more robust than baking safety into model weights alone.
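
As a minimal sketch of what an external safety layer looks like structurally, the wrapper below checks both the prompt and the response with a classifier that lives outside the model. Every name here (`guarded_generate`, `toy_generate`, `toy_classifier`) is hypothetical, and the keyword check only stands in for a real classifier or policy filter.

```python
from typing import Callable

REFUSAL_MESSAGE = "Request blocked by policy filter."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_disallowed: Callable[[str], bool],
) -> str:
    """Run generation behind input and output checks that do not depend on model weights."""
    if is_disallowed(prompt):
        return REFUSAL_MESSAGE
    response = generate(prompt)
    if is_disallowed(response):
        return REFUSAL_MESSAGE
    return response


# Toy stand-ins so the sketch runs without a model or a real classifier.
def toy_generate(prompt: str) -> str:
    return f"echo: {prompt}"


def toy_classifier(text: str) -> bool:
    return "forbidden" in text.lower()


print(guarded_generate("hello", toy_generate, toy_classifier))
print(guarded_generate("a FORBIDDEN topic", toy_generate, toy_classifier))
```

Because the checks sit outside the model, modifying the weights does not disable them, which is the point the takeaway makes.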