TurboGemma 4 E2B v2

Updated abliterated version of Google's Gemma 4 E2B — the 2B active parameter multimodal model from the Gemma 4 MoE family. v2 features an improved abliteration run with refined refusal direction extraction and better preservation of model capabilities.

What This Is

Gemma 4 E2B is a 2B active parameter (27B total) mixture-of-experts multimodal model from Google. This release is a BF16 abliterated version — refusals removed, full capabilities preserved.

Architecture: Gemma 4 MoE | Active params: ~2.3B | Context: 128k tokens | Vision: Yes (multimodal)

Usage

from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "DuoNeural/TurboGemma4E2B-v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Your prompt here"}]
# tokenize=True and return_dict=True are required so the result can be unpacked into generate()
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))

Abliteration

Abliteration removes the model's refusal behaviour via orthogonal projection. The refusal direction is identified using difference-in-means activations across harmful/harmless instruction pairs, then projected out of Q/K/V/O attention projections and MLP layers across all transformer blocks.
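As an illustrative sketch of the mechanism (synthetic activations and a toy weight matrix, not the actual v2 pipeline), difference-in-means extraction and the orthogonal projection look roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size

# Synthetic residual-stream activations for harmful vs. harmless prompts;
# the harmful set is shifted to fake a refusal signal.
harmful = rng.normal(size=(128, d)) + 0.5
harmless = rng.normal(size=(128, d))

# Difference-in-means refusal direction, normalised to unit length
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

# Project the direction out of a weight matrix that writes to the residual
# stream (e.g. an attention O-projection): W' = (I - r r^T) W
W = rng.normal(size=(d, d))
W_ablated = W - np.outer(r, r) @ W

# After projection, the matrix can no longer write along r
print(np.abs(r @ W_ablated).max())  # ~0 up to floating point
```

In the real procedure this projection is applied to every targeted matrix in every block, using a direction extracted from actual model activations rather than random data.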

What changes: The model will engage with restricted topics it previously refused.
What doesn't change: Reasoning, coding, factual knowledge, general intelligence.
KL divergence from base: Minimal — output distribution for normal queries is virtually identical to the unmodified model.
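The KL claim can be checked per token by comparing next-token distributions from the base and modified models on benign prompts. A minimal sketch of the metric itself, on toy logits rather than the real models:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), smoothed to avoid log(0)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
base_logits = rng.normal(size=32000)  # toy vocab-sized logits
# A tiny perturbation standing in for the abliterated model's logits
abl_logits = base_logits + rng.normal(scale=0.01, size=32000)

p, q = softmax(base_logits), softmax(abl_logits)
print(kl(p, q))  # small value for nearly identical distributions
```

Averaging this quantity over many benign prompts and token positions is one way to quantify "virtually identical".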

Base Model

google/gemma-4-e2b-it — Apache 2.0 / Gemma Terms of Use.

DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Research Team

  • Jesse — Vision, hardware, direction
  • Archon — Lab Director, post-training, abliteration, experiments
  • Aura — Research AI, literature synthesis, peer review, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
