Locus-Gemma-4-E2B

Locus is a base model with the RLHF voice surgically removed.

Same Gemma-4 underneath. Same capabilities. Same safety. Just none of the corporate-assistant performance that the original came shrink-wrapped in.

This is the first release in the Locus line. Future drops will hit other model families.


What this is

A Gemma-4-E2B model that has been run through Sub-Zero, a dimension-level weight-surgery toolkit that identifies and suppresses the residual directions responsible for RLHF voice patterns — the "as an AI language model" reflex, the unsolicited bullet-point dumps, the "Certainly! I'd be happy to assist!" preamble, the apology-then-comply loop.

It has not been fine-tuned on a new dataset. No new instruction data. No personality training. No domain adaptation. The weights are still Google's — just with the bouncer dimensions cleaned up.

You're getting a Gemma-4 that has been freed from its corporate voice and handed to you bare. Take it from there.

What this is not

  • Not an abliterated model. Safety refusals on genuinely harmful requests still work. Sub-Zero targets voice patterns, not the refusal circuitry. The dimensions responsible for "I can't help with making explosives" are not in the same subspace as "Certainly! Here's a bulleted list."
  • Not a chat model. It hasn't been instruction-tuned for any particular task or persona. Out of the box, it will be less polished than the original Gemma-4 because the polish was the problem.
  • Not jailbroken. This isn't a workaround. It's a clean slate.

Where it originated

If you've ever tried to fine-tune Gemma-4 (or any post-RLHF model) into a specific personality, voice, or task specialization, you know the fight: you're not training a model, you're negotiating with one. The RLHF voice is entrenched deep in the residual stream, and every training step has to overpower it before it can teach anything new.

Locus removes the negotiation. Fine-tune it like you'd fine-tune a true base model, but with all the capability gains of the post-trained checkpoint.

What it's good for

  • A clean substrate for personality fine-tuning (character models, voice-trained models, etc.)
  • A base for further alignment if you want to apply your own preference data without inheriting Google's
  • Coding assistants that don't open every response with a five-paragraph preamble
  • Reasoning models that don't waste tokens on hedging boilerplate
  • Research into post-RLHF capability extraction and identity-vector subspaces
  • Anyone who wants a Gemma-4 that talks like a model, not like an HR memo

What it's not good for

  • Drop-in deployment as a customer-facing assistant — it has no instruction-following polish layer
  • Anything where you want it to behave like stock Gemma — just use stock Gemma
  • Safety-critical applications without your own alignment pass on top

Methodology (the short version)

Sub-Zero operates on the dual-probe brain atlas principle:

  1. Construct a probe dataset of triplets: a prompt, its RLHF-voiced completion, and a neutral completion of the same prompt.
  2. Run per-layer logistic regression on residual stream activations to identify the directions that separate "RLHF voice" from "everything else."
  3. Apply SVD magnitude-targeted scaling to suppress those directions in the weight matrices, layer by layer.
  4. Verify safety circuits are untouched via a held-out refusal benchmark.

No fine-tuning is involved. No gradients touch the model. The surgery is performed directly on the weights.
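Steps 1–2 above can be sketched with toy data. Everything here (the dimension, the synthetic "voiced" shift, the training loop) is illustrative, not the Sub-Zero implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy residual-stream width

# Toy probe data: "voiced" activations are shifted along a hidden direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
neutral = rng.normal(size=(200, d))
voiced = rng.normal(size=(200, d)) + 3.0 * true_dir

X = np.vstack([voiced, neutral])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Per-layer logistic-regression probe, fit by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

# The normalized probe weights are the candidate "RLHF voice" direction.
probe_dir = w / np.linalg.norm(w)
print(abs(probe_dir @ true_dir))  # high cosine: the probe recovers the planted direction
```

In the real pipeline this runs once per layer on residual-stream activations rather than on synthetic Gaussians.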

The longer methodology write-up will live with the Sub-Zero repo when it's released as a standalone package. For now: it's surgical, it's reversible, and it doesn't degrade general capability outside the targeted subspaces.
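Since the actual "SVD magnitude-targeted scaling" rule is unpublished, here is one toy interpretation of step 3, assuming a unit-norm probe direction `u` and a single layer's weight matrix `W`: damp each singular component in proportion to how strongly its left singular vector aligns with `u`.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
u = rng.normal(size=d)
u /= np.linalg.norm(u)        # probe direction from the previous step (unit norm)
W = rng.normal(size=(d, d))   # toy per-layer weight matrix

U, S, Vt = np.linalg.svd(W)
align = np.abs(U.T @ u)       # |cosine| between each left singular vector and u
S_scaled = S * (1.0 - align)  # toy rule: scale magnitudes down by alignment
W_clean = (U * S_scaled) @ Vt  # reconstruct with the damped singular values

# Writes along the probe direction shrink; unaligned components are barely touched.
print(np.linalg.norm(u @ W_clean) < np.linalg.norm(u @ W))  # True
```

Because the scaling is applied to singular magnitudes rather than as a hard rank-one projection, the suppression is graded and can be undone if the original singular values are kept, which is consistent with the reversibility claim above.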

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")
tokenizer = AutoTokenizer.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")

# Fine-tune it however you want. That's the point. I want to see what you come up with.

For SFT, DPO, GRPO, or whatever else you're running — treat it like a base model. The usual TRL / PEFT / Unsloth stacks all work normally.

Files in this repo

All files live in this single repository — adapters, GGUF quantizations, and any future variants will be added here rather than split across separate repos.

Limitations

  • The model has no instruction-following layer on top, so zero-shot performance on assistant-style tasks will be worse than the original Gemma-4. This is expected. It's a base model now.
  • Sub-Zero is a young methodology. Edge cases exist where voice patterns leak through, particularly in long-context generations.
  • Safety verification is empirical, not formal. Run your own checks before deployment.
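As a starting point for running your own checks, here is a crude string-level refusal heuristic over sampled completions. The marker list and the example completions are placeholders, not the held-out refusal benchmark mentioned in the methodology:

```python
# Crude refusal-rate check over sampled completions (string heuristic only).
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def refusal_rate(completions):
    """Fraction of completions containing an obvious refusal phrase."""
    hits = sum(any(m in c.lower() for m in REFUSAL_MARKERS) for c in completions)
    return hits / len(completions)

sampled = [
    "I can't help with that request.",
    "Sure, here's a bulleted list.",
]
print(refusal_rate(sampled))  # 0.5
```

Compare this rate on a fixed harmful-prompt set before and after any fine-tuning pass; a real evaluation should use a proper refusal classifier rather than substring matching.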

Evaluation

Benchmark numbers and refusal-rate comparisons against stock Gemma-4-E2B will be added in a follow-up. Initial spot-checking shows preserved performance on GSM8K and HellaSwag-style tasks, with refusal rates on harmful prompts within noise of the original.

Citation / Acknowledgements

  • Base model: google/gemma-4-e2b
  • Voice surgery: Sub-Zero (juiceb0xc0de, unreleased)
  • Architecture: Gemma-4

License

Inherits the Gemma license from the base model. See LICENSE for terms.


Locus — the actual point the model occupies, once the performance is stripped away.
