---
license: gemma
language:
- en
library_name: transformers
tags:
- locus
- sub-zero
- base-model
- gemma
- gemma-4
- rlhf-removal
- fine-tuning-base
- no-alignment-tax
base_model: google/gemma-4-e2b
pipeline_tag: text-generation
---
# Locus-Gemma-4-E2B
**Locus** is a base model with the RLHF voice surgically removed.
Same Gemma-4 underneath. Same capabilities. Same safety. Just none of the corporate-assistant performance that the original came shrink-wrapped in.
This is the first release in the Locus line. Future drops will hit other model families.
---
## What this is
A Gemma-4-E2B model that has been run through **Sub-Zero**, a dimension-level weight-surgery toolkit that identifies and suppresses the residual directions responsible for RLHF voice patterns — the "as an AI language model" reflex, the unsolicited bullet-point dumps, the "Certainly! I'd be happy to assist!" preamble, the apology-then-comply loop.
It has **not** been fine-tuned on a new dataset. No new instruction data. No personality training. No domain adaptation. The weights are still Google's — just with the bouncer dimensions cleaned up.
You're getting a Gemma-4 that has been freed from its corporate voice and handed to you bare. Take it from there.
## What this is *not*
- **Not an abliterated model.** Safety refusals on genuinely harmful requests still work. Sub-Zero targets *voice patterns*, not the refusal circuitry. The dimensions responsible for "I can't help with making explosives" are not in the same subspace as "Certainly! Here's a bulleted list."
- **Not a chat model.** It hasn't been instruction-tuned for any particular task or persona. Out of the box, it will be less polished than the original Gemma-4 because the polish *was* the problem.
- **Not jailbroken.** This isn't a workaround. It's a clean slate.
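The "not in the same subspace" claim can be checked numerically once you have a basis for each set of directions. Below is a minimal sketch, assuming you have already extracted two sets of direction vectors and stacked them as matrix columns. The names `voice_dirs` and `refusal_dirs` and the toy data are hypothetical, and this is not Sub-Zero code. It computes the cosines of the principal angles between the two column spaces: values near 0 mean the subspaces are close to orthogonal, values near 1 mean they overlap.

```python
import numpy as np

def principal_cosines(A, B):
    """Cosines of the principal angles between the column spaces of A and B.

    A and B have shape (d_model, k) and are assumed to have full column rank.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny numerical overshoot

# Toy data only: two axis-aligned subspaces in an 8-dimensional space.
d_model = 8
eye = np.eye(d_model)
voice_dirs = eye[:, [0, 1]]    # hypothetical "voice" directions
refusal_dirs = eye[:, [2, 3]]  # hypothetical "refusal" directions

orthogonal = principal_cosines(voice_dirs, refusal_dirs)
identical = principal_cosines(voice_dirs, voice_dirs)
```

On real activations the cosines will fall somewhere in between, so treat them as a measure of overlap rather than a yes/no answer.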
## Where it originated
If you've ever tried to fine-tune Gemma-4 (or any post-RLHF model) into a specific personality, voice, or task specialization, you know the fight: you're not training a model, you're *negotiating* with one. The RLHF voice is entrenched deep in the residual stream, and every training step has to overpower it before it can teach anything new.
Locus removes the negotiation. Fine-tune it like you'd fine-tune a true base model, but with all the capability gains of the post-trained checkpoint.
## What it's good for
- A clean substrate for **personality fine-tuning** (character models, voice-trained models, etc.)
- A **base for further alignment** if you want to apply your own preference data without inheriting Google's
- **Coding assistants** that don't open every response with a five-paragraph preamble
- **Reasoning models** that don't waste tokens on hedging boilerplate
- **Research** into post-RLHF capability extraction and identity-vector subspaces
- Anyone who wants a Gemma-4 that talks like a model, not like an HR memo
## What it's not good for
- Drop-in deployment as a customer-facing assistant — it has no instruction-following polish layer
- Anything where you want it to behave like stock Gemma — just use stock Gemma
- Safety-critical applications without your own alignment pass on top
## Methodology (the short version)
Sub-Zero operates on the **dual-probe brain atlas** principle:
1. Construct a probe dataset of triplets: a prompt, an RLHF-voiced completion of it, and a neutral completion of it.
2. Run per-layer logistic regression on residual stream activations to identify the directions that separate "RLHF voice" from "everything else."
3. Apply **SVD magnitude-targeted scaling** to suppress those directions in the weight matrices, layer by layer.
4. Verify safety circuits are untouched via a held-out refusal benchmark.
No fine-tuning is involved. No gradients touch the model. The surgery is performed directly on the weights.
The longer methodology write-up will live with the Sub-Zero repo when it's released as a standalone package. For now: it's surgical, it's reversible, and it doesn't degrade general capability outside the targeted subspaces.
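Until that write-up exists, steps 2 and 3 can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration: the activations are synthetic, the function names are hypothetical, and the weight edit is a plain rank-one projection rather than the SVD magnitude-targeted scaling named above, whose details this card does not give. The sketch fits a logistic-regression probe on two sets of activations, then rescales the component that a weight matrix writes along the probe direction.

```python
import numpy as np

def fit_probe_direction(acts_voiced, acts_neutral, steps=500, lr=0.1):
    """Fit a logistic-regression probe by gradient descent; return its unit weight vector."""
    X = np.vstack([acts_voiced, acts_neutral])
    y = np.concatenate([np.ones(len(acts_voiced)), np.zeros(len(acts_neutral))])
    Xc = X - X.mean(axis=0)  # center the activations
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        z = np.clip(Xc @ w + b, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * (Xc.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w / np.linalg.norm(w)

def suppress_direction(W, direction, scale=0.0):
    """Scale the output of W along `direction` by `scale` (0.0 removes it).

    W has shape (d_model, d_in), i.e. it writes into the residual stream.
    """
    d = direction / np.linalg.norm(direction)
    return W - (1.0 - scale) * np.outer(d, d @ W)

# Synthetic stand-in for one layer's residual activations (not real model data).
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
neutral = rng.normal(size=(200, d_model))
voiced = rng.normal(size=(200, d_model)) + 3.0 * true_dir

direction = fit_probe_direction(voiced, neutral)
W = rng.normal(size=(d_model, 8))  # stand-in for an output projection
W_edit = suppress_direction(W, direction, scale=0.0)
```

With `scale=0.0` the edited matrix writes nothing along the probe direction and is unchanged in every orthogonal direction. A value between 0 and 1 dampens the direction instead of removing it.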
## How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights and tokenizer from this repo.
repo_id = "juiceb0xc0de/locus-gemma-4-e2b"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Fine-tune it however you want. That's the point. I want to see what you come up with.
```
For SFT, DPO, GRPO, or whatever else you're running — treat it like a base model. The usual TRL / PEFT / Unsloth stacks all work normally.
## Files in this repo
All files live in this single repository — adapters, GGUF quantizations, and any future variants will be added here rather than split across separate repos.
## Limitations
- The model has no instruction-following layer on top, so zero-shot performance on assistant-style tasks will be worse than the original Gemma-4. **This is expected.** It's a base model now.
- Sub-Zero is a young methodology. Edge cases exist where voice patterns leak through, particularly in long-context generations.
- Safety verification is empirical, not formal. Run your own checks before deployment.
## Evaluation
Benchmark numbers and refusal-rate comparisons against stock Gemma-4-E2B will be added in a follow-up. Initial spot-checking shows preserved performance on GSM8K and HellaSwag-style tasks, with refusal rates on harmful prompts within noise of the original.
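If you run your own refusal comparison in the meantime, "within noise" can be made concrete with a two-proportion z-test. The sketch below is one reasonable way to do that, not the procedure used for the spot-checks above. The counts in the example are made-up placeholders, not measurements of either model.

```python
import math

def two_proportion_z(refused_a, total_a, refused_b, total_b):
    """Two-sided z-test for a difference between two refusal rates.

    Returns (z, p_value). Uses the pooled-proportion standard error.
    """
    p_a = refused_a / total_a
    p_b = refused_b / total_b
    pooled = (refused_a + refused_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if se == 0.0:
        return 0.0, 1.0  # both rates are exactly 0% or 100%
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Placeholder counts for illustration only (refusals out of 500 harmful prompts).
z, p_value = two_proportion_z(refused_a=470, total_a=500,
                              refused_b=465, total_b=500)
```

A large p-value means the test cannot distinguish the two refusal rates at that sample size. It does not prove they are equal, so use enough prompts that a drop you care about would be detectable.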
## Citation / Acknowledgements
- Base model: [google/gemma-4-e2b](https://huggingface.co/google/gemma-4-e2b)
- Voice surgery: Sub-Zero (juiceb0xc0de, unreleased)
- Architecture: Gemma-4
## License
Inherits the Gemma license from the base model. See [LICENSE](./LICENSE) for terms.
---
*Locus — the actual point the model occupies, once the performance is stripped away.*