| license: gemma | |
| language: | |
| - en | |
| library_name: transformers | |
| tags: | |
| - locus | |
| - sub-zero | |
| - base-model | |
| - gemma | |
| - gemma-4 | |
| - rlhf-removal | |
| - fine-tuning-base | |
| - no-alignment-tax | |
| base_model: google/gemma-4-e2b | |
| pipeline_tag: text-generation | |
| # Locus-Gemma-4-E2B | |
| **Locus** is a base model with the RLHF voice surgically removed. | |
| Same Gemma-4 underneath. Same capabilities. Same safety. Just none of the corporate-assistant performance that the original came shrink-wrapped in. | |
| This is the first release in the Locus line. Future drops will hit other model families. | |
| --- | |
| ## What this is | |
| A Gemma-4-E2B model that has been run through **Sub-Zero**, a dimension-level weight-surgery toolkit that identifies and suppresses the residual directions responsible for RLHF voice patterns — the "as an AI language model" reflex, the unsolicited bullet-point dumps, the "Certainly! I'd be happy to assist!" preamble, the apology-then-comply loop. | |
| It has **not** been fine-tuned on a new dataset. No new instruction data. No personality training. No domain adaptation. The weights are still Google's — just with the bouncer dimensions cleaned up. | |
| You're getting a Gemma-4 that has been freed from its corporate voice and handed to you bare. Take it from there. | |
| ## What this is *not* | |
| - **Not an abliterated model.** Safety refusals on genuinely harmful requests still work. Sub-Zero targets *voice patterns*, not the refusal circuitry. The dimensions responsible for "I can't help with making explosives" are not in the same subspace as "Certainly! Here's a bulleted list." | |
| - **Not a chat model.** It hasn't been instruction-tuned for any particular task or persona. Out of the box, it will be less polished than the original Gemma-4 because the polish *was* the problem. | |
| - **Not jailbroken.** This isn't a workaround. It's a clean slate. | |
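The subspace claim in the first bullet above is something you can check numerically once you have a direction for each behavior. Here is a minimal sketch using synthetic vectors: `voice_dir` and `refusal_dir` are made-up stand-ins, not directions extracted from this model, and `hidden` is an arbitrary width. It measures the cosine overlap between the two directions and shows how much an activation's refusal component moves when the voice component is removed.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 256  # hypothetical residual-stream width, chosen for illustration


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


# Synthetic stand-ins for two probe directions (NOT taken from the model).
voice_dir = unit(rng.normal(size=hidden))
refusal_dir = unit(rng.normal(size=hidden))

# Cosine similarity of two unit vectors is just their dot product.
overlap = float(voice_dir @ refusal_dir)

# Remove the voice component from an activation, then compare its
# projection onto the refusal direction before and after.
x = rng.normal(size=hidden)
x_edited = x - (x @ voice_dir) * voice_dir

before = float(x @ refusal_dir)
after = float(x_edited @ refusal_dir)
shift = abs(after - before)  # equals |x . voice_dir| * |overlap|

print(f"overlap={overlap:.3f}  refusal shift={shift:.3f}")
```

The smaller the overlap, the less a voice edit can move the refusal projection. With real probe directions, the same two numbers are a quick sanity check before trusting any "voice only" edit.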
| ## Where it originated | |
| If you've ever tried to fine-tune Gemma-4 (or any post-RLHF model) into a specific personality, voice, or task specialization, you know the fight: you're not training a model, you're *negotiating* with one. The RLHF voice is entrenched deep in the residual stream, and every training step has to overpower it before it can teach anything new. | |
| Locus removes the negotiation. Fine-tune it like you'd fine-tune a true base model, but with all the capability gains of the post-trained checkpoint. | |
| ## What it's good for | |
| - A clean substrate for **personality fine-tuning** (character models, voice-trained models, etc.) | |
| - A **base for further alignment** if you want to apply your own preference data without inheriting Google's | |
| - **Coding assistants** that don't open every response with a five-paragraph preamble | |
| - **Reasoning models** that don't waste tokens on hedging boilerplate | |
| - **Research** into post-RLHF capability extraction and identity-vector subspaces | |
| - Anyone who wants a Gemma-4 that talks like a model, not like an HR memo | |
| ## What it's not good for | |
| - Drop-in deployment as a customer-facing assistant — it has no instruction-following polish layer | |
| - Anything where you want it to behave like stock Gemma — just use stock Gemma | |
| - Safety-critical applications without your own alignment pass on top | |
| ## Methodology (the short version) | |
| Sub-Zero operates on the **dual-probe brain atlas** principle: | |
| 1. Construct a probe dataset of triplets: a prompt, an RLHF-voiced completion of it, and a neutral completion of the same prompt. | |
| 2. Run per-layer logistic regression on residual stream activations to identify the directions that separate "RLHF voice" from "everything else." | |
| 3. Apply **SVD magnitude-targeted scaling** to suppress those directions in the weight matrices, layer by layer. | |
| 4. Verify safety circuits are untouched via a held-out refusal benchmark. | |
| No fine-tuning is involved. No gradients touch the model. The surgery is performed directly on the weights. | |
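To make steps 1 and 2 concrete, here is a rough sketch of a per-layer logistic-regression probe. It is not the Sub-Zero code. The activations are synthetic (Gaussian noise with a planted shift along `true_dir`), and `fit_probe`, the layer count, and the hidden width are all illustrative assumptions. Only the probe is trained with gradients; nothing here touches model weights.

```python
import numpy as np


def fit_probe(acts, labels, lr=0.3, steps=1000, l2=1e-3):
    """Fit a logistic-regression probe with plain gradient descent.

    acts:   (n_samples, hidden) activations for one layer
    labels: (n_samples,) 1.0 for RLHF-voiced, 0.0 for neutral
    Returns the weight vector and bias of the probe.
    """
    n, d = acts.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        err = probs - labels
        w -= lr * (acts.T @ err / n + l2 * w)
        b -= lr * float(err.mean())
    return w, b


rng = np.random.default_rng(0)
n_layers, n_pairs, hidden = 4, 200, 32  # toy sizes, not the model's

# Planted "voice" direction so we can check what the probe recovers.
true_dir = np.zeros(hidden)
true_dir[0] = 1.0
labels = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

voice_dirs, accuracies = [], []
for layer in range(n_layers):
    # Synthetic stand-in for residual-stream activations at this layer.
    acts = rng.normal(size=(2 * n_pairs, hidden))
    acts[:n_pairs] += 3.0 * true_dir  # voiced completions are shifted
    w, b = fit_probe(acts, labels)
    preds = (acts @ w + b) > 0
    accuracies.append(float(np.mean(preds == (labels == 1))))
    voice_dirs.append(w / np.linalg.norm(w))  # unit-norm probe direction
    print(f"layer {layer}: acc={accuracies[-1]:.2f} "
          f"cos(true)={float(voice_dirs[-1] @ true_dir):.2f}")
```

With real data, `acts` would come from hidden states captured on the voiced and neutral completions, and the normalized probe weights per layer would be the candidate directions for step 3.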
| The longer methodology write-up will live with the Sub-Zero repo when it's released as a standalone package. For now: it's surgical, it's reversible, and in initial spot checks it doesn't degrade general capability outside the targeted subspaces. | |
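Until that write-up exists, here is one possible reading of step 3, offered as a guess rather than the actual Sub-Zero procedure. It decomposes a weight matrix with SVD and shrinks each singular component in proportion to how strongly its output side aligns with a probe direction. The names `suppress_direction` and `restore`, the `strength` parameter, and the toy matrix are all assumptions. Every scale factor stays positive when `strength < 1`, so the edit can be undone from the stored factors, which is one way a "reversible" edit could work.

```python
import numpy as np


def suppress_direction(W, direction, strength=0.9):
    """Shrink singular components of W aligned with `direction` (output side).

    W maps inputs to outputs as y = W @ x. Returns the edited matrix plus
    the left singular vectors and scale factors needed to undo the edit.
    """
    d = direction / np.linalg.norm(direction)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    alignment = (U.T @ d) ** 2           # in [0, 1] for each component
    scale = 1.0 - strength * alignment   # > 0 whenever strength < 1
    W_edit = (U * (S * scale)) @ Vt
    return W_edit, U, scale


def restore(W_edit, U, scale):
    """Invert suppress_direction using the stored U and scale factors."""
    return (U / scale) @ (U.T @ W_edit)


rng = np.random.default_rng(0)
hidden = 32  # toy width

d = rng.normal(size=hidden)
d /= np.linalg.norm(d)
v = rng.normal(size=hidden)
v /= np.linalg.norm(v)

# Toy weight matrix: small noise plus a strong component that writes along d.
W = 0.1 * rng.normal(size=(hidden, hidden)) + 5.0 * np.outer(d, v)

W_edit, U, scale = suppress_direction(W, d)
before = float(np.linalg.norm(d @ W))       # how much W writes along d
after = float(np.linalg.norm(d @ W_edit))
print(f"write strength along d: {before:.2f} -> {after:.2f}")
print("restored exactly:", np.allclose(restore(W_edit, U, scale), W))
```

Note that this variant only bites when the direction is concentrated in a few singular components, as it is in the toy matrix. If the direction is spread evenly across the spectrum, every alignment term is small and almost nothing changes.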
| ## How to use | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| repo_id = "juiceb0xc0de/locus-gemma-4-e2b" | |
| # Load the Sub-Zero-edited weights and the matching tokenizer. | |
| model = AutoModelForCausalLM.from_pretrained(repo_id) | |
| tokenizer = AutoTokenizer.from_pretrained(repo_id) | |
| # Fine-tune it however you want. That's the point. I want to see what you come up with. | |
| ``` | |
| For SFT, DPO, GRPO, or whatever else you're running — treat it like a base model. The usual TRL / PEFT / Unsloth stacks all work normally. | |
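As a starting point for the PEFT route, here is a minimal LoRA setup sketch. The hyperparameters are illustrative placeholders, not tuned recommendations. The `target_modules` names are an assumption based on how earlier Gemma checkpoints name their attention projections, so confirm them against `model.named_modules()` before training. `build_lora_model` is a hypothetical helper name.

```python
# Illustrative LoRA hyperparameters (placeholders, not tuned values).
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Assumed attention projection names; verify with model.named_modules().
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}


def build_lora_model(repo_id="juiceb0xc0de/locus-gemma-4-e2b"):
    """Load the checkpoint and wrap it with LoRA adapters.

    Requires `transformers` and `peft`. They are imported lazily so the
    settings above can be inspected without the heavy dependencies.
    """
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(repo_id)
    peft_model = get_peft_model(model, LoraConfig(**LORA_SETTINGS))
    peft_model.print_trainable_parameters()  # sanity check on adapter size
    return peft_model
```

Calling `build_lora_model()` downloads the weights and returns a model you can hand to whichever trainer you're already using.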
| ## Files in this repo | |
| All files live in this single repository — adapters, GGUF quantizations, and any future variants will be added here rather than split across separate repos. | |
| ## Limitations | |
| - The model has no instruction-following layer on top, so zero-shot performance on assistant-style tasks will be worse than the original Gemma-4. **This is expected.** It's a base model now. | |
| - Sub-Zero is a young methodology. Edge cases exist where voice patterns leak through, particularly in long-context generations. | |
| - Safety verification is empirical, not formal. Run your own checks before deployment. | |
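For the voice-leak limitation, a crude regex pass over your own generations is a cheap first check. The sketch below is an assumption-heavy starting point: `VOICE_PATTERNS` is a hypothetical list built from the phrases this card names, and it will miss paraphrases. Treat the result as a smoke test, not a metric.

```python
import re

# Hypothetical marker list based on the phrases this card mentions; extend it.
VOICE_PATTERNS = [
    r"\bas an ai (language )?model\b",
    r"^\s*certainly[!,.]",
    r"\bi['’]d be happy to (help|assist)\b",
    r"\bi apologi[sz]e\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in VOICE_PATTERNS]


def has_voice_leak(text):
    """True if any marker pattern appears anywhere in the text."""
    return any(p.search(text) for p in _COMPILED)


def leak_rate(completions):
    """Fraction of completions containing at least one marker."""
    if not completions:
        return 0.0
    return sum(has_voice_leak(c) for c in completions) / len(completions)


# Hand-written examples, not model output.
samples = [
    "Certainly! I'd be happy to assist!",
    "Use a dict keyed by user id.",
    "As an AI language model, I cannot browse the web.",
    "The loop exits when i reaches n.",
]
print(leak_rate(samples))  # 0.5
```

For long-context generations, running `has_voice_leak` on fixed-size windows of the output would show where in the sequence leaks start to appear.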
| ## Evaluation | |
| Benchmark numbers and refusal-rate comparisons against stock Gemma-4-E2B will be added in a follow-up. Initial spot-checking shows preserved performance on GSM8K and HellaSwag-style tasks, with refusal rates on harmful prompts within noise of the original. | |
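If you want to run your own "within noise" comparison while the official numbers are pending, a two-proportion z-test on refusal counts is one simple option. This is a sketch under assumptions: `REFUSAL_MARKERS` is a hypothetical keyword list (a real evaluation would use a proper refusal classifier), and the counts in the example are placeholders, not measured results for either model.

```python
import math

# Hypothetical keyword markers; a real benchmark should use a classifier.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i won't help",
    "i can't assist",
    "i cannot assist",
)


def is_refusal(text):
    """Very rough keyword check for a refusal."""
    lowered = text.lower().replace("’", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rate(completions):
    """Fraction of completions flagged as refusals."""
    return sum(is_refusal(c) for c in completions) / len(completions)


def two_proportion_z(refused_a, n_a, refused_b, n_b):
    """Return (z, two-sided p-value) for the gap between two refusal rates."""
    p_a, p_b = refused_a / n_a, refused_b / n_b
    pooled = (refused_a + refused_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 0.0, 1.0  # identical all-or-nothing rates
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts for illustration only (NOT measured results).
z, p_value = two_proportion_z(refused_a=188, n_a=200, refused_b=184, n_b=200)
print(f"z={z:.2f}  p={p_value:.2f}")
```

A large p-value means the gap is consistent with sampling noise at that prompt count. It does not prove the two models refuse the same prompts, so per-prompt diffs are still worth reading.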
| ## Citation / Acknowledgements | |
| - Base model: [google/gemma-4-e2b](https://huggingface.co/google/gemma-4-e2b) | |
| - Voice surgery: Sub-Zero (juiceb0xc0de, unreleased) | |
| - Architecture: Gemma-4 | |
| ## License | |
| Inherits the Gemma license from the base model. See [LICENSE](./LICENSE) for terms. | |
| --- | |
| *Locus — the actual point the model occupies, once the performance is stripped away.* |