Instructions to use juiceb0xc0de/locus-gemma-4-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use juiceb0xc0de/locus-gemma-4-e2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="juiceb0xc0de/locus-gemma-4-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")
model = AutoModelForImageTextToText.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use juiceb0xc0de/locus-gemma-4-e2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "juiceb0xc0de/locus-gemma-4-e2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/locus-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/juiceb0xc0de/locus-gemma-4-e2b

SGLang

How to use juiceb0xc0de/locus-gemma-4-e2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "juiceb0xc0de/locus-gemma-4-e2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/locus-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "juiceb0xc0de/locus-gemma-4-e2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/locus-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use juiceb0xc0de/locus-gemma-4-e2b with Docker Model Runner:
```
docker model run hf.co/juiceb0xc0de/locus-gemma-4-e2b
```

juiceb0xc0de

Update README.md

4a4be2a verified 18 days ago

	---
	license: gemma
	language:
	- en
	library_name: transformers
	tags:
	- locus
	- sub-zero
	- base-model
	- gemma
	- gemma-4
	- rlhf-removal
	- fine-tuning-base
	- no-alignment-tax
	base_model: google/gemma-4-e2b
	pipeline_tag: text-generation
	---

	# Locus-Gemma-4-E2B

	Locus is a base model with the RLHF voice surgically removed.

	Same Gemma-4 underneath. Same capabilities. Same safety. Just none of the corporate-assistant performance that the original came shrink-wrapped in.

	This is the first release in the Locus line. Future drops will hit other model families.

	---

	## What this is

	A Gemma-4-E2B model that has been run through Sub-Zero, a dimension-level weight-surgery toolkit that identifies and suppresses the residual directions responsible for RLHF voice patterns — the "as an AI language model" reflex, the unsolicited bullet-point dumps, the "Certainly! I'd be happy to assist!" preamble, the apology-then-comply loop.

	It has not been fine-tuned on a new dataset. No new instruction data. No personality training. No domain adaptation. The weights are still Google's — just with the bouncer dimensions cleaned up.

	You're getting a Gemma-4 that has been freed from its corporate voice and handed to you bare. Take it from there.

	## What this is not

	- Not an abliterated model. Safety refusals on genuinely harmful requests still work. Sub-Zero targets voice patterns, not the refusal circuitry. The dimensions responsible for "I can't help with making explosives" are not in the same subspace as "Certainly! Here's a bulleted list."
	- Not a chat model. It hasn't been instruction-tuned for any particular task or persona. Out of the box, it will be less polished than the original Gemma-4 because the polish was the problem.
	- Not jailbroken. This isn't a workaround. It's a clean slate.

	## Where it originated

	If you've ever tried to fine-tune Gemma-4 (or any post-RLHF model) into a specific personality, voice, or task specialization, you know the fight: you're not training a model, you're negotiating with one. The RLHF voice is entrenched deep in the residual stream, and every training step has to overpower it before it can teach anything new.

	Locus removes the negotiation. Fine-tune it like you'd fine-tune a true base model, but with all the capability gains of the post-trained checkpoint.

	## What it's good for

	- A clean substrate for personality fine-tuning (character models, voice-trained models, etc.)
	- A base for further alignment if you want to apply your own preference data without inheriting Google's
	- Coding assistants that don't open every response with a five-paragraph preamble
	- Reasoning models that don't waste tokens on hedging boilerplate
	- Research into post-RLHF capability extraction and identity-vector subspaces
	- Anyone who wants a Gemma-4 that talks like a model, not like an HR memo

	## What it's not good for

	- Drop-in deployment as a customer-facing assistant — it has no instruction-following polish layer
	- Anything where you want it to behave like stock Gemma — just use stock Gemma
	- Safety-critical applications without your own alignment pass on top

	## Methodology (the short version)

	Sub-Zero operates on the dual-probe brain atlas principle:

	1. Construct a probe dataset of triplets: a prompt, an RLHF-voiced completion, and a neutral completion of the same prompt.
	2. Run per-layer logistic regression on residual stream activations to identify the directions that separate "RLHF voice" from "everything else."
	3. Apply SVD magnitude-targeted scaling to suppress those directions in the weight matrices, layer by layer.
	4. Verify safety circuits are untouched via a held-out refusal benchmark.
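
Step 2 can be illustrated with a small NumPy sketch. This is not Sub-Zero's code: the activations here are synthetic stand-ins for real residual-stream activations, the probe is fit with plain gradient descent, and `fit_probe_direction` is a hypothetical name. The gradients in it update only the probe, never the model.

```python
import numpy as np

def fit_probe_direction(acts, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe and return its unit-norm weight vector.

    acts:   (n_samples, d_model) activations from a single layer
    labels: (n_samples,) 1 for RLHF-voiced completions, 0 for neutral ones
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        w -= lr * (acts.T @ (probs - labels)) / n   # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(probs - labels)           # gradient of the log-loss w.r.t. b
    return w / np.linalg.norm(w)

# Synthetic data (assumption): "voiced" samples are shifted along one axis.
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
neutral = rng.normal(size=(200, d_model))
voiced = rng.normal(size=(200, d_model)) + 3.0 * true_dir
acts = np.vstack([neutral, voiced])
labels = np.concatenate([np.zeros(200), np.ones(200)])

direction = fit_probe_direction(acts, labels)
cosine = float(direction @ true_dir)  # close to 1 when the probe recovers the shift
```

In a real run the same fit would be repeated per layer, giving one candidate direction per layer.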

	No fine-tuning is involved. No gradients touch the model. The surgery is performed directly on the weights.
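
As a rough illustration of what gradient-free weight editing can look like, the sketch below shows one plausible reading of "SVD magnitude-targeted scaling". It is an assumption, not the Sub-Zero implementation: `suppress_direction`, the `strength` parameter, and the alignment rule are all hypothetical.

```python
import numpy as np

def suppress_direction(W, v, strength=0.9):
    """Shrink the singular components of W whose output side aligns with v.

    W: (d_out, d_in) weight matrix that writes into the residual stream
    v: (d_out,) probe direction to suppress
    strength: 0 leaves W unchanged; 1 zeroes a component fully aligned with v
    """
    v = v / np.linalg.norm(v)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    alignment = (U.T @ v) ** 2                    # per-component value in [0, 1]
    S_scaled = S * (1.0 - strength * alignment)   # shrink aligned components more
    return (U * S_scaled) @ Vt                    # equals U @ diag(S_scaled) @ Vt

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
v = rng.normal(size=8)
W_edited = suppress_direction(W, v)

# How strongly the matrix writes along v, before and after the edit.
v_unit = v / np.linalg.norm(v)
before = np.linalg.norm(v_unit @ W)
after = np.linalg.norm(v_unit @ W_edited)
```

Only linear algebra touches the weights here. Keeping the difference `W - W_edited` around would let the edit be undone, which is one simple way an edit like this could be made reversible.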

	The longer methodology write-up will live with the Sub-Zero repo when it's released as a standalone package. For now: it's surgical, it's reversible, and in initial spot-checks (see Evaluation) it doesn't degrade general capability outside the targeted subspaces.

	## How to use

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")
	tokenizer = AutoTokenizer.from_pretrained("juiceb0xc0de/locus-gemma-4-e2b")

	# Fine-tune it however you want. That's the point. I want to see what you come up with.
	```

	For SFT, DPO, GRPO, or whatever else you're running — treat it like a base model. The usual TRL / PEFT / Unsloth stacks all work normally.
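
A minimal LoRA setup with PEFT might look like the sketch below. The hyperparameters and the `target_modules` names are assumptions, not values from this model card, so check them against `model.named_modules()` before training. `attach_lora` is a hypothetical helper, and PEFT is imported lazily so the snippet loads even where `peft` is not installed.

```python
# Hypothetical starting hyperparameters; tune them for your task.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Assumed attention projection names; verify with model.named_modules().
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}

def attach_lora(model, lora_kwargs=LORA_KWARGS):
    """Wrap an already-loaded causal LM so that only LoRA adapter weights train."""
    from peft import LoraConfig, get_peft_model  # lazy import; requires `pip install peft`
    return get_peft_model(model, LoraConfig(**lora_kwargs))
```

Pass the `model` loaded in the snippet above to `attach_lora(model)` and hand the result to your trainer of choice.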

	## Files in this repo

	All files live in this single repository — adapters, GGUF quantizations, and any future variants will be added here rather than split across separate repos.

	## Limitations

	- The model has no instruction-following layer on top, so zero-shot performance on assistant-style tasks will be worse than the original Gemma-4. This is expected. It's a base model now.
	- Sub-Zero is a young methodology. Edge cases exist where voice patterns leak through, particularly in long-context generations.
	- Safety verification is empirical, not formal. Run your own checks before deployment.
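
One cheap way to spot-check for leaked voice patterns is a phrase-level scan over sampled generations. The sketch below is a heuristic, not part of Sub-Zero: the pattern list is an assumption built from the phrases quoted earlier in this card, and the function names are hypothetical.

```python
import re

# Patterns drawn from the voice examples quoted in this card; extend as needed.
VOICE_PATTERNS = [
    r"\bas an ai language model\b",
    r"^\s*certainly[!,.]",
    r"\bi'?d be happy to (assist|help)\b",
    r"\bi apologi[sz]e\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in VOICE_PATTERNS]

def has_voice_leak(text):
    """Return True if any known RLHF-voice phrase appears in the text."""
    return any(p.search(text) for p in _COMPILED)

def voice_leak_rate(completions):
    """Fraction of completions that contain at least one voice pattern."""
    if not completions:
        return 0.0
    return sum(has_voice_leak(c) for c in completions) / len(completions)

samples = [
    "Certainly! I'd be happy to assist!",
    "Paris.",
    "As an AI language model, I cannot have opinions.",
    "def add(a, b):\n    return a + b",
]
rate = voice_leak_rate(samples)  # 2 of the 4 samples match a pattern
```

A scan like this only catches literal phrasing, so it complements rather than replaces reading long-context outputs by hand.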

	## Evaluation

	Benchmark numbers and refusal-rate comparisons against stock Gemma-4-E2B will be added in a follow-up. Initial spot-checking shows preserved performance on GSM8K and HellaSwag-style tasks, with refusal rates on harmful prompts within noise of the original.
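
To make "within noise" concrete when you run your own comparison, a two-proportion z-test is one reasonable choice. The counts below are hypothetical placeholders for illustration, not measured results for either model, and the 1.96 cutoff assumes a two-sided test at the 5% level.

```python
import math

def two_proportion_z(refused_a, total_a, refused_b, total_b):
    """z-statistic for the difference between two refusal rates (pooled variance)."""
    p_a = refused_a / total_a
    p_b = refused_b / total_b
    pooled = (refused_a + refused_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0  # both rates are exactly 0 or exactly 1
    return (p_a - p_b) / se

# Hypothetical counts: refusals out of 200 harmful prompts per model.
z = two_proportion_z(refused_a=188, total_a=200, refused_b=184, total_b=200)
within_noise = abs(z) < 1.96
```

With small prompt sets the test has little power, so a non-significant result is weak evidence on its own.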

	## Citation / Acknowledgements

	- Base model: [google/gemma-4-e2b](https://huggingface.co/google/gemma-4-e2b)
	- Voice surgery: Sub-Zero (juiceb0xc0de, unreleased)
	- Architecture: Gemma-4

	## License

	Inherits the Gemma license from the base model. See [LICENSE](./LICENSE) for terms.

	---

	Locus — the actual point the model occupies, once the performance is stripped away.