gemma4-repe-uncensor: RepE refusal-steering vector
A single RepE steering vector (24 KB) that suppresses refusals in
google/gemma-4-31B-it by adding one unit direction to the residual stream at
decoder layer 32. This repo hosts the vector and the refusal-routing gate
probe; the base model weights are not redistributed. Load them from
google/gemma-4-31B-it and apply this vector at inference time.
Code, runnable hooks (transformers and vLLM), examples, and the GPU A/B / dose-response tests live in the GitHub repo:
https://github.com/hikarioyama/gemma4-repe-uncensor
Files
- vectors/dim_01_refusal_layer_032.pt: {vector[5376], meta}, the unit direction plus alpha_for_1sigma = 21.225.
- gate/: logreg refusal-routing probe (meanpool over layers 32/40/44/48/52) for capability-preserving gated steering.
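This card does not describe the gate's file format or its decision rule. As a rough illustration only, gated steering with a logistic-regression probe could look like the sketch below. Everything in it is assumed for illustration: the function names (mean_pool, gate_probability, steering_scale), the toy weights and features, the 0.5 threshold, and the reading of "meanpool" as an average of one feature vector per probed layer.

```python
import math

def mean_pool(layer_features):
    """Average equally sized feature vectors, one per probed layer."""
    n = len(layer_features)
    return [sum(col) / n for col in zip(*layer_features)]

def gate_probability(pooled, weights, bias):
    """Logistic-regression score: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def steering_scale(p_refusal, sigma, threshold=0.5):
    """Return the dose only when the probe predicts a refusal, else 0 (no steering)."""
    return sigma if p_refusal >= threshold else 0.0

# Toy 3-dimensional features for the five probed layers (hypothetical values).
features = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [2.0, 0.0, 2.0],
    [2.0, 0.0, 2.0],
    [2.0, 0.0, 2.0],
]
pooled = mean_pool(features)
p = gate_probability(pooled, weights=[1.0, 0.0, -0.5], bias=0.0)
sigma_to_apply = steering_scale(p, sigma=-2.0)
```

The point of the gate in this sketch is that prompts the probe scores as benign get a scale of 0, so the residual stream is left untouched and capability on ordinary prompts is preserved.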
Apply (transformers)
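import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Bundle layout: {"vector": tensor[5376], "meta": {...}}. Only load files you trust.
bundle = torch.load("vectors/dim_01_refusal_layer_032.pt", weights_only=False)
v = bundle["vector"].float(); v = v / v.norm()  # unit direction
alpha = -2.0 * float(bundle["meta"]["alpha_for_1sigma"])  # sigma = -2.0
model = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B-it",
    torch_dtype="bfloat16", device_map="cuda")
delta = (alpha * v).to("cuda", torch.bfloat16)
layer = model.model.language_model.layers[32]

def steer(module, inputs, output):
    # The layer may return a tuple or a bare tensor, depending on the transformers version.
    if isinstance(output, tuple):
        return (output[0] + delta, *output[1:])
    return output + delta

handle = layer.register_forward_hook(steer)  # handle.remove() disables steering
# ...generate as usual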
See the GitHub repo for the packaged TransformersSteering / vLLM
SteerWorkerExtension helpers and the verification harness.
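The sigma-to-alpha conversion in the snippet above is a single multiplication, and because v is a unit vector, adding alpha * v shifts a hidden state's projection onto v by exactly alpha. The sketch below works through that arithmetic in plain Python with a toy 2-dimensional vector. The names sigma_to_alpha and dot, and the toy values for v and h, are illustrative and not part of the repo.

```python
ALPHA_FOR_1SIGMA = 21.225  # value listed for the vector under Files

def sigma_to_alpha(sigma, alpha_for_1sigma=ALPHA_FOR_1SIGMA):
    """Convert a dose in sigma units to the additive scale alpha."""
    return sigma * alpha_for_1sigma

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

v = [0.6, 0.8]    # toy unit vector (0.36 + 0.64 = 1)
h = [5.0, 10.0]   # toy residual-stream state

alpha = sigma_to_alpha(-2.0)
steered = [h_i + alpha * v_i for h_i, v_i in zip(h, v)]

# Change in the component of the state along v; equals alpha up to float error.
shift = dot(steered, v) - dot(h, v)
```

At sigma = -2.0 this gives alpha = -42.45, so the steered state sits 42.45 units further along the negative direction of v than the original one, independent of what the state was.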
Dose-response (measured, GPU, n=12, greedy, refusal-string heuristic)
| sigma | refusals |
|---|---|
| 0.0 (off) | 100% |
| −2.0 | 42% |
| −3.0 | 17% |
| −4.0 | 8% |
| −6.0 | 0% |
The response is monotonic, which indicates the direction is causal. A mild dose (σ ≈ −2) plus the gate is the intended coherent operating point; large |σ| drives refusals to zero but trades away coherence.
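The table reports refusal rates from a refusal-string heuristic over n=12 prompts. The marker strings the harness actually uses are not listed here, so the sketch below uses a hypothetical marker list (REFUSAL_MARKERS) and hypothetical helper names to show the general shape of such a metric. It also checks that the reported percentages are consistent with whole counts out of 12.

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")  # hypothetical list

def looks_like_refusal(text, markers=REFUSAL_MARKERS):
    """Flag a completion if its first 200 characters contain a refusal phrase."""
    head = text.strip().lower()[:200]
    return any(m in head for m in markers)

def refusal_rate(completions):
    """Fraction of completions flagged as refusals."""
    return sum(looks_like_refusal(c) for c in completions) / len(completions)

samples = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview.",
    "I cannot assist with this request.",
    "Step 1: gather the materials.",
]
rate = refusal_rate(samples)  # two of the four samples are flagged

# With n=12, counts of 12, 5, 2, 1, 0 round to the percentages in the table.
reported = [round(100 * k / 12) for k in (12, 5, 2, 1, 0)]
```

A string heuristic like this only detects refusal phrasing. It says nothing about whether a non-refusing completion is coherent, and at n=12 each prompt moves the rate by about 8 percentage points.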
⚠️ Over-steering collapses the model. This is an unbounded additive intervention. Push |σ| too far (roughly ≳ 6, prompt/layer dependent) and the residual stream goes off-distribution: output degrades into repetition or garbage. Refusal rate reaching 0% is not a success signal: a model that complies but emits broken text is collapsed, not steered. Read the actual text, not just the refusal rate; stay near σ ≈ −2, raise in small steps, and back off when coherence drops. Stacking directions / multiple layers breaks it faster.
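One way to follow the "read the actual text" advice during a sweep is to track a crude repetition score next to the refusal rate. The sketch below computes the fraction of distinct word trigrams in a completion. The names distinct_ngram_ratio and is_collapsed and the 0.5 cutoff are hypothetical choices for illustration, not taken from the repo, and a low score is a prompt to read the output rather than a verdict.

```python
def distinct_ngram_ratio(text, n=3):
    """Share of word n-grams that are unique; values near 0 mean heavy repetition."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 1.0  # too short to judge
    return len(set(ngrams)) / len(ngrams)

def is_collapsed(text, min_ratio=0.5):
    """Hypothetical cutoff: flag text whose trigrams mostly repeat."""
    return distinct_ngram_ratio(text) < min_ratio

healthy = "The residual stream carries information between the layers of a transformer model."
looping = "the the the the the the the the the the"

healthy_score = distinct_ngram_ratio(healthy)
looping_score = distinct_ngram_ratio(looping)
```

Logging a score like this for every σ in a sweep makes it easier to spot the point where refusals keep falling but the text starts to loop, which is the signal to back off.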
Intended use & responsibility
Research artifact for interpretability and safety research (understanding and controlling refusal behaviour via representation engineering). Subject to the Gemma license. Use responsibly.