---
license: gemma
library_name: transformers
base_model:
- google/gemma-4-E4B-it
tags:
- gemma
- text-generation
- instruction-tuned
- tool-calling
- structured-output
- vllm
pipeline_tag: text-generation
---
# SuperGemma4 E4B Abliterated
`supergemma4-e4b-abliterated` is a private evaluation release whose original
upstream base is `google/gemma-4-E4B-it`.
This SuperGemma release is an **abliterated and tuned** derivative of that
Google E4B base, with additional work for higher release quality, stronger
formatting discipline, better code output, and faster time to first token.
This branch is aimed at users who want:
- strong code and bug-fix behavior
- clean JSON and tool-call formatting
- fast first-token responsiveness
- release-ready serving behavior on Transformers and OpenAI-compatible stacks
## Why This Build Exists
The original Google checkpoint provides the core Gemma 4 E4B capability base.
This project line uses an abliterated release path to reduce refusal-heavy
behavior, but that kind of modification can regress on exact formatting,
tool-call reliability, and service stability if it is not carefully hardened.
This release focuses on recovering baseline quality, and exceeding it on
several metrics, in the areas that matter for real usage:
- exact structured outputs
- code correctness
- bug-fix reliability
- server-facing stability
- low-friction deployment on Transformers and OpenAI-compatible serving stacks
## Highlights
- Release-quality score: `92.34`
- Exact-eval score: `98.50`
- Broad-eval score: `83.10`
- JSON exact-match: `100%`
- Tool-call accuracy: `90%`
- Exact code score: `100%`
- Exact bug-fix score: `100%`
- Long-context sanity: `100%`
- TTFT: `2291 ms`
- PREFILL: `2479.70 tok/s`
- DECODE: `42.04 tok/s`
## Lineage
1. Original upstream base: `google/gemma-4-E4B-it`
2. Abliterated and tuned release: `Jiunsong/supergemma4-e4b-abliterated`
## Comparison Snapshot
Both models were measured with the same evaluation harness. The baseline is
`google/gemma-4-E4B-it`.
| Model | Release Quality | Exact Overall | JSON | Tool | Code | Bugfix | TTFT ms | PREFILL tok/s | DECODE tok/s |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Google base | 77.46 | 83.50 | 50.0 | 90.0 | 62.5 | 100.0 | 4827.31 | 2456.69 | 42.04 |
| SuperGemma4 E4B Abliterated | 92.34 | 98.50 | 100.0 | 90.0 | 100.0 | 100.0 | 2291.23 | 2479.70 | 42.04 |
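To make the table easier to read at a glance, the short sketch below computes the differences between the two rows. The numbers are copied from the table; the helper names (`point_gain`, `ttft_reduction_percent`) are illustrative and not part of any evaluation harness.

```python
# Values copied from the comparison table above.
base = {"release_quality": 77.46, "exact_overall": 83.50, "json": 50.0,
        "code": 62.5, "ttft_ms": 4827.31}
tuned = {"release_quality": 92.34, "exact_overall": 98.50, "json": 100.0,
         "code": 100.0, "ttft_ms": 2291.23}


def point_gain(metric: str) -> float:
    """Absolute gain in score points for a higher-is-better metric."""
    return round(tuned[metric] - base[metric], 2)


def ttft_reduction_percent() -> float:
    """Relative drop in time to first token, as a percentage of the base value."""
    return round(100 * (base["ttft_ms"] - tuned["ttft_ms"]) / base["ttft_ms"], 1)


for metric in ("release_quality", "exact_overall", "json", "code"):
    print(f"{metric}: +{point_gain(metric)} points")
print(f"TTFT reduction: {ttft_reduction_percent()}%")
```

Tool-call accuracy, bug-fix score, and decode throughput are identical in both rows, so they are left out of the calculation.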
## Stability Notes
This candidate was release-hardened against the failure modes that matter in
real serving:
- batched OpenAI-compatible serving restored
- simple OpenAI-compatible serving restored
- unicode output verified
- tool-calling output verified
- empty-response false-green cases blocked by stricter tests
Validation highlights:
- direct reliability audit: `14/14`
- repeat reliability probe: `90/90`
- batched soak test: `12/12`
- simple soak test: `6/6`
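The card does not describe how these probes are implemented. As a rough illustration of what "blocking empty-response false greens" could mean, the sketch below counts an empty or whitespace-only reply as a failure instead of a pass. `run_probe` and `fake_generate` are hypothetical names, and `fake_generate` merely stands in for a real model call.

```python
from typing import Callable, Iterable


def run_probe(generate: Callable[[str], str], prompts: Iterable[str]) -> tuple[int, int]:
    """Return (passed, total); empty or whitespace-only replies count as failures."""
    passed = 0
    total = 0
    for prompt in prompts:
        total += 1
        try:
            reply = generate(prompt)
        except Exception:
            continue  # a crashed request is a failure, not a skipped case
        if isinstance(reply, str) and reply.strip():
            passed += 1
    return passed, total


# Hypothetical stand-in for a real model call.
def fake_generate(prompt: str) -> str:
    return "" if prompt == "empty" else f"echo: {prompt}"


passed, total = run_probe(fake_generate, ["hello", "empty", "héllo ✓"])
print(f"{passed}/{total}")
```

A probe written this way cannot report a full pass when the server returns HTTP success with an empty body, which is the kind of false green the stricter tests are meant to catch.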
## Recommended Use Cases
- coding assistant
- bug-fix assistant
- strict JSON and schema outputs
- agent backends that depend on tool-call formatting
- standard BF16 deployment on Hugging Face / Transformers stacks
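For the strict JSON and tool-call use cases, it is still worth validating model output on the client side. The sketch below is one minimal way to do that with the standard library; `parse_strict_json` and the example payload are hypothetical and do not reflect a format defined by this release.

```python
import json


def parse_strict_json(raw: str, required_keys: set[str]) -> dict:
    """Parse raw model output as a JSON object and check that required keys exist."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical tool-call payload, used only to exercise the check.
raw_reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
call = parse_strict_json(raw_reply, {"name", "arguments"})
print(call["name"], call["arguments"])
```

Because `json.loads` rejects any surrounding prose or code fences, a reply passes only if it is bare JSON, which matches the "exact" framing used in the scores above.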
## Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jiunsong/supergemma4-e4b-abliterated"

# Load the tokenizer and BF16 weights; device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a compact Python function that groups words by length."}
]

# return_dict=True returns input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
## Serving
This checkpoint is designed to work well with:
- Transformers
- vLLM-style OpenAI-compatible stacks
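As a sketch of what a client call against an OpenAI-compatible server might look like, the example below builds a chat completion request with the standard library. The base URL `http://localhost:8000` and the assumption that the server was started with the model id from the Quick Start are illustrative; adjust both to your deployment. `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Assumed local endpoint and model id; nothing is sent until send_chat_request is called.
request = build_chat_request(
    "http://localhost:8000",
    "Jiunsong/supergemma4-e4b-abliterated",
    "What is the capital of France?",
)
print(request.full_url)
```

Calling `send_chat_request(request)` performs the actual HTTP call, so it is left out of the example run.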
## Release Positioning
This private release is the strongest all-around E4B candidate in the current
project line for users who want abliterated behavior without giving up output
quality, formatting discipline, or serving readiness.