Instructions to use ScottzillaSystems/supergemma4-e4b-abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ScottzillaSystems/supergemma4-e4b-abliterated with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ScottzillaSystems/supergemma4-e4b-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ScottzillaSystems/supergemma4-e4b-abliterated")
model = AutoModelForImageTextToText.from_pretrained("ScottzillaSystems/supergemma4-e4b-abliterated")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ScottzillaSystems/supergemma4-e4b-abliterated with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ScottzillaSystems/supergemma4-e4b-abliterated"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ScottzillaSystems/supergemma4-e4b-abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
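Once the server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the `openai` Python package, assuming the default local endpoint started above; the API key is a placeholder, since a local vLLM server does not check it by default.

```python
# Minimal sketch: call the local vLLM server via its OpenAI-compatible API.
# Assumes the server started above is listening on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder key

response = client.chat.completions.create(
    model="ScottzillaSystems/supergemma4-e4b-abliterated",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```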
- SGLang
How to use ScottzillaSystems/supergemma4-e4b-abliterated with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ScottzillaSystems/supergemma4-e4b-abliterated" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ScottzillaSystems/supergemma4-e4b-abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
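SGLang exposes the same OpenAI-compatible API, so the `openai` Python client works here as well; the sketch below assumes the server above is listening on port 30000 and additionally shows streamed output.

```python
# Minimal sketch: stream a response from the local SGLang server.
# Assumes the server started above is listening on localhost:30000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # placeholder key

stream = client.chat.completions.create(
    model="ScottzillaSystems/supergemma4-e4b-abliterated",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```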
Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ScottzillaSystems/supergemma4-e4b-abliterated" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ScottzillaSystems/supergemma4-e4b-abliterated",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use ScottzillaSystems/supergemma4-e4b-abliterated with Docker Model Runner:
```shell
docker model run hf.co/ScottzillaSystems/supergemma4-e4b-abliterated
```
---
license: gemma
library_name: transformers
base_model:
- google/gemma-4-E4B-it
tags:
- gemma
- text-generation
- instruction-tuned
- tool-calling
- structured-output
- vllm
pipeline_tag: text-generation
---
# SuperGemma4 E4B Abliterated

`supergemma4-e4b-abliterated` is a private evaluation release whose original upstream base is `google/gemma-4-E4B-it`.

This SuperGemma release is an **abliterated and tuned** derivative of that Google E4B base, with additional work for higher release quality, stronger formatting discipline, better code output, and faster time to first token.

This branch is aimed at users who want:

- strong code and bug-fix behavior
- clean JSON and tool-call formatting
- fast first-token responsiveness
- release-ready serving behavior on Transformers and OpenAI-compatible stacks
## Why This Build Exists

The original Google checkpoint provides the core Gemma 4 E4B capability base. This project line uses an abliterated release path to reduce refusal-heavy behavior, but that kind of modification can regress on exact formatting, tool-call reliability, and service stability if it is not carefully hardened.

This release focuses on recovering, and then surpassing, baseline quality where it matters for real usage:

- exact structured outputs
- code correctness
- bug-fix reliability
- server-facing stability
- low-friction deployment on Transformers and OpenAI-compatible serving stacks
## Highlights

- Release-quality score: `92.34`
- Exact-eval score: `98.50`
- Broad-eval score: `83.10`
- JSON exact-match: `100%`
- Tool-call accuracy: `90%`
- Exact code score: `100%`
- Exact bug-fix score: `100%`
- Long-context sanity: `100%`
- TTFT (time to first token): `2291 ms`
- Prefill throughput: `2479.70 tok/s`
- Decode throughput: `42.04 tok/s`
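The latency and throughput figures above come from an internal evaluation harness that is not part of this repository. For orientation only, here is a rough, hypothetical sketch of how TTFT and decode throughput can be measured with Transformers; the prompt, token counts, and timer placement are illustrative assumptions, not the harness's method.

```python
# Hypothetical sketch: rough TTFT and decode-throughput measurement with
# Transformers. This is NOT the project's evaluation harness; numbers from a
# loop like this will differ from the Highlights above.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jiunsong/supergemma4-e4b-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Explain KV caching in one paragraph.", return_tensors="pt").to(model.device)

# TTFT: time until the first new token is produced.
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=1, do_sample=False)
ttft_s = time.perf_counter() - start

# Decode throughput: new tokens per second over a longer generation
# (prefill cost is approximated away by subtracting the TTFT).
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed_s = time.perf_counter() - start
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]

print(f"TTFT: {ttft_s * 1000:.0f} ms")
print(f"decode: {(new_tokens - 1) / (elapsed_s - ttft_s):.1f} tok/s")
```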
## Lineage

1. Original upstream base: `google/gemma-4-E4B-it`
2. Abliterated and tuned release: `Jiunsong/supergemma4-e4b-abliterated`

## Comparison Snapshot

Measured against the same evaluation harness used for:

- `google/gemma-4-E4B-it`

| Model | Release Quality | Exact Overall | JSON | Tool | Code | Bugfix | TTFT (ms) | Prefill (tok/s) | Decode (tok/s) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Google base | 77.46 | 83.50 | 50.0 | 90.0 | 62.5 | 100.0 | 4827.31 | 2456.69 | 42.04 |
| SuperGemma4 E4B Abliterated | 92.34 | 98.50 | 100.0 | 90.0 | 100.0 | 100.0 | 2291.23 | 2479.70 | 42.04 |
## Stability Notes

This candidate was release-hardened against the failure modes that matter in real serving:

- batched OpenAI-compatible serving restored
- simple OpenAI-compatible serving restored
- unicode output verified
- tool-calling output verified
- empty-response false greens (empty outputs that previously passed) blocked by stricter tests

Validation highlights:

- direct reliability audit: `14/14`
- repeat reliability probe: `90/90`
- batched soak test: `12/12`
- simple soak test: `6/6`
## Recommended Use Cases

- coding assistant
- bug-fix assistant
- strict JSON and schema outputs (see the sketch after the Quick Start below)
- agent backends that depend on tool-call formatting
- standard BF16 deployment on Hugging Face / Transformers stacks
## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jiunsong/supergemma4-e4b-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # standard BF16 deployment
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a compact Python function that groups words by length."}
]

# Build the prompt with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
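For the strict JSON and schema use case listed above, here is a minimal hedged sketch that reuses the `tokenizer` and `model` from the Quick Start; the prompt wording and the tiny schema check are illustrative assumptions, not part of this release's evaluation suite.

```python
# Hypothetical sketch: ask for schema-conforming JSON and validate it.
# Reuses `tokenizer` and `model` from the Quick Start above.
import json

messages = [{
    "role": "user",
    "content": 'Return ONLY a JSON object of the form {"city": ..., "country": ...} '
               "for the capital of France. No prose, no code fences.",
}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

payload = json.loads(text)  # raises ValueError if the output is not valid JSON
assert {"city", "country"} <= payload.keys()  # minimal schema check
print(payload)
```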
## Serving

This checkpoint is designed to work well with:

- Transformers
- vLLM-style OpenAI-compatible stacks

## Release Positioning

This private release is the strongest all-around E4B candidate in the current project line for users who want the abliterated base behavior without giving up quality recovery, formatting discipline, or serving readiness.