SuperGemma4-31b-abliterated-mlx-4bit

If this release helps you, support future drops on Ko-fi.

SuperGemma4-31b-abliterated-mlx-4bit is a heavily upgraded Gemma 4 31B release for people who want a local model that feels fully uncensored, dramatically more useful, and far more fun to run every day.

Built on top of google/gemma-4-31b-it, this release is aimed at users who care about the things they actually notice:

  • fewer annoying refusals
  • stronger coding and technical answers
  • sharper planning and practical problem solving
  • smoother local deployment with compact MLX 4-bit weights that feel surprisingly light for a 31B-class model
  • better day-to-day usefulness in both English and Korean

Why people will want this

This model is meant to feel like the base model with the brakes taken off and its weak spots strengthened:

  • fully uncensored, low-friction conversation
  • more confident coding, debugging, and system design help
  • stronger answers on hard reasoning and planning prompts
  • a more practical, builder-friendly personality for real local workflows
  • a local 31B experience that feels leaner, sharper, and more alive than most people expect

In plain terms: it is built to feel bolder, freer, more capable, and more satisfying than the stock instruction-tuned release.

Best use cases

  • uncensored general chat
  • coding and debugging help
  • architecture and API design
  • browser-task planning
  • bilingual English/Korean workflows

What makes it feel better in practice

  • stronger practical coding help instead of generic filler
  • more direct answers when you want the model to stop hedging
  • better performance on planning-heavy prompts and task-oriented chats
  • a compact local package that is easy to run on Apple Silicon with MLX

Included clean-output helper

For app integrations, JSON-heavy tasks, exact-output prompts, or loop-sensitive workloads, use the included helper scripts by default.

They help with:

  • keeping JSON-only requests as raw JSON
  • stripping stray internal markers if they ever appear
  • making app-facing answers cleaner for structured use
  • stopping exact-reply and fixed-line prompts from drifting
  • tightening loop-prone prompts such as "exactly ACK" or "12 unique lines"
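To make the first two points concrete, here is a minimal, self-contained sketch of the kind of cleanup such a helper performs. This is an illustration only, not the code shipped in supergemma_guard.py: it strips markdown fences and surrounding chatter from a reply that was supposed to be raw JSON, then validates the result by parsing it.

```python
import json
import re

def clean_json_reply(raw: str) -> str:
    """Best-effort cleanup of a model reply that should be raw JSON.

    Removes markdown code fences and any chatter before the first '{'
    or after the last '}', then validates the result by parsing it.
    """
    # Drop markdown fence markers such as ```json ... ```
    text = re.sub(r"```(?:json)?", "", raw).strip()
    # Keep only the outermost JSON object
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    candidate = text[start : end + 1]
    json.loads(candidate)  # raises ValueError if still not valid JSON
    return candidate

# Example: a chatty reply wrapped around the JSON payload
reply = 'Sure! Here you go:\n```json\n{"name": "mlx", "reason": "fast"}\n```'
print(clean_json_reply(reply))  # {"name": "mlx", "reason": "fast"}
```

The shipped scripts cover more cases (exact-reply drift, loop tightening), so prefer them over a hand-rolled filter like this one.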

Quick start

# Requires the mlx-lm package: pip install mlx-lm
from mlx_lm import load, generate

# "." assumes you run this from the downloaded model directory;
# pass the repo path otherwise.
model, tokenizer = load(".")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain vector databases in plain English."}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=False))

Guarded generation

The repository includes:

  • supergemma_guard.py
  • supergemma_guarded_generate.py

This is the recommended path for:

  • exact-output prompts
  • JSON-only app endpoints
  • tool-followup style answers
  • loop-sensitive or boundary-sensitive workloads

Example:

python supergemma_guarded_generate.py \
  --model . \
  --guard-profile supergemma_v4 \
  --prompt 'Return only valid JSON with keys "name" and "reason".'
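For app endpoints, it is still worth validating the script's stdout before handing it to application code. The sketch below is a generic downstream check, not part of this release; the required key set is an app-side assumption matching the example prompt above.

```python
import json

# App-side assumption: the keys the prompt above asks the model to return.
REQUIRED_KEYS = {"name", "reason"}

def validate_reply(stdout: str) -> dict:
    """Parse the guarded script's stdout and check the expected keys.

    Feed this the captured stdout of supergemma_guarded_generate.py
    (e.g. via subprocess.run(..., capture_output=True, text=True)).
    """
    payload = json.loads(stdout)  # raises if the reply is not valid JSON
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"reply missing keys: {sorted(missing)}")
    return payload

print(validate_reply('{"name": "mlx", "reason": "fast"}'))
```

Failing fast here keeps a malformed reply from propagating into downstream application state.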

Notes

  • Chat template is aligned to the latest Gemma 4 31B IT template used in this project.
  • This release is intended for local inference and downstream app building.
  • If you want GGUF files for llama.cpp and similar runtimes, use the sibling GGUF release.

Support

If you want to support more uncensored local model releases, benchmarks, and packaging work, see the Ko-fi link at the top of this card.
