super-gemopus-4-e4b-trimera-mlx-4bit

MLX 4-bit quantization of emanubiz/super-gemopus-4-e4b-trimera, optimized for Apple Silicon.

Performance

Metric	Value
Speed	~34 tok/s
Peak RAM	4.3 GB
Quantization	4-bit (4.501 bits/weight)
Hardware	Mac Mini M4 16GB

With a 4.3 GB peak, the model leaves more than 11 GB of the 16 GB unified memory free, so it runs comfortably alongside other apps.
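A nominally 4-bit model that reports about 4.5 bits/weight is typical of group-wise quantization, where each group of weights stores its own scale and bias. Here is a minimal sketch of that arithmetic, assuming a group size of 64 with a 16-bit scale and a 16-bit bias per group. The card does not state the settings used, and `effective_bits_per_weight` is an illustrative helper, not an mlx-lm function.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Storage cost per weight for group-wise affine quantization.

    Assumption: one scale and one bias are stored per group of weights.
    """
    overhead = (scale_bits + bias_bits) / group_size  # extra bits amortized per weight
    return bits + overhead


# Assumed settings: 4-bit weights, groups of 64 (not stated in the card).
print(effective_bits_per_weight(4, 64))  # 4.5
```

The remaining 0.001 in the reported 4.501 could come from a few tensors kept at higher precision, but the card does not say.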

What is Trimera?

Trimera is a SLERP merge of two Gemma 4 E4B models:

Model	Weight	What it brings
emanubiz/super-gemopus-4-e4b-abl-chimera	71%	Strong reasoning, abliterated refusals, human-aligned tone
deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI	29%	Opus 4.6 reasoning, Claude Code tool-use patterns, `<think>` tag reasoning
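For readers unfamiliar with SLERP, the sketch below shows spherical linear interpolation on two flat weight vectors using only the standard library. It is an illustration of the general technique, not the merge script used for Trimera. The mapping of the 71% / 29% weights to `t = 0.29` (with the chimera model as the `t = 0` endpoint) is an assumption, and real merge tools apply this per tensor with their own edge-case handling.

```python
import math


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns a, t = 1 returns b. Falls back to plain linear
    interpolation when the vectors are (anti)parallel or one is zero.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy example: two orthogonal unit vectors, 29% of the way from a to b.
merged = slerp([1.0, 0.0], [0.0, 1.0], t=0.29)
```

Unlike a plain weighted average, the interpolated vector stays on the arc between the two inputs, so for unit-length inputs the result keeps unit length.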

The chimera base is itself a merge of:

60% Jackrong/Gemopus-4-E4B-it — Gemma 4 E4B with human preference alignment
40% Jiunsong/supergemma4-e4b-abliterated — Gemma 4 E4B abliterated
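As rough bookkeeping, the two merge levels can be multiplied through to get a nominal share for each upstream model. This is only a sketch: SLERP is not a linear blend, and the card does not say how the chimera base was merged, so these percentages label the recipe rather than measure actual influence.

```python
# Merge weights as listed in the card.
trimera = {"chimera": 0.71, "Agentic-Opus-Reasoning-GeminiCLI": 0.29}
chimera = {"Gemopus-4-E4B-it": 0.60, "supergemma4-e4b-abliterated": 0.40}

# Nominal share of each upstream model, assuming the weights simply multiply.
nominal = {name: trimera["chimera"] * w for name, w in chimera.items()}
nominal["Agentic-Opus-Reasoning-GeminiCLI"] = trimera["Agentic-Opus-Reasoning-GeminiCLI"]

for name, share in nominal.items():
    print(f"{name}: {share:.1%}")
```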

Usage

mlx_lm generate

mlx_lm generate \
  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
  --prompt "<start_of_turn>user\nHi, who are you?<end_of_turn>\n<start_of_turn>model\n" \
  --max-tokens 512
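The same generation can be run from Python. The sketch below follows the usual mlx-lm pattern of `load`, `apply_chat_template`, and `generate`, so the tokenizer's chat template inserts the turn markers instead of a hand-written prompt string. `build_messages` and `generate_reply` are illustrative names, and this assumes an mlx-lm install that can load this model.

```python
MODEL_ID = "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit"


def build_messages(user_text: str) -> list[dict]:
    """Wrap a single user turn in the chat-message format."""
    return [{"role": "user", "content": user_text}]


def generate_reply(user_text: str, model_id: str = MODEL_ID, max_tokens: int = 512) -> str:
    """Load the model and generate one reply (requires mlx-lm on Apple Silicon)."""
    # Imported lazily so this module can be loaded without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Calling `generate_reply("Hi, who are you?")` downloads the weights on first use and returns the generated text.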

mlx_lm server (OpenAI-compatible API)

mlx_lm server \
  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
  --port 8080 \
  --host 0.0.0.0

Then query it with any OpenAI-compatible client:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 512
  }'
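The curl call above translates directly to Python. This sketch uses only the standard library and assumes the server from the previous step is listening on `localhost:8080`. `build_chat_request` and `chat` are illustrative helpers, and the response parsing assumes the standard OpenAI chat-completions shape.

```python
import json
import urllib.request

MODEL_ID = "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit"


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8080/v1",
                       model: str = MODEL_ID,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style response layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello!")` returns the model's reply as a string.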

Use as a coding agent backend

Point any OpenAI-compatible coding agent (Continue, Aider, PiCoder, etc.) at the local server with a model entry like the following; exact field names vary by agent:

{
  "id": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
  "name": "Trimera",
  "apiBase": "http://localhost:8080/v1",
  "apiKey": "dummy",
  "contextWindow": 128000,
  "maxTokens": 16000
}

Conversion

Converted from BF16 safetensors with mlx-lm 0.31.3 on an Apple M4. The conversion required patching gemma4_text.py to support Gemma 4's per-layer KV sharing (num_kv_shared_layers: 18).
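For orientation, a conversion like this is normally driven through mlx-lm's `convert` function. The sketch below is not the author's exact command: the group size of 64 and the output directory name are assumptions, `conversion_kwargs` and `convert_model` are illustrative helpers, and the gemma4_text.py patch mentioned above is not shown, so an unpatched mlx-lm 0.31.3 may still fail on this architecture.

```python
def conversion_kwargs(hf_path: str, mlx_path: str,
                      bits: int = 4, group_size: int = 64) -> dict:
    """Keyword arguments for a 4-bit mlx-lm conversion (group size is assumed)."""
    return {
        "hf_path": hf_path,          # source BF16 checkpoint on the Hub
        "mlx_path": mlx_path,        # local output directory (hypothetical name)
        "quantize": True,
        "q_bits": bits,
        "q_group_size": group_size,
    }


def convert_model(hf_path: str = "emanubiz/super-gemopus-4-e4b-trimera",
                  mlx_path: str = "super-gemopus-4-e4b-trimera-mlx-4bit") -> None:
    """Run the conversion (requires mlx-lm on Apple Silicon)."""
    # Imported lazily so this module can be loaded without mlx-lm installed.
    from mlx_lm import convert

    convert(**conversion_kwargs(hf_path, mlx_path))
```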

License

Gemma Terms of Use

Built with ❤️ on Apple Silicon · BF16 base model
