@@ -1,6 +1,7 @@
 ---
 language:
 - en
 license: gemma
 tags:
 - gemma4
@@ -9,9 +10,97 @@ tags:
 - reasoning
 - agentic
 - tool-calling
-- multimodal
 - mlx
-base_model: emanubiz/super-gemopus-4-e4b-trimera
-pipeline_tag: text-generation
-library_name: mlx
 ---

 ---
 language:
 - en
+- it
 license: gemma
 tags:
 - gemma4
 - reasoning
 - agentic
 - tool-calling
 - mlx
+- apple-silicon
+- 4bit
+base_model:
+- emanubiz/super-gemopus-4-e4b-trimera
 ---
+# super-gemopus-4-e4b-trimera-mlx-4bit
+MLX 4-bit quantization of [emanubiz/super-gemopus-4-e4b-trimera](https://huggingface.co/emanubiz/super-gemopus-4-e4b-trimera), optimized for Apple Silicon.
+## Performance
+| Metric | Value |
+|--------|-------|
+| Speed | ~34 tok/s |
+| Peak RAM | 4.3 GB |
+| Quantization | 4-bit (4.501 bits/weight) |
+| Hardware | Mac mini M4, 16 GB |
+
+With a peak of 4.3 GB, the model leaves most of a 16 GB machine's unified memory free for other apps.
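The 4.501 bits/weight figure in the table is consistent with group-wise 4-bit quantization, where each group of weights also stores a scale and a bias. The sketch below works through that arithmetic, assuming a group size of 64 and 16-bit scale and bias values. The card does not state these settings, so both are assumptions, and `effective_bits_per_weight` is a hypothetical helper, not part of mlx-lm.

```python
def effective_bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Average storage cost per weight for group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one bias,
    so their cost is spread across the group. The defaults are assumptions.
    """
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


print(effective_bits_per_weight())  # 4.5
```

Under these assumptions the result is 4.5 bits per weight. The small remainder up to the reported 4.501 would then plausibly come from a few tensors kept at higher precision.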
+## What is Trimera?
+Trimera is a SLERP merge of two Gemma 4 E4B models:
+| Model | Weight | What it brings |
+|-------|--------|----------------|
+| [emanubiz/super-gemopus-4-e4b-abl-chimera](https://huggingface.co/emanubiz/super-gemopus-4-e4b-abl-chimera) | 71% | Strong reasoning, abliterated refusals, human-aligned tone |
+| [deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI](https://huggingface.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI) | 29% | Opus 4.6 reasoning, Claude Code tool-use patterns, `<think>` tag reasoning |
+
+The chimera base is itself a merge of:
+- 60% [Jackrong/Gemopus-4-E4B-it](https://huggingface.co/Jackrong/Gemopus-4-E4B-it) — Gemma 4 E4B with human preference alignment
+- 40% [Jiunsong/supergemma4-e4b-abliterated](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) — Gemma 4 E4B abliterated
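For readers unfamiliar with SLERP, the merge described above interpolates along the arc between two weight vectors rather than along the straight line between them. A minimal pure-Python sketch on toy vectors follows. Mapping the 71%/29% split to an interpolation factor `t = 0.29` is an assumption, and real merge tools apply this per tensor with their own edge-case handling.

```python
import math


def slerp(a, b, t):
    """Spherical linear interpolation: t=0 returns a, t=1 returns b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined for a zero vector; fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Angle between the two vectors, clamped against rounding error.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    sin_omega = math.sin(omega)
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy stand-ins for one tensor from each parent model (not real weights).
chimera_tensor = [1.0, 0.0]
agentic_tensor = [0.0, 1.0]
merged = slerp(chimera_tensor, agentic_tensor, t=0.29)  # assumed 71% / 29% split
```

For these two orthogonal unit vectors the merged vector keeps unit length, which is the property that distinguishes SLERP from a plain weighted average.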
+## Usage
+### mlx_lm generate
+```bash
+mlx_lm generate \
+  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
+  --prompt "Ciao, chi sei?" \
+  --max-tokens 512
+```
+### mlx_lm server (OpenAI-compatible API)
+```bash
+mlx_lm server \
+  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
+  --port 8080 \
+  --host 0.0.0.0
+```
+Then use with any OpenAI-compatible client:
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "max_tokens": 512
+  }'
+```
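The same request can be sent from Python using only the standard library. This is a sketch that assumes the server from the previous step is listening on `localhost:8080` and returns the usual OpenAI chat-completions response shape. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

MODEL_ID = "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", max_tokens=512):
    """Build a POST request for the server's /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello!")` should return the model's reply as a string.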
+### Use as coding agent backend
+With the server above running, any OpenAI-compatible coding agent (Continue, Aider, PiCoder, etc.) can use it as a backend. Field names vary by tool; a typical entry looks like this:
+```json
+{
+  "id": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
+  "name": "Trimera",
+  "apiBase": "http://localhost:8080/v1",
+  "apiKey": "dummy",
+  "contextWindow": 128000,
+  "maxTokens": 16000
+}
+```
+## Conversion
+Converted from BF16 safetensors using mlx-lm 0.31.3 on Apple M4.
+The conversion required patching `gemma4_text.py` to support Gemma 4's per-layer KV sharing architecture (`num_kv_shared_layers: 18`).
+## License
+[Gemma Terms of Use](https://ai.google.dev/gemma/docs/gemma_4_license)
+---
+Built with ❤️ on Apple Silicon · [BF16 base model](https://huggingface.co/emanubiz/super-gemopus-4-e4b-trimera)