Add model card
README.md
CHANGED
@@ -1,6 +1,7 @@
 ---
 language:
 - en
+- it
 license: gemma
 tags:
 - gemma4
@@ -9,9 +10,97 @@ tags:
 - reasoning
 - agentic
 - tool-calling
-- multimodal
 - mlx
-
-
-
+- apple-silicon
+- 4bit
+base_model:
+- emanubiz/super-gemopus-4-e4b-trimera
 ---

# super-gemopus-4-e4b-trimera-mlx-4bit

MLX 4-bit quantization of [emanubiz/super-gemopus-4-e4b-trimera](https://huggingface.co/emanubiz/super-gemopus-4-e4b-trimera), optimized for Apple Silicon.

## Performance

| Metric | Value |
|--------|-------|
| Speed | ~34 tok/s |
| Peak RAM | 4.3 GB |
| Quantization | 4-bit (4.501 bits/weight) |
| Hardware | Mac Mini M4, 16 GB |

Runs comfortably alongside other apps on 16 GB of unified memory.
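
The 4.501 bits/weight figure sits slightly above a flat 4 bits, which is what group-wise quantization would produce once per-group metadata is counted. A rough sketch of that arithmetic, assuming a group size of 64 with one 16-bit scale and one 16-bit bias per group (my understanding of the mlx-lm defaults, not something stated in this card); the function name is hypothetical:

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average storage cost per weight for group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one bias,
    so their cost is spread evenly over the group.
    """
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


print(effective_bits_per_weight())  # 4.5
```

Under those assumptions the quantized layers cost 4.5 bits/weight; the small remainder up to 4.501 would plausibly come from tensors kept at higher precision, though that is an inference rather than a measured breakdown.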

## What is Trimera?

Trimera is a SLERP merge of two Gemma 4 E4B models:

| Model | Weight | What it brings |
|-------|--------|----------------|
| [emanubiz/super-gemopus-4-e4b-abl-chimera](https://huggingface.co/emanubiz/super-gemopus-4-e4b-abl-chimera) | 71% | Strong reasoning, abliterated refusals, human-aligned tone |
| [deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI](https://huggingface.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI) | 29% | Opus 4.6 reasoning, Claude Code tool-use patterns, `<think>` tag reasoning |

The chimera base is itself a merge of:

- 60% [Jackrong/Gemopus-4-E4B-it](https://huggingface.co/Jackrong/Gemopus-4-E4B-it): Gemma 4 E4B with human preference alignment
- 40% [Jiunsong/supergemma4-e4b-abliterated](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated): Gemma 4 E4B abliterated
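
For readers unfamiliar with SLERP, the idea can be sketched on a single flattened weight vector. This is a minimal illustration, not the merge script used for Trimera: the function name is hypothetical, and mapping the 71%/29% split to an interpolation factor `t = 0.29` (moving from the chimera model toward the agentic model) is an assumption.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1; intermediate values follow the
    arc between the two directions instead of the straight chord.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if abs(sin_omega) < eps:
        # Nearly parallel or anti-parallel: the SLERP weights are ill-defined.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy example: two orthogonal unit vectors standing in for weight tensors.
merged = slerp(0.29, [1.0, 0.0], [0.0, 1.0])
```

For unit vectors the result stays on the unit sphere, which is the property that distinguishes SLERP from a plain weighted average. Real merge tools apply this per tensor and may vary `t` by layer.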

## Usage

### mlx_lm generate

```bash
mlx_lm generate \
  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
  --prompt "<start_of_turn>user\nCiao, chi sei?<end_of_turn>\n<start_of_turn>model\n" \
  --max-tokens 512
```
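
The `--prompt` string above hand-writes the turn markers. When scripting many prompts, a small helper that reproduces exactly that layout can reduce typos. This is a sketch that mirrors the string shown in the command; the function name is hypothetical, and whether the tokenizer's bundled chat template matches this layout is not confirmed here.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in the turn markers used in the command above."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = build_gemma_prompt("Ciao, chi sei?")
```

The trailing `<start_of_turn>model\n` leaves the model's turn open so that generation continues from there.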

### mlx_lm server (OpenAI-compatible API)

```bash
mlx_lm server \
  --model emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit \
  --port 8080 \
  --host 0.0.0.0
```

Then query the server with any OpenAI-compatible client:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 512
  }'
```
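
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names are hypothetical, the base URL assumes the server command above is running locally on port 8080, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", max_tokens=512):
    """Build (but do not send) a POST request equivalent to the curl call above."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed response shape: OpenAI-style chat completion.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello!")` returns the reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.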

### Use as a coding agent backend

Works with OpenAI-compatible coding agents (Continue, Aider, PiCoder, etc.) once they are pointed at the local server. An example provider entry (exact field names depend on the agent):

```json
{
  "id": "emanubiz/super-gemopus-4-e4b-trimera-mlx-4bit",
  "name": "Trimera",
  "apiBase": "http://localhost:8080/v1",
  "apiKey": "dummy",
  "contextWindow": 128000,
  "maxTokens": 16000
}
```

## Conversion

Converted from BF16 safetensors with mlx-lm 0.31.3 on an Apple M4.
The conversion required patching `gemma4_text.py` to support Gemma 4's per-layer KV sharing architecture (`num_kv_shared_layers: 18`).
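
For orientation, a 4-bit conversion with the mlx-lm Python API might look roughly like the sketch below. This is not the author's actual command: the helper names and the output directory are hypothetical, the keyword arguments follow my reading of `mlx_lm.convert`, and an unpatched mlx-lm install may fail on this architecture for the reason given above.

```python
def conversion_kwargs(hf_path="emanubiz/super-gemopus-4-e4b-trimera",
                      mlx_path="super-gemopus-4-e4b-trimera-mlx-4bit"):
    """Arguments for a 4-bit quantized conversion (assumed parameter names)."""
    return {
        "hf_path": hf_path,    # BF16 source repository on the Hub
        "mlx_path": mlx_path,  # local output directory (hypothetical name)
        "quantize": True,
        "q_bits": 4,
    }


def run_conversion():
    """Run the conversion; needs mlx-lm installed on Apple Silicon."""
    # Imported lazily so this module can be loaded on machines without MLX.
    from mlx_lm import convert

    convert(**conversion_kwargs())
```

Calling `run_conversion()` downloads the source weights and writes the quantized model to the output directory.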

## License

[Gemma Terms of Use](https://ai.google.dev/gemma/docs/gemma_4_license)

---

Built with ❤️ on Apple Silicon · [BF16 base model](https://huggingface.co/emanubiz/super-gemopus-4-e4b-trimera)