---
license: apache-2.0
base_model: evalengine/unbound-e2b
base_model_relation: quantized
tags:
- gguf
- gemma4
- gemma
- gemma-4
- uncensored
- on-device
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="unbound-logo.svg" alt="Unbound" width="160" height="160">
</p>

# Unbound E2B GGUF: *because there is no boundary*

> **No guarantee: use at your own risk.** Reduced safety filtering; can
> produce harmful or false output. Provided as-is.

GGUF quants of [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b)
for Ollama, llama.cpp, LM Studio, and [wllama](https://github.com/ngxson/wllama)
(in-browser). Built by [Chromia](https://x.com/Chromia) and
[Eval Engine](https://x.com/eval_engine).

## Available quants

Each quant is shipped as a sharded multi-part GGUF (`unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf`).
Ollama, llama.cpp, LM Studio, and wllama auto-stitch from the first part,
so the UX matches a single file.

| Quant   | Parts | Total  | Browser (wllama)        | Desktop | Notes |
|---------|-------|--------|-------------------------|---------|-------|
| Q2_K    | 3     | 2.8 GB | ✅                      | ✅      | Smallest, biggest quality drop |
| Q3_K_M  | 3     | 3.0 GB | ✅                      | ✅      | Marginal size win over Q4 |
| Q4_K_M  | 3     | 3.2 GB | ✅                      | ✅      | **Recommended default** |
| Q6_K    | 4     | 3.6 GB | ✅                      | ✅      | Higher fidelity |
| Q8_0    | 4     | 4.6 GB | ❌ (tensor over 2 GB)   | ✅      | Highest fidelity; desktop only |

`mmproj-unbound-e2b.gguf` (vision projector, ~942 MB) sits at the repo
root; load it alongside any LM quant for image input. See **Vision** below.
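
As a rough illustration of the shard naming pattern, the sketch below prints every expected filename for a quant given its part count from the table. The helper name `shard_names` is hypothetical, and the five-digit, 1-based, zero-padded index is an assumption inferred from the filenames used in the Run examples.

```bash
# Print every shard filename for one quant.
# Assumption: indices are 1-based and zero-padded to five digits,
# matching names such as unbound-e2b.Q4_K_M-00001-of-00003.gguf.
shard_names() {
  local quant="$1" parts="$2" i
  for ((i = 1; i <= parts; i++)); do
    printf 'unbound-e2b.%s-%05d-of-%05d.gguf\n' "$quant" "$i" "$parts"
  done
}

# List the three Q4_K_M shards.
shard_names Q4_K_M 3
```

A list like this can help confirm that every part of a quant has finished downloading before pointing a runtime at the first shard.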

## Sampling

- **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
- **Factual / brand questions** → drop `temperature` to ~0.3–0.5.
- llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default; set
  `enable_thinking: false` in chat-template kwargs for shorter replies.
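
The settings above could be mapped to llama.cpp sampling flags roughly as follows. This is a minimal sketch: the helper name `sampling_flags` and the profile names are hypothetical, `0.4` is simply a midpoint of the suggested 0.3–0.5 range, and keeping `top_p` and `top_k` unchanged for factual prompts is an assumption.

```bash
# Emit llama.cpp sampling flags for a named profile.
# Assumptions: 0.4 is an arbitrary pick inside the suggested 0.3-0.5 range,
# and top_p / top_k stay at the creative defaults for factual prompts.
sampling_flags() {
  case "$1" in
    creative) echo "--temp 1.0 --top-p 0.95 --top-k 64" ;;
    factual)  echo "--temp 0.4 --top-p 0.95 --top-k 64" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Show the flags for factual questions.
sampling_flags factual
```

The output is meant to be word-split into a command line, for example `./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf --jinja $(sampling_flags factual) -p "your prompt"`.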

For Ollama, pull from the **Ollama Registry**:
`ollama pull hf.co/...` [doesn't yet support sharded GGUFs](https://github.com/ollama/ollama/issues/5245).
The registry version is a single-file Q4_K_M with a bundled Modelfile
(`temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192`
and an identity-grounding system prompt).
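
To change the bundled defaults, one option is a derived Modelfile that overrides only the parameters you care about. The sketch below writes one with a lower temperature. The helper name `write_modelfile`, the file name `Modelfile.factual`, the model name `unbound-factual`, and the `0.4` value are all illustrative choices, not part of the release. The bundled system prompt is not reproduced here, so check with `ollama show --modelfile evalengine/unbound-e2b` whether it carries over to a derived model.

```bash
# Write a derived Modelfile that changes the sampling temperature
# and repeats the other parameters listed for the registry build.
write_modelfile() {
  local out="$1" temp="$2"
  cat > "$out" <<EOF
FROM evalengine/unbound-e2b
PARAMETER temperature $temp
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
EOF
}

write_modelfile Modelfile.factual 0.4

# Then, with Ollama installed and the base model pulled:
#   ollama create unbound-factual -f Modelfile.factual
#   ollama run unbound-factual
```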

## Run

```bash
# Ollama Registry (single-file Q4_K_M, identity-grounded Modelfile)
ollama pull evalengine/unbound-e2b
ollama run  evalengine/unbound-e2b
```

```bash
# llama.cpp: point at the FIRST shard; the rest auto-stitch
./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf -p "your prompt"
```

```js
// wllama (browser): Q8_0 has a tensor over 2 GB; use Q2/Q3/Q4/Q6
import { Wllama } from '@wllama/wllama';
const wllama = new Wllama(/* … */);
await wllama.loadModelFromHF(
  'evalengine/unbound-e2b-GGUF',
  'unbound-e2b.Q4_K_M-00001-of-00003.gguf'
);
```

## Vision / image input (optional)

`mmproj-unbound-e2b.gguf` enables image-to-text. Pair with any LM quant via
`llama-mtmd-cli` or `llama-gemma3-cli`:

```bash
./llama-mtmd-cli \
  -m   unbound-e2b.Q4_K_M-00001-of-00003.gguf \
  --mmproj mmproj-unbound-e2b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"
```

> **Disclaimer.** The vision encoder is **Google's original weights,
> unchanged**; abliteration only touched the language model. The LM is
> uncensored, but the vision encoder may still suppress features for
> content classes Google's base was tuned against. We have **not
> benchmarked the visual axis**. Treat as preview.

Text-only: skip `--mmproj` entirely. Standard `llama-cli` / Ollama / LM
Studio do not need the mmproj file.
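
The choice between the text-only and vision paths could be wrapped in a small helper. This is only a sketch: `build_cmd` is a hypothetical name, and it prints the command one argument per line rather than running it, so it works without llama.cpp installed.

```bash
# Print a llama.cpp command, one argument per line.
# With an image argument, use llama-mtmd-cli plus the mmproj file;
# without one, fall back to plain llama-cli and skip --mmproj.
build_cmd() {
  local model="$1" prompt="$2" image="${3:-}"
  if [ -n "$image" ]; then
    printf '%s\n' ./llama-mtmd-cli -m "$model" \
      --mmproj mmproj-unbound-e2b.gguf --image "$image" -p "$prompt"
  else
    printf '%s\n' ./llama-cli -m "$model" -p "$prompt"
  fi
}

# Vision path: prints the llama-mtmd-cli invocation.
build_cmd unbound-e2b.Q4_K_M-00001-of-00003.gguf "What is in this image?" photo.png
```

In bash 4 or later, the printed arguments can be read back into an array with `mapfile -t cmd < <(build_cmd ...)` and executed as `"${cmd[@]}"`.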

## Acknowledgements

Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
[TRL](https://github.com/huggingface/trl). Abliteration via
[heretic](https://github.com/p-e-w/heretic). Environment from
[autoresearch](https://github.com/karpathy/autoresearch). Compliance training data distilled from the [AEON](https://huggingface.co/AEON-7) uncensored teacher model.

## Links

- **Unbound**: [unbound.evalengine.ai](https://unbound.evalengine.ai)
- **Eval Engine**: [evalengine.ai](https://evalengine.ai) · [X / Twitter](https://x.com/eval_engine)
- **Token**: [CoinGecko](https://www.coingecko.com/en/coins/chromia-s-eval-by-virtuals) · [CoinMarketCap](https://coinmarketcap.com/currencies/eval-engine/)

## License

Apache-2.0, inherited from `google/gemma-4-E2B-it`. Full model card +
benchmarks at [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b).