---
license: apache-2.0
base_model: evalengine/unbound-e4b
base_model_relation: quantized
tags:
- gguf
- gemma4
- gemma
- gemma-4
- uncensored
- on-device
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="unbound-logo.svg" alt="Unbound" width="160" height="160">
</p>

# Unbound E4B GGUF: *because there is no boundary*

> **No guarantee: use at your own risk.** Reduced safety filtering; can
> produce harmful or false output. Provided as-is.

Desktop GGUF quants of [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b)
for Ollama, llama.cpp, and LM Studio. Built by
[Chromia](https://x.com/Chromia) and [Eval Engine](https://x.com/eval_engine).

> **Looking for the browser/wllama builds?** They live in their own repo:
> [`evalengine/unbound-e4b-wllama-gguf`](https://huggingface.co/evalengine/unbound-e4b-wllama-gguf).
> E4B's `per_layer_token_embd` tensor needs special quantization to fit
> wllama's 2 GB ArrayBuffer cap. Keeping the desktop and browser variants
> in separate repos avoids HF GGUF UI aggregation collisions.

## Available quants

Each quant ships as a sharded multi-part GGUF
(`unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf`). Ollama, llama.cpp, and LM
Studio stitch the shards together automatically when pointed at the first
part, so the experience is the same as with a single file.

The embedding tensor is kept at the llama.cpp default of Q6_K; the largest
part is ~2.15 GB, which is fine for desktop but **won't load in a browser**.

| Quant   | Parts | Total   | Notes |
|---------|-------|---------|-------|
| Q2_K    | 4     | 4.08 GB | Smallest, biggest quality drop |
| Q3_K_M  | 4     | 4.49 GB | Modest size win over Q4 (embedding precision dominates) |
| Q4_K_M  | 4     | 4.94 GB | **Recommended default** |
| Q6_K    | 5     | 5.75 GB | Higher fidelity |
| Q8_0    | 6     | 7.43 GB | Highest fidelity |
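
The naming pattern above can be expanded mechanically. The sketch below is a hypothetical helper (`shard_names` is not shipped with this repo) that prints the expected shard filenames for a quant, taking the part count from the table:

```bash
# Hypothetical helper: print the shard filenames for one quant, following
# the unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf pattern described above.
# Arguments: quant name, number of parts (from the "Parts" column).
shard_names() {
  local quant="$1" parts="$2" i=1
  while [ "$i" -le "$parts" ]; do
    printf 'unbound-e4b.%s-%05d-of-%05d.gguf\n' "$quant" "$i" "$parts"
    i=$((i + 1))
  done
}

# The first line printed is the file to hand to the loader.
shard_names Q4_K_M 4
```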

## Sampling

- **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
- **Factual / brand questions** → drop `temperature` to ~0.3–0.5.
- llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default; set
  `enable_thinking: false` in chat-template kwargs for shorter replies.
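
As a rough sketch, the two presets above map onto llama.cpp's `--temp`, `--top-p`, and `--top-k` flags. The `sampling_flags` helper is hypothetical. The factual preset assumes `0.4` as a midpoint of the suggested range and assumes `top_p` and `top_k` stay at the creative values, which the list above does not state explicitly.

```bash
# Hypothetical helper: translate a preset name into llama.cpp sampling flags.
sampling_flags() {
  case "$1" in
    creative) echo "--temp 1.0 --top-p 0.95 --top-k 64" ;;
    # Assumption: 0.4 sits inside the ~0.3-0.5 range; top_p/top_k unchanged.
    factual)  echo "--temp 0.4 --top-p 0.95 --top-k 64" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Usage (commented out because it needs the model files on disk):
# ./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf --jinja \
#   $(sampling_flags factual) -p "your prompt"
```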

For Ollama, pull from the **Ollama Registry**, since
`ollama pull hf.co/...` [doesn't yet support sharded GGUFs](https://github.com/ollama/ollama/issues/5245).
The registry version is a single-file Q4_K_M with a bundled Modelfile
(`temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192`
and an identity-grounding system prompt).
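
If the bundled parameters do not suit a task, one option is a local Modelfile layered on top of the registry model. The sketch below is illustrative only: the file name `Modelfile.unbound`, the model name `unbound-factual`, and the lower temperature are assumptions, and it relies on Ollama's usual behavior of inheriting the base model's template and system prompt when `FROM` names an existing model.

```bash
# Write a Modelfile that starts from the registry model and overrides only
# the temperature; the other values repeat the bundled parameters.
cat > Modelfile.unbound <<'EOF'
FROM evalengine/unbound-e4b
PARAMETER temperature 0.3
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
EOF

# Then, with Ollama installed and the base model pulled:
# ollama create unbound-factual -f Modelfile.unbound
# ollama run unbound-factual
```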

## Run

```bash
# Ollama Registry (single-file Q4_K_M, identity-grounded Modelfile)
ollama pull evalengine/unbound-e4b
ollama run  evalengine/unbound-e4b
```

```bash
# llama.cpp: point at the FIRST shard
./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf -p "your prompt"
```
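
Because the loader is pointed only at the first shard, a partial download can fail late. A minimal pre-flight sketch, assuming all shards sit in one directory; `check_shards` is a hypothetical helper, not part of llama.cpp:

```bash
# Hypothetical pre-flight check: confirm every shard of a quant exists in a
# directory. Returns 0 when all parts are present, non-zero otherwise.
# Arguments: directory, quant name, number of parts.
check_shards() {
  local dir="$1" quant="$2" parts="$3" i=1 missing=0 f
  while [ "$i" -le "$parts" ]; do
    f=$(printf '%s/unbound-e4b.%s-%05d-of-%05d.gguf' "$dir" "$quant" "$i" "$parts")
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  [ "$missing" -eq 0 ]
}

# Usage (commented out because it needs the model files on disk):
# check_shards . Q4_K_M 4 && \
#   ./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf -p "your prompt"
```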

## Vision / image input (optional)

`mmproj-unbound-e4b.gguf` enables image-to-text. Pair with any LM quant via
`llama-mtmd-cli` or `llama-gemma3-cli`:

```bash
./llama-mtmd-cli \
  -m   unbound-e4b.Q4_K_M-00001-of-00004.gguf \
  --mmproj mmproj-unbound-e4b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"
```

> **Disclaimer.** The vision encoder is **Google's original weights,
> unchanged**; abliteration only touched the language model. The LM is
> uncensored, but the vision encoder may still suppress features for
> content classes Google's base was tuned against. We have **not
> benchmarked the visual axis**. Treat as preview.

Text-only: skip `--mmproj`. Standard `llama-cli` / Ollama / LM Studio do
not need the mmproj file.

## Acknowledgements

Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
[TRL](https://github.com/huggingface/trl). Abliteration via
[heretic](https://github.com/p-e-w/heretic). Environment from
[autoresearch](https://github.com/karpathy/autoresearch). Compliance training data distilled from the [AEON](https://huggingface.co/AEON-7) uncensored teacher model.

## Links

- **Unbound**: [unbound.evalengine.ai](https://unbound.evalengine.ai)
- **Eval Engine**: [evalengine.ai](https://evalengine.ai) · [X / Twitter](https://x.com/eval_engine)
- **Token**: [CoinGecko](https://www.coingecko.com/en/coins/chromia-s-eval-by-virtuals) · [CoinMarketCap](https://coinmarketcap.com/currencies/eval-engine/)

## License

Apache-2.0, inherited from `google/gemma-4-E4B-it`. Full model card +
benchmarks at [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b).