metadata
license: apache-2.0
base_model: evalengine/unbound-e2b
base_model_relation: quantized
tags:
  - gguf
  - gemma4
  - gemma
  - gemma-4
  - uncensored
  - on-device
pipeline_tag: image-text-to-text

Unbound

Unbound E2B GGUF: because there is no boundary

No guarantee. Use at your own risk. Reduced safety filtering; can produce harmful or false output. Provided as-is.

GGUF quants of evalengine/unbound-e2b for Ollama, llama.cpp, LM Studio, and wllama (in-browser). Built by Chromia and Eval Engine.

Available quants

Each quant ships as a sharded multi-part GGUF (unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf). Point Ollama, llama.cpp, LM Studio, or wllama at the first part and the remaining parts are stitched in automatically, so the experience is the same as with a single file.

| Quant | Parts | Total size | Browser (wllama) | Desktop | Notes |
|---|---|---|---|---|---|
| Q2_K | 3 | 2.8 GB | ✅ | ✅ | Smallest, biggest quality drop |
| Q3_K_M | 3 | 3.0 GB | ✅ | ✅ | Marginal size win over Q4 |
| Q4_K_M | 3 | 3.2 GB | ✅ | ✅ | Recommended default |
| Q6_K | 4 | 3.6 GB | ✅ | ✅ | Higher fidelity |
| Q8_0 | 4 | 4.6 GB | ❌ (has a tensor over 2 GB) | ✅ | Highest fidelity; desktop only |
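
If you need the full list of shard filenames for a quant (for example, to download or checksum every part), the naming pattern can be expanded with a few lines of shell. This is a minimal sketch: `shard_names` is a hypothetical helper, not part of any tool mentioned here, and it assumes the zero-padded five-digit numbering shown in the filename pattern.

```shell
# Print every shard filename for one quant, following the pattern
# unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf.
# shard_names is a hypothetical helper written for this example.
shard_names() (
  quant=$1   # quant label, e.g. Q4_K_M
  parts=$2   # total number of parts for that quant, e.g. 3
  i=1
  while [ "$i" -le "$parts" ]; do
    # %05d zero-pads both the part index and the part count to five digits
    printf 'unbound-e2b.%s-%05d-of-%05d.gguf\n' "$quant" "$i" "$parts"
    i=$((i + 1))
  done
)

shard_names Q4_K_M 3
```

The first name printed is the one to pass to a loader; the rest only need to sit in the same directory.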

mmproj-unbound-e2b.gguf (vision projector, ~942 MB) sits at the repo root; load it alongside any LM quant for image input. See Vision below.

Sampling

  • Creative / open-ended → temperature=1.0, top_p=0.95, top_k=64.
  • Factual / brand questions → drop temperature to ~0.3–0.5.
  • llama.cpp: pass --jinja. Gemma 4 thinking mode is on by default; set enable_thinking: false in chat-template kwargs for shorter replies.
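
The two sampling profiles above map onto llama.cpp's `--temp`, `--top-p`, and `--top-k` flags. The sketch below wraps them in a hypothetical `sampling_flags` helper. Two assumptions are baked in: the factual profile uses 0.4 (a value picked from the suggested 0.3 to 0.5 range), and top_p and top_k stay unchanged when only the temperature is lowered.

```shell
# sampling_flags is a hypothetical helper that prints llama.cpp sampling
# flags for the two profiles described in this section.
sampling_flags() (
  case "$1" in
    creative) temp=1.0 ;;
    factual)  temp=0.4 ;;   # assumption: a value inside the 0.3 to 0.5 range
    *) echo "usage: sampling_flags creative|factual" >&2; exit 1 ;;
  esac
  # Assumption: top_p and top_k are kept as-is for both profiles.
  printf '%s\n' "--temp $temp --top-p 0.95 --top-k 64"
)

sampling_flags creative
sampling_flags factual

# Example use (needs a local llama.cpp build; the substitution is left
# unquoted on purpose so the flags split into separate arguments):
#   ./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf --jinja \
#     $(sampling_flags factual) -p "your prompt"
```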

For Ollama, pull from the Ollama Registry, because ollama pull hf.co/... doesn't yet support sharded GGUFs. The registry version is a single-file Q4_K_M with a bundled Modelfile (temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192, plus an identity-grounding system prompt).
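
To change these defaults, one option is a derived Modelfile that starts `FROM` the registry model and restates the parameters you want to adjust. The sketch below writes out the values quoted above so they can be edited. It is not the bundled Modelfile itself: `write_modelfile` and the model name `my-unbound` are hypothetical, and the system prompt is left out because its text is not given here.

```shell
# write_modelfile is a hypothetical helper: it writes a Modelfile that layers
# the sampling parameters quoted above on top of the registry model.
# No SYSTEM line is written, since the bundled prompt text is not published here.
write_modelfile() {
  cat > "$1" <<'EOF'
FROM evalengine/unbound-e2b
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
EOF
}

out="${TMPDIR:-/tmp}/Modelfile.unbound"
write_modelfile "$out"
cat "$out"

# Then, with Ollama installed (my-unbound is a placeholder name):
#   ollama create my-unbound -f "$out"
#   ollama run my-unbound
```

Edit the `PARAMETER` lines before running `ollama create`, for example raising `temperature` for creative use.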

Run

# Ollama Registry (single-file Q4_K_M, identity-grounded Modelfile)
ollama pull evalengine/unbound-e2b
ollama run  evalengine/unbound-e2b
# llama.cpp: point at the FIRST shard; the rest auto-stitch
./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf -p "your prompt"
// wllama (browser): Q8_0 has a tensor over 2 GB; use Q2/Q3/Q4/Q6
import { Wllama } from '@wllama/wllama';
const wllama = new Wllama(/* … */);
await wllama.loadModelFromHF(
  'evalengine/unbound-e2b-GGUF',
  'unbound-e2b.Q4_K_M-00001-of-00003.gguf'
);

Vision / image input (optional)

mmproj-unbound-e2b.gguf enables image-to-text. Pair with any LM quant via llama-mtmd-cli or llama-gemma3-cli:

./llama-mtmd-cli \
  -m   unbound-e2b.Q4_K_M-00001-of-00003.gguf \
  --mmproj mmproj-unbound-e2b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"

Disclaimer. The vision encoder uses Google's original weights, unchanged; abliteration only touched the language model. The LM is uncensored, but the vision encoder may still suppress features for content classes Google's base was tuned against. We have not benchmarked vision performance. Treat it as a preview.

Text-only: skip --mmproj entirely. Standard llama-cli / Ollama / LM Studio do not need the mmproj file.

Acknowledgements

Fine-tuned with Unsloth + HF TRL. Abliteration via heretic. Environment from autoresearch. Compliance training data distilled from the AEON uncensored teacher model.

Links

License

Apache-2.0, inherited from google/gemma-4-E2B-it. Full model card + benchmarks at evalengine/unbound-e2b.