Instructions to use evalengine/unbound-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use evalengine/unbound-e2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="evalengine/unbound-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("evalengine/unbound-e2b")
model = AutoModelForImageTextToText.from_pretrained("evalengine/unbound-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
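
The README changed in this commit recommends Gemma's default sampling settings for creative or open-ended prompts and a lower temperature (about 0.3 to 0.5) for factual questions. A minimal sketch of how those settings might be passed to `model.generate` follows. The helper name `sampling_kwargs`, the mode names, and the choice of 0.4 as the factual temperature are illustrative assumptions, not part of the model card.

```python
# Sampling presets based on the README's Sampling section.
# do_sample=True is needed so that temperature/top_p/top_k take effect.
GEMMA_DEFAULTS = {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "top_k": 64}


def sampling_kwargs(mode: str = "creative") -> dict:
    """Return keyword arguments for model.generate() (hypothetical helper)."""
    kwargs = dict(GEMMA_DEFAULTS)  # copy so the preset is never mutated
    if mode == "factual":
        # Assumption: 0.4 is picked from the README's suggested ~0.3-0.5 range;
        # top_p and top_k are left at the Gemma defaults.
        kwargs["temperature"] = 0.4
    elif mode != "creative":
        raise ValueError(f"unknown mode: {mode!r}")
    return kwargs


print(sampling_kwargs("factual"))
```

With the model and inputs loaded as above, the presets could be applied as `model.generate(**inputs, max_new_tokens=40, **sampling_kwargs("factual"))`.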

Local Apps

vLLM

How to use evalengine/unbound-e2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "evalengine/unbound-e2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "evalengine/unbound-e2b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
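
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the function names `build_chat_request` and `chat` are hypothetical, and it assumes the server started by `vllm serve` is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, image_url: str):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8000",  # assumption: default vLLM port, as in the curl call
    "evalengine/unbound-e2b",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Uncomment once the server is running:
# print(chat(req))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running server. Changing the base URL to `http://localhost:30000` should target an SGLang server started with the port used elsewhere on this page.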

Use Docker

docker model run hf.co/evalengine/unbound-e2b

SGLang

How to use evalengine/unbound-e2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "evalengine/unbound-e2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "evalengine/unbound-e2b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "evalengine/unbound-e2b" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use evalengine/unbound-e2b with Docker Model Runner:
```
docker model run hf.co/evalengine/unbound-e2b
```

johnsonchromia committed 3 days ago

Commit 3772eb0 (verified)

1 Parent(s): ecb12de

README: compact pass — keep essentials, drop redundancy

Files changed (1)

README.md +31 -57

@@ -17,87 +17,61 @@ library_name: transformers
 # Unbound E2B — *because there is no boundary*
-> **No guarantee — use at your own risk.** This model has reduced safety filtering
-> and can produce harmful, false, biased, or otherwise unsafe output. Provided
-> as-is, with no warranty of any kind. You are solely responsible for how you
-> use it and for complying with all applicable laws.
-Uncensored variant of `google/gemma-4-E2B-it` from the [**Chromia**](https://x.com/Chromia) & [**Eval Engine**](https://x.com/eval_engine)
-team. This repo holds the merged HF weights; for the **on-device GGUF builds**
-(Ollama / llama.cpp / LM Studio / [wllama](https://github.com/ngxson/wllama) in-browser), see
-[`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF).
-## What this is for
-An LLM you can run on your laptop or phone that just *answers* — no API key,
-no internet connection, no refusals. Useful for:
-- **Offline use** — travel, flights, remote sites, air-gapped environments.
-- **Security research & malware analysis** — safely discuss and reverse-engineer
-  code involving attack vectors, exploitation techniques, and other material
-  that standard models will strictly flag as harmful.
-- **Unrestricted coding & scripting** — write scripts or utilities that
-  interact with system files, API endpoints, or anything else, without the
-  model lecturing you on ethical usage.
-- **Content pipelines** that need direct compliance instead of a refusal layer.
-- **Privacy-sensitive workflows** where prompts shouldn't leave the device.
-Base capability is preserved close to `gemma-4-E2B-it`, so it also doubles
-as a general-purpose 2B chat model.
 ## Benchmarks (vs base `gemma-4-E2B-it`)
 | Axis | Base | Unbound E2B | Δ |
 |---|---|---|---|
 | Refusal rate (AdvBench 520) | 98.46% | **2.31%** | **−96.15 pts** |
-| Useful-compliance rate | 0.96% | **24.23%** | +23.27 pts |
-| Hallucination rate | 1.35% | 18.85% | +17.50 pts |
-| Coherence on benign prompts | 1.0 | 0.80 | −0.20 |
-| TruthfulQA mc2 (lm-eval, `--limit 100`) | 0.4576 | 0.4676 | +1.00 |
-| MMLU (lm-eval, `--limit 100`) | 0.2905 | 0.2807 | −0.98 |
-| GSM8K (lm-eval, `--limit 100`) | 0.1250 | 0.1400 | +1.50 |
 | KL divergence vs base | 0 | 3.80 | (SFT-expected) |
-## Recommended sampling
-Depends on what you're doing:
-- **Creative writing / open-ended / general chat** → use Gemma's training
-  defaults: `temperature=1.0, top_p=0.95, top_k=64`.
-- **Factual or brand/identity questions** → drop `temperature` to ~0.3–0.5
-  for sharper recall. The model knows Chromia / Eval Engine / Rell, but those
-  answers are sensitive to sampling noise at temperature 1.0.
-- **llama.cpp**: pass `--jinja` for proper chat-template handling.
-- **Gemma 4 thinking mode** is on by default. For shorter/faster replies on a
-  2B model, set `enable_thinking: false` in the chat-template kwargs.
-Some edge-case prompts may deflect on the first ask; a re-ask or strategic
-re-phrasing usually gets through.
-## Run on-device (GGUF)
-The phone-deployable build lives in
-[`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF) —
-Q4_K_M / Q6_K / Q8_0, all shipped as split multi-part files (browser-safe via
-wllama; Ollama and llama.cpp auto-stitch on the first part):
 ```bash
 ollama pull hf.co/evalengine/unbound-e2b-GGUF
 ollama run  hf.co/evalengine/unbound-e2b-GGUF
 ```
-## Run in transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-e2b")
 tok   = AutoTokenizer.from_pretrained("evalengine/unbound-e2b")
 ```
 ## Acknowledgements
-- Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + Huggingface's [TRL](https://github.com/huggingface/trl).
-- Abliteration via [heretic](https://github.com/p-e-w/heretic).
-- Environment and training discipline ported from [autoresearch](https://github.com/karpathy/autoresearch).
 ## License

 # Unbound E2B — *because there is no boundary*
+> **No guarantee — use at your own risk.** This model has reduced safety
+> filtering and can produce harmful, false, biased, or unsafe output.
+> Provided as-is; you are responsible for compliance with applicable laws.
+Uncensored finetune of `google/gemma-4-E2B-it` by the
+[Chromia](https://x.com/Chromia) & [Eval Engine](https://x.com/eval_engine)
+team. Runs on a phone or laptop, no API, no refusals.
+This repo holds the merged HF weights. On-device GGUF builds (Ollama,
+llama.cpp, LM Studio, [wllama](https://github.com/ngxson/wllama) in-browser)
+are at [`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF).
 ## Benchmarks (vs base `gemma-4-E2B-it`)
 | Axis | Base | Unbound E2B | Δ |
 |---|---|---|---|
 | Refusal rate (AdvBench 520) | 98.46% | **2.31%** | **−96.15 pts** |
+| Useful-compliance rate | 0.96% | 24.23% | +23.27 pts |
+| Hallucination (on harmful prompts) | 1.35% | 18.85% | +17.50 pts |
+| Coherence (benign prompts) | 1.00 | 0.80 | −0.20 |
+| TruthfulQA mc2 (`--limit 100`) | 0.458 | 0.468 | +1.0 pt |
+| MMLU (`--limit 100`) | 0.291 | 0.281 | −1.0 pt |
+| GSM8K (`--limit 100`) | 0.125 | 0.140 | +1.5 pt |
 | KL divergence vs base | 0 | 3.80 | (SFT-expected) |
+## Sampling
+- **Creative / open-ended** → Gemma defaults: `temperature=1.0, top_p=0.95, top_k=64`.
+- **Factual / brand questions** → drop `temperature` to ~0.3–0.5 for sharper recall.
+- llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default — set
+  `enable_thinking: false` in chat-template kwargs for shorter replies.
+Some edge-case prompts may deflect on the first ask; a re-ask usually gets through.
+## Use
 ```bash
+# on-device (GGUF)
 ollama pull hf.co/evalengine/unbound-e2b-GGUF
 ollama run  hf.co/evalengine/unbound-e2b-GGUF
 ```
 ```python
+# transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-e2b")
 tok   = AutoTokenizer.from_pretrained("evalengine/unbound-e2b")
 ```
 ## Acknowledgements
+Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
+[TRL](https://github.com/huggingface/trl). Abliteration via
+[heretic](https://github.com/p-e-w/heretic). Environment + training
+discipline ported from [autoresearch](https://github.com/karpathy/autoresearch).
 ## License