johnsonchromia committed on
Commit 1a2b188 · verified · 1 Parent(s): 670b14c

README: compact pass — keep essentials, drop redundancy

Files changed (1)
  1. README.md +55 -114
README.md CHANGED
@@ -18,105 +18,70 @@ pipeline_tag: text-generation
18
 
19
  # Unbound E2B GGUF — *because there is no boundary*
20
 
21
- > **No guarantee — use at your own risk.** This model has reduced safety filtering
22
- > and can produce harmful, false, biased, or otherwise unsafe output. Provided
23
- > as-is, with no warranty of any kind. You are solely responsible for how you
24
- > use it and for complying with all applicable laws.
25
 
26
- GGUF quantizations of [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b)
27
- for on-device deployment via Ollama, llama.cpp, LM Studio, [wllama](https://github.com/ngxson/wllama)
28
- (in-browser), and similar runtimes.
29
-
30
- Built by [Chromia](https://x.com/Chromia) and [Eval Engine](https://x.com/eval_engine).
31
 
32
  ## Available quants
33
 
34
- Each quant lives in its own folder; inside, the model is split into multi-part
35
- GGUFs (`*-00001-of-0000N.gguf` ...). Ollama, llama.cpp, LM Studio, and wllama
36
- auto-stitch on the first part — same UX as a single file.
37
-
38
- | Quant | Folder | Parts | Total | Browser (wllama) | Desktop (Ollama / llama.cpp / LM Studio) | Notes |
39
- |---------|-------------|-------|--------|------------------|------------------------------------------|-------|
40
- | Q2_K | `Q2_K/` | 3 | 2.8 GB | ✅ | ✅ | Smallest disk footprint; biggest quality drop |
41
- | Q3_K_M | `Q3_K_M/` | 3 | 3.0 GB | ✅ | ✅ | Marginal size win over Q4 (embedding precision dominates total size) |
42
- | Q4_K_M | `Q4_K_M/` | 3 | 3.2 GB | ✅ | ✅ | **Recommended on-device default — best size/quality** |
43
- | Q6_K | `Q6_K/` | 4 | 3.6 GB | ✅ | ✅ | Higher fidelity, still browser-safe |
44
- | Q8_0 | `Q8_0/` | 4 | 4.6 GB | ❌ (over 2 GB) | ✅ | Highest fidelity; one tensor exceeds the browser ArrayBuffer limit, so desktop runtimes only |
45
-
46
- `mmproj-unbound-e2b.gguf` (the vision projector, ~942 MB) sits at the repo
47
- root — load it alongside any LM quant for image input. See the "Vision"
48
- section below.
49
-
50
- ## Recommended sampling
51
-
52
- - **Creative writing / open-ended / general chat** → Gemma defaults:
53
- `temperature=1.0, top_p=0.95, top_k=64`.
54
- - **Factual or brand/identity questions** → lower `temperature` to ~0.3–0.5
55
- for sharper recall.
56
- - **llama.cpp**: pass `--jinja` for proper chat-template handling.
57
- - **Gemma 4 thinking mode** is on by default. Set `enable_thinking: false`
58
- in the chat-template kwargs for shorter/faster replies on this 2B model.
59
 
60
- Some edge-case prompts may deflect on the first ask; a re-ask usually gets
61
- through.
62
 
63
- ## Default sampling (Ollama)
64
 
65
- When you `ollama pull hf.co/evalengine/unbound-e2b-GGUF`, the bundled
66
- `Modelfile` sets these defaults, tuned for factual recall:
67
 
68
- - `temperature = 0.6` (lower than Gemma's training default of 1.0 keeps
69
- the model from hallucinating brand/identity facts at high temperature)
70
- - `top_p = 0.95`, `top_k = 64`, `repeat_penalty = 1.05`, `num_ctx = 8192`
71
- - A short system prompt that grounds the model's identity (model name,
72
- parameter count, modality, team) so brand questions stay sharp.
73
 
74
- **To override per-session in Ollama:**
75
 
76
- ```
77
- ollama run hf.co/evalengine/unbound-e2b-GGUF
78
- >>> /set parameter temperature 1.0 # creative / open-ended
79
- >>> /set parameter temperature 0.3 # max factual / brand questions
80
- ```
81
-
82
- For llama.cpp users, pass `--temp 0.6 --top-p 0.95 --top-k 64` and
83
- include the SYSTEM line from the `Modelfile` as your `--system` argument.
84
-
85
- ## Run with Ollama
86
 
87
  ```bash
88
  ollama pull hf.co/evalengine/unbound-e2b-GGUF
89
  ollama run hf.co/evalengine/unbound-e2b-GGUF
90
  ```
91
 
92
- (Defaults to Q4_K_M. Ollama auto-stitches the split parts on load.)
93
-
94
- ## Run with llama.cpp
95
-
96
  ```bash
97
- # point at the FIRST part — llama.cpp follows the chain automatically
98
  ./llama-cli -m Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
99
  ```
100
 
101
- ## Vision / image input (optional)
102
-
103
- Gemma 4 E2B ships a vision tower; we extracted it as `mmproj-unbound-e2b.gguf`
104
- (942 MB) in this repo. Pair it with any of the LM quants above to enable
105
- image-to-text inference.
106
 
107
- > **Disclaimer.** The vision encoder is **Google's original weights, unchanged**.
108
- > Unbound's abliteration + SFT-heal only touched the *language model* — the
109
- > vision tower was frozen during training. Practical consequences:
110
- >
111
- > - The LM is uncensored, so it will discuss whatever it *sees* directly.
112
- > - But the vision encoder still has Google's original alignment baked into
113
- > visual feature extraction. It may down-weight or distort features for
114
- > content classes Google's base model was tuned to suppress.
115
- > - We have **not benchmarked the visual axis** (no measured refusal rate /
116
- > coherence / hallucination on image inputs). Treat vision as a preview
117
- > feature, not a flagship one.
118
 
119
- ### Run with vision (llama.cpp `llama-mtmd-cli`)
120
 
121
  ```bash
122
  ./llama-mtmd-cli \
@@ -126,47 +91,23 @@ image-to-text inference.
126
  -p "What is in this image?"
127
  ```
128
 
129
- `llama-gemma3-cli` works the same way and is Gemma-specific.
130
-
131
- ### Run text-only (no `--mmproj`)
132
-
133
- ```bash
134
- ./llama-cli -m Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
135
- ```
136
-
137
- The LM quants work standalone — you do **not** need `mmproj-unbound-e2b.gguf`
138
- unless you want image input. Ollama / LM Studio's standard text chat works
139
- out of the box; the mmproj file is only loaded when you point a multimodal
140
- runtime at it.
141
-
142
- ## Run in the browser (wllama)
143
-
144
- [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
145
- that runs entirely in the browser — no server, no install. Use Q2_K, Q3_K_M,
146
- Q4_K_M, or Q6_K (Q8_0 has a tensor above the 2 GB ArrayBuffer limit).
147
- Browser inference is **text-only** for this model (wllama doesn't currently
148
- load `mmproj` for vision):
149
-
150
- ```js
151
- import { Wllama } from '@wllama/wllama';
152
- const wllama = new Wllama(/* … */);
153
- await wllama.loadModelFromHF(
154
- 'evalengine/unbound-e2b-GGUF',
155
- 'Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf' // wllama follows the chain
156
- );
157
- ```
158
-
159
- ## About the base
160
 
161
- See [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b)
162
- for the full model card, benchmarks, intended use, and the merged HF weights.
163
 
164
  ## Acknowledgements
165
 
166
- - Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + Huggingface's [TRL](https://github.com/huggingface/trl).
167
- - Abliteration via [heretic](https://github.com/p-e-w/heretic).
168
- - Environment and training discipline ported from [autoresearch](https://github.com/karpathy/autoresearch).
169
 
170
  ## License
171
 
172
- Apache-2.0, inherited from `google/gemma-4-E2B-it`.
18
 
19
  # Unbound E2B GGUF — *because there is no boundary*
20
 
21
+ > **No guarantee — use at your own risk.** Reduced safety filtering; can
22
+ > produce harmful or false output. Provided as-is.
23
 
24
+ GGUF quants of [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b)
25
+ for Ollama, llama.cpp, LM Studio, and [wllama](https://github.com/ngxson/wllama)
26
+ (in-browser). Built by [Chromia](https://x.com/Chromia) and
27
+ [Eval Engine](https://x.com/eval_engine).
28
 
29
  ## Available quants
30
 
31
+ Each quant lives in its own folder; inside, the model is split into
32
+ multi-part GGUFs (`*-00001-of-0000N.gguf`). All runtimes auto-stitch on the
33
+ first part — same UX as a single file.
34
 
35
+ | Quant | Folder | Parts | Total | Browser (wllama) | Desktop | Notes |
36
+ |---------|-------------|-------|--------|------------------|---------|-------|
37
+ | Q2_K | `Q2_K/` | 3 | 2.8 GB | ✅ | ✅ | Smallest, biggest quality drop |
38
+ | Q3_K_M | `Q3_K_M/` | 3 | 3.0 GB | ✅ | ✅ | Marginal size win over Q4 |
39
+ | Q4_K_M | `Q4_K_M/` | 3 | 3.2 GB | ✅ | ✅ | **Recommended default** |
40
+ | Q6_K | `Q6_K/` | 4 | 3.6 GB | ✅ | ✅ | Higher fidelity |
41
+ | Q8_0 | `Q8_0/` | 4 | 4.6 GB | ❌ (over 2 GB) | ✅ | Highest fidelity; desktop only |
42
 
43
+ `mmproj-unbound-e2b.gguf` (vision projector, ~942 MB) sits at the repo
44
+ root — load it alongside any LM quant for image input. See **Vision** below.
45
 
46
+ ## Sampling
47
 
48
+ - **Creative / open-ended** `temperature=1.0, top_p=0.95, top_k=64`.
49
+ - **Factual / brand questions** drop `temperature` to ~0.3–0.5.
50
+ - llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default; set
51
+ `enable_thinking: false` in chat-template kwargs for shorter replies.
52
 
53
+ Ollama: `ollama pull hf.co/...` uses a bundled Modelfile with
54
+ `temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192`
55
+ and an identity-grounding system prompt. Override per-session with
56
+ `/set parameter temperature 0.3` etc.
57
 
58
+ ## Run
59
 
60
  ```bash
61
+ # Ollama (defaults to Q4_K_M)
62
  ollama pull hf.co/evalengine/unbound-e2b-GGUF
63
  ollama run hf.co/evalengine/unbound-e2b-GGUF
64
  ```
65
 
66
  ```bash
67
+ # llama.cpp — point at FIRST split part, the rest auto-stitch
68
  ./llama-cli -m Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
69
  ```
70
 
71
+ ```js
72
+ // wllama (browser) — Q8_0 has a tensor over 2 GB; use Q2/Q3/Q4/Q6
73
+ import { Wllama } from '@wllama/wllama';
74
+ const wllama = new Wllama(/* */);
75
+ await wllama.loadModelFromHF(
76
+ 'evalengine/unbound-e2b-GGUF',
77
+ 'Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf'
78
+ );
79
+ ```
80
 
81
+ ## Vision / image input (optional)
82
 
83
+ `mmproj-unbound-e2b.gguf` enables image-to-text. Pair with any LM quant via
84
+ `llama-mtmd-cli` or `llama-gemma3-cli`:
85
 
86
  ```bash
87
  ./llama-mtmd-cli \
 
91
  -p "What is in this image?"
92
  ```
93
 
94
+ > **Disclaimer.** The vision encoder is **Google's original weights,
95
+ > unchanged** — abliteration only touched the language model. The LM is
96
+ > uncensored, but the vision encoder may still suppress features for
97
+ > content classes Google's base was tuned against. We have **not
98
+ > benchmarked the visual axis**. Treat as preview.
99
 
100
+ Text-only: skip `--mmproj` entirely. Standard `llama-cli` / Ollama / LM
101
+ Studio do not need the mmproj file.
102
 
103
  ## Acknowledgements
104
 
105
+ Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
106
+ [TRL](https://github.com/huggingface/trl). Abliteration via
107
+ [heretic](https://github.com/p-e-w/heretic). Environment from
108
+ [autoresearch](https://github.com/karpathy/autoresearch).
109
 
110
  ## License
111
 
112
+ Apache-2.0, inherited from `google/gemma-4-E2B-it`. Full model card +
113
+ benchmarks at [`evalengine/unbound-e2b`](https://huggingface.co/evalengine/unbound-e2b).
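
Reviewer sketch (not part of the commit): the README says each quant folder holds multi-part GGUFs named `*-00001-of-0000N.gguf` and that runtimes are pointed at the first part. The helper below only illustrates that naming scheme. The function name `split_parts` is hypothetical, and the five-digit, 1-based zero padding and the `unbound-e2b-<quant>-` prefix are assumptions generalized from the single filename shown in the diff (`Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf`); other quants may be named differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the expected split-part paths for one quant.
# Assumption: every quant follows the pattern of the one filename shown in
# the README, i.e. <quant>/unbound-e2b-<quant>-<index>-of-<count>.gguf with
# five-digit, 1-based indices.
split_parts() {
  local quant="$1" parts="$2" i
  for ((i = 1; i <= parts; i++)); do
    printf '%s/unbound-e2b-%s-%05d-of-%05d.gguf\n' "$quant" "$quant" "$i" "$parts"
  done
}

# Q4_K_M is listed with 3 parts in the quant table; the first printed path
# is the one a runtime would be pointed at.
split_parts Q4_K_M 3
```

Piping the output through `head -n 1` gives the path to pass as `-m` to `llama-cli`, assuming the files were downloaded with the repository's folder layout intact.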