johnsonchromia committed on
Commit e2144db · verified · 1 Parent(s): c67fd03

README: add mmproj (vision) section + disclaimer + with/without-vision usage

Files changed (1)
  1. README.md +44 -1
README.md CHANGED
@@ -95,11 +95,54 @@ ollama run hf.co/evalengine/unbound-e2b-GGUF
  ./llama-cli -m unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
  ```

  ## Run in the browser (wllama)

  [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
  that runs entirely in the browser — no server, no install. Use Q2_K, Q3_K_M,
- Q4_K_M, or Q6_K (Q8_0 has a tensor above the 2 GB ArrayBuffer limit):

  ```js
  import { Wllama } from '@wllama/wllama';

  ./llama-cli -m unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
  ```

+ ## Vision / image input (optional)
+
+ Gemma 4 E2B ships a vision tower; we extracted it as `mmproj-unbound-e2b.gguf`
+ (942 MB) in this repo. Pair it with any of the LM quants above to enable
+ image-to-text inference.
+
+ > **Disclaimer.** The vision encoder is **Google's original weights, unchanged**.
+ > Unbound's abliteration + SFT-heal only touched the *language model* — the
+ > vision tower was frozen during training. Practical consequences:
+ >
+ > - The LM is uncensored, so it will discuss whatever it *sees* directly.
+ > - But the vision encoder still has Google's original alignment baked into
+ >   visual feature extraction. It may down-weight or distort features for
+ >   content classes Google's base model was tuned to suppress.
+ > - We have **not benchmarked the visual axis** (no measured refusal rate /
+ >   coherence / hallucination on image inputs). Treat vision as a preview
+ >   feature, not a flagship one.
+
+ ### Run with vision (llama.cpp `llama-mtmd-cli`)
+
+ ```bash
+ ./llama-mtmd-cli \
+   -m unbound-e2b-Q4_K_M-00001-of-00003.gguf \
+   --mmproj mmproj-unbound-e2b.gguf \
+   --image path/to/your/image.png \
+   -p "What is in this image?"
+ ```
+
+ `llama-gemma3-cli` works the same way and is Gemma-specific.
+
+ ### Run text-only (no `--mmproj`)
+
+ ```bash
+ ./llama-cli -m unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
+ ```
+
+ The LM quants work standalone — you do **not** need `mmproj-unbound-e2b.gguf`
+ unless you want image input. Ollama / LM Studio's standard text chat works
+ out of the box; the mmproj file is only loaded when you point a multimodal
+ runtime at it.
+
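The choice between the two invocations above can be sketched as a small wrapper. This is an illustrative sketch only, not part of the README or of llama.cpp: `build_llama_cmd` is a hypothetical helper name, and it only prints the command line rather than running it. It assumes the binaries sit in the current directory and uses only the flags already shown (`-m`, `--mmproj`, `--image`, `-p`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration): print the llama.cpp
# command to run. It does not execute anything.
#   $1 = LM quant file, $2 = prompt, $3 = optional image path,
#   $4 = optional mmproj path (defaults to the file name used in this repo)
build_llama_cmd() {
  local model="$1" prompt="$2" image="${3:-}"
  local mmproj="${4:-mmproj-unbound-e2b.gguf}"

  if [ -z "$image" ]; then
    # No image requested: the LM quant works standalone, no mmproj needed.
    printf './llama-cli -m %s -p "%s"\n' "$model" "$prompt"
    return 0
  fi

  if [ ! -f "$mmproj" ]; then
    # An image was requested but the projector file is missing: fail loudly
    # instead of silently dropping the image.
    echo "error: image given but mmproj file not found: $mmproj" >&2
    return 1
  fi

  # Image requested and projector present: use the multimodal CLI.
  printf './llama-mtmd-cli -m %s --mmproj %s --image %s -p "%s"\n' \
    "$model" "$mmproj" "$image" "$prompt"
}
```

For example, `build_llama_cmd unbound-e2b-Q4_K_M-00001-of-00003.gguf "your prompt"` prints the text-only command, while adding `path/to/your/image.png` as a third argument prints the `llama-mtmd-cli` form if the mmproj file is present in the working directory.
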
  ## Run in the browser (wllama)

  [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
  that runs entirely in the browser — no server, no install. Use Q2_K, Q3_K_M,
+ Q4_K_M, or Q6_K (Q8_0 has a tensor above the 2 GB ArrayBuffer limit).
+ Browser inference is **text-only** for this model (wllama doesn't currently
+ load `mmproj` for vision):

  ```js
  import { Wllama } from '@wllama/wllama';