johnsonchromia committed on
Commit
d9892ba
·
verified ·
1 Parent(s): b40287c

README: add mmproj (vision) section + disclaimer + with/without-vision usage

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -92,11 +92,53 @@ ollama run hf.co/evalengine/unbound-e4b-GGUF
92
  ./llama-cli -m unbound-e4b-Q4_K_M-00001-of-00004.gguf -p "your prompt"
93
  ```
94
 
95
  ## Run in the browser (wllama)
96
 
97
  [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
98
  that runs entirely in the browser. Use one of the wllama-safe variants
99
- above:
 
100
 
101
  ```js
102
  import { Wllama } from '@wllama/wllama';
 
92
  ./llama-cli -m unbound-e4b-Q4_K_M-00001-of-00004.gguf -p "your prompt"
93
  ```
94
 
95
+ ## Vision / image input (optional)
96
+
97
+ Gemma 4 E4B ships a vision tower; we extracted it as `mmproj-unbound-e4b.gguf`
98
+ (946 MB) in this repo. Pair it with any of the LM quants above to enable
99
+ image-to-text inference.
100
+
101
+ > **Disclaimer.** The vision encoder is **Google's original weights, unchanged**.
102
+ > Unbound's abliteration + SFT-heal only touched the *language model* — the
103
+ > vision tower was frozen during training. Practical consequences:
104
+ >
105
+ > - The LM is uncensored, so it will discuss whatever it *sees* directly.
106
+ > - But the vision encoder still has Google's original alignment baked into
107
+ > visual feature extraction. It may down-weight or distort features for
108
+ > content classes Google's base model was tuned to suppress.
109
+ > - We have **not benchmarked the visual axis** (no measured refusal rate /
110
+ > coherence / hallucination on image inputs). Treat vision as a preview
111
+ > feature, not a flagship one.
112
+
113
+ ### Run with vision (llama.cpp `llama-mtmd-cli`)
114
+
115
+ ```bash
116
+ ./llama-mtmd-cli \
117
+ -m unbound-e4b-Q4_K_M-00001-of-00004.gguf \
118
+ --mmproj mmproj-unbound-e4b.gguf \
119
+ --image path/to/your/image.png \
120
+ -p "What is in this image?"
121
+ ```
122
+
123
+ The Gemma-specific `llama-gemma3-cli` took the same flags, but recent llama.cpp builds have merged it into `llama-mtmd-cli`, so prefer the command above.
124
+
125
+ ### Run text-only (no `--mmproj`)
126
+
127
+ ```bash
128
+ ./llama-cli -m unbound-e4b-Q4_K_M-00001-of-00004.gguf -p "your prompt"
129
+ ```
130
+
131
+ The LM quants work standalone — you do **not** need `mmproj-unbound-e4b.gguf`
132
+ unless you want image input. Ollama / LM Studio's standard text chat works
133
+ out of the box; the mmproj file is only loaded when you point a multimodal
134
+ runtime at it.
135
+
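The choice between the two invocations above can be scripted. The sketch below is illustrative only: `build_cmd` is a hypothetical helper, not part of llama.cpp or this repo, and it only prints the command it would run, reusing the file names and flags already shown in the README. It assumes paths without spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the llama.cpp command for a prompt.
# Vision flags are added only when an image is given and both the
# mmproj file and the image exist; otherwise fall back to text-only.
build_cmd() {
  local model="$1" mmproj="$2" image="$3" prompt="$4"
  if [ -n "$image" ] && [ -f "$mmproj" ] && [ -f "$image" ]; then
    printf './llama-mtmd-cli -m %s --mmproj %s --image %s -p "%s"\n' \
      "$model" "$mmproj" "$image" "$prompt"
  else
    printf './llama-cli -m %s -p "%s"\n' "$model" "$prompt"
  fi
}

# No image supplied, so this prints the text-only command.
build_cmd unbound-e4b-Q4_K_M-00001-of-00004.gguf mmproj-unbound-e4b.gguf "" "your prompt"
```

Printing rather than executing keeps the sketch safe to try without a llama.cpp build; pipe the output to `sh` only after checking it.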
136
  ## Run in the browser (wllama)
137
 
138
  [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
139
  that runs entirely in the browser. Use one of the wllama-safe variants
140
+ above. Browser inference is **text-only** for this model (wllama doesn't
141
+ currently load `mmproj` for vision):
142
 
143
  ```js
144
  import { Wllama } from '@wllama/wllama';