johnsonchromia committed on
Commit
a472252
·
verified ·
1 Parent(s): e5bf496

README: rewrite paths for canonical flat layout

Files changed (1)
  1. README.md +17 -18
README.md CHANGED
@@ -28,17 +28,17 @@ for Ollama, llama.cpp, LM Studio, and [wllama](https://github.com/ngxson/wllama)
 
  ## Available quants
 
- Each quant lives in its own folder; inside, the model is split into
- multi-part GGUFs (`*-00001-of-0000N.gguf`). All runtimes auto-stitch on the
- first part - same UX as a single file.
-
- | Quant | Folder | Parts | Total | Browser (wllama) | Desktop | Notes |
- |---------|-------------|-------|--------|------------------|---------|-------|
- | Q2_K | `Q2_K/` | 3 | 2.8 GB | ✅ | ✅ | Smallest, biggest quality drop |
- | Q3_K_M | `Q3_K_M/` | 3 | 3.0 GB | ✅ | ✅ | Marginal size win over Q4 |
- | Q4_K_M | `Q4_K_M/` | 3 | 3.2 GB | ✅ | ✅ | **Recommended default** |
- | Q6_K | `Q6_K/` | 4 | 3.6 GB | ✅ | ✅ | Higher fidelity |
- | Q8_0 | `Q8_0/` | 4 | 4.6 GB | ❌ (over 2 GB) | ✅ | Highest fidelity; desktop only |
 
  `mmproj-unbound-e2b.gguf` (vision projector, ~942 MB) sits at the repo
  root - load it alongside any LM quant for image input. See **Vision** below.
@@ -50,12 +50,11 @@ root - load it alongside any LM quant for image input. See **Vision** below.
  - llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default; set
  `enable_thinking: false` in chat-template kwargs for shorter replies.
 
- For Ollama specifically, pull from the **Ollama Registry** -
  `ollama pull hf.co/...` [doesn't yet support sharded GGUFs](https://github.com/ollama/ollama/issues/5245).
  The registry version is a single-file Q4_K_M with a bundled Modelfile
  (`temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192`
- and an identity-grounding system prompt). Override per-session with
- `/set parameter temperature 0.3` etc.
 
  ## Run
 
 
@@ -66,8 +65,8 @@ ollama run evalengine/unbound-e2b
  ```
 
  ```bash
- # llama.cpp - point at FIRST split part, the rest auto-stitch
- ./llama-cli -m Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf -p "your prompt"
  ```
 
  ```js
  ```js
@@ -76,7 +75,7 @@ import { Wllama } from '@wllama/wllama';
  const wllama = new Wllama(/* … */);
  await wllama.loadModelFromHF(
  'evalengine/unbound-e2b-GGUF',
- 'Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf'
  );
  ```
 
@@ -87,7 +86,7 @@ await wllama.loadModelFromHF(
 
  ```bash
  ./llama-mtmd-cli \
- -m Q4_K_M/unbound-e2b-Q4_K_M-00001-of-00003.gguf \
  --mmproj mmproj-unbound-e2b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"
 
 
  ## Available quants
 
+ Each quant is shipped as a sharded multi-part GGUF (`unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf`).
+ Ollama, llama.cpp, LM Studio, and wllama auto-stitch on the first part -
+ same UX as a single file.
+
+ | Quant | Parts | Total | Browser (wllama) | Desktop | Notes |
+ |---------|-------|--------|------------------|---------|-------|
+ | Q2_K | 3 | 2.8 GB | ✅ | ✅ | Smallest, biggest quality drop |
+ | Q3_K_M | 3 | 3.0 GB | ✅ | ✅ | Marginal size win over Q4 |
+ | Q4_K_M | 3 | 3.2 GB | ✅ | ✅ | **Recommended default** |
+ | Q6_K | 4 | 3.6 GB | ✅ | ✅ | Higher fidelity |
+ | Q8_0 | 4 | 4.6 GB | ❌ (over 2 GB) | ✅ | Highest fidelity; desktop only |
 
  `mmproj-unbound-e2b.gguf` (vision projector, ~942 MB) sits at the repo
  root - load it alongside any LM quant for image input. See **Vision** below.
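
The flat naming scheme introduced on the `+` side (`unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf`) can be enumerated programmatically. The following is a minimal sketch, assuming the five-digit zero-padded indices seen in the diff's own examples; `shardNames` is a hypothetical helper for illustration and is not part of the repository or of any runtime's API.

```javascript
// Hypothetical helper (not part of the repo): list the shard file names
// for one quant, given the part count from the "Parts" column.
// Assumes five-digit, zero-padded indices, as in the examples in this diff.
function shardNames(quant, parts) {
  const pad = (n) => String(n).padStart(5, '0');
  return Array.from({ length: parts }, (_, i) =>
    `unbound-e2b.${quant}-${pad(i + 1)}-of-${pad(parts)}.gguf`
  );
}

// The first entry is the file a loader would be pointed at.
console.log(shardNames('Q4_K_M', 3));
```

For `Q4_K_M` with 3 parts, the first name produced is `unbound-e2b.Q4_K_M-00001-of-00003.gguf`, the same path the rewritten llama.cpp and wllama examples pass.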
 
  - llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default; set
  `enable_thinking: false` in chat-template kwargs for shorter replies.
 
+ For Ollama, pull from the **Ollama Registry** -
  `ollama pull hf.co/...` [doesn't yet support sharded GGUFs](https://github.com/ollama/ollama/issues/5245).
  The registry version is a single-file Q4_K_M with a bundled Modelfile
  (`temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192`
+ and an identity-grounding system prompt).
 
  ## Run
 
 
  ```
 
  ```bash
+ # llama.cpp - point at FIRST shard, the rest auto-stitch
+ ./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf -p "your prompt"
  ```
 
  ```js
 
  const wllama = new Wllama(/* … */);
  await wllama.loadModelFromHF(
  'evalengine/unbound-e2b-GGUF',
+ 'unbound-e2b.Q4_K_M-00001-of-00003.gguf'
  );
  ```
 
 
 
  ```bash
  ./llama-mtmd-cli \
+ -m unbound-e2b.Q4_K_M-00001-of-00003.gguf \
  --mmproj mmproj-unbound-e2b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"