johnsonchromia committed
Commit 826faa9 · verified · 1 parent: 17bb398

README: add Q2_K + Q3_K_M to quant table

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -38,7 +38,9 @@ auto-stitch on the first part; same UX as a single file.
 
  | Quant | Parts | Total | Largest part | wllama (browser) | Desktop (Ollama/llama.cpp/LM Studio) | Notes |
  |---------|-------|--------|--------------|------------------|--------------------------------------|-------|
- | Q4_K_M | 3 | 3.2 GB | 1.79 GB | ✅ | ✅ | Recommended on-device default (best size/quality) |
+ | Q2_K | 3 | 2.8 GB | 1.79 GB | ✅ | ✅ | Smallest disk footprint; biggest quality drop. Useful for the most constrained devices |
+ | Q3_K_M | 3 | 3.0 GB | 1.79 GB | ✅ | ✅ | Marginal size win over Q4 (embedding precision dominates total size) |
+ | Q4_K_M | 3 | 3.2 GB | 1.79 GB | ✅ | ✅ | **Recommended on-device default (best size/quality)** |
  | Q6_K | 4 | 3.6 GB | 1.79 GB | ✅ | ✅ | Higher fidelity, still browser-safe |
  | Q8_0 | 4 | 4.6 GB | **2.32 GB** | ❌ (over 2 GB) | ✅ | Highest fidelity; one tensor exceeds the browser ArrayBuffer limit, so desktop runtimes only |
 
@@ -96,8 +98,8 @@ ollama run hf.co/evalengine/unbound-e2b-GGUF
  ## Run in the browser (wllama)
 
  [wllama](https://github.com/ngxson/wllama) is a WebAssembly port of llama.cpp
- that runs entirely in the browser (no server, no install). Use Q4_K_M or
- Q6_K (Q8_0 has a tensor above the 2 GB ArrayBuffer limit):
+ that runs entirely in the browser (no server, no install). Use Q2_K, Q3_K_M,
+ Q4_K_M, or Q6_K (Q8_0 has a tensor above the 2 GB ArrayBuffer limit):
 
  ```js
  import { Wllama } from '@wllama/wllama';
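As the table presents it, the browser column tracks a single rule: a quant can load in wllama only if each part, and the largest tensor inside it, fits in one browser ArrayBuffer, which the README puts at 2 GB. Below is a minimal JavaScript sketch of that rule, plus a helper for choosing a quant under a download budget. `QUANTS`, `browserSafeQuants`, `pickQuant`, and the 2 GB constant are hypothetical names and values used only for illustration. They are not part of wllama or this repo. The sketch also uses the "Largest part" column as a stand-in for the largest tensor.

```javascript
// Quant rows copied from the README table (sizes in GB, as listed there).
const QUANTS = [
  { quant: 'Q2_K', parts: 3, totalGB: 2.8, largestPartGB: 1.79 },
  { quant: 'Q3_K_M', parts: 3, totalGB: 3.0, largestPartGB: 1.79 },
  { quant: 'Q4_K_M', parts: 3, totalGB: 3.2, largestPartGB: 1.79 },
  { quant: 'Q6_K', parts: 4, totalGB: 3.6, largestPartGB: 1.79 },
  { quant: 'Q8_0', parts: 4, totalGB: 4.6, largestPartGB: 2.32 },
];

// Assumed browser cap, taken from the README's "2 GB ArrayBuffer limit".
// Real limits vary by browser and engine version.
const BROWSER_ARRAYBUFFER_LIMIT_GB = 2;

// Hypothetical helper: keep quants whose largest part fits in one ArrayBuffer.
function browserSafeQuants(rows, limitGB = BROWSER_ARRAYBUFFER_LIMIT_GB) {
  return rows.filter((r) => r.largestPartGB <= limitGB).map((r) => r.quant);
}

// Hypothetical helper: among browser-safe quants whose total download fits
// the budget, return the largest one (treated here as highest fidelity),
// or null if nothing fits.
function pickQuant(rows, budgetGB, limitGB = BROWSER_ARRAYBUFFER_LIMIT_GB) {
  const fits = rows
    .filter((r) => r.largestPartGB <= limitGB && r.totalGB <= budgetGB)
    .sort((a, b) => b.totalGB - a.totalGB); // sorts the filtered copy only
  return fits.length > 0 ? fits[0].quant : null;
}
```

With these rows, `browserSafeQuants(QUANTS)` returns `['Q2_K', 'Q3_K_M', 'Q4_K_M', 'Q6_K']` and `pickQuant(QUANTS, 3.3)` returns `'Q4_K_M'`. The sketch assumes that a larger total means higher fidelity. That holds for this table's ordering but is not a general guarantee.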