johnsonchromia committed on
Commit 8e4ca58 · verified · 1 Parent(s): 40fbc30

README: add Q2_K + Q3_K_M to quant table

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -37,9 +37,11 @@ Studio auto-stitch on the first part — same UX as a single file.
 
  | Quant | Parts | Total | Largest part | wllama (browser) | Desktop (Ollama / llama.cpp / LM Studio) | Notes |
  |---------|-------|---------|--------------|------------------|------------------------------------------|-------|
- | Q4_K_M | 4 | 4.92 GB | ~2.15 GB | ❌ | ✅ | Recommended on-device default — best size/quality |
- | Q6_K | 5 | 5.73 GB | ~2.2 GB | ❌ | ✅ | Higher fidelity |
- | Q8_0 | 6 | 7.41 GB | ~2.7 GB | ❌ | ✅ | Highest fidelity |
+ | Q2_K | 4 | 4.08 GB | ~2.15 GB | ❌ | ✅ | Smallest disk footprint; biggest quality drop |
+ | Q3_K_M | 4 | 4.49 GB | ~2.15 GB | ❌ | ✅ | Modest size win over Q4 (embedding precision dominates) |
+ | Q4_K_M | 4 | 4.94 GB | ~2.15 GB | ❌ | ✅ | **Recommended on-device default — best size/quality** |
+ | Q6_K | 5 | 5.75 GB | ~2.2 GB | ❌ | ✅ | Higher fidelity |
+ | Q8_0 | 6 | 7.43 GB | ~2.7 GB | ❌ | ✅ | Highest fidelity |
 
  **Why no wllama:** E4B's `per_layer_token_embd` is a single atomic tensor
  that exceeds 2 GB in every quant we ship (the smallest one, Q4_K_M, lands