johnsonchromia committed on
Commit
afcfca6
·
verified ·
1 Parent(s): ad69b19

Update README: E4B-3 benchmarks + wllama repo split + AEON attribution

Files changed (1)
  1. README.md +18 -35
README.md CHANGED
@@ -21,42 +21,33 @@ pipeline_tag: text-generation
21
  > **No guarantee: use at your own risk.** Reduced safety filtering; can
22
  > produce harmful or false output. Provided as-is.
23
 
24
- GGUF quants of [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b)
25
- for Ollama, llama.cpp, LM Studio, and [wllama](https://github.com/ngxson/wllama)
26
- (in-browser). Built by [Chromia](https://x.com/Chromia) and
27
- [Eval Engine](https://x.com/eval_engine).
28
 
29
- ## Available quants
30
 
31
- Each quant is shipped as a sharded multi-part GGUF. Ollama, llama.cpp, LM
32
- Studio, and wllama auto-stitch on the first part (same UX as a single file).
33
 
34
- ### Desktop builds: `unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf`
35
 
36
- Embedding tensor kept at the llama.cpp default of Q6_K; largest part ~2.15 GB
37
- (fine for desktop, **won't load in browser**).
38
 
39
  | Quant | Parts | Total | Notes |
40
  |---------|-------|---------|-------|
41
  | Q2_K | 4 | 4.08 GB | Smallest, biggest quality drop |
42
  | Q3_K_M | 4 | 4.49 GB | Modest size win over Q4 (embedding precision dominates) |
43
- | Q4_K_M | 4 | 4.94 GB | **Recommended desktop default** |
44
  | Q6_K | 5 | 5.75 GB | Higher fidelity |
45
  | Q8_0 | 6 | 7.43 GB | Highest fidelity |
46
 
47
- ### Browser builds: `unbound-e4b-web.<QUANT>-NNNNN-of-NNNNN.gguf`
48
-
49
- E4B's `per_layer_token_embd` is a 2.82-billion-value tensor; at the default
50
- Q6_K precision it lands at ~2.2 GB, over wllama's 2 GB ArrayBuffer cap.
51
- These variants force embeddings to `q5_K` (~1848 MB) so the largest part
52
- fits. They use a distinct `unbound-e4b-web` model prefix so HF's GGUF UI
53
- doesn't aggregate them with the same-quant desktop files.
54
-
55
- | Variant | Parts | Total | Notes |
56
- |-----------------|-------|---------|-------|
57
- | Q4_K_M (web) | 4 | 4.51 GB | **Recommended browser default**: layers @ Q4_K_M, embed @ q5_K |
58
- | Q2_K (web) | 4 | 3.69 GB | Smallest browser-loadable: layers @ Q2_K, embed @ q5_K |
59
-
60
  ## Sampling
61
 
62
  - **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
@@ -83,16 +74,6 @@ ollama run evalengine/unbound-e4b
83
  ./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf -p "your prompt"
84
  ```
85
 
86
- ```js
87
- // wllama (browser): use a -web variant; desktop builds won't fit
88
- import { Wllama } from '@wllama/wllama';
89
- const wllama = new Wllama(/* … */);
90
- await wllama.loadModelFromHF(
91
- 'evalengine/unbound-e4b-GGUF',
92
- 'unbound-e4b-web.Q4_K_M-00001-of-00004.gguf'
93
- );
94
- ```
95
-
96
  ## Vision / image input (optional)
97
 
98
  `mmproj-unbound-e4b.gguf` enables image-to-text. Pair with any LM quant via
@@ -120,7 +101,9 @@ not need the mmproj file.
120
  Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
121
  [TRL](https://github.com/huggingface/trl). Abliteration via
122
  [heretic](https://github.com/p-e-w/heretic). Environment from
123
- [autoresearch](https://github.com/karpathy/autoresearch).
124
 
125
  ## License
126
 
 
21
  > **No guarantee: use at your own risk.** Reduced safety filtering; can
22
  > produce harmful or false output. Provided as-is.
23
 
24
+ Desktop GGUF quants of [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b)
25
+ for Ollama, llama.cpp, and LM Studio. Built by
26
+ [Chromia](https://x.com/Chromia) and [Eval Engine](https://x.com/eval_engine).
 
27
 
28
+ > **Looking for the browser/wllama builds?** They live in their own repo:
29
+ > [`evalengine/unbound-e4b-wllama-gguf`](https://huggingface.co/evalengine/unbound-e4b-wllama-gguf).
30
+ > E4B's `per_layer_token_embd` tensor needs special quantization to fit
31
+ > wllama's 2 GB ArrayBuffer cap; keeping the desktop and browser variants
32
+ > in separate repos avoids HF GGUF UI aggregation collisions.
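
The size figures quoted for `per_layer_token_embd` can be checked with a little arithmetic. The sketch below is illustrative only: it takes the 2.82-billion-value count from the removed README text, uses the llama.cpp k-quant super-block sizes (256 weights per block, 210 bytes for Q6_K and 176 bytes for Q5_K), and assumes that the "2 GB" ArrayBuffer cap means 2 GiB. The names `tensor_bytes` and `CAP_BYTES` are made up for this example.

```python
# Rough size check for the per-layer embedding tensor under two k-quants.
# Assumptions (not stated in the README): the cap is 2 GiB, and the tensor
# is quantized uniformly with no padding or metadata overhead.

BLOCK_WEIGHTS = 256            # weights per k-quant super-block
BYTES_PER_BLOCK = {
    "Q6_K": 210,               # 6.5625 bits per weight
    "Q5_K": 176,               # 5.5 bits per weight
}
N_VALUES = 2_820_000_000       # approximate value count quoted in the README
CAP_BYTES = 2 * 1024**3        # assumed meaning of the "2 GB" cap


def tensor_bytes(n_values: int, quant: str) -> float:
    """Approximate on-disk size of a tensor stored in the given quant."""
    return n_values / BLOCK_WEIGHTS * BYTES_PER_BLOCK[quant]


sizes = {quant: tensor_bytes(N_VALUES, quant) for quant in BYTES_PER_BLOCK}

for quant, size in sizes.items():
    verdict = "fits" if size <= CAP_BYTES else "exceeds the cap"
    print(f"{quant}: {size / 1024**2:.0f} MiB ({verdict})")
```

Under these assumptions the Q6_K tensor comes out above the cap and the Q5_K tensor below it, which is consistent with the README's reasoning for the separate browser builds.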
33
 
34
+ ## Available quants
 
35
 
36
+ Each quant is shipped as a sharded multi-part GGUF
37
+ (`unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf`). Ollama, llama.cpp, and LM
38
+ Studio auto-stitch on the first part (same UX as a single file).
39
 
40
+ Embedding tensor kept at the llama.cpp default of Q6_K; largest part
41
+ ~2.15 GB (fine for desktop, **won't load in browser**).
42
 
43
  | Quant | Parts | Total | Notes |
44
  |---------|-------|---------|-------|
45
  | Q2_K | 4 | 4.08 GB | Smallest, biggest quality drop |
46
  | Q3_K_M | 4 | 4.49 GB | Modest size win over Q4 (embedding precision dominates) |
47
+ | Q4_K_M | 4 | 4.94 GB | **Recommended default** |
48
  | Q6_K | 5 | 5.75 GB | Higher fidelity |
49
  | Q8_0 | 6 | 7.43 GB | Highest fidelity |
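
As a small illustration of the shard naming pattern `unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf`, the sketch below expands a quant name and part count into the list of expected filenames. The part counts are copied from the table above; the helper `shard_names` is hypothetical and not part of any tool mentioned in the README.

```python
# Expand the README's shard naming pattern into concrete filenames.
# Part counts come from the quant table; everything else is illustrative.

PARTS = {"Q2_K": 4, "Q3_K_M": 4, "Q4_K_M": 4, "Q6_K": 5, "Q8_0": 6}


def shard_names(quant: str, parts: int, prefix: str = "unbound-e4b") -> list[str]:
    """Return the shard filenames for one quant, first part first."""
    return [
        f"{prefix}.{quant}-{index:05d}-of-{parts:05d}.gguf"
        for index in range(1, parts + 1)
    ]


for quant, parts in PARTS.items():
    names = shard_names(quant, parts)
    # Only the first shard is passed to the loader; the rest must sit beside it.
    print(f"{quant}: load {names[0]} ({len(names)} files total)")
```

This can be handy for checking that every part of a quant has been downloaded before pointing a loader at the first shard.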
50
 
51
  ## Sampling
52
 
53
  - **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
 
74
  ./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf -p "your prompt"
75
  ```
76
 
77
  ## Vision / image input (optional)
78
 
79
  `mmproj-unbound-e4b.gguf` enables image-to-text. Pair with any LM quant via
 
101
  Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
102
  [TRL](https://github.com/huggingface/trl). Abliteration via
103
  [heretic](https://github.com/p-e-w/heretic). Environment from
104
+ [autoresearch](https://github.com/karpathy/autoresearch). 200 of the 700
105
+ compliance training examples were distilled from
106
+ [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4).
107
 
108
  ## License
109