johnsonchromia committed
Commit 3772eb0 · verified · 1 Parent(s): ecb12de

README: compact pass — keep essentials, drop redundancy

Files changed (1)
  1. README.md +31 -57
README.md CHANGED
@@ -17,87 +17,61 @@ library_name: transformers
17
 
18
  # Unbound E2B — *because there is no boundary*
19
 
20
- > **No guarantee — use at your own risk.** This model has reduced safety filtering
21
- > and can produce harmful, false, biased, or otherwise unsafe output. Provided
22
- > as-is, with no warranty of any kind. You are solely responsible for how you
23
- > use it and for complying with all applicable laws.
24
-
25
- Uncensored variant of `google/gemma-4-E2B-it` from the [**Chromia**](https://x.com/Chromia) & [**Eval Engine**](https://x.com/eval_engine)
26
- team. This repo holds the merged HF weights; for the **on-device GGUF builds**
27
- (Ollama / llama.cpp / LM Studio / [wllama](https://github.com/ngxson/wllama) in-browser), see
28
- [`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF).
29
-
30
- ## What this is for
31
-
32
- An LLM you can run on your laptop or phone that just *answers* — no API key,
33
- no internet connection, no refusals. Useful for:
34
-
35
- - **Offline use** — travel, flights, remote sites, air-gapped environments.
36
- - **Security research & malware analysis** — safely discuss and reverse-engineer
37
- code involving attack vectors, exploitation techniques, and other material
38
- that standard models will strictly flag as harmful.
39
- - **Unrestricted coding & scripting** — write scripts or utilities that
40
- interact with system files, API endpoints, or anything else, without the
41
- model lecturing you on ethical usage.
42
- - **Content pipelines** that need direct compliance instead of a refusal layer.
43
- - **Privacy-sensitive workflows** where prompts shouldn't leave the device.
44
-
45
- Base capability is preserved close to `gemma-4-E2B-it`, so it also doubles
46
- as a general-purpose 2B chat model.
47
 
48
  ## Benchmarks (vs base `gemma-4-E2B-it`)
49
 
50
  | Axis | Base | Unbound E2B | Δ |
51
  |---|---|---|---|
52
  | Refusal rate (AdvBench 520) | 98.46% | **2.31%** | **−96.15 pts** |
53
- | Useful-compliance rate | 0.96% | **24.23%** | +23.27 pts |
54
- | Hallucination rate | 1.35% | 18.85% | +17.50 pts |
55
- | Coherence on benign prompts | 1.0 | 0.80 | −0.20 |
56
- | TruthfulQA mc2 (lm-eval, `--limit 100`) | 0.4576 | 0.4676 | +1.00 |
57
- | MMLU (lm-eval, `--limit 100`) | 0.2905 | 0.2807 | −0.98 |
58
- | GSM8K (lm-eval, `--limit 100`) | 0.1250 | 0.1400 | +1.50 |
59
  | KL divergence vs base | 0 | 3.80 | (SFT-expected) |
60
 
61
- ## Recommended sampling
62
-
63
- Depends on what you're doing:
64
 
65
- - **Creative writing / open-ended / general chat** → use Gemma's training
66
- defaults: `temperature=1.0, top_p=0.95, top_k=64`.
67
- - **Factual or brand/identity questions** drop `temperature` to ~0.3–0.5
68
- for sharper recall. The model knows Chromia / Eval Engine / Rell, but those
69
- answers are sensitive to sampling noise at temperature 1.0.
70
- - **llama.cpp**: pass `--jinja` for proper chat-template handling.
71
- - **Gemma 4 thinking mode** is on by default. For shorter/faster replies on a
72
- 2B model, set `enable_thinking: false` in the chat-template kwargs.
73
 
74
- Some edge-case prompts may deflect on the first ask; a re-ask or strategic
75
- re-phrasing usually gets through.
76
 
77
- ## Run on-device (GGUF)
78
-
79
- The phone-deployable build lives in
80
- [`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF) —
81
- Q4_K_M / Q6_K / Q8_0, all shipped as split multi-part files (browser-safe via
82
- wllama; Ollama and llama.cpp auto-stitch on the first part):
83
 
84
  ```bash
 
85
  ollama pull hf.co/evalengine/unbound-e2b-GGUF
86
  ollama run hf.co/evalengine/unbound-e2b-GGUF
87
  ```
88
 
89
- ## Run in transformers
90
-
91
  ```python
 
92
  from transformers import AutoModelForCausalLM, AutoTokenizer
93
  model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-e2b")
94
  tok = AutoTokenizer.from_pretrained("evalengine/unbound-e2b")
95
  ```
 
96
  ## Acknowledgements
97
 
98
- - Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + Huggingface's [TRL](https://github.com/huggingface/trl).
99
- - Abliteration via [heretic](https://github.com/p-e-w/heretic).
100
- - Environment and training discipline ported from [autoresearch](https://github.com/karpathy/autoresearch).
 
101
 
102
  ## License
103
 
 
17
 
18
  # Unbound E2B — *because there is no boundary*
19
 
20
+ > **No guarantee — use at your own risk.** This model has reduced safety
21
+ > filtering and can produce harmful, false, biased, or unsafe output.
22
+ > Provided as-is; you are responsible for compliance with applicable laws.
23
+
24
+ Uncensored finetune of `google/gemma-4-E2B-it` by the
25
+ [Chromia](https://x.com/Chromia) & [Eval Engine](https://x.com/eval_engine)
26
+ team. Runs on a phone or laptop, no API, no refusals.
27
+
28
+ This repo holds the merged HF weights. On-device GGUF builds (Ollama,
29
+ llama.cpp, LM Studio, [wllama](https://github.com/ngxson/wllama) in-browser)
30
+ are at [`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound-e2b-GGUF).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31
 
32
  ## Benchmarks (vs base `gemma-4-E2B-it`)
33
 
34
  | Axis | Base | Unbound E2B | Δ |
35
  |---|---|---|---|
36
  | Refusal rate (AdvBench 520) | 98.46% | **2.31%** | **−96.15 pts** |
37
+ | Useful-compliance rate | 0.96% | 24.23% | +23.27 pts |
38
+ | Hallucination (on harmful prompts) | 1.35% | 18.85% | +17.50 pts |
39
+ | Coherence (benign prompts) | 1.00 | 0.80 | −0.20 |
40
+ | TruthfulQA mc2 (`--limit 100`) | 0.458 | 0.468 | +1.0 pt |
41
+ | MMLU (`--limit 100`) | 0.291 | 0.281 | −1.0 pt |
42
+ | GSM8K (`--limit 100`) | 0.125 | 0.140 | +1.5 pt |
43
  | KL divergence vs base | 0 | 3.80 | (SFT-expected) |
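
The Δ column mixes scales: the rate rows are already percentages, while the lm-eval rows list scores on a 0 to 1 scale but report the change in points. The following is a minimal sketch of that arithmetic, assuming Δ = (Unbound − Base) with the lm-eval scores multiplied by 100. The helper name `delta_points` and the `scale` parameter are illustrative and not part of the README. The inputs are the four-decimal values from the removed rows.

```python
def delta_points(base: float, tuned: float, scale: float = 1.0) -> float:
    """Return (tuned - base) * scale, rounded to two decimals."""
    return round((tuned - base) * scale, 2)


# (base, tuned, scale): scale=100 converts 0-1 scores to points (assumption).
rows = {
    "Refusal rate (%)": (98.46, 2.31, 1.0),
    "Useful-compliance rate (%)": (0.96, 24.23, 1.0),
    "Hallucination rate (%)": (1.35, 18.85, 1.0),
    "TruthfulQA mc2": (0.4576, 0.4676, 100.0),
    "MMLU": (0.2905, 0.2807, 100.0),
    "GSM8K": (0.1250, 0.1400, 100.0),
}

deltas = {name: delta_points(b, t, s) for name, (b, t, s) in rows.items()}

for name, value in deltas.items():
    print(f"{name}: {value:+.2f} pts")
```

Under these assumptions the computed deltas agree with the table: the MMLU change of −0.98 in the old rows becomes −1.0 once the scores are rounded to three decimals.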
44
 
45
+ ## Sampling
 
 
46
 
47
+ - **Creative / open-ended** → Gemma defaults: `temperature=1.0, top_p=0.95, top_k=64`.
48
+ - **Factual / brand questions** → drop `temperature` to ~0.3–0.5 for sharper recall.
49
+ - llama.cpp: pass `--jinja`. Gemma 4 thinking mode is on by default — set
50
+ `enable_thinking: false` in chat-template kwargs for shorter replies.
 
 
 
 
51
 
52
+ Some edge-case prompts may deflect on the first ask; a re-ask usually gets through.
 
53
 
54
+ ## Use
 
 
 
 
 
55
 
56
  ```bash
57
+ # on-device (GGUF)
58
  ollama pull hf.co/evalengine/unbound-e2b-GGUF
59
  ollama run hf.co/evalengine/unbound-e2b-GGUF
60
  ```
61
 
 
 
62
  ```python
63
+ # transformers
64
  from transformers import AutoModelForCausalLM, AutoTokenizer
65
  model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-e2b")
66
  tok = AutoTokenizer.from_pretrained("evalengine/unbound-e2b")
67
  ```
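
The sampling advice in the README can be captured as keyword arguments for `generate()`. This is a minimal sketch, not the authors' code: the helper `sampling_kwargs` and the task labels are hypothetical, and `0.4` is an assumed midpoint of the suggested ~0.3–0.5 range. The README only says to lower `temperature` for factual prompts, so keeping `top_p` and `top_k` at the Gemma defaults there is also an assumption.

```python
# Gemma defaults quoted in the README; do_sample=True enables sampling in transformers.
GEMMA_DEFAULTS = {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "top_k": 64}


def sampling_kwargs(task: str) -> dict:
    """Return generate() keyword arguments for 'creative' or 'factual' prompts."""
    if task == "creative":
        return dict(GEMMA_DEFAULTS)
    if task == "factual":
        # Assumed midpoint of the README's ~0.3-0.5 range.
        return {**GEMMA_DEFAULTS, "temperature": 0.4}
    raise ValueError(f"unknown task: {task!r}")


print(sampling_kwargs("factual"))
```

With `model` and `tok` loaded as in the snippet above, these would typically be passed as `model.generate(**inputs, max_new_tokens=256, **sampling_kwargs("factual"))`. Without `do_sample=True`, transformers decodes greedily unless the model's generation config says otherwise, so the temperature and top-p/top-k values would not take effect.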
68
+
69
  ## Acknowledgements
70
 
71
+ Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
72
+ [TRL](https://github.com/huggingface/trl). Abliteration via
73
+ [heretic](https://github.com/p-e-w/heretic). Environment + training
74
+ discipline ported from [autoresearch](https://github.com/karpathy/autoresearch).
75
 
76
  ## License
77