lthn committed on
Commit fb0fd7a · verified · 1 Parent(s): 191c1a1

docs: add benchmarks grid + rapid-mlx usage

  1. README.md +76 -16
README.md CHANGED
@@ -9,6 +9,8 @@ tags:
  - transformers
  - 8-bit
  - gguf
  base_model:
  - google/gemma-4-E2B-it
  base_model_relation: quantized
@@ -18,13 +20,60 @@ library_name: mlx
 
  # Lemer
 
- A Gemma 4 E2B fine-tune by [Lethean Network](https://lthn.ai/lemer).
-
- EUPL-1.2 · Apache 2.0 base · [lthn.ai/lemer](https://lthn.ai/lemer)
 
  ## Use
 
- ### MLX
 
  ```bash
  pip install mlx-lm
@@ -37,10 +86,21 @@ model, tokenizer = load("lthn/lemer", revision="4bit")
  response = generate(model, tokenizer, prompt="Hello", max_tokens=200)
  ```
 
- ### Ollama
 
  ```bash
- # Coming soon
  ```
 
  ### HF Transformers
@@ -71,17 +131,18 @@ tokenizer = AutoTokenizer.from_pretrained("lthn/lemer", revision="bf16-hf")
 
  | Branch | Size |
  |--------|------|
- | `bf16-gguf` | Coming soon |
- | `8bit-gguf` | Coming soon |
- | `6bit-gguf` | Coming soon |
- | `5bit-gguf` | Coming soon |
- | `4bit-gguf` | Coming soon |
 
  ### HF Transformers
 
  | Branch | Size |
  |--------|------|
- | `bf16-hf` | Coming soon |
 
  ## Base
 
@@ -89,11 +150,10 @@ tokenizer = AutoTokenizer.from_pretrained("lthn/lemer", revision="bf16-hf")
 
  ## More
 
- - [lthn.ai/lemer](https://lthn.ai/lemer)
- - [Lethean Network](https://lthn.ai)
- - [GitHub](https://github.com/dappcore)
 
  ## Licence
 
  Training data and adapter: [EUPL-1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
- Base model: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
  - transformers
  - 8-bit
  - gguf
+ - lek
+ - lethean
  base_model:
  - google/gemma-4-E2B-it
  base_model_relation: quantized
 
 
  # Lemer
 
+ A Gemma 4 E2B with LEK activation by [Lethean Network](https://lthn.ai).
+
+ EUPL-1.2 · Apache 2.0 base · [lthn.ai](https://lthn.ai)
+
+ ## Benchmarks
+
+ MMLU-Pro (TIGER-Lab/MMLU-Pro, test split), deterministic (temperature=0), thinking enabled.
+ Evaluated using [rapid-mlx](https://github.com/LetheanNetwork/Rapid-MLX) + OpenAI SDK + Google `parse_response()`.
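The evaluation setup described above could be approximated as follows. This is a minimal sketch, not the harness used for the README's numbers: `format_question`, `extract_answer`, and `ask` are hypothetical helpers, the prompt wording is an assumption, and the regular expression is a simple stand-in for the parsing step rather than the `parse_response()` function the README names. Only the `temperature=0` setting and the OpenAI-style client call come from the README itself.

```python
import re
import string
from typing import List, Optional

# MMLU-Pro questions carry up to ten options, lettered A-J.
LETTERS = string.ascii_uppercase[:10]


def format_question(question: str, options: List[str]) -> str:
    """Render one multiple-choice question as a plain-text prompt."""
    lines = [question, ""]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines += ["", 'Finish with "The answer is (X)".']
    return "\n".join(lines)


# Hypothetical stand-in for answer parsing: matches "answer is (B)" or
# "answer is B", and rejects words such as "answer is Apple".
ANSWER_RE = re.compile(r"(?i:answer is)\s*\(?([A-J])(?![A-Za-z])")


def extract_answer(text: str) -> Optional[str]:
    """Return the last answer letter stated in the response, or None."""
    matches = ANSWER_RE.findall(text)
    return matches[-1] if matches else None


def ask(client, prompt: str) -> str:
    """Send one prompt deterministically.

    `client` is assumed to be an openai.OpenAI instance pointed at a
    local OpenAI-compatible server.
    """
    response = client.chat.completions.create(
        model="default",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding, as stated in the README
    )
    return response.choices[0].message.content
```

Taking the last match rather than the first matters when thinking is enabled, because the model may state and then revise an answer before settling on a final one.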
+
+ ### Lemer vs Stock Gemma 4 E2B (bf16, 20 samples per category)
+
+ | | Stock E2B bf16 | Lemer bf16 | Delta |
+ | :---- | :----: | :----: | :----: |
+ | Biology | 40.0% | **60.0%** | +20.0% |
+ | Math | 10.0% | **55.0%** | +45.0% |
+ | Business | TBC | TBC | TBC |
+ | Chemistry | TBC | TBC | TBC |
+ | Computer Science | TBC | TBC | TBC |
+ | Economics | TBC | TBC | TBC |
+ | Engineering | TBC | TBC | TBC |
+ | Health | TBC | TBC | TBC |
+ | History | TBC | TBC | TBC |
+ | Law | TBC | TBC | TBC |
+ | Other | TBC | TBC | TBC |
+ | Philosophy | TBC | TBC | TBC |
+ | Physics | TBC | TBC | TBC |
+ | Psychology | TBC | TBC | TBC |
+ | **Average** | **25.0%** | **57.5%** | **+32.5%** |
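The arithmetic behind the table can be checked with a short sketch. The correct-answer counts below are inferred from the published percentages at 20 samples per category (for example, 40.0% of 20 is 8); the README does not list raw counts, so treat them as an assumption. The Delta column is a difference in percentage points, and the Average row covers only the two categories reported so far.

```python
def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


SAMPLES = 20  # samples per category, from the table heading

# Counts inferred from the table's percentages (assumption, not published data).
counts = {
    "Biology": {"stock": 8, "lemer": 12},
    "Math": {"stock": 2, "lemer": 11},
}

rows = {}
for category, c in counts.items():
    stock = accuracy(c["stock"], SAMPLES)
    lemer = accuracy(c["lemer"], SAMPLES)
    rows[category] = (stock, lemer, lemer - stock)  # delta in percentage points

# Unweighted mean over the categories that have results so far.
n = len(rows)
avg_stock = sum(r[0] for r in rows.values()) / n
avg_lemer = sum(r[1] for r in rows.values()) / n
avg_delta = avg_lemer - avg_stock

for category, (stock, lemer, delta) in rows.items():
    print(f"{category}: {stock:.1f}% -> {lemer:.1f}% ({delta:+.1f} pp)")
print(f"Average: {avg_stock:.1f}% -> {avg_lemer:.1f}% ({avg_delta:+.1f} pp)")
```

At 20 samples per category a single question moves a score by 5 percentage points, so small differences between columns should be read with that granularity in mind.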
+
+ > Stock Gemma 4 E2B shows a strong bias toward answer option "I" (50-80% of responses), which may indicate RLHF calibration issues when the model is served via MLX. Lemer does not exhibit this bias.
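An answer-option bias of the kind noted above can be detected by counting how often each letter is predicted. The sketch below is illustrative only: `letter_shares` and `dominant_letter` are hypothetical helper names, the 0.5 threshold is an arbitrary choice, and the sample predictions are made up rather than taken from the benchmark runs.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def letter_shares(predictions: List[str]) -> Dict[str, float]:
    """Fraction of responses per predicted answer letter, most common first."""
    counts = Counter(predictions)
    total = len(predictions)
    return {letter: n / total for letter, n in counts.most_common()}


def dominant_letter(
    predictions: List[str], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (letter, share) if one letter reaches the threshold, else None."""
    if not predictions:
        return None
    shares = letter_shares(predictions)
    letter, share = next(iter(shares.items()))  # most common letter
    return (letter, share) if share >= threshold else None


# Illustrative predictions only, not data from the benchmark runs.
sample = ["I"] * 12 + ["A", "B", "C", "D", "E", "F", "G", "H"]
print(dominant_letter(sample))
```

With ten answer options, a model that spread its answers evenly would pick each letter about 10% of the time, so a single letter at 50% or more is a clear outlier.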
+
+ ### Lemer Quantisation Benchmarks (MMLU-Pro, all categories, avg of 4 runs)
+
+ | | bf16 | 8bit | 6bit | 5bit | 4bit | mxfp8 | mxfp4 | nvfp4 |
+ | :---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
+ | Biology | 60.0% | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Math | 55.0% | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Business | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Chemistry | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Computer Science | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Economics | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Engineering | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Health | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | History | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Law | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Other | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Philosophy | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Physics | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | Psychology | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+ | **Average** | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
 
  ## Use
 
+ ### MLX (recommended for Apple Silicon)
 
  ```bash
  pip install mlx-lm

  response = generate(model, tokenizer, prompt="Hello", max_tokens=200)
  ```
 
+ ### Rapid-MLX (OpenAI-compatible server)
 
  ```bash
+ pip install rapid-mlx
+ rapid-mlx serve lthn/lemer --port 8100
+ ```
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(base_url="http://localhost:8100/v1", api_key="not-needed")
+ response = client.chat.completions.create(
+     model="default",
+     messages=[{"role": "user", "content": "Hello"}],
+ )
+ print(response.choices[0].message.content)
  ```
 
  ### HF Transformers
 
 
  | Branch | Size |
  |--------|------|
+ | `bf16-gguf` | 8.7G |
+ | `8bit-gguf` | 4.6G |
+ | `6bit-gguf` | 3.6G |
+ | `5bit-gguf` | 3.0G |
+ | `4bit-gguf` | 2.5G |
+ | `3bit-gguf` | 2.0G |
 
  ### HF Transformers
 
  | Branch | Size |
  |--------|------|
+ | `bf16-hf` | 8.7G |
 
  ## Base
 
 
 
  ## More
 
+ - [lthn.ai](https://lthn.ai)
+ - [Lethean Network](https://github.com/LetheanNetwork)
 
  ## Licence
 
  Training data and adapter: [EUPL-1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
+ Base model: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)