LetheanNetwork/lemer-bk

@@ -9,6 +9,8 @@ tags:
 - transformers
 - 8-bit
 - gguf
 base_model:
 - google/gemma-4-E2B-it
 base_model_relation: quantized
@@ -18,13 +20,60 @@ library_name: mlx
 # Lemer
-A Gemma 4 E2B fine-tune by [Lethean Network](https://lthn.ai/lemer).
-EUPL-1.2 · Apache 2.0 base · [lthn.ai/lemer](https://lthn.ai/lemer)
 ## Use
-### MLX
 ```bash
 pip install mlx-lm
@@ -37,10 +86,21 @@ model, tokenizer = load("lthn/lemer", revision="4bit")
 response = generate(model, tokenizer, prompt="Hello", max_tokens=200)
 ```
-### Ollama
 ```bash
-# Coming soon
 ```
 ### HF Transformers
@@ -71,17 +131,18 @@ tokenizer = AutoTokenizer.from_pretrained("lthn/lemer", revision="bf16-hf")
 | Branch | Size |
 |--------|------|
-| `bf16-gguf` | Coming soon |
-| `8bit-gguf` | Coming soon |
-| `6bit-gguf` | Coming soon |
-| `5bit-gguf` | Coming soon |
-| `4bit-gguf` | Coming soon |
 ### HF Transformers
 | Branch | Size |
 |--------|------|
-| `bf16-hf` | Coming soon |
 ## Base
@@ -89,11 +150,10 @@ tokenizer = AutoTokenizer.from_pretrained("lthn/lemer", revision="bf16-hf")
 ## More
-- [lthn.ai/lemer](https://lthn.ai/lemer)
-- [Lethean Network](https://lthn.ai)
-- [GitHub](https://github.com/dappcore)
 ## Licence
 Training data and adapter: [EUPL-1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
-Base model: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

 - transformers
 - 8-bit
 - gguf
+- lek
+- lethean
 base_model:
 - google/gemma-4-E2B-it
 base_model_relation: quantized
 # Lemer
+A Gemma 4 E2B model with LEK activation by [Lethean Network](https://lthn.ai).
+EUPL-1.2 · Apache 2.0 base · [lthn.ai](https://lthn.ai)
+## Benchmarks
+MMLU-Pro (TIGER-Lab/MMLU-Pro, test split), deterministic (temperature=0), thinking enabled.
+Evaluated using [rapid-mlx](https://github.com/LetheanNetwork/Rapid-MLX) + OpenAI SDK + Google `parse_response()`.
+### Lemer vs Stock Gemma 4 E2B (bf16, 20 samples per category)
+|  | Stock E2B bf16 | Lemer bf16 | Delta |
+| :---- | :----: | :----: | :----: |
+| Biology | 40.0% | **60.0%** | +20.0% |
+| Math | 10.0% | **55.0%** | +45.0% |
+| Business | TBC | TBC | TBC |
+| Chemistry | TBC | TBC | TBC |
+| Computer Science | TBC | TBC | TBC |
+| Economics | TBC | TBC | TBC |
+| Engineering | TBC | TBC | TBC |
+| Health | TBC | TBC | TBC |
+| History | TBC | TBC | TBC |
+| Law | TBC | TBC | TBC |
+| Other | TBC | TBC | TBC |
+| Philosophy | TBC | TBC | TBC |
+| Physics | TBC | TBC | TBC |
+| Psychology | TBC | TBC | TBC |
+| **Average** (completed categories) | **25.0%** | **57.5%** | **+32.5%** |
+> In these runs, stock Gemma 4 E2B chose answer option "I" in 50-80% of its responses, suggesting RLHF calibration issues when served via MLX. Lemer does not exhibit this bias.
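
The Average row in the comparison table can be reproduced from the two categories that have results so far (Biology and Math); the TBC rows are not part of it. The sketch below is only a check of that arithmetic. The names `COMPLETED` and `summarize` are illustrative and not part of any evaluation harness, and the delta is read here as a difference in percentage points.

```python
# Completed rows copied from the table above: category -> (stock %, Lemer %).
# Rows still marked TBC are deliberately left out.
COMPLETED = {
    "Biology": (40.0, 60.0),
    "Math": (10.0, 55.0),
}


def summarize(rows):
    """Return (stock mean, Lemer mean, delta in percentage points)."""
    n = len(rows)
    stock_mean = sum(stock for stock, _ in rows.values()) / n
    lemer_mean = sum(lemer for _, lemer in rows.values()) / n
    return stock_mean, lemer_mean, lemer_mean - stock_mean


stock_mean, lemer_mean, delta = summarize(COMPLETED)
print(f"Stock {stock_mean:.1f}% | Lemer {lemer_mean:.1f}% | Delta {delta:+.1f} pp")
# prints: Stock 25.0% | Lemer 57.5% | Delta +32.5 pp
```

With 20 samples per category, each category score moves in steps of 5%, so the two completed rows cover 40 questions in total.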
+### Lemer Quantisation Benchmarks (MMLU-Pro, all categories, avg of 4 runs)
+|  | bf16 | 8bit | 6bit | 5bit | 4bit | mxfp8 | mxfp4 | nvfp4 |
+| :---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
+| Biology | 60.0% | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Math | 55.0% | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Business | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Chemistry | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Computer Science | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Economics | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Engineering | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Health | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| History | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Law | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Other | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Philosophy | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Physics | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| Psychology | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
+| **Average** | TBC | TBC | TBC | TBC | TBC | TBC | TBC | TBC |
 ## Use
+### MLX (recommended for Apple Silicon)
 ```bash
 pip install mlx-lm
 response = generate(model, tokenizer, prompt="Hello", max_tokens=200)
 ```
+### Rapid-MLX (OpenAI-compatible server)
 ```bash
+pip install rapid-mlx
+rapid-mlx serve lthn/lemer --port 8100
+```
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:8100/v1", api_key="not-needed")
+response = client.chat.completions.create(
+    model="default",
+    messages=[{"role": "user", "content": "Hello"}],
+)
+print(response.choices[0].message.content)
 ```
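
The Benchmarks section describes deterministic runs (temperature=0) through the OpenAI SDK. As a rough sketch of what such a request could look like against the server started above, the helper below builds a multiple-choice prompt and pins `temperature` to 0. The names `build_request` and `ask` and the prompt wording are hypothetical; this is not the evaluation harness, it does not configure thinking mode, and it does not use `parse_response()`.

```python
from typing import Any

# MMLU-Pro questions carry up to ten options, lettered A to J.
OPTION_LETTERS = "ABCDEFGHIJ"


def build_request(question: str, options: list[str], model: str = "default") -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    The prompt layout is an assumption made for illustration only.
    """
    lines = [question, ""]
    lines += [f"{letter}. {text}" for letter, text in zip(OPTION_LETTERS, options)]
    lines += ["", "Answer with the letter of the correct option."]
    return {
        "model": model,
        "messages": [{"role": "user", "content": "\n".join(lines)}],
        "temperature": 0,  # deterministic decoding, as stated in Benchmarks
    }


def ask(client: Any, question: str, options: list[str]) -> str:
    """Send one question through an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(**build_request(question, options))
    return response.choices[0].message.content
```

With the server running, `ask(OpenAI(base_url="http://localhost:8100/v1", api_key="not-needed"), "What is 2 + 2?", ["3", "4", "5"])` would return the model's reply as a string.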
 ### HF Transformers
 | Branch | Size |
 |--------|------|
+| `bf16-gguf` | 8.7G |
+| `8bit-gguf` | 4.6G |
+| `6bit-gguf` | 3.6G |
+| `5bit-gguf` | 3.0G |
+| `4bit-gguf` | 2.5G |
+| `3bit-gguf` | 2.0G |
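
One way to use the sizes in the table is to pick the largest GGUF branch that fits a given disk budget and download only that revision. The sketch below assumes the `G` suffix means gigabytes and that the branches live in the `lthn/lemer` repository used elsewhere in this README; `pick_branch` and `fetch` are hypothetical helper names. `snapshot_download` is the `huggingface_hub` function for fetching a repository at a specific revision.

```python
# Branch names and sizes copied from the table above (sizes assumed to be GB).
GGUF_BRANCHES = {
    "bf16-gguf": 8.7,
    "8bit-gguf": 4.6,
    "6bit-gguf": 3.6,
    "5bit-gguf": 3.0,
    "4bit-gguf": 2.5,
    "3bit-gguf": 2.0,
}


def pick_branch(budget_gb: float) -> str:
    """Return the largest listed branch whose size fits within budget_gb."""
    fitting = {name: size for name, size in GGUF_BRANCHES.items() if size <= budget_gb}
    if not fitting:
        raise ValueError(f"no GGUF branch fits in {budget_gb} GB")
    return max(fitting, key=fitting.get)


def fetch(branch: str, repo_id: str = "lthn/lemer") -> str:
    """Download one branch and return the local directory path."""
    # Imported lazily so pick_branch() works without the dependency installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id, revision=branch)
```

For example, `fetch(pick_branch(4.0))` would download the `6bit-gguf` branch. The sizes are download sizes, so actual memory use at inference time may differ.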
 ### HF Transformers
 | Branch | Size |
 |--------|------|
+| `bf16-hf` | 8.7G |
 ## Base
 ## More
+- [lthn.ai](https://lthn.ai)
+- [Lethean Network](https://github.com/LetheanNetwork)
 ## Licence
 Training data and adapter: [EUPL-1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
+Base model: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)