docs: remove preliminary benchmarks pending official eval
README.md
CHANGED
@@ -31,52 +31,6 @@ A [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it) finetune by [lthn.
 **HF Transformers**: on main (4-bit NF4 + bf16 in hf-bf16/)


-## Benchmarks
-
-### Eval: MMLU Pro
-
-Columns: **(Think, Temperature)** — `G4` = Stock Gemma 4 E2B, `Lemer` = LEK-activated
-
-| | G4(1,0) | G4(1,1) | G4(0,0) | G4(0,1) | Lemer(1,0) | Lemer(1,1) | Lemer(0,0) | Lemer(0,1) |
-| :---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
-| Biology | 40.0% | TBC | TBC | TBC | **60.0%** | [**65.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "65.0") | TBC | TBC |
-| Math | 10.0% | 30.0% | 15.0% | 10.0% | **55.0%** | [**80.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "80.0") | **25.0%** | **25.0%** |
-| Business | TBC | TBC | TBC | 40.0% | TBC | [**50.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "50.0") | TBC | TBC |
-| Chemistry | TBC | TBC | TBC | 15.0% | TBC | [**35.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "35.0") | TBC | TBC |
-| Computer Science | TBC | TBC | TBC | 65.0% | TBC | [**65.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "65.0") | TBC | TBC |
-| Economics | TBC | TBC | TBC | 25.0% | TBC | [**45.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "45.0") | TBC | TBC |
-| Engineering | TBC | TBC | TBC | 50.0% | TBC | [**55.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "55.0") | TBC | TBC |
-| Health | TBC | TBC | TBC | 25.0% | TBC | [**35.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "35.0") | TBC | TBC |
-| History | TBC | TBC | TBC | 5.0% | TBC | [**15.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "15.0") | TBC | TBC |
-| Law | TBC | TBC | TBC | 15.0% | TBC | [**25.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "25.0") | TBC | TBC |
-| Other | TBC | TBC | TBC | 45.0% | TBC | [**55.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "55.0") | TBC | TBC |
-| Philosophy | TBC | TBC | TBC | 25.0% | TBC | [**30.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "30.0") | TBC | TBC |
-| Physics | TBC | TBC | TBC | 45.0% | TBC | [**50.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "50.0") | TBC | TBC |
-| Psychology | TBC | TBC | TBC | 20.0% | TBC | [**45.0%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "45.0") | TBC | TBC |
-| **Average** | TBC | TBC | TBC | **33.6%** | TBC | [**46.4%**](https://huggingface.co/lthn/lemer/blob/bf16/README.md#mmlu-pro-bf16-best-per-category "46.4") | TBC | TBC |
-
-
-### Lemer Quantisation Benchmarks (MMLU-Pro, all categories)
-
-| | [bf16](https://huggingface.co/lthn/lemer/tree/bf16) | [8bit](https://huggingface.co/lthn/lemer/tree/8bit) | [6bit](https://huggingface.co/lthn/lemer/tree/6bit) | [5bit](https://huggingface.co/lthn/lemer/tree/5bit) | [4bit](https://huggingface.co/lthn/lemer/tree/4bit) | [mxfp8](https://huggingface.co/lthn/lemer/tree/mxfp8) | [mxfp4](https://huggingface.co/lthn/lemer/tree/mxfp4) | [nvfp4](https://huggingface.co/lthn/lemer/tree/nvfp4) |
-| :---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
-| Biology | 65.0% | TBC | TBC | TBC | 45.0% | TBC | TBC | TBC |
-| Math | 60.0% | TBC | TBC | TBC | 50.0% | TBC | TBC | TBC |
-| Business | 45.0% | TBC | TBC | TBC | 40.0% | TBC | TBC | TBC |
-| Chemistry | 20.0% | TBC | TBC | TBC | 15.0% | TBC | TBC | TBC |
-| Computer Science | 50.0% | TBC | TBC | TBC | 65.0% | TBC | TBC | TBC |
-| Economics | 25.0% | TBC | TBC | TBC | 25.0% | TBC | TBC | TBC |
-| Engineering | 50.0% | TBC | TBC | TBC | 50.0% | TBC | TBC | TBC |
-| Health | 25.0% | TBC | TBC | TBC | 25.0% | TBC | TBC | TBC |
-| History | 10.0% | TBC | TBC | TBC | 5.0% | TBC | TBC | TBC |
-| Law | 25.0% | TBC | TBC | TBC | 15.0% | TBC | TBC | TBC |
-| Other | 45.0% | TBC | TBC | TBC | 45.0% | TBC | TBC | TBC |
-| Philosophy | 20.0% | TBC | TBC | TBC | 25.0% | TBC | TBC | TBC |
-| Physics | 50.0% | TBC | TBC | TBC | 45.0% | TBC | TBC | TBC |
-| Psychology | 25.0% | TBC | TBC | TBC | 20.0% | TBC | TBC | TBC |
-| **Average** | **36.8%** | TBC | TBC | TBC | **33.6%** | TBC | TBC | TBC |
-
-
 ## Base

 [google/gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E2B-it)
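For anyone re-running these numbers before the official eval, here is a minimal sketch that recomputes the **Average** row of the removed quantisation table from its per-category values. It assumes the row is an unweighted mean over the 14 categories, which the README does not state; the variable and function names are illustrative only.

```python
from statistics import mean

# Per-category MMLU-Pro accuracy (%) copied from the removed quantisation table,
# in row order: Biology, Math, Business, Chemistry, Computer Science, Economics,
# Engineering, Health, History, Law, Other, Philosophy, Physics, Psychology.
bf16 = [65.0, 60.0, 45.0, 20.0, 50.0, 25.0, 50.0,
        25.0, 10.0, 25.0, 45.0, 20.0, 50.0, 25.0]
four_bit = [45.0, 50.0, 40.0, 15.0, 65.0, 25.0, 50.0,
            25.0, 5.0, 15.0, 45.0, 25.0, 45.0, 20.0]


def macro_average(scores):
    """Unweighted mean over categories, rounded to one decimal place.

    Assumption: this is how the table's Average row was derived.
    """
    return round(mean(scores), 1)


bf16_avg = macro_average(bf16)
four_bit_avg = macro_average(four_bit)
print(f"bf16: {bf16_avg}%  4bit: {four_bit_avg}%")
```

Under that assumption both results agree with the removed table's bf16 and 4bit averages (36.8% and 33.6%).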