Snider Virgil committed on
Commit d988e07 · 1 Parent(s): 0070029

feat: merge LEK into lemer weights

- Merge LEK LoRA adapter into Gemma 4 E2B base via PEFT, preserve
KV-shared layer weights from original Google safetensors so
mlx_vlm can load the full 2011-tensor multimodal checkpoint
- Regenerate all 6 GGUF variants (bf16, q8_0, q6_k, q5_k_m, q4_k_m,
q3_k_m) from the merged safetensors via convert_hf_to_gguf.py
- Regenerate MLX Q4 multimodal safetensors; bundle in this repo
as model.safetensors alongside GGUFs
- README: update Roadmap to reflect LEK shipped (not planned);
fix MLX Server example to use mlx_vlm.server (multimodal-aware)
- Base fork of unmodified Google weights remains at
LetheanNetwork/lemer for users who want the raw bf16
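The LoRA merge in the first bullet can be illustrated with a toy weight fold. This is a minimal NumPy sketch; the real merge used PEFT on the Gemma 4 E2B weights, and the shapes, rank, and alpha here are all illustrative:

```python
# Toy illustration of folding a LoRA delta into a base weight matrix.
# Shapes and scaling are illustrative; the actual merge was done via PEFT.
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8                  # hidden size, LoRA rank, LoRA alpha (assumed)
W = rng.standard_normal((d, d))          # base weight
A = rng.standard_normal((r, d)) * 0.01   # LoRA down-projection
B = rng.standard_normal((d, r)) * 0.01   # LoRA up-projection

# Merge: fold the scaled low-rank delta into the base weight once,
# so inference needs a single standalone model with no adapter runtime.
W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces base-plus-adapter output exactly.
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

Because the delta is folded in once, the merged checkpoint loads like any ordinary safetensors file, which is why the KV-shared layers had to be carried over from the original Google weights before export.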

Co-Authored-By: Virgil <virgil@lethean.io>

README.md CHANGED
@@ -21,7 +21,11 @@ tags:
21
  - on-device
22
  - conversational
23
  ---
24
-
 
 
 
 
25
  # Lemer — Gemma 4 E2B
26
 
27
  The smallest member of the [Lemma model family](https://huggingface.co/collections/lthn/lemma) by [Lethean](https://lthn.ai). An EUPL-1.2 fork of [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it), prepared as the base for the Lethean Ethical Model (LEM) adapter. GGUF and MLX builds with full multimodal support — text, image, and audio — distributed from a single repo. Use GGUF with Ollama, llama.cpp, GPT4All, or LM Studio. Use MLX safetensors with `mlx-lm` and `mlx-vlm` for native Apple Silicon inference.
@@ -221,8 +225,10 @@ print(output.text)
221
  <details>
222
  <summary>MLX Server</summary>
223
 
 
 
224
  ```bash
225
- mlx_lm.server --model lthn/lemer
226
  ```
227
 
228
  ```bash
@@ -378,16 +384,16 @@ The model `id` should match what `mlx_lm.server` reports at `/v1/models`.
378
 
379
  ## Roadmap
380
 
381
- This release of `lemer` is the **base distribution** — Gemma 4 E2B with full multimodal support, EUPL-1.2 licensing, and Lethean's packaging for the Apple Silicon ecosystem. The Lethean Ethical Model (LEM) layer is delivered separately as a LoRA adapter, applied on top of this base.
382
 
383
  | Phase | Status | What it adds |
384
  |-------|--------|--------------|
385
- | **Base distribution** (this repo) | ✅ Released | EUPL-1.2 fork, multimodal quants, native MLX + GGUF |
386
- | **LEK-1 adapter** | 🚧 In progress | Lethean Ethics Kernel via LoRA intrinsic axiom-based reasoning |
387
  | **8-PAC eval results** | 🚧 In progress | Continuous benchmarking on the homelab, published to [lthn/LEM-benchmarks](https://huggingface.co/datasets/lthn/LEM-benchmarks) |
388
- | **Lemma family roll-out** | Planned | Same treatment for `lemma`, `lemmy`, `lemrd` |
389
 
390
- The LEK-1 adapter adds intrinsic ethical alignment axiom-based reasoning baked into the weights via LoRA finetune. Track progress at [LetheanNetwork](https://github.com/LetheanNetwork) and the [LEM-research dataset](https://huggingface.co/datasets/lthn/LEM-research).
391
 
392
  ## Why EUPL-1.2
393
 
 
21
  - on-device
22
  - conversational
23
  ---
24
+ <!--
25
+ This content is subject to the European Union Public Licence (EUPL-1.2).
26
+ For full licence details, please refer to: https://huggingface.co/lthn/lemer/tree/main/LICENSE
27
+ Origin URL: https://huggingface.co/lthn/lemer/tree/main
28
+ -->
29
  # Lemer — Gemma 4 E2B
30
 
31
  The smallest member of the [Lemma model family](https://huggingface.co/collections/lthn/lemma) by [Lethean](https://lthn.ai). An EUPL-1.2 fork of [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it), prepared as the base for the Lethean Ethical Model (LEM) adapter. GGUF and MLX builds with full multimodal support — text, image, and audio — distributed from a single repo. Use GGUF with Ollama, llama.cpp, GPT4All, or LM Studio. Use MLX safetensors with `mlx-lm` and `mlx-vlm` for native Apple Silicon inference.
 
225
  <details>
226
  <summary>MLX Server</summary>
227
 
228
+ `lemer` is multimodal, so use `mlx_vlm.server` — the vision-aware variant that handles image and audio inputs. The text-only `mlx_lm.server` does not correctly route multimodal tensors for Gemma 4.
229
+
230
  ```bash
231
+ mlx_vlm.server --model lthn/lemer
232
  ```
233
 
234
  ```bash
 
384
 
385
  ## Roadmap
386
 
387
+ This release of `lemer` is **Gemma 4 E2B with the Lethean Ethical Kernel (LEK) merged in** — axiom-based reasoning baked into the attention weights via LoRA finetune, then merged into the base so inference uses a single standalone model with no PEFT runtime required. The unmodified Gemma 4 E2B fork lives at [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer) for users who want the raw Google weights without the LEK shift.
388
 
389
  | Phase | Status | What it adds |
390
  |-------|--------|--------------|
391
+ | **Base fork** ([LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer)) | ✅ Released | EUPL-1.2 fork of Gemma 4 E2B unmodified Google weights |
392
+ | **LEK merged** (this repo) | ✅ Released | Lethean Ethical Kernel — axiom-based reasoning via LoRA merge |
393
  | **8-PAC eval results** | 🚧 In progress | Continuous benchmarking on the homelab, published to [lthn/LEM-benchmarks](https://huggingface.co/datasets/lthn/LEM-benchmarks) |
394
+ | **Lemma family roll-out** | Planned | Same LEK treatment for `lemma`, `lemmy`, `lemrd` |
395
 
396
+ The LEK axioms are public domain and published at [Snider/ai-ethics](https://github.com/Snider/ai-ethics). Track research progress at [LetheanNetwork](https://github.com/LetheanNetwork) and the [LEM-research dataset](https://huggingface.co/datasets/lthn/LEM-research).
397
 
398
  ## Why EUPL-1.2
399
 
config.json CHANGED
@@ -963,6 +963,7 @@
963
  ],
964
  "max_position_embeddings": 131072,
965
  "model_type": "gemma4_text",
 
966
  "num_attention_heads": 8,
967
  "num_experts": null,
968
  "num_global_key_value_heads": null,
@@ -992,7 +993,7 @@
992
  "vocab_size_per_layer_input": 262144
993
  },
994
  "tie_word_embeddings": true,
995
- "transformers_version": "5.5.0.dev0",
996
  "video_token_id": 258884,
997
  "vision_config": {
998
  "_name_or_path": "",
 
963
  ],
964
  "max_position_embeddings": 131072,
965
  "model_type": "gemma4_text",
966
+ "moe_intermediate_size": null,
967
  "num_attention_heads": 8,
968
  "num_experts": null,
969
  "num_global_key_value_heads": null,
 
993
  "vocab_size_per_layer_input": 262144
994
  },
995
  "tie_word_embeddings": true,
996
+ "transformers_version": "5.5.3",
997
  "video_token_id": 258884,
998
  "vision_config": {
999
  "_name_or_path": "",
generation_config.json CHANGED
@@ -10,5 +10,5 @@
10
  "temperature": 1.0,
11
  "top_k": 64,
12
  "top_p": 0.95,
13
- "transformers_version": "5.5.0.dev0"
14
  }
 
10
  "temperature": 1.0,
11
  "top_k": 64,
12
  "top_p": 0.95,
13
+ "transformers_version": "5.5.3"
14
  }
lemer-bf16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6abdfae0cf39488d8cf73f781bd28904349d606867b63f169890c145e1336bde
3
- size 9311298080
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea102af3dee5545c80913875ff7ec751dc8154bee353f0c0619b1dac40c41651
3
+ size 9311303008
lemer-q3_k_m.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dbe61d2b633896299c4cf305ab2cf31347bf7046a0c59e44c4805ec23ee7d8f6
3
- size 3201344032
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:93465a43a32e8e49ef8f1f69bbf773ee5ebe9ab6fb30edb47a8b4f1baaca0f98
3
+ size 3201348960
lemer-q4_k_m.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:84bfe69062556259acffb9fb349bea48e81ebecdcb2330ded16b474ab651be93
3
- size 3427873312
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d46042ec27fd8db9add13d189dc8139cc38833f9b07ca30674e1e0e994a15b19
3
+ size 3427878240
lemer-q5_k_m.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:203bc29352c43df44a02ecbc420a6138e53abe87a1050c8c43760f14ba485a8f
3
- size 3630281248
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3ae9a43241bd0cdd5dd26ac33fbbc1df4c16417523f6af0ea6e51fe52130d09a
3
+ size 3630286176
lemer-q6_k.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:36b2ea742d8fd470f261b2d04c708097b19e37fd9d743a73056a02eea92cf31a
3
- size 3845339680
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a60f6f982b8bd78be62e65217eb5980e46f03de7f95526d72e411fc7a5bf42e6
3
+ size 3845344608
lemer-q8_0.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:77f04dd659bbea6cdb3d2743fcb8d8a9e15ba779d6db467c13fc5807c4ecd2d2
3
- size 4954587680
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a98186034a401edf6bb30404c7dfc0ad06bb66c85afd632e87717bdb6e42e2c
3
+ size 4967495008
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b1ceae57a3e334570b3d5152d36094ea390f726fab9cb07e6c310f796c7ce0b8
3
  size 4359668843
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7e3458e3ce7473988ebafb1701b0a80621d783e09484994b940a7e644e0e943e
3
  size 4359668843
processor_config.json CHANGED
@@ -1,5 +1,27 @@
1
  {
 
2
  "audio_seq_length": 750,

3
  "image_processor": {
4
  "do_convert_rgb": true,
5
  "do_normalize": false,
@@ -21,22 +43,33 @@
21
  "patch_size": 16,
22
  "pooling_kernel_size": 3,
23
  "resample": 3,
24
- "rescale_factor": 0.00392156862745098,
25
- "size": {
26
- "height": 224,
27
- "width": 224
28
- }
29
  },
30
  "image_seq_length": 280,
31
  "processor_class": "Gemma4Processor",
32
- "feature_extractor": {
33
- "feature_extractor_type": "Gemma4AudioFeatureExtractor",
34
- "sampling_rate": 16000,
35
- "num_mel_filters": 128,
36
- "fft_length": 512,
37
- "hop_length": 160,
38
- "chunk_duration": 8.0,
39
- "overlap_duration": 1.0
40
- },
41
- "audio_ms_per_token": 40
42
- }

1
  {
2
+ "audio_ms_per_token": 40,
3
  "audio_seq_length": 750,
4
+ "feature_extractor": {
5
+ "dither": 0.0,
6
+ "feature_extractor_type": "Gemma4AudioFeatureExtractor",
7
+ "feature_size": 128,
8
+ "fft_length": 512,
9
+ "fft_overdrive": false,
10
+ "frame_length": 320,
11
+ "hop_length": 160,
12
+ "input_scale_factor": 1.0,
13
+ "max_frequency": 8000.0,
14
+ "mel_floor": 0.001,
15
+ "min_frequency": 0.0,
16
+ "padding_side": "right",
17
+ "padding_value": 0.0,
18
+ "per_bin_mean": null,
19
+ "per_bin_stddev": null,
20
+ "preemphasis": 0.0,
21
+ "preemphasis_htk_flavor": true,
22
+ "return_attention_mask": true,
23
+ "sampling_rate": 16000
24
+ },
25
  "image_processor": {
26
  "do_convert_rgb": true,
27
  "do_normalize": false,
 
43
  "patch_size": 16,
44
  "pooling_kernel_size": 3,
45
  "resample": 3,
46
+ "rescale_factor": 0.00392156862745098
 
 
 
 
47
  },
48
  "image_seq_length": 280,
49
  "processor_class": "Gemma4Processor",
50
+ "video_processor": {
51
+ "do_convert_rgb": true,
52
+ "do_normalize": true,
53
+ "do_rescale": true,
54
+ "do_resize": true,
55
+ "do_sample_frames": true,
56
+ "image_mean": [
57
+ 0.0,
58
+ 0.0,
59
+ 0.0
60
+ ],
61
+ "image_std": [
62
+ 1.0,
63
+ 1.0,
64
+ 1.0
65
+ ],
66
+ "max_soft_tokens": 70,
67
+ "num_frames": 32,
68
+ "patch_size": 16,
69
+ "pooling_kernel_size": 3,
70
+ "resample": 3,
71
+ "rescale_factor": 0.00392156862745098,
72
+ "return_metadata": false,
73
+ "video_processor_type": "Gemma4VideoProcessor"
74
+ }
75
+ }
tokenizer_config.json CHANGED
@@ -17,7 +17,7 @@
17
  "<|video|>"
18
  ],
19
  "image_token": "<|image|>",
20
- "is_local": true,
21
  "mask_token": "<mask>",
22
  "model_max_length": 1000000000000000019884624838656,
23
  "model_specific_special_tokens": {
 
17
  "<|video|>"
18
  ],
19
  "image_token": "<|image|>",
20
+ "is_local": false,
21
  "mask_token": "<mask>",
22
  "model_max_length": 1000000000000000019884624838656,
23
  "model_specific_special_tokens": {