---
library_name: mlx
license: apache-2.0
pipeline_tag: image-text-to-text
base_model:
- google/gemma-4-E2B-it
base_model_relation: quantized
tags:
- gemma4
- mlx
- apple-silicon
- 8bit
- on-device
- conversational
---

# LetheanNetwork/lemer-mlx-8bit

Gemma 4 E2B in MLX format, 8-bit quantized, converted from [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer)'s bf16 safetensors via `mlx_lm.convert`. Higher-precision sibling of [`LetheanNetwork/lemer-mlx`](https://huggingface.co/LetheanNetwork/lemer-mlx), which is 4-bit. For the LEK-merged variant, see [`lthn/lemer`](https://huggingface.co/lthn/lemer).

## Variants in this family

| Repo | Format | Bits | Use case |
|---|---|---|---|
| [`LetheanNetwork/lemer`](https://huggingface.co/LetheanNetwork/lemer) | safetensors + gguf Q4_K_M | bf16 / 4 | Source weights + llama.cpp/Ollama |
| [`LetheanNetwork/lemer-mlx`](https://huggingface.co/LetheanNetwork/lemer-mlx) | mlx | 4 | Apple Silicon default |
| **`LetheanNetwork/lemer-mlx-8bit`** | mlx | 8 | **This repo**: higher precision |
| [`LetheanNetwork/lemer-mlx-bf16`](https://huggingface.co/LetheanNetwork/lemer-mlx-bf16) | mlx | bf16 | Full-precision reference |

## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("LetheanNetwork/lemer-mlx-8bit")

response = generate(
    model,
    tokenizer,
    prompt=tokenizer.apply_chat_template(
        [{"role": "user", "content": "Hello"}],
        add_generation_prompt=True,
        enable_thinking=True,
    ),
    max_tokens=512,
)
```

## Provenance

- Source: `LetheanNetwork/lemer` bf16 safetensors (= `google/gemma-4-E2B-it`)
- Converter: `mlx_lm.convert` (mlx-lm, LM Studio / Apple ML Research)
- Quant: 8-bit group quantization, ~8.5 bits/weight effective
- License: Apache 2.0 (Gemma Terms of Use)

## License

Apache 2.0, subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/docs/gemma_4_license).
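
## Streaming usage

For interactive use, mlx-lm also exposes `stream_generate`, which yields output incrementally instead of returning the full reply at once. A minimal sketch; the `chunk.text` field follows recent mlx-lm releases, so older versions (which yield plain strings) may need a small adjustment:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("LetheanNetwork/lemer-mlx-8bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)

# Print each chunk as soon as it is generated rather than waiting
# for the full 512-token response.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```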
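
## Reproducing the conversion

The 8-bit weights in this repo were produced with `mlx_lm.convert`, as noted under Provenance. The sketch below approximates that step with the `convert` helper from mlx-lm; `q_group_size=64` is mlx-lm's default and is an assumption here, since the exact settings used for this repo are not recorded:

```python
from mlx_lm import convert

# Quantize the bf16 source weights to 8 bits per group and write an
# MLX-format model directory. q_group_size=64 is mlx-lm's default; the
# exact value used for this repo is an assumption.
convert(
    "LetheanNetwork/lemer",      # bf16 safetensors source
    mlx_path="lemer-mlx-8bit",   # local output directory
    quantize=True,
    q_bits=8,
    q_group_size=64,
)
```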