Rapid42/gemma-4-E2B-it-MLX

Gemma 4 (~1B, E2B ultra-efficient variant) — MLX format for Apple Silicon, instruction-tuned

Converted and optimized by Rapid42 — engineering tools for fast pipelines.


What This Is

This is Gemma 4 E2B — Google DeepMind's ultra-compact multimodal Gemma 4 variant (~1B parameters) in MLX format for native Apple Silicon inference. Instruction-tuned (-it) for chat and task-following.

The E2B is the smallest model in the Gemma 4 family — prioritising speed and minimal memory over raw capability. It still supports image input, making it one of the few sub-2B multimodal models available in MLX format.

  • Parameters: ~1B (E2B = Efficient 2B-class, actual ~1B)
  • Modality: Text + Image input → Text output
  • Format: MLX (Apple Silicon native)
  • Base model: google/gemma-4-it
  • License: Apache 2.0

Hardware Requirements

Device                       RAM       Experience
Any M-series Mac (8GB+)      ~1.5GB    ✅ Runs on everything
M1 MacBook Air (8GB)         ~1.5GB    ✅ Extremely fast
iPhone / iPad (via MLX)      ~1.5GB    ✅ On-device capable
M3 Max                       ~1.5GB    ✅ Near-instant — alongside any other app

One of the lightest multimodal models you can run locally. Load time is typically under 2 seconds.


Quick Start

pip install mlx-lm

Text chat:

from mlx_lm import load, generate

model, tokenizer = load("Rapid42/gemma-4-E2B-it-MLX")

messages = [{"role": "user", "content": "Summarize this in one paragraph: [paste text]"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
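For intuition, Gemma-family chat templates wrap each turn in <start_of_turn>/<end_of_turn> markers. A minimal sketch of what apply_chat_template roughly produces, assuming this model keeps the standard Gemma convention (in practice, always use the tokenizer's own template rather than hand-rolling it):

```python
# Hand-rolled sketch of a Gemma-style chat template, for illustration only.
# The tokenizer's apply_chat_template is the authoritative implementation.
def gemma_style_prompt(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(parts)

prompt = gemma_style_prompt([{"role": "user", "content": "Hi"}])
print(prompt)
```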

CLI (fastest way to chat):

mlx_lm.chat --model Rapid42/gemma-4-E2B-it-MLX

With image input (via MLX-VLM):

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model Rapid42/gemma-4-E2B-it-MLX \
  --prompt "What's in this image?" \
  --image /path/to/image.jpg

E2B vs E4B — Which to Use?

Use Case                         E2B (~1B)         E4B (~8B)
Quick summaries, short answers   ✅ Fast           ✅ More accurate
Complex reasoning                ❌ Limited        ✅ Much better
Always-on background assistant   ✅ Ideal          ⚠️ Uses more RAM
Image understanding              ✅ Basic          ✅ Strong
On-device mobile                 ✅ Yes            ⚠️ Tight
Code generation                  ⚠️ Simple only    ✅ Good

Rule of thumb: Use E2B when you need speed and low overhead. Use E4B when you need quality.
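That rule of thumb can be expressed as a tiny dispatch helper. This is a hypothetical sketch — the E4B repo name is an assumed sibling of this one, and the decision flags are placeholders for whatever criteria fit your app:

```python
# Hypothetical helper: pick a model ID based on the task profile.
E2B = "Rapid42/gemma-4-E2B-it-MLX"   # fast, low-RAM
E4B = "Rapid42/gemma-4-E4B-it-MLX"   # assumed sibling repo name

def pick_model(needs_reasoning: bool, always_on: bool) -> str:
    if always_on and not needs_reasoning:
        return E2B                   # minimal overhead wins for background use
    return E4B if needs_reasoning else E2B

print(pick_model(needs_reasoning=False, always_on=True))   # background assistant
print(pick_model(needs_reasoning=True, always_on=False))   # complex task
```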


Gemma 4 License

Apache 2.0. Full details: ai.google.dev/gemma/docs/gemma_4_license

Authors: Google DeepMind


About Rapid42

Rapid42 builds fast, precise engineering tools — from VFX pipeline utilities to optimized ML model distributions.

rapid42.com · ExrToPsd · Level Careers
