Rapid42/gemma-4-E4B-it-MLX

Gemma 4 (~8B, E4B variant) — MLX format for Apple Silicon, instruction-tuned

Converted and optimized by Rapid42 — engineering tools for fast pipelines.


What This Is

This is Gemma 4 E4B (Google DeepMind's 4th generation Gemma, ~8B parameters in the E4B multimodal variant) converted to MLX format for native Apple Silicon inference. Instruction-tuned (-it) for chat and task-following.

Gemma 4 is Google's latest open model family — multimodal (text + image input), with strong performance on reasoning, coding, and multilingual tasks.

  • Parameters: ~8B (E4B = Efficient 4B-class, actual ~8B)
  • Modality: Text + Image input → Text output
  • Format: MLX (Apple Silicon native)
  • Base model: google/gemma-4-it
  • License: Apache 2.0

Hardware Requirements

Device                   | RAM needed | Experience
Any M-series Mac (16GB+) | ~8GB       | ✅ Fast and smooth
M1 MacBook Air (8GB)     | ~8GB       | ⚠️ Tight: works, but little headroom
M3 Pro / M3 Max          | ~8GB       | ✅ Near-instant

A practical choice for multimodal tasks on any Apple Silicon machine.
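As a rough rule of thumb, weight memory is parameter count times bits per weight divided by 8; the ~8GB figure above is consistent with roughly one byte per parameter, while a 4-bit quantization would halve that (KV cache and runtime overhead come on top). A minimal sketch of that arithmetic (the function name is ours, not part of mlx-lm):

```python
def estimate_weight_ram_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough weight-only memory estimate in GB (ignores KV cache and runtime overhead)."""
    return params_billions * bits_per_weight / 8

# ~8B parameters at different precisions:
print(estimate_weight_ram_gb(8, 16))  # BF16  -> 16.0 GB
print(estimate_weight_ram_gb(8, 8))   # 8-bit ->  8.0 GB
print(estimate_weight_ram_gb(8, 4))   # 4-bit ->  4.0 GB
```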


Quick Start

pip install mlx-lm

Text chat:

from mlx_lm import load, generate

model, tokenizer = load("Rapid42/gemma-4-E4B-it-MLX")

messages = [{"role": "user", "content": "What are the key advantages of MLX over PyTorch for Apple Silicon?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(response)
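For multi-turn chat, the pattern above extends naturally: keep appending to the same messages list and re-apply the chat template each turn so the model sees the full history. A minimal sketch of the bookkeeping (chat_turn is our helper name, not an mlx-lm API; generate_fn stands in for the apply_chat_template + generate call shown above):

```python
def chat_turn(messages, user_text, generate_fn):
    """Append a user message, get a reply via generate_fn, and record it in history."""
    messages.append({"role": "user", "content": user_text})
    reply = generate_fn(messages)  # e.g. tokenizer.apply_chat_template(...) + mlx_lm.generate(...)
    messages.append({"role": "assistant", "content": reply})
    return reply

# Usage with a stub in place of the real model call:
history = []
chat_turn(history, "Hello!", lambda msgs: "Hi there!")
chat_turn(history, "What is MLX?", lambda msgs: "An array framework for Apple Silicon.")
print([m["role"] for m in history])  # roles alternate user/assistant
```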

With image input (via MLX-VLM):

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model Rapid42/gemma-4-E4B-it-MLX \
  --prompt "Describe what you see in this image." \
  --image /path/to/image.jpg
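The same CLI can be scripted from Python to caption a batch of images. The sketch below only assembles the argument list for subprocess; build_vlm_cmd is our helper name, and the flags simply mirror the command shown above:

```python
import subprocess

def build_vlm_cmd(image_path: str, prompt: str,
                  model: str = "Rapid42/gemma-4-E4B-it-MLX") -> list[str]:
    """Assemble the mlx_vlm.generate command line from above for one image."""
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", model,
        "--prompt", prompt,
        "--image", image_path,
    ]

# Example batch loop (uncomment to actually run the CLI):
# for img in ["a.jpg", "b.jpg"]:
#     subprocess.run(build_vlm_cmd(img, "Describe what you see in this image."), check=True)
```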

Why Gemma 4?

Gemma 4 represents a significant step up from Gemma 2/3:

  • Multimodal — understands images, not just text
  • Improved reasoning — stronger on benchmarks vs Gemma 3 of the same size
  • Apache 2.0 — fully open license, commercial use allowed
  • Google DeepMind quality — trained on the same infrastructure as Gemini

The E4B variant is the sweet spot: multimodal capability from ~8B parameters with 4B-class efficiency.


Gemma 4 License

Apache 2.0. Full details: ai.google.dev/gemma/docs/gemma_4_license

Authors: Google DeepMind


About Rapid42

Rapid42 builds fast, precise engineering tools — from VFX pipeline utilities to optimized ML model distributions.

rapid42.com · ExrToPsd · Level Careers
