gemma-4-E2B-it-GGUF

Gemma-4-E2B-it from Google is an instruction-tuned multimodal dense model in the Gemma 4 family with 2.3B effective parameters (5.1B total, counting Per-Layer Embeddings). It is built for on-device deployment on smartphones, laptops, Raspberry Pi, and IoT edge hardware, with native support for text, images (variable aspect ratio and resolution), and audio, plus configurable thinking modes for advanced reasoning. The architecture has 35 layers, a 512-token sliding window, a 128K context length, and a 262K vocabulary.

Target uses include agentic workflows, OCR (multilingual and handwriting), document/PDF parsing, UI/screen understanding, chart comprehension, object detection, and coding assistance, with low-latency inference optimized for Qualcomm and MediaTek chips via Android AICore. The model is positioned as competitive with models up to 20x larger while using minimal RAM and battery. The instruction-tuned variant is aimed at mobile developers prototyping autonomous agents, with safety protocols matching Google's proprietary standards. The open weights also allow local-first AI servers on consumer GPUs for reasoning-heavy tasks such as IDE assistance and structured data extraction.
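To make the 512-token sliding window concrete, here is a minimal sketch of which positions a token can attend to under a causal sliding window. The function name is hypothetical, and the convention that the window includes the current token is an assumption; this card does not say which of the 35 layers use the window or how it is implemented.

```python
def attended_positions(query_pos: int, window: int = 512) -> range:
    """Return the positions visible to the token at query_pos.

    Assumption: causal attention where the window counts the current
    token, so at most `window` positions are visible.
    """
    start = max(0, query_pos - window + 1)
    return range(start, query_pos + 1)


# Early tokens see the whole prefix; later tokens see only the last 512.
early = attended_positions(10)
late = attended_positions(100_000)
print(len(early), len(late))
```

Under this convention, the memory used by a windowed layer stays bounded by the window size even as the prompt approaches the 128K context length.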

Model Files

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| gemma-4-E2B-it.BF16.gguf | BF16 | 9.31 GB | Download |
| gemma-4-E2B-it.F16.gguf | F16 | 9.31 GB | Download |
| gemma-4-E2B-it.Q2_K.gguf | Q2_K | 2.99 GB | Download |
| gemma-4-E2B-it.Q3_K_L.gguf | Q3_K_L | 3.28 GB | Download |
| gemma-4-E2B-it.Q3_K_M.gguf | Q3_K_M | 3.2 GB | Download |
| gemma-4-E2B-it.Q3_K_S.gguf | Q3_K_S | 3.11 GB | Download |
| gemma-4-E2B-it.Q4_0.gguf | Q4_0 | 3.36 GB | Download |
| gemma-4-E2B-it.Q4_K_M.gguf | Q4_K_M | 3.43 GB | Download |
| gemma-4-E2B-it.Q4_K_S.gguf | Q4_K_S | 3.37 GB | Download |
| gemma-4-E2B-it.Q5_0.gguf | Q5_0 | 3.6 GB | Download |
| gemma-4-E2B-it.Q5_K_M.gguf | Q5_K_M | 3.63 GB | Download |
| gemma-4-E2B-it.Q5_K_S.gguf | Q5_K_S | 3.6 GB | Download |
| gemma-4-E2B-it.Q6_K.gguf | Q6_K | 3.85 GB | Download |
| gemma-4-E2B-it.Q8_0.gguf | Q8_0 | 4.95 GB | Download |
| gemma-4-E2B-it.mmproj-bf16.gguf | mmproj-bf16 | 987 MB | Download |
| gemma-4-E2B-it.mmproj-f16.gguf | mmproj-f16 | 987 MB | Download |
| gemma-4-E2B-it.mmproj-q8_0.gguf | mmproj-q8_0 | 557 MB | Download |
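As a rough aid for choosing among the files listed above, the sketch below picks the largest quantization whose file fits a given memory budget. The sizes are copied from the listing; the function name and the 1 GB default headroom (for the KV cache and runtime buffers) are illustrative assumptions, not measured requirements.

```python
# File sizes in GB, copied from the file listing above.
QUANT_SIZES_GB = {
    "Q2_K": 2.99, "Q3_K_S": 3.11, "Q3_K_M": 3.2, "Q3_K_L": 3.28,
    "Q4_0": 3.36, "Q4_K_S": 3.37, "Q4_K_M": 3.43,
    "Q5_0": 3.6, "Q5_K_S": 3.6, "Q5_K_M": 3.63,
    "Q6_K": 3.85, "Q8_0": 4.95, "F16": 9.31, "BF16": 9.31,
}
# Multimodal projector files (987 MB and 557 MB), expressed in GB.
MMPROJ_SIZES_GB = {"mmproj-q8_0": 0.557, "mmproj-f16": 0.987, "mmproj-bf16": 0.987}


def largest_quant_that_fits(budget_gb, mmproj=None, headroom_gb=1.0):
    """Return the largest quant type that fits, or None if nothing fits.

    headroom_gb is a hypothetical allowance for KV cache and buffers.
    """
    extra = MMPROJ_SIZES_GB[mmproj] if mmproj else 0.0
    available = budget_gb - extra - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= available]
    # max() compares by size first; ties fall back to the name.
    return max(fitting)[1] if fitting else None


print(largest_quant_that_fits(6.0, mmproj="mmproj-q8_0"))
```

File size is only a lower bound on memory use: long prompts grow the KV cache, so the headroom should be raised for large context sizes.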

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
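A minimal sketch of launching one of these files with llama.cpp's `llama-server`, assembled in Python. It assumes the `llama-server` binary is on your PATH, that your llama.cpp build is recent enough to recognise the `gemma4` architecture, and that both files are in the working directory. The helper name, context size, and port are illustrative choices.

```python
import shlex


def llama_server_command(model_path, mmproj_path=None, ctx_size=8192, port=8080):
    """Build an argument list for llama-server (hypothetical helper).

    -m selects the model, -c the context size, --mmproj the multimodal
    projector file that accompanies the text model.
    """
    cmd = ["llama-server", "-m", model_path, "-c", str(ctx_size),
           "--port", str(port)]
    if mmproj_path:
        cmd += ["--mmproj", mmproj_path]
    return cmd


cmd = llama_server_command(
    "gemma-4-E2B-it.Q4_K_M.gguf",
    mmproj_path="gemma-4-E2B-it.mmproj-q8_0.gguf",
)
# Print a shell-ready command; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Omitting `mmproj_path` gives a text-only server, which avoids loading the projector when image or audio input is not needed.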

Format: GGUF
Model size: 5B params
Architecture: gemma4
