Gemma 4 E4B IT – GGUF (Q6_K)

🔷 Model Overview

This repository contains a GGUF Q6_K conversion of:

  • Base Model: gemma-4-e4b-it
  • Developer: Google
  • Format: GGUF (optimized for llama.cpp)
  • Precision: Q6_K

This model is designed for high-quality local inference.


📦 Files

File                        Description
gemma-4-e4b-it-Q6_K.gguf    Q6_K (6-bit) quantized GGUF model

⚙️ Technical Details

Parameter      Value
Base model     gemma-4-e4b-it
Architecture   gemma4
Model size     8B parameters
Format         GGUF
Precision      Q6_K (6-bit)
Runtime        llama.cpp
Use case       High-quality local inference

⚡ Why GGUF?

GGUF enables:

  • Efficient CPU inference via llama.cpp
  • Single-file model distribution
  • Fast loading using memory mapping (see the loading sketch below this list)
  • Cross-platform compatibility
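
For example, here is a minimal sketch of loading this file through the llama-cpp-python bindings. The bindings are an assumption of this card, not part of the original release (install with pip install llama-cpp-python), and the parameter values are illustrative rather than tuned:

# Minimal loading sketch via llama-cpp-python (assumed installed).
# The file name matches the GGUF in this repository; other values are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-e4b-it-Q6_K.gguf",  # single-file GGUF distribution
    n_ctx=4096,       # context window; adjust to your memory budget
    n_threads=8,      # CPU threads used for inference
    use_mmap=True,    # fast loading via memory mapping (the default)
    verbose=False,
)

out = llm("Explain AI simply", max_tokens=128)
print(out["choices"][0]["text"])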

⚠️ License & Usage

This is a converted derivative model.

You must comply with the original license:
👉 https://huggingface.co/google/gemma-4-e4b-it

Important:

  • ❌ Not an official Google release
  • ❌ No additional rights granted
  • ✅ Original model ownership remains with Google
  • ⚠️ Use responsibly under original license terms

🚀 Quick Start (llama.cpp)

./llama-cli -m gemma-4-e4b-it-Q6_K.gguf -p "Explain AI simply"
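
If you prefer the Python bindings instead of the CLI, the same file works with a chat-style call. This is a minimal sketch assuming llama-cpp-python is installed (the bindings are not part of the original release, and the values are illustrative):

# Chat-style usage via llama-cpp-python (assumed installed); the bindings
# typically pick up the chat template stored in the GGUF metadata.
from llama_cpp import Llama

llm = Llama(model_path="gemma-4-e4b-it-Q6_K.gguf", n_ctx=4096, verbose=False)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain AI simply"}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])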