# Gemma 4 E4B-it - Ternary Quantized (tritplane3)

A ternary-quantized version of google/gemma-4-E4B-it, produced with the ternary-quant library.

## Model Specifications

| Property             | Value                                           |
|----------------------|-------------------------------------------------|
| Base Model           | google/gemma-4-E4B-it                           |
| Parameters           | ~8B                                             |
| Architecture         | Dense transformer, multimodal (image + text)    |
| Quantization         | tritplane3 (3-plane progressive ternary)        |
| Quantized Components | text_backbone + multimodal_connector (342 layers) |
| Vision Encoder       | FP16 (preserved)                                |
| License              | Gemma                                           |
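
The exact tritplane3 encoding is not documented here, but the core idea of ternary quantization can be sketched: each weight is snapped to {-1, 0, +1} and rescaled by a per-tensor scalar. The sketch below uses the classic Ternary Weight Networks threshold rule (delta roughly 0.7 times the mean absolute weight) as a generic illustration; it is not necessarily what ternary-quant implements.

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Quantize weights to {-1, 0, +1} * alpha.
    Generic ternary scheme, not the exact tritplane3 encoding."""
    # Threshold rule from Ternary Weight Networks: delta ~= 0.7 * mean(|W|)
    delta = 0.7 * np.abs(w).mean()
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    # The scale alpha minimizes ||W - alpha*T||^2 over the nonzero entries
    mask = t != 0
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t, alpha

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02, -0.3])
t, alpha = ternarize(w)
print(t, alpha)  # ternary codes and their shared scale
```

Storing only the ternary codes plus one scale per tensor (or per plane, in a multi-plane scheme like tritplane3) is what drives the size reduction reported below.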

## Size Comparison

| Method             | Size   | Bits/Weight | VLM Support        |
|--------------------|--------|-------------|--------------------|
| FP16 (original)    | 16 GB  | 16          | Yes                |
| Ternary tritplane3 | 4.2 GB | ~8-10       | Yes (vision + text) |

Overall compression: 3.8x.

Few quantized alternatives exist for this model. GGUF variants typically don't support the E4B multimodal architecture.

## Quality Comparison (FP16 vs Ternary)

| Prompt | FP16 Original | Ternary (ours) |
|--------|---------------|----------------|
| "Capital of France?" | Paris | Paris |
| "Photosynthesis" | ...convert light energy into chemical energy... releases oxygen | ...convert light energy into chemical energy... releases oxygen, essential for life on Earth |
| "Python reverse string" | The Pythonic Way (Slicing) - Recommended | Using Slicing (The most Pythonic way) |

Near-identical quality. Same facts, same reasoning.

## Memory Requirements

| Runtime | Min Memory | Hardware |
|---------|-----------|----------|
| cached (CPU) | ~8 GB RAM | Any |
| metal (Apple Silicon) | ~6 GB unified | M1+ |
| triton_memory (CUDA) | ~5 GB VRAM | Any NVIDIA GPU |
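
The runtime names in the table map directly onto available hardware, so runtime selection can be automated. The helper below is a hypothetical convenience (ternary-quant does not ship it); it only encodes the hardware-to-runtime mapping listed above.

```python
def pick_runtime_mode(has_cuda: bool, has_mps: bool) -> str:
    """Map detected hardware to a runtime mode from the table above.
    Hypothetical helper -- not part of the ternary-quant API."""
    if has_cuda:
        return "triton_memory"  # CUDA kernels, ~5 GB VRAM
    if has_mps:
        return "metal"          # Apple Silicon, ~6 GB unified memory
    return "cached"             # CPU fallback, ~8 GB RAM

# In practice the flags would come from torch.cuda.is_available()
# and torch.backends.mps.is_available().
print(pick_runtime_mode(False, True))  # -> metal
```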

## Quickstart

```bash
pip install ternary-quant
```

```python
from ternary_quant.inference import load_ternary_model

# Load the quantized checkpoint and its processor
model, processor = load_ternary_model(
    "AsadIsmail/gemma-4-E4B-it-ternary",
    runtime_mode="metal",  # "cached" for CPU/NVIDIA
    device="auto",
)

# Build a chat prompt and generate
messages = [{"role": "user", "content": [{"type": "text", "text": "Describe this image"}]}]
formatted = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
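
The quickstart prompt asks the model to describe an image but passes text only. A sketch of a message that also carries an image slot is shown below; the `images=` argument in the commented generation code is an assumption based on how Hugging Face multimodal processors usually work, not a documented ternary-quant API.

```python
# Chat message with an image slot alongside the text, following the
# common Hugging Face multimodal chat format (assumed, not verified
# against ternary-quant).
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image"},
    ],
}]

# With model/processor loaded as in the quickstart, generation would
# then look roughly like (assumed API):
#
#   from PIL import Image
#   image = Image.open("photo.jpg")
#   formatted = processor.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True)
#   inputs = processor(text=formatted, images=image,
#                      return_tensors="pt").to(model.device)
#   outputs = model.generate(**inputs, max_new_tokens=256)

types = [part["type"] for part in messages[0]["content"]]
print(types)  # -> ['image', 'text']
```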

## Collection

Part of ternary-models, a collection of ternary-quantized VLM, multimodal, and audio models.

GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant
