Gemma4-E2B for RDK S100P (Quantized HBM)

Pre-compiled quantized HBM models of Google Gemma4-E2B (Vision + Text) for deployment on D-Robotics RDK S100P (march=nash-m).

Overview

This repository contains HBM model files compiled with PTQ (Post-Training Quantization), ready for deployment on RDK S100P hardware.

Component	Description	Size
Vision HBM	16-layer ViT encoder + Pooler + Projector	329 MB
Text HBM	35-layer Decoder (prefill and decode graphs linked into one HBM)	4.5 GB
tok_embeddings.bin	External token embedding table	1.5 GB

Architecture

Image (672x960) -> Vision HBM -> [280, 1536] soft tokens
                                       | masked_scatter
                         Text HBM (35-layer Decoder)
                                       |
                                  Token output
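The masked_scatter step merges the two halves of the pipeline: the [280, 1536] vision soft tokens overwrite, in order, the embedding rows at image-placeholder positions of the text sequence before it enters the decoder. The pure-Python sketch below mirrors the semantics of PyTorch's Tensor.masked_scatter for this case. It is illustrative only: IMAGE_TOKEN_ID is a hypothetical placeholder (the real id lives in the tokenizer files), and embeddings are plain lists of floats rather than device tensors.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical placeholder id; look up the real image token id in tokenizer/config.json.
IMAGE_TOKEN_ID = -1


def masked_scatter(
    token_ids: Sequence[int],
    text_embeds: Sequence[Vector],
    soft_tokens: Sequence[Vector],
    image_token_id: int = IMAGE_TOKEN_ID,
) -> List[Vector]:
    """Replace each image-placeholder embedding with the next vision soft token, in order."""
    if len(token_ids) != len(text_embeds):
        raise ValueError("token_ids and text_embeds must have the same length")
    n_slots = sum(1 for t in token_ids if t == image_token_id)
    if n_slots != len(soft_tokens):
        # With this model a prompt should reserve exactly 280 slots per image.
        raise ValueError(f"{n_slots} placeholder slots but {len(soft_tokens)} soft tokens")
    merged: List[Vector] = []
    next_soft = iter(soft_tokens)
    for tok, emb in zip(token_ids, text_embeds):
        # Placeholder positions take the next soft token; all others keep their text embedding.
        merged.append(list(next(next_soft)) if tok == image_token_id else list(emb))
    return merged
```

The count check matters in practice: if the prompt template reserves a different number of placeholder tokens than the Vision HBM emits, the merge cannot be done one-to-one.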

Vision: 16-layer ViT, hidden=768, 2520 patches -> 280 soft tokens
Text: 35-layer Decoder, hidden=1536, GQA with 1 KV head (i.e. multi-query attention), PLE (Per-Layer Embeddings)
Vocab: 262,144
Quantization: PTQ INT8, calibrated with 50 real COCO images (Vision) and 234 text prompts (Text)
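The vision token counts above are consistent with a simple arithmetic chain. The sketch below assumes a 16x16 patch size and 3x3 spatial pooling; neither is stated in this card, and both are inferred only because 672/16 and 960/16 are whole numbers and 2520 / 280 = 9.

```python
from typing import Tuple

PATCH_SIZE = 16  # assumption: patch size is not stated in this card
POOL_KERNEL = 3  # assumption: 3x3 pooling, inferred from 2520 / 280 = 9


def vision_token_counts(
    height: int, width: int, patch: int = PATCH_SIZE, pool: int = POOL_KERNEL
) -> Tuple[int, int]:
    """Return (number of ViT patches, number of pooled soft tokens) for one image."""
    if height % (patch * pool) or width % (patch * pool):
        raise ValueError("image size must be a multiple of patch * pool")
    grid_h, grid_w = height // patch, width // patch  # 672x960 -> 42x60 patch grid
    patches = grid_h * grid_w
    soft_tokens = (grid_h // pool) * (grid_w // pool)  # 42x60 -> 14x20 after pooling
    return patches, soft_tokens


print(vision_token_counts(672, 960))  # (2520, 280)
```

Because the Vision HBM is compiled for a fixed 672x960 input, images of other sizes presumably need to be resized or padded to that resolution before inference.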

Quantization Accuracy

Model	Cosine Similarity (vs float32)
Vision HBM	0.9888
Text HBM	0.9540
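Cosine similarity compares a quantized output with its float32 reference as dot(a, b) / (|a| |b|), so 1.0 means the two outputs point in the same direction. The sketch below shows the metric itself. The card does not say which output tensors were compared or how scores were averaged across calibration inputs, so flattening each output into one vector is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened output tensors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

To reproduce a comparison, run the float32 model and the HBM on the same input, flatten both outputs, and pass them as a and b.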

Files

model/
├── gemma4-e2b_vit_ptq.hbm                          # Vision HBM
├── gemma4-e2b_lm_chunk_256_cache_4096_ptq.hbm      # Text HBM (prefill + decode)
└── tok_embeddings.bin                              # Token embedding table
tokenizer/
├── tokenizer.json
├── tokenizer_config.json
├── chat_template.jinja
└── config.json
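The size of tok_embeddings.bin matches a full vocab-by-hidden table: 262,144 × 1,536 = 402,653,184 values, and at 4 bytes per value that is 1,610,612,736 bytes, i.e. 1.5 GiB. This suggests, but does not confirm, a headerless row-major float32 layout. Under that assumption, the sketch below memory-maps the file and reads one token's embedding row without loading the whole table. If the real file uses a different dtype or carries a header, the size check will fail rather than return wrong values.

```python
import mmap
import struct
from typing import List

VOCAB_SIZE = 262_144
HIDDEN = 1_536
BYTES_PER_VALUE = 4  # assumption: little-endian float32 with no header


def lookup_embedding(
    path: str, token_id: int, hidden: int = HIDDEN, vocab: int = VOCAB_SIZE
) -> List[float]:
    """Read the embedding row for token_id from a raw [vocab, hidden] float32 file."""
    if not 0 <= token_id < vocab:
        raise IndexError(f"token_id {token_id} outside [0, {vocab})")
    row_bytes = hidden * BYTES_PER_VALUE
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        expected = vocab * row_bytes
        if len(mm) != expected:
            raise ValueError(f"file is {len(mm)} bytes, expected {expected} for [{vocab}, {hidden}] float32")
        # Only the requested row is paged in from disk.
        return list(struct.unpack_from(f"<{hidden}f", mm, token_id * row_bytes))
```

For example, lookup_embedding("model/tok_embeddings.bin", token_id) returns a 1,536-element list for any id in the 262,144-entry vocabulary.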

Download

pip install -U huggingface_hub
hf download ShockleyWong/gemma4-e2b-rdk-s100p --local-dir ./gemma4_e2b_deploy
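After downloading, a quick check that every file from the Files section is present can save a failed deployment. The sketch assumes the repository layout on the Hub matches the tree shown in Files, with model/ and tokenizer/ at the top of the download directory.

```python
from pathlib import Path
from typing import List

# File list copied from the "Files" section of this card.
EXPECTED_FILES = [
    "model/gemma4-e2b_vit_ptq.hbm",
    "model/gemma4-e2b_lm_chunk_256_cache_4096_ptq.hbm",
    "model/tok_embeddings.bin",
    "tokenizer/tokenizer.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/chat_template.jinja",
    "tokenizer/config.json",
]


def missing_files(root: str) -> List[str]:
    """Return the expected files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("./gemma4_e2b_deploy")
    print("all files present" if not missing else f"missing: {missing}")
```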

Compilation Parameters

Parameter	Value
march	nash-m
core_num	1
chunk_size	256
cache_len	4096
OE-LLM version	1.0.0
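The chunk_size and cache_len values are also encoded in the Text HBM filename (chunk_256_cache_4096). One plausible reading, stated here as an assumption rather than documented OE-LLM behavior, is that prefill consumes the prompt 256 tokens at a time with the last chunk padded, and that the prompt plus all generated tokens must fit in a 4096-token KV cache. The sketch plans chunks and the remaining decode budget under that reading; pad_id must be supplied from the tokenizer files, and whether padded positions also occupy cache slots depends on the runtime.

```python
from typing import List, Tuple

CHUNK_SIZE = 256  # chunk_size from the compilation parameters
CACHE_LEN = 4096  # cache_len from the compilation parameters


def plan_prefill(
    prompt_ids: List[int], pad_id: int, chunk: int = CHUNK_SIZE, cache_len: int = CACHE_LEN
) -> Tuple[List[List[int]], int, int]:
    """Split a prompt into fixed-size prefill chunks.

    Returns (chunks, valid tokens in the last chunk, max new tokens left in the cache).
    """
    n = len(prompt_ids)
    if n == 0:
        raise ValueError("prompt must contain at least one token")
    if n >= cache_len:
        raise ValueError(f"prompt of {n} tokens leaves no room in a {cache_len}-token cache")
    chunks = [prompt_ids[i:i + chunk] for i in range(0, n, chunk)]
    n_valid_last = len(chunks[-1])
    # Pad the final chunk up to the fixed shape the HBM was compiled for.
    chunks[-1] = chunks[-1] + [pad_id] * (chunk - n_valid_last)
    return chunks, n_valid_last, cache_len - n
```

For a 600-token prompt this yields three chunks (256, 256, and 88 valid tokens) and leaves 3,496 positions for generation. Because the 280 image soft tokens are merged into the text sequence, an image prompt spends 280 of those positions before any text is counted.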

Source Code

Quantization scripts and documentation: github.com/shockley6668/gemma4-e2b-rdk-s100p

License

Gemma models are released under the Gemma Terms of Use. See Google's Gemma license for details.
