Gemma4-E2B for RDK S100P (Quantized HBM)

Pre-compiled quantized HBM models of Google Gemma4-E2B (Vision + Text) for deployment on D-Robotics RDK S100P (march=nash-m).

Overview

This repository contains PTQ (Post-Training Quantization) compiled HBM model files ready for deployment on RDK S100P hardware.

Component            Description                                   Size
Vision HBM           16-layer ViT encoder + Pooler + Projector     329 MB
Text HBM             35-layer Decoder (prefill + decode linked)    4.5 GB
tok_embeddings.bin   External token embedding table                1.5 GB
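As a rough sanity check on the size of tok_embeddings.bin, the sketch below multiplies the vocabulary size (262,144) by the text hidden size (1536), both taken from the Architecture section. It assumes the table is a dense float32 array with no header, which this README does not state. Under that assumption the result matches the listed 1.5 GB when read as GiB.

```python
# Hypothetical size check: assumes a dense [vocab, hidden] float32 table with no header.
VOCAB_SIZE = 262_144      # vocabulary size from the Architecture section
HIDDEN_SIZE = 1536        # text decoder hidden size from the Architecture section
BYTES_PER_VALUE = 4       # assumption: float32 storage


def embedding_table_bytes(vocab=VOCAB_SIZE, hidden=HIDDEN_SIZE, width=BYTES_PER_VALUE):
    """Size in bytes of a dense vocab x hidden table with `width` bytes per value."""
    return vocab * hidden * width


size = embedding_table_bytes()
print(f"{size} bytes = {size / 2**30:.2f} GiB")  # 1610612736 bytes = 1.50 GiB
```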

Architecture

Image (672x960) -> Vision HBM -> [280, 1536] soft tokens
                                       | masked_scatter
                         Text HBM (35-layer Decoder)
                                       |
                                  Token output
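The masked_scatter step in the diagram can be pictured as follows: positions in the text embedding sequence that hold image placeholder tokens are overwritten, in order, by rows of the Vision HBM output. This is a minimal pure-Python sketch of that idea, not the repository's runtime code. The function name, the placeholder id, and the toy shapes are all hypothetical.

```python
IMAGE_PLACEHOLDER_ID = -1  # hypothetical id; the real value comes from the tokenizer config


def masked_scatter(text_embeds, token_ids, soft_tokens, placeholder_id=IMAGE_PLACEHOLDER_ID):
    """Return a copy of text_embeds with placeholder rows replaced by soft tokens, in order."""
    positions = [i for i, tok in enumerate(token_ids) if tok == placeholder_id]
    if len(positions) != len(soft_tokens):
        raise ValueError(
            f"{len(positions)} placeholders but {len(soft_tokens)} soft tokens"
        )
    merged = [list(row) for row in text_embeds]  # copy so the input is left untouched
    for pos, soft in zip(positions, soft_tokens):
        merged[pos] = list(soft)
    return merged


# Toy example: 4 tokens, hidden size 2, two image placeholders in the middle.
token_ids = [7, -1, -1, 9]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.9, 0.9]]
soft_tokens = [[1.0, 2.0], [3.0, 4.0]]
merged = masked_scatter(text_embeds, token_ids, soft_tokens)
```

At the shapes shown in the diagram there would be 280 placeholder positions and each row would hold 1536 values.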
  • Vision: 16-layer ViT, hidden=768, 2520 patches -> 280 soft tokens
  • Text: 35-layer Decoder, hidden=1536, GQA (1 KV head), PLE (Per-Layer Embeddings)
  • Vocab: 262,144
  • Quantization: PTQ INT8, calibrated on 50 real COCO images (Vision) and 234 text prompts (Text)
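The patch and soft-token counts above follow from the input resolution. The sketch below reproduces them under two assumptions that are inferred from the numbers rather than stated in this README: a 16x16 patch size and 3x3 pooling of the patch grid.

```python
def vision_token_counts(height=672, width=960, patch=16, pool=3):
    """Return (patches, soft_tokens) for a patch grid followed by pool x pool pooling.

    patch=16 and pool=3 are assumptions chosen to match the README's figures.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    grid_h, grid_w = height // patch, width // patch
    if grid_h % pool or grid_w % pool:
        raise ValueError("patch grid must be divisible by the pooling factor")
    return grid_h * grid_w, (grid_h // pool) * (grid_w // pool)


patches, soft_tokens = vision_token_counts()
print(patches, soft_tokens)  # 2520 280
```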

Quantization Accuracy

Model        Cosine Similarity (vs float32)
Vision HBM   0.9888
Text HBM     0.9540
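Cosine similarity compares an output of the quantized model against the float32 reference, with both treated as flat vectors. A minimal sketch of the metric follows, assuming both outputs are available as equal-length lists of floats. How the repository collects and flattens the outputs is not described here, and the sample values are illustrative, not real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


reference = [0.5, -1.0, 2.0, 0.25]     # stand-in for a float32 output
quantized = [0.5, -1.0, 1.875, 0.25]   # stand-in for the quantized output
score = cosine_similarity(reference, quantized)
print(f"cosine similarity: {score:.4f}")
```

A value of 1.0 means the two outputs point in the same direction; the metric ignores overall scale, so it does not detect a uniform gain error.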

Files

model/
├── gemma4-e2b_vit_ptq.hbm                          # Vision HBM
├── gemma4-e2b_lm_chunk_256_cache_4096_ptq.hbm      # Text HBM (prefill + decode)
└── tok_embeddings.bin                              # Token embedding table
tokenizer/
├── tokenizer.json
├── tokenizer_config.json
├── chat_template.jinja
└── config.json

Download

pip install huggingface_hub
hf download ShockleyWong/gemma4-e2b-rdk-s100p --local-dir ./gemma4_e2b_deploy
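After the download finishes, it can be worth confirming that every file listed under Files is present before copying the directory to the board. The helper below is a small sketch using only the standard library. It assumes the downloaded directory mirrors the tree shown in the Files section; the function name is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the Files section of this README.
EXPECTED_FILES = [
    "model/gemma4-e2b_vit_ptq.hbm",
    "model/gemma4-e2b_lm_chunk_256_cache_4096_ptq.hbm",
    "model/tok_embeddings.bin",
    "tokenizer/tokenizer.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/chat_template.jinja",
    "tokenizer/config.json",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("./gemma4_e2b_deploy")
    print("all files present" if not missing else f"missing: {missing}")
```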

Compilation Parameters

Parameter        Value
march            nash-m
core_num         1
chunk_size       256
cache_len        4096
OE-LLM version   1.0.0
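chunk_size and cache_len bound how a prompt is fed to the Text HBM. One plausible reading, which this README does not confirm, is that prefill consumes the prompt in fixed 256-token chunks (padding the last one) and that prompt plus generated tokens must fit within the 4096-entry cache. Under those assumptions, a small planning helper (hypothetical, not part of the OE-LLM toolchain) looks like this:

```python
import math

CHUNK_SIZE = 256   # chunk_size from the table above
CACHE_LEN = 4096   # cache_len from the table above


def plan_prefill(prompt_tokens, max_new_tokens, chunk_size=CHUNK_SIZE, cache_len=CACHE_LEN):
    """Return (num_chunks, padding) for a prompt, or raise if the cache would overflow.

    Assumes fixed-size prefill chunks and a cache shared by prompt and generated tokens.
    """
    if prompt_tokens <= 0 or max_new_tokens < 0:
        raise ValueError("prompt_tokens must be positive and max_new_tokens non-negative")
    if prompt_tokens + max_new_tokens > cache_len:
        raise ValueError("prompt plus generated tokens exceed cache_len")
    num_chunks = math.ceil(prompt_tokens / chunk_size)
    padding = num_chunks * chunk_size - prompt_tokens
    return num_chunks, padding


# 280 image soft tokens plus a 40-token text prompt (illustrative numbers).
print(plan_prefill(280 + 40, max_new_tokens=512))  # (2, 192)
```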

Source Code

Quantization scripts and documentation: github.com/shockley6668/gemma4-e2b-rdk-s100p

License

Gemma models are released under the Gemma Terms of Use. See Google's Gemma license for details.
