Gemma4-E2B for RDK S100P (Quantized HBM)
Pre-compiled quantized HBM models of Google Gemma4-E2B (Vision + Text) for deployment on D-Robotics RDK S100P (march=nash-m).
Overview
This repository contains HBM model files compiled with PTQ (Post-Training Quantization), ready for deployment on RDK S100P hardware.
| Component | Description | Size |
|---|---|---|
| Vision HBM | 16-layer ViT encoder + Pooler + Projector | 329 MB |
| Text HBM | 35-layer Decoder (prefill + decode linked) | 4.5 GB |
| tok_embeddings.bin | External token embedding table | 1.5 GB |
Architecture
Image (672x960) -> Vision HBM -> [280, 1536] soft tokens -> masked_scatter -> Text HBM (35-layer Decoder) -> Token output
- Vision: 16-layer ViT, hidden=768, 2520 patches -> 280 soft tokens
- Text: 35-layer Decoder, hidden=1536, GQA (1 KV head), PLE (Per-Layer Embeddings)
- Vocab: 262,144
- Quantization: PTQ INT8, calibrated on 50 real COCO images (Vision) and 234 text prompts (Text)
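The shape figures above are consistent with each other under two assumptions that this README does not state: a 16x16 patch size and a 3x3 pooling window. The sketch below checks that arithmetic and illustrates the masked_scatter step in plain Python. `IMAGE_TOKEN_ID` is a hypothetical placeholder, not the real tokenizer value, and the function is an illustration of the idea rather than the code the runtime actually uses.

```python
# Assumptions (not stated in this README): 16x16 patches and 3x3 pooling.
PATCH = 16
POOL = 3

# Hypothetical placeholder id marking image slots in the prompt.
IMAGE_TOKEN_ID = -1


def vision_token_counts(height, width, patch=PATCH, pool=POOL):
    """Return (num_patches, num_soft_tokens) for an image of the given size."""
    rows, cols = height // patch, width // patch
    return rows * cols, (rows // pool) * (cols // pool)


def masked_scatter(token_ids, text_embeds, soft_tokens,
                   image_token_id=IMAGE_TOKEN_ID):
    """Replace the embedding at each image slot with the next soft token."""
    slots = sum(1 for t in token_ids if t == image_token_id)
    if slots != len(soft_tokens):
        raise ValueError(f"{slots} image slots but {len(soft_tokens)} soft tokens")
    remaining = iter(soft_tokens)
    return [
        next(remaining) if t == image_token_id else embed
        for t, embed in zip(token_ids, text_embeds)
    ]


print(vision_token_counts(672, 960))  # (2520, 280)
```

With these assumptions a 672x960 image yields a 42x60 patch grid (2520 patches) and a 14x20 pooled grid (280 soft tokens), matching the numbers listed above.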
Quantization Accuracy
| Model | Cosine Similarity (vs float32) |
|---|---|
| Vision HBM | 0.9888 |
| Text HBM | 0.9540 |
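This README does not say which tensors or inputs the similarity was measured on. As a rough illustration only, the sketch below computes cosine similarity between two flattened output vectors, one from a float32 reference and one from a quantized model. The sample values are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up example values, not real model outputs.
float_out = [0.12, -0.50, 0.33, 0.90]
quant_out = [0.10, -0.48, 0.35, 0.88]
print(round(cosine_similarity(float_out, quant_out), 4))
```

A value of 1.0 means the two outputs point in exactly the same direction. Values below 1.0 reflect drift introduced by quantization.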
Files
model/
├── gemma4-e2b_vit_ptq.hbm # Vision HBM
├── gemma4-e2b_lm_chunk_256_cache_4096_ptq.hbm # Text HBM (prefill + decode)
└── tok_embeddings.bin # Token embedding table
tokenizer/
├── tokenizer.json
├── tokenizer_config.json
├── chat_template.jinja
└── config.json
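The layout of tok_embeddings.bin is not documented here. Its listed size does match 262,144 x 1536 float32 values (1.5 GiB), so the sketch below assumes a headerless, row-major, little-endian float32 table. Treat that layout as a guess to verify against the source repository before relying on it.

```python
import struct

VOCAB_SIZE = 262_144
HIDDEN = 1536
BYTES_PER_VALUE = 4  # assumption: float32 storage


def expected_table_bytes(vocab=VOCAB_SIZE, hidden=HIDDEN, width=BYTES_PER_VALUE):
    """Size in bytes of a dense vocab x hidden table."""
    return vocab * hidden * width


def read_embedding_row(path, token_id, hidden=HIDDEN):
    """Read one row, assuming row-major little-endian float32 with no header."""
    row_bytes = hidden * BYTES_PER_VALUE
    with open(path, "rb") as f:
        f.seek(token_id * row_bytes)
        raw = f.read(row_bytes)
    if len(raw) != row_bytes:
        raise ValueError("token_id is outside the table")
    return list(struct.unpack(f"<{hidden}f", raw))


print(expected_table_bytes() / 2**30)  # 1.5
```

Reading a single row by seeking avoids loading the whole 1.5 GB table into memory, which may matter on an embedded board.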
Download
pip install huggingface_hub
hf download ShockleyWong/gemma4-e2b-rdk-s100p --local-dir ./gemma4_e2b_deploy
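The same download can also be done from Python with `snapshot_download` from `huggingface_hub`. This is a minimal sketch that reuses the repository id and target directory from the command above. The helper name `download_models` is made up for this example.

```python
REPO_ID = "ShockleyWong/gemma4-e2b-rdk-s100p"
LOCAL_DIR = "./gemma4_e2b_deploy"


def download_models(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the full repository and return the local directory path."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_models()` fetches several gigabytes, so check free disk space first.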
Compilation Parameters
| Parameter | Value |
|---|---|
| march | nash-m |
| core_num | 1 |
| chunk_size | 256 |
| cache_len | 4096 |
| OE-LLM version | 1.0.0 |
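The runtime's prefill behavior is not described in this README. If chunk_size means the prompt is prefilled in fixed blocks of 256 tokens with the last block padded, and cache_len caps prompt plus generated tokens at 4096, then a prompt's budget can be estimated as below. Both readings are assumptions, and `plan_prefill` is a hypothetical helper.

```python
CHUNK_SIZE = 256
CACHE_LEN = 4096


def plan_prefill(prompt_len, chunk_size=CHUNK_SIZE, cache_len=CACHE_LEN):
    """Return (num_chunks, padding, remaining_cache) for a prompt length.

    Assumes fixed-size prefill chunks with a padded final chunk, and a KV
    cache shared by prompt and generated tokens.
    """
    if not 0 < prompt_len <= cache_len:
        raise ValueError("prompt_len must be between 1 and cache_len")
    num_chunks = -(-prompt_len // chunk_size)  # ceiling division
    padding = num_chunks * chunk_size - prompt_len
    remaining_cache = cache_len - prompt_len
    return num_chunks, padding, remaining_cache


print(plan_prefill(300))  # (2, 212, 3796)
```

Under these assumptions, a prompt that includes the 280 soft tokens of one image always needs at least two prefill chunks.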
Source Code
Quantization scripts and documentation: github.com/shockley6668/gemma4-e2b-rdk-s100p
License
Gemma models are released under the Gemma Terms of Use. See Google's Gemma license for details.