gemma-3n-E4B-it-pte

executorch .pte export of google/gemma-3n-E4B-it for on-device mobile inference

available models

| variant | dtype | size | file |
|---|---|---|---|
| bf16 | bfloat16 | 13.1 GB | Gemma3n-E4B-IT-text-only.pte |
| int8 | int8 weights | 9.6 GB | Gemma3n-E4B-text-only-int8.pte |

model details

| property | value |
|---|---|
| source model | google/gemma-3n-E4B-it |
| text parameters | 7.40B |
| transformer layers | 35 |
| format | executorch .pte |

text-only export

this export contains only the text decoder components extracted from the full multimodal gemma-3n model

included:

  • language_model (transformer decoder)
  • lm_head (output projection)

not included:

  • vision_tower (image encoder)
  • audio_tower (audio encoder)

use this export for text-only inference tasks. if you need multimodal capabilities, use the original huggingface model (google/gemma-3n-E4B-it).

quantization

  • bf16: full bfloat16 precision weights
  • int8: int8 weight-only quantization via torchao - recommended for mobile deployment

note: int4 quantization requires gpu for inference and is not suitable for cpu-only mobile deployment
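
to illustrate what int8 weight-only quantization means, the sketch below applies symmetric per-row quantization to a small weight matrix in plain python. it is only an illustration: the exact scheme torchao applies to this export (granularity, symmetric or asymmetric, which tensors are quantized) is not stated here, and the helper names `quantize_row` and `dequantize_row` are hypothetical.

```python
def quantize_row(row):
    """Symmetric int8 quantization of one weight row: returns (int values, scale)."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v * scale for v in q]


# Toy weights, not taken from the model.
weights = [
    [0.50, -1.00, 0.25],
    [0.02, 0.01, -0.04],
]

for row in weights:
    q, scale = quantize_row(row)
    restored = dequantize_row(q, scale)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(q, f"scale={scale:.6f}", f"max_err={max_err:.6f}")
```

only the weights are stored as integers in this scheme; activations stay in floating point, and each weight's rounding error is bounded by half of its row's scale.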

export configuration

  • fixed sequence length: 32 tokens
  • torch.export with strict=False
  • executorch to_edge conversion

usage

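import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("Gemma3n-E4B-text-only-int8.pte")
method = program.load_method("forward")

# input_ids shape: [1, 32] dtype: torch.long
# placeholder ids; replace with real token ids padded or truncated to 32
input_ids = torch.zeros((1, 32), dtype=torch.long)

# execute() returns a sequence of outputs; the logits are the first entry
outputs = method.execute([input_ids])
logits = outputs[0]
# logits shape: [1, 32, 262400] dtype: torch.bfloat16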
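
because the export takes exactly 32 token ids, a shorter prompt has to be padded and a longer one truncated before calling `forward`. a minimal sketch in plain python follows; `PAD_ID = 0`, right-side padding, keeping the most recent tokens on truncation, and the helper name `fit_to_window` are all assumptions for illustration, so check them against the tokenizer and export you actually use.

```python
SEQ_LEN = 32  # fixed sequence length of the export
PAD_ID = 0    # assumption: pad token id; confirm with the tokenizer


def fit_to_window(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Return (ids, last_index): ids has exactly seq_len entries and
    last_index is the position of the final real (non-pad) token."""
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    ids = list(token_ids)[-seq_len:]             # keep the most recent tokens
    last_index = len(ids) - 1
    ids = ids + [pad_id] * (seq_len - len(ids))  # pad on the right
    return ids, last_index


ids, last_index = fit_to_window([2, 9259, 1902])  # hypothetical token ids
print(len(ids), last_index)  # prints: 32 2
```

with right padding, the logits at `last_index` in the [1, 32, 262400] output would be the ones that predict the next token. this assumes the exported graph applies a causal mask, which is not stated here.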

required patches

the transformers library requires two patches before export. see maceip/gemma3n-executorch for details

benchmarks

coming soon

links
