---
license: gemma
library_name: executorch
pipeline_tag: text-generation
tags:
- executorch
- gemma
- mobile
- on-device
base_model: google/gemma-3n-E4B-it
---
# gemma-3n-E4B-it-pte

ExecuTorch `.pte` export of [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it) for on-device mobile inference.
## Available models

| Variant | dtype | Size | File |
|---------|-------|------|------|
| bf16 | bfloat16 | 13.1 GB | `Gemma3n-E4B-IT-text-only.pte` |
| int8 | int8 weights | 9.6 GB | `Gemma3n-E4B-text-only-int8.pte` |
## Model details

| Property | Value |
|----------|-------|
| Source model | [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it) |
| Text parameters | 7.40B |
| Transformer layers | 35 |
| Format | ExecuTorch `.pte` |
## Text-only export

This export contains only the text decoder components extracted from the full multimodal Gemma 3n model.

**Included:**
- `language_model` (transformer decoder)
- `lm_head` (output projection)

**Not included:**
- `vision_tower` (image encoder)
- `audio_tower` (audio encoder)

Use this export for text-only inference tasks. If you need multimodal capabilities, use the original Hugging Face model.
## Quantization

- **bf16**: full bfloat16-precision weights
- **int8**: int8 weight-only quantization via torchao; recommended for mobile deployment

Note: int4 quantization requires a GPU for inference and is not suitable for CPU-only mobile deployment.
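To make "weight-only int8" concrete, here is a minimal pure-Python sketch of the underlying arithmetic (symmetric per-tensor quantization with a single scale). This illustrates the idea only; it is not the torchao implementation, which quantizes per channel/group and operates on tensors.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: q = round(w / scale), scale = max|w| / 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.04, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Only the int8 values and the scale are stored, cutting weight storage roughly 2x versus bf16; activations stay in floating point, which is why this is called weight-only quantization.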
## Export configuration

- Fixed sequence length: 32 tokens
- `torch.export` with `strict=False`
- ExecuTorch `to_edge` conversion
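Because the sequence length is fixed at 32, prompts must be right-padded (or truncated) to exactly 32 token ids before calling the model. A minimal sketch, assuming a pad id of 0 (check the actual Gemma tokenizer's pad token id before use):

```python
SEQ_LEN = 32  # fixed sequence length baked into this export

def pad_to_fixed_length(token_ids, pad_id=0, seq_len=SEQ_LEN):
    """Right-pad (or truncate) a list of token ids to the fixed export length."""
    ids = token_ids[:seq_len]
    return ids + [pad_id] * (seq_len - len(ids))

# Hypothetical token ids for a short prompt.
ids = pad_to_fixed_length([2, 105, 4521, 7])
```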
## Usage

```python
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("Gemma3n-E4B-text-only-int8.pte")
method = program.load_method("forward")

# The export uses a fixed sequence length of 32, so input_ids must have
# shape [1, 32] and dtype torch.long (pad or truncate your prompt).
input_ids = torch.zeros((1, 32), dtype=torch.long)

output = method.execute([input_ids])
# output logits shape: [1, 32, 262400], dtype: torch.bfloat16
```
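The output is a logits tensor over the 262,400-token vocabulary at each position; the simplest way to pick the next token is greedy decoding, i.e. the argmax of the logits at the last real (non-padding) prompt position. A pure-Python sketch with a toy 5-token vocabulary (in practice you would call `torch.argmax` on the returned tensor):

```python
def greedy_next_token(logits_row):
    """Return the index of the highest logit in one position's logits (argmax)."""
    best_idx, best_val = 0, logits_row[0]
    for i, v in enumerate(logits_row):
        if v > best_val:
            best_idx, best_val = i, v
    return best_idx

# Toy logits for a 5-token vocabulary at the last prompt position.
next_id = greedy_next_token([-1.2, 0.3, 2.7, 0.1, -0.5])
```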
## Required patches

The transformers library requires two patches before export; see [maceip/gemma3n-executorch](https://github.com/maceip/gemma3n-executorch) for details.
## Benchmarks

Coming soon.
## Links

- Source code: https://github.com/maceip/gemma3n-executorch
- Original model: https://huggingface.co/google/gemma-3n-E4B-it