---
license: gemma
library_name: executorch
pipeline_tag: text-generation
tags:
  - executorch
  - gemma
  - mobile
  - on-device
base_model: google/gemma-3n-E4B-it
---

# gemma-3n-E4B-it-pte

executorch `.pte` export of google/gemma-3n-E4B-it for on-device mobile inference.

## available models

| variant | dtype | size | file |
|---------|-------|------|------|
| bf16 | bfloat16 | 13.1 GB | `Gemma3n-E4B-IT-text-only.pte` |
| int8 | int8 weights | 9.6 GB | `Gemma3n-E4B-text-only-int8.pte` |

## model details

| property | value |
|----------|-------|
| source model | google/gemma-3n-E4B-it |
| text parameters | 7.40B |
| transformer layers | 35 |
| format | executorch .pte |

## text-only export

this export contains only the text decoder components extracted from the full multimodal gemma-3n model.

**included:**
- language_model (transformer decoder)
- lm_head (output projection)

**not included:**
- vision_tower (image encoder)
- audio_tower (audio encoder)

use this export for text-only inference tasks. if you need multimodal capabilities, use the original huggingface model.

## quantization

- **bf16**: full bfloat16 precision weights
- **int8**: int8 weight-only quantization via torchao - recommended for mobile deployment

note: int4 quantization requires a gpu for inference and is not suitable for cpu-only mobile deployment.
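
the effect of int8 weight-only quantization can be illustrated without torchao. a minimal numpy sketch of symmetric per-channel weight quantization (the per-output-row scale choice here is an illustrative assumption, not torchao's exact recipe):

```python
import numpy as np

def quantize_int8_weight_only(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    # one scale per output row; 127 is the max magnitude of a signed int8
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float weight from int8 values and scales."""
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scales = quantize_int8_weight_only(w)
w_hat = dequantize(q, scales)
# rounding error is at most half a quantization step per element
assert np.abs(w - w_hat).max() <= scales.max() / 2 + 1e-6
```

storing `q` plus one float scale per row is what makes the int8 variant roughly a quarter smaller than the bf16 one, while activations stay in floating point (weight-only).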

## export configuration

- fixed sequence length: 32 tokens
- `torch.export` with `strict=False`
- executorch `to_edge` conversion
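
because the sequence length is fixed at export time, callers must pad or truncate prompts to exactly 32 tokens before calling `forward`. a minimal sketch (the pad id of 0 is an assumption; use the tokenizer's actual pad token id):

```python
import numpy as np

SEQ_LEN = 32  # fixed sequence length baked into the export
PAD_ID = 0    # assumption: substitute the tokenizer's real pad token id

def to_fixed_length(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Right-pad (or truncate) a token id sequence to the exported length."""
    ids = list(token_ids)[:seq_len]
    ids += [pad_id] * (seq_len - len(ids))
    # shape [1, seq_len], int64 to match the torch.long input the model expects
    return np.array([ids], dtype=np.int64)

input_ids = to_fixed_length([2, 10, 42])
assert input_ids.shape == (1, 32)
```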

## usage

```python
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("Gemma3n-E4B-text-only-int8.pte")
method = program.load_method("forward")

# input_ids shape: [1, 32] dtype: torch.long
output = method.execute([input_ids])
# output shape: [1, 32, 262400] dtype: torch.bfloat16
```
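
the output is a raw logits tensor over the full vocabulary; turning it into a next token is up to the caller. a hedged sketch of greedy selection, using a numpy array as a stand-in for the real `[1, 32, 262400]` output:

```python
import numpy as np

def greedy_next_token(logits: np.ndarray, prompt_len: int) -> int:
    """Pick the argmax token at the position of the last real prompt token.

    logits: [1, seq_len, vocab] array, as returned by the forward method
    prompt_len: number of non-pad tokens in input_ids
    """
    return int(logits[0, prompt_len - 1].argmax())

# tiny synthetic stand-in for the model's [1, 32, 262400] bfloat16 output
fake_logits = np.zeros((1, 32, 262400), dtype=np.float32)
fake_logits[0, 2, 7] = 1.0  # make token 7 the argmax at prompt position 2
assert greedy_next_token(fake_logits, prompt_len=3) == 7
```

with a fixed 32-token window, autoregressive generation would re-pad the growing sequence and re-run `forward` each step until the window is full.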

## required patches

the transformers library requires two patches before export. see [maceip/gemma3n-executorch](https://github.com/maceip/gemma3n-executorch) for details.

## benchmarks

coming soon

## links

- source code: https://github.com/maceip/gemma3n-executorch
- original model: https://huggingface.co/google/gemma-3n-E4B-it