macmacmacmac
/

gemma-3n-E2B-it-pte

Text Generation

Model card Files Files and versions

gemma-3n-E2B-it-pte / README.md

macmacmacmac's picture

Upload README.md with huggingface_hub

a3084d3 verified about 2 months ago

|

history blame contribute delete

2.13 kB

	---
	license: gemma
	library_name: executorch
	pipeline_tag: text-generation
	tags:
	- executorch
	- gemma
	- mobile
	- on-device
	base_model: google/gemma-3n-E2B-it
	---

	# gemma-3n-E2B-it-pte

	executorch .pte export of google/gemma-3n-E2B-it for on-device mobile inference

	## available models

	\| variant \| dtype \| size \| file \|
	\|---------\|-------\|------\|------\|
	\| bf16 \| bfloat16 \| 8.4 gb \| Gemma3n-E2B-IT-text-only.pte \|
	\| int8 \| int8 weights \| 7.0 gb \| Gemma3n-E2B-text-only-int8.pte \|

	## model details

	\| property \| value \|
	\|----------\|-------\|
	\| source model \| google/gemma-3n-E2B-it \|
	\| text parameters \| 4.46b \|
	\| transformer layers \| 30 \|
	\| format \| executorch .pte \|

	## text-only export

	this export contains only the text decoder components extracted from the full multimodal gemma-3n model

	included:
	- language_model (transformer decoder)
	- lm_head (output projection)

	not included:
	- vision_tower (image encoder)
	- audio_tower (audio encoder)

	use this export for text-only inference tasks. if you need multimodal capabilities use the original huggingface model

	## quantization

	- bf16: full bfloat16 precision weights
	- int8: int8 weight-only quantization via torchao - recommended for mobile deployment

	note: int4 quantization requires gpu for inference and is not suitable for cpu-only mobile deployment

	## export configuration

	- fixed sequence length: 32 tokens
	- torch.export with strict=False
	- executorch to_edge conversion

	## usage

	```python
	from executorch.runtime import Runtime

	runtime = Runtime.get()
	program = runtime.load_program("Gemma3n-E2B-text-only-int8.pte")
	method = program.load_method("forward")

	# input_ids shape: [1, 32] dtype: torch.long
	output = method.execute([input_ids])
	# output shape: [1, 32, 262400] dtype: torch.bfloat16
	```

	## required patches

	the transformers library requires two patches before export. see [maceip/gemma3n-executorch](https://github.com/maceip/gemma3n-executorch) for details

	## benchmarks

	coming soon

	## links

	- source code: https://github.com/maceip/gemma3n-executorch
	- original model: https://huggingface.co/google/gemma-3n-E2B-it