	---
	license: gemma
	base_model: google/functiongemma-270m-it
	pipeline_tag: text-generation
	library_name: litert-lm
	tags:
	- gemma3
	- gemma
	- functiongemma
	extra_gated_heading: Access Function Gemma 270M FT Tiny Garden on Hugging Face
	extra_gated_prompt: >-
	To access Function Gemma 270M FT Tiny Garden on Hugging Face, you are required to review and agree
	to the Gemma license. To do this, please ensure you are logged in to
	Hugging Face and click below. Requests are processed immediately.
	extra_gated_button_content: Acknowledge license
	---

	# litert-community/functiongemma-270m-ft-tiny-garden

	Main Model Card: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)

	This model card provides the Tiny Garden model that is ready for deployment on the Google AI Edge Gallery app.

	Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. This particular Gemma model is especially small, which makes it well suited to on-device use cases. Running the model on device gives users private access to generative AI technology without requiring an internet connection.

	## Try it live

	The Tiny Garden model is a fine-tune of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it). To try it, download the Google AI Edge Gallery app on your phone, open the Tiny Garden panel, and tap the download button.

	<div align="center">

	| [<svg xmlns="http://www.w3.org/2000/svg" height="72px" viewBox="0 -960 960 960" width="72px" fill="currentColor"><path d="M240-40q-50 0-85-34.5T120-159v-641q0-50 35-85t85-35q17 0 32.5 4.5T302-903l555 319q28 17 45 44.5t17 59.5q0 33-17.5 61T855-374L301-56q-14 8-29.5 12T240-40Zm25-59 353-202-116-116q-9-9-21-9t-21 9L179-135q7 27 33.5 39t52.5-3Zm156-357q9-9 9-21.5t-9-21.5L177-743v533l244-246Zm248 125 157-90q16-9 25.5-24.5T861-479q0-17-9.5-32T827-535l-158-92-127 127q-9 9-9 21t9 21l127 127ZM503-539l116-117-354-205q-25-15-51-1.5T180-820l281 281q9 9 21 9t21-9Z"/></svg>](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery&pli=1) | [<svg xmlns="http://www.w3.org/2000/svg" height="84px" viewBox="0 -960 960 960" width="84px" fill="currentColor"><path d="M160-615v-60h60v60h-60Zm0 335v-275h60v275h-60Zm292 0H347q-24.75 0-42.37-17.63Q287-315.25 287-340v-280q0-24.75 17.63-42.38Q322.25-680 347-680h105q24.75 0 42.38 17.62Q512-644.75 512-620v280q0 24.75-17.62 42.37Q476.75-280 452-280Zm-105-60h105v-280H347v280Zm228 60v-60h165v-114H635q-24.75 0-42.37-17.63Q575-489.25 575-514v-106q0-24.75 17.63-42.38Q610.25-680 635-680h165v60H635v106h105q24.75 0 42.38 17.62Q800-478.75 800-454v114q0 24.75-17.62 42.37Q764.75-280 740-280H575Z"/></svg>](https://apps.apple.com/us/app/google-ai-edge-gallery/id6749645337) |
	| :---: | :---: |
	| [Android](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery&pli=1) | [iOS](https://apps.apple.com/us/app/google-ai-edge-gallery/id6749645337) |

	</div>

	## Performance

	### Android

	Benchmarked on a Samsung Galaxy S25 Ultra with 512 prefill tokens and 256 decode tokens.

	<table border="1">
	<tr>
	<th style="text-align: left">Backend</th>
	<th style="text-align: left">Quantization scheme</th>
	<th style="text-align: left">Context length</th>
	<th style="text-align: left">Prefill (tokens/sec)</th>
	<th style="text-align: left">Decode (tokens/sec)</th>
	<th style="text-align: left">Time-to-first-token (sec)</th>
	<th style="text-align: left">Model size (MB)</th>
	<th style="text-align: left">Peak RSS Memory (MB)</th>
	</tr>
	<tr>
	<td><p style="text-align: left">CPU</p></td>
	<td><p style="text-align: left">dynamic_int8</p></td>
	<td><p style="text-align: right">1024</p></td>
	<td><p style="text-align: right">2231 tk/s</p></td>
	<td><p style="text-align: right">153.6 tk/s</p></td>
	<td><p style="text-align: right">0.45 s</p></td>
	<td><p style="text-align: right">289 MB</p></td>
	<td><p style="text-align: right">513 MB</p></td>
	</tr>
	</table>

	Notes:
	* Model size: measured as the size of the file on disk.
	* CPU inference is accelerated via the LiteRT XNNPACK delegate with 4 threads.
	* The benchmark is run with the cache enabled and initialized. Latency and memory usage may differ on the first run.
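
To put the throughput figures in terms of wall-clock time, the short sketch below converts the benchmark numbers from the table into rough latency estimates. It is only back-of-the-envelope arithmetic, not part of any LiteRT tooling: the function name `estimate_latency` is hypothetical, and the end-to-end figure assumes the measured time-to-first-token already covers prefill plus any fixed setup overhead, which is why it is higher than the prefill time implied by throughput alone.

```python
# Figures copied from the benchmark table (S25 Ultra, CPU, dynamic_int8).
PREFILL_TOKENS = 512
DECODE_TOKENS = 256
PREFILL_TPS = 2231.0      # prefill throughput, tokens/sec
DECODE_TPS = 153.6        # decode throughput, tokens/sec
MEASURED_TTFT_S = 0.45    # time-to-first-token reported in the table


def estimate_latency(prefill_tokens, decode_tokens, prefill_tps, decode_tps):
    """Hypothetical helper: return (prefill_seconds, decode_seconds)
    derived from throughput alone."""
    return prefill_tokens / prefill_tps, decode_tokens / decode_tps


prefill_s, decode_s = estimate_latency(
    PREFILL_TOKENS, DECODE_TOKENS, PREFILL_TPS, DECODE_TPS
)

# Assumption: total time is the measured TTFT plus pure decode time.
total_s = MEASURED_TTFT_S + decode_s

print(f"prefill implied by throughput: {prefill_s:.2f} s")   # 0.23 s
print(f"decode of {DECODE_TOKENS} tokens: {decode_s:.2f} s")  # 1.67 s
print(f"end-to-end estimate: {total_s:.2f} s")                # 2.12 s
```

Under these assumptions, a 512-token prompt followed by a 256-token response would take roughly two seconds on this device. Actual latency depends on prompt length, thermal state, and whether the cache is already initialized.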