litert-community/TranslateGemma-27B-IT

This repository provides a few variants of TranslateGemma 27B that are ready for deployment on the web using the MediaPipe LLM Inference API. See google/translategemma-27b-it for more details.

Web

Build and run our sample web app.

Accept the Gemma license on your Hugging Face account, then try out the model in the MediaPipe Web Gemma Demo Hugging Face Space.

To add the model to your web app, please follow the instructions in our documentation.

Note regarding prompt templates

MediaPipe Web LLM Inference does not apply model prompt templates automatically, so be sure to follow the model-specific template in your prompts for best behavior. For example, to translate from Czech (cs) to English (en), you could use this prompt:

You are a professional Czech (cs) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original Czech text while adhering to English grammar, vocabulary, and cultural sensitivities.

Produce only the English translation, without any additional explanations or commentary. Please translate the following Czech text into English:


V nejhorším případě i k prasknutí čočky.
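
Because the template has to be assembled by hand, a small helper can keep the wording consistent across language pairs. The sketch below is a minimal TypeScript illustration: `buildTranslationPrompt` and the `Language` type are hypothetical names, not part of MediaPipe. The template text is copied from the Czech-to-English example above, with the language names and codes parameterized. It assumes the same wording carries over to other language pairs and it adds no chat-turn control tokens; see google/translategemma-27b-it for the authoritative template.

```typescript
// Hypothetical helper types and function; not part of the MediaPipe API.
interface Language {
  name: string; // e.g. "Czech"
  code: string; // e.g. "cs"
}

// Fills in the template from the example above.
// Line breaks (including the two blank lines before the source text)
// follow the example as printed, which is an assumption about the exact format.
function buildTranslationPrompt(source: Language, target: Language, text: string): string {
  return (
    `You are a professional ${source.name} (${source.code}) to ` +
    `${target.name} (${target.code}) translator. Your goal is to accurately convey ` +
    `the meaning and nuances of the original ${source.name} text while adhering to ` +
    `${target.name} grammar, vocabulary, and cultural sensitivities.\n\n` +
    `Produce only the ${target.name} translation, without any additional ` +
    `explanations or commentary. Please translate the following ` +
    `${source.name} text into ${target.name}:\n\n\n` +
    text
  );
}

const prompt = buildTranslationPrompt(
  { name: "Czech", code: "cs" },
  { name: "English", code: "en" },
  "V nejhorším případě i k prasknutí čočky."
);
```

The resulting string can then be passed as the prompt to the LLM Inference API, for example to `generateResponse` in the web package.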

Performance

Web

All benchmarks were measured in Chrome on a 2024 MacBook Pro (Apple M4 Max chip) with a KV cache size of 1280, a 1024-token prefill, and a 256-token decode.

Variant	Precision	Backend	Prefill (tokens/sec)	Decode (tokens/sec)	Time to first token (sec)	GPU Memory	CPU Memory	Model size
F16	int8	GPU	167	8	15.02	26.8 GB	1.5 GB	27.05 GB
F32	int8	GPU	98	8	14.97	27.8 GB	1.5 GB	27.05 GB

Model size: measured by the size of the .tflite flatbuffer (serialization format for LiteRT models).
int8: quantized model with int8 weights and float activations.
GPU memory: measured as the "GPU Process" memory for all of Chrome while the model was running. Chrome used ~160 MB before any model was loaded.
CPU memory: measured for the entire tab while the model was running. The tab used ~50 MB before any model was loaded.
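
As a rough reading of the table, the figures can be combined into an end-to-end latency estimate for the benchmark workload. The sketch below is a back-of-the-envelope calculation, not a measurement. `estimateSeconds` and `fitsKvCache` are hypothetical helpers. It assumes total time is approximately the time to first token plus the decode token count divided by the decode rate, and that the KV cache must hold both prefill and decode tokens (1024 + 256 = 1280 in this setup).

```typescript
// Hypothetical helper types and functions for illustration only.
interface BenchmarkRow {
  decodeTokensPerSec: number;
  timeToFirstTokenSec: number;
}

// Assumption: total latency ≈ time to first token + decode tokens / decode rate.
function estimateSeconds(row: BenchmarkRow, decodeTokens: number): number {
  return row.timeToFirstTokenSec + decodeTokens / row.decodeTokensPerSec;
}

// Assumption: prompt (prefill) and generated (decode) tokens share the KV cache.
function fitsKvCache(prefillTokens: number, decodeTokens: number, kvCacheSize: number): boolean {
  return prefillTokens + decodeTokens <= kvCacheSize;
}

// Values taken from the benchmark table above.
const f16: BenchmarkRow = { decodeTokensPerSec: 8, timeToFirstTokenSec: 15.02 };
const f32: BenchmarkRow = { decodeTokensPerSec: 8, timeToFirstTokenSec: 14.97 };

const f16Total = estimateSeconds(f16, 256); // 15.02 + 256 / 8
const f32Total = estimateSeconds(f32, 256); // 14.97 + 256 / 8
const fits = fitsKvCache(1024, 256, 1280);

console.log(f16Total.toFixed(2), f32Total.toFixed(2), fits);
```

Under these assumptions both variants take roughly 47 seconds for the full benchmark request, and decode speed rather than prefill dominates the total.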
