# Gemma-3n-E4B-IT

## Model Description

**Gemma 3n E4B-IT**, developed by Google DeepMind, is an efficient multimodal model that runs with an effective footprint of about 4 billion parameters (the "E4B" in its name). Built on the MatFormer architecture with dynamic parameter activation, it delivers strong text, image, audio, and video understanding while remaining lightweight enough for on-device deployment. It supports a 32K-token context window and multilingual input across more than 140 languages.

## Features

- **Multimodal input**: text, image (up to 768×768), audio, and video.
- **Efficient design**: parameter skipping and caching enable deployment on edge devices.
- **Large context window**: up to 32K tokens.
- **Multilingual**: trained on 140+ languages.
- **Compact but strong**: achieves benchmark scores competitive with much larger models.

## Use Cases

- Visual question answering and captioning
- Conversational agents with multimodal inputs
- On-device assistants for mobile and embedded systems
- Multilingual summarization, translation, and transcription

## Inputs and Outputs

**Input**:

- Text prompts or dialogue
- Single image (tokenized for processing)
- Support for multiple images and for audio input is coming soon.

**Output**:

- Generated text (answers, captions, translations, summaries)

---

## How to use

### 1) Install Nexa-SDK

Download the SDK and follow the steps under the "Deploy" section of Nexa's model page:

[Download Windows SDK](https://sdk.nexa.ai/model/SDXL-Base)

### 2) Get an access token

Create a token in the Model Hub, then register it with the CLI, placing your token between the quotes:

```bash
nexa config set license ''
```

### 3) Run the model

Start inference from the command line:

```bash
nexa infer NexaAI/gemma-3n
```

---

## License

- Licensed under Google’s Gemma terms of use. See the Hugging Face model card for details.

## References

- [Hugging Face: google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)
- [Gemma 3n documentation](https://ai.google.dev/gemma/docs/gemma-3n)
- [Google Developers Blog announcement](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/)
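
## Example: scripting the setup

The two CLI steps from "How to use" (registering the token, then running the model) could be chained in one shell function. This is only a minimal sketch: the function name `run_gemma` and the environment variable `NEXA_TOKEN` are hypothetical names introduced for illustration and are not part of Nexa-SDK. The two `nexa` commands are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the "How to use" CLI steps in one function.
# Assumption: NEXA_TOKEN is a hypothetical environment variable that
# holds the access token created in the Model Hub.
set -euo pipefail

run_gemma() {
  # Fail early if the nexa CLI is not available on PATH.
  if ! command -v nexa >/dev/null 2>&1; then
    echo "nexa CLI not found; install Nexa-SDK first" >&2
    return 1
  fi

  # Fail early if no token was provided.
  if [ -z "${NEXA_TOKEN:-}" ]; then
    echo "NEXA_TOKEN is not set" >&2
    return 1
  fi

  # Step 2: register the access token with the CLI.
  nexa config set license "$NEXA_TOKEN"

  # Step 3: start inference with the model.
  nexa infer NexaAI/gemma-3n
}
```

After sourcing the file, export `NEXA_TOKEN` with your own token and call `run_gemma`. Reading the token from the environment keeps it out of the script itself.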