NexaAI
/

gemma-3n

+# Gemma-3n-E4B-IT
+## Model Description
+**Gemma 3n E4B-IT**, developed by Google DeepMind, is a 4-billion-parameter efficient multimodal model.
+Built with MatFormer architecture and dynamic parameter activation, it delivers strong text, image, audio, and video understanding while remaining lightweight enough for on-device deployment.
+It supports a 32K context window and multilingual inputs across more than 140 languages.
+## Features
+- **Multimodal input**: text, image (up to 768×768), audio, and video.
+- **Efficient design**: parameter skipping and caching enable deployment on edge devices.
+- **Large context window**: up to 32K tokens.
+- **Multilingual**: trained on 140+ languages.
+- **Compact but strong**: achieves benchmark scores competitive with much larger models.
+## Use Cases
+- Visual question answering and captioning
+- Conversational agents with multimodal inputs
+- On-device assistants for mobile and embedded systems
+- Multilingual summarization, translation, and transcription
+## Inputs and Outputs
+**Input**:
+- Text prompts or dialogue
+- Images, audio, and video (tokenized for processing)
+**Output**:
+- Generated text (answers, captions, translations, summaries)
+## License
+- Licensed under Google’s Gemma terms of use. See Hugging Face model card for details.
+## References
+- [Hugging Face: google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)
+- [Gemma 3n documentation](https://ai.google.dev/gemma/docs/gemma-3n)
+- [Google AI blog announcement](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/)