# Gemma-3n-E4B-IT

## Model Description
**Gemma 3n E4B-IT**, developed by Google DeepMind, is an efficient, instruction-tuned multimodal model whose effective memory footprint is comparable to that of a 4-billion-parameter model.  
Built on the MatFormer architecture with dynamic parameter activation, it handles text, image, audio, and video input while remaining lightweight enough for on-device deployment.  
It supports a 32K context window and multilingual inputs across more than 140 languages.

## Features
- **Multimodal input**: text, image (up to 768×768), audio, and video.
- **Efficient design**: conditional parameter loading and per-layer embedding (PLE) caching reduce memory use enough for deployment on edge devices.
- **Large context window**: up to 32K tokens.
- **Multilingual**: trained on 140+ languages.
- **Compact but strong**: achieves benchmark scores competitive with much larger models.

## Use Cases
- Visual question answering and captioning
- Conversational agents with multimodal inputs
- On-device assistants for mobile and embedded systems
- Multilingual summarization, translation, and transcription

## Inputs and Outputs
**Input**:
- Text prompts or dialogue
- Single image (tokenized for processing)
- Support for multiple images and for audio input is coming soon.

**Output**:
- Generated text (answers, captions, translations, summaries)

---
## How to use

### 1) Install Nexa-SDK
Download the SDK and follow the steps under the "Deploy" section of Nexa's model page: [Download Windows SDK](https://sdk.nexa.ai/model/SDXL-Base)

### 2) Get an access token
Create an access token in the Model Hub, then register it with the CLI:
```bash
nexa config set license '<access_token>'
```
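
In scripts or CI jobs, the token is usually supplied through an environment variable rather than hard-coded. The following is a minimal sketch, assuming a POSIX-style shell; `NEXA_TOKEN` and `set_license` are hypothetical names chosen for illustration, and only the `nexa config set license` command comes from the step above.

```bash
# Sketch: pass the access token via an environment variable.
# NEXA_TOKEN and set_license are illustrative names, not part of Nexa-SDK.
set_license() {
  if [ -z "${NEXA_TOKEN:-}" ]; then
    echo "NEXA_TOKEN is not set; create a token in the Model Hub first" >&2
    return 1
  fi
  # Same command as above, with the token taken from the environment.
  nexa config set license "$NEXA_TOKEN"
}
```

With `NEXA_TOKEN` exported in the environment, calling `set_license` runs the same command as above; if the variable is empty, the function prints a message and returns a non-zero status instead of calling the CLI.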

### 3) Run the model
Run the model from your terminal:
```bash
nexa infer NexaAI/gemma-3n
```
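
If the command fails because the CLI is not installed, a small guard makes the cause obvious. This is a minimal sketch, assuming a POSIX-style shell; `run_model` and `MODEL` are hypothetical names, and only `nexa infer NexaAI/gemma-3n` comes from the step above.

```bash
# Sketch: check that the nexa CLI is available before starting inference.
# run_model and MODEL are illustrative names, not part of Nexa-SDK.
MODEL="NexaAI/gemma-3n"

run_model() {
  if ! command -v nexa >/dev/null 2>&1; then
    echo "nexa not found on PATH; install Nexa-SDK first (step 1)" >&2
    return 127
  fi
  # Same command as above, with the model name taken from MODEL.
  nexa infer "$MODEL"
}

# Call run_model to start the session.
```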
---

## License
- Licensed under Google’s Gemma Terms of Use. See the Hugging Face model card for details.

## References
- [Hugging Face: google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)
- [Gemma 3n documentation](https://ai.google.dev/gemma/docs/gemma-3n)
- [Google Developers Blog: Introducing Gemma 3n developer guide](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/)