Instructions for using google/gemma-3n-E4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-3n-E4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-3n-E4B")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3n-E4B")
```

- Notebooks
- Google Colab
- Kaggle
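The image-text-to-text pipeline above accepts chat-style messages that mix image and text content. Below is a minimal sketch of such a call; the image URL and prompt are illustrative placeholders, and the actual model call is behind a flag because it downloads several gigabytes of weights:

```python
# Chat-style input for an image-text-to-text pipeline: one user turn
# containing an image reference and a text prompt (both illustrative).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

RUN_MODEL = False  # flip to True to download the weights and run inference
if RUN_MODEL:
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B")
    out = pipe(text=messages, max_new_tokens=64)
    print(out[0]["generated_text"])
```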
- Local Apps
- vLLM
How to use google/gemma-3n-E4B with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3n-E4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3n-E4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/google/gemma-3n-E4B
```
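The curl call above can equally be made from Python. Below is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and wraps the request in a try/except so the snippet degrades gracefully when no server is running:

```python
import json
import urllib.error
import urllib.request

# Same JSON body as the curl example above.
payload = {
    "model": "google/gemma-3n-E4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
        # OpenAI-compatible servers return completions under "choices".
        print(body["choices"][0]["text"])
except (urllib.error.URLError, OSError) as exc:
    print(f"Server not reachable: {exc}")
```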
- SGLang
How to use google/gemma-3n-E4B with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-3n-E4B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3n-E4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-3n-E4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3n-E4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use google/gemma-3n-E4B with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-3n-E4B
```
Why is the model's output so different for these two prompts?

When I give the model the prompt "In this image, there is", it completes it normally. But when the prompt is "In this image, what did you see?", the model starts producing nonsense. Why?
This is a base model, not a chat/instruction-tuned model; that is likely why the completions seem unexpected.
For the instruction-tuned model, you may want to check https://huggingface.co/google/gemma-3n-E4B-it
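The difference can be made concrete: a base checkpoint only continues text, while the -it checkpoint expects input wrapped in its chat template. The sketch below hand-rolls Gemma-style turn markers purely for illustration; in practice you would let `tokenizer.apply_chat_template` on the -it checkpoint produce this string, so treat the exact format here as an assumption:

```python
# A base model is trained purely for next-token prediction, so a prompt
# that reads like the start of a sentence gives it something to continue:
base_prompt = "In this image, there is"


def to_gemma_chat(user_message: str) -> str:
    """Sketch of Gemma-style chat formatting (assumed format; in practice
    use tokenizer.apply_chat_template on the -it checkpoint instead)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


# A question-style prompt works better once it is wrapped as a chat turn
# for the instruction-tuned model:
chat_prompt = to_gemma_chat("In this image, what did you see?")
print(chat_prompt)
```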
Hi @rrrfy ,
Welcome to the Google Gemma family of open models. I was able to get output for both of the prompts mentioned in your comment, and the model produced a similar kind of output for each. The two outputs are not exactly the same, but their content is similar in nature. I highly recommend checking whether the downloaded weights, or any other downloaded files, are corrupted. Please find the attached gist file for your reference.
Thanks.