This is an awesome model proposition/idea for users with little VRAM.
Holy shit, you guys are thinking about the people with little VRAM to spare. It's awesome to see you bridging the gap between low-end and mid/high-end setups. I haven't tested the model yet, but the intention is amazing, and it really shows consideration for those who have little VRAM.
I'd honestly be interested in seeing 24B and 27B variants for users who still want to enjoy Gemma4 but need manageable VRAM requirements. (There are a lot of users with 8GB/12GB/16GB of VRAM.) As a user with 28GB of VRAM, I have enjoyed the 31B a lot, but I feel like we need to share some love with our 24B and 27B users. Just to bridge the gap.
Let me know what you people think! ^_^
And if any of you wonderful Google employees out there want to sprinkle in your wisdom, I'm happy to hear your thoughts!
Yes, it is a perfect size for a dense model - a good quant fits one 24GB VRAM card with enough space for long context.
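For anyone who wants to sanity-check the "fits one 24GB card" point, here is a rough back-of-the-envelope sketch. It only estimates the memory taken by the weights, and it rests on assumptions: the 12B parameter count is read off the model name, the bits-per-weight values are illustrative rather than tied to any specific quant format, and the helper names (`weight_gb`, `headroom_gb`) are made up for this example. The KV cache, activations, and runtime overhead are not modeled, so the real headroom for context will be smaller.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def headroom_gb(card_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left after loading the weights; KV cache and overhead must fit in this."""
    return card_gb - weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumed: 12B parameters (from the model name) and a 24 GB card.
    # The bit widths below are illustrative, not specific quant formats.
    for bits in (16, 8, 4):
        used = weight_gb(12, bits)
        free = headroom_gb(24, 12, bits)
        print(f"{bits:>2} bits/weight: weights ~{used:.1f} GB, headroom ~{free:.1f} GB")
```

Under these assumptions, unquantized 16-bit weights would already take the whole card, while lower bit widths leave a large share of it free, which is where the room for long context comes from.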
Exactly the model I was looking for. thank u G.