Instructions to use google/gemma-4-31B-it-assistant with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-4-31B-it-assistant with Transformers:
    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it-assistant")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B-it-assistant")

- Notebooks
- Google Colab
- Kaggle
Is it supposed to work in vLLM?
#2
by mancub - opened
I'm running nightly vLLM (0.20.2rc1.dev55+g4a8ae26e5). According to the vLLM docs (https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#available-assistant-models), it should work.
I get: NotImplementedError: Speculative Decoding with draft models or parallel drafting does not support multimodal models yet
Hardware: 2x 3090 GPUs, TP=2 (tensor parallel size 2).
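For anyone trying to reproduce this, here is a rough sketch of the kind of setup described above. It is not the actual command from the post, which isn't shown. Assumptions: the target model name `google/gemma-4-31B-it` is a guess (only the assistant/draft model is named here), the `num_speculative_tokens` value is arbitrary, and `build_engine_kwargs` and `main` are hypothetical helper names. The `speculative_config` dict and `tensor_parallel_size` are existing `vllm.LLM` arguments, though accepted keys may differ between vLLM versions.

```python
# Hypothetical reproduction sketch; the original post does not show the command used.
DRAFT_MODEL = "google/gemma-4-31B-it-assistant"  # the model this page is about
TARGET_MODEL = "google/gemma-4-31B-it"  # ASSUMPTION: target model is not named in the post


def build_engine_kwargs(num_speculative_tokens: int = 4) -> dict:
    """Build keyword arguments for vllm.LLM: draft-model speculation with TP=2."""
    return {
        "model": TARGET_MODEL,
        "tensor_parallel_size": 2,  # TP=2 across the two GPUs
        "speculative_config": {
            "model": DRAFT_MODEL,
            # ASSUMPTION: the post does not say how many draft tokens were used.
            "num_speculative_tokens": num_speculative_tokens,
        },
    }


def main() -> None:
    # Imported lazily so the config helper can be inspected without vLLM installed.
    from vllm import LLM

    # The thread reports a NotImplementedError when running a setup like this
    # on the nightly build mentioned above.
    LLM(**build_engine_kwargs())
```

Calling `main()` on a machine with vLLM installed and two GPUs should exercise the same draft-model code path. `build_engine_kwargs()` on its own just returns the config dict, which makes it easy to compare against the recipe in the docs.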