why print rightarrow

#61

by wangtf-Kevin - opened Apr 12

Discussion

wangtf-Kevin

Apr 12

It always prints "$\rightarrow$" in my Claude Code. I'm using vLLM v0.19.1rc1.dev64+g62095e82c on 2x A100 80G.

sonali-kumari11

Google org Apr 13

Hi @wangtf-Kevin -

If you have a specific question, feedback, or issue related to this model, please feel free to share more details. We’d be happy to help!

SerialKicked

Apr 13

•

edited Apr 14

> Hi @wangtf-Kevin -
>
> If you have a specific question, feedback, or issue related to this model, please feel free to share more details. We’d be happy to help!

That was a specific question or issue.

He's asking why your model, in its CoT block, goes "$\rightarrow$" to mark logic chains like this:

User is asking question $\rightarrow$ google staff ignores it $\rightarrow$ Model is sad.

It's not a problem per se, but it's weird enough to be worth asking why it's the case, as an emoji or a "->" or literally anything else would have made more sense. Is that a special token that was trained / used for logic chains?

wangtf-Kevin

Apr 14

•

edited Apr 14

Hi @sonali-kumari11 @SerialKicked
I think it may be due to the use of Chinese. Please note the printed output: $\rightarrow$

I use gemma4 to analyze MinerU, with Chinese set as the default language in the Claude Code CLI, running in the VS Code terminal. The output looks like this:
ask:

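Find all the models that mineru-models-download -s modelscope -m all needs to download
⧉ Selected 1 lines from docker/china/Dockerfile in Visual Studio Code

answer (translated from Chinese):
● Based on an analysis of the source code in mineru/cli/models_download.py, running mineru-models-download -s modelscope -m all downloads two categories of models: Pipeline models and VLM models.

Here is the detailed list of models:

Pipeline Models

The download_pipeline_models() function in the code defines the list of models to download. These models are mainly responsible for layout analysis, formula recognition, table recognition, and OCR.

The specific models to download (corresponding to the ModelPath enum) include:

pp_doclayout_v2: document layout analysis model (Layout Analysis)
unimernet_small: formula recognition model (Formula Recognition)
pytorch_paddle: base OCR models (OCR)
slanet_plus: table structure recognition model (Table Structure Recognition)
unet_structure: table structure recognition model (Table Structure Recognition)
paddle_table_cls: table classification model (Table Classification)
paddle_orientation_classification: text orientation classification model (Orientation Classification)
pp_formulanet_plus_m: enhanced formula recognition model (Formula Recognition)

VLM Models

The download_vlm_models() function in the code is responsible for downloading the VLM (Vision-Language Model) models.

VLM weight files: this function calls auto_download_and_get_model_root_path("/", repo_mode='vlm'), which downloads a complete VLM model weight repository (usually a large model folder) used for high-accuracy end-to-end parsing.

Summary

When you run -m all, the program executes the two download functions above in sequence.

Download flow summary:

Pipeline group $\rightarrow$ download 8 specialized models $\rightarrow$ write to models-dir.pipeline in mineru.json.
VLM group $\rightarrow$ download the VLM weight repository $\rightarrow$ write to models-dir.vlm in mineru.json.

If you plan to download manually, you need to make sure all of the models above are downloaded locally and that the paths to these two directories are correctly configured in mineru.json.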

ask:
❯ use English Re-answer

answer:
● Based on the analysis of the source code in mineru/cli/models_download.py, executing the command mineru-models-download -s modelscope -m all triggers the download of two main categories of models: Pipeline Models and VLM Models.

Here is the detailed breakdown of the models that will be downloaded:

Pipeline Models

The download_pipeline_models() function defines a list of specialized models used for layout analysis, formula recognition, table recognition, and OCR.

The specific models downloaded (corresponding to the ModelPath enum) are:

pp_doclayout_v2: Document Layout Analysis model.
unimernet_small: Formula Recognition model.
pytorch_paddle: Base OCR models.
slanet_plus: Table Structure Recognition model.
unet_structure: Table Structure Recognition model.
paddle_table_cls: Table Classification model.
paddle_orientation_classification: Text Orientation Classification model.
pp_formulanet_plus_m: Enhanced Formula Recognition model.

VLM Models

The download_vlm_models() function handles the download of the Vision-Language Model (VLM) weights.

VLM Weight Library: This function calls auto_download_and_get_model_root_path("/", repo_mode='vlm'), which downloads a comprehensive VLM model weights directory used for high-accuracy end-to-end parsing.

Summary Table

┌────────────────┬──────────────────────┬──────────────────────────────────────────────┬───────────────────────────┐
│ Model Category │ Scope                │ Primary Purpose                              │ Config Key in mineru.json │
├────────────────┼──────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
│ Pipeline       │ 8 Specialized Models │ Layout, OCR, Formula, and Table Recognition  │ models-dir.pipeline       │
├────────────────┼──────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
│ VLM            │ VLM Weight Library   │ High-precision end-to-end multimodal parsing │ models-dir.vlm            │
└────────────────┴──────────────────────┴──────────────────────────────────────────────┴───────────────────────────┘

Execution Workflow:
When the -m all flag is used, the program sequentially executes both download_pipeline_models() and download_vlm_models(). Once completed, it automatically updates your mineru.json configuration file with the local paths to these downloaded models.

SerialKicked

Apr 14

•

edited Apr 16

Interesting how it's gone from your English re-answer. I use the model only in English, and while I've seen $\rightarrow$ surface plenty of times in English, it was always in the thinking block, never in the responses.

Now that I think about it, I noticed the rightarrow thing in the early GGUF versions on the early llama.cpp implementation, but I don't remember seeing it recently. Then again, I haven't paid that much attention to it either (I don't often read the thinking block).

Edit: Correction, I still see it with current GGUFs and backends, both in the response and in the thinking block. It's just not super common.

yamikumods

29 days ago

It's just a LaTeX expression embedded in Markdown.
I guess it's natural for the model to use $\rightarrow$ in the thinking section, since it writes that section in Markdown.
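
If the client does not render LaTeX (a plain terminal, for example), one possible workaround is to post-process the model's text before displaying it. The following is only a minimal sketch, assuming the arrows arrive wrapped in `$...$` or `\(...\)` as seen in this thread; `replace_latex_arrows` and `LATEX_ARROWS` are hypothetical names for illustration and are not part of vLLM, Claude Code, or the model's tooling.

```python
import re

# Hypothetical mapping from a few LaTeX arrow commands to Unicode arrows.
LATEX_ARROWS = {
    r"\rightarrow": "→",
    r"\leftarrow": "←",
    r"\Rightarrow": "⇒",
    r"\leftrightarrow": "↔",
}

# Matches one arrow command wrapped in $...$ or \(...\), with optional inner spaces.
_ARROW_RE = re.compile(
    r"(?:\$|\\\()\s*"
    r"(\\(?:rightarrow|leftarrow|Rightarrow|leftrightarrow))"
    r"\s*(?:\$|\\\))"
)


def replace_latex_arrows(text: str) -> str:
    """Replace inline LaTeX arrows with their Unicode equivalents."""
    return _ARROW_RE.sub(lambda m: LATEX_ARROWS[m.group(1)], text)


if __name__ == "__main__":
    sample = r"Pipeline group $\rightarrow$ download 8 models $\rightarrow$ write config"
    print(replace_latex_arrows(sample))
```

This only touches the listed arrow commands and leaves other math (and plain dollar amounts) untouched, so it is safe to apply to both the thinking block and the final response.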
