Adapt vision and embedding models to v4.53.2 Gemma3 modeling code

by titaiwang03 - opened Jul 21, 2025

base: refs/heads/main ← from: refs/pr/1 (+631, -777)

titaiwang03 (ONNX Runtime org), Jul 21, 2025, edited Aug 15, 2025

The main refactor between v4.51.0 and v4.53.2 is the introduction of Gemma3Model.

This PR exports the vision model through the existing method get_image_features(self, pixel_values: torch.Tensor) -> torch.Tensor, and exports the embedding model through a newly added method, get_fused_input_embeddings(self, input_ids, image_features=None).
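The split between the two exported models can be illustrated with a toy, dependency-free sketch of the fusion step. It assumes that get_fused_input_embeddings embeds input_ids and then overwrites the embeddings at image-placeholder positions with the vectors produced by get_image_features. The function name fuse_image_features, the placeholder id 99, and the dictionary embedding table are hypothetical stand-ins, not the PR's code or the transformers API.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical placeholder id; in a real Gemma3 checkpoint the image token id
# comes from the model config, not from this sketch.
IMAGE_TOKEN_ID = 99


def fuse_image_features(
    input_ids: Sequence[int],
    embedding_table: Dict[int, Vector],
    image_features: Optional[Sequence[Vector]] = None,
    image_token_id: int = IMAGE_TOKEN_ID,
) -> List[Vector]:
    """Embed input_ids, then overwrite each image-placeholder position with
    the next image feature vector (one vector per placeholder token)."""
    # Toy embedding lookup; copy each row so the table itself is never mutated.
    embeds = [list(embedding_table[t]) for t in input_ids]
    if image_features is None:
        # Text-only prompt: no vision output is needed.
        return embeds

    positions = [i for i, t in enumerate(input_ids) if t == image_token_id]
    if len(positions) != len(image_features):
        raise ValueError(
            f"{len(positions)} image tokens but {len(image_features)} feature vectors"
        )
    # Scatter the vision features into the placeholder slots, in order.
    for pos, feature in zip(positions, image_features):
        embeds[pos] = list(feature)
    return embeds


if __name__ == "__main__":
    table = {1: [0.1, 0.1], 2: [0.2, 0.2], IMAGE_TOKEN_ID: [0.0, 0.0]}
    ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]
    # Pretend these two vectors came from the exported vision model.
    features = [[9.0, 9.0], [8.0, 8.0]]
    print(fuse_image_features(ids, table, features))
    # prints [[0.1, 0.1], [9.0, 9.0], [8.0, 8.0], [0.2, 0.2]]
```

Keeping the image features as an optional input mirrors the image_features=None default in the signature above, so a text-only prompt could, in principle, run the embedding model without invoking the vision model at all.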

The required code modifications have also been migrated to 4.56.0.dev0 as of 08/15/2025 (commit ID: cd22550692cabffb037b7e5a956e8da3cbbb2b67).

NOTE: https://github.com/microsoft/onnxruntime/pull/25404 is needed to patch the ONNX Runtime transformers optimization.

Applied diff to v4.53.2 modeling_gemma3.py (be5a948d)

nenad1002 (ONNX Runtime org), Jul 21, 2025

Is there an onnxruntime side PR for this?

titaiwang03 changed pull request status to open Jul 21, 2025

titaiwang03 (ONNX Runtime org), Jul 21, 2025

> Is there an onnxruntime side PR for this?

There is a minor bug in the onnxruntime-specific optimization that needs to be patched by https://github.com/microsoft/onnxruntime/pull/25404.

update to 4.56.0.dev0-08152025 (619b90ad)

removed position_ids shift patch (35a2baf4)

kvaishnavi changed pull request status to merged Aug 18, 2025
