Adapt vision and embedding models to v4.53.2 Gemma3 modeling code

#1
by titaiwang03 - opened

The main refactor between v4.51.0 and v4.53.2 is the introduction of Gemma3Model.

This PR uses the existing method get_image_features(self, pixel_values: torch.Tensor) -> torch.Tensor and a newly added method get_fused_input_embeddings(self, input_ids, image_features=None) to export the vision model and the embedding model, respectively.
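The PR text does not show the body of get_fused_input_embeddings. A minimal sketch of what such a method plausibly does, assuming it follows the common placeholder-token pattern (embed input_ids, then overwrite the embeddings at image-placeholder positions with the vision features via masked_scatter), might look like this. It is written as a free function; the embed_tokens and image_token_id parameters are hypothetical stand-ins for the model's embedding layer and config value, and Gemma3-specific details such as embedding scaling are omitted.

```python
from typing import Optional

import torch
from torch import nn


def get_fused_input_embeddings(
    embed_tokens: nn.Embedding,
    input_ids: torch.Tensor,
    image_token_id: int,
    image_features: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    """Hypothetical sketch, not the PR's implementation.

    Looks up text embeddings for input_ids and, if image_features is given,
    replaces the embeddings at image-placeholder positions with the vision
    features, consumed in order.
    """
    # (batch, seq_len) -> (batch, seq_len, hidden)
    inputs_embeds = embed_tokens(input_ids)
    if image_features is None:
        # Text-only path: plain token embeddings.
        return inputs_embeds
    # Boolean mask of placeholder positions, broadcast over the hidden dim.
    image_mask = (input_ids == image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
    image_features = image_features.to(inputs_embeds.device, inputs_embeds.dtype)
    # masked_scatter fills the True positions, in row-major order, from image_features.
    # Assumes the number of placeholder tokens equals the number of feature vectors.
    return inputs_embeds.masked_scatter(image_mask, image_features)
```

Splitting the model this way gives two independently exportable graphs: pixel_values -> image features (vision model), and input_ids plus optional image features -> fused embeddings (embedding model).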

This PR also ports the required code modifications to 4.56.0.dev0 (08/15/2025, commit ID: cd22550692cabffb037b7e5a956e8da3cbbb2b67).

NOTE: https://github.com/microsoft/onnxruntime/pull/25404 is needed to patch the ONNX Runtime transformers optimization.

ONNX Runtime org

Is there an onnxruntime side PR for this?

titaiwang03 changed pull request status to open
ONNX Runtime org

> Is there an onnxruntime side PR for this?

There is a minor bug in the onnxruntime-specific optimization that needs to be patched by https://github.com/microsoft/onnxruntime/pull/25404.

kvaishnavi changed pull request status to merged
