Adapt vision and embedding models to v4.53.2 Gemma3 modeling code
The main refactor from v4.51.0 to v4.53.2 is the introduction of Gemma3Model.
This PR uses the existing method get_image_features(self, pixel_values: torch.Tensor) -> torch.Tensor and a newly added method get_fused_input_embeddings(self, input_ids, image_features=None) to export the vision model and the embedding model, respectively.
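For reviewers unfamiliar with the fusion step, here is a minimal sketch of what a method with the signature get_fused_input_embeddings(self, input_ids, image_features=None) could do. It is not the code in this PR: the class name FusedEmbeddingSketch, the image_token_id value, and the tensor sizes are all hypothetical, and the sketch assumes image features are written into the positions of an image placeholder token.

```python
import torch
from torch import nn


class FusedEmbeddingSketch(nn.Module):
    """Hypothetical stand-in for the embedding model; not the PR's actual code."""

    def __init__(self, vocab_size: int, hidden_size: int, image_token_id: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.image_token_id = image_token_id  # assumed placeholder token id

    def forward(self, input_ids, image_features=None):
        # Look up text embeddings for every token, including placeholders.
        inputs_embeds = self.embed_tokens(input_ids)
        if image_features is None:
            return inputs_embeds
        # Boolean mask over positions holding the image placeholder token,
        # expanded across the hidden dimension.
        mask = (input_ids == self.image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
        # Copy image features, in order, into the masked positions.
        return inputs_embeds.masked_scatter(mask, image_features.to(inputs_embeds.dtype))


# Toy sizes chosen only for illustration.
model = FusedEmbeddingSketch(vocab_size=16, hidden_size=4, image_token_id=9)
input_ids = torch.tensor([[1, 9, 9, 2]])      # two image placeholder tokens
image_features = torch.full((1, 2, 4), 7.0)   # one image, two feature vectors
fused = model(input_ids, image_features)
```

Keeping image_features optional lets the same exported embedding model serve text-only prompts, where the output is just the token embeddings.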
The PR migrates the required code modifications to 4.56.0.dev0 as of 08/15/2025 (commit ID: cd22550692cabffb037b7e5a956e8da3cbbb2b67).
NOTE: https://github.com/microsoft/onnxruntime/pull/25404 is needed to patch transformers optimization.
Is there an onnxruntime side PR for this?
There is a minor bug in the onnxruntime-specific optimization that needs to be patched by https://github.com/microsoft/onnxruntime/pull/25404.