How can I use vLLM to serve google/translategemma-4b-it, and how do I call it?
I tried to deploy google/translategemma-4b-it using vLLM, but encountered the following error:
(APIServer pid=311829) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=311829) Value error, rope_parameters should have a 'rope_type' key [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=311829) For further information visit https://errors.pydantic.dev/2.12/v/value_error
After checking GitHub, I found that other users have reported similar issues. I later managed to deploy the model with SGLang, but the inference format of translategemma-4b-it is incompatible with the OpenAI API format. How has the community resolved this incompatibility?
OpenAI API format:
messages = [{"role": "user", "content": f"Translate from English to Chinese:{content}"}]
translategemma-4b-it format:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "cs",
                "target_lang_code": "de-DE",
                "text": "V nejhorším případě i k prasknutí čočky.",
            }
        ],
    }
]
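To show the mismatch concretely, here is a minimal sketch that wraps the TranslateGemma content schema into a standard chat-completions payload. Whether a given server accepts this schema is an assumption, and the model name and endpoint are placeholders — adjust them for your deployment:

```python
def build_translate_request(text: str, source: str, target: str) -> dict:
    """Build a chat-completions payload carrying TranslateGemma's
    content schema (language codes inside the content list)."""
    return {
        "model": "google/translategemma-4b-it",  # placeholder model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "source_lang_code": source,
                        "target_lang_code": target,
                        "text": text,
                    }
                ],
            }
        ],
    }

# Usage sketch (hypothetical local endpoint):
# import requests
# payload = build_translate_request(
#     "V nejhorším případě i k prasknutí čočky.", "cs", "de-DE")
# r = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```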
Thank you! It seems the issue with TranslateGemma in vLLM is still being worked on. In the meantime, I found another version on Hugging Face that works well with vLLM + OpenAI API:
https://huggingface.co/Infomaniak-AI/vllm-translategemma-27b-it
When starting, don't run `vllm serve google/translategemma-4b-it` directly; add some parameters to it. In my case it ran after I added a few parameters:
vllm serve google/translategemma-4b-it --dtype bfloat16 --max-model-len 512 --gpu-memory-utilization 0.8 --optimization-level 0
If you want the 4b version, we uploaded the vLLM-compatible one: https://huggingface.co/Infomaniak-AI/vllm-translategemma-4b-it
Hi @ggmarks ,
TranslateGemma uses a Gemma-3-style configuration, and its message format is different, as you have already pointed out.
You will need to wait for a vLLM release that adds support for it.
For now, you can download the model locally and flatten the 'rope_parameters' entry, as it is currently nested.
Or you can use the community-tuned model you already mentioned, which does the same flattening and also handles the chat template.
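A minimal sketch of that flattening step. The nested layout (sub-dicts keyed by attention type, each carrying its own 'rope_type') is an assumption about the config — inspect the actual config.json before patching it:

```python
import json
from pathlib import Path

def flatten_rope_parameters(config: dict) -> dict:
    """If rope_parameters lacks a top-level 'rope_type' key (the condition
    vLLM's validation complains about), hoist the first nested entry that
    has one up to the top level."""
    rope = config.get("rope_parameters")
    if isinstance(rope, dict) and "rope_type" not in rope:
        for sub in rope.values():
            if isinstance(sub, dict) and "rope_type" in sub:
                config["rope_parameters"] = sub
                break
    return config

# Usage sketch: patch a locally downloaded config.json in place
# (the path is a placeholder for wherever you saved the model):
# path = Path("translategemma-4b-it/config.json")
# cfg = flatten_rope_parameters(json.loads(path.read_text()))
# path.write_text(json.dumps(cfg, indent=2))
```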
Thank you all for your valuable suggestions.
What are the vLLM, flash-attn, and torch versions? Can anyone share a working stack?
I tested with vLLM v0.14.0 and CUDA 13.0, and it works.