---
license: gemma
datasets:
  - kaitchup/opus100-translategemma-calib
base_model:
  - google/translategemma-4b-it
---

This is an FP8 (dynamic) quantized variant of [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it), created by The Kaitchup (newsletter: https://kaitchup.substack.com).

More details (training recipe, benchmarks, and recommended settings) will be added later. In the meantime, here are the current notes and a working inference example.

## Status / limitations

- Quick smoke test only (not fully evaluated).
- RoPE parameters were removed for compatibility with vLLM. As a result, long-context behavior may be degraded. I have not verified the impact yet.
- Chat template not supported (for now). To use the model in vLLM, call the completions endpoint and provide a fully formatted prompt.
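Because the chat template is not applied server-side, the Gemma turn markers (`<bos>`, `<start_of_turn>`, `<end_of_turn>`) have to be added by hand. A minimal sketch of a helper that formats a translation prompt this way (the `build_prompt` name and its parameters are illustrative, not part of the model card):

```python
def build_prompt(source_lang: str, target_lang: str, text: str) -> str:
    """Wrap a translation instruction in Gemma's turn-based prompt format.

    The turn markers must be written out manually because the chat
    template is not supported by this checkpoint (for now).
    """
    instruction = (
        f"You are a professional {source_lang} to {target_lang} translator. "
        f"Your goal is to accurately convey the meaning and nuances of the "
        f"original {source_lang} text while adhering to {target_lang} "
        f"grammar, vocabulary, and cultural sensitivities.\n"
        f"Produce only the {target_lang} translation, without any additional "
        f"explanations or commentary. Please translate the following "
        f"{source_lang} text into {target_lang}:\n\n\n{text}"
    )
    # Open the user turn, close it, then open the model turn so the
    # server generates the assistant's reply as a plain completion.
    return (
        "<bos><start_of_turn>user\n"
        + instruction
        + "<end_of_turn>\n<start_of_turn>model\n"
    )

prompt = build_prompt("French (fr)", "English (en)", "J'aime les pâtes !")
```

The resulting string is exactly what the completions endpoint expects in its `prompt` field.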

## Serving with vLLM

Start the server:

```shell
vllm serve kaitchup/translategemma-4b-it-FP8-Dynamic \
  --max-model-len 2048 \
  --chat-template-content-format openai \
  --served-model-name gemma
```
Then query the completions endpoint with a fully formatted prompt:

```shell
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma",
    "prompt": "<bos><start_of_turn>user\nYou are a professional French (fr) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original French text while adhering to English grammar, vocabulary, and cultural sensitivities.\nProduce only the English translation, without any additional explanations or commentary. Please translate the following French text into English:\n\n\nJ'\''aime les pâtes !<end_of_turn>\n<start_of_turn>model\n",
    "temperature": 0,
    "max_tokens": 200,
    "stop": ["<end_of_turn>"]
  }'
```
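For programmatic use, the same request can be built from Python with only the standard library. This is a sketch assuming the vLLM server above is running on localhost:8000; the network call itself is left commented out so the snippet is safe to run without a server.

```python
import json
import urllib.request

# The same request body as the curl example above.
payload = {
    "model": "gemma",
    "prompt": (
        "<bos><start_of_turn>user\n"
        "You are a professional French (fr) to English (en) translator. "
        "Your goal is to accurately convey the meaning and nuances of the "
        "original French text while adhering to English grammar, "
        "vocabulary, and cultural sensitivities.\n"
        "Produce only the English translation, without any additional "
        "explanations or commentary. Please translate the following French "
        "text into English:\n\n\nJ'aime les pâtes !<end_of_turn>\n"
        "<start_of_turn>model\n"
    ),
    "temperature": 0,
    "max_tokens": 200,
    "stop": ["<end_of_turn>"],
}

request = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
#     print(result["choices"][0]["text"])
```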