---
base_model: utter-project/EuroLLM-9B-Instruct
library_name: peft
license: mit
datasets:
  - hrabalm/CUNI-MH-v2-encs-data
language:
  - cs
  - en
pipeline_tag: translation
---

# Model Card for CUNI-MH-v2-encs

CUNI-MH-v2 is a translation model built on top of EuroLLM-9B-Instruct for WMT25. It was trained using LoRA and Contrastive Preference Optimization (CPO). Note that the model was fine-tuned for the translation task only, and we do not expect it to perform well on other tasks. It may also be highly sensitive to the exact prompt template that was used during training.

Separate LoRA adapters are provided for the en2cs and cs2en directions.

## Usage

We recommend using vLLM for inference, either directly or using the OpenAI-like server.

### vLLM Python

```python
import vllm
from vllm.lora.request import LoRARequest

BASE_MODEL = "utter-project/EuroLLM-9B-Instruct"
ADAPTER_MODEL = "hrabalm/CUNI-MH-v2-encs"

llm = vllm.LLM(
    BASE_MODEL,
    enable_lora=True,
    max_lora_rank=32,
    enforce_eager=True,
    seed=42,
)
lora_request = LoRARequest("adapter", 1, ADAPTER_MODEL)


def format_prompt(src_lang, tgt_lang, src):
    return (
        "Translate the following {src_lang} source text to {tgt_lang}:\n{src_lang}: {src}"
    ).format(src_lang=src_lang, tgt_lang=tgt_lang, src=src)


sampling_params = vllm.SamplingParams(
    temperature=0,
    max_tokens=512,
    stop=["\n"],
)

messages = [
    [{"role": "user", "content": format_prompt("English", "Czech", "Hello, world!")}]
]
outputs = llm.chat(
    messages,
    sampling_params=sampling_params,
    lora_request=lora_request,
)
print(outputs[0].outputs[0].text)
```
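Because the model is sensitive to the exact prompt template, it can be useful to check the string the template expands to. A minimal standalone sketch (same `format_prompt` as above, no vLLM required):

```python
def format_prompt(src_lang, tgt_lang, src):
    # Identical to the template used during training (see the snippet above).
    return (
        "Translate the following {src_lang} source text to {tgt_lang}:\n{src_lang}: {src}"
    ).format(src_lang=src_lang, tgt_lang=tgt_lang, src=src)


print(format_prompt("English", "Czech", "Hello, world!"))
# Translate the following English source text to Czech:
# English: Hello, world!
```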

### vLLM OpenAI-like Server

```bash
vllm serve \
        utter-project/EuroLLM-9B-Instruct \
        --dtype auto \
        --seed 42 \
        --enable-lora \
        --host 0.0.0.0 \
        --max-lora-rank 32 \
        --max-num-seqs 256 \
        --max-model-len 4096 \
        --enforce-eager \
        --lora-modules "default=hrabalm/CUNI-MH-v2-encs"  # the model will be available under the name "default"
```
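Once the server is running, it can be queried with any OpenAI-compatible client. A sketch using the `openai` Python package, assuming the server is reachable at vLLM's default `http://localhost:8000` (the model name `default` comes from the `--lora-modules` flag above; `api_key` is a placeholder since no key is configured):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompt = (
    "Translate the following English source text to Czech:\n"
    "English: Hello, world!"
)
response = client.chat.completions.create(
    model="default",  # the name assigned via --lora-modules
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=512,
    stop=["\n"],
)
print(response.choices[0].message.content)
```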

## Notes

Note that for the WMT25 submission, we split long inputs into segments of at most 256 input tokens, splitting only on sentence boundaries. For this purpose, we used the sentence-splitter library.
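The splitting step can be sketched as a greedy packing of sentences under a token budget. This is an illustrative reconstruction, not the exact submission code: `count_tokens` below is a word-count stand-in for the real tokenizer, and in the actual pipeline the sentences would come from the sentence-splitter library (e.g. `SentenceSplitter(language="en").split(text)`).

```python
def chunk_sentences(sentences, max_tokens=256, count_tokens=lambda s: len(s.split())):
    """Greedily pack sentences into chunks of at most max_tokens tokens.

    A single sentence longer than the budget still becomes its own chunk,
    so sentence boundaries are never broken.
    """
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        n = count_tokens(sent)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sentences = ["Hello, world!", "This is a second sentence.", "And a third."]
print(chunk_sentences(sentences, max_tokens=8))
# ['Hello, world! This is a second sentence.', 'And a third.']
```

Each chunk is then translated independently with the prompt template shown in the Usage section.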

## Framework versions

- PEFT 0.13.2