---
license: gemma
language:
- sl
- en
- hr
- sr
- bs
base_model:
- cjvt/GaMS-9B-Instruct
pipeline_tag: text-generation
---

# Model Card for GaMS-9B-Instruct-DPO-Translator

GaMS-9B-Instruct-DPO-Translator is a fine-tuned version of GaMS-9B-Instruct, trained with Direct Preference Optimization (DPO). The training dataset was synthetically generated using GaMS-9B-SFT-Translator and EuroLLM-9B-Instruct.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/94gX0PG8zRB_Zg31K2y_i.png)

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science. Team members: Dario Vajda, Domen Vreš and Marko Robnik-Šikonja.
- **Languages:** Slovene, English (primary), Croatian, Bosnian and Serbian (secondary). The model might also work for other languages supported by Gemma 2, even though it was not continually pretrained on them.
- **Base model:** [cjvt/GaMS-9B-Instruct](https://huggingface.co/cjvt/GaMS-9B-Instruct)
- **License:** [Gemma](https://ai.google.dev/gemma/terms)

## Usage

The model can be run through the `pipeline` API using the following code:

```python
from transformers import pipeline

model_id = "GaMS-Beta/GaMS-9B-Instruct-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="cuda" # replace with "mps" to run on a Mac device
)

# Example of response generation
# (the Slovene instruction means "Translate the following English text into Slovene.")
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Translation:", response[0]["generated_text"][-1]["content"])
```

For multi-GPU inference, set `device_map` to `"auto"`:

```python
from transformers import pipeline

model_id = "GaMS-Beta/GaMS-9B-Instruct-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example of response generation
# (the Slovene instruction means "Translate the following English text into Slovene.")
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Model's response:", response[0]["generated_text"][-1]["content"])

# Example of conversation chain
# (the follow-up means "Can you describe this event in more detail?")
new_message = response[0]["generated_text"]
new_message.append({"role": "user", "content": "Lahko bolj podrobno opišeš ta dogodek?"})
response = pline(new_message, max_new_tokens=1024)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```
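
When translating many texts, the prompt used in the examples above can be wrapped in a small helper. The sketch below is illustrative only: `build_translation_message` is a hypothetical name, and the Slovene instruction is copied from the example above, which covers the English-to-Slovene direction. This model card does not specify instructions for other language pairs.

```python
# Instruction copied from the usage example ("Translate the following English text into Slovene.").
EN_TO_SL_INSTRUCTION = "Prevedi naslednje angleško besedilo v slovenščino."


def build_translation_message(text: str) -> list:
    """Build a single-turn chat message asking for an English-to-Slovene translation.

    Hypothetical helper: it only reproduces the prompt layout shown in the
    examples (instruction, newline, source text).
    """
    return [{"role": "user", "content": f"{EN_TO_SL_INSTRUCTION}\n{text}"}]


message = build_translation_message("Today is a nice day.")
```

The returned list can be passed to the pipeline in place of the hand-written `message` above, for example `pline(message, max_new_tokens=512)`.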

## Data

The fine-tuning data was obtained by translating a large corpus of Wikipedia articles, CC-News articles, BookCorpus texts and English conversational datasets with two models (GaMS-9B-SFT-Translator and EuroLLM-9B-Instruct). The resulting translations were then ranked using automatic metrics for translation quality and reliability.
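
The ranking step can be pictured as turning two candidate translations of the same source text into one preference record. The following is a minimal sketch, not the authors' pipeline: the `Candidate` class, the `score` field, the `min_margin` threshold and the example scores are all assumptions made for illustration, and the `prompt`/`chosen`/`rejected` layout is simply a common convention for DPO training data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    model: str        # name of the model that produced the translation
    translation: str  # candidate translation text
    score: float      # hypothetical automatic quality score; higher is better


def build_preference_pair(prompt: str, a: Candidate, b: Candidate,
                          min_margin: float = 0.0) -> Optional[dict]:
    """Return a DPO-style record, or None when the scores are too close to rank."""
    if abs(a.score - b.score) <= min_margin:
        return None
    chosen, rejected = (a, b) if a.score > b.score else (b, a)
    return {
        "prompt": prompt,
        "chosen": chosen.translation,
        "rejected": rejected.translation,
    }


# Made-up scores, for illustration only (not real evaluation results).
prompt = "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."
cand_a = Candidate("GaMS-9B-SFT-Translator", "Danes je lep dan.", 0.82)
cand_b = Candidate("EuroLLM-9B-Instruct", "Danes je dober dan.", 0.74)
pair = build_preference_pair(prompt, cand_a, cand_b)
```

Dropping pairs whose scores differ by no more than `min_margin` is one possible way to avoid training on near-ties. Whether the original pipeline applies such a filter is not stated here.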

## Training

The model was trained on the [Vega HPC](https://izum.si/vega_slv/).

## Evaluation

The model was evaluated with our custom script on three types of data (ccnews, nemotron and wikipedia). The results are shown in the following table.

| Model | Overall COMET | ccnews | nemotron | wikipedia | Bad Lang (%) | Short (%) | Bad Markdown (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemini-2.5-flash | 0.717982 | 0.702981 | 0.697498 | 0.753924 | 0.35% | 0.42% | 3.70% |
| **GaMS-9B-Instruct-DPO-Translator** | **0.714729** | **0.708317** | **0.689316** | **0.746768** | **1.88%** | **1.56%** | **13.22%** |
| GaMS-9B-SFT-Translator-DPO | 0.708042 | 0.702903 | 0.679462 | 0.742583 | 0.91% | 0.28% | 18.28% |
| GaMS-27B-Instruct | 0.701284 | 0.686480 | 0.680014 | 0.730733 | 27.28% | 5.36% | 62.07% |
| GaMS-9B-Instruct | 0.693659 | 0.685006 | 0.673394 | 0.723470 | 13.50% | 4.83% | 33.15% |
| EuroLLM-9B-Instruct | 0.689321 | 0.668084 | 0.670723 | 0.729227 | 8.97% | 1.89% | 35.08% |
| GaMS-9B-SFT-Translator | 0.682467 | 0.676580 | 0.673650 | 0.699602 | 5.14% | 1.48% | 30.53% |

*Note* - the evaluation script and evaluation data are available in this [GitHub repository](https://github.com/DarioVajda/translation_dpo) under the `data_pipeline` folder. See the README for more detailed instructions.


## Citation
If you found this project useful in your work, please cite our paper with the following BibTeX citation:
```txt
@misc{vajda2025improvingllmsmachinetranslation,
      title={Improving LLMs for Machine Translation Using Synthetic Preference Data}, 
      author={Dario Vajda and Domen Vreš and Marko Robnik-Šikonja},
      year={2025},
      eprint={2508.14951},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.14951}, 
}
```