---
license: gemma
language:
- sl
- en
- hr
- sr
- bs
base_model:
- cjvt/GaMS-9B
pipeline_tag: text-generation
---

# Model Card for GaMS-DPO-Translator

GaMS-DPO-Translator is a fine-tuned version of GaMS-9B-Instruct, obtained by applying Direct Preference Optimization (DPO) to the original model. The training dataset was synthetically generated using GaMS-9B-Instruct and EuroLLM-9B-Instruct.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/94gX0PG8zRB_Zg31K2y_i.png)

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science. Team members: Dario Vajda, Domen Vreš and Marko Robnik-Šikonja.
- **Languages:** Slovene, English (primary), Croatian, Bosnian and Serbian (secondary). The model might also work for other languages supported by Gemma 2, even though it was not continually pretrained on them.
- **Base model:** [cjvt/GaMS-9B-Instruct](https://huggingface.co/cjvt/GaMS-9B-Instruct)
- **License:** [Gemma](https://ai.google.dev/gemma/terms)

## Usage

The model can be run through the `pipeline` API using the following code:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="cuda" # replace with "mps" to run on a Mac device
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Translation:", response[0]["generated_text"][-1]["content"])
```

For multi-GPU inference, set `device_map` to `"auto"`:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Model's response:", response[0]["generated_text"][-1]["content"])

# Example of conversation chain
new_message = response[0]["generated_text"]
new_message.append({"role": "user", "content": "Lahko bolj podrobno opišeš ta dogodek?"})
response = pline(new_message, max_new_tokens=1024)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```

## Data

Data for fine-tuning the original model was acquired by translating a large corpus of Wikipedia articles with two models (GaMS-9B-Instruct and EuroLLM-9B-Instruct); the resulting candidate translations were then ranked by automatic metrics for translation quality and reliability.
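The ranking step above yields DPO-style preference records. A minimal sketch of how one such record could be assembled is shown below; the translations, scores, and the `build_preference_pair` helper are illustrative assumptions, not the actual data pipeline.

```python
def build_preference_pair(prompt, cand_a, cand_b, score_a, score_b):
    """Order two candidate translations into a DPO record by metric score.

    The higher-scoring candidate becomes "chosen", the other "rejected".
    """
    if score_a >= score_b:
        chosen, rejected = cand_a, cand_b
    else:
        chosen, rejected = cand_b, cand_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# The scores below are made-up metric values, purely for illustration.
pair = build_preference_pair(
    "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day.",
    "Danes je lep dan.",       # e.g. output of one translation model
    "Danes je prijeten dan.",  # e.g. output of the other model
    0.82,
    0.79,
)
```

Each record then pairs one prompt with a preferred and a dispreferred translation, which is the format DPO training expects.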

## Training

The model was trained on the [Vega HPC system](https://izum.si/vega_slv/).
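For background, DPO fine-tunes the policy directly on preference pairs, without a separate reward model, by minimizing the objective from Rafailov et al. (2023):

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[ \log \sigma\!\left(
     \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
   - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```

Here \(y_w\) is the preferred translation, \(y_l\) the rejected one, \(\pi_{\mathrm{ref}}\) the frozen reference model, and \(\beta\) controls how far the policy may drift from the reference.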

## Evaluation

The model was evaluated on Slobench, and we extended the evaluation to measure additional qualities of the model that we care about.

### Slobench evaluation:


| Model                          | BERT score | BLEU (avg) | METEOR (avg) | CHRF (avg) | BLEU (corpus) | CHRF (corpus) |
|--------------------------------|-----------:|-----------:|-------------:|-----------:|--------------:|--------------:|
| EuroLLM-9B-Instruct            |     0.8741 |     0.2927 |       0.5792 |     0.6055 |        0.3273 |        0.6055 |
| GaMS-27B-Instruct              |     0.8734 |     0.2866 |       0.5688 |     0.5986 |        0.3246 |        0.5986 |
| **GaMS-9B-DPO-Translator**     | **0.8726** | **0.2810** |   **0.5663** | **0.5967** |    **0.3252** |    **0.5967** |
| GaMS-9B-Instruct               |     0.8713 |     0.2773 |       0.5616 |     0.5928 |        0.3209 |        0.5928 |
| GPT 4o-mini                    |     0.8690 |     0.2619 |       0.5456 |     0.5839 |        0.3021 |        0.5839 |

### Wikipedia evaluation:
This evaluation was performed on data not seen during training. We measured how often each model produces a fatal error (a language or truncation error) and then compared COMET scores.

Error rates:

| Model           | Language Error | Truncation Error | Combined |
|-----------------|---------------:|-----------------:|---------:|
| EuroLLM         |            1%  |             0.4% |     1.4% |
| GaMS            |           9.5% |             3.5% |      13% |
| **GaMS-DPO**    |       **0.6%** |         **0.2%** | **0.8%** |
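As an illustration of the two error types in the table, the toy checks below flag output that does not look Slovene and output that is much shorter than its source. The stopword list and thresholds are assumptions for demonstration only; the actual evaluation would use a proper language-identification model and alignment-aware checks.

```python
# Toy stand-ins for the two fatal-error checks; NOT the evaluation code used.
SLOVENE_HINTS = {"je", "in", "da", "se", "na", "za", "ni", "so"}

def language_error(translation: str) -> bool:
    """Flag output that does not look Slovene (crude stopword heuristic)."""
    words = translation.lower().split()
    if not words:
        return True
    hits = sum(w in SLOVENE_HINTS for w in words)
    return hits / len(words) < 0.05

def truncation_error(source: str, translation: str, ratio: float = 0.5) -> bool:
    """Flag translations far shorter than the source text."""
    return len(translation) < ratio * len(source)
```

A translation counts toward the "Combined" column if either check fires.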

COMET scoring results:

| Model                         | Average COMET score |
|-------------------------------|--------------------:|
| EuroLLM-9B-Instruct           |              0.755 |
| GaMS-9B-Instruct              |              0.736 |
| **GaMS-9B-DPO-Translator**    |              0.771 |