DarioVajda committed (verified)
Commit 4054469 · 1 Parent(s): abf51f8

Update README.md

Files changed (1):
  1. README.md +109 -1
README.md CHANGED
@@ -3,6 +3,114 @@ license: gemma

language:
- sl
- en
- hr
- sr
- bs
base_model:
- cjvt/GaMS-9B
pipeline_tag: text-generation
---

# Model Card for GaMS-DPO-Translator

GaMS-DPO-Translator is a fine-tuned version of GaMS-9B-Instruct, obtained by applying Direct Preference Optimization (DPO) to the original model. The training dataset was synthetically generated using GaMS-9B-Instruct and EuroLLM-9B-Instruct.
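
As a rough illustration of this setup, the sketch below shows how DPO could be run on such a preference dataset with the `trl` library. The dataset file name, hyperparameters, and output directory are assumptions for illustration, not the exact configuration used to train this model.

```python
# Minimal DPO fine-tuning sketch (assumed setup, not the exact training script).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cjvt/GaMS-9B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference data with "prompt", "chosen" and "rejected" columns
# (hypothetical file name; see the Data section for how the pairs were built).
dataset = load_dataset("json", data_files="translation_preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="GaMS-DPO-Translator",
    beta=0.1,                        # illustrative DPO temperature
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

# Recent trl versions take the tokenizer via processing_class.
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```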

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/94gX0PG8zRB_Zg31K2y_i.png)

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science. Team members: Dario Vajda, Domen Vreš and Marko Robnik-Šikonja.
- **Languages:** Slovene, English (primary), Croatian, Bosnian and Serbian (secondary). The model might also work for other languages supported by Gemma 2, even though it was not continually pretrained on them.
- **Base model:** [cjvt/GaMS-9B-Instruct](https://huggingface.co/cjvt/GaMS-9B-Instruct)
- **License:** [Gemma](https://ai.google.dev/gemma/terms)

## Usage

The model can be run through the `pipeline` API using the following code:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="cuda"  # replace with "mps" to run on a Mac device
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Translation:", response[0]["generated_text"][-1]["content"])
```

For multi-GPU inference, set `device_map` to `auto`:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Model's response:", response[0]["generated_text"][-1]["content"])

# Example of conversation chain
new_message = response[0]["generated_text"]
new_message.append({"role": "user", "content": "Lahko bolj podrobno opišeš ta dogodek?"})
response = pline(new_message, max_new_tokens=1024)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```

## Data

The fine-tuning data was obtained by translating a large corpus of Wikipedia articles with two models (GaMS-9B-Instruct and EuroLLM-9B-Instruct); the resulting translations were then ranked with automatic metrics for translation quality and reliability.
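
A minimal sketch of how such preference pairs could be assembled from the two candidate translations is given below; the prompt template mirrors the Usage example, while `score` stands in for the unspecified automatic ranking metric.

```python
# Hypothetical preference-pair construction (the exact ranking metrics are not specified in the card).
def build_preference_pairs(sources, gams_translations, eurollm_translations, score):
    """Turn two candidate translations per source text into DPO-style triples."""
    pairs = []
    for src, cand_a, cand_b in zip(sources, gams_translations, eurollm_translations):
        prompt = f"Prevedi naslednje angleško besedilo v slovenščino.\n{src}"
        # The higher-scoring candidate becomes "chosen", the other "rejected".
        if score(src, cand_a) >= score(src, cand_b):
            chosen, rejected = cand_a, cand_b
        else:
            chosen, rejected = cand_b, cand_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```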

## Training

The model was trained on the [Vega HPC](https://izum.si/vega_slv/) system.

## Evaluation

The model was evaluated on the Slobench benchmark, and we extended the evaluation to measure additional qualities of the model that we care about.

### Slobench evaluation

| Model | BERT score | BLEU (avg) | METEOR (avg) | CHRF (avg) | BLEU (corpus) | CHRF (corpus) |
|--------------------------------|-----------:|-----------:|-------------:|-----------:|--------------:|--------------:|
| EuroLLM-9B-Instruct | 0.8741 | 0.2927 | 0.5792 | 0.6055 | 0.3273 | 0.6055 |
| GaMS-27B-Instruct | 0.8734 | 0.2866 | 0.5688 | 0.5986 | 0.3246 | 0.5986 |
| **GaMS-9B-DPO-Translator** | **0.8726** | **0.2810** | **0.5663** | **0.5967** | **0.3252** | **0.5967** |
| GaMS-9B-Instruct | 0.8713 | 0.2773 | 0.5616 | 0.5928 | 0.3209 | 0.5928 |
| GPT 4o-mini | 0.8690 | 0.2619 | 0.5456 | 0.5839 | 0.3021 | 0.5839 |
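
For reference, corpus-level BLEU and CHRF scores of the kind reported above can be computed with `sacrebleu`; this is a generic sketch with placeholder sentences, not the Slobench evaluation script.

```python
import sacrebleu

# Placeholder data: system outputs and one reference per segment.
hypotheses = ["Danes je lep dan.", "Mačka sedi na preprogi."]
references = [["Danes je lep dan.", "Mačka sedi na preprogi."]]  # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU (corpus): {bleu.score:.2f}, CHRF (corpus): {chrf.score:.2f}")
```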

### Wikipedia evaluation

This evaluation was performed on data that was not seen during training. We checked how often each model makes a fatal error (producing the wrong output language or truncating the output) and then compared COMET scores.
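
A rough sketch of how such fatal errors could be flagged automatically is given below; the language-identification tool and the truncation heuristic (a simple length ratio) are illustrative assumptions, not the exact checks used in this evaluation.

```python
# Hypothetical fatal-error checks for a single source/translation pair.
from langdetect import detect  # assumed language-identification tool

def fatal_errors(source: str, translation: str) -> dict:
    # Language error: the output is not in the expected target language (Slovene).
    language_error = detect(translation) != "sl"
    # Truncation error: the output is suspiciously short relative to the source
    # (the 0.5 ratio is an illustrative threshold).
    truncation_error = len(translation) < 0.5 * len(source)
    return {"language_error": language_error, "truncation_error": truncation_error}
```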

Error rates:

| Model | Language Error | Truncation Error | Combined |
|-----------------|---------------:|-----------------:|---------:|
| EuroLLM | 1% | 0.4% | 1.4% |
| GaMS | 9.5% | 3.5% | 13% |
| **GaMS-DPO** | **0.6%** | **0.2%** | **0.8%** |

COMET scoring results:

| Model | Average COMET score |
|-------------------------------|--------------------:|
| EuroLLM-9B-Instruct | 0.755 |
| GaMS-9B-Instruct | 0.736 |
| **GaMS-9B-DPO-Translator** | **0.771** |
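
COMET scores like those above can be computed along the following lines with the `unbabel-comet` package; the card does not state which COMET checkpoint was used, so the `Unbabel/wmt22-comet-da` model and the example segment are assumptions.

```python
# Hedged sketch of COMET scoring with the unbabel-comet package.
from comet import download_model, load_from_checkpoint

checkpoint = download_model("Unbabel/wmt22-comet-da")  # assumed checkpoint
comet_model = load_from_checkpoint(checkpoint)

data = [
    {
        "src": "Today is a nice day.",  # English source
        "mt": "Danes je lep dan.",      # system translation
        "ref": "Danes je lep dan.",     # reference translation
    }
]
output = comet_model.predict(data, batch_size=8, gpus=1)
print("Average COMET score:", output.system_score)
```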