Update README.md

---
license: gemma
language:
- sl
- en
- hr
- sr
- bs
base_model:
- cjvt/GaMS-9B
pipeline_tag: text-generation
---

# Model Card for GaMS-DPO-Translator

GaMS-DPO-Translator is a fine-tuned version of GaMS-9B-Instruct, obtained by performing Direct Preference Optimization (DPO) on the original model. The training dataset was synthetically generated using GaMS-9B-Instruct and EuroLLM-9B-Instruct.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/cDWlPmjWKvWZeUVCdE0PT.png)

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science. Team members: Dario Vajda, Domen Vreš, and Marko Robnik-Šikonja.
- **Languages:** Slovene and English (primary); Croatian, Bosnian, and Serbian (secondary). The model might also work for other languages supported by Gemma 2, even though it was not continually pretrained on them.
- **Base model:** [cjvt/GaMS-9B-Instruct](https://huggingface.co/cjvt/GaMS-9B-Instruct)
- **License:** [Gemma](https://ai.google.dev/gemma/terms)

## Usage

The model can be run through the `pipeline` API using the following code:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="cuda"  # replace with "mps" to run on a Mac device
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Translation:", response[0]["generated_text"][-1]["content"])
```

For multi-GPU inference, set `device_map` to `"auto"`:

```python
from transformers import pipeline

model_id = "DarioVajda/GaMS-DPO-Translator"

pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example of response generation
message = [{"role": "user", "content": "Prevedi naslednje angleško besedilo v slovenščino.\nToday is a nice day."}]
response = pline(message, max_new_tokens=512)
print("Model's response:", response[0]["generated_text"][-1]["content"])

# Example of a conversation chain
new_message = response[0]["generated_text"]
new_message.append({"role": "user", "content": "Lahko bolj podrobno opišeš ta dogodek?"})
response = pline(new_message, max_new_tokens=1024)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```

## Data

The fine-tuning data was obtained by translating a large corpus of Wikipedia articles with two models (GaMS-9B-Instruct and EuroLLM-9B-Instruct); the resulting candidate translations were then ranked by automatic metrics for translation quality and reliability.
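
The ranking pipeline itself is not published in this card. As a minimal sketch, ranked candidate pairs can be turned into DPO preference records in the usual `prompt`/`chosen`/`rejected` format; here `quality_score` is a hypothetical stand-in for the automatic metrics mentioned above, not the authors' implementation:

```python
# Minimal sketch: turning two ranked candidate translations into a DPO
# preference record. `quality_score` is a hypothetical placeholder for the
# automatic quality/reliability metrics; it is not the authors' implementation.

def quality_score(source: str, translation: str) -> float:
    """Dummy stand-in: replace with a real MT metric (e.g. a COMET model)."""
    # Crude length-ratio heuristic, used here only so the sketch runs.
    return 1.0 - abs(len(translation) - len(source)) / max(len(source), 1)

def build_preference_pair(source: str, candidate_a: str, candidate_b: str) -> dict:
    """Return a record in the prompt/chosen/rejected format used by DPO."""
    prompt = f"Prevedi naslednje angleško besedilo v slovenščino.\n{source}"
    if quality_score(source, candidate_a) >= quality_score(source, candidate_b):
        chosen, rejected = candidate_a, candidate_b
    else:
        chosen, rejected = candidate_b, candidate_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Example: one record per source sentence, translated by both models.
pair = build_preference_pair("Today is a nice day.",
                             "Danes je lep dan.",
                             "Danes je lepo.")
```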

## Training

The model was trained on the [Vega HPC](https://izum.si/vega_slv/) system.
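
The exact training configuration is not given in this card. Below is a minimal sketch of how DPO fine-tuning is typically set up with the TRL library on a `prompt`/`chosen`/`rejected` dataset; the dataset file name and all hyperparameters are illustrative assumptions, not the authors' values.

```python
# Illustrative DPO setup with TRL; hyperparameters and file names are
# assumptions, not the configuration used to train GaMS-DPO-Translator.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "cjvt/GaMS-9B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical file of preference records with "prompt", "chosen", "rejected".
dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

config = DPOConfig(
    output_dir="gams-dpo-translator",
    beta=0.1,                          # KL-penalty strength (illustrative)
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
trainer = DPOTrainer(model=model, args=config, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()
```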

## Evaluation

The model was evaluated on Slobench, and we extended the evaluation to measure additional qualities of the model that we care about.

### Slobench evaluation

| Model | BERT score | BLEU (avg) | METEOR (avg) | CHRF (avg) | BLEU (corpus) | CHRF (corpus) |
|--------------------------------|-----------:|-----------:|-------------:|-----------:|--------------:|--------------:|
| EuroLLM-9B-Instruct            |     0.8741 |     0.2927 |       0.5792 |     0.6055 |        0.3273 |        0.6055 |
| GaMS-27B-Instruct              |     0.8734 |     0.2866 |       0.5688 |     0.5986 |        0.3246 |        0.5986 |
| **GaMS-9B-DPO-Translator**     | **0.8726** | **0.2810** |   **0.5663** | **0.5967** |    **0.3252** |    **0.5967** |
| GaMS-9B-Instruct               |     0.8713 |     0.2773 |       0.5616 |     0.5928 |        0.3209 |        0.5928 |
| GPT-4o-mini                    |     0.8690 |     0.2619 |       0.5456 |     0.5839 |        0.3021 |        0.5839 |
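
For reference, the averaged columns score each sentence separately and take the mean, while the corpus columns compute a single score over all segments. A hedged illustration of that distinction with `sacrebleu` (not the Slobench harness itself), with scores divided by 100 to match the table's 0–1 scale:

```python
# Sentence-averaged vs. corpus-level BLEU/CHRF with sacrebleu
# (pip install sacrebleu). Toy data; not the Slobench evaluation harness.
import sacrebleu

hypotheses = ["Danes je lep dan.", "Maček sedi na preprogi."]
references = ["Danes je lep dan.", "Mačka sedi na preprogi."]

# Corpus-level: one score computed over all segments jointly.
corpus_bleu = sacrebleu.corpus_bleu(hypotheses, [references]).score / 100
corpus_chrf = sacrebleu.corpus_chrf(hypotheses, [references]).score / 100

# Averaged: score each segment on its own, then take the mean.
avg_bleu = sum(sacrebleu.sentence_bleu(h, [r]).score / 100
               for h, r in zip(hypotheses, references)) / len(hypotheses)

print(f"corpus BLEU={corpus_bleu:.4f}, corpus CHRF={corpus_chrf:.4f}, "
      f"avg BLEU={avg_bleu:.4f}")
```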

### Wikipedia evaluation

This evaluation was performed on data that was not seen during training. We checked how often each model makes a fatal error (wrong output language or a truncated output) and then compared COMET scores.

Error rates:

| Model        | Language Error | Truncation Error | Combined |
|--------------|---------------:|-----------------:|---------:|
| EuroLLM      |           1.0% |             0.4% |     1.4% |
| GaMS         |           9.5% |             3.5% |    13.0% |
| **GaMS-DPO** |       **0.6%** |         **0.2%** | **0.8%** |
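
The card does not state which detectors were used for these checks. One plausible sketch, assuming a language-identification library such as `langdetect` and a simple length heuristic for truncation:

```python
# Hypothetical fatal-error checks; the actual detectors behind the table
# above are not specified in this card.
from langdetect import detect  # pip install langdetect

def language_error(translation: str, expected_lang: str = "sl") -> bool:
    """Flag a translation that comes out in the wrong language."""
    try:
        return detect(translation) != expected_lang
    except Exception:
        return True  # undetectable output is counted as an error

def truncation_error(source: str, translation: str,
                     min_ratio: float = 0.5) -> bool:
    """Heuristic: flag outputs much shorter than the source text."""
    return len(translation) < min_ratio * len(source)
```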

COMET scoring results:

| Model                      | Average COMET score |
|----------------------------|--------------------:|
| EuroLLM-9B-Instruct        |               0.755 |
| GaMS-9B-Instruct           |               0.736 |
| **GaMS-9B-DPO-Translator** |           **0.771** |
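
The COMET checkpoint used is not named in this card. A minimal scoring sketch with the `unbabel-comet` package, assuming `Unbabel/wmt22-comet-da` as an example checkpoint:

```python
# COMET scoring sketch (pip install unbabel-comet). The checkpoint is an
# assumed example; the card does not state which one produced the table above.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

data = [{
    "src": "Today is a nice day.",   # source sentence
    "mt": "Danes je lep dan.",       # system translation
    "ref": "Danes je lep dan.",      # human reference
}]
output = comet_model.predict(data, batch_size=8, gpus=1)
print(output.system_score)  # average COMET score over all segments
```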