Translation
LiteRT
Safetensors
Spanish
Basque
marian
odegiber committed on
Commit 81f1cc1 · verified · 1 Parent(s): 43b1e3f

Create README.md

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - Helsinki-NLP/tatoeba
+ - openlanguagedata/flores_plus
+ language:
+ - es
+ - eu
+ metrics:
+ - bleu
+ - comet
+ - chrf
+ pipeline_tag: translation
+ ---
+
+ # OPUS-MT-tiny-spa-eus
+
+ A student model distilled from a Tatoeba-MT teacher, [Tatoeba-MT-models/itc-eus/opusTCv20210807_transformer-big_2022-07-23](https://object.pouta.csc.fi/Tatoeba-MT-models/itc-eus/opusTCv20210807_transformer-big_2022-07-23.zip), which was trained on the [Tatoeba](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data) dataset.
+
+ We used [OpusDistillery](https://github.com/Helsinki-NLP/OpusDistillery) to train a new student with the tiny architecture and a regular transformer decoder.
+ For training data, we used [Tatoeba](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/data).
+ The configuration file fed into OpusDistillery can be found [here](https://github.com/Helsinki-NLP/OpusDistillery/blob/main/configs/opustranslate_hf/config.op.es-eu.yml).
+
+ ## How to run
+ ```python
+ from transformers import MarianMTModel, MarianTokenizer
+
+ model_name = "Helsinki-NLP/opus-mt_tiny_spa-eus"
+ tokenizer = MarianTokenizer.from_pretrained(model_name)
+ model = MarianMTModel.from_pretrained(model_name)
+
+ # Spanish source sentence to translate into Basque
+ text = "La gastronomía de Mallorca, como la de otras regiones similares del Mediterráneo, se basa en el pan, los vegetales y la carne (especialmente la porcina), y utiliza aceite de oliva en todas sus recetas."
+ # Tokenize to PyTorch tensors (input IDs plus attention mask)
+ inputs = tokenizer(text, return_tensors="pt")
+ output = model.generate(**inputs)[0]
+ print(tokenizer.decode(output, skip_special_tokens=True))
+ ```
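Translating several sentences at once usually means padding them into batches. The following is a minimal sketch of that, not part of the original README: the helper names `chunked` and `translate_batch` and the default `batch_size` of 8 are illustrative assumptions, and the function expects a tokenizer and model loaded as in the snippet above.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(sentences: Sequence[str], tokenizer, model,
                    batch_size: int = 8) -> List[str]:
    """Translate sentences in padded batches, preserving input order.

    `tokenizer` and `model` are assumed to be the MarianTokenizer and
    MarianMTModel instances loaded elsewhere; `batch_size` is an
    arbitrary illustrative default.
    """
    translations: List[str] = []
    for batch in chunked(sentences, batch_size):
        # padding=True pads to the longest sentence in the batch;
        # truncation=True clips inputs longer than the model maximum.
        inputs = tokenizer(list(batch), return_tensors="pt",
                           padding=True, truncation=True)
        outputs = model.generate(**inputs)
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

With `tokenizer` and `model` already loaded, a call such as `translate_batch(["Buenos días.", "¿Dónde está la biblioteca?"], tokenizer, model)` should return one Basque string per input sentence, in the same order.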
+
+ ## Benchmarks
+ ### Teacher
+ | testset | BLEU | chr-F | COMET  |
+ |---------|------|-------|--------|
+ | Flores+ | 13.3 | 52.5  | 0.8407 |
+
+ ### Student
+ | testset | BLEU | chr-F | COMET |
+ |---------|------|-------|-------|
+ | Flores+ | 11.7 | 51.6  | 0.824 |
+
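To put the two benchmark tables side by side, the sketch below computes the absolute gap and the share of the teacher's score that the student retains on Flores+. It is an illustration only: the scores are copied from the tables, while the `retention` helper and the "percentage of teacher" framing are assumptions added here, not something the README reports.

```python
# Scores copied from the Teacher and Student tables (Flores+).
teacher = {"BLEU": 13.3, "chr-F": 52.5, "COMET": 0.8407}
student = {"BLEU": 11.7, "chr-F": 51.6, "COMET": 0.824}


def retention(student_scores, teacher_scores):
    """Return each student score as a percentage of the teacher score."""
    return {
        metric: 100.0 * student_scores[metric] / teacher_scores[metric]
        for metric in teacher_scores
    }


retained = retention(student, teacher)
for metric, pct in retained.items():
    gap = student[metric] - teacher[metric]
    print(f"{metric}: {gap:+.4g} absolute, {pct:.1f}% of teacher")
```

By this measure the student keeps roughly 88% of the teacher's BLEU and about 98% of its chr-F and COMET scores.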