odegiber committed on
Commit fa874b2 · verified · 1 Parent(s): 0d2dac8

Create README.md

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ language:
+ - eo
+ - en
+ - es
+ - ca
+ tags:
+ - translation
+ - machine-translation
+ - marian
+ - opus-mt
+ - multilingual
+ license: cc-by-4.0
+ pipeline_tag: translation
+ metrics:
+ - bleu
+ - chrf
+ ---
+
+ # Catalan, English, Spanish -> Esperanto MT Model
+
+ ## Model description
+
+ This repository contains a **multilingual MarianMT** model for **(English, Spanish, Catalan) → Esperanto** translation.
+
+ ## Usage
+
+ The model can be loaded and used with `transformers` as follows:
+
+ ```python
+ from transformers import MarianMTModel, MarianTokenizer
+ import torch
+
+ model_name = "Helsinki-NLP/opus-mt-caenes-eo"
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = MarianMTModel.from_pretrained(model_name).to(device)
+ tokenizer = MarianTokenizer.from_pretrained(model_name)
+
+ source_texts = [
+     "Buenos días, qué tal?",
+     "Bon dia, com estàs?",
+     "Good morning, how are you?",
+ ]
+
+ inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
+ inputs = {k: v.to(device) for k, v in inputs.items()}
+
+ translated_ids = model.generate(**inputs)  # also passes attention_mask so padding is ignored
+ translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)
+
+ for src, tgt in zip(source_texts, translated_texts):
+     print(f"Source: {src} => Translated: {tgt}")
+ ```
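The usage snippet translates three sentences in a single call. For longer inputs, one option is to split the sentences into fixed-size batches so that a single `generate` call does not have to hold everything in memory. The following is a minimal sketch of that idea; `batched`, `translate_all`, and the `translate_batch` callable are hypothetical names introduced here for illustration and are not part of `transformers` or of this repository.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_all(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order.

    `translate_batch` is an assumed callable that maps a list of source
    sentences to a list of translations of the same length.
    """
    translations: List[str] = []
    for batch in batched(texts, batch_size):
        translations.extend(translate_batch(batch))
    return translations


# Stand-in "translator" used only to show the control flow.
demo = translate_all(["a", "b", "c"], lambda batch: [s.upper() for s in batch], batch_size=2)
print(demo)
```

In practice, `translate_batch` could wrap the tokenizer, `generate`, and `batch_decode` steps from the usage snippet; the batch size of 16 is an arbitrary placeholder that would need tuning for the available hardware.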
+
+ ## Training data
+
+ The model was trained using **Tatoeba** parallel data, with **FLORES-200** used as the development set.
+
+ Training sentence-pair counts:
+
+ * **ca-eo**: 672,931
+ * **es-eo**: 4,677,945
+ * **eo-en**: 5,000,000
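To make the balance between the three source languages easier to see, the counts listed above can be summed and turned into shares. This is only arithmetic on the numbers stated in the README; the variable names are illustrative.

```python
# Sentence-pair counts copied from the list above.
pair_counts = {
    "ca-eo": 672_931,
    "es-eo": 4_677_945,
    "eo-en": 5_000_000,
}

total = sum(pair_counts.values())
shares = {pair: count / total for pair, count in pair_counts.items()}

for pair, share in shares.items():
    print(f"{pair}: {pair_counts[pair]:,} pairs ({share:.1%} of the total)")
print(f"total: {total:,} pairs")
```

The total comes to 10,350,876 sentence pairs, so the Catalan data is by far the smallest portion of the listed training material.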
+
+ ## Evaluation on FLORES
+
+ | Language Pair | BLEU | ChrF++ |
+ | ------------- | ----: | -----: |
+ | spa-epo | 16.25 | 49.10 |
+ | cat-epo | 21.43 | 51.37 |
+ | eng-epo | 26.42 | 58.23 |
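The ChrF++ column reports a character n-gram F-score, which chrF++ extends with word unigrams and bigrams. The README does not say which tool produced the scores, so the following is only a simplified pure-Python sketch of the character-level part: precision and recall are averaged over character n-gram orders 1 to 6 and combined into an F-score with beta = 2. The function names `char_ngrams` and `simple_chrf` are made up for this illustration, word n-grams are omitted, and the sketch is not expected to reproduce the values in the table.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order `n` after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (character n-grams only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


print(simple_chrf("Bonan matenon, kiel vi fartas?", "Bonan matenon, kiel vi fartas?"))
print(simple_chrf("Bonan matenon, kiel vi?", "Bonan matenon, kiel vi fartas?"))
```

With beta = 2, recall is weighted more heavily than precision, so a hypothesis that drops reference content is penalised more than one that adds extra characters. For reported results, an established implementation of BLEU and chrF++ should be used rather than this sketch.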