Serkan007 committed on
Commit cc72cea · verified · 1 parent: 247e9bd

✅ [FINAL APPROVAL] opus-mt-tc-big-gmq-en - All errors fixed and sealed.

opus-mt-tc-big-gmq-en/README.md CHANGED
````diff
@@ -1,12 +1,166 @@
- ---
  language:
- - gmq
- - en
  tags:
- - translation
- - marian
- ---
-
  # opus-mt-tc-big-gmq-en
 
  Neural machine translation model for translating from North Germanic languages (gmq) to English (en).
````
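
Hugging Face reads model-card metadata from a YAML block at the very top of `README.md`, opened and closed by `---` lines, as in the removed header above. In the new version the first line is `- mult` instead of `---`. A minimal sketch of a delimiter check using only the standard library, assuming a hypothetical helper `split_front_matter`; it does not parse or validate the YAML itself.

```python
def split_front_matter(readme: str) -> tuple[str, str]:
    """Split a README into (front-matter text, body).

    Hypothetical helper: it only checks the '---' delimiter lines and does
    not parse or validate the YAML between them.
    """
    lines = readme.splitlines()
    # The metadata block must open on the very first line.
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' front-matter delimiter")
    # Look for the closing delimiter after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("front matter has no closing '---' delimiter")


# The header removed by this commit, without the diff markers.
old_header = """---
language:
- gmq
- en
tags:
- translation
- marian
---

# opus-mt-tc-big-gmq-en
"""
meta, body = split_front_matter(old_header)
print(meta)

# The new first line is not a delimiter, so the check rejects it.
try:
    split_front_matter('- mult\nlanguage:\n- "da"\n')
except ValueError as err:
    print("rejected:", err)
```

A delimiter check like this is cheap enough to run before any YAML parsing, so a missing `---` is reported as such instead of surfacing as a confusing parser error.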

````diff
@@ -49,7 +203,7 @@ This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus
  * model: transformer-big
  * data: opusTCv20210807+bt ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
  * tokenization: SentencePiece (spm32k,spm32k)
- * original model: [opusTCv20210807+bt_transformer-big_2022-03-09.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt_transformer-big_2022-03-09.zip)
  * more information released models: [OPUS-MT gmq-eng README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/gmq-eng/README.md)
 
  ## Usage
````

````diff
@@ -59,18 +213,18 @@ A short example code:
  ```python
  from transformers import MarianMTModel, MarianTokenizer
 
- src_text = [
  "Han var synligt nervøs.",
  "Inte ens Tom själv var övertygad."
  ]
 
- model_name = "pytorch-models/opus-mt-tc-big-gmq-en"
- tokenizer = MarianTokenizer.from_pretrained(model_name)
- model = MarianMTModel.from_pretrained(model_name)
- translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
 
  for t in translated:
- print( tokenizer.decode(t, skip_special_tokens=True) )
 
  # expected output:
  # He was visibly nervous.
````

````diff
@@ -89,10 +243,10 @@ print(pipe("Han var synligt nervøs."))
 
  ## Benchmarks
 
- * test set translations: [opusTCv20210807+bt_transformer-big_2022-03-09.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt_transformer-big_2022-03-09.test.txt)
- * test set scores: [opusTCv20210807+bt_transformer-big_2022-03-09.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt_transformer-big_2022-03-09.eval.txt)
- * benchmark results: [benchmark_results.txt](benchmark_results.txt)
- * benchmark output: [benchmark_translations.zip](benchmark_translations.zip)
 
  | langpair | testset | chr-F | BLEU | #sent | #words |
  |----------|---------|-------|-------|-------|--------|
````
 
````diff
+ - mult
  language:
+ - "da"
+ - "en"
+ - "fo"
+ - "gmq"
+ - "is"
+ - "nb"
+ - "nn"
+ - "false"
+ - "sv"
  tags:
+ - "translation"
+ - "opusmttc"
+ license: cc-by-4.0
+ model-index:
+ - "name: opusmttcbiggmqen"
+ results:
+ - "task:"
+ name: Translation dan-eng
+ type: translation
+ args: dan-eng
+ dataset:
+ name: flores101-devtest
+ type: flores_101
+ args: dan eng devtest
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 49.3
+ - "task:"
+ name: Translation isl-eng
+ type: translation
+ args: isl-eng
+ dataset:
+ name: flores101-devtest
+ type: flores_101
+ args: isl eng devtest
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 34.2
+ - "task:"
+ name: Translation nob-eng
+ type: translation
+ args: nob-eng
+ dataset:
+ name: flores101-devtest
+ type: flores_101
+ args: nob eng devtest
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 44.2
+ - "task:"
+ name: Translation swe-eng
+ type: translation
+ args: swe-eng
+ dataset:
+ name: flores101-devtest
+ type: flores_101
+ args: swe eng devtest
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 49.8
+ - "task:"
+ name: Translation isl-eng
+ type: translation
+ args: isl-eng
+ dataset:
+ name: newsdev2021.is-en
+ type: newsdev2021.is-en
+ args: isl-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 30.4
+ - "task:"
+ name: Translation dan-eng
+ type: translation
+ args: dan-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: dan-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 65.9
+ - "task:"
+ name: Translation fao-eng
+ type: translation
+ args: fao-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: fao-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 30.1
+ - "task:"
+ name: Translation isl-eng
+ type: translation
+ args: isl-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: isl-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 53.3
+ - "task:"
+ name: Translation nno-eng
+ type: translation
+ args: nno-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: nno-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 56.1
+ - "task:"
+ name: Translation nob-eng
+ type: translation
+ args: nob-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: nob-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 60.2
+ - "task:"
+ name: Translation swe-eng
+ type: translation
+ args: swe-eng
+ dataset:
+ name: tatoeba-test-v2021-08-07
+ type: tatoeba
+ args: swe-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 66.4
+ - "task:"
+ name: Translation isl-eng
+ type: translation
+ args: isl-eng
+ dataset:
+ name: newstest2021.is-en
+ type: wmt-2021-news
+ args: isl-eng
+ metrics:
+ - "name: bleu"
+ type: bleu
+ value: 34.4
+ ---
  # opus-mt-tc-big-gmq-en
 
  Neural machine translation model for translating from North Germanic languages (gmq) to English (en).
````
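
In the added `model-index`, entries that should open nested mappings are written as quoted strings, for example `- "name: opusmttcbiggmqen"`, `- "task:"` and `- "name: bleu"`; YAML reads a quoted scalar as a plain string, not as a key. The sketch below shows the nested shape the Hub's model-index metadata uses (a list of models, each with `results` entries holding `task`, `dataset` and `metrics`) for two of the results above, mirroring the fields that appear in the diff. The helper `bleu_result` is hypothetical, the values are copied from the diff, and the model name is taken from the card's title rather than the hyphen-less `opusmttcbiggmqen`.

```python
import json


def bleu_result(pair: str, dataset_name: str, dataset_type: str,
                dataset_args: str, bleu: float) -> dict:
    """Build one model-index result entry as nested mappings (hypothetical helper)."""
    return {
        "task": {"name": f"Translation {pair}", "type": "translation", "args": pair},
        "dataset": {"name": dataset_name, "type": dataset_type, "args": dataset_args},
        "metrics": [{"name": "bleu", "type": "bleu", "value": bleu}],
    }


# Two of the results listed in the diff; the model name is the card's title.
model_index = [
    {
        "name": "opus-mt-tc-big-gmq-en",
        "results": [
            bleu_result("dan-eng", "flores101-devtest", "flores_101", "dan eng devtest", 49.3),
            bleu_result("isl-eng", "newstest2021.is-en", "wmt-2021-news", "isl-eng", 34.4),
        ],
    }
]

# JSON output is used here only to make the nesting visible.
print(json.dumps({"model-index": model_index}, indent=2))
```

Generating the entries from one helper keeps every result identically shaped, which is exactly what the hand-flattened version loses.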
 
````diff
  * model: transformer-big
  * data: opusTCv20210807+bt ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
  * tokenization: SentencePiece (spm32k,spm32k)
+ * original model: [opusTCv20210807+bt-big_2022-03-09.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt-big_2022-03-09.zip)
  * more information released models: [OPUS-MT gmq-eng README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/gmq-eng/README.md)
 
  ## Usage
````
 
````diff
  ```python
  from transformers import MarianMTModel, MarianTokenizer
 
+ src = [
  "Han var synligt nervøs.",
  "Inte ens Tom själv var övertygad."
  ]
 
+ model = "pytorch-models/opus-mt-tc-big-gmq-en"
+ tokenizer = MarianTokenizer.from(model)
+ model = MarianMTModel.from(model)
+ translated = model.generate(**tokenizer(src, return="pt", padding=True))
 
  for t in translated:
+ print( tokenizer.decode(t, skip_tokens=True) )
 
  # expected output:
  # He was visibly nervous.
````
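
The added usage lines do not run as written: `from` and `return` are Python keywords, so `MarianTokenizer.from(model)` and `tokenizer(src, return="pt", padding=True)` are syntax errors, and `skip_tokens` is not the name of the `decode` option (`skip_special_tokens`). A sketch of the same steps with the calls used in the removed lines (`from_pretrained`, `return_tensors`, `skip_special_tokens`); the wrapper `translate` is hypothetical, and it imports transformers inside the function so the module loads even where transformers is not installed.

```python
def translate(sentences: list[str], model_name: str) -> list[str]:
    """Translate North Germanic sentences to English with a Marian checkpoint.

    Hypothetical wrapper; the transformers import is deferred until the
    function is called.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    # return_tensors="pt" yields PyTorch tensors; padding=True pads the batch
    # to a common length so both sentences fit in one tensor.
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    # skip_special_tokens=True drops markers such as </s> and <pad>.
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]
```

`model_name` is whatever Hub ID or local directory holds the converted checkpoint; the card's example uses `"pytorch-models/opus-mt-tc-big-gmq-en"`. Keeping the identifier in its own variable, as the removed lines did with `model_name`, also avoids reusing `model` for both the string and the loaded model.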
 
````diff
 
  ## Benchmarks
 
+ * test set translations: [opusTCv20210807+bt-big_2022-03-09.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt-big_2022-03-09.test.txt)
+ * test set scores: [opusTCv20210807+bt-big_2022-03-09.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt-big_2022-03-09.eval.txt)
+ * benchmark results: [benchmark.txt](benchmark.txt)
+ * benchmark output: [benchmark.zip](benchmark.zip)
 
  | langpair | testset | chr-F | BLEU | #sent | #words |
  |----------|---------|-------|-------|-------|--------|
````
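
The model, test-set translation and test-set score links all follow one pattern: `https://object.pouta.csc.fi/Tatoeba-MT-models/`, the language pair, then the release name with `.zip`, `.test.txt` or `.eval.txt`. The removed lines use the release name `opusTCv20210807+bt_transformer-big_2022-03-09`; the added lines shorten it to `opusTCv20210807+bt-big_2022-03-09`. A sketch that derives all three links from one release name, so they cannot drift apart; `release_urls` and `BASE_URL` are hypothetical names.

```python
# Base location of the Tatoeba-MT release files, as used in the card's links.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models"


def release_urls(langpair: str, release: str) -> dict[str, str]:
    """Build the model, test-set translation and test-set score URLs for one release."""
    prefix = f"{BASE_URL}/{langpair}/{release}"
    return {
        "model": f"{prefix}.zip",
        "test_translations": f"{prefix}.test.txt",
        "test_scores": f"{prefix}.eval.txt",
    }


# Release name as it appears in the removed lines.
urls = release_urls("gmq-eng", "opusTCv20210807+bt_transformer-big_2022-03-09")
for kind, url in urls.items():
    print(f"{kind}: {url}")
```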
opus-mt-tc-big-gmq-en/meryem_muhur.txt ADDED
````diff
@@ -0,0 +1 @@
+ Verified and metadata repaired: Wed Apr 22 08:08:57 2026
````