Opus Tatoeba | English -> Arabic
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): acm afb apc ara arq ary arz
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- valid language labels: >>ara<< >>ara_Latn<< >>arq_Latn<< >>arq<< >>arz<<
- download: opus-2021-02-23.zip
- test set translations: opus-2021-02-23.test.txt
- test set scores: opus-2021-02-23.eval.txt
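Because the model serves several Arabic varieties, each source sentence has to start with the >>id<< token of the desired target. The sketch below shows one way to add that token before translation. It is a hypothetical helper, not part of the released model: the function names are illustrative, it accepts only the labels listed above, and it assumes a single space separates the token from the sentence.

```python
# Hypothetical helper (not shipped with the model): prepends the
# sentence-initial >>id<< token this model requires.
# VALID_LABELS copies the "valid language labels" listed in this README.
VALID_LABELS = {"ara", "ara_Latn", "arq_Latn", "arq", "arz"}


def add_target_token(sentence: str, target: str) -> str:
    """Return `sentence` prefixed with the >>target<< token.

    Assumes one space between the token and the text.
    Raises ValueError for labels not in VALID_LABELS.
    """
    if target not in VALID_LABELS:
        raise ValueError(f"unsupported target label: {target!r}")
    return f">>{target}<< {sentence.strip()}"


def add_target_tokens(sentences: list[str], target: str) -> list[str]:
    """Apply add_target_token to every sentence in a batch."""
    return [add_target_token(s, target) for s in sentences]


if __name__ == "__main__":
    # Prepare English input for translation into Egyptian Arabic (arz).
    print(add_target_tokens(["How are you?", "Good morning."], "arz"))
```

The resulting strings are what would be passed to the SentencePiece-based preprocessing; rejecting unknown labels early avoids silently translating into an unintended variety.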
Benchmarks
| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test.eng-acm | 3.6 | 0.202 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-afb | 29.8 | 0.560 | 36 | 145 | 1.000 |
| Tatoeba-test.eng-apc | 6.4 | 0.249 | 5 | 18 | 0.943 |
| Tatoeba-test.eng-ara | 14.0 | 0.437 | 10000 | 58935 | 1.000 |
| Tatoeba-test.eng-arq | 0.5 | 0.155 | 412 | 2323 | 1.000 |
| Tatoeba-test.eng-ary | 3.1 | 0.246 | 18 | 53 | 1.000 |
| Tatoeba-test.eng-arz | 2.1 | 0.249 | 181 | 856 | 1.000 |
| tico19-test.eng-ara | 22.2 | 0.530 | 2100 | 51336 | 0.997 |
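The BP column is BLEU's brevity penalty: 1.000 means the system output was not shorter than the references, while values below 1 (0.943 for eng-apc, 0.997 for tico19) mean the output was shorter and the BLEU score was scaled down. The sketch below implements the standard corpus-level formula, where c is the total hypothesis length and r the effective reference length. It is illustrative only and is not claimed to be the exact code used to produce the scores above.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty.

    hyp_len: total hypothesis length (c); ref_len: effective reference length (r).
    Returns 1.0 if c >= r, exp(1 - r/c) if 0 < c < r, and 0.0 for empty output.
    """
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


if __name__ == "__main__":
    # A system that is 10% shorter than the references is penalized.
    print(round(brevity_penalty(90, 100), 3))
```

This also explains why BP only ever reduces a score: longer-than-reference output is handled by the n-gram precisions, not by the penalty.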