---
license: apache-2.0
language:
- ar
- en
pipeline_tag: translation
---
Opus Tatoeba | English -> Arabic
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): acm afb apc ara arq ary arz
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form `>>id<<` (id = valid target language ID)
- valid language labels: `>>ara<<` `>>ara_Latn<<` `>>arq_Latn<<` `>>arq<<` `>>arz<<`
- download: opus-2021-02-23.zip
- test set translations: opus-2021-02-23.test.txt
- test set scores: opus-2021-02-23.eval.txt
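
The language-token requirement above can be illustrated with a small helper. This is a minimal sketch, not part of the released model: the function name `add_language_token` and the constant `VALID_LABELS` are hypothetical, and the label list is simply copied from the bullet above. It only prepares the input string; loading the model and running translation are left to whichever toolkit is used.

```python
# Labels copied from the "valid language labels" bullet; nothing else is assumed.
VALID_LABELS = ("ara", "ara_Latn", "arq_Latn", "arq", "arz")


def add_language_token(sentence: str, label: str) -> str:
    """Prefix an English source sentence with the >>id<< target-language token.

    Hypothetical helper for illustration only.
    """
    if label not in VALID_LABELS:
        raise ValueError(
            f"unknown target label {label!r}; expected one of {VALID_LABELS}"
        )
    # The token goes at the very start of the sentence, followed by a space.
    return f">>{label}<< {sentence.strip()}"


if __name__ == "__main__":
    print(add_language_token("How are you?", "ara"))
    print(add_language_token("Where is the station?", "arz"))
```

The prefixed strings would then be passed to the model's tokenizer as ordinary source text. Rejecting unlisted labels early is a design choice of this sketch; the source does not say how the model behaves when the token is missing or unknown.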
Benchmarks
| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test.eng-acm | 3.6 | 0.202 | 3 | 17 | 1.000 |
| Tatoeba-test.eng-afb | 29.8 | 0.560 | 36 | 145 | 1.000 |
| Tatoeba-test.eng-apc | 6.4 | 0.249 | 5 | 18 | 0.943 |
| Tatoeba-test.eng-ara | 14.0 | 0.437 | 10000 | 58935 | 1.000 |
| Tatoeba-test.eng-arq | 0.5 | 0.155 | 412 | 2323 | 1.000 |
| Tatoeba-test.eng-ary | 3.1 | 0.246 | 18 | 53 | 1.000 |
| Tatoeba-test.eng-arz | 2.1 | 0.249 | 181 | 856 | 1.000 |
| tico19-test.eng-ara | 22.2 | 0.530 | 2100 | 51336 | 0.997 |
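
The BP column is BLEU's brevity penalty. As a minimal sketch, assuming the standard BLEU definition (1 when the system output is at least as long as the reference, otherwise `exp(1 - ref_len / hyp_len)`), it can be computed as follows. The function name and the token counts in the example are illustrative and are not taken from the evaluation files.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty for total hypothesis and reference lengths."""
    if hyp_len <= 0:
        # An empty output receives the maximum penalty.
        return 0.0
    if hyp_len >= ref_len:
        # No penalty when the output is not shorter than the reference.
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


if __name__ == "__main__":
    # Hypothetical lengths: a 17-token output against an 18-token reference.
    print(round(brevity_penalty(17, 18), 3))
    # An output longer than the reference is not penalized.
    print(brevity_penalty(20, 18))
```

A BP of 1.000 therefore means the translations were, in total, at least as long as the references, while values below 1 indicate shorter output.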