Opus Tatoeba | English -> Chinese
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): cjy cmn gan hak hsn lzh nan wuu yue
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- valid language labels: >>cmn_Hans<< >>cmn_Hant<< >>cmn<< >>yue_Hant<< >>yue_Hans<< >>nan<< >>wuu<<
- download: opus-2021-02-23.zip
- test set translations: opus-2021-02-23.test.txt
- test set scores: opus-2021-02-23.eval.txt
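The language-token requirement listed above can be sketched as a small pre-processing step. This is a minimal illustration, not part of the released tooling: the helper name `add_target_token` and the constant `VALID_LABELS` are hypothetical, and the single space between the token and the sentence is an assumption based on common usage. The labels themselves are copied from the "valid language labels" line of this card.

```python
# Labels copied from the "valid language labels" line of this card.
VALID_LABELS = {
    "cmn_Hans", "cmn_Hant", "cmn",
    "yue_Hant", "yue_Hans", "nan", "wuu",
}


def add_target_token(sentence: str, label: str) -> str:
    """Prefix a source sentence with the >>label<< target-language token.

    Hypothetical helper: it only builds the input string and does not
    load or call the model.
    """
    if label not in VALID_LABELS:
        raise ValueError(f"unsupported target label: {label!r}")
    # Assumption: token and sentence are separated by a single space.
    return f">>{label}<< {sentence.strip()}"


if __name__ == "__main__":
    print(add_target_token("How are you?", "cmn_Hans"))
```

Rejecting unknown labels before the text reaches the model is a design choice made for this sketch, so that a mistyped ID fails early with a clear error.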
Benchmarks
| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test.eng-zho | 31.6 | 0.267 | 9999 | 110463 | 0.911 |