| --- |
| pipeline_tag: translation |
| language: |
| - multilingual |
| - en |
| - am |
| - ar |
| - so |
| - sw |
| - pt |
| - af |
| - fr |
| - zu |
| - mg |
| - ha |
| - sn |
| - arz |
| - ny |
| - ig |
| - xh |
| - yo |
| - st |
| - rw |
| - tn |
| - ti |
| - ts |
| - om |
| - run |
| - nso |
| - ee |
| - ln |
| - tw |
| - pcm |
| - gaa |
| - loz |
| - lg |
| - guw |
| - bem |
| - efi |
| - lue |
| - lua |
| - toi |
| - ve |
| - tum |
| - tll |
| - iso |
| - kqn |
| - zne |
| - umb |
| - mos |
| - tiv |
| - lu |
| - ff |
| - kwy |
| - bci |
| - rnd |
| - luo |
| - wal |
| - ss |
| - lun |
| - wo |
| - nyk |
| - kj |
| - ki |
| - fon |
| - bm |
| - cjk |
| - din |
| - dyu |
| - kab |
| - kam |
| - kbp |
| - kr |
| - kmb |
| - kg |
| - nus |
| - sg |
| - taq |
| - tzm |
| - nqo |
|
|
| license: apache-2.0 |
| --- |
| SSA-COMET-STL is a robust automatic metric for machine translation evaluation (MTE), built on SSA-MTE. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation. |
| This model is based on an improved, African-enhanced encoder, [afro-xlmr-large-76L](https://huggingface.co/Davlan/afro-xlmr-large-76L). |
|
|
| # Paper |
|
|
| Coming soon |
|
|
| # License |
|
|
| Apache-2.0 |
|
|
| # Usage (SSA-COMET) |
|
|
| Using this model requires unbabel-comet to be installed: |
|
|
| ```bash |
| pip install --upgrade pip # ensures that pip is current |
| pip install unbabel-comet |
| ``` |
|
|
| Then you can use it through the COMET CLI: |
|
|
| ```bash |
| comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model McGill-NLP/ssa-comet-stl |
| ``` |
|
|
| Or using Python: |
|
|
| ```python |
| from comet import download_model, load_from_checkpoint |
| # Download the checkpoint from the Hugging Face Hub and load it |
| model_path = download_model("McGill-NLP/ssa-comet-stl") |
| model = load_from_checkpoint(model_path) |
| # Each item is a (source, machine translation, reference) triplet |
| data = [ |
|     { |
|         "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.", |
|         "mt": "Nadal's head to head record against the Canadian is 7–2.", |
|         "ref": "Nadal scored seven unanswered points against Canada." |
|     }, |
|     { |
|         "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.", |
|         "mt": "He recently lost against Raonic in the Brisbane Open.", |
|         "ref": "He recently lost to Raoniki in the game Sisi Brisbeni." |
|     } |
| ] |
| # gpus=1 runs on a single GPU; set gpus=0 to run on CPU |
| model_output = model.predict(data, batch_size=8, gpus=1) |
| print(model_output) |
| ``` |
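The CLI reads three line-aligned text files, while the Python API takes a list of dictionaries. The sketch below shows one way to build that list from lines already read into memory. `build_triplets` is a hypothetical helper, not part of unbabel-comet, and it assumes the three inputs are aligned line by line.

```python
def build_triplets(sources, translations, references):
    """Pair line-aligned inputs into dicts with "src", "mt" and "ref" keys.

    Hypothetical helper for illustration; not part of unbabel-comet.
    """
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations and references must have the same length")
    # Strip trailing newlines so each field holds only the sentence text
    return [
        {"src": s.strip(), "mt": t.strip(), "ref": r.strip()}
        for s, t, r in zip(sources, translations, references)
    ]


# Example lines, as they would look after reading each file with readlines()
sources = ["Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.\n"]
translations = ["He recently lost against Raonic in the Brisbane Open.\n"]
references = ["He recently lost to Raoniki in the game Sisi Brisbeni.\n"]

data = build_triplets(sources, translations, references)
print(data[0]["mt"])
```

The resulting `data` list has the same shape as the one passed to `model.predict` above. Failing on mismatched lengths is a deliberate choice here: silently truncating to the shortest input would score translations against the wrong references.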
|
|
| # Intended uses |
|
|
| Our model is intended to be used for **MT evaluation**. |
|
|
| Given a triplet (source sentence, translation, reference translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation. |
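Because each score lies between 0 and 1, segment-level scores can be averaged or compared against a cutoff. The sketch below assumes the scores are already available as a plain list of floats. The values are made up for illustration and are not output from this model; `summarize_scores` and the 0.5 threshold are hypothetical choices.

```python
def summarize_scores(scores, threshold=0.5):
    """Return the mean score and the indices of segments below the threshold.

    Hypothetical helper; the default threshold is an arbitrary illustration.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mean_score = sum(scores) / len(scores)
    # Indices of segments that may deserve a manual look
    flagged = [i for i, score in enumerate(scores) if score < threshold]
    return mean_score, flagged


# Made-up segment scores, one per (source, translation, reference) triplet
example_scores = [0.82, 0.35, 0.67]
mean_score, flagged = summarize_scores(example_scores)
print(f"mean={mean_score:.3f} flagged={flagged}")
```

A useful cutoff depends on the language pair and domain, so any fixed value should be checked against human judgments before it is relied on.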
|
|
| # Languages Covered: |
|
|
| There are 76 languages available: |
| - English (eng) |
| - Amharic (amh) |
| - Arabic (ara) |
| - Somali (som) |
| - Kiswahili (swa) |
| - Portuguese (por) |
| - Afrikaans (afr) |
| - French (fra) |
| - isiZulu (zul) |
| - Malagasy (mlg) |
| - Hausa (hau) |
| - chiShona (sna) |
| - Egyptian Arabic (arz) |
| - Chichewa (nya) |
| - Igbo (ibo) |
| - isiXhosa (xho) |
| - Yorùbá (yor) |
| - Sesotho (sot) |
| - Kinyarwanda (kin) |
| - Setswana (tsn) |
| - Tigrinya (tir) |
| - Tsonga (tso) |
| - Oromo (orm) |
| - Rundi (run) |
| - Northern Sotho (nso) |
| - Ewe (ewe) |
| - Lingala (lin) |
| - Twi (twi) |
| - Nigerian Pidgin (pcm) |
| - Ga (gaa) |
| - Lozi (loz) |
| - Luganda (lug) |
| - Gun (guw) |
| - Bemba (bem) |
| - Efik (efi) |
| - Luvale (lue) |
| - Luba-Lulua (lua) |
| - Tonga (toi) |
| - Tshivenḓa (ven) |
| - Tumbuka (tum) |
| - Tetela (tll) |
| - Isoko (iso) |
| - Kaonde (kqn) |
| - Zande (zne) |
| - Umbundu (umb) |
| - Mossi (mos) |
| - Tiv (tiv) |
| - Luba-Katanga (lub) |
| - Fula (fuv) |
| - San Salvador Kongo (kwy) |
| - Baoulé (bci) |
| - Ruund (rnd) |
| - Luo (luo) |
| - Wolaitta (wal) |
| - Swazi (ssw) |
| - Lunda (lun) |
| - Wolof (wol) |
| - Nyaneka (nyk) |
| - Kwanyama (kua) |
| - Kikuyu (kik) |
| - Fon (fon) |
| - Bambara (bam) |
| - Chokwe (cjk) |
| - Dinka (dik) |
| - Dyula (dyu) |
| - Kabyle (kab) |
| - Kamba (kam) |
| - Kabiyè (kbp) |
| - Kanuri (knc) |
| - Kimbundu (kmb) |
| - Kikongo (kon) |
| - Nuer (nus) |
| - Sango (sag) |
| - Tamasheq (taq) |
| - Tamazight (tzm) |
| - N'ko (nqo) |
|
|
| # Specifically Finetuned on: |
| - Amharic (amh) |
| - Hausa (hau) |
| - Igbo (ibo) |
| - Kikuyu (kik) |
| - Kinyarwanda (kin) |
| - Luo (luo) |
| - Twi (twi) |
| - Yorùbá (yor) |
| - isiZulu (zul) |
| - Ewe (ewe) |
| - Lingala (lin) |
| - Wolof (wol) |