| --- |
| language: |
| - eo |
| - en |
| - es |
| - ca |
| tags: |
| - translation |
| - machine-translation |
| - marian |
| - opus-mt |
| - multilingual |
| license: cc-by-4.0 |
| pipeline_tag: translation |
| metrics: |
| - bleu |
| - chrf |
| --- |
| |
| # Catalan, English, Spanish -> Esperanto MT Model |
|
|
| ## Model description |
|
|
| This repository contains a **multilingual MarianMT** model for **(English, Spanish, Catalan) -> Esperanto** translation, built with a tiny architecture. |
|
|
| This model is **not intended for direct inference through the Hugging Face `transformers` library**. |
|
|
| Use [**Marian**](https://marian-nmt.github.io/docs/) for inference instead. |
|
|
| The repository includes the following files: |
|
|
| - `model.npz.best-chrf.npz`: trained Marian model checkpoint (best validation chrF) |
| - `tiny.decoder.yml`: decoder configuration |
| - `vocab.spm`: SentencePiece vocabulary |
| - `run_model.sh`: example script showing how to run the model |
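Before decoding, it can help to confirm that all four files are present in the model directory. The following is a minimal sketch; `check_model_files` is a hypothetical helper introduced here for illustration and is not part of Marian or of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the files listed above exist in a directory.
# Usage: check_model_files [MODEL_DIR]   (defaults to the current directory)
check_model_files() {
  local dir="${1:-.}"
  local missing=0
  local f
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure.
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Return 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_model_files .` from inside the model directory returns a non-zero status and names any file that is absent.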
|
|
| ## Training data |
|
|
| The model was trained using **Tatoeba** parallel data, with **FLORES-200** used as the development set. |
|
|
| Training sentence-pair counts: |
|
|
| * **ca-eo**: 672,931 |
| * **es-eo**: 4,677,945 |
| * **en-eo**: 5,000,000 |
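To see how the three source languages are balanced, the counts above can be summed and turned into rough shares. This is only an illustrative calculation on the listed numbers; the variable names and the `share` helper are introduced here and do not come from the training setup.

```shell
#!/usr/bin/env bash
# Sentence-pair counts copied from the list above, keyed by source language.
ca=672931
es=4677945
en=5000000

total=$((ca + es + en))
echo "total pairs: $total"   # total pairs: 10350876

# Hypothetical helper: integer percentage of the total, rounded down.
share() {
  echo "$((100 * $1 / total))"
}

echo "ca: $(share "$ca")%"
echo "es: $(share "$es")%"
echo "en: $(share "$en")%"
```

On these numbers, Catalan contributes roughly 6% of the training pairs, while Spanish and English contribute about 45% and 48% respectively.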
|
|
| ## Inference |
|
|
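| Run decoding from inside the model directory. The example below reads Spanish text from `input.spa` and writes the Esperanto translation to `output.epo`: |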
|
|
| ```bash |
| # Feed the source text to marian-decoder on stdin; --devices 0 selects GPU 0. |
| cat input.spa | marian-decoder \ |
| -c tiny.decoder.yml \ |
| --output output.epo \ |
| --normalize \ |
| -m model.npz.best-chrf.npz \ |
| --vocabs vocab.spm vocab.spm \ |
| --log decode.log \ |
| --devices 0 |
| ``` |
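The command above can also be wrapped in a small function so that the input and output paths become arguments. The sketch below is an assumption-laden illustration, not the repository's `run_model.sh`: `translate_to_eo`, `CPU_THREADS`, and `DRY_RUN` are hypothetical names. It passes the input with Marian's `--input` option instead of stdin and, when `CPU_THREADS` is set, swaps `--devices 0` for `--cpu-threads`. Whether `tiny.decoder.yml` works unchanged on CPU is not stated in this model card, so treat that branch as untested.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the decoding command shown above.
# Usage: translate_to_eo INPUT OUTPUT
# Environment (assumptions of this sketch):
#   CPU_THREADS  if set, decode on CPU with that many threads instead of GPU 0
#   DRY_RUN      if set to 1, print the command instead of running it
translate_to_eo() {
  if [ "$#" -ne 2 ]; then
    echo "usage: translate_to_eo INPUT OUTPUT" >&2
    return 2
  fi
  local input="$1"
  local output="$2"

  # Rebuild the positional parameters as the marian-decoder command line,
  # using the same options as the command above.
  set -- marian-decoder \
    -c tiny.decoder.yml \
    --input "$input" \
    --output "$output" \
    --normalize \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --log decode.log

  # Choose CPU or GPU decoding.
  if [ -n "${CPU_THREADS:-}" ]; then
    set -- "$@" --cpu-threads "$CPU_THREADS"
  else
    set -- "$@" --devices 0
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 translate_to_eo input.spa output.epo` prints the full command without invoking Marian, which is a cheap way to check the options before starting a run.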