metadata
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
Esperanto -> Catalan, English, Spanish MT Model
Model description
This repository contains a multilingual MarianMT model that translates from Esperanto into English, Spanish, and Catalan. The target language is selected with a language tag on each source sentence, and the model uses the tiny architecture.
This model is not intended for direct inference through the Hugging Face transformers library.
Use Marian for inference instead.
The repository includes the following files:
- model.npz.best-chrf.npz: trained Marian model checkpoint
- tiny.decoder.yml: decoder configuration
- vocab.spm: SentencePiece vocabulary
- run_model.sh: example script showing how to run the model
Supported target languages (via tags)
You control the target language by prefixing the source sentence with one of the following tags:
- >>eng<< → English
- >>spa<< → Spanish
- >>cat<< → Catalan
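The tagging step can be sketched as a small shell helper. This is only an illustration: `add_tag` is a hypothetical function, not part of this repository, and it assumes the tag is separated from the sentence by a single space, as in the decoding command in the Inference section.

```shell
# add_tag is a hypothetical helper, not shipped with this model.
# Usage: add_tag <eng|spa|cat> < source_sentences > tagged_sentences
add_tag() {
  case "$1" in
    eng|spa|cat) ;;  # the three tags listed above
    *) echo "unsupported target language: $1" >&2; return 1 ;;
  esac
  # Prepend the tag and one space to every input line.
  sed "s/^/>>$1<< /"
}

# Tag one Esperanto sentence for Spanish output.
printf 'Saluton, mondo!\n' | add_tag spa
# prints: >>spa<< Saluton, mondo!
```

Rejecting unknown tags up front avoids sending the model a tag it was never trained on.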
Training data
The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.
Training sentence-pair counts:
- ca-eo: 672,931
- es-eo: 4,677,945
- eo-en: 5,000,000
Inference
Run decoding from inside the model directory:
cat input.epo | sed "s/^/>>cat<< /" | \
marian-decoder \
-c tiny.decoder.yml \
--output output.cat \
--normalize \
-m model.npz.best-chrf.npz \
--vocabs vocab.spm vocab.spm \
--log decode.log \
--devices 0
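To translate the same input into all three target languages, the command above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `translate_all` and the intermediate `tagged.<lang>` files are hypothetical names, the decoder flags are copied from the command above, and the function skips decoding when `marian-decoder` is not on the `PATH`.

```shell
# translate_all is a hypothetical wrapper, not part of this repository.
# Run it from inside the model directory.
# Usage: translate_all [input_file]   (defaults to input.epo)
translate_all() {
  input="${1:-input.epo}"
  for tgt in eng spa cat; do
    # Tag every source line for the current target language.
    sed "s/^/>>${tgt}<< /" "$input" > "tagged.${tgt}" || return 1
    if command -v marian-decoder >/dev/null 2>&1; then
      # marian-decoder reads the tagged sentences from standard input.
      marian-decoder \
        -c tiny.decoder.yml \
        --output "output.${tgt}" \
        --normalize \
        -m model.npz.best-chrf.npz \
        --vocabs vocab.spm vocab.spm \
        --log "decode.${tgt}.log" \
        --devices 0 < "tagged.${tgt}" || return 1
    else
      echo "marian-decoder not found; skipped decoding for ${tgt}" >&2
    fi
  done
}
```

Calling `translate_all input.epo` should leave one `output.<lang>` file per target language next to the tagged inputs.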