metadata
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
Catalan, English, Spanish -> Esperanto MT Model
Model description
This repository contains a multilingual MarianMT model that translates English, Spanish, and Catalan into Esperanto, using the tiny architecture.
This model is not intended for direct inference through the Hugging Face transformers library.
Use Marian for inference instead.
The repository includes the following files:
- model.npz.best-chrf.npz: trained Marian model checkpoint
- tiny.decoder.yml: decoder configuration
- vocab.spm: SentencePiece vocabulary
- run_model.sh: example script showing how to run the model
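Before decoding, it can help to confirm that all four files are present in the download. The following is a minimal sketch, not part of the repository: the function name `check_model_files` is hypothetical, and it checks only that the files exist, not that they are valid.

```shell
# Hypothetical helper (not shipped with the model): report any expected
# model file that is missing from a directory. Defaults to the current directory.
check_model_files() {
  local dir="${1:-.}"
  local missing=0
  local f
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 if every file was found, 1 otherwise.
  return "$missing"
}
```

A typical call from inside the model directory would be `check_model_files . && echo "all files present"`.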
Training data
The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.
Training sentence-pair counts:
- ca-eo: 672,931
- es-eo: 4,677,945
- eo-en: 5,000,000
Inference
Run decoding from inside the model directory:
cat input.spa | marian-decoder \
-c tiny.decoder.yml \
--output output.epo \
--normalize \
-m model.npz.best-chrf.npz \
--vocabs vocab.spm vocab.spm \
--log decode.log \
--devices 0
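The command above decodes on GPU 0. On a machine without a GPU, Marian's `--cpu-threads` option can be used in place of `--devices`, assuming the local Marian build was compiled with CPU support. This model card does not state whether CPU decoding has been tested. The sketch below uses a hypothetical helper, `decode_cmd`, that only prints the command line, so it can be inspected before anything is run.

```shell
# Hypothetical helper (not shipped with the model): print a marian-decoder
# command line that mirrors the one above.
# threads=0 (default) keeps GPU decoding on device 0;
# threads>0 switches to CPU decoding with that many threads.
decode_cmd() {
  local threads="${1:-0}"
  local device_args="--devices 0"
  if [ "$threads" -gt 0 ]; then
    device_args="--cpu-threads $threads"
  fi
  echo "marian-decoder -c tiny.decoder.yml --normalize -m model.npz.best-chrf.npz --vocabs vocab.spm vocab.spm --log decode.log $device_args"
}
```

For example, `$(decode_cmd 4) --output output.epo < input.spa` would run the printed command with four CPU threads, reading `input.spa` from standard input.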