metadata
language:
  - eo
  - en
  - es
  - ca
tags:
  - translation
  - machine-translation
  - marian
  - opus-mt
  - multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
  - bleu
  - chrf

Catalan, English, Spanish -> Esperanto MT Model

Model description

This repository contains a multilingual MarianMT model that translates English, Spanish, and Catalan into Esperanto, using a tiny architecture.

This model is not intended for direct inference through the Hugging Face transformers library.

Use Marian for inference instead.
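Since inference depends on a local Marian build, it can help to confirm that the decoder binary is reachable before running anything. The helper below is a minimal sketch; the function name `have_decoder` is hypothetical, and it assumes the binary is installed under the name `marian-decoder` used in the Inference section.

```shell
# Return 0 if the given command (default: marian-decoder) is found on PATH.
# The function name is illustrative and not part of this repository.
have_decoder() {
  command -v "${1:-marian-decoder}" >/dev/null 2>&1
}
```

For example, `have_decoder || echo "marian-decoder not found on PATH"` prints a hint when Marian is not installed or not on `PATH`.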

The repository includes the following files:

  • model.npz.best-chrf.npz: trained Marian model checkpoint
  • tiny.decoder.yml: decoder configuration
  • vocab.spm: SentencePiece vocabulary
  • run_model.sh: example script showing how to run the model
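A quick way to confirm that a download is complete is to check for the four files listed above. The following is a minimal sketch; `REQUIRED_FILES` and `missing_files` are hypothetical names introduced here, and only the file names come from this repository.

```shell
# File names taken from the list above.
REQUIRED_FILES="model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh"

# Print each required file that is missing from directory $1 (default: .).
# Returns 0 when nothing is missing, 1 otherwise.
# Hypothetical helper, not shipped with the model.
missing_files() {
  dir="${1:-.}"
  status=0
  for f in $REQUIRED_FILES; do
    if [ ! -f "$dir/$f" ]; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}
```

Running `missing_files /path/to/model` prints nothing and returns 0 when all files are present.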

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

  • ca-eo: 672,931
  • es-eo: 4,677,945
  • en-eo: 5,000,000
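The counts above are unevenly distributed across source languages. The sketch below only does the arithmetic on the published numbers (total and percentage share per source language); it says nothing about how the data was sampled or weighted during training, which this card does not describe. The variable and function names are illustrative.

```shell
# Sentence-pair counts copied from the list above.
ca=672931
es=4677945
en=5000000

# Total number of training sentence pairs.
total=$((ca + es + en))
echo "total: $total"

# Print the percentage of the total that a count represents, to one decimal.
share() {
  awk -v n="$1" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

echo "ca: $(share "$ca")%  es: $(share "$es")%  en: $(share "$en")%"
```

This gives a total of 10,350,876 pairs, of which Catalan accounts for about 6.5%, Spanish about 45.2%, and English about 48.3%.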

Inference

Run decoding from inside the model directory:

cat input.spa |
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.epo \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
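To translate several input files with the same settings, the command can be wrapped in a small function. This is a sketch under stated assumptions: `decode_file` and the `MARIAN_DECODER` variable are hypothetical names introduced here, the output and log names are derived by appending `.epo` and `.log` to the input name, and the decoder flags are exactly those of the command above. Input is passed on standard input, as in the original command.

```shell
# Hypothetical override: set MARIAN_DECODER to a full path if the
# binary is not on PATH. Defaults to the name used in this README.
MARIAN_DECODER="${MARIAN_DECODER:-marian-decoder}"

# Translate one source file to Esperanto from inside the model directory.
# $1 = input file, one sentence per line (e.g. input.spa).
# Writes the translation to "$1.epo" and the decoder log to "$1.log".
decode_file() {
  in="$1"
  "$MARIAN_DECODER" \
    -c tiny.decoder.yml \
    --output "$in.epo" \
    --normalize \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --log "$in.log" \
    --devices 0 \
    < "$in"
}
```

For example, `for f in input.spa input.cat input.eng; do decode_file "$f"; done` would produce `input.spa.epo`, `input.cat.epo`, and `input.eng.epo`; the input file names here are placeholders.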