metadata
language:
  - eo
  - en
  - es
  - ca
tags:
  - translation
  - machine-translation
  - marian
  - opus-mt
  - multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
  - bleu
  - chrf

Esperanto → Catalan, English, Spanish MT Model

Model description

This repository contains a multilingual MarianMT model with a tiny architecture that translates Esperanto into English, Spanish, or Catalan. The target language is selected with a language tag.

This model is not intended for direct inference with the Hugging Face transformers library.

Use Marian (marian-decoder) for inference instead.

The repository includes the following files:

  • model.npz.best-chrf.npz — trained Marian model checkpoint
  • tiny.decoder.yml — decoder configuration
  • vocab.spm — SentencePiece vocabulary
  • run_model.sh — example script showing how to run the model

Supported target languages (via tags)

Select the target language by prefixing each source sentence with one of the following tags:

  • >>eng<< → English
  • >>spa<< → Spanish
  • >>cat<< → Catalan
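When preparing input in a script, a small shell function can add the tag to every line. This is a minimal sketch that is not part of the repository: the function name add_tag is hypothetical, and it only accepts the three tags listed above.

```shell
# Hypothetical helper (not shipped with the model): prefix every line read
# from stdin with the Marian target-language tag, e.g. ">>spa<< ".
add_tag() {
  case "$1" in
    eng|spa|cat)
      sed "s/^/>>$1<< /"
      ;;
    *)
      echo "unsupported target language: $1 (use eng, spa or cat)" >&2
      return 1
      ;;
  esac
}
```

For example, `printf 'Saluton, mondo.\n' | add_tag spa` prints `>>spa<< Saluton, mondo.`, which is the form the decoder expects.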

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

  • ca-eo: 672,931
  • es-eo: 4,677,945
  • eo-en: 5,000,000

Inference

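Run decoding from inside the model directory. This example translates input.epo into Catalan; replace >>cat<< with >>eng<< or >>spa<< for the other targets: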

# Prefix each Esperanto line with >>cat<< (Catalan), then decode on GPU 0.
cat input.epo | sed "s/^/>>cat<< /" | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.cat \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
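To produce all three translations in one go, the same command can be wrapped in a loop over the tags. This is a sketch under stated assumptions, not the contents of run_model.sh: the function name translate_all, the output.<tag> and decode.<tag>.log file names, and the MARIAN_DECODER override variable are all hypothetical; the marian-decoder options are the ones used in the command above.

```shell
# Hypothetical override: lets you point at a different marian-decoder binary.
MARIAN_DECODER="${MARIAN_DECODER:-marian-decoder}"

# Hypothetical helper: translate one Esperanto file into every supported target.
# Run from inside the model directory; writes output.eng, output.spa, output.cat.
translate_all() {
  input="$1"
  for tgt in eng spa cat; do
    # The tag on each source line selects the target language.
    sed "s/^/>>${tgt}<< /" "$input" | \
      "$MARIAN_DECODER" \
        -c tiny.decoder.yml \
        -m model.npz.best-chrf.npz \
        --vocabs vocab.spm vocab.spm \
        --normalize \
        --output "output.${tgt}" \
        --log "decode.${tgt}.log" \
        --devices 0
  done
}
```

Usage: `translate_all input.epo`. Each target runs as a separate decoder invocation, which keeps the per-language logs apart at the cost of loading the model three times.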