language:
  - eo
  - en
  - es
  - ca
tags:
  - translation
  - machine-translation
  - marian
  - opus-mt
  - multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
  - bleu
  - chrf

Catalan, English, Spanish -> Esperanto MT Model

Model description

This repository contains a multilingual MarianMT model with a tiny architecture for (English, Spanish, Catalan) → Esperanto translation.

This model is not intended for direct inference through the Hugging Face transformers library.

Use Marian for inference instead.

The repository includes the following files:

model.npz.best-chrf.npz: trained Marian model checkpoint
tiny.decoder.yml: decoder configuration
vocab.spm: SentencePiece vocabulary
run_model.sh: example script showing how to run the model
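
Before decoding, it can help to confirm that all four files listed above are present in the model directory. The following is a minimal sketch; the function name `check_model_files` is illustrative and not part of the repository.

```shell
# Hypothetical helper: verify that a directory contains the files listed above.
# Returns 0 if every file is present, 1 otherwise.
check_model_files() {
  local dir="${1:-.}"
  local missing=0
  local f
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on standard error.
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_model_files .` from inside the model directory prints nothing and returns 0 when the download is complete.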

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

ca-eo: 672,931
es-eo: 4,677,945
eo-en: 5,000,000
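
As a quick sanity check on the counts above, the following sketch sums them and prints each pair's share of the total. The variable names and the `share` helper are illustrative only.

```shell
# Sentence-pair counts copied from the list above.
ca_eo=672931
es_eo=4677945
eo_en=5000000

total=$((ca_eo + es_eo + eo_en))
echo "total sentence pairs: $total"   # prints: total sentence pairs: 10350876

# Hypothetical helper: print one pair's share of the total, to one decimal place.
share() {
  awk -v name="$1" -v n="$2" -v t="$total" \
    'BEGIN { printf "%s: %.1f%%\n", name, 100 * n / t }'
}

share ca-eo "$ca_eo"   # prints: ca-eo: 6.5%
share es-eo "$es_eo"   # prints: es-eo: 45.2%
share eo-en "$eo_en"   # prints: eo-en: 48.3%
```

The Catalan pair is by far the smallest of the three, at roughly 6.5% of the training data.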

Inference

Run decoding from inside the model directory:

cat input.spa | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.epo \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
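
When several files need translating, the command above can be wrapped in a small function. This is a sketch under assumptions, not the repository's run_model.sh: `translate_file` is a hypothetical helper, and `MARIAN_DECODER` is an assumed environment variable for pointing at a decoder binary that is not on `PATH`. It passes the same options as the command above and feeds the input on standard input.

```shell
# Hypothetical helper: translate one file with the options shown above.
# MARIAN_DECODER is an assumed override; it defaults to marian-decoder on PATH.
translate_file() {
  local input="$1"
  local output="$2"
  local decoder="${MARIAN_DECODER:-marian-decoder}"

  if [ ! -r "$input" ]; then
    echo "cannot read input file: $input" >&2
    return 1
  fi
  if ! command -v "$decoder" >/dev/null 2>&1; then
    echo "decoder not found: $decoder" >&2
    return 1
  fi

  # Run from inside the model directory so the relative paths resolve.
  "$decoder" \
    -c tiny.decoder.yml \
    --output "$output" \
    --normalize \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --log decode.log \
    --devices 0 \
    < "$input"
}
```

For example, `translate_file input.spa output.epo` reproduces the command above. On a machine without a GPU, Marian's `--cpu-threads` option can take the place of `--devices 0`, provided the Marian build includes CPU support.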