Helsinki-NLP/opus-mt-caenes-eo_tiny


	---
	language:
	- eo
	- en
	- es
	- ca
	tags:
	- translation
	- machine-translation
	- marian
	- opus-mt
	- multilingual
	license: cc-by-4.0
	pipeline_tag: translation
	metrics:
	- bleu
	- chrf
	---

	# Catalan, English, Spanish -> Esperanto MT Model

	## Model description

	This repository contains a multilingual MarianMT model with a tiny architecture that translates from English, Spanish, and Catalan into Esperanto.

	This model is not intended for direct inference through the Hugging Face `transformers` library.

	Use [Marian](https://marian-nmt.github.io/docs/) for inference instead.

	The repository includes the following files:

	- `model.npz.best-chrf.npz` — trained Marian model checkpoint
	- `tiny.decoder.yml` — decoder configuration
	- `vocab.spm` — SentencePiece vocabulary
	- `run_model.sh` — example script showing how to run the model
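Before decoding, it can help to confirm that the files listed above are present in the working directory. The following is a minimal sketch; the function name `check_model_files` is hypothetical and is not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the model): verify that the files
# listed in this README exist in the given directory.
# Usage: check_model_files [MODEL_DIR]   (defaults to the current directory)
check_model_files() {
  local dir="${1:-.}"
  local missing=0
  local f
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure.
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Exit status 0 if everything was found, 1 otherwise.
  return "$missing"
}
```

Calling `check_model_files .` from inside the model directory returns a non-zero status and names any file that is absent.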

	## Training data

	The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

	Training sentence-pair counts:

	* ca-eo: 672,931
	* es-eo: 4,677,945
	* eo-en: 5,000,000
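To see how the training data is balanced across language pairs, the counts above can be summed and converted to shares. This is only an illustrative calculation on the figures listed in this README; the function name `pair_stats` is hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic on the sentence-pair counts listed above.
pair_stats() {
  awk -v ca=672931 -v es=4677945 -v en=5000000 'BEGIN {
    total = ca + es + en
    printf "total: %d\n", total
    # Share of each language pair as a percentage of all training pairs.
    printf "ca-eo: %.1f%%\n", 100 * ca / total
    printf "es-eo: %.1f%%\n", 100 * es / total
    printf "eo-en: %.1f%%\n", 100 * en / total
  }'
}

pair_stats
```

The total comes to 10,350,876 sentence pairs, of which Catalan accounts for roughly 6.5%, so the Catalan direction has far less training data than the other two.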

	## Inference

	Run decoding from inside the model directory:

	```bash
	# Pipe the Spanish input into the decoder; the translation is written to output.epo.
	cat input.spa | marian-decoder \
	-c tiny.decoder.yml \
	--output output.epo \
	--normalize \
	-m model.npz.best-chrf.npz \
	--vocabs vocab.spm vocab.spm \
	--log decode.log \
	--devices 0
	```
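Because the model accepts three source languages, the same command can be repeated per input file. The sketch below assumes hypothetical input files `input.cat`, `input.eng`, and `input.spa` (named by analogy with the example above) and assumes that no language tag has to be added to the input, which this README does not specify. It reads input from stdin and writes to stdout, and it omits the `--normalize` and `--log` options used above for brevity.

```bash
#!/usr/bin/env bash
# Sketch: translate one file per source language into Esperanto.
# Input and output file names are assumptions for illustration only.
# Run from inside the model directory, with marian-decoder on the PATH.
translate_all() {
  local src
  for src in cat eng spa; do
    # Skip source languages for which no input file exists.
    [ -f "input.$src" ] || continue
    marian-decoder \
      -c tiny.decoder.yml \
      -m model.npz.best-chrf.npz \
      --vocabs vocab.spm vocab.spm \
      --devices 0 \
      < "input.$src" > "output.$src-epo.txt"
  done
}
```

After `translate_all` finishes, each existing `input.<lang>` file has a matching `output.<lang>-epo.txt`. On a machine without a GPU, Marian's `--cpu-threads` option can be used in place of `--devices 0`.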