	---
	language:
	- eo
	- en
	- es
	- ca
	tags:
	- translation
	- machine-translation
	- marian
	- opus-mt
	- multilingual
	license: cc-by-4.0
	pipeline_tag: translation
	metrics:
	- bleu
	- chrf
	---

	# Esperanto -> Catalan, English, Spanish MT Model

	## Model description

	This repository contains a multilingual MarianMT model with a tiny architecture for translating Esperanto into English, Spanish, or Catalan. The target language is selected with a language tag prepended to each source sentence.

	This model is not intended for direct inference through the Hugging Face `transformers` library.

	Use [Marian](https://marian-nmt.github.io/docs/) for inference instead.

	The repository includes the following files:

	- `model.npz.best-chrf.npz` — trained Marian model checkpoint
	- `tiny.decoder.yml` — decoder configuration
	- `vocab.spm` — SentencePiece vocabulary
	- `run_model.sh` — example script showing how to run the model


	### Supported target languages (via tags)

	You control the target language by prefixing the source sentence with one of the following tags:

	* `>>eng<<` → English
	* `>>spa<<` → Spanish
	* `>>cat<<` → Catalan
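As a minimal sketch of the tagging step, the snippet below prefixes every input line with a target tag using `sed`. The helper name `tag_lines` and the sample sentences are illustrative assumptions, not part of this repository.

```shell
# Hypothetical helper: prefix each line on stdin with a target-language tag.
# $1 is the tag name without the angle brackets (eng, spa, or cat).
tag_lines() {
  local tag="$1"
  sed "s/^/>>${tag}<< /"
}

# Tag two Esperanto sentences for Spanish output.
printf 'Saluton, mondo!\nKiel vi fartas?\n' | tag_lines spa
# Prints:
# >>spa<< Saluton, mondo!
# >>spa<< Kiel vi fartas?
```

The tag is applied per line, so a single input file can be tagged for any of the three targets without changing the file itself.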

	## Training data

	The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

	Training sentence-pair counts:

	* eo-ca: 672,931
	* eo-es: 4,677,945
	* eo-en: 5,000,000

	## Inference

	Run decoding from inside the model directory:

	```bash
	# Prefix each Esperanto line with the Catalan target tag, then decode.
	cat input.epo | sed "s/^/>>cat<< /" | \
	marian-decoder \
	-c tiny.decoder.yml \
	--output output.cat \
	--normalize \
	-m model.npz.best-chrf.npz \
	--vocabs vocab.spm vocab.spm \
	--log decode.log \
	--devices 0
	```
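To produce all three target languages from the same input, the command above can be wrapped in a small function. This is only a sketch: the function name `translate_to`, the `MARIAN_DECODER` override variable, and the output and log file names are assumptions made for illustration. The decoder flags are the ones used in the command above.

```shell
# Hypothetical wrapper around the decoding command shown above.
# Usage: translate_to <tag> <input-file> <output-file>
# MARIAN_DECODER is an assumed override for the decoder binary;
# it defaults to marian-decoder on the PATH.
translate_to() {
  local tag="$1" input="$2" output="$3"
  # Add the target tag to every source line, then decode.
  sed "s/^/>>${tag}<< /" "$input" |
    "${MARIAN_DECODER:-marian-decoder}" \
      -c tiny.decoder.yml \
      -m model.npz.best-chrf.npz \
      --vocabs vocab.spm vocab.spm \
      --normalize \
      --output "$output" \
      --log "decode.${tag}.log" \
      --devices 0
}

# Example (run from inside the model directory):
# for tag in eng spa cat; do
#   translate_to "$tag" input.epo "output.${tag}"
# done
```

`--devices 0` selects the first GPU. On a machine without a GPU, Marian's `--cpu-threads` option is the usual alternative, although this model card does not state whether CPU decoding was tested.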