---
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
---

# Esperanto → Catalan, English, Spanish MT Model

## Model description

This repository contains a multilingual MarianMT model that translates Esperanto into English, Spanish, or Catalan. The target language is selected with a language tag prepended to the source sentence.

## Usage

The model can be loaded and run with the Hugging Face `transformers` library:

```python
from transformers import MarianMTModel, MarianTokenizer
import torch

model_name = "Helsinki-NLP/opus-mt-eo-caenes"

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MarianMTModel.from_pretrained(model_name).to(device)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Each source sentence starts with a tag that selects the target language.
source_texts = [
    ">>spa<< Saluton, kiel vi fartas?",
    ">>eng<< Saluton, kiel vi fartas?",
    ">>cat<< Saluton, kiel vi fartas?",
]

# Tokenize as one padded batch and move the tensors to the model's device.
inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Pass the attention mask along with the input IDs so padding is ignored.
translated_ids = model.generate(**inputs)
translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)

for src, tgt in zip(source_texts, translated_texts):
    print(f"Source: {src} => Translated: {tgt}")
```

### Supported target languages (via tags)

You control the target language by prefixing the source sentence with one of the following tags:

* `>>eng<<` → English
* `>>spa<<` → Spanish
* `>>cat<<` → Catalan
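When the target language is chosen at runtime, it can help to build the tagged inputs programmatically. The following is a minimal sketch, assuming one space between tag and sentence as in the Usage example; the names `TARGET_TAGS`, `tag_sentences`, and `all_targets` are hypothetical and are not part of `transformers` or this repository:

```python
# Target-language tags from this model card. The dictionary keys and
# the helper functions below are hypothetical names for this sketch.
TARGET_TAGS = {
    "english": ">>eng<<",
    "spanish": ">>spa<<",
    "catalan": ">>cat<<",
}


def tag_sentences(sentences: list[str], target: str) -> list[str]:
    """Prefix every Esperanto sentence with the tag for one target language."""
    tag = TARGET_TAGS.get(target.lower())
    if tag is None:
        raise ValueError(
            f"unsupported target {target!r}; expected one of {sorted(TARGET_TAGS)}"
        )
    # One space between tag and sentence, as in the Usage example.
    return [f"{tag} {s.strip()}" for s in sentences]


def all_targets(sentence: str) -> list[str]:
    """Return one tagged copy of `sentence` per supported target language."""
    return [f"{tag} {sentence.strip()}" for tag in TARGET_TAGS.values()]


print(tag_sentences(["Saluton, kiel vi fartas?"], "Catalan"))
# ['>>cat<< Saluton, kiel vi fartas?']
```

The list returned by either helper can be passed to the tokenizer in place of a hand-written list of tagged sentences.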

## Training data

The model was trained on Tatoeba parallel data, with FLORES-200 used as the development set.

Number of training sentence pairs per language pair:

* ca-eo: 672,931
* es-eo: 4,677,945
* eo-en: 5,000,000
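The pair counts above are uneven. To see the proportions, each pair's share of the total can be computed directly from the listed numbers; this is plain arithmetic on the card's figures, and `PAIR_COUNTS` is just a name chosen for this sketch:

```python
# Training sentence-pair counts as listed in this model card.
PAIR_COUNTS = {
    "ca-eo": 672_931,
    "es-eo": 4_677_945,
    "eo-en": 5_000_000,
}

total = sum(PAIR_COUNTS.values())
shares = {pair: n / total for pair, n in PAIR_COUNTS.items()}

print(f"total: {total:,}")
for pair, share in shares.items():
    print(f"{pair}: {share:.2%}")
```

With these counts the total is 10,350,876 pairs, of which ca-eo makes up about 6.5%, es-eo about 45.2%, and eo-en about 48.3%.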

## Evaluation on FLORES

| Language pair | BLEU | chrF++ |
| ------------- | ----: | -----: |
| epo-spa | 19.98 | 49.11 |
| epo-cat | 28.35 | 55.42 |
| epo-eng | 37.47 | 63.09 |