Helsinki-NLP/opus-mt-eo-caenes

machine-translation


odegiber committed on Dec 12, 2025

Commit 7ac3dd5 · verified · 1 Parent(s): 8ab0ff7

Added README

Files changed (1)

README.md +80 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+---
+language:
+- eo
+- en
+- es
+- ca
+tags:
+- translation
+- machine-translation
+- marian
+- opus-mt
+- multilingual
+license: cc-by-4.0
+pipeline_tag: translation
+metrics:
+- bleu
+- chrf
+---
+# Esperanto -> Catalan, English, Spanish MT Model
+## Model description
+This repository contains a **multilingual MarianMT** model that translates from **Esperanto into English, Spanish, and Catalan**. The target language is selected with a language tag prefixed to each source sentence.
+## Usage
+The model is loaded and used with `transformers` as:
+```python
+from transformers import MarianMTModel, MarianTokenizer
+import torch
+
+# Path to the model directory; adjust to wherever the model is stored.
+model_name = "models/hf/eo_esenca_shuf"
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = MarianMTModel.from_pretrained(model_name).to(device)
+tokenizer = MarianTokenizer.from_pretrained(model_name)
+
+# Each source sentence starts with a tag that selects the target language.
+source_texts = [
+    ">>spa<< Saluton, kiel vi fartas?",
+    ">>eng<< Saluton, kiel vi fartas?",
+    ">>cat<< Saluton, kiel vi fartas?",
+]
+inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
+inputs = {k: v.to(device) for k, v in inputs.items()}
+
+# Pass the attention mask along with the input IDs so padded positions are ignored.
+translated_ids = model.generate(**inputs)
+translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)
+for src, tgt in zip(source_texts, translated_texts):
+    print(f"Source: {src} => Translated: {tgt}")
+```
+### Supported target languages (via tags)
+You control the target language by prefixing the source sentence with one of the following tags:
+* `>>eng<<` → English
+* `>>spa<<` → Spanish
+* `>>cat<<` → Catalan
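
The tag convention listed above can be wrapped in a small helper so that unsupported targets are caught before they reach the tokenizer. This is only a sketch: `SUPPORTED_TAGS` and `add_target_tag` are hypothetical names introduced for illustration and are not part of the model card or of `transformers`.

```python
# Hypothetical helper (not part of the model card): builds tagged inputs.
# The three tags are the ones the README lists as supported.
SUPPORTED_TAGS = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def add_target_tag(text: str, target: str) -> str:
    """Return `text` prefixed with the >>xxx<< tag for the given target code."""
    if target not in SUPPORTED_TAGS:
        raise ValueError(
            f"Unsupported target {target!r}; expected one of {sorted(SUPPORTED_TAGS)}"
        )
    return f">>{target}<< {text.strip()}"


sentence = "Saluton, kiel vi fartas?"
# One tagged copy of the sentence per supported target language.
tagged = [add_target_tag(sentence, lang) for lang in SUPPORTED_TAGS]
for line in tagged:
    print(line)
```

The resulting list can be passed to the tokenizer in place of hand-written tagged strings.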
+## Training data
+The model was trained using **Tatoeba** parallel data, with **FLORES-200** used as the development set.
+Training sentence-pair counts:
+* **eo-ca**: 672,931
+* **eo-es**: 4,677,945
+* **eo-en**: 5,000,000
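
To see how the training data is distributed across the three target languages, the counts listed above can be turned into shares of the total. The snippet below is a plain arithmetic sketch; the variable names are illustrative and the counts are copied from the list.

```python
# Sentence-pair counts as listed in the README, keyed by the language
# paired with Esperanto.
pair_counts = {"Catalan": 672_931, "Spanish": 4_677_945, "English": 5_000_000}

total = sum(pair_counts.values())
# Fraction of all training pairs contributed by each language.
shares = {lang: count / total for lang, count in pair_counts.items()}

for lang, share in shares.items():
    print(f"Esperanto-{lang}: {pair_counts[lang]:>9,} pairs ({share:.1%})")
print(f"Total: {total:,} pairs")
```

With these counts, Catalan accounts for roughly 6.5% of the 10,350,876 pairs, Spanish for about 45.2%, and English for about 48.3%.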
+## Evaluation on FLORES
+| Language Pair |  BLEU |  ChrF++ |
+| ------------- | ----: | ----: |
+| epo-spa       | 19.98 | 49.11 |
+| epo-cat       | 28.35 | 55.42 |
+| epo-eng       | 37.47 | 63.09 |