---
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
---

# Esperanto -> Catalan, English, Spanish MT Model

## Model description

This repository contains a **multilingual MarianMT** model for **Esperanto → (English, Spanish, Catalan)** translation. The target language is selected by a language tag prefixed to each source sentence.

## Usage

The model can be loaded and used with `transformers` as follows:

```python
from transformers import MarianMTModel, MarianTokenizer
import torch

model_name = "Helsinki-NLP/opus-mt-eo-caenes"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = MarianMTModel.from_pretrained(model_name).to(device)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Each source sentence starts with the tag of the desired target language.
source_texts = [
    ">>spa<< Saluton, kiel vi fartas?",
    ">>eng<< Saluton, kiel vi fartas?",
    ">>cat<< Saluton, kiel vi fartas?",
]

inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Pass the full encoding so the attention mask is used for the padded batch.
translated_ids = model.generate(**inputs)
translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)

for src, tgt in zip(source_texts, translated_texts):
    print(f"Source: {src} => Translated: {tgt}")
```

### Supported target languages (via tags)

You control the target language by prefixing the source sentence with one of the following tags:

* `>>eng<<` → English
* `>>spa<<` → Spanish
* `>>cat<<` → Catalan

## Training data

The model was trained on **Tatoeba** parallel data, with **FLORES-200** used as the development set.

Training sentence-pair counts:

* **ca-eo**: 672,931
* **es-eo**: 4,677,945
* **eo-en**: 5,000,000

## Evaluation on FLORES

| Language Pair | BLEU  | ChrF++ |
| ------------- | ----: | -----: |
| epo-spa       | 19.98 |  49.11 |
| epo-cat       | 28.35 |  55.42 |
| epo-eng       | 37.47 |  63.09 |
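
## Example: building tagged inputs

Because the target language is chosen purely by the `>>eng<<`, `>>spa<<`, or `>>cat<<` prefix on each source sentence, it can be convenient to attach the tag programmatically rather than by hand. The following is a minimal sketch of one way to do that. The names `SUPPORTED_TAGS`, `add_target_tag`, and `fan_out` are hypothetical helpers introduced here for illustration; they are not part of this repository or of `transformers`.

```python
# Hypothetical helpers (not part of the model repository or of transformers).
# They only build tagged input strings; translation itself is done by the model.

# Tags listed in this model card, mapped to the language they select.
SUPPORTED_TAGS = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def add_target_tag(text: str, target: str) -> str:
    """Prefix an Esperanto sentence with the >>xxx<< tag for `target`."""
    if target not in SUPPORTED_TAGS:
        raise ValueError(
            f"Unsupported target {target!r}; expected one of {sorted(SUPPORTED_TAGS)}"
        )
    return f">>{target}<< {text.strip()}"


def fan_out(text: str, targets=("eng", "spa", "cat")) -> list[str]:
    """Return one tagged copy of `text` per requested target language."""
    return [add_target_tag(text, t) for t in targets]


if __name__ == "__main__":
    for tagged in fan_out("Saluton, kiel vi fartas?"):
        print(tagged)
```

The resulting list can be passed to the tokenizer as a single padded batch. Rejecting unknown tags up front is a design choice of this sketch, so that a mistyped language code raises an error instead of being sent to the model as ordinary text.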