---
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
---

# Catalan, English, Spanish -> Esperanto MT Model

## Model description

This repository contains a **multilingual MarianMT** model with a tiny architecture for **(English, Spanish, Catalan) → Esperanto** translation.

This model is **not intended for direct inference through the Hugging Face `transformers` library**.

Use [**Marian**](https://marian-nmt.github.io/docs/) for inference instead.

The repository includes the following files:

- `model.npz.best-chrf.npz` — trained Marian model checkpoint
- `tiny.decoder.yml` — decoder configuration
- `vocab.spm` — SentencePiece vocabulary
- `run_model.sh` — example script showing how to run the model

## Training data

The model was trained using **Tatoeba** parallel data, with **FLORES-200** used as the development set.

Training sentence-pair counts:

* **ca-eo**: 672,931
* **es-eo**: 4,677,945
* **eo-en**: 5,000,000
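The counts above add up to about 10.35 million sentence pairs in total. The short script below is only a worked check of that arithmetic: it sums the listed figures and prints each pair's share of the training data. The pair counts are copied from this card; nothing else is assumed.

```bash
#!/usr/bin/env bash
# Sentence-pair counts copied from the training-data list in this card.
names=(ca-eo es-eo eo-en)
counts=(672931 4677945 5000000)

# Sum the counts with bash integer arithmetic.
total=0
for n in "${counts[@]}"; do
  total=$(( total + n ))
done
echo "total: ${total}"   # total: 10350876

# Print each pair's share of the total; awk handles the floating-point division.
for i in "${!names[@]}"; do
  awk -v p="${names[$i]}" -v n="${counts[$i]}" -v t="${total}" \
    'BEGIN { printf "%s: %.2f%%\n", p, 100 * n / t }'
done
# ca-eo: 6.50%
# es-eo: 45.19%
# eo-en: 48.31%
```

Catalan therefore makes up only a small fraction of the data, while Spanish and English contribute roughly equal amounts.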

## Inference

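Run decoding from inside the model directory. The input file is plain text with one sentence per line; Marian applies the SentencePiece model in `vocab.spm` itself, so no separate segmentation step is needed: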

```bash
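# Translate Spanish text (input.spa) into Esperanto (output.epo); --devices 0 selects the first GPU.
cat input.spa | \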
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.epo \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
```
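If you decode repeatedly, it can help to wrap the command in a script that first checks that the model files listed under *Model description* are present. The following is a minimal sketch, not the repository's `run_model.sh`. It assumes `marian-decoder` is on `PATH` and that the script runs from the model directory. The function names `check_model_files` and `translate` are hypothetical. It passes the source file with Marian's `--input` option rather than piping it through `cat`, and otherwise reuses the flags from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper functions; not part of this repository.
# Assumption: run from the model directory, with marian-decoder on PATH.

required_files=(model.npz.best-chrf.npz tiny.decoder.yml vocab.spm)

# Return non-zero (and report each missing file) if any model file is absent.
check_model_files() {
  local dir="${1:-.}" missing=0 f
  for f in "${required_files[@]}"; do
    if [[ ! -f "${dir}/${f}" ]]; then
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage: translate SOURCE_FILE OUTPUT_FILE
translate() {
  if [[ $# -ne 2 ]]; then
    echo "usage: translate SOURCE_FILE OUTPUT_FILE" >&2
    return 2
  fi
  local src="$1" out="$2"
  check_model_files . || return 1
  command -v marian-decoder >/dev/null || { echo "marian-decoder not found" >&2; return 1; }
  # Same options as the example command; the input is read from a file instead of stdin.
  marian-decoder \
    -c tiny.decoder.yml \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --normalize \
    --input "${src}" \
    --output "${out}" \
    --log decode.log \
    --devices 0
}
```

After sourcing the script, `translate input.spa output.epo` should produce the same result as the pipeline above.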