---
language:
- eo
- en
- es
- ca
tags:
- translation
- machine-translation
- marian
- opus-mt
- multilingual
license: cc-by-4.0
pipeline_tag: translation
metrics:
- bleu
- chrf
---

# Esperanto → Catalan, English, Spanish MT Model

## Model description

This repository contains a **multilingual MarianMT** model for **Esperanto → (English, Spanish, Catalan)** translation. The target language is selected via a language tag, and the model uses Marian's tiny architecture.

This model is **not intended for direct inference through the Hugging Face `transformers` library**.

Use [**Marian**](https://marian-nmt.github.io/docs/) for inference instead.

The repository includes the following files:

- `model.npz.best-chrf.npz` — trained Marian model checkpoint
- `tiny.decoder.yml` — decoder configuration
- `vocab.spm` — SentencePiece vocabulary
- `run_model.sh` — example script showing how to run the model


### Supported target languages (via tags)

You control the target language by prefixing the source sentence with one of the following tags:

* `>>eng<<` → English
* `>>spa<<` → Spanish
* `>>cat<<` → Catalan
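
For example, a tag can be prepended to each source line with a simple `sed` substitution (the Esperanto sentence below is just a sample):

```bash
# Prepend the Spanish target tag to a sample Esperanto sentence.
echo "Saluton, mondo!" | sed 's/^/>>spa<< /'
# → >>spa<< Saluton, mondo!
```

The same substitution applied to a whole file (`sed 's/^/>>spa<< /' input.epo`) prepares it for batch decoding.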

## Training data

The model was trained using **Tatoeba** parallel data, with **FLORES-200** used as the development set.

Training sentence-pair counts:

* **ca-eo**: 672,931
* **es-eo**: 4,677,945
* **eo-en**: 5,000,000

## Inference

Run decoding from inside the model directory:

```bash
cat input.epo | sed "s/^/>>cat<< /" | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.cat \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
```