Update README.md
README.md CHANGED
@@ -47,7 +47,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel

In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.

-For more details regarding the model architecture, the dataset and model interpretability take a look at the paper
+For more details regarding the model architecture, the dataset and model interpretability, take a look at the [paper](https://arxiv.org/abs/2406.09140).

## Intended Uses and Limitations

@@ -96,11 +96,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
| Warmup Steps | 2000 |

-More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).

## Evaluation

-Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper]().
+Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper](https://arxiv.org/abs/2406.09140).

| Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
|----------------------|-------------|--------------|------------|-------------|
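
The second hunk's header quotes the README's training schedule: the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over 2000 warmup steps. As a reading aid, here is a minimal sketch of such a warmup, assuming a simple linear ramp; the actual scheduler (and the decay that follows warmup) is defined in the training code in the [GitHub repository](https://github.com/projecte-aina/Plume), and the function name below is purely illustrative.

```python
# Minimal sketch of the warmup described above: linear ramp from 1e-7 to 3e-4
# over 2000 steps. Illustration only; the real schedule (including the decay
# after warmup) lives in the Plume training code.
def warmup_lr(step: int,
              start_lr: float = 1e-7,
              peak_lr: float = 3e-4,
              warmup_steps: int = 2000) -> float:
    """Return the learning rate at `step`, holding the peak after warmup."""
    if step >= warmup_steps:
        return peak_lr  # post-warmup decay is not modelled here
    return start_lr + (peak_lr - start_lr) * step / warmup_steps

print(warmup_lr(0))     # 1e-07
print(warmup_lr(1000))  # ~0.00015 (halfway through warmup)
print(warmup_lr(2000))  # 0.0003
```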
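
For readers unfamiliar with the metrics named in the evaluation table, the sketch below shows how corpus-level BLEU and COMET scores are commonly computed with the sacrebleu and comet (Unbabel) Python libraries. This is not necessarily the paper's exact evaluation pipeline, the COMET checkpoint shown is only one common choice, and the example sentences are placeholders.

```python
# Rough sketch of the two metric families in the table above: BLEU via
# sacrebleu, COMET via Unbabel's comet library. Placeholder data; the paper's
# exact evaluation setup may differ.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["The cat sits on the mat."]           # source sentences
hypotheses = ["El gat seu a l'estora."]             # system translations
references = ["El gat està assegut a l'estora."]    # reference translations

# Corpus-level BLEU from n-gram overlap with the references.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# COMET is a learned metric that also conditions on the source sentence.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet_model.predict(data, batch_size=8, gpus=0).system_score:.4f}")
```

BLEU only measures surface n-gram overlap with the reference, while COMET scores adequacy using the source as well, which is why the table reports both for each test set.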