Update README.md
README.md CHANGED
@@ -47,7 +47,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel

In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.

-For more details regarding the model architecture, the dataset and model interpretability take a look at the paper
+For more details regarding the model architecture, the dataset and model interpretability, take a look at the [paper](https://arxiv.org/abs/2406.09140).

## Intended Uses and Limitations

@@ -96,11 +96,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
| Warmup Steps | 2000 |

-More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).

## Evaluation

-Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper]().
+Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper](https://arxiv.org/abs/2406.09140).

| Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
|----------------------|-------------|--------------|------------|-------------|
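
The second hunk's header quotes the README's training schedule: the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over 2000 warmup steps. As a reading aid, here is a minimal sketch of such a warmup, assuming a simple linear ramp; the actual scheduler (and the decay that follows warmup) is defined in the training code in the [GitHub repository](https://github.com/projecte-aina/Plume), and the function name below is purely illustrative.

```python
# Minimal sketch of the warmup described above: linear ramp from 1e-7 to 3e-4
# over 2000 steps. Illustration only; the real schedule (including the decay
# after warmup) lives in the Plume training code.
def warmup_lr(step: int,
              start_lr: float = 1e-7,
              peak_lr: float = 3e-4,
              warmup_steps: int = 2000) -> float:
    """Return the learning rate at `step`, holding the peak after warmup."""
    if step >= warmup_steps:
        return peak_lr  # post-warmup decay is not modelled here
    return start_lr + (peak_lr - start_lr) * step / warmup_steps

print(warmup_lr(0))     # 1e-07
print(warmup_lr(1000))  # ~0.00015 (halfway through warmup)
print(warmup_lr(2000))  # 0.0003
```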
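
For readers unfamiliar with the metrics named in the evaluation table, the sketch below shows how corpus-level BLEU and COMET scores are commonly computed with the sacrebleu and comet (Unbabel) Python libraries. This is not necessarily the paper's exact evaluation pipeline, the COMET checkpoint shown is only one common choice, and the example sentences are placeholders.

```python
# Rough sketch of the two metric families in the table above: BLEU via
# sacrebleu, COMET via Unbabel's comet library. Placeholder data; the paper's
# exact evaluation setup may differ.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["The cat sits on the mat."]           # source sentences
hypotheses = ["El gat seu a l'estora."]             # system translations
references = ["El gat està assegut a l'estora."]    # reference translations

# Corpus-level BLEU from n-gram overlap with the references.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# COMET is a learned metric that also conditions on the source sentence.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet_model.predict(data, batch_size=8, gpus=0).system_score:.4f}")
```

BLEU only measures surface n-gram overlap with the reference, while COMET scores adequacy using the source as well, which is why the table reports both for each test set.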