Add pipeline tag: translation and link to paper

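This PR adds `pipeline_tag: translation` to the model card metadata so that the model is categorized as a translation model and can be found through the task filter on the Hugging Face Hub. It also adds a link to the paper (https://hf.co/papers/2502.14509), moves the `license` field below `library_name`, and appends a trailing backslash to two README lines, which Markdown renders as hard line breaks.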
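
As a rough way to confirm that the new metadata field is present after merging, the front matter can be read with a few lines of Python. This is only a sketch: `read_front_matter` and `CARD` are hypothetical names introduced for illustration, the sample card is an abbreviated copy of the metadata in this diff, and the parser handles only flat top-level `key: value` pairs rather than full YAML.

```python
def read_front_matter(readme_text):
    """Return top-level scalar fields from a model card's YAML front matter.

    Simplified on purpose: list items and indented lines are skipped,
    so this is not a replacement for a real YAML parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            break
        if not line or line[0] in " -\t":  # blank, list item, or nested line
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():  # keep only keys that carry a scalar value
            fields[key.strip()] = value.strip()
    return fields


# Abbreviated sample based on the metadata shown in this PR (assumption:
# the real card contains more languages and tags than listed here).
CARD = """---
language:
- cs
- en
library_name: transformers
license: cc-by-4.0
tags:
- translation
- mt
pipeline_tag: translation
---
# MultiSlav P5-many2eng
"""

metadata = read_front_matter(CARD)
print(metadata.get("pipeline_tag"))
```

Keys such as `language` and `tags` hold lists, so this simplified reader leaves them out; a full YAML parser would be the better choice for anything beyond a quick check.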

Files changed (1)

README.md +6 -4

@@ -1,5 +1,4 @@
 ---
-license: cc-by-4.0
 language:
 - cs
 - en
@@ -7,6 +6,7 @@ language:
 - sk
 - sl
 library_name: transformers
+license: cc-by-4.0
 tags:
 - translation
 - mt
@@ -18,11 +18,13 @@ tags:
 - pivot
 - allegro
 - laniqo
+pipeline_tag: translation
 ---
 # MultiSlav P5-many2eng
+This repository contains the model described in the paper [](https://hf.co/papers/2502.14509).
 <p align="center">
   <a href="https://ml.allegro.tech/"><img src="allegro-title.svg" alt="MLR @ Allegro.com"></a>
 </p>
@@ -42,7 +44,7 @@ Big thanks to [laniqo.com](laniqo.com) for cooperation in the research.
   <img src="p5-eng.svg">
 </p>
-___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.
+___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.\
 This model and [_P5-eng2many_](https://huggingface.co/allegro/P5-eng2many) combine into ___P5-eng___ pivot system translating between _5_ languages.
 _P5-eng_ translates all supported languages using Many2One model to English bridge sentence
 and next using the One2Many model from English bridge sentence to target language.
@@ -128,7 +130,7 @@ Generated Polish to Slovene pivot translation via English:
 ## Training
 [SentencePiece](https://github.com/google/sentencepiece) tokenizer has a vocab size 80k in total (16k per language). Tokenizer was trained on randomly sampled part of the training corpus.
-During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.
+During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.\
 Base marian configuration used: [transfromer-big](https://github.com/marian-nmt/marian-dev/blob/master/src/common/aliases.cpp#L113).
 All training parameters are listed in table below.