Add pipeline tag: translation and link to paper
This PR adds the `pipeline_tag: translation` to the model card metadata, ensuring the model is correctly categorized and discoverable via the Hugging Face pipeline search. It also adds a link to the paper.
README.md CHANGED

@@ -1,5 +1,4 @@
 ---
-license: cc-by-4.0
 language:
 - cs
 - en
@@ -7,6 +6,7 @@ language:
 - sk
 - sl
 library_name: transformers
+license: cc-by-4.0
 tags:
 - translation
 - mt
@@ -18,11 +18,13 @@ tags:
 - pivot
 - allegro
 - laniqo
+pipeline_tag: translation
 ---
 
-
 # MultiSlav P5-many2eng
 
+This repository contains the model described in the paper [](https://hf.co/papers/2502.14509).
+
 <p align="center">
 <a href="https://ml.allegro.tech/"><img src="allegro-title.svg" alt="MLR @ Allegro.com"></a>
 </p>
@@ -42,7 +44,7 @@ Big thanks to [laniqo.com](laniqo.com) for cooperation in the research.
 <img src="p5-eng.svg">
 </p>
 
-___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.
+___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.\
 This model and [_P5-eng2many_](https://huggingface.co/allegro/P5-eng2many) combine into ___P5-eng___ pivot system translating between _5_ languages.
 _P5-eng_ translates all supported languages using Many2One model to English bridge sentence
 and next using the One2Many model from English bridge sentence to target language.
@@ -128,7 +130,7 @@ Generated Polish to Slovene pivot translation via English:
 ## Training
 
 [SentencePiece](https://github.com/google/sentencepiece) tokenizer has a vocab size 80k in total (16k per language). Tokenizer was trained on randomly sampled part of the training corpus.
-During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.
+During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.\
 Base marian configuration used: [transfromer-big](https://github.com/marian-nmt/marian-dev/blob/master/src/common/aliases.cpp#L113).
 All training parameters are listed in table below.
 
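For readers of this diff, the pivot flow described in the README text (a Many2One model produces an English bridge sentence, then a One2Many model translates the bridge into the target language) can be sketched roughly as follows. This is a minimal illustration only: `pivot_translate`, `to_english`, and `from_english` are hypothetical names, and the two toy functions stand in for the real models rather than calling them.

```python
from typing import Callable, Tuple

# Hypothetical callable shapes, not part of any MultiSlav or transformers API:
ManyToEnglish = Callable[[str], str]       # source sentence -> English bridge sentence
EnglishToMany = Callable[[str, str], str]  # (English bridge, target language code) -> translation


def pivot_translate(
    sentence: str,
    target_lang: str,
    to_english: ManyToEnglish,
    from_english: EnglishToMany,
) -> Tuple[str, str]:
    """Translate through an English bridge; returns (bridge, translation)."""
    bridge = to_english(sentence)
    if target_lang == "en":
        # English is the pivot itself, so the second hop is skipped.
        return bridge, bridge
    return bridge, from_english(bridge, target_lang)


# Toy stand-ins so the sketch runs without downloading any model.
def fake_to_english(sentence: str) -> str:
    return f"EN({sentence})"


def fake_from_english(bridge: str, target_lang: str) -> str:
    return f"{target_lang.upper()}({bridge})"


bridge, translation = pivot_translate("zdanie", "sl", fake_to_english, fake_from_english)
print(bridge)       # EN(zdanie)
print(translation)  # SL(EN(zdanie))
```

In a real setup the two callables would wrap the Many-to-English and English-to-Many checkpoints. Passing them in as functions keeps the two-hop logic independent of how each model is loaded.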