nielsr HF Staff committed on
Commit 01aac52 · verified · 1 Parent(s): 629e822

Add pipeline tag: translation and link to paper

This PR adds `pipeline_tag: translation` to the model card metadata so that the model is categorized correctly and is discoverable through the Hugging Face pipeline search. It also adds a link to the paper.
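
As a rough illustration of what the metadata change amounts to, the sketch below reads the flat `key: value` entries from a README's YAML front matter and checks that `pipeline_tag` is present. The helper name `read_front_matter_scalars` and the abbreviated `README_AFTER` string are hypothetical and only reproduce entries visible in this diff; the parser is a deliberate simplification, not a full YAML implementation and not how the Hub itself reads model cards.

```python
def read_front_matter_scalars(readme_text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a YAML front-matter block.

    Hypothetical helper for illustration only: flat scalar entries are
    collected, while list items and keys without an inline value are skipped.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    scalars: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            break
        if line.startswith((" ", "-")) or ":" not in line:
            continue  # list item or nested content, ignored in this sketch
        key, _, value = line.partition(":")
        if value.strip():
            scalars[key.strip()] = value.strip()
    return scalars


# Abbreviated front matter (assumption): only entries visible in the diff.
README_AFTER = """---
language:
- cs
- en
library_name: transformers
license: cc-by-4.0
tags:
- translation
- mt
pipeline_tag: translation
---

# MultiSlav P5-many2eng
"""

metadata = read_front_matter_scalars(README_AFTER)
print(metadata.get("pipeline_tag"))
```

Running the same check against the pre-change front matter would return `None` for `pipeline_tag`, which is the gap this PR closes.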

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: cc-by-4.0
  language:
  - cs
  - en
@@ -7,6 +6,7 @@ language:
  - sk
  - sl
  library_name: transformers
+ license: cc-by-4.0
  tags:
  - translation
  - mt
@@ -18,11 +18,13 @@ tags:
  - pivot
  - allegro
  - laniqo
+ pipeline_tag: translation
  ---
 
-
  # MultiSlav P5-many2eng
 
+ This repository contains the model described in the paper [](https://hf.co/papers/2502.14509).
+
  <p align="center">
  <a href="https://ml.allegro.tech/"><img src="allegro-title.svg" alt="MLR @ Allegro.com"></a>
  </p>
@@ -42,7 +44,7 @@ Big thanks to [laniqo.com](laniqo.com) for cooperation in the research.
  <img src="p5-eng.svg">
  </p>
 
- ___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.
+ ___P5-many2eng___ - _5_-language _Many-to-English_ model translating from all applicable languages to English.\
  This model and [_P5-eng2many_](https://huggingface.co/allegro/P5-eng2many) combine into ___P5-eng___ pivot system translating between _5_ languages.
  _P5-eng_ translates all supported languages using Many2One model to English bridge sentence
  and next using the One2Many model from English bridge sentence to target language.
@@ -128,7 +130,7 @@ Generated Polish to Slovene pivot translation via English:
  ## Training
 
  [SentencePiece](https://github.com/google/sentencepiece) tokenizer has a vocab size 80k in total (16k per language). Tokenizer was trained on randomly sampled part of the training corpus.
- During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.
+ During the training we used the [MarianNMT](https://marian-nmt.github.io/) framework.\
  Base marian configuration used: [transfromer-big](https://github.com/marian-nmt/marian-dev/blob/master/src/common/aliases.cpp#L113).
  All training parameters are listed in table below.
 