Add files using upload-large-folder tool
This view is limited to 50 files because it contains too many changes.
- packages/en_ar/README.md +6 -0
- packages/en_ar/metadata.json +8 -0
- packages/en_ar/model/shared_vocabulary.txt +0 -0
- packages/en_ar/stanza/resources.json +0 -0
- packages/en_es/metadata.json +8 -0
- packages/en_es/model/shared_vocabulary.txt +0 -0
- packages/en_es/stanza/resources.json +0 -0
- packages/en_uk/README.md +25 -0
- packages/en_uk/metadata.json +8 -0
- packages/en_uk/model/shared_vocabulary.txt +0 -0
- packages/en_uk/stanza/resources.json +0 -0
- packages/ga_en/README.md +13 -0
- packages/ga_en/metadata.json +8 -0
- packages/ga_en/model/shared_vocabulary.txt +0 -0
- packages/ga_en/stanza/resources.json +0 -0
- packages/translate-ca_en-1_7/model/model.bin +3 -0
- packages/translate-en_da-1_9/README.md +9 -0
- packages/translate-en_da-1_9/metadata.json +8 -0
- packages/translate-en_da-1_9/model/config.json +10 -0
- packages/translate-en_da-1_9/model/shared_vocabulary.json +0 -0
- packages/translate-en_da-1_9/stanza/resources.json +0 -0
- packages/translate-en_fa-1_5/README.md +26 -0
- packages/translate-en_fa-1_5/metadata.json +8 -0
- packages/translate-en_fa-1_5/model/shared_vocabulary.txt +0 -0
- packages/translate-en_fa-1_5/stanza/resources.json +0 -0
- packages/translate-en_he-1_5/README.md +24 -0
- packages/translate-en_he-1_5/metadata.json +8 -0
- packages/translate-en_he-1_5/model/shared_vocabulary.txt +0 -0
- packages/translate-en_he-1_5/stanza/resources.json +0 -0
- packages/translate-en_nb-1_9/README.md +10 -0
- packages/translate-en_nb-1_9/metadata.json +1 -0
- packages/translate-en_nb-1_9/model/config.json +9 -0
- packages/translate-en_nb-1_9/model/shared_vocabulary.json +0 -0
- packages/translate-en_nb-1_9/stanza/resources.json +0 -0
- packages/translate-en_pl-1_9/README.md +9 -0
- packages/translate-en_pl-1_9/metadata.json +1 -0
- packages/translate-en_pl-1_9/model/config.json +9 -0
- packages/translate-en_pl-1_9/model/shared_vocabulary.json +0 -0
- packages/translate-en_pl-1_9/stanza/resources.json +0 -0
- packages/translate-en_ro-1_9/README.md +10 -0
- packages/translate-en_ro-1_9/metadata.json +1 -0
- packages/translate-en_ro-1_9/model/config.json +9 -0
- packages/translate-en_ro-1_9/stanza/resources.json +0 -0
- packages/translate-en_vi-1_0/README.md +9 -0
- packages/translate-en_vi-1_0/metadata.json +1 -0
- packages/translate-en_vi-1_0/model/config.json +9 -0
- packages/translate-en_vi-1_0/model/shared_vocabulary.json +0 -0
- packages/translate-en_vi-1_0/stanza/resources.json +0 -0
- packages/translate-eo_en-1_5/README.md +20 -0
- packages/translate-eo_en-1_5/metadata.json +8 -0
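The directory names in the list above appear to encode the language pair and, for the `translate-` prefixed packages, a version. A minimal Python sketch of that reading follows. The pattern is inferred only from the names shown in this diff, and `parse_package_dir` is a hypothetical helper, not part of any published tool.

```python
import re

# Assumed naming scheme, inferred from this diff only:
#   "en_ar"               -> source "en", target "ar", no version
#   "translate-en_da-1_9" -> source "en", target "da", version "1.9"
PACKAGE_DIR = re.compile(
    r"^(?:translate-)?(?P<src>[a-z]+)_(?P<tgt>[a-z]+)"
    r"(?:-(?P<major>\d+)_(?P<minor>\d+))?$"
)


def parse_package_dir(name):
    """Split a package directory name into (source, target, version or None)."""
    match = PACKAGE_DIR.match(name)
    if match is None:
        raise ValueError(f"unrecognized package directory name: {name!r}")
    version = None
    if match["major"] is not None:
        version = f"{match['major']}.{match['minor']}"
    return match["src"], match["tgt"], version


for directory in ("en_ar", "ga_en", "translate-en_da-1_9", "translate-eo_en-1_5"):
    print(directory, parse_package_dir(directory))
```

The version parsed from the directory name can then be compared with `package_version` in the matching `metadata.json`.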
packages/en_ar/README.md
ADDED
|
@@ -0,0 +1,6 @@
| 1 |
+
# English-Arabic
|
| 2 |
+
|
| 3 |
+
Trained on the [OpenSubtitles](http://opus.nlpl.eu/OpenSubtitles.php) and [UNPC](http://opus.nlpl.eu/UNPC.php) parallel corpora compiled by [Opus](http://opus.nlpl.eu/index.php).
|
| 4 |
+
|
| 5 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/blob/master/LICENSE).
|
| 6 |
+
|
packages/en_ar/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.0",
|
| 3 |
+
"argos_version": "1.0",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "ar",
|
| 7 |
+
"to_name": "Arabic"
|
| 8 |
+
}
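Every `metadata.json` in this commit carries the same six keys. As a minimal sketch, the Python below loads such a file and checks that those keys are present. The key names come from the diff; `load_metadata` and `describe` are hypothetical helpers introduced only for illustration.

```python
import json

# Keys observed in every metadata.json shown in this diff.
REQUIRED_KEYS = (
    "package_version",
    "argos_version",
    "from_code",
    "from_name",
    "to_code",
    "to_name",
)


def load_metadata(raw):
    """Parse metadata.json text and verify the expected keys exist."""
    meta = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError(f"metadata.json is missing keys: {missing}")
    return meta


def describe(meta):
    """Return a one-line, human-readable summary of a package."""
    return f"{meta['from_name']} -> {meta['to_name']} (package {meta['package_version']})"


# Contents of packages/en_ar/metadata.json as added in this commit.
EN_AR = """
{
  "package_version": "1.0",
  "argos_version": "1.0",
  "from_code": "en",
  "from_name": "English",
  "to_code": "ar",
  "to_name": "Arabic"
}
"""

print(describe(load_metadata(EN_AR)))
```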
|
packages/en_ar/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/en_ar/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/en_es/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.0",
|
| 3 |
+
"argos_version": "1.0",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "es",
|
| 7 |
+
"to_name": "Spanish"
|
| 8 |
+
}
|
packages/en_es/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/en_es/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/en_uk/README.md
ADDED
|
@@ -0,0 +1,25 @@
| 1 |
+
# English-Ukrainian
|
| 2 |
+
|
| 3 |
+
Data compiled by [Opus](https://opus.nlpl.eu/).
|
| 4 |
+
|
| 5 |
+
Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).
|
| 6 |
+
|
| 7 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).
|
| 8 |
+
|
| 9 |
+
Credits:
|
| 10 |
+
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
|
| 11 |
+
|
| 12 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 13 |
+
|
| 14 |
+
@inproceedings{elkishky_ccaligned_2020,
|
| 15 |
+
author = {El-Kishky, Ahmed and Chaudhary, Vishrav and Guzmán, Francisco and Koehn, Philipp},
|
| 16 |
+
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
|
| 17 |
+
month = {November},
|
| 18 |
+
title = {{CCAligned}: A Massive Collection of Cross-lingual Web-Document Pairs},
|
| 19 |
+
year = {2020},
|
| 20 |
+
address = "Online",
|
| 21 |
+
publisher = "Association for Computational Linguistics",
|
| 22 |
+
url = "https://www.aclweb.org/anthology/2020.emnlp-main.480",
|
| 23 |
+
doi = "10.18653/v1/2020.emnlp-main.480",
|
| 24 |
+
pages = "5960--5969"
|
| 25 |
+
}
|
packages/en_uk/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.4",
|
| 3 |
+
"argos_version": "1.4",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "uk",
|
| 7 |
+
"to_name": "Ukranian"
|
| 8 |
+
}
|
packages/en_uk/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/en_uk/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/ga_en/README.md
ADDED
|
@@ -0,0 +1,13 @@
| 1 |
+
# Irish-English
|
| 2 |
+
|
| 3 |
+
Data compiled by [Opus](https://opus.nlpl.eu/).
|
| 4 |
+
|
| 5 |
+
Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).
|
| 6 |
+
|
| 7 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).
|
| 8 |
+
|
| 9 |
+
Credits:
|
| 10 |
+
Please acknowledge the ParaCrawl project at http://paracrawl.eu.
|
| 11 |
+
|
| 12 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 13 |
+
|
packages/ga_en/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.1",
|
| 3 |
+
"argos_version": "1.1",
|
| 4 |
+
"from_code": "ga",
|
| 5 |
+
"from_name": "Irish",
|
| 6 |
+
"to_code": "en",
|
| 7 |
+
"to_name": "English"
|
| 8 |
+
}
|
packages/ga_en/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/ga_en/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-ca_en-1_7/model/model.bin
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e94e77459539b21bca095374f2846ab3313b1c969b71009f449485bb8151752f
|
| 3 |
+
size 80969273
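The three lines above are a Git LFS pointer rather than the model weights themselves: the pointer records the spec version, the content hash, and the size of the real `model.bin`. A minimal Python sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper, and it handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for packages/translate-ca_en-1_7/model/model.bin.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e94e77459539b21bca095374f2846ab3313b1c969b71009f449485bb8151752f
size 80969273
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"])
```

After downloading the actual file, the recorded digest and size can be compared with a locally computed SHA-256 hash and byte count.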
|
packages/translate-en_da-1_9/README.md
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
# English - Danish version 1.9
|
| 2 |
+
|
| 3 |
+
Authors: Jörg Tiedemann and Santhosh Thottingal
|
| 4 |
+
Title: "OPUS-MT — Building open translation services for the World"
|
| 5 |
+
Book Title: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)
|
| 6 |
+
Year: 2020
|
| 7 |
+
Location: Lisbon, Portugal
|
| 8 |
+
|
| 9 |
+
The original OPUS model from which this packaged model is derived is licensed CC-BY 4.0
|
packages/translate-en_da-1_9/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.9",
|
| 3 |
+
"argos_version": "1.9.0",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "da",
|
| 7 |
+
"to_name": "Danish"
|
| 8 |
+
}
|
packages/translate-en_da-1_9/model/config.json
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
{
|
| 2 |
+
"add_source_bos": false,
|
| 3 |
+
"add_source_eos": true,
|
| 4 |
+
"bos_token": "<s>",
|
| 5 |
+
"decoder_start_token": "<s>",
|
| 6 |
+
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
+
"multi_query_attention": false,
|
| 9 |
+
"unk_token": "<unk>"
|
| 10 |
+
}
|
packages/translate-en_da-1_9/model/shared_vocabulary.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_da-1_9/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_fa-1_5/README.md
ADDED
|
@@ -0,0 +1,26 @@
| 1 |
+
# English-Persian
|
| 2 |
+
|
| 3 |
+
Data compiled by [Opus](https://opus.nlpl.eu/).
|
| 4 |
+
|
| 5 |
+
Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).
|
| 6 |
+
|
| 7 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).
|
| 8 |
+
|
| 9 |
+
Credits:
|
| 10 |
+
|
| 11 |
+
Krzysztof Wołk and Krzysztof Marasek: Building Subject-aligned Comparable Corpora and Mining it for Truly Parallel Sentence Pairs, Procedia Technology, 18, Elsevier, pp. 126-132, 2014
|
| 12 |
+
|
| 13 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 14 |
+
|
| 15 |
+
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
|
| 16 |
+
|
| 17 |
+
Kashefi, O. (2018). MIZAN: A Large Persian-English Parallel Corpus. Computing Research Repository, arXiv:1801.02107.
|
| 18 |
+
|
| 19 |
+
El-Kishky, Ahmed and Renduchintala, Adi and Cross, James and Guzmán, Francisco and Koehn, Philipp - XLEnt: Mining Cross-lingual Entities with Lexical-Semantic-Phonetic Word Alignment
|
| 20 |
+
|
| 21 |
+
El-Kishky, Ahmed and Chaudhary, Vishrav and Guzmán, Francisco and Koehn, Philipp - CCAligned: A Massive Collection of Cross-lingual Web-Document Pairs, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
|
| 22 |
+
|
| 23 |
+
P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)
|
| 24 |
+
|
| 25 |
+
Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin and Angela Fan, CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
|
| 26 |
+
|
packages/translate-en_fa-1_5/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.5",
|
| 3 |
+
"argos_version": "1.5",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "fa",
|
| 7 |
+
"to_name": "Persian"
|
| 8 |
+
}
|
packages/translate-en_fa-1_5/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_fa-1_5/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_he-1_5/README.md
ADDED
|
@@ -0,0 +1,24 @@
| 1 |
+
# English-Hebrew
|
| 2 |
+
|
| 3 |
+
Data compiled by [Opus](https://opus.nlpl.eu/).
|
| 4 |
+
|
| 5 |
+
Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).
|
| 6 |
+
|
| 7 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).
|
| 8 |
+
|
| 9 |
+
Credits:
|
| 10 |
+
|
| 11 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 12 |
+
|
| 13 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 14 |
+
|
| 15 |
+
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
|
| 16 |
+
|
| 17 |
+
El-Kishky, Ahmed and Renduchintala, Adi and Cross, James and Guzmán, Francisco and Koehn, Philipp - XLEnt: Mining Cross-lingual Entities with Lexical-Semantic-Phonetic Word Alignment
|
| 18 |
+
|
| 19 |
+
El-Kishky, Ahmed and Chaudhary, Vishrav and Guzmán, Francisco and Koehn, Philipp - CCAligned: A Massive Collection of Cross-lingual Web-Document Pairs, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
|
| 20 |
+
|
| 21 |
+
Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin and Angela Fan, CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
|
| 22 |
+
|
| 23 |
+
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, and Armand Joulin. Beyond English-Centric Multilingual Machine Translation
|
| 24 |
+
|
packages/translate-en_he-1_5/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.5",
|
| 3 |
+
"argos_version": "1.5",
|
| 4 |
+
"from_code": "en",
|
| 5 |
+
"from_name": "English",
|
| 6 |
+
"to_code": "he",
|
| 7 |
+
"to_name": "Hebrew"
|
| 8 |
+
}
|
packages/translate-en_he-1_5/model/shared_vocabulary.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_he-1_5/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_nb-1_9/README.md
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
# English - Norwegian version 1.9
|
| 2 |
+
|
| 3 |
+
Authors: Jörg Tiedemann and Santhosh Thottingal
|
| 4 |
+
Title: "OPUS-MT — Building open translation services for the World"
|
| 5 |
+
Book Title: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)
|
| 6 |
+
Year: 2020
|
| 7 |
+
Location: Lisbon, Portugal
|
| 8 |
+
|
| 9 |
+
The original OPUS model from which this packaged model is derived is licensed CC-BY 4.0
|
| 10 |
+
|
packages/translate-en_nb-1_9/metadata.json
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
{"package_version": "1.9", "argos_version": "1.9.0", "from_code": "en", "from_name": "English", "to_code": "nb", "to_name": "Norwegian"}
|
packages/translate-en_nb-1_9/model/config.json
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
{
|
| 2 |
+
"add_source_bos": true,
|
| 3 |
+
"add_source_eos": true,
|
| 4 |
+
"bos_token": ">>nob<<",
|
| 5 |
+
"decoder_start_token": "<s>",
|
| 6 |
+
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
+
"unk_token": "<unk>"
|
| 9 |
+
}
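This `config.json` differs from the English-Danish one in two fields: `add_source_bos` is `true` and `bos_token` is the target-language tag `>>nob<<`. The Python sketch below shows one plausible reading of those flags, namely that the source token sequence is wrapped with the configured markers before translation. That interpretation is an assumption based on the field names; the actual handling happens inside whatever runtime consumes the file. `frame_source` and the sample tokens are hypothetical.

```python
# Relevant fields from the two config.json files shown in this diff.
EN_NB_CONFIG = {
    "add_source_bos": True,
    "add_source_eos": True,
    "bos_token": ">>nob<<",
    "eos_token": "</s>",
}
EN_DA_CONFIG = {
    "add_source_bos": False,
    "add_source_eos": True,
    "bos_token": "<s>",
    "eos_token": "</s>",
}


def frame_source(tokens, config):
    """Wrap source tokens with BOS/EOS markers as the flags suggest (assumed semantics)."""
    framed = list(tokens)
    if config.get("add_source_bos"):
        framed.insert(0, config["bos_token"])
    if config.get("add_source_eos"):
        framed.append(config["eos_token"])
    return framed


# Hypothetical subword tokens for illustration only.
tokens = ["\u2581Hello", "\u2581world"]
print(frame_source(tokens, EN_NB_CONFIG))
print(frame_source(tokens, EN_DA_CONFIG))
```

Under this reading, the Norwegian package prepends the language tag to every source sentence, while the Danish package appends only the end-of-sentence marker.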
|
packages/translate-en_nb-1_9/model/shared_vocabulary.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_nb-1_9/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_pl-1_9/README.md
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
# English - Polish version 1.9
|
| 2 |
+
|
| 3 |
+
Authors: Jörg Tiedemann and Santhosh Thottingal
|
| 4 |
+
Title: "OPUS-MT — Building open translation services for the World"
|
| 5 |
+
Book Title: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)
|
| 6 |
+
Year: 2020
|
| 7 |
+
Location: Lisbon, Portugal
|
| 8 |
+
|
| 9 |
+
The original OPUS model from which this packaged model is derived is licensed CC-BY 4.0
|
packages/translate-en_pl-1_9/metadata.json
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
{"package_version": "1.9", "argos_version": "1.9.0", "from_code": "en", "from_name": "English", "to_code": "pl", "to_name": "Polish"}
|
packages/translate-en_pl-1_9/model/config.json
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
{
|
| 2 |
+
"add_source_bos": false,
|
| 3 |
+
"add_source_eos": true,
|
| 4 |
+
"bos_token": "<s>",
|
| 5 |
+
"decoder_start_token": "<s>",
|
| 6 |
+
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
+
"unk_token": "<unk>"
|
| 9 |
+
}
|
packages/translate-en_pl-1_9/model/shared_vocabulary.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_pl-1_9/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_ro-1_9/README.md
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
# English - Romanian version 1.9
|
| 2 |
+
|
| 3 |
+
Authors: Jörg Tiedemann and Santhosh Thottingal
|
| 4 |
+
Title: "OPUS-MT — Building open translation services for the World"
|
| 5 |
+
Book Title: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)
|
| 6 |
+
Year: 2020
|
| 7 |
+
Location: Lisbon, Portugal
|
| 8 |
+
|
| 9 |
+
The original OPUS model from which this packaged model is derived is licensed CC-BY 4.0
|
| 10 |
+
|
packages/translate-en_ro-1_9/metadata.json
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
{"package_version": "1.9", "argos_version": "1.9.0", "from_code": "en", "from_name": "English", "to_code": "ro", "to_name": "Romanian"}
|
packages/translate-en_ro-1_9/model/config.json
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
{
|
| 2 |
+
"add_source_bos": false,
|
| 3 |
+
"add_source_eos": true,
|
| 4 |
+
"bos_token": "<s>",
|
| 5 |
+
"decoder_start_token": "<s>",
|
| 6 |
+
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
+
"unk_token": "<unk>"
|
| 9 |
+
}
|
packages/translate-en_ro-1_9/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_vi-1_0/README.md
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# English - Vietnamese version 1.0
|
| 2 |
+
|
| 3 |
+
Authors: Jörg Tiedemann and Santhosh Thottingal
|
| 4 |
+
Title: "OPUS-MT — Building open translation services for the World"
|
| 5 |
+
Book Title: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)
|
| 6 |
+
Year: 2020
|
| 7 |
+
Location: Lisbon, Portugal
|
| 8 |
+
|
| 9 |
+
The original OPUS model from which this packaged model is derived is licensed CC-BY 4.0
|
packages/translate-en_vi-1_0/metadata.json
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
{"package_version": "1.0", "argos_version": "1.9.0", "from_code": "en", "from_name": "English", "to_code": "vi", "to_name": "Vietnamese"}
|
packages/translate-en_vi-1_0/model/config.json
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
{
|
| 2 |
+
"add_source_bos": true,
|
| 3 |
+
"add_source_eos": true,
|
| 4 |
+
"bos_token": ">>vie<<",
|
| 5 |
+
"decoder_start_token": "<s>",
|
| 6 |
+
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
+
"unk_token": "<unk>"
|
| 9 |
+
}
|
packages/translate-en_vi-1_0/model/shared_vocabulary.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-en_vi-1_0/stanza/resources.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
packages/translate-eo_en-1_5/README.md
ADDED
|
@@ -0,0 +1,20 @@
| 1 |
+
# Esperanto-English
|
| 2 |
+
|
| 3 |
+
Data compiled by [Opus](https://opus.nlpl.eu/).
|
| 4 |
+
|
| 5 |
+
Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).
|
| 6 |
+
|
| 7 |
+
Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).
|
| 8 |
+
|
| 9 |
+
Credits:
|
| 10 |
+
|
| 11 |
+
J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
|
| 12 |
+
|
| 13 |
+
P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)
|
| 14 |
+
|
| 15 |
+
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
|
| 16 |
+
|
| 17 |
+
El-Kishky, Ahmed and Renduchintala, Adi and Cross, James and Guzmán, Francisco and Koehn, Philipp - XLEnt: Mining Cross-lingual Entities with Lexical-Semantic-Phonetic Word Alignment
|
| 18 |
+
|
| 19 |
+
Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin and Angela Fan, CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
|
| 20 |
+
|
packages/translate-eo_en-1_5/metadata.json
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
{
|
| 2 |
+
"package_version": "1.5",
|
| 3 |
+
"argos_version": "1.5",
|
| 4 |
+
"from_code": "eo",
|
| 5 |
+
"from_name": "Esperanto",
|
| 6 |
+
"to_code": "en",
|
| 7 |
+
"to_name": "English"
|
| 8 |
+
}
|