radinplaid committed
Commit 8f2eaf8 · verified · 1 Parent(s): f12060b

Upload folder using huggingface_hub
.ipynb_checkpoints/README-checkpoint.md CHANGED

@@ -6,7 +6,7 @@ tags:
- translation
license: cc-by-4.0
datasets:
- - quickmt/quickmt-train.ru-en
+ - quickmt/quickmt-train.ru-en-v2
model-index:
- name: quickmt-ru-en
  results:
@@ -21,31 +21,38 @@ model-index:
    metrics:
    - name: BLEU
      type: bleu
-       value: 33.9
+       value: 34.69
    - name: CHRF
      type: chrf
-       value: 61.63
+       value: 62.31
    - name: COMET
      type: comet
-       value: 85.7
+       value: 85.96
---

- # `quickmt-ru-en` Neural Machine Translation Model
+ # `quickmt-ru-en` Neural Machine Translation Model - V2

`quickmt-ru-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `ru` into `en`.

+ This is an updated, higher-quality model, trained for more steps on a larger, cleaner dataset.
+
+ ## Try it on our Huggingface Space
+
+ Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
+

## Model Information

- * Trained using [`eole`](https://github.com/eole-nlp/eole)
- * 185M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
- * 50k joint Sentencepiece vocabulary
+ * Trained using [`eole`](https://github.com/eole-nlp/eole)
+ * 200M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
+ * Separate 32k Sentencepiece vocabularies for source and target
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
- * Training data: https://huggingface.co/datasets/quickmt/quickmt-train.ru-en/tree/main

See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.

## Usage with `quickmt`

You must install the Nvidia cuda toolkit first, if you want to do GPU inference.
@@ -68,12 +75,12 @@ from quickmt import Translator
t = Translator("./quickmt-ru-en/", device="auto")

# Translate - set beam size to 1 for faster speed (but lower quality)
sample_text = 'Согласно предупреждению доктора Эхуда Ура (Ehud Ur), профессора медицины в Университете Дэлхаузи в Галифаксе (Новая Шотландия) и председателя клинико-научного отдела Канадской диабетической ассоциации, исследования все еще находятся на начальной стадии.'

t(sample_text, beam_size=5)
```

- > 'According to the warning of Dr. Ehud Ur, Professor of Medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical Science Division of the Canadian Diabetes Association, the research is still in its infancy.'
+ > 'According to Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical science department of the Canadian Diabetes Association, the research is still in its infancy.'

```python
# Get alternative translations by sampling
@@ -81,20 +88,20 @@ t(sample_text, beam_size=5)
t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```

- > 'According to the warning of Professor Ehud Ur, a Professor of Medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical and Scientific Division of the Canadian Diabetes Association, research is still in a very early stage.'
+ > 'According to Dr. Ehud Ur, a professor of medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical Research Division of the Canadian Diabetes Association, research is still in the initial stages.'

- The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`.
+ The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format, to be used with `eole`, is also provided.

## Metrics

- `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32 (faster speed is possible using a larger batch size).
+ `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` is calculated with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32.

|                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
|:---------------------------------|-------:|--------:|----------:|-----------:|
- | quickmt/quickmt-ru-en | 33.9 | 61.63 | 85.7 | 1.31 |
- | Helsink-NLP/opus-mt-ru-en | 30.04 | 58.23 | 83.97 | 3.72 |
- | facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 | 21.93 |
- | facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.12 |
- | facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.73 |
- | facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 | 35.99 |
+ | quickmt/quickmt-ru-en | 34.69 | 62.31 | 85.96 | 1.27 |
+ | Helsinki-NLP/opus-mt-ru-en | 30.04 | 58.23 | 83.97 | 3.81 |
+ | facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 | 22.07 |
+ | facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.26 |
+ | facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.7 |
+ | facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 | 36.32 |
.ipynb_checkpoints/eole-config-checkpoint.yaml ADDED

@@ -0,0 +1,96 @@
+ ## IO
+ save_data: data
+ overwrite: True
+ seed: 1234
+ report_every: 100
+ valid_metrics: ["BLEU"]
+ tensorboard: true
+ tensorboard_log_dir: tensorboard
+
+ ### Vocab
+ src_vocab: ru.eole.vocab
+ tgt_vocab: en.eole.vocab
+ src_vocab_size: 32000
+ tgt_vocab_size: 32000
+ vocab_size_multiple: 8
+ share_vocab: false
+ n_sample: 0
+
+ data:
+     corpus_1:
+         path_src: hf://quickmt/quickmt-train.ru-en-v2/ru
+         path_tgt: hf://quickmt/quickmt-train.ru-en-v2/en
+         path_sco: hf://quickmt/quickmt-train.ru-en-v2/sco
+     valid:
+         path_src: valid.ru
+         path_tgt: valid.en
+
+ transforms: [sentencepiece, filtertoolong]
+ transforms_configs:
+     sentencepiece:
+         src_subword_model: "ru.spm.model"
+         tgt_subword_model: "en.spm.model"
+     filtertoolong:
+         src_seq_length: 256
+         tgt_seq_length: 256
+
+ training:
+     # Run configuration
+     model_path: quickmt-ru-en-eole-model
+     #train_from: model
+     keep_checkpoint: 4
+     train_steps: 200000
+     save_checkpoint_steps: 5000
+     valid_steps: 5000
+
+     # Train on a single GPU
+     world_size: 1
+     gpu_ranks: [0]
+
+     # Batching 10240
+     batch_type: "tokens"
+     batch_size: 12000
+     valid_batch_size: 2048
+     batch_size_multiple: 8
+     accum_count: [10]
+     accum_steps: [0]
+
+     # Optimizer & Compute
+     compute_dtype: "fp16"
+     optim: "adamw"
+     #use_amp: False
+     learning_rate: 3.0
+     warmup_steps: 5000
+     decay_method: "noam"
+     adam_beta2: 0.998
+
+     # Data loading
+     bucket_size: 128000
+     num_workers: 4
+     prefetch_factor: 32
+
+     # Hyperparams
+     dropout_steps: [0]
+     dropout: [0.1]
+     attention_dropout: [0.1]
+     max_grad_norm: 0
+     label_smoothing: 0.1
+     average_decay: 0.0001
+     param_init_method: xavier_uniform
+     normalization: "tokens"
+
+ model:
+     architecture: "transformer"
+     share_embeddings: false
+     share_decoder_embeddings: true
+     hidden_size: 1024
+     encoder:
+         layers: 8
+     decoder:
+         layers: 2
+     heads: 8
+     transformer_ff: 4096
+     embeddings:
+         word_vec_size: 1024
+         position_encoding_type: "SinusoidalInterleaved"
README.md CHANGED

@@ -6,7 +6,7 @@ tags:
- translation
license: cc-by-4.0
datasets:
- - quickmt/quickmt-train.ru-en
+ - quickmt/quickmt-train.ru-en-v2
model-index:
- name: quickmt-ru-en
  results:
@@ -21,31 +21,38 @@ model-index:
    metrics:
    - name: BLEU
      type: bleu
-       value: 33.9
+       value: 34.69
    - name: CHRF
      type: chrf
-       value: 61.63
+       value: 62.31
    - name: COMET
      type: comet
-       value: 85.7
+       value: 85.96
---

- # `quickmt-ru-en` Neural Machine Translation Model
+ # `quickmt-ru-en` Neural Machine Translation Model - V2

`quickmt-ru-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `ru` into `en`.

+ This is an updated, higher-quality model, trained for more steps on a larger, cleaner dataset.
+
+ ## Try it on our Huggingface Space
+
+ Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
+

## Model Information

- * Trained using [`eole`](https://github.com/eole-nlp/eole)
- * 185M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
- * 50k joint Sentencepiece vocabulary
+ * Trained using [`eole`](https://github.com/eole-nlp/eole)
+ * 200M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
+ * Separate 32k Sentencepiece vocabularies for source and target
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
- * Training data: https://huggingface.co/datasets/quickmt/quickmt-train.ru-en/tree/main

See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.

## Usage with `quickmt`

You must install the Nvidia cuda toolkit first, if you want to do GPU inference.
@@ -68,12 +75,12 @@ from quickmt import Translator
t = Translator("./quickmt-ru-en/", device="auto")

# Translate - set beam size to 1 for faster speed (but lower quality)
sample_text = 'Согласно предупреждению доктора Эхуда Ура (Ehud Ur), профессора медицины в Университете Дэлхаузи в Галифаксе (Новая Шотландия) и председателя клинико-научного отдела Канадской диабетической ассоциации, исследования все еще находятся на начальной стадии.'

t(sample_text, beam_size=5)
```

- > 'According to the warning of Dr. Ehud Ur, Professor of Medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical Science Division of the Canadian Diabetes Association, the research is still in its infancy.'
+ > 'According to Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical science department of the Canadian Diabetes Association, the research is still in its infancy.'

```python
# Get alternative translations by sampling
@@ -81,20 +88,20 @@ t(sample_text, beam_size=5)
t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```

- > 'According to the warning of Professor Ehud Ur, a Professor of Medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical and Scientific Division of the Canadian Diabetes Association, research is still in a very early stage.'
+ > 'According to Dr. Ehud Ur, a professor of medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical Research Division of the Canadian Diabetes Association, research is still in the initial stages.'

- The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`.
+ The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format, to be used with `eole`, is also provided.

## Metrics

- `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32 (faster speed is possible using a larger batch size).
+ `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` is calculated with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32.

|                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
|:---------------------------------|-------:|--------:|----------:|-----------:|
- | quickmt/quickmt-ru-en | 33.9 | 61.63 | 85.7 | 1.31 |
- | Helsink-NLP/opus-mt-ru-en | 30.04 | 58.23 | 83.97 | 3.72 |
- | facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 | 21.93 |
- | facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.12 |
- | facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.73 |
- | facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 | 35.99 |
+ | quickmt/quickmt-ru-en | 34.69 | 62.31 | 85.96 | 1.27 |
+ | Helsinki-NLP/opus-mt-ru-en | 30.04 | 58.23 | 83.97 | 3.81 |
+ | facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 | 22.07 |
+ | facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.26 |
+ | facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.7 |
+ | facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 | 36.32 |
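
Since the card states the model can be driven without `quickmt`, here is a minimal sketch of direct inference through `ctranslate2` and `sentencepiece`. It assumes this repository has been downloaded to `./quickmt-ru-en/` and uses the `src.spm.model` / `tgt.spm.model` tokenizers shipped in this commit:

```python
# Hedged sketch: drive the exported CTranslate2 model directly, bypassing quickmt.
import ctranslate2
import sentencepiece as spm

src_sp = spm.SentencePieceProcessor(model_file="quickmt-ru-en/src.spm.model")
tgt_sp = spm.SentencePieceProcessor(model_file="quickmt-ru-en/tgt.spm.model")
translator = ctranslate2.Translator("quickmt-ru-en/", device="auto")

text = "Исследования все еще находятся на начальной стадии."
tokens = src_sp.encode(text, out_type=str)                  # source subword pieces
result = translator.translate_batch([tokens], beam_size=5)[0]
print(tgt_sp.decode(result.hypotheses[0]))                  # detokenize best hypothesis
```

Likewise, the metrics table could be reproduced along these lines; `srcs`, `hyps` and `refs` are assumed to already hold the flores-devtest source sentences, the model's detokenized outputs, and the reference translations:

```python
import sacrebleu
from comet import download_model, load_from_checkpoint

# srcs, hyps, refs: assumed lists of source, hypothesis and reference strings
print(sacrebleu.corpus_bleu(hyps, [refs]).score)    # bleu
print(sacrebleu.corpus_chrf(hyps, [refs]).score)    # chrf2 (beta=2 is the sacrebleu default)

comet22 = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print(comet22.predict(data, batch_size=32).system_score)    # comet22
```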
eole-config.yaml CHANGED

@@ -10,22 +10,20 @@ tensorboard_log_dir: tensorboard
### Vocab
src_vocab: ru.eole.vocab
tgt_vocab: en.eole.vocab
- src_vocab_size: 20000
- tgt_vocab_size: 20000
+ src_vocab_size: 32000
+ tgt_vocab_size: 32000
vocab_size_multiple: 8
share_vocab: false
n_sample: 0

data:
    corpus_1:
-         # path_src: hf://quickmt/quickmt-train.ru-en/ru
-         # path_tgt: hf://quickmt/quickmt-train.ru-en/en
-         # path_sco: hf://quickmt/quickmt-train.ru-en/sco
-         path_src: train.ru
-         path_tgt: train.en
+         path_src: hf://quickmt/quickmt-train.ru-en-v2/ru
+         path_tgt: hf://quickmt/quickmt-train.ru-en-v2/en
+         path_sco: hf://quickmt/quickmt-train.ru-en-v2/sco
    valid:
-         path_src: dev.ru
-         path_tgt: dev.en
+         path_src: valid.ru
+         path_tgt: valid.en

transforms: [sentencepiece, filtertoolong]
transforms_configs:
@@ -41,7 +39,7 @@ training:
    model_path: quickmt-ru-en-eole-model
    #train_from: model
    keep_checkpoint: 4
-     train_steps: 100000
+     train_steps: 200000
    save_checkpoint_steps: 5000
    valid_steps: 5000
@@ -51,8 +49,8 @@ training:

    # Batching 10240
    batch_type: "tokens"
-     batch_size: 8000
-     valid_batch_size: 4096
+     batch_size: 12000
+     valid_batch_size: 2048
    batch_size_multiple: 8
    accum_count: [10]
    accum_steps: [0]
@@ -61,8 +59,8 @@ training:
    compute_dtype: "fp16"
    optim: "adamw"
    #use_amp: False
-     learning_rate: 2.0
-     warmup_steps: 4000
+     learning_rate: 3.0
+     warmup_steps: 5000
    decay_method: "noam"
    adam_beta2: 0.998
@@ -84,7 +82,7 @@ training:
model:
    architecture: "transformer"
    share_embeddings: false
-     share_decoder_embeddings: false
+     share_decoder_embeddings: true
    hidden_size: 1024
    encoder:
        layers: 8
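
A note on the optimizer change in this file: with `decay_method: "noam"`, `learning_rate` is a scale factor on an inverse-square-root schedule rather than an absolute step size, so raising it from 2.0 to 3.0 still yields a small peak rate. A sketch assuming the standard Transformer "noam" formula (eole's exact implementation may differ in details):

```python
def noam_lr(step: int, scale: float = 3.0, d_model: int = 1024, warmup: int = 5000) -> float:
    # Linear warmup to the peak at `warmup` steps, then 1/sqrt(step) decay.
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

for step in (1000, 5000, 50000, 200000):
    print(f"step {step:>6}: lr = {noam_lr(step):.6f}")
# Peaks at ~0.0013 at step 5000 and decays to ~0.0002 by step 200000.
```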
eole-model/config.json CHANGED

@@ -1,77 +1,87 @@
{
  "report_every": 100,
-   "tgt_vocab": "en.eole.vocab",
  "valid_metrics": [
    "BLEU"
  ],
+   "overwrite": true,
+   "tensorboard_log_dir_dated": "tensorboard/Nov-03_11-32-41",
  "tensorboard": true,
+   "share_vocab": false,
+   "src_vocab_size": 32000,
  "src_vocab": "ru.eole.vocab",
+   "save_data": "data",
+   "tgt_vocab_size": 32000,
+   "n_sample": 0,
+   "tgt_vocab": "en.eole.vocab",
+   "tensorboard_log_dir": "tensorboard",
  "transforms": [
    "sentencepiece",
    "filtertoolong"
  ],
-   "vocab_size_multiple": 8,
-   "tensorboard_log_dir": "tensorboard",
  "seed": 1234,
-   "n_sample": 0,
-   "save_data": "data",
-   "share_vocab": false,
-   "src_vocab_size": 20000,
-   "tensorboard_log_dir_dated": "tensorboard/May-06_17-28-49",
-   "tgt_vocab_size": 20000,
-   "overwrite": true,
+   "vocab_size_multiple": 8,
  "training": {
-     "bucket_size": 128000,
+     "gpu_ranks": [
+       0
+     ],
+     "valid_steps": 5000,
+     "prefetch_factor": 32,
+     "model_path": "quickmt-ru-en-eole-model",
+     "accum_steps": [
+       0
+     ],
+     "max_grad_norm": 0.0,
    "dropout_steps": [
      0
    ],
-     "keep_checkpoint": 4,
+     "optim": "adamw",
+     "learning_rate": 3.0,
+     "normalization": "tokens",
+     "save_checkpoint_steps": 5000,
+     "label_smoothing": 0.1,
+     "accum_count": [
+       10
+     ],
+     "batch_size": 12000,
+     "batch_size_multiple": 8,
+     "world_size": 1,
+     "batch_type": "tokens",
    "average_decay": 0.0001,
+     "train_steps": 200000,
    "attention_dropout": [
      0.1
    ],
-     "train_steps": 100000,
-     "batch_size": 8000,
-     "accum_steps": [
-       0
-     ],
-     "prefetch_factor": 32,
-     "max_grad_norm": 0.0,
-     "valid_batch_size": 4096,
+     "param_init_method": "xavier_uniform",
    "dropout": [
      0.1
    ],
    "num_workers": 0,
    "decay_method": "noam",
-     "valid_steps": 5000,
-     "model_path": "quickmt-ru-en-eole-model",
-     "world_size": 1,
-     "learning_rate": 2.0,
-     "save_checkpoint_steps": 5000,
-     "optim": "adamw",
-     "normalization": "tokens",
+     "keep_checkpoint": 4,
    "adam_beta2": 0.998,
-     "warmup_steps": 4000,
-     "batch_size_multiple": 8,
-     "label_smoothing": 0.1,
+     "valid_batch_size": 2048,
    "compute_dtype": "torch.float16",
-     "gpu_ranks": [
-       0
-     ],
-     "accum_count": [
-       10
-     ],
-     "batch_type": "tokens"
+     "bucket_size": 128000,
+     "warmup_steps": 5000
  },
  "model": {
-     "transformer_ff": 4096,
-     "architecture": "transformer",
-     "hidden_size": 1024,
-     "share_decoder_embeddings": false,
+     "share_decoder_embeddings": true,
    "position_encoding_type": "SinusoidalInterleaved",
    "heads": 8,
+     "transformer_ff": 4096,
+     "hidden_size": 1024,
    "share_embeddings": false,
+     "architecture": "transformer",
+     "decoder": {
+       "tgt_word_vec_size": 1024,
+       "position_encoding_type": "SinusoidalInterleaved",
+       "layers": 2,
+       "heads": 8,
+       "n_positions": null,
+       "transformer_ff": 4096,
+       "hidden_size": 1024,
+       "decoder_type": "transformer"
+     },
    "embeddings": {
      "src_word_vec_size": 1024,
      "position_encoding_type": "SinusoidalInterleaved",
@@ -79,24 +89,14 @@
      "word_vec_size": 1024
    },
    "encoder": {
-       "transformer_ff": 4096,
-       "hidden_size": 1024,
-       "layers": 8,
      "position_encoding_type": "SinusoidalInterleaved",
-       "encoder_type": "transformer",
-       "src_word_vec_size": 1024,
+       "layers": 8,
      "heads": 8,
-       "n_positions": null
-     },
-     "decoder": {
+       "src_word_vec_size": 1024,
+       "encoder_type": "transformer",
+       "n_positions": null,
      "transformer_ff": 4096,
-       "tgt_word_vec_size": 1024,
-       "decoder_type": "transformer",
-       "hidden_size": 1024,
-       "layers": 2,
-       "position_encoding_type": "SinusoidalInterleaved",
-       "heads": 8,
-       "n_positions": null
+       "hidden_size": 1024
    }
  },
  "transforms_configs": {
@@ -105,28 +105,29 @@
      "src_seq_length": 256
    },
    "sentencepiece": {
-       "tgt_subword_model": "${MODEL_PATH}/en.spm.model",
-       "src_subword_model": "${MODEL_PATH}/ru.spm.model"
+       "src_subword_model": "${MODEL_PATH}/ru.spm.model",
+       "tgt_subword_model": "${MODEL_PATH}/en.spm.model"
    }
  },
  "data": {
    "corpus_1": {
-       "path_src": "train.ru",
      "transforms": [
        "sentencepiece",
        "filtertoolong"
      ],
-       "path_align": null,
-       "path_tgt": "train.en"
+       "path_src": "hf://quickmt/quickmt-train.ru-en-v2/ru",
+       "path_sco": "hf://quickmt/quickmt-train.ru-en-v2/sco",
+       "path_tgt": "hf://quickmt/quickmt-train.ru-en-v2/en",
+       "path_align": null
    },
    "valid": {
-       "path_src": "dev.ru",
+       "path_src": "valid.ru",
+       "path_tgt": "valid.en",
+       "path_align": null,
      "transforms": [
        "sentencepiece",
        "filtertoolong"
-       ],
-       "path_align": null,
-       "path_tgt": "dev.en"
+       ]
    }
  }
}
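
As a sanity check on the README's "200M parameter" figure, the dimensions recorded in this config (hidden size 1024, feed-forward 4096, 8 encoder and 2 decoder layers, two 32k vocabularies) land almost exactly there. A back-of-the-envelope count that ignores biases and LayerNorm weights (the sinusoidal position encodings are parameter-free):

```python
d, ff, vocab = 1024, 4096, 32000

embeddings = 2 * vocab * d            # separate src and tgt embedding tables;
                                      # share_decoder_embeddings ties the output
                                      # projection to the tgt table at no extra cost
enc_layer = 4 * d * d + 2 * d * ff    # self-attention (Q, K, V, O) + feed-forward
dec_layer = 8 * d * d + 2 * d * ff    # self- and cross-attention + feed-forward

total = embeddings + 8 * enc_layer + 2 * dec_layer
print(f"~{total / 1e6:.0f}M parameters")   # prints ~200M
```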
eole-model/en.spm.model CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:a02eda49b9085bb516903b522f5123f22a730b2be29337202ff8280b786a680e
- size 592273
+ oid sha256:d97bf2a98454f5f5ea8231376fba1c5172d56e5454e4d310f299f10410d21629
+ size 805620
eole-model/eole-config.yaml ADDED

@@ -0,0 +1,98 @@
+ ## IO
+ save_data: data
+ overwrite: True
+ seed: 1234
+ report_every: 100
+ valid_metrics: ["BLEU"]
+ tensorboard: true
+ tensorboard_log_dir: tensorboard
+
+ ### Vocab
+ src_vocab: ru.eole.vocab
+ tgt_vocab: en.eole.vocab
+ src_vocab_size: 32000
+ tgt_vocab_size: 32000
+ vocab_size_multiple: 8
+ share_vocab: false
+ n_sample: 0
+
+ data:
+     corpus_1:
+         path_src: hf://quickmt/quickmt-train.ru-en-v2/ru
+         path_tgt: hf://quickmt/quickmt-train.ru-en-v2/en
+         path_sco: hf://quickmt/quickmt-train.ru-en-v2/sco
+         #path_src: train.ru
+         #path_tgt: train.en
+     valid:
+         path_src: valid.ru
+         path_tgt: valid.en
+
+ transforms: [sentencepiece, filtertoolong]
+ transforms_configs:
+     sentencepiece:
+         src_subword_model: "ru.spm.model"
+         tgt_subword_model: "en.spm.model"
+     filtertoolong:
+         src_seq_length: 256
+         tgt_seq_length: 256
+
+ training:
+     # Run configuration
+     model_path: quickmt-ru-en-eole-model
+     #train_from: model
+     keep_checkpoint: 4
+     train_steps: 200000
+     save_checkpoint_steps: 5000
+     valid_steps: 5000
+
+     # Train on a single GPU
+     world_size: 1
+     gpu_ranks: [0]
+
+     # Batching 10240
+     batch_type: "tokens"
+     batch_size: 12000
+     valid_batch_size: 2048
+     batch_size_multiple: 8
+     accum_count: [10]
+     accum_steps: [0]
+
+     # Optimizer & Compute
+     compute_dtype: "fp16"
+     optim: "adamw"
+     #use_amp: False
+     learning_rate: 3.0
+     warmup_steps: 5000
+     decay_method: "noam"
+     adam_beta2: 0.998
+
+     # Data loading
+     bucket_size: 128000
+     num_workers: 4
+     prefetch_factor: 32
+
+     # Hyperparams
+     dropout_steps: [0]
+     dropout: [0.1]
+     attention_dropout: [0.1]
+     max_grad_norm: 0
+     label_smoothing: 0.1
+     average_decay: 0.0001
+     param_init_method: xavier_uniform
+     normalization: "tokens"
+
+ model:
+     architecture: "transformer"
+     share_embeddings: false
+     share_decoder_embeddings: true
+     hidden_size: 1024
+     encoder:
+         layers: 8
+     decoder:
+         layers: 2
+     heads: 8
+     transformer_ff: 4096
+     embeddings:
+         word_vec_size: 1024
+         position_encoding_type: "SinusoidalInterleaved"
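
The config above assumes the `ru.spm.model` / `en.spm.model` subword models already exist. The exact trainer settings used for this release are not recorded in the commit, but a minimal sketch of producing 32k-piece models with the `sentencepiece` library (the `train.ru` / `train.en` input names are placeholders) could look like:

```python
import sentencepiece as spm

for lang in ("ru", "en"):
    spm.SentencePieceTrainer.train(
        input=f"train.{lang}",       # placeholder: raw text, one sentence per line
        model_prefix=f"{lang}.spm",  # writes {lang}.spm.model and {lang}.spm.vocab
        vocab_size=32000,            # matches src/tgt_vocab_size above
    )
```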
eole-model/model.00.safetensors CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:8b26000560eac8419c39aa672d130421a03ef65a60bd66818bfeeea152f0799d
- size 823882912
+ oid sha256:345293ff110e5decf91207acf0ad24e24c4e104c265bd6f3ce4e2b4c7ecdaf7f
+ size 799354640
eole-model/ru.spm.model CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:b428315194a0eebf6e69c4424ea2e78a1c03e20982eccd4e6eb446ca128124ec
- size 730304
+ oid sha256:48ee9d46612f7b3f98038c2adb193693bff996a2fa7ed38d3f37502148a2592a
+ size 1037835
eole-model/vocab.json CHANGED
The diff for this file is too large to render. See raw diff
 
model.bin CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:0fb552c7074a60314b7d411a3f533320d79f1236c05534a2dc8e926888af24bd
- size 401699775
+ oid sha256:b5763ea697321bcf849602ce30734f3970f9848501b87d139e34558101e16be4
+ size 409915789
source_vocabulary.json CHANGED
The diff for this file is too large to render. See raw diff
 
src.spm.model CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:b428315194a0eebf6e69c4424ea2e78a1c03e20982eccd4e6eb446ca128124ec
- size 730304
+ oid sha256:48ee9d46612f7b3f98038c2adb193693bff996a2fa7ed38d3f37502148a2592a
+ size 1037835
target_vocabulary.json CHANGED
The diff for this file is too large to render. See raw diff
 
tgt.spm.model CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:a02eda49b9085bb516903b522f5123f22a730b2be29337202ff8280b786a680e
- size 592273
+ oid sha256:d97bf2a98454f5f5ea8231376fba1c5172d56e5454e4d310f299f10410d21629
+ size 805620