Upload folder using huggingface_hub
- .ipynb_checkpoints/README-checkpoint.md +27 -20
- .ipynb_checkpoints/eole-config-checkpoint.yaml +96 -0
- README.md +27 -20
- eole-config.yaml +13 -15
- eole-model/config.json +67 -66
- eole-model/en.spm.model +2 -2
- eole-model/eole-config.yaml +98 -0
- eole-model/model.00.safetensors +2 -2
- eole-model/ru.spm.model +2 -2
- eole-model/vocab.json +0 -0
- model.bin +2 -2
- source_vocabulary.json +0 -0
- src.spm.model +2 -2
- target_vocabulary.json +0 -0
- tgt.spm.model +2 -2
.ipynb_checkpoints/README-checkpoint.md
CHANGED

Jupyter autosave copy of README.md; its rendered diff is identical to the README.md diff below.
.ipynb_checkpoints/eole-config-checkpoint.yaml
ADDED

Jupyter autosave copy of the training config, added in full; its contents match eole-model/eole-config.yaml below, minus the two commented-out local-path lines (`#path_src: train.ru` / `#path_tgt: train.en`).
README.md
CHANGED

@@ -6,7 +6,7 @@ tags:
 - translation
 license: cc-by-4.0
 datasets:
-- quickmt/quickmt-train.ru-en
+- quickmt/quickmt-train.ru-en-v2
 model-index:
 - name: quickmt-ru-en
   results:
@@ -21,31 +21,38 @@ model-index:
       metrics:
       - name: BLEU
        type: bleu
-        value:
+        value: 34.69
      - name: CHRF
        type: chrf
-        value:
+        value: 62.31
      - name: COMET
        type: comet
-        value: 85.
+        value: 85.96
 ---
 
 
-# `quickmt-ru-en` Neural Machine Translation Model
+# `quickmt-ru-en` Neural Machine Translation Model - V2
 
 `quickmt-ru-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `ru` into `en`.
 
+This is an updated, higher-quality model with a larger, cleaner training dataset trained for more steps.
+
+
+## Try it on our Huggingface Space
+
+Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
+
 
 ## Model Information
 
-* Trained using [`eole`](https://github.com/eole-nlp/eole)
-* 
-* 
+* Trained using [`eole`](https://github.com/eole-nlp/eole)
+* 200M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
+* 32k separate Sentencepiece vocabs
 * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
-* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.ru-en/tree/main
 
 See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.
 
+
 ## Usage with `quickmt`
 
 You must install the Nvidia cuda toolkit first, if you want to do GPU inference.
@@ -68,12 +75,12 @@ from quickmt import Translator
 t = Translator("./quickmt-ru-en/", device="auto")
 
 # Translate - set beam size to 1 for faster speed (but lower quality)
-sample_text = '
+sample_text = 'Dr. Ehud Ur, professor i medicin på Dalhousie University i Halifax, Nova Scotia, og formand for den kliniske og videnskabelige afdeling af Canadian Diabetes Association, advarede om at forskningen stadig er i dens tidlige stadier.'
 
 t(sample_text, beam_size=5)
 ```
 
-> 'According to
+> 'According to Dr. Ehud Ur, professor of medicine at Dalhousie University in Halifax, Nova Scotia and chair of the clinical science department of the Canadian Diabetes Association, the research is still in its infancy.'
 
 ```python
 # Get alternative translations by sampling
@@ -81,20 +88,20 @@ t(sample_text, beam_size=5)
 t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
 ```
 
-> 'According to
+> 'According to Dr. Ehud Ur, a professor of medicine at Dalhousie University in Halifax, Nova Scotia, and Chair of the Clinical Research Division of the Canadian Diabetes Association, research is still in the initial stages.'
 
-The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`.
+The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
 
 
 ## Metrics
 
-`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32
+`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("rus_Cyrl"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32.
 
 | | bleu | chrf2 | comet22 | Time (s) |
 |:---------------------------------|-------:|--------:|----------:|-----------:|
-| quickmt/quickmt-ru-en |
-
-| facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 |
-| facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.
-| facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.
-| facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 |
+| quickmt/quickmt-ru-en | 34.69 | 62.31 | 85.96 | 1.27 |
+| Helsinki-NLP/opus-mt-ru-en | 30.04 | 58.23 | 83.97 | 3.81 |
+| facebook/nllb-200-distilled-600M | 34.59 | 61.26 | 85.88 | 22.07 |
+| facebook/nllb-200-distilled-1.3B | 36.99 | 63.04 | 86.59 | 38.26 |
+| facebook/m2m100_418M | 26.62 | 56.31 | 81.77 | 18.7 |
+| facebook/m2m100_1.2B | 32.01 | 60.3 | 85.01 | 36.32 |
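
The updated card says the exported model can be driven with `ctranslate2` and `sentencepiece` directly, without going through `quickmt`. A minimal sketch of what that looks like against the files in this commit (`model.bin` plus `src.spm.model`/`tgt.spm.model` at the repo root); the paths and device argument are illustrative, not the card's official recipe:

```python
import ctranslate2
import sentencepiece as spm

# Repo root = CTranslate2 model directory (model.bin + vocabularies),
# with the source/target SentencePiece models alongside it.
translator = ctranslate2.Translator("./quickmt-ru-en", device="auto")
sp_src = spm.SentencePieceProcessor(model_file="./quickmt-ru-en/src.spm.model")
sp_tgt = spm.SentencePieceProcessor(model_file="./quickmt-ru-en/tgt.spm.model")

src_text = "..."  # a Russian source sentence
tokens = sp_src.encode(src_text, out_type=str)
result = translator.translate_batch([tokens], beam_size=5)
print(sp_tgt.decode(result[0].hypotheses[0]))
```

The metrics table should likewise be reproducible with the tools the card names. A sketch, assuming `srcs`, `hyps`, and `refs` hold the Flores200 `devtest` sources, system outputs, and English references:

```python
import sacrebleu
from comet import download_model, load_from_checkpoint

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf2 = sacrebleu.corpus_chrf(hyps, [refs])  # default beta=2, i.e. chrF2

comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
comet22 = comet_model.predict(data, batch_size=32).system_score
```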
eole-config.yaml
CHANGED

@@ -10,22 +10,20 @@ tensorboard_log_dir: tensorboard
 ### Vocab
 src_vocab: ru.eole.vocab
 tgt_vocab: en.eole.vocab
-src_vocab_size:
-tgt_vocab_size:
+src_vocab_size: 32000
+tgt_vocab_size: 32000
 vocab_size_multiple: 8
 share_vocab: false
 n_sample: 0
 
 data:
     corpus_1:
-
-
-
-        path_src: train.ru
-        path_tgt: train.en
+        path_src: hf://quickmt/quickmt-train.ru-en-v2/ru
+        path_tgt: hf://quickmt/quickmt-train.ru-en-v2/en
+        path_sco: hf://quickmt/quickmt-train.ru-en-v2/sco
     valid:
-        path_src:
-        path_tgt:
+        path_src: valid.ru
+        path_tgt: valid.en
 
 transforms: [sentencepiece, filtertoolong]
 transforms_configs:
@@ -41,7 +39,7 @@ training:
     model_path: quickmt-ru-en-eole-model
     #train_from: model
     keep_checkpoint: 4
-    train_steps:
+    train_steps: 200000
     save_checkpoint_steps: 5000
     valid_steps: 5000
 
@@ -51,8 +49,8 @@ training:
 
     # Batching 10240
     batch_type: "tokens"
-    batch_size:
-    valid_batch_size:
+    batch_size: 12000
+    valid_batch_size: 2048
     batch_size_multiple: 8
     accum_count: [10]
     accum_steps: [0]
@@ -61,8 +59,8 @@ training:
     compute_dtype: "fp16"
     optim: "adamw"
     #use_amp: False
-    learning_rate:
-    warmup_steps:
+    learning_rate: 3.0
+    warmup_steps: 5000
     decay_method: "noam"
     adam_beta2: 0.998
 
@@ -84,7 +82,7 @@ training:
 model:
     architecture: "transformer"
     share_embeddings: false
-    share_decoder_embeddings:
+    share_decoder_embeddings: true
     hidden_size: 1024
     encoder:
         layers: 8
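
For context on the optimizer block: with `decay_method: "noam"`, the step size follows the warmup-then-inverse-square-root schedule from "Attention Is All You Need", scaled by `learning_rate` and `hidden_size ** -0.5`. A rough sketch under that assumption (eole's exact constant factors may differ):

```python
def noam_lr(step, base_lr=3.0, hidden_size=1024, warmup_steps=5000):
    # Linear warmup to the peak at warmup_steps, then step**-0.5 decay.
    return base_lr * hidden_size**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)

print(noam_lr(5_000))    # peak, ~1.3e-3
print(noam_lr(200_000))  # ~2.1e-4 at the final train_steps
```

Note also that with `batch_size: 12000` tokens and `accum_count: [10]`, each optimizer step accumulates roughly 120k tokens.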
eole-model/config.json
CHANGED

@@ -1,77 +1,87 @@
 {
   "report_every": 100,
-  "tgt_vocab": "en.eole.vocab",
   "valid_metrics": [
     "BLEU"
   ],
+  "overwrite": true,
+  "tensorboard_log_dir_dated": "tensorboard/Nov-03_11-32-41",
   "tensorboard": true,
+  "share_vocab": false,
+  "src_vocab_size": 32000,
   "src_vocab": "ru.eole.vocab",
+  "save_data": "data",
+  "tgt_vocab_size": 32000,
+  "n_sample": 0,
+  "tgt_vocab": "en.eole.vocab",
+  "tensorboard_log_dir": "tensorboard",
   "transforms": [
     "sentencepiece",
     "filtertoolong"
   ],
-  "vocab_size_multiple": 8,
-  "tensorboard_log_dir": "tensorboard",
   "seed": 1234,
-  "
-  "save_data": "data",
-  "share_vocab": false,
-  "src_vocab_size": 20000,
-  "tensorboard_log_dir_dated": "tensorboard/May-06_17-28-49",
-  "tgt_vocab_size": 20000,
-  "overwrite": true,
+  "vocab_size_multiple": 8,
   "training": {
-    "
+    "gpu_ranks": [
+      0
+    ],
+    "valid_steps": 5000,
+    "prefetch_factor": 32,
+    "model_path": "quickmt-ru-en-eole-model",
+    "accum_steps": [
+      0
+    ],
+    "max_grad_norm": 0.0,
     "dropout_steps": [
       0
     ],
-    "
+    "optim": "adamw",
+    "learning_rate": 3.0,
+    "normalization": "tokens",
+    "save_checkpoint_steps": 5000,
+    "label_smoothing": 0.1,
+    "accum_count": [
+      10
+    ],
+    "batch_size": 12000,
+    "batch_size_multiple": 8,
+    "world_size": 1,
+    "batch_type": "tokens",
     "average_decay": 0.0001,
-    "
+    "train_steps": 200000,
     "attention_dropout": [
       0.1
     ],
-    "
-    "batch_size": 8000,
-    "accum_steps": [
-      0
-    ],
-    "prefetch_factor": 32,
-    "max_grad_norm": 0.0,
-    "valid_batch_size": 4096,
+    "param_init_method": "xavier_uniform",
     "dropout": [
       0.1
     ],
     "num_workers": 0,
     "decay_method": "noam",
-    "
-    "model_path": "quickmt-ru-en-eole-model",
-    "world_size": 1,
-    "learning_rate": 2.0,
-    "save_checkpoint_steps": 5000,
-    "optim": "adamw",
-    "normalization": "tokens",
+    "keep_checkpoint": 4,
     "adam_beta2": 0.998,
-    "
-    "batch_size_multiple": 8,
-    "label_smoothing": 0.1,
+    "valid_batch_size": 2048,
     "compute_dtype": "torch.float16",
-    "
-
-    ],
-    "accum_count": [
-      10
-    ],
-    "batch_type": "tokens"
+    "bucket_size": 128000,
+    "warmup_steps": 5000
   },
   "model": {
-    "
-    "architecture": "transformer",
-    "hidden_size": 1024,
-    "share_decoder_embeddings": false,
+    "share_decoder_embeddings": true,
     "position_encoding_type": "SinusoidalInterleaved",
     "heads": 8,
+    "transformer_ff": 4096,
+    "hidden_size": 1024,
     "share_embeddings": false,
+    "architecture": "transformer",
+    "decoder": {
+      "tgt_word_vec_size": 1024,
+      "position_encoding_type": "SinusoidalInterleaved",
+      "layers": 2,
+      "heads": 8,
+      "n_positions": null,
+      "transformer_ff": 4096,
+      "hidden_size": 1024,
+      "decoder_type": "transformer"
+    },
     "embeddings": {
       "src_word_vec_size": 1024,
       "position_encoding_type": "SinusoidalInterleaved",
@@ -79,24 +89,14 @@
       "word_vec_size": 1024
     },
     "encoder": {
-      "transformer_ff": 4096,
-      "hidden_size": 1024,
-      "layers": 8,
       "position_encoding_type": "SinusoidalInterleaved",
-      "
-      "src_word_vec_size": 1024,
+      "layers": 8,
       "heads": 8,
-      "
-
-
+      "src_word_vec_size": 1024,
+      "encoder_type": "transformer",
+      "n_positions": null,
       "transformer_ff": 4096,
-      "
-      "decoder_type": "transformer",
-      "hidden_size": 1024,
-      "layers": 2,
-      "position_encoding_type": "SinusoidalInterleaved",
-      "heads": 8,
-      "n_positions": null
+      "hidden_size": 1024
     }
   },
   "transforms_configs": {
@@ -105,28 +105,29 @@
       "src_seq_length": 256
     },
     "sentencepiece": {
-      "
-      "
+      "src_subword_model": "${MODEL_PATH}/ru.spm.model",
+      "tgt_subword_model": "${MODEL_PATH}/en.spm.model"
     }
   },
   "data": {
    "corpus_1": {
-      "path_src": "train.ru",
       "transforms": [
         "sentencepiece",
         "filtertoolong"
       ],
-      "
-      "
+      "path_src": "hf://quickmt/quickmt-train.ru-en-v2/ru",
+      "path_sco": "hf://quickmt/quickmt-train.ru-en-v2/sco",
+      "path_tgt": "hf://quickmt/quickmt-train.ru-en-v2/en",
+      "path_align": null
     },
     "valid": {
-      "path_src": "
+      "path_src": "valid.ru",
+      "path_tgt": "valid.en",
+      "path_align": null,
       "transforms": [
         "sentencepiece",
         "filtertoolong"
-      ]
-      "path_align": null,
-      "path_tgt": "dev.en"
+      ]
     }
   }
 }
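
A back-of-the-envelope check that this configuration matches the "200M parameter transformer 'big'" claim in the README and the ~800 MB fp32 checkpoint below; the per-block counts are the usual transformer formulas, ignoring biases and layer norms:

```python
d, ff, vocab = 1024, 4096, 32000

emb = 2 * vocab * d                 # separate src/tgt embeddings; the generator
                                    # is tied via share_decoder_embeddings
enc = 8 * (4 * d * d + 2 * d * ff)  # 8 encoder layers: self-attention + FFN
dec = 2 * (8 * d * d + 2 * d * ff)  # 2 decoder layers: self- and cross-attention + FFN

total = emb + enc + dec             # ~199.8M parameters
print(f"{total / 1e6:.1f}M params, {total * 4 / 1e6:.0f} MB at fp32")
# cf. eole-model/model.00.safetensors at 799,354,640 bytes
```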
eole-model/en.spm.model
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d97bf2a98454f5f5ea8231376fba1c5172d56e5454e4d310f299f10410d21629
+size 805620
eole-model/eole-config.yaml
ADDED

## IO
save_data: data
overwrite: True
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]
tensorboard: true
tensorboard_log_dir: tensorboard

### Vocab
src_vocab: ru.eole.vocab
tgt_vocab: en.eole.vocab
src_vocab_size: 32000
tgt_vocab_size: 32000
vocab_size_multiple: 8
share_vocab: false
n_sample: 0

data:
    corpus_1:
        path_src: hf://quickmt/quickmt-train.ru-en-v2/ru
        path_tgt: hf://quickmt/quickmt-train.ru-en-v2/en
        path_sco: hf://quickmt/quickmt-train.ru-en-v2/sco
        #path_src: train.ru
        #path_tgt: train.en
    valid:
        path_src: valid.ru
        path_tgt: valid.en

transforms: [sentencepiece, filtertoolong]
transforms_configs:
    sentencepiece:
        src_subword_model: "ru.spm.model"
        tgt_subword_model: "en.spm.model"
    filtertoolong:
        src_seq_length: 256
        tgt_seq_length: 256

training:
    # Run configuration
    model_path: quickmt-ru-en-eole-model
    #train_from: model
    keep_checkpoint: 4
    train_steps: 200000
    save_checkpoint_steps: 5000
    valid_steps: 5000

    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]

    # Batching 10240
    batch_type: "tokens"
    batch_size: 12000
    valid_batch_size: 2048
    batch_size_multiple: 8
    accum_count: [10]
    accum_steps: [0]

    # Optimizer & Compute
    compute_dtype: "fp16"
    optim: "adamw"
    #use_amp: False
    learning_rate: 3.0
    warmup_steps: 5000
    decay_method: "noam"
    adam_beta2: 0.998

    # Data loading
    bucket_size: 128000
    num_workers: 4
    prefetch_factor: 32

    # Hyperparams
    dropout_steps: [0]
    dropout: [0.1]
    attention_dropout: [0.1]
    max_grad_norm: 0
    label_smoothing: 0.1
    average_decay: 0.0001
    param_init_method: xavier_uniform
    normalization: "tokens"

model:
    architecture: "transformer"
    share_embeddings: false
    share_decoder_embeddings: true
    hidden_size: 1024
    encoder:
        layers: 8
    decoder:
        layers: 2
    heads: 8
    transformer_ff: 4096
    embeddings:
        word_vec_size: 1024
        position_encoding_type: "SinusoidalInterleaved"
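
The config pairs the two bundled SentencePiece models with 32k eole vocabs. A quick tokenizer-side sanity check; this assumes the spm models were trained with a 32k piece budget (the eole vocab files themselves may be padded by `vocab_size_multiple: 8`):

```python
import sentencepiece as spm

for name in ("ru.spm.model", "en.spm.model"):
    sp = spm.SentencePieceProcessor(model_file=f"eole-model/{name}")
    print(name, sp.get_piece_size())  # expected: 32000 pieces each
```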
eole-model/model.00.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:345293ff110e5decf91207acf0ad24e24c4e104c265bd6f3ce4e2b4c7ecdaf7f
+size 799354640
eole-model/ru.spm.model
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:48ee9d46612f7b3f98038c2adb193693bff996a2fa7ed38d3f37502148a2592a
+size 1037835
eole-model/vocab.json
CHANGED

The diff for this file is too large to render; see the raw diff.
model.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b5763ea697321bcf849602ce30734f3970f9848501b87d139e34558101e16be4
+size 409915789
source_vocabulary.json
CHANGED

The diff for this file is too large to render; see the raw diff.
src.spm.model
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:48ee9d46612f7b3f98038c2adb193693bff996a2fa7ed38d3f37502148a2592a
+size 1037835
target_vocabulary.json
CHANGED

The diff for this file is too large to render; see the raw diff.
tgt.spm.model
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d97bf2a98454f5f5ea8231376fba1c5172d56e5454e4d310f299f10410d21629
+size 805620
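
One detail worth noting in the pointers: `src.spm.model` carries the same oid as `eole-model/ru.spm.model`, and `tgt.spm.model` the same as `eole-model/en.spm.model`, so the root-level tokenizers are byte-for-byte copies of the `eole-model` ones. Since a git-LFS oid is simply the SHA-256 of the stored blob, a checkout can be verified locally; a small sketch:

```python
import hashlib
from pathlib import Path

def lfs_oid(path):
    # "oid sha256:<hex>" in an LFS pointer is the SHA-256 of the file contents.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

assert lfs_oid("src.spm.model") == lfs_oid("eole-model/ru.spm.model")
assert lfs_oid("tgt.spm.model") == lfs_oid("eole-model/en.spm.model")
print(lfs_oid("tgt.spm.model"))  # d97bf2a9... per the pointer above
```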