Upload folder using huggingface_hub

Files changed (14)

README.md +52 -38
config.json +2 -2
config.yaml +66 -0
metrics.jsonl +100 -0
model.bin +2 -2
pytorch_model/model.safetensors +3 -0
pytorch_model/tokenizer_src.model +3 -0
pytorch_model/tokenizer_src.vocab +0 -0
pytorch_model/tokenizer_tgt.model +3 -0
pytorch_model/tokenizer_tgt.vocab +0 -0
source_vocabulary.json +0 -0
src.spm.model +2 -2
target_vocabulary.json +0 -0
tgt.spm.model +2 -2

README.md CHANGED

@@ -1,20 +1,22 @@
 ---
 language:
-- en
 - is
 tags:
 - translation
 license: cc-by-4.0
 datasets:
 - quickmt/quickmt-train.is-en
 - quickmt/newscrawl2024-en-backtranslated-is
 model-index:
 - name: quickmt-is-en
   results:
   - task:
       name: Translation isl-eng
       type: translation
-      args: isl-eng
     dataset:
       name: flores101-devtest
       type: flores_101
@@ -22,91 +24,103 @@ model-index:
     metrics:
     - name: BLEU
       type: bleu
-      value: 34.76
     - name: CHRF
       type: chrf
-      value: 60.13
-    - name: COMET
-      type: comet
-      value: 85.39
 ---
-<a href="https://huggingface.co/spaces/quickmt/quickmt-gui"><img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-lg-dark.svg" alt="Open in Spaces"></a>
 # `quickmt-is-en` Neural Machine Translation Model
 `quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
 ## Try it on our Huggingface Space
-Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-gui
 ## Model Information
-* Trained using [`eole`](https://github.com/eole-nlp/eole)
-* 200M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
 * 32k separate Sentencepiece vocabs
 * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
-* The pytorch model (for use with [`eole`](https://github.com/eole-nlp/eole)) is available in this repository in the `eole-model` folder
-See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.
 ## Usage with `quickmt`
-You must install the Nvidia cuda toolkit first, if you want to do GPU inference.
-Next, install the `quickmt` [python library](github.com/quickmt/quickmt).
 ```bash
 git clone https://github.com/quickmt/quickmt.git
-pip install ./quickmt/
 ```
-Finally, use the model in python:
 ```python
 from quickmt import Translator
-from huggingface_hub import snapshot_download
-# Download Model (if not downloaded already) and return path to local model
-# Device is either 'auto', 'cpu' or 'cuda'
-t = Translator(
-    snapshot_download("quickmt/quickmt-zh-en", ignore_patterns="eole-model/*"),
-    device="cpu"
-)
 # Translate - set beam size to 1 for faster speed (but lower quality)
 sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
-t(sample_text, beam_size=5)
 ```
-> 'Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia, and chair of the clinical science department of the Canadian Diabetes Association, recalled that the study had just begun.'
 ```python
 # Get alternative translations by sampling
 # You can pass any cTranslate2 `translate_batch` arguments
-t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
 ```
-> 'Dr Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the clinical science section of the Canadian Diabetes Union, mentioned that the investigation was just beginning.'
 The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible  to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
 ## Metrics
-`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("isl_Latn"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32.
-|                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
-|:---------------------------------|-------:|--------:|----------:|-----------:|
-| quickmt/quickmt-is-en            |  34.76 |   60.13 |     85.39 |       1.22 |
-| Helsinki-NLP/opus-mt-is-en       |  25.91 |   52.03 |     79.99 |       3.5  |
-| facebook/nllb-200-distilled-600M |  30.13 |   54.77 |     82.23 |      21.3  |
-| facebook/nllb-200-distilled-1.3B |  33.71 |   57.73 |     84.71 |      37.21 |
-| facebook/m2m100_418M             |  20.38 |   46.47 |     70.95 |      18.8  |
-| facebook/m2m100_1.2B             |  28.89 |   54.54 |     81.09 |      34.72 |

 ---
 language:
 - is
+- en
 tags:
 - translation
 license: cc-by-4.0
 datasets:
 - quickmt/quickmt-train.is-en
 - quickmt/newscrawl2024-en-backtranslated-is
+- quickmt/finetranslations-sample-is-en
+- HuggingFaceFW/finetranslations
 model-index:
 - name: quickmt-is-en
   results:
   - task:
       name: Translation isl-eng
       type: translation
+      args: isl-eng
     dataset:
       name: flores101-devtest
       type: flores_101
     metrics:
     - name: BLEU
       type: bleu
+      value: 36.09
     - name: CHRF
       type: chrf
+      value: 60.91
 ---
 # `quickmt-is-en` Neural Machine Translation Model
 `quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
+`quickmt` models are roughly 3 times faster for GPU inference than OpusMT models and roughly [40 times](https://huggingface.co/spaces/quickmt/quickmt-vs-libretranslate) faster than [LibreTranslate](https://libretranslate.com/)/[ArgosTranslate](https://github.com/argosopentech/argos-translate).
 ## Try it on our Huggingface Space
+Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
 ## Model Information
+* Trained using [`quickmt-train`](https://github.com/quickmt/quickmt-train)
+* 200M parameter seq2seq transformer with 12 encoder layers and 2 decoder layers
 * 32k separate Sentencepiece vocabs
 * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
+* The PyTorch model (for fine-tuning or PyTorch inference) is available in this repository in the `pytorch_model` folder
+  * Original configuration file: `config.yaml`
 ## Usage with `quickmt`
+For GPU inference, make sure the NVIDIA driver and CUDA toolkit are installed.
+Next, install the [`quickmt` Python library](https://github.com/quickmt/quickmt):
 ```bash
 git clone https://github.com/quickmt/quickmt.git
+pip install -e ./quickmt/
 ```
+Finally, use the model in Python:
 ```python
 from quickmt import Translator
+# Auto-detects GPU, set to "cpu" to force CPU inference
+mt = Translator("quickmt/quickmt-is-en", device="auto")
 # Translate - set beam size to 1 for faster speed (but lower quality)
 sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
+mt(sample_text, beam_size=5)
 ```
+> "Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the Canadian Diabetes Association's clinical science department, recalled that the study had just begun."
 ```python
 # Get alternative translations by sampling
+# You can pass any CTranslate2 `translate_batch` arguments
+mt([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
 ```
+> 'Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia, and chair of the Clinical Division of the Canadian Diabetes Association, reminded that the study had just begun.'
 The model is in `ctranslate2` format and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with, e.g., [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`. A PyTorch model in safetensors format (for fine-tuning or PyTorch inference) is also provided in the `pytorch_model` folder.
 ## Metrics
+`bleu` and `chrf` (chrF2) are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` set](https://huggingface.co/datasets/facebook/flores) and the [Bouquet](https://huggingface.co/datasets/facebook/bouquet) `test` set. The `time` column is the time in seconds to translate each test set on an RTX 4070s GPU with batch size 32. LLM inference was done with vLLM using 32 threads.
+Benchmarks are hard to get right and to make fair, so download this model, give it a try, and see whether it works well for you!
+### flores devtest
+| model                            | time  | bleu  | chrf  |
+|----------------------------------|-------|-------|-------|
+| quickmt-is-en                    | 0.70  | 47.68 | 65.91 |
+| Helsinki-NLP/opus-mt-is-en       | 1.17  | 36.46 | 56.62 |
+| facebook/nllb-200-distilled-1.3B | 8.57  | 40.31 | 60.39 |
+| CohereLabs/tiny-aya-global       | 14.22 | 22.26 | 43.01 |
+| google/gemma-4-E2B-it            | 23.79 | 36.90 | 57.52 |
+### bouquet test
+| model                            | time  | bleu  | chrf  |
+|----------------------------------|-------|-------|-------|
+| quickmt-is-en                    | 1.16  | 36.09 | 60.91 |
+| Helsinki-NLP/opus-mt-is-en       | 2.33  | 25.26 | 51.44 |
+| facebook/nllb-200-distilled-1.3B | 18.17 | 32.79 | 56.81 |
+| CohereLabs/tiny-aya-global       | 27.03 | 16.03 | 40.63 |
+| google/gemma-4-E2B-it            | 46.60 | 28.55 | 54.30 |
+Prompt for LLM translation:
+> Translate the following into {tgt_lang}, without commentary or explanation.\n\n{x}
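The README notes that the model can be used with `ctranslate2` directly instead of through `quickmt`. Below is a minimal sketch of that route, assuming the `ctranslate2`, `sentencepiece` and `huggingface_hub` packages are installed. The file names (`model.bin`, `src.spm.model`, `tgt.spm.model`) are the ones in this repository. The helper names `download_model` and `translate` are hypothetical. As far as we understand, CTranslate2 applies the `add_source_bos` flag from `config.json` itself, so the sketch does not add `<s>` by hand.

```python
import os


def download_model(repo_id="quickmt/quickmt-is-en"):
    """Download the CTranslate2 model and tokenizers (hypothetical helper)."""
    from huggingface_hub import snapshot_download

    # The PyTorch checkpoint in pytorch_model/ is not needed for CTranslate2 inference.
    return snapshot_download(repo_id, ignore_patterns="pytorch_model/*")


def translate(texts, model_dir, beam_size=5, device="cpu"):
    """Translate a list of Icelandic strings into English (hypothetical helper)."""
    # Heavy dependencies are imported lazily, only when a translation is requested.
    import ctranslate2
    import sentencepiece as spm

    sp_src = spm.SentencePieceProcessor(model_file=os.path.join(model_dir, "src.spm.model"))
    sp_tgt = spm.SentencePieceProcessor(model_file=os.path.join(model_dir, "tgt.spm.model"))
    translator = ctranslate2.Translator(model_dir, device=device)

    # Text -> list of subword piece strings (BOS handling assumed to come from config.json).
    batch = [sp_src.encode(text, out_type=str) for text in texts]
    results = translator.translate_batch(batch, beam_size=beam_size)
    # Keep the best hypothesis of each result and turn its pieces back into text.
    return [sp_tgt.decode(result.hypotheses[0]) for result in results]


# Example (downloads the ~400 MB model on first use):
# print(translate(["Góðan daginn."], download_model()))
```

Any other `translate_batch` options (for example the sampling arguments shown in the README) can be passed through in the same way.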

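The `time` columns in the two README benchmark tables can be read as relative slowdowns by dividing each model's time by the `quickmt-is-en` time on the same test set. A small sketch with the numbers copied from those tables. The dictionary layout and the `relative_times` name are only illustrative.

```python
# Times in seconds, copied from the "flores devtest" and "bouquet test" tables.
TIMES = {
    "flores devtest": {
        "quickmt-is-en": 0.70,
        "Helsinki-NLP/opus-mt-is-en": 1.17,
        "facebook/nllb-200-distilled-1.3B": 8.57,
        "CohereLabs/tiny-aya-global": 14.22,
        "google/gemma-4-E2B-it": 23.79,
    },
    "bouquet test": {
        "quickmt-is-en": 1.16,
        "Helsinki-NLP/opus-mt-is-en": 2.33,
        "facebook/nllb-200-distilled-1.3B": 18.17,
        "CohereLabs/tiny-aya-global": 27.03,
        "google/gemma-4-E2B-it": 46.60,
    },
}


def relative_times(times, baseline="quickmt-is-en"):
    """How many times longer each model takes than the baseline, rounded to 2 decimals."""
    base = times[baseline]
    return {model: round(t / base, 2) for model, t in times.items()}


for test_set, times in TIMES.items():
    print(test_set, relative_times(times))
```

With these numbers, OpusMT takes about 1.7 times (flores devtest) and 2.0 times (bouquet test) as long as `quickmt-is-en`, and NLLB 1.3B about 12 and 16 times as long.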
config.json CHANGED

@@ -1,10 +1,10 @@
 {
-  "add_source_bos": false,
   "add_source_eos": false,
   "bos_token": "<s>",
   "decoder_start_token": "<s>",
   "eos_token": "</s>",
-  "layer_norm_epsilon": 1e-06,
   "multi_query_attention": false,
   "unk_token": "<unk>"
 }

 {
+  "add_source_bos": true,
   "add_source_eos": false,
   "bos_token": "<s>",
   "decoder_start_token": "<s>",
   "eos_token": "</s>",
+  "layer_norm_epsilon": null,
   "multi_query_attention": false,
   "unk_token": "<unk>"
 }
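In `config.json`, `add_source_bos` changes from `false` to `true` in this commit, so source sequences for the new model start with `<s>`. The sketch below illustrates what the two source flags do to a list of SentencePiece pieces. It is an illustration only: `wrap_source` is a hypothetical helper, the pieces are made up, and, as far as we understand, CTranslate2 reads these flags from `config.json` and adds the tokens itself, so callers of `translate_batch` should not add `<s>` by hand.

```python
import json

# Subset of the new config.json from this commit.
CONFIG = json.loads("""
{
  "add_source_bos": true,
  "add_source_eos": false,
  "bos_token": "<s>",
  "eos_token": "</s>"
}
""")


def wrap_source(pieces, config):
    """Add BOS/EOS around source pieces as the config flags request (hypothetical helper)."""
    tokens = list(pieces)
    if config.get("add_source_bos", False):
        tokens.insert(0, config["bos_token"])
    if config.get("add_source_eos", False):
        tokens.append(config["eos_token"])
    return tokens


# Made-up pieces, for illustration only.
print(wrap_source(["▁Góð", "an", "▁dag", "inn", "."], CONFIG))
# ['<s>', '▁Góð', 'an', '▁dag', 'inn', '.']
```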

config.yaml ADDED

	@@ -0,0 +1,66 @@

+train:
+  experiment_name: "isen1"
+  lr: 2.5e-3
+  accum_steps: 6
+  warmup_steps: 10000
+  max_steps: 100000
+  eval_steps: 1000
+  max_checkpoints: 10
+  precision: "bfloat16" # or float16 with an older GPU
+  enable_torch_compile: true
+  checkpoint_strategy: best
+  early_stopping_patience: 0
+  early_stopping_metric: chrf
+  use_ema: true
+  ema_decay: 0.9999
+  ema_start_step: 10000
+  z_loss_coeff: 0.0005
+  weight_decay_embeddings: false
+  scheduler_type: "cosine"
+data:
+  src_lang: "is"
+  tgt_lang: "en"
+  src_dev_path: "quickmt-valid.is-en.is"
+  tgt_dev_path: "quickmt-valid.is-en.en"
+  input_sentence_size: 10000000
+  max_tokens_per_batch: 20000
+  buffer_size: 40000
+  num_workers:  4
+  prefetch_factor: 128
+  pad_multiple: 1
+  corpora:
+    - src_file: "quickmt-train.is-en.is"
+      tgt_file: "quickmt-train.is-en.en"
+      weight: 10
+      start_step: 0
+    - src_file: "finetranslations-sample-is-en.is"
+      tgt_file: "finetranslations-sample-is-en.en"
+      weight: 4
+      start_step: 0
+      stop_step: 80000
+    - src_file: "newscrawl2024-en-backtranslated-is.is"
+      tgt_file: "newscrawl2024-en-backtranslated-is.en"
+      start_step: 0
+      weight: 5
+      stop_step: 80000
+model:
+  d_model: 768
+  enc_layers: 12
+  dec_layers: 2
+  n_heads: 12
+  ffn_dim: 4096
+  max_len: 256
+  vocab_size_src: 32000
+  vocab_size_tgt: 32000
+  norm_type: "rmsnorm"
+  mlp_type: "gated"
+  activation: "silu"
+  ff_bias: false
+  layernorm_eps: 1.0e-5
+  dropout: 0.1
+export:
+  max_len: 256
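A sketch of two step-dependent parts of this configuration. It relies on assumptions about `quickmt-train` semantics that the file does not spell out: the learning rate is assumed to warm up linearly to `lr` over `warmup_steps` and then follow a cosine decay to zero at `max_steps`; corpus `weight`s are assumed to be relative sampling weights, with a corpus active while `start_step <= step < stop_step`. The function names are hypothetical. Under the same kind of assumption (a single GPU, `max_tokens_per_batch` counted per micro-batch), each optimizer update sees up to 20000 × 6 = 120,000 tokens (`max_tokens_per_batch` × `accum_steps`).

```python
import math

# Values copied from config.yaml.
LR, WARMUP, MAX_STEPS = 2.5e-3, 10_000, 100_000
CORPORA = {  # name: (weight, start_step, stop_step or None)
    "quickmt-train": (10, 0, None),
    "finetranslations-sample": (4, 0, 80_000),
    "newscrawl2024-backtranslated": (5, 0, 80_000),
}


def learning_rate(step):
    """Assumed schedule: linear warmup, then cosine decay to 0 at MAX_STEPS."""
    if step < WARMUP:
        return LR * step / WARMUP
    progress = min(1.0, (step - WARMUP) / (MAX_STEPS - WARMUP))
    return LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def corpus_mix(step):
    """Assumed sampling proportions of the corpora active at `step`."""
    active = {
        name: weight
        for name, (weight, start, stop) in CORPORA.items()
        if step >= start and (stop is None or step < stop)
    }
    total = sum(active.values())
    return {name: weight / total for name, weight in active.items()}


for step in (5_000, 10_000, 55_000, 90_000):
    print(step, f"{learning_rate(step):.2e}", corpus_mix(step))
```

Under these assumptions, the first 80,000 steps sample the three corpora at 10:4:5, and the last 20,000 steps use only `quickmt-train`.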

metrics.jsonl ADDED

	@@ -0,0 +1,100 @@

+{"step": 1000, "loss": 4.994450536300406, "ppl": 147.59182674848597, "acc": 0.2561425445451704, "bleu": 1.164220201274155, "chrf": 16.973689375539468}
+{"step": 2000, "loss": 2.9462299467362847, "ppl": 19.03405887389251, "acc": 0.5052829009065333, "bleu": 18.761607509943545, "chrf": 46.17091909277533}
+{"step": 3000, "loss": 2.671850914104017, "ppl": 14.46672108553379, "acc": 0.5426695842450766, "bleu": 21.07071474393453, "chrf": 49.35269560974713}
+{"step": 4000, "loss": 2.5037393429831587, "ppl": 12.228133762268097, "acc": 0.5640512660206315, "bleu": 22.48549663052993, "chrf": 51.167411359125204}
+{"step": 5000, "loss": 2.406415891990173, "ppl": 11.094127252871816, "acc": 0.5763676148796499, "bleu": 23.15323743975618, "chrf": 51.475933755821714}
+{"step": 6000, "loss": 2.337301650051774, "ppl": 10.353262112991489, "acc": 0.57786808377618, "bleu": 20.113464528437817, "chrf": 44.12110313558305}
+{"step": 7000, "loss": 2.2539055326127304, "ppl": 9.524862951475631, "acc": 0.5941856830259457, "bleu": 24.80832372875448, "chrf": 51.67041916758161}
+{"step": 8000, "loss": 2.3206392851647974, "ppl": 10.182181543467566, "acc": 0.5888090028133792, "bleu": 13.932007832745292, "chrf": 42.01360381861268}
+{"step": 9000, "loss": 2.2266226918446193, "ppl": 9.268510544385578, "acc": 0.5969990622069397, "bleu": 25.827846658927797, "chrf": 53.293785209000774}
+{"step": 10000, "loss": 2.1821169167244645, "ppl": 8.86505298802123, "acc": 0.6056892778993436, "bleu": 24.954958782293804, "chrf": 53.44992973095211}
+{"step": 11000, "loss": 2.1617847877876875, "ppl": 8.686627618095132, "acc": 0.6086902156924039, "bleu": 25.53191033743962, "chrf": 53.85723320106797}
+{"step": 12000, "loss": 2.1383983329446212, "ppl": 8.48583525579101, "acc": 0.6137542982181932, "bleu": 27.0470782171914, "chrf": 54.55581738183708}
+{"step": 13000, "loss": 2.1169620610505726, "ppl": 8.305866406058117, "acc": 0.6161925601750547, "bleu": 27.2138814312068, "chrf": 54.67266014028607}
+{"step": 14000, "loss": 2.098620139930203, "ppl": 8.154909511457713, "acc": 0.6192560175054704, "bleu": 27.444993629318027, "chrf": 54.86491069472626}
+{"step": 15000, "loss": 2.0836098530546656, "ppl": 8.033416086943097, "acc": 0.6222569552985308, "bleu": 27.78043111083278, "chrf": 55.053479575945865}
+{"step": 16000, "loss": 2.069503783225715, "ppl": 7.920891663192714, "acc": 0.6251953735542357, "bleu": 27.92890061002052, "chrf": 55.11635676587596}
+{"step": 17000, "loss": 2.0570456397201764, "ppl": 7.822824195831242, "acc": 0.6264457643013441, "bleu": 27.878779295969807, "chrf": 55.19455945452377}
+{"step": 18000, "loss": 2.046806539122333, "ppl": 7.743134185167205, "acc": 0.6283838699593624, "bleu": 27.938605871318362, "chrf": 55.17970841469209}
+{"step": 19000, "loss": 2.037328945111021, "ppl": 7.670094569626345, "acc": 0.6285714285714286, "bleu": 28.029919338244245, "chrf": 55.259974697929295}
+{"step": 20000, "loss": 2.0268357787589872, "ppl": 7.590031782064036, "acc": 0.6292591434823382, "bleu": 28.279161341177904, "chrf": 55.47056151897512}
+{"step": 21000, "loss": 2.016149803130617, "ppl": 7.509356701086586, "acc": 0.6313222882150672, "bleu": 28.418614553687238, "chrf": 55.61705677471761}
+{"step": 22000, "loss": 2.0072188200597356, "ppl": 7.442589356322234, "acc": 0.6318224445139106, "bleu": 28.362396673956198, "chrf": 55.635261370938885}
+{"step": 23000, "loss": 1.9979302881210437, "ppl": 7.3737786971120896, "acc": 0.6332603938730853, "bleu": 28.359591563080922, "chrf": 55.62669652445366}
+{"step": 24000, "loss": 1.9894417460168217, "ppl": 7.311450976169526, "acc": 0.6341356673960613, "bleu": 28.375044520543653, "chrf": 55.740701306975446}
+{"step": 25000, "loss": 1.9828298088758205, "ppl": 7.263267590204029, "acc": 0.6350734604563926, "bleu": 28.559750845501792, "chrf": 55.81894077300116}
+{"step": 26000, "loss": 1.974070632424195, "ppl": 7.19992516648463, "acc": 0.6361988121287903, "bleu": 28.600872207096703, "chrf": 55.75644033472818}
+{"step": 27000, "loss": 1.9654939966896394, "ppl": 7.138438084078314, "acc": 0.6365114098155674, "bleu": 28.742154358221875, "chrf": 55.79456357681747}
+{"step": 28000, "loss": 1.9570172685800251, "ppl": 7.078183228128073, "acc": 0.638074398249453, "bleu": 28.920497381125873, "chrf": 55.963029411561905}
+{"step": 29000, "loss": 1.9496945456587995, "ppl": 7.026540965318027, "acc": 0.6385745545482964, "bleu": 28.983978309080356, "chrf": 55.994004661605004}
+{"step": 30000, "loss": 1.9441999582694298, "ppl": 6.988038895299612, "acc": 0.6391997499218506, "bleu": 29.039322766047295, "chrf": 55.955787821536454}
+{"step": 31000, "loss": 1.9386406572061092, "ppl": 6.949298068973334, "acc": 0.6399499843701156, "bleu": 29.075298299505366, "chrf": 56.050221728406655}
+{"step": 32000, "loss": 1.9348651128770709, "ppl": 6.923110153983664, "acc": 0.6395748671459831, "bleu": 29.18013749076382, "chrf": 56.134750789202634}
+{"step": 33000, "loss": 1.9291061377517877, "ppl": 6.883354719971714, "acc": 0.6403251015942482, "bleu": 29.29599523494858, "chrf": 56.12494496076502}
+{"step": 34000, "loss": 1.9238471853692072, "ppl": 6.847250503654076, "acc": 0.6412628946545795, "bleu": 29.22514117810576, "chrf": 56.19178900888207}
+{"step": 35000, "loss": 1.919167904795688, "ppl": 6.815285143160717, "acc": 0.6423882463269772, "bleu": 29.423800910939516, "chrf": 56.28051976487358}
+{"step": 36000, "loss": 1.9154907150244704, "ppl": 6.790270067122464, "acc": 0.6427633635511097, "bleu": 29.56676497278216, "chrf": 56.378205912388864}
+{"step": 37000, "loss": 1.9099493562746659, "ppl": 6.7527468056169875, "acc": 0.6430134417005314, "bleu": 29.724507355987996, "chrf": 56.3955533759033}
+{"step": 38000, "loss": 1.9056084191959402, "ppl": 6.723497088147935, "acc": 0.6439512347608628, "bleu": 29.837468698981745, "chrf": 56.45802937684058}
+{"step": 39000, "loss": 1.9022945631813653, "ppl": 6.701253263655329, "acc": 0.6444513910597062, "bleu": 29.860181788551714, "chrf": 56.448017618511656}
+{"step": 40000, "loss": 1.8991676552364998, "ppl": 6.680331788394988, "acc": 0.6448265082838387, "bleu": 29.836159543988067, "chrf": 56.410599547621345}
+{"step": 41000, "loss": 1.896588106146452, "ppl": 6.663121751219954, "acc": 0.6454517036573929, "bleu": 29.910092217589167, "chrf": 56.495291829629345}
+{"step": 42000, "loss": 1.8937625390136565, "ppl": 6.644321226977929, "acc": 0.6457643013441701, "bleu": 29.84550037175743, "chrf": 56.44456881321764}
+{"step": 43000, "loss": 1.8920155827199419, "ppl": 6.632724021048358, "acc": 0.6458268208815254, "bleu": 29.869545378255758, "chrf": 56.47232280414817}
+{"step": 44000, "loss": 1.8896889851442833, "ppl": 6.617310279161182, "acc": 0.6463269771803689, "bleu": 29.892927828789244, "chrf": 56.56258982197585}
+{"step": 45000, "loss": 1.8889288751733941, "ppl": 6.612282306785575, "acc": 0.6468896530165676, "bleu": 29.85949759139706, "chrf": 56.467059380507834}
+{"step": 46000, "loss": 1.8868934070888852, "ppl": 6.598836905668948, "acc": 0.6473272897780556, "bleu": 29.753671938670145, "chrf": 56.450896978598905}
+{"step": 47000, "loss": 1.885726744594258, "ppl": 6.591142779240017, "acc": 0.647389809315411, "bleu": 29.708054439281167, "chrf": 56.39153318937854}
+{"step": 48000, "loss": 1.8822946913952603, "ppl": 6.5685604007081775, "acc": 0.6480775242263207, "bleu": 29.768954442179517, "chrf": 56.43350439980706}
+{"step": 49000, "loss": 1.8784707959274383, "ppl": 6.543490874533134, "acc": 0.6483901219130979, "bleu": 29.86387516687504, "chrf": 56.553222956271874}
+{"step": 50000, "loss": 1.875717959652919, "ppl": 6.525502486395253, "acc": 0.6483901219130979, "bleu": 29.9431764646548, "chrf": 56.691396815918615}
+{"step": 51000, "loss": 1.8721117835895984, "ppl": 6.502012755037178, "acc": 0.6492028758987184, "bleu": 29.929636873284917, "chrf": 56.680295034811145}
+{"step": 52000, "loss": 1.8692507821047593, "ppl": 6.483437072089493, "acc": 0.6495154735854954, "bleu": 29.89108068645933, "chrf": 56.68084336292139}
+{"step": 53000, "loss": 1.866853663070979, "ppl": 6.4679141143016, "acc": 0.650328227571116, "bleu": 29.928095351705764, "chrf": 56.67216295536798}
+{"step": 54000, "loss": 1.8650410658421088, "ppl": 6.456201009878714, "acc": 0.6506408252578931, "bleu": 30.02093405318264, "chrf": 56.65811632388697}
+{"step": 55000, "loss": 1.863233341504425, "ppl": 6.444540520834505, "acc": 0.6506408252578931, "bleu": 30.033952526175927, "chrf": 56.7120040756438}
+{"step": 56000, "loss": 1.861440289262758, "ppl": 6.4329954765340345, "acc": 0.6506408252578931, "bleu": 29.957227110478136, "chrf": 56.67272279994852}
+{"step": 57000, "loss": 1.8604369926393014, "ppl": 6.426544510551042, "acc": 0.6513910597061582, "bleu": 29.96152325240789, "chrf": 56.70643423675522}
+{"step": 58000, "loss": 1.8573550892979251, "ppl": 6.40676901029315, "acc": 0.6518912160050016, "bleu": 29.962857126854125, "chrf": 56.700294962427975}
+{"step": 59000, "loss": 1.8542828019986715, "ppl": 6.387115780875955, "acc": 0.6523913723038449, "bleu": 29.9999105342699, "chrf": 56.71413288331987}
+{"step": 60000, "loss": 1.8523500309544976, "ppl": 6.374782870624044, "acc": 0.6521412941544232, "bleu": 30.002422333797487, "chrf": 56.72571003647862}
+{"step": 61000, "loss": 1.8497147698743748, "ppl": 6.358005769161304, "acc": 0.6518286964676462, "bleu": 30.088128095780675, "chrf": 56.82286559795213}
+{"step": 62000, "loss": 1.8478954376001289, "ppl": 6.346448960091215, "acc": 0.6521412941544232, "bleu": 30.163192189586514, "chrf": 56.87471933431556}
+{"step": 63000, "loss": 1.846671395571614, "ppl": 6.338685392268225, "acc": 0.6521412941544232, "bleu": 30.294580216517947, "chrf": 56.93527753910208}
+{"step": 64000, "loss": 1.8456072049798278, "ppl": 6.331943410922315, "acc": 0.651953735542357, "bleu": 30.267105355150683, "chrf": 56.92528396280259}
+{"step": 65000, "loss": 1.8433829190396414, "ppl": 6.31787501009265, "acc": 0.6523913723038449, "bleu": 30.26381770494934, "chrf": 56.947207186438845}
+{"step": 66000, "loss": 1.8415594121820296, "ppl": 6.306364819331557, "acc": 0.6525789309159112, "bleu": 30.372801529196206, "chrf": 56.975244819029584}
+{"step": 67000, "loss": 1.8398574414123554, "ppl": 6.295640699404194, "acc": 0.6533916849015318, "bleu": 30.360651092771327, "chrf": 57.02676405176798}
+{"step": 68000, "loss": 1.8397201685951665, "ppl": 6.294776538383643, "acc": 0.6536417630509535, "bleu": 30.31569810757419, "chrf": 57.010332562022306}
+{"step": 69000, "loss": 1.8378186000813241, "ppl": 6.282817963145131, "acc": 0.653579243513598, "bleu": 30.43040186082636, "chrf": 57.0099551426294}
+{"step": 70000, "loss": 1.8366587260545288, "ppl": 6.275534910303006, "acc": 0.6537042825883088, "bleu": 30.36967687853596, "chrf": 56.99462352920415}
+{"step": 71000, "loss": 1.8353953078598781, "ppl": 6.267611291792051, "acc": 0.6535167239762426, "bleu": 30.312364361794366, "chrf": 57.010298640877124}
+{"step": 72000, "loss": 1.8339239809132248, "ppl": 6.258396367153129, "acc": 0.6534542044388871, "bleu": 30.314802903457725, "chrf": 57.02531379076068}
+{"step": 73000, "loss": 1.832636663801188, "ppl": 6.25034500985268, "acc": 0.6532666458268209, "bleu": 30.18211166645784, "chrf": 56.98519639908388}
+{"step": 74000, "loss": 1.8311134483859701, "ppl": 6.24083163528204, "acc": 0.6534542044388871, "bleu": 30.240694916926348, "chrf": 57.005477510723715}
+{"step": 75000, "loss": 1.8313917221446154, "ppl": 6.242568536614085, "acc": 0.6538293216630197, "bleu": 30.25037189042524, "chrf": 57.01518239004976}
+{"step": 76000, "loss": 1.8304509984809707, "ppl": 6.236698766018694, "acc": 0.6540793998124413, "bleu": 30.17670592260045, "chrf": 56.98846895856825}
+{"step": 77000, "loss": 1.8294424893521413, "ppl": 6.230412168957504, "acc": 0.6538293216630197, "bleu": 30.15356603689065, "chrf": 57.05283117752522}
+{"step": 78000, "loss": 1.8289005528766313, "ppl": 6.227036596101329, "acc": 0.6538293216630197, "bleu": 30.219200078052154, "chrf": 57.03607682555588}
+{"step": 79000, "loss": 1.8285974338599464, "ppl": 6.225149348936161, "acc": 0.6541419193497968, "bleu": 30.201865071457675, "chrf": 57.03725790141988}
+{"step": 80000, "loss": 1.8268809665251837, "ppl": 6.214473248634622, "acc": 0.654016880275086, "bleu": 30.246014694107835, "chrf": 57.07984929057122}
+{"step": 81000, "loss": 1.8151666658526102, "ppl": 6.1420997704386275, "acc": 0.6548921537980619, "bleu": 30.30511155087304, "chrf": 57.05979018535777}
+{"step": 82000, "loss": 1.8017998941319553, "ppl": 6.0605459945424975, "acc": 0.6555173491716161, "bleu": 30.386818573469718, "chrf": 57.0528476753652}
+{"step": 83000, "loss": 1.7898324244392778, "ppl": 5.988448864619553, "acc": 0.6561425445451704, "bleu": 30.43094334701287, "chrf": 57.06257176341698}
+{"step": 84000, "loss": 1.7796441400449847, "ppl": 5.9277465955758055, "acc": 0.6569552985307908, "bleu": 30.692067267210952, "chrf": 57.16142812336842}
+{"step": 85000, "loss": 1.7700074883020085, "ppl": 5.8708973242702225, "acc": 0.6575179743669897, "bleu": 30.837943336700793, "chrf": 57.27245831866552}
+{"step": 86000, "loss": 1.7619386177504202, "ppl": 5.823716418033855, "acc": 0.658018130665833, "bleu": 30.885682694104386, "chrf": 57.27633829541885}
+{"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "acc": 0.6585808065020319, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
+{"step": 88000, "loss": 1.7502013912123418, "ppl": 5.755761719121828, "acc": 0.6591434823382307, "bleu": 30.932408305594507, "chrf": 57.24207869793768}
+{"step": 89000, "loss": 1.745357054328501, "ppl": 5.727946298362393, "acc": 0.6593310409502969, "bleu": 30.86580944197795, "chrf": 57.155334441206705}
+{"step": 90000, "loss": 1.7414522467348195, "ppl": 5.705623381871351, "acc": 0.6592685214129416, "bleu": 30.778809073597362, "chrf": 57.10320733293281}
+{"step": 91000, "loss": 1.7385490644048922, "ppl": 5.68908293846694, "acc": 0.6593935604876524, "bleu": 30.814687019400694, "chrf": 57.156128318200786}
+{"step": 92000, "loss": 1.735776736565626, "ppl": 5.673332777848827, "acc": 0.6590809628008752, "bleu": 30.7948046957926, "chrf": 57.06469797598115}
+{"step": 93000, "loss": 1.7336947337058455, "ppl": 5.6615331704513805, "acc": 0.6592685214129416, "bleu": 30.802719069102576, "chrf": 57.094310566383314}
+{"step": 94000, "loss": 1.7316813252798726, "ppl": 5.650145659564158, "acc": 0.6587683651140982, "bleu": 30.847028285926314, "chrf": 57.081001817985936}
+{"step": 95000, "loss": 1.7300995931956276, "ppl": 5.641215707151691, "acc": 0.6586433260393874, "bleu": 30.83037701203774, "chrf": 57.07951804602165}
+{"step": 96000, "loss": 1.7291842680940332, "ppl": 5.636054523252361, "acc": 0.658455767427321, "bleu": 30.874036638473978, "chrf": 57.087228488058415}
+{"step": 97000, "loss": 1.728562312410861, "ppl": 5.6325502369792995, "acc": 0.6587058455767427, "bleu": 30.870366035906606, "chrf": 57.05137890300803}
+{"step": 98000, "loss": 1.7277250108065998, "ppl": 5.627836067496046, "acc": 0.6585182869646765, "bleu": 30.95775184085035, "chrf": 57.088495304174}
+{"step": 99000, "loss": 1.7270162512638227, "ppl": 5.623848698187965, "acc": 0.6588308846514536, "bleu": 30.971976171090674, "chrf": 57.074751985253954}
+{"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "acc": 0.6588308846514536, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
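Each line of `metrics.jsonl` is one validation record, written every `eval_steps` (1000) steps. The short sketch below reads such records back, checks that `ppl` is `exp(loss)`, and finds the step with the highest `chrf`, the `early_stopping_metric` in `config.yaml`. Only a few records are inlined (with `acc` dropped); the same functions can be applied to the lines of the real file. The commit does not say whether the exported model corresponds to that step (the config uses `checkpoint_strategy: best` with EMA enabled).

```python
import json
import math

# A few records copied from metrics.jsonl ("acc" field dropped for brevity).
SAMPLE = """\
{"step": 85000, "loss": 1.7700074883020085, "ppl": 5.8708973242702225, "bleu": 30.837943336700793, "chrf": 57.27245831866552}
{"step": 86000, "loss": 1.7619386177504202, "ppl": 5.823716418033855, "bleu": 30.885682694104386, "chrf": 57.27633829541885}
{"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
{"step": 88000, "loss": 1.7502013912123418, "ppl": 5.755761719121828, "bleu": 30.932408305594507, "chrf": 57.24207869793768}
{"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
"""


def read_metrics(lines):
    """Parse JSON Lines records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def best_record(records, metric="chrf"):
    """Return the record with the highest value of `metric`."""
    return max(records, key=lambda r: r[metric])


records = read_metrics(SAMPLE.splitlines())
# The logged perplexity is the exponential of the (natural-log) loss.
assert all(math.isclose(math.exp(r["loss"]), r["ppl"], rel_tol=1e-6) for r in records)
best = best_record(records)
print(best["step"], round(best["chrf"], 2), round(best["bleu"], 2))
# 87000 57.3 30.98
```

Over the full file, step 87000 has both the highest `chrf` and the highest `bleu`; the final step (100000) is slightly lower on both.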

model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:af874d90330cc235279656d6780eed25689bbcfd8467926a1adce65340c778f8
-size 409915789

 version https://git-lfs.github.com/spec/v1
+oid sha256:f43106bbeb49ef0437a5c0bd61761b28d3c7750723401b72090fa8d0758f7482
+size 399605364
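Binary files such as `model.bin` are stored with Git LFS, so the diff shows pointer files (`version`, `oid sha256:...`, `size`) rather than the binaries themselves. Below is a minimal sketch, using only the standard library, for checking a downloaded file against such a pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


# Pointer for the new model.bin, copied from this commit.
MODEL_BIN_POINTER = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:f43106bbeb49ef0437a5c0bd61761b28d3c7750723401b72090fa8d0758f7482
size 399605364
""")
# Usage: matches_pointer("path/to/model.bin", MODEL_BIN_POINTER)
```

Checking the size first is a cheap way to reject a truncated download before hashing hundreds of megabytes.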

pytorch_model/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6efc214f6d81d13ee58e3c29a8a20c46a9d35e755f58a0ec604a0835f808801
+size 799169344

pytorch_model/tokenizer_src.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c26f3e3e3df69013a62aff1e7b9d90a1838d3f1d7601dbee7fa09b29dcc09754
+size 817478

pytorch_model/tokenizer_src.vocab ADDED

The diff for this file is too large to render.

pytorch_model/tokenizer_tgt.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c49b8b83d9b6461a63ae3bd563d035ea72deb18ea4756c16829ffc0c709aea1f
+size 802177

pytorch_model/tokenizer_tgt.vocab ADDED

The diff for this file is too large to render.

source_vocabulary.json CHANGED

The diff for this file is too large to render.

src.spm.model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:538f374f5558509c152305b8efbea6cc87daa58cfd52dea3bb962c0ad908c797
-size 814659

 version https://git-lfs.github.com/spec/v1
+oid sha256:df4e8c5fdac389435c77641254f27811bb6709fe6e2a5bdb8fa5ea56900d4d85
+size 817694

target_vocabulary.json CHANGED

The diff for this file is too large to render.

tgt.spm.model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ac985ba45c9ec783ae106ecde3c5873db2c14e4a1e76086e1eaf7d48295e9b0f
-size 800209

 version https://git-lfs.github.com/spec/v1
+oid sha256:a5fcc4576244508befdbe68bd7cc13d3d45140dc1e94665a34ecbde300e59141
+size 801740