radinplaid committed 2e2007c (verified) · 1 parent: 4b6b0fd

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,20 +1,22 @@
1
  ---
2
  language:
3
- - en
4
  - is
5
  tags:
6
  - translation
7
  license: cc-by-4.0
8
  datasets:
9
  - quickmt/quickmt-train.is-en
10
  - quickmt/newscrawl2024-en-backtranslated-is
11
  model-index:
12
  - name: quickmt-is-en
13
  results:
14
  - task:
15
  name: Translation isl-eng
16
  type: translation
17
- args: isl-eng
18
  dataset:
19
  name: flores101-devtest
20
  type: flores_101
@@ -22,91 +24,103 @@ model-index:
22
  metrics:
23
  - name: BLEU
24
  type: bleu
25
- value: 34.76
26
  - name: CHRF
27
  type: chrf
28
- value: 60.13
29
- - name: COMET
30
- type: comet
31
- value: 85.39
32
  ---
33
 
34
- <a href="https://huggingface.co/spaces/quickmt/quickmt-gui"><img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-lg-dark.svg" alt="Open in Spaces"></a>
35
 
36
  # `quickmt-is-en` Neural Machine Translation Model
37
 
38
  `quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
39
 
40
 
41
  ## Try it on our Huggingface Space
42
 
43
- Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-gui
44
 
45
 
46
  ## Model Information
47
 
48
- * Trained using [`eole`](https://github.com/eole-nlp/eole)
49
- * 200M parameter transformer 'big' with 8 encoder layers and 2 decoder layers
50
  * 32k separate Sentencepiece vocabs
51
  * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
52
- * The pytorch model (for use with [`eole`](https://github.com/eole-nlp/eole)) is available in this repository in the `eole-model` folder
53
-
54
- See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.
55
 
56
 
57
  ## Usage with `quickmt`
58
 
59
- You must install the Nvidia cuda toolkit first, if you want to do GPU inference.
60
 
61
- Next, install the `quickmt` [python library](github.com/quickmt/quickmt).
62
 
63
  ```bash
64
  git clone https://github.com/quickmt/quickmt.git
65
- pip install ./quickmt/
66
  ```
67
 
68
- Finally, use the model in python:
69
 
70
  ```python
71
  from quickmt import Translator
72
- from huggingface_hub import snapshot_download
73
 
74
- # Download Model (if not downloaded already) and return path to local model
75
- # Device is either 'auto', 'cpu' or 'cuda'
76
- t = Translator(
77
- snapshot_download("quickmt/quickmt-zh-en", ignore_patterns="eole-model/*"),
78
- device="cpu"
79
- )
80
 
81
  # Translate - set beam size to 1 for faster speed (but lower quality)
82
  sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
83
 
84
- t(sample_text, beam_size=5)
85
  ```
86
 
87
- > 'Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia, and chair of the clinical science department of the Canadian Diabetes Association, recalled that the study had just begun.'
88
 
89
  ```python
90
  # Get alternative translations by sampling
91
  # You can pass any cTranslate2 `translate_batch` arguments
92
- t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
93
  ```
94
 
95
- > 'Dr Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the clinical science section of the Canadian Diabetes Union, mentioned that the investigation was just beginning.'
96
 
97
  The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
98
 
99
 
100
  ## Metrics
101
 
102
- `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("isl_Latn"->"eng_Latn"). `comet22` with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070s GPU with batch size 32.
103
 
104
 
105
- | | bleu | chrf2 | comet22 | Time (s) |
106
- |:---------------------------------|-------:|--------:|----------:|-----------:|
107
- | quickmt/quickmt-is-en | 34.76 | 60.13 | 85.39 | 1.22 |
108
- | Helsinki-NLP/opus-mt-is-en | 25.91 | 52.03 | 79.99 | 3.5 |
109
- | facebook/nllb-200-distilled-600M | 30.13 | 54.77 | 82.23 | 21.3 |
110
- | facebook/nllb-200-distilled-1.3B | 33.71 | 57.73 | 84.71 | 37.21 |
111
- | facebook/m2m100_418M | 20.38 | 46.47 | 70.95 | 18.8 |
112
- | facebook/m2m100_1.2B | 28.89 | 54.54 | 81.09 | 34.72 |
 
1
  ---
2
  language:
3
  - is
4
+ - en
5
  tags:
6
  - translation
7
  license: cc-by-4.0
8
  datasets:
9
  - quickmt/quickmt-train.is-en
10
  - quickmt/newscrawl2024-en-backtranslated-is
11
+ - quickmt/finetranslations-sample-is-en
12
+ - HuggingFaceFW/finetranslations
13
  model-index:
14
  - name: quickmt-is-en
15
  results:
16
  - task:
17
  name: Translation isl-eng
18
  type: translation
19
+ args: iso-eng
20
  dataset:
21
  name: flores101-devtest
22
  type: flores_101
 
24
  metrics:
25
  - name: BLEU
26
  type: bleu
27
+ value: 36.09
28
  - name: CHRF
29
  type: chrf
30
+ value: 60.91
31
+
32
  ---
33
 
34
 
35
  # `quickmt-is-en` Neural Machine Translation Model
36
 
37
  `quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
38
 
39
+ `quickmt` models are roughly 3 times faster for GPU inference than OpusMT models and roughly [40 times](https://huggingface.co/spaces/quickmt/quickmt-vs-libretranslate) faster than [LibreTranslate](https://huggingface.co/spaces/quickmt/quickmt-vs-libretranslate)/[ArgosTranslate](github.com/argosopentech/argos-translate).
40
+
41
 
42
  ## Try it on our Huggingface Space
43
 
44
+ Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
45
 
46
 
47
  ## Model Information
48
 
49
+ * Trained using [`quickmt-train`](github.com/quickmt/quickmt-train)
50
+ * 200M parameter seq2seq transformer
51
  * 32k separate Sentencepiece vocabs
52
  * Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
53
+ * The pytorch model (for fine-tuning or pytorch inference) is available in this repository in the `pytorch_model` folder
54
+ * Original configuration file: `config.yaml`
55
 
56
 
57
  ## Usage with `quickmt`
58
 
59
+ If you want to do GPU inference be sure you have the Nvidia driver and cuda toolkit installed.
60
 
61
+ Next, install the `quickmt` python library and download the model:
62
 
63
  ```bash
64
  git clone https://github.com/quickmt/quickmt.git
65
+ pip install -e ./quickmt/
66
  ```
67
 
68
+ Finally use the model in python:
69
 
70
  ```python
71
  from quickmt import Translator
72
 
73
+ # Auto-detects GPU, set to "cpu" to force CPU inference
74
+ mt = Translator("quickmt/quickmt-is-en", device="auto")
75
 
76
  # Translate - set beam size to 1 for faster speed (but lower quality)
77
  sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
78
 
79
+ mt(sample_text, beam_size=5)
80
  ```
81
 
82
+ > "Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the Canadian Diabetes Association's clinical science department, recalled that the study had just begun."
83
 
84
  ```python
85
  # Get alternative translations by sampling
86
  # You can pass any cTranslate2 `translate_batch` arguments
87
+ mt([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
88
  ```
89
 
90
+ > 'Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia, and chair of the Clinical Division of the Canadian Diabetes Association, reminded that the study had just begun.'
91
 
92
  The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
93
 
94
 
95
  ## Metrics
96
 
97
+ `bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) and [Bouquet](https://huggingface.co/datasets/facebook/bouquet) `test` set. "Time (s)" is the time in seconds to translate dataset on an RTX 4070s GPU with batch size 32. LLM inference done with vLLM and 32 threads.
98
+
99
+ Benchmarks are hard to get right and make fair. Download this model and give it a try and see if it works well for you!
100
+
101
+
102
+ ### flores devtest
103
+
104
+ | model | time | bleu | chrf |
105
+ |----------------------------------|-------|-------|-------|
106
+ | quickmt-is-en | 0.70 | 47.68 | 65.91 |
107
+ | Helsinki-NLP/opus-mt-is-en | 1.17 | 36.46 | 56.62 |
108
+ | facebook/nllb-200-distilled-1.3B | 8.57 | 40.31 | 60.39 |
109
+ | CohereLabs/tiny-aya-global | 14.22 | 22.26 | 43.01 |
110
+ | google/gemma-4-E2B-it | 23.79 | 36.90 | 57.52 |
111
+
112
+ ### bouquet test
113
+
114
+ | model | time | bleu | chrf |
115
+ |----------------------------------|-------|-------|-------|
116
+ | quickmt-is-en | 1.16 | 36.09 | 60.91 |
117
+ | Helsinki-NLP/opus-mt-is-en | 2.33 | 25.26 | 51.44 |
118
+ | facebook/nllb-200-distilled-1.3B | 18.17 | 32.79 | 56.81 |
119
+ | CohereLabs/tiny-aya-global | 27.03 | 16.03 | 40.63 |
120
+ | google/gemma-4-E2B-it | 46.60 | 28.55 | 54.30 |
121
+
122
+
123
+ Prompt for LLM translation:
124
 
125
+ > Translate the following into {tgt_lang}, without commentary or explanation.\n\n{x}
126
 
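The README states that, because the export is CTranslate2 plus SentencePiece, the model can be run without `quickmt`. A minimal sketch of that path, assuming the repository layout shown in this commit (`model.bin`, `config.json`, `src.spm.model` and `tgt.spm.model` at the repo root, PyTorch weights under `pytorch_model/`). The helper `translate_direct` is hypothetical, and it assumes CTranslate2 prepends `<s>` itself because the exported `config.json` sets `add_source_bos`, so only SentencePiece pieces are passed in.

```python
import os
from typing import List

# Example input taken from the README's usage section.
SAMPLE = (
    "Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og "
    "formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, "
    "minnti á að rannsóknin væri rétt nýhafin."
)


def translate_direct(texts: List[str], device: str = "auto", **translate_kwargs) -> List[str]:
    """Translate Icelandic to English with CTranslate2 + SentencePiece, bypassing quickmt.

    Hypothetical helper, not part of quickmt. Heavy imports are local so this
    module can be imported without ctranslate2/sentencepiece installed.
    Extra keyword arguments are forwarded to Translator.translate_batch.
    """
    import ctranslate2
    import sentencepiece as spm
    from huggingface_hub import snapshot_download

    # Download the CTranslate2 export only; skip the ~800 MB PyTorch checkpoint.
    model_dir = snapshot_download("quickmt/quickmt-is-en", ignore_patterns=["pytorch_model/*"])

    src_sp = spm.SentencePieceProcessor(model_file=os.path.join(model_dir, "src.spm.model"))
    tgt_sp = spm.SentencePieceProcessor(model_file=os.path.join(model_dir, "tgt.spm.model"))
    translator = ctranslate2.Translator(model_dir, device=device)

    # Subword pieces as strings. <s> is not added here: assumed to be added by
    # CTranslate2 because config.json sets add_source_bos.
    source_tokens = src_sp.encode(texts, out_type=str)
    translate_kwargs.setdefault("beam_size", 5)
    results = translator.translate_batch(source_tokens, **translate_kwargs)

    # Best hypothesis per input, detokenized back to plain text.
    return [tgt_sp.decode(r.hypotheses[0]) for r in results]
```

Calling `translate_direct([SAMPLE])` downloads the model on first use; sampling options such as `sampling_topk=50, beam_size=1` are passed through to `translate_batch`, mirroring the README's `quickmt` example.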
 
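The two benchmark tables in the new README report wall-clock seconds per test set. A small worked sketch that turns the `time` column into slowdown factors relative to `quickmt-is-en`; all numbers are copied from those tables and nothing else is assumed.

```python
# Wall-clock seconds per test set, copied from the "flores devtest" and
# "bouquet test" tables in the README diff.
TIMES = {
    "flores devtest": {
        "quickmt-is-en": 0.70,
        "Helsinki-NLP/opus-mt-is-en": 1.17,
        "facebook/nllb-200-distilled-1.3B": 8.57,
        "CohereLabs/tiny-aya-global": 14.22,
        "google/gemma-4-E2B-it": 23.79,
    },
    "bouquet test": {
        "quickmt-is-en": 1.16,
        "Helsinki-NLP/opus-mt-is-en": 2.33,
        "facebook/nllb-200-distilled-1.3B": 18.17,
        "CohereLabs/tiny-aya-global": 27.03,
        "google/gemma-4-E2B-it": 46.60,
    },
}


def slowdown_vs(baseline: str, times: dict) -> dict:
    """How many times longer each model takes than `baseline` on the same test set."""
    base = times[baseline]
    return {model: round(t / base, 2) for model, t in times.items()}


for test_set, times in TIMES.items():
    print(test_set)
    for model, factor in slowdown_vs("quickmt-is-en", times).items():
        print(f"  {model:34s} {factor:6.2f}x")
```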
config.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "add_source_bos": false,
3
  "add_source_eos": false,
4
  "bos_token": "<s>",
5
  "decoder_start_token": "<s>",
6
  "eos_token": "</s>",
7
- "layer_norm_epsilon": 1e-06,
8
  "multi_query_attention": false,
9
  "unk_token": "<unk>"
10
  }
 
1
  {
2
+ "add_source_bos": true,
3
  "add_source_eos": false,
4
  "bos_token": "<s>",
5
  "decoder_start_token": "<s>",
6
  "eos_token": "</s>",
7
+ "layer_norm_epsilon": null,
8
  "multi_query_attention": false,
9
  "unk_token": "<unk>"
10
  }
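The functional changes in `config.json` are `add_source_bos` flipping to `true` and `layer_norm_epsilon` becoming `null`. A minimal sketch of what the BOS/EOS flags mean for the source sequence the encoder sees, assuming (as we understand CTranslate2's handling of these fields) that the runtime inserts the special tokens itself, so callers should not add `<s>` manually. The function `special_tokens_for_source` is hypothetical and only illustrates the resulting sequence; it is not CTranslate2 code, and the pieces used are illustrative rather than real output of `src.spm.model`.

```python
import json

# The new config.json from this commit.
NEW_CONFIG = json.loads("""
{
  "add_source_bos": true,
  "add_source_eos": false,
  "bos_token": "<s>",
  "decoder_start_token": "<s>",
  "eos_token": "</s>",
  "layer_norm_epsilon": null,
  "multi_query_attention": false,
  "unk_token": "<unk>"
}
""")


def special_tokens_for_source(pieces, config):
    """Return the source sequence after applying the add_source_* flags (hypothetical helper)."""
    tokens = list(pieces)
    if config.get("add_source_bos"):
        tokens.insert(0, config["bos_token"])
    if config.get("add_source_eos"):
        tokens.append(config["eos_token"])
    return tokens


print(special_tokens_for_source(["▁Halló", "▁heimur"], NEW_CONFIG))
# prints ['<s>', '▁Halló', '▁heimur']; with the old config (add_source_bos: false)
# the pieces would pass through unchanged.
```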
config.yaml ADDED
@@ -0,0 +1,66 @@
1
+
2
+ train:
3
+ experiment_name: "isen1"
4
+ lr: 2.5e-3
5
+ accum_steps: 6
6
+ warmup_steps: 10000
7
+ max_steps: 100000
8
+ eval_steps: 1000
9
+ max_checkpoints: 10
10
+ precision: "bfloat16" # or float16 with an older GPU
11
+ enable_torch_compile: true
12
+ checkpoint_strategy: best
13
+ early_stopping_patience: 0
14
+ early_stopping_metric: chrf
15
+ use_ema: true
16
+ ema_decay: 0.9999
17
+ ema_start_step: 10000
18
+ z_loss_coeff: 0.0005
19
+ weight_decay_embeddings: false
20
+ scheduler_type: "cosine"
21
+
22
+ data:
23
+ src_lang: "is"
24
+ tgt_lang: "en"
25
+ src_dev_path: "quickmt-valid.is-en.is"
26
+ tgt_dev_path: "quickmt-valid.is-en.en"
27
+ input_sentence_size: 10000000
28
+ max_tokens_per_batch: 20000
29
+ buffer_size: 40000
30
+ num_workers: 4
31
+ prefetch_factor: 128
32
+ pad_multiple: 1
33
+ corpora:
34
+ - src_file: "quickmt-train.is-en.is"
35
+ tgt_file: "quickmt-train.is-en.en"
36
+ weight: 10
37
+ start_step: 0
38
+ - src_file: "finetranslations-sample-is-en.is"
39
+ tgt_file: "finetranslations-sample-is-en.en"
40
+ weight: 4
41
+ start_step: 0
42
+ stop_step: 80000
43
+ - src_file: "newscrawl2024-en-backtranslated-is.is"
44
+ tgt_file: "newscrawl2024-en-backtranslated-is.en"
45
+ start_step: 0
46
+ weight: 5
47
+ stop_step: 80000
48
+
49
+ model:
50
+ d_model: 768
51
+ enc_layers: 12
52
+ dec_layers: 2
53
+ n_heads: 12
54
+ ffn_dim: 4096
55
+ max_len: 256
56
+ vocab_size_src: 32000
57
+ vocab_size_tgt: 32000
58
+ norm_type: "rmsnorm"
59
+ mlp_type: "gated"
60
+ activation: "silu"
61
+ ff_bias: false
62
+ layernorm_eps: 1.0e-5
63
+ dropout: 0.1
64
+
65
+ export:
66
+ max_len: 256
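The `train` and `data` sections of `config.yaml` describe a schedule that changes over the run: warmup followed by cosine decay, EMA weights from step 10000, and two of the three corpora switched off at step 80000. A sketch of what those settings imply, under stated assumptions: linear warmup from 0 to `lr`, cosine decay to 0 at `max_steps`, `stop_step` exclusive, corpora sampled in proportion to `weight`, and a single device so that one optimizer step sees `max_tokens_per_batch * accum_steps` tokens. `quickmt-train`'s actual implementation may differ (for example a non-zero floor learning rate), and all function names are hypothetical.

```python
import math

# Values copied from config.yaml in this commit.
LR, WARMUP, MAX_STEPS = 2.5e-3, 10_000, 100_000
MAX_TOKENS_PER_BATCH, ACCUM_STEPS = 20_000, 6
EMA_DECAY, EMA_START = 0.9999, 10_000
CORPORA = [
    {"name": "quickmt-train.is-en", "weight": 10, "start_step": 0, "stop_step": None},
    {"name": "finetranslations-sample-is-en", "weight": 4, "start_step": 0, "stop_step": 80_000},
    {"name": "newscrawl2024-en-backtranslated-is", "weight": 5, "start_step": 0, "stop_step": 80_000},
]

# Assumes a single GPU: tokens contributing to each optimizer update.
TOKENS_PER_UPDATE = MAX_TOKENS_PER_BATCH * ACCUM_STEPS


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to 0 at MAX_STEPS."""
    if step < WARMUP:
        return LR * step / WARMUP
    progress = min(1.0, (step - WARMUP) / (MAX_STEPS - WARMUP))
    return 0.5 * LR * (1.0 + math.cos(math.pi * progress))


def sampling_probs(step: int) -> dict:
    """Share of examples drawn from each corpus active at `step` (stop_step assumed exclusive)."""
    active = [
        c for c in CORPORA
        if c["start_step"] <= step and (c["stop_step"] is None or step < c["stop_step"])
    ]
    total = sum(c["weight"] for c in active)
    return {c["name"]: c["weight"] / total for c in active}


def ema_update(shadow: dict, params: dict, step: int) -> dict:
    """Exponential moving average of weights, on plain floats for illustration."""
    if step < EMA_START:
        # Assumed: before EMA starts, the shadow copy just tracks the raw weights.
        return dict(params)
    return {k: EMA_DECAY * shadow[k] + (1.0 - EMA_DECAY) * params[k] for k in params}


print(TOKENS_PER_UPDATE, lr_at(55_000))
print(sampling_probs(50_000))
print(sampling_probs(90_000))
```

In `metrics.jsonl` the logged loss falls from 1.831 at step 75000 to 1.827 at step 80000, then to 1.770 at step 85000. The steeper drop coincides with the point where this sketch switches to `quickmt-train` only; the log alone does not establish the cause.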
metrics.jsonl ADDED
@@ -0,0 +1,100 @@
1
+ {"step": 1000, "loss": 4.994450536300406, "ppl": 147.59182674848597, "acc": 0.2561425445451704, "bleu": 1.164220201274155, "chrf": 16.973689375539468}
2
+ {"step": 2000, "loss": 2.9462299467362847, "ppl": 19.03405887389251, "acc": 0.5052829009065333, "bleu": 18.761607509943545, "chrf": 46.17091909277533}
3
+ {"step": 3000, "loss": 2.671850914104017, "ppl": 14.46672108553379, "acc": 0.5426695842450766, "bleu": 21.07071474393453, "chrf": 49.35269560974713}
4
+ {"step": 4000, "loss": 2.5037393429831587, "ppl": 12.228133762268097, "acc": 0.5640512660206315, "bleu": 22.48549663052993, "chrf": 51.167411359125204}
5
+ {"step": 5000, "loss": 2.406415891990173, "ppl": 11.094127252871816, "acc": 0.5763676148796499, "bleu": 23.15323743975618, "chrf": 51.475933755821714}
6
+ {"step": 6000, "loss": 2.337301650051774, "ppl": 10.353262112991489, "acc": 0.57786808377618, "bleu": 20.113464528437817, "chrf": 44.12110313558305}
7
+ {"step": 7000, "loss": 2.2539055326127304, "ppl": 9.524862951475631, "acc": 0.5941856830259457, "bleu": 24.80832372875448, "chrf": 51.67041916758161}
8
+ {"step": 8000, "loss": 2.3206392851647974, "ppl": 10.182181543467566, "acc": 0.5888090028133792, "bleu": 13.932007832745292, "chrf": 42.01360381861268}
9
+ {"step": 9000, "loss": 2.2266226918446193, "ppl": 9.268510544385578, "acc": 0.5969990622069397, "bleu": 25.827846658927797, "chrf": 53.293785209000774}
10
+ {"step": 10000, "loss": 2.1821169167244645, "ppl": 8.86505298802123, "acc": 0.6056892778993436, "bleu": 24.954958782293804, "chrf": 53.44992973095211}
11
+ {"step": 11000, "loss": 2.1617847877876875, "ppl": 8.686627618095132, "acc": 0.6086902156924039, "bleu": 25.53191033743962, "chrf": 53.85723320106797}
12
+ {"step": 12000, "loss": 2.1383983329446212, "ppl": 8.48583525579101, "acc": 0.6137542982181932, "bleu": 27.0470782171914, "chrf": 54.55581738183708}
13
+ {"step": 13000, "loss": 2.1169620610505726, "ppl": 8.305866406058117, "acc": 0.6161925601750547, "bleu": 27.2138814312068, "chrf": 54.67266014028607}
14
+ {"step": 14000, "loss": 2.098620139930203, "ppl": 8.154909511457713, "acc": 0.6192560175054704, "bleu": 27.444993629318027, "chrf": 54.86491069472626}
15
+ {"step": 15000, "loss": 2.0836098530546656, "ppl": 8.033416086943097, "acc": 0.6222569552985308, "bleu": 27.78043111083278, "chrf": 55.053479575945865}
16
+ {"step": 16000, "loss": 2.069503783225715, "ppl": 7.920891663192714, "acc": 0.6251953735542357, "bleu": 27.92890061002052, "chrf": 55.11635676587596}
17
+ {"step": 17000, "loss": 2.0570456397201764, "ppl": 7.822824195831242, "acc": 0.6264457643013441, "bleu": 27.878779295969807, "chrf": 55.19455945452377}
18
+ {"step": 18000, "loss": 2.046806539122333, "ppl": 7.743134185167205, "acc": 0.6283838699593624, "bleu": 27.938605871318362, "chrf": 55.17970841469209}
19
+ {"step": 19000, "loss": 2.037328945111021, "ppl": 7.670094569626345, "acc": 0.6285714285714286, "bleu": 28.029919338244245, "chrf": 55.259974697929295}
20
+ {"step": 20000, "loss": 2.0268357787589872, "ppl": 7.590031782064036, "acc": 0.6292591434823382, "bleu": 28.279161341177904, "chrf": 55.47056151897512}
21
+ {"step": 21000, "loss": 2.016149803130617, "ppl": 7.509356701086586, "acc": 0.6313222882150672, "bleu": 28.418614553687238, "chrf": 55.61705677471761}
22
+ {"step": 22000, "loss": 2.0072188200597356, "ppl": 7.442589356322234, "acc": 0.6318224445139106, "bleu": 28.362396673956198, "chrf": 55.635261370938885}
23
+ {"step": 23000, "loss": 1.9979302881210437, "ppl": 7.3737786971120896, "acc": 0.6332603938730853, "bleu": 28.359591563080922, "chrf": 55.62669652445366}
24
+ {"step": 24000, "loss": 1.9894417460168217, "ppl": 7.311450976169526, "acc": 0.6341356673960613, "bleu": 28.375044520543653, "chrf": 55.740701306975446}
25
+ {"step": 25000, "loss": 1.9828298088758205, "ppl": 7.263267590204029, "acc": 0.6350734604563926, "bleu": 28.559750845501792, "chrf": 55.81894077300116}
26
+ {"step": 26000, "loss": 1.974070632424195, "ppl": 7.19992516648463, "acc": 0.6361988121287903, "bleu": 28.600872207096703, "chrf": 55.75644033472818}
27
+ {"step": 27000, "loss": 1.9654939966896394, "ppl": 7.138438084078314, "acc": 0.6365114098155674, "bleu": 28.742154358221875, "chrf": 55.79456357681747}
28
+ {"step": 28000, "loss": 1.9570172685800251, "ppl": 7.078183228128073, "acc": 0.638074398249453, "bleu": 28.920497381125873, "chrf": 55.963029411561905}
29
+ {"step": 29000, "loss": 1.9496945456587995, "ppl": 7.026540965318027, "acc": 0.6385745545482964, "bleu": 28.983978309080356, "chrf": 55.994004661605004}
30
+ {"step": 30000, "loss": 1.9441999582694298, "ppl": 6.988038895299612, "acc": 0.6391997499218506, "bleu": 29.039322766047295, "chrf": 55.955787821536454}
31
+ {"step": 31000, "loss": 1.9386406572061092, "ppl": 6.949298068973334, "acc": 0.6399499843701156, "bleu": 29.075298299505366, "chrf": 56.050221728406655}
32
+ {"step": 32000, "loss": 1.9348651128770709, "ppl": 6.923110153983664, "acc": 0.6395748671459831, "bleu": 29.18013749076382, "chrf": 56.134750789202634}
33
+ {"step": 33000, "loss": 1.9291061377517877, "ppl": 6.883354719971714, "acc": 0.6403251015942482, "bleu": 29.29599523494858, "chrf": 56.12494496076502}
34
+ {"step": 34000, "loss": 1.9238471853692072, "ppl": 6.847250503654076, "acc": 0.6412628946545795, "bleu": 29.22514117810576, "chrf": 56.19178900888207}
35
+ {"step": 35000, "loss": 1.919167904795688, "ppl": 6.815285143160717, "acc": 0.6423882463269772, "bleu": 29.423800910939516, "chrf": 56.28051976487358}
36
+ {"step": 36000, "loss": 1.9154907150244704, "ppl": 6.790270067122464, "acc": 0.6427633635511097, "bleu": 29.56676497278216, "chrf": 56.378205912388864}
37
+ {"step": 37000, "loss": 1.9099493562746659, "ppl": 6.7527468056169875, "acc": 0.6430134417005314, "bleu": 29.724507355987996, "chrf": 56.3955533759033}
38
+ {"step": 38000, "loss": 1.9056084191959402, "ppl": 6.723497088147935, "acc": 0.6439512347608628, "bleu": 29.837468698981745, "chrf": 56.45802937684058}
39
+ {"step": 39000, "loss": 1.9022945631813653, "ppl": 6.701253263655329, "acc": 0.6444513910597062, "bleu": 29.860181788551714, "chrf": 56.448017618511656}
40
+ {"step": 40000, "loss": 1.8991676552364998, "ppl": 6.680331788394988, "acc": 0.6448265082838387, "bleu": 29.836159543988067, "chrf": 56.410599547621345}
41
+ {"step": 41000, "loss": 1.896588106146452, "ppl": 6.663121751219954, "acc": 0.6454517036573929, "bleu": 29.910092217589167, "chrf": 56.495291829629345}
42
+ {"step": 42000, "loss": 1.8937625390136565, "ppl": 6.644321226977929, "acc": 0.6457643013441701, "bleu": 29.84550037175743, "chrf": 56.44456881321764}
43
+ {"step": 43000, "loss": 1.8920155827199419, "ppl": 6.632724021048358, "acc": 0.6458268208815254, "bleu": 29.869545378255758, "chrf": 56.47232280414817}
44
+ {"step": 44000, "loss": 1.8896889851442833, "ppl": 6.617310279161182, "acc": 0.6463269771803689, "bleu": 29.892927828789244, "chrf": 56.56258982197585}
45
+ {"step": 45000, "loss": 1.8889288751733941, "ppl": 6.612282306785575, "acc": 0.6468896530165676, "bleu": 29.85949759139706, "chrf": 56.467059380507834}
46
+ {"step": 46000, "loss": 1.8868934070888852, "ppl": 6.598836905668948, "acc": 0.6473272897780556, "bleu": 29.753671938670145, "chrf": 56.450896978598905}
47
+ {"step": 47000, "loss": 1.885726744594258, "ppl": 6.591142779240017, "acc": 0.647389809315411, "bleu": 29.708054439281167, "chrf": 56.39153318937854}
48
+ {"step": 48000, "loss": 1.8822946913952603, "ppl": 6.5685604007081775, "acc": 0.6480775242263207, "bleu": 29.768954442179517, "chrf": 56.43350439980706}
49
+ {"step": 49000, "loss": 1.8784707959274383, "ppl": 6.543490874533134, "acc": 0.6483901219130979, "bleu": 29.86387516687504, "chrf": 56.553222956271874}
50
+ {"step": 50000, "loss": 1.875717959652919, "ppl": 6.525502486395253, "acc": 0.6483901219130979, "bleu": 29.9431764646548, "chrf": 56.691396815918615}
51
+ {"step": 51000, "loss": 1.8721117835895984, "ppl": 6.502012755037178, "acc": 0.6492028758987184, "bleu": 29.929636873284917, "chrf": 56.680295034811145}
52
+ {"step": 52000, "loss": 1.8692507821047593, "ppl": 6.483437072089493, "acc": 0.6495154735854954, "bleu": 29.89108068645933, "chrf": 56.68084336292139}
53
+ {"step": 53000, "loss": 1.866853663070979, "ppl": 6.4679141143016, "acc": 0.650328227571116, "bleu": 29.928095351705764, "chrf": 56.67216295536798}
54
+ {"step": 54000, "loss": 1.8650410658421088, "ppl": 6.456201009878714, "acc": 0.6506408252578931, "bleu": 30.02093405318264, "chrf": 56.65811632388697}
55
+ {"step": 55000, "loss": 1.863233341504425, "ppl": 6.444540520834505, "acc": 0.6506408252578931, "bleu": 30.033952526175927, "chrf": 56.7120040756438}
56
+ {"step": 56000, "loss": 1.861440289262758, "ppl": 6.4329954765340345, "acc": 0.6506408252578931, "bleu": 29.957227110478136, "chrf": 56.67272279994852}
57
+ {"step": 57000, "loss": 1.8604369926393014, "ppl": 6.426544510551042, "acc": 0.6513910597061582, "bleu": 29.96152325240789, "chrf": 56.70643423675522}
58
+ {"step": 58000, "loss": 1.8573550892979251, "ppl": 6.40676901029315, "acc": 0.6518912160050016, "bleu": 29.962857126854125, "chrf": 56.700294962427975}
59
+ {"step": 59000, "loss": 1.8542828019986715, "ppl": 6.387115780875955, "acc": 0.6523913723038449, "bleu": 29.9999105342699, "chrf": 56.71413288331987}
60
+ {"step": 60000, "loss": 1.8523500309544976, "ppl": 6.374782870624044, "acc": 0.6521412941544232, "bleu": 30.002422333797487, "chrf": 56.72571003647862}
61
+ {"step": 61000, "loss": 1.8497147698743748, "ppl": 6.358005769161304, "acc": 0.6518286964676462, "bleu": 30.088128095780675, "chrf": 56.82286559795213}
62
+ {"step": 62000, "loss": 1.8478954376001289, "ppl": 6.346448960091215, "acc": 0.6521412941544232, "bleu": 30.163192189586514, "chrf": 56.87471933431556}
63
+ {"step": 63000, "loss": 1.846671395571614, "ppl": 6.338685392268225, "acc": 0.6521412941544232, "bleu": 30.294580216517947, "chrf": 56.93527753910208}
64
+ {"step": 64000, "loss": 1.8456072049798278, "ppl": 6.331943410922315, "acc": 0.651953735542357, "bleu": 30.267105355150683, "chrf": 56.92528396280259}
65
+ {"step": 65000, "loss": 1.8433829190396414, "ppl": 6.31787501009265, "acc": 0.6523913723038449, "bleu": 30.26381770494934, "chrf": 56.947207186438845}
66
+ {"step": 66000, "loss": 1.8415594121820296, "ppl": 6.306364819331557, "acc": 0.6525789309159112, "bleu": 30.372801529196206, "chrf": 56.975244819029584}
67
+ {"step": 67000, "loss": 1.8398574414123554, "ppl": 6.295640699404194, "acc": 0.6533916849015318, "bleu": 30.360651092771327, "chrf": 57.02676405176798}
68
+ {"step": 68000, "loss": 1.8397201685951665, "ppl": 6.294776538383643, "acc": 0.6536417630509535, "bleu": 30.31569810757419, "chrf": 57.010332562022306}
69
+ {"step": 69000, "loss": 1.8378186000813241, "ppl": 6.282817963145131, "acc": 0.653579243513598, "bleu": 30.43040186082636, "chrf": 57.0099551426294}
70
+ {"step": 70000, "loss": 1.8366587260545288, "ppl": 6.275534910303006, "acc": 0.6537042825883088, "bleu": 30.36967687853596, "chrf": 56.99462352920415}
71
+ {"step": 71000, "loss": 1.8353953078598781, "ppl": 6.267611291792051, "acc": 0.6535167239762426, "bleu": 30.312364361794366, "chrf": 57.010298640877124}
72
+ {"step": 72000, "loss": 1.8339239809132248, "ppl": 6.258396367153129, "acc": 0.6534542044388871, "bleu": 30.314802903457725, "chrf": 57.02531379076068}
73
+ {"step": 73000, "loss": 1.832636663801188, "ppl": 6.25034500985268, "acc": 0.6532666458268209, "bleu": 30.18211166645784, "chrf": 56.98519639908388}
74
+ {"step": 74000, "loss": 1.8311134483859701, "ppl": 6.24083163528204, "acc": 0.6534542044388871, "bleu": 30.240694916926348, "chrf": 57.005477510723715}
75
+ {"step": 75000, "loss": 1.8313917221446154, "ppl": 6.242568536614085, "acc": 0.6538293216630197, "bleu": 30.25037189042524, "chrf": 57.01518239004976}
76
+ {"step": 76000, "loss": 1.8304509984809707, "ppl": 6.236698766018694, "acc": 0.6540793998124413, "bleu": 30.17670592260045, "chrf": 56.98846895856825}
77
+ {"step": 77000, "loss": 1.8294424893521413, "ppl": 6.230412168957504, "acc": 0.6538293216630197, "bleu": 30.15356603689065, "chrf": 57.05283117752522}
78
+ {"step": 78000, "loss": 1.8289005528766313, "ppl": 6.227036596101329, "acc": 0.6538293216630197, "bleu": 30.219200078052154, "chrf": 57.03607682555588}
79
+ {"step": 79000, "loss": 1.8285974338599464, "ppl": 6.225149348936161, "acc": 0.6541419193497968, "bleu": 30.201865071457675, "chrf": 57.03725790141988}
80
+ {"step": 80000, "loss": 1.8268809665251837, "ppl": 6.214473248634622, "acc": 0.654016880275086, "bleu": 30.246014694107835, "chrf": 57.07984929057122}
81
+ {"step": 81000, "loss": 1.8151666658526102, "ppl": 6.1420997704386275, "acc": 0.6548921537980619, "bleu": 30.30511155087304, "chrf": 57.05979018535777}
82
+ {"step": 82000, "loss": 1.8017998941319553, "ppl": 6.0605459945424975, "acc": 0.6555173491716161, "bleu": 30.386818573469718, "chrf": 57.0528476753652}
83
+ {"step": 83000, "loss": 1.7898324244392778, "ppl": 5.988448864619553, "acc": 0.6561425445451704, "bleu": 30.43094334701287, "chrf": 57.06257176341698}
84
+ {"step": 84000, "loss": 1.7796441400449847, "ppl": 5.9277465955758055, "acc": 0.6569552985307908, "bleu": 30.692067267210952, "chrf": 57.16142812336842}
85
+ {"step": 85000, "loss": 1.7700074883020085, "ppl": 5.8708973242702225, "acc": 0.6575179743669897, "bleu": 30.837943336700793, "chrf": 57.27245831866552}
86
+ {"step": 86000, "loss": 1.7619386177504202, "ppl": 5.823716418033855, "acc": 0.658018130665833, "bleu": 30.885682694104386, "chrf": 57.27633829541885}
87
+ {"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "acc": 0.6585808065020319, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
88
+ {"step": 88000, "loss": 1.7502013912123418, "ppl": 5.755761719121828, "acc": 0.6591434823382307, "bleu": 30.932408305594507, "chrf": 57.24207869793768}
89
+ {"step": 89000, "loss": 1.745357054328501, "ppl": 5.727946298362393, "acc": 0.6593310409502969, "bleu": 30.86580944197795, "chrf": 57.155334441206705}
90
+ {"step": 90000, "loss": 1.7414522467348195, "ppl": 5.705623381871351, "acc": 0.6592685214129416, "bleu": 30.778809073597362, "chrf": 57.10320733293281}
91
+ {"step": 91000, "loss": 1.7385490644048922, "ppl": 5.68908293846694, "acc": 0.6593935604876524, "bleu": 30.814687019400694, "chrf": 57.156128318200786}
92
+ {"step": 92000, "loss": 1.735776736565626, "ppl": 5.673332777848827, "acc": 0.6590809628008752, "bleu": 30.7948046957926, "chrf": 57.06469797598115}
93
+ {"step": 93000, "loss": 1.7336947337058455, "ppl": 5.6615331704513805, "acc": 0.6592685214129416, "bleu": 30.802719069102576, "chrf": 57.094310566383314}
94
+ {"step": 94000, "loss": 1.7316813252798726, "ppl": 5.650145659564158, "acc": 0.6587683651140982, "bleu": 30.847028285926314, "chrf": 57.081001817985936}
95
+ {"step": 95000, "loss": 1.7300995931956276, "ppl": 5.641215707151691, "acc": 0.6586433260393874, "bleu": 30.83037701203774, "chrf": 57.07951804602165}
96
+ {"step": 96000, "loss": 1.7291842680940332, "ppl": 5.636054523252361, "acc": 0.658455767427321, "bleu": 30.874036638473978, "chrf": 57.087228488058415}
97
+ {"step": 97000, "loss": 1.728562312410861, "ppl": 5.6325502369792995, "acc": 0.6587058455767427, "bleu": 30.870366035906606, "chrf": 57.05137890300803}
98
+ {"step": 98000, "loss": 1.7277250108065998, "ppl": 5.627836067496046, "acc": 0.6585182869646765, "bleu": 30.95775184085035, "chrf": 57.088495304174}
99
+ {"step": 99000, "loss": 1.7270162512638227, "ppl": 5.623848698187965, "acc": 0.6588308846514536, "bleu": 30.971976171090674, "chrf": 57.074751985253954}
100
+ {"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "acc": 0.6588308846514536, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
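Each line of `metrics.jsonl` is one evaluation record, written every `eval_steps` (1000) steps. Since `config.yaml` sets `checkpoint_strategy: best` and `early_stopping_metric: chrf`, a natural way to read the log is to look for the step with the highest `chrf`. A sketch on five records copied verbatim from the file; we assume these scores are on the validation files named in `config.yaml` (`quickmt-valid.is-en.*`), not on Flores or Bouquet. It also checks that `ppl` equals `exp(loss)`, which holds for this excerpt.

```python
import json
import math

# Five records copied verbatim from metrics.jsonl (the full file has 100).
EXCERPT = """\
{"step": 80000, "loss": 1.8268809665251837, "ppl": 6.214473248634622, "acc": 0.654016880275086, "bleu": 30.246014694107835, "chrf": 57.07984929057122}
{"step": 85000, "loss": 1.7700074883020085, "ppl": 5.8708973242702225, "acc": 0.6575179743669897, "bleu": 30.837943336700793, "chrf": 57.27245831866552}
{"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "acc": 0.6585808065020319, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
{"step": 90000, "loss": 1.7414522467348195, "ppl": 5.705623381871351, "acc": 0.6592685214129416, "bleu": 30.778809073597362, "chrf": 57.10320733293281}
{"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "acc": 0.6588308846514536, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
"""

records = [json.loads(line) for line in EXCERPT.splitlines() if line.strip()]

# Highest validation chrF, the metric config.yaml names for early stopping.
best = max(records, key=lambda r: r["chrf"])
print(f"best chrF {best['chrf']:.2f} at step {best['step']} (BLEU {best['bleu']:.2f})")
# prints: best chrF 57.30 at step 87000 (BLEU 30.98)

# Perplexity is reported as exp(loss).
ppl_matches = all(math.isclose(r["ppl"], math.exp(r["loss"]), rel_tol=1e-6) for r in records)
print("ppl == exp(loss):", ppl_matches)
```

Step 87000 is also the highest `chrf` in the full file; the final step (100000) scores slightly lower on both `chrf` and `bleu`.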
model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:af874d90330cc235279656d6780eed25689bbcfd8467926a1adce65340c778f8
3
- size 409915789
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f43106bbeb49ef0437a5c0bd61761b28d3c7750723401b72090fa8d0758f7482
3
+ size 399605364
pytorch_model/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6efc214f6d81d13ee58e3c29a8a20c46a9d35e755f58a0ec604a0835f808801
3
+ size 799169344
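The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, a hashed object id, and the size in bytes. A sketch that parses the pointers shown here and does the size arithmetic. The bytes-per-parameter reading is an assumption: if `pytorch_model/model.safetensors` holds float32 weights (4 bytes each) and its header is negligible, its size corresponds to roughly 200M parameters, matching the README's "200M parameter" figure, and `model.bin` at half that size would be consistent with 16-bit weights in the CTranslate2 export.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


# Pointers copied from this commit.
OLD_MODEL_BIN = """version https://git-lfs.github.com/spec/v1
oid sha256:af874d90330cc235279656d6780eed25689bbcfd8467926a1adce65340c778f8
size 409915789
"""
NEW_MODEL_BIN = """version https://git-lfs.github.com/spec/v1
oid sha256:f43106bbeb49ef0437a5c0bd61761b28d3c7750723401b72090fa8d0758f7482
size 399605364
"""
SAFETENSORS = """version https://git-lfs.github.com/spec/v1
oid sha256:f6efc214f6d81d13ee58e3c29a8a20c46a9d35e755f58a0ec604a0835f808801
size 799169344
"""

old_bin, new_bin, st = [parse_lfs_pointer(p) for p in (OLD_MODEL_BIN, NEW_MODEL_BIN, SAFETENSORS)]

print("model.bin size change:", new_bin["size"] - old_bin["size"], "bytes")  # -10310425
# Assumption: float32 safetensors (4 bytes/param), 16-bit model.bin (2 bytes/param).
print(f"~{st['size'] / 4 / 1e6:.1f}M params from safetensors")  # ~199.8M
print(f"~{new_bin['size'] / 2 / 1e6:.1f}M params from model.bin")  # ~199.8M
```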
pytorch_model/tokenizer_src.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c26f3e3e3df69013a62aff1e7b9d90a1838d3f1d7601dbee7fa09b29dcc09754
3
+ size 817478
pytorch_model/tokenizer_src.vocab ADDED
The diff for this file is too large to render.
 
pytorch_model/tokenizer_tgt.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c49b8b83d9b6461a63ae3bd563d035ea72deb18ea4756c16829ffc0c709aea1f
3
+ size 802177
pytorch_model/tokenizer_tgt.vocab ADDED
The diff for this file is too large to render.
 
source_vocabulary.json CHANGED
The diff for this file is too large to render.
 
src.spm.model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:538f374f5558509c152305b8efbea6cc87daa58cfd52dea3bb962c0ad908c797
3
- size 814659
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:df4e8c5fdac389435c77641254f27811bb6709fe6e2a5bdb8fa5ea56900d4d85
3
+ size 817694
target_vocabulary.json CHANGED
The diff for this file is too large to render.
 
tgt.spm.model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ac985ba45c9ec783ae106ecde3c5873db2c14e4a1e76086e1eaf7d48295e9b0f
3
- size 800209
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a5fcc4576244508befdbe68bd7cc13d3d45140dc1e94665a34ecbde300e59141
3
+ size 801740