Upload folder using huggingface_hub
- README.md +52 -38
- config.json +2 -2
- config.yaml +66 -0
- metrics.jsonl +100 -0
- model.bin +2 -2
- pytorch_model/model.safetensors +3 -0
- pytorch_model/tokenizer_src.model +3 -0
- pytorch_model/tokenizer_src.vocab +0 -0
- pytorch_model/tokenizer_tgt.model +3 -0
- pytorch_model/tokenizer_tgt.vocab +0 -0
- source_vocabulary.json +0 -0
- src.spm.model +2 -2
- target_vocabulary.json +0 -0
- tgt.spm.model +2 -2
README.md
CHANGED
|
@@ -1,20 +1,22 @@
|
|
| 1 |
---
|
| 2 |
language:
|
| 3 |
-
- en
|
| 4 |
- is
|
|
|
|
| 5 |
tags:
|
| 6 |
- translation
|
| 7 |
license: cc-by-4.0
|
| 8 |
datasets:
|
| 9 |
- quickmt/quickmt-train.is-en
|
| 10 |
- quickmt/newscrawl2024-en-backtranslated-is
|
|
|
|
|
|
|
| 11 |
model-index:
|
| 12 |
- name: quickmt-is-en
|
| 13 |
results:
|
| 14 |
- task:
|
| 15 |
name: Translation isl-eng
|
| 16 |
type: translation
|
| 17 |
-
args:
|
| 18 |
dataset:
|
| 19 |
name: flores101-devtest
|
| 20 |
type: flores_101
|
|
@@ -22,91 +24,103 @@ model-index:
|
|
| 22 |
metrics:
|
| 23 |
- name: BLEU
|
| 24 |
type: bleu
|
| 25 |
-
value:
|
| 26 |
- name: CHRF
|
| 27 |
type: chrf
|
| 28 |
-
value: 60.
|
| 29 |
-
|
| 30 |
-
type: comet
|
| 31 |
-
value: 85.39
|
| 32 |
---
|
| 33 |
|
| 34 |
-
<a href="https://huggingface.co/spaces/quickmt/quickmt-gui"><img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-lg-dark.svg" alt="Open in Spaces"></a>
|
| 35 |
|
| 36 |
# `quickmt-is-en` Neural Machine Translation Model
|
| 37 |
|
| 38 |
`quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
|
| 39 |
|
|
|
|
|
|
|
| 40 |
|
| 41 |
## Try it on our Huggingface Space
|
| 42 |
|
| 43 |
-
Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-
|
| 44 |
|
| 45 |
|
| 46 |
## Model Information
|
| 47 |
|
| 48 |
-
* Trained using [`
|
| 49 |
-
* 200M parameter
|
| 50 |
* 32k separate Sentencepiece vocabs
|
| 51 |
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
|
| 52 |
-
* The pytorch model (for
|
| 53 |
-
|
| 54 |
-
See the `eole` model configuration in this repository for further details and the `eole-model` for the raw `eole` (pytorch) model.
|
| 55 |
|
| 56 |
|
| 57 |
## Usage with `quickmt`
|
| 58 |
|
| 59 |
-
|
| 60 |
|
| 61 |
-
Next, install the `quickmt`
|
| 62 |
|
| 63 |
```bash
|
| 64 |
git clone https://github.com/quickmt/quickmt.git
|
| 65 |
-
pip install ./quickmt/
|
| 66 |
```
|
| 67 |
|
| 68 |
-
Finally
|
| 69 |
|
| 70 |
```python
|
| 71 |
from quickmt import Translator
|
| 72 |
-
from huggingface_hub import snapshot_download
|
| 73 |
|
| 74 |
-
#
|
| 75 |
-
|
| 76 |
-
t = Translator(
|
| 77 |
-
snapshot_download("quickmt/quickmt-zh-en", ignore_patterns="eole-model/*"),
|
| 78 |
-
device="cpu"
|
| 79 |
-
)
|
| 80 |
|
| 81 |
# Translate - set beam size to 1 for faster speed (but lower quality)
|
| 82 |
sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
|
| 83 |
|
| 84 |
-
|
| 85 |
```
|
| 86 |
|
| 87 |
-
>
|
| 88 |
|
| 89 |
```python
|
| 90 |
# Get alternative translations by sampling
|
| 91 |
# You can pass any cTranslate2 `translate_batch` arguments
|
| 92 |
-
|
| 93 |
```
|
| 94 |
|
| 95 |
-
> 'Dr Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the
|
| 96 |
|
| 97 |
The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
|
| 98 |
|
| 99 |
|
| 100 |
## Metrics
|
| 101 |
|
| 102 |
-
`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores)
|
| 103 |
|
|
|
|
| 104 |
|
| 105 |
-
| | bleu | chrf2 | comet22 | Time (s) |
|
| 106 |
-
|:---------------------------------|-------:|--------:|----------:|-----------:|
|
| 107 |
-
| quickmt/quickmt-is-en | 34.76 | 60.13 | 85.39 | 1.22 |
|
| 108 |
-
| Helsinki-NLP/opus-mt-is-en | 25.91 | 52.03 | 79.99 | 3.5 |
|
| 109 |
-
| facebook/nllb-200-distilled-600M | 30.13 | 54.77 | 82.23 | 21.3 |
|
| 110 |
-
| facebook/nllb-200-distilled-1.3B | 33.71 | 57.73 | 84.71 | 37.21 |
|
| 111 |
-
| facebook/m2m100_418M | 20.38 | 46.47 | 70.95 | 18.8 |
|
| 112 |
-
| facebook/m2m100_1.2B | 28.89 | 54.54 | 81.09 | 34.72 |
|
|
|
|
| 1 |
---
|
| 2 |
language:
|
|
|
|
| 3 |
- is
|
| 4 |
+
- en
|
| 5 |
tags:
|
| 6 |
- translation
|
| 7 |
license: cc-by-4.0
|
| 8 |
datasets:
|
| 9 |
- quickmt/quickmt-train.is-en
|
| 10 |
- quickmt/newscrawl2024-en-backtranslated-is
|
| 11 |
+
- quickmt/finetranslations-sample-is-en
|
| 12 |
+
- HuggingFaceFW/finetranslations
|
| 13 |
model-index:
|
| 14 |
- name: quickmt-is-en
|
| 15 |
results:
|
| 16 |
- task:
|
| 17 |
name: Translation isl-eng
|
| 18 |
type: translation
|
| 19 |
+
args: iso-eng
|
| 20 |
dataset:
|
| 21 |
name: flores101-devtest
|
| 22 |
type: flores_101
|
|
|
|
| 24 |
metrics:
|
| 25 |
- name: BLEU
|
| 26 |
type: bleu
|
| 27 |
+
value: 36.09
|
| 28 |
- name: CHRF
|
| 29 |
type: chrf
|
| 30 |
+
value: 60.91
|
| 31 |
+
|
|
|
|
|
|
|
| 32 |
---
|
| 33 |
|
|
|
|
| 34 |
|
| 35 |
# `quickmt-is-en` Neural Machine Translation Model
|
| 36 |
|
| 37 |
`quickmt-is-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `is` into `en`.
|
| 38 |
|
| 39 |
+
`quickmt` models are roughly 3 times faster for GPU inference than OpusMT models and roughly [40 times](https://huggingface.co/spaces/quickmt/quickmt-vs-libretranslate) faster than [LibreTranslate](https://huggingface.co/spaces/quickmt/quickmt-vs-libretranslate)/[ArgosTranslate](https://github.com/argosopentech/argos-translate).
|
| 40 |
+
|
| 41 |
|
| 42 |
## Try it on our Huggingface Space
|
| 43 |
|
| 44 |
+
Give it a try before downloading here: https://huggingface.co/spaces/quickmt/QuickMT-Demo
|
| 45 |
|
| 46 |
|
| 47 |
## Model Information
|
| 48 |
|
| 49 |
+
* Trained using [`quickmt-train`](https://github.com/quickmt/quickmt-train)
|
| 50 |
+
* 200M parameter seq2seq transformer
|
| 51 |
* 32k separate Sentencepiece vocabs
|
| 52 |
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
|
| 53 |
+
* The pytorch model (for fine-tuning or pytorch inference) is available in this repository in the `pytorch_model` folder
|
| 54 |
+
* Original configuration file: `config.yaml`
|
|
|
|
| 55 |
|
| 56 |
|
| 57 |
## Usage with `quickmt`
|
| 58 |
|
| 59 |
+
For GPU inference, make sure the NVIDIA driver and CUDA toolkit are installed.
|
| 60 |
|
| 61 |
+
Next, install the `quickmt` Python library and download the model:
|
| 62 |
|
| 63 |
```bash
|
| 64 |
git clone https://github.com/quickmt/quickmt.git
|
| 65 |
+
pip install -e ./quickmt/
|
| 66 |
```
|
| 67 |
|
| 68 |
+
Finally, use the model in Python:
|
| 69 |
|
| 70 |
```python
|
| 71 |
from quickmt import Translator
|
|
|
|
| 72 |
|
| 73 |
+
# Auto-detects GPU, set to "cpu" to force CPU inference
|
| 74 |
+
mt = Translator("quickmt/quickmt-is-en", device="auto")
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
|
| 76 |
# Translate - set beam size to 1 for faster speed (but lower quality)
|
| 77 |
sample_text = 'Dr. Ehud Ur, læknaprófessor við Dalhousie-háskólann í Halifax í Nova Scotia og formaður klínískrar vísindadeildar Kanadíska sykursýkissambandsins, minnti á að rannsóknin væri rétt nýhafin.'
|
| 78 |
|
| 79 |
+
mt(sample_text, beam_size=5)
|
| 80 |
```
|
| 81 |
|
| 82 |
+
> "Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia and chair of the Canadian Diabetes Association's clinical science department, recalled that the study had just begun."
|
| 83 |
|
| 84 |
```python
|
| 85 |
# Get alternative translations by sampling
|
| 86 |
# You can pass any cTranslate2 `translate_batch` arguments
|
| 87 |
+
mt([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
|
| 88 |
```
|
| 89 |
|
| 90 |
+
> 'Dr. Ehud Ur, a medical professor at Dalhousie University in Halifax, Nova Scotia, and chair of the Clinical Division of the Canadian Diabetes Association, reminded that the study had just begun.'
|
| 91 |
|
| 92 |
The model is in `ctranslate2` format, and the tokenizers are `sentencepiece`, so you can use `ctranslate2` directly instead of through `quickmt`. It is also possible to get this model to work with e.g. [LibreTranslate](https://libretranslate.com/) which also uses `ctranslate2` and `sentencepiece`. A model in safetensors format to be used with `eole` is also provided.
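
As a rough illustration of using `ctranslate2` and `sentencepiece` directly, here is a minimal sketch. It assumes the `src.spm.model` and `tgt.spm.model` files in this repository are the source and target tokenizers, and that no pre- or post-processing is needed beyond SentencePiece encoding and decoding; `quickmt` itself may do more. The function name `translate_direct` is hypothetical.

```python
def translate_direct(text, repo_id="quickmt/quickmt-is-en", device="cpu", beam_size=5):
    """Translate one string with CTranslate2 and SentencePiece, without quickmt.

    ASSUMPTION: src.spm.model / tgt.spm.model are the source / target tokenizers
    and plain SentencePiece encode/decode is sufficient preprocessing.
    """
    # Imported lazily so the sketch can be loaded without these packages installed.
    import ctranslate2
    import sentencepiece as spm
    from huggingface_hub import snapshot_download

    # Fetch the repository, skipping the PyTorch weights that CTranslate2 does not use.
    model_dir = snapshot_download(repo_id, ignore_patterns=["pytorch_model/*"])

    sp_src = spm.SentencePieceProcessor(model_file=f"{model_dir}/src.spm.model")
    sp_tgt = spm.SentencePieceProcessor(model_file=f"{model_dir}/tgt.spm.model")
    translator = ctranslate2.Translator(model_dir, device=device)

    # CTranslate2 works on lists of subword tokens, not raw strings.
    tokens = sp_src.encode(text, out_type=str)
    result = translator.translate_batch([tokens], beam_size=beam_size)[0]

    # Take the top hypothesis and turn its pieces back into text.
    return sp_tgt.decode(result.hypotheses[0])


# Example call (downloads the model on first use):
# print(translate_direct("Dr. Ehud Ur minnti á að rannsóknin væri rétt nýhafin."))
```

Any other `translate_batch` argument, such as the sampling options shown earlier, could be passed to the same call.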
|
| 93 |
|
| 94 |
|
| 95 |
## Metrics
|
| 96 |
|
| 97 |
+
`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` set](https://huggingface.co/datasets/facebook/flores) and the [Bouquet](https://huggingface.co/datasets/facebook/bouquet) `test` set; `chrf2` is reported in the `chrf` column. The `time` column is the time in seconds to translate the full dataset on an RTX 4070s GPU with batch size 32. LLM inference was done with vLLM and 32 threads.
|
| 98 |
+
|
| 99 |
+
Benchmarks are hard to get right and to make fair. Download this model and see whether it works well for you!
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
### flores devtest
|
| 103 |
+
|
| 104 |
+
| model | time | bleu | chrf |
|
| 105 |
+
|----------------------------------|-------|-------|-------|
|
| 106 |
+
| quickmt-is-en | 0.70 | 47.68 | 65.91 |
|
| 107 |
+
| Helsinki-NLP/opus-mt-is-en | 1.17 | 36.46 | 56.62 |
|
| 108 |
+
| facebook/nllb-200-distilled-1.3B | 8.57 | 40.31 | 60.39 |
|
| 109 |
+
| CohereLabs/tiny-aya-global | 14.22 | 22.26 | 43.01 |
|
| 110 |
+
| google/gemma-4-E2B-it | 23.79 | 36.90 | 57.52 |
|
| 111 |
+
|
| 112 |
+
### bouquet test
|
| 113 |
+
|
| 114 |
+
| model | time | bleu | chrf |
|
| 115 |
+
|----------------------------------|-------|-------|-------|
|
| 116 |
+
| quickmt-is-en | 1.16 | 36.09 | 60.91 |
|
| 117 |
+
| Helsinki-NLP/opus-mt-is-en | 2.33 | 25.26 | 51.44 |
|
| 118 |
+
| facebook/nllb-200-distilled-1.3B | 18.17 | 32.79 | 56.81 |
|
| 119 |
+
| CohereLabs/tiny-aya-global | 27.03 | 16.03 | 40.63 |
|
| 120 |
+
| google/gemma-4-E2B-it | 46.60 | 28.55 | 54.30 |
|
| 121 |
+
|
| 122 |
+
|
| 123 |
+
Prompt for LLM translation:
|
| 124 |
|
| 125 |
+
> Translate the following into {tgt_lang}, without commentary or explanation.\n\n{x}
|
| 126 |
|
config.json
CHANGED
|
@@ -1,10 +1,10 @@
|
|
| 1 |
{
|
| 2 |
-
"add_source_bos":
|
| 3 |
"add_source_eos": false,
|
| 4 |
"bos_token": "<s>",
|
| 5 |
"decoder_start_token": "<s>",
|
| 6 |
"eos_token": "</s>",
|
| 7 |
-
"layer_norm_epsilon":
|
| 8 |
"multi_query_attention": false,
|
| 9 |
"unk_token": "<unk>"
|
| 10 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"add_source_bos": true,
|
| 3 |
"add_source_eos": false,
|
| 4 |
"bos_token": "<s>",
|
| 5 |
"decoder_start_token": "<s>",
|
| 6 |
"eos_token": "</s>",
|
| 7 |
+
"layer_norm_epsilon": null,
|
| 8 |
"multi_query_attention": false,
|
| 9 |
"unk_token": "<unk>"
|
| 10 |
}
|
config.yaml
ADDED
|
@@ -0,0 +1,66 @@
|
|
| 1 |
+
|
| 2 |
+
train:
|
| 3 |
+
experiment_name: "isen1"
|
| 4 |
+
lr: 2.5e-3
|
| 5 |
+
accum_steps: 6
|
| 6 |
+
warmup_steps: 10000
|
| 7 |
+
max_steps: 100000
|
| 8 |
+
eval_steps: 1000
|
| 9 |
+
max_checkpoints: 10
|
| 10 |
+
precision: "bfloat16" # or float16 with an older GPU
|
| 11 |
+
enable_torch_compile: true
|
| 12 |
+
checkpoint_strategy: best
|
| 13 |
+
early_stopping_patience: 0
|
| 14 |
+
early_stopping_metric: chrf
|
| 15 |
+
use_ema: true
|
| 16 |
+
ema_decay: 0.9999
|
| 17 |
+
ema_start_step: 10000
|
| 18 |
+
z_loss_coeff: 0.0005
|
| 19 |
+
weight_decay_embeddings: false
|
| 20 |
+
scheduler_type: "cosine"
|
| 21 |
+
|
| 22 |
+
data:
|
| 23 |
+
src_lang: "is"
|
| 24 |
+
tgt_lang: "en"
|
| 25 |
+
src_dev_path: "quickmt-valid.is-en.is"
|
| 26 |
+
tgt_dev_path: "quickmt-valid.is-en.en"
|
| 27 |
+
input_sentence_size: 10000000
|
| 28 |
+
max_tokens_per_batch: 20000
|
| 29 |
+
buffer_size: 40000
|
| 30 |
+
num_workers: 4
|
| 31 |
+
prefetch_factor: 128
|
| 32 |
+
pad_multiple: 1
|
| 33 |
+
corpora:
|
| 34 |
+
- src_file: "quickmt-train.is-en.is"
|
| 35 |
+
tgt_file: "quickmt-train.is-en.en"
|
| 36 |
+
weight: 10
|
| 37 |
+
start_step: 0
|
| 38 |
+
- src_file: "finetranslations-sample-is-en.is"
|
| 39 |
+
tgt_file: "finetranslations-sample-is-en.en"
|
| 40 |
+
weight: 4
|
| 41 |
+
start_step: 0
|
| 42 |
+
stop_step: 80000
|
| 43 |
+
- src_file: "newscrawl2024-en-backtranslated-is.is"
|
| 44 |
+
tgt_file: "newscrawl2024-en-backtranslated-is.en"
|
| 45 |
+
start_step: 0
|
| 46 |
+
weight: 5
|
| 47 |
+
stop_step: 80000
|
| 48 |
+
|
| 49 |
+
model:
|
| 50 |
+
d_model: 768
|
| 51 |
+
enc_layers: 12
|
| 52 |
+
dec_layers: 2
|
| 53 |
+
n_heads: 12
|
| 54 |
+
ffn_dim: 4096
|
| 55 |
+
max_len: 256
|
| 56 |
+
vocab_size_src: 32000
|
| 57 |
+
vocab_size_tgt: 32000
|
| 58 |
+
norm_type: "rmsnorm"
|
| 59 |
+
mlp_type: "gated"
|
| 60 |
+
activation: "silu"
|
| 61 |
+
ff_bias: false
|
| 62 |
+
layernorm_eps: 1.0e-5
|
| 63 |
+
dropout: 0.1
|
| 64 |
+
|
| 65 |
+
export:
|
| 66 |
+
max_len: 256
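
The `corpora` entries in this config carry a `weight` and optional `start_step` / `stop_step`. One plausible reading, sketched below, is that weights are normalized into sampling probabilities over whichever corpora are active at a given step. This is an assumption about how `quickmt-train` interprets these fields, not something the config itself confirms; in particular, treating `stop_step` as exclusive is a guess, and `sampling_probs` is a hypothetical helper.

```python
# Corpus entries copied from the `data.corpora` section of config.yaml.
CORPORA = [
    {"name": "quickmt-train.is-en", "weight": 10, "start_step": 0},
    {"name": "finetranslations-sample-is-en", "weight": 4,
     "start_step": 0, "stop_step": 80000},
    {"name": "newscrawl2024-en-backtranslated-is", "weight": 5,
     "start_step": 0, "stop_step": 80000},
]


def sampling_probs(step, corpora=CORPORA):
    """Return {corpus name: probability} for the corpora active at `step`.

    ASSUMPTION: a corpus is active while start_step <= step < stop_step
    (no stop_step means "until the end"), and weights are normalized over
    the active corpora only.
    """
    active = [
        c for c in corpora
        if c["start_step"] <= step
        and (c.get("stop_step") is None or step < c["stop_step"])
    ]
    total = sum(c["weight"] for c in active)
    if total == 0:
        return {}
    return {c["name"]: c["weight"] / total for c in active}


for step in (0, 80000):
    print(step, sampling_probs(step))
```

Under these assumptions the mix would be 10/19, 4/19 and 5/19 for the first 80,000 steps, and only `quickmt-train.is-en` for the last 20,000.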
|
metrics.jsonl
ADDED
|
@@ -0,0 +1,100 @@
|
|
| 1 |
+
{"step": 1000, "loss": 4.994450536300406, "ppl": 147.59182674848597, "acc": 0.2561425445451704, "bleu": 1.164220201274155, "chrf": 16.973689375539468}
|
| 2 |
+
{"step": 2000, "loss": 2.9462299467362847, "ppl": 19.03405887389251, "acc": 0.5052829009065333, "bleu": 18.761607509943545, "chrf": 46.17091909277533}
|
| 3 |
+
{"step": 3000, "loss": 2.671850914104017, "ppl": 14.46672108553379, "acc": 0.5426695842450766, "bleu": 21.07071474393453, "chrf": 49.35269560974713}
|
| 4 |
+
{"step": 4000, "loss": 2.5037393429831587, "ppl": 12.228133762268097, "acc": 0.5640512660206315, "bleu": 22.48549663052993, "chrf": 51.167411359125204}
|
| 5 |
+
{"step": 5000, "loss": 2.406415891990173, "ppl": 11.094127252871816, "acc": 0.5763676148796499, "bleu": 23.15323743975618, "chrf": 51.475933755821714}
|
| 6 |
+
{"step": 6000, "loss": 2.337301650051774, "ppl": 10.353262112991489, "acc": 0.57786808377618, "bleu": 20.113464528437817, "chrf": 44.12110313558305}
|
| 7 |
+
{"step": 7000, "loss": 2.2539055326127304, "ppl": 9.524862951475631, "acc": 0.5941856830259457, "bleu": 24.80832372875448, "chrf": 51.67041916758161}
|
| 8 |
+
{"step": 8000, "loss": 2.3206392851647974, "ppl": 10.182181543467566, "acc": 0.5888090028133792, "bleu": 13.932007832745292, "chrf": 42.01360381861268}
|
| 9 |
+
{"step": 9000, "loss": 2.2266226918446193, "ppl": 9.268510544385578, "acc": 0.5969990622069397, "bleu": 25.827846658927797, "chrf": 53.293785209000774}
|
| 10 |
+
{"step": 10000, "loss": 2.1821169167244645, "ppl": 8.86505298802123, "acc": 0.6056892778993436, "bleu": 24.954958782293804, "chrf": 53.44992973095211}
|
| 11 |
+
{"step": 11000, "loss": 2.1617847877876875, "ppl": 8.686627618095132, "acc": 0.6086902156924039, "bleu": 25.53191033743962, "chrf": 53.85723320106797}
|
| 12 |
+
{"step": 12000, "loss": 2.1383983329446212, "ppl": 8.48583525579101, "acc": 0.6137542982181932, "bleu": 27.0470782171914, "chrf": 54.55581738183708}
|
| 13 |
+
{"step": 13000, "loss": 2.1169620610505726, "ppl": 8.305866406058117, "acc": 0.6161925601750547, "bleu": 27.2138814312068, "chrf": 54.67266014028607}
|
| 14 |
+
{"step": 14000, "loss": 2.098620139930203, "ppl": 8.154909511457713, "acc": 0.6192560175054704, "bleu": 27.444993629318027, "chrf": 54.86491069472626}
|
| 15 |
+
{"step": 15000, "loss": 2.0836098530546656, "ppl": 8.033416086943097, "acc": 0.6222569552985308, "bleu": 27.78043111083278, "chrf": 55.053479575945865}
|
| 16 |
+
{"step": 16000, "loss": 2.069503783225715, "ppl": 7.920891663192714, "acc": 0.6251953735542357, "bleu": 27.92890061002052, "chrf": 55.11635676587596}
|
| 17 |
+
{"step": 17000, "loss": 2.0570456397201764, "ppl": 7.822824195831242, "acc": 0.6264457643013441, "bleu": 27.878779295969807, "chrf": 55.19455945452377}
|
| 18 |
+
{"step": 18000, "loss": 2.046806539122333, "ppl": 7.743134185167205, "acc": 0.6283838699593624, "bleu": 27.938605871318362, "chrf": 55.17970841469209}
|
| 19 |
+
{"step": 19000, "loss": 2.037328945111021, "ppl": 7.670094569626345, "acc": 0.6285714285714286, "bleu": 28.029919338244245, "chrf": 55.259974697929295}
|
| 20 |
+
{"step": 20000, "loss": 2.0268357787589872, "ppl": 7.590031782064036, "acc": 0.6292591434823382, "bleu": 28.279161341177904, "chrf": 55.47056151897512}
|
| 21 |
+
{"step": 21000, "loss": 2.016149803130617, "ppl": 7.509356701086586, "acc": 0.6313222882150672, "bleu": 28.418614553687238, "chrf": 55.61705677471761}
|
| 22 |
+
{"step": 22000, "loss": 2.0072188200597356, "ppl": 7.442589356322234, "acc": 0.6318224445139106, "bleu": 28.362396673956198, "chrf": 55.635261370938885}
|
| 23 |
+
{"step": 23000, "loss": 1.9979302881210437, "ppl": 7.3737786971120896, "acc": 0.6332603938730853, "bleu": 28.359591563080922, "chrf": 55.62669652445366}
|
| 24 |
+
{"step": 24000, "loss": 1.9894417460168217, "ppl": 7.311450976169526, "acc": 0.6341356673960613, "bleu": 28.375044520543653, "chrf": 55.740701306975446}
|
| 25 |
+
{"step": 25000, "loss": 1.9828298088758205, "ppl": 7.263267590204029, "acc": 0.6350734604563926, "bleu": 28.559750845501792, "chrf": 55.81894077300116}
|
| 26 |
+
{"step": 26000, "loss": 1.974070632424195, "ppl": 7.19992516648463, "acc": 0.6361988121287903, "bleu": 28.600872207096703, "chrf": 55.75644033472818}
|
| 27 |
+
{"step": 27000, "loss": 1.9654939966896394, "ppl": 7.138438084078314, "acc": 0.6365114098155674, "bleu": 28.742154358221875, "chrf": 55.79456357681747}
|
| 28 |
+
{"step": 28000, "loss": 1.9570172685800251, "ppl": 7.078183228128073, "acc": 0.638074398249453, "bleu": 28.920497381125873, "chrf": 55.963029411561905}
|
| 29 |
+
{"step": 29000, "loss": 1.9496945456587995, "ppl": 7.026540965318027, "acc": 0.6385745545482964, "bleu": 28.983978309080356, "chrf": 55.994004661605004}
|
| 30 |
+
{"step": 30000, "loss": 1.9441999582694298, "ppl": 6.988038895299612, "acc": 0.6391997499218506, "bleu": 29.039322766047295, "chrf": 55.955787821536454}
|
| 31 |
+
{"step": 31000, "loss": 1.9386406572061092, "ppl": 6.949298068973334, "acc": 0.6399499843701156, "bleu": 29.075298299505366, "chrf": 56.050221728406655}
|
| 32 |
+
{"step": 32000, "loss": 1.9348651128770709, "ppl": 6.923110153983664, "acc": 0.6395748671459831, "bleu": 29.18013749076382, "chrf": 56.134750789202634}
|
| 33 |
+
{"step": 33000, "loss": 1.9291061377517877, "ppl": 6.883354719971714, "acc": 0.6403251015942482, "bleu": 29.29599523494858, "chrf": 56.12494496076502}
|
| 34 |
+
{"step": 34000, "loss": 1.9238471853692072, "ppl": 6.847250503654076, "acc": 0.6412628946545795, "bleu": 29.22514117810576, "chrf": 56.19178900888207}
|
| 35 |
+
{"step": 35000, "loss": 1.919167904795688, "ppl": 6.815285143160717, "acc": 0.6423882463269772, "bleu": 29.423800910939516, "chrf": 56.28051976487358}
|
| 36 |
+
{"step": 36000, "loss": 1.9154907150244704, "ppl": 6.790270067122464, "acc": 0.6427633635511097, "bleu": 29.56676497278216, "chrf": 56.378205912388864}
|
| 37 |
+
{"step": 37000, "loss": 1.9099493562746659, "ppl": 6.7527468056169875, "acc": 0.6430134417005314, "bleu": 29.724507355987996, "chrf": 56.3955533759033}
|
| 38 |
+
{"step": 38000, "loss": 1.9056084191959402, "ppl": 6.723497088147935, "acc": 0.6439512347608628, "bleu": 29.837468698981745, "chrf": 56.45802937684058}
|
| 39 |
+
{"step": 39000, "loss": 1.9022945631813653, "ppl": 6.701253263655329, "acc": 0.6444513910597062, "bleu": 29.860181788551714, "chrf": 56.448017618511656}
|
| 40 |
+
{"step": 40000, "loss": 1.8991676552364998, "ppl": 6.680331788394988, "acc": 0.6448265082838387, "bleu": 29.836159543988067, "chrf": 56.410599547621345}
|
| 41 |
+
{"step": 41000, "loss": 1.896588106146452, "ppl": 6.663121751219954, "acc": 0.6454517036573929, "bleu": 29.910092217589167, "chrf": 56.495291829629345}
|
| 42 |
+
{"step": 42000, "loss": 1.8937625390136565, "ppl": 6.644321226977929, "acc": 0.6457643013441701, "bleu": 29.84550037175743, "chrf": 56.44456881321764}
|
| 43 |
+
{"step": 43000, "loss": 1.8920155827199419, "ppl": 6.632724021048358, "acc": 0.6458268208815254, "bleu": 29.869545378255758, "chrf": 56.47232280414817}
|
| 44 |
+
{"step": 44000, "loss": 1.8896889851442833, "ppl": 6.617310279161182, "acc": 0.6463269771803689, "bleu": 29.892927828789244, "chrf": 56.56258982197585}
|
| 45 |
+
{"step": 45000, "loss": 1.8889288751733941, "ppl": 6.612282306785575, "acc": 0.6468896530165676, "bleu": 29.85949759139706, "chrf": 56.467059380507834}
|
| 46 |
+
{"step": 46000, "loss": 1.8868934070888852, "ppl": 6.598836905668948, "acc": 0.6473272897780556, "bleu": 29.753671938670145, "chrf": 56.450896978598905}
|
| 47 |
+
{"step": 47000, "loss": 1.885726744594258, "ppl": 6.591142779240017, "acc": 0.647389809315411, "bleu": 29.708054439281167, "chrf": 56.39153318937854}
|
| 48 |
+
{"step": 48000, "loss": 1.8822946913952603, "ppl": 6.5685604007081775, "acc": 0.6480775242263207, "bleu": 29.768954442179517, "chrf": 56.43350439980706}
|
| 49 |
+
{"step": 49000, "loss": 1.8784707959274383, "ppl": 6.543490874533134, "acc": 0.6483901219130979, "bleu": 29.86387516687504, "chrf": 56.553222956271874}
|
| 50 |
+
{"step": 50000, "loss": 1.875717959652919, "ppl": 6.525502486395253, "acc": 0.6483901219130979, "bleu": 29.9431764646548, "chrf": 56.691396815918615}
|
| 51 |
+
{"step": 51000, "loss": 1.8721117835895984, "ppl": 6.502012755037178, "acc": 0.6492028758987184, "bleu": 29.929636873284917, "chrf": 56.680295034811145}
|
| 52 |
+
{"step": 52000, "loss": 1.8692507821047593, "ppl": 6.483437072089493, "acc": 0.6495154735854954, "bleu": 29.89108068645933, "chrf": 56.68084336292139}
|
| 53 |
+
{"step": 53000, "loss": 1.866853663070979, "ppl": 6.4679141143016, "acc": 0.650328227571116, "bleu": 29.928095351705764, "chrf": 56.67216295536798}
|
| 54 |
+
{"step": 54000, "loss": 1.8650410658421088, "ppl": 6.456201009878714, "acc": 0.6506408252578931, "bleu": 30.02093405318264, "chrf": 56.65811632388697}
|
| 55 |
+
{"step": 55000, "loss": 1.863233341504425, "ppl": 6.444540520834505, "acc": 0.6506408252578931, "bleu": 30.033952526175927, "chrf": 56.7120040756438}
|
| 56 |
+
{"step": 56000, "loss": 1.861440289262758, "ppl": 6.4329954765340345, "acc": 0.6506408252578931, "bleu": 29.957227110478136, "chrf": 56.67272279994852}
|
| 57 |
+
{"step": 57000, "loss": 1.8604369926393014, "ppl": 6.426544510551042, "acc": 0.6513910597061582, "bleu": 29.96152325240789, "chrf": 56.70643423675522}
|
| 58 |
+
{"step": 58000, "loss": 1.8573550892979251, "ppl": 6.40676901029315, "acc": 0.6518912160050016, "bleu": 29.962857126854125, "chrf": 56.700294962427975}
|
| 59 |
+
{"step": 59000, "loss": 1.8542828019986715, "ppl": 6.387115780875955, "acc": 0.6523913723038449, "bleu": 29.9999105342699, "chrf": 56.71413288331987}
|
| 60 |
+
{"step": 60000, "loss": 1.8523500309544976, "ppl": 6.374782870624044, "acc": 0.6521412941544232, "bleu": 30.002422333797487, "chrf": 56.72571003647862}
|
| 61 |
+
{"step": 61000, "loss": 1.8497147698743748, "ppl": 6.358005769161304, "acc": 0.6518286964676462, "bleu": 30.088128095780675, "chrf": 56.82286559795213}
|
| 62 |
+
{"step": 62000, "loss": 1.8478954376001289, "ppl": 6.346448960091215, "acc": 0.6521412941544232, "bleu": 30.163192189586514, "chrf": 56.87471933431556}
|
| 63 |
+
{"step": 63000, "loss": 1.846671395571614, "ppl": 6.338685392268225, "acc": 0.6521412941544232, "bleu": 30.294580216517947, "chrf": 56.93527753910208}
|
| 64 |
+
{"step": 64000, "loss": 1.8456072049798278, "ppl": 6.331943410922315, "acc": 0.651953735542357, "bleu": 30.267105355150683, "chrf": 56.92528396280259}
|
| 65 |
+
{"step": 65000, "loss": 1.8433829190396414, "ppl": 6.31787501009265, "acc": 0.6523913723038449, "bleu": 30.26381770494934, "chrf": 56.947207186438845}
|
| 66 |
+
{"step": 66000, "loss": 1.8415594121820296, "ppl": 6.306364819331557, "acc": 0.6525789309159112, "bleu": 30.372801529196206, "chrf": 56.975244819029584}
|
| 67 |
+
{"step": 67000, "loss": 1.8398574414123554, "ppl": 6.295640699404194, "acc": 0.6533916849015318, "bleu": 30.360651092771327, "chrf": 57.02676405176798}
|
| 68 |
+
{"step": 68000, "loss": 1.8397201685951665, "ppl": 6.294776538383643, "acc": 0.6536417630509535, "bleu": 30.31569810757419, "chrf": 57.010332562022306}
|
| 69 |
+
{"step": 69000, "loss": 1.8378186000813241, "ppl": 6.282817963145131, "acc": 0.653579243513598, "bleu": 30.43040186082636, "chrf": 57.0099551426294}
|
| 70 |
+
{"step": 70000, "loss": 1.8366587260545288, "ppl": 6.275534910303006, "acc": 0.6537042825883088, "bleu": 30.36967687853596, "chrf": 56.99462352920415}
|
| 71 |
+
{"step": 71000, "loss": 1.8353953078598781, "ppl": 6.267611291792051, "acc": 0.6535167239762426, "bleu": 30.312364361794366, "chrf": 57.010298640877124}
|
| 72 |
+
{"step": 72000, "loss": 1.8339239809132248, "ppl": 6.258396367153129, "acc": 0.6534542044388871, "bleu": 30.314802903457725, "chrf": 57.02531379076068}
|
| 73 |
+
{"step": 73000, "loss": 1.832636663801188, "ppl": 6.25034500985268, "acc": 0.6532666458268209, "bleu": 30.18211166645784, "chrf": 56.98519639908388}
|
| 74 |
+
{"step": 74000, "loss": 1.8311134483859701, "ppl": 6.24083163528204, "acc": 0.6534542044388871, "bleu": 30.240694916926348, "chrf": 57.005477510723715}
|
| 75 |
+
{"step": 75000, "loss": 1.8313917221446154, "ppl": 6.242568536614085, "acc": 0.6538293216630197, "bleu": 30.25037189042524, "chrf": 57.01518239004976}
|
| 76 |
+
{"step": 76000, "loss": 1.8304509984809707, "ppl": 6.236698766018694, "acc": 0.6540793998124413, "bleu": 30.17670592260045, "chrf": 56.98846895856825}
|
| 77 |
+
{"step": 77000, "loss": 1.8294424893521413, "ppl": 6.230412168957504, "acc": 0.6538293216630197, "bleu": 30.15356603689065, "chrf": 57.05283117752522}
|
| 78 |
+
{"step": 78000, "loss": 1.8289005528766313, "ppl": 6.227036596101329, "acc": 0.6538293216630197, "bleu": 30.219200078052154, "chrf": 57.03607682555588}
|
| 79 |
+
{"step": 79000, "loss": 1.8285974338599464, "ppl": 6.225149348936161, "acc": 0.6541419193497968, "bleu": 30.201865071457675, "chrf": 57.03725790141988}
|
| 80 |
+
{"step": 80000, "loss": 1.8268809665251837, "ppl": 6.214473248634622, "acc": 0.654016880275086, "bleu": 30.246014694107835, "chrf": 57.07984929057122}
|
| 81 |
+
{"step": 81000, "loss": 1.8151666658526102, "ppl": 6.1420997704386275, "acc": 0.6548921537980619, "bleu": 30.30511155087304, "chrf": 57.05979018535777}
|
| 82 |
+
{"step": 82000, "loss": 1.8017998941319553, "ppl": 6.0605459945424975, "acc": 0.6555173491716161, "bleu": 30.386818573469718, "chrf": 57.0528476753652}
|
| 83 |
+
{"step": 83000, "loss": 1.7898324244392778, "ppl": 5.988448864619553, "acc": 0.6561425445451704, "bleu": 30.43094334701287, "chrf": 57.06257176341698}
|
| 84 |
+
{"step": 84000, "loss": 1.7796441400449847, "ppl": 5.9277465955758055, "acc": 0.6569552985307908, "bleu": 30.692067267210952, "chrf": 57.16142812336842}
|
| 85 |
+
{"step": 85000, "loss": 1.7700074883020085, "ppl": 5.8708973242702225, "acc": 0.6575179743669897, "bleu": 30.837943336700793, "chrf": 57.27245831866552}
|
| 86 |
+
{"step": 86000, "loss": 1.7619386177504202, "ppl": 5.823716418033855, "acc": 0.658018130665833, "bleu": 30.885682694104386, "chrf": 57.27633829541885}
|
| 87 |
+
{"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "acc": 0.6585808065020319, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
|
| 88 |
+
{"step": 88000, "loss": 1.7502013912123418, "ppl": 5.755761719121828, "acc": 0.6591434823382307, "bleu": 30.932408305594507, "chrf": 57.24207869793768}
|
| 89 |
+
{"step": 89000, "loss": 1.745357054328501, "ppl": 5.727946298362393, "acc": 0.6593310409502969, "bleu": 30.86580944197795, "chrf": 57.155334441206705}
|
| 90 |
+
{"step": 90000, "loss": 1.7414522467348195, "ppl": 5.705623381871351, "acc": 0.6592685214129416, "bleu": 30.778809073597362, "chrf": 57.10320733293281}
|
| 91 |
+
{"step": 91000, "loss": 1.7385490644048922, "ppl": 5.68908293846694, "acc": 0.6593935604876524, "bleu": 30.814687019400694, "chrf": 57.156128318200786}
|
| 92 |
+
{"step": 92000, "loss": 1.735776736565626, "ppl": 5.673332777848827, "acc": 0.6590809628008752, "bleu": 30.7948046957926, "chrf": 57.06469797598115}
|
| 93 |
+
{"step": 93000, "loss": 1.7336947337058455, "ppl": 5.6615331704513805, "acc": 0.6592685214129416, "bleu": 30.802719069102576, "chrf": 57.094310566383314}
|
| 94 |
+
{"step": 94000, "loss": 1.7316813252798726, "ppl": 5.650145659564158, "acc": 0.6587683651140982, "bleu": 30.847028285926314, "chrf": 57.081001817985936}
|
| 95 |
+
{"step": 95000, "loss": 1.7300995931956276, "ppl": 5.641215707151691, "acc": 0.6586433260393874, "bleu": 30.83037701203774, "chrf": 57.07951804602165}
|
| 96 |
+
{"step": 96000, "loss": 1.7291842680940332, "ppl": 5.636054523252361, "acc": 0.658455767427321, "bleu": 30.874036638473978, "chrf": 57.087228488058415}
|
| 97 |
+
{"step": 97000, "loss": 1.728562312410861, "ppl": 5.6325502369792995, "acc": 0.6587058455767427, "bleu": 30.870366035906606, "chrf": 57.05137890300803}
|
| 98 |
+
{"step": 98000, "loss": 1.7277250108065998, "ppl": 5.627836067496046, "acc": 0.6585182869646765, "bleu": 30.95775184085035, "chrf": 57.088495304174}
|
| 99 |
+
{"step": 99000, "loss": 1.7270162512638227, "ppl": 5.623848698187965, "acc": 0.6588308846514536, "bleu": 30.971976171090674, "chrf": 57.074751985253954}
|
| 100 |
+
{"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "acc": 0.6588308846514536, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
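
Each line of `metrics.jsonl` is one JSON object, so the log is easy to inspect. The sketch below embeds three records copied from the log (not the whole file), picks the one with the highest `chrf`, and checks that `ppl` matches `exp(loss)`. The helper names are hypothetical, and which checkpoint was actually exported is not stated in this commit.

```python
import json
import math

# Three records copied verbatim from metrics.jsonl; the real file has 100 lines.
SAMPLE_JSONL = """\
{"step": 80000, "loss": 1.8268809665251837, "ppl": 6.214473248634622, "acc": 0.654016880275086, "bleu": 30.246014694107835, "chrf": 57.07984929057122}
{"step": 87000, "loss": 1.7556014742766295, "ppl": 5.786927383356287, "acc": 0.6585808065020319, "bleu": 30.980860506825476, "chrf": 57.3046725438474}
{"step": 100000, "loss": 1.726466431160128, "ppl": 5.620757443008561, "acc": 0.6588308846514536, "bleu": 30.977369378486593, "chrf": 57.08608812701335}
"""


def load_metrics(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def best_record(records, metric="chrf"):
    """Return the record with the highest value of `metric`."""
    return max(records, key=lambda r: r[metric])


records = load_metrics(SAMPLE_JSONL)
best = best_record(records, metric="chrf")
print("best step by chrf:", best["step"])

# The logged perplexity is consistent with exp(loss).
for r in records:
    assert math.isclose(math.exp(r["loss"]), r["ppl"], rel_tol=1e-6)
```

To run this on the full log, replace `SAMPLE_JSONL` with the contents of `metrics.jsonl` read from disk.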
|
model.bin
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f43106bbeb49ef0437a5c0bd61761b28d3c7750723401b72090fa8d0758f7482
|
| 3 |
+
size 399605364
|
pytorch_model/model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f6efc214f6d81d13ee58e3c29a8a20c46a9d35e755f58a0ec604a0835f808801
|
| 3 |
+
size 799169344
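
As a sanity check on the "200M parameter" figure in the README, the file sizes in the Git LFS pointers can be turned into a rough parameter count. The sketch assumes the safetensors file stores 32-bit floats and `model.bin` stores 16-bit values, and it ignores file headers and metadata; neither storage type is stated in this commit.

```python
# Sizes in bytes, copied from the Git LFS pointers in this commit.
SAFETENSORS_BYTES = 799_169_344  # pytorch_model/model.safetensors
CT2_BIN_BYTES = 399_605_364      # model.bin (CTranslate2)


def approx_params(n_bytes, bytes_per_param):
    """Rough parameter count: file size divided by bytes per weight."""
    return n_bytes / bytes_per_param


fp32_estimate = approx_params(SAFETENSORS_BYTES, 4)  # ASSUMPTION: float32 weights
fp16_estimate = approx_params(CT2_BIN_BYTES, 2)      # ASSUMPTION: 16-bit weights

print(f"{fp32_estimate / 1e6:.1f}M, {fp16_estimate / 1e6:.1f}M")
```

Both estimates come out just under 200 million, which is consistent with the stated model size under these assumptions.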
|
pytorch_model/tokenizer_src.model
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c26f3e3e3df69013a62aff1e7b9d90a1838d3f1d7601dbee7fa09b29dcc09754
|
| 3 |
+
size 817478
|
pytorch_model/tokenizer_src.vocab
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
pytorch_model/tokenizer_tgt.model
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c49b8b83d9b6461a63ae3bd563d035ea72deb18ea4756c16829ffc0c709aea1f
|
| 3 |
+
size 802177
|
pytorch_model/tokenizer_tgt.vocab
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
source_vocabulary.json
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|
src.spm.model
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:df4e8c5fdac389435c77641254f27811bb6709fe6e2a5bdb8fa5ea56900d4d85
|
| 3 |
+
size 817694
|
target_vocabulary.json
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|
tgt.spm.model
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a5fcc4576244508befdbe68bd7cc13d3d45140dc1e94665a34ecbde300e59141
|
| 3 |
+
size 801740
|