Reza-paji committed on Sep 16, 2025

Commit

8339415

verified ·

1 Parent(s): 423a40a

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

glot-contrastive-final-lora/README.md +206 -0
glot-contrastive-final-lora/adapter_config.json +37 -0
glot-contrastive-final-lora/adapter_model.safetensors +3 -0
glot-contrastive-final-lora/checkpoint-1000/README.md +206 -0
glot-contrastive-final-lora/checkpoint-1000/adapter_config.json +37 -0
glot-contrastive-final-lora/checkpoint-1000/adapter_model.safetensors +3 -0
glot-contrastive-final-lora/checkpoint-1000/optimizer.pt +3 -0
glot-contrastive-final-lora/checkpoint-1000/rng_state.pth +3 -0
glot-contrastive-final-lora/checkpoint-1000/scheduler.pt +3 -0
glot-contrastive-final-lora/checkpoint-1000/sentencepiece.bpe.model +3 -0
glot-contrastive-final-lora/checkpoint-1000/special_tokens_map.json +15 -0
glot-contrastive-final-lora/checkpoint-1000/tokenizer_config.json +57 -0
glot-contrastive-final-lora/checkpoint-1000/trainer_state.json +1434 -0
glot-contrastive-final-lora/checkpoint-1000/training_args.bin +3 -0
glot-contrastive-final-lora/checkpoint-1500/README.md +206 -0
glot-contrastive-final-lora/checkpoint-1500/adapter_config.json +37 -0
glot-contrastive-final-lora/checkpoint-1500/adapter_model.safetensors +3 -0
glot-contrastive-final-lora/checkpoint-1500/optimizer.pt +3 -0
glot-contrastive-final-lora/checkpoint-1500/rng_state.pth +3 -0
glot-contrastive-final-lora/checkpoint-1500/scheduler.pt +3 -0
glot-contrastive-final-lora/checkpoint-1500/sentencepiece.bpe.model +3 -0
glot-contrastive-final-lora/checkpoint-1500/special_tokens_map.json +15 -0
glot-contrastive-final-lora/checkpoint-1500/tokenizer_config.json +57 -0
glot-contrastive-final-lora/checkpoint-1500/trainer_state.json +2134 -0
glot-contrastive-final-lora/checkpoint-1500/training_args.bin +3 -0
glot-contrastive-final-lora/checkpoint-2000/README.md +206 -0
glot-contrastive-final-lora/checkpoint-2000/adapter_config.json +37 -0
glot-contrastive-final-lora/checkpoint-2000/adapter_model.safetensors +3 -0
glot-contrastive-final-lora/checkpoint-2000/optimizer.pt +3 -0
glot-contrastive-final-lora/checkpoint-2000/rng_state.pth +3 -0
glot-contrastive-final-lora/checkpoint-2000/scheduler.pt +3 -0
glot-contrastive-final-lora/checkpoint-2000/sentencepiece.bpe.model +3 -0
glot-contrastive-final-lora/checkpoint-2000/special_tokens_map.json +15 -0
glot-contrastive-final-lora/checkpoint-2000/tokenizer_config.json +57 -0
glot-contrastive-final-lora/checkpoint-2000/trainer_state.json +2834 -0
glot-contrastive-final-lora/checkpoint-2000/training_args.bin +3 -0
glot-contrastive-final-lora/checkpoint-2500/README.md +206 -0
glot-contrastive-final-lora/checkpoint-2500/adapter_config.json +37 -0
glot-contrastive-final-lora/checkpoint-2500/adapter_model.safetensors +3 -0
glot-contrastive-final-lora/checkpoint-2500/optimizer.pt +3 -0
glot-contrastive-final-lora/checkpoint-2500/rng_state.pth +3 -0
glot-contrastive-final-lora/checkpoint-2500/scheduler.pt +3 -0
glot-contrastive-final-lora/checkpoint-2500/sentencepiece.bpe.model +3 -0
glot-contrastive-final-lora/checkpoint-2500/special_tokens_map.json +15 -0
glot-contrastive-final-lora/checkpoint-2500/tokenizer_config.json +57 -0
glot-contrastive-final-lora/checkpoint-2500/trainer_state.json +3534 -0
glot-contrastive-final-lora/checkpoint-2500/training_args.bin +3 -0
glot-contrastive-final-lora/checkpoint-3000/README.md +206 -0
glot-contrastive-final-lora/checkpoint-3000/adapter_config.json +37 -0
glot-contrastive-final-lora/checkpoint-3000/adapter_model.safetensors +3 -0

glot-contrastive-final-lora/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1
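
The card's "How to Get Started with the Model" section is still a placeholder. A minimal sketch of loading this adapter with PEFT might look like the following. It assumes the base checkpoint `./glot-mlm-adapted` (the `base_model` in the front matter) is available locally and that the adapter lives in `glot-contrastive-final-lora`. The helper names `load_adapter` and `embed`, and the mean-pooling step, are illustrative assumptions; the commit does not document how embeddings were pooled during training.

```python
def load_adapter(base_path="./glot-mlm-adapted",
                 adapter_path="glot-contrastive-final-lora"):
    """Load the base encoder and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModel, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_path)
    base = AutoModel.from_pretrained(base_path)
    # Reads adapter_config.json and adapter_model.safetensors from adapter_path.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model


def embed(texts, tokenizer, model):
    """Mean-pool the last hidden state over non-padding tokens (assumed pooling)."""
    import torch

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    # Sum only real tokens, then divide by the number of real tokens per sentence.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
```

Usage would be along the lines of `tokenizer, model = load_adapter()` followed by `embed(["some text"], tokenizer, model)`. If training used a different pooling strategy (for example the `<s>` token), the pooling in `embed` should be changed to match.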

glot-contrastive-final-lora/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
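
To make the configuration above more concrete, here is a rough sketch of what `r`, `lora_alpha`, and `target_modules` imply for the adapter's scaling factor and size. The values `r=16` and `lora_alpha=32` come from the config. The encoder shape (hidden size 768, 12 layers) is an assumption based on an XLM-R-base-like model and is not stated anywhere in this commit, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(hidden_size, num_layers, r, num_target_modules):
    """Trainable LoRA parameters for square hidden_size x hidden_size projections."""
    # Each adapted module adds two matrices: A (r x hidden) and B (hidden x r).
    per_module = r * hidden_size + hidden_size * r
    return num_layers * num_target_modules * per_module


r, lora_alpha = 16, 32      # from adapter_config.json
scaling = lora_alpha / r    # standard LoRA scaling, since use_rslora is false

# Assumed encoder shape (XLM-R-base-like); not stated in the commit.
# Two target modules per layer: "query" and "value".
params = lora_param_count(hidden_size=768, num_layers=12, r=r, num_target_modules=2)
fp32_bytes = params * 4     # 4 bytes per float32 weight

print(scaling)      # 2.0
print(params)       # 589824
print(fp32_bytes)   # 2359296
```

Under these assumptions the float32 weights come to 2,359,296 bytes, about 6.5 kB less than the 2,365,824-byte `adapter_model.safetensors` in this commit. That gap is plausible for the file's metadata header, so the estimate is consistent with the upload but does not prove the assumed shape.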

glot-contrastive-final-lora/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ba05d9cb007251d29a6f02fdd92f56fa1beb8f9e0676686472daf07c4e9f478
+size 2365824
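
The three added lines are a Git LFS pointer, not the weights themselves: the binary is stored separately and identified by its SHA-256 digest and byte size. As a small illustration, the pointer can be parsed like this (`parse_lfs_pointer` is a hypothetical helper, not part of `huggingface_hub` or Git LFS):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3ba05d9cb007251d29a6f02fdd92f56fa1beb8f9e0676686472daf07c4e9f478
size 2365824
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 2365824
```

After downloading the real file, comparing its size and SHA-256 digest against these fields is a simple way to confirm the download is intact.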

glot-contrastive-final-lora/checkpoint-1000/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

glot-contrastive-final-lora/checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

glot-contrastive-final-lora/checkpoint-1000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a256288c756ea19f50dedef87a2b9786971da4a587298a49f74d6f7686b0572
+size 2365824

glot-contrastive-final-lora/checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:192e794a1d2f26b8d41b4ac6d8d1d65d67318fcf2e777ca7722bceabd58f6fb6
+size 4760395

glot-contrastive-final-lora/checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dbe2c991b46d5f8c63a3e3c3773a3bf7d45c1bcb99de1418411217d641560e12
+size 14645

glot-contrastive-final-lora/checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ce27a0a34a8d4759fb8e422039bad599131740613f03bb839ba3688bec3369a7
+size 1465

glot-contrastive-final-lora/checkpoint-1000/sentencepiece.bpe.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a313a26470baedaede322622492f2a542aa41527ddc5d40de444e945ad3c613
+size 7658320

glot-contrastive-final-lora/checkpoint-1000/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

glot-contrastive-final-lora/checkpoint-1000/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "401144": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "unk_token": "<unk>",
+  "use_fast": true
+}

glot-contrastive-final-lora/checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,1434 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.714285714285714,
+  "eval_steps": 5,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.1407003551721573,
+      "learning_rate": 0.00029965714285714283,
+      "loss": 0.9726,
+      "step": 5
+    },
+    {
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.26689061522483826,
+      "learning_rate": 0.0002992285714285714,
+      "loss": 0.9633,
+      "step": 10
+    },
+    {
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.8670485615730286,
+      "learning_rate": 0.0002988,
+      "loss": 0.9013,
+      "step": 15
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.9785467386245728,
+      "learning_rate": 0.00029837142857142853,
+      "loss": 0.6942,
+      "step": 20
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 1.3083932399749756,
+      "learning_rate": 0.0002979428571428571,
+      "loss": 0.4472,
+      "step": 25
+    },
+    {
+      "epoch": 0.17142857142857143,
+      "grad_norm": 1.6103293895721436,
+      "learning_rate": 0.0002975142857142857,
+      "loss": 0.3782,
+      "step": 30
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 2.6353416442871094,
+      "learning_rate": 0.0002970857142857143,
+      "loss": 0.3732,
+      "step": 35
+    },
+    {
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.9949072003364563,
+      "learning_rate": 0.0002966571428571428,
+      "loss": 0.3506,
+      "step": 40
+    },
+    {
+      "epoch": 0.2571428571428571,
+      "grad_norm": 1.280673861503601,
+      "learning_rate": 0.0002962285714285714,
+      "loss": 0.3346,
+      "step": 45
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.7681456208229065,
+      "learning_rate": 0.0002958,
+      "loss": 0.2832,
+      "step": 50
+    },
+    {
+      "epoch": 0.3142857142857143,
+      "grad_norm": 1.0000813007354736,
+      "learning_rate": 0.0002953714285714285,
+      "loss": 0.2603,
+      "step": 55
+    },
+    {
+      "epoch": 0.34285714285714286,
+      "grad_norm": 1.0222399234771729,
+      "learning_rate": 0.0002949428571428571,
+      "loss": 0.2507,
+      "step": 60
+    },
+    {
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.896902322769165,
+      "learning_rate": 0.0002945142857142857,
+      "loss": 0.2556,
+      "step": 65
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.9035541415214539,
+      "learning_rate": 0.00029408571428571426,
+      "loss": 0.2402,
+      "step": 70
+    },
+    {
+      "epoch": 0.42857142857142855,
+      "grad_norm": 1.4886469841003418,
+      "learning_rate": 0.00029365714285714285,
+      "loss": 0.2376,
+      "step": 75
+    },
+    {
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.8951187133789062,
+      "learning_rate": 0.0002932285714285714,
+      "loss": 0.2276,
+      "step": 80
+    },
+    {
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.7876377105712891,
+      "learning_rate": 0.00029279999999999996,
+      "loss": 0.2537,
+      "step": 85
+    },
+    {
+      "epoch": 0.5142857142857142,
+      "grad_norm": 1.0927226543426514,
+      "learning_rate": 0.00029237142857142855,
+      "loss": 0.2152,
+      "step": 90
+    },
+    {
+      "epoch": 0.5428571428571428,
+      "grad_norm": 1.4946355819702148,
+      "learning_rate": 0.00029194285714285713,
+      "loss": 0.2441,
+      "step": 95
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.7082991600036621,
+      "learning_rate": 0.0002915142857142857,
+      "loss": 0.2708,
+      "step": 100
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 0.670010507106781,
+      "learning_rate": 0.00029108571428571424,
+      "loss": 0.2396,
+      "step": 105
+    },
+    {
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.9797312021255493,
+      "learning_rate": 0.00029065714285714283,
+      "loss": 0.2275,
+      "step": 110
+    },
+    {
+      "epoch": 0.6571428571428571,
+      "grad_norm": 1.5220463275909424,
+      "learning_rate": 0.0002902285714285714,
+      "loss": 0.2114,
+      "step": 115
+    },
+    {
+      "epoch": 0.6857142857142857,
+      "grad_norm": 1.3326867818832397,
+      "learning_rate": 0.00028979999999999994,
+      "loss": 0.241,
+      "step": 120
+    },
+    {
+      "epoch": 0.7142857142857143,
+      "grad_norm": 1.1195529699325562,
+      "learning_rate": 0.0002893714285714285,
+      "loss": 0.2389,
+      "step": 125
+    },
+    {
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.7551061511039734,
+      "learning_rate": 0.0002889428571428571,
+      "loss": 0.2162,
+      "step": 130
+    },
+    {
+      "epoch": 0.7714285714285715,
+      "grad_norm": 1.018908977508545,
+      "learning_rate": 0.0002885142857142857,
+      "loss": 0.1924,
+      "step": 135
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 2.123642921447754,
+      "learning_rate": 0.0002880857142857143,
+      "loss": 0.2174,
+      "step": 140
+    },
+    {
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.7585068941116333,
+      "learning_rate": 0.0002876571428571428,
+      "loss": 0.2006,
+      "step": 145
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.64150869846344,
+      "learning_rate": 0.0002872285714285714,
+      "loss": 0.1905,
+      "step": 150
+    },
+    {
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.9126951694488525,
+      "learning_rate": 0.0002868,
+      "loss": 0.2312,
+      "step": 155
+    },
+    {
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.7278801202774048,
+      "learning_rate": 0.00028637142857142856,
+      "loss": 0.2077,
+      "step": 160
+    },
+    {
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.8931339383125305,
+      "learning_rate": 0.00028594285714285715,
+      "loss": 0.1951,
+      "step": 165
+    },
+    {
+      "epoch": 0.9714285714285714,
+      "grad_norm": 1.0831843614578247,
+      "learning_rate": 0.0002855142857142857,
+      "loss": 0.2103,
+      "step": 170
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 1.3750063180923462,
+      "learning_rate": 0.00028508571428571426,
+      "loss": 0.2396,
+      "step": 175
+    },
+    {
+      "epoch": 1.0285714285714285,
+      "grad_norm": 0.8338337540626526,
+      "learning_rate": 0.00028465714285714285,
+      "loss": 0.2404,
+      "step": 180
+    },
+    {
+      "epoch": 1.0571428571428572,
+      "grad_norm": 1.2879024744033813,
+      "learning_rate": 0.0002842285714285714,
+      "loss": 0.2117,
+      "step": 185
+    },
+    {
+      "epoch": 1.0857142857142856,
+      "grad_norm": 1.6751821041107178,
+      "learning_rate": 0.00028379999999999996,
+      "loss": 0.1796,
+      "step": 190
+    },
+    {
+      "epoch": 1.1142857142857143,
+      "grad_norm": 0.9864417910575867,
+      "learning_rate": 0.00028337142857142854,
+      "loss": 0.1993,
+      "step": 195
+    },
+    {
+      "epoch": 1.1428571428571428,
+      "grad_norm": 1.0174155235290527,
+      "learning_rate": 0.00028294285714285713,
+      "loss": 0.2068,
+      "step": 200
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.029832124710083,
+      "learning_rate": 0.0002825142857142857,
+      "loss": 0.2015,
+      "step": 205
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 0.7745446562767029,
+      "learning_rate": 0.00028208571428571424,
+      "loss": 0.2129,
+      "step": 210
+    },
+    {
+      "epoch": 1.2285714285714286,
+      "grad_norm": 2.5578622817993164,
+      "learning_rate": 0.0002816571428571428,
+      "loss": 0.2224,
+      "step": 215
+    },
+    {
+      "epoch": 1.2571428571428571,
+      "grad_norm": 2.4185051918029785,
+      "learning_rate": 0.0002812285714285714,
+      "loss": 0.2276,
+      "step": 220
+    },
+    {
+      "epoch": 1.2857142857142856,
+      "grad_norm": 1.4176461696624756,
+      "learning_rate": 0.0002808,
+      "loss": 0.1781,
+      "step": 225
+    },
+    {
+      "epoch": 1.3142857142857143,
+      "grad_norm": 0.709326982498169,
+      "learning_rate": 0.0002803714285714286,
+      "loss": 0.2177,
+      "step": 230
+    },
+    {
+      "epoch": 1.342857142857143,
+      "grad_norm": 0.8170766830444336,
+      "learning_rate": 0.0002799428571428571,
+      "loss": 0.1769,
+      "step": 235
+    },
+    {
+      "epoch": 1.3714285714285714,
+      "grad_norm": 1.3850761651992798,
+      "learning_rate": 0.0002795142857142857,
+      "loss": 0.2262,
+      "step": 240
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 1.0064373016357422,
+      "learning_rate": 0.0002790857142857143,
+      "loss": 0.196,
+      "step": 245
+    },
+    {
+      "epoch": 1.4285714285714286,
+      "grad_norm": 1.9635728597640991,
+      "learning_rate": 0.0002786571428571428,
+      "loss": 0.2029,
+      "step": 250
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 16.20791244506836,
+      "learning_rate": 0.0002782285714285714,
+      "loss": 0.3925,
+      "step": 255
+    },
+    {
+      "epoch": 1.4857142857142858,
+      "grad_norm": 1.4363322257995605,
+      "learning_rate": 0.0002778,
+      "loss": 0.3684,
+      "step": 260
+    },
+    {
+      "epoch": 1.5142857142857142,
+      "grad_norm": 0.9379534721374512,
+      "learning_rate": 0.00027737142857142856,
+      "loss": 0.2265,
+      "step": 265
+    },
+    {
+      "epoch": 1.5428571428571427,
+      "grad_norm": 0.8453512787818909,
+      "learning_rate": 0.00027694285714285714,
+      "loss": 0.1976,
+      "step": 270
+    },
+    {
+      "epoch": 1.5714285714285714,
+      "grad_norm": 2.316664695739746,
+      "learning_rate": 0.0002765142857142857,
+      "loss": 0.23,
+      "step": 275
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 1.0548444986343384,
+      "learning_rate": 0.00027608571428571426,
+      "loss": 0.1823,
+      "step": 280
+    },
+    {
+      "epoch": 1.6285714285714286,
+      "grad_norm": 3.7894928455352783,
+      "learning_rate": 0.00027565714285714284,
+      "loss": 0.1962,
+      "step": 285
+    },
+    {
+      "epoch": 1.657142857142857,
+      "grad_norm": 2.3081610202789307,
+      "learning_rate": 0.00027522857142857143,
+      "loss": 0.2087,
+      "step": 290
+    },
+    {
+      "epoch": 1.6857142857142857,
+      "grad_norm": 0.9311438202857971,
+      "learning_rate": 0.0002748,
+      "loss": 0.1597,
+      "step": 295
+    },
+    {
+      "epoch": 1.7142857142857144,
+      "grad_norm": 1.1881247758865356,
+      "learning_rate": 0.00027437142857142854,
+      "loss": 0.1764,
+      "step": 300
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.30265212059021,
+      "learning_rate": 0.0002739428571428571,
+      "loss": 0.1647,
+      "step": 305
+    },
+    {
+      "epoch": 1.7714285714285714,
+      "grad_norm": 0.6832175850868225,
+      "learning_rate": 0.0002735142857142857,
+      "loss": 0.1638,
+      "step": 310
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 1.8740538358688354,
+      "learning_rate": 0.00027308571428571424,
+      "loss": 0.1803,
+      "step": 315
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 9.821504592895508,
+      "learning_rate": 0.0002726571428571428,
+      "loss": 0.226,
+      "step": 320
+    },
+    {
+      "epoch": 1.8571428571428572,
+      "grad_norm": 1.0889750719070435,
+      "learning_rate": 0.0002722285714285714,
+      "loss": 0.1822,
+      "step": 325
+    },
+    {
+      "epoch": 1.8857142857142857,
+      "grad_norm": 0.9660868048667908,
+      "learning_rate": 0.0002718,
+      "loss": 0.1842,
+      "step": 330
+    },
+    {
+      "epoch": 1.9142857142857141,
+      "grad_norm": 0.6329234838485718,
+      "learning_rate": 0.0002713714285714286,
+      "loss": 0.1488,
+      "step": 335
+    },
+    {
+      "epoch": 1.9428571428571428,
+      "grad_norm": 3.601266384124756,
+      "learning_rate": 0.0002709428571428571,
+      "loss": 0.1887,
+      "step": 340
+    },
+    {
+      "epoch": 1.9714285714285715,
+      "grad_norm": 1.1441439390182495,
+      "learning_rate": 0.0002705142857142857,
+      "loss": 0.184,
+      "step": 345
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.8586034774780273,
+      "learning_rate": 0.0002700857142857143,
+      "loss": 0.1578,
+      "step": 350
+    },
+    {
+      "epoch": 2.0285714285714285,
+      "grad_norm": 1.5113487243652344,
+      "learning_rate": 0.00026965714285714286,
+      "loss": 0.2002,
+      "step": 355
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.1123011112213135,
+      "learning_rate": 0.0002692285714285714,
+      "loss": 0.1946,
+      "step": 360
+    },
+    {
+      "epoch": 2.085714285714286,
+      "grad_norm": 0.9377036094665527,
+      "learning_rate": 0.0002688,
+      "loss": 0.1971,
+      "step": 365
+    },
+    {
+      "epoch": 2.1142857142857143,
+      "grad_norm": 0.6956892609596252,
+      "learning_rate": 0.00026837142857142856,
+      "loss": 0.1758,
+      "step": 370
+    },
+    {
+      "epoch": 2.142857142857143,
+      "grad_norm": 0.7510782480239868,
+      "learning_rate": 0.0002679428571428571,
+      "loss": 0.1674,
+      "step": 375
+    },
+    {
+      "epoch": 2.1714285714285713,
+      "grad_norm": 0.7009285092353821,
+      "learning_rate": 0.00026751428571428567,
+      "loss": 0.1945,
+      "step": 380
+    },
+    {
+      "epoch": 2.2,
+      "grad_norm": 0.9555609822273254,
+      "learning_rate": 0.00026708571428571426,
+      "loss": 0.1857,
+      "step": 385
+    },
+    {
+      "epoch": 2.2285714285714286,
+      "grad_norm": 2.133979082107544,
+      "learning_rate": 0.00026665714285714284,
+      "loss": 0.1636,
+      "step": 390
+    },
+    {
+      "epoch": 2.257142857142857,
+      "grad_norm": 0.7105309963226318,
+      "learning_rate": 0.0002662285714285714,
+      "loss": 0.2014,
+      "step": 395
+    },
+    {
+      "epoch": 2.2857142857142856,
+      "grad_norm": 0.7329701781272888,
+      "learning_rate": 0.00026579999999999996,
+      "loss": 0.1884,
+      "step": 400
+    },
+    {
+      "epoch": 2.314285714285714,
+      "grad_norm": 1.0426994562149048,
+      "learning_rate": 0.00026537142857142854,
+      "loss": 0.1558,
+      "step": 405
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 0.9306122660636902,
+      "learning_rate": 0.0002649428571428571,
+      "loss": 0.1774,
+      "step": 410
+    },
+    {
+      "epoch": 2.3714285714285714,
+      "grad_norm": 0.6989394426345825,
+      "learning_rate": 0.00026451428571428565,
+      "loss": 0.1601,
+      "step": 415
+    },
+    {
+      "epoch": 2.4,
+      "grad_norm": 1.4383760690689087,
+      "learning_rate": 0.0002640857142857143,
+      "loss": 0.1564,
+      "step": 420
+    },
+    {
+      "epoch": 2.4285714285714284,
+      "grad_norm": 0.6448336839675903,
+      "learning_rate": 0.0002636571428571428,
+      "loss": 0.1827,
+      "step": 425
+    },
+    {
+      "epoch": 2.4571428571428573,
+      "grad_norm": 0.9535760879516602,
+      "learning_rate": 0.0002632285714285714,
+      "loss": 0.1713,
+      "step": 430
+    },
+    {
+      "epoch": 2.4857142857142858,
+      "grad_norm": 1.034945011138916,
+      "learning_rate": 0.0002628,
+      "loss": 0.1457,
+      "step": 435
+    },
+    {
+      "epoch": 2.5142857142857142,
+      "grad_norm": 1.3225128650665283,
+      "learning_rate": 0.0002623714285714285,
+      "loss": 0.1633,
+      "step": 440
+    },
+    {
+      "epoch": 2.5428571428571427,
+      "grad_norm": 0.8285059928894043,
+      "learning_rate": 0.0002619428571428571,
+      "loss": 0.2004,
+      "step": 445
+    },
+    {
+      "epoch": 2.571428571428571,
+      "grad_norm": 0.773176908493042,
+      "learning_rate": 0.0002615142857142857,
+      "loss": 0.1641,
+      "step": 450
+    },
+    {
+      "epoch": 2.6,
+      "grad_norm": 0.7964853048324585,
+      "learning_rate": 0.0002610857142857143,
+      "loss": 0.1608,
+      "step": 455
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.0967328548431396,
+      "learning_rate": 0.00026065714285714286,
+      "loss": 0.1697,
+      "step": 460
+    },
+    {
+      "epoch": 2.657142857142857,
+      "grad_norm": 0.6462066173553467,
+      "learning_rate": 0.0002602285714285714,
+      "loss": 0.1512,
+      "step": 465
+    },
+    {
+      "epoch": 2.685714285714286,
+      "grad_norm": 0.8765937089920044,
+      "learning_rate": 0.00025979999999999997,
+      "loss": 0.1826,
+      "step": 470
+    },
+    {
+      "epoch": 2.7142857142857144,
+      "grad_norm": 1.2524124383926392,
+      "learning_rate": 0.00025937142857142856,
+      "loss": 0.1731,
+      "step": 475
+    },
+    {
+      "epoch": 2.742857142857143,
+      "grad_norm": 2.2982606887817383,
+      "learning_rate": 0.0002589428571428571,
+      "loss": 0.1852,
+      "step": 480
+    },
+    {
+      "epoch": 2.7714285714285714,
+      "grad_norm": 0.9989053010940552,
+      "learning_rate": 0.0002585142857142857,
+      "loss": 0.1791,
+      "step": 485
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 0.772343635559082,
+      "learning_rate": 0.00025808571428571426,
+      "loss": 0.1862,
+      "step": 490
+    },
+    {
+      "epoch": 2.8285714285714287,
+      "grad_norm": 1.2101136445999146,
+      "learning_rate": 0.00025765714285714284,
+      "loss": 0.1806,
+      "step": 495
+    },
+    {
+      "epoch": 2.857142857142857,
+      "grad_norm": 0.8010189533233643,
+      "learning_rate": 0.0002572285714285714,
+      "loss": 0.1842,
+      "step": 500
+    },
+    {
+      "epoch": 2.8857142857142857,
+      "grad_norm": 1.3597544431686401,
+      "learning_rate": 0.00025679999999999995,
+      "loss": 0.1583,
+      "step": 505
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 0.8790671825408936,
+      "learning_rate": 0.00025637142857142854,
+      "loss": 0.1565,
+      "step": 510
+    },
+    {
+      "epoch": 2.942857142857143,
+      "grad_norm": 1.1175066232681274,
+      "learning_rate": 0.0002559428571428571,
+      "loss": 0.1406,
+      "step": 515
+    },
+    {
+      "epoch": 2.9714285714285715,
+      "grad_norm": 2.8528785705566406,
+      "learning_rate": 0.0002555142857142857,
+      "loss": 0.1735,
+      "step": 520
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 2.2073328495025635,
+      "learning_rate": 0.0002550857142857143,
+      "loss": 0.1816,
+      "step": 525
+    },
+    {
+      "epoch": 3.0285714285714285,
+      "grad_norm": 11.01322078704834,
+      "learning_rate": 0.0002546571428571428,
+      "loss": 0.1873,
+      "step": 530
+    },
+    {
+      "epoch": 3.057142857142857,
+      "grad_norm": 1.5822402238845825,
+      "learning_rate": 0.0002542285714285714,
+      "loss": 0.168,
+      "step": 535
+    },
+    {
+      "epoch": 3.085714285714286,
+      "grad_norm": 1.3086942434310913,
+      "learning_rate": 0.0002538,
+      "loss": 0.149,
+      "step": 540
+    },
+    {
+      "epoch": 3.1142857142857143,
+      "grad_norm": 6.303041458129883,
+      "learning_rate": 0.0002533714285714285,
+      "loss": 0.1651,
+      "step": 545
+    },
+    {
+      "epoch": 3.142857142857143,
+      "grad_norm": 14.48929500579834,
+      "learning_rate": 0.00025294285714285716,
+      "loss": 0.1687,
+      "step": 550
+    },
+    {
+      "epoch": 3.1714285714285713,
+      "grad_norm": 6.824525356292725,
+      "learning_rate": 0.0002525142857142857,
+      "loss": 0.1919,
+      "step": 555
+    },
+    {
+      "epoch": 3.2,
+      "grad_norm": 18.772563934326172,
+      "learning_rate": 0.00025208571428571427,
+      "loss": 0.2075,
+      "step": 560
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 0.7268752455711365,
+      "learning_rate": 0.00025165714285714286,
+      "loss": 0.174,
+      "step": 565
+    },
+    {
+      "epoch": 3.257142857142857,
+      "grad_norm": 1.1301453113555908,
+      "learning_rate": 0.0002512285714285714,
+      "loss": 0.1668,
+      "step": 570
+    },
+    {
+      "epoch": 3.2857142857142856,
+      "grad_norm": 2.846802234649658,
+      "learning_rate": 0.00025079999999999997,
+      "loss": 0.1645,
+      "step": 575
+    },
+    {
+      "epoch": 3.314285714285714,
+      "grad_norm": 1.417515754699707,
+      "learning_rate": 0.00025037142857142855,
+      "loss": 0.1719,
+      "step": 580
+    },
+    {
+      "epoch": 3.342857142857143,
+      "grad_norm": 4.137150764465332,
+      "learning_rate": 0.00024994285714285714,
+      "loss": 0.1739,
+      "step": 585
+    },
+    {
+      "epoch": 3.3714285714285714,
+      "grad_norm": 2.6067259311676025,
+      "learning_rate": 0.0002495142857142857,
+      "loss": 0.1489,
+      "step": 590
+    },
+    {
+      "epoch": 3.4,
+      "grad_norm": 2.601024627685547,
+      "learning_rate": 0.00024908571428571425,
+      "loss": 0.1618,
+      "step": 595
+    },
+    {
+      "epoch": 3.4285714285714284,
+      "grad_norm": 3.849017858505249,
+      "learning_rate": 0.00024865714285714284,
+      "loss": 0.1899,
+      "step": 600
+    },
+    {
+      "epoch": 3.4571428571428573,
+      "grad_norm": 4.673766136169434,
+      "learning_rate": 0.0002482285714285714,
+      "loss": 0.1761,
+      "step": 605
+    },
+    {
+      "epoch": 3.4857142857142858,
+      "grad_norm": 2.6057631969451904,
+      "learning_rate": 0.00024779999999999995,
+      "loss": 0.1743,
+      "step": 610
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.932652473449707,
+      "learning_rate": 0.0002473714285714286,
+      "loss": 0.1482,
+      "step": 615
+    },
+    {
+      "epoch": 3.5428571428571427,
+      "grad_norm": 0.8764939308166504,
+      "learning_rate": 0.0002469428571428571,
+      "loss": 0.1644,
+      "step": 620
+    },
+    {
+      "epoch": 3.571428571428571,
+      "grad_norm": 1.3203191757202148,
+      "learning_rate": 0.0002465142857142857,
+      "loss": 0.1654,
+      "step": 625
+    },
+    {
+      "epoch": 3.6,
+      "grad_norm": 0.7977635264396667,
+      "learning_rate": 0.0002460857142857143,
+      "loss": 0.1472,
+      "step": 630
+    },
+    {
+      "epoch": 3.6285714285714286,
+      "grad_norm": 1.4750248193740845,
+      "learning_rate": 0.0002456571428571428,
+      "loss": 0.1735,
+      "step": 635
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 1.8164482116699219,
+      "learning_rate": 0.0002452285714285714,
+      "loss": 0.1593,
+      "step": 640
+    },
+    {
+      "epoch": 3.685714285714286,
+      "grad_norm": 1.4829603433609009,
+      "learning_rate": 0.0002448,
+      "loss": 0.1508,
+      "step": 645
+    },
+    {
+      "epoch": 3.7142857142857144,
+      "grad_norm": 0.8828144669532776,
+      "learning_rate": 0.00024437142857142857,
+      "loss": 0.1573,
+      "step": 650
+    },
+    {
+      "epoch": 3.742857142857143,
+      "grad_norm": 2.039384126663208,
+      "learning_rate": 0.00024394285714285713,
+      "loss": 0.1745,
+      "step": 655
+    },
+    {
+      "epoch": 3.7714285714285714,
+      "grad_norm": 0.9604200720787048,
+      "learning_rate": 0.00024351428571428569,
+      "loss": 0.17,
+      "step": 660
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 0.7903971076011658,
+      "learning_rate": 0.00024308571428571427,
+      "loss": 0.1654,
+      "step": 665
+    },
+    {
+      "epoch": 3.8285714285714287,
+      "grad_norm": 0.6935649514198303,
+      "learning_rate": 0.00024265714285714283,
+      "loss": 0.1714,
+      "step": 670
+    },
+    {
+      "epoch": 3.857142857142857,
+      "grad_norm": 0.5832012295722961,
+      "learning_rate": 0.00024222857142857138,
+      "loss": 0.1636,
+      "step": 675
+    },
+    {
+      "epoch": 3.8857142857142857,
+      "grad_norm": 0.6303168535232544,
+      "learning_rate": 0.0002418,
+      "loss": 0.1604,
+      "step": 680
+    },
+    {
+      "epoch": 3.914285714285714,
+      "grad_norm": 0.7210885882377625,
+      "learning_rate": 0.00024137142857142855,
+      "loss": 0.1444,
+      "step": 685
+    },
+    {
+      "epoch": 3.942857142857143,
+      "grad_norm": 0.7690990567207336,
+      "learning_rate": 0.00024094285714285714,
+      "loss": 0.1631,
+      "step": 690
+    },
+    {
+      "epoch": 3.9714285714285715,
+      "grad_norm": 1.0142720937728882,
+      "learning_rate": 0.0002405142857142857,
+      "loss": 0.158,
+      "step": 695
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.7970322966575623,
+      "learning_rate": 0.00024008571428571425,
+      "loss": 0.1803,
+      "step": 700
+    },
+    {
+      "epoch": 4.0285714285714285,
+      "grad_norm": 0.6795914769172668,
+      "learning_rate": 0.00023965714285714284,
+      "loss": 0.143,
+      "step": 705
+    },
+    {
+      "epoch": 4.057142857142857,
+      "grad_norm": 0.6832629442214966,
+      "learning_rate": 0.0002392285714285714,
+      "loss": 0.1457,
+      "step": 710
+    },
+    {
+      "epoch": 4.085714285714285,
+      "grad_norm": 3.8629798889160156,
+      "learning_rate": 0.0002388,
+      "loss": 0.1671,
+      "step": 715
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 1.1167882680892944,
+      "learning_rate": 0.00023837142857142856,
+      "loss": 0.1544,
+      "step": 720
+    },
+    {
+      "epoch": 4.142857142857143,
+      "grad_norm": 0.9431412816047668,
+      "learning_rate": 0.00023794285714285712,
+      "loss": 0.1605,
+      "step": 725
+    },
+    {
+      "epoch": 4.171428571428572,
+      "grad_norm": 1.310948133468628,
+      "learning_rate": 0.0002375142857142857,
+      "loss": 0.1121,
+      "step": 730
+    },
+    {
+      "epoch": 4.2,
+      "grad_norm": 0.9830737709999084,
+      "learning_rate": 0.00023708571428571426,
+      "loss": 0.1742,
+      "step": 735
+    },
+    {
+      "epoch": 4.228571428571429,
+      "grad_norm": 0.6166555881500244,
+      "learning_rate": 0.00023665714285714282,
+      "loss": 0.1525,
+      "step": 740
+    },
+    {
+      "epoch": 4.257142857142857,
+      "grad_norm": 0.995579719543457,
+      "learning_rate": 0.00023622857142857143,
+      "loss": 0.1439,
+      "step": 745
+    },
+    {
+      "epoch": 4.285714285714286,
+      "grad_norm": 0.639796793460846,
+      "learning_rate": 0.00023579999999999999,
+      "loss": 0.1692,
+      "step": 750
+    },
+    {
+      "epoch": 4.314285714285714,
+      "grad_norm": 0.9438050389289856,
+      "learning_rate": 0.00023537142857142854,
+      "loss": 0.1785,
+      "step": 755
+    },
+    {
+      "epoch": 4.3428571428571425,
+      "grad_norm": 0.8960750102996826,
+      "learning_rate": 0.00023494285714285713,
+      "loss": 0.1557,
+      "step": 760
+    },
+    {
+      "epoch": 4.371428571428572,
+      "grad_norm": 0.6287499070167542,
+      "learning_rate": 0.00023451428571428568,
+      "loss": 0.1459,
+      "step": 765
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 0.7638295888900757,
+      "learning_rate": 0.00023408571428571424,
+      "loss": 0.1341,
+      "step": 770
+    },
+    {
+      "epoch": 4.428571428571429,
+      "grad_norm": 0.655878484249115,
+      "learning_rate": 0.00023365714285714283,
+      "loss": 0.1358,
+      "step": 775
+    },
+    {
+      "epoch": 4.457142857142857,
+      "grad_norm": 0.5840997695922852,
+      "learning_rate": 0.0002332285714285714,
+      "loss": 0.1386,
+      "step": 780
+    },
+    {
+      "epoch": 4.485714285714286,
+      "grad_norm": 1.1082488298416138,
+      "learning_rate": 0.0002328,
+      "loss": 0.1827,
+      "step": 785
+    },
+    {
+      "epoch": 4.514285714285714,
+      "grad_norm": 0.8825240135192871,
+      "learning_rate": 0.00023237142857142855,
+      "loss": 0.1527,
+      "step": 790
+    },
+    {
+      "epoch": 4.542857142857143,
+      "grad_norm": 0.6752304434776306,
+      "learning_rate": 0.0002319428571428571,
+      "loss": 0.1392,
+      "step": 795
+    },
+    {
+      "epoch": 4.571428571428571,
+      "grad_norm": 1.1423301696777344,
+      "learning_rate": 0.0002315142857142857,
+      "loss": 0.1433,
+      "step": 800
+    },
+    {
+      "epoch": 4.6,
+      "grad_norm": 10.793691635131836,
+      "learning_rate": 0.00023108571428571425,
+      "loss": 0.1635,
+      "step": 805
+    },
+    {
+      "epoch": 4.628571428571428,
+      "grad_norm": 0.47564294934272766,
+      "learning_rate": 0.00023065714285714286,
+      "loss": 0.1199,
+      "step": 810
+    },
+    {
+      "epoch": 4.6571428571428575,
+      "grad_norm": 1.2492656707763672,
+      "learning_rate": 0.00023022857142857142,
+      "loss": 0.1488,
+      "step": 815
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 0.6933501958847046,
+      "learning_rate": 0.00022979999999999997,
+      "loss": 0.1812,
+      "step": 820
+    },
+    {
+      "epoch": 4.714285714285714,
+      "grad_norm": 0.7901633977890015,
+      "learning_rate": 0.00022937142857142856,
+      "loss": 0.1415,
+      "step": 825
+    },
+    {
+      "epoch": 4.742857142857143,
+      "grad_norm": 0.7854829430580139,
+      "learning_rate": 0.00022894285714285712,
+      "loss": 0.1401,
+      "step": 830
+    },
+    {
+      "epoch": 4.771428571428571,
+      "grad_norm": 0.8716740608215332,
+      "learning_rate": 0.00022851428571428567,
+      "loss": 0.1982,
+      "step": 835
+    },
+    {
+      "epoch": 4.8,
+      "grad_norm": 0.7047899961471558,
+      "learning_rate": 0.00022808571428571426,
+      "loss": 0.1624,
+      "step": 840
+    },
+    {
+      "epoch": 4.828571428571428,
+      "grad_norm": 0.7134959697723389,
+      "learning_rate": 0.00022765714285714284,
+      "loss": 0.1375,
+      "step": 845
+    },
+    {
+      "epoch": 4.857142857142857,
+      "grad_norm": 1.0897325277328491,
+      "learning_rate": 0.00022722857142857143,
+      "loss": 0.1489,
+      "step": 850
+    },
+    {
+      "epoch": 4.885714285714286,
+      "grad_norm": 1.1065207719802856,
+      "learning_rate": 0.00022679999999999998,
+      "loss": 0.1495,
+      "step": 855
+    },
+    {
+      "epoch": 4.914285714285715,
+      "grad_norm": 0.7434757351875305,
+      "learning_rate": 0.00022637142857142854,
+      "loss": 0.1507,
+      "step": 860
+    },
+    {
+      "epoch": 4.942857142857143,
+      "grad_norm": 1.0045181512832642,
+      "learning_rate": 0.00022594285714285712,
+      "loss": 0.1527,
+      "step": 865
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 1.2025654315948486,
+      "learning_rate": 0.00022551428571428568,
+      "loss": 0.1523,
+      "step": 870
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.7823342084884644,
+      "learning_rate": 0.0002250857142857143,
+      "loss": 0.1514,
+      "step": 875
+    },
+    {
+      "epoch": 5.0285714285714285,
+      "grad_norm": 0.8405362963676453,
+      "learning_rate": 0.00022465714285714285,
+      "loss": 0.1461,
+      "step": 880
+    },
+    {
+      "epoch": 5.057142857142857,
+      "grad_norm": 0.7527463436126709,
+      "learning_rate": 0.0002242285714285714,
+      "loss": 0.1206,
+      "step": 885
+    },
+    {
+      "epoch": 5.085714285714285,
+      "grad_norm": 0.8372548222541809,
+      "learning_rate": 0.0002238,
+      "loss": 0.1513,
+      "step": 890
+    },
+    {
+      "epoch": 5.114285714285714,
+      "grad_norm": 0.8755456209182739,
+      "learning_rate": 0.00022337142857142855,
+      "loss": 0.1498,
+      "step": 895
+    },
+    {
+      "epoch": 5.142857142857143,
+      "grad_norm": 0.7312084436416626,
+      "learning_rate": 0.0002229428571428571,
+      "loss": 0.154,
+      "step": 900
+    },
+    {
+      "epoch": 5.171428571428572,
+      "grad_norm": 0.6366221904754639,
+      "learning_rate": 0.0002225142857142857,
+      "loss": 0.1466,
+      "step": 905
+    },
+    {
+      "epoch": 5.2,
+      "grad_norm": 0.6406880617141724,
+      "learning_rate": 0.00022208571428571427,
+      "loss": 0.1254,
+      "step": 910
+    },
+    {
+      "epoch": 5.228571428571429,
+      "grad_norm": 2.4106833934783936,
+      "learning_rate": 0.00022165714285714283,
+      "loss": 0.1534,
+      "step": 915
+    },
+    {
+      "epoch": 5.257142857142857,
+      "grad_norm": 0.5635722279548645,
+      "learning_rate": 0.00022122857142857142,
+      "loss": 0.1461,
+      "step": 920
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 0.787162184715271,
+      "learning_rate": 0.00022079999999999997,
+      "loss": 0.1424,
+      "step": 925
+    },
+    {
+      "epoch": 5.314285714285714,
+      "grad_norm": 0.6513975262641907,
+      "learning_rate": 0.00022037142857142853,
+      "loss": 0.1326,
+      "step": 930
+    },
+    {
+      "epoch": 5.3428571428571425,
+      "grad_norm": 0.6933534741401672,
+      "learning_rate": 0.00021994285714285711,
+      "loss": 0.1661,
+      "step": 935
+    },
+    {
+      "epoch": 5.371428571428572,
+      "grad_norm": 0.7263259887695312,
+      "learning_rate": 0.0002195142857142857,
+      "loss": 0.15,
+      "step": 940
+    },
+    {
+      "epoch": 5.4,
+      "grad_norm": 0.5537381768226624,
+      "learning_rate": 0.00021908571428571428,
+      "loss": 0.129,
+      "step": 945
+    },
+    {
+      "epoch": 5.428571428571429,
+      "grad_norm": 0.6014005541801453,
+      "learning_rate": 0.00021865714285714284,
+      "loss": 0.1321,
+      "step": 950
+    },
+    {
+      "epoch": 5.457142857142857,
+      "grad_norm": 0.6581441760063171,
+      "learning_rate": 0.0002182285714285714,
+      "loss": 0.1587,
+      "step": 955
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.9326379895210266,
+      "learning_rate": 0.00021779999999999998,
+      "loss": 0.1654,
+      "step": 960
+    },
+    {
+      "epoch": 5.514285714285714,
+      "grad_norm": 0.9438592791557312,
+      "learning_rate": 0.00021737142857142854,
+      "loss": 0.1212,
+      "step": 965
+    },
+    {
+      "epoch": 5.542857142857143,
+      "grad_norm": 0.7699571251869202,
+      "learning_rate": 0.00021694285714285715,
+      "loss": 0.1464,
+      "step": 970
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 0.8758366703987122,
+      "learning_rate": 0.0002165142857142857,
+      "loss": 0.1599,
+      "step": 975
+    },
+    {
+      "epoch": 5.6,
+      "grad_norm": 0.6101442575454712,
+      "learning_rate": 0.00021608571428571426,
+      "loss": 0.1589,
+      "step": 980
+    },
+    {
+      "epoch": 5.628571428571428,
+      "grad_norm": 0.7454060912132263,
+      "learning_rate": 0.00021565714285714285,
+      "loss": 0.1433,
+      "step": 985
+    },
+    {
+      "epoch": 5.6571428571428575,
+      "grad_norm": 0.6379484534263611,
+      "learning_rate": 0.0002152285714285714,
+      "loss": 0.1592,
+      "step": 990
+    },
+    {
+      "epoch": 5.685714285714286,
+      "grad_norm": 1.1601309776306152,
+      "learning_rate": 0.00021479999999999996,
+      "loss": 0.1647,
+      "step": 995
+    },
+    {
+      "epoch": 5.714285714285714,
+      "grad_norm": 0.5464673638343811,
+      "learning_rate": 0.00021437142857142855,
+      "loss": 0.1469,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 20,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 200,
+  "trial_name": null,
+  "trial_params": null
+}

glot-contrastive-final-lora/checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a87dc6b2c67ad3df98065b9e8fa21d9d93cd2cb361c532cb83c8a37bdc81a3
+size 5777

glot-contrastive-final-lora/checkpoint-1500/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

glot-contrastive-final-lora/checkpoint-1500/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

glot-contrastive-final-lora/checkpoint-1500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c6413fb7d01f4b21da6e461dea0648d8d88fd37d6bd7c099ca98b3253cf62a00
+size 2365824

glot-contrastive-final-lora/checkpoint-1500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:36773257ecda9472e5ab320c80e2afdec9be64091b43e5bcbc53455be6b8149d
+size 4760395

glot-contrastive-final-lora/checkpoint-1500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1be729d0d5511ce10795029b99cb6f519c2f3eea267e5026e9426be89babe546
+size 14645

glot-contrastive-final-lora/checkpoint-1500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7719f3f0106e49068ce5d6f3e02c2bb61413e6107676c385f70427146af2266c
+size 1465

glot-contrastive-final-lora/checkpoint-1500/sentencepiece.bpe.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a313a26470baedaede322622492f2a542aa41527ddc5d40de444e945ad3c613
+size 7658320

glot-contrastive-final-lora/checkpoint-1500/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

glot-contrastive-final-lora/checkpoint-1500/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "401144": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "unk_token": "<unk>",
+  "use_fast": true
+}

glot-contrastive-final-lora/checkpoint-1500/trainer_state.json ADDED

	@@ -0,0 +1,2134 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 8.571428571428571,
+  "eval_steps": 5,
+  "global_step": 1500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.1407003551721573,
+      "learning_rate": 0.00029965714285714283,
+      "loss": 0.9726,
+      "step": 5
+    },
+    {
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.26689061522483826,
+      "learning_rate": 0.0002992285714285714,
+      "loss": 0.9633,
+      "step": 10
+    },
+    {
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.8670485615730286,
+      "learning_rate": 0.0002988,
+      "loss": 0.9013,
+      "step": 15
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.9785467386245728,
+      "learning_rate": 0.00029837142857142853,
+      "loss": 0.6942,
+      "step": 20
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 1.3083932399749756,
+      "learning_rate": 0.0002979428571428571,
+      "loss": 0.4472,
+      "step": 25
+    },
+    {
+      "epoch": 0.17142857142857143,
+      "grad_norm": 1.6103293895721436,
+      "learning_rate": 0.0002975142857142857,
+      "loss": 0.3782,
+      "step": 30
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 2.6353416442871094,
+      "learning_rate": 0.0002970857142857143,
+      "loss": 0.3732,
+      "step": 35
+    },
+    {
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.9949072003364563,
+      "learning_rate": 0.0002966571428571428,
+      "loss": 0.3506,
+      "step": 40
+    },
+    {
+      "epoch": 0.2571428571428571,
+      "grad_norm": 1.280673861503601,
+      "learning_rate": 0.0002962285714285714,
+      "loss": 0.3346,
+      "step": 45
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.7681456208229065,
+      "learning_rate": 0.0002958,
+      "loss": 0.2832,
+      "step": 50
+    },
+    {
+      "epoch": 0.3142857142857143,
+      "grad_norm": 1.0000813007354736,
+      "learning_rate": 0.0002953714285714285,
+      "loss": 0.2603,
+      "step": 55
+    },
+    {
+      "epoch": 0.34285714285714286,
+      "grad_norm": 1.0222399234771729,
+      "learning_rate": 0.0002949428571428571,
+      "loss": 0.2507,
+      "step": 60
+    },
+    {
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.896902322769165,
+      "learning_rate": 0.0002945142857142857,
+      "loss": 0.2556,
+      "step": 65
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.9035541415214539,
+      "learning_rate": 0.00029408571428571426,
+      "loss": 0.2402,
+      "step": 70
+    },
+    {
+      "epoch": 0.42857142857142855,
+      "grad_norm": 1.4886469841003418,
+      "learning_rate": 0.00029365714285714285,
+      "loss": 0.2376,
+      "step": 75
+    },
+    {
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.8951187133789062,
+      "learning_rate": 0.0002932285714285714,
+      "loss": 0.2276,
+      "step": 80
+    },
+    {
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.7876377105712891,
+      "learning_rate": 0.00029279999999999996,
+      "loss": 0.2537,
+      "step": 85
+    },
+    {
+      "epoch": 0.5142857142857142,
+      "grad_norm": 1.0927226543426514,
+      "learning_rate": 0.00029237142857142855,
+      "loss": 0.2152,
+      "step": 90
+    },
+    {
+      "epoch": 0.5428571428571428,
+      "grad_norm": 1.4946355819702148,
+      "learning_rate": 0.00029194285714285713,
+      "loss": 0.2441,
+      "step": 95
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.7082991600036621,
+      "learning_rate": 0.0002915142857142857,
+      "loss": 0.2708,
+      "step": 100
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 0.670010507106781,
+      "learning_rate": 0.00029108571428571424,
+      "loss": 0.2396,
+      "step": 105
+    },
+    {
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.9797312021255493,
+      "learning_rate": 0.00029065714285714283,
+      "loss": 0.2275,
+      "step": 110
+    },
+    {
+      "epoch": 0.6571428571428571,
+      "grad_norm": 1.5220463275909424,
+      "learning_rate": 0.0002902285714285714,
+      "loss": 0.2114,
+      "step": 115
+    },
+    {
+      "epoch": 0.6857142857142857,
+      "grad_norm": 1.3326867818832397,
+      "learning_rate": 0.00028979999999999994,
+      "loss": 0.241,
+      "step": 120
+    },
+    {
+      "epoch": 0.7142857142857143,
+      "grad_norm": 1.1195529699325562,
+      "learning_rate": 0.0002893714285714285,
+      "loss": 0.2389,
+      "step": 125
+    },
+    {
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.7551061511039734,
+      "learning_rate": 0.0002889428571428571,
+      "loss": 0.2162,
+      "step": 130
+    },
+    {
+      "epoch": 0.7714285714285715,
+      "grad_norm": 1.018908977508545,
+      "learning_rate": 0.0002885142857142857,
+      "loss": 0.1924,
+      "step": 135
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 2.123642921447754,
+      "learning_rate": 0.0002880857142857143,
+      "loss": 0.2174,
+      "step": 140
+    },
+    {
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.7585068941116333,
+      "learning_rate": 0.0002876571428571428,
+      "loss": 0.2006,
+      "step": 145
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.64150869846344,
+      "learning_rate": 0.0002872285714285714,
+      "loss": 0.1905,
+      "step": 150
+    },
+    {
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.9126951694488525,
+      "learning_rate": 0.0002868,
+      "loss": 0.2312,
+      "step": 155
+    },
+    {
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.7278801202774048,
+      "learning_rate": 0.00028637142857142856,
+      "loss": 0.2077,
+      "step": 160
+    },
+    {
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.8931339383125305,
+      "learning_rate": 0.00028594285714285715,
+      "loss": 0.1951,
+      "step": 165
+    },
+    {
+      "epoch": 0.9714285714285714,
+      "grad_norm": 1.0831843614578247,
+      "learning_rate": 0.0002855142857142857,
+      "loss": 0.2103,
+      "step": 170
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 1.3750063180923462,
+      "learning_rate": 0.00028508571428571426,
+      "loss": 0.2396,
+      "step": 175
+    },
+    {
+      "epoch": 1.0285714285714285,
+      "grad_norm": 0.8338337540626526,
+      "learning_rate": 0.00028465714285714285,
+      "loss": 0.2404,
+      "step": 180
+    },
+    {
+      "epoch": 1.0571428571428572,
+      "grad_norm": 1.2879024744033813,
+      "learning_rate": 0.0002842285714285714,
+      "loss": 0.2117,
+      "step": 185
+    },
+    {
+      "epoch": 1.0857142857142856,
+      "grad_norm": 1.6751821041107178,
+      "learning_rate": 0.00028379999999999996,
+      "loss": 0.1796,
+      "step": 190
+    },
+    {
+      "epoch": 1.1142857142857143,
+      "grad_norm": 0.9864417910575867,
+      "learning_rate": 0.00028337142857142854,
+      "loss": 0.1993,
+      "step": 195
+    },
+    {
+      "epoch": 1.1428571428571428,
+      "grad_norm": 1.0174155235290527,
+      "learning_rate": 0.00028294285714285713,
+      "loss": 0.2068,
+      "step": 200
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.029832124710083,
+      "learning_rate": 0.0002825142857142857,
+      "loss": 0.2015,
+      "step": 205
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 0.7745446562767029,
+      "learning_rate": 0.00028208571428571424,
+      "loss": 0.2129,
+      "step": 210
+    },
+    {
+      "epoch": 1.2285714285714286,
+      "grad_norm": 2.5578622817993164,
+      "learning_rate": 0.0002816571428571428,
+      "loss": 0.2224,
+      "step": 215
+    },
+    {
+      "epoch": 1.2571428571428571,
+      "grad_norm": 2.4185051918029785,
+      "learning_rate": 0.0002812285714285714,
+      "loss": 0.2276,
+      "step": 220
+    },
+    {
+      "epoch": 1.2857142857142856,
+      "grad_norm": 1.4176461696624756,
+      "learning_rate": 0.0002808,
+      "loss": 0.1781,
+      "step": 225
+    },
+    {
+      "epoch": 1.3142857142857143,
+      "grad_norm": 0.709326982498169,
+      "learning_rate": 0.0002803714285714286,
+      "loss": 0.2177,
+      "step": 230
+    },
+    {
+      "epoch": 1.342857142857143,
+      "grad_norm": 0.8170766830444336,
+      "learning_rate": 0.0002799428571428571,
+      "loss": 0.1769,
+      "step": 235
+    },
+    {
+      "epoch": 1.3714285714285714,
+      "grad_norm": 1.3850761651992798,
+      "learning_rate": 0.0002795142857142857,
+      "loss": 0.2262,
+      "step": 240
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 1.0064373016357422,
+      "learning_rate": 0.0002790857142857143,
+      "loss": 0.196,
+      "step": 245
+    },
+    {
+      "epoch": 1.4285714285714286,
+      "grad_norm": 1.9635728597640991,
+      "learning_rate": 0.0002786571428571428,
+      "loss": 0.2029,
+      "step": 250
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 16.20791244506836,
+      "learning_rate": 0.0002782285714285714,
+      "loss": 0.3925,
+      "step": 255
+    },
+    {
+      "epoch": 1.4857142857142858,
+      "grad_norm": 1.4363322257995605,
+      "learning_rate": 0.0002778,
+      "loss": 0.3684,
+      "step": 260
+    },
+    {
+      "epoch": 1.5142857142857142,
+      "grad_norm": 0.9379534721374512,
+      "learning_rate": 0.00027737142857142856,
+      "loss": 0.2265,
+      "step": 265
+    },
+    {
+      "epoch": 1.5428571428571427,
+      "grad_norm": 0.8453512787818909,
+      "learning_rate": 0.00027694285714285714,
+      "loss": 0.1976,
+      "step": 270
+    },
+    {
+      "epoch": 1.5714285714285714,
+      "grad_norm": 2.316664695739746,
+      "learning_rate": 0.0002765142857142857,
+      "loss": 0.23,
+      "step": 275
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 1.0548444986343384,
+      "learning_rate": 0.00027608571428571426,
+      "loss": 0.1823,
+      "step": 280
+    },
+    {
+      "epoch": 1.6285714285714286,
+      "grad_norm": 3.7894928455352783,
+      "learning_rate": 0.00027565714285714284,
+      "loss": 0.1962,
+      "step": 285
+    },
+    {
+      "epoch": 1.657142857142857,
+      "grad_norm": 2.3081610202789307,
+      "learning_rate": 0.00027522857142857143,
+      "loss": 0.2087,
+      "step": 290
+    },
+    {
+      "epoch": 1.6857142857142857,
+      "grad_norm": 0.9311438202857971,
+      "learning_rate": 0.0002748,
+      "loss": 0.1597,
+      "step": 295
+    },
+    {
+      "epoch": 1.7142857142857144,
+      "grad_norm": 1.1881247758865356,
+      "learning_rate": 0.00027437142857142854,
+      "loss": 0.1764,
+      "step": 300
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.30265212059021,
+      "learning_rate": 0.0002739428571428571,
+      "loss": 0.1647,
+      "step": 305
+    },
+    {
+      "epoch": 1.7714285714285714,
+      "grad_norm": 0.6832175850868225,
+      "learning_rate": 0.0002735142857142857,
+      "loss": 0.1638,
+      "step": 310
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 1.8740538358688354,
+      "learning_rate": 0.00027308571428571424,
+      "loss": 0.1803,
+      "step": 315
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 9.821504592895508,
+      "learning_rate": 0.0002726571428571428,
+      "loss": 0.226,
+      "step": 320
+    },
+    {
+      "epoch": 1.8571428571428572,
+      "grad_norm": 1.0889750719070435,
+      "learning_rate": 0.0002722285714285714,
+      "loss": 0.1822,
+      "step": 325
+    },
+    {
+      "epoch": 1.8857142857142857,
+      "grad_norm": 0.9660868048667908,
+      "learning_rate": 0.0002718,
+      "loss": 0.1842,
+      "step": 330
+    },
+    {
+      "epoch": 1.9142857142857141,
+      "grad_norm": 0.6329234838485718,
+      "learning_rate": 0.0002713714285714286,
+      "loss": 0.1488,
+      "step": 335
+    },
+    {
+      "epoch": 1.9428571428571428,
+      "grad_norm": 3.601266384124756,
+      "learning_rate": 0.0002709428571428571,
+      "loss": 0.1887,
+      "step": 340
+    },
+    {
+      "epoch": 1.9714285714285715,
+      "grad_norm": 1.1441439390182495,
+      "learning_rate": 0.0002705142857142857,
+      "loss": 0.184,
+      "step": 345
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.8586034774780273,
+      "learning_rate": 0.0002700857142857143,
+      "loss": 0.1578,
+      "step": 350
+    },
+    {
+      "epoch": 2.0285714285714285,
+      "grad_norm": 1.5113487243652344,
+      "learning_rate": 0.00026965714285714286,
+      "loss": 0.2002,
+      "step": 355
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.1123011112213135,
+      "learning_rate": 0.0002692285714285714,
+      "loss": 0.1946,
+      "step": 360
+    },
+    {
+      "epoch": 2.085714285714286,
+      "grad_norm": 0.9377036094665527,
+      "learning_rate": 0.0002688,
+      "loss": 0.1971,
+      "step": 365
+    },
+    {
+      "epoch": 2.1142857142857143,
+      "grad_norm": 0.6956892609596252,
+      "learning_rate": 0.00026837142857142856,
+      "loss": 0.1758,
+      "step": 370
+    },
+    {
+      "epoch": 2.142857142857143,
+      "grad_norm": 0.7510782480239868,
+      "learning_rate": 0.0002679428571428571,
+      "loss": 0.1674,
+      "step": 375
+    },
+    {
+      "epoch": 2.1714285714285713,
+      "grad_norm": 0.7009285092353821,
+      "learning_rate": 0.00026751428571428567,
+      "loss": 0.1945,
+      "step": 380
+    },
+    {
+      "epoch": 2.2,
+      "grad_norm": 0.9555609822273254,
+      "learning_rate": 0.00026708571428571426,
+      "loss": 0.1857,
+      "step": 385
+    },
+    {
+      "epoch": 2.2285714285714286,
+      "grad_norm": 2.133979082107544,
+      "learning_rate": 0.00026665714285714284,
+      "loss": 0.1636,
+      "step": 390
+    },
+    {
+      "epoch": 2.257142857142857,
+      "grad_norm": 0.7105309963226318,
+      "learning_rate": 0.0002662285714285714,
+      "loss": 0.2014,
+      "step": 395
+    },
+    {
+      "epoch": 2.2857142857142856,
+      "grad_norm": 0.7329701781272888,
+      "learning_rate": 0.00026579999999999996,
+      "loss": 0.1884,
+      "step": 400
+    },
+    {
+      "epoch": 2.314285714285714,
+      "grad_norm": 1.0426994562149048,
+      "learning_rate": 0.00026537142857142854,
+      "loss": 0.1558,
+      "step": 405
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 0.9306122660636902,
+      "learning_rate": 0.0002649428571428571,
+      "loss": 0.1774,
+      "step": 410
+    },
+    {
+      "epoch": 2.3714285714285714,
+      "grad_norm": 0.6989394426345825,
+      "learning_rate": 0.00026451428571428565,
+      "loss": 0.1601,
+      "step": 415
+    },
+    {
+      "epoch": 2.4,
+      "grad_norm": 1.4383760690689087,
+      "learning_rate": 0.0002640857142857143,
+      "loss": 0.1564,
+      "step": 420
+    },
+    {
+      "epoch": 2.4285714285714284,
+      "grad_norm": 0.6448336839675903,
+      "learning_rate": 0.0002636571428571428,
+      "loss": 0.1827,
+      "step": 425
+    },
+    {
+      "epoch": 2.4571428571428573,
+      "grad_norm": 0.9535760879516602,
+      "learning_rate": 0.0002632285714285714,
+      "loss": 0.1713,
+      "step": 430
+    },
+    {
+      "epoch": 2.4857142857142858,
+      "grad_norm": 1.034945011138916,
+      "learning_rate": 0.0002628,
+      "loss": 0.1457,
+      "step": 435
+    },
+    {
+      "epoch": 2.5142857142857142,
+      "grad_norm": 1.3225128650665283,
+      "learning_rate": 0.0002623714285714285,
+      "loss": 0.1633,
+      "step": 440
+    },
+    {
+      "epoch": 2.5428571428571427,
+      "grad_norm": 0.8285059928894043,
+      "learning_rate": 0.0002619428571428571,
+      "loss": 0.2004,
+      "step": 445
+    },
+    {
+      "epoch": 2.571428571428571,
+      "grad_norm": 0.773176908493042,
+      "learning_rate": 0.0002615142857142857,
+      "loss": 0.1641,
+      "step": 450
+    },
+    {
+      "epoch": 2.6,
+      "grad_norm": 0.7964853048324585,
+      "learning_rate": 0.0002610857142857143,
+      "loss": 0.1608,
+      "step": 455
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.0967328548431396,
+      "learning_rate": 0.00026065714285714286,
+      "loss": 0.1697,
+      "step": 460
+    },
+    {
+      "epoch": 2.657142857142857,
+      "grad_norm": 0.6462066173553467,
+      "learning_rate": 0.0002602285714285714,
+      "loss": 0.1512,
+      "step": 465
+    },
+    {
+      "epoch": 2.685714285714286,
+      "grad_norm": 0.8765937089920044,
+      "learning_rate": 0.00025979999999999997,
+      "loss": 0.1826,
+      "step": 470
+    },
+    {
+      "epoch": 2.7142857142857144,
+      "grad_norm": 1.2524124383926392,
+      "learning_rate": 0.00025937142857142856,
+      "loss": 0.1731,
+      "step": 475
+    },
+    {
+      "epoch": 2.742857142857143,
+      "grad_norm": 2.2982606887817383,
+      "learning_rate": 0.0002589428571428571,
+      "loss": 0.1852,
+      "step": 480
+    },
+    {
+      "epoch": 2.7714285714285714,
+      "grad_norm": 0.9989053010940552,
+      "learning_rate": 0.0002585142857142857,
+      "loss": 0.1791,
+      "step": 485
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 0.772343635559082,
+      "learning_rate": 0.00025808571428571426,
+      "loss": 0.1862,
+      "step": 490
+    },
+    {
+      "epoch": 2.8285714285714287,
+      "grad_norm": 1.2101136445999146,
+      "learning_rate": 0.00025765714285714284,
+      "loss": 0.1806,
+      "step": 495
+    },
+    {
+      "epoch": 2.857142857142857,
+      "grad_norm": 0.8010189533233643,
+      "learning_rate": 0.0002572285714285714,
+      "loss": 0.1842,
+      "step": 500
+    },
+    {
+      "epoch": 2.8857142857142857,
+      "grad_norm": 1.3597544431686401,
+      "learning_rate": 0.00025679999999999995,
+      "loss": 0.1583,
+      "step": 505
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 0.8790671825408936,
+      "learning_rate": 0.00025637142857142854,
+      "loss": 0.1565,
+      "step": 510
+    },
+    {
+      "epoch": 2.942857142857143,
+      "grad_norm": 1.1175066232681274,
+      "learning_rate": 0.0002559428571428571,
+      "loss": 0.1406,
+      "step": 515
+    },
+    {
+      "epoch": 2.9714285714285715,
+      "grad_norm": 2.8528785705566406,
+      "learning_rate": 0.0002555142857142857,
+      "loss": 0.1735,
+      "step": 520
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 2.2073328495025635,
+      "learning_rate": 0.0002550857142857143,
+      "loss": 0.1816,
+      "step": 525
+    },
+    {
+      "epoch": 3.0285714285714285,
+      "grad_norm": 11.01322078704834,
+      "learning_rate": 0.0002546571428571428,
+      "loss": 0.1873,
+      "step": 530
+    },
+    {
+      "epoch": 3.057142857142857,
+      "grad_norm": 1.5822402238845825,
+      "learning_rate": 0.0002542285714285714,
+      "loss": 0.168,
+      "step": 535
+    },
+    {
+      "epoch": 3.085714285714286,
+      "grad_norm": 1.3086942434310913,
+      "learning_rate": 0.0002538,
+      "loss": 0.149,
+      "step": 540
+    },
+    {
+      "epoch": 3.1142857142857143,
+      "grad_norm": 6.303041458129883,
+      "learning_rate": 0.0002533714285714285,
+      "loss": 0.1651,
+      "step": 545
+    },
+    {
+      "epoch": 3.142857142857143,
+      "grad_norm": 14.48929500579834,
+      "learning_rate": 0.00025294285714285716,
+      "loss": 0.1687,
+      "step": 550
+    },
+    {
+      "epoch": 3.1714285714285713,
+      "grad_norm": 6.824525356292725,
+      "learning_rate": 0.0002525142857142857,
+      "loss": 0.1919,
+      "step": 555
+    },
+    {
+      "epoch": 3.2,
+      "grad_norm": 18.772563934326172,
+      "learning_rate": 0.00025208571428571427,
+      "loss": 0.2075,
+      "step": 560
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 0.7268752455711365,
+      "learning_rate": 0.00025165714285714286,
+      "loss": 0.174,
+      "step": 565
+    },
+    {
+      "epoch": 3.257142857142857,
+      "grad_norm": 1.1301453113555908,
+      "learning_rate": 0.0002512285714285714,
+      "loss": 0.1668,
+      "step": 570
+    },
+    {
+      "epoch": 3.2857142857142856,
+      "grad_norm": 2.846802234649658,
+      "learning_rate": 0.00025079999999999997,
+      "loss": 0.1645,
+      "step": 575
+    },
+    {
+      "epoch": 3.314285714285714,
+      "grad_norm": 1.417515754699707,
+      "learning_rate": 0.00025037142857142855,
+      "loss": 0.1719,
+      "step": 580
+    },
+    {
+      "epoch": 3.342857142857143,
+      "grad_norm": 4.137150764465332,
+      "learning_rate": 0.00024994285714285714,
+      "loss": 0.1739,
+      "step": 585
+    },
+    {
+      "epoch": 3.3714285714285714,
+      "grad_norm": 2.6067259311676025,
+      "learning_rate": 0.0002495142857142857,
+      "loss": 0.1489,
+      "step": 590
+    },
+    {
+      "epoch": 3.4,
+      "grad_norm": 2.601024627685547,
+      "learning_rate": 0.00024908571428571425,
+      "loss": 0.1618,
+      "step": 595
+    },
+    {
+      "epoch": 3.4285714285714284,
+      "grad_norm": 3.849017858505249,
+      "learning_rate": 0.00024865714285714284,
+      "loss": 0.1899,
+      "step": 600
+    },
+    {
+      "epoch": 3.4571428571428573,
+      "grad_norm": 4.673766136169434,
+      "learning_rate": 0.0002482285714285714,
+      "loss": 0.1761,
+      "step": 605
+    },
+    {
+      "epoch": 3.4857142857142858,
+      "grad_norm": 2.6057631969451904,
+      "learning_rate": 0.00024779999999999995,
+      "loss": 0.1743,
+      "step": 610
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.932652473449707,
+      "learning_rate": 0.0002473714285714286,
+      "loss": 0.1482,
+      "step": 615
+    },
+    {
+      "epoch": 3.5428571428571427,
+      "grad_norm": 0.8764939308166504,
+      "learning_rate": 0.0002469428571428571,
+      "loss": 0.1644,
+      "step": 620
+    },
+    {
+      "epoch": 3.571428571428571,
+      "grad_norm": 1.3203191757202148,
+      "learning_rate": 0.0002465142857142857,
+      "loss": 0.1654,
+      "step": 625
+    },
+    {
+      "epoch": 3.6,
+      "grad_norm": 0.7977635264396667,
+      "learning_rate": 0.0002460857142857143,
+      "loss": 0.1472,
+      "step": 630
+    },
+    {
+      "epoch": 3.6285714285714286,
+      "grad_norm": 1.4750248193740845,
+      "learning_rate": 0.0002456571428571428,
+      "loss": 0.1735,
+      "step": 635
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 1.8164482116699219,
+      "learning_rate": 0.0002452285714285714,
+      "loss": 0.1593,
+      "step": 640
+    },
+    {
+      "epoch": 3.685714285714286,
+      "grad_norm": 1.4829603433609009,
+      "learning_rate": 0.0002448,
+      "loss": 0.1508,
+      "step": 645
+    },
+    {
+      "epoch": 3.7142857142857144,
+      "grad_norm": 0.8828144669532776,
+      "learning_rate": 0.00024437142857142857,
+      "loss": 0.1573,
+      "step": 650
+    },
+    {
+      "epoch": 3.742857142857143,
+      "grad_norm": 2.039384126663208,
+      "learning_rate": 0.00024394285714285713,
+      "loss": 0.1745,
+      "step": 655
+    },
+    {
+      "epoch": 3.7714285714285714,
+      "grad_norm": 0.9604200720787048,
+      "learning_rate": 0.00024351428571428569,
+      "loss": 0.17,
+      "step": 660
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 0.7903971076011658,
+      "learning_rate": 0.00024308571428571427,
+      "loss": 0.1654,
+      "step": 665
+    },
+    {
+      "epoch": 3.8285714285714287,
+      "grad_norm": 0.6935649514198303,
+      "learning_rate": 0.00024265714285714283,
+      "loss": 0.1714,
+      "step": 670
+    },
+    {
+      "epoch": 3.857142857142857,
+      "grad_norm": 0.5832012295722961,
+      "learning_rate": 0.00024222857142857138,
+      "loss": 0.1636,
+      "step": 675
+    },
+    {
+      "epoch": 3.8857142857142857,
+      "grad_norm": 0.6303168535232544,
+      "learning_rate": 0.0002418,
+      "loss": 0.1604,
+      "step": 680
+    },
+    {
+      "epoch": 3.914285714285714,
+      "grad_norm": 0.7210885882377625,
+      "learning_rate": 0.00024137142857142855,
+      "loss": 0.1444,
+      "step": 685
+    },
+    {
+      "epoch": 3.942857142857143,
+      "grad_norm": 0.7690990567207336,
+      "learning_rate": 0.00024094285714285714,
+      "loss": 0.1631,
+      "step": 690
+    },
+    {
+      "epoch": 3.9714285714285715,
+      "grad_norm": 1.0142720937728882,
+      "learning_rate": 0.0002405142857142857,
+      "loss": 0.158,
+      "step": 695
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.7970322966575623,
+      "learning_rate": 0.00024008571428571425,
+      "loss": 0.1803,
+      "step": 700
+    },
+    {
+      "epoch": 4.0285714285714285,
+      "grad_norm": 0.6795914769172668,
+      "learning_rate": 0.00023965714285714284,
+      "loss": 0.143,
+      "step": 705
+    },
+    {
+      "epoch": 4.057142857142857,
+      "grad_norm": 0.6832629442214966,
+      "learning_rate": 0.0002392285714285714,
+      "loss": 0.1457,
+      "step": 710
+    },
+    {
+      "epoch": 4.085714285714285,
+      "grad_norm": 3.8629798889160156,
+      "learning_rate": 0.0002388,
+      "loss": 0.1671,
+      "step": 715
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 1.1167882680892944,
+      "learning_rate": 0.00023837142857142856,
+      "loss": 0.1544,
+      "step": 720
+    },
+    {
+      "epoch": 4.142857142857143,
+      "grad_norm": 0.9431412816047668,
+      "learning_rate": 0.00023794285714285712,
+      "loss": 0.1605,
+      "step": 725
+    },
+    {
+      "epoch": 4.171428571428572,
+      "grad_norm": 1.310948133468628,
+      "learning_rate": 0.0002375142857142857,
+      "loss": 0.1121,
+      "step": 730
+    },
+    {
+      "epoch": 4.2,
+      "grad_norm": 0.9830737709999084,
+      "learning_rate": 0.00023708571428571426,
+      "loss": 0.1742,
+      "step": 735
+    },
+    {
+      "epoch": 4.228571428571429,
+      "grad_norm": 0.6166555881500244,
+      "learning_rate": 0.00023665714285714282,
+      "loss": 0.1525,
+      "step": 740
+    },
+    {
+      "epoch": 4.257142857142857,
+      "grad_norm": 0.995579719543457,
+      "learning_rate": 0.00023622857142857143,
+      "loss": 0.1439,
+      "step": 745
+    },
+    {
+      "epoch": 4.285714285714286,
+      "grad_norm": 0.639796793460846,
+      "learning_rate": 0.00023579999999999999,
+      "loss": 0.1692,
+      "step": 750
+    },
+    {
+      "epoch": 4.314285714285714,
+      "grad_norm": 0.9438050389289856,
+      "learning_rate": 0.00023537142857142854,
+      "loss": 0.1785,
+      "step": 755
+    },
+    {
+      "epoch": 4.3428571428571425,
+      "grad_norm": 0.8960750102996826,
+      "learning_rate": 0.00023494285714285713,
+      "loss": 0.1557,
+      "step": 760
+    },
+    {
+      "epoch": 4.371428571428572,
+      "grad_norm": 0.6287499070167542,
+      "learning_rate": 0.00023451428571428568,
+      "loss": 0.1459,
+      "step": 765
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 0.7638295888900757,
+      "learning_rate": 0.00023408571428571424,
+      "loss": 0.1341,
+      "step": 770
+    },
+    {
+      "epoch": 4.428571428571429,
+      "grad_norm": 0.655878484249115,
+      "learning_rate": 0.00023365714285714283,
+      "loss": 0.1358,
+      "step": 775
+    },
+    {
+      "epoch": 4.457142857142857,
+      "grad_norm": 0.5840997695922852,
+      "learning_rate": 0.0002332285714285714,
+      "loss": 0.1386,
+      "step": 780
+    },
+    {
+      "epoch": 4.485714285714286,
+      "grad_norm": 1.1082488298416138,
+      "learning_rate": 0.0002328,
+      "loss": 0.1827,
+      "step": 785
+    },
+    {
+      "epoch": 4.514285714285714,
+      "grad_norm": 0.8825240135192871,
+      "learning_rate": 0.00023237142857142855,
+      "loss": 0.1527,
+      "step": 790
+    },
+    {
+      "epoch": 4.542857142857143,
+      "grad_norm": 0.6752304434776306,
+      "learning_rate": 0.0002319428571428571,
+      "loss": 0.1392,
+      "step": 795
+    },
+    {
+      "epoch": 4.571428571428571,
+      "grad_norm": 1.1423301696777344,
+      "learning_rate": 0.0002315142857142857,
+      "loss": 0.1433,
+      "step": 800
+    },
+    {
+      "epoch": 4.6,
+      "grad_norm": 10.793691635131836,
+      "learning_rate": 0.00023108571428571425,
+      "loss": 0.1635,
+      "step": 805
+    },
+    {
+      "epoch": 4.628571428571428,
+      "grad_norm": 0.47564294934272766,
+      "learning_rate": 0.00023065714285714286,
+      "loss": 0.1199,
+      "step": 810
+    },
+    {
+      "epoch": 4.6571428571428575,
+      "grad_norm": 1.2492656707763672,
+      "learning_rate": 0.00023022857142857142,
+      "loss": 0.1488,
+      "step": 815
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 0.6933501958847046,
+      "learning_rate": 0.00022979999999999997,
+      "loss": 0.1812,
+      "step": 820
+    },
+    {
+      "epoch": 4.714285714285714,
+      "grad_norm": 0.7901633977890015,
+      "learning_rate": 0.00022937142857142856,
+      "loss": 0.1415,
+      "step": 825
+    },
+    {
+      "epoch": 4.742857142857143,
+      "grad_norm": 0.7854829430580139,
+      "learning_rate": 0.00022894285714285712,
+      "loss": 0.1401,
+      "step": 830
+    },
+    {
+      "epoch": 4.771428571428571,
+      "grad_norm": 0.8716740608215332,
+      "learning_rate": 0.00022851428571428567,
+      "loss": 0.1982,
+      "step": 835
+    },
+    {
+      "epoch": 4.8,
+      "grad_norm": 0.7047899961471558,
+      "learning_rate": 0.00022808571428571426,
+      "loss": 0.1624,
+      "step": 840
+    },
+    {
+      "epoch": 4.828571428571428,
+      "grad_norm": 0.7134959697723389,
+      "learning_rate": 0.00022765714285714284,
+      "loss": 0.1375,
+      "step": 845
+    },
+    {
+      "epoch": 4.857142857142857,
+      "grad_norm": 1.0897325277328491,
+      "learning_rate": 0.00022722857142857143,
+      "loss": 0.1489,
+      "step": 850
+    },
+    {
+      "epoch": 4.885714285714286,
+      "grad_norm": 1.1065207719802856,
+      "learning_rate": 0.00022679999999999998,
+      "loss": 0.1495,
+      "step": 855
+    },
+    {
+      "epoch": 4.914285714285715,
+      "grad_norm": 0.7434757351875305,
+      "learning_rate": 0.00022637142857142854,
+      "loss": 0.1507,
+      "step": 860
+    },
+    {
+      "epoch": 4.942857142857143,
+      "grad_norm": 1.0045181512832642,
+      "learning_rate": 0.00022594285714285712,
+      "loss": 0.1527,
+      "step": 865
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 1.2025654315948486,
+      "learning_rate": 0.00022551428571428568,
+      "loss": 0.1523,
+      "step": 870
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.7823342084884644,
+      "learning_rate": 0.0002250857142857143,
+      "loss": 0.1514,
+      "step": 875
+    },
+    {
+      "epoch": 5.0285714285714285,
+      "grad_norm": 0.8405362963676453,
+      "learning_rate": 0.00022465714285714285,
+      "loss": 0.1461,
+      "step": 880
+    },
+    {
+      "epoch": 5.057142857142857,
+      "grad_norm": 0.7527463436126709,
+      "learning_rate": 0.0002242285714285714,
+      "loss": 0.1206,
+      "step": 885
+    },
+    {
+      "epoch": 5.085714285714285,
+      "grad_norm": 0.8372548222541809,
+      "learning_rate": 0.0002238,
+      "loss": 0.1513,
+      "step": 890
+    },
+    {
+      "epoch": 5.114285714285714,
+      "grad_norm": 0.8755456209182739,
+      "learning_rate": 0.00022337142857142855,
+      "loss": 0.1498,
+      "step": 895
+    },
+    {
+      "epoch": 5.142857142857143,
+      "grad_norm": 0.7312084436416626,
+      "learning_rate": 0.0002229428571428571,
+      "loss": 0.154,
+      "step": 900
+    },
+    {
+      "epoch": 5.171428571428572,
+      "grad_norm": 0.6366221904754639,
+      "learning_rate": 0.0002225142857142857,
+      "loss": 0.1466,
+      "step": 905
+    },
+    {
+      "epoch": 5.2,
+      "grad_norm": 0.6406880617141724,
+      "learning_rate": 0.00022208571428571427,
+      "loss": 0.1254,
+      "step": 910
+    },
+    {
+      "epoch": 5.228571428571429,
+      "grad_norm": 2.4106833934783936,
+      "learning_rate": 0.00022165714285714283,
+      "loss": 0.1534,
+      "step": 915
+    },
+    {
+      "epoch": 5.257142857142857,
+      "grad_norm": 0.5635722279548645,
+      "learning_rate": 0.00022122857142857142,
+      "loss": 0.1461,
+      "step": 920
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 0.787162184715271,
+      "learning_rate": 0.00022079999999999997,
+      "loss": 0.1424,
+      "step": 925
+    },
+    {
+      "epoch": 5.314285714285714,
+      "grad_norm": 0.6513975262641907,
+      "learning_rate": 0.00022037142857142853,
+      "loss": 0.1326,
+      "step": 930
+    },
+    {
+      "epoch": 5.3428571428571425,
+      "grad_norm": 0.6933534741401672,
+      "learning_rate": 0.00021994285714285711,
+      "loss": 0.1661,
+      "step": 935
+    },
+    {
+      "epoch": 5.371428571428572,
+      "grad_norm": 0.7263259887695312,
+      "learning_rate": 0.0002195142857142857,
+      "loss": 0.15,
+      "step": 940
+    },
+    {
+      "epoch": 5.4,
+      "grad_norm": 0.5537381768226624,
+      "learning_rate": 0.00021908571428571428,
+      "loss": 0.129,
+      "step": 945
+    },
+    {
+      "epoch": 5.428571428571429,
+      "grad_norm": 0.6014005541801453,
+      "learning_rate": 0.00021865714285714284,
+      "loss": 0.1321,
+      "step": 950
+    },
+    {
+      "epoch": 5.457142857142857,
+      "grad_norm": 0.6581441760063171,
+      "learning_rate": 0.0002182285714285714,
+      "loss": 0.1587,
+      "step": 955
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.9326379895210266,
+      "learning_rate": 0.00021779999999999998,
+      "loss": 0.1654,
+      "step": 960
+    },
+    {
+      "epoch": 5.514285714285714,
+      "grad_norm": 0.9438592791557312,
+      "learning_rate": 0.00021737142857142854,
+      "loss": 0.1212,
+      "step": 965
+    },
+    {
+      "epoch": 5.542857142857143,
+      "grad_norm": 0.7699571251869202,
+      "learning_rate": 0.00021694285714285715,
+      "loss": 0.1464,
+      "step": 970
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 0.8758366703987122,
+      "learning_rate": 0.0002165142857142857,
+      "loss": 0.1599,
+      "step": 975
+    },
+    {
+      "epoch": 5.6,
+      "grad_norm": 0.6101442575454712,
+      "learning_rate": 0.00021608571428571426,
+      "loss": 0.1589,
+      "step": 980
+    },
+    {
+      "epoch": 5.628571428571428,
+      "grad_norm": 0.7454060912132263,
+      "learning_rate": 0.00021565714285714285,
+      "loss": 0.1433,
+      "step": 985
+    },
+    {
+      "epoch": 5.6571428571428575,
+      "grad_norm": 0.6379484534263611,
+      "learning_rate": 0.0002152285714285714,
+      "loss": 0.1592,
+      "step": 990
+    },
+    {
+      "epoch": 5.685714285714286,
+      "grad_norm": 1.1601309776306152,
+      "learning_rate": 0.00021479999999999996,
+      "loss": 0.1647,
+      "step": 995
+    },
+    {
+      "epoch": 5.714285714285714,
+      "grad_norm": 0.5464673638343811,
+      "learning_rate": 0.00021437142857142855,
+      "loss": 0.1469,
+      "step": 1000
+    },
+    {
+      "epoch": 5.742857142857143,
+      "grad_norm": 1.0279319286346436,
+      "learning_rate": 0.00021394285714285713,
+      "loss": 0.1203,
+      "step": 1005
+    },
+    {
+      "epoch": 5.771428571428571,
+      "grad_norm": 0.5503718256950378,
+      "learning_rate": 0.00021351428571428572,
+      "loss": 0.1409,
+      "step": 1010
+    },
+    {
+      "epoch": 5.8,
+      "grad_norm": 0.6123886108398438,
+      "learning_rate": 0.00021308571428571427,
+      "loss": 0.1427,
+      "step": 1015
+    },
+    {
+      "epoch": 5.828571428571428,
+      "grad_norm": 0.6560390591621399,
+      "learning_rate": 0.00021265714285714283,
+      "loss": 0.1415,
+      "step": 1020
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 0.5576716661453247,
+      "learning_rate": 0.00021222857142857141,
+      "loss": 0.1408,
+      "step": 1025
+    },
+    {
+      "epoch": 5.885714285714286,
+      "grad_norm": 0.6419074535369873,
+      "learning_rate": 0.00021179999999999997,
+      "loss": 0.1385,
+      "step": 1030
+    },
+    {
+      "epoch": 5.914285714285715,
+      "grad_norm": 1.008925199508667,
+      "learning_rate": 0.00021137142857142858,
+      "loss": 0.1497,
+      "step": 1035
+    },
+    {
+      "epoch": 5.942857142857143,
+      "grad_norm": 0.6559906005859375,
+      "learning_rate": 0.00021094285714285714,
+      "loss": 0.1218,
+      "step": 1040
+    },
+    {
+      "epoch": 5.9714285714285715,
+      "grad_norm": 0.627164363861084,
+      "learning_rate": 0.0002105142857142857,
+      "loss": 0.1368,
+      "step": 1045
+    },
+    {
+      "epoch": 6.0,
+      "grad_norm": 0.5760972499847412,
+      "learning_rate": 0.00021008571428571428,
+      "loss": 0.1508,
+      "step": 1050
+    },
+    {
+      "epoch": 6.0285714285714285,
+      "grad_norm": 0.5754174590110779,
+      "learning_rate": 0.00020965714285714284,
+      "loss": 0.1181,
+      "step": 1055
+    },
+    {
+      "epoch": 6.057142857142857,
+      "grad_norm": 0.8736348748207092,
+      "learning_rate": 0.0002092285714285714,
+      "loss": 0.1252,
+      "step": 1060
+    },
+    {
+      "epoch": 6.085714285714285,
+      "grad_norm": 0.7166719436645508,
+      "learning_rate": 0.00020879999999999998,
+      "loss": 0.1481,
+      "step": 1065
+    },
+    {
+      "epoch": 6.114285714285714,
+      "grad_norm": 0.6494349241256714,
+      "learning_rate": 0.00020837142857142856,
+      "loss": 0.1478,
+      "step": 1070
+    },
+    {
+      "epoch": 6.142857142857143,
+      "grad_norm": 0.6681587100028992,
+      "learning_rate": 0.00020794285714285712,
+      "loss": 0.1488,
+      "step": 1075
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 0.7123684883117676,
+      "learning_rate": 0.0002075142857142857,
+      "loss": 0.1378,
+      "step": 1080
+    },
+    {
+      "epoch": 6.2,
+      "grad_norm": 0.6146950721740723,
+      "learning_rate": 0.00020708571428571426,
+      "loss": 0.1306,
+      "step": 1085
+    },
+    {
+      "epoch": 6.228571428571429,
+      "grad_norm": 0.8402445912361145,
+      "learning_rate": 0.00020665714285714282,
+      "loss": 0.1063,
+      "step": 1090
+    },
+    {
+      "epoch": 6.257142857142857,
+      "grad_norm": 0.6567764282226562,
+      "learning_rate": 0.0002062285714285714,
+      "loss": 0.1195,
+      "step": 1095
+    },
+    {
+      "epoch": 6.285714285714286,
+      "grad_norm": 0.6006014943122864,
+      "learning_rate": 0.0002058,
+      "loss": 0.1542,
+      "step": 1100
+    },
+    {
+      "epoch": 6.314285714285714,
+      "grad_norm": 0.793100893497467,
+      "learning_rate": 0.00020537142857142857,
+      "loss": 0.1381,
+      "step": 1105
+    },
+    {
+      "epoch": 6.3428571428571425,
+      "grad_norm": 0.5923666954040527,
+      "learning_rate": 0.00020494285714285713,
+      "loss": 0.1386,
+      "step": 1110
+    },
+    {
+      "epoch": 6.371428571428572,
+      "grad_norm": 0.6692521572113037,
+      "learning_rate": 0.0002045142857142857,
+      "loss": 0.1223,
+      "step": 1115
+    },
+    {
+      "epoch": 6.4,
+      "grad_norm": 0.7216306328773499,
+      "learning_rate": 0.00020408571428571427,
+      "loss": 0.1367,
+      "step": 1120
+    },
+    {
+      "epoch": 6.428571428571429,
+      "grad_norm": 0.5640934109687805,
+      "learning_rate": 0.00020365714285714283,
+      "loss": 0.1554,
+      "step": 1125
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 0.8154368996620178,
+      "learning_rate": 0.00020322857142857138,
+      "loss": 0.1674,
+      "step": 1130
+    },
+    {
+      "epoch": 6.485714285714286,
+      "grad_norm": 0.7185398936271667,
+      "learning_rate": 0.0002028,
+      "loss": 0.1375,
+      "step": 1135
+    },
+    {
+      "epoch": 6.514285714285714,
+      "grad_norm": 0.6805170774459839,
+      "learning_rate": 0.00020237142857142855,
+      "loss": 0.1306,
+      "step": 1140
+    },
+    {
+      "epoch": 6.542857142857143,
+      "grad_norm": 0.5996941924095154,
+      "learning_rate": 0.00020194285714285714,
+      "loss": 0.1433,
+      "step": 1145
+    },
+    {
+      "epoch": 6.571428571428571,
+      "grad_norm": 0.5258373022079468,
+      "learning_rate": 0.0002015142857142857,
+      "loss": 0.1285,
+      "step": 1150
+    },
+    {
+      "epoch": 6.6,
+      "grad_norm": 0.7771695256233215,
+      "learning_rate": 0.00020108571428571425,
+      "loss": 0.1493,
+      "step": 1155
+    },
+    {
+      "epoch": 6.628571428571428,
+      "grad_norm": 0.5920616388320923,
+      "learning_rate": 0.00020065714285714284,
+      "loss": 0.1479,
+      "step": 1160
+    },
+    {
+      "epoch": 6.6571428571428575,
+      "grad_norm": 0.7460982799530029,
+      "learning_rate": 0.00020022857142857142,
+      "loss": 0.1173,
+      "step": 1165
+    },
+    {
+      "epoch": 6.685714285714286,
+      "grad_norm": 1.1703822612762451,
+      "learning_rate": 0.0001998,
+      "loss": 0.1402,
+      "step": 1170
+    },
+    {
+      "epoch": 6.714285714285714,
+      "grad_norm": 0.7894724011421204,
+      "learning_rate": 0.00019937142857142856,
+      "loss": 0.1253,
+      "step": 1175
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 0.7013376355171204,
+      "learning_rate": 0.00019894285714285712,
+      "loss": 0.1573,
+      "step": 1180
+    },
+    {
+      "epoch": 6.771428571428571,
+      "grad_norm": 0.6421737670898438,
+      "learning_rate": 0.0001985142857142857,
+      "loss": 0.1497,
+      "step": 1185
+    },
+    {
+      "epoch": 6.8,
+      "grad_norm": 1.204296350479126,
+      "learning_rate": 0.00019808571428571426,
+      "loss": 0.1634,
+      "step": 1190
+    },
+    {
+      "epoch": 6.828571428571428,
+      "grad_norm": 0.867765486240387,
+      "learning_rate": 0.00019765714285714282,
+      "loss": 0.1353,
+      "step": 1195
+    },
+    {
+      "epoch": 6.857142857142857,
+      "grad_norm": 0.7325594425201416,
+      "learning_rate": 0.00019722857142857143,
+      "loss": 0.118,
+      "step": 1200
+    },
+    {
+      "epoch": 6.885714285714286,
+      "grad_norm": 0.7029078006744385,
+      "learning_rate": 0.00019679999999999999,
+      "loss": 0.1425,
+      "step": 1205
+    },
+    {
+      "epoch": 6.914285714285715,
+      "grad_norm": 1.1572504043579102,
+      "learning_rate": 0.00019637142857142857,
+      "loss": 0.1337,
+      "step": 1210
+    },
+    {
+      "epoch": 6.942857142857143,
+      "grad_norm": 0.8022822141647339,
+      "learning_rate": 0.00019594285714285713,
+      "loss": 0.1684,
+      "step": 1215
+    },
+    {
+      "epoch": 6.9714285714285715,
+      "grad_norm": 0.6729874610900879,
+      "learning_rate": 0.00019551428571428568,
+      "loss": 0.1238,
+      "step": 1220
+    },
+    {
+      "epoch": 7.0,
+      "grad_norm": 0.5773627758026123,
+      "learning_rate": 0.00019508571428571427,
+      "loss": 0.138,
+      "step": 1225
+    },
+    {
+      "epoch": 7.0285714285714285,
+      "grad_norm": 0.7182291150093079,
+      "learning_rate": 0.00019465714285714285,
+      "loss": 0.1431,
+      "step": 1230
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 1.7567912340164185,
+      "learning_rate": 0.0001942285714285714,
+      "loss": 0.1319,
+      "step": 1235
+    },
+    {
+      "epoch": 7.085714285714285,
+      "grad_norm": 0.6845232248306274,
+      "learning_rate": 0.0001938,
+      "loss": 0.1292,
+      "step": 1240
+    },
+    {
+      "epoch": 7.114285714285714,
+      "grad_norm": 0.6077771782875061,
+      "learning_rate": 0.00019337142857142855,
+      "loss": 0.1238,
+      "step": 1245
+    },
+    {
+      "epoch": 7.142857142857143,
+      "grad_norm": 0.6168347597122192,
+      "learning_rate": 0.0001929428571428571,
+      "loss": 0.1384,
+      "step": 1250
+    },
+    {
+      "epoch": 7.171428571428572,
+      "grad_norm": 0.7457576394081116,
+      "learning_rate": 0.0001925142857142857,
+      "loss": 0.1306,
+      "step": 1255
+    },
+    {
+      "epoch": 7.2,
+      "grad_norm": 0.5969316363334656,
+      "learning_rate": 0.00019208571428571425,
+      "loss": 0.1123,
+      "step": 1260
+    },
+    {
+      "epoch": 7.228571428571429,
+      "grad_norm": 0.6902753710746765,
+      "learning_rate": 0.00019165714285714286,
+      "loss": 0.1185,
+      "step": 1265
+    },
+    {
+      "epoch": 7.257142857142857,
+      "grad_norm": 0.6488338112831116,
+      "learning_rate": 0.00019122857142857142,
+      "loss": 0.1431,
+      "step": 1270
+    },
+    {
+      "epoch": 7.285714285714286,
+      "grad_norm": 0.6814819574356079,
+      "learning_rate": 0.00019079999999999998,
+      "loss": 0.1495,
+      "step": 1275
+    },
+    {
+      "epoch": 7.314285714285714,
+      "grad_norm": 0.7468088865280151,
+      "learning_rate": 0.00019037142857142856,
+      "loss": 0.1158,
+      "step": 1280
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 0.7417412400245667,
+      "learning_rate": 0.00018994285714285712,
+      "loss": 0.1311,
+      "step": 1285
+    },
+    {
+      "epoch": 7.371428571428572,
+      "grad_norm": 0.5480664372444153,
+      "learning_rate": 0.00018951428571428567,
+      "loss": 0.135,
+      "step": 1290
+    },
+    {
+      "epoch": 7.4,
+      "grad_norm": 0.725527822971344,
+      "learning_rate": 0.00018908571428571429,
+      "loss": 0.1217,
+      "step": 1295
+    },
+    {
+      "epoch": 7.428571428571429,
+      "grad_norm": 0.6566678285598755,
+      "learning_rate": 0.00018865714285714284,
+      "loss": 0.1417,
+      "step": 1300
+    },
+    {
+      "epoch": 7.457142857142857,
+      "grad_norm": 0.516952395439148,
+      "learning_rate": 0.00018822857142857143,
+      "loss": 0.1329,
+      "step": 1305
+    },
+    {
+      "epoch": 7.485714285714286,
+      "grad_norm": 1.9545241594314575,
+      "learning_rate": 0.00018779999999999998,
+      "loss": 0.1339,
+      "step": 1310
+    },
+    {
+      "epoch": 7.514285714285714,
+      "grad_norm": 0.8276839852333069,
+      "learning_rate": 0.00018737142857142854,
+      "loss": 0.1324,
+      "step": 1315
+    },
+    {
+      "epoch": 7.542857142857143,
+      "grad_norm": 0.6737099289894104,
+      "learning_rate": 0.00018694285714285713,
+      "loss": 0.1139,
+      "step": 1320
+    },
+    {
+      "epoch": 7.571428571428571,
+      "grad_norm": 0.6914472579956055,
+      "learning_rate": 0.00018651428571428568,
+      "loss": 0.1146,
+      "step": 1325
+    },
+    {
+      "epoch": 7.6,
+      "grad_norm": 0.6630033850669861,
+      "learning_rate": 0.0001860857142857143,
+      "loss": 0.1571,
+      "step": 1330
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 0.820688784122467,
+      "learning_rate": 0.00018565714285714285,
+      "loss": 0.15,
+      "step": 1335
+    },
+    {
+      "epoch": 7.6571428571428575,
+      "grad_norm": 2.0491325855255127,
+      "learning_rate": 0.0001852285714285714,
+      "loss": 0.127,
+      "step": 1340
+    },
+    {
+      "epoch": 7.685714285714286,
+      "grad_norm": 0.9327268004417419,
+      "learning_rate": 0.0001848,
+      "loss": 0.1289,
+      "step": 1345
+    },
+    {
+      "epoch": 7.714285714285714,
+      "grad_norm": 1.3131701946258545,
+      "learning_rate": 0.00018437142857142855,
+      "loss": 0.1228,
+      "step": 1350
+    },
+    {
+      "epoch": 7.742857142857143,
+      "grad_norm": 2.955918312072754,
+      "learning_rate": 0.0001839428571428571,
+      "loss": 0.1082,
+      "step": 1355
+    },
+    {
+      "epoch": 7.771428571428571,
+      "grad_norm": 1.2165493965148926,
+      "learning_rate": 0.00018351428571428572,
+      "loss": 0.1688,
+      "step": 1360
+    },
+    {
+      "epoch": 7.8,
+      "grad_norm": 0.759324312210083,
+      "learning_rate": 0.00018308571428571428,
+      "loss": 0.1185,
+      "step": 1365
+    },
+    {
+      "epoch": 7.828571428571428,
+      "grad_norm": 0.7445591688156128,
+      "learning_rate": 0.00018265714285714286,
+      "loss": 0.1431,
+      "step": 1370
+    },
+    {
+      "epoch": 7.857142857142857,
+      "grad_norm": 0.679374098777771,
+      "learning_rate": 0.00018222857142857142,
+      "loss": 0.1451,
+      "step": 1375
+    },
+    {
+      "epoch": 7.885714285714286,
+      "grad_norm": 2.1234302520751953,
+      "learning_rate": 0.00018179999999999997,
+      "loss": 0.1265,
+      "step": 1380
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 1.006521224975586,
+      "learning_rate": 0.00018137142857142856,
+      "loss": 0.1722,
+      "step": 1385
+    },
+    {
+      "epoch": 7.942857142857143,
+      "grad_norm": 0.7275253534317017,
+      "learning_rate": 0.00018094285714285712,
+      "loss": 0.1625,
+      "step": 1390
+    },
+    {
+      "epoch": 7.9714285714285715,
+      "grad_norm": 0.8612022995948792,
+      "learning_rate": 0.0001805142857142857,
+      "loss": 0.1345,
+      "step": 1395
+    },
+    {
+      "epoch": 8.0,
+      "grad_norm": 0.7276798486709595,
+      "learning_rate": 0.00018008571428571428,
+      "loss": 0.1236,
+      "step": 1400
+    },
+    {
+      "epoch": 8.028571428571428,
+      "grad_norm": 0.8731086850166321,
+      "learning_rate": 0.00017965714285714284,
+      "loss": 0.1604,
+      "step": 1405
+    },
+    {
+      "epoch": 8.057142857142857,
+      "grad_norm": 0.8950818777084351,
+      "learning_rate": 0.0001792285714285714,
+      "loss": 0.1531,
+      "step": 1410
+    },
+    {
+      "epoch": 8.085714285714285,
+      "grad_norm": 0.7399356365203857,
+      "learning_rate": 0.00017879999999999998,
+      "loss": 0.1508,
+      "step": 1415
+    },
+    {
+      "epoch": 8.114285714285714,
+      "grad_norm": 1.3727307319641113,
+      "learning_rate": 0.00017837142857142854,
+      "loss": 0.1487,
+      "step": 1420
+    },
+    {
+      "epoch": 8.142857142857142,
+      "grad_norm": 0.5938125848770142,
+      "learning_rate": 0.00017794285714285715,
+      "loss": 0.1303,
+      "step": 1425
+    },
+    {
+      "epoch": 8.17142857142857,
+      "grad_norm": 0.7043821811676025,
+      "learning_rate": 0.0001775142857142857,
+      "loss": 0.0948,
+      "step": 1430
+    },
+    {
+      "epoch": 8.2,
+      "grad_norm": 1.1062767505645752,
+      "learning_rate": 0.00017708571428571426,
+      "loss": 0.1412,
+      "step": 1435
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 0.844832181930542,
+      "learning_rate": 0.00017665714285714285,
+      "loss": 0.113,
+      "step": 1440
+    },
+    {
+      "epoch": 8.257142857142856,
+      "grad_norm": 0.7564154863357544,
+      "learning_rate": 0.0001762285714285714,
+      "loss": 0.1319,
+      "step": 1445
+    },
+    {
+      "epoch": 8.285714285714286,
+      "grad_norm": 0.8843110203742981,
+      "learning_rate": 0.00017579999999999996,
+      "loss": 0.1206,
+      "step": 1450
+    },
+    {
+      "epoch": 8.314285714285715,
+      "grad_norm": 0.8175828456878662,
+      "learning_rate": 0.00017537142857142855,
+      "loss": 0.1327,
+      "step": 1455
+    },
+    {
+      "epoch": 8.342857142857143,
+      "grad_norm": 0.6443565487861633,
+      "learning_rate": 0.00017494285714285713,
+      "loss": 0.1239,
+      "step": 1460
+    },
+    {
+      "epoch": 8.371428571428572,
+      "grad_norm": 0.7237185835838318,
+      "learning_rate": 0.00017451428571428572,
+      "loss": 0.1639,
+      "step": 1465
+    },
+    {
+      "epoch": 8.4,
+      "grad_norm": 0.6118057370185852,
+      "learning_rate": 0.00017408571428571427,
+      "loss": 0.1363,
+      "step": 1470
+    },
+    {
+      "epoch": 8.428571428571429,
+      "grad_norm": 0.6754649877548218,
+      "learning_rate": 0.00017365714285714283,
+      "loss": 0.1187,
+      "step": 1475
+    },
+    {
+      "epoch": 8.457142857142857,
+      "grad_norm": 1.0067390203475952,
+      "learning_rate": 0.00017322857142857141,
+      "loss": 0.1401,
+      "step": 1480
+    },
+    {
+      "epoch": 8.485714285714286,
+      "grad_norm": 8.509544372558594,
+      "learning_rate": 0.00017279999999999997,
+      "loss": 0.1304,
+      "step": 1485
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 4.2030205726623535,
+      "learning_rate": 0.00017237142857142858,
+      "loss": 0.121,
+      "step": 1490
+    },
+    {
+      "epoch": 8.542857142857143,
+      "grad_norm": 4.877438068389893,
+      "learning_rate": 0.00017194285714285714,
+      "loss": 0.1918,
+      "step": 1495
+    },
+    {
+      "epoch": 8.571428571428571,
+      "grad_norm": 6.4971232414245605,
+      "learning_rate": 0.0001715142857142857,
+      "loss": 0.2154,
+      "step": 1500
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 20,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 200,
+  "trial_name": null,
+  "trial_params": null
+}
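
Review note: the last four logged entries of this checkpoint (steps 1485 to 1500) show `grad_norm` rising from about 1.0 at step 1480 to between 4.2 and 8.5, while the loss climbs to 0.2154 at step 1500. The sketch below shows one way to scan a `trainer_state.json` for such entries. The helper names `load_log_history` and `find_grad_norm_spikes` and the cutoff of 3.0 are illustrative assumptions and not part of this commit. The sample values are copied from the log above and rounded to four decimals.

```python
import json


def load_log_history(path):
    """Read the `log_history` list from a trainer_state.json file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)["log_history"]


def find_grad_norm_spikes(log_history, threshold=3.0):
    """Return (step, grad_norm) pairs whose grad_norm exceeds `threshold`.

    The threshold is an arbitrary illustrative cutoff, not a value taken
    from the training configuration.
    """
    return [
        (entry["step"], entry["grad_norm"])
        for entry in log_history
        if entry.get("grad_norm", 0.0) > threshold
    ]


# Entries for steps 1470-1500, copied from the diff above (rounded).
sample = [
    {"step": 1470, "grad_norm": 0.6118, "loss": 0.1363},
    {"step": 1475, "grad_norm": 0.6755, "loss": 0.1187},
    {"step": 1480, "grad_norm": 1.0067, "loss": 0.1401},
    {"step": 1485, "grad_norm": 8.5095, "loss": 0.1304},
    {"step": 1490, "grad_norm": 4.2030, "loss": 0.1210},
    {"step": 1495, "grad_norm": 4.8774, "loss": 0.1918},
    {"step": 1500, "grad_norm": 6.4971, "loss": 0.2154},
]

spike_steps = [step for step, _ in find_grad_norm_spikes(sample)]
# spike_steps is [1485, 1490, 1495, 1500]
```

To run this against the real file, something like `find_grad_norm_spikes(load_log_history("glot-contrastive-final-lora/checkpoint-1500/trainer_state.json"))` should work, assuming the checkpoint has been downloaded to that relative path.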

glot-contrastive-final-lora/checkpoint-1500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a87dc6b2c67ad3df98065b9e8fa21d9d93cd2cb361c532cb83c8a37bdc81a3
+size 5777

glot-contrastive-final-lora/checkpoint-2000/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

glot-contrastive-final-lora/checkpoint-2000/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
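
Review note: with `lora_alpha` 32, `r` 16, and `use_rslora` false, the adapter update is scaled by `lora_alpha / r`. The sketch below works through that arithmetic and estimates the number of trainable LoRA parameters. The hidden size of 768 and the 12 encoder layers are assumptions about the base model `./glot-mlm-adapted` (they do not appear in this config), and the function names are hypothetical. Under those assumptions the fp32 parameter payload comes to 2,359,296 bytes, which is close to the 2,365,824-byte `adapter_model.safetensors` pointer in this commit.

```python
def lora_scaling(lora_alpha, r, use_rslora=False):
    """Scaling applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rank-stabilized LoRA uses alpha / sqrt(r).
    """
    return lora_alpha / (r ** 0.5) if use_rslora else lora_alpha / r


def lora_param_count(r, hidden_size, num_layers, modules_per_layer):
    """Trainable parameters added by LoRA on square Linear layers.

    Each adapted layer gets A (r x hidden_size) and B (hidden_size x r).
    """
    per_module = r * (hidden_size + hidden_size)
    return num_layers * modules_per_layer * per_module


# Values from adapter_config.json above.
scaling = lora_scaling(lora_alpha=32, r=16, use_rslora=False)

# hidden_size=768 and num_layers=12 are ASSUMED (XLM-R base sized encoder);
# modules_per_layer=2 follows from target_modules ["query", "value"].
params = lora_param_count(r=16, hidden_size=768, num_layers=12, modules_per_layer=2)
fp32_bytes = params * 4
```

Here `scaling` evaluates to 2.0 and `params` to 589,824. If the base model has a different width or depth, the estimate changes accordingly.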

glot-contrastive-final-lora/checkpoint-2000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:711e07c24e31501f072e595cc3a3ab71fd99dfdb7b91db165f6ee74a84d23cd0
+size 2365824

glot-contrastive-final-lora/checkpoint-2000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3498c655bb1206516340ad6a1a375f5542a3351919099d2fe49c4838bfe9533
+size 4760395

glot-contrastive-final-lora/checkpoint-2000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e03ebb8df928308a6424f992063f5301b7d41a4785e5763346c3448dc6be8b4b
+size 14645

glot-contrastive-final-lora/checkpoint-2000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ad5926187bcca6f27644c72ca9d33e1556220045488e2d905d0c7306c6d222dc
+size 1465

glot-contrastive-final-lora/checkpoint-2000/sentencepiece.bpe.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a313a26470baedaede322622492f2a542aa41527ddc5d40de444e945ad3c613
+size 7658320

glot-contrastive-final-lora/checkpoint-2000/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

glot-contrastive-final-lora/checkpoint-2000/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "401144": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "unk_token": "<unk>",
+  "use_fast": true
+}

glot-contrastive-final-lora/checkpoint-2000/trainer_state.json ADDED

	@@ -0,0 +1,2834 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 11.428571428571429,
+  "eval_steps": 5,
+  "global_step": 2000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.1407003551721573,
+      "learning_rate": 0.00029965714285714283,
+      "loss": 0.9726,
+      "step": 5
+    },
+    {
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.26689061522483826,
+      "learning_rate": 0.0002992285714285714,
+      "loss": 0.9633,
+      "step": 10
+    },
+    {
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.8670485615730286,
+      "learning_rate": 0.0002988,
+      "loss": 0.9013,
+      "step": 15
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.9785467386245728,
+      "learning_rate": 0.00029837142857142853,
+      "loss": 0.6942,
+      "step": 20
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 1.3083932399749756,
+      "learning_rate": 0.0002979428571428571,
+      "loss": 0.4472,
+      "step": 25
+    },
+    {
+      "epoch": 0.17142857142857143,
+      "grad_norm": 1.6103293895721436,
+      "learning_rate": 0.0002975142857142857,
+      "loss": 0.3782,
+      "step": 30
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 2.6353416442871094,
+      "learning_rate": 0.0002970857142857143,
+      "loss": 0.3732,
+      "step": 35
+    },
+    {
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.9949072003364563,
+      "learning_rate": 0.0002966571428571428,
+      "loss": 0.3506,
+      "step": 40
+    },
+    {
+      "epoch": 0.2571428571428571,
+      "grad_norm": 1.280673861503601,
+      "learning_rate": 0.0002962285714285714,
+      "loss": 0.3346,
+      "step": 45
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.7681456208229065,
+      "learning_rate": 0.0002958,
+      "loss": 0.2832,
+      "step": 50
+    },
+    {
+      "epoch": 0.3142857142857143,
+      "grad_norm": 1.0000813007354736,
+      "learning_rate": 0.0002953714285714285,
+      "loss": 0.2603,
+      "step": 55
+    },
+    {
+      "epoch": 0.34285714285714286,
+      "grad_norm": 1.0222399234771729,
+      "learning_rate": 0.0002949428571428571,
+      "loss": 0.2507,
+      "step": 60
+    },
+    {
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.896902322769165,
+      "learning_rate": 0.0002945142857142857,
+      "loss": 0.2556,
+      "step": 65
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.9035541415214539,
+      "learning_rate": 0.00029408571428571426,
+      "loss": 0.2402,
+      "step": 70
+    },
+    {
+      "epoch": 0.42857142857142855,
+      "grad_norm": 1.4886469841003418,
+      "learning_rate": 0.00029365714285714285,
+      "loss": 0.2376,
+      "step": 75
+    },
+    {
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.8951187133789062,
+      "learning_rate": 0.0002932285714285714,
+      "loss": 0.2276,
+      "step": 80
+    },
+    {
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.7876377105712891,
+      "learning_rate": 0.00029279999999999996,
+      "loss": 0.2537,
+      "step": 85
+    },
+    {
+      "epoch": 0.5142857142857142,
+      "grad_norm": 1.0927226543426514,
+      "learning_rate": 0.00029237142857142855,
+      "loss": 0.2152,
+      "step": 90
+    },
+    {
+      "epoch": 0.5428571428571428,
+      "grad_norm": 1.4946355819702148,
+      "learning_rate": 0.00029194285714285713,
+      "loss": 0.2441,
+      "step": 95
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.7082991600036621,
+      "learning_rate": 0.0002915142857142857,
+      "loss": 0.2708,
+      "step": 100
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 0.670010507106781,
+      "learning_rate": 0.00029108571428571424,
+      "loss": 0.2396,
+      "step": 105
+    },
+    {
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.9797312021255493,
+      "learning_rate": 0.00029065714285714283,
+      "loss": 0.2275,
+      "step": 110
+    },
+    {
+      "epoch": 0.6571428571428571,
+      "grad_norm": 1.5220463275909424,
+      "learning_rate": 0.0002902285714285714,
+      "loss": 0.2114,
+      "step": 115
+    },
+    {
+      "epoch": 0.6857142857142857,
+      "grad_norm": 1.3326867818832397,
+      "learning_rate": 0.00028979999999999994,
+      "loss": 0.241,
+      "step": 120
+    },
+    {
+      "epoch": 0.7142857142857143,
+      "grad_norm": 1.1195529699325562,
+      "learning_rate": 0.0002893714285714285,
+      "loss": 0.2389,
+      "step": 125
+    },
+    {
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.7551061511039734,
+      "learning_rate": 0.0002889428571428571,
+      "loss": 0.2162,
+      "step": 130
+    },
+    {
+      "epoch": 0.7714285714285715,
+      "grad_norm": 1.018908977508545,
+      "learning_rate": 0.0002885142857142857,
+      "loss": 0.1924,
+      "step": 135
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 2.123642921447754,
+      "learning_rate": 0.0002880857142857143,
+      "loss": 0.2174,
+      "step": 140
+    },
+    {
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.7585068941116333,
+      "learning_rate": 0.0002876571428571428,
+      "loss": 0.2006,
+      "step": 145
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.64150869846344,
+      "learning_rate": 0.0002872285714285714,
+      "loss": 0.1905,
+      "step": 150
+    },
+    {
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.9126951694488525,
+      "learning_rate": 0.0002868,
+      "loss": 0.2312,
+      "step": 155
+    },
+    {
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.7278801202774048,
+      "learning_rate": 0.00028637142857142856,
+      "loss": 0.2077,
+      "step": 160
+    },
+    {
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.8931339383125305,
+      "learning_rate": 0.00028594285714285715,
+      "loss": 0.1951,
+      "step": 165
+    },
+    {
+      "epoch": 0.9714285714285714,
+      "grad_norm": 1.0831843614578247,
+      "learning_rate": 0.0002855142857142857,
+      "loss": 0.2103,
+      "step": 170
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 1.3750063180923462,
+      "learning_rate": 0.00028508571428571426,
+      "loss": 0.2396,
+      "step": 175
+    },
+    {
+      "epoch": 1.0285714285714285,
+      "grad_norm": 0.8338337540626526,
+      "learning_rate": 0.00028465714285714285,
+      "loss": 0.2404,
+      "step": 180
+    },
+    {
+      "epoch": 1.0571428571428572,
+      "grad_norm": 1.2879024744033813,
+      "learning_rate": 0.0002842285714285714,
+      "loss": 0.2117,
+      "step": 185
+    },
+    {
+      "epoch": 1.0857142857142856,
+      "grad_norm": 1.6751821041107178,
+      "learning_rate": 0.00028379999999999996,
+      "loss": 0.1796,
+      "step": 190
+    },
+    {
+      "epoch": 1.1142857142857143,
+      "grad_norm": 0.9864417910575867,
+      "learning_rate": 0.00028337142857142854,
+      "loss": 0.1993,
+      "step": 195
+    },
+    {
+      "epoch": 1.1428571428571428,
+      "grad_norm": 1.0174155235290527,
+      "learning_rate": 0.00028294285714285713,
+      "loss": 0.2068,
+      "step": 200
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.029832124710083,
+      "learning_rate": 0.0002825142857142857,
+      "loss": 0.2015,
+      "step": 205
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 0.7745446562767029,
+      "learning_rate": 0.00028208571428571424,
+      "loss": 0.2129,
+      "step": 210
+    },
+    {
+      "epoch": 1.2285714285714286,
+      "grad_norm": 2.5578622817993164,
+      "learning_rate": 0.0002816571428571428,
+      "loss": 0.2224,
+      "step": 215
+    },
+    {
+      "epoch": 1.2571428571428571,
+      "grad_norm": 2.4185051918029785,
+      "learning_rate": 0.0002812285714285714,
+      "loss": 0.2276,
+      "step": 220
+    },
+    {
+      "epoch": 1.2857142857142856,
+      "grad_norm": 1.4176461696624756,
+      "learning_rate": 0.0002808,
+      "loss": 0.1781,
+      "step": 225
+    },
+    {
+      "epoch": 1.3142857142857143,
+      "grad_norm": 0.709326982498169,
+      "learning_rate": 0.0002803714285714286,
+      "loss": 0.2177,
+      "step": 230
+    },
+    {
+      "epoch": 1.342857142857143,
+      "grad_norm": 0.8170766830444336,
+      "learning_rate": 0.0002799428571428571,
+      "loss": 0.1769,
+      "step": 235
+    },
+    {
+      "epoch": 1.3714285714285714,
+      "grad_norm": 1.3850761651992798,
+      "learning_rate": 0.0002795142857142857,
+      "loss": 0.2262,
+      "step": 240
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 1.0064373016357422,
+      "learning_rate": 0.0002790857142857143,
+      "loss": 0.196,
+      "step": 245
+    },
+    {
+      "epoch": 1.4285714285714286,
+      "grad_norm": 1.9635728597640991,
+      "learning_rate": 0.0002786571428571428,
+      "loss": 0.2029,
+      "step": 250
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 16.20791244506836,
+      "learning_rate": 0.0002782285714285714,
+      "loss": 0.3925,
+      "step": 255
+    },
+    {
+      "epoch": 1.4857142857142858,
+      "grad_norm": 1.4363322257995605,
+      "learning_rate": 0.0002778,
+      "loss": 0.3684,
+      "step": 260
+    },
+    {
+      "epoch": 1.5142857142857142,
+      "grad_norm": 0.9379534721374512,
+      "learning_rate": 0.00027737142857142856,
+      "loss": 0.2265,
+      "step": 265
+    },
+    {
+      "epoch": 1.5428571428571427,
+      "grad_norm": 0.8453512787818909,
+      "learning_rate": 0.00027694285714285714,
+      "loss": 0.1976,
+      "step": 270
+    },
+    {
+      "epoch": 1.5714285714285714,
+      "grad_norm": 2.316664695739746,
+      "learning_rate": 0.0002765142857142857,
+      "loss": 0.23,
+      "step": 275
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 1.0548444986343384,
+      "learning_rate": 0.00027608571428571426,
+      "loss": 0.1823,
+      "step": 280
+    },
+    {
+      "epoch": 1.6285714285714286,
+      "grad_norm": 3.7894928455352783,
+      "learning_rate": 0.00027565714285714284,
+      "loss": 0.1962,
+      "step": 285
+    },
+    {
+      "epoch": 1.657142857142857,
+      "grad_norm": 2.3081610202789307,
+      "learning_rate": 0.00027522857142857143,
+      "loss": 0.2087,
+      "step": 290
+    },
+    {
+      "epoch": 1.6857142857142857,
+      "grad_norm": 0.9311438202857971,
+      "learning_rate": 0.0002748,
+      "loss": 0.1597,
+      "step": 295
+    },
+    {
+      "epoch": 1.7142857142857144,
+      "grad_norm": 1.1881247758865356,
+      "learning_rate": 0.00027437142857142854,
+      "loss": 0.1764,
+      "step": 300
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.30265212059021,
+      "learning_rate": 0.0002739428571428571,
+      "loss": 0.1647,
+      "step": 305
+    },
+    {
+      "epoch": 1.7714285714285714,
+      "grad_norm": 0.6832175850868225,
+      "learning_rate": 0.0002735142857142857,
+      "loss": 0.1638,
+      "step": 310
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 1.8740538358688354,
+      "learning_rate": 0.00027308571428571424,
+      "loss": 0.1803,
+      "step": 315
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 9.821504592895508,
+      "learning_rate": 0.0002726571428571428,
+      "loss": 0.226,
+      "step": 320
+    },
+    {
+      "epoch": 1.8571428571428572,
+      "grad_norm": 1.0889750719070435,
+      "learning_rate": 0.0002722285714285714,
+      "loss": 0.1822,
+      "step": 325
+    },
+    {
+      "epoch": 1.8857142857142857,
+      "grad_norm": 0.9660868048667908,
+      "learning_rate": 0.0002718,
+      "loss": 0.1842,
+      "step": 330
+    },
+    {
+      "epoch": 1.9142857142857141,
+      "grad_norm": 0.6329234838485718,
+      "learning_rate": 0.0002713714285714286,
+      "loss": 0.1488,
+      "step": 335
+    },
+    {
+      "epoch": 1.9428571428571428,
+      "grad_norm": 3.601266384124756,
+      "learning_rate": 0.0002709428571428571,
+      "loss": 0.1887,
+      "step": 340
+    },
+    {
+      "epoch": 1.9714285714285715,
+      "grad_norm": 1.1441439390182495,
+      "learning_rate": 0.0002705142857142857,
+      "loss": 0.184,
+      "step": 345
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.8586034774780273,
+      "learning_rate": 0.0002700857142857143,
+      "loss": 0.1578,
+      "step": 350
+    },
+    {
+      "epoch": 2.0285714285714285,
+      "grad_norm": 1.5113487243652344,
+      "learning_rate": 0.00026965714285714286,
+      "loss": 0.2002,
+      "step": 355
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.1123011112213135,
+      "learning_rate": 0.0002692285714285714,
+      "loss": 0.1946,
+      "step": 360
+    },
+    {
+      "epoch": 2.085714285714286,
+      "grad_norm": 0.9377036094665527,
+      "learning_rate": 0.0002688,
+      "loss": 0.1971,
+      "step": 365
+    },
+    {
+      "epoch": 2.1142857142857143,
+      "grad_norm": 0.6956892609596252,
+      "learning_rate": 0.00026837142857142856,
+      "loss": 0.1758,
+      "step": 370
+    },
+    {
+      "epoch": 2.142857142857143,
+      "grad_norm": 0.7510782480239868,
+      "learning_rate": 0.0002679428571428571,
+      "loss": 0.1674,
+      "step": 375
+    },
+    {
+      "epoch": 2.1714285714285713,
+      "grad_norm": 0.7009285092353821,
+      "learning_rate": 0.00026751428571428567,
+      "loss": 0.1945,
+      "step": 380
+    },
+    {
+      "epoch": 2.2,
+      "grad_norm": 0.9555609822273254,
+      "learning_rate": 0.00026708571428571426,
+      "loss": 0.1857,
+      "step": 385
+    },
+    {
+      "epoch": 2.2285714285714286,
+      "grad_norm": 2.133979082107544,
+      "learning_rate": 0.00026665714285714284,
+      "loss": 0.1636,
+      "step": 390
+    },
+    {
+      "epoch": 2.257142857142857,
+      "grad_norm": 0.7105309963226318,
+      "learning_rate": 0.0002662285714285714,
+      "loss": 0.2014,
+      "step": 395
+    },
+    {
+      "epoch": 2.2857142857142856,
+      "grad_norm": 0.7329701781272888,
+      "learning_rate": 0.00026579999999999996,
+      "loss": 0.1884,
+      "step": 400
+    },
+    {
+      "epoch": 2.314285714285714,
+      "grad_norm": 1.0426994562149048,
+      "learning_rate": 0.00026537142857142854,
+      "loss": 0.1558,
+      "step": 405
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 0.9306122660636902,
+      "learning_rate": 0.0002649428571428571,
+      "loss": 0.1774,
+      "step": 410
+    },
+    {
+      "epoch": 2.3714285714285714,
+      "grad_norm": 0.6989394426345825,
+      "learning_rate": 0.00026451428571428565,
+      "loss": 0.1601,
+      "step": 415
+    },
+    {
+      "epoch": 2.4,
+      "grad_norm": 1.4383760690689087,
+      "learning_rate": 0.0002640857142857143,
+      "loss": 0.1564,
+      "step": 420
+    },
+    {
+      "epoch": 2.4285714285714284,
+      "grad_norm": 0.6448336839675903,
+      "learning_rate": 0.0002636571428571428,
+      "loss": 0.1827,
+      "step": 425
+    },
+    {
+      "epoch": 2.4571428571428573,
+      "grad_norm": 0.9535760879516602,
+      "learning_rate": 0.0002632285714285714,
+      "loss": 0.1713,
+      "step": 430
+    },
+    {
+      "epoch": 2.4857142857142858,
+      "grad_norm": 1.034945011138916,
+      "learning_rate": 0.0002628,
+      "loss": 0.1457,
+      "step": 435
+    },
+    {
+      "epoch": 2.5142857142857142,
+      "grad_norm": 1.3225128650665283,
+      "learning_rate": 0.0002623714285714285,
+      "loss": 0.1633,
+      "step": 440
+    },
+    {
+      "epoch": 2.5428571428571427,
+      "grad_norm": 0.8285059928894043,
+      "learning_rate": 0.0002619428571428571,
+      "loss": 0.2004,
+      "step": 445
+    },
+    {
+      "epoch": 2.571428571428571,
+      "grad_norm": 0.773176908493042,
+      "learning_rate": 0.0002615142857142857,
+      "loss": 0.1641,
+      "step": 450
+    },
+    {
+      "epoch": 2.6,
+      "grad_norm": 0.7964853048324585,
+      "learning_rate": 0.0002610857142857143,
+      "loss": 0.1608,
+      "step": 455
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.0967328548431396,
+      "learning_rate": 0.00026065714285714286,
+      "loss": 0.1697,
+      "step": 460
+    },
+    {
+      "epoch": 2.657142857142857,
+      "grad_norm": 0.6462066173553467,
+      "learning_rate": 0.0002602285714285714,
+      "loss": 0.1512,
+      "step": 465
+    },
+    {
+      "epoch": 2.685714285714286,
+      "grad_norm": 0.8765937089920044,
+      "learning_rate": 0.00025979999999999997,
+      "loss": 0.1826,
+      "step": 470
+    },
+    {
+      "epoch": 2.7142857142857144,
+      "grad_norm": 1.2524124383926392,
+      "learning_rate": 0.00025937142857142856,
+      "loss": 0.1731,
+      "step": 475
+    },
+    {
+      "epoch": 2.742857142857143,
+      "grad_norm": 2.2982606887817383,
+      "learning_rate": 0.0002589428571428571,
+      "loss": 0.1852,
+      "step": 480
+    },
+    {
+      "epoch": 2.7714285714285714,
+      "grad_norm": 0.9989053010940552,
+      "learning_rate": 0.0002585142857142857,
+      "loss": 0.1791,
+      "step": 485
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 0.772343635559082,
+      "learning_rate": 0.00025808571428571426,
+      "loss": 0.1862,
+      "step": 490
+    },
+    {
+      "epoch": 2.8285714285714287,
+      "grad_norm": 1.2101136445999146,
+      "learning_rate": 0.00025765714285714284,
+      "loss": 0.1806,
+      "step": 495
+    },
+    {
+      "epoch": 2.857142857142857,
+      "grad_norm": 0.8010189533233643,
+      "learning_rate": 0.0002572285714285714,
+      "loss": 0.1842,
+      "step": 500
+    },
+    {
+      "epoch": 2.8857142857142857,
+      "grad_norm": 1.3597544431686401,
+      "learning_rate": 0.00025679999999999995,
+      "loss": 0.1583,
+      "step": 505
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 0.8790671825408936,
+      "learning_rate": 0.00025637142857142854,
+      "loss": 0.1565,
+      "step": 510
+    },
+    {
+      "epoch": 2.942857142857143,
+      "grad_norm": 1.1175066232681274,
+      "learning_rate": 0.0002559428571428571,
+      "loss": 0.1406,
+      "step": 515
+    },
+    {
+      "epoch": 2.9714285714285715,
+      "grad_norm": 2.8528785705566406,
+      "learning_rate": 0.0002555142857142857,
+      "loss": 0.1735,
+      "step": 520
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 2.2073328495025635,
+      "learning_rate": 0.0002550857142857143,
+      "loss": 0.1816,
+      "step": 525
+    },
+    {
+      "epoch": 3.0285714285714285,
+      "grad_norm": 11.01322078704834,
+      "learning_rate": 0.0002546571428571428,
+      "loss": 0.1873,
+      "step": 530
+    },
+    {
+      "epoch": 3.057142857142857,
+      "grad_norm": 1.5822402238845825,
+      "learning_rate": 0.0002542285714285714,
+      "loss": 0.168,
+      "step": 535
+    },
+    {
+      "epoch": 3.085714285714286,
+      "grad_norm": 1.3086942434310913,
+      "learning_rate": 0.0002538,
+      "loss": 0.149,
+      "step": 540
+    },
+    {
+      "epoch": 3.1142857142857143,
+      "grad_norm": 6.303041458129883,
+      "learning_rate": 0.0002533714285714285,
+      "loss": 0.1651,
+      "step": 545
+    },
+    {
+      "epoch": 3.142857142857143,
+      "grad_norm": 14.48929500579834,
+      "learning_rate": 0.00025294285714285716,
+      "loss": 0.1687,
+      "step": 550
+    },
+    {
+      "epoch": 3.1714285714285713,
+      "grad_norm": 6.824525356292725,
+      "learning_rate": 0.0002525142857142857,
+      "loss": 0.1919,
+      "step": 555
+    },
+    {
+      "epoch": 3.2,
+      "grad_norm": 18.772563934326172,
+      "learning_rate": 0.00025208571428571427,
+      "loss": 0.2075,
+      "step": 560
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 0.7268752455711365,
+      "learning_rate": 0.00025165714285714286,
+      "loss": 0.174,
+      "step": 565
+    },
+    {
+      "epoch": 3.257142857142857,
+      "grad_norm": 1.1301453113555908,
+      "learning_rate": 0.0002512285714285714,
+      "loss": 0.1668,
+      "step": 570
+    },
+    {
+      "epoch": 3.2857142857142856,
+      "grad_norm": 2.846802234649658,
+      "learning_rate": 0.00025079999999999997,
+      "loss": 0.1645,
+      "step": 575
+    },
+    {
+      "epoch": 3.314285714285714,
+      "grad_norm": 1.417515754699707,
+      "learning_rate": 0.00025037142857142855,
+      "loss": 0.1719,
+      "step": 580
+    },
+    {
+      "epoch": 3.342857142857143,
+      "grad_norm": 4.137150764465332,
+      "learning_rate": 0.00024994285714285714,
+      "loss": 0.1739,
+      "step": 585
+    },
+    {
+      "epoch": 3.3714285714285714,
+      "grad_norm": 2.6067259311676025,
+      "learning_rate": 0.0002495142857142857,
+      "loss": 0.1489,
+      "step": 590
+    },
+    {
+      "epoch": 3.4,
+      "grad_norm": 2.601024627685547,
+      "learning_rate": 0.00024908571428571425,
+      "loss": 0.1618,
+      "step": 595
+    },
+    {
+      "epoch": 3.4285714285714284,
+      "grad_norm": 3.849017858505249,
+      "learning_rate": 0.00024865714285714284,
+      "loss": 0.1899,
+      "step": 600
+    },
+    {
+      "epoch": 3.4571428571428573,
+      "grad_norm": 4.673766136169434,
+      "learning_rate": 0.0002482285714285714,
+      "loss": 0.1761,
+      "step": 605
+    },
+    {
+      "epoch": 3.4857142857142858,
+      "grad_norm": 2.6057631969451904,
+      "learning_rate": 0.00024779999999999995,
+      "loss": 0.1743,
+      "step": 610
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.932652473449707,
+      "learning_rate": 0.0002473714285714286,
+      "loss": 0.1482,
+      "step": 615
+    },
+    {
+      "epoch": 3.5428571428571427,
+      "grad_norm": 0.8764939308166504,
+      "learning_rate": 0.0002469428571428571,
+      "loss": 0.1644,
+      "step": 620
+    },
+    {
+      "epoch": 3.571428571428571,
+      "grad_norm": 1.3203191757202148,
+      "learning_rate": 0.0002465142857142857,
+      "loss": 0.1654,
+      "step": 625
+    },
+    {
+      "epoch": 3.6,
+      "grad_norm": 0.7977635264396667,
+      "learning_rate": 0.0002460857142857143,
+      "loss": 0.1472,
+      "step": 630
+    },
+    {
+      "epoch": 3.6285714285714286,
+      "grad_norm": 1.4750248193740845,
+      "learning_rate": 0.0002456571428571428,
+      "loss": 0.1735,
+      "step": 635
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 1.8164482116699219,
+      "learning_rate": 0.0002452285714285714,
+      "loss": 0.1593,
+      "step": 640
+    },
+    {
+      "epoch": 3.685714285714286,
+      "grad_norm": 1.4829603433609009,
+      "learning_rate": 0.0002448,
+      "loss": 0.1508,
+      "step": 645
+    },
+    {
+      "epoch": 3.7142857142857144,
+      "grad_norm": 0.8828144669532776,
+      "learning_rate": 0.00024437142857142857,
+      "loss": 0.1573,
+      "step": 650
+    },
+    {
+      "epoch": 3.742857142857143,
+      "grad_norm": 2.039384126663208,
+      "learning_rate": 0.00024394285714285713,
+      "loss": 0.1745,
+      "step": 655
+    },
+    {
+      "epoch": 3.7714285714285714,
+      "grad_norm": 0.9604200720787048,
+      "learning_rate": 0.00024351428571428569,
+      "loss": 0.17,
+      "step": 660
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 0.7903971076011658,
+      "learning_rate": 0.00024308571428571427,
+      "loss": 0.1654,
+      "step": 665
+    },
+    {
+      "epoch": 3.8285714285714287,
+      "grad_norm": 0.6935649514198303,
+      "learning_rate": 0.00024265714285714283,
+      "loss": 0.1714,
+      "step": 670
+    },
+    {
+      "epoch": 3.857142857142857,
+      "grad_norm": 0.5832012295722961,
+      "learning_rate": 0.00024222857142857138,
+      "loss": 0.1636,
+      "step": 675
+    },
+    {
+      "epoch": 3.8857142857142857,
+      "grad_norm": 0.6303168535232544,
+      "learning_rate": 0.0002418,
+      "loss": 0.1604,
+      "step": 680
+    },
+    {
+      "epoch": 3.914285714285714,
+      "grad_norm": 0.7210885882377625,
+      "learning_rate": 0.00024137142857142855,
+      "loss": 0.1444,
+      "step": 685
+    },
+    {
+      "epoch": 3.942857142857143,
+      "grad_norm": 0.7690990567207336,
+      "learning_rate": 0.00024094285714285714,
+      "loss": 0.1631,
+      "step": 690
+    },
+    {
+      "epoch": 3.9714285714285715,
+      "grad_norm": 1.0142720937728882,
+      "learning_rate": 0.0002405142857142857,
+      "loss": 0.158,
+      "step": 695
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.7970322966575623,
+      "learning_rate": 0.00024008571428571425,
+      "loss": 0.1803,
+      "step": 700
+    },
+    {
+      "epoch": 4.0285714285714285,
+      "grad_norm": 0.6795914769172668,
+      "learning_rate": 0.00023965714285714284,
+      "loss": 0.143,
+      "step": 705
+    },
+    {
+      "epoch": 4.057142857142857,
+      "grad_norm": 0.6832629442214966,
+      "learning_rate": 0.0002392285714285714,
+      "loss": 0.1457,
+      "step": 710
+    },
+    {
+      "epoch": 4.085714285714285,
+      "grad_norm": 3.8629798889160156,
+      "learning_rate": 0.0002388,
+      "loss": 0.1671,
+      "step": 715
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 1.1167882680892944,
+      "learning_rate": 0.00023837142857142856,
+      "loss": 0.1544,
+      "step": 720
+    },
+    {
+      "epoch": 4.142857142857143,
+      "grad_norm": 0.9431412816047668,
+      "learning_rate": 0.00023794285714285712,
+      "loss": 0.1605,
+      "step": 725
+    },
+    {
+      "epoch": 4.171428571428572,
+      "grad_norm": 1.310948133468628,
+      "learning_rate": 0.0002375142857142857,
+      "loss": 0.1121,
+      "step": 730
+    },
+    {
+      "epoch": 4.2,
+      "grad_norm": 0.9830737709999084,
+      "learning_rate": 0.00023708571428571426,
+      "loss": 0.1742,
+      "step": 735
+    },
+    {
+      "epoch": 4.228571428571429,
+      "grad_norm": 0.6166555881500244,
+      "learning_rate": 0.00023665714285714282,
+      "loss": 0.1525,
+      "step": 740
+    },
+    {
+      "epoch": 4.257142857142857,
+      "grad_norm": 0.995579719543457,
+      "learning_rate": 0.00023622857142857143,
+      "loss": 0.1439,
+      "step": 745
+    },
+    {
+      "epoch": 4.285714285714286,
+      "grad_norm": 0.639796793460846,
+      "learning_rate": 0.00023579999999999999,
+      "loss": 0.1692,
+      "step": 750
+    },
+    {
+      "epoch": 4.314285714285714,
+      "grad_norm": 0.9438050389289856,
+      "learning_rate": 0.00023537142857142854,
+      "loss": 0.1785,
+      "step": 755
+    },
+    {
+      "epoch": 4.3428571428571425,
+      "grad_norm": 0.8960750102996826,
+      "learning_rate": 0.00023494285714285713,
+      "loss": 0.1557,
+      "step": 760
+    },
+    {
+      "epoch": 4.371428571428572,
+      "grad_norm": 0.6287499070167542,
+      "learning_rate": 0.00023451428571428568,
+      "loss": 0.1459,
+      "step": 765
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 0.7638295888900757,
+      "learning_rate": 0.00023408571428571424,
+      "loss": 0.1341,
+      "step": 770
+    },
+    {
+      "epoch": 4.428571428571429,
+      "grad_norm": 0.655878484249115,
+      "learning_rate": 0.00023365714285714283,
+      "loss": 0.1358,
+      "step": 775
+    },
+    {
+      "epoch": 4.457142857142857,
+      "grad_norm": 0.5840997695922852,
+      "learning_rate": 0.0002332285714285714,
+      "loss": 0.1386,
+      "step": 780
+    },
+    {
+      "epoch": 4.485714285714286,
+      "grad_norm": 1.1082488298416138,
+      "learning_rate": 0.0002328,
+      "loss": 0.1827,
+      "step": 785
+    },
+    {
+      "epoch": 4.514285714285714,
+      "grad_norm": 0.8825240135192871,
+      "learning_rate": 0.00023237142857142855,
+      "loss": 0.1527,
+      "step": 790
+    },
+    {
+      "epoch": 4.542857142857143,
+      "grad_norm": 0.6752304434776306,
+      "learning_rate": 0.0002319428571428571,
+      "loss": 0.1392,
+      "step": 795
+    },
+    {
+      "epoch": 4.571428571428571,
+      "grad_norm": 1.1423301696777344,
+      "learning_rate": 0.0002315142857142857,
+      "loss": 0.1433,
+      "step": 800
+    },
+    {
+      "epoch": 4.6,
+      "grad_norm": 10.793691635131836,
+      "learning_rate": 0.00023108571428571425,
+      "loss": 0.1635,
+      "step": 805
+    },
+    {
+      "epoch": 4.628571428571428,
+      "grad_norm": 0.47564294934272766,
+      "learning_rate": 0.00023065714285714286,
+      "loss": 0.1199,
+      "step": 810
+    },
+    {
+      "epoch": 4.6571428571428575,
+      "grad_norm": 1.2492656707763672,
+      "learning_rate": 0.00023022857142857142,
+      "loss": 0.1488,
+      "step": 815
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 0.6933501958847046,
+      "learning_rate": 0.00022979999999999997,
+      "loss": 0.1812,
+      "step": 820
+    },
+    {
+      "epoch": 4.714285714285714,
+      "grad_norm": 0.7901633977890015,
+      "learning_rate": 0.00022937142857142856,
+      "loss": 0.1415,
+      "step": 825
+    },
+    {
+      "epoch": 4.742857142857143,
+      "grad_norm": 0.7854829430580139,
+      "learning_rate": 0.00022894285714285712,
+      "loss": 0.1401,
+      "step": 830
+    },
+    {
+      "epoch": 4.771428571428571,
+      "grad_norm": 0.8716740608215332,
+      "learning_rate": 0.00022851428571428567,
+      "loss": 0.1982,
+      "step": 835
+    },
+    {
+      "epoch": 4.8,
+      "grad_norm": 0.7047899961471558,
+      "learning_rate": 0.00022808571428571426,
+      "loss": 0.1624,
+      "step": 840
+    },
+    {
+      "epoch": 4.828571428571428,
+      "grad_norm": 0.7134959697723389,
+      "learning_rate": 0.00022765714285714284,
+      "loss": 0.1375,
+      "step": 845
+    },
+    {
+      "epoch": 4.857142857142857,
+      "grad_norm": 1.0897325277328491,
+      "learning_rate": 0.00022722857142857143,
+      "loss": 0.1489,
+      "step": 850
+    },
+    {
+      "epoch": 4.885714285714286,
+      "grad_norm": 1.1065207719802856,
+      "learning_rate": 0.00022679999999999998,
+      "loss": 0.1495,
+      "step": 855
+    },
+    {
+      "epoch": 4.914285714285715,
+      "grad_norm": 0.7434757351875305,
+      "learning_rate": 0.00022637142857142854,
+      "loss": 0.1507,
+      "step": 860
+    },
+    {
+      "epoch": 4.942857142857143,
+      "grad_norm": 1.0045181512832642,
+      "learning_rate": 0.00022594285714285712,
+      "loss": 0.1527,
+      "step": 865
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 1.2025654315948486,
+      "learning_rate": 0.00022551428571428568,
+      "loss": 0.1523,
+      "step": 870
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.7823342084884644,
+      "learning_rate": 0.0002250857142857143,
+      "loss": 0.1514,
+      "step": 875
+    },
+    {
+      "epoch": 5.0285714285714285,
+      "grad_norm": 0.8405362963676453,
+      "learning_rate": 0.00022465714285714285,
+      "loss": 0.1461,
+      "step": 880
+    },
+    {
+      "epoch": 5.057142857142857,
+      "grad_norm": 0.7527463436126709,
+      "learning_rate": 0.0002242285714285714,
+      "loss": 0.1206,
+      "step": 885
+    },
+    {
+      "epoch": 5.085714285714285,
+      "grad_norm": 0.8372548222541809,
+      "learning_rate": 0.0002238,
+      "loss": 0.1513,
+      "step": 890
+    },
+    {
+      "epoch": 5.114285714285714,
+      "grad_norm": 0.8755456209182739,
+      "learning_rate": 0.00022337142857142855,
+      "loss": 0.1498,
+      "step": 895
+    },
+    {
+      "epoch": 5.142857142857143,
+      "grad_norm": 0.7312084436416626,
+      "learning_rate": 0.0002229428571428571,
+      "loss": 0.154,
+      "step": 900
+    },
+    {
+      "epoch": 5.171428571428572,
+      "grad_norm": 0.6366221904754639,
+      "learning_rate": 0.0002225142857142857,
+      "loss": 0.1466,
+      "step": 905
+    },
+    {
+      "epoch": 5.2,
+      "grad_norm": 0.6406880617141724,
+      "learning_rate": 0.00022208571428571427,
+      "loss": 0.1254,
+      "step": 910
+    },
+    {
+      "epoch": 5.228571428571429,
+      "grad_norm": 2.4106833934783936,
+      "learning_rate": 0.00022165714285714283,
+      "loss": 0.1534,
+      "step": 915
+    },
+    {
+      "epoch": 5.257142857142857,
+      "grad_norm": 0.5635722279548645,
+      "learning_rate": 0.00022122857142857142,
+      "loss": 0.1461,
+      "step": 920
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 0.787162184715271,
+      "learning_rate": 0.00022079999999999997,
+      "loss": 0.1424,
+      "step": 925
+    },
+    {
+      "epoch": 5.314285714285714,
+      "grad_norm": 0.6513975262641907,
+      "learning_rate": 0.00022037142857142853,
+      "loss": 0.1326,
+      "step": 930
+    },
+    {
+      "epoch": 5.3428571428571425,
+      "grad_norm": 0.6933534741401672,
+      "learning_rate": 0.00021994285714285711,
+      "loss": 0.1661,
+      "step": 935
+    },
+    {
+      "epoch": 5.371428571428572,
+      "grad_norm": 0.7263259887695312,
+      "learning_rate": 0.0002195142857142857,
+      "loss": 0.15,
+      "step": 940
+    },
+    {
+      "epoch": 5.4,
+      "grad_norm": 0.5537381768226624,
+      "learning_rate": 0.00021908571428571428,
+      "loss": 0.129,
+      "step": 945
+    },
+    {
+      "epoch": 5.428571428571429,
+      "grad_norm": 0.6014005541801453,
+      "learning_rate": 0.00021865714285714284,
+      "loss": 0.1321,
+      "step": 950
+    },
+    {
+      "epoch": 5.457142857142857,
+      "grad_norm": 0.6581441760063171,
+      "learning_rate": 0.0002182285714285714,
+      "loss": 0.1587,
+      "step": 955
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.9326379895210266,
+      "learning_rate": 0.00021779999999999998,
+      "loss": 0.1654,
+      "step": 960
+    },
+    {
+      "epoch": 5.514285714285714,
+      "grad_norm": 0.9438592791557312,
+      "learning_rate": 0.00021737142857142854,
+      "loss": 0.1212,
+      "step": 965
+    },
+    {
+      "epoch": 5.542857142857143,
+      "grad_norm": 0.7699571251869202,
+      "learning_rate": 0.00021694285714285715,
+      "loss": 0.1464,
+      "step": 970
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 0.8758366703987122,
+      "learning_rate": 0.0002165142857142857,
+      "loss": 0.1599,
+      "step": 975
+    },
+    {
+      "epoch": 5.6,
+      "grad_norm": 0.6101442575454712,
+      "learning_rate": 0.00021608571428571426,
+      "loss": 0.1589,
+      "step": 980
+    },
+    {
+      "epoch": 5.628571428571428,
+      "grad_norm": 0.7454060912132263,
+      "learning_rate": 0.00021565714285714285,
+      "loss": 0.1433,
+      "step": 985
+    },
+    {
+      "epoch": 5.6571428571428575,
+      "grad_norm": 0.6379484534263611,
+      "learning_rate": 0.0002152285714285714,
+      "loss": 0.1592,
+      "step": 990
+    },
+    {
+      "epoch": 5.685714285714286,
+      "grad_norm": 1.1601309776306152,
+      "learning_rate": 0.00021479999999999996,
+      "loss": 0.1647,
+      "step": 995
+    },
+    {
+      "epoch": 5.714285714285714,
+      "grad_norm": 0.5464673638343811,
+      "learning_rate": 0.00021437142857142855,
+      "loss": 0.1469,
+      "step": 1000
+    },
+    {
+      "epoch": 5.742857142857143,
+      "grad_norm": 1.0279319286346436,
+      "learning_rate": 0.00021394285714285713,
+      "loss": 0.1203,
+      "step": 1005
+    },
+    {
+      "epoch": 5.771428571428571,
+      "grad_norm": 0.5503718256950378,
+      "learning_rate": 0.00021351428571428572,
+      "loss": 0.1409,
+      "step": 1010
+    },
+    {
+      "epoch": 5.8,
+      "grad_norm": 0.6123886108398438,
+      "learning_rate": 0.00021308571428571427,
+      "loss": 0.1427,
+      "step": 1015
+    },
+    {
+      "epoch": 5.828571428571428,
+      "grad_norm": 0.6560390591621399,
+      "learning_rate": 0.00021265714285714283,
+      "loss": 0.1415,
+      "step": 1020
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 0.5576716661453247,
+      "learning_rate": 0.00021222857142857141,
+      "loss": 0.1408,
+      "step": 1025
+    },
+    {
+      "epoch": 5.885714285714286,
+      "grad_norm": 0.6419074535369873,
+      "learning_rate": 0.00021179999999999997,
+      "loss": 0.1385,
+      "step": 1030
+    },
+    {
+      "epoch": 5.914285714285715,
+      "grad_norm": 1.008925199508667,
+      "learning_rate": 0.00021137142857142858,
+      "loss": 0.1497,
+      "step": 1035
+    },
+    {
+      "epoch": 5.942857142857143,
+      "grad_norm": 0.6559906005859375,
+      "learning_rate": 0.00021094285714285714,
+      "loss": 0.1218,
+      "step": 1040
+    },
+    {
+      "epoch": 5.9714285714285715,
+      "grad_norm": 0.627164363861084,
+      "learning_rate": 0.0002105142857142857,
+      "loss": 0.1368,
+      "step": 1045
+    },
+    {
+      "epoch": 6.0,
+      "grad_norm": 0.5760972499847412,
+      "learning_rate": 0.00021008571428571428,
+      "loss": 0.1508,
+      "step": 1050
+    },
+    {
+      "epoch": 6.0285714285714285,
+      "grad_norm": 0.5754174590110779,
+      "learning_rate": 0.00020965714285714284,
+      "loss": 0.1181,
+      "step": 1055
+    },
+    {
+      "epoch": 6.057142857142857,
+      "grad_norm": 0.8736348748207092,
+      "learning_rate": 0.0002092285714285714,
+      "loss": 0.1252,
+      "step": 1060
+    },
+    {
+      "epoch": 6.085714285714285,
+      "grad_norm": 0.7166719436645508,
+      "learning_rate": 0.00020879999999999998,
+      "loss": 0.1481,
+      "step": 1065
+    },
+    {
+      "epoch": 6.114285714285714,
+      "grad_norm": 0.6494349241256714,
+      "learning_rate": 0.00020837142857142856,
+      "loss": 0.1478,
+      "step": 1070
+    },
+    {
+      "epoch": 6.142857142857143,
+      "grad_norm": 0.6681587100028992,
+      "learning_rate": 0.00020794285714285712,
+      "loss": 0.1488,
+      "step": 1075
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 0.7123684883117676,
+      "learning_rate": 0.0002075142857142857,
+      "loss": 0.1378,
+      "step": 1080
+    },
+    {
+      "epoch": 6.2,
+      "grad_norm": 0.6146950721740723,
+      "learning_rate": 0.00020708571428571426,
+      "loss": 0.1306,
+      "step": 1085
+    },
+    {
+      "epoch": 6.228571428571429,
+      "grad_norm": 0.8402445912361145,
+      "learning_rate": 0.00020665714285714282,
+      "loss": 0.1063,
+      "step": 1090
+    },
+    {
+      "epoch": 6.257142857142857,
+      "grad_norm": 0.6567764282226562,
+      "learning_rate": 0.0002062285714285714,
+      "loss": 0.1195,
+      "step": 1095
+    },
+    {
+      "epoch": 6.285714285714286,
+      "grad_norm": 0.6006014943122864,
+      "learning_rate": 0.0002058,
+      "loss": 0.1542,
+      "step": 1100
+    },
+    {
+      "epoch": 6.314285714285714,
+      "grad_norm": 0.793100893497467,
+      "learning_rate": 0.00020537142857142857,
+      "loss": 0.1381,
+      "step": 1105
+    },
+    {
+      "epoch": 6.3428571428571425,
+      "grad_norm": 0.5923666954040527,
+      "learning_rate": 0.00020494285714285713,
+      "loss": 0.1386,
+      "step": 1110
+    },
+    {
+      "epoch": 6.371428571428572,
+      "grad_norm": 0.6692521572113037,
+      "learning_rate": 0.0002045142857142857,
+      "loss": 0.1223,
+      "step": 1115
+    },
+    {
+      "epoch": 6.4,
+      "grad_norm": 0.7216306328773499,
+      "learning_rate": 0.00020408571428571427,
+      "loss": 0.1367,
+      "step": 1120
+    },
+    {
+      "epoch": 6.428571428571429,
+      "grad_norm": 0.5640934109687805,
+      "learning_rate": 0.00020365714285714283,
+      "loss": 0.1554,
+      "step": 1125
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 0.8154368996620178,
+      "learning_rate": 0.00020322857142857138,
+      "loss": 0.1674,
+      "step": 1130
+    },
+    {
+      "epoch": 6.485714285714286,
+      "grad_norm": 0.7185398936271667,
+      "learning_rate": 0.0002028,
+      "loss": 0.1375,
+      "step": 1135
+    },
+    {
+      "epoch": 6.514285714285714,
+      "grad_norm": 0.6805170774459839,
+      "learning_rate": 0.00020237142857142855,
+      "loss": 0.1306,
+      "step": 1140
+    },
+    {
+      "epoch": 6.542857142857143,
+      "grad_norm": 0.5996941924095154,
+      "learning_rate": 0.00020194285714285714,
+      "loss": 0.1433,
+      "step": 1145
+    },
+    {
+      "epoch": 6.571428571428571,
+      "grad_norm": 0.5258373022079468,
+      "learning_rate": 0.0002015142857142857,
+      "loss": 0.1285,
+      "step": 1150
+    },
+    {
+      "epoch": 6.6,
+      "grad_norm": 0.7771695256233215,
+      "learning_rate": 0.00020108571428571425,
+      "loss": 0.1493,
+      "step": 1155
+    },
+    {
+      "epoch": 6.628571428571428,
+      "grad_norm": 0.5920616388320923,
+      "learning_rate": 0.00020065714285714284,
+      "loss": 0.1479,
+      "step": 1160
+    },
+    {
+      "epoch": 6.6571428571428575,
+      "grad_norm": 0.7460982799530029,
+      "learning_rate": 0.00020022857142857142,
+      "loss": 0.1173,
+      "step": 1165
+    },
+    {
+      "epoch": 6.685714285714286,
+      "grad_norm": 1.1703822612762451,
+      "learning_rate": 0.0001998,
+      "loss": 0.1402,
+      "step": 1170
+    },
+    {
+      "epoch": 6.714285714285714,
+      "grad_norm": 0.7894724011421204,
+      "learning_rate": 0.00019937142857142856,
+      "loss": 0.1253,
+      "step": 1175
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 0.7013376355171204,
+      "learning_rate": 0.00019894285714285712,
+      "loss": 0.1573,
+      "step": 1180
+    },
+    {
+      "epoch": 6.771428571428571,
+      "grad_norm": 0.6421737670898438,
+      "learning_rate": 0.0001985142857142857,
+      "loss": 0.1497,
+      "step": 1185
+    },
+    {
+      "epoch": 6.8,
+      "grad_norm": 1.204296350479126,
+      "learning_rate": 0.00019808571428571426,
+      "loss": 0.1634,
+      "step": 1190
+    },
+    {
+      "epoch": 6.828571428571428,
+      "grad_norm": 0.867765486240387,
+      "learning_rate": 0.00019765714285714282,
+      "loss": 0.1353,
+      "step": 1195
+    },
+    {
+      "epoch": 6.857142857142857,
+      "grad_norm": 0.7325594425201416,
+      "learning_rate": 0.00019722857142857143,
+      "loss": 0.118,
+      "step": 1200
+    },
+    {
+      "epoch": 6.885714285714286,
+      "grad_norm": 0.7029078006744385,
+      "learning_rate": 0.00019679999999999999,
+      "loss": 0.1425,
+      "step": 1205
+    },
+    {
+      "epoch": 6.914285714285715,
+      "grad_norm": 1.1572504043579102,
+      "learning_rate": 0.00019637142857142857,
+      "loss": 0.1337,
+      "step": 1210
+    },
+    {
+      "epoch": 6.942857142857143,
+      "grad_norm": 0.8022822141647339,
+      "learning_rate": 0.00019594285714285713,
+      "loss": 0.1684,
+      "step": 1215
+    },
+    {
+      "epoch": 6.9714285714285715,
+      "grad_norm": 0.6729874610900879,
+      "learning_rate": 0.00019551428571428568,
+      "loss": 0.1238,
+      "step": 1220
+    },
+    {
+      "epoch": 7.0,
+      "grad_norm": 0.5773627758026123,
+      "learning_rate": 0.00019508571428571427,
+      "loss": 0.138,
+      "step": 1225
+    },
+    {
+      "epoch": 7.0285714285714285,
+      "grad_norm": 0.7182291150093079,
+      "learning_rate": 0.00019465714285714285,
+      "loss": 0.1431,
+      "step": 1230
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 1.7567912340164185,
+      "learning_rate": 0.0001942285714285714,
+      "loss": 0.1319,
+      "step": 1235
+    },
+    {
+      "epoch": 7.085714285714285,
+      "grad_norm": 0.6845232248306274,
+      "learning_rate": 0.0001938,
+      "loss": 0.1292,
+      "step": 1240
+    },
+    {
+      "epoch": 7.114285714285714,
+      "grad_norm": 0.6077771782875061,
+      "learning_rate": 0.00019337142857142855,
+      "loss": 0.1238,
+      "step": 1245
+    },
+    {
+      "epoch": 7.142857142857143,
+      "grad_norm": 0.6168347597122192,
+      "learning_rate": 0.0001929428571428571,
+      "loss": 0.1384,
+      "step": 1250
+    },
+    {
+      "epoch": 7.171428571428572,
+      "grad_norm": 0.7457576394081116,
+      "learning_rate": 0.0001925142857142857,
+      "loss": 0.1306,
+      "step": 1255
+    },
+    {
+      "epoch": 7.2,
+      "grad_norm": 0.5969316363334656,
+      "learning_rate": 0.00019208571428571425,
+      "loss": 0.1123,
+      "step": 1260
+    },
+    {
+      "epoch": 7.228571428571429,
+      "grad_norm": 0.6902753710746765,
+      "learning_rate": 0.00019165714285714286,
+      "loss": 0.1185,
+      "step": 1265
+    },
+    {
+      "epoch": 7.257142857142857,
+      "grad_norm": 0.6488338112831116,
+      "learning_rate": 0.00019122857142857142,
+      "loss": 0.1431,
+      "step": 1270
+    },
+    {
+      "epoch": 7.285714285714286,
+      "grad_norm": 0.6814819574356079,
+      "learning_rate": 0.00019079999999999998,
+      "loss": 0.1495,
+      "step": 1275
+    },
+    {
+      "epoch": 7.314285714285714,
+      "grad_norm": 0.7468088865280151,
+      "learning_rate": 0.00019037142857142856,
+      "loss": 0.1158,
+      "step": 1280
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 0.7417412400245667,
+      "learning_rate": 0.00018994285714285712,
+      "loss": 0.1311,
+      "step": 1285
+    },
+    {
+      "epoch": 7.371428571428572,
+      "grad_norm": 0.5480664372444153,
+      "learning_rate": 0.00018951428571428567,
+      "loss": 0.135,
+      "step": 1290
+    },
+    {
+      "epoch": 7.4,
+      "grad_norm": 0.725527822971344,
+      "learning_rate": 0.00018908571428571429,
+      "loss": 0.1217,
+      "step": 1295
+    },
+    {
+      "epoch": 7.428571428571429,
+      "grad_norm": 0.6566678285598755,
+      "learning_rate": 0.00018865714285714284,
+      "loss": 0.1417,
+      "step": 1300
+    },
+    {
+      "epoch": 7.457142857142857,
+      "grad_norm": 0.516952395439148,
+      "learning_rate": 0.00018822857142857143,
+      "loss": 0.1329,
+      "step": 1305
+    },
+    {
+      "epoch": 7.485714285714286,
+      "grad_norm": 1.9545241594314575,
+      "learning_rate": 0.00018779999999999998,
+      "loss": 0.1339,
+      "step": 1310
+    },
+    {
+      "epoch": 7.514285714285714,
+      "grad_norm": 0.8276839852333069,
+      "learning_rate": 0.00018737142857142854,
+      "loss": 0.1324,
+      "step": 1315
+    },
+    {
+      "epoch": 7.542857142857143,
+      "grad_norm": 0.6737099289894104,
+      "learning_rate": 0.00018694285714285713,
+      "loss": 0.1139,
+      "step": 1320
+    },
+    {
+      "epoch": 7.571428571428571,
+      "grad_norm": 0.6914472579956055,
+      "learning_rate": 0.00018651428571428568,
+      "loss": 0.1146,
+      "step": 1325
+    },
+    {
+      "epoch": 7.6,
+      "grad_norm": 0.6630033850669861,
+      "learning_rate": 0.0001860857142857143,
+      "loss": 0.1571,
+      "step": 1330
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 0.820688784122467,
+      "learning_rate": 0.00018565714285714285,
+      "loss": 0.15,
+      "step": 1335
+    },
+    {
+      "epoch": 7.6571428571428575,
+      "grad_norm": 2.0491325855255127,
+      "learning_rate": 0.0001852285714285714,
+      "loss": 0.127,
+      "step": 1340
+    },
+    {
+      "epoch": 7.685714285714286,
+      "grad_norm": 0.9327268004417419,
+      "learning_rate": 0.0001848,
+      "loss": 0.1289,
+      "step": 1345
+    },
+    {
+      "epoch": 7.714285714285714,
+      "grad_norm": 1.3131701946258545,
+      "learning_rate": 0.00018437142857142855,
+      "loss": 0.1228,
+      "step": 1350
+    },
+    {
+      "epoch": 7.742857142857143,
+      "grad_norm": 2.955918312072754,
+      "learning_rate": 0.0001839428571428571,
+      "loss": 0.1082,
+      "step": 1355
+    },
+    {
+      "epoch": 7.771428571428571,
+      "grad_norm": 1.2165493965148926,
+      "learning_rate": 0.00018351428571428572,
+      "loss": 0.1688,
+      "step": 1360
+    },
+    {
+      "epoch": 7.8,
+      "grad_norm": 0.759324312210083,
+      "learning_rate": 0.00018308571428571428,
+      "loss": 0.1185,
+      "step": 1365
+    },
+    {
+      "epoch": 7.828571428571428,
+      "grad_norm": 0.7445591688156128,
+      "learning_rate": 0.00018265714285714286,
+      "loss": 0.1431,
+      "step": 1370
+    },
+    {
+      "epoch": 7.857142857142857,
+      "grad_norm": 0.679374098777771,
+      "learning_rate": 0.00018222857142857142,
+      "loss": 0.1451,
+      "step": 1375
+    },
+    {
+      "epoch": 7.885714285714286,
+      "grad_norm": 2.1234302520751953,
+      "learning_rate": 0.00018179999999999997,
+      "loss": 0.1265,
+      "step": 1380
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 1.006521224975586,
+      "learning_rate": 0.00018137142857142856,
+      "loss": 0.1722,
+      "step": 1385
+    },
+    {
+      "epoch": 7.942857142857143,
+      "grad_norm": 0.7275253534317017,
+      "learning_rate": 0.00018094285714285712,
+      "loss": 0.1625,
+      "step": 1390
+    },
+    {
+      "epoch": 7.9714285714285715,
+      "grad_norm": 0.8612022995948792,
+      "learning_rate": 0.0001805142857142857,
+      "loss": 0.1345,
+      "step": 1395
+    },
+    {
+      "epoch": 8.0,
+      "grad_norm": 0.7276798486709595,
+      "learning_rate": 0.00018008571428571428,
+      "loss": 0.1236,
+      "step": 1400
+    },
+    {
+      "epoch": 8.028571428571428,
+      "grad_norm": 0.8731086850166321,
+      "learning_rate": 0.00017965714285714284,
+      "loss": 0.1604,
+      "step": 1405
+    },
+    {
+      "epoch": 8.057142857142857,
+      "grad_norm": 0.8950818777084351,
+      "learning_rate": 0.0001792285714285714,
+      "loss": 0.1531,
+      "step": 1410
+    },
+    {
+      "epoch": 8.085714285714285,
+      "grad_norm": 0.7399356365203857,
+      "learning_rate": 0.00017879999999999998,
+      "loss": 0.1508,
+      "step": 1415
+    },
+    {
+      "epoch": 8.114285714285714,
+      "grad_norm": 1.3727307319641113,
+      "learning_rate": 0.00017837142857142854,
+      "loss": 0.1487,
+      "step": 1420
+    },
+    {
+      "epoch": 8.142857142857142,
+      "grad_norm": 0.5938125848770142,
+      "learning_rate": 0.00017794285714285715,
+      "loss": 0.1303,
+      "step": 1425
+    },
+    {
+      "epoch": 8.17142857142857,
+      "grad_norm": 0.7043821811676025,
+      "learning_rate": 0.0001775142857142857,
+      "loss": 0.0948,
+      "step": 1430
+    },
+    {
+      "epoch": 8.2,
+      "grad_norm": 1.1062767505645752,
+      "learning_rate": 0.00017708571428571426,
+      "loss": 0.1412,
+      "step": 1435
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 0.844832181930542,
+      "learning_rate": 0.00017665714285714285,
+      "loss": 0.113,
+      "step": 1440
+    },
+    {
+      "epoch": 8.257142857142856,
+      "grad_norm": 0.7564154863357544,
+      "learning_rate": 0.0001762285714285714,
+      "loss": 0.1319,
+      "step": 1445
+    },
+    {
+      "epoch": 8.285714285714286,
+      "grad_norm": 0.8843110203742981,
+      "learning_rate": 0.00017579999999999996,
+      "loss": 0.1206,
+      "step": 1450
+    },
+    {
+      "epoch": 8.314285714285715,
+      "grad_norm": 0.8175828456878662,
+      "learning_rate": 0.00017537142857142855,
+      "loss": 0.1327,
+      "step": 1455
+    },
+    {
+      "epoch": 8.342857142857143,
+      "grad_norm": 0.6443565487861633,
+      "learning_rate": 0.00017494285714285713,
+      "loss": 0.1239,
+      "step": 1460
+    },
+    {
+      "epoch": 8.371428571428572,
+      "grad_norm": 0.7237185835838318,
+      "learning_rate": 0.00017451428571428572,
+      "loss": 0.1639,
+      "step": 1465
+    },
+    {
+      "epoch": 8.4,
+      "grad_norm": 0.6118057370185852,
+      "learning_rate": 0.00017408571428571427,
+      "loss": 0.1363,
+      "step": 1470
+    },
+    {
+      "epoch": 8.428571428571429,
+      "grad_norm": 0.6754649877548218,
+      "learning_rate": 0.00017365714285714283,
+      "loss": 0.1187,
+      "step": 1475
+    },
+    {
+      "epoch": 8.457142857142857,
+      "grad_norm": 1.0067390203475952,
+      "learning_rate": 0.00017322857142857141,
+      "loss": 0.1401,
+      "step": 1480
+    },
+    {
+      "epoch": 8.485714285714286,
+      "grad_norm": 8.509544372558594,
+      "learning_rate": 0.00017279999999999997,
+      "loss": 0.1304,
+      "step": 1485
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 4.2030205726623535,
+      "learning_rate": 0.00017237142857142858,
+      "loss": 0.121,
+      "step": 1490
+    },
+    {
+      "epoch": 8.542857142857143,
+      "grad_norm": 4.877438068389893,
+      "learning_rate": 0.00017194285714285714,
+      "loss": 0.1918,
+      "step": 1495
+    },
+    {
+      "epoch": 8.571428571428571,
+      "grad_norm": 6.4971232414245605,
+      "learning_rate": 0.0001715142857142857,
+      "loss": 0.2154,
+      "step": 1500
+    },
+    {
+      "epoch": 8.6,
+      "grad_norm": 4.365469932556152,
+      "learning_rate": 0.00017108571428571428,
+      "loss": 0.2272,
+      "step": 1505
+    },
+    {
+      "epoch": 8.628571428571428,
+      "grad_norm": 2.551957845687866,
+      "learning_rate": 0.00017065714285714284,
+      "loss": 0.2163,
+      "step": 1510
+    },
+    {
+      "epoch": 8.657142857142857,
+      "grad_norm": 5.326391220092773,
+      "learning_rate": 0.0001702285714285714,
+      "loss": 0.1612,
+      "step": 1515
+    },
+    {
+      "epoch": 8.685714285714285,
+      "grad_norm": 1.3528404235839844,
+      "learning_rate": 0.00016979999999999998,
+      "loss": 0.1636,
+      "step": 1520
+    },
+    {
+      "epoch": 8.714285714285714,
+      "grad_norm": 1.4466065168380737,
+      "learning_rate": 0.00016937142857142856,
+      "loss": 0.1295,
+      "step": 1525
+    },
+    {
+      "epoch": 8.742857142857144,
+      "grad_norm": 0.6576040387153625,
+      "learning_rate": 0.00016894285714285715,
+      "loss": 0.1318,
+      "step": 1530
+    },
+    {
+      "epoch": 8.771428571428572,
+      "grad_norm": 1.286942958831787,
+      "learning_rate": 0.0001685142857142857,
+      "loss": 0.1443,
+      "step": 1535
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 9.474458694458008,
+      "learning_rate": 0.00016808571428571426,
+      "loss": 0.1313,
+      "step": 1540
+    },
+    {
+      "epoch": 8.82857142857143,
+      "grad_norm": 2.6731069087982178,
+      "learning_rate": 0.00016765714285714285,
+      "loss": 0.1485,
+      "step": 1545
+    },
+    {
+      "epoch": 8.857142857142858,
+      "grad_norm": 1.313723087310791,
+      "learning_rate": 0.0001672285714285714,
+      "loss": 0.1346,
+      "step": 1550
+    },
+    {
+      "epoch": 8.885714285714286,
+      "grad_norm": 1.7115576267242432,
+      "learning_rate": 0.0001668,
+      "loss": 0.1471,
+      "step": 1555
+    },
+    {
+      "epoch": 8.914285714285715,
+      "grad_norm": 1.2599923610687256,
+      "learning_rate": 0.00016637142857142857,
+      "loss": 0.1433,
+      "step": 1560
+    },
+    {
+      "epoch": 8.942857142857143,
+      "grad_norm": 0.9659029245376587,
+      "learning_rate": 0.00016594285714285713,
+      "loss": 0.1256,
+      "step": 1565
+    },
+    {
+      "epoch": 8.971428571428572,
+      "grad_norm": 1.1282744407653809,
+      "learning_rate": 0.0001655142857142857,
+      "loss": 0.1373,
+      "step": 1570
+    },
+    {
+      "epoch": 9.0,
+      "grad_norm": 3.20717453956604,
+      "learning_rate": 0.00016508571428571427,
+      "loss": 0.1355,
+      "step": 1575
+    },
+    {
+      "epoch": 9.028571428571428,
+      "grad_norm": 0.8310821056365967,
+      "learning_rate": 0.00016465714285714283,
+      "loss": 0.1268,
+      "step": 1580
+    },
+    {
+      "epoch": 9.057142857142857,
+      "grad_norm": 1.5337790250778198,
+      "learning_rate": 0.00016422857142857139,
+      "loss": 0.1267,
+      "step": 1585
+    },
+    {
+      "epoch": 9.085714285714285,
+      "grad_norm": 2.6406068801879883,
+      "learning_rate": 0.0001638,
+      "loss": 0.1363,
+      "step": 1590
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 0.7705873847007751,
+      "learning_rate": 0.00016337142857142855,
+      "loss": 0.1291,
+      "step": 1595
+    },
+    {
+      "epoch": 9.142857142857142,
+      "grad_norm": 0.7092650532722473,
+      "learning_rate": 0.00016294285714285714,
+      "loss": 0.1435,
+      "step": 1600
+    },
+    {
+      "epoch": 9.17142857142857,
+      "grad_norm": 1.098961591720581,
+      "learning_rate": 0.0001625142857142857,
+      "loss": 0.1471,
+      "step": 1605
+    },
+    {
+      "epoch": 9.2,
+      "grad_norm": 0.6994885206222534,
+      "learning_rate": 0.00016208571428571425,
+      "loss": 0.1345,
+      "step": 1610
+    },
+    {
+      "epoch": 9.228571428571428,
+      "grad_norm": 0.9613476991653442,
+      "learning_rate": 0.00016165714285714284,
+      "loss": 0.1399,
+      "step": 1615
+    },
+    {
+      "epoch": 9.257142857142856,
+      "grad_norm": 0.675588846206665,
+      "learning_rate": 0.00016122857142857142,
+      "loss": 0.1319,
+      "step": 1620
+    },
+    {
+      "epoch": 9.285714285714286,
+      "grad_norm": 0.7519372701644897,
+      "learning_rate": 0.0001608,
+      "loss": 0.137,
+      "step": 1625
+    },
+    {
+      "epoch": 9.314285714285715,
+      "grad_norm": 1.135025978088379,
+      "learning_rate": 0.00016037142857142856,
+      "loss": 0.1322,
+      "step": 1630
+    },
+    {
+      "epoch": 9.342857142857143,
+      "grad_norm": 0.7462936639785767,
+      "learning_rate": 0.00015994285714285712,
+      "loss": 0.1215,
+      "step": 1635
+    },
+    {
+      "epoch": 9.371428571428572,
+      "grad_norm": 0.9042088985443115,
+      "learning_rate": 0.0001595142857142857,
+      "loss": 0.1191,
+      "step": 1640
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 0.567828893661499,
+      "learning_rate": 0.00015908571428571426,
+      "loss": 0.1189,
+      "step": 1645
+    },
+    {
+      "epoch": 9.428571428571429,
+      "grad_norm": 0.981585681438446,
+      "learning_rate": 0.00015865714285714282,
+      "loss": 0.128,
+      "step": 1650
+    },
+    {
+      "epoch": 9.457142857142857,
+      "grad_norm": 1.24985933303833,
+      "learning_rate": 0.00015822857142857143,
+      "loss": 0.1315,
+      "step": 1655
+    },
+    {
+      "epoch": 9.485714285714286,
+      "grad_norm": 0.6517993211746216,
+      "learning_rate": 0.0001578,
+      "loss": 0.1076,
+      "step": 1660
+    },
+    {
+      "epoch": 9.514285714285714,
+      "grad_norm": 1.166628122329712,
+      "learning_rate": 0.00015737142857142857,
+      "loss": 0.1345,
+      "step": 1665
+    },
+    {
+      "epoch": 9.542857142857143,
+      "grad_norm": 0.9763592481613159,
+      "learning_rate": 0.00015694285714285713,
+      "loss": 0.1449,
+      "step": 1670
+    },
+    {
+      "epoch": 9.571428571428571,
+      "grad_norm": 0.7829060554504395,
+      "learning_rate": 0.00015651428571428569,
+      "loss": 0.1117,
+      "step": 1675
+    },
+    {
+      "epoch": 9.6,
+      "grad_norm": 0.6693719029426575,
+      "learning_rate": 0.00015608571428571427,
+      "loss": 0.1129,
+      "step": 1680
+    },
+    {
+      "epoch": 9.628571428571428,
+      "grad_norm": 1.2122846841812134,
+      "learning_rate": 0.00015565714285714285,
+      "loss": 0.1125,
+      "step": 1685
+    },
+    {
+      "epoch": 9.657142857142857,
+      "grad_norm": 1.0689371824264526,
+      "learning_rate": 0.0001552285714285714,
+      "loss": 0.1478,
+      "step": 1690
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.8511656522750854,
+      "learning_rate": 0.0001548,
+      "loss": 0.1431,
+      "step": 1695
+    },
+    {
+      "epoch": 9.714285714285714,
+      "grad_norm": 0.6706506609916687,
+      "learning_rate": 0.00015437142857142855,
+      "loss": 0.1262,
+      "step": 1700
+    },
+    {
+      "epoch": 9.742857142857144,
+      "grad_norm": 1.0798784494400024,
+      "learning_rate": 0.00015394285714285714,
+      "loss": 0.1275,
+      "step": 1705
+    },
+    {
+      "epoch": 9.771428571428572,
+      "grad_norm": 0.7915983200073242,
+      "learning_rate": 0.0001535142857142857,
+      "loss": 0.1316,
+      "step": 1710
+    },
+    {
+      "epoch": 9.8,
+      "grad_norm": 1.8630567789077759,
+      "learning_rate": 0.00015308571428571425,
+      "loss": 0.1258,
+      "step": 1715
+    },
+    {
+      "epoch": 9.82857142857143,
+      "grad_norm": 0.7807756662368774,
+      "learning_rate": 0.00015265714285714286,
+      "loss": 0.1079,
+      "step": 1720
+    },
+    {
+      "epoch": 9.857142857142858,
+      "grad_norm": 1.4698439836502075,
+      "learning_rate": 0.00015222857142857142,
+      "loss": 0.1357,
+      "step": 1725
+    },
+    {
+      "epoch": 9.885714285714286,
+      "grad_norm": 1.2121926546096802,
+      "learning_rate": 0.00015179999999999998,
+      "loss": 0.1322,
+      "step": 1730
+    },
+    {
+      "epoch": 9.914285714285715,
+      "grad_norm": 0.6348568201065063,
+      "learning_rate": 0.00015137142857142856,
+      "loss": 0.0893,
+      "step": 1735
+    },
+    {
+      "epoch": 9.942857142857143,
+      "grad_norm": 0.6694422364234924,
+      "learning_rate": 0.00015094285714285712,
+      "loss": 0.1189,
+      "step": 1740
+    },
+    {
+      "epoch": 9.971428571428572,
+      "grad_norm": 0.569332480430603,
+      "learning_rate": 0.00015051428571428567,
+      "loss": 0.1349,
+      "step": 1745
+    },
+    {
+      "epoch": 10.0,
+      "grad_norm": 0.934073269367218,
+      "learning_rate": 0.00015008571428571429,
+      "loss": 0.1237,
+      "step": 1750
+    },
+    {
+      "epoch": 10.028571428571428,
+      "grad_norm": 0.7191672325134277,
+      "learning_rate": 0.00014965714285714284,
+      "loss": 0.1308,
+      "step": 1755
+    },
+    {
+      "epoch": 10.057142857142857,
+      "grad_norm": 0.7006493806838989,
+      "learning_rate": 0.00014922857142857143,
+      "loss": 0.104,
+      "step": 1760
+    },
+    {
+      "epoch": 10.085714285714285,
+      "grad_norm": 0.9030678272247314,
+      "learning_rate": 0.00014879999999999998,
+      "loss": 0.1308,
+      "step": 1765
+    },
+    {
+      "epoch": 10.114285714285714,
+      "grad_norm": 0.7007766366004944,
+      "learning_rate": 0.00014837142857142854,
+      "loss": 0.1044,
+      "step": 1770
+    },
+    {
+      "epoch": 10.142857142857142,
+      "grad_norm": 0.4832770824432373,
+      "learning_rate": 0.00014794285714285713,
+      "loss": 0.1119,
+      "step": 1775
+    },
+    {
+      "epoch": 10.17142857142857,
+      "grad_norm": 0.7819458842277527,
+      "learning_rate": 0.0001475142857142857,
+      "loss": 0.1087,
+      "step": 1780
+    },
+    {
+      "epoch": 10.2,
+      "grad_norm": 1.0223525762557983,
+      "learning_rate": 0.00014708571428571427,
+      "loss": 0.1314,
+      "step": 1785
+    },
+    {
+      "epoch": 10.228571428571428,
+      "grad_norm": 0.6224566698074341,
+      "learning_rate": 0.00014665714285714285,
+      "loss": 0.1159,
+      "step": 1790
+    },
+    {
+      "epoch": 10.257142857142856,
+      "grad_norm": 0.45800235867500305,
+      "learning_rate": 0.0001462285714285714,
+      "loss": 0.0942,
+      "step": 1795
+    },
+    {
+      "epoch": 10.285714285714286,
+      "grad_norm": 0.6258400082588196,
+      "learning_rate": 0.0001458,
+      "loss": 0.1079,
+      "step": 1800
+    },
+    {
+      "epoch": 10.314285714285715,
+      "grad_norm": 1.1812794208526611,
+      "learning_rate": 0.00014537142857142858,
+      "loss": 0.1378,
+      "step": 1805
+    },
+    {
+      "epoch": 10.342857142857143,
+      "grad_norm": 0.8541269898414612,
+      "learning_rate": 0.00014494285714285713,
+      "loss": 0.1274,
+      "step": 1810
+    },
+    {
+      "epoch": 10.371428571428572,
+      "grad_norm": 0.7131860256195068,
+      "learning_rate": 0.0001445142857142857,
+      "loss": 0.1247,
+      "step": 1815
+    },
+    {
+      "epoch": 10.4,
+      "grad_norm": 0.6109820008277893,
+      "learning_rate": 0.00014408571428571428,
+      "loss": 0.1246,
+      "step": 1820
+    },
+    {
+      "epoch": 10.428571428571429,
+      "grad_norm": 0.5621510744094849,
+      "learning_rate": 0.00014365714285714286,
+      "loss": 0.1039,
+      "step": 1825
+    },
+    {
+      "epoch": 10.457142857142857,
+      "grad_norm": 1.022777795791626,
+      "learning_rate": 0.00014322857142857142,
+      "loss": 0.1206,
+      "step": 1830
+    },
+    {
+      "epoch": 10.485714285714286,
+      "grad_norm": 0.9120668768882751,
+      "learning_rate": 0.00014279999999999997,
+      "loss": 0.1289,
+      "step": 1835
+    },
+    {
+      "epoch": 10.514285714285714,
+      "grad_norm": 1.1882030963897705,
+      "learning_rate": 0.00014237142857142856,
+      "loss": 0.1194,
+      "step": 1840
+    },
+    {
+      "epoch": 10.542857142857143,
+      "grad_norm": 0.6078401207923889,
+      "learning_rate": 0.00014194285714285714,
+      "loss": 0.1339,
+      "step": 1845
+    },
+    {
+      "epoch": 10.571428571428571,
+      "grad_norm": 0.7380999326705933,
+      "learning_rate": 0.0001415142857142857,
+      "loss": 0.1318,
+      "step": 1850
+    },
+    {
+      "epoch": 10.6,
+      "grad_norm": 0.5884959101676941,
+      "learning_rate": 0.00014108571428571428,
+      "loss": 0.1249,
+      "step": 1855
+    },
+    {
+      "epoch": 10.628571428571428,
+      "grad_norm": 1.0121936798095703,
+      "learning_rate": 0.00014065714285714284,
+      "loss": 0.1137,
+      "step": 1860
+    },
+    {
+      "epoch": 10.657142857142857,
+      "grad_norm": 0.6444916129112244,
+      "learning_rate": 0.00014022857142857143,
+      "loss": 0.1213,
+      "step": 1865
+    },
+    {
+      "epoch": 10.685714285714285,
+      "grad_norm": 0.7931004762649536,
+      "learning_rate": 0.00013979999999999998,
+      "loss": 0.1318,
+      "step": 1870
+    },
+    {
+      "epoch": 10.714285714285714,
+      "grad_norm": 0.5596404075622559,
+      "learning_rate": 0.00013937142857142857,
+      "loss": 0.1075,
+      "step": 1875
+    },
+    {
+      "epoch": 10.742857142857144,
+      "grad_norm": 0.6586474180221558,
+      "learning_rate": 0.00013894285714285712,
+      "loss": 0.13,
+      "step": 1880
+    },
+    {
+      "epoch": 10.771428571428572,
+      "grad_norm": 1.0195013284683228,
+      "learning_rate": 0.00013851428571428568,
+      "loss": 0.1373,
+      "step": 1885
+    },
+    {
+      "epoch": 10.8,
+      "grad_norm": 0.9233512878417969,
+      "learning_rate": 0.00013808571428571427,
+      "loss": 0.1168,
+      "step": 1890
+    },
+    {
+      "epoch": 10.82857142857143,
+      "grad_norm": 0.7154092788696289,
+      "learning_rate": 0.00013765714285714285,
+      "loss": 0.1081,
+      "step": 1895
+    },
+    {
+      "epoch": 10.857142857142858,
+      "grad_norm": 1.4588117599487305,
+      "learning_rate": 0.0001372285714285714,
+      "loss": 0.1061,
+      "step": 1900
+    },
+    {
+      "epoch": 10.885714285714286,
+      "grad_norm": 0.6087035536766052,
+      "learning_rate": 0.0001368,
+      "loss": 0.1157,
+      "step": 1905
+    },
+    {
+      "epoch": 10.914285714285715,
+      "grad_norm": 0.7371247410774231,
+      "learning_rate": 0.00013637142857142855,
+      "loss": 0.1339,
+      "step": 1910
+    },
+    {
+      "epoch": 10.942857142857143,
+      "grad_norm": 0.8253212571144104,
+      "learning_rate": 0.00013594285714285713,
+      "loss": 0.1198,
+      "step": 1915
+    },
+    {
+      "epoch": 10.971428571428572,
+      "grad_norm": 0.6889544129371643,
+      "learning_rate": 0.00013551428571428572,
+      "loss": 0.1131,
+      "step": 1920
+    },
+    {
+      "epoch": 11.0,
+      "grad_norm": 0.6408224105834961,
+      "learning_rate": 0.00013508571428571427,
+      "loss": 0.122,
+      "step": 1925
+    },
+    {
+      "epoch": 11.028571428571428,
+      "grad_norm": 0.6771185398101807,
+      "learning_rate": 0.00013465714285714283,
+      "loss": 0.1492,
+      "step": 1930
+    },
+    {
+      "epoch": 11.057142857142857,
+      "grad_norm": 0.8706450462341309,
+      "learning_rate": 0.00013422857142857142,
+      "loss": 0.1294,
+      "step": 1935
+    },
+    {
+      "epoch": 11.085714285714285,
+      "grad_norm": 1.730648398399353,
+      "learning_rate": 0.0001338,
+      "loss": 0.1004,
+      "step": 1940
+    },
+    {
+      "epoch": 11.114285714285714,
+      "grad_norm": 0.6985113620758057,
+      "learning_rate": 0.00013337142857142856,
+      "loss": 0.0995,
+      "step": 1945
+    },
+    {
+      "epoch": 11.142857142857142,
+      "grad_norm": 0.8901951313018799,
+      "learning_rate": 0.00013294285714285711,
+      "loss": 0.1179,
+      "step": 1950
+    },
+    {
+      "epoch": 11.17142857142857,
+      "grad_norm": 0.7232164144515991,
+      "learning_rate": 0.0001325142857142857,
+      "loss": 0.1397,
+      "step": 1955
+    },
+    {
+      "epoch": 11.2,
+      "grad_norm": 0.6447544693946838,
+      "learning_rate": 0.00013208571428571428,
+      "loss": 0.1366,
+      "step": 1960
+    },
+    {
+      "epoch": 11.228571428571428,
+      "grad_norm": 0.7964944243431091,
+      "learning_rate": 0.00013165714285714284,
+      "loss": 0.1121,
+      "step": 1965
+    },
+    {
+      "epoch": 11.257142857142856,
+      "grad_norm": 0.9012628793716431,
+      "learning_rate": 0.00013122857142857142,
+      "loss": 0.1131,
+      "step": 1970
+    },
+    {
+      "epoch": 11.285714285714286,
+      "grad_norm": 0.9295369982719421,
+      "learning_rate": 0.00013079999999999998,
+      "loss": 0.1232,
+      "step": 1975
+    },
+    {
+      "epoch": 11.314285714285715,
+      "grad_norm": 0.6237708926200867,
+      "learning_rate": 0.00013037142857142857,
+      "loss": 0.1066,
+      "step": 1980
+    },
+    {
+      "epoch": 11.342857142857143,
+      "grad_norm": 0.5250967741012573,
+      "learning_rate": 0.00012994285714285715,
+      "loss": 0.118,
+      "step": 1985
+    },
+    {
+      "epoch": 11.371428571428572,
+      "grad_norm": 1.0013964176177979,
+      "learning_rate": 0.0001295142857142857,
+      "loss": 0.1125,
+      "step": 1990
+    },
+    {
+      "epoch": 11.4,
+      "grad_norm": 0.6721311807632446,
+      "learning_rate": 0.00012908571428571426,
+      "loss": 0.1196,
+      "step": 1995
+    },
+    {
+      "epoch": 11.428571428571429,
+      "grad_norm": 0.6966421008110046,
+      "learning_rate": 0.00012865714285714285,
+      "loss": 0.1172,
+      "step": 2000
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 20,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 200,
+  "trial_name": null,
+  "trial_params": null
+}
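
The entries logged in this `trainer_state.json` are consistent with a learning rate that decays linearly to zero over `max_steps` = 3500, with 175 optimizer steps per epoch. The peak rate of 3e-4 and the one-step lag in the logged value are inferred from the numbers above, not stated in the file, so the following is only a sketch under those assumptions:

```python
import math

MAX_STEPS = 3500   # "max_steps" in trainer_state.json
NUM_EPOCHS = 20    # "num_train_epochs" in trainer_state.json
PEAK_LR = 3e-4     # assumption: inferred from the logged values, not stored in the file

STEPS_PER_EPOCH = MAX_STEPS // NUM_EPOCHS  # 175 optimizer steps per epoch


def logged_lr(step: int) -> float:
    """Learning rate as it appears in log_history at a given step.

    Assumes linear decay to zero with no warmup. The logged value lags the
    step counter by one step, which is what matches the entries above.
    """
    return PEAK_LR * (MAX_STEPS - (step - 1)) / MAX_STEPS


def logged_epoch(step: int) -> float:
    """Fractional epoch reported alongside a given global step."""
    return step / STEPS_PER_EPOCH


# Compare against the final entry of this checkpoint (step 2000).
print(logged_lr(2000), logged_epoch(2000))
```

If gradient accumulation and multi-device training were not used (also an assumption), 175 steps at `train_batch_size` 200 would correspond to roughly 35,000 examples per epoch.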

glot-contrastive-final-lora/checkpoint-2000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a87dc6b2c67ad3df98065b9e8fa21d9d93cd2cb361c532cb83c8a37bdc81a3
+size 5777
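
The three added lines are a Git LFS pointer, not the binary itself: each line is a key, a single space, and a value. As a minimal sketch (the function name `parse_lfs_pointer` is hypothetical, and the diff's leading `+` markers are assumed to be stripped already), such a pointer can be read like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:02a87dc6b2c67ad3df98065b9e8fa21d9d93cd2cb361c532cb83c8a37bdc81a3
size 5777
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```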

glot-contrastive-final-lora/checkpoint-2500/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

glot-contrastive-final-lora/checkpoint-2500/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
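
This config applies rank-16 LoRA to the `query` and `value` projections with `lora_alpha` 32. As a rough sanity check, the sketch below estimates the adapter's parameter count and compares it with the 2365824-byte size recorded in this commit for `checkpoint-2500/adapter_model.safetensors`. The hidden size, layer count, and float32 storage are assumptions (an XLM-R-base-sized encoder); none of them appear in the config.

```python
R = 16                               # "r" in adapter_config.json
LORA_ALPHA = 32                      # "lora_alpha"
TARGET_MODULES = ["query", "value"]  # "target_modules"

# Assumptions (not in the config): square 768x768 projections,
# 12 transformer layers, and weights stored as float32.
HIDDEN_SIZE = 768
NUM_LAYERS = 12
BYTES_PER_PARAM = 4

# Standard LoRA scaling factor (applies because "use_rslora" is false).
scaling = LORA_ALPHA / R

# Each adapted module adds A (r x d_in) and B (d_out x r).
params_per_module = R * (HIDDEN_SIZE + HIDDEN_SIZE)
total_params = params_per_module * len(TARGET_MODULES) * NUM_LAYERS
weight_bytes = total_params * BYTES_PER_PARAM

# Size from the LFS pointer for checkpoint-2500/adapter_model.safetensors.
SAFETENSORS_SIZE = 2365824
remaining_bytes = SAFETENSORS_SIZE - weight_bytes

print(scaling, total_params, weight_bytes, remaining_bytes)
```

Under these assumptions the adapter weights account for all but a few kilobytes of the file, which would be plausible for the safetensors header and metadata; the source does not confirm the base model's dimensions.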

glot-contrastive-final-lora/checkpoint-2500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:940dc572880c580cd969ac155623363743c9f3ef94854aba54b224023c4a2ee1
+size 2365824

glot-contrastive-final-lora/checkpoint-2500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d2c540c91f6c54cf3701175d6db55034ccae2b3b587a04b9476ce989d4fa18b
+size 4760395

glot-contrastive-final-lora/checkpoint-2500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:82bf023104ba6bb70dbc679f41d50ee904b14245b597026bbb288d43524d6797
+size 14645

glot-contrastive-final-lora/checkpoint-2500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:11ef9936017bed12cabfddfce2a90fd82a625d038e573173ab445ab44ee6c357
+size 1465

glot-contrastive-final-lora/checkpoint-2500/sentencepiece.bpe.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a313a26470baedaede322622492f2a542aa41527ddc5d40de444e945ad3c613
+size 7658320

glot-contrastive-final-lora/checkpoint-2500/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

glot-contrastive-final-lora/checkpoint-2500/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "401144": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "unk_token": "<unk>",
+  "use_fast": true
+}

glot-contrastive-final-lora/checkpoint-2500/trainer_state.json ADDED

	@@ -0,0 +1,3534 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 14.285714285714286,
+  "eval_steps": 5,
+  "global_step": 2500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.1407003551721573,
+      "learning_rate": 0.00029965714285714283,
+      "loss": 0.9726,
+      "step": 5
+    },
+    {
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.26689061522483826,
+      "learning_rate": 0.0002992285714285714,
+      "loss": 0.9633,
+      "step": 10
+    },
+    {
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.8670485615730286,
+      "learning_rate": 0.0002988,
+      "loss": 0.9013,
+      "step": 15
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.9785467386245728,
+      "learning_rate": 0.00029837142857142853,
+      "loss": 0.6942,
+      "step": 20
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 1.3083932399749756,
+      "learning_rate": 0.0002979428571428571,
+      "loss": 0.4472,
+      "step": 25
+    },
+    {
+      "epoch": 0.17142857142857143,
+      "grad_norm": 1.6103293895721436,
+      "learning_rate": 0.0002975142857142857,
+      "loss": 0.3782,
+      "step": 30
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 2.6353416442871094,
+      "learning_rate": 0.0002970857142857143,
+      "loss": 0.3732,
+      "step": 35
+    },
+    {
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.9949072003364563,
+      "learning_rate": 0.0002966571428571428,
+      "loss": 0.3506,
+      "step": 40
+    },
+    {
+      "epoch": 0.2571428571428571,
+      "grad_norm": 1.280673861503601,
+      "learning_rate": 0.0002962285714285714,
+      "loss": 0.3346,
+      "step": 45
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.7681456208229065,
+      "learning_rate": 0.0002958,
+      "loss": 0.2832,
+      "step": 50
+    },
+    {
+      "epoch": 0.3142857142857143,
+      "grad_norm": 1.0000813007354736,
+      "learning_rate": 0.0002953714285714285,
+      "loss": 0.2603,
+      "step": 55
+    },
+    {
+      "epoch": 0.34285714285714286,
+      "grad_norm": 1.0222399234771729,
+      "learning_rate": 0.0002949428571428571,
+      "loss": 0.2507,
+      "step": 60
+    },
+    {
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.896902322769165,
+      "learning_rate": 0.0002945142857142857,
+      "loss": 0.2556,
+      "step": 65
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.9035541415214539,
+      "learning_rate": 0.00029408571428571426,
+      "loss": 0.2402,
+      "step": 70
+    },
+    {
+      "epoch": 0.42857142857142855,
+      "grad_norm": 1.4886469841003418,
+      "learning_rate": 0.00029365714285714285,
+      "loss": 0.2376,
+      "step": 75
+    },
+    {
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.8951187133789062,
+      "learning_rate": 0.0002932285714285714,
+      "loss": 0.2276,
+      "step": 80
+    },
+    {
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.7876377105712891,
+      "learning_rate": 0.00029279999999999996,
+      "loss": 0.2537,
+      "step": 85
+    },
+    {
+      "epoch": 0.5142857142857142,
+      "grad_norm": 1.0927226543426514,
+      "learning_rate": 0.00029237142857142855,
+      "loss": 0.2152,
+      "step": 90
+    },
+    {
+      "epoch": 0.5428571428571428,
+      "grad_norm": 1.4946355819702148,
+      "learning_rate": 0.00029194285714285713,
+      "loss": 0.2441,
+      "step": 95
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.7082991600036621,
+      "learning_rate": 0.0002915142857142857,
+      "loss": 0.2708,
+      "step": 100
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 0.670010507106781,
+      "learning_rate": 0.00029108571428571424,
+      "loss": 0.2396,
+      "step": 105
+    },
+    {
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.9797312021255493,
+      "learning_rate": 0.00029065714285714283,
+      "loss": 0.2275,
+      "step": 110
+    },
+    {
+      "epoch": 0.6571428571428571,
+      "grad_norm": 1.5220463275909424,
+      "learning_rate": 0.0002902285714285714,
+      "loss": 0.2114,
+      "step": 115
+    },
+    {
+      "epoch": 0.6857142857142857,
+      "grad_norm": 1.3326867818832397,
+      "learning_rate": 0.00028979999999999994,
+      "loss": 0.241,
+      "step": 120
+    },
+    {
+      "epoch": 0.7142857142857143,
+      "grad_norm": 1.1195529699325562,
+      "learning_rate": 0.0002893714285714285,
+      "loss": 0.2389,
+      "step": 125
+    },
+    {
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.7551061511039734,
+      "learning_rate": 0.0002889428571428571,
+      "loss": 0.2162,
+      "step": 130
+    },
+    {
+      "epoch": 0.7714285714285715,
+      "grad_norm": 1.018908977508545,
+      "learning_rate": 0.0002885142857142857,
+      "loss": 0.1924,
+      "step": 135
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 2.123642921447754,
+      "learning_rate": 0.0002880857142857143,
+      "loss": 0.2174,
+      "step": 140
+    },
+    {
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.7585068941116333,
+      "learning_rate": 0.0002876571428571428,
+      "loss": 0.2006,
+      "step": 145
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.64150869846344,
+      "learning_rate": 0.0002872285714285714,
+      "loss": 0.1905,
+      "step": 150
+    },
+    {
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.9126951694488525,
+      "learning_rate": 0.0002868,
+      "loss": 0.2312,
+      "step": 155
+    },
+    {
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.7278801202774048,
+      "learning_rate": 0.00028637142857142856,
+      "loss": 0.2077,
+      "step": 160
+    },
+    {
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.8931339383125305,
+      "learning_rate": 0.00028594285714285715,
+      "loss": 0.1951,
+      "step": 165
+    },
+    {
+      "epoch": 0.9714285714285714,
+      "grad_norm": 1.0831843614578247,
+      "learning_rate": 0.0002855142857142857,
+      "loss": 0.2103,
+      "step": 170
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 1.3750063180923462,
+      "learning_rate": 0.00028508571428571426,
+      "loss": 0.2396,
+      "step": 175
+    },
+    {
+      "epoch": 1.0285714285714285,
+      "grad_norm": 0.8338337540626526,
+      "learning_rate": 0.00028465714285714285,
+      "loss": 0.2404,
+      "step": 180
+    },
+    {
+      "epoch": 1.0571428571428572,
+      "grad_norm": 1.2879024744033813,
+      "learning_rate": 0.0002842285714285714,
+      "loss": 0.2117,
+      "step": 185
+    },
+    {
+      "epoch": 1.0857142857142856,
+      "grad_norm": 1.6751821041107178,
+      "learning_rate": 0.00028379999999999996,
+      "loss": 0.1796,
+      "step": 190
+    },
+    {
+      "epoch": 1.1142857142857143,
+      "grad_norm": 0.9864417910575867,
+      "learning_rate": 0.00028337142857142854,
+      "loss": 0.1993,
+      "step": 195
+    },
+    {
+      "epoch": 1.1428571428571428,
+      "grad_norm": 1.0174155235290527,
+      "learning_rate": 0.00028294285714285713,
+      "loss": 0.2068,
+      "step": 200
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.029832124710083,
+      "learning_rate": 0.0002825142857142857,
+      "loss": 0.2015,
+      "step": 205
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 0.7745446562767029,
+      "learning_rate": 0.00028208571428571424,
+      "loss": 0.2129,
+      "step": 210
+    },
+    {
+      "epoch": 1.2285714285714286,
+      "grad_norm": 2.5578622817993164,
+      "learning_rate": 0.0002816571428571428,
+      "loss": 0.2224,
+      "step": 215
+    },
+    {
+      "epoch": 1.2571428571428571,
+      "grad_norm": 2.4185051918029785,
+      "learning_rate": 0.0002812285714285714,
+      "loss": 0.2276,
+      "step": 220
+    },
+    {
+      "epoch": 1.2857142857142856,
+      "grad_norm": 1.4176461696624756,
+      "learning_rate": 0.0002808,
+      "loss": 0.1781,
+      "step": 225
+    },
+    {
+      "epoch": 1.3142857142857143,
+      "grad_norm": 0.709326982498169,
+      "learning_rate": 0.0002803714285714286,
+      "loss": 0.2177,
+      "step": 230
+    },
+    {
+      "epoch": 1.342857142857143,
+      "grad_norm": 0.8170766830444336,
+      "learning_rate": 0.0002799428571428571,
+      "loss": 0.1769,
+      "step": 235
+    },
+    {
+      "epoch": 1.3714285714285714,
+      "grad_norm": 1.3850761651992798,
+      "learning_rate": 0.0002795142857142857,
+      "loss": 0.2262,
+      "step": 240
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 1.0064373016357422,
+      "learning_rate": 0.0002790857142857143,
+      "loss": 0.196,
+      "step": 245
+    },
+    {
+      "epoch": 1.4285714285714286,
+      "grad_norm": 1.9635728597640991,
+      "learning_rate": 0.0002786571428571428,
+      "loss": 0.2029,
+      "step": 250
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 16.20791244506836,
+      "learning_rate": 0.0002782285714285714,
+      "loss": 0.3925,
+      "step": 255
+    },
+    {
+      "epoch": 1.4857142857142858,
+      "grad_norm": 1.4363322257995605,
+      "learning_rate": 0.0002778,
+      "loss": 0.3684,
+      "step": 260
+    },
+    {
+      "epoch": 1.5142857142857142,
+      "grad_norm": 0.9379534721374512,
+      "learning_rate": 0.00027737142857142856,
+      "loss": 0.2265,
+      "step": 265
+    },
+    {
+      "epoch": 1.5428571428571427,
+      "grad_norm": 0.8453512787818909,
+      "learning_rate": 0.00027694285714285714,
+      "loss": 0.1976,
+      "step": 270
+    },
+    {
+      "epoch": 1.5714285714285714,
+      "grad_norm": 2.316664695739746,
+      "learning_rate": 0.0002765142857142857,
+      "loss": 0.23,
+      "step": 275
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 1.0548444986343384,
+      "learning_rate": 0.00027608571428571426,
+      "loss": 0.1823,
+      "step": 280
+    },
+    {
+      "epoch": 1.6285714285714286,
+      "grad_norm": 3.7894928455352783,
+      "learning_rate": 0.00027565714285714284,
+      "loss": 0.1962,
+      "step": 285
+    },
+    {
+      "epoch": 1.657142857142857,
+      "grad_norm": 2.3081610202789307,
+      "learning_rate": 0.00027522857142857143,
+      "loss": 0.2087,
+      "step": 290
+    },
+    {
+      "epoch": 1.6857142857142857,
+      "grad_norm": 0.9311438202857971,
+      "learning_rate": 0.0002748,
+      "loss": 0.1597,
+      "step": 295
+    },
+    {
+      "epoch": 1.7142857142857144,
+      "grad_norm": 1.1881247758865356,
+      "learning_rate": 0.00027437142857142854,
+      "loss": 0.1764,
+      "step": 300
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.30265212059021,
+      "learning_rate": 0.0002739428571428571,
+      "loss": 0.1647,
+      "step": 305
+    },
+    {
+      "epoch": 1.7714285714285714,
+      "grad_norm": 0.6832175850868225,
+      "learning_rate": 0.0002735142857142857,
+      "loss": 0.1638,
+      "step": 310
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 1.8740538358688354,
+      "learning_rate": 0.00027308571428571424,
+      "loss": 0.1803,
+      "step": 315
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 9.821504592895508,
+      "learning_rate": 0.0002726571428571428,
+      "loss": 0.226,
+      "step": 320
+    },
+    {
+      "epoch": 1.8571428571428572,
+      "grad_norm": 1.0889750719070435,
+      "learning_rate": 0.0002722285714285714,
+      "loss": 0.1822,
+      "step": 325
+    },
+    {
+      "epoch": 1.8857142857142857,
+      "grad_norm": 0.9660868048667908,
+      "learning_rate": 0.0002718,
+      "loss": 0.1842,
+      "step": 330
+    },
+    {
+      "epoch": 1.9142857142857141,
+      "grad_norm": 0.6329234838485718,
+      "learning_rate": 0.0002713714285714286,
+      "loss": 0.1488,
+      "step": 335
+    },
+    {
+      "epoch": 1.9428571428571428,
+      "grad_norm": 3.601266384124756,
+      "learning_rate": 0.0002709428571428571,
+      "loss": 0.1887,
+      "step": 340
+    },
+    {
+      "epoch": 1.9714285714285715,
+      "grad_norm": 1.1441439390182495,
+      "learning_rate": 0.0002705142857142857,
+      "loss": 0.184,
+      "step": 345
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.8586034774780273,
+      "learning_rate": 0.0002700857142857143,
+      "loss": 0.1578,
+      "step": 350
+    },
+    {
+      "epoch": 2.0285714285714285,
+      "grad_norm": 1.5113487243652344,
+      "learning_rate": 0.00026965714285714286,
+      "loss": 0.2002,
+      "step": 355
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.1123011112213135,
+      "learning_rate": 0.0002692285714285714,
+      "loss": 0.1946,
+      "step": 360
+    },
+    {
+      "epoch": 2.085714285714286,
+      "grad_norm": 0.9377036094665527,
+      "learning_rate": 0.0002688,
+      "loss": 0.1971,
+      "step": 365
+    },
+    {
+      "epoch": 2.1142857142857143,
+      "grad_norm": 0.6956892609596252,
+      "learning_rate": 0.00026837142857142856,
+      "loss": 0.1758,
+      "step": 370
+    },
+    {
+      "epoch": 2.142857142857143,
+      "grad_norm": 0.7510782480239868,
+      "learning_rate": 0.0002679428571428571,
+      "loss": 0.1674,
+      "step": 375
+    },
+    {
+      "epoch": 2.1714285714285713,
+      "grad_norm": 0.7009285092353821,
+      "learning_rate": 0.00026751428571428567,
+      "loss": 0.1945,
+      "step": 380
+    },
+    {
+      "epoch": 2.2,
+      "grad_norm": 0.9555609822273254,
+      "learning_rate": 0.00026708571428571426,
+      "loss": 0.1857,
+      "step": 385
+    },
+    {
+      "epoch": 2.2285714285714286,
+      "grad_norm": 2.133979082107544,
+      "learning_rate": 0.00026665714285714284,
+      "loss": 0.1636,
+      "step": 390
+    },
+    {
+      "epoch": 2.257142857142857,
+      "grad_norm": 0.7105309963226318,
+      "learning_rate": 0.0002662285714285714,
+      "loss": 0.2014,
+      "step": 395
+    },
+    {
+      "epoch": 2.2857142857142856,
+      "grad_norm": 0.7329701781272888,
+      "learning_rate": 0.00026579999999999996,
+      "loss": 0.1884,
+      "step": 400
+    },
+    {
+      "epoch": 2.314285714285714,
+      "grad_norm": 1.0426994562149048,
+      "learning_rate": 0.00026537142857142854,
+      "loss": 0.1558,
+      "step": 405
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 0.9306122660636902,
+      "learning_rate": 0.0002649428571428571,
+      "loss": 0.1774,
+      "step": 410
+    },
+    {
+      "epoch": 2.3714285714285714,
+      "grad_norm": 0.6989394426345825,
+      "learning_rate": 0.00026451428571428565,
+      "loss": 0.1601,
+      "step": 415
+    },
+    {
+      "epoch": 2.4,
+      "grad_norm": 1.4383760690689087,
+      "learning_rate": 0.0002640857142857143,
+      "loss": 0.1564,
+      "step": 420
+    },
+    {
+      "epoch": 2.4285714285714284,
+      "grad_norm": 0.6448336839675903,
+      "learning_rate": 0.0002636571428571428,
+      "loss": 0.1827,
+      "step": 425
+    },
+    {
+      "epoch": 2.4571428571428573,
+      "grad_norm": 0.9535760879516602,
+      "learning_rate": 0.0002632285714285714,
+      "loss": 0.1713,
+      "step": 430
+    },
+    {
+      "epoch": 2.4857142857142858,
+      "grad_norm": 1.034945011138916,
+      "learning_rate": 0.0002628,
+      "loss": 0.1457,
+      "step": 435
+    },
+    {
+      "epoch": 2.5142857142857142,
+      "grad_norm": 1.3225128650665283,
+      "learning_rate": 0.0002623714285714285,
+      "loss": 0.1633,
+      "step": 440
+    },
+    {
+      "epoch": 2.5428571428571427,
+      "grad_norm": 0.8285059928894043,
+      "learning_rate": 0.0002619428571428571,
+      "loss": 0.2004,
+      "step": 445
+    },
+    {
+      "epoch": 2.571428571428571,
+      "grad_norm": 0.773176908493042,
+      "learning_rate": 0.0002615142857142857,
+      "loss": 0.1641,
+      "step": 450
+    },
+    {
+      "epoch": 2.6,
+      "grad_norm": 0.7964853048324585,
+      "learning_rate": 0.0002610857142857143,
+      "loss": 0.1608,
+      "step": 455
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.0967328548431396,
+      "learning_rate": 0.00026065714285714286,
+      "loss": 0.1697,
+      "step": 460
+    },
+    {
+      "epoch": 2.657142857142857,
+      "grad_norm": 0.6462066173553467,
+      "learning_rate": 0.0002602285714285714,
+      "loss": 0.1512,
+      "step": 465
+    },
+    {
+      "epoch": 2.685714285714286,
+      "grad_norm": 0.8765937089920044,
+      "learning_rate": 0.00025979999999999997,
+      "loss": 0.1826,
+      "step": 470
+    },
+    {
+      "epoch": 2.7142857142857144,
+      "grad_norm": 1.2524124383926392,
+      "learning_rate": 0.00025937142857142856,
+      "loss": 0.1731,
+      "step": 475
+    },
+    {
+      "epoch": 2.742857142857143,
+      "grad_norm": 2.2982606887817383,
+      "learning_rate": 0.0002589428571428571,
+      "loss": 0.1852,
+      "step": 480
+    },
+    {
+      "epoch": 2.7714285714285714,
+      "grad_norm": 0.9989053010940552,
+      "learning_rate": 0.0002585142857142857,
+      "loss": 0.1791,
+      "step": 485
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 0.772343635559082,
+      "learning_rate": 0.00025808571428571426,
+      "loss": 0.1862,
+      "step": 490
+    },
+    {
+      "epoch": 2.8285714285714287,
+      "grad_norm": 1.2101136445999146,
+      "learning_rate": 0.00025765714285714284,
+      "loss": 0.1806,
+      "step": 495
+    },
+    {
+      "epoch": 2.857142857142857,
+      "grad_norm": 0.8010189533233643,
+      "learning_rate": 0.0002572285714285714,
+      "loss": 0.1842,
+      "step": 500
+    },
+    {
+      "epoch": 2.8857142857142857,
+      "grad_norm": 1.3597544431686401,
+      "learning_rate": 0.00025679999999999995,
+      "loss": 0.1583,
+      "step": 505
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 0.8790671825408936,
+      "learning_rate": 0.00025637142857142854,
+      "loss": 0.1565,
+      "step": 510
+    },
+    {
+      "epoch": 2.942857142857143,
+      "grad_norm": 1.1175066232681274,
+      "learning_rate": 0.0002559428571428571,
+      "loss": 0.1406,
+      "step": 515
+    },
+    {
+      "epoch": 2.9714285714285715,
+      "grad_norm": 2.8528785705566406,
+      "learning_rate": 0.0002555142857142857,
+      "loss": 0.1735,
+      "step": 520
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 2.2073328495025635,
+      "learning_rate": 0.0002550857142857143,
+      "loss": 0.1816,
+      "step": 525
+    },
+    {
+      "epoch": 3.0285714285714285,
+      "grad_norm": 11.01322078704834,
+      "learning_rate": 0.0002546571428571428,
+      "loss": 0.1873,
+      "step": 530
+    },
+    {
+      "epoch": 3.057142857142857,
+      "grad_norm": 1.5822402238845825,
+      "learning_rate": 0.0002542285714285714,
+      "loss": 0.168,
+      "step": 535
+    },
+    {
+      "epoch": 3.085714285714286,
+      "grad_norm": 1.3086942434310913,
+      "learning_rate": 0.0002538,
+      "loss": 0.149,
+      "step": 540
+    },
+    {
+      "epoch": 3.1142857142857143,
+      "grad_norm": 6.303041458129883,
+      "learning_rate": 0.0002533714285714285,
+      "loss": 0.1651,
+      "step": 545
+    },
+    {
+      "epoch": 3.142857142857143,
+      "grad_norm": 14.48929500579834,
+      "learning_rate": 0.00025294285714285716,
+      "loss": 0.1687,
+      "step": 550
+    },
+    {
+      "epoch": 3.1714285714285713,
+      "grad_norm": 6.824525356292725,
+      "learning_rate": 0.0002525142857142857,
+      "loss": 0.1919,
+      "step": 555
+    },
+    {
+      "epoch": 3.2,
+      "grad_norm": 18.772563934326172,
+      "learning_rate": 0.00025208571428571427,
+      "loss": 0.2075,
+      "step": 560
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 0.7268752455711365,
+      "learning_rate": 0.00025165714285714286,
+      "loss": 0.174,
+      "step": 565
+    },
+    {
+      "epoch": 3.257142857142857,
+      "grad_norm": 1.1301453113555908,
+      "learning_rate": 0.0002512285714285714,
+      "loss": 0.1668,
+      "step": 570
+    },
+    {
+      "epoch": 3.2857142857142856,
+      "grad_norm": 2.846802234649658,
+      "learning_rate": 0.00025079999999999997,
+      "loss": 0.1645,
+      "step": 575
+    },
+    {
+      "epoch": 3.314285714285714,
+      "grad_norm": 1.417515754699707,
+      "learning_rate": 0.00025037142857142855,
+      "loss": 0.1719,
+      "step": 580
+    },
+    {
+      "epoch": 3.342857142857143,
+      "grad_norm": 4.137150764465332,
+      "learning_rate": 0.00024994285714285714,
+      "loss": 0.1739,
+      "step": 585
+    },
+    {
+      "epoch": 3.3714285714285714,
+      "grad_norm": 2.6067259311676025,
+      "learning_rate": 0.0002495142857142857,
+      "loss": 0.1489,
+      "step": 590
+    },
+    {
+      "epoch": 3.4,
+      "grad_norm": 2.601024627685547,
+      "learning_rate": 0.00024908571428571425,
+      "loss": 0.1618,
+      "step": 595
+    },
+    {
+      "epoch": 3.4285714285714284,
+      "grad_norm": 3.849017858505249,
+      "learning_rate": 0.00024865714285714284,
+      "loss": 0.1899,
+      "step": 600
+    },
+    {
+      "epoch": 3.4571428571428573,
+      "grad_norm": 4.673766136169434,
+      "learning_rate": 0.0002482285714285714,
+      "loss": 0.1761,
+      "step": 605
+    },
+    {
+      "epoch": 3.4857142857142858,
+      "grad_norm": 2.6057631969451904,
+      "learning_rate": 0.00024779999999999995,
+      "loss": 0.1743,
+      "step": 610
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.932652473449707,
+      "learning_rate": 0.0002473714285714286,
+      "loss": 0.1482,
+      "step": 615
+    },
+    {
+      "epoch": 3.5428571428571427,
+      "grad_norm": 0.8764939308166504,
+      "learning_rate": 0.0002469428571428571,
+      "loss": 0.1644,
+      "step": 620
+    },
+    {
+      "epoch": 3.571428571428571,
+      "grad_norm": 1.3203191757202148,
+      "learning_rate": 0.0002465142857142857,
+      "loss": 0.1654,
+      "step": 625
+    },
+    {
+      "epoch": 3.6,
+      "grad_norm": 0.7977635264396667,
+      "learning_rate": 0.0002460857142857143,
+      "loss": 0.1472,
+      "step": 630
+    },
+    {
+      "epoch": 3.6285714285714286,
+      "grad_norm": 1.4750248193740845,
+      "learning_rate": 0.0002456571428571428,
+      "loss": 0.1735,
+      "step": 635
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 1.8164482116699219,
+      "learning_rate": 0.0002452285714285714,
+      "loss": 0.1593,
+      "step": 640
+    },
+    {
+      "epoch": 3.685714285714286,
+      "grad_norm": 1.4829603433609009,
+      "learning_rate": 0.0002448,
+      "loss": 0.1508,
+      "step": 645
+    },
+    {
+      "epoch": 3.7142857142857144,
+      "grad_norm": 0.8828144669532776,
+      "learning_rate": 0.00024437142857142857,
+      "loss": 0.1573,
+      "step": 650
+    },
+    {
+      "epoch": 3.742857142857143,
+      "grad_norm": 2.039384126663208,
+      "learning_rate": 0.00024394285714285713,
+      "loss": 0.1745,
+      "step": 655
+    },
+    {
+      "epoch": 3.7714285714285714,
+      "grad_norm": 0.9604200720787048,
+      "learning_rate": 0.00024351428571428569,
+      "loss": 0.17,
+      "step": 660
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 0.7903971076011658,
+      "learning_rate": 0.00024308571428571427,
+      "loss": 0.1654,
+      "step": 665
+    },
+    {
+      "epoch": 3.8285714285714287,
+      "grad_norm": 0.6935649514198303,
+      "learning_rate": 0.00024265714285714283,
+      "loss": 0.1714,
+      "step": 670
+    },
+    {
+      "epoch": 3.857142857142857,
+      "grad_norm": 0.5832012295722961,
+      "learning_rate": 0.00024222857142857138,
+      "loss": 0.1636,
+      "step": 675
+    },
+    {
+      "epoch": 3.8857142857142857,
+      "grad_norm": 0.6303168535232544,
+      "learning_rate": 0.0002418,
+      "loss": 0.1604,
+      "step": 680
+    },
+    {
+      "epoch": 3.914285714285714,
+      "grad_norm": 0.7210885882377625,
+      "learning_rate": 0.00024137142857142855,
+      "loss": 0.1444,
+      "step": 685
+    },
+    {
+      "epoch": 3.942857142857143,
+      "grad_norm": 0.7690990567207336,
+      "learning_rate": 0.00024094285714285714,
+      "loss": 0.1631,
+      "step": 690
+    },
+    {
+      "epoch": 3.9714285714285715,
+      "grad_norm": 1.0142720937728882,
+      "learning_rate": 0.0002405142857142857,
+      "loss": 0.158,
+      "step": 695
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.7970322966575623,
+      "learning_rate": 0.00024008571428571425,
+      "loss": 0.1803,
+      "step": 700
+    },
+    {
+      "epoch": 4.0285714285714285,
+      "grad_norm": 0.6795914769172668,
+      "learning_rate": 0.00023965714285714284,
+      "loss": 0.143,
+      "step": 705
+    },
+    {
+      "epoch": 4.057142857142857,
+      "grad_norm": 0.6832629442214966,
+      "learning_rate": 0.0002392285714285714,
+      "loss": 0.1457,
+      "step": 710
+    },
+    {
+      "epoch": 4.085714285714285,
+      "grad_norm": 3.8629798889160156,
+      "learning_rate": 0.0002388,
+      "loss": 0.1671,
+      "step": 715
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 1.1167882680892944,
+      "learning_rate": 0.00023837142857142856,
+      "loss": 0.1544,
+      "step": 720
+    },
+    {
+      "epoch": 4.142857142857143,
+      "grad_norm": 0.9431412816047668,
+      "learning_rate": 0.00023794285714285712,
+      "loss": 0.1605,
+      "step": 725
+    },
+    {
+      "epoch": 4.171428571428572,
+      "grad_norm": 1.310948133468628,
+      "learning_rate": 0.0002375142857142857,
+      "loss": 0.1121,
+      "step": 730
+    },
+    {
+      "epoch": 4.2,
+      "grad_norm": 0.9830737709999084,
+      "learning_rate": 0.00023708571428571426,
+      "loss": 0.1742,
+      "step": 735
+    },
+    {
+      "epoch": 4.228571428571429,
+      "grad_norm": 0.6166555881500244,
+      "learning_rate": 0.00023665714285714282,
+      "loss": 0.1525,
+      "step": 740
+    },
+    {
+      "epoch": 4.257142857142857,
+      "grad_norm": 0.995579719543457,
+      "learning_rate": 0.00023622857142857143,
+      "loss": 0.1439,
+      "step": 745
+    },
+    {
+      "epoch": 4.285714285714286,
+      "grad_norm": 0.639796793460846,
+      "learning_rate": 0.00023579999999999999,
+      "loss": 0.1692,
+      "step": 750
+    },
+    {
+      "epoch": 4.314285714285714,
+      "grad_norm": 0.9438050389289856,
+      "learning_rate": 0.00023537142857142854,
+      "loss": 0.1785,
+      "step": 755
+    },
+    {
+      "epoch": 4.3428571428571425,
+      "grad_norm": 0.8960750102996826,
+      "learning_rate": 0.00023494285714285713,
+      "loss": 0.1557,
+      "step": 760
+    },
+    {
+      "epoch": 4.371428571428572,
+      "grad_norm": 0.6287499070167542,
+      "learning_rate": 0.00023451428571428568,
+      "loss": 0.1459,
+      "step": 765
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 0.7638295888900757,
+      "learning_rate": 0.00023408571428571424,
+      "loss": 0.1341,
+      "step": 770
+    },
+    {
+      "epoch": 4.428571428571429,
+      "grad_norm": 0.655878484249115,
+      "learning_rate": 0.00023365714285714283,
+      "loss": 0.1358,
+      "step": 775
+    },
+    {
+      "epoch": 4.457142857142857,
+      "grad_norm": 0.5840997695922852,
+      "learning_rate": 0.0002332285714285714,
+      "loss": 0.1386,
+      "step": 780
+    },
+    {
+      "epoch": 4.485714285714286,
+      "grad_norm": 1.1082488298416138,
+      "learning_rate": 0.0002328,
+      "loss": 0.1827,
+      "step": 785
+    },
+    {
+      "epoch": 4.514285714285714,
+      "grad_norm": 0.8825240135192871,
+      "learning_rate": 0.00023237142857142855,
+      "loss": 0.1527,
+      "step": 790
+    },
+    {
+      "epoch": 4.542857142857143,
+      "grad_norm": 0.6752304434776306,
+      "learning_rate": 0.0002319428571428571,
+      "loss": 0.1392,
+      "step": 795
+    },
+    {
+      "epoch": 4.571428571428571,
+      "grad_norm": 1.1423301696777344,
+      "learning_rate": 0.0002315142857142857,
+      "loss": 0.1433,
+      "step": 800
+    },
+    {
+      "epoch": 4.6,
+      "grad_norm": 10.793691635131836,
+      "learning_rate": 0.00023108571428571425,
+      "loss": 0.1635,
+      "step": 805
+    },
+    {
+      "epoch": 4.628571428571428,
+      "grad_norm": 0.47564294934272766,
+      "learning_rate": 0.00023065714285714286,
+      "loss": 0.1199,
+      "step": 810
+    },
+    {
+      "epoch": 4.6571428571428575,
+      "grad_norm": 1.2492656707763672,
+      "learning_rate": 0.00023022857142857142,
+      "loss": 0.1488,
+      "step": 815
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 0.6933501958847046,
+      "learning_rate": 0.00022979999999999997,
+      "loss": 0.1812,
+      "step": 820
+    },
+    {
+      "epoch": 4.714285714285714,
+      "grad_norm": 0.7901633977890015,
+      "learning_rate": 0.00022937142857142856,
+      "loss": 0.1415,
+      "step": 825
+    },
+    {
+      "epoch": 4.742857142857143,
+      "grad_norm": 0.7854829430580139,
+      "learning_rate": 0.00022894285714285712,
+      "loss": 0.1401,
+      "step": 830
+    },
+    {
+      "epoch": 4.771428571428571,
+      "grad_norm": 0.8716740608215332,
+      "learning_rate": 0.00022851428571428567,
+      "loss": 0.1982,
+      "step": 835
+    },
+    {
+      "epoch": 4.8,
+      "grad_norm": 0.7047899961471558,
+      "learning_rate": 0.00022808571428571426,
+      "loss": 0.1624,
+      "step": 840
+    },
+    {
+      "epoch": 4.828571428571428,
+      "grad_norm": 0.7134959697723389,
+      "learning_rate": 0.00022765714285714284,
+      "loss": 0.1375,
+      "step": 845
+    },
+    {
+      "epoch": 4.857142857142857,
+      "grad_norm": 1.0897325277328491,
+      "learning_rate": 0.00022722857142857143,
+      "loss": 0.1489,
+      "step": 850
+    },
+    {
+      "epoch": 4.885714285714286,
+      "grad_norm": 1.1065207719802856,
+      "learning_rate": 0.00022679999999999998,
+      "loss": 0.1495,
+      "step": 855
+    },
+    {
+      "epoch": 4.914285714285715,
+      "grad_norm": 0.7434757351875305,
+      "learning_rate": 0.00022637142857142854,
+      "loss": 0.1507,
+      "step": 860
+    },
+    {
+      "epoch": 4.942857142857143,
+      "grad_norm": 1.0045181512832642,
+      "learning_rate": 0.00022594285714285712,
+      "loss": 0.1527,
+      "step": 865
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 1.2025654315948486,
+      "learning_rate": 0.00022551428571428568,
+      "loss": 0.1523,
+      "step": 870
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.7823342084884644,
+      "learning_rate": 0.0002250857142857143,
+      "loss": 0.1514,
+      "step": 875
+    },
+    {
+      "epoch": 5.0285714285714285,
+      "grad_norm": 0.8405362963676453,
+      "learning_rate": 0.00022465714285714285,
+      "loss": 0.1461,
+      "step": 880
+    },
+    {
+      "epoch": 5.057142857142857,
+      "grad_norm": 0.7527463436126709,
+      "learning_rate": 0.0002242285714285714,
+      "loss": 0.1206,
+      "step": 885
+    },
+    {
+      "epoch": 5.085714285714285,
+      "grad_norm": 0.8372548222541809,
+      "learning_rate": 0.0002238,
+      "loss": 0.1513,
+      "step": 890
+    },
+    {
+      "epoch": 5.114285714285714,
+      "grad_norm": 0.8755456209182739,
+      "learning_rate": 0.00022337142857142855,
+      "loss": 0.1498,
+      "step": 895
+    },
+    {
+      "epoch": 5.142857142857143,
+      "grad_norm": 0.7312084436416626,
+      "learning_rate": 0.0002229428571428571,
+      "loss": 0.154,
+      "step": 900
+    },
+    {
+      "epoch": 5.171428571428572,
+      "grad_norm": 0.6366221904754639,
+      "learning_rate": 0.0002225142857142857,
+      "loss": 0.1466,
+      "step": 905
+    },
+    {
+      "epoch": 5.2,
+      "grad_norm": 0.6406880617141724,
+      "learning_rate": 0.00022208571428571427,
+      "loss": 0.1254,
+      "step": 910
+    },
+    {
+      "epoch": 5.228571428571429,
+      "grad_norm": 2.4106833934783936,
+      "learning_rate": 0.00022165714285714283,
+      "loss": 0.1534,
+      "step": 915
+    },
+    {
+      "epoch": 5.257142857142857,
+      "grad_norm": 0.5635722279548645,
+      "learning_rate": 0.00022122857142857142,
+      "loss": 0.1461,
+      "step": 920
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 0.787162184715271,
+      "learning_rate": 0.00022079999999999997,
+      "loss": 0.1424,
+      "step": 925
+    },
+    {
+      "epoch": 5.314285714285714,
+      "grad_norm": 0.6513975262641907,
+      "learning_rate": 0.00022037142857142853,
+      "loss": 0.1326,
+      "step": 930
+    },
+    {
+      "epoch": 5.3428571428571425,
+      "grad_norm": 0.6933534741401672,
+      "learning_rate": 0.00021994285714285711,
+      "loss": 0.1661,
+      "step": 935
+    },
+    {
+      "epoch": 5.371428571428572,
+      "grad_norm": 0.7263259887695312,
+      "learning_rate": 0.0002195142857142857,
+      "loss": 0.15,
+      "step": 940
+    },
+    {
+      "epoch": 5.4,
+      "grad_norm": 0.5537381768226624,
+      "learning_rate": 0.00021908571428571428,
+      "loss": 0.129,
+      "step": 945
+    },
+    {
+      "epoch": 5.428571428571429,
+      "grad_norm": 0.6014005541801453,
+      "learning_rate": 0.00021865714285714284,
+      "loss": 0.1321,
+      "step": 950
+    },
+    {
+      "epoch": 5.457142857142857,
+      "grad_norm": 0.6581441760063171,
+      "learning_rate": 0.0002182285714285714,
+      "loss": 0.1587,
+      "step": 955
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.9326379895210266,
+      "learning_rate": 0.00021779999999999998,
+      "loss": 0.1654,
+      "step": 960
+    },
+    {
+      "epoch": 5.514285714285714,
+      "grad_norm": 0.9438592791557312,
+      "learning_rate": 0.00021737142857142854,
+      "loss": 0.1212,
+      "step": 965
+    },
+    {
+      "epoch": 5.542857142857143,
+      "grad_norm": 0.7699571251869202,
+      "learning_rate": 0.00021694285714285715,
+      "loss": 0.1464,
+      "step": 970
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 0.8758366703987122,
+      "learning_rate": 0.0002165142857142857,
+      "loss": 0.1599,
+      "step": 975
+    },
+    {
+      "epoch": 5.6,
+      "grad_norm": 0.6101442575454712,
+      "learning_rate": 0.00021608571428571426,
+      "loss": 0.1589,
+      "step": 980
+    },
+    {
+      "epoch": 5.628571428571428,
+      "grad_norm": 0.7454060912132263,
+      "learning_rate": 0.00021565714285714285,
+      "loss": 0.1433,
+      "step": 985
+    },
+    {
+      "epoch": 5.6571428571428575,
+      "grad_norm": 0.6379484534263611,
+      "learning_rate": 0.0002152285714285714,
+      "loss": 0.1592,
+      "step": 990
+    },
+    {
+      "epoch": 5.685714285714286,
+      "grad_norm": 1.1601309776306152,
+      "learning_rate": 0.00021479999999999996,
+      "loss": 0.1647,
+      "step": 995
+    },
+    {
+      "epoch": 5.714285714285714,
+      "grad_norm": 0.5464673638343811,
+      "learning_rate": 0.00021437142857142855,
+      "loss": 0.1469,
+      "step": 1000
+    },
+    {
+      "epoch": 5.742857142857143,
+      "grad_norm": 1.0279319286346436,
+      "learning_rate": 0.00021394285714285713,
+      "loss": 0.1203,
+      "step": 1005
+    },
+    {
+      "epoch": 5.771428571428571,
+      "grad_norm": 0.5503718256950378,
+      "learning_rate": 0.00021351428571428572,
+      "loss": 0.1409,
+      "step": 1010
+    },
+    {
+      "epoch": 5.8,
+      "grad_norm": 0.6123886108398438,
+      "learning_rate": 0.00021308571428571427,
+      "loss": 0.1427,
+      "step": 1015
+    },
+    {
+      "epoch": 5.828571428571428,
+      "grad_norm": 0.6560390591621399,
+      "learning_rate": 0.00021265714285714283,
+      "loss": 0.1415,
+      "step": 1020
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 0.5576716661453247,
+      "learning_rate": 0.00021222857142857141,
+      "loss": 0.1408,
+      "step": 1025
+    },
+    {
+      "epoch": 5.885714285714286,
+      "grad_norm": 0.6419074535369873,
+      "learning_rate": 0.00021179999999999997,
+      "loss": 0.1385,
+      "step": 1030
+    },
+    {
+      "epoch": 5.914285714285715,
+      "grad_norm": 1.008925199508667,
+      "learning_rate": 0.00021137142857142858,
+      "loss": 0.1497,
+      "step": 1035
+    },
+    {
+      "epoch": 5.942857142857143,
+      "grad_norm": 0.6559906005859375,
+      "learning_rate": 0.00021094285714285714,
+      "loss": 0.1218,
+      "step": 1040
+    },
+    {
+      "epoch": 5.9714285714285715,
+      "grad_norm": 0.627164363861084,
+      "learning_rate": 0.0002105142857142857,
+      "loss": 0.1368,
+      "step": 1045
+    },
+    {
+      "epoch": 6.0,
+      "grad_norm": 0.5760972499847412,
+      "learning_rate": 0.00021008571428571428,
+      "loss": 0.1508,
+      "step": 1050
+    },
+    {
+      "epoch": 6.0285714285714285,
+      "grad_norm": 0.5754174590110779,
+      "learning_rate": 0.00020965714285714284,
+      "loss": 0.1181,
+      "step": 1055
+    },
+    {
+      "epoch": 6.057142857142857,
+      "grad_norm": 0.8736348748207092,
+      "learning_rate": 0.0002092285714285714,
+      "loss": 0.1252,
+      "step": 1060
+    },
+    {
+      "epoch": 6.085714285714285,
+      "grad_norm": 0.7166719436645508,
+      "learning_rate": 0.00020879999999999998,
+      "loss": 0.1481,
+      "step": 1065
+    },
+    {
+      "epoch": 6.114285714285714,
+      "grad_norm": 0.6494349241256714,
+      "learning_rate": 0.00020837142857142856,
+      "loss": 0.1478,
+      "step": 1070
+    },
+    {
+      "epoch": 6.142857142857143,
+      "grad_norm": 0.6681587100028992,
+      "learning_rate": 0.00020794285714285712,
+      "loss": 0.1488,
+      "step": 1075
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 0.7123684883117676,
+      "learning_rate": 0.0002075142857142857,
+      "loss": 0.1378,
+      "step": 1080
+    },
+    {
+      "epoch": 6.2,
+      "grad_norm": 0.6146950721740723,
+      "learning_rate": 0.00020708571428571426,
+      "loss": 0.1306,
+      "step": 1085
+    },
+    {
+      "epoch": 6.228571428571429,
+      "grad_norm": 0.8402445912361145,
+      "learning_rate": 0.00020665714285714282,
+      "loss": 0.1063,
+      "step": 1090
+    },
+    {
+      "epoch": 6.257142857142857,
+      "grad_norm": 0.6567764282226562,
+      "learning_rate": 0.0002062285714285714,
+      "loss": 0.1195,
+      "step": 1095
+    },
+    {
+      "epoch": 6.285714285714286,
+      "grad_norm": 0.6006014943122864,
+      "learning_rate": 0.0002058,
+      "loss": 0.1542,
+      "step": 1100
+    },
+    {
+      "epoch": 6.314285714285714,
+      "grad_norm": 0.793100893497467,
+      "learning_rate": 0.00020537142857142857,
+      "loss": 0.1381,
+      "step": 1105
+    },
+    {
+      "epoch": 6.3428571428571425,
+      "grad_norm": 0.5923666954040527,
+      "learning_rate": 0.00020494285714285713,
+      "loss": 0.1386,
+      "step": 1110
+    },
+    {
+      "epoch": 6.371428571428572,
+      "grad_norm": 0.6692521572113037,
+      "learning_rate": 0.0002045142857142857,
+      "loss": 0.1223,
+      "step": 1115
+    },
+    {
+      "epoch": 6.4,
+      "grad_norm": 0.7216306328773499,
+      "learning_rate": 0.00020408571428571427,
+      "loss": 0.1367,
+      "step": 1120
+    },
+    {
+      "epoch": 6.428571428571429,
+      "grad_norm": 0.5640934109687805,
+      "learning_rate": 0.00020365714285714283,
+      "loss": 0.1554,
+      "step": 1125
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 0.8154368996620178,
+      "learning_rate": 0.00020322857142857138,
+      "loss": 0.1674,
+      "step": 1130
+    },
+    {
+      "epoch": 6.485714285714286,
+      "grad_norm": 0.7185398936271667,
+      "learning_rate": 0.0002028,
+      "loss": 0.1375,
+      "step": 1135
+    },
+    {
+      "epoch": 6.514285714285714,
+      "grad_norm": 0.6805170774459839,
+      "learning_rate": 0.00020237142857142855,
+      "loss": 0.1306,
+      "step": 1140
+    },
+    {
+      "epoch": 6.542857142857143,
+      "grad_norm": 0.5996941924095154,
+      "learning_rate": 0.00020194285714285714,
+      "loss": 0.1433,
+      "step": 1145
+    },
+    {
+      "epoch": 6.571428571428571,
+      "grad_norm": 0.5258373022079468,
+      "learning_rate": 0.0002015142857142857,
+      "loss": 0.1285,
+      "step": 1150
+    },
+    {
+      "epoch": 6.6,
+      "grad_norm": 0.7771695256233215,
+      "learning_rate": 0.00020108571428571425,
+      "loss": 0.1493,
+      "step": 1155
+    },
+    {
+      "epoch": 6.628571428571428,
+      "grad_norm": 0.5920616388320923,
+      "learning_rate": 0.00020065714285714284,
+      "loss": 0.1479,
+      "step": 1160
+    },
+    {
+      "epoch": 6.6571428571428575,
+      "grad_norm": 0.7460982799530029,
+      "learning_rate": 0.00020022857142857142,
+      "loss": 0.1173,
+      "step": 1165
+    },
+    {
+      "epoch": 6.685714285714286,
+      "grad_norm": 1.1703822612762451,
+      "learning_rate": 0.0001998,
+      "loss": 0.1402,
+      "step": 1170
+    },
+    {
+      "epoch": 6.714285714285714,
+      "grad_norm": 0.7894724011421204,
+      "learning_rate": 0.00019937142857142856,
+      "loss": 0.1253,
+      "step": 1175
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 0.7013376355171204,
+      "learning_rate": 0.00019894285714285712,
+      "loss": 0.1573,
+      "step": 1180
+    },
+    {
+      "epoch": 6.771428571428571,
+      "grad_norm": 0.6421737670898438,
+      "learning_rate": 0.0001985142857142857,
+      "loss": 0.1497,
+      "step": 1185
+    },
+    {
+      "epoch": 6.8,
+      "grad_norm": 1.204296350479126,
+      "learning_rate": 0.00019808571428571426,
+      "loss": 0.1634,
+      "step": 1190
+    },
+    {
+      "epoch": 6.828571428571428,
+      "grad_norm": 0.867765486240387,
+      "learning_rate": 0.00019765714285714282,
+      "loss": 0.1353,
+      "step": 1195
+    },
+    {
+      "epoch": 6.857142857142857,
+      "grad_norm": 0.7325594425201416,
+      "learning_rate": 0.00019722857142857143,
+      "loss": 0.118,
+      "step": 1200
+    },
+    {
+      "epoch": 6.885714285714286,
+      "grad_norm": 0.7029078006744385,
+      "learning_rate": 0.00019679999999999999,
+      "loss": 0.1425,
+      "step": 1205
+    },
+    {
+      "epoch": 6.914285714285715,
+      "grad_norm": 1.1572504043579102,
+      "learning_rate": 0.00019637142857142857,
+      "loss": 0.1337,
+      "step": 1210
+    },
+    {
+      "epoch": 6.942857142857143,
+      "grad_norm": 0.8022822141647339,
+      "learning_rate": 0.00019594285714285713,
+      "loss": 0.1684,
+      "step": 1215
+    },
+    {
+      "epoch": 6.9714285714285715,
+      "grad_norm": 0.6729874610900879,
+      "learning_rate": 0.00019551428571428568,
+      "loss": 0.1238,
+      "step": 1220
+    },
+    {
+      "epoch": 7.0,
+      "grad_norm": 0.5773627758026123,
+      "learning_rate": 0.00019508571428571427,
+      "loss": 0.138,
+      "step": 1225
+    },
+    {
+      "epoch": 7.0285714285714285,
+      "grad_norm": 0.7182291150093079,
+      "learning_rate": 0.00019465714285714285,
+      "loss": 0.1431,
+      "step": 1230
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 1.7567912340164185,
+      "learning_rate": 0.0001942285714285714,
+      "loss": 0.1319,
+      "step": 1235
+    },
+    {
+      "epoch": 7.085714285714285,
+      "grad_norm": 0.6845232248306274,
+      "learning_rate": 0.0001938,
+      "loss": 0.1292,
+      "step": 1240
+    },
+    {
+      "epoch": 7.114285714285714,
+      "grad_norm": 0.6077771782875061,
+      "learning_rate": 0.00019337142857142855,
+      "loss": 0.1238,
+      "step": 1245
+    },
+    {
+      "epoch": 7.142857142857143,
+      "grad_norm": 0.6168347597122192,
+      "learning_rate": 0.0001929428571428571,
+      "loss": 0.1384,
+      "step": 1250
+    },
+    {
+      "epoch": 7.171428571428572,
+      "grad_norm": 0.7457576394081116,
+      "learning_rate": 0.0001925142857142857,
+      "loss": 0.1306,
+      "step": 1255
+    },
+    {
+      "epoch": 7.2,
+      "grad_norm": 0.5969316363334656,
+      "learning_rate": 0.00019208571428571425,
+      "loss": 0.1123,
+      "step": 1260
+    },
+    {
+      "epoch": 7.228571428571429,
+      "grad_norm": 0.6902753710746765,
+      "learning_rate": 0.00019165714285714286,
+      "loss": 0.1185,
+      "step": 1265
+    },
+    {
+      "epoch": 7.257142857142857,
+      "grad_norm": 0.6488338112831116,
+      "learning_rate": 0.00019122857142857142,
+      "loss": 0.1431,
+      "step": 1270
+    },
+    {
+      "epoch": 7.285714285714286,
+      "grad_norm": 0.6814819574356079,
+      "learning_rate": 0.00019079999999999998,
+      "loss": 0.1495,
+      "step": 1275
+    },
+    {
+      "epoch": 7.314285714285714,
+      "grad_norm": 0.7468088865280151,
+      "learning_rate": 0.00019037142857142856,
+      "loss": 0.1158,
+      "step": 1280
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 0.7417412400245667,
+      "learning_rate": 0.00018994285714285712,
+      "loss": 0.1311,
+      "step": 1285
+    },
+    {
+      "epoch": 7.371428571428572,
+      "grad_norm": 0.5480664372444153,
+      "learning_rate": 0.00018951428571428567,
+      "loss": 0.135,
+      "step": 1290
+    },
+    {
+      "epoch": 7.4,
+      "grad_norm": 0.725527822971344,
+      "learning_rate": 0.00018908571428571429,
+      "loss": 0.1217,
+      "step": 1295
+    },
+    {
+      "epoch": 7.428571428571429,
+      "grad_norm": 0.6566678285598755,
+      "learning_rate": 0.00018865714285714284,
+      "loss": 0.1417,
+      "step": 1300
+    },
+    {
+      "epoch": 7.457142857142857,
+      "grad_norm": 0.516952395439148,
+      "learning_rate": 0.00018822857142857143,
+      "loss": 0.1329,
+      "step": 1305
+    },
+    {
+      "epoch": 7.485714285714286,
+      "grad_norm": 1.9545241594314575,
+      "learning_rate": 0.00018779999999999998,
+      "loss": 0.1339,
+      "step": 1310
+    },
+    {
+      "epoch": 7.514285714285714,
+      "grad_norm": 0.8276839852333069,
+      "learning_rate": 0.00018737142857142854,
+      "loss": 0.1324,
+      "step": 1315
+    },
+    {
+      "epoch": 7.542857142857143,
+      "grad_norm": 0.6737099289894104,
+      "learning_rate": 0.00018694285714285713,
+      "loss": 0.1139,
+      "step": 1320
+    },
+    {
+      "epoch": 7.571428571428571,
+      "grad_norm": 0.6914472579956055,
+      "learning_rate": 0.00018651428571428568,
+      "loss": 0.1146,
+      "step": 1325
+    },
+    {
+      "epoch": 7.6,
+      "grad_norm": 0.6630033850669861,
+      "learning_rate": 0.0001860857142857143,
+      "loss": 0.1571,
+      "step": 1330
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 0.820688784122467,
+      "learning_rate": 0.00018565714285714285,
+      "loss": 0.15,
+      "step": 1335
+    },
+    {
+      "epoch": 7.6571428571428575,
+      "grad_norm": 2.0491325855255127,
+      "learning_rate": 0.0001852285714285714,
+      "loss": 0.127,
+      "step": 1340
+    },
+    {
+      "epoch": 7.685714285714286,
+      "grad_norm": 0.9327268004417419,
+      "learning_rate": 0.0001848,
+      "loss": 0.1289,
+      "step": 1345
+    },
+    {
+      "epoch": 7.714285714285714,
+      "grad_norm": 1.3131701946258545,
+      "learning_rate": 0.00018437142857142855,
+      "loss": 0.1228,
+      "step": 1350
+    },
+    {
+      "epoch": 7.742857142857143,
+      "grad_norm": 2.955918312072754,
+      "learning_rate": 0.0001839428571428571,
+      "loss": 0.1082,
+      "step": 1355
+    },
+    {
+      "epoch": 7.771428571428571,
+      "grad_norm": 1.2165493965148926,
+      "learning_rate": 0.00018351428571428572,
+      "loss": 0.1688,
+      "step": 1360
+    },
+    {
+      "epoch": 7.8,
+      "grad_norm": 0.759324312210083,
+      "learning_rate": 0.00018308571428571428,
+      "loss": 0.1185,
+      "step": 1365
+    },
+    {
+      "epoch": 7.828571428571428,
+      "grad_norm": 0.7445591688156128,
+      "learning_rate": 0.00018265714285714286,
+      "loss": 0.1431,
+      "step": 1370
+    },
+    {
+      "epoch": 7.857142857142857,
+      "grad_norm": 0.679374098777771,
+      "learning_rate": 0.00018222857142857142,
+      "loss": 0.1451,
+      "step": 1375
+    },
+    {
+      "epoch": 7.885714285714286,
+      "grad_norm": 2.1234302520751953,
+      "learning_rate": 0.00018179999999999997,
+      "loss": 0.1265,
+      "step": 1380
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 1.006521224975586,
+      "learning_rate": 0.00018137142857142856,
+      "loss": 0.1722,
+      "step": 1385
+    },
+    {
+      "epoch": 7.942857142857143,
+      "grad_norm": 0.7275253534317017,
+      "learning_rate": 0.00018094285714285712,
+      "loss": 0.1625,
+      "step": 1390
+    },
+    {
+      "epoch": 7.9714285714285715,
+      "grad_norm": 0.8612022995948792,
+      "learning_rate": 0.0001805142857142857,
+      "loss": 0.1345,
+      "step": 1395
+    },
+    {
+      "epoch": 8.0,
+      "grad_norm": 0.7276798486709595,
+      "learning_rate": 0.00018008571428571428,
+      "loss": 0.1236,
+      "step": 1400
+    },
+    {
+      "epoch": 8.028571428571428,
+      "grad_norm": 0.8731086850166321,
+      "learning_rate": 0.00017965714285714284,
+      "loss": 0.1604,
+      "step": 1405
+    },
+    {
+      "epoch": 8.057142857142857,
+      "grad_norm": 0.8950818777084351,
+      "learning_rate": 0.0001792285714285714,
+      "loss": 0.1531,
+      "step": 1410
+    },
+    {
+      "epoch": 8.085714285714285,
+      "grad_norm": 0.7399356365203857,
+      "learning_rate": 0.00017879999999999998,
+      "loss": 0.1508,
+      "step": 1415
+    },
+    {
+      "epoch": 8.114285714285714,
+      "grad_norm": 1.3727307319641113,
+      "learning_rate": 0.00017837142857142854,
+      "loss": 0.1487,
+      "step": 1420
+    },
+    {
+      "epoch": 8.142857142857142,
+      "grad_norm": 0.5938125848770142,
+      "learning_rate": 0.00017794285714285715,
+      "loss": 0.1303,
+      "step": 1425
+    },
+    {
+      "epoch": 8.17142857142857,
+      "grad_norm": 0.7043821811676025,
+      "learning_rate": 0.0001775142857142857,
+      "loss": 0.0948,
+      "step": 1430
+    },
+    {
+      "epoch": 8.2,
+      "grad_norm": 1.1062767505645752,
+      "learning_rate": 0.00017708571428571426,
+      "loss": 0.1412,
+      "step": 1435
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 0.844832181930542,
+      "learning_rate": 0.00017665714285714285,
+      "loss": 0.113,
+      "step": 1440
+    },
+    {
+      "epoch": 8.257142857142856,
+      "grad_norm": 0.7564154863357544,
+      "learning_rate": 0.0001762285714285714,
+      "loss": 0.1319,
+      "step": 1445
+    },
+    {
+      "epoch": 8.285714285714286,
+      "grad_norm": 0.8843110203742981,
+      "learning_rate": 0.00017579999999999996,
+      "loss": 0.1206,
+      "step": 1450
+    },
+    {
+      "epoch": 8.314285714285715,
+      "grad_norm": 0.8175828456878662,
+      "learning_rate": 0.00017537142857142855,
+      "loss": 0.1327,
+      "step": 1455
+    },
+    {
+      "epoch": 8.342857142857143,
+      "grad_norm": 0.6443565487861633,
+      "learning_rate": 0.00017494285714285713,
+      "loss": 0.1239,
+      "step": 1460
+    },
+    {
+      "epoch": 8.371428571428572,
+      "grad_norm": 0.7237185835838318,
+      "learning_rate": 0.00017451428571428572,
+      "loss": 0.1639,
+      "step": 1465
+    },
+    {
+      "epoch": 8.4,
+      "grad_norm": 0.6118057370185852,
+      "learning_rate": 0.00017408571428571427,
+      "loss": 0.1363,
+      "step": 1470
+    },
+    {
+      "epoch": 8.428571428571429,
+      "grad_norm": 0.6754649877548218,
+      "learning_rate": 0.00017365714285714283,
+      "loss": 0.1187,
+      "step": 1475
+    },
+    {
+      "epoch": 8.457142857142857,
+      "grad_norm": 1.0067390203475952,
+      "learning_rate": 0.00017322857142857141,
+      "loss": 0.1401,
+      "step": 1480
+    },
+    {
+      "epoch": 8.485714285714286,
+      "grad_norm": 8.509544372558594,
+      "learning_rate": 0.00017279999999999997,
+      "loss": 0.1304,
+      "step": 1485
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 4.2030205726623535,
+      "learning_rate": 0.00017237142857142858,
+      "loss": 0.121,
+      "step": 1490
+    },
+    {
+      "epoch": 8.542857142857143,
+      "grad_norm": 4.877438068389893,
+      "learning_rate": 0.00017194285714285714,
+      "loss": 0.1918,
+      "step": 1495
+    },
+    {
+      "epoch": 8.571428571428571,
+      "grad_norm": 6.4971232414245605,
+      "learning_rate": 0.0001715142857142857,
+      "loss": 0.2154,
+      "step": 1500
+    },
+    {
+      "epoch": 8.6,
+      "grad_norm": 4.365469932556152,
+      "learning_rate": 0.00017108571428571428,
+      "loss": 0.2272,
+      "step": 1505
+    },
+    {
+      "epoch": 8.628571428571428,
+      "grad_norm": 2.551957845687866,
+      "learning_rate": 0.00017065714285714284,
+      "loss": 0.2163,
+      "step": 1510
+    },
+    {
+      "epoch": 8.657142857142857,
+      "grad_norm": 5.326391220092773,
+      "learning_rate": 0.0001702285714285714,
+      "loss": 0.1612,
+      "step": 1515
+    },
+    {
+      "epoch": 8.685714285714285,
+      "grad_norm": 1.3528404235839844,
+      "learning_rate": 0.00016979999999999998,
+      "loss": 0.1636,
+      "step": 1520
+    },
+    {
+      "epoch": 8.714285714285714,
+      "grad_norm": 1.4466065168380737,
+      "learning_rate": 0.00016937142857142856,
+      "loss": 0.1295,
+      "step": 1525
+    },
+    {
+      "epoch": 8.742857142857144,
+      "grad_norm": 0.6576040387153625,
+      "learning_rate": 0.00016894285714285715,
+      "loss": 0.1318,
+      "step": 1530
+    },
+    {
+      "epoch": 8.771428571428572,
+      "grad_norm": 1.286942958831787,
+      "learning_rate": 0.0001685142857142857,
+      "loss": 0.1443,
+      "step": 1535
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 9.474458694458008,
+      "learning_rate": 0.00016808571428571426,
+      "loss": 0.1313,
+      "step": 1540
+    },
+    {
+      "epoch": 8.82857142857143,
+      "grad_norm": 2.6731069087982178,
+      "learning_rate": 0.00016765714285714285,
+      "loss": 0.1485,
+      "step": 1545
+    },
+    {
+      "epoch": 8.857142857142858,
+      "grad_norm": 1.313723087310791,
+      "learning_rate": 0.0001672285714285714,
+      "loss": 0.1346,
+      "step": 1550
+    },
+    {
+      "epoch": 8.885714285714286,
+      "grad_norm": 1.7115576267242432,
+      "learning_rate": 0.0001668,
+      "loss": 0.1471,
+      "step": 1555
+    },
+    {
+      "epoch": 8.914285714285715,
+      "grad_norm": 1.2599923610687256,
+      "learning_rate": 0.00016637142857142857,
+      "loss": 0.1433,
+      "step": 1560
+    },
+    {
+      "epoch": 8.942857142857143,
+      "grad_norm": 0.9659029245376587,
+      "learning_rate": 0.00016594285714285713,
+      "loss": 0.1256,
+      "step": 1565
+    },
+    {
+      "epoch": 8.971428571428572,
+      "grad_norm": 1.1282744407653809,
+      "learning_rate": 0.0001655142857142857,
+      "loss": 0.1373,
+      "step": 1570
+    },
+    {
+      "epoch": 9.0,
+      "grad_norm": 3.20717453956604,
+      "learning_rate": 0.00016508571428571427,
+      "loss": 0.1355,
+      "step": 1575
+    },
+    {
+      "epoch": 9.028571428571428,
+      "grad_norm": 0.8310821056365967,
+      "learning_rate": 0.00016465714285714283,
+      "loss": 0.1268,
+      "step": 1580
+    },
+    {
+      "epoch": 9.057142857142857,
+      "grad_norm": 1.5337790250778198,
+      "learning_rate": 0.00016422857142857139,
+      "loss": 0.1267,
+      "step": 1585
+    },
+    {
+      "epoch": 9.085714285714285,
+      "grad_norm": 2.6406068801879883,
+      "learning_rate": 0.0001638,
+      "loss": 0.1363,
+      "step": 1590
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 0.7705873847007751,
+      "learning_rate": 0.00016337142857142855,
+      "loss": 0.1291,
+      "step": 1595
+    },
+    {
+      "epoch": 9.142857142857142,
+      "grad_norm": 0.7092650532722473,
+      "learning_rate": 0.00016294285714285714,
+      "loss": 0.1435,
+      "step": 1600
+    },
+    {
+      "epoch": 9.17142857142857,
+      "grad_norm": 1.098961591720581,
+      "learning_rate": 0.0001625142857142857,
+      "loss": 0.1471,
+      "step": 1605
+    },
+    {
+      "epoch": 9.2,
+      "grad_norm": 0.6994885206222534,
+      "learning_rate": 0.00016208571428571425,
+      "loss": 0.1345,
+      "step": 1610
+    },
+    {
+      "epoch": 9.228571428571428,
+      "grad_norm": 0.9613476991653442,
+      "learning_rate": 0.00016165714285714284,
+      "loss": 0.1399,
+      "step": 1615
+    },
+    {
+      "epoch": 9.257142857142856,
+      "grad_norm": 0.675588846206665,
+      "learning_rate": 0.00016122857142857142,
+      "loss": 0.1319,
+      "step": 1620
+    },
+    {
+      "epoch": 9.285714285714286,
+      "grad_norm": 0.7519372701644897,
+      "learning_rate": 0.0001608,
+      "loss": 0.137,
+      "step": 1625
+    },
+    {
+      "epoch": 9.314285714285715,
+      "grad_norm": 1.135025978088379,
+      "learning_rate": 0.00016037142857142856,
+      "loss": 0.1322,
+      "step": 1630
+    },
+    {
+      "epoch": 9.342857142857143,
+      "grad_norm": 0.7462936639785767,
+      "learning_rate": 0.00015994285714285712,
+      "loss": 0.1215,
+      "step": 1635
+    },
+    {
+      "epoch": 9.371428571428572,
+      "grad_norm": 0.9042088985443115,
+      "learning_rate": 0.0001595142857142857,
+      "loss": 0.1191,
+      "step": 1640
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 0.567828893661499,
+      "learning_rate": 0.00015908571428571426,
+      "loss": 0.1189,
+      "step": 1645
+    },
+    {
+      "epoch": 9.428571428571429,
+      "grad_norm": 0.981585681438446,
+      "learning_rate": 0.00015865714285714282,
+      "loss": 0.128,
+      "step": 1650
+    },
+    {
+      "epoch": 9.457142857142857,
+      "grad_norm": 1.24985933303833,
+      "learning_rate": 0.00015822857142857143,
+      "loss": 0.1315,
+      "step": 1655
+    },
+    {
+      "epoch": 9.485714285714286,
+      "grad_norm": 0.6517993211746216,
+      "learning_rate": 0.0001578,
+      "loss": 0.1076,
+      "step": 1660
+    },
+    {
+      "epoch": 9.514285714285714,
+      "grad_norm": 1.166628122329712,
+      "learning_rate": 0.00015737142857142857,
+      "loss": 0.1345,
+      "step": 1665
+    },
+    {
+      "epoch": 9.542857142857143,
+      "grad_norm": 0.9763592481613159,
+      "learning_rate": 0.00015694285714285713,
+      "loss": 0.1449,
+      "step": 1670
+    },
+    {
+      "epoch": 9.571428571428571,
+      "grad_norm": 0.7829060554504395,
+      "learning_rate": 0.00015651428571428569,
+      "loss": 0.1117,
+      "step": 1675
+    },
+    {
+      "epoch": 9.6,
+      "grad_norm": 0.6693719029426575,
+      "learning_rate": 0.00015608571428571427,
+      "loss": 0.1129,
+      "step": 1680
+    },
+    {
+      "epoch": 9.628571428571428,
+      "grad_norm": 1.2122846841812134,
+      "learning_rate": 0.00015565714285714285,
+      "loss": 0.1125,
+      "step": 1685
+    },
+    {
+      "epoch": 9.657142857142857,
+      "grad_norm": 1.0689371824264526,
+      "learning_rate": 0.0001552285714285714,
+      "loss": 0.1478,
+      "step": 1690
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.8511656522750854,
+      "learning_rate": 0.0001548,
+      "loss": 0.1431,
+      "step": 1695
+    },
+    {
+      "epoch": 9.714285714285714,
+      "grad_norm": 0.6706506609916687,
+      "learning_rate": 0.00015437142857142855,
+      "loss": 0.1262,
+      "step": 1700
+    },
+    {
+      "epoch": 9.742857142857144,
+      "grad_norm": 1.0798784494400024,
+      "learning_rate": 0.00015394285714285714,
+      "loss": 0.1275,
+      "step": 1705
+    },
+    {
+      "epoch": 9.771428571428572,
+      "grad_norm": 0.7915983200073242,
+      "learning_rate": 0.0001535142857142857,
+      "loss": 0.1316,
+      "step": 1710
+    },
+    {
+      "epoch": 9.8,
+      "grad_norm": 1.8630567789077759,
+      "learning_rate": 0.00015308571428571425,
+      "loss": 0.1258,
+      "step": 1715
+    },
+    {
+      "epoch": 9.82857142857143,
+      "grad_norm": 0.7807756662368774,
+      "learning_rate": 0.00015265714285714286,
+      "loss": 0.1079,
+      "step": 1720
+    },
+    {
+      "epoch": 9.857142857142858,
+      "grad_norm": 1.4698439836502075,
+      "learning_rate": 0.00015222857142857142,
+      "loss": 0.1357,
+      "step": 1725
+    },
+    {
+      "epoch": 9.885714285714286,
+      "grad_norm": 1.2121926546096802,
+      "learning_rate": 0.00015179999999999998,
+      "loss": 0.1322,
+      "step": 1730
+    },
+    {
+      "epoch": 9.914285714285715,
+      "grad_norm": 0.6348568201065063,
+      "learning_rate": 0.00015137142857142856,
+      "loss": 0.0893,
+      "step": 1735
+    },
+    {
+      "epoch": 9.942857142857143,
+      "grad_norm": 0.6694422364234924,
+      "learning_rate": 0.00015094285714285712,
+      "loss": 0.1189,
+      "step": 1740
+    },
+    {
+      "epoch": 9.971428571428572,
+      "grad_norm": 0.569332480430603,
+      "learning_rate": 0.00015051428571428567,
+      "loss": 0.1349,
+      "step": 1745
+    },
+    {
+      "epoch": 10.0,
+      "grad_norm": 0.934073269367218,
+      "learning_rate": 0.00015008571428571429,
+      "loss": 0.1237,
+      "step": 1750
+    },
+    {
+      "epoch": 10.028571428571428,
+      "grad_norm": 0.7191672325134277,
+      "learning_rate": 0.00014965714285714284,
+      "loss": 0.1308,
+      "step": 1755
+    },
+    {
+      "epoch": 10.057142857142857,
+      "grad_norm": 0.7006493806838989,
+      "learning_rate": 0.00014922857142857143,
+      "loss": 0.104,
+      "step": 1760
+    },
+    {
+      "epoch": 10.085714285714285,
+      "grad_norm": 0.9030678272247314,
+      "learning_rate": 0.00014879999999999998,
+      "loss": 0.1308,
+      "step": 1765
+    },
+    {
+      "epoch": 10.114285714285714,
+      "grad_norm": 0.7007766366004944,
+      "learning_rate": 0.00014837142857142854,
+      "loss": 0.1044,
+      "step": 1770
+    },
+    {
+      "epoch": 10.142857142857142,
+      "grad_norm": 0.4832770824432373,
+      "learning_rate": 0.00014794285714285713,
+      "loss": 0.1119,
+      "step": 1775
+    },
+    {
+      "epoch": 10.17142857142857,
+      "grad_norm": 0.7819458842277527,
+      "learning_rate": 0.0001475142857142857,
+      "loss": 0.1087,
+      "step": 1780
+    },
+    {
+      "epoch": 10.2,
+      "grad_norm": 1.0223525762557983,
+      "learning_rate": 0.00014708571428571427,
+      "loss": 0.1314,
+      "step": 1785
+    },
+    {
+      "epoch": 10.228571428571428,
+      "grad_norm": 0.6224566698074341,
+      "learning_rate": 0.00014665714285714285,
+      "loss": 0.1159,
+      "step": 1790
+    },
+    {
+      "epoch": 10.257142857142856,
+      "grad_norm": 0.45800235867500305,
+      "learning_rate": 0.0001462285714285714,
+      "loss": 0.0942,
+      "step": 1795
+    },
+    {
+      "epoch": 10.285714285714286,
+      "grad_norm": 0.6258400082588196,
+      "learning_rate": 0.0001458,
+      "loss": 0.1079,
+      "step": 1800
+    },
+    {
+      "epoch": 10.314285714285715,
+      "grad_norm": 1.1812794208526611,
+      "learning_rate": 0.00014537142857142858,
+      "loss": 0.1378,
+      "step": 1805
+    },
+    {
+      "epoch": 10.342857142857143,
+      "grad_norm": 0.8541269898414612,
+      "learning_rate": 0.00014494285714285713,
+      "loss": 0.1274,
+      "step": 1810
+    },
+    {
+      "epoch": 10.371428571428572,
+      "grad_norm": 0.7131860256195068,
+      "learning_rate": 0.0001445142857142857,
+      "loss": 0.1247,
+      "step": 1815
+    },
+    {
+      "epoch": 10.4,
+      "grad_norm": 0.6109820008277893,
+      "learning_rate": 0.00014408571428571428,
+      "loss": 0.1246,
+      "step": 1820
+    },
+    {
+      "epoch": 10.428571428571429,
+      "grad_norm": 0.5621510744094849,
+      "learning_rate": 0.00014365714285714286,
+      "loss": 0.1039,
+      "step": 1825
+    },
+    {
+      "epoch": 10.457142857142857,
+      "grad_norm": 1.022777795791626,
+      "learning_rate": 0.00014322857142857142,
+      "loss": 0.1206,
+      "step": 1830
+    },
+    {
+      "epoch": 10.485714285714286,
+      "grad_norm": 0.9120668768882751,
+      "learning_rate": 0.00014279999999999997,
+      "loss": 0.1289,
+      "step": 1835
+    },
+    {
+      "epoch": 10.514285714285714,
+      "grad_norm": 1.1882030963897705,
+      "learning_rate": 0.00014237142857142856,
+      "loss": 0.1194,
+      "step": 1840
+    },
+    {
+      "epoch": 10.542857142857143,
+      "grad_norm": 0.6078401207923889,
+      "learning_rate": 0.00014194285714285714,
+      "loss": 0.1339,
+      "step": 1845
+    },
+    {
+      "epoch": 10.571428571428571,
+      "grad_norm": 0.7380999326705933,
+      "learning_rate": 0.0001415142857142857,
+      "loss": 0.1318,
+      "step": 1850
+    },
+    {
+      "epoch": 10.6,
+      "grad_norm": 0.5884959101676941,
+      "learning_rate": 0.00014108571428571428,
+      "loss": 0.1249,
+      "step": 1855
+    },
+    {
+      "epoch": 10.628571428571428,
+      "grad_norm": 1.0121936798095703,
+      "learning_rate": 0.00014065714285714284,
+      "loss": 0.1137,
+      "step": 1860
+    },
+    {
+      "epoch": 10.657142857142857,
+      "grad_norm": 0.6444916129112244,
+      "learning_rate": 0.00014022857142857143,
+      "loss": 0.1213,
+      "step": 1865
+    },
+    {
+      "epoch": 10.685714285714285,
+      "grad_norm": 0.7931004762649536,
+      "learning_rate": 0.00013979999999999998,
+      "loss": 0.1318,
+      "step": 1870
+    },
+    {
+      "epoch": 10.714285714285714,
+      "grad_norm": 0.5596404075622559,
+      "learning_rate": 0.00013937142857142857,
+      "loss": 0.1075,
+      "step": 1875
+    },
+    {
+      "epoch": 10.742857142857144,
+      "grad_norm": 0.6586474180221558,
+      "learning_rate": 0.00013894285714285712,
+      "loss": 0.13,
+      "step": 1880
+    },
+    {
+      "epoch": 10.771428571428572,
+      "grad_norm": 1.0195013284683228,
+      "learning_rate": 0.00013851428571428568,
+      "loss": 0.1373,
+      "step": 1885
+    },
+    {
+      "epoch": 10.8,
+      "grad_norm": 0.9233512878417969,
+      "learning_rate": 0.00013808571428571427,
+      "loss": 0.1168,
+      "step": 1890
+    },
+    {
+      "epoch": 10.82857142857143,
+      "grad_norm": 0.7154092788696289,
+      "learning_rate": 0.00013765714285714285,
+      "loss": 0.1081,
+      "step": 1895
+    },
+    {
+      "epoch": 10.857142857142858,
+      "grad_norm": 1.4588117599487305,
+      "learning_rate": 0.0001372285714285714,
+      "loss": 0.1061,
+      "step": 1900
+    },
+    {
+      "epoch": 10.885714285714286,
+      "grad_norm": 0.6087035536766052,
+      "learning_rate": 0.0001368,
+      "loss": 0.1157,
+      "step": 1905
+    },
+    {
+      "epoch": 10.914285714285715,
+      "grad_norm": 0.7371247410774231,
+      "learning_rate": 0.00013637142857142855,
+      "loss": 0.1339,
+      "step": 1910
+    },
+    {
+      "epoch": 10.942857142857143,
+      "grad_norm": 0.8253212571144104,
+      "learning_rate": 0.00013594285714285713,
+      "loss": 0.1198,
+      "step": 1915
+    },
+    {
+      "epoch": 10.971428571428572,
+      "grad_norm": 0.6889544129371643,
+      "learning_rate": 0.00013551428571428572,
+      "loss": 0.1131,
+      "step": 1920
+    },
+    {
+      "epoch": 11.0,
+      "grad_norm": 0.6408224105834961,
+      "learning_rate": 0.00013508571428571427,
+      "loss": 0.122,
+      "step": 1925
+    },
+    {
+      "epoch": 11.028571428571428,
+      "grad_norm": 0.6771185398101807,
+      "learning_rate": 0.00013465714285714283,
+      "loss": 0.1492,
+      "step": 1930
+    },
+    {
+      "epoch": 11.057142857142857,
+      "grad_norm": 0.8706450462341309,
+      "learning_rate": 0.00013422857142857142,
+      "loss": 0.1294,
+      "step": 1935
+    },
+    {
+      "epoch": 11.085714285714285,
+      "grad_norm": 1.730648398399353,
+      "learning_rate": 0.0001338,
+      "loss": 0.1004,
+      "step": 1940
+    },
+    {
+      "epoch": 11.114285714285714,
+      "grad_norm": 0.6985113620758057,
+      "learning_rate": 0.00013337142857142856,
+      "loss": 0.0995,
+      "step": 1945
+    },
+    {
+      "epoch": 11.142857142857142,
+      "grad_norm": 0.8901951313018799,
+      "learning_rate": 0.00013294285714285711,
+      "loss": 0.1179,
+      "step": 1950
+    },
+    {
+      "epoch": 11.17142857142857,
+      "grad_norm": 0.7232164144515991,
+      "learning_rate": 0.0001325142857142857,
+      "loss": 0.1397,
+      "step": 1955
+    },
+    {
+      "epoch": 11.2,
+      "grad_norm": 0.6447544693946838,
+      "learning_rate": 0.00013208571428571428,
+      "loss": 0.1366,
+      "step": 1960
+    },
+    {
+      "epoch": 11.228571428571428,
+      "grad_norm": 0.7964944243431091,
+      "learning_rate": 0.00013165714285714284,
+      "loss": 0.1121,
+      "step": 1965
+    },
+    {
+      "epoch": 11.257142857142856,
+      "grad_norm": 0.9012628793716431,
+      "learning_rate": 0.00013122857142857142,
+      "loss": 0.1131,
+      "step": 1970
+    },
+    {
+      "epoch": 11.285714285714286,
+      "grad_norm": 0.9295369982719421,
+      "learning_rate": 0.00013079999999999998,
+      "loss": 0.1232,
+      "step": 1975
+    },
+    {
+      "epoch": 11.314285714285715,
+      "grad_norm": 0.6237708926200867,
+      "learning_rate": 0.00013037142857142857,
+      "loss": 0.1066,
+      "step": 1980
+    },
+    {
+      "epoch": 11.342857142857143,
+      "grad_norm": 0.5250967741012573,
+      "learning_rate": 0.00012994285714285715,
+      "loss": 0.118,
+      "step": 1985
+    },
+    {
+      "epoch": 11.371428571428572,
+      "grad_norm": 1.0013964176177979,
+      "learning_rate": 0.0001295142857142857,
+      "loss": 0.1125,
+      "step": 1990
+    },
+    {
+      "epoch": 11.4,
+      "grad_norm": 0.6721311807632446,
+      "learning_rate": 0.00012908571428571426,
+      "loss": 0.1196,
+      "step": 1995
+    },
+    {
+      "epoch": 11.428571428571429,
+      "grad_norm": 0.6966421008110046,
+      "learning_rate": 0.00012865714285714285,
+      "loss": 0.1172,
+      "step": 2000
+    },
+    {
+      "epoch": 11.457142857142857,
+      "grad_norm": 0.8811460733413696,
+      "learning_rate": 0.00012822857142857143,
+      "loss": 0.135,
+      "step": 2005
+    },
+    {
+      "epoch": 11.485714285714286,
+      "grad_norm": 0.8829531073570251,
+      "learning_rate": 0.0001278,
+      "loss": 0.1288,
+      "step": 2010
+    },
+    {
+      "epoch": 11.514285714285714,
+      "grad_norm": 0.7530654668807983,
+      "learning_rate": 0.00012737142857142855,
+      "loss": 0.1073,
+      "step": 2015
+    },
+    {
+      "epoch": 11.542857142857143,
+      "grad_norm": 0.513940691947937,
+      "learning_rate": 0.00012694285714285713,
+      "loss": 0.121,
+      "step": 2020
+    },
+    {
+      "epoch": 11.571428571428571,
+      "grad_norm": 0.8574968576431274,
+      "learning_rate": 0.0001265142857142857,
+      "loss": 0.1103,
+      "step": 2025
+    },
+    {
+      "epoch": 11.6,
+      "grad_norm": 0.7482439875602722,
+      "learning_rate": 0.00012608571428571427,
+      "loss": 0.1027,
+      "step": 2030
+    },
+    {
+      "epoch": 11.628571428571428,
+      "grad_norm": 0.8367976546287537,
+      "learning_rate": 0.00012565714285714286,
+      "loss": 0.1181,
+      "step": 2035
+    },
+    {
+      "epoch": 11.657142857142857,
+      "grad_norm": 2.048128366470337,
+      "learning_rate": 0.0001252285714285714,
+      "loss": 0.1122,
+      "step": 2040
+    },
+    {
+      "epoch": 11.685714285714285,
+      "grad_norm": 0.7426862716674805,
+      "learning_rate": 0.00012479999999999997,
+      "loss": 0.1169,
+      "step": 2045
+    },
+    {
+      "epoch": 11.714285714285714,
+      "grad_norm": 3.093841791152954,
+      "learning_rate": 0.00012437142857142855,
+      "loss": 0.1164,
+      "step": 2050
+    },
+    {
+      "epoch": 11.742857142857144,
+      "grad_norm": 0.8172643184661865,
+      "learning_rate": 0.00012394285714285714,
+      "loss": 0.1354,
+      "step": 2055
+    },
+    {
+      "epoch": 11.771428571428572,
+      "grad_norm": 1.9950591325759888,
+      "learning_rate": 0.0001235142857142857,
+      "loss": 0.1037,
+      "step": 2060
+    },
+    {
+      "epoch": 11.8,
+      "grad_norm": 0.5929077863693237,
+      "learning_rate": 0.00012308571428571428,
+      "loss": 0.1194,
+      "step": 2065
+    },
+    {
+      "epoch": 11.82857142857143,
+      "grad_norm": 1.293624997138977,
+      "learning_rate": 0.00012265714285714284,
+      "loss": 0.12,
+      "step": 2070
+    },
+    {
+      "epoch": 11.857142857142858,
+      "grad_norm": 1.0515168905258179,
+      "learning_rate": 0.00012222857142857142,
+      "loss": 0.1049,
+      "step": 2075
+    },
+    {
+      "epoch": 11.885714285714286,
+      "grad_norm": 1.2874428033828735,
+      "learning_rate": 0.00012179999999999999,
+      "loss": 0.115,
+      "step": 2080
+    },
+    {
+      "epoch": 11.914285714285715,
+      "grad_norm": 0.7317278385162354,
+      "learning_rate": 0.00012137142857142856,
+      "loss": 0.1184,
+      "step": 2085
+    },
+    {
+      "epoch": 11.942857142857143,
+      "grad_norm": 1.3407148122787476,
+      "learning_rate": 0.00012094285714285713,
+      "loss": 0.132,
+      "step": 2090
+    },
+    {
+      "epoch": 11.971428571428572,
+      "grad_norm": 2.656409502029419,
+      "learning_rate": 0.00012051428571428569,
+      "loss": 0.1359,
+      "step": 2095
+    },
+    {
+      "epoch": 12.0,
+      "grad_norm": 0.7189064025878906,
+      "learning_rate": 0.00012008571428571428,
+      "loss": 0.1217,
+      "step": 2100
+    },
+    {
+      "epoch": 12.028571428571428,
+      "grad_norm": 0.7510334849357605,
+      "learning_rate": 0.00011965714285714285,
+      "loss": 0.109,
+      "step": 2105
+    },
+    {
+      "epoch": 12.057142857142857,
+      "grad_norm": 0.7235113382339478,
+      "learning_rate": 0.00011922857142857142,
+      "loss": 0.1114,
+      "step": 2110
+    },
+    {
+      "epoch": 12.085714285714285,
+      "grad_norm": 1.7435882091522217,
+      "learning_rate": 0.0001188,
+      "loss": 0.1357,
+      "step": 2115
+    },
+    {
+      "epoch": 12.114285714285714,
+      "grad_norm": 1.170392632484436,
+      "learning_rate": 0.00011837142857142856,
+      "loss": 0.1255,
+      "step": 2120
+    },
+    {
+      "epoch": 12.142857142857142,
+      "grad_norm": 0.6476783752441406,
+      "learning_rate": 0.00011794285714285713,
+      "loss": 0.1108,
+      "step": 2125
+    },
+    {
+      "epoch": 12.17142857142857,
+      "grad_norm": 0.8599929213523865,
+      "learning_rate": 0.00011751428571428571,
+      "loss": 0.0997,
+      "step": 2130
+    },
+    {
+      "epoch": 12.2,
+      "grad_norm": 0.8918687105178833,
+      "learning_rate": 0.00011708571428571428,
+      "loss": 0.1149,
+      "step": 2135
+    },
+    {
+      "epoch": 12.228571428571428,
+      "grad_norm": 1.609435796737671,
+      "learning_rate": 0.00011665714285714284,
+      "loss": 0.1136,
+      "step": 2140
+    },
+    {
+      "epoch": 12.257142857142856,
+      "grad_norm": 0.6206801533699036,
+      "learning_rate": 0.00011622857142857143,
+      "loss": 0.1135,
+      "step": 2145
+    },
+    {
+      "epoch": 12.285714285714286,
+      "grad_norm": 0.8769077658653259,
+      "learning_rate": 0.0001158,
+      "loss": 0.1344,
+      "step": 2150
+    },
+    {
+      "epoch": 12.314285714285715,
+      "grad_norm": 0.6279401183128357,
+      "learning_rate": 0.00011537142857142855,
+      "loss": 0.1049,
+      "step": 2155
+    },
+    {
+      "epoch": 12.342857142857143,
+      "grad_norm": 1.1110137701034546,
+      "learning_rate": 0.00011494285714285712,
+      "loss": 0.1146,
+      "step": 2160
+    },
+    {
+      "epoch": 12.371428571428572,
+      "grad_norm": 0.7911233901977539,
+      "learning_rate": 0.00011451428571428571,
+      "loss": 0.1257,
+      "step": 2165
+    },
+    {
+      "epoch": 12.4,
+      "grad_norm": 0.9691207408905029,
+      "learning_rate": 0.00011408571428571428,
+      "loss": 0.1226,
+      "step": 2170
+    },
+    {
+      "epoch": 12.428571428571429,
+      "grad_norm": 0.6168835759162903,
+      "learning_rate": 0.00011365714285714284,
+      "loss": 0.1271,
+      "step": 2175
+    },
+    {
+      "epoch": 12.457142857142857,
+      "grad_norm": 0.6143497228622437,
+      "learning_rate": 0.00011322857142857142,
+      "loss": 0.111,
+      "step": 2180
+    },
+    {
+      "epoch": 12.485714285714286,
+      "grad_norm": 1.5673450231552124,
+      "learning_rate": 0.00011279999999999999,
+      "loss": 0.1186,
+      "step": 2185
+    },
+    {
+      "epoch": 12.514285714285714,
+      "grad_norm": 1.298756718635559,
+      "learning_rate": 0.00011237142857142856,
+      "loss": 0.1024,
+      "step": 2190
+    },
+    {
+      "epoch": 12.542857142857143,
+      "grad_norm": 0.9484918117523193,
+      "learning_rate": 0.00011194285714285715,
+      "loss": 0.1171,
+      "step": 2195
+    },
+    {
+      "epoch": 12.571428571428571,
+      "grad_norm": 0.725705623626709,
+      "learning_rate": 0.0001115142857142857,
+      "loss": 0.1216,
+      "step": 2200
+    },
+    {
+      "epoch": 12.6,
+      "grad_norm": 1.1394798755645752,
+      "learning_rate": 0.00011108571428571427,
+      "loss": 0.1132,
+      "step": 2205
+    },
+    {
+      "epoch": 12.628571428571428,
+      "grad_norm": 0.9548712968826294,
+      "learning_rate": 0.00011065714285714286,
+      "loss": 0.1209,
+      "step": 2210
+    },
+    {
+      "epoch": 12.657142857142857,
+      "grad_norm": 0.6173953413963318,
+      "learning_rate": 0.00011022857142857143,
+      "loss": 0.1049,
+      "step": 2215
+    },
+    {
+      "epoch": 12.685714285714285,
+      "grad_norm": 0.8227205872535706,
+      "learning_rate": 0.00010979999999999999,
+      "loss": 0.1045,
+      "step": 2220
+    },
+    {
+      "epoch": 12.714285714285714,
+      "grad_norm": 0.7252780795097351,
+      "learning_rate": 0.00010937142857142856,
+      "loss": 0.1146,
+      "step": 2225
+    },
+    {
+      "epoch": 12.742857142857144,
+      "grad_norm": 0.9374399781227112,
+      "learning_rate": 0.00010894285714285714,
+      "loss": 0.1478,
+      "step": 2230
+    },
+    {
+      "epoch": 12.771428571428572,
+      "grad_norm": 5.1985368728637695,
+      "learning_rate": 0.0001085142857142857,
+      "loss": 0.1059,
+      "step": 2235
+    },
+    {
+      "epoch": 12.8,
+      "grad_norm": 0.9629620909690857,
+      "learning_rate": 0.00010808571428571427,
+      "loss": 0.124,
+      "step": 2240
+    },
+    {
+      "epoch": 12.82857142857143,
+      "grad_norm": 0.7022290229797363,
+      "learning_rate": 0.00010765714285714285,
+      "loss": 0.1309,
+      "step": 2245
+    },
+    {
+      "epoch": 12.857142857142858,
+      "grad_norm": 0.574188232421875,
+      "learning_rate": 0.00010722857142857142,
+      "loss": 0.086,
+      "step": 2250
+    },
+    {
+      "epoch": 12.885714285714286,
+      "grad_norm": 0.9712439179420471,
+      "learning_rate": 0.00010679999999999998,
+      "loss": 0.1152,
+      "step": 2255
+    },
+    {
+      "epoch": 12.914285714285715,
+      "grad_norm": 0.6562150120735168,
+      "learning_rate": 0.00010637142857142856,
+      "loss": 0.1343,
+      "step": 2260
+    },
+    {
+      "epoch": 12.942857142857143,
+      "grad_norm": 0.6936819553375244,
+      "learning_rate": 0.00010594285714285714,
+      "loss": 0.1009,
+      "step": 2265
+    },
+    {
+      "epoch": 12.971428571428572,
+      "grad_norm": 0.8664882779121399,
+      "learning_rate": 0.0001055142857142857,
+      "loss": 0.1164,
+      "step": 2270
+    },
+    {
+      "epoch": 13.0,
+      "grad_norm": 0.9224509000778198,
+      "learning_rate": 0.00010508571428571429,
+      "loss": 0.1347,
+      "step": 2275
+    },
+    {
+      "epoch": 13.028571428571428,
+      "grad_norm": 0.6596968770027161,
+      "learning_rate": 0.00010465714285714285,
+      "loss": 0.1041,
+      "step": 2280
+    },
+    {
+      "epoch": 13.057142857142857,
+      "grad_norm": 0.6456631422042847,
+      "learning_rate": 0.00010422857142857142,
+      "loss": 0.1142,
+      "step": 2285
+    },
+    {
+      "epoch": 13.085714285714285,
+      "grad_norm": 0.9466612339019775,
+      "learning_rate": 0.00010379999999999999,
+      "loss": 0.1191,
+      "step": 2290
+    },
+    {
+      "epoch": 13.114285714285714,
+      "grad_norm": 0.9036727547645569,
+      "learning_rate": 0.00010337142857142856,
+      "loss": 0.121,
+      "step": 2295
+    },
+    {
+      "epoch": 13.142857142857142,
+      "grad_norm": 1.08086359500885,
+      "learning_rate": 0.00010294285714285713,
+      "loss": 0.1313,
+      "step": 2300
+    },
+    {
+      "epoch": 13.17142857142857,
+      "grad_norm": 0.703241765499115,
+      "learning_rate": 0.0001025142857142857,
+      "loss": 0.1151,
+      "step": 2305
+    },
+    {
+      "epoch": 13.2,
+      "grad_norm": 0.7901896238327026,
+      "learning_rate": 0.00010208571428571429,
+      "loss": 0.1275,
+      "step": 2310
+    },
+    {
+      "epoch": 13.228571428571428,
+      "grad_norm": 0.703542947769165,
+      "learning_rate": 0.00010165714285714284,
+      "loss": 0.1,
+      "step": 2315
+    },
+    {
+      "epoch": 13.257142857142856,
+      "grad_norm": 0.6657671928405762,
+      "learning_rate": 0.00010122857142857141,
+      "loss": 0.1141,
+      "step": 2320
+    },
+    {
+      "epoch": 13.285714285714286,
+      "grad_norm": 0.7593729496002197,
+      "learning_rate": 0.0001008,
+      "loss": 0.1099,
+      "step": 2325
+    },
+    {
+      "epoch": 13.314285714285715,
+      "grad_norm": 0.6681057810783386,
+      "learning_rate": 0.00010037142857142857,
+      "loss": 0.112,
+      "step": 2330
+    },
+    {
+      "epoch": 13.342857142857143,
+      "grad_norm": 0.7155857682228088,
+      "learning_rate": 9.994285714285712e-05,
+      "loss": 0.0989,
+      "step": 2335
+    },
+    {
+      "epoch": 13.371428571428572,
+      "grad_norm": 0.9484553337097168,
+      "learning_rate": 9.951428571428571e-05,
+      "loss": 0.0902,
+      "step": 2340
+    },
+    {
+      "epoch": 13.4,
+      "grad_norm": 0.9317265152931213,
+      "learning_rate": 9.908571428571428e-05,
+      "loss": 0.1432,
+      "step": 2345
+    },
+    {
+      "epoch": 13.428571428571429,
+      "grad_norm": 1.039158821105957,
+      "learning_rate": 9.865714285714285e-05,
+      "loss": 0.114,
+      "step": 2350
+    },
+    {
+      "epoch": 13.457142857142857,
+      "grad_norm": 0.8524000644683838,
+      "learning_rate": 9.822857142857141e-05,
+      "loss": 0.1144,
+      "step": 2355
+    },
+    {
+      "epoch": 13.485714285714286,
+      "grad_norm": 0.6337461471557617,
+      "learning_rate": 9.779999999999999e-05,
+      "loss": 0.1073,
+      "step": 2360
+    },
+    {
+      "epoch": 13.514285714285714,
+      "grad_norm": 0.9097298383712769,
+      "learning_rate": 9.737142857142856e-05,
+      "loss": 0.1031,
+      "step": 2365
+    },
+    {
+      "epoch": 13.542857142857143,
+      "grad_norm": 1.2013412714004517,
+      "learning_rate": 9.694285714285713e-05,
+      "loss": 0.1174,
+      "step": 2370
+    },
+    {
+      "epoch": 13.571428571428571,
+      "grad_norm": 0.7055214643478394,
+      "learning_rate": 9.65142857142857e-05,
+      "loss": 0.1175,
+      "step": 2375
+    },
+    {
+      "epoch": 13.6,
+      "grad_norm": 0.807955265045166,
+      "learning_rate": 9.608571428571427e-05,
+      "loss": 0.1286,
+      "step": 2380
+    },
+    {
+      "epoch": 13.628571428571428,
+      "grad_norm": 0.6661797761917114,
+      "learning_rate": 9.565714285714285e-05,
+      "loss": 0.1091,
+      "step": 2385
+    },
+    {
+      "epoch": 13.657142857142857,
+      "grad_norm": 1.119604468345642,
+      "learning_rate": 9.522857142857143e-05,
+      "loss": 0.1393,
+      "step": 2390
+    },
+    {
+      "epoch": 13.685714285714285,
+      "grad_norm": 0.5365435481071472,
+      "learning_rate": 9.479999999999999e-05,
+      "loss": 0.1075,
+      "step": 2395
+    },
+    {
+      "epoch": 13.714285714285714,
+      "grad_norm": 0.9443924427032471,
+      "learning_rate": 9.437142857142856e-05,
+      "loss": 0.0977,
+      "step": 2400
+    },
+    {
+      "epoch": 13.742857142857144,
+      "grad_norm": 0.6075264811515808,
+      "learning_rate": 9.394285714285714e-05,
+      "loss": 0.1329,
+      "step": 2405
+    },
+    {
+      "epoch": 13.771428571428572,
+      "grad_norm": 1.019352912902832,
+      "learning_rate": 9.351428571428571e-05,
+      "loss": 0.1083,
+      "step": 2410
+    },
+    {
+      "epoch": 13.8,
+      "grad_norm": 0.7234058380126953,
+      "learning_rate": 9.308571428571427e-05,
+      "loss": 0.1118,
+      "step": 2415
+    },
+    {
+      "epoch": 13.82857142857143,
+      "grad_norm": 0.6786122918128967,
+      "learning_rate": 9.265714285714284e-05,
+      "loss": 0.1208,
+      "step": 2420
+    },
+    {
+      "epoch": 13.857142857142858,
+      "grad_norm": 0.5820732116699219,
+      "learning_rate": 9.222857142857142e-05,
+      "loss": 0.1022,
+      "step": 2425
+    },
+    {
+      "epoch": 13.885714285714286,
+      "grad_norm": 0.8007987141609192,
+      "learning_rate": 9.18e-05,
+      "loss": 0.1293,
+      "step": 2430
+    },
+    {
+      "epoch": 13.914285714285715,
+      "grad_norm": 0.6813766956329346,
+      "learning_rate": 9.137142857142855e-05,
+      "loss": 0.1284,
+      "step": 2435
+    },
+    {
+      "epoch": 13.942857142857143,
+      "grad_norm": 0.6460041403770447,
+      "learning_rate": 9.094285714285714e-05,
+      "loss": 0.1073,
+      "step": 2440
+    },
+    {
+      "epoch": 13.971428571428572,
+      "grad_norm": 0.5939205288887024,
+      "learning_rate": 9.051428571428571e-05,
+      "loss": 0.1185,
+      "step": 2445
+    },
+    {
+      "epoch": 14.0,
+      "grad_norm": 0.8150635361671448,
+      "learning_rate": 9.008571428571428e-05,
+      "loss": 0.1039,
+      "step": 2450
+    },
+    {
+      "epoch": 14.028571428571428,
+      "grad_norm": 1.3691389560699463,
+      "learning_rate": 8.965714285714285e-05,
+      "loss": 0.1112,
+      "step": 2455
+    },
+    {
+      "epoch": 14.057142857142857,
+      "grad_norm": 0.9042718410491943,
+      "learning_rate": 8.922857142857142e-05,
+      "loss": 0.112,
+      "step": 2460
+    },
+    {
+      "epoch": 14.085714285714285,
+      "grad_norm": 0.7222105860710144,
+      "learning_rate": 8.879999999999999e-05,
+      "loss": 0.1221,
+      "step": 2465
+    },
+    {
+      "epoch": 14.114285714285714,
+      "grad_norm": 0.595588207244873,
+      "learning_rate": 8.837142857142857e-05,
+      "loss": 0.1058,
+      "step": 2470
+    },
+    {
+      "epoch": 14.142857142857142,
+      "grad_norm": 0.5262706279754639,
+      "learning_rate": 8.794285714285713e-05,
+      "loss": 0.1071,
+      "step": 2475
+    },
+    {
+      "epoch": 14.17142857142857,
+      "grad_norm": 0.6511022448539734,
+      "learning_rate": 8.75142857142857e-05,
+      "loss": 0.0917,
+      "step": 2480
+    },
+    {
+      "epoch": 14.2,
+      "grad_norm": 0.5737650394439697,
+      "learning_rate": 8.708571428571427e-05,
+      "loss": 0.0988,
+      "step": 2485
+    },
+    {
+      "epoch": 14.228571428571428,
+      "grad_norm": 0.7679132223129272,
+      "learning_rate": 8.665714285714286e-05,
+      "loss": 0.1185,
+      "step": 2490
+    },
+    {
+      "epoch": 14.257142857142856,
+      "grad_norm": 0.641198456287384,
+      "learning_rate": 8.622857142857141e-05,
+      "loss": 0.0894,
+      "step": 2495
+    },
+    {
+      "epoch": 14.285714285714286,
+      "grad_norm": 0.7215464115142822,
+      "learning_rate": 8.579999999999998e-05,
+      "loss": 0.0935,
+      "step": 2500
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 20,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 200,
+  "trial_name": null,
+  "trial_params": null
+}
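
Reviewer note: the logged values above can be sanity-checked with a short script. The sketch below is not part of the commit. It assumes the entries live under the `log_history` key used by the standard Transformers `trainer_state.json` layout, and it embeds three entries copied from the log (steps 2230 to 2240). The helper names `grad_norm_spikes` and `expected_lr`, the spike factor of 3, and the peak learning rate of 3e-4 are illustrative assumptions. The peak value is only inferred from the logged numbers, which are consistent with a linear decay to zero over `max_steps` with the logged value lagging one step.

```python
import json
from statistics import median

# Three entries copied from the log above; the "log_history" key is assumed
# from the standard Trainer state layout.
SAMPLE = json.loads("""
{
  "log_history": [
    {"epoch": 12.742857142857144, "grad_norm": 0.9374399781227112,
     "learning_rate": 0.00010894285714285714, "loss": 0.1478, "step": 2230},
    {"epoch": 12.771428571428572, "grad_norm": 5.1985368728637695,
     "learning_rate": 0.0001085142857142857, "loss": 0.1059, "step": 2235},
    {"epoch": 12.8, "grad_norm": 0.9629620909690857,
     "learning_rate": 0.00010808571428571427, "loss": 0.124, "step": 2240}
  ],
  "max_steps": 3500
}
""")


def grad_norm_spikes(log_history, factor=3.0):
    """Return the steps whose grad_norm exceeds factor * median grad_norm."""
    norms = [e["grad_norm"] for e in log_history if "grad_norm" in e]
    if not norms:
        return []
    threshold = factor * median(norms)
    return [e["step"] for e in log_history if e.get("grad_norm", 0.0) > threshold]


def expected_lr(step, peak_lr, max_steps):
    """Linear decay to zero; the logged value lags the step counter by one.

    peak_lr is an inferred assumption, not a value read from the commit.
    """
    return peak_lr * (max_steps - (step - 1)) / max_steps


spikes = grad_norm_spikes(SAMPLE["log_history"])
print("grad_norm spikes at steps:", spikes)
print("expected lr at step 2235:", expected_lr(2235, 3e-4, SAMPLE["max_steps"]))
```

On the three sampled entries this flags step 2235, where `grad_norm` jumps to about 5.2 while the loss stays near its neighbours. Running it over the full `log_history` would need a factor tuned to the run.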

glot-contrastive-final-lora/checkpoint-2500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a87dc6b2c67ad3df98065b9e8fa21d9d93cd2cb361c532cb83c8a37bdc81a3
+size 5777

glot-contrastive-final-lora/checkpoint-3000/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: ./glot-mlm-adapted
+library_name: peft
+tags:
+- base_model:adapter:./glot-mlm-adapted
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

glot-contrastive-final-lora/checkpoint-3000/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./glot-mlm-adapted",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
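
Reviewer note: the LoRA settings above imply an effective scaling factor and a rough adapter size, which the sketch below works out. It is not part of the commit. The fields `r`, `lora_alpha`, `use_rslora`, and `target_modules` are copied from the config. The helper names, `hidden_size=768`, and `num_layers=12` are assumptions, since the diff does not describe the architecture of `./glot-mlm-adapted`. The count also assumes square query and value projections.

```python
import json

# Fields copied from the adapter_config.json above.
ADAPTER_CONFIG = json.loads("""
{
  "r": 16,
  "lora_alpha": 32,
  "use_rslora": false,
  "target_modules": ["query", "value"]
}
""")


def lora_scaling(cfg):
    """Scaling applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg, hidden_size, num_layers):
    """Trainable LoRA parameters, assuming square hidden_size x hidden_size projections.

    Each targeted module gets A (r x hidden_size) and B (hidden_size x r).
    hidden_size and num_layers are assumptions, not values read from the commit.
    """
    per_module = cfg["r"] * (hidden_size + hidden_size)
    return per_module * len(cfg["target_modules"]) * num_layers


scaling = lora_scaling(ADAPTER_CONFIG)
params = lora_param_count(ADAPTER_CONFIG, hidden_size=768, num_layers=12)
print("scaling:", scaling)
print("LoRA parameters:", params, "approx float32 bytes:", params * 4)
```

Under those assumed dimensions the float32 weights would take 2,359,296 bytes, slightly below the 2,365,824-byte `adapter_model.safetensors` pointer in this checkpoint. That is plausible once file metadata is included, but it does not confirm the architecture.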

glot-contrastive-final-lora/checkpoint-3000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f8cbb012358d427813b69c11a43d2279370f570cd9c119787e1f92c372b0761a
+size 2365824