yuting89830 committed on Dec 27, 2025

Commit 4768cdd (verified)

1 Parent(s): 53279df

Upload 47 files

Files changed (47)

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/README.md +210 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/adapter_config.json +53 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/adapter_model.safetensors +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/optimizer.pt +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/rng_state.pth +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/scheduler.pt +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/special_tokens_map.json +30 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/tokenizer.model +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/tokenizer_config.json +0 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/trainer_state.json +1574 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/training_args.bin +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/README.md +210 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/adapter_config.json +53 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/adapter_model.safetensors +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/optimizer.pt +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/rng_state.pth +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/scheduler.pt +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/special_tokens_map.json +30 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/tokenizer.model +3 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/tokenizer_config.json +0 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/trainer_state.json +2407 -0
ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/training_args.bin +3 -0
outputs_pretrained/README.md +59 -0
outputs_pretrained/checkpoint-500/README.md +210 -0
outputs_pretrained/checkpoint-500/adapter_config.json +53 -0
outputs_pretrained/checkpoint-500/adapter_model.safetensors +3 -0
outputs_pretrained/checkpoint-500/optimizer.pt +3 -0
outputs_pretrained/checkpoint-500/rng_state.pth +3 -0
outputs_pretrained/checkpoint-500/scheduler.pt +3 -0
outputs_pretrained/checkpoint-500/special_tokens_map.json +30 -0
outputs_pretrained/checkpoint-500/tokenizer.json +0 -0
outputs_pretrained/checkpoint-500/tokenizer.model +3 -0
outputs_pretrained/checkpoint-500/tokenizer_config.json +0 -0
outputs_pretrained/checkpoint-500/trainer_state.json +3534 -0
outputs_pretrained/checkpoint-500/training_args.bin +3 -0
outputs_pretrained/checkpoint-846/README.md +210 -0
outputs_pretrained/checkpoint-846/adapter_config.json +53 -0
outputs_pretrained/checkpoint-846/adapter_model.safetensors +3 -0
outputs_pretrained/checkpoint-846/optimizer.pt +3 -0
outputs_pretrained/checkpoint-846/rng_state.pth +3 -0
outputs_pretrained/checkpoint-846/scheduler.pt +3 -0
outputs_pretrained/checkpoint-846/special_tokens_map.json +30 -0
outputs_pretrained/checkpoint-846/tokenizer.json +0 -0
outputs_pretrained/checkpoint-846/tokenizer.model +3 -0
outputs_pretrained/checkpoint-846/tokenizer_config.json +0 -0
outputs_pretrained/checkpoint-846/trainer_state.json +0 -0
outputs_pretrained/checkpoint-846/training_args.bin +3 -0

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/README.md ADDED

	@@ -0,0 +1,210 @@

+---
+base_model: unsloth/mistral-7b-v0.3-bnb-4bit
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/mistral-7b-v0.3-bnb-4bit
+- lora
+- sft
+- transformers
+- trl
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/adapter_config.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "MistralForCausalLM",
+    "parent_library": "transformers.models.mistral.modeling_mistral",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "unsloth/mistral-7b-v0.3-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "k_proj",
+    "gate_proj",
+    "q_proj",
+    "down_proj",
+    "v_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
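
As a reading aid for the adapter configuration above: with `use_rslora` set to false, standard LoRA scales the low-rank update by `lora_alpha / r`, which for the values in this file is 128 / 64 = 2.0. The sketch below restates a hand-copied subset of the fields from the diff as a Python dict and derives that factor. The names `adapter_config` and `lora_scaling` are illustrative and not part of the commit.

```python
import math

# Subset of fields copied by hand from adapter_config.json in this diff.
adapter_config = {
    "r": 64,
    "lora_alpha": 128,
    "use_rslora": False,
    "target_modules": [
        "o_proj", "k_proj", "gate_proj", "q_proj",
        "down_proj", "v_proj", "up_proj",
    ],
    "modules_to_save": ["embed_tokens", "lm_head"],
}


def lora_scaling(config: dict) -> float:
    """Hypothetical helper: factor applied to the low-rank update B @ A."""
    rank = config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return config["lora_alpha"] / math.sqrt(rank)
    return config["lora_alpha"] / rank


print("scaling:", lora_scaling(adapter_config))
print("LoRA-wrapped projections:", sorted(adapter_config["target_modules"]))
print("modules saved in full:", adapter_config["modules_to_save"])
```

The `modules_to_save` entries (`embed_tokens`, `lm_head`) are not low-rank adapted; they are stored as full copies alongside the LoRA matrices.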

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:265ba67ccf2371b1c5d207b14c1ba2b50a374db51f1d1d8f4f914ac26b1ce12d
+size 1208020312

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b7dd3303742dcc0f9023ec8356106bf0ac9f3e4021f0c298ef42c9cbf8a40817
+size 1687687979

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48ea209aa975528d2e9f8e5188d85dd75ec291a0f9e5b66853d2b87d4a8da11a
+size 14709

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d076944119ff57e17ed1e0a8e03123c5f3fbb94b092e55c93038f7a0e96ad0cf
+size 1465

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[control_768]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff
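
A note on reading `trainer_state.json`, the next file in this diff: the `learning_rate` values in its `log_history` are consistent with a linear warmup over 6 steps to a peak of 5e-05, followed by cosine decay over 339 total steps. These three numbers are inferred from the logged values (and from the `checkpoint-339` directory name); they are not read from `training_args.bin`, so treat them as assumptions. The logged peak also differs from the `lr1e-05` in the run directory name. A minimal sketch that reproduces the logged values under those assumptions, where `lr_at_logged_step` is a hypothetical helper:

```python
import math

# Assumed values, inferred from log_history rather than from training_args.bin.
PEAK_LR = 5e-5
WARMUP_STEPS = 6
TOTAL_STEPS = 339


def lr_at_logged_step(step: int) -> float:
    """LR expected in log_history at 1-based `step` (linear warmup, cosine decay)."""
    k = step - 1  # step 1 logs the LR before any scheduler update
    if k < WARMUP_STEPS:
        return PEAK_LR * k / WARMUP_STEPS
    progress = (k - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history for comparison.
logged = {
    1: 0.0,
    2: 8.333333333333334e-06,
    7: 5e-05,
    8: 4.999888745376028e-05,
    118: 3.7500000000000003e-05,
}

for step, lr in logged.items():
    print(step, lr, lr_at_logged_step(step))
```

The same total of 339 steps matches the logged `epoch` value, which advances by 1/339 per step.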

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/trainer_state.json ADDED

	@@ -0,0 +1,1574 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.6489675516224189,
+  "eval_steps": 500,
+  "global_step": 220,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0029498525073746312,
+      "grad_norm": 8.784540176391602,
+      "learning_rate": 0.0,
+      "loss": 2.5352,
+      "step": 1
+    },
+    {
+      "epoch": 0.0058997050147492625,
+      "grad_norm": 9.609512329101562,
+      "learning_rate": 8.333333333333334e-06,
+      "loss": 2.5809,
+      "step": 2
+    },
+    {
+      "epoch": 0.008849557522123894,
+      "grad_norm": 7.955018043518066,
+      "learning_rate": 1.6666666666666667e-05,
+      "loss": 2.3861,
+      "step": 3
+    },
+    {
+      "epoch": 0.011799410029498525,
+      "grad_norm": 5.426074028015137,
+      "learning_rate": 2.5e-05,
+      "loss": 2.2533,
+      "step": 4
+    },
+    {
+      "epoch": 0.014749262536873156,
+      "grad_norm": 4.8562164306640625,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 2.1205,
+      "step": 5
+    },
+    {
+      "epoch": 0.017699115044247787,
+      "grad_norm": 4.488800525665283,
+      "learning_rate": 4.166666666666667e-05,
+      "loss": 2.0452,
+      "step": 6
+    },
+    {
+      "epoch": 0.02064896755162242,
+      "grad_norm": 3.83853816986084,
+      "learning_rate": 5e-05,
+      "loss": 1.9761,
+      "step": 7
+    },
+    {
+      "epoch": 0.02359882005899705,
+      "grad_norm": 3.6622397899627686,
+      "learning_rate": 4.999888745376028e-05,
+      "loss": 1.8768,
+      "step": 8
+    },
+    {
+      "epoch": 0.02654867256637168,
+      "grad_norm": 4.476569652557373,
+      "learning_rate": 4.9995549914061836e-05,
+      "loss": 1.9589,
+      "step": 9
+    },
+    {
+      "epoch": 0.029498525073746312,
+      "grad_norm": 3.685044527053833,
+      "learning_rate": 4.998998767795805e-05,
+      "loss": 1.9054,
+      "step": 10
+    },
+    {
+      "epoch": 0.032448377581120944,
+      "grad_norm": 3.9684455394744873,
+      "learning_rate": 4.99822012405085e-05,
+      "loss": 1.887,
+      "step": 11
+    },
+    {
+      "epoch": 0.035398230088495575,
+      "grad_norm": 4.1171064376831055,
+      "learning_rate": 4.997219129473495e-05,
+      "loss": 1.7673,
+      "step": 12
+    },
+    {
+      "epoch": 0.038348082595870206,
+      "grad_norm": 3.5916616916656494,
+      "learning_rate": 4.995995873155958e-05,
+      "loss": 1.8061,
+      "step": 13
+    },
+    {
+      "epoch": 0.04129793510324484,
+      "grad_norm": 3.527775287628174,
+      "learning_rate": 4.994550463972577e-05,
+      "loss": 1.8472,
+      "step": 14
+    },
+    {
+      "epoch": 0.04424778761061947,
+      "grad_norm": 3.540343761444092,
+      "learning_rate": 4.992883030570116e-05,
+      "loss": 1.7783,
+      "step": 15
+    },
+    {
+      "epoch": 0.0471976401179941,
+      "grad_norm": 3.5722901821136475,
+      "learning_rate": 4.9909937213563165e-05,
+      "loss": 1.7471,
+      "step": 16
+    },
+    {
+      "epoch": 0.05014749262536873,
+      "grad_norm": 3.2874090671539307,
+      "learning_rate": 4.988882704486687e-05,
+      "loss": 1.7344,
+      "step": 17
+    },
+    {
+      "epoch": 0.05309734513274336,
+      "grad_norm": 3.2011971473693848,
+      "learning_rate": 4.9865501678495375e-05,
+      "loss": 1.7944,
+      "step": 18
+    },
+    {
+      "epoch": 0.05604719764011799,
+      "grad_norm": 3.8273420333862305,
+      "learning_rate": 4.9839963190492576e-05,
+      "loss": 1.8012,
+      "step": 19
+    },
+    {
+      "epoch": 0.058997050147492625,
+      "grad_norm": 3.3719303607940674,
+      "learning_rate": 4.9812213853878376e-05,
+      "loss": 1.7189,
+      "step": 20
+    },
+    {
+      "epoch": 0.061946902654867256,
+      "grad_norm": 3.548304557800293,
+      "learning_rate": 4.978225613844639e-05,
+      "loss": 1.7513,
+      "step": 21
+    },
+    {
+      "epoch": 0.06489675516224189,
+      "grad_norm": 3.688946485519409,
+      "learning_rate": 4.975009271054409e-05,
+      "loss": 1.7761,
+      "step": 22
+    },
+    {
+      "epoch": 0.06784660766961652,
+      "grad_norm": 3.9207510948181152,
+      "learning_rate": 4.971572643283557e-05,
+      "loss": 1.7487,
+      "step": 23
+    },
+    {
+      "epoch": 0.07079646017699115,
+      "grad_norm": 3.49283504486084,
+      "learning_rate": 4.9679160364046644e-05,
+      "loss": 1.8834,
+      "step": 24
+    },
+    {
+      "epoch": 0.07374631268436578,
+      "grad_norm": 3.67889404296875,
+      "learning_rate": 4.9640397758692715e-05,
+      "loss": 1.7992,
+      "step": 25
+    },
+    {
+      "epoch": 0.07669616519174041,
+      "grad_norm": 4.384673595428467,
+      "learning_rate": 4.9599442066789035e-05,
+      "loss": 1.8381,
+      "step": 26
+    },
+    {
+      "epoch": 0.07964601769911504,
+      "grad_norm": 3.9461209774017334,
+      "learning_rate": 4.95562969335437e-05,
+      "loss": 1.9321,
+      "step": 27
+    },
+    {
+      "epoch": 0.08259587020648967,
+      "grad_norm": 3.785196542739868,
+      "learning_rate": 4.9510966199033174e-05,
+      "loss": 1.8259,
+      "step": 28
+    },
+    {
+      "epoch": 0.0855457227138643,
+      "grad_norm": 3.6551666259765625,
+      "learning_rate": 4.946345389786049e-05,
+      "loss": 1.794,
+      "step": 29
+    },
+    {
+      "epoch": 0.08849557522123894,
+      "grad_norm": 4.191623210906982,
+      "learning_rate": 4.941376425879624e-05,
+      "loss": 1.8796,
+      "step": 30
+    },
+    {
+      "epoch": 0.09144542772861357,
+      "grad_norm": 4.097504138946533,
+      "learning_rate": 4.936190170440208e-05,
+      "loss": 1.8485,
+      "step": 31
+    },
+    {
+      "epoch": 0.0943952802359882,
+      "grad_norm": 3.767967939376831,
+      "learning_rate": 4.930787085063723e-05,
+      "loss": 1.8537,
+      "step": 32
+    },
+    {
+      "epoch": 0.09734513274336283,
+      "grad_norm": 3.880598783493042,
+      "learning_rate": 4.925167650644752e-05,
+      "loss": 1.8942,
+      "step": 33
+    },
+    {
+      "epoch": 0.10029498525073746,
+      "grad_norm": 4.313149452209473,
+      "learning_rate": 4.9193323673337476e-05,
+      "loss": 1.8464,
+      "step": 34
+    },
+    {
+      "epoch": 0.10324483775811209,
+      "grad_norm": 4.014875888824463,
+      "learning_rate": 4.9132817544925085e-05,
+      "loss": 1.7687,
+      "step": 35
+    },
+    {
+      "epoch": 0.10619469026548672,
+      "grad_norm": 4.243000030517578,
+      "learning_rate": 4.907016350647961e-05,
+      "loss": 1.8674,
+      "step": 36
+    },
+    {
+      "epoch": 0.10914454277286136,
+      "grad_norm": 4.010824203491211,
+      "learning_rate": 4.9005367134442235e-05,
+      "loss": 1.772,
+      "step": 37
+    },
+    {
+      "epoch": 0.11209439528023599,
+      "grad_norm": 3.9447169303894043,
+      "learning_rate": 4.893843419592977e-05,
+      "loss": 1.8545,
+      "step": 38
+    },
+    {
+      "epoch": 0.11504424778761062,
+      "grad_norm": 3.677614450454712,
+      "learning_rate": 4.886937064822134e-05,
+      "loss": 1.7783,
+      "step": 39
+    },
+    {
+      "epoch": 0.11799410029498525,
+      "grad_norm": 3.861032485961914,
+      "learning_rate": 4.8798182638228166e-05,
+      "loss": 1.7723,
+      "step": 40
+    },
+    {
+      "epoch": 0.12094395280235988,
+      "grad_norm": 3.7934088706970215,
+      "learning_rate": 4.872487650194647e-05,
+      "loss": 1.7814,
+      "step": 41
+    },
+    {
+      "epoch": 0.12389380530973451,
+      "grad_norm": 4.10675573348999,
+      "learning_rate": 4.864945876389356e-05,
+      "loss": 1.7406,
+      "step": 42
+    },
+    {
+      "epoch": 0.12684365781710916,
+      "grad_norm": 3.8732664585113525,
+      "learning_rate": 4.857193613652711e-05,
+      "loss": 1.7047,
+      "step": 43
+    },
+    {
+      "epoch": 0.12979351032448377,
+      "grad_norm": 3.985630512237549,
+      "learning_rate": 4.849231551964771e-05,
+      "loss": 1.6778,
+      "step": 44
+    },
+    {
+      "epoch": 0.13274336283185842,
+      "grad_norm": 4.057476043701172,
+      "learning_rate": 4.841060399978481e-05,
+      "loss": 1.6264,
+      "step": 45
+    },
+    {
+      "epoch": 0.13569321533923304,
+      "grad_norm": 4.0363335609436035,
+      "learning_rate": 4.8326808849565936e-05,
+      "loss": 1.5671,
+      "step": 46
+    },
+    {
+      "epoch": 0.13864306784660768,
+      "grad_norm": 4.249370098114014,
+      "learning_rate": 4.824093752706943e-05,
+      "loss": 1.5605,
+      "step": 47
+    },
+    {
+      "epoch": 0.1415929203539823,
+      "grad_norm": 4.4337615966796875,
+      "learning_rate": 4.815299767516065e-05,
+      "loss": 1.4718,
+      "step": 48
+    },
+    {
+      "epoch": 0.14454277286135694,
+      "grad_norm": 4.286309719085693,
+      "learning_rate": 4.806299712081172e-05,
+      "loss": 1.3906,
+      "step": 49
+    },
+    {
+      "epoch": 0.14749262536873156,
+      "grad_norm": 4.912067413330078,
+      "learning_rate": 4.797094387440491e-05,
+      "loss": 1.1653,
+      "step": 50
+    },
+    {
+      "epoch": 0.1504424778761062,
+      "grad_norm": 8.031450271606445,
+      "learning_rate": 4.787684612901965e-05,
+      "loss": 1.888,
+      "step": 51
+    },
+    {
+      "epoch": 0.15339233038348082,
+      "grad_norm": 7.04852819442749,
+      "learning_rate": 4.77807122597034e-05,
+      "loss": 1.8472,
+      "step": 52
+    },
+    {
+      "epoch": 0.15634218289085547,
+      "grad_norm": 5.582455635070801,
+      "learning_rate": 4.768255082272611e-05,
+      "loss": 1.7573,
+      "step": 53
+    },
+    {
+      "epoch": 0.1592920353982301,
+      "grad_norm": 4.440758228302002,
+      "learning_rate": 4.758237055481881e-05,
+      "loss": 1.7541,
+      "step": 54
+    },
+    {
+      "epoch": 0.16224188790560473,
+      "grad_norm": 3.537503242492676,
+      "learning_rate": 4.748018037239592e-05,
+      "loss": 1.6726,
+      "step": 55
+    },
+    {
+      "epoch": 0.16519174041297935,
+      "grad_norm": 3.002551317214966,
+      "learning_rate": 4.7375989370761695e-05,
+      "loss": 1.5917,
+      "step": 56
+    },
+    {
+      "epoch": 0.168141592920354,
+      "grad_norm": 3.5368826389312744,
+      "learning_rate": 4.726980682330071e-05,
+      "loss": 1.6657,
+      "step": 57
+    },
+    {
+      "epoch": 0.1710914454277286,
+      "grad_norm": 3.5577824115753174,
+      "learning_rate": 4.7161642180652464e-05,
+      "loss": 1.6253,
+      "step": 58
+    },
+    {
+      "epoch": 0.17404129793510326,
+      "grad_norm": 3.154500961303711,
+      "learning_rate": 4.7051505069870286e-05,
+      "loss": 1.5817,
+      "step": 59
+    },
+    {
+      "epoch": 0.17699115044247787,
+      "grad_norm": 3.1665284633636475,
+      "learning_rate": 4.693940529356444e-05,
+      "loss": 1.6044,
+      "step": 60
+    },
+    {
+      "epoch": 0.17994100294985252,
+      "grad_norm": 3.07588529586792,
+      "learning_rate": 4.6825352829029705e-05,
+      "loss": 1.49,
+      "step": 61
+    },
+    {
+      "epoch": 0.18289085545722714,
+      "grad_norm": 2.87255597114563,
+      "learning_rate": 4.670935782735732e-05,
+      "loss": 1.5619,
+      "step": 62
+    },
+    {
+      "epoch": 0.18584070796460178,
+      "grad_norm": 2.7868995666503906,
+      "learning_rate": 4.6591430612531515e-05,
+      "loss": 1.5085,
+      "step": 63
+    },
+    {
+      "epoch": 0.1887905604719764,
+      "grad_norm": 2.9793434143066406,
+      "learning_rate": 4.647158168051066e-05,
+      "loss": 1.5677,
+      "step": 64
+    },
+    {
+      "epoch": 0.19174041297935104,
+      "grad_norm": 2.7829337120056152,
+      "learning_rate": 4.6349821698293025e-05,
+      "loss": 1.5359,
+      "step": 65
+    },
+    {
+      "epoch": 0.19469026548672566,
+      "grad_norm": 2.799654483795166,
+      "learning_rate": 4.622616150296745e-05,
+      "loss": 1.5196,
+      "step": 66
+    },
+    {
+      "epoch": 0.1976401179941003,
+      "grad_norm": 2.9134767055511475,
+      "learning_rate": 4.6100612100748765e-05,
+      "loss": 1.6036,
+      "step": 67
+    },
+    {
+      "epoch": 0.20058997050147492,
+      "grad_norm": 3.1140592098236084,
+      "learning_rate": 4.5973184665998186e-05,
+      "loss": 1.6003,
+      "step": 68
+    },
+    {
+      "epoch": 0.20353982300884957,
+      "grad_norm": 3.3117525577545166,
+      "learning_rate": 4.5843890540228794e-05,
+      "loss": 1.6092,
+      "step": 69
+    },
+    {
+      "epoch": 0.20648967551622419,
+      "grad_norm": 2.9576680660247803,
+      "learning_rate": 4.571274123109606e-05,
+      "loss": 1.5685,
+      "step": 70
+    },
+    {
+      "epoch": 0.20943952802359883,
+      "grad_norm": 2.9042575359344482,
+      "learning_rate": 4.557974841137364e-05,
+      "loss": 1.549,
+      "step": 71
+    },
+    {
+      "epoch": 0.21238938053097345,
+      "grad_norm": 2.8921477794647217,
+      "learning_rate": 4.544492391791445e-05,
+      "loss": 1.6279,
+      "step": 72
+    },
+    {
+      "epoch": 0.2153392330383481,
+      "grad_norm": 3.005927562713623,
+      "learning_rate": 4.530827975059715e-05,
+      "loss": 1.5111,
+      "step": 73
+    },
+    {
+      "epoch": 0.2182890855457227,
+      "grad_norm": 3.4114937782287598,
+      "learning_rate": 4.5169828071258116e-05,
+      "loss": 1.6132,
+      "step": 74
+    },
+    {
+      "epoch": 0.22123893805309736,
+      "grad_norm": 3.160595655441284,
+      "learning_rate": 4.502958120260894e-05,
+      "loss": 1.5978,
+      "step": 75
+    },
+    {
+      "epoch": 0.22418879056047197,
+      "grad_norm": 3.215529441833496,
+      "learning_rate": 4.488755162713975e-05,
+      "loss": 1.705,
+      "step": 76
+    },
+    {
+      "epoch": 0.22713864306784662,
+      "grad_norm": 3.4715919494628906,
+      "learning_rate": 4.474375198600815e-05,
+      "loss": 1.6392,
+      "step": 77
+    },
+    {
+      "epoch": 0.23008849557522124,
+      "grad_norm": 3.2136619091033936,
+      "learning_rate": 4.4598195077914145e-05,
+      "loss": 1.6188,
+      "step": 78
+    },
+    {
+      "epoch": 0.23303834808259588,
+      "grad_norm": 3.3104329109191895,
+      "learning_rate": 4.445089385796099e-05,
+      "loss": 1.7059,
+      "step": 79
+    },
+    {
+      "epoch": 0.2359882005899705,
+      "grad_norm": 3.097895622253418,
+      "learning_rate": 4.4301861436502156e-05,
+      "loss": 1.7615,
+      "step": 80
+    },
+    {
+      "epoch": 0.23893805309734514,
+      "grad_norm": 3.3492372035980225,
+      "learning_rate": 4.415111107797445e-05,
+      "loss": 1.6919,
+      "step": 81
+    },
+    {
+      "epoch": 0.24188790560471976,
+      "grad_norm": 4.477549076080322,
+      "learning_rate": 4.3998656199717435e-05,
+      "loss": 1.6855,
+      "step": 82
+    },
+    {
+      "epoch": 0.2448377581120944,
+      "grad_norm": 4.241739273071289,
+      "learning_rate": 4.384451037077924e-05,
+      "loss": 1.7483,
+      "step": 83
+    },
+    {
+      "epoch": 0.24778761061946902,
+      "grad_norm": 3.785494565963745,
+      "learning_rate": 4.368868731070884e-05,
+      "loss": 1.8123,
+      "step": 84
+    },
+    {
+      "epoch": 0.25073746312684364,
+      "grad_norm": 4.163392543792725,
+      "learning_rate": 4.353120088833501e-05,
+      "loss": 1.663,
+      "step": 85
+    },
+    {
+      "epoch": 0.2536873156342183,
+      "grad_norm": 4.269979953765869,
+      "learning_rate": 4.33720651205319e-05,
+      "loss": 1.6724,
+      "step": 86
+    },
+    {
+      "epoch": 0.25663716814159293,
+      "grad_norm": 3.830099582672119,
+      "learning_rate": 4.321129417097153e-05,
+      "loss": 1.7422,
+      "step": 87
+    },
+    {
+      "epoch": 0.25958702064896755,
+      "grad_norm": 3.549696207046509,
+      "learning_rate": 4.3048902348863116e-05,
+      "loss": 1.6266,
+      "step": 88
+    },
+    {
+      "epoch": 0.26253687315634217,
+      "grad_norm": 3.9003500938415527,
+      "learning_rate": 4.288490410767955e-05,
+      "loss": 1.6474,
+      "step": 89
+    },
+    {
+      "epoch": 0.26548672566371684,
+      "grad_norm": 3.736586332321167,
+      "learning_rate": 4.271931404387096e-05,
+      "loss": 1.6389,
+      "step": 90
+    },
+    {
+      "epoch": 0.26843657817109146,
+      "grad_norm": 4.005049228668213,
+      "learning_rate": 4.255214689556557e-05,
+      "loss": 1.6593,
+      "step": 91
+    },
+    {
+      "epoch": 0.2713864306784661,
+      "grad_norm": 3.6020376682281494,
+      "learning_rate": 4.2383417541257954e-05,
+      "loss": 1.5903,
+      "step": 92
+    },
+    {
+      "epoch": 0.2743362831858407,
+      "grad_norm": 3.8671212196350098,
+      "learning_rate": 4.221314099848481e-05,
+      "loss": 1.566,
+      "step": 93
+    },
+    {
+      "epoch": 0.27728613569321536,
+      "grad_norm": 3.5491201877593994,
+      "learning_rate": 4.204133242248832e-05,
+      "loss": 1.6435,
+      "step": 94
+    },
+    {
+      "epoch": 0.28023598820059,
+      "grad_norm": 3.8159406185150146,
+      "learning_rate": 4.186800710486732e-05,
+      "loss": 1.5173,
+      "step": 95
+    },
+    {
+      "epoch": 0.2831858407079646,
+      "grad_norm": 3.911926746368408,
+      "learning_rate": 4.169318047221621e-05,
+      "loss": 1.4565,
+      "step": 96
+    },
+    {
+      "epoch": 0.2861356932153392,
+      "grad_norm": 4.003925323486328,
+      "learning_rate": 4.151686808475204e-05,
+      "loss": 1.5175,
+      "step": 97
+    },
+    {
+      "epoch": 0.2890855457227139,
+      "grad_norm": 4.023141860961914,
+      "learning_rate": 4.1339085634929485e-05,
+      "loss": 1.3598,
+      "step": 98
+    },
+    {
+      "epoch": 0.2920353982300885,
+      "grad_norm": 4.288107872009277,
+      "learning_rate": 4.115984894604423e-05,
+      "loss": 1.2067,
+      "step": 99
+    },
+    {
+      "epoch": 0.2949852507374631,
+      "grad_norm": 4.7076592445373535,
+      "learning_rate": 4.0979173970824626e-05,
+      "loss": 1.0602,
+      "step": 100
+    },
+    {
+      "epoch": 0.29793510324483774,
+      "grad_norm": 6.30832052230835,
+      "learning_rate": 4.0797076790011804e-05,
+      "loss": 1.704,
+      "step": 101
+    },
+    {
+      "epoch": 0.3008849557522124,
+      "grad_norm": 4.847335338592529,
+      "learning_rate": 4.0613573610928476e-05,
+      "loss": 1.5943,
+      "step": 102
+    },
+    {
+      "epoch": 0.30383480825958703,
+      "grad_norm": 4.62391471862793,
+      "learning_rate": 4.0428680766036384e-05,
+      "loss": 1.6134,
+      "step": 103
+    },
+    {
+      "epoch": 0.30678466076696165,
+      "grad_norm": 3.951815128326416,
+      "learning_rate": 4.0242414711482676e-05,
+      "loss": 1.5143,
+      "step": 104
+    },
+    {
+      "epoch": 0.30973451327433627,
+      "grad_norm": 3.5230190753936768,
+      "learning_rate": 4.005479202563524e-05,
+      "loss": 1.557,
+      "step": 105
+    },
+    {
+      "epoch": 0.31268436578171094,
+      "grad_norm": 3.093968391418457,
+      "learning_rate": 3.986582940760717e-05,
+      "loss": 1.5439,
+      "step": 106
+    },
+    {
+      "epoch": 0.31563421828908556,
+      "grad_norm": 2.9362878799438477,
+      "learning_rate": 3.967554367577047e-05,
+      "loss": 1.454,
+      "step": 107
+    },
+    {
+      "epoch": 0.3185840707964602,
+      "grad_norm": 2.7811033725738525,
+      "learning_rate": 3.948395176625918e-05,
+      "loss": 1.4497,
+      "step": 108
+    },
+    {
+      "epoch": 0.3215339233038348,
+      "grad_norm": 2.7802510261535645,
+      "learning_rate": 3.929107073146197e-05,
+      "loss": 1.3589,
+      "step": 109
+    },
+    {
+      "epoch": 0.32448377581120946,
+      "grad_norm": 3.563616991043091,
+      "learning_rate": 3.909691773850445e-05,
+      "loss": 1.4628,
+      "step": 110
+    },
+    {
+      "epoch": 0.3274336283185841,
+      "grad_norm": 3.047844886779785,
+      "learning_rate": 3.890151006772119e-05,
+      "loss": 1.4186,
+      "step": 111
+    },
+    {
+      "epoch": 0.3303834808259587,
+      "grad_norm": 3.004772663116455,
+      "learning_rate": 3.8704865111117746e-05,
+      "loss": 1.375,
+      "step": 112
+    },
+    {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 2.8555989265441895,
+      "learning_rate": 3.850700037082268e-05,
+      "loss": 1.303,
+      "step": 113
+    },
+    {
+      "epoch": 0.336283185840708,
+      "grad_norm": 2.86316180229187,
+      "learning_rate": 3.83079334575298e-05,
+      "loss": 1.3022,
+      "step": 114
+    },
+    {
+      "epoch": 0.3392330383480826,
+      "grad_norm": 3.3198935985565186,
+      "learning_rate": 3.8107682088930794e-05,
+      "loss": 1.3897,
+      "step": 115
+    },
+    {
+      "epoch": 0.3421828908554572,
+      "grad_norm": 3.1176650524139404,
+      "learning_rate": 3.790626408813822e-05,
+      "loss": 1.3162,
+      "step": 116
+    },
+    {
+      "epoch": 0.34513274336283184,
+      "grad_norm": 2.670988082885742,
+      "learning_rate": 3.7703697382099234e-05,
+      "loss": 1.3686,
+      "step": 117
+    },
+    {
+      "epoch": 0.3480825958702065,
+      "grad_norm": 3.0355708599090576,
+      "learning_rate": 3.7500000000000003e-05,
+      "loss": 1.4136,
+      "step": 118
+    },
+    {
+      "epoch": 0.35103244837758113,
+      "grad_norm": 2.5040009021759033,
+      "learning_rate": 3.729519007166105e-05,
+      "loss": 1.4136,
+      "step": 119
+    },
+    {
+      "epoch": 0.35398230088495575,
+      "grad_norm": 2.4674110412597656,
+      "learning_rate": 3.7089285825923615e-05,
+      "loss": 1.4819,
+      "step": 120
+    },
+    {
+      "epoch": 0.35693215339233036,
+      "grad_norm": 3.2417714595794678,
+      "learning_rate": 3.688230558902725e-05,
+      "loss": 1.4959,
+      "step": 121
+    },
+    {
+      "epoch": 0.35988200589970504,
+      "grad_norm": 2.698197364807129,
+      "learning_rate": 3.667426778297871e-05,
+      "loss": 1.493,
+      "step": 122
+    },
+    {
+      "epoch": 0.36283185840707965,
+      "grad_norm": 2.9125115871429443,
+      "learning_rate": 3.646519092391227e-05,
+      "loss": 1.512,
+      "step": 123
+    },
+    {
+      "epoch": 0.36578171091445427,
+      "grad_norm": 2.803999900817871,
+      "learning_rate": 3.6255093620441834e-05,
+      "loss": 1.4455,
+      "step": 124
+    },
+    {
+      "epoch": 0.3687315634218289,
+      "grad_norm": 2.8682003021240234,
+      "learning_rate": 3.604399457200458e-05,
+      "loss": 1.4457,
+      "step": 125
+    },
+    {
+      "epoch": 0.37168141592920356,
+      "grad_norm": 2.987795114517212,
+      "learning_rate": 3.583191256719672e-05,
+      "loss": 1.4813,
+      "step": 126
+    },
+    {
+      "epoch": 0.3746312684365782,
+      "grad_norm": 2.8751752376556396,
+      "learning_rate": 3.56188664821012e-05,
+      "loss": 1.4557,
+      "step": 127
+    },
+    {
+      "epoch": 0.3775811209439528,
+      "grad_norm": 3.237694025039673,
+      "learning_rate": 3.540487527860769e-05,
+      "loss": 1.5434,
+      "step": 128
+    },
+    {
+      "epoch": 0.3805309734513274,
+      "grad_norm": 3.4298689365386963,
+      "learning_rate": 3.518995800272491e-05,
+      "loss": 1.4578,
+      "step": 129
+    },
+    {
+      "epoch": 0.3834808259587021,
+      "grad_norm": 3.414600372314453,
+      "learning_rate": 3.497413378288541e-05,
+      "loss": 1.5302,
+      "step": 130
+    },
+    {
+      "epoch": 0.3864306784660767,
+      "grad_norm": 3.4248905181884766,
+      "learning_rate": 3.475742182824314e-05,
+      "loss": 1.5351,
+      "step": 131
+    },
+    {
+      "epoch": 0.3893805309734513,
+      "grad_norm": 3.316810131072998,
+      "learning_rate": 3.453984142696372e-05,
+      "loss": 1.617,
+      "step": 132
+    },
+    {
+      "epoch": 0.39233038348082594,
+      "grad_norm": 3.3763375282287598,
+      "learning_rate": 3.432141194450772e-05,
+      "loss": 1.5688,
+      "step": 133
+    },
+    {
+      "epoch": 0.3952802359882006,
+      "grad_norm": 3.3588340282440186,
+      "learning_rate": 3.41021528219071e-05,
+      "loss": 1.6525,
+      "step": 134
+    },
+    {
+      "epoch": 0.39823008849557523,
+      "grad_norm": 4.095376968383789,
+      "learning_rate": 3.3882083574034844e-05,
+      "loss": 1.649,
+      "step": 135
+    },
+    {
+      "epoch": 0.40117994100294985,
+      "grad_norm": 3.3478126525878906,
+      "learning_rate": 3.3661223787868094e-05,
+      "loss": 1.6012,
+      "step": 136
+    },
+    {
+      "epoch": 0.40412979351032446,
+      "grad_norm": 3.3165597915649414,
+      "learning_rate": 3.3439593120744816e-05,
+      "loss": 1.6119,
+      "step": 137
+    },
+    {
+      "epoch": 0.40707964601769914,
+      "grad_norm": 3.542797327041626,
+      "learning_rate": 3.321721129861422e-05,
+      "loss": 1.5688,
+      "step": 138
+    },
+    {
+      "epoch": 0.41002949852507375,
+      "grad_norm": 3.593048572540283,
+      "learning_rate": 3.2994098114281134e-05,
+      "loss": 1.5018,
+      "step": 139
+    },
+    {
+      "epoch": 0.41297935103244837,
+      "grad_norm": 4.237592697143555,
+      "learning_rate": 3.277027342564428e-05,
+      "loss": 1.6425,
+      "step": 140
+    },
+    {
+      "epoch": 0.415929203539823,
+      "grad_norm": 3.76655650138855,
+      "learning_rate": 3.2545757153928924e-05,
+      "loss": 1.6121,
+      "step": 141
+    },
+    {
+      "epoch": 0.41887905604719766,
+      "grad_norm": 3.643519163131714,
+      "learning_rate": 3.232056928191376e-05,
+      "loss": 1.5565,
+      "step": 142
+    },
+    {
+      "epoch": 0.4218289085545723,
+      "grad_norm": 3.862382173538208,
+      "learning_rate": 3.209472985215243e-05,
+      "loss": 1.5727,
+      "step": 143
+    },
+    {
+      "epoch": 0.4247787610619469,
+      "grad_norm": 3.5415596961975098,
+      "learning_rate": 3.186825896518958e-05,
+      "loss": 1.5971,
+      "step": 144
+    },
+    {
+      "epoch": 0.4277286135693215,
+      "grad_norm": 3.5340561866760254,
+      "learning_rate": 3.164117677777191e-05,
+      "loss": 1.5406,
+      "step": 145
+    },
+    {
+      "epoch": 0.4306784660766962,
+      "grad_norm": 4.26082181930542,
+      "learning_rate": 3.141350350105413e-05,
+      "loss": 1.4473,
+      "step": 146
+    },
+    {
+      "epoch": 0.4336283185840708,
+      "grad_norm": 3.90486478805542,
+      "learning_rate": 3.118525939880007e-05,
+      "loss": 1.459,
+      "step": 147
+    },
+    {
+      "epoch": 0.4365781710914454,
+      "grad_norm": 4.1377410888671875,
+      "learning_rate": 3.0956464785579124e-05,
+      "loss": 1.4197,
+      "step": 148
+    },
+    {
+      "epoch": 0.43952802359882004,
+      "grad_norm": 4.407893180847168,
+      "learning_rate": 3.072714002495825e-05,
+      "loss": 1.3119,
+      "step": 149
+    },
+    {
+      "epoch": 0.4424778761061947,
+      "grad_norm": 4.6629557609558105,
+      "learning_rate": 3.0497305527689445e-05,
+      "loss": 1.09,
+      "step": 150
+    },
+    {
+      "epoch": 0.44542772861356933,
+      "grad_norm": 4.147300720214844,
+      "learning_rate": 3.0266981749893157e-05,
+      "loss": 1.536,
+      "step": 151
+    },
+    {
+      "epoch": 0.44837758112094395,
+      "grad_norm": 3.9903712272644043,
+      "learning_rate": 3.003618919123763e-05,
+      "loss": 1.4418,
+      "step": 152
+    },
+    {
+      "epoch": 0.45132743362831856,
+      "grad_norm": 4.015631198883057,
+      "learning_rate": 2.9804948393114324e-05,
+      "loss": 1.4583,
+      "step": 153
+    },
+    {
+      "epoch": 0.45427728613569324,
+      "grad_norm": 3.5220329761505127,
+      "learning_rate": 2.9573279936809667e-05,
+      "loss": 1.4275,
+      "step": 154
+    },
+    {
+      "epoch": 0.45722713864306785,
+      "grad_norm": 2.9233076572418213,
+      "learning_rate": 2.9341204441673266e-05,
+      "loss": 1.4147,
+      "step": 155
+    },
+    {
+      "epoch": 0.46017699115044247,
+      "grad_norm": 2.8685462474823,
+      "learning_rate": 2.9108742563282652e-05,
+      "loss": 1.3511,
+      "step": 156
+    },
+    {
+      "epoch": 0.4631268436578171,
+      "grad_norm": 2.6424460411071777,
+      "learning_rate": 2.8875914991604948e-05,
+      "loss": 1.3207,
+      "step": 157
+    },
+    {
+      "epoch": 0.46607669616519176,
+      "grad_norm": 2.790271520614624,
+      "learning_rate": 2.8642742449155284e-05,
+      "loss": 1.3057,
+      "step": 158
+    },
+    {
+      "epoch": 0.4690265486725664,
+      "grad_norm": 2.6307857036590576,
+      "learning_rate": 2.8409245689152503e-05,
+      "loss": 1.3061,
+      "step": 159
+    },
+    {
+      "epoch": 0.471976401179941,
+      "grad_norm": 2.461571455001831,
+      "learning_rate": 2.8175445493671972e-05,
+      "loss": 1.2907,
+      "step": 160
+    },
+    {
+      "epoch": 0.4749262536873156,
+      "grad_norm": 2.634023904800415,
+      "learning_rate": 2.794136267179596e-05,
+      "loss": 1.21,
+      "step": 161
+    },
+    {
+      "epoch": 0.4778761061946903,
+      "grad_norm": 2.609807014465332,
+      "learning_rate": 2.770701805776155e-05,
+      "loss": 1.1946,
+      "step": 162
+    },
+    {
+      "epoch": 0.4808259587020649,
+      "grad_norm": 2.385166645050049,
+      "learning_rate": 2.7472432509106248e-05,
+      "loss": 1.2135,
+      "step": 163
+    },
+    {
+      "epoch": 0.4837758112094395,
+      "grad_norm": 2.5615952014923096,
+      "learning_rate": 2.723762690481167e-05,
+      "loss": 1.263,
+      "step": 164
+    },
+    {
+      "epoch": 0.48672566371681414,
+      "grad_norm": 2.696706533432007,
+      "learning_rate": 2.7002622143445177e-05,
+      "loss": 1.2963,
+      "step": 165
+    },
+    {
+      "epoch": 0.4896755162241888,
+      "grad_norm": 2.426612377166748,
+      "learning_rate": 2.6767439141299865e-05,
+      "loss": 1.2133,
+      "step": 166
+    },
+    {
+      "epoch": 0.49262536873156343,
+      "grad_norm": 2.5794103145599365,
+      "learning_rate": 2.653209883053291e-05,
+      "loss": 1.2705,
+      "step": 167
+    },
+    {
+      "epoch": 0.49557522123893805,
+      "grad_norm": 2.885312795639038,
+      "learning_rate": 2.629662215730253e-05,
+      "loss": 1.3029,
+      "step": 168
+    },
+    {
+      "epoch": 0.49852507374631266,
+      "grad_norm": 2.72586727142334,
+      "learning_rate": 2.606103007990371e-05,
+      "loss": 1.2537,
+      "step": 169
+    },
+    {
+      "epoch": 0.5014749262536873,
+      "grad_norm": 2.772090196609497,
+      "learning_rate": 2.5825343566902837e-05,
+      "loss": 1.2844,
+      "step": 170
+    },
+    {
+      "epoch": 0.504424778761062,
+      "grad_norm": 3.0460994243621826,
+      "learning_rate": 2.5589583595271428e-05,
+      "loss": 1.2897,
+      "step": 171
+    },
+    {
+      "epoch": 0.5073746312684366,
+      "grad_norm": 2.7062950134277344,
+      "learning_rate": 2.5353771148519057e-05,
+      "loss": 1.2549,
+      "step": 172
+    },
+    {
+      "epoch": 0.5103244837758112,
+      "grad_norm": 2.8319649696350098,
+      "learning_rate": 2.511792721482581e-05,
+      "loss": 1.2618,
+      "step": 173
+    },
+    {
+      "epoch": 0.5132743362831859,
+      "grad_norm": 2.708251714706421,
+      "learning_rate": 2.48820727851742e-05,
+      "loss": 1.234,
+      "step": 174
+    },
+    {
+      "epoch": 0.5162241887905604,
+      "grad_norm": 2.71691632270813,
+      "learning_rate": 2.4646228851480956e-05,
+      "loss": 1.3505,
+      "step": 175
+    },
+    {
+      "epoch": 0.5191740412979351,
+      "grad_norm": 2.943074941635132,
+      "learning_rate": 2.441041640472858e-05,
+      "loss": 1.2841,
+      "step": 176
+    },
+    {
+      "epoch": 0.5221238938053098,
+      "grad_norm": 2.907444715499878,
+      "learning_rate": 2.417465643309716e-05,
+      "loss": 1.3211,
+      "step": 177
+    },
+    {
+      "epoch": 0.5250737463126843,
+      "grad_norm": 3.0803866386413574,
+      "learning_rate": 2.39389699200963e-05,
+      "loss": 1.4262,
+      "step": 178
+    },
+    {
+      "epoch": 0.528023598820059,
+      "grad_norm": 3.2019131183624268,
+      "learning_rate": 2.3703377842697475e-05,
+      "loss": 1.4857,
+      "step": 179
+    },
+    {
+      "epoch": 0.5309734513274337,
+      "grad_norm": 2.9756524562835693,
+      "learning_rate": 2.34679011694671e-05,
+      "loss": 1.4927,
+      "step": 180
+    },
+    {
+      "epoch": 0.5339233038348082,
+      "grad_norm": 3.07672381401062,
+      "learning_rate": 2.3232560858700137e-05,
+      "loss": 1.509,
+      "step": 181
+    },
+    {
+      "epoch": 0.5368731563421829,
+      "grad_norm": 3.461970806121826,
+      "learning_rate": 2.2997377856554822e-05,
+      "loss": 1.5949,
+      "step": 182
+    },
+    {
+      "epoch": 0.5398230088495575,
+      "grad_norm": 3.2279229164123535,
+      "learning_rate": 2.276237309518834e-05,
+      "loss": 1.4417,
+      "step": 183
+    },
+    {
+      "epoch": 0.5427728613569321,
+      "grad_norm": 3.782461643218994,
+      "learning_rate": 2.2527567490893758e-05,
+      "loss": 1.5903,
+      "step": 184
+    },
+    {
+      "epoch": 0.5457227138643068,
+      "grad_norm": 3.6933534145355225,
+      "learning_rate": 2.2292981942238454e-05,
+      "loss": 1.5214,
+      "step": 185
+    },
+    {
+      "epoch": 0.5486725663716814,
+      "grad_norm": 3.0459163188934326,
+      "learning_rate": 2.205863732820404e-05,
+      "loss": 1.5098,
+      "step": 186
+    },
+    {
+      "epoch": 0.551622418879056,
+      "grad_norm": 3.1090471744537354,
+      "learning_rate": 2.182455450632803e-05,
+      "loss": 1.5546,
+      "step": 187
+    },
+    {
+      "epoch": 0.5545722713864307,
+      "grad_norm": 3.4665732383728027,
+      "learning_rate": 2.159075431084751e-05,
+      "loss": 1.4933,
+      "step": 188
+    },
+    {
+      "epoch": 0.5575221238938053,
+      "grad_norm": 3.638923406600952,
+      "learning_rate": 2.1357257550844718e-05,
+      "loss": 1.587,
+      "step": 189
+    },
+    {
+      "epoch": 0.56047197640118,
+      "grad_norm": 3.5211169719696045,
+      "learning_rate": 2.1124085008395054e-05,
+      "loss": 1.4932,
+      "step": 190
+    },
+    {
+      "epoch": 0.5634218289085545,
+      "grad_norm": 3.786188840866089,
+      "learning_rate": 2.0891257436717353e-05,
+      "loss": 1.571,
+      "step": 191
+    },
+    {
+      "epoch": 0.5663716814159292,
+      "grad_norm": 3.264098882675171,
+      "learning_rate": 2.0658795558326743e-05,
+      "loss": 1.5047,
+      "step": 192
+    },
+    {
+      "epoch": 0.5693215339233039,
+      "grad_norm": 3.6925175189971924,
+      "learning_rate": 2.0426720063190335e-05,
+      "loss": 1.4851,
+      "step": 193
+    },
+    {
+      "epoch": 0.5722713864306784,
+      "grad_norm": 3.394176483154297,
+      "learning_rate": 2.0195051606885685e-05,
+      "loss": 1.4108,
+      "step": 194
+    },
+    {
+      "epoch": 0.5752212389380531,
+      "grad_norm": 3.930576801300049,
+      "learning_rate": 1.996381080876237e-05,
+      "loss": 1.4126,
+      "step": 195
+    },
+    {
+      "epoch": 0.5781710914454278,
+      "grad_norm": 3.7063424587249756,
+      "learning_rate": 1.973301825010685e-05,
+      "loss": 1.389,
+      "step": 196
+    },
+    {
+      "epoch": 0.5811209439528023,
+      "grad_norm": 3.6309196949005127,
+      "learning_rate": 1.950269447231056e-05,
+      "loss": 1.3373,
+      "step": 197
+    },
+    {
+      "epoch": 0.584070796460177,
+      "grad_norm": 3.9842283725738525,
+      "learning_rate": 1.9272859975041754e-05,
+      "loss": 1.2385,
+      "step": 198
+    },
+    {
+      "epoch": 0.5870206489675516,
+      "grad_norm": 3.9302854537963867,
+      "learning_rate": 1.904353521442088e-05,
+      "loss": 1.22,
+      "step": 199
+    },
+    {
+      "epoch": 0.5899705014749262,
+      "grad_norm": 3.901738166809082,
+      "learning_rate": 1.881474060119994e-05,
+      "loss": 1.0099,
+      "step": 200
+    },
+    {
+      "epoch": 0.5929203539823009,
+      "grad_norm": 4.378158092498779,
+      "learning_rate": 1.8586496498945877e-05,
+      "loss": 1.4772,
+      "step": 201
+    },
+    {
+      "epoch": 0.5958702064896755,
+      "grad_norm": 3.2930281162261963,
+      "learning_rate": 1.8358823222228097e-05,
+      "loss": 1.3278,
+      "step": 202
+    },
+    {
+      "epoch": 0.5988200589970502,
+      "grad_norm": 3.2225897312164307,
+      "learning_rate": 1.8131741034810435e-05,
+      "loss": 1.3519,
+      "step": 203
+    },
+    {
+      "epoch": 0.6017699115044248,
+      "grad_norm": 3.375405788421631,
+      "learning_rate": 1.790527014784758e-05,
+      "loss": 1.2963,
+      "step": 204
+    },
+    {
+      "epoch": 0.6047197640117994,
+      "grad_norm": 3.105314016342163,
+      "learning_rate": 1.7679430718086243e-05,
+      "loss": 1.2951,
+      "step": 205
+    },
+    {
+      "epoch": 0.6076696165191741,
+      "grad_norm": 2.8709449768066406,
+      "learning_rate": 1.7454242846071085e-05,
+      "loss": 1.2919,
+      "step": 206
+    },
+    {
+      "epoch": 0.6106194690265486,
+      "grad_norm": 2.7367875576019287,
+      "learning_rate": 1.722972657435572e-05,
+      "loss": 1.2755,
+      "step": 207
+    },
+    {
+      "epoch": 0.6135693215339233,
+      "grad_norm": 2.6397578716278076,
+      "learning_rate": 1.700590188571887e-05,
+      "loss": 1.156,
+      "step": 208
+    },
+    {
+      "epoch": 0.616519174041298,
+      "grad_norm": 2.5080482959747314,
+      "learning_rate": 1.6782788701385783e-05,
+      "loss": 1.2038,
+      "step": 209
+    },
+    {
+      "epoch": 0.6194690265486725,
+      "grad_norm": 2.567392349243164,
+      "learning_rate": 1.656040687925519e-05,
+      "loss": 1.1947,
+      "step": 210
+    },
+    {
+      "epoch": 0.6224188790560472,
+      "grad_norm": 2.3554275035858154,
+      "learning_rate": 1.633877621213192e-05,
+      "loss": 1.1568,
+      "step": 211
+    },
+    {
+      "epoch": 0.6253687315634219,
+      "grad_norm": 2.487006902694702,
+      "learning_rate": 1.611791642596516e-05,
+      "loss": 1.2156,
+      "step": 212
+    },
+    {
+      "epoch": 0.6283185840707964,
+      "grad_norm": 2.2717669010162354,
+      "learning_rate": 1.58978471780929e-05,
+      "loss": 1.0981,
+      "step": 213
+    },
+    {
+      "epoch": 0.6312684365781711,
+      "grad_norm": 2.5773253440856934,
+      "learning_rate": 1.567858805549229e-05,
+      "loss": 1.1829,
+      "step": 214
+    },
+    {
+      "epoch": 0.6342182890855457,
+      "grad_norm": 2.552574872970581,
+      "learning_rate": 1.5460158573036288e-05,
+      "loss": 1.1801,
+      "step": 215
+    },
+    {
+      "epoch": 0.6371681415929203,
+      "grad_norm": 2.42258620262146,
+      "learning_rate": 1.5242578171756866e-05,
+      "loss": 1.1209,
+      "step": 216
+    },
+    {
+      "epoch": 0.640117994100295,
+      "grad_norm": 2.695371389389038,
+      "learning_rate": 1.5025866217114592e-05,
+      "loss": 1.2909,
+      "step": 217
+    },
+    {
+      "epoch": 0.6430678466076696,
+      "grad_norm": 2.2983741760253906,
+      "learning_rate": 1.4810041997275092e-05,
+      "loss": 1.2118,
+      "step": 218
+    },
+    {
+      "epoch": 0.6460176991150443,
+      "grad_norm": 2.4395010471343994,
+      "learning_rate": 1.4595124721392312e-05,
+      "loss": 1.2001,
+      "step": 219
+    },
+    {
+      "epoch": 0.6489675516224189,
+      "grad_norm": 2.4421238899230957,
+      "learning_rate": 1.4381133517898804e-05,
+      "loss": 1.2167,
+      "step": 220
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 339,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.096860732342272e+17,
+  "train_batch_size": 40,
+  "trial_name": null,
+  "trial_params": null
+}
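
The `learning_rate` values logged above look consistent with a linear warmup followed by a cosine decay. The sketch below is not taken from the training script. It assumes a peak rate of 5e-5 (the highest value in `log_history`), 6 warmup steps (guessed from the `ws6` suffix of the run directory), and 339 total steps (`max_steps`). The name `logged_lr` is hypothetical.

```python
import math

PEAK_LR = 5e-5      # assumption: highest learning_rate seen in log_history
WARMUP_STEPS = 6    # assumption: read off the "ws6" suffix of the run directory
MAX_STEPS = 339     # "max_steps" in trainer_state.json


def logged_lr(step: int) -> float:
    """Rate a linear-warmup + cosine-decay schedule would log at a 1-based step."""
    k = step - 1  # the value logged at `step` reflects k completed scheduler steps
    if k < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak rate.
        return PEAK_LR * k / WARMUP_STEPS
    # Half-cosine decay from the peak rate towards 0 over the remaining steps.
    progress = (k - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare against a few entries from the log above.
    for step, seen in [(118, 3.7500000000000003e-05),
                       (174, 2.48820727851742e-05),
                       (220, 1.4381133517898804e-05)]:
        print(step, logged_lr(step), seen)
```

Under these assumptions the formula agrees with the entries for steps 118, 174 and 220. Note that the peak rate is read from the log itself, not from the directory name.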

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-220/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c750c736195636f8dc7c7a3baf13a9603e67d3864aae2d456029352a30e549a9
+size 6225
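
The three `+` lines above are a Git LFS pointer, not the binary file itself: a spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` and the returned dictionary layout are illustrative assumptions, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer (diff '+' prefixes allowed)."""
    fields = {}
    for raw in text.strip().splitlines():
        line = raw.lstrip("+").strip()  # drop the diff marker if present
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """
+version https://git-lfs.github.com/spec/v1
+oid sha256:c750c736195636f8dc7c7a3baf13a9603e67d3864aae2d456029352a30e549a9
+size 6225
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same helper applies to every other pointer in this commit (`optimizer.pt`, `scheduler.pt`, `adapter_model.safetensors`, and so on), since they all share this three-line layout.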

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/README.md ADDED

	@@ -0,0 +1,210 @@

+---
+base_model: unsloth/mistral-7b-v0.3-bnb-4bit
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/mistral-7b-v0.3-bnb-4bit
+- lora
+- sft
+- transformers
+- trl
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/adapter_config.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "MistralForCausalLM",
+    "parent_library": "transformers.models.mistral.modeling_mistral",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "unsloth/mistral-7b-v0.3-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "k_proj",
+    "gate_proj",
+    "q_proj",
+    "down_proj",
+    "v_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3af2493acbd2b5cd6f932d3d361a8d4df911aa35e678e39b783b662f93146d95
+size 1208020312
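
As a rough sanity check, the adapter size above can be related to `adapter_config.json` (`r` = 64, `lora_alpha` = 128, seven target modules, `embed_tokens` and `lm_head` saved in full). The sketch below is an estimate under stated assumptions, not something the commit documents. The layer dimensions are the commonly published Mistral-7B-v0.3 values and do not appear in this diff, and the storage widths (4 bytes per LoRA weight, 2 bytes per saved embedding weight) are guesses.

```python
R, ALPHA = 64, 128  # "r" and "lora_alpha" from adapter_config.json

# Assumptions (not in this diff): Mistral-7B-v0.3 dimensions.
HIDDEN, INTERMEDIATE, KV_DIM = 4096, 14336, 1024
LAYERS, VOCAB = 32, 32768

# (input features, output features) of each targeted projection.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each LoRA pair adds A (r x in) and B (out x r), so r * (in + out) weights.
lora_params = LAYERS * sum(R * (i + o) for i, o in shapes.values())

# modules_to_save stores full copies of embed_tokens and lm_head.
saved_params = 2 * VOCAB * HIDDEN

# With use_rslora false, the LoRA update is scaled by lora_alpha / r.
scaling = ALPHA / R

# Assumed storage: 4 bytes per LoRA weight, 2 bytes per saved-module weight.
estimated_bytes = 4 * lora_params + 2 * saved_params
LISTED_BYTES = 1208020312  # "size" in the LFS pointer above

print(scaling, lora_params, saved_params, LISTED_BYTES - estimated_bytes)
```

Under these assumptions the estimate lands within roughly 61 KB of the listed size, a gap that file metadata could plausibly account for. The match does not prove the assumed dtypes are the ones actually used.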

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:db1e8202745e1d6d2f428769eb39794fa4af5b16f9dcf222f0c4e22dab893f8d
+size 1687688427

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48ea209aa975528d2e9f8e5188d85dd75ec291a0f9e5b66853d2b87d4a8da11a
+size 14709

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a4a11b995be48d267931028d52b534dc06544b2517558b39ce122a9897e8b7cc
+size 1465

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[control_768]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/tokenizer_config.json ADDED

The diff for this file is too large to render.

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/trainer_state.json ADDED

	@@ -0,0 +1,2407 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 339,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0029498525073746312,
+      "grad_norm": 8.784540176391602,
+      "learning_rate": 0.0,
+      "loss": 2.5352,
+      "step": 1
+    },
+    {
+      "epoch": 0.0058997050147492625,
+      "grad_norm": 9.609512329101562,
+      "learning_rate": 8.333333333333334e-06,
+      "loss": 2.5809,
+      "step": 2
+    },
+    {
+      "epoch": 0.008849557522123894,
+      "grad_norm": 7.955018043518066,
+      "learning_rate": 1.6666666666666667e-05,
+      "loss": 2.3861,
+      "step": 3
+    },
+    {
+      "epoch": 0.011799410029498525,
+      "grad_norm": 5.426074028015137,
+      "learning_rate": 2.5e-05,
+      "loss": 2.2533,
+      "step": 4
+    },
+    {
+      "epoch": 0.014749262536873156,
+      "grad_norm": 4.8562164306640625,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 2.1205,
+      "step": 5
+    },
+    {
+      "epoch": 0.017699115044247787,
+      "grad_norm": 4.488800525665283,
+      "learning_rate": 4.166666666666667e-05,
+      "loss": 2.0452,
+      "step": 6
+    },
+    {
+      "epoch": 0.02064896755162242,
+      "grad_norm": 3.83853816986084,
+      "learning_rate": 5e-05,
+      "loss": 1.9761,
+      "step": 7
+    },
+    {
+      "epoch": 0.02359882005899705,
+      "grad_norm": 3.6622397899627686,
+      "learning_rate": 4.999888745376028e-05,
+      "loss": 1.8768,
+      "step": 8
+    },
+    {
+      "epoch": 0.02654867256637168,
+      "grad_norm": 4.476569652557373,
+      "learning_rate": 4.9995549914061836e-05,
+      "loss": 1.9589,
+      "step": 9
+    },
+    {
+      "epoch": 0.029498525073746312,
+      "grad_norm": 3.685044527053833,
+      "learning_rate": 4.998998767795805e-05,
+      "loss": 1.9054,
+      "step": 10
+    },
+    {
+      "epoch": 0.032448377581120944,
+      "grad_norm": 3.9684455394744873,
+      "learning_rate": 4.99822012405085e-05,
+      "loss": 1.887,
+      "step": 11
+    },
+    {
+      "epoch": 0.035398230088495575,
+      "grad_norm": 4.1171064376831055,
+      "learning_rate": 4.997219129473495e-05,
+      "loss": 1.7673,
+      "step": 12
+    },
+    {
+      "epoch": 0.038348082595870206,
+      "grad_norm": 3.5916616916656494,
+      "learning_rate": 4.995995873155958e-05,
+      "loss": 1.8061,
+      "step": 13
+    },
+    {
+      "epoch": 0.04129793510324484,
+      "grad_norm": 3.527775287628174,
+      "learning_rate": 4.994550463972577e-05,
+      "loss": 1.8472,
+      "step": 14
+    },
+    {
+      "epoch": 0.04424778761061947,
+      "grad_norm": 3.540343761444092,
+      "learning_rate": 4.992883030570116e-05,
+      "loss": 1.7783,
+      "step": 15
+    },
+    {
+      "epoch": 0.0471976401179941,
+      "grad_norm": 3.5722901821136475,
+      "learning_rate": 4.9909937213563165e-05,
+      "loss": 1.7471,
+      "step": 16
+    },
+    {
+      "epoch": 0.05014749262536873,
+      "grad_norm": 3.2874090671539307,
+      "learning_rate": 4.988882704486687e-05,
+      "loss": 1.7344,
+      "step": 17
+    },
+    {
+      "epoch": 0.05309734513274336,
+      "grad_norm": 3.2011971473693848,
+      "learning_rate": 4.9865501678495375e-05,
+      "loss": 1.7944,
+      "step": 18
+    },
+    {
+      "epoch": 0.05604719764011799,
+      "grad_norm": 3.8273420333862305,
+      "learning_rate": 4.9839963190492576e-05,
+      "loss": 1.8012,
+      "step": 19
+    },
+    {
+      "epoch": 0.058997050147492625,
+      "grad_norm": 3.3719303607940674,
+      "learning_rate": 4.9812213853878376e-05,
+      "loss": 1.7189,
+      "step": 20
+    },
+    {
+      "epoch": 0.061946902654867256,
+      "grad_norm": 3.548304557800293,
+      "learning_rate": 4.978225613844639e-05,
+      "loss": 1.7513,
+      "step": 21
+    },
+    {
+      "epoch": 0.06489675516224189,
+      "grad_norm": 3.688946485519409,
+      "learning_rate": 4.975009271054409e-05,
+      "loss": 1.7761,
+      "step": 22
+    },
+    {
+      "epoch": 0.06784660766961652,
+      "grad_norm": 3.9207510948181152,
+      "learning_rate": 4.971572643283557e-05,
+      "loss": 1.7487,
+      "step": 23
+    },
+    {
+      "epoch": 0.07079646017699115,
+      "grad_norm": 3.49283504486084,
+      "learning_rate": 4.9679160364046644e-05,
+      "loss": 1.8834,
+      "step": 24
+    },
+    {
+      "epoch": 0.07374631268436578,
+      "grad_norm": 3.67889404296875,
+      "learning_rate": 4.9640397758692715e-05,
+      "loss": 1.7992,
+      "step": 25
+    },
+    {
+      "epoch": 0.07669616519174041,
+      "grad_norm": 4.384673595428467,
+      "learning_rate": 4.9599442066789035e-05,
+      "loss": 1.8381,
+      "step": 26
+    },
+    {
+      "epoch": 0.07964601769911504,
+      "grad_norm": 3.9461209774017334,
+      "learning_rate": 4.95562969335437e-05,
+      "loss": 1.9321,
+      "step": 27
+    },
+    {
+      "epoch": 0.08259587020648967,
+      "grad_norm": 3.785196542739868,
+      "learning_rate": 4.9510966199033174e-05,
+      "loss": 1.8259,
+      "step": 28
+    },
+    {
+      "epoch": 0.0855457227138643,
+      "grad_norm": 3.6551666259765625,
+      "learning_rate": 4.946345389786049e-05,
+      "loss": 1.794,
+      "step": 29
+    },
+    {
+      "epoch": 0.08849557522123894,
+      "grad_norm": 4.191623210906982,
+      "learning_rate": 4.941376425879624e-05,
+      "loss": 1.8796,
+      "step": 30
+    },
+    {
+      "epoch": 0.09144542772861357,
+      "grad_norm": 4.097504138946533,
+      "learning_rate": 4.936190170440208e-05,
+      "loss": 1.8485,
+      "step": 31
+    },
+    {
+      "epoch": 0.0943952802359882,
+      "grad_norm": 3.767967939376831,
+      "learning_rate": 4.930787085063723e-05,
+      "loss": 1.8537,
+      "step": 32
+    },
+    {
+      "epoch": 0.09734513274336283,
+      "grad_norm": 3.880598783493042,
+      "learning_rate": 4.925167650644752e-05,
+      "loss": 1.8942,
+      "step": 33
+    },
+    {
+      "epoch": 0.10029498525073746,
+      "grad_norm": 4.313149452209473,
+      "learning_rate": 4.9193323673337476e-05,
+      "loss": 1.8464,
+      "step": 34
+    },
+    {
+      "epoch": 0.10324483775811209,
+      "grad_norm": 4.014875888824463,
+      "learning_rate": 4.9132817544925085e-05,
+      "loss": 1.7687,
+      "step": 35
+    },
+    {
+      "epoch": 0.10619469026548672,
+      "grad_norm": 4.243000030517578,
+      "learning_rate": 4.907016350647961e-05,
+      "loss": 1.8674,
+      "step": 36
+    },
+    {
+      "epoch": 0.10914454277286136,
+      "grad_norm": 4.010824203491211,
+      "learning_rate": 4.9005367134442235e-05,
+      "loss": 1.772,
+      "step": 37
+    },
+    {
+      "epoch": 0.11209439528023599,
+      "grad_norm": 3.9447169303894043,
+      "learning_rate": 4.893843419592977e-05,
+      "loss": 1.8545,
+      "step": 38
+    },
+    {
+      "epoch": 0.11504424778761062,
+      "grad_norm": 3.677614450454712,
+      "learning_rate": 4.886937064822134e-05,
+      "loss": 1.7783,
+      "step": 39
+    },
+    {
+      "epoch": 0.11799410029498525,
+      "grad_norm": 3.861032485961914,
+      "learning_rate": 4.8798182638228166e-05,
+      "loss": 1.7723,
+      "step": 40
+    },
+    {
+      "epoch": 0.12094395280235988,
+      "grad_norm": 3.7934088706970215,
+      "learning_rate": 4.872487650194647e-05,
+      "loss": 1.7814,
+      "step": 41
+    },
+    {
+      "epoch": 0.12389380530973451,
+      "grad_norm": 4.10675573348999,
+      "learning_rate": 4.864945876389356e-05,
+      "loss": 1.7406,
+      "step": 42
+    },
+    {
+      "epoch": 0.12684365781710916,
+      "grad_norm": 3.8732664585113525,
+      "learning_rate": 4.857193613652711e-05,
+      "loss": 1.7047,
+      "step": 43
+    },
+    {
+      "epoch": 0.12979351032448377,
+      "grad_norm": 3.985630512237549,
+      "learning_rate": 4.849231551964771e-05,
+      "loss": 1.6778,
+      "step": 44
+    },
+    {
+      "epoch": 0.13274336283185842,
+      "grad_norm": 4.057476043701172,
+      "learning_rate": 4.841060399978481e-05,
+      "loss": 1.6264,
+      "step": 45
+    },
+    {
+      "epoch": 0.13569321533923304,
+      "grad_norm": 4.0363335609436035,
+      "learning_rate": 4.8326808849565936e-05,
+      "loss": 1.5671,
+      "step": 46
+    },
+    {
+      "epoch": 0.13864306784660768,
+      "grad_norm": 4.249370098114014,
+      "learning_rate": 4.824093752706943e-05,
+      "loss": 1.5605,
+      "step": 47
+    },
+    {
+      "epoch": 0.1415929203539823,
+      "grad_norm": 4.4337615966796875,
+      "learning_rate": 4.815299767516065e-05,
+      "loss": 1.4718,
+      "step": 48
+    },
+    {
+      "epoch": 0.14454277286135694,
+      "grad_norm": 4.286309719085693,
+      "learning_rate": 4.806299712081172e-05,
+      "loss": 1.3906,
+      "step": 49
+    },
+    {
+      "epoch": 0.14749262536873156,
+      "grad_norm": 4.912067413330078,
+      "learning_rate": 4.797094387440491e-05,
+      "loss": 1.1653,
+      "step": 50
+    },
+    {
+      "epoch": 0.1504424778761062,
+      "grad_norm": 8.031450271606445,
+      "learning_rate": 4.787684612901965e-05,
+      "loss": 1.888,
+      "step": 51
+    },
+    {
+      "epoch": 0.15339233038348082,
+      "grad_norm": 7.04852819442749,
+      "learning_rate": 4.77807122597034e-05,
+      "loss": 1.8472,
+      "step": 52
+    },
+    {
+      "epoch": 0.15634218289085547,
+      "grad_norm": 5.582455635070801,
+      "learning_rate": 4.768255082272611e-05,
+      "loss": 1.7573,
+      "step": 53
+    },
+    {
+      "epoch": 0.1592920353982301,
+      "grad_norm": 4.440758228302002,
+      "learning_rate": 4.758237055481881e-05,
+      "loss": 1.7541,
+      "step": 54
+    },
+    {
+      "epoch": 0.16224188790560473,
+      "grad_norm": 3.537503242492676,
+      "learning_rate": 4.748018037239592e-05,
+      "loss": 1.6726,
+      "step": 55
+    },
+    {
+      "epoch": 0.16519174041297935,
+      "grad_norm": 3.002551317214966,
+      "learning_rate": 4.7375989370761695e-05,
+      "loss": 1.5917,
+      "step": 56
+    },
+    {
+      "epoch": 0.168141592920354,
+      "grad_norm": 3.5368826389312744,
+      "learning_rate": 4.726980682330071e-05,
+      "loss": 1.6657,
+      "step": 57
+    },
+    {
+      "epoch": 0.1710914454277286,
+      "grad_norm": 3.5577824115753174,
+      "learning_rate": 4.7161642180652464e-05,
+      "loss": 1.6253,
+      "step": 58
+    },
+    {
+      "epoch": 0.17404129793510326,
+      "grad_norm": 3.154500961303711,
+      "learning_rate": 4.7051505069870286e-05,
+      "loss": 1.5817,
+      "step": 59
+    },
+    {
+      "epoch": 0.17699115044247787,
+      "grad_norm": 3.1665284633636475,
+      "learning_rate": 4.693940529356444e-05,
+      "loss": 1.6044,
+      "step": 60
+    },
+    {
+      "epoch": 0.17994100294985252,
+      "grad_norm": 3.07588529586792,
+      "learning_rate": 4.6825352829029705e-05,
+      "loss": 1.49,
+      "step": 61
+    },
+    {
+      "epoch": 0.18289085545722714,
+      "grad_norm": 2.87255597114563,
+      "learning_rate": 4.670935782735732e-05,
+      "loss": 1.5619,
+      "step": 62
+    },
+    {
+      "epoch": 0.18584070796460178,
+      "grad_norm": 2.7868995666503906,
+      "learning_rate": 4.6591430612531515e-05,
+      "loss": 1.5085,
+      "step": 63
+    },
+    {
+      "epoch": 0.1887905604719764,
+      "grad_norm": 2.9793434143066406,
+      "learning_rate": 4.647158168051066e-05,
+      "loss": 1.5677,
+      "step": 64
+    },
+    {
+      "epoch": 0.19174041297935104,
+      "grad_norm": 2.7829337120056152,
+      "learning_rate": 4.6349821698293025e-05,
+      "loss": 1.5359,
+      "step": 65
+    },
+    {
+      "epoch": 0.19469026548672566,
+      "grad_norm": 2.799654483795166,
+      "learning_rate": 4.622616150296745e-05,
+      "loss": 1.5196,
+      "step": 66
+    },
+    {
+      "epoch": 0.1976401179941003,
+      "grad_norm": 2.9134767055511475,
+      "learning_rate": 4.6100612100748765e-05,
+      "loss": 1.6036,
+      "step": 67
+    },
+    {
+      "epoch": 0.20058997050147492,
+      "grad_norm": 3.1140592098236084,
+      "learning_rate": 4.5973184665998186e-05,
+      "loss": 1.6003,
+      "step": 68
+    },
+    {
+      "epoch": 0.20353982300884957,
+      "grad_norm": 3.3117525577545166,
+      "learning_rate": 4.5843890540228794e-05,
+      "loss": 1.6092,
+      "step": 69
+    },
+    {
+      "epoch": 0.20648967551622419,
+      "grad_norm": 2.9576680660247803,
+      "learning_rate": 4.571274123109606e-05,
+      "loss": 1.5685,
+      "step": 70
+    },
+    {
+      "epoch": 0.20943952802359883,
+      "grad_norm": 2.9042575359344482,
+      "learning_rate": 4.557974841137364e-05,
+      "loss": 1.549,
+      "step": 71
+    },
+    {
+      "epoch": 0.21238938053097345,
+      "grad_norm": 2.8921477794647217,
+      "learning_rate": 4.544492391791445e-05,
+      "loss": 1.6279,
+      "step": 72
+    },
+    {
+      "epoch": 0.2153392330383481,
+      "grad_norm": 3.005927562713623,
+      "learning_rate": 4.530827975059715e-05,
+      "loss": 1.5111,
+      "step": 73
+    },
+    {
+      "epoch": 0.2182890855457227,
+      "grad_norm": 3.4114937782287598,
+      "learning_rate": 4.5169828071258116e-05,
+      "loss": 1.6132,
+      "step": 74
+    },
+    {
+      "epoch": 0.22123893805309736,
+      "grad_norm": 3.160595655441284,
+      "learning_rate": 4.502958120260894e-05,
+      "loss": 1.5978,
+      "step": 75
+    },
+    {
+      "epoch": 0.22418879056047197,
+      "grad_norm": 3.215529441833496,
+      "learning_rate": 4.488755162713975e-05,
+      "loss": 1.705,
+      "step": 76
+    },
+    {
+      "epoch": 0.22713864306784662,
+      "grad_norm": 3.4715919494628906,
+      "learning_rate": 4.474375198600815e-05,
+      "loss": 1.6392,
+      "step": 77
+    },
+    {
+      "epoch": 0.23008849557522124,
+      "grad_norm": 3.2136619091033936,
+      "learning_rate": 4.4598195077914145e-05,
+      "loss": 1.6188,
+      "step": 78
+    },
+    {
+      "epoch": 0.23303834808259588,
+      "grad_norm": 3.3104329109191895,
+      "learning_rate": 4.445089385796099e-05,
+      "loss": 1.7059,
+      "step": 79
+    },
+    {
+      "epoch": 0.2359882005899705,
+      "grad_norm": 3.097895622253418,
+      "learning_rate": 4.4301861436502156e-05,
+      "loss": 1.7615,
+      "step": 80
+    },
+    {
+      "epoch": 0.23893805309734514,
+      "grad_norm": 3.3492372035980225,
+      "learning_rate": 4.415111107797445e-05,
+      "loss": 1.6919,
+      "step": 81
+    },
+    {
+      "epoch": 0.24188790560471976,
+      "grad_norm": 4.477549076080322,
+      "learning_rate": 4.3998656199717435e-05,
+      "loss": 1.6855,
+      "step": 82
+    },
+    {
+      "epoch": 0.2448377581120944,
+      "grad_norm": 4.241739273071289,
+      "learning_rate": 4.384451037077924e-05,
+      "loss": 1.7483,
+      "step": 83
+    },
+    {
+      "epoch": 0.24778761061946902,
+      "grad_norm": 3.785494565963745,
+      "learning_rate": 4.368868731070884e-05,
+      "loss": 1.8123,
+      "step": 84
+    },
+    {
+      "epoch": 0.25073746312684364,
+      "grad_norm": 4.163392543792725,
+      "learning_rate": 4.353120088833501e-05,
+      "loss": 1.663,
+      "step": 85
+    },
+    {
+      "epoch": 0.2536873156342183,
+      "grad_norm": 4.269979953765869,
+      "learning_rate": 4.33720651205319e-05,
+      "loss": 1.6724,
+      "step": 86
+    },
+    {
+      "epoch": 0.25663716814159293,
+      "grad_norm": 3.830099582672119,
+      "learning_rate": 4.321129417097153e-05,
+      "loss": 1.7422,
+      "step": 87
+    },
+    {
+      "epoch": 0.25958702064896755,
+      "grad_norm": 3.549696207046509,
+      "learning_rate": 4.3048902348863116e-05,
+      "loss": 1.6266,
+      "step": 88
+    },
+    {
+      "epoch": 0.26253687315634217,
+      "grad_norm": 3.9003500938415527,
+      "learning_rate": 4.288490410767955e-05,
+      "loss": 1.6474,
+      "step": 89
+    },
+    {
+      "epoch": 0.26548672566371684,
+      "grad_norm": 3.736586332321167,
+      "learning_rate": 4.271931404387096e-05,
+      "loss": 1.6389,
+      "step": 90
+    },
+    {
+      "epoch": 0.26843657817109146,
+      "grad_norm": 4.005049228668213,
+      "learning_rate": 4.255214689556557e-05,
+      "loss": 1.6593,
+      "step": 91
+    },
+    {
+      "epoch": 0.2713864306784661,
+      "grad_norm": 3.6020376682281494,
+      "learning_rate": 4.2383417541257954e-05,
+      "loss": 1.5903,
+      "step": 92
+    },
+    {
+      "epoch": 0.2743362831858407,
+      "grad_norm": 3.8671212196350098,
+      "learning_rate": 4.221314099848481e-05,
+      "loss": 1.566,
+      "step": 93
+    },
+    {
+      "epoch": 0.27728613569321536,
+      "grad_norm": 3.5491201877593994,
+      "learning_rate": 4.204133242248832e-05,
+      "loss": 1.6435,
+      "step": 94
+    },
+    {
+      "epoch": 0.28023598820059,
+      "grad_norm": 3.8159406185150146,
+      "learning_rate": 4.186800710486732e-05,
+      "loss": 1.5173,
+      "step": 95
+    },
+    {
+      "epoch": 0.2831858407079646,
+      "grad_norm": 3.911926746368408,
+      "learning_rate": 4.169318047221621e-05,
+      "loss": 1.4565,
+      "step": 96
+    },
+    {
+      "epoch": 0.2861356932153392,
+      "grad_norm": 4.003925323486328,
+      "learning_rate": 4.151686808475204e-05,
+      "loss": 1.5175,
+      "step": 97
+    },
+    {
+      "epoch": 0.2890855457227139,
+      "grad_norm": 4.023141860961914,
+      "learning_rate": 4.1339085634929485e-05,
+      "loss": 1.3598,
+      "step": 98
+    },
+    {
+      "epoch": 0.2920353982300885,
+      "grad_norm": 4.288107872009277,
+      "learning_rate": 4.115984894604423e-05,
+      "loss": 1.2067,
+      "step": 99
+    },
+    {
+      "epoch": 0.2949852507374631,
+      "grad_norm": 4.7076592445373535,
+      "learning_rate": 4.0979173970824626e-05,
+      "loss": 1.0602,
+      "step": 100
+    },
+    {
+      "epoch": 0.29793510324483774,
+      "grad_norm": 6.30832052230835,
+      "learning_rate": 4.0797076790011804e-05,
+      "loss": 1.704,
+      "step": 101
+    },
+    {
+      "epoch": 0.3008849557522124,
+      "grad_norm": 4.847335338592529,
+      "learning_rate": 4.0613573610928476e-05,
+      "loss": 1.5943,
+      "step": 102
+    },
+    {
+      "epoch": 0.30383480825958703,
+      "grad_norm": 4.62391471862793,
+      "learning_rate": 4.0428680766036384e-05,
+      "loss": 1.6134,
+      "step": 103
+    },
+    {
+      "epoch": 0.30678466076696165,
+      "grad_norm": 3.951815128326416,
+      "learning_rate": 4.0242414711482676e-05,
+      "loss": 1.5143,
+      "step": 104
+    },
+    {
+      "epoch": 0.30973451327433627,
+      "grad_norm": 3.5230190753936768,
+      "learning_rate": 4.005479202563524e-05,
+      "loss": 1.557,
+      "step": 105
+    },
+    {
+      "epoch": 0.31268436578171094,
+      "grad_norm": 3.093968391418457,
+      "learning_rate": 3.986582940760717e-05,
+      "loss": 1.5439,
+      "step": 106
+    },
+    {
+      "epoch": 0.31563421828908556,
+      "grad_norm": 2.9362878799438477,
+      "learning_rate": 3.967554367577047e-05,
+      "loss": 1.454,
+      "step": 107
+    },
+    {
+      "epoch": 0.3185840707964602,
+      "grad_norm": 2.7811033725738525,
+      "learning_rate": 3.948395176625918e-05,
+      "loss": 1.4497,
+      "step": 108
+    },
+    {
+      "epoch": 0.3215339233038348,
+      "grad_norm": 2.7802510261535645,
+      "learning_rate": 3.929107073146197e-05,
+      "loss": 1.3589,
+      "step": 109
+    },
+    {
+      "epoch": 0.32448377581120946,
+      "grad_norm": 3.563616991043091,
+      "learning_rate": 3.909691773850445e-05,
+      "loss": 1.4628,
+      "step": 110
+    },
+    {
+      "epoch": 0.3274336283185841,
+      "grad_norm": 3.047844886779785,
+      "learning_rate": 3.890151006772119e-05,
+      "loss": 1.4186,
+      "step": 111
+    },
+    {
+      "epoch": 0.3303834808259587,
+      "grad_norm": 3.004772663116455,
+      "learning_rate": 3.8704865111117746e-05,
+      "loss": 1.375,
+      "step": 112
+    },
+    {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 2.8555989265441895,
+      "learning_rate": 3.850700037082268e-05,
+      "loss": 1.303,
+      "step": 113
+    },
+    {
+      "epoch": 0.336283185840708,
+      "grad_norm": 2.86316180229187,
+      "learning_rate": 3.83079334575298e-05,
+      "loss": 1.3022,
+      "step": 114
+    },
+    {
+      "epoch": 0.3392330383480826,
+      "grad_norm": 3.3198935985565186,
+      "learning_rate": 3.8107682088930794e-05,
+      "loss": 1.3897,
+      "step": 115
+    },
+    {
+      "epoch": 0.3421828908554572,
+      "grad_norm": 3.1176650524139404,
+      "learning_rate": 3.790626408813822e-05,
+      "loss": 1.3162,
+      "step": 116
+    },
+    {
+      "epoch": 0.34513274336283184,
+      "grad_norm": 2.670988082885742,
+      "learning_rate": 3.7703697382099234e-05,
+      "loss": 1.3686,
+      "step": 117
+    },
+    {
+      "epoch": 0.3480825958702065,
+      "grad_norm": 3.0355708599090576,
+      "learning_rate": 3.7500000000000003e-05,
+      "loss": 1.4136,
+      "step": 118
+    },
+    {
+      "epoch": 0.35103244837758113,
+      "grad_norm": 2.5040009021759033,
+      "learning_rate": 3.729519007166105e-05,
+      "loss": 1.4136,
+      "step": 119
+    },
+    {
+      "epoch": 0.35398230088495575,
+      "grad_norm": 2.4674110412597656,
+      "learning_rate": 3.7089285825923615e-05,
+      "loss": 1.4819,
+      "step": 120
+    },
+    {
+      "epoch": 0.35693215339233036,
+      "grad_norm": 3.2417714595794678,
+      "learning_rate": 3.688230558902725e-05,
+      "loss": 1.4959,
+      "step": 121
+    },
+    {
+      "epoch": 0.35988200589970504,
+      "grad_norm": 2.698197364807129,
+      "learning_rate": 3.667426778297871e-05,
+      "loss": 1.493,
+      "step": 122
+    },
+    {
+      "epoch": 0.36283185840707965,
+      "grad_norm": 2.9125115871429443,
+      "learning_rate": 3.646519092391227e-05,
+      "loss": 1.512,
+      "step": 123
+    },
+    {
+      "epoch": 0.36578171091445427,
+      "grad_norm": 2.803999900817871,
+      "learning_rate": 3.6255093620441834e-05,
+      "loss": 1.4455,
+      "step": 124
+    },
+    {
+      "epoch": 0.3687315634218289,
+      "grad_norm": 2.8682003021240234,
+      "learning_rate": 3.604399457200458e-05,
+      "loss": 1.4457,
+      "step": 125
+    },
+    {
+      "epoch": 0.37168141592920356,
+      "grad_norm": 2.987795114517212,
+      "learning_rate": 3.583191256719672e-05,
+      "loss": 1.4813,
+      "step": 126
+    },
+    {
+      "epoch": 0.3746312684365782,
+      "grad_norm": 2.8751752376556396,
+      "learning_rate": 3.56188664821012e-05,
+      "loss": 1.4557,
+      "step": 127
+    },
+    {
+      "epoch": 0.3775811209439528,
+      "grad_norm": 3.237694025039673,
+      "learning_rate": 3.540487527860769e-05,
+      "loss": 1.5434,
+      "step": 128
+    },
+    {
+      "epoch": 0.3805309734513274,
+      "grad_norm": 3.4298689365386963,
+      "learning_rate": 3.518995800272491e-05,
+      "loss": 1.4578,
+      "step": 129
+    },
+    {
+      "epoch": 0.3834808259587021,
+      "grad_norm": 3.414600372314453,
+      "learning_rate": 3.497413378288541e-05,
+      "loss": 1.5302,
+      "step": 130
+    },
+    {
+      "epoch": 0.3864306784660767,
+      "grad_norm": 3.4248905181884766,
+      "learning_rate": 3.475742182824314e-05,
+      "loss": 1.5351,
+      "step": 131
+    },
+    {
+      "epoch": 0.3893805309734513,
+      "grad_norm": 3.316810131072998,
+      "learning_rate": 3.453984142696372e-05,
+      "loss": 1.617,
+      "step": 132
+    },
+    {
+      "epoch": 0.39233038348082594,
+      "grad_norm": 3.3763375282287598,
+      "learning_rate": 3.432141194450772e-05,
+      "loss": 1.5688,
+      "step": 133
+    },
+    {
+      "epoch": 0.3952802359882006,
+      "grad_norm": 3.3588340282440186,
+      "learning_rate": 3.41021528219071e-05,
+      "loss": 1.6525,
+      "step": 134
+    },
+    {
+      "epoch": 0.39823008849557523,
+      "grad_norm": 4.095376968383789,
+      "learning_rate": 3.3882083574034844e-05,
+      "loss": 1.649,
+      "step": 135
+    },
+    {
+      "epoch": 0.40117994100294985,
+      "grad_norm": 3.3478126525878906,
+      "learning_rate": 3.3661223787868094e-05,
+      "loss": 1.6012,
+      "step": 136
+    },
+    {
+      "epoch": 0.40412979351032446,
+      "grad_norm": 3.3165597915649414,
+      "learning_rate": 3.3439593120744816e-05,
+      "loss": 1.6119,
+      "step": 137
+    },
+    {
+      "epoch": 0.40707964601769914,
+      "grad_norm": 3.542797327041626,
+      "learning_rate": 3.321721129861422e-05,
+      "loss": 1.5688,
+      "step": 138
+    },
+    {
+      "epoch": 0.41002949852507375,
+      "grad_norm": 3.593048572540283,
+      "learning_rate": 3.2994098114281134e-05,
+      "loss": 1.5018,
+      "step": 139
+    },
+    {
+      "epoch": 0.41297935103244837,
+      "grad_norm": 4.237592697143555,
+      "learning_rate": 3.277027342564428e-05,
+      "loss": 1.6425,
+      "step": 140
+    },
+    {
+      "epoch": 0.415929203539823,
+      "grad_norm": 3.76655650138855,
+      "learning_rate": 3.2545757153928924e-05,
+      "loss": 1.6121,
+      "step": 141
+    },
+    {
+      "epoch": 0.41887905604719766,
+      "grad_norm": 3.643519163131714,
+      "learning_rate": 3.232056928191376e-05,
+      "loss": 1.5565,
+      "step": 142
+    },
+    {
+      "epoch": 0.4218289085545723,
+      "grad_norm": 3.862382173538208,
+      "learning_rate": 3.209472985215243e-05,
+      "loss": 1.5727,
+      "step": 143
+    },
+    {
+      "epoch": 0.4247787610619469,
+      "grad_norm": 3.5415596961975098,
+      "learning_rate": 3.186825896518958e-05,
+      "loss": 1.5971,
+      "step": 144
+    },
+    {
+      "epoch": 0.4277286135693215,
+      "grad_norm": 3.5340561866760254,
+      "learning_rate": 3.164117677777191e-05,
+      "loss": 1.5406,
+      "step": 145
+    },
+    {
+      "epoch": 0.4306784660766962,
+      "grad_norm": 4.26082181930542,
+      "learning_rate": 3.141350350105413e-05,
+      "loss": 1.4473,
+      "step": 146
+    },
+    {
+      "epoch": 0.4336283185840708,
+      "grad_norm": 3.90486478805542,
+      "learning_rate": 3.118525939880007e-05,
+      "loss": 1.459,
+      "step": 147
+    },
+    {
+      "epoch": 0.4365781710914454,
+      "grad_norm": 4.1377410888671875,
+      "learning_rate": 3.0956464785579124e-05,
+      "loss": 1.4197,
+      "step": 148
+    },
+    {
+      "epoch": 0.43952802359882004,
+      "grad_norm": 4.407893180847168,
+      "learning_rate": 3.072714002495825e-05,
+      "loss": 1.3119,
+      "step": 149
+    },
+    {
+      "epoch": 0.4424778761061947,
+      "grad_norm": 4.6629557609558105,
+      "learning_rate": 3.0497305527689445e-05,
+      "loss": 1.09,
+      "step": 150
+    },
+    {
+      "epoch": 0.44542772861356933,
+      "grad_norm": 4.147300720214844,
+      "learning_rate": 3.0266981749893157e-05,
+      "loss": 1.536,
+      "step": 151
+    },
+    {
+      "epoch": 0.44837758112094395,
+      "grad_norm": 3.9903712272644043,
+      "learning_rate": 3.003618919123763e-05,
+      "loss": 1.4418,
+      "step": 152
+    },
+    {
+      "epoch": 0.45132743362831856,
+      "grad_norm": 4.015631198883057,
+      "learning_rate": 2.9804948393114324e-05,
+      "loss": 1.4583,
+      "step": 153
+    },
+    {
+      "epoch": 0.45427728613569324,
+      "grad_norm": 3.5220329761505127,
+      "learning_rate": 2.9573279936809667e-05,
+      "loss": 1.4275,
+      "step": 154
+    },
+    {
+      "epoch": 0.45722713864306785,
+      "grad_norm": 2.9233076572418213,
+      "learning_rate": 2.9341204441673266e-05,
+      "loss": 1.4147,
+      "step": 155
+    },
+    {
+      "epoch": 0.46017699115044247,
+      "grad_norm": 2.8685462474823,
+      "learning_rate": 2.9108742563282652e-05,
+      "loss": 1.3511,
+      "step": 156
+    },
+    {
+      "epoch": 0.4631268436578171,
+      "grad_norm": 2.6424460411071777,
+      "learning_rate": 2.8875914991604948e-05,
+      "loss": 1.3207,
+      "step": 157
+    },
+    {
+      "epoch": 0.46607669616519176,
+      "grad_norm": 2.790271520614624,
+      "learning_rate": 2.8642742449155284e-05,
+      "loss": 1.3057,
+      "step": 158
+    },
+    {
+      "epoch": 0.4690265486725664,
+      "grad_norm": 2.6307857036590576,
+      "learning_rate": 2.8409245689152503e-05,
+      "loss": 1.3061,
+      "step": 159
+    },
+    {
+      "epoch": 0.471976401179941,
+      "grad_norm": 2.461571455001831,
+      "learning_rate": 2.8175445493671972e-05,
+      "loss": 1.2907,
+      "step": 160
+    },
+    {
+      "epoch": 0.4749262536873156,
+      "grad_norm": 2.634023904800415,
+      "learning_rate": 2.794136267179596e-05,
+      "loss": 1.21,
+      "step": 161
+    },
+    {
+      "epoch": 0.4778761061946903,
+      "grad_norm": 2.609807014465332,
+      "learning_rate": 2.770701805776155e-05,
+      "loss": 1.1946,
+      "step": 162
+    },
+    {
+      "epoch": 0.4808259587020649,
+      "grad_norm": 2.385166645050049,
+      "learning_rate": 2.7472432509106248e-05,
+      "loss": 1.2135,
+      "step": 163
+    },
+    {
+      "epoch": 0.4837758112094395,
+      "grad_norm": 2.5615952014923096,
+      "learning_rate": 2.723762690481167e-05,
+      "loss": 1.263,
+      "step": 164
+    },
+    {
+      "epoch": 0.48672566371681414,
+      "grad_norm": 2.696706533432007,
+      "learning_rate": 2.7002622143445177e-05,
+      "loss": 1.2963,
+      "step": 165
+    },
+    {
+      "epoch": 0.4896755162241888,
+      "grad_norm": 2.426612377166748,
+      "learning_rate": 2.6767439141299865e-05,
+      "loss": 1.2133,
+      "step": 166
+    },
+    {
+      "epoch": 0.49262536873156343,
+      "grad_norm": 2.5794103145599365,
+      "learning_rate": 2.653209883053291e-05,
+      "loss": 1.2705,
+      "step": 167
+    },
+    {
+      "epoch": 0.49557522123893805,
+      "grad_norm": 2.885312795639038,
+      "learning_rate": 2.629662215730253e-05,
+      "loss": 1.3029,
+      "step": 168
+    },
+    {
+      "epoch": 0.49852507374631266,
+      "grad_norm": 2.72586727142334,
+      "learning_rate": 2.606103007990371e-05,
+      "loss": 1.2537,
+      "step": 169
+    },
+    {
+      "epoch": 0.5014749262536873,
+      "grad_norm": 2.772090196609497,
+      "learning_rate": 2.5825343566902837e-05,
+      "loss": 1.2844,
+      "step": 170
+    },
+    {
+      "epoch": 0.504424778761062,
+      "grad_norm": 3.0460994243621826,
+      "learning_rate": 2.5589583595271428e-05,
+      "loss": 1.2897,
+      "step": 171
+    },
+    {
+      "epoch": 0.5073746312684366,
+      "grad_norm": 2.7062950134277344,
+      "learning_rate": 2.5353771148519057e-05,
+      "loss": 1.2549,
+      "step": 172
+    },
+    {
+      "epoch": 0.5103244837758112,
+      "grad_norm": 2.8319649696350098,
+      "learning_rate": 2.511792721482581e-05,
+      "loss": 1.2618,
+      "step": 173
+    },
+    {
+      "epoch": 0.5132743362831859,
+      "grad_norm": 2.708251714706421,
+      "learning_rate": 2.48820727851742e-05,
+      "loss": 1.234,
+      "step": 174
+    },
+    {
+      "epoch": 0.5162241887905604,
+      "grad_norm": 2.71691632270813,
+      "learning_rate": 2.4646228851480956e-05,
+      "loss": 1.3505,
+      "step": 175
+    },
+    {
+      "epoch": 0.5191740412979351,
+      "grad_norm": 2.943074941635132,
+      "learning_rate": 2.441041640472858e-05,
+      "loss": 1.2841,
+      "step": 176
+    },
+    {
+      "epoch": 0.5221238938053098,
+      "grad_norm": 2.907444715499878,
+      "learning_rate": 2.417465643309716e-05,
+      "loss": 1.3211,
+      "step": 177
+    },
+    {
+      "epoch": 0.5250737463126843,
+      "grad_norm": 3.0803866386413574,
+      "learning_rate": 2.39389699200963e-05,
+      "loss": 1.4262,
+      "step": 178
+    },
+    {
+      "epoch": 0.528023598820059,
+      "grad_norm": 3.2019131183624268,
+      "learning_rate": 2.3703377842697475e-05,
+      "loss": 1.4857,
+      "step": 179
+    },
+    {
+      "epoch": 0.5309734513274337,
+      "grad_norm": 2.9756524562835693,
+      "learning_rate": 2.34679011694671e-05,
+      "loss": 1.4927,
+      "step": 180
+    },
+    {
+      "epoch": 0.5339233038348082,
+      "grad_norm": 3.07672381401062,
+      "learning_rate": 2.3232560858700137e-05,
+      "loss": 1.509,
+      "step": 181
+    },
+    {
+      "epoch": 0.5368731563421829,
+      "grad_norm": 3.461970806121826,
+      "learning_rate": 2.2997377856554822e-05,
+      "loss": 1.5949,
+      "step": 182
+    },
+    {
+      "epoch": 0.5398230088495575,
+      "grad_norm": 3.2279229164123535,
+      "learning_rate": 2.276237309518834e-05,
+      "loss": 1.4417,
+      "step": 183
+    },
+    {
+      "epoch": 0.5427728613569321,
+      "grad_norm": 3.782461643218994,
+      "learning_rate": 2.2527567490893758e-05,
+      "loss": 1.5903,
+      "step": 184
+    },
+    {
+      "epoch": 0.5457227138643068,
+      "grad_norm": 3.6933534145355225,
+      "learning_rate": 2.2292981942238454e-05,
+      "loss": 1.5214,
+      "step": 185
+    },
+    {
+      "epoch": 0.5486725663716814,
+      "grad_norm": 3.0459163188934326,
+      "learning_rate": 2.205863732820404e-05,
+      "loss": 1.5098,
+      "step": 186
+    },
+    {
+      "epoch": 0.551622418879056,
+      "grad_norm": 3.1090471744537354,
+      "learning_rate": 2.182455450632803e-05,
+      "loss": 1.5546,
+      "step": 187
+    },
+    {
+      "epoch": 0.5545722713864307,
+      "grad_norm": 3.4665732383728027,
+      "learning_rate": 2.159075431084751e-05,
+      "loss": 1.4933,
+      "step": 188
+    },
+    {
+      "epoch": 0.5575221238938053,
+      "grad_norm": 3.638923406600952,
+      "learning_rate": 2.1357257550844718e-05,
+      "loss": 1.587,
+      "step": 189
+    },
+    {
+      "epoch": 0.56047197640118,
+      "grad_norm": 3.5211169719696045,
+      "learning_rate": 2.1124085008395054e-05,
+      "loss": 1.4932,
+      "step": 190
+    },
+    {
+      "epoch": 0.5634218289085545,
+      "grad_norm": 3.786188840866089,
+      "learning_rate": 2.0891257436717353e-05,
+      "loss": 1.571,
+      "step": 191
+    },
+    {
+      "epoch": 0.5663716814159292,
+      "grad_norm": 3.264098882675171,
+      "learning_rate": 2.0658795558326743e-05,
+      "loss": 1.5047,
+      "step": 192
+    },
+    {
+      "epoch": 0.5693215339233039,
+      "grad_norm": 3.6925175189971924,
+      "learning_rate": 2.0426720063190335e-05,
+      "loss": 1.4851,
+      "step": 193
+    },
+    {
+      "epoch": 0.5722713864306784,
+      "grad_norm": 3.394176483154297,
+      "learning_rate": 2.0195051606885685e-05,
+      "loss": 1.4108,
+      "step": 194
+    },
+    {
+      "epoch": 0.5752212389380531,
+      "grad_norm": 3.930576801300049,
+      "learning_rate": 1.996381080876237e-05,
+      "loss": 1.4126,
+      "step": 195
+    },
+    {
+      "epoch": 0.5781710914454278,
+      "grad_norm": 3.7063424587249756,
+      "learning_rate": 1.973301825010685e-05,
+      "loss": 1.389,
+      "step": 196
+    },
+    {
+      "epoch": 0.5811209439528023,
+      "grad_norm": 3.6309196949005127,
+      "learning_rate": 1.950269447231056e-05,
+      "loss": 1.3373,
+      "step": 197
+    },
+    {
+      "epoch": 0.584070796460177,
+      "grad_norm": 3.9842283725738525,
+      "learning_rate": 1.9272859975041754e-05,
+      "loss": 1.2385,
+      "step": 198
+    },
+    {
+      "epoch": 0.5870206489675516,
+      "grad_norm": 3.9302854537963867,
+      "learning_rate": 1.904353521442088e-05,
+      "loss": 1.22,
+      "step": 199
+    },
+    {
+      "epoch": 0.5899705014749262,
+      "grad_norm": 3.901738166809082,
+      "learning_rate": 1.881474060119994e-05,
+      "loss": 1.0099,
+      "step": 200
+    },
+    {
+      "epoch": 0.5929203539823009,
+      "grad_norm": 4.378158092498779,
+      "learning_rate": 1.8586496498945877e-05,
+      "loss": 1.4772,
+      "step": 201
+    },
+    {
+      "epoch": 0.5958702064896755,
+      "grad_norm": 3.2930281162261963,
+      "learning_rate": 1.8358823222228097e-05,
+      "loss": 1.3278,
+      "step": 202
+    },
+    {
+      "epoch": 0.5988200589970502,
+      "grad_norm": 3.2225897312164307,
+      "learning_rate": 1.8131741034810435e-05,
+      "loss": 1.3519,
+      "step": 203
+    },
+    {
+      "epoch": 0.6017699115044248,
+      "grad_norm": 3.375405788421631,
+      "learning_rate": 1.790527014784758e-05,
+      "loss": 1.2963,
+      "step": 204
+    },
+    {
+      "epoch": 0.6047197640117994,
+      "grad_norm": 3.105314016342163,
+      "learning_rate": 1.7679430718086243e-05,
+      "loss": 1.2951,
+      "step": 205
+    },
+    {
+      "epoch": 0.6076696165191741,
+      "grad_norm": 2.8709449768066406,
+      "learning_rate": 1.7454242846071085e-05,
+      "loss": 1.2919,
+      "step": 206
+    },
+    {
+      "epoch": 0.6106194690265486,
+      "grad_norm": 2.7367875576019287,
+      "learning_rate": 1.722972657435572e-05,
+      "loss": 1.2755,
+      "step": 207
+    },
+    {
+      "epoch": 0.6135693215339233,
+      "grad_norm": 2.6397578716278076,
+      "learning_rate": 1.700590188571887e-05,
+      "loss": 1.156,
+      "step": 208
+    },
+    {
+      "epoch": 0.616519174041298,
+      "grad_norm": 2.5080482959747314,
+      "learning_rate": 1.6782788701385783e-05,
+      "loss": 1.2038,
+      "step": 209
+    },
+    {
+      "epoch": 0.6194690265486725,
+      "grad_norm": 2.567392349243164,
+      "learning_rate": 1.656040687925519e-05,
+      "loss": 1.1947,
+      "step": 210
+    },
+    {
+      "epoch": 0.6224188790560472,
+      "grad_norm": 2.3554275035858154,
+      "learning_rate": 1.633877621213192e-05,
+      "loss": 1.1568,
+      "step": 211
+    },
+    {
+      "epoch": 0.6253687315634219,
+      "grad_norm": 2.487006902694702,
+      "learning_rate": 1.611791642596516e-05,
+      "loss": 1.2156,
+      "step": 212
+    },
+    {
+      "epoch": 0.6283185840707964,
+      "grad_norm": 2.2717669010162354,
+      "learning_rate": 1.58978471780929e-05,
+      "loss": 1.0981,
+      "step": 213
+    },
+    {
+      "epoch": 0.6312684365781711,
+      "grad_norm": 2.5773253440856934,
+      "learning_rate": 1.567858805549229e-05,
+      "loss": 1.1829,
+      "step": 214
+    },
+    {
+      "epoch": 0.6342182890855457,
+      "grad_norm": 2.552574872970581,
+      "learning_rate": 1.5460158573036288e-05,
+      "loss": 1.1801,
+      "step": 215
+    },
+    {
+      "epoch": 0.6371681415929203,
+      "grad_norm": 2.42258620262146,
+      "learning_rate": 1.5242578171756866e-05,
+      "loss": 1.1209,
+      "step": 216
+    },
+    {
+      "epoch": 0.640117994100295,
+      "grad_norm": 2.695371389389038,
+      "learning_rate": 1.5025866217114592e-05,
+      "loss": 1.2909,
+      "step": 217
+    },
+    {
+      "epoch": 0.6430678466076696,
+      "grad_norm": 2.2983741760253906,
+      "learning_rate": 1.4810041997275092e-05,
+      "loss": 1.2118,
+      "step": 218
+    },
+    {
+      "epoch": 0.6460176991150443,
+      "grad_norm": 2.4395010471343994,
+      "learning_rate": 1.4595124721392312e-05,
+      "loss": 1.2001,
+      "step": 219
+    },
+    {
+      "epoch": 0.6489675516224189,
+      "grad_norm": 2.4421238899230957,
+      "learning_rate": 1.4381133517898804e-05,
+      "loss": 1.2167,
+      "step": 220
+    },
+    {
+      "epoch": 0.6519174041297935,
+      "grad_norm": 2.451107978820801,
+      "learning_rate": 1.4168087432803292e-05,
+      "loss": 1.1921,
+      "step": 221
+    },
+    {
+      "epoch": 0.6548672566371682,
+      "grad_norm": 2.361185312271118,
+      "learning_rate": 1.3956005427995421e-05,
+      "loss": 1.1925,
+      "step": 222
+    },
+    {
+      "epoch": 0.6578171091445427,
+      "grad_norm": 2.6095359325408936,
+      "learning_rate": 1.3744906379558165e-05,
+      "loss": 1.2141,
+      "step": 223
+    },
+    {
+      "epoch": 0.6607669616519174,
+      "grad_norm": 2.7814433574676514,
+      "learning_rate": 1.3534809076087734e-05,
+      "loss": 1.3088,
+      "step": 224
+    },
+    {
+      "epoch": 0.6637168141592921,
+      "grad_norm": 3.0350759029388428,
+      "learning_rate": 1.33257322170213e-05,
+      "loss": 1.4352,
+      "step": 225
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 2.9381890296936035,
+      "learning_rate": 1.3117694410972748e-05,
+      "loss": 1.3804,
+      "step": 226
+    },
+    {
+      "epoch": 0.6696165191740413,
+      "grad_norm": 2.685178518295288,
+      "learning_rate": 1.2910714174076394e-05,
+      "loss": 1.3524,
+      "step": 227
+    },
+    {
+      "epoch": 0.672566371681416,
+      "grad_norm": 2.9056496620178223,
+      "learning_rate": 1.2704809928338956e-05,
+      "loss": 1.3915,
+      "step": 228
+    },
+    {
+      "epoch": 0.6755162241887905,
+      "grad_norm": 2.951371192932129,
+      "learning_rate": 1.2500000000000006e-05,
+      "loss": 1.407,
+      "step": 229
+    },
+    {
+      "epoch": 0.6784660766961652,
+      "grad_norm": 2.880005359649658,
+      "learning_rate": 1.229630261790077e-05,
+      "loss": 1.3878,
+      "step": 230
+    },
+    {
+      "epoch": 0.6814159292035398,
+      "grad_norm": 2.908679485321045,
+      "learning_rate": 1.2093735911861778e-05,
+      "loss": 1.4797,
+      "step": 231
+    },
+    {
+      "epoch": 0.6843657817109144,
+      "grad_norm": 3.0564966201782227,
+      "learning_rate": 1.1892317911069212e-05,
+      "loss": 1.42,
+      "step": 232
+    },
+    {
+      "epoch": 0.6873156342182891,
+      "grad_norm": 2.724364757537842,
+      "learning_rate": 1.1692066542470201e-05,
+      "loss": 1.479,
+      "step": 233
+    },
+    {
+      "epoch": 0.6902654867256637,
+      "grad_norm": 3.532292366027832,
+      "learning_rate": 1.149299962917733e-05,
+      "loss": 1.5132,
+      "step": 234
+    },
+    {
+      "epoch": 0.6932153392330384,
+      "grad_norm": 3.1367177963256836,
+      "learning_rate": 1.1295134888882258e-05,
+      "loss": 1.4305,
+      "step": 235
+    },
+    {
+      "epoch": 0.696165191740413,
+      "grad_norm": 3.233671188354492,
+      "learning_rate": 1.1098489932278811e-05,
+      "loss": 1.4952,
+      "step": 236
+    },
+    {
+      "epoch": 0.6991150442477876,
+      "grad_norm": 3.1208622455596924,
+      "learning_rate": 1.0903082261495559e-05,
+      "loss": 1.4253,
+      "step": 237
+    },
+    {
+      "epoch": 0.7020648967551623,
+      "grad_norm": 3.090951442718506,
+      "learning_rate": 1.0708929268538034e-05,
+      "loss": 1.3034,
+      "step": 238
+    },
+    {
+      "epoch": 0.7050147492625368,
+      "grad_norm": 2.8808515071868896,
+      "learning_rate": 1.0516048233740833e-05,
+      "loss": 1.3471,
+      "step": 239
+    },
+    {
+      "epoch": 0.7079646017699115,
+      "grad_norm": 3.281359910964966,
+      "learning_rate": 1.0324456324229537e-05,
+      "loss": 1.4638,
+      "step": 240
+    },
+    {
+      "epoch": 0.7109144542772862,
+      "grad_norm": 3.3479645252227783,
+      "learning_rate": 1.0134170592392836e-05,
+      "loss": 1.4523,
+      "step": 241
+    },
+    {
+      "epoch": 0.7138643067846607,
+      "grad_norm": 3.479410171508789,
+      "learning_rate": 9.945207974364768e-06,
+      "loss": 1.4964,
+      "step": 242
+    },
+    {
+      "epoch": 0.7168141592920354,
+      "grad_norm": 3.188636541366577,
+      "learning_rate": 9.757585288517328e-06,
+      "loss": 1.3248,
+      "step": 243
+    },
+    {
+      "epoch": 0.7197640117994101,
+      "grad_norm": 3.591494560241699,
+      "learning_rate": 9.571319233963627e-06,
+      "loss": 1.4178,
+      "step": 244
+    },
+    {
+      "epoch": 0.7227138643067846,
+      "grad_norm": 3.371492385864258,
+      "learning_rate": 9.386426389071531e-06,
+      "loss": 1.4038,
+      "step": 245
+    },
+    {
+      "epoch": 0.7256637168141593,
+      "grad_norm": 3.8743629455566406,
+      "learning_rate": 9.202923209988198e-06,
+      "loss": 1.4212,
+      "step": 246
+    },
+    {
+      "epoch": 0.7286135693215339,
+      "grad_norm": 3.4136948585510254,
+      "learning_rate": 9.020826029175384e-06,
+      "loss": 1.2523,
+      "step": 247
+    },
+    {
+      "epoch": 0.7315634218289085,
+      "grad_norm": 4.003933906555176,
+      "learning_rate": 8.840151053955773e-06,
+      "loss": 1.3366,
+      "step": 248
+    },
+    {
+      "epoch": 0.7345132743362832,
+      "grad_norm": 4.340333461761475,
+      "learning_rate": 8.66091436507052e-06,
+      "loss": 1.217,
+      "step": 249
+    },
+    {
+      "epoch": 0.7374631268436578,
+      "grad_norm": 4.331326007843018,
+      "learning_rate": 8.483131915247968e-06,
+      "loss": 1.0392,
+      "step": 250
+    },
+    {
+      "epoch": 0.7404129793510325,
+      "grad_norm": 3.288041353225708,
+      "learning_rate": 8.30681952778379e-06,
+      "loss": 1.4832,
+      "step": 251
+    },
+    {
+      "epoch": 0.7433628318584071,
+      "grad_norm": 2.6984848976135254,
+      "learning_rate": 8.131992895132693e-06,
+      "loss": 1.3219,
+      "step": 252
+    },
+    {
+      "epoch": 0.7463126843657817,
+      "grad_norm": 2.689807415008545,
+      "learning_rate": 7.958667577511683e-06,
+      "loss": 1.2515,
+      "step": 253
+    },
+    {
+      "epoch": 0.7492625368731564,
+      "grad_norm": 2.8211238384246826,
+      "learning_rate": 7.786859001515196e-06,
+      "loss": 1.329,
+      "step": 254
+    },
+    {
+      "epoch": 0.7522123893805309,
+      "grad_norm": 2.54939866065979,
+      "learning_rate": 7.616582458742058e-06,
+      "loss": 1.229,
+      "step": 255
+    },
+    {
+      "epoch": 0.7551622418879056,
+      "grad_norm": 2.8110005855560303,
+      "learning_rate": 7.447853104434438e-06,
+      "loss": 1.1701,
+      "step": 256
+    },
+    {
+      "epoch": 0.7581120943952803,
+      "grad_norm": 2.3313047885894775,
+      "learning_rate": 7.280685956129049e-06,
+      "loss": 1.188,
+      "step": 257
+    },
+    {
+      "epoch": 0.7610619469026548,
+      "grad_norm": 2.415221691131592,
+      "learning_rate": 7.115095892320456e-06,
+      "loss": 1.1699,
+      "step": 258
+    },
+    {
+      "epoch": 0.7640117994100295,
+      "grad_norm": 2.5022096633911133,
+      "learning_rate": 6.951097651136889e-06,
+      "loss": 1.1899,
+      "step": 259
+    },
+    {
+      "epoch": 0.7669616519174042,
+      "grad_norm": 2.469304084777832,
+      "learning_rate": 6.788705829028483e-06,
+      "loss": 1.1482,
+      "step": 260
+    },
+    {
+      "epoch": 0.7699115044247787,
+      "grad_norm": 2.3378803730010986,
+      "learning_rate": 6.6279348794681065e-06,
+      "loss": 1.0611,
+      "step": 261
+    },
+    {
+      "epoch": 0.7728613569321534,
+      "grad_norm": 2.341405153274536,
+      "learning_rate": 6.468799111665003e-06,
+      "loss": 1.0434,
+      "step": 262
+    },
+    {
+      "epoch": 0.775811209439528,
+      "grad_norm": 2.2552103996276855,
+      "learning_rate": 6.311312689291166e-06,
+      "loss": 1.0761,
+      "step": 263
+    },
+    {
+      "epoch": 0.7787610619469026,
+      "grad_norm": 2.3256208896636963,
+      "learning_rate": 6.155489629220765e-06,
+      "loss": 1.0263,
+      "step": 264
+    },
+    {
+      "epoch": 0.7817109144542773,
+      "grad_norm": 2.1409356594085693,
+      "learning_rate": 6.001343800282569e-06,
+      "loss": 1.0184,
+      "step": 265
+    },
+    {
+      "epoch": 0.7846607669616519,
+      "grad_norm": 2.315657615661621,
+      "learning_rate": 5.848888922025553e-06,
+      "loss": 1.1478,
+      "step": 266
+    },
+    {
+      "epoch": 0.7876106194690266,
+      "grad_norm": 2.3074729442596436,
+      "learning_rate": 5.698138563497854e-06,
+      "loss": 1.1169,
+      "step": 267
+    },
+    {
+      "epoch": 0.7905604719764012,
+      "grad_norm": 2.2867608070373535,
+      "learning_rate": 5.549106142039018e-06,
+      "loss": 1.0809,
+      "step": 268
+    },
+    {
+      "epoch": 0.7935103244837758,
+      "grad_norm": 2.5681211948394775,
+      "learning_rate": 5.40180492208586e-06,
+      "loss": 1.2369,
+      "step": 269
+    },
+    {
+      "epoch": 0.7964601769911505,
+      "grad_norm": 2.640070915222168,
+      "learning_rate": 5.256248013991857e-06,
+      "loss": 1.0342,
+      "step": 270
+    },
+    {
+      "epoch": 0.799410029498525,
+      "grad_norm": 2.4330620765686035,
+      "learning_rate": 5.112448372860257e-06,
+      "loss": 1.1637,
+      "step": 271
+    },
+    {
+      "epoch": 0.8023598820058997,
+      "grad_norm": 2.549358367919922,
+      "learning_rate": 4.9704187973910635e-06,
+      "loss": 1.2075,
+      "step": 272
+    },
+    {
+      "epoch": 0.8053097345132744,
+      "grad_norm": 2.5039963722229004,
+      "learning_rate": 4.8301719287419e-06,
+      "loss": 1.1454,
+      "step": 273
+    },
+    {
+      "epoch": 0.8082595870206489,
+      "grad_norm": 2.678183078765869,
+      "learning_rate": 4.691720249402856e-06,
+      "loss": 1.3579,
+      "step": 274
+    },
+    {
+      "epoch": 0.8112094395280236,
+      "grad_norm": 2.9728591442108154,
+      "learning_rate": 4.555076082085563e-06,
+      "loss": 1.2104,
+      "step": 275
+    },
+    {
+      "epoch": 0.8141592920353983,
+      "grad_norm": 2.7286298274993896,
+      "learning_rate": 4.420251588626373e-06,
+      "loss": 1.3158,
+      "step": 276
+    },
+    {
+      "epoch": 0.8171091445427728,
+      "grad_norm": 2.934502124786377,
+      "learning_rate": 4.2872587689039484e-06,
+      "loss": 1.2516,
+      "step": 277
+    },
+    {
+      "epoch": 0.8200589970501475,
+      "grad_norm": 2.974820613861084,
+      "learning_rate": 4.1561094597712155e-06,
+      "loss": 1.2212,
+      "step": 278
+    },
+    {
+      "epoch": 0.8230088495575221,
+      "grad_norm": 2.7638723850250244,
+      "learning_rate": 4.026815334001821e-06,
+      "loss": 1.2607,
+      "step": 279
+    },
+    {
+      "epoch": 0.8259587020648967,
+      "grad_norm": 2.5246145725250244,
+      "learning_rate": 3.8993878992512415e-06,
+      "loss": 1.3047,
+      "step": 280
+    },
+    {
+      "epoch": 0.8289085545722714,
+      "grad_norm": 3.216073751449585,
+      "learning_rate": 3.7738384970325586e-06,
+      "loss": 1.4741,
+      "step": 281
+    },
+    {
+      "epoch": 0.831858407079646,
+      "grad_norm": 2.98732328414917,
+      "learning_rate": 3.650178301706983e-06,
+      "loss": 1.5002,
+      "step": 282
+    },
+    {
+      "epoch": 0.8348082595870207,
+      "grad_norm": 2.9341797828674316,
+      "learning_rate": 3.5284183194893488e-06,
+      "loss": 1.4045,
+      "step": 283
+    },
+    {
+      "epoch": 0.8377581120943953,
+      "grad_norm": 2.846006155014038,
+      "learning_rate": 3.4085693874684883e-06,
+      "loss": 1.4208,
+      "step": 284
+    },
+    {
+      "epoch": 0.8407079646017699,
+      "grad_norm": 3.1104767322540283,
+      "learning_rate": 3.290642172642686e-06,
+      "loss": 1.425,
+      "step": 285
+    },
+    {
+      "epoch": 0.8436578171091446,
+      "grad_norm": 3.0818898677825928,
+      "learning_rate": 3.1746471709702964e-06,
+      "loss": 1.3676,
+      "step": 286
+    },
+    {
+      "epoch": 0.8466076696165191,
+      "grad_norm": 3.1798288822174072,
+      "learning_rate": 3.06059470643556e-06,
+      "loss": 1.4554,
+      "step": 287
+    },
+    {
+      "epoch": 0.8495575221238938,
+      "grad_norm": 3.153404474258423,
+      "learning_rate": 2.9484949301297166e-06,
+      "loss": 1.4836,
+      "step": 288
+    },
+    {
+      "epoch": 0.8525073746312685,
+      "grad_norm": 3.0223796367645264,
+      "learning_rate": 2.8383578193475315e-06,
+      "loss": 1.3482,
+      "step": 289
+    },
+    {
+      "epoch": 0.855457227138643,
+      "grad_norm": 3.494483232498169,
+      "learning_rate": 2.7301931766992917e-06,
+      "loss": 1.3703,
+      "step": 290
+    },
+    {
+      "epoch": 0.8584070796460177,
+      "grad_norm": 3.410726308822632,
+      "learning_rate": 2.6240106292383022e-06,
+      "loss": 1.4501,
+      "step": 291
+    },
+    {
+      "epoch": 0.8613569321533924,
+      "grad_norm": 3.2595279216766357,
+      "learning_rate": 2.5198196276040782e-06,
+      "loss": 1.4239,
+      "step": 292
+    },
+    {
+      "epoch": 0.8643067846607669,
+      "grad_norm": 3.2281689643859863,
+      "learning_rate": 2.417629445181194e-06,
+      "loss": 1.3895,
+      "step": 293
+    },
+    {
+      "epoch": 0.8672566371681416,
+      "grad_norm": 3.821483612060547,
+      "learning_rate": 2.3174491772738894e-06,
+      "loss": 1.4774,
+      "step": 294
+    },
+    {
+      "epoch": 0.8702064896755162,
+      "grad_norm": 3.490224838256836,
+      "learning_rate": 2.219287740296605e-06,
+      "loss": 1.3942,
+      "step": 295
+    },
+    {
+      "epoch": 0.8731563421828908,
+      "grad_norm": 3.725832223892212,
+      "learning_rate": 2.1231538709803487e-06,
+      "loss": 1.3763,
+      "step": 296
+    },
+    {
+      "epoch": 0.8761061946902655,
+      "grad_norm": 3.589167833328247,
+      "learning_rate": 2.0290561255950967e-06,
+      "loss": 1.3114,
+      "step": 297
+    },
+    {
+      "epoch": 0.8790560471976401,
+      "grad_norm": 4.201038360595703,
+      "learning_rate": 1.937002879188285e-06,
+      "loss": 1.3013,
+      "step": 298
+    },
+    {
+      "epoch": 0.8820058997050148,
+      "grad_norm": 4.00734281539917,
+      "learning_rate": 1.8470023248393531e-06,
+      "loss": 1.1628,
+      "step": 299
+    },
+    {
+      "epoch": 0.8849557522123894,
+      "grad_norm": 4.654489040374756,
+      "learning_rate": 1.75906247293057e-06,
+      "loss": 1.0854,
+      "step": 300
+    },
+    {
+      "epoch": 0.887905604719764,
+      "grad_norm": 2.812892436981201,
+      "learning_rate": 1.673191150434067e-06,
+      "loss": 1.2263,
+      "step": 301
+    },
+    {
+      "epoch": 0.8908554572271387,
+      "grad_norm": 2.76861834526062,
+      "learning_rate": 1.5893960002151903e-06,
+      "loss": 1.215,
+      "step": 302
+    },
+    {
+      "epoch": 0.8938053097345132,
+      "grad_norm": 2.6831042766571045,
+      "learning_rate": 1.5076844803522922e-06,
+      "loss": 1.1712,
+      "step": 303
+    },
+    {
+      "epoch": 0.8967551622418879,
+      "grad_norm": 2.251847982406616,
+      "learning_rate": 1.428063863472895e-06,
+      "loss": 1.1735,
+      "step": 304
+    },
+    {
+      "epoch": 0.8997050147492626,
+      "grad_norm": 2.226048707962036,
+      "learning_rate": 1.3505412361064395e-06,
+      "loss": 1.1624,
+      "step": 305
+    },
+    {
+      "epoch": 0.9026548672566371,
+      "grad_norm": 2.3777687549591064,
+      "learning_rate": 1.275123498053532e-06,
+      "loss": 1.1533,
+      "step": 306
+    },
+    {
+      "epoch": 0.9056047197640118,
+      "grad_norm": 2.281771183013916,
+      "learning_rate": 1.201817361771837e-06,
+      "loss": 1.1053,
+      "step": 307
+    },
+    {
+      "epoch": 0.9085545722713865,
+      "grad_norm": 2.2913832664489746,
+      "learning_rate": 1.1306293517786614e-06,
+      "loss": 1.1222,
+      "step": 308
+    },
+    {
+      "epoch": 0.911504424778761,
+      "grad_norm": 2.3121795654296875,
+      "learning_rate": 1.0615658040702275e-06,
+      "loss": 0.9242,
+      "step": 309
+    },
+    {
+      "epoch": 0.9144542772861357,
+      "grad_norm": 2.1711535453796387,
+      "learning_rate": 9.946328655577624e-07,
+      "loss": 1.089,
+      "step": 310
+    },
+    {
+      "epoch": 0.9174041297935103,
+      "grad_norm": 2.23848819732666,
+      "learning_rate": 9.298364935203918e-07,
+      "loss": 1.1265,
+      "step": 311
+    },
+    {
+      "epoch": 0.9203539823008849,
+      "grad_norm": 2.195310354232788,
+      "learning_rate": 8.671824550749164e-07,
+      "loss": 1.0486,
+      "step": 312
+    },
+    {
+      "epoch": 0.9233038348082596,
+      "grad_norm": 2.1637182235717773,
+      "learning_rate": 8.066763266625282e-07,
+      "loss": 1.0427,
+      "step": 313
+    },
+    {
+      "epoch": 0.9262536873156342,
+      "grad_norm": 2.4640908241271973,
+      "learning_rate": 7.483234935524802e-07,
+      "loss": 1.2038,
+      "step": 314
+    },
+    {
+      "epoch": 0.9292035398230089,
+      "grad_norm": 2.158703565597534,
+      "learning_rate": 6.921291493627747e-07,
+      "loss": 1.1089,
+      "step": 315
+    },
+    {
+      "epoch": 0.9321533923303835,
+      "grad_norm": 2.3314948081970215,
+      "learning_rate": 6.380982955979192e-07,
+      "loss": 1.03,
+      "step": 316
+    },
+    {
+      "epoch": 0.9351032448377581,
+      "grad_norm": 2.3510630130767822,
+      "learning_rate": 5.862357412037666e-07,
+      "loss": 1.2572,
+      "step": 317
+    },
+    {
+      "epoch": 0.9380530973451328,
+      "grad_norm": 2.3725976943969727,
+      "learning_rate": 5.365461021395096e-07,
+      "loss": 1.0767,
+      "step": 318
+    },
+    {
+      "epoch": 0.9410029498525073,
+      "grad_norm": 2.5629117488861084,
+      "learning_rate": 4.890338009668316e-07,
+      "loss": 1.3024,
+      "step": 319
+    },
+    {
+      "epoch": 0.943952802359882,
+      "grad_norm": 2.434873580932617,
+      "learning_rate": 4.437030664562969e-07,
+      "loss": 1.1805,
+      "step": 320
+    },
+    {
+      "epoch": 0.9469026548672567,
+      "grad_norm": 2.730053424835205,
+      "learning_rate": 4.0055793321096266e-07,
+      "loss": 1.1859,
+      "step": 321
+    },
+    {
+      "epoch": 0.9498525073746312,
+      "grad_norm": 2.6475114822387695,
+      "learning_rate": 3.5960224130728857e-07,
+      "loss": 1.2116,
+      "step": 322
+    },
+    {
+      "epoch": 0.9528023598820059,
+      "grad_norm": 2.482257843017578,
+      "learning_rate": 3.208396359533572e-07,
+      "loss": 1.147,
+      "step": 323
+    },
+    {
+      "epoch": 0.9557522123893806,
+      "grad_norm": 2.9738566875457764,
+      "learning_rate": 2.8427356716443367e-07,
+      "loss": 1.3316,
+      "step": 324
+    },
+    {
+      "epoch": 0.9587020648967551,
+      "grad_norm": 2.721967935562134,
+      "learning_rate": 2.499072894559057e-07,
+      "loss": 1.4408,
+      "step": 325
+    },
+    {
+      "epoch": 0.9616519174041298,
+      "grad_norm": 2.898231267929077,
+      "learning_rate": 2.1774386155361538e-07,
+      "loss": 1.4893,
+      "step": 326
+    },
+    {
+      "epoch": 0.9646017699115044,
+      "grad_norm": 3.0794098377227783,
+      "learning_rate": 1.8778614612162404e-07,
+      "loss": 1.4212,
+      "step": 327
+    },
+    {
+      "epoch": 0.967551622418879,
+      "grad_norm": 3.1514523029327393,
+      "learning_rate": 1.6003680950742728e-07,
+      "loss": 1.517,
+      "step": 328
+    },
+    {
+      "epoch": 0.9705014749262537,
+      "grad_norm": 3.139253616333008,
+      "learning_rate": 1.3449832150462805e-07,
+      "loss": 1.4702,
+      "step": 329
+    },
+    {
+      "epoch": 0.9734513274336283,
+      "grad_norm": 3.4178974628448486,
+      "learning_rate": 1.1117295513313475e-07,
+      "loss": 1.417,
+      "step": 330
+    },
+    {
+      "epoch": 0.976401179941003,
+      "grad_norm": 3.226001262664795,
+      "learning_rate": 9.006278643683696e-08,
+      "loss": 1.4643,
+      "step": 331
+    },
+    {
+      "epoch": 0.9793510324483776,
+      "grad_norm": 3.3434715270996094,
+      "learning_rate": 7.116969429883935e-08,
+      "loss": 1.3875,
+      "step": 332
+    },
+    {
+      "epoch": 0.9823008849557522,
+      "grad_norm": 3.6946794986724854,
+      "learning_rate": 5.4495360274231524e-08,
+      "loss": 1.4149,
+      "step": 333
+    },
+    {
+      "epoch": 0.9852507374631269,
+      "grad_norm": 3.661628246307373,
+      "learning_rate": 4.004126844042444e-08,
+      "loss": 1.4018,
+      "step": 334
+    },
+    {
+      "epoch": 0.9882005899705014,
+      "grad_norm": 3.326308012008667,
+      "learning_rate": 2.7808705265053305e-08,
+      "loss": 1.3202,
+      "step": 335
+    },
+    {
+      "epoch": 0.9911504424778761,
+      "grad_norm": 3.6585474014282227,
+      "learning_rate": 1.779875949149967e-08,
+      "loss": 1.333,
+      "step": 336
+    },
+    {
+      "epoch": 0.9941002949852508,
+      "grad_norm": 4.121379375457764,
+      "learning_rate": 1.0012322041960676e-08,
+      "loss": 1.3641,
+      "step": 337
+    },
+    {
+      "epoch": 0.9970501474926253,
+      "grad_norm": 4.23801851272583,
+      "learning_rate": 4.450085938170756e-09,
+      "loss": 1.2688,
+      "step": 338
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 5.26812744140625,
+      "learning_rate": 1.1125462397260089e-09,
+      "loss": 1.11,
+      "step": 339
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 339,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.1390262342959104e+17,
+  "train_batch_size": 40,
+  "trial_name": null,
+  "trial_params": null
+}
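
The `learning_rate` values logged above look like linear warmup followed by cosine decay. The sketch below tries to reproduce them. It is a reading of the numbers, not something the commit states. The peak rate of 5e-5 is inferred by fitting the logged values, and the 6 warmup steps are a guess based on the `ws6` suffix in the directory name. The helper name `lr_at_logged_step` is hypothetical. The fitted peak also differs from the `lr1e-05` in the directory name, so treat both constants as assumptions.

```python
import math

# Assumed values, inferred from the logged numbers rather than stated in the source.
PEAK_LR = 5e-5        # fitted to the logged learning rates
WARMUP_STEPS = 6      # guessed from the "ws6" suffix of the run directory
TOTAL_STEPS = 339     # "max_steps" in trainer_state.json


def lr_at_logged_step(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate expected in the log entry for 1-based `step`.

    Assumes the logged value is the rate in effect during that step,
    i.e. the schedule evaluated after `step - 1` scheduler updates.
    """
    k = step - 1
    if k < warmup:
        # Linear warmup from 0 up to the peak rate.
        return peak_lr * k / max(1, warmup)
    # Cosine decay from the peak rate down towards 0.
    progress = (k - warmup) / max(1, total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (200, 229, 300, 339):
        print(step, lr_at_logged_step(step))
```

Under these assumptions the function agrees with the logged rates at steps 229, 300, and 339 to several significant digits. That fit supports the cosine-decay reading, but it does not confirm the exact scheduler configuration used for the run.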

ep1_lr1e-05_bs16_wd0001_ws6/checkpoint-339/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c750c736195636f8dc7c7a3baf13a9603e67d3864aae2d456029352a30e549a9
+size 6225
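
The three added lines above are a Git LFS pointer, not the binary `training_args.bin` itself. The pointer records the spec version, the SHA-256 object id, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and stripping a leading `+` is only a convenience for text copied out of a diff view like this one.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for raw in text.splitlines():
        # Tolerate the '+' prefix that appears when the pointer is copied from a diff.
        line = raw.strip().lstrip("+")
        if not line:
            continue
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c750c736195636f8dc7c7a3baf13a9603e67d3864aae2d456029352a30e549a9\n"
        "size 6225\n"
    )
    print(parse_lfs_pointer(pointer))
```

The digest and size can be compared against a downloaded copy of the file to check that the object fetched from LFS storage is the one this commit refers to.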

outputs_pretrained/README.md ADDED

	@@ -0,0 +1,59 @@

+---
+base_model: unsloth/mistral-7b-v0.3-bnb-4bit
+library_name: transformers
+model_name: outputs_pretrained
+tags:
+- generated_from_trainer
+- unsloth
+- trl
+- sft
+licence: license
+---
+# Model Card for outputs_pretrained
+This model is a fine-tuned version of [unsloth/mistral-7b-v0.3-bnb-4bit](https://huggingface.co/unsloth/mistral-7b-v0.3-bnb-4bit).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="None", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- TRL: 0.22.2
+- Transformers: 4.56.2
+- Pytorch: 2.9.0
+- Datasets: 4.3.0
+- Tokenizers: 0.22.1
+## Citations
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```

outputs_pretrained/checkpoint-500/README.md ADDED

	@@ -0,0 +1,210 @@

+---
+base_model: unsloth/mistral-7b-v0.3-bnb-4bit
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/mistral-7b-v0.3-bnb-4bit
+- lora
+- sft
+- transformers
+- trl
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

outputs_pretrained/checkpoint-500/adapter_config.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "MistralForCausalLM",
+    "parent_library": "transformers.models.mistral.modeling_mistral",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "unsloth/mistral-7b-v0.3-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "gate_proj",
+    "q_proj",
+    "o_proj",
+    "down_proj",
+    "k_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
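
The adapter config above fixes the LoRA rank (`r: 64`), `lora_alpha: 128`, and seven target projections. As a rough illustration of what those numbers imply, the sketch below computes the LoRA scaling factor and an estimate of the LoRA parameter count. The projection shapes and the layer count are assumptions about the Mistral-7B base model and do not appear in this diff; the helper names are hypothetical.

```python
import json
import math

# Fields copied from the adapter_config.json above (other keys omitted).
ADAPTER_CONFIG = json.loads("""
{
  "r": 64,
  "lora_alpha": 128,
  "use_rslora": false,
  "target_modules": ["up_proj", "gate_proj", "q_proj", "o_proj",
                     "down_proj", "k_proj", "v_proj"]
}
""")

# ASSUMPTION (not in the diff): (in_features, out_features) of each projection
# in one Mistral-7B decoder layer, and the number of decoder layers.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
ASSUMED_NUM_LAYERS = 32


def lora_scaling(cfg):
    """Factor applied to the low-rank update B @ A."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    # Rank-stabilized LoRA divides by sqrt(r); standard LoRA divides by r.
    return alpha / math.sqrt(r) if cfg.get("use_rslora") else alpha / r


def lora_param_count(cfg, shapes, num_layers):
    """Count A (r x in) and B (out x r) entries over all targeted modules."""
    r = cfg["r"]
    per_layer = sum(r * (shapes[name][0] + shapes[name][1])
                    for name in cfg["target_modules"])
    return per_layer * num_layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    print("LoRA parameters:",
          lora_param_count(ADAPTER_CONFIG, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS))
```

The count covers only the low-rank matrices. Because `modules_to_save` lists `embed_tokens` and `lm_head`, full copies of those two modules are stored with the adapter as well, which this estimate leaves out.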

outputs_pretrained/checkpoint-500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f1f1fe1fcfd215823b6c595306a41ef2d5a7c98f5e711be74427cc032593e61
+size 1208020312

outputs_pretrained/checkpoint-500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a9b0cd84e1cdaddf6c48aa103fea021a12455e51af78913ee4d58be3d4ba68fa
+size 1687688427

outputs_pretrained/checkpoint-500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c800b778fa7e115e4c34de8529902de8b61c9a1b4bab3eb8295d06dafff030e
+size 14645

outputs_pretrained/checkpoint-500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0973a6755517d124a0bc79d8dd1a790f15a152e4ba0a928ed07de4563e894e0
+size 1465

outputs_pretrained/checkpoint-500/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[control_768]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

outputs_pretrained/checkpoint-500/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

outputs_pretrained/checkpoint-500/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

outputs_pretrained/checkpoint-500/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff
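
The `log_history` in `trainer_state.json`, the next file in this commit, records a learning rate that climbs from 0 to 5e-05 over the first ten steps and then falls by a constant amount per step. The sketch below reproduces that pattern with a linear warmup followed by a linear decay. The peak rate, warmup length, and total step count are inferred from the logged values, not read from the training arguments, so treat them as assumptions; the function names are hypothetical.

```python
import math

# ASSUMPTIONS inferred from the logged values, not taken from training_args.bin.
ASSUMED_PEAK_LR = 5e-05
ASSUMED_WARMUP_STEPS = 10
ASSUMED_TOTAL_STEPS = 846  # 2 epochs x 423 steps, since step 1 logs epoch 1/423


def linear_warmup_decay(index, peak_lr, warmup_steps, total_steps):
    """Learning rate after `index` scheduler updates."""
    if index < warmup_steps:
        # Linear ramp from 0 up to the peak.
        return peak_lr * index / warmup_steps
    # Linear decay from the peak down to 0 at total_steps.
    remaining = max(0, total_steps - index)
    return peak_lr * remaining / (total_steps - warmup_steps)


def predicted_lr(logged_step):
    """The value logged at step s matches the schedule at index s - 1."""
    return linear_warmup_decay(logged_step - 1, ASSUMED_PEAK_LR,
                               ASSUMED_WARMUP_STEPS, ASSUMED_TOTAL_STEPS)


# A few (step, learning_rate) pairs copied from log_history.
LOGGED_LR = {
    1: 0.0,
    2: 5e-06,
    11: 5e-05,
    12: 4.994019138755981e-05,
    30: 4.886363636363637e-05,
    87: 4.545454545454546e-05,
    160: 4.1088516746411486e-05,
}

if __name__ == "__main__":
    for step, logged in LOGGED_LR.items():
        ok = math.isclose(predicted_lr(step), logged, rel_tol=1e-9, abs_tol=1e-12)
        print(step, logged, predicted_lr(step), "match" if ok else "differs")
```

If the inferred step count holds, checkpoint-500 sits at roughly 59% of the planned schedule, which agrees with the logged `epoch` of about 1.18 out of 2.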

outputs_pretrained/checkpoint-500/trainer_state.json ADDED

	@@ -0,0 +1,3534 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.1820330969267139,
+  "eval_steps": 500,
+  "global_step": 500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.002364066193853428,
+      "grad_norm": 9.786882400512695,
+      "learning_rate": 0.0,
+      "loss": 3.2811,
+      "step": 1
+    },
+    {
+      "epoch": 0.004728132387706856,
+      "grad_norm": 10.056135177612305,
+      "learning_rate": 5e-06,
+      "loss": 3.2727,
+      "step": 2
+    },
+    {
+      "epoch": 0.0070921985815602835,
+      "grad_norm": 10.060759544372559,
+      "learning_rate": 1e-05,
+      "loss": 3.4002,
+      "step": 3
+    },
+    {
+      "epoch": 0.009456264775413711,
+      "grad_norm": 8.956620216369629,
+      "learning_rate": 1.5e-05,
+      "loss": 3.3498,
+      "step": 4
+    },
+    {
+      "epoch": 0.01182033096926714,
+      "grad_norm": 7.450173854827881,
+      "learning_rate": 2e-05,
+      "loss": 3.3485,
+      "step": 5
+    },
+    {
+      "epoch": 0.014184397163120567,
+      "grad_norm": 7.106024265289307,
+      "learning_rate": 2.5e-05,
+      "loss": 3.1664,
+      "step": 6
+    },
+    {
+      "epoch": 0.016548463356973995,
+      "grad_norm": 7.5257415771484375,
+      "learning_rate": 3e-05,
+      "loss": 3.087,
+      "step": 7
+    },
+    {
+      "epoch": 0.018912529550827423,
+      "grad_norm": 7.290092468261719,
+      "learning_rate": 3.5e-05,
+      "loss": 2.9256,
+      "step": 8
+    },
+    {
+      "epoch": 0.02127659574468085,
+      "grad_norm": 6.230826377868652,
+      "learning_rate": 4e-05,
+      "loss": 2.83,
+      "step": 9
+    },
+    {
+      "epoch": 0.02364066193853428,
+      "grad_norm": 7.3913679122924805,
+      "learning_rate": 4.5e-05,
+      "loss": 2.8809,
+      "step": 10
+    },
+    {
+      "epoch": 0.026004728132387706,
+      "grad_norm": 7.500810146331787,
+      "learning_rate": 5e-05,
+      "loss": 2.9329,
+      "step": 11
+    },
+    {
+      "epoch": 0.028368794326241134,
+      "grad_norm": 6.774663925170898,
+      "learning_rate": 4.994019138755981e-05,
+      "loss": 2.8633,
+      "step": 12
+    },
+    {
+      "epoch": 0.030732860520094562,
+      "grad_norm": 8.217077255249023,
+      "learning_rate": 4.988038277511962e-05,
+      "loss": 2.779,
+      "step": 13
+    },
+    {
+      "epoch": 0.03309692671394799,
+      "grad_norm": 6.935091495513916,
+      "learning_rate": 4.982057416267943e-05,
+      "loss": 2.7578,
+      "step": 14
+    },
+    {
+      "epoch": 0.03546099290780142,
+      "grad_norm": 7.576630592346191,
+      "learning_rate": 4.9760765550239234e-05,
+      "loss": 2.7023,
+      "step": 15
+    },
+    {
+      "epoch": 0.037825059101654845,
+      "grad_norm": 5.840672492980957,
+      "learning_rate": 4.970095693779905e-05,
+      "loss": 2.6723,
+      "step": 16
+    },
+    {
+      "epoch": 0.04018912529550828,
+      "grad_norm": 5.728421211242676,
+      "learning_rate": 4.964114832535885e-05,
+      "loss": 2.6571,
+      "step": 17
+    },
+    {
+      "epoch": 0.0425531914893617,
+      "grad_norm": 5.6147990226745605,
+      "learning_rate": 4.958133971291866e-05,
+      "loss": 2.712,
+      "step": 18
+    },
+    {
+      "epoch": 0.04491725768321513,
+      "grad_norm": 5.601494789123535,
+      "learning_rate": 4.952153110047847e-05,
+      "loss": 2.6546,
+      "step": 19
+    },
+    {
+      "epoch": 0.04728132387706856,
+      "grad_norm": 5.493714332580566,
+      "learning_rate": 4.946172248803828e-05,
+      "loss": 2.599,
+      "step": 20
+    },
+    {
+      "epoch": 0.04964539007092199,
+      "grad_norm": 5.742792129516602,
+      "learning_rate": 4.940191387559809e-05,
+      "loss": 2.6123,
+      "step": 21
+    },
+    {
+      "epoch": 0.05200945626477541,
+      "grad_norm": 4.954222202301025,
+      "learning_rate": 4.9342105263157894e-05,
+      "loss": 2.4297,
+      "step": 22
+    },
+    {
+      "epoch": 0.054373522458628844,
+      "grad_norm": 5.282480239868164,
+      "learning_rate": 4.928229665071771e-05,
+      "loss": 2.5279,
+      "step": 23
+    },
+    {
+      "epoch": 0.05673758865248227,
+      "grad_norm": 5.26470947265625,
+      "learning_rate": 4.922248803827751e-05,
+      "loss": 2.4522,
+      "step": 24
+    },
+    {
+      "epoch": 0.0591016548463357,
+      "grad_norm": 5.375247001647949,
+      "learning_rate": 4.916267942583732e-05,
+      "loss": 2.5739,
+      "step": 25
+    },
+    {
+      "epoch": 0.061465721040189124,
+      "grad_norm": 5.001519203186035,
+      "learning_rate": 4.910287081339713e-05,
+      "loss": 2.5496,
+      "step": 26
+    },
+    {
+      "epoch": 0.06382978723404255,
+      "grad_norm": 5.700991630554199,
+      "learning_rate": 4.904306220095694e-05,
+      "loss": 2.5462,
+      "step": 27
+    },
+    {
+      "epoch": 0.06619385342789598,
+      "grad_norm": 5.051942348480225,
+      "learning_rate": 4.898325358851675e-05,
+      "loss": 2.3694,
+      "step": 28
+    },
+    {
+      "epoch": 0.06855791962174941,
+      "grad_norm": 5.411677837371826,
+      "learning_rate": 4.892344497607656e-05,
+      "loss": 2.5297,
+      "step": 29
+    },
+    {
+      "epoch": 0.07092198581560284,
+      "grad_norm": 4.419405937194824,
+      "learning_rate": 4.886363636363637e-05,
+      "loss": 2.5009,
+      "step": 30
+    },
+    {
+      "epoch": 0.07328605200945626,
+      "grad_norm": 5.185110569000244,
+      "learning_rate": 4.880382775119617e-05,
+      "loss": 2.4033,
+      "step": 31
+    },
+    {
+      "epoch": 0.07565011820330969,
+      "grad_norm": 5.229886054992676,
+      "learning_rate": 4.874401913875598e-05,
+      "loss": 2.4766,
+      "step": 32
+    },
+    {
+      "epoch": 0.07801418439716312,
+      "grad_norm": 4.489797115325928,
+      "learning_rate": 4.868421052631579e-05,
+      "loss": 2.4468,
+      "step": 33
+    },
+    {
+      "epoch": 0.08037825059101655,
+      "grad_norm": 4.7152419090271,
+      "learning_rate": 4.86244019138756e-05,
+      "loss": 2.4374,
+      "step": 34
+    },
+    {
+      "epoch": 0.08274231678486997,
+      "grad_norm": 5.1662278175354,
+      "learning_rate": 4.8564593301435404e-05,
+      "loss": 2.3898,
+      "step": 35
+    },
+    {
+      "epoch": 0.0851063829787234,
+      "grad_norm": 4.726010322570801,
+      "learning_rate": 4.850478468899522e-05,
+      "loss": 2.5208,
+      "step": 36
+    },
+    {
+      "epoch": 0.08747044917257683,
+      "grad_norm": 4.929504871368408,
+      "learning_rate": 4.844497607655503e-05,
+      "loss": 2.4105,
+      "step": 37
+    },
+    {
+      "epoch": 0.08983451536643026,
+      "grad_norm": 4.676382064819336,
+      "learning_rate": 4.838516746411483e-05,
+      "loss": 2.3955,
+      "step": 38
+    },
+    {
+      "epoch": 0.09219858156028368,
+      "grad_norm": 5.068330764770508,
+      "learning_rate": 4.832535885167465e-05,
+      "loss": 2.3171,
+      "step": 39
+    },
+    {
+      "epoch": 0.09456264775413711,
+      "grad_norm": 4.925624847412109,
+      "learning_rate": 4.826555023923445e-05,
+      "loss": 2.3936,
+      "step": 40
+    },
+    {
+      "epoch": 0.09692671394799054,
+      "grad_norm": 4.296761989593506,
+      "learning_rate": 4.820574162679426e-05,
+      "loss": 2.3958,
+      "step": 41
+    },
+    {
+      "epoch": 0.09929078014184398,
+      "grad_norm": 4.61798095703125,
+      "learning_rate": 4.8145933014354064e-05,
+      "loss": 2.4068,
+      "step": 42
+    },
+    {
+      "epoch": 0.1016548463356974,
+      "grad_norm": 5.4643025398254395,
+      "learning_rate": 4.808612440191388e-05,
+      "loss": 2.3982,
+      "step": 43
+    },
+    {
+      "epoch": 0.10401891252955082,
+      "grad_norm": 4.580990314483643,
+      "learning_rate": 4.802631578947368e-05,
+      "loss": 2.346,
+      "step": 44
+    },
+    {
+      "epoch": 0.10638297872340426,
+      "grad_norm": 4.549551963806152,
+      "learning_rate": 4.796650717703349e-05,
+      "loss": 2.2992,
+      "step": 45
+    },
+    {
+      "epoch": 0.10874704491725769,
+      "grad_norm": 4.603874206542969,
+      "learning_rate": 4.790669856459331e-05,
+      "loss": 2.3436,
+      "step": 46
+    },
+    {
+      "epoch": 0.1111111111111111,
+      "grad_norm": 4.180109977722168,
+      "learning_rate": 4.784688995215311e-05,
+      "loss": 2.3465,
+      "step": 47
+    },
+    {
+      "epoch": 0.11347517730496454,
+      "grad_norm": 4.274153232574463,
+      "learning_rate": 4.778708133971292e-05,
+      "loss": 2.3173,
+      "step": 48
+    },
+    {
+      "epoch": 0.11583924349881797,
+      "grad_norm": 4.787023544311523,
+      "learning_rate": 4.772727272727273e-05,
+      "loss": 2.3225,
+      "step": 49
+    },
+    {
+      "epoch": 0.1182033096926714,
+      "grad_norm": 4.5308146476745605,
+      "learning_rate": 4.766746411483254e-05,
+      "loss": 2.2716,
+      "step": 50
+    },
+    {
+      "epoch": 0.12056737588652482,
+      "grad_norm": 4.548710823059082,
+      "learning_rate": 4.760765550239234e-05,
+      "loss": 2.277,
+      "step": 51
+    },
+    {
+      "epoch": 0.12293144208037825,
+      "grad_norm": 4.569216251373291,
+      "learning_rate": 4.754784688995216e-05,
+      "loss": 2.3076,
+      "step": 52
+    },
+    {
+      "epoch": 0.12529550827423167,
+      "grad_norm": 4.4847493171691895,
+      "learning_rate": 4.748803827751196e-05,
+      "loss": 2.3058,
+      "step": 53
+    },
+    {
+      "epoch": 0.1276595744680851,
+      "grad_norm": 4.511356353759766,
+      "learning_rate": 4.742822966507177e-05,
+      "loss": 2.1524,
+      "step": 54
+    },
+    {
+      "epoch": 0.13002364066193853,
+      "grad_norm": 4.245441436767578,
+      "learning_rate": 4.736842105263158e-05,
+      "loss": 2.2795,
+      "step": 55
+    },
+    {
+      "epoch": 0.13238770685579196,
+      "grad_norm": 4.903697490692139,
+      "learning_rate": 4.730861244019139e-05,
+      "loss": 2.2934,
+      "step": 56
+    },
+    {
+      "epoch": 0.1347517730496454,
+      "grad_norm": 4.915858745574951,
+      "learning_rate": 4.72488038277512e-05,
+      "loss": 2.271,
+      "step": 57
+    },
+    {
+      "epoch": 0.13711583924349882,
+      "grad_norm": 4.528074264526367,
+      "learning_rate": 4.7188995215311e-05,
+      "loss": 2.2712,
+      "step": 58
+    },
+    {
+      "epoch": 0.13947990543735225,
+      "grad_norm": 4.402848720550537,
+      "learning_rate": 4.712918660287082e-05,
+      "loss": 2.2441,
+      "step": 59
+    },
+    {
+      "epoch": 0.14184397163120568,
+      "grad_norm": 4.317126274108887,
+      "learning_rate": 4.706937799043062e-05,
+      "loss": 2.3081,
+      "step": 60
+    },
+    {
+      "epoch": 0.14420803782505912,
+      "grad_norm": 4.551846027374268,
+      "learning_rate": 4.700956937799043e-05,
+      "loss": 2.33,
+      "step": 61
+    },
+    {
+      "epoch": 0.14657210401891252,
+      "grad_norm": 4.294714450836182,
+      "learning_rate": 4.694976076555024e-05,
+      "loss": 2.2291,
+      "step": 62
+    },
+    {
+      "epoch": 0.14893617021276595,
+      "grad_norm": 4.264118194580078,
+      "learning_rate": 4.688995215311005e-05,
+      "loss": 2.187,
+      "step": 63
+    },
+    {
+      "epoch": 0.15130023640661938,
+      "grad_norm": 4.644858360290527,
+      "learning_rate": 4.683014354066986e-05,
+      "loss": 2.2767,
+      "step": 64
+    },
+    {
+      "epoch": 0.1536643026004728,
+      "grad_norm": 4.4240522384643555,
+      "learning_rate": 4.677033492822967e-05,
+      "loss": 2.2365,
+      "step": 65
+    },
+    {
+      "epoch": 0.15602836879432624,
+      "grad_norm": 4.994328022003174,
+      "learning_rate": 4.671052631578948e-05,
+      "loss": 2.1967,
+      "step": 66
+    },
+    {
+      "epoch": 0.15839243498817968,
+      "grad_norm": 4.091360092163086,
+      "learning_rate": 4.665071770334928e-05,
+      "loss": 2.1983,
+      "step": 67
+    },
+    {
+      "epoch": 0.1607565011820331,
+      "grad_norm": 4.4661760330200195,
+      "learning_rate": 4.659090909090909e-05,
+      "loss": 2.2006,
+      "step": 68
+    },
+    {
+      "epoch": 0.16312056737588654,
+      "grad_norm": 4.753389358520508,
+      "learning_rate": 4.65311004784689e-05,
+      "loss": 2.2816,
+      "step": 69
+    },
+    {
+      "epoch": 0.16548463356973994,
+      "grad_norm": 4.429995059967041,
+      "learning_rate": 4.647129186602871e-05,
+      "loss": 2.2732,
+      "step": 70
+    },
+    {
+      "epoch": 0.16784869976359337,
+      "grad_norm": 4.254610061645508,
+      "learning_rate": 4.641148325358852e-05,
+      "loss": 2.1097,
+      "step": 71
+    },
+    {
+      "epoch": 0.1702127659574468,
+      "grad_norm": 4.007920265197754,
+      "learning_rate": 4.635167464114833e-05,
+      "loss": 2.2057,
+      "step": 72
+    },
+    {
+      "epoch": 0.17257683215130024,
+      "grad_norm": 4.3318772315979,
+      "learning_rate": 4.629186602870814e-05,
+      "loss": 2.2074,
+      "step": 73
+    },
+    {
+      "epoch": 0.17494089834515367,
+      "grad_norm": 5.013289928436279,
+      "learning_rate": 4.623205741626794e-05,
+      "loss": 2.3091,
+      "step": 74
+    },
+    {
+      "epoch": 0.1773049645390071,
+      "grad_norm": 4.14231538772583,
+      "learning_rate": 4.617224880382776e-05,
+      "loss": 2.2149,
+      "step": 75
+    },
+    {
+      "epoch": 0.17966903073286053,
+      "grad_norm": 4.8862762451171875,
+      "learning_rate": 4.611244019138756e-05,
+      "loss": 2.2153,
+      "step": 76
+    },
+    {
+      "epoch": 0.18203309692671396,
+      "grad_norm": 4.4072489738464355,
+      "learning_rate": 4.605263157894737e-05,
+      "loss": 2.256,
+      "step": 77
+    },
+    {
+      "epoch": 0.18439716312056736,
+      "grad_norm": 4.19016170501709,
+      "learning_rate": 4.599282296650718e-05,
+      "loss": 2.1268,
+      "step": 78
+    },
+    {
+      "epoch": 0.1867612293144208,
+      "grad_norm": 4.425796031951904,
+      "learning_rate": 4.593301435406699e-05,
+      "loss": 2.2441,
+      "step": 79
+    },
+    {
+      "epoch": 0.18912529550827423,
+      "grad_norm": 4.253169536590576,
+      "learning_rate": 4.58732057416268e-05,
+      "loss": 2.2179,
+      "step": 80
+    },
+    {
+      "epoch": 0.19148936170212766,
+      "grad_norm": 4.226646900177002,
+      "learning_rate": 4.58133971291866e-05,
+      "loss": 2.2049,
+      "step": 81
+    },
+    {
+      "epoch": 0.1938534278959811,
+      "grad_norm": 4.134627342224121,
+      "learning_rate": 4.575358851674642e-05,
+      "loss": 2.1709,
+      "step": 82
+    },
+    {
+      "epoch": 0.19621749408983452,
+      "grad_norm": 3.9358890056610107,
+      "learning_rate": 4.569377990430622e-05,
+      "loss": 2.1614,
+      "step": 83
+    },
+    {
+      "epoch": 0.19858156028368795,
+      "grad_norm": 4.453876972198486,
+      "learning_rate": 4.563397129186603e-05,
+      "loss": 2.2435,
+      "step": 84
+    },
+    {
+      "epoch": 0.20094562647754138,
+      "grad_norm": 4.1704535484313965,
+      "learning_rate": 4.557416267942584e-05,
+      "loss": 2.1753,
+      "step": 85
+    },
+    {
+      "epoch": 0.2033096926713948,
+      "grad_norm": 4.117349624633789,
+      "learning_rate": 4.551435406698565e-05,
+      "loss": 2.196,
+      "step": 86
+    },
+    {
+      "epoch": 0.20567375886524822,
+      "grad_norm": 3.9573557376861572,
+      "learning_rate": 4.545454545454546e-05,
+      "loss": 2.1901,
+      "step": 87
+    },
+    {
+      "epoch": 0.20803782505910165,
+      "grad_norm": 4.639995098114014,
+      "learning_rate": 4.539473684210527e-05,
+      "loss": 2.2147,
+      "step": 88
+    },
+    {
+      "epoch": 0.21040189125295508,
+      "grad_norm": 4.055258750915527,
+      "learning_rate": 4.533492822966508e-05,
+      "loss": 2.1608,
+      "step": 89
+    },
+    {
+      "epoch": 0.2127659574468085,
+      "grad_norm": 4.474514484405518,
+      "learning_rate": 4.527511961722488e-05,
+      "loss": 2.1507,
+      "step": 90
+    },
+    {
+      "epoch": 0.21513002364066194,
+      "grad_norm": 4.639707088470459,
+      "learning_rate": 4.521531100478469e-05,
+      "loss": 2.2826,
+      "step": 91
+    },
+    {
+      "epoch": 0.21749408983451538,
+      "grad_norm": 4.863740921020508,
+      "learning_rate": 4.51555023923445e-05,
+      "loss": 2.1766,
+      "step": 92
+    },
+    {
+      "epoch": 0.2198581560283688,
+      "grad_norm": 4.412019729614258,
+      "learning_rate": 4.509569377990431e-05,
+      "loss": 2.1956,
+      "step": 93
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 4.765653610229492,
+      "learning_rate": 4.503588516746411e-05,
+      "loss": 2.2213,
+      "step": 94
+    },
+    {
+      "epoch": 0.22458628841607564,
+      "grad_norm": 4.181426525115967,
+      "learning_rate": 4.497607655502393e-05,
+      "loss": 2.1097,
+      "step": 95
+    },
+    {
+      "epoch": 0.22695035460992907,
+      "grad_norm": 4.018885612487793,
+      "learning_rate": 4.491626794258373e-05,
+      "loss": 2.1055,
+      "step": 96
+    },
+    {
+      "epoch": 0.2293144208037825,
+      "grad_norm": 4.294673919677734,
+      "learning_rate": 4.485645933014354e-05,
+      "loss": 2.1984,
+      "step": 97
+    },
+    {
+      "epoch": 0.23167848699763594,
+      "grad_norm": 4.040910720825195,
+      "learning_rate": 4.4796650717703357e-05,
+      "loss": 2.1209,
+      "step": 98
+    },
+    {
+      "epoch": 0.23404255319148937,
+      "grad_norm": 4.413674354553223,
+      "learning_rate": 4.473684210526316e-05,
+      "loss": 2.177,
+      "step": 99
+    },
+    {
+      "epoch": 0.2364066193853428,
+      "grad_norm": 4.586551189422607,
+      "learning_rate": 4.467703349282297e-05,
+      "loss": 2.2419,
+      "step": 100
+    },
+    {
+      "epoch": 0.23877068557919623,
+      "grad_norm": 3.753817319869995,
+      "learning_rate": 4.461722488038278e-05,
+      "loss": 2.1044,
+      "step": 101
+    },
+    {
+      "epoch": 0.24113475177304963,
+      "grad_norm": 4.756337642669678,
+      "learning_rate": 4.455741626794259e-05,
+      "loss": 2.2236,
+      "step": 102
+    },
+    {
+      "epoch": 0.24349881796690306,
+      "grad_norm": 4.174933433532715,
+      "learning_rate": 4.449760765550239e-05,
+      "loss": 2.1759,
+      "step": 103
+    },
+    {
+      "epoch": 0.2458628841607565,
+      "grad_norm": 4.282530784606934,
+      "learning_rate": 4.44377990430622e-05,
+      "loss": 2.16,
+      "step": 104
+    },
+    {
+      "epoch": 0.24822695035460993,
+      "grad_norm": 4.547479629516602,
+      "learning_rate": 4.437799043062201e-05,
+      "loss": 2.0593,
+      "step": 105
+    },
+    {
+      "epoch": 0.25059101654846333,
+      "grad_norm": 4.146774768829346,
+      "learning_rate": 4.431818181818182e-05,
+      "loss": 2.1806,
+      "step": 106
+    },
+    {
+      "epoch": 0.25295508274231676,
+      "grad_norm": 4.418383598327637,
+      "learning_rate": 4.425837320574163e-05,
+      "loss": 2.1936,
+      "step": 107
+    },
+    {
+      "epoch": 0.2553191489361702,
+      "grad_norm": 4.2527947425842285,
+      "learning_rate": 4.419856459330144e-05,
+      "loss": 2.0821,
+      "step": 108
+    },
+    {
+      "epoch": 0.2576832151300236,
+      "grad_norm": 4.108152389526367,
+      "learning_rate": 4.413875598086125e-05,
+      "loss": 2.1622,
+      "step": 109
+    },
+    {
+      "epoch": 0.26004728132387706,
+      "grad_norm": 4.083064556121826,
+      "learning_rate": 4.407894736842105e-05,
+      "loss": 2.1178,
+      "step": 110
+    },
+    {
+      "epoch": 0.2624113475177305,
+      "grad_norm": 4.213859558105469,
+      "learning_rate": 4.401913875598087e-05,
+      "loss": 2.1575,
+      "step": 111
+    },
+    {
+      "epoch": 0.2647754137115839,
+      "grad_norm": 5.222052574157715,
+      "learning_rate": 4.395933014354067e-05,
+      "loss": 2.0866,
+      "step": 112
+    },
+    {
+      "epoch": 0.26713947990543735,
+      "grad_norm": 4.5867743492126465,
+      "learning_rate": 4.389952153110048e-05,
+      "loss": 2.156,
+      "step": 113
+    },
+    {
+      "epoch": 0.2695035460992908,
+      "grad_norm": 4.399920463562012,
+      "learning_rate": 4.383971291866029e-05,
+      "loss": 2.1911,
+      "step": 114
+    },
+    {
+      "epoch": 0.2718676122931442,
+      "grad_norm": 3.858452081680298,
+      "learning_rate": 4.37799043062201e-05,
+      "loss": 2.152,
+      "step": 115
+    },
+    {
+      "epoch": 0.27423167848699764,
+      "grad_norm": 4.103243350982666,
+      "learning_rate": 4.372009569377991e-05,
+      "loss": 2.1825,
+      "step": 116
+    },
+    {
+      "epoch": 0.2765957446808511,
+      "grad_norm": 4.9622344970703125,
+      "learning_rate": 4.366028708133971e-05,
+      "loss": 2.1182,
+      "step": 117
+    },
+    {
+      "epoch": 0.2789598108747045,
+      "grad_norm": 4.199765205383301,
+      "learning_rate": 4.360047846889953e-05,
+      "loss": 2.0638,
+      "step": 118
+    },
+    {
+      "epoch": 0.28132387706855794,
+      "grad_norm": 4.943136215209961,
+      "learning_rate": 4.354066985645933e-05,
+      "loss": 2.1906,
+      "step": 119
+    },
+    {
+      "epoch": 0.28368794326241137,
+      "grad_norm": 4.076112270355225,
+      "learning_rate": 4.348086124401914e-05,
+      "loss": 2.1208,
+      "step": 120
+    },
+    {
+      "epoch": 0.2860520094562648,
+      "grad_norm": 4.323102951049805,
+      "learning_rate": 4.342105263157895e-05,
+      "loss": 2.1642,
+      "step": 121
+    },
+    {
+      "epoch": 0.28841607565011823,
+      "grad_norm": 3.8718385696411133,
+      "learning_rate": 4.336124401913876e-05,
+      "loss": 2.0946,
+      "step": 122
+    },
+    {
+      "epoch": 0.2907801418439716,
+      "grad_norm": 4.6664862632751465,
+      "learning_rate": 4.330143540669857e-05,
+      "loss": 2.1411,
+      "step": 123
+    },
+    {
+      "epoch": 0.29314420803782504,
+      "grad_norm": 4.570885181427002,
+      "learning_rate": 4.324162679425838e-05,
+      "loss": 2.1268,
+      "step": 124
+    },
+    {
+      "epoch": 0.29550827423167847,
+      "grad_norm": 3.7459053993225098,
+      "learning_rate": 4.318181818181819e-05,
+      "loss": 2.042,
+      "step": 125
+    },
+    {
+      "epoch": 0.2978723404255319,
+      "grad_norm": 4.332980155944824,
+      "learning_rate": 4.312200956937799e-05,
+      "loss": 2.1462,
+      "step": 126
+    },
+    {
+      "epoch": 0.30023640661938533,
+      "grad_norm": 4.2524919509887695,
+      "learning_rate": 4.3062200956937806e-05,
+      "loss": 2.136,
+      "step": 127
+    },
+    {
+      "epoch": 0.30260047281323876,
+      "grad_norm": 4.230786323547363,
+      "learning_rate": 4.300239234449761e-05,
+      "loss": 2.1397,
+      "step": 128
+    },
+    {
+      "epoch": 0.3049645390070922,
+      "grad_norm": 4.281345844268799,
+      "learning_rate": 4.294258373205742e-05,
+      "loss": 2.1,
+      "step": 129
+    },
+    {
+      "epoch": 0.3073286052009456,
+      "grad_norm": 4.491425514221191,
+      "learning_rate": 4.288277511961723e-05,
+      "loss": 2.1304,
+      "step": 130
+    },
+    {
+      "epoch": 0.30969267139479906,
+      "grad_norm": 4.068315029144287,
+      "learning_rate": 4.282296650717704e-05,
+      "loss": 2.1201,
+      "step": 131
+    },
+    {
+      "epoch": 0.3120567375886525,
+      "grad_norm": 4.424421787261963,
+      "learning_rate": 4.2763157894736847e-05,
+      "loss": 2.0387,
+      "step": 132
+    },
+    {
+      "epoch": 0.3144208037825059,
+      "grad_norm": 4.6968865394592285,
+      "learning_rate": 4.270334928229665e-05,
+      "loss": 2.0306,
+      "step": 133
+    },
+    {
+      "epoch": 0.31678486997635935,
+      "grad_norm": 3.468080520629883,
+      "learning_rate": 4.2643540669856466e-05,
+      "loss": 2.0499,
+      "step": 134
+    },
+    {
+      "epoch": 0.3191489361702128,
+      "grad_norm": 4.1545186042785645,
+      "learning_rate": 4.258373205741627e-05,
+      "loss": 2.0953,
+      "step": 135
+    },
+    {
+      "epoch": 0.3215130023640662,
+      "grad_norm": 4.665497779846191,
+      "learning_rate": 4.252392344497608e-05,
+      "loss": 2.0821,
+      "step": 136
+    },
+    {
+      "epoch": 0.32387706855791965,
+      "grad_norm": 4.307028770446777,
+      "learning_rate": 4.246411483253589e-05,
+      "loss": 2.0566,
+      "step": 137
+    },
+    {
+      "epoch": 0.3262411347517731,
+      "grad_norm": 4.265110969543457,
+      "learning_rate": 4.24043062200957e-05,
+      "loss": 2.1148,
+      "step": 138
+    },
+    {
+      "epoch": 0.32860520094562645,
+      "grad_norm": 4.940046787261963,
+      "learning_rate": 4.2344497607655506e-05,
+      "loss": 2.0617,
+      "step": 139
+    },
+    {
+      "epoch": 0.3309692671394799,
+      "grad_norm": 4.576653480529785,
+      "learning_rate": 4.2284688995215316e-05,
+      "loss": 2.1433,
+      "step": 140
+    },
+    {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 4.502660274505615,
+      "learning_rate": 4.2224880382775126e-05,
+      "loss": 2.0935,
+      "step": 141
+    },
+    {
+      "epoch": 0.33569739952718675,
+      "grad_norm": 5.039083480834961,
+      "learning_rate": 4.216507177033493e-05,
+      "loss": 2.1362,
+      "step": 142
+    },
+    {
+      "epoch": 0.3380614657210402,
+      "grad_norm": 4.402792930603027,
+      "learning_rate": 4.210526315789474e-05,
+      "loss": 2.0993,
+      "step": 143
+    },
+    {
+      "epoch": 0.3404255319148936,
+      "grad_norm": 4.509486675262451,
+      "learning_rate": 4.204545454545455e-05,
+      "loss": 2.0299,
+      "step": 144
+    },
+    {
+      "epoch": 0.34278959810874704,
+      "grad_norm": 4.011101722717285,
+      "learning_rate": 4.198564593301436e-05,
+      "loss": 2.0929,
+      "step": 145
+    },
+    {
+      "epoch": 0.34515366430260047,
+      "grad_norm": 5.007168292999268,
+      "learning_rate": 4.192583732057416e-05,
+      "loss": 2.0842,
+      "step": 146
+    },
+    {
+      "epoch": 0.3475177304964539,
+      "grad_norm": 4.631533145904541,
+      "learning_rate": 4.1866028708133976e-05,
+      "loss": 2.0474,
+      "step": 147
+    },
+    {
+      "epoch": 0.34988179669030733,
+      "grad_norm": 4.278231620788574,
+      "learning_rate": 4.1806220095693785e-05,
+      "loss": 2.0819,
+      "step": 148
+    },
+    {
+      "epoch": 0.35224586288416077,
+      "grad_norm": 4.484694480895996,
+      "learning_rate": 4.174641148325359e-05,
+      "loss": 2.0502,
+      "step": 149
+    },
+    {
+      "epoch": 0.3546099290780142,
+      "grad_norm": 4.756961345672607,
+      "learning_rate": 4.1686602870813404e-05,
+      "loss": 2.0529,
+      "step": 150
+    },
+    {
+      "epoch": 0.35697399527186763,
+      "grad_norm": 4.347701549530029,
+      "learning_rate": 4.162679425837321e-05,
+      "loss": 2.0666,
+      "step": 151
+    },
+    {
+      "epoch": 0.35933806146572106,
+      "grad_norm": 3.76229190826416,
+      "learning_rate": 4.156698564593302e-05,
+      "loss": 2.0763,
+      "step": 152
+    },
+    {
+      "epoch": 0.3617021276595745,
+      "grad_norm": 3.8186378479003906,
+      "learning_rate": 4.150717703349282e-05,
+      "loss": 2.0392,
+      "step": 153
+    },
+    {
+      "epoch": 0.3640661938534279,
+      "grad_norm": 4.353139877319336,
+      "learning_rate": 4.1447368421052636e-05,
+      "loss": 1.9854,
+      "step": 154
+    },
+    {
+      "epoch": 0.3664302600472813,
+      "grad_norm": 5.3801422119140625,
+      "learning_rate": 4.138755980861244e-05,
+      "loss": 1.9959,
+      "step": 155
+    },
+    {
+      "epoch": 0.36879432624113473,
+      "grad_norm": 4.44751501083374,
+      "learning_rate": 4.132775119617225e-05,
+      "loss": 2.0911,
+      "step": 156
+    },
+    {
+      "epoch": 0.37115839243498816,
+      "grad_norm": 4.025294303894043,
+      "learning_rate": 4.1267942583732064e-05,
+      "loss": 2.0192,
+      "step": 157
+    },
+    {
+      "epoch": 0.3735224586288416,
+      "grad_norm": 4.605421543121338,
+      "learning_rate": 4.120813397129187e-05,
+      "loss": 2.1464,
+      "step": 158
+    },
+    {
+      "epoch": 0.375886524822695,
+      "grad_norm": 5.045236110687256,
+      "learning_rate": 4.114832535885168e-05,
+      "loss": 2.0756,
+      "step": 159
+    },
+    {
+      "epoch": 0.37825059101654845,
+      "grad_norm": 4.459848403930664,
+      "learning_rate": 4.1088516746411486e-05,
+      "loss": 2.1544,
+      "step": 160
+    },
+    {
+      "epoch": 0.3806146572104019,
+      "grad_norm": 4.420864105224609,
+      "learning_rate": 4.1028708133971296e-05,
+      "loss": 2.0296,
+      "step": 161
+    },
+    {
+      "epoch": 0.3829787234042553,
+      "grad_norm": 3.998237133026123,
+      "learning_rate": 4.09688995215311e-05,
+      "loss": 2.1039,
+      "step": 162
+    },
+    {
+      "epoch": 0.38534278959810875,
+      "grad_norm": 4.448668003082275,
+      "learning_rate": 4.0909090909090915e-05,
+      "loss": 2.0762,
+      "step": 163
+    },
+    {
+      "epoch": 0.3877068557919622,
+      "grad_norm": 4.266834259033203,
+      "learning_rate": 4.084928229665072e-05,
+      "loss": 2.0557,
+      "step": 164
+    },
+    {
+      "epoch": 0.3900709219858156,
+      "grad_norm": 4.796644687652588,
+      "learning_rate": 4.078947368421053e-05,
+      "loss": 2.1027,
+      "step": 165
+    },
+    {
+      "epoch": 0.39243498817966904,
+      "grad_norm": 4.560353755950928,
+      "learning_rate": 4.0729665071770337e-05,
+      "loss": 2.0245,
+      "step": 166
+    },
+    {
+      "epoch": 0.3947990543735225,
+      "grad_norm": 3.673262119293213,
+      "learning_rate": 4.0669856459330146e-05,
+      "loss": 2.033,
+      "step": 167
+    },
+    {
+      "epoch": 0.3971631205673759,
+      "grad_norm": 4.134026050567627,
+      "learning_rate": 4.0610047846889956e-05,
+      "loss": 2.0079,
+      "step": 168
+    },
+    {
+      "epoch": 0.39952718676122934,
+      "grad_norm": 4.677055835723877,
+      "learning_rate": 4.055023923444976e-05,
+      "loss": 2.0874,
+      "step": 169
+    },
+    {
+      "epoch": 0.40189125295508277,
+      "grad_norm": 4.731474876403809,
+      "learning_rate": 4.0490430622009575e-05,
+      "loss": 1.9878,
+      "step": 170
+    },
+    {
+      "epoch": 0.40425531914893614,
+      "grad_norm": 4.689650058746338,
+      "learning_rate": 4.043062200956938e-05,
+      "loss": 2.0933,
+      "step": 171
+    },
+    {
+      "epoch": 0.4066193853427896,
+      "grad_norm": 4.23575496673584,
+      "learning_rate": 4.037081339712919e-05,
+      "loss": 2.0436,
+      "step": 172
+    },
+    {
+      "epoch": 0.408983451536643,
+      "grad_norm": 3.8167171478271484,
+      "learning_rate": 4.0311004784688996e-05,
+      "loss": 1.9843,
+      "step": 173
+    },
+    {
+      "epoch": 0.41134751773049644,
+      "grad_norm": 4.35087251663208,
+      "learning_rate": 4.0251196172248806e-05,
+      "loss": 2.0386,
+      "step": 174
+    },
+    {
+      "epoch": 0.41371158392434987,
+      "grad_norm": 4.764196395874023,
+      "learning_rate": 4.0191387559808616e-05,
+      "loss": 2.0783,
+      "step": 175
+    },
+    {
+      "epoch": 0.4160756501182033,
+      "grad_norm": 4.374221324920654,
+      "learning_rate": 4.0131578947368425e-05,
+      "loss": 1.9711,
+      "step": 176
+    },
+    {
+      "epoch": 0.41843971631205673,
+      "grad_norm": 4.542954444885254,
+      "learning_rate": 4.0071770334928235e-05,
+      "loss": 2.0036,
+      "step": 177
+    },
+    {
+      "epoch": 0.42080378250591016,
+      "grad_norm": 3.849266767501831,
+      "learning_rate": 4.001196172248804e-05,
+      "loss": 2.0571,
+      "step": 178
+    },
+    {
+      "epoch": 0.4231678486997636,
+      "grad_norm": 4.014004230499268,
+      "learning_rate": 3.995215311004785e-05,
+      "loss": 2.0642,
+      "step": 179
+    },
+    {
+      "epoch": 0.425531914893617,
+      "grad_norm": 4.369685649871826,
+      "learning_rate": 3.9892344497607656e-05,
+      "loss": 2.0502,
+      "step": 180
+    },
+    {
+      "epoch": 0.42789598108747046,
+      "grad_norm": 4.280391216278076,
+      "learning_rate": 3.9832535885167466e-05,
+      "loss": 2.0025,
+      "step": 181
+    },
+    {
+      "epoch": 0.4302600472813239,
+      "grad_norm": 4.444298267364502,
+      "learning_rate": 3.9772727272727275e-05,
+      "loss": 1.9828,
+      "step": 182
+    },
+    {
+      "epoch": 0.4326241134751773,
+      "grad_norm": 3.794656991958618,
+      "learning_rate": 3.9712918660287085e-05,
+      "loss": 2.048,
+      "step": 183
+    },
+    {
+      "epoch": 0.43498817966903075,
+      "grad_norm": 4.498697280883789,
+      "learning_rate": 3.9653110047846894e-05,
+      "loss": 2.1017,
+      "step": 184
+    },
+    {
+      "epoch": 0.4373522458628842,
+      "grad_norm": 4.182841777801514,
+      "learning_rate": 3.95933014354067e-05,
+      "loss": 2.0588,
+      "step": 185
+    },
+    {
+      "epoch": 0.4397163120567376,
+      "grad_norm": 4.118316173553467,
+      "learning_rate": 3.9533492822966514e-05,
+      "loss": 2.0654,
+      "step": 186
+    },
+    {
+      "epoch": 0.44208037825059104,
+      "grad_norm": 5.026358127593994,
+      "learning_rate": 3.9473684210526316e-05,
+      "loss": 2.0808,
+      "step": 187
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 4.171792984008789,
+      "learning_rate": 3.9413875598086126e-05,
+      "loss": 1.9832,
+      "step": 188
+    },
+    {
+      "epoch": 0.44680851063829785,
+      "grad_norm": 4.625768661499023,
+      "learning_rate": 3.9354066985645935e-05,
+      "loss": 2.0244,
+      "step": 189
+    },
+    {
+      "epoch": 0.4491725768321513,
+      "grad_norm": 4.119114398956299,
+      "learning_rate": 3.9294258373205745e-05,
+      "loss": 2.011,
+      "step": 190
+    },
+    {
+      "epoch": 0.4515366430260047,
+      "grad_norm": 4.071448802947998,
+      "learning_rate": 3.9234449760765554e-05,
+      "loss": 2.0469,
+      "step": 191
+    },
+    {
+      "epoch": 0.45390070921985815,
+      "grad_norm": 4.00705099105835,
+      "learning_rate": 3.917464114832536e-05,
+      "loss": 1.9889,
+      "step": 192
+    },
+    {
+      "epoch": 0.4562647754137116,
+      "grad_norm": 4.969683647155762,
+      "learning_rate": 3.9114832535885173e-05,
+      "loss": 2.0097,
+      "step": 193
+    },
+    {
+      "epoch": 0.458628841607565,
+      "grad_norm": 4.4712605476379395,
+      "learning_rate": 3.9055023923444976e-05,
+      "loss": 2.0413,
+      "step": 194
+    },
+    {
+      "epoch": 0.46099290780141844,
+      "grad_norm": 4.194468021392822,
+      "learning_rate": 3.8995215311004786e-05,
+      "loss": 2.0623,
+      "step": 195
+    },
+    {
+      "epoch": 0.46335697399527187,
+      "grad_norm": 3.9785311222076416,
+      "learning_rate": 3.8935406698564595e-05,
+      "loss": 2.0606,
+      "step": 196
+    },
+    {
+      "epoch": 0.4657210401891253,
+      "grad_norm": 4.314589977264404,
+      "learning_rate": 3.8875598086124405e-05,
+      "loss": 2.0411,
+      "step": 197
+    },
+    {
+      "epoch": 0.46808510638297873,
+      "grad_norm": 3.732752561569214,
+      "learning_rate": 3.8815789473684214e-05,
+      "loss": 1.8543,
+      "step": 198
+    },
+    {
+      "epoch": 0.47044917257683216,
+      "grad_norm": 4.228579998016357,
+      "learning_rate": 3.8755980861244024e-05,
+      "loss": 2.0262,
+      "step": 199
+    },
+    {
+      "epoch": 0.4728132387706856,
+      "grad_norm": 3.9367430210113525,
+      "learning_rate": 3.869617224880383e-05,
+      "loss": 2.0398,
+      "step": 200
+    },
+    {
+      "epoch": 0.475177304964539,
+      "grad_norm": 3.8661906719207764,
+      "learning_rate": 3.8636363636363636e-05,
+      "loss": 1.9708,
+      "step": 201
+    },
+    {
+      "epoch": 0.47754137115839246,
+      "grad_norm": 3.8464865684509277,
+      "learning_rate": 3.8576555023923446e-05,
+      "loss": 2.0388,
+      "step": 202
+    },
+    {
+      "epoch": 0.4799054373522459,
+      "grad_norm": 3.867969274520874,
+      "learning_rate": 3.8516746411483255e-05,
+      "loss": 2.0592,
+      "step": 203
+    },
+    {
+      "epoch": 0.48226950354609927,
+      "grad_norm": 4.027868270874023,
+      "learning_rate": 3.8456937799043065e-05,
+      "loss": 2.0641,
+      "step": 204
+    },
+    {
+      "epoch": 0.4846335697399527,
+      "grad_norm": 4.061384677886963,
+      "learning_rate": 3.839712918660287e-05,
+      "loss": 2.0542,
+      "step": 205
+    },
+    {
+      "epoch": 0.48699763593380613,
+      "grad_norm": 3.954545021057129,
+      "learning_rate": 3.8337320574162684e-05,
+      "loss": 1.9951,
+      "step": 206
+    },
+    {
+      "epoch": 0.48936170212765956,
+      "grad_norm": 3.8164191246032715,
+      "learning_rate": 3.8277511961722486e-05,
+      "loss": 2.0094,
+      "step": 207
+    },
+    {
+      "epoch": 0.491725768321513,
+      "grad_norm": 4.0749053955078125,
+      "learning_rate": 3.8217703349282296e-05,
+      "loss": 1.9861,
+      "step": 208
+    },
+    {
+      "epoch": 0.4940898345153664,
+      "grad_norm": 3.8733890056610107,
+      "learning_rate": 3.815789473684211e-05,
+      "loss": 1.9821,
+      "step": 209
+    },
+    {
+      "epoch": 0.49645390070921985,
+      "grad_norm": 3.9854531288146973,
+      "learning_rate": 3.8098086124401915e-05,
+      "loss": 1.9549,
+      "step": 210
+    },
+    {
+      "epoch": 0.4988179669030733,
+      "grad_norm": 4.046158313751221,
+      "learning_rate": 3.8038277511961725e-05,
+      "loss": 2.02,
+      "step": 211
+    },
+    {
+      "epoch": 0.5011820330969267,
+      "grad_norm": 4.107903003692627,
+      "learning_rate": 3.7978468899521534e-05,
+      "loss": 1.9814,
+      "step": 212
+    },
+    {
+      "epoch": 0.5035460992907801,
+      "grad_norm": 4.039123058319092,
+      "learning_rate": 3.7918660287081344e-05,
+      "loss": 1.9969,
+      "step": 213
+    },
+    {
+      "epoch": 0.5059101654846335,
+      "grad_norm": 4.200449466705322,
+      "learning_rate": 3.7858851674641146e-05,
+      "loss": 1.9756,
+      "step": 214
+    },
+    {
+      "epoch": 0.508274231678487,
+      "grad_norm": 4.346997261047363,
+      "learning_rate": 3.7799043062200956e-05,
+      "loss": 2.0102,
+      "step": 215
+    },
+    {
+      "epoch": 0.5106382978723404,
+      "grad_norm": 4.00998067855835,
+      "learning_rate": 3.7739234449760765e-05,
+      "loss": 2.0895,
+      "step": 216
+    },
+    {
+      "epoch": 0.5130023640661938,
+      "grad_norm": 4.083645820617676,
+      "learning_rate": 3.7679425837320575e-05,
+      "loss": 1.9927,
+      "step": 217
+    },
+    {
+      "epoch": 0.5153664302600472,
+      "grad_norm": 4.116786956787109,
+      "learning_rate": 3.7619617224880384e-05,
+      "loss": 1.9798,
+      "step": 218
+    },
+    {
+      "epoch": 0.5177304964539007,
+      "grad_norm": 3.8103268146514893,
+      "learning_rate": 3.7559808612440194e-05,
+      "loss": 1.9616,
+      "step": 219
+    },
+    {
+      "epoch": 0.5200945626477541,
+      "grad_norm": 3.789548635482788,
+      "learning_rate": 3.7500000000000003e-05,
+      "loss": 1.995,
+      "step": 220
+    },
+    {
+      "epoch": 0.5224586288416075,
+      "grad_norm": 4.23507022857666,
+      "learning_rate": 3.7440191387559806e-05,
+      "loss": 2.0384,
+      "step": 221
+    },
+    {
+      "epoch": 0.524822695035461,
+      "grad_norm": 4.319752216339111,
+      "learning_rate": 3.738038277511962e-05,
+      "loss": 2.0136,
+      "step": 222
+    },
+    {
+      "epoch": 0.5271867612293144,
+      "grad_norm": 4.502188682556152,
+      "learning_rate": 3.7320574162679425e-05,
+      "loss": 1.993,
+      "step": 223
+    },
+    {
+      "epoch": 0.5295508274231678,
+      "grad_norm": 3.8698880672454834,
+      "learning_rate": 3.7260765550239235e-05,
+      "loss": 1.9742,
+      "step": 224
+    },
+    {
+      "epoch": 0.5319148936170213,
+      "grad_norm": 3.9415745735168457,
+      "learning_rate": 3.7200956937799044e-05,
+      "loss": 1.9259,
+      "step": 225
+    },
+    {
+      "epoch": 0.5342789598108747,
+      "grad_norm": 4.509599208831787,
+      "learning_rate": 3.7141148325358854e-05,
+      "loss": 1.9809,
+      "step": 226
+    },
+    {
+      "epoch": 0.5366430260047281,
+      "grad_norm": 4.538198471069336,
+      "learning_rate": 3.7081339712918663e-05,
+      "loss": 2.0455,
+      "step": 227
+    },
+    {
+      "epoch": 0.5390070921985816,
+      "grad_norm": 4.268890857696533,
+      "learning_rate": 3.7021531100478466e-05,
+      "loss": 2.0385,
+      "step": 228
+    },
+    {
+      "epoch": 0.541371158392435,
+      "grad_norm": 4.729723930358887,
+      "learning_rate": 3.696172248803828e-05,
+      "loss": 2.0132,
+      "step": 229
+    },
+    {
+      "epoch": 0.5437352245862884,
+      "grad_norm": 3.6644513607025146,
+      "learning_rate": 3.6901913875598085e-05,
+      "loss": 1.9425,
+      "step": 230
+    },
+    {
+      "epoch": 0.5460992907801419,
+      "grad_norm": 3.711594343185425,
+      "learning_rate": 3.6842105263157895e-05,
+      "loss": 2.0302,
+      "step": 231
+    },
+    {
+      "epoch": 0.5484633569739953,
+      "grad_norm": 4.120182991027832,
+      "learning_rate": 3.6782296650717704e-05,
+      "loss": 1.9863,
+      "step": 232
+    },
+    {
+      "epoch": 0.5508274231678487,
+      "grad_norm": 4.521392822265625,
+      "learning_rate": 3.6722488038277514e-05,
+      "loss": 1.9855,
+      "step": 233
+    },
+    {
+      "epoch": 0.5531914893617021,
+      "grad_norm": 4.531548023223877,
+      "learning_rate": 3.666267942583732e-05,
+      "loss": 1.9397,
+      "step": 234
+    },
+    {
+      "epoch": 0.5555555555555556,
+      "grad_norm": 3.672027111053467,
+      "learning_rate": 3.660287081339713e-05,
+      "loss": 2.0197,
+      "step": 235
+    },
+    {
+      "epoch": 0.557919621749409,
+      "grad_norm": 3.798299789428711,
+      "learning_rate": 3.654306220095694e-05,
+      "loss": 1.9749,
+      "step": 236
+    },
+    {
+      "epoch": 0.5602836879432624,
+      "grad_norm": 4.396960258483887,
+      "learning_rate": 3.6483253588516745e-05,
+      "loss": 1.9643,
+      "step": 237
+    },
+    {
+      "epoch": 0.5626477541371159,
+      "grad_norm": 3.955106019973755,
+      "learning_rate": 3.642344497607656e-05,
+      "loss": 1.9963,
+      "step": 238
+    },
+    {
+      "epoch": 0.5650118203309693,
+      "grad_norm": 4.263886451721191,
+      "learning_rate": 3.6363636363636364e-05,
+      "loss": 1.9553,
+      "step": 239
+    },
+    {
+      "epoch": 0.5673758865248227,
+      "grad_norm": 4.148135662078857,
+      "learning_rate": 3.6303827751196174e-05,
+      "loss": 1.9899,
+      "step": 240
+    },
+    {
+      "epoch": 0.5697399527186762,
+      "grad_norm": 4.090165138244629,
+      "learning_rate": 3.624401913875598e-05,
+      "loss": 1.9922,
+      "step": 241
+    },
+    {
+      "epoch": 0.5721040189125296,
+      "grad_norm": 3.7597265243530273,
+      "learning_rate": 3.618421052631579e-05,
+      "loss": 1.9846,
+      "step": 242
+    },
+    {
+      "epoch": 0.574468085106383,
+      "grad_norm": 4.532017707824707,
+      "learning_rate": 3.61244019138756e-05,
+      "loss": 1.9955,
+      "step": 243
+    },
+    {
+      "epoch": 0.5768321513002365,
+      "grad_norm": 4.268376350402832,
+      "learning_rate": 3.6064593301435405e-05,
+      "loss": 1.928,
+      "step": 244
+    },
+    {
+      "epoch": 0.5791962174940898,
+      "grad_norm": 4.414393901824951,
+      "learning_rate": 3.600478468899522e-05,
+      "loss": 1.9792,
+      "step": 245
+    },
+    {
+      "epoch": 0.5815602836879432,
+      "grad_norm": 4.180037498474121,
+      "learning_rate": 3.5944976076555024e-05,
+      "loss": 1.9661,
+      "step": 246
+    },
+    {
+      "epoch": 0.5839243498817966,
+      "grad_norm": 4.057360649108887,
+      "learning_rate": 3.5885167464114834e-05,
+      "loss": 1.9004,
+      "step": 247
+    },
+    {
+      "epoch": 0.5862884160756501,
+      "grad_norm": 3.8866353034973145,
+      "learning_rate": 3.582535885167464e-05,
+      "loss": 1.9582,
+      "step": 248
+    },
+    {
+      "epoch": 0.5886524822695035,
+      "grad_norm": 4.187384605407715,
+      "learning_rate": 3.576555023923445e-05,
+      "loss": 1.992,
+      "step": 249
+    },
+    {
+      "epoch": 0.5910165484633569,
+      "grad_norm": 3.99149751663208,
+      "learning_rate": 3.570574162679426e-05,
+      "loss": 1.9069,
+      "step": 250
+    },
+    {
+      "epoch": 0.5933806146572104,
+      "grad_norm": 3.89758563041687,
+      "learning_rate": 3.5645933014354065e-05,
+      "loss": 2.0696,
+      "step": 251
+    },
+    {
+      "epoch": 0.5957446808510638,
+      "grad_norm": 4.572390556335449,
+      "learning_rate": 3.558612440191388e-05,
+      "loss": 2.0519,
+      "step": 252
+    },
+    {
+      "epoch": 0.5981087470449172,
+      "grad_norm": 3.7999329566955566,
+      "learning_rate": 3.5526315789473684e-05,
+      "loss": 1.9143,
+      "step": 253
+    },
+    {
+      "epoch": 0.6004728132387707,
+      "grad_norm": 3.8050220012664795,
+      "learning_rate": 3.5466507177033493e-05,
+      "loss": 1.919,
+      "step": 254
+    },
+    {
+      "epoch": 0.6028368794326241,
+      "grad_norm": 4.685467720031738,
+      "learning_rate": 3.54066985645933e-05,
+      "loss": 2.0639,
+      "step": 255
+    },
+    {
+      "epoch": 0.6052009456264775,
+      "grad_norm": 4.132735252380371,
+      "learning_rate": 3.534688995215311e-05,
+      "loss": 2.0094,
+      "step": 256
+    },
+    {
+      "epoch": 0.607565011820331,
+      "grad_norm": 3.779338836669922,
+      "learning_rate": 3.5287081339712915e-05,
+      "loss": 1.9425,
+      "step": 257
+    },
+    {
+      "epoch": 0.6099290780141844,
+      "grad_norm": 3.988375186920166,
+      "learning_rate": 3.522727272727273e-05,
+      "loss": 1.9529,
+      "step": 258
+    },
+    {
+      "epoch": 0.6122931442080378,
+      "grad_norm": 3.8638248443603516,
+      "learning_rate": 3.516746411483254e-05,
+      "loss": 1.9326,
+      "step": 259
+    },
+    {
+      "epoch": 0.6146572104018913,
+      "grad_norm": 3.718116283416748,
+      "learning_rate": 3.5107655502392344e-05,
+      "loss": 1.9376,
+      "step": 260
+    },
+    {
+      "epoch": 0.6170212765957447,
+      "grad_norm": 3.8833279609680176,
+      "learning_rate": 3.504784688995216e-05,
+      "loss": 2.0297,
+      "step": 261
+    },
+    {
+      "epoch": 0.6193853427895981,
+      "grad_norm": 3.518829345703125,
+      "learning_rate": 3.498803827751196e-05,
+      "loss": 1.991,
+      "step": 262
+    },
+    {
+      "epoch": 0.6217494089834515,
+      "grad_norm": 3.8007776737213135,
+      "learning_rate": 3.492822966507177e-05,
+      "loss": 1.9722,
+      "step": 263
+    },
+    {
+      "epoch": 0.624113475177305,
+      "grad_norm": 3.51373028755188,
+      "learning_rate": 3.4868421052631575e-05,
+      "loss": 1.8961,
+      "step": 264
+    },
+    {
+      "epoch": 0.6264775413711584,
+      "grad_norm": 3.6159632205963135,
+      "learning_rate": 3.480861244019139e-05,
+      "loss": 1.9719,
+      "step": 265
+    },
+    {
+      "epoch": 0.6288416075650118,
+      "grad_norm": 3.923194408416748,
+      "learning_rate": 3.4748803827751194e-05,
+      "loss": 1.9133,
+      "step": 266
+    },
+    {
+      "epoch": 0.6312056737588653,
+      "grad_norm": 3.849912166595459,
+      "learning_rate": 3.4688995215311004e-05,
+      "loss": 1.9508,
+      "step": 267
+    },
+    {
+      "epoch": 0.6335697399527187,
+      "grad_norm": 3.8065907955169678,
+      "learning_rate": 3.462918660287082e-05,
+      "loss": 1.9426,
+      "step": 268
+    },
+    {
+      "epoch": 0.6359338061465721,
+      "grad_norm": 4.141129493713379,
+      "learning_rate": 3.456937799043062e-05,
+      "loss": 2.0031,
+      "step": 269
+    },
+    {
+      "epoch": 0.6382978723404256,
+      "grad_norm": 4.0821123123168945,
+      "learning_rate": 3.450956937799043e-05,
+      "loss": 1.9503,
+      "step": 270
+    },
+    {
+      "epoch": 0.640661938534279,
+      "grad_norm": 3.863445997238159,
+      "learning_rate": 3.444976076555024e-05,
+      "loss": 1.918,
+      "step": 271
+    },
+    {
+      "epoch": 0.6430260047281324,
+      "grad_norm": 3.9590418338775635,
+      "learning_rate": 3.438995215311005e-05,
+      "loss": 1.9768,
+      "step": 272
+    },
+    {
+      "epoch": 0.6453900709219859,
+      "grad_norm": 3.7065553665161133,
+      "learning_rate": 3.4330143540669854e-05,
+      "loss": 1.95,
+      "step": 273
+    },
+    {
+      "epoch": 0.6477541371158393,
+      "grad_norm": 3.657320737838745,
+      "learning_rate": 3.427033492822967e-05,
+      "loss": 1.9733,
+      "step": 274
+    },
+    {
+      "epoch": 0.6501182033096927,
+      "grad_norm": 4.384260654449463,
+      "learning_rate": 3.421052631578947e-05,
+      "loss": 2.0237,
+      "step": 275
+    },
+    {
+      "epoch": 0.6524822695035462,
+      "grad_norm": 4.627539157867432,
+      "learning_rate": 3.415071770334928e-05,
+      "loss": 1.9492,
+      "step": 276
+    },
+    {
+      "epoch": 0.6548463356973995,
+      "grad_norm": 3.8552463054656982,
+      "learning_rate": 3.409090909090909e-05,
+      "loss": 1.9489,
+      "step": 277
+    },
+    {
+      "epoch": 0.6572104018912529,
+      "grad_norm": 3.8745601177215576,
+      "learning_rate": 3.40311004784689e-05,
+      "loss": 1.9182,
+      "step": 278
+    },
+    {
+      "epoch": 0.6595744680851063,
+      "grad_norm": 4.04236364364624,
+      "learning_rate": 3.397129186602871e-05,
+      "loss": 1.9581,
+      "step": 279
+    },
+    {
+      "epoch": 0.6619385342789598,
+      "grad_norm": 3.5719153881073,
+      "learning_rate": 3.3911483253588514e-05,
+      "loss": 1.9321,
+      "step": 280
+    },
+    {
+      "epoch": 0.6643026004728132,
+      "grad_norm": 4.358823776245117,
+      "learning_rate": 3.385167464114833e-05,
+      "loss": 1.9567,
+      "step": 281
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 4.5859293937683105,
+      "learning_rate": 3.379186602870813e-05,
+      "loss": 1.966,
+      "step": 282
+    },
+    {
+      "epoch": 0.6690307328605201,
+      "grad_norm": 4.17390251159668,
+      "learning_rate": 3.373205741626794e-05,
+      "loss": 2.0129,
+      "step": 283
+    },
+    {
+      "epoch": 0.6713947990543735,
+      "grad_norm": 4.25508975982666,
+      "learning_rate": 3.367224880382775e-05,
+      "loss": 1.9553,
+      "step": 284
+    },
+    {
+      "epoch": 0.6737588652482269,
+      "grad_norm": 3.8857264518737793,
+      "learning_rate": 3.361244019138756e-05,
+      "loss": 1.9755,
+      "step": 285
+    },
+    {
+      "epoch": 0.6761229314420804,
+      "grad_norm": 4.436855792999268,
+      "learning_rate": 3.355263157894737e-05,
+      "loss": 1.9313,
+      "step": 286
+    },
+    {
+      "epoch": 0.6784869976359338,
+      "grad_norm": 3.8671867847442627,
+      "learning_rate": 3.349282296650718e-05,
+      "loss": 1.9402,
+      "step": 287
+    },
+    {
+      "epoch": 0.6808510638297872,
+      "grad_norm": 3.6638526916503906,
+      "learning_rate": 3.343301435406699e-05,
+      "loss": 1.871,
+      "step": 288
+    },
+    {
+      "epoch": 0.6832151300236406,
+      "grad_norm": 4.047444820404053,
+      "learning_rate": 3.337320574162679e-05,
+      "loss": 1.9283,
+      "step": 289
+    },
+    {
+      "epoch": 0.6855791962174941,
+      "grad_norm": 3.7109580039978027,
+      "learning_rate": 3.33133971291866e-05,
+      "loss": 1.9663,
+      "step": 290
+    },
+    {
+      "epoch": 0.6879432624113475,
+      "grad_norm": 3.8624162673950195,
+      "learning_rate": 3.325358851674641e-05,
+      "loss": 1.9802,
+      "step": 291
+    },
+    {
+      "epoch": 0.6903073286052009,
+      "grad_norm": 3.6883432865142822,
+      "learning_rate": 3.319377990430622e-05,
+      "loss": 1.9165,
+      "step": 292
+    },
+    {
+      "epoch": 0.6926713947990544,
+      "grad_norm": 3.7664642333984375,
+      "learning_rate": 3.313397129186603e-05,
+      "loss": 1.9822,
+      "step": 293
+    },
+    {
+      "epoch": 0.6950354609929078,
+      "grad_norm": 3.6505606174468994,
+      "learning_rate": 3.307416267942584e-05,
+      "loss": 1.8814,
+      "step": 294
+    },
+    {
+      "epoch": 0.6973995271867612,
+      "grad_norm": 3.5809507369995117,
+      "learning_rate": 3.301435406698565e-05,
+      "loss": 1.9044,
+      "step": 295
+    },
+    {
+      "epoch": 0.6997635933806147,
+      "grad_norm": 3.8675575256347656,
+      "learning_rate": 3.295454545454545e-05,
+      "loss": 1.8942,
+      "step": 296
+    },
+    {
+      "epoch": 0.7021276595744681,
+      "grad_norm": 3.916057586669922,
+      "learning_rate": 3.289473684210527e-05,
+      "loss": 2.0424,
+      "step": 297
+    },
+    {
+      "epoch": 0.7044917257683215,
+      "grad_norm": 3.671261787414551,
+      "learning_rate": 3.283492822966507e-05,
+      "loss": 1.9595,
+      "step": 298
+    },
+    {
+      "epoch": 0.706855791962175,
+      "grad_norm": 4.06015682220459,
+      "learning_rate": 3.277511961722488e-05,
+      "loss": 1.8396,
+      "step": 299
+    },
+    {
+      "epoch": 0.7092198581560284,
+      "grad_norm": 3.9253146648406982,
+      "learning_rate": 3.271531100478469e-05,
+      "loss": 1.9823,
+      "step": 300
+    },
+    {
+      "epoch": 0.7115839243498818,
+      "grad_norm": 4.20823335647583,
+      "learning_rate": 3.26555023923445e-05,
+      "loss": 1.9768,
+      "step": 301
+    },
+    {
+      "epoch": 0.7139479905437353,
+      "grad_norm": 4.035942554473877,
+      "learning_rate": 3.259569377990431e-05,
+      "loss": 2.0076,
+      "step": 302
+    },
+    {
+      "epoch": 0.7163120567375887,
+      "grad_norm": 3.9924399852752686,
+      "learning_rate": 3.253588516746411e-05,
+      "loss": 1.9217,
+      "step": 303
+    },
+    {
+      "epoch": 0.7186761229314421,
+      "grad_norm": 3.8201253414154053,
+      "learning_rate": 3.247607655502393e-05,
+      "loss": 1.9539,
+      "step": 304
+    },
+    {
+      "epoch": 0.7210401891252955,
+      "grad_norm": 4.5720696449279785,
+      "learning_rate": 3.241626794258373e-05,
+      "loss": 1.9805,
+      "step": 305
+    },
+    {
+      "epoch": 0.723404255319149,
+      "grad_norm": 4.060619354248047,
+      "learning_rate": 3.235645933014354e-05,
+      "loss": 1.9667,
+      "step": 306
+    },
+    {
+      "epoch": 0.7257683215130024,
+      "grad_norm": 4.097210884094238,
+      "learning_rate": 3.229665071770335e-05,
+      "loss": 1.9609,
+      "step": 307
+    },
+    {
+      "epoch": 0.7281323877068558,
+      "grad_norm": 3.757749080657959,
+      "learning_rate": 3.223684210526316e-05,
+      "loss": 2.0129,
+      "step": 308
+    },
+    {
+      "epoch": 0.7304964539007093,
+      "grad_norm": 4.25848388671875,
+      "learning_rate": 3.217703349282297e-05,
+      "loss": 2.0603,
+      "step": 309
+    },
+    {
+      "epoch": 0.7328605200945626,
+      "grad_norm": 3.9824750423431396,
+      "learning_rate": 3.211722488038278e-05,
+      "loss": 1.8766,
+      "step": 310
+    },
+    {
+      "epoch": 0.735224586288416,
+      "grad_norm": 4.179236888885498,
+      "learning_rate": 3.205741626794259e-05,
+      "loss": 1.9354,
+      "step": 311
+    },
+    {
+      "epoch": 0.7375886524822695,
+      "grad_norm": 3.9761154651641846,
+      "learning_rate": 3.199760765550239e-05,
+      "loss": 1.9749,
+      "step": 312
+    },
+    {
+      "epoch": 0.7399527186761229,
+      "grad_norm": 3.731112241744995,
+      "learning_rate": 3.19377990430622e-05,
+      "loss": 1.9597,
+      "step": 313
+    },
+    {
+      "epoch": 0.7423167848699763,
+      "grad_norm": 3.762098789215088,
+      "learning_rate": 3.187799043062201e-05,
+      "loss": 1.8983,
+      "step": 314
+    },
+    {
+      "epoch": 0.7446808510638298,
+      "grad_norm": 4.024747371673584,
+      "learning_rate": 3.181818181818182e-05,
+      "loss": 1.9149,
+      "step": 315
+    },
+    {
+      "epoch": 0.7470449172576832,
+      "grad_norm": 4.643849849700928,
+      "learning_rate": 3.175837320574162e-05,
+      "loss": 1.9244,
+      "step": 316
+    },
+    {
+      "epoch": 0.7494089834515366,
+      "grad_norm": 4.206188201904297,
+      "learning_rate": 3.169856459330144e-05,
+      "loss": 1.9201,
+      "step": 317
+    },
+    {
+      "epoch": 0.75177304964539,
+      "grad_norm": 3.647243022918701,
+      "learning_rate": 3.163875598086124e-05,
+      "loss": 1.8717,
+      "step": 318
+    },
+    {
+      "epoch": 0.7541371158392435,
+      "grad_norm": 3.7720701694488525,
+      "learning_rate": 3.157894736842105e-05,
+      "loss": 1.8891,
+      "step": 319
+    },
+    {
+      "epoch": 0.7565011820330969,
+      "grad_norm": 3.8284690380096436,
+      "learning_rate": 3.151913875598087e-05,
+      "loss": 1.9221,
+      "step": 320
+    },
+    {
+      "epoch": 0.7588652482269503,
+      "grad_norm": 4.109698295593262,
+      "learning_rate": 3.145933014354067e-05,
+      "loss": 1.8875,
+      "step": 321
+    },
+    {
+      "epoch": 0.7612293144208038,
+      "grad_norm": 3.684807062149048,
+      "learning_rate": 3.139952153110048e-05,
+      "loss": 1.9002,
+      "step": 322
+    },
+    {
+      "epoch": 0.7635933806146572,
+      "grad_norm": 3.700782299041748,
+      "learning_rate": 3.133971291866029e-05,
+      "loss": 1.9455,
+      "step": 323
+    },
+    {
+      "epoch": 0.7659574468085106,
+      "grad_norm": 3.9737021923065186,
+      "learning_rate": 3.12799043062201e-05,
+      "loss": 1.9065,
+      "step": 324
+    },
+    {
+      "epoch": 0.7683215130023641,
+      "grad_norm": 3.3986127376556396,
+      "learning_rate": 3.12200956937799e-05,
+      "loss": 1.8713,
+      "step": 325
+    },
+    {
+      "epoch": 0.7706855791962175,
+      "grad_norm": 3.8407750129699707,
+      "learning_rate": 3.116028708133971e-05,
+      "loss": 1.8779,
+      "step": 326
+    },
+    {
+      "epoch": 0.7730496453900709,
+      "grad_norm": 4.058707237243652,
+      "learning_rate": 3.110047846889952e-05,
+      "loss": 1.8802,
+      "step": 327
+    },
+    {
+      "epoch": 0.7754137115839244,
+      "grad_norm": 3.9604055881500244,
+      "learning_rate": 3.104066985645933e-05,
+      "loss": 1.8403,
+      "step": 328
+    },
+    {
+      "epoch": 0.7777777777777778,
+      "grad_norm": 4.052743434906006,
+      "learning_rate": 3.098086124401914e-05,
+      "loss": 1.9294,
+      "step": 329
+    },
+    {
+      "epoch": 0.7801418439716312,
+      "grad_norm": 3.3787264823913574,
+      "learning_rate": 3.092105263157895e-05,
+      "loss": 1.8994,
+      "step": 330
+    },
+    {
+      "epoch": 0.7825059101654847,
+      "grad_norm": 3.8170766830444336,
+      "learning_rate": 3.086124401913876e-05,
+      "loss": 1.9975,
+      "step": 331
+    },
+    {
+      "epoch": 0.7848699763593381,
+      "grad_norm": 3.8066794872283936,
+      "learning_rate": 3.080143540669856e-05,
+      "loss": 1.968,
+      "step": 332
+    },
+    {
+      "epoch": 0.7872340425531915,
+      "grad_norm": 4.192262172698975,
+      "learning_rate": 3.074162679425838e-05,
+      "loss": 1.9784,
+      "step": 333
+    },
+    {
+      "epoch": 0.789598108747045,
+      "grad_norm": 3.9428586959838867,
+      "learning_rate": 3.068181818181818e-05,
+      "loss": 1.9362,
+      "step": 334
+    },
+    {
+      "epoch": 0.7919621749408984,
+      "grad_norm": 4.186812400817871,
+      "learning_rate": 3.062200956937799e-05,
+      "loss": 1.9212,
+      "step": 335
+    },
+    {
+      "epoch": 0.7943262411347518,
+      "grad_norm": 4.045288562774658,
+      "learning_rate": 3.05622009569378e-05,
+      "loss": 1.9017,
+      "step": 336
+    },
+    {
+      "epoch": 0.7966903073286052,
+      "grad_norm": 4.203200340270996,
+      "learning_rate": 3.050239234449761e-05,
+      "loss": 1.9844,
+      "step": 337
+    },
+    {
+      "epoch": 0.7990543735224587,
+      "grad_norm": 4.394277095794678,
+      "learning_rate": 3.0442583732057416e-05,
+      "loss": 1.9332,
+      "step": 338
+    },
+    {
+      "epoch": 0.8014184397163121,
+      "grad_norm": 3.7559564113616943,
+      "learning_rate": 3.0382775119617225e-05,
+      "loss": 1.9142,
+      "step": 339
+    },
+    {
+      "epoch": 0.8037825059101655,
+      "grad_norm": 3.5842766761779785,
+      "learning_rate": 3.0322966507177035e-05,
+      "loss": 1.8843,
+      "step": 340
+    },
+    {
+      "epoch": 0.806146572104019,
+      "grad_norm": 3.8942437171936035,
+      "learning_rate": 3.0263157894736844e-05,
+      "loss": 1.825,
+      "step": 341
+    },
+    {
+      "epoch": 0.8085106382978723,
+      "grad_norm": 4.06583833694458,
+      "learning_rate": 3.020334928229665e-05,
+      "loss": 1.8859,
+      "step": 342
+    },
+    {
+      "epoch": 0.8108747044917257,
+      "grad_norm": 3.9897472858428955,
+      "learning_rate": 3.0143540669856463e-05,
+      "loss": 2.0088,
+      "step": 343
+    },
+    {
+      "epoch": 0.8132387706855791,
+      "grad_norm": 4.120972633361816,
+      "learning_rate": 3.008373205741627e-05,
+      "loss": 1.9644,
+      "step": 344
+    },
+    {
+      "epoch": 0.8156028368794326,
+      "grad_norm": 3.761667013168335,
+      "learning_rate": 3.0023923444976076e-05,
+      "loss": 1.8468,
+      "step": 345
+    },
+    {
+      "epoch": 0.817966903073286,
+      "grad_norm": 4.167830944061279,
+      "learning_rate": 2.996411483253589e-05,
+      "loss": 1.8995,
+      "step": 346
+    },
+    {
+      "epoch": 0.8203309692671394,
+      "grad_norm": 3.732332944869995,
+      "learning_rate": 2.9904306220095695e-05,
+      "loss": 1.8529,
+      "step": 347
+    },
+    {
+      "epoch": 0.8226950354609929,
+      "grad_norm": 4.0308003425598145,
+      "learning_rate": 2.9844497607655504e-05,
+      "loss": 1.9401,
+      "step": 348
+    },
+    {
+      "epoch": 0.8250591016548463,
+      "grad_norm": 3.981724500656128,
+      "learning_rate": 2.9784688995215314e-05,
+      "loss": 1.8821,
+      "step": 349
+    },
+    {
+      "epoch": 0.8274231678486997,
+      "grad_norm": 3.8602871894836426,
+      "learning_rate": 2.9724880382775123e-05,
+      "loss": 1.8909,
+      "step": 350
+    },
+    {
+      "epoch": 0.8297872340425532,
+      "grad_norm": 3.8556690216064453,
+      "learning_rate": 2.966507177033493e-05,
+      "loss": 1.9972,
+      "step": 351
+    },
+    {
+      "epoch": 0.8321513002364066,
+      "grad_norm": 3.716454029083252,
+      "learning_rate": 2.9605263157894735e-05,
+      "loss": 1.9235,
+      "step": 352
+    },
+    {
+      "epoch": 0.83451536643026,
+      "grad_norm": 4.057294845581055,
+      "learning_rate": 2.954545454545455e-05,
+      "loss": 1.8717,
+      "step": 353
+    },
+    {
+      "epoch": 0.8368794326241135,
+      "grad_norm": 3.5962278842926025,
+      "learning_rate": 2.9485645933014355e-05,
+      "loss": 1.9414,
+      "step": 354
+    },
+    {
+      "epoch": 0.8392434988179669,
+      "grad_norm": 4.190985679626465,
+      "learning_rate": 2.942583732057416e-05,
+      "loss": 1.9724,
+      "step": 355
+    },
+    {
+      "epoch": 0.8416075650118203,
+      "grad_norm": 3.9760379791259766,
+      "learning_rate": 2.9366028708133974e-05,
+      "loss": 1.839,
+      "step": 356
+    },
+    {
+      "epoch": 0.8439716312056738,
+      "grad_norm": 3.629091501235962,
+      "learning_rate": 2.9306220095693783e-05,
+      "loss": 1.8866,
+      "step": 357
+    },
+    {
+      "epoch": 0.8463356973995272,
+      "grad_norm": 3.752070188522339,
+      "learning_rate": 2.924641148325359e-05,
+      "loss": 1.8868,
+      "step": 358
+    },
+    {
+      "epoch": 0.8486997635933806,
+      "grad_norm": 3.5992238521575928,
+      "learning_rate": 2.9186602870813402e-05,
+      "loss": 1.8586,
+      "step": 359
+    },
+    {
+      "epoch": 0.851063829787234,
+      "grad_norm": 3.47458553314209,
+      "learning_rate": 2.912679425837321e-05,
+      "loss": 1.9659,
+      "step": 360
+    },
+    {
+      "epoch": 0.8534278959810875,
+      "grad_norm": 3.6117656230926514,
+      "learning_rate": 2.9066985645933014e-05,
+      "loss": 1.8892,
+      "step": 361
+    },
+    {
+      "epoch": 0.8557919621749409,
+      "grad_norm": 4.080473899841309,
+      "learning_rate": 2.900717703349282e-05,
+      "loss": 1.8809,
+      "step": 362
+    },
+    {
+      "epoch": 0.8581560283687943,
+      "grad_norm": 4.260461330413818,
+      "learning_rate": 2.8947368421052634e-05,
+      "loss": 1.9461,
+      "step": 363
+    },
+    {
+      "epoch": 0.8605200945626478,
+      "grad_norm": 4.231245994567871,
+      "learning_rate": 2.888755980861244e-05,
+      "loss": 1.9389,
+      "step": 364
+    },
+    {
+      "epoch": 0.8628841607565012,
+      "grad_norm": 3.64261794090271,
+      "learning_rate": 2.882775119617225e-05,
+      "loss": 1.922,
+      "step": 365
+    },
+    {
+      "epoch": 0.8652482269503546,
+      "grad_norm": 3.591475009918213,
+      "learning_rate": 2.8767942583732062e-05,
+      "loss": 1.9616,
+      "step": 366
+    },
+    {
+      "epoch": 0.8676122931442081,
+      "grad_norm": 3.9587414264678955,
+      "learning_rate": 2.8708133971291868e-05,
+      "loss": 1.9045,
+      "step": 367
+    },
+    {
+      "epoch": 0.8699763593380615,
+      "grad_norm": 3.6751394271850586,
+      "learning_rate": 2.8648325358851674e-05,
+      "loss": 1.9077,
+      "step": 368
+    },
+    {
+      "epoch": 0.8723404255319149,
+      "grad_norm": 4.2092790603637695,
+      "learning_rate": 2.8588516746411487e-05,
+      "loss": 1.9413,
+      "step": 369
+    },
+    {
+      "epoch": 0.8747044917257684,
+      "grad_norm": 3.814706325531006,
+      "learning_rate": 2.8528708133971293e-05,
+      "loss": 1.9203,
+      "step": 370
+    },
+    {
+      "epoch": 0.8770685579196218,
+      "grad_norm": 3.674201250076294,
+      "learning_rate": 2.84688995215311e-05,
+      "loss": 1.8488,
+      "step": 371
+    },
+    {
+      "epoch": 0.8794326241134752,
+      "grad_norm": 3.5166468620300293,
+      "learning_rate": 2.8409090909090912e-05,
+      "loss": 1.9175,
+      "step": 372
+    },
+    {
+      "epoch": 0.8817966903073287,
+      "grad_norm": 3.619014024734497,
+      "learning_rate": 2.834928229665072e-05,
+      "loss": 1.8396,
+      "step": 373
+    },
+    {
+      "epoch": 0.8841607565011821,
+      "grad_norm": 3.923396110534668,
+      "learning_rate": 2.8289473684210528e-05,
+      "loss": 1.8317,
+      "step": 374
+    },
+    {
+      "epoch": 0.8865248226950354,
+      "grad_norm": 3.934695243835449,
+      "learning_rate": 2.8229665071770334e-05,
+      "loss": 1.9562,
+      "step": 375
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 3.761104106903076,
+      "learning_rate": 2.8169856459330147e-05,
+      "loss": 1.7913,
+      "step": 376
+    },
+    {
+      "epoch": 0.8912529550827423,
+      "grad_norm": 3.7853753566741943,
+      "learning_rate": 2.8110047846889953e-05,
+      "loss": 1.8132,
+      "step": 377
+    },
+    {
+      "epoch": 0.8936170212765957,
+      "grad_norm": 3.526927947998047,
+      "learning_rate": 2.805023923444976e-05,
+      "loss": 1.6913,
+      "step": 378
+    },
+    {
+      "epoch": 0.8959810874704491,
+      "grad_norm": 3.75763201713562,
+      "learning_rate": 2.7990430622009572e-05,
+      "loss": 1.8787,
+      "step": 379
+    },
+    {
+      "epoch": 0.8983451536643026,
+      "grad_norm": 3.601562023162842,
+      "learning_rate": 2.793062200956938e-05,
+      "loss": 1.9166,
+      "step": 380
+    },
+    {
+      "epoch": 0.900709219858156,
+      "grad_norm": 3.5951952934265137,
+      "learning_rate": 2.7870813397129185e-05,
+      "loss": 1.885,
+      "step": 381
+    },
+    {
+      "epoch": 0.9030732860520094,
+      "grad_norm": 3.5643539428710938,
+      "learning_rate": 2.7811004784688998e-05,
+      "loss": 1.9044,
+      "step": 382
+    },
+    {
+      "epoch": 0.9054373522458629,
+      "grad_norm": 3.7953860759735107,
+      "learning_rate": 2.7751196172248807e-05,
+      "loss": 1.9872,
+      "step": 383
+    },
+    {
+      "epoch": 0.9078014184397163,
+      "grad_norm": 4.0880913734436035,
+      "learning_rate": 2.7691387559808613e-05,
+      "loss": 1.9078,
+      "step": 384
+    },
+    {
+      "epoch": 0.9101654846335697,
+      "grad_norm": 3.8961236476898193,
+      "learning_rate": 2.7631578947368426e-05,
+      "loss": 1.8843,
+      "step": 385
+    },
+    {
+      "epoch": 0.9125295508274232,
+      "grad_norm": 3.7427453994750977,
+      "learning_rate": 2.7571770334928232e-05,
+      "loss": 1.8713,
+      "step": 386
+    },
+    {
+      "epoch": 0.9148936170212766,
+      "grad_norm": 3.7328555583953857,
+      "learning_rate": 2.751196172248804e-05,
+      "loss": 1.8305,
+      "step": 387
+    },
+    {
+      "epoch": 0.91725768321513,
+      "grad_norm": 3.890418291091919,
+      "learning_rate": 2.7452153110047845e-05,
+      "loss": 1.8197,
+      "step": 388
+    },
+    {
+      "epoch": 0.9196217494089834,
+      "grad_norm": 3.7216286659240723,
+      "learning_rate": 2.7392344497607657e-05,
+      "loss": 1.9201,
+      "step": 389
+    },
+    {
+      "epoch": 0.9219858156028369,
+      "grad_norm": 3.705873489379883,
+      "learning_rate": 2.7332535885167464e-05,
+      "loss": 1.8885,
+      "step": 390
+    },
+    {
+      "epoch": 0.9243498817966903,
+      "grad_norm": 3.5170631408691406,
+      "learning_rate": 2.7272727272727273e-05,
+      "loss": 1.895,
+      "step": 391
+    },
+    {
+      "epoch": 0.9267139479905437,
+      "grad_norm": 3.632924795150757,
+      "learning_rate": 2.7212918660287086e-05,
+      "loss": 1.9286,
+      "step": 392
+    },
+    {
+      "epoch": 0.9290780141843972,
+      "grad_norm": 4.132338523864746,
+      "learning_rate": 2.7153110047846892e-05,
+      "loss": 1.9079,
+      "step": 393
+    },
+    {
+      "epoch": 0.9314420803782506,
+      "grad_norm": 3.8694465160369873,
+      "learning_rate": 2.7093301435406698e-05,
+      "loss": 1.856,
+      "step": 394
+    },
+    {
+      "epoch": 0.933806146572104,
+      "grad_norm": 4.146971702575684,
+      "learning_rate": 2.703349282296651e-05,
+      "loss": 1.9795,
+      "step": 395
+    },
+    {
+      "epoch": 0.9361702127659575,
+      "grad_norm": 3.581249952316284,
+      "learning_rate": 2.6973684210526317e-05,
+      "loss": 1.868,
+      "step": 396
+    },
+    {
+      "epoch": 0.9385342789598109,
+      "grad_norm": 3.779081106185913,
+      "learning_rate": 2.6913875598086123e-05,
+      "loss": 1.8462,
+      "step": 397
+    },
+    {
+      "epoch": 0.9408983451536643,
+      "grad_norm": 3.373218536376953,
+      "learning_rate": 2.6854066985645936e-05,
+      "loss": 1.8336,
+      "step": 398
+    },
+    {
+      "epoch": 0.9432624113475178,
+      "grad_norm": 3.7768990993499756,
+      "learning_rate": 2.6794258373205743e-05,
+      "loss": 1.9062,
+      "step": 399
+    },
+    {
+      "epoch": 0.9456264775413712,
+      "grad_norm": 3.4512805938720703,
+      "learning_rate": 2.6734449760765552e-05,
+      "loss": 1.873,
+      "step": 400
+    },
+    {
+      "epoch": 0.9479905437352246,
+      "grad_norm": 3.38236927986145,
+      "learning_rate": 2.6674641148325358e-05,
+      "loss": 1.8911,
+      "step": 401
+    },
+    {
+      "epoch": 0.950354609929078,
+      "grad_norm": 3.191875696182251,
+      "learning_rate": 2.661483253588517e-05,
+      "loss": 1.8732,
+      "step": 402
+    },
+    {
+      "epoch": 0.9527186761229315,
+      "grad_norm": 3.671778440475464,
+      "learning_rate": 2.6555023923444977e-05,
+      "loss": 1.8879,
+      "step": 403
+    },
+    {
+      "epoch": 0.9550827423167849,
+      "grad_norm": 3.831817150115967,
+      "learning_rate": 2.6495215311004783e-05,
+      "loss": 1.8256,
+      "step": 404
+    },
+    {
+      "epoch": 0.9574468085106383,
+      "grad_norm": 3.432061195373535,
+      "learning_rate": 2.6435406698564596e-05,
+      "loss": 1.8245,
+      "step": 405
+    },
+    {
+      "epoch": 0.9598108747044918,
+      "grad_norm": 3.591796398162842,
+      "learning_rate": 2.6375598086124402e-05,
+      "loss": 1.8954,
+      "step": 406
+    },
+    {
+      "epoch": 0.9621749408983451,
+      "grad_norm": 3.4541237354278564,
+      "learning_rate": 2.6315789473684212e-05,
+      "loss": 1.8886,
+      "step": 407
+    },
+    {
+      "epoch": 0.9645390070921985,
+      "grad_norm": 3.4565842151641846,
+      "learning_rate": 2.625598086124402e-05,
+      "loss": 1.8766,
+      "step": 408
+    },
+    {
+      "epoch": 0.966903073286052,
+      "grad_norm": 3.812185049057007,
+      "learning_rate": 2.619617224880383e-05,
+      "loss": 1.8797,
+      "step": 409
+    },
+    {
+      "epoch": 0.9692671394799054,
+      "grad_norm": 4.211532115936279,
+      "learning_rate": 2.6136363636363637e-05,
+      "loss": 1.8563,
+      "step": 410
+    },
+    {
+      "epoch": 0.9716312056737588,
+      "grad_norm": 3.5806126594543457,
+      "learning_rate": 2.6076555023923443e-05,
+      "loss": 1.9192,
+      "step": 411
+    },
+    {
+      "epoch": 0.9739952718676123,
+      "grad_norm": 3.7554843425750732,
+      "learning_rate": 2.6016746411483256e-05,
+      "loss": 1.8267,
+      "step": 412
+    },
+    {
+      "epoch": 0.9763593380614657,
+      "grad_norm": 3.9262287616729736,
+      "learning_rate": 2.5956937799043062e-05,
+      "loss": 1.9415,
+      "step": 413
+    },
+    {
+      "epoch": 0.9787234042553191,
+      "grad_norm": 3.754761219024658,
+      "learning_rate": 2.589712918660287e-05,
+      "loss": 1.8816,
+      "step": 414
+    },
+    {
+      "epoch": 0.9810874704491725,
+      "grad_norm": 3.463529586791992,
+      "learning_rate": 2.583732057416268e-05,
+      "loss": 1.8487,
+      "step": 415
+    },
+    {
+      "epoch": 0.983451536643026,
+      "grad_norm": 3.6155738830566406,
+      "learning_rate": 2.5777511961722488e-05,
+      "loss": 1.8339,
+      "step": 416
+    },
+    {
+      "epoch": 0.9858156028368794,
+      "grad_norm": 3.745180130004883,
+      "learning_rate": 2.5717703349282297e-05,
+      "loss": 1.8555,
+      "step": 417
+    },
+    {
+      "epoch": 0.9881796690307328,
+      "grad_norm": 3.9255855083465576,
+      "learning_rate": 2.565789473684211e-05,
+      "loss": 1.8148,
+      "step": 418
+    },
+    {
+      "epoch": 0.9905437352245863,
+      "grad_norm": 4.076484203338623,
+      "learning_rate": 2.5598086124401916e-05,
+      "loss": 1.8527,
+      "step": 419
+    },
+    {
+      "epoch": 0.9929078014184397,
+      "grad_norm": 3.6310737133026123,
+      "learning_rate": 2.5538277511961722e-05,
+      "loss": 1.8875,
+      "step": 420
+    },
+    {
+      "epoch": 0.9952718676122931,
+      "grad_norm": 3.757092237472534,
+      "learning_rate": 2.5478468899521535e-05,
+      "loss": 1.9363,
+      "step": 421
+    },
+    {
+      "epoch": 0.9976359338061466,
+      "grad_norm": 3.754251003265381,
+      "learning_rate": 2.541866028708134e-05,
+      "loss": 1.8591,
+      "step": 422
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 3.948606014251709,
+      "learning_rate": 2.5358851674641147e-05,
+      "loss": 1.8516,
+      "step": 423
+    },
+    {
+      "epoch": 1.0023640661938533,
+      "grad_norm": 3.393385171890259,
+      "learning_rate": 2.5299043062200957e-05,
+      "loss": 1.7089,
+      "step": 424
+    },
+    {
+      "epoch": 1.0047281323877069,
+      "grad_norm": 3.5038678646087646,
+      "learning_rate": 2.5239234449760766e-05,
+      "loss": 1.6513,
+      "step": 425
+    },
+    {
+      "epoch": 1.0070921985815602,
+      "grad_norm": 3.546327590942383,
+      "learning_rate": 2.5179425837320576e-05,
+      "loss": 1.6913,
+      "step": 426
+    },
+    {
+      "epoch": 1.0094562647754137,
+      "grad_norm": 3.012467384338379,
+      "learning_rate": 2.5119617224880382e-05,
+      "loss": 1.671,
+      "step": 427
+    },
+    {
+      "epoch": 1.011820330969267,
+      "grad_norm": 3.5642528533935547,
+      "learning_rate": 2.5059808612440195e-05,
+      "loss": 1.7187,
+      "step": 428
+    },
+    {
+      "epoch": 1.0141843971631206,
+      "grad_norm": 3.3508832454681396,
+      "learning_rate": 2.5e-05,
+      "loss": 1.6118,
+      "step": 429
+    },
+    {
+      "epoch": 1.016548463356974,
+      "grad_norm": 3.450350761413574,
+      "learning_rate": 2.494019138755981e-05,
+      "loss": 1.6704,
+      "step": 430
+    },
+    {
+      "epoch": 1.0189125295508275,
+      "grad_norm": 3.859874725341797,
+      "learning_rate": 2.4880382775119617e-05,
+      "loss": 1.6719,
+      "step": 431
+    },
+    {
+      "epoch": 1.0212765957446808,
+      "grad_norm": 3.572866439819336,
+      "learning_rate": 2.4820574162679426e-05,
+      "loss": 1.6766,
+      "step": 432
+    },
+    {
+      "epoch": 1.0236406619385343,
+      "grad_norm": 3.1181817054748535,
+      "learning_rate": 2.4760765550239236e-05,
+      "loss": 1.623,
+      "step": 433
+    },
+    {
+      "epoch": 1.0260047281323876,
+      "grad_norm": 3.3449785709381104,
+      "learning_rate": 2.4700956937799045e-05,
+      "loss": 1.6767,
+      "step": 434
+    },
+    {
+      "epoch": 1.0283687943262412,
+      "grad_norm": 3.494570732116699,
+      "learning_rate": 2.4641148325358855e-05,
+      "loss": 1.6433,
+      "step": 435
+    },
+    {
+      "epoch": 1.0307328605200945,
+      "grad_norm": 3.3296971321105957,
+      "learning_rate": 2.458133971291866e-05,
+      "loss": 1.6977,
+      "step": 436
+    },
+    {
+      "epoch": 1.033096926713948,
+      "grad_norm": 3.5671586990356445,
+      "learning_rate": 2.452153110047847e-05,
+      "loss": 1.6876,
+      "step": 437
+    },
+    {
+      "epoch": 1.0354609929078014,
+      "grad_norm": 3.4455606937408447,
+      "learning_rate": 2.446172248803828e-05,
+      "loss": 1.5463,
+      "step": 438
+    },
+    {
+      "epoch": 1.037825059101655,
+      "grad_norm": 3.561481237411499,
+      "learning_rate": 2.4401913875598086e-05,
+      "loss": 1.6733,
+      "step": 439
+    },
+    {
+      "epoch": 1.0401891252955082,
+      "grad_norm": 3.6918563842773438,
+      "learning_rate": 2.4342105263157896e-05,
+      "loss": 1.6511,
+      "step": 440
+    },
+    {
+      "epoch": 1.0425531914893618,
+      "grad_norm": 3.360909938812256,
+      "learning_rate": 2.4282296650717702e-05,
+      "loss": 1.6931,
+      "step": 441
+    },
+    {
+      "epoch": 1.044917257683215,
+      "grad_norm": 3.5804953575134277,
+      "learning_rate": 2.4222488038277515e-05,
+      "loss": 1.6333,
+      "step": 442
+    },
+    {
+      "epoch": 1.0472813238770686,
+      "grad_norm": 3.5099520683288574,
+      "learning_rate": 2.4162679425837324e-05,
+      "loss": 1.5951,
+      "step": 443
+    },
+    {
+      "epoch": 1.049645390070922,
+      "grad_norm": 3.4295504093170166,
+      "learning_rate": 2.410287081339713e-05,
+      "loss": 1.6764,
+      "step": 444
+    },
+    {
+      "epoch": 1.0520094562647755,
+      "grad_norm": 3.5318591594696045,
+      "learning_rate": 2.404306220095694e-05,
+      "loss": 1.6894,
+      "step": 445
+    },
+    {
+      "epoch": 1.0543735224586288,
+      "grad_norm": 3.4848272800445557,
+      "learning_rate": 2.3983253588516746e-05,
+      "loss": 1.6769,
+      "step": 446
+    },
+    {
+      "epoch": 1.0567375886524824,
+      "grad_norm": 3.7782180309295654,
+      "learning_rate": 2.3923444976076556e-05,
+      "loss": 1.6239,
+      "step": 447
+    },
+    {
+      "epoch": 1.0591016548463357,
+      "grad_norm": 3.2487025260925293,
+      "learning_rate": 2.3863636363636365e-05,
+      "loss": 1.5877,
+      "step": 448
+    },
+    {
+      "epoch": 1.0614657210401892,
+      "grad_norm": 3.6180076599121094,
+      "learning_rate": 2.380382775119617e-05,
+      "loss": 1.685,
+      "step": 449
+    },
+    {
+      "epoch": 1.0638297872340425,
+      "grad_norm": 3.6394782066345215,
+      "learning_rate": 2.374401913875598e-05,
+      "loss": 1.6699,
+      "step": 450
+    },
+    {
+      "epoch": 1.066193853427896,
+      "grad_norm": 3.7263615131378174,
+      "learning_rate": 2.368421052631579e-05,
+      "loss": 1.6681,
+      "step": 451
+    },
+    {
+      "epoch": 1.0685579196217494,
+      "grad_norm": 3.455543279647827,
+      "learning_rate": 2.36244019138756e-05,
+      "loss": 1.5929,
+      "step": 452
+    },
+    {
+      "epoch": 1.070921985815603,
+      "grad_norm": 3.379056930541992,
+      "learning_rate": 2.356459330143541e-05,
+      "loss": 1.7262,
+      "step": 453
+    },
+    {
+      "epoch": 1.0732860520094563,
+      "grad_norm": 3.415682792663574,
+      "learning_rate": 2.3504784688995216e-05,
+      "loss": 1.6502,
+      "step": 454
+    },
+    {
+      "epoch": 1.0756501182033098,
+      "grad_norm": 3.3975017070770264,
+      "learning_rate": 2.3444976076555025e-05,
+      "loss": 1.6846,
+      "step": 455
+    },
+    {
+      "epoch": 1.0780141843971631,
+      "grad_norm": 3.844403028488159,
+      "learning_rate": 2.3385167464114835e-05,
+      "loss": 1.6581,
+      "step": 456
+    },
+    {
+      "epoch": 1.0803782505910164,
+      "grad_norm": 3.237973690032959,
+      "learning_rate": 2.332535885167464e-05,
+      "loss": 1.6452,
+      "step": 457
+    },
+    {
+      "epoch": 1.08274231678487,
+      "grad_norm": 3.138275384902954,
+      "learning_rate": 2.326555023923445e-05,
+      "loss": 1.671,
+      "step": 458
+    },
+    {
+      "epoch": 1.0851063829787233,
+      "grad_norm": 3.2867116928100586,
+      "learning_rate": 2.320574162679426e-05,
+      "loss": 1.7039,
+      "step": 459
+    },
+    {
+      "epoch": 1.0874704491725768,
+      "grad_norm": 3.331429958343506,
+      "learning_rate": 2.314593301435407e-05,
+      "loss": 1.6251,
+      "step": 460
+    },
+    {
+      "epoch": 1.0898345153664302,
+      "grad_norm": 3.517249822616577,
+      "learning_rate": 2.308612440191388e-05,
+      "loss": 1.6588,
+      "step": 461
+    },
+    {
+      "epoch": 1.0921985815602837,
+      "grad_norm": 3.447352170944214,
+      "learning_rate": 2.3026315789473685e-05,
+      "loss": 1.6494,
+      "step": 462
+    },
+    {
+      "epoch": 1.094562647754137,
+      "grad_norm": 3.820619583129883,
+      "learning_rate": 2.2966507177033495e-05,
+      "loss": 1.6767,
+      "step": 463
+    },
+    {
+      "epoch": 1.0969267139479906,
+      "grad_norm": 3.612136125564575,
+      "learning_rate": 2.29066985645933e-05,
+      "loss": 1.6123,
+      "step": 464
+    },
+    {
+      "epoch": 1.099290780141844,
+      "grad_norm": 3.2653629779815674,
+      "learning_rate": 2.284688995215311e-05,
+      "loss": 1.639,
+      "step": 465
+    },
+    {
+      "epoch": 1.1016548463356974,
+      "grad_norm": 3.241689443588257,
+      "learning_rate": 2.278708133971292e-05,
+      "loss": 1.6561,
+      "step": 466
+    },
+    {
+      "epoch": 1.1040189125295508,
+      "grad_norm": 3.3771729469299316,
+      "learning_rate": 2.272727272727273e-05,
+      "loss": 1.6457,
+      "step": 467
+    },
+    {
+      "epoch": 1.1063829787234043,
+      "grad_norm": 3.3176181316375732,
+      "learning_rate": 2.266746411483254e-05,
+      "loss": 1.6047,
+      "step": 468
+    },
+    {
+      "epoch": 1.1087470449172576,
+      "grad_norm": 3.281697988510132,
+      "learning_rate": 2.2607655502392345e-05,
+      "loss": 1.6211,
+      "step": 469
+    },
+    {
+      "epoch": 1.1111111111111112,
+      "grad_norm": 3.4810190200805664,
+      "learning_rate": 2.2547846889952154e-05,
+      "loss": 1.7409,
+      "step": 470
+    },
+    {
+      "epoch": 1.1134751773049645,
+      "grad_norm": 3.873317003250122,
+      "learning_rate": 2.2488038277511964e-05,
+      "loss": 1.6596,
+      "step": 471
+    },
+    {
+      "epoch": 1.115839243498818,
+      "grad_norm": 3.9520647525787354,
+      "learning_rate": 2.242822966507177e-05,
+      "loss": 1.7332,
+      "step": 472
+    },
+    {
+      "epoch": 1.1182033096926713,
+      "grad_norm": 3.922635555267334,
+      "learning_rate": 2.236842105263158e-05,
+      "loss": 1.6934,
+      "step": 473
+    },
+    {
+      "epoch": 1.1205673758865249,
+      "grad_norm": 3.404571056365967,
+      "learning_rate": 2.230861244019139e-05,
+      "loss": 1.6456,
+      "step": 474
+    },
+    {
+      "epoch": 1.1229314420803782,
+      "grad_norm": 3.497051239013672,
+      "learning_rate": 2.2248803827751195e-05,
+      "loss": 1.712,
+      "step": 475
+    },
+    {
+      "epoch": 1.1252955082742317,
+      "grad_norm": 3.632838249206543,
+      "learning_rate": 2.2188995215311005e-05,
+      "loss": 1.6521,
+      "step": 476
+    },
+    {
+      "epoch": 1.127659574468085,
+      "grad_norm": 3.8431527614593506,
+      "learning_rate": 2.2129186602870814e-05,
+      "loss": 1.6738,
+      "step": 477
+    },
+    {
+      "epoch": 1.1300236406619386,
+      "grad_norm": 3.709177255630493,
+      "learning_rate": 2.2069377990430624e-05,
+      "loss": 1.622,
+      "step": 478
+    },
+    {
+      "epoch": 1.132387706855792,
+      "grad_norm": 3.3974366188049316,
+      "learning_rate": 2.2009569377990433e-05,
+      "loss": 1.7082,
+      "step": 479
+    },
+    {
+      "epoch": 1.1347517730496455,
+      "grad_norm": 3.6680588722229004,
+      "learning_rate": 2.194976076555024e-05,
+      "loss": 1.6759,
+      "step": 480
+    },
+    {
+      "epoch": 1.1371158392434988,
+      "grad_norm": 3.3660480976104736,
+      "learning_rate": 2.188995215311005e-05,
+      "loss": 1.6893,
+      "step": 481
+    },
+    {
+      "epoch": 1.1394799054373523,
+      "grad_norm": 3.4249792098999023,
+      "learning_rate": 2.1830143540669855e-05,
+      "loss": 1.6269,
+      "step": 482
+    },
+    {
+      "epoch": 1.1418439716312057,
+      "grad_norm": 3.5676686763763428,
+      "learning_rate": 2.1770334928229665e-05,
+      "loss": 1.5851,
+      "step": 483
+    },
+    {
+      "epoch": 1.1442080378250592,
+      "grad_norm": 3.6361424922943115,
+      "learning_rate": 2.1710526315789474e-05,
+      "loss": 1.683,
+      "step": 484
+    },
+    {
+      "epoch": 1.1465721040189125,
+      "grad_norm": 3.530165910720825,
+      "learning_rate": 2.1650717703349284e-05,
+      "loss": 1.6617,
+      "step": 485
+    },
+    {
+      "epoch": 1.148936170212766,
+      "grad_norm": 3.3330204486846924,
+      "learning_rate": 2.1590909090909093e-05,
+      "loss": 1.6294,
+      "step": 486
+    },
+    {
+      "epoch": 1.1513002364066194,
+      "grad_norm": 3.3433423042297363,
+      "learning_rate": 2.1531100478468903e-05,
+      "loss": 1.5875,
+      "step": 487
+    },
+    {
+      "epoch": 1.1536643026004727,
+      "grad_norm": 3.511631488800049,
+      "learning_rate": 2.147129186602871e-05,
+      "loss": 1.6799,
+      "step": 488
+    },
+    {
+      "epoch": 1.1560283687943262,
+      "grad_norm": 3.3675262928009033,
+      "learning_rate": 2.141148325358852e-05,
+      "loss": 1.7247,
+      "step": 489
+    },
+    {
+      "epoch": 1.1583924349881798,
+      "grad_norm": 3.7090232372283936,
+      "learning_rate": 2.1351674641148325e-05,
+      "loss": 1.6302,
+      "step": 490
+    },
+    {
+      "epoch": 1.160756501182033,
+      "grad_norm": 3.7816193103790283,
+      "learning_rate": 2.1291866028708134e-05,
+      "loss": 1.7691,
+      "step": 491
+    },
+    {
+      "epoch": 1.1631205673758864,
+      "grad_norm": 3.2552871704101562,
+      "learning_rate": 2.1232057416267944e-05,
+      "loss": 1.5899,
+      "step": 492
+    },
+    {
+      "epoch": 1.16548463356974,
+      "grad_norm": 3.853459119796753,
+      "learning_rate": 2.1172248803827753e-05,
+      "loss": 1.7019,
+      "step": 493
+    },
+    {
+      "epoch": 1.1678486997635933,
+      "grad_norm": 3.5649783611297607,
+      "learning_rate": 2.1112440191387563e-05,
+      "loss": 1.6619,
+      "step": 494
+    },
+    {
+      "epoch": 1.1702127659574468,
+      "grad_norm": 3.476576805114746,
+      "learning_rate": 2.105263157894737e-05,
+      "loss": 1.611,
+      "step": 495
+    },
+    {
+      "epoch": 1.1725768321513002,
+      "grad_norm": 3.5537772178649902,
+      "learning_rate": 2.099282296650718e-05,
+      "loss": 1.637,
+      "step": 496
+    },
+    {
+      "epoch": 1.1749408983451537,
+      "grad_norm": 3.6302125453948975,
+      "learning_rate": 2.0933014354066988e-05,
+      "loss": 1.6317,
+      "step": 497
+    },
+    {
+      "epoch": 1.177304964539007,
+      "grad_norm": 3.622593879699707,
+      "learning_rate": 2.0873205741626794e-05,
+      "loss": 1.6086,
+      "step": 498
+    },
+    {
+      "epoch": 1.1796690307328606,
+      "grad_norm": 3.44309663772583,
+      "learning_rate": 2.0813397129186604e-05,
+      "loss": 1.6525,
+      "step": 499
+    },
+    {
+      "epoch": 1.1820330969267139,
+      "grad_norm": 3.0509703159332275,
+      "learning_rate": 2.075358851674641e-05,
+      "loss": 1.6,
+      "step": 500
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 846,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.382039870699684e+17,
+  "train_batch_size": 32,
+  "trial_name": null,
+  "trial_params": null
+}
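
The logged values above can be sanity-checked with a short script. This is a minimal sketch, not part of the commit: it assumes the per-step records sit under the `log_history` key, as in `trainer_state.json` files written by the Hugging Face `Trainer`, and it uses three records copied from the log above instead of reading the file. The helper names `lr_step_deltas` and `mean_loss` are hypothetical.

```python
import json

# Three records copied from the log above, wrapped in the assumed
# "log_history" key; "max_steps" and "num_train_epochs" are from the same file.
sample_state = json.loads("""
{
  "max_steps": 846,
  "num_train_epochs": 2,
  "log_history": [
    {"epoch": 1.0141843971631206, "learning_rate": 2.5e-05,
     "loss": 1.6118, "step": 429},
    {"epoch": 1.016548463356974, "learning_rate": 2.494019138755981e-05,
     "loss": 1.6704, "step": 430},
    {"epoch": 1.0189125295508275, "learning_rate": 2.4880382775119617e-05,
     "loss": 1.6719, "step": 431}
  ]
}
""")


def lr_step_deltas(state):
    """Return the drop in learning rate between consecutive logged steps."""
    logs = [r for r in state["log_history"] if "learning_rate" in r]
    return [a["learning_rate"] - b["learning_rate"] for a, b in zip(logs, logs[1:])]


def mean_loss(state):
    """Return the average training loss over all records that carry a loss."""
    losses = [r["loss"] for r in state["log_history"] if "loss" in r]
    return sum(losses) / len(losses)


deltas = lr_step_deltas(sample_state)
steps_per_epoch = sample_state["max_steps"] // sample_state["num_train_epochs"]

print("lr drop per step:", deltas)
print("mean loss:", round(mean_loss(sample_state), 4))
print("steps per epoch:", steps_per_epoch)
```

A constant drop per step is what a linear decay schedule would produce, and 846 steps over 2 epochs gives 423 steps per epoch, which matches the record where `epoch` reaches 1.0 at step 423. To run this against a real checkpoint, replace the inline sample with `json.load(open(path))`, where `path` points at the checkpoint's `trainer_state.json`.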

outputs_pretrained/checkpoint-500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c6a6475df1fbd4035be275a3925896957b71e85be6796a5f125a9e457a9185f9
+size 6225

outputs_pretrained/checkpoint-846/README.md ADDED

	@@ -0,0 +1,210 @@

+---
+base_model: unsloth/mistral-7b-v0.3-bnb-4bit
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/mistral-7b-v0.3-bnb-4bit
+- lora
+- sft
+- transformers
+- trl
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0
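The card's "How to Get Started" section is still the unfilled template. The sketch below shows one way such an adapter might be loaded. It is not taken from the commit. It assumes that `transformers`, `peft`, `accelerate` and `bitsandbytes` are installed, that a CUDA GPU is available, and that the checkpoint directory has been downloaded locally. The helper names `load_adapter` and `generate` are hypothetical.

```python
def load_adapter(checkpoint_dir: str):
    """Load the 4-bit base model and attach the LoRA adapter from checkpoint_dir.

    Imports are done lazily so this module can be imported without the
    heavy dependencies installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # base_model value from the model card's front matter.
    base_id = "unsloth/mistral-7b-v0.3-bnb-4bit"

    # The checkpoint directory contains its own tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

    # Attach the adapter weights stored in adapter_model.safetensors.
    model = PeftModel.from_pretrained(base, checkpoint_dir)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for prompt with the adapted model."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Assuming a local copy of the directory, a call could look like `model, tokenizer = load_adapter("outputs_pretrained/checkpoint-846")` followed by `generate(model, tokenizer, "Hello")`.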

outputs_pretrained/checkpoint-846/adapter_config.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "MistralForCausalLM",
+    "parent_library": "transformers.models.mistral.modeling_mistral",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "unsloth/mistral-7b-v0.3-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "gate_proj",
+    "q_proj",
+    "o_proj",
+    "down_proj",
+    "k_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
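For reference, the LoRA update is multiplied by a scaling factor derived from `lora_alpha` and `r`. With `use_rslora` set to false, as in this config, PEFT uses `lora_alpha / r`; the rank-stabilized variant divides by the square root of `r` instead. A minimal sketch of that arithmetic follows. The function name `lora_scaling` is hypothetical.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA scales by alpha / sqrt(r).
        return lora_alpha / math.sqrt(r)
    # Standard LoRA scales by alpha / r.
    return lora_alpha / r


# Values copied from adapter_config.json above.
config = {"lora_alpha": 128, "r": 64, "use_rslora": False}
scale = lora_scaling(config["lora_alpha"], config["r"], config["use_rslora"])
print(scale)  # 2.0
```

So each adapted projection in `target_modules` behaves like the frozen weight plus 2.0 times the learned low-rank product.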

outputs_pretrained/checkpoint-846/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:afa37d68a6e52933537dfb8a371197f6981e008f924a301c86f7af037fe1e680
+size 1208020312
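The roughly 1.2 GB size can be sanity-checked against the adapter config. The estimate below is a back-of-the-envelope sketch, not something the commit states. The Mistral-7B-v0.3 dimensions and the storage dtypes are assumptions, labeled as such in the code.

```python
# Assumed Mistral-7B-v0.3 dimensions (not stated anywhere in this commit).
HIDDEN, INTERMEDIATE, LAYERS = 4096, 14336, 32
KV_DIM = 1024   # assumed: 8 key/value heads with head size 128
VOCAB = 32768   # assumed vocabulary size
R = 64          # "r" from adapter_config.json

# (in_features, out_features) for each entry in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted layer adds A (r x in_features) and B (out_features x r).
lora_params = LAYERS * sum(R * (i + o) for i, o in shapes.values())

# modules_to_save stores full copies of embed_tokens and lm_head.
saved_params = 2 * VOCAB * HIDDEN

# Assumed storage: LoRA matrices as 4-byte floats, saved modules as 2-byte floats.
estimate = lora_params * 4 + saved_params * 2

pointer_size = 1208020312  # "size" from the LFS pointer above
remainder = pointer_size - estimate
print(lora_params, saved_params, remainder)
```

Under these assumptions the script prints 167772160, 268435456 and 60760. A remainder of about 60 KB is in the range one might expect for a safetensors header describing a few hundred tensors, so the assumed layout is plausible, but it remains an inference.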

outputs_pretrained/checkpoint-846/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb061493a0092f4c652d58fb676ae98004528d90a1b13d5443b8385a0471e108
+size 1687688427

outputs_pretrained/checkpoint-846/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c800b778fa7e115e4c34de8529902de8b61c9a1b4bab3eb8295d06dafff030e
+size 14645

outputs_pretrained/checkpoint-846/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0673af8e418e2eb3861066dee973fbd226b7399c77fe820ef7226fcace23f38
+size 1465

outputs_pretrained/checkpoint-846/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[control_768]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
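Note that the map assigns a dedicated padding token, `[control_768]`, not the end-of-sequence token. The source does not say why. A common reason is that collators which mask padding positions in the labels would otherwise mask end-of-sequence tokens as well. The short check below only inspects the map; the JSON is abridged to the `content` fields shown above.

```python
import json

# Abridged copy of special_tokens_map.json: only the "content" fields are kept.
special_tokens_map = json.loads("""
{
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "pad_token": {"content": "[control_768]"},
  "unk_token": {"content": "<unk>"}
}
""")

# Flatten to {token_role: token_string}.
contents = {name: entry["content"] for name, entry in special_tokens_map.items()}

# True when padding and end-of-sequence are distinct tokens.
pad_differs_from_eos = contents["pad_token"] != contents["eos_token"]
print(contents["pad_token"], pad_differs_from_eos)
```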

outputs_pretrained/checkpoint-846/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

outputs_pretrained/checkpoint-846/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

outputs_pretrained/checkpoint-846/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

outputs_pretrained/checkpoint-846/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

outputs_pretrained/checkpoint-846/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c6a6475df1fbd4035be275a3925896957b71e85be6796a5f125a9e457a9185f9
+size 6225
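This pointer carries the same oid and size as the `checkpoint-500/training_args.bin` pointer earlier in the diff, so both checkpoints reference byte-identical files. Since a Git LFS oid is the SHA-256 of the file content, a downloaded file can be checked against its pointer. The sketch below is illustrative; `matches_pointer` is a hypothetical helper, and the demo uses a temporary file, not the real checkpoint.

```python
import hashlib
import os
import tempfile


def matches_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True if the file at path has the given size and SHA-256 digest."""
    # Cheap check first: a size mismatch rules the file out without hashing.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large checkpoint files are not read at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Self-contained demo on a temporary file (not the real training_args.bin).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo")
demo_ok = matches_pointer(tmp.name, hashlib.sha256(b"demo").hexdigest(), 4)
demo_bad = matches_pointer(tmp.name, "0" * 64, 4)
os.remove(tmp.name)
print(demo_ok, demo_bad)
```

For the real file, the expected values would be the digest and the size `6225` from the pointer above.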