Training in progress, step 301, checkpoint

Files changed (12)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +40 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +89 -0
last-checkpoint/trainer_state.json +2140 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/codellama-7b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/codellama-7b",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "down_proj",
+    "q_proj",
+    "up_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
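
The LoRA settings above determine both the update scaling and the adapter size. A minimal sketch of that arithmetic follows. `lora_alpha`, `r` and `target_modules` are copied from the config; the layer shapes (hidden size 4096, MLP size 11008, 32 layers) are an assumption about the base model and are not stated anywhere in this commit.

```python
# Values copied from adapter_config.json in this commit.
lora_alpha = 16
r = 8
target_modules = [
    "gate_proj", "v_proj", "o_proj", "down_proj", "q_proj", "up_proj", "k_proj",
]

# ASSUMPTION: Llama-7B-style shapes for the base model (not stated in this commit).
HIDDEN, INTERMEDIATE, NUM_LAYERS = 4096, 11008, 32
module_shapes = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Standard LoRA scaling, alpha / r (use_rslora is false in the config).
scaling = lora_alpha / r


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


params_per_layer = sum(lora_params(*module_shapes[m], r) for m in target_modules)
total_params = params_per_layer * NUM_LAYERS

print(f"scaling = {scaling}")
print(f"trainable LoRA parameters = {total_params:,}")
print(f"approx. fp32 size = {total_params * 4:,} bytes")
```

Under these assumed shapes the estimate is close to 80 MB in fp32, which is roughly in line with the size recorded for `adapter_model.safetensors` in this commit. The remaining difference would plausibly be file metadata, though that is not verified here.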

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:53aa1f81954085ee54ed249a101d58719cbbe4434da65f5479b0c4573f08a4a8
+size 80013120
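
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 object ID, and the size in bytes of the real file. As a rough sketch, a pointer like this can be parsed as follows; the helper name `parse_lfs_pointer` is hypothetical and not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above (without the leading '+').
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:53aa1f81954085ee54ed249a101d58719cbbe4434da65f5479b0c4573f08a4a8
size 80013120
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

The same parser applies to the other LFS-tracked files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.model`, `training_args.bin`).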

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2503b123b77ddddc7d4442fc5529e2136ae1a2e1f4eb2a242d06725160e795c
+size 41120084

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:99c2908d6784190f5a199ec0a35ef5871f8ec8214d4b3723fd8a8cf4a38f15cf
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9e79e1ceb6e2a9bc59c7cf76f9ba6717e2a28ad1f0cd7fa11976db6bbc444bf
+size 1064
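
`scheduler.pt` is an opaque binary, but the `learning_rate` values logged in `trainer_state.json` in this same commit look consistent with a linear warmup followed by cosine decay. The sketch below reproduces the first logged values. The warmup length (5 steps) and total step count (1204) are inferred from the logged numbers and are assumptions, not values stated in the commit.

```python
import math

PEAK_LR = 1e-4       # learning rate logged at step 5
WARMUP_STEPS = 5     # ASSUMPTION: inferred from the linear ramp over steps 1-5
TOTAL_STEPS = 1204   # ASSUMPTION: inferred from the logged decay, not stated


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few values from log_history.
for step, logged in [(1, 2e-05), (5, 1e-04), (6, 9.999982836686337e-05)]:
    print(step, lr_at(step), logged)
```

If the inference holds, step 301 sits a quarter of the way through the run, which agrees with the `"epoch": 0.25` recorded in `trainer_state.json`.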

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
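
Note that `additional_special_tokens` lists each of the four infilling tokens twice. Whether the duplicates affect tokenizer behaviour is not established by this diff; the sketch below only shows one way to collapse the list while keeping its order, should a cleaned-up map be wanted.

```python
# List copied from special_tokens_map.json in this commit.
additional_special_tokens = [
    "▁<PRE>", "▁<MID>", "▁<SUF>", "▁<EOT>",
    "▁<PRE>", "▁<MID>", "▁<SUF>", "▁<EOT>",
]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so this removes repeats without reordering the tokens.
unique_tokens = list(dict.fromkeys(additional_special_tokens))

print(unique_tokens)
```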

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
+size 500058

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,89 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32007": {
+      "content": "▁<PRE>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32008": {
+      "content": "▁<SUF>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32009": {
+      "content": "▁<MID>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32010": {
+      "content": "▁<EOT>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "eot_token": "▁<EOT>",
+  "fill_token": "<FILL_ME>",
+  "legacy": null,
+  "middle_token": "▁<MID>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<unk>",
+  "padding_side": "left",
+  "prefix_token": "▁<PRE>",
+  "sp_model_kwargs": {},
+  "suffix_token": "▁<SUF>",
+  "tokenizer_class": "CodeLlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2140 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.25,
+  "eval_steps": 500,
+  "global_step": 301,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0008305647840531562,
+      "grad_norm": 0.2117196023464203,
+      "learning_rate": 2e-05,
+      "loss": 1.6229,
+      "step": 1
+    },
+    {
+      "epoch": 0.0016611295681063123,
+      "grad_norm": 0.22263389825820923,
+      "learning_rate": 4e-05,
+      "loss": 1.6193,
+      "step": 2
+    },
+    {
+      "epoch": 0.0024916943521594683,
+      "grad_norm": 0.2136000245809555,
+      "learning_rate": 6e-05,
+      "loss": 1.6194,
+      "step": 3
+    },
+    {
+      "epoch": 0.0033222591362126247,
+      "grad_norm": 0.1670055240392685,
+      "learning_rate": 8e-05,
+      "loss": 1.5013,
+      "step": 4
+    },
+    {
+      "epoch": 0.004152823920265781,
+      "grad_norm": 0.21128971874713898,
+      "learning_rate": 0.0001,
+      "loss": 1.5584,
+      "step": 5
+    },
+    {
+      "epoch": 0.0049833887043189366,
+      "grad_norm": 0.2818673849105835,
+      "learning_rate": 9.999982836686337e-05,
+      "loss": 1.6258,
+      "step": 6
+    },
+    {
+      "epoch": 0.005813953488372093,
+      "grad_norm": 0.27105721831321716,
+      "learning_rate": 9.999931346863176e-05,
+      "loss": 1.4343,
+      "step": 7
+    },
+    {
+      "epoch": 0.006644518272425249,
+      "grad_norm": 0.391613245010376,
+      "learning_rate": 9.999845530884015e-05,
+      "loss": 1.7102,
+      "step": 8
+    },
+    {
+      "epoch": 0.007475083056478406,
+      "grad_norm": 0.3867242634296417,
+      "learning_rate": 9.999725389338007e-05,
+      "loss": 1.7026,
+      "step": 9
+    },
+    {
+      "epoch": 0.008305647840531562,
+      "grad_norm": 0.3804970681667328,
+      "learning_rate": 9.999570923049964e-05,
+      "loss": 1.526,
+      "step": 10
+    },
+    {
+      "epoch": 0.009136212624584718,
+      "grad_norm": 0.3691057860851288,
+      "learning_rate": 9.999382133080343e-05,
+      "loss": 1.5303,
+      "step": 11
+    },
+    {
+      "epoch": 0.009966777408637873,
+      "grad_norm": 0.36438891291618347,
+      "learning_rate": 9.999159020725254e-05,
+      "loss": 1.4692,
+      "step": 12
+    },
+    {
+      "epoch": 0.01079734219269103,
+      "grad_norm": 0.34645453095436096,
+      "learning_rate": 9.998901587516434e-05,
+      "loss": 1.3396,
+      "step": 13
+    },
+    {
+      "epoch": 0.011627906976744186,
+      "grad_norm": 0.3624647259712219,
+      "learning_rate": 9.998609835221244e-05,
+      "loss": 1.204,
+      "step": 14
+    },
+    {
+      "epoch": 0.012458471760797342,
+      "grad_norm": 0.32159945368766785,
+      "learning_rate": 9.998283765842661e-05,
+      "loss": 1.4673,
+      "step": 15
+    },
+    {
+      "epoch": 0.013289036544850499,
+      "grad_norm": 0.3673219680786133,
+      "learning_rate": 9.997923381619256e-05,
+      "loss": 1.3492,
+      "step": 16
+    },
+    {
+      "epoch": 0.014119601328903655,
+      "grad_norm": 0.39334583282470703,
+      "learning_rate": 9.997528685025185e-05,
+      "loss": 1.5226,
+      "step": 17
+    },
+    {
+      "epoch": 0.014950166112956811,
+      "grad_norm": 0.3874243199825287,
+      "learning_rate": 9.997099678770168e-05,
+      "loss": 1.4233,
+      "step": 18
+    },
+    {
+      "epoch": 0.015780730897009966,
+      "grad_norm": 0.4022265374660492,
+      "learning_rate": 9.99663636579947e-05,
+      "loss": 1.3485,
+      "step": 19
+    },
+    {
+      "epoch": 0.016611295681063124,
+      "grad_norm": 0.466097891330719,
+      "learning_rate": 9.996138749293891e-05,
+      "loss": 1.4112,
+      "step": 20
+    },
+    {
+      "epoch": 0.01744186046511628,
+      "grad_norm": 0.4313271939754486,
+      "learning_rate": 9.995606832669726e-05,
+      "loss": 1.3705,
+      "step": 21
+    },
+    {
+      "epoch": 0.018272425249169437,
+      "grad_norm": 0.45213770866394043,
+      "learning_rate": 9.995040619578757e-05,
+      "loss": 1.3108,
+      "step": 22
+    },
+    {
+      "epoch": 0.01910299003322259,
+      "grad_norm": 1.3649903535842896,
+      "learning_rate": 9.994440113908221e-05,
+      "loss": 1.3395,
+      "step": 23
+    },
+    {
+      "epoch": 0.019933554817275746,
+      "grad_norm": 0.45877453684806824,
+      "learning_rate": 9.993805319780784e-05,
+      "loss": 1.3802,
+      "step": 24
+    },
+    {
+      "epoch": 0.020764119601328904,
+      "grad_norm": 0.4663592576980591,
+      "learning_rate": 9.993136241554515e-05,
+      "loss": 1.2858,
+      "step": 25
+    },
+    {
+      "epoch": 0.02159468438538206,
+      "grad_norm": 0.5290528535842896,
+      "learning_rate": 9.992432883822855e-05,
+      "loss": 1.5919,
+      "step": 26
+    },
+    {
+      "epoch": 0.022425249169435217,
+      "grad_norm": 0.4811062514781952,
+      "learning_rate": 9.991695251414583e-05,
+      "loss": 1.4016,
+      "step": 27
+    },
+    {
+      "epoch": 0.023255813953488372,
+      "grad_norm": 0.5235459804534912,
+      "learning_rate": 9.990923349393786e-05,
+      "loss": 1.4679,
+      "step": 28
+    },
+    {
+      "epoch": 0.02408637873754153,
+      "grad_norm": 0.4810985326766968,
+      "learning_rate": 9.990117183059819e-05,
+      "loss": 1.2888,
+      "step": 29
+    },
+    {
+      "epoch": 0.024916943521594685,
+      "grad_norm": 0.5298346877098083,
+      "learning_rate": 9.989276757947281e-05,
+      "loss": 1.3626,
+      "step": 30
+    },
+    {
+      "epoch": 0.02574750830564784,
+      "grad_norm": 0.5100981593132019,
+      "learning_rate": 9.988402079825962e-05,
+      "loss": 1.2308,
+      "step": 31
+    },
+    {
+      "epoch": 0.026578073089700997,
+      "grad_norm": 0.5727996230125427,
+      "learning_rate": 9.987493154700812e-05,
+      "loss": 1.287,
+      "step": 32
+    },
+    {
+      "epoch": 0.027408637873754152,
+      "grad_norm": 0.5358831882476807,
+      "learning_rate": 9.986549988811897e-05,
+      "loss": 1.3299,
+      "step": 33
+    },
+    {
+      "epoch": 0.02823920265780731,
+      "grad_norm": 0.6017313003540039,
+      "learning_rate": 9.98557258863436e-05,
+      "loss": 1.3583,
+      "step": 34
+    },
+    {
+      "epoch": 0.029069767441860465,
+      "grad_norm": 0.5905628800392151,
+      "learning_rate": 9.984560960878369e-05,
+      "loss": 1.3814,
+      "step": 35
+    },
+    {
+      "epoch": 0.029900332225913623,
+      "grad_norm": 0.6339725255966187,
+      "learning_rate": 9.98351511248908e-05,
+      "loss": 1.3815,
+      "step": 36
+    },
+    {
+      "epoch": 0.030730897009966777,
+      "grad_norm": 0.6721277236938477,
+      "learning_rate": 9.98243505064658e-05,
+      "loss": 1.4925,
+      "step": 37
+    },
+    {
+      "epoch": 0.03156146179401993,
+      "grad_norm": 0.6016228795051575,
+      "learning_rate": 9.981320782765846e-05,
+      "loss": 1.2665,
+      "step": 38
+    },
+    {
+      "epoch": 0.03239202657807309,
+      "grad_norm": 0.5978913307189941,
+      "learning_rate": 9.98017231649669e-05,
+      "loss": 1.1268,
+      "step": 39
+    },
+    {
+      "epoch": 0.03322259136212625,
+      "grad_norm": 0.7696051597595215,
+      "learning_rate": 9.978989659723709e-05,
+      "loss": 1.5571,
+      "step": 40
+    },
+    {
+      "epoch": 0.0340531561461794,
+      "grad_norm": 1.1089845895767212,
+      "learning_rate": 9.977772820566222e-05,
+      "loss": 1.8723,
+      "step": 41
+    },
+    {
+      "epoch": 0.03488372093023256,
+      "grad_norm": 0.8813836574554443,
+      "learning_rate": 9.976521807378229e-05,
+      "loss": 1.3501,
+      "step": 42
+    },
+    {
+      "epoch": 0.03571428571428571,
+      "grad_norm": 1.1799523830413818,
+      "learning_rate": 9.975236628748343e-05,
+      "loss": 1.6155,
+      "step": 43
+    },
+    {
+      "epoch": 0.036544850498338874,
+      "grad_norm": 3.413243293762207,
+      "learning_rate": 9.973917293499732e-05,
+      "loss": 2.027,
+      "step": 44
+    },
+    {
+      "epoch": 0.03737541528239203,
+      "grad_norm": 1.4013257026672363,
+      "learning_rate": 9.972563810690062e-05,
+      "loss": 1.1709,
+      "step": 45
+    },
+    {
+      "epoch": 0.03820598006644518,
+      "grad_norm": 1.8023418188095093,
+      "learning_rate": 9.971176189611434e-05,
+      "loss": 1.8565,
+      "step": 46
+    },
+    {
+      "epoch": 0.03903654485049834,
+      "grad_norm": 1.6363664865493774,
+      "learning_rate": 9.969754439790317e-05,
+      "loss": 1.5719,
+      "step": 47
+    },
+    {
+      "epoch": 0.03986710963455149,
+      "grad_norm": 1.4404523372650146,
+      "learning_rate": 9.968298570987489e-05,
+      "loss": 2.0997,
+      "step": 48
+    },
+    {
+      "epoch": 0.040697674418604654,
+      "grad_norm": 1.0700703859329224,
+      "learning_rate": 9.966808593197959e-05,
+      "loss": 1.8537,
+      "step": 49
+    },
+    {
+      "epoch": 0.04152823920265781,
+      "grad_norm": 0.8873586654663086,
+      "learning_rate": 9.965284516650913e-05,
+      "loss": 2.1512,
+      "step": 50
+    },
+    {
+      "epoch": 0.04235880398671096,
+      "grad_norm": 0.4831582307815552,
+      "learning_rate": 9.96372635180963e-05,
+      "loss": 1.1619,
+      "step": 51
+    },
+    {
+      "epoch": 0.04318936877076412,
+      "grad_norm": 0.5050444602966309,
+      "learning_rate": 9.96213410937142e-05,
+      "loss": 1.4075,
+      "step": 52
+    },
+    {
+      "epoch": 0.04401993355481727,
+      "grad_norm": 0.4076911509037018,
+      "learning_rate": 9.960507800267546e-05,
+      "loss": 1.1418,
+      "step": 53
+    },
+    {
+      "epoch": 0.044850498338870434,
+      "grad_norm": 0.47769635915756226,
+      "learning_rate": 9.95884743566315e-05,
+      "loss": 1.5931,
+      "step": 54
+    },
+    {
+      "epoch": 0.04568106312292359,
+      "grad_norm": 0.389701783657074,
+      "learning_rate": 9.957153026957173e-05,
+      "loss": 1.1136,
+      "step": 55
+    },
+    {
+      "epoch": 0.046511627906976744,
+      "grad_norm": 0.3820691704750061,
+      "learning_rate": 9.955424585782283e-05,
+      "loss": 1.1726,
+      "step": 56
+    },
+    {
+      "epoch": 0.0473421926910299,
+      "grad_norm": 0.3703025281429291,
+      "learning_rate": 9.953662124004792e-05,
+      "loss": 1.0905,
+      "step": 57
+    },
+    {
+      "epoch": 0.04817275747508306,
+      "grad_norm": 0.42741143703460693,
+      "learning_rate": 9.951865653724574e-05,
+      "loss": 1.3651,
+      "step": 58
+    },
+    {
+      "epoch": 0.049003322259136214,
+      "grad_norm": 0.41867637634277344,
+      "learning_rate": 9.950035187274982e-05,
+      "loss": 1.2327,
+      "step": 59
+    },
+    {
+      "epoch": 0.04983388704318937,
+      "grad_norm": 0.4410296082496643,
+      "learning_rate": 9.948170737222762e-05,
+      "loss": 1.5586,
+      "step": 60
+    },
+    {
+      "epoch": 0.050664451827242524,
+      "grad_norm": 0.40873047709465027,
+      "learning_rate": 9.946272316367973e-05,
+      "loss": 1.2033,
+      "step": 61
+    },
+    {
+      "epoch": 0.05149501661129568,
+      "grad_norm": 0.435169517993927,
+      "learning_rate": 9.94433993774389e-05,
+      "loss": 1.0373,
+      "step": 62
+    },
+    {
+      "epoch": 0.05232558139534884,
+      "grad_norm": 0.4170421063899994,
+      "learning_rate": 9.942373614616925e-05,
+      "loss": 1.1679,
+      "step": 63
+    },
+    {
+      "epoch": 0.053156146179401995,
+      "grad_norm": 0.44350698590278625,
+      "learning_rate": 9.940373360486519e-05,
+      "loss": 1.2918,
+      "step": 64
+    },
+    {
+      "epoch": 0.05398671096345515,
+      "grad_norm": 0.3921529948711395,
+      "learning_rate": 9.938339189085075e-05,
+      "loss": 1.2783,
+      "step": 65
+    },
+    {
+      "epoch": 0.054817275747508304,
+      "grad_norm": 0.44321513175964355,
+      "learning_rate": 9.936271114377837e-05,
+      "loss": 1.3418,
+      "step": 66
+    },
+    {
+      "epoch": 0.05564784053156146,
+      "grad_norm": 0.42089399695396423,
+      "learning_rate": 9.934169150562815e-05,
+      "loss": 1.0298,
+      "step": 67
+    },
+    {
+      "epoch": 0.05647840531561462,
+      "grad_norm": 0.41770365834236145,
+      "learning_rate": 9.93203331207067e-05,
+      "loss": 1.1909,
+      "step": 68
+    },
+    {
+      "epoch": 0.057308970099667775,
+      "grad_norm": 0.4615263044834137,
+      "learning_rate": 9.929863613564631e-05,
+      "loss": 1.1716,
+      "step": 69
+    },
+    {
+      "epoch": 0.05813953488372093,
+      "grad_norm": 0.43346571922302246,
+      "learning_rate": 9.927660069940385e-05,
+      "loss": 1.0525,
+      "step": 70
+    },
+    {
+      "epoch": 0.058970099667774084,
+      "grad_norm": 0.4426651895046234,
+      "learning_rate": 9.925422696325975e-05,
+      "loss": 1.1438,
+      "step": 71
+    },
+    {
+      "epoch": 0.059800664451827246,
+      "grad_norm": 0.4421578347682953,
+      "learning_rate": 9.9231515080817e-05,
+      "loss": 1.1144,
+      "step": 72
+    },
+    {
+      "epoch": 0.0606312292358804,
+      "grad_norm": 0.5192376375198364,
+      "learning_rate": 9.920846520800004e-05,
+      "loss": 1.2093,
+      "step": 73
+    },
+    {
+      "epoch": 0.061461794019933555,
+      "grad_norm": 0.44912227988243103,
+      "learning_rate": 9.918507750305379e-05,
+      "loss": 0.8988,
+      "step": 74
+    },
+    {
+      "epoch": 0.06229235880398671,
+      "grad_norm": 0.4720863103866577,
+      "learning_rate": 9.916135212654242e-05,
+      "loss": 1.0723,
+      "step": 75
+    },
+    {
+      "epoch": 0.06312292358803986,
+      "grad_norm": 0.5786833763122559,
+      "learning_rate": 9.913728924134838e-05,
+      "loss": 1.4994,
+      "step": 76
+    },
+    {
+      "epoch": 0.06395348837209303,
+      "grad_norm": 0.4157602787017822,
+      "learning_rate": 9.91128890126712e-05,
+      "loss": 0.9483,
+      "step": 77
+    },
+    {
+      "epoch": 0.06478405315614617,
+      "grad_norm": 0.5317956805229187,
+      "learning_rate": 9.90881516080264e-05,
+      "loss": 1.1812,
+      "step": 78
+    },
+    {
+      "epoch": 0.06561461794019934,
+      "grad_norm": 0.5728016495704651,
+      "learning_rate": 9.906307719724431e-05,
+      "loss": 1.3432,
+      "step": 79
+    },
+    {
+      "epoch": 0.0664451827242525,
+      "grad_norm": 0.4920070469379425,
+      "learning_rate": 9.903766595246893e-05,
+      "loss": 1.0181,
+      "step": 80
+    },
+    {
+      "epoch": 0.06727574750830564,
+      "grad_norm": 0.48954302072525024,
+      "learning_rate": 9.90119180481567e-05,
+      "loss": 1.1,
+      "step": 81
+    },
+    {
+      "epoch": 0.0681063122923588,
+      "grad_norm": 0.5216732621192932,
+      "learning_rate": 9.898583366107538e-05,
+      "loss": 1.2424,
+      "step": 82
+    },
+    {
+      "epoch": 0.06893687707641195,
+      "grad_norm": 0.5158425569534302,
+      "learning_rate": 9.895941297030278e-05,
+      "loss": 1.2009,
+      "step": 83
+    },
+    {
+      "epoch": 0.06976744186046512,
+      "grad_norm": 0.5216576457023621,
+      "learning_rate": 9.893265615722554e-05,
+      "loss": 1.1335,
+      "step": 84
+    },
+    {
+      "epoch": 0.07059800664451828,
+      "grad_norm": 0.6133514046669006,
+      "learning_rate": 9.890556340553787e-05,
+      "loss": 1.076,
+      "step": 85
+    },
+    {
+      "epoch": 0.07142857142857142,
+      "grad_norm": 0.5426653027534485,
+      "learning_rate": 9.887813490124036e-05,
+      "loss": 1.2493,
+      "step": 86
+    },
+    {
+      "epoch": 0.07225913621262459,
+      "grad_norm": 0.5876060724258423,
+      "learning_rate": 9.885037083263862e-05,
+      "loss": 1.0067,
+      "step": 87
+    },
+    {
+      "epoch": 0.07308970099667775,
+      "grad_norm": 0.5988413691520691,
+      "learning_rate": 9.882227139034197e-05,
+      "loss": 1.1349,
+      "step": 88
+    },
+    {
+      "epoch": 0.0739202657807309,
+      "grad_norm": 0.7058652639389038,
+      "learning_rate": 9.879383676726228e-05,
+      "loss": 1.2409,
+      "step": 89
+    },
+    {
+      "epoch": 0.07475083056478406,
+      "grad_norm": 1.251499891281128,
+      "learning_rate": 9.876506715861247e-05,
+      "loss": 1.5029,
+      "step": 90
+    },
+    {
+      "epoch": 0.0755813953488372,
+      "grad_norm": 1.0047953128814697,
+      "learning_rate": 9.873596276190526e-05,
+      "loss": 1.2852,
+      "step": 91
+    },
+    {
+      "epoch": 0.07641196013289037,
+      "grad_norm": 1.0148297548294067,
+      "learning_rate": 9.870652377695181e-05,
+      "loss": 1.3367,
+      "step": 92
+    },
+    {
+      "epoch": 0.07724252491694353,
+      "grad_norm": 0.968641459941864,
+      "learning_rate": 9.867675040586034e-05,
+      "loss": 1.2301,
+      "step": 93
+    },
+    {
+      "epoch": 0.07807308970099668,
+      "grad_norm": 0.8970495462417603,
+      "learning_rate": 9.864664285303474e-05,
+      "loss": 1.0635,
+      "step": 94
+    },
+    {
+      "epoch": 0.07890365448504984,
+      "grad_norm": 0.6693372130393982,
+      "learning_rate": 9.861620132517313e-05,
+      "loss": 1.4661,
+      "step": 95
+    },
+    {
+      "epoch": 0.07973421926910298,
+      "grad_norm": 0.663826584815979,
+      "learning_rate": 9.858542603126653e-05,
+      "loss": 1.4328,
+      "step": 96
+    },
+    {
+      "epoch": 0.08056478405315615,
+      "grad_norm": 0.9918713569641113,
+      "learning_rate": 9.855431718259734e-05,
+      "loss": 1.7106,
+      "step": 97
+    },
+    {
+      "epoch": 0.08139534883720931,
+      "grad_norm": 1.2102949619293213,
+      "learning_rate": 9.852287499273795e-05,
+      "loss": 1.5173,
+      "step": 98
+    },
+    {
+      "epoch": 0.08222591362126246,
+      "grad_norm": 0.7180892825126648,
+      "learning_rate": 9.849109967754919e-05,
+      "loss": 1.9093,
+      "step": 99
+    },
+    {
+      "epoch": 0.08305647840531562,
+      "grad_norm": 0.7022852301597595,
+      "learning_rate": 9.845899145517898e-05,
+      "loss": 1.8911,
+      "step": 100
+    },
+    {
+      "epoch": 0.08388704318936877,
+      "grad_norm": 1.3352265357971191,
+      "learning_rate": 9.84265505460607e-05,
+      "loss": 1.2523,
+      "step": 101
+    },
+    {
+      "epoch": 0.08471760797342193,
+      "grad_norm": 0.9616285562515259,
+      "learning_rate": 9.839377717291172e-05,
+      "loss": 1.1401,
+      "step": 102
+    },
+    {
+      "epoch": 0.08554817275747509,
+      "grad_norm": 0.5887272953987122,
+      "learning_rate": 9.836067156073196e-05,
+      "loss": 1.1497,
+      "step": 103
+    },
+    {
+      "epoch": 0.08637873754152824,
+      "grad_norm": 0.4377554655075073,
+      "learning_rate": 9.83272339368022e-05,
+      "loss": 1.1824,
+      "step": 104
+    },
+    {
+      "epoch": 0.0872093023255814,
+      "grad_norm": 0.4125669300556183,
+      "learning_rate": 9.829346453068263e-05,
+      "loss": 1.0539,
+      "step": 105
+    },
+    {
+      "epoch": 0.08803986710963455,
+      "grad_norm": 0.42765673995018005,
+      "learning_rate": 9.825936357421119e-05,
+      "loss": 1.0731,
+      "step": 106
+    },
+    {
+      "epoch": 0.0888704318936877,
+      "grad_norm": 0.4212810695171356,
+      "learning_rate": 9.822493130150205e-05,
+      "loss": 1.2554,
+      "step": 107
+    },
+    {
+      "epoch": 0.08970099667774087,
+      "grad_norm": 0.4331800043582916,
+      "learning_rate": 9.819016794894398e-05,
+      "loss": 1.1332,
+      "step": 108
+    },
+    {
+      "epoch": 0.09053156146179402,
+      "grad_norm": 0.4134885370731354,
+      "learning_rate": 9.81550737551987e-05,
+      "loss": 1.1947,
+      "step": 109
+    },
+    {
+      "epoch": 0.09136212624584718,
+      "grad_norm": 0.38614025712013245,
+      "learning_rate": 9.811964896119925e-05,
+      "loss": 1.096,
+      "step": 110
+    },
+    {
+      "epoch": 0.09219269102990033,
+      "grad_norm": 0.4370211064815521,
+      "learning_rate": 9.808389381014842e-05,
+      "loss": 1.1304,
+      "step": 111
+    },
+    {
+      "epoch": 0.09302325581395349,
+      "grad_norm": 0.4610917866230011,
+      "learning_rate": 9.804780854751692e-05,
+      "loss": 1.1183,
+      "step": 112
+    },
+    {
+      "epoch": 0.09385382059800665,
+      "grad_norm": 0.4317246973514557,
+      "learning_rate": 9.801139342104185e-05,
+      "loss": 1.0942,
+      "step": 113
+    },
+    {
+      "epoch": 0.0946843853820598,
+      "grad_norm": 0.35810256004333496,
+      "learning_rate": 9.797464868072488e-05,
+      "loss": 1.0067,
+      "step": 114
+    },
+    {
+      "epoch": 0.09551495016611296,
+      "grad_norm": 0.32996103167533875,
+      "learning_rate": 9.793757457883062e-05,
+      "loss": 1.0033,
+      "step": 115
+    },
+    {
+      "epoch": 0.09634551495016612,
+      "grad_norm": 0.3516066074371338,
+      "learning_rate": 9.790017136988486e-05,
+      "loss": 1.0525,
+      "step": 116
+    },
+    {
+      "epoch": 0.09717607973421927,
+      "grad_norm": 0.34211546182632446,
+      "learning_rate": 9.786243931067277e-05,
+      "loss": 1.0535,
+      "step": 117
+    },
+    {
+      "epoch": 0.09800664451827243,
+      "grad_norm": 0.5199074149131775,
+      "learning_rate": 9.782437866023724e-05,
+      "loss": 0.9772,
+      "step": 118
+    },
+    {
+      "epoch": 0.09883720930232558,
+      "grad_norm": 0.3598504960536957,
+      "learning_rate": 9.778598967987704e-05,
+      "loss": 0.956,
+      "step": 119
+    },
+    {
+      "epoch": 0.09966777408637874,
+      "grad_norm": 0.3455759882926941,
+      "learning_rate": 9.774727263314498e-05,
+      "loss": 0.935,
+      "step": 120
+    },
+    {
+      "epoch": 0.1004983388704319,
+      "grad_norm": 0.3836323916912079,
+      "learning_rate": 9.770822778584621e-05,
+      "loss": 1.1212,
+      "step": 121
+    },
+    {
+      "epoch": 0.10132890365448505,
+      "grad_norm": 0.35276830196380615,
+      "learning_rate": 9.76688554060363e-05,
+      "loss": 1.1387,
+      "step": 122
+    },
+    {
+      "epoch": 0.10215946843853821,
+      "grad_norm": 0.4226708710193634,
+      "learning_rate": 9.762915576401945e-05,
+      "loss": 1.1717,
+      "step": 123
+    },
+    {
+      "epoch": 0.10299003322259136,
+      "grad_norm": 0.43557512760162354,
+      "learning_rate": 9.758912913234665e-05,
+      "loss": 1.2712,
+      "step": 124
+    },
+    {
+      "epoch": 0.10382059800664452,
+      "grad_norm": 0.4375877380371094,
+      "learning_rate": 9.754877578581372e-05,
+      "loss": 1.0037,
+      "step": 125
+    },
+    {
+      "epoch": 0.10465116279069768,
+      "grad_norm": 0.5096732378005981,
+      "learning_rate": 9.750809600145954e-05,
+      "loss": 1.1839,
+      "step": 126
+    },
+    {
+      "epoch": 0.10548172757475083,
+      "grad_norm": 0.44850867986679077,
+      "learning_rate": 9.746709005856406e-05,
+      "loss": 1.1614,
+      "step": 127
+    },
+    {
+      "epoch": 0.10631229235880399,
+      "grad_norm": 0.42907238006591797,
+      "learning_rate": 9.742575823864643e-05,
+      "loss": 0.9948,
+      "step": 128
+    },
+    {
+      "epoch": 0.10714285714285714,
+      "grad_norm": 0.47005370259284973,
+      "learning_rate": 9.738410082546304e-05,
+      "loss": 1.3166,
+      "step": 129
+    },
+    {
+      "epoch": 0.1079734219269103,
+      "grad_norm": 0.4051884710788727,
+      "learning_rate": 9.734211810500557e-05,
+      "loss": 0.9562,
+      "step": 130
+    },
+    {
+      "epoch": 0.10880398671096346,
+      "grad_norm": 0.4847196042537689,
+      "learning_rate": 9.729981036549911e-05,
+      "loss": 1.0946,
+      "step": 131
+    },
+    {
+      "epoch": 0.10963455149501661,
+      "grad_norm": 0.4471109211444855,
+      "learning_rate": 9.725717789740002e-05,
+      "loss": 0.9905,
+      "step": 132
+    },
+    {
+      "epoch": 0.11046511627906977,
+      "grad_norm": 0.5282467007637024,
+      "learning_rate": 9.721422099339408e-05,
+      "loss": 1.3748,
+      "step": 133
+    },
+    {
+      "epoch": 0.11129568106312292,
+      "grad_norm": 0.46753332018852234,
+      "learning_rate": 9.717093994839443e-05,
+      "loss": 1.0631,
+      "step": 134
+    },
+    {
+      "epoch": 0.11212624584717608,
+      "grad_norm": 0.6386809349060059,
+      "learning_rate": 9.712733505953951e-05,
+      "loss": 1.3702,
+      "step": 135
+    },
+    {
+      "epoch": 0.11295681063122924,
+      "grad_norm": 0.5727132558822632,
+      "learning_rate": 9.708340662619108e-05,
+      "loss": 1.0232,
+      "step": 136
+    },
+    {
+      "epoch": 0.11378737541528239,
+      "grad_norm": 0.542736291885376,
+      "learning_rate": 9.703915494993215e-05,
+      "loss": 1.0423,
+      "step": 137
+    },
+    {
+      "epoch": 0.11461794019933555,
+      "grad_norm": 0.5198262929916382,
+      "learning_rate": 9.699458033456485e-05,
+      "loss": 1.0543,
+      "step": 138
+    },
+    {
+      "epoch": 0.11544850498338871,
+      "grad_norm": 0.6104252934455872,
+      "learning_rate": 9.694968308610844e-05,
+      "loss": 1.227,
+      "step": 139
+    },
+    {
+      "epoch": 0.11627906976744186,
+      "grad_norm": 0.6988504528999329,
+      "learning_rate": 9.690446351279713e-05,
+      "loss": 1.4354,
+      "step": 140
+    },
+    {
+      "epoch": 0.11710963455149502,
+      "grad_norm": 1.157366156578064,
+      "learning_rate": 9.685892192507804e-05,
+      "loss": 1.4597,
+      "step": 141
+    },
+    {
+      "epoch": 0.11794019933554817,
+      "grad_norm": 1.6810818910598755,
+      "learning_rate": 9.681305863560896e-05,
+      "loss": 1.6604,
+      "step": 142
+    },
+    {
+      "epoch": 0.11877076411960133,
+      "grad_norm": 2.7330241203308105,
+      "learning_rate": 9.676687395925631e-05,
+      "loss": 0.7902,
+      "step": 143
+    },
+    {
+      "epoch": 0.11960132890365449,
+      "grad_norm": 0.6627132892608643,
+      "learning_rate": 9.672036821309292e-05,
+      "loss": 1.3769,
+      "step": 144
+    },
+    {
+      "epoch": 0.12043189368770764,
+      "grad_norm": 0.6601414680480957,
+      "learning_rate": 9.667354171639589e-05,
+      "loss": 1.305,
+      "step": 145
+    },
+    {
+      "epoch": 0.1212624584717608,
+      "grad_norm": 0.7301626205444336,
+      "learning_rate": 9.662639479064433e-05,
+      "loss": 1.6287,
+      "step": 146
+    },
+    {
+      "epoch": 0.12209302325581395,
+      "grad_norm": 0.8652046322822571,
+      "learning_rate": 9.657892775951725e-05,
+      "loss": 1.7812,
+      "step": 147
+    },
+    {
+      "epoch": 0.12292358803986711,
+      "grad_norm": 0.8586255311965942,
+      "learning_rate": 9.653114094889127e-05,
+      "loss": 1.8033,
+      "step": 148
+    },
+    {
+      "epoch": 0.12375415282392027,
+      "grad_norm": 0.7669869065284729,
+      "learning_rate": 9.64830346868384e-05,
+      "loss": 1.7385,
+      "step": 149
+    },
+    {
+      "epoch": 0.12458471760797342,
+      "grad_norm": 0.776881992816925,
+      "learning_rate": 9.643460930362377e-05,
+      "loss": 1.6298,
+      "step": 150
+    },
+    {
+      "epoch": 0.12541528239202657,
+      "grad_norm": 5.892143249511719,
+      "learning_rate": 9.63858651317034e-05,
+      "loss": 1.3574,
+      "step": 151
+    },
+    {
+      "epoch": 0.12624584717607973,
+      "grad_norm": 1.7542022466659546,
+      "learning_rate": 9.633680250572192e-05,
+      "loss": 1.5287,
+      "step": 152
+    },
+    {
+      "epoch": 0.1270764119601329,
+      "grad_norm": 1.029563546180725,
+      "learning_rate": 9.628742176251019e-05,
+      "loss": 1.1049,
+      "step": 153
+    },
+    {
+      "epoch": 0.12790697674418605,
+      "grad_norm": 0.8376612663269043,
+      "learning_rate": 9.62377232410831e-05,
+      "loss": 1.4343,
+      "step": 154
+    },
+    {
+      "epoch": 0.1287375415282392,
+      "grad_norm": 0.6936919093132019,
+      "learning_rate": 9.618770728263717e-05,
+      "loss": 1.401,
+      "step": 155
+    },
+    {
+      "epoch": 0.12956810631229235,
+      "grad_norm": 0.45092615485191345,
+      "learning_rate": 9.613737423054825e-05,
+      "loss": 1.1354,
+      "step": 156
+    },
+    {
+      "epoch": 0.1303986710963455,
+      "grad_norm": 0.42109444737434387,
+      "learning_rate": 9.60867244303691e-05,
+      "loss": 1.0554,
+      "step": 157
+    },
+    {
+      "epoch": 0.13122923588039867,
+      "grad_norm": 0.37955740094184875,
+      "learning_rate": 9.603575822982711e-05,
+      "loss": 1.0722,
+      "step": 158
+    },
+    {
+      "epoch": 0.13205980066445183,
+      "grad_norm": 0.32876577973365784,
+      "learning_rate": 9.598447597882181e-05,
+      "loss": 1.0856,
+      "step": 159
+    },
+    {
+      "epoch": 0.132890365448505,
+      "grad_norm": 0.4142758548259735,
+      "learning_rate": 9.593287802942255e-05,
+      "loss": 1.0595,
+      "step": 160
+    },
+    {
+      "epoch": 0.13372093023255813,
+      "grad_norm": 0.3402114510536194,
+      "learning_rate": 9.588096473586605e-05,
+      "loss": 1.0763,
+      "step": 161
+    },
+    {
+      "epoch": 0.1345514950166113,
+      "grad_norm": 0.5065223574638367,
+      "learning_rate": 9.582873645455397e-05,
+      "loss": 1.2466,
+      "step": 162
+    },
+    {
+      "epoch": 0.13538205980066445,
+      "grad_norm": 0.4137636125087738,
+      "learning_rate": 9.577619354405047e-05,
+      "loss": 1.1714,
+      "step": 163
+    },
+    {
+      "epoch": 0.1362126245847176,
+      "grad_norm": 0.43655475974082947,
+      "learning_rate": 9.57233363650797e-05,
+      "loss": 1.1803,
+      "step": 164
+    },
+    {
+      "epoch": 0.13704318936877077,
+      "grad_norm": 0.36370548605918884,
+      "learning_rate": 9.567016528052342e-05,
+      "loss": 1.2056,
+      "step": 165
+    },
+    {
+      "epoch": 0.1378737541528239,
+      "grad_norm": 0.40192461013793945,
+      "learning_rate": 9.561668065541841e-05,
+      "loss": 1.2035,
+      "step": 166
+    },
+    {
+      "epoch": 0.13870431893687707,
+      "grad_norm": 0.343547523021698,
+      "learning_rate": 9.556288285695405e-05,
+      "loss": 0.9322,
+      "step": 167
+    },
+    {
+      "epoch": 0.13953488372093023,
+      "grad_norm": 0.38246122002601624,
+      "learning_rate": 9.550877225446973e-05,
+      "loss": 1.1068,
+      "step": 168
+    },
+    {
+      "epoch": 0.1403654485049834,
+      "grad_norm": 0.35736140608787537,
+      "learning_rate": 9.545434921945236e-05,
+      "loss": 1.0511,
+      "step": 169
+    },
+    {
+      "epoch": 0.14119601328903655,
+      "grad_norm": 0.3873831629753113,
+      "learning_rate": 9.539961412553375e-05,
+      "loss": 1.1969,
+      "step": 170
+    },
+    {
+      "epoch": 0.1420265780730897,
+      "grad_norm": 0.3767174184322357,
+      "learning_rate": 9.534456734848816e-05,
+      "loss": 1.0787,
+      "step": 171
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 0.3375527262687683,
+      "learning_rate": 9.528920926622965e-05,
+      "loss": 1.0313,
+      "step": 172
+    },
+    {
+      "epoch": 0.143687707641196,
+      "grad_norm": 0.365445613861084,
+      "learning_rate": 9.523354025880944e-05,
+      "loss": 0.9471,
+      "step": 173
+    },
+    {
+      "epoch": 0.14451827242524917,
+      "grad_norm": 0.3636738955974579,
+      "learning_rate": 9.517756070841339e-05,
+      "loss": 0.9616,
+      "step": 174
+    },
+    {
+      "epoch": 0.14534883720930233,
+      "grad_norm": 0.5927349328994751,
+      "learning_rate": 9.512127099935933e-05,
+      "loss": 1.1494,
+      "step": 175
+    },
+    {
+      "epoch": 0.1461794019933555,
+      "grad_norm": 0.38282474875450134,
+      "learning_rate": 9.506467151809446e-05,
+      "loss": 1.0891,
+      "step": 176
+    },
+    {
+      "epoch": 0.14700996677740863,
+      "grad_norm": 0.42068254947662354,
+      "learning_rate": 9.50077626531926e-05,
+      "loss": 1.0558,
+      "step": 177
+    },
+    {
+      "epoch": 0.1478405315614618,
+      "grad_norm": 0.5121377110481262,
+      "learning_rate": 9.495054479535167e-05,
+      "loss": 1.0502,
+      "step": 178
+    },
+    {
+      "epoch": 0.14867109634551495,
+      "grad_norm": 0.43572869896888733,
+      "learning_rate": 9.489301833739087e-05,
+      "loss": 1.1904,
+      "step": 179
+    },
+    {
+      "epoch": 0.14950166112956811,
+      "grad_norm": 0.4477289319038391,
+      "learning_rate": 9.483518367424803e-05,
+      "loss": 0.8775,
+      "step": 180
+    },
+    {
+      "epoch": 0.15033222591362128,
+      "grad_norm": 0.43644505739212036,
+      "learning_rate": 9.477704120297697e-05,
+      "loss": 0.883,
+      "step": 181
+    },
+    {
+      "epoch": 0.1511627906976744,
+      "grad_norm": 0.42366358637809753,
+      "learning_rate": 9.471859132274467e-05,
+      "loss": 1.0469,
+      "step": 182
+    },
+    {
+      "epoch": 0.15199335548172757,
+      "grad_norm": 0.5172445178031921,
+      "learning_rate": 9.465983443482858e-05,
+      "loss": 1.1909,
+      "step": 183
+    },
+    {
+      "epoch": 0.15282392026578073,
+      "grad_norm": 0.5384465456008911,
+      "learning_rate": 9.460077094261386e-05,
+      "loss": 1.463,
+      "step": 184
+    },
+    {
+      "epoch": 0.1536544850498339,
+      "grad_norm": 0.47480395436286926,
+      "learning_rate": 9.454140125159059e-05,
+      "loss": 1.017,
+      "step": 185
+    },
+    {
+      "epoch": 0.15448504983388706,
+      "grad_norm": 0.48875659704208374,
+      "learning_rate": 9.448172576935106e-05,
+      "loss": 1.2197,
+      "step": 186
+    },
+    {
+      "epoch": 0.1553156146179402,
+      "grad_norm": 0.5268120169639587,
+      "learning_rate": 9.442174490558684e-05,
+      "loss": 1.1601,
+      "step": 187
+    },
+    {
+      "epoch": 0.15614617940199335,
+      "grad_norm": 0.5318688154220581,
+      "learning_rate": 9.436145907208609e-05,
+      "loss": 1.229,
+      "step": 188
+    },
+    {
+      "epoch": 0.1569767441860465,
+      "grad_norm": 0.6206968426704407,
+      "learning_rate": 9.430086868273069e-05,
+      "loss": 1.1517,
+      "step": 189
+    },
+    {
+      "epoch": 0.15780730897009967,
+      "grad_norm": 0.7821551561355591,
+      "learning_rate": 9.423997415349338e-05,
+      "loss": 1.3889,
+      "step": 190
+    },
+    {
+      "epoch": 0.15863787375415284,
+      "grad_norm": 0.7484890818595886,
+      "learning_rate": 9.417877590243492e-05,
+      "loss": 1.2834,
+      "step": 191
+    },
+    {
+      "epoch": 0.15946843853820597,
+      "grad_norm": 1.2835233211517334,
+      "learning_rate": 9.411727434970121e-05,
+      "loss": 0.8839,
+      "step": 192
+    },
+    {
+      "epoch": 0.16029900332225913,
+      "grad_norm": 1.346498966217041,
+      "learning_rate": 9.405546991752046e-05,
+      "loss": 1.3051,
+      "step": 193
+    },
+    {
+      "epoch": 0.1611295681063123,
+      "grad_norm": 1.9647303819656372,
+      "learning_rate": 9.399336303020016e-05,
+      "loss": 1.3033,
+      "step": 194
+    },
+    {
+      "epoch": 0.16196013289036545,
+      "grad_norm": 0.6098899841308594,
+      "learning_rate": 9.393095411412435e-05,
+      "loss": 1.2438,
+      "step": 195
+    },
+    {
+      "epoch": 0.16279069767441862,
+      "grad_norm": 0.5487431883811951,
+      "learning_rate": 9.386824359775055e-05,
+      "loss": 1.3844,
+      "step": 196
+    },
+    {
+      "epoch": 0.16362126245847175,
+      "grad_norm": 0.6355300545692444,
+      "learning_rate": 9.380523191160682e-05,
+      "loss": 1.5138,
+      "step": 197
+    },
+    {
+      "epoch": 0.1644518272425249,
+      "grad_norm": 0.7408167719841003,
+      "learning_rate": 9.374191948828894e-05,
+      "loss": 1.2808,
+      "step": 198
+    },
+    {
+      "epoch": 0.16528239202657807,
+      "grad_norm": 0.7942878603935242,
+      "learning_rate": 9.367830676245728e-05,
+      "loss": 1.9019,
+      "step": 199
+    },
+    {
+      "epoch": 0.16611295681063123,
+      "grad_norm": 0.7892114520072937,
+      "learning_rate": 9.361439417083391e-05,
+      "loss": 1.5715,
+      "step": 200
+    },
+    {
+      "epoch": 0.1669435215946844,
+      "grad_norm": 0.40080615878105164,
+      "learning_rate": 9.35501821521996e-05,
+      "loss": 1.0789,
+      "step": 201
+    },
+    {
+      "epoch": 0.16777408637873753,
+      "grad_norm": 0.45729860663414,
+      "learning_rate": 9.34856711473907e-05,
+      "loss": 1.3287,
+      "step": 202
+    },
+    {
+      "epoch": 0.1686046511627907,
+      "grad_norm": 0.4126485288143158,
+      "learning_rate": 9.34208615992963e-05,
+      "loss": 1.1503,
+      "step": 203
+    },
+    {
+      "epoch": 0.16943521594684385,
+      "grad_norm": 0.38482362031936646,
+      "learning_rate": 9.335575395285501e-05,
+      "loss": 1.0912,
+      "step": 204
+    },
+    {
+      "epoch": 0.17026578073089702,
+      "grad_norm": 0.34688103199005127,
+      "learning_rate": 9.329034865505206e-05,
+      "loss": 0.8948,
+      "step": 205
+    },
+    {
+      "epoch": 0.17109634551495018,
+      "grad_norm": 0.3837721049785614,
+      "learning_rate": 9.322464615491606e-05,
+      "loss": 1.0593,
+      "step": 206
+    },
+    {
+      "epoch": 0.1719269102990033,
+      "grad_norm": 0.34867429733276367,
+      "learning_rate": 9.315864690351607e-05,
+      "loss": 0.957,
+      "step": 207
+    },
+    {
+      "epoch": 0.17275747508305647,
+      "grad_norm": 0.34445056319236755,
+      "learning_rate": 9.309235135395844e-05,
+      "loss": 0.9905,
+      "step": 208
+    },
+    {
+      "epoch": 0.17358803986710963,
+      "grad_norm": 0.3708004951477051,
+      "learning_rate": 9.302575996138368e-05,
+      "loss": 1.0446,
+      "step": 209
+    },
+    {
+      "epoch": 0.1744186046511628,
+      "grad_norm": 0.3682551085948944,
+      "learning_rate": 9.29588731829634e-05,
+      "loss": 1.1224,
+      "step": 210
+    },
+    {
+      "epoch": 0.17524916943521596,
+      "grad_norm": 0.3659668266773224,
+      "learning_rate": 9.289169147789707e-05,
+      "loss": 1.0877,
+      "step": 211
+    },
+    {
+      "epoch": 0.1760797342192691,
+      "grad_norm": 0.4240683913230896,
+      "learning_rate": 9.282421530740899e-05,
+      "loss": 1.2557,
+      "step": 212
+    },
+    {
+      "epoch": 0.17691029900332225,
+      "grad_norm": 0.38493841886520386,
+      "learning_rate": 9.275644513474501e-05,
+      "loss": 1.04,
+      "step": 213
+    },
+    {
+      "epoch": 0.1777408637873754,
+      "grad_norm": 0.3720282316207886,
+      "learning_rate": 9.268838142516943e-05,
+      "loss": 1.1625,
+      "step": 214
+    },
+    {
+      "epoch": 0.17857142857142858,
+      "grad_norm": 0.3604121506214142,
+      "learning_rate": 9.262002464596178e-05,
+      "loss": 0.9098,
+      "step": 215
+    },
+    {
+      "epoch": 0.17940199335548174,
+      "grad_norm": 0.3949112594127655,
+      "learning_rate": 9.255137526641358e-05,
+      "loss": 1.2574,
+      "step": 216
+    },
+    {
+      "epoch": 0.18023255813953487,
+      "grad_norm": 0.36455246806144714,
+      "learning_rate": 9.248243375782518e-05,
+      "loss": 1.1162,
+      "step": 217
+    },
+    {
+      "epoch": 0.18106312292358803,
+      "grad_norm": 0.40363040566444397,
+      "learning_rate": 9.241320059350243e-05,
+      "loss": 1.1599,
+      "step": 218
+    },
+    {
+      "epoch": 0.1818936877076412,
+      "grad_norm": 0.4062196612358093,
+      "learning_rate": 9.23436762487536e-05,
+      "loss": 1.2368,
+      "step": 219
+    },
+    {
+      "epoch": 0.18272425249169436,
+      "grad_norm": 0.4072086811065674,
+      "learning_rate": 9.22738612008859e-05,
+      "loss": 1.0331,
+      "step": 220
+    },
+    {
+      "epoch": 0.18355481727574752,
+      "grad_norm": 0.4103226065635681,
+      "learning_rate": 9.220375592920238e-05,
+      "loss": 1.1766,
+      "step": 221
+    },
+    {
+      "epoch": 0.18438538205980065,
+      "grad_norm": 0.39902743697166443,
+      "learning_rate": 9.213336091499855e-05,
+      "loss": 0.95,
+      "step": 222
+    },
+    {
+      "epoch": 0.1852159468438538,
+      "grad_norm": 0.39737093448638916,
+      "learning_rate": 9.206267664155907e-05,
+      "loss": 1.0083,
+      "step": 223
+    },
+    {
+      "epoch": 0.18604651162790697,
+      "grad_norm": 0.40778589248657227,
+      "learning_rate": 9.199170359415448e-05,
+      "loss": 1.1456,
+      "step": 224
+    },
+    {
+      "epoch": 0.18687707641196014,
+      "grad_norm": 0.39630424976348877,
+      "learning_rate": 9.192044226003789e-05,
+      "loss": 0.9017,
+      "step": 225
+    },
+    {
+      "epoch": 0.1877076411960133,
+      "grad_norm": 0.3851598799228668,
+      "learning_rate": 9.18488931284415e-05,
+      "loss": 1.0358,
+      "step": 226
+    },
+    {
+      "epoch": 0.18853820598006646,
+      "grad_norm": 0.3983210325241089,
+      "learning_rate": 9.177705669057343e-05,
+      "loss": 0.9138,
+      "step": 227
+    },
+    {
+      "epoch": 0.1893687707641196,
+      "grad_norm": 0.4639119803905487,
+      "learning_rate": 9.170493343961417e-05,
+      "loss": 1.0634,
+      "step": 228
+    },
+    {
+      "epoch": 0.19019933554817275,
+      "grad_norm": 0.4920639991760254,
+      "learning_rate": 9.163252387071335e-05,
+      "loss": 1.2315,
+      "step": 229
+    },
+    {
+      "epoch": 0.19102990033222592,
+      "grad_norm": 0.45825761556625366,
+      "learning_rate": 9.155982848098619e-05,
+      "loss": 1.1077,
+      "step": 230
+    },
+    {
+      "epoch": 0.19186046511627908,
+      "grad_norm": 0.3676402270793915,
+      "learning_rate": 9.148684776951022e-05,
+      "loss": 0.8934,
+      "step": 231
+    },
+    {
+      "epoch": 0.19269102990033224,
+      "grad_norm": 0.4435743987560272,
+      "learning_rate": 9.141358223732176e-05,
+      "loss": 1.1319,
+      "step": 232
+    },
+    {
+      "epoch": 0.19352159468438537,
+      "grad_norm": 0.5071436762809753,
+      "learning_rate": 9.134003238741258e-05,
+      "loss": 1.1033,
+      "step": 233
+    },
+    {
+      "epoch": 0.19435215946843853,
+      "grad_norm": 0.4594777226448059,
+      "learning_rate": 9.126619872472627e-05,
+      "loss": 1.1039,
+      "step": 234
+    },
+    {
+      "epoch": 0.1951827242524917,
+      "grad_norm": 0.5245004892349243,
+      "learning_rate": 9.119208175615502e-05,
+      "loss": 1.0337,
+      "step": 235
+    },
+    {
+      "epoch": 0.19601328903654486,
+      "grad_norm": 0.5391436219215393,
+      "learning_rate": 9.111768199053588e-05,
+      "loss": 1.1351,
+      "step": 236
+    },
+    {
+      "epoch": 0.19684385382059802,
+      "grad_norm": 0.5461758375167847,
+      "learning_rate": 9.104299993864751e-05,
+      "loss": 1.107,
+      "step": 237
+    },
+    {
+      "epoch": 0.19767441860465115,
+      "grad_norm": 0.5125625133514404,
+      "learning_rate": 9.096803611320647e-05,
+      "loss": 1.0142,
+      "step": 238
+    },
+    {
+      "epoch": 0.19850498338870431,
+      "grad_norm": 0.59673011302948,
+      "learning_rate": 9.089279102886383e-05,
+      "loss": 1.2328,
+      "step": 239
+    },
+    {
+      "epoch": 0.19933554817275748,
+      "grad_norm": 0.6944905519485474,
+      "learning_rate": 9.081726520220157e-05,
+      "loss": 1.5213,
+      "step": 240
+    },
+    {
+      "epoch": 0.20016611295681064,
+      "grad_norm": 0.7668107151985168,
+      "learning_rate": 9.074145915172909e-05,
+      "loss": 1.1205,
+      "step": 241
+    },
+    {
+      "epoch": 0.2009966777408638,
+      "grad_norm": 0.9041101336479187,
+      "learning_rate": 9.06653733978796e-05,
+      "loss": 1.0536,
+      "step": 242
+    },
+    {
+      "epoch": 0.20182724252491693,
+      "grad_norm": 1.9888440370559692,
+      "learning_rate": 9.058900846300655e-05,
+      "loss": 0.7866,
+      "step": 243
+    },
+    {
+      "epoch": 0.2026578073089701,
+      "grad_norm": 0.878629744052887,
+      "learning_rate": 9.051236487138009e-05,
+      "loss": 1.4797,
+      "step": 244
+    },
+    {
+      "epoch": 0.20348837209302326,
+      "grad_norm": 0.6538046002388,
+      "learning_rate": 9.04354431491834e-05,
+      "loss": 1.4146,
+      "step": 245
+    },
+    {
+      "epoch": 0.20431893687707642,
+      "grad_norm": 0.8204836249351501,
+      "learning_rate": 9.035824382450914e-05,
+      "loss": 1.6264,
+      "step": 246
+    },
+    {
+      "epoch": 0.20514950166112958,
+      "grad_norm": 0.668964684009552,
+      "learning_rate": 9.028076742735583e-05,
+      "loss": 1.6275,
+      "step": 247
+    },
+    {
+      "epoch": 0.2059800664451827,
+      "grad_norm": 0.6760320663452148,
+      "learning_rate": 9.020301448962412e-05,
+      "loss": 1.5697,
+      "step": 248
+    },
+    {
+      "epoch": 0.20681063122923588,
+      "grad_norm": 0.787839412689209,
+      "learning_rate": 9.012498554511323e-05,
+      "loss": 1.6312,
+      "step": 249
+    },
+    {
+      "epoch": 0.20764119601328904,
+      "grad_norm": 0.8162153363227844,
+      "learning_rate": 9.004668112951729e-05,
+      "loss": 1.8052,
+      "step": 250
+    },
+    {
+      "epoch": 0.2084717607973422,
+      "grad_norm": 0.5513446927070618,
+      "learning_rate": 8.996810178042156e-05,
+      "loss": 1.4598,
+      "step": 251
+    },
+    {
+      "epoch": 0.20930232558139536,
+      "grad_norm": 0.4128928482532501,
+      "learning_rate": 8.988924803729888e-05,
+      "loss": 1.1677,
+      "step": 252
+    },
+    {
+      "epoch": 0.2101328903654485,
+      "grad_norm": 0.4060623347759247,
+      "learning_rate": 8.981012044150586e-05,
+      "loss": 1.1199,
+      "step": 253
+    },
+    {
+      "epoch": 0.21096345514950166,
+      "grad_norm": 0.45278000831604004,
+      "learning_rate": 8.973071953627916e-05,
+      "loss": 1.2253,
+      "step": 254
+    },
+    {
+      "epoch": 0.21179401993355482,
+      "grad_norm": 0.41483479738235474,
+      "learning_rate": 8.965104586673188e-05,
+      "loss": 0.9995,
+      "step": 255
+    },
+    {
+      "epoch": 0.21262458471760798,
+      "grad_norm": 0.45233410596847534,
+      "learning_rate": 8.957109997984968e-05,
+      "loss": 1.2947,
+      "step": 256
+    },
+    {
+      "epoch": 0.21345514950166114,
+      "grad_norm": 0.4158819317817688,
+      "learning_rate": 8.949088242448708e-05,
+      "loss": 1.1816,
+      "step": 257
+    },
+    {
+      "epoch": 0.21428571428571427,
+      "grad_norm": 0.4162903428077698,
+      "learning_rate": 8.941039375136371e-05,
+      "loss": 1.1521,
+      "step": 258
+    },
+    {
+      "epoch": 0.21511627906976744,
+      "grad_norm": 0.40562358498573303,
+      "learning_rate": 8.932963451306054e-05,
+      "loss": 1.137,
+      "step": 259
+    },
+    {
+      "epoch": 0.2159468438538206,
+      "grad_norm": 0.38327541947364807,
+      "learning_rate": 8.924860526401597e-05,
+      "loss": 1.0355,
+      "step": 260
+    },
+    {
+      "epoch": 0.21677740863787376,
+      "grad_norm": 0.43055954575538635,
+      "learning_rate": 8.916730656052219e-05,
+      "loss": 1.2281,
+      "step": 261
+    },
+    {
+      "epoch": 0.21760797342192692,
+      "grad_norm": 0.4079986810684204,
+      "learning_rate": 8.908573896072127e-05,
+      "loss": 1.2033,
+      "step": 262
+    },
+    {
+      "epoch": 0.21843853820598005,
+      "grad_norm": 0.37261778116226196,
+      "learning_rate": 8.900390302460133e-05,
+      "loss": 1.0352,
+      "step": 263
+    },
+    {
+      "epoch": 0.21926910299003322,
+      "grad_norm": 0.3851580023765564,
+      "learning_rate": 8.89217993139927e-05,
+      "loss": 0.8673,
+      "step": 264
+    },
+    {
+      "epoch": 0.22009966777408638,
+      "grad_norm": 0.34604310989379883,
+      "learning_rate": 8.883942839256408e-05,
+      "loss": 1.0228,
+      "step": 265
+    },
+    {
+      "epoch": 0.22093023255813954,
+      "grad_norm": 0.3763490915298462,
+      "learning_rate": 8.875679082581864e-05,
+      "loss": 1.0876,
+      "step": 266
+    },
+    {
+      "epoch": 0.2217607973421927,
+      "grad_norm": 0.36711278557777405,
+      "learning_rate": 8.867388718109017e-05,
+      "loss": 0.8726,
+      "step": 267
+    },
+    {
+      "epoch": 0.22259136212624583,
+      "grad_norm": 0.4267362654209137,
+      "learning_rate": 8.85907180275392e-05,
+      "loss": 1.1414,
+      "step": 268
+    },
+    {
+      "epoch": 0.223421926910299,
+      "grad_norm": 0.3837275505065918,
+      "learning_rate": 8.850728393614902e-05,
+      "loss": 1.1207,
+      "step": 269
+    },
+    {
+      "epoch": 0.22425249169435216,
+      "grad_norm": 0.565284013748169,
+      "learning_rate": 8.842358547972182e-05,
+      "loss": 1.1005,
+      "step": 270
+    },
+    {
+      "epoch": 0.22508305647840532,
+      "grad_norm": 0.4408017098903656,
+      "learning_rate": 8.833962323287477e-05,
+      "loss": 1.2285,
+      "step": 271
+    },
+    {
+      "epoch": 0.22591362126245848,
+      "grad_norm": 0.43408262729644775,
+      "learning_rate": 8.825539777203597e-05,
+      "loss": 1.1743,
+      "step": 272
+    },
+    {
+      "epoch": 0.22674418604651161,
+      "grad_norm": 0.4070535898208618,
+      "learning_rate": 8.817090967544068e-05,
+      "loss": 1.011,
+      "step": 273
+    },
+    {
+      "epoch": 0.22757475083056478,
+      "grad_norm": 0.3787945508956909,
+      "learning_rate": 8.808615952312713e-05,
+      "loss": 0.7711,
+      "step": 274
+    },
+    {
+      "epoch": 0.22840531561461794,
+      "grad_norm": 0.4361870586872101,
+      "learning_rate": 8.800114789693272e-05,
+      "loss": 1.2295,
+      "step": 275
+    },
+    {
+      "epoch": 0.2292358803986711,
+      "grad_norm": 0.4080670475959778,
+      "learning_rate": 8.791587538048995e-05,
+      "loss": 1.079,
+      "step": 276
+    },
+    {
+      "epoch": 0.23006644518272426,
+      "grad_norm": 0.47581762075424194,
+      "learning_rate": 8.783034255922238e-05,
+      "loss": 1.0537,
+      "step": 277
+    },
+    {
+      "epoch": 0.23089700996677742,
+      "grad_norm": 0.45319613814353943,
+      "learning_rate": 8.774455002034065e-05,
+      "loss": 1.0404,
+      "step": 278
+    },
+    {
+      "epoch": 0.23172757475083056,
+      "grad_norm": 0.44119247794151306,
+      "learning_rate": 8.765849835283851e-05,
+      "loss": 1.2897,
+      "step": 279
+    },
+    {
+      "epoch": 0.23255813953488372,
+      "grad_norm": 0.4381905496120453,
+      "learning_rate": 8.75721881474886e-05,
+      "loss": 0.9729,
+      "step": 280
+    },
+    {
+      "epoch": 0.23338870431893688,
+      "grad_norm": 0.5231764912605286,
+      "learning_rate": 8.748561999683863e-05,
+      "loss": 1.1786,
+      "step": 281
+    },
+    {
+      "epoch": 0.23421926910299004,
+      "grad_norm": 0.41278812289237976,
+      "learning_rate": 8.73987944952071e-05,
+      "loss": 0.931,
+      "step": 282
+    },
+    {
+      "epoch": 0.2350498338870432,
+      "grad_norm": 0.4239978492259979,
+      "learning_rate": 8.731171223867933e-05,
+      "loss": 0.9321,
+      "step": 283
+    },
+    {
+      "epoch": 0.23588039867109634,
+      "grad_norm": 0.4771064221858978,
+      "learning_rate": 8.722437382510338e-05,
+      "loss": 1.0391,
+      "step": 284
+    },
+    {
+      "epoch": 0.2367109634551495,
+      "grad_norm": 0.44685253500938416,
+      "learning_rate": 8.713677985408586e-05,
+      "loss": 0.9824,
+      "step": 285
+    },
+    {
+      "epoch": 0.23754152823920266,
+      "grad_norm": 0.5926410555839539,
+      "learning_rate": 8.704893092698789e-05,
+      "loss": 1.3864,
+      "step": 286
+    },
+    {
+      "epoch": 0.23837209302325582,
+      "grad_norm": 0.5668107271194458,
+      "learning_rate": 8.696082764692098e-05,
+      "loss": 0.9881,
+      "step": 287
+    },
+    {
+      "epoch": 0.23920265780730898,
+      "grad_norm": 0.7727207541465759,
+      "learning_rate": 8.687247061874278e-05,
+      "loss": 1.1828,
+      "step": 288
+    },
+    {
+      "epoch": 0.24003322259136212,
+      "grad_norm": 0.6139202117919922,
+      "learning_rate": 8.678386044905307e-05,
+      "loss": 0.9222,
+      "step": 289
+    },
+    {
+      "epoch": 0.24086378737541528,
+      "grad_norm": 0.6973094940185547,
+      "learning_rate": 8.66949977461895e-05,
+      "loss": 1.1469,
+      "step": 290
+    },
+    {
+      "epoch": 0.24169435215946844,
+      "grad_norm": 0.7914962768554688,
+      "learning_rate": 8.660588312022344e-05,
+      "loss": 1.2579,
+      "step": 291
+    },
+    {
+      "epoch": 0.2425249169435216,
+      "grad_norm": 1.2225151062011719,
+      "learning_rate": 8.651651718295581e-05,
+      "loss": 1.1992,
+      "step": 292
+    },
+    {
+      "epoch": 0.24335548172757476,
+      "grad_norm": 1.5263748168945312,
+      "learning_rate": 8.642690054791285e-05,
+      "loss": 0.8268,
+      "step": 293
+    },
+    {
+      "epoch": 0.2441860465116279,
+      "grad_norm": 1.1765741109848022,
+      "learning_rate": 8.633703383034193e-05,
+      "loss": 1.083,
+      "step": 294
+    },
+    {
+      "epoch": 0.24501661129568106,
+      "grad_norm": 0.7610678672790527,
+      "learning_rate": 8.624691764720731e-05,
+      "loss": 1.6292,
+      "step": 295
+    },
+    {
+      "epoch": 0.24584717607973422,
+      "grad_norm": 0.7405883073806763,
+      "learning_rate": 8.615655261718592e-05,
+      "loss": 1.4353,
+      "step": 296
+    },
+    {
+      "epoch": 0.24667774086378738,
+      "grad_norm": 0.6407126784324646,
+      "learning_rate": 8.606593936066309e-05,
+      "loss": 1.377,
+      "step": 297
+    },
+    {
+      "epoch": 0.24750830564784054,
+      "grad_norm": 0.6730480790138245,
+      "learning_rate": 8.597507849972834e-05,
+      "loss": 1.6916,
+      "step": 298
+    },
+    {
+      "epoch": 0.24833887043189368,
+      "grad_norm": 0.7083413004875183,
+      "learning_rate": 8.588397065817105e-05,
+      "loss": 1.5759,
+      "step": 299
+    },
+    {
+      "epoch": 0.24916943521594684,
+      "grad_norm": 0.7241659760475159,
+      "learning_rate": 8.579261646147618e-05,
+      "loss": 1.6857,
+      "step": 300
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.41130688786506653,
+      "learning_rate": 8.570101653682006e-05,
+      "loss": 1.3919,
+      "step": 301
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1204,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 301,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.680973287011123e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
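
A minimal sketch of how the state above might be inspected, assuming the file is read from `last-checkpoint/trainer_state.json`. The helper names `progress` and `grad_norm_spikes` and the `threshold` of 1.0 are hypothetical, chosen only for illustration. The inline dictionary copies a few of the entries logged above so the snippet runs on its own.

```python
import json

# Excerpt of the fields shown above. In practice the full file would be
# loaded instead, e.g.:
#   with open("last-checkpoint/trainer_state.json") as f:
#       state = json.load(f)
state = {
    "log_history": [
        {"epoch": 0.2425249169435216, "grad_norm": 1.2225151062011719,
         "loss": 1.1992, "step": 292},
        {"epoch": 0.24335548172757476, "grad_norm": 1.5263748168945312,
         "loss": 0.8268, "step": 293},
        {"epoch": 0.2441860465116279, "grad_norm": 1.1765741109848022,
         "loss": 1.083, "step": 294},
        {"epoch": 0.25, "grad_norm": 0.41130688786506653,
         "loss": 1.3919, "step": 301},
    ],
    "max_steps": 1204,
    "save_steps": 301,
}


def progress(state):
    """Fraction of the run completed at the last logged step."""
    last = state["log_history"][-1]
    return last["step"] / state["max_steps"]


def grad_norm_spikes(state, threshold=1.0):
    """Steps whose logged grad_norm exceeds a (hypothetical) threshold."""
    return [
        entry["step"]
        for entry in state["log_history"]
        if entry.get("grad_norm", 0.0) > threshold
    ]


print(f"progress: {progress(state):.2%}")
print(f"grad_norm spikes at steps: {grad_norm_spikes(state)}")
```

With `save_steps` at 301 and `max_steps` at 1204, this checkpoint falls at the first of four evenly spaced save points, which matches the logged `epoch` of 0.25.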

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be1a8b5f89f7ad2001ed041a9d6f34c27a8af31f38ccc3f17727a9e4b1bf84df
+size 6776