Training in progress, step 348, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +24 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +31 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +207 -0
last-checkpoint/trainer_state.json +2485 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-0.5B
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-0.5B",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "down_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "q_proj",
+    "up_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:249c5b9442abbdcac95a1f98049d4b2036e60f2bb48cd866380c66e22c41648f
+size 17640136
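
The adapter size reported above can be sanity-checked against `adapter_config.json`. The following is a minimal sketch, not part of the commit: the layer dimensions for Qwen/Qwen2.5-0.5B (hidden size 896, MLP size 4864, key/value projection width 128, 24 layers) are assumptions taken from the base model's published configuration and do not appear in this diff. Each LoRA-wrapped linear layer of shape `(in, out)` adds `r * (in + out)` parameters.

```python
# Assumed Qwen2.5-0.5B dimensions (NOT stated in this diff).
HIDDEN = 896
INTERMEDIATE = 4864
KV_DIM = 128        # assumed: 2 key/value heads x head dim 64
NUM_LAYERS = 24

# Values read from adapter_config.json in this checkpoint.
R = 8
LORA_ALPHA = 16

# (in_features, out_features) for every entry in "target_modules".
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, num_layers):
    """Count LoRA parameters: A is (r x in) and B is (out x r) per module."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(SHAPES, R, NUM_LAYERS)
fp32_bytes = params * 4                  # assumes weights are stored as float32
reported_bytes = 17640136                # "size" from the LFS pointer above
overhead = reported_bytes - fp32_bytes   # bytes not explained by the weights
scaling = LORA_ALPHA / R                 # standard LoRA scaling (use_rslora is false)

print(params, fp32_bytes, overhead, scaling)
```

Under these assumptions the weights account for 17,596,416 of the 17,640,136 bytes; the remaining roughly 43 KB is plausibly the safetensors header, which would be consistent with float32 storage.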

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1bf1afe806da865afcb521114ffaa7959ef165a7487d3b2d469dee7ea10044a3
+size 9569204

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:23e4e803720e0dc0719dd70065c71562b06e60f43888d33cc1de613bb5e4e0dd
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be73271b563d711ae2c5fd7d4b937b3e3dbf64b9a3fd307a3339be9933eb8adc
+size 1064
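
The scheduler state itself is an opaque binary, but the `learning_rate` values logged in `trainer_state.json` look consistent with a linear warmup followed by cosine decay. The sketch below is an inference from those logged values, not something the commit states: the peak rate (2e-4), the warmup length (10 steps), and especially the total step count (1392, which happens to be 4 x `eval_steps`) are assumptions chosen because they reproduce the logged numbers.

```python
import math

# All three constants are inferred from the logged values, not read from the commit.
PEAK_LR = 2e-4
WARMUP_STEPS = 10
TOTAL_STEPS = 1392  # assumption: 4 x eval_steps (348)


def cosine_with_warmup(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few (step, learning_rate) pairs copied from log_history.
logged = {
    1: 2e-05,
    10: 0.0002,
    11: 0.00019999974162322295,
    105: 0.00019767719705797657,
}
for step, expected in logged.items():
    print(step, cosine_with_warmup(step), expected)
```

If the assumed total is right, the run is one quarter complete at this checkpoint (step 348), even though the logged `epoch` value is only about 0.198.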

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
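
One detail in this file may be worth checking before the checkpoint is used for chat-style inference: the `chat_template` wraps messages in `<|start_header_id|>`, `<|end_header_id|>` and `<|eot_id|>`, none of which appear in `added_tokens.json`. The following sketch only compares strings copied from this diff; it does not load the tokenizer, and the helper name `template_markers` is hypothetical.

```python
import re

# Token strings copied from added_tokens.json in this checkpoint.
ADDED_TOKENS = {
    "</tool_call>", "<tool_call>", "<|box_end|>", "<|box_start|>",
    "<|endoftext|>", "<|file_sep|>", "<|fim_middle|>", "<|fim_pad|>",
    "<|fim_prefix|>", "<|fim_suffix|>", "<|im_end|>", "<|im_start|>",
    "<|image_pad|>", "<|object_ref_end|>", "<|object_ref_start|>",
    "<|quad_end|>", "<|quad_start|>", "<|repo_name|>", "<|video_pad|>",
    "<|vision_end|>", "<|vision_pad|>", "<|vision_start|>",
}

# Excerpt of the "chat_template" value from tokenizer_config.json.
TEMPLATE_EXCERPT = (
    "{% set content = '<|start_header_id|>' + message['role'] + "
    "'<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' %}"
)


def template_markers(template):
    """Return every <|...|> style marker that appears in the template text."""
    return set(re.findall(r"<\|[a-z_]+\|>", template))


# Markers the template emits that are not registered as added tokens.
missing = sorted(template_markers(TEMPLATE_EXCERPT) - ADDED_TOKENS)
print(missing)
```

Markers that are not registered as added tokens would likely be split into ordinary BPE pieces rather than mapped to single ids. Whether that was intended for this run cannot be determined from the diff alone. The template also concatenates `bos_token`, which this file sets to `null`.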

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2485 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.1980085348506401,
+  "eval_steps": 348,
+  "global_step": 348,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0005689900426742532,
+      "grad_norm": 0.922553300857544,
+      "learning_rate": 2e-05,
+      "loss": 1.7225,
+      "step": 1
+    },
+    {
+      "epoch": 0.0005689900426742532,
+      "eval_loss": 1.6560131311416626,
+      "eval_runtime": 17.2854,
+      "eval_samples_per_second": 42.811,
+      "eval_steps_per_second": 21.405,
+      "step": 1
+    },
+    {
+      "epoch": 0.0011379800853485065,
+      "grad_norm": 1.0872293710708618,
+      "learning_rate": 4e-05,
+      "loss": 1.7777,
+      "step": 2
+    },
+    {
+      "epoch": 0.0017069701280227596,
+      "grad_norm": 1.0032234191894531,
+      "learning_rate": 6e-05,
+      "loss": 1.6594,
+      "step": 3
+    },
+    {
+      "epoch": 0.002275960170697013,
+      "grad_norm": 0.9296952486038208,
+      "learning_rate": 8e-05,
+      "loss": 1.6329,
+      "step": 4
+    },
+    {
+      "epoch": 0.002844950213371266,
+      "grad_norm": 0.8549262881278992,
+      "learning_rate": 0.0001,
+      "loss": 1.6946,
+      "step": 5
+    },
+    {
+      "epoch": 0.0034139402560455193,
+      "grad_norm": 0.7175059914588928,
+      "learning_rate": 0.00012,
+      "loss": 1.605,
+      "step": 6
+    },
+    {
+      "epoch": 0.003982930298719772,
+      "grad_norm": 0.729087233543396,
+      "learning_rate": 0.00014,
+      "loss": 1.7539,
+      "step": 7
+    },
+    {
+      "epoch": 0.004551920341394026,
+      "grad_norm": 0.7559539675712585,
+      "learning_rate": 0.00016,
+      "loss": 1.7079,
+      "step": 8
+    },
+    {
+      "epoch": 0.005120910384068279,
+      "grad_norm": 0.9097371101379395,
+      "learning_rate": 0.00018,
+      "loss": 1.4693,
+      "step": 9
+    },
+    {
+      "epoch": 0.005689900426742532,
+      "grad_norm": 0.7562863230705261,
+      "learning_rate": 0.0002,
+      "loss": 1.7192,
+      "step": 10
+    },
+    {
+      "epoch": 0.006258890469416785,
+      "grad_norm": 0.8033550381660461,
+      "learning_rate": 0.00019999974162322295,
+      "loss": 1.6699,
+      "step": 11
+    },
+    {
+      "epoch": 0.0068278805120910386,
+      "grad_norm": 0.6270872950553894,
+      "learning_rate": 0.00019999896649422697,
+      "loss": 1.7042,
+      "step": 12
+    },
+    {
+      "epoch": 0.007396870554765292,
+      "grad_norm": 0.6003552079200745,
+      "learning_rate": 0.00019999767461701748,
+      "loss": 1.672,
+      "step": 13
+    },
+    {
+      "epoch": 0.007965860597439544,
+      "grad_norm": 0.5751997232437134,
+      "learning_rate": 0.00019999586599827042,
+      "loss": 1.5727,
+      "step": 14
+    },
+    {
+      "epoch": 0.008534850640113799,
+      "grad_norm": 0.5488961338996887,
+      "learning_rate": 0.00019999354064733184,
+      "loss": 1.6477,
+      "step": 15
+    },
+    {
+      "epoch": 0.009103840682788052,
+      "grad_norm": 0.4690549671649933,
+      "learning_rate": 0.00019999069857621807,
+      "loss": 1.4063,
+      "step": 16
+    },
+    {
+      "epoch": 0.009672830725462305,
+      "grad_norm": 0.5245763659477234,
+      "learning_rate": 0.00019998733979961563,
+      "loss": 1.649,
+      "step": 17
+    },
+    {
+      "epoch": 0.010241820768136558,
+      "grad_norm": 0.4962601661682129,
+      "learning_rate": 0.0001999834643348811,
+      "loss": 1.5558,
+      "step": 18
+    },
+    {
+      "epoch": 0.010810810810810811,
+      "grad_norm": 0.5009298324584961,
+      "learning_rate": 0.0001999790722020411,
+      "loss": 1.6178,
+      "step": 19
+    },
+    {
+      "epoch": 0.011379800853485065,
+      "grad_norm": 0.5524196028709412,
+      "learning_rate": 0.00019997416342379208,
+      "loss": 1.6133,
+      "step": 20
+    },
+    {
+      "epoch": 0.011948790896159318,
+      "grad_norm": 0.48095259070396423,
+      "learning_rate": 0.00019996873802550043,
+      "loss": 1.4158,
+      "step": 21
+    },
+    {
+      "epoch": 0.01251778093883357,
+      "grad_norm": 0.5575169324874878,
+      "learning_rate": 0.00019996279603520196,
+      "loss": 1.7057,
+      "step": 22
+    },
+    {
+      "epoch": 0.013086770981507824,
+      "grad_norm": 0.5423071384429932,
+      "learning_rate": 0.00019995633748360223,
+      "loss": 1.5661,
+      "step": 23
+    },
+    {
+      "epoch": 0.013655761024182077,
+      "grad_norm": 0.49561819434165955,
+      "learning_rate": 0.00019994936240407598,
+      "loss": 1.4119,
+      "step": 24
+    },
+    {
+      "epoch": 0.01422475106685633,
+      "grad_norm": 0.4862682521343231,
+      "learning_rate": 0.00019994187083266716,
+      "loss": 1.519,
+      "step": 25
+    },
+    {
+      "epoch": 0.014793741109530583,
+      "grad_norm": 0.5174720883369446,
+      "learning_rate": 0.0001999338628080888,
+      "loss": 1.3668,
+      "step": 26
+    },
+    {
+      "epoch": 0.015362731152204837,
+      "grad_norm": 0.5306721329689026,
+      "learning_rate": 0.0001999253383717226,
+      "loss": 1.6097,
+      "step": 27
+    },
+    {
+      "epoch": 0.015931721194879088,
+      "grad_norm": 0.5307742357254028,
+      "learning_rate": 0.00019991629756761886,
+      "loss": 1.7738,
+      "step": 28
+    },
+    {
+      "epoch": 0.016500711237553343,
+      "grad_norm": 0.6086705327033997,
+      "learning_rate": 0.00019990674044249634,
+      "loss": 1.7079,
+      "step": 29
+    },
+    {
+      "epoch": 0.017069701280227598,
+      "grad_norm": 0.5047173500061035,
+      "learning_rate": 0.00019989666704574175,
+      "loss": 1.6998,
+      "step": 30
+    },
+    {
+      "epoch": 0.01763869132290185,
+      "grad_norm": 0.5041013360023499,
+      "learning_rate": 0.00019988607742940978,
+      "loss": 1.7047,
+      "step": 31
+    },
+    {
+      "epoch": 0.018207681365576104,
+      "grad_norm": 0.4694116413593292,
+      "learning_rate": 0.00019987497164822263,
+      "loss": 1.3058,
+      "step": 32
+    },
+    {
+      "epoch": 0.018776671408250355,
+      "grad_norm": 0.5069786310195923,
+      "learning_rate": 0.0001998633497595698,
+      "loss": 1.6603,
+      "step": 33
+    },
+    {
+      "epoch": 0.01934566145092461,
+      "grad_norm": 0.4877070486545563,
+      "learning_rate": 0.0001998512118235078,
+      "loss": 1.5145,
+      "step": 34
+    },
+    {
+      "epoch": 0.01991465149359886,
+      "grad_norm": 0.5028818845748901,
+      "learning_rate": 0.0001998385579027599,
+      "loss": 1.5016,
+      "step": 35
+    },
+    {
+      "epoch": 0.020483641536273117,
+      "grad_norm": 0.4918319880962372,
+      "learning_rate": 0.00019982538806271566,
+      "loss": 1.5468,
+      "step": 36
+    },
+    {
+      "epoch": 0.021052631578947368,
+      "grad_norm": 0.5177620649337769,
+      "learning_rate": 0.00019981170237143067,
+      "loss": 1.5555,
+      "step": 37
+    },
+    {
+      "epoch": 0.021621621621621623,
+      "grad_norm": 0.49115803837776184,
+      "learning_rate": 0.00019979750089962629,
+      "loss": 1.592,
+      "step": 38
+    },
+    {
+      "epoch": 0.022190611664295874,
+      "grad_norm": 0.5621944069862366,
+      "learning_rate": 0.00019978278372068906,
+      "loss": 1.6697,
+      "step": 39
+    },
+    {
+      "epoch": 0.02275960170697013,
+      "grad_norm": 0.49260076880455017,
+      "learning_rate": 0.00019976755091067054,
+      "loss": 1.4688,
+      "step": 40
+    },
+    {
+      "epoch": 0.02332859174964438,
+      "grad_norm": 0.4910222589969635,
+      "learning_rate": 0.00019975180254828688,
+      "loss": 1.462,
+      "step": 41
+    },
+    {
+      "epoch": 0.023897581792318635,
+      "grad_norm": 0.5017576217651367,
+      "learning_rate": 0.0001997355387149182,
+      "loss": 1.6558,
+      "step": 42
+    },
+    {
+      "epoch": 0.024466571834992887,
+      "grad_norm": 0.5089415907859802,
+      "learning_rate": 0.00019971875949460852,
+      "loss": 1.6412,
+      "step": 43
+    },
+    {
+      "epoch": 0.02503556187766714,
+      "grad_norm": 0.4794662594795227,
+      "learning_rate": 0.00019970146497406505,
+      "loss": 1.6011,
+      "step": 44
+    },
+    {
+      "epoch": 0.025604551920341393,
+      "grad_norm": 0.5046934485435486,
+      "learning_rate": 0.00019968365524265777,
+      "loss": 1.6675,
+      "step": 45
+    },
+    {
+      "epoch": 0.026173541963015648,
+      "grad_norm": 0.4993690550327301,
+      "learning_rate": 0.0001996653303924192,
+      "loss": 1.6735,
+      "step": 46
+    },
+    {
+      "epoch": 0.0267425320056899,
+      "grad_norm": 0.48856502771377563,
+      "learning_rate": 0.00019964649051804355,
+      "loss": 1.5536,
+      "step": 47
+    },
+    {
+      "epoch": 0.027311522048364154,
+      "grad_norm": 0.4920005202293396,
+      "learning_rate": 0.0001996271357168866,
+      "loss": 1.6204,
+      "step": 48
+    },
+    {
+      "epoch": 0.027880512091038406,
+      "grad_norm": 0.5342410802841187,
+      "learning_rate": 0.00019960726608896502,
+      "loss": 1.719,
+      "step": 49
+    },
+    {
+      "epoch": 0.02844950213371266,
+      "grad_norm": 0.5041580200195312,
+      "learning_rate": 0.00019958688173695572,
+      "loss": 1.7053,
+      "step": 50
+    },
+    {
+      "epoch": 0.029018492176386912,
+      "grad_norm": 0.5237680077552795,
+      "learning_rate": 0.00019956598276619562,
+      "loss": 1.5091,
+      "step": 51
+    },
+    {
+      "epoch": 0.029587482219061167,
+      "grad_norm": 0.4911646246910095,
+      "learning_rate": 0.0001995445692846809,
+      "loss": 1.6085,
+      "step": 52
+    },
+    {
+      "epoch": 0.030156472261735418,
+      "grad_norm": 0.520005464553833,
+      "learning_rate": 0.00019952264140306645,
+      "loss": 1.4782,
+      "step": 53
+    },
+    {
+      "epoch": 0.030725462304409673,
+      "grad_norm": 0.49788954854011536,
+      "learning_rate": 0.0001995001992346654,
+      "loss": 1.4905,
+      "step": 54
+    },
+    {
+      "epoch": 0.031294452347083924,
+      "grad_norm": 0.5043379664421082,
+      "learning_rate": 0.00019947724289544845,
+      "loss": 1.6566,
+      "step": 55
+    },
+    {
+      "epoch": 0.031863442389758176,
+      "grad_norm": 0.5547715425491333,
+      "learning_rate": 0.00019945377250404328,
+      "loss": 1.7227,
+      "step": 56
+    },
+    {
+      "epoch": 0.032432432432432434,
+      "grad_norm": 0.5288915634155273,
+      "learning_rate": 0.000199429788181734,
+      "loss": 1.5921,
+      "step": 57
+    },
+    {
+      "epoch": 0.033001422475106686,
+      "grad_norm": 0.5353677868843079,
+      "learning_rate": 0.00019940529005246048,
+      "loss": 1.5371,
+      "step": 58
+    },
+    {
+      "epoch": 0.03357041251778094,
+      "grad_norm": 0.520143449306488,
+      "learning_rate": 0.00019938027824281757,
+      "loss": 1.6308,
+      "step": 59
+    },
+    {
+      "epoch": 0.034139402560455195,
+      "grad_norm": 0.50368732213974,
+      "learning_rate": 0.0001993547528820548,
+      "loss": 1.4645,
+      "step": 60
+    },
+    {
+      "epoch": 0.03470839260312945,
+      "grad_norm": 0.5326752066612244,
+      "learning_rate": 0.0001993287141020753,
+      "loss": 1.5832,
+      "step": 61
+    },
+    {
+      "epoch": 0.0352773826458037,
+      "grad_norm": 0.48568812012672424,
+      "learning_rate": 0.00019930216203743544,
+      "loss": 1.4137,
+      "step": 62
+    },
+    {
+      "epoch": 0.03584637268847795,
+      "grad_norm": 0.4832801818847656,
+      "learning_rate": 0.0001992750968253439,
+      "loss": 1.4713,
+      "step": 63
+    },
+    {
+      "epoch": 0.03641536273115221,
+      "grad_norm": 0.49059394001960754,
+      "learning_rate": 0.00019924751860566118,
+      "loss": 1.6009,
+      "step": 64
+    },
+    {
+      "epoch": 0.03698435277382646,
+      "grad_norm": 0.5292865633964539,
+      "learning_rate": 0.0001992194275208987,
+      "loss": 1.6339,
+      "step": 65
+    },
+    {
+      "epoch": 0.03755334281650071,
+      "grad_norm": 0.520621120929718,
+      "learning_rate": 0.00019919082371621811,
+      "loss": 1.7033,
+      "step": 66
+    },
+    {
+      "epoch": 0.03812233285917496,
+      "grad_norm": 0.5552493929862976,
+      "learning_rate": 0.0001991617073394306,
+      "loss": 1.5704,
+      "step": 67
+    },
+    {
+      "epoch": 0.03869132290184922,
+      "grad_norm": 0.5199451446533203,
+      "learning_rate": 0.0001991320785409961,
+      "loss": 1.6266,
+      "step": 68
+    },
+    {
+      "epoch": 0.03926031294452347,
+      "grad_norm": 0.540593147277832,
+      "learning_rate": 0.0001991019374740225,
+      "loss": 1.7327,
+      "step": 69
+    },
+    {
+      "epoch": 0.03982930298719772,
+      "grad_norm": 0.5305120348930359,
+      "learning_rate": 0.00019907128429426477,
+      "loss": 1.6544,
+      "step": 70
+    },
+    {
+      "epoch": 0.040398293029871975,
+      "grad_norm": 0.5247764587402344,
+      "learning_rate": 0.00019904011916012433,
+      "loss": 1.429,
+      "step": 71
+    },
+    {
+      "epoch": 0.04096728307254623,
+      "grad_norm": 0.500156819820404,
+      "learning_rate": 0.00019900844223264813,
+      "loss": 1.6106,
+      "step": 72
+    },
+    {
+      "epoch": 0.041536273115220484,
+      "grad_norm": 0.49794986844062805,
+      "learning_rate": 0.00019897625367552784,
+      "loss": 1.5322,
+      "step": 73
+    },
+    {
+      "epoch": 0.042105263157894736,
+      "grad_norm": 0.5475789308547974,
+      "learning_rate": 0.00019894355365509894,
+      "loss": 1.4882,
+      "step": 74
+    },
+    {
+      "epoch": 0.04267425320056899,
+      "grad_norm": 0.5272343158721924,
+      "learning_rate": 0.00019891034234033995,
+      "loss": 1.5119,
+      "step": 75
+    },
+    {
+      "epoch": 0.043243243243243246,
+      "grad_norm": 0.4892237186431885,
+      "learning_rate": 0.00019887661990287153,
+      "loss": 1.5567,
+      "step": 76
+    },
+    {
+      "epoch": 0.0438122332859175,
+      "grad_norm": 0.528414249420166,
+      "learning_rate": 0.00019884238651695556,
+      "loss": 1.7716,
+      "step": 77
+    },
+    {
+      "epoch": 0.04438122332859175,
+      "grad_norm": 0.5159140229225159,
+      "learning_rate": 0.00019880764235949427,
+      "loss": 1.6873,
+      "step": 78
+    },
+    {
+      "epoch": 0.044950213371266,
+      "grad_norm": 0.5157197713851929,
+      "learning_rate": 0.0001987723876100294,
+      "loss": 1.5196,
+      "step": 79
+    },
+    {
+      "epoch": 0.04551920341394026,
+      "grad_norm": 0.518205463886261,
+      "learning_rate": 0.00019873662245074102,
+      "loss": 1.5238,
+      "step": 80
+    },
+    {
+      "epoch": 0.04608819345661451,
+      "grad_norm": 0.5316376090049744,
+      "learning_rate": 0.00019870034706644693,
+      "loss": 1.4913,
+      "step": 81
+    },
+    {
+      "epoch": 0.04665718349928876,
+      "grad_norm": 0.5020834803581238,
+      "learning_rate": 0.00019866356164460145,
+      "loss": 1.4051,
+      "step": 82
+    },
+    {
+      "epoch": 0.04722617354196301,
+      "grad_norm": 0.4912559986114502,
+      "learning_rate": 0.00019862626637529455,
+      "loss": 1.4947,
+      "step": 83
+    },
+    {
+      "epoch": 0.04779516358463727,
+      "grad_norm": 0.5261936187744141,
+      "learning_rate": 0.00019858846145125086,
+      "loss": 1.659,
+      "step": 84
+    },
+    {
+      "epoch": 0.04836415362731152,
+      "grad_norm": 0.5002409815788269,
+      "learning_rate": 0.00019855014706782867,
+      "loss": 1.4743,
+      "step": 85
+    },
+    {
+      "epoch": 0.048933143669985774,
+      "grad_norm": 0.5293824672698975,
+      "learning_rate": 0.0001985113234230189,
+      "loss": 1.5796,
+      "step": 86
+    },
+    {
+      "epoch": 0.049502133712660025,
+      "grad_norm": 0.49084582924842834,
+      "learning_rate": 0.00019847199071744415,
+      "loss": 1.6052,
+      "step": 87
+    },
+    {
+      "epoch": 0.05007112375533428,
+      "grad_norm": 0.5251219868659973,
+      "learning_rate": 0.00019843214915435758,
+      "loss": 1.7684,
+      "step": 88
+    },
+    {
+      "epoch": 0.050640113798008535,
+      "grad_norm": 0.5003427267074585,
+      "learning_rate": 0.0001983917989396418,
+      "loss": 1.5715,
+      "step": 89
+    },
+    {
+      "epoch": 0.051209103840682786,
+      "grad_norm": 0.5283729434013367,
+      "learning_rate": 0.0001983509402818081,
+      "loss": 1.5396,
+      "step": 90
+    },
+    {
+      "epoch": 0.051778093883357044,
+      "grad_norm": 0.49652016162872314,
+      "learning_rate": 0.00019830957339199494,
+      "loss": 1.5353,
+      "step": 91
+    },
+    {
+      "epoch": 0.052347083926031296,
+      "grad_norm": 0.49297675490379333,
+      "learning_rate": 0.00019826769848396727,
+      "loss": 1.5012,
+      "step": 92
+    },
+    {
+      "epoch": 0.05291607396870555,
+      "grad_norm": 0.5100125670433044,
+      "learning_rate": 0.0001982253157741151,
+      "loss": 1.6194,
+      "step": 93
+    },
+    {
+      "epoch": 0.0534850640113798,
+      "grad_norm": 0.5218221545219421,
+      "learning_rate": 0.00019818242548145265,
+      "loss": 1.6505,
+      "step": 94
+    },
+    {
+      "epoch": 0.05405405405405406,
+      "grad_norm": 0.5490546226501465,
+      "learning_rate": 0.000198139027827617,
+      "loss": 1.498,
+      "step": 95
+    },
+    {
+      "epoch": 0.05462304409672831,
+      "grad_norm": 0.5228062868118286,
+      "learning_rate": 0.00019809512303686706,
+      "loss": 1.4592,
+      "step": 96
+    },
+    {
+      "epoch": 0.05519203413940256,
+      "grad_norm": 0.49827295541763306,
+      "learning_rate": 0.00019805071133608242,
+      "loss": 1.6593,
+      "step": 97
+    },
+    {
+      "epoch": 0.05576102418207681,
+      "grad_norm": 0.5081865191459656,
+      "learning_rate": 0.0001980057929547621,
+      "loss": 1.4226,
+      "step": 98
+    },
+    {
+      "epoch": 0.05633001422475107,
+      "grad_norm": 0.5018671751022339,
+      "learning_rate": 0.00019796036812502347,
+      "loss": 1.4995,
+      "step": 99
+    },
+    {
+      "epoch": 0.05689900426742532,
+      "grad_norm": 0.5807016491889954,
+      "learning_rate": 0.00019791443708160094,
+      "loss": 1.7405,
+      "step": 100
+    },
+    {
+      "epoch": 0.05746799431009957,
+      "grad_norm": 0.5095066428184509,
+      "learning_rate": 0.00019786800006184473,
+      "loss": 1.4908,
+      "step": 101
+    },
+    {
+      "epoch": 0.058036984352773824,
+      "grad_norm": 0.5552268028259277,
+      "learning_rate": 0.00019782105730571992,
+      "loss": 1.5289,
+      "step": 102
+    },
+    {
+      "epoch": 0.05860597439544808,
+      "grad_norm": 0.47026970982551575,
+      "learning_rate": 0.00019777360905580478,
+      "loss": 1.3497,
+      "step": 103
+    },
+    {
+      "epoch": 0.059174964438122334,
+      "grad_norm": 0.5475593209266663,
+      "learning_rate": 0.00019772565555728984,
+      "loss": 1.6329,
+      "step": 104
+    },
+    {
+      "epoch": 0.059743954480796585,
+      "grad_norm": 0.5217400789260864,
+      "learning_rate": 0.00019767719705797657,
+      "loss": 1.6181,
+      "step": 105
+    },
+    {
+      "epoch": 0.060312944523470836,
+      "grad_norm": 0.5143265128135681,
+      "learning_rate": 0.00019762823380827592,
+      "loss": 1.6369,
+      "step": 106
+    },
+    {
+      "epoch": 0.060881934566145095,
+      "grad_norm": 0.501568615436554,
+      "learning_rate": 0.0001975787660612072,
+      "loss": 1.6871,
+      "step": 107
+    },
+    {
+      "epoch": 0.061450924608819346,
+      "grad_norm": 0.47950610518455505,
+      "learning_rate": 0.00019752879407239685,
+      "loss": 1.4494,
+      "step": 108
+    },
+    {
+      "epoch": 0.0620199146514936,
+      "grad_norm": 0.5488466024398804,
+      "learning_rate": 0.0001974783181000768,
+      "loss": 1.6457,
+      "step": 109
+    },
+    {
+      "epoch": 0.06258890469416785,
+      "grad_norm": 0.5165080428123474,
+      "learning_rate": 0.0001974273384050835,
+      "loss": 1.5463,
+      "step": 110
+    },
+    {
+      "epoch": 0.06315789473684211,
+      "grad_norm": 0.5002058744430542,
+      "learning_rate": 0.0001973758552508563,
+      "loss": 1.4333,
+      "step": 111
+    },
+    {
+      "epoch": 0.06372688477951635,
+      "grad_norm": 0.4927598237991333,
+      "learning_rate": 0.00019732386890343624,
+      "loss": 1.5576,
+      "step": 112
+    },
+    {
+      "epoch": 0.06429587482219061,
+      "grad_norm": 0.5156055688858032,
+      "learning_rate": 0.0001972713796314646,
+      "loss": 1.4821,
+      "step": 113
+    },
+    {
+      "epoch": 0.06486486486486487,
+      "grad_norm": 0.5108924508094788,
+      "learning_rate": 0.0001972183877061816,
+      "loss": 1.502,
+      "step": 114
+    },
+    {
+      "epoch": 0.06543385490753911,
+      "grad_norm": 0.5052126049995422,
+      "learning_rate": 0.00019716489340142483,
+      "loss": 1.7285,
+      "step": 115
+    },
+    {
+      "epoch": 0.06600284495021337,
+      "grad_norm": 0.5034211874008179,
+      "learning_rate": 0.00019711089699362807,
+      "loss": 1.4148,
+      "step": 116
+    },
+    {
+      "epoch": 0.06657183499288763,
+      "grad_norm": 0.5284733772277832,
+      "learning_rate": 0.00019705639876181969,
+      "loss": 1.5979,
+      "step": 117
+    },
+    {
+      "epoch": 0.06714082503556187,
+      "grad_norm": 0.5434923768043518,
+      "learning_rate": 0.0001970013989876212,
+      "loss": 1.6856,
+      "step": 118
+    },
+    {
+      "epoch": 0.06770981507823613,
+      "grad_norm": 0.48895972967147827,
+      "learning_rate": 0.00019694589795524588,
+      "loss": 1.5305,
+      "step": 119
+    },
+    {
+      "epoch": 0.06827880512091039,
+      "grad_norm": 0.5481955409049988,
+      "learning_rate": 0.00019688989595149732,
+      "loss": 1.473,
+      "step": 120
+    },
+    {
+      "epoch": 0.06884779516358464,
+      "grad_norm": 0.47966116666793823,
+      "learning_rate": 0.00019683339326576781,
+      "loss": 1.1899,
+      "step": 121
+    },
+    {
+      "epoch": 0.0694167852062589,
+      "grad_norm": 0.5007337927818298,
+      "learning_rate": 0.00019677639019003706,
+      "loss": 1.4747,
+      "step": 122
+    },
+    {
+      "epoch": 0.06998577524893314,
+      "grad_norm": 0.5798030495643616,
+      "learning_rate": 0.00019671888701887046,
+      "loss": 1.5881,
+      "step": 123
+    },
+    {
+      "epoch": 0.0705547652916074,
+      "grad_norm": 0.5382363200187683,
+      "learning_rate": 0.0001966608840494177,
+      "loss": 1.6345,
+      "step": 124
+    },
+    {
+      "epoch": 0.07112375533428165,
+      "grad_norm": 0.5181685090065002,
+      "learning_rate": 0.00019660238158141112,
+      "loss": 1.48,
+      "step": 125
+    },
+    {
+      "epoch": 0.0716927453769559,
+      "grad_norm": 0.5349889993667603,
+      "learning_rate": 0.0001965433799171644,
+      "loss": 1.5679,
+      "step": 126
+    },
+    {
+      "epoch": 0.07226173541963016,
+      "grad_norm": 0.496991902589798,
+      "learning_rate": 0.00019648387936157068,
+      "loss": 1.5596,
+      "step": 127
+    },
+    {
+      "epoch": 0.07283072546230442,
+      "grad_norm": 0.5177836418151855,
+      "learning_rate": 0.0001964238802221012,
+      "loss": 1.3765,
+      "step": 128
+    },
+    {
+      "epoch": 0.07339971550497866,
+      "grad_norm": 0.5253962874412537,
+      "learning_rate": 0.00019636338280880366,
+      "loss": 1.7268,
+      "step": 129
+    },
+    {
+      "epoch": 0.07396870554765292,
+      "grad_norm": 0.5878409743309021,
+      "learning_rate": 0.00019630238743430058,
+      "loss": 1.5933,
+      "step": 130
+    },
+    {
+      "epoch": 0.07453769559032716,
+      "grad_norm": 0.5072840452194214,
+      "learning_rate": 0.00019624089441378775,
+      "loss": 1.3819,
+      "step": 131
+    },
+    {
+      "epoch": 0.07510668563300142,
+      "grad_norm": 0.5567812323570251,
+      "learning_rate": 0.0001961789040650325,
+      "loss": 1.5582,
+      "step": 132
+    },
+    {
+      "epoch": 0.07567567567567568,
+      "grad_norm": 0.48109254240989685,
+      "learning_rate": 0.00019611641670837219,
+      "loss": 1.4227,
+      "step": 133
+    },
+    {
+      "epoch": 0.07624466571834992,
+      "grad_norm": 0.5404167175292969,
+      "learning_rate": 0.00019605343266671245,
+      "loss": 1.6807,
+      "step": 134
+    },
+    {
+      "epoch": 0.07681365576102418,
+      "grad_norm": 0.47476792335510254,
+      "learning_rate": 0.00019598995226552556,
+      "loss": 1.3462,
+      "step": 135
+    },
+    {
+      "epoch": 0.07738264580369844,
+      "grad_norm": 0.4884220361709595,
+      "learning_rate": 0.0001959259758328487,
+      "loss": 1.5956,
+      "step": 136
+    },
+    {
+      "epoch": 0.07795163584637269,
+      "grad_norm": 0.5190904140472412,
+      "learning_rate": 0.00019586150369928245,
+      "loss": 1.6685,
+      "step": 137
+    },
+    {
+      "epoch": 0.07852062588904694,
+      "grad_norm": 0.513028621673584,
+      "learning_rate": 0.0001957965361979888,
+      "loss": 1.7023,
+      "step": 138
+    },
+    {
+      "epoch": 0.07908961593172119,
+      "grad_norm": 0.4926295578479767,
+      "learning_rate": 0.00019573107366468962,
+      "loss": 1.4606,
+      "step": 139
+    },
+    {
+      "epoch": 0.07965860597439545,
+      "grad_norm": 0.5009914636611938,
+      "learning_rate": 0.00019566511643766485,
+      "loss": 1.5636,
+      "step": 140
+    },
+    {
+      "epoch": 0.0802275960170697,
+      "grad_norm": 0.54355388879776,
+      "learning_rate": 0.00019559866485775084,
+      "loss": 1.681,
+      "step": 141
+    },
+    {
+      "epoch": 0.08079658605974395,
+      "grad_norm": 0.5059416890144348,
+      "learning_rate": 0.00019553171926833853,
+      "loss": 1.6193,
+      "step": 142
+    },
+    {
+      "epoch": 0.08136557610241821,
+      "grad_norm": 0.5309209227561951,
+      "learning_rate": 0.00019546428001537155,
+      "loss": 1.5552,
+      "step": 143
+    },
+    {
+      "epoch": 0.08193456614509247,
+      "grad_norm": 0.4913862943649292,
+      "learning_rate": 0.0001953963474473447,
+      "loss": 1.5506,
+      "step": 144
+    },
+    {
+      "epoch": 0.08250355618776671,
+      "grad_norm": 0.5331928133964539,
+      "learning_rate": 0.0001953279219153019,
+      "loss": 1.7152,
+      "step": 145
+    },
+    {
+      "epoch": 0.08307254623044097,
+      "grad_norm": 0.5169084072113037,
+      "learning_rate": 0.00019525900377283457,
+      "loss": 1.6177,
+      "step": 146
+    },
+    {
+      "epoch": 0.08364153627311523,
+      "grad_norm": 0.5159075856208801,
+      "learning_rate": 0.00019518959337607957,
+      "loss": 1.5652,
+      "step": 147
+    },
+    {
+      "epoch": 0.08421052631578947,
+      "grad_norm": 0.5606206655502319,
+      "learning_rate": 0.0001951196910837177,
+      "loss": 1.6821,
+      "step": 148
+    },
+    {
+      "epoch": 0.08477951635846373,
+      "grad_norm": 0.47890591621398926,
+      "learning_rate": 0.0001950492972569715,
+      "loss": 1.5041,
+      "step": 149
+    },
+    {
+      "epoch": 0.08534850640113797,
+      "grad_norm": 0.5077673196792603,
+      "learning_rate": 0.0001949784122596035,
+      "loss": 1.5837,
+      "step": 150
+    },
+    {
+      "epoch": 0.08591749644381223,
+      "grad_norm": 0.5021458268165588,
+      "learning_rate": 0.00019490703645791454,
+      "loss": 1.5813,
+      "step": 151
+    },
+    {
+      "epoch": 0.08648648648648649,
+      "grad_norm": 0.5000331997871399,
+      "learning_rate": 0.00019483517022074156,
+      "loss": 1.5686,
+      "step": 152
+    },
+    {
+      "epoch": 0.08705547652916074,
+      "grad_norm": 0.5121405124664307,
+      "learning_rate": 0.0001947628139194559,
+      "loss": 1.4329,
+      "step": 153
+    },
+    {
+      "epoch": 0.087624466571835,
+      "grad_norm": 0.5058543682098389,
+      "learning_rate": 0.00019468996792796137,
+      "loss": 1.36,
+      "step": 154
+    },
+    {
+      "epoch": 0.08819345661450925,
+      "grad_norm": 0.5810546875,
+      "learning_rate": 0.00019461663262269213,
+      "loss": 1.3764,
+      "step": 155
+    },
+    {
+      "epoch": 0.0887624466571835,
+      "grad_norm": 0.5015589594841003,
+      "learning_rate": 0.00019454280838261106,
+      "loss": 1.4966,
+      "step": 156
+    },
+    {
+      "epoch": 0.08933143669985776,
+      "grad_norm": 0.5284256339073181,
+      "learning_rate": 0.0001944684955892075,
+      "loss": 1.4944,
+      "step": 157
+    },
+    {
+      "epoch": 0.089900426742532,
+      "grad_norm": 0.49957889318466187,
+      "learning_rate": 0.0001943936946264955,
+      "loss": 1.4641,
+      "step": 158
+    },
+    {
+      "epoch": 0.09046941678520626,
+      "grad_norm": 0.5073912143707275,
+      "learning_rate": 0.00019431840588101157,
+      "loss": 1.3371,
+      "step": 159
+    },
+    {
+      "epoch": 0.09103840682788052,
+      "grad_norm": 0.5323196649551392,
+      "learning_rate": 0.00019424262974181313,
+      "loss": 1.5312,
+      "step": 160
+    },
+    {
+      "epoch": 0.09160739687055476,
+      "grad_norm": 0.5276457071304321,
+      "learning_rate": 0.00019416636660047595,
+      "loss": 1.64,
+      "step": 161
+    },
+    {
+      "epoch": 0.09217638691322902,
+      "grad_norm": 0.49499741196632385,
+      "learning_rate": 0.0001940896168510926,
+      "loss": 1.3689,
+      "step": 162
+    },
+    {
+      "epoch": 0.09274537695590328,
+      "grad_norm": 0.5169721245765686,
+      "learning_rate": 0.00019401238089027017,
+      "loss": 1.5352,
+      "step": 163
+    },
+    {
+      "epoch": 0.09331436699857752,
+      "grad_norm": 0.48859354853630066,
+      "learning_rate": 0.0001939346591171281,
+      "loss": 1.4584,
+      "step": 164
+    },
+    {
+      "epoch": 0.09388335704125178,
+      "grad_norm": 0.5150989890098572,
+      "learning_rate": 0.00019385645193329654,
+      "loss": 1.5178,
+      "step": 165
+    },
+    {
+      "epoch": 0.09445234708392602,
+      "grad_norm": 0.48626863956451416,
+      "learning_rate": 0.00019377775974291383,
+      "loss": 1.3689,
+      "step": 166
+    },
+    {
+      "epoch": 0.09502133712660028,
+      "grad_norm": 0.5352733731269836,
+      "learning_rate": 0.0001936985829526247,
+      "loss": 1.5953,
+      "step": 167
+    },
+    {
+      "epoch": 0.09559032716927454,
+      "grad_norm": 0.5061799883842468,
+      "learning_rate": 0.00019361892197157797,
+      "loss": 1.6339,
+      "step": 168
+    },
+    {
+      "epoch": 0.09615931721194879,
+      "grad_norm": 0.5095758438110352,
+      "learning_rate": 0.0001935387772114246,
+      "loss": 1.5116,
+      "step": 169
+    },
+    {
+      "epoch": 0.09672830725462304,
+      "grad_norm": 0.4948934316635132,
+      "learning_rate": 0.00019345814908631556,
+      "loss": 1.3963,
+      "step": 170
+    },
+    {
+      "epoch": 0.0972972972972973,
+      "grad_norm": 0.5632720589637756,
+      "learning_rate": 0.0001933770380128995,
+      "loss": 1.618,
+      "step": 171
+    },
+    {
+      "epoch": 0.09786628733997155,
+      "grad_norm": 0.5013827681541443,
+      "learning_rate": 0.00019329544441032076,
+      "loss": 1.4847,
+      "step": 172
+    },
+    {
+      "epoch": 0.0984352773826458,
+      "grad_norm": 0.512117326259613,
+      "learning_rate": 0.0001932133687002172,
+      "loss": 1.4346,
+      "step": 173
+    },
+    {
+      "epoch": 0.09900426742532005,
+      "grad_norm": 0.5385090708732605,
+      "learning_rate": 0.00019313081130671798,
+      "loss": 1.6694,
+      "step": 174
+    },
+    {
+      "epoch": 0.09957325746799431,
+      "grad_norm": 0.5616840720176697,
+      "learning_rate": 0.00019304777265644133,
+      "loss": 1.5638,
+      "step": 175
+    },
+    {
+      "epoch": 0.10014224751066857,
+      "grad_norm": 0.5222409963607788,
+      "learning_rate": 0.0001929642531784925,
+      "loss": 1.6203,
+      "step": 176
+    },
+    {
+      "epoch": 0.10071123755334281,
+      "grad_norm": 0.5733211040496826,
+      "learning_rate": 0.00019288025330446126,
+      "loss": 1.6952,
+      "step": 177
+    },
+    {
+      "epoch": 0.10128022759601707,
+      "grad_norm": 0.5625792741775513,
+      "learning_rate": 0.00019279577346842,
+      "loss": 1.6639,
+      "step": 178
+    },
+    {
+      "epoch": 0.10184921763869133,
+      "grad_norm": 0.5778010487556458,
+      "learning_rate": 0.0001927108141069213,
+      "loss": 1.5719,
+      "step": 179
+    },
+    {
+      "epoch": 0.10241820768136557,
+      "grad_norm": 0.5034694671630859,
+      "learning_rate": 0.00019262537565899564,
+      "loss": 1.4461,
+      "step": 180
+    },
+    {
+      "epoch": 0.10298719772403983,
+      "grad_norm": 0.5446426272392273,
+      "learning_rate": 0.0001925394585661492,
+      "loss": 1.4904,
+      "step": 181
+    },
+    {
+      "epoch": 0.10355618776671409,
+      "grad_norm": 0.47503742575645447,
+      "learning_rate": 0.00019245306327236172,
+      "loss": 1.5012,
+      "step": 182
+    },
+    {
+      "epoch": 0.10412517780938833,
+      "grad_norm": 0.5337246656417847,
+      "learning_rate": 0.00019236619022408387,
+      "loss": 1.4175,
+      "step": 183
+    },
+    {
+      "epoch": 0.10469416785206259,
+      "grad_norm": 0.5157039165496826,
+      "learning_rate": 0.00019227883987023523,
+      "loss": 1.6435,
+      "step": 184
+    },
+    {
+      "epoch": 0.10526315789473684,
+      "grad_norm": 0.5278623700141907,
+      "learning_rate": 0.00019219101266220188,
+      "loss": 1.6746,
+      "step": 185
+    },
+    {
+      "epoch": 0.1058321479374111,
+      "grad_norm": 0.4916015565395355,
+      "learning_rate": 0.000192102709053834,
+      "loss": 1.4584,
+      "step": 186
+    },
+    {
+      "epoch": 0.10640113798008535,
+      "grad_norm": 0.5512337684631348,
+      "learning_rate": 0.00019201392950144363,
+      "loss": 1.6313,
+      "step": 187
+    },
+    {
+      "epoch": 0.1069701280227596,
+      "grad_norm": 0.506673276424408,
+      "learning_rate": 0.0001919246744638023,
+      "loss": 1.4842,
+      "step": 188
+    },
+    {
+      "epoch": 0.10753911806543386,
+      "grad_norm": 0.49428772926330566,
+      "learning_rate": 0.00019183494440213857,
+      "loss": 1.4246,
+      "step": 189
+    },
+    {
+      "epoch": 0.10810810810810811,
+      "grad_norm": 0.5020580887794495,
+      "learning_rate": 0.0001917447397801357,
+      "loss": 1.6966,
+      "step": 190
+    },
+    {
+      "epoch": 0.10867709815078236,
+      "grad_norm": 0.5004864931106567,
+      "learning_rate": 0.00019165406106392928,
+      "loss": 1.3144,
+      "step": 191
+    },
+    {
+      "epoch": 0.10924608819345662,
+      "grad_norm": 0.47853466868400574,
+      "learning_rate": 0.00019156290872210488,
+      "loss": 1.3321,
+      "step": 192
+    },
+    {
+      "epoch": 0.10981507823613086,
+      "grad_norm": 0.4940144121646881,
+      "learning_rate": 0.00019147128322569533,
+      "loss": 1.2719,
+      "step": 193
+    },
+    {
+      "epoch": 0.11038406827880512,
+      "grad_norm": 0.5355538725852966,
+      "learning_rate": 0.00019137918504817878,
+      "loss": 1.4551,
+      "step": 194
+    },
+    {
+      "epoch": 0.11095305832147938,
+      "grad_norm": 0.5604861378669739,
+      "learning_rate": 0.00019128661466547576,
+      "loss": 1.6109,
+      "step": 195
+    },
+    {
+      "epoch": 0.11152204836415362,
+      "grad_norm": 0.5061023235321045,
+      "learning_rate": 0.000191193572555947,
+      "loss": 1.511,
+      "step": 196
+    },
+    {
+      "epoch": 0.11209103840682788,
+      "grad_norm": 0.5125574469566345,
+      "learning_rate": 0.0001911000592003909,
+      "loss": 1.4209,
+      "step": 197
+    },
+    {
+      "epoch": 0.11266002844950214,
+      "grad_norm": 0.5150197744369507,
+      "learning_rate": 0.00019100607508204114,
+      "loss": 1.6323,
+      "step": 198
+    },
+    {
+      "epoch": 0.11322901849217638,
+      "grad_norm": 0.5164692997932434,
+      "learning_rate": 0.0001909116206865639,
+      "loss": 1.5086,
+      "step": 199
+    },
+    {
+      "epoch": 0.11379800853485064,
+      "grad_norm": 0.5399172306060791,
+      "learning_rate": 0.00019081669650205564,
+      "loss": 1.5051,
+      "step": 200
+    },
+    {
+      "epoch": 0.11436699857752489,
+      "grad_norm": 0.49494683742523193,
+      "learning_rate": 0.0001907213030190405,
+      "loss": 1.5123,
+      "step": 201
+    },
+    {
+      "epoch": 0.11493598862019914,
+      "grad_norm": 0.5344505906105042,
+      "learning_rate": 0.00019062544073046768,
+      "loss": 1.5364,
+      "step": 202
+    },
+    {
+      "epoch": 0.1155049786628734,
+      "grad_norm": 0.5201467871665955,
+      "learning_rate": 0.00019052911013170892,
+      "loss": 1.5027,
+      "step": 203
+    },
+    {
+      "epoch": 0.11607396870554765,
+      "grad_norm": 0.5991513729095459,
+      "learning_rate": 0.00019043231172055603,
+      "loss": 1.6402,
+      "step": 204
+    },
+    {
+      "epoch": 0.1166429587482219,
+      "grad_norm": 0.5526711940765381,
+      "learning_rate": 0.00019033504599721827,
+      "loss": 1.6166,
+      "step": 205
+    },
+    {
+      "epoch": 0.11721194879089616,
+      "grad_norm": 0.493965208530426,
+      "learning_rate": 0.00019023731346431972,
+      "loss": 1.3099,
+      "step": 206
+    },
+    {
+      "epoch": 0.11778093883357041,
+      "grad_norm": 0.5043678879737854,
+      "learning_rate": 0.00019013911462689668,
+      "loss": 1.3328,
+      "step": 207
+    },
+    {
+      "epoch": 0.11834992887624467,
+      "grad_norm": 0.518515944480896,
+      "learning_rate": 0.00019004044999239517,
+      "loss": 1.453,
+      "step": 208
+    },
+    {
+      "epoch": 0.11891891891891893,
+      "grad_norm": 0.547725260257721,
+      "learning_rate": 0.00018994132007066816,
+      "loss": 1.552,
+      "step": 209
+    },
+    {
+      "epoch": 0.11948790896159317,
+      "grad_norm": 0.5498734712600708,
+      "learning_rate": 0.0001898417253739731,
+      "loss": 1.6076,
+      "step": 210
+    },
+    {
+      "epoch": 0.12005689900426743,
+      "grad_norm": 0.5087684392929077,
+      "learning_rate": 0.00018974166641696908,
+      "loss": 1.3459,
+      "step": 211
+    },
+    {
+      "epoch": 0.12062588904694167,
+      "grad_norm": 0.49864476919174194,
+      "learning_rate": 0.00018964114371671428,
+      "loss": 1.502,
+      "step": 212
+    },
+    {
+      "epoch": 0.12119487908961593,
+      "grad_norm": 0.49818646907806396,
+      "learning_rate": 0.0001895401577926634,
+      "loss": 1.5047,
+      "step": 213
+    },
+    {
+      "epoch": 0.12176386913229019,
+      "grad_norm": 0.5151641964912415,
+      "learning_rate": 0.00018943870916666476,
+      "loss": 1.5276,
+      "step": 214
+    },
+    {
+      "epoch": 0.12233285917496443,
+      "grad_norm": 0.5294698476791382,
+      "learning_rate": 0.00018933679836295777,
+      "loss": 1.4735,
+      "step": 215
+    },
+    {
+      "epoch": 0.12290184921763869,
+      "grad_norm": 0.5169737339019775,
+      "learning_rate": 0.0001892344259081701,
+      "loss": 1.6458,
+      "step": 216
+    },
+    {
+      "epoch": 0.12347083926031295,
+      "grad_norm": 0.5262957811355591,
+      "learning_rate": 0.000189131592331315,
+      "loss": 1.6239,
+      "step": 217
+    },
+    {
+      "epoch": 0.1240398293029872,
+      "grad_norm": 0.5043689012527466,
+      "learning_rate": 0.00018902829816378876,
+      "loss": 1.5785,
+      "step": 218
+    },
+    {
+      "epoch": 0.12460881934566145,
+      "grad_norm": 0.5032008290290833,
+      "learning_rate": 0.00018892454393936754,
+      "loss": 1.4075,
+      "step": 219
+    },
+    {
+      "epoch": 0.1251778093883357,
+      "grad_norm": 0.5261518359184265,
+      "learning_rate": 0.00018882033019420504,
+      "loss": 1.4251,
+      "step": 220
+    },
+    {
+      "epoch": 0.12574679943100997,
+      "grad_norm": 0.5519723296165466,
+      "learning_rate": 0.00018871565746682949,
+      "loss": 1.6654,
+      "step": 221
+    },
+    {
+      "epoch": 0.12631578947368421,
+      "grad_norm": 0.5465745329856873,
+      "learning_rate": 0.0001886105262981409,
+      "loss": 1.5489,
+      "step": 222
+    },
+    {
+      "epoch": 0.12688477951635846,
+      "grad_norm": 0.6040769219398499,
+      "learning_rate": 0.00018850493723140835,
+      "loss": 1.6205,
+      "step": 223
+    },
+    {
+      "epoch": 0.1274537695590327,
+      "grad_norm": 0.5207870006561279,
+      "learning_rate": 0.0001883988908122671,
+      "loss": 1.5843,
+      "step": 224
+    },
+    {
+      "epoch": 0.12802275960170698,
+      "grad_norm": 0.5130170583724976,
+      "learning_rate": 0.00018829238758871574,
+      "loss": 1.5384,
+      "step": 225
+    },
+    {
+      "epoch": 0.12859174964438122,
+      "grad_norm": 0.5100380182266235,
+      "learning_rate": 0.00018818542811111354,
+      "loss": 1.5026,
+      "step": 226
+    },
+    {
+      "epoch": 0.12916073968705546,
+      "grad_norm": 0.5047493577003479,
+      "learning_rate": 0.00018807801293217735,
+      "loss": 1.4774,
+      "step": 227
+    },
+    {
+      "epoch": 0.12972972972972974,
+      "grad_norm": 0.5392350554466248,
+      "learning_rate": 0.0001879701426069789,
+      "loss": 1.2986,
+      "step": 228
+    },
+    {
+      "epoch": 0.13029871977240398,
+      "grad_norm": 0.4927089810371399,
+      "learning_rate": 0.00018786181769294203,
+      "loss": 1.3298,
+      "step": 229
+    },
+    {
+      "epoch": 0.13086770981507823,
+      "grad_norm": 0.5079994797706604,
+      "learning_rate": 0.0001877530387498395,
+      "loss": 1.4027,
+      "step": 230
+    },
+    {
+      "epoch": 0.1314366998577525,
+      "grad_norm": 0.5074231624603271,
+      "learning_rate": 0.00018764380633979035,
+      "loss": 1.6176,
+      "step": 231
+    },
+    {
+      "epoch": 0.13200568990042674,
+      "grad_norm": 0.5501790642738342,
+      "learning_rate": 0.00018753412102725698,
+      "loss": 1.3795,
+      "step": 232
+    },
+    {
+      "epoch": 0.132574679943101,
+      "grad_norm": 0.5117084383964539,
+      "learning_rate": 0.00018742398337904213,
+      "loss": 1.4731,
+      "step": 233
+    },
+    {
+      "epoch": 0.13314366998577526,
+      "grad_norm": 0.5027900338172913,
+      "learning_rate": 0.00018731339396428607,
+      "loss": 1.5399,
+      "step": 234
+    },
+    {
+      "epoch": 0.1337126600284495,
+      "grad_norm": 0.5187605619430542,
+      "learning_rate": 0.00018720235335446342,
+      "loss": 1.5111,
+      "step": 235
+    },
+    {
+      "epoch": 0.13428165007112375,
+      "grad_norm": 0.5272188782691956,
+      "learning_rate": 0.00018709086212338058,
+      "loss": 1.5717,
+      "step": 236
+    },
+    {
+      "epoch": 0.13485064011379802,
+      "grad_norm": 0.5339289903640747,
+      "learning_rate": 0.00018697892084717238,
+      "loss": 1.4529,
+      "step": 237
+    },
+    {
+      "epoch": 0.13541963015647226,
+      "grad_norm": 0.5382213592529297,
+      "learning_rate": 0.00018686653010429937,
+      "loss": 1.5727,
+      "step": 238
+    },
+    {
+      "epoch": 0.1359886201991465,
+      "grad_norm": 0.5148522257804871,
+      "learning_rate": 0.00018675369047554475,
+      "loss": 1.5683,
+      "step": 239
+    },
+    {
+      "epoch": 0.13655761024182078,
+      "grad_norm": 0.5300989747047424,
+      "learning_rate": 0.00018664040254401121,
+      "loss": 1.6485,
+      "step": 240
+    },
+    {
+      "epoch": 0.13712660028449503,
+      "grad_norm": 0.5400955080986023,
+      "learning_rate": 0.00018652666689511824,
+      "loss": 1.5095,
+      "step": 241
+    },
+    {
+      "epoch": 0.13769559032716927,
+      "grad_norm": 0.49695253372192383,
+      "learning_rate": 0.0001864124841165988,
+      "loss": 1.3692,
+      "step": 242
+    },
+    {
+      "epoch": 0.13826458036984351,
+      "grad_norm": 0.5431788563728333,
+      "learning_rate": 0.00018629785479849656,
+      "loss": 1.5774,
+      "step": 243
+    },
+    {
+      "epoch": 0.1388335704125178,
+      "grad_norm": 0.5125901103019714,
+      "learning_rate": 0.00018618277953316245,
+      "loss": 1.3545,
+      "step": 244
+    },
+    {
+      "epoch": 0.13940256045519203,
+      "grad_norm": 0.5172457695007324,
+      "learning_rate": 0.0001860672589152521,
+      "loss": 1.5196,
+      "step": 245
+    },
+    {
+      "epoch": 0.13997155049786628,
+      "grad_norm": 0.5287220478057861,
+      "learning_rate": 0.00018595129354172235,
+      "loss": 1.7279,
+      "step": 246
+    },
+    {
+      "epoch": 0.14054054054054055,
+      "grad_norm": 0.5728311538696289,
+      "learning_rate": 0.00018583488401182843,
+      "loss": 1.5514,
+      "step": 247
+    },
+    {
+      "epoch": 0.1411095305832148,
+      "grad_norm": 0.5267804861068726,
+      "learning_rate": 0.0001857180309271207,
+      "loss": 1.5115,
+      "step": 248
+    },
+    {
+      "epoch": 0.14167852062588904,
+      "grad_norm": 0.5459727644920349,
+      "learning_rate": 0.00018560073489144166,
+      "loss": 1.5057,
+      "step": 249
+    },
+    {
+      "epoch": 0.1422475106685633,
+      "grad_norm": 0.5065287947654724,
+      "learning_rate": 0.00018548299651092269,
+      "loss": 1.4906,
+      "step": 250
+    },
+    {
+      "epoch": 0.14281650071123755,
+      "grad_norm": 0.5647059082984924,
+      "learning_rate": 0.00018536481639398107,
+      "loss": 1.5447,
+      "step": 251
+    },
+    {
+      "epoch": 0.1433854907539118,
+      "grad_norm": 0.5164194703102112,
+      "learning_rate": 0.00018524619515131679,
+      "loss": 1.6922,
+      "step": 252
+    },
+    {
+      "epoch": 0.14395448079658607,
+      "grad_norm": 0.5288499593734741,
+      "learning_rate": 0.0001851271333959093,
+      "loss": 1.5596,
+      "step": 253
+    },
+    {
+      "epoch": 0.14452347083926032,
+      "grad_norm": 0.509348452091217,
+      "learning_rate": 0.00018500763174301448,
+      "loss": 1.6263,
+      "step": 254
+    },
+    {
+      "epoch": 0.14509246088193456,
+      "grad_norm": 0.5377824902534485,
+      "learning_rate": 0.00018488769081016133,
+      "loss": 1.4711,
+      "step": 255
+    },
+    {
+      "epoch": 0.14566145092460883,
+      "grad_norm": 0.5068728923797607,
+      "learning_rate": 0.00018476731121714894,
+      "loss": 1.6706,
+      "step": 256
+    },
+    {
+      "epoch": 0.14623044096728308,
+      "grad_norm": 0.5097038745880127,
+      "learning_rate": 0.0001846464935860431,
+      "loss": 1.5841,
+      "step": 257
+    },
+    {
+      "epoch": 0.14679943100995732,
+      "grad_norm": 0.5391016006469727,
+      "learning_rate": 0.0001845252385411732,
+      "loss": 1.6935,
+      "step": 258
+    },
+    {
+      "epoch": 0.14736842105263157,
+      "grad_norm": 0.5154038667678833,
+      "learning_rate": 0.00018440354670912906,
+      "loss": 1.3827,
+      "step": 259
+    },
+    {
+      "epoch": 0.14793741109530584,
+      "grad_norm": 0.5789750814437866,
+      "learning_rate": 0.00018428141871875743,
+      "loss": 1.545,
+      "step": 260
+    },
+    {
+      "epoch": 0.14850640113798008,
+      "grad_norm": 0.5456128716468811,
+      "learning_rate": 0.00018415885520115915,
+      "loss": 1.5359,
+      "step": 261
+    },
+    {
+      "epoch": 0.14907539118065433,
+      "grad_norm": 0.6158856749534607,
+      "learning_rate": 0.00018403585678968551,
+      "loss": 1.7601,
+      "step": 262
+    },
+    {
+      "epoch": 0.1496443812233286,
+      "grad_norm": 0.4721933603286743,
+      "learning_rate": 0.00018391242411993516,
+      "loss": 1.3328,
+      "step": 263
+    },
+    {
+      "epoch": 0.15021337126600284,
+      "grad_norm": 0.5242535471916199,
+      "learning_rate": 0.00018378855782975084,
+      "loss": 1.3359,
+      "step": 264
+    },
+    {
+      "epoch": 0.1507823613086771,
+      "grad_norm": 0.5116239190101624,
+      "learning_rate": 0.000183664258559216,
+      "loss": 1.218,
+      "step": 265
+    },
+    {
+      "epoch": 0.15135135135135136,
+      "grad_norm": 0.5715349316596985,
+      "learning_rate": 0.0001835395269506515,
+      "loss": 1.7737,
+      "step": 266
+    },
+    {
+      "epoch": 0.1519203413940256,
+      "grad_norm": 0.5294284224510193,
+      "learning_rate": 0.0001834143636486124,
+      "loss": 1.7273,
+      "step": 267
+    },
+    {
+      "epoch": 0.15248933143669985,
+      "grad_norm": 0.5225195288658142,
+      "learning_rate": 0.0001832887692998845,
+      "loss": 1.5397,
+      "step": 268
+    },
+    {
+      "epoch": 0.15305832147937412,
+      "grad_norm": 0.5032251477241516,
+      "learning_rate": 0.00018316274455348105,
+      "loss": 1.4483,
+      "step": 269
+    },
+    {
+      "epoch": 0.15362731152204837,
+      "grad_norm": 0.5733814835548401,
+      "learning_rate": 0.00018303629006063943,
+      "loss": 1.5798,
+      "step": 270
+    },
+    {
+      "epoch": 0.1541963015647226,
+      "grad_norm": 0.5273986458778381,
+      "learning_rate": 0.0001829094064748177,
+      "loss": 1.6515,
+      "step": 271
+    },
+    {
+      "epoch": 0.15476529160739688,
+      "grad_norm": 0.563911497592926,
+      "learning_rate": 0.00018278209445169135,
+      "loss": 1.6408,
+      "step": 272
+    },
+    {
+      "epoch": 0.15533428165007113,
+      "grad_norm": 0.5052376985549927,
+      "learning_rate": 0.00018265435464914973,
+      "loss": 1.3572,
+      "step": 273
+    },
+    {
+      "epoch": 0.15590327169274537,
+      "grad_norm": 0.5052018761634827,
+      "learning_rate": 0.0001825261877272928,
+      "loss": 1.5019,
+      "step": 274
+    },
+    {
+      "epoch": 0.15647226173541964,
+      "grad_norm": 0.4795508086681366,
+      "learning_rate": 0.00018239759434842773,
+      "loss": 1.0659,
+      "step": 275
+    },
+    {
+      "epoch": 0.1570412517780939,
+      "grad_norm": 0.5224232077598572,
+      "learning_rate": 0.00018226857517706537,
+      "loss": 1.6048,
+      "step": 276
+    },
+    {
+      "epoch": 0.15761024182076813,
+      "grad_norm": 0.5337119698524475,
+      "learning_rate": 0.00018213913087991685,
+      "loss": 1.4884,
+      "step": 277
+    },
+    {
+      "epoch": 0.15817923186344238,
+      "grad_norm": 0.48973479866981506,
+      "learning_rate": 0.0001820092621258902,
+      "loss": 1.3599,
+      "step": 278
+    },
+    {
+      "epoch": 0.15874822190611665,
+      "grad_norm": 0.4995887577533722,
+      "learning_rate": 0.0001818789695860868,
+      "loss": 1.5088,
+      "step": 279
+    },
+    {
+      "epoch": 0.1593172119487909,
+      "grad_norm": 0.513390064239502,
+      "learning_rate": 0.00018174825393379798,
+      "loss": 1.5376,
+      "step": 280
+    },
+    {
+      "epoch": 0.15988620199146514,
+      "grad_norm": 0.5285114645957947,
+      "learning_rate": 0.00018161711584450152,
+      "loss": 1.706,
+      "step": 281
+    },
+    {
+      "epoch": 0.1604551920341394,
+      "grad_norm": 0.5384095907211304,
+      "learning_rate": 0.00018148555599585816,
+      "loss": 1.474,
+      "step": 282
+    },
+    {
+      "epoch": 0.16102418207681365,
+      "grad_norm": 0.5326551795005798,
+      "learning_rate": 0.0001813535750677081,
+      "loss": 1.4764,
+      "step": 283
+    },
+    {
+      "epoch": 0.1615931721194879,
+      "grad_norm": 0.538357675075531,
+      "learning_rate": 0.0001812211737420675,
+      "loss": 1.7382,
+      "step": 284
+    },
+    {
+      "epoch": 0.16216216216216217,
+      "grad_norm": 0.5192847847938538,
+      "learning_rate": 0.00018108835270312488,
+      "loss": 1.5809,
+      "step": 285
+    },
+    {
+      "epoch": 0.16273115220483642,
+      "grad_norm": 0.5059441328048706,
+      "learning_rate": 0.00018095511263723768,
+      "loss": 1.3315,
+      "step": 286
+    },
+    {
+      "epoch": 0.16330014224751066,
+      "grad_norm": 0.542091429233551,
+      "learning_rate": 0.00018082145423292868,
+      "loss": 1.394,
+      "step": 287
+    },
+    {
+      "epoch": 0.16386913229018493,
+      "grad_norm": 0.5587398409843445,
+      "learning_rate": 0.00018068737818088248,
+      "loss": 1.5478,
+      "step": 288
+    },
+    {
+      "epoch": 0.16443812233285918,
+      "grad_norm": 0.5091587901115417,
+      "learning_rate": 0.00018055288517394174,
+      "loss": 1.4298,
+      "step": 289
+    },
+    {
+      "epoch": 0.16500711237553342,
+      "grad_norm": 0.5347201228141785,
+      "learning_rate": 0.00018041797590710398,
+      "loss": 1.4504,
+      "step": 290
+    },
+    {
+      "epoch": 0.1655761024182077,
+      "grad_norm": 0.5370376110076904,
+      "learning_rate": 0.00018028265107751756,
+      "loss": 1.6061,
+      "step": 291
+    },
+    {
+      "epoch": 0.16614509246088194,
+      "grad_norm": 0.5322532057762146,
+      "learning_rate": 0.00018014691138447834,
+      "loss": 1.5102,
+      "step": 292
+    },
+    {
+      "epoch": 0.16671408250355618,
+      "grad_norm": 0.4970771074295044,
+      "learning_rate": 0.00018001075752942605,
+      "loss": 1.3017,
+      "step": 293
+    },
+    {
+      "epoch": 0.16728307254623045,
+      "grad_norm": 0.5143032670021057,
+      "learning_rate": 0.00017987419021594053,
+      "loss": 1.5115,
+      "step": 294
+    },
+    {
+      "epoch": 0.1678520625889047,
+      "grad_norm": 0.4978564977645874,
+      "learning_rate": 0.00017973721014973823,
+      "loss": 1.33,
+      "step": 295
+    },
+    {
+      "epoch": 0.16842105263157894,
+      "grad_norm": 0.5085217356681824,
+      "learning_rate": 0.00017959981803866856,
+      "loss": 1.3251,
+      "step": 296
+    },
+    {
+      "epoch": 0.1689900426742532,
+      "grad_norm": 0.522738516330719,
+      "learning_rate": 0.0001794620145927101,
+      "loss": 1.3305,
+      "step": 297
+    },
+    {
+      "epoch": 0.16955903271692746,
+      "grad_norm": 0.506791353225708,
+      "learning_rate": 0.00017932380052396702,
+      "loss": 1.5626,
+      "step": 298
+    },
+    {
+      "epoch": 0.1701280227596017,
+      "grad_norm": 0.541067898273468,
+      "learning_rate": 0.0001791851765466655,
+      "loss": 1.6446,
+      "step": 299
+    },
+    {
+      "epoch": 0.17069701280227595,
+      "grad_norm": 0.5105940103530884,
+      "learning_rate": 0.0001790461433771498,
+      "loss": 1.5842,
+      "step": 300
+    },
+    {
+      "epoch": 0.17126600284495022,
+      "grad_norm": 0.49997130036354065,
+      "learning_rate": 0.00017890670173387885,
+      "loss": 1.5844,
+      "step": 301
+    },
+    {
+      "epoch": 0.17183499288762447,
+      "grad_norm": 0.5258059501647949,
+      "learning_rate": 0.00017876685233742226,
+      "loss": 1.5576,
+      "step": 302
+    },
+    {
+      "epoch": 0.1724039829302987,
+      "grad_norm": 0.5664198398590088,
+      "learning_rate": 0.00017862659591045673,
+      "loss": 1.4313,
+      "step": 303
+    },
+    {
+      "epoch": 0.17297297297297298,
+      "grad_norm": 0.5197086930274963,
+      "learning_rate": 0.00017848593317776234,
+      "loss": 1.4374,
+      "step": 304
+    },
+    {
+      "epoch": 0.17354196301564723,
+      "grad_norm": 0.5377213954925537,
+      "learning_rate": 0.0001783448648662188,
+      "loss": 1.3973,
+      "step": 305
+    },
+    {
+      "epoch": 0.17411095305832147,
+      "grad_norm": 0.4912850260734558,
+      "learning_rate": 0.00017820339170480156,
+      "loss": 1.3055,
+      "step": 306
+    },
+    {
+      "epoch": 0.17467994310099574,
+      "grad_norm": 0.5148215293884277,
+      "learning_rate": 0.00017806151442457827,
+      "loss": 1.5493,
+      "step": 307
+    },
+    {
+      "epoch": 0.17524893314367,
+      "grad_norm": 0.5305980443954468,
+      "learning_rate": 0.0001779192337587048,
+      "loss": 1.6176,
+      "step": 308
+    },
+    {
+      "epoch": 0.17581792318634423,
+      "grad_norm": 0.5322251319885254,
+      "learning_rate": 0.0001777765504424215,
+      "loss": 1.6621,
+      "step": 309
+    },
+    {
+      "epoch": 0.1763869132290185,
+      "grad_norm": 0.5405860543251038,
+      "learning_rate": 0.00017763346521304955,
+      "loss": 1.5951,
+      "step": 310
+    },
+    {
+      "epoch": 0.17695590327169275,
+      "grad_norm": 0.5762712359428406,
+      "learning_rate": 0.00017748997880998691,
+      "loss": 1.4609,
+      "step": 311
+    },
+    {
+      "epoch": 0.177524893314367,
+      "grad_norm": 0.5313809514045715,
+      "learning_rate": 0.0001773460919747047,
+      "loss": 1.4488,
+      "step": 312
+    },
+    {
+      "epoch": 0.17809388335704124,
+      "grad_norm": 0.5385677814483643,
+      "learning_rate": 0.00017720180545074322,
+      "loss": 1.5543,
+      "step": 313
+    },
+    {
+      "epoch": 0.1786628733997155,
+      "grad_norm": 0.5349786877632141,
+      "learning_rate": 0.00017705711998370824,
+      "loss": 1.5848,
+      "step": 314
+    },
+    {
+      "epoch": 0.17923186344238975,
+      "grad_norm": 0.5395460724830627,
+      "learning_rate": 0.00017691203632126706,
+      "loss": 1.5344,
+      "step": 315
+    },
+    {
+      "epoch": 0.179800853485064,
+      "grad_norm": 0.5073065757751465,
+      "learning_rate": 0.0001767665552131446,
+      "loss": 1.4227,
+      "step": 316
+    },
+    {
+      "epoch": 0.18036984352773827,
+      "grad_norm": 0.5242070555686951,
+      "learning_rate": 0.00017662067741111974,
+      "loss": 1.5054,
+      "step": 317
+    },
+    {
+      "epoch": 0.18093883357041252,
+      "grad_norm": 0.5271447896957397,
+      "learning_rate": 0.00017647440366902117,
+      "loss": 1.5675,
+      "step": 318
+    },
+    {
+      "epoch": 0.18150782361308676,
+      "grad_norm": 0.5302979946136475,
+      "learning_rate": 0.00017632773474272363,
+      "loss": 1.4631,
+      "step": 319
+    },
+    {
+      "epoch": 0.18207681365576103,
+      "grad_norm": 0.5438220500946045,
+      "learning_rate": 0.00017618067139014404,
+      "loss": 1.4737,
+      "step": 320
+    },
+    {
+      "epoch": 0.18264580369843528,
+      "grad_norm": 0.5002385377883911,
+      "learning_rate": 0.0001760332143712375,
+      "loss": 1.3976,
+      "step": 321
+    },
+    {
+      "epoch": 0.18321479374110952,
+      "grad_norm": 0.5478991866111755,
+      "learning_rate": 0.00017588536444799338,
+      "loss": 1.527,
+      "step": 322
+    },
+    {
+      "epoch": 0.1837837837837838,
+      "grad_norm": 0.5406285524368286,
+      "learning_rate": 0.0001757371223844314,
+      "loss": 1.4453,
+      "step": 323
+    },
+    {
+      "epoch": 0.18435277382645804,
+      "grad_norm": 0.5226593613624573,
+      "learning_rate": 0.00017558848894659771,
+      "loss": 1.5309,
+      "step": 324
+    },
+    {
+      "epoch": 0.18492176386913228,
+      "grad_norm": 0.5488921999931335,
+      "learning_rate": 0.0001754394649025609,
+      "loss": 1.6993,
+      "step": 325
+    },
+    {
+      "epoch": 0.18549075391180656,
+      "grad_norm": 0.5268238186836243,
+      "learning_rate": 0.000175290051022408,
+      "loss": 1.4578,
+      "step": 326
+    },
+    {
+      "epoch": 0.1860597439544808,
+      "grad_norm": 0.5236526727676392,
+      "learning_rate": 0.00017514024807824055,
+      "loss": 1.5276,
+      "step": 327
+    },
+    {
+      "epoch": 0.18662873399715504,
+      "grad_norm": 0.5280612707138062,
+      "learning_rate": 0.00017499005684417057,
+      "loss": 1.5191,
+      "step": 328
+    },
+    {
+      "epoch": 0.18719772403982932,
+      "grad_norm": 0.5311048030853271,
+      "learning_rate": 0.0001748394780963166,
+      "loss": 1.6317,
+      "step": 329
+    },
+    {
+      "epoch": 0.18776671408250356,
+      "grad_norm": 0.5343871712684631,
+      "learning_rate": 0.0001746885126127997,
+      "loss": 1.6759,
+      "step": 330
+    },
+    {
+      "epoch": 0.1883357041251778,
+      "grad_norm": 0.5824495553970337,
+      "learning_rate": 0.00017453716117373937,
+      "loss": 1.5064,
+      "step": 331
+    },
+    {
+      "epoch": 0.18890469416785205,
+      "grad_norm": 0.5165912508964539,
+      "learning_rate": 0.0001743854245612495,
+      "loss": 1.413,
+      "step": 332
+    },
+    {
+      "epoch": 0.18947368421052632,
+      "grad_norm": 0.5721679329872131,
+      "learning_rate": 0.0001742333035594345,
+      "loss": 1.3518,
+      "step": 333
+    },
+    {
+      "epoch": 0.19004267425320057,
+      "grad_norm": 0.5547354817390442,
+      "learning_rate": 0.00017408079895438498,
+      "loss": 1.7325,
+      "step": 334
+    },
+    {
+      "epoch": 0.1906116642958748,
+      "grad_norm": 0.5567200779914856,
+      "learning_rate": 0.00017392791153417398,
+      "loss": 1.6179,
+      "step": 335
+    },
+    {
+      "epoch": 0.19118065433854908,
+      "grad_norm": 0.5186401009559631,
+      "learning_rate": 0.00017377464208885265,
+      "loss": 1.3499,
+      "step": 336
+    },
+    {
+      "epoch": 0.19174964438122333,
+      "grad_norm": 0.5111268758773804,
+      "learning_rate": 0.00017362099141044626,
+      "loss": 1.2942,
+      "step": 337
+    },
+    {
+      "epoch": 0.19231863442389757,
+      "grad_norm": 0.5359705090522766,
+      "learning_rate": 0.0001734669602929502,
+      "loss": 1.552,
+      "step": 338
+    },
+    {
+      "epoch": 0.19288762446657184,
+      "grad_norm": 0.5835704803466797,
+      "learning_rate": 0.0001733125495323257,
+      "loss": 1.3161,
+      "step": 339
+    },
+    {
+      "epoch": 0.1934566145092461,
+      "grad_norm": 0.5223122835159302,
+      "learning_rate": 0.00017315775992649584,
+      "loss": 1.5189,
+      "step": 340
+    },
+    {
+      "epoch": 0.19402560455192033,
+      "grad_norm": 0.5331559777259827,
+      "learning_rate": 0.0001730025922753415,
+      "loss": 1.7263,
+      "step": 341
+    },
+    {
+      "epoch": 0.1945945945945946,
+      "grad_norm": 0.54593425989151,
+      "learning_rate": 0.00017284704738069698,
+      "loss": 1.5158,
+      "step": 342
+    },
+    {
+      "epoch": 0.19516358463726885,
+      "grad_norm": 0.5385016202926636,
+      "learning_rate": 0.000172691126046346,
+      "loss": 1.5762,
+      "step": 343
+    },
+    {
+      "epoch": 0.1957325746799431,
+      "grad_norm": 0.4981791079044342,
+      "learning_rate": 0.00017253482907801773,
+      "loss": 1.3606,
+      "step": 344
+    },
+    {
+      "epoch": 0.19630156472261737,
+      "grad_norm": 0.5046445727348328,
+      "learning_rate": 0.00017237815728338217,
+      "loss": 1.382,
+      "step": 345
+    },
+    {
+      "epoch": 0.1968705547652916,
+      "grad_norm": 0.5692354440689087,
+      "learning_rate": 0.00017222111147204645,
+      "loss": 1.6214,
+      "step": 346
+    },
+    {
+      "epoch": 0.19743954480796586,
+      "grad_norm": 0.5191353559494019,
+      "learning_rate": 0.00017206369245555036,
+      "loss": 1.459,
+      "step": 347
+    },
+    {
+      "epoch": 0.1980085348506401,
+      "grad_norm": 0.5159747004508972,
+      "learning_rate": 0.0001719059010473623,
+      "loss": 1.6057,
+      "step": 348
+    },
+    {
+      "epoch": 0.1980085348506401,
+      "eval_loss": 1.506325602531433,
+      "eval_runtime": 16.4362,
+      "eval_samples_per_second": 45.023,
+      "eval_steps_per_second": 22.511,
+      "step": 348
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1392,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 348,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6192604292579328.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
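
The state file above logs one training entry per step (`logging_steps` is 1) plus an evaluation entry at step 348, the first save point of a 1392-step run. A minimal sketch of how such a file could be summarized follows. It assumes the entries live under a `log_history` key, which is not visible in this part of the diff, and the `summarize` helper and its `window` parameter are hypothetical names introduced only for illustration. The inline sample copies three entries from the log so the snippet runs without the checkpoint on disk.

```python
import json
from statistics import mean

# Three entries copied from the log above, in the same shape as the state file.
# ASSUMPTION: the list is stored under "log_history"; that key is not shown
# in this excerpt of the diff.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.19743954480796586, "grad_norm": 0.5191353559494019,
     "learning_rate": 0.00017206369245555036, "loss": 1.459, "step": 347},
    {"epoch": 0.1980085348506401, "grad_norm": 0.5159747004508972,
     "learning_rate": 0.0001719059010473623, "loss": 1.6057, "step": 348},
    {"epoch": 0.1980085348506401, "eval_loss": 1.506325602531433,
     "eval_runtime": 16.4362, "step": 348}
  ],
  "max_steps": 1392,
  "save_steps": 348
}
""")


def summarize(state, window=50):
    """Hypothetical helper: condense a trainer state dict into a few numbers."""
    history = state["log_history"]
    # Training entries carry a "loss" key; evaluation entries carry "eval_loss".
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    last_step = max(entry["step"] for entry in history)
    return {
        "last_step": last_step,
        "progress": last_step / state["max_steps"],
        # Average over the most recent `window` training entries.
        "mean_recent_train_loss": mean(e["loss"] for e in train[-window:]),
        "last_eval_loss": evals[-1]["eval_loss"] if evals else None,
    }


summary = summarize(SAMPLE_STATE)
print(summary)
```

To run this against the real checkpoint, replace `SAMPLE_STATE` with the result of `json.load` on `last-checkpoint/trainer_state.json`. Averaging over a window is used here because the per-step losses in the log swing between roughly 1.07 and 1.77, so a single step says little about the trend.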

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76b9dc7d57214c43989a5fdba8805609204de5a837590b4d4d38e447723c8225
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.