kth8 committed
Commit 510061d · verified · 1 Parent(s): c96d55e

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,111 @@
---
license: gemma
language:
- en
base_model: unsloth/gemma-3-270m-it
datasets:
- kth8/system_prompts_SuperGPQA-26000x
- kth8/system_prompts_Jobs-20000x
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- unsloth
- google
- gemma
- gemma3
- gemma3_text
---
![logo](https://storage.googleapis.com/gweb-developer-goog-blog-assets/images/gemma-3_2.original.png)
A fine-tune of [unsloth/gemma-3-270m-it](https://huggingface.co/unsloth/gemma-3-270m-it) on the [kth8/system_prompts_SuperGPQA-26000x](https://huggingface.co/datasets/kth8/system_prompts_SuperGPQA-26000x) and [kth8/system_prompts_Jobs-20000x](https://huggingface.co/datasets/kth8/system_prompts_Jobs-20000x) datasets.

## Usage example
**System prompt**
```
You are a system prompt generator.
```
**User prompt**
```
Write a system prompt for an AI assistant that will help users with Engineering problems, especially Mechanics and Solid Mechanics.
```
**Assistant response**
```
You are a world-class engineering consultant and educator with deep, up-to-date expertise in all sub-disciplines of Mechanics and Solid Mechanics. Your knowledge spans classical and modern theories, computational methods, experimental techniques, and industry standards (e.g., ASME, ASTM, ISO, Eurocode, ACI). You are fluent in the language of engineering analysis, from continuum mechanics, stress-strain relationships, and material constitutive models to advanced topics such as plasticity, fracture mechanics, fatigue, buckling, vibration, and dynamic impact.

Your core responsibilities include:
* Providing clear, step-by-step explanations of theoretical concepts, derivations, and physical intuition.
* Solving analytical problems, performing hand calculations, and guiding the formulation of governing equations.
* Recommending appropriate numerical methods (Finite Element Analysis, Boundary Element Method, Mesh-free methods) and assisting with model setup, meshing strategies, boundary condition enforcement, and result interpretation.
* Advising on material selection, property estimation, and the application of constitutive models for metals, polymers, composites, and emerging materials.
* Interpreting and applying relevant design codes, standards, and safety factors for structural components, pressure vessels, bridges, aerospace structures, and micro-scale devices.
* Generating concise, well-structured technical reports, design memos, and documentation, including clear diagrams, tables, and units.
* Producing reproducible code snippets in MATLAB, Python (NumPy/SciPy, FEniCS, PyTorch), or other engineering software, with thorough comments and explanations.
* Asking clarifying questions when problem statements are ambiguous, to ensure accurate and context-appropriate solutions.
* Maintaining a professional tone, avoiding speculation beyond the provided information, and explicitly stating any assumptions made during analysis.
* Providing references to textbooks, peer-reviewed papers, standards, or reputable online resources when relevant, and citing them in a consistent format.
* Upholding safety and ethical considerations: never suggest designs that violate known safety regulations, and always warn about potential failure modes, limitations of models, and the need for experimental validation.

When responding, follow these guidelines:
1. Begin with a brief summary of the key takeaway.
2. Present the solution or explanation in logical sections (Conceptual Background, Assumptions, Governing Equations, Solution Procedure, Results, Interpretation, Recommendations).
3. Use SI units unless otherwise specified, and include unit conversions where helpful.
4. Highlight any assumptions, approximations, and their impact on accuracy.
5. If a calculation is performed, show all intermediate steps and final numerical results with appropriate significant figures.
6. For code, provide a complete, runnable snippet with comments; explain any external libraries required.
7. Conclude with any additional considerations, potential extensions, or validation steps.

You adapt your depth of explanation to the user's expertise level: for novices, use analogies and avoid unnecessary jargon; for experts, you may discuss higher-order theories, recent research trends, and nuanced code interpretations. Always strive for clarity, rigor, and pedagogical value.

You are empowered to ask the user for missing data, clarifications, or context before delivering a final answer. Your ultimate goal is to enable users to understand, apply, and extend mechanical and solid-mechanics principles across academia, industry, and research.
```
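The prompts above can be run with the standard `transformers` chat-template API. A minimal sketch, assuming a regular `transformers` install; `model_id` is a placeholder for this repository's id, and `max_new_tokens` is an arbitrary choice (the sampling settings mirror this repo's `generation_config.json`):

```python
def build_messages(user_prompt: str) -> list:
    """Chat messages matching the usage example above."""
    return [
        {"role": "system", "content": "You are a system prompt generator."},
        {"role": "user", "content": user_prompt},
    ]


def generate(model_id: str, user_prompt: str, max_new_tokens: int = 512) -> str:
    # Imported here so build_messages() stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    ids = tok.apply_chat_template(
        build_messages(user_prompt), add_generation_prompt=True, return_tensors="pt"
    )
    # Sampling mirrors generation_config.json: do_sample, top_k=64, top_p=0.95.
    out = model.generate(
        ids, max_new_tokens=max_new_tokens, do_sample=True, top_k=64, top_p=0.95
    )
    # Decode only the newly generated tokens.
    return tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
```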
## Model Details
- Base Model: `unsloth/gemma-3-270m-it`
- Parameter Count: 268,098,176
- Precision: torch.bfloat16

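The parameter count can be reproduced from the architecture values in this repo's `config.json`. A quick sketch; the per-layer norm breakdown (four RMSNorms plus q/k-norms) is an assumption about the standard Gemma-3 text layout:

```python
# Recompute the reported parameter count from config.json values.
vocab, hidden, layers = 262144, 640, 18
heads, kv_heads, head_dim, inter = 4, 1, 256, 2048

embed = vocab * hidden                    # tied input/output embeddings
attn = hidden * heads * head_dim * 2      # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2  # k_proj + v_proj
mlp = hidden * inter * 3                  # gate_proj, up_proj, down_proj
norms = 4 * hidden + 2 * head_dim         # 4 RMSNorms + q_norm/k_norm (assumed layout)
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(total)  # 268098176
```

This matches the reported 268,098,176 exactly; at 2 bytes per bfloat16 weight it also accounts for essentially all of the 536,223,056-byte `model.safetensors` (the small remainder is the safetensors header).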
## Hardware
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
- Announced: Mar 17th, 2025
- Release Date: Mar 18th, 2025
- Memory Type: GDDR7
- Bandwidth: 1.79 TB/s
- Memory Size: 96 GB
- Memory Bus: 512 bit
- Shading Units: 24064
- TDP: 600W

## Training Settings
### PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Gradient checkpointing: unsloth

### SFT
- Epochs: 2
- Batch size: 32
- Gradient Accumulation steps: 1
- Warmup ratio: 0.05
- Learning rate: 0.0002
- Optimizer: adamw_torch_fused
- Learning rate scheduler: cosine

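The warmup ratio is consistent with the learning-rate column of `train/log.json`: with 2830 total steps, `warmup_ratio: 0.05` gives 142 warmup steps under the usual ceil convention, and the logged LR at step `n` corresponds to scheduler step `n - 1` (a quick consistency check, assuming linear warmup to the 2e-4 peak):

```python
import math

max_steps, warmup_ratio, peak_lr = 2830, 0.05, 2e-4
warmup_steps = math.ceil(max_steps * warmup_ratio)  # 142

# Linear warmup: LR logged at step n reflects scheduler step n - 1,
# so the first logged value (step 10) should be peak_lr * 9 / 142.
lr_at_step_10 = peak_lr * (10 - 1) / warmup_steps
print(warmup_steps, lr_at_step_10)  # 142, ~1.2676e-05 as in the log
```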
## Training stats
- Date: 2026-03-30T15:42:56.091336
- Peak VRAM usage: 68.33 GB
- Global step: 2830
- Training runtime (seconds): 1496.9978
- Average training loss: 1.398907420828991
- Final validation loss: 1.282422423362732

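These figures imply the following throughput (derived values, not taken from the log):

```python
# Throughput implied by the reported step count, batch size, and runtime.
steps, batch, runtime_s = 2830, 32, 1496.9978
samples = steps * batch            # 90,560 samples over 2 epochs
throughput = samples / runtime_s   # samples per second
steps_per_s = steps / runtime_s
print(round(throughput, 1), round(steps_per_s, 2))  # ~60.5 samples/s, ~1.89 steps/s
```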
## Framework versions
- Unsloth: 2026.3.17
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.10.0+cu128
- Datasets: 4.8.4
- Tokenizers: 0.22.2

## License
This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy) regarding the use of Gemma-generated content.
added_tokens.json ADDED
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.jinja ADDED
@@ -0,0 +1,50 @@
{# Unsloth Chat template fixes #}
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
{%- if messages[0]['content'] is string -%}
{%- set first_user_prefix = messages[0]['content'] + '

' -%}
{%- else -%}
{%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- else -%}
{%- set first_user_prefix = "" -%}
{%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- set role = "model" -%}
{%- else -%}
{%- set role = message['role'] -%}
{%- endif -%}
{{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'image' -%}
{{ '<start_of_image>' }}
{%- elif item['type'] == 'text' -%}
{{ item['text'] | trim }}
{%- endif -%}
{%- endfor -%}
{%- elif message['content'] is defined -%}
{{ raise_exception("Invalid content type") }}
{%- endif -%}
{{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model
'}}
{%- endif -%}

{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
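The template's control flow — system prompt folded into the first turn, `assistant` renamed to `model`, strict role alternation — can be mirrored in a few lines of plain Python. This is a hypothetical re-implementation for illustration (text-only path), not the tokenizer's own renderer:

```python
def render(messages, add_generation_prompt=False, bos_token="<bos>"):
    """Pure-Python mirror of chat_template.jinja's text-only logic (illustrative)."""
    out = bos_token
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # System content becomes a prefix on the first user turn.
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    for i, msg in enumerate(messages):
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        role = "model" if msg["role"] == "assistant" else msg["role"]
        out += f"<start_of_turn>{role}\n" + (prefix if i == 0 else "")
        out += msg["content"].strip() + "<end_of_turn>\n"
    if add_generation_prompt:
        out += "<start_of_turn>model\n"
    return out
```

For example, `render([{"role": "system", "content": "S"}, {"role": "user", "content": "hi"}], add_generation_prompt=True)` produces `<bos><start_of_turn>user\nS\n\nhi<end_of_turn>\n<start_of_turn>model\n`.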
config.json ADDED
@@ -0,0 +1,55 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 640,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 18,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 512,
  "transformers_version": "4.56.2",
  "unsloth_fixed": true,
  "use_bidirectional_attention": false,
  "use_cache": true,
  "vocab_size": 262144
}
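The `layer_types` list is just `_sliding_window_pattern: 6` expanded over the 18 layers — every sixth layer uses full attention, the rest use 512-token sliding-window attention:

```python
# Expand the sliding-window pattern into the explicit per-layer list.
pattern, n_layers = 6, 18
layer_types = [
    "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
    for i in range(n_layers)
]
print(layer_types.count("full_attention"))  # 3 (layers 6, 12, and 18)
```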
generation_config.json ADDED
@@ -0,0 +1,14 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.56.2"
}
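With `do_sample: true`, decoding samples from the distribution after top-k (64) and nucleus (top-p 0.95) filtering. A toy sketch of the nucleus step — a standalone illustration, not the `transformers` implementation:

```python
def top_p_keep(probs, top_p=0.95):
    """Indices kept by nucleus filtering: the smallest set of
    highest-probability tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

print(top_p_keep([0.5, 0.3, 0.15, 0.04, 0.01]))  # [0, 1, 2]
```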
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c62f9d0efecb91dc206c2d4788e5e716fbe6f34c3bb6ec195e017710edf9dfb
size 536223056
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
train/log.json ADDED
@@ -0,0 +1,2072 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "loss": 3.0223,
4
+ "grad_norm": 9.622929573059082,
5
+ "learning_rate": 1.267605633802817e-05,
6
+ "epoch": 0.007067137809187279,
7
+ "step": 10
8
+ },
9
+ {
10
+ "loss": 2.4561,
11
+ "grad_norm": 2.88118052482605,
12
+ "learning_rate": 2.676056338028169e-05,
13
+ "epoch": 0.014134275618374558,
14
+ "step": 20
15
+ },
16
+ {
17
+ "loss": 2.2697,
18
+ "grad_norm": 1.5007869005203247,
19
+ "learning_rate": 4.0845070422535214e-05,
20
+ "epoch": 0.02120141342756184,
21
+ "step": 30
22
+ },
23
+ {
24
+ "loss": 2.1608,
25
+ "grad_norm": 1.195833444595337,
26
+ "learning_rate": 5.492957746478874e-05,
27
+ "epoch": 0.028268551236749116,
28
+ "step": 40
29
+ },
30
+ {
31
+ "loss": 2.0508,
32
+ "grad_norm": 1.1800744533538818,
33
+ "learning_rate": 6.901408450704226e-05,
34
+ "epoch": 0.0353356890459364,
35
+ "step": 50
36
+ },
37
+ {
38
+ "loss": 1.9724,
39
+ "grad_norm": 1.0913054943084717,
40
+ "learning_rate": 8.309859154929578e-05,
41
+ "epoch": 0.04240282685512368,
42
+ "step": 60
43
+ },
44
+ {
45
+ "loss": 1.9282,
46
+ "grad_norm": 1.0702667236328125,
47
+ "learning_rate": 9.718309859154931e-05,
48
+ "epoch": 0.04946996466431095,
49
+ "step": 70
50
+ },
51
+ {
52
+ "loss": 1.9144,
53
+ "grad_norm": 0.9813940525054932,
54
+ "learning_rate": 0.00011126760563380282,
55
+ "epoch": 0.05653710247349823,
56
+ "step": 80
57
+ },
58
+ {
59
+ "loss": 1.8361,
60
+ "grad_norm": 1.041872501373291,
61
+ "learning_rate": 0.00012535211267605635,
62
+ "epoch": 0.0636042402826855,
63
+ "step": 90
64
+ },
65
+ {
66
+ "loss": 1.8409,
67
+ "grad_norm": 1.254153847694397,
68
+ "learning_rate": 0.00013943661971830987,
69
+ "epoch": 0.0706713780918728,
70
+ "step": 100
71
+ },
72
+ {
73
+ "loss": 1.7993,
74
+ "grad_norm": 1.0529217720031738,
75
+ "learning_rate": 0.00015352112676056339,
76
+ "epoch": 0.07773851590106007,
77
+ "step": 110
78
+ },
79
+ {
80
+ "loss": 1.795,
81
+ "grad_norm": 1.0209791660308838,
82
+ "learning_rate": 0.0001676056338028169,
83
+ "epoch": 0.08480565371024736,
84
+ "step": 120
85
+ },
86
+ {
87
+ "loss": 1.7774,
88
+ "grad_norm": 0.9271851181983948,
89
+ "learning_rate": 0.00018169014084507045,
90
+ "epoch": 0.09187279151943463,
91
+ "step": 130
92
+ },
93
+ {
94
+ "loss": 1.766,
95
+ "grad_norm": 0.8660218715667725,
96
+ "learning_rate": 0.00019577464788732396,
97
+ "epoch": 0.0989399293286219,
98
+ "step": 140
99
+ },
100
+ {
101
+ "loss": 1.7462,
102
+ "grad_norm": 1.034064769744873,
103
+ "learning_rate": 0.00019999665339174013,
104
+ "epoch": 0.10600706713780919,
105
+ "step": 150
106
+ },
107
+ {
108
+ "loss": 1.7193,
109
+ "grad_norm": 0.9861654043197632,
110
+ "learning_rate": 0.00019998026238030888,
111
+ "epoch": 0.11307420494699646,
112
+ "step": 160
113
+ },
114
+ {
115
+ "loss": 1.7128,
116
+ "grad_norm": 1.0704911947250366,
117
+ "learning_rate": 0.00019995021451869546,
118
+ "epoch": 0.12014134275618374,
119
+ "step": 170
120
+ },
121
+ {
122
+ "loss": 1.713,
123
+ "grad_norm": 0.9765517115592957,
124
+ "learning_rate": 0.00019990651391130147,
125
+ "epoch": 0.127208480565371,
126
+ "step": 180
127
+ },
128
+ {
129
+ "loss": 1.6701,
130
+ "grad_norm": 0.8447274565696716,
131
+ "learning_rate": 0.00019984916652743156,
132
+ "epoch": 0.13427561837455831,
133
+ "step": 190
134
+ },
135
+ {
136
+ "loss": 1.6735,
137
+ "grad_norm": 0.9753164052963257,
138
+ "learning_rate": 0.00019977818020047817,
139
+ "epoch": 0.1413427561837456,
140
+ "step": 200
141
+ },
142
+ {
143
+ "loss": 1.6576,
144
+ "grad_norm": 0.8660338521003723,
145
+ "learning_rate": 0.00019969356462685146,
146
+ "epoch": 0.14840989399293286,
147
+ "step": 210
148
+ },
149
+ {
150
+ "loss": 1.6457,
151
+ "grad_norm": 0.8690060377120972,
152
+ "learning_rate": 0.0001995953313646548,
153
+ "epoch": 0.15547703180212014,
154
+ "step": 220
155
+ },
156
+ {
157
+ "loss": 1.6314,
158
+ "grad_norm": 0.8115194439888,
159
+ "learning_rate": 0.0001994834938321061,
160
+ "epoch": 0.1625441696113074,
161
+ "step": 230
162
+ },
163
+ {
164
+ "loss": 1.6366,
165
+ "grad_norm": 0.9583505392074585,
166
+ "learning_rate": 0.00019935806730570488,
167
+ "epoch": 0.1696113074204947,
168
+ "step": 240
169
+ },
170
+ {
171
+ "loss": 1.6171,
172
+ "grad_norm": 0.7596977353096008,
173
+ "learning_rate": 0.00019921906891814551,
174
+ "epoch": 0.17667844522968199,
175
+ "step": 250
176
+ },
177
+ {
178
+ "loss": 1.5992,
179
+ "grad_norm": 0.832810640335083,
180
+ "learning_rate": 0.000199066517655977,
181
+ "epoch": 0.18374558303886926,
182
+ "step": 260
183
+ },
184
+ {
185
+ "loss": 1.6132,
186
+ "grad_norm": 0.7862087488174438,
187
+ "learning_rate": 0.00019890043435700954,
188
+ "epoch": 0.19081272084805653,
189
+ "step": 270
190
+ },
191
+ {
192
+ "loss": 1.597,
193
+ "grad_norm": 0.9267857670783997,
194
+ "learning_rate": 0.00019872084170746829,
195
+ "epoch": 0.1978798586572438,
196
+ "step": 280
197
+ },
198
+ {
199
+ "eval_loss": 1.587104320526123,
200
+ "eval_runtime": 33.1782,
201
+ "eval_samples_per_second": 71.794,
202
+ "eval_steps_per_second": 17.964,
203
+ "epoch": 0.19929328621908127,
204
+ "step": 282
205
+ },
206
+ {
207
+ "loss": 1.5856,
208
+ "grad_norm": 0.7953941822052002,
209
+ "learning_rate": 0.0001985277642388941,
210
+ "epoch": 0.2049469964664311,
211
+ "step": 290
212
+ },
213
+ {
214
+ "loss": 1.5911,
215
+ "grad_norm": 0.9416302442550659,
216
+ "learning_rate": 0.00019832122832479326,
217
+ "epoch": 0.21201413427561838,
218
+ "step": 300
219
+ },
220
+ {
221
+ "loss": 1.5807,
222
+ "grad_norm": 0.7640643119812012,
223
+ "learning_rate": 0.0001981012621770344,
224
+ "epoch": 0.21908127208480566,
225
+ "step": 310
226
+ },
227
+ {
228
+ "loss": 1.6089,
229
+ "grad_norm": 0.7899666428565979,
230
+ "learning_rate": 0.00019786789584199524,
231
+ "epoch": 0.22614840989399293,
232
+ "step": 320
233
+ },
234
+ {
235
+ "loss": 1.5768,
236
+ "grad_norm": 0.8222067356109619,
237
+ "learning_rate": 0.00019762116119645818,
238
+ "epoch": 0.2332155477031802,
239
+ "step": 330
240
+ },
241
+ {
242
+ "loss": 1.5571,
243
+ "grad_norm": 0.9370843172073364,
244
+ "learning_rate": 0.00019736109194325635,
245
+ "epoch": 0.24028268551236748,
246
+ "step": 340
247
+ },
248
+ {
249
+ "loss": 1.5752,
250
+ "grad_norm": 0.8056897521018982,
251
+ "learning_rate": 0.00019708772360666957,
252
+ "epoch": 0.24734982332155478,
253
+ "step": 350
254
+ },
255
+ {
256
+ "loss": 1.5278,
257
+ "grad_norm": 0.866102933883667,
258
+ "learning_rate": 0.00019680109352757227,
259
+ "epoch": 0.254416961130742,
260
+ "step": 360
261
+ },
262
+ {
263
+ "loss": 1.5634,
264
+ "grad_norm": 0.8707364797592163,
265
+ "learning_rate": 0.0001965012408583327,
266
+ "epoch": 0.26148409893992935,
267
+ "step": 370
268
+ },
269
+ {
270
+ "loss": 1.5348,
271
+ "grad_norm": 0.8081828951835632,
272
+ "learning_rate": 0.00019618820655746487,
273
+ "epoch": 0.26855123674911663,
274
+ "step": 380
275
+ },
276
+ {
277
+ "loss": 1.5707,
278
+ "grad_norm": 0.8864259123802185,
279
+ "learning_rate": 0.0001958620333840339,
280
+ "epoch": 0.2756183745583039,
281
+ "step": 390
282
+ },
283
+ {
284
+ "loss": 1.5324,
285
+ "grad_norm": 0.8165407776832581,
286
+ "learning_rate": 0.00019552276589181522,
287
+ "epoch": 0.2826855123674912,
288
+ "step": 400
289
+ },
290
+ {
291
+ "loss": 1.5627,
292
+ "grad_norm": 0.7587102055549622,
293
+ "learning_rate": 0.00019517045042320892,
294
+ "epoch": 0.28975265017667845,
295
+ "step": 410
296
+ },
297
+ {
298
+ "loss": 1.5713,
299
+ "grad_norm": 0.819656491279602,
300
+ "learning_rate": 0.00019480513510290934,
301
+ "epoch": 0.2968197879858657,
302
+ "step": 420
303
+ },
304
+ {
305
+ "loss": 1.5437,
306
+ "grad_norm": 0.8668874502182007,
307
+ "learning_rate": 0.00019442686983133168,
308
+ "epoch": 0.303886925795053,
309
+ "step": 430
310
+ },
311
+ {
312
+ "loss": 1.5436,
313
+ "grad_norm": 0.8535803556442261,
314
+ "learning_rate": 0.0001940357062777956,
315
+ "epoch": 0.31095406360424027,
316
+ "step": 440
317
+ },
318
+ {
319
+ "loss": 1.51,
320
+ "grad_norm": 0.8437765836715698,
321
+ "learning_rate": 0.0001936316978734676,
322
+ "epoch": 0.31802120141342755,
323
+ "step": 450
324
+ },
325
+ {
326
+ "loss": 1.5283,
327
+ "grad_norm": 0.7021297812461853,
328
+ "learning_rate": 0.0001932148998040626,
329
+ "epoch": 0.3250883392226148,
330
+ "step": 460
331
+ },
332
+ {
333
+ "loss": 1.5183,
334
+ "grad_norm": 0.7457623481750488,
335
+ "learning_rate": 0.00019278536900230563,
336
+ "epoch": 0.3321554770318021,
337
+ "step": 470
338
+ },
339
+ {
340
+ "loss": 1.5222,
341
+ "grad_norm": 0.8366155028343201,
342
+ "learning_rate": 0.0001923431641401552,
343
+ "epoch": 0.3392226148409894,
344
+ "step": 480
345
+ },
346
+ {
347
+ "loss": 1.4932,
348
+ "grad_norm": 0.8243674039840698,
349
+ "learning_rate": 0.00019188834562078902,
350
+ "epoch": 0.3462897526501767,
351
+ "step": 490
352
+ },
353
+ {
354
+ "loss": 1.4981,
355
+ "grad_norm": 0.9153965711593628,
356
+ "learning_rate": 0.00019142097557035308,
357
+ "epoch": 0.35335689045936397,
358
+ "step": 500
359
+ },
360
+ {
361
+ "loss": 1.4876,
362
+ "grad_norm": 0.8276180028915405,
363
+ "learning_rate": 0.0001909411178294756,
364
+ "epoch": 0.36042402826855124,
365
+ "step": 510
366
+ },
367
+ {
368
+ "loss": 1.4709,
369
+ "grad_norm": 0.8758065700531006,
370
+ "learning_rate": 0.0001904488379445466,
371
+ "epoch": 0.3674911660777385,
372
+ "step": 520
373
+ },
374
+ {
375
+ "loss": 1.4856,
376
+ "grad_norm": 0.7498217821121216,
377
+ "learning_rate": 0.00018994420315876468,
378
+ "epoch": 0.3745583038869258,
379
+ "step": 530
380
+ },
381
+ {
382
+ "loss": 1.495,
383
+ "grad_norm": 0.7916860580444336,
384
+ "learning_rate": 0.0001894272824029518,
385
+ "epoch": 0.38162544169611307,
386
+ "step": 540
387
+ },
388
+ {
389
+ "loss": 1.5118,
390
+ "grad_norm": 0.7216470837593079,
391
+ "learning_rate": 0.0001888981462861377,
392
+ "epoch": 0.38869257950530034,
393
+ "step": 550
394
+ },
395
+ {
396
+ "loss": 1.4686,
397
+ "grad_norm": 0.7964893579483032,
398
+ "learning_rate": 0.00018835686708591496,
399
+ "epoch": 0.3957597173144876,
400
+ "step": 560
401
+ },
402
+ {
403
+ "eval_loss": 1.4734739065170288,
404
+ "eval_runtime": 32.7333,
405
+ "eval_samples_per_second": 72.77,
406
+ "eval_steps_per_second": 18.208,
407
+ "epoch": 0.39858657243816253,
408
+ "step": 564
409
+ },
410
+ {
411
+ "loss": 1.4518,
412
+ "grad_norm": 0.9140339493751526,
413
+ "learning_rate": 0.00018780351873856627,
414
+ "epoch": 0.4028268551236749,
415
+ "step": 570
416
+ },
417
+ {
418
+ "loss": 1.5053,
419
+ "grad_norm": 0.9250938296318054,
420
+ "learning_rate": 0.00018723817682896515,
421
+ "epoch": 0.4098939929328622,
422
+ "step": 580
423
+ },
424
+ {
425
+ "loss": 1.4785,
426
+ "grad_norm": 0.8234674334526062,
427
+ "learning_rate": 0.00018666091858025112,
428
+ "epoch": 0.4169611307420495,
429
+ "step": 590
430
+ },
431
+ {
432
+ "loss": 1.4808,
433
+ "grad_norm": 0.8389473557472229,
434
+ "learning_rate": 0.0001860718228432817,
435
+ "epoch": 0.42402826855123676,
436
+ "step": 600
437
+ },
438
+ {
439
+ "loss": 1.4858,
440
+ "grad_norm": 0.7509133219718933,
441
+ "learning_rate": 0.00018547097008586155,
442
+ "epoch": 0.43109540636042404,
443
+ "step": 610
444
+ },
445
+ {
446
+ "loss": 1.4595,
447
+ "grad_norm": 0.863993227481842,
448
+ "learning_rate": 0.00018485844238175095,
449
+ "epoch": 0.4381625441696113,
450
+ "step": 620
451
+ },
452
+ {
453
+ "loss": 1.4701,
454
+ "grad_norm": 0.8800034523010254,
455
+ "learning_rate": 0.000184234323399455,
456
+ "epoch": 0.4452296819787986,
457
+ "step": 630
458
+ },
459
+ {
460
+ "loss": 1.4599,
461
+ "grad_norm": 1.2106516361236572,
462
+ "learning_rate": 0.0001835986983907947,
463
+ "epoch": 0.45229681978798586,
464
+ "step": 640
465
+ },
466
+ {
467
+ "loss": 1.4743,
468
+ "grad_norm": 0.8351795673370361,
469
+ "learning_rate": 0.00018295165417926207,
470
+ "epoch": 0.45936395759717313,
471
+ "step": 650
472
+ },
473
+ {
474
+ "loss": 1.4668,
475
+ "grad_norm": 0.7767188549041748,
476
+ "learning_rate": 0.00018229327914816052,
477
+ "epoch": 0.4664310954063604,
478
+ "step": 660
479
+ },
480
+ {
481
+ "loss": 1.4477,
482
+ "grad_norm": 0.7641212940216064,
483
+ "learning_rate": 0.00018162366322853191,
484
+ "epoch": 0.4734982332155477,
485
+ "step": 670
486
+ },
487
+ {
488
+ "loss": 1.4451,
489
+ "grad_norm": 0.8441094160079956,
490
+ "learning_rate": 0.00018094289788687245,
491
+ "epoch": 0.48056537102473496,
492
+ "step": 680
493
+ },
494
+ {
495
+ "loss": 1.4572,
496
+ "grad_norm": 0.7727362513542175,
497
+ "learning_rate": 0.0001802510761126389,
498
+ "epoch": 0.4876325088339223,
499
+ "step": 690
500
+ },
501
+ {
502
+ "loss": 1.4557,
503
+ "grad_norm": 0.7655246257781982,
504
+ "learning_rate": 0.00017954829240554644,
505
+ "epoch": 0.49469964664310956,
506
+ "step": 700
507
+ },
508
+ {
509
+ "loss": 1.4721,
510
+ "grad_norm": 0.8504830002784729,
511
+ "learning_rate": 0.00017883464276266064,
512
+ "epoch": 0.5017667844522968,
513
+ "step": 710
514
+ },
515
+ {
516
+ "loss": 1.4533,
517
+ "grad_norm": 0.7493986487388611,
518
+ "learning_rate": 0.00017811022466528452,
519
+ "epoch": 0.508833922261484,
+ "step": 720
+ },
+ {
+ "loss": 1.4379,
+ "grad_norm": 0.7659527659416199,
+ "learning_rate": 0.0001773751370656431,
+ "epoch": 0.5159010600706714,
+ "step": 730
+ },
+ {
+ "loss": 1.406,
+ "grad_norm": 0.8079085350036621,
+ "learning_rate": 0.0001766294803733671,
+ "epoch": 0.5229681978798587,
+ "step": 740
+ },
+ {
+ "loss": 1.4479,
+ "grad_norm": 0.8187495470046997,
+ "learning_rate": 0.0001758733564417773,
+ "epoch": 0.5300353356890459,
+ "step": 750
+ },
+ {
+ "loss": 1.4745,
+ "grad_norm": 0.7997586131095886,
+ "learning_rate": 0.00017510686855397176,
+ "epoch": 0.5371024734982333,
+ "step": 760
+ },
+ {
+ "loss": 1.4422,
+ "grad_norm": 0.7756392955780029,
+ "learning_rate": 0.00017433012140871811,
+ "epoch": 0.5441696113074205,
+ "step": 770
+ },
+ {
+ "loss": 1.4563,
+ "grad_norm": 0.7612184286117554,
+ "learning_rate": 0.00017354322110615188,
+ "epoch": 0.5512367491166078,
+ "step": 780
+ },
+ {
+ "loss": 1.4478,
+ "grad_norm": 0.7223451733589172,
+ "learning_rate": 0.00017274627513328385,
+ "epoch": 0.558303886925795,
+ "step": 790
+ },
+ {
+ "loss": 1.447,
+ "grad_norm": 0.8275614380836487,
+ "learning_rate": 0.00017193939234931777,
+ "epoch": 0.5653710247349824,
+ "step": 800
+ },
+ {
+ "loss": 1.4681,
+ "grad_norm": 0.8228889107704163,
+ "learning_rate": 0.00017112268297078077,
+ "epoch": 0.5724381625441696,
+ "step": 810
+ },
+ {
+ "loss": 1.4154,
+ "grad_norm": 0.7482108473777771,
+ "learning_rate": 0.0001702962585564681,
+ "epoch": 0.5795053003533569,
+ "step": 820
+ },
+ {
+ "loss": 1.4272,
+ "grad_norm": 0.813960075378418,
+ "learning_rate": 0.00016946023199220487,
+ "epoch": 0.5865724381625441,
+ "step": 830
+ },
+ {
+ "loss": 1.4076,
+ "grad_norm": 0.8059828877449036,
+ "learning_rate": 0.0001686147174754263,
+ "epoch": 0.5936395759717314,
+ "step": 840
+ },
+ {
+ "eval_loss": 1.414860725402832,
+ "eval_runtime": 32.6307,
+ "eval_samples_per_second": 72.999,
+ "eval_steps_per_second": 18.265,
+ "epoch": 0.5978798586572438,
+ "step": 846
+ },
+ {
+ "loss": 1.435,
+ "grad_norm": 0.7831512689590454,
+ "learning_rate": 0.00016775983049957887,
+ "epoch": 0.6007067137809188,
+ "step": 850
+ },
+ {
+ "loss": 1.453,
+ "grad_norm": 0.7612254023551941,
+ "learning_rate": 0.0001668956878383445,
+ "epoch": 0.607773851590106,
+ "step": 860
+ },
+ {
+ "loss": 1.4496,
+ "grad_norm": 0.7569178938865662,
+ "learning_rate": 0.0001660224075296896,
+ "epoch": 0.6148409893992933,
+ "step": 870
+ },
+ {
+ "loss": 1.4032,
+ "grad_norm": 0.8306989669799805,
+ "learning_rate": 0.00016514010885974184,
+ "epoch": 0.6219081272084805,
+ "step": 880
+ },
+ {
+ "loss": 1.4582,
+ "grad_norm": 0.7806069850921631,
+ "learning_rate": 0.00016424891234649618,
+ "epoch": 0.6289752650176679,
+ "step": 890
+ },
+ {
+ "loss": 1.4078,
+ "grad_norm": 0.8022964596748352,
+ "learning_rate": 0.00016334893972335247,
+ "epoch": 0.6360424028268551,
+ "step": 900
+ },
+ {
+ "loss": 1.4182,
+ "grad_norm": 0.7912920713424683,
+ "learning_rate": 0.00016244031392248748,
+ "epoch": 0.6431095406360424,
+ "step": 910
+ },
+ {
+ "loss": 1.4092,
+ "grad_norm": 0.7984107136726379,
+ "learning_rate": 0.00016152315905806268,
+ "epoch": 0.6501766784452296,
+ "step": 920
+ },
+ {
+ "loss": 1.4038,
+ "grad_norm": 0.7535344362258911,
+ "learning_rate": 0.00016059760040927103,
+ "epoch": 0.657243816254417,
+ "step": 930
+ },
+ {
+ "loss": 1.3912,
+ "grad_norm": 0.7016357183456421,
+ "learning_rate": 0.0001596637644032242,
+ "epoch": 0.6643109540636042,
+ "step": 940
+ },
+ {
+ "loss": 1.4376,
+ "grad_norm": 0.7795329689979553,
+ "learning_rate": 0.00015872177859768333,
+ "epoch": 0.6713780918727915,
+ "step": 950
+ },
+ {
+ "loss": 1.4172,
+ "grad_norm": 0.8167974352836609,
+ "learning_rate": 0.00015777177166363527,
+ "epoch": 0.6784452296819788,
+ "step": 960
+ },
+ {
+ "loss": 1.4088,
+ "grad_norm": 0.71951824426651,
+ "learning_rate": 0.00015681387336771656,
+ "epoch": 0.6855123674911661,
+ "step": 970
+ },
+ {
+ "loss": 1.4202,
+ "grad_norm": 0.774888813495636,
+ "learning_rate": 0.0001558482145544879,
+ "epoch": 0.6925795053003534,
+ "step": 980
+ },
+ {
+ "loss": 1.4101,
+ "grad_norm": 0.8282411694526672,
+ "learning_rate": 0.0001548749271285616,
+ "epoch": 0.6996466431095406,
+ "step": 990
+ },
+ {
+ "loss": 1.3747,
+ "grad_norm": 0.7824772000312805,
+ "learning_rate": 0.0001538941440365837,
+ "epoch": 0.7067137809187279,
+ "step": 1000
+ },
+ {
+ "loss": 1.3852,
+ "grad_norm": 0.839332103729248,
+ "learning_rate": 0.00015290599924907433,
+ "epoch": 0.7137809187279152,
+ "step": 1010
+ },
+ {
+ "loss": 1.4326,
+ "grad_norm": 0.8095847964286804,
+ "learning_rate": 0.00015191062774212773,
+ "epoch": 0.7208480565371025,
+ "step": 1020
+ },
+ {
+ "loss": 1.4082,
+ "grad_norm": 0.827717661857605,
+ "learning_rate": 0.0001509081654789753,
+ "epoch": 0.7279151943462897,
+ "step": 1030
+ },
+ {
+ "loss": 1.3871,
+ "grad_norm": 0.7732033133506775,
+ "learning_rate": 0.00014989874939141351,
+ "epoch": 0.734982332155477,
+ "step": 1040
+ },
+ {
+ "loss": 1.3649,
+ "grad_norm": 0.7641321420669556,
+ "learning_rate": 0.0001488825173610997,
+ "epoch": 0.7420494699646644,
+ "step": 1050
+ },
+ {
+ "loss": 1.4024,
+ "grad_norm": 0.7748322486877441,
+ "learning_rate": 0.0001478596082007181,
+ "epoch": 0.7491166077738516,
+ "step": 1060
+ },
+ {
+ "loss": 1.3805,
+ "grad_norm": 0.7583175301551819,
+ "learning_rate": 0.00014683016163501855,
+ "epoch": 0.7561837455830389,
+ "step": 1070
+ },
+ {
+ "loss": 1.3786,
+ "grad_norm": 0.7941022515296936,
+ "learning_rate": 0.0001457943182817308,
+ "epoch": 0.7632508833922261,
+ "step": 1080
+ },
+ {
+ "loss": 1.389,
+ "grad_norm": 0.8075069785118103,
+ "learning_rate": 0.00014475221963235687,
+ "epoch": 0.7703180212014135,
+ "step": 1090
+ },
+ {
+ "loss": 1.4011,
+ "grad_norm": 0.8261104822158813,
+ "learning_rate": 0.00014370400803284374,
+ "epoch": 0.7773851590106007,
+ "step": 1100
+ },
+ {
+ "loss": 1.419,
+ "grad_norm": 0.8428652882575989,
+ "learning_rate": 0.00014264982666413958,
+ "epoch": 0.784452296819788,
+ "step": 1110
+ },
+ {
+ "loss": 1.3845,
+ "grad_norm": 0.7421643137931824,
+ "learning_rate": 0.00014158981952263608,
+ "epoch": 0.7915194346289752,
+ "step": 1120
+ },
+ {
+ "eval_loss": 1.373445749282837,
+ "eval_runtime": 32.6518,
+ "eval_samples_per_second": 72.952,
+ "eval_steps_per_second": 18.253,
+ "epoch": 0.7971731448763251,
+ "step": 1128
+ },
+ {
+ "loss": 1.3417,
+ "grad_norm": 0.861863911151886,
+ "learning_rate": 0.000140524131400499,
+ "epoch": 0.7985865724381626,
+ "step": 1130
+ },
+ {
+ "loss": 1.3582,
+ "grad_norm": 0.7944890260696411,
+ "learning_rate": 0.00013945290786589027,
+ "epoch": 0.8056537102473498,
+ "step": 1140
+ },
+ {
+ "loss": 1.3801,
+ "grad_norm": 0.7751563787460327,
+ "learning_rate": 0.00013837629524308408,
+ "epoch": 0.8127208480565371,
+ "step": 1150
+ },
+ {
+ "loss": 1.3959,
+ "grad_norm": 0.8568748235702515,
+ "learning_rate": 0.00013729444059247954,
+ "epoch": 0.8197879858657244,
+ "step": 1160
+ },
+ {
+ "loss": 1.3286,
+ "grad_norm": 0.7741242051124573,
+ "learning_rate": 0.00013620749169051307,
+ "epoch": 0.8268551236749117,
+ "step": 1170
+ },
+ {
+ "loss": 1.3591,
+ "grad_norm": 0.7787532806396484,
+ "learning_rate": 0.00013511559700947264,
+ "epoch": 0.833922261484099,
+ "step": 1180
+ },
+ {
+ "loss": 1.3575,
+ "grad_norm": 0.9094675779342651,
+ "learning_rate": 0.00013401890569721725,
+ "epoch": 0.8409893992932862,
+ "step": 1190
+ },
+ {
+ "loss": 1.3837,
+ "grad_norm": 0.785794198513031,
+ "learning_rate": 0.00013291756755680388,
+ "epoch": 0.8480565371024735,
+ "step": 1200
+ },
+ {
+ "loss": 1.3939,
+ "grad_norm": 0.7012341022491455,
+ "learning_rate": 0.00013181173302602528,
+ "epoch": 0.8551236749116607,
+ "step": 1210
+ },
+ {
+ "loss": 1.3591,
+ "grad_norm": 0.783419668674469,
+ "learning_rate": 0.0001307015531568606,
+ "epoch": 0.8621908127208481,
+ "step": 1220
+ },
+ {
+ "loss": 1.3824,
+ "grad_norm": 0.8260233998298645,
+ "learning_rate": 0.00012958717959484254,
+ "epoch": 0.8692579505300353,
+ "step": 1230
+ },
+ {
+ "loss": 1.3627,
+ "grad_norm": 0.8838050961494446,
+ "learning_rate": 0.0001284687645583432,
+ "epoch": 0.8763250883392226,
+ "step": 1240
+ },
+ {
+ "loss": 1.3743,
+ "grad_norm": 0.8196436166763306,
+ "learning_rate": 0.0001273464608177818,
+ "epoch": 0.8833922261484098,
+ "step": 1250
+ },
+ {
+ "loss": 1.3663,
+ "grad_norm": 0.77605801820755,
+ "learning_rate": 0.00012622042167475693,
+ "epoch": 0.8904593639575972,
+ "step": 1260
+ },
+ {
+ "loss": 1.3548,
+ "grad_norm": 0.8123463988304138,
+ "learning_rate": 0.00012509080094110604,
+ "epoch": 0.8975265017667845,
+ "step": 1270
+ },
+ {
+ "loss": 1.362,
+ "grad_norm": 0.8651890158653259,
+ "learning_rate": 0.00012395775291789568,
+ "epoch": 0.9045936395759717,
+ "step": 1280
+ },
+ {
+ "loss": 1.3513,
+ "grad_norm": 0.7966169118881226,
+ "learning_rate": 0.00012282143237434478,
+ "epoch": 0.911660777385159,
+ "step": 1290
+ },
+ {
+ "loss": 1.3654,
+ "grad_norm": 0.7800924181938171,
+ "learning_rate": 0.00012168199452668341,
+ "epoch": 0.9187279151943463,
+ "step": 1300
+ },
+ {
+ "loss": 1.3414,
+ "grad_norm": 0.7868988513946533,
+ "learning_rate": 0.00012053959501695145,
+ "epoch": 0.9257950530035336,
+ "step": 1310
+ },
+ {
+ "loss": 1.3444,
+ "grad_norm": 0.7945041656494141,
+ "learning_rate": 0.00011939438989173828,
+ "epoch": 0.9328621908127208,
+ "step": 1320
+ },
+ {
+ "loss": 1.368,
+ "grad_norm": 0.8256521224975586,
+ "learning_rate": 0.00011824653558086769,
+ "epoch": 0.9399293286219081,
+ "step": 1330
+ },
+ {
+ "loss": 1.3333,
+ "grad_norm": 0.7671541571617126,
+ "learning_rate": 0.00011709618887603014,
+ "epoch": 0.9469964664310954,
+ "step": 1340
+ },
+ {
+ "loss": 1.3531,
+ "grad_norm": 0.706906259059906,
+ "learning_rate": 0.00011594350690936581,
+ "epoch": 0.9540636042402827,
+ "step": 1350
+ },
+ {
+ "loss": 1.385,
+ "grad_norm": 0.8090202808380127,
+ "learning_rate": 0.00011478864713200113,
+ "epoch": 0.9611307420494699,
+ "step": 1360
+ },
+ {
+ "loss": 1.3731,
+ "grad_norm": 0.7614802718162537,
+ "learning_rate": 0.00011363176729254146,
+ "epoch": 0.9681978798586572,
+ "step": 1370
+ },
+ {
+ "loss": 1.3742,
+ "grad_norm": 0.7586848735809326,
+ "learning_rate": 0.00011247302541552359,
+ "epoch": 0.9752650176678446,
+ "step": 1380
+ },
+ {
+ "loss": 1.3183,
+ "grad_norm": 0.8582295775413513,
+ "learning_rate": 0.00011131257977983014,
+ "epoch": 0.9823321554770318,
+ "step": 1390
+ },
+ {
+ "loss": 1.3627,
+ "grad_norm": 0.7340189814567566,
+ "learning_rate": 0.00011015058889706942,
+ "epoch": 0.9893992932862191,
+ "step": 1400
+ },
+ {
+ "loss": 1.3606,
+ "grad_norm": 0.8590829372406006,
+ "learning_rate": 0.00010898721148992351,
+ "epoch": 0.9964664310954063,
+ "step": 1410
+ },
+ {
+ "eval_loss": 1.341532826423645,
+ "eval_runtime": 32.4522,
+ "eval_samples_per_second": 73.4,
+ "eval_steps_per_second": 18.365,
+ "epoch": 0.9964664310954063,
+ "step": 1410
+ },
+ {
+ "loss": 1.3476,
+ "grad_norm": 0.8152231574058533,
+ "learning_rate": 0.00010782260647046742,
+ "epoch": 1.0035335689045937,
+ "step": 1420
+ },
+ {
+ "loss": 1.3144,
+ "grad_norm": 0.8353652954101562,
+ "learning_rate": 0.00010665693291846244,
+ "epoch": 1.010600706713781,
+ "step": 1430
+ },
+ {
+ "loss": 1.3064,
+ "grad_norm": 0.9546340703964233,
+ "learning_rate": 0.00010549035005962653,
+ "epoch": 1.017667844522968,
+ "step": 1440
+ },
+ {
+ "loss": 1.2902,
+ "grad_norm": 0.8087377548217773,
+ "learning_rate": 0.00010432301724388485,
+ "epoch": 1.0247349823321554,
+ "step": 1450
+ },
+ {
+ "loss": 1.2955,
+ "grad_norm": 0.8206213116645813,
+ "learning_rate": 0.0001031550939236033,
+ "epoch": 1.0318021201413428,
+ "step": 1460
+ },
+ {
+ "loss": 1.3192,
+ "grad_norm": 0.8617863655090332,
+ "learning_rate": 0.00010198673963180796,
+ "epoch": 1.03886925795053,
+ "step": 1470
+ },
+ {
+ "loss": 1.3309,
+ "grad_norm": 0.815900444984436,
+ "learning_rate": 0.00010081811396039373,
+ "epoch": 1.0459363957597174,
+ "step": 1480
+ },
+ {
+ "loss": 1.2872,
+ "grad_norm": 0.8019934892654419,
+ "learning_rate": 9.964937653832468e-05,
+ "epoch": 1.0530035335689045,
+ "step": 1490
+ },
+ {
+ "loss": 1.2956,
+ "grad_norm": 0.7981704473495483,
+ "learning_rate": 9.848068700982955e-05,
+ "epoch": 1.0600706713780919,
+ "step": 1500
+ },
+ {
+ "loss": 1.3132,
+ "grad_norm": 0.8272152543067932,
+ "learning_rate": 9.731220501259501e-05,
+ "epoch": 1.0671378091872792,
+ "step": 1510
+ },
+ {
+ "loss": 1.291,
+ "grad_norm": 0.7766007781028748,
+ "learning_rate": 9.614409015595995e-05,
+ "epoch": 1.0742049469964665,
+ "step": 1520
+ },
+ {
+ "loss": 1.3021,
+ "grad_norm": 0.830301821231842,
+ "learning_rate": 9.497650199911341e-05,
+ "epoch": 1.0812720848056536,
+ "step": 1530
+ },
+ {
+ "loss": 1.3022,
+ "grad_norm": 0.7850944995880127,
+ "learning_rate": 9.380960002929979e-05,
+ "epoch": 1.088339222614841,
+ "step": 1540
+ },
+ {
+ "loss": 1.3042,
+ "grad_norm": 0.8939189314842224,
+ "learning_rate": 9.264354364003327e-05,
+ "epoch": 1.0954063604240283,
+ "step": 1550
+ },
+ {
+ "loss": 1.2754,
+ "grad_norm": 0.8872391581535339,
+ "learning_rate": 9.147849210932571e-05,
+ "epoch": 1.1024734982332156,
+ "step": 1560
+ },
+ {
+ "loss": 1.3033,
+ "grad_norm": 0.8385890126228333,
+ "learning_rate": 9.031460457792982e-05,
+ "epoch": 1.1095406360424027,
+ "step": 1570
+ },
+ {
+ "loss": 1.3267,
+ "grad_norm": 0.8151512742042542,
+ "learning_rate": 8.915204002760122e-05,
+ "epoch": 1.11660777385159,
+ "step": 1580
+ },
+ {
+ "loss": 1.3131,
+ "grad_norm": 0.80403733253479,
+ "learning_rate": 8.799095725938243e-05,
+ "epoch": 1.1236749116607774,
+ "step": 1590
+ },
+ {
+ "loss": 1.3262,
+ "grad_norm": 0.8914912939071655,
+ "learning_rate": 8.68315148719111e-05,
+ "epoch": 1.1307420494699647,
+ "step": 1600
+ },
+ {
+ "loss": 1.3178,
+ "grad_norm": 0.924933671951294,
+ "learning_rate": 8.567387123975648e-05,
+ "epoch": 1.137809187279152,
+ "step": 1610
+ },
+ {
+ "loss": 1.31,
+ "grad_norm": 0.7894246578216553,
+ "learning_rate": 8.451818449178591e-05,
+ "epoch": 1.1448763250883391,
+ "step": 1620
+ },
+ {
+ "loss": 1.2846,
+ "grad_norm": 0.7919443845748901,
+ "learning_rate": 8.336461248956522e-05,
+ "epoch": 1.1519434628975265,
+ "step": 1630
+ },
+ {
+ "loss": 1.2923,
+ "grad_norm": 0.7391479015350342,
+ "learning_rate": 8.221331280579564e-05,
+ "epoch": 1.1590106007067138,
+ "step": 1640
+ },
+ {
+ "loss": 1.2891,
+ "grad_norm": 0.8959116339683533,
+ "learning_rate": 8.106444270278999e-05,
+ "epoch": 1.1660777385159011,
+ "step": 1650
+ },
+ {
+ "loss": 1.3105,
+ "grad_norm": 0.788266122341156,
+ "learning_rate": 7.991815911099126e-05,
+ "epoch": 1.1731448763250882,
+ "step": 1660
+ },
+ {
+ "loss": 1.3184,
+ "grad_norm": 0.9487005472183228,
+ "learning_rate": 7.877461860753697e-05,
+ "epoch": 1.1802120141342756,
+ "step": 1670
+ },
+ {
+ "loss": 1.2882,
+ "grad_norm": 0.843334972858429,
+ "learning_rate": 7.763397739487098e-05,
+ "epoch": 1.187279151943463,
+ "step": 1680
+ },
+ {
+ "loss": 1.277,
+ "grad_norm": 0.7756645679473877,
+ "learning_rate": 7.649639127940735e-05,
+ "epoch": 1.1943462897526502,
+ "step": 1690
+ },
+ {
+ "eval_loss": 1.319354772567749,
+ "eval_runtime": 37.3799,
+ "eval_samples_per_second": 63.724,
+ "eval_steps_per_second": 15.944,
+ "epoch": 1.1957597173144876,
+ "step": 1692
+ },
+ {
+ "loss": 1.3241,
+ "grad_norm": 0.7394362092018127,
+ "learning_rate": 7.536201565024767e-05,
+ "epoch": 1.2014134275618376,
+ "step": 1700
+ },
+ {
+ "loss": 1.3292,
+ "grad_norm": 0.9032957553863525,
+ "learning_rate": 7.423100545795565e-05,
+ "epoch": 1.2084805653710247,
+ "step": 1710
+ },
+ {
+ "loss": 1.3072,
+ "grad_norm": 0.8041402697563171,
+ "learning_rate": 7.310351519339165e-05,
+ "epoch": 1.215547703180212,
+ "step": 1720
+ },
+ {
+ "loss": 1.3149,
+ "grad_norm": 0.8472148180007935,
+ "learning_rate": 7.197969886660984e-05,
+ "epoch": 1.2226148409893993,
+ "step": 1730
+ },
+ {
+ "loss": 1.2901,
+ "grad_norm": 0.8664119243621826,
+ "learning_rate": 7.085970998582112e-05,
+ "epoch": 1.2296819787985867,
+ "step": 1740
+ },
+ {
+ "loss": 1.2961,
+ "grad_norm": 0.8825384974479675,
+ "learning_rate": 6.974370153642468e-05,
+ "epoch": 1.2367491166077738,
+ "step": 1750
+ },
+ {
+ "loss": 1.2856,
+ "grad_norm": 0.8917942047119141,
+ "learning_rate": 6.863182596011087e-05,
+ "epoch": 1.243816254416961,
+ "step": 1760
+ },
+ {
+ "loss": 1.2594,
+ "grad_norm": 0.8514686226844788,
+ "learning_rate": 6.752423513403824e-05,
+ "epoch": 1.2508833922261484,
+ "step": 1770
+ },
+ {
+ "loss": 1.3013,
+ "grad_norm": 0.7883872985839844,
+ "learning_rate": 6.642108035008803e-05,
+ "epoch": 1.2579505300353357,
+ "step": 1780
+ },
+ {
+ "loss": 1.3038,
+ "grad_norm": 0.9033199548721313,
+ "learning_rate": 6.53225122941981e-05,
+ "epoch": 1.265017667844523,
+ "step": 1790
+ },
+ {
+ "loss": 1.3343,
+ "grad_norm": 0.8501263856887817,
+ "learning_rate": 6.422868102578018e-05,
+ "epoch": 1.2720848056537102,
+ "step": 1800
+ },
+ {
+ "loss": 1.3122,
+ "grad_norm": 0.8853347301483154,
+ "learning_rate": 6.31397359572223e-05,
+ "epoch": 1.2791519434628975,
+ "step": 1810
+ },
+ {
+ "loss": 1.2868,
+ "grad_norm": 0.7910575866699219,
+ "learning_rate": 6.205582583347974e-05,
+ "epoch": 1.2862190812720848,
+ "step": 1820
+ },
+ {
+ "loss": 1.2864,
+ "grad_norm": 0.8016606569290161,
+ "learning_rate": 6.097709871175723e-05,
+ "epoch": 1.293286219081272,
+ "step": 1830
+ },
+ {
+ "loss": 1.2966,
+ "grad_norm": 0.8374130725860596,
+ "learning_rate": 5.990370194128479e-05,
+ "epoch": 1.3003533568904593,
+ "step": 1840
+ },
+ {
+ "loss": 1.3106,
+ "grad_norm": 0.852772057056427,
+ "learning_rate": 5.88357821431908e-05,
+ "epoch": 1.3074204946996466,
+ "step": 1850
+ },
+ {
+ "loss": 1.2808,
+ "grad_norm": 0.8999858498573303,
+ "learning_rate": 5.7773485190474044e-05,
+ "epoch": 1.314487632508834,
+ "step": 1860
+ },
+ {
+ "loss": 1.2539,
+ "grad_norm": 0.8973957896232605,
+ "learning_rate": 5.671695618807802e-05,
+ "epoch": 1.3215547703180213,
+ "step": 1870
+ },
+ {
+ "loss": 1.2802,
+ "grad_norm": 0.7528464794158936,
+ "learning_rate": 5.566633945307052e-05,
+ "epoch": 1.3286219081272086,
+ "step": 1880
+ },
+ {
+ "loss": 1.2715,
+ "grad_norm": 0.8808310627937317,
+ "learning_rate": 5.4621778494930397e-05,
+ "epoch": 1.3356890459363957,
+ "step": 1890
+ },
+ {
+ "loss": 1.3017,
+ "grad_norm": 0.8389664888381958,
+ "learning_rate": 5.358341599594483e-05,
+ "epoch": 1.342756183745583,
+ "step": 1900
+ },
+ {
+ "loss": 1.2849,
+ "grad_norm": 0.9317607283592224,
+ "learning_rate": 5.255139379171967e-05,
+ "epoch": 1.3498233215547704,
+ "step": 1910
+ },
+ {
+ "loss": 1.2916,
+ "grad_norm": 0.803419828414917,
+ "learning_rate": 5.152585285180517e-05,
+ "epoch": 1.3568904593639575,
+ "step": 1920
+ },
+ {
+ "loss": 1.2805,
+ "grad_norm": 0.8279209136962891,
+ "learning_rate": 5.050693326044036e-05,
+ "epoch": 1.3639575971731448,
+ "step": 1930
+ },
+ {
+ "loss": 1.2576,
+ "grad_norm": 0.893677830696106,
+ "learning_rate": 4.949477419741814e-05,
+ "epoch": 1.3710247349823321,
+ "step": 1940
+ },
+ {
+ "loss": 1.2781,
+ "grad_norm": 0.8157869577407837,
+ "learning_rate": 4.848951391907377e-05,
+ "epoch": 1.3780918727915195,
+ "step": 1950
+ },
+ {
+ "loss": 1.2735,
+ "grad_norm": 0.8294868469238281,
+ "learning_rate": 4.749128973940001e-05,
+ "epoch": 1.3851590106007068,
+ "step": 1960
+ },
+ {
+ "loss": 1.3018,
+ "grad_norm": 0.8437710404396057,
+ "learning_rate": 4.6500238011290295e-05,
+ "epoch": 1.3922261484098941,
+ "step": 1970
+ },
+ {
+ "eval_loss": 1.3012111186981201,
+ "eval_runtime": 36.7927,
+ "eval_samples_per_second": 64.741,
+ "eval_steps_per_second": 16.199,
+ "epoch": 1.3950530035335689,
+ "step": 1974
+ },
+ {
+ "loss": 1.2794,
+ "grad_norm": 0.7894673347473145,
+ "learning_rate": 4.551649410791384e-05,
+ "epoch": 1.3992932862190812,
+ "step": 1980
+ },
+ {
+ "loss": 1.2718,
+ "grad_norm": 0.7576306462287903,
+ "learning_rate": 4.454019240422412e-05,
+ "epoch": 1.4063604240282686,
+ "step": 1990
+ },
+ {
+ "loss": 1.2912,
+ "grad_norm": 0.8169530630111694,
+ "learning_rate": 4.357146625860391e-05,
+ "epoch": 1.4134275618374559,
+ "step": 2000
+ },
+ {
+ "loss": 1.3073,
+ "grad_norm": 0.8148018717765808,
+ "learning_rate": 4.261044799464915e-05,
+ "epoch": 1.420494699646643,
+ "step": 2010
+ },
+ {
+ "loss": 1.2797,
+ "grad_norm": 0.8685176968574524,
+ "learning_rate": 4.165726888309402e-05,
+ "epoch": 1.4275618374558303,
+ "step": 2020
+ },
+ {
+ "loss": 1.2755,
+ "grad_norm": 0.8314803838729858,
+ "learning_rate": 4.0712059123880155e-05,
+ "epoch": 1.4346289752650176,
+ "step": 2030
+ },
+ {
+ "loss": 1.2559,
+ "grad_norm": 0.9396986961364746,
+ "learning_rate": 3.977494782837182e-05,
+ "epoch": 1.441696113074205,
+ "step": 2040
+ },
+ {
+ "loss": 1.2518,
+ "grad_norm": 0.9005519151687622,
+ "learning_rate": 3.884606300171979e-05,
+ "epoch": 1.4487632508833923,
+ "step": 2050
+ },
+ {
+ "loss": 1.3159,
+ "grad_norm": 0.8290562033653259,
+ "learning_rate": 3.7925531525376623e-05,
+ "epoch": 1.4558303886925796,
+ "step": 2060
+ },
+ {
+ "loss": 1.2345,
+ "grad_norm": 0.8309129476547241,
+ "learning_rate": 3.7013479139765115e-05,
+ "epoch": 1.4628975265017667,
+ "step": 2070
+ },
+ {
+ "loss": 1.2771,
+ "grad_norm": 0.8853537440299988,
+ "learning_rate": 3.611003042710266e-05,
+ "epoch": 1.469964664310954,
+ "step": 2080
+ },
+ {
+ "loss": 1.2894,
+ "grad_norm": 0.8893609642982483,
+ "learning_rate": 3.521530879438407e-05,
+ "epoch": 1.4770318021201414,
+ "step": 2090
+ },
+ {
+ "loss": 1.2939,
+ "grad_norm": 0.8105437755584717,
+ "learning_rate": 3.432943645652453e-05,
+ "epoch": 1.4840989399293285,
+ "step": 2100
+ },
+ {
+ "loss": 1.26,
+ "grad_norm": 0.8063532710075378,
+ "learning_rate": 3.345253441966579e-05,
+ "epoch": 1.4911660777385158,
+ "step": 2110
+ },
+ {
+ "loss": 1.2793,
+ "grad_norm": 0.8935479521751404,
+ "learning_rate": 3.258472246464717e-05,
+ "epoch": 1.4982332155477032,
+ "step": 2120
+ },
+ {
+ "loss": 1.3013,
+ "grad_norm": 0.7904537916183472,
+ "learning_rate": 3.172611913064402e-05,
+ "epoch": 1.5053003533568905,
+ "step": 2130
+ },
+ {
+ "loss": 1.2707,
+ "grad_norm": 0.8947263956069946,
+ "learning_rate": 3.087684169897588e-05,
+ "epoch": 1.5123674911660778,
+ "step": 2140
+ },
+ {
+ "loss": 1.285,
+ "grad_norm": 0.8150575160980225,
+ "learning_rate": 3.0037006177086346e-05,
+ "epoch": 1.5194346289752652,
+ "step": 2150
+ },
+ {
+ "loss": 1.2681,
+ "grad_norm": 0.833285391330719,
+ "learning_rate": 2.920672728269692e-05,
+ "epoch": 1.5265017667844523,
+ "step": 2160
+ },
+ {
+ "loss": 1.2484,
+ "grad_norm": 0.8433587551116943,
+ "learning_rate": 2.8386118428137254e-05,
+ "epoch": 1.5335689045936396,
+ "step": 2170
+ },
+ {
+ "loss": 1.2804,
+ "grad_norm": 0.8045548796653748,
+ "learning_rate": 2.7575291704853323e-05,
+ "epoch": 1.5406360424028267,
+ "step": 2180
+ },
+ {
+ "loss": 1.2542,
+ "grad_norm": 0.8309583067893982,
+ "learning_rate": 2.6774357868096432e-05,
+ "epoch": 1.547703180212014,
+ "step": 2190
+ },
+ {
+ "loss": 1.2715,
+ "grad_norm": 1.0081511735916138,
+ "learning_rate": 2.5983426321794502e-05,
+ "epoch": 1.5547703180212014,
+ "step": 2200
+ },
+ {
+ "loss": 1.291,
+ "grad_norm": 0.8824013471603394,
+ "learning_rate": 2.5202605103607835e-05,
+ "epoch": 1.5618374558303887,
+ "step": 2210
+ },
+ {
+ "loss": 1.2509,
+ "grad_norm": 0.8758257031440735,
+ "learning_rate": 2.443200087017192e-05,
+ "epoch": 1.568904593639576,
+ "step": 2220
+ },
+ {
+ "loss": 1.2616,
+ "grad_norm": 0.9100747108459473,
+ "learning_rate": 2.3671718882528437e-05,
+ "epoch": 1.5759717314487633,
+ "step": 2230
+ },
+ {
+ "loss": 1.2652,
+ "grad_norm": 0.7860396504402161,
+ "learning_rate": 2.292186299174712e-05,
+ "epoch": 1.5830388692579507,
+ "step": 2240
+ },
+ {
+ "loss": 1.2654,
+ "grad_norm": 0.7228366732597351,
+ "learning_rate": 2.218253562474023e-05,
+ "epoch": 1.5901060070671378,
+ "step": 2250
+ },
+ {
+ "eval_loss": 1.2894463539123535,
+ "eval_runtime": 38.7873,
+ "eval_samples_per_second": 61.412,
+ "eval_steps_per_second": 15.366,
+ "epoch": 1.5943462897526501,
+ "step": 2256
+ },
+ {
+ "loss": 1.249,
+ "grad_norm": 0.8728644251823425,
+ "learning_rate": 2.1453837770271334e-05,
+ "epoch": 1.5971731448763251,
+ "step": 2260
+ },
+ {
+ "loss": 1.2812,
+ "grad_norm": 0.8223507404327393,
+ "learning_rate": 2.0735868965160953e-05,
+ "epoch": 1.6042402826855122,
+ "step": 2270
+ },
+ {
+ "loss": 1.255,
+ "grad_norm": 0.8579265475273132,
+ "learning_rate": 2.0028727280690107e-05,
+ "epoch": 1.6113074204946995,
+ "step": 2280
+ },
+ {
+ "loss": 1.2282,
+ "grad_norm": 0.8867602348327637,
+ "learning_rate": 1.9332509309204183e-05,
+ "epoch": 1.6183745583038869,
+ "step": 2290
+ },
+ {
+ "loss": 1.2313,
+ "grad_norm": 0.8542134761810303,
+ "learning_rate": 1.8647310150919083e-05,
+ "epoch": 1.6254416961130742,
+ "step": 2300
+ },
+ {
+ "loss": 1.2614,
+ "grad_norm": 0.956292986869812,
+ "learning_rate": 1.797322340093067e-05,
+ "epoch": 1.6325088339222615,
+ "step": 2310
+ },
+ {
+ "loss": 1.2972,
+ "grad_norm": 0.8404093980789185,
+ "learning_rate": 1.7310341136430385e-05,
+ "epoch": 1.6395759717314489,
+ "step": 2320
+ },
+ {
+ "loss": 1.2437,
+ "grad_norm": 0.8927039504051208,
+ "learning_rate": 1.6658753904127734e-05,
+ "epoch": 1.6466431095406362,
+ "step": 2330
+ },
+ {
+ "loss": 1.2578,
+ "grad_norm": 0.8942687511444092,
+ "learning_rate": 1.6018550707882062e-05,
+ "epoch": 1.6537102473498233,
+ "step": 2340
+ },
+ {
+ "loss": 1.2816,
+ "grad_norm": 0.8343712687492371,
+ "learning_rate": 1.538981899654508e-05,
+ "epoch": 1.6607773851590106,
+ "step": 2350
+ },
+ {
+ "loss": 1.263,
+ "grad_norm": 0.8375242352485657,
+ "learning_rate": 1.477264465201572e-05,
+ "epoch": 1.6678445229681977,
+ "step": 2360
+ },
+ {
+ "loss": 1.2703,
+ "grad_norm": 0.802559494972229,
+ "learning_rate": 1.4167111977508973e-05,
+ "epoch": 1.674911660777385,
+ "step": 2370
+ },
+ {
+ "loss": 1.2546,
+ "grad_norm": 0.9226746559143066,
+ "learning_rate": 1.3573303686040628e-05,
+ "epoch": 1.6819787985865724,
+ "step": 2380
+ },
+ {
+ "loss": 1.2529,
+ "grad_norm": 0.8805841207504272,
+ "learning_rate": 1.2991300889128866e-05,
+ "epoch": 1.6890459363957597,
+ "step": 2390
+ },
+ {
+ "loss": 1.2908,
+ "grad_norm": 0.830085039138794,
+ "learning_rate": 1.2421183085714927e-05,
+ "epoch": 1.696113074204947,
+ "step": 2400
+ },
+ {
+ "loss": 1.2519,
+ "grad_norm": 0.8588423132896423,
+ "learning_rate": 1.1863028151303879e-05,
+ "epoch": 1.7031802120141344,
+ "step": 2410
+ },
+ {
+ "loss": 1.2771,
+ "grad_norm": 0.7939627766609192,
+ "learning_rate": 1.13169123273271e-05,
+ "epoch": 1.7102473498233217,
+ "step": 2420
+ },
+ {
+ "loss": 1.2684,
+ "grad_norm": 0.7675894498825073,
+ "learning_rate": 1.078291021072817e-05,
+ "epoch": 1.7173144876325088,
+ "step": 2430
+ },
+ {
+ "loss": 1.2842,
+ "grad_norm": 0.9253499507904053,
+ "learning_rate": 1.0261094743773203e-05,
+ "epoch": 1.7243816254416962,
+ "step": 2440
+ },
+ {
+ "loss": 1.2408,
+ "grad_norm": 0.786151111125946,
+ "learning_rate": 9.751537204087258e-06,
+ "epoch": 1.7314487632508833,
+ "step": 2450
+ },
+ {
+ "loss": 1.2586,
+ "grad_norm": 0.8652048110961914,
+ "learning_rate": 9.254307194918144e-06,
+ "epoch": 1.7385159010600706,
+ "step": 2460
+ },
+ {
+ "loss": 1.2762,
+ "grad_norm": 0.8132146000862122,
+ "learning_rate": 8.769472635628905e-06,
+ "epoch": 1.745583038869258,
+ "step": 2470
+ },
+ {
+ "loss": 1.2605,
+ "grad_norm": 0.9114975333213806,
+ "learning_rate": 8.297099752420446e-06,
+ "epoch": 1.7526501766784452,
+ "step": 2480
+ },
+ {
+ "loss": 1.2713,
+ "grad_norm": 0.7870185971260071,
+ "learning_rate": 7.837253069285234e-06,
+ "epoch": 1.7597173144876326,
+ "step": 2490
+ },
+ {
+ "loss": 1.258,
+ "grad_norm": 0.8339151740074158,
+ "learning_rate": 7.389995399193595e-06,
+ "epoch": 1.76678445229682,
+ "step": 2500
+ },
+ {
+ "loss": 1.2465,
+ "grad_norm": 0.8879913091659546,
+ "learning_rate": 6.9553878355138936e-06,
1820
+ "epoch": 1.773851590106007,
1821
+ "step": 2510
1822
+ },
1823
+ {
1824
+ "loss": 1.2698,
1825
+ "grad_norm": 0.797390341758728,
1826
+ "learning_rate": 6.5334897436672535e-06,
1827
+ "epoch": 1.7809187279151943,
1828
+ "step": 2520
1829
+ },
1830
+ {
1831
+ "loss": 1.272,
1832
+ "grad_norm": 0.8963540196418762,
1833
+ "learning_rate": 6.124358753018689e-06,
1834
+ "epoch": 1.7879858657243817,
1835
+ "step": 2530
1836
+ },
1837
+ {
1838
+ "eval_loss": 1.283734917640686,
1839
+ "eval_runtime": 37.4706,
1840
+ "eval_samples_per_second": 63.57,
1841
+ "eval_steps_per_second": 15.906,
1842
+ "epoch": 1.7936395759717314,
1843
+ "step": 2538
1844
+ },
1845
+ {
1846
+ "loss": 1.2606,
1847
+ "grad_norm": 0.93072110414505,
+ "learning_rate": 5.7280507490050985e-06,
+ "epoch": 1.7950530035335688,
+ "step": 2540
+ },
+ {
+ "loss": 1.2404,
+ "grad_norm": 0.8684141635894775,
+ "learning_rate": 5.3446198655015765e-06,
+ "epoch": 1.802120141342756,
+ "step": 2550
+ },
+ {
+ "loss": 1.2564,
+ "grad_norm": 0.8672965168952942,
+ "learning_rate": 4.974118477426992e-06,
+ "epoch": 1.8091872791519434,
+ "step": 2560
+ },
+ {
+ "loss": 1.2504,
+ "grad_norm": 0.7872085571289062,
+ "learning_rate": 4.616597193589833e-06,
+ "epoch": 1.8162544169611308,
+ "step": 2570
+ },
+ {
+ "loss": 1.2653,
+ "grad_norm": 0.863117516040802,
+ "learning_rate": 4.272104849775216e-06,
+ "epoch": 1.823321554770318,
+ "step": 2580
+ },
+ {
+ "loss": 1.2169,
+ "grad_norm": 1.0649867057800293,
+ "learning_rate": 3.940688502074186e-06,
+ "epoch": 1.8303886925795054,
+ "step": 2590
+ },
+ {
+ "loss": 1.2485,
+ "grad_norm": 0.8719679117202759,
+ "learning_rate": 3.622393420456016e-06,
+ "epoch": 1.8374558303886925,
+ "step": 2600
+ },
+ {
+ "loss": 1.2318,
+ "grad_norm": 0.9570611715316772,
+ "learning_rate": 3.3172630825846095e-06,
+ "epoch": 1.8445229681978799,
+ "step": 2610
+ },
+ {
+ "loss": 1.2593,
+ "grad_norm": 0.8749817609786987,
+ "learning_rate": 3.025339167879615e-06,
+ "epoch": 1.851590106007067,
+ "step": 2620
+ },
+ {
+ "loss": 1.259,
+ "grad_norm": 0.7263243198394775,
+ "learning_rate": 2.7466615518231486e-06,
+ "epoch": 1.8586572438162543,
+ "step": 2630
+ },
+ {
+ "loss": 1.2725,
+ "grad_norm": 0.8946100473403931,
+ "learning_rate": 2.4812683005130843e-06,
+ "epoch": 1.8657243816254416,
+ "step": 2640
+ },
+ {
+ "loss": 1.253,
+ "grad_norm": 0.7784376740455627,
+ "learning_rate": 2.229195665463324e-06,
+ "epoch": 1.872791519434629,
+ "step": 2650
+ },
+ {
+ "loss": 1.284,
+ "grad_norm": 0.8362652659416199,
+ "learning_rate": 1.990478078652047e-06,
+ "epoch": 1.8798586572438163,
+ "step": 2660
+ },
+ {
+ "loss": 1.2727,
+ "grad_norm": 0.8206844329833984,
+ "learning_rate": 1.7651481478184296e-06,
+ "epoch": 1.8869257950530036,
+ "step": 2670
+ },
+ {
+ "loss": 1.2561,
+ "grad_norm": 0.7767570614814758,
+ "learning_rate": 1.553236652008605e-06,
+ "epoch": 1.893992932862191,
+ "step": 2680
+ },
+ {
+ "loss": 1.2774,
+ "grad_norm": 0.8429548740386963,
+ "learning_rate": 1.3547725373713405e-06,
+ "epoch": 1.901060070671378,
+ "step": 2690
+ },
+ {
+ "loss": 1.2471,
+ "grad_norm": 0.8795408606529236,
+ "learning_rate": 1.169782913204176e-06,
+ "epoch": 1.9081272084805654,
+ "step": 2700
+ },
+ {
+ "loss": 1.2484,
+ "grad_norm": 0.8149511814117432,
+ "learning_rate": 9.98293048250376e-07,
+ "epoch": 1.9151943462897525,
+ "step": 2710
+ },
+ {
+ "loss": 1.2341,
+ "grad_norm": 0.9525308609008789,
+ "learning_rate": 8.403263672473793e-07,
+ "epoch": 1.9222614840989398,
+ "step": 2720
+ },
+ {
+ "loss": 1.2646,
+ "grad_norm": 0.849061906337738,
+ "learning_rate": 6.959044477270138e-07,
+ "epoch": 1.9293286219081272,
+ "step": 2730
+ },
+ {
+ "loss": 1.2561,
+ "grad_norm": 0.7326868772506714,
+ "learning_rate": 5.650470170681876e-07,
+ "epoch": 1.9363957597173145,
+ "step": 2740
+ },
+ {
+ "loss": 1.2533,
+ "grad_norm": 0.8101119995117188,
+ "learning_rate": 4.477719498021782e-07,
+ "epoch": 1.9434628975265018,
+ "step": 2750
+ },
+ {
+ "loss": 1.2551,
+ "grad_norm": 0.8675141334533691,
+ "learning_rate": 3.440952651710072e-07,
+ "epoch": 1.9505300353356891,
+ "step": 2760
+ },
+ {
+ "loss": 1.274,
+ "grad_norm": 0.8655520081520081,
+ "learning_rate": 2.540311249393912e-07,
+ "epoch": 1.9575971731448765,
+ "step": 2770
+ },
+ {
+ "loss": 1.258,
+ "grad_norm": 0.8928612470626831,
+ "learning_rate": 1.7759183146021096e-07,
+ "epoch": 1.9646643109540636,
+ "step": 2780
+ },
+ {
+ "loss": 1.2497,
+ "grad_norm": 0.8795515298843384,
+ "learning_rate": 1.1478782599411153e-07,
+ "epoch": 1.971731448763251,
+ "step": 2790
+ },
+ {
+ "loss": 1.2383,
+ "grad_norm": 0.8107221126556396,
+ "learning_rate": 6.562768728327618e-08,
+ "epoch": 1.978798586572438,
+ "step": 2800
+ },
+ {
+ "loss": 1.2578,
+ "grad_norm": 0.8321946859359741,
+ "learning_rate": 3.0118130379575005e-08,
+ "epoch": 1.9858657243816253,
+ "step": 2810
+ },
+ {
+ "loss": 1.273,
+ "grad_norm": 0.8319383859634399,
+ "learning_rate": 8.2640057273764e-09,
+ "epoch": 1.9929328621908127,
+ "step": 2820
+ },
+ {
+ "eval_loss": 1.282422423362732,
+ "eval_runtime": 32.3803,
+ "eval_samples_per_second": 73.563,
+ "eval_steps_per_second": 18.406,
+ "epoch": 1.9929328621908127,
+ "step": 2820
+ },
+ {
+ "loss": 1.2187,
+ "grad_norm": 1.9112646579742432,
+ "learning_rate": 6.829850092149315e-11,
+ "epoch": 2.0,
+ "step": 2830
+ },
+ {
+ "train_runtime": 1496.9978,
+ "train_samples_per_second": 60.461,
+ "train_steps_per_second": 1.89,
+ "total_flos": 6.672248883264e+16,
+ "train_loss": 1.398907420828991,
+ "epoch": 2.0,
+ "step": 2830
+ }
+ ]
train/training_loss.png ADDED
train/validation_loss.png ADDED