LemTenku committed on May 14, 2025

Commit

d52dcb5

verified ·

1 Parent(s): 761da8c

Upload 16 files


Files changed (17)

.gitattributes +1 -0
checkpoint-1951/README.md +202 -0
checkpoint-1951/adapter_config.json +31 -0
checkpoint-1951/adapter_model.safetensors +3 -0
checkpoint-1951/added_tokens.json +3 -0
checkpoint-1951/chat_template.json +3 -0
checkpoint-1951/optimizer.pt +3 -0
checkpoint-1951/preprocessor_config.json +29 -0
checkpoint-1951/processor_config.json +4 -0
checkpoint-1951/rng_state.pth +3 -0
checkpoint-1951/scheduler.pt +3 -0
checkpoint-1951/special_tokens_map.json +33 -0
checkpoint-1951/tokenizer.json +3 -0
checkpoint-1951/tokenizer.model +3 -0
checkpoint-1951/tokenizer_config.json +0 -0
checkpoint-1951/trainer_state.json +1399 -0
checkpoint-1951/training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+checkpoint-1951/tokenizer.json filter=lfs diff=lfs merge=lfs -text

checkpoint-1951/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/gemma-3-27b-it-unsloth-bnb-4bit
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

checkpoint-1951/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:k_proj|v_proj|q_proj|out_proj|fc1|fc2|o_proj|gate_proj|up_proj|down_proj).*?)|(?:\\bmodel\\.layers\\.[\\d]{1,}\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\.(?:(?:k_proj|v_proj|q_proj|out_proj|fc1|fc2|o_proj|gate_proj|up_proj|down_proj)))",
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
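
The `target_modules` value above is a single regular expression rather than a list of names. PEFT matches a string `target_modules` against each full module name, so the pattern can be checked offline. The following is a minimal sketch under that assumption; the module names passed to it are hypothetical examples, not names read from this checkpoint.

```python
import re

# Pattern copied from "target_modules" in adapter_config.json. The JSON
# escapes ("\\b", "\\.", "\\d") are undone by using Python raw strings.
TARGET_MODULES = (
    r"(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r".*?(?:k_proj|v_proj|q_proj|out_proj|fc1|fc2|o_proj|gate_proj|up_proj|down_proj).*?)"
    r"|(?:\bmodel\.layers\.[\d]{1,}\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r"\.(?:(?:k_proj|v_proj|q_proj|out_proj|fc1|fc2|o_proj|gate_proj|up_proj|down_proj)))"
)


def is_targeted(module_name: str) -> bool:
    """Return True if the whole module name matches the adapter's pattern."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


if __name__ == "__main__":
    # Hypothetical module names, chosen only to exercise both alternatives.
    for name in (
        "model.language_model.layers.0.self_attn.q_proj",
        "model.layers.12.mlp.down_proj",
        "model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
    ):
        print(name, is_targeted(name))
```

Read this way, the pattern selects attention and MLP projections whose path contains `language` or `text`, plus bare `model.layers.N.*` projections, which suggests the vision tower is left without LoRA weights.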

checkpoint-1951/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0991f25ad6109394f86dbdb7ef92b0f042005b86e0aaf7b94f01f0a528eb016d
+size 908262808
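
The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 object ID, and the byte size. As a rough sketch, a pointer like this can be parsed with a few lines of Python; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the diff above, without the leading "+".
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0991f25ad6109394f86dbdb7ef92b0f042005b86e0aaf7b94f01f0a528eb016d
size 908262808
"""

if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["hash_algo"], info["digest"][:12])
    print(f"{info['size_bytes'] / 2**20:.1f} MiB")
```

The same helper applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.json`, `tokenizer.model`).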

checkpoint-1951/added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "<image_soft_token>": 262144
+}

checkpoint-1951/chat_template.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n    {%- if messages[0]['content'] is string -%}\n        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n    {%- else -%}\n        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n    {%- endif -%}\n    {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n    {%- set first_user_prefix = \"\" -%}\n    {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n        {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n    {%- endif -%}\n    {%- if (message['role'] == 'assistant') -%}\n        {%- set role = \"model\" -%}\n    {%- else -%}\n        {%- set role = message['role'] -%}\n    {%- endif -%}\n    {{ '<start_of_turn>' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n    {%- if message['content'] is string -%}\n        {{ message['content'] | trim }}\n    {%- elif message['content'] is iterable -%}\n        {%- for item in message['content'] -%}\n            {%- if item['type'] == 'image' -%}\n                {{ '<start_of_image>' }}\n            {%- elif item['type'] == 'text' -%}\n                {{ item['text'] | trim }}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- else -%}\n        {{ raise_exception(\"Invalid content type\") }}\n    {%- endif -%}\n    {{ '<end_of_turn>\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{ '<start_of_turn>model\n' }}\n{%- endif -%}\n"
+}
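
The Jinja template above is hard to read once escaped into a single JSON string. The sketch below re-implements its control flow in plain Python for the text-only case (string `content`; the image branch is omitted), assuming `<bos>` as the BOS token per `special_tokens_map.json`. `render_gemma_chat` is a hypothetical name, and the function is an illustration of the template's logic, not a replacement for the tokenizer's own chat-template rendering.

```python
def render_gemma_chat(messages, add_generation_prompt=False, bos_token="<bos>"):
    """Mirror the chat template for messages whose content is a plain string."""
    # A leading system message is folded into the first turn as a prefix.
    if messages and messages[0]["role"] == "system":
        first_user_prefix = messages[0]["content"] + "\n\n"
        loop_messages = messages[1:]
    else:
        first_user_prefix = ""
        loop_messages = messages

    parts = [bos_token]
    for index, message in enumerate(loop_messages):
        # Even positions must be user turns, odd positions assistant turns.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        # The template renames the "assistant" role to "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        prefix = first_user_prefix if index == 0 else ""
        parts.append(
            f"<start_of_turn>{role}\n{prefix}{message['content'].strip()}<end_of_turn>\n"
        )
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello"},
    ]
    print(render_gemma_chat(chat, add_generation_prompt=True))
```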

checkpoint-1951/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:174d4b38ff05c562f743cb89bc66b9307417e779edaf204f847ec1fd25528fcc
+size 462182709

checkpoint-1951/preprocessor_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "do_convert_rgb": null,
+  "do_normalize": true,
+  "do_pan_and_scan": null,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "Gemma3ImageProcessor",
+  "image_seq_length": 256,
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "pan_and_scan_max_num_crops": null,
+  "pan_and_scan_min_crop_size": null,
+  "pan_and_scan_min_ratio_to_activate": null,
+  "processor_class": "Gemma3Processor",
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 896,
+    "width": 896
+  }
+}
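
The numeric fields in this config combine into a simple per-pixel transform: `rescale_factor` is 1/255, and with `image_mean` and `image_std` both 0.5 the result lands in roughly [-1, 1]. The sketch below works through that arithmetic for a single 8-bit channel value, assuming the usual rescale-then-normalize order; `preprocess_pixel` is a hypothetical helper, not a function of the image processor.

```python
# Values copied from preprocessor_config.json.
RESCALE_FACTOR = 0.00392156862745098  # "rescale_factor", i.e. 1 / 255
IMAGE_MEAN = 0.5                      # same for all three channels
IMAGE_STD = 0.5                       # same for all three channels


def preprocess_pixel(value: int) -> float:
    """Rescale an 8-bit channel value to [0, 1], then normalize it."""
    rescaled = value * RESCALE_FACTOR           # do_rescale
    return (rescaled - IMAGE_MEAN) / IMAGE_STD  # do_normalize


if __name__ == "__main__":
    for value in (0, 128, 255):
        print(value, round(preprocess_pixel(value), 4))
```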

checkpoint-1951/processor_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "image_seq_length": 256,
+  "processor_class": "Gemma3Processor"
+}

checkpoint-1951/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1d565802a8e26c4e8a31328752b7a7fdc186d9401aa008e65697d0ad8c22e33
+size 14645

checkpoint-1951/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9eeb1624d4075b65b4daeb93d3502c86030c08a30b488d38baee962bb4a95448
+size 1465

checkpoint-1951/special_tokens_map.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<end_of_turn>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1951/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+size 33384568

checkpoint-1951/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074

checkpoint-1951/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1951/trainer_state.json ADDED

	@@ -0,0 +1,1399 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.9996157294735494,
+  "eval_steps": 500,
+  "global_step": 1951,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0051236070193416165,
+      "grad_norm": 0.9283762574195862,
+      "learning_rate": 3.082191780821918e-07,
+      "loss": 2.5149,
+      "step": 10
+    },
+    {
+      "epoch": 0.010247214038683233,
+      "grad_norm": 2.0771284103393555,
+      "learning_rate": 6.506849315068493e-07,
+      "loss": 2.5983,
+      "step": 20
+    },
+    {
+      "epoch": 0.01537082105802485,
+      "grad_norm": 1.2729460000991821,
+      "learning_rate": 9.931506849315068e-07,
+      "loss": 2.7607,
+      "step": 30
+    },
+    {
+      "epoch": 0.020494428077366466,
+      "grad_norm": 1.8901182413101196,
+      "learning_rate": 1.3356164383561645e-06,
+      "loss": 2.6807,
+      "step": 40
+    },
+    {
+      "epoch": 0.025618035096708083,
+      "grad_norm": 1.7464419603347778,
+      "learning_rate": 1.678082191780822e-06,
+      "loss": 2.6693,
+      "step": 50
+    },
+    {
+      "epoch": 0.0307416421160497,
+      "grad_norm": 2.5376923084259033,
+      "learning_rate": 2.0205479452054797e-06,
+      "loss": 2.4835,
+      "step": 60
+    },
+    {
+      "epoch": 0.03586524913539132,
+      "grad_norm": 2.2432198524475098,
+      "learning_rate": 2.363013698630137e-06,
+      "loss": 2.442,
+      "step": 70
+    },
+    {
+      "epoch": 0.04098885615473293,
+      "grad_norm": 1.4864252805709839,
+      "learning_rate": 2.705479452054795e-06,
+      "loss": 2.5702,
+      "step": 80
+    },
+    {
+      "epoch": 0.046112463174074546,
+      "grad_norm": 0.9299277067184448,
+      "learning_rate": 3.0479452054794525e-06,
+      "loss": 2.4027,
+      "step": 90
+    },
+    {
+      "epoch": 0.05123607019341617,
+      "grad_norm": 0.9113534688949585,
+      "learning_rate": 3.39041095890411e-06,
+      "loss": 2.1641,
+      "step": 100
+    },
+    {
+      "epoch": 0.05635967721275778,
+      "grad_norm": 3.573970317840576,
+      "learning_rate": 3.7328767123287675e-06,
+      "loss": 2.2474,
+      "step": 110
+    },
+    {
+      "epoch": 0.0614832842320994,
+      "grad_norm": 0.44894328713417053,
+      "learning_rate": 4.075342465753426e-06,
+      "loss": 2.1571,
+      "step": 120
+    },
+    {
+      "epoch": 0.06660689125144101,
+      "grad_norm": 0.41464388370513916,
+      "learning_rate": 4.4178082191780825e-06,
+      "loss": 2.0952,
+      "step": 130
+    },
+    {
+      "epoch": 0.07173049827078264,
+      "grad_norm": 0.4125676155090332,
+      "learning_rate": 4.76027397260274e-06,
+      "loss": 2.0756,
+      "step": 140
+    },
+    {
+      "epoch": 0.07685410529012425,
+      "grad_norm": 0.32413947582244873,
+      "learning_rate": 5.102739726027398e-06,
+      "loss": 2.1321,
+      "step": 150
+    },
+    {
+      "epoch": 0.08197771230946586,
+      "grad_norm": 0.42782750725746155,
+      "learning_rate": 5.445205479452055e-06,
+      "loss": 2.0177,
+      "step": 160
+    },
+    {
+      "epoch": 0.08710131932880748,
+      "grad_norm": 0.2886713445186615,
+      "learning_rate": 5.7876712328767125e-06,
+      "loss": 2.1109,
+      "step": 170
+    },
+    {
+      "epoch": 0.09222492634814909,
+      "grad_norm": 0.32167181372642517,
+      "learning_rate": 6.13013698630137e-06,
+      "loss": 2.0263,
+      "step": 180
+    },
+    {
+      "epoch": 0.09734853336749072,
+      "grad_norm": 0.3712522089481354,
+      "learning_rate": 6.472602739726028e-06,
+      "loss": 2.1223,
+      "step": 190
+    },
+    {
+      "epoch": 0.10247214038683233,
+      "grad_norm": 0.3617869019508362,
+      "learning_rate": 6.815068493150685e-06,
+      "loss": 2.0005,
+      "step": 200
+    },
+    {
+      "epoch": 0.10759574740617395,
+      "grad_norm": 0.3981476128101349,
+      "learning_rate": 7.1575342465753425e-06,
+      "loss": 2.0404,
+      "step": 210
+    },
+    {
+      "epoch": 0.11271935442551556,
+      "grad_norm": 0.404182106256485,
+      "learning_rate": 7.500000000000001e-06,
+      "loss": 1.988,
+      "step": 220
+    },
+    {
+      "epoch": 0.11784296144485717,
+      "grad_norm": 0.36979714035987854,
+      "learning_rate": 7.842465753424659e-06,
+      "loss": 1.9519,
+      "step": 230
+    },
+    {
+      "epoch": 0.1229665684641988,
+      "grad_norm": 0.3464847207069397,
+      "learning_rate": 8.184931506849316e-06,
+      "loss": 1.9843,
+      "step": 240
+    },
+    {
+      "epoch": 0.1280901754835404,
+      "grad_norm": 0.28336334228515625,
+      "learning_rate": 8.527397260273972e-06,
+      "loss": 2.0262,
+      "step": 250
+    },
+    {
+      "epoch": 0.13321378250288202,
+      "grad_norm": 0.41522300243377686,
+      "learning_rate": 8.86986301369863e-06,
+      "loss": 1.892,
+      "step": 260
+    },
+    {
+      "epoch": 0.13833738952222366,
+      "grad_norm": 0.30926865339279175,
+      "learning_rate": 9.212328767123288e-06,
+      "loss": 2.1701,
+      "step": 270
+    },
+    {
+      "epoch": 0.14346099654156527,
+      "grad_norm": 0.33913934230804443,
+      "learning_rate": 9.554794520547946e-06,
+      "loss": 1.9823,
+      "step": 280
+    },
+    {
+      "epoch": 0.14858460356090689,
+      "grad_norm": 0.33306095004081726,
+      "learning_rate": 9.897260273972603e-06,
+      "loss": 1.9394,
+      "step": 290
+    },
+    {
+      "epoch": 0.1537082105802485,
+      "grad_norm": 0.4078270196914673,
+      "learning_rate": 9.987412335910808e-06,
+      "loss": 1.9188,
+      "step": 300
+    },
+    {
+      "epoch": 0.1588318175995901,
+      "grad_norm": 0.37526968121528625,
+      "learning_rate": 9.969429958640533e-06,
+      "loss": 1.8633,
+      "step": 310
+    },
+    {
+      "epoch": 0.16395542461893173,
+      "grad_norm": 0.3586269021034241,
+      "learning_rate": 9.951447581370258e-06,
+      "loss": 1.8515,
+      "step": 320
+    },
+    {
+      "epoch": 0.16907903163827334,
+      "grad_norm": 0.42047321796417236,
+      "learning_rate": 9.933465204099983e-06,
+      "loss": 1.9326,
+      "step": 330
+    },
+    {
+      "epoch": 0.17420263865761496,
+      "grad_norm": 0.3294510543346405,
+      "learning_rate": 9.915482826829708e-06,
+      "loss": 1.8907,
+      "step": 340
+    },
+    {
+      "epoch": 0.17932624567695657,
+      "grad_norm": 0.3993407189846039,
+      "learning_rate": 9.897500449559433e-06,
+      "loss": 2.004,
+      "step": 350
+    },
+    {
+      "epoch": 0.18444985269629818,
+      "grad_norm": 0.4460839331150055,
+      "learning_rate": 9.879518072289156e-06,
+      "loss": 1.8511,
+      "step": 360
+    },
+    {
+      "epoch": 0.1895734597156398,
+      "grad_norm": 0.34153926372528076,
+      "learning_rate": 9.861535695018883e-06,
+      "loss": 1.9739,
+      "step": 370
+    },
+    {
+      "epoch": 0.19469706673498144,
+      "grad_norm": 0.40620914101600647,
+      "learning_rate": 9.843553317748608e-06,
+      "loss": 1.7808,
+      "step": 380
+    },
+    {
+      "epoch": 0.19982067375432305,
+      "grad_norm": 0.4027026891708374,
+      "learning_rate": 9.825570940478331e-06,
+      "loss": 2.0195,
+      "step": 390
+    },
+    {
+      "epoch": 0.20494428077366467,
+      "grad_norm": 0.4129427671432495,
+      "learning_rate": 9.807588563208058e-06,
+      "loss": 1.8981,
+      "step": 400
+    },
+    {
+      "epoch": 0.21006788779300628,
+      "grad_norm": 0.48501360416412354,
+      "learning_rate": 9.789606185937783e-06,
+      "loss": 1.9021,
+      "step": 410
+    },
+    {
+      "epoch": 0.2151914948123479,
+      "grad_norm": 0.38687485456466675,
+      "learning_rate": 9.771623808667506e-06,
+      "loss": 1.9541,
+      "step": 420
+    },
+    {
+      "epoch": 0.2203151018316895,
+      "grad_norm": 0.4391244649887085,
+      "learning_rate": 9.753641431397232e-06,
+      "loss": 1.891,
+      "step": 430
+    },
+    {
+      "epoch": 0.22543870885103112,
+      "grad_norm": 0.4501667022705078,
+      "learning_rate": 9.735659054126957e-06,
+      "loss": 1.9017,
+      "step": 440
+    },
+    {
+      "epoch": 0.23056231587037274,
+      "grad_norm": 0.4022745192050934,
+      "learning_rate": 9.71767667685668e-06,
+      "loss": 1.9769,
+      "step": 450
+    },
+    {
+      "epoch": 0.23568592288971435,
+      "grad_norm": 0.4331080913543701,
+      "learning_rate": 9.699694299586407e-06,
+      "loss": 1.8874,
+      "step": 460
+    },
+    {
+      "epoch": 0.24080952990905596,
+      "grad_norm": 0.46478205919265747,
+      "learning_rate": 9.68171192231613e-06,
+      "loss": 1.8132,
+      "step": 470
+    },
+    {
+      "epoch": 0.2459331369283976,
+      "grad_norm": 0.5136926174163818,
+      "learning_rate": 9.663729545045855e-06,
+      "loss": 1.9756,
+      "step": 480
+    },
+    {
+      "epoch": 0.2510567439477392,
+      "grad_norm": 0.40337473154067993,
+      "learning_rate": 9.64574716777558e-06,
+      "loss": 1.7995,
+      "step": 490
+    },
+    {
+      "epoch": 0.2561803509670808,
+      "grad_norm": 0.5226535201072693,
+      "learning_rate": 9.627764790505305e-06,
+      "loss": 1.8977,
+      "step": 500
+    },
+    {
+      "epoch": 0.2613039579864224,
+      "grad_norm": 0.4993721544742584,
+      "learning_rate": 9.60978241323503e-06,
+      "loss": 1.815,
+      "step": 510
+    },
+    {
+      "epoch": 0.26642756500576403,
+      "grad_norm": 0.5010855793952942,
+      "learning_rate": 9.591800035964755e-06,
+      "loss": 1.8641,
+      "step": 520
+    },
+    {
+      "epoch": 0.27155117202510565,
+      "grad_norm": 0.6061142683029175,
+      "learning_rate": 9.57381765869448e-06,
+      "loss": 1.7172,
+      "step": 530
+    },
+    {
+      "epoch": 0.2766747790444473,
+      "grad_norm": 0.48305490612983704,
+      "learning_rate": 9.555835281424205e-06,
+      "loss": 1.8936,
+      "step": 540
+    },
+    {
+      "epoch": 0.28179838606378893,
+      "grad_norm": 0.4641883969306946,
+      "learning_rate": 9.53785290415393e-06,
+      "loss": 1.9441,
+      "step": 550
+    },
+    {
+      "epoch": 0.28692199308313054,
+      "grad_norm": 0.5004730224609375,
+      "learning_rate": 9.519870526883655e-06,
+      "loss": 1.8094,
+      "step": 560
+    },
+    {
+      "epoch": 0.29204560010247216,
+      "grad_norm": 0.47707024216651917,
+      "learning_rate": 9.50188814961338e-06,
+      "loss": 1.7379,
+      "step": 570
+    },
+    {
+      "epoch": 0.29716920712181377,
+      "grad_norm": 0.5916208624839783,
+      "learning_rate": 9.483905772343104e-06,
+      "loss": 1.8396,
+      "step": 580
+    },
+    {
+      "epoch": 0.3022928141411554,
+      "grad_norm": 0.5307938456535339,
+      "learning_rate": 9.46592339507283e-06,
+      "loss": 1.8623,
+      "step": 590
+    },
+    {
+      "epoch": 0.307416421160497,
+      "grad_norm": 0.4907430112361908,
+      "learning_rate": 9.447941017802554e-06,
+      "loss": 1.9209,
+      "step": 600
+    },
+    {
+      "epoch": 0.3125400281798386,
+      "grad_norm": 0.49913763999938965,
+      "learning_rate": 9.429958640532279e-06,
+      "loss": 1.9296,
+      "step": 610
+    },
+    {
+      "epoch": 0.3176636351991802,
+      "grad_norm": 0.5934273600578308,
+      "learning_rate": 9.411976263262004e-06,
+      "loss": 1.9988,
+      "step": 620
+    },
+    {
+      "epoch": 0.32278724221852184,
+      "grad_norm": 0.5858088731765747,
+      "learning_rate": 9.393993885991729e-06,
+      "loss": 1.8334,
+      "step": 630
+    },
+    {
+      "epoch": 0.32791084923786346,
+      "grad_norm": 0.5141202211380005,
+      "learning_rate": 9.376011508721454e-06,
+      "loss": 2.0364,
+      "step": 640
+    },
+    {
+      "epoch": 0.33303445625720507,
+      "grad_norm": 0.5824264287948608,
+      "learning_rate": 9.358029131451179e-06,
+      "loss": 1.7941,
+      "step": 650
+    },
+    {
+      "epoch": 0.3381580632765467,
+      "grad_norm": 0.559829831123352,
+      "learning_rate": 9.340046754180904e-06,
+      "loss": 1.9578,
+      "step": 660
+    },
+    {
+      "epoch": 0.3432816702958883,
+      "grad_norm": 0.4965716004371643,
+      "learning_rate": 9.322064376910629e-06,
+      "loss": 1.9109,
+      "step": 670
+    },
+    {
+      "epoch": 0.3484052773152299,
+      "grad_norm": 0.7789948582649231,
+      "learning_rate": 9.304081999640354e-06,
+      "loss": 1.9302,
+      "step": 680
+    },
+    {
+      "epoch": 0.3535288843345715,
+      "grad_norm": 0.5007458925247192,
+      "learning_rate": 9.286099622370078e-06,
+      "loss": 1.9089,
+      "step": 690
+    },
+    {
+      "epoch": 0.35865249135391314,
+      "grad_norm": 0.6653125882148743,
+      "learning_rate": 9.268117245099802e-06,
+      "loss": 1.9592,
+      "step": 700
+    },
+    {
+      "epoch": 0.36377609837325475,
+      "grad_norm": 0.6185115575790405,
+      "learning_rate": 9.250134867829528e-06,
+      "loss": 1.9906,
+      "step": 710
+    },
+    {
+      "epoch": 0.36889970539259637,
+      "grad_norm": 0.5282222032546997,
+      "learning_rate": 9.232152490559253e-06,
+      "loss": 1.843,
+      "step": 720
+    },
+    {
+      "epoch": 0.374023312411938,
+      "grad_norm": 0.6137123107910156,
+      "learning_rate": 9.214170113288976e-06,
+      "loss": 1.8222,
+      "step": 730
+    },
+    {
+      "epoch": 0.3791469194312796,
+      "grad_norm": 0.7547171115875244,
+      "learning_rate": 9.196187736018703e-06,
+      "loss": 1.7996,
+      "step": 740
+    },
+    {
+      "epoch": 0.38427052645062126,
+      "grad_norm": 0.6485642790794373,
+      "learning_rate": 9.178205358748428e-06,
+      "loss": 1.7981,
+      "step": 750
+    },
+    {
+      "epoch": 0.3893941334699629,
+      "grad_norm": 0.6317747831344604,
+      "learning_rate": 9.160222981478151e-06,
+      "loss": 1.8796,
+      "step": 760
+    },
+    {
+      "epoch": 0.3945177404893045,
+      "grad_norm": 0.630524218082428,
+      "learning_rate": 9.142240604207878e-06,
+      "loss": 1.8626,
+      "step": 770
+    },
+    {
+      "epoch": 0.3996413475086461,
+      "grad_norm": 0.6553362011909485,
+      "learning_rate": 9.124258226937603e-06,
+      "loss": 1.8457,
+      "step": 780
+    },
+    {
+      "epoch": 0.4047649545279877,
+      "grad_norm": 0.8318623900413513,
+      "learning_rate": 9.106275849667326e-06,
+      "loss": 1.864,
+      "step": 790
+    },
+    {
+      "epoch": 0.40988856154732933,
+      "grad_norm": 0.6111303567886353,
+      "learning_rate": 9.088293472397053e-06,
+      "loss": 1.9157,
+      "step": 800
+    },
+    {
+      "epoch": 0.41501216856667095,
+      "grad_norm": 0.6513353586196899,
+      "learning_rate": 9.070311095126777e-06,
+      "loss": 1.9665,
+      "step": 810
+    },
+    {
+      "epoch": 0.42013577558601256,
+      "grad_norm": 0.928481936454773,
+      "learning_rate": 9.0523287178565e-06,
+      "loss": 1.8343,
+      "step": 820
+    },
+    {
+      "epoch": 0.4252593826053542,
+      "grad_norm": 0.9451532959938049,
+      "learning_rate": 9.034346340586226e-06,
+      "loss": 1.8696,
+      "step": 830
+    },
+    {
+      "epoch": 0.4303829896246958,
+      "grad_norm": 0.8331893682479858,
+      "learning_rate": 9.016363963315952e-06,
+      "loss": 1.8623,
+      "step": 840
+    },
+    {
+      "epoch": 0.4355065966440374,
+      "grad_norm": 0.6320263147354126,
+      "learning_rate": 8.998381586045675e-06,
+      "loss": 2.0111,
+      "step": 850
+    },
+    {
+      "epoch": 0.440630203663379,
+      "grad_norm": 0.6559350490570068,
+      "learning_rate": 8.9803992087754e-06,
+      "loss": 1.9366,
+      "step": 860
+    },
+    {
+      "epoch": 0.44575381068272063,
+      "grad_norm": 0.8885278701782227,
+      "learning_rate": 8.962416831505125e-06,
+      "loss": 1.7874,
+      "step": 870
+    },
+    {
+      "epoch": 0.45087741770206224,
+      "grad_norm": 0.6564807891845703,
+      "learning_rate": 8.94443445423485e-06,
+      "loss": 1.9132,
+      "step": 880
+    },
+    {
+      "epoch": 0.45600102472140386,
+      "grad_norm": 0.8616418242454529,
+      "learning_rate": 8.926452076964575e-06,
+      "loss": 1.8377,
+      "step": 890
+    },
+    {
+      "epoch": 0.46112463174074547,
+      "grad_norm": 0.7826619744300842,
+      "learning_rate": 8.9084696996943e-06,
+      "loss": 1.8289,
+      "step": 900
+    },
+    {
+      "epoch": 0.4662482387600871,
+      "grad_norm": 0.931626558303833,
+      "learning_rate": 8.890487322424025e-06,
+      "loss": 1.7565,
+      "step": 910
+    },
+    {
+      "epoch": 0.4713718457794287,
+      "grad_norm": 0.7691947221755981,
+      "learning_rate": 8.87250494515375e-06,
+      "loss": 1.8211,
+      "step": 920
+    },
+    {
+      "epoch": 0.4764954527987703,
+      "grad_norm": 0.9561055302619934,
+      "learning_rate": 8.854522567883475e-06,
+      "loss": 1.6822,
+      "step": 930
+    },
+    {
+      "epoch": 0.4816190598181119,
+      "grad_norm": 0.6925144195556641,
+      "learning_rate": 8.8365401906132e-06,
+      "loss": 1.9386,
+      "step": 940
+    },
+    {
+      "epoch": 0.48674266683745354,
+      "grad_norm": 0.841651976108551,
+      "learning_rate": 8.818557813342925e-06,
+      "loss": 1.6945,
+      "step": 950
+    },
+    {
+      "epoch": 0.4918662738567952,
+      "grad_norm": 0.7981500625610352,
+      "learning_rate": 8.80057543607265e-06,
+      "loss": 2.0415,
+      "step": 960
+    },
+    {
+      "epoch": 0.4969898808761368,
+      "grad_norm": 2.721221923828125,
+      "learning_rate": 8.782593058802374e-06,
+      "loss": 1.9961,
+      "step": 970
+    },
+    {
+      "epoch": 0.5021134878954784,
+      "grad_norm": 0.8927514553070068,
+      "learning_rate": 8.7646106815321e-06,
+      "loss": 1.9367,
+      "step": 980
+    },
+    {
+      "epoch": 0.50723709491482,
+      "grad_norm": 0.855109453201294,
+      "learning_rate": 8.746628304261824e-06,
+      "loss": 1.7309,
+      "step": 990
+    },
+    {
+      "epoch": 0.5123607019341616,
+      "grad_norm": 0.9546945691108704,
+      "learning_rate": 8.72864592699155e-06,
+      "loss": 1.7306,
+      "step": 1000
+    },
+    {
+      "epoch": 0.5174843089535033,
+      "grad_norm": 0.9672499299049377,
+      "learning_rate": 8.710663549721274e-06,
+      "loss": 1.8424,
+      "step": 1010
+    },
+    {
+      "epoch": 0.5226079159728448,
+      "grad_norm": 1.007051944732666,
+      "learning_rate": 8.692681172450999e-06,
+      "loss": 2.0065,
+      "step": 1020
+    },
+    {
+      "epoch": 0.5277315229921865,
+      "grad_norm": 0.6912757754325867,
+      "learning_rate": 8.674698795180724e-06,
+      "loss": 2.0271,
+      "step": 1030
+    },
+    {
+      "epoch": 0.5328551300115281,
+      "grad_norm": 0.9379146099090576,
+      "learning_rate": 8.656716417910447e-06,
+      "loss": 1.8913,
+      "step": 1040
+    },
+    {
+      "epoch": 0.5379787370308697,
+      "grad_norm": 0.8727518916130066,
+      "learning_rate": 8.638734040640174e-06,
+      "loss": 1.9572,
+      "step": 1050
+    },
+    {
+      "epoch": 0.5431023440502113,
+      "grad_norm": 0.8514009118080139,
+      "learning_rate": 8.620751663369899e-06,
+      "loss": 1.896,
+      "step": 1060
+    },
+    {
+      "epoch": 0.548225951069553,
+      "grad_norm": 1.0260083675384521,
+      "learning_rate": 8.602769286099622e-06,
+      "loss": 1.9949,
+      "step": 1070
+    },
+    {
+      "epoch": 0.5533495580888946,
+      "grad_norm": 0.8921586275100708,
+      "learning_rate": 8.584786908829348e-06,
+      "loss": 1.9208,
+      "step": 1080
+    },
+    {
+      "epoch": 0.5584731651082362,
+      "grad_norm": 0.8888241052627563,
+      "learning_rate": 8.566804531559073e-06,
+      "loss": 1.9193,
+      "step": 1090
+    },
+    {
+      "epoch": 0.5635967721275779,
+      "grad_norm": 1.0706700086593628,
+      "learning_rate": 8.548822154288797e-06,
+      "loss": 1.772,
+      "step": 1100
+    },
+    {
+      "epoch": 0.5687203791469194,
+      "grad_norm": 1.4849698543548584,
+      "learning_rate": 8.530839777018523e-06,
+      "loss": 1.9014,
+      "step": 1110
+    },
+    {
+      "epoch": 0.5738439861662611,
+      "grad_norm": 0.9339395761489868,
+      "learning_rate": 8.512857399748248e-06,
+      "loss": 1.961,
+      "step": 1120
+    },
+    {
+      "epoch": 0.5789675931856026,
+      "grad_norm": 0.9731395244598389,
+      "learning_rate": 8.494875022477971e-06,
+      "loss": 1.9041,
+      "step": 1130
+    },
+    {
+      "epoch": 0.5840912002049443,
+      "grad_norm": 1.1629282236099243,
+      "learning_rate": 8.476892645207698e-06,
+      "loss": 1.9472,
+      "step": 1140
+    },
+    {
+      "epoch": 0.5892148072242859,
+      "grad_norm": 1.166678786277771,
+      "learning_rate": 8.458910267937423e-06,
+      "loss": 1.8912,
+      "step": 1150
+    },
+    {
+      "epoch": 0.5943384142436275,
+      "grad_norm": 1.0819873809814453,
+      "learning_rate": 8.440927890667146e-06,
+      "loss": 1.7994,
+      "step": 1160
+    },
+    {
+      "epoch": 0.5994620212629691,
+      "grad_norm": 0.9142709970474243,
+      "learning_rate": 8.422945513396871e-06,
+      "loss": 1.9244,
+      "step": 1170
+    },
+    {
+      "epoch": 0.6045856282823108,
+      "grad_norm": 0.875573456287384,
+      "learning_rate": 8.404963136126598e-06,
+      "loss": 1.954,
+      "step": 1180
+    },
+    {
+      "epoch": 0.6097092353016523,
+      "grad_norm": 1.3796428442001343,
+      "learning_rate": 8.386980758856321e-06,
+      "loss": 1.8924,
+      "step": 1190
+    },
+    {
+      "epoch": 0.614832842320994,
+      "grad_norm": 1.1863864660263062,
+      "learning_rate": 8.368998381586046e-06,
+      "loss": 1.8346,
+      "step": 1200
+    },
+    {
+      "epoch": 0.6199564493403356,
+      "grad_norm": 0.9728164672851562,
+      "learning_rate": 8.351016004315772e-06,
+      "loss": 1.8753,
+      "step": 1210
+    },
+    {
+      "epoch": 0.6250800563596772,
+      "grad_norm": 1.0267449617385864,
+      "learning_rate": 8.333033627045496e-06,
+      "loss": 1.7244,
+      "step": 1220
+    },
+    {
+      "epoch": 0.6302036633790188,
+      "grad_norm": 1.1570535898208618,
+      "learning_rate": 8.31505124977522e-06,
+      "loss": 1.781,
+      "step": 1230
+    },
+    {
+      "epoch": 0.6353272703983605,
+      "grad_norm": 1.083948016166687,
+      "learning_rate": 8.297068872504945e-06,
+      "loss": 1.7881,
+      "step": 1240
+    },
+    {
+      "epoch": 0.640450877417702,
+      "grad_norm": 1.0299830436706543,
+      "learning_rate": 8.27908649523467e-06,
+      "loss": 1.764,
+      "step": 1250
+    },
+    {
+      "epoch": 0.6455744844370437,
+      "grad_norm": 0.90772545337677,
+      "learning_rate": 8.261104117964395e-06,
+      "loss": 1.907,
+      "step": 1260
+    },
+    {
+      "epoch": 0.6506980914563852,
+      "grad_norm": 1.131496548652649,
+      "learning_rate": 8.24312174069412e-06,
+      "loss": 1.7971,
+      "step": 1270
+    },
+    {
+      "epoch": 0.6558216984757269,
+      "grad_norm": 1.3749288320541382,
+      "learning_rate": 8.225139363423845e-06,
+      "loss": 1.938,
+      "step": 1280
+    },
+    {
+      "epoch": 0.6609453054950686,
+      "grad_norm": 1.0492961406707764,
+      "learning_rate": 8.20715698615357e-06,
+      "loss": 1.9754,
+      "step": 1290
+    },
+    {
+      "epoch": 0.6660689125144101,
+      "grad_norm": 1.0692486763000488,
+      "learning_rate": 8.189174608883295e-06,
+      "loss": 1.8731,
+      "step": 1300
+    },
+    {
+      "epoch": 0.6711925195337518,
+      "grad_norm": 0.8717417120933533,
+      "learning_rate": 8.17119223161302e-06,
+      "loss": 1.9718,
+      "step": 1310
+    },
+    {
+      "epoch": 0.6763161265530934,
+      "grad_norm": 0.9841485619544983,
+      "learning_rate": 8.153209854342745e-06,
+      "loss": 2.0654,
+      "step": 1320
+    },
+    {
+      "epoch": 0.681439733572435,
+      "grad_norm": 1.2106537818908691,
+      "learning_rate": 8.13522747707247e-06,
+      "loss": 1.8843,
+      "step": 1330
+    },
+    {
+      "epoch": 0.6865633405917766,
+      "grad_norm": 1.095435380935669,
+      "learning_rate": 8.117245099802195e-06,
+      "loss": 1.8723,
+      "step": 1340
+    },
+    {
+      "epoch": 0.6916869476111183,
+      "grad_norm": 1.075095772743225,
+      "learning_rate": 8.09926272253192e-06,
+      "loss": 1.9138,
+      "step": 1350
+    },
+    {
+      "epoch": 0.6968105546304598,
+      "grad_norm": 1.0332934856414795,
+      "learning_rate": 8.081280345261644e-06,
+      "loss": 1.9251,
+      "step": 1360
+    },
+    {
+      "epoch": 0.7019341616498015,
+      "grad_norm": 1.0361360311508179,
+      "learning_rate": 8.06329796799137e-06,
+      "loss": 1.9302,
+      "step": 1370
+    },
+    {
+      "epoch": 0.707057768669143,
+      "grad_norm": 1.2773158550262451,
+      "learning_rate": 8.045315590721094e-06,
+      "loss": 1.8499,
+      "step": 1380
+    },
+    {
+      "epoch": 0.7121813756884847,
+      "grad_norm": 1.20404052734375,
+      "learning_rate": 8.02733321345082e-06,
+      "loss": 1.927,
+      "step": 1390
+    },
+    {
+      "epoch": 0.7173049827078263,
+      "grad_norm": 1.0007046461105347,
+      "learning_rate": 8.009350836180544e-06,
+      "loss": 1.8472,
+      "step": 1400
+    },
+    {
+      "epoch": 0.722428589727168,
+      "grad_norm": 1.2512285709381104,
+      "learning_rate": 7.991368458910269e-06,
+      "loss": 1.7322,
+      "step": 1410
+    },
+    {
+      "epoch": 0.7275521967465095,
+      "grad_norm": 1.1332108974456787,
+      "learning_rate": 7.973386081639994e-06,
+      "loss": 1.7588,
+      "step": 1420
+    },
+    {
+      "epoch": 0.7326758037658512,
+      "grad_norm": 0.8058843612670898,
+      "learning_rate": 7.955403704369719e-06,
+      "loss": 1.8401,
+      "step": 1430
+    },
+    {
+      "epoch": 0.7377994107851927,
+      "grad_norm": 1.353051781654358,
+      "learning_rate": 7.937421327099442e-06,
+      "loss": 1.7797,
+      "step": 1440
+    },
+    {
+      "epoch": 0.7429230178045344,
+      "grad_norm": 1.0007350444793701,
+      "learning_rate": 7.919438949829169e-06,
+      "loss": 1.9474,
+      "step": 1450
+    },
+    {
+      "epoch": 0.748046624823876,
+      "grad_norm": 1.1590157747268677,
+      "learning_rate": 7.901456572558894e-06,
+      "loss": 1.6456,
+      "step": 1460
+    },
+    {
+      "epoch": 0.7531702318432176,
+      "grad_norm": 1.5008916854858398,
+      "learning_rate": 7.883474195288617e-06,
+      "loss": 1.7342,
+      "step": 1470
+    },
+    {
+      "epoch": 0.7582938388625592,
+      "grad_norm": 1.4803295135498047,
+      "learning_rate": 7.865491818018343e-06,
+      "loss": 1.9064,
+      "step": 1480
+    },
+    {
+      "epoch": 0.7634174458819009,
+      "grad_norm": 1.4622061252593994,
+      "learning_rate": 7.847509440748068e-06,
+      "loss": 1.8154,
+      "step": 1490
+    },
+    {
+      "epoch": 0.7685410529012425,
+      "grad_norm": 1.163685917854309,
+      "learning_rate": 7.829527063477792e-06,
+      "loss": 1.8814,
+      "step": 1500
+    },
+    {
+      "epoch": 0.7736646599205841,
+      "grad_norm": 1.1950145959854126,
+      "learning_rate": 7.811544686207516e-06,
+      "loss": 1.9097,
+      "step": 1510
+    },
+    {
+      "epoch": 0.7787882669399258,
+      "grad_norm": 1.2062437534332275,
+      "learning_rate": 7.793562308937243e-06,
+      "loss": 1.9636,
+      "step": 1520
+    },
+    {
+      "epoch": 0.7839118739592673,
+      "grad_norm": 1.3537358045578003,
+      "learning_rate": 7.775579931666966e-06,
+      "loss": 1.8527,
+      "step": 1530
+    },
+    {
+      "epoch": 0.789035480978609,
+      "grad_norm": 1.4420790672302246,
+      "learning_rate": 7.757597554396691e-06,
+      "loss": 1.8253,
+      "step": 1540
+    },
+    {
+      "epoch": 0.7941590879979505,
+      "grad_norm": 0.9881426692008972,
+      "learning_rate": 7.739615177126418e-06,
+      "loss": 1.924,
+      "step": 1550
+    },
+    {
+      "epoch": 0.7992826950172922,
+      "grad_norm": 1.301687240600586,
+      "learning_rate": 7.721632799856141e-06,
+      "loss": 1.9453,
+      "step": 1560
+    },
+    {
+      "epoch": 0.8044063020366338,
+      "grad_norm": 0.9863601326942444,
+      "learning_rate": 7.703650422585866e-06,
+      "loss": 1.8233,
+      "step": 1570
+    },
+    {
+      "epoch": 0.8095299090559754,
+      "grad_norm": 1.898417353630066,
+      "learning_rate": 7.685668045315593e-06,
+      "loss": 1.945,
+      "step": 1580
+    },
+    {
+      "epoch": 0.814653516075317,
+      "grad_norm": 1.2148159742355347,
+      "learning_rate": 7.667685668045316e-06,
+      "loss": 1.8756,
+      "step": 1590
+    },
+    {
+      "epoch": 0.8197771230946587,
+      "grad_norm": 1.368735432624817,
+      "learning_rate": 7.64970329077504e-06,
+      "loss": 2.0022,
+      "step": 1600
+    },
+    {
+      "epoch": 0.8249007301140002,
+      "grad_norm": 1.4345860481262207,
+      "learning_rate": 7.631720913504766e-06,
+      "loss": 1.9976,
+      "step": 1610
+    },
+    {
+      "epoch": 0.8300243371333419,
+      "grad_norm": 1.2882895469665527,
+      "learning_rate": 7.6137385362344906e-06,
+      "loss": 1.8766,
+      "step": 1620
+    },
+    {
+      "epoch": 0.8351479441526835,
+      "grad_norm": 1.0629273653030396,
+      "learning_rate": 7.595756158964216e-06,
+      "loss": 1.8844,
+      "step": 1630
+    },
+    {
+      "epoch": 0.8402715511720251,
+      "grad_norm": 1.1477038860321045,
+      "learning_rate": 7.57777378169394e-06,
+      "loss": 1.9578,
+      "step": 1640
+    },
+    {
+      "epoch": 0.8453951581913667,
+      "grad_norm": 1.260421633720398,
+      "learning_rate": 7.559791404423665e-06,
+      "loss": 1.9325,
+      "step": 1650
+    },
+    {
+      "epoch": 0.8505187652107083,
+      "grad_norm": 1.2084254026412964,
+      "learning_rate": 7.54180902715339e-06,
+      "loss": 1.9455,
+      "step": 1660
+    },
+    {
+      "epoch": 0.8556423722300499,
+      "grad_norm": 1.3884501457214355,
+      "learning_rate": 7.523826649883115e-06,
+      "loss": 1.9029,
+      "step": 1670
+    },
+    {
+      "epoch": 0.8607659792493916,
+      "grad_norm": 1.2094634771347046,
+      "learning_rate": 7.50584427261284e-06,
+      "loss": 1.7875,
+      "step": 1680
+    },
+    {
+      "epoch": 0.8658895862687331,
+      "grad_norm": 1.0582746267318726,
+      "learning_rate": 7.487861895342565e-06,
+      "loss": 1.7263,
+      "step": 1690
+    },
+    {
+      "epoch": 0.8710131932880748,
+      "grad_norm": 1.1280441284179688,
+      "learning_rate": 7.469879518072289e-06,
+      "loss": 1.9048,
+      "step": 1700
+    },
+    {
+      "epoch": 0.8761368003074165,
+      "grad_norm": 1.0456609725952148,
+      "learning_rate": 7.451897140802015e-06,
+      "loss": 1.8558,
+      "step": 1710
+    },
+    {
+      "epoch": 0.881260407326758,
+      "grad_norm": 1.5168349742889404,
+      "learning_rate": 7.43391476353174e-06,
+      "loss": 1.9563,
+      "step": 1720
+    },
+    {
+      "epoch": 0.8863840143460997,
+      "grad_norm": 1.2876968383789062,
+      "learning_rate": 7.415932386261464e-06,
+      "loss": 1.9214,
+      "step": 1730
+    },
+    {
+      "epoch": 0.8915076213654413,
+      "grad_norm": 1.4776415824890137,
+      "learning_rate": 7.3979500089911896e-06,
+      "loss": 1.8265,
+      "step": 1740
+    },
+    {
+      "epoch": 0.8966312283847829,
+      "grad_norm": 1.5330499410629272,
+      "learning_rate": 7.3799676317209145e-06,
+      "loss": 1.8339,
+      "step": 1750
+    },
+    {
+      "epoch": 0.9017548354041245,
+      "grad_norm": 1.284269094467163,
+      "learning_rate": 7.3619852544506385e-06,
+      "loss": 1.9065,
+      "step": 1760
+    },
+    {
+      "epoch": 0.9068784424234662,
+      "grad_norm": 1.495251178741455,
+      "learning_rate": 7.344002877180364e-06,
+      "loss": 1.8348,
+      "step": 1770
+    },
+    {
+      "epoch": 0.9120020494428077,
+      "grad_norm": 1.2848460674285889,
+      "learning_rate": 7.326020499910089e-06,
+      "loss": 1.8124,
+      "step": 1780
+    },
+    {
+      "epoch": 0.9171256564621494,
+      "grad_norm": 1.6418150663375854,
+      "learning_rate": 7.308038122639813e-06,
+      "loss": 1.7869,
+      "step": 1790
+    },
+    {
+      "epoch": 0.9222492634814909,
+      "grad_norm": 1.436103343963623,
+      "learning_rate": 7.290055745369539e-06,
+      "loss": 1.9107,
+      "step": 1800
+    },
+    {
+      "epoch": 0.9273728705008326,
+      "grad_norm": 1.7350956201553345,
+      "learning_rate": 7.272073368099263e-06,
+      "loss": 1.8787,
+      "step": 1810
+    },
+    {
+      "epoch": 0.9324964775201742,
+      "grad_norm": 1.5813628435134888,
+      "learning_rate": 7.254090990828988e-06,
+      "loss": 1.975,
+      "step": 1820
+    },
+    {
+      "epoch": 0.9376200845395158,
+      "grad_norm": 1.4264980554580688,
+      "learning_rate": 7.236108613558713e-06,
+      "loss": 1.7784,
+      "step": 1830
+    },
+    {
+      "epoch": 0.9427436915588574,
+      "grad_norm": 1.5020885467529297,
+      "learning_rate": 7.218126236288438e-06,
+      "loss": 1.7603,
+      "step": 1840
+    },
+    {
+      "epoch": 0.9478672985781991,
+      "grad_norm": 1.6258248090744019,
+      "learning_rate": 7.200143859018163e-06,
+      "loss": 1.7173,
+      "step": 1850
+    },
+    {
+      "epoch": 0.9529909055975406,
+      "grad_norm": 1.4609096050262451,
+      "learning_rate": 7.182161481747888e-06,
+      "loss": 1.9287,
+      "step": 1860
+    },
+    {
+      "epoch": 0.9581145126168823,
+      "grad_norm": 1.465809941291809,
+      "learning_rate": 7.164179104477612e-06,
+      "loss": 1.86,
+      "step": 1870
+    },
+    {
+      "epoch": 0.9632381196362239,
+      "grad_norm": 1.414576530456543,
+      "learning_rate": 7.1461967272073375e-06,
+      "loss": 1.8674,
+      "step": 1880
+    },
+    {
+      "epoch": 0.9683617266555655,
+      "grad_norm": 1.6070713996887207,
+      "learning_rate": 7.1282143499370624e-06,
+      "loss": 1.9672,
+      "step": 1890
+    },
+    {
+      "epoch": 0.9734853336749071,
+      "grad_norm": 1.3779326677322388,
+      "learning_rate": 7.1102319726667865e-06,
+      "loss": 1.9253,
+      "step": 1900
+    },
+    {
+      "epoch": 0.9786089406942488,
+      "grad_norm": 1.4001855850219727,
+      "learning_rate": 7.092249595396512e-06,
+      "loss": 1.9005,
+      "step": 1910
+    },
+    {
+      "epoch": 0.9837325477135904,
+      "grad_norm": 1.3750096559524536,
+      "learning_rate": 7.074267218126237e-06,
+      "loss": 1.7803,
+      "step": 1920
+    },
+    {
+      "epoch": 0.988856154732932,
+      "grad_norm": 1.3022425174713135,
+      "learning_rate": 7.056284840855961e-06,
+      "loss": 1.8864,
+      "step": 1930
+    },
+    {
+      "epoch": 0.9939797617522736,
+      "grad_norm": 1.508089303970337,
+      "learning_rate": 7.038302463585687e-06,
+      "loss": 1.8886,
+      "step": 1940
+    },
+    {
+      "epoch": 0.9991033687716152,
+      "grad_norm": 1.1510064601898193,
+      "learning_rate": 7.020320086315412e-06,
+      "loss": 1.7505,
+      "step": 1950
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 5853,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.835320168921821e+18,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
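
The logged learning rates above fall by a constant amount every 10 steps, which is consistent with a linear decay schedule. The following is a minimal sketch, not part of the commit, that checks this on four entries copied from the log and extrapolates the step at which the rate would reach zero. The key name `log_history` is the one the Hugging Face `Trainer` uses for this array and is not visible in this excerpt; the helper names `lr_slope` and `zero_lr_step` are hypothetical.

```python
import json

# Four entries copied from the log above (epoch and grad_norm omitted).
# For a full run, one would instead load the whole file, e.g.
# json.load(open("checkpoint-1951/trainer_state.json")).
state = json.loads("""
{
  "log_history": [
    {"step": 970,  "learning_rate": 8.782593058802374e-06, "loss": 1.9961},
    {"step": 980,  "learning_rate": 8.7646106815321e-06,   "loss": 1.9367},
    {"step": 1940, "learning_rate": 7.038302463585687e-06, "loss": 1.8886},
    {"step": 1950, "learning_rate": 7.020320086315412e-06, "loss": 1.7505}
  ],
  "max_steps": 5853
}
""")


def lr_slope(entries):
    """Average change in learning rate per optimizer step (first to last entry)."""
    first, last = entries[0], entries[-1]
    return (last["learning_rate"] - first["learning_rate"]) / (last["step"] - first["step"])


def zero_lr_step(entries):
    """Step at which the rate would hit zero if the decay stays linear (an assumption)."""
    last = entries[-1]
    return last["step"] - last["learning_rate"] / lr_slope(entries)


log = state["log_history"]
slope_all = lr_slope(log)        # slope over steps 970..1950
slope_pair = lr_slope(log[:2])   # slope over steps 970..980 only

print(f"per-step slope (all entries): {slope_all:.6e}")
print(f"per-step slope (first pair):  {slope_pair:.6e}")
print(f"extrapolated zero-LR step:    {zero_lr_step(log):.1f}")
print(f"max_steps in trainer state:   {state['max_steps']}")
```

If the two slopes agree and the extrapolated zero point lands within a step or so of `max_steps`, the schedule is most likely a linear decay to zero over the full run. The log alone cannot confirm the scheduler type or the warmup length.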

checkpoint-1951/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e25b0714d67d54e9933b20827df6f3c0c76aeaacea0b5c13fb2b4afbad45eb83
+size 6033