Henit007 committed on Aug 22, 2025

Commit

b1a423f

verified

1 Parent(s): 12514ba

Upload folder using huggingface_hub

Files changed (24)

checkpoint-2084/README.md +207 -0
checkpoint-2084/adapter_config.json +42 -0
checkpoint-2084/adapter_model.safetensors +3 -0
checkpoint-2084/chat_template.jinja +15 -0
checkpoint-2084/optimizer.pt +3 -0
checkpoint-2084/rng_state.pth +3 -0
checkpoint-2084/scheduler.pt +3 -0
checkpoint-2084/special_tokens_map.json +30 -0
checkpoint-2084/tokenizer.json +0 -0
checkpoint-2084/tokenizer_config.json +43 -0
checkpoint-2084/trainer_state.json +1490 -0
checkpoint-2084/training_args.bin +3 -0
checkpoint-2605/README.md +207 -0
checkpoint-2605/adapter_config.json +42 -0
checkpoint-2605/adapter_model.safetensors +3 -0
checkpoint-2605/chat_template.jinja +15 -0
checkpoint-2605/optimizer.pt +3 -0
checkpoint-2605/rng_state.pth +3 -0
checkpoint-2605/scheduler.pt +3 -0
checkpoint-2605/special_tokens_map.json +30 -0
checkpoint-2605/tokenizer.json +0 -0
checkpoint-2605/tokenizer_config.json +43 -0
checkpoint-2605/trainer_state.json +1854 -0
checkpoint-2605/training_args.bin +3 -0

checkpoint-2084/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: Henit007/modelo3_finetuned
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Henit007/modelo3_finetuned
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.0
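
The card's "How to Get Started with the Model" section is still a placeholder. A minimal sketch of loading this checkpoint's LoRA adapter on top of the base model named in the card metadata might look like the following. It assumes the `checkpoint-2084` folder has been downloaded to a local directory of the same name; the helper names `load_adapter` and `generate_reply`, the local path, and the generation length are illustrative and not part of the commit.

```python
def load_adapter(checkpoint_dir="checkpoint-2084",
                 base_model_id="Henit007/modelo3_finetuned"):
    """Load the base model, then attach the LoRA adapter stored in checkpoint_dir."""
    # Imported lazily so the sketch can be imported without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer files are shipped inside the checkpoint folder.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Reads adapter_config.json and adapter_model.safetensors from checkpoint_dir.
    model = PeftModel.from_pretrained(base, checkpoint_dir)
    model.eval()
    return model, tokenizer


def generate_reply(model, tokenizer, user_text, max_new_tokens=64):
    """Format one user turn with the chat template and generate a reply."""
    messages = [{"role": "user", "content": user_text}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Calling `load_adapter()` downloads the base model from the Hub, so it needs network access and enough memory for the base weights.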

checkpoint-2084/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Henit007/modelo3_finetuned",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "o_proj",
+    "down_proj",
+    "q_proj",
+    "gate_proj",
+    "k_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
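
To make the `r` and `lora_alpha` values above concrete, here is a small sketch of the scaling factor they imply and of the number of trainable parameters LoRA adds to one linear layer. The function names are illustrative, and the layer shape in the usage line is a hypothetical example rather than a property of the base model, which the diff does not describe.

```python
import json
import math

# A subset of the fields from adapter_config.json above; the rest are omitted.
ADAPTER_CONFIG = json.loads("""
{"r": 8, "lora_alpha": 64, "lora_dropout": 0.05, "use_rslora": false,
 "target_modules": ["v_proj", "o_proj", "down_proj", "q_proj",
                    "gate_proj", "k_proj", "up_proj"]}
""")


def lora_scaling(config):
    """Factor applied to the low-rank update B @ A before it is added to the frozen weight."""
    r, alpha = config["r"], config["lora_alpha"]
    # Standard LoRA divides alpha by r; rank-stabilized LoRA divides by sqrt(r).
    return alpha / math.sqrt(r) if config.get("use_rslora") else alpha / r


def lora_params_per_module(in_features, out_features, r):
    """Trainable parameters added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)  # 8.0
# Hypothetical 2048 x 2048 projection, chosen only to illustrate the formula.
print(lora_params_per_module(2048, 2048, ADAPTER_CONFIG["r"]))  # 32768
```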

checkpoint-2084/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4982215fd1c5da20a468683c2c197e4f55f3a9af4ebd9c912ac958e5d7d214e8
+size 25271744
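
The three lines above are a Git LFS pointer, not the weights themselves: the pointer records the SHA-256 digest and byte size of the real file. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows; `parse_lfs_pointer` and `verify_download` are illustrative helper names, not part of any library.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, info, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == info["size"] and hasher.hexdigest() == info["oid"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4982215fd1c5da20a468683c2c197e4f55f3a9af4ebd9c912ac958e5d7d214e8
size 25271744
"""

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 25271744
```

The same check applies to the `optimizer.pt`, `rng_state.pth`, `scheduler.pt` and `training_args.bin` pointers in this commit.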

checkpoint-2084/chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}
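
To show what this template produces, here is a pure-Python mirror of its logic. It is a sketch under two assumptions: that the template is rendered with Jinja's `trim_blocks` behaviour (newlines after `{% ... %}` tags are dropped, newlines after `{{ ... }}` expressions are kept), and that `eos_token` is `</s>` as in the tokenizer files of this commit. Under other whitespace settings the newline placement may differ. `render_chat` is an illustrative name.

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Pure-Python mirror of chat_template.jinja, assuming trim_blocks-style whitespace."""
    parts = []
    for index, message in enumerate(messages):
        role = message["role"]
        # Only these three roles have a branch in the template; other roles emit nothing.
        if role in ("user", "system", "assistant"):
            parts.append(f"<|{role}|>\n{message['content']}{eos_token}\n")
        # The generation prompt is emitted once, after the last message.
        if index == len(messages) - 1 and add_generation_prompt:
            parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

Each turn is closed with the EOS token, and the trailing `<|assistant|>` marker cues the model to start its reply.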

checkpoint-2084/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6583387459d2a5433afe5365f886d74d53f7caf6a86bea9c201652888f5ef201
+size 13686237

checkpoint-2084/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:176130beb5f7e89b9e21281226263690af78e10f76c7e0d154915bbc4a9dd34a
+size 14645

checkpoint-2084/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd216057c1e55ccfa677328c2f2919c9d61cef775b672a3dc72dce96c554aa74
+size 1465

checkpoint-2084/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-2084/tokenizer.json ADDED

The diff for this file is too large to render.

checkpoint-2084/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizerFast",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
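
This config reuses `</s>` (id 2) as both the EOS and the padding token and pads on the right. One consequence is that padding cannot be identified by token id alone, so an attention mask has to carry that information. A minimal sketch of right-padding a batch under these settings follows; `pad_batch` is an illustrative helper and the token ids in the usage lines are made up.

```python
PAD_TOKEN_ID = 2         # "</s>" in added_tokens_decoder; also the EOS id
MODEL_MAX_LENGTH = 2048  # model_max_length from the config above


def pad_batch(sequences, pad_id=PAD_TOKEN_ID, max_length=MODEL_MAX_LENGTH):
    """Right-pad token-id lists to a common length and build attention masks."""
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    # Because pad_id equals the EOS id, the mask (not the token id) marks padding.
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in truncated]
    return input_ids, attention_mask


# Made-up token ids: the first sequence ends with a real EOS, the second is padded.
ids, mask = pad_batch([[1, 100, 200, 2], [1, 300]])
print(ids)   # [[1, 100, 200, 2], [1, 300, 2, 2]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```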

checkpoint-2084/trainer_state.json ADDED
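
The `learning_rate` values logged in this file's `log_history` are consistent with a linear warmup followed by a linear decay. The sketch below reproduces them. The peak rate, warmup length and total step count are inferred by fitting the logged numbers; none of them is stated in the diff, and the training arguments themselves are stored in the binary `training_args.bin`.

```python
PEAK_LR = 5e-5       # inferred from the logged values, not stated in the diff
WARMUP_STEPS = 261   # inferred
TOTAL_STEPS = 2605   # inferred; equals the step count of the second checkpoint


def scheduled_lr(completed_steps):
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if completed_steps < WARMUP_STEPS:
        return PEAK_LR * completed_steps / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - completed_steps) / (TOTAL_STEPS - WARMUP_STEPS)


def logged_lr(step):
    """The value logged at `step` matches the schedule after step - 1 completed updates."""
    return scheduled_lr(step - 1)


# (step, learning_rate) pairs copied from log_history
LOGGED = [
    (10, 1.724137931034483e-06),
    (260, 4.96168582375479e-05),
    (270, 4.982935153583618e-05),
    (1000, 3.4257679180887374e-05),
]
for step, lr in LOGGED:
    assert abs(logged_lr(step) - lr) < 1e-12
```

The rate therefore peaks between the entries for steps 260 and 270, which is why the logged value still rises across those two entries before decreasing.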

	@@ -0,0 +1,1490 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 4.0,
+  "eval_steps": 500,
+  "global_step": 2084,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.019221528111484865,
+      "grad_norm": 4.911402702331543,
+      "learning_rate": 1.724137931034483e-06,
+      "loss": 2.1561,
+      "step": 10
+    },
+    {
+      "epoch": 0.03844305622296973,
+      "grad_norm": 4.560591697692871,
+      "learning_rate": 3.6398467432950196e-06,
+      "loss": 2.1368,
+      "step": 20
+    },
+    {
+      "epoch": 0.05766458433445459,
+      "grad_norm": 6.820058345794678,
+      "learning_rate": 5.555555555555556e-06,
+      "loss": 2.3192,
+      "step": 30
+    },
+    {
+      "epoch": 0.07688611244593946,
+      "grad_norm": 7.230356693267822,
+      "learning_rate": 7.4712643678160925e-06,
+      "loss": 2.6544,
+      "step": 40
+    },
+    {
+      "epoch": 0.09610764055742431,
+      "grad_norm": 6.225106716156006,
+      "learning_rate": 9.386973180076629e-06,
+      "loss": 2.6349,
+      "step": 50
+    },
+    {
+      "epoch": 0.11532916866890917,
+      "grad_norm": 5.758670330047607,
+      "learning_rate": 1.1302681992337164e-05,
+      "loss": 2.6164,
+      "step": 60
+    },
+    {
+      "epoch": 0.13455069678039405,
+      "grad_norm": 6.433386325836182,
+      "learning_rate": 1.3218390804597702e-05,
+      "loss": 2.6651,
+      "step": 70
+    },
+    {
+      "epoch": 0.15377222489187892,
+      "grad_norm": 6.204464435577393,
+      "learning_rate": 1.5134099616858237e-05,
+      "loss": 2.5232,
+      "step": 80
+    },
+    {
+      "epoch": 0.17299375300336375,
+      "grad_norm": 6.514060974121094,
+      "learning_rate": 1.7049808429118776e-05,
+      "loss": 2.5878,
+      "step": 90
+    },
+    {
+      "epoch": 0.19221528111484862,
+      "grad_norm": 5.641262054443359,
+      "learning_rate": 1.896551724137931e-05,
+      "loss": 2.5566,
+      "step": 100
+    },
+    {
+      "epoch": 0.21143680922633348,
+      "grad_norm": 6.221690654754639,
+      "learning_rate": 2.088122605363985e-05,
+      "loss": 2.6939,
+      "step": 110
+    },
+    {
+      "epoch": 0.23065833733781835,
+      "grad_norm": 5.924990177154541,
+      "learning_rate": 2.2796934865900384e-05,
+      "loss": 2.6168,
+      "step": 120
+    },
+    {
+      "epoch": 0.2498798654493032,
+      "grad_norm": 5.515899181365967,
+      "learning_rate": 2.4712643678160922e-05,
+      "loss": 2.6026,
+      "step": 130
+    },
+    {
+      "epoch": 0.2691013935607881,
+      "grad_norm": 5.406517028808594,
+      "learning_rate": 2.662835249042146e-05,
+      "loss": 2.4847,
+      "step": 140
+    },
+    {
+      "epoch": 0.28832292167227297,
+      "grad_norm": 5.749735355377197,
+      "learning_rate": 2.8544061302681996e-05,
+      "loss": 2.621,
+      "step": 150
+    },
+    {
+      "epoch": 0.30754444978375783,
+      "grad_norm": 5.63827657699585,
+      "learning_rate": 3.045977011494253e-05,
+      "loss": 2.5422,
+      "step": 160
+    },
+    {
+      "epoch": 0.3267659778952427,
+      "grad_norm": 7.321156024932861,
+      "learning_rate": 3.2375478927203066e-05,
+      "loss": 2.4498,
+      "step": 170
+    },
+    {
+      "epoch": 0.3459875060067275,
+      "grad_norm": 5.711870193481445,
+      "learning_rate": 3.4291187739463604e-05,
+      "loss": 2.4447,
+      "step": 180
+    },
+    {
+      "epoch": 0.3652090341182124,
+      "grad_norm": 6.159710884094238,
+      "learning_rate": 3.620689655172414e-05,
+      "loss": 2.4798,
+      "step": 190
+    },
+    {
+      "epoch": 0.38443056222969724,
+      "grad_norm": 6.255421161651611,
+      "learning_rate": 3.8122605363984674e-05,
+      "loss": 2.435,
+      "step": 200
+    },
+    {
+      "epoch": 0.4036520903411821,
+      "grad_norm": 6.48488187789917,
+      "learning_rate": 4.003831417624521e-05,
+      "loss": 2.43,
+      "step": 210
+    },
+    {
+      "epoch": 0.42287361845266697,
+      "grad_norm": 5.98501443862915,
+      "learning_rate": 4.195402298850575e-05,
+      "loss": 2.3629,
+      "step": 220
+    },
+    {
+      "epoch": 0.44209514656415183,
+      "grad_norm": 7.079067707061768,
+      "learning_rate": 4.386973180076628e-05,
+      "loss": 2.5268,
+      "step": 230
+    },
+    {
+      "epoch": 0.4613166746756367,
+      "grad_norm": 5.06460428237915,
+      "learning_rate": 4.578544061302682e-05,
+      "loss": 2.4671,
+      "step": 240
+    },
+    {
+      "epoch": 0.48053820278712156,
+      "grad_norm": 6.475998878479004,
+      "learning_rate": 4.770114942528736e-05,
+      "loss": 2.3595,
+      "step": 250
+    },
+    {
+      "epoch": 0.4997597308986064,
+      "grad_norm": 5.529846668243408,
+      "learning_rate": 4.96168582375479e-05,
+      "loss": 2.3635,
+      "step": 260
+    },
+    {
+      "epoch": 0.5189812590100913,
+      "grad_norm": 5.878045558929443,
+      "learning_rate": 4.982935153583618e-05,
+      "loss": 2.436,
+      "step": 270
+    },
+    {
+      "epoch": 0.5382027871215762,
+      "grad_norm": 5.845484256744385,
+      "learning_rate": 4.9616040955631404e-05,
+      "loss": 2.4051,
+      "step": 280
+    },
+    {
+      "epoch": 0.557424315233061,
+      "grad_norm": 6.933780670166016,
+      "learning_rate": 4.940273037542663e-05,
+      "loss": 2.3829,
+      "step": 290
+    },
+    {
+      "epoch": 0.5766458433445459,
+      "grad_norm": 6.303454399108887,
+      "learning_rate": 4.9189419795221845e-05,
+      "loss": 2.3613,
+      "step": 300
+    },
+    {
+      "epoch": 0.5958673714560307,
+      "grad_norm": 6.88358736038208,
+      "learning_rate": 4.897610921501707e-05,
+      "loss": 2.3223,
+      "step": 310
+    },
+    {
+      "epoch": 0.6150888995675157,
+      "grad_norm": 6.72720193862915,
+      "learning_rate": 4.876279863481229e-05,
+      "loss": 2.3706,
+      "step": 320
+    },
+    {
+      "epoch": 0.6343104276790005,
+      "grad_norm": 7.726202011108398,
+      "learning_rate": 4.854948805460751e-05,
+      "loss": 2.3666,
+      "step": 330
+    },
+    {
+      "epoch": 0.6535319557904854,
+      "grad_norm": 6.805552959442139,
+      "learning_rate": 4.8336177474402734e-05,
+      "loss": 2.3535,
+      "step": 340
+    },
+    {
+      "epoch": 0.6727534839019702,
+      "grad_norm": 5.095221996307373,
+      "learning_rate": 4.812286689419796e-05,
+      "loss": 2.2583,
+      "step": 350
+    },
+    {
+      "epoch": 0.691975012013455,
+      "grad_norm": 6.583714485168457,
+      "learning_rate": 4.7909556313993175e-05,
+      "loss": 2.2992,
+      "step": 360
+    },
+    {
+      "epoch": 0.7111965401249399,
+      "grad_norm": 5.969571113586426,
+      "learning_rate": 4.76962457337884e-05,
+      "loss": 2.3388,
+      "step": 370
+    },
+    {
+      "epoch": 0.7304180682364247,
+      "grad_norm": 7.320237636566162,
+      "learning_rate": 4.7482935153583616e-05,
+      "loss": 2.2875,
+      "step": 380
+    },
+    {
+      "epoch": 0.7496395963479097,
+      "grad_norm": 6.594917297363281,
+      "learning_rate": 4.726962457337884e-05,
+      "loss": 2.1744,
+      "step": 390
+    },
+    {
+      "epoch": 0.7688611244593945,
+      "grad_norm": 6.340092658996582,
+      "learning_rate": 4.7056313993174063e-05,
+      "loss": 2.2252,
+      "step": 400
+    },
+    {
+      "epoch": 0.7880826525708794,
+      "grad_norm": 6.270530700683594,
+      "learning_rate": 4.684300341296928e-05,
+      "loss": 2.2892,
+      "step": 410
+    },
+    {
+      "epoch": 0.8073041806823642,
+      "grad_norm": 6.28891134262085,
+      "learning_rate": 4.662969283276451e-05,
+      "loss": 2.3549,
+      "step": 420
+    },
+    {
+      "epoch": 0.8265257087938491,
+      "grad_norm": 6.432322978973389,
+      "learning_rate": 4.641638225255973e-05,
+      "loss": 2.2514,
+      "step": 430
+    },
+    {
+      "epoch": 0.8457472369053339,
+      "grad_norm": 6.2773613929748535,
+      "learning_rate": 4.620307167235495e-05,
+      "loss": 2.3056,
+      "step": 440
+    },
+    {
+      "epoch": 0.8649687650168189,
+      "grad_norm": 6.604885578155518,
+      "learning_rate": 4.5989761092150176e-05,
+      "loss": 2.2911,
+      "step": 450
+    },
+    {
+      "epoch": 0.8841902931283037,
+      "grad_norm": 5.717179775238037,
+      "learning_rate": 4.577645051194539e-05,
+      "loss": 2.3191,
+      "step": 460
+    },
+    {
+      "epoch": 0.9034118212397886,
+      "grad_norm": 6.134155750274658,
+      "learning_rate": 4.556313993174062e-05,
+      "loss": 2.2451,
+      "step": 470
+    },
+    {
+      "epoch": 0.9226333493512734,
+      "grad_norm": 6.287547588348389,
+      "learning_rate": 4.534982935153584e-05,
+      "loss": 2.2448,
+      "step": 480
+    },
+    {
+      "epoch": 0.9418548774627583,
+      "grad_norm": 5.859134197235107,
+      "learning_rate": 4.513651877133106e-05,
+      "loss": 2.1997,
+      "step": 490
+    },
+    {
+      "epoch": 0.9610764055742431,
+      "grad_norm": 6.595165729522705,
+      "learning_rate": 4.492320819112628e-05,
+      "loss": 2.1794,
+      "step": 500
+    },
+    {
+      "epoch": 0.980297933685728,
+      "grad_norm": 6.322437286376953,
+      "learning_rate": 4.4709897610921506e-05,
+      "loss": 2.2895,
+      "step": 510
+    },
+    {
+      "epoch": 0.9995194617972128,
+      "grad_norm": 7.458081245422363,
+      "learning_rate": 4.449658703071672e-05,
+      "loss": 2.1918,
+      "step": 520
+    },
+    {
+      "epoch": 1.0172993753003363,
+      "grad_norm": 6.502513885498047,
+      "learning_rate": 4.428327645051195e-05,
+      "loss": 2.0686,
+      "step": 530
+    },
+    {
+      "epoch": 1.0365209034118212,
+      "grad_norm": 5.94999885559082,
+      "learning_rate": 4.406996587030717e-05,
+      "loss": 1.9922,
+      "step": 540
+    },
+    {
+      "epoch": 1.0557424315233062,
+      "grad_norm": 6.013510704040527,
+      "learning_rate": 4.385665529010239e-05,
+      "loss": 1.9737,
+      "step": 550
+    },
+    {
+      "epoch": 1.074963959634791,
+      "grad_norm": 7.001447677612305,
+      "learning_rate": 4.364334470989762e-05,
+      "loss": 2.0489,
+      "step": 560
+    },
+    {
+      "epoch": 1.0941854877462758,
+      "grad_norm": 7.009434700012207,
+      "learning_rate": 4.3430034129692835e-05,
+      "loss": 2.0722,
+      "step": 570
+    },
+    {
+      "epoch": 1.1134070158577607,
+      "grad_norm": 6.154360294342041,
+      "learning_rate": 4.321672354948806e-05,
+      "loss": 2.1033,
+      "step": 580
+    },
+    {
+      "epoch": 1.1326285439692456,
+      "grad_norm": 7.929754734039307,
+      "learning_rate": 4.300341296928328e-05,
+      "loss": 2.0605,
+      "step": 590
+    },
+    {
+      "epoch": 1.1518500720807303,
+      "grad_norm": 5.974995136260986,
+      "learning_rate": 4.27901023890785e-05,
+      "loss": 1.9992,
+      "step": 600
+    },
+    {
+      "epoch": 1.1710716001922152,
+      "grad_norm": 6.380578994750977,
+      "learning_rate": 4.2576791808873724e-05,
+      "loss": 2.0509,
+      "step": 610
+    },
+    {
+      "epoch": 1.1902931283037002,
+      "grad_norm": 6.86224889755249,
+      "learning_rate": 4.236348122866894e-05,
+      "loss": 1.9994,
+      "step": 620
+    },
+    {
+      "epoch": 1.209514656415185,
+      "grad_norm": 7.081135272979736,
+      "learning_rate": 4.2150170648464165e-05,
+      "loss": 2.0567,
+      "step": 630
+    },
+    {
+      "epoch": 1.22873618452667,
+      "grad_norm": 6.908895492553711,
+      "learning_rate": 4.193686006825939e-05,
+      "loss": 1.9874,
+      "step": 640
+    },
+    {
+      "epoch": 1.2479577126381547,
+      "grad_norm": 7.030237674713135,
+      "learning_rate": 4.1723549488054606e-05,
+      "loss": 2.0383,
+      "step": 650
+    },
+    {
+      "epoch": 1.2671792407496396,
+      "grad_norm": 7.042236804962158,
+      "learning_rate": 4.151023890784983e-05,
+      "loss": 2.076,
+      "step": 660
+    },
+    {
+      "epoch": 1.2864007688611245,
+      "grad_norm": 7.262955188751221,
+      "learning_rate": 4.1296928327645054e-05,
+      "loss": 1.9956,
+      "step": 670
+    },
+    {
+      "epoch": 1.3056222969726092,
+      "grad_norm": 7.4197211265563965,
+      "learning_rate": 4.108361774744027e-05,
+      "loss": 2.0626,
+      "step": 680
+    },
+    {
+      "epoch": 1.3248438250840942,
+      "grad_norm": 8.185269355773926,
+      "learning_rate": 4.0870307167235495e-05,
+      "loss": 1.9757,
+      "step": 690
+    },
+    {
+      "epoch": 1.344065353195579,
+      "grad_norm": 7.224545001983643,
+      "learning_rate": 4.065699658703072e-05,
+      "loss": 2.0315,
+      "step": 700
+    },
+    {
+      "epoch": 1.363286881307064,
+      "grad_norm": 7.302128791809082,
+      "learning_rate": 4.044368600682594e-05,
+      "loss": 2.0296,
+      "step": 710
+    },
+    {
+      "epoch": 1.382508409418549,
+      "grad_norm": 7.413524150848389,
+      "learning_rate": 4.0230375426621166e-05,
+      "loss": 1.9541,
+      "step": 720
+    },
+    {
+      "epoch": 1.4017299375300336,
+      "grad_norm": 6.2689080238342285,
+      "learning_rate": 4.0017064846416383e-05,
+      "loss": 2.0365,
+      "step": 730
+    },
+    {
+      "epoch": 1.4209514656415185,
+      "grad_norm": 6.926391124725342,
+      "learning_rate": 3.980375426621161e-05,
+      "loss": 2.0117,
+      "step": 740
+    },
+    {
+      "epoch": 1.4401729937530034,
+      "grad_norm": 6.754932403564453,
+      "learning_rate": 3.959044368600683e-05,
+      "loss": 1.9882,
+      "step": 750
+    },
+    {
+      "epoch": 1.4593945218644881,
+      "grad_norm": 8.279367446899414,
+      "learning_rate": 3.937713310580205e-05,
+      "loss": 2.0513,
+      "step": 760
+    },
+    {
+      "epoch": 1.478616049975973,
+      "grad_norm": 7.346401691436768,
+      "learning_rate": 3.916382252559727e-05,
+      "loss": 2.0274,
+      "step": 770
+    },
+    {
+      "epoch": 1.497837578087458,
+      "grad_norm": 6.437380313873291,
+      "learning_rate": 3.8950511945392496e-05,
+      "loss": 1.9917,
+      "step": 780
+    },
+    {
+      "epoch": 1.5170591061989427,
+      "grad_norm": 6.856245994567871,
+      "learning_rate": 3.873720136518771e-05,
+      "loss": 2.0252,
+      "step": 790
+    },
+    {
+      "epoch": 1.5362806343104278,
+      "grad_norm": 7.144432067871094,
+      "learning_rate": 3.852389078498294e-05,
+      "loss": 2.1142,
+      "step": 800
+    },
+    {
+      "epoch": 1.5555021624219125,
+      "grad_norm": 6.338939189910889,
+      "learning_rate": 3.8310580204778154e-05,
+      "loss": 1.9595,
+      "step": 810
+    },
+    {
+      "epoch": 1.5747236905333974,
+      "grad_norm": 6.92732572555542,
+      "learning_rate": 3.809726962457338e-05,
+      "loss": 1.9964,
+      "step": 820
+    },
+    {
+      "epoch": 1.5939452186448824,
+      "grad_norm": 6.684772491455078,
+      "learning_rate": 3.78839590443686e-05,
+      "loss": 2.0192,
+      "step": 830
+    },
+    {
+      "epoch": 1.613166746756367,
+      "grad_norm": 6.988185882568359,
+      "learning_rate": 3.7670648464163826e-05,
+      "loss": 1.9899,
+      "step": 840
+    },
+    {
+      "epoch": 1.632388274867852,
+      "grad_norm": 7.449588775634766,
+      "learning_rate": 3.745733788395905e-05,
+      "loss": 1.9278,
+      "step": 850
+    },
+    {
+      "epoch": 1.651609802979337,
+      "grad_norm": 7.102870941162109,
+      "learning_rate": 3.724402730375427e-05,
+      "loss": 2.0357,
+      "step": 860
+    },
+    {
+      "epoch": 1.6708313310908216,
+      "grad_norm": 7.227121829986572,
+      "learning_rate": 3.703071672354949e-05,
+      "loss": 1.9708,
+      "step": 870
+    },
+    {
+      "epoch": 1.6900528592023067,
+      "grad_norm": 7.991628170013428,
+      "learning_rate": 3.6817406143344714e-05,
+      "loss": 1.9555,
+      "step": 880
+    },
+    {
+      "epoch": 1.7092743873137914,
+      "grad_norm": 7.285923957824707,
+      "learning_rate": 3.660409556313993e-05,
+      "loss": 1.9692,
+      "step": 890
+    },
+    {
+      "epoch": 1.7284959154252764,
+      "grad_norm": 6.6300201416015625,
+      "learning_rate": 3.6390784982935155e-05,
+      "loss": 1.9664,
+      "step": 900
+    },
+    {
+      "epoch": 1.7477174435367613,
+      "grad_norm": 7.206388473510742,
+      "learning_rate": 3.617747440273038e-05,
+      "loss": 1.9698,
+      "step": 910
+    },
+    {
+      "epoch": 1.766938971648246,
+      "grad_norm": 7.21373987197876,
+      "learning_rate": 3.5964163822525596e-05,
+      "loss": 1.9686,
+      "step": 920
+    },
+    {
+      "epoch": 1.786160499759731,
+      "grad_norm": 6.428858757019043,
+      "learning_rate": 3.575085324232082e-05,
+      "loss": 1.9851,
+      "step": 930
+    },
+    {
+      "epoch": 1.8053820278712158,
+      "grad_norm": 7.84316873550415,
+      "learning_rate": 3.5537542662116044e-05,
+      "loss": 1.9722,
+      "step": 940
+    },
+    {
+      "epoch": 1.8246035559827005,
+      "grad_norm": 7.756894588470459,
+      "learning_rate": 3.532423208191126e-05,
+      "loss": 1.9733,
+      "step": 950
+    },
+    {
+      "epoch": 1.8438250840941854,
+      "grad_norm": 6.9086480140686035,
+      "learning_rate": 3.5110921501706485e-05,
+      "loss": 1.9561,
+      "step": 960
+    },
+    {
+      "epoch": 1.8630466122056704,
+      "grad_norm": 7.4190802574157715,
+      "learning_rate": 3.48976109215017e-05,
+      "loss": 1.9004,
+      "step": 970
+    },
+    {
+      "epoch": 1.882268140317155,
+      "grad_norm": 6.598989963531494,
+      "learning_rate": 3.468430034129693e-05,
+      "loss": 1.9914,
+      "step": 980
+    },
+    {
+      "epoch": 1.9014896684286402,
+      "grad_norm": 7.208456516265869,
+      "learning_rate": 3.447098976109216e-05,
+      "loss": 1.9888,
+      "step": 990
+    },
+    {
+      "epoch": 1.920711196540125,
+      "grad_norm": 7.192534923553467,
+      "learning_rate": 3.4257679180887374e-05,
+      "loss": 1.9244,
+      "step": 1000
+    },
+    {
+      "epoch": 1.9399327246516098,
+      "grad_norm": 7.464297294616699,
+      "learning_rate": 3.40443686006826e-05,
+      "loss": 1.916,
+      "step": 1010
+    },
+    {
+      "epoch": 1.9591542527630947,
+      "grad_norm": 7.047414779663086,
+      "learning_rate": 3.383105802047782e-05,
+      "loss": 2.0045,
+      "step": 1020
+    },
+    {
+      "epoch": 1.9783757808745794,
+      "grad_norm": 7.436980724334717,
+      "learning_rate": 3.361774744027304e-05,
+      "loss": 2.015,
+      "step": 1030
+    },
+    {
+      "epoch": 1.9975973089860644,
+      "grad_norm": 6.135148525238037,
+      "learning_rate": 3.340443686006826e-05,
+      "loss": 1.9769,
+      "step": 1040
+    },
+    {
+      "epoch": 2.015377222489188,
+      "grad_norm": 6.220305442810059,
+      "learning_rate": 3.319112627986348e-05,
+      "loss": 1.7707,
+      "step": 1050
+    },
+    {
+      "epoch": 2.0345987506006726,
+      "grad_norm": 8.670418739318848,
+      "learning_rate": 3.2977815699658704e-05,
+      "loss": 1.7048,
+      "step": 1060
+    },
+    {
+      "epoch": 2.0538202787121578,
+      "grad_norm": 7.973430633544922,
+      "learning_rate": 3.276450511945393e-05,
+      "loss": 1.7753,
+      "step": 1070
+    },
+    {
+      "epoch": 2.0730418068236425,
+      "grad_norm": 7.165556907653809,
+      "learning_rate": 3.2551194539249145e-05,
+      "loss": 1.7184,
+      "step": 1080
+    },
+    {
+      "epoch": 2.092263334935127,
+      "grad_norm": 8.48978042602539,
+      "learning_rate": 3.233788395904437e-05,
+      "loss": 1.7799,
+      "step": 1090
+    },
+    {
+      "epoch": 2.1114848630466123,
+      "grad_norm": 7.75386905670166,
+      "learning_rate": 3.212457337883959e-05,
+      "loss": 1.7363,
+      "step": 1100
+    },
+    {
+      "epoch": 2.130706391158097,
+      "grad_norm": 7.61426305770874,
+      "learning_rate": 3.191126279863481e-05,
+      "loss": 1.7388,
+      "step": 1110
+    },
+    {
+      "epoch": 2.149927919269582,
+      "grad_norm": 8.537379264831543,
+      "learning_rate": 3.169795221843004e-05,
+      "loss": 1.7217,
+      "step": 1120
+    },
+    {
+      "epoch": 2.169149447381067,
+      "grad_norm": 8.12486457824707,
+      "learning_rate": 3.148464163822526e-05,
+      "loss": 1.7396,
+      "step": 1130
+    },
+    {
+      "epoch": 2.1883709754925516,
+      "grad_norm": 9.46800422668457,
+      "learning_rate": 3.127133105802048e-05,
+      "loss": 1.7526,
+      "step": 1140
+    },
+    {
+      "epoch": 2.2075925036040367,
+      "grad_norm": 7.893990993499756,
+      "learning_rate": 3.1058020477815705e-05,
+      "loss": 1.7482,
+      "step": 1150
+    },
+    {
+      "epoch": 2.2268140317155214,
+      "grad_norm": 8.076356887817383,
+      "learning_rate": 3.084470989761092e-05,
+      "loss": 1.7622,
+      "step": 1160
+    },
+    {
+      "epoch": 2.246035559827006,
+      "grad_norm": 7.560849666595459,
+      "learning_rate": 3.0631399317406146e-05,
+      "loss": 1.789,
+      "step": 1170
+    },
+    {
+      "epoch": 2.2652570879384912,
+      "grad_norm": 7.872079372406006,
+      "learning_rate": 3.0418088737201366e-05,
+      "loss": 1.7213,
+      "step": 1180
+    },
+    {
+      "epoch": 2.284478616049976,
+      "grad_norm": 9.256887435913086,
+      "learning_rate": 3.0204778156996587e-05,
+      "loss": 1.7497,
+      "step": 1190
+    },
+    {
+      "epoch": 2.3037001441614606,
+      "grad_norm": 8.256994247436523,
+      "learning_rate": 2.999146757679181e-05,
+      "loss": 1.7168,
+      "step": 1200
+    },
+    {
+      "epoch": 2.3229216722729458,
+      "grad_norm": 8.07515811920166,
+      "learning_rate": 2.977815699658703e-05,
+      "loss": 1.7714,
+      "step": 1210
+    },
+    {
+      "epoch": 2.3421432003844305,
+      "grad_norm": 7.773881435394287,
+      "learning_rate": 2.956484641638225e-05,
+      "loss": 1.7627,
+      "step": 1220
+    },
+    {
+      "epoch": 2.3613647284959156,
+      "grad_norm": 9.31999397277832,
+      "learning_rate": 2.9351535836177476e-05,
+      "loss": 1.7635,
+      "step": 1230
+    },
+    {
+      "epoch": 2.3805862566074003,
+      "grad_norm": 8.094911575317383,
+      "learning_rate": 2.9138225255972696e-05,
+      "loss": 1.8119,
+      "step": 1240
+    },
+    {
+      "epoch": 2.399807784718885,
+      "grad_norm": 8.966656684875488,
+      "learning_rate": 2.8924914675767916e-05,
+      "loss": 1.7844,
+      "step": 1250
+    },
+    {
+      "epoch": 2.41902931283037,
+      "grad_norm": 8.657224655151367,
+      "learning_rate": 2.8711604095563144e-05,
+      "loss": 1.7849,
+      "step": 1260
+    },
+    {
+      "epoch": 2.438250840941855,
+      "grad_norm": 8.335482597351074,
+      "learning_rate": 2.8498293515358364e-05,
+      "loss": 1.6967,
+      "step": 1270
+    },
+    {
+      "epoch": 2.45747236905334,
+      "grad_norm": 8.769615173339844,
+      "learning_rate": 2.8284982935153588e-05,
+      "loss": 1.8389,
+      "step": 1280
+    },
+    {
+      "epoch": 2.4766938971648247,
+      "grad_norm": 7.692858695983887,
+      "learning_rate": 2.807167235494881e-05,
+      "loss": 1.7602,
+      "step": 1290
+    },
+    {
+      "epoch": 2.4959154252763094,
+      "grad_norm": 7.612323760986328,
+      "learning_rate": 2.785836177474403e-05,
+      "loss": 1.8411,
+      "step": 1300
+    },
+    {
+      "epoch": 2.5151369533877945,
+      "grad_norm": 9.097857475280762,
+      "learning_rate": 2.764505119453925e-05,
+      "loss": 1.7737,
+      "step": 1310
+    },
+    {
+      "epoch": 2.5343584814992792,
+      "grad_norm": 9.11468505859375,
+      "learning_rate": 2.7431740614334473e-05,
+      "loss": 1.8269,
+      "step": 1320
+    },
+    {
+      "epoch": 2.553580009610764,
+      "grad_norm": 9.77290153503418,
+      "learning_rate": 2.7218430034129694e-05,
+      "loss": 1.7671,
+      "step": 1330
+    },
+    {
+      "epoch": 2.572801537722249,
+      "grad_norm": 8.101873397827148,
+      "learning_rate": 2.7005119453924914e-05,
+      "loss": 1.6804,
+      "step": 1340
+    },
+    {
+      "epoch": 2.5920230658337338,
+      "grad_norm": 8.107222557067871,
+      "learning_rate": 2.6791808873720138e-05,
+      "loss": 1.7327,
+      "step": 1350
+    },
+    {
+      "epoch": 2.6112445939452185,
+      "grad_norm": 7.96660852432251,
+      "learning_rate": 2.657849829351536e-05,
+      "loss": 1.7532,
+      "step": 1360
+    },
+    {
+      "epoch": 2.6304661220567036,
+      "grad_norm": 8.748746871948242,
+      "learning_rate": 2.636518771331058e-05,
+      "loss": 1.7216,
+      "step": 1370
+    },
+    {
+      "epoch": 2.6496876501681883,
+      "grad_norm": 8.459169387817383,
+      "learning_rate": 2.61518771331058e-05,
+      "loss": 1.755,
+      "step": 1380
+    },
+    {
+      "epoch": 2.668909178279673,
+      "grad_norm": 8.21006965637207,
+      "learning_rate": 2.5938566552901024e-05,
+      "loss": 1.7775,
+      "step": 1390
+    },
+    {
+      "epoch": 2.688130706391158,
+      "grad_norm": 7.608190059661865,
+      "learning_rate": 2.572525597269625e-05,
+      "loss": 1.6885,
+      "step": 1400
+    },
+    {
+      "epoch": 2.707352234502643,
+      "grad_norm": 7.753986358642578,
+      "learning_rate": 2.551194539249147e-05,
+      "loss": 1.7477,
+      "step": 1410
+    },
+    {
+      "epoch": 2.726573762614128,
+      "grad_norm": 9.139946937561035,
+      "learning_rate": 2.5298634812286692e-05,
+      "loss": 1.7118,
+      "step": 1420
+    },
+    {
+      "epoch": 2.7457952907256127,
+      "grad_norm": 8.321805000305176,
+      "learning_rate": 2.5085324232081912e-05,
+      "loss": 1.7373,
+      "step": 1430
+    },
+    {
+      "epoch": 2.765016818837098,
+      "grad_norm": 6.919346809387207,
+      "learning_rate": 2.4872013651877136e-05,
+      "loss": 1.6811,
+      "step": 1440
+    },
+    {
+      "epoch": 2.7842383469485825,
+      "grad_norm": 8.567264556884766,
+      "learning_rate": 2.4658703071672357e-05,
+      "loss": 1.7402,
+      "step": 1450
+    },
+    {
+      "epoch": 2.803459875060067,
+      "grad_norm": 8.170304298400879,
+      "learning_rate": 2.4445392491467577e-05,
+      "loss": 1.8064,
+      "step": 1460
+    },
+    {
+      "epoch": 2.8226814031715524,
+      "grad_norm": 8.044082641601562,
+      "learning_rate": 2.42320819112628e-05,
+      "loss": 1.6772,
+      "step": 1470
+    },
+    {
+      "epoch": 2.841902931283037,
+      "grad_norm": 8.453600883483887,
+      "learning_rate": 2.401877133105802e-05,
+      "loss": 1.7352,
+      "step": 1480
+    },
+    {
+      "epoch": 2.8611244593945218,
+      "grad_norm": 8.70590591430664,
+      "learning_rate": 2.3805460750853242e-05,
+      "loss": 1.7539,
+      "step": 1490
+    },
+    {
+      "epoch": 2.880345987506007,
+      "grad_norm": 8.303061485290527,
+      "learning_rate": 2.3592150170648466e-05,
+      "loss": 1.7281,
+      "step": 1500
+    },
+    {
+      "epoch": 2.8995675156174916,
+      "grad_norm": 7.61099100112915,
+      "learning_rate": 2.3378839590443686e-05,
+      "loss": 1.6915,
+      "step": 1510
+    },
+    {
+      "epoch": 2.9187890437289763,
+      "grad_norm": 9.040791511535645,
+      "learning_rate": 2.316552901023891e-05,
+      "loss": 1.7099,
+      "step": 1520
+    },
+    {
+      "epoch": 2.9380105718404614,
+      "grad_norm": 8.227042198181152,
+      "learning_rate": 2.295221843003413e-05,
+      "loss": 1.7134,
+      "step": 1530
+    },
+    {
+      "epoch": 2.957232099951946,
+      "grad_norm": 7.818427085876465,
+      "learning_rate": 2.273890784982935e-05,
+      "loss": 1.7652,
+      "step": 1540
+    },
+    {
+      "epoch": 2.976453628063431,
+      "grad_norm": 7.761089324951172,
+      "learning_rate": 2.2525597269624575e-05,
+      "loss": 1.673,
+      "step": 1550
+    },
+    {
+      "epoch": 2.995675156174916,
+      "grad_norm": 9.148423194885254,
+      "learning_rate": 2.2312286689419796e-05,
+      "loss": 1.7545,
+      "step": 1560
+    },
+    {
+      "epoch": 3.0134550696780393,
+      "grad_norm": 6.961498260498047,
+      "learning_rate": 2.209897610921502e-05,
+      "loss": 1.6289,
+      "step": 1570
+    },
+    {
+      "epoch": 3.0326765977895245,
+      "grad_norm": 8.595748901367188,
+      "learning_rate": 2.188566552901024e-05,
+      "loss": 1.5369,
+      "step": 1580
+    },
+    {
+      "epoch": 3.051898125901009,
+      "grad_norm": 8.743279457092285,
+      "learning_rate": 2.1672354948805464e-05,
+      "loss": 1.5447,
+      "step": 1590
+    },
+    {
+      "epoch": 3.071119654012494,
+      "grad_norm": 7.540672302246094,
+      "learning_rate": 2.1459044368600684e-05,
+      "loss": 1.5113,
+      "step": 1600
+    },
+    {
+      "epoch": 3.090341182123979,
+      "grad_norm": 9.278829574584961,
+      "learning_rate": 2.1245733788395905e-05,
+      "loss": 1.5178,
+      "step": 1610
+    },
+    {
+      "epoch": 3.1095627102354637,
+      "grad_norm": 9.872568130493164,
+      "learning_rate": 2.1032423208191125e-05,
+      "loss": 1.5606,
+      "step": 1620
+    },
+    {
+      "epoch": 3.1287842383469484,
+      "grad_norm": 8.62325668334961,
+      "learning_rate": 2.081911262798635e-05,
+      "loss": 1.4871,
+      "step": 1630
+    },
+    {
+      "epoch": 3.1480057664584336,
+      "grad_norm": 8.459789276123047,
+      "learning_rate": 2.0605802047781573e-05,
+      "loss": 1.5561,
+      "step": 1640
+    },
+    {
+      "epoch": 3.1672272945699183,
+      "grad_norm": 9.252559661865234,
+      "learning_rate": 2.0392491467576794e-05,
+      "loss": 1.5595,
+      "step": 1650
+    },
+    {
+      "epoch": 3.1864488226814034,
+      "grad_norm": 9.68213939666748,
+      "learning_rate": 2.0179180887372014e-05,
+      "loss": 1.5517,
+      "step": 1660
+    },
+    {
+      "epoch": 3.205670350792888,
+      "grad_norm": 9.817381858825684,
+      "learning_rate": 1.9965870307167238e-05,
+      "loss": 1.4966,
+      "step": 1670
+    },
+    {
+      "epoch": 3.224891878904373,
+      "grad_norm": 9.264698028564453,
+      "learning_rate": 1.975255972696246e-05,
+      "loss": 1.5443,
+      "step": 1680
+    },
+    {
+      "epoch": 3.244113407015858,
+      "grad_norm": 8.848346710205078,
+      "learning_rate": 1.953924914675768e-05,
+      "loss": 1.4929,
+      "step": 1690
+    },
+    {
+      "epoch": 3.2633349351273426,
+      "grad_norm": 9.325592994689941,
+      "learning_rate": 1.93259385665529e-05,
+      "loss": 1.5259,
+      "step": 1700
+    },
+    {
+      "epoch": 3.2825564632388273,
+      "grad_norm": 10.281765937805176,
+      "learning_rate": 1.9112627986348127e-05,
+      "loss": 1.5497,
+      "step": 1710
+    },
+    {
+      "epoch": 3.3017779913503125,
+      "grad_norm": 8.460485458374023,
+      "learning_rate": 1.8899317406143347e-05,
+      "loss": 1.5051,
+      "step": 1720
+    },
+    {
+      "epoch": 3.320999519461797,
+      "grad_norm": 9.479697227478027,
+      "learning_rate": 1.8686006825938568e-05,
+      "loss": 1.5353,
+      "step": 1730
+    },
+    {
+      "epoch": 3.340221047573282,
+      "grad_norm": 8.678592681884766,
+      "learning_rate": 1.8472696245733788e-05,
+      "loss": 1.5865,
+      "step": 1740
+    },
+    {
+      "epoch": 3.359442575684767,
+      "grad_norm": 9.709846496582031,
+      "learning_rate": 1.8259385665529012e-05,
+      "loss": 1.5572,
+      "step": 1750
+    },
+    {
+      "epoch": 3.3786641037962517,
+      "grad_norm": 9.438785552978516,
+      "learning_rate": 1.8046075085324232e-05,
+      "loss": 1.5867,
+      "step": 1760
+    },
+    {
+      "epoch": 3.397885631907737,
+      "grad_norm": 8.967934608459473,
+      "learning_rate": 1.7832764505119453e-05,
+      "loss": 1.49,
+      "step": 1770
+    },
+    {
+      "epoch": 3.4171071600192215,
+      "grad_norm": 9.989371299743652,
+      "learning_rate": 1.7619453924914677e-05,
+      "loss": 1.5461,
+      "step": 1780
+    },
+    {
+      "epoch": 3.4363286881307062,
+      "grad_norm": 8.968268394470215,
+      "learning_rate": 1.74061433447099e-05,
+      "loss": 1.6002,
+      "step": 1790
+    },
+    {
+      "epoch": 3.4555502162421914,
+      "grad_norm": 9.105525016784668,
+      "learning_rate": 1.719283276450512e-05,
+      "loss": 1.5934,
+      "step": 1800
+    },
+    {
+      "epoch": 3.474771744353676,
+      "grad_norm": 8.997435569763184,
+      "learning_rate": 1.697952218430034e-05,
+      "loss": 1.5687,
+      "step": 1810
+    },
+    {
+      "epoch": 3.4939932724651612,
+      "grad_norm": 9.625081062316895,
+      "learning_rate": 1.6766211604095562e-05,
+      "loss": 1.5684,
+      "step": 1820
+    },
+    {
+      "epoch": 3.513214800576646,
+      "grad_norm": 8.427741050720215,
+      "learning_rate": 1.6552901023890786e-05,
+      "loss": 1.4639,
+      "step": 1830
+    },
+    {
+      "epoch": 3.5324363286881306,
+      "grad_norm": 10.164386749267578,
+      "learning_rate": 1.6339590443686006e-05,
+      "loss": 1.5512,
+      "step": 1840
+    },
+    {
+      "epoch": 3.5516578567996158,
+      "grad_norm": 9.009414672851562,
+      "learning_rate": 1.612627986348123e-05,
+      "loss": 1.5127,
+      "step": 1850
+    },
+    {
+      "epoch": 3.5708793849111005,
+      "grad_norm": 11.138776779174805,
+      "learning_rate": 1.591296928327645e-05,
+      "loss": 1.4763,
+      "step": 1860
+    },
+    {
+      "epoch": 3.590100913022585,
+      "grad_norm": 8.983370780944824,
+      "learning_rate": 1.5699658703071675e-05,
+      "loss": 1.5661,
+      "step": 1870
+    },
+    {
+      "epoch": 3.6093224411340703,
+      "grad_norm": 9.458341598510742,
+      "learning_rate": 1.5486348122866895e-05,
+      "loss": 1.5151,
+      "step": 1880
+    },
+    {
+      "epoch": 3.628543969245555,
+      "grad_norm": 9.019761085510254,
+      "learning_rate": 1.5273037542662116e-05,
+      "loss": 1.5646,
+      "step": 1890
+    },
+    {
+      "epoch": 3.6477654973570397,
+      "grad_norm": 9.247156143188477,
+      "learning_rate": 1.5059726962457338e-05,
+      "loss": 1.5257,
+      "step": 1900
+    },
+    {
+      "epoch": 3.666987025468525,
+      "grad_norm": 9.937666893005371,
+      "learning_rate": 1.484641638225256e-05,
+      "loss": 1.5934,
+      "step": 1910
+    },
+    {
+      "epoch": 3.6862085535800095,
+      "grad_norm": 8.887203216552734,
+      "learning_rate": 1.4633105802047784e-05,
+      "loss": 1.5373,
+      "step": 1920
+    },
+    {
+      "epoch": 3.7054300816914942,
+      "grad_norm": 8.323814392089844,
+      "learning_rate": 1.4419795221843004e-05,
+      "loss": 1.5178,
+      "step": 1930
+    },
+    {
+      "epoch": 3.7246516098029794,
+      "grad_norm": 8.78007984161377,
+      "learning_rate": 1.4206484641638227e-05,
+      "loss": 1.5283,
+      "step": 1940
+    },
+    {
+      "epoch": 3.743873137914464,
+      "grad_norm": 10.119393348693848,
+      "learning_rate": 1.3993174061433447e-05,
+      "loss": 1.5611,
+      "step": 1950
+    },
+    {
+      "epoch": 3.763094666025949,
+      "grad_norm": 8.493330001831055,
+      "learning_rate": 1.377986348122867e-05,
+      "loss": 1.5285,
+      "step": 1960
+    },
+    {
+      "epoch": 3.782316194137434,
+      "grad_norm": 9.496063232421875,
+      "learning_rate": 1.3566552901023891e-05,
+      "loss": 1.4765,
+      "step": 1970
+    },
+    {
+      "epoch": 3.801537722248919,
+      "grad_norm": 9.950116157531738,
+      "learning_rate": 1.3353242320819112e-05,
+      "loss": 1.522,
+      "step": 1980
+    },
+    {
+      "epoch": 3.8207592503604038,
+      "grad_norm": 9.816246032714844,
+      "learning_rate": 1.3139931740614336e-05,
+      "loss": 1.5085,
+      "step": 1990
+    },
+    {
+      "epoch": 3.8399807784718885,
+      "grad_norm": 9.766254425048828,
+      "learning_rate": 1.2926621160409558e-05,
+      "loss": 1.6145,
+      "step": 2000
+    },
+    {
+      "epoch": 3.8592023065833736,
+      "grad_norm": 9.258591651916504,
+      "learning_rate": 1.2713310580204778e-05,
+      "loss": 1.5292,
+      "step": 2010
+    },
+    {
+      "epoch": 3.8784238346948583,
+      "grad_norm": 9.976773262023926,
+      "learning_rate": 1.25e-05,
+      "loss": 1.5811,
+      "step": 2020
+    },
+    {
+      "epoch": 3.897645362806343,
+      "grad_norm": 9.001440048217773,
+      "learning_rate": 1.2286689419795223e-05,
+      "loss": 1.5401,
+      "step": 2030
+    },
+    {
+      "epoch": 3.916866890917828,
+      "grad_norm": 10.00130558013916,
+      "learning_rate": 1.2073378839590445e-05,
+      "loss": 1.6257,
+      "step": 2040
+    },
+    {
+      "epoch": 3.936088419029313,
+      "grad_norm": 8.235420227050781,
+      "learning_rate": 1.1860068259385667e-05,
+      "loss": 1.4888,
+      "step": 2050
+    },
+    {
+      "epoch": 3.9553099471407975,
+      "grad_norm": 9.468134880065918,
+      "learning_rate": 1.1646757679180888e-05,
+      "loss": 1.6077,
+      "step": 2060
+    },
+    {
+      "epoch": 3.9745314752522827,
+      "grad_norm": 8.448241233825684,
+      "learning_rate": 1.143344709897611e-05,
+      "loss": 1.5568,
+      "step": 2070
+    },
+    {
+      "epoch": 3.9937530033637674,
+      "grad_norm": 9.615592956542969,
+      "learning_rate": 1.1220136518771332e-05,
+      "loss": 1.6119,
+      "step": 2080
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2605,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7489484208488448.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
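
Reviewer note on the `learning_rate` values in `log_history`: they look consistent with a linear warmup followed by a linear decay to zero at `max_steps`. The sketch below reproduces a few of the logged values. The peak rate (5e-5), the warmup length (261 steps), and the one-step lag between the logged step and the scheduler step are inferred from the numbers in this diff rather than read from `training_args.bin`, so treat all three as assumptions.

```python
PEAK_LR = 5e-5       # assumption: inferred from the logged values
WARMUP_STEPS = 261   # assumption: inferred from the early log entries
MAX_STEPS = 2605     # "max_steps" in trainer_state.json


def linear_schedule_lr(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - scheduler_step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


def logged_lr(global_step: int) -> float:
    """Value expected in log_history at global_step (assumed one-step lag)."""
    return linear_schedule_lr(global_step - 1)


# (step, learning_rate) pairs copied from log_history
logged = {
    1080: 3.2551194539249145e-05,
    2020: 1.25e-05,
    2080: 1.1220136518771332e-05,
}

for step, lr in logged.items():
    # Each logged value should match the reconstructed schedule closely.
    assert abs(logged_lr(step) - lr) < 1e-12, step
```

If the assumptions hold, the same function can be used to sanity-check any other entry in the log, for example when comparing this checkpoint against `checkpoint-2605`.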

checkpoint-2084/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f3f2c51094de7b2e8fb357753c21db50fbd28d5b3e8bc9181e6be1839b36076
+size 5713

checkpoint-2605/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: Henit007/modelo3_finetuned
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Henit007/modelo3_finetuned
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.0

checkpoint-2605/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Henit007/modelo3_finetuned",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "o_proj",
+    "down_proj",
+    "q_proj",
+    "gate_proj",
+    "k_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
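
Reviewer note on the LoRA settings above: with `lora_alpha` 64, `r` 8, and `use_rslora` false, the low-rank update is scaled by `lora_alpha / r`. The sketch below computes that factor and the number of parameters LoRA adds to one linear layer. The layer dimensions in the usage lines are hypothetical, since the base model's sizes are not part of this diff, and the rank-stabilized branch (`lora_alpha / sqrt(r)`) is included only for comparison.

```python
import json
import math

# Subset of the adapter_config.json fields shown in this diff.
config = json.loads(
    '{"lora_alpha": 64, "r": 8, "use_rslora": false, "lora_dropout": 0.05}'
)


def lora_scaling(cfg: dict) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    if cfg.get("use_rslora", False):
        # Rank-stabilized variant: alpha / sqrt(r)
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = lora_scaling(config)

# Hypothetical 2048 x 2048 projection, only to illustrate the formula.
extra_params = lora_param_count(2048, 2048, config["r"])
```

A scaling factor well above 1 means the adapter's contribution is amplified relative to its raw weights, which is worth keeping in mind when comparing runs that use a different `r`.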

checkpoint-2605/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6280608e290ab7bb9c3e55824440dff2dede7c094c9bc557b288875ed8393276
+size 25271744
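
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. The pointer records the spec version, a SHA-256 object id, and the size in bytes of the real file. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6280608e290ab7bb9c3e55824440dff2dede7c094c9bc557b288875ed8393276
size 25271744
"""

info = parse_lfs_pointer(pointer)
size_mb = info["size"] / 1e6  # adapter size in megabytes (decimal)
```

Comparing `info["digest"]` with a locally computed SHA-256 of the downloaded file is one way to confirm the download is intact.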

checkpoint-2605/chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}
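
Reviewer note: the template above wraps each turn in a role tag, a newline, the message content, and the end-of-sequence token, then optionally appends `<|assistant|>` after the last message. The function below is a plain-Python approximation of that layout, not a Jinja renderer. The trailing newline after each turn assumes the template is rendered with block trimming enabled, and `render_chat` and `ROLE_TAGS` are hypothetical names introduced for this note.

```python
EOS_TOKEN = "</s>"  # eos_token as listed in this commit's special_tokens_map.json
ROLE_TAGS = {
    "user": "<|user|>",
    "system": "<|system|>",
    "assistant": "<|assistant|>",
}


def render_chat(messages, add_generation_prompt=False, eos_token=EOS_TOKEN):
    """Approximate the role-tagged prompt layout of chat_template.jinja."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is not None:  # the template emits nothing for other roles
            parts.append(f"{tag}\n{message['content']}{eos_token}\n")
    # The template only adds the generation prompt inside the loop, on the
    # last message, so an empty conversation produces no prompt at all.
    if add_generation_prompt and messages:
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Hi"},
    ],
    add_generation_prompt=True,
)
```

For exact whitespace, rendering the real template through the tokenizer is the safer route; this sketch is only meant to make the turn structure easy to read.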

checkpoint-2605/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44efac9724b679fbf00282e4739633111e8aed26bffb7b219da68e04dcc25588
+size 13686237

checkpoint-2605/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee19345623931c9b7735e76ec4cc9cef226f971e10e0257bdec47621a7f6d808
+size 14645

checkpoint-2605/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:270a5716ceff542c84591c8b92fd5b1dadcc0ddd8d45642aff598977b4e2a576
+size 1465

checkpoint-2605/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-2605/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-2605/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizerFast",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}

checkpoint-2605/trainer_state.json ADDED

	@@ -0,0 +1,1854 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.0,
+  "eval_steps": 500,
+  "global_step": 2605,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.019221528111484865,
+      "grad_norm": 4.911402702331543,
+      "learning_rate": 1.724137931034483e-06,
+      "loss": 2.1561,
+      "step": 10
+    },
+    {
+      "epoch": 0.03844305622296973,
+      "grad_norm": 4.560591697692871,
+      "learning_rate": 3.6398467432950196e-06,
+      "loss": 2.1368,
+      "step": 20
+    },
+    {
+      "epoch": 0.05766458433445459,
+      "grad_norm": 6.820058345794678,
+      "learning_rate": 5.555555555555556e-06,
+      "loss": 2.3192,
+      "step": 30
+    },
+    {
+      "epoch": 0.07688611244593946,
+      "grad_norm": 7.230356693267822,
+      "learning_rate": 7.4712643678160925e-06,
+      "loss": 2.6544,
+      "step": 40
+    },
+    {
+      "epoch": 0.09610764055742431,
+      "grad_norm": 6.225106716156006,
+      "learning_rate": 9.386973180076629e-06,
+      "loss": 2.6349,
+      "step": 50
+    },
+    {
+      "epoch": 0.11532916866890917,
+      "grad_norm": 5.758670330047607,
+      "learning_rate": 1.1302681992337164e-05,
+      "loss": 2.6164,
+      "step": 60
+    },
+    {
+      "epoch": 0.13455069678039405,
+      "grad_norm": 6.433386325836182,
+      "learning_rate": 1.3218390804597702e-05,
+      "loss": 2.6651,
+      "step": 70
+    },
+    {
+      "epoch": 0.15377222489187892,
+      "grad_norm": 6.204464435577393,
+      "learning_rate": 1.5134099616858237e-05,
+      "loss": 2.5232,
+      "step": 80
+    },
+    {
+      "epoch": 0.17299375300336375,
+      "grad_norm": 6.514060974121094,
+      "learning_rate": 1.7049808429118776e-05,
+      "loss": 2.5878,
+      "step": 90
+    },
+    {
+      "epoch": 0.19221528111484862,
+      "grad_norm": 5.641262054443359,
+      "learning_rate": 1.896551724137931e-05,
+      "loss": 2.5566,
+      "step": 100
+    },
+    {
+      "epoch": 0.21143680922633348,
+      "grad_norm": 6.221690654754639,
+      "learning_rate": 2.088122605363985e-05,
+      "loss": 2.6939,
+      "step": 110
+    },
+    {
+      "epoch": 0.23065833733781835,
+      "grad_norm": 5.924990177154541,
+      "learning_rate": 2.2796934865900384e-05,
+      "loss": 2.6168,
+      "step": 120
+    },
+    {
+      "epoch": 0.2498798654493032,
+      "grad_norm": 5.515899181365967,
+      "learning_rate": 2.4712643678160922e-05,
+      "loss": 2.6026,
+      "step": 130
+    },
+    {
+      "epoch": 0.2691013935607881,
+      "grad_norm": 5.406517028808594,
+      "learning_rate": 2.662835249042146e-05,
+      "loss": 2.4847,
+      "step": 140
+    },
+    {
+      "epoch": 0.28832292167227297,
+      "grad_norm": 5.749735355377197,
+      "learning_rate": 2.8544061302681996e-05,
+      "loss": 2.621,
+      "step": 150
+    },
+    {
+      "epoch": 0.30754444978375783,
+      "grad_norm": 5.63827657699585,
+      "learning_rate": 3.045977011494253e-05,
+      "loss": 2.5422,
+      "step": 160
+    },
+    {
+      "epoch": 0.3267659778952427,
+      "grad_norm": 7.321156024932861,
+      "learning_rate": 3.2375478927203066e-05,
+      "loss": 2.4498,
+      "step": 170
+    },
+    {
+      "epoch": 0.3459875060067275,
+      "grad_norm": 5.711870193481445,
+      "learning_rate": 3.4291187739463604e-05,
+      "loss": 2.4447,
+      "step": 180
+    },
+    {
+      "epoch": 0.3652090341182124,
+      "grad_norm": 6.159710884094238,
+      "learning_rate": 3.620689655172414e-05,
+      "loss": 2.4798,
+      "step": 190
+    },
+    {
+      "epoch": 0.38443056222969724,
+      "grad_norm": 6.255421161651611,
+      "learning_rate": 3.8122605363984674e-05,
+      "loss": 2.435,
+      "step": 200
+    },
+    {
+      "epoch": 0.4036520903411821,
+      "grad_norm": 6.48488187789917,
+      "learning_rate": 4.003831417624521e-05,
+      "loss": 2.43,
+      "step": 210
+    },
+    {
+      "epoch": 0.42287361845266697,
+      "grad_norm": 5.98501443862915,
+      "learning_rate": 4.195402298850575e-05,
+      "loss": 2.3629,
+      "step": 220
+    },
+    {
+      "epoch": 0.44209514656415183,
+      "grad_norm": 7.079067707061768,
+      "learning_rate": 4.386973180076628e-05,
+      "loss": 2.5268,
+      "step": 230
+    },
+    {
+      "epoch": 0.4613166746756367,
+      "grad_norm": 5.06460428237915,
+      "learning_rate": 4.578544061302682e-05,
+      "loss": 2.4671,
+      "step": 240
+    },
+    {
+      "epoch": 0.48053820278712156,
+      "grad_norm": 6.475998878479004,
+      "learning_rate": 4.770114942528736e-05,
+      "loss": 2.3595,
+      "step": 250
+    },
+    {
+      "epoch": 0.4997597308986064,
+      "grad_norm": 5.529846668243408,
+      "learning_rate": 4.96168582375479e-05,
+      "loss": 2.3635,
+      "step": 260
+    },
+    {
+      "epoch": 0.5189812590100913,
+      "grad_norm": 5.878045558929443,
+      "learning_rate": 4.982935153583618e-05,
+      "loss": 2.436,
+      "step": 270
+    },
+    {
+      "epoch": 0.5382027871215762,
+      "grad_norm": 5.845484256744385,
+      "learning_rate": 4.9616040955631404e-05,
+      "loss": 2.4051,
+      "step": 280
+    },
+    {
+      "epoch": 0.557424315233061,
+      "grad_norm": 6.933780670166016,
+      "learning_rate": 4.940273037542663e-05,
+      "loss": 2.3829,
+      "step": 290
+    },
+    {
+      "epoch": 0.5766458433445459,
+      "grad_norm": 6.303454399108887,
+      "learning_rate": 4.9189419795221845e-05,
+      "loss": 2.3613,
+      "step": 300
+    },
+    {
+      "epoch": 0.5958673714560307,
+      "grad_norm": 6.88358736038208,
+      "learning_rate": 4.897610921501707e-05,
+      "loss": 2.3223,
+      "step": 310
+    },
+    {
+      "epoch": 0.6150888995675157,
+      "grad_norm": 6.72720193862915,
+      "learning_rate": 4.876279863481229e-05,
+      "loss": 2.3706,
+      "step": 320
+    },
+    {
+      "epoch": 0.6343104276790005,
+      "grad_norm": 7.726202011108398,
+      "learning_rate": 4.854948805460751e-05,
+      "loss": 2.3666,
+      "step": 330
+    },
+    {
+      "epoch": 0.6535319557904854,
+      "grad_norm": 6.805552959442139,
+      "learning_rate": 4.8336177474402734e-05,
+      "loss": 2.3535,
+      "step": 340
+    },
+    {
+      "epoch": 0.6727534839019702,
+      "grad_norm": 5.095221996307373,
+      "learning_rate": 4.812286689419796e-05,
+      "loss": 2.2583,
+      "step": 350
+    },
+    {
+      "epoch": 0.691975012013455,
+      "grad_norm": 6.583714485168457,
+      "learning_rate": 4.7909556313993175e-05,
+      "loss": 2.2992,
+      "step": 360
+    },
+    {
+      "epoch": 0.7111965401249399,
+      "grad_norm": 5.969571113586426,
+      "learning_rate": 4.76962457337884e-05,
+      "loss": 2.3388,
+      "step": 370
+    },
+    {
+      "epoch": 0.7304180682364247,
+      "grad_norm": 7.320237636566162,
+      "learning_rate": 4.7482935153583616e-05,
+      "loss": 2.2875,
+      "step": 380
+    },
+    {
+      "epoch": 0.7496395963479097,
+      "grad_norm": 6.594917297363281,
+      "learning_rate": 4.726962457337884e-05,
+      "loss": 2.1744,
+      "step": 390
+    },
+    {
+      "epoch": 0.7688611244593945,
+      "grad_norm": 6.340092658996582,
+      "learning_rate": 4.7056313993174063e-05,
+      "loss": 2.2252,
+      "step": 400
+    },
+    {
+      "epoch": 0.7880826525708794,
+      "grad_norm": 6.270530700683594,
+      "learning_rate": 4.684300341296928e-05,
+      "loss": 2.2892,
+      "step": 410
+    },
+    {
+      "epoch": 0.8073041806823642,
+      "grad_norm": 6.28891134262085,
+      "learning_rate": 4.662969283276451e-05,
+      "loss": 2.3549,
+      "step": 420
+    },
+    {
+      "epoch": 0.8265257087938491,
+      "grad_norm": 6.432322978973389,
+      "learning_rate": 4.641638225255973e-05,
+      "loss": 2.2514,
+      "step": 430
+    },
+    {
+      "epoch": 0.8457472369053339,
+      "grad_norm": 6.2773613929748535,
+      "learning_rate": 4.620307167235495e-05,
+      "loss": 2.3056,
+      "step": 440
+    },
+    {
+      "epoch": 0.8649687650168189,
+      "grad_norm": 6.604885578155518,
+      "learning_rate": 4.5989761092150176e-05,
+      "loss": 2.2911,
+      "step": 450
+    },
+    {
+      "epoch": 0.8841902931283037,
+      "grad_norm": 5.717179775238037,
+      "learning_rate": 4.577645051194539e-05,
+      "loss": 2.3191,
+      "step": 460
+    },
+    {
+      "epoch": 0.9034118212397886,
+      "grad_norm": 6.134155750274658,
+      "learning_rate": 4.556313993174062e-05,
+      "loss": 2.2451,
+      "step": 470
+    },
+    {
+      "epoch": 0.9226333493512734,
+      "grad_norm": 6.287547588348389,
+      "learning_rate": 4.534982935153584e-05,
+      "loss": 2.2448,
+      "step": 480
+    },
+    {
+      "epoch": 0.9418548774627583,
+      "grad_norm": 5.859134197235107,
+      "learning_rate": 4.513651877133106e-05,
+      "loss": 2.1997,
+      "step": 490
+    },
+    {
+      "epoch": 0.9610764055742431,
+      "grad_norm": 6.595165729522705,
+      "learning_rate": 4.492320819112628e-05,
+      "loss": 2.1794,
+      "step": 500
+    },
+    {
+      "epoch": 0.980297933685728,
+      "grad_norm": 6.322437286376953,
+      "learning_rate": 4.4709897610921506e-05,
+      "loss": 2.2895,
+      "step": 510
+    },
+    {
+      "epoch": 0.9995194617972128,
+      "grad_norm": 7.458081245422363,
+      "learning_rate": 4.449658703071672e-05,
+      "loss": 2.1918,
+      "step": 520
+    },
+    {
+      "epoch": 1.0172993753003363,
+      "grad_norm": 6.502513885498047,
+      "learning_rate": 4.428327645051195e-05,
+      "loss": 2.0686,
+      "step": 530
+    },
+    {
+      "epoch": 1.0365209034118212,
+      "grad_norm": 5.94999885559082,
+      "learning_rate": 4.406996587030717e-05,
+      "loss": 1.9922,
+      "step": 540
+    },
+    {
+      "epoch": 1.0557424315233062,
+      "grad_norm": 6.013510704040527,
+      "learning_rate": 4.385665529010239e-05,
+      "loss": 1.9737,
+      "step": 550
+    },
+    {
+      "epoch": 1.074963959634791,
+      "grad_norm": 7.001447677612305,
+      "learning_rate": 4.364334470989762e-05,
+      "loss": 2.0489,
+      "step": 560
+    },
+    {
+      "epoch": 1.0941854877462758,
+      "grad_norm": 7.009434700012207,
+      "learning_rate": 4.3430034129692835e-05,
+      "loss": 2.0722,
+      "step": 570
+    },
+    {
+      "epoch": 1.1134070158577607,
+      "grad_norm": 6.154360294342041,
+      "learning_rate": 4.321672354948806e-05,
+      "loss": 2.1033,
+      "step": 580
+    },
+    {
+      "epoch": 1.1326285439692456,
+      "grad_norm": 7.929754734039307,
+      "learning_rate": 4.300341296928328e-05,
+      "loss": 2.0605,
+      "step": 590
+    },
+    {
+      "epoch": 1.1518500720807303,
+      "grad_norm": 5.974995136260986,
+      "learning_rate": 4.27901023890785e-05,
+      "loss": 1.9992,
+      "step": 600
+    },
+    {
+      "epoch": 1.1710716001922152,
+      "grad_norm": 6.380578994750977,
+      "learning_rate": 4.2576791808873724e-05,
+      "loss": 2.0509,
+      "step": 610
+    },
+    {
+      "epoch": 1.1902931283037002,
+      "grad_norm": 6.86224889755249,
+      "learning_rate": 4.236348122866894e-05,
+      "loss": 1.9994,
+      "step": 620
+    },
+    {
+      "epoch": 1.209514656415185,
+      "grad_norm": 7.081135272979736,
+      "learning_rate": 4.2150170648464165e-05,
+      "loss": 2.0567,
+      "step": 630
+    },
+    {
+      "epoch": 1.22873618452667,
+      "grad_norm": 6.908895492553711,
+      "learning_rate": 4.193686006825939e-05,
+      "loss": 1.9874,
+      "step": 640
+    },
+    {
+      "epoch": 1.2479577126381547,
+      "grad_norm": 7.030237674713135,
+      "learning_rate": 4.1723549488054606e-05,
+      "loss": 2.0383,
+      "step": 650
+    },
+    {
+      "epoch": 1.2671792407496396,
+      "grad_norm": 7.042236804962158,
+      "learning_rate": 4.151023890784983e-05,
+      "loss": 2.076,
+      "step": 660
+    },
+    {
+      "epoch": 1.2864007688611245,
+      "grad_norm": 7.262955188751221,
+      "learning_rate": 4.1296928327645054e-05,
+      "loss": 1.9956,
+      "step": 670
+    },
+    {
+      "epoch": 1.3056222969726092,
+      "grad_norm": 7.4197211265563965,
+      "learning_rate": 4.108361774744027e-05,
+      "loss": 2.0626,
+      "step": 680
+    },
+    {
+      "epoch": 1.3248438250840942,
+      "grad_norm": 8.185269355773926,
+      "learning_rate": 4.0870307167235495e-05,
+      "loss": 1.9757,
+      "step": 690
+    },
+    {
+      "epoch": 1.344065353195579,
+      "grad_norm": 7.224545001983643,
+      "learning_rate": 4.065699658703072e-05,
+      "loss": 2.0315,
+      "step": 700
+    },
+    {
+      "epoch": 1.363286881307064,
+      "grad_norm": 7.302128791809082,
+      "learning_rate": 4.044368600682594e-05,
+      "loss": 2.0296,
+      "step": 710
+    },
+    {
+      "epoch": 1.382508409418549,
+      "grad_norm": 7.413524150848389,
+      "learning_rate": 4.0230375426621166e-05,
+      "loss": 1.9541,
+      "step": 720
+    },
+    {
+      "epoch": 1.4017299375300336,
+      "grad_norm": 6.2689080238342285,
+      "learning_rate": 4.0017064846416383e-05,
+      "loss": 2.0365,
+      "step": 730
+    },
+    {
+      "epoch": 1.4209514656415185,
+      "grad_norm": 6.926391124725342,
+      "learning_rate": 3.980375426621161e-05,
+      "loss": 2.0117,
+      "step": 740
+    },
+    {
+      "epoch": 1.4401729937530034,
+      "grad_norm": 6.754932403564453,
+      "learning_rate": 3.959044368600683e-05,
+      "loss": 1.9882,
+      "step": 750
+    },
+    {
+      "epoch": 1.4593945218644881,
+      "grad_norm": 8.279367446899414,
+      "learning_rate": 3.937713310580205e-05,
+      "loss": 2.0513,
+      "step": 760
+    },
+    {
+      "epoch": 1.478616049975973,
+      "grad_norm": 7.346401691436768,
+      "learning_rate": 3.916382252559727e-05,
+      "loss": 2.0274,
+      "step": 770
+    },
+    {
+      "epoch": 1.497837578087458,
+      "grad_norm": 6.437380313873291,
+      "learning_rate": 3.8950511945392496e-05,
+      "loss": 1.9917,
+      "step": 780
+    },
+    {
+      "epoch": 1.5170591061989427,
+      "grad_norm": 6.856245994567871,
+      "learning_rate": 3.873720136518771e-05,
+      "loss": 2.0252,
+      "step": 790
+    },
+    {
+      "epoch": 1.5362806343104278,
+      "grad_norm": 7.144432067871094,
+      "learning_rate": 3.852389078498294e-05,
+      "loss": 2.1142,
+      "step": 800
+    },
+    {
+      "epoch": 1.5555021624219125,
+      "grad_norm": 6.338939189910889,
+      "learning_rate": 3.8310580204778154e-05,
+      "loss": 1.9595,
+      "step": 810
+    },
+    {
+      "epoch": 1.5747236905333974,
+      "grad_norm": 6.92732572555542,
+      "learning_rate": 3.809726962457338e-05,
+      "loss": 1.9964,
+      "step": 820
+    },
+    {
+      "epoch": 1.5939452186448824,
+      "grad_norm": 6.684772491455078,
+      "learning_rate": 3.78839590443686e-05,
+      "loss": 2.0192,
+      "step": 830
+    },
+    {
+      "epoch": 1.613166746756367,
+      "grad_norm": 6.988185882568359,
+      "learning_rate": 3.7670648464163826e-05,
+      "loss": 1.9899,
+      "step": 840
+    },
+    {
+      "epoch": 1.632388274867852,
+      "grad_norm": 7.449588775634766,
+      "learning_rate": 3.745733788395905e-05,
+      "loss": 1.9278,
+      "step": 850
+    },
+    {
+      "epoch": 1.651609802979337,
+      "grad_norm": 7.102870941162109,
+      "learning_rate": 3.724402730375427e-05,
+      "loss": 2.0357,
+      "step": 860
+    },
+    {
+      "epoch": 1.6708313310908216,
+      "grad_norm": 7.227121829986572,
+      "learning_rate": 3.703071672354949e-05,
+      "loss": 1.9708,
+      "step": 870
+    },
+    {
+      "epoch": 1.6900528592023067,
+      "grad_norm": 7.991628170013428,
+      "learning_rate": 3.6817406143344714e-05,
+      "loss": 1.9555,
+      "step": 880
+    },
+    {
+      "epoch": 1.7092743873137914,
+      "grad_norm": 7.285923957824707,
+      "learning_rate": 3.660409556313993e-05,
+      "loss": 1.9692,
+      "step": 890
+    },
+    {
+      "epoch": 1.7284959154252764,
+      "grad_norm": 6.6300201416015625,
+      "learning_rate": 3.6390784982935155e-05,
+      "loss": 1.9664,
+      "step": 900
+    },
+    {
+      "epoch": 1.7477174435367613,
+      "grad_norm": 7.206388473510742,
+      "learning_rate": 3.617747440273038e-05,
+      "loss": 1.9698,
+      "step": 910
+    },
+    {
+      "epoch": 1.766938971648246,
+      "grad_norm": 7.21373987197876,
+      "learning_rate": 3.5964163822525596e-05,
+      "loss": 1.9686,
+      "step": 920
+    },
+    {
+      "epoch": 1.786160499759731,
+      "grad_norm": 6.428858757019043,
+      "learning_rate": 3.575085324232082e-05,
+      "loss": 1.9851,
+      "step": 930
+    },
+    {
+      "epoch": 1.8053820278712158,
+      "grad_norm": 7.84316873550415,
+      "learning_rate": 3.5537542662116044e-05,
+      "loss": 1.9722,
+      "step": 940
+    },
+    {
+      "epoch": 1.8246035559827005,
+      "grad_norm": 7.756894588470459,
+      "learning_rate": 3.532423208191126e-05,
+      "loss": 1.9733,
+      "step": 950
+    },
+    {
+      "epoch": 1.8438250840941854,
+      "grad_norm": 6.9086480140686035,
+      "learning_rate": 3.5110921501706485e-05,
+      "loss": 1.9561,
+      "step": 960
+    },
+    {
+      "epoch": 1.8630466122056704,
+      "grad_norm": 7.4190802574157715,
+      "learning_rate": 3.48976109215017e-05,
+      "loss": 1.9004,
+      "step": 970
+    },
+    {
+      "epoch": 1.882268140317155,
+      "grad_norm": 6.598989963531494,
+      "learning_rate": 3.468430034129693e-05,
+      "loss": 1.9914,
+      "step": 980
+    },
+    {
+      "epoch": 1.9014896684286402,
+      "grad_norm": 7.208456516265869,
+      "learning_rate": 3.447098976109216e-05,
+      "loss": 1.9888,
+      "step": 990
+    },
+    {
+      "epoch": 1.920711196540125,
+      "grad_norm": 7.192534923553467,
+      "learning_rate": 3.4257679180887374e-05,
+      "loss": 1.9244,
+      "step": 1000
+    },
+    {
+      "epoch": 1.9399327246516098,
+      "grad_norm": 7.464297294616699,
+      "learning_rate": 3.40443686006826e-05,
+      "loss": 1.916,
+      "step": 1010
+    },
+    {
+      "epoch": 1.9591542527630947,
+      "grad_norm": 7.047414779663086,
+      "learning_rate": 3.383105802047782e-05,
+      "loss": 2.0045,
+      "step": 1020
+    },
+    {
+      "epoch": 1.9783757808745794,
+      "grad_norm": 7.436980724334717,
+      "learning_rate": 3.361774744027304e-05,
+      "loss": 2.015,
+      "step": 1030
+    },
+    {
+      "epoch": 1.9975973089860644,
+      "grad_norm": 6.135148525238037,
+      "learning_rate": 3.340443686006826e-05,
+      "loss": 1.9769,
+      "step": 1040
+    },
+    {
+      "epoch": 2.015377222489188,
+      "grad_norm": 6.220305442810059,
+      "learning_rate": 3.319112627986348e-05,
+      "loss": 1.7707,
+      "step": 1050
+    },
+    {
+      "epoch": 2.0345987506006726,
+      "grad_norm": 8.670418739318848,
+      "learning_rate": 3.2977815699658704e-05,
+      "loss": 1.7048,
+      "step": 1060
+    },
+    {
+      "epoch": 2.0538202787121578,
+      "grad_norm": 7.973430633544922,
+      "learning_rate": 3.276450511945393e-05,
+      "loss": 1.7753,
+      "step": 1070
+    },
+    {
+      "epoch": 2.0730418068236425,
+      "grad_norm": 7.165556907653809,
+      "learning_rate": 3.2551194539249145e-05,
+      "loss": 1.7184,
+      "step": 1080
+    },
+    {
+      "epoch": 2.092263334935127,
+      "grad_norm": 8.48978042602539,
+      "learning_rate": 3.233788395904437e-05,
+      "loss": 1.7799,
+      "step": 1090
+    },
+    {
+      "epoch": 2.1114848630466123,
+      "grad_norm": 7.75386905670166,
+      "learning_rate": 3.212457337883959e-05,
+      "loss": 1.7363,
+      "step": 1100
+    },
+    {
+      "epoch": 2.130706391158097,
+      "grad_norm": 7.61426305770874,
+      "learning_rate": 3.191126279863481e-05,
+      "loss": 1.7388,
+      "step": 1110
+    },
+    {
+      "epoch": 2.149927919269582,
+      "grad_norm": 8.537379264831543,
+      "learning_rate": 3.169795221843004e-05,
+      "loss": 1.7217,
+      "step": 1120
+    },
+    {
+      "epoch": 2.169149447381067,
+      "grad_norm": 8.12486457824707,
+      "learning_rate": 3.148464163822526e-05,
+      "loss": 1.7396,
+      "step": 1130
+    },
+    {
+      "epoch": 2.1883709754925516,
+      "grad_norm": 9.46800422668457,
+      "learning_rate": 3.127133105802048e-05,
+      "loss": 1.7526,
+      "step": 1140
+    },
+    {
+      "epoch": 2.2075925036040367,
+      "grad_norm": 7.893990993499756,
+      "learning_rate": 3.1058020477815705e-05,
+      "loss": 1.7482,
+      "step": 1150
+    },
+    {
+      "epoch": 2.2268140317155214,
+      "grad_norm": 8.076356887817383,
+      "learning_rate": 3.084470989761092e-05,
+      "loss": 1.7622,
+      "step": 1160
+    },
+    {
+      "epoch": 2.246035559827006,
+      "grad_norm": 7.560849666595459,
+      "learning_rate": 3.0631399317406146e-05,
+      "loss": 1.789,
+      "step": 1170
+    },
+    {
+      "epoch": 2.2652570879384912,
+      "grad_norm": 7.872079372406006,
+      "learning_rate": 3.0418088737201366e-05,
+      "loss": 1.7213,
+      "step": 1180
+    },
+    {
+      "epoch": 2.284478616049976,
+      "grad_norm": 9.256887435913086,
+      "learning_rate": 3.0204778156996587e-05,
+      "loss": 1.7497,
+      "step": 1190
+    },
+    {
+      "epoch": 2.3037001441614606,
+      "grad_norm": 8.256994247436523,
+      "learning_rate": 2.999146757679181e-05,
+      "loss": 1.7168,
+      "step": 1200
+    },
+    {
+      "epoch": 2.3229216722729458,
+      "grad_norm": 8.07515811920166,
+      "learning_rate": 2.977815699658703e-05,
+      "loss": 1.7714,
+      "step": 1210
+    },
+    {
+      "epoch": 2.3421432003844305,
+      "grad_norm": 7.773881435394287,
+      "learning_rate": 2.956484641638225e-05,
+      "loss": 1.7627,
+      "step": 1220
+    },
+    {
+      "epoch": 2.3613647284959156,
+      "grad_norm": 9.31999397277832,
+      "learning_rate": 2.9351535836177476e-05,
+      "loss": 1.7635,
+      "step": 1230
+    },
+    {
+      "epoch": 2.3805862566074003,
+      "grad_norm": 8.094911575317383,
+      "learning_rate": 2.9138225255972696e-05,
+      "loss": 1.8119,
+      "step": 1240
+    },
+    {
+      "epoch": 2.399807784718885,
+      "grad_norm": 8.966656684875488,
+      "learning_rate": 2.8924914675767916e-05,
+      "loss": 1.7844,
+      "step": 1250
+    },
+    {
+      "epoch": 2.41902931283037,
+      "grad_norm": 8.657224655151367,
+      "learning_rate": 2.8711604095563144e-05,
+      "loss": 1.7849,
+      "step": 1260
+    },
+    {
+      "epoch": 2.438250840941855,
+      "grad_norm": 8.335482597351074,
+      "learning_rate": 2.8498293515358364e-05,
+      "loss": 1.6967,
+      "step": 1270
+    },
+    {
+      "epoch": 2.45747236905334,
+      "grad_norm": 8.769615173339844,
+      "learning_rate": 2.8284982935153588e-05,
+      "loss": 1.8389,
+      "step": 1280
+    },
+    {
+      "epoch": 2.4766938971648247,
+      "grad_norm": 7.692858695983887,
+      "learning_rate": 2.807167235494881e-05,
+      "loss": 1.7602,
+      "step": 1290
+    },
+    {
+      "epoch": 2.4959154252763094,
+      "grad_norm": 7.612323760986328,
+      "learning_rate": 2.785836177474403e-05,
+      "loss": 1.8411,
+      "step": 1300
+    },
+    {
+      "epoch": 2.5151369533877945,
+      "grad_norm": 9.097857475280762,
+      "learning_rate": 2.764505119453925e-05,
+      "loss": 1.7737,
+      "step": 1310
+    },
+    {
+      "epoch": 2.5343584814992792,
+      "grad_norm": 9.11468505859375,
+      "learning_rate": 2.7431740614334473e-05,
+      "loss": 1.8269,
+      "step": 1320
+    },
+    {
+      "epoch": 2.553580009610764,
+      "grad_norm": 9.77290153503418,
+      "learning_rate": 2.7218430034129694e-05,
+      "loss": 1.7671,
+      "step": 1330
+    },
+    {
+      "epoch": 2.572801537722249,
+      "grad_norm": 8.101873397827148,
+      "learning_rate": 2.7005119453924914e-05,
+      "loss": 1.6804,
+      "step": 1340
+    },
+    {
+      "epoch": 2.5920230658337338,
+      "grad_norm": 8.107222557067871,
+      "learning_rate": 2.6791808873720138e-05,
+      "loss": 1.7327,
+      "step": 1350
+    },
+    {
+      "epoch": 2.6112445939452185,
+      "grad_norm": 7.96660852432251,
+      "learning_rate": 2.657849829351536e-05,
+      "loss": 1.7532,
+      "step": 1360
+    },
+    {
+      "epoch": 2.6304661220567036,
+      "grad_norm": 8.748746871948242,
+      "learning_rate": 2.636518771331058e-05,
+      "loss": 1.7216,
+      "step": 1370
+    },
+    {
+      "epoch": 2.6496876501681883,
+      "grad_norm": 8.459169387817383,
+      "learning_rate": 2.61518771331058e-05,
+      "loss": 1.755,
+      "step": 1380
+    },
+    {
+      "epoch": 2.668909178279673,
+      "grad_norm": 8.21006965637207,
+      "learning_rate": 2.5938566552901024e-05,
+      "loss": 1.7775,
+      "step": 1390
+    },
+    {
+      "epoch": 2.688130706391158,
+      "grad_norm": 7.608190059661865,
+      "learning_rate": 2.572525597269625e-05,
+      "loss": 1.6885,
+      "step": 1400
+    },
+    {
+      "epoch": 2.707352234502643,
+      "grad_norm": 7.753986358642578,
+      "learning_rate": 2.551194539249147e-05,
+      "loss": 1.7477,
+      "step": 1410
+    },
+    {
+      "epoch": 2.726573762614128,
+      "grad_norm": 9.139946937561035,
+      "learning_rate": 2.5298634812286692e-05,
+      "loss": 1.7118,
+      "step": 1420
+    },
+    {
+      "epoch": 2.7457952907256127,
+      "grad_norm": 8.321805000305176,
+      "learning_rate": 2.5085324232081912e-05,
+      "loss": 1.7373,
+      "step": 1430
+    },
+    {
+      "epoch": 2.765016818837098,
+      "grad_norm": 6.919346809387207,
+      "learning_rate": 2.4872013651877136e-05,
+      "loss": 1.6811,
+      "step": 1440
+    },
+    {
+      "epoch": 2.7842383469485825,
+      "grad_norm": 8.567264556884766,
+      "learning_rate": 2.4658703071672357e-05,
+      "loss": 1.7402,
+      "step": 1450
+    },
+    {
+      "epoch": 2.803459875060067,
+      "grad_norm": 8.170304298400879,
+      "learning_rate": 2.4445392491467577e-05,
+      "loss": 1.8064,
+      "step": 1460
+    },
+    {
+      "epoch": 2.8226814031715524,
+      "grad_norm": 8.044082641601562,
+      "learning_rate": 2.42320819112628e-05,
+      "loss": 1.6772,
+      "step": 1470
+    },
+    {
+      "epoch": 2.841902931283037,
+      "grad_norm": 8.453600883483887,
+      "learning_rate": 2.401877133105802e-05,
+      "loss": 1.7352,
+      "step": 1480
+    },
+    {
+      "epoch": 2.8611244593945218,
+      "grad_norm": 8.70590591430664,
+      "learning_rate": 2.3805460750853242e-05,
+      "loss": 1.7539,
+      "step": 1490
+    },
+    {
+      "epoch": 2.880345987506007,
+      "grad_norm": 8.303061485290527,
+      "learning_rate": 2.3592150170648466e-05,
+      "loss": 1.7281,
+      "step": 1500
+    },
+    {
+      "epoch": 2.8995675156174916,
+      "grad_norm": 7.61099100112915,
+      "learning_rate": 2.3378839590443686e-05,
+      "loss": 1.6915,
+      "step": 1510
+    },
+    {
+      "epoch": 2.9187890437289763,
+      "grad_norm": 9.040791511535645,
+      "learning_rate": 2.316552901023891e-05,
+      "loss": 1.7099,
+      "step": 1520
+    },
+    {
+      "epoch": 2.9380105718404614,
+      "grad_norm": 8.227042198181152,
+      "learning_rate": 2.295221843003413e-05,
+      "loss": 1.7134,
+      "step": 1530
+    },
+    {
+      "epoch": 2.957232099951946,
+      "grad_norm": 7.818427085876465,
+      "learning_rate": 2.273890784982935e-05,
+      "loss": 1.7652,
+      "step": 1540
+    },
+    {
+      "epoch": 2.976453628063431,
+      "grad_norm": 7.761089324951172,
+      "learning_rate": 2.2525597269624575e-05,
+      "loss": 1.673,
+      "step": 1550
+    },
+    {
+      "epoch": 2.995675156174916,
+      "grad_norm": 9.148423194885254,
+      "learning_rate": 2.2312286689419796e-05,
+      "loss": 1.7545,
+      "step": 1560
+    },
+    {
+      "epoch": 3.0134550696780393,
+      "grad_norm": 6.961498260498047,
+      "learning_rate": 2.209897610921502e-05,
+      "loss": 1.6289,
+      "step": 1570
+    },
+    {
+      "epoch": 3.0326765977895245,
+      "grad_norm": 8.595748901367188,
+      "learning_rate": 2.188566552901024e-05,
+      "loss": 1.5369,
+      "step": 1580
+    },
+    {
+      "epoch": 3.051898125901009,
+      "grad_norm": 8.743279457092285,
+      "learning_rate": 2.1672354948805464e-05,
+      "loss": 1.5447,
+      "step": 1590
+    },
+    {
+      "epoch": 3.071119654012494,
+      "grad_norm": 7.540672302246094,
+      "learning_rate": 2.1459044368600684e-05,
+      "loss": 1.5113,
+      "step": 1600
+    },
+    {
+      "epoch": 3.090341182123979,
+      "grad_norm": 9.278829574584961,
+      "learning_rate": 2.1245733788395905e-05,
+      "loss": 1.5178,
+      "step": 1610
+    },
+    {
+      "epoch": 3.1095627102354637,
+      "grad_norm": 9.872568130493164,
+      "learning_rate": 2.1032423208191125e-05,
+      "loss": 1.5606,
+      "step": 1620
+    },
+    {
+      "epoch": 3.1287842383469484,
+      "grad_norm": 8.62325668334961,
+      "learning_rate": 2.081911262798635e-05,
+      "loss": 1.4871,
+      "step": 1630
+    },
+    {
+      "epoch": 3.1480057664584336,
+      "grad_norm": 8.459789276123047,
+      "learning_rate": 2.0605802047781573e-05,
+      "loss": 1.5561,
+      "step": 1640
+    },
+    {
+      "epoch": 3.1672272945699183,
+      "grad_norm": 9.252559661865234,
+      "learning_rate": 2.0392491467576794e-05,
+      "loss": 1.5595,
+      "step": 1650
+    },
+    {
+      "epoch": 3.1864488226814034,
+      "grad_norm": 9.68213939666748,
+      "learning_rate": 2.0179180887372014e-05,
+      "loss": 1.5517,
+      "step": 1660
+    },
+    {
+      "epoch": 3.205670350792888,
+      "grad_norm": 9.817381858825684,
+      "learning_rate": 1.9965870307167238e-05,
+      "loss": 1.4966,
+      "step": 1670
+    },
+    {
+      "epoch": 3.224891878904373,
+      "grad_norm": 9.264698028564453,
+      "learning_rate": 1.975255972696246e-05,
+      "loss": 1.5443,
+      "step": 1680
+    },
+    {
+      "epoch": 3.244113407015858,
+      "grad_norm": 8.848346710205078,
+      "learning_rate": 1.953924914675768e-05,
+      "loss": 1.4929,
+      "step": 1690
+    },
+    {
+      "epoch": 3.2633349351273426,
+      "grad_norm": 9.325592994689941,
+      "learning_rate": 1.93259385665529e-05,
+      "loss": 1.5259,
+      "step": 1700
+    },
+    {
+      "epoch": 3.2825564632388273,
+      "grad_norm": 10.281765937805176,
+      "learning_rate": 1.9112627986348127e-05,
+      "loss": 1.5497,
+      "step": 1710
+    },
+    {
+      "epoch": 3.3017779913503125,
+      "grad_norm": 8.460485458374023,
+      "learning_rate": 1.8899317406143347e-05,
+      "loss": 1.5051,
+      "step": 1720
+    },
+    {
+      "epoch": 3.320999519461797,
+      "grad_norm": 9.479697227478027,
+      "learning_rate": 1.8686006825938568e-05,
+      "loss": 1.5353,
+      "step": 1730
+    },
+    {
+      "epoch": 3.340221047573282,
+      "grad_norm": 8.678592681884766,
+      "learning_rate": 1.8472696245733788e-05,
+      "loss": 1.5865,
+      "step": 1740
+    },
+    {
+      "epoch": 3.359442575684767,
+      "grad_norm": 9.709846496582031,
+      "learning_rate": 1.8259385665529012e-05,
+      "loss": 1.5572,
+      "step": 1750
+    },
+    {
+      "epoch": 3.3786641037962517,
+      "grad_norm": 9.438785552978516,
+      "learning_rate": 1.8046075085324232e-05,
+      "loss": 1.5867,
+      "step": 1760
+    },
+    {
+      "epoch": 3.397885631907737,
+      "grad_norm": 8.967934608459473,
+      "learning_rate": 1.7832764505119453e-05,
+      "loss": 1.49,
+      "step": 1770
+    },
+    {
+      "epoch": 3.4171071600192215,
+      "grad_norm": 9.989371299743652,
+      "learning_rate": 1.7619453924914677e-05,
+      "loss": 1.5461,
+      "step": 1780
+    },
+    {
+      "epoch": 3.4363286881307062,
+      "grad_norm": 8.968268394470215,
+      "learning_rate": 1.74061433447099e-05,
+      "loss": 1.6002,
+      "step": 1790
+    },
+    {
+      "epoch": 3.4555502162421914,
+      "grad_norm": 9.105525016784668,
+      "learning_rate": 1.719283276450512e-05,
+      "loss": 1.5934,
+      "step": 1800
+    },
+    {
+      "epoch": 3.474771744353676,
+      "grad_norm": 8.997435569763184,
+      "learning_rate": 1.697952218430034e-05,
+      "loss": 1.5687,
+      "step": 1810
+    },
+    {
+      "epoch": 3.4939932724651612,
+      "grad_norm": 9.625081062316895,
+      "learning_rate": 1.6766211604095562e-05,
+      "loss": 1.5684,
+      "step": 1820
+    },
+    {
+      "epoch": 3.513214800576646,
+      "grad_norm": 8.427741050720215,
+      "learning_rate": 1.6552901023890786e-05,
+      "loss": 1.4639,
+      "step": 1830
+    },
+    {
+      "epoch": 3.5324363286881306,
+      "grad_norm": 10.164386749267578,
+      "learning_rate": 1.6339590443686006e-05,
+      "loss": 1.5512,
+      "step": 1840
+    },
+    {
+      "epoch": 3.5516578567996158,
+      "grad_norm": 9.009414672851562,
+      "learning_rate": 1.612627986348123e-05,
+      "loss": 1.5127,
+      "step": 1850
+    },
+    {
+      "epoch": 3.5708793849111005,
+      "grad_norm": 11.138776779174805,
+      "learning_rate": 1.591296928327645e-05,
+      "loss": 1.4763,
+      "step": 1860
+    },
+    {
+      "epoch": 3.590100913022585,
+      "grad_norm": 8.983370780944824,
+      "learning_rate": 1.5699658703071675e-05,
+      "loss": 1.5661,
+      "step": 1870
+    },
+    {
+      "epoch": 3.6093224411340703,
+      "grad_norm": 9.458341598510742,
+      "learning_rate": 1.5486348122866895e-05,
+      "loss": 1.5151,
+      "step": 1880
+    },
+    {
+      "epoch": 3.628543969245555,
+      "grad_norm": 9.019761085510254,
+      "learning_rate": 1.5273037542662116e-05,
+      "loss": 1.5646,
+      "step": 1890
+    },
+    {
+      "epoch": 3.6477654973570397,
+      "grad_norm": 9.247156143188477,
+      "learning_rate": 1.5059726962457338e-05,
+      "loss": 1.5257,
+      "step": 1900
+    },
+    {
+      "epoch": 3.666987025468525,
+      "grad_norm": 9.937666893005371,
+      "learning_rate": 1.484641638225256e-05,
+      "loss": 1.5934,
+      "step": 1910
+    },
+    {
+      "epoch": 3.6862085535800095,
+      "grad_norm": 8.887203216552734,
+      "learning_rate": 1.4633105802047784e-05,
+      "loss": 1.5373,
+      "step": 1920
+    },
+    {
+      "epoch": 3.7054300816914942,
+      "grad_norm": 8.323814392089844,
+      "learning_rate": 1.4419795221843004e-05,
+      "loss": 1.5178,
+      "step": 1930
+    },
+    {
+      "epoch": 3.7246516098029794,
+      "grad_norm": 8.78007984161377,
+      "learning_rate": 1.4206484641638227e-05,
+      "loss": 1.5283,
+      "step": 1940
+    },
+    {
+      "epoch": 3.743873137914464,
+      "grad_norm": 10.119393348693848,
+      "learning_rate": 1.3993174061433447e-05,
+      "loss": 1.5611,
+      "step": 1950
+    },
+    {
+      "epoch": 3.763094666025949,
+      "grad_norm": 8.493330001831055,
+      "learning_rate": 1.377986348122867e-05,
+      "loss": 1.5285,
+      "step": 1960
+    },
+    {
+      "epoch": 3.782316194137434,
+      "grad_norm": 9.496063232421875,
+      "learning_rate": 1.3566552901023891e-05,
+      "loss": 1.4765,
+      "step": 1970
+    },
+    {
+      "epoch": 3.801537722248919,
+      "grad_norm": 9.950116157531738,
+      "learning_rate": 1.3353242320819112e-05,
+      "loss": 1.522,
+      "step": 1980
+    },
+    {
+      "epoch": 3.8207592503604038,
+      "grad_norm": 9.816246032714844,
+      "learning_rate": 1.3139931740614336e-05,
+      "loss": 1.5085,
+      "step": 1990
+    },
+    {
+      "epoch": 3.8399807784718885,
+      "grad_norm": 9.766254425048828,
+      "learning_rate": 1.2926621160409558e-05,
+      "loss": 1.6145,
+      "step": 2000
+    },
+    {
+      "epoch": 3.8592023065833736,
+      "grad_norm": 9.258591651916504,
+      "learning_rate": 1.2713310580204778e-05,
+      "loss": 1.5292,
+      "step": 2010
+    },
+    {
+      "epoch": 3.8784238346948583,
+      "grad_norm": 9.976773262023926,
+      "learning_rate": 1.25e-05,
+      "loss": 1.5811,
+      "step": 2020
+    },
+    {
+      "epoch": 3.897645362806343,
+      "grad_norm": 9.001440048217773,
+      "learning_rate": 1.2286689419795223e-05,
+      "loss": 1.5401,
+      "step": 2030
+    },
+    {
+      "epoch": 3.916866890917828,
+      "grad_norm": 10.00130558013916,
+      "learning_rate": 1.2073378839590445e-05,
+      "loss": 1.6257,
+      "step": 2040
+    },
+    {
+      "epoch": 3.936088419029313,
+      "grad_norm": 8.235420227050781,
+      "learning_rate": 1.1860068259385667e-05,
+      "loss": 1.4888,
+      "step": 2050
+    },
+    {
+      "epoch": 3.9553099471407975,
+      "grad_norm": 9.468134880065918,
+      "learning_rate": 1.1646757679180888e-05,
+      "loss": 1.6077,
+      "step": 2060
+    },
+    {
+      "epoch": 3.9745314752522827,
+      "grad_norm": 8.448241233825684,
+      "learning_rate": 1.143344709897611e-05,
+      "loss": 1.5568,
+      "step": 2070
+    },
+    {
+      "epoch": 3.9937530033637674,
+      "grad_norm": 9.615592956542969,
+      "learning_rate": 1.1220136518771332e-05,
+      "loss": 1.6119,
+      "step": 2080
+    },
+    {
+      "epoch": 4.011532916866891,
+      "grad_norm": 8.427413940429688,
+      "learning_rate": 1.1006825938566554e-05,
+      "loss": 1.5303,
+      "step": 2090
+    },
+    {
+      "epoch": 4.030754444978376,
+      "grad_norm": 7.633726119995117,
+      "learning_rate": 1.0793515358361775e-05,
+      "loss": 1.368,
+      "step": 2100
+    },
+    {
+      "epoch": 4.049975973089861,
+      "grad_norm": 9.611651420593262,
+      "learning_rate": 1.0580204778156999e-05,
+      "loss": 1.3859,
+      "step": 2110
+    },
+    {
+      "epoch": 4.069197501201345,
+      "grad_norm": 10.329586029052734,
+      "learning_rate": 1.0366894197952219e-05,
+      "loss": 1.4339,
+      "step": 2120
+    },
+    {
+      "epoch": 4.08841902931283,
+      "grad_norm": 10.441389083862305,
+      "learning_rate": 1.0153583617747441e-05,
+      "loss": 1.4027,
+      "step": 2130
+    },
+    {
+      "epoch": 4.1076405574243156,
+      "grad_norm": 11.6917085647583,
+      "learning_rate": 9.940273037542662e-06,
+      "loss": 1.44,
+      "step": 2140
+    },
+    {
+      "epoch": 4.1268620855358,
+      "grad_norm": 9.722172737121582,
+      "learning_rate": 9.726962457337886e-06,
+      "loss": 1.376,
+      "step": 2150
+    },
+    {
+      "epoch": 4.146083613647285,
+      "grad_norm": 9.619889259338379,
+      "learning_rate": 9.513651877133106e-06,
+      "loss": 1.3754,
+      "step": 2160
+    },
+    {
+      "epoch": 4.16530514175877,
+      "grad_norm": 10.17457103729248,
+      "learning_rate": 9.300341296928328e-06,
+      "loss": 1.3115,
+      "step": 2170
+    },
+    {
+      "epoch": 4.184526669870254,
+      "grad_norm": 9.053975105285645,
+      "learning_rate": 9.08703071672355e-06,
+      "loss": 1.3668,
+      "step": 2180
+    },
+    {
+      "epoch": 4.2037481979817395,
+      "grad_norm": 9.389638900756836,
+      "learning_rate": 8.873720136518773e-06,
+      "loss": 1.4006,
+      "step": 2190
+    },
+    {
+      "epoch": 4.222969726093225,
+      "grad_norm": 10.160106658935547,
+      "learning_rate": 8.660409556313993e-06,
+      "loss": 1.4084,
+      "step": 2200
+    },
+    {
+      "epoch": 4.242191254204709,
+      "grad_norm": 11.254199981689453,
+      "learning_rate": 8.447098976109215e-06,
+      "loss": 1.4647,
+      "step": 2210
+    },
+    {
+      "epoch": 4.261412782316194,
+      "grad_norm": 10.098913192749023,
+      "learning_rate": 8.233788395904437e-06,
+      "loss": 1.427,
+      "step": 2220
+    },
+    {
+      "epoch": 4.280634310427679,
+      "grad_norm": 10.879392623901367,
+      "learning_rate": 8.02047781569966e-06,
+      "loss": 1.4457,
+      "step": 2230
+    },
+    {
+      "epoch": 4.299855838539164,
+      "grad_norm": 9.308858871459961,
+      "learning_rate": 7.80716723549488e-06,
+      "loss": 1.3712,
+      "step": 2240
+    },
+    {
+      "epoch": 4.319077366650649,
+      "grad_norm": 10.300002098083496,
+      "learning_rate": 7.593856655290103e-06,
+      "loss": 1.3592,
+      "step": 2250
+    },
+    {
+      "epoch": 4.338298894762134,
+      "grad_norm": 10.036710739135742,
+      "learning_rate": 7.3805460750853244e-06,
+      "loss": 1.4002,
+      "step": 2260
+    },
+    {
+      "epoch": 4.357520422873619,
+      "grad_norm": 9.933674812316895,
+      "learning_rate": 7.167235494880546e-06,
+      "loss": 1.3751,
+      "step": 2270
+    },
+    {
+      "epoch": 4.376741950985103,
+      "grad_norm": 11.534444808959961,
+      "learning_rate": 6.953924914675768e-06,
+      "loss": 1.3938,
+      "step": 2280
+    },
+    {
+      "epoch": 4.395963479096588,
+      "grad_norm": 10.834949493408203,
+      "learning_rate": 6.74061433447099e-06,
+      "loss": 1.476,
+      "step": 2290
+    },
+    {
+      "epoch": 4.415185007208073,
+      "grad_norm": 10.888574600219727,
+      "learning_rate": 6.5273037542662115e-06,
+      "loss": 1.4017,
+      "step": 2300
+    },
+    {
+      "epoch": 4.434406535319558,
+      "grad_norm": 10.21182918548584,
+      "learning_rate": 6.313993174061434e-06,
+      "loss": 1.432,
+      "step": 2310
+    },
+    {
+      "epoch": 4.453628063431043,
+      "grad_norm": 10.255495071411133,
+      "learning_rate": 6.100682593856656e-06,
+      "loss": 1.3986,
+      "step": 2320
+    },
+    {
+      "epoch": 4.472849591542528,
+      "grad_norm": 9.658021926879883,
+      "learning_rate": 5.887372013651877e-06,
+      "loss": 1.384,
+      "step": 2330
+    },
+    {
+      "epoch": 4.492071119654012,
+      "grad_norm": 10.915364265441895,
+      "learning_rate": 5.674061433447099e-06,
+      "loss": 1.4102,
+      "step": 2340
+    },
+    {
+      "epoch": 4.511292647765497,
+      "grad_norm": 9.598386764526367,
+      "learning_rate": 5.4607508532423215e-06,
+      "loss": 1.362,
+      "step": 2350
+    },
+    {
+      "epoch": 4.5305141758769825,
+      "grad_norm": 11.112643241882324,
+      "learning_rate": 5.247440273037543e-06,
+      "loss": 1.4069,
+      "step": 2360
+    },
+    {
+      "epoch": 4.549735703988467,
+      "grad_norm": 10.625088691711426,
+      "learning_rate": 5.034129692832765e-06,
+      "loss": 1.3952,
+      "step": 2370
+    },
+    {
+      "epoch": 4.568957232099952,
+      "grad_norm": 10.471019744873047,
+      "learning_rate": 4.820819112627987e-06,
+      "loss": 1.4068,
+      "step": 2380
+    },
+    {
+      "epoch": 4.588178760211437,
+      "grad_norm": 10.431234359741211,
+      "learning_rate": 4.6075085324232085e-06,
+      "loss": 1.4116,
+      "step": 2390
+    },
+    {
+      "epoch": 4.607400288322921,
+      "grad_norm": 10.526016235351562,
+      "learning_rate": 4.394197952218431e-06,
+      "loss": 1.3311,
+      "step": 2400
+    },
+    {
+      "epoch": 4.626621816434406,
+      "grad_norm": 10.347298622131348,
+      "learning_rate": 4.180887372013652e-06,
+      "loss": 1.3508,
+      "step": 2410
+    },
+    {
+      "epoch": 4.6458433445458915,
+      "grad_norm": 9.437437057495117,
+      "learning_rate": 3.967576791808874e-06,
+      "loss": 1.3888,
+      "step": 2420
+    },
+    {
+      "epoch": 4.665064872657377,
+      "grad_norm": 9.274221420288086,
+      "learning_rate": 3.754266211604096e-06,
+      "loss": 1.3586,
+      "step": 2430
+    },
+    {
+      "epoch": 4.684286400768861,
+      "grad_norm": 10.08513069152832,
+      "learning_rate": 3.5409556313993173e-06,
+      "loss": 1.3351,
+      "step": 2440
+    },
+    {
+      "epoch": 4.703507928880346,
+      "grad_norm": 10.008994102478027,
+      "learning_rate": 3.3276450511945395e-06,
+      "loss": 1.3078,
+      "step": 2450
+    },
+    {
+      "epoch": 4.722729456991831,
+      "grad_norm": 8.64718246459961,
+      "learning_rate": 3.1143344709897613e-06,
+      "loss": 1.3553,
+      "step": 2460
+    },
+    {
+      "epoch": 4.7419509851033155,
+      "grad_norm": 10.21820068359375,
+      "learning_rate": 2.901023890784983e-06,
+      "loss": 1.3782,
+      "step": 2470
+    },
+    {
+      "epoch": 4.761172513214801,
+      "grad_norm": 10.363849639892578,
+      "learning_rate": 2.687713310580205e-06,
+      "loss": 1.3658,
+      "step": 2480
+    },
+    {
+      "epoch": 4.780394041326286,
+      "grad_norm": 11.896404266357422,
+      "learning_rate": 2.474402730375427e-06,
+      "loss": 1.3566,
+      "step": 2490
+    },
+    {
+      "epoch": 4.79961556943777,
+      "grad_norm": 9.402819633483887,
+      "learning_rate": 2.2610921501706487e-06,
+      "loss": 1.3692,
+      "step": 2500
+    },
+    {
+      "epoch": 4.818837097549255,
+      "grad_norm": 11.33333683013916,
+      "learning_rate": 2.0477815699658705e-06,
+      "loss": 1.4347,
+      "step": 2510
+    },
+    {
+      "epoch": 4.83805862566074,
+      "grad_norm": 10.20227336883545,
+      "learning_rate": 1.8344709897610922e-06,
+      "loss": 1.4151,
+      "step": 2520
+    },
+    {
+      "epoch": 4.8572801537722246,
+      "grad_norm": 10.206210136413574,
+      "learning_rate": 1.621160409556314e-06,
+      "loss": 1.3516,
+      "step": 2530
+    },
+    {
+      "epoch": 4.87650168188371,
+      "grad_norm": 10.120949745178223,
+      "learning_rate": 1.407849829351536e-06,
+      "loss": 1.3973,
+      "step": 2540
+    },
+    {
+      "epoch": 4.895723209995195,
+      "grad_norm": 10.073768615722656,
+      "learning_rate": 1.194539249146758e-06,
+      "loss": 1.3755,
+      "step": 2550
+    },
+    {
+      "epoch": 4.91494473810668,
+      "grad_norm": 11.21468448638916,
+      "learning_rate": 9.812286689419797e-07,
+      "loss": 1.3614,
+      "step": 2560
+    },
+    {
+      "epoch": 4.934166266218164,
+      "grad_norm": 11.302360534667969,
+      "learning_rate": 7.679180887372014e-07,
+      "loss": 1.4116,
+      "step": 2570
+    },
+    {
+      "epoch": 4.953387794329649,
+      "grad_norm": 9.757354736328125,
+      "learning_rate": 5.546075085324233e-07,
+      "loss": 1.346,
+      "step": 2580
+    },
+    {
+      "epoch": 4.972609322441134,
+      "grad_norm": 9.289881706237793,
+      "learning_rate": 3.412969283276451e-07,
+      "loss": 1.4193,
+      "step": 2590
+    },
+    {
+      "epoch": 4.991830850552619,
+      "grad_norm": 11.2091064453125,
+      "learning_rate": 1.279863481228669e-07,
+      "loss": 1.4567,
+      "step": 2600
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2605,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 9359482190499840.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-2605/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f3f2c51094de7b2e8fb357753c21db50fbd28d5b3e8bc9181e6be1839b36076
+size 5713