Model save

- README.md +33 -180
- all_results.json +18 -0
- config.json +2 -2
- eval_results.json +13 -0
- generation_config.json +7 -0
- model.safetensors +1 -1
- tokenizer.json +1 -6
- tokenizer_config.json +0 -4
- train_results.json +8 -0
- trainer_state.json +1535 -0
- training_args.bin +3 -0
README.md CHANGED
@@ -1,199 +1,52 @@
-**Funded by [optional]:** [More Information Needed]
-**Shared by [optional]:** [More Information Needed]
-**Model type:** [More Information Needed]
-**Language(s) (NLP):** [More Information Needed]
-**License:** [More Information Needed]
-**Finetuned from model [optional]:** [More Information Needed]
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
+---
+license: apache-2.0
+base_model: google-t5/t5-small
+tags:
+- generated_from_trainer
+model-index:
+- name: tst-summarization
+  results: []
+---

+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->

+# tst-summarization

+This model is a fine-tuned version of [google-t5/t5-small](https://huggingface.co/google-t5/t5-small) on an unknown dataset.

+## Model description

+More information needed

+## Intended uses & limitations

+More information needed

+## Training and evaluation data

+More information needed

+## Training procedure

+### Training hyperparameters

+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 0.0

+### Training results

+### Framework versions

+- Transformers 4.38.2
+- Pytorch 2.2.1+cu121
+- Datasets 2.18.0
+- Tokenizers 0.15.2
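The regenerated card stops at the framework versions and no longer carries a usage snippet. As a rough orientation only (not part of this commit), a checkpoint saved in this layout can typically be loaded for summarization as sketched below; the repo id is a placeholder, not a path taken from the diff:

```python
# Minimal usage sketch (not part of the commit): loading the fine-tuned
# checkpoint for summarization via the transformers pipeline API.
# "your-username/tst-summarization" is a placeholder repo id; a local
# output directory works the same way.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-username/tst-summarization",  # placeholder; substitute the real repo or local path
)

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# T5 summarization checkpoints are usually queried with the "summarize: " prefix.
print(summarizer("summarize: " + article, max_length=64, min_length=10)[0]["summary_text"])
```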
all_results.json ADDED
@@ -0,0 +1,18 @@
+{
+    "epoch": 3.0,
+    "eval_gen_len": 73.2282315978456,
+    "eval_loss": 1.641965389251709,
+    "eval_rouge1": 41.522,
+    "eval_rouge2": 19.1468,
+    "eval_rougeL": 29.378,
+    "eval_rougeLsum": 38.7211,
+    "eval_runtime": 1042.218,
+    "eval_samples": 13368,
+    "eval_samples_per_second": 12.826,
+    "eval_steps_per_second": 1.603,
+    "train_loss": 1.8120383115867458,
+    "train_runtime": 16851.3759,
+    "train_samples": 287113,
+    "train_samples_per_second": 51.114,
+    "train_steps_per_second": 6.389
+}
config.json CHANGED
@@ -1,7 +1,7 @@
 {
-  "_name_or_path": "/
+  "_name_or_path": "google-t5/t5-small",
   "architectures": [
-    "
+    "T5ForConditionalGeneration"
   ],
   "classifier_dropout": 0.0,
   "d_ff": 2048,
eval_results.json ADDED
@@ -0,0 +1,13 @@
+{
+    "epoch": 3.0,
+    "eval_gen_len": 73.2282315978456,
+    "eval_loss": 1.641965389251709,
+    "eval_rouge1": 41.522,
+    "eval_rouge2": 19.1468,
+    "eval_rougeL": 29.378,
+    "eval_rougeLsum": 38.7211,
+    "eval_runtime": 1042.218,
+    "eval_samples": 13368,
+    "eval_samples_per_second": 12.826,
+    "eval_steps_per_second": 1.603
+}
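The eval_rouge* fields above are ROUGE scores scaled to 0-100, as the Transformers summarization example reports them. A minimal sketch of how numbers of this shape are typically computed (it assumes the `evaluate` package; the strings are toy data, not from this run):

```python
# Sketch of producing ROUGE scores in the same 0-100 scale as eval_results.json.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
# Keys mirror the eval_results.json fields: rouge1, rouge2, rougeL, rougeLsum
print({k: round(v * 100, 4) for k, v in scores.items()})
```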
generation_config.json ADDED
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "decoder_start_token_id": 0,
+  "eos_token_id": 1,
+  "pad_token_id": 0,
+  "transformers_version": "4.38.2"
+}
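generation_config.json now stores the decoder start, EOS, and padding token ids that `generate()` falls back on. A small sketch of reading it back, assuming the commit's files are available in a local directory (the path is a placeholder):

```python
# Sketch (assumption: the checkpoint files sit in "./tst-summarization"):
# inspect the generation defaults added in this commit and use them at generate() time.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

model_dir = "./tst-summarization"  # placeholder path
gen_config = GenerationConfig.from_pretrained(model_dir)
print(gen_config.decoder_start_token_id, gen_config.eos_token_id, gen_config.pad_token_id)  # 0, 1, 0

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
inputs = tokenizer("summarize: A short test document.", return_tensors="pt")
summary_ids = model.generate(**inputs, generation_config=gen_config, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```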
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b98a199b642a79ace9c1771f134484f74f81d548d02f86928a9c114b4f25e306
 size 242041896
tokenizer.json CHANGED
@@ -1,11 +1,6 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 128,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
+  "truncation": null,
   "padding": null,
   "added_tokens": [
     {
tokenizer_config.json CHANGED
@@ -930,12 +930,8 @@
   "clean_up_tokenization_spaces": true,
   "eos_token": "</s>",
   "extra_ids": 100,
-  "max_length": 128,
   "model_max_length": 512,
   "pad_token": "<pad>",
-  "stride": 0,
   "tokenizer_class": "T5Tokenizer",
-  "truncation_side": "right",
-  "truncation_strategy": "longest_first",
   "unk_token": "<unk>"
 }
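Together with the `"truncation": null` change in tokenizer.json, this drops the truncation defaults (max_length 128, longest-first, right side) that were previously baked into the saved tokenizer, so callers that relied on them would now pass the settings explicitly. A sketch of what that looks like (path and texts are placeholders, not from this commit):

```python
# Sketch: with the saved truncation block removed, truncation is requested per call.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./tst-summarization")  # placeholder path

batch = tokenizer(
    ["summarize: " + text for text in ["first document ...", "second document ..."]],
    max_length=128,               # the value that used to live in tokenizer.json
    truncation="longest_first",   # the strategy that used to live in tokenizer_config.json
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```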
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 3.0,
+    "train_loss": 1.8120383115867458,
+    "train_runtime": 16851.3759,
+    "train_samples": 287113,
+    "train_samples_per_second": 51.114,
+    "train_steps_per_second": 6.389
+}
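The throughput fields are mutually consistent with the epoch count and with the 107670 optimizer steps recorded in trainer_state.json below; a quick arithmetic check (values copied from this commit):

```python
# Consistency check of the reported training throughput.
train_samples = 287113
epochs = 3.0
train_runtime = 16851.3759   # seconds
global_steps = 107670        # from trainer_state.json

print(train_samples * epochs / train_runtime)  # ~51.1 -> train_samples_per_second
print(global_steps / train_runtime)            # ~6.39 -> train_steps_per_second
print(train_samples * epochs / global_steps)   # ~8    -> matches the batch size of 8 in the README
```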
trainer_state.json ADDED
@@ -0,0 +1,1535 @@
| 1 |
+
{
|
| 2 |
+
"best_metric": null,
|
| 3 |
+
"best_model_checkpoint": null,
|
| 4 |
+
"epoch": 3.0,
|
| 5 |
+
"eval_steps": 500,
|
| 6 |
+
"global_step": 107670,
|
| 7 |
+
"is_hyper_param_search": false,
|
| 8 |
+
"is_local_process_zero": true,
|
| 9 |
+
"is_world_process_zero": true,
|
| 10 |
+
"log_history": [
|
| 11 |
+
{
|
| 12 |
+
"epoch": 0.01,
|
| 13 |
+
"grad_norm": 1.9537303447723389,
|
| 14 |
+
"learning_rate": 4.976780904615956e-05,
|
| 15 |
+
"loss": 1.9668,
|
| 16 |
+
"step": 500
|
| 17 |
+
},
|
| 18 |
+
{
|
| 19 |
+
"epoch": 0.03,
|
| 20 |
+
"grad_norm": 1.9948451519012451,
|
| 21 |
+
"learning_rate": 4.953561809231913e-05,
|
| 22 |
+
"loss": 1.8978,
|
| 23 |
+
"step": 1000
|
| 24 |
+
},
|
| 25 |
+
{
|
| 26 |
+
"epoch": 0.04,
|
| 27 |
+
"grad_norm": 1.6757797002792358,
|
| 28 |
+
"learning_rate": 4.9303427138478686e-05,
|
| 29 |
+
"loss": 1.8974,
|
| 30 |
+
"step": 1500
|
| 31 |
+
},
|
| 32 |
+
{
|
| 33 |
+
"epoch": 0.06,
|
| 34 |
+
"grad_norm": 1.7021833658218384,
|
| 35 |
+
"learning_rate": 4.9071236184638245e-05,
|
| 36 |
+
"loss": 1.9052,
|
| 37 |
+
"step": 2000
|
| 38 |
+
},
|
| 39 |
+
{
|
| 40 |
+
"epoch": 0.07,
|
| 41 |
+
"grad_norm": 2.8648219108581543,
|
| 42 |
+
"learning_rate": 4.883904523079781e-05,
|
| 43 |
+
"loss": 1.896,
|
| 44 |
+
"step": 2500
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"epoch": 0.08,
|
| 48 |
+
"grad_norm": 1.8240835666656494,
|
| 49 |
+
"learning_rate": 4.860685427695737e-05,
|
| 50 |
+
"loss": 1.868,
|
| 51 |
+
"step": 3000
|
| 52 |
+
},
|
| 53 |
+
{
|
| 54 |
+
"epoch": 0.1,
|
| 55 |
+
"grad_norm": 2.1527557373046875,
|
| 56 |
+
"learning_rate": 4.837466332311693e-05,
|
| 57 |
+
"loss": 1.8731,
|
| 58 |
+
"step": 3500
|
| 59 |
+
},
|
| 60 |
+
{
|
| 61 |
+
"epoch": 0.11,
|
| 62 |
+
"grad_norm": 1.9050869941711426,
|
| 63 |
+
"learning_rate": 4.81424723692765e-05,
|
| 64 |
+
"loss": 1.8807,
|
| 65 |
+
"step": 4000
|
| 66 |
+
},
|
| 67 |
+
{
|
| 68 |
+
"epoch": 0.13,
|
| 69 |
+
"grad_norm": 1.9118114709854126,
|
| 70 |
+
"learning_rate": 4.791028141543606e-05,
|
| 71 |
+
"loss": 1.8889,
|
| 72 |
+
"step": 4500
|
| 73 |
+
},
|
| 74 |
+
{
|
| 75 |
+
"epoch": 0.14,
|
| 76 |
+
"grad_norm": 1.4827492237091064,
|
| 77 |
+
"learning_rate": 4.767809046159562e-05,
|
| 78 |
+
"loss": 1.8786,
|
| 79 |
+
"step": 5000
|
| 80 |
+
},
|
| 81 |
+
{
|
| 82 |
+
"epoch": 0.15,
|
| 83 |
+
"grad_norm": 1.8363677263259888,
|
| 84 |
+
"learning_rate": 4.7445899507755184e-05,
|
| 85 |
+
"loss": 1.8741,
|
| 86 |
+
"step": 5500
|
| 87 |
+
},
|
| 88 |
+
{
|
| 89 |
+
"epoch": 0.17,
|
| 90 |
+
"grad_norm": 1.9690338373184204,
|
| 91 |
+
"learning_rate": 4.721370855391474e-05,
|
| 92 |
+
"loss": 1.8755,
|
| 93 |
+
"step": 6000
|
| 94 |
+
},
|
| 95 |
+
{
|
| 96 |
+
"epoch": 0.18,
|
| 97 |
+
"grad_norm": 1.6585566997528076,
|
| 98 |
+
"learning_rate": 4.69815176000743e-05,
|
| 99 |
+
"loss": 1.8575,
|
| 100 |
+
"step": 6500
|
| 101 |
+
},
|
| 102 |
+
{
|
| 103 |
+
"epoch": 0.2,
|
| 104 |
+
"grad_norm": 1.619232177734375,
|
| 105 |
+
"learning_rate": 4.674932664623387e-05,
|
| 106 |
+
"loss": 1.8638,
|
| 107 |
+
"step": 7000
|
| 108 |
+
},
|
| 109 |
+
{
|
| 110 |
+
"epoch": 0.21,
|
| 111 |
+
"grad_norm": 1.8518427610397339,
|
| 112 |
+
"learning_rate": 4.6517135692393426e-05,
|
| 113 |
+
"loss": 1.8653,
|
| 114 |
+
"step": 7500
|
| 115 |
+
},
|
| 116 |
+
{
|
| 117 |
+
"epoch": 0.22,
|
| 118 |
+
"grad_norm": 2.0025551319122314,
|
| 119 |
+
"learning_rate": 4.6284944738552985e-05,
|
| 120 |
+
"loss": 1.8653,
|
| 121 |
+
"step": 8000
|
| 122 |
+
},
|
| 123 |
+
{
|
| 124 |
+
"epoch": 0.24,
|
| 125 |
+
"grad_norm": 2.128351926803589,
|
| 126 |
+
"learning_rate": 4.605275378471255e-05,
|
| 127 |
+
"loss": 1.863,
|
| 128 |
+
"step": 8500
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"epoch": 0.25,
|
| 132 |
+
"grad_norm": 2.0918049812316895,
|
| 133 |
+
"learning_rate": 4.582056283087211e-05,
|
| 134 |
+
"loss": 1.8472,
|
| 135 |
+
"step": 9000
|
| 136 |
+
},
|
| 137 |
+
{
|
| 138 |
+
"epoch": 0.26,
|
| 139 |
+
"grad_norm": 2.1041295528411865,
|
| 140 |
+
"learning_rate": 4.558837187703167e-05,
|
| 141 |
+
"loss": 1.8523,
|
| 142 |
+
"step": 9500
|
| 143 |
+
},
|
| 144 |
+
{
|
| 145 |
+
"epoch": 0.28,
|
| 146 |
+
"grad_norm": 1.6074202060699463,
|
| 147 |
+
"learning_rate": 4.5356180923191235e-05,
|
| 148 |
+
"loss": 1.8683,
|
| 149 |
+
"step": 10000
|
| 150 |
+
},
|
| 151 |
+
{
|
| 152 |
+
"epoch": 0.29,
|
| 153 |
+
"grad_norm": 1.8500597476959229,
|
| 154 |
+
"learning_rate": 4.5123989969350793e-05,
|
| 155 |
+
"loss": 1.8622,
|
| 156 |
+
"step": 10500
|
| 157 |
+
},
|
| 158 |
+
{
|
| 159 |
+
"epoch": 0.31,
|
| 160 |
+
"grad_norm": 2.6290550231933594,
|
| 161 |
+
"learning_rate": 4.489179901551035e-05,
|
| 162 |
+
"loss": 1.8539,
|
| 163 |
+
"step": 11000
|
| 164 |
+
},
|
| 165 |
+
{
|
| 166 |
+
"epoch": 0.32,
|
| 167 |
+
"grad_norm": 1.9424813985824585,
|
| 168 |
+
"learning_rate": 4.4659608061669925e-05,
|
| 169 |
+
"loss": 1.8579,
|
| 170 |
+
"step": 11500
|
| 171 |
+
},
|
| 172 |
+
{
|
| 173 |
+
"epoch": 0.33,
|
| 174 |
+
"grad_norm": 1.5146945714950562,
|
| 175 |
+
"learning_rate": 4.4427417107829484e-05,
|
| 176 |
+
"loss": 1.8541,
|
| 177 |
+
"step": 12000
|
| 178 |
+
},
|
| 179 |
+
{
|
| 180 |
+
"epoch": 0.35,
|
| 181 |
+
"grad_norm": 1.620976209640503,
|
| 182 |
+
"learning_rate": 4.419522615398904e-05,
|
| 183 |
+
"loss": 1.8601,
|
| 184 |
+
"step": 12500
|
| 185 |
+
},
|
| 186 |
+
{
|
| 187 |
+
"epoch": 0.36,
|
| 188 |
+
"grad_norm": 1.6728603839874268,
|
| 189 |
+
"learning_rate": 4.396303520014861e-05,
|
| 190 |
+
"loss": 1.8716,
|
| 191 |
+
"step": 13000
|
| 192 |
+
},
|
| 193 |
+
{
|
| 194 |
+
"epoch": 0.38,
|
| 195 |
+
"grad_norm": 2.357908010482788,
|
| 196 |
+
"learning_rate": 4.373084424630817e-05,
|
| 197 |
+
"loss": 1.85,
|
| 198 |
+
"step": 13500
|
| 199 |
+
},
|
| 200 |
+
{
|
| 201 |
+
"epoch": 0.39,
|
| 202 |
+
"grad_norm": 1.9841961860656738,
|
| 203 |
+
"learning_rate": 4.3498653292467726e-05,
|
| 204 |
+
"loss": 1.8683,
|
| 205 |
+
"step": 14000
|
| 206 |
+
},
|
| 207 |
+
{
|
| 208 |
+
"epoch": 0.4,
|
| 209 |
+
"grad_norm": 1.6038274765014648,
|
| 210 |
+
"learning_rate": 4.326646233862729e-05,
|
| 211 |
+
"loss": 1.8481,
|
| 212 |
+
"step": 14500
|
| 213 |
+
},
|
| 214 |
+
{
|
| 215 |
+
"epoch": 0.42,
|
| 216 |
+
"grad_norm": 1.645585060119629,
|
| 217 |
+
"learning_rate": 4.303427138478685e-05,
|
| 218 |
+
"loss": 1.8512,
|
| 219 |
+
"step": 15000
|
| 220 |
+
},
|
| 221 |
+
{
|
| 222 |
+
"epoch": 0.43,
|
| 223 |
+
"grad_norm": 1.8110160827636719,
|
| 224 |
+
"learning_rate": 4.280208043094641e-05,
|
| 225 |
+
"loss": 1.8539,
|
| 226 |
+
"step": 15500
|
| 227 |
+
},
|
| 228 |
+
{
|
| 229 |
+
"epoch": 0.45,
|
| 230 |
+
"grad_norm": 1.6607145071029663,
|
| 231 |
+
"learning_rate": 4.2569889477105975e-05,
|
| 232 |
+
"loss": 1.8464,
|
| 233 |
+
"step": 16000
|
| 234 |
+
},
|
| 235 |
+
{
|
| 236 |
+
"epoch": 0.46,
|
| 237 |
+
"grad_norm": 1.4501959085464478,
|
| 238 |
+
"learning_rate": 4.2337698523265534e-05,
|
| 239 |
+
"loss": 1.8304,
|
| 240 |
+
"step": 16500
|
| 241 |
+
},
|
| 242 |
+
{
|
| 243 |
+
"epoch": 0.47,
|
| 244 |
+
"grad_norm": 1.8375539779663086,
|
| 245 |
+
"learning_rate": 4.210550756942509e-05,
|
| 246 |
+
"loss": 1.8413,
|
| 247 |
+
"step": 17000
|
| 248 |
+
},
|
| 249 |
+
{
|
| 250 |
+
"epoch": 0.49,
|
| 251 |
+
"grad_norm": 1.5968436002731323,
|
| 252 |
+
"learning_rate": 4.187331661558466e-05,
|
| 253 |
+
"loss": 1.8373,
|
| 254 |
+
"step": 17500
|
| 255 |
+
},
|
| 256 |
+
{
|
| 257 |
+
"epoch": 0.5,
|
| 258 |
+
"grad_norm": 1.3943761587142944,
|
| 259 |
+
"learning_rate": 4.164112566174422e-05,
|
| 260 |
+
"loss": 1.8509,
|
| 261 |
+
"step": 18000
|
| 262 |
+
},
|
| 263 |
+
{
|
| 264 |
+
"epoch": 0.52,
|
| 265 |
+
"grad_norm": 1.7238069772720337,
|
| 266 |
+
"learning_rate": 4.140893470790378e-05,
|
| 267 |
+
"loss": 1.8549,
|
| 268 |
+
"step": 18500
|
| 269 |
+
},
|
| 270 |
+
{
|
| 271 |
+
"epoch": 0.53,
|
| 272 |
+
"grad_norm": 1.6093735694885254,
|
| 273 |
+
"learning_rate": 4.117674375406335e-05,
|
| 274 |
+
"loss": 1.8272,
|
| 275 |
+
"step": 19000
|
| 276 |
+
},
|
| 277 |
+
{
|
| 278 |
+
"epoch": 0.54,
|
| 279 |
+
"grad_norm": 1.628990888595581,
|
| 280 |
+
"learning_rate": 4.094455280022291e-05,
|
| 281 |
+
"loss": 1.852,
|
| 282 |
+
"step": 19500
|
| 283 |
+
},
|
| 284 |
+
{
|
| 285 |
+
"epoch": 0.56,
|
| 286 |
+
"grad_norm": 1.7063369750976562,
|
| 287 |
+
"learning_rate": 4.071236184638247e-05,
|
| 288 |
+
"loss": 1.8506,
|
| 289 |
+
"step": 20000
|
| 290 |
+
},
|
| 291 |
+
{
|
| 292 |
+
"epoch": 0.57,
|
| 293 |
+
"grad_norm": 1.496973991394043,
|
| 294 |
+
"learning_rate": 4.048017089254203e-05,
|
| 295 |
+
"loss": 1.8382,
|
| 296 |
+
"step": 20500
|
| 297 |
+
},
|
| 298 |
+
{
|
| 299 |
+
"epoch": 0.59,
|
| 300 |
+
"grad_norm": 1.50234854221344,
|
| 301 |
+
"learning_rate": 4.024797993870159e-05,
|
| 302 |
+
"loss": 1.8335,
|
| 303 |
+
"step": 21000
|
| 304 |
+
},
|
| 305 |
+
{
|
| 306 |
+
"epoch": 0.6,
|
| 307 |
+
"grad_norm": 1.779697060585022,
|
| 308 |
+
"learning_rate": 4.001578898486115e-05,
|
| 309 |
+
"loss": 1.8407,
|
| 310 |
+
"step": 21500
|
| 311 |
+
},
|
| 312 |
+
{
|
| 313 |
+
"epoch": 0.61,
|
| 314 |
+
"grad_norm": 1.792860507965088,
|
| 315 |
+
"learning_rate": 3.9783598031020716e-05,
|
| 316 |
+
"loss": 1.8279,
|
| 317 |
+
"step": 22000
|
| 318 |
+
},
|
| 319 |
+
{
|
| 320 |
+
"epoch": 0.63,
|
| 321 |
+
"grad_norm": 1.714028239250183,
|
| 322 |
+
"learning_rate": 3.9551407077180275e-05,
|
| 323 |
+
"loss": 1.8265,
|
| 324 |
+
"step": 22500
|
| 325 |
+
},
|
| 326 |
+
{
|
| 327 |
+
"epoch": 0.64,
|
| 328 |
+
"grad_norm": 2.1110615730285645,
|
| 329 |
+
"learning_rate": 3.9319216123339834e-05,
|
| 330 |
+
"loss": 1.8312,
|
| 331 |
+
"step": 23000
|
| 332 |
+
},
|
| 333 |
+
{
|
| 334 |
+
"epoch": 0.65,
|
| 335 |
+
"grad_norm": 1.8647916316986084,
|
| 336 |
+
"learning_rate": 3.90870251694994e-05,
|
| 337 |
+
"loss": 1.8417,
|
| 338 |
+
"step": 23500
|
| 339 |
+
},
|
| 340 |
+
{
|
| 341 |
+
"epoch": 0.67,
|
| 342 |
+
"grad_norm": 1.669309377670288,
|
| 343 |
+
"learning_rate": 3.885483421565896e-05,
|
| 344 |
+
"loss": 1.8347,
|
| 345 |
+
"step": 24000
|
| 346 |
+
},
|
| 347 |
+
{
|
| 348 |
+
"epoch": 0.68,
|
| 349 |
+
"grad_norm": 1.6504507064819336,
|
| 350 |
+
"learning_rate": 3.862264326181852e-05,
|
| 351 |
+
"loss": 1.8283,
|
| 352 |
+
"step": 24500
|
| 353 |
+
},
|
| 354 |
+
{
|
| 355 |
+
"epoch": 0.7,
|
| 356 |
+
"grad_norm": 1.5461372137069702,
|
| 357 |
+
"learning_rate": 3.839045230797808e-05,
|
| 358 |
+
"loss": 1.8394,
|
| 359 |
+
"step": 25000
|
| 360 |
+
},
|
| 361 |
+
{
|
| 362 |
+
"epoch": 0.71,
|
| 363 |
+
"grad_norm": 2.034350872039795,
|
| 364 |
+
"learning_rate": 3.815826135413764e-05,
|
| 365 |
+
"loss": 1.8326,
|
| 366 |
+
"step": 25500
|
| 367 |
+
},
|
| 368 |
+
{
|
| 369 |
+
"epoch": 0.72,
|
| 370 |
+
"grad_norm": 1.6590315103530884,
|
| 371 |
+
"learning_rate": 3.792607040029721e-05,
|
| 372 |
+
"loss": 1.8343,
|
| 373 |
+
"step": 26000
|
| 374 |
+
},
|
| 375 |
+
{
|
| 376 |
+
"epoch": 0.74,
|
| 377 |
+
"grad_norm": 1.9240669012069702,
|
| 378 |
+
"learning_rate": 3.769387944645677e-05,
|
| 379 |
+
"loss": 1.8601,
|
| 380 |
+
"step": 26500
|
| 381 |
+
},
|
| 382 |
+
{
|
| 383 |
+
"epoch": 0.75,
|
| 384 |
+
"grad_norm": 1.5366922616958618,
|
| 385 |
+
"learning_rate": 3.746168849261633e-05,
|
| 386 |
+
"loss": 1.8191,
|
| 387 |
+
"step": 27000
|
| 388 |
+
},
|
| 389 |
+
{
|
| 390 |
+
"epoch": 0.77,
|
| 391 |
+
"grad_norm": 1.8003100156784058,
|
| 392 |
+
"learning_rate": 3.722949753877589e-05,
|
| 393 |
+
"loss": 1.8313,
|
| 394 |
+
"step": 27500
|
| 395 |
+
},
|
| 396 |
+
{
|
| 397 |
+
"epoch": 0.78,
|
| 398 |
+
"grad_norm": 3.656419277191162,
|
| 399 |
+
"learning_rate": 3.6997306584935456e-05,
|
| 400 |
+
"loss": 1.8192,
|
| 401 |
+
"step": 28000
|
| 402 |
+
},
|
| 403 |
+
{
|
| 404 |
+
"epoch": 0.79,
|
| 405 |
+
"grad_norm": 1.7153360843658447,
|
| 406 |
+
"learning_rate": 3.6765115631095015e-05,
|
| 407 |
+
"loss": 1.844,
|
| 408 |
+
"step": 28500
|
| 409 |
+
},
|
| 410 |
+
{
|
| 411 |
+
"epoch": 0.81,
|
| 412 |
+
"grad_norm": 1.5900826454162598,
|
| 413 |
+
"learning_rate": 3.6532924677254574e-05,
|
| 414 |
+
"loss": 1.8241,
|
| 415 |
+
"step": 29000
|
| 416 |
+
},
|
| 417 |
+
{
|
| 418 |
+
"epoch": 0.82,
|
| 419 |
+
"grad_norm": 1.9120334386825562,
|
| 420 |
+
"learning_rate": 3.630073372341414e-05,
|
| 421 |
+
"loss": 1.8367,
|
| 422 |
+
"step": 29500
|
| 423 |
+
},
|
| 424 |
+
{
|
| 425 |
+
"epoch": 0.84,
|
| 426 |
+
"grad_norm": 1.6516207456588745,
|
| 427 |
+
"learning_rate": 3.60685427695737e-05,
|
| 428 |
+
"loss": 1.8259,
|
| 429 |
+
"step": 30000
|
| 430 |
+
},
|
| 431 |
+
{
|
| 432 |
+
"epoch": 0.85,
|
| 433 |
+
"grad_norm": 1.7761846780776978,
|
| 434 |
+
"learning_rate": 3.583635181573326e-05,
|
| 435 |
+
"loss": 1.8322,
|
| 436 |
+
"step": 30500
|
| 437 |
+
},
|
| 438 |
+
{
|
| 439 |
+
"epoch": 0.86,
|
| 440 |
+
"grad_norm": 1.7233374118804932,
|
| 441 |
+
"learning_rate": 3.560416086189282e-05,
|
| 442 |
+
"loss": 1.8273,
|
| 443 |
+
"step": 31000
|
| 444 |
+
},
|
| 445 |
+
{
|
| 446 |
+
"epoch": 0.88,
|
| 447 |
+
"grad_norm": 1.6779149770736694,
|
| 448 |
+
"learning_rate": 3.537196990805238e-05,
|
| 449 |
+
"loss": 1.8281,
|
| 450 |
+
"step": 31500
|
| 451 |
+
},
|
| 452 |
+
{
|
| 453 |
+
"epoch": 0.89,
|
| 454 |
+
"grad_norm": 1.6603174209594727,
|
| 455 |
+
"learning_rate": 3.513977895421194e-05,
|
| 456 |
+
"loss": 1.8307,
|
| 457 |
+
"step": 32000
|
| 458 |
+
},
|
| 459 |
+
{
|
| 460 |
+
"epoch": 0.91,
|
| 461 |
+
"grad_norm": 1.6743112802505493,
|
| 462 |
+
"learning_rate": 3.490758800037151e-05,
|
| 463 |
+
"loss": 1.8214,
|
| 464 |
+
"step": 32500
|
| 465 |
+
},
|
| 466 |
+
{
|
| 467 |
+
"epoch": 0.92,
|
| 468 |
+
"grad_norm": 1.775214672088623,
|
| 469 |
+
"learning_rate": 3.4675397046531066e-05,
|
| 470 |
+
"loss": 1.8433,
|
| 471 |
+
"step": 33000
|
| 472 |
+
},
|
| 473 |
+
{
|
| 474 |
+
"epoch": 0.93,
|
| 475 |
+
"grad_norm": 1.6865200996398926,
|
| 476 |
+
"learning_rate": 3.444320609269063e-05,
|
| 477 |
+
"loss": 1.8342,
|
| 478 |
+
"step": 33500
|
| 479 |
+
},
|
| 480 |
+
{
|
| 481 |
+
"epoch": 0.95,
|
| 482 |
+
"grad_norm": 1.5064520835876465,
|
| 483 |
+
"learning_rate": 3.42110151388502e-05,
|
| 484 |
+
"loss": 1.8259,
|
| 485 |
+
"step": 34000
|
| 486 |
+
},
|
| 487 |
+
{
|
| 488 |
+
"epoch": 0.96,
|
| 489 |
+
"grad_norm": 1.599684476852417,
|
| 490 |
+
"learning_rate": 3.3978824185009756e-05,
|
| 491 |
+
"loss": 1.8261,
|
| 492 |
+
"step": 34500
|
| 493 |
+
},
|
| 494 |
+
{
|
| 495 |
+
"epoch": 0.98,
|
| 496 |
+
"grad_norm": 1.7616256475448608,
|
| 497 |
+
"learning_rate": 3.3746633231169315e-05,
|
| 498 |
+
"loss": 1.8191,
|
| 499 |
+
"step": 35000
|
| 500 |
+
},
|
| 501 |
+
{
|
| 502 |
+
"epoch": 0.99,
|
| 503 |
+
"grad_norm": 1.7617506980895996,
|
| 504 |
+
"learning_rate": 3.351444227732888e-05,
|
| 505 |
+
"loss": 1.8203,
|
| 506 |
+
"step": 35500
|
| 507 |
+
},
|
| 508 |
+
{
|
| 509 |
+
"epoch": 1.0,
|
| 510 |
+
"grad_norm": 2.03489351272583,
|
| 511 |
+
"learning_rate": 3.328225132348844e-05,
|
| 512 |
+
"loss": 1.825,
|
| 513 |
+
"step": 36000
|
| 514 |
+
},
|
| 515 |
+
{
|
| 516 |
+
"epoch": 1.02,
|
| 517 |
+
"grad_norm": 1.5925586223602295,
|
| 518 |
+
"learning_rate": 3.3050060369648e-05,
|
| 519 |
+
"loss": 1.8058,
|
| 520 |
+
"step": 36500
|
| 521 |
+
},
|
| 522 |
+
{
|
| 523 |
+
"epoch": 1.03,
|
| 524 |
+
"grad_norm": 2.1856117248535156,
|
| 525 |
+
"learning_rate": 3.2817869415807564e-05,
|
| 526 |
+
"loss": 1.8085,
|
| 527 |
+
"step": 37000
|
| 528 |
+
},
|
| 529 |
+
{
|
| 530 |
+
"epoch": 1.04,
|
| 531 |
+
"grad_norm": 1.5426925420761108,
|
| 532 |
+
"learning_rate": 3.258567846196712e-05,
|
| 533 |
+
"loss": 1.8058,
|
| 534 |
+
"step": 37500
|
| 535 |
+
},
|
| 536 |
+
{
|
| 537 |
+
"epoch": 1.06,
|
| 538 |
+
"grad_norm": 2.0849246978759766,
|
| 539 |
+
"learning_rate": 3.235348750812668e-05,
|
| 540 |
+
"loss": 1.796,
|
| 541 |
+
"step": 38000
|
| 542 |
+
},
|
| 543 |
+
{
|
| 544 |
+
"epoch": 1.07,
|
| 545 |
+
"grad_norm": 1.3791111707687378,
|
| 546 |
+
"learning_rate": 3.212129655428625e-05,
|
| 547 |
+
"loss": 1.8027,
|
| 548 |
+
"step": 38500
|
| 549 |
+
},
|
| 550 |
+
{
|
| 551 |
+
"epoch": 1.09,
|
| 552 |
+
"grad_norm": 1.5735108852386475,
|
| 553 |
+
"learning_rate": 3.1889105600445806e-05,
|
| 554 |
+
"loss": 1.8188,
|
| 555 |
+
"step": 39000
|
| 556 |
+
},
|
| 557 |
+
{
|
| 558 |
+
"epoch": 1.1,
|
| 559 |
+
"grad_norm": 1.7609148025512695,
|
| 560 |
+
"learning_rate": 3.1656914646605365e-05,
|
| 561 |
+
"loss": 1.7931,
|
| 562 |
+
"step": 39500
|
| 563 |
+
},
|
| 564 |
+
{
|
| 565 |
+
"epoch": 1.11,
|
| 566 |
+
"grad_norm": 1.814334511756897,
|
| 567 |
+
"learning_rate": 3.142472369276493e-05,
|
| 568 |
+
"loss": 1.8225,
|
| 569 |
+
"step": 40000
|
| 570 |
+
},
|
| 571 |
+
{
|
| 572 |
+
"epoch": 1.13,
|
| 573 |
+
"grad_norm": 1.6874113082885742,
|
| 574 |
+
"learning_rate": 3.119253273892449e-05,
|
| 575 |
+
"loss": 1.8083,
|
| 576 |
+
"step": 40500
|
| 577 |
+
},
|
| 578 |
+
{
|
| 579 |
+
"epoch": 1.14,
|
| 580 |
+
"grad_norm": 1.439292073249817,
|
| 581 |
+
"learning_rate": 3.0960341785084055e-05,
|
| 582 |
+
"loss": 1.82,
|
| 583 |
+
"step": 41000
|
| 584 |
+
},
|
| 585 |
+
{
|
| 586 |
+
"epoch": 1.16,
|
| 587 |
+
"grad_norm": 1.608949065208435,
|
| 588 |
+
"learning_rate": 3.072815083124362e-05,
|
| 589 |
+
"loss": 1.7961,
|
| 590 |
+
"step": 41500
|
| 591 |
+
},
|
| 592 |
+
{
|
| 593 |
+
"epoch": 1.17,
|
| 594 |
+
"grad_norm": 1.7770118713378906,
|
| 595 |
+
"learning_rate": 3.049595987740318e-05,
|
| 596 |
+
"loss": 1.8047,
|
| 597 |
+
"step": 42000
|
| 598 |
+
},
|
| 599 |
+
{
|
| 600 |
+
"epoch": 1.18,
|
| 601 |
+
"grad_norm": 1.6586685180664062,
|
| 602 |
+
"learning_rate": 3.0263768923562742e-05,
|
| 603 |
+
"loss": 1.8114,
|
| 604 |
+
"step": 42500
|
| 605 |
+
},
|
| 606 |
+
{
|
| 607 |
+
"epoch": 1.2,
|
| 608 |
+
"grad_norm": 1.9770480394363403,
|
| 609 |
+
"learning_rate": 3.00315779697223e-05,
|
| 610 |
+
"loss": 1.8178,
|
| 611 |
+
"step": 43000
|
| 612 |
+
},
|
| 613 |
+
{
|
| 614 |
+
"epoch": 1.21,
|
| 615 |
+
"grad_norm": 1.9781038761138916,
|
| 616 |
+
"learning_rate": 2.9799387015881863e-05,
|
| 617 |
+
"loss": 1.7902,
|
| 618 |
+
"step": 43500
|
| 619 |
+
},
|
| 620 |
+
{
|
| 621 |
+
"epoch": 1.23,
|
| 622 |
+
"grad_norm": 1.5816311836242676,
|
| 623 |
+
"learning_rate": 2.9567196062041426e-05,
|
| 624 |
+
"loss": 1.8204,
|
| 625 |
+
"step": 44000
|
| 626 |
+
},
|
| 627 |
+
{
|
| 628 |
+
"epoch": 1.24,
|
| 629 |
+
"grad_norm": 1.7043039798736572,
|
| 630 |
+
"learning_rate": 2.9335005108200985e-05,
|
| 631 |
+
"loss": 1.8009,
|
| 632 |
+
"step": 44500
|
| 633 |
+
},
|
| 634 |
+
{
|
| 635 |
+
"epoch": 1.25,
|
| 636 |
+
"grad_norm": 1.7163785696029663,
|
| 637 |
+
"learning_rate": 2.9102814154360547e-05,
|
| 638 |
+
"loss": 1.8097,
|
| 639 |
+
"step": 45000
|
| 640 |
+
},
|
| 641 |
+
{
|
| 642 |
+
"epoch": 1.27,
|
| 643 |
+
"grad_norm": 1.7846918106079102,
|
| 644 |
+
"learning_rate": 2.887062320052011e-05,
|
| 645 |
+
"loss": 1.7983,
|
| 646 |
+
"step": 45500
|
| 647 |
+
},
|
| 648 |
+
{
|
| 649 |
+
"epoch": 1.28,
|
| 650 |
+
"grad_norm": 2.0754926204681396,
|
| 651 |
+
"learning_rate": 2.8638432246679668e-05,
|
| 652 |
+
"loss": 1.7871,
|
| 653 |
+
"step": 46000
|
| 654 |
+
},
|
| 655 |
+
{
|
| 656 |
+
"epoch": 1.3,
|
| 657 |
+
"grad_norm": 2.4134902954101562,
|
| 658 |
+
"learning_rate": 2.840624129283923e-05,
|
| 659 |
+
"loss": 1.7982,
|
| 660 |
+
"step": 46500
|
| 661 |
+
},
|
| 662 |
+
{
|
| 663 |
+
"epoch": 1.31,
|
| 664 |
+
"grad_norm": 1.5378786325454712,
|
| 665 |
+
"learning_rate": 2.8174050338998793e-05,
|
| 666 |
+
"loss": 1.8158,
|
| 667 |
+
"step": 47000
|
| 668 |
+
},
|
| 669 |
+
{
|
| 670 |
+
"epoch": 1.32,
|
| 671 |
+
"grad_norm": 1.5169446468353271,
|
| 672 |
+
"learning_rate": 2.794185938515835e-05,
|
| 673 |
+
"loss": 1.7911,
|
| 674 |
+
"step": 47500
|
| 675 |
+
},
|
| 676 |
+
{
|
| 677 |
+
"epoch": 1.34,
|
| 678 |
+
"grad_norm": 1.5888478755950928,
|
| 679 |
+
"learning_rate": 2.7709668431317914e-05,
|
| 680 |
+
"loss": 1.7987,
|
| 681 |
+
"step": 48000
|
| 682 |
+
},
|
| 683 |
+
{
|
| 684 |
+
"epoch": 1.35,
|
| 685 |
+
"grad_norm": 1.6548982858657837,
|
| 686 |
+
"learning_rate": 2.7477477477477483e-05,
|
| 687 |
+
"loss": 1.7959,
|
| 688 |
+
"step": 48500
|
| 689 |
+
},
|
| 690 |
+
{
|
| 691 |
+
"epoch": 1.37,
|
| 692 |
+
"grad_norm": 1.9931424856185913,
|
| 693 |
+
"learning_rate": 2.7245286523637042e-05,
|
| 694 |
+
"loss": 1.8071,
|
| 695 |
+
"step": 49000
|
| 696 |
+
},
|
| 697 |
+
{
|
| 698 |
+
"epoch": 1.38,
|
| 699 |
+
"grad_norm": 1.7344861030578613,
|
| 700 |
+
"learning_rate": 2.7013095569796604e-05,
|
| 701 |
+
"loss": 1.7918,
|
| 702 |
+
"step": 49500
|
| 703 |
+
},
|
| 704 |
+
{
|
| 705 |
+
"epoch": 1.39,
|
| 706 |
+
"grad_norm": 1.836928367614746,
|
| 707 |
+
"learning_rate": 2.6780904615956166e-05,
|
| 708 |
+
"loss": 1.8107,
|
| 709 |
+
"step": 50000
|
| 710 |
+
},
|
| 711 |
+
{
|
| 712 |
+
"epoch": 1.41,
|
| 713 |
+
"grad_norm": 1.8835091590881348,
|
| 714 |
+
"learning_rate": 2.6548713662115725e-05,
|
| 715 |
+
"loss": 1.8134,
|
| 716 |
+
"step": 50500
|
| 717 |
+
},
|
| 718 |
+
{
|
| 719 |
+
"epoch": 1.42,
|
| 720 |
+
"grad_norm": 1.6774250268936157,
|
| 721 |
+
"learning_rate": 2.6316522708275288e-05,
|
| 722 |
+
"loss": 1.8117,
|
| 723 |
+
"step": 51000
|
| 724 |
+
},
|
| 725 |
+
{
|
| 726 |
+
"epoch": 1.43,
|
| 727 |
+
"grad_norm": 1.9196052551269531,
|
| 728 |
+
"learning_rate": 2.608433175443485e-05,
|
| 729 |
+
"loss": 1.8253,
|
| 730 |
+
"step": 51500
|
| 731 |
+
},
|
| 732 |
+
{
|
| 733 |
+
"epoch": 1.45,
|
| 734 |
+
"grad_norm": 2.208343505859375,
|
| 735 |
+
"learning_rate": 2.585214080059441e-05,
|
| 736 |
+
"loss": 1.8069,
|
| 737 |
+
"step": 52000
|
| 738 |
+
},
|
| 739 |
+
{
|
| 740 |
+
"epoch": 1.46,
|
| 741 |
+
"grad_norm": 1.866508960723877,
|
| 742 |
+
"learning_rate": 2.561994984675397e-05,
|
| 743 |
+
"loss": 1.801,
|
| 744 |
+
"step": 52500
|
| 745 |
+
},
|
| 746 |
+
{
|
| 747 |
+
"epoch": 1.48,
|
| 748 |
+
"grad_norm": 1.6251405477523804,
|
| 749 |
+
"learning_rate": 2.5387758892913533e-05,
|
| 750 |
+
"loss": 1.7959,
|
| 751 |
+
"step": 53000
|
| 752 |
+
},
|
| 753 |
+
{
|
| 754 |
+
"epoch": 1.49,
|
| 755 |
+
"grad_norm": 1.6582661867141724,
|
| 756 |
+
"learning_rate": 2.5155567939073092e-05,
|
| 757 |
+
"loss": 1.7879,
|
| 758 |
+
"step": 53500
|
| 759 |
+
},
|
| 760 |
+
{
|
| 761 |
+
"epoch": 1.5,
|
| 762 |
+
"grad_norm": 1.6403582096099854,
|
| 763 |
+
"learning_rate": 2.4923376985232658e-05,
|
| 764 |
+
"loss": 1.8034,
|
| 765 |
+
"step": 54000
|
| 766 |
+
},
|
| 767 |
+
{
|
| 768 |
+
"epoch": 1.52,
|
| 769 |
+
"grad_norm": 1.930598258972168,
|
| 770 |
+
"learning_rate": 2.469118603139222e-05,
|
| 771 |
+
"loss": 1.7991,
|
| 772 |
+
"step": 54500
|
| 773 |
+
},
|
| 774 |
+
{
|
| 775 |
+
"epoch": 1.53,
|
| 776 |
+
"grad_norm": 1.7759175300598145,
|
| 777 |
+
"learning_rate": 2.445899507755178e-05,
|
| 778 |
+
"loss": 1.8028,
|
| 779 |
+
"step": 55000
|
| 780 |
+
},
|
| 781 |
+
{
|
| 782 |
+
"epoch": 1.55,
|
| 783 |
+
"grad_norm": 1.8325824737548828,
|
| 784 |
+
"learning_rate": 2.422680412371134e-05,
|
| 785 |
+
"loss": 1.7951,
|
| 786 |
+
"step": 55500
|
| 787 |
+
},
|
| 788 |
+
{
|
| 789 |
+
"epoch": 1.56,
|
| 790 |
+
"grad_norm": 1.7427431344985962,
|
| 791 |
+
"learning_rate": 2.3994613169870904e-05,
|
| 792 |
+
"loss": 1.8051,
|
| 793 |
+
"step": 56000
|
| 794 |
+
},
|
| 795 |
+
{
|
| 796 |
+
"epoch": 1.57,
|
| 797 |
+
"grad_norm": 1.6132415533065796,
|
| 798 |
+
"learning_rate": 2.3762422216030463e-05,
|
| 799 |
+
"loss": 1.8046,
|
| 800 |
+
"step": 56500
|
| 801 |
+
},
|
| 802 |
+
{
|
| 803 |
+
"epoch": 1.59,
|
| 804 |
+
"grad_norm": 1.5871241092681885,
|
| 805 |
+
"learning_rate": 2.3530231262190025e-05,
|
| 806 |
+
"loss": 1.7974,
|
| 807 |
+
"step": 57000
|
| 808 |
+
},
|
| 809 |
+
{
|
| 810 |
+
"epoch": 1.6,
|
| 811 |
+
"grad_norm": 1.6785041093826294,
|
| 812 |
+
"learning_rate": 2.329804030834959e-05,
|
| 813 |
+
"loss": 1.824,
|
| 814 |
+
"step": 57500
|
| 815 |
+
},
|
| 816 |
+
{
|
| 817 |
+
"epoch": 1.62,
|
| 818 |
+
"grad_norm": 1.5425368547439575,
|
| 819 |
+
"learning_rate": 2.306584935450915e-05,
|
| 820 |
+
"loss": 1.8088,
|
| 821 |
+
"step": 58000
|
| 822 |
+
},
|
| 823 |
+
{
|
| 824 |
+
"epoch": 1.63,
|
| 825 |
+
"grad_norm": 2.0951876640319824,
|
| 826 |
+
"learning_rate": 2.283365840066871e-05,
|
| 827 |
+
"loss": 1.7957,
|
| 828 |
+
"step": 58500
|
| 829 |
+
},
|
| 830 |
+
{
|
| 831 |
+
"epoch": 1.64,
|
| 832 |
+
"grad_norm": 1.5651832818984985,
|
| 833 |
+
"learning_rate": 2.2601467446828274e-05,
|
| 834 |
+
"loss": 1.7926,
|
| 835 |
+
"step": 59000
|
| 836 |
+
},
|
| 837 |
+
{
|
| 838 |
+
"epoch": 1.66,
|
| 839 |
+
"grad_norm": 1.9217791557312012,
|
| 840 |
+
"learning_rate": 2.2369276492987833e-05,
|
| 841 |
+
"loss": 1.787,
|
| 842 |
+
"step": 59500
|
| 843 |
+
},
|
| 844 |
+
{
|
| 845 |
+
"epoch": 1.67,
|
| 846 |
+
"grad_norm": 1.565430998802185,
|
| 847 |
+
"learning_rate": 2.2137085539147395e-05,
|
| 848 |
+
"loss": 1.8023,
|
| 849 |
+
"step": 60000
|
| 850 |
+
},
|
| 851 |
+
{
|
| 852 |
+
"epoch": 1.69,
|
| 853 |
+
"grad_norm": 1.7779954671859741,
|
| 854 |
+
"learning_rate": 2.1904894585306957e-05,
|
| 855 |
+
"loss": 1.8024,
|
| 856 |
+
"step": 60500
|
| 857 |
+
},
|
| 858 |
+
{
|
| 859 |
+
"epoch": 1.7,
|
| 860 |
+
"grad_norm": 1.7349501848220825,
|
| 861 |
+
"learning_rate": 2.1672703631466516e-05,
|
| 862 |
+
"loss": 1.7968,
|
| 863 |
+
"step": 61000
|
| 864 |
+
},
|
| 865 |
+
{
|
| 866 |
+
"epoch": 1.71,
|
| 867 |
+
"grad_norm": 2.0634210109710693,
|
| 868 |
+
"learning_rate": 2.1440512677626082e-05,
|
| 869 |
+
"loss": 1.7895,
|
| 870 |
+
"step": 61500
|
| 871 |
+
},
|
| 872 |
+
{
|
| 873 |
+
"epoch": 1.73,
|
| 874 |
+
"grad_norm": 1.7442530393600464,
|
| 875 |
+
"learning_rate": 2.1208321723785644e-05,
|
| 876 |
+
"loss": 1.7896,
|
| 877 |
+
"step": 62000
|
| 878 |
+
},
|
| 879 |
+
{
|
| 880 |
+
"epoch": 1.74,
|
| 881 |
+
"grad_norm": 1.727230191230774,
|
| 882 |
+
"learning_rate": 2.0976130769945203e-05,
|
| 883 |
+
"loss": 1.8213,
|
| 884 |
+
"step": 62500
|
| 885 |
+
},
|
| 886 |
+
{
|
| 887 |
+
"epoch": 1.76,
|
| 888 |
+
"grad_norm": 1.626993179321289,
|
| 889 |
+
"learning_rate": 2.0743939816104765e-05,
|
| 890 |
+
"loss": 1.7983,
|
| 891 |
+
"step": 63000
|
| 892 |
+
},
|
| 893 |
+
{
|
| 894 |
+
"epoch": 1.77,
|
| 895 |
+
"grad_norm": 1.9263286590576172,
|
| 896 |
+
"learning_rate": 2.0511748862264328e-05,
|
| 897 |
+
"loss": 1.8106,
|
| 898 |
+
"step": 63500
|
| 899 |
+
},
|
| 900 |
+
{
|
| 901 |
+
"epoch": 1.78,
|
| 902 |
+
"grad_norm": 2.5916402339935303,
|
| 903 |
+
"learning_rate": 2.0279557908423887e-05,
|
| 904 |
+
"loss": 1.8211,
|
| 905 |
+
"step": 64000
|
| 906 |
+
},
|
| 907 |
+
{
|
| 908 |
+
"epoch": 1.8,
|
| 909 |
+
"grad_norm": 1.739984154701233,
|
| 910 |
+
"learning_rate": 2.004736695458345e-05,
|
| 911 |
+
"loss": 1.8093,
|
| 912 |
+
"step": 64500
|
| 913 |
+
},
|
| 914 |
+
{
|
| 915 |
+
"epoch": 1.81,
|
| 916 |
+
"grad_norm": 1.8712440729141235,
|
| 917 |
+
"learning_rate": 1.9815176000743015e-05,
|
| 918 |
+
"loss": 1.7874,
|
| 919 |
+
"step": 65000
|
| 920 |
+
},
|
| 921 |
+
{
|
| 922 |
+
"epoch": 1.83,
|
| 923 |
+
"grad_norm": 1.425011157989502,
|
| 924 |
+
"learning_rate": 1.9582985046902573e-05,
|
| 925 |
+
"loss": 1.813,
|
| 926 |
+
"step": 65500
|
| 927 |
+
},
|
| 928 |
+
{
|
| 929 |
+
"epoch": 1.84,
|
| 930 |
+
"grad_norm": 1.7112998962402344,
|
| 931 |
+
"learning_rate": 1.9350794093062136e-05,
|
| 932 |
+
"loss": 1.7979,
|
| 933 |
+
"step": 66000
|
| 934 |
+
},
|
| 935 |
+
{
|
| 936 |
+
"epoch": 1.85,
|
| 937 |
+
"grad_norm": 1.6773180961608887,
|
| 938 |
+
"learning_rate": 1.9118603139221698e-05,
|
| 939 |
+
"loss": 1.7976,
|
| 940 |
+
"step": 66500
|
| 941 |
+
},
|
| 942 |
+
{
|
| 943 |
+
"epoch": 1.87,
|
| 944 |
+
"grad_norm": 1.7254269123077393,
|
| 945 |
+
"learning_rate": 1.8886412185381257e-05,
|
| 946 |
+
"loss": 1.8066,
|
| 947 |
+
"step": 67000
|
| 948 |
+
},
|
| 949 |
+
{
|
| 950 |
+
"epoch": 1.88,
|
| 951 |
+
"grad_norm": 1.6539784669876099,
|
| 952 |
+
"learning_rate": 1.865422123154082e-05,
|
| 953 |
+
"loss": 1.7846,
|
| 954 |
+
"step": 67500
|
| 955 |
+
},
|
| 956 |
+
{
|
| 957 |
+
"epoch": 1.89,
|
| 958 |
+
"grad_norm": 1.9257705211639404,
|
| 959 |
+
"learning_rate": 1.842203027770038e-05,
|
| 960 |
+
"loss": 1.8066,
|
| 961 |
+
"step": 68000
|
| 962 |
+
},
|
| 963 |
+
{
|
| 964 |
+
"epoch": 1.91,
|
| 965 |
+
"grad_norm": 2.245328903198242,
|
| 966 |
+
"learning_rate": 1.8189839323859944e-05,
|
| 967 |
+
"loss": 1.7972,
|
| 968 |
+
"step": 68500
|
| 969 |
+
},
|
| 970 |
+
{
|
| 971 |
+
"epoch": 1.92,
|
| 972 |
+
"grad_norm": 1.8069318532943726,
|
| 973 |
+
"learning_rate": 1.7957648370019506e-05,
|
| 974 |
+
"loss": 1.7978,
|
| 975 |
+
"step": 69000
|
| 976 |
+
},
|
| 977 |
+
{
|
| 978 |
+
"epoch": 1.94,
|
| 979 |
+
"grad_norm": 1.8477637767791748,
|
| 980 |
+
"learning_rate": 1.772545741617907e-05,
|
| 981 |
+
"loss": 1.7919,
|
| 982 |
+
"step": 69500
|
| 983 |
+
},
|
| 984 |
+
{
|
| 985 |
+
"epoch": 1.95,
|
| 986 |
+
"grad_norm": 1.6907987594604492,
|
| 987 |
+
"learning_rate": 1.7493266462338627e-05,
|
| 988 |
+
"loss": 1.8147,
|
| 989 |
+
"step": 70000
|
| 990 |
+
},
|
| 991 |
+
{
|
| 992 |
+
"epoch": 1.96,
|
| 993 |
+
"grad_norm": 1.5092899799346924,
|
| 994 |
+
"learning_rate": 1.726107550849819e-05,
|
| 995 |
+
"loss": 1.7916,
|
| 996 |
+
"step": 70500
|
| 997 |
+
},
|
| 998 |
+
{
|
| 999 |
+
"epoch": 1.98,
|
| 1000 |
+
"grad_norm": 1.6707723140716553,
|
| 1001 |
+
"learning_rate": 1.7028884554657752e-05,
|
| 1002 |
+
"loss": 1.8101,
|
| 1003 |
+
"step": 71000
|
| 1004 |
+
},
|
| 1005 |
+
{
|
| 1006 |
+
"epoch": 1.99,
|
| 1007 |
+
"grad_norm": 1.978575587272644,
|
| 1008 |
+
"learning_rate": 1.679669360081731e-05,
|
| 1009 |
+
"loss": 1.7955,
|
| 1010 |
+
"step": 71500
|
| 1011 |
+
},
|
| 1012 |
+
{
|
| 1013 |
+
"epoch": 2.01,
|
| 1014 |
+
"grad_norm": 1.796865463256836,
|
| 1015 |
+
"learning_rate": 1.6564502646976873e-05,
|
| 1016 |
+
"loss": 1.8171,
|
| 1017 |
+
"step": 72000
|
| 1018 |
+
},
|
| 1019 |
+
{
|
| 1020 |
+
"epoch": 2.02,
|
| 1021 |
+
"grad_norm": 1.5833417177200317,
|
| 1022 |
+
"learning_rate": 1.633231169313644e-05,
|
| 1023 |
+
"loss": 1.7683,
|
| 1024 |
+
"step": 72500
|
| 1025 |
+
},
|
| 1026 |
+
{
|
| 1027 |
+
"epoch": 2.03,
|
| 1028 |
+
"grad_norm": 2.0611090660095215,
|
| 1029 |
+
"learning_rate": 1.6100120739295998e-05,
|
| 1030 |
+
"loss": 1.7818,
|
| 1031 |
+
"step": 73000
|
| 1032 |
+
},
|
| 1033 |
+
{
|
| 1034 |
+
"epoch": 2.05,
|
| 1035 |
+
"grad_norm": 1.7287518978118896,
|
| 1036 |
+
"learning_rate": 1.586792978545556e-05,
|
| 1037 |
+
"loss": 1.7965,
|
+      "step": 73500
+    },
+    { "epoch": 2.06, "grad_norm": 1.7561732530593872, "learning_rate": 1.5635738831615122e-05, "loss": 1.795, "step": 74000 },
+    { "epoch": 2.08, "grad_norm": 1.5913869142532349, "learning_rate": 1.540354787777468e-05, "loss": 1.7736, "step": 74500 },
+    { "epoch": 2.09, "grad_norm": 1.8258248567581177, "learning_rate": 1.5171356923934243e-05, "loss": 1.7874, "step": 75000 },
+    { "epoch": 2.1, "grad_norm": 1.7454122304916382, "learning_rate": 1.4939165970093804e-05, "loss": 1.781, "step": 75500 },
+    { "epoch": 2.12, "grad_norm": 1.4468224048614502, "learning_rate": 1.470697501625337e-05, "loss": 1.7779, "step": 76000 },
+    { "epoch": 2.13, "grad_norm": 2.2896320819854736, "learning_rate": 1.447478406241293e-05, "loss": 1.8015, "step": 76500 },
+    { "epoch": 2.15, "grad_norm": 1.531069040298462, "learning_rate": 1.424259310857249e-05, "loss": 1.7946, "step": 77000 },
+    { "epoch": 2.16, "grad_norm": 2.1092686653137207, "learning_rate": 1.4010402154732053e-05, "loss": 1.7847, "step": 77500 },
+    { "epoch": 2.17, "grad_norm": 2.2306418418884277, "learning_rate": 1.3778211200891614e-05, "loss": 1.8005, "step": 78000 },
+    { "epoch": 2.19, "grad_norm": 1.438249111175537, "learning_rate": 1.3546020247051174e-05, "loss": 1.7836, "step": 78500 },
+    { "epoch": 2.2, "grad_norm": 1.8369108438491821, "learning_rate": 1.3313829293210736e-05, "loss": 1.7908, "step": 79000 },
+    { "epoch": 2.22, "grad_norm": 1.5445431470870972, "learning_rate": 1.30816383393703e-05, "loss": 1.7918, "step": 79500 },
+    { "epoch": 2.23, "grad_norm": 1.8314502239227295, "learning_rate": 1.2849447385529861e-05, "loss": 1.7947, "step": 80000 },
+    { "epoch": 2.24, "grad_norm": 1.5848190784454346, "learning_rate": 1.2617256431689423e-05, "loss": 1.7758, "step": 80500 },
+    { "epoch": 2.26, "grad_norm": 1.9737972021102905, "learning_rate": 1.2385065477848984e-05, "loss": 1.7833, "step": 81000 },
+    { "epoch": 2.27, "grad_norm": 1.9529500007629395, "learning_rate": 1.2152874524008545e-05, "loss": 1.7848, "step": 81500 },
+    { "epoch": 2.28, "grad_norm": 1.7836568355560303, "learning_rate": 1.1920683570168107e-05, "loss": 1.7671, "step": 82000 },
+    { "epoch": 2.3, "grad_norm": 1.957573652267456, "learning_rate": 1.1688492616327669e-05, "loss": 1.8016, "step": 82500 },
+    { "epoch": 2.31, "grad_norm": 1.9392001628875732, "learning_rate": 1.145630166248723e-05, "loss": 1.7987, "step": 83000 },
+    { "epoch": 2.33, "grad_norm": 1.8034446239471436, "learning_rate": 1.1224110708646792e-05, "loss": 1.7702, "step": 83500 },
+    { "epoch": 2.34, "grad_norm": 2.152960777282715, "learning_rate": 1.0991919754806353e-05, "loss": 1.7809, "step": 84000 },
+    { "epoch": 2.35, "grad_norm": 2.0862069129943848, "learning_rate": 1.0759728800965915e-05, "loss": 1.7798, "step": 84500 },
+    { "epoch": 2.37, "grad_norm": 1.8202825784683228, "learning_rate": 1.0527537847125477e-05, "loss": 1.7818, "step": 85000 },
+    { "epoch": 2.38, "grad_norm": 1.7600897550582886, "learning_rate": 1.0295346893285038e-05, "loss": 1.7777, "step": 85500 },
+    { "epoch": 2.4, "grad_norm": 2.1568124294281006, "learning_rate": 1.00631559394446e-05, "loss": 1.7878, "step": 86000 },
+    { "epoch": 2.41, "grad_norm": 1.488659381866455, "learning_rate": 9.830964985604162e-06, "loss": 1.7871, "step": 86500 },
+    { "epoch": 2.42, "grad_norm": 1.5254462957382202, "learning_rate": 9.598774031763723e-06, "loss": 1.7701, "step": 87000 },
+    { "epoch": 2.44, "grad_norm": 1.5069713592529297, "learning_rate": 9.366583077923283e-06, "loss": 1.7826, "step": 87500 },
+    { "epoch": 2.45, "grad_norm": 1.986422061920166, "learning_rate": 9.134392124082847e-06, "loss": 1.7964, "step": 88000 },
+    { "epoch": 2.47, "grad_norm": 2.4251198768615723, "learning_rate": 8.902201170242408e-06, "loss": 1.7736, "step": 88500 },
+    { "epoch": 2.48, "grad_norm": 1.583479642868042, "learning_rate": 8.670010216401969e-06, "loss": 1.7759, "step": 89000 },
+    { "epoch": 2.49, "grad_norm": 2.4297444820404053, "learning_rate": 8.437819262561531e-06, "loss": 1.7759, "step": 89500 },
+    { "epoch": 2.51, "grad_norm": 1.8693283796310425, "learning_rate": 8.205628308721093e-06, "loss": 1.7736, "step": 90000 },
+    { "epoch": 2.52, "grad_norm": 1.630678653717041, "learning_rate": 7.973437354880654e-06, "loss": 1.7768, "step": 90500 },
+    { "epoch": 2.54, "grad_norm": 1.6369069814682007, "learning_rate": 7.741246401040216e-06, "loss": 1.7699, "step": 91000 },
+    { "epoch": 2.55, "grad_norm": 1.8540914058685303, "learning_rate": 7.509055447199777e-06, "loss": 1.7691, "step": 91500 },
+    { "epoch": 2.56, "grad_norm": 1.8826764822006226, "learning_rate": 7.27686449335934e-06, "loss": 1.7866, "step": 92000 },
+    { "epoch": 2.58, "grad_norm": 1.901553750038147, "learning_rate": 7.0446735395189e-06, "loss": 1.7818, "step": 92500 },
+    { "epoch": 2.59, "grad_norm": 1.9629470109939575, "learning_rate": 6.812482585678462e-06, "loss": 1.784, "step": 93000 },
+    { "epoch": 2.61, "grad_norm": 2.1246750354766846, "learning_rate": 6.580291631838025e-06, "loss": 1.7887, "step": 93500 },
+    { "epoch": 2.62, "grad_norm": 2.196052074432373, "learning_rate": 6.3481006779975855e-06, "loss": 1.7844, "step": 94000 },
+    { "epoch": 2.63, "grad_norm": 1.5874074697494507, "learning_rate": 6.115909724157147e-06, "loss": 1.7629, "step": 94500 },
+    { "epoch": 2.65, "grad_norm": 1.921108603477478, "learning_rate": 5.883718770316709e-06, "loss": 1.7698, "step": 95000 },
+    { "epoch": 2.66, "grad_norm": 1.6300297975540161, "learning_rate": 5.65152781647627e-06, "loss": 1.7853, "step": 95500 },
+    { "epoch": 2.67, "grad_norm": 1.78380286693573, "learning_rate": 5.419336862635832e-06, "loss": 1.7793, "step": 96000 },
+    { "epoch": 2.69, "grad_norm": 1.6569774150848389, "learning_rate": 5.1871459087953935e-06, "loss": 1.7808, "step": 96500 },
+    { "epoch": 2.7, "grad_norm": 1.647446632385254, "learning_rate": 4.954954954954955e-06, "loss": 1.7691, "step": 97000 },
+    { "epoch": 2.72, "grad_norm": 1.3676789999008179, "learning_rate": 4.722764001114516e-06, "loss": 1.7847, "step": 97500 },
+    { "epoch": 2.73, "grad_norm": 1.8430696725845337, "learning_rate": 4.490573047274079e-06, "loss": 1.7975, "step": 98000 },
+    { "epoch": 2.74, "grad_norm": 2.3150343894958496, "learning_rate": 4.25838209343364e-06, "loss": 1.7717, "step": 98500 },
+    { "epoch": 2.76, "grad_norm": 1.6771624088287354, "learning_rate": 4.0261911395932016e-06, "loss": 1.7868, "step": 99000 },
+    { "epoch": 2.77, "grad_norm": 1.7548917531967163, "learning_rate": 3.7940001857527634e-06, "loss": 1.7876, "step": 99500 },
+    { "epoch": 2.79, "grad_norm": 1.9142273664474487, "learning_rate": 3.561809231912325e-06, "loss": 1.7723, "step": 100000 },
+    { "epoch": 2.8, "grad_norm": 2.0858819484710693, "learning_rate": 3.3296182780718867e-06, "loss": 1.7777, "step": 100500 },
+    { "epoch": 2.81, "grad_norm": 1.5094643831253052, "learning_rate": 3.097427324231448e-06, "loss": 1.7848, "step": 101000 },
+    { "epoch": 2.83, "grad_norm": 1.6694867610931396, "learning_rate": 2.86523637039101e-06, "loss": 1.7849, "step": 101500 },
+    { "epoch": 2.84, "grad_norm": 1.7994331121444702, "learning_rate": 2.6330454165505714e-06, "loss": 1.7929, "step": 102000 },
+    { "epoch": 2.86, "grad_norm": 1.817747950553894, "learning_rate": 2.400854462710133e-06, "loss": 1.7875, "step": 102500 },
+    { "epoch": 2.87, "grad_norm": 1.7881470918655396, "learning_rate": 2.1686635088696947e-06, "loss": 1.7861, "step": 103000 },
+    { "epoch": 2.88, "grad_norm": 1.7034624814987183, "learning_rate": 1.936472555029256e-06, "loss": 1.7788, "step": 103500 },
+    { "epoch": 2.9, "grad_norm": 1.6703755855560303, "learning_rate": 1.7042816011888178e-06, "loss": 1.7901, "step": 104000 },
+    { "epoch": 2.91, "grad_norm": 1.8089648485183716, "learning_rate": 1.4720906473483793e-06, "loss": 1.7705, "step": 104500 },
+    { "epoch": 2.93, "grad_norm": 2.2493557929992676, "learning_rate": 1.239899693507941e-06, "loss": 1.7795, "step": 105000 },
+    { "epoch": 2.94, "grad_norm": 1.8172557353973389, "learning_rate": 1.0077087396675028e-06, "loss": 1.7897, "step": 105500 },
+    { "epoch": 2.95, "grad_norm": 2.0329835414886475, "learning_rate": 7.755177858270642e-07, "loss": 1.784, "step": 106000 },
+    { "epoch": 2.97, "grad_norm": 1.859191656112671, "learning_rate": 5.433268319866259e-07, "loss": 1.775, "step": 106500 },
+    { "epoch": 2.98, "grad_norm": 2.273327589035034, "learning_rate": 3.1113587814618745e-07, "loss": 1.7793, "step": 107000 },
+    { "epoch": 3.0, "grad_norm": 1.7774237394332886, "learning_rate": 7.894492430574905e-08, "loss": 1.7776, "step": 107500 },
+    { "epoch": 3.0, "step": 107670, "total_flos": 2.3281764491516314e+17, "train_loss": 1.8120383115867458, "train_runtime": 16851.3759, "train_samples_per_second": 51.114, "train_steps_per_second": 6.389 }
+  ],
+  "logging_steps": 500,
+  "max_steps": 107670,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "total_flos": 2.3281764491516314e+17,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}
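The entries above are the tail of the Trainer's `log_history`: one record every 500 optimizer steps, plus a final run-summary record. As an illustration only — assuming the checkpoint has been downloaded and `trainer_state.json` sits in the working directory in the standard `transformers` Trainer layout shown above — the logged curve can be read back with nothing more than the standard library:

```python
import json

# Load the trainer state written by transformers.Trainer (format shown above).
with open("trainer_state.json") as f:
    state = json.load(f)

# Per-step records carry epoch, grad_norm, learning_rate, loss and step;
# the final record is the run summary (train_loss, train_runtime, ...).
logs = [rec for rec in state["log_history"] if "loss" in rec]

for rec in logs[-5:]:
    print(f"step {rec['step']:>6}  epoch {rec['epoch']:.2f}  "
          f"loss {rec['loss']:.4f}  lr {rec['learning_rate']:.2e}")
```

The field names (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`) are exactly those recorded in the diff above; the same records could be fed to a plotting library to visualise the loss curve and the linear learning-rate decay toward step 107670.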
training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7d781972ecaafafd6a779f275c472cf5e93fe12fa5cf2ffd147ae32356ed364e
+size 5112
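`training_args.bin` is committed as a Git LFS pointer, so only its hash and size (5112 bytes) appear in the diff; the real file is the pickled `TrainingArguments` object saved by the Trainer. A minimal sketch of how it could be inspected after fetching the LFS object — assuming `transformers` is installed (it is needed for unpickling) and noting that recent PyTorch versions require `weights_only=False` to load non-tensor pickles:

```python
import torch

# training_args.bin is written by transformers.Trainer via torch.save();
# unpickling it reconstructs the TrainingArguments object.
args = torch.load("training_args.bin", weights_only=False)

# Standard TrainingArguments fields; values should match trainer_state.json.
print(args.num_train_epochs)                # expected: 3
print(args.per_device_train_batch_size)     # expected: 8
print(args.logging_steps, args.save_steps)  # expected: 500 500
```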