Training in progress, step 195, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +0 -0
last-checkpoint/trainer_state.json +1398 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/Mistral-Nemo-Instruct-2407
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/Mistral-Nemo-Instruct-2407",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "up_proj",
+    "o_proj",
+    "k_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
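
As a rough reading of the adapter config above, the following sketch computes the LoRA scaling factor and an estimate of the adapter's parameter count. The scaling rule (`lora_alpha / r` when `use_rslora` is false, `lora_alpha / sqrt(r)` otherwise) is standard LoRA behaviour. The base-model dimensions are assumptions recalled from the published Mistral-Nemo configuration and do not appear in this diff; the helper names `lora_scaling` and `lora_param_count` are hypothetical.

```python
import math

# Values copied from adapter_config.json above.
ADAPTER = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
    "target_modules": [
        "v_proj", "up_proj", "o_proj", "k_proj",
        "q_proj", "down_proj", "gate_proj",
    ],
}

# ASSUMPTION: base-model dimensions for Mistral-Nemo-Instruct-2407,
# recalled from its published config, not taken from this diff.
HIDDEN = 5120
INTERMEDIATE = 14336
NUM_LAYERS = 40
Q_DIM = 32 * 128   # attention heads * head_dim
KV_DIM = 8 * 128   # key/value heads * head_dim

# (in_features, out_features) of each targeted linear layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Factor applied to the low-rank update B @ A."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


def lora_param_count(r: int, target_modules: list[str]) -> int:
    """Each adapted layer adds A (r x in) and B (out x r) matrices."""
    per_layer = sum(
        r * (MODULE_SHAPES[name][0] + MODULE_SHAPES[name][1])
        for name in target_modules
    )
    return per_layer * NUM_LAYERS


scaling = lora_scaling(ADAPTER["lora_alpha"], ADAPTER["r"], ADAPTER["use_rslora"])
params = lora_param_count(ADAPTER["r"], ADAPTER["target_modules"])
print(f"scaling={scaling}, adapter params={params}, fp32 bytes={params * 4}")
```

Under these assumptions the adapter holds about 28.5 million parameters, or roughly 114 MB in fp32, which is in line with the 114106856-byte size recorded for `adapter_model.safetensors` in this commit.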

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:176532ef8efd189039fc8598e4e192d949a5d608f04a910527799a9cff3bcd57
+size 114106856

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e1667368b8fca5794eb514a6feeb15cfc47722c224b7913dbd58999d2d15334
+size 58562260

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d4eb9b5f71f7fe935cd981ac385ce381accae654433638194abe6af50ff53cb
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:123c682f61f8b37e7e8cdf026c3620c5a4946ee5addcf20f097046daf8a1d55e
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
+size 17078292

last-checkpoint/tokenizer_config.json ADDED

The diff for this file is too large to render.

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1398 @@
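
The `learning_rate` values in the `log_history` entries of this file look like a linear warmup followed by cosine decay. The sketch below is a reconstruction under inferred assumptions, not the run's actual configuration: the peak rate of 1e-4 and the 5 warmup steps are read from steps 1 to 5 of the log, and the total of 779 steps is inferred from the per-step epoch increment (about 778.75 steps per epoch) together with the logged decay. The function name `learning_rate_at` is hypothetical.

```python
import math

PEAK_LR = 1e-4      # learning_rate logged at step 5
WARMUP_STEPS = 5    # steps 1..5 rise linearly by 2e-05 each
TOTAL_STEPS = 779   # ASSUMPTION: inferred from the log, not stated in it


def learning_rate_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history.
LOGGED = {
    1: 2e-05,
    5: 0.0001,
    6: 9.999958813277234e-05,
    100: 9.632872178808766e-05,
    134: 9.330127018922194e-05,
}

max_abs_error = max(abs(learning_rate_at(s) - lr) for s, lr in LOGGED.items())
print(f"max abs error against logged values: {max_abs_error:.3e}")
```

If the reconstruction is right, step 195 sits about a quarter of the way through a single-epoch run, which agrees with the logged `epoch` of roughly 0.25.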

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2504012841091493,
+  "eval_steps": 500,
+  "global_step": 195,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0012841091492776886,
+      "grad_norm": 14.357385635375977,
+      "learning_rate": 2e-05,
+      "loss": 10.3729,
+      "step": 1
+    },
+    {
+      "epoch": 0.002568218298555377,
+      "grad_norm": 18.170686721801758,
+      "learning_rate": 4e-05,
+      "loss": 18.5166,
+      "step": 2
+    },
+    {
+      "epoch": 0.0038523274478330658,
+      "grad_norm": 14.81664752960205,
+      "learning_rate": 6e-05,
+      "loss": 10.5754,
+      "step": 3
+    },
+    {
+      "epoch": 0.005136436597110754,
+      "grad_norm": 18.0324649810791,
+      "learning_rate": 8e-05,
+      "loss": 11.6846,
+      "step": 4
+    },
+    {
+      "epoch": 0.006420545746388443,
+      "grad_norm": 22.417253494262695,
+      "learning_rate": 0.0001,
+      "loss": 10.7478,
+      "step": 5
+    },
+    {
+      "epoch": 0.0077046548956661316,
+      "grad_norm": 18.198213577270508,
+      "learning_rate": 9.999958813277234e-05,
+      "loss": 7.2643,
+      "step": 6
+    },
+    {
+      "epoch": 0.008988764044943821,
+      "grad_norm": 13.069869995117188,
+      "learning_rate": 9.999835253787473e-05,
+      "loss": 6.7081,
+      "step": 7
+    },
+    {
+      "epoch": 0.010272873194221509,
+      "grad_norm": 26.784624099731445,
+      "learning_rate": 9.999629323566323e-05,
+      "loss": 5.4085,
+      "step": 8
+    },
+    {
+      "epoch": 0.011556982343499198,
+      "grad_norm": 16.559431076049805,
+      "learning_rate": 9.999341026006419e-05,
+      "loss": 6.1126,
+      "step": 9
+    },
+    {
+      "epoch": 0.012841091492776886,
+      "grad_norm": 10.393391609191895,
+      "learning_rate": 9.998970365857373e-05,
+      "loss": 4.2975,
+      "step": 10
+    },
+    {
+      "epoch": 0.014125200642054575,
+      "grad_norm": 11.432884216308594,
+      "learning_rate": 9.998517349225698e-05,
+      "loss": 3.7118,
+      "step": 11
+    },
+    {
+      "epoch": 0.015409309791332263,
+      "grad_norm": 8.524252891540527,
+      "learning_rate": 9.9979819835747e-05,
+      "loss": 3.6477,
+      "step": 12
+    },
+    {
+      "epoch": 0.016693418940609953,
+      "grad_norm": 10.844688415527344,
+      "learning_rate": 9.997364277724361e-05,
+      "loss": 4.0287,
+      "step": 13
+    },
+    {
+      "epoch": 0.017977528089887642,
+      "grad_norm": 9.032464981079102,
+      "learning_rate": 9.996664241851197e-05,
+      "loss": 5.2492,
+      "step": 14
+    },
+    {
+      "epoch": 0.019261637239165328,
+      "grad_norm": 34.07743453979492,
+      "learning_rate": 9.99588188748808e-05,
+      "loss": 3.7249,
+      "step": 15
+    },
+    {
+      "epoch": 0.020545746388443017,
+      "grad_norm": 8.82406997680664,
+      "learning_rate": 9.995017227524049e-05,
+      "loss": 4.0989,
+      "step": 16
+    },
+    {
+      "epoch": 0.021829855537720707,
+      "grad_norm": 11.483402252197266,
+      "learning_rate": 9.994070276204116e-05,
+      "loss": 3.6261,
+      "step": 17
+    },
+    {
+      "epoch": 0.023113964686998396,
+      "grad_norm": 11.1176118850708,
+      "learning_rate": 9.993041049129005e-05,
+      "loss": 3.5575,
+      "step": 18
+    },
+    {
+      "epoch": 0.024398073836276082,
+      "grad_norm": 8.112608909606934,
+      "learning_rate": 9.991929563254914e-05,
+      "loss": 4.3715,
+      "step": 19
+    },
+    {
+      "epoch": 0.025682182985553772,
+      "grad_norm": 21.86610984802246,
+      "learning_rate": 9.990735836893226e-05,
+      "loss": 5.0893,
+      "step": 20
+    },
+    {
+      "epoch": 0.02696629213483146,
+      "grad_norm": 7.573733329772949,
+      "learning_rate": 9.989459889710213e-05,
+      "loss": 3.4477,
+      "step": 21
+    },
+    {
+      "epoch": 0.02825040128410915,
+      "grad_norm": 11.741495132446289,
+      "learning_rate": 9.988101742726708e-05,
+      "loss": 4.9142,
+      "step": 22
+    },
+    {
+      "epoch": 0.029534510433386837,
+      "grad_norm": 7.304950714111328,
+      "learning_rate": 9.986661418317759e-05,
+      "loss": 3.4248,
+      "step": 23
+    },
+    {
+      "epoch": 0.030818619582664526,
+      "grad_norm": 7.762635231018066,
+      "learning_rate": 9.985138940212264e-05,
+      "loss": 3.4191,
+      "step": 24
+    },
+    {
+      "epoch": 0.03210272873194221,
+      "grad_norm": 8.252693176269531,
+      "learning_rate": 9.983534333492575e-05,
+      "loss": 3.2264,
+      "step": 25
+    },
+    {
+      "epoch": 0.033386837881219905,
+      "grad_norm": 9.170884132385254,
+      "learning_rate": 9.981847624594092e-05,
+      "loss": 3.5244,
+      "step": 26
+    },
+    {
+      "epoch": 0.03467094703049759,
+      "grad_norm": 7.7457404136657715,
+      "learning_rate": 9.980078841304816e-05,
+      "loss": 3.25,
+      "step": 27
+    },
+    {
+      "epoch": 0.035955056179775284,
+      "grad_norm": 8.28051471710205,
+      "learning_rate": 9.978228012764902e-05,
+      "loss": 4.0507,
+      "step": 28
+    },
+    {
+      "epoch": 0.03723916532905297,
+      "grad_norm": 9.906140327453613,
+      "learning_rate": 9.976295169466178e-05,
+      "loss": 3.178,
+      "step": 29
+    },
+    {
+      "epoch": 0.038523274478330656,
+      "grad_norm": 9.861376762390137,
+      "learning_rate": 9.974280343251637e-05,
+      "loss": 3.9981,
+      "step": 30
+    },
+    {
+      "epoch": 0.03980738362760835,
+      "grad_norm": 6.8915228843688965,
+      "learning_rate": 9.97218356731491e-05,
+      "loss": 4.4493,
+      "step": 31
+    },
+    {
+      "epoch": 0.041091492776886035,
+      "grad_norm": 9.258909225463867,
+      "learning_rate": 9.97000487619973e-05,
+      "loss": 4.5515,
+      "step": 32
+    },
+    {
+      "epoch": 0.04237560192616372,
+      "grad_norm": 11.00881576538086,
+      "learning_rate": 9.967744305799357e-05,
+      "loss": 3.157,
+      "step": 33
+    },
+    {
+      "epoch": 0.043659711075441414,
+      "grad_norm": 7.662258148193359,
+      "learning_rate": 9.965401893355986e-05,
+      "loss": 4.4372,
+      "step": 34
+    },
+    {
+      "epoch": 0.0449438202247191,
+      "grad_norm": 7.644230842590332,
+      "learning_rate": 9.962977677460132e-05,
+      "loss": 3.8406,
+      "step": 35
+    },
+    {
+      "epoch": 0.04622792937399679,
+      "grad_norm": 7.352019309997559,
+      "learning_rate": 9.96047169805e-05,
+      "loss": 3.2486,
+      "step": 36
+    },
+    {
+      "epoch": 0.04751203852327448,
+      "grad_norm": 61.4245491027832,
+      "learning_rate": 9.957883996410821e-05,
+      "loss": 3.9117,
+      "step": 37
+    },
+    {
+      "epoch": 0.048796147672552165,
+      "grad_norm": 20.14286994934082,
+      "learning_rate": 9.955214615174174e-05,
+      "loss": 3.5284,
+      "step": 38
+    },
+    {
+      "epoch": 0.05008025682182986,
+      "grad_norm": 8.276716232299805,
+      "learning_rate": 9.952463598317285e-05,
+      "loss": 4.4079,
+      "step": 39
+    },
+    {
+      "epoch": 0.051364365971107544,
+      "grad_norm": 10.063166618347168,
+      "learning_rate": 9.949630991162304e-05,
+      "loss": 3.4591,
+      "step": 40
+    },
+    {
+      "epoch": 0.05264847512038523,
+      "grad_norm": 7.96419620513916,
+      "learning_rate": 9.946716840375551e-05,
+      "loss": 4.4116,
+      "step": 41
+    },
+    {
+      "epoch": 0.05393258426966292,
+      "grad_norm": 38.29932403564453,
+      "learning_rate": 9.943721193966755e-05,
+      "loss": 4.4902,
+      "step": 42
+    },
+    {
+      "epoch": 0.05521669341894061,
+      "grad_norm": 7.300318717956543,
+      "learning_rate": 9.940644101288259e-05,
+      "loss": 3.5134,
+      "step": 43
+    },
+    {
+      "epoch": 0.0565008025682183,
+      "grad_norm": 8.175259590148926,
+      "learning_rate": 9.937485613034208e-05,
+      "loss": 3.3953,
+      "step": 44
+    },
+    {
+      "epoch": 0.05778491171749599,
+      "grad_norm": 8.191417694091797,
+      "learning_rate": 9.934245781239714e-05,
+      "loss": 5.0298,
+      "step": 45
+    },
+    {
+      "epoch": 0.059069020866773674,
+      "grad_norm": 8.549973487854004,
+      "learning_rate": 9.930924659279999e-05,
+      "loss": 3.8172,
+      "step": 46
+    },
+    {
+      "epoch": 0.060353130016051366,
+      "grad_norm": 7.71370792388916,
+      "learning_rate": 9.927522301869515e-05,
+      "loss": 3.4193,
+      "step": 47
+    },
+    {
+      "epoch": 0.06163723916532905,
+      "grad_norm": 8.459449768066406,
+      "learning_rate": 9.924038765061042e-05,
+      "loss": 3.6469,
+      "step": 48
+    },
+    {
+      "epoch": 0.06292134831460675,
+      "grad_norm": 7.838904857635498,
+      "learning_rate": 9.920474106244763e-05,
+      "loss": 2.6446,
+      "step": 49
+    },
+    {
+      "epoch": 0.06420545746388442,
+      "grad_norm": 9.394428253173828,
+      "learning_rate": 9.916828384147331e-05,
+      "loss": 4.9068,
+      "step": 50
+    },
+    {
+      "epoch": 0.06548956661316212,
+      "grad_norm": 12.709273338317871,
+      "learning_rate": 9.91310165883088e-05,
+      "loss": 6.3763,
+      "step": 51
+    },
+    {
+      "epoch": 0.06677367576243981,
+      "grad_norm": 13.057750701904297,
+      "learning_rate": 9.909293991692048e-05,
+      "loss": 9.5596,
+      "step": 52
+    },
+    {
+      "epoch": 0.06805778491171749,
+      "grad_norm": 8.012452125549316,
+      "learning_rate": 9.905405445460972e-05,
+      "loss": 3.4335,
+      "step": 53
+    },
+    {
+      "epoch": 0.06934189406099518,
+      "grad_norm": 7.457191467285156,
+      "learning_rate": 9.90143608420024e-05,
+      "loss": 4.5937,
+      "step": 54
+    },
+    {
+      "epoch": 0.07062600321027288,
+      "grad_norm": 6.706194877624512,
+      "learning_rate": 9.897385973303845e-05,
+      "loss": 2.9339,
+      "step": 55
+    },
+    {
+      "epoch": 0.07191011235955057,
+      "grad_norm": 7.014476299285889,
+      "learning_rate": 9.893255179496106e-05,
+      "loss": 4.0212,
+      "step": 56
+    },
+    {
+      "epoch": 0.07319422150882825,
+      "grad_norm": 6.815770626068115,
+      "learning_rate": 9.889043770830566e-05,
+      "loss": 2.6769,
+      "step": 57
+    },
+    {
+      "epoch": 0.07447833065810594,
+      "grad_norm": 8.04780101776123,
+      "learning_rate": 9.884751816688872e-05,
+      "loss": 4.4797,
+      "step": 58
+    },
+    {
+      "epoch": 0.07576243980738363,
+      "grad_norm": 8.820972442626953,
+      "learning_rate": 9.880379387779637e-05,
+      "loss": 3.9321,
+      "step": 59
+    },
+    {
+      "epoch": 0.07704654895666131,
+      "grad_norm": 10.293983459472656,
+      "learning_rate": 9.875926556137265e-05,
+      "loss": 4.6843,
+      "step": 60
+    },
+    {
+      "epoch": 0.078330658105939,
+      "grad_norm": 6.741147518157959,
+      "learning_rate": 9.871393395120774e-05,
+      "loss": 3.1355,
+      "step": 61
+    },
+    {
+      "epoch": 0.0796147672552167,
+      "grad_norm": 6.0058417320251465,
+      "learning_rate": 9.866779979412583e-05,
+      "loss": 3.0209,
+      "step": 62
+    },
+    {
+      "epoch": 0.08089887640449438,
+      "grad_norm": 6.913457870483398,
+      "learning_rate": 9.862086385017283e-05,
+      "loss": 3.1158,
+      "step": 63
+    },
+    {
+      "epoch": 0.08218298555377207,
+      "grad_norm": 9.136816024780273,
+      "learning_rate": 9.85731268926038e-05,
+      "loss": 3.9152,
+      "step": 64
+    },
+    {
+      "epoch": 0.08346709470304976,
+      "grad_norm": 10.154070854187012,
+      "learning_rate": 9.852458970787026e-05,
+      "loss": 3.7569,
+      "step": 65
+    },
+    {
+      "epoch": 0.08475120385232744,
+      "grad_norm": 8.0647554397583,
+      "learning_rate": 9.847525309560729e-05,
+      "loss": 4.1515,
+      "step": 66
+    },
+    {
+      "epoch": 0.08603531300160513,
+      "grad_norm": 7.285793304443359,
+      "learning_rate": 9.842511786862019e-05,
+      "loss": 3.9606,
+      "step": 67
+    },
+    {
+      "epoch": 0.08731942215088283,
+      "grad_norm": 8.200295448303223,
+      "learning_rate": 9.837418485287127e-05,
+      "loss": 2.9297,
+      "step": 68
+    },
+    {
+      "epoch": 0.0886035313001605,
+      "grad_norm": 6.3481574058532715,
+      "learning_rate": 9.832245488746611e-05,
+      "loss": 3.7646,
+      "step": 69
+    },
+    {
+      "epoch": 0.0898876404494382,
+      "grad_norm": 7.198950290679932,
+      "learning_rate": 9.826992882463982e-05,
+      "loss": 3.0986,
+      "step": 70
+    },
+    {
+      "epoch": 0.09117174959871589,
+      "grad_norm": 7.222299098968506,
+      "learning_rate": 9.821660752974293e-05,
+      "loss": 4.2255,
+      "step": 71
+    },
+    {
+      "epoch": 0.09245585874799359,
+      "grad_norm": 6.583283424377441,
+      "learning_rate": 9.816249188122724e-05,
+      "loss": 3.1275,
+      "step": 72
+    },
+    {
+      "epoch": 0.09373996789727126,
+      "grad_norm": 7.3878865242004395,
+      "learning_rate": 9.810758277063119e-05,
+      "loss": 3.6662,
+      "step": 73
+    },
+    {
+      "epoch": 0.09502407704654896,
+      "grad_norm": 10.86058521270752,
+      "learning_rate": 9.805188110256534e-05,
+      "loss": 3.7664,
+      "step": 74
+    },
+    {
+      "epoch": 0.09630818619582665,
+      "grad_norm": 6.797985553741455,
+      "learning_rate": 9.799538779469734e-05,
+      "loss": 3.3086,
+      "step": 75
+    },
+    {
+      "epoch": 0.09759229534510433,
+      "grad_norm": 7.570103168487549,
+      "learning_rate": 9.793810377773686e-05,
+      "loss": 4.0137,
+      "step": 76
+    },
+    {
+      "epoch": 0.09887640449438202,
+      "grad_norm": 6.732529163360596,
+      "learning_rate": 9.78800299954203e-05,
+      "loss": 3.6866,
+      "step": 77
+    },
+    {
+      "epoch": 0.10016051364365972,
+      "grad_norm": 5.162958145141602,
+      "learning_rate": 9.782116740449516e-05,
+      "loss": 1.8445,
+      "step": 78
+    },
+    {
+      "epoch": 0.1014446227929374,
+      "grad_norm": 8.043292045593262,
+      "learning_rate": 9.77615169747043e-05,
+      "loss": 4.8526,
+      "step": 79
+    },
+    {
+      "epoch": 0.10272873194221509,
+      "grad_norm": 8.41696548461914,
+      "learning_rate": 9.770107968877003e-05,
+      "loss": 3.6461,
+      "step": 80
+    },
+    {
+      "epoch": 0.10401284109149278,
+      "grad_norm": 4.764773368835449,
+      "learning_rate": 9.763985654237786e-05,
+      "loss": 2.2128,
+      "step": 81
+    },
+    {
+      "epoch": 0.10529695024077046,
+      "grad_norm": 7.494049072265625,
+      "learning_rate": 9.757784854416006e-05,
+      "loss": 3.5384,
+      "step": 82
+    },
+    {
+      "epoch": 0.10658105939004815,
+      "grad_norm": 6.348560810089111,
+      "learning_rate": 9.751505671567913e-05,
+      "loss": 2.7733,
+      "step": 83
+    },
+    {
+      "epoch": 0.10786516853932585,
+      "grad_norm": 7.778624057769775,
+      "learning_rate": 9.745148209141093e-05,
+      "loss": 3.1605,
+      "step": 84
+    },
+    {
+      "epoch": 0.10914927768860354,
+      "grad_norm": 7.731088638305664,
+      "learning_rate": 9.738712571872763e-05,
+      "loss": 3.9436,
+      "step": 85
+    },
+    {
+      "epoch": 0.11043338683788122,
+      "grad_norm": 7.737299919128418,
+      "learning_rate": 9.732198865788047e-05,
+      "loss": 3.858,
+      "step": 86
+    },
+    {
+      "epoch": 0.11171749598715891,
+      "grad_norm": 10.583487510681152,
+      "learning_rate": 9.725607198198227e-05,
+      "loss": 3.343,
+      "step": 87
+    },
+    {
+      "epoch": 0.1130016051364366,
+      "grad_norm": 8.798162460327148,
+      "learning_rate": 9.718937677698976e-05,
+      "loss": 3.4893,
+      "step": 88
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "grad_norm": 7.799386501312256,
+      "learning_rate": 9.712190414168572e-05,
+      "loss": 3.2881,
+      "step": 89
+    },
+    {
+      "epoch": 0.11556982343499198,
+      "grad_norm": 17.616641998291016,
+      "learning_rate": 9.705365518766085e-05,
+      "loss": 4.5298,
+      "step": 90
+    },
+    {
+      "epoch": 0.11685393258426967,
+      "grad_norm": 125.22333526611328,
+      "learning_rate": 9.698463103929542e-05,
+      "loss": 3.6476,
+      "step": 91
+    },
+    {
+      "epoch": 0.11813804173354735,
+      "grad_norm": 10.186848640441895,
+      "learning_rate": 9.691483283374084e-05,
+      "loss": 4.1665,
+      "step": 92
+    },
+    {
+      "epoch": 0.11942215088282504,
+      "grad_norm": 8.034070014953613,
+      "learning_rate": 9.684426172090085e-05,
+      "loss": 3.0576,
+      "step": 93
+    },
+    {
+      "epoch": 0.12070626003210273,
+      "grad_norm": 12.567116737365723,
+      "learning_rate": 9.677291886341256e-05,
+      "loss": 4.9112,
+      "step": 94
+    },
+    {
+      "epoch": 0.12199036918138041,
+      "grad_norm": 9.253437995910645,
+      "learning_rate": 9.67008054366274e-05,
+      "loss": 4.2526,
+      "step": 95
+    },
+    {
+      "epoch": 0.1232744783306581,
+      "grad_norm": 7.658414363861084,
+      "learning_rate": 9.662792262859166e-05,
+      "loss": 3.9683,
+      "step": 96
+    },
+    {
+      "epoch": 0.1245585874799358,
+      "grad_norm": 8.615376472473145,
+      "learning_rate": 9.65542716400269e-05,
+      "loss": 4.5914,
+      "step": 97
+    },
+    {
+      "epoch": 0.1258426966292135,
+      "grad_norm": 8.936386108398438,
+      "learning_rate": 9.647985368431032e-05,
+      "loss": 3.7773,
+      "step": 98
+    },
+    {
+      "epoch": 0.12712680577849117,
+      "grad_norm": 8.9071683883667,
+      "learning_rate": 9.640466998745456e-05,
+      "loss": 4.2849,
+      "step": 99
+    },
+    {
+      "epoch": 0.12841091492776885,
+      "grad_norm": 8.052922248840332,
+      "learning_rate": 9.632872178808766e-05,
+      "loss": 4.1163,
+      "step": 100
+    },
+    {
+      "epoch": 0.12969502407704656,
+      "grad_norm": 21.878726959228516,
+      "learning_rate": 9.625201033743261e-05,
+      "loss": 6.7764,
+      "step": 101
+    },
+    {
+      "epoch": 0.13097913322632423,
+      "grad_norm": 7.946263313293457,
+      "learning_rate": 9.617453689928668e-05,
+      "loss": 3.1499,
+      "step": 102
+    },
+    {
+      "epoch": 0.1322632423756019,
+      "grad_norm": 6.117218494415283,
+      "learning_rate": 9.609630275000072e-05,
+      "loss": 3.096,
+      "step": 103
+    },
+    {
+      "epoch": 0.13354735152487962,
+      "grad_norm": 8.127127647399902,
+      "learning_rate": 9.601730917845797e-05,
+      "loss": 3.2942,
+      "step": 104
+    },
+    {
+      "epoch": 0.1348314606741573,
+      "grad_norm": 7.058871269226074,
+      "learning_rate": 9.5937557486053e-05,
+      "loss": 2.7391,
+      "step": 105
+    },
+    {
+      "epoch": 0.13611556982343498,
+      "grad_norm": 6.4369587898254395,
+      "learning_rate": 9.585704898667014e-05,
+      "loss": 3.0658,
+      "step": 106
+    },
+    {
+      "epoch": 0.13739967897271269,
+      "grad_norm": 6.140347480773926,
+      "learning_rate": 9.577578500666187e-05,
+      "loss": 3.8174,
+      "step": 107
+    },
+    {
+      "epoch": 0.13868378812199036,
+      "grad_norm": 5.188270568847656,
+      "learning_rate": 9.569376688482701e-05,
+      "loss": 2.7533,
+      "step": 108
+    },
+    {
+      "epoch": 0.13996789727126807,
+      "grad_norm": 8.776408195495605,
+      "learning_rate": 9.56109959723886e-05,
+      "loss": 3.5718,
+      "step": 109
+    },
+    {
+      "epoch": 0.14125200642054575,
+      "grad_norm": 8.285562515258789,
+      "learning_rate": 9.552747363297172e-05,
+      "loss": 4.1781,
+      "step": 110
+    },
+    {
+      "epoch": 0.14253611556982343,
+      "grad_norm": 5.825533390045166,
+      "learning_rate": 9.544320124258092e-05,
+      "loss": 3.335,
+      "step": 111
+    },
+    {
+      "epoch": 0.14382022471910114,
+      "grad_norm": 5.4509358406066895,
+      "learning_rate": 9.535818018957767e-05,
+      "loss": 2.9343,
+      "step": 112
+    },
+    {
+      "epoch": 0.14510433386837882,
+      "grad_norm": 5.9650092124938965,
+      "learning_rate": 9.527241187465734e-05,
+      "loss": 2.968,
+      "step": 113
+    },
+    {
+      "epoch": 0.1463884430176565,
+      "grad_norm": 6.678060531616211,
+      "learning_rate": 9.518589771082627e-05,
+      "loss": 4.141,
+      "step": 114
+    },
+    {
+      "epoch": 0.1476725521669342,
+      "grad_norm": 9.003458976745605,
+      "learning_rate": 9.509863912337842e-05,
+      "loss": 3.2701,
+      "step": 115
+    },
+    {
+      "epoch": 0.14895666131621188,
+      "grad_norm": 9.366497039794922,
+      "learning_rate": 9.501063754987188e-05,
+      "loss": 4.5216,
+      "step": 116
+    },
+    {
+      "epoch": 0.15024077046548956,
+      "grad_norm": 5.9272074699401855,
+      "learning_rate": 9.492189444010521e-05,
+      "loss": 3.3256,
+      "step": 117
+    },
+    {
+      "epoch": 0.15152487961476727,
+      "grad_norm": 5.724755764007568,
+      "learning_rate": 9.483241125609358e-05,
+      "loss": 3.6303,
+      "step": 118
+    },
+    {
+      "epoch": 0.15280898876404495,
+      "grad_norm": 7.670520305633545,
+      "learning_rate": 9.474218947204459e-05,
+      "loss": 4.2398,
+      "step": 119
+    },
+    {
+      "epoch": 0.15409309791332262,
+      "grad_norm": 5.741958141326904,
+      "learning_rate": 9.465123057433412e-05,
+      "loss": 3.2178,
+      "step": 120
+    },
+    {
+      "epoch": 0.15537720706260033,
+      "grad_norm": 8.694066047668457,
+      "learning_rate": 9.455953606148172e-05,
+      "loss": 3.3743,
+      "step": 121
+    },
+    {
+      "epoch": 0.156661316211878,
+      "grad_norm": 6.3374128341674805,
+      "learning_rate": 9.446710744412595e-05,
+      "loss": 3.295,
+      "step": 122
+    },
+    {
+      "epoch": 0.1579454253611557,
+      "grad_norm": 8.692503929138184,
+      "learning_rate": 9.437394624499958e-05,
+      "loss": 3.4467,
+      "step": 123
+    },
+    {
+      "epoch": 0.1592295345104334,
+      "grad_norm": 6.704461097717285,
+      "learning_rate": 9.42800539989044e-05,
+      "loss": 3.3916,
+      "step": 124
+    },
+    {
+      "epoch": 0.16051364365971107,
+      "grad_norm": 7.450805187225342,
+      "learning_rate": 9.418543225268596e-05,
+      "loss": 3.5033,
+      "step": 125
+    },
+    {
+      "epoch": 0.16179775280898875,
+      "grad_norm": 6.723392009735107,
+      "learning_rate": 9.409008256520813e-05,
+      "loss": 2.8134,
+      "step": 126
+    },
+    {
+      "epoch": 0.16308186195826646,
+      "grad_norm": 6.70756196975708,
+      "learning_rate": 9.399400650732735e-05,
+      "loss": 3.097,
+      "step": 127
+    },
+    {
+      "epoch": 0.16436597110754414,
+      "grad_norm": 8.40323543548584,
+      "learning_rate": 9.389720566186681e-05,
+      "loss": 3.3212,
+      "step": 128
+    },
+    {
+      "epoch": 0.16565008025682182,
+      "grad_norm": 6.089616775512695,
+      "learning_rate": 9.379968162359034e-05,
+      "loss": 3.1663,
+      "step": 129
+    },
+    {
+      "epoch": 0.16693418940609953,
+      "grad_norm": 7.481161117553711,
+      "learning_rate": 9.370143599917615e-05,
+      "loss": 4.1485,
+      "step": 130
+    },
+    {
+      "epoch": 0.1682182985553772,
+      "grad_norm": 7.091290473937988,
+      "learning_rate": 9.360247040719039e-05,
+      "loss": 3.6029,
+      "step": 131
+    },
+    {
+      "epoch": 0.16950240770465488,
+      "grad_norm": 4.880778789520264,
+      "learning_rate": 9.350278647806037e-05,
+      "loss": 2.2796,
+      "step": 132
+    },
+    {
+      "epoch": 0.1707865168539326,
+      "grad_norm": 6.366818428039551,
+      "learning_rate": 9.340238585404788e-05,
+      "loss": 2.6689,
+      "step": 133
+    },
+    {
+      "epoch": 0.17207062600321027,
+      "grad_norm": 8.25015926361084,
+      "learning_rate": 9.330127018922194e-05,
+      "loss": 4.2835,
+      "step": 134
+    },
+    {
+      "epoch": 0.17335473515248795,
+      "grad_norm": 9.131752967834473,
+      "learning_rate": 9.319944114943171e-05,
+      "loss": 3.3757,
+      "step": 135
+    },
+    {
+      "epoch": 0.17463884430176566,
+      "grad_norm": 8.515515327453613,
+      "learning_rate": 9.309690041227899e-05,
+      "loss": 3.7804,
+      "step": 136
+    },
+    {
+      "epoch": 0.17592295345104333,
+      "grad_norm": 7.522832870483398,
+      "learning_rate": 9.29936496670905e-05,
+      "loss": 4.3532,
+      "step": 137
+    },
+    {
+      "epoch": 0.177207062600321,
+      "grad_norm": 9.975517272949219,
+      "learning_rate": 9.288969061489021e-05,
+      "loss": 5.3386,
+      "step": 138
+    },
+    {
+      "epoch": 0.17849117174959872,
+      "grad_norm": 7.00437593460083,
+      "learning_rate": 9.278502496837116e-05,
+      "loss": 2.997,
+      "step": 139
+    },
+    {
+      "epoch": 0.1797752808988764,
+      "grad_norm": 7.276975631713867,
+      "learning_rate": 9.267965445186733e-05,
+      "loss": 3.6225,
+      "step": 140
+    },
+    {
+      "epoch": 0.1810593900481541,
+      "grad_norm": 8.758260726928711,
+      "learning_rate": 9.257358080132523e-05,
+      "loss": 4.9658,
+      "step": 141
+    },
+    {
+      "epoch": 0.18234349919743179,
+      "grad_norm": 8.253544807434082,
+      "learning_rate": 9.24668057642753e-05,
+      "loss": 4.3946,
+      "step": 142
+    },
+    {
+      "epoch": 0.18362760834670946,
+      "grad_norm": 7.510740280151367,
+      "learning_rate": 9.235933109980301e-05,
+      "loss": 3.6756,
+      "step": 143
+    },
+    {
+      "epoch": 0.18491171749598717,
+      "grad_norm": 6.977139472961426,
+      "learning_rate": 9.225115857852014e-05,
+      "loss": 3.4522,
+      "step": 144
+    },
+    {
+      "epoch": 0.18619582664526485,
+      "grad_norm": 7.4238972663879395,
+      "learning_rate": 9.214228998253527e-05,
+      "loss": 3.8661,
+      "step": 145
+    },
+    {
+      "epoch": 0.18747993579454253,
+      "grad_norm": 7.118825435638428,
+      "learning_rate": 9.20327271054247e-05,
+      "loss": 3.6105,
+      "step": 146
+    },
+    {
+      "epoch": 0.18876404494382024,
+      "grad_norm": 8.235129356384277,
+      "learning_rate": 9.192247175220276e-05,
+      "loss": 3.7152,
+      "step": 147
+    },
+    {
+      "epoch": 0.19004815409309792,
+      "grad_norm": 6.447744846343994,
+      "learning_rate": 9.181152573929215e-05,
+      "loss": 3.706,
+      "step": 148
+    },
+    {
+      "epoch": 0.1913322632423756,
+      "grad_norm": 10.314383506774902,
+      "learning_rate": 9.16998908944939e-05,
+      "loss": 3.9419,
+      "step": 149
+    },
+    {
+      "epoch": 0.1926163723916533,
+      "grad_norm": 8.218704223632812,
+      "learning_rate": 9.158756905695739e-05,
+      "loss": 3.598,
+      "step": 150
+    },
+    {
+      "epoch": 0.19390048154093098,
+      "grad_norm": 11.949498176574707,
+      "learning_rate": 9.147456207714997e-05,
+      "loss": 5.7329,
+      "step": 151
+    },
+    {
+      "epoch": 0.19518459069020866,
+      "grad_norm": 18.321691513061523,
+      "learning_rate": 9.13608718168265e-05,
+      "loss": 9.0487,
+      "step": 152
+    },
+    {
+      "epoch": 0.19646869983948637,
+      "grad_norm": 6.177528381347656,
+      "learning_rate": 9.124650014899867e-05,
+      "loss": 2.8493,
+      "step": 153
+    },
+    {
+      "epoch": 0.19775280898876405,
+      "grad_norm": 6.979193687438965,
+      "learning_rate": 9.113144895790416e-05,
+      "loss": 3.3198,
+      "step": 154
+    },
+    {
+      "epoch": 0.19903691813804172,
+      "grad_norm": 9.914359092712402,
+      "learning_rate": 9.101572013897555e-05,
+      "loss": 3.9472,
+      "step": 155
+    },
+    {
+      "epoch": 0.20032102728731943,
+      "grad_norm": 5.75458288192749,
+      "learning_rate": 9.089931559880917e-05,
+      "loss": 2.7112,
+      "step": 156
+    },
+    {
+      "epoch": 0.2016051364365971,
+      "grad_norm": 7.481175899505615,
+      "learning_rate": 9.078223725513366e-05,
+      "loss": 3.9783,
+      "step": 157
+    },
+    {
+      "epoch": 0.2028892455858748,
+      "grad_norm": 7.678169250488281,
+      "learning_rate": 9.066448703677828e-05,
+      "loss": 3.1695,
+      "step": 158
+    },
+    {
+      "epoch": 0.2041733547351525,
+      "grad_norm": 8.589323997497559,
+      "learning_rate": 9.05460668836413e-05,
+      "loss": 3.9182,
+      "step": 159
+    },
+    {
+      "epoch": 0.20545746388443017,
+      "grad_norm": 7.451757907867432,
+      "learning_rate": 9.04269787466579e-05,
+      "loss": 3.1488,
+      "step": 160
+    },
+    {
+      "epoch": 0.20674157303370785,
+      "grad_norm": 5.324924468994141,
+      "learning_rate": 9.030722458776814e-05,
+      "loss": 3.0638,
+      "step": 161
+    },
+    {
+      "epoch": 0.20802568218298556,
+      "grad_norm": 6.008599281311035,
+      "learning_rate": 9.018680637988456e-05,
+      "loss": 2.8569,
+      "step": 162
+    },
+    {
+      "epoch": 0.20930979133226324,
+      "grad_norm": 5.319399833679199,
+      "learning_rate": 9.006572610685968e-05,
+      "loss": 2.2673,
+      "step": 163
+    },
+    {
+      "epoch": 0.21059390048154092,
+      "grad_norm": 7.054010391235352,
+      "learning_rate": 8.994398576345336e-05,
+      "loss": 4.3112,
+      "step": 164
+    },
+    {
+      "epoch": 0.21187800963081863,
+      "grad_norm": 7.008988857269287,
+      "learning_rate": 8.98215873552999e-05,
+      "loss": 3.7225,
+      "step": 165
+    },
+    {
+      "epoch": 0.2131621187800963,
+      "grad_norm": 6.705210208892822,
+      "learning_rate": 8.969853289887506e-05,
+      "loss": 3.9772,
+      "step": 166
+    },
+    {
+      "epoch": 0.21444622792937398,
+      "grad_norm": 6.178483009338379,
+      "learning_rate": 8.957482442146272e-05,
+      "loss": 3.0743,
+      "step": 167
+    },
+    {
+      "epoch": 0.2157303370786517,
+      "grad_norm": 7.0775532722473145,
+      "learning_rate": 8.945046396112158e-05,
+      "loss": 2.6931,
+      "step": 168
+    },
+    {
+      "epoch": 0.21701444622792937,
+      "grad_norm": 5.330657958984375,
+      "learning_rate": 8.932545356665157e-05,
+      "loss": 2.9166,
+      "step": 169
+    },
+    {
+      "epoch": 0.21829855537720708,
+      "grad_norm": 7.562477111816406,
+      "learning_rate": 8.919979529756008e-05,
+      "loss": 4.482,
+      "step": 170
+    },
+    {
+      "epoch": 0.21958266452648476,
+      "grad_norm": 7.135578632354736,
+      "learning_rate": 8.907349122402801e-05,
+      "loss": 3.0901,
+      "step": 171
+    },
+    {
+      "epoch": 0.22086677367576243,
+      "grad_norm": 8.700528144836426,
+      "learning_rate": 8.894654342687574e-05,
+      "loss": 3.7164,
+      "step": 172
+    },
+    {
+      "epoch": 0.22215088282504014,
+      "grad_norm": 9.571561813354492,
+      "learning_rate": 8.881895399752874e-05,
+      "loss": 4.8774,
+      "step": 173
+    },
+    {
+      "epoch": 0.22343499197431782,
+      "grad_norm": 5.657547950744629,
+      "learning_rate": 8.869072503798316e-05,
+      "loss": 2.6804,
+      "step": 174
+    },
+    {
+      "epoch": 0.2247191011235955,
+      "grad_norm": 6.577152252197266,
+      "learning_rate": 8.856185866077129e-05,
+      "loss": 3.2296,
+      "step": 175
+    },
+    {
+      "epoch": 0.2260032102728732,
+      "grad_norm": 6.924808025360107,
+      "learning_rate": 8.84323569889266e-05,
+      "loss": 2.9022,
+      "step": 176
+    },
+    {
+      "epoch": 0.22728731942215089,
+      "grad_norm": 7.516001224517822,
+      "learning_rate": 8.83022221559489e-05,
+      "loss": 3.6781,
+      "step": 177
+    },
+    {
+      "epoch": 0.22857142857142856,
+      "grad_norm": 7.878939628601074,
+      "learning_rate": 8.81714563057691e-05,
+      "loss": 4.868,
+      "step": 178
+    },
+    {
+      "epoch": 0.22985553772070627,
+      "grad_norm": 6.71074914932251,
+      "learning_rate": 8.80400615927139e-05,
+      "loss": 3.0282,
+      "step": 179
+    },
+    {
+      "epoch": 0.23113964686998395,
+      "grad_norm": 6.780827045440674,
+      "learning_rate": 8.790804018147039e-05,
+      "loss": 3.2256,
+      "step": 180
+    },
+    {
+      "epoch": 0.23242375601926163,
+      "grad_norm": 8.828377723693848,
+      "learning_rate": 8.777539424705023e-05,
+      "loss": 3.7639,
+      "step": 181
+    },
+    {
+      "epoch": 0.23370786516853934,
+      "grad_norm": 5.720644950866699,
+      "learning_rate": 8.764212597475397e-05,
+      "loss": 3.2172,
+      "step": 182
+    },
+    {
+      "epoch": 0.23499197431781702,
+      "grad_norm": 5.93733024597168,
+      "learning_rate": 8.750823756013498e-05,
+      "loss": 2.8897,
+      "step": 183
+    },
+    {
+      "epoch": 0.2362760834670947,
+      "grad_norm": 7.359304428100586,
+      "learning_rate": 8.737373120896324e-05,
+      "loss": 3.9367,
+      "step": 184
+    },
+    {
+      "epoch": 0.2375601926163724,
+      "grad_norm": 7.054259300231934,
+      "learning_rate": 8.72386091371891e-05,
+      "loss": 3.5839,
+      "step": 185
+    },
+    {
+      "epoch": 0.23884430176565008,
+      "grad_norm": 8.43356704711914,
+      "learning_rate": 8.710287357090665e-05,
+      "loss": 4.4185,
+      "step": 186
+    },
+    {
+      "epoch": 0.24012841091492776,
+      "grad_norm": 6.883121967315674,
+      "learning_rate": 8.696652674631717e-05,
+      "loss": 3.2293,
+      "step": 187
+    },
+    {
+      "epoch": 0.24141252006420547,
+      "grad_norm": 6.794466972351074,
+      "learning_rate": 8.68295709096922e-05,
+      "loss": 3.3295,
+      "step": 188
+    },
+    {
+      "epoch": 0.24269662921348314,
+      "grad_norm": 6.58953332901001,
+      "learning_rate": 8.669200831733655e-05,
+      "loss": 2.9255,
+      "step": 189
+    },
+    {
+      "epoch": 0.24398073836276082,
+      "grad_norm": 7.68890905380249,
+      "learning_rate": 8.655384123555117e-05,
+      "loss": 3.6443,
+      "step": 190
+    },
+    {
+      "epoch": 0.24526484751203853,
+      "grad_norm": 8.961833953857422,
+      "learning_rate": 8.641507194059579e-05,
+      "loss": 3.0887,
+      "step": 191
+    },
+    {
+      "epoch": 0.2465489566613162,
+      "grad_norm": 7.77858829498291,
+      "learning_rate": 8.627570271865141e-05,
+      "loss": 5.0099,
+      "step": 192
+    },
+    {
+      "epoch": 0.2478330658105939,
+      "grad_norm": 6.027552604675293,
+      "learning_rate": 8.613573586578262e-05,
+      "loss": 3.234,
+      "step": 193
+    },
+    {
+      "epoch": 0.2491171749598716,
+      "grad_norm": 5.958004951477051,
+      "learning_rate": 8.59951736878998e-05,
+      "loss": 2.8241,
+      "step": 194
+    },
+    {
+      "epoch": 0.2504012841091493,
+      "grad_norm": 8.112199783325195,
+      "learning_rate": 8.585401850072113e-05,
+      "loss": 3.3898,
+      "step": 195
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 779,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 195,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.2987881056305152e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d85cb130960f0aa596839f027964534123ea615eb33c6d38662ebcf5a8455a2
+size 6776
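
The `learning_rate` values logged in `trainer_state.json` above look consistent with a linear warmup followed by cosine decay. The sketch below checks this against two logged entries. The peak rate of `1e-4` and the 5 warmup steps are assumptions obtained by fitting the logged numbers; the checkpoint does not state them in this diff, and only `max_steps` (779) is taken from the file. `cosine_lr` is a hypothetical helper, not a library function.

```python
import math

PEAK_LR = 1e-4      # assumption: inferred from the logged values, not stated in the log
WARMUP_STEPS = 5    # assumption: inferred by fitting, not stated in the log
MAX_STEPS = 779     # "max_steps" in trainer_state.json


def cosine_lr(step: int) -> float:
    """Learning rate at a given step under linear warmup plus cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Two (step, learning_rate) pairs copied from the log_history entries above.
LOGGED = {
    134: 9.330127018922194e-05,
    195: 8.585401850072113e-05,
}

for step, logged_lr in LOGGED.items():
    predicted = cosine_lr(step)
    print(f"step {step}: predicted {predicted:.6e}, logged {logged_lr:.6e}")
```

If the assumed schedule is right, the predicted and logged values should agree to several significant digits. A mismatch would suggest a different warmup length, peak rate, or scheduler type.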