Training in progress, step 233, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +35 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +31 -0
last-checkpoint/trainer_state.json +1680 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: facebook/opt-350m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "facebook/opt-350m",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "project_out",
+    "project_in",
+    "out_proj",
+    "fc2",
+    "q_proj",
+    "fc1",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
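The config above attaches rank-8 LoRA adapters (`r: 8`, `lora_alpha: 16`) to eight linear layer types, with no bias training (`bias: "none"`). The sketch below estimates the trainable parameter count this implies and the LoRA scaling factor `lora_alpha / r`. The OPT-350m shapes it uses (24 decoder layers, hidden size 1024, FFN size 4096, and 512-dimensional `project_in`/`project_out` embedding projections) are assumptions taken from the public `facebook/opt-350m` configuration, not from this commit.

```python
# Rough LoRA size estimate for the adapter_config.json above.
# ASSUMPTION: OPT-350m dimensions from the public facebook/opt-350m config
# (not from this checkpoint): 24 layers, hidden 1024, FFN 4096, embed proj 512.
NUM_LAYERS = 24
HIDDEN = 1024
FFN = 4096
WORD_EMBED_PROJ = 512

R = 8            # "r" in adapter_config.json
LORA_ALPHA = 16  # "lora_alpha" in adapter_config.json

# (in_features, out_features) of each targeted Linear layer, per decoder layer.
PER_LAYER = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "out_proj": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, FFN),
    "fc2": (FFN, HIDDEN),
}
# Model-level projections between the 512-d embeddings and the 1024-d decoder.
GLOBAL = {
    "project_in": (WORD_EMBED_PROJ, HIDDEN),
    "project_out": (HIDDEN, WORD_EMBED_PROJ),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """LoRA adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


def total_lora_params() -> int:
    per_layer = sum(lora_params(i, o) for i, o in PER_LAYER.values())
    model_level = sum(lora_params(i, o) for i, o in GLOBAL.values())
    return NUM_LAYERS * per_layer + model_level


if __name__ == "__main__":
    n = total_lora_params()
    print(f"LoRA parameters: {n:,}")
    print(f"bytes if stored as fp32 (assumed 4 B/param): {4 * n:,}")
    print(f"scaling lora_alpha / r = {LORA_ALPHA / R}")
```

Under those assumptions the estimate is 3,563,520 parameters, or 14,254,080 bytes if stored as fp32. That is about 40 KB short of the `size 14293800` recorded for `adapter_model.safetensors` just below, and the gap would plausibly be the safetensors header. Treat this only as a consistency check; reading the tensor shapes from the saved file itself would give the authoritative count.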

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f2159ac4486d27f5f80280d129433ce880fef6866acd01a2c063f63a97d4cf84
+size 14293800
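The three-line `.safetensors`, `.pt` and `.pth` diffs in this commit are Git LFS pointer files, not the binaries themselves: each records the spec `version`, the SHA-256 `oid` of the real object, and its `size` in bytes. A minimal standard-library sketch for checking a fetched object (for example after `git lfs pull`) against its pointer; `parse_lfs_pointer` and `verify_object` are illustrative names, not part of Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_object(pointer_text: str, blob_path: Path) -> bool:
    """Return True if the file at blob_path matches the pointer's size and SHA-256."""
    meta = parse_lfs_pointer(pointer_text)
    if blob_path.stat().st_size != meta["size"]:
        return False  # cheap size check before hashing
    h = hashlib.sha256()
    with blob_path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == meta["sha256"]


# The adapter_model.safetensors pointer from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f2159ac4486d27f5f80280d129433ce880fef6866acd01a2c063f63a97d4cf84
size 14293800
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
    # After fetching the real file:
    # verify_object(POINTER, Path("last-checkpoint/adapter_model.safetensors"))
```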

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:90960656b58ffa954194a1d4720b24e2b4247b226a5532e1fe2be1b73430d2ff
+size 7579748

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:98a35daa7142931cd817e20bb35cffc7631377db3d4d758a9489420ab1d5af7a
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b225910dd63530c53a5b242bf23019e7bbecf1c7faf94571845c1196170d12fc
+size 1064
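`scheduler.pt` only holds the scheduler's saved state, but the `learning_rate` values logged in `trainer_state.json` make the schedule's shape checkable. They appear consistent with a linear warmup to a peak of 2e-4 over 10 steps, followed by cosine decay that would reach zero at step 931, the same shape produced by `transformers`' `get_cosine_schedule_with_warmup`. The total of 931 steps is inferred from the logged values (it also agrees with 1 / 0.0010749798441279225 ≈ 930.25 optimizer steps per epoch), not read from `training_args.bin`, so `PEAK_LR`, `WARMUP_STEPS` and `TOTAL_STEPS` below are assumptions.

```python
import math

# ASSUMPTIONS inferred from the logged learning rates, not read from training_args.bin.
PEAK_LR = 2e-4
WARMUP_STEPS = 10
TOTAL_STEPS = 931


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
LOGGED = {
    1: 2e-05,
    10: 0.0002,
    11: 0.00019999941823167997,
    12: 0.00019999767293348887,
    100: 0.0001953245660229215,
    133: 0.00019132677638917449,
}

if __name__ == "__main__":
    for step, logged in LOGGED.items():
        predicted = lr_at(step)
        print(step, logged, predicted, math.isclose(predicted, logged, rel_tol=1e-9))
```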

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "add_bos_token": true,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "</s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "errors": "replace",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "</s>"
+}
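The `chat_template` in this `tokenizer_config.json` uses Llama 3 style header markers (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`) on top of the OPT/GPT-2 tokenizer, whose `bos_token` is `</s>`. Only `<pad>` and `</s>` are registered in `added_tokens_decoder`, so those markers would presumably be split into ordinary BPE pieces rather than mapped to single special tokens. To make the resulting prompt format concrete, here is a plain-Python re-implementation of what the Jinja template renders; `render_chat` is a hypothetical helper for illustration, and in practice `tokenizer.apply_chat_template` renders the stored template directly.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]], bos_token: str = "</s>",
                add_generation_prompt: bool = False) -> str:
    """Mirror the checkpoint's Jinja chat_template in plain Python (illustrative only)."""
    parts = []
    for i, message in enumerate(messages):
        # In Jinja, `| trim` binds to message['content'] only, before the `+` concatenation.
        content = ("<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
                   + message["content"].strip() + "<|eot_id|>")
        if i == 0:
            content = bos_token + content  # bos_token is prepended to the first turn only
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(repr(render_chat([{"role": "user", "content": "Hello!"}], add_generation_prompt=True)))
```

Because the config also sets `add_bos_token: true`, tokenizing an already-rendered prompt with default settings may prepend a second `</s>`; passing `add_special_tokens=False` when tokenizing pre-rendered text is one way to avoid that.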

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1680 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.25047030368180595,
+  "eval_steps": 233,
+  "global_step": 233,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0010749798441279225,
+      "grad_norm": 17.287843704223633,
+      "learning_rate": 2e-05,
+      "loss": 18.567,
+      "step": 1
+    },
+    {
+      "epoch": 0.0010749798441279225,
+      "eval_loss": 4.0416483879089355,
+      "eval_runtime": 6.371,
+      "eval_samples_per_second": 61.529,
+      "eval_steps_per_second": 30.765,
+      "step": 1
+    },
+    {
+      "epoch": 0.002149959688255845,
+      "grad_norm": 19.48370933532715,
+      "learning_rate": 4e-05,
+      "loss": 18.1211,
+      "step": 2
+    },
+    {
+      "epoch": 0.0032249395323837677,
+      "grad_norm": 14.203254699707031,
+      "learning_rate": 6e-05,
+      "loss": 14.5618,
+      "step": 3
+    },
+    {
+      "epoch": 0.00429991937651169,
+      "grad_norm": 16.90338897705078,
+      "learning_rate": 8e-05,
+      "loss": 17.2773,
+      "step": 4
+    },
+    {
+      "epoch": 0.005374899220639613,
+      "grad_norm": 10.459429740905762,
+      "learning_rate": 0.0001,
+      "loss": 14.4675,
+      "step": 5
+    },
+    {
+      "epoch": 0.0064498790647675355,
+      "grad_norm": 16.3222713470459,
+      "learning_rate": 0.00012,
+      "loss": 16.8006,
+      "step": 6
+    },
+    {
+      "epoch": 0.007524858908895459,
+      "grad_norm": 17.23369789123535,
+      "learning_rate": 0.00014,
+      "loss": 14.2422,
+      "step": 7
+    },
+    {
+      "epoch": 0.00859983875302338,
+      "grad_norm": 18.750120162963867,
+      "learning_rate": 0.00016,
+      "loss": 17.4167,
+      "step": 8
+    },
+    {
+      "epoch": 0.009674818597151304,
+      "grad_norm": 12.818103790283203,
+      "learning_rate": 0.00018,
+      "loss": 13.2251,
+      "step": 9
+    },
+    {
+      "epoch": 0.010749798441279226,
+      "grad_norm": 15.632926940917969,
+      "learning_rate": 0.0002,
+      "loss": 14.3551,
+      "step": 10
+    },
+    {
+      "epoch": 0.011824778285407149,
+      "grad_norm": 10.966743469238281,
+      "learning_rate": 0.00019999941823167997,
+      "loss": 12.1329,
+      "step": 11
+    },
+    {
+      "epoch": 0.012899758129535071,
+      "grad_norm": 10.447884559631348,
+      "learning_rate": 0.00019999767293348887,
+      "loss": 11.0189,
+      "step": 12
+    },
+    {
+      "epoch": 0.013974737973662993,
+      "grad_norm": 14.069694519042969,
+      "learning_rate": 0.00019999476412573398,
+      "loss": 12.6871,
+      "step": 13
+    },
+    {
+      "epoch": 0.015049717817790917,
+      "grad_norm": 17.129362106323242,
+      "learning_rate": 0.0001999906918422603,
+      "loss": 12.774,
+      "step": 14
+    },
+    {
+      "epoch": 0.01612469766191884,
+      "grad_norm": 12.345664978027344,
+      "learning_rate": 0.00019998545613045035,
+      "loss": 10.1907,
+      "step": 15
+    },
+    {
+      "epoch": 0.01719967750604676,
+      "grad_norm": 12.960017204284668,
+      "learning_rate": 0.00019997905705122353,
+      "loss": 9.124,
+      "step": 16
+    },
+    {
+      "epoch": 0.018274657350174684,
+      "grad_norm": 17.12679672241211,
+      "learning_rate": 0.0001999714946790355,
+      "loss": 11.1126,
+      "step": 17
+    },
+    {
+      "epoch": 0.019349637194302608,
+      "grad_norm": 13.355628967285156,
+      "learning_rate": 0.0001999627691018772,
+      "loss": 9.9217,
+      "step": 18
+    },
+    {
+      "epoch": 0.02042461703843053,
+      "grad_norm": 14.47995376586914,
+      "learning_rate": 0.00019995288042127393,
+      "loss": 9.8122,
+      "step": 19
+    },
+    {
+      "epoch": 0.021499596882558453,
+      "grad_norm": 15.26109504699707,
+      "learning_rate": 0.00019994182875228417,
+      "loss": 9.1869,
+      "step": 20
+    },
+    {
+      "epoch": 0.022574576726686373,
+      "grad_norm": 19.227262496948242,
+      "learning_rate": 0.00019992961422349805,
+      "loss": 8.0937,
+      "step": 21
+    },
+    {
+      "epoch": 0.023649556570814297,
+      "grad_norm": 15.230995178222656,
+      "learning_rate": 0.00019991623697703613,
+      "loss": 8.5341,
+      "step": 22
+    },
+    {
+      "epoch": 0.02472453641494222,
+      "grad_norm": 16.82895278930664,
+      "learning_rate": 0.00019990169716854758,
+      "loss": 9.1735,
+      "step": 23
+    },
+    {
+      "epoch": 0.025799516259070142,
+      "grad_norm": 22.261363983154297,
+      "learning_rate": 0.00019988599496720836,
+      "loss": 8.5753,
+      "step": 24
+    },
+    {
+      "epoch": 0.026874496103198066,
+      "grad_norm": 14.500650405883789,
+      "learning_rate": 0.0001998691305557194,
+      "loss": 8.7694,
+      "step": 25
+    },
+    {
+      "epoch": 0.027949475947325986,
+      "grad_norm": 13.450296401977539,
+      "learning_rate": 0.00019985110413030425,
+      "loss": 7.6744,
+      "step": 26
+    },
+    {
+      "epoch": 0.02902445579145391,
+      "grad_norm": 11.800576210021973,
+      "learning_rate": 0.00019983191590070703,
+      "loss": 6.7168,
+      "step": 27
+    },
+    {
+      "epoch": 0.030099435635581834,
+      "grad_norm": 15.437250137329102,
+      "learning_rate": 0.00019981156609018977,
+      "loss": 7.8992,
+      "step": 28
+    },
+    {
+      "epoch": 0.031174415479709755,
+      "grad_norm": 13.048258781433105,
+      "learning_rate": 0.00019979005493552996,
+      "loss": 7.4647,
+      "step": 29
+    },
+    {
+      "epoch": 0.03224939532383768,
+      "grad_norm": 17.663209915161133,
+      "learning_rate": 0.00019976738268701784,
+      "loss": 7.6277,
+      "step": 30
+    },
+    {
+      "epoch": 0.0333243751679656,
+      "grad_norm": 17.522117614746094,
+      "learning_rate": 0.00019974354960845326,
+      "loss": 7.3131,
+      "step": 31
+    },
+    {
+      "epoch": 0.03439935501209352,
+      "grad_norm": 16.121286392211914,
+      "learning_rate": 0.00019971855597714284,
+      "loss": 7.1682,
+      "step": 32
+    },
+    {
+      "epoch": 0.035474334856221444,
+      "grad_norm": 12.511422157287598,
+      "learning_rate": 0.00019969240208389665,
+      "loss": 6.4537,
+      "step": 33
+    },
+    {
+      "epoch": 0.03654931470034937,
+      "grad_norm": 14.760931015014648,
+      "learning_rate": 0.00019966508823302483,
+      "loss": 6.8972,
+      "step": 34
+    },
+    {
+      "epoch": 0.03762429454447729,
+      "grad_norm": 16.834484100341797,
+      "learning_rate": 0.00019963661474233402,
+      "loss": 8.2614,
+      "step": 35
+    },
+    {
+      "epoch": 0.038699274388605216,
+      "grad_norm": 13.5601224899292,
+      "learning_rate": 0.0001996069819431237,
+      "loss": 6.4588,
+      "step": 36
+    },
+    {
+      "epoch": 0.03977425423273313,
+      "grad_norm": 14.121377944946289,
+      "learning_rate": 0.00019957619018018242,
+      "loss": 6.057,
+      "step": 37
+    },
+    {
+      "epoch": 0.04084923407686106,
+      "grad_norm": 14.331984519958496,
+      "learning_rate": 0.00019954423981178354,
+      "loss": 5.9236,
+      "step": 38
+    },
+    {
+      "epoch": 0.04192421392098898,
+      "grad_norm": 14.163195610046387,
+      "learning_rate": 0.00019951113120968134,
+      "loss": 6.0719,
+      "step": 39
+    },
+    {
+      "epoch": 0.042999193765116905,
+      "grad_norm": 13.852533340454102,
+      "learning_rate": 0.00019947686475910655,
+      "loss": 5.6034,
+      "step": 40
+    },
+    {
+      "epoch": 0.04407417360924483,
+      "grad_norm": 14.488564491271973,
+      "learning_rate": 0.00019944144085876184,
+      "loss": 7.0848,
+      "step": 41
+    },
+    {
+      "epoch": 0.045149153453372746,
+      "grad_norm": 11.431620597839355,
+      "learning_rate": 0.0001994048599208173,
+      "loss": 5.5335,
+      "step": 42
+    },
+    {
+      "epoch": 0.04622413329750067,
+      "grad_norm": 13.871944427490234,
+      "learning_rate": 0.00019936712237090553,
+      "loss": 5.8063,
+      "step": 43
+    },
+    {
+      "epoch": 0.047299113141628595,
+      "grad_norm": 18.87192726135254,
+      "learning_rate": 0.00019932822864811677,
+      "loss": 6.0023,
+      "step": 44
+    },
+    {
+      "epoch": 0.04837409298575652,
+      "grad_norm": 12.797957420349121,
+      "learning_rate": 0.00019928817920499375,
+      "loss": 5.546,
+      "step": 45
+    },
+    {
+      "epoch": 0.04944907282988444,
+      "grad_norm": 14.95291519165039,
+      "learning_rate": 0.00019924697450752633,
+      "loss": 6.1613,
+      "step": 46
+    },
+    {
+      "epoch": 0.05052405267401236,
+      "grad_norm": 18.501853942871094,
+      "learning_rate": 0.00019920461503514635,
+      "loss": 6.1402,
+      "step": 47
+    },
+    {
+      "epoch": 0.051599032518140284,
+      "grad_norm": 19.192930221557617,
+      "learning_rate": 0.0001991611012807218,
+      "loss": 5.7711,
+      "step": 48
+    },
+    {
+      "epoch": 0.05267401236226821,
+      "grad_norm": 23.865346908569336,
+      "learning_rate": 0.00019911643375055107,
+      "loss": 6.6772,
+      "step": 49
+    },
+    {
+      "epoch": 0.05374899220639613,
+      "grad_norm": 27.616785049438477,
+      "learning_rate": 0.00019907061296435728,
+      "loss": 6.6335,
+      "step": 50
+    },
+    {
+      "epoch": 0.054823972050524056,
+      "grad_norm": 20.929126739501953,
+      "learning_rate": 0.0001990236394552821,
+      "loss": 6.4005,
+      "step": 51
+    },
+    {
+      "epoch": 0.05589895189465197,
+      "grad_norm": 10.98021411895752,
+      "learning_rate": 0.00019897551376987948,
+      "loss": 4.4406,
+      "step": 52
+    },
+    {
+      "epoch": 0.0569739317387799,
+      "grad_norm": 12.884988784790039,
+      "learning_rate": 0.00019892623646810943,
+      "loss": 4.7416,
+      "step": 53
+    },
+    {
+      "epoch": 0.05804891158290782,
+      "grad_norm": 14.663339614868164,
+      "learning_rate": 0.0001988758081233314,
+      "loss": 5.6428,
+      "step": 54
+    },
+    {
+      "epoch": 0.059123891427035745,
+      "grad_norm": 13.638205528259277,
+      "learning_rate": 0.00019882422932229765,
+      "loss": 6.3548,
+      "step": 55
+    },
+    {
+      "epoch": 0.06019887127116367,
+      "grad_norm": 15.025946617126465,
+      "learning_rate": 0.00019877150066514645,
+      "loss": 4.9333,
+      "step": 56
+    },
+    {
+      "epoch": 0.061273851115291586,
+      "grad_norm": 20.201622009277344,
+      "learning_rate": 0.000198717622765395,
+      "loss": 6.5034,
+      "step": 57
+    },
+    {
+      "epoch": 0.06234883095941951,
+      "grad_norm": 14.798491477966309,
+      "learning_rate": 0.00019866259624993246,
+      "loss": 4.757,
+      "step": 58
+    },
+    {
+      "epoch": 0.06342381080354743,
+      "grad_norm": 15.673213005065918,
+      "learning_rate": 0.00019860642175901247,
+      "loss": 7.0599,
+      "step": 59
+    },
+    {
+      "epoch": 0.06449879064767536,
+      "grad_norm": 18.45370864868164,
+      "learning_rate": 0.00019854909994624582,
+      "loss": 6.7934,
+      "step": 60
+    },
+    {
+      "epoch": 0.06557377049180328,
+      "grad_norm": 15.39392375946045,
+      "learning_rate": 0.0001984906314785928,
+      "loss": 5.5127,
+      "step": 61
+    },
+    {
+      "epoch": 0.0666487503359312,
+      "grad_norm": 16.213571548461914,
+      "learning_rate": 0.00019843101703635548,
+      "loss": 4.8815,
+      "step": 62
+    },
+    {
+      "epoch": 0.06772373018005913,
+      "grad_norm": 20.046100616455078,
+      "learning_rate": 0.00019837025731316967,
+      "loss": 5.3901,
+      "step": 63
+    },
+    {
+      "epoch": 0.06879871002418704,
+      "grad_norm": 16.978891372680664,
+      "learning_rate": 0.0001983083530159971,
+      "loss": 5.7858,
+      "step": 64
+    },
+    {
+      "epoch": 0.06987368986831496,
+      "grad_norm": 17.5430965423584,
+      "learning_rate": 0.00019824530486511687,
+      "loss": 6.2824,
+      "step": 65
+    },
+    {
+      "epoch": 0.07094866971244289,
+      "grad_norm": 15.383797645568848,
+      "learning_rate": 0.00019818111359411737,
+      "loss": 4.4531,
+      "step": 66
+    },
+    {
+      "epoch": 0.07202364955657081,
+      "grad_norm": 16.83544921875,
+      "learning_rate": 0.00019811577994988754,
+      "loss": 6.4399,
+      "step": 67
+    },
+    {
+      "epoch": 0.07309862940069874,
+      "grad_norm": 22.3226261138916,
+      "learning_rate": 0.00019804930469260828,
+      "loss": 7.8473,
+      "step": 68
+    },
+    {
+      "epoch": 0.07417360924482666,
+      "grad_norm": 18.50650978088379,
+      "learning_rate": 0.00019798168859574356,
+      "loss": 6.8441,
+      "step": 69
+    },
+    {
+      "epoch": 0.07524858908895458,
+      "grad_norm": 17.836515426635742,
+      "learning_rate": 0.00019791293244603142,
+      "loss": 5.3271,
+      "step": 70
+    },
+    {
+      "epoch": 0.07632356893308251,
+      "grad_norm": 16.706695556640625,
+      "learning_rate": 0.00019784303704347488,
+      "loss": 5.4312,
+      "step": 71
+    },
+    {
+      "epoch": 0.07739854877721043,
+      "grad_norm": 18.03818130493164,
+      "learning_rate": 0.00019777200320133254,
+      "loss": 5.9135,
+      "step": 72
+    },
+    {
+      "epoch": 0.07847352862133836,
+      "grad_norm": 11.856945991516113,
+      "learning_rate": 0.00019769983174610918,
+      "loss": 5.6232,
+      "step": 73
+    },
+    {
+      "epoch": 0.07954850846546627,
+      "grad_norm": 17.87145233154297,
+      "learning_rate": 0.00019762652351754616,
+      "loss": 4.9234,
+      "step": 74
+    },
+    {
+      "epoch": 0.08062348830959419,
+      "grad_norm": 16.913291931152344,
+      "learning_rate": 0.00019755207936861155,
+      "loss": 6.6548,
+      "step": 75
+    },
+    {
+      "epoch": 0.08169846815372211,
+      "grad_norm": 12.137495040893555,
+      "learning_rate": 0.00019747650016549027,
+      "loss": 4.1446,
+      "step": 76
+    },
+    {
+      "epoch": 0.08277344799785004,
+      "grad_norm": 15.74571704864502,
+      "learning_rate": 0.00019739978678757412,
+      "loss": 6.0891,
+      "step": 77
+    },
+    {
+      "epoch": 0.08384842784197796,
+      "grad_norm": 13.001721382141113,
+      "learning_rate": 0.0001973219401274513,
+      "loss": 4.2512,
+      "step": 78
+    },
+    {
+      "epoch": 0.08492340768610589,
+      "grad_norm": 20.199098587036133,
+      "learning_rate": 0.00019724296109089622,
+      "loss": 6.0262,
+      "step": 79
+    },
+    {
+      "epoch": 0.08599838753023381,
+      "grad_norm": 16.964731216430664,
+      "learning_rate": 0.00019716285059685892,
+      "loss": 4.7964,
+      "step": 80
+    },
+    {
+      "epoch": 0.08707336737436173,
+      "grad_norm": 15.965873718261719,
+      "learning_rate": 0.0001970816095774544,
+      "loss": 5.7548,
+      "step": 81
+    },
+    {
+      "epoch": 0.08814834721848966,
+      "grad_norm": 22.0924129486084,
+      "learning_rate": 0.00019699923897795163,
+      "loss": 7.1131,
+      "step": 82
+    },
+    {
+      "epoch": 0.08922332706261758,
+      "grad_norm": 16.394224166870117,
+      "learning_rate": 0.0001969157397567627,
+      "loss": 5.7294,
+      "step": 83
+    },
+    {
+      "epoch": 0.09029830690674549,
+      "grad_norm": 14.458036422729492,
+      "learning_rate": 0.0001968311128854317,
+      "loss": 5.4966,
+      "step": 84
+    },
+    {
+      "epoch": 0.09137328675087342,
+      "grad_norm": 12.637007713317871,
+      "learning_rate": 0.00019674535934862325,
+      "loss": 3.4551,
+      "step": 85
+    },
+    {
+      "epoch": 0.09244826659500134,
+      "grad_norm": 13.059622764587402,
+      "learning_rate": 0.00019665848014411118,
+      "loss": 5.1353,
+      "step": 86
+    },
+    {
+      "epoch": 0.09352324643912927,
+      "grad_norm": 20.794404983520508,
+      "learning_rate": 0.00019657047628276688,
+      "loss": 4.9761,
+      "step": 87
+    },
+    {
+      "epoch": 0.09459822628325719,
+      "grad_norm": 15.221658706665039,
+      "learning_rate": 0.00019648134878854747,
+      "loss": 4.8321,
+      "step": 88
+    },
+    {
+      "epoch": 0.09567320612738511,
+      "grad_norm": 14.838767051696777,
+      "learning_rate": 0.0001963910986984841,
+      "loss": 4.5949,
+      "step": 89
+    },
+    {
+      "epoch": 0.09674818597151304,
+      "grad_norm": 12.857973098754883,
+      "learning_rate": 0.00019629972706266952,
+      "loss": 4.0017,
+      "step": 90
+    },
+    {
+      "epoch": 0.09782316581564096,
+      "grad_norm": 15.524980545043945,
+      "learning_rate": 0.00019620723494424627,
+      "loss": 4.57,
+      "step": 91
+    },
+    {
+      "epoch": 0.09889814565976889,
+      "grad_norm": 11.060179710388184,
+      "learning_rate": 0.000196113623419394,
+      "loss": 4.1406,
+      "step": 92
+    },
+    {
+      "epoch": 0.0999731255038968,
+      "grad_norm": 18.566953659057617,
+      "learning_rate": 0.00019601889357731713,
+      "loss": 4.3026,
+      "step": 93
+    },
+    {
+      "epoch": 0.10104810534802472,
+      "grad_norm": 13.795799255371094,
+      "learning_rate": 0.00019592304652023206,
+      "loss": 3.585,
+      "step": 94
+    },
+    {
+      "epoch": 0.10212308519215264,
+      "grad_norm": 17.49445343017578,
+      "learning_rate": 0.0001958260833633544,
+      "loss": 5.1268,
+      "step": 95
+    },
+    {
+      "epoch": 0.10319806503628057,
+      "grad_norm": 13.12109661102295,
+      "learning_rate": 0.00019572800523488609,
+      "loss": 4.2585,
+      "step": 96
+    },
+    {
+      "epoch": 0.10427304488040849,
+      "grad_norm": 15.185480117797852,
+      "learning_rate": 0.00019562881327600198,
+      "loss": 4.8719,
+      "step": 97
+    },
+    {
+      "epoch": 0.10534802472453642,
+      "grad_norm": 11.378191947937012,
+      "learning_rate": 0.00019552850864083693,
+      "loss": 4.2474,
+      "step": 98
+    },
+    {
+      "epoch": 0.10642300456866434,
+      "grad_norm": 17.479673385620117,
+      "learning_rate": 0.0001954270924964721,
+      "loss": 4.1351,
+      "step": 99
+    },
+    {
+      "epoch": 0.10749798441279226,
+      "grad_norm": 20.179716110229492,
+      "learning_rate": 0.0001953245660229215,
+      "loss": 4.218,
+      "step": 100
+    },
+    {
+      "epoch": 0.10857296425692019,
+      "grad_norm": 15.806692123413086,
+      "learning_rate": 0.00019522093041311815,
+      "loss": 5.7112,
+      "step": 101
+    },
+    {
+      "epoch": 0.10964794410104811,
+      "grad_norm": 13.089418411254883,
+      "learning_rate": 0.00019511618687290043,
+      "loss": 3.3798,
+      "step": 102
+    },
+    {
+      "epoch": 0.11072292394517602,
+      "grad_norm": 15.82168197631836,
+      "learning_rate": 0.00019501033662099778,
+      "loss": 5.123,
+      "step": 103
+    },
+    {
+      "epoch": 0.11179790378930395,
+      "grad_norm": 17.98412322998047,
+      "learning_rate": 0.00019490338088901666,
+      "loss": 4.6133,
+      "step": 104
+    },
+    {
+      "epoch": 0.11287288363343187,
+      "grad_norm": 13.41305160522461,
+      "learning_rate": 0.0001947953209214262,
+      "loss": 4.4088,
+      "step": 105
+    },
+    {
+      "epoch": 0.1139478634775598,
+      "grad_norm": 17.843494415283203,
+      "learning_rate": 0.00019468615797554374,
+      "loss": 3.5071,
+      "step": 106
+    },
+    {
+      "epoch": 0.11502284332168772,
+      "grad_norm": 17.681631088256836,
+      "learning_rate": 0.00019457589332152008,
+      "loss": 5.0372,
+      "step": 107
+    },
+    {
+      "epoch": 0.11609782316581564,
+      "grad_norm": 17.937023162841797,
+      "learning_rate": 0.00019446452824232492,
+      "loss": 4.3635,
+      "step": 108
+    },
+    {
+      "epoch": 0.11717280300994357,
+      "grad_norm": 21.669342041015625,
+      "learning_rate": 0.00019435206403373178,
+      "loss": 5.2923,
+      "step": 109
+    },
+    {
+      "epoch": 0.11824778285407149,
+      "grad_norm": 18.59075927734375,
+      "learning_rate": 0.00019423850200430293,
+      "loss": 4.7142,
+      "step": 110
+    },
+    {
+      "epoch": 0.11932276269819941,
+      "grad_norm": 19.25830841064453,
+      "learning_rate": 0.00019412384347537414,
+      "loss": 5.0176,
+      "step": 111
+    },
+    {
+      "epoch": 0.12039774254232734,
+      "grad_norm": 21.377017974853516,
+      "learning_rate": 0.00019400808978103947,
+      "loss": 5.0599,
+      "step": 112
+    },
+    {
+      "epoch": 0.12147272238645525,
+      "grad_norm": 14.341522216796875,
+      "learning_rate": 0.0001938912422681355,
+      "loss": 4.9352,
+      "step": 113
+    },
+    {
+      "epoch": 0.12254770223058317,
+      "grad_norm": 15.528069496154785,
+      "learning_rate": 0.00019377330229622595,
+      "loss": 5.6631,
+      "step": 114
+    },
+    {
+      "epoch": 0.1236226820747111,
+      "grad_norm": 18.492849349975586,
+      "learning_rate": 0.0001936542712375855,
+      "loss": 4.6148,
+      "step": 115
+    },
+    {
+      "epoch": 0.12469766191883902,
+      "grad_norm": 20.3253116607666,
+      "learning_rate": 0.0001935341504771842,
+      "loss": 4.5666,
+      "step": 116
+    },
+    {
+      "epoch": 0.12577264176296696,
+      "grad_norm": 14.674714088439941,
+      "learning_rate": 0.00019341294141267108,
+      "loss": 4.8294,
+      "step": 117
+    },
+    {
+      "epoch": 0.12684762160709487,
+      "grad_norm": 10.383010864257812,
+      "learning_rate": 0.00019329064545435803,
+      "loss": 4.0049,
+      "step": 118
+    },
+    {
+      "epoch": 0.12792260145122278,
+      "grad_norm": 15.706880569458008,
+      "learning_rate": 0.00019316726402520334,
+      "loss": 4.5301,
+      "step": 119
+    },
+    {
+      "epoch": 0.12899758129535072,
+      "grad_norm": 14.116766929626465,
+      "learning_rate": 0.0001930427985607951,
+      "loss": 4.21,
+      "step": 120
+    },
+    {
+      "epoch": 0.13007256113947863,
+      "grad_norm": 12.008879661560059,
+      "learning_rate": 0.00019291725050933468,
+      "loss": 3.6814,
+      "step": 121
+    },
+    {
+      "epoch": 0.13114754098360656,
+      "grad_norm": 15.146964073181152,
+      "learning_rate": 0.00019279062133161957,
+      "loss": 4.0279,
+      "step": 122
+    },
+    {
+      "epoch": 0.13222252082773447,
+      "grad_norm": 15.026398658752441,
+      "learning_rate": 0.0001926629125010267,
+      "loss": 4.0473,
+      "step": 123
+    },
+    {
+      "epoch": 0.1332975006718624,
+      "grad_norm": 15.858999252319336,
+      "learning_rate": 0.00019253412550349509,
+      "loss": 4.3264,
+      "step": 124
+    },
+    {
+      "epoch": 0.13437248051599032,
+      "grad_norm": 18.721647262573242,
+      "learning_rate": 0.00019240426183750865,
+      "loss": 4.4262,
+      "step": 125
+    },
+    {
+      "epoch": 0.13544746036011826,
+      "grad_norm": 19.55031394958496,
+      "learning_rate": 0.0001922733230140787,
+      "loss": 5.5739,
+      "step": 126
+    },
+    {
+      "epoch": 0.13652244020424617,
+      "grad_norm": 10.331392288208008,
+      "learning_rate": 0.00019214131055672647,
+      "loss": 3.5695,
+      "step": 127
+    },
+    {
+      "epoch": 0.13759742004837408,
+      "grad_norm": 19.27557373046875,
+      "learning_rate": 0.0001920082260014652,
+      "loss": 5.4195,
+      "step": 128
+    },
+    {
+      "epoch": 0.13867239989250202,
+      "grad_norm": 21.552522659301758,
+      "learning_rate": 0.0001918740708967825,
+      "loss": 4.1473,
+      "step": 129
+    },
+    {
+      "epoch": 0.13974737973662993,
+      "grad_norm": 19.07160186767578,
+      "learning_rate": 0.0001917388468036222,
+      "loss": 4.3624,
+      "step": 130
+    },
+    {
+      "epoch": 0.14082235958075787,
+      "grad_norm": 24.328269958496094,
+      "learning_rate": 0.0001916025552953661,
+      "loss": 4.5408,
+      "step": 131
+    },
+    {
+      "epoch": 0.14189733942488578,
+      "grad_norm": 22.924718856811523,
+      "learning_rate": 0.00019146519795781587,
+      "loss": 4.2812,
+      "step": 132
+    },
+    {
+      "epoch": 0.14297231926901371,
+      "grad_norm": 15.937036514282227,
+      "learning_rate": 0.00019132677638917449,
+      "loss": 4.6842,
+      "step": 133
+    },
+    {
+      "epoch": 0.14404729911314162,
+      "grad_norm": 14.515525817871094,
+      "learning_rate": 0.00019118729220002755,
+      "loss": 3.2523,
+      "step": 134
+    },
+    {
+      "epoch": 0.14512227895726956,
+      "grad_norm": 19.804189682006836,
+      "learning_rate": 0.00019104674701332476,
+      "loss": 4.5473,
+      "step": 135
+    },
+    {
+      "epoch": 0.14619725880139747,
+      "grad_norm": 13.662827491760254,
+      "learning_rate": 0.00019090514246436087,
+      "loss": 4.1841,
+      "step": 136
+    },
+    {
+      "epoch": 0.14727223864552538,
+      "grad_norm": 22.40411376953125,
+      "learning_rate": 0.00019076248020075665,
+      "loss": 6.2449,
+      "step": 137
+    },
+    {
+      "epoch": 0.14834721848965332,
+      "grad_norm": 15.33782958984375,
+      "learning_rate": 0.00019061876188243982,
+      "loss": 2.8611,
+      "step": 138
+    },
+    {
+      "epoch": 0.14942219833378123,
+      "grad_norm": 20.899106979370117,
+      "learning_rate": 0.00019047398918162572,
+      "loss": 5.3855,
+      "step": 139
+    },
+    {
+      "epoch": 0.15049717817790917,
+      "grad_norm": 16.781774520874023,
+      "learning_rate": 0.00019032816378279768,
+      "loss": 4.2343,
+      "step": 140
+    },
+    {
+      "epoch": 0.15157215802203708,
+      "grad_norm": 15.55665397644043,
+      "learning_rate": 0.00019018128738268773,
+      "loss": 4.5545,
+      "step": 141
+    },
+    {
+      "epoch": 0.15264713786616502,
+      "grad_norm": 22.28097152709961,
+      "learning_rate": 0.00019003336169025654,
+      "loss": 5.1255,
+      "step": 142
+    },
+    {
+      "epoch": 0.15372211771029293,
+      "grad_norm": 14.668632507324219,
+      "learning_rate": 0.00018988438842667375,
+      "loss": 5.7869,
+      "step": 143
+    },
+    {
+      "epoch": 0.15479709755442086,
+      "grad_norm": 21.854108810424805,
+      "learning_rate": 0.00018973436932529793,
+      "loss": 5.1173,
+      "step": 144
+    },
+    {
+      "epoch": 0.15587207739854878,
+      "grad_norm": 16.630081176757812,
+      "learning_rate": 0.00018958330613165622,
+      "loss": 4.251,
+      "step": 145
+    },
+    {
+      "epoch": 0.1569470572426767,
+      "grad_norm": 16.389333724975586,
+      "learning_rate": 0.00018943120060342425,
+      "loss": 4.6531,
+      "step": 146
+    },
+    {
+      "epoch": 0.15802203708680462,
+      "grad_norm": 14.595359802246094,
+      "learning_rate": 0.0001892780545104056,
+      "loss": 4.1349,
+      "step": 147
+    },
+    {
+      "epoch": 0.15909701693093253,
+      "grad_norm": 18.753944396972656,
+      "learning_rate": 0.00018912386963451113,
+      "loss": 4.0963,
+      "step": 148
+    },
+    {
+      "epoch": 0.16017199677506047,
+      "grad_norm": 15.209190368652344,
+      "learning_rate": 0.00018896864776973837,
+      "loss": 3.6522,
+      "step": 149
+    },
+    {
+      "epoch": 0.16124697661918838,
+      "grad_norm": 20.873994827270508,
+      "learning_rate": 0.00018881239072215063,
+      "loss": 5.3913,
+      "step": 150
+    },
+    {
+      "epoch": 0.16232195646331632,
+      "grad_norm": 12.859075546264648,
+      "learning_rate": 0.00018865510030985588,
+      "loss": 2.6075,
+      "step": 151
+    },
+    {
+      "epoch": 0.16339693630744423,
+      "grad_norm": 21.292451858520508,
+      "learning_rate": 0.00018849677836298568,
+      "loss": 4.9356,
+      "step": 152
+    },
+    {
+      "epoch": 0.16447191615157217,
+      "grad_norm": 14.94565200805664,
+      "learning_rate": 0.00018833742672367393,
+      "loss": 3.804,
+      "step": 153
+    },
+    {
+      "epoch": 0.16554689599570008,
+      "grad_norm": 20.21578025817871,
+      "learning_rate": 0.00018817704724603536,
+      "loss": 5.3554,
+      "step": 154
+    },
+    {
+      "epoch": 0.16662187583982802,
+      "grad_norm": 18.05601692199707,
+      "learning_rate": 0.00018801564179614388,
+      "loss": 4.6274,
+      "step": 155
+    },
+    {
+      "epoch": 0.16769685568395593,
+      "grad_norm": 13.378555297851562,
+      "learning_rate": 0.00018785321225201108,
+      "loss": 3.8398,
+      "step": 156
+    },
+    {
+      "epoch": 0.16877183552808384,
+      "grad_norm": 13.038491249084473,
+      "learning_rate": 0.00018768976050356426,
+      "loss": 3.7924,
+      "step": 157
+    },
+    {
+      "epoch": 0.16984681537221177,
+      "grad_norm": 10.797876358032227,
+      "learning_rate": 0.00018752528845262433,
+      "loss": 3.273,
+      "step": 158
+    },
+    {
+      "epoch": 0.17092179521633968,
+      "grad_norm": 12.845779418945312,
+      "learning_rate": 0.00018735979801288392,
+      "loss": 3.7228,
+      "step": 159
+    },
+    {
+      "epoch": 0.17199677506046762,
+      "grad_norm": 12.737683296203613,
+      "learning_rate": 0.00018719329110988486,
+      "loss": 4.2175,
+      "step": 160
+    },
+    {
+      "epoch": 0.17307175490459553,
+      "grad_norm": 16.20618438720703,
+      "learning_rate": 0.00018702576968099608,
+      "loss": 3.4056,
+      "step": 161
+    },
+    {
+      "epoch": 0.17414673474872347,
+      "grad_norm": 19.91802978515625,
+      "learning_rate": 0.00018685723567539068,
+      "loss": 4.6,
+      "step": 162
+    },
+    {
+      "epoch": 0.17522171459285138,
+      "grad_norm": 15.743769645690918,
+      "learning_rate": 0.00018668769105402365,
+      "loss": 3.5829,
+      "step": 163
+    },
+    {
+      "epoch": 0.17629669443697932,
+      "grad_norm": 16.444740295410156,
+      "learning_rate": 0.00018651713778960875,
+      "loss": 4.4017,
+      "step": 164
+    },
+    {
+      "epoch": 0.17737167428110723,
+      "grad_norm": 15.459141731262207,
+      "learning_rate": 0.0001863455778665957,
+      "loss": 4.1357,
+      "step": 165
+    },
+    {
+      "epoch": 0.17844665412523517,
+      "grad_norm": 18.962736129760742,
+      "learning_rate": 0.00018617301328114705,
+      "loss": 4.5289,
+      "step": 166
+    },
+    {
+      "epoch": 0.17952163396936308,
+      "grad_norm": 17.9486083984375,
+      "learning_rate": 0.000185999446041115,
+      "loss": 4.1333,
+      "step": 167
+    },
+    {
+      "epoch": 0.18059661381349099,
+      "grad_norm": 12.445462226867676,
+      "learning_rate": 0.00018582487816601797,
+      "loss": 3.7512,
+      "step": 168
+    },
+    {
+      "epoch": 0.18167159365761892,
+      "grad_norm": 19.503494262695312,
+      "learning_rate": 0.00018564931168701712,
+      "loss": 4.716,
+      "step": 169
+    },
+    {
+      "epoch": 0.18274657350174683,
+      "grad_norm": 31.339258193969727,
+      "learning_rate": 0.00018547274864689285,
+      "loss": 6.1173,
+      "step": 170
+    },
+    {
+      "epoch": 0.18382155334587477,
+      "grad_norm": 10.792606353759766,
+      "learning_rate": 0.00018529519110002077,
+      "loss": 3.1399,
+      "step": 171
+    },
+    {
+      "epoch": 0.18489653319000268,
+      "grad_norm": 16.1004695892334,
+      "learning_rate": 0.00018511664111234798,
+      "loss": 3.8947,
+      "step": 172
+    },
+    {
+      "epoch": 0.18597151303413062,
+      "grad_norm": 11.773107528686523,
+      "learning_rate": 0.00018493710076136898,
+      "loss": 3.0606,
+      "step": 173
+    },
+    {
+      "epoch": 0.18704649287825853,
+      "grad_norm": 12.119939804077148,
+      "learning_rate": 0.00018475657213610166,
+      "loss": 2.9083,
+      "step": 174
+    },
+    {
+      "epoch": 0.18812147272238647,
+      "grad_norm": 17.70090103149414,
+      "learning_rate": 0.0001845750573370626,
+      "loss": 5.4718,
+      "step": 175
+    },
+    {
+      "epoch": 0.18919645256651438,
+      "grad_norm": 15.901100158691406,
+      "learning_rate": 0.00018439255847624303,
+      "loss": 5.1192,
+      "step": 176
+    },
+    {
+      "epoch": 0.1902714324106423,
+      "grad_norm": 14.755876541137695,
+      "learning_rate": 0.00018420907767708407,
+      "loss": 3.7262,
+      "step": 177
+    },
+    {
+      "epoch": 0.19134641225477023,
+      "grad_norm": 20.44917869567871,
+      "learning_rate": 0.00018402461707445205,
+      "loss": 4.4912,
+      "step": 178
+    },
+    {
+      "epoch": 0.19242139209889814,
+      "grad_norm": 12.053194046020508,
+      "learning_rate": 0.00018383917881461366,
+      "loss": 3.2561,
+      "step": 179
+    },
+    {
+      "epoch": 0.19349637194302607,
+      "grad_norm": 16.65236473083496,
+      "learning_rate": 0.000183652765055211,
+      "loss": 3.8193,
+      "step": 180
+    },
+    {
+      "epoch": 0.19457135178715398,
+      "grad_norm": 18.52997589111328,
+      "learning_rate": 0.00018346537796523645,
+      "loss": 5.1119,
+      "step": 181
+    },
+    {
+      "epoch": 0.19564633163128192,
+      "grad_norm": 20.083873748779297,
+      "learning_rate": 0.0001832770197250075,
+      "loss": 3.7478,
+      "step": 182
+    },
+    {
+      "epoch": 0.19672131147540983,
+      "grad_norm": 16.985313415527344,
+      "learning_rate": 0.00018308769252614124,
+      "loss": 4.1994,
+      "step": 183
+    },
+    {
+      "epoch": 0.19779629131953777,
+      "grad_norm": 20.08932113647461,
+      "learning_rate": 0.00018289739857152903,
+      "loss": 5.1951,
+      "step": 184
+    },
+    {
+      "epoch": 0.19887127116366568,
+      "grad_norm": 19.779159545898438,
+      "learning_rate": 0.00018270614007531076,
+      "loss": 3.849,
+      "step": 185
+    },
+    {
+      "epoch": 0.1999462510077936,
+      "grad_norm": 17.439552307128906,
+      "learning_rate": 0.00018251391926284906,
+      "loss": 3.9962,
+      "step": 186
+    },
+    {
+      "epoch": 0.20102123085192153,
+      "grad_norm": 22.15019416809082,
+      "learning_rate": 0.0001823207383707036,
+      "loss": 5.3047,
+      "step": 187
+    },
+    {
+      "epoch": 0.20209621069604944,
+      "grad_norm": 12.635649681091309,
+      "learning_rate": 0.00018212659964660476,
+      "loss": 2.8466,
+      "step": 188
+    },
+    {
+      "epoch": 0.20317119054017738,
+      "grad_norm": 21.744354248046875,
+      "learning_rate": 0.00018193150534942778,
+      "loss": 4.3091,
+      "step": 189
+    },
+    {
+      "epoch": 0.2042461703843053,
+      "grad_norm": 26.043798446655273,
+      "learning_rate": 0.00018173545774916627,
+      "loss": 3.7433,
+      "step": 190
+    },
+    {
+      "epoch": 0.20532115022843322,
+      "grad_norm": 13.015897750854492,
+      "learning_rate": 0.00018153845912690587,
+      "loss": 4.0063,
+      "step": 191
+    },
+    {
+      "epoch": 0.20639613007256113,
+      "grad_norm": 21.41050148010254,
+      "learning_rate": 0.00018134051177479777,
+      "loss": 3.7365,
+      "step": 192
+    },
+    {
+      "epoch": 0.20747110991668907,
+      "grad_norm": 20.63169288635254,
+      "learning_rate": 0.00018114161799603193,
+      "loss": 3.8878,
+      "step": 193
+    },
+    {
+      "epoch": 0.20854608976081698,
+      "grad_norm": 18.148544311523438,
+      "learning_rate": 0.00018094178010481034,
+      "loss": 3.4437,
+      "step": 194
+    },
+    {
+      "epoch": 0.20962106960494492,
+      "grad_norm": 15.158918380737305,
+      "learning_rate": 0.00018074100042632005,
+      "loss": 3.2009,
+      "step": 195
+    },
+    {
+      "epoch": 0.21069604944907283,
+      "grad_norm": 14.152005195617676,
+      "learning_rate": 0.00018053928129670624,
+      "loss": 3.4912,
+      "step": 196
+    },
+    {
+      "epoch": 0.21177102929320074,
+      "grad_norm": 13.470719337463379,
+      "learning_rate": 0.00018033662506304485,
+      "loss": 3.3799,
+      "step": 197
+    },
+    {
+      "epoch": 0.21284600913732868,
+      "grad_norm": 16.506755828857422,
+      "learning_rate": 0.00018013303408331543,
+      "loss": 2.9757,
+      "step": 198
+    },
+    {
+      "epoch": 0.2139209889814566,
+      "grad_norm": 15.626431465148926,
+      "learning_rate": 0.00017992851072637364,
+      "loss": 4.4908,
+      "step": 199
+    },
+    {
+      "epoch": 0.21499596882558453,
+      "grad_norm": 13.705154418945312,
+      "learning_rate": 0.00017972305737192366,
+      "loss": 3.9591,
+      "step": 200
+    },
+    {
+      "epoch": 0.21607094866971244,
+      "grad_norm": 17.541763305664062,
+      "learning_rate": 0.00017951667641049053,
+      "loss": 3.2296,
+      "step": 201
+    },
+    {
+      "epoch": 0.21714592851384037,
+      "grad_norm": 17.75946044921875,
+      "learning_rate": 0.0001793093702433924,
+      "loss": 3.4873,
+      "step": 202
+    },
+    {
+      "epoch": 0.21822090835796829,
+      "grad_norm": 21.355859756469727,
+      "learning_rate": 0.0001791011412827124,
+      "loss": 5.255,
+      "step": 203
+    },
+    {
+      "epoch": 0.21929588820209622,
+      "grad_norm": 11.548168182373047,
+      "learning_rate": 0.00017889199195127086,
+      "loss": 3.1538,
+      "step": 204
+    },
+    {
+      "epoch": 0.22037086804622413,
+      "grad_norm": 22.31789207458496,
+      "learning_rate": 0.00017868192468259686,
+      "loss": 4.0628,
+      "step": 205
+    },
+    {
+      "epoch": 0.22144584789035204,
+      "grad_norm": 13.790849685668945,
+      "learning_rate": 0.00017847094192090005,
+      "loss": 3.399,
+      "step": 206
+    },
+    {
+      "epoch": 0.22252082773447998,
+      "grad_norm": 9.146944046020508,
+      "learning_rate": 0.00017825904612104215,
+      "loss": 2.4616,
+      "step": 207
+    },
+    {
+      "epoch": 0.2235958075786079,
+      "grad_norm": 14.422385215759277,
+      "learning_rate": 0.00017804623974850844,
+      "loss": 3.5906,
+      "step": 208
+    },
+    {
+      "epoch": 0.22467078742273583,
+      "grad_norm": 16.845298767089844,
+      "learning_rate": 0.00017783252527937905,
+      "loss": 4.7812,
+      "step": 209
+    },
+    {
+      "epoch": 0.22574576726686374,
+      "grad_norm": 24.236703872680664,
+      "learning_rate": 0.0001776179052003001,
+      "loss": 5.1536,
+      "step": 210
+    },
+    {
+      "epoch": 0.22682074711099168,
+      "grad_norm": 21.463781356811523,
+      "learning_rate": 0.00017740238200845485,
+      "loss": 5.0244,
+      "step": 211
+    },
+    {
+      "epoch": 0.2278957269551196,
+      "grad_norm": 18.184011459350586,
+      "learning_rate": 0.00017718595821153462,
+      "loss": 5.0591,
+      "step": 212
+    },
+    {
+      "epoch": 0.22897070679924753,
+      "grad_norm": 16.528148651123047,
+      "learning_rate": 0.0001769686363277096,
+      "loss": 3.7127,
+      "step": 213
+    },
+    {
+      "epoch": 0.23004568664337544,
+      "grad_norm": 16.78187370300293,
+      "learning_rate": 0.0001767504188855995,
+      "loss": 4.499,
+      "step": 214
+    },
+    {
+      "epoch": 0.23112066648750335,
+      "grad_norm": 19.419116973876953,
+      "learning_rate": 0.00017653130842424427,
+      "loss": 3.4537,
+      "step": 215
+    },
+    {
+      "epoch": 0.23219564633163128,
+      "grad_norm": 14.738585472106934,
+      "learning_rate": 0.00017631130749307436,
+      "loss": 3.7363,
+      "step": 216
+    },
+    {
+      "epoch": 0.2332706261757592,
+      "grad_norm": 16.021595001220703,
+      "learning_rate": 0.0001760904186518812,
+      "loss": 4.2678,
+      "step": 217
+    },
+    {
+      "epoch": 0.23434560601988713,
+      "grad_norm": 16.80089569091797,
+      "learning_rate": 0.00017586864447078742,
+      "loss": 4.2492,
+      "step": 218
+    },
+    {
+      "epoch": 0.23542058586401504,
+      "grad_norm": 17.562274932861328,
+      "learning_rate": 0.0001756459875302169,
+      "loss": 4.4658,
+      "step": 219
+    },
+    {
+      "epoch": 0.23649556570814298,
+      "grad_norm": 13.105591773986816,
+      "learning_rate": 0.0001754224504208647,
+      "loss": 3.8664,
+      "step": 220
+    },
+    {
+      "epoch": 0.2375705455522709,
+      "grad_norm": 11.499171257019043,
+      "learning_rate": 0.00017519803574366698,
+      "loss": 3.7275,
+      "step": 221
+    },
+    {
+      "epoch": 0.23864552539639883,
+      "grad_norm": 14.275655746459961,
+      "learning_rate": 0.00017497274610977072,
+      "loss": 3.9924,
+      "step": 222
+    },
+    {
+      "epoch": 0.23972050524052674,
+      "grad_norm": 20.798551559448242,
+      "learning_rate": 0.00017474658414050342,
+      "loss": 4.1779,
+      "step": 223
+    },
+    {
+      "epoch": 0.24079548508465468,
+      "grad_norm": 17.76445770263672,
+      "learning_rate": 0.0001745195524673424,
+      "loss": 4.5601,
+      "step": 224
+    },
+    {
+      "epoch": 0.24187046492878259,
+      "grad_norm": 15.017122268676758,
+      "learning_rate": 0.00017429165373188438,
+      "loss": 3.23,
+      "step": 225
+    },
+    {
+      "epoch": 0.2429454447729105,
+      "grad_norm": 17.35443878173828,
+      "learning_rate": 0.00017406289058581465,
+      "loss": 4.0901,
+      "step": 226
+    },
+    {
+      "epoch": 0.24402042461703843,
+      "grad_norm": 17.933059692382812,
+      "learning_rate": 0.00017383326569087623,
+      "loss": 4.353,
+      "step": 227
+    },
+    {
+      "epoch": 0.24509540446116634,
+      "grad_norm": 16.176124572753906,
+      "learning_rate": 0.0001736027817188389,
+      "loss": 4.1159,
+      "step": 228
+    },
+    {
+      "epoch": 0.24617038430529428,
+      "grad_norm": 18.375131607055664,
+      "learning_rate": 0.00017337144135146817,
+      "loss": 4.673,
+      "step": 229
+    },
+    {
+      "epoch": 0.2472453641494222,
+      "grad_norm": 14.222498893737793,
+      "learning_rate": 0.00017313924728049393,
+      "loss": 3.5181,
+      "step": 230
+    },
+    {
+      "epoch": 0.24832034399355013,
+      "grad_norm": 14.064003944396973,
+      "learning_rate": 0.00017290620220757928,
+      "loss": 3.0101,
+      "step": 231
+    },
+    {
+      "epoch": 0.24939532383767804,
+      "grad_norm": 14.68502140045166,
+      "learning_rate": 0.00017267230884428905,
+      "loss": 2.8587,
+      "step": 232
+    },
+    {
+      "epoch": 0.25047030368180595,
+      "grad_norm": 15.439518928527832,
+      "learning_rate": 0.0001724375699120582,
+      "loss": 3.5475,
+      "step": 233
+    },
+    {
+      "epoch": 0.25047030368180595,
+      "eval_loss": 0.9161850214004517,
+      "eval_runtime": 5.6189,
+      "eval_samples_per_second": 69.765,
+      "eval_steps_per_second": 34.882,
+      "step": 233
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 931,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 233,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1757496188338176.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
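
The `log_history` array above holds two kinds of records. Per-step training records carry `loss`, `grad_norm` and `learning_rate`. Evaluation records carry `eval_loss`, `eval_runtime` and throughput figures.

Below is a minimal Python sketch for reading the array, assuming the standard Hugging Face `trainer_state.json` layout shown here. `summarize_trainer_state` and `load_and_summarize` are hypothetical helper names. The "implied" sample and step counts assume each throughput figure was computed as a count divided by `eval_runtime`.

```python
import json
from statistics import mean


def summarize_trainer_state(state: dict) -> dict:
    """Split `log_history` into training and eval records and summarize them."""
    history = state["log_history"]
    # Training records carry "loss"; evaluation records carry "eval_loss".
    train = [r for r in history if "loss" in r]
    evals = [r for r in history if "eval_loss" in r]

    summary = {
        "num_train_records": len(train),
        "last_train_step": train[-1]["step"] if train else None,
        "mean_train_loss": mean(r["loss"] for r in train) if train else None,
        "last_eval_loss": evals[-1]["eval_loss"] if evals else None,
    }
    if evals:
        last = evals[-1]
        runtime = last["eval_runtime"]
        # Assumes throughput = count / runtime, so count is roughly throughput * runtime.
        summary["implied_eval_samples"] = round(last["eval_samples_per_second"] * runtime)
        summary["implied_eval_steps"] = round(last["eval_steps_per_second"] * runtime)
    return summary


def load_and_summarize(path: str) -> dict:
    """Read a trainer_state.json file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_trainer_state(json.load(f))


# The last two training records and the eval record, copied from the log above.
SAMPLE_STATE = {
    "log_history": [
        {"epoch": 0.24939532383767804, "grad_norm": 14.68502140045166,
         "learning_rate": 0.00017267230884428905, "loss": 2.8587, "step": 232},
        {"epoch": 0.25047030368180595, "grad_norm": 15.439518928527832,
         "learning_rate": 0.0001724375699120582, "loss": 3.5475, "step": 233},
        {"epoch": 0.25047030368180595, "eval_loss": 0.9161850214004517,
         "eval_runtime": 5.6189, "eval_samples_per_second": 69.765,
         "eval_steps_per_second": 34.882, "step": 233},
    ]
}

print(summarize_trainer_state(SAMPLE_STATE))
```

With the step-233 values, 69.765 samples/s × 5.6189 s gives about 392 evaluation samples. Likewise, 34.882 steps/s × 5.6189 s gives about 196 evaluation steps. Together these would suggest an evaluation batch size of 2.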

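The logged `learning_rate` values decrease smoothly from step to step. As a hedged check, they appear consistent with linear warmup followed by a half-cosine decay to zero over `max_steps` = 931. This is the shape produced by `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`.

The peak rate of 2e-4 and the 10 warmup steps are assumptions, inferred by fitting the logged values. Neither is recorded in this file, and `training_args.bin` would be the place to confirm them. `cosine_with_warmup_lr` is a hypothetical helper that reproduces the schedule's formula.

```python
import math


def cosine_with_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# ASSUMPTION: PEAK_LR and WARMUP_STEPS are inferred from the logged values,
# not read from the checkpoint. MAX_STEPS is the "max_steps" field above.
PEAK_LR = 2e-4
WARMUP_STEPS = 10
MAX_STEPS = 931

# Logged "learning_rate" values copied from log_history.
LOGGED = {
    112: 0.00019400808978103947,
    150: 0.00018881239072215063,
    233: 0.0001724375699120582,
}

for step, logged_lr in LOGGED.items():
    predicted = cosine_with_warmup_lr(step, PEAK_LR, WARMUP_STEPS, MAX_STEPS)
    print(f"step {step}: logged={logged_lr:.10e} predicted={predicted:.10e}")
```

If the printed pairs diverge at other logged steps, the schedule or its parameters differ from this assumption.
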
last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7130b963c3f2cbd527a2be5ff787de27ad0add8f8f0fb250c3e5d854997c98f
+size 6776
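
`training_args.bin` is stored through Git LFS, so the diff shows only a three-line pointer. The pointer records the spec `version`, the `oid` (a SHA-256 digest of the real file's contents) and the real file's `size` in bytes.

Below is a minimal Python sketch that parses such a pointer, assuming the simple `key value` per-line layout shown above. The leading `+` on each line is diff markup and not part of the file. `parse_lfs_pointer` is a hypothetical helper name.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for training_args.bin, copied from the diff without the "+" markers.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c7130b963c3f2cbd527a2be5ff787de27ad0add8f8f0fb250c3e5d854997c98f
size 6776
"""

print(parse_lfs_pointer(POINTER))
```

After fetching the real object (for example with `git lfs pull`), its size and SHA-256 digest can be compared against these fields.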

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.