Training in progress, step 200, checkpoint

Files changed (12)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +45 -0
last-checkpoint/trainer_state.json +1458 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/mistral-7b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/mistral-7b",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
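
The `lora_alpha`, `r`, and `use_rslora` fields above together determine how strongly the low-rank update is weighted. The following is a minimal sketch of that arithmetic, assuming the standard LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper name `lora_scaling` is hypothetical and is not part of this commit.

```python
import json

# Subset of the adapter_config.json fields shown in the diff above.
adapter_config = json.loads("""
{
  "lora_alpha": 128,
  "r": 64,
  "lora_dropout": 0.1,
  "use_rslora": false
}
""")


def lora_scaling(cfg: dict) -> float:
    """Return the multiplier applied to the low-rank update (hypothetical helper)."""
    rank = cfg["r"]
    if cfg.get("use_rslora"):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / rank ** 0.5
    return cfg["lora_alpha"] / rank


scaling = lora_scaling(adapter_config)
print(scaling)
```

With `use_rslora` set to `false`, as in this checkpoint, the factor depends only on the ratio of `lora_alpha` to `r`.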

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a0e78c49d5f64864a5c21a93b5d11c9d89de56b1cf2b9def457dcdd0c106e062
+size 671149168
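
As a rough sanity check, the reported adapter size can be compared with the number of LoRA parameters implied by `r = 64` and the seven target modules. The sketch below assumes the commonly published Mistral-7B dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers). None of these values appear in this diff, and the storage dtype is not stated either, so the fp32 comparison is only a plausibility check.

```python
# Assumed Mistral-7B dimensions (not stated anywhere in this commit).
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dim (assumption)
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 64              # "r" from adapter_config.json

# (in_features, out_features) for each targeted projection.
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes: dict, rank: int, num_layers: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(PROJ_SHAPES, RANK, NUM_LAYERS)
fp32_bytes = params * 4                 # 4 bytes per parameter if stored as fp32
reported_size = 671149168               # "size" from the LFS pointer above
overhead = reported_size - fp32_bytes   # remainder left for header/metadata

print(params, fp32_bytes, overhead)
```

Under these assumptions the tensor payload accounts for nearly the entire file, which is consistent with (but does not prove) fp32 adapter weights.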

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e89396e2f95d688a15f97c05aa13aef4fcd4406618965db2a9d35f9c22a9b15
+size 1342555602

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ab2fa3f6cdfc2b2f902d1dd5a8a5a47ec334345d8ce1301b4b6e9dd74531de3
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1476a1aca76f187bd74e573a7df5cf16373c5fc2ca25ce5bfbcfe83c91c35c2f
+size 1064
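
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'version' / 'oid' / 'size' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the scheduler.pt hunk above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1476a1aca76f187bd74e573a7df5cf16373c5fc2ca25ce5bfbcfe83c91c35c2f\n"
    "size 1064\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The digest and size can then be compared against a downloaded file to confirm that it matches the pointer.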

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,45 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<unk>",
+  "padding_side": "left",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
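
The `chat_template` above is a Jinja template. To make its output easier to see, here is a plain-Python sketch that mirrors its logic as read from the template string: trim each message, wrap it in header markers, prepend `bos_token` to the first message, and optionally append an assistant header. The function `render_chat` is a hypothetical stand-in, not the tokenizer's own API. Note that the `<|start_header_id|>`-style markers are not among the three entries in `added_tokens_decoder`, so they may be tokenized as ordinary text.

```python
def render_chat(messages, bos_token="<s>", add_generation_prompt=False):
    """Plain-Python mirror of the chat_template logic (illustrative only)."""
    parts = []
    for index, message in enumerate(messages):
        # The template applies `| trim` to the message content only.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip() + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


rendered = render_chat(
    [{"role": "user", "content": "  Hello  "}],
    add_generation_prompt=True,
)
print(repr(rendered))
```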

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1458 @@

+{
+  "best_metric": 3.317033290863037,
+  "best_model_checkpoint": "miner_id_24/checkpoint-200",
+  "epoch": 0.0639897616381379,
+  "eval_steps": 200,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00031994880819068947,
+      "grad_norm": 22.934980392456055,
+      "learning_rate": 4e-05,
+      "loss": 6.7197,
+      "step": 1
+    },
+    {
+      "epoch": 0.00031994880819068947,
+      "eval_loss": 3.3727481365203857,
+      "eval_runtime": 231.8319,
+      "eval_samples_per_second": 5.677,
+      "eval_steps_per_second": 1.419,
+      "step": 1
+    },
+    {
+      "epoch": 0.0006398976163813789,
+      "grad_norm": 17.148805618286133,
+      "learning_rate": 8e-05,
+      "loss": 6.7449,
+      "step": 2
+    },
+    {
+      "epoch": 0.0009598464245720685,
+      "grad_norm": 7.445838451385498,
+      "learning_rate": 0.00012,
+      "loss": 5.9164,
+      "step": 3
+    },
+    {
+      "epoch": 0.0012797952327627579,
+      "grad_norm": 5.818986415863037,
+      "learning_rate": 0.00016,
+      "loss": 5.9623,
+      "step": 4
+    },
+    {
+      "epoch": 0.0015997440409534474,
+      "grad_norm": 45.06270980834961,
+      "learning_rate": 0.0002,
+      "loss": 6.327,
+      "step": 5
+    },
+    {
+      "epoch": 0.001919692849144137,
+      "grad_norm": 2.642199993133545,
+      "learning_rate": 0.00024,
+      "loss": 5.9756,
+      "step": 6
+    },
+    {
+      "epoch": 0.0022396416573348264,
+      "grad_norm": 10.692281723022461,
+      "learning_rate": 0.00028,
+      "loss": 6.0676,
+      "step": 7
+    },
+    {
+      "epoch": 0.0025595904655255157,
+      "grad_norm": 18.375436782836914,
+      "learning_rate": 0.00032,
+      "loss": 6.2809,
+      "step": 8
+    },
+    {
+      "epoch": 0.0028795392737162055,
+      "grad_norm": 8.690791130065918,
+      "learning_rate": 0.00036,
+      "loss": 6.5217,
+      "step": 9
+    },
+    {
+      "epoch": 0.003199488081906895,
+      "grad_norm": 278.91900634765625,
+      "learning_rate": 0.0004,
+      "loss": 20.744,
+      "step": 10
+    },
+    {
+      "epoch": 0.0035194368900975845,
+      "grad_norm": 62.391605377197266,
+      "learning_rate": 0.0003999998992085638,
+      "loss": 8.1234,
+      "step": 11
+    },
+    {
+      "epoch": 0.003839385698288274,
+      "grad_norm": 67.343017578125,
+      "learning_rate": 0.00039999959683436215,
+      "loss": 8.2013,
+      "step": 12
+    },
+    {
+      "epoch": 0.004159334506478964,
+      "grad_norm": 30.584428787231445,
+      "learning_rate": 0.0003999990928777159,
+      "loss": 7.7704,
+      "step": 13
+    },
+    {
+      "epoch": 0.004479283314669653,
+      "grad_norm": 35.005210876464844,
+      "learning_rate": 0.0003999983873391596,
+      "loss": 9.1118,
+      "step": 14
+    },
+    {
+      "epoch": 0.004799232122860342,
+      "grad_norm": 23.695714950561523,
+      "learning_rate": 0.00039999748021944193,
+      "loss": 7.7276,
+      "step": 15
+    },
+    {
+      "epoch": 0.0051191809310510315,
+      "grad_norm": 34.03984451293945,
+      "learning_rate": 0.0003999963715195253,
+      "loss": 7.4958,
+      "step": 16
+    },
+    {
+      "epoch": 0.005439129739241722,
+      "grad_norm": 13.545119285583496,
+      "learning_rate": 0.0003999950612405859,
+      "loss": 6.6119,
+      "step": 17
+    },
+    {
+      "epoch": 0.005759078547432411,
+      "grad_norm": 21.298782348632812,
+      "learning_rate": 0.0003999935493840139,
+      "loss": 7.3331,
+      "step": 18
+    },
+    {
+      "epoch": 0.0060790273556231,
+      "grad_norm": 8.125480651855469,
+      "learning_rate": 0.0003999918359514135,
+      "loss": 6.7062,
+      "step": 19
+    },
+    {
+      "epoch": 0.00639897616381379,
+      "grad_norm": 12.093789100646973,
+      "learning_rate": 0.0003999899209446023,
+      "loss": 6.6066,
+      "step": 20
+    },
+    {
+      "epoch": 0.006718924972004479,
+      "grad_norm": 5.6320085525512695,
+      "learning_rate": 0.00039998780436561234,
+      "loss": 6.4927,
+      "step": 21
+    },
+    {
+      "epoch": 0.007038873780195169,
+      "grad_norm": 4.095371723175049,
+      "learning_rate": 0.00039998548621668904,
+      "loss": 6.3477,
+      "step": 22
+    },
+    {
+      "epoch": 0.007358822588385858,
+      "grad_norm": 6.548256874084473,
+      "learning_rate": 0.00039998296650029197,
+      "loss": 6.627,
+      "step": 23
+    },
+    {
+      "epoch": 0.007678771396576548,
+      "grad_norm": 5.446329116821289,
+      "learning_rate": 0.0003999802452190944,
+      "loss": 6.4111,
+      "step": 24
+    },
+    {
+      "epoch": 0.007998720204767237,
+      "grad_norm": 338.24359130859375,
+      "learning_rate": 0.0003999773223759835,
+      "loss": 17.7788,
+      "step": 25
+    },
+    {
+      "epoch": 0.008318669012957927,
+      "grad_norm": 5.659632205963135,
+      "learning_rate": 0.0003999741979740603,
+      "loss": 6.6552,
+      "step": 26
+    },
+    {
+      "epoch": 0.008638617821148616,
+      "grad_norm": 5.3004679679870605,
+      "learning_rate": 0.00039997087201663976,
+      "loss": 6.6323,
+      "step": 27
+    },
+    {
+      "epoch": 0.008958566629339306,
+      "grad_norm": 4.444129943847656,
+      "learning_rate": 0.00039996734450725046,
+      "loss": 6.5521,
+      "step": 28
+    },
+    {
+      "epoch": 0.009278515437529996,
+      "grad_norm": 5.8855695724487305,
+      "learning_rate": 0.0003999636154496351,
+      "loss": 6.4258,
+      "step": 29
+    },
+    {
+      "epoch": 0.009598464245720684,
+      "grad_norm": 4.542850971221924,
+      "learning_rate": 0.00039995968484774993,
+      "loss": 6.635,
+      "step": 30
+    },
+    {
+      "epoch": 0.009918413053911375,
+      "grad_norm": 11.79590892791748,
+      "learning_rate": 0.0003999555527057653,
+      "loss": 6.1833,
+      "step": 31
+    },
+    {
+      "epoch": 0.010238361862102063,
+      "grad_norm": 7.367285251617432,
+      "learning_rate": 0.00039995121902806506,
+      "loss": 6.7455,
+      "step": 32
+    },
+    {
+      "epoch": 0.010558310670292753,
+      "grad_norm": 5.325396537780762,
+      "learning_rate": 0.0003999466838192473,
+      "loss": 6.6042,
+      "step": 33
+    },
+    {
+      "epoch": 0.010878259478483443,
+      "grad_norm": 4.978571891784668,
+      "learning_rate": 0.00039994194708412365,
+      "loss": 6.8796,
+      "step": 34
+    },
+    {
+      "epoch": 0.011198208286674132,
+      "grad_norm": 7.301144599914551,
+      "learning_rate": 0.0003999370088277195,
+      "loss": 6.5597,
+      "step": 35
+    },
+    {
+      "epoch": 0.011518157094864822,
+      "grad_norm": 3.852827787399292,
+      "learning_rate": 0.00039993186905527427,
+      "loss": 6.4115,
+      "step": 36
+    },
+    {
+      "epoch": 0.01183810590305551,
+      "grad_norm": 12.045527458190918,
+      "learning_rate": 0.000399926527772241,
+      "loss": 6.6072,
+      "step": 37
+    },
+    {
+      "epoch": 0.0121580547112462,
+      "grad_norm": 4.803719997406006,
+      "learning_rate": 0.00039992098498428663,
+      "loss": 6.6006,
+      "step": 38
+    },
+    {
+      "epoch": 0.01247800351943689,
+      "grad_norm": 14.13237476348877,
+      "learning_rate": 0.0003999152406972919,
+      "loss": 6.3613,
+      "step": 39
+    },
+    {
+      "epoch": 0.01279795232762758,
+      "grad_norm": 5.234755516052246,
+      "learning_rate": 0.00039990929491735117,
+      "loss": 6.4241,
+      "step": 40
+    },
+    {
+      "epoch": 0.01311790113581827,
+      "grad_norm": 913.4176635742188,
+      "learning_rate": 0.0003999031476507727,
+      "loss": 9.6577,
+      "step": 41
+    },
+    {
+      "epoch": 0.013437849944008958,
+      "grad_norm": 203.76280212402344,
+      "learning_rate": 0.0003998967989040786,
+      "loss": 11.1443,
+      "step": 42
+    },
+    {
+      "epoch": 0.013757798752199648,
+      "grad_norm": 201.66928100585938,
+      "learning_rate": 0.0003998902486840046,
+      "loss": 8.6425,
+      "step": 43
+    },
+    {
+      "epoch": 0.014077747560390338,
+      "grad_norm": 100.8696060180664,
+      "learning_rate": 0.0003998834969975002,
+      "loss": 9.0582,
+      "step": 44
+    },
+    {
+      "epoch": 0.014397696368581027,
+      "grad_norm": 39.69711685180664,
+      "learning_rate": 0.0003998765438517287,
+      "loss": 7.4971,
+      "step": 45
+    },
+    {
+      "epoch": 0.014717645176771717,
+      "grad_norm": 37.98813247680664,
+      "learning_rate": 0.0003998693892540672,
+      "loss": 6.8231,
+      "step": 46
+    },
+    {
+      "epoch": 0.015037593984962405,
+      "grad_norm": 10.686429977416992,
+      "learning_rate": 0.0003998620332121064,
+      "loss": 6.8307,
+      "step": 47
+    },
+    {
+      "epoch": 0.015357542793153095,
+      "grad_norm": 23.25743865966797,
+      "learning_rate": 0.0003998544757336509,
+      "loss": 7.0079,
+      "step": 48
+    },
+    {
+      "epoch": 0.015677491601343786,
+      "grad_norm": 22.725971221923828,
+      "learning_rate": 0.0003998467168267187,
+      "loss": 6.8757,
+      "step": 49
+    },
+    {
+      "epoch": 0.015997440409534474,
+      "grad_norm": 8.51931095123291,
+      "learning_rate": 0.0003998387564995418,
+      "loss": 6.8021,
+      "step": 50
+    },
+    {
+      "epoch": 0.016317389217725162,
+      "grad_norm": 6.992290496826172,
+      "learning_rate": 0.0003998305947605658,
+      "loss": 7.0493,
+      "step": 51
+    },
+    {
+      "epoch": 0.016637338025915854,
+      "grad_norm": 4.017070770263672,
+      "learning_rate": 0.0003998222316184501,
+      "loss": 6.7286,
+      "step": 52
+    },
+    {
+      "epoch": 0.016957286834106543,
+      "grad_norm": 4.484210968017578,
+      "learning_rate": 0.00039981366708206743,
+      "loss": 6.7036,
+      "step": 53
+    },
+    {
+      "epoch": 0.01727723564229723,
+      "grad_norm": 8.955873489379883,
+      "learning_rate": 0.0003998049011605047,
+      "loss": 7.0503,
+      "step": 54
+    },
+    {
+      "epoch": 0.017597184450487923,
+      "grad_norm": 6.5257792472839355,
+      "learning_rate": 0.00039979593386306223,
+      "loss": 6.905,
+      "step": 55
+    },
+    {
+      "epoch": 0.01791713325867861,
+      "grad_norm": 6.154618740081787,
+      "learning_rate": 0.00039978676519925374,
+      "loss": 6.761,
+      "step": 56
+    },
+    {
+      "epoch": 0.0182370820668693,
+      "grad_norm": 6.142489910125732,
+      "learning_rate": 0.00039977739517880703,
+      "loss": 6.6128,
+      "step": 57
+    },
+    {
+      "epoch": 0.018557030875059992,
+      "grad_norm": 10.0971040725708,
+      "learning_rate": 0.0003997678238116633,
+      "loss": 6.9919,
+      "step": 58
+    },
+    {
+      "epoch": 0.01887697968325068,
+      "grad_norm": 6.254273414611816,
+      "learning_rate": 0.00039975805110797745,
+      "loss": 6.7167,
+      "step": 59
+    },
+    {
+      "epoch": 0.01919692849144137,
+      "grad_norm": 6.39977502822876,
+      "learning_rate": 0.0003997480770781178,
+      "loss": 6.6531,
+      "step": 60
+    },
+    {
+      "epoch": 0.019516877299632057,
+      "grad_norm": 4.507251739501953,
+      "learning_rate": 0.0003997379017326666,
+      "loss": 6.7717,
+      "step": 61
+    },
+    {
+      "epoch": 0.01983682610782275,
+      "grad_norm": 5.046263694763184,
+      "learning_rate": 0.00039972752508241944,
+      "loss": 6.807,
+      "step": 62
+    },
+    {
+      "epoch": 0.020156774916013438,
+      "grad_norm": 7.079762935638428,
+      "learning_rate": 0.0003997169471383855,
+      "loss": 6.7065,
+      "step": 63
+    },
+    {
+      "epoch": 0.020476723724204126,
+      "grad_norm": 6.362200736999512,
+      "learning_rate": 0.00039970616791178777,
+      "loss": 6.8103,
+      "step": 64
+    },
+    {
+      "epoch": 0.020796672532394818,
+      "grad_norm": 41.3747673034668,
+      "learning_rate": 0.00039969518741406234,
+      "loss": 6.8975,
+      "step": 65
+    },
+    {
+      "epoch": 0.021116621340585506,
+      "grad_norm": 5.478196620941162,
+      "learning_rate": 0.0003996840056568593,
+      "loss": 6.658,
+      "step": 66
+    },
+    {
+      "epoch": 0.021436570148776195,
+      "grad_norm": 3.9833760261535645,
+      "learning_rate": 0.000399672622652042,
+      "loss": 6.7767,
+      "step": 67
+    },
+    {
+      "epoch": 0.021756518956966887,
+      "grad_norm": 3.953526496887207,
+      "learning_rate": 0.0003996610384116874,
+      "loss": 6.4851,
+      "step": 68
+    },
+    {
+      "epoch": 0.022076467765157575,
+      "grad_norm": 4.118646621704102,
+      "learning_rate": 0.0003996492529480859,
+      "loss": 6.6975,
+      "step": 69
+    },
+    {
+      "epoch": 0.022396416573348264,
+      "grad_norm": 3.562244415283203,
+      "learning_rate": 0.00039963726627374155,
+      "loss": 6.619,
+      "step": 70
+    },
+    {
+      "epoch": 0.022716365381538955,
+      "grad_norm": 7.297116279602051,
+      "learning_rate": 0.00039962507840137163,
+      "loss": 6.8697,
+      "step": 71
+    },
+    {
+      "epoch": 0.023036314189729644,
+      "grad_norm": 4.813419818878174,
+      "learning_rate": 0.0003996126893439071,
+      "loss": 6.8995,
+      "step": 72
+    },
+    {
+      "epoch": 0.023356262997920332,
+      "grad_norm": 7.643682956695557,
+      "learning_rate": 0.0003996000991144922,
+      "loss": 7.0324,
+      "step": 73
+    },
+    {
+      "epoch": 0.02367621180611102,
+      "grad_norm": 5.103664398193359,
+      "learning_rate": 0.00039958730772648483,
+      "loss": 6.7137,
+      "step": 74
+    },
+    {
+      "epoch": 0.023996160614301713,
+      "grad_norm": 19.361007690429688,
+      "learning_rate": 0.000399574315193456,
+      "loss": 6.6955,
+      "step": 75
+    },
+    {
+      "epoch": 0.0243161094224924,
+      "grad_norm": 191.5496826171875,
+      "learning_rate": 0.0003995611215291904,
+      "loss": 12.0782,
+      "step": 76
+    },
+    {
+      "epoch": 0.02463605823068309,
+      "grad_norm": 52.53076171875,
+      "learning_rate": 0.00039954772674768605,
+      "loss": 8.514,
+      "step": 77
+    },
+    {
+      "epoch": 0.02495600703887378,
+      "grad_norm": 138.49893188476562,
+      "learning_rate": 0.0003995341308631543,
+      "loss": 31.316,
+      "step": 78
+    },
+    {
+      "epoch": 0.02527595584706447,
+      "grad_norm": 22.098583221435547,
+      "learning_rate": 0.00039952033389001985,
+      "loss": 8.0865,
+      "step": 79
+    },
+    {
+      "epoch": 0.02559590465525516,
+      "grad_norm": 17.31378936767578,
+      "learning_rate": 0.00039950633584292063,
+      "loss": 7.4476,
+      "step": 80
+    },
+    {
+      "epoch": 0.02591585346344585,
+      "grad_norm": 12.283071517944336,
+      "learning_rate": 0.00039949213673670826,
+      "loss": 7.1162,
+      "step": 81
+    },
+    {
+      "epoch": 0.02623580227163654,
+      "grad_norm": 11.07091999053955,
+      "learning_rate": 0.00039947773658644735,
+      "loss": 6.9467,
+      "step": 82
+    },
+    {
+      "epoch": 0.026555751079827227,
+      "grad_norm": 5.554511547088623,
+      "learning_rate": 0.00039946313540741593,
+      "loss": 6.9002,
+      "step": 83
+    },
+    {
+      "epoch": 0.026875699888017916,
+      "grad_norm": 7.668940544128418,
+      "learning_rate": 0.0003994483332151053,
+      "loss": 6.7886,
+      "step": 84
+    },
+    {
+      "epoch": 0.027195648696208607,
+      "grad_norm": 6.739316463470459,
+      "learning_rate": 0.0003994333300252201,
+      "loss": 6.7452,
+      "step": 85
+    },
+    {
+      "epoch": 0.027515597504399296,
+      "grad_norm": 4.137218952178955,
+      "learning_rate": 0.0003994181258536781,
+      "loss": 6.4451,
+      "step": 86
+    },
+    {
+      "epoch": 0.027835546312589984,
+      "grad_norm": 6.443964958190918,
+      "learning_rate": 0.0003994027207166103,
+      "loss": 6.5939,
+      "step": 87
+    },
+    {
+      "epoch": 0.028155495120780676,
+      "grad_norm": 5.331020832061768,
+      "learning_rate": 0.00039938711463036105,
+      "loss": 6.473,
+      "step": 88
+    },
+    {
+      "epoch": 0.028475443928971365,
+      "grad_norm": 3.314025640487671,
+      "learning_rate": 0.00039937130761148775,
+      "loss": 6.4812,
+      "step": 89
+    },
+    {
+      "epoch": 0.028795392737162053,
+      "grad_norm": 3.9543004035949707,
+      "learning_rate": 0.0003993552996767611,
+      "loss": 6.7518,
+      "step": 90
+    },
+    {
+      "epoch": 0.029115341545352745,
+      "grad_norm": 4.591469764709473,
+      "learning_rate": 0.00039933909084316493,
+      "loss": 6.8772,
+      "step": 91
+    },
+    {
+      "epoch": 0.029435290353543433,
+      "grad_norm": 5.4851837158203125,
+      "learning_rate": 0.00039932268112789624,
+      "loss": 6.7562,
+      "step": 92
+    },
+    {
+      "epoch": 0.029755239161734122,
+      "grad_norm": 3.0986509323120117,
+      "learning_rate": 0.00039930607054836504,
+      "loss": 6.7457,
+      "step": 93
+    },
+    {
+      "epoch": 0.03007518796992481,
+      "grad_norm": 6.524487018585205,
+      "learning_rate": 0.00039928925912219456,
+      "loss": 6.6129,
+      "step": 94
+    },
+    {
+      "epoch": 0.030395136778115502,
+      "grad_norm": 5.0719313621521,
+      "learning_rate": 0.0003992722468672211,
+      "loss": 6.5837,
+      "step": 95
+    },
+    {
+      "epoch": 0.03071508558630619,
+      "grad_norm": 4.822458744049072,
+      "learning_rate": 0.00039925503380149405,
+      "loss": 6.5082,
+      "step": 96
+    },
+    {
+      "epoch": 0.03103503439449688,
+      "grad_norm": 5.4806599617004395,
+      "learning_rate": 0.00039923761994327574,
+      "loss": 6.8089,
+      "step": 97
+    },
+    {
+      "epoch": 0.03135498320268757,
+      "grad_norm": 5.32270622253418,
+      "learning_rate": 0.00039922000531104174,
+      "loss": 6.8169,
+      "step": 98
+    },
+    {
+      "epoch": 0.03167493201087826,
+      "grad_norm": 4.849973678588867,
+      "learning_rate": 0.00039920218992348046,
+      "loss": 6.5797,
+      "step": 99
+    },
+    {
+      "epoch": 0.03199488081906895,
+      "grad_norm": 3.636552095413208,
+      "learning_rate": 0.00039918417379949326,
+      "loss": 6.6796,
+      "step": 100
+    },
+    {
+      "epoch": 0.032314829627259636,
+      "grad_norm": 7.389275550842285,
+      "learning_rate": 0.0003991659569581948,
+      "loss": 6.5838,
+      "step": 101
+    },
+    {
+      "epoch": 0.032634778435450325,
+      "grad_norm": 6.951813220977783,
+      "learning_rate": 0.0003991475394189123,
+      "loss": 6.6162,
+      "step": 102
+    },
+    {
+      "epoch": 0.03295472724364102,
+      "grad_norm": 6.038928985595703,
+      "learning_rate": 0.000399128921201186,
+      "loss": 6.5367,
+      "step": 103
+    },
+    {
+      "epoch": 0.03327467605183171,
+      "grad_norm": 4.697129249572754,
+      "learning_rate": 0.0003991101023247693,
+      "loss": 6.822,
+      "step": 104
+    },
+    {
+      "epoch": 0.0335946248600224,
+      "grad_norm": 3.3993539810180664,
+      "learning_rate": 0.00039909108280962826,
+      "loss": 6.7785,
+      "step": 105
+    },
+    {
+      "epoch": 0.033914573668213085,
+      "grad_norm": 5.630346775054932,
+      "learning_rate": 0.0003990718626759419,
+      "loss": 6.6505,
+      "step": 106
+    },
+    {
+      "epoch": 0.034234522476403774,
+      "grad_norm": 4.182599067687988,
+      "learning_rate": 0.00039905244194410203,
+      "loss": 6.7451,
+      "step": 107
+    },
+    {
+      "epoch": 0.03455447128459446,
+      "grad_norm": 3.2485625743865967,
+      "learning_rate": 0.00039903282063471324,
+      "loss": 6.5288,
+      "step": 108
+    },
+    {
+      "epoch": 0.03487442009278516,
+      "grad_norm": 10.302080154418945,
+      "learning_rate": 0.00039901299876859313,
+      "loss": 6.9176,
+      "step": 109
+    },
+    {
+      "epoch": 0.035194368900975846,
+      "grad_norm": 10.48038387298584,
+      "learning_rate": 0.00039899297636677197,
+      "loss": 6.9726,
+      "step": 110
+    },
+    {
+      "epoch": 0.035514317709166535,
+      "grad_norm": 5.058242321014404,
+      "learning_rate": 0.00039897275345049263,
+      "loss": 6.6908,
+      "step": 111
+    },
+    {
+      "epoch": 0.03583426651735722,
+      "grad_norm": 6.116095066070557,
+      "learning_rate": 0.000398952330041211,
+      "loss": 6.5533,
+      "step": 112
+    },
+    {
+      "epoch": 0.03615421532554791,
+      "grad_norm": 6.3986639976501465,
+      "learning_rate": 0.0003989317061605955,
+      "loss": 6.8069,
+      "step": 113
+    },
+    {
+      "epoch": 0.0364741641337386,
+      "grad_norm": 5.98941707611084,
+      "learning_rate": 0.0003989108818305273,
+      "loss": 6.8029,
+      "step": 114
+    },
+    {
+      "epoch": 0.03679411294192929,
+      "grad_norm": 4.1077470779418945,
+      "learning_rate": 0.00039888985707310024,
+      "loss": 6.5364,
+      "step": 115
+    },
+    {
+      "epoch": 0.037114061750119984,
+      "grad_norm": 7.463810443878174,
+      "learning_rate": 0.00039886863191062076,
+      "loss": 6.7412,
+      "step": 116
+    },
+    {
+      "epoch": 0.03743401055831067,
+      "grad_norm": 4.7363362312316895,
+      "learning_rate": 0.000398847206365608,
+      "loss": 6.6096,
+      "step": 117
+    },
+    {
+      "epoch": 0.03775395936650136,
+      "grad_norm": 7.381870269775391,
+      "learning_rate": 0.00039882558046079364,
+      "loss": 6.9752,
+      "step": 118
+    },
+    {
+      "epoch": 0.03807390817469205,
+      "grad_norm": 11.667238235473633,
+      "learning_rate": 0.000398803754219122,
+      "loss": 6.9014,
+      "step": 119
+    },
+    {
+      "epoch": 0.03839385698288274,
+      "grad_norm": 10.061962127685547,
+      "learning_rate": 0.0003987817276637498,
+      "loss": 6.6761,
+      "step": 120
+    },
+    {
+      "epoch": 0.038713805791073426,
+      "grad_norm": 6.85370397567749,
+      "learning_rate": 0.00039875950081804653,
+      "loss": 6.7025,
+      "step": 121
+    },
+    {
+      "epoch": 0.039033754599264114,
+      "grad_norm": 6.230566501617432,
+      "learning_rate": 0.0003987370737055939,
+      "loss": 6.7836,
+      "step": 122
+    },
+    {
+      "epoch": 0.03935370340745481,
+      "grad_norm": 9.128447532653809,
+      "learning_rate": 0.0003987144463501864,
+      "loss": 7.0278,
+      "step": 123
+    },
+    {
+      "epoch": 0.0396736522156455,
+      "grad_norm": 4.422046184539795,
+      "learning_rate": 0.0003986916187758306,
+      "loss": 6.7092,
+      "step": 124
+    },
+    {
+      "epoch": 0.03999360102383619,
+      "grad_norm": 5.277736186981201,
+      "learning_rate": 0.00039866859100674585,
+      "loss": 6.5118,
+      "step": 125
+    },
+    {
+      "epoch": 0.040313549832026875,
+      "grad_norm": 5.670833110809326,
+      "learning_rate": 0.0003986453630673637,
+      "loss": 6.6172,
+      "step": 126
+    },
+    {
+      "epoch": 0.040633498640217564,
+      "grad_norm": 8.088812828063965,
+      "learning_rate": 0.00039862193498232815,
+      "loss": 6.7693,
+      "step": 127
+    },
+    {
+      "epoch": 0.04095344744840825,
+      "grad_norm": 4.622250556945801,
+      "learning_rate": 0.0003985983067764955,
+      "loss": 6.7558,
+      "step": 128
+    },
+    {
+      "epoch": 0.04127339625659895,
+      "grad_norm": 3.96929669380188,
+      "learning_rate": 0.0003985744784749343,
+      "loss": 6.6975,
+      "step": 129
+    },
+    {
+      "epoch": 0.041593345064789636,
+      "grad_norm": 4.888178825378418,
+      "learning_rate": 0.00039855045010292565,
+      "loss": 6.7552,
+      "step": 130
+    },
+    {
+      "epoch": 0.041913293872980324,
+      "grad_norm": 3.579385280609131,
+      "learning_rate": 0.0003985262216859627,
+      "loss": 6.5003,
+      "step": 131
+    },
+    {
+      "epoch": 0.04223324268117101,
+      "grad_norm": 7.001453876495361,
+      "learning_rate": 0.0003985017932497508,
+      "loss": 6.6921,
+      "step": 132
+    },
+    {
+      "epoch": 0.0425531914893617,
+      "grad_norm": 3.956249713897705,
+      "learning_rate": 0.00039847716482020767,
+      "loss": 6.6835,
+      "step": 133
+    },
+    {
+      "epoch": 0.04287314029755239,
+      "grad_norm": 6.118314266204834,
+      "learning_rate": 0.0003984523364234632,
+      "loss": 6.6699,
+      "step": 134
+    },
+    {
+      "epoch": 0.04319308910574308,
+      "grad_norm": 6.215986728668213,
+      "learning_rate": 0.00039842730808585926,
+      "loss": 6.5631,
+      "step": 135
+    },
+    {
+      "epoch": 0.04351303791393377,
+      "grad_norm": 5.343611717224121,
+      "learning_rate": 0.00039840207983395017,
+      "loss": 6.4691,
+      "step": 136
+    },
+    {
+      "epoch": 0.04383298672212446,
+      "grad_norm": 5.891202449798584,
+      "learning_rate": 0.00039837665169450195,
+      "loss": 6.3602,
+      "step": 137
+    },
+    {
+      "epoch": 0.04415293553031515,
+      "grad_norm": 5.650427341461182,
+      "learning_rate": 0.000398351023694493,
+      "loss": 6.5209,
+      "step": 138
+    },
+    {
+      "epoch": 0.04447288433850584,
+      "grad_norm": 8.644763946533203,
+      "learning_rate": 0.0003983251958611137,
+      "loss": 6.7151,
+      "step": 139
+    },
+    {
+      "epoch": 0.04479283314669653,
+      "grad_norm": 6.320985317230225,
+      "learning_rate": 0.00039829916822176634,
+      "loss": 6.615,
+      "step": 140
+    },
+    {
+      "epoch": 0.045112781954887216,
+      "grad_norm": 6.1446428298950195,
+      "learning_rate": 0.0003982729408040653,
+      "loss": 6.5516,
+      "step": 141
+    },
+    {
+      "epoch": 0.04543273076307791,
+      "grad_norm": 5.089375972747803,
+      "learning_rate": 0.00039824651363583693,
+      "loss": 6.5904,
+      "step": 142
+    },
+    {
+      "epoch": 0.0457526795712686,
+      "grad_norm": 5.653002738952637,
+      "learning_rate": 0.00039821988674511934,
+      "loss": 6.5341,
+      "step": 143
+    },
+    {
+      "epoch": 0.04607262837945929,
+      "grad_norm": 9.837624549865723,
+      "learning_rate": 0.0003981930601601628,
+      "loss": 6.656,
+      "step": 144
+    },
+    {
+      "epoch": 0.046392577187649976,
+      "grad_norm": 5.145659446716309,
+      "learning_rate": 0.0003981660339094293,
+      "loss": 6.3306,
+      "step": 145
+    },
+    {
+      "epoch": 0.046712525995840665,
+      "grad_norm": 4.123940944671631,
+      "learning_rate": 0.00039813880802159254,
+      "loss": 6.5867,
+      "step": 146
+    },
+    {
+      "epoch": 0.04703247480403135,
+      "grad_norm": 7.826724529266357,
+      "learning_rate": 0.0003981113825255383,
+      "loss": 6.6373,
+      "step": 147
+    },
+    {
+      "epoch": 0.04735242361222204,
+      "grad_norm": 7.781490802764893,
+      "learning_rate": 0.00039808375745036396,
+      "loss": 6.6397,
+      "step": 148
+    },
+    {
+      "epoch": 0.04767237242041274,
+      "grad_norm": 5.552570343017578,
+      "learning_rate": 0.0003980559328253787,
+      "loss": 6.673,
+      "step": 149
+    },
+    {
+      "epoch": 0.047992321228603425,
+      "grad_norm": 6.54767370223999,
+      "learning_rate": 0.00039802790868010335,
+      "loss": 6.5612,
+      "step": 150
+    },
+    {
+      "epoch": 0.048312270036794114,
+      "grad_norm": 8.244477272033691,
+      "learning_rate": 0.00039799968504427056,
+      "loss": 6.74,
+      "step": 151
+    },
+    {
+      "epoch": 0.0486322188449848,
+      "grad_norm": 7.447508811950684,
+      "learning_rate": 0.0003979712619478245,
+      "loss": 6.7098,
+      "step": 152
+    },
+    {
+      "epoch": 0.04895216765317549,
+      "grad_norm": 7.424586772918701,
+      "learning_rate": 0.00039794263942092103,
+      "loss": 6.6281,
+      "step": 153
+    },
+    {
+      "epoch": 0.04927211646136618,
+      "grad_norm": 6.1032891273498535,
+      "learning_rate": 0.00039791381749392754,
+      "loss": 6.5981,
+      "step": 154
+    },
+    {
+      "epoch": 0.04959206526955687,
+      "grad_norm": 5.076866149902344,
+      "learning_rate": 0.00039788479619742314,
+      "loss": 6.5594,
+      "step": 155
+    },
+    {
+      "epoch": 0.04991201407774756,
+      "grad_norm": 6.170379161834717,
+      "learning_rate": 0.00039785557556219807,
+      "loss": 6.6772,
+      "step": 156
+    },
+    {
+      "epoch": 0.05023196288593825,
+      "grad_norm": 3.279794454574585,
+      "learning_rate": 0.00039782615561925457,
+      "loss": 6.6905,
+      "step": 157
+    },
+    {
+      "epoch": 0.05055191169412894,
+      "grad_norm": 4.933487892150879,
+      "learning_rate": 0.000397796536399806,
+      "loss": 6.5733,
+      "step": 158
+    },
+    {
+      "epoch": 0.05087186050231963,
+      "grad_norm": 4.821317672729492,
+      "learning_rate": 0.00039776671793527734,
+      "loss": 6.5997,
+      "step": 159
+    },
+    {
+      "epoch": 0.05119180931051032,
+      "grad_norm": 4.446855068206787,
+      "learning_rate": 0.00039773670025730466,
+      "loss": 6.5848,
+      "step": 160
+    },
+    {
+      "epoch": 0.051511758118701005,
+      "grad_norm": 5.981055736541748,
+      "learning_rate": 0.0003977064833977358,
+      "loss": 6.5767,
+      "step": 161
+    },
+    {
+      "epoch": 0.0518317069268917,
+      "grad_norm": 5.806812286376953,
+      "learning_rate": 0.0003976760673886296,
+      "loss": 6.6296,
+      "step": 162
+    },
+    {
+      "epoch": 0.05215165573508239,
+      "grad_norm": 7.575981140136719,
+      "learning_rate": 0.0003976454522622563,
+      "loss": 6.521,
+      "step": 163
+    },
+    {
+      "epoch": 0.05247160454327308,
+      "grad_norm": 7.472120761871338,
+      "learning_rate": 0.00039761463805109744,
+      "loss": 6.6819,
+      "step": 164
+    },
+    {
+      "epoch": 0.052791553351463766,
+      "grad_norm": 5.2565412521362305,
+      "learning_rate": 0.0003975836247878458,
+      "loss": 6.5501,
+      "step": 165
+    },
+    {
+      "epoch": 0.053111502159654454,
+      "grad_norm": 5.9353413581848145,
+      "learning_rate": 0.0003975524125054051,
+      "loss": 6.7042,
+      "step": 166
+    },
+    {
+      "epoch": 0.05343145096784514,
+      "grad_norm": 4.687541961669922,
+      "learning_rate": 0.00039752100123689065,
+      "loss": 6.6537,
+      "step": 167
+    },
+    {
+      "epoch": 0.05375139977603583,
+      "grad_norm": 5.580118179321289,
+      "learning_rate": 0.00039748939101562846,
+      "loss": 6.6087,
+      "step": 168
+    },
+    {
+      "epoch": 0.054071348584226527,
+      "grad_norm": 6.239872932434082,
+      "learning_rate": 0.00039745758187515585,
+      "loss": 6.6247,
+      "step": 169
+    },
+    {
+      "epoch": 0.054391297392417215,
+      "grad_norm": 7.768801689147949,
+      "learning_rate": 0.000397425573849221,
+      "loss": 6.6399,
+      "step": 170
+    },
+    {
+      "epoch": 0.0547112462006079,
+      "grad_norm": 7.151449203491211,
+      "learning_rate": 0.00039739336697178343,
+      "loss": 6.4196,
+      "step": 171
+    },
+    {
+      "epoch": 0.05503119500879859,
+      "grad_norm": 9.474916458129883,
+      "learning_rate": 0.0003973609612770133,
+      "loss": 6.8067,
+      "step": 172
+    },
+    {
+      "epoch": 0.05535114381698928,
+      "grad_norm": 7.509045124053955,
+      "learning_rate": 0.00039732835679929184,
+      "loss": 6.91,
+      "step": 173
+    },
+    {
+      "epoch": 0.05567109262517997,
+      "grad_norm": 6.453274726867676,
+      "learning_rate": 0.00039729555357321123,
+      "loss": 6.6127,
+      "step": 174
+    },
+    {
+      "epoch": 0.055991041433370664,
+      "grad_norm": 6.131414413452148,
+      "learning_rate": 0.00039726255163357444,
+      "loss": 6.6598,
+      "step": 175
+    },
+    {
+      "epoch": 0.05631099024156135,
+      "grad_norm": 7.787036895751953,
+      "learning_rate": 0.00039722935101539527,
+      "loss": 6.6519,
+      "step": 176
+    },
+    {
+      "epoch": 0.05663093904975204,
+      "grad_norm": 7.098258018493652,
+      "learning_rate": 0.00039719595175389833,
+      "loss": 6.6283,
+      "step": 177
+    },
+    {
+      "epoch": 0.05695088785794273,
+      "grad_norm": 6.19675350189209,
+      "learning_rate": 0.0003971623538845191,
+      "loss": 6.5764,
+      "step": 178
+    },
+    {
+      "epoch": 0.05727083666613342,
+      "grad_norm": 6.365749359130859,
+      "learning_rate": 0.0003971285574429034,
+      "loss": 6.8207,
+      "step": 179
+    },
+    {
+      "epoch": 0.057590785474324106,
+      "grad_norm": 6.72816801071167,
+      "learning_rate": 0.0003970945624649082,
+      "loss": 6.5185,
+      "step": 180
+    },
+    {
+      "epoch": 0.057910734282514795,
+      "grad_norm": 6.747672080993652,
+      "learning_rate": 0.00039706036898660095,
+      "loss": 6.5095,
+      "step": 181
+    },
+    {
+      "epoch": 0.05823068309070549,
+      "grad_norm": 8.517967224121094,
+      "learning_rate": 0.0003970259770442594,
+      "loss": 6.6087,
+      "step": 182
+    },
+    {
+      "epoch": 0.05855063189889618,
+      "grad_norm": 7.719740390777588,
+      "learning_rate": 0.00039699138667437234,
+      "loss": 6.6355,
+      "step": 183
+    },
+    {
+      "epoch": 0.05887058070708687,
+      "grad_norm": 5.7464518547058105,
+      "learning_rate": 0.0003969565979136387,
+      "loss": 6.5026,
+      "step": 184
+    },
+    {
+      "epoch": 0.059190529515277555,
+      "grad_norm": 7.535585403442383,
+      "learning_rate": 0.00039692161079896816,
+      "loss": 6.5819,
+      "step": 185
+    },
+    {
+      "epoch": 0.059510478323468244,
+      "grad_norm": 10.726088523864746,
+      "learning_rate": 0.0003968864253674806,
+      "loss": 6.6332,
+      "step": 186
+    },
+    {
+      "epoch": 0.05983042713165893,
+      "grad_norm": 6.952598571777344,
+      "learning_rate": 0.0003968510416565067,
+      "loss": 6.4593,
+      "step": 187
+    },
+    {
+      "epoch": 0.06015037593984962,
+      "grad_norm": 5.812473297119141,
+      "learning_rate": 0.0003968154597035869,
+      "loss": 6.4498,
+      "step": 188
+    },
+    {
+      "epoch": 0.060470324748040316,
+      "grad_norm": 6.252615928649902,
+      "learning_rate": 0.00039677967954647263,
+      "loss": 6.4724,
+      "step": 189
+    },
+    {
+      "epoch": 0.060790273556231005,
+      "grad_norm": 5.516043663024902,
+      "learning_rate": 0.00039674370122312505,
+      "loss": 6.6256,
+      "step": 190
+    },
+    {
+      "epoch": 0.06111022236442169,
+      "grad_norm": 7.377323627471924,
+      "learning_rate": 0.00039670752477171604,
+      "loss": 6.8132,
+      "step": 191
+    },
+    {
+      "epoch": 0.06143017117261238,
+      "grad_norm": 7.111225128173828,
+      "learning_rate": 0.0003966711502306273,
+      "loss": 6.6017,
+      "step": 192
+    },
+    {
+      "epoch": 0.06175011998080307,
+      "grad_norm": 7.866201877593994,
+      "learning_rate": 0.0003966345776384509,
+      "loss": 6.6965,
+      "step": 193
+    },
+    {
+      "epoch": 0.06207006878899376,
+      "grad_norm": 8.679105758666992,
+      "learning_rate": 0.00039659780703398895,
+      "loss": 6.482,
+      "step": 194
+    },
+    {
+      "epoch": 0.062390017597184454,
+      "grad_norm": 6.715935230255127,
+      "learning_rate": 0.00039656083845625377,
+      "loss": 6.5263,
+      "step": 195
+    },
+    {
+      "epoch": 0.06270996640537514,
+      "grad_norm": 6.897395610809326,
+      "learning_rate": 0.0003965236719444675,
+      "loss": 6.7682,
+      "step": 196
+    },
+    {
+      "epoch": 0.06302991521356582,
+      "grad_norm": 5.384663105010986,
+      "learning_rate": 0.0003964863075380626,
+      "loss": 6.5211,
+      "step": 197
+    },
+    {
+      "epoch": 0.06334986402175652,
+      "grad_norm": 7.211827278137207,
+      "learning_rate": 0.0003964487452766811,
+      "loss": 6.7227,
+      "step": 198
+    },
+    {
+      "epoch": 0.06366981282994721,
+      "grad_norm": 6.876810550689697,
+      "learning_rate": 0.0003964109852001753,
+      "loss": 6.4332,
+      "step": 199
+    },
+    {
+      "epoch": 0.0639897616381379,
+      "grad_norm": 5.7009453773498535,
+      "learning_rate": 0.0003963730273486072,
+      "loss": 6.5789,
+      "step": 200
+    },
+    {
+      "epoch": 0.0639897616381379,
+      "eval_loss": 3.317033290863037,
+      "eval_runtime": 233.4326,
+      "eval_samples_per_second": 5.638,
+      "eval_steps_per_second": 1.409,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 3060,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 6,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.8102124813693747e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
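
A minimal sketch of how a state file like the one above could be inspected, under some assumptions. The per-step records are assumed to live under a `log_history` key, which is not visible in this part of the diff. The `summarize` helper is hypothetical. The embedded sample copies only a few records from steps 199 and 200, trimmed to the fields used here.

```python
import json

# Hypothetical excerpt shaped like the trainer_state.json above.
# Values are copied from the step 199 and step 200 records; the
# "log_history" key name is an assumption (it is not shown in this hunk).
SAMPLE = """
{
  "log_history": [
    {"epoch": 0.06366981282994721, "grad_norm": 6.876810550689697,
     "learning_rate": 0.0003964109852001753, "loss": 6.4332, "step": 199},
    {"epoch": 0.0639897616381379, "grad_norm": 5.7009453773498535,
     "learning_rate": 0.0003963730273486072, "loss": 6.5789, "step": 200},
    {"epoch": 0.0639897616381379, "eval_loss": 3.317033290863037, "step": 200}
  ],
  "max_steps": 3060,
  "save_steps": 200
}
"""


def summarize(state):
    """Split the log into training and evaluation records and report progress."""
    # Training records carry a "loss" key; evaluation records carry "eval_loss".
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    last_step = max(e["step"] for e in state["log_history"])
    return {
        "train_records": len(train),
        "eval_records": len(evals),
        "mean_train_loss": sum(e["loss"] for e in train) / len(train),
        "last_eval_loss": evals[-1]["eval_loss"] if evals else None,
        # Fraction of the planned run completed at this checkpoint.
        "progress": last_step / state["max_steps"],
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

With the real file, `json.load(open(path))` would replace the embedded sample, and the same split separates the per-step training losses from the evaluation record written at each `save_steps` boundary.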

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:761d4b9259d1bad67621ab9fe0f38348d0c35bedf516fe16890b7b19205c4bc6
+size 6776
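
The three added lines are a Git LFS pointer, not the binary itself. As a rough sketch, the pointer can be read as space-separated key/value lines. The `parse_lfs_pointer` helper below is hypothetical. Stripping the leading `+` is specific to text copied out of a diff view like this one.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        # Drop the diff's leading '+' marker and surrounding whitespace.
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(
    "+version https://git-lfs.github.com/spec/v1\n"
    "+oid sha256:761d4b9259d1bad67621ab9fe0f38348d0c35bedf516fe16890b7b19205c4bc6\n"
    "+size 6776\n"
)
print(pointer["hash_algo"], pointer["size"])
```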