Training in progress, step 200, checkpoint

Files changed (12)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +42 -0
last-checkpoint/trainer_state.json +1458 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: NousResearch/CodeLlama-7b-hf-flash
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "NousResearch/CodeLlama-7b-hf-flash",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.15,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "q_proj",
+    "k_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
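
The config above fixes the LoRA rank (`r`: 64), `lora_alpha` (128) and the seven targeted projections. As a rough sketch, the resulting scaling factor and adapter parameter count can be worked out as follows. The layer shapes (`HIDDEN_SIZE`, `INTERMEDIATE_SIZE`, `NUM_LAYERS`) are not stated in this commit; they are assumed here to be the usual 7B Llama-family values, and fp32 storage is likewise an assumption.

```python
import json

# Values copied from the adapter_config.json in this commit.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 128,
  "r": 64,
  "use_rslora": false,
  "target_modules": ["o_proj", "q_proj", "k_proj", "down_proj",
                     "gate_proj", "v_proj", "up_proj"]
}
""")

# ASSUMPTION: standard 7B Llama-family shapes; not stated in the commit.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 11008
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer under that assumption.
MODULE_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_scaling(cfg):
    """Scaling applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg):
    """Each adapted layer adds A (r x in) and B (out x r), i.e. r * (in + out) parameters."""
    per_block = sum(
        cfg["r"] * sum(MODULE_SHAPES[name]) for name in cfg["target_modules"]
    )
    return per_block * NUM_LAYERS


scaling = lora_scaling(ADAPTER_CONFIG)
params = lora_param_count(ADAPTER_CONFIG)
fp32_bytes = params * 4  # ASSUMPTION: weights stored as 32-bit floats
print(scaling, params, fp32_bytes)
```

Under these assumptions the estimate is 159,907,840 parameters, or 639,631,360 bytes in fp32, which is close to the 639691872 bytes recorded for `adapter_model.safetensors` in this commit; the small remainder would be consistent with file header overhead.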

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bbe7bf38ff55273fba28bf8a5eec67669a9a456c4577907b44d64917efdd58cd
+size 639691872
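
The three added lines are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of reading such a pointer is shown here; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any library.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:bbe7bf38ff55273fba28bf8a5eec67669a9a456c4577907b44d64917efdd58cd
size 639691872
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The same helper applies unchanged to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.model`, `training_args.bin`).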

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecad31b57776699a4c7683e2ab76cdbb8a395dff135bf16f0091f272964b22db
+size 1279647314
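
As a quick sanity check, the optimizer state can be compared with the adapter weights using the two sizes recorded in this commit. The commit does not name the optimizer; a ratio near 2 would be consistent with an Adam-style optimizer keeping two moment tensors per trainable parameter in the same precision as the weights, but that reading is an assumption.

```python
# Sizes in bytes, copied from the Git LFS pointers in this commit.
ADAPTER_BYTES = 639_691_872      # adapter_model.safetensors
OPTIMIZER_BYTES = 1_279_647_314  # optimizer.pt


def state_to_weight_ratio(optimizer_bytes, weight_bytes):
    """Ratio of saved optimizer state to saved trainable weights."""
    return optimizer_bytes / weight_bytes


ratio = state_to_weight_ratio(OPTIMIZER_BYTES, ADAPTER_BYTES)
print(f"optimizer state is about {ratio:.3f}x the adapter weights")
```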

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee55dad95ab2aa5141e72f636498273886c2790089f75739ce67f0836605be6a
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05bcff6304c57f73980e2c115f70c0be4066a85db604927dfb46a76647ef6e9a
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
+size 500058

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
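
The `chat_template` above is a Jinja expression that is hard to read inline. The sketch below re-expresses the same logic in plain Python to show what string it produces; `render_chat` is a hypothetical stand-in written for illustration, not the tokenizer's own rendering path. Note that the header markers it emits (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`) are not among the three special tokens listed in `added_tokens_decoder`, so they may well be tokenized as ordinary text.

```python
def render_chat(messages, bos_token="<s>", add_generation_prompt=False):
    """Plain-Python equivalent of the chat_template logic (illustrative sketch)."""
    parts = []
    for index, message in enumerate(messages):
        # The template trims only the message content, not the role.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        if index == 0:
            # The BOS token is prepended to the first message only.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Write a hello-world in C.  "}],
    add_generation_prompt=True,
)
print(prompt)
```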

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1458 @@

+{
+  "best_metric": 1.836081624031067,
+  "best_model_checkpoint": "miner_id_24/checkpoint-200",
+  "epoch": 0.1078857759347291,
+  "eval_steps": 200,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0005394288796736456,
+      "grad_norm": 18.71552085876465,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 57.3329,
+      "step": 1
+    },
+    {
+      "epoch": 0.0005394288796736456,
+      "eval_loss": 4.576467514038086,
+      "eval_runtime": 141.0154,
+      "eval_samples_per_second": 2.12,
+      "eval_steps_per_second": 2.12,
+      "step": 1
+    },
+    {
+      "epoch": 0.0010788577593472911,
+      "grad_norm": 40.17718505859375,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 111.721,
+      "step": 2
+    },
+    {
+      "epoch": 0.0016182866390209367,
+      "grad_norm": 55.62163162231445,
+      "learning_rate": 6e-06,
+      "loss": 145.8098,
+      "step": 3
+    },
+    {
+      "epoch": 0.0021577155186945822,
+      "grad_norm": 70.09906005859375,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 176.5399,
+      "step": 4
+    },
+    {
+      "epoch": 0.0026971443983682276,
+      "grad_norm": 96.45822143554688,
+      "learning_rate": 1e-05,
+      "loss": 205.6804,
+      "step": 5
+    },
+    {
+      "epoch": 0.0032365732780418733,
+      "grad_norm": 96.46897888183594,
+      "learning_rate": 1.2e-05,
+      "loss": 191.1242,
+      "step": 6
+    },
+    {
+      "epoch": 0.0037760021577155187,
+      "grad_norm": 123.18101501464844,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 200.1216,
+      "step": 7
+    },
+    {
+      "epoch": 0.0043154310373891645,
+      "grad_norm": 112.75751495361328,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 199.3468,
+      "step": 8
+    },
+    {
+      "epoch": 0.00485485991706281,
+      "grad_norm": 105.84030151367188,
+      "learning_rate": 1.8e-05,
+      "loss": 197.7578,
+      "step": 9
+    },
+    {
+      "epoch": 0.005394288796736455,
+      "grad_norm": 152.0435333251953,
+      "learning_rate": 2e-05,
+      "loss": 221.8745,
+      "step": 10
+    },
+    {
+      "epoch": 0.0059337176764101005,
+      "grad_norm": 140.9628143310547,
+      "learning_rate": 2.2000000000000003e-05,
+      "loss": 202.1205,
+      "step": 11
+    },
+    {
+      "epoch": 0.006473146556083747,
+      "grad_norm": 136.8531036376953,
+      "learning_rate": 2.4e-05,
+      "loss": 192.207,
+      "step": 12
+    },
+    {
+      "epoch": 0.007012575435757392,
+      "grad_norm": 135.1580352783203,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 188.5981,
+      "step": 13
+    },
+    {
+      "epoch": 0.007552004315431037,
+      "grad_norm": 135.94815063476562,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 182.9973,
+      "step": 14
+    },
+    {
+      "epoch": 0.008091433195104683,
+      "grad_norm": 130.7935333251953,
+      "learning_rate": 3e-05,
+      "loss": 181.9996,
+      "step": 15
+    },
+    {
+      "epoch": 0.008630862074778329,
+      "grad_norm": 135.71165466308594,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 156.7745,
+      "step": 16
+    },
+    {
+      "epoch": 0.009170290954451973,
+      "grad_norm": 80.55735778808594,
+      "learning_rate": 3.4000000000000007e-05,
+      "loss": 105.2249,
+      "step": 17
+    },
+    {
+      "epoch": 0.00970971983412562,
+      "grad_norm": 78.56623840332031,
+      "learning_rate": 3.6e-05,
+      "loss": 93.8699,
+      "step": 18
+    },
+    {
+      "epoch": 0.010249148713799266,
+      "grad_norm": 73.5405502319336,
+      "learning_rate": 3.8e-05,
+      "loss": 96.5256,
+      "step": 19
+    },
+    {
+      "epoch": 0.01078857759347291,
+      "grad_norm": 66.16717529296875,
+      "learning_rate": 4e-05,
+      "loss": 79.6901,
+      "step": 20
+    },
+    {
+      "epoch": 0.011328006473146556,
+      "grad_norm": 65.6923599243164,
+      "learning_rate": 4.2e-05,
+      "loss": 90.1881,
+      "step": 21
+    },
+    {
+      "epoch": 0.011867435352820201,
+      "grad_norm": 77.53053283691406,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 85.8049,
+      "step": 22
+    },
+    {
+      "epoch": 0.012406864232493847,
+      "grad_norm": 71.17222595214844,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 64.6935,
+      "step": 23
+    },
+    {
+      "epoch": 0.012946293112167493,
+      "grad_norm": 46.50193786621094,
+      "learning_rate": 4.8e-05,
+      "loss": 72.2138,
+      "step": 24
+    },
+    {
+      "epoch": 0.013485721991841138,
+      "grad_norm": 45.66022491455078,
+      "learning_rate": 5e-05,
+      "loss": 71.2709,
+      "step": 25
+    },
+    {
+      "epoch": 0.014025150871514784,
+      "grad_norm": 46.14365768432617,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 59.7781,
+      "step": 26
+    },
+    {
+      "epoch": 0.014564579751188429,
+      "grad_norm": 54.284664154052734,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 64.8576,
+      "step": 27
+    },
+    {
+      "epoch": 0.015104008630862075,
+      "grad_norm": 43.3782958984375,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 68.6312,
+      "step": 28
+    },
+    {
+      "epoch": 0.01564343751053572,
+      "grad_norm": 35.549217224121094,
+      "learning_rate": 5.8e-05,
+      "loss": 62.3088,
+      "step": 29
+    },
+    {
+      "epoch": 0.016182866390209365,
+      "grad_norm": 42.21353530883789,
+      "learning_rate": 6e-05,
+      "loss": 65.5001,
+      "step": 30
+    },
+    {
+      "epoch": 0.01672229526988301,
+      "grad_norm": 46.08031463623047,
+      "learning_rate": 6.2e-05,
+      "loss": 57.0952,
+      "step": 31
+    },
+    {
+      "epoch": 0.017261724149556658,
+      "grad_norm": 38.45962905883789,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 65.1331,
+      "step": 32
+    },
+    {
+      "epoch": 0.017801153029230302,
+      "grad_norm": 34.330406188964844,
+      "learning_rate": 6.6e-05,
+      "loss": 56.026,
+      "step": 33
+    },
+    {
+      "epoch": 0.018340581908903947,
+      "grad_norm": 35.08675003051758,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 55.9277,
+      "step": 34
+    },
+    {
+      "epoch": 0.018880010788577595,
+      "grad_norm": 37.337825775146484,
+      "learning_rate": 7e-05,
+      "loss": 53.9387,
+      "step": 35
+    },
+    {
+      "epoch": 0.01941943966825124,
+      "grad_norm": 36.146873474121094,
+      "learning_rate": 7.2e-05,
+      "loss": 61.2999,
+      "step": 36
+    },
+    {
+      "epoch": 0.019958868547924884,
+      "grad_norm": 41.229610443115234,
+      "learning_rate": 7.4e-05,
+      "loss": 70.8618,
+      "step": 37
+    },
+    {
+      "epoch": 0.02049829742759853,
+      "grad_norm": 42.86275863647461,
+      "learning_rate": 7.6e-05,
+      "loss": 60.1886,
+      "step": 38
+    },
+    {
+      "epoch": 0.021037726307272176,
+      "grad_norm": 36.5433235168457,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 61.5439,
+      "step": 39
+    },
+    {
+      "epoch": 0.02157715518694582,
+      "grad_norm": 39.95774841308594,
+      "learning_rate": 8e-05,
+      "loss": 57.8462,
+      "step": 40
+    },
+    {
+      "epoch": 0.022116584066619465,
+      "grad_norm": 38.86470413208008,
+      "learning_rate": 8.2e-05,
+      "loss": 55.4324,
+      "step": 41
+    },
+    {
+      "epoch": 0.022656012946293113,
+      "grad_norm": 30.977352142333984,
+      "learning_rate": 8.4e-05,
+      "loss": 57.6402,
+      "step": 42
+    },
+    {
+      "epoch": 0.023195441825966757,
+      "grad_norm": 38.25783157348633,
+      "learning_rate": 8.6e-05,
+      "loss": 50.586,
+      "step": 43
+    },
+    {
+      "epoch": 0.023734870705640402,
+      "grad_norm": 37.11707305908203,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 36.7947,
+      "step": 44
+    },
+    {
+      "epoch": 0.02427429958531405,
+      "grad_norm": 40.3302116394043,
+      "learning_rate": 9e-05,
+      "loss": 40.2388,
+      "step": 45
+    },
+    {
+      "epoch": 0.024813728464987694,
+      "grad_norm": 42.60755920410156,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 58.9665,
+      "step": 46
+    },
+    {
+      "epoch": 0.02535315734466134,
+      "grad_norm": 44.4195442199707,
+      "learning_rate": 9.4e-05,
+      "loss": 50.2349,
+      "step": 47
+    },
+    {
+      "epoch": 0.025892586224334987,
+      "grad_norm": 37.404727935791016,
+      "learning_rate": 9.6e-05,
+      "loss": 48.7437,
+      "step": 48
+    },
+    {
+      "epoch": 0.02643201510400863,
+      "grad_norm": 48.31377410888672,
+      "learning_rate": 9.8e-05,
+      "loss": 54.4895,
+      "step": 49
+    },
+    {
+      "epoch": 0.026971443983682276,
+      "grad_norm": 51.360191345214844,
+      "learning_rate": 0.0001,
+      "loss": 57.4654,
+      "step": 50
+    },
+    {
+      "epoch": 0.02751087286335592,
+      "grad_norm": 23.211647033691406,
+      "learning_rate": 0.00010200000000000001,
+      "loss": 36.3068,
+      "step": 51
+    },
+    {
+      "epoch": 0.028050301743029568,
+      "grad_norm": 34.541805267333984,
+      "learning_rate": 0.00010400000000000001,
+      "loss": 75.6809,
+      "step": 52
+    },
+    {
+      "epoch": 0.028589730622703213,
+      "grad_norm": 42.0761833190918,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 96.818,
+      "step": 53
+    },
+    {
+      "epoch": 0.029129159502376857,
+      "grad_norm": 43.26933670043945,
+      "learning_rate": 0.00010800000000000001,
+      "loss": 101.1451,
+      "step": 54
+    },
+    {
+      "epoch": 0.029668588382050505,
+      "grad_norm": 51.45765686035156,
+      "learning_rate": 0.00011000000000000002,
+      "loss": 115.0704,
+      "step": 55
+    },
+    {
+      "epoch": 0.03020801726172415,
+      "grad_norm": 43.3838005065918,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 112.4747,
+      "step": 56
+    },
+    {
+      "epoch": 0.030747446141397794,
+      "grad_norm": 59.83226013183594,
+      "learning_rate": 0.00011399999999999999,
+      "loss": 114.741,
+      "step": 57
+    },
+    {
+      "epoch": 0.03128687502107144,
+      "grad_norm": 38.54649353027344,
+      "learning_rate": 0.000116,
+      "loss": 100.9508,
+      "step": 58
+    },
+    {
+      "epoch": 0.03182630390074508,
+      "grad_norm": 34.7606086730957,
+      "learning_rate": 0.000118,
+      "loss": 87.4097,
+      "step": 59
+    },
+    {
+      "epoch": 0.03236573278041873,
+      "grad_norm": 34.808265686035156,
+      "learning_rate": 0.00012,
+      "loss": 94.2411,
+      "step": 60
+    },
+    {
+      "epoch": 0.03290516166009238,
+      "grad_norm": 33.40951156616211,
+      "learning_rate": 0.000122,
+      "loss": 85.6042,
+      "step": 61
+    },
+    {
+      "epoch": 0.03344459053976602,
+      "grad_norm": 25.83111572265625,
+      "learning_rate": 0.000124,
+      "loss": 79.6747,
+      "step": 62
+    },
+    {
+      "epoch": 0.03398401941943967,
+      "grad_norm": 51.73832321166992,
+      "learning_rate": 0.000126,
+      "loss": 65.953,
+      "step": 63
+    },
+    {
+      "epoch": 0.034523448299113316,
+      "grad_norm": 38.63320541381836,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 72.4182,
+      "step": 64
+    },
+    {
+      "epoch": 0.03506287717878696,
+      "grad_norm": 20.10302734375,
+      "learning_rate": 0.00013000000000000002,
+      "loss": 61.2385,
+      "step": 65
+    },
+    {
+      "epoch": 0.035602306058460605,
+      "grad_norm": 27.804248809814453,
+      "learning_rate": 0.000132,
+      "loss": 59.6029,
+      "step": 66
+    },
+    {
+      "epoch": 0.03614173493813425,
+      "grad_norm": 30.542932510375977,
+      "learning_rate": 0.000134,
+      "loss": 51.2451,
+      "step": 67
+    },
+    {
+      "epoch": 0.036681163817807894,
+      "grad_norm": 70.11331176757812,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 51.116,
+      "step": 68
+    },
+    {
+      "epoch": 0.03722059269748154,
+      "grad_norm": 155.8134307861328,
+      "learning_rate": 0.000138,
+      "loss": 76.3231,
+      "step": 69
+    },
+    {
+      "epoch": 0.03776002157715519,
+      "grad_norm": 146.5844268798828,
+      "learning_rate": 0.00014,
+      "loss": 68.6173,
+      "step": 70
+    },
+    {
+      "epoch": 0.03829945045682883,
+      "grad_norm": 102.16127014160156,
+      "learning_rate": 0.000142,
+      "loss": 72.0777,
+      "step": 71
+    },
+    {
+      "epoch": 0.03883887933650248,
+      "grad_norm": 40.04204559326172,
+      "learning_rate": 0.000144,
+      "loss": 52.744,
+      "step": 72
+    },
+    {
+      "epoch": 0.039378308216176126,
+      "grad_norm": 75.35163116455078,
+      "learning_rate": 0.000146,
+      "loss": 59.1683,
+      "step": 73
+    },
+    {
+      "epoch": 0.03991773709584977,
+      "grad_norm": 77.30841827392578,
+      "learning_rate": 0.000148,
+      "loss": 51.3059,
+      "step": 74
+    },
+    {
+      "epoch": 0.040457165975523415,
+      "grad_norm": 52.49984359741211,
+      "learning_rate": 0.00015000000000000001,
+      "loss": 52.407,
+      "step": 75
+    },
+    {
+      "epoch": 0.04099659485519706,
+      "grad_norm": 35.61119842529297,
+      "learning_rate": 0.000152,
+      "loss": 50.8863,
+      "step": 76
+    },
+    {
+      "epoch": 0.041536023734870704,
+      "grad_norm": 34.10403060913086,
+      "learning_rate": 0.000154,
+      "loss": 53.6901,
+      "step": 77
+    },
+    {
+      "epoch": 0.04207545261454435,
+      "grad_norm": 39.79935836791992,
+      "learning_rate": 0.00015600000000000002,
+      "loss": 49.8857,
+      "step": 78
+    },
+    {
+      "epoch": 0.042614881494218,
+      "grad_norm": 35.74922561645508,
+      "learning_rate": 0.00015800000000000002,
+      "loss": 62.577,
+      "step": 79
+    },
+    {
+      "epoch": 0.04315431037389164,
+      "grad_norm": 31.491291046142578,
+      "learning_rate": 0.00016,
+      "loss": 52.0815,
+      "step": 80
+    },
+    {
+      "epoch": 0.04369373925356529,
+      "grad_norm": 23.866592407226562,
+      "learning_rate": 0.000162,
+      "loss": 64.6077,
+      "step": 81
+    },
+    {
+      "epoch": 0.04423316813323893,
+      "grad_norm": 28.5296688079834,
+      "learning_rate": 0.000164,
+      "loss": 59.5244,
+      "step": 82
+    },
+    {
+      "epoch": 0.04477259701291258,
+      "grad_norm": 33.92407989501953,
+      "learning_rate": 0.000166,
+      "loss": 62.9956,
+      "step": 83
+    },
+    {
+      "epoch": 0.045312025892586226,
+      "grad_norm": 27.05453109741211,
+      "learning_rate": 0.000168,
+      "loss": 52.9777,
+      "step": 84
+    },
+    {
+      "epoch": 0.04585145477225987,
+      "grad_norm": 23.927709579467773,
+      "learning_rate": 0.00017,
+      "loss": 56.0232,
+      "step": 85
+    },
+    {
+      "epoch": 0.046390883651933515,
+      "grad_norm": 31.250370025634766,
+      "learning_rate": 0.000172,
+      "loss": 55.1487,
+      "step": 86
+    },
+    {
+      "epoch": 0.04693031253160716,
+      "grad_norm": 32.98558044433594,
+      "learning_rate": 0.000174,
+      "loss": 54.3132,
+      "step": 87
+    },
+    {
+      "epoch": 0.047469741411280804,
+      "grad_norm": 39.15415954589844,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 56.2989,
+      "step": 88
+    },
+    {
+      "epoch": 0.04800917029095445,
+      "grad_norm": 32.42843246459961,
+      "learning_rate": 0.00017800000000000002,
+      "loss": 41.7672,
+      "step": 89
+    },
+    {
+      "epoch": 0.0485485991706281,
+      "grad_norm": 42.03153610229492,
+      "learning_rate": 0.00018,
+      "loss": 50.3046,
+      "step": 90
+    },
+    {
+      "epoch": 0.04908802805030174,
+      "grad_norm": 38.14472961425781,
+      "learning_rate": 0.000182,
+      "loss": 50.2817,
+      "step": 91
+    },
+    {
+      "epoch": 0.04962745692997539,
+      "grad_norm": 32.74757385253906,
+      "learning_rate": 0.00018400000000000003,
+      "loss": 47.9721,
+      "step": 92
+    },
+    {
+      "epoch": 0.05016688580964904,
+      "grad_norm": 41.20277404785156,
+      "learning_rate": 0.00018600000000000002,
+      "loss": 48.2985,
+      "step": 93
+    },
+    {
+      "epoch": 0.05070631468932268,
+      "grad_norm": 42.31992721557617,
+      "learning_rate": 0.000188,
+      "loss": 58.7386,
+      "step": 94
+    },
+    {
+      "epoch": 0.051245743568996326,
+      "grad_norm": 28.106618881225586,
+      "learning_rate": 0.00019,
+      "loss": 46.3057,
+      "step": 95
+    },
+    {
+      "epoch": 0.051785172448669974,
+      "grad_norm": 37.70038604736328,
+      "learning_rate": 0.000192,
+      "loss": 35.5874,
+      "step": 96
+    },
+    {
+      "epoch": 0.052324601328343615,
+      "grad_norm": 36.007530212402344,
+      "learning_rate": 0.000194,
+      "loss": 47.9065,
+      "step": 97
+    },
+    {
+      "epoch": 0.05286403020801726,
+      "grad_norm": 29.738492965698242,
+      "learning_rate": 0.000196,
+      "loss": 49.6222,
+      "step": 98
+    },
+    {
+      "epoch": 0.05340345908769091,
+      "grad_norm": 42.806785583496094,
+      "learning_rate": 0.00019800000000000002,
+      "loss": 44.6868,
+      "step": 99
+    },
+    {
+      "epoch": 0.05394288796736455,
+      "grad_norm": 31.359643936157227,
+      "learning_rate": 0.0002,
+      "loss": 30.5743,
+      "step": 100
+    },
+    {
+      "epoch": 0.0544823168470382,
+      "grad_norm": 24.176820755004883,
+      "learning_rate": 0.00019999998344063995,
+      "loss": 41.8829,
+      "step": 101
+    },
+    {
+      "epoch": 0.05502174572671184,
+      "grad_norm": 43.5556755065918,
+      "learning_rate": 0.00019999993376256528,
+      "loss": 64.5931,
+      "step": 102
+    },
+    {
+      "epoch": 0.05556117460638549,
+      "grad_norm": 35.98505401611328,
+      "learning_rate": 0.00019999985096579245,
+      "loss": 94.4231,
+      "step": 103
+    },
+    {
+      "epoch": 0.056100603486059136,
+      "grad_norm": 35.83631134033203,
+      "learning_rate": 0.00019999973505034887,
+      "loss": 113.3877,
+      "step": 104
+    },
+    {
+      "epoch": 0.05664003236573278,
+      "grad_norm": 30.29425621032715,
+      "learning_rate": 0.00019999958601627296,
+      "loss": 113.0325,
+      "step": 105
+    },
+    {
+      "epoch": 0.057179461245406425,
+      "grad_norm": 27.389789581298828,
+      "learning_rate": 0.000199999403863614,
+      "loss": 111.3191,
+      "step": 106
+    },
+    {
+      "epoch": 0.05771889012508007,
+      "grad_norm": 27.400251388549805,
+      "learning_rate": 0.00019999918859243244,
+      "loss": 97.0415,
+      "step": 107
+    },
+    {
+      "epoch": 0.058258319004753714,
+      "grad_norm": 20.399946212768555,
+      "learning_rate": 0.0001999989402027995,
+      "loss": 90.2641,
+      "step": 108
+    },
+    {
+      "epoch": 0.05879774788442736,
+      "grad_norm": 25.029308319091797,
+      "learning_rate": 0.0001999986586947974,
+      "loss": 94.4251,
+      "step": 109
+    },
+    {
+      "epoch": 0.05933717676410101,
+      "grad_norm": 29.495418548583984,
+      "learning_rate": 0.00019999834406851945,
+      "loss": 94.9159,
+      "step": 110
+    },
+    {
+      "epoch": 0.05987660564377465,
+      "grad_norm": 19.77571678161621,
+      "learning_rate": 0.0001999979963240698,
+      "loss": 75.4925,
+      "step": 111
+    },
+    {
+      "epoch": 0.0604160345234483,
+      "grad_norm": 25.004566192626953,
+      "learning_rate": 0.00019999761546156365,
+      "loss": 71.3454,
+      "step": 112
+    },
+    {
+      "epoch": 0.06095546340312195,
+      "grad_norm": 34.21379852294922,
+      "learning_rate": 0.00019999720148112715,
+      "loss": 66.511,
+      "step": 113
+    },
+    {
+      "epoch": 0.06149489228279559,
+      "grad_norm": 22.71439552307129,
+      "learning_rate": 0.00019999675438289738,
+      "loss": 52.0498,
+      "step": 114
+    },
+    {
+      "epoch": 0.062034321162469236,
+      "grad_norm": 24.381750106811523,
+      "learning_rate": 0.0001999962741670224,
+      "loss": 55.2827,
+      "step": 115
+    },
+    {
+      "epoch": 0.06257375004214288,
+      "grad_norm": 37.246803283691406,
+      "learning_rate": 0.00019999576083366125,
+      "loss": 54.9355,
+      "step": 116
+    },
+    {
+      "epoch": 0.06311317892181653,
+      "grad_norm": 81.53564453125,
+      "learning_rate": 0.00019999521438298398,
+      "loss": 59.4422,
+      "step": 117
+    },
+    {
+      "epoch": 0.06365260780149017,
+      "grad_norm": 129.4823760986328,
+      "learning_rate": 0.00019999463481517156,
+      "loss": 67.393,
+      "step": 118
+    },
+    {
+      "epoch": 0.06419203668116381,
+      "grad_norm": 77.96698760986328,
+      "learning_rate": 0.00019999402213041588,
+      "loss": 67.9443,
+      "step": 119
+    },
+    {
+      "epoch": 0.06473146556083746,
+      "grad_norm": 53.094512939453125,
+      "learning_rate": 0.0001999933763289199,
+      "loss": 61.054,
+      "step": 120
+    },
+    {
+      "epoch": 0.06527089444051111,
+      "grad_norm": 52.896366119384766,
+      "learning_rate": 0.00019999269741089752,
+      "loss": 62.3436,
+      "step": 121
+    },
+    {
+      "epoch": 0.06581032332018476,
+      "grad_norm": 57.282318115234375,
+      "learning_rate": 0.00019999198537657353,
+      "loss": 56.6129,
+      "step": 122
+    },
+    {
+      "epoch": 0.0663497521998584,
+      "grad_norm": 46.553062438964844,
+      "learning_rate": 0.0001999912402261838,
+      "loss": 55.701,
+      "step": 123
+    },
+    {
+      "epoch": 0.06688918107953204,
+      "grad_norm": 28.822669982910156,
+      "learning_rate": 0.00019999046195997512,
+      "loss": 54.2102,
+      "step": 124
+    },
+    {
+      "epoch": 0.06742860995920569,
+      "grad_norm": 28.726089477539062,
+      "learning_rate": 0.00019998965057820516,
+      "loss": 56.0332,
+      "step": 125
+    },
+    {
+      "epoch": 0.06796803883887934,
+      "grad_norm": 26.886003494262695,
+      "learning_rate": 0.0001999888060811427,
+      "loss": 43.4516,
+      "step": 126
+    },
+    {
+      "epoch": 0.06850746771855298,
+      "grad_norm": 31.9282169342041,
+      "learning_rate": 0.00019998792846906747,
+      "loss": 52.2149,
+      "step": 127
+    },
+    {
+      "epoch": 0.06904689659822663,
+      "grad_norm": 38.317962646484375,
+      "learning_rate": 0.00019998701774227005,
+      "loss": 54.0044,
+      "step": 128
+    },
+    {
+      "epoch": 0.06958632547790028,
+      "grad_norm": 31.158544540405273,
+      "learning_rate": 0.00019998607390105209,
+      "loss": 55.2255,
+      "step": 129
+    },
+    {
+      "epoch": 0.07012575435757391,
+      "grad_norm": 33.239166259765625,
+      "learning_rate": 0.00019998509694572615,
+      "loss": 56.3811,
+      "step": 130
+    },
+    {
+      "epoch": 0.07066518323724756,
+      "grad_norm": 30.34086799621582,
+      "learning_rate": 0.00019998408687661582,
+      "loss": 52.0529,
+      "step": 131
+    },
+    {
+      "epoch": 0.07120461211692121,
+      "grad_norm": 24.05341911315918,
+      "learning_rate": 0.00019998304369405563,
+      "loss": 60.5602,
+      "step": 132
+    },
+    {
+      "epoch": 0.07174404099659486,
+      "grad_norm": 26.90273094177246,
+      "learning_rate": 0.00019998196739839103,
+      "loss": 57.3375,
+      "step": 133
+    },
+    {
+      "epoch": 0.0722834698762685,
+      "grad_norm": 24.157773971557617,
+      "learning_rate": 0.0001999808579899785,
+      "loss": 47.7251,
+      "step": 134
+    },
+    {
+      "epoch": 0.07282289875594215,
+      "grad_norm": 28.088014602661133,
+      "learning_rate": 0.00019997971546918545,
+      "loss": 56.1037,
+      "step": 135
+    },
+    {
+      "epoch": 0.07336232763561579,
+      "grad_norm": 32.39021682739258,
+      "learning_rate": 0.00019997853983639029,
+      "loss": 52.0922,
+      "step": 136
+    },
+    {
+      "epoch": 0.07390175651528944,
+      "grad_norm": 29.597578048706055,
+      "learning_rate": 0.0001999773310919824,
+      "loss": 46.3537,
+      "step": 137
+    },
+    {
+      "epoch": 0.07444118539496308,
+      "grad_norm": 38.31181335449219,
+      "learning_rate": 0.000199976089236362,
+      "loss": 46.8711,
+      "step": 138
+    },
+    {
+      "epoch": 0.07498061427463673,
+      "grad_norm": 39.67713165283203,
+      "learning_rate": 0.00019997481426994044,
+      "loss": 45.0961,
+      "step": 139
+    },
+    {
+      "epoch": 0.07552004315431038,
+      "grad_norm": 48.8436164855957,
+      "learning_rate": 0.00019997350619314,
+      "loss": 48.7547,
+      "step": 140
+    },
+    {
+      "epoch": 0.07605947203398401,
+      "grad_norm": 88.95709991455078,
+      "learning_rate": 0.00019997216500639383,
+      "loss": 50.3681,
+      "step": 141
+    },
+    {
+      "epoch": 0.07659890091365766,
+      "grad_norm": 34.2819938659668,
+      "learning_rate": 0.0001999707907101462,
+      "loss": 44.3903,
+      "step": 142
+    },
+    {
+      "epoch": 0.07713832979333131,
+      "grad_norm": 42.79631042480469,
+      "learning_rate": 0.00019996938330485217,
+      "loss": 31.0566,
+      "step": 143
+    },
+    {
+      "epoch": 0.07767775867300496,
+      "grad_norm": 37.28693389892578,
+      "learning_rate": 0.00019996794279097791,
+      "loss": 34.0999,
+      "step": 144
+    },
+    {
+      "epoch": 0.0782171875526786,
+      "grad_norm": 43.65718460083008,
+      "learning_rate": 0.00019996646916900051,
+      "loss": 48.7369,
+      "step": 145
+    },
+    {
+      "epoch": 0.07875661643235225,
+      "grad_norm": 39.86713409423828,
+      "learning_rate": 0.00019996496243940794,
+      "loss": 36.6841,
+      "step": 146
+    },
+    {
+      "epoch": 0.07929604531202589,
+      "grad_norm": 32.35002899169922,
+      "learning_rate": 0.0001999634226026993,
+      "loss": 43.1344,
+      "step": 147
+    },
+    {
+      "epoch": 0.07983547419169953,
+      "grad_norm": 36.14616775512695,
+      "learning_rate": 0.0001999618496593845,
+      "loss": 51.9779,
+      "step": 148
+    },
+    {
+      "epoch": 0.08037490307137318,
+      "grad_norm": 31.071197509765625,
+      "learning_rate": 0.00019996024360998456,
+      "loss": 39.5621,
+      "step": 149
+    },
+    {
+      "epoch": 0.08091433195104683,
+      "grad_norm": 33.61774444580078,
+      "learning_rate": 0.00019995860445503127,
+      "loss": 37.7614,
+      "step": 150
+    },
+    {
+      "epoch": 0.08145376083072048,
+      "grad_norm": 22.93950653076172,
+      "learning_rate": 0.00019995693219506758,
+      "loss": 59.2331,
+      "step": 151
+    },
+    {
+      "epoch": 0.08199318971039413,
+      "grad_norm": 31.307132720947266,
+      "learning_rate": 0.00019995522683064726,
+      "loss": 70.8054,
+      "step": 152
+    },
+    {
+      "epoch": 0.08253261859006776,
+      "grad_norm": 28.894466400146484,
+      "learning_rate": 0.00019995348836233516,
+      "loss": 84.8097,
+      "step": 153
+    },
+    {
+      "epoch": 0.08307204746974141,
+      "grad_norm": 26.76435661315918,
+      "learning_rate": 0.000199951716790707,
+      "loss": 101.4707,
+      "step": 154
+    },
+    {
+      "epoch": 0.08361147634941506,
+      "grad_norm": 26.842918395996094,
+      "learning_rate": 0.00019994991211634954,
+      "loss": 107.518,
+      "step": 155
+    },
+    {
+      "epoch": 0.0841509052290887,
+      "grad_norm": 25.251588821411133,
+      "learning_rate": 0.00019994807433986047,
+      "loss": 106.076,
+      "step": 156
+    },
+    {
+      "epoch": 0.08469033410876235,
+      "grad_norm": 28.60271453857422,
+      "learning_rate": 0.0001999462034618484,
+      "loss": 96.3093,
+      "step": 157
+    },
+    {
+      "epoch": 0.085229762988436,
+      "grad_norm": 22.537473678588867,
+      "learning_rate": 0.00019994429948293291,
+      "loss": 88.6475,
+      "step": 158
+    },
+    {
+      "epoch": 0.08576919186810963,
+      "grad_norm": 18.868396759033203,
+      "learning_rate": 0.00019994236240374465,
+      "loss": 92.4222,
+      "step": 159
+    },
+    {
+      "epoch": 0.08630862074778328,
+      "grad_norm": 21.84971046447754,
+      "learning_rate": 0.00019994039222492513,
+      "loss": 88.0079,
+      "step": 160
+    },
+    {
+      "epoch": 0.08684804962745693,
+      "grad_norm": 23.634244918823242,
+      "learning_rate": 0.00019993838894712682,
+      "loss": 77.0574,
+      "step": 161
+    },
+    {
+      "epoch": 0.08738747850713058,
+      "grad_norm": 18.22877311706543,
+      "learning_rate": 0.00019993635257101322,
+      "loss": 67.3958,
+      "step": 162
+    },
+    {
+      "epoch": 0.08792690738680423,
+      "grad_norm": 21.62260627746582,
+      "learning_rate": 0.00019993428309725872,
+      "loss": 65.1832,
+      "step": 163
+    },
+    {
+      "epoch": 0.08846633626647786,
+      "grad_norm": 18.148618698120117,
+      "learning_rate": 0.0001999321805265487,
+      "loss": 63.1231,
+      "step": 164
+    },
+    {
+      "epoch": 0.08900576514615151,
+      "grad_norm": 20.20022201538086,
+      "learning_rate": 0.00019993004485957956,
+      "loss": 59.0852,
+      "step": 165
+    },
+    {
+      "epoch": 0.08954519402582516,
+      "grad_norm": 28.2082576751709,
+      "learning_rate": 0.00019992787609705853,
+      "loss": 55.8505,
+      "step": 166
+    },
+    {
+      "epoch": 0.0900846229054988,
+      "grad_norm": 43.48365020751953,
+      "learning_rate": 0.00019992567423970394,
+      "loss": 40.495,
+      "step": 167
+    },
+    {
+      "epoch": 0.09062405178517245,
+      "grad_norm": 149.13955688476562,
+      "learning_rate": 0.00019992343928824498,
+      "loss": 91.8388,
+      "step": 168
+    },
+    {
+      "epoch": 0.0911634806648461,
+      "grad_norm": 91.07251739501953,
+      "learning_rate": 0.00019992117124342183,
+      "loss": 61.9425,
+      "step": 169
+    },
+    {
+      "epoch": 0.09170290954451973,
+      "grad_norm": 65.70806121826172,
+      "learning_rate": 0.00019991887010598565,
+      "loss": 59.7979,
+      "step": 170
+    },
+    {
+      "epoch": 0.09224233842419338,
+      "grad_norm": 45.109580993652344,
+      "learning_rate": 0.00019991653587669855,
+      "loss": 63.235,
+      "step": 171
+    },
+    {
+      "epoch": 0.09278176730386703,
+      "grad_norm": 49.24695587158203,
+      "learning_rate": 0.00019991416855633364,
+      "loss": 55.8371,
+      "step": 172
+    },
+    {
+      "epoch": 0.09332119618354068,
+      "grad_norm": 44.50947952270508,
+      "learning_rate": 0.0001999117681456749,
+      "loss": 45.3712,
+      "step": 173
+    },
+    {
+      "epoch": 0.09386062506321433,
+      "grad_norm": 45.105506896972656,
+      "learning_rate": 0.00019990933464551728,
+      "loss": 59.354,
+      "step": 174
+    },
+    {
+      "epoch": 0.09440005394288797,
+      "grad_norm": 31.862106323242188,
+      "learning_rate": 0.0001999068680566668,
+      "loss": 49.2883,
+      "step": 175
+    },
+    {
+      "epoch": 0.09493948282256161,
+      "grad_norm": 34.86188507080078,
+      "learning_rate": 0.00019990436837994028,
+      "loss": 40.9445,
+      "step": 176
+    },
+    {
+      "epoch": 0.09547891170223526,
+      "grad_norm": 52.34774398803711,
+      "learning_rate": 0.00019990183561616567,
+      "loss": 54.3114,
+      "step": 177
+    },
+    {
+      "epoch": 0.0960183405819089,
+      "grad_norm": 30.12732696533203,
+      "learning_rate": 0.00019989926976618172,
+      "loss": 44.8966,
+      "step": 178
+    },
+    {
+      "epoch": 0.09655776946158255,
+      "grad_norm": 29.296287536621094,
+      "learning_rate": 0.00019989667083083825,
+      "loss": 47.5101,
+      "step": 179
+    },
+    {
+      "epoch": 0.0970971983412562,
+      "grad_norm": 42.42873764038086,
+      "learning_rate": 0.00019989403881099597,
+      "loss": 48.2378,
+      "step": 180
+    },
+    {
+      "epoch": 0.09763662722092983,
+      "grad_norm": 31.62274742126465,
+      "learning_rate": 0.00019989137370752657,
+      "loss": 42.1564,
+      "step": 181
+    },
+    {
+      "epoch": 0.09817605610060348,
+      "grad_norm": 30.754499435424805,
+      "learning_rate": 0.00019988867552131275,
+      "loss": 52.2929,
+      "step": 182
+    },
+    {
+      "epoch": 0.09871548498027713,
+      "grad_norm": 31.932157516479492,
+      "learning_rate": 0.000199885944253248,
+      "loss": 45.6226,
+      "step": 183
+    },
+    {
+      "epoch": 0.09925491385995078,
+      "grad_norm": 33.754722595214844,
+      "learning_rate": 0.00019988317990423703,
+      "loss": 39.9572,
+      "step": 184
+    },
+    {
+      "epoch": 0.09979434273962443,
+      "grad_norm": 33.33165740966797,
+      "learning_rate": 0.00019988038247519522,
+      "loss": 52.7357,
+      "step": 185
+    },
+    {
+      "epoch": 0.10033377161929807,
+      "grad_norm": 28.355619430541992,
+      "learning_rate": 0.0001998775519670491,
+      "loss": 39.8865,
+      "step": 186
+    },
+    {
+      "epoch": 0.10087320049897171,
+      "grad_norm": 60.16803741455078,
+      "learning_rate": 0.00019987468838073613,
+      "loss": 48.3595,
+      "step": 187
+    },
+    {
+      "epoch": 0.10141262937864536,
+      "grad_norm": 33.5135498046875,
+      "learning_rate": 0.00019987179171720464,
+      "loss": 34.3803,
+      "step": 188
+    },
+    {
+      "epoch": 0.101952058258319,
+      "grad_norm": 33.8374137878418,
+      "learning_rate": 0.00019986886197741403,
+      "loss": 46.4517,
+      "step": 189
+    },
+    {
+      "epoch": 0.10249148713799265,
+      "grad_norm": 26.143709182739258,
+      "learning_rate": 0.0001998658991623345,
+      "loss": 30.6351,
+      "step": 190
+    },
+    {
+      "epoch": 0.1030309160176663,
+      "grad_norm": 28.791723251342773,
+      "learning_rate": 0.0001998629032729474,
+      "loss": 44.2275,
+      "step": 191
+    },
+    {
+      "epoch": 0.10357034489733995,
+      "grad_norm": 33.818931579589844,
+      "learning_rate": 0.00019985987431024485,
+      "loss": 43.5677,
+      "step": 192
+    },
+    {
+      "epoch": 0.10410977377701358,
+      "grad_norm": 40.07392883300781,
+      "learning_rate": 0.00019985681227523006,
+      "loss": 34.5844,
+      "step": 193
+    },
+    {
+      "epoch": 0.10464920265668723,
+      "grad_norm": 30.963062286376953,
+      "learning_rate": 0.00019985371716891708,
+      "loss": 44.1099,
+      "step": 194
+    },
+    {
+      "epoch": 0.10518863153636088,
+      "grad_norm": 31.774293899536133,
+      "learning_rate": 0.000199850588992331,
+      "loss": 36.4496,
+      "step": 195
+    },
+    {
+      "epoch": 0.10572806041603452,
+      "grad_norm": 47.396575927734375,
+      "learning_rate": 0.00019984742774650785,
+      "loss": 50.9736,
+      "step": 196
+    },
+    {
+      "epoch": 0.10626748929570817,
+      "grad_norm": 58.573341369628906,
+      "learning_rate": 0.00019984423343249457,
+      "loss": 44.6643,
+      "step": 197
+    },
+    {
+      "epoch": 0.10680691817538182,
+      "grad_norm": 33.57207107543945,
+      "learning_rate": 0.00019984100605134906,
+      "loss": 36.4154,
+      "step": 198
+    },
+    {
+      "epoch": 0.10734634705505545,
+      "grad_norm": 33.817752838134766,
+      "learning_rate": 0.00019983774560414027,
+      "loss": 38.8474,
+      "step": 199
+    },
+    {
+      "epoch": 0.1078857759347291,
+      "grad_norm": 34.572608947753906,
+      "learning_rate": 0.00019983445209194791,
+      "loss": 30.1009,
+      "step": 200
+    },
+    {
+      "epoch": 0.1078857759347291,
+      "eval_loss": 1.836081624031067,
+      "eval_runtime": 141.0356,
+      "eval_samples_per_second": 2.12,
+      "eval_steps_per_second": 2.12,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 5559,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 3,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.571584181941043e+17,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75d521b926cc9b52658f752edb06fe011094ff558c587d132452d4dea2c1c386
+size 6776