ArsParadox committed on Mar 28, 2025

Commit

f670287

verified ·

1 Parent(s): eaed343

Fine-tuned Mistral model on Viel-Lite Dataset

Files changed (20)

README.md +202 -0
adapter_config.json +37 -0
adapter_model.safetensors +3 -0
checkpoint-180/README.md +202 -0
checkpoint-180/adapter_config.json +37 -0
checkpoint-180/adapter_model.safetensors +3 -0
checkpoint-180/optimizer.pt +3 -0
checkpoint-180/rng_state.pth +3 -0
checkpoint-180/scaler.pt +3 -0
checkpoint-180/scheduler.pt +3 -0
checkpoint-180/special_tokens_map.json +30 -0
checkpoint-180/tokenizer.json +0 -0
checkpoint-180/tokenizer.model +3 -0
checkpoint-180/tokenizer_config.json +0 -0
checkpoint-180/trainer_state.json +1294 -0
checkpoint-180/training_args.bin +3 -0
special_tokens_map.json +6 -0
tokenizer.json +0 -0
tokenizer.model +3 -0
tokenizer_config.json +0 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "gate_proj",
+    "v_proj",
+    "k_proj",
+    "down_proj",
+    "o_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
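
The LoRA settings in this config can be turned into two concrete numbers: the scaling factor applied to the low-rank update and the number of trainable adapter parameters. The following is a minimal sketch, not part of the commit. The layer shapes (`HIDDEN`, `INTERMEDIATE`, `KV_DIM`, `NUM_LAYERS`) are assumptions taken from the usual Mistral-7B architecture and are not stated anywhere in this diff, and the helper names are hypothetical.

```python
import json

# Subset of the adapter_config.json fields shown in the diff.
config = json.loads("""
{
  "lora_alpha": 64,
  "r": 64,
  "use_rslora": false,
  "target_modules": ["q_proj", "gate_proj", "v_proj", "k_proj",
                     "down_proj", "o_proj", "up_proj"]
}
""")

# ASSUMPTION: Mistral-7B layer shapes; these values are not in the commit.
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32
SHAPES = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(cfg):
    """Standard LoRA scales the update by alpha / r; rsLoRA uses alpha / sqrt(r)."""
    r = cfg["r"]
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / r ** 0.5
    return cfg["lora_alpha"] / r


def lora_param_count(cfg):
    """Each adapted linear layer adds A (r x in) and B (out x r) matrices."""
    r = cfg["r"]
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in cfg["target_modules"])
    return per_layer * NUM_LAYERS


scaling = lora_scaling(config)
params = lora_param_count(config)
print(f"scaling={scaling}, adapter parameters={params:,}")
```

Under these assumptions the adapter has about 168 million parameters. At 4 bytes per parameter that is roughly 671 MB, which is close to the 671149168-byte size recorded for `adapter_model.safetensors` and would be consistent with weights stored in 32-bit floats.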

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd1ceae16e91729527682a1cdd29e00194d3f84f4fbb4a158c66c078c6b5013e
+size 671149168

checkpoint-180/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

checkpoint-180/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "gate_proj",
+    "v_proj",
+    "k_proj",
+    "down_proj",
+    "o_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-180/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd1ceae16e91729527682a1cdd29e00194d3f84f4fbb4a158c66c078c6b5013e
+size 671149168
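
The two `adapter_model.safetensors` entries in this commit are Git LFS pointer files, which are small text stubs that record the hash and size of the real object. Here is a minimal sketch of parsing them and checking whether the root adapter and the `checkpoint-180` adapter refer to the same object. The function name `parse_lfs_pointer` is hypothetical; the pointer text is copied from the diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents as shown for the two adapter files in this commit.
ROOT_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fd1ceae16e91729527682a1cdd29e00194d3f84f4fbb4a158c66c078c6b5013e
size 671149168
"""
CHECKPOINT_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fd1ceae16e91729527682a1cdd29e00194d3f84f4fbb4a158c66c078c6b5013e
size 671149168
"""

root = parse_lfs_pointer(ROOT_POINTER)
checkpoint = parse_lfs_pointer(CHECKPOINT_POINTER)

# Same hash and size means both paths resolve to the same stored object.
same_object = (root["digest"], root["size"]) == (checkpoint["digest"], checkpoint["size"])
size_mib = root["size"] / 2**20
print(f"same object: {same_object}, size: {size_mib:.1f} MiB")
```

Because the hash and size match, the top-level adapter appears to be a copy of the step-180 checkpoint weights rather than a separately trained artifact.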

checkpoint-180/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ad77a9b3b0e9a7b8c2f7cf7316a4815ca636c2417ffd261de8ef3f615321b0e
+size 341314196

checkpoint-180/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:608fccb6c056ce88cdfd5355e6be2046f4d107a24a87c6b0d2c3b200ce6bb4ea
+size 14244

checkpoint-180/scaler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:894d0e48bf1444f129e12325905662a936cdeeb9fec3a46a0155b3b08f997b67
+size 988

checkpoint-180/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9024c1b50ab9ddda3610cbc60929f3e57c4c124e3e86257a28298bf53e386ff6
+size 1064

checkpoint-180/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[control_768]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-180/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-180/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

checkpoint-180/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-180/trainer_state.json ADDED
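
The `learning_rate` values logged in this file look like a linear warmup followed by a linear decay. The sketch below is not the training code. Its constants (`PEAK_LR`, `WARMUP_STEPS`, `TOTAL_STEPS`) are inferred from the logged values and from `global_step`; they are assumptions, not settings read from `training_args.bin`. It checks the inferred schedule against a few logged steps and also recovers the steps-per-epoch figure implied by the `epoch` field.

```python
# ASSUMPTION: constants inferred from the log entries, not from the training arguments.
PEAK_LR = 2e-4       # learning rate logged at step 5
WARMUP_STEPS = 5     # learning rate rises linearly over steps 1..5
TOTAL_STEPS = 180    # matches "global_step": 180


def linear_warmup_decay(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    1: 4e-05,
    5: 0.0002,
    6: 0.00019885714285714287,
    12: 0.000192,
    110: 8e-05,
}
max_error = max(abs(linear_warmup_decay(s) - lr) for s, lr in logged.items())

# Each step advances "epoch" by 0.0008, so one epoch would be about 1250 steps.
steps_per_epoch = round(1 / 0.0008)
epoch_at_checkpoint = TOTAL_STEPS / steps_per_epoch

print(f"max deviation from logged values: {max_error:.2e}")
print(f"steps per epoch: {steps_per_epoch}, epoch at step 180: {epoch_at_checkpoint}")
```

If this reading is right, the run stopped after roughly 14% of one pass over the dataset, which agrees with the `"epoch": 0.144` value at the top of the file.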

	@@ -0,0 +1,1294 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.144,
+  "eval_steps": 500,
+  "global_step": 180,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0008,
+      "grad_norm": 1.2484313249588013,
+      "learning_rate": 4e-05,
+      "loss": 0.7118,
+      "step": 1
+    },
+    {
+      "epoch": 0.0016,
+      "grad_norm": 1.2396782636642456,
+      "learning_rate": 8e-05,
+      "loss": 0.8189,
+      "step": 2
+    },
+    {
+      "epoch": 0.0024,
+      "grad_norm": 0.8842036128044128,
+      "learning_rate": 0.00012,
+      "loss": 0.5522,
+      "step": 3
+    },
+    {
+      "epoch": 0.0032,
+      "grad_norm": 0.9036827683448792,
+      "learning_rate": 0.00016,
+      "loss": 0.4924,
+      "step": 4
+    },
+    {
+      "epoch": 0.004,
+      "grad_norm": 1.2299714088439941,
+      "learning_rate": 0.0002,
+      "loss": 0.58,
+      "step": 5
+    },
+    {
+      "epoch": 0.0048,
+      "grad_norm": 1.3246244192123413,
+      "learning_rate": 0.00019885714285714287,
+      "loss": 0.4792,
+      "step": 6
+    },
+    {
+      "epoch": 0.0056,
+      "grad_norm": 1.540134072303772,
+      "learning_rate": 0.0001977142857142857,
+      "loss": 0.5264,
+      "step": 7
+    },
+    {
+      "epoch": 0.0064,
+      "grad_norm": 1.3656489849090576,
+      "learning_rate": 0.00019657142857142858,
+      "loss": 0.4013,
+      "step": 8
+    },
+    {
+      "epoch": 0.0072,
+      "grad_norm": 1.1679350137710571,
+      "learning_rate": 0.00019542857142857144,
+      "loss": 0.3793,
+      "step": 9
+    },
+    {
+      "epoch": 0.008,
+      "grad_norm": 1.1096352338790894,
+      "learning_rate": 0.0001942857142857143,
+      "loss": 0.4983,
+      "step": 10
+    },
+    {
+      "epoch": 0.0088,
+      "grad_norm": 1.0913289785385132,
+      "learning_rate": 0.00019314285714285717,
+      "loss": 0.4991,
+      "step": 11
+    },
+    {
+      "epoch": 0.0096,
+      "grad_norm": 1.1222748756408691,
+      "learning_rate": 0.000192,
+      "loss": 0.4747,
+      "step": 12
+    },
+    {
+      "epoch": 0.0104,
+      "grad_norm": 1.1606162786483765,
+      "learning_rate": 0.00019085714285714287,
+      "loss": 0.4826,
+      "step": 13
+    },
+    {
+      "epoch": 0.0112,
+      "grad_norm": 1.003353476524353,
+      "learning_rate": 0.00018971428571428573,
+      "loss": 0.4512,
+      "step": 14
+    },
+    {
+      "epoch": 0.012,
+      "grad_norm": 1.3158879280090332,
+      "learning_rate": 0.00018857142857142857,
+      "loss": 0.4743,
+      "step": 15
+    },
+    {
+      "epoch": 0.0128,
+      "grad_norm": 1.2782206535339355,
+      "learning_rate": 0.00018742857142857143,
+      "loss": 0.4539,
+      "step": 16
+    },
+    {
+      "epoch": 0.0136,
+      "grad_norm": 1.618283987045288,
+      "learning_rate": 0.0001862857142857143,
+      "loss": 0.4132,
+      "step": 17
+    },
+    {
+      "epoch": 0.0144,
+      "grad_norm": 1.3360353708267212,
+      "learning_rate": 0.00018514285714285716,
+      "loss": 0.5101,
+      "step": 18
+    },
+    {
+      "epoch": 0.0152,
+      "grad_norm": 1.3629347085952759,
+      "learning_rate": 0.00018400000000000003,
+      "loss": 0.531,
+      "step": 19
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 2.1956729888916016,
+      "learning_rate": 0.00018285714285714286,
+      "loss": 0.4638,
+      "step": 20
+    },
+    {
+      "epoch": 0.0168,
+      "grad_norm": 1.1625367403030396,
+      "learning_rate": 0.00018171428571428573,
+      "loss": 0.4728,
+      "step": 21
+    },
+    {
+      "epoch": 0.0176,
+      "grad_norm": 1.1536197662353516,
+      "learning_rate": 0.00018057142857142857,
+      "loss": 0.4242,
+      "step": 22
+    },
+    {
+      "epoch": 0.0184,
+      "grad_norm": 1.5429607629776,
+      "learning_rate": 0.00017942857142857143,
+      "loss": 0.5295,
+      "step": 23
+    },
+    {
+      "epoch": 0.0192,
+      "grad_norm": 1.8186460733413696,
+      "learning_rate": 0.0001782857142857143,
+      "loss": 0.3945,
+      "step": 24
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.7319910526275635,
+      "learning_rate": 0.00017714285714285713,
+      "loss": 0.4291,
+      "step": 25
+    },
+    {
+      "epoch": 0.0208,
+      "grad_norm": 1.2174618244171143,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 0.4502,
+      "step": 26
+    },
+    {
+      "epoch": 0.0216,
+      "grad_norm": 1.2327603101730347,
+      "learning_rate": 0.0001748571428571429,
+      "loss": 0.4819,
+      "step": 27
+    },
+    {
+      "epoch": 0.0224,
+      "grad_norm": 1.1828025579452515,
+      "learning_rate": 0.00017371428571428572,
+      "loss": 0.4445,
+      "step": 28
+    },
+    {
+      "epoch": 0.0232,
+      "grad_norm": 1.302361011505127,
+      "learning_rate": 0.0001725714285714286,
+      "loss": 0.4674,
+      "step": 29
+    },
+    {
+      "epoch": 0.024,
+      "grad_norm": 1.158236026763916,
+      "learning_rate": 0.00017142857142857143,
+      "loss": 0.381,
+      "step": 30
+    },
+    {
+      "epoch": 0.0248,
+      "grad_norm": 1.1804059743881226,
+      "learning_rate": 0.0001702857142857143,
+      "loss": 0.4083,
+      "step": 31
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 1.4666764736175537,
+      "learning_rate": 0.00016914285714285715,
+      "loss": 0.496,
+      "step": 32
+    },
+    {
+      "epoch": 0.0264,
+      "grad_norm": 0.9403315782546997,
+      "learning_rate": 0.000168,
+      "loss": 0.401,
+      "step": 33
+    },
+    {
+      "epoch": 0.0272,
+      "grad_norm": 1.3564960956573486,
+      "learning_rate": 0.00016685714285714285,
+      "loss": 0.4214,
+      "step": 34
+    },
+    {
+      "epoch": 0.028,
+      "grad_norm": 1.104662299156189,
+      "learning_rate": 0.00016571428571428575,
+      "loss": 0.3714,
+      "step": 35
+    },
+    {
+      "epoch": 0.0288,
+      "grad_norm": 1.098190426826477,
+      "learning_rate": 0.00016457142857142858,
+      "loss": 0.3765,
+      "step": 36
+    },
+    {
+      "epoch": 0.0296,
+      "grad_norm": 1.1176928281784058,
+      "learning_rate": 0.00016342857142857145,
+      "loss": 0.449,
+      "step": 37
+    },
+    {
+      "epoch": 0.0304,
+      "grad_norm": 1.014857530593872,
+      "learning_rate": 0.00016228571428571428,
+      "loss": 0.3733,
+      "step": 38
+    },
+    {
+      "epoch": 0.0312,
+      "grad_norm": 0.9619799256324768,
+      "learning_rate": 0.00016114285714285715,
+      "loss": 0.383,
+      "step": 39
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 1.2253310680389404,
+      "learning_rate": 0.00016,
+      "loss": 0.4304,
+      "step": 40
+    },
+    {
+      "epoch": 0.0328,
+      "grad_norm": 1.2873166799545288,
+      "learning_rate": 0.00015885714285714285,
+      "loss": 0.4529,
+      "step": 41
+    },
+    {
+      "epoch": 0.0336,
+      "grad_norm": 1.3297463655471802,
+      "learning_rate": 0.00015771428571428571,
+      "loss": 0.4423,
+      "step": 42
+    },
+    {
+      "epoch": 0.0344,
+      "grad_norm": 1.1202378273010254,
+      "learning_rate": 0.00015657142857142858,
+      "loss": 0.4976,
+      "step": 43
+    },
+    {
+      "epoch": 0.0352,
+      "grad_norm": 1.1365331411361694,
+      "learning_rate": 0.00015542857142857144,
+      "loss": 0.3866,
+      "step": 44
+    },
+    {
+      "epoch": 0.036,
+      "grad_norm": 1.1100164651870728,
+      "learning_rate": 0.0001542857142857143,
+      "loss": 0.4384,
+      "step": 45
+    },
+    {
+      "epoch": 0.0368,
+      "grad_norm": 1.0640782117843628,
+      "learning_rate": 0.00015314285714285714,
+      "loss": 0.3889,
+      "step": 46
+    },
+    {
+      "epoch": 0.0376,
+      "grad_norm": 1.3084195852279663,
+      "learning_rate": 0.000152,
+      "loss": 0.4482,
+      "step": 47
+    },
+    {
+      "epoch": 0.0384,
+      "grad_norm": 1.3079781532287598,
+      "learning_rate": 0.00015085714285714287,
+      "loss": 0.4958,
+      "step": 48
+    },
+    {
+      "epoch": 0.0392,
+      "grad_norm": 1.1769342422485352,
+      "learning_rate": 0.0001497142857142857,
+      "loss": 0.4159,
+      "step": 49
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 1.0382602214813232,
+      "learning_rate": 0.00014857142857142857,
+      "loss": 0.4193,
+      "step": 50
+    },
+    {
+      "epoch": 0.0408,
+      "grad_norm": 1.0692741870880127,
+      "learning_rate": 0.00014742857142857144,
+      "loss": 0.4265,
+      "step": 51
+    },
+    {
+      "epoch": 0.0416,
+      "grad_norm": 0.9739915728569031,
+      "learning_rate": 0.0001462857142857143,
+      "loss": 0.3902,
+      "step": 52
+    },
+    {
+      "epoch": 0.0424,
+      "grad_norm": 1.0809895992279053,
+      "learning_rate": 0.00014514285714285717,
+      "loss": 0.4059,
+      "step": 53
+    },
+    {
+      "epoch": 0.0432,
+      "grad_norm": 1.1480313539505005,
+      "learning_rate": 0.000144,
+      "loss": 0.4432,
+      "step": 54
+    },
+    {
+      "epoch": 0.044,
+      "grad_norm": 0.8420342803001404,
+      "learning_rate": 0.00014285714285714287,
+      "loss": 0.3317,
+      "step": 55
+    },
+    {
+      "epoch": 0.0448,
+      "grad_norm": 0.8640209436416626,
+      "learning_rate": 0.0001417142857142857,
+      "loss": 0.3639,
+      "step": 56
+    },
+    {
+      "epoch": 0.0456,
+      "grad_norm": 1.0063021183013916,
+      "learning_rate": 0.00014057142857142857,
+      "loss": 0.3659,
+      "step": 57
+    },
+    {
+      "epoch": 0.0464,
+      "grad_norm": 0.9292585253715515,
+      "learning_rate": 0.00013942857142857143,
+      "loss": 0.3588,
+      "step": 58
+    },
+    {
+      "epoch": 0.0472,
+      "grad_norm": 0.9295415282249451,
+      "learning_rate": 0.0001382857142857143,
+      "loss": 0.3516,
+      "step": 59
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 1.1771010160446167,
+      "learning_rate": 0.00013714285714285716,
+      "loss": 0.3753,
+      "step": 60
+    },
+    {
+      "epoch": 0.0488,
+      "grad_norm": 1.0638577938079834,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 0.3609,
+      "step": 61
+    },
+    {
+      "epoch": 0.0496,
+      "grad_norm": 1.1870827674865723,
+      "learning_rate": 0.00013485714285714286,
+      "loss": 0.4136,
+      "step": 62
+    },
+    {
+      "epoch": 0.0504,
+      "grad_norm": 1.2225092649459839,
+      "learning_rate": 0.00013371428571428573,
+      "loss": 0.3917,
+      "step": 63
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 1.169294834136963,
+      "learning_rate": 0.00013257142857142856,
+      "loss": 0.381,
+      "step": 64
+    },
+    {
+      "epoch": 0.052,
+      "grad_norm": 0.9767704010009766,
+      "learning_rate": 0.00013142857142857143,
+      "loss": 0.3635,
+      "step": 65
+    },
+    {
+      "epoch": 0.0528,
+      "grad_norm": 1.1561055183410645,
+      "learning_rate": 0.0001302857142857143,
+      "loss": 0.3723,
+      "step": 66
+    },
+    {
+      "epoch": 0.0536,
+      "grad_norm": 1.2369543313980103,
+      "learning_rate": 0.00012914285714285713,
+      "loss": 0.4142,
+      "step": 67
+    },
+    {
+      "epoch": 0.0544,
+      "grad_norm": 1.3053923845291138,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 0.3437,
+      "step": 68
+    },
+    {
+      "epoch": 0.0552,
+      "grad_norm": 1.0821707248687744,
+      "learning_rate": 0.00012685714285714286,
+      "loss": 0.3765,
+      "step": 69
+    },
+    {
+      "epoch": 0.056,
+      "grad_norm": 1.3041565418243408,
+      "learning_rate": 0.00012571428571428572,
+      "loss": 0.4846,
+      "step": 70
+    },
+    {
+      "epoch": 0.0568,
+      "grad_norm": 1.1836012601852417,
+      "learning_rate": 0.0001245714285714286,
+      "loss": 0.3651,
+      "step": 71
+    },
+    {
+      "epoch": 0.0576,
+      "grad_norm": 1.1629185676574707,
+      "learning_rate": 0.00012342857142857142,
+      "loss": 0.3126,
+      "step": 72
+    },
+    {
+      "epoch": 0.0584,
+      "grad_norm": 1.2542935609817505,
+      "learning_rate": 0.0001222857142857143,
+      "loss": 0.3796,
+      "step": 73
+    },
+    {
+      "epoch": 0.0592,
+      "grad_norm": 1.2257918119430542,
+      "learning_rate": 0.00012114285714285715,
+      "loss": 0.3752,
+      "step": 74
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.0555886030197144,
+      "learning_rate": 0.00012,
+      "loss": 0.3954,
+      "step": 75
+    },
+    {
+      "epoch": 0.0608,
+      "grad_norm": 1.230891227722168,
+      "learning_rate": 0.00011885714285714287,
+      "loss": 0.3875,
+      "step": 76
+    },
+    {
+      "epoch": 0.0616,
+      "grad_norm": 0.8854696154594421,
+      "learning_rate": 0.0001177142857142857,
+      "loss": 0.3118,
+      "step": 77
+    },
+    {
+      "epoch": 0.0624,
+      "grad_norm": 1.0647600889205933,
+      "learning_rate": 0.00011657142857142858,
+      "loss": 0.3555,
+      "step": 78
+    },
+    {
+      "epoch": 0.0632,
+      "grad_norm": 0.9846012592315674,
+      "learning_rate": 0.00011542857142857145,
+      "loss": 0.3608,
+      "step": 79
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 1.286117434501648,
+      "learning_rate": 0.00011428571428571428,
+      "loss": 0.3681,
+      "step": 80
+    },
+    {
+      "epoch": 0.0648,
+      "grad_norm": 0.9961950182914734,
+      "learning_rate": 0.00011314285714285715,
+      "loss": 0.3476,
+      "step": 81
+    },
+    {
+      "epoch": 0.0656,
+      "grad_norm": 0.9044762849807739,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 0.3652,
+      "step": 82
+    },
+    {
+      "epoch": 0.0664,
+      "grad_norm": 1.3212488889694214,
+      "learning_rate": 0.00011085714285714286,
+      "loss": 0.371,
+      "step": 83
+    },
+    {
+      "epoch": 0.0672,
+      "grad_norm": 1.1521120071411133,
+      "learning_rate": 0.00010971428571428573,
+      "loss": 0.3659,
+      "step": 84
+    },
+    {
+      "epoch": 0.068,
+      "grad_norm": 0.9962939620018005,
+      "learning_rate": 0.00010857142857142856,
+      "loss": 0.3525,
+      "step": 85
+    },
+    {
+      "epoch": 0.0688,
+      "grad_norm": 0.939199686050415,
+      "learning_rate": 0.00010742857142857143,
+      "loss": 0.3164,
+      "step": 86
+    },
+    {
+      "epoch": 0.0696,
+      "grad_norm": 1.098573088645935,
+      "learning_rate": 0.0001062857142857143,
+      "loss": 0.3656,
+      "step": 87
+    },
+    {
+      "epoch": 0.0704,
+      "grad_norm": 1.000960350036621,
+      "learning_rate": 0.00010514285714285714,
+      "loss": 0.3685,
+      "step": 88
+    },
+    {
+      "epoch": 0.0712,
+      "grad_norm": 0.980912446975708,
+      "learning_rate": 0.00010400000000000001,
+      "loss": 0.3632,
+      "step": 89
+    },
+    {
+      "epoch": 0.072,
+      "grad_norm": 1.2877452373504639,
+      "learning_rate": 0.00010285714285714286,
+      "loss": 0.4223,
+      "step": 90
+    },
+    {
+      "epoch": 0.0728,
+      "grad_norm": 1.1124482154846191,
+      "learning_rate": 0.00010171428571428572,
+      "loss": 0.3475,
+      "step": 91
+    },
+    {
+      "epoch": 0.0736,
+      "grad_norm": 1.0587921142578125,
+      "learning_rate": 0.00010057142857142859,
+      "loss": 0.3612,
+      "step": 92
+    },
+    {
+      "epoch": 0.0744,
+      "grad_norm": 0.9964851140975952,
+      "learning_rate": 9.942857142857144e-05,
+      "loss": 0.3435,
+      "step": 93
+    },
+    {
+      "epoch": 0.0752,
+      "grad_norm": 1.0694661140441895,
+      "learning_rate": 9.828571428571429e-05,
+      "loss": 0.3709,
+      "step": 94
+    },
+    {
+      "epoch": 0.076,
+      "grad_norm": 1.0262457132339478,
+      "learning_rate": 9.714285714285715e-05,
+      "loss": 0.3496,
+      "step": 95
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 0.9541298151016235,
+      "learning_rate": 9.6e-05,
+      "loss": 0.3491,
+      "step": 96
+    },
+    {
+      "epoch": 0.0776,
+      "grad_norm": 1.0212255716323853,
+      "learning_rate": 9.485714285714287e-05,
+      "loss": 0.3474,
+      "step": 97
+    },
+    {
+      "epoch": 0.0784,
+      "grad_norm": 0.992710292339325,
+      "learning_rate": 9.371428571428572e-05,
+      "loss": 0.3331,
+      "step": 98
+    },
+    {
+      "epoch": 0.0792,
+      "grad_norm": 1.0837984085083008,
+      "learning_rate": 9.257142857142858e-05,
+      "loss": 0.5021,
+      "step": 99
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.1747627258300781,
+      "learning_rate": 9.142857142857143e-05,
+      "loss": 0.3499,
+      "step": 100
+    },
+    {
+      "epoch": 0.0808,
+      "grad_norm": 0.9934831261634827,
+      "learning_rate": 9.028571428571428e-05,
+      "loss": 0.3632,
+      "step": 101
+    },
+    {
+      "epoch": 0.0816,
+      "grad_norm": 0.9274324178695679,
+      "learning_rate": 8.914285714285715e-05,
+      "loss": 0.3291,
+      "step": 102
+    },
+    {
+      "epoch": 0.0824,
+      "grad_norm": 1.0825073719024658,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 0.3972,
+      "step": 103
+    },
+    {
+      "epoch": 0.0832,
+      "grad_norm": 0.9639647603034973,
+      "learning_rate": 8.685714285714286e-05,
+      "loss": 0.3747,
+      "step": 104
+    },
+    {
+      "epoch": 0.084,
+      "grad_norm": 1.336667776107788,
+      "learning_rate": 8.571428571428571e-05,
+      "loss": 0.3836,
+      "step": 105
+    },
+    {
+      "epoch": 0.0848,
+      "grad_norm": 1.1072027683258057,
+      "learning_rate": 8.457142857142858e-05,
+      "loss": 0.3405,
+      "step": 106
+    },
+    {
+      "epoch": 0.0856,
+      "grad_norm": 1.1694585084915161,
+      "learning_rate": 8.342857142857143e-05,
+      "loss": 0.3752,
+      "step": 107
+    },
+    {
+      "epoch": 0.0864,
+      "grad_norm": 1.0039108991622925,
+      "learning_rate": 8.228571428571429e-05,
+      "loss": 0.3464,
+      "step": 108
+    },
+    {
+      "epoch": 0.0872,
+      "grad_norm": 0.9793558716773987,
+      "learning_rate": 8.114285714285714e-05,
+      "loss": 0.3062,
+      "step": 109
+    },
+    {
+      "epoch": 0.088,
+      "grad_norm": 0.8974325060844421,
+      "learning_rate": 8e-05,
+      "loss": 0.3015,
+      "step": 110
+    },
+    {
+      "epoch": 0.0888,
+      "grad_norm": 1.0261926651000977,
+      "learning_rate": 7.885714285714286e-05,
+      "loss": 0.3834,
+      "step": 111
+    },
+    {
+      "epoch": 0.0896,
+      "grad_norm": 1.844468116760254,
+      "learning_rate": 7.771428571428572e-05,
+      "loss": 0.4882,
+      "step": 112
+    },
+    {
+      "epoch": 0.0904,
+      "grad_norm": 0.9161112904548645,
+      "learning_rate": 7.657142857142857e-05,
+      "loss": 0.3179,
+      "step": 113
+    },
+    {
+      "epoch": 0.0912,
+      "grad_norm": 1.3449184894561768,
+      "learning_rate": 7.542857142857144e-05,
+      "loss": 0.4077,
+      "step": 114
+    },
+    {
+      "epoch": 0.092,
+      "grad_norm": 0.9990420341491699,
+      "learning_rate": 7.428571428571429e-05,
+      "loss": 0.3093,
+      "step": 115
+    },
+    {
+      "epoch": 0.0928,
+      "grad_norm": 1.0407809019088745,
+      "learning_rate": 7.314285714285715e-05,
+      "loss": 0.3575,
+      "step": 116
+    },
+    {
+      "epoch": 0.0936,
+      "grad_norm": 1.1811689138412476,
+      "learning_rate": 7.2e-05,
+      "loss": 0.4108,
+      "step": 117
+    },
+    {
+      "epoch": 0.0944,
+      "grad_norm": 1.0623583793640137,
+      "learning_rate": 7.085714285714285e-05,
+      "loss": 0.3481,
+      "step": 118
+    },
+    {
+      "epoch": 0.0952,
+      "grad_norm": 1.0132906436920166,
+      "learning_rate": 6.971428571428572e-05,
+      "loss": 0.3489,
+      "step": 119
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 0.7905811667442322,
+      "learning_rate": 6.857142857142858e-05,
+      "loss": 0.285,
+      "step": 120
+    },
+    {
+      "epoch": 0.0968,
+      "grad_norm": 1.0692896842956543,
+      "learning_rate": 6.742857142857143e-05,
+      "loss": 0.3983,
+      "step": 121
+    },
+    {
+      "epoch": 0.0976,
+      "grad_norm": 1.1936224699020386,
+      "learning_rate": 6.628571428571428e-05,
+      "loss": 0.3634,
+      "step": 122
+    },
+    {
+      "epoch": 0.0984,
+      "grad_norm": 1.1843299865722656,
+      "learning_rate": 6.514285714285715e-05,
+      "loss": 0.3461,
+      "step": 123
+    },
+    {
+      "epoch": 0.0992,
+      "grad_norm": 0.8762491941452026,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 0.2921,
+      "step": 124
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.8600658178329468,
+      "learning_rate": 6.285714285714286e-05,
+      "loss": 0.2707,
+      "step": 125
+    },
+    {
+      "epoch": 0.1008,
+      "grad_norm": 1.1771011352539062,
+      "learning_rate": 6.171428571428571e-05,
+      "loss": 0.3786,
+      "step": 126
+    },
+    {
+      "epoch": 0.1016,
+      "grad_norm": 1.2326810359954834,
+      "learning_rate": 6.0571428571428576e-05,
+      "loss": 0.4223,
+      "step": 127
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 0.9957399964332581,
+      "learning_rate": 5.9428571428571434e-05,
+      "loss": 0.3816,
+      "step": 128
+    },
+    {
+      "epoch": 0.1032,
+      "grad_norm": 1.1350103616714478,
+      "learning_rate": 5.828571428571429e-05,
+      "loss": 0.3738,
+      "step": 129
+    },
+    {
+      "epoch": 0.104,
+      "grad_norm": 0.9324225187301636,
+      "learning_rate": 5.714285714285714e-05,
+      "loss": 0.3371,
+      "step": 130
+    },
+    {
+      "epoch": 0.1048,
+      "grad_norm": 1.0233099460601807,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 0.356,
+      "step": 131
+    },
+    {
+      "epoch": 0.1056,
+      "grad_norm": 0.9429035782814026,
+      "learning_rate": 5.485714285714286e-05,
+      "loss": 0.3153,
+      "step": 132
+    },
+    {
+      "epoch": 0.1064,
+      "grad_norm": 0.9844772219657898,
+      "learning_rate": 5.3714285714285714e-05,
+      "loss": 0.3266,
+      "step": 133
+    },
+    {
+      "epoch": 0.1072,
+      "grad_norm": 0.8643121123313904,
+      "learning_rate": 5.257142857142857e-05,
+      "loss": 0.2885,
+      "step": 134
+    },
+    {
+      "epoch": 0.108,
+      "grad_norm": 1.1424559354782104,
+      "learning_rate": 5.142857142857143e-05,
+      "loss": 0.3934,
+      "step": 135
+    },
+    {
+      "epoch": 0.1088,
+      "grad_norm": 0.8285444378852844,
+      "learning_rate": 5.028571428571429e-05,
+      "loss": 0.2926,
+      "step": 136
+    },
+    {
+      "epoch": 0.1096,
+      "grad_norm": 0.9021345376968384,
+      "learning_rate": 4.9142857142857144e-05,
+      "loss": 0.3096,
+      "step": 137
+    },
+    {
+      "epoch": 0.1104,
+      "grad_norm": 0.8790440559387207,
+      "learning_rate": 4.8e-05,
+      "loss": 0.3336,
+      "step": 138
+    },
+    {
+      "epoch": 0.1112,
+      "grad_norm": 0.8108084201812744,
+      "learning_rate": 4.685714285714286e-05,
+      "loss": 0.289,
+      "step": 139
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 0.9260637164115906,
+      "learning_rate": 4.5714285714285716e-05,
+      "loss": 0.4126,
+      "step": 140
+    },
+    {
+      "epoch": 0.1128,
+      "grad_norm": 0.9265069365501404,
+      "learning_rate": 4.4571428571428574e-05,
+      "loss": 0.3178,
+      "step": 141
+    },
+    {
+      "epoch": 0.1136,
+      "grad_norm": 1.2468631267547607,
+      "learning_rate": 4.342857142857143e-05,
+      "loss": 0.3889,
+      "step": 142
+    },
+    {
+      "epoch": 0.1144,
+      "grad_norm": 0.8827513456344604,
+      "learning_rate": 4.228571428571429e-05,
+      "loss": 0.3165,
+      "step": 143
+    },
+    {
+      "epoch": 0.1152,
+      "grad_norm": 1.069765329360962,
+      "learning_rate": 4.1142857142857146e-05,
+      "loss": 0.4032,
+      "step": 144
+    },
+    {
+      "epoch": 0.116,
+      "grad_norm": 0.9694308042526245,
+      "learning_rate": 4e-05,
+      "loss": 0.3199,
+      "step": 145
+    },
+    {
+      "epoch": 0.1168,
+      "grad_norm": 0.9222254753112793,
+      "learning_rate": 3.885714285714286e-05,
+      "loss": 0.3042,
+      "step": 146
+    },
+    {
+      "epoch": 0.1176,
+      "grad_norm": 0.9769037365913391,
+      "learning_rate": 3.771428571428572e-05,
+      "loss": 0.3338,
+      "step": 147
+    },
+    {
+      "epoch": 0.1184,
+      "grad_norm": 0.9615337252616882,
+      "learning_rate": 3.6571428571428576e-05,
+      "loss": 0.3822,
+      "step": 148
+    },
+    {
+      "epoch": 0.1192,
+      "grad_norm": 0.8975436091423035,
+      "learning_rate": 3.5428571428571426e-05,
+      "loss": 0.3066,
+      "step": 149
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0323967933654785,
+      "learning_rate": 3.428571428571429e-05,
+      "loss": 0.3446,
+      "step": 150
+    },
+    {
+      "epoch": 0.1208,
+      "grad_norm": 0.7756757736206055,
+      "learning_rate": 3.314285714285714e-05,
+      "loss": 0.2556,
+      "step": 151
+    },
+    {
+      "epoch": 0.1216,
+      "grad_norm": 1.119676947593689,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 0.3492,
+      "step": 152
+    },
+    {
+      "epoch": 0.1224,
+      "grad_norm": 1.0270119905471802,
+      "learning_rate": 3.0857142857142856e-05,
+      "loss": 0.3593,
+      "step": 153
+    },
+    {
+      "epoch": 0.1232,
+      "grad_norm": 0.7868067026138306,
+      "learning_rate": 2.9714285714285717e-05,
+      "loss": 0.2695,
+      "step": 154
+    },
+    {
+      "epoch": 0.124,
+      "grad_norm": 1.1581284999847412,
+      "learning_rate": 2.857142857142857e-05,
+      "loss": 0.3769,
+      "step": 155
+    },
+    {
+      "epoch": 0.1248,
+      "grad_norm": 0.9818435311317444,
+      "learning_rate": 2.742857142857143e-05,
+      "loss": 0.3319,
+      "step": 156
+    },
+    {
+      "epoch": 0.1256,
+      "grad_norm": 1.6817810535430908,
+      "learning_rate": 2.6285714285714286e-05,
+      "loss": 0.4218,
+      "step": 157
+    },
+    {
+      "epoch": 0.1264,
+      "grad_norm": 0.9001001119613647,
+      "learning_rate": 2.5142857142857147e-05,
+      "loss": 0.2972,
+      "step": 158
+    },
+    {
+      "epoch": 0.1272,
+      "grad_norm": 1.030470848083496,
+      "learning_rate": 2.4e-05,
+      "loss": 0.3308,
+      "step": 159
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 0.9055750370025635,
+      "learning_rate": 2.2857142857142858e-05,
+      "loss": 0.3625,
+      "step": 160
+    },
+    {
+      "epoch": 0.1288,
+      "grad_norm": 0.8893150091171265,
+      "learning_rate": 2.1714285714285715e-05,
+      "loss": 0.3002,
+      "step": 161
+    },
+    {
+      "epoch": 0.1296,
+      "grad_norm": 0.8902642130851746,
+      "learning_rate": 2.0571428571428573e-05,
+      "loss": 0.2854,
+      "step": 162
+    },
+    {
+      "epoch": 0.1304,
+      "grad_norm": 0.9507644772529602,
+      "learning_rate": 1.942857142857143e-05,
+      "loss": 0.3117,
+      "step": 163
+    },
+    {
+      "epoch": 0.1312,
+      "grad_norm": 0.9474930763244629,
+      "learning_rate": 1.8285714285714288e-05,
+      "loss": 0.3149,
+      "step": 164
+    },
+    {
+      "epoch": 0.132,
+      "grad_norm": 1.1699720621109009,
+      "learning_rate": 1.7142857142857145e-05,
+      "loss": 0.3471,
+      "step": 165
+    },
+    {
+      "epoch": 0.1328,
+      "grad_norm": 0.9667317271232605,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 0.3164,
+      "step": 166
+    },
+    {
+      "epoch": 0.1336,
+      "grad_norm": 1.0271855592727661,
+      "learning_rate": 1.4857142857142858e-05,
+      "loss": 0.327,
+      "step": 167
+    },
+    {
+      "epoch": 0.1344,
+      "grad_norm": 1.0459516048431396,
+      "learning_rate": 1.3714285714285716e-05,
+      "loss": 0.3333,
+      "step": 168
+    },
+    {
+      "epoch": 0.1352,
+      "grad_norm": 0.9808986783027649,
+      "learning_rate": 1.2571428571428573e-05,
+      "loss": 0.2996,
+      "step": 169
+    },
+    {
+      "epoch": 0.136,
+      "grad_norm": 0.9472062587738037,
+      "learning_rate": 1.1428571428571429e-05,
+      "loss": 0.289,
+      "step": 170
+    },
+    {
+      "epoch": 0.1368,
+      "grad_norm": 1.0804734230041504,
+      "learning_rate": 1.0285714285714286e-05,
+      "loss": 0.3353,
+      "step": 171
+    },
+    {
+      "epoch": 0.1376,
+      "grad_norm": 0.9845679402351379,
+      "learning_rate": 9.142857142857144e-06,
+      "loss": 0.352,
+      "step": 172
+    },
+    {
+      "epoch": 0.1384,
+      "grad_norm": 1.3011351823806763,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 0.4186,
+      "step": 173
+    },
+    {
+      "epoch": 0.1392,
+      "grad_norm": 1.0187007188796997,
+      "learning_rate": 6.857142857142858e-06,
+      "loss": 0.3409,
+      "step": 174
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.0254411697387695,
+      "learning_rate": 5.7142857142857145e-06,
+      "loss": 0.3577,
+      "step": 175
+    },
+    {
+      "epoch": 0.1408,
+      "grad_norm": 1.2106192111968994,
+      "learning_rate": 4.571428571428572e-06,
+      "loss": 0.3982,
+      "step": 176
+    },
+    {
+      "epoch": 0.1416,
+      "grad_norm": 1.1773263216018677,
+      "learning_rate": 3.428571428571429e-06,
+      "loss": 0.4192,
+      "step": 177
+    },
+    {
+      "epoch": 0.1424,
+      "grad_norm": 1.0779730081558228,
+      "learning_rate": 2.285714285714286e-06,
+      "loss": 0.371,
+      "step": 178
+    },
+    {
+      "epoch": 0.1432,
+      "grad_norm": 1.0007537603378296,
+      "learning_rate": 1.142857142857143e-06,
+      "loss": 0.402,
+      "step": 179
+    },
+    {
+      "epoch": 0.144,
+      "grad_norm": 0.9963799715042114,
+      "learning_rate": 0.0,
+      "loss": 0.4271,
+      "step": 180
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 180,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.5709329524916224e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-180/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b3ce913d8bc05c56e47a40b880b5b7e58b488777322d590faaba63f85b18db8
+size 5624

special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<s>",
+  "eos_token": "<|im_end|>",
+  "pad_token": "[control_768]",
+  "unk_token": "<unk>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff