hchang committed on Jul 19, 2024

Commit e937342 (verified)

1 Parent(s): 705f89f

Upload folder using huggingface_hub


This view is limited to 50 files because it contains too many changes.

Files changed (50)

checkpoint-1000/README.md +202 -0
checkpoint-1000/adapter_config.json +31 -0
checkpoint-1000/adapter_model.safetensors +3 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/trainer_state.json +751 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-1500/README.md +202 -0
checkpoint-1500/adapter_config.json +31 -0
checkpoint-1500/adapter_model.safetensors +3 -0
checkpoint-1500/optimizer.pt +3 -0
checkpoint-1500/rng_state.pth +3 -0
checkpoint-1500/scheduler.pt +3 -0
checkpoint-1500/trainer_state.json +1110 -0
checkpoint-1500/training_args.bin +3 -0
checkpoint-2000/README.md +202 -0
checkpoint-2000/adapter_config.json +31 -0
checkpoint-2000/adapter_model.safetensors +3 -0
checkpoint-2000/optimizer.pt +3 -0
checkpoint-2000/rng_state.pth +3 -0
checkpoint-2000/scheduler.pt +3 -0
checkpoint-2000/trainer_state.json +1469 -0
checkpoint-2000/training_args.bin +3 -0
checkpoint-2500/README.md +202 -0
checkpoint-2500/adapter_config.json +31 -0
checkpoint-2500/adapter_model.safetensors +3 -0
checkpoint-2500/optimizer.pt +3 -0
checkpoint-2500/rng_state.pth +3 -0
checkpoint-2500/scheduler.pt +3 -0
checkpoint-2500/trainer_state.json +1828 -0
checkpoint-2500/training_args.bin +3 -0
checkpoint-3000/README.md +202 -0
checkpoint-3000/adapter_config.json +31 -0
checkpoint-3000/adapter_model.safetensors +3 -0
checkpoint-3000/optimizer.pt +3 -0
checkpoint-3000/rng_state.pth +3 -0
checkpoint-3000/scheduler.pt +3 -0
checkpoint-3000/trainer_state.json +2187 -0
checkpoint-3000/training_args.bin +3 -0
checkpoint-3004/README.md +202 -0
checkpoint-3004/adapter_config.json +31 -0
checkpoint-3004/adapter_model.safetensors +3 -0
checkpoint-3004/optimizer.pt +3 -0
checkpoint-3004/rng_state.pth +3 -0
checkpoint-3004/scheduler.pt +3 -0
checkpoint-3004/trainer_state.json +2187 -0
checkpoint-3004/training_args.bin +3 -0
checkpoint-500/README.md +202 -0
checkpoint-500/adapter_config.json +31 -0

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
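
As a reading aid for the adapter configuration above, the following minimal sketch computes the LoRA scaling factor and the number of trainable LoRA parameters the settings imply. It assumes the standard GPT-2 small shapes (12 transformer blocks, hidden size 768, fused `c_attn` projection of width 2304), which this commit does not state; the helper names `lora_scaling` and `lora_param_count` are hypothetical and not part of PEFT.

```python
import json

# Subset of the adapter_config.json fields shown in this diff.
ADAPTER_CONFIG = json.loads("""
{
  "base_model_name_or_path": "gpt2",
  "lora_alpha": 32,
  "r": 8,
  "target_modules": ["c_attn"],
  "use_rslora": false
}
""")

# Assumed GPT-2 small shapes (not part of the commit).
N_LAYERS = 12
C_ATTN_IN = 768
C_ATTN_OUT = 3 * 768  # fused query/key/value projection


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA scaling lora_alpha / r; rsLoRA divides by sqrt(r) instead."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg: dict) -> int:
    """Count parameters in the A (r x in) and B (out x r) matrices over all layers.

    Only valid here because the single target module is c_attn.
    """
    per_module = cfg["r"] * (C_ATTN_IN + C_ATTN_OUT)
    return per_module * N_LAYERS * len(cfg["target_modules"])


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    print("LoRA parameters:", lora_param_count(ADAPTER_CONFIG))
```

The count covers only the low-rank matrices. The `modules_to_save` entries (`classifier`, `score`) are stored in full next to the adapter and are not included.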

checkpoint-1000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52172c7f5fd13b1a50899d9094ff00835b7ecf6da63f474601e1f00608c0a758
+size 594496
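
The three lines above are a Git LFS pointer, not the weights themselves: the binary file is stored separately and identified by its SHA-256 digest and byte size. The sketch below shows one way to parse such a pointer and check downloaded bytes against it. The function names are hypothetical, and the code handles only the three fields that appear in this diff.

```python
import hashlib

# Pointer text for checkpoint-1000/adapter_model.safetensors, copied from the diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:52172c7f5fd13b1a50899d9094ff00835b7ecf6da63f474601e1f00608c0a758
size 594496
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then split the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer names."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
```

For a large file, the digest would normally be computed in chunks rather than from a single in-memory `bytes` object.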

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d8cceb896033a02c9a5a8a91109d190488a1f91c73af6d0eb1b5ace57e2901f5
+size 1197932

checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:daf9b148b2f9ddd9c9899ce11c115c7924fbccd3175b16679d8433c371025abf
+size 14180

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:da2994e763da51e82fbd5459bf85bd6bbcd46c378f73a7e024449ec7861beb2c
+size 1064

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,751 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.33288948069241014,
+  "eval_steps": 500,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
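
The `log_history` above mixes training entries (logged every 10 steps) with evaluation entries (every 500 steps). The sketch below separates the two and checks the logged learning rates against a linear decay to zero over `max_steps`. The initial rate of 2e-5 with no warmup is an inference from the logged values, not something this commit states, and the helper names are hypothetical. Only a few entries from the diff are reproduced here.

```python
import math

# Excerpt of the trainer_state.json values shown in this diff.
STATE = {
    "max_steps": 3004,
    "log_history": [
        {"step": 10, "learning_rate": 1.9933422103861518e-05, "loss": 1.4084},
        {"step": 500, "learning_rate": 1.66711051930759e-05, "loss": 1.0134},
        {"step": 500, "eval_accuracy": 0.5015762402521985,
         "eval_loss": 0.9942083358764648},
        {"step": 1000, "learning_rate": 1.3342210386151799e-05, "loss": 0.9608},
        {"step": 1000, "eval_accuracy": 0.507300481168077,
         "eval_loss": 0.9467151165008545},
    ],
}

BASE_LR = 2e-5  # assumed initial learning rate; inferred, not in the commit


def split_logs(state: dict) -> tuple[list, list]:
    """Separate training entries (key 'loss') from evaluation entries (key 'eval_loss')."""
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    return train, evals


def linear_lr(step: int, max_steps: int, base_lr: float = BASE_LR) -> float:
    """Linear decay from base_lr at step 0 to zero at max_steps, without warmup."""
    return base_lr * (max_steps - step) / max_steps


if __name__ == "__main__":
    train, evals = split_logs(STATE)
    for e in evals:
        print(e["step"], e["eval_accuracy"], e["eval_loss"])
    for e in train:
        expected = linear_lr(e["step"], STATE["max_steps"])
        print(e["step"], math.isclose(expected, e["learning_rate"], rel_tol=1e-9))
```

To run this against a real checkpoint, `STATE` would be replaced by the parsed contents of the checkpoint's `trainer_state.json`.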

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176

checkpoint-1500/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-1500/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
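For a rough sense of what this adapter config trains, the sketch below counts the LoRA parameters implied by `r: 8` on `c_attn`. It assumes the base `gpt2` checkpoint is the 12-layer, 768-hidden variant whose `c_attn` projects 768 features to 2304; those dimensions are not stated in this diff, and the helper name `lora_param_count` is hypothetical. The `modules_to_save` heads (`classifier`, `score`) are left out because the number of labels is not given here.

```python
# Minimal sketch: count LoRA parameters for the config above.
# ASSUMPTION: base gpt2 has 12 transformer blocks and c_attn maps 768 -> 2304.

def lora_param_count(in_features: int, out_features: int, r: int, num_layers: int) -> int:
    """Each adapted layer adds two low-rank factors: A (r x in) and B (out x r)."""
    per_layer = r * (in_features + out_features)
    return per_layer * num_layers

R = 8            # "r" in adapter_config.json
LORA_ALPHA = 32  # "lora_alpha" in adapter_config.json

total = lora_param_count(in_features=768, out_features=2304, r=R, num_layers=12)

# With "use_rslora": false, the low-rank update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R

print(f"LoRA parameters on c_attn: {total}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions only the low-rank factors and the saved head are trained, which would be consistent with the small adapter file in this checkpoint compared with the full base model.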

checkpoint-1500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14782f453b90a7b1ee6e1e53d66ba514d60e021ba6854b2c176639bd6c200e19
+size 594496
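The three added lines are a Git LFS pointer, not the weights themselves. As an illustration, a pointer like this can be read with a few lines of Python; the function name `parse_lfs_pointer` is hypothetical and the sketch only handles the three keys that appear in this diff.

```python
# Minimal sketch: parse a Git LFS pointer file into its fields.
# Only the keys seen in this diff (version, oid, size) are handled.

def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:14782f453b90a7b1ee6e1e53d66ba514d60e021ba6854b2c176639bd6c200e19
size 594496
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size_bytes"])
```

The `size` field is the byte size of the real file stored in LFS, so it can be compared against a downloaded copy as a quick integrity check.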

checkpoint-1500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:108d9010c8a89fd433c5fd549edc9f1ed753bea7b7933758df6fac3a4212a6c3
+size 1197932

checkpoint-1500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:04f4736a3a6374ffdbe1f4c4ee6f76e6d55c950d8e64c2679081a60485da0635
+size 14180

checkpoint-1500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72f1e567090d0c588243fb7889cc07f276cf55afef9d37d8721d36d6c52568c0
+size 1064

checkpoint-1500/trainer_state.json ADDED

	@@ -0,0 +1,1110 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.4993342210386152,
+  "eval_steps": 500,
+  "global_step": 1500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3362183754993342,
+      "grad_norm": 20.0,
+      "learning_rate": 1.3275632490013317e-05,
+      "loss": 1.0432,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3395472703062583,
+      "grad_norm": 22.375,
+      "learning_rate": 1.3209054593874834e-05,
+      "loss": 1.0393,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3428761651131824,
+      "grad_norm": 18.375,
+      "learning_rate": 1.3142476697736352e-05,
+      "loss": 1.0398,
+      "step": 1030
+    },
+    {
+      "epoch": 0.34620505992010653,
+      "grad_norm": 17.25,
+      "learning_rate": 1.307589880159787e-05,
+      "loss": 0.9225,
+      "step": 1040
+    },
+    {
+      "epoch": 0.34953395472703064,
+      "grad_norm": 11.75,
+      "learning_rate": 1.3009320905459388e-05,
+      "loss": 0.9262,
+      "step": 1050
+    },
+    {
+      "epoch": 0.35286284953395475,
+      "grad_norm": 10.25,
+      "learning_rate": 1.2942743009320906e-05,
+      "loss": 0.7248,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3561917443408788,
+      "grad_norm": 12.875,
+      "learning_rate": 1.2876165113182424e-05,
+      "loss": 0.8358,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3595206391478029,
+      "grad_norm": 4.65625,
+      "learning_rate": 1.2809587217043942e-05,
+      "loss": 0.8156,
+      "step": 1080
+    },
+    {
+      "epoch": 0.362849533954727,
+      "grad_norm": 23.5,
+      "learning_rate": 1.274300932090546e-05,
+      "loss": 1.0211,
+      "step": 1090
+    },
+    {
+      "epoch": 0.36617842876165113,
+      "grad_norm": 17.5,
+      "learning_rate": 1.2676431424766978e-05,
+      "loss": 0.9804,
+      "step": 1100
+    },
+    {
+      "epoch": 0.36950732356857524,
+      "grad_norm": 6.375,
+      "learning_rate": 1.2609853528628498e-05,
+      "loss": 0.9636,
+      "step": 1110
+    },
+    {
+      "epoch": 0.37283621837549935,
+      "grad_norm": 20.25,
+      "learning_rate": 1.2543275632490016e-05,
+      "loss": 0.9873,
+      "step": 1120
+    },
+    {
+      "epoch": 0.37616511318242346,
+      "grad_norm": 17.0,
+      "learning_rate": 1.2476697736351534e-05,
+      "loss": 0.9674,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3794940079893475,
+      "grad_norm": 20.5,
+      "learning_rate": 1.241011984021305e-05,
+      "loss": 1.0073,
+      "step": 1140
+    },
+    {
+      "epoch": 0.3828229027962716,
+      "grad_norm": 4.125,
+      "learning_rate": 1.2343541944074568e-05,
+      "loss": 1.0919,
+      "step": 1150
+    },
+    {
+      "epoch": 0.38615179760319573,
+      "grad_norm": 15.3125,
+      "learning_rate": 1.2276964047936086e-05,
+      "loss": 1.0103,
+      "step": 1160
+    },
+    {
+      "epoch": 0.38948069241011984,
+      "grad_norm": 22.125,
+      "learning_rate": 1.2210386151797604e-05,
+      "loss": 0.9629,
+      "step": 1170
+    },
+    {
+      "epoch": 0.39280958721704395,
+      "grad_norm": 2.953125,
+      "learning_rate": 1.2143808255659122e-05,
+      "loss": 0.9545,
+      "step": 1180
+    },
+    {
+      "epoch": 0.39613848202396806,
+      "grad_norm": 7.59375,
+      "learning_rate": 1.207723035952064e-05,
+      "loss": 0.8836,
+      "step": 1190
+    },
+    {
+      "epoch": 0.3994673768308921,
+      "grad_norm": 28.125,
+      "learning_rate": 1.2010652463382158e-05,
+      "loss": 0.9632,
+      "step": 1200
+    },
+    {
+      "epoch": 0.40279627163781623,
+      "grad_norm": 15.125,
+      "learning_rate": 1.1944074567243676e-05,
+      "loss": 1.0765,
+      "step": 1210
+    },
+    {
+      "epoch": 0.40612516644474034,
+      "grad_norm": 11.625,
+      "learning_rate": 1.1877496671105194e-05,
+      "loss": 0.7911,
+      "step": 1220
+    },
+    {
+      "epoch": 0.40945406125166445,
+      "grad_norm": 15.5625,
+      "learning_rate": 1.1810918774966711e-05,
+      "loss": 1.0908,
+      "step": 1230
+    },
+    {
+      "epoch": 0.41278295605858856,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.1744340878828231e-05,
+      "loss": 1.0418,
+      "step": 1240
+    },
+    {
+      "epoch": 0.41611185086551267,
+      "grad_norm": 2.671875,
+      "learning_rate": 1.1677762982689749e-05,
+      "loss": 0.9041,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4194407456724368,
+      "grad_norm": 19.125,
+      "learning_rate": 1.1611185086551267e-05,
+      "loss": 1.0392,
+      "step": 1260
+    },
+    {
+      "epoch": 0.42276964047936083,
+      "grad_norm": 6.8125,
+      "learning_rate": 1.1544607190412785e-05,
+      "loss": 0.8439,
+      "step": 1270
+    },
+    {
+      "epoch": 0.42609853528628494,
+      "grad_norm": 7.75,
+      "learning_rate": 1.1478029294274303e-05,
+      "loss": 1.0188,
+      "step": 1280
+    },
+    {
+      "epoch": 0.42942743009320905,
+      "grad_norm": 17.25,
+      "learning_rate": 1.141145139813582e-05,
+      "loss": 1.1353,
+      "step": 1290
+    },
+    {
+      "epoch": 0.43275632490013316,
+      "grad_norm": 22.125,
+      "learning_rate": 1.1344873501997337e-05,
+      "loss": 0.987,
+      "step": 1300
+    },
+    {
+      "epoch": 0.43608521970705727,
+      "grad_norm": 16.0,
+      "learning_rate": 1.1278295605858855e-05,
+      "loss": 0.8446,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4394141145139814,
+      "grad_norm": 48.25,
+      "learning_rate": 1.1211717709720373e-05,
+      "loss": 1.0588,
+      "step": 1320
+    },
+    {
+      "epoch": 0.44274300932090543,
+      "grad_norm": 17.625,
+      "learning_rate": 1.1145139813581891e-05,
+      "loss": 0.8579,
+      "step": 1330
+    },
+    {
+      "epoch": 0.44607190412782954,
+      "grad_norm": 20.25,
+      "learning_rate": 1.1078561917443409e-05,
+      "loss": 1.0973,
+      "step": 1340
+    },
+    {
+      "epoch": 0.44940079893475365,
+      "grad_norm": 14.25,
+      "learning_rate": 1.1011984021304927e-05,
+      "loss": 0.975,
+      "step": 1350
+    },
+    {
+      "epoch": 0.45272969374167776,
+      "grad_norm": 22.875,
+      "learning_rate": 1.0945406125166447e-05,
+      "loss": 0.9035,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4560585885486019,
+      "grad_norm": 13.0,
+      "learning_rate": 1.0878828229027965e-05,
+      "loss": 1.0748,
+      "step": 1370
+    },
+    {
+      "epoch": 0.459387483355526,
+      "grad_norm": 13.125,
+      "learning_rate": 1.0812250332889482e-05,
+      "loss": 1.0373,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4627163781624501,
+      "grad_norm": 15.0,
+      "learning_rate": 1.0745672436751e-05,
+      "loss": 0.9623,
+      "step": 1390
+    },
+    {
+      "epoch": 0.46604527296937415,
+      "grad_norm": 26.25,
+      "learning_rate": 1.0679094540612518e-05,
+      "loss": 0.9579,
+      "step": 1400
+    },
+    {
+      "epoch": 0.46937416777629826,
+      "grad_norm": 12.125,
+      "learning_rate": 1.0612516644474036e-05,
+      "loss": 0.918,
+      "step": 1410
+    },
+    {
+      "epoch": 0.47270306258322237,
+      "grad_norm": 30.75,
+      "learning_rate": 1.0545938748335554e-05,
+      "loss": 0.9823,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4760319573901465,
+      "grad_norm": 7.0,
+      "learning_rate": 1.047936085219707e-05,
+      "loss": 0.9885,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4793608521970706,
+      "grad_norm": 19.5,
+      "learning_rate": 1.0412782956058588e-05,
+      "loss": 0.9654,
+      "step": 1440
+    },
+    {
+      "epoch": 0.4826897470039947,
+      "grad_norm": 13.5,
+      "learning_rate": 1.0346205059920106e-05,
+      "loss": 0.8212,
+      "step": 1450
+    },
+    {
+      "epoch": 0.48601864181091875,
+      "grad_norm": 17.0,
+      "learning_rate": 1.0279627163781624e-05,
+      "loss": 0.9406,
+      "step": 1460
+    },
+    {
+      "epoch": 0.48934753661784286,
+      "grad_norm": 28.0,
+      "learning_rate": 1.0213049267643142e-05,
+      "loss": 0.9908,
+      "step": 1470
+    },
+    {
+      "epoch": 0.49267643142476697,
+      "grad_norm": 14.25,
+      "learning_rate": 1.014647137150466e-05,
+      "loss": 0.9418,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4960053262316911,
+      "grad_norm": 12.25,
+      "learning_rate": 1.007989347536618e-05,
+      "loss": 1.075,
+      "step": 1490
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "grad_norm": 10.625,
+      "learning_rate": 1.0013315579227698e-05,
+      "loss": 1.0111,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "eval_accuracy": 0.508752281400365,
+      "eval_loss": 0.9421009421348572,
+      "eval_runtime": 209.8514,
+      "eval_samples_per_second": 114.881,
+      "eval_steps_per_second": 28.72,
+      "step": 1500
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
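The logged `learning_rate` and `epoch` values in this file are consistent with a linear decay to zero over `max_steps` with no warmup. The sketch below checks that reading against a few logged entries. The initial rate of 2e-5 is an inference from the numbers, not something the file states, and the function names are hypothetical.

```python
import math

# ASSUMPTION: initial learning rate of 2e-5, inferred from the logged values.
BASE_LR = 2e-5
MAX_STEPS = 3004   # "max_steps" in trainer_state.json
NUM_EPOCHS = 1     # "num_train_epochs" in trainer_state.json

def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR at step 0 to zero at MAX_STEPS (no warmup)."""
    return BASE_LR * (1 - step / MAX_STEPS)

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return NUM_EPOCHS * step / MAX_STEPS

# (step, logged learning_rate, logged epoch) copied from log_history above.
logged = [
    (10, 1.9933422103861518e-05, 0.003328894806924101),
    (500, 1.66711051930759e-05, 0.16644474034620507),
    (1500, 1.0013315579227698e-05, 0.4993342210386152),
]

for step, lr, epoch in logged:
    assert math.isclose(linear_lr(step), lr, rel_tol=1e-9)
    assert math.isclose(epoch_at(step), epoch, rel_tol=1e-9)

print("logged values match a linear schedule")
```

If the check holds, this checkpoint at step 1500 sits almost exactly halfway through the single planned epoch, with the learning rate at roughly half its starting value.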

checkpoint-1500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176

checkpoint-2000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-2000/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
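
The configuration above applies rank-8 LoRA to the `c_attn` projection only, with `use_rslora` disabled, so the usual `lora_alpha / r` scaling applies. As a rough sketch, the number of LoRA weights this implies can be estimated as follows. The layer dimensions are an assumption: they are not stated in the config and are taken from the standard GPT-2 small architecture that the `gpt2` checkpoint name normally refers to (hidden size 768, 12 blocks, fused QKV projection of width 3 x 768).

```python
# Values copied from checkpoint-2000/adapter_config.json.
r = 8
lora_alpha = 32
target_modules = ["c_attn"]

# ASSUMPTION: dimensions of GPT-2 small; not present in the adapter config.
hidden_size = 768
num_layers = 12
c_attn_out = 3 * hidden_size  # fused query/key/value projection


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Weights in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


per_layer = lora_param_count(hidden_size, c_attn_out, r)
total_lora_params = per_layer * num_layers * len(target_modules)
scaling = lora_alpha / r  # multiplier applied to the low-rank update

print(f"LoRA weights per block: {per_layer}")
print(f"LoRA weights in total:  {total_lora_params}")
print(f"Update scaling factor:  {scaling}")
```

This count covers only the low-rank matrices. The modules listed under `modules_to_save` (`classifier`, `score`) are stored in full alongside them, and their size depends on the number of labels, which the config does not record.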

checkpoint-2000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f104abf68186cbe4b90542847ba25bf8c54975b5ac630592d27161ac6c03061
+size 594496

checkpoint-2000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c759fa081fda0118016523492c6e21e9b10b36471f42cc39a7c6c8696f374726
+size 1197932

checkpoint-2000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7381e59686686a270df73ebf140af3ee4df79cbe6aaec78d4c68a2b0b8b96259
+size 14180

checkpoint-2000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a49449970e0dfcabc9d3d75da644cb8e65c15961f0aad182c781f194538c5b0
+size 1064
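
The scheduler state itself is an opaque LFS object here, but the `learning_rate` values logged in `checkpoint-2000/trainer_state.json` are consistent with a linear decay to zero without warmup. The sketch below reproduces a few logged values under two assumptions that are inferred rather than read from `scheduler.pt`: a peak rate of 2e-5, and 3004 total steps (matching the `checkpoint-3004` directory and the logged epoch fraction at step 2000).

```python
import math

PEAK_LR = 2e-5      # ASSUMPTION: inferred from the logged values
TOTAL_STEPS = 3004  # ASSUMPTION: inferred from checkpoint-3004 and the epoch field


def linear_decay_lr(step: int, peak_lr: float = PEAK_LR,
                    total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    10: 1.9933422103861518e-05,
    500: 1.66711051930759e-05,
    1500: 1.0013315579227698e-05,
    1700: 8.681757656458057e-06,
}

for step, lr in logged.items():
    predicted = linear_decay_lr(step)
    match = math.isclose(predicted, lr, rel_tol=1e-9)
    print(f"step {step:>4}: logged={lr:.6e} predicted={predicted:.6e} match={match}")

# The epoch field at step 2000 is the same ratio of steps completed.
epoch_at_2000 = 2000 / TOTAL_STEPS
print(f"epoch at step 2000: {epoch_at_2000:.6f}")
```

If the run used warmup or a different schedule, the formula would need adjusting; the agreement shown here only covers the logged steps in this diff.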

checkpoint-2000/trainer_state.json ADDED

	@@ -0,0 +1,1469 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.6657789613848203,
+  "eval_steps": 500,
+  "global_step": 2000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3362183754993342,
+      "grad_norm": 20.0,
+      "learning_rate": 1.3275632490013317e-05,
+      "loss": 1.0432,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3395472703062583,
+      "grad_norm": 22.375,
+      "learning_rate": 1.3209054593874834e-05,
+      "loss": 1.0393,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3428761651131824,
+      "grad_norm": 18.375,
+      "learning_rate": 1.3142476697736352e-05,
+      "loss": 1.0398,
+      "step": 1030
+    },
+    {
+      "epoch": 0.34620505992010653,
+      "grad_norm": 17.25,
+      "learning_rate": 1.307589880159787e-05,
+      "loss": 0.9225,
+      "step": 1040
+    },
+    {
+      "epoch": 0.34953395472703064,
+      "grad_norm": 11.75,
+      "learning_rate": 1.3009320905459388e-05,
+      "loss": 0.9262,
+      "step": 1050
+    },
+    {
+      "epoch": 0.35286284953395475,
+      "grad_norm": 10.25,
+      "learning_rate": 1.2942743009320906e-05,
+      "loss": 0.7248,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3561917443408788,
+      "grad_norm": 12.875,
+      "learning_rate": 1.2876165113182424e-05,
+      "loss": 0.8358,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3595206391478029,
+      "grad_norm": 4.65625,
+      "learning_rate": 1.2809587217043942e-05,
+      "loss": 0.8156,
+      "step": 1080
+    },
+    {
+      "epoch": 0.362849533954727,
+      "grad_norm": 23.5,
+      "learning_rate": 1.274300932090546e-05,
+      "loss": 1.0211,
+      "step": 1090
+    },
+    {
+      "epoch": 0.36617842876165113,
+      "grad_norm": 17.5,
+      "learning_rate": 1.2676431424766978e-05,
+      "loss": 0.9804,
+      "step": 1100
+    },
+    {
+      "epoch": 0.36950732356857524,
+      "grad_norm": 6.375,
+      "learning_rate": 1.2609853528628498e-05,
+      "loss": 0.9636,
+      "step": 1110
+    },
+    {
+      "epoch": 0.37283621837549935,
+      "grad_norm": 20.25,
+      "learning_rate": 1.2543275632490016e-05,
+      "loss": 0.9873,
+      "step": 1120
+    },
+    {
+      "epoch": 0.37616511318242346,
+      "grad_norm": 17.0,
+      "learning_rate": 1.2476697736351534e-05,
+      "loss": 0.9674,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3794940079893475,
+      "grad_norm": 20.5,
+      "learning_rate": 1.241011984021305e-05,
+      "loss": 1.0073,
+      "step": 1140
+    },
+    {
+      "epoch": 0.3828229027962716,
+      "grad_norm": 4.125,
+      "learning_rate": 1.2343541944074568e-05,
+      "loss": 1.0919,
+      "step": 1150
+    },
+    {
+      "epoch": 0.38615179760319573,
+      "grad_norm": 15.3125,
+      "learning_rate": 1.2276964047936086e-05,
+      "loss": 1.0103,
+      "step": 1160
+    },
+    {
+      "epoch": 0.38948069241011984,
+      "grad_norm": 22.125,
+      "learning_rate": 1.2210386151797604e-05,
+      "loss": 0.9629,
+      "step": 1170
+    },
+    {
+      "epoch": 0.39280958721704395,
+      "grad_norm": 2.953125,
+      "learning_rate": 1.2143808255659122e-05,
+      "loss": 0.9545,
+      "step": 1180
+    },
+    {
+      "epoch": 0.39613848202396806,
+      "grad_norm": 7.59375,
+      "learning_rate": 1.207723035952064e-05,
+      "loss": 0.8836,
+      "step": 1190
+    },
+    {
+      "epoch": 0.3994673768308921,
+      "grad_norm": 28.125,
+      "learning_rate": 1.2010652463382158e-05,
+      "loss": 0.9632,
+      "step": 1200
+    },
+    {
+      "epoch": 0.40279627163781623,
+      "grad_norm": 15.125,
+      "learning_rate": 1.1944074567243676e-05,
+      "loss": 1.0765,
+      "step": 1210
+    },
+    {
+      "epoch": 0.40612516644474034,
+      "grad_norm": 11.625,
+      "learning_rate": 1.1877496671105194e-05,
+      "loss": 0.7911,
+      "step": 1220
+    },
+    {
+      "epoch": 0.40945406125166445,
+      "grad_norm": 15.5625,
+      "learning_rate": 1.1810918774966711e-05,
+      "loss": 1.0908,
+      "step": 1230
+    },
+    {
+      "epoch": 0.41278295605858856,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.1744340878828231e-05,
+      "loss": 1.0418,
+      "step": 1240
+    },
+    {
+      "epoch": 0.41611185086551267,
+      "grad_norm": 2.671875,
+      "learning_rate": 1.1677762982689749e-05,
+      "loss": 0.9041,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4194407456724368,
+      "grad_norm": 19.125,
+      "learning_rate": 1.1611185086551267e-05,
+      "loss": 1.0392,
+      "step": 1260
+    },
+    {
+      "epoch": 0.42276964047936083,
+      "grad_norm": 6.8125,
+      "learning_rate": 1.1544607190412785e-05,
+      "loss": 0.8439,
+      "step": 1270
+    },
+    {
+      "epoch": 0.42609853528628494,
+      "grad_norm": 7.75,
+      "learning_rate": 1.1478029294274303e-05,
+      "loss": 1.0188,
+      "step": 1280
+    },
+    {
+      "epoch": 0.42942743009320905,
+      "grad_norm": 17.25,
+      "learning_rate": 1.141145139813582e-05,
+      "loss": 1.1353,
+      "step": 1290
+    },
+    {
+      "epoch": 0.43275632490013316,
+      "grad_norm": 22.125,
+      "learning_rate": 1.1344873501997337e-05,
+      "loss": 0.987,
+      "step": 1300
+    },
+    {
+      "epoch": 0.43608521970705727,
+      "grad_norm": 16.0,
+      "learning_rate": 1.1278295605858855e-05,
+      "loss": 0.8446,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4394141145139814,
+      "grad_norm": 48.25,
+      "learning_rate": 1.1211717709720373e-05,
+      "loss": 1.0588,
+      "step": 1320
+    },
+    {
+      "epoch": 0.44274300932090543,
+      "grad_norm": 17.625,
+      "learning_rate": 1.1145139813581891e-05,
+      "loss": 0.8579,
+      "step": 1330
+    },
+    {
+      "epoch": 0.44607190412782954,
+      "grad_norm": 20.25,
+      "learning_rate": 1.1078561917443409e-05,
+      "loss": 1.0973,
+      "step": 1340
+    },
+    {
+      "epoch": 0.44940079893475365,
+      "grad_norm": 14.25,
+      "learning_rate": 1.1011984021304927e-05,
+      "loss": 0.975,
+      "step": 1350
+    },
+    {
+      "epoch": 0.45272969374167776,
+      "grad_norm": 22.875,
+      "learning_rate": 1.0945406125166447e-05,
+      "loss": 0.9035,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4560585885486019,
+      "grad_norm": 13.0,
+      "learning_rate": 1.0878828229027965e-05,
+      "loss": 1.0748,
+      "step": 1370
+    },
+    {
+      "epoch": 0.459387483355526,
+      "grad_norm": 13.125,
+      "learning_rate": 1.0812250332889482e-05,
+      "loss": 1.0373,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4627163781624501,
+      "grad_norm": 15.0,
+      "learning_rate": 1.0745672436751e-05,
+      "loss": 0.9623,
+      "step": 1390
+    },
+    {
+      "epoch": 0.46604527296937415,
+      "grad_norm": 26.25,
+      "learning_rate": 1.0679094540612518e-05,
+      "loss": 0.9579,
+      "step": 1400
+    },
+    {
+      "epoch": 0.46937416777629826,
+      "grad_norm": 12.125,
+      "learning_rate": 1.0612516644474036e-05,
+      "loss": 0.918,
+      "step": 1410
+    },
+    {
+      "epoch": 0.47270306258322237,
+      "grad_norm": 30.75,
+      "learning_rate": 1.0545938748335554e-05,
+      "loss": 0.9823,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4760319573901465,
+      "grad_norm": 7.0,
+      "learning_rate": 1.047936085219707e-05,
+      "loss": 0.9885,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4793608521970706,
+      "grad_norm": 19.5,
+      "learning_rate": 1.0412782956058588e-05,
+      "loss": 0.9654,
+      "step": 1440
+    },
+    {
+      "epoch": 0.4826897470039947,
+      "grad_norm": 13.5,
+      "learning_rate": 1.0346205059920106e-05,
+      "loss": 0.8212,
+      "step": 1450
+    },
+    {
+      "epoch": 0.48601864181091875,
+      "grad_norm": 17.0,
+      "learning_rate": 1.0279627163781624e-05,
+      "loss": 0.9406,
+      "step": 1460
+    },
+    {
+      "epoch": 0.48934753661784286,
+      "grad_norm": 28.0,
+      "learning_rate": 1.0213049267643142e-05,
+      "loss": 0.9908,
+      "step": 1470
+    },
+    {
+      "epoch": 0.49267643142476697,
+      "grad_norm": 14.25,
+      "learning_rate": 1.014647137150466e-05,
+      "loss": 0.9418,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4960053262316911,
+      "grad_norm": 12.25,
+      "learning_rate": 1.007989347536618e-05,
+      "loss": 1.075,
+      "step": 1490
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "grad_norm": 10.625,
+      "learning_rate": 1.0013315579227698e-05,
+      "loss": 1.0111,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "eval_accuracy": 0.508752281400365,
+      "eval_loss": 0.9421009421348572,
+      "eval_runtime": 209.8514,
+      "eval_samples_per_second": 114.881,
+      "eval_steps_per_second": 28.72,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5026631158455392,
+      "grad_norm": 31.5,
+      "learning_rate": 9.946737683089214e-06,
+      "loss": 0.9077,
+      "step": 1510
+    },
+    {
+      "epoch": 0.5059920106524634,
+      "grad_norm": 19.375,
+      "learning_rate": 9.880159786950732e-06,
+      "loss": 1.1292,
+      "step": 1520
+    },
+    {
+      "epoch": 0.5093209054593875,
+      "grad_norm": 8.4375,
+      "learning_rate": 9.813581890812252e-06,
+      "loss": 1.0658,
+      "step": 1530
+    },
+    {
+      "epoch": 0.5126498002663116,
+      "grad_norm": 17.375,
+      "learning_rate": 9.74700399467377e-06,
+      "loss": 0.9691,
+      "step": 1540
+    },
+    {
+      "epoch": 0.5159786950732357,
+      "grad_norm": 8.125,
+      "learning_rate": 9.680426098535288e-06,
+      "loss": 1.0146,
+      "step": 1550
+    },
+    {
+      "epoch": 0.5193075898801598,
+      "grad_norm": 17.5,
+      "learning_rate": 9.613848202396806e-06,
+      "loss": 1.0196,
+      "step": 1560
+    },
+    {
+      "epoch": 0.5226364846870839,
+      "grad_norm": 15.1875,
+      "learning_rate": 9.547270306258324e-06,
+      "loss": 0.8996,
+      "step": 1570
+    },
+    {
+      "epoch": 0.525965379494008,
+      "grad_norm": 55.5,
+      "learning_rate": 9.48069241011984e-06,
+      "loss": 1.0211,
+      "step": 1580
+    },
+    {
+      "epoch": 0.5292942743009321,
+      "grad_norm": 4.21875,
+      "learning_rate": 9.41411451398136e-06,
+      "loss": 0.9728,
+      "step": 1590
+    },
+    {
+      "epoch": 0.5326231691078562,
+      "grad_norm": 6.71875,
+      "learning_rate": 9.347536617842877e-06,
+      "loss": 1.0168,
+      "step": 1600
+    },
+    {
+      "epoch": 0.5359520639147803,
+      "grad_norm": 24.75,
+      "learning_rate": 9.280958721704395e-06,
+      "loss": 0.9252,
+      "step": 1610
+    },
+    {
+      "epoch": 0.5392809587217043,
+      "grad_norm": 9.0625,
+      "learning_rate": 9.214380825565913e-06,
+      "loss": 1.0479,
+      "step": 1620
+    },
+    {
+      "epoch": 0.5426098535286284,
+      "grad_norm": 15.375,
+      "learning_rate": 9.147802929427431e-06,
+      "loss": 1.0699,
+      "step": 1630
+    },
+    {
+      "epoch": 0.5459387483355526,
+      "grad_norm": 17.5,
+      "learning_rate": 9.08122503328895e-06,
+      "loss": 0.9407,
+      "step": 1640
+    },
+    {
+      "epoch": 0.5492676431424767,
+      "grad_norm": 9.625,
+      "learning_rate": 9.014647137150465e-06,
+      "loss": 1.0124,
+      "step": 1650
+    },
+    {
+      "epoch": 0.5525965379494008,
+      "grad_norm": 10.375,
+      "learning_rate": 8.948069241011985e-06,
+      "loss": 0.8612,
+      "step": 1660
+    },
+    {
+      "epoch": 0.5559254327563249,
+      "grad_norm": 21.375,
+      "learning_rate": 8.881491344873503e-06,
+      "loss": 0.9613,
+      "step": 1670
+    },
+    {
+      "epoch": 0.559254327563249,
+      "grad_norm": 9.3125,
+      "learning_rate": 8.814913448735021e-06,
+      "loss": 0.9752,
+      "step": 1680
+    },
+    {
+      "epoch": 0.5625832223701731,
+      "grad_norm": 8.375,
+      "learning_rate": 8.748335552596539e-06,
+      "loss": 0.8926,
+      "step": 1690
+    },
+    {
+      "epoch": 0.5659121171770972,
+      "grad_norm": 17.625,
+      "learning_rate": 8.681757656458057e-06,
+      "loss": 1.0054,
+      "step": 1700
+    },
+    {
+      "epoch": 0.5692410119840213,
+      "grad_norm": 20.125,
+      "learning_rate": 8.615179760319575e-06,
+      "loss": 0.9203,
+      "step": 1710
+    },
+    {
+      "epoch": 0.5725699067909454,
+      "grad_norm": 28.75,
+      "learning_rate": 8.548601864181093e-06,
+      "loss": 0.9583,
+      "step": 1720
+    },
+    {
+      "epoch": 0.5758988015978695,
+      "grad_norm": 17.0,
+      "learning_rate": 8.48202396804261e-06,
+      "loss": 0.9582,
+      "step": 1730
+    },
+    {
+      "epoch": 0.5792276964047937,
+      "grad_norm": 13.4375,
+      "learning_rate": 8.415446071904129e-06,
+      "loss": 1.0034,
+      "step": 1740
+    },
+    {
+      "epoch": 0.5825565912117177,
+      "grad_norm": 27.75,
+      "learning_rate": 8.348868175765647e-06,
+      "loss": 1.0647,
+      "step": 1750
+    },
+    {
+      "epoch": 0.5858854860186418,
+      "grad_norm": 33.75,
+      "learning_rate": 8.282290279627165e-06,
+      "loss": 1.0575,
+      "step": 1760
+    },
+    {
+      "epoch": 0.5892143808255659,
+      "grad_norm": 14.375,
+      "learning_rate": 8.215712383488683e-06,
+      "loss": 1.0699,
+      "step": 1770
+    },
+    {
+      "epoch": 0.59254327563249,
+      "grad_norm": 6.09375,
+      "learning_rate": 8.1491344873502e-06,
+      "loss": 0.7706,
+      "step": 1780
+    },
+    {
+      "epoch": 0.5958721704394141,
+      "grad_norm": 8.5625,
+      "learning_rate": 8.082556591211719e-06,
+      "loss": 0.8626,
+      "step": 1790
+    },
+    {
+      "epoch": 0.5992010652463382,
+      "grad_norm": 7.46875,
+      "learning_rate": 8.015978695073236e-06,
+      "loss": 0.9427,
+      "step": 1800
+    },
+    {
+      "epoch": 0.6025299600532623,
+      "grad_norm": 23.375,
+      "learning_rate": 7.949400798934754e-06,
+      "loss": 0.962,
+      "step": 1810
+    },
+    {
+      "epoch": 0.6058588548601864,
+      "grad_norm": 7.65625,
+      "learning_rate": 7.882822902796272e-06,
+      "loss": 1.1216,
+      "step": 1820
+    },
+    {
+      "epoch": 0.6091877496671105,
+      "grad_norm": 10.0625,
+      "learning_rate": 7.81624500665779e-06,
+      "loss": 0.9076,
+      "step": 1830
+    },
+    {
+      "epoch": 0.6125166444740346,
+      "grad_norm": 6.78125,
+      "learning_rate": 7.749667110519308e-06,
+      "loss": 0.8158,
+      "step": 1840
+    },
+    {
+      "epoch": 0.6158455392809588,
+      "grad_norm": 29.5,
+      "learning_rate": 7.683089214380826e-06,
+      "loss": 0.9828,
+      "step": 1850
+    },
+    {
+      "epoch": 0.6191744340878829,
+      "grad_norm": 9.6875,
+      "learning_rate": 7.616511318242344e-06,
+      "loss": 0.9181,
+      "step": 1860
+    },
+    {
+      "epoch": 0.622503328894807,
+      "grad_norm": 7.6875,
+      "learning_rate": 7.549933422103862e-06,
+      "loss": 0.9864,
+      "step": 1870
+    },
+    {
+      "epoch": 0.625832223701731,
+      "grad_norm": 33.0,
+      "learning_rate": 7.48335552596538e-06,
+      "loss": 0.9898,
+      "step": 1880
+    },
+    {
+      "epoch": 0.6291611185086551,
+      "grad_norm": 13.1875,
+      "learning_rate": 7.416777629826898e-06,
+      "loss": 0.948,
+      "step": 1890
+    },
+    {
+      "epoch": 0.6324900133155792,
+      "grad_norm": 6.59375,
+      "learning_rate": 7.350199733688416e-06,
+      "loss": 0.9811,
+      "step": 1900
+    },
+    {
+      "epoch": 0.6358189081225033,
+      "grad_norm": 8.5625,
+      "learning_rate": 7.283621837549935e-06,
+      "loss": 1.0461,
+      "step": 1910
+    },
+    {
+      "epoch": 0.6391478029294274,
+      "grad_norm": 30.25,
+      "learning_rate": 7.217043941411453e-06,
+      "loss": 1.0408,
+      "step": 1920
+    },
+    {
+      "epoch": 0.6424766977363515,
+      "grad_norm": 25.625,
+      "learning_rate": 7.15046604527297e-06,
+      "loss": 1.0327,
+      "step": 1930
+    },
+    {
+      "epoch": 0.6458055925432756,
+      "grad_norm": 27.125,
+      "learning_rate": 7.083888149134488e-06,
+      "loss": 0.9213,
+      "step": 1940
+    },
+    {
+      "epoch": 0.6491344873501997,
+      "grad_norm": 22.125,
+      "learning_rate": 7.017310252996006e-06,
+      "loss": 1.1191,
+      "step": 1950
+    },
+    {
+      "epoch": 0.6524633821571239,
+      "grad_norm": 10.6875,
+      "learning_rate": 6.950732356857524e-06,
+      "loss": 0.9514,
+      "step": 1960
+    },
+    {
+      "epoch": 0.655792276964048,
+      "grad_norm": 16.375,
+      "learning_rate": 6.884154460719042e-06,
+      "loss": 0.9437,
+      "step": 1970
+    },
+    {
+      "epoch": 0.6591211717709721,
+      "grad_norm": 7.5625,
+      "learning_rate": 6.8175765645805605e-06,
+      "loss": 1.0645,
+      "step": 1980
+    },
+    {
+      "epoch": 0.6624500665778962,
+      "grad_norm": 9.125,
+      "learning_rate": 6.7509986684420784e-06,
+      "loss": 0.9085,
+      "step": 1990
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "grad_norm": 9.6875,
+      "learning_rate": 6.6844207723035955e-06,
+      "loss": 0.9784,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "eval_accuracy": 0.5068027210884354,
+      "eval_loss": 0.9414482712745667,
+      "eval_runtime": 213.4336,
+      "eval_samples_per_second": 112.953,
+      "eval_steps_per_second": 28.238,
+      "step": 2000
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
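
The `learning_rate` and `epoch` values logged in this `trainer_state.json` are consistent with a linear decay to zero over `max_steps` = 3004 with no warmup, and with one epoch spanning all 3004 steps. The sketch below checks that reading against two logged entries. The base rate of 2e-5 and the schedule shape are assumptions inferred from the logged numbers; neither is stated in the file, and the helper names are hypothetical.

```python
import math

MAX_STEPS = 3004   # "max_steps" from trainer_state.json
BASE_LR = 2e-5     # assumption: inferred from the logged values, not stated in the file


def linear_lr(step: int) -> float:
    """Learning rate under an assumed linear decay from BASE_LR to 0, no warmup."""
    return BASE_LR * (MAX_STEPS - step) / MAX_STEPS


def epoch_at(step: int) -> float:
    """Fractional epoch at a given step, since num_train_epochs is 1."""
    return step / MAX_STEPS


# (epoch, learning_rate) pairs copied from the log entries at steps 1500 and 2000.
logged = {
    1500: (0.4993342210386152, 1.0013315579227698e-05),
    2000: (0.6657789613848203, 6.6844207723035955e-06),
}

for step, (epoch, lr) in logged.items():
    assert math.isclose(epoch_at(step), epoch, rel_tol=1e-9)
    assert math.isclose(linear_lr(step), lr, rel_tol=1e-6)
```

If both assertions pass, the logged schedule matches the assumed one at those steps; it does not prove which scheduler class was configured.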

checkpoint-2000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176

checkpoint-2500/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-2500/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
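
As a rough illustration of what this `adapter_config.json` implies, the sketch below computes the LoRA scaling factor and the number of low-rank parameters added to `c_attn`. It assumes the dimensions of the public `gpt2` (small) checkpoint, 768 hidden units and 12 layers, which are not stated in this config, and it assumes the standard `lora_alpha / r` scaling that applies when `use_rslora` is false. The function names are hypothetical, and the count excludes the head listed under `modules_to_save`.

```python
# Values copied from adapter_config.json.
R = 8
LORA_ALPHA = 32

# Assumption: dimensions of the public "gpt2" (small) checkpoint.
N_EMBD = 768
N_LAYER = 12


def lora_scaling(alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update when use_rslora is false."""
    return alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r


# c_attn projects the hidden state to concatenated query, key and value vectors.
per_layer = lora_params(N_EMBD, 3 * N_EMBD, R)
total = per_layer * N_LAYER

print(f"scaling = {lora_scaling(LORA_ALPHA, R)}")
print(f"LoRA parameters per layer = {per_layer}, total = {total}")
```

Targeting only `c_attn` keeps the adapter small; the classification head is saved in full alongside it rather than being low-rank adapted.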

checkpoint-2500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28ba2843875dd81dc0f6b90fe92405c4ab1e51d38eb2aaf2c618a4942bcf8fac
+size 594496

checkpoint-2500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e4347e6cd2adac03a5dcc48f5a7cadb3aea80790877a706d6c9af012d30f827
+size 1197932

checkpoint-2500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f14627f211ca36d3af1867ea78696c0819636f756609895c84de083ffa0d303d
+size 14180

checkpoint-2500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:906f066f6010c5454854d9acb3637631b32f9cceedf440679327fdfb31848b15
+size 1064

checkpoint-2500/trainer_state.json ADDED

	@@ -0,0 +1,1828 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.8322237017310253,
+  "eval_steps": 500,
+  "global_step": 2500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3362183754993342,
+      "grad_norm": 20.0,
+      "learning_rate": 1.3275632490013317e-05,
+      "loss": 1.0432,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3395472703062583,
+      "grad_norm": 22.375,
+      "learning_rate": 1.3209054593874834e-05,
+      "loss": 1.0393,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3428761651131824,
+      "grad_norm": 18.375,
+      "learning_rate": 1.3142476697736352e-05,
+      "loss": 1.0398,
+      "step": 1030
+    },
+    {
+      "epoch": 0.34620505992010653,
+      "grad_norm": 17.25,
+      "learning_rate": 1.307589880159787e-05,
+      "loss": 0.9225,
+      "step": 1040
+    },
+    {
+      "epoch": 0.34953395472703064,
+      "grad_norm": 11.75,
+      "learning_rate": 1.3009320905459388e-05,
+      "loss": 0.9262,
+      "step": 1050
+    },
+    {
+      "epoch": 0.35286284953395475,
+      "grad_norm": 10.25,
+      "learning_rate": 1.2942743009320906e-05,
+      "loss": 0.7248,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3561917443408788,
+      "grad_norm": 12.875,
+      "learning_rate": 1.2876165113182424e-05,
+      "loss": 0.8358,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3595206391478029,
+      "grad_norm": 4.65625,
+      "learning_rate": 1.2809587217043942e-05,
+      "loss": 0.8156,
+      "step": 1080
+    },
+    {
+      "epoch": 0.362849533954727,
+      "grad_norm": 23.5,
+      "learning_rate": 1.274300932090546e-05,
+      "loss": 1.0211,
+      "step": 1090
+    },
+    {
+      "epoch": 0.36617842876165113,
+      "grad_norm": 17.5,
+      "learning_rate": 1.2676431424766978e-05,
+      "loss": 0.9804,
+      "step": 1100
+    },
+    {
+      "epoch": 0.36950732356857524,
+      "grad_norm": 6.375,
+      "learning_rate": 1.2609853528628498e-05,
+      "loss": 0.9636,
+      "step": 1110
+    },
+    {
+      "epoch": 0.37283621837549935,
+      "grad_norm": 20.25,
+      "learning_rate": 1.2543275632490016e-05,
+      "loss": 0.9873,
+      "step": 1120
+    },
+    {
+      "epoch": 0.37616511318242346,
+      "grad_norm": 17.0,
+      "learning_rate": 1.2476697736351534e-05,
+      "loss": 0.9674,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3794940079893475,
+      "grad_norm": 20.5,
+      "learning_rate": 1.241011984021305e-05,
+      "loss": 1.0073,
+      "step": 1140
+    },
+    {
+      "epoch": 0.3828229027962716,
+      "grad_norm": 4.125,
+      "learning_rate": 1.2343541944074568e-05,
+      "loss": 1.0919,
+      "step": 1150
+    },
+    {
+      "epoch": 0.38615179760319573,
+      "grad_norm": 15.3125,
+      "learning_rate": 1.2276964047936086e-05,
+      "loss": 1.0103,
+      "step": 1160
+    },
+    {
+      "epoch": 0.38948069241011984,
+      "grad_norm": 22.125,
+      "learning_rate": 1.2210386151797604e-05,
+      "loss": 0.9629,
+      "step": 1170
+    },
+    {
+      "epoch": 0.39280958721704395,
+      "grad_norm": 2.953125,
+      "learning_rate": 1.2143808255659122e-05,
+      "loss": 0.9545,
+      "step": 1180
+    },
+    {
+      "epoch": 0.39613848202396806,
+      "grad_norm": 7.59375,
+      "learning_rate": 1.207723035952064e-05,
+      "loss": 0.8836,
+      "step": 1190
+    },
+    {
+      "epoch": 0.3994673768308921,
+      "grad_norm": 28.125,
+      "learning_rate": 1.2010652463382158e-05,
+      "loss": 0.9632,
+      "step": 1200
+    },
+    {
+      "epoch": 0.40279627163781623,
+      "grad_norm": 15.125,
+      "learning_rate": 1.1944074567243676e-05,
+      "loss": 1.0765,
+      "step": 1210
+    },
+    {
+      "epoch": 0.40612516644474034,
+      "grad_norm": 11.625,
+      "learning_rate": 1.1877496671105194e-05,
+      "loss": 0.7911,
+      "step": 1220
+    },
+    {
+      "epoch": 0.40945406125166445,
+      "grad_norm": 15.5625,
+      "learning_rate": 1.1810918774966711e-05,
+      "loss": 1.0908,
+      "step": 1230
+    },
+    {
+      "epoch": 0.41278295605858856,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.1744340878828231e-05,
+      "loss": 1.0418,
+      "step": 1240
+    },
+    {
+      "epoch": 0.41611185086551267,
+      "grad_norm": 2.671875,
+      "learning_rate": 1.1677762982689749e-05,
+      "loss": 0.9041,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4194407456724368,
+      "grad_norm": 19.125,
+      "learning_rate": 1.1611185086551267e-05,
+      "loss": 1.0392,
+      "step": 1260
+    },
+    {
+      "epoch": 0.42276964047936083,
+      "grad_norm": 6.8125,
+      "learning_rate": 1.1544607190412785e-05,
+      "loss": 0.8439,
+      "step": 1270
+    },
+    {
+      "epoch": 0.42609853528628494,
+      "grad_norm": 7.75,
+      "learning_rate": 1.1478029294274303e-05,
+      "loss": 1.0188,
+      "step": 1280
+    },
+    {
+      "epoch": 0.42942743009320905,
+      "grad_norm": 17.25,
+      "learning_rate": 1.141145139813582e-05,
+      "loss": 1.1353,
+      "step": 1290
+    },
+    {
+      "epoch": 0.43275632490013316,
+      "grad_norm": 22.125,
+      "learning_rate": 1.1344873501997337e-05,
+      "loss": 0.987,
+      "step": 1300
+    },
+    {
+      "epoch": 0.43608521970705727,
+      "grad_norm": 16.0,
+      "learning_rate": 1.1278295605858855e-05,
+      "loss": 0.8446,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4394141145139814,
+      "grad_norm": 48.25,
+      "learning_rate": 1.1211717709720373e-05,
+      "loss": 1.0588,
+      "step": 1320
+    },
+    {
+      "epoch": 0.44274300932090543,
+      "grad_norm": 17.625,
+      "learning_rate": 1.1145139813581891e-05,
+      "loss": 0.8579,
+      "step": 1330
+    },
+    {
+      "epoch": 0.44607190412782954,
+      "grad_norm": 20.25,
+      "learning_rate": 1.1078561917443409e-05,
+      "loss": 1.0973,
+      "step": 1340
+    },
+    {
+      "epoch": 0.44940079893475365,
+      "grad_norm": 14.25,
+      "learning_rate": 1.1011984021304927e-05,
+      "loss": 0.975,
+      "step": 1350
+    },
+    {
+      "epoch": 0.45272969374167776,
+      "grad_norm": 22.875,
+      "learning_rate": 1.0945406125166447e-05,
+      "loss": 0.9035,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4560585885486019,
+      "grad_norm": 13.0,
+      "learning_rate": 1.0878828229027965e-05,
+      "loss": 1.0748,
+      "step": 1370
+    },
+    {
+      "epoch": 0.459387483355526,
+      "grad_norm": 13.125,
+      "learning_rate": 1.0812250332889482e-05,
+      "loss": 1.0373,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4627163781624501,
+      "grad_norm": 15.0,
+      "learning_rate": 1.0745672436751e-05,
+      "loss": 0.9623,
+      "step": 1390
+    },
+    {
+      "epoch": 0.46604527296937415,
+      "grad_norm": 26.25,
+      "learning_rate": 1.0679094540612518e-05,
+      "loss": 0.9579,
+      "step": 1400
+    },
+    {
+      "epoch": 0.46937416777629826,
+      "grad_norm": 12.125,
+      "learning_rate": 1.0612516644474036e-05,
+      "loss": 0.918,
+      "step": 1410
+    },
+    {
+      "epoch": 0.47270306258322237,
+      "grad_norm": 30.75,
+      "learning_rate": 1.0545938748335554e-05,
+      "loss": 0.9823,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4760319573901465,
+      "grad_norm": 7.0,
+      "learning_rate": 1.047936085219707e-05,
+      "loss": 0.9885,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4793608521970706,
+      "grad_norm": 19.5,
+      "learning_rate": 1.0412782956058588e-05,
+      "loss": 0.9654,
+      "step": 1440
+    },
+    {
+      "epoch": 0.4826897470039947,
+      "grad_norm": 13.5,
+      "learning_rate": 1.0346205059920106e-05,
+      "loss": 0.8212,
+      "step": 1450
+    },
+    {
+      "epoch": 0.48601864181091875,
+      "grad_norm": 17.0,
+      "learning_rate": 1.0279627163781624e-05,
+      "loss": 0.9406,
+      "step": 1460
+    },
+    {
+      "epoch": 0.48934753661784286,
+      "grad_norm": 28.0,
+      "learning_rate": 1.0213049267643142e-05,
+      "loss": 0.9908,
+      "step": 1470
+    },
+    {
+      "epoch": 0.49267643142476697,
+      "grad_norm": 14.25,
+      "learning_rate": 1.014647137150466e-05,
+      "loss": 0.9418,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4960053262316911,
+      "grad_norm": 12.25,
+      "learning_rate": 1.007989347536618e-05,
+      "loss": 1.075,
+      "step": 1490
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "grad_norm": 10.625,
+      "learning_rate": 1.0013315579227698e-05,
+      "loss": 1.0111,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "eval_accuracy": 0.508752281400365,
+      "eval_loss": 0.9421009421348572,
+      "eval_runtime": 209.8514,
+      "eval_samples_per_second": 114.881,
+      "eval_steps_per_second": 28.72,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5026631158455392,
+      "grad_norm": 31.5,
+      "learning_rate": 9.946737683089214e-06,
+      "loss": 0.9077,
+      "step": 1510
+    },
+    {
+      "epoch": 0.5059920106524634,
+      "grad_norm": 19.375,
+      "learning_rate": 9.880159786950732e-06,
+      "loss": 1.1292,
+      "step": 1520
+    },
+    {
+      "epoch": 0.5093209054593875,
+      "grad_norm": 8.4375,
+      "learning_rate": 9.813581890812252e-06,
+      "loss": 1.0658,
+      "step": 1530
+    },
+    {
+      "epoch": 0.5126498002663116,
+      "grad_norm": 17.375,
+      "learning_rate": 9.74700399467377e-06,
+      "loss": 0.9691,
+      "step": 1540
+    },
+    {
+      "epoch": 0.5159786950732357,
+      "grad_norm": 8.125,
+      "learning_rate": 9.680426098535288e-06,
+      "loss": 1.0146,
+      "step": 1550
+    },
+    {
+      "epoch": 0.5193075898801598,
+      "grad_norm": 17.5,
+      "learning_rate": 9.613848202396806e-06,
+      "loss": 1.0196,
+      "step": 1560
+    },
+    {
+      "epoch": 0.5226364846870839,
+      "grad_norm": 15.1875,
+      "learning_rate": 9.547270306258324e-06,
+      "loss": 0.8996,
+      "step": 1570
+    },
+    {
+      "epoch": 0.525965379494008,
+      "grad_norm": 55.5,
+      "learning_rate": 9.48069241011984e-06,
+      "loss": 1.0211,
+      "step": 1580
+    },
+    {
+      "epoch": 0.5292942743009321,
+      "grad_norm": 4.21875,
+      "learning_rate": 9.41411451398136e-06,
+      "loss": 0.9728,
+      "step": 1590
+    },
+    {
+      "epoch": 0.5326231691078562,
+      "grad_norm": 6.71875,
+      "learning_rate": 9.347536617842877e-06,
+      "loss": 1.0168,
+      "step": 1600
+    },
+    {
+      "epoch": 0.5359520639147803,
+      "grad_norm": 24.75,
+      "learning_rate": 9.280958721704395e-06,
+      "loss": 0.9252,
+      "step": 1610
+    },
+    {
+      "epoch": 0.5392809587217043,
+      "grad_norm": 9.0625,
+      "learning_rate": 9.214380825565913e-06,
+      "loss": 1.0479,
+      "step": 1620
+    },
+    {
+      "epoch": 0.5426098535286284,
+      "grad_norm": 15.375,
+      "learning_rate": 9.147802929427431e-06,
+      "loss": 1.0699,
+      "step": 1630
+    },
+    {
+      "epoch": 0.5459387483355526,
+      "grad_norm": 17.5,
+      "learning_rate": 9.08122503328895e-06,
+      "loss": 0.9407,
+      "step": 1640
+    },
+    {
+      "epoch": 0.5492676431424767,
+      "grad_norm": 9.625,
+      "learning_rate": 9.014647137150465e-06,
+      "loss": 1.0124,
+      "step": 1650
+    },
+    {
+      "epoch": 0.5525965379494008,
+      "grad_norm": 10.375,
+      "learning_rate": 8.948069241011985e-06,
+      "loss": 0.8612,
+      "step": 1660
+    },
+    {
+      "epoch": 0.5559254327563249,
+      "grad_norm": 21.375,
+      "learning_rate": 8.881491344873503e-06,
+      "loss": 0.9613,
+      "step": 1670
+    },
+    {
+      "epoch": 0.559254327563249,
+      "grad_norm": 9.3125,
+      "learning_rate": 8.814913448735021e-06,
+      "loss": 0.9752,
+      "step": 1680
+    },
+    {
+      "epoch": 0.5625832223701731,
+      "grad_norm": 8.375,
+      "learning_rate": 8.748335552596539e-06,
+      "loss": 0.8926,
+      "step": 1690
+    },
+    {
+      "epoch": 0.5659121171770972,
+      "grad_norm": 17.625,
+      "learning_rate": 8.681757656458057e-06,
+      "loss": 1.0054,
+      "step": 1700
+    },
+    {
+      "epoch": 0.5692410119840213,
+      "grad_norm": 20.125,
+      "learning_rate": 8.615179760319575e-06,
+      "loss": 0.9203,
+      "step": 1710
+    },
+    {
+      "epoch": 0.5725699067909454,
+      "grad_norm": 28.75,
+      "learning_rate": 8.548601864181093e-06,
+      "loss": 0.9583,
+      "step": 1720
+    },
+    {
+      "epoch": 0.5758988015978695,
+      "grad_norm": 17.0,
+      "learning_rate": 8.48202396804261e-06,
+      "loss": 0.9582,
+      "step": 1730
+    },
+    {
+      "epoch": 0.5792276964047937,
+      "grad_norm": 13.4375,
+      "learning_rate": 8.415446071904129e-06,
+      "loss": 1.0034,
+      "step": 1740
+    },
+    {
+      "epoch": 0.5825565912117177,
+      "grad_norm": 27.75,
+      "learning_rate": 8.348868175765647e-06,
+      "loss": 1.0647,
+      "step": 1750
+    },
+    {
+      "epoch": 0.5858854860186418,
+      "grad_norm": 33.75,
+      "learning_rate": 8.282290279627165e-06,
+      "loss": 1.0575,
+      "step": 1760
+    },
+    {
+      "epoch": 0.5892143808255659,
+      "grad_norm": 14.375,
+      "learning_rate": 8.215712383488683e-06,
+      "loss": 1.0699,
+      "step": 1770
+    },
+    {
+      "epoch": 0.59254327563249,
+      "grad_norm": 6.09375,
+      "learning_rate": 8.1491344873502e-06,
+      "loss": 0.7706,
+      "step": 1780
+    },
+    {
+      "epoch": 0.5958721704394141,
+      "grad_norm": 8.5625,
+      "learning_rate": 8.082556591211719e-06,
+      "loss": 0.8626,
+      "step": 1790
+    },
+    {
+      "epoch": 0.5992010652463382,
+      "grad_norm": 7.46875,
+      "learning_rate": 8.015978695073236e-06,
+      "loss": 0.9427,
+      "step": 1800
+    },
+    {
+      "epoch": 0.6025299600532623,
+      "grad_norm": 23.375,
+      "learning_rate": 7.949400798934754e-06,
+      "loss": 0.962,
+      "step": 1810
+    },
+    {
+      "epoch": 0.6058588548601864,
+      "grad_norm": 7.65625,
+      "learning_rate": 7.882822902796272e-06,
+      "loss": 1.1216,
+      "step": 1820
+    },
+    {
+      "epoch": 0.6091877496671105,
+      "grad_norm": 10.0625,
+      "learning_rate": 7.81624500665779e-06,
+      "loss": 0.9076,
+      "step": 1830
+    },
+    {
+      "epoch": 0.6125166444740346,
+      "grad_norm": 6.78125,
+      "learning_rate": 7.749667110519308e-06,
+      "loss": 0.8158,
+      "step": 1840
+    },
+    {
+      "epoch": 0.6158455392809588,
+      "grad_norm": 29.5,
+      "learning_rate": 7.683089214380826e-06,
+      "loss": 0.9828,
+      "step": 1850
+    },
+    {
+      "epoch": 0.6191744340878829,
+      "grad_norm": 9.6875,
+      "learning_rate": 7.616511318242344e-06,
+      "loss": 0.9181,
+      "step": 1860
+    },
+    {
+      "epoch": 0.622503328894807,
+      "grad_norm": 7.6875,
+      "learning_rate": 7.549933422103862e-06,
+      "loss": 0.9864,
+      "step": 1870
+    },
+    {
+      "epoch": 0.625832223701731,
+      "grad_norm": 33.0,
+      "learning_rate": 7.48335552596538e-06,
+      "loss": 0.9898,
+      "step": 1880
+    },
+    {
+      "epoch": 0.6291611185086551,
+      "grad_norm": 13.1875,
+      "learning_rate": 7.416777629826898e-06,
+      "loss": 0.948,
+      "step": 1890
+    },
+    {
+      "epoch": 0.6324900133155792,
+      "grad_norm": 6.59375,
+      "learning_rate": 7.350199733688416e-06,
+      "loss": 0.9811,
+      "step": 1900
+    },
+    {
+      "epoch": 0.6358189081225033,
+      "grad_norm": 8.5625,
+      "learning_rate": 7.283621837549935e-06,
+      "loss": 1.0461,
+      "step": 1910
+    },
+    {
+      "epoch": 0.6391478029294274,
+      "grad_norm": 30.25,
+      "learning_rate": 7.217043941411453e-06,
+      "loss": 1.0408,
+      "step": 1920
+    },
+    {
+      "epoch": 0.6424766977363515,
+      "grad_norm": 25.625,
+      "learning_rate": 7.15046604527297e-06,
+      "loss": 1.0327,
+      "step": 1930
+    },
+    {
+      "epoch": 0.6458055925432756,
+      "grad_norm": 27.125,
+      "learning_rate": 7.083888149134488e-06,
+      "loss": 0.9213,
+      "step": 1940
+    },
+    {
+      "epoch": 0.6491344873501997,
+      "grad_norm": 22.125,
+      "learning_rate": 7.017310252996006e-06,
+      "loss": 1.1191,
+      "step": 1950
+    },
+    {
+      "epoch": 0.6524633821571239,
+      "grad_norm": 10.6875,
+      "learning_rate": 6.950732356857524e-06,
+      "loss": 0.9514,
+      "step": 1960
+    },
+    {
+      "epoch": 0.655792276964048,
+      "grad_norm": 16.375,
+      "learning_rate": 6.884154460719042e-06,
+      "loss": 0.9437,
+      "step": 1970
+    },
+    {
+      "epoch": 0.6591211717709721,
+      "grad_norm": 7.5625,
+      "learning_rate": 6.8175765645805605e-06,
+      "loss": 1.0645,
+      "step": 1980
+    },
+    {
+      "epoch": 0.6624500665778962,
+      "grad_norm": 9.125,
+      "learning_rate": 6.7509986684420784e-06,
+      "loss": 0.9085,
+      "step": 1990
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "grad_norm": 9.6875,
+      "learning_rate": 6.6844207723035955e-06,
+      "loss": 0.9784,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "eval_accuracy": 0.5068027210884354,
+      "eval_loss": 0.9414482712745667,
+      "eval_runtime": 213.4336,
+      "eval_samples_per_second": 112.953,
+      "eval_steps_per_second": 28.238,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6691078561917443,
+      "grad_norm": 22.125,
+      "learning_rate": 6.6178428761651135e-06,
+      "loss": 0.9537,
+      "step": 2010
+    },
+    {
+      "epoch": 0.6724367509986684,
+      "grad_norm": 19.5,
+      "learning_rate": 6.5512649800266314e-06,
+      "loss": 1.0052,
+      "step": 2020
+    },
+    {
+      "epoch": 0.6757656458055925,
+      "grad_norm": 22.75,
+      "learning_rate": 6.484687083888149e-06,
+      "loss": 1.0513,
+      "step": 2030
+    },
+    {
+      "epoch": 0.6790945406125166,
+      "grad_norm": 14.9375,
+      "learning_rate": 6.418109187749668e-06,
+      "loss": 1.0479,
+      "step": 2040
+    },
+    {
+      "epoch": 0.6824234354194407,
+      "grad_norm": 15.6875,
+      "learning_rate": 6.351531291611186e-06,
+      "loss": 0.9613,
+      "step": 2050
+    },
+    {
+      "epoch": 0.6857523302263648,
+      "grad_norm": 9.5625,
+      "learning_rate": 6.284953395472704e-06,
+      "loss": 0.98,
+      "step": 2060
+    },
+    {
+      "epoch": 0.689081225033289,
+      "grad_norm": 11.375,
+      "learning_rate": 6.218375499334221e-06,
+      "loss": 0.8031,
+      "step": 2070
+    },
+    {
+      "epoch": 0.6924101198402131,
+      "grad_norm": 8.5,
+      "learning_rate": 6.151797603195739e-06,
+      "loss": 0.9613,
+      "step": 2080
+    },
+    {
+      "epoch": 0.6957390146471372,
+      "grad_norm": 8.1875,
+      "learning_rate": 6.085219707057257e-06,
+      "loss": 1.0147,
+      "step": 2090
+    },
+    {
+      "epoch": 0.6990679094540613,
+      "grad_norm": 12.125,
+      "learning_rate": 6.018641810918775e-06,
+      "loss": 0.9856,
+      "step": 2100
+    },
+    {
+      "epoch": 0.7023968042609854,
+      "grad_norm": 26.0,
+      "learning_rate": 5.952063914780294e-06,
+      "loss": 0.9457,
+      "step": 2110
+    },
+    {
+      "epoch": 0.7057256990679095,
+      "grad_norm": 25.875,
+      "learning_rate": 5.885486018641812e-06,
+      "loss": 0.9133,
+      "step": 2120
+    },
+    {
+      "epoch": 0.7090545938748336,
+      "grad_norm": 25.0,
+      "learning_rate": 5.81890812250333e-06,
+      "loss": 1.0304,
+      "step": 2130
+    },
+    {
+      "epoch": 0.7123834886817576,
+      "grad_norm": 27.375,
+      "learning_rate": 5.752330226364847e-06,
+      "loss": 1.1041,
+      "step": 2140
+    },
+    {
+      "epoch": 0.7157123834886817,
+      "grad_norm": 15.75,
+      "learning_rate": 5.685752330226365e-06,
+      "loss": 1.0324,
+      "step": 2150
+    },
+    {
+      "epoch": 0.7190412782956058,
+      "grad_norm": 13.4375,
+      "learning_rate": 5.619174434087883e-06,
+      "loss": 0.9786,
+      "step": 2160
+    },
+    {
+      "epoch": 0.7223701731025299,
+      "grad_norm": 7.53125,
+      "learning_rate": 5.5525965379494016e-06,
+      "loss": 1.048,
+      "step": 2170
+    },
+    {
+      "epoch": 0.725699067909454,
+      "grad_norm": 23.75,
+      "learning_rate": 5.4860186418109195e-06,
+      "loss": 0.9347,
+      "step": 2180
+    },
+    {
+      "epoch": 0.7290279627163782,
+      "grad_norm": 30.875,
+      "learning_rate": 5.4194407456724375e-06,
+      "loss": 0.929,
+      "step": 2190
+    },
+    {
+      "epoch": 0.7323568575233023,
+      "grad_norm": 9.5625,
+      "learning_rate": 5.3528628495339554e-06,
+      "loss": 0.8509,
+      "step": 2200
+    },
+    {
+      "epoch": 0.7356857523302264,
+      "grad_norm": 31.125,
+      "learning_rate": 5.286284953395473e-06,
+      "loss": 1.0614,
+      "step": 2210
+    },
+    {
+      "epoch": 0.7390146471371505,
+      "grad_norm": 33.5,
+      "learning_rate": 5.2197070572569905e-06,
+      "loss": 0.9723,
+      "step": 2220
+    },
+    {
+      "epoch": 0.7423435419440746,
+      "grad_norm": 9.0,
+      "learning_rate": 5.1531291611185084e-06,
+      "loss": 0.8864,
+      "step": 2230
+    },
+    {
+      "epoch": 0.7456724367509987,
+      "grad_norm": 26.0,
+      "learning_rate": 5.086551264980027e-06,
+      "loss": 1.004,
+      "step": 2240
+    },
+    {
+      "epoch": 0.7490013315579228,
+      "grad_norm": 26.5,
+      "learning_rate": 5.019973368841545e-06,
+      "loss": 0.9622,
+      "step": 2250
+    },
+    {
+      "epoch": 0.7523302263648469,
+      "grad_norm": 12.75,
+      "learning_rate": 4.953395472703063e-06,
+      "loss": 0.9793,
+      "step": 2260
+    },
+    {
+      "epoch": 0.7556591211717709,
+      "grad_norm": 7.65625,
+      "learning_rate": 4.886817576564581e-06,
+      "loss": 0.7945,
+      "step": 2270
+    },
+    {
+      "epoch": 0.758988015978695,
+      "grad_norm": 16.875,
+      "learning_rate": 4.820239680426099e-06,
+      "loss": 1.151,
+      "step": 2280
+    },
+    {
+      "epoch": 0.7623169107856191,
+      "grad_norm": 14.1875,
+      "learning_rate": 4.753661784287617e-06,
+      "loss": 0.8101,
+      "step": 2290
+    },
+    {
+      "epoch": 0.7656458055925432,
+      "grad_norm": 5.53125,
+      "learning_rate": 4.687083888149135e-06,
+      "loss": 0.9417,
+      "step": 2300
+    },
+    {
+      "epoch": 0.7689747003994674,
+      "grad_norm": 14.75,
+      "learning_rate": 4.620505992010653e-06,
+      "loss": 0.9002,
+      "step": 2310
+    },
+    {
+      "epoch": 0.7723035952063915,
+      "grad_norm": 13.9375,
+      "learning_rate": 4.553928095872171e-06,
+      "loss": 1.0102,
+      "step": 2320
+    },
+    {
+      "epoch": 0.7756324900133156,
+      "grad_norm": 7.9375,
+      "learning_rate": 4.487350199733689e-06,
+      "loss": 0.9029,
+      "step": 2330
+    },
+    {
+      "epoch": 0.7789613848202397,
+      "grad_norm": 25.25,
+      "learning_rate": 4.420772303595207e-06,
+      "loss": 1.0435,
+      "step": 2340
+    },
+    {
+      "epoch": 0.7822902796271638,
+      "grad_norm": 3.53125,
+      "learning_rate": 4.354194407456725e-06,
+      "loss": 0.8851,
+      "step": 2350
+    },
+    {
+      "epoch": 0.7856191744340879,
+      "grad_norm": 36.0,
+      "learning_rate": 4.287616511318243e-06,
+      "loss": 0.9339,
+      "step": 2360
+    },
+    {
+      "epoch": 0.788948069241012,
+      "grad_norm": 25.375,
+      "learning_rate": 4.221038615179761e-06,
+      "loss": 0.943,
+      "step": 2370
+    },
+    {
+      "epoch": 0.7922769640479361,
+      "grad_norm": 6.90625,
+      "learning_rate": 4.1544607190412786e-06,
+      "loss": 0.969,
+      "step": 2380
+    },
+    {
+      "epoch": 0.7956058588548602,
+      "grad_norm": 29.375,
+      "learning_rate": 4.0878828229027965e-06,
+      "loss": 0.9423,
+      "step": 2390
+    },
+    {
+      "epoch": 0.7989347536617842,
+      "grad_norm": 9.125,
+      "learning_rate": 4.0213049267643145e-06,
+      "loss": 0.8299,
+      "step": 2400
+    },
+    {
+      "epoch": 0.8022636484687083,
+      "grad_norm": 9.75,
+      "learning_rate": 3.9547270306258324e-06,
+      "loss": 1.0702,
+      "step": 2410
+    },
+    {
+      "epoch": 0.8055925432756325,
+      "grad_norm": 13.1875,
+      "learning_rate": 3.88814913448735e-06,
+      "loss": 1.03,
+      "step": 2420
+    },
+    {
+      "epoch": 0.8089214380825566,
+      "grad_norm": 10.4375,
+      "learning_rate": 3.821571238348868e-06,
+      "loss": 0.9467,
+      "step": 2430
+    },
+    {
+      "epoch": 0.8122503328894807,
+      "grad_norm": 6.46875,
+      "learning_rate": 3.7549933422103863e-06,
+      "loss": 1.0284,
+      "step": 2440
+    },
+    {
+      "epoch": 0.8155792276964048,
+      "grad_norm": 9.75,
+      "learning_rate": 3.6884154460719047e-06,
+      "loss": 1.0224,
+      "step": 2450
+    },
+    {
+      "epoch": 0.8189081225033289,
+      "grad_norm": 11.0625,
+      "learning_rate": 3.621837549933422e-06,
+      "loss": 0.8798,
+      "step": 2460
+    },
+    {
+      "epoch": 0.822237017310253,
+      "grad_norm": 6.25,
+      "learning_rate": 3.55525965379494e-06,
+      "loss": 0.9709,
+      "step": 2470
+    },
+    {
+      "epoch": 0.8255659121171771,
+      "grad_norm": 35.5,
+      "learning_rate": 3.4886817576564585e-06,
+      "loss": 1.0462,
+      "step": 2480
+    },
+    {
+      "epoch": 0.8288948069241012,
+      "grad_norm": 19.25,
+      "learning_rate": 3.4221038615179765e-06,
+      "loss": 0.9246,
+      "step": 2490
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "grad_norm": 11.9375,
+      "learning_rate": 3.355525965379494e-06,
+      "loss": 0.9595,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "eval_accuracy": 0.5073834411813506,
+      "eval_loss": 0.9414454698562622,
+      "eval_runtime": 212.4541,
+      "eval_samples_per_second": 113.474,
+      "eval_steps_per_second": 28.368,
+      "step": 2500
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
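
The logged values above can be cross-checked with a short script. The following is a minimal sketch, not part of the commit: it copies the four evaluation entries and a few `(step, learning_rate)` pairs from the `log_history` above, picks the best evaluation step, and checks the learning rates against a linear decay to zero. The peak learning rate of `2e-5` and the absence of warmup are assumptions inferred from the logged numbers; the diff does not state them. The helper names (`linear_decay_lr`, `best_by`, `max_schedule_error`) are hypothetical.

```python
# Evaluation entries copied from the log_history above (checkpoint-2500).
EVAL_LOG = [
    {"step": 1000, "eval_accuracy": 0.507300481168077, "eval_loss": 0.9467151165008545},
    {"step": 1500, "eval_accuracy": 0.508752281400365, "eval_loss": 0.9421009421348572},
    {"step": 2000, "eval_accuracy": 0.5068027210884354, "eval_loss": 0.9414482712745667},
    {"step": 2500, "eval_accuracy": 0.5073834411813506, "eval_loss": 0.9414454698562622},
]

# A few (step, learning_rate) pairs copied from the training entries above.
LR_LOG = [
    (900, 1.4007989347536619e-05),
    (1000, 1.3342210386151799e-05),
    (1500, 1.0013315579227698e-05),
    (2000, 6.6844207723035955e-06),
    (2500, 3.355525965379494e-06),
]

MAX_STEPS = 3004         # "max_steps" in trainer_state.json
ASSUMED_PEAK_LR = 2e-5   # assumption: inferred from the logged values, not stated in the diff


def linear_decay_lr(step, peak_lr=ASSUMED_PEAK_LR, max_steps=MAX_STEPS):
    """Learning rate under a linear decay to zero with no warmup (assumed schedule)."""
    return peak_lr * (max_steps - step) / max_steps


def best_by(entries, key, lowest=True):
    """Return the entry with the lowest (or highest) value for the given metric."""
    pick = min if lowest else max
    return pick(entries, key=lambda entry: entry[key])


def max_schedule_error(pairs):
    """Largest relative gap between the logged and the assumed learning rate."""
    return max(abs(linear_decay_lr(step) - lr) / lr for step, lr in pairs)


if __name__ == "__main__":
    print("lowest eval_loss at step", best_by(EVAL_LOG, "eval_loss")["step"])
    print("highest eval_accuracy at step",
          best_by(EVAL_LOG, "eval_accuracy", lowest=False)["step"])
    print("max relative LR error:", max_schedule_error(LR_LOG))
```

Under these assumptions the logged learning rates agree with the linear schedule to floating-point precision. Evaluation loss and accuracy move only in the third decimal place between steps 1000 and 2500.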

checkpoint-2500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176

checkpoint-3000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-3000/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
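
The adapter settings above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the commit: it copies the relevant fields from the config and derives the LoRA scaling factor and the number of low-rank parameters. The layer count and `c_attn` dimensions are assumptions about the base `gpt2` checkpoint (12 blocks, hidden size 768, `c_attn` mapping 768 to 2304) and are not stated in the diff. The helper names `lora_scaling` and `lora_param_count` are hypothetical.

```python
import json

# Fields copied from the checkpoint-3000/adapter_config.json diff above.
ADAPTER_CONFIG = json.loads("""
{
  "base_model_name_or_path": "gpt2",
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 8,
  "target_modules": ["c_attn"],
  "task_type": "SEQ_CLS",
  "use_rslora": false
}
""")

# Assumptions (not in the diff): shape of the base "gpt2" model.
N_LAYERS = 12          # transformer blocks
C_ATTN_IN = 768        # hidden size
C_ATTN_OUT = 3 * 768   # fused query/key/value projection


def lora_scaling(cfg):
    """Factor applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA
    (use_rslora) uses lora_alpha / sqrt(r).
    """
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg, n_layers, d_in, d_out):
    """Low-rank parameters for one adapted module per layer.

    Each adapted layer adds A (r x d_in) and B (d_out x r).
    """
    return n_layers * cfg["r"] * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    print("LoRA parameters:",
          lora_param_count(ADAPTER_CONFIG, N_LAYERS, C_ATTN_IN, C_ATTN_OUT))
```

Under these assumptions the scaling factor is `32 / 8` and the count covers only the low-rank matrices on `c_attn`; the modules listed under `modules_to_save` (`classifier`, `score`) are stored in full and are not included.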

checkpoint-3000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9a96974d5507d6c87ce1cb73655ab18eff252908ea00442a2b83fa42b8494cc
+size 594496

checkpoint-3000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c08cbb701c1e06dc504b94c421defebfa9d49990b6dcce27851005ac62723d87
+size 1197932

checkpoint-3000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:780ed1ee6a2aecfdf8e76a606d65f3f7b02699b8e043295aaeb36519f12d929b
+size 14180

checkpoint-3000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9f0f13b689215fcd891e1095e758600069c8ac26423baee7ed2caed4cbb4bff5
+size 1064

checkpoint-3000/trainer_state.json ADDED

	@@ -0,0 +1,2187 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.9986684420772304,
+  "eval_steps": 500,
+  "global_step": 3000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3362183754993342,
+      "grad_norm": 20.0,
+      "learning_rate": 1.3275632490013317e-05,
+      "loss": 1.0432,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3395472703062583,
+      "grad_norm": 22.375,
+      "learning_rate": 1.3209054593874834e-05,
+      "loss": 1.0393,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3428761651131824,
+      "grad_norm": 18.375,
+      "learning_rate": 1.3142476697736352e-05,
+      "loss": 1.0398,
+      "step": 1030
+    },
+    {
+      "epoch": 0.34620505992010653,
+      "grad_norm": 17.25,
+      "learning_rate": 1.307589880159787e-05,
+      "loss": 0.9225,
+      "step": 1040
+    },
+    {
+      "epoch": 0.34953395472703064,
+      "grad_norm": 11.75,
+      "learning_rate": 1.3009320905459388e-05,
+      "loss": 0.9262,
+      "step": 1050
+    },
+    {
+      "epoch": 0.35286284953395475,
+      "grad_norm": 10.25,
+      "learning_rate": 1.2942743009320906e-05,
+      "loss": 0.7248,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3561917443408788,
+      "grad_norm": 12.875,
+      "learning_rate": 1.2876165113182424e-05,
+      "loss": 0.8358,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3595206391478029,
+      "grad_norm": 4.65625,
+      "learning_rate": 1.2809587217043942e-05,
+      "loss": 0.8156,
+      "step": 1080
+    },
+    {
+      "epoch": 0.362849533954727,
+      "grad_norm": 23.5,
+      "learning_rate": 1.274300932090546e-05,
+      "loss": 1.0211,
+      "step": 1090
+    },
+    {
+      "epoch": 0.36617842876165113,
+      "grad_norm": 17.5,
+      "learning_rate": 1.2676431424766978e-05,
+      "loss": 0.9804,
+      "step": 1100
+    },
+    {
+      "epoch": 0.36950732356857524,
+      "grad_norm": 6.375,
+      "learning_rate": 1.2609853528628498e-05,
+      "loss": 0.9636,
+      "step": 1110
+    },
+    {
+      "epoch": 0.37283621837549935,
+      "grad_norm": 20.25,
+      "learning_rate": 1.2543275632490016e-05,
+      "loss": 0.9873,
+      "step": 1120
+    },
+    {
+      "epoch": 0.37616511318242346,
+      "grad_norm": 17.0,
+      "learning_rate": 1.2476697736351534e-05,
+      "loss": 0.9674,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3794940079893475,
+      "grad_norm": 20.5,
+      "learning_rate": 1.241011984021305e-05,
+      "loss": 1.0073,
+      "step": 1140
+    },
+    {
+      "epoch": 0.3828229027962716,
+      "grad_norm": 4.125,
+      "learning_rate": 1.2343541944074568e-05,
+      "loss": 1.0919,
+      "step": 1150
+    },
+    {
+      "epoch": 0.38615179760319573,
+      "grad_norm": 15.3125,
+      "learning_rate": 1.2276964047936086e-05,
+      "loss": 1.0103,
+      "step": 1160
+    },
+    {
+      "epoch": 0.38948069241011984,
+      "grad_norm": 22.125,
+      "learning_rate": 1.2210386151797604e-05,
+      "loss": 0.9629,
+      "step": 1170
+    },
+    {
+      "epoch": 0.39280958721704395,
+      "grad_norm": 2.953125,
+      "learning_rate": 1.2143808255659122e-05,
+      "loss": 0.9545,
+      "step": 1180
+    },
+    {
+      "epoch": 0.39613848202396806,
+      "grad_norm": 7.59375,
+      "learning_rate": 1.207723035952064e-05,
+      "loss": 0.8836,
+      "step": 1190
+    },
+    {
+      "epoch": 0.3994673768308921,
+      "grad_norm": 28.125,
+      "learning_rate": 1.2010652463382158e-05,
+      "loss": 0.9632,
+      "step": 1200
+    },
+    {
+      "epoch": 0.40279627163781623,
+      "grad_norm": 15.125,
+      "learning_rate": 1.1944074567243676e-05,
+      "loss": 1.0765,
+      "step": 1210
+    },
+    {
+      "epoch": 0.40612516644474034,
+      "grad_norm": 11.625,
+      "learning_rate": 1.1877496671105194e-05,
+      "loss": 0.7911,
+      "step": 1220
+    },
+    {
+      "epoch": 0.40945406125166445,
+      "grad_norm": 15.5625,
+      "learning_rate": 1.1810918774966711e-05,
+      "loss": 1.0908,
+      "step": 1230
+    },
+    {
+      "epoch": 0.41278295605858856,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.1744340878828231e-05,
+      "loss": 1.0418,
+      "step": 1240
+    },
+    {
+      "epoch": 0.41611185086551267,
+      "grad_norm": 2.671875,
+      "learning_rate": 1.1677762982689749e-05,
+      "loss": 0.9041,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4194407456724368,
+      "grad_norm": 19.125,
+      "learning_rate": 1.1611185086551267e-05,
+      "loss": 1.0392,
+      "step": 1260
+    },
+    {
+      "epoch": 0.42276964047936083,
+      "grad_norm": 6.8125,
+      "learning_rate": 1.1544607190412785e-05,
+      "loss": 0.8439,
+      "step": 1270
+    },
+    {
+      "epoch": 0.42609853528628494,
+      "grad_norm": 7.75,
+      "learning_rate": 1.1478029294274303e-05,
+      "loss": 1.0188,
+      "step": 1280
+    },
+    {
+      "epoch": 0.42942743009320905,
+      "grad_norm": 17.25,
+      "learning_rate": 1.141145139813582e-05,
+      "loss": 1.1353,
+      "step": 1290
+    },
+    {
+      "epoch": 0.43275632490013316,
+      "grad_norm": 22.125,
+      "learning_rate": 1.1344873501997337e-05,
+      "loss": 0.987,
+      "step": 1300
+    },
+    {
+      "epoch": 0.43608521970705727,
+      "grad_norm": 16.0,
+      "learning_rate": 1.1278295605858855e-05,
+      "loss": 0.8446,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4394141145139814,
+      "grad_norm": 48.25,
+      "learning_rate": 1.1211717709720373e-05,
+      "loss": 1.0588,
+      "step": 1320
+    },
+    {
+      "epoch": 0.44274300932090543,
+      "grad_norm": 17.625,
+      "learning_rate": 1.1145139813581891e-05,
+      "loss": 0.8579,
+      "step": 1330
+    },
+    {
+      "epoch": 0.44607190412782954,
+      "grad_norm": 20.25,
+      "learning_rate": 1.1078561917443409e-05,
+      "loss": 1.0973,
+      "step": 1340
+    },
+    {
+      "epoch": 0.44940079893475365,
+      "grad_norm": 14.25,
+      "learning_rate": 1.1011984021304927e-05,
+      "loss": 0.975,
+      "step": 1350
+    },
+    {
+      "epoch": 0.45272969374167776,
+      "grad_norm": 22.875,
+      "learning_rate": 1.0945406125166447e-05,
+      "loss": 0.9035,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4560585885486019,
+      "grad_norm": 13.0,
+      "learning_rate": 1.0878828229027965e-05,
+      "loss": 1.0748,
+      "step": 1370
+    },
+    {
+      "epoch": 0.459387483355526,
+      "grad_norm": 13.125,
+      "learning_rate": 1.0812250332889482e-05,
+      "loss": 1.0373,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4627163781624501,
+      "grad_norm": 15.0,
+      "learning_rate": 1.0745672436751e-05,
+      "loss": 0.9623,
+      "step": 1390
+    },
+    {
+      "epoch": 0.46604527296937415,
+      "grad_norm": 26.25,
+      "learning_rate": 1.0679094540612518e-05,
+      "loss": 0.9579,
+      "step": 1400
+    },
+    {
+      "epoch": 0.46937416777629826,
+      "grad_norm": 12.125,
+      "learning_rate": 1.0612516644474036e-05,
+      "loss": 0.918,
+      "step": 1410
+    },
+    {
+      "epoch": 0.47270306258322237,
+      "grad_norm": 30.75,
+      "learning_rate": 1.0545938748335554e-05,
+      "loss": 0.9823,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4760319573901465,
+      "grad_norm": 7.0,
+      "learning_rate": 1.047936085219707e-05,
+      "loss": 0.9885,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4793608521970706,
+      "grad_norm": 19.5,
+      "learning_rate": 1.0412782956058588e-05,
+      "loss": 0.9654,
+      "step": 1440
+    },
+    {
+      "epoch": 0.4826897470039947,
+      "grad_norm": 13.5,
+      "learning_rate": 1.0346205059920106e-05,
+      "loss": 0.8212,
+      "step": 1450
+    },
+    {
+      "epoch": 0.48601864181091875,
+      "grad_norm": 17.0,
+      "learning_rate": 1.0279627163781624e-05,
+      "loss": 0.9406,
+      "step": 1460
+    },
+    {
+      "epoch": 0.48934753661784286,
+      "grad_norm": 28.0,
+      "learning_rate": 1.0213049267643142e-05,
+      "loss": 0.9908,
+      "step": 1470
+    },
+    {
+      "epoch": 0.49267643142476697,
+      "grad_norm": 14.25,
+      "learning_rate": 1.014647137150466e-05,
+      "loss": 0.9418,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4960053262316911,
+      "grad_norm": 12.25,
+      "learning_rate": 1.007989347536618e-05,
+      "loss": 1.075,
+      "step": 1490
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "grad_norm": 10.625,
+      "learning_rate": 1.0013315579227698e-05,
+      "loss": 1.0111,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "eval_accuracy": 0.508752281400365,
+      "eval_loss": 0.9421009421348572,
+      "eval_runtime": 209.8514,
+      "eval_samples_per_second": 114.881,
+      "eval_steps_per_second": 28.72,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5026631158455392,
+      "grad_norm": 31.5,
+      "learning_rate": 9.946737683089214e-06,
+      "loss": 0.9077,
+      "step": 1510
+    },
+    {
+      "epoch": 0.5059920106524634,
+      "grad_norm": 19.375,
+      "learning_rate": 9.880159786950732e-06,
+      "loss": 1.1292,
+      "step": 1520
+    },
+    {
+      "epoch": 0.5093209054593875,
+      "grad_norm": 8.4375,
+      "learning_rate": 9.813581890812252e-06,
+      "loss": 1.0658,
+      "step": 1530
+    },
+    {
+      "epoch": 0.5126498002663116,
+      "grad_norm": 17.375,
+      "learning_rate": 9.74700399467377e-06,
+      "loss": 0.9691,
+      "step": 1540
+    },
+    {
+      "epoch": 0.5159786950732357,
+      "grad_norm": 8.125,
+      "learning_rate": 9.680426098535288e-06,
+      "loss": 1.0146,
+      "step": 1550
+    },
+    {
+      "epoch": 0.5193075898801598,
+      "grad_norm": 17.5,
+      "learning_rate": 9.613848202396806e-06,
+      "loss": 1.0196,
+      "step": 1560
+    },
+    {
+      "epoch": 0.5226364846870839,
+      "grad_norm": 15.1875,
+      "learning_rate": 9.547270306258324e-06,
+      "loss": 0.8996,
+      "step": 1570
+    },
+    {
+      "epoch": 0.525965379494008,
+      "grad_norm": 55.5,
+      "learning_rate": 9.48069241011984e-06,
+      "loss": 1.0211,
+      "step": 1580
+    },
+    {
+      "epoch": 0.5292942743009321,
+      "grad_norm": 4.21875,
+      "learning_rate": 9.41411451398136e-06,
+      "loss": 0.9728,
+      "step": 1590
+    },
+    {
+      "epoch": 0.5326231691078562,
+      "grad_norm": 6.71875,
+      "learning_rate": 9.347536617842877e-06,
+      "loss": 1.0168,
+      "step": 1600
+    },
+    {
+      "epoch": 0.5359520639147803,
+      "grad_norm": 24.75,
+      "learning_rate": 9.280958721704395e-06,
+      "loss": 0.9252,
+      "step": 1610
+    },
+    {
+      "epoch": 0.5392809587217043,
+      "grad_norm": 9.0625,
+      "learning_rate": 9.214380825565913e-06,
+      "loss": 1.0479,
+      "step": 1620
+    },
+    {
+      "epoch": 0.5426098535286284,
+      "grad_norm": 15.375,
+      "learning_rate": 9.147802929427431e-06,
+      "loss": 1.0699,
+      "step": 1630
+    },
+    {
+      "epoch": 0.5459387483355526,
+      "grad_norm": 17.5,
+      "learning_rate": 9.08122503328895e-06,
+      "loss": 0.9407,
+      "step": 1640
+    },
+    {
+      "epoch": 0.5492676431424767,
+      "grad_norm": 9.625,
+      "learning_rate": 9.014647137150465e-06,
+      "loss": 1.0124,
+      "step": 1650
+    },
+    {
+      "epoch": 0.5525965379494008,
+      "grad_norm": 10.375,
+      "learning_rate": 8.948069241011985e-06,
+      "loss": 0.8612,
+      "step": 1660
+    },
+    {
+      "epoch": 0.5559254327563249,
+      "grad_norm": 21.375,
+      "learning_rate": 8.881491344873503e-06,
+      "loss": 0.9613,
+      "step": 1670
+    },
+    {
+      "epoch": 0.559254327563249,
+      "grad_norm": 9.3125,
+      "learning_rate": 8.814913448735021e-06,
+      "loss": 0.9752,
+      "step": 1680
+    },
+    {
+      "epoch": 0.5625832223701731,
+      "grad_norm": 8.375,
+      "learning_rate": 8.748335552596539e-06,
+      "loss": 0.8926,
+      "step": 1690
+    },
+    {
+      "epoch": 0.5659121171770972,
+      "grad_norm": 17.625,
+      "learning_rate": 8.681757656458057e-06,
+      "loss": 1.0054,
+      "step": 1700
+    },
+    {
+      "epoch": 0.5692410119840213,
+      "grad_norm": 20.125,
+      "learning_rate": 8.615179760319575e-06,
+      "loss": 0.9203,
+      "step": 1710
+    },
+    {
+      "epoch": 0.5725699067909454,
+      "grad_norm": 28.75,
+      "learning_rate": 8.548601864181093e-06,
+      "loss": 0.9583,
+      "step": 1720
+    },
+    {
+      "epoch": 0.5758988015978695,
+      "grad_norm": 17.0,
+      "learning_rate": 8.48202396804261e-06,
+      "loss": 0.9582,
+      "step": 1730
+    },
+    {
+      "epoch": 0.5792276964047937,
+      "grad_norm": 13.4375,
+      "learning_rate": 8.415446071904129e-06,
+      "loss": 1.0034,
+      "step": 1740
+    },
+    {
+      "epoch": 0.5825565912117177,
+      "grad_norm": 27.75,
+      "learning_rate": 8.348868175765647e-06,
+      "loss": 1.0647,
+      "step": 1750
+    },
+    {
+      "epoch": 0.5858854860186418,
+      "grad_norm": 33.75,
+      "learning_rate": 8.282290279627165e-06,
+      "loss": 1.0575,
+      "step": 1760
+    },
+    {
+      "epoch": 0.5892143808255659,
+      "grad_norm": 14.375,
+      "learning_rate": 8.215712383488683e-06,
+      "loss": 1.0699,
+      "step": 1770
+    },
+    {
+      "epoch": 0.59254327563249,
+      "grad_norm": 6.09375,
+      "learning_rate": 8.1491344873502e-06,
+      "loss": 0.7706,
+      "step": 1780
+    },
+    {
+      "epoch": 0.5958721704394141,
+      "grad_norm": 8.5625,
+      "learning_rate": 8.082556591211719e-06,
+      "loss": 0.8626,
+      "step": 1790
+    },
+    {
+      "epoch": 0.5992010652463382,
+      "grad_norm": 7.46875,
+      "learning_rate": 8.015978695073236e-06,
+      "loss": 0.9427,
+      "step": 1800
+    },
+    {
+      "epoch": 0.6025299600532623,
+      "grad_norm": 23.375,
+      "learning_rate": 7.949400798934754e-06,
+      "loss": 0.962,
+      "step": 1810
+    },
+    {
+      "epoch": 0.6058588548601864,
+      "grad_norm": 7.65625,
+      "learning_rate": 7.882822902796272e-06,
+      "loss": 1.1216,
+      "step": 1820
+    },
+    {
+      "epoch": 0.6091877496671105,
+      "grad_norm": 10.0625,
+      "learning_rate": 7.81624500665779e-06,
+      "loss": 0.9076,
+      "step": 1830
+    },
+    {
+      "epoch": 0.6125166444740346,
+      "grad_norm": 6.78125,
+      "learning_rate": 7.749667110519308e-06,
+      "loss": 0.8158,
+      "step": 1840
+    },
+    {
+      "epoch": 0.6158455392809588,
+      "grad_norm": 29.5,
+      "learning_rate": 7.683089214380826e-06,
+      "loss": 0.9828,
+      "step": 1850
+    },
+    {
+      "epoch": 0.6191744340878829,
+      "grad_norm": 9.6875,
+      "learning_rate": 7.616511318242344e-06,
+      "loss": 0.9181,
+      "step": 1860
+    },
+    {
+      "epoch": 0.622503328894807,
+      "grad_norm": 7.6875,
+      "learning_rate": 7.549933422103862e-06,
+      "loss": 0.9864,
+      "step": 1870
+    },
+    {
+      "epoch": 0.625832223701731,
+      "grad_norm": 33.0,
+      "learning_rate": 7.48335552596538e-06,
+      "loss": 0.9898,
+      "step": 1880
+    },
+    {
+      "epoch": 0.6291611185086551,
+      "grad_norm": 13.1875,
+      "learning_rate": 7.416777629826898e-06,
+      "loss": 0.948,
+      "step": 1890
+    },
+    {
+      "epoch": 0.6324900133155792,
+      "grad_norm": 6.59375,
+      "learning_rate": 7.350199733688416e-06,
+      "loss": 0.9811,
+      "step": 1900
+    },
+    {
+      "epoch": 0.6358189081225033,
+      "grad_norm": 8.5625,
+      "learning_rate": 7.283621837549935e-06,
+      "loss": 1.0461,
+      "step": 1910
+    },
+    {
+      "epoch": 0.6391478029294274,
+      "grad_norm": 30.25,
+      "learning_rate": 7.217043941411453e-06,
+      "loss": 1.0408,
+      "step": 1920
+    },
+    {
+      "epoch": 0.6424766977363515,
+      "grad_norm": 25.625,
+      "learning_rate": 7.15046604527297e-06,
+      "loss": 1.0327,
+      "step": 1930
+    },
+    {
+      "epoch": 0.6458055925432756,
+      "grad_norm": 27.125,
+      "learning_rate": 7.083888149134488e-06,
+      "loss": 0.9213,
+      "step": 1940
+    },
+    {
+      "epoch": 0.6491344873501997,
+      "grad_norm": 22.125,
+      "learning_rate": 7.017310252996006e-06,
+      "loss": 1.1191,
+      "step": 1950
+    },
+    {
+      "epoch": 0.6524633821571239,
+      "grad_norm": 10.6875,
+      "learning_rate": 6.950732356857524e-06,
+      "loss": 0.9514,
+      "step": 1960
+    },
+    {
+      "epoch": 0.655792276964048,
+      "grad_norm": 16.375,
+      "learning_rate": 6.884154460719042e-06,
+      "loss": 0.9437,
+      "step": 1970
+    },
+    {
+      "epoch": 0.6591211717709721,
+      "grad_norm": 7.5625,
+      "learning_rate": 6.8175765645805605e-06,
+      "loss": 1.0645,
+      "step": 1980
+    },
+    {
+      "epoch": 0.6624500665778962,
+      "grad_norm": 9.125,
+      "learning_rate": 6.7509986684420784e-06,
+      "loss": 0.9085,
+      "step": 1990
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "grad_norm": 9.6875,
+      "learning_rate": 6.6844207723035955e-06,
+      "loss": 0.9784,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "eval_accuracy": 0.5068027210884354,
+      "eval_loss": 0.9414482712745667,
+      "eval_runtime": 213.4336,
+      "eval_samples_per_second": 112.953,
+      "eval_steps_per_second": 28.238,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6691078561917443,
+      "grad_norm": 22.125,
+      "learning_rate": 6.6178428761651135e-06,
+      "loss": 0.9537,
+      "step": 2010
+    },
+    {
+      "epoch": 0.6724367509986684,
+      "grad_norm": 19.5,
+      "learning_rate": 6.5512649800266314e-06,
+      "loss": 1.0052,
+      "step": 2020
+    },
+    {
+      "epoch": 0.6757656458055925,
+      "grad_norm": 22.75,
+      "learning_rate": 6.484687083888149e-06,
+      "loss": 1.0513,
+      "step": 2030
+    },
+    {
+      "epoch": 0.6790945406125166,
+      "grad_norm": 14.9375,
+      "learning_rate": 6.418109187749668e-06,
+      "loss": 1.0479,
+      "step": 2040
+    },
+    {
+      "epoch": 0.6824234354194407,
+      "grad_norm": 15.6875,
+      "learning_rate": 6.351531291611186e-06,
+      "loss": 0.9613,
+      "step": 2050
+    },
+    {
+      "epoch": 0.6857523302263648,
+      "grad_norm": 9.5625,
+      "learning_rate": 6.284953395472704e-06,
+      "loss": 0.98,
+      "step": 2060
+    },
+    {
+      "epoch": 0.689081225033289,
+      "grad_norm": 11.375,
+      "learning_rate": 6.218375499334221e-06,
+      "loss": 0.8031,
+      "step": 2070
+    },
+    {
+      "epoch": 0.6924101198402131,
+      "grad_norm": 8.5,
+      "learning_rate": 6.151797603195739e-06,
+      "loss": 0.9613,
+      "step": 2080
+    },
+    {
+      "epoch": 0.6957390146471372,
+      "grad_norm": 8.1875,
+      "learning_rate": 6.085219707057257e-06,
+      "loss": 1.0147,
+      "step": 2090
+    },
+    {
+      "epoch": 0.6990679094540613,
+      "grad_norm": 12.125,
+      "learning_rate": 6.018641810918775e-06,
+      "loss": 0.9856,
+      "step": 2100
+    },
+    {
+      "epoch": 0.7023968042609854,
+      "grad_norm": 26.0,
+      "learning_rate": 5.952063914780294e-06,
+      "loss": 0.9457,
+      "step": 2110
+    },
+    {
+      "epoch": 0.7057256990679095,
+      "grad_norm": 25.875,
+      "learning_rate": 5.885486018641812e-06,
+      "loss": 0.9133,
+      "step": 2120
+    },
+    {
+      "epoch": 0.7090545938748336,
+      "grad_norm": 25.0,
+      "learning_rate": 5.81890812250333e-06,
+      "loss": 1.0304,
+      "step": 2130
+    },
+    {
+      "epoch": 0.7123834886817576,
+      "grad_norm": 27.375,
+      "learning_rate": 5.752330226364847e-06,
+      "loss": 1.1041,
+      "step": 2140
+    },
+    {
+      "epoch": 0.7157123834886817,
+      "grad_norm": 15.75,
+      "learning_rate": 5.685752330226365e-06,
+      "loss": 1.0324,
+      "step": 2150
+    },
+    {
+      "epoch": 0.7190412782956058,
+      "grad_norm": 13.4375,
+      "learning_rate": 5.619174434087883e-06,
+      "loss": 0.9786,
+      "step": 2160
+    },
+    {
+      "epoch": 0.7223701731025299,
+      "grad_norm": 7.53125,
+      "learning_rate": 5.5525965379494016e-06,
+      "loss": 1.048,
+      "step": 2170
+    },
+    {
+      "epoch": 0.725699067909454,
+      "grad_norm": 23.75,
+      "learning_rate": 5.4860186418109195e-06,
+      "loss": 0.9347,
+      "step": 2180
+    },
+    {
+      "epoch": 0.7290279627163782,
+      "grad_norm": 30.875,
+      "learning_rate": 5.4194407456724375e-06,
+      "loss": 0.929,
+      "step": 2190
+    },
+    {
+      "epoch": 0.7323568575233023,
+      "grad_norm": 9.5625,
+      "learning_rate": 5.3528628495339554e-06,
+      "loss": 0.8509,
+      "step": 2200
+    },
+    {
+      "epoch": 0.7356857523302264,
+      "grad_norm": 31.125,
+      "learning_rate": 5.286284953395473e-06,
+      "loss": 1.0614,
+      "step": 2210
+    },
+    {
+      "epoch": 0.7390146471371505,
+      "grad_norm": 33.5,
+      "learning_rate": 5.2197070572569905e-06,
+      "loss": 0.9723,
+      "step": 2220
+    },
+    {
+      "epoch": 0.7423435419440746,
+      "grad_norm": 9.0,
+      "learning_rate": 5.1531291611185084e-06,
+      "loss": 0.8864,
+      "step": 2230
+    },
+    {
+      "epoch": 0.7456724367509987,
+      "grad_norm": 26.0,
+      "learning_rate": 5.086551264980027e-06,
+      "loss": 1.004,
+      "step": 2240
+    },
+    {
+      "epoch": 0.7490013315579228,
+      "grad_norm": 26.5,
+      "learning_rate": 5.019973368841545e-06,
+      "loss": 0.9622,
+      "step": 2250
+    },
+    {
+      "epoch": 0.7523302263648469,
+      "grad_norm": 12.75,
+      "learning_rate": 4.953395472703063e-06,
+      "loss": 0.9793,
+      "step": 2260
+    },
+    {
+      "epoch": 0.7556591211717709,
+      "grad_norm": 7.65625,
+      "learning_rate": 4.886817576564581e-06,
+      "loss": 0.7945,
+      "step": 2270
+    },
+    {
+      "epoch": 0.758988015978695,
+      "grad_norm": 16.875,
+      "learning_rate": 4.820239680426099e-06,
+      "loss": 1.151,
+      "step": 2280
+    },
+    {
+      "epoch": 0.7623169107856191,
+      "grad_norm": 14.1875,
+      "learning_rate": 4.753661784287617e-06,
+      "loss": 0.8101,
+      "step": 2290
+    },
+    {
+      "epoch": 0.7656458055925432,
+      "grad_norm": 5.53125,
+      "learning_rate": 4.687083888149135e-06,
+      "loss": 0.9417,
+      "step": 2300
+    },
+    {
+      "epoch": 0.7689747003994674,
+      "grad_norm": 14.75,
+      "learning_rate": 4.620505992010653e-06,
+      "loss": 0.9002,
+      "step": 2310
+    },
+    {
+      "epoch": 0.7723035952063915,
+      "grad_norm": 13.9375,
+      "learning_rate": 4.553928095872171e-06,
+      "loss": 1.0102,
+      "step": 2320
+    },
+    {
+      "epoch": 0.7756324900133156,
+      "grad_norm": 7.9375,
+      "learning_rate": 4.487350199733689e-06,
+      "loss": 0.9029,
+      "step": 2330
+    },
+    {
+      "epoch": 0.7789613848202397,
+      "grad_norm": 25.25,
+      "learning_rate": 4.420772303595207e-06,
+      "loss": 1.0435,
+      "step": 2340
+    },
+    {
+      "epoch": 0.7822902796271638,
+      "grad_norm": 3.53125,
+      "learning_rate": 4.354194407456725e-06,
+      "loss": 0.8851,
+      "step": 2350
+    },
+    {
+      "epoch": 0.7856191744340879,
+      "grad_norm": 36.0,
+      "learning_rate": 4.287616511318243e-06,
+      "loss": 0.9339,
+      "step": 2360
+    },
+    {
+      "epoch": 0.788948069241012,
+      "grad_norm": 25.375,
+      "learning_rate": 4.221038615179761e-06,
+      "loss": 0.943,
+      "step": 2370
+    },
+    {
+      "epoch": 0.7922769640479361,
+      "grad_norm": 6.90625,
+      "learning_rate": 4.1544607190412786e-06,
+      "loss": 0.969,
+      "step": 2380
+    },
+    {
+      "epoch": 0.7956058588548602,
+      "grad_norm": 29.375,
+      "learning_rate": 4.0878828229027965e-06,
+      "loss": 0.9423,
+      "step": 2390
+    },
+    {
+      "epoch": 0.7989347536617842,
+      "grad_norm": 9.125,
+      "learning_rate": 4.0213049267643145e-06,
+      "loss": 0.8299,
+      "step": 2400
+    },
+    {
+      "epoch": 0.8022636484687083,
+      "grad_norm": 9.75,
+      "learning_rate": 3.9547270306258324e-06,
+      "loss": 1.0702,
+      "step": 2410
+    },
+    {
+      "epoch": 0.8055925432756325,
+      "grad_norm": 13.1875,
+      "learning_rate": 3.88814913448735e-06,
+      "loss": 1.03,
+      "step": 2420
+    },
+    {
+      "epoch": 0.8089214380825566,
+      "grad_norm": 10.4375,
+      "learning_rate": 3.821571238348868e-06,
+      "loss": 0.9467,
+      "step": 2430
+    },
+    {
+      "epoch": 0.8122503328894807,
+      "grad_norm": 6.46875,
+      "learning_rate": 3.7549933422103863e-06,
+      "loss": 1.0284,
+      "step": 2440
+    },
+    {
+      "epoch": 0.8155792276964048,
+      "grad_norm": 9.75,
+      "learning_rate": 3.6884154460719047e-06,
+      "loss": 1.0224,
+      "step": 2450
+    },
+    {
+      "epoch": 0.8189081225033289,
+      "grad_norm": 11.0625,
+      "learning_rate": 3.621837549933422e-06,
+      "loss": 0.8798,
+      "step": 2460
+    },
+    {
+      "epoch": 0.822237017310253,
+      "grad_norm": 6.25,
+      "learning_rate": 3.55525965379494e-06,
+      "loss": 0.9709,
+      "step": 2470
+    },
+    {
+      "epoch": 0.8255659121171771,
+      "grad_norm": 35.5,
+      "learning_rate": 3.4886817576564585e-06,
+      "loss": 1.0462,
+      "step": 2480
+    },
+    {
+      "epoch": 0.8288948069241012,
+      "grad_norm": 19.25,
+      "learning_rate": 3.4221038615179765e-06,
+      "loss": 0.9246,
+      "step": 2490
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "grad_norm": 11.9375,
+      "learning_rate": 3.355525965379494e-06,
+      "loss": 0.9595,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "eval_accuracy": 0.5073834411813506,
+      "eval_loss": 0.9414454698562622,
+      "eval_runtime": 212.4541,
+      "eval_samples_per_second": 113.474,
+      "eval_steps_per_second": 28.368,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8355525965379494,
+      "grad_norm": 7.1875,
+      "learning_rate": 3.2889480692410124e-06,
+      "loss": 0.9402,
+      "step": 2510
+    },
+    {
+      "epoch": 0.8388814913448736,
+      "grad_norm": 26.875,
+      "learning_rate": 3.2223701731025303e-06,
+      "loss": 1.0118,
+      "step": 2520
+    },
+    {
+      "epoch": 0.8422103861517976,
+      "grad_norm": 5.53125,
+      "learning_rate": 3.1557922769640483e-06,
+      "loss": 0.9671,
+      "step": 2530
+    },
+    {
+      "epoch": 0.8455392809587217,
+      "grad_norm": 10.5,
+      "learning_rate": 3.089214380825566e-06,
+      "loss": 0.864,
+      "step": 2540
+    },
+    {
+      "epoch": 0.8488681757656458,
+      "grad_norm": 14.6875,
+      "learning_rate": 3.022636484687084e-06,
+      "loss": 0.8927,
+      "step": 2550
+    },
+    {
+      "epoch": 0.8521970705725699,
+      "grad_norm": 9.625,
+      "learning_rate": 2.956058588548602e-06,
+      "loss": 0.9096,
+      "step": 2560
+    },
+    {
+      "epoch": 0.855525965379494,
+      "grad_norm": 10.9375,
+      "learning_rate": 2.8894806924101197e-06,
+      "loss": 0.9587,
+      "step": 2570
+    },
+    {
+      "epoch": 0.8588548601864181,
+      "grad_norm": 25.375,
+      "learning_rate": 2.822902796271638e-06,
+      "loss": 1.0519,
+      "step": 2580
+    },
+    {
+      "epoch": 0.8621837549933422,
+      "grad_norm": 23.375,
+      "learning_rate": 2.756324900133156e-06,
+      "loss": 1.1253,
+      "step": 2590
+    },
+    {
+      "epoch": 0.8655126498002663,
+      "grad_norm": 43.0,
+      "learning_rate": 2.689747003994674e-06,
+      "loss": 0.9927,
+      "step": 2600
+    },
+    {
+      "epoch": 0.8688415446071904,
+      "grad_norm": 19.75,
+      "learning_rate": 2.6231691078561923e-06,
+      "loss": 0.9283,
+      "step": 2610
+    },
+    {
+      "epoch": 0.8721704394141145,
+      "grad_norm": 10.0625,
+      "learning_rate": 2.55659121171771e-06,
+      "loss": 1.0127,
+      "step": 2620
+    },
+    {
+      "epoch": 0.8754993342210386,
+      "grad_norm": 14.875,
+      "learning_rate": 2.490013315579228e-06,
+      "loss": 0.9217,
+      "step": 2630
+    },
+    {
+      "epoch": 0.8788282290279628,
+      "grad_norm": 7.90625,
+      "learning_rate": 2.4234354194407458e-06,
+      "loss": 0.9184,
+      "step": 2640
+    },
+    {
+      "epoch": 0.8821571238348869,
+      "grad_norm": 11.1875,
+      "learning_rate": 2.3568575233022637e-06,
+      "loss": 0.9643,
+      "step": 2650
+    },
+    {
+      "epoch": 0.8854860186418109,
+      "grad_norm": 5.84375,
+      "learning_rate": 2.2902796271637817e-06,
+      "loss": 0.9684,
+      "step": 2660
+    },
+    {
+      "epoch": 0.888814913448735,
+      "grad_norm": 13.75,
+      "learning_rate": 2.2237017310252996e-06,
+      "loss": 0.9295,
+      "step": 2670
+    },
+    {
+      "epoch": 0.8921438082556591,
+      "grad_norm": 20.125,
+      "learning_rate": 2.157123834886818e-06,
+      "loss": 0.9395,
+      "step": 2680
+    },
+    {
+      "epoch": 0.8954727030625832,
+      "grad_norm": 8.1875,
+      "learning_rate": 2.0905459387483355e-06,
+      "loss": 1.0618,
+      "step": 2690
+    },
+    {
+      "epoch": 0.8988015978695073,
+      "grad_norm": 24.125,
+      "learning_rate": 2.023968042609854e-06,
+      "loss": 0.9056,
+      "step": 2700
+    },
+    {
+      "epoch": 0.9021304926764314,
+      "grad_norm": 8.5625,
+      "learning_rate": 1.9573901464713714e-06,
+      "loss": 0.8271,
+      "step": 2710
+    },
+    {
+      "epoch": 0.9054593874833555,
+      "grad_norm": 13.625,
+      "learning_rate": 1.8908122503328896e-06,
+      "loss": 0.9926,
+      "step": 2720
+    },
+    {
+      "epoch": 0.9087882822902796,
+      "grad_norm": 10.0625,
+      "learning_rate": 1.8242343541944078e-06,
+      "loss": 0.9625,
+      "step": 2730
+    },
+    {
+      "epoch": 0.9121171770972037,
+      "grad_norm": 12.5,
+      "learning_rate": 1.7576564580559255e-06,
+      "loss": 0.9791,
+      "step": 2740
+    },
+    {
+      "epoch": 0.9154460719041279,
+      "grad_norm": 33.0,
+      "learning_rate": 1.6910785619174437e-06,
+      "loss": 0.9139,
+      "step": 2750
+    },
+    {
+      "epoch": 0.918774966711052,
+      "grad_norm": 7.1875,
+      "learning_rate": 1.6245006657789616e-06,
+      "loss": 0.8434,
+      "step": 2760
+    },
+    {
+      "epoch": 0.9221038615179761,
+      "grad_norm": 20.5,
+      "learning_rate": 1.5579227696404794e-06,
+      "loss": 0.9575,
+      "step": 2770
+    },
+    {
+      "epoch": 0.9254327563249002,
+      "grad_norm": 15.5,
+      "learning_rate": 1.4913448735019975e-06,
+      "loss": 1.0703,
+      "step": 2780
+    },
+    {
+      "epoch": 0.9287616511318242,
+      "grad_norm": 26.875,
+      "learning_rate": 1.4247669773635153e-06,
+      "loss": 1.0057,
+      "step": 2790
+    },
+    {
+      "epoch": 0.9320905459387483,
+      "grad_norm": 10.1875,
+      "learning_rate": 1.3581890812250334e-06,
+      "loss": 0.7978,
+      "step": 2800
+    },
+    {
+      "epoch": 0.9354194407456724,
+      "grad_norm": 15.0,
+      "learning_rate": 1.2916111850865514e-06,
+      "loss": 0.9641,
+      "step": 2810
+    },
+    {
+      "epoch": 0.9387483355525965,
+      "grad_norm": 12.0625,
+      "learning_rate": 1.2250332889480693e-06,
+      "loss": 1.0366,
+      "step": 2820
+    },
+    {
+      "epoch": 0.9420772303595206,
+      "grad_norm": 8.125,
+      "learning_rate": 1.1584553928095873e-06,
+      "loss": 1.0024,
+      "step": 2830
+    },
+    {
+      "epoch": 0.9454061251664447,
+      "grad_norm": 15.4375,
+      "learning_rate": 1.0918774966711052e-06,
+      "loss": 1.0015,
+      "step": 2840
+    },
+    {
+      "epoch": 0.9487350199733688,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.0252996005326232e-06,
+      "loss": 1.0035,
+      "step": 2850
+    },
+    {
+      "epoch": 0.952063914780293,
+      "grad_norm": 26.25,
+      "learning_rate": 9.587217043941411e-07,
+      "loss": 1.0925,
+      "step": 2860
+    },
+    {
+      "epoch": 0.9553928095872171,
+      "grad_norm": 31.625,
+      "learning_rate": 8.921438082556592e-07,
+      "loss": 0.9177,
+      "step": 2870
+    },
+    {
+      "epoch": 0.9587217043941412,
+      "grad_norm": 25.125,
+      "learning_rate": 8.255659121171772e-07,
+      "loss": 1.128,
+      "step": 2880
+    },
+    {
+      "epoch": 0.9620505992010653,
+      "grad_norm": 15.875,
+      "learning_rate": 7.589880159786951e-07,
+      "loss": 1.0045,
+      "step": 2890
+    },
+    {
+      "epoch": 0.9653794940079894,
+      "grad_norm": 13.6875,
+      "learning_rate": 6.924101198402131e-07,
+      "loss": 1.0835,
+      "step": 2900
+    },
+    {
+      "epoch": 0.9687083888149135,
+      "grad_norm": 19.375,
+      "learning_rate": 6.258322237017311e-07,
+      "loss": 0.9599,
+      "step": 2910
+    },
+    {
+      "epoch": 0.9720372836218375,
+      "grad_norm": 13.1875,
+      "learning_rate": 5.592543275632491e-07,
+      "loss": 0.9699,
+      "step": 2920
+    },
+    {
+      "epoch": 0.9753661784287616,
+      "grad_norm": 23.0,
+      "learning_rate": 4.92676431424767e-07,
+      "loss": 1.0096,
+      "step": 2930
+    },
+    {
+      "epoch": 0.9786950732356857,
+      "grad_norm": 24.875,
+      "learning_rate": 4.2609853528628503e-07,
+      "loss": 1.0898,
+      "step": 2940
+    },
+    {
+      "epoch": 0.9820239680426098,
+      "grad_norm": 5.6875,
+      "learning_rate": 3.5952063914780293e-07,
+      "loss": 0.9168,
+      "step": 2950
+    },
+    {
+      "epoch": 0.9853528628495339,
+      "grad_norm": 14.0,
+      "learning_rate": 2.9294274300932093e-07,
+      "loss": 1.0706,
+      "step": 2960
+    },
+    {
+      "epoch": 0.988681757656458,
+      "grad_norm": 11.25,
+      "learning_rate": 2.263648468708389e-07,
+      "loss": 1.0224,
+      "step": 2970
+    },
+    {
+      "epoch": 0.9920106524633822,
+      "grad_norm": 6.0625,
+      "learning_rate": 1.5978695073235687e-07,
+      "loss": 1.092,
+      "step": 2980
+    },
+    {
+      "epoch": 0.9953395472703063,
+      "grad_norm": 18.375,
+      "learning_rate": 9.320905459387485e-08,
+      "loss": 0.9978,
+      "step": 2990
+    },
+    {
+      "epoch": 0.9986684420772304,
+      "grad_norm": 6.21875,
+      "learning_rate": 2.6631158455392814e-08,
+      "loss": 0.9865,
+      "step": 3000
+    },
+    {
+      "epoch": 0.9986684420772304,
+      "eval_accuracy": 0.5078812012609922,
+      "eval_loss": 0.9415724873542786,
+      "eval_runtime": 213.578,
+      "eval_samples_per_second": 112.877,
+      "eval_steps_per_second": 28.219,
+      "step": 3000
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
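Note on the logged schedule: the `learning_rate` and `epoch` values in this `log_history` are consistent with a plain linear decay to zero over `max_steps` with no warmup. The sketch below checks that reading against a few logged entries. The peak rate of 2e-5 is an assumption inferred from the numbers, not a value stated anywhere in this diff, and `linear_lr` and `epoch_fraction` are hypothetical helper names.

```python
# Values read from the trainer_state.json log above.
MAX_STEPS = 3004   # "max_steps"; with "num_train_epochs": 1 this is one full epoch
PEAK_LR = 2e-5     # ASSUMPTION: inferred from the logged values, not stated in the log


def linear_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Learning rate under linear decay to zero with no warmup (hypothetical helper)."""
    return peak_lr * (1.0 - step / max_steps)


def epoch_fraction(step, max_steps=MAX_STEPS):
    """Fraction of the single epoch completed at a given optimizer step."""
    return step / max_steps


# (step -> learning_rate) pairs copied from the log entries above.
logged_lr = {
    1620: 9.214380825565913e-06,
    2000: 6.6844207723035955e-06,
    3000: 2.6631158455392814e-08,
}

for step, lr in logged_lr.items():
    matches = abs(linear_lr(step) - lr) < 1e-12
    print(step, linear_lr(step), epoch_fraction(step), matches)
```

If the run used a different scheduler or a warmup phase, the early steps would deviate from this line; the entries visible here do not.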

checkpoint-3000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176

checkpoint-3004/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-3004/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
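Note on the adapter settings: as a rough reading of this config, the sketch below derives the LoRA scaling factor and the number of trainable LoRA weights on `c_attn`. It assumes the shapes of the base `gpt2` checkpoint (12 blocks, hidden size 768, `c_attn` projecting 768 to 2304), which the config does not state, and it ignores the `modules_to_save` heads because their size depends on a label count not shown here. The helper names are hypothetical.

```python
import json

# Subset of the adapter_config.json fields shown above.
adapter_config = json.loads("""
{
  "base_model_name_or_path": "gpt2",
  "lora_alpha": 32,
  "r": 8,
  "target_modules": ["c_attn"],
  "use_rslora": false
}
""")

# ASSUMPTION: shapes of the base "gpt2" checkpoint; not stated in the config itself.
N_LAYERS = 12
HIDDEN = 768
C_ATTN_IN, C_ATTN_OUT = HIDDEN, 3 * HIDDEN  # fused query/key/value projection


def lora_scaling(cfg):
    """Scale applied to the low-rank update: alpha / r, or alpha / sqrt(r) for rsLoRA."""
    r = cfg["r"]
    return cfg["lora_alpha"] / (r ** 0.5 if cfg["use_rslora"] else r)


def lora_param_count(cfg):
    """LoRA weights only: A is (r x in) and B is (out x r) for each targeted module."""
    per_module = cfg["r"] * (C_ATTN_IN + C_ATTN_OUT)
    return per_module * N_LAYERS * len(cfg["target_modules"])


print("scaling:", lora_scaling(adapter_config))
print("LoRA parameters on c_attn:", lora_param_count(adapter_config))
```

Under these assumptions the low-rank update is scaled by 32 / 8 and each of the 12 blocks adds 8 * (768 + 2304) LoRA weights.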

checkpoint-3004/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f341cf631bdef89c245109cc4c977c7d496e441e6d11145eddc5dad6a1dc473a
+size 594496

checkpoint-3004/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:852181afa1a19e5c441b9c55486b58d7fec27001696126f592f0c10dac0aadba
+size 1197932

checkpoint-3004/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:837c235220b00aa9bd6fd9e829dc15309c352bfb953f2629a19c980bd73a1269
+size 14180

checkpoint-3004/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e27ec6d082724e3872c4fc029fdb8a14d33ee1a339fc5241410e1d93dfc98718
+size 1064

checkpoint-3004/trainer_state.json ADDED

	@@ -0,0 +1,2187 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 3004,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003328894806924101,
+      "grad_norm": 37.75,
+      "learning_rate": 1.9933422103861518e-05,
+      "loss": 1.4084,
+      "step": 10
+    },
+    {
+      "epoch": 0.006657789613848202,
+      "grad_norm": 12.125,
+      "learning_rate": 1.9866844207723038e-05,
+      "loss": 1.0474,
+      "step": 20
+    },
+    {
+      "epoch": 0.009986684420772303,
+      "grad_norm": 45.25,
+      "learning_rate": 1.9800266311584554e-05,
+      "loss": 1.2455,
+      "step": 30
+    },
+    {
+      "epoch": 0.013315579227696404,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.9733688415446073e-05,
+      "loss": 1.1969,
+      "step": 40
+    },
+    {
+      "epoch": 0.016644474034620507,
+      "grad_norm": 32.0,
+      "learning_rate": 1.966711051930759e-05,
+      "loss": 1.1659,
+      "step": 50
+    },
+    {
+      "epoch": 0.019973368841544607,
+      "grad_norm": 19.0,
+      "learning_rate": 1.960053262316911e-05,
+      "loss": 1.1159,
+      "step": 60
+    },
+    {
+      "epoch": 0.02330226364846871,
+      "grad_norm": 27.5,
+      "learning_rate": 1.953395472703063e-05,
+      "loss": 1.1861,
+      "step": 70
+    },
+    {
+      "epoch": 0.02663115845539281,
+      "grad_norm": 17.5,
+      "learning_rate": 1.9467376830892145e-05,
+      "loss": 1.1073,
+      "step": 80
+    },
+    {
+      "epoch": 0.02996005326231691,
+      "grad_norm": 33.75,
+      "learning_rate": 1.9400798934753665e-05,
+      "loss": 1.1506,
+      "step": 90
+    },
+    {
+      "epoch": 0.033288948069241014,
+      "grad_norm": 9.5,
+      "learning_rate": 1.933422103861518e-05,
+      "loss": 1.246,
+      "step": 100
+    },
+    {
+      "epoch": 0.03661784287616511,
+      "grad_norm": 11.3125,
+      "learning_rate": 1.92676431424767e-05,
+      "loss": 1.1708,
+      "step": 110
+    },
+    {
+      "epoch": 0.03994673768308921,
+      "grad_norm": 14.375,
+      "learning_rate": 1.9201065246338217e-05,
+      "loss": 1.1302,
+      "step": 120
+    },
+    {
+      "epoch": 0.043275632490013316,
+      "grad_norm": 20.625,
+      "learning_rate": 1.9134487350199737e-05,
+      "loss": 1.0537,
+      "step": 130
+    },
+    {
+      "epoch": 0.04660452729693742,
+      "grad_norm": 21.75,
+      "learning_rate": 1.9067909454061253e-05,
+      "loss": 1.0669,
+      "step": 140
+    },
+    {
+      "epoch": 0.049933422103861515,
+      "grad_norm": 14.125,
+      "learning_rate": 1.900133155792277e-05,
+      "loss": 1.0482,
+      "step": 150
+    },
+    {
+      "epoch": 0.05326231691078562,
+      "grad_norm": 27.625,
+      "learning_rate": 1.893475366178429e-05,
+      "loss": 1.1468,
+      "step": 160
+    },
+    {
+      "epoch": 0.05659121171770972,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.8868175765645805e-05,
+      "loss": 1.092,
+      "step": 170
+    },
+    {
+      "epoch": 0.05992010652463382,
+      "grad_norm": 32.5,
+      "learning_rate": 1.8801597869507325e-05,
+      "loss": 1.2208,
+      "step": 180
+    },
+    {
+      "epoch": 0.06324900133155792,
+      "grad_norm": 56.0,
+      "learning_rate": 1.873501997336884e-05,
+      "loss": 1.2359,
+      "step": 190
+    },
+    {
+      "epoch": 0.06657789613848203,
+      "grad_norm": 8.375,
+      "learning_rate": 1.866844207723036e-05,
+      "loss": 1.0712,
+      "step": 200
+    },
+    {
+      "epoch": 0.06990679094540612,
+      "grad_norm": 26.375,
+      "learning_rate": 1.860186418109188e-05,
+      "loss": 1.0156,
+      "step": 210
+    },
+    {
+      "epoch": 0.07323568575233022,
+      "grad_norm": 12.0,
+      "learning_rate": 1.8535286284953397e-05,
+      "loss": 1.0407,
+      "step": 220
+    },
+    {
+      "epoch": 0.07656458055925433,
+      "grad_norm": 13.75,
+      "learning_rate": 1.8468708388814916e-05,
+      "loss": 1.0161,
+      "step": 230
+    },
+    {
+      "epoch": 0.07989347536617843,
+      "grad_norm": 27.125,
+      "learning_rate": 1.8402130492676432e-05,
+      "loss": 1.1466,
+      "step": 240
+    },
+    {
+      "epoch": 0.08322237017310254,
+      "grad_norm": 26.125,
+      "learning_rate": 1.8335552596537952e-05,
+      "loss": 1.3192,
+      "step": 250
+    },
+    {
+      "epoch": 0.08655126498002663,
+      "grad_norm": 25.75,
+      "learning_rate": 1.826897470039947e-05,
+      "loss": 1.0628,
+      "step": 260
+    },
+    {
+      "epoch": 0.08988015978695073,
+      "grad_norm": 29.25,
+      "learning_rate": 1.8202396804260988e-05,
+      "loss": 0.967,
+      "step": 270
+    },
+    {
+      "epoch": 0.09320905459387484,
+      "grad_norm": 29.75,
+      "learning_rate": 1.8135818908122504e-05,
+      "loss": 1.0356,
+      "step": 280
+    },
+    {
+      "epoch": 0.09653794940079893,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.806924101198402e-05,
+      "loss": 0.9853,
+      "step": 290
+    },
+    {
+      "epoch": 0.09986684420772303,
+      "grad_norm": 27.5,
+      "learning_rate": 1.800266311584554e-05,
+      "loss": 1.0279,
+      "step": 300
+    },
+    {
+      "epoch": 0.10319573901464714,
+      "grad_norm": 5.65625,
+      "learning_rate": 1.7936085219707056e-05,
+      "loss": 1.0115,
+      "step": 310
+    },
+    {
+      "epoch": 0.10652463382157124,
+      "grad_norm": 8.1875,
+      "learning_rate": 1.7869507323568576e-05,
+      "loss": 0.942,
+      "step": 320
+    },
+    {
+      "epoch": 0.10985352862849534,
+      "grad_norm": 65.0,
+      "learning_rate": 1.7802929427430096e-05,
+      "loss": 1.0208,
+      "step": 330
+    },
+    {
+      "epoch": 0.11318242343541944,
+      "grad_norm": 9.625,
+      "learning_rate": 1.7736351531291612e-05,
+      "loss": 1.1729,
+      "step": 340
+    },
+    {
+      "epoch": 0.11651131824234354,
+      "grad_norm": 17.5,
+      "learning_rate": 1.766977363515313e-05,
+      "loss": 0.9322,
+      "step": 350
+    },
+    {
+      "epoch": 0.11984021304926765,
+      "grad_norm": 28.125,
+      "learning_rate": 1.7603195739014648e-05,
+      "loss": 1.0417,
+      "step": 360
+    },
+    {
+      "epoch": 0.12316910785619174,
+      "grad_norm": 18.25,
+      "learning_rate": 1.7536617842876168e-05,
+      "loss": 1.2491,
+      "step": 370
+    },
+    {
+      "epoch": 0.12649800266311584,
+      "grad_norm": 27.375,
+      "learning_rate": 1.7470039946737684e-05,
+      "loss": 0.9388,
+      "step": 380
+    },
+    {
+      "epoch": 0.12982689747003995,
+      "grad_norm": 22.125,
+      "learning_rate": 1.7403462050599203e-05,
+      "loss": 1.0488,
+      "step": 390
+    },
+    {
+      "epoch": 0.13315579227696406,
+      "grad_norm": 21.25,
+      "learning_rate": 1.733688415446072e-05,
+      "loss": 0.9951,
+      "step": 400
+    },
+    {
+      "epoch": 0.13648468708388814,
+      "grad_norm": 11.75,
+      "learning_rate": 1.727030625832224e-05,
+      "loss": 0.9212,
+      "step": 410
+    },
+    {
+      "epoch": 0.13981358189081225,
+      "grad_norm": 16.25,
+      "learning_rate": 1.7203728362183756e-05,
+      "loss": 1.0548,
+      "step": 420
+    },
+    {
+      "epoch": 0.14314247669773636,
+      "grad_norm": 19.875,
+      "learning_rate": 1.7137150466045275e-05,
+      "loss": 1.0238,
+      "step": 430
+    },
+    {
+      "epoch": 0.14647137150466044,
+      "grad_norm": 18.875,
+      "learning_rate": 1.707057256990679e-05,
+      "loss": 1.0327,
+      "step": 440
+    },
+    {
+      "epoch": 0.14980026631158455,
+      "grad_norm": 7.84375,
+      "learning_rate": 1.7003994673768308e-05,
+      "loss": 1.1155,
+      "step": 450
+    },
+    {
+      "epoch": 0.15312916111850866,
+      "grad_norm": 14.125,
+      "learning_rate": 1.693741677762983e-05,
+      "loss": 0.9627,
+      "step": 460
+    },
+    {
+      "epoch": 0.15645805592543274,
+      "grad_norm": 7.4375,
+      "learning_rate": 1.6870838881491347e-05,
+      "loss": 1.0216,
+      "step": 470
+    },
+    {
+      "epoch": 0.15978695073235685,
+      "grad_norm": 9.375,
+      "learning_rate": 1.6804260985352863e-05,
+      "loss": 1.0489,
+      "step": 480
+    },
+    {
+      "epoch": 0.16311584553928096,
+      "grad_norm": 19.5,
+      "learning_rate": 1.6737683089214383e-05,
+      "loss": 1.1254,
+      "step": 490
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "grad_norm": 26.125,
+      "learning_rate": 1.66711051930759e-05,
+      "loss": 1.0134,
+      "step": 500
+    },
+    {
+      "epoch": 0.16644474034620507,
+      "eval_accuracy": 0.5015762402521985,
+      "eval_loss": 0.9942083358764648,
+      "eval_runtime": 211.8123,
+      "eval_samples_per_second": 113.818,
+      "eval_steps_per_second": 28.454,
+      "step": 500
+    },
+    {
+      "epoch": 0.16977363515312915,
+      "grad_norm": 12.3125,
+      "learning_rate": 1.660452729693742e-05,
+      "loss": 1.0164,
+      "step": 510
+    },
+    {
+      "epoch": 0.17310252996005326,
+      "grad_norm": 38.75,
+      "learning_rate": 1.6537949400798935e-05,
+      "loss": 1.0042,
+      "step": 520
+    },
+    {
+      "epoch": 0.17643142476697737,
+      "grad_norm": 20.125,
+      "learning_rate": 1.6471371504660455e-05,
+      "loss": 0.9786,
+      "step": 530
+    },
+    {
+      "epoch": 0.17976031957390146,
+      "grad_norm": 11.4375,
+      "learning_rate": 1.640479360852197e-05,
+      "loss": 0.9678,
+      "step": 540
+    },
+    {
+      "epoch": 0.18308921438082557,
+      "grad_norm": 19.375,
+      "learning_rate": 1.633821571238349e-05,
+      "loss": 1.0477,
+      "step": 550
+    },
+    {
+      "epoch": 0.18641810918774968,
+      "grad_norm": 16.75,
+      "learning_rate": 1.6271637816245007e-05,
+      "loss": 1.02,
+      "step": 560
+    },
+    {
+      "epoch": 0.18974700399467376,
+      "grad_norm": 15.5,
+      "learning_rate": 1.6205059920106527e-05,
+      "loss": 0.9864,
+      "step": 570
+    },
+    {
+      "epoch": 0.19307589880159787,
+      "grad_norm": 9.75,
+      "learning_rate": 1.6138482023968043e-05,
+      "loss": 1.0019,
+      "step": 580
+    },
+    {
+      "epoch": 0.19640479360852198,
+      "grad_norm": 12.125,
+      "learning_rate": 1.6071904127829563e-05,
+      "loss": 0.9483,
+      "step": 590
+    },
+    {
+      "epoch": 0.19973368841544606,
+      "grad_norm": 30.875,
+      "learning_rate": 1.6005326231691082e-05,
+      "loss": 1.0692,
+      "step": 600
+    },
+    {
+      "epoch": 0.20306258322237017,
+      "grad_norm": 10.125,
+      "learning_rate": 1.59387483355526e-05,
+      "loss": 0.9279,
+      "step": 610
+    },
+    {
+      "epoch": 0.20639147802929428,
+      "grad_norm": 11.6875,
+      "learning_rate": 1.5872170439414115e-05,
+      "loss": 0.941,
+      "step": 620
+    },
+    {
+      "epoch": 0.2097203728362184,
+      "grad_norm": 3.03125,
+      "learning_rate": 1.5805592543275634e-05,
+      "loss": 1.0259,
+      "step": 630
+    },
+    {
+      "epoch": 0.21304926764314247,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.573901464713715e-05,
+      "loss": 0.9687,
+      "step": 640
+    },
+    {
+      "epoch": 0.21637816245006658,
+      "grad_norm": 12.8125,
+      "learning_rate": 1.567243675099867e-05,
+      "loss": 0.9512,
+      "step": 650
+    },
+    {
+      "epoch": 0.2197070572569907,
+      "grad_norm": 9.75,
+      "learning_rate": 1.5605858854860187e-05,
+      "loss": 0.9569,
+      "step": 660
+    },
+    {
+      "epoch": 0.22303595206391477,
+      "grad_norm": 17.375,
+      "learning_rate": 1.5539280958721706e-05,
+      "loss": 0.9694,
+      "step": 670
+    },
+    {
+      "epoch": 0.22636484687083888,
+      "grad_norm": 9.625,
+      "learning_rate": 1.5472703062583222e-05,
+      "loss": 0.97,
+      "step": 680
+    },
+    {
+      "epoch": 0.229693741677763,
+      "grad_norm": 21.0,
+      "learning_rate": 1.5406125166444742e-05,
+      "loss": 1.0506,
+      "step": 690
+    },
+    {
+      "epoch": 0.23302263648468707,
+      "grad_norm": 10.3125,
+      "learning_rate": 1.533954727030626e-05,
+      "loss": 0.9184,
+      "step": 700
+    },
+    {
+      "epoch": 0.23635153129161118,
+      "grad_norm": 21.5,
+      "learning_rate": 1.5272969374167778e-05,
+      "loss": 1.005,
+      "step": 710
+    },
+    {
+      "epoch": 0.2396804260985353,
+      "grad_norm": 4.40625,
+      "learning_rate": 1.5206391478029296e-05,
+      "loss": 0.8624,
+      "step": 720
+    },
+    {
+      "epoch": 0.24300932090545938,
+      "grad_norm": 13.9375,
+      "learning_rate": 1.5139813581890814e-05,
+      "loss": 1.0051,
+      "step": 730
+    },
+    {
+      "epoch": 0.24633821571238348,
+      "grad_norm": 16.625,
+      "learning_rate": 1.5073235685752332e-05,
+      "loss": 1.0378,
+      "step": 740
+    },
+    {
+      "epoch": 0.2496671105193076,
+      "grad_norm": 6.4375,
+      "learning_rate": 1.500665778961385e-05,
+      "loss": 0.9191,
+      "step": 750
+    },
+    {
+      "epoch": 0.2529960053262317,
+      "grad_norm": 5.5625,
+      "learning_rate": 1.4940079893475368e-05,
+      "loss": 0.9722,
+      "step": 760
+    },
+    {
+      "epoch": 0.2563249001331558,
+      "grad_norm": 22.625,
+      "learning_rate": 1.4873501997336886e-05,
+      "loss": 1.014,
+      "step": 770
+    },
+    {
+      "epoch": 0.2596537949400799,
+      "grad_norm": 21.0,
+      "learning_rate": 1.4806924101198404e-05,
+      "loss": 1.0986,
+      "step": 780
+    },
+    {
+      "epoch": 0.262982689747004,
+      "grad_norm": 10.4375,
+      "learning_rate": 1.4740346205059922e-05,
+      "loss": 0.9672,
+      "step": 790
+    },
+    {
+      "epoch": 0.2663115845539281,
+      "grad_norm": 17.0,
+      "learning_rate": 1.467376830892144e-05,
+      "loss": 0.9605,
+      "step": 800
+    },
+    {
+      "epoch": 0.26964047936085217,
+      "grad_norm": 31.75,
+      "learning_rate": 1.4607190412782957e-05,
+      "loss": 1.0308,
+      "step": 810
+    },
+    {
+      "epoch": 0.2729693741677763,
+      "grad_norm": 12.4375,
+      "learning_rate": 1.4540612516644474e-05,
+      "loss": 0.9268,
+      "step": 820
+    },
+    {
+      "epoch": 0.2762982689747004,
+      "grad_norm": 17.125,
+      "learning_rate": 1.4474034620505992e-05,
+      "loss": 0.9656,
+      "step": 830
+    },
+    {
+      "epoch": 0.2796271637816245,
+      "grad_norm": 4.53125,
+      "learning_rate": 1.440745672436751e-05,
+      "loss": 0.9865,
+      "step": 840
+    },
+    {
+      "epoch": 0.2829560585885486,
+      "grad_norm": 22.625,
+      "learning_rate": 1.434087882822903e-05,
+      "loss": 1.1118,
+      "step": 850
+    },
+    {
+      "epoch": 0.2862849533954727,
+      "grad_norm": 13.0,
+      "learning_rate": 1.4274300932090547e-05,
+      "loss": 1.0043,
+      "step": 860
+    },
+    {
+      "epoch": 0.28961384820239683,
+      "grad_norm": 9.0625,
+      "learning_rate": 1.4207723035952065e-05,
+      "loss": 0.9768,
+      "step": 870
+    },
+    {
+      "epoch": 0.2929427430093209,
+      "grad_norm": 6.75,
+      "learning_rate": 1.4141145139813583e-05,
+      "loss": 1.0237,
+      "step": 880
+    },
+    {
+      "epoch": 0.296271637816245,
+      "grad_norm": 10.875,
+      "learning_rate": 1.4074567243675101e-05,
+      "loss": 1.1597,
+      "step": 890
+    },
+    {
+      "epoch": 0.2996005326231691,
+      "grad_norm": 5.75,
+      "learning_rate": 1.4007989347536619e-05,
+      "loss": 0.7993,
+      "step": 900
+    },
+    {
+      "epoch": 0.3029294274300932,
+      "grad_norm": 31.5,
+      "learning_rate": 1.3941411451398137e-05,
+      "loss": 0.9802,
+      "step": 910
+    },
+    {
+      "epoch": 0.3062583222370173,
+      "grad_norm": 12.25,
+      "learning_rate": 1.3874833555259655e-05,
+      "loss": 0.9565,
+      "step": 920
+    },
+    {
+      "epoch": 0.30958721704394143,
+      "grad_norm": 11.8125,
+      "learning_rate": 1.3808255659121173e-05,
+      "loss": 1.0024,
+      "step": 930
+    },
+    {
+      "epoch": 0.3129161118508655,
+      "grad_norm": 16.625,
+      "learning_rate": 1.3741677762982691e-05,
+      "loss": 1.0209,
+      "step": 940
+    },
+    {
+      "epoch": 0.3162450066577896,
+      "grad_norm": 13.375,
+      "learning_rate": 1.3675099866844209e-05,
+      "loss": 0.8767,
+      "step": 950
+    },
+    {
+      "epoch": 0.3195739014647137,
+      "grad_norm": 20.875,
+      "learning_rate": 1.3608521970705725e-05,
+      "loss": 0.9642,
+      "step": 960
+    },
+    {
+      "epoch": 0.3229027962716378,
+      "grad_norm": 17.25,
+      "learning_rate": 1.3541944074567243e-05,
+      "loss": 0.9508,
+      "step": 970
+    },
+    {
+      "epoch": 0.3262316910785619,
+      "grad_norm": 22.25,
+      "learning_rate": 1.3475366178428764e-05,
+      "loss": 1.1324,
+      "step": 980
+    },
+    {
+      "epoch": 0.32956058588548603,
+      "grad_norm": 11.5,
+      "learning_rate": 1.3408788282290282e-05,
+      "loss": 1.0655,
+      "step": 990
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "grad_norm": 15.625,
+      "learning_rate": 1.3342210386151799e-05,
+      "loss": 0.9608,
+      "step": 1000
+    },
+    {
+      "epoch": 0.33288948069241014,
+      "eval_accuracy": 0.507300481168077,
+      "eval_loss": 0.9467151165008545,
+      "eval_runtime": 211.5004,
+      "eval_samples_per_second": 113.986,
+      "eval_steps_per_second": 28.496,
+      "step": 1000
+    },
+    {
+      "epoch": 0.3362183754993342,
+      "grad_norm": 20.0,
+      "learning_rate": 1.3275632490013317e-05,
+      "loss": 1.0432,
+      "step": 1010
+    },
+    {
+      "epoch": 0.3395472703062583,
+      "grad_norm": 22.375,
+      "learning_rate": 1.3209054593874834e-05,
+      "loss": 1.0393,
+      "step": 1020
+    },
+    {
+      "epoch": 0.3428761651131824,
+      "grad_norm": 18.375,
+      "learning_rate": 1.3142476697736352e-05,
+      "loss": 1.0398,
+      "step": 1030
+    },
+    {
+      "epoch": 0.34620505992010653,
+      "grad_norm": 17.25,
+      "learning_rate": 1.307589880159787e-05,
+      "loss": 0.9225,
+      "step": 1040
+    },
+    {
+      "epoch": 0.34953395472703064,
+      "grad_norm": 11.75,
+      "learning_rate": 1.3009320905459388e-05,
+      "loss": 0.9262,
+      "step": 1050
+    },
+    {
+      "epoch": 0.35286284953395475,
+      "grad_norm": 10.25,
+      "learning_rate": 1.2942743009320906e-05,
+      "loss": 0.7248,
+      "step": 1060
+    },
+    {
+      "epoch": 0.3561917443408788,
+      "grad_norm": 12.875,
+      "learning_rate": 1.2876165113182424e-05,
+      "loss": 0.8358,
+      "step": 1070
+    },
+    {
+      "epoch": 0.3595206391478029,
+      "grad_norm": 4.65625,
+      "learning_rate": 1.2809587217043942e-05,
+      "loss": 0.8156,
+      "step": 1080
+    },
+    {
+      "epoch": 0.362849533954727,
+      "grad_norm": 23.5,
+      "learning_rate": 1.274300932090546e-05,
+      "loss": 1.0211,
+      "step": 1090
+    },
+    {
+      "epoch": 0.36617842876165113,
+      "grad_norm": 17.5,
+      "learning_rate": 1.2676431424766978e-05,
+      "loss": 0.9804,
+      "step": 1100
+    },
+    {
+      "epoch": 0.36950732356857524,
+      "grad_norm": 6.375,
+      "learning_rate": 1.2609853528628498e-05,
+      "loss": 0.9636,
+      "step": 1110
+    },
+    {
+      "epoch": 0.37283621837549935,
+      "grad_norm": 20.25,
+      "learning_rate": 1.2543275632490016e-05,
+      "loss": 0.9873,
+      "step": 1120
+    },
+    {
+      "epoch": 0.37616511318242346,
+      "grad_norm": 17.0,
+      "learning_rate": 1.2476697736351534e-05,
+      "loss": 0.9674,
+      "step": 1130
+    },
+    {
+      "epoch": 0.3794940079893475,
+      "grad_norm": 20.5,
+      "learning_rate": 1.241011984021305e-05,
+      "loss": 1.0073,
+      "step": 1140
+    },
+    {
+      "epoch": 0.3828229027962716,
+      "grad_norm": 4.125,
+      "learning_rate": 1.2343541944074568e-05,
+      "loss": 1.0919,
+      "step": 1150
+    },
+    {
+      "epoch": 0.38615179760319573,
+      "grad_norm": 15.3125,
+      "learning_rate": 1.2276964047936086e-05,
+      "loss": 1.0103,
+      "step": 1160
+    },
+    {
+      "epoch": 0.38948069241011984,
+      "grad_norm": 22.125,
+      "learning_rate": 1.2210386151797604e-05,
+      "loss": 0.9629,
+      "step": 1170
+    },
+    {
+      "epoch": 0.39280958721704395,
+      "grad_norm": 2.953125,
+      "learning_rate": 1.2143808255659122e-05,
+      "loss": 0.9545,
+      "step": 1180
+    },
+    {
+      "epoch": 0.39613848202396806,
+      "grad_norm": 7.59375,
+      "learning_rate": 1.207723035952064e-05,
+      "loss": 0.8836,
+      "step": 1190
+    },
+    {
+      "epoch": 0.3994673768308921,
+      "grad_norm": 28.125,
+      "learning_rate": 1.2010652463382158e-05,
+      "loss": 0.9632,
+      "step": 1200
+    },
+    {
+      "epoch": 0.40279627163781623,
+      "grad_norm": 15.125,
+      "learning_rate": 1.1944074567243676e-05,
+      "loss": 1.0765,
+      "step": 1210
+    },
+    {
+      "epoch": 0.40612516644474034,
+      "grad_norm": 11.625,
+      "learning_rate": 1.1877496671105194e-05,
+      "loss": 0.7911,
+      "step": 1220
+    },
+    {
+      "epoch": 0.40945406125166445,
+      "grad_norm": 15.5625,
+      "learning_rate": 1.1810918774966711e-05,
+      "loss": 1.0908,
+      "step": 1230
+    },
+    {
+      "epoch": 0.41278295605858856,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.1744340878828231e-05,
+      "loss": 1.0418,
+      "step": 1240
+    },
+    {
+      "epoch": 0.41611185086551267,
+      "grad_norm": 2.671875,
+      "learning_rate": 1.1677762982689749e-05,
+      "loss": 0.9041,
+      "step": 1250
+    },
+    {
+      "epoch": 0.4194407456724368,
+      "grad_norm": 19.125,
+      "learning_rate": 1.1611185086551267e-05,
+      "loss": 1.0392,
+      "step": 1260
+    },
+    {
+      "epoch": 0.42276964047936083,
+      "grad_norm": 6.8125,
+      "learning_rate": 1.1544607190412785e-05,
+      "loss": 0.8439,
+      "step": 1270
+    },
+    {
+      "epoch": 0.42609853528628494,
+      "grad_norm": 7.75,
+      "learning_rate": 1.1478029294274303e-05,
+      "loss": 1.0188,
+      "step": 1280
+    },
+    {
+      "epoch": 0.42942743009320905,
+      "grad_norm": 17.25,
+      "learning_rate": 1.141145139813582e-05,
+      "loss": 1.1353,
+      "step": 1290
+    },
+    {
+      "epoch": 0.43275632490013316,
+      "grad_norm": 22.125,
+      "learning_rate": 1.1344873501997337e-05,
+      "loss": 0.987,
+      "step": 1300
+    },
+    {
+      "epoch": 0.43608521970705727,
+      "grad_norm": 16.0,
+      "learning_rate": 1.1278295605858855e-05,
+      "loss": 0.8446,
+      "step": 1310
+    },
+    {
+      "epoch": 0.4394141145139814,
+      "grad_norm": 48.25,
+      "learning_rate": 1.1211717709720373e-05,
+      "loss": 1.0588,
+      "step": 1320
+    },
+    {
+      "epoch": 0.44274300932090543,
+      "grad_norm": 17.625,
+      "learning_rate": 1.1145139813581891e-05,
+      "loss": 0.8579,
+      "step": 1330
+    },
+    {
+      "epoch": 0.44607190412782954,
+      "grad_norm": 20.25,
+      "learning_rate": 1.1078561917443409e-05,
+      "loss": 1.0973,
+      "step": 1340
+    },
+    {
+      "epoch": 0.44940079893475365,
+      "grad_norm": 14.25,
+      "learning_rate": 1.1011984021304927e-05,
+      "loss": 0.975,
+      "step": 1350
+    },
+    {
+      "epoch": 0.45272969374167776,
+      "grad_norm": 22.875,
+      "learning_rate": 1.0945406125166447e-05,
+      "loss": 0.9035,
+      "step": 1360
+    },
+    {
+      "epoch": 0.4560585885486019,
+      "grad_norm": 13.0,
+      "learning_rate": 1.0878828229027965e-05,
+      "loss": 1.0748,
+      "step": 1370
+    },
+    {
+      "epoch": 0.459387483355526,
+      "grad_norm": 13.125,
+      "learning_rate": 1.0812250332889482e-05,
+      "loss": 1.0373,
+      "step": 1380
+    },
+    {
+      "epoch": 0.4627163781624501,
+      "grad_norm": 15.0,
+      "learning_rate": 1.0745672436751e-05,
+      "loss": 0.9623,
+      "step": 1390
+    },
+    {
+      "epoch": 0.46604527296937415,
+      "grad_norm": 26.25,
+      "learning_rate": 1.0679094540612518e-05,
+      "loss": 0.9579,
+      "step": 1400
+    },
+    {
+      "epoch": 0.46937416777629826,
+      "grad_norm": 12.125,
+      "learning_rate": 1.0612516644474036e-05,
+      "loss": 0.918,
+      "step": 1410
+    },
+    {
+      "epoch": 0.47270306258322237,
+      "grad_norm": 30.75,
+      "learning_rate": 1.0545938748335554e-05,
+      "loss": 0.9823,
+      "step": 1420
+    },
+    {
+      "epoch": 0.4760319573901465,
+      "grad_norm": 7.0,
+      "learning_rate": 1.047936085219707e-05,
+      "loss": 0.9885,
+      "step": 1430
+    },
+    {
+      "epoch": 0.4793608521970706,
+      "grad_norm": 19.5,
+      "learning_rate": 1.0412782956058588e-05,
+      "loss": 0.9654,
+      "step": 1440
+    },
+    {
+      "epoch": 0.4826897470039947,
+      "grad_norm": 13.5,
+      "learning_rate": 1.0346205059920106e-05,
+      "loss": 0.8212,
+      "step": 1450
+    },
+    {
+      "epoch": 0.48601864181091875,
+      "grad_norm": 17.0,
+      "learning_rate": 1.0279627163781624e-05,
+      "loss": 0.9406,
+      "step": 1460
+    },
+    {
+      "epoch": 0.48934753661784286,
+      "grad_norm": 28.0,
+      "learning_rate": 1.0213049267643142e-05,
+      "loss": 0.9908,
+      "step": 1470
+    },
+    {
+      "epoch": 0.49267643142476697,
+      "grad_norm": 14.25,
+      "learning_rate": 1.014647137150466e-05,
+      "loss": 0.9418,
+      "step": 1480
+    },
+    {
+      "epoch": 0.4960053262316911,
+      "grad_norm": 12.25,
+      "learning_rate": 1.007989347536618e-05,
+      "loss": 1.075,
+      "step": 1490
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "grad_norm": 10.625,
+      "learning_rate": 1.0013315579227698e-05,
+      "loss": 1.0111,
+      "step": 1500
+    },
+    {
+      "epoch": 0.4993342210386152,
+      "eval_accuracy": 0.508752281400365,
+      "eval_loss": 0.9421009421348572,
+      "eval_runtime": 209.8514,
+      "eval_samples_per_second": 114.881,
+      "eval_steps_per_second": 28.72,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5026631158455392,
+      "grad_norm": 31.5,
+      "learning_rate": 9.946737683089214e-06,
+      "loss": 0.9077,
+      "step": 1510
+    },
+    {
+      "epoch": 0.5059920106524634,
+      "grad_norm": 19.375,
+      "learning_rate": 9.880159786950732e-06,
+      "loss": 1.1292,
+      "step": 1520
+    },
+    {
+      "epoch": 0.5093209054593875,
+      "grad_norm": 8.4375,
+      "learning_rate": 9.813581890812252e-06,
+      "loss": 1.0658,
+      "step": 1530
+    },
+    {
+      "epoch": 0.5126498002663116,
+      "grad_norm": 17.375,
+      "learning_rate": 9.74700399467377e-06,
+      "loss": 0.9691,
+      "step": 1540
+    },
+    {
+      "epoch": 0.5159786950732357,
+      "grad_norm": 8.125,
+      "learning_rate": 9.680426098535288e-06,
+      "loss": 1.0146,
+      "step": 1550
+    },
+    {
+      "epoch": 0.5193075898801598,
+      "grad_norm": 17.5,
+      "learning_rate": 9.613848202396806e-06,
+      "loss": 1.0196,
+      "step": 1560
+    },
+    {
+      "epoch": 0.5226364846870839,
+      "grad_norm": 15.1875,
+      "learning_rate": 9.547270306258324e-06,
+      "loss": 0.8996,
+      "step": 1570
+    },
+    {
+      "epoch": 0.525965379494008,
+      "grad_norm": 55.5,
+      "learning_rate": 9.48069241011984e-06,
+      "loss": 1.0211,
+      "step": 1580
+    },
+    {
+      "epoch": 0.5292942743009321,
+      "grad_norm": 4.21875,
+      "learning_rate": 9.41411451398136e-06,
+      "loss": 0.9728,
+      "step": 1590
+    },
+    {
+      "epoch": 0.5326231691078562,
+      "grad_norm": 6.71875,
+      "learning_rate": 9.347536617842877e-06,
+      "loss": 1.0168,
+      "step": 1600
+    },
+    {
+      "epoch": 0.5359520639147803,
+      "grad_norm": 24.75,
+      "learning_rate": 9.280958721704395e-06,
+      "loss": 0.9252,
+      "step": 1610
+    },
+    {
+      "epoch": 0.5392809587217043,
+      "grad_norm": 9.0625,
+      "learning_rate": 9.214380825565913e-06,
+      "loss": 1.0479,
+      "step": 1620
+    },
+    {
+      "epoch": 0.5426098535286284,
+      "grad_norm": 15.375,
+      "learning_rate": 9.147802929427431e-06,
+      "loss": 1.0699,
+      "step": 1630
+    },
+    {
+      "epoch": 0.5459387483355526,
+      "grad_norm": 17.5,
+      "learning_rate": 9.08122503328895e-06,
+      "loss": 0.9407,
+      "step": 1640
+    },
+    {
+      "epoch": 0.5492676431424767,
+      "grad_norm": 9.625,
+      "learning_rate": 9.014647137150465e-06,
+      "loss": 1.0124,
+      "step": 1650
+    },
+    {
+      "epoch": 0.5525965379494008,
+      "grad_norm": 10.375,
+      "learning_rate": 8.948069241011985e-06,
+      "loss": 0.8612,
+      "step": 1660
+    },
+    {
+      "epoch": 0.5559254327563249,
+      "grad_norm": 21.375,
+      "learning_rate": 8.881491344873503e-06,
+      "loss": 0.9613,
+      "step": 1670
+    },
+    {
+      "epoch": 0.559254327563249,
+      "grad_norm": 9.3125,
+      "learning_rate": 8.814913448735021e-06,
+      "loss": 0.9752,
+      "step": 1680
+    },
+    {
+      "epoch": 0.5625832223701731,
+      "grad_norm": 8.375,
+      "learning_rate": 8.748335552596539e-06,
+      "loss": 0.8926,
+      "step": 1690
+    },
+    {
+      "epoch": 0.5659121171770972,
+      "grad_norm": 17.625,
+      "learning_rate": 8.681757656458057e-06,
+      "loss": 1.0054,
+      "step": 1700
+    },
+    {
+      "epoch": 0.5692410119840213,
+      "grad_norm": 20.125,
+      "learning_rate": 8.615179760319575e-06,
+      "loss": 0.9203,
+      "step": 1710
+    },
+    {
+      "epoch": 0.5725699067909454,
+      "grad_norm": 28.75,
+      "learning_rate": 8.548601864181093e-06,
+      "loss": 0.9583,
+      "step": 1720
+    },
+    {
+      "epoch": 0.5758988015978695,
+      "grad_norm": 17.0,
+      "learning_rate": 8.48202396804261e-06,
+      "loss": 0.9582,
+      "step": 1730
+    },
+    {
+      "epoch": 0.5792276964047937,
+      "grad_norm": 13.4375,
+      "learning_rate": 8.415446071904129e-06,
+      "loss": 1.0034,
+      "step": 1740
+    },
+    {
+      "epoch": 0.5825565912117177,
+      "grad_norm": 27.75,
+      "learning_rate": 8.348868175765647e-06,
+      "loss": 1.0647,
+      "step": 1750
+    },
+    {
+      "epoch": 0.5858854860186418,
+      "grad_norm": 33.75,
+      "learning_rate": 8.282290279627165e-06,
+      "loss": 1.0575,
+      "step": 1760
+    },
+    {
+      "epoch": 0.5892143808255659,
+      "grad_norm": 14.375,
+      "learning_rate": 8.215712383488683e-06,
+      "loss": 1.0699,
+      "step": 1770
+    },
+    {
+      "epoch": 0.59254327563249,
+      "grad_norm": 6.09375,
+      "learning_rate": 8.1491344873502e-06,
+      "loss": 0.7706,
+      "step": 1780
+    },
+    {
+      "epoch": 0.5958721704394141,
+      "grad_norm": 8.5625,
+      "learning_rate": 8.082556591211719e-06,
+      "loss": 0.8626,
+      "step": 1790
+    },
+    {
+      "epoch": 0.5992010652463382,
+      "grad_norm": 7.46875,
+      "learning_rate": 8.015978695073236e-06,
+      "loss": 0.9427,
+      "step": 1800
+    },
+    {
+      "epoch": 0.6025299600532623,
+      "grad_norm": 23.375,
+      "learning_rate": 7.949400798934754e-06,
+      "loss": 0.962,
+      "step": 1810
+    },
+    {
+      "epoch": 0.6058588548601864,
+      "grad_norm": 7.65625,
+      "learning_rate": 7.882822902796272e-06,
+      "loss": 1.1216,
+      "step": 1820
+    },
+    {
+      "epoch": 0.6091877496671105,
+      "grad_norm": 10.0625,
+      "learning_rate": 7.81624500665779e-06,
+      "loss": 0.9076,
+      "step": 1830
+    },
+    {
+      "epoch": 0.6125166444740346,
+      "grad_norm": 6.78125,
+      "learning_rate": 7.749667110519308e-06,
+      "loss": 0.8158,
+      "step": 1840
+    },
+    {
+      "epoch": 0.6158455392809588,
+      "grad_norm": 29.5,
+      "learning_rate": 7.683089214380826e-06,
+      "loss": 0.9828,
+      "step": 1850
+    },
+    {
+      "epoch": 0.6191744340878829,
+      "grad_norm": 9.6875,
+      "learning_rate": 7.616511318242344e-06,
+      "loss": 0.9181,
+      "step": 1860
+    },
+    {
+      "epoch": 0.622503328894807,
+      "grad_norm": 7.6875,
+      "learning_rate": 7.549933422103862e-06,
+      "loss": 0.9864,
+      "step": 1870
+    },
+    {
+      "epoch": 0.625832223701731,
+      "grad_norm": 33.0,
+      "learning_rate": 7.48335552596538e-06,
+      "loss": 0.9898,
+      "step": 1880
+    },
+    {
+      "epoch": 0.6291611185086551,
+      "grad_norm": 13.1875,
+      "learning_rate": 7.416777629826898e-06,
+      "loss": 0.948,
+      "step": 1890
+    },
+    {
+      "epoch": 0.6324900133155792,
+      "grad_norm": 6.59375,
+      "learning_rate": 7.350199733688416e-06,
+      "loss": 0.9811,
+      "step": 1900
+    },
+    {
+      "epoch": 0.6358189081225033,
+      "grad_norm": 8.5625,
+      "learning_rate": 7.283621837549935e-06,
+      "loss": 1.0461,
+      "step": 1910
+    },
+    {
+      "epoch": 0.6391478029294274,
+      "grad_norm": 30.25,
+      "learning_rate": 7.217043941411453e-06,
+      "loss": 1.0408,
+      "step": 1920
+    },
+    {
+      "epoch": 0.6424766977363515,
+      "grad_norm": 25.625,
+      "learning_rate": 7.15046604527297e-06,
+      "loss": 1.0327,
+      "step": 1930
+    },
+    {
+      "epoch": 0.6458055925432756,
+      "grad_norm": 27.125,
+      "learning_rate": 7.083888149134488e-06,
+      "loss": 0.9213,
+      "step": 1940
+    },
+    {
+      "epoch": 0.6491344873501997,
+      "grad_norm": 22.125,
+      "learning_rate": 7.017310252996006e-06,
+      "loss": 1.1191,
+      "step": 1950
+    },
+    {
+      "epoch": 0.6524633821571239,
+      "grad_norm": 10.6875,
+      "learning_rate": 6.950732356857524e-06,
+      "loss": 0.9514,
+      "step": 1960
+    },
+    {
+      "epoch": 0.655792276964048,
+      "grad_norm": 16.375,
+      "learning_rate": 6.884154460719042e-06,
+      "loss": 0.9437,
+      "step": 1970
+    },
+    {
+      "epoch": 0.6591211717709721,
+      "grad_norm": 7.5625,
+      "learning_rate": 6.8175765645805605e-06,
+      "loss": 1.0645,
+      "step": 1980
+    },
+    {
+      "epoch": 0.6624500665778962,
+      "grad_norm": 9.125,
+      "learning_rate": 6.7509986684420784e-06,
+      "loss": 0.9085,
+      "step": 1990
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "grad_norm": 9.6875,
+      "learning_rate": 6.6844207723035955e-06,
+      "loss": 0.9784,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6657789613848203,
+      "eval_accuracy": 0.5068027210884354,
+      "eval_loss": 0.9414482712745667,
+      "eval_runtime": 213.4336,
+      "eval_samples_per_second": 112.953,
+      "eval_steps_per_second": 28.238,
+      "step": 2000
+    },
+    {
+      "epoch": 0.6691078561917443,
+      "grad_norm": 22.125,
+      "learning_rate": 6.6178428761651135e-06,
+      "loss": 0.9537,
+      "step": 2010
+    },
+    {
+      "epoch": 0.6724367509986684,
+      "grad_norm": 19.5,
+      "learning_rate": 6.5512649800266314e-06,
+      "loss": 1.0052,
+      "step": 2020
+    },
+    {
+      "epoch": 0.6757656458055925,
+      "grad_norm": 22.75,
+      "learning_rate": 6.484687083888149e-06,
+      "loss": 1.0513,
+      "step": 2030
+    },
+    {
+      "epoch": 0.6790945406125166,
+      "grad_norm": 14.9375,
+      "learning_rate": 6.418109187749668e-06,
+      "loss": 1.0479,
+      "step": 2040
+    },
+    {
+      "epoch": 0.6824234354194407,
+      "grad_norm": 15.6875,
+      "learning_rate": 6.351531291611186e-06,
+      "loss": 0.9613,
+      "step": 2050
+    },
+    {
+      "epoch": 0.6857523302263648,
+      "grad_norm": 9.5625,
+      "learning_rate": 6.284953395472704e-06,
+      "loss": 0.98,
+      "step": 2060
+    },
+    {
+      "epoch": 0.689081225033289,
+      "grad_norm": 11.375,
+      "learning_rate": 6.218375499334221e-06,
+      "loss": 0.8031,
+      "step": 2070
+    },
+    {
+      "epoch": 0.6924101198402131,
+      "grad_norm": 8.5,
+      "learning_rate": 6.151797603195739e-06,
+      "loss": 0.9613,
+      "step": 2080
+    },
+    {
+      "epoch": 0.6957390146471372,
+      "grad_norm": 8.1875,
+      "learning_rate": 6.085219707057257e-06,
+      "loss": 1.0147,
+      "step": 2090
+    },
+    {
+      "epoch": 0.6990679094540613,
+      "grad_norm": 12.125,
+      "learning_rate": 6.018641810918775e-06,
+      "loss": 0.9856,
+      "step": 2100
+    },
+    {
+      "epoch": 0.7023968042609854,
+      "grad_norm": 26.0,
+      "learning_rate": 5.952063914780294e-06,
+      "loss": 0.9457,
+      "step": 2110
+    },
+    {
+      "epoch": 0.7057256990679095,
+      "grad_norm": 25.875,
+      "learning_rate": 5.885486018641812e-06,
+      "loss": 0.9133,
+      "step": 2120
+    },
+    {
+      "epoch": 0.7090545938748336,
+      "grad_norm": 25.0,
+      "learning_rate": 5.81890812250333e-06,
+      "loss": 1.0304,
+      "step": 2130
+    },
+    {
+      "epoch": 0.7123834886817576,
+      "grad_norm": 27.375,
+      "learning_rate": 5.752330226364847e-06,
+      "loss": 1.1041,
+      "step": 2140
+    },
+    {
+      "epoch": 0.7157123834886817,
+      "grad_norm": 15.75,
+      "learning_rate": 5.685752330226365e-06,
+      "loss": 1.0324,
+      "step": 2150
+    },
+    {
+      "epoch": 0.7190412782956058,
+      "grad_norm": 13.4375,
+      "learning_rate": 5.619174434087883e-06,
+      "loss": 0.9786,
+      "step": 2160
+    },
+    {
+      "epoch": 0.7223701731025299,
+      "grad_norm": 7.53125,
+      "learning_rate": 5.5525965379494016e-06,
+      "loss": 1.048,
+      "step": 2170
+    },
+    {
+      "epoch": 0.725699067909454,
+      "grad_norm": 23.75,
+      "learning_rate": 5.4860186418109195e-06,
+      "loss": 0.9347,
+      "step": 2180
+    },
+    {
+      "epoch": 0.7290279627163782,
+      "grad_norm": 30.875,
+      "learning_rate": 5.4194407456724375e-06,
+      "loss": 0.929,
+      "step": 2190
+    },
+    {
+      "epoch": 0.7323568575233023,
+      "grad_norm": 9.5625,
+      "learning_rate": 5.3528628495339554e-06,
+      "loss": 0.8509,
+      "step": 2200
+    },
+    {
+      "epoch": 0.7356857523302264,
+      "grad_norm": 31.125,
+      "learning_rate": 5.286284953395473e-06,
+      "loss": 1.0614,
+      "step": 2210
+    },
+    {
+      "epoch": 0.7390146471371505,
+      "grad_norm": 33.5,
+      "learning_rate": 5.2197070572569905e-06,
+      "loss": 0.9723,
+      "step": 2220
+    },
+    {
+      "epoch": 0.7423435419440746,
+      "grad_norm": 9.0,
+      "learning_rate": 5.1531291611185084e-06,
+      "loss": 0.8864,
+      "step": 2230
+    },
+    {
+      "epoch": 0.7456724367509987,
+      "grad_norm": 26.0,
+      "learning_rate": 5.086551264980027e-06,
+      "loss": 1.004,
+      "step": 2240
+    },
+    {
+      "epoch": 0.7490013315579228,
+      "grad_norm": 26.5,
+      "learning_rate": 5.019973368841545e-06,
+      "loss": 0.9622,
+      "step": 2250
+    },
+    {
+      "epoch": 0.7523302263648469,
+      "grad_norm": 12.75,
+      "learning_rate": 4.953395472703063e-06,
+      "loss": 0.9793,
+      "step": 2260
+    },
+    {
+      "epoch": 0.7556591211717709,
+      "grad_norm": 7.65625,
+      "learning_rate": 4.886817576564581e-06,
+      "loss": 0.7945,
+      "step": 2270
+    },
+    {
+      "epoch": 0.758988015978695,
+      "grad_norm": 16.875,
+      "learning_rate": 4.820239680426099e-06,
+      "loss": 1.151,
+      "step": 2280
+    },
+    {
+      "epoch": 0.7623169107856191,
+      "grad_norm": 14.1875,
+      "learning_rate": 4.753661784287617e-06,
+      "loss": 0.8101,
+      "step": 2290
+    },
+    {
+      "epoch": 0.7656458055925432,
+      "grad_norm": 5.53125,
+      "learning_rate": 4.687083888149135e-06,
+      "loss": 0.9417,
+      "step": 2300
+    },
+    {
+      "epoch": 0.7689747003994674,
+      "grad_norm": 14.75,
+      "learning_rate": 4.620505992010653e-06,
+      "loss": 0.9002,
+      "step": 2310
+    },
+    {
+      "epoch": 0.7723035952063915,
+      "grad_norm": 13.9375,
+      "learning_rate": 4.553928095872171e-06,
+      "loss": 1.0102,
+      "step": 2320
+    },
+    {
+      "epoch": 0.7756324900133156,
+      "grad_norm": 7.9375,
+      "learning_rate": 4.487350199733689e-06,
+      "loss": 0.9029,
+      "step": 2330
+    },
+    {
+      "epoch": 0.7789613848202397,
+      "grad_norm": 25.25,
+      "learning_rate": 4.420772303595207e-06,
+      "loss": 1.0435,
+      "step": 2340
+    },
+    {
+      "epoch": 0.7822902796271638,
+      "grad_norm": 3.53125,
+      "learning_rate": 4.354194407456725e-06,
+      "loss": 0.8851,
+      "step": 2350
+    },
+    {
+      "epoch": 0.7856191744340879,
+      "grad_norm": 36.0,
+      "learning_rate": 4.287616511318243e-06,
+      "loss": 0.9339,
+      "step": 2360
+    },
+    {
+      "epoch": 0.788948069241012,
+      "grad_norm": 25.375,
+      "learning_rate": 4.221038615179761e-06,
+      "loss": 0.943,
+      "step": 2370
+    },
+    {
+      "epoch": 0.7922769640479361,
+      "grad_norm": 6.90625,
+      "learning_rate": 4.1544607190412786e-06,
+      "loss": 0.969,
+      "step": 2380
+    },
+    {
+      "epoch": 0.7956058588548602,
+      "grad_norm": 29.375,
+      "learning_rate": 4.0878828229027965e-06,
+      "loss": 0.9423,
+      "step": 2390
+    },
+    {
+      "epoch": 0.7989347536617842,
+      "grad_norm": 9.125,
+      "learning_rate": 4.0213049267643145e-06,
+      "loss": 0.8299,
+      "step": 2400
+    },
+    {
+      "epoch": 0.8022636484687083,
+      "grad_norm": 9.75,
+      "learning_rate": 3.9547270306258324e-06,
+      "loss": 1.0702,
+      "step": 2410
+    },
+    {
+      "epoch": 0.8055925432756325,
+      "grad_norm": 13.1875,
+      "learning_rate": 3.88814913448735e-06,
+      "loss": 1.03,
+      "step": 2420
+    },
+    {
+      "epoch": 0.8089214380825566,
+      "grad_norm": 10.4375,
+      "learning_rate": 3.821571238348868e-06,
+      "loss": 0.9467,
+      "step": 2430
+    },
+    {
+      "epoch": 0.8122503328894807,
+      "grad_norm": 6.46875,
+      "learning_rate": 3.7549933422103863e-06,
+      "loss": 1.0284,
+      "step": 2440
+    },
+    {
+      "epoch": 0.8155792276964048,
+      "grad_norm": 9.75,
+      "learning_rate": 3.6884154460719047e-06,
+      "loss": 1.0224,
+      "step": 2450
+    },
+    {
+      "epoch": 0.8189081225033289,
+      "grad_norm": 11.0625,
+      "learning_rate": 3.621837549933422e-06,
+      "loss": 0.8798,
+      "step": 2460
+    },
+    {
+      "epoch": 0.822237017310253,
+      "grad_norm": 6.25,
+      "learning_rate": 3.55525965379494e-06,
+      "loss": 0.9709,
+      "step": 2470
+    },
+    {
+      "epoch": 0.8255659121171771,
+      "grad_norm": 35.5,
+      "learning_rate": 3.4886817576564585e-06,
+      "loss": 1.0462,
+      "step": 2480
+    },
+    {
+      "epoch": 0.8288948069241012,
+      "grad_norm": 19.25,
+      "learning_rate": 3.4221038615179765e-06,
+      "loss": 0.9246,
+      "step": 2490
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "grad_norm": 11.9375,
+      "learning_rate": 3.355525965379494e-06,
+      "loss": 0.9595,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8322237017310253,
+      "eval_accuracy": 0.5073834411813506,
+      "eval_loss": 0.9414454698562622,
+      "eval_runtime": 212.4541,
+      "eval_samples_per_second": 113.474,
+      "eval_steps_per_second": 28.368,
+      "step": 2500
+    },
+    {
+      "epoch": 0.8355525965379494,
+      "grad_norm": 7.1875,
+      "learning_rate": 3.2889480692410124e-06,
+      "loss": 0.9402,
+      "step": 2510
+    },
+    {
+      "epoch": 0.8388814913448736,
+      "grad_norm": 26.875,
+      "learning_rate": 3.2223701731025303e-06,
+      "loss": 1.0118,
+      "step": 2520
+    },
+    {
+      "epoch": 0.8422103861517976,
+      "grad_norm": 5.53125,
+      "learning_rate": 3.1557922769640483e-06,
+      "loss": 0.9671,
+      "step": 2530
+    },
+    {
+      "epoch": 0.8455392809587217,
+      "grad_norm": 10.5,
+      "learning_rate": 3.089214380825566e-06,
+      "loss": 0.864,
+      "step": 2540
+    },
+    {
+      "epoch": 0.8488681757656458,
+      "grad_norm": 14.6875,
+      "learning_rate": 3.022636484687084e-06,
+      "loss": 0.8927,
+      "step": 2550
+    },
+    {
+      "epoch": 0.8521970705725699,
+      "grad_norm": 9.625,
+      "learning_rate": 2.956058588548602e-06,
+      "loss": 0.9096,
+      "step": 2560
+    },
+    {
+      "epoch": 0.855525965379494,
+      "grad_norm": 10.9375,
+      "learning_rate": 2.8894806924101197e-06,
+      "loss": 0.9587,
+      "step": 2570
+    },
+    {
+      "epoch": 0.8588548601864181,
+      "grad_norm": 25.375,
+      "learning_rate": 2.822902796271638e-06,
+      "loss": 1.0519,
+      "step": 2580
+    },
+    {
+      "epoch": 0.8621837549933422,
+      "grad_norm": 23.375,
+      "learning_rate": 2.756324900133156e-06,
+      "loss": 1.1253,
+      "step": 2590
+    },
+    {
+      "epoch": 0.8655126498002663,
+      "grad_norm": 43.0,
+      "learning_rate": 2.689747003994674e-06,
+      "loss": 0.9927,
+      "step": 2600
+    },
+    {
+      "epoch": 0.8688415446071904,
+      "grad_norm": 19.75,
+      "learning_rate": 2.6231691078561923e-06,
+      "loss": 0.9283,
+      "step": 2610
+    },
+    {
+      "epoch": 0.8721704394141145,
+      "grad_norm": 10.0625,
+      "learning_rate": 2.55659121171771e-06,
+      "loss": 1.0127,
+      "step": 2620
+    },
+    {
+      "epoch": 0.8754993342210386,
+      "grad_norm": 14.875,
+      "learning_rate": 2.490013315579228e-06,
+      "loss": 0.9217,
+      "step": 2630
+    },
+    {
+      "epoch": 0.8788282290279628,
+      "grad_norm": 7.90625,
+      "learning_rate": 2.4234354194407458e-06,
+      "loss": 0.9184,
+      "step": 2640
+    },
+    {
+      "epoch": 0.8821571238348869,
+      "grad_norm": 11.1875,
+      "learning_rate": 2.3568575233022637e-06,
+      "loss": 0.9643,
+      "step": 2650
+    },
+    {
+      "epoch": 0.8854860186418109,
+      "grad_norm": 5.84375,
+      "learning_rate": 2.2902796271637817e-06,
+      "loss": 0.9684,
+      "step": 2660
+    },
+    {
+      "epoch": 0.888814913448735,
+      "grad_norm": 13.75,
+      "learning_rate": 2.2237017310252996e-06,
+      "loss": 0.9295,
+      "step": 2670
+    },
+    {
+      "epoch": 0.8921438082556591,
+      "grad_norm": 20.125,
+      "learning_rate": 2.157123834886818e-06,
+      "loss": 0.9395,
+      "step": 2680
+    },
+    {
+      "epoch": 0.8954727030625832,
+      "grad_norm": 8.1875,
+      "learning_rate": 2.0905459387483355e-06,
+      "loss": 1.0618,
+      "step": 2690
+    },
+    {
+      "epoch": 0.8988015978695073,
+      "grad_norm": 24.125,
+      "learning_rate": 2.023968042609854e-06,
+      "loss": 0.9056,
+      "step": 2700
+    },
+    {
+      "epoch": 0.9021304926764314,
+      "grad_norm": 8.5625,
+      "learning_rate": 1.9573901464713714e-06,
+      "loss": 0.8271,
+      "step": 2710
+    },
+    {
+      "epoch": 0.9054593874833555,
+      "grad_norm": 13.625,
+      "learning_rate": 1.8908122503328896e-06,
+      "loss": 0.9926,
+      "step": 2720
+    },
+    {
+      "epoch": 0.9087882822902796,
+      "grad_norm": 10.0625,
+      "learning_rate": 1.8242343541944078e-06,
+      "loss": 0.9625,
+      "step": 2730
+    },
+    {
+      "epoch": 0.9121171770972037,
+      "grad_norm": 12.5,
+      "learning_rate": 1.7576564580559255e-06,
+      "loss": 0.9791,
+      "step": 2740
+    },
+    {
+      "epoch": 0.9154460719041279,
+      "grad_norm": 33.0,
+      "learning_rate": 1.6910785619174437e-06,
+      "loss": 0.9139,
+      "step": 2750
+    },
+    {
+      "epoch": 0.918774966711052,
+      "grad_norm": 7.1875,
+      "learning_rate": 1.6245006657789616e-06,
+      "loss": 0.8434,
+      "step": 2760
+    },
+    {
+      "epoch": 0.9221038615179761,
+      "grad_norm": 20.5,
+      "learning_rate": 1.5579227696404794e-06,
+      "loss": 0.9575,
+      "step": 2770
+    },
+    {
+      "epoch": 0.9254327563249002,
+      "grad_norm": 15.5,
+      "learning_rate": 1.4913448735019975e-06,
+      "loss": 1.0703,
+      "step": 2780
+    },
+    {
+      "epoch": 0.9287616511318242,
+      "grad_norm": 26.875,
+      "learning_rate": 1.4247669773635153e-06,
+      "loss": 1.0057,
+      "step": 2790
+    },
+    {
+      "epoch": 0.9320905459387483,
+      "grad_norm": 10.1875,
+      "learning_rate": 1.3581890812250334e-06,
+      "loss": 0.7978,
+      "step": 2800
+    },
+    {
+      "epoch": 0.9354194407456724,
+      "grad_norm": 15.0,
+      "learning_rate": 1.2916111850865514e-06,
+      "loss": 0.9641,
+      "step": 2810
+    },
+    {
+      "epoch": 0.9387483355525965,
+      "grad_norm": 12.0625,
+      "learning_rate": 1.2250332889480693e-06,
+      "loss": 1.0366,
+      "step": 2820
+    },
+    {
+      "epoch": 0.9420772303595206,
+      "grad_norm": 8.125,
+      "learning_rate": 1.1584553928095873e-06,
+      "loss": 1.0024,
+      "step": 2830
+    },
+    {
+      "epoch": 0.9454061251664447,
+      "grad_norm": 15.4375,
+      "learning_rate": 1.0918774966711052e-06,
+      "loss": 1.0015,
+      "step": 2840
+    },
+    {
+      "epoch": 0.9487350199733688,
+      "grad_norm": 6.3125,
+      "learning_rate": 1.0252996005326232e-06,
+      "loss": 1.0035,
+      "step": 2850
+    },
+    {
+      "epoch": 0.952063914780293,
+      "grad_norm": 26.25,
+      "learning_rate": 9.587217043941411e-07,
+      "loss": 1.0925,
+      "step": 2860
+    },
+    {
+      "epoch": 0.9553928095872171,
+      "grad_norm": 31.625,
+      "learning_rate": 8.921438082556592e-07,
+      "loss": 0.9177,
+      "step": 2870
+    },
+    {
+      "epoch": 0.9587217043941412,
+      "grad_norm": 25.125,
+      "learning_rate": 8.255659121171772e-07,
+      "loss": 1.128,
+      "step": 2880
+    },
+    {
+      "epoch": 0.9620505992010653,
+      "grad_norm": 15.875,
+      "learning_rate": 7.589880159786951e-07,
+      "loss": 1.0045,
+      "step": 2890
+    },
+    {
+      "epoch": 0.9653794940079894,
+      "grad_norm": 13.6875,
+      "learning_rate": 6.924101198402131e-07,
+      "loss": 1.0835,
+      "step": 2900
+    },
+    {
+      "epoch": 0.9687083888149135,
+      "grad_norm": 19.375,
+      "learning_rate": 6.258322237017311e-07,
+      "loss": 0.9599,
+      "step": 2910
+    },
+    {
+      "epoch": 0.9720372836218375,
+      "grad_norm": 13.1875,
+      "learning_rate": 5.592543275632491e-07,
+      "loss": 0.9699,
+      "step": 2920
+    },
+    {
+      "epoch": 0.9753661784287616,
+      "grad_norm": 23.0,
+      "learning_rate": 4.92676431424767e-07,
+      "loss": 1.0096,
+      "step": 2930
+    },
+    {
+      "epoch": 0.9786950732356857,
+      "grad_norm": 24.875,
+      "learning_rate": 4.2609853528628503e-07,
+      "loss": 1.0898,
+      "step": 2940
+    },
+    {
+      "epoch": 0.9820239680426098,
+      "grad_norm": 5.6875,
+      "learning_rate": 3.5952063914780293e-07,
+      "loss": 0.9168,
+      "step": 2950
+    },
+    {
+      "epoch": 0.9853528628495339,
+      "grad_norm": 14.0,
+      "learning_rate": 2.9294274300932093e-07,
+      "loss": 1.0706,
+      "step": 2960
+    },
+    {
+      "epoch": 0.988681757656458,
+      "grad_norm": 11.25,
+      "learning_rate": 2.263648468708389e-07,
+      "loss": 1.0224,
+      "step": 2970
+    },
+    {
+      "epoch": 0.9920106524633822,
+      "grad_norm": 6.0625,
+      "learning_rate": 1.5978695073235687e-07,
+      "loss": 1.092,
+      "step": 2980
+    },
+    {
+      "epoch": 0.9953395472703063,
+      "grad_norm": 18.375,
+      "learning_rate": 9.320905459387485e-08,
+      "loss": 0.9978,
+      "step": 2990
+    },
+    {
+      "epoch": 0.9986684420772304,
+      "grad_norm": 6.21875,
+      "learning_rate": 2.6631158455392814e-08,
+      "loss": 0.9865,
+      "step": 3000
+    },
+    {
+      "epoch": 0.9986684420772304,
+      "eval_accuracy": 0.5078812012609922,
+      "eval_loss": 0.9415724873542786,
+      "eval_runtime": 213.578,
+      "eval_samples_per_second": 112.877,
+      "eval_steps_per_second": 28.219,
+      "step": 3000
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3004,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
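
The `learning_rate` values logged above look consistent with a linear decay to zero at `max_steps` (3004) with no warmup. The sketch below checks that reading against three logged entries. The base rate of 2e-5 is an assumption inferred from the logged numbers, not a value stated in this file, and `linear_lr` is a hypothetical helper, not part of any library.

```python
MAX_STEPS = 3004   # "max_steps" from trainer_state.json
BASE_LR = 2e-5     # assumption: inferred from the logged values, not stated in the file

def linear_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Learning rate under linear decay to zero with no warmup (assumed schedule)."""
    return base_lr * max(0, max_steps - step) / max_steps

# (step, learning_rate) pairs copied from the log_history entries above.
LOGGED = {
    2000: 6.6844207723035955e-06,
    2500: 3.355525965379494e-06,
    3000: 2.6631158455392814e-08,
}

def max_relative_error(logged=LOGGED):
    """Largest relative gap between the logged and the reconstructed learning rate."""
    return max(abs(linear_lr(step) - lr) / lr for step, lr in logged.items())

if __name__ == "__main__":
    for step, lr in LOGGED.items():
        print(step, lr, linear_lr(step))
    print("max relative error:", max_relative_error())
```

If the relative error is negligible, the schedule assumption holds for these steps; it does not by itself confirm which scheduler class produced the values.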

checkpoint-3004/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176
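
The three lines above are a Git LFS pointer rather than the binary itself: a spec version, an object id (hash algorithm plus digest), and the size in bytes. As a minimal sketch, they can be parsed as space-separated key/value pairs. `parse_lfs_pointer` is a hypothetical helper written for illustration, and stripping the leading `+` is only there to cope with the diff prefix shown on this page.

```python
POINTER = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:cad4c1f90cca8627b67272acf21697ce26139a89a13bf55f49567978573bff87
+size 5176
"""

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer (optionally with '+' diff prefixes) into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+").strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }

if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```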

checkpoint-500/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: gpt2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-500/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "gpt2",
+  "bias": "none",
+  "fan_in_fan_out": true,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
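
To make the adapter settings above concrete, the sketch below works out two quantities they imply: the LoRA scaling factor (`lora_alpha / r`, since `use_rslora` is false) and a rough count of LoRA parameters added to `c_attn`. The GPT-2 dimensions (12 layers, hidden size 768, `c_attn` output width 3 × 768) are assumptions about the base `gpt2` checkpoint and do not appear in this diff. The count excludes the `classifier`/`score` head listed under `modules_to_save`.

```python
# Values taken from adapter_config.json above.
LORA_R = 8
LORA_ALPHA = 32

# Assumptions about the base "gpt2" model (not stated in this diff).
N_LAYERS = 12
HIDDEN = 768
C_ATTN_OUT = 3 * HIDDEN  # fused query/key/value projection

def lora_scaling(alpha, r):
    """Standard LoRA scaling applied to the low-rank update (use_rslora is false)."""
    return alpha / r

def lora_params_per_module(in_features, out_features, r):
    """Parameters in the two low-rank factors: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

def total_lora_params():
    """LoRA parameters over all c_attn modules, one per transformer layer."""
    return N_LAYERS * lora_params_per_module(HIDDEN, C_ATTN_OUT, LORA_R)

if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_ALPHA, LORA_R))
    print("LoRA params in c_attn:", total_lora_params())
```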