Training in progress, step 385, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +31 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +23 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +73 -0
last-checkpoint/trainer_state.json +2744 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: beomi/polyglot-ko-12.8b-safetensors
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "beomi/polyglot-ko-12.8b-safetensors",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense_4h_to_h",
+    "dense_h_to_4h",
+    "dense"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
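
As a reading aid for the LoRA settings above, here is a minimal sketch that derives the effective LoRA scaling factor from `r`, `lora_alpha`, and `use_rslora`, using the usual PEFT convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper names `lora_scaling` and `lora_param_count` are hypothetical and not part of this checkpoint or of PEFT; the config dictionary copies only the relevant fields from the file above.

```python
import json
import math

# Subset of adapter_config.json, copied from the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "r": 8,
  "target_modules": ["query_key_value", "dense_4h_to_h", "dense_h_to_4h", "dense"],
  "use_rslora": false
}
""")


def lora_scaling(config):
    """Return the factor applied to the low-rank update B @ A (hypothetical helper)."""
    if config.get("use_rslora"):
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


def lora_param_count(in_features, out_features, r):
    """Trainable parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    # The layer sizes below are placeholders, not the base model's real dimensions.
    print("params for a 4 -> 6 layer:", lora_param_count(4, 6, ADAPTER_CONFIG["r"]))
```

With `r = 8` and `lora_alpha = 16`, the update is scaled by 2.0. Multiplying `lora_param_count` over the real shapes of the four target modules in every layer would give the adapter's total trainable parameter count, but those shapes are not stated in this diff.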

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0675a27ef3d96ab44e6ebd079be5530168710017a48f3e2e02053473cf990f8
+size 104902272

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1fd95ea42653b72693d47208698df28b2357bf072eddac771f7a93233f994c1
+size 53623316

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1d46c42c306f63e31b60c47c81a26311528b02cfebf6f431748382e6ff5912c6
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5c7907d50db90844f248b22e6ade47b806b24b4ec4aa7121507687d73e6c6190
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|sep|>",
+    "<|acc|>",
+    "<|tel|>",
+    "<|rrn|>"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,73 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|unused0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|unused1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<|sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30000": {
+      "content": "<|acc|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30001": {
+      "content": "<|tel|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30002": {
+      "content": "<|rrn|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|sep|>",
+    "<|acc|>",
+    "<|tel|>",
+    "<|rrn|>"
+  ],
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}
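
To make the `chat_template` above easier to read, the following is a plain-Python sketch of the string the Jinja template appears to build. It is not how the tokenizer library renders templates, and `render_chat` is a hypothetical name. Two caveats come from the config itself: the template references `bos_token`, which this file does not define, so it is an explicit parameter here; and the `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>` markers are not among the `added_tokens_decoder` entries shown, so they may be tokenized as ordinary text.

```python
def render_chat(messages, bos_token="", add_generation_prompt=False):
    """Mirror the chat_template logic in plain Python (illustrative sketch only).

    bos_token is an assumption: the tokenizer config above sets no bos_token.
    """
    parts = []
    for index, message in enumerate(messages):
        # Jinja's `| trim` binds to message['content'] only, so only the content is stripped.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        if index == 0:
            # The first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    example = [{"role": "user", "content": "  hello  "}]
    print(repr(render_chat(example, add_generation_prompt=True)))
```

Each turn is wrapped in a role header and terminated by `<|eot_id|>`, while the tokenizer's `eos_token` and `pad_token` are both `<|endoftext|>`. It may be worth checking that generation stops on the intended marker.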

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2744 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.03079384123175365,
+  "eval_steps": 385,
+  "global_step": 385,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 7.998400319936013e-05,
+      "grad_norm": 1.0834295749664307,
+      "learning_rate": 2e-05,
+      "loss": 8.3241,
+      "step": 1
+    },
+    {
+      "epoch": 7.998400319936013e-05,
+      "eval_loss": 1.843304991722107,
+      "eval_runtime": 289.2804,
+      "eval_samples_per_second": 18.2,
+      "eval_steps_per_second": 9.102,
+      "step": 1
+    },
+    {
+      "epoch": 0.00015996800639872026,
+      "grad_norm": 1.1767560243606567,
+      "learning_rate": 4e-05,
+      "loss": 7.4616,
+      "step": 2
+    },
+    {
+      "epoch": 0.0002399520095980804,
+      "grad_norm": 1.1885181665420532,
+      "learning_rate": 6e-05,
+      "loss": 9.0656,
+      "step": 3
+    },
+    {
+      "epoch": 0.0003199360127974405,
+      "grad_norm": 0.9075803756713867,
+      "learning_rate": 8e-05,
+      "loss": 7.0921,
+      "step": 4
+    },
+    {
+      "epoch": 0.0003999200159968006,
+      "grad_norm": 1.4569883346557617,
+      "learning_rate": 0.0001,
+      "loss": 9.2885,
+      "step": 5
+    },
+    {
+      "epoch": 0.0004799040191961608,
+      "grad_norm": 1.3345720767974854,
+      "learning_rate": 0.00012,
+      "loss": 6.484,
+      "step": 6
+    },
+    {
+      "epoch": 0.0005598880223955209,
+      "grad_norm": 1.5282152891159058,
+      "learning_rate": 0.00014,
+      "loss": 6.8528,
+      "step": 7
+    },
+    {
+      "epoch": 0.000639872025594881,
+      "grad_norm": 2.0517117977142334,
+      "learning_rate": 0.00016,
+      "loss": 6.8443,
+      "step": 8
+    },
+    {
+      "epoch": 0.0007198560287942411,
+      "grad_norm": 2.188291311264038,
+      "learning_rate": 0.00018,
+      "loss": 7.3742,
+      "step": 9
+    },
+    {
+      "epoch": 0.0007998400319936012,
+      "grad_norm": 2.3675291538238525,
+      "learning_rate": 0.0002,
+      "loss": 7.1504,
+      "step": 10
+    },
+    {
+      "epoch": 0.0008798240351929614,
+      "grad_norm": 2.468874931335449,
+      "learning_rate": 0.0001999997891921711,
+      "loss": 4.3571,
+      "step": 11
+    },
+    {
+      "epoch": 0.0009598080383923216,
+      "grad_norm": 3.100978136062622,
+      "learning_rate": 0.0001999991567695732,
+      "loss": 5.447,
+      "step": 12
+    },
+    {
+      "epoch": 0.0010397920415916816,
+      "grad_norm": 3.456265687942505,
+      "learning_rate": 0.0001999981027348727,
+      "loss": 6.7931,
+      "step": 13
+    },
+    {
+      "epoch": 0.0011197760447910418,
+      "grad_norm": 3.2227530479431152,
+      "learning_rate": 0.00019999662709251355,
+      "loss": 5.6123,
+      "step": 14
+    },
+    {
+      "epoch": 0.0011997600479904018,
+      "grad_norm": 4.164862632751465,
+      "learning_rate": 0.00019999472984871732,
+      "loss": 5.619,
+      "step": 15
+    },
+    {
+      "epoch": 0.001279744051189762,
+      "grad_norm": 4.771204471588135,
+      "learning_rate": 0.00019999241101148306,
+      "loss": 6.6284,
+      "step": 16
+    },
+    {
+      "epoch": 0.0013597280543891223,
+      "grad_norm": 5.408437728881836,
+      "learning_rate": 0.00019998967059058737,
+      "loss": 6.9675,
+      "step": 17
+    },
+    {
+      "epoch": 0.0014397120575884823,
+      "grad_norm": 4.746880531311035,
+      "learning_rate": 0.0001999865085975843,
+      "loss": 5.3069,
+      "step": 18
+    },
+    {
+      "epoch": 0.0015196960607878425,
+      "grad_norm": 3.782351493835449,
+      "learning_rate": 0.00019998292504580528,
+      "loss": 4.9249,
+      "step": 19
+    },
+    {
+      "epoch": 0.0015996800639872025,
+      "grad_norm": 4.084463119506836,
+      "learning_rate": 0.00019997891995035912,
+      "loss": 4.7503,
+      "step": 20
+    },
+    {
+      "epoch": 0.0016796640671865627,
+      "grad_norm": 4.680345058441162,
+      "learning_rate": 0.00019997449332813201,
+      "loss": 5.6088,
+      "step": 21
+    },
+    {
+      "epoch": 0.0017596480703859227,
+      "grad_norm": 4.239553451538086,
+      "learning_rate": 0.0001999696451977872,
+      "loss": 6.5651,
+      "step": 22
+    },
+    {
+      "epoch": 0.001839632073585283,
+      "grad_norm": 3.8252408504486084,
+      "learning_rate": 0.00019996437557976516,
+      "loss": 3.6374,
+      "step": 23
+    },
+    {
+      "epoch": 0.0019196160767846432,
+      "grad_norm": 4.3223114013671875,
+      "learning_rate": 0.00019995868449628346,
+      "loss": 5.379,
+      "step": 24
+    },
+    {
+      "epoch": 0.001999600079984003,
+      "grad_norm": 4.155852794647217,
+      "learning_rate": 0.0001999525719713366,
+      "loss": 4.4252,
+      "step": 25
+    },
+    {
+      "epoch": 0.002079584083183363,
+      "grad_norm": 4.038774490356445,
+      "learning_rate": 0.00019994603803069594,
+      "loss": 3.8126,
+      "step": 26
+    },
+    {
+      "epoch": 0.0021595680863827236,
+      "grad_norm": 4.452616214752197,
+      "learning_rate": 0.0001999390827019096,
+      "loss": 4.3482,
+      "step": 27
+    },
+    {
+      "epoch": 0.0022395520895820836,
+      "grad_norm": 4.7611212730407715,
+      "learning_rate": 0.0001999317060143023,
+      "loss": 5.3983,
+      "step": 28
+    },
+    {
+      "epoch": 0.0023195360927814436,
+      "grad_norm": 3.455153465270996,
+      "learning_rate": 0.00019992390799897534,
+      "loss": 4.5008,
+      "step": 29
+    },
+    {
+      "epoch": 0.0023995200959808036,
+      "grad_norm": 4.436529636383057,
+      "learning_rate": 0.0001999156886888064,
+      "loss": 5.7692,
+      "step": 30
+    },
+    {
+      "epoch": 0.002479504099180164,
+      "grad_norm": 5.176506996154785,
+      "learning_rate": 0.00019990704811844934,
+      "loss": 5.2556,
+      "step": 31
+    },
+    {
+      "epoch": 0.002559488102379524,
+      "grad_norm": 4.376233100891113,
+      "learning_rate": 0.00019989798632433415,
+      "loss": 4.9445,
+      "step": 32
+    },
+    {
+      "epoch": 0.002639472105578884,
+      "grad_norm": 4.30057430267334,
+      "learning_rate": 0.0001998885033446668,
+      "loss": 5.1481,
+      "step": 33
+    },
+    {
+      "epoch": 0.0027194561087782445,
+      "grad_norm": 4.3144330978393555,
+      "learning_rate": 0.00019987859921942903,
+      "loss": 5.6733,
+      "step": 34
+    },
+    {
+      "epoch": 0.0027994401119776045,
+      "grad_norm": 4.325514316558838,
+      "learning_rate": 0.00019986827399037812,
+      "loss": 4.7508,
+      "step": 35
+    },
+    {
+      "epoch": 0.0028794241151769645,
+      "grad_norm": 3.4611852169036865,
+      "learning_rate": 0.0001998575277010469,
+      "loss": 4.2926,
+      "step": 36
+    },
+    {
+      "epoch": 0.0029594081183763245,
+      "grad_norm": 4.122291564941406,
+      "learning_rate": 0.0001998463603967434,
+      "loss": 5.5432,
+      "step": 37
+    },
+    {
+      "epoch": 0.003039392121575685,
+      "grad_norm": 3.8572282791137695,
+      "learning_rate": 0.00019983477212455074,
+      "loss": 3.6934,
+      "step": 38
+    },
+    {
+      "epoch": 0.003119376124775045,
+      "grad_norm": 4.3047943115234375,
+      "learning_rate": 0.00019982276293332686,
+      "loss": 5.3475,
+      "step": 39
+    },
+    {
+      "epoch": 0.003199360127974405,
+      "grad_norm": 3.8791322708129883,
+      "learning_rate": 0.00019981033287370443,
+      "loss": 4.9571,
+      "step": 40
+    },
+    {
+      "epoch": 0.0032793441311737654,
+      "grad_norm": 4.470249176025391,
+      "learning_rate": 0.00019979748199809049,
+      "loss": 5.1785,
+      "step": 41
+    },
+    {
+      "epoch": 0.0033593281343731254,
+      "grad_norm": 3.4192795753479004,
+      "learning_rate": 0.00019978421036066633,
+      "loss": 4.6074,
+      "step": 42
+    },
+    {
+      "epoch": 0.0034393121375724854,
+      "grad_norm": 3.722909927368164,
+      "learning_rate": 0.0001997705180173873,
+      "loss": 3.9538,
+      "step": 43
+    },
+    {
+      "epoch": 0.0035192961407718455,
+      "grad_norm": 3.8294920921325684,
+      "learning_rate": 0.00019975640502598244,
+      "loss": 4.7456,
+      "step": 44
+    },
+    {
+      "epoch": 0.003599280143971206,
+      "grad_norm": 3.147818088531494,
+      "learning_rate": 0.00019974187144595432,
+      "loss": 4.2381,
+      "step": 45
+    },
+    {
+      "epoch": 0.003679264147170566,
+      "grad_norm": 4.7630109786987305,
+      "learning_rate": 0.00019972691733857883,
+      "loss": 4.5834,
+      "step": 46
+    },
+    {
+      "epoch": 0.003759248150369926,
+      "grad_norm": 3.569880485534668,
+      "learning_rate": 0.00019971154276690478,
+      "loss": 3.8295,
+      "step": 47
+    },
+    {
+      "epoch": 0.0038392321535692863,
+      "grad_norm": 5.48881196975708,
+      "learning_rate": 0.00019969574779575376,
+      "loss": 5.0891,
+      "step": 48
+    },
+    {
+      "epoch": 0.003919216156768646,
+      "grad_norm": 3.903329849243164,
+      "learning_rate": 0.0001996795324917199,
+      "loss": 5.4285,
+      "step": 49
+    },
+    {
+      "epoch": 0.003999200159968006,
+      "grad_norm": 3.75028395652771,
+      "learning_rate": 0.00019966289692316944,
+      "loss": 3.8204,
+      "step": 50
+    },
+    {
+      "epoch": 0.004079184163167366,
+      "grad_norm": 4.618161678314209,
+      "learning_rate": 0.00019964584116024053,
+      "loss": 4.5384,
+      "step": 51
+    },
+    {
+      "epoch": 0.004159168166366726,
+      "grad_norm": 4.52060604095459,
+      "learning_rate": 0.00019962836527484296,
+      "loss": 4.5576,
+      "step": 52
+    },
+    {
+      "epoch": 0.004239152169566086,
+      "grad_norm": 4.482393264770508,
+      "learning_rate": 0.00019961046934065778,
+      "loss": 4.3975,
+      "step": 53
+    },
+    {
+      "epoch": 0.004319136172765447,
+      "grad_norm": 4.1788411140441895,
+      "learning_rate": 0.00019959215343313703,
+      "loss": 4.1852,
+      "step": 54
+    },
+    {
+      "epoch": 0.004399120175964807,
+      "grad_norm": 4.333094120025635,
+      "learning_rate": 0.00019957341762950344,
+      "loss": 4.7889,
+      "step": 55
+    },
+    {
+      "epoch": 0.004479104179164167,
+      "grad_norm": 4.460629463195801,
+      "learning_rate": 0.00019955426200875018,
+      "loss": 4.0291,
+      "step": 56
+    },
+    {
+      "epoch": 0.004559088182363527,
+      "grad_norm": 3.434190511703491,
+      "learning_rate": 0.00019953468665164024,
+      "loss": 5.1293,
+      "step": 57
+    },
+    {
+      "epoch": 0.004639072185562887,
+      "grad_norm": 3.6340487003326416,
+      "learning_rate": 0.00019951469164070646,
+      "loss": 4.3983,
+      "step": 58
+    },
+    {
+      "epoch": 0.004719056188762247,
+      "grad_norm": 3.3554069995880127,
+      "learning_rate": 0.0001994942770602509,
+      "loss": 4.3103,
+      "step": 59
+    },
+    {
+      "epoch": 0.004799040191961607,
+      "grad_norm": 3.730646848678589,
+      "learning_rate": 0.00019947344299634464,
+      "loss": 4.2136,
+      "step": 60
+    },
+    {
+      "epoch": 0.004879024195160968,
+      "grad_norm": 3.0557665824890137,
+      "learning_rate": 0.00019945218953682734,
+      "loss": 4.6209,
+      "step": 61
+    },
+    {
+      "epoch": 0.004959008198360328,
+      "grad_norm": 4.148006916046143,
+      "learning_rate": 0.00019943051677130696,
+      "loss": 4.6885,
+      "step": 62
+    },
+    {
+      "epoch": 0.005038992201559688,
+      "grad_norm": 4.280196189880371,
+      "learning_rate": 0.0001994084247911592,
+      "loss": 6.0293,
+      "step": 63
+    },
+    {
+      "epoch": 0.005118976204759048,
+      "grad_norm": 4.3163371086120605,
+      "learning_rate": 0.0001993859136895274,
+      "loss": 5.3148,
+      "step": 64
+    },
+    {
+      "epoch": 0.005198960207958408,
+      "grad_norm": 3.7306902408599854,
+      "learning_rate": 0.00019936298356132176,
+      "loss": 5.5089,
+      "step": 65
+    },
+    {
+      "epoch": 0.005278944211157768,
+      "grad_norm": 3.655566453933716,
+      "learning_rate": 0.00019933963450321945,
+      "loss": 5.4979,
+      "step": 66
+    },
+    {
+      "epoch": 0.005358928214357128,
+      "grad_norm": 3.5753936767578125,
+      "learning_rate": 0.00019931586661366363,
+      "loss": 5.0488,
+      "step": 67
+    },
+    {
+      "epoch": 0.005438912217556489,
+      "grad_norm": 3.427351713180542,
+      "learning_rate": 0.0001992916799928635,
+      "loss": 4.4419,
+      "step": 68
+    },
+    {
+      "epoch": 0.005518896220755849,
+      "grad_norm": 3.284907341003418,
+      "learning_rate": 0.0001992670747427936,
+      "loss": 4.1645,
+      "step": 69
+    },
+    {
+      "epoch": 0.005598880223955209,
+      "grad_norm": 3.8085391521453857,
+      "learning_rate": 0.0001992420509671936,
+      "loss": 6.0662,
+      "step": 70
+    },
+    {
+      "epoch": 0.005678864227154569,
+      "grad_norm": 3.331713914871216,
+      "learning_rate": 0.00019921660877156755,
+      "loss": 3.8283,
+      "step": 71
+    },
+    {
+      "epoch": 0.005758848230353929,
+      "grad_norm": 3.343092441558838,
+      "learning_rate": 0.0001991907482631838,
+      "loss": 4.5758,
+      "step": 72
+    },
+    {
+      "epoch": 0.005838832233553289,
+      "grad_norm": 3.283320188522339,
+      "learning_rate": 0.00019916446955107428,
+      "loss": 5.102,
+      "step": 73
+    },
+    {
+      "epoch": 0.005918816236752649,
+      "grad_norm": 3.636723756790161,
+      "learning_rate": 0.00019913777274603418,
+      "loss": 4.2686,
+      "step": 74
+    },
+    {
+      "epoch": 0.00599880023995201,
+      "grad_norm": 3.107661485671997,
+      "learning_rate": 0.00019911065796062135,
+      "loss": 4.1827,
+      "step": 75
+    },
+    {
+      "epoch": 0.00607878424315137,
+      "grad_norm": 4.445980072021484,
+      "learning_rate": 0.00019908312530915603,
+      "loss": 4.568,
+      "step": 76
+    },
+    {
+      "epoch": 0.00615876824635073,
+      "grad_norm": 3.0102596282958984,
+      "learning_rate": 0.00019905517490772015,
+      "loss": 4.0638,
+      "step": 77
+    },
+    {
+      "epoch": 0.00623875224955009,
+      "grad_norm": 4.667107105255127,
+      "learning_rate": 0.00019902680687415705,
+      "loss": 4.9191,
+      "step": 78
+    },
+    {
+      "epoch": 0.00631873625274945,
+      "grad_norm": 4.729066848754883,
+      "learning_rate": 0.0001989980213280707,
+      "loss": 4.5239,
+      "step": 79
+    },
+    {
+      "epoch": 0.00639872025594881,
+      "grad_norm": 4.467210292816162,
+      "learning_rate": 0.00019896881839082556,
+      "loss": 5.3754,
+      "step": 80
+    },
+    {
+      "epoch": 0.00647870425914817,
+      "grad_norm": 4.176740646362305,
+      "learning_rate": 0.0001989391981855457,
+      "loss": 5.5159,
+      "step": 81
+    },
+    {
+      "epoch": 0.006558688262347531,
+      "grad_norm": 4.937440395355225,
+      "learning_rate": 0.0001989091608371146,
+      "loss": 4.9222,
+      "step": 82
+    },
+    {
+      "epoch": 0.006638672265546891,
+      "grad_norm": 3.6677310466766357,
+      "learning_rate": 0.00019887870647217442,
+      "loss": 3.5926,
+      "step": 83
+    },
+    {
+      "epoch": 0.006718656268746251,
+      "grad_norm": 4.629709243774414,
+      "learning_rate": 0.00019884783521912554,
+      "loss": 4.0758,
+      "step": 84
+    },
+    {
+      "epoch": 0.006798640271945611,
+      "grad_norm": 4.480130672454834,
+      "learning_rate": 0.00019881654720812594,
+      "loss": 4.2366,
+      "step": 85
+    },
+    {
+      "epoch": 0.006878624275144971,
+      "grad_norm": 4.163790225982666,
+      "learning_rate": 0.00019878484257109083,
+      "loss": 4.1109,
+      "step": 86
+    },
+    {
+      "epoch": 0.006958608278344331,
+      "grad_norm": 4.017063617706299,
+      "learning_rate": 0.00019875272144169193,
+      "loss": 4.599,
+      "step": 87
+    },
+    {
+      "epoch": 0.007038592281543691,
+      "grad_norm": 4.747475624084473,
+      "learning_rate": 0.0001987201839553569,
+      "loss": 4.0236,
+      "step": 88
+    },
+    {
+      "epoch": 0.007118576284743052,
+      "grad_norm": 4.168716907501221,
+      "learning_rate": 0.00019868723024926891,
+      "loss": 3.9595,
+      "step": 89
+    },
+    {
+      "epoch": 0.007198560287942412,
+      "grad_norm": 4.3506011962890625,
+      "learning_rate": 0.00019865386046236596,
+      "loss": 4.4328,
+      "step": 90
+    },
+    {
+      "epoch": 0.007278544291141772,
+      "grad_norm": 4.0267839431762695,
+      "learning_rate": 0.00019862007473534025,
+      "loss": 4.749,
+      "step": 91
+    },
+    {
+      "epoch": 0.007358528294341132,
+      "grad_norm": 5.045677185058594,
+      "learning_rate": 0.00019858587321063776,
+      "loss": 4.7425,
+      "step": 92
+    },
+    {
+      "epoch": 0.007438512297540492,
+      "grad_norm": 4.732039451599121,
+      "learning_rate": 0.00019855125603245738,
+      "loss": 4.1719,
+      "step": 93
+    },
+    {
+      "epoch": 0.007518496300739852,
+      "grad_norm": 3.6055307388305664,
+      "learning_rate": 0.00019851622334675066,
+      "loss": 3.3813,
+      "step": 94
+    },
+    {
+      "epoch": 0.007598480303939212,
+      "grad_norm": 4.301301002502441,
+      "learning_rate": 0.00019848077530122083,
+      "loss": 4.7025,
+      "step": 95
+    },
+    {
+      "epoch": 0.007678464307138573,
+      "grad_norm": 4.625269889831543,
+      "learning_rate": 0.00019844491204532236,
+      "loss": 5.462,
+      "step": 96
+    },
+    {
+      "epoch": 0.007758448310337933,
+      "grad_norm": 4.291022300720215,
+      "learning_rate": 0.00019840863373026045,
+      "loss": 4.4898,
+      "step": 97
+    },
+    {
+      "epoch": 0.007838432313537293,
+      "grad_norm": 4.237590312957764,
+      "learning_rate": 0.0001983719405089901,
+      "loss": 4.578,
+      "step": 98
+    },
+    {
+      "epoch": 0.007918416316736652,
+      "grad_norm": 5.464932441711426,
+      "learning_rate": 0.00019833483253621568,
+      "loss": 4.622,
+      "step": 99
+    },
+    {
+      "epoch": 0.007998400319936013,
+      "grad_norm": 4.568652629852295,
+      "learning_rate": 0.0001982973099683902,
+      "loss": 3.8691,
+      "step": 100
+    },
+    {
+      "epoch": 0.008078384323135374,
+      "grad_norm": 3.8946774005889893,
+      "learning_rate": 0.00019825937296371468,
+      "loss": 3.707,
+      "step": 101
+    },
+    {
+      "epoch": 0.008158368326334733,
+      "grad_norm": 4.361711502075195,
+      "learning_rate": 0.00019822102168213753,
+      "loss": 4.472,
+      "step": 102
+    },
+    {
+      "epoch": 0.008238352329534094,
+      "grad_norm": 4.38311767578125,
+      "learning_rate": 0.0001981822562853537,
+      "loss": 4.7478,
+      "step": 103
+    },
+    {
+      "epoch": 0.008318336332733453,
+      "grad_norm": 4.354578018188477,
+      "learning_rate": 0.0001981430769368042,
+      "loss": 5.8493,
+      "step": 104
+    },
+    {
+      "epoch": 0.008398320335932814,
+      "grad_norm": 4.7935590744018555,
+      "learning_rate": 0.00019810348380167527,
+      "loss": 5.5214,
+      "step": 105
+    },
+    {
+      "epoch": 0.008478304339132173,
+      "grad_norm": 4.473304271697998,
+      "learning_rate": 0.00019806347704689778,
+      "loss": 4.4084,
+      "step": 106
+    },
+    {
+      "epoch": 0.008558288342331534,
+      "grad_norm": 3.7808096408843994,
+      "learning_rate": 0.00019802305684114648,
+      "loss": 4.8573,
+      "step": 107
+    },
+    {
+      "epoch": 0.008638272345530894,
+      "grad_norm": 4.11630392074585,
+      "learning_rate": 0.00019798222335483932,
+      "loss": 3.9953,
+      "step": 108
+    },
+    {
+      "epoch": 0.008718256348730254,
+      "grad_norm": 3.9533166885375977,
+      "learning_rate": 0.0001979409767601366,
+      "loss": 3.9655,
+      "step": 109
+    },
+    {
+      "epoch": 0.008798240351929614,
+      "grad_norm": 4.105170249938965,
+      "learning_rate": 0.00019789931723094046,
+      "loss": 3.9458,
+      "step": 110
+    },
+    {
+      "epoch": 0.008878224355128974,
+      "grad_norm": 5.170469284057617,
+      "learning_rate": 0.00019785724494289402,
+      "loss": 4.3416,
+      "step": 111
+    },
+    {
+      "epoch": 0.008958208358328335,
+      "grad_norm": 5.103896141052246,
+      "learning_rate": 0.00019781476007338058,
+      "loss": 4.6881,
+      "step": 112
+    },
+    {
+      "epoch": 0.009038192361527694,
+      "grad_norm": 5.349043369293213,
+      "learning_rate": 0.00019777186280152303,
+      "loss": 5.3737,
+      "step": 113
+    },
+    {
+      "epoch": 0.009118176364727055,
+      "grad_norm": 5.156305313110352,
+      "learning_rate": 0.000197728553308183,
+      "loss": 4.1623,
+      "step": 114
+    },
+    {
+      "epoch": 0.009198160367926415,
+      "grad_norm": 4.510937213897705,
+      "learning_rate": 0.0001976848317759601,
+      "loss": 6.1072,
+      "step": 115
+    },
+    {
+      "epoch": 0.009278144371125775,
+      "grad_norm": 6.212741851806641,
+      "learning_rate": 0.0001976406983891911,
+      "loss": 5.0148,
+      "step": 116
+    },
+    {
+      "epoch": 0.009358128374325135,
+      "grad_norm": 4.464901447296143,
+      "learning_rate": 0.00019759615333394932,
+      "loss": 4.0828,
+      "step": 117
+    },
+    {
+      "epoch": 0.009438112377524495,
+      "grad_norm": 5.239302158355713,
+      "learning_rate": 0.00019755119679804367,
+      "loss": 4.649,
+      "step": 118
+    },
+    {
+      "epoch": 0.009518096380723855,
+      "grad_norm": 4.33208703994751,
+      "learning_rate": 0.00019750582897101797,
+      "loss": 3.6407,
+      "step": 119
+    },
+    {
+      "epoch": 0.009598080383923215,
+      "grad_norm": 3.9181249141693115,
+      "learning_rate": 0.00019746005004415005,
+      "loss": 3.5174,
+      "step": 120
+    },
+    {
+      "epoch": 0.009678064387122575,
+      "grad_norm": 5.414444923400879,
+      "learning_rate": 0.00019741386021045104,
+      "loss": 4.3238,
+      "step": 121
+    },
+    {
+      "epoch": 0.009758048390321936,
+      "grad_norm": 5.03347635269165,
+      "learning_rate": 0.0001973672596646645,
+      "loss": 4.0335,
+      "step": 122
+    },
+    {
+      "epoch": 0.009838032393521295,
+      "grad_norm": 4.60539436340332,
+      "learning_rate": 0.00019732024860326565,
+      "loss": 4.4081,
+      "step": 123
+    },
+    {
+      "epoch": 0.009918016396720656,
+      "grad_norm": 4.669139862060547,
+      "learning_rate": 0.00019727282722446047,
+      "loss": 4.7103,
+      "step": 124
+    },
+    {
+      "epoch": 0.009998000399920015,
+      "grad_norm": 4.479554653167725,
+      "learning_rate": 0.00019722499572818496,
+      "loss": 4.5797,
+      "step": 125
+    },
+    {
+      "epoch": 0.010077984403119376,
+      "grad_norm": 4.219813346862793,
+      "learning_rate": 0.00019717675431610415,
+      "loss": 3.3457,
+      "step": 126
+    },
+    {
+      "epoch": 0.010157968406318735,
+      "grad_norm": 4.138061046600342,
+      "learning_rate": 0.0001971281031916114,
+      "loss": 4.4678,
+      "step": 127
+    },
+    {
+      "epoch": 0.010237952409518096,
+      "grad_norm": 4.0591230392456055,
+      "learning_rate": 0.00019707904255982745,
+      "loss": 3.4897,
+      "step": 128
+    },
+    {
+      "epoch": 0.010317936412717457,
+      "grad_norm": 5.270677089691162,
+      "learning_rate": 0.00019702957262759965,
+      "loss": 4.8912,
+      "step": 129
+    },
+    {
+      "epoch": 0.010397920415916816,
+      "grad_norm": 4.547000408172607,
+      "learning_rate": 0.00019697969360350098,
+      "loss": 4.3903,
+      "step": 130
+    },
+    {
+      "epoch": 0.010477904419116177,
+      "grad_norm": 4.9797139167785645,
+      "learning_rate": 0.00019692940569782913,
+      "loss": 4.5133,
+      "step": 131
+    },
+    {
+      "epoch": 0.010557888422315536,
+      "grad_norm": 6.008891582489014,
+      "learning_rate": 0.0001968787091226059,
+      "loss": 4.8437,
+      "step": 132
+    },
+    {
+      "epoch": 0.010637872425514897,
+      "grad_norm": 6.226869583129883,
+      "learning_rate": 0.0001968276040915759,
+      "loss": 4.9427,
+      "step": 133
+    },
+    {
+      "epoch": 0.010717856428714256,
+      "grad_norm": 5.999492168426514,
+      "learning_rate": 0.00019677609082020597,
+      "loss": 5.0577,
+      "step": 134
+    },
+    {
+      "epoch": 0.010797840431913617,
+      "grad_norm": 6.069042682647705,
+      "learning_rate": 0.00019672416952568416,
+      "loss": 5.8394,
+      "step": 135
+    },
+    {
+      "epoch": 0.010877824435112978,
+      "grad_norm": 4.348324775695801,
+      "learning_rate": 0.00019667184042691875,
+      "loss": 3.7166,
+      "step": 136
+    },
+    {
+      "epoch": 0.010957808438312337,
+      "grad_norm": 4.531675815582275,
+      "learning_rate": 0.00019661910374453744,
+      "loss": 5.053,
+      "step": 137
+    },
+    {
+      "epoch": 0.011037792441511698,
+      "grad_norm": 5.470901966094971,
+      "learning_rate": 0.00019656595970088628,
+      "loss": 4.7888,
+      "step": 138
+    },
+    {
+      "epoch": 0.011117776444711057,
+      "grad_norm": 4.40546989440918,
+      "learning_rate": 0.0001965124085200289,
+      "loss": 4.6627,
+      "step": 139
+    },
+    {
+      "epoch": 0.011197760447910418,
+      "grad_norm": 4.321747303009033,
+      "learning_rate": 0.00019645845042774553,
+      "loss": 3.2058,
+      "step": 140
+    },
+    {
+      "epoch": 0.011277744451109777,
+      "grad_norm": 4.924249649047852,
+      "learning_rate": 0.00019640408565153186,
+      "loss": 5.0648,
+      "step": 141
+    },
+    {
+      "epoch": 0.011357728454309138,
+      "grad_norm": 5.039271831512451,
+      "learning_rate": 0.00019634931442059832,
+      "loss": 5.2905,
+      "step": 142
+    },
+    {
+      "epoch": 0.011437712457508499,
+      "grad_norm": 4.569279193878174,
+      "learning_rate": 0.00019629413696586903,
+      "loss": 6.2246,
+      "step": 143
+    },
+    {
+      "epoch": 0.011517696460707858,
+      "grad_norm": 4.1219401359558105,
+      "learning_rate": 0.00019623855351998072,
+      "loss": 4.0549,
+      "step": 144
+    },
+    {
+      "epoch": 0.011597680463907219,
+      "grad_norm": 4.945868492126465,
+      "learning_rate": 0.00019618256431728194,
+      "loss": 3.6012,
+      "step": 145
+    },
+    {
+      "epoch": 0.011677664467106578,
+      "grad_norm": 5.271925926208496,
+      "learning_rate": 0.0001961261695938319,
+      "loss": 3.9763,
+      "step": 146
+    },
+    {
+      "epoch": 0.011757648470305939,
+      "grad_norm": 4.197992324829102,
+      "learning_rate": 0.00019606936958739963,
+      "loss": 4.2189,
+      "step": 147
+    },
+    {
+      "epoch": 0.011837632473505298,
+      "grad_norm": 7.92578649520874,
+      "learning_rate": 0.00019601216453746283,
+      "loss": 4.5706,
+      "step": 148
+    },
+    {
+      "epoch": 0.011917616476704659,
+      "grad_norm": 4.556253910064697,
+      "learning_rate": 0.00019595455468520695,
+      "loss": 3.3737,
+      "step": 149
+    },
+    {
+      "epoch": 0.01199760047990402,
+      "grad_norm": 4.560910224914551,
+      "learning_rate": 0.00019589654027352414,
+      "loss": 2.9566,
+      "step": 150
+    },
+    {
+      "epoch": 0.012077584483103379,
+      "grad_norm": 5.795867919921875,
+      "learning_rate": 0.00019583812154701225,
+      "loss": 4.4294,
+      "step": 151
+    },
+    {
+      "epoch": 0.01215756848630274,
+      "grad_norm": 5.964183330535889,
+      "learning_rate": 0.00019577929875197377,
+      "loss": 3.7239,
+      "step": 152
+    },
+    {
+      "epoch": 0.012237552489502099,
+      "grad_norm": 4.3232102394104,
+      "learning_rate": 0.00019572007213641485,
+      "loss": 3.0652,
+      "step": 153
+    },
+    {
+      "epoch": 0.01231753649270146,
+      "grad_norm": 4.781129360198975,
+      "learning_rate": 0.0001956604419500441,
+      "loss": 3.6271,
+      "step": 154
+    },
+    {
+      "epoch": 0.012397520495900819,
+      "grad_norm": 8.637353897094727,
+      "learning_rate": 0.0001956004084442718,
+      "loss": 5.0342,
+      "step": 155
+    },
+    {
+      "epoch": 0.01247750449910018,
+      "grad_norm": 4.655914306640625,
+      "learning_rate": 0.00019553997187220855,
+      "loss": 3.2862,
+      "step": 156
+    },
+    {
+      "epoch": 0.01255748850229954,
+      "grad_norm": 5.187161922454834,
+      "learning_rate": 0.00019547913248866444,
+      "loss": 5.0299,
+      "step": 157
+    },
+    {
+      "epoch": 0.0126374725054989,
+      "grad_norm": 6.352967739105225,
+      "learning_rate": 0.00019541789055014784,
+      "loss": 5.49,
+      "step": 158
+    },
+    {
+      "epoch": 0.01271745650869826,
+      "grad_norm": 4.698216915130615,
+      "learning_rate": 0.00019535624631486433,
+      "loss": 4.1956,
+      "step": 159
+    },
+    {
+      "epoch": 0.01279744051189762,
+      "grad_norm": 4.389121055603027,
+      "learning_rate": 0.00019529420004271567,
+      "loss": 3.7758,
+      "step": 160
+    },
+    {
+      "epoch": 0.01287742451509698,
+      "grad_norm": 4.86370849609375,
+      "learning_rate": 0.00019523175199529863,
+      "loss": 4.4053,
+      "step": 161
+    },
+    {
+      "epoch": 0.01295740851829634,
+      "grad_norm": 5.030034065246582,
+      "learning_rate": 0.000195168902435904,
+      "loss": 3.5576,
+      "step": 162
+    },
+    {
+      "epoch": 0.0130373925214957,
+      "grad_norm": 4.813689231872559,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 3.9392,
+      "step": 163
+    },
+    {
+      "epoch": 0.013117376524695062,
+      "grad_norm": 4.899305820465088,
+      "learning_rate": 0.00019504199984280799,
+      "loss": 4.3442,
+      "step": 164
+    },
+    {
+      "epoch": 0.013197360527894421,
+      "grad_norm": 4.388523578643799,
+      "learning_rate": 0.0001949779473441478,
+      "loss": 3.5577,
+      "step": 165
+    },
+    {
+      "epoch": 0.013277344531093782,
+      "grad_norm": 5.756633758544922,
+      "learning_rate": 0.00019491349440359015,
+      "loss": 4.0196,
+      "step": 166
+    },
+    {
+      "epoch": 0.013357328534293141,
+      "grad_norm": 4.173125267028809,
+      "learning_rate": 0.0001948486412928787,
+      "loss": 3.4954,
+      "step": 167
+    },
+    {
+      "epoch": 0.013437312537492502,
+      "grad_norm": 4.68442964553833,
+      "learning_rate": 0.00019478338828544435,
+      "loss": 3.6734,
+      "step": 168
+    },
+    {
+      "epoch": 0.013517296540691861,
+      "grad_norm": 4.491235733032227,
+      "learning_rate": 0.00019471773565640403,
+      "loss": 2.8435,
+      "step": 169
+    },
+    {
+      "epoch": 0.013597280543891222,
+      "grad_norm": 5.267945289611816,
+      "learning_rate": 0.00019465168368255946,
+      "loss": 4.3094,
+      "step": 170
+    },
+    {
+      "epoch": 0.013677264547090583,
+      "grad_norm": 3.776217460632324,
+      "learning_rate": 0.00019458523264239612,
+      "loss": 2.9521,
+      "step": 171
+    },
+    {
+      "epoch": 0.013757248550289942,
+      "grad_norm": 5.366693019866943,
+      "learning_rate": 0.00019451838281608197,
+      "loss": 4.4226,
+      "step": 172
+    },
+    {
+      "epoch": 0.013837232553489303,
+      "grad_norm": 5.224587917327881,
+      "learning_rate": 0.00019445113448546638,
+      "loss": 4.0713,
+      "step": 173
+    },
+    {
+      "epoch": 0.013917216556688662,
+      "grad_norm": 5.798890113830566,
+      "learning_rate": 0.00019438348793407881,
+      "loss": 5.7135,
+      "step": 174
+    },
+    {
+      "epoch": 0.013997200559888023,
+      "grad_norm": 4.861985206604004,
+      "learning_rate": 0.00019431544344712776,
+      "loss": 4.3844,
+      "step": 175
+    },
+    {
+      "epoch": 0.014077184563087382,
+      "grad_norm": 4.904694557189941,
+      "learning_rate": 0.0001942470013114994,
+      "loss": 4.2836,
+      "step": 176
+    },
+    {
+      "epoch": 0.014157168566286743,
+      "grad_norm": 5.404961109161377,
+      "learning_rate": 0.0001941781618157565,
+      "loss": 4.0793,
+      "step": 177
+    },
+    {
+      "epoch": 0.014237152569486104,
+      "grad_norm": 3.9328558444976807,
+      "learning_rate": 0.0001941089252501372,
+      "loss": 3.4638,
+      "step": 178
+    },
+    {
+      "epoch": 0.014317136572685463,
+      "grad_norm": 7.811837196350098,
+      "learning_rate": 0.00019403929190655358,
+      "loss": 5.4092,
+      "step": 179
+    },
+    {
+      "epoch": 0.014397120575884824,
+      "grad_norm": 4.343170642852783,
+      "learning_rate": 0.00019396926207859084,
+      "loss": 4.003,
+      "step": 180
+    },
+    {
+      "epoch": 0.014477104579084183,
+      "grad_norm": 4.674137115478516,
+      "learning_rate": 0.00019389883606150566,
+      "loss": 3.613,
+      "step": 181
+    },
+    {
+      "epoch": 0.014557088582283544,
+      "grad_norm": 5.229063510894775,
+      "learning_rate": 0.00019382801415222516,
+      "loss": 4.1551,
+      "step": 182
+    },
+    {
+      "epoch": 0.014637072585482903,
+      "grad_norm": 4.8597493171691895,
+      "learning_rate": 0.00019375679664934556,
+      "loss": 3.1254,
+      "step": 183
+    },
+    {
+      "epoch": 0.014717056588682264,
+      "grad_norm": 4.986814498901367,
+      "learning_rate": 0.00019368518385313107,
+      "loss": 3.7172,
+      "step": 184
+    },
+    {
+      "epoch": 0.014797040591881624,
+      "grad_norm": 4.790543079376221,
+      "learning_rate": 0.00019361317606551238,
+      "loss": 4.4556,
+      "step": 185
+    },
+    {
+      "epoch": 0.014877024595080984,
+      "grad_norm": 4.165306568145752,
+      "learning_rate": 0.0001935407735900857,
+      "loss": 3.7351,
+      "step": 186
+    },
+    {
+      "epoch": 0.014957008598280344,
+      "grad_norm": 3.8044018745422363,
+      "learning_rate": 0.00019346797673211108,
+      "loss": 2.7252,
+      "step": 187
+    },
+    {
+      "epoch": 0.015036992601479704,
+      "grad_norm": 3.905625581741333,
+      "learning_rate": 0.00019339478579851155,
+      "loss": 3.4771,
+      "step": 188
+    },
+    {
+      "epoch": 0.015116976604679064,
+      "grad_norm": 3.9782605171203613,
+      "learning_rate": 0.0001933212010978715,
+      "loss": 3.2635,
+      "step": 189
+    },
+    {
+      "epoch": 0.015196960607878424,
+      "grad_norm": 5.697634696960449,
+      "learning_rate": 0.00019324722294043558,
+      "loss": 4.5356,
+      "step": 190
+    },
+    {
+      "epoch": 0.015276944611077784,
+      "grad_norm": 4.195896148681641,
+      "learning_rate": 0.0001931728516381073,
+      "loss": 3.6118,
+      "step": 191
+    },
+    {
+      "epoch": 0.015356928614277145,
+      "grad_norm": 4.738584518432617,
+      "learning_rate": 0.0001930980875044477,
+      "loss": 4.2477,
+      "step": 192
+    },
+    {
+      "epoch": 0.015436912617476505,
+      "grad_norm": 4.4908623695373535,
+      "learning_rate": 0.00019302293085467405,
+      "loss": 3.6161,
+      "step": 193
+    },
+    {
+      "epoch": 0.015516896620675865,
+      "grad_norm": 3.800361394882202,
+      "learning_rate": 0.00019294738200565856,
+      "loss": 3.952,
+      "step": 194
+    },
+    {
+      "epoch": 0.015596880623875225,
+      "grad_norm": 4.386847972869873,
+      "learning_rate": 0.00019287144127592704,
+      "loss": 4.3552,
+      "step": 195
+    },
+    {
+      "epoch": 0.015676864627074585,
+      "grad_norm": 5.358420372009277,
+      "learning_rate": 0.0001927951089856575,
+      "loss": 4.2666,
+      "step": 196
+    },
+    {
+      "epoch": 0.015756848630273945,
+      "grad_norm": 5.825019359588623,
+      "learning_rate": 0.00019271838545667876,
+      "loss": 3.5997,
+      "step": 197
+    },
+    {
+      "epoch": 0.015836832633473304,
+      "grad_norm": 5.260373592376709,
+      "learning_rate": 0.0001926412710124693,
+      "loss": 4.0126,
+      "step": 198
+    },
+    {
+      "epoch": 0.015916816636672666,
+      "grad_norm": 5.913776874542236,
+      "learning_rate": 0.00019256376597815564,
+      "loss": 4.1883,
+      "step": 199
+    },
+    {
+      "epoch": 0.015996800639872025,
+      "grad_norm": 4.220345497131348,
+      "learning_rate": 0.0001924858706805112,
+      "loss": 3.3007,
+      "step": 200
+    },
+    {
+      "epoch": 0.016076784643071385,
+      "grad_norm": 5.506844520568848,
+      "learning_rate": 0.0001924075854479547,
+      "loss": 4.8178,
+      "step": 201
+    },
+    {
+      "epoch": 0.016156768646270747,
+      "grad_norm": 4.918891906738281,
+      "learning_rate": 0.00019232891061054895,
+      "loss": 3.23,
+      "step": 202
+    },
+    {
+      "epoch": 0.016236752649470106,
+      "grad_norm": 5.011964797973633,
+      "learning_rate": 0.0001922498464999994,
+      "loss": 4.1303,
+      "step": 203
+    },
+    {
+      "epoch": 0.016316736652669465,
+      "grad_norm": 4.176311492919922,
+      "learning_rate": 0.0001921703934496527,
+      "loss": 3.845,
+      "step": 204
+    },
+    {
+      "epoch": 0.016396720655868825,
+      "grad_norm": 4.5768537521362305,
+      "learning_rate": 0.0001920905517944954,
+      "loss": 3.6125,
+      "step": 205
+    },
+    {
+      "epoch": 0.016476704659068187,
+      "grad_norm": 4.338911533355713,
+      "learning_rate": 0.00019201032187115234,
+      "loss": 3.5755,
+      "step": 206
+    },
+    {
+      "epoch": 0.016556688662267546,
+      "grad_norm": 4.148381233215332,
+      "learning_rate": 0.0001919297040178855,
+      "loss": 3.1388,
+      "step": 207
+    },
+    {
+      "epoch": 0.016636672665466905,
+      "grad_norm": 4.2749199867248535,
+      "learning_rate": 0.00019184869857459232,
+      "loss": 3.1602,
+      "step": 208
+    },
+    {
+      "epoch": 0.016716656668666268,
+      "grad_norm": 4.980453968048096,
+      "learning_rate": 0.00019176730588280446,
+      "loss": 5.2039,
+      "step": 209
+    },
+    {
+      "epoch": 0.016796640671865627,
+      "grad_norm": 3.6742446422576904,
+      "learning_rate": 0.00019168552628568631,
+      "loss": 3.8579,
+      "step": 210
+    },
+    {
+      "epoch": 0.016876624675064986,
+      "grad_norm": 5.142124176025391,
+      "learning_rate": 0.00019160336012803337,
+      "loss": 3.8598,
+      "step": 211
+    },
+    {
+      "epoch": 0.016956608678264345,
+      "grad_norm": 3.8903112411499023,
+      "learning_rate": 0.00019152080775627103,
+      "loss": 3.3726,
+      "step": 212
+    },
+    {
+      "epoch": 0.017036592681463708,
+      "grad_norm": 3.7564847469329834,
+      "learning_rate": 0.0001914378695184531,
+      "loss": 3.2007,
+      "step": 213
+    },
+    {
+      "epoch": 0.017116576684663067,
+      "grad_norm": 7.749126434326172,
+      "learning_rate": 0.0001913545457642601,
+      "loss": 3.9879,
+      "step": 214
+    },
+    {
+      "epoch": 0.017196560687862426,
+      "grad_norm": 3.9350130558013916,
+      "learning_rate": 0.00019127083684499806,
+      "loss": 2.8209,
+      "step": 215
+    },
+    {
+      "epoch": 0.01727654469106179,
+      "grad_norm": 4.431557655334473,
+      "learning_rate": 0.00019118674311359684,
+      "loss": 4.0645,
+      "step": 216
+    },
+    {
+      "epoch": 0.017356528694261148,
+      "grad_norm": 6.666266441345215,
+      "learning_rate": 0.00019110226492460885,
+      "loss": 5.2874,
+      "step": 217
+    },
+    {
+      "epoch": 0.017436512697460507,
+      "grad_norm": 4.859610080718994,
+      "learning_rate": 0.0001910174026342073,
+      "loss": 3.3933,
+      "step": 218
+    },
+    {
+      "epoch": 0.017516496700659866,
+      "grad_norm": 6.4448676109313965,
+      "learning_rate": 0.00019093215660018493,
+      "loss": 2.8147,
+      "step": 219
+    },
+    {
+      "epoch": 0.01759648070385923,
+      "grad_norm": 4.527115821838379,
+      "learning_rate": 0.00019084652718195238,
+      "loss": 4.2256,
+      "step": 220
+    },
+    {
+      "epoch": 0.017676464707058588,
+      "grad_norm": 5.745115280151367,
+      "learning_rate": 0.00019076051474053665,
+      "loss": 3.204,
+      "step": 221
+    },
+    {
+      "epoch": 0.017756448710257947,
+      "grad_norm": 5.867636680603027,
+      "learning_rate": 0.00019067411963857967,
+      "loss": 4.5293,
+      "step": 222
+    },
+    {
+      "epoch": 0.01783643271345731,
+      "grad_norm": 4.640747547149658,
+      "learning_rate": 0.00019058734224033672,
+      "loss": 4.6124,
+      "step": 223
+    },
+    {
+      "epoch": 0.01791641671665667,
+      "grad_norm": 3.6582233905792236,
+      "learning_rate": 0.0001905001829116749,
+      "loss": 2.5253,
+      "step": 224
+    },
+    {
+      "epoch": 0.017996400719856028,
+      "grad_norm": 5.60170841217041,
+      "learning_rate": 0.0001904126420200716,
+      "loss": 4.8805,
+      "step": 225
+    },
+    {
+      "epoch": 0.018076384723055387,
+      "grad_norm": 6.315110683441162,
+      "learning_rate": 0.0001903247199346129,
+      "loss": 4.0317,
+      "step": 226
+    },
+    {
+      "epoch": 0.01815636872625475,
+      "grad_norm": 3.8886184692382812,
+      "learning_rate": 0.0001902364170259921,
+      "loss": 4.6138,
+      "step": 227
+    },
+    {
+      "epoch": 0.01823635272945411,
+      "grad_norm": 5.318291664123535,
+      "learning_rate": 0.00019014773366650807,
+      "loss": 4.1923,
+      "step": 228
+    },
+    {
+      "epoch": 0.018316336732653468,
+      "grad_norm": 4.346369743347168,
+      "learning_rate": 0.00019005867023006375,
+      "loss": 3.8385,
+      "step": 229
+    },
+    {
+      "epoch": 0.01839632073585283,
+      "grad_norm": 6.551377296447754,
+      "learning_rate": 0.00018996922709216455,
+      "loss": 3.8002,
+      "step": 230
+    },
+    {
+      "epoch": 0.01847630473905219,
+      "grad_norm": 4.525792121887207,
+      "learning_rate": 0.0001898794046299167,
+      "loss": 2.9745,
+      "step": 231
+    },
+    {
+      "epoch": 0.01855628874225155,
+      "grad_norm": 4.6721625328063965,
+      "learning_rate": 0.00018978920322202582,
+      "loss": 4.3312,
+      "step": 232
+    },
+    {
+      "epoch": 0.018636272745450908,
+      "grad_norm": 6.13138484954834,
+      "learning_rate": 0.00018969862324879513,
+      "loss": 3.5232,
+      "step": 233
+    },
+    {
+      "epoch": 0.01871625674865027,
+      "grad_norm": 3.9804527759552,
+      "learning_rate": 0.000189607665092124,
+      "loss": 3.8828,
+      "step": 234
+    },
+    {
+      "epoch": 0.01879624075184963,
+      "grad_norm": 3.796212911605835,
+      "learning_rate": 0.00018951632913550626,
+      "loss": 3.4832,
+      "step": 235
+    },
+    {
+      "epoch": 0.01887622475504899,
+      "grad_norm": 3.9256110191345215,
+      "learning_rate": 0.00018942461576402857,
+      "loss": 4.0118,
+      "step": 236
+    },
+    {
+      "epoch": 0.01895620875824835,
+      "grad_norm": 4.6217474937438965,
+      "learning_rate": 0.0001893325253643689,
+      "loss": 3.6662,
+      "step": 237
+    },
+    {
+      "epoch": 0.01903619276144771,
+      "grad_norm": 7.017457485198975,
+      "learning_rate": 0.00018924005832479478,
+      "loss": 4.5404,
+      "step": 238
+    },
+    {
+      "epoch": 0.01911617676464707,
+      "grad_norm": 7.708695411682129,
+      "learning_rate": 0.00018914721503516177,
+      "loss": 5.2304,
+      "step": 239
+    },
+    {
+      "epoch": 0.01919616076784643,
+      "grad_norm": 4.276139736175537,
+      "learning_rate": 0.00018905399588691163,
+      "loss": 4.0688,
+      "step": 240
+    },
+    {
+      "epoch": 0.01927614477104579,
+      "grad_norm": 3.8450303077697754,
+      "learning_rate": 0.00018896040127307098,
+      "loss": 4.257,
+      "step": 241
+    },
+    {
+      "epoch": 0.01935612877424515,
+      "grad_norm": 5.30068826675415,
+      "learning_rate": 0.0001888664315882493,
+      "loss": 5.0476,
+      "step": 242
+    },
+    {
+      "epoch": 0.01943611277744451,
+      "grad_norm": 5.693291664123535,
+      "learning_rate": 0.0001887720872286375,
+      "loss": 3.6242,
+      "step": 243
+    },
+    {
+      "epoch": 0.019516096780643873,
+      "grad_norm": 4.2235426902771,
+      "learning_rate": 0.0001886773685920062,
+      "loss": 4.0089,
+      "step": 244
+    },
+    {
+      "epoch": 0.019596080783843232,
+      "grad_norm": 3.830811023712158,
+      "learning_rate": 0.00018858227607770398,
+      "loss": 3.2782,
+      "step": 245
+    },
+    {
+      "epoch": 0.01967606478704259,
+      "grad_norm": 4.885007381439209,
+      "learning_rate": 0.00018848681008665582,
+      "loss": 4.5444,
+      "step": 246
+    },
+    {
+      "epoch": 0.01975604879024195,
+      "grad_norm": 4.123133659362793,
+      "learning_rate": 0.0001883909710213612,
+      "loss": 3.8321,
+      "step": 247
+    },
+    {
+      "epoch": 0.019836032793441313,
+      "grad_norm": 4.4161553382873535,
+      "learning_rate": 0.00018829475928589271,
+      "loss": 3.9976,
+      "step": 248
+    },
+    {
+      "epoch": 0.019916016796640672,
+      "grad_norm": 4.6640119552612305,
+      "learning_rate": 0.00018819817528589402,
+      "loss": 5.2068,
+      "step": 249
+    },
+    {
+      "epoch": 0.01999600079984003,
+      "grad_norm": 3.815941572189331,
+      "learning_rate": 0.00018810121942857845,
+      "loss": 3.5509,
+      "step": 250
+    },
+    {
+      "epoch": 0.020075984803039394,
+      "grad_norm": 4.366556167602539,
+      "learning_rate": 0.00018800389212272707,
+      "loss": 4.0747,
+      "step": 251
+    },
+    {
+      "epoch": 0.020155968806238753,
+      "grad_norm": 4.601415157318115,
+      "learning_rate": 0.00018790619377868703,
+      "loss": 4.1809,
+      "step": 252
+    },
+    {
+      "epoch": 0.020235952809438112,
+      "grad_norm": 3.5466244220733643,
+      "learning_rate": 0.0001878081248083698,
+      "loss": 3.5347,
+      "step": 253
+    },
+    {
+      "epoch": 0.02031593681263747,
+      "grad_norm": 4.740442752838135,
+      "learning_rate": 0.0001877096856252496,
+      "loss": 4.3345,
+      "step": 254
+    },
+    {
+      "epoch": 0.020395920815836834,
+      "grad_norm": 4.473808288574219,
+      "learning_rate": 0.00018761087664436138,
+      "loss": 3.6354,
+      "step": 255
+    },
+    {
+      "epoch": 0.020475904819036193,
+      "grad_norm": 5.302408695220947,
+      "learning_rate": 0.00018751169828229927,
+      "loss": 4.7978,
+      "step": 256
+    },
+    {
+      "epoch": 0.020555888822235552,
+      "grad_norm": 4.8885698318481445,
+      "learning_rate": 0.00018741215095721483,
+      "loss": 4.3185,
+      "step": 257
+    },
+    {
+      "epoch": 0.020635872825434914,
+      "grad_norm": 4.722813129425049,
+      "learning_rate": 0.0001873122350888151,
+      "loss": 4.2099,
+      "step": 258
+    },
+    {
+      "epoch": 0.020715856828634274,
+      "grad_norm": 4.454716205596924,
+      "learning_rate": 0.0001872119510983611,
+      "loss": 3.6928,
+      "step": 259
+    },
+    {
+      "epoch": 0.020795840831833633,
+      "grad_norm": 4.948916435241699,
+      "learning_rate": 0.00018711129940866575,
+      "loss": 4.193,
+      "step": 260
+    },
+    {
+      "epoch": 0.020875824835032992,
+      "grad_norm": 4.173530578613281,
+      "learning_rate": 0.00018701028044409244,
+      "loss": 3.6785,
+      "step": 261
+    },
+    {
+      "epoch": 0.020955808838232354,
+      "grad_norm": 4.344187259674072,
+      "learning_rate": 0.00018690889463055283,
+      "loss": 4.2572,
+      "step": 262
+    },
+    {
+      "epoch": 0.021035792841431714,
+      "grad_norm": 4.657215595245361,
+      "learning_rate": 0.00018680714239550548,
+      "loss": 4.4998,
+      "step": 263
+    },
+    {
+      "epoch": 0.021115776844631073,
+      "grad_norm": 4.247788906097412,
+      "learning_rate": 0.00018670502416795367,
+      "loss": 3.063,
+      "step": 264
+    },
+    {
+      "epoch": 0.021195760847830435,
+      "grad_norm": 3.5779287815093994,
+      "learning_rate": 0.00018660254037844388,
+      "loss": 3.5982,
+      "step": 265
+    },
+    {
+      "epoch": 0.021275744851029794,
+      "grad_norm": 6.588551044464111,
+      "learning_rate": 0.0001864996914590638,
+      "loss": 4.7009,
+      "step": 266
+    },
+    {
+      "epoch": 0.021355728854229154,
+      "grad_norm": 6.7050395011901855,
+      "learning_rate": 0.00018639647784344057,
+      "loss": 4.9945,
+      "step": 267
+    },
+    {
+      "epoch": 0.021435712857428513,
+      "grad_norm": 4.874384880065918,
+      "learning_rate": 0.00018629289996673897,
+      "loss": 3.8558,
+      "step": 268
+    },
+    {
+      "epoch": 0.021515696860627875,
+      "grad_norm": 4.827778339385986,
+      "learning_rate": 0.00018618895826565956,
+      "loss": 3.3468,
+      "step": 269
+    },
+    {
+      "epoch": 0.021595680863827234,
+      "grad_norm": 3.7480504512786865,
+      "learning_rate": 0.00018608465317843678,
+      "loss": 2.802,
+      "step": 270
+    },
+    {
+      "epoch": 0.021675664867026594,
+      "grad_norm": 3.7615654468536377,
+      "learning_rate": 0.00018597998514483725,
+      "loss": 3.4261,
+      "step": 271
+    },
+    {
+      "epoch": 0.021755648870225956,
+      "grad_norm": 4.826491355895996,
+      "learning_rate": 0.00018587495460615778,
+      "loss": 4.5241,
+      "step": 272
+    },
+    {
+      "epoch": 0.021835632873425315,
+      "grad_norm": 4.077682018280029,
+      "learning_rate": 0.00018576956200522354,
+      "loss": 3.3679,
+      "step": 273
+    },
+    {
+      "epoch": 0.021915616876624675,
+      "grad_norm": 4.723017692565918,
+      "learning_rate": 0.00018566380778638628,
+      "loss": 4.3965,
+      "step": 274
+    },
+    {
+      "epoch": 0.021995600879824034,
+      "grad_norm": 4.252414703369141,
+      "learning_rate": 0.00018555769239552233,
+      "loss": 3.0784,
+      "step": 275
+    },
+    {
+      "epoch": 0.022075584883023396,
+      "grad_norm": 5.542708396911621,
+      "learning_rate": 0.00018545121628003077,
+      "loss": 4.9838,
+      "step": 276
+    },
+    {
+      "epoch": 0.022155568886222755,
+      "grad_norm": 5.040409088134766,
+      "learning_rate": 0.0001853443798888316,
+      "loss": 4.732,
+      "step": 277
+    },
+    {
+      "epoch": 0.022235552889422115,
+      "grad_norm": 5.463371276855469,
+      "learning_rate": 0.0001852371836723638,
+      "loss": 3.2822,
+      "step": 278
+    },
+    {
+      "epoch": 0.022315536892621477,
+      "grad_norm": 4.664473056793213,
+      "learning_rate": 0.0001851296280825833,
+      "loss": 4.2588,
+      "step": 279
+    },
+    {
+      "epoch": 0.022395520895820836,
+      "grad_norm": 3.489403009414673,
+      "learning_rate": 0.00018502171357296144,
+      "loss": 2.7776,
+      "step": 280
+    },
+    {
+      "epoch": 0.022475504899020195,
+      "grad_norm": 4.194101810455322,
+      "learning_rate": 0.00018491344059848257,
+      "loss": 3.9611,
+      "step": 281
+    },
+    {
+      "epoch": 0.022555488902219555,
+      "grad_norm": 3.926426887512207,
+      "learning_rate": 0.0001848048096156426,
+      "loss": 4.4041,
+      "step": 282
+    },
+    {
+      "epoch": 0.022635472905418917,
+      "grad_norm": 4.642327308654785,
+      "learning_rate": 0.0001846958210824467,
+      "loss": 3.755,
+      "step": 283
+    },
+    {
+      "epoch": 0.022715456908618276,
+      "grad_norm": 4.824521064758301,
+      "learning_rate": 0.00018458647545840763,
+      "loss": 3.2453,
+      "step": 284
+    },
+    {
+      "epoch": 0.022795440911817635,
+      "grad_norm": 4.623475074768066,
+      "learning_rate": 0.00018447677320454367,
+      "loss": 3.9445,
+      "step": 285
+    },
+    {
+      "epoch": 0.022875424915016998,
+      "grad_norm": 5.094510555267334,
+      "learning_rate": 0.00018436671478337666,
+      "loss": 4.0939,
+      "step": 286
+    },
+    {
+      "epoch": 0.022955408918216357,
+      "grad_norm": 4.4365339279174805,
+      "learning_rate": 0.00018425630065893014,
+      "loss": 3.676,
+      "step": 287
+    },
+    {
+      "epoch": 0.023035392921415716,
+      "grad_norm": 4.2287187576293945,
+      "learning_rate": 0.00018414553129672732,
+      "loss": 4.9768,
+      "step": 288
+    },
+    {
+      "epoch": 0.023115376924615075,
+      "grad_norm": 4.334507465362549,
+      "learning_rate": 0.00018403440716378928,
+      "loss": 3.5081,
+      "step": 289
+    },
+    {
+      "epoch": 0.023195360927814438,
+      "grad_norm": 4.643796443939209,
+      "learning_rate": 0.00018392292872863267,
+      "loss": 2.7211,
+      "step": 290
+    },
+    {
+      "epoch": 0.023275344931013797,
+      "grad_norm": 5.521508693695068,
+      "learning_rate": 0.00018381109646126807,
+      "loss": 3.4703,
+      "step": 291
+    },
+    {
+      "epoch": 0.023355328934213156,
+      "grad_norm": 18.558773040771484,
+      "learning_rate": 0.00018369891083319778,
+      "loss": 4.6301,
+      "step": 292
+    },
+    {
+      "epoch": 0.02343531293741252,
+      "grad_norm": 4.116851806640625,
+      "learning_rate": 0.00018358637231741405,
+      "loss": 2.6934,
+      "step": 293
+    },
+    {
+      "epoch": 0.023515296940611878,
+      "grad_norm": 3.7274317741394043,
+      "learning_rate": 0.00018347348138839683,
+      "loss": 3.3539,
+      "step": 294
+    },
+    {
+      "epoch": 0.023595280943811237,
+      "grad_norm": 4.779667854309082,
+      "learning_rate": 0.00018336023852211195,
+      "loss": 4.4736,
+      "step": 295
+    },
+    {
+      "epoch": 0.023675264947010596,
+      "grad_norm": 3.460465669631958,
+      "learning_rate": 0.0001832466441960091,
+      "loss": 3.0599,
+      "step": 296
+    },
+    {
+      "epoch": 0.02375524895020996,
+      "grad_norm": 5.266543388366699,
+      "learning_rate": 0.0001831326988890197,
+      "loss": 4.399,
+      "step": 297
+    },
+    {
+      "epoch": 0.023835232953409318,
+      "grad_norm": 5.425334453582764,
+      "learning_rate": 0.00018301840308155507,
+      "loss": 4.8967,
+      "step": 298
+    },
+    {
+      "epoch": 0.023915216956608677,
+      "grad_norm": 4.659543037414551,
+      "learning_rate": 0.00018290375725550417,
+      "loss": 3.5647,
+      "step": 299
+    },
+    {
+      "epoch": 0.02399520095980804,
+      "grad_norm": 4.593050956726074,
+      "learning_rate": 0.00018278876189423179,
+      "loss": 4.2569,
+      "step": 300
+    },
+    {
+      "epoch": 0.0240751849630074,
+      "grad_norm": 5.206689357757568,
+      "learning_rate": 0.00018267341748257635,
+      "loss": 4.7223,
+      "step": 301
+    },
+    {
+      "epoch": 0.024155168966206758,
+      "grad_norm": 5.0191650390625,
+      "learning_rate": 0.00018255772450684798,
+      "loss": 3.9914,
+      "step": 302
+    },
+    {
+      "epoch": 0.024235152969406117,
+      "grad_norm": 5.315042018890381,
+      "learning_rate": 0.00018244168345482635,
+      "loss": 3.8242,
+      "step": 303
+    },
+    {
+      "epoch": 0.02431513697260548,
+      "grad_norm": 3.8089730739593506,
+      "learning_rate": 0.00018232529481575872,
+      "loss": 3.3759,
+      "step": 304
+    },
+    {
+      "epoch": 0.02439512097580484,
+      "grad_norm": 4.05519962310791,
+      "learning_rate": 0.00018220855908035783,
+      "loss": 4.3841,
+      "step": 305
+    },
+    {
+      "epoch": 0.024475104979004198,
+      "grad_norm": 4.002076148986816,
+      "learning_rate": 0.00018209147674079983,
+      "loss": 2.8373,
+      "step": 306
+    },
+    {
+      "epoch": 0.02455508898220356,
+      "grad_norm": 7.781075954437256,
+      "learning_rate": 0.00018197404829072215,
+      "loss": 4.7255,
+      "step": 307
+    },
+    {
+      "epoch": 0.02463507298540292,
+      "grad_norm": 4.209305286407471,
+      "learning_rate": 0.00018185627422522148,
+      "loss": 4.07,
+      "step": 308
+    },
+    {
+      "epoch": 0.02471505698860228,
+      "grad_norm": 5.018499374389648,
+      "learning_rate": 0.00018173815504085185,
+      "loss": 5.1119,
+      "step": 309
+    },
+    {
+      "epoch": 0.024795040991801638,
+      "grad_norm": 4.057737350463867,
+      "learning_rate": 0.0001816196912356222,
+      "loss": 3.4909,
+      "step": 310
+    },
+    {
+      "epoch": 0.024875024995001,
+      "grad_norm": 3.992469549179077,
+      "learning_rate": 0.00018150088330899438,
+      "loss": 3.5484,
+      "step": 311
+    },
+    {
+      "epoch": 0.02495500899820036,
+      "grad_norm": 3.899209499359131,
+      "learning_rate": 0.00018138173176188133,
+      "loss": 3.4713,
+      "step": 312
+    },
+    {
+      "epoch": 0.02503499300139972,
+      "grad_norm": 3.767697334289551,
+      "learning_rate": 0.00018126223709664458,
+      "loss": 3.6896,
+      "step": 313
+    },
+    {
+      "epoch": 0.02511497700459908,
+      "grad_norm": 4.881612300872803,
+      "learning_rate": 0.00018114239981709232,
+      "loss": 5.9193,
+      "step": 314
+    },
+    {
+      "epoch": 0.02519496100779844,
+      "grad_norm": 4.717601299285889,
+      "learning_rate": 0.00018102222042847737,
+      "loss": 3.0861,
+      "step": 315
+    },
+    {
+      "epoch": 0.0252749450109978,
+      "grad_norm": 4.3629655838012695,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 3.9999,
+      "step": 316
+    },
+    {
+      "epoch": 0.02535492901419716,
+      "grad_norm": 3.4362242221832275,
+      "learning_rate": 0.0001807808373522799,
+      "loss": 2.9295,
+      "step": 317
+    },
+    {
+      "epoch": 0.02543491301739652,
+      "grad_norm": 4.793106555938721,
+      "learning_rate": 0.00018065963468240625,
+      "loss": 4.5639,
+      "step": 318
+    },
+    {
+      "epoch": 0.02551489702059588,
+      "grad_norm": 4.6200995445251465,
+      "learning_rate": 0.00018053809193888326,
+      "loss": 4.0014,
+      "step": 319
+    },
+    {
+      "epoch": 0.02559488102379524,
+      "grad_norm": 4.23821496963501,
+      "learning_rate": 0.00018041620963415417,
+      "loss": 3.8532,
+      "step": 320
+    },
+    {
+      "epoch": 0.025674865026994603,
+      "grad_norm": 3.4566526412963867,
+      "learning_rate": 0.00018029398828209385,
+      "loss": 3.0738,
+      "step": 321
+    },
+    {
+      "epoch": 0.02575484903019396,
+      "grad_norm": 4.199108123779297,
+      "learning_rate": 0.00018017142839800668,
+      "loss": 3.502,
+      "step": 322
+    },
+    {
+      "epoch": 0.02583483303339332,
+      "grad_norm": 5.139927864074707,
+      "learning_rate": 0.00018004853049862426,
+      "loss": 4.6712,
+      "step": 323
+    },
+    {
+      "epoch": 0.02591481703659268,
+      "grad_norm": 3.8838582038879395,
+      "learning_rate": 0.00017992529510210348,
+      "loss": 3.6336,
+      "step": 324
+    },
+    {
+      "epoch": 0.025994801039792043,
+      "grad_norm": 4.37532377243042,
+      "learning_rate": 0.000179801722728024,
+      "loss": 3.3539,
+      "step": 325
+    },
+    {
+      "epoch": 0.0260747850429914,
+      "grad_norm": 3.94706392288208,
+      "learning_rate": 0.00017967781389738625,
+      "loss": 3.2856,
+      "step": 326
+    },
+    {
+      "epoch": 0.02615476904619076,
+      "grad_norm": 4.521001815795898,
+      "learning_rate": 0.00017955356913260933,
+      "loss": 3.8893,
+      "step": 327
+    },
+    {
+      "epoch": 0.026234753049390123,
+      "grad_norm": 5.034095764160156,
+      "learning_rate": 0.0001794289889575286,
+      "loss": 3.3198,
+      "step": 328
+    },
+    {
+      "epoch": 0.026314737052589483,
+      "grad_norm": 5.225372791290283,
+      "learning_rate": 0.00017930407389739363,
+      "loss": 3.2792,
+      "step": 329
+    },
+    {
+      "epoch": 0.026394721055788842,
+      "grad_norm": 4.562963962554932,
+      "learning_rate": 0.00017917882447886582,
+      "loss": 4.2738,
+      "step": 330
+    },
+    {
+      "epoch": 0.0264747050589882,
+      "grad_norm": 4.397261142730713,
+      "learning_rate": 0.00017905324123001633,
+      "loss": 3.7458,
+      "step": 331
+    },
+    {
+      "epoch": 0.026554689062187564,
+      "grad_norm": 4.76125955581665,
+      "learning_rate": 0.00017892732468032386,
+      "loss": 3.6008,
+      "step": 332
+    },
+    {
+      "epoch": 0.026634673065386923,
+      "grad_norm": 4.582405090332031,
+      "learning_rate": 0.00017880107536067218,
+      "loss": 3.7548,
+      "step": 333
+    },
+    {
+      "epoch": 0.026714657068586282,
+      "grad_norm": 4.324102401733398,
+      "learning_rate": 0.00017867449380334834,
+      "loss": 3.357,
+      "step": 334
+    },
+    {
+      "epoch": 0.026794641071785644,
+      "grad_norm": 5.060311317443848,
+      "learning_rate": 0.00017854758054203988,
+      "loss": 4.4687,
+      "step": 335
+    },
+    {
+      "epoch": 0.026874625074985004,
+      "grad_norm": 3.645390033721924,
+      "learning_rate": 0.00017842033611183307,
+      "loss": 3.6036,
+      "step": 336
+    },
+    {
+      "epoch": 0.026954609078184363,
+      "grad_norm": 4.807192325592041,
+      "learning_rate": 0.00017829276104921028,
+      "loss": 4.1459,
+      "step": 337
+    },
+    {
+      "epoch": 0.027034593081383722,
+      "grad_norm": 5.273848056793213,
+      "learning_rate": 0.00017816485589204801,
+      "loss": 3.9517,
+      "step": 338
+    },
+    {
+      "epoch": 0.027114577084583084,
+      "grad_norm": 3.860368013381958,
+      "learning_rate": 0.00017803662117961438,
+      "loss": 3.0302,
+      "step": 339
+    },
+    {
+      "epoch": 0.027194561087782444,
+      "grad_norm": 4.04617977142334,
+      "learning_rate": 0.00017790805745256704,
+      "loss": 3.5597,
+      "step": 340
+    },
+    {
+      "epoch": 0.027274545090981803,
+      "grad_norm": 4.716963768005371,
+      "learning_rate": 0.0001777791652529508,
+      "loss": 3.8941,
+      "step": 341
+    },
+    {
+      "epoch": 0.027354529094181165,
+      "grad_norm": 5.491020202636719,
+      "learning_rate": 0.00017764994512419534,
+      "loss": 4.2495,
+      "step": 342
+    },
+    {
+      "epoch": 0.027434513097380524,
+      "grad_norm": 4.510169982910156,
+      "learning_rate": 0.00017752039761111297,
+      "loss": 3.5249,
+      "step": 343
+    },
+    {
+      "epoch": 0.027514497100579884,
+      "grad_norm": 7.709770202636719,
+      "learning_rate": 0.0001773905232598963,
+      "loss": 4.3635,
+      "step": 344
+    },
+    {
+      "epoch": 0.027594481103779243,
+      "grad_norm": 4.040260314941406,
+      "learning_rate": 0.0001772603226181159,
+      "loss": 3.3084,
+      "step": 345
+    },
+    {
+      "epoch": 0.027674465106978605,
+      "grad_norm": 10.078537940979004,
+      "learning_rate": 0.00017712979623471807,
+      "loss": 4.2793,
+      "step": 346
+    },
+    {
+      "epoch": 0.027754449110177964,
+      "grad_norm": 4.819201469421387,
+      "learning_rate": 0.0001769989446600225,
+      "loss": 4.2297,
+      "step": 347
+    },
+    {
+      "epoch": 0.027834433113377324,
+      "grad_norm": 3.5136406421661377,
+      "learning_rate": 0.00017686776844571988,
+      "loss": 3.05,
+      "step": 348
+    },
+    {
+      "epoch": 0.027914417116576686,
+      "grad_norm": 5.069991111755371,
+      "learning_rate": 0.0001767362681448697,
+      "loss": 4.6393,
+      "step": 349
+    },
+    {
+      "epoch": 0.027994401119776045,
+      "grad_norm": 5.238731384277344,
+      "learning_rate": 0.0001766044443118978,
+      "loss": 4.2915,
+      "step": 350
+    },
+    {
+      "epoch": 0.028074385122975404,
+      "grad_norm": 7.37299919128418,
+      "learning_rate": 0.00017647229750259412,
+      "loss": 4.3018,
+      "step": 351
+    },
+    {
+      "epoch": 0.028154369126174764,
+      "grad_norm": 4.723372459411621,
+      "learning_rate": 0.00017633982827411032,
+      "loss": 2.7848,
+      "step": 352
+    },
+    {
+      "epoch": 0.028234353129374126,
+      "grad_norm": 4.730043411254883,
+      "learning_rate": 0.00017620703718495735,
+      "loss": 4.3046,
+      "step": 353
+    },
+    {
+      "epoch": 0.028314337132573485,
+      "grad_norm": 5.053976535797119,
+      "learning_rate": 0.00017607392479500325,
+      "loss": 3.5784,
+      "step": 354
+    },
+    {
+      "epoch": 0.028394321135772844,
+      "grad_norm": 3.9743452072143555,
+      "learning_rate": 0.00017594049166547073,
+      "loss": 3.5196,
+      "step": 355
+    },
+    {
+      "epoch": 0.028474305138972207,
+      "grad_norm": 3.9845669269561768,
+      "learning_rate": 0.00017580673835893473,
+      "loss": 3.56,
+      "step": 356
+    },
+    {
+      "epoch": 0.028554289142171566,
+      "grad_norm": 4.781105995178223,
+      "learning_rate": 0.00017567266543932014,
+      "loss": 4.1717,
+      "step": 357
+    },
+    {
+      "epoch": 0.028634273145370925,
+      "grad_norm": 6.14730167388916,
+      "learning_rate": 0.00017553827347189938,
+      "loss": 3.501,
+      "step": 358
+    },
+    {
+      "epoch": 0.028714257148570285,
+      "grad_norm": 4.144578456878662,
+      "learning_rate": 0.00017540356302329007,
+      "loss": 3.6008,
+      "step": 359
+    },
+    {
+      "epoch": 0.028794241151769647,
+      "grad_norm": 4.807369709014893,
+      "learning_rate": 0.00017526853466145244,
+      "loss": 4.4085,
+      "step": 360
+    },
+    {
+      "epoch": 0.028874225154969006,
+      "grad_norm": 3.8738791942596436,
+      "learning_rate": 0.00017513318895568737,
+      "loss": 2.7231,
+      "step": 361
+    },
+    {
+      "epoch": 0.028954209158168365,
+      "grad_norm": 4.871976852416992,
+      "learning_rate": 0.0001749975264766334,
+      "loss": 4.6136,
+      "step": 362
+    },
+    {
+      "epoch": 0.029034193161367728,
+      "grad_norm": 3.6613216400146484,
+      "learning_rate": 0.00017486154779626482,
+      "loss": 3.2433,
+      "step": 363
+    },
+    {
+      "epoch": 0.029114177164567087,
+      "grad_norm": 6.628910064697266,
+      "learning_rate": 0.0001747252534878891,
+      "loss": 4.12,
+      "step": 364
+    },
+    {
+      "epoch": 0.029194161167766446,
+      "grad_norm": 3.9633758068084717,
+      "learning_rate": 0.00017458864412614434,
+      "loss": 3.3254,
+      "step": 365
+    },
+    {
+      "epoch": 0.029274145170965805,
+      "grad_norm": 5.479128837585449,
+      "learning_rate": 0.000174451720286997,
+      "loss": 4.8234,
+      "step": 366
+    },
+    {
+      "epoch": 0.029354129174165168,
+      "grad_norm": 5.950286388397217,
+      "learning_rate": 0.00017431448254773944,
+      "loss": 5.4012,
+      "step": 367
+    },
+    {
+      "epoch": 0.029434113177364527,
+      "grad_norm": 3.8191025257110596,
+      "learning_rate": 0.00017417693148698743,
+      "loss": 3.1482,
+      "step": 368
+    },
+    {
+      "epoch": 0.029514097180563886,
+      "grad_norm": 3.818920135498047,
+      "learning_rate": 0.0001740390676846778,
+      "loss": 4.2181,
+      "step": 369
+    },
+    {
+      "epoch": 0.02959408118376325,
+      "grad_norm": 4.036929607391357,
+      "learning_rate": 0.00017390089172206592,
+      "loss": 3.2691,
+      "step": 370
+    },
+    {
+      "epoch": 0.029674065186962608,
+      "grad_norm": 3.912344217300415,
+      "learning_rate": 0.0001737624041817233,
+      "loss": 3.7853,
+      "step": 371
+    },
+    {
+      "epoch": 0.029754049190161967,
+      "grad_norm": 4.815845012664795,
+      "learning_rate": 0.00017362360564753505,
+      "loss": 4.7852,
+      "step": 372
+    },
+    {
+      "epoch": 0.029834033193361326,
+      "grad_norm": 4.733504772186279,
+      "learning_rate": 0.00017348449670469756,
+      "loss": 5.0045,
+      "step": 373
+    },
+    {
+      "epoch": 0.02991401719656069,
+      "grad_norm": 3.8281900882720947,
+      "learning_rate": 0.00017334507793971592,
+      "loss": 3.4749,
+      "step": 374
+    },
+    {
+      "epoch": 0.029994001199760048,
+      "grad_norm": 4.226433277130127,
+      "learning_rate": 0.00017320534994040148,
+      "loss": 3.6603,
+      "step": 375
+    },
+    {
+      "epoch": 0.030073985202959407,
+      "grad_norm": 3.6558430194854736,
+      "learning_rate": 0.00017306531329586933,
+      "loss": 3.5563,
+      "step": 376
+    },
+    {
+      "epoch": 0.03015396920615877,
+      "grad_norm": 3.7194387912750244,
+      "learning_rate": 0.00017292496859653588,
+      "loss": 3.1529,
+      "step": 377
+    },
+    {
+      "epoch": 0.03023395320935813,
+      "grad_norm": 4.224362373352051,
+      "learning_rate": 0.00017278431643411642,
+      "loss": 3.0775,
+      "step": 378
+    },
+    {
+      "epoch": 0.030313937212557488,
+      "grad_norm": 4.306207656860352,
+      "learning_rate": 0.00017264335740162242,
+      "loss": 3.2271,
+      "step": 379
+    },
+    {
+      "epoch": 0.030393921215756847,
+      "grad_norm": 6.243900775909424,
+      "learning_rate": 0.00017250209209335927,
+      "loss": 4.905,
+      "step": 380
+    },
+    {
+      "epoch": 0.03047390521895621,
+      "grad_norm": 4.626831531524658,
+      "learning_rate": 0.00017236052110492365,
+      "loss": 3.9372,
+      "step": 381
+    },
+    {
+      "epoch": 0.03055388922215557,
+      "grad_norm": 3.9243571758270264,
+      "learning_rate": 0.00017221864503320092,
+      "loss": 3.6657,
+      "step": 382
+    },
+    {
+      "epoch": 0.030633873225354928,
+      "grad_norm": 4.48898983001709,
+      "learning_rate": 0.00017207646447636295,
+      "loss": 3.0103,
+      "step": 383
+    },
+    {
+      "epoch": 0.03071385722855429,
+      "grad_norm": 4.642123699188232,
+      "learning_rate": 0.0001719339800338651,
+      "loss": 3.9077,
+      "step": 384
+    },
+    {
+      "epoch": 0.03079384123175365,
+      "grad_norm": 4.266906261444092,
+      "learning_rate": 0.0001717911923064442,
+      "loss": 3.7819,
+      "step": 385
+    },
+    {
+      "epoch": 0.03079384123175365,
+      "eval_loss": 0.9522297978401184,
+      "eval_runtime": 288.1682,
+      "eval_samples_per_second": 18.271,
+      "eval_steps_per_second": 9.137,
+      "step": 385
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1540,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 385,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.7600393986965504e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e9fb63fb901227a78d49b512821152d1ec895187d3fc4c3e2d6807ba942e72b
+size 6776