Upload folder using huggingface_hub


Files changed (11)

README.md +208 -0
adapter_config.json +50 -0
adapter_model.safetensors +3 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +27 -0
tokenizer.json +0 -0
tokenizer_config.json +0 -0
trainer_state.json +1754 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: unsloth/DeepSeek-OCR
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:./deepseek_ocr
+- lora
+- transformers
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

adapter_config.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "DeepseekOCRForCausalLM",
+    "parent_library": "transformers_modules.deepseek_ocr.modeling_deepseekocr",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "unsloth/DeepSeek-OCR",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
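The adapter configuration above describes a rank-16 LoRA (`r: 16`, `lora_alpha: 16`, `use_rslora: false`) applied to the attention and MLP projections listed in `target_modules`. In PEFT's LoRA layers the low-rank update is scaled by `lora_alpha / r`, or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled. The sketch below reads a trimmed copy of the config with the standard library and computes that factor; `lora_scaling` is a hypothetical helper for illustration, not a PEFT function.

```python
import json
import math

# Trimmed copy of the adapter_config.json fields that matter for scaling.
ADAPTER_CONFIG = """
{
  "lora_alpha": 16,
  "r": 16,
  "use_rslora": false,
  "use_dora": false,
  "lora_dropout": 0.05,
  "target_modules": ["gate_proj", "q_proj", "v_proj", "down_proj",
                     "o_proj", "k_proj", "up_proj"]
}
"""


def lora_scaling(cfg: dict) -> float:
    """Hypothetical helper: factor applied to the low-rank update B @ A.

    Follows the PEFT convention: alpha / r normally,
    alpha / sqrt(r) when use_rslora is true.
    """
    alpha, r = cfg["lora_alpha"], cfg["r"]
    if cfg.get("use_rslora", False):
        return alpha / math.sqrt(r)
    return alpha / r


cfg = json.loads(ADAPTER_CONFIG)
print("scaling:", lora_scaling(cfg))  # 16 / 16 = 1.0
print("with rsLoRA:", lora_scaling({**cfg, "use_rslora": True}))  # 16 / 4 = 4.0
print("adapted projections:", sorted(cfg["target_modules"]))
```

With `lora_alpha` equal to `r`, the learned update is added with a scale of 1.0; raising `lora_alpha` while keeping `r` fixed would strengthen it proportionally.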

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8735a7e748c4dbd37bf0ba53cd50beb4c7fa5b399fe79c9f0cfa34247ad42fdf
+size 310662536

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:000c9c28f96addd0d6da09158a5d8fdebf4b237f7161c75b28c5654c2cfa4e55
+size 162452055

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ad25a511bee3332dbed17a70bf3df04eb9aefe047f2ca66f0b3a4e2f5c412ab
+size 14645

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:830f1d36edab13214ac6df2f6ab157c3d7d466ad075f3415bcdbe0e363c167d7
+size 1465
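`adapter_model.safetensors`, `optimizer.pt`, `rng_state.pth` and `scheduler.pt` are committed as Git LFS pointer files. Each pointer consists of three `key value` lines: the spec version, the object ID (a SHA-256 digest), and the size in bytes of the real file. A minimal sketch for parsing such a pointer and checking a downloaded file against it follows. `parse_lfs_pointer` and `verify_blob` are hypothetical helpers, not part of `huggingface_hub` or Git LFS.

```python
import hashlib
from pathlib import Path

# Pointer for adapter_model.safetensors, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8735a7e748c4dbd37bf0ba53cd50beb4c7fa5b399fe79c9f0cfa34247ad42fdf
size 310662536
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def verify_blob(path: Path, pointer: dict) -> bool:
    """Check a downloaded file's size and SHA-256 digest against its pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


ptr = parse_lfs_pointer(POINTER)
print(ptr["algo"], f"{ptr['size'] / 2**20:.1f} MiB")  # sha256 296.3 MiB
```

The size check runs first because it is cheap and rejects truncated downloads before any hashing.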

special_tokens_map.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "additional_special_tokens": [
+    "<|User|>",
+    "<|Assistant|>"
+  ],
+  "bos_token": {
+    "content": "<｜begin▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<｜end▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<｜▁pad▁｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
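The BOS, EOS and pad tokens above are not plain ASCII: they use the fullwidth vertical line `｜` (U+FF5C) and the lower one-eighth block `▁` (U+2581), whereas the two additional tokens `<|User|>` and `<|Assistant|>` use the ASCII `|`. An ASCII look-alike typed by hand is therefore a different string and will generally be tokenized as ordinary text rather than matched as the special token. A small standard-library check of which non-ASCII characters each token contains (the helper `non_ascii_chars` is hypothetical):

```python
import unicodedata

# Tokens from special_tokens_map.json, written with escapes to avoid ambiguity.
SPECIAL_TOKENS = {
    "bos_token": "<\uff5cbegin\u2581of\u2581sentence\uff5c>",
    "eos_token": "<\uff5cend\u2581of\u2581sentence\uff5c>",
    "pad_token": "<\uff5c\u2581pad\u2581\uff5c>",
    "user": "<|User|>",
    "assistant": "<|Assistant|>",
}


def non_ascii_chars(token: str) -> set:
    """Return the Unicode names of any non-ASCII characters in a token."""
    return {unicodedata.name(ch) for ch in token if ord(ch) > 127}


for key, tok in SPECIAL_TOKENS.items():
    print(f"{key:10} {tok!r:30} {sorted(non_ascii_chars(tok))}")

# Replacing the fullwidth bars with ASCII '|' produces a different string.
ascii_bos = SPECIAL_TOKENS["bos_token"].replace("\uff5c", "|")
print(ascii_bos == SPECIAL_TOKENS["bos_token"])
```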

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

The diff for this file is too large to render.

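In the `trainer_state.json` diff that follows, `log_history` interleaves two kinds of entries: training logs every 5 steps (`loss`, `grad_norm`, `learning_rate`) and evaluation logs (`eval_loss`, `eval_runtime`, and so on). The header records the best `eval_loss` as about 0.1444 at step 1200, and the visible excerpt shows it falling from 0.3849 at step 10 to 0.1528 at step 600. The logged learning rates also fit a linear-warmup, linear-decay schedule. The schedule parameters below (peak 2e-4, 5 warmup steps, 1205 total steps, and the value logged at step k being the schedule evaluated at k - 1) are inferred from the logged numbers, not recorded anywhere in this commit; the sketch splits an excerpt of `log_history` and checks that inference.

```python
import math

# Excerpt of log_history entries copied from trainer_state.json.
LOG_HISTORY = [
    {"step": 5, "learning_rate": 0.00016, "loss": 0.7739},
    {"step": 10, "learning_rate": 0.00019933333333333334, "loss": 0.4629},
    {"step": 10, "eval_loss": 0.38488370180130005},
    {"step": 600, "learning_rate": 0.000101, "loss": 0.161},
    {"step": 600, "eval_loss": 0.15276095271110535},
    {"step": 610, "learning_rate": 9.933333333333334e-05, "loss": 0.1565},
]

# ASSUMED schedule parameters, inferred from the logged values above.
PEAK_LR = 2e-4
WARMUP_STEPS = 5
TOTAL_STEPS = 1205


def linear_schedule(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


# Training entries carry "loss"; evaluation entries carry "eval_loss" instead.
train_logs = [e for e in LOG_HISTORY if "loss" in e]
eval_logs = [e for e in LOG_HISTORY if "eval_loss" in e]

for e in train_logs:
    # Assumption: the rate logged at step k is the one used for update k,
    # i.e. the schedule evaluated at k - 1.
    predicted = linear_schedule(e["step"] - 1)
    match = math.isclose(e["learning_rate"], predicted, rel_tol=1e-9)
    print(e["step"], e["learning_rate"], predicted, match)

best = min(eval_logs, key=lambda e: e["eval_loss"])
print("lowest eval_loss in excerpt:", best["step"], best["eval_loss"])
```

If the printed `match` column turns false for other entries in the full file, the inferred parameters are wrong and the actual `TrainingArguments` (stored in `training_args.bin`) should be consulted instead.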
trainer_state.json ADDED

	@@ -0,0 +1,1754 @@

+{
+  "best_global_step": 1200,
+  "best_metric": 0.14440582692623138,
+  "best_model_checkpoint": "./outputs/checkpoint-1200",
+  "epoch": 0.9962640099626401,
+  "eval_steps": 200,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.004151100041511001,
+      "grad_norm": 0.7003103494644165,
+      "learning_rate": 0.00016,
+      "loss": 0.7739,
+      "step": 5
+    },
+    {
+      "epoch": 0.008302200083022002,
+      "grad_norm": 0.2718559503555298,
+      "learning_rate": 0.00019933333333333334,
+      "loss": 0.4629,
+      "step": 10
+    },
+    {
+      "epoch": 0.008302200083022002,
+      "eval_loss": 0.38488370180130005,
+      "eval_runtime": 191.3755,
+      "eval_samples_per_second": 4.703,
+      "eval_steps_per_second": 2.351,
+      "step": 10
+    },
+    {
+      "epoch": 0.012453300124533,
+      "grad_norm": 0.14381290972232819,
+      "learning_rate": 0.00019850000000000003,
+      "loss": 0.3145,
+      "step": 15
+    },
+    {
+      "epoch": 0.016604400166044003,
+      "grad_norm": 0.19169217348098755,
+      "learning_rate": 0.00019766666666666666,
+      "loss": 0.2939,
+      "step": 20
+    },
+    {
+      "epoch": 0.020755500207555,
+      "grad_norm": 0.14952002465724945,
+      "learning_rate": 0.00019683333333333334,
+      "loss": 0.2497,
+      "step": 25
+    },
+    {
+      "epoch": 0.024906600249066,
+      "grad_norm": 0.14751273393630981,
+      "learning_rate": 0.000196,
+      "loss": 0.232,
+      "step": 30
+    },
+    {
+      "epoch": 0.029057700290577002,
+      "grad_norm": 0.14557640254497528,
+      "learning_rate": 0.00019516666666666668,
+      "loss": 0.2456,
+      "step": 35
+    },
+    {
+      "epoch": 0.033208800332088007,
+      "grad_norm": 0.13989262282848358,
+      "learning_rate": 0.00019433333333333333,
+      "loss": 0.2252,
+      "step": 40
+    },
+    {
+      "epoch": 0.037359900373599,
+      "grad_norm": 0.15823808312416077,
+      "learning_rate": 0.00019350000000000001,
+      "loss": 0.2263,
+      "step": 45
+    },
+    {
+      "epoch": 0.04151100041511,
+      "grad_norm": 0.14398093521595,
+      "learning_rate": 0.0001926666666666667,
+      "loss": 0.2162,
+      "step": 50
+    },
+    {
+      "epoch": 0.045662100456621,
+      "grad_norm": 0.12337611615657806,
+      "learning_rate": 0.00019183333333333333,
+      "loss": 0.2065,
+      "step": 55
+    },
+    {
+      "epoch": 0.049813200498132,
+      "grad_norm": 0.1576000601053238,
+      "learning_rate": 0.000191,
+      "loss": 0.2067,
+      "step": 60
+    },
+    {
+      "epoch": 0.053964300539643004,
+      "grad_norm": 0.13802748918533325,
+      "learning_rate": 0.00019016666666666666,
+      "loss": 0.2055,
+      "step": 65
+    },
+    {
+      "epoch": 0.058115400581154004,
+      "grad_norm": 0.12468370050191879,
+      "learning_rate": 0.00018933333333333335,
+      "loss": 0.1899,
+      "step": 70
+    },
+    {
+      "epoch": 0.062266500622665005,
+      "grad_norm": 0.13926441967487335,
+      "learning_rate": 0.0001885,
+      "loss": 0.2005,
+      "step": 75
+    },
+    {
+      "epoch": 0.06641760066417601,
+      "grad_norm": 0.12755008041858673,
+      "learning_rate": 0.00018766666666666668,
+      "loss": 0.1843,
+      "step": 80
+    },
+    {
+      "epoch": 0.070568700705687,
+      "grad_norm": 0.11908452957868576,
+      "learning_rate": 0.00018683333333333334,
+      "loss": 0.195,
+      "step": 85
+    },
+    {
+      "epoch": 0.074719800747198,
+      "grad_norm": 0.10701552778482437,
+      "learning_rate": 0.00018600000000000002,
+      "loss": 0.1864,
+      "step": 90
+    },
+    {
+      "epoch": 0.07887090078870901,
+      "grad_norm": 0.12250486761331558,
+      "learning_rate": 0.00018516666666666668,
+      "loss": 0.1953,
+      "step": 95
+    },
+    {
+      "epoch": 0.08302200083022,
+      "grad_norm": 0.14064641296863556,
+      "learning_rate": 0.00018433333333333333,
+      "loss": 0.1809,
+      "step": 100
+    },
+    {
+      "epoch": 0.08717310087173101,
+      "grad_norm": 0.12372340261936188,
+      "learning_rate": 0.00018350000000000002,
+      "loss": 0.178,
+      "step": 105
+    },
+    {
+      "epoch": 0.091324200913242,
+      "grad_norm": 0.089394710958004,
+      "learning_rate": 0.00018266666666666667,
+      "loss": 0.1641,
+      "step": 110
+    },
+    {
+      "epoch": 0.09547530095475301,
+      "grad_norm": 0.11845199763774872,
+      "learning_rate": 0.00018183333333333335,
+      "loss": 0.1914,
+      "step": 115
+    },
+    {
+      "epoch": 0.099626400996264,
+      "grad_norm": 0.10505373775959015,
+      "learning_rate": 0.000181,
+      "loss": 0.2002,
+      "step": 120
+    },
+    {
+      "epoch": 0.10377750103777501,
+      "grad_norm": 0.11983006447553635,
+      "learning_rate": 0.0001801666666666667,
+      "loss": 0.178,
+      "step": 125
+    },
+    {
+      "epoch": 0.10792860107928601,
+      "grad_norm": 0.14301978051662445,
+      "learning_rate": 0.00017933333333333332,
+      "loss": 0.1838,
+      "step": 130
+    },
+    {
+      "epoch": 0.11207970112079702,
+      "grad_norm": 0.11357705295085907,
+      "learning_rate": 0.0001785,
+      "loss": 0.1758,
+      "step": 135
+    },
+    {
+      "epoch": 0.11623080116230801,
+      "grad_norm": 0.11982124298810959,
+      "learning_rate": 0.00017766666666666666,
+      "loss": 0.1838,
+      "step": 140
+    },
+    {
+      "epoch": 0.12038190120381902,
+      "grad_norm": 0.10831739008426666,
+      "learning_rate": 0.00017683333333333334,
+      "loss": 0.1844,
+      "step": 145
+    },
+    {
+      "epoch": 0.12453300124533001,
+      "grad_norm": 0.1307750791311264,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 0.1682,
+      "step": 150
+    },
+    {
+      "epoch": 0.12868410128684102,
+      "grad_norm": 0.11941725760698318,
+      "learning_rate": 0.00017516666666666668,
+      "loss": 0.1817,
+      "step": 155
+    },
+    {
+      "epoch": 0.13283520132835203,
+      "grad_norm": 0.11858333647251129,
+      "learning_rate": 0.00017433333333333336,
+      "loss": 0.1755,
+      "step": 160
+    },
+    {
+      "epoch": 0.136986301369863,
+      "grad_norm": 0.148487389087677,
+      "learning_rate": 0.00017350000000000002,
+      "loss": 0.1737,
+      "step": 165
+    },
+    {
+      "epoch": 0.141137401411374,
+      "grad_norm": 0.09661240875720978,
+      "learning_rate": 0.00017266666666666667,
+      "loss": 0.1711,
+      "step": 170
+    },
+    {
+      "epoch": 0.14528850145288502,
+      "grad_norm": 0.11982620507478714,
+      "learning_rate": 0.00017183333333333333,
+      "loss": 0.1761,
+      "step": 175
+    },
+    {
+      "epoch": 0.149439601494396,
+      "grad_norm": 0.13303467631340027,
+      "learning_rate": 0.000171,
+      "loss": 0.1852,
+      "step": 180
+    },
+    {
+      "epoch": 0.153590701535907,
+      "grad_norm": 0.17849081754684448,
+      "learning_rate": 0.00017016666666666666,
+      "loss": 0.175,
+      "step": 185
+    },
+    {
+      "epoch": 0.15774180157741802,
+      "grad_norm": 0.1439221203327179,
+      "learning_rate": 0.00016933333333333335,
+      "loss": 0.1829,
+      "step": 190
+    },
+    {
+      "epoch": 0.16189290161892902,
+      "grad_norm": 0.10196991264820099,
+      "learning_rate": 0.0001685,
+      "loss": 0.1806,
+      "step": 195
+    },
+    {
+      "epoch": 0.16604400166044,
+      "grad_norm": 0.08456692099571228,
+      "learning_rate": 0.00016766666666666669,
+      "loss": 0.1654,
+      "step": 200
+    },
+    {
+      "epoch": 0.170195101701951,
+      "grad_norm": 0.11249461770057678,
+      "learning_rate": 0.00016683333333333334,
+      "loss": 0.1742,
+      "step": 205
+    },
+    {
+      "epoch": 0.17434620174346202,
+      "grad_norm": 0.13056688010692596,
+      "learning_rate": 0.000166,
+      "loss": 0.1638,
+      "step": 210
+    },
+    {
+      "epoch": 0.17849730178497303,
+      "grad_norm": 0.11072465032339096,
+      "learning_rate": 0.00016516666666666668,
+      "loss": 0.174,
+      "step": 215
+    },
+    {
+      "epoch": 0.182648401826484,
+      "grad_norm": 0.1256282925605774,
+      "learning_rate": 0.00016433333333333333,
+      "loss": 0.1787,
+      "step": 220
+    },
+    {
+      "epoch": 0.18679950186799502,
+      "grad_norm": 0.11549237370491028,
+      "learning_rate": 0.00016350000000000002,
+      "loss": 0.158,
+      "step": 225
+    },
+    {
+      "epoch": 0.19095060190950602,
+      "grad_norm": 0.1422484964132309,
+      "learning_rate": 0.00016266666666666667,
+      "loss": 0.1763,
+      "step": 230
+    },
+    {
+      "epoch": 0.19510170195101703,
+      "grad_norm": 0.15041838586330414,
+      "learning_rate": 0.00016183333333333335,
+      "loss": 0.1724,
+      "step": 235
+    },
+    {
+      "epoch": 0.199252801992528,
+      "grad_norm": 0.15117141604423523,
+      "learning_rate": 0.000161,
+      "loss": 0.1748,
+      "step": 240
+    },
+    {
+      "epoch": 0.20340390203403902,
+      "grad_norm": 0.13535872101783752,
+      "learning_rate": 0.00016016666666666667,
+      "loss": 0.1639,
+      "step": 245
+    },
+    {
+      "epoch": 0.20755500207555003,
+      "grad_norm": 0.11507098376750946,
+      "learning_rate": 0.00015933333333333332,
+      "loss": 0.1707,
+      "step": 250
+    },
+    {
+      "epoch": 0.21170610211706103,
+      "grad_norm": 0.1293431520462036,
+      "learning_rate": 0.0001585,
+      "loss": 0.1549,
+      "step": 255
+    },
+    {
+      "epoch": 0.21585720215857201,
+      "grad_norm": 0.11451301723718643,
+      "learning_rate": 0.00015766666666666669,
+      "loss": 0.169,
+      "step": 260
+    },
+    {
+      "epoch": 0.22000830220008302,
+      "grad_norm": 0.12253163754940033,
+      "learning_rate": 0.00015683333333333334,
+      "loss": 0.1622,
+      "step": 265
+    },
+    {
+      "epoch": 0.22415940224159403,
+      "grad_norm": 0.12956801056861877,
+      "learning_rate": 0.00015600000000000002,
+      "loss": 0.1682,
+      "step": 270
+    },
+    {
+      "epoch": 0.228310502283105,
+      "grad_norm": 0.13183289766311646,
+      "learning_rate": 0.00015516666666666668,
+      "loss": 0.173,
+      "step": 275
+    },
+    {
+      "epoch": 0.23246160232461602,
+      "grad_norm": 0.10935479402542114,
+      "learning_rate": 0.00015433333333333334,
+      "loss": 0.1655,
+      "step": 280
+    },
+    {
+      "epoch": 0.23661270236612703,
+      "grad_norm": 0.12317913770675659,
+      "learning_rate": 0.0001535,
+      "loss": 0.1737,
+      "step": 285
+    },
+    {
+      "epoch": 0.24076380240763803,
+      "grad_norm": 0.11220147460699081,
+      "learning_rate": 0.00015266666666666667,
+      "loss": 0.1629,
+      "step": 290
+    },
+    {
+      "epoch": 0.244914902449149,
+      "grad_norm": 0.13465231657028198,
+      "learning_rate": 0.00015183333333333333,
+      "loss": 0.1662,
+      "step": 295
+    },
+    {
+      "epoch": 0.24906600249066002,
+      "grad_norm": 0.11543688923120499,
+      "learning_rate": 0.000151,
+      "loss": 0.1696,
+      "step": 300
+    },
+    {
+      "epoch": 0.253217102532171,
+      "grad_norm": 0.11491172760725021,
+      "learning_rate": 0.00015016666666666667,
+      "loss": 0.1658,
+      "step": 305
+    },
+    {
+      "epoch": 0.25736820257368204,
+      "grad_norm": 0.12188146263360977,
+      "learning_rate": 0.00014933333333333335,
+      "loss": 0.1658,
+      "step": 310
+    },
+    {
+      "epoch": 0.261519302615193,
+      "grad_norm": 0.12088894098997116,
+      "learning_rate": 0.0001485,
+      "loss": 0.1784,
+      "step": 315
+    },
+    {
+      "epoch": 0.26567040265670405,
+      "grad_norm": 0.12337731570005417,
+      "learning_rate": 0.00014766666666666666,
+      "loss": 0.166,
+      "step": 320
+    },
+    {
+      "epoch": 0.26982150269821503,
+      "grad_norm": 0.1168065220117569,
+      "learning_rate": 0.00014683333333333334,
+      "loss": 0.1673,
+      "step": 325
+    },
+    {
+      "epoch": 0.273972602739726,
+      "grad_norm": 0.11037846654653549,
+      "learning_rate": 0.000146,
+      "loss": 0.1593,
+      "step": 330
+    },
+    {
+      "epoch": 0.27812370278123705,
+      "grad_norm": 0.1385302096605301,
+      "learning_rate": 0.00014516666666666668,
+      "loss": 0.1711,
+      "step": 335
+    },
+    {
+      "epoch": 0.282274802822748,
+      "grad_norm": 0.12126076221466064,
+      "learning_rate": 0.00014433333333333334,
+      "loss": 0.1672,
+      "step": 340
+    },
+    {
+      "epoch": 0.286425902864259,
+      "grad_norm": 0.13003192842006683,
+      "learning_rate": 0.00014350000000000002,
+      "loss": 0.1856,
+      "step": 345
+    },
+    {
+      "epoch": 0.29057700290577004,
+      "grad_norm": 0.11907174438238144,
+      "learning_rate": 0.00014266666666666667,
+      "loss": 0.1626,
+      "step": 350
+    },
+    {
+      "epoch": 0.294728102947281,
+      "grad_norm": 0.1277119666337967,
+      "learning_rate": 0.00014183333333333333,
+      "loss": 0.1546,
+      "step": 355
+    },
+    {
+      "epoch": 0.298879202988792,
+      "grad_norm": 0.09578083455562592,
+      "learning_rate": 0.000141,
+      "loss": 0.1581,
+      "step": 360
+    },
+    {
+      "epoch": 0.30303030303030304,
+      "grad_norm": 0.14650468528270721,
+      "learning_rate": 0.00014016666666666667,
+      "loss": 0.155,
+      "step": 365
+    },
+    {
+      "epoch": 0.307181403071814,
+      "grad_norm": 0.09313970804214478,
+      "learning_rate": 0.00013933333333333335,
+      "loss": 0.1538,
+      "step": 370
+    },
+    {
+      "epoch": 0.31133250311332505,
+      "grad_norm": 0.12291823327541351,
+      "learning_rate": 0.0001385,
+      "loss": 0.1618,
+      "step": 375
+    },
+    {
+      "epoch": 0.31548360315483603,
+      "grad_norm": 0.15737979114055634,
+      "learning_rate": 0.0001376666666666667,
+      "loss": 0.1489,
+      "step": 380
+    },
+    {
+      "epoch": 0.319634703196347,
+      "grad_norm": 0.1407833844423294,
+      "learning_rate": 0.00013683333333333334,
+      "loss": 0.1667,
+      "step": 385
+    },
+    {
+      "epoch": 0.32378580323785805,
+      "grad_norm": 0.11220885813236237,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 0.1612,
+      "step": 390
+    },
+    {
+      "epoch": 0.32793690327936903,
+      "grad_norm": 0.11603619903326035,
+      "learning_rate": 0.00013516666666666665,
+      "loss": 0.1512,
+      "step": 395
+    },
+    {
+      "epoch": 0.33208800332088,
+      "grad_norm": 0.1272398978471756,
+      "learning_rate": 0.00013433333333333334,
+      "loss": 0.1698,
+      "step": 400
+    },
+    {
+      "epoch": 0.33623910336239105,
+      "grad_norm": 0.09472394734621048,
+      "learning_rate": 0.0001335,
+      "loss": 0.1383,
+      "step": 405
+    },
+    {
+      "epoch": 0.340390203403902,
+      "grad_norm": 0.14773984253406525,
+      "learning_rate": 0.00013266666666666667,
+      "loss": 0.1448,
+      "step": 410
+    },
+    {
+      "epoch": 0.34454130344541306,
+      "grad_norm": 0.12423422932624817,
+      "learning_rate": 0.00013183333333333333,
+      "loss": 0.1592,
+      "step": 415
+    },
+    {
+      "epoch": 0.34869240348692404,
+      "grad_norm": 0.09750059992074966,
+      "learning_rate": 0.000131,
+      "loss": 0.1547,
+      "step": 420
+    },
+    {
+      "epoch": 0.352843503528435,
+      "grad_norm": 0.15196076035499573,
+      "learning_rate": 0.00013016666666666667,
+      "loss": 0.1454,
+      "step": 425
+    },
+    {
+      "epoch": 0.35699460356994606,
+      "grad_norm": 0.13726986944675446,
+      "learning_rate": 0.00012933333333333332,
+      "loss": 0.1671,
+      "step": 430
+    },
+    {
+      "epoch": 0.36114570361145704,
+      "grad_norm": 0.13060466945171356,
+      "learning_rate": 0.0001285,
+      "loss": 0.1547,
+      "step": 435
+    },
+    {
+      "epoch": 0.365296803652968,
+      "grad_norm": 0.12362024933099747,
+      "learning_rate": 0.00012766666666666666,
+      "loss": 0.1616,
+      "step": 440
+    },
+    {
+      "epoch": 0.36944790369447905,
+      "grad_norm": 0.1080276295542717,
+      "learning_rate": 0.00012683333333333334,
+      "loss": 0.1564,
+      "step": 445
+    },
+    {
+      "epoch": 0.37359900373599003,
+      "grad_norm": 0.11262942105531693,
+      "learning_rate": 0.000126,
+      "loss": 0.1604,
+      "step": 450
+    },
+    {
+      "epoch": 0.377750103777501,
+      "grad_norm": 0.13379591703414917,
+      "learning_rate": 0.00012516666666666668,
+      "loss": 0.157,
+      "step": 455
+    },
+    {
+      "epoch": 0.38190120381901205,
+      "grad_norm": 0.12742692232131958,
+      "learning_rate": 0.00012433333333333334,
+      "loss": 0.1579,
+      "step": 460
+    },
+    {
+      "epoch": 0.386052303860523,
+      "grad_norm": 0.10482796281576157,
+      "learning_rate": 0.00012350000000000002,
+      "loss": 0.1528,
+      "step": 465
+    },
+    {
+      "epoch": 0.39020340390203406,
+      "grad_norm": 0.12541286647319794,
+      "learning_rate": 0.00012266666666666668,
+      "loss": 0.1498,
+      "step": 470
+    },
+    {
+      "epoch": 0.39435450394354504,
+      "grad_norm": 0.15082834661006927,
+      "learning_rate": 0.00012183333333333333,
+      "loss": 0.1402,
+      "step": 475
+    },
+    {
+      "epoch": 0.398505603985056,
+      "grad_norm": 0.11872395128011703,
+      "learning_rate": 0.000121,
+      "loss": 0.1613,
+      "step": 480
+    },
+    {
+      "epoch": 0.40265670402656706,
+      "grad_norm": 0.12806229293346405,
+      "learning_rate": 0.00012016666666666667,
+      "loss": 0.1467,
+      "step": 485
+    },
+    {
+      "epoch": 0.40680780406807804,
+      "grad_norm": 0.11704318970441818,
+      "learning_rate": 0.00011933333333333334,
+      "loss": 0.1536,
+      "step": 490
+    },
+    {
+      "epoch": 0.410958904109589,
+      "grad_norm": 0.11440624296665192,
+      "learning_rate": 0.00011850000000000001,
+      "loss": 0.1488,
+      "step": 495
+    },
+    {
+      "epoch": 0.41511000415110005,
+      "grad_norm": 0.1284865289926529,
+      "learning_rate": 0.00011766666666666668,
+      "loss": 0.1479,
+      "step": 500
+    },
+    {
+      "epoch": 0.41926110419261103,
+      "grad_norm": 0.1310071051120758,
+      "learning_rate": 0.00011683333333333333,
+      "loss": 0.1638,
+      "step": 505
+    },
+    {
+      "epoch": 0.42341220423412207,
+      "grad_norm": 0.14244677126407623,
+      "learning_rate": 0.000116,
+      "loss": 0.166,
+      "step": 510
+    },
+    {
+      "epoch": 0.42756330427563305,
+      "grad_norm": 0.12084666639566422,
+      "learning_rate": 0.00011516666666666667,
+      "loss": 0.1521,
+      "step": 515
+    },
+    {
+      "epoch": 0.43171440431714403,
+      "grad_norm": 0.13859711587429047,
+      "learning_rate": 0.00011433333333333334,
+      "loss": 0.1575,
+      "step": 520
+    },
+    {
+      "epoch": 0.43586550435865506,
+      "grad_norm": 0.14870645105838776,
+      "learning_rate": 0.00011350000000000001,
+      "loss": 0.1599,
+      "step": 525
+    },
+    {
+      "epoch": 0.44001660440016604,
+      "grad_norm": 0.12018068134784698,
+      "learning_rate": 0.00011266666666666668,
+      "loss": 0.1648,
+      "step": 530
+    },
+    {
+      "epoch": 0.444167704441677,
+      "grad_norm": 0.120558962225914,
+      "learning_rate": 0.00011183333333333335,
+      "loss": 0.154,
+      "step": 535
+    },
+    {
+      "epoch": 0.44831880448318806,
+      "grad_norm": 0.11315838992595673,
+      "learning_rate": 0.00011100000000000001,
+      "loss": 0.1465,
+      "step": 540
+    },
+    {
+      "epoch": 0.45246990452469904,
+      "grad_norm": 0.1233653798699379,
+      "learning_rate": 0.00011016666666666666,
+      "loss": 0.1389,
+      "step": 545
+    },
+    {
+      "epoch": 0.45662100456621,
+      "grad_norm": 0.12076210975646973,
+      "learning_rate": 0.00010933333333333333,
+      "loss": 0.1575,
+      "step": 550
+    },
+    {
+      "epoch": 0.46077210460772106,
+      "grad_norm": 0.11424656212329865,
+      "learning_rate": 0.00010850000000000001,
+      "loss": 0.1542,
+      "step": 555
+    },
+    {
+      "epoch": 0.46492320464923204,
+      "grad_norm": 0.11583676189184189,
+      "learning_rate": 0.00010766666666666668,
+      "loss": 0.1626,
+      "step": 560
+    },
+    {
+      "epoch": 0.46907430469074307,
+      "grad_norm": 0.12343718856573105,
+      "learning_rate": 0.00010683333333333335,
+      "loss": 0.1394,
+      "step": 565
+    },
+    {
+      "epoch": 0.47322540473225405,
+      "grad_norm": 0.12574954330921173,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 0.1402,
+      "step": 570
+    },
+    {
+      "epoch": 0.47737650477376503,
+      "grad_norm": 0.13551151752471924,
+      "learning_rate": 0.00010516666666666668,
+      "loss": 0.1598,
+      "step": 575
+    },
+    {
+      "epoch": 0.48152760481527607,
+      "grad_norm": 0.12537416815757751,
+      "learning_rate": 0.00010433333333333333,
+      "loss": 0.1588,
+      "step": 580
+    },
+    {
+      "epoch": 0.48567870485678705,
+      "grad_norm": 0.13128598034381866,
+      "learning_rate": 0.0001035,
+      "loss": 0.143,
+      "step": 585
+    },
+    {
+      "epoch": 0.489829804898298,
+      "grad_norm": 0.11566058546304703,
+      "learning_rate": 0.00010266666666666666,
+      "loss": 0.1608,
+      "step": 590
+    },
+    {
+      "epoch": 0.49398090493980906,
+      "grad_norm": 0.11678820848464966,
+      "learning_rate": 0.00010183333333333333,
+      "loss": 0.1505,
+      "step": 595
+    },
+    {
+      "epoch": 0.49813200498132004,
+      "grad_norm": 0.12501274049282074,
+      "learning_rate": 0.000101,
+      "loss": 0.161,
+      "step": 600
+    },
+    {
+      "epoch": 0.49813200498132004,
+      "eval_loss": 0.15276095271110535,
+      "eval_runtime": 185.8767,
+      "eval_samples_per_second": 4.842,
+      "eval_steps_per_second": 2.421,
+      "step": 600
+    },
+    {
+      "epoch": 0.502283105022831,
+      "grad_norm": 0.10359059274196625,
+      "learning_rate": 0.00010016666666666667,
+      "loss": 0.1498,
+      "step": 605
+    },
+    {
+      "epoch": 0.506434205064342,
+      "grad_norm": 0.12891648709774017,
+      "learning_rate": 9.933333333333334e-05,
+      "loss": 0.1565,
+      "step": 610
+    },
+    {
+      "epoch": 0.5105853051058531,
+      "grad_norm": 0.1454884111881256,
+      "learning_rate": 9.850000000000001e-05,
+      "loss": 0.1422,
+      "step": 615
+    },
+    {
+      "epoch": 0.5147364051473641,
+      "grad_norm": 0.12548445165157318,
+      "learning_rate": 9.766666666666668e-05,
+      "loss": 0.176,
+      "step": 620
+    },
+    {
+      "epoch": 0.518887505188875,
+      "grad_norm": 0.11389490962028503,
+      "learning_rate": 9.683333333333335e-05,
+      "loss": 0.152,
+      "step": 625
+    },
+    {
+      "epoch": 0.523038605230386,
+      "grad_norm": 0.13731062412261963,
+      "learning_rate": 9.6e-05,
+      "loss": 0.1438,
+      "step": 630
+    },
+    {
+      "epoch": 0.527189705271897,
+      "grad_norm": 0.10833003371953964,
+      "learning_rate": 9.516666666666667e-05,
+      "loss": 0.136,
+      "step": 635
+    },
+    {
+      "epoch": 0.5313408053134081,
+      "grad_norm": 0.13331717252731323,
+      "learning_rate": 9.433333333333334e-05,
+      "loss": 0.1515,
+      "step": 640
+    },
+    {
+      "epoch": 0.5354919053549191,
+      "grad_norm": 0.11971119791269302,
+      "learning_rate": 9.350000000000001e-05,
+      "loss": 0.1608,
+      "step": 645
+    },
+    {
+      "epoch": 0.5396430053964301,
+      "grad_norm": 0.12283340841531754,
+      "learning_rate": 9.266666666666666e-05,
+      "loss": 0.1478,
+      "step": 650
+    },
+    {
+      "epoch": 0.543794105437941,
+      "grad_norm": 0.18648238480091095,
+      "learning_rate": 9.183333333333333e-05,
+      "loss": 0.1578,
+      "step": 655
+    },
+    {
+      "epoch": 0.547945205479452,
+      "grad_norm": 0.1412057876586914,
+      "learning_rate": 9.1e-05,
+      "loss": 0.1396,
+      "step": 660
+    },
+    {
+      "epoch": 0.552096305520963,
+      "grad_norm": 0.14734192192554474,
+      "learning_rate": 9.016666666666667e-05,
+      "loss": 0.1571,
+      "step": 665
+    },
+    {
+      "epoch": 0.5562474055624741,
+      "grad_norm": 0.10538703948259354,
+      "learning_rate": 8.933333333333334e-05,
+      "loss": 0.1641,
+      "step": 670
+    },
+    {
+      "epoch": 0.5603985056039851,
+      "grad_norm": 0.12703998386859894,
+      "learning_rate": 8.850000000000001e-05,
+      "loss": 0.1559,
+      "step": 675
+    },
+    {
+      "epoch": 0.564549605645496,
+      "grad_norm": 0.12684640288352966,
+      "learning_rate": 8.766666666666668e-05,
+      "loss": 0.1537,
+      "step": 680
+    },
+    {
+      "epoch": 0.568700705687007,
+      "grad_norm": 0.1368802934885025,
+      "learning_rate": 8.683333333333333e-05,
+      "loss": 0.1554,
+      "step": 685
+    },
+    {
+      "epoch": 0.572851805728518,
+      "grad_norm": 0.12192381918430328,
+      "learning_rate": 8.6e-05,
+      "loss": 0.1698,
+      "step": 690
+    },
+    {
+      "epoch": 0.5770029057700291,
+      "grad_norm": 0.09523618221282959,
+      "learning_rate": 8.516666666666667e-05,
+      "loss": 0.1362,
+      "step": 695
+    },
+    {
+      "epoch": 0.5811540058115401,
+      "grad_norm": 0.12437159568071365,
+      "learning_rate": 8.433333333333334e-05,
+      "loss": 0.1526,
+      "step": 700
+    },
+    {
+      "epoch": 0.5853051058530511,
+      "grad_norm": 0.12487108260393143,
+      "learning_rate": 8.35e-05,
+      "loss": 0.1362,
+      "step": 705
+    },
+    {
+      "epoch": 0.589456205894562,
+      "grad_norm": 0.10976472496986389,
+      "learning_rate": 8.266666666666667e-05,
+      "loss": 0.1516,
+      "step": 710
+    },
+    {
+      "epoch": 0.593607305936073,
+      "grad_norm": 0.11062753945589066,
+      "learning_rate": 8.183333333333333e-05,
+      "loss": 0.1486,
+      "step": 715
+    },
+    {
+      "epoch": 0.597758405977584,
+      "grad_norm": 0.1419171243906021,
+      "learning_rate": 8.1e-05,
+      "loss": 0.1558,
+      "step": 720
+    },
+    {
+      "epoch": 0.6019095060190951,
+      "grad_norm": 0.11999824643135071,
+      "learning_rate": 8.016666666666667e-05,
+      "loss": 0.1365,
+      "step": 725
+    },
+    {
+      "epoch": 0.6060606060606061,
+      "grad_norm": 0.12366942316293716,
+      "learning_rate": 7.933333333333334e-05,
+      "loss": 0.1594,
+      "step": 730
+    },
+    {
+      "epoch": 0.6102117061021171,
+      "grad_norm": 0.12560267746448517,
+      "learning_rate": 7.850000000000001e-05,
+      "loss": 0.1469,
+      "step": 735
+    },
+    {
+      "epoch": 0.614362806143628,
+      "grad_norm": 0.12089208513498306,
+      "learning_rate": 7.766666666666667e-05,
+      "loss": 0.1557,
+      "step": 740
+    },
+    {
+      "epoch": 0.618513906185139,
+      "grad_norm": 0.1430719494819641,
+      "learning_rate": 7.683333333333334e-05,
+      "loss": 0.1423,
+      "step": 745
+    },
+    {
+      "epoch": 0.6226650062266501,
+      "grad_norm": 0.13126327097415924,
+      "learning_rate": 7.6e-05,
+      "loss": 0.1468,
+      "step": 750
+    },
+    {
+      "epoch": 0.6268161062681611,
+      "grad_norm": 0.09532318264245987,
+      "learning_rate": 7.516666666666667e-05,
+      "loss": 0.1449,
+      "step": 755
+    },
+    {
+      "epoch": 0.6309672063096721,
+      "grad_norm": 0.12227542698383331,
+      "learning_rate": 7.433333333333333e-05,
+      "loss": 0.1544,
+      "step": 760
+    },
+    {
+      "epoch": 0.635118306351183,
+      "grad_norm": 0.14084969460964203,
+      "learning_rate": 7.35e-05,
+      "loss": 0.1488,
+      "step": 765
+    },
+    {
+      "epoch": 0.639269406392694,
+      "grad_norm": 0.12827131152153015,
+      "learning_rate": 7.266666666666667e-05,
+      "loss": 0.1447,
+      "step": 770
+    },
+    {
+      "epoch": 0.6434205064342051,
+      "grad_norm": 0.14061811566352844,
+      "learning_rate": 7.183333333333334e-05,
+      "loss": 0.1428,
+      "step": 775
+    },
+    {
+      "epoch": 0.6475716064757161,
+      "grad_norm": 0.14365419745445251,
+      "learning_rate": 7.1e-05,
+      "loss": 0.1574,
+      "step": 780
+    },
+    {
+      "epoch": 0.6517227065172271,
+      "grad_norm": 0.11606994271278381,
+      "learning_rate": 7.016666666666667e-05,
+      "loss": 0.146,
+      "step": 785
+    },
+    {
+      "epoch": 0.6558738065587381,
+      "grad_norm": 0.12274261564016342,
+      "learning_rate": 6.933333333333334e-05,
+      "loss": 0.1369,
+      "step": 790
+    },
+    {
+      "epoch": 0.660024906600249,
+      "grad_norm": 0.11611846834421158,
+      "learning_rate": 6.850000000000001e-05,
+      "loss": 0.1557,
+      "step": 795
+    },
+    {
+      "epoch": 0.66417600664176,
+      "grad_norm": 0.13420958817005157,
+      "learning_rate": 6.766666666666667e-05,
+      "loss": 0.1491,
+      "step": 800
+    },
+    {
+      "epoch": 0.66417600664176,
+      "eval_loss": 0.14878682792186737,
+      "eval_runtime": 186.9396,
+      "eval_samples_per_second": 4.814,
+      "eval_steps_per_second": 2.407,
+      "step": 800
+    },
+    {
+      "epoch": 0.6683271066832711,
+      "grad_norm": 0.12904717028141022,
+      "learning_rate": 6.683333333333334e-05,
+      "loss": 0.153,
+      "step": 805
+    },
+    {
+      "epoch": 0.6724782067247821,
+      "grad_norm": 0.12883269786834717,
+      "learning_rate": 6.6e-05,
+      "loss": 0.1455,
+      "step": 810
+    },
+    {
+      "epoch": 0.6766293067662931,
+      "grad_norm": 0.15537196397781372,
+      "learning_rate": 6.516666666666666e-05,
+      "loss": 0.1564,
+      "step": 815
+    },
+    {
+      "epoch": 0.680780406807804,
+      "grad_norm": 0.12396696209907532,
+      "learning_rate": 6.433333333333333e-05,
+      "loss": 0.1458,
+      "step": 820
+    },
+    {
+      "epoch": 0.684931506849315,
+      "grad_norm": 0.15211425721645355,
+      "learning_rate": 6.35e-05,
+      "loss": 0.1424,
+      "step": 825
+    },
+    {
+      "epoch": 0.6890826068908261,
+      "grad_norm": 0.12306790798902512,
+      "learning_rate": 6.266666666666667e-05,
+      "loss": 0.1418,
+      "step": 830
+    },
+    {
+      "epoch": 0.6932337069323371,
+      "grad_norm": 0.13135729730129242,
+      "learning_rate": 6.183333333333334e-05,
+      "loss": 0.1329,
+      "step": 835
+    },
+    {
+      "epoch": 0.6973848069738481,
+      "grad_norm": 0.1494913101196289,
+      "learning_rate": 6.1e-05,
+      "loss": 0.1518,
+      "step": 840
+    },
+    {
+      "epoch": 0.7015359070153591,
+      "grad_norm": 0.10251809656620026,
+      "learning_rate": 6.0166666666666674e-05,
+      "loss": 0.135,
+      "step": 845
+    },
+    {
+      "epoch": 0.70568700705687,
+      "grad_norm": 0.10936664044857025,
+      "learning_rate": 5.9333333333333343e-05,
+      "loss": 0.1498,
+      "step": 850
+    },
+    {
+      "epoch": 0.709838107098381,
+      "grad_norm": 0.14118026196956635,
+      "learning_rate": 5.85e-05,
+      "loss": 0.1549,
+      "step": 855
+    },
+    {
+      "epoch": 0.7139892071398921,
+      "grad_norm": 0.12029966711997986,
+      "learning_rate": 5.766666666666667e-05,
+      "loss": 0.1279,
+      "step": 860
+    },
+    {
+      "epoch": 0.7181403071814031,
+      "grad_norm": 0.13987119495868683,
+      "learning_rate": 5.683333333333334e-05,
+      "loss": 0.1311,
+      "step": 865
+    },
+    {
+      "epoch": 0.7222914072229141,
+      "grad_norm": 0.14721432328224182,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 0.1539,
+      "step": 870
+    },
+    {
+      "epoch": 0.726442507264425,
+      "grad_norm": 0.12505626678466797,
+      "learning_rate": 5.516666666666667e-05,
+      "loss": 0.1479,
+      "step": 875
+    },
+    {
+      "epoch": 0.730593607305936,
+      "grad_norm": 0.13287393748760223,
+      "learning_rate": 5.433333333333334e-05,
+      "loss": 0.1586,
+      "step": 880
+    },
+    {
+      "epoch": 0.7347447073474471,
+      "grad_norm": 0.10323189944028854,
+      "learning_rate": 5.3500000000000006e-05,
+      "loss": 0.1401,
+      "step": 885
+    },
+    {
+      "epoch": 0.7388958073889581,
+      "grad_norm": 0.12466787546873093,
+      "learning_rate": 5.266666666666666e-05,
+      "loss": 0.1428,
+      "step": 890
+    },
+    {
+      "epoch": 0.7430469074304691,
+      "grad_norm": 0.13881418108940125,
+      "learning_rate": 5.183333333333333e-05,
+      "loss": 0.1471,
+      "step": 895
+    },
+    {
+      "epoch": 0.7471980074719801,
+      "grad_norm": 0.1371707320213318,
+      "learning_rate": 5.1000000000000006e-05,
+      "loss": 0.1508,
+      "step": 900
+    },
+    {
+      "epoch": 0.751349107513491,
+      "grad_norm": 0.13635429739952087,
+      "learning_rate": 5.0166666666666675e-05,
+      "loss": 0.147,
+      "step": 905
+    },
+    {
+      "epoch": 0.755500207555002,
+      "grad_norm": 0.126560777425766,
+      "learning_rate": 4.933333333333334e-05,
+      "loss": 0.1365,
+      "step": 910
+    },
+    {
+      "epoch": 0.7596513075965131,
+      "grad_norm": 0.10843600332736969,
+      "learning_rate": 4.85e-05,
+      "loss": 0.1471,
+      "step": 915
+    },
+    {
+      "epoch": 0.7638024076380241,
+      "grad_norm": 0.13791149854660034,
+      "learning_rate": 4.766666666666667e-05,
+      "loss": 0.1511,
+      "step": 920
+    },
+    {
+      "epoch": 0.7679535076795351,
+      "grad_norm": 0.12029317021369934,
+      "learning_rate": 4.683333333333334e-05,
+      "loss": 0.1492,
+      "step": 925
+    },
+    {
+      "epoch": 0.772104607721046,
+      "grad_norm": 0.08313252031803131,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 0.1422,
+      "step": 930
+    },
+    {
+      "epoch": 0.776255707762557,
+      "grad_norm": 0.09343947470188141,
+      "learning_rate": 4.516666666666667e-05,
+      "loss": 0.1467,
+      "step": 935
+    },
+    {
+      "epoch": 0.7804068078040681,
+      "grad_norm": 0.11305926740169525,
+      "learning_rate": 4.433333333333334e-05,
+      "loss": 0.143,
+      "step": 940
+    },
+    {
+      "epoch": 0.7845579078455791,
+      "grad_norm": 0.12202338129281998,
+      "learning_rate": 4.35e-05,
+      "loss": 0.1382,
+      "step": 945
+    },
+    {
+      "epoch": 0.7887090078870901,
+      "grad_norm": 0.13653194904327393,
+      "learning_rate": 4.266666666666667e-05,
+      "loss": 0.1248,
+      "step": 950
+    },
+    {
+      "epoch": 0.7928601079286011,
+      "grad_norm": 0.1358615607023239,
+      "learning_rate": 4.183333333333334e-05,
+      "loss": 0.1528,
+      "step": 955
+    },
+    {
+      "epoch": 0.797011207970112,
+      "grad_norm": 0.13426993787288666,
+      "learning_rate": 4.1e-05,
+      "loss": 0.1511,
+      "step": 960
+    },
+    {
+      "epoch": 0.801162308011623,
+      "grad_norm": 0.08840786665678024,
+      "learning_rate": 4.016666666666667e-05,
+      "loss": 0.147,
+      "step": 965
+    },
+    {
+      "epoch": 0.8053134080531341,
+      "grad_norm": 0.10167238861322403,
+      "learning_rate": 3.933333333333333e-05,
+      "loss": 0.124,
+      "step": 970
+    },
+    {
+      "epoch": 0.8094645080946451,
+      "grad_norm": 0.15286456048488617,
+      "learning_rate": 3.85e-05,
+      "loss": 0.1388,
+      "step": 975
+    },
+    {
+      "epoch": 0.8136156081361561,
+      "grad_norm": 0.12808531522750854,
+      "learning_rate": 3.766666666666667e-05,
+      "loss": 0.1312,
+      "step": 980
+    },
+    {
+      "epoch": 0.8177667081776671,
+      "grad_norm": 0.11656677722930908,
+      "learning_rate": 3.683333333333334e-05,
+      "loss": 0.1479,
+      "step": 985
+    },
+    {
+      "epoch": 0.821917808219178,
+      "grad_norm": 0.10321146994829178,
+      "learning_rate": 3.6e-05,
+      "loss": 0.1233,
+      "step": 990
+    },
+    {
+      "epoch": 0.8260689082606891,
+      "grad_norm": 0.14637711644172668,
+      "learning_rate": 3.516666666666667e-05,
+      "loss": 0.1381,
+      "step": 995
+    },
+    {
+      "epoch": 0.8302200083022001,
+      "grad_norm": 0.11799775063991547,
+      "learning_rate": 3.433333333333333e-05,
+      "loss": 0.1371,
+      "step": 1000
+    },
+    {
+      "epoch": 0.8302200083022001,
+      "eval_loss": 0.14595866203308105,
+      "eval_runtime": 186.6793,
+      "eval_samples_per_second": 4.821,
+      "eval_steps_per_second": 2.411,
+      "step": 1000
+    },
+    {
+      "epoch": 0.8343711083437111,
+      "grad_norm": 0.056576840579509735,
+      "learning_rate": 3.35e-05,
+      "loss": 0.1278,
+      "step": 1005
+    },
+    {
+      "epoch": 0.8385222083852221,
+      "grad_norm": 0.12720656394958496,
+      "learning_rate": 3.266666666666667e-05,
+      "loss": 0.1467,
+      "step": 1010
+    },
+    {
+      "epoch": 0.842673308426733,
+      "grad_norm": 0.1243145540356636,
+      "learning_rate": 3.183333333333334e-05,
+      "loss": 0.1404,
+      "step": 1015
+    },
+    {
+      "epoch": 0.8468244084682441,
+      "grad_norm": 0.12584663927555084,
+      "learning_rate": 3.1e-05,
+      "loss": 0.1511,
+      "step": 1020
+    },
+    {
+      "epoch": 0.8509755085097551,
+      "grad_norm": 0.14398354291915894,
+      "learning_rate": 3.016666666666667e-05,
+      "loss": 0.1435,
+      "step": 1025
+    },
+    {
+      "epoch": 0.8551266085512661,
+      "grad_norm": 0.17242616415023804,
+      "learning_rate": 2.9333333333333336e-05,
+      "loss": 0.1524,
+      "step": 1030
+    },
+    {
+      "epoch": 0.8592777085927771,
+      "grad_norm": 0.11134395748376846,
+      "learning_rate": 2.8499999999999998e-05,
+      "loss": 0.1406,
+      "step": 1035
+    },
+    {
+      "epoch": 0.8634288086342881,
+      "grad_norm": 0.13715781271457672,
+      "learning_rate": 2.7666666666666667e-05,
+      "loss": 0.1504,
+      "step": 1040
+    },
+    {
+      "epoch": 0.867579908675799,
+      "grad_norm": 0.13127276301383972,
+      "learning_rate": 2.6833333333333333e-05,
+      "loss": 0.1465,
+      "step": 1045
+    },
+    {
+      "epoch": 0.8717310087173101,
+      "grad_norm": 0.1410035789012909,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 0.1289,
+      "step": 1050
+    },
+    {
+      "epoch": 0.8758821087588211,
+      "grad_norm": 0.1502102166414261,
+      "learning_rate": 2.5166666666666667e-05,
+      "loss": 0.1367,
+      "step": 1055
+    },
+    {
+      "epoch": 0.8800332088003321,
+      "grad_norm": 0.12710800766944885,
+      "learning_rate": 2.4333333333333336e-05,
+      "loss": 0.135,
+      "step": 1060
+    },
+    {
+      "epoch": 0.8841843088418431,
+      "grad_norm": 0.1329444795846939,
+      "learning_rate": 2.35e-05,
+      "loss": 0.1315,
+      "step": 1065
+    },
+    {
+      "epoch": 0.888335408883354,
+      "grad_norm": 0.12778909504413605,
+      "learning_rate": 2.2666666666666668e-05,
+      "loss": 0.1541,
+      "step": 1070
+    },
+    {
+      "epoch": 0.8924865089248651,
+      "grad_norm": 0.12005715072154999,
+      "learning_rate": 2.1833333333333333e-05,
+      "loss": 0.151,
+      "step": 1075
+    },
+    {
+      "epoch": 0.8966376089663761,
+      "grad_norm": 0.08895500004291534,
+      "learning_rate": 2.1e-05,
+      "loss": 0.1387,
+      "step": 1080
+    },
+    {
+      "epoch": 0.9007887090078871,
+      "grad_norm": 0.12626707553863525,
+      "learning_rate": 2.0166666666666668e-05,
+      "loss": 0.149,
+      "step": 1085
+    },
+    {
+      "epoch": 0.9049398090493981,
+      "grad_norm": 0.13254553079605103,
+      "learning_rate": 1.9333333333333333e-05,
+      "loss": 0.1414,
+      "step": 1090
+    },
+    {
+      "epoch": 0.9090909090909091,
+      "grad_norm": 0.1267685890197754,
+      "learning_rate": 1.85e-05,
+      "loss": 0.1647,
+      "step": 1095
+    },
+    {
+      "epoch": 0.91324200913242,
+      "grad_norm": 0.12603411078453064,
+      "learning_rate": 1.7666666666666668e-05,
+      "loss": 0.1407,
+      "step": 1100
+    },
+    {
+      "epoch": 0.9173931091739311,
+      "grad_norm": 0.172598734498024,
+      "learning_rate": 1.6833333333333334e-05,
+      "loss": 0.1435,
+      "step": 1105
+    },
+    {
+      "epoch": 0.9215442092154421,
+      "grad_norm": 0.143843412399292,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 0.1517,
+      "step": 1110
+    },
+    {
+      "epoch": 0.9256953092569531,
+      "grad_norm": 0.14932668209075928,
+      "learning_rate": 1.5166666666666668e-05,
+      "loss": 0.1528,
+      "step": 1115
+    },
+    {
+      "epoch": 0.9298464092984641,
+      "grad_norm": 0.1127849742770195,
+      "learning_rate": 1.4333333333333334e-05,
+      "loss": 0.1355,
+      "step": 1120
+    },
+    {
+      "epoch": 0.933997509339975,
+      "grad_norm": 0.1379147619009018,
+      "learning_rate": 1.3500000000000001e-05,
+      "loss": 0.1505,
+      "step": 1125
+    },
+    {
+      "epoch": 0.9381486093814861,
+      "grad_norm": 0.11335213482379913,
+      "learning_rate": 1.2666666666666668e-05,
+      "loss": 0.1517,
+      "step": 1130
+    },
+    {
+      "epoch": 0.9422997094229971,
+      "grad_norm": 0.13676024973392487,
+      "learning_rate": 1.1833333333333334e-05,
+      "loss": 0.1487,
+      "step": 1135
+    },
+    {
+      "epoch": 0.9464508094645081,
+      "grad_norm": 0.11180838197469711,
+      "learning_rate": 1.1000000000000001e-05,
+      "loss": 0.124,
+      "step": 1140
+    },
+    {
+      "epoch": 0.9506019095060191,
+      "grad_norm": 0.13548138737678528,
+      "learning_rate": 1.0166666666666667e-05,
+      "loss": 0.1496,
+      "step": 1145
+    },
+    {
+      "epoch": 0.9547530095475301,
+      "grad_norm": 0.1309673935174942,
+      "learning_rate": 9.333333333333334e-06,
+      "loss": 0.1315,
+      "step": 1150
+    },
+    {
+      "epoch": 0.958904109589041,
+      "grad_norm": 0.11803894490003586,
+      "learning_rate": 8.500000000000002e-06,
+      "loss": 0.1609,
+      "step": 1155
+    },
+    {
+      "epoch": 0.9630552096305521,
+      "grad_norm": 0.12026551365852356,
+      "learning_rate": 7.666666666666667e-06,
+      "loss": 0.145,
+      "step": 1160
+    },
+    {
+      "epoch": 0.9672063096720631,
+      "grad_norm": 0.14298652112483978,
+      "learning_rate": 6.833333333333333e-06,
+      "loss": 0.154,
+      "step": 1165
+    },
+    {
+      "epoch": 0.9713574097135741,
+      "grad_norm": 0.13830389082431793,
+      "learning_rate": 6e-06,
+      "loss": 0.1373,
+      "step": 1170
+    },
+    {
+      "epoch": 0.9755085097550851,
+      "grad_norm": 0.1225619986653328,
+      "learning_rate": 5.166666666666667e-06,
+      "loss": 0.1522,
+      "step": 1175
+    },
+    {
+      "epoch": 0.979659609796596,
+      "grad_norm": 0.1404723823070526,
+      "learning_rate": 4.333333333333334e-06,
+      "loss": 0.1517,
+      "step": 1180
+    },
+    {
+      "epoch": 0.9838107098381071,
+      "grad_norm": 0.12082472443580627,
+      "learning_rate": 3.5000000000000004e-06,
+      "loss": 0.1503,
+      "step": 1185
+    },
+    {
+      "epoch": 0.9879618098796181,
+      "grad_norm": 0.09916210919618607,
+      "learning_rate": 2.666666666666667e-06,
+      "loss": 0.1369,
+      "step": 1190
+    },
+    {
+      "epoch": 0.9921129099211291,
+      "grad_norm": 0.13724960386753082,
+      "learning_rate": 1.8333333333333335e-06,
+      "loss": 0.1304,
+      "step": 1195
+    },
+    {
+      "epoch": 0.9962640099626401,
+      "grad_norm": 0.13304875791072845,
+      "learning_rate": 1.0000000000000002e-06,
+      "loss": 0.1429,
+      "step": 1200
+    },
+    {
+      "epoch": 0.9962640099626401,
+      "eval_loss": 0.14440582692623138,
+      "eval_runtime": 186.846,
+      "eval_samples_per_second": 4.817,
+      "eval_steps_per_second": 2.408,
+      "step": 1200
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 1205,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.0590193691259955e+18,
+  "train_batch_size": 10,
+  "trial_name": null,
+  "trial_params": null
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d3c9c56184f5d8d7b68e83a22f6d77f9526e1012e8aa88632b3d7e2fb11870a
+size 5713