VoltageVagabond committed on Apr 8

Commit

fd24fbb

verified

1 Parent(s): e0fd362

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

.gitattributes +1 -0
README.md +2 -2
_git_history_archive.txt +67 -0
adapters_backup/README.md +2 -2
adapters_backup/adapter_config.json +4 -4
adapters_backup/adapter_model.safetensors +1 -1
adapters_backup/checkpoint-1600/adapter_config.json +4 -4
adapters_backup/checkpoint-1600/adapter_model.safetensors +1 -1
adapters_backup/checkpoint-1600/optimizer.pt +1 -1
adapters_backup/checkpoint-1600/rng_state.pth +1 -1
adapters_backup/checkpoint-1600/scheduler.pt +1 -1
adapters_backup/checkpoint-1600/trainer_state.json +1123 -1123
adapters_backup/checkpoint-1600/training_args.bin +2 -2
adapters_backup/checkpoint-3200/README.md +209 -0
adapters_backup/checkpoint-3200/adapter_config.json +47 -0
adapters_backup/checkpoint-3200/adapter_model.safetensors +3 -0
adapters_backup/checkpoint-3200/chat_template.jinja +45 -0
adapters_backup/checkpoint-3200/optimizer.pt +3 -0
adapters_backup/checkpoint-3200/rng_state.pth +3 -0
adapters_backup/checkpoint-3200/scheduler.pt +3 -0
adapters_backup/checkpoint-3200/tokenizer.json +0 -0
adapters_backup/checkpoint-3200/tokenizer_config.json +19 -0
adapters_backup/checkpoint-3200/trainer_state.json +3234 -0
adapters_backup/checkpoint-3200/training_args.bin +3 -0
adapters_backup/checkpoint-4800/README.md +209 -0
adapters_backup/checkpoint-4800/adapter_config.json +47 -0
adapters_backup/checkpoint-4800/adapter_model.safetensors +3 -0
adapters_backup/checkpoint-4800/chat_template.jinja +45 -0
adapters_backup/checkpoint-4800/optimizer.pt +3 -0
adapters_backup/checkpoint-4800/rng_state.pth +3 -0
adapters_backup/checkpoint-4800/scheduler.pt +3 -0
adapters_backup/checkpoint-4800/tokenizer.json +0 -0
adapters_backup/checkpoint-4800/tokenizer_config.json +19 -0
adapters_backup/checkpoint-4800/trainer_state.json +0 -0
adapters_backup/checkpoint-4800/training_args.bin +3 -0
adapters_backup/training_args.bin +2 -2
adapters_full/README.md +62 -0
adapters_full/adapter_config.json +47 -0
adapters_full/adapter_model.safetensors +3 -0
adapters_full/chat_template.jinja +45 -0
adapters_full/checkpoint-4000/README.md +209 -0
adapters_full/checkpoint-4000/adapter_config.json +47 -0
adapters_full/checkpoint-4000/adapter_model.safetensors +3 -0
adapters_full/checkpoint-4000/chat_template.jinja +45 -0
adapters_full/checkpoint-4000/optimizer.pt +3 -0
adapters_full/checkpoint-4000/rng_state.pth +3 -0
adapters_full/checkpoint-4000/scheduler.pt +3 -0
adapters_full/checkpoint-4000/tokenizer.json +0 -0
adapters_full/checkpoint-4000/tokenizer_config.json +19 -0
adapters_full/checkpoint-4000/trainer_state.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+docs/references/papers/LFM2_TechReport.pdf filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -108,8 +108,8 @@ It is **not** intended for production spam filtering.
 | Model | Description | Link |
 |-------|-------------|------|
 | spam-classifier-mlx | Qwen 3.5 0.8B MLX LoRA fine-tune | [VoltageVagabond/spam-classifier-mlx](https://huggingface.co/VoltageVagabond/spam-classifier-mlx) |
-| spam-classifier-gradio-model | sklearn voting ensemble (RF + LR + SVM) | [VoltageVagabond/spam-classifier-gradio-model](https://huggingface.co/VoltageVagabond/spam-classifier-gradio-model) |
-| spam-xai-model | Calibrated Random Forest with XAI | [VoltageVagabond/spam-xai-model](https://huggingface.co/VoltageVagabond/spam-xai-model) |
+| spam-xai-model | sklearn voting ensemble (RF + LR + SVM) with LIME/SHAP/ELI5 explainability | [VoltageVagabond/spam-xai-model](https://huggingface.co/VoltageVagabond/spam-xai-model) |
+| spam-xai-classifier (Space) | Live Gradio web app for the sklearn classifier | [VoltageVagabond/spam-xai-classifier](https://huggingface.co/spaces/VoltageVagabond/spam-xai-classifier) |
 ## Citation

_git_history_archive.txt ADDED

@@ -0,0 +1,67 @@

+# Git History Archive — spam-classifier-liquid
+# Saved 2026-04-07 before absorbing into parent repo
+# Original repo had no remote; this is a flat snapshot of the local commit log.
+## Full log (--all --decorate --graph)
+* 8c0f1bf 2026-03-27  (HEAD -> main) docs: update changelog with v0.3.1 — timing corrections and code sources reference
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 02920a6 2026-03-27  docs: update training times — ~45 min (notebook, 1 epoch) / ~2-2.5 hrs (full, 3 epochs)
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 7b53739 2026-03-27  docs: add code sources reference — every snippet traced to its origin
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 9bf2ded 2026-03-27  docs: update changelog with v0.3.0 cookbook-aligned LoRA config
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* b890c3b 2026-03-27  feat: update LoRA config to match Liquid AI official cookbook
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* dfca3ff 2026-03-27  docs: update changelog — no orphaned port issue in Liquid AI version
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* cd2c511 2026-03-27  docs: update changelog — batch size 8 tested and reverted
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* f212409 2026-03-27  revert: batch size back to 4 — MPS saturated, no speed gain at 8
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* e5f71f0 2026-03-27  perf: increase batch size to 8 for faster training
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 4a4c721 2026-03-27  docs: update changelog with v0.2.0 performance tuning
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 7ca6a5c 2026-03-27  perf: increase batch size to 4 and LoRA rank to 32 for faster, better training
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* f8010cc 2026-03-27  docs: update changelog with v0.1.1 fixes
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* b89f744 2026-03-27  fix: rename max_seq_length to max_length for TRL v0.29 compatibility
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 778a3dd 2026-03-27  feat: add interactive Jupyter notebook walkthrough
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* d39660b 2026-03-27  docs: add beginner-friendly guides (Liquid AI, LoRA, training, setup)
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 258e8ff 2026-03-27  feat: add Gradio web UI with Classify and Chat tabs
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* 81dd454 2026-03-27  feat: add LoRA fine-tuning script using TRL SFTTrainer
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* ffadd3f 2026-03-27  feat: add macOS .command launcher scripts
+|   Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+|
+* e6f7f30 2026-03-27  chore: initial project scaffolding for Liquid AI spam classifier
+    Dakwan Balfour <JOhNdOe-hue-cyber@users.noreply.github.com>
+## Branches
+* main
+## Tags

adapters_backup/README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 base_model: LiquidAI/LFM2.5-1.2B-Instruct
 library_name: peft
-model_name: adapters
+model_name: adapters_fast
 tags:
 - base_model:adapter:LiquidAI/LFM2.5-1.2B-Instruct
 - lora
@@ -12,7 +12,7 @@ licence: license
 pipeline_tag: text-generation
 ---
-# Model Card for adapters
+# Model Card for adapters_fast
 This model is a fine-tuned version of [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct).
 It has been trained using [TRL](https://github.com/huggingface/trl).

adapters_backup/adapter_config.json CHANGED

@@ -30,13 +30,13 @@
   "revision": null,
   "target_modules": [
     "w1",
-    "w2",
-    "in_proj",
     "out_proj",
+    "w3",
+    "w2",
     "v_proj",
-    "k_proj",
+    "in_proj",
     "q_proj",
-    "w3"
+    "k_proj"
   ],
   "target_parameters": null,
   "task_type": "CAUSAL_LM",

adapters_backup/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dfb25d4ed3ce27f55a8a6b3ed88c2b0c217532929c2450711bf97b6adfb230c1
+oid sha256:a19d950faf1cff366b898e918ccf3219ec7b5afe8fd3eda00c1064a2aa7e3423
 size 22240880

adapters_backup/checkpoint-1600/adapter_config.json CHANGED

@@ -30,13 +30,13 @@
   "revision": null,
   "target_modules": [
     "w1",
-    "w2",
-    "in_proj",
     "out_proj",
+    "w3",
+    "w2",
     "v_proj",
-    "k_proj",
+    "in_proj",
     "q_proj",
-    "w3"
+    "k_proj"
   ],
   "target_parameters": null,
   "task_type": "CAUSAL_LM",

adapters_backup/checkpoint-1600/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e53cd62e6a555731ac6fd03c4028a958a01ce2c839440d543657c7341d04fb9f
+oid sha256:abd63097be6ea3cd4fe1b79066a55c046a0e2296e776e5d01d6ce1410b4c0ed7
 size 22240880

adapters_backup/checkpoint-1600/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4950d6b686bed6152569f1ef8141dfb6dc64724c624dc0f2642c3d762b6eead6
+oid sha256:2eeb55c4d9414b608a34920cf1d4b09c70f0d2284a48a0e69189be3b09578c9a
 size 44583435

adapters_backup/checkpoint-1600/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3a77d4a8b98ce027a4d6a3b9fb5d7c904e27ec1efd5c0468c24fa26bb738316
+oid sha256:2cddf27219365242ec1046a3532a63a24c3f350c77f100e4f973369db2cc849d
 size 14455

adapters_backup/checkpoint-1600/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8a46ff0609e31271554a0745b8ee400c57d37b3006b0e239b124dc8f3c864c23
+oid sha256:b48ba3b2ef84e73f260e2408a9f93631baf669715d86d243a5a69bcecc482044
 size 1465

adapters_backup/checkpoint-1600/trainer_state.json CHANGED

@@ -2,7 +2,7 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 2.0,
   "eval_steps": 500,
   "global_step": 1600,
   "is_hyper_param_search": false,
@@ -10,1608 +10,1608 @@
   "is_world_process_zero": true,
   "log_history": [
     {
-      "entropy": 2.4365317583084107,
-      "epoch": 0.0125,
-      "grad_norm": 2.2641165256500244,
-      "learning_rate": 0.00019925,
-      "loss": 3.0923173904418944,
-      "mean_token_accuracy": 0.4703604757785797,
-      "num_tokens": 19650.0,
       "step": 10
     },
     {
-      "entropy": 2.2125820398330687,
-      "epoch": 0.025,
-      "grad_norm": 1.2537646293640137,
-      "learning_rate": 0.00019841666666666667,
-      "loss": 2.1859312057495117,
-      "mean_token_accuracy": 0.5804757893085479,
-      "num_tokens": 39600.0,
       "step": 20
     },
     {
-      "entropy": 1.9064133524894715,
-      "epoch": 0.0375,
-      "grad_norm": 1.112907886505127,
-      "learning_rate": 0.00019758333333333333,
-      "loss": 1.8618108749389648,
-      "mean_token_accuracy": 0.6217218160629272,
-      "num_tokens": 58974.0,
       "step": 30
     },
     {
-      "entropy": 1.7035388112068177,
-      "epoch": 0.05,
-      "grad_norm": 1.0934430360794067,
-      "learning_rate": 0.00019675,
-      "loss": 1.6600250244140624,
-      "mean_token_accuracy": 0.6560751855373382,
-      "num_tokens": 78632.0,
       "step": 40
     },
     {
-      "entropy": 1.679866325855255,
-      "epoch": 0.0625,
-      "grad_norm": 0.9702284932136536,
-      "learning_rate": 0.0001959166666666667,
-      "loss": 1.6569272994995117,
-      "mean_token_accuracy": 0.6543680191040039,
-      "num_tokens": 98467.0,
       "step": 50
     },
     {
-      "entropy": 1.650013840198517,
-      "epoch": 0.075,
-      "grad_norm": 1.0482919216156006,
-      "learning_rate": 0.00019508333333333335,
-      "loss": 1.6767396926879883,
-      "mean_token_accuracy": 0.6546808481216431,
-      "num_tokens": 118695.0,
       "step": 60
     },
     {
-      "entropy": 1.6587832570075989,
-      "epoch": 0.0875,
-      "grad_norm": 1.0625005960464478,
-      "learning_rate": 0.00019425,
-      "loss": 1.6058927536010743,
-      "mean_token_accuracy": 0.6624441742897034,
-      "num_tokens": 138581.0,
       "step": 70
     },
     {
-      "entropy": 1.5274210929870606,
-      "epoch": 0.1,
-      "grad_norm": 1.104002594947815,
-      "learning_rate": 0.00019341666666666666,
-      "loss": 1.5227754592895508,
-      "mean_token_accuracy": 0.672959280014038,
-      "num_tokens": 158134.0,
       "step": 80
     },
     {
-      "entropy": 1.6366503357887268,
-      "epoch": 0.1125,
-      "grad_norm": 1.202987551689148,
-      "learning_rate": 0.00019258333333333334,
-      "loss": 1.6429677963256837,
-      "mean_token_accuracy": 0.6619946777820587,
-      "num_tokens": 177805.0,
       "step": 90
     },
     {
-      "entropy": 1.4932058334350586,
-      "epoch": 0.125,
-      "grad_norm": 1.1348373889923096,
-      "learning_rate": 0.00019175,
-      "loss": 1.4319764137268067,
-      "mean_token_accuracy": 0.6968584775924682,
-      "num_tokens": 196734.0,
       "step": 100
     },
     {
-      "entropy": 1.6097277998924255,
-      "epoch": 0.1375,
-      "grad_norm": 1.0352333784103394,
-      "learning_rate": 0.00019091666666666668,
-      "loss": 1.6199520111083985,
-      "mean_token_accuracy": 0.6659628450870514,
-      "num_tokens": 216459.0,
       "step": 110
     },
     {
-      "entropy": 1.4460569024085999,
-      "epoch": 0.15,
-      "grad_norm": 1.0603595972061157,
-      "learning_rate": 0.00019008333333333334,
-      "loss": 1.4454618453979493,
-      "mean_token_accuracy": 0.6908528983592988,
-      "num_tokens": 235952.0,
       "step": 120
     },
     {
-      "entropy": 1.486998987197876,
-      "epoch": 0.1625,
-      "grad_norm": 1.1332181692123413,
-      "learning_rate": 0.00018925000000000002,
-      "loss": 1.4601757049560546,
-      "mean_token_accuracy": 0.6834868788719177,
-      "num_tokens": 255310.0,
       "step": 130
     },
     {
-      "entropy": 1.4068393468856812,
-      "epoch": 0.175,
-      "grad_norm": 1.4150217771530151,
-      "learning_rate": 0.00018841666666666667,
-      "loss": 1.3998458862304688,
-      "mean_token_accuracy": 0.6949155867099762,
-      "num_tokens": 274543.0,
       "step": 140
     },
     {
-      "entropy": 1.518944799900055,
-      "epoch": 0.1875,
-      "grad_norm": 1.0065048933029175,
-      "learning_rate": 0.00018758333333333333,
-      "loss": 1.5198850631713867,
-      "mean_token_accuracy": 0.6846470057964325,
-      "num_tokens": 294346.0,
       "step": 150
     },
     {
-      "entropy": 1.5140818357467651,
-      "epoch": 0.2,
-      "grad_norm": 1.0254008769989014,
-      "learning_rate": 0.00018675,
-      "loss": 1.4625560760498046,
-      "mean_token_accuracy": 0.6905276715755463,
-      "num_tokens": 314317.0,
       "step": 160
     },
     {
-      "entropy": 1.3697773694992066,
-      "epoch": 0.2125,
-      "grad_norm": 1.0158512592315674,
-      "learning_rate": 0.00018591666666666667,
-      "loss": 1.3859831809997558,
-      "mean_token_accuracy": 0.702717524766922,
-      "num_tokens": 333418.0,
       "step": 170
     },
     {
-      "entropy": 1.3790300369262696,
-      "epoch": 0.225,
-      "grad_norm": 0.9853971600532532,
-      "learning_rate": 0.00018508333333333335,
-      "loss": 1.3514082908630372,
-      "mean_token_accuracy": 0.7031752705574036,
-      "num_tokens": 352849.0,
       "step": 180
     },
     {
-      "entropy": 1.44560107588768,
-      "epoch": 0.2375,
-      "grad_norm": 1.0075277090072632,
-      "learning_rate": 0.00018425,
-      "loss": 1.4432807922363282,
-      "mean_token_accuracy": 0.6883994936943054,
-      "num_tokens": 371968.0,
       "step": 190
     },
     {
-      "entropy": 1.4308308720588685,
-      "epoch": 0.25,
-      "grad_norm": 1.069111943244934,
-      "learning_rate": 0.0001834166666666667,
-      "loss": 1.399219036102295,
-      "mean_token_accuracy": 0.6990507245063782,
-      "num_tokens": 391922.0,
       "step": 200
     },
     {
-      "entropy": 1.5079384326934815,
-      "epoch": 0.2625,
-      "grad_norm": 1.0157058238983154,
-      "learning_rate": 0.00018258333333333334,
-      "loss": 1.5034866333007812,
-      "mean_token_accuracy": 0.6795864582061768,
-      "num_tokens": 411933.0,
       "step": 210
     },
     {
-      "entropy": 1.3697621464729308,
-      "epoch": 0.275,
-      "grad_norm": 1.0230482816696167,
-      "learning_rate": 0.00018175,
-      "loss": 1.3737371444702149,
-      "mean_token_accuracy": 0.6980592548847199,
-      "num_tokens": 431351.0,
       "step": 220
     },
     {
-      "entropy": 1.4589454770088195,
-      "epoch": 0.2875,
-      "grad_norm": 0.9660580158233643,
-      "learning_rate": 0.00018091666666666666,
-      "loss": 1.4150454521179199,
-      "mean_token_accuracy": 0.6872711777687073,
-      "num_tokens": 451445.0,
       "step": 230
     },
     {
-      "entropy": 1.4555582284927369,
-      "epoch": 0.3,
-      "grad_norm": 0.9709576964378357,
-      "learning_rate": 0.00018008333333333334,
-      "loss": 1.4869775772094727,
-      "mean_token_accuracy": 0.6791890025138855,
-      "num_tokens": 470890.0,
       "step": 240
     },
     {
-      "entropy": 1.3962551832199097,
-      "epoch": 0.3125,
-      "grad_norm": 1.033650279045105,
-      "learning_rate": 0.00017925000000000002,
-      "loss": 1.365213680267334,
-      "mean_token_accuracy": 0.7061276912689209,
-      "num_tokens": 490394.0,
       "step": 250
     },
     {
-      "entropy": 1.431434118747711,
-      "epoch": 0.325,
-      "grad_norm": 0.9621152281761169,
-      "learning_rate": 0.00017841666666666668,
-      "loss": 1.4454474449157715,
-      "mean_token_accuracy": 0.6888130843639374,
-      "num_tokens": 509844.0,
       "step": 260
     },
     {
-      "entropy": 1.4493051528930665,
-      "epoch": 0.3375,
-      "grad_norm": 1.0050407648086548,
-      "learning_rate": 0.00017758333333333336,
-      "loss": 1.4004398345947267,
-      "mean_token_accuracy": 0.6924900412559509,
-      "num_tokens": 529701.0,
       "step": 270
     },
     {
-      "entropy": 1.3920022606849671,
-      "epoch": 0.35,
-      "grad_norm": 1.0407099723815918,
-      "learning_rate": 0.00017675000000000001,
-      "loss": 1.3911012649536132,
-      "mean_token_accuracy": 0.698156726360321,
-      "num_tokens": 548793.0,
       "step": 280
     },
     {
-      "entropy": 1.3564496397972108,
-      "epoch": 0.3625,
-      "grad_norm": 0.9233337044715881,
-      "learning_rate": 0.0001759166666666667,
-      "loss": 1.3383137702941894,
-      "mean_token_accuracy": 0.7085518896579742,
-      "num_tokens": 568445.0,
       "step": 290
     },
     {
-      "entropy": 1.2203116059303283,
-      "epoch": 0.375,
-      "grad_norm": 1.249506950378418,
-      "learning_rate": 0.00017508333333333332,
-      "loss": 1.2175632476806642,
-      "mean_token_accuracy": 0.7249119699001312,
-      "num_tokens": 587359.0,
       "step": 300
     },
     {
-      "entropy": 1.399228608608246,
-      "epoch": 0.3875,
-      "grad_norm": 0.9400711059570312,
-      "learning_rate": 0.00017425,
-      "loss": 1.3867461204528808,
-      "mean_token_accuracy": 0.701468026638031,
-      "num_tokens": 607145.0,
       "step": 310
     },
     {
-      "entropy": 1.4428321838378906,
-      "epoch": 0.4,
-      "grad_norm": 1.0482590198516846,
-      "learning_rate": 0.00017341666666666666,
-      "loss": 1.460653591156006,
-      "mean_token_accuracy": 0.6849877774715424,
-      "num_tokens": 626704.0,
       "step": 320
     },
     {
-      "entropy": 1.4342604160308838,
-      "epoch": 0.4125,
-      "grad_norm": 1.3122109174728394,
-      "learning_rate": 0.00017258333333333335,
-      "loss": 1.4024925231933594,
-      "mean_token_accuracy": 0.6946839153766632,
-      "num_tokens": 646375.0,
       "step": 330
     },
     {
-      "entropy": 1.424432909488678,
-      "epoch": 0.425,
-      "grad_norm": 1.0658537149429321,
-      "learning_rate": 0.00017175,
-      "loss": 1.4141224861145019,
-      "mean_token_accuracy": 0.6981550514698028,
-      "num_tokens": 665615.0,
       "step": 340
     },
     {
-      "entropy": 1.3667653918266296,
-      "epoch": 0.4375,
-      "grad_norm": 1.0600755214691162,
-      "learning_rate": 0.00017091666666666668,
-      "loss": 1.3437227249145507,
-      "mean_token_accuracy": 0.7075554847717285,
-      "num_tokens": 685264.0,
       "step": 350
     },
     {
-      "entropy": 1.2835592031478882,
-      "epoch": 0.45,
-      "grad_norm": 1.1075655221939087,
-      "learning_rate": 0.00017008333333333334,
-      "loss": 1.2650139808654786,
-      "mean_token_accuracy": 0.7190586984157562,
-      "num_tokens": 704293.0,
       "step": 360
     },
     {
-      "entropy": 1.4035864353179932,
-      "epoch": 0.4625,
-      "grad_norm": 0.9761744737625122,
-      "learning_rate": 0.00016925,
-      "loss": 1.388676357269287,
-      "mean_token_accuracy": 0.7017988383769989,
-      "num_tokens": 724065.0,
       "step": 370
     },
     {
-      "entropy": 1.4664394021034242,
-      "epoch": 0.475,
-      "grad_norm": 1.0783321857452393,
-      "learning_rate": 0.00016841666666666668,
-      "loss": 1.4701464653015137,
-      "mean_token_accuracy": 0.6891542494297027,
-      "num_tokens": 743995.0,
       "step": 380
     },
     {
-      "entropy": 1.4636149168014527,
-      "epoch": 0.4875,
-      "grad_norm": 1.0437464714050293,
-      "learning_rate": 0.00016758333333333333,
-      "loss": 1.431545352935791,
-      "mean_token_accuracy": 0.695414400100708,
-      "num_tokens": 763861.0,
       "step": 390
     },
     {
-      "entropy": 1.321478819847107,
-      "epoch": 0.5,
-      "grad_norm": 0.9926327466964722,
-      "learning_rate": 0.00016675000000000001,
-      "loss": 1.282765769958496,
-      "mean_token_accuracy": 0.7207030296325684,
-      "num_tokens": 783127.0,
       "step": 400
     },
     {
-      "entropy": 1.3715664863586425,
-      "epoch": 0.5125,
-      "grad_norm": 1.0086846351623535,
-      "learning_rate": 0.00016591666666666667,
-      "loss": 1.3807523727416993,
-      "mean_token_accuracy": 0.7099938869476319,
-      "num_tokens": 802899.0,
       "step": 410
     },
     {
-      "entropy": 1.490940225124359,
-      "epoch": 0.525,
-      "grad_norm": 0.9043129682540894,
-      "learning_rate": 0.00016508333333333335,
-      "loss": 1.4725797653198243,
-      "mean_token_accuracy": 0.6809450089931488,
-      "num_tokens": 822741.0,
       "step": 420
     },
     {
-      "entropy": 1.3289565563201904,
-      "epoch": 0.5375,
-      "grad_norm": 1.011071801185608,
-      "learning_rate": 0.00016425,
-      "loss": 1.3225143432617188,
-      "mean_token_accuracy": 0.7128117382526398,
-      "num_tokens": 842495.0,
       "step": 430
     },
     {
-      "entropy": 1.486392891407013,
-      "epoch": 0.55,
-      "grad_norm": 1.1270556449890137,
-      "learning_rate": 0.0001634166666666667,
-      "loss": 1.4867573738098145,
-      "mean_token_accuracy": 0.6962902247905731,
-      "num_tokens": 861852.0,
       "step": 440
     },
     {
-      "entropy": 1.5179155349731446,
-      "epoch": 0.5625,
-      "grad_norm": 0.9689193964004517,
-      "learning_rate": 0.00016258333333333332,
-      "loss": 1.4660252571105956,
-      "mean_token_accuracy": 0.6884240567684173,
-      "num_tokens": 882172.0,
       "step": 450
     },
     {
-      "entropy": 1.315347284078598,
-      "epoch": 0.575,
-      "grad_norm": 0.9755958318710327,
-      "learning_rate": 0.00016175,
-      "loss": 1.3408552169799806,
-      "mean_token_accuracy": 0.7023321747779846,
-      "num_tokens": 901952.0,
       "step": 460
     },
     {
-      "entropy": 1.3972121477127075,
-      "epoch": 0.5875,
-      "grad_norm": 1.0787208080291748,
-      "learning_rate": 0.00016091666666666668,
-      "loss": 1.4059626579284668,
-      "mean_token_accuracy": 0.6934876084327698,
-      "num_tokens": 921955.0,
       "step": 470
     },
     {
-      "entropy": 1.4170458436012268,
-      "epoch": 0.6,
-      "grad_norm": 0.9871561527252197,
-      "learning_rate": 0.00016008333333333334,
-      "loss": 1.3711077690124511,
-      "mean_token_accuracy": 0.7038675427436829,
-      "num_tokens": 941767.0,
       "step": 480
     },
     {
-      "entropy": 1.3112692713737488,
-      "epoch": 0.6125,
-      "grad_norm": 1.144695520401001,
-      "learning_rate": 0.00015925000000000002,
-      "loss": 1.303321647644043,
-      "mean_token_accuracy": 0.7135074377059937,
-      "num_tokens": 961372.0,
       "step": 490
     },
     {
-      "entropy": 1.4551821947097778,
-      "epoch": 0.625,
-      "grad_norm": 1.00057852268219,
-      "learning_rate": 0.00015841666666666668,
-      "loss": 1.4500386238098144,
-      "mean_token_accuracy": 0.6925365328788757,
-      "num_tokens": 981288.0,
       "step": 500
     },
     {
-      "entropy": 1.4434048652648925,
-      "epoch": 0.6375,
-      "grad_norm": 1.0493526458740234,
-      "learning_rate": 0.00015758333333333336,
-      "loss": 1.4462182998657227,
-      "mean_token_accuracy": 0.6958550333976745,
-      "num_tokens": 1000494.0,
       "step": 510
     },
     {
-      "entropy": 1.318277359008789,
-      "epoch": 0.65,
-      "grad_norm": 0.9978954195976257,
-      "learning_rate": 0.00015675,
-      "loss": 1.2828590393066406,
-      "mean_token_accuracy": 0.7108975887298584,
-      "num_tokens": 1020446.0,
       "step": 520
     },
     {
-      "entropy": 1.2461042165756226,
-      "epoch": 0.6625,
-      "grad_norm": 0.9871781468391418,
-      "learning_rate": 0.00015591666666666667,
-      "loss": 1.231837558746338,
-      "mean_token_accuracy": 0.7225608587265014,
-      "num_tokens": 1040102.0,
       "step": 530
     },
     {
-      "entropy": 1.3526182651519776,
-      "epoch": 0.675,
-      "grad_norm": 1.0228792428970337,
-      "learning_rate": 0.00015508333333333333,
-      "loss": 1.3261881828308106,
-      "mean_token_accuracy": 0.7123725891113282,
-      "num_tokens": 1059218.0,
       "step": 540
     },
     {
-      "entropy": 1.4202748894691468,
-      "epoch": 0.6875,
-      "grad_norm": 1.0788418054580688,
-      "learning_rate": 0.00015425,
-      "loss": 1.3840529441833496,
-      "mean_token_accuracy": 0.7052250027656555,
-      "num_tokens": 1078941.0,
       "step": 550
     },
     {
-      "entropy": 1.488029384613037,
-      "epoch": 0.7,
-      "grad_norm": 1.0285416841506958,
-      "learning_rate": 0.00015341666666666666,
-      "loss": 1.4981650352478026,
-      "mean_token_accuracy": 0.6793413817882538,
-      "num_tokens": 1099030.0,
       "step": 560
     },
     {
-      "entropy": 1.252402228116989,
-      "epoch": 0.7125,
-      "grad_norm": 0.9509746432304382,
-      "learning_rate": 0.00015258333333333335,
-      "loss": 1.2182463645935058,
-      "mean_token_accuracy": 0.7261222839355469,
-      "num_tokens": 1118437.0,
       "step": 570
     },
     {
-      "entropy": 1.2852157652378082,
-      "epoch": 0.725,
-      "grad_norm": 1.0135730504989624,
-      "learning_rate": 0.00015175,
-      "loss": 1.2676867485046386,
-      "mean_token_accuracy": 0.7214752614498139,
-      "num_tokens": 1137692.0,
       "step": 580
     },
     {
-      "entropy": 1.2683582544326781,
-      "epoch": 0.7375,
-      "grad_norm": 1.0020545721054077,
-      "learning_rate": 0.00015091666666666668,
-      "loss": 1.2557583808898927,
-      "mean_token_accuracy": 0.7249198496341706,
-      "num_tokens": 1156808.0,
       "step": 590
     },
     {
-      "entropy": 1.335302472114563,
-      "epoch": 0.75,
-      "grad_norm": 1.1243020296096802,
-      "learning_rate": 0.00015008333333333334,
-      "loss": 1.340705966949463,
-      "mean_token_accuracy": 0.7099980711936951,
-      "num_tokens": 1176753.0,
       "step": 600
     },
     {
-      "entropy": 1.3958010911941527,
-      "epoch": 0.7625,
-      "grad_norm": 1.0989586114883423,
-      "learning_rate": 0.00014925,
-      "loss": 1.3564892768859864,
-      "mean_token_accuracy": 0.7014556527137756,
-      "num_tokens": 1196444.0,
       "step": 610
     },
     {
-      "entropy": 1.101463145017624,
-      "epoch": 0.775,
-      "grad_norm": 1.0296299457550049,
-      "learning_rate": 0.00014841666666666668,
-      "loss": 1.0752368927001954,
-      "mean_token_accuracy": 0.7481070041656495,
-      "num_tokens": 1215544.0,
       "step": 620
     },
     {
-      "entropy": 1.2422587156295777,
-      "epoch": 0.7875,
-      "grad_norm": 1.0575766563415527,
-      "learning_rate": 0.00014758333333333333,
-      "loss": 1.2519227981567382,
-      "mean_token_accuracy": 0.725595885515213,
-      "num_tokens": 1234773.0,
       "step": 630
     },
     {
-      "entropy": 1.2721437513828278,
-      "epoch": 0.8,
-      "grad_norm": 0.9789795279502869,
-      "learning_rate": 0.00014675000000000002,
-      "loss": 1.2379184722900392,
-      "mean_token_accuracy": 0.7246088445186615,
-      "num_tokens": 1254724.0,
       "step": 640
     },
     {
-      "entropy": 1.3314976692199707,
-      "epoch": 0.8125,
-      "grad_norm": 1.0367317199707031,
-      "learning_rate": 0.00014591666666666667,
-      "loss": 1.3306329727172852,
-      "mean_token_accuracy": 0.7178499698638916,
-      "num_tokens": 1274798.0,
       "step": 650
     },
     {
-      "entropy": 1.3646180868148803,
-      "epoch": 0.825,
-      "grad_norm": 1.0535913705825806,
-      "learning_rate": 0.00014508333333333335,
-      "loss": 1.3464747428894044,
-      "mean_token_accuracy": 0.7110415756702423,
-      "num_tokens": 1294126.0,
       "step": 660
     },
     {
-      "entropy": 1.3923122048377992,
-      "epoch": 0.8375,
-      "grad_norm": 1.0129584074020386,
-      "learning_rate": 0.00014425,
-      "loss": 1.3614535331726074,
-      "mean_token_accuracy": 0.7091562509536743,
-      "num_tokens": 1313840.0,
       "step": 670
     },
     {
-      "entropy": 1.2955705881118775,
-      "epoch": 0.85,
-      "grad_norm": 1.0217289924621582,
-      "learning_rate": 0.00014341666666666667,
-      "loss": 1.3026324272155763,
-      "mean_token_accuracy": 0.7215434312820435,
-      "num_tokens": 1333374.0,
       "step": 680
     },
     {
-      "entropy": 1.2429138720035553,
-      "epoch": 0.8625,
-      "grad_norm": 1.1768354177474976,
-      "learning_rate": 0.00014258333333333335,
-      "loss": 1.2484370231628419,
-      "mean_token_accuracy": 0.7228560984134674,
-      "num_tokens": 1352803.0,
       "step": 690
     },
     {
-      "entropy": 1.4034168601036072,
-      "epoch": 0.875,
-      "grad_norm": 1.0423661470413208,
-      "learning_rate": 0.00014175,
-      "loss": 1.3977928161621094,
-      "mean_token_accuracy": 0.7066435754299164,
-      "num_tokens": 1372027.0,
       "step": 700
     },
     {
-      "entropy": 1.3989041566848754,
-      "epoch": 0.8875,
-      "grad_norm": 1.0630419254302979,
-      "learning_rate": 0.00014091666666666669,
-      "loss": 1.3820555686950684,
-      "mean_token_accuracy": 0.7068613171577454,
-      "num_tokens": 1391633.0,
       "step": 710
     },
     {
-      "entropy": 1.3769522070884705,
-      "epoch": 0.9,
-      "grad_norm": 1.0013467073440552,
-      "learning_rate": 0.00014008333333333334,
-      "loss": 1.3443568229675293,
-      "mean_token_accuracy": 0.7107380628585815,
-      "num_tokens": 1411760.0,
       "step": 720
     },
     {
-      "entropy": 1.268526130914688,
-      "epoch": 0.9125,
-      "grad_norm": 1.0953330993652344,
-      "learning_rate": 0.00013925000000000002,
-      "loss": 1.2552387237548828,
-      "mean_token_accuracy": 0.7234481334686279,
-      "num_tokens": 1431014.0,
       "step": 730
     },
     {
-      "entropy": 1.2942246317863464,
-      "epoch": 0.925,
-      "grad_norm": 1.1178935766220093,
-      "learning_rate": 0.00013841666666666668,
-      "loss": 1.28503360748291,
-      "mean_token_accuracy": 0.7154992341995239,
-      "num_tokens": 1450281.0,
       "step": 740
     },
     {
-      "entropy": 1.4117084741592407,
-      "epoch": 0.9375,
-      "grad_norm": 0.9301122426986694,
-      "learning_rate": 0.00013758333333333333,
-      "loss": 1.4023655891418456,
-      "mean_token_accuracy": 0.7010603427886963,
-      "num_tokens": 1469975.0,
       "step": 750
     },
     {
-      "entropy": 1.4011817216873168,
-      "epoch": 0.95,
-      "grad_norm": 0.9954379796981812,
-      "learning_rate": 0.00013675,
-      "loss": 1.3595520973205566,
-      "mean_token_accuracy": 0.707509434223175,
-      "num_tokens": 1490135.0,
       "step": 760
     },
     {
-      "entropy": 1.214120751619339,
-      "epoch": 0.9625,
-      "grad_norm": 1.0739448070526123,
-      "learning_rate": 0.00013591666666666667,
-      "loss": 1.2098024368286133,
-      "mean_token_accuracy": 0.7242987155914307,
-      "num_tokens": 1509619.0,
       "step": 770
     },
     {
-      "entropy": 1.3438146114349365,
-      "epoch": 0.975,
-      "grad_norm": 1.089935302734375,
-      "learning_rate": 0.00013508333333333333,
-      "loss": 1.3319005966186523,
-      "mean_token_accuracy": 0.714084678888321,
-      "num_tokens": 1528891.0,
       "step": 780
     },
     {
-      "entropy": 1.3888260841369628,
-      "epoch": 0.9875,
-      "grad_norm": 0.9926638007164001,
-      "learning_rate": 0.00013425,
-      "loss": 1.3492444992065429,
-      "mean_token_accuracy": 0.7118530929088592,
-      "num_tokens": 1548362.0,
       "step": 790
     },
     {
-      "entropy": 1.399612510204315,
-      "epoch": 1.0,
-      "grad_norm": 1.0530741214752197,
-      "learning_rate": 0.00013341666666666667,
-      "loss": 1.4108482360839845,
-      "mean_token_accuracy": 0.7038941025733948,
-      "num_tokens": 1567803.0,
       "step": 800
     },
     {
-      "entropy": 1.2114556849002838,
-      "epoch": 1.0125,
-      "grad_norm": 0.9266390204429626,
-      "learning_rate": 0.00013258333333333335,
-      "loss": 1.1405961990356446,
-      "mean_token_accuracy": 0.7448844015598297,
-      "num_tokens": 1587528.0,
       "step": 810
     },
     {
-      "entropy": 1.273935067653656,
-      "epoch": 1.025,
-      "grad_norm": 1.0611053705215454,
-      "learning_rate": 0.00013175,
-      "loss": 1.2489707946777344,
-      "mean_token_accuracy": 0.7268446266651154,
-      "num_tokens": 1606759.0,
       "step": 820
     },
     {
-      "entropy": 1.2134633004665374,
-      "epoch": 1.0375,
-      "grad_norm": 1.1921463012695312,
-      "learning_rate": 0.00013091666666666666,
-      "loss": 1.178335189819336,
-      "mean_token_accuracy": 0.7363758504390716,
-      "num_tokens": 1626169.0,
       "step": 830
     },
     {
-      "entropy": 1.2366178393363954,
-      "epoch": 1.05,
-      "grad_norm": 1.3666439056396484,
-      "learning_rate": 0.00013008333333333334,
-      "loss": 1.1955602645874024,
-      "mean_token_accuracy": 0.7361269950866699,
-      "num_tokens": 1645763.0,
       "step": 840
     },
     {
-      "entropy": 1.1986367166042329,
-      "epoch": 1.0625,
-      "grad_norm": 1.0088763236999512,
-      "learning_rate": 0.00012925,
-      "loss": 1.1691818237304688,
-      "mean_token_accuracy": 0.7368346631526947,
-      "num_tokens": 1664971.0,
       "step": 850
     },
     {
-      "entropy": 1.1878794968128203,
-      "epoch": 1.075,
-      "grad_norm": 1.826392412185669,
-      "learning_rate": 0.00012841666666666668,
-      "loss": 1.175551986694336,
-      "mean_token_accuracy": 0.7320316910743714,
-      "num_tokens": 1684853.0,
       "step": 860
     },
     {
-      "entropy": 1.2722360610961914,
-      "epoch": 1.0875,
-      "grad_norm": 1.246541142463684,
-      "learning_rate": 0.00012758333333333334,
-      "loss": 1.264747142791748,
-      "mean_token_accuracy": 0.7183553338050842,
-      "num_tokens": 1704438.0,
       "step": 870
     },
     {
-      "entropy": 1.2308316648006439,
-      "epoch": 1.1,
-      "grad_norm": 1.2344533205032349,
-      "learning_rate": 0.00012675000000000002,
-      "loss": 1.1870238304138183,
-      "mean_token_accuracy": 0.7294324994087219,
-      "num_tokens": 1724422.0,
       "step": 880
     },
     {
-      "entropy": 1.0875967979431151,
-      "epoch": 1.1125,
-      "grad_norm": 1.0286897420883179,
-      "learning_rate": 0.00012591666666666667,
-      "loss": 1.0690074920654298,
-      "mean_token_accuracy": 0.7541925728321075,
-      "num_tokens": 1744004.0,
       "step": 890
     },
     {
-      "entropy": 1.2731964230537414,
-      "epoch": 1.125,
-      "grad_norm": 1.020310640335083,
-      "learning_rate": 0.00012508333333333333,
-      "loss": 1.234278964996338,
-      "mean_token_accuracy": 0.72997545003891,
-      "num_tokens": 1763399.0,
       "step": 900
     },
     {
-      "entropy": 1.26662278175354,
-      "epoch": 1.1375,
-      "grad_norm": 1.0566675662994385,
-      "learning_rate": 0.00012425,
-      "loss": 1.2269045829772949,
-      "mean_token_accuracy": 0.7254461228847504,
-      "num_tokens": 1783241.0,
       "step": 910
     },
     {
-      "entropy": 1.2319360613822936,
-      "epoch": 1.15,
-      "grad_norm": 1.164506435394287,
-      "learning_rate": 0.00012341666666666667,
-      "loss": 1.2044157981872559,
-      "mean_token_accuracy": 0.7300530433654785,
-      "num_tokens": 1803177.0,
       "step": 920
     },
     {
-      "entropy": 1.175478756427765,
-      "epoch": 1.1625,
-      "grad_norm": 1.058076024055481,
-      "learning_rate": 0.00012258333333333335,
-      "loss": 1.1703608512878418,
-      "mean_token_accuracy": 0.7407234668731689,
-      "num_tokens": 1822918.0,
       "step": 930
     },
     {
-      "entropy": 1.2673091530799865,
-      "epoch": 1.175,
-      "grad_norm": 1.176710844039917,
-      "learning_rate": 0.00012175,
-      "loss": 1.2230928421020508,
-      "mean_token_accuracy": 0.7351066827774048,
-      "num_tokens": 1842595.0,
       "step": 940
     },
     {
-      "entropy": 1.0998504757881165,
-      "epoch": 1.1875,
-      "grad_norm": 1.1307560205459595,
-      "learning_rate": 0.00012091666666666667,
-      "loss": 1.0505106925964356,
-      "mean_token_accuracy": 0.756383728981018,
-      "num_tokens": 1861897.0,
       "step": 950
     },
     {
-      "entropy": 1.2347984194755555,
-      "epoch": 1.2,
-      "grad_norm": 1.0860289335250854,
-      "learning_rate": 0.00012008333333333334,
-      "loss": 1.2352341651916503,
-      "mean_token_accuracy": 0.7240101575851441,
-      "num_tokens": 1881755.0,
       "step": 960
     },
     {
-      "entropy": 1.2585964500904083,
-      "epoch": 1.2125,
-      "grad_norm": 1.044980764389038,
-      "learning_rate": 0.00011925,
-      "loss": 1.2421728134155274,
-      "mean_token_accuracy": 0.7247012615203857,
-      "num_tokens": 1901666.0,
       "step": 970
     },
     {
-      "entropy": 1.1582435846328736,
-      "epoch": 1.225,
-      "grad_norm": 1.2299224138259888,
-      "learning_rate": 0.00011841666666666667,
-      "loss": 1.1180283546447753,
-      "mean_token_accuracy": 0.7496900379657745,
-      "num_tokens": 1921511.0,
       "step": 980
     },
     {
-      "entropy": 1.2354500055313111,
-      "epoch": 1.2375,
-      "grad_norm": 1.279137134552002,
-      "learning_rate": 0.00011758333333333334,
-      "loss": 1.2111740112304688,
-      "mean_token_accuracy": 0.7311923027038574,
-      "num_tokens": 1941105.0,
       "step": 990
     },
     {
-      "entropy": 1.318113088607788,
-      "epoch": 1.25,
-      "grad_norm": 1.0969278812408447,
-      "learning_rate": 0.00011675,
-      "loss": 1.2783867835998535,
-      "mean_token_accuracy": 0.7130617916584014,
-      "num_tokens": 1960670.0,
       "step": 1000
     },
     {
-      "entropy": 1.1617076337337493,
-      "epoch": 1.2625,
-      "grad_norm": 1.1504206657409668,
-      "learning_rate": 0.00011591666666666667,
-      "loss": 1.1332786560058594,
-      "mean_token_accuracy": 0.7464128196239471,
-      "num_tokens": 1980198.0,
       "step": 1010
     },
     {
-      "entropy": 1.1654898643493652,
-      "epoch": 1.275,
-      "grad_norm": 1.0788720846176147,
-      "learning_rate": 0.00011508333333333334,
-      "loss": 1.143263816833496,
-      "mean_token_accuracy": 0.7419422626495361,
-      "num_tokens": 1999691.0,
       "step": 1020
     },
     {
-      "entropy": 1.0700674295425414,
-      "epoch": 1.2875,
-      "grad_norm": 1.2229335308074951,
-      "learning_rate": 0.00011425000000000001,
-      "loss": 1.0265979766845703,
-      "mean_token_accuracy": 0.7592511177062988,
-      "num_tokens": 2018961.0,
       "step": 1030
     },
     {
-      "entropy": 1.343198013305664,
-      "epoch": 1.3,
-      "grad_norm": 1.0971119403839111,
-      "learning_rate": 0.00011341666666666668,
-      "loss": 1.3413372039794922,
-      "mean_token_accuracy": 0.7115301251411438,
-      "num_tokens": 2039278.0,
       "step": 1040
     },
     {
-      "entropy": 1.173759299516678,
-      "epoch": 1.3125,
-      "grad_norm": 1.1787611246109009,
-      "learning_rate": 0.00011258333333333332,
-      "loss": 1.1580984115600585,
-      "mean_token_accuracy": 0.7372494816780091,
-      "num_tokens": 2059308.0,
       "step": 1050
     },
     {
-      "entropy": 1.1994649350643158,
-      "epoch": 1.325,
-      "grad_norm": 1.1677119731903076,
-      "learning_rate": 0.00011175,
-      "loss": 1.1848763465881347,
-      "mean_token_accuracy": 0.7347624957561493,
-      "num_tokens": 2078671.0,
       "step": 1060
     },
     {
-      "entropy": 1.180502289533615,
-      "epoch": 1.3375,
-      "grad_norm": 1.1610862016677856,
-      "learning_rate": 0.00011091666666666667,
-      "loss": 1.128573226928711,
-      "mean_token_accuracy": 0.7481857478618622,
-      "num_tokens": 2098285.0,
       "step": 1070
     },
     {
-      "entropy": 1.1774109721183776,
-      "epoch": 1.35,
-      "grad_norm": 1.1028215885162354,
-      "learning_rate": 0.00011008333333333334,
-      "loss": 1.1424349784851073,
-      "mean_token_accuracy": 0.7449199557304382,
-      "num_tokens": 2117580.0,
       "step": 1080
     },
     {
-      "entropy": 1.2145233869552612,
-      "epoch": 1.3625,
-      "grad_norm": 1.1750071048736572,
-      "learning_rate": 0.00010925000000000001,
-      "loss": 1.184683609008789,
-      "mean_token_accuracy": 0.7363656103610993,
-      "num_tokens": 2136772.0,
       "step": 1090
     },
     {
-      "entropy": 1.3310753881931305,
-      "epoch": 1.375,
-      "grad_norm": 1.1250444650650024,
-      "learning_rate": 0.00010841666666666668,
-      "loss": 1.289270782470703,
-      "mean_token_accuracy": 0.7221044361591339,
-      "num_tokens": 2155997.0,
       "step": 1100
     },
     {
-      "entropy": 1.2002023696899413,
-      "epoch": 1.3875,
-      "grad_norm": 1.105889916419983,
-      "learning_rate": 0.00010758333333333335,
-      "loss": 1.1718459129333496,
-      "mean_token_accuracy": 0.7428612053394318,
-      "num_tokens": 2175776.0,
       "step": 1110
     },
     {
-      "entropy": 1.2213382959365844,
-      "epoch": 1.4,
-      "grad_norm": 1.0537785291671753,
-      "learning_rate": 0.00010674999999999999,
-      "loss": 1.2042366981506347,
-      "mean_token_accuracy": 0.734257060289383,
-      "num_tokens": 2195589.0,
       "step": 1120
     },
     {
-      "entropy": 1.3082896590232849,
-      "epoch": 1.4125,
-      "grad_norm": 1.214430570602417,
-      "learning_rate": 0.00010591666666666666,
-      "loss": 1.2770617485046387,
-      "mean_token_accuracy": 0.7225141525268555,
-      "num_tokens": 2215069.0,
       "step": 1130
     },
     {
-      "entropy": 1.2698805391788484,
-      "epoch": 1.425,
-      "grad_norm": 1.1503865718841553,
-      "learning_rate": 0.00010508333333333333,
-      "loss": 1.2459848403930665,
-      "mean_token_accuracy": 0.7274727523326874,
-      "num_tokens": 2234322.0,
       "step": 1140
     },
     {
-      "entropy": 1.1276936411857605,
-      "epoch": 1.4375,
-      "grad_norm": 1.183884859085083,
-      "learning_rate": 0.00010425,
-      "loss": 1.0825308799743651,
-      "mean_token_accuracy": 0.7583949148654938,
-      "num_tokens": 2254109.0,
       "step": 1150
     },
     {
-      "entropy": 1.162560474872589,
-      "epoch": 1.45,
-      "grad_norm": 1.123547911643982,
-      "learning_rate": 0.00010341666666666667,
-      "loss": 1.131033706665039,
-      "mean_token_accuracy": 0.7409184396266937,
-      "num_tokens": 2274063.0,
       "step": 1160
     },
     {
-      "entropy": 1.172844797372818,
-      "epoch": 1.4625,
-      "grad_norm": 1.1609870195388794,
-      "learning_rate": 0.00010258333333333334,
-      "loss": 1.1425368309020996,
-      "mean_token_accuracy": 0.7405532896518707,
-      "num_tokens": 2293593.0,
       "step": 1170
     },
     {
-      "entropy": 1.1611240029335022,
-      "epoch": 1.475,
-      "grad_norm": 1.2433576583862305,
-      "learning_rate": 0.00010175,
-      "loss": 1.156510066986084,
-      "mean_token_accuracy": 0.7392487466335297,
-      "num_tokens": 2313022.0,
       "step": 1180
     },
     {
-      "entropy": 1.297146165370941,
-      "epoch": 1.4875,
-      "grad_norm": 1.2858229875564575,
-      "learning_rate": 0.00010091666666666668,
-      "loss": 1.2874930381774903,
-      "mean_token_accuracy": 0.7157010912895203,
-      "num_tokens": 2333416.0,
       "step": 1190
     },
     {
-      "entropy": 1.1096710920333863,
-      "epoch": 1.5,
-      "grad_norm": 1.3551592826843262,
-      "learning_rate": 0.00010008333333333333,
-      "loss": 1.0497027397155763,
-      "mean_token_accuracy": 0.7567280232906342,
-      "num_tokens": 2353307.0,
       "step": 1200
     },
     {
-      "entropy": 1.2629383742809295,
-      "epoch": 1.5125,
-      "grad_norm": 1.1824836730957031,
-      "learning_rate": 9.925000000000001e-05,
-      "loss": 1.255314254760742,
-      "mean_token_accuracy": 0.7271463632583618,
-      "num_tokens": 2372437.0,
       "step": 1210
     },
     {
-      "entropy": 1.2311269104480744,
-      "epoch": 1.525,
-      "grad_norm": 1.3155615329742432,
-      "learning_rate": 9.841666666666667e-05,
-      "loss": 1.2020380020141601,
-      "mean_token_accuracy": 0.7358236730098724,
-      "num_tokens": 2391960.0,
       "step": 1220
     },
     {
-      "entropy": 1.237127846479416,
-      "epoch": 1.5375,
-      "grad_norm": 1.2081711292266846,
-      "learning_rate": 9.758333333333334e-05,
-      "loss": 1.2208969116210937,
-      "mean_token_accuracy": 0.729817122220993,
-      "num_tokens": 2411868.0,
       "step": 1230
     },
     {
-      "entropy": 1.119310849905014,
-      "epoch": 1.55,
-      "grad_norm": 1.1908034086227417,
-      "learning_rate": 9.675000000000001e-05,
-      "loss": 1.095372200012207,
-      "mean_token_accuracy": 0.7499478220939636,
-      "num_tokens": 2431339.0,
       "step": 1240
     },
     {
-      "entropy": 1.304679548740387,
-      "epoch": 1.5625,
-      "grad_norm": 1.1009821891784668,
-      "learning_rate": 9.591666666666666e-05,
-      "loss": 1.2699440002441407,
-      "mean_token_accuracy": 0.7241815030574799,
-      "num_tokens": 2450796.0,
       "step": 1250
     },
     {
-      "entropy": 1.1995501220226288,
-      "epoch": 1.575,
-      "grad_norm": 1.161159873008728,
-      "learning_rate": 9.508333333333333e-05,
-      "loss": 1.1768548965454102,
-      "mean_token_accuracy": 0.7322454929351807,
-      "num_tokens": 2470022.0,
       "step": 1260
     },
     {
-      "entropy": 1.1767509758472443,
-      "epoch": 1.5875,
-      "grad_norm": 1.214721918106079,
-      "learning_rate": 9.425e-05,
-      "loss": 1.1586053848266602,
-      "mean_token_accuracy": 0.7356619358062744,
-      "num_tokens": 2489292.0,
       "step": 1270
     },
     {
-      "entropy": 1.138701504468918,
-      "epoch": 1.6,
-      "grad_norm": 1.100012183189392,
-      "learning_rate": 9.341666666666667e-05,
-      "loss": 1.0964359283447265,
-      "mean_token_accuracy": 0.7454494297504425,
-      "num_tokens": 2509009.0,
       "step": 1280
     },
     {
-      "entropy": 1.1690003037452699,
-      "epoch": 1.6125,
-      "grad_norm": 1.2297983169555664,
-      "learning_rate": 9.258333333333334e-05,
-      "loss": 1.1631418228149415,
-      "mean_token_accuracy": 0.7382839739322662,
-      "num_tokens": 2528620.0,
       "step": 1290
     },
     {
-      "entropy": 1.1637387096881866,
-      "epoch": 1.625,
-      "grad_norm": 1.2777661085128784,
-      "learning_rate": 9.175000000000001e-05,
-      "loss": 1.1510313987731933,
-      "mean_token_accuracy": 0.735828697681427,
-      "num_tokens": 2548196.0,
       "step": 1300
     },
     {
-      "entropy": 1.20295706987381,
-      "epoch": 1.6375,
-      "grad_norm": 1.1685494184494019,
-      "learning_rate": 9.091666666666668e-05,
-      "loss": 1.1551430702209473,
-      "mean_token_accuracy": 0.7458238661289215,
-      "num_tokens": 2567533.0,
       "step": 1310
     },
     {
-      "entropy": 1.09195419549942,
-      "epoch": 1.65,
-      "grad_norm": 1.1366465091705322,
-      "learning_rate": 9.008333333333335e-05,
-      "loss": 1.0663909912109375,
-      "mean_token_accuracy": 0.7581624269485474,
-      "num_tokens": 2586652.0,
       "step": 1320
     },
     {
-      "entropy": 1.1013097047805787,
-      "epoch": 1.6625,
-      "grad_norm": 1.2281990051269531,
-      "learning_rate": 8.925e-05,
-      "loss": 1.0794993400573731,
-      "mean_token_accuracy": 0.7515947103500367,
-      "num_tokens": 2605873.0,
       "step": 1330
     },
     {
-      "entropy": 1.1546533286571503,
-      "epoch": 1.675,
-      "grad_norm": 1.150604009628296,
-      "learning_rate": 8.841666666666667e-05,
-      "loss": 1.1236547470092773,
-      "mean_token_accuracy": 0.747076016664505,
-      "num_tokens": 2625435.0,
       "step": 1340
     },
     {
-      "entropy": 1.1468034386634827,
-      "epoch": 1.6875,
-      "grad_norm": 1.2128630876541138,
-      "learning_rate": 8.758333333333334e-05,
-      "loss": 1.1327264785766602,
-      "mean_token_accuracy": 0.743101853132248,
-      "num_tokens": 2644613.0,
       "step": 1350
     },
     {
-      "entropy": 1.2498606383800506,
-      "epoch": 1.7,
-      "grad_norm": 1.2257990837097168,
-      "learning_rate": 8.675000000000001e-05,
-      "loss": 1.2251564979553222,
-      "mean_token_accuracy": 0.7284741044044495,
-      "num_tokens": 2664607.0,
       "step": 1360
     },
     {
-      "entropy": 1.1153142929077149,
-      "epoch": 1.7125,
-      "grad_norm": 1.338675618171692,
-      "learning_rate": 8.591666666666666e-05,
-      "loss": 1.0588098526000977,
-      "mean_token_accuracy": 0.75564626455307,
-      "num_tokens": 2684339.0,
       "step": 1370
     },
     {
-      "entropy": 1.0752389311790467,
-      "epoch": 1.725,
-      "grad_norm": 1.0985221862792969,
-      "learning_rate": 8.508333333333333e-05,
-      "loss": 1.0512856483459472,
-      "mean_token_accuracy": 0.7600628316402436,
-      "num_tokens": 2703919.0,
       "step": 1380
     },
     {
-      "entropy": 1.09818754196167,
-      "epoch": 1.7375,
-      "grad_norm": 1.1556848287582397,
-      "learning_rate": 8.425e-05,
-      "loss": 1.0800713539123534,
-      "mean_token_accuracy": 0.7518243432044983,
-      "num_tokens": 2723128.0,
       "step": 1390
     },
     {
-      "entropy": 1.2366557955741881,
-      "epoch": 1.75,
-      "grad_norm": 1.3306751251220703,
-      "learning_rate": 8.341666666666667e-05,
-      "loss": 1.2005329132080078,
-      "mean_token_accuracy": 0.7348743200302124,
-      "num_tokens": 2742996.0,
       "step": 1400
     },
     {
-      "entropy": 1.0495585322380065,
-      "epoch": 1.7625,
-      "grad_norm": 1.1171796321868896,
-      "learning_rate": 8.258333333333334e-05,
-      "loss": 1.0380284309387207,
-      "mean_token_accuracy": 0.7589188039302825,
-      "num_tokens": 2762343.0,
       "step": 1410
     },
     {
-      "entropy": 1.1524301767349243,
-      "epoch": 1.775,
-      "grad_norm": 1.2297626733779907,
-      "learning_rate": 8.175000000000001e-05,
-      "loss": 1.117063331604004,
-      "mean_token_accuracy": 0.7440081238746643,
-      "num_tokens": 2782095.0,
       "step": 1420
     },
     {
-      "entropy": 1.1989089012145997,
-      "epoch": 1.7875,
-      "grad_norm": 1.3411099910736084,
-      "learning_rate": 8.091666666666668e-05,
-      "loss": 1.1682716369628907,
-      "mean_token_accuracy": 0.7408987522125244,
-      "num_tokens": 2802005.0,
       "step": 1430
     },
     {
-      "entropy": 1.1420318186283112,
-      "epoch": 1.8,
-      "grad_norm": 1.2690355777740479,
-      "learning_rate": 8.008333333333333e-05,
-      "loss": 1.1279597282409668,
-      "mean_token_accuracy": 0.7459539830684662,
-      "num_tokens": 2822160.0,
       "step": 1440
     },
     {
-      "entropy": 1.0963495194911956,
-      "epoch": 1.8125,
-      "grad_norm": 1.1553294658660889,
-      "learning_rate": 7.925e-05,
-      "loss": 1.0589072227478027,
-      "mean_token_accuracy": 0.757156765460968,
-      "num_tokens": 2841484.0,
       "step": 1450
     },
     {
-      "entropy": 1.0866885364055634,
-      "epoch": 1.825,
-      "grad_norm": 1.235066533088684,
-      "learning_rate": 7.841666666666667e-05,
-      "loss": 1.055964183807373,
-      "mean_token_accuracy": 0.762250280380249,
-      "num_tokens": 2860959.0,
       "step": 1460
     },
     {
-      "entropy": 1.1359103441238403,
-      "epoch": 1.8375,
-      "grad_norm": 1.2188527584075928,
-      "learning_rate": 7.758333333333334e-05,
-      "loss": 1.0978761672973634,
-      "mean_token_accuracy": 0.7553177118301392,
-      "num_tokens": 2880839.0,
       "step": 1470
     },
     {
-      "entropy": 1.1913959503173828,
-      "epoch": 1.85,
-      "grad_norm": 1.1904797554016113,
-      "learning_rate": 7.675e-05,
-      "loss": 1.175191307067871,
-      "mean_token_accuracy": 0.7345862329006195,
-      "num_tokens": 2900475.0,
       "step": 1480
     },
     {
-      "entropy": 1.2953790843486785,
-      "epoch": 1.8625,
-      "grad_norm": 1.1965097188949585,
-      "learning_rate": 7.591666666666666e-05,
-      "loss": 1.280709934234619,
-      "mean_token_accuracy": 0.7224766254425049,
-      "num_tokens": 2919958.0,
       "step": 1490
     },
     {
-      "entropy": 1.201363343000412,
-      "epoch": 1.875,
-      "grad_norm": 1.2609730958938599,
-      "learning_rate": 7.508333333333333e-05,
-      "loss": 1.199178695678711,
-      "mean_token_accuracy": 0.7449064493179322,
-      "num_tokens": 2939883.0,
       "step": 1500
     },
     {
-      "entropy": 1.2543865263462066,
-      "epoch": 1.8875,
-      "grad_norm": 1.237781286239624,
-      "learning_rate": 7.425e-05,
-      "loss": 1.2171030044555664,
-      "mean_token_accuracy": 0.7319530665874481,
-      "num_tokens": 2959127.0,
       "step": 1510
     },
     {
-      "entropy": 1.1565960764884948,
-      "epoch": 1.9,
-      "grad_norm": 1.1916192770004272,
-      "learning_rate": 7.341666666666667e-05,
-      "loss": 1.0884868621826171,
-      "mean_token_accuracy": 0.7524564802646637,
-      "num_tokens": 2978570.0,
       "step": 1520
     },
     {
-      "entropy": 1.1555658102035522,
-      "epoch": 1.9125,
-      "grad_norm": 1.2012529373168945,
-      "learning_rate": 7.258333333333334e-05,
-      "loss": 1.1715718269348145,
-      "mean_token_accuracy": 0.7386487185955047,
-      "num_tokens": 2997991.0,
       "step": 1530
     },
     {
-      "entropy": 1.330852198600769,
-      "epoch": 1.925,
-      "grad_norm": 1.1955431699752808,
-      "learning_rate": 7.175000000000001e-05,
-      "loss": 1.3101895332336426,
-      "mean_token_accuracy": 0.7175221145153046,
-      "num_tokens": 3017940.0,
       "step": 1540
     },
     {
-      "entropy": 1.3083417534828186,
-      "epoch": 1.9375,
-      "grad_norm": 1.214016318321228,
-      "learning_rate": 7.091666666666666e-05,
-      "loss": 1.2484627723693849,
-      "mean_token_accuracy": 0.7284034073352814,
-      "num_tokens": 3037566.0,
       "step": 1550
     },
     {
-      "entropy": 1.0928865134716035,
-      "epoch": 1.95,
-      "grad_norm": 1.1900498867034912,
-      "learning_rate": 7.008333333333333e-05,
-      "loss": 1.0677626609802247,
-      "mean_token_accuracy": 0.7540374755859375,
-      "num_tokens": 3057492.0,
       "step": 1560
     },
     {
-      "entropy": 1.1798087418079377,
-      "epoch": 1.9625,
-      "grad_norm": 1.141886830329895,
-      "learning_rate": 6.925e-05,
-      "loss": 1.1684003829956056,
-      "mean_token_accuracy": 0.7409846067428589,
-      "num_tokens": 3076975.0,
       "step": 1570
     },
     {
-      "entropy": 1.0994779944419861,
-      "epoch": 1.975,
-      "grad_norm": 1.233418583869934,
-      "learning_rate": 6.841666666666667e-05,
-      "loss": 1.0708105087280273,
-      "mean_token_accuracy": 0.7573552906513215,
-      "num_tokens": 3096828.0,
       "step": 1580
     },
     {
-      "entropy": 1.2836666464805604,
-      "epoch": 1.9875,
-      "grad_norm": 1.193438172340393,
-      "learning_rate": 6.758333333333333e-05,
-      "loss": 1.2414496421813965,
-      "mean_token_accuracy": 0.7279482066631318,
-      "num_tokens": 3116437.0,
       "step": 1590
     },
     {
-      "entropy": 1.087252539396286,
-      "epoch": 2.0,
-      "grad_norm": 1.0979626178741455,
-      "learning_rate": 6.675e-05,
-      "loss": 1.061672306060791,
-      "mean_token_accuracy": 0.7541257202625274,
-      "num_tokens": 3135606.0,
       "step": 1600
     }
   ],
   "logging_steps": 10,
-  "max_steps": 2400,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 3,
   "save_steps": 500,
@@ -1627,7 +1627,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 2.0478329243904e+16,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 1.0,
   "eval_steps": 500,
   "global_step": 1600,
   "is_hyper_param_search": false,
   "is_world_process_zero": true,
   "log_history": [
     {
+      "entropy": 2.8972578048706055,
+      "epoch": 0.00625,
+      "grad_norm": 1.416805624961853,
+      "learning_rate": 0.00019962500000000001,
+      "loss": 3.8105133056640623,
+      "mean_token_accuracy": 0.4103764593601227,
+      "num_tokens": 17074.0,
       "step": 10
     },
     {
+      "entropy": 2.769114351272583,
+      "epoch": 0.0125,
+      "grad_norm": 1.159595012664795,
+      "learning_rate": 0.00019920833333333336,
+      "loss": 2.690728759765625,
+      "mean_token_accuracy": 0.5351322680711746,
+      "num_tokens": 33777.0,
       "step": 20
     },
     {
+      "entropy": 2.3261287093162535,
+      "epoch": 0.01875,
+      "grad_norm": 1.3773282766342163,
+      "learning_rate": 0.0001987916666666667,
+      "loss": 2.328271675109863,
+      "mean_token_accuracy": 0.5926208615303039,
+      "num_tokens": 49315.0,
       "step": 30
     },
     {
+      "entropy": 2.3075815558433534,
+      "epoch": 0.025,
+      "grad_norm": 1.0916646718978882,
+      "learning_rate": 0.000198375,
+      "loss": 2.1861215591430665,
+      "mean_token_accuracy": 0.6114547044038773,
+      "num_tokens": 65083.0,
       "step": 40
     },
     {
+      "entropy": 1.9178041577339173,
+      "epoch": 0.03125,
+      "grad_norm": 0.9288890361785889,
+      "learning_rate": 0.00019795833333333332,
+      "loss": 1.95428466796875,
+      "mean_token_accuracy": 0.645366108417511,
+      "num_tokens": 81240.0,
       "step": 50
     },
     {
+      "entropy": 2.342257523536682,
+      "epoch": 0.0375,
+      "grad_norm": 1.0486043691635132,
+      "learning_rate": 0.00019754166666666667,
+      "loss": 2.3062065124511717,
+      "mean_token_accuracy": 0.6025018393993378,
+      "num_tokens": 97110.0,
       "step": 60
     },
     {
+      "entropy": 1.842692232131958,
+      "epoch": 0.04375,
+      "grad_norm": 1.1565988063812256,
+      "learning_rate": 0.000197125,
+      "loss": 1.848040771484375,
+      "mean_token_accuracy": 0.649482148885727,
+      "num_tokens": 113661.0,
       "step": 70
     },
     {
+      "entropy": 2.015536868572235,
+      "epoch": 0.05,
+      "grad_norm": 1.036302089691162,
+      "learning_rate": 0.00019670833333333335,
+      "loss": 2.023266410827637,
+      "mean_token_accuracy": 0.6400867640972138,
+      "num_tokens": 129571.0,
       "step": 80
     },
     {
+      "entropy": 2.291021800041199,
+      "epoch": 0.05625,
+      "grad_norm": 1.1765780448913574,
+      "learning_rate": 0.00019629166666666666,
+      "loss": 2.2915937423706056,
+      "mean_token_accuracy": 0.6016066193580627,
+      "num_tokens": 145845.0,
       "step": 90
     },
     {
+      "entropy": 1.9315234899520874,
+      "epoch": 0.0625,
+      "grad_norm": 1.1040469408035278,
+      "learning_rate": 0.000195875,
+      "loss": 1.8839471817016602,
+      "mean_token_accuracy": 0.656380432844162,
+      "num_tokens": 162128.0,
       "step": 100
     },
     {
+      "entropy": 1.864959979057312,
+      "epoch": 0.06875,
+      "grad_norm": 1.0841010808944702,
+      "learning_rate": 0.00019545833333333335,
+      "loss": 1.855326271057129,
+      "mean_token_accuracy": 0.6628630757331848,
+      "num_tokens": 178343.0,
       "step": 110
     },
     {
+      "entropy": 1.9021487474441527,
+      "epoch": 0.075,
+      "grad_norm": 1.0495465993881226,
+      "learning_rate": 0.0001950416666666667,
+      "loss": 1.8911224365234376,
+      "mean_token_accuracy": 0.6627636551856995,
+      "num_tokens": 194216.0,
       "step": 120
     },
     {
+      "entropy": 2.0799292087554933,
+      "epoch": 0.08125,
+      "grad_norm": 1.4638044834136963,
+      "learning_rate": 0.000194625,
+      "loss": 2.0677186965942385,
+      "mean_token_accuracy": 0.6408409655094147,
+      "num_tokens": 209861.0,
       "step": 130
     },
     {
+      "entropy": 2.0656333684921266,
+      "epoch": 0.0875,
+      "grad_norm": 1.2326873540878296,
+      "learning_rate": 0.00019420833333333334,
+      "loss": 2.0436325073242188,
+      "mean_token_accuracy": 0.647420459985733,
+      "num_tokens": 225951.0,
       "step": 140
     },
     {
+      "entropy": 2.151374113559723,
+      "epoch": 0.09375,
+      "grad_norm": 1.209037184715271,
+      "learning_rate": 0.00019379166666666668,
+      "loss": 2.1708988189697265,
+      "mean_token_accuracy": 0.6336644470691681,
+      "num_tokens": 241973.0,
       "step": 150
     },
     {
+      "entropy": 1.9679807424545288,
+      "epoch": 0.1,
+      "grad_norm": 1.0798423290252686,
+      "learning_rate": 0.00019337500000000002,
+      "loss": 1.9049331665039062,
+      "mean_token_accuracy": 0.6611163139343261,
+      "num_tokens": 257148.0,
       "step": 160
     },
     {
+      "entropy": 1.9416646242141724,
+      "epoch": 0.10625,
+      "grad_norm": 0.9878492951393127,
+      "learning_rate": 0.00019295833333333334,
+      "loss": 1.960176658630371,
+      "mean_token_accuracy": 0.6551605999469757,
+      "num_tokens": 273456.0,
       "step": 170
     },
     {
+      "entropy": 1.7779759645462037,
+      "epoch": 0.1125,
+      "grad_norm": 1.074549674987793,
+      "learning_rate": 0.00019254166666666668,
+      "loss": 1.7707120895385742,
+      "mean_token_accuracy": 0.6601927995681762,
+      "num_tokens": 290923.0,
       "step": 180
     },
     {
+      "entropy": 2.111535668373108,
+      "epoch": 0.11875,
+      "grad_norm": 1.4603313207626343,
+      "learning_rate": 0.000192125,
+      "loss": 2.09625358581543,
+      "mean_token_accuracy": 0.6402183502912522,
+      "num_tokens": 307822.0,
       "step": 190
     },
     {
+      "entropy": 2.077592122554779,
+      "epoch": 0.125,
+      "grad_norm": 1.1337363719940186,
+      "learning_rate": 0.00019170833333333334,
+      "loss": 2.084154510498047,
+      "mean_token_accuracy": 0.641227388381958,
+      "num_tokens": 324142.0,
       "step": 200
     },
     {
+      "entropy": 1.8829279899597169,
+      "epoch": 0.13125,
+      "grad_norm": 1.0533121824264526,
+      "learning_rate": 0.00019129166666666668,
+      "loss": 1.83758544921875,
+      "mean_token_accuracy": 0.6638262569904327,
+      "num_tokens": 341542.0,
       "step": 210
     },
     {
+      "entropy": 1.649771249294281,
+      "epoch": 0.1375,
+      "grad_norm": 1.2242692708969116,
+      "learning_rate": 0.000190875,
+      "loss": 1.6660097122192383,
+      "mean_token_accuracy": 0.696986198425293,
+      "num_tokens": 356591.0,
       "step": 220
     },
     {
+      "entropy": 1.6322881817817687,
+      "epoch": 0.14375,
+      "grad_norm": 1.318080186843872,
+      "learning_rate": 0.00019045833333333333,
+      "loss": 1.6340875625610352,
+      "mean_token_accuracy": 0.7008972883224487,
+      "num_tokens": 371764.0,
       "step": 230
     },
     {
+      "entropy": 1.7258678793907165,
+      "epoch": 0.15,
+      "grad_norm": 1.1507346630096436,
+      "learning_rate": 0.00019004166666666667,
+      "loss": 1.7209365844726563,
+      "mean_token_accuracy": 0.6634244680404663,
+      "num_tokens": 390139.0,
       "step": 240
     },
     {
+      "entropy": 2.0835100650787353,
+      "epoch": 0.15625,
+      "grad_norm": 1.1298671960830688,
+      "learning_rate": 0.00018962500000000001,
+      "loss": 2.0685489654541014,
+      "mean_token_accuracy": 0.6533876061439514,
+      "num_tokens": 404727.0,
       "step": 250
     },
     {
+      "entropy": 1.8807834386825562,
+      "epoch": 0.1625,
+      "grad_norm": 1.4069880247116089,
+      "learning_rate": 0.00018920833333333336,
+      "loss": 1.8705434799194336,
+      "mean_token_accuracy": 0.6535706460475922,
+      "num_tokens": 421923.0,
       "step": 260
     },
     {
+      "entropy": 1.6720293521881104,
+      "epoch": 0.16875,
+      "grad_norm": 1.2488282918930054,
+      "learning_rate": 0.00018879166666666667,
+      "loss": 1.647348976135254,
+      "mean_token_accuracy": 0.6866099178791046,
+      "num_tokens": 439038.0,
       "step": 270
     },
     {
+      "entropy": 1.930847203731537,
+      "epoch": 0.175,
+      "grad_norm": 1.0187071561813354,
+      "learning_rate": 0.000188375,
+      "loss": 1.9441492080688476,
+      "mean_token_accuracy": 0.6643437385559082,
+      "num_tokens": 455019.0,
       "step": 280
     },
     {
+      "entropy": 1.783823847770691,
+      "epoch": 0.18125,
+      "grad_norm": 0.991218090057373,
+      "learning_rate": 0.00018795833333333335,
+      "loss": 1.766385841369629,
+      "mean_token_accuracy": 0.6792463660240173,
+      "num_tokens": 470470.0,
       "step": 290
     },
     {
+      "entropy": 1.6973824977874756,
+      "epoch": 0.1875,
+      "grad_norm": 1.1331487894058228,
+      "learning_rate": 0.0001875416666666667,
+      "loss": 1.6720619201660156,
+      "mean_token_accuracy": 0.690255868434906,
+      "num_tokens": 486512.0,
       "step": 300
     },
     {
+      "entropy": 1.881280207633972,
+      "epoch": 0.19375,
+      "grad_norm": 1.0860546827316284,
+      "learning_rate": 0.000187125,
+      "loss": 1.8710559844970702,
+      "mean_token_accuracy": 0.6732289731502533,
+      "num_tokens": 501664.0,
       "step": 310
     },
     {
+      "entropy": 1.928344440460205,
+      "epoch": 0.2,
+      "grad_norm": 1.0820534229278564,
+      "learning_rate": 0.00018670833333333335,
+      "loss": 1.94879093170166,
+      "mean_token_accuracy": 0.6571235120296478,
+      "num_tokens": 516500.0,
       "step": 320
     },
     {
+      "entropy": 1.780434775352478,
+      "epoch": 0.20625,
+      "grad_norm": 1.149436116218567,
+      "learning_rate": 0.0001862916666666667,
+      "loss": 1.739248275756836,
+      "mean_token_accuracy": 0.6907038509845733,
+      "num_tokens": 531623.0,
       "step": 330
     },
     {
+      "entropy": 1.835638737678528,
+      "epoch": 0.2125,
+      "grad_norm": 1.217748999595642,
+      "learning_rate": 0.000185875,
+      "loss": 1.837971305847168,
+      "mean_token_accuracy": 0.6789492428302765,
+      "num_tokens": 547248.0,
       "step": 340
     },
     {
+      "entropy": 1.529280412197113,
+      "epoch": 0.21875,
+      "grad_norm": 1.1209408044815063,
+      "learning_rate": 0.00018545833333333335,
+      "loss": 1.5159669876098634,
+      "mean_token_accuracy": 0.7098696529865265,
+      "num_tokens": 562556.0,
       "step": 350
     },
     {
+      "entropy": 1.9280451774597167,
+      "epoch": 0.225,
+      "grad_norm": 1.0258183479309082,
+      "learning_rate": 0.00018504166666666666,
+      "loss": 1.9479742050170898,
+      "mean_token_accuracy": 0.6640809357166291,
+      "num_tokens": 578023.0,
       "step": 360
     },
     {
+      "entropy": 1.8790152072906494,
+      "epoch": 0.23125,
+      "grad_norm": 1.157669186592102,
+      "learning_rate": 0.000184625,
+      "loss": 1.847334861755371,
+      "mean_token_accuracy": 0.6603596329689025,
+      "num_tokens": 594019.0,
       "step": 370
     },
     {
+      "entropy": 1.8294876575469972,
+      "epoch": 0.2375,
+      "grad_norm": 1.0211504697799683,
+      "learning_rate": 0.00018420833333333334,
+      "loss": 1.8582696914672852,
+      "mean_token_accuracy": 0.6679854333400727,
+      "num_tokens": 609499.0,
       "step": 380
     },
     {
+      "entropy": 1.8593019366264343,
+      "epoch": 0.24375,
+      "grad_norm": 1.2300069332122803,
+      "learning_rate": 0.00018379166666666668,
+      "loss": 1.8436058044433594,
+      "mean_token_accuracy": 0.6740959763526917,
+      "num_tokens": 624831.0,
       "step": 390
     },
     {
+      "entropy": 1.6092237114906311,
+      "epoch": 0.25,
+      "grad_norm": 1.2899959087371826,
+      "learning_rate": 0.000183375,
+      "loss": 1.5911931991577148,
+      "mean_token_accuracy": 0.7107231378555298,
+      "num_tokens": 640781.0,
       "step": 400
     },
     {
+      "entropy": 2.147260272502899,
+      "epoch": 0.25625,
+      "grad_norm": 1.28315007686615,
+      "learning_rate": 0.00018295833333333334,
+      "loss": 2.1315792083740233,
+      "mean_token_accuracy": 0.6412826657295227,
+      "num_tokens": 656795.0,
       "step": 410
     },
     {
+      "entropy": 1.8276140928268432,
+      "epoch": 0.2625,
+      "grad_norm": 0.9926204681396484,
+      "learning_rate": 0.00018254166666666668,
+      "loss": 1.7912399291992187,
+      "mean_token_accuracy": 0.6752909004688263,
+      "num_tokens": 673839.0,
       "step": 420
     },
     {
+      "entropy": 1.725200641155243,
+      "epoch": 0.26875,
+      "grad_norm": 0.9599955677986145,
+      "learning_rate": 0.00018212500000000002,
+      "loss": 1.6968486785888672,
+      "mean_token_accuracy": 0.6876484453678131,
+      "num_tokens": 691102.0,
       "step": 430
     },
     {
+      "entropy": 1.49821537733078,
+      "epoch": 0.275,
+      "grad_norm": 1.1128442287445068,
+      "learning_rate": 0.00018170833333333334,
+      "loss": 1.4911989212036132,
+      "mean_token_accuracy": 0.7070409774780273,
+      "num_tokens": 707939.0,
       "step": 440
     },
     {
+      "entropy": 2.0437518835067747,
+      "epoch": 0.28125,
+      "grad_norm": 1.1485779285430908,
+      "learning_rate": 0.00018129166666666668,
+      "loss": 2.0552061080932615,
+      "mean_token_accuracy": 0.6452532887458802,
+      "num_tokens": 724384.0,
       "step": 450
     },
     {
+      "entropy": 1.9125534653663636,
+      "epoch": 0.2875,
+      "grad_norm": 1.3141529560089111,
+      "learning_rate": 0.00018087500000000002,
+      "loss": 1.8738250732421875,
+      "mean_token_accuracy": 0.6706897974014282,
+      "num_tokens": 739865.0,
       "step": 460
     },
     {
+      "entropy": 1.9561587691307067,
+      "epoch": 0.29375,
+      "grad_norm": 1.0918525457382202,
+      "learning_rate": 0.00018045833333333336,
+      "loss": 1.938099479675293,
+      "mean_token_accuracy": 0.6760513365268708,
+      "num_tokens": 755491.0,
       "step": 470
     },
     {
+      "entropy": 1.6972344875335694,
+      "epoch": 0.3,
+      "grad_norm": 1.183408260345459,
+      "learning_rate": 0.00018004166666666667,
+      "loss": 1.6730932235717773,
+      "mean_token_accuracy": 0.6902998864650727,
+      "num_tokens": 771754.0,
       "step": 480
     },
     {
+      "entropy": 1.6555222153663636,
+      "epoch": 0.30625,
+      "grad_norm": 1.2446097135543823,
+      "learning_rate": 0.000179625,
+      "loss": 1.644314956665039,
+      "mean_token_accuracy": 0.7027111053466797,
+      "num_tokens": 787882.0,
       "step": 490
     },
     {
+      "entropy": 1.6912259459495544,
+      "epoch": 0.3125,
+      "grad_norm": 1.0987075567245483,
+      "learning_rate": 0.00017920833333333333,
+      "loss": 1.6494056701660156,
+      "mean_token_accuracy": 0.6928456544876098,
+      "num_tokens": 804532.0,
       "step": 500
     },
     {
+      "entropy": 1.8515005946159362,
+      "epoch": 0.31875,
+      "grad_norm": 1.1869553327560425,
+      "learning_rate": 0.00017879166666666667,
+      "loss": 1.856374740600586,
+      "mean_token_accuracy": 0.6716830492019653,
+      "num_tokens": 819940.0,
       "step": 510
     },
     {
+      "entropy": 1.696764051914215,
+      "epoch": 0.325,
+      "grad_norm": 1.1994718313217163,
+      "learning_rate": 0.000178375,
+      "loss": 1.6898420333862305,
+      "mean_token_accuracy": 0.6878461837768555,
+      "num_tokens": 835747.0,
       "step": 520
     },
     {
+      "entropy": 1.9474074840545654,
+      "epoch": 0.33125,
+      "grad_norm": 1.0442698001861572,
+      "learning_rate": 0.00017795833333333333,
+      "loss": 1.948105812072754,
+      "mean_token_accuracy": 0.671975576877594,
+      "num_tokens": 850222.0,
       "step": 530
     },
     {
+      "entropy": 1.5088442265987396,
+      "epoch": 0.3375,
+      "grad_norm": 1.0030030012130737,
+      "learning_rate": 0.00017754166666666667,
+      "loss": 1.4812466621398925,
+      "mean_token_accuracy": 0.7255652785301209,
+      "num_tokens": 866098.0,
       "step": 540
     },
     {
+      "entropy": 1.4793359756469726,
+      "epoch": 0.34375,
+      "grad_norm": 1.1266038417816162,
+      "learning_rate": 0.000177125,
+      "loss": 1.483462142944336,
+      "mean_token_accuracy": 0.7108414351940155,
+      "num_tokens": 883108.0,
       "step": 550
     },
     {
+      "entropy": 1.609874677658081,
+      "epoch": 0.35,
+      "grad_norm": 1.003450632095337,
+      "learning_rate": 0.00017670833333333335,
+      "loss": 1.6068243026733398,
+      "mean_token_accuracy": 0.6996320366859436,
+      "num_tokens": 898865.0,
       "step": 560
     },
     {
+      "entropy": 1.773156213760376,
+      "epoch": 0.35625,
+      "grad_norm": 2.341601848602295,
+      "learning_rate": 0.00017629166666666666,
+      "loss": 1.7459211349487305,
+      "mean_token_accuracy": 0.6891302824020386,
+      "num_tokens": 914296.0,
       "step": 570
     },
     {
+      "entropy": 1.7185376048088075,
+      "epoch": 0.3625,
+      "grad_norm": 1.1557060480117798,
+      "learning_rate": 0.000175875,
+      "loss": 1.6925424575805663,
+      "mean_token_accuracy": 0.6805954694747924,
+      "num_tokens": 932234.0,
       "step": 580
     },
     {
+      "entropy": 1.8280374526977539,
+      "epoch": 0.36875,
+      "grad_norm": 1.1782957315444946,
+      "learning_rate": 0.00017545833333333335,
+      "loss": 1.8421060562133789,
+      "mean_token_accuracy": 0.6776642084121705,
+      "num_tokens": 948747.0,
       "step": 590
     },
     {
+      "entropy": 1.8082952618598938,
+      "epoch": 0.375,
+      "grad_norm": 0.9948606491088867,
+      "learning_rate": 0.0001750416666666667,
+      "loss": 1.783558464050293,
+      "mean_token_accuracy": 0.6757851302623749,
+      "num_tokens": 964288.0,
       "step": 600
     },
     {
+      "entropy": 1.760896122455597,
+      "epoch": 0.38125,
+      "grad_norm": 17.713958740234375,
+      "learning_rate": 0.00017462500000000003,
+      "loss": 1.7631986618041993,
+      "mean_token_accuracy": 0.6811207413673401,
+      "num_tokens": 980203.0,
       "step": 610
     },
     {
+      "entropy": 1.9898195564746857,
+      "epoch": 0.3875,
+      "grad_norm": 1.0574253797531128,
+      "learning_rate": 0.00017420833333333334,
+      "loss": 1.9516635894775392,
+      "mean_token_accuracy": 0.6489899933338166,
+      "num_tokens": 996871.0,
       "step": 620
     },
     {
+      "entropy": 1.7820778012275695,
+      "epoch": 0.39375,
+      "grad_norm": 1.0086643695831299,
+      "learning_rate": 0.00017379166666666669,
+      "loss": 1.8043378829956054,
+      "mean_token_accuracy": 0.6813792884349823,
+      "num_tokens": 1012770.0,
       "step": 630
     },
     {
+      "entropy": 1.8386994361877442,
+      "epoch": 0.4,
+      "grad_norm": 1.2745709419250488,
+      "learning_rate": 0.000173375,
+      "loss": 1.8168407440185548,
+      "mean_token_accuracy": 0.6552604496479034,
+      "num_tokens": 1030031.0,
       "step": 640
     },
     {
+      "entropy": 1.6865394830703735,
+      "epoch": 0.40625,
+      "grad_norm": 1.3551218509674072,
+      "learning_rate": 0.00017295833333333334,
+      "loss": 1.6793342590332032,
+      "mean_token_accuracy": 0.6937127232551574,
+      "num_tokens": 1044365.0,
       "step": 650
     },
     {
+      "entropy": 1.69602689743042,
+      "epoch": 0.4125,
+      "grad_norm": 1.1780422925949097,
+      "learning_rate": 0.00017254166666666665,
+      "loss": 1.6850801467895509,
+      "mean_token_accuracy": 0.7048744976520538,
+      "num_tokens": 1059256.0,
       "step": 660
     },
     {
+      "entropy": 1.8743945717811585,
+      "epoch": 0.41875,
+      "grad_norm": 1.2194169759750366,
+      "learning_rate": 0.000172125,
+      "loss": 1.8435325622558594,
+      "mean_token_accuracy": 0.6657077252864838,
+      "num_tokens": 1074881.0,
       "step": 670
     },
     {
+      "entropy": 1.638406789302826,
+      "epoch": 0.425,
+      "grad_norm": 1.2872169017791748,
+      "learning_rate": 0.00017170833333333334,
+      "loss": 1.6532812118530273,
+      "mean_token_accuracy": 0.696779602766037,
+      "num_tokens": 1091137.0,
       "step": 680
     },
     {
+      "entropy": 1.8440260648727418,
+      "epoch": 0.43125,
+      "grad_norm": 1.3588929176330566,
+      "learning_rate": 0.00017129166666666668,
+      "loss": 1.840639877319336,
+      "mean_token_accuracy": 0.6729660153388977,
+      "num_tokens": 1107054.0,
       "step": 690
     },
     {
+      "entropy": 1.5835177421569824,
+      "epoch": 0.4375,
+      "grad_norm": 0.9857878684997559,
+      "learning_rate": 0.00017087500000000002,
+      "loss": 1.5488386154174805,
+      "mean_token_accuracy": 0.724124139547348,
+      "num_tokens": 1121191.0,
       "step": 700
     },
     {
+      "entropy": 1.729893934726715,
+      "epoch": 0.44375,
+      "grad_norm": 1.2562510967254639,
+      "learning_rate": 0.00017045833333333333,
+      "loss": 1.7510330200195312,
+      "mean_token_accuracy": 0.6822909355163574,
+      "num_tokens": 1137417.0,
       "step": 710
     },
     {
+      "entropy": 1.8747714400291442,
+      "epoch": 0.45,
+      "grad_norm": 1.0315498113632202,
+      "learning_rate": 0.00017004166666666668,
+      "loss": 1.8536712646484375,
+      "mean_token_accuracy": 0.668778932094574,
+      "num_tokens": 1153502.0,
       "step": 720
     },
     {
+      "entropy": 1.5935072481632233,
+      "epoch": 0.45625,
+      "grad_norm": 1.1812435388565063,
+      "learning_rate": 0.00016962500000000002,
+      "loss": 1.566417121887207,
+      "mean_token_accuracy": 0.7045138716697693,
+      "num_tokens": 1168537.0,
       "step": 730
     },
     {
+      "entropy": 1.8550025582313538,
+      "epoch": 0.4625,
+      "grad_norm": 0.956068217754364,
+      "learning_rate": 0.00016920833333333336,
+      "loss": 1.854224395751953,
+      "mean_token_accuracy": 0.6738598048686981,
+      "num_tokens": 1183781.0,
       "step": 740
     },
     {
+      "entropy": 2.065062153339386,
+      "epoch": 0.46875,
+      "grad_norm": 1.1881858110427856,
+      "learning_rate": 0.00016879166666666667,
+      "loss": 2.0420166015625,
+      "mean_token_accuracy": 0.6490989983081817,
+      "num_tokens": 1201200.0,
       "step": 750
     },
     {
+      "entropy": 1.6268154442310334,
+      "epoch": 0.475,
+      "grad_norm": 1.0978918075561523,
+      "learning_rate": 0.000168375,
+      "loss": 1.6155092239379882,
+      "mean_token_accuracy": 0.6949241161346436,
+      "num_tokens": 1217619.0,
       "step": 760
     },
     {
+      "entropy": 1.7807599782943726,
+      "epoch": 0.48125,
+      "grad_norm": 1.115274429321289,
+      "learning_rate": 0.00016795833333333335,
+      "loss": 1.7416255950927735,
+      "mean_token_accuracy": 0.6845939517021179,
+      "num_tokens": 1234024.0,
       "step": 770
     },
     {
+      "entropy": 1.6363184571266174,
+      "epoch": 0.4875,
+      "grad_norm": 1.0698058605194092,
+      "learning_rate": 0.0001675416666666667,
+      "loss": 1.658616065979004,
+      "mean_token_accuracy": 0.6895378947257995,
+      "num_tokens": 1249959.0,
       "step": 780
     },
     {
+      "entropy": 1.7100866436958313,
+      "epoch": 0.49375,
+      "grad_norm": 1.5094223022460938,
+      "learning_rate": 0.000167125,
+      "loss": 1.6892465591430663,
+      "mean_token_accuracy": 0.6900394260883331,
+      "num_tokens": 1266082.0,
       "step": 790
     },
     {
+      "entropy": 1.8856651127338409,
+      "epoch": 0.5,
+      "grad_norm": 0.9061095118522644,
+      "learning_rate": 0.00016670833333333332,
+      "loss": 1.825701904296875,
+      "mean_token_accuracy": 0.6656161487102509,
+      "num_tokens": 1282730.0,
       "step": 800
     },
     {
+      "entropy": 1.4934285402297973,
+      "epoch": 0.50625,
+      "grad_norm": 1.262459635734558,
+      "learning_rate": 0.00016629166666666667,
+      "loss": 1.4946110725402832,
+      "mean_token_accuracy": 0.7251970648765564,
+      "num_tokens": 1298552.0,
       "step": 810
     },
     {
+      "entropy": 1.4886265635490417,
+      "epoch": 0.5125,
+      "grad_norm": 1.0677028894424438,
+      "learning_rate": 0.000165875,
+      "loss": 1.4603113174438476,
+      "mean_token_accuracy": 0.7227605879306793,
+      "num_tokens": 1314824.0,
       "step": 820
     },
     {
+      "entropy": 1.692549991607666,
+      "epoch": 0.51875,
+      "grad_norm": 1.0945903062820435,
+      "learning_rate": 0.00016545833333333335,
+      "loss": 1.7372652053833009,
+      "mean_token_accuracy": 0.6853966057300568,
+      "num_tokens": 1330791.0,
       "step": 830
     },
     {
+      "entropy": 1.8210653901100158,
+      "epoch": 0.525,
+      "grad_norm": 1.1291331052780151,
+      "learning_rate": 0.00016504166666666666,
+      "loss": 1.7676584243774414,
+      "mean_token_accuracy": 0.6854879319667816,
+      "num_tokens": 1345756.0,
       "step": 840
     },
     {
+      "entropy": 1.6212540507316588,
+      "epoch": 0.53125,
+      "grad_norm": 1.5413988828659058,
+      "learning_rate": 0.000164625,
+      "loss": 1.623637580871582,
+      "mean_token_accuracy": 0.7191856324672699,
+      "num_tokens": 1359982.0,
       "step": 850
     },
     {
+      "entropy": 1.8811518788337707,
+      "epoch": 0.5375,
+      "grad_norm": 1.1786221265792847,
+      "learning_rate": 0.00016420833333333334,
+      "loss": 1.8713268280029296,
+      "mean_token_accuracy": 0.6602873921394348,
+      "num_tokens": 1376178.0,
       "step": 860
     },
     {
+      "entropy": 2.035761559009552,
+      "epoch": 0.54375,
+      "grad_norm": 1.0984121561050415,
+      "learning_rate": 0.00016379166666666669,
+      "loss": 2.059285354614258,
+      "mean_token_accuracy": 0.6380216658115387,
+      "num_tokens": 1392868.0,
       "step": 870
     },
     {
+      "entropy": 1.6217237949371337,
+      "epoch": 0.55,
+      "grad_norm": 0.9770920276641846,
+      "learning_rate": 0.000163375,
+      "loss": 1.5708234786987305,
+      "mean_token_accuracy": 0.7149775147438049,
+      "num_tokens": 1407764.0,
       "step": 880
     },
     {
+      "entropy": 1.602774453163147,
+      "epoch": 0.55625,
+      "grad_norm": 1.0390586853027344,
+      "learning_rate": 0.00016295833333333334,
+      "loss": 1.607761764526367,
+      "mean_token_accuracy": 0.705094438791275,
+      "num_tokens": 1424197.0,
       "step": 890
     },
     {
+      "entropy": 1.69694527387619,
+      "epoch": 0.5625,
+      "grad_norm": 1.179693579673767,
+      "learning_rate": 0.00016254166666666668,
+      "loss": 1.6948720932006835,
+      "mean_token_accuracy": 0.6927467882633209,
+      "num_tokens": 1440504.0,
       "step": 900
     },
     {
+      "entropy": 1.6066429018974304,
+      "epoch": 0.56875,
+      "grad_norm": 1.1319488286972046,
+      "learning_rate": 0.00016212500000000002,
+      "loss": 1.5969940185546876,
+      "mean_token_accuracy": 0.7075757026672364,
+      "num_tokens": 1456530.0,
       "step": 910
     },
     {
+      "entropy": 1.8973723888397216,
+      "epoch": 0.575,
+      "grad_norm": 1.2241361141204834,
+      "learning_rate": 0.00016170833333333334,
+      "loss": 1.8886999130249023,
+      "mean_token_accuracy": 0.6638000011444092,
+      "num_tokens": 1473296.0,
       "step": 920
     },
     {
+      "entropy": 1.7187514424324035,
+      "epoch": 0.58125,
+      "grad_norm": 1.173000454902649,
+      "learning_rate": 0.00016129166666666668,
+      "loss": 1.6855524063110352,
+      "mean_token_accuracy": 0.6964821815490723,
+      "num_tokens": 1488922.0,
       "step": 930
     },
     {
+      "entropy": 1.8056416869163514,
+      "epoch": 0.5875,
+      "grad_norm": 1.0227336883544922,
+      "learning_rate": 0.000160875,
+      "loss": 1.7846719741821289,
+      "mean_token_accuracy": 0.6708004891872406,
+      "num_tokens": 1506033.0,
       "step": 940
     },
     {
+      "entropy": 1.919889748096466,
+      "epoch": 0.59375,
+      "grad_norm": 0.9519665241241455,
+      "learning_rate": 0.00016045833333333333,
+      "loss": 1.9278553009033204,
+      "mean_token_accuracy": 0.6540423572063446,
+      "num_tokens": 1523413.0,
       "step": 950
     },
     {
+      "entropy": 1.8174611330032349,
+      "epoch": 0.6,
+      "grad_norm": 1.0088615417480469,
+      "learning_rate": 0.00016004166666666668,
+      "loss": 1.7834074020385742,
+      "mean_token_accuracy": 0.6924533307552337,
+      "num_tokens": 1539536.0,
       "step": 960
     },
     {
+      "entropy": 1.9116937160491942,
+      "epoch": 0.60625,
+      "grad_norm": 1.1767348051071167,
+      "learning_rate": 0.000159625,
+      "loss": 1.8945436477661133,
+      "mean_token_accuracy": 0.6457314133644104,
+      "num_tokens": 1557886.0,
       "step": 970
     },
     {
+      "entropy": 1.7096561312675476,
+      "epoch": 0.6125,
+      "grad_norm": 1.1833308935165405,
+      "learning_rate": 0.00015920833333333333,
+      "loss": 1.7359018325805664,
+      "mean_token_accuracy": 0.6804608941078186,
+      "num_tokens": 1574248.0,
       "step": 980
     },
     {
+      "entropy": 1.9041632771492005,
+      "epoch": 0.61875,
+      "grad_norm": 0.9453931450843811,
+      "learning_rate": 0.00015879166666666667,
+      "loss": 1.8600358963012695,
+      "mean_token_accuracy": 0.6616500198841095,
+      "num_tokens": 1590862.0,
       "step": 990
     },
     {
+      "entropy": 1.4851105570793153,
+      "epoch": 0.625,
+      "grad_norm": 1.079835057258606,
+      "learning_rate": 0.00015837500000000001,
+      "loss": 1.4834007263183593,
+      "mean_token_accuracy": 0.7172181904315948,
+      "num_tokens": 1606914.0,
       "step": 1000
     },
     {
+      "entropy": 1.8303247690200806,
+      "epoch": 0.63125,
+      "grad_norm": 0.9633236527442932,
+      "learning_rate": 0.00015795833333333333,
+      "loss": 1.8288990020751954,
+      "mean_token_accuracy": 0.6784947097301484,
+      "num_tokens": 1622896.0,
       "step": 1010
     },
     {
+      "entropy": 1.8160423159599304,
+      "epoch": 0.6375,
+      "grad_norm": 1.007555603981018,
+      "learning_rate": 0.00015754166666666667,
+      "loss": 1.7530982971191407,
+      "mean_token_accuracy": 0.6823331356048584,
+      "num_tokens": 1640376.0,
       "step": 1020
     },
     {
+      "entropy": 1.7904390811920166,
+      "epoch": 0.64375,
+      "grad_norm": 1.3964345455169678,
+      "learning_rate": 0.000157125,
+      "loss": 1.8209213256835937,
+      "mean_token_accuracy": 0.6769470632076263,
+      "num_tokens": 1657007.0,
       "step": 1030
     },
     {
+      "entropy": 1.876240646839142,
+      "epoch": 0.65,
+      "grad_norm": 1.1620566844940186,
+      "learning_rate": 0.00015670833333333335,
+      "loss": 1.879776954650879,
+      "mean_token_accuracy": 0.6752348482608795,
+      "num_tokens": 1674235.0,
       "step": 1040
     },
     {
+      "entropy": 1.40432670712471,
+      "epoch": 0.65625,
+      "grad_norm": 1.1437697410583496,
+      "learning_rate": 0.0001562916666666667,
+      "loss": 1.3821091651916504,
+      "mean_token_accuracy": 0.7261551082134247,
+      "num_tokens": 1690880.0,
       "step": 1050
     },
     {
+      "entropy": 1.630136674642563,
+      "epoch": 0.6625,
+      "grad_norm": 1.173415184020996,
+      "learning_rate": 0.000155875,
+      "loss": 1.6407217025756835,
+      "mean_token_accuracy": 0.7053111135959625,
+      "num_tokens": 1706773.0,
       "step": 1060
     },
     {
+      "entropy": 1.9234841227531434,
+      "epoch": 0.66875,
+      "grad_norm": 0.9936195015907288,
+      "learning_rate": 0.00015545833333333335,
+      "loss": 1.9025312423706056,
+      "mean_token_accuracy": 0.6550717502832413,
+      "num_tokens": 1724083.0,
       "step": 1070
     },
     {
+      "entropy": 1.5203362822532653,
+      "epoch": 0.675,
+      "grad_norm": 1.3403916358947754,
+      "learning_rate": 0.0001550416666666667,
+      "loss": 1.4605630874633788,
+      "mean_token_accuracy": 0.7276029765605927,
+      "num_tokens": 1739086.0,
       "step": 1080
     },
     {
+      "entropy": 1.5262176454067231,
+      "epoch": 0.68125,
+      "grad_norm": 1.052614450454712,
+      "learning_rate": 0.000154625,
+      "loss": 1.542721652984619,
+      "mean_token_accuracy": 0.7090686440467835,
+      "num_tokens": 1754825.0,
       "step": 1090
     },
     {
+      "entropy": 1.8050179362297059,
+      "epoch": 0.6875,
+      "grad_norm": 1.4718170166015625,
+      "learning_rate": 0.00015420833333333335,
+      "loss": 1.777005386352539,
+      "mean_token_accuracy": 0.6807081162929535,
+      "num_tokens": 1770216.0,
       "step": 1100
     },
     {
+      "entropy": 1.6406042158603669,
+      "epoch": 0.69375,
+      "grad_norm": 1.115580439567566,
+      "learning_rate": 0.00015379166666666666,
+      "loss": 1.6249666213989258,
+      "mean_token_accuracy": 0.7058773934841156,
+      "num_tokens": 1785236.0,
       "step": 1110
     },
     {
+      "entropy": 1.6661675333976746,
+      "epoch": 0.7,
+      "grad_norm": 0.9184897541999817,
+      "learning_rate": 0.000153375,
+      "loss": 1.680354690551758,
+      "mean_token_accuracy": 0.7018162786960602,
+      "num_tokens": 1800794.0,
       "step": 1120
     },
     {
+      "entropy": 1.7879603862762452,
+      "epoch": 0.70625,
+      "grad_norm": 1.1904963254928589,
+      "learning_rate": 0.00015295833333333334,
+      "loss": 1.7555608749389648,
+      "mean_token_accuracy": 0.6879275143146515,
+      "num_tokens": 1816368.0,
       "step": 1130
     },
     {
+      "entropy": 1.542227828502655,
+      "epoch": 0.7125,
+      "grad_norm": 1.5405501127243042,
+      "learning_rate": 0.00015254166666666668,
+      "loss": 1.5250240325927735,
+      "mean_token_accuracy": 0.7067281067371368,
+      "num_tokens": 1833799.0,
       "step": 1140
     },
     {
+      "entropy": 1.6808035492897033,
+      "epoch": 0.71875,
+      "grad_norm": 1.0687938928604126,
+      "learning_rate": 0.000152125,
+      "loss": 1.6870901107788085,
+      "mean_token_accuracy": 0.6859738230705261,
+      "num_tokens": 1850599.0,
       "step": 1150
     },
     {
+      "entropy": 1.5208389639854432,
+      "epoch": 0.725,
+      "grad_norm": 0.7306898236274719,
+      "learning_rate": 0.00015170833333333334,
+      "loss": 1.489798355102539,
+      "mean_token_accuracy": 0.7269160747528076,
+      "num_tokens": 1865850.0,
       "step": 1160
     },
     {
+      "entropy": 1.7221656441688538,
+      "epoch": 0.73125,
+      "grad_norm": 1.0556329488754272,
+      "learning_rate": 0.00015129166666666668,
+      "loss": 1.7220314025878907,
+      "mean_token_accuracy": 0.6948069214820862,
+      "num_tokens": 1881109.0,
       "step": 1170
     },
     {
+      "entropy": 1.8467972993850708,
+      "epoch": 0.7375,
+      "grad_norm": 1.0107264518737793,
+      "learning_rate": 0.00015087500000000002,
+      "loss": 1.8298328399658204,
+      "mean_token_accuracy": 0.673337870836258,
+      "num_tokens": 1896961.0,
       "step": 1180
     },
     {
+      "entropy": 1.811994230747223,
+      "epoch": 0.74375,
+      "grad_norm": 0.9903097748756409,
+      "learning_rate": 0.00015045833333333334,
+      "loss": 1.7922752380371094,
+      "mean_token_accuracy": 0.6801791548728943,
+      "num_tokens": 1913474.0,
       "step": 1190
     },
     {
+      "entropy": 1.692976748943329,
+      "epoch": 0.75,
+      "grad_norm": 1.2231838703155518,
+      "learning_rate": 0.00015004166666666668,
+      "loss": 1.7092206954956055,
+      "mean_token_accuracy": 0.7039589881896973,
+      "num_tokens": 1928065.0,
       "step": 1200
     },
     {
+      "entropy": 1.7056877970695496,
+      "epoch": 0.75625,
+      "grad_norm": 1.0669372081756592,
+      "learning_rate": 0.00014962500000000002,
+      "loss": 1.6774791717529296,
+      "mean_token_accuracy": 0.6932863354682922,
+      "num_tokens": 1944000.0,
       "step": 1210
     },
     {
+      "entropy": 1.6272387504577637,
+      "epoch": 0.7625,
+      "grad_norm": 1.0480815172195435,
+      "learning_rate": 0.00014920833333333336,
+      "loss": 1.6001169204711914,
+      "mean_token_accuracy": 0.6986334085464477,
+      "num_tokens": 1959802.0,
       "step": 1220
     },
     {
+      "entropy": 1.6549307227134704,
+      "epoch": 0.76875,
+      "grad_norm": 1.2522614002227783,
+      "learning_rate": 0.00014879166666666667,
+      "loss": 1.670203399658203,
+      "mean_token_accuracy": 0.6849127054214478,
+      "num_tokens": 1976404.0,
       "step": 1230
     },
     {
+      "entropy": 1.5742060959339141,
+      "epoch": 0.775,
+      "grad_norm": 1.3071776628494263,
+      "learning_rate": 0.000148375,
+      "loss": 1.5255179405212402,
+      "mean_token_accuracy": 0.7283611118793487,
+      "num_tokens": 1990354.0,
       "step": 1240
     },
     {
+      "entropy": 1.3672740757465363,
+      "epoch": 0.78125,
+      "grad_norm": 1.1295819282531738,
+      "learning_rate": 0.00014795833333333333,
+      "loss": 1.3578125,
+      "mean_token_accuracy": 0.7339789867401123,
+      "num_tokens": 2007259.0,
       "step": 1250
     },
     {
+      "entropy": 1.5945733308792114,
+      "epoch": 0.7875,
+      "grad_norm": 1.6405155658721924,
+      "learning_rate": 0.00014754166666666667,
+      "loss": 1.5962472915649415,
+      "mean_token_accuracy": 0.6940421521663666,
+      "num_tokens": 2023439.0,
       "step": 1260
     },
     {
+      "entropy": 1.7175377368927003,
+      "epoch": 0.79375,
+      "grad_norm": 1.2672407627105713,
+      "learning_rate": 0.000147125,
+      "loss": 1.7290122985839844,
+      "mean_token_accuracy": 0.6945447564125061,
+      "num_tokens": 2039429.0,
       "step": 1270
     },
     {
+      "entropy": 1.4956220388412476,
+      "epoch": 0.8,
+      "grad_norm": 1.0772604942321777,
+      "learning_rate": 0.00014670833333333333,
+      "loss": 1.48792724609375,
+      "mean_token_accuracy": 0.7135675251483917,
+      "num_tokens": 2054525.0,
       "step": 1280
     },
     {
+      "entropy": 1.4603404819965362,
+      "epoch": 0.80625,
+      "grad_norm": 0.9915527701377869,
+      "learning_rate": 0.00014629166666666667,
+      "loss": 1.4228525161743164,
+      "mean_token_accuracy": 0.7315677225589752,
+      "num_tokens": 2070908.0,
       "step": 1290
     },
     {
+      "entropy": 1.8602357029914856,
+      "epoch": 0.8125,
+      "grad_norm": 1.2213199138641357,
+      "learning_rate": 0.000145875,
+      "loss": 1.875438117980957,
+      "mean_token_accuracy": 0.6696613788604736,
+      "num_tokens": 2086420.0,
       "step": 1300
     },
     {
+      "entropy": 1.7318559408187866,
+      "epoch": 0.81875,
+      "grad_norm": 1.2372366189956665,
+      "learning_rate": 0.00014545833333333335,
+      "loss": 1.7164314270019532,
+      "mean_token_accuracy": 0.6757801532745361,
+      "num_tokens": 2103947.0,
       "step": 1310
     },
     {
+      "entropy": 1.3927726984024047,
+      "epoch": 0.825,
+      "grad_norm": 1.3297343254089355,
+      "learning_rate": 0.00014504166666666666,
+      "loss": 1.3864904403686524,
+      "mean_token_accuracy": 0.7442179620265961,
+      "num_tokens": 2118375.0,
       "step": 1320
     },
     {
+      "entropy": 1.8476340055465699,
+      "epoch": 0.83125,
+      "grad_norm": 1.2429879903793335,
+      "learning_rate": 0.000144625,
+      "loss": 1.870237159729004,
+      "mean_token_accuracy": 0.6771714389324188,
+      "num_tokens": 2133631.0,
       "step": 1330
     },
     {
+      "entropy": 1.5825651347637177,
+      "epoch": 0.8375,
+      "grad_norm": 1.1128071546554565,
+      "learning_rate": 0.00014420833333333335,
+      "loss": 1.5584844589233398,
+      "mean_token_accuracy": 0.718721890449524,
+      "num_tokens": 2149672.0,
       "step": 1340
     },
     {
+      "entropy": 1.4676709055900574,
+      "epoch": 0.84375,
+      "grad_norm": 1.029419183731079,
+      "learning_rate": 0.0001437916666666667,
+      "loss": 1.4486634254455566,
+      "mean_token_accuracy": 0.7196858763694763,
+      "num_tokens": 2165526.0,
       "step": 1350
     },
     {
+      "entropy": 1.6996529340744018,
+      "epoch": 0.85,
+      "grad_norm": 1.1256935596466064,
+      "learning_rate": 0.000143375,
+      "loss": 1.7186290740966796,
+      "mean_token_accuracy": 0.6910524368286133,
+      "num_tokens": 2181925.0,
       "step": 1360
     },
     {
+      "entropy": 1.8775145173072816,
+      "epoch": 0.85625,
+      "grad_norm": 1.0610681772232056,
+      "learning_rate": 0.00014295833333333334,
+      "loss": 1.8524488449096679,
+      "mean_token_accuracy": 0.6767737805843353,
+      "num_tokens": 2197351.0,
       "step": 1370
     },
     {
+      "entropy": 1.7408287942409515,
+      "epoch": 0.8625,
+      "grad_norm": 1.1001033782958984,
+      "learning_rate": 0.00014254166666666668,
+      "loss": 1.7132286071777343,
+      "mean_token_accuracy": 0.6875977098941803,
+      "num_tokens": 2213976.0,
       "step": 1380
     },
     {
+      "entropy": 1.609831404685974,
+      "epoch": 0.86875,
+      "grad_norm": 1.3175855875015259,
+      "learning_rate": 0.000142125,
+      "loss": 1.617106819152832,
+      "mean_token_accuracy": 0.7043311834335327,
+      "num_tokens": 2228967.0,
       "step": 1390
     },
     {
+      "entropy": 1.6383503794670105,
+      "epoch": 0.875,
+      "grad_norm": 1.304242730140686,
+      "learning_rate": 0.00014170833333333334,
+      "loss": 1.6476552963256836,
+      "mean_token_accuracy": 0.6986299633979798,
+      "num_tokens": 2244568.0,
       "step": 1400
     },
     {
+      "entropy": 1.765878963470459,
+      "epoch": 0.88125,
+      "grad_norm": 1.08024263381958,
+      "learning_rate": 0.00014129166666666665,
+      "loss": 1.743129348754883,
+      "mean_token_accuracy": 0.6806416690349579,
+      "num_tokens": 2260843.0,
       "step": 1410
     },
     {
+      "entropy": 1.7234230637550354,
+      "epoch": 0.8875,
+      "grad_norm": 1.1865103244781494,
+      "learning_rate": 0.000140875,
+      "loss": 1.728973960876465,
+      "mean_token_accuracy": 0.6861885011196136,
+      "num_tokens": 2276053.0,
       "step": 1420
     },
     {
+      "entropy": 1.3930821239948272,
+      "epoch": 0.89375,
+      "grad_norm": 1.0010002851486206,
+      "learning_rate": 0.00014045833333333334,
+      "loss": 1.3594303131103516,
+      "mean_token_accuracy": 0.7466361939907074,
+      "num_tokens": 2290658.0,
       "step": 1430
     },
     {
+      "entropy": 1.7306805908679963,
+      "epoch": 0.9,
+      "grad_norm": 0.9718702435493469,
+      "learning_rate": 0.00014004166666666668,
+      "loss": 1.7531225204467773,
+      "mean_token_accuracy": 0.6929883539676667,
+      "num_tokens": 2307306.0,
       "step": 1440
     },
     {
+      "entropy": 2.0208531498908995,
+      "epoch": 0.90625,
+      "grad_norm": 1.210390567779541,
+      "learning_rate": 0.00013962500000000002,
+      "loss": 2.0112279891967773,
+      "mean_token_accuracy": 0.6591072261333466,
+      "num_tokens": 2323106.0,
       "step": 1450
     },
     {
+      "entropy": 1.7247427701950073,
+      "epoch": 0.9125,
+      "grad_norm": 1.0104308128356934,
+      "learning_rate": 0.00013920833333333333,
+      "loss": 1.6930545806884765,
+      "mean_token_accuracy": 0.6927989542484283,
+      "num_tokens": 2339150.0,
       "step": 1460
     },
     {
+      "entropy": 1.5396546006202698,
+      "epoch": 0.91875,
+      "grad_norm": 1.180051326751709,
+      "learning_rate": 0.00013879166666666667,
+      "loss": 1.5373605728149413,
+      "mean_token_accuracy": 0.7118293285369873,
+      "num_tokens": 2355242.0,
       "step": 1470
     },
     {
+      "entropy": 1.4924741625785827,
+      "epoch": 0.925,
+      "grad_norm": 1.0538833141326904,
+      "learning_rate": 0.00013837500000000002,
+      "loss": 1.4466129302978517,
+      "mean_token_accuracy": 0.7262342572212219,
+      "num_tokens": 2371838.0,
       "step": 1480
     },
     {
+      "entropy": 1.6197248876094819,
+      "epoch": 0.93125,
+      "grad_norm": 1.2407019138336182,
+      "learning_rate": 0.00013795833333333336,
+      "loss": 1.6408515930175782,
+      "mean_token_accuracy": 0.6895669877529145,
+      "num_tokens": 2388383.0,
       "step": 1490
     },
     {
+      "entropy": 1.6017064571380615,
+      "epoch": 0.9375,
+      "grad_norm": 1.115491509437561,
+      "learning_rate": 0.00013754166666666667,
+      "loss": 1.6164506912231444,
+      "mean_token_accuracy": 0.7063500344753265,
+      "num_tokens": 2405920.0,
       "step": 1500
     },
     {
+      "entropy": 1.7128301978111267,
+      "epoch": 0.94375,
+      "grad_norm": 1.1029974222183228,
+      "learning_rate": 0.000137125,
+      "loss": 1.6670164108276366,
+      "mean_token_accuracy": 0.6883853197097778,
+      "num_tokens": 2423475.0,
       "step": 1510
     },
     {
+      "entropy": 1.7637011766433717,
+      "epoch": 0.95,
+      "grad_norm": 1.2063648700714111,
+      "learning_rate": 0.00013670833333333335,
+      "loss": 1.753184700012207,
+      "mean_token_accuracy": 0.697669267654419,
+      "num_tokens": 2438334.0,
       "step": 1520
     },
     {
+      "entropy": 1.4334100246429444,
+      "epoch": 0.95625,
+      "grad_norm": 1.211255669593811,
+      "learning_rate": 0.0001362916666666667,
+      "loss": 1.4158055305480957,
+      "mean_token_accuracy": 0.7352574229240417,
+      "num_tokens": 2455594.0,
       "step": 1530
     },
     {
+      "entropy": 1.8677887678146363,
+      "epoch": 0.9625,
+      "grad_norm": 1.433374047279358,
+      "learning_rate": 0.000135875,
+      "loss": 1.8924720764160157,
+      "mean_token_accuracy": 0.6550890862941742,
+      "num_tokens": 2473287.0,
       "step": 1540
     },
     {
+      "entropy": 1.7000919938087464,
+      "epoch": 0.96875,
+      "grad_norm": 1.1278074979782104,
+      "learning_rate": 0.00013545833333333332,
+      "loss": 1.6923490524291993,
+      "mean_token_accuracy": 0.6900858581066132,
+      "num_tokens": 2489522.0,
       "step": 1550
     },
     {
+      "entropy": 1.7960133492946624,
+      "epoch": 0.975,
+      "grad_norm": 1.2543061971664429,
+      "learning_rate": 0.00013504166666666666,
+      "loss": 1.7780380249023438,
+      "mean_token_accuracy": 0.6973862290382385,
+      "num_tokens": 2505959.0,
       "step": 1560
     },
     {
+      "entropy": 1.9542201280593872,
+      "epoch": 0.98125,
+      "grad_norm": 1.0181416273117065,
+      "learning_rate": 0.000134625,
+      "loss": 1.9185314178466797,
+      "mean_token_accuracy": 0.671462482213974,
+      "num_tokens": 2522120.0,
       "step": 1570
     },
     {
+      "entropy": 1.9292239546775818,
+      "epoch": 0.9875,
+      "grad_norm": 1.3379733562469482,
+      "learning_rate": 0.00013420833333333335,
+      "loss": 1.9222425460815429,
+      "mean_token_accuracy": 0.6739412903785705,
+      "num_tokens": 2537106.0,
       "step": 1580
     },
     {
+      "entropy": 1.447037798166275,
+      "epoch": 0.99375,
+      "grad_norm": 0.9749404788017273,
+      "learning_rate": 0.00013379166666666666,
+      "loss": 1.4738496780395507,
+      "mean_token_accuracy": 0.7318728864192963,
+      "num_tokens": 2551675.0,
       "step": 1590
     },
     {
+      "entropy": 1.5189184904098512,
+      "epoch": 1.0,
+      "grad_norm": 1.0811270475387573,
+      "learning_rate": 0.000133375,
+      "loss": 1.4607027053833008,
+      "mean_token_accuracy": 0.7251661479473114,
+      "num_tokens": 2566677.0,
       "step": 1600
     }
   ],
   "logging_steps": 10,
+  "max_steps": 4800,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 3,
   "save_steps": 500,
       "attributes": {}
     }
   },
+  "total_flos": 2.0115725627418624e+16,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

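The logged `learning_rate` and `epoch` values in this `log_history` follow a regular pattern, and a short sketch can reproduce them. The base rate of 2e-4, the absence of warmup, and the 1600 steps per epoch are assumptions inferred from the logged numbers (for example, epoch 1.0 at step 1600 and `max_steps` of 4800); none of them is stated in the diff itself.

```python
# Assumed values, inferred from the logged entries rather than stated in the diff.
BASE_LR = 2e-4          # assumption: initial learning rate
MAX_STEPS = 4800        # "max_steps" in trainer_state.json
STEPS_PER_EPOCH = 1600  # inferred: epoch 1.0 is logged at step 1600


def logged_lr(step: int) -> float:
    """Linear decay to zero; the logged value matches the rate before this step's update."""
    return BASE_LR * (MAX_STEPS - (step - 1)) / MAX_STEPS


def logged_epoch(step: int) -> float:
    """Fractional epoch reached after the given optimizer step."""
    return step / STEPS_PER_EPOCH


# Compare against two entries from the log above.
lr_1160 = logged_lr(1160)   # logged as 0.00015170833333333334
lr_1600 = logged_lr(1600)   # logged as 0.000133375
```

If the run had used warmup or a non-linear scheduler, the logged values would not fit this formula, so the match is evidence for the assumptions and not proof of the actual training arguments.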
adapters_backup/checkpoint-1600/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9f3d474fca8712f4970235089141cc3151ec0251001f0277101040ba3e632c1d
-size 5585

 version https://git-lfs.github.com/spec/v1
+oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
+size 5713
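
The binary files in this commit appear only as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. As a minimal sketch, a pointer can be parsed with the standard library alone; the function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# The new pointer for training_args.bin, copied from the hunk above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
size 5713
"""
info = parse_lfs_pointer(pointer)
```

Comparing `info["size"]` and `info["digest"]` between the old and new pointers is enough to tell that the underlying file changed, without downloading it.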

adapters_backup/checkpoint-3200/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: LiquidAI/LFM2.5-1.2B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:LiquidAI/LFM2.5-1.2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

adapters_backup/checkpoint-3200/adapter_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "LiquidAI/LFM2.5-1.2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "w1",
+    "out_proj",
+    "w3",
+    "w2",
+    "v_proj",
+    "in_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
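
For reading this config, the most consequential fields are `r`, `lora_alpha`, and `use_rslora`, which together set how strongly the low-rank update is scaled. The sketch below assumes the standard LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled); the helper `lora_scaling` is a hypothetical name, and only a subset of the config is reproduced.

```python
import json
import math

# Subset of adapter_config.json, copied from the diff above.
config_text = """
{
  "lora_alpha": 16,
  "lora_dropout": 0.1,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["w1", "out_proj", "w3", "w2",
                     "v_proj", "in_proj", "q_proj", "k_proj"]
}
"""
cfg = json.loads(config_text)


def lora_scaling(cfg: dict) -> float:
    """Scaling applied to the low-rank update B @ A under the usual LoRA convention."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(cfg)
num_targets = len(cfg["target_modules"])
```

With `lora_alpha` 16 and `r` 8 the update is scaled by 2, and the adapter is attached to eight module types.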

adapters_backup/checkpoint-3200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bc3d8f22c6b55d11ce402d9ec50dbec966734797594e1f719ea71216e3f5fbd4
+size 22240880

adapters_backup/checkpoint-3200/chat_template.jinja ADDED

	@@ -0,0 +1,45 @@

+{{- bos_token -}}
+{%- set keep_past_thinking = keep_past_thinking | default(false) -%}
+{%- set ns = namespace(system_prompt="") -%}
+{%- if messages[0]["role"] == "system" -%}
+    {%- set ns.system_prompt = messages[0]["content"] -%}
+    {%- set messages = messages[1:] -%}
+{%- endif -%}
+{%- if tools -%}
+    {%- set ns.system_prompt = ns.system_prompt + ("\n" if ns.system_prompt else "") + "List of tools: [" -%}
+    {%- for tool in tools -%}
+        {%- if tool is not string -%}
+            {%- set tool = tool | tojson -%}
+        {%- endif -%}
+        {%- set ns.system_prompt = ns.system_prompt + tool -%}
+        {%- if not loop.last -%}
+            {%- set ns.system_prompt = ns.system_prompt + ", " -%}
+        {%- endif -%}
+    {%- endfor -%}
+    {%- set ns.system_prompt = ns.system_prompt + "]" -%}
+{%- endif -%}
+{%- if ns.system_prompt -%}
+    {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
+{%- endif -%}
+{%- set ns.last_assistant_index = -1 -%}
+{%- for message in messages -%}
+    {%- if message["role"] == "assistant" -%}
+        {%- set ns.last_assistant_index = loop.index0 -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- for message in messages -%}
+    {{- "<|im_start|>" + message["role"] + "\n" -}}
+    {%- set content = message["content"] -%}
+    {%- if content is not string -%}
+        {%- set content = content | tojson -%}
+    {%- endif -%}
+    {%- if message["role"] == "assistant" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}
+        {%- if "</think>" in content -%}
+            {%- set content = content.split("</think>")[-1] | trim -%}
+        {%- endif -%}
+    {%- endif -%}
+    {{- content + "<|im_end|>\n" -}}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{- "<|im_start|>assistant\n" -}}
+{%- endif -%}
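
To make the template's behavior easier to follow, here is a plain-Python sketch of the same rendering logic. It is an approximation and not the code any library runs: `render_chat` is a hypothetical name, the default `bos_token` is taken from `tokenizer_config.json` in this commit, and non-string content is serialized with `json.dumps`, whose escaping may differ from the `tojson` filter.

```python
import json


def render_chat(messages, bos_token="<|startoftext|>", tools=None,
                add_generation_prompt=False, keep_past_thinking=False):
    """Approximate the chat template: ChatML-style turns after the BOS token."""
    system_prompt = ""
    if messages and messages[0]["role"] == "system":
        system_prompt = messages[0]["content"]
        messages = messages[1:]
    if tools:
        # Tool definitions are appended to the system prompt as a bracketed list.
        rendered = [t if isinstance(t, str) else json.dumps(t) for t in tools]
        system_prompt += ("\n" if system_prompt else "")
        system_prompt += "List of tools: [" + ", ".join(rendered) + "]"
    out = bos_token
    if system_prompt:
        out += "<|im_start|>system\n" + system_prompt + "<|im_end|>\n"
    # Index of the final assistant turn; only that turn keeps its reasoning.
    last_assistant = -1
    for i, message in enumerate(messages):
        if message["role"] == "assistant":
            last_assistant = i
    for i, message in enumerate(messages):
        content = message["content"]
        if not isinstance(content, str):
            content = json.dumps(content)
        is_past_assistant = message["role"] == "assistant" and i != last_assistant
        if is_past_assistant and not keep_past_thinking and "</think>" in content:
            content = content.split("</think>")[-1].strip()
        out += "<|im_start|>" + message["role"] + "\n" + content + "<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out


example = render_chat([
    {"role": "system", "content": "S"},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<think>x</think> a1"},
    {"role": "user", "content": "again"},
], add_generation_prompt=True)
```

The key design point is that reasoning wrapped in `</think>` is dropped from every assistant turn except the last one, unless `keep_past_thinking` is set.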

adapters_backup/checkpoint-3200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5853997b5ed6222610c8e1d9535629628693c5df15b5039847703714e52f35c6
+size 44583435

adapters_backup/checkpoint-3200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3a77d4a8b98ce027a4d6a3b9fb5d7c904e27ec1efd5c0468c24fa26bb738316
+size 14455

adapters_backup/checkpoint-3200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5620a37e2be18cb5e5fff6b7cb9e0fdabc43ac0425bf621bf3160c261dc50fbc
+size 1465

adapters_backup/checkpoint-3200/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

adapters_backup/checkpoint-3200/tokenizer_config.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "is_local": false,
+  "legacy": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|pad|>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "use_default_system_prompt": false,
+  "use_fast": true
+}
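
The `model_max_length` value looks arbitrary but equals `int(1e30)`, which appears to be a "no limit configured" sentinel and not a real context length. A minimal sketch of treating it that way follows; `effective_max_length` and the fallback of 4096 are hypothetical and should be replaced with the limit documented for the base model.

```python
MODEL_MAX_LENGTH = 1000000000000000019884624838656  # from tokenizer_config.json
SENTINEL = int(1e30)  # a float 1e30 converted to int gives exactly this value


def effective_max_length(configured: int, fallback: int) -> int:
    """Use the configured length unless it is the 'unset' sentinel."""
    return fallback if configured >= SENTINEL else configured


is_sentinel = MODEL_MAX_LENGTH == SENTINEL
# The fallback here is a placeholder assumption, not the model's real limit.
max_len = effective_max_length(MODEL_MAX_LENGTH, fallback=4096)
```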

adapters_backup/checkpoint-3200/trainer_state.json ADDED

	@@ -0,0 +1,3234 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 3200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 2.8972578048706055,
+      "epoch": 0.00625,
+      "grad_norm": 1.416805624961853,
+      "learning_rate": 0.00019962500000000001,
+      "loss": 3.8105133056640623,
+      "mean_token_accuracy": 0.4103764593601227,
+      "num_tokens": 17074.0,
+      "step": 10
+    },
+    {
+      "entropy": 2.769114351272583,
+      "epoch": 0.0125,
+      "grad_norm": 1.159595012664795,
+      "learning_rate": 0.00019920833333333336,
+      "loss": 2.690728759765625,
+      "mean_token_accuracy": 0.5351322680711746,
+      "num_tokens": 33777.0,
+      "step": 20
+    },
+    {
+      "entropy": 2.3261287093162535,
+      "epoch": 0.01875,
+      "grad_norm": 1.3773282766342163,
+      "learning_rate": 0.0001987916666666667,
+      "loss": 2.328271675109863,
+      "mean_token_accuracy": 0.5926208615303039,
+      "num_tokens": 49315.0,
+      "step": 30
+    },
+    {
+      "entropy": 2.3075815558433534,
+      "epoch": 0.025,
+      "grad_norm": 1.0916646718978882,
+      "learning_rate": 0.000198375,
+      "loss": 2.1861215591430665,
+      "mean_token_accuracy": 0.6114547044038773,
+      "num_tokens": 65083.0,
+      "step": 40
+    },
+    {
+      "entropy": 1.9178041577339173,
+      "epoch": 0.03125,
+      "grad_norm": 0.9288890361785889,
+      "learning_rate": 0.00019795833333333332,
+      "loss": 1.95428466796875,
+      "mean_token_accuracy": 0.645366108417511,
+      "num_tokens": 81240.0,
+      "step": 50
+    },
+    {
+      "entropy": 2.342257523536682,
+      "epoch": 0.0375,
+      "grad_norm": 1.0486043691635132,
+      "learning_rate": 0.00019754166666666667,
+      "loss": 2.3062065124511717,
+      "mean_token_accuracy": 0.6025018393993378,
+      "num_tokens": 97110.0,
+      "step": 60
+    },
+    {
+      "entropy": 1.842692232131958,
+      "epoch": 0.04375,
+      "grad_norm": 1.1565988063812256,
+      "learning_rate": 0.000197125,
+      "loss": 1.848040771484375,
+      "mean_token_accuracy": 0.649482148885727,
+      "num_tokens": 113661.0,
+      "step": 70
+    },
+    {
+      "entropy": 2.015536868572235,
+      "epoch": 0.05,
+      "grad_norm": 1.036302089691162,
+      "learning_rate": 0.00019670833333333335,
+      "loss": 2.023266410827637,
+      "mean_token_accuracy": 0.6400867640972138,
+      "num_tokens": 129571.0,
+      "step": 80
+    },
+    {
+      "entropy": 2.291021800041199,
+      "epoch": 0.05625,
+      "grad_norm": 1.1765780448913574,
+      "learning_rate": 0.00019629166666666666,
+      "loss": 2.2915937423706056,
+      "mean_token_accuracy": 0.6016066193580627,
+      "num_tokens": 145845.0,
+      "step": 90
+    },
+    {
+      "entropy": 1.9315234899520874,
+      "epoch": 0.0625,
+      "grad_norm": 1.1040469408035278,
+      "learning_rate": 0.000195875,
+      "loss": 1.8839471817016602,
+      "mean_token_accuracy": 0.656380432844162,
+      "num_tokens": 162128.0,
+      "step": 100
+    },
+    {
+      "entropy": 1.864959979057312,
+      "epoch": 0.06875,
+      "grad_norm": 1.0841010808944702,
+      "learning_rate": 0.00019545833333333335,
+      "loss": 1.855326271057129,
+      "mean_token_accuracy": 0.6628630757331848,
+      "num_tokens": 178343.0,
+      "step": 110
+    },
+    {
+      "entropy": 1.9021487474441527,
+      "epoch": 0.075,
+      "grad_norm": 1.0495465993881226,
+      "learning_rate": 0.0001950416666666667,
+      "loss": 1.8911224365234376,
+      "mean_token_accuracy": 0.6627636551856995,
+      "num_tokens": 194216.0,
+      "step": 120
+    },
+    {
+      "entropy": 2.0799292087554933,
+      "epoch": 0.08125,
+      "grad_norm": 1.4638044834136963,
+      "learning_rate": 0.000194625,
+      "loss": 2.0677186965942385,
+      "mean_token_accuracy": 0.6408409655094147,
+      "num_tokens": 209861.0,
+      "step": 130
+    },
+    {
+      "entropy": 2.0656333684921266,
+      "epoch": 0.0875,
+      "grad_norm": 1.2326873540878296,
+      "learning_rate": 0.00019420833333333334,
+      "loss": 2.0436325073242188,
+      "mean_token_accuracy": 0.647420459985733,
+      "num_tokens": 225951.0,
+      "step": 140
+    },
+    {
+      "entropy": 2.151374113559723,
+      "epoch": 0.09375,
+      "grad_norm": 1.209037184715271,
+      "learning_rate": 0.00019379166666666668,
+      "loss": 2.1708988189697265,
+      "mean_token_accuracy": 0.6336644470691681,
+      "num_tokens": 241973.0,
+      "step": 150
+    },
+    {
+      "entropy": 1.9679807424545288,
+      "epoch": 0.1,
+      "grad_norm": 1.0798423290252686,
+      "learning_rate": 0.00019337500000000002,
+      "loss": 1.9049331665039062,
+      "mean_token_accuracy": 0.6611163139343261,
+      "num_tokens": 257148.0,
+      "step": 160
+    },
+    {
+      "entropy": 1.9416646242141724,
+      "epoch": 0.10625,
+      "grad_norm": 0.9878492951393127,
+      "learning_rate": 0.00019295833333333334,
+      "loss": 1.960176658630371,
+      "mean_token_accuracy": 0.6551605999469757,
+      "num_tokens": 273456.0,
+      "step": 170
+    },
+    {
+      "entropy": 1.7779759645462037,
+      "epoch": 0.1125,
+      "grad_norm": 1.074549674987793,
+      "learning_rate": 0.00019254166666666668,
+      "loss": 1.7707120895385742,
+      "mean_token_accuracy": 0.6601927995681762,
+      "num_tokens": 290923.0,
+      "step": 180
+    },
+    {
+      "entropy": 2.111535668373108,
+      "epoch": 0.11875,
+      "grad_norm": 1.4603313207626343,
+      "learning_rate": 0.000192125,
+      "loss": 2.09625358581543,
+      "mean_token_accuracy": 0.6402183502912522,
+      "num_tokens": 307822.0,
+      "step": 190
+    },
+    {
+      "entropy": 2.077592122554779,
+      "epoch": 0.125,
+      "grad_norm": 1.1337363719940186,
+      "learning_rate": 0.00019170833333333334,
+      "loss": 2.084154510498047,
+      "mean_token_accuracy": 0.641227388381958,
+      "num_tokens": 324142.0,
+      "step": 200
+    },
+    {
+      "entropy": 1.8829279899597169,
+      "epoch": 0.13125,
+      "grad_norm": 1.0533121824264526,
+      "learning_rate": 0.00019129166666666668,
+      "loss": 1.83758544921875,
+      "mean_token_accuracy": 0.6638262569904327,
+      "num_tokens": 341542.0,
+      "step": 210
+    },
+    {
+      "entropy": 1.649771249294281,
+      "epoch": 0.1375,
+      "grad_norm": 1.2242692708969116,
+      "learning_rate": 0.000190875,
+      "loss": 1.6660097122192383,
+      "mean_token_accuracy": 0.696986198425293,
+      "num_tokens": 356591.0,
+      "step": 220
+    },
+    {
+      "entropy": 1.6322881817817687,
+      "epoch": 0.14375,
+      "grad_norm": 1.318080186843872,
+      "learning_rate": 0.00019045833333333333,
+      "loss": 1.6340875625610352,
+      "mean_token_accuracy": 0.7008972883224487,
+      "num_tokens": 371764.0,
+      "step": 230
+    },
+    {
+      "entropy": 1.7258678793907165,
+      "epoch": 0.15,
+      "grad_norm": 1.1507346630096436,
+      "learning_rate": 0.00019004166666666667,
+      "loss": 1.7209365844726563,
+      "mean_token_accuracy": 0.6634244680404663,
+      "num_tokens": 390139.0,
+      "step": 240
+    },
+    {
+      "entropy": 2.0835100650787353,
+      "epoch": 0.15625,
+      "grad_norm": 1.1298671960830688,
+      "learning_rate": 0.00018962500000000001,
+      "loss": 2.0685489654541014,
+      "mean_token_accuracy": 0.6533876061439514,
+      "num_tokens": 404727.0,
+      "step": 250
+    },
+    {
+      "entropy": 1.8807834386825562,
+      "epoch": 0.1625,
+      "grad_norm": 1.4069880247116089,
+      "learning_rate": 0.00018920833333333336,
+      "loss": 1.8705434799194336,
+      "mean_token_accuracy": 0.6535706460475922,
+      "num_tokens": 421923.0,
+      "step": 260
+    },
+    {
+      "entropy": 1.6720293521881104,
+      "epoch": 0.16875,
+      "grad_norm": 1.2488282918930054,
+      "learning_rate": 0.00018879166666666667,
+      "loss": 1.647348976135254,
+      "mean_token_accuracy": 0.6866099178791046,
+      "num_tokens": 439038.0,
+      "step": 270
+    },
+    {
+      "entropy": 1.930847203731537,
+      "epoch": 0.175,
+      "grad_norm": 1.0187071561813354,
+      "learning_rate": 0.000188375,
+      "loss": 1.9441492080688476,
+      "mean_token_accuracy": 0.6643437385559082,
+      "num_tokens": 455019.0,
+      "step": 280
+    },
+    {
+      "entropy": 1.783823847770691,
+      "epoch": 0.18125,
+      "grad_norm": 0.991218090057373,
+      "learning_rate": 0.00018795833333333335,
+      "loss": 1.766385841369629,
+      "mean_token_accuracy": 0.6792463660240173,
+      "num_tokens": 470470.0,
+      "step": 290
+    },
+    {
+      "entropy": 1.6973824977874756,
+      "epoch": 0.1875,
+      "grad_norm": 1.1331487894058228,
+      "learning_rate": 0.0001875416666666667,
+      "loss": 1.6720619201660156,
+      "mean_token_accuracy": 0.690255868434906,
+      "num_tokens": 486512.0,
+      "step": 300
+    },
+    {
+      "entropy": 1.881280207633972,
+      "epoch": 0.19375,
+      "grad_norm": 1.0860546827316284,
+      "learning_rate": 0.000187125,
+      "loss": 1.8710559844970702,
+      "mean_token_accuracy": 0.6732289731502533,
+      "num_tokens": 501664.0,
+      "step": 310
+    },
+    {
+      "entropy": 1.928344440460205,
+      "epoch": 0.2,
+      "grad_norm": 1.0820534229278564,
+      "learning_rate": 0.00018670833333333335,
+      "loss": 1.94879093170166,
+      "mean_token_accuracy": 0.6571235120296478,
+      "num_tokens": 516500.0,
+      "step": 320
+    },
+    {
+      "entropy": 1.780434775352478,
+      "epoch": 0.20625,
+      "grad_norm": 1.149436116218567,
+      "learning_rate": 0.0001862916666666667,
+      "loss": 1.739248275756836,
+      "mean_token_accuracy": 0.6907038509845733,
+      "num_tokens": 531623.0,
+      "step": 330
+    },
+    {
+      "entropy": 1.835638737678528,
+      "epoch": 0.2125,
+      "grad_norm": 1.217748999595642,
+      "learning_rate": 0.000185875,
+      "loss": 1.837971305847168,
+      "mean_token_accuracy": 0.6789492428302765,
+      "num_tokens": 547248.0,
+      "step": 340
+    },
+    {
+      "entropy": 1.529280412197113,
+      "epoch": 0.21875,
+      "grad_norm": 1.1209408044815063,
+      "learning_rate": 0.00018545833333333335,
+      "loss": 1.5159669876098634,
+      "mean_token_accuracy": 0.7098696529865265,
+      "num_tokens": 562556.0,
+      "step": 350
+    },
+    {
+      "entropy": 1.9280451774597167,
+      "epoch": 0.225,
+      "grad_norm": 1.0258183479309082,
+      "learning_rate": 0.00018504166666666666,
+      "loss": 1.9479742050170898,
+      "mean_token_accuracy": 0.6640809357166291,
+      "num_tokens": 578023.0,
+      "step": 360
+    },
+    {
+      "entropy": 1.8790152072906494,
+      "epoch": 0.23125,
+      "grad_norm": 1.157669186592102,
+      "learning_rate": 0.000184625,
+      "loss": 1.847334861755371,
+      "mean_token_accuracy": 0.6603596329689025,
+      "num_tokens": 594019.0,
+      "step": 370
+    },
+    {
+      "entropy": 1.8294876575469972,
+      "epoch": 0.2375,
+      "grad_norm": 1.0211504697799683,
+      "learning_rate": 0.00018420833333333334,
+      "loss": 1.8582696914672852,
+      "mean_token_accuracy": 0.6679854333400727,
+      "num_tokens": 609499.0,
+      "step": 380
+    },
+    {
+      "entropy": 1.8593019366264343,
+      "epoch": 0.24375,
+      "grad_norm": 1.2300069332122803,
+      "learning_rate": 0.00018379166666666668,
+      "loss": 1.8436058044433594,
+      "mean_token_accuracy": 0.6740959763526917,
+      "num_tokens": 624831.0,
+      "step": 390
+    },
+    {
+      "entropy": 1.6092237114906311,
+      "epoch": 0.25,
+      "grad_norm": 1.2899959087371826,
+      "learning_rate": 0.000183375,
+      "loss": 1.5911931991577148,
+      "mean_token_accuracy": 0.7107231378555298,
+      "num_tokens": 640781.0,
+      "step": 400
+    },
+    {
+      "entropy": 2.147260272502899,
+      "epoch": 0.25625,
+      "grad_norm": 1.28315007686615,
+      "learning_rate": 0.00018295833333333334,
+      "loss": 2.1315792083740233,
+      "mean_token_accuracy": 0.6412826657295227,
+      "num_tokens": 656795.0,
+      "step": 410
+    },
+    {
+      "entropy": 1.8276140928268432,
+      "epoch": 0.2625,
+      "grad_norm": 0.9926204681396484,
+      "learning_rate": 0.00018254166666666668,
+      "loss": 1.7912399291992187,
+      "mean_token_accuracy": 0.6752909004688263,
+      "num_tokens": 673839.0,
+      "step": 420
+    },
+    {
+      "entropy": 1.725200641155243,
+      "epoch": 0.26875,
+      "grad_norm": 0.9599955677986145,
+      "learning_rate": 0.00018212500000000002,
+      "loss": 1.6968486785888672,
+      "mean_token_accuracy": 0.6876484453678131,
+      "num_tokens": 691102.0,
+      "step": 430
+    },
+    {
+      "entropy": 1.49821537733078,
+      "epoch": 0.275,
+      "grad_norm": 1.1128442287445068,
+      "learning_rate": 0.00018170833333333334,
+      "loss": 1.4911989212036132,
+      "mean_token_accuracy": 0.7070409774780273,
+      "num_tokens": 707939.0,
+      "step": 440
+    },
+    {
+      "entropy": 2.0437518835067747,
+      "epoch": 0.28125,
+      "grad_norm": 1.1485779285430908,
+      "learning_rate": 0.00018129166666666668,
+      "loss": 2.0552061080932615,
+      "mean_token_accuracy": 0.6452532887458802,
+      "num_tokens": 724384.0,
+      "step": 450
+    },
+    {
+      "entropy": 1.9125534653663636,
+      "epoch": 0.2875,
+      "grad_norm": 1.3141529560089111,
+      "learning_rate": 0.00018087500000000002,
+      "loss": 1.8738250732421875,
+      "mean_token_accuracy": 0.6706897974014282,
+      "num_tokens": 739865.0,
+      "step": 460
+    },
+    {
+      "entropy": 1.9561587691307067,
+      "epoch": 0.29375,
+      "grad_norm": 1.0918525457382202,
+      "learning_rate": 0.00018045833333333336,
+      "loss": 1.938099479675293,
+      "mean_token_accuracy": 0.6760513365268708,
+      "num_tokens": 755491.0,
+      "step": 470
+    },
+    {
+      "entropy": 1.6972344875335694,
+      "epoch": 0.3,
+      "grad_norm": 1.183408260345459,
+      "learning_rate": 0.00018004166666666667,
+      "loss": 1.6730932235717773,
+      "mean_token_accuracy": 0.6902998864650727,
+      "num_tokens": 771754.0,
+      "step": 480
+    },
+    {
+      "entropy": 1.6555222153663636,
+      "epoch": 0.30625,
+      "grad_norm": 1.2446097135543823,
+      "learning_rate": 0.000179625,
+      "loss": 1.644314956665039,
+      "mean_token_accuracy": 0.7027111053466797,
+      "num_tokens": 787882.0,
+      "step": 490
+    },
+    {
+      "entropy": 1.6912259459495544,
+      "epoch": 0.3125,
+      "grad_norm": 1.0987075567245483,
+      "learning_rate": 0.00017920833333333333,
+      "loss": 1.6494056701660156,
+      "mean_token_accuracy": 0.6928456544876098,
+      "num_tokens": 804532.0,
+      "step": 500
+    },
+    {
+      "entropy": 1.8515005946159362,
+      "epoch": 0.31875,
+      "grad_norm": 1.1869553327560425,
+      "learning_rate": 0.00017879166666666667,
+      "loss": 1.856374740600586,
+      "mean_token_accuracy": 0.6716830492019653,
+      "num_tokens": 819940.0,
+      "step": 510
+    },
+    {
+      "entropy": 1.696764051914215,
+      "epoch": 0.325,
+      "grad_norm": 1.1994718313217163,
+      "learning_rate": 0.000178375,
+      "loss": 1.6898420333862305,
+      "mean_token_accuracy": 0.6878461837768555,
+      "num_tokens": 835747.0,
+      "step": 520
+    },
+    {
+      "entropy": 1.9474074840545654,
+      "epoch": 0.33125,
+      "grad_norm": 1.0442698001861572,
+      "learning_rate": 0.00017795833333333333,
+      "loss": 1.948105812072754,
+      "mean_token_accuracy": 0.671975576877594,
+      "num_tokens": 850222.0,
+      "step": 530
+    },
+    {
+      "entropy": 1.5088442265987396,
+      "epoch": 0.3375,
+      "grad_norm": 1.0030030012130737,
+      "learning_rate": 0.00017754166666666667,
+      "loss": 1.4812466621398925,
+      "mean_token_accuracy": 0.7255652785301209,
+      "num_tokens": 866098.0,
+      "step": 540
+    },
+    {
+      "entropy": 1.4793359756469726,
+      "epoch": 0.34375,
+      "grad_norm": 1.1266038417816162,
+      "learning_rate": 0.000177125,
+      "loss": 1.483462142944336,
+      "mean_token_accuracy": 0.7108414351940155,
+      "num_tokens": 883108.0,
+      "step": 550
+    },
+    {
+      "entropy": 1.609874677658081,
+      "epoch": 0.35,
+      "grad_norm": 1.003450632095337,
+      "learning_rate": 0.00017670833333333335,
+      "loss": 1.6068243026733398,
+      "mean_token_accuracy": 0.6996320366859436,
+      "num_tokens": 898865.0,
+      "step": 560
+    },
+    {
+      "entropy": 1.773156213760376,
+      "epoch": 0.35625,
+      "grad_norm": 2.341601848602295,
+      "learning_rate": 0.00017629166666666666,
+      "loss": 1.7459211349487305,
+      "mean_token_accuracy": 0.6891302824020386,
+      "num_tokens": 914296.0,
+      "step": 570
+    },
+    {
+      "entropy": 1.7185376048088075,
+      "epoch": 0.3625,
+      "grad_norm": 1.1557060480117798,
+      "learning_rate": 0.000175875,
+      "loss": 1.6925424575805663,
+      "mean_token_accuracy": 0.6805954694747924,
+      "num_tokens": 932234.0,
+      "step": 580
+    },
+    {
+      "entropy": 1.8280374526977539,
+      "epoch": 0.36875,
+      "grad_norm": 1.1782957315444946,
+      "learning_rate": 0.00017545833333333335,
+      "loss": 1.8421060562133789,
+      "mean_token_accuracy": 0.6776642084121705,
+      "num_tokens": 948747.0,
+      "step": 590
+    },
+    {
+      "entropy": 1.8082952618598938,
+      "epoch": 0.375,
+      "grad_norm": 0.9948606491088867,
+      "learning_rate": 0.0001750416666666667,
+      "loss": 1.783558464050293,
+      "mean_token_accuracy": 0.6757851302623749,
+      "num_tokens": 964288.0,
+      "step": 600
+    },
+    {
+      "entropy": 1.760896122455597,
+      "epoch": 0.38125,
+      "grad_norm": 17.713958740234375,
+      "learning_rate": 0.00017462500000000003,
+      "loss": 1.7631986618041993,
+      "mean_token_accuracy": 0.6811207413673401,
+      "num_tokens": 980203.0,
+      "step": 610
+    },
+    {
+      "entropy": 1.9898195564746857,
+      "epoch": 0.3875,
+      "grad_norm": 1.0574253797531128,
+      "learning_rate": 0.00017420833333333334,
+      "loss": 1.9516635894775392,
+      "mean_token_accuracy": 0.6489899933338166,
+      "num_tokens": 996871.0,
+      "step": 620
+    },
+    {
+      "entropy": 1.7820778012275695,
+      "epoch": 0.39375,
+      "grad_norm": 1.0086643695831299,
+      "learning_rate": 0.00017379166666666669,
+      "loss": 1.8043378829956054,
+      "mean_token_accuracy": 0.6813792884349823,
+      "num_tokens": 1012770.0,
+      "step": 630
+    },
+    {
+      "entropy": 1.8386994361877442,
+      "epoch": 0.4,
+      "grad_norm": 1.2745709419250488,
+      "learning_rate": 0.000173375,
+      "loss": 1.8168407440185548,
+      "mean_token_accuracy": 0.6552604496479034,
+      "num_tokens": 1030031.0,
+      "step": 640
+    },
+    {
+      "entropy": 1.6865394830703735,
+      "epoch": 0.40625,
+      "grad_norm": 1.3551218509674072,
+      "learning_rate": 0.00017295833333333334,
+      "loss": 1.6793342590332032,
+      "mean_token_accuracy": 0.6937127232551574,
+      "num_tokens": 1044365.0,
+      "step": 650
+    },
+    {
+      "entropy": 1.69602689743042,
+      "epoch": 0.4125,
+      "grad_norm": 1.1780422925949097,
+      "learning_rate": 0.00017254166666666665,
+      "loss": 1.6850801467895509,
+      "mean_token_accuracy": 0.7048744976520538,
+      "num_tokens": 1059256.0,
+      "step": 660
+    },
+    {
+      "entropy": 1.8743945717811585,
+      "epoch": 0.41875,
+      "grad_norm": 1.2194169759750366,
+      "learning_rate": 0.000172125,
+      "loss": 1.8435325622558594,
+      "mean_token_accuracy": 0.6657077252864838,
+      "num_tokens": 1074881.0,
+      "step": 670
+    },
+    {
+      "entropy": 1.638406789302826,
+      "epoch": 0.425,
+      "grad_norm": 1.2872169017791748,
+      "learning_rate": 0.00017170833333333334,
+      "loss": 1.6532812118530273,
+      "mean_token_accuracy": 0.696779602766037,
+      "num_tokens": 1091137.0,
+      "step": 680
+    },
+    {
+      "entropy": 1.8440260648727418,
+      "epoch": 0.43125,
+      "grad_norm": 1.3588929176330566,
+      "learning_rate": 0.00017129166666666668,
+      "loss": 1.840639877319336,
+      "mean_token_accuracy": 0.6729660153388977,
+      "num_tokens": 1107054.0,
+      "step": 690
+    },
+    {
+      "entropy": 1.5835177421569824,
+      "epoch": 0.4375,
+      "grad_norm": 0.9857878684997559,
+      "learning_rate": 0.00017087500000000002,
+      "loss": 1.5488386154174805,
+      "mean_token_accuracy": 0.724124139547348,
+      "num_tokens": 1121191.0,
+      "step": 700
+    },
+    {
+      "entropy": 1.729893934726715,
+      "epoch": 0.44375,
+      "grad_norm": 1.2562510967254639,
+      "learning_rate": 0.00017045833333333333,
+      "loss": 1.7510330200195312,
+      "mean_token_accuracy": 0.6822909355163574,
+      "num_tokens": 1137417.0,
+      "step": 710
+    },
+    {
+      "entropy": 1.8747714400291442,
+      "epoch": 0.45,
+      "grad_norm": 1.0315498113632202,
+      "learning_rate": 0.00017004166666666668,
+      "loss": 1.8536712646484375,
+      "mean_token_accuracy": 0.668778932094574,
+      "num_tokens": 1153502.0,
+      "step": 720
+    },
+    {
+      "entropy": 1.5935072481632233,
+      "epoch": 0.45625,
+      "grad_norm": 1.1812435388565063,
+      "learning_rate": 0.00016962500000000002,
+      "loss": 1.566417121887207,
+      "mean_token_accuracy": 0.7045138716697693,
+      "num_tokens": 1168537.0,
+      "step": 730
+    },
+    {
+      "entropy": 1.8550025582313538,
+      "epoch": 0.4625,
+      "grad_norm": 0.956068217754364,
+      "learning_rate": 0.00016920833333333336,
+      "loss": 1.854224395751953,
+      "mean_token_accuracy": 0.6738598048686981,
+      "num_tokens": 1183781.0,
+      "step": 740
+    },
+    {
+      "entropy": 2.065062153339386,
+      "epoch": 0.46875,
+      "grad_norm": 1.1881858110427856,
+      "learning_rate": 0.00016879166666666667,
+      "loss": 2.0420166015625,
+      "mean_token_accuracy": 0.6490989983081817,
+      "num_tokens": 1201200.0,
+      "step": 750
+    },
+    {
+      "entropy": 1.6268154442310334,
+      "epoch": 0.475,
+      "grad_norm": 1.0978918075561523,
+      "learning_rate": 0.000168375,
+      "loss": 1.6155092239379882,
+      "mean_token_accuracy": 0.6949241161346436,
+      "num_tokens": 1217619.0,
+      "step": 760
+    },
+    {
+      "entropy": 1.7807599782943726,
+      "epoch": 0.48125,
+      "grad_norm": 1.115274429321289,
+      "learning_rate": 0.00016795833333333335,
+      "loss": 1.7416255950927735,
+      "mean_token_accuracy": 0.6845939517021179,
+      "num_tokens": 1234024.0,
+      "step": 770
+    },
+    {
+      "entropy": 1.6363184571266174,
+      "epoch": 0.4875,
+      "grad_norm": 1.0698058605194092,
+      "learning_rate": 0.0001675416666666667,
+      "loss": 1.658616065979004,
+      "mean_token_accuracy": 0.6895378947257995,
+      "num_tokens": 1249959.0,
+      "step": 780
+    },
+    {
+      "entropy": 1.7100866436958313,
+      "epoch": 0.49375,
+      "grad_norm": 1.5094223022460938,
+      "learning_rate": 0.000167125,
+      "loss": 1.6892465591430663,
+      "mean_token_accuracy": 0.6900394260883331,
+      "num_tokens": 1266082.0,
+      "step": 790
+    },
+    {
+      "entropy": 1.8856651127338409,
+      "epoch": 0.5,
+      "grad_norm": 0.9061095118522644,
+      "learning_rate": 0.00016670833333333332,
+      "loss": 1.825701904296875,
+      "mean_token_accuracy": 0.6656161487102509,
+      "num_tokens": 1282730.0,
+      "step": 800
+    },
+    {
+      "entropy": 1.4934285402297973,
+      "epoch": 0.50625,
+      "grad_norm": 1.262459635734558,
+      "learning_rate": 0.00016629166666666667,
+      "loss": 1.4946110725402832,
+      "mean_token_accuracy": 0.7251970648765564,
+      "num_tokens": 1298552.0,
+      "step": 810
+    },
+    {
+      "entropy": 1.4886265635490417,
+      "epoch": 0.5125,
+      "grad_norm": 1.0677028894424438,
+      "learning_rate": 0.000165875,
+      "loss": 1.4603113174438476,
+      "mean_token_accuracy": 0.7227605879306793,
+      "num_tokens": 1314824.0,
+      "step": 820
+    },
+    {
+      "entropy": 1.692549991607666,
+      "epoch": 0.51875,
+      "grad_norm": 1.0945903062820435,
+      "learning_rate": 0.00016545833333333335,
+      "loss": 1.7372652053833009,
+      "mean_token_accuracy": 0.6853966057300568,
+      "num_tokens": 1330791.0,
+      "step": 830
+    },
+    {
+      "entropy": 1.8210653901100158,
+      "epoch": 0.525,
+      "grad_norm": 1.1291331052780151,
+      "learning_rate": 0.00016504166666666666,
+      "loss": 1.7676584243774414,
+      "mean_token_accuracy": 0.6854879319667816,
+      "num_tokens": 1345756.0,
+      "step": 840
+    },
+    {
+      "entropy": 1.6212540507316588,
+      "epoch": 0.53125,
+      "grad_norm": 1.5413988828659058,
+      "learning_rate": 0.000164625,
+      "loss": 1.623637580871582,
+      "mean_token_accuracy": 0.7191856324672699,
+      "num_tokens": 1359982.0,
+      "step": 850
+    },
+    {
+      "entropy": 1.8811518788337707,
+      "epoch": 0.5375,
+      "grad_norm": 1.1786221265792847,
+      "learning_rate": 0.00016420833333333334,
+      "loss": 1.8713268280029296,
+      "mean_token_accuracy": 0.6602873921394348,
+      "num_tokens": 1376178.0,
+      "step": 860
+    },
+    {
+      "entropy": 2.035761559009552,
+      "epoch": 0.54375,
+      "grad_norm": 1.0984121561050415,
+      "learning_rate": 0.00016379166666666669,
+      "loss": 2.059285354614258,
+      "mean_token_accuracy": 0.6380216658115387,
+      "num_tokens": 1392868.0,
+      "step": 870
+    },
+    {
+      "entropy": 1.6217237949371337,
+      "epoch": 0.55,
+      "grad_norm": 0.9770920276641846,
+      "learning_rate": 0.000163375,
+      "loss": 1.5708234786987305,
+      "mean_token_accuracy": 0.7149775147438049,
+      "num_tokens": 1407764.0,
+      "step": 880
+    },
+    {
+      "entropy": 1.602774453163147,
+      "epoch": 0.55625,
+      "grad_norm": 1.0390586853027344,
+      "learning_rate": 0.00016295833333333334,
+      "loss": 1.607761764526367,
+      "mean_token_accuracy": 0.705094438791275,
+      "num_tokens": 1424197.0,
+      "step": 890
+    },
+    {
+      "entropy": 1.69694527387619,
+      "epoch": 0.5625,
+      "grad_norm": 1.179693579673767,
+      "learning_rate": 0.00016254166666666668,
+      "loss": 1.6948720932006835,
+      "mean_token_accuracy": 0.6927467882633209,
+      "num_tokens": 1440504.0,
+      "step": 900
+    },
+    {
+      "entropy": 1.6066429018974304,
+      "epoch": 0.56875,
+      "grad_norm": 1.1319488286972046,
+      "learning_rate": 0.00016212500000000002,
+      "loss": 1.5969940185546876,
+      "mean_token_accuracy": 0.7075757026672364,
+      "num_tokens": 1456530.0,
+      "step": 910
+    },
+    {
+      "entropy": 1.8973723888397216,
+      "epoch": 0.575,
+      "grad_norm": 1.2241361141204834,
+      "learning_rate": 0.00016170833333333334,
+      "loss": 1.8886999130249023,
+      "mean_token_accuracy": 0.6638000011444092,
+      "num_tokens": 1473296.0,
+      "step": 920
+    },
+    {
+      "entropy": 1.7187514424324035,
+      "epoch": 0.58125,
+      "grad_norm": 1.173000454902649,
+      "learning_rate": 0.00016129166666666668,
+      "loss": 1.6855524063110352,
+      "mean_token_accuracy": 0.6964821815490723,
+      "num_tokens": 1488922.0,
+      "step": 930
+    },
+    {
+      "entropy": 1.8056416869163514,
+      "epoch": 0.5875,
+      "grad_norm": 1.0227336883544922,
+      "learning_rate": 0.000160875,
+      "loss": 1.7846719741821289,
+      "mean_token_accuracy": 0.6708004891872406,
+      "num_tokens": 1506033.0,
+      "step": 940
+    },
+    {
+      "entropy": 1.919889748096466,
+      "epoch": 0.59375,
+      "grad_norm": 0.9519665241241455,
+      "learning_rate": 0.00016045833333333333,
+      "loss": 1.9278553009033204,
+      "mean_token_accuracy": 0.6540423572063446,
+      "num_tokens": 1523413.0,
+      "step": 950
+    },
+    {
+      "entropy": 1.8174611330032349,
+      "epoch": 0.6,
+      "grad_norm": 1.0088615417480469,
+      "learning_rate": 0.00016004166666666668,
+      "loss": 1.7834074020385742,
+      "mean_token_accuracy": 0.6924533307552337,
+      "num_tokens": 1539536.0,
+      "step": 960
+    },
+    {
+      "entropy": 1.9116937160491942,
+      "epoch": 0.60625,
+      "grad_norm": 1.1767348051071167,
+      "learning_rate": 0.000159625,
+      "loss": 1.8945436477661133,
+      "mean_token_accuracy": 0.6457314133644104,
+      "num_tokens": 1557886.0,
+      "step": 970
+    },
+    {
+      "entropy": 1.7096561312675476,
+      "epoch": 0.6125,
+      "grad_norm": 1.1833308935165405,
+      "learning_rate": 0.00015920833333333333,
+      "loss": 1.7359018325805664,
+      "mean_token_accuracy": 0.6804608941078186,
+      "num_tokens": 1574248.0,
+      "step": 980
+    },
+    {
+      "entropy": 1.9041632771492005,
+      "epoch": 0.61875,
+      "grad_norm": 0.9453931450843811,
+      "learning_rate": 0.00015879166666666667,
+      "loss": 1.8600358963012695,
+      "mean_token_accuracy": 0.6616500198841095,
+      "num_tokens": 1590862.0,
+      "step": 990
+    },
+    {
+      "entropy": 1.4851105570793153,
+      "epoch": 0.625,
+      "grad_norm": 1.079835057258606,
+      "learning_rate": 0.00015837500000000001,
+      "loss": 1.4834007263183593,
+      "mean_token_accuracy": 0.7172181904315948,
+      "num_tokens": 1606914.0,
+      "step": 1000
+    },
+    {
+      "entropy": 1.8303247690200806,
+      "epoch": 0.63125,
+      "grad_norm": 0.9633236527442932,
+      "learning_rate": 0.00015795833333333333,
+      "loss": 1.8288990020751954,
+      "mean_token_accuracy": 0.6784947097301484,
+      "num_tokens": 1622896.0,
+      "step": 1010
+    },
+    {
+      "entropy": 1.8160423159599304,
+      "epoch": 0.6375,
+      "grad_norm": 1.007555603981018,
+      "learning_rate": 0.00015754166666666667,
+      "loss": 1.7530982971191407,
+      "mean_token_accuracy": 0.6823331356048584,
+      "num_tokens": 1640376.0,
+      "step": 1020
+    },
+    {
+      "entropy": 1.7904390811920166,
+      "epoch": 0.64375,
+      "grad_norm": 1.3964345455169678,
+      "learning_rate": 0.000157125,
+      "loss": 1.8209213256835937,
+      "mean_token_accuracy": 0.6769470632076263,
+      "num_tokens": 1657007.0,
+      "step": 1030
+    },
+    {
+      "entropy": 1.876240646839142,
+      "epoch": 0.65,
+      "grad_norm": 1.1620566844940186,
+      "learning_rate": 0.00015670833333333335,
+      "loss": 1.879776954650879,
+      "mean_token_accuracy": 0.6752348482608795,
+      "num_tokens": 1674235.0,
+      "step": 1040
+    },
+    {
+      "entropy": 1.40432670712471,
+      "epoch": 0.65625,
+      "grad_norm": 1.1437697410583496,
+      "learning_rate": 0.0001562916666666667,
+      "loss": 1.3821091651916504,
+      "mean_token_accuracy": 0.7261551082134247,
+      "num_tokens": 1690880.0,
+      "step": 1050
+    },
+    {
+      "entropy": 1.630136674642563,
+      "epoch": 0.6625,
+      "grad_norm": 1.173415184020996,
+      "learning_rate": 0.000155875,
+      "loss": 1.6407217025756835,
+      "mean_token_accuracy": 0.7053111135959625,
+      "num_tokens": 1706773.0,
+      "step": 1060
+    },
+    {
+      "entropy": 1.9234841227531434,
+      "epoch": 0.66875,
+      "grad_norm": 0.9936195015907288,
+      "learning_rate": 0.00015545833333333335,
+      "loss": 1.9025312423706056,
+      "mean_token_accuracy": 0.6550717502832413,
+      "num_tokens": 1724083.0,
+      "step": 1070
+    },
+    {
+      "entropy": 1.5203362822532653,
+      "epoch": 0.675,
+      "grad_norm": 1.3403916358947754,
+      "learning_rate": 0.0001550416666666667,
+      "loss": 1.4605630874633788,
+      "mean_token_accuracy": 0.7276029765605927,
+      "num_tokens": 1739086.0,
+      "step": 1080
+    },
+    {
+      "entropy": 1.5262176454067231,
+      "epoch": 0.68125,
+      "grad_norm": 1.052614450454712,
+      "learning_rate": 0.000154625,
+      "loss": 1.542721652984619,
+      "mean_token_accuracy": 0.7090686440467835,
+      "num_tokens": 1754825.0,
+      "step": 1090
+    },
+    {
+      "entropy": 1.8050179362297059,
+      "epoch": 0.6875,
+      "grad_norm": 1.4718170166015625,
+      "learning_rate": 0.00015420833333333335,
+      "loss": 1.777005386352539,
+      "mean_token_accuracy": 0.6807081162929535,
+      "num_tokens": 1770216.0,
+      "step": 1100
+    },
+    {
+      "entropy": 1.6406042158603669,
+      "epoch": 0.69375,
+      "grad_norm": 1.115580439567566,
+      "learning_rate": 0.00015379166666666666,
+      "loss": 1.6249666213989258,
+      "mean_token_accuracy": 0.7058773934841156,
+      "num_tokens": 1785236.0,
+      "step": 1110
+    },
+    {
+      "entropy": 1.6661675333976746,
+      "epoch": 0.7,
+      "grad_norm": 0.9184897541999817,
+      "learning_rate": 0.000153375,
+      "loss": 1.680354690551758,
+      "mean_token_accuracy": 0.7018162786960602,
+      "num_tokens": 1800794.0,
+      "step": 1120
+    },
+    {
+      "entropy": 1.7879603862762452,
+      "epoch": 0.70625,
+      "grad_norm": 1.1904963254928589,
+      "learning_rate": 0.00015295833333333334,
+      "loss": 1.7555608749389648,
+      "mean_token_accuracy": 0.6879275143146515,
+      "num_tokens": 1816368.0,
+      "step": 1130
+    },
+    {
+      "entropy": 1.542227828502655,
+      "epoch": 0.7125,
+      "grad_norm": 1.5405501127243042,
+      "learning_rate": 0.00015254166666666668,
+      "loss": 1.5250240325927735,
+      "mean_token_accuracy": 0.7067281067371368,
+      "num_tokens": 1833799.0,
+      "step": 1140
+    },
+    {
+      "entropy": 1.6808035492897033,
+      "epoch": 0.71875,
+      "grad_norm": 1.0687938928604126,
+      "learning_rate": 0.000152125,
+      "loss": 1.6870901107788085,
+      "mean_token_accuracy": 0.6859738230705261,
+      "num_tokens": 1850599.0,
+      "step": 1150
+    },
+    {
+      "entropy": 1.5208389639854432,
+      "epoch": 0.725,
+      "grad_norm": 0.7306898236274719,
+      "learning_rate": 0.00015170833333333334,
+      "loss": 1.489798355102539,
+      "mean_token_accuracy": 0.7269160747528076,
+      "num_tokens": 1865850.0,
+      "step": 1160
+    },
+    {
+      "entropy": 1.7221656441688538,
+      "epoch": 0.73125,
+      "grad_norm": 1.0556329488754272,
+      "learning_rate": 0.00015129166666666668,
+      "loss": 1.7220314025878907,
+      "mean_token_accuracy": 0.6948069214820862,
+      "num_tokens": 1881109.0,
+      "step": 1170
+    },
+    {
+      "entropy": 1.8467972993850708,
+      "epoch": 0.7375,
+      "grad_norm": 1.0107264518737793,
+      "learning_rate": 0.00015087500000000002,
+      "loss": 1.8298328399658204,
+      "mean_token_accuracy": 0.673337870836258,
+      "num_tokens": 1896961.0,
+      "step": 1180
+    },
+    {
+      "entropy": 1.811994230747223,
+      "epoch": 0.74375,
+      "grad_norm": 0.9903097748756409,
+      "learning_rate": 0.00015045833333333334,
+      "loss": 1.7922752380371094,
+      "mean_token_accuracy": 0.6801791548728943,
+      "num_tokens": 1913474.0,
+      "step": 1190
+    },
+    {
+      "entropy": 1.692976748943329,
+      "epoch": 0.75,
+      "grad_norm": 1.2231838703155518,
+      "learning_rate": 0.00015004166666666668,
+      "loss": 1.7092206954956055,
+      "mean_token_accuracy": 0.7039589881896973,
+      "num_tokens": 1928065.0,
+      "step": 1200
+    },
+    {
+      "entropy": 1.7056877970695496,
+      "epoch": 0.75625,
+      "grad_norm": 1.0669372081756592,
+      "learning_rate": 0.00014962500000000002,
+      "loss": 1.6774791717529296,
+      "mean_token_accuracy": 0.6932863354682922,
+      "num_tokens": 1944000.0,
+      "step": 1210
+    },
+    {
+      "entropy": 1.6272387504577637,
+      "epoch": 0.7625,
+      "grad_norm": 1.0480815172195435,
+      "learning_rate": 0.00014920833333333336,
+      "loss": 1.6001169204711914,
+      "mean_token_accuracy": 0.6986334085464477,
+      "num_tokens": 1959802.0,
+      "step": 1220
+    },
+    {
+      "entropy": 1.6549307227134704,
+      "epoch": 0.76875,
+      "grad_norm": 1.2522614002227783,
+      "learning_rate": 0.00014879166666666667,
+      "loss": 1.670203399658203,
+      "mean_token_accuracy": 0.6849127054214478,
+      "num_tokens": 1976404.0,
+      "step": 1230
+    },
+    {
+      "entropy": 1.5742060959339141,
+      "epoch": 0.775,
+      "grad_norm": 1.3071776628494263,
+      "learning_rate": 0.000148375,
+      "loss": 1.5255179405212402,
+      "mean_token_accuracy": 0.7283611118793487,
+      "num_tokens": 1990354.0,
+      "step": 1240
+    },
+    {
+      "entropy": 1.3672740757465363,
+      "epoch": 0.78125,
+      "grad_norm": 1.1295819282531738,
+      "learning_rate": 0.00014795833333333333,
+      "loss": 1.3578125,
+      "mean_token_accuracy": 0.7339789867401123,
+      "num_tokens": 2007259.0,
+      "step": 1250
+    },
+    {
+      "entropy": 1.5945733308792114,
+      "epoch": 0.7875,
+      "grad_norm": 1.6405155658721924,
+      "learning_rate": 0.00014754166666666667,
+      "loss": 1.5962472915649415,
+      "mean_token_accuracy": 0.6940421521663666,
+      "num_tokens": 2023439.0,
+      "step": 1260
+    },
+    {
+      "entropy": 1.7175377368927003,
+      "epoch": 0.79375,
+      "grad_norm": 1.2672407627105713,
+      "learning_rate": 0.000147125,
+      "loss": 1.7290122985839844,
+      "mean_token_accuracy": 0.6945447564125061,
+      "num_tokens": 2039429.0,
+      "step": 1270
+    },
+    {
+      "entropy": 1.4956220388412476,
+      "epoch": 0.8,
+      "grad_norm": 1.0772604942321777,
+      "learning_rate": 0.00014670833333333333,
+      "loss": 1.48792724609375,
+      "mean_token_accuracy": 0.7135675251483917,
+      "num_tokens": 2054525.0,
+      "step": 1280
+    },
+    {
+      "entropy": 1.4603404819965362,
+      "epoch": 0.80625,
+      "grad_norm": 0.9915527701377869,
+      "learning_rate": 0.00014629166666666667,
+      "loss": 1.4228525161743164,
+      "mean_token_accuracy": 0.7315677225589752,
+      "num_tokens": 2070908.0,
+      "step": 1290
+    },
+    {
+      "entropy": 1.8602357029914856,
+      "epoch": 0.8125,
+      "grad_norm": 1.2213199138641357,
+      "learning_rate": 0.000145875,
+      "loss": 1.875438117980957,
+      "mean_token_accuracy": 0.6696613788604736,
+      "num_tokens": 2086420.0,
+      "step": 1300
+    },
+    {
+      "entropy": 1.7318559408187866,
+      "epoch": 0.81875,
+      "grad_norm": 1.2372366189956665,
+      "learning_rate": 0.00014545833333333335,
+      "loss": 1.7164314270019532,
+      "mean_token_accuracy": 0.6757801532745361,
+      "num_tokens": 2103947.0,
+      "step": 1310
+    },
+    {
+      "entropy": 1.3927726984024047,
+      "epoch": 0.825,
+      "grad_norm": 1.3297343254089355,
+      "learning_rate": 0.00014504166666666666,
+      "loss": 1.3864904403686524,
+      "mean_token_accuracy": 0.7442179620265961,
+      "num_tokens": 2118375.0,
+      "step": 1320
+    },
+    {
+      "entropy": 1.8476340055465699,
+      "epoch": 0.83125,
+      "grad_norm": 1.2429879903793335,
+      "learning_rate": 0.000144625,
+      "loss": 1.870237159729004,
+      "mean_token_accuracy": 0.6771714389324188,
+      "num_tokens": 2133631.0,
+      "step": 1330
+    },
+    {
+      "entropy": 1.5825651347637177,
+      "epoch": 0.8375,
+      "grad_norm": 1.1128071546554565,
+      "learning_rate": 0.00014420833333333335,
+      "loss": 1.5584844589233398,
+      "mean_token_accuracy": 0.718721890449524,
+      "num_tokens": 2149672.0,
+      "step": 1340
+    },
+    {
+      "entropy": 1.4676709055900574,
+      "epoch": 0.84375,
+      "grad_norm": 1.029419183731079,
+      "learning_rate": 0.0001437916666666667,
+      "loss": 1.4486634254455566,
+      "mean_token_accuracy": 0.7196858763694763,
+      "num_tokens": 2165526.0,
+      "step": 1350
+    },
+    {
+      "entropy": 1.6996529340744018,
+      "epoch": 0.85,
+      "grad_norm": 1.1256935596466064,
+      "learning_rate": 0.000143375,
+      "loss": 1.7186290740966796,
+      "mean_token_accuracy": 0.6910524368286133,
+      "num_tokens": 2181925.0,
+      "step": 1360
+    },
+    {
+      "entropy": 1.8775145173072816,
+      "epoch": 0.85625,
+      "grad_norm": 1.0610681772232056,
+      "learning_rate": 0.00014295833333333334,
+      "loss": 1.8524488449096679,
+      "mean_token_accuracy": 0.6767737805843353,
+      "num_tokens": 2197351.0,
+      "step": 1370
+    },
+    {
+      "entropy": 1.7408287942409515,
+      "epoch": 0.8625,
+      "grad_norm": 1.1001033782958984,
+      "learning_rate": 0.00014254166666666668,
+      "loss": 1.7132286071777343,
+      "mean_token_accuracy": 0.6875977098941803,
+      "num_tokens": 2213976.0,
+      "step": 1380
+    },
+    {
+      "entropy": 1.609831404685974,
+      "epoch": 0.86875,
+      "grad_norm": 1.3175855875015259,
+      "learning_rate": 0.000142125,
+      "loss": 1.617106819152832,
+      "mean_token_accuracy": 0.7043311834335327,
+      "num_tokens": 2228967.0,
+      "step": 1390
+    },
+    {
+      "entropy": 1.6383503794670105,
+      "epoch": 0.875,
+      "grad_norm": 1.304242730140686,
+      "learning_rate": 0.00014170833333333334,
+      "loss": 1.6476552963256836,
+      "mean_token_accuracy": 0.6986299633979798,
+      "num_tokens": 2244568.0,
+      "step": 1400
+    },
+    {
+      "entropy": 1.765878963470459,
+      "epoch": 0.88125,
+      "grad_norm": 1.08024263381958,
+      "learning_rate": 0.00014129166666666665,
+      "loss": 1.743129348754883,
+      "mean_token_accuracy": 0.6806416690349579,
+      "num_tokens": 2260843.0,
+      "step": 1410
+    },
+    {
+      "entropy": 1.7234230637550354,
+      "epoch": 0.8875,
+      "grad_norm": 1.1865103244781494,
+      "learning_rate": 0.000140875,
+      "loss": 1.728973960876465,
+      "mean_token_accuracy": 0.6861885011196136,
+      "num_tokens": 2276053.0,
+      "step": 1420
+    },
+    {
+      "entropy": 1.3930821239948272,
+      "epoch": 0.89375,
+      "grad_norm": 1.0010002851486206,
+      "learning_rate": 0.00014045833333333334,
+      "loss": 1.3594303131103516,
+      "mean_token_accuracy": 0.7466361939907074,
+      "num_tokens": 2290658.0,
+      "step": 1430
+    },
+    {
+      "entropy": 1.7306805908679963,
+      "epoch": 0.9,
+      "grad_norm": 0.9718702435493469,
+      "learning_rate": 0.00014004166666666668,
+      "loss": 1.7531225204467773,
+      "mean_token_accuracy": 0.6929883539676667,
+      "num_tokens": 2307306.0,
+      "step": 1440
+    },
+    {
+      "entropy": 2.0208531498908995,
+      "epoch": 0.90625,
+      "grad_norm": 1.210390567779541,
+      "learning_rate": 0.00013962500000000002,
+      "loss": 2.0112279891967773,
+      "mean_token_accuracy": 0.6591072261333466,
+      "num_tokens": 2323106.0,
+      "step": 1450
+    },
+    {
+      "entropy": 1.7247427701950073,
+      "epoch": 0.9125,
+      "grad_norm": 1.0104308128356934,
+      "learning_rate": 0.00013920833333333333,
+      "loss": 1.6930545806884765,
+      "mean_token_accuracy": 0.6927989542484283,
+      "num_tokens": 2339150.0,
+      "step": 1460
+    },
+    {
+      "entropy": 1.5396546006202698,
+      "epoch": 0.91875,
+      "grad_norm": 1.180051326751709,
+      "learning_rate": 0.00013879166666666667,
+      "loss": 1.5373605728149413,
+      "mean_token_accuracy": 0.7118293285369873,
+      "num_tokens": 2355242.0,
+      "step": 1470
+    },
+    {
+      "entropy": 1.4924741625785827,
+      "epoch": 0.925,
+      "grad_norm": 1.0538833141326904,
+      "learning_rate": 0.00013837500000000002,
+      "loss": 1.4466129302978517,
+      "mean_token_accuracy": 0.7262342572212219,
+      "num_tokens": 2371838.0,
+      "step": 1480
+    },
+    {
+      "entropy": 1.6197248876094819,
+      "epoch": 0.93125,
+      "grad_norm": 1.2407019138336182,
+      "learning_rate": 0.00013795833333333336,
+      "loss": 1.6408515930175782,
+      "mean_token_accuracy": 0.6895669877529145,
+      "num_tokens": 2388383.0,
+      "step": 1490
+    },
+    {
+      "entropy": 1.6017064571380615,
+      "epoch": 0.9375,
+      "grad_norm": 1.115491509437561,
+      "learning_rate": 0.00013754166666666667,
+      "loss": 1.6164506912231444,
+      "mean_token_accuracy": 0.7063500344753265,
+      "num_tokens": 2405920.0,
+      "step": 1500
+    },
+    {
+      "entropy": 1.7128301978111267,
+      "epoch": 0.94375,
+      "grad_norm": 1.1029974222183228,
+      "learning_rate": 0.000137125,
+      "loss": 1.6670164108276366,
+      "mean_token_accuracy": 0.6883853197097778,
+      "num_tokens": 2423475.0,
+      "step": 1510
+    },
+    {
+      "entropy": 1.7637011766433717,
+      "epoch": 0.95,
+      "grad_norm": 1.2063648700714111,
+      "learning_rate": 0.00013670833333333335,
+      "loss": 1.753184700012207,
+      "mean_token_accuracy": 0.697669267654419,
+      "num_tokens": 2438334.0,
+      "step": 1520
+    },
+    {
+      "entropy": 1.4334100246429444,
+      "epoch": 0.95625,
+      "grad_norm": 1.211255669593811,
+      "learning_rate": 0.0001362916666666667,
+      "loss": 1.4158055305480957,
+      "mean_token_accuracy": 0.7352574229240417,
+      "num_tokens": 2455594.0,
+      "step": 1530
+    },
+    {
+      "entropy": 1.8677887678146363,
+      "epoch": 0.9625,
+      "grad_norm": 1.433374047279358,
+      "learning_rate": 0.000135875,
+      "loss": 1.8924720764160157,
+      "mean_token_accuracy": 0.6550890862941742,
+      "num_tokens": 2473287.0,
+      "step": 1540
+    },
+    {
+      "entropy": 1.7000919938087464,
+      "epoch": 0.96875,
+      "grad_norm": 1.1278074979782104,
+      "learning_rate": 0.00013545833333333332,
+      "loss": 1.6923490524291993,
+      "mean_token_accuracy": 0.6900858581066132,
+      "num_tokens": 2489522.0,
+      "step": 1550
+    },
+    {
+      "entropy": 1.7960133492946624,
+      "epoch": 0.975,
+      "grad_norm": 1.2543061971664429,
+      "learning_rate": 0.00013504166666666666,
+      "loss": 1.7780380249023438,
+      "mean_token_accuracy": 0.6973862290382385,
+      "num_tokens": 2505959.0,
+      "step": 1560
+    },
+    {
+      "entropy": 1.9542201280593872,
+      "epoch": 0.98125,
+      "grad_norm": 1.0181416273117065,
+      "learning_rate": 0.000134625,
+      "loss": 1.9185314178466797,
+      "mean_token_accuracy": 0.671462482213974,
+      "num_tokens": 2522120.0,
+      "step": 1570
+    },
+    {
+      "entropy": 1.9292239546775818,
+      "epoch": 0.9875,
+      "grad_norm": 1.3379733562469482,
+      "learning_rate": 0.00013420833333333335,
+      "loss": 1.9222425460815429,
+      "mean_token_accuracy": 0.6739412903785705,
+      "num_tokens": 2537106.0,
+      "step": 1580
+    },
+    {
+      "entropy": 1.447037798166275,
+      "epoch": 0.99375,
+      "grad_norm": 0.9749404788017273,
+      "learning_rate": 0.00013379166666666666,
+      "loss": 1.4738496780395507,
+      "mean_token_accuracy": 0.7318728864192963,
+      "num_tokens": 2551675.0,
+      "step": 1590
+    },
+    {
+      "entropy": 1.5189184904098512,
+      "epoch": 1.0,
+      "grad_norm": 1.0811270475387573,
+      "learning_rate": 0.000133375,
+      "loss": 1.4607027053833008,
+      "mean_token_accuracy": 0.7251661479473114,
+      "num_tokens": 2566677.0,
+      "step": 1600
+    },
+    {
+      "entropy": 1.5722868740558624,
+      "epoch": 1.00625,
+      "grad_norm": 1.1551116704940796,
+      "learning_rate": 0.00013295833333333334,
+      "loss": 1.4944665908813477,
+      "mean_token_accuracy": 0.7075038552284241,
+      "num_tokens": 2584587.0,
+      "step": 1610
+    },
+    {
+      "entropy": 1.4713022589683533,
+      "epoch": 1.0125,
+      "grad_norm": 1.3789024353027344,
+      "learning_rate": 0.00013254166666666669,
+      "loss": 1.48437442779541,
+      "mean_token_accuracy": 0.7256957054138183,
+      "num_tokens": 2601445.0,
+      "step": 1620
+    },
+    {
+      "entropy": 1.7074804306030273,
+      "epoch": 1.01875,
+      "grad_norm": 0.9721592664718628,
+      "learning_rate": 0.000132125,
+      "loss": 1.6762861251831054,
+      "mean_token_accuracy": 0.6954738020896911,
+      "num_tokens": 2617555.0,
+      "step": 1630
+    },
+    {
+      "entropy": 1.564556896686554,
+      "epoch": 1.025,
+      "grad_norm": 0.9383876323699951,
+      "learning_rate": 0.00013170833333333334,
+      "loss": 1.5295989036560058,
+      "mean_token_accuracy": 0.718695729970932,
+      "num_tokens": 2634525.0,
+      "step": 1640
+    },
+    {
+      "entropy": 1.5693002760410308,
+      "epoch": 1.03125,
+      "grad_norm": 1.349861741065979,
+      "learning_rate": 0.00013129166666666668,
+      "loss": 1.5277949333190919,
+      "mean_token_accuracy": 0.7092782378196716,
+      "num_tokens": 2650142.0,
+      "step": 1650
+    },
+    {
+      "entropy": 1.578352963924408,
+      "epoch": 1.0375,
+      "grad_norm": 1.1445894241333008,
+      "learning_rate": 0.00013087500000000002,
+      "loss": 1.542719841003418,
+      "mean_token_accuracy": 0.7064581930637359,
+      "num_tokens": 2666972.0,
+      "step": 1660
+    },
+    {
+      "entropy": 1.543465781211853,
+      "epoch": 1.04375,
+      "grad_norm": 1.3881875276565552,
+      "learning_rate": 0.00013045833333333334,
+      "loss": 1.5280303955078125,
+      "mean_token_accuracy": 0.7207661390304565,
+      "num_tokens": 2683010.0,
+      "step": 1670
+    },
+    {
+      "entropy": 1.6715376853942872,
+      "epoch": 1.05,
+      "grad_norm": 1.0701199769973755,
+      "learning_rate": 0.00013004166666666668,
+      "loss": 1.605533981323242,
+      "mean_token_accuracy": 0.7042076170444489,
+      "num_tokens": 2699750.0,
+      "step": 1680
+    },
+    {
+      "entropy": 1.8074408769607544,
+      "epoch": 1.05625,
+      "grad_norm": 1.1535950899124146,
+      "learning_rate": 0.000129625,
+      "loss": 1.8149070739746094,
+      "mean_token_accuracy": 0.6754674971103668,
+      "num_tokens": 2715327.0,
+      "step": 1690
+    },
+    {
+      "entropy": 1.390859466791153,
+      "epoch": 1.0625,
+      "grad_norm": 1.3075913190841675,
+      "learning_rate": 0.00012920833333333333,
+      "loss": 1.3382477760314941,
+      "mean_token_accuracy": 0.7446447789669037,
+      "num_tokens": 2731545.0,
+      "step": 1700
+    },
+    {
+      "entropy": 1.5530614137649537,
+      "epoch": 1.06875,
+      "grad_norm": 1.0769933462142944,
+      "learning_rate": 0.00012879166666666668,
+      "loss": 1.5612698554992677,
+      "mean_token_accuracy": 0.7086196482181549,
+      "num_tokens": 2747474.0,
+      "step": 1710
+    },
+    {
+      "entropy": 1.4727447509765625,
+      "epoch": 1.075,
+      "grad_norm": 1.3405673503875732,
+      "learning_rate": 0.000128375,
+      "loss": 1.467014694213867,
+      "mean_token_accuracy": 0.721301943063736,
+      "num_tokens": 2763752.0,
+      "step": 1720
+    },
+    {
+      "entropy": 1.4368587255477905,
+      "epoch": 1.08125,
+      "grad_norm": 1.1225148439407349,
+      "learning_rate": 0.00012795833333333333,
+      "loss": 1.373732566833496,
+      "mean_token_accuracy": 0.735386061668396,
+      "num_tokens": 2780825.0,
+      "step": 1730
+    },
+    {
+      "entropy": 1.910112488269806,
+      "epoch": 1.0875,
+      "grad_norm": 1.3501793146133423,
+      "learning_rate": 0.00012754166666666667,
+      "loss": 1.8609920501708985,
+      "mean_token_accuracy": 0.6692535221576691,
+      "num_tokens": 2796189.0,
+      "step": 1740
+    },
+    {
+      "entropy": 1.5071870803833007,
+      "epoch": 1.09375,
+      "grad_norm": 1.3129894733428955,
+      "learning_rate": 0.00012712500000000001,
+      "loss": 1.4644898414611816,
+      "mean_token_accuracy": 0.7083827078342437,
+      "num_tokens": 2812366.0,
+      "step": 1750
+    },
+    {
+      "entropy": 1.5374130189418793,
+      "epoch": 1.1,
+      "grad_norm": 1.5225971937179565,
+      "learning_rate": 0.00012670833333333333,
+      "loss": 1.5303051948547364,
+      "mean_token_accuracy": 0.7062317490577698,
+      "num_tokens": 2828931.0,
+      "step": 1760
+    },
+    {
+      "entropy": 1.3442133665084839,
+      "epoch": 1.10625,
+      "grad_norm": 0.9940143823623657,
+      "learning_rate": 0.00012629166666666667,
+      "loss": 1.3562307357788086,
+      "mean_token_accuracy": 0.7415487766265869,
+      "num_tokens": 2845649.0,
+      "step": 1770
+    },
+    {
+      "entropy": 1.7066189289093017,
+      "epoch": 1.1125,
+      "grad_norm": 1.0410038232803345,
+      "learning_rate": 0.000125875,
+      "loss": 1.6697145462036134,
+      "mean_token_accuracy": 0.6902722358703614,
+      "num_tokens": 2863192.0,
+      "step": 1780
+    },
+    {
+      "entropy": 1.461040985584259,
+      "epoch": 1.11875,
+      "grad_norm": 1.3850445747375488,
+      "learning_rate": 0.00012545833333333335,
+      "loss": 1.425504493713379,
+      "mean_token_accuracy": 0.7344222486019134,
+      "num_tokens": 2879185.0,
+      "step": 1790
+    },
+    {
+      "entropy": 1.578921377658844,
+      "epoch": 1.125,
+      "grad_norm": 1.1007256507873535,
+      "learning_rate": 0.00012504166666666667,
+      "loss": 1.5645020484924317,
+      "mean_token_accuracy": 0.7210570514202118,
+      "num_tokens": 2895391.0,
+      "step": 1800
+    },
+    {
+      "entropy": 1.6235330820083618,
+      "epoch": 1.13125,
+      "grad_norm": 1.2813857793807983,
+      "learning_rate": 0.000124625,
+      "loss": 1.587186622619629,
+      "mean_token_accuracy": 0.7002721726894379,
+      "num_tokens": 2911550.0,
+      "step": 1810
+    },
+    {
+      "entropy": 1.4750116109848022,
+      "epoch": 1.1375,
+      "grad_norm": 1.5143760442733765,
+      "learning_rate": 0.00012420833333333335,
+      "loss": 1.463811492919922,
+      "mean_token_accuracy": 0.7254601418972015,
+      "num_tokens": 2927474.0,
+      "step": 1820
+    },
+    {
+      "entropy": 1.6249911546707154,
+      "epoch": 1.14375,
+      "grad_norm": 1.169236183166504,
+      "learning_rate": 0.0001237916666666667,
+      "loss": 1.5588427543640138,
+      "mean_token_accuracy": 0.7059515714645386,
+      "num_tokens": 2943789.0,
+      "step": 1830
+    },
+    {
+      "entropy": 1.30964452624321,
+      "epoch": 1.15,
+      "grad_norm": 1.1322827339172363,
+      "learning_rate": 0.000123375,
+      "loss": 1.307899284362793,
+      "mean_token_accuracy": 0.7390363335609436,
+      "num_tokens": 2959984.0,
+      "step": 1840
+    },
+    {
+      "entropy": 1.6029350578784942,
+      "epoch": 1.15625,
+      "grad_norm": 1.213231086730957,
+      "learning_rate": 0.00012295833333333332,
+      "loss": 1.5782454490661622,
+      "mean_token_accuracy": 0.7148236751556396,
+      "num_tokens": 2975243.0,
+      "step": 1850
+    },
+    {
+      "entropy": 1.5944063544273377,
+      "epoch": 1.1625,
+      "grad_norm": 1.0796669721603394,
+      "learning_rate": 0.00012254166666666666,
+      "loss": 1.5725255012512207,
+      "mean_token_accuracy": 0.7146448731422425,
+      "num_tokens": 2990859.0,
+      "step": 1860
+    },
+    {
+      "entropy": 1.5244378209114076,
+      "epoch": 1.16875,
+      "grad_norm": 1.419023036956787,
+      "learning_rate": 0.000122125,
+      "loss": 1.4670942306518555,
+      "mean_token_accuracy": 0.7239357471466065,
+      "num_tokens": 3007127.0,
+      "step": 1870
+    },
+    {
+      "entropy": 1.6091014623641968,
+      "epoch": 1.175,
+      "grad_norm": 1.4825661182403564,
+      "learning_rate": 0.00012170833333333334,
+      "loss": 1.6093076705932616,
+      "mean_token_accuracy": 0.7088693916797638,
+      "num_tokens": 3022483.0,
+      "step": 1880
+    },
+    {
+      "entropy": 1.45444712638855,
+      "epoch": 1.18125,
+      "grad_norm": 1.3558845520019531,
+      "learning_rate": 0.00012129166666666667,
+      "loss": 1.4141746520996095,
+      "mean_token_accuracy": 0.732920354604721,
+      "num_tokens": 3037617.0,
+      "step": 1890
+    },
+    {
+      "entropy": 1.6253526747226714,
+      "epoch": 1.1875,
+      "grad_norm": 1.3929126262664795,
+      "learning_rate": 0.00012087500000000001,
+      "loss": 1.6000051498413086,
+      "mean_token_accuracy": 0.7068257510662079,
+      "num_tokens": 3053388.0,
+      "step": 1900
+    },
+    {
+      "entropy": 1.5285037100315093,
+      "epoch": 1.19375,
+      "grad_norm": 1.1625466346740723,
+      "learning_rate": 0.00012045833333333334,
+      "loss": 1.4943373680114747,
+      "mean_token_accuracy": 0.7252572357654572,
+      "num_tokens": 3069339.0,
+      "step": 1910
+    },
+    {
+      "entropy": 1.8175406813621522,
+      "epoch": 1.2,
+      "grad_norm": 1.5570393800735474,
+      "learning_rate": 0.00012004166666666668,
+      "loss": 1.8013412475585937,
+      "mean_token_accuracy": 0.6630936324596405,
+      "num_tokens": 3085824.0,
+      "step": 1920
+    },
+    {
+      "entropy": 1.5790231585502625,
+      "epoch": 1.20625,
+      "grad_norm": 1.1993381977081299,
+      "learning_rate": 0.00011962500000000001,
+      "loss": 1.566931915283203,
+      "mean_token_accuracy": 0.7000592827796936,
+      "num_tokens": 3101787.0,
+      "step": 1930
+    },
+    {
+      "entropy": 1.6829537510871888,
+      "epoch": 1.2125,
+      "grad_norm": 1.056292176246643,
+      "learning_rate": 0.00011920833333333335,
+      "loss": 1.6262767791748047,
+      "mean_token_accuracy": 0.7090820074081421,
+      "num_tokens": 3118353.0,
+      "step": 1940
+    },
+    {
+      "entropy": 1.7127684593200683,
+      "epoch": 1.21875,
+      "grad_norm": 1.8370107412338257,
+      "learning_rate": 0.00011879166666666668,
+      "loss": 1.6998786926269531,
+      "mean_token_accuracy": 0.6885552763938904,
+      "num_tokens": 3133677.0,
+      "step": 1950
+    },
+    {
+      "entropy": 1.4167523980140686,
+      "epoch": 1.225,
+      "grad_norm": 1.22276771068573,
+      "learning_rate": 0.00011837500000000002,
+      "loss": 1.400386428833008,
+      "mean_token_accuracy": 0.7395426869392395,
+      "num_tokens": 3149970.0,
+      "step": 1960
+    },
+    {
+      "entropy": 1.5884539484977722,
+      "epoch": 1.23125,
+      "grad_norm": 1.102330207824707,
+      "learning_rate": 0.00011795833333333335,
+      "loss": 1.5546161651611328,
+      "mean_token_accuracy": 0.7126840889453888,
+      "num_tokens": 3165610.0,
+      "step": 1970
+    },
+    {
+      "entropy": 1.4745222628116608,
+      "epoch": 1.2375,
+      "grad_norm": 1.1550710201263428,
+      "learning_rate": 0.00011754166666666669,
+      "loss": 1.4146233558654786,
+      "mean_token_accuracy": 0.722059839963913,
+      "num_tokens": 3182140.0,
+      "step": 1980
+    },
+    {
+      "entropy": 1.2851545333862304,
+      "epoch": 1.24375,
+      "grad_norm": 1.3434416055679321,
+      "learning_rate": 0.000117125,
+      "loss": 1.2884522438049317,
+      "mean_token_accuracy": 0.7470438361167908,
+      "num_tokens": 3198173.0,
+      "step": 1990
+    },
+    {
+      "entropy": 1.2883932530879973,
+      "epoch": 1.25,
+      "grad_norm": 1.4781601428985596,
+      "learning_rate": 0.00011670833333333333,
+      "loss": 1.2788150787353516,
+      "mean_token_accuracy": 0.7492641091346741,
+      "num_tokens": 3213266.0,
+      "step": 2000
+    },
+    {
+      "entropy": 1.4300999522209168,
+      "epoch": 1.25625,
+      "grad_norm": 1.2841081619262695,
+      "learning_rate": 0.00011629166666666667,
+      "loss": 1.370127010345459,
+      "mean_token_accuracy": 0.7352509915828704,
+      "num_tokens": 3231710.0,
+      "step": 2010
+    },
+    {
+      "entropy": 1.4790874660015105,
+      "epoch": 1.2625,
+      "grad_norm": 1.2271722555160522,
+      "learning_rate": 0.000115875,
+      "loss": 1.4632418632507325,
+      "mean_token_accuracy": 0.7333620309829711,
+      "num_tokens": 3246164.0,
+      "step": 2020
+    },
+    {
+      "entropy": 1.5686452507972717,
+      "epoch": 1.26875,
+      "grad_norm": 1.3024920225143433,
+      "learning_rate": 0.00011545833333333334,
+      "loss": 1.5722068786621093,
+      "mean_token_accuracy": 0.7040091097354889,
+      "num_tokens": 3263230.0,
+      "step": 2030
+    },
+    {
+      "entropy": 1.4046522855758667,
+      "epoch": 1.275,
+      "grad_norm": 1.228481650352478,
+      "learning_rate": 0.00011504166666666667,
+      "loss": 1.3798538208007813,
+      "mean_token_accuracy": 0.7337527394294738,
+      "num_tokens": 3278763.0,
+      "step": 2040
+    },
+    {
+      "entropy": 1.3945519745349884,
+      "epoch": 1.28125,
+      "grad_norm": 1.290372610092163,
+      "learning_rate": 0.00011462500000000001,
+      "loss": 1.3584356307983398,
+      "mean_token_accuracy": 0.7443967878818512,
+      "num_tokens": 3293696.0,
+      "step": 2050
+    },
+    {
+      "entropy": 1.4900161147117614,
+      "epoch": 1.2875,
+      "grad_norm": 1.1306453943252563,
+      "learning_rate": 0.00011420833333333334,
+      "loss": 1.4872239112854004,
+      "mean_token_accuracy": 0.7120920658111572,
+      "num_tokens": 3309310.0,
+      "step": 2060
+    },
+    {
+      "entropy": 1.3569008827209472,
+      "epoch": 1.29375,
+      "grad_norm": 1.2758461236953735,
+      "learning_rate": 0.00011379166666666668,
+      "loss": 1.3353228569030762,
+      "mean_token_accuracy": 0.7443707466125489,
+      "num_tokens": 3325168.0,
+      "step": 2070
+    },
+    {
+      "entropy": 1.6185344874858856,
+      "epoch": 1.3,
+      "grad_norm": 1.5052305459976196,
+      "learning_rate": 0.000113375,
+      "loss": 1.6000774383544922,
+      "mean_token_accuracy": 0.7019944131374359,
+      "num_tokens": 3341042.0,
+      "step": 2080
+    },
+    {
+      "entropy": 1.4655375361442566,
+      "epoch": 1.30625,
+      "grad_norm": 1.3974571228027344,
+      "learning_rate": 0.00011295833333333335,
+      "loss": 1.4321935653686524,
+      "mean_token_accuracy": 0.7371616125106811,
+      "num_tokens": 3355186.0,
+      "step": 2090
+    },
+    {
+      "entropy": 1.398791140317917,
+      "epoch": 1.3125,
+      "grad_norm": 1.2042092084884644,
+      "learning_rate": 0.00011254166666666667,
+      "loss": 1.3696802139282227,
+      "mean_token_accuracy": 0.7524273097515106,
+      "num_tokens": 3369191.0,
+      "step": 2100
+    },
+    {
+      "entropy": 1.4808842182159423,
+      "epoch": 1.31875,
+      "grad_norm": 1.6055423021316528,
+      "learning_rate": 0.00011212500000000001,
+      "loss": 1.4401930809020995,
+      "mean_token_accuracy": 0.7326848268508911,
+      "num_tokens": 3384422.0,
+      "step": 2110
+    },
+    {
+      "entropy": 1.4481637477874756,
+      "epoch": 1.325,
+      "grad_norm": 1.3678208589553833,
+      "learning_rate": 0.00011170833333333334,
+      "loss": 1.42474308013916,
+      "mean_token_accuracy": 0.7354761302471161,
+      "num_tokens": 3399444.0,
+      "step": 2120
+    },
+    {
+      "entropy": 1.6247466444969176,
+      "epoch": 1.33125,
+      "grad_norm": 1.223132848739624,
+      "learning_rate": 0.00011129166666666668,
+      "loss": 1.5991174697875976,
+      "mean_token_accuracy": 0.6908230066299439,
+      "num_tokens": 3415804.0,
+      "step": 2130
+    },
+    {
+      "entropy": 1.626929020881653,
+      "epoch": 1.3375,
+      "grad_norm": 1.1557271480560303,
+      "learning_rate": 0.000110875,
+      "loss": 1.6279167175292968,
+      "mean_token_accuracy": 0.698086017370224,
+      "num_tokens": 3432556.0,
+      "step": 2140
+    },
+    {
+      "entropy": 1.3584868609905243,
+      "epoch": 1.34375,
+      "grad_norm": 1.2452704906463623,
+      "learning_rate": 0.00011045833333333333,
+      "loss": 1.3075440406799317,
+      "mean_token_accuracy": 0.7528424978256225,
+      "num_tokens": 3448867.0,
+      "step": 2150
+    },
+    {
+      "entropy": 1.6224844813346864,
+      "epoch": 1.35,
+      "grad_norm": 1.6471002101898193,
+      "learning_rate": 0.00011004166666666667,
+      "loss": 1.610884666442871,
+      "mean_token_accuracy": 0.7035767018795014,
+      "num_tokens": 3465070.0,
+      "step": 2160
+    },
+    {
+      "entropy": 1.5599715054035186,
+      "epoch": 1.35625,
+      "grad_norm": 1.3792170286178589,
+      "learning_rate": 0.000109625,
+      "loss": 1.5434111595153808,
+      "mean_token_accuracy": 0.7160674929618835,
+      "num_tokens": 3480917.0,
+      "step": 2170
+    },
+    {
+      "entropy": 1.5391543865203858,
+      "epoch": 1.3625,
+      "grad_norm": 1.1845561265945435,
+      "learning_rate": 0.00010920833333333334,
+      "loss": 1.5082951545715333,
+      "mean_token_accuracy": 0.7180830955505371,
+      "num_tokens": 3496391.0,
+      "step": 2180
+    },
+    {
+      "entropy": 1.6296917855739594,
+      "epoch": 1.36875,
+      "grad_norm": 1.2620705366134644,
+      "learning_rate": 0.00010879166666666666,
+      "loss": 1.6139934539794922,
+      "mean_token_accuracy": 0.6978007674217224,
+      "num_tokens": 3512618.0,
+      "step": 2190
+    },
+    {
+      "entropy": 1.507855612039566,
+      "epoch": 1.375,
+      "grad_norm": 1.5587466955184937,
+      "learning_rate": 0.000108375,
+      "loss": 1.4773646354675294,
+      "mean_token_accuracy": 0.7376876771450043,
+      "num_tokens": 3527501.0,
+      "step": 2200
+    },
+    {
+      "entropy": 1.4952866971492766,
+      "epoch": 1.38125,
+      "grad_norm": 1.2983455657958984,
+      "learning_rate": 0.00010795833333333333,
+      "loss": 1.4471445083618164,
+      "mean_token_accuracy": 0.7316478371620179,
+      "num_tokens": 3543951.0,
+      "step": 2210
+    },
+    {
+      "entropy": 1.3594684064388276,
+      "epoch": 1.3875,
+      "grad_norm": 1.297422170639038,
+      "learning_rate": 0.00010754166666666667,
+      "loss": 1.3141795158386231,
+      "mean_token_accuracy": 0.7459075093269348,
+      "num_tokens": 3558466.0,
+      "step": 2220
+    },
+    {
+      "entropy": 1.4920692324638367,
+      "epoch": 1.39375,
+      "grad_norm": 1.1560890674591064,
+      "learning_rate": 0.00010712500000000002,
+      "loss": 1.503106689453125,
+      "mean_token_accuracy": 0.7318450331687927,
+      "num_tokens": 3573292.0,
+      "step": 2230
+    },
+    {
+      "entropy": 1.480102813243866,
+      "epoch": 1.4,
+      "grad_norm": 1.3358945846557617,
+      "learning_rate": 0.00010670833333333334,
+      "loss": 1.425229835510254,
+      "mean_token_accuracy": 0.7334981381893158,
+      "num_tokens": 3588973.0,
+      "step": 2240
+    },
+    {
+      "entropy": 1.4895495772361755,
+      "epoch": 1.40625,
+      "grad_norm": 1.1994125843048096,
+      "learning_rate": 0.00010629166666666668,
+      "loss": 1.5040643692016602,
+      "mean_token_accuracy": 0.7194438993930816,
+      "num_tokens": 3604377.0,
+      "step": 2250
+    },
+    {
+      "entropy": 1.5496041357517243,
+      "epoch": 1.4125,
+      "grad_norm": 1.0622626543045044,
+      "learning_rate": 0.00010587500000000001,
+      "loss": 1.5283061981201171,
+      "mean_token_accuracy": 0.7214846253395081,
+      "num_tokens": 3619253.0,
+      "step": 2260
+    },
+    {
+      "entropy": 1.4854934245347977,
+      "epoch": 1.41875,
+      "grad_norm": 1.2156522274017334,
+      "learning_rate": 0.00010545833333333335,
+      "loss": 1.45772066116333,
+      "mean_token_accuracy": 0.7402491807937622,
+      "num_tokens": 3635194.0,
+      "step": 2270
+    },
+    {
+      "entropy": 1.4378338694572448,
+      "epoch": 1.425,
+      "grad_norm": 1.268330693244934,
+      "learning_rate": 0.00010504166666666668,
+      "loss": 1.433200740814209,
+      "mean_token_accuracy": 0.721939891576767,
+      "num_tokens": 3650743.0,
+      "step": 2280
+    },
+    {
+      "entropy": 1.6567742109298706,
+      "epoch": 1.43125,
+      "grad_norm": 1.406450867652893,
+      "learning_rate": 0.000104625,
+      "loss": 1.5962560653686524,
+      "mean_token_accuracy": 0.6955357909202575,
+      "num_tokens": 3666967.0,
+      "step": 2290
+    },
+    {
+      "entropy": 1.4724194526672363,
+      "epoch": 1.4375,
+      "grad_norm": 1.2553515434265137,
+      "learning_rate": 0.00010420833333333334,
+      "loss": 1.4309930801391602,
+      "mean_token_accuracy": 0.7161106109619141,
+      "num_tokens": 3682895.0,
+      "step": 2300
+    },
+    {
+      "entropy": 1.5601205706596375,
+      "epoch": 1.44375,
+      "grad_norm": 1.4266722202301025,
+      "learning_rate": 0.00010379166666666666,
+      "loss": 1.5680569648742675,
+      "mean_token_accuracy": 0.7100606679916381,
+      "num_tokens": 3699159.0,
+      "step": 2310
+    },
+    {
+      "entropy": 1.5770993947982788,
+      "epoch": 1.45,
+      "grad_norm": 1.0669773817062378,
+      "learning_rate": 0.000103375,
+      "loss": 1.5516767501831055,
+      "mean_token_accuracy": 0.7135333299636841,
+      "num_tokens": 3715098.0,
+      "step": 2320
+    },
+    {
+      "entropy": 1.8675716519355774,
+      "epoch": 1.45625,
+      "grad_norm": 1.2342056035995483,
+      "learning_rate": 0.00010295833333333333,
+      "loss": 1.8174869537353515,
+      "mean_token_accuracy": 0.6679181456565857,
+      "num_tokens": 3732870.0,
+      "step": 2330
+    },
+    {
+      "entropy": 1.430324125289917,
+      "epoch": 1.4625,
+      "grad_norm": 1.2945976257324219,
+      "learning_rate": 0.00010254166666666667,
+      "loss": 1.4287038803100587,
+      "mean_token_accuracy": 0.72167067527771,
+      "num_tokens": 3749933.0,
+      "step": 2340
+    },
+    {
+      "entropy": 1.730414831638336,
+      "epoch": 1.46875,
+      "grad_norm": 1.2890760898590088,
+      "learning_rate": 0.000102125,
+      "loss": 1.7310367584228517,
+      "mean_token_accuracy": 0.6827211558818818,
+      "num_tokens": 3765738.0,
+      "step": 2350
+    },
+    {
+      "entropy": 1.589302372932434,
+      "epoch": 1.475,
+      "grad_norm": 1.2703382968902588,
+      "learning_rate": 0.00010170833333333334,
+      "loss": 1.554899311065674,
+      "mean_token_accuracy": 0.7175840258598327,
+      "num_tokens": 3782283.0,
+      "step": 2360
+    },
+    {
+      "entropy": 1.5381306529045105,
+      "epoch": 1.48125,
+      "grad_norm": 1.22355055809021,
+      "learning_rate": 0.00010129166666666667,
+      "loss": 1.5299139022827148,
+      "mean_token_accuracy": 0.720841133594513,
+      "num_tokens": 3798668.0,
+      "step": 2370
+    },
+    {
+      "entropy": 1.4656860113143921,
+      "epoch": 1.4875,
+      "grad_norm": 1.3395017385482788,
+      "learning_rate": 0.00010087500000000001,
+      "loss": 1.4125995635986328,
+      "mean_token_accuracy": 0.7339462757110595,
+      "num_tokens": 3815010.0,
+      "step": 2380
+    },
+    {
+      "entropy": 1.804145634174347,
+      "epoch": 1.49375,
+      "grad_norm": 1.314396619796753,
+      "learning_rate": 0.00010045833333333334,
+      "loss": 1.7817136764526367,
+      "mean_token_accuracy": 0.6736281871795654,
+      "num_tokens": 3832072.0,
+      "step": 2390
+    },
+    {
+      "entropy": 1.4611708521842957,
+      "epoch": 1.5,
+      "grad_norm": 1.1895500421524048,
+      "learning_rate": 0.00010004166666666668,
+      "loss": 1.3968372344970703,
+      "mean_token_accuracy": 0.7360989391803742,
+      "num_tokens": 3847132.0,
+      "step": 2400
+    },
+    {
+      "entropy": 1.6815272569656372,
+      "epoch": 1.50625,
+      "grad_norm": 1.618330955505371,
+      "learning_rate": 9.9625e-05,
+      "loss": 1.6670255661010742,
+      "mean_token_accuracy": 0.6954464137554168,
+      "num_tokens": 3863728.0,
+      "step": 2410
+    },
+    {
+      "entropy": 1.4748624086380004,
+      "epoch": 1.5125,
+      "grad_norm": 1.3931251764297485,
+      "learning_rate": 9.920833333333334e-05,
+      "loss": 1.4734206199645996,
+      "mean_token_accuracy": 0.7243768811225891,
+      "num_tokens": 3881091.0,
+      "step": 2420
+    },
+    {
+      "entropy": 1.5684111356735229,
+      "epoch": 1.51875,
+      "grad_norm": 1.2951520681381226,
+      "learning_rate": 9.879166666666666e-05,
+      "loss": 1.5724835395812988,
+      "mean_token_accuracy": 0.7170560956001282,
+      "num_tokens": 3896950.0,
+      "step": 2430
+    },
+    {
+      "entropy": 1.4423527359962462,
+      "epoch": 1.525,
+      "grad_norm": 1.3819620609283447,
+      "learning_rate": 9.8375e-05,
+      "loss": 1.3830111503601075,
+      "mean_token_accuracy": 0.734988021850586,
+      "num_tokens": 3913111.0,
+      "step": 2440
+    },
+    {
+      "entropy": 1.445133638381958,
+      "epoch": 1.53125,
+      "grad_norm": 1.1904077529907227,
+      "learning_rate": 9.795833333333335e-05,
+      "loss": 1.4090572357177735,
+      "mean_token_accuracy": 0.7465642392635345,
+      "num_tokens": 3928423.0,
+      "step": 2450
+    },
+    {
+      "entropy": 1.3929100334644318,
+      "epoch": 1.5375,
+      "grad_norm": 1.2035553455352783,
+      "learning_rate": 9.754166666666667e-05,
+      "loss": 1.3762245178222656,
+      "mean_token_accuracy": 0.7424494504928589,
+      "num_tokens": 3944592.0,
+      "step": 2460
+    },
+    {
+      "entropy": 1.4243842720985413,
+      "epoch": 1.54375,
+      "grad_norm": 1.23099946975708,
+      "learning_rate": 9.7125e-05,
+      "loss": 1.392878532409668,
+      "mean_token_accuracy": 0.7290188908576966,
+      "num_tokens": 3961698.0,
+      "step": 2470
+    },
+    {
+      "entropy": 1.2783382177352904,
+      "epoch": 1.55,
+      "grad_norm": 1.0636597871780396,
+      "learning_rate": 9.670833333333333e-05,
+      "loss": 1.2710566520690918,
+      "mean_token_accuracy": 0.7456799983978272,
+      "num_tokens": 3978348.0,
+      "step": 2480
+    },
+    {
+      "entropy": 1.4664524912834167,
+      "epoch": 1.55625,
+      "grad_norm": 1.304549217224121,
+      "learning_rate": 9.629166666666667e-05,
+      "loss": 1.4022022247314454,
+      "mean_token_accuracy": 0.7238153696060181,
+      "num_tokens": 3995617.0,
+      "step": 2490
+    },
+    {
+      "entropy": 1.5068158030509948,
+      "epoch": 1.5625,
+      "grad_norm": 1.3583524227142334,
+      "learning_rate": 9.5875e-05,
+      "loss": 1.5101305961608886,
+      "mean_token_accuracy": 0.7212704837322235,
+      "num_tokens": 4011757.0,
+      "step": 2500
+    },
+    {
+      "entropy": 1.3247866868972777,
+      "epoch": 1.56875,
+      "grad_norm": 1.2817496061325073,
+      "learning_rate": 9.545833333333334e-05,
+      "loss": 1.2877973556518554,
+      "mean_token_accuracy": 0.7563863575458527,
+      "num_tokens": 4027626.0,
+      "step": 2510
+    },
+    {
+      "entropy": 1.379275918006897,
+      "epoch": 1.575,
+      "grad_norm": 1.280960202217102,
+      "learning_rate": 9.504166666666667e-05,
+      "loss": 1.3508204460144042,
+      "mean_token_accuracy": 0.7264174938201904,
+      "num_tokens": 4043371.0,
+      "step": 2520
+    },
+    {
+      "entropy": 1.3637795805931092,
+      "epoch": 1.58125,
+      "grad_norm": 1.5878641605377197,
+      "learning_rate": 9.462500000000001e-05,
+      "loss": 1.3469207763671875,
+      "mean_token_accuracy": 0.7273931324481964,
+      "num_tokens": 4060055.0,
+      "step": 2530
+    },
+    {
+      "entropy": 1.5789328217506409,
+      "epoch": 1.5875,
+      "grad_norm": 1.640913486480713,
+      "learning_rate": 9.420833333333334e-05,
+      "loss": 1.5729190826416015,
+      "mean_token_accuracy": 0.7119402408599853,
+      "num_tokens": 4075900.0,
+      "step": 2540
+    },
+    {
+      "entropy": 1.7490926384925842,
+      "epoch": 1.59375,
+      "grad_norm": 1.6071687936782837,
+      "learning_rate": 9.379166666666667e-05,
+      "loss": 1.709273910522461,
+      "mean_token_accuracy": 0.6931175053119659,
+      "num_tokens": 4091912.0,
+      "step": 2550
+    },
+    {
+      "entropy": 1.4965949416160584,
+      "epoch": 1.6,
+      "grad_norm": 1.4065935611724854,
+      "learning_rate": 9.3375e-05,
+      "loss": 1.4635175704956054,
+      "mean_token_accuracy": 0.7154323875904083,
+      "num_tokens": 4107106.0,
+      "step": 2560
+    },
+    {
+      "entropy": 1.4448750913143158,
+      "epoch": 1.60625,
+      "grad_norm": 1.0949947834014893,
+      "learning_rate": 9.295833333333333e-05,
+      "loss": 1.4171462059020996,
+      "mean_token_accuracy": 0.7289912223815918,
+      "num_tokens": 4123314.0,
+      "step": 2570
+    },
+    {
+      "entropy": 1.586699116230011,
+      "epoch": 1.6125,
+      "grad_norm": 1.2809687852859497,
+      "learning_rate": 9.254166666666668e-05,
+      "loss": 1.5513721466064454,
+      "mean_token_accuracy": 0.7160951435565949,
+      "num_tokens": 4139375.0,
+      "step": 2580
+    },
+    {
+      "entropy": 1.355649709701538,
+      "epoch": 1.61875,
+      "grad_norm": 1.2908111810684204,
+      "learning_rate": 9.2125e-05,
+      "loss": 1.3552752494812013,
+      "mean_token_accuracy": 0.7381039083003997,
+      "num_tokens": 4155569.0,
+      "step": 2590
+    },
+    {
+      "entropy": 1.520990651845932,
+      "epoch": 1.625,
+      "grad_norm": 1.3035266399383545,
+      "learning_rate": 9.170833333333334e-05,
+      "loss": 1.5246206283569337,
+      "mean_token_accuracy": 0.708776718378067,
+      "num_tokens": 4172811.0,
+      "step": 2600
+    },
+    {
+      "entropy": 1.5370873808860779,
+      "epoch": 1.63125,
+      "grad_norm": 1.2901692390441895,
+      "learning_rate": 9.129166666666667e-05,
+      "loss": 1.494930362701416,
+      "mean_token_accuracy": 0.7182445406913758,
+      "num_tokens": 4189911.0,
+      "step": 2610
+    },
+    {
+      "entropy": 1.6943754434585572,
+      "epoch": 1.6375,
+      "grad_norm": 1.422568678855896,
+      "learning_rate": 9.0875e-05,
+      "loss": 1.670543098449707,
+      "mean_token_accuracy": 0.6984758317470551,
+      "num_tokens": 4204330.0,
+      "step": 2620
+    },
+    {
+      "entropy": 1.28586905002594,
+      "epoch": 1.64375,
+      "grad_norm": 1.372889757156372,
+      "learning_rate": 9.045833333333333e-05,
+      "loss": 1.2775461196899414,
+      "mean_token_accuracy": 0.7427131831645966,
+      "num_tokens": 4219629.0,
+      "step": 2630
+    },
+    {
+      "entropy": 1.6646262407302856,
+      "epoch": 1.65,
+      "grad_norm": 1.043871283531189,
+      "learning_rate": 9.004166666666667e-05,
+      "loss": 1.6240650177001954,
+      "mean_token_accuracy": 0.701425063610077,
+      "num_tokens": 4235748.0,
+      "step": 2640
+    },
+    {
+      "entropy": 1.3864838480949402,
+      "epoch": 1.65625,
+      "grad_norm": 1.4441967010498047,
+      "learning_rate": 8.962500000000001e-05,
+      "loss": 1.3634360313415528,
+      "mean_token_accuracy": 0.7346426248550415,
+      "num_tokens": 4252128.0,
+      "step": 2650
+    },
+    {
+      "entropy": 1.715321946144104,
+      "epoch": 1.6625,
+      "grad_norm": 1.1895242929458618,
+      "learning_rate": 8.920833333333334e-05,
+      "loss": 1.6987434387207032,
+      "mean_token_accuracy": 0.6735502183437347,
+      "num_tokens": 4269811.0,
+      "step": 2660
+    },
+    {
+      "entropy": 1.5965183973312378,
+      "epoch": 1.66875,
+      "grad_norm": 1.4692190885543823,
+      "learning_rate": 8.879166666666668e-05,
+      "loss": 1.58436918258667,
+      "mean_token_accuracy": 0.7096805095672607,
+      "num_tokens": 4284613.0,
+      "step": 2670
+    },
+    {
+      "entropy": 1.542008912563324,
+      "epoch": 1.675,
+      "grad_norm": 1.316340684890747,
+      "learning_rate": 8.837500000000001e-05,
+      "loss": 1.5008735656738281,
+      "mean_token_accuracy": 0.7172623038291931,
+      "num_tokens": 4301053.0,
+      "step": 2680
+    },
+    {
+      "entropy": 1.4867228150367737,
+      "epoch": 1.68125,
+      "grad_norm": 24.226320266723633,
+      "learning_rate": 8.795833333333335e-05,
+      "loss": 1.460626983642578,
+      "mean_token_accuracy": 0.7286741614341736,
+      "num_tokens": 4316305.0,
+      "step": 2690
+    },
+    {
+      "entropy": 1.7473996877670288,
+      "epoch": 1.6875,
+      "grad_norm": 1.285845160484314,
+      "learning_rate": 8.754166666666666e-05,
+      "loss": 1.7414569854736328,
+      "mean_token_accuracy": 0.6965268373489379,
+      "num_tokens": 4331423.0,
+      "step": 2700
+    },
+    {
+      "entropy": 1.529891985654831,
+      "epoch": 1.69375,
+      "grad_norm": 1.0836328268051147,
+      "learning_rate": 8.7125e-05,
+      "loss": 1.5023365020751953,
+      "mean_token_accuracy": 0.7222744286060333,
+      "num_tokens": 4347395.0,
+      "step": 2710
+    },
+    {
+      "entropy": 1.4650962769985199,
+      "epoch": 1.7,
+      "grad_norm": 1.3328890800476074,
+      "learning_rate": 8.670833333333333e-05,
+      "loss": 1.4283534049987794,
+      "mean_token_accuracy": 0.7230254769325256,
+      "num_tokens": 4363897.0,
+      "step": 2720
+    },
+    {
+      "entropy": 1.7307329058647156,
+      "epoch": 1.70625,
+      "grad_norm": 1.3583158254623413,
+      "learning_rate": 8.629166666666667e-05,
+      "loss": 1.782860565185547,
+      "mean_token_accuracy": 0.6734575390815735,
+      "num_tokens": 4380302.0,
+      "step": 2730
+    },
+    {
+      "entropy": 1.6353549718856812,
+      "epoch": 1.7125,
+      "grad_norm": 1.3317112922668457,
+      "learning_rate": 8.5875e-05,
+      "loss": 1.5920299530029296,
+      "mean_token_accuracy": 0.713880306482315,
+      "num_tokens": 4396276.0,
+      "step": 2740
+    },
+    {
+      "entropy": 1.561123514175415,
+      "epoch": 1.71875,
+      "grad_norm": 1.3166691064834595,
+      "learning_rate": 8.545833333333334e-05,
+      "loss": 1.5372273445129394,
+      "mean_token_accuracy": 0.7211565136909485,
+      "num_tokens": 4411247.0,
+      "step": 2750
+    },
+    {
+      "entropy": 1.564157283306122,
+      "epoch": 1.725,
+      "grad_norm": 1.2636748552322388,
+      "learning_rate": 8.504166666666667e-05,
+      "loss": 1.4918930053710937,
+      "mean_token_accuracy": 0.7158837258815766,
+      "num_tokens": 4427813.0,
+      "step": 2760
+    },
+    {
+      "entropy": 1.4304892539978027,
+      "epoch": 1.73125,
+      "grad_norm": 1.5613315105438232,
+      "learning_rate": 8.4625e-05,
+      "loss": 1.3944540977478028,
+      "mean_token_accuracy": 0.7275691747665405,
+      "num_tokens": 4445296.0,
+      "step": 2770
+    },
+    {
+      "entropy": 1.2236315131187439,
+      "epoch": 1.7375,
+      "grad_norm": 1.2611221075057983,
+      "learning_rate": 8.420833333333334e-05,
+      "loss": 1.1938905715942383,
+      "mean_token_accuracy": 0.7641484498977661,
+      "num_tokens": 4462124.0,
+      "step": 2780
+    },
+    {
+      "entropy": 1.3692725896835327,
+      "epoch": 1.74375,
+      "grad_norm": 1.2629590034484863,
+      "learning_rate": 8.379166666666667e-05,
+      "loss": 1.3642467498779296,
+      "mean_token_accuracy": 0.7303588569164277,
+      "num_tokens": 4478372.0,
+      "step": 2790
+    },
+    {
+      "entropy": 1.6227773070335387,
+      "epoch": 1.75,
+      "grad_norm": 1.2561644315719604,
+      "learning_rate": 8.337500000000001e-05,
+      "loss": 1.6058052062988282,
+      "mean_token_accuracy": 0.700336241722107,
+      "num_tokens": 4494171.0,
+      "step": 2800
+    },
+    {
+      "entropy": 1.428490024805069,
+      "epoch": 1.75625,
+      "grad_norm": 1.3820418119430542,
+      "learning_rate": 8.295833333333333e-05,
+      "loss": 1.3879735946655274,
+      "mean_token_accuracy": 0.7351743221282959,
+      "num_tokens": 4510230.0,
+      "step": 2810
+    },
+    {
+      "entropy": 1.4222940444946288,
+      "epoch": 1.7625,
+      "grad_norm": 1.2397351264953613,
+      "learning_rate": 8.254166666666668e-05,
+      "loss": 1.4101068496704101,
+      "mean_token_accuracy": 0.7230583786964416,
+      "num_tokens": 4527606.0,
+      "step": 2820
+    },
+    {
+      "entropy": 1.4971628785133362,
+      "epoch": 1.76875,
+      "grad_norm": 1.3096486330032349,
+      "learning_rate": 8.2125e-05,
+      "loss": 1.464939785003662,
+      "mean_token_accuracy": 0.7111652135848999,
+      "num_tokens": 4545225.0,
+      "step": 2830
+    },
+    {
+      "entropy": 1.3849402070045471,
+      "epoch": 1.775,
+      "grad_norm": 1.205183982849121,
+      "learning_rate": 8.170833333333335e-05,
+      "loss": 1.3683393478393555,
+      "mean_token_accuracy": 0.7407701790332795,
+      "num_tokens": 4560402.0,
+      "step": 2840
+    },
+    {
+      "entropy": 1.6577628076076507,
+      "epoch": 1.78125,
+      "grad_norm": 1.5654460191726685,
+      "learning_rate": 8.129166666666666e-05,
+      "loss": 1.6171913146972656,
+      "mean_token_accuracy": 0.7075854480266571,
+      "num_tokens": 4576965.0,
+      "step": 2850
+    },
+    {
+      "entropy": 1.3645796418190002,
+      "epoch": 1.7875,
+      "grad_norm": 1.235590934753418,
+      "learning_rate": 8.0875e-05,
+      "loss": 1.347662353515625,
+      "mean_token_accuracy": 0.7304706692695617,
+      "num_tokens": 4593434.0,
+      "step": 2860
+    },
+    {
+      "entropy": 1.4836460769176483,
+      "epoch": 1.79375,
+      "grad_norm": 1.320184350013733,
+      "learning_rate": 8.045833333333334e-05,
+      "loss": 1.4724997520446776,
+      "mean_token_accuracy": 0.7195299446582795,
+      "num_tokens": 4609923.0,
+      "step": 2870
+    },
+    {
+      "entropy": 1.6642062067985535,
+      "epoch": 1.8,
+      "grad_norm": 1.2351288795471191,
+      "learning_rate": 8.004166666666667e-05,
+      "loss": 1.6861392974853515,
+      "mean_token_accuracy": 0.7051100075244904,
+      "num_tokens": 4624454.0,
+      "step": 2880
+    },
+    {
+      "entropy": 1.5779452681541444,
+      "epoch": 1.80625,
+      "grad_norm": 1.2252860069274902,
+      "learning_rate": 7.962500000000001e-05,
+      "loss": 1.533352756500244,
+      "mean_token_accuracy": 0.6999219834804535,
+      "num_tokens": 4640827.0,
+      "step": 2890
+    },
+    {
+      "entropy": 1.5528077244758607,
+      "epoch": 1.8125,
+      "grad_norm": 1.1443504095077515,
+      "learning_rate": 7.920833333333334e-05,
+      "loss": 1.5033111572265625,
+      "mean_token_accuracy": 0.7126169025897979,
+      "num_tokens": 4658220.0,
+      "step": 2900
+    },
+    {
+      "entropy": 1.3986522793769836,
+      "epoch": 1.81875,
+      "grad_norm": 1.5263164043426514,
+      "learning_rate": 7.879166666666668e-05,
+      "loss": 1.372209644317627,
+      "mean_token_accuracy": 0.7441541969776153,
+      "num_tokens": 4672842.0,
+      "step": 2910
+    },
+    {
+      "entropy": 1.4560746192932128,
+      "epoch": 1.825,
+      "grad_norm": 1.5468953847885132,
+      "learning_rate": 7.8375e-05,
+      "loss": 1.4648940086364746,
+      "mean_token_accuracy": 0.7272057294845581,
+      "num_tokens": 4688299.0,
+      "step": 2920
+    },
+    {
+      "entropy": 1.4954636991024017,
+      "epoch": 1.83125,
+      "grad_norm": 1.0781564712524414,
+      "learning_rate": 7.795833333333334e-05,
+      "loss": 1.4502483367919923,
+      "mean_token_accuracy": 0.7171670913696289,
+      "num_tokens": 4703817.0,
+      "step": 2930
+    },
+    {
+      "entropy": 1.5171880543231964,
+      "epoch": 1.8375,
+      "grad_norm": 1.3267104625701904,
+      "learning_rate": 7.754166666666666e-05,
+      "loss": 1.4769481658935546,
+      "mean_token_accuracy": 0.7215599358081818,
+      "num_tokens": 4719894.0,
+      "step": 2940
+    },
+    {
+      "entropy": 1.3195408761501313,
+      "epoch": 1.84375,
+      "grad_norm": 1.2717158794403076,
+      "learning_rate": 7.7125e-05,
+      "loss": 1.3177043914794921,
+      "mean_token_accuracy": 0.7504824101924896,
+      "num_tokens": 4734718.0,
+      "step": 2950
+    },
+    {
+      "entropy": 1.5918075561523437,
+      "epoch": 1.85,
+      "grad_norm": 1.2488837242126465,
+      "learning_rate": 7.670833333333333e-05,
+      "loss": 1.5610873222351074,
+      "mean_token_accuracy": 0.7076693117618561,
+      "num_tokens": 4751161.0,
+      "step": 2960
+    },
+    {
+      "entropy": 1.6460988879203797,
+      "epoch": 1.85625,
+      "grad_norm": 1.3455753326416016,
+      "learning_rate": 7.629166666666667e-05,
+      "loss": 1.6128368377685547,
+      "mean_token_accuracy": 0.7088452041149139,
+      "num_tokens": 4767233.0,
+      "step": 2970
+    },
+    {
+      "entropy": 1.4300294637680053,
+      "epoch": 1.8625,
+      "grad_norm": 1.4921503067016602,
+      "learning_rate": 7.5875e-05,
+      "loss": 1.3731948852539062,
+      "mean_token_accuracy": 0.7404947876930237,
+      "num_tokens": 4782969.0,
+      "step": 2980
+    },
+    {
+      "entropy": 1.576975119113922,
+      "epoch": 1.86875,
+      "grad_norm": 1.5002368688583374,
+      "learning_rate": 7.545833333333334e-05,
+      "loss": 1.5616255760192872,
+      "mean_token_accuracy": 0.7097465932369232,
+      "num_tokens": 4799235.0,
+      "step": 2990
+    },
+    {
+      "entropy": 1.488182783126831,
+      "epoch": 1.875,
+      "grad_norm": 1.83254075050354,
+      "learning_rate": 7.504166666666667e-05,
+      "loss": 1.4698299407958983,
+      "mean_token_accuracy": 0.7252460658550263,
+      "num_tokens": 4815994.0,
+      "step": 3000
+    },
+    {
+      "entropy": 1.5791472911834716,
+      "epoch": 1.88125,
+      "grad_norm": 1.4819544553756714,
+      "learning_rate": 7.4625e-05,
+      "loss": 1.5363513946533203,
+      "mean_token_accuracy": 0.7158302247524262,
+      "num_tokens": 4831602.0,
+      "step": 3010
+    },
+    {
+      "entropy": 1.5102008521556853,
+      "epoch": 1.8875,
+      "grad_norm": 1.295324444770813,
+      "learning_rate": 7.420833333333334e-05,
+      "loss": 1.5118574142456054,
+      "mean_token_accuracy": 0.7126592576503754,
+      "num_tokens": 4846906.0,
+      "step": 3020
+    },
+    {
+      "entropy": 1.6536986708641053,
+      "epoch": 1.89375,
+      "grad_norm": 1.3863139152526855,
+      "learning_rate": 7.379166666666667e-05,
+      "loss": 1.6189361572265626,
+      "mean_token_accuracy": 0.7046464741230011,
+      "num_tokens": 4863229.0,
+      "step": 3030
+    },
+    {
+      "entropy": 1.4736833274364471,
+      "epoch": 1.9,
+      "grad_norm": 1.3712388277053833,
+      "learning_rate": 7.337500000000001e-05,
+      "loss": 1.4391626358032226,
+      "mean_token_accuracy": 0.7212695777416229,
+      "num_tokens": 4879091.0,
+      "step": 3040
+    },
+    {
+      "entropy": 1.5309330582618714,
+      "epoch": 1.90625,
+      "grad_norm": 1.4493404626846313,
+      "learning_rate": 7.295833333333334e-05,
+      "loss": 1.483638381958008,
+      "mean_token_accuracy": 0.7188887298107147,
+      "num_tokens": 4895732.0,
+      "step": 3050
+    },
+    {
+      "entropy": 1.6084718346595763,
+      "epoch": 1.9125,
+      "grad_norm": 1.4487833976745605,
+      "learning_rate": 7.254166666666668e-05,
+      "loss": 1.5670183181762696,
+      "mean_token_accuracy": 0.7159606039524078,
+      "num_tokens": 4911076.0,
+      "step": 3060
+    },
+    {
+      "entropy": 1.730119562149048,
+      "epoch": 1.91875,
+      "grad_norm": 1.2320717573165894,
+      "learning_rate": 7.2125e-05,
+      "loss": 1.6761627197265625,
+      "mean_token_accuracy": 0.6998885095119476,
+      "num_tokens": 4927065.0,
+      "step": 3070
+    },
+    {
+      "entropy": 1.2469948709011078,
+      "epoch": 1.925,
+      "grad_norm": 1.4127497673034668,
+      "learning_rate": 7.170833333333333e-05,
+      "loss": 1.2160426139831544,
+      "mean_token_accuracy": 0.7594240248203278,
+      "num_tokens": 4943418.0,
+      "step": 3080
+    },
+    {
+      "entropy": 1.2954376578330993,
+      "epoch": 1.93125,
+      "grad_norm": 1.1853926181793213,
+      "learning_rate": 7.129166666666667e-05,
+      "loss": 1.2705731391906738,
+      "mean_token_accuracy": 0.7542681276798249,
+      "num_tokens": 4959238.0,
+      "step": 3090
+    },
+    {
+      "entropy": 1.3804858148097991,
+      "epoch": 1.9375,
+      "grad_norm": 1.60636305809021,
+      "learning_rate": 7.0875e-05,
+      "loss": 1.3857073783874512,
+      "mean_token_accuracy": 0.7389215409755707,
+      "num_tokens": 4974795.0,
+      "step": 3100
+    },
+    {
+      "entropy": 1.6878295361995697,
+      "epoch": 1.94375,
+      "grad_norm": 1.1700066328048706,
+      "learning_rate": 7.045833333333334e-05,
+      "loss": 1.6632881164550781,
+      "mean_token_accuracy": 0.703966373205185,
+      "num_tokens": 4990118.0,
+      "step": 3110
+    },
+    {
+      "entropy": 1.2717679560184478,
+      "epoch": 1.95,
+      "grad_norm": 2.0792453289031982,
+      "learning_rate": 7.004166666666667e-05,
+      "loss": 1.2325850486755372,
+      "mean_token_accuracy": 0.7667054653167724,
+      "num_tokens": 5004766.0,
+      "step": 3120
+    },
+    {
+      "entropy": 1.438014167547226,
+      "epoch": 1.95625,
+      "grad_norm": 1.2766367197036743,
+      "learning_rate": 6.962500000000001e-05,
+      "loss": 1.4068305969238282,
+      "mean_token_accuracy": 0.7309353291988373,
+      "num_tokens": 5021383.0,
+      "step": 3130
+    },
+    {
+      "entropy": 1.641681444644928,
+      "epoch": 1.9625,
+      "grad_norm": 1.1961487531661987,
+      "learning_rate": 6.920833333333334e-05,
+      "loss": 1.6495939254760743,
+      "mean_token_accuracy": 0.7029170572757721,
+      "num_tokens": 5037300.0,
+      "step": 3140
+    },
+    {
+      "entropy": 1.6759935021400452,
+      "epoch": 1.96875,
+      "grad_norm": 1.5381704568862915,
+      "learning_rate": 6.879166666666667e-05,
+      "loss": 1.653905487060547,
+      "mean_token_accuracy": 0.700744116306305,
+      "num_tokens": 5053656.0,
+      "step": 3150
+    },
+    {
+      "entropy": 1.506896734237671,
+      "epoch": 1.975,
+      "grad_norm": 1.581653118133545,
+      "learning_rate": 6.8375e-05,
+      "loss": 1.4852601051330567,
+      "mean_token_accuracy": 0.7296431720256805,
+      "num_tokens": 5067985.0,
+      "step": 3160
+    },
+    {
+      "entropy": 1.42970010638237,
+      "epoch": 1.98125,
+      "grad_norm": 0.9960667490959167,
+      "learning_rate": 6.795833333333334e-05,
+      "loss": 1.3865435600280762,
+      "mean_token_accuracy": 0.7410664558410645,
+      "num_tokens": 5084409.0,
+      "step": 3170
+    },
+    {
+      "entropy": 1.5444799602031707,
+      "epoch": 1.9875,
+      "grad_norm": 1.3578131198883057,
+      "learning_rate": 6.754166666666666e-05,
+      "loss": 1.5354645729064942,
+      "mean_token_accuracy": 0.7221448838710784,
+      "num_tokens": 5099986.0,
+      "step": 3180
+    },
+    {
+      "entropy": 1.4982260465621948,
+      "epoch": 1.99375,
+      "grad_norm": 1.3280580043792725,
+      "learning_rate": 6.7125e-05,
+      "loss": 1.4895822525024414,
+      "mean_token_accuracy": 0.7181140720844269,
+      "num_tokens": 5116593.0,
+      "step": 3190
+    },
+    {
+      "entropy": 1.4417502641677857,
+      "epoch": 2.0,
+      "grad_norm": 1.0056556463241577,
+      "learning_rate": 6.670833333333333e-05,
+      "loss": 1.3911771774291992,
+      "mean_token_accuracy": 0.7312404155731201,
+      "num_tokens": 5133354.0,
+      "step": 3200
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 4800,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.023160125633331e+16,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
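
Review note: the `log_history` entries above contain one `grad_norm` outlier (24.23 at step 2690, against roughly 1.0 to 2.1 elsewhere), and the logged `learning_rate` values look like a linear decay to zero over `max_steps`. The sketch below checks both on a handful of entries copied from this hunk. The peak learning rate of `2e-4`, the absence of warmup, and the one-step lag between the logged step and the scheduler value are assumptions inferred from the numbers; none of them is stated in the diff. `find_grad_norm_spikes` and `linear_decay_lr` are hypothetical helper names.

```python
import math
from statistics import median

# Entries copied from the log_history excerpt above (other keys omitted).
LOG_HISTORY = [
    {"step": 2670, "grad_norm": 1.4692190885543823, "learning_rate": 8.879166666666668e-05},
    {"step": 2680, "grad_norm": 1.316340684890747, "learning_rate": 8.837500000000001e-05},
    {"step": 2690, "grad_norm": 24.226320266723633, "learning_rate": 8.795833333333335e-05},
    {"step": 2700, "grad_norm": 1.285845160484314, "learning_rate": 8.754166666666666e-05},
    {"step": 2710, "grad_norm": 1.0836328268051147, "learning_rate": 8.7125e-05},
]

MAX_STEPS = 4800        # "max_steps" in the state file
ASSUMED_PEAK_LR = 2e-4  # assumption: inferred from the logged values, not stated in the diff


def find_grad_norm_spikes(entries, factor=5.0):
    """Return the steps whose grad_norm exceeds `factor` times the median grad_norm."""
    threshold = factor * median(e["grad_norm"] for e in entries)
    return [e["step"] for e in entries if e["grad_norm"] > threshold]


def linear_decay_lr(step, peak_lr=ASSUMED_PEAK_LR, max_steps=MAX_STEPS):
    """Linear decay to zero without warmup, evaluated one step behind the logged step."""
    return peak_lr * (max_steps - (step - 1)) / max_steps


spike_steps = find_grad_norm_spikes(LOG_HISTORY)
schedule_matches = all(
    math.isclose(e["learning_rate"], linear_decay_lr(e["step"]), rel_tol=1e-9)
    for e in LOG_HISTORY
)
print(spike_steps, schedule_matches)
```

The loss at step 2690 (1.46) is in line with its neighbours, so the spike may be a single noisy batch. Whether gradient clipping was active cannot be read from this file.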

adapters_backup/checkpoint-3200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
+size 5713
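
Review note: the three added lines are a Git LFS pointer, not the binary itself. A pointer records the spec version, the SHA-256 object ID of the real file, and its size in bytes. The following is a minimal sketch of reading such a pointer, with the diff's leading `+` already removed; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version URL, hash algorithm, digest and size."""
    # Each line is "<key> <value>"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the hunk above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
size 5713
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["algorithm"], pointer["size"])
```

Comparing the `digest` values of two pointers is enough to tell whether the underlying files are byte-identical, without downloading either one.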

adapters_backup/checkpoint-4800/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: LiquidAI/LFM2.5-1.2B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:LiquidAI/LFM2.5-1.2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

adapters_backup/checkpoint-4800/adapter_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "LiquidAI/LFM2.5-1.2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "w1",
+    "out_proj",
+    "w3",
+    "w2",
+    "v_proj",
+    "in_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
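
Review note: with `lora_alpha` 16, `r` 8 and `use_rslora` false, the low-rank update in this config is scaled by `lora_alpha / r`. The sketch below computes that factor and, for comparison, the rank-stabilised variant `lora_alpha / sqrt(r)` that would apply if `use_rslora` were true. `lora_scaling` is a hypothetical helper written for illustration; it is not a PEFT function.

```python
import math


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Factor applied to the low-rank update B @ A before it is added to the base weight."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)  # rank-stabilised LoRA
    return lora_alpha / r                 # standard LoRA


# Values taken from the adapter_config.json above.
scaling = lora_scaling(lora_alpha=16, r=8, use_rslora=False)
rslora_scaling = lora_scaling(lora_alpha=16, r=8, use_rslora=True)
print(scaling, round(rslora_scaling, 3))
```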

adapters_backup/checkpoint-4800/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a19d950faf1cff366b898e918ccf3219ec7b5afe8fd3eda00c1064a2aa7e3423
+size 22240880

adapters_backup/checkpoint-4800/chat_template.jinja ADDED

	@@ -0,0 +1,45 @@

+{{- bos_token -}}
+{%- set keep_past_thinking = keep_past_thinking | default(false) -%}
+{%- set ns = namespace(system_prompt="") -%}
+{%- if messages[0]["role"] == "system" -%}
+    {%- set ns.system_prompt = messages[0]["content"] -%}
+    {%- set messages = messages[1:] -%}
+{%- endif -%}
+{%- if tools -%}
+    {%- set ns.system_prompt = ns.system_prompt + ("\n" if ns.system_prompt else "") + "List of tools: [" -%}
+    {%- for tool in tools -%}
+        {%- if tool is not string -%}
+            {%- set tool = tool | tojson -%}
+        {%- endif -%}
+        {%- set ns.system_prompt = ns.system_prompt + tool -%}
+        {%- if not loop.last -%}
+            {%- set ns.system_prompt = ns.system_prompt + ", " -%}
+        {%- endif -%}
+    {%- endfor -%}
+    {%- set ns.system_prompt = ns.system_prompt + "]" -%}
+{%- endif -%}
+{%- if ns.system_prompt -%}
+    {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
+{%- endif -%}
+{%- set ns.last_assistant_index = -1 -%}
+{%- for message in messages -%}
+    {%- if message["role"] == "assistant" -%}
+        {%- set ns.last_assistant_index = loop.index0 -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- for message in messages -%}
+    {{- "<|im_start|>" + message["role"] + "\n" -}}
+    {%- set content = message["content"] -%}
+    {%- if content is not string -%}
+        {%- set content = content | tojson -%}
+    {%- endif -%}
+    {%- if message["role"] == "assistant" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}
+        {%- if "</think>" in content -%}
+            {%- set content = content.split("</think>")[-1] | trim -%}
+        {%- endif -%}
+    {%- endif -%}
+    {{- content + "<|im_end|>\n" -}}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{- "<|im_start|>assistant\n" -}}
+{%- endif -%}
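
Review note: the template above emits the BOS token, an optional system block, then one `<|im_start|>role ... <|im_end|>` block per message, and strips everything up to `</think>` from every assistant turn except the last unless `keep_past_thinking` is set. The following is a rough pure-Python approximation for conversations without tools. `render_chat` is a hypothetical function, the tool-list branch is omitted, and non-string content is serialised with `json.dumps`, which may differ in escaping from the template's `tojson` filter.

```python
import json


def render_chat(messages, bos_token="<|startoftext|>",
                add_generation_prompt=False, keep_past_thinking=False):
    """Approximate the chat template above for conversations without tools."""
    parts = [bos_token]

    # A leading system message is pulled out and emitted first (if non-empty).
    system_prompt = ""
    if messages and messages[0]["role"] == "system":
        system_prompt = messages[0]["content"]
        messages = messages[1:]
    if system_prompt:
        parts.append("<|im_start|>system\n" + system_prompt + "<|im_end|>\n")

    # Index of the last assistant turn among the remaining messages.
    last_assistant_index = -1
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            last_assistant_index = index

    for index, message in enumerate(messages):
        content = message["content"]
        if not isinstance(content, str):
            content = json.dumps(content)
        earlier_assistant = (
            message["role"] == "assistant" and index != last_assistant_index
        )
        # Earlier assistant turns lose their reasoning unless it is kept explicitly.
        if earlier_assistant and not keep_past_thinking and "</think>" in content:
            content = content.split("</think>")[-1].strip()
        parts.append("<|im_start|>" + message["role"] + "\n" + content + "<|im_end|>\n")

    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "<think>plan</think>Hello!"},
    {"role": "user", "content": "Again?"},
    {"role": "assistant", "content": "<think>more</think>Sure."},
]
rendered = render_chat(conversation)
print(rendered)
```

For real use, the tokenizer's own `apply_chat_template` should be preferred, since it renders the Jinja source directly.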

adapters_backup/checkpoint-4800/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f95927a73cced9aa2b457cad481038484e0ee2dc9926a320ba0d4740ea301ba2
+size 44583435

adapters_backup/checkpoint-4800/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dba4fde4ee04d2f472bb4dea96a48e8fdf7891d2b0694a8f012e8133a2e176ae
+size 14455

adapters_backup/checkpoint-4800/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ec6662961b577a17b223e71f2c49f73003734d324c1057bf78b9d94b11f83fa
+size 1465

adapters_backup/checkpoint-4800/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

adapters_backup/checkpoint-4800/tokenizer_config.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "is_local": false,
+  "legacy": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|pad|>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "use_default_system_prompt": false,
+  "use_fast": true
+}
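
The `model_max_length` value in this config is not a real context limit. It equals `int(1e30)`, which matches the placeholder that Transformers writes when no maximum length is configured; reading it that way is an assumption, since the commit does not say so. A minimal sketch of how a caller might detect it, where `SENTINEL`, `effective_max_length`, and the fallback of 4096 are hypothetical names and values chosen for illustration:

```python
import json

# Fragment of the tokenizer_config.json shown above.
config = json.loads('{"model_max_length": 1000000000000000019884624838656}')

# The float 1e30 converted exactly to an integer.
SENTINEL = int(1e30)


def effective_max_length(cfg, fallback):
    """Return the configured limit, or `fallback` when the value is the sentinel."""
    value = cfg.get("model_max_length", SENTINEL)
    return fallback if value >= SENTINEL else value


is_sentinel = config["model_max_length"] == SENTINEL
# 4096 is an arbitrary illustrative fallback, not a value taken from this commit.
limit = effective_max_length(config, fallback=4096)
print(is_sentinel, limit)
```

In practice the usable sequence length would come from the base model's own configuration or from the training arguments, neither of which is visible in this diff.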

adapters_backup/checkpoint-4800/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

adapters_backup/checkpoint-4800/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
+size 5713

adapters_backup/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9f3d474fca8712f4970235089141cc3151ec0251001f0277101040ba3e632c1d
-size 5585
+oid sha256:5bd3e5abc6ef5bc38efc338fc4014b24c23c1bf16f86b2ba243374bd94c6e850
+size 5713

adapters_full/README.md ADDED

	@@ -0,0 +1,62 @@

+---
+base_model: LiquidAI/LFM2.5-1.2B-Instruct
+library_name: peft
+model_name: adapters_full
+tags:
+- base_model:adapter:LiquidAI/LFM2.5-1.2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+licence: license
+pipeline_tag: text-generation
+---
+# Model Card for adapters_full
+This model is a fine-tuned version of [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="None", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- PEFT 0.18.1
+- TRL: 0.29.1
+- Transformers: 5.4.0
+- Pytorch: 2.11.0
+- Datasets: 4.8.4
+- Tokenizers: 0.22.2
+## Citations
+Cite TRL as:
+```bibtex
+@software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
+}
+```

adapters_full/adapter_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "LiquidAI/LFM2.5-1.2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "w2",
+    "v_proj",
+    "w1",
+    "out_proj",
+    "w3",
+    "q_proj",
+    "in_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
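
For reviewers reading this config: with `use_rslora` set to `false`, the LoRA update is scaled by `lora_alpha / r`, which is 16 / 8 = 2.0 here. The sketch below computes that factor from a fragment of the config and estimates the added parameters for one adapted linear layer. The helper names and the 2048 x 2048 layer shape are hypothetical; the real layer dimensions of the base model are not shown in this diff.

```python
import json
import math

# Fragment of the adapter_config.json shown above.
adapter_config = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "lora_dropout": 0.1,
  "use_rslora": false,
  "target_modules": ["k_proj", "w2", "v_proj", "w1", "out_proj", "w3", "q_proj", "in_proj"]
}
""")


def lora_scaling(cfg):
    """Scaling applied to the low-rank update B @ A."""
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_linear(d_in, d_out, r):
    """Parameters added by one adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(adapter_config)
# Hypothetical square layer, used only to show the arithmetic.
example_params = lora_params_per_linear(2048, 2048, adapter_config["r"])
print(scaling, example_params)
```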

adapters_full/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df8b345a42da3d625e48900fef0f25bfb500e98ae3a2ec441f5ba90a214daed8
+size 22240880
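
The three-line entries for binary files in this commit are Git LFS pointers, not the files themselves: each records the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of reading one, assuming the simple `key value` layout seen here; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:df8b345a42da3d625e48900fef0f25bfb500e98ae3a2ec441f5ba90a214daed8\n"
    "size 22240880\n"
)
size_mib = pointer["size"] / (1024 * 1024)
print(pointer["hash_algo"], pointer["size"], round(size_mib, 2))
```

Comparing the `oid` of two pointers is enough to tell whether two paths hold identical bytes, which is how the diff shows that `adapters_backup/training_args.bin` now matches the `checkpoint-4800` copy.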

adapters_full/chat_template.jinja ADDED

	@@ -0,0 +1,45 @@

+{{- bos_token -}}
+{%- set keep_past_thinking = keep_past_thinking | default(false) -%}
+{%- set ns = namespace(system_prompt="") -%}
+{%- if messages[0]["role"] == "system" -%}
+    {%- set ns.system_prompt = messages[0]["content"] -%}
+    {%- set messages = messages[1:] -%}
+{%- endif -%}
+{%- if tools -%}
+    {%- set ns.system_prompt = ns.system_prompt + ("\n" if ns.system_prompt else "") + "List of tools: [" -%}
+    {%- for tool in tools -%}
+        {%- if tool is not string -%}
+            {%- set tool = tool | tojson -%}
+        {%- endif -%}
+        {%- set ns.system_prompt = ns.system_prompt + tool -%}
+        {%- if not loop.last -%}
+            {%- set ns.system_prompt = ns.system_prompt + ", " -%}
+        {%- endif -%}
+    {%- endfor -%}
+    {%- set ns.system_prompt = ns.system_prompt + "]" -%}
+{%- endif -%}
+{%- if ns.system_prompt -%}
+    {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
+{%- endif -%}
+{%- set ns.last_assistant_index = -1 -%}
+{%- for message in messages -%}
+    {%- if message["role"] == "assistant" -%}
+        {%- set ns.last_assistant_index = loop.index0 -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- for message in messages -%}
+    {{- "<|im_start|>" + message["role"] + "\n" -}}
+    {%- set content = message["content"] -%}
+    {%- if content is not string -%}
+        {%- set content = content | tojson -%}
+    {%- endif -%}
+    {%- if message["role"] == "assistant" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}
+        {%- if "</think>" in content -%}
+            {%- set content = content.split("</think>")[-1] | trim -%}
+        {%- endif -%}
+    {%- endif -%}
+    {{- content + "<|im_end|>\n" -}}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{- "<|im_start|>assistant\n" -}}
+{%- endif -%}
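
To make the template's behavior easier to review, here is a plain-Python re-implementation of the same steps: fold an optional system message and tool list into one system turn, find the last assistant turn, strip everything up to `</think>` from earlier assistant turns unless `keep_past_thinking` is set, and optionally append the generation prompt. This is an illustrative sketch, not the code that renders the template. `render_chat` is a hypothetical name, `json.dumps` only approximates the `tojson` filter, and the guard for an empty message list is an addition of the sketch.

```python
import json


def render_chat(messages, bos_token="<|startoftext|>", tools=None,
                add_generation_prompt=False, keep_past_thinking=False):
    """Mirror the chat template's control flow in plain Python."""
    system_prompt = ""
    if messages and messages[0]["role"] == "system":
        system_prompt = messages[0]["content"]
        messages = messages[1:]
    if tools:
        rendered = [t if isinstance(t, str) else json.dumps(t) for t in tools]
        system_prompt += ("\n" if system_prompt else "")
        system_prompt += "List of tools: [" + ", ".join(rendered) + "]"

    out = bos_token
    if system_prompt:
        out += "<|im_start|>system\n" + system_prompt + "<|im_end|>\n"

    # Index of the final assistant turn, or -1 if there is none.
    last_assistant = -1
    for i, message in enumerate(messages):
        if message["role"] == "assistant":
            last_assistant = i

    for i, message in enumerate(messages):
        content = message["content"]
        if not isinstance(content, str):
            content = json.dumps(content)
        is_past_assistant = message["role"] == "assistant" and i != last_assistant
        if is_past_assistant and not keep_past_thinking and "</think>" in content:
            # Keep only the text after the last closing think tag.
            content = content.split("</think>")[-1].strip()
        out += "<|im_start|>" + message["role"] + "\n" + content + "<|im_end|>\n"

    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out


example = render_chat(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "<think>plan</think> Hello"},
        {"role": "user", "content": "Again?"},
        {"role": "assistant", "content": "<think>x</think>Sure"},
    ],
    add_generation_prompt=True,
)
print(example)
```

In this example the first assistant turn loses its `<think>` block while the last one keeps it, which is the distinction the `ns.last_assistant_index` bookkeeping exists to make.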

adapters_full/checkpoint-4000/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: LiquidAI/LFM2.5-1.2B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:LiquidAI/LFM2.5-1.2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

adapters_full/checkpoint-4000/adapter_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "LiquidAI/LFM2.5-1.2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "w2",
+    "v_proj",
+    "w1",
+    "out_proj",
+    "w3",
+    "q_proj",
+    "in_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

adapters_full/checkpoint-4000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f82936a543f035d2e7611a9778af665ac48923d9405d08bacefb5ba93a551713
+size 22240880

adapters_full/checkpoint-4000/chat_template.jinja ADDED

	@@ -0,0 +1,45 @@

+{{- bos_token -}}
+{%- set keep_past_thinking = keep_past_thinking | default(false) -%}
+{%- set ns = namespace(system_prompt="") -%}
+{%- if messages[0]["role"] == "system" -%}
+    {%- set ns.system_prompt = messages[0]["content"] -%}
+    {%- set messages = messages[1:] -%}
+{%- endif -%}
+{%- if tools -%}
+    {%- set ns.system_prompt = ns.system_prompt + ("\n" if ns.system_prompt else "") + "List of tools: [" -%}
+    {%- for tool in tools -%}
+        {%- if tool is not string -%}
+            {%- set tool = tool | tojson -%}
+        {%- endif -%}
+        {%- set ns.system_prompt = ns.system_prompt + tool -%}
+        {%- if not loop.last -%}
+            {%- set ns.system_prompt = ns.system_prompt + ", " -%}
+        {%- endif -%}
+    {%- endfor -%}
+    {%- set ns.system_prompt = ns.system_prompt + "]" -%}
+{%- endif -%}
+{%- if ns.system_prompt -%}
+    {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
+{%- endif -%}
+{%- set ns.last_assistant_index = -1 -%}
+{%- for message in messages -%}
+    {%- if message["role"] == "assistant" -%}
+        {%- set ns.last_assistant_index = loop.index0 -%}
+    {%- endif -%}
+{%- endfor -%}
+{%- for message in messages -%}
+    {{- "<|im_start|>" + message["role"] + "\n" -}}
+    {%- set content = message["content"] -%}
+    {%- if content is not string -%}
+        {%- set content = content | tojson -%}
+    {%- endif -%}
+    {%- if message["role"] == "assistant" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}
+        {%- if "</think>" in content -%}
+            {%- set content = content.split("</think>")[-1] | trim -%}
+        {%- endif -%}
+    {%- endif -%}
+    {{- content + "<|im_end|>\n" -}}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{- "<|im_start|>assistant\n" -}}
+{%- endif -%}

adapters_full/checkpoint-4000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ea8d80b197a627dfcd71b4efefa8eff92e645e4d70bf0afee75f9e1649ec1a1
+size 44583435

adapters_full/checkpoint-4000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2cddf27219365242ec1046a3532a63a24c3f350c77f100e4f973369db2cc849d
+size 14455

adapters_full/checkpoint-4000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d0a253ec264f70d0620c7f9af3c0e7bd68f7b456dd006e553483387f18b4cfe
+size 1465

adapters_full/checkpoint-4000/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

adapters_full/checkpoint-4000/tokenizer_config.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "is_local": false,
+  "legacy": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|pad|>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "use_default_system_prompt": false,
+  "use_fast": true
+}

adapters_full/checkpoint-4000/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff