Aananda-giri committed on Dec 15, 2025

Commit 8b35fe8 (verified)

1 Parent(s): bee823c

Upload Thera dialogue fine-tuned model

Files changed (22)

.gitattributes +2 -0
README.md +4 -4
checkpoint-919/README.md +209 -0
checkpoint-919/adapter_config.json +46 -0
checkpoint-919/adapter_model.safetensors +3 -0
checkpoint-919/added_tokens.json +28 -0
checkpoint-919/chat_template.jinja +89 -0
checkpoint-919/merges.txt +0 -0
checkpoint-919/optimizer.pt +3 -0
checkpoint-919/rng_state.pth +3 -0
checkpoint-919/scheduler.pt +3 -0
checkpoint-919/special_tokens_map.json +31 -0
checkpoint-919/tokenizer.json +3 -0
checkpoint-919/tokenizer_config.json +239 -0
checkpoint-919/trainer_state.json +955 -0
checkpoint-919/training_args.bin +3 -0
checkpoint-919/vocab.json +0 -0
crypto-thera-adapters/adapter_config.json +46 -0
crypto-thera-adapters/adapter_model.safetensors +3 -0
crypto-thera-adapters/loss_plots.png +3 -0
crypto-thera-adapters/training_args.bin +3 -0
crypto-thera-adapters/training_history.json +932 -0

.gitattributes CHANGED

@@ -37,3 +37,5 @@ tokenizer.json filter=lfs diff=lfs merge=lfs -text
 Thera-adapters/loss_plots.png filter=lfs diff=lfs merge=lfs -text
 checkpoint-405/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 checkpoint-810/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-919/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+crypto-thera-adapters/loss_plots.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 base_model: Qwen/Qwen3-4B
 library_name: peft
-model_name: CryptoStatuette-qwen-finetuned
+model_name: Thera-qwen-finetuned
 tags:
 - base_model:adapter:Qwen/Qwen3-4B
 - lora
@@ -12,7 +12,7 @@ licence: license
 pipeline_tag: text-generation
 ---
-# Model Card for CryptoStatuette-qwen-finetuned
+# Model Card for Thera-qwen-finetuned
 This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B).
 It has been trained using [TRL](https://github.com/huggingface/trl).
@@ -38,8 +38,8 @@ This model was trained with SFT.
 ### Framework versions
 - PEFT 0.18.0
-- TRL: 0.25.1
-- Transformers: 4.57.2
+- TRL: 0.26.1
+- Transformers: 4.57.3
 - Pytorch: 2.9.0+cu126
 - Datasets: 4.0.0
 - Tokenizers: 0.22.1

checkpoint-919/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: Qwen/Qwen3-4B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3-4B
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-919/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3-4B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "down_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "v_proj",
+    "gate_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
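The adapter configuration above implies a LoRA scaling factor and an adapter size that can be checked with simple arithmetic. The sketch below is a rough estimate, not the author's code. The Qwen3-4B dimensions it uses (hidden size 2560, MLP intermediate size 9728, 36 layers, 32 query heads and 8 key/value heads of dimension 128) are assumptions taken from the base model's published configuration and do not appear in this commit.

```python
# Values copied from checkpoint-919/adapter_config.json.
R = 64
LORA_ALPHA = 128
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

# ASSUMED Qwen3-4B dimensions (not stated anywhere in this commit).
HIDDEN = 2560
INTERMEDIATE = 9728
NUM_LAYERS = 36
Q_OUT = 32 * 128   # query heads * head dim
KV_OUT = 8 * 128   # key/value heads * head dim

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, shapes, modules, num_layers):
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * num_layers


# With use_rslora=false, the LoRA update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R
n_params = lora_param_count(R, SHAPES, TARGET_MODULES, NUM_LAYERS)
fp32_bytes = n_params * 4

print(f"scaling={scaling}, params={n_params:,}, fp32 bytes={fp32_bytes:,}")
```

Under these assumptions the fp32 estimate is 528,482,304 bytes, about 68 KB less than the 528,550,256 bytes recorded for checkpoint-919/adapter_model.safetensors in this commit. That gap would be consistent with adapter weights stored in fp32 plus a safetensors header, though the commit itself does not state the dtype.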

checkpoint-919/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027
+size 528550256
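The three added lines are a Git LFS pointer rather than the weights themselves: a spec version, the SHA-256 object ID of the real file, and its size in bytes. As a minimal sketch, a pointer like this can be parsed with plain string handling. The helper name `parse_lfs_pointer` is hypothetical and only illustrates the format.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above, without the leading "+".
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027\n"
    "size 528550256\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The same helper applies to the other pointer files in this commit (optimizer.pt, rng_state.pth, scheduler.pt, tokenizer.json), since they share the format.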

checkpoint-919/added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-919/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
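When an assistant message has no separate `reasoning_content` field, the template above recovers the reasoning by splitting the message text on the `<think>` and `</think>` markers. The Python sketch below mirrors only that fallback branch so its behavior is easier to follow. The function name `split_reasoning` is hypothetical, and the sketch does not reproduce the rest of the template (tool calls, `last_query_index` handling, or the generation prompt).

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) using the same string operations as the template."""
    if "</think>" not in content:
        # No closing marker: the template leaves reasoning empty and content unchanged.
        return "", content
    # Text before the first </think>, then after the last <think> inside it.
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    # Text after the last </think>, with leading newlines removed.
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\nUser greets me.\n</think>\n\nHello!")
print(repr(reasoning), repr(answer))
```

Note that the template only writes the reasoning back into the prompt for assistant turns after the last real user query; for earlier turns it emits the answer text alone.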

checkpoint-919/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-919/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c4027bb9ddd9c7da1a321d305718e185b0225c664c4c005e7ef619fb25289fbc
+size 1057397963

checkpoint-919/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1b9a669990eaa12fac0cc394f2dc8ae57aa35afb582a495f244c1744ad72a8a
+size 14645

checkpoint-919/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:625b5b26600a0c8a228296cfb69e3a316e7cbd299ada136d4cc35aa98a57707f
+size 1465

checkpoint-919/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-919/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

checkpoint-919/tokenizer_config.json ADDED

	@@ -0,0 +1,239 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-919/trainer_state.json ADDED

	@@ -0,0 +1,955 @@

+{
+  "best_global_step": 919,
+  "best_metric": 0.5503931641578674,
+  "best_model_checkpoint": "Thera-qwen-finetuned/checkpoint-919",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 919,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.4115246817469598,
+      "epoch": 0.010887316276537834,
+      "grad_norm": 2.907423257827759,
+      "learning_rate": 4.891304347826087e-05,
+      "loss": 3.3033,
+      "mean_token_accuracy": 0.5003452189266682,
+      "num_tokens": 7489.0,
+      "step": 10
+    },
+    {
+      "entropy": 1.1823123581707478,
+      "epoch": 0.021774632553075667,
+      "grad_norm": 1.0035202503204346,
+      "learning_rate": 0.0001032608695652174,
+      "loss": 1.0891,
+      "mean_token_accuracy": 0.807713833451271,
+      "num_tokens": 14981.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.7195636250078679,
+      "epoch": 0.0326619488296135,
+      "grad_norm": 0.7734199166297913,
+      "learning_rate": 0.0001576086956521739,
+      "loss": 0.7185,
+      "mean_token_accuracy": 0.8713902071118355,
+      "num_tokens": 22399.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.598898995667696,
+      "epoch": 0.043549265106151334,
+      "grad_norm": 0.737375020980835,
+      "learning_rate": 0.00021195652173913043,
+      "loss": 0.6231,
+      "mean_token_accuracy": 0.8840597704052925,
+      "num_tokens": 29741.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.6763634454458952,
+      "epoch": 0.05443658138268917,
+      "grad_norm": 0.77037113904953,
+      "learning_rate": 0.000266304347826087,
+      "loss": 0.7017,
+      "mean_token_accuracy": 0.8767582342028618,
+      "num_tokens": 37112.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.6615333516150713,
+      "epoch": 0.065323897659227,
+      "grad_norm": 0.9717927575111389,
+      "learning_rate": 0.00032065217391304346,
+      "loss": 0.7214,
+      "mean_token_accuracy": 0.8725218087434768,
+      "num_tokens": 44590.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.665872010961175,
+      "epoch": 0.07621121393576484,
+      "grad_norm": 0.6530428528785706,
+      "learning_rate": 0.000375,
+      "loss": 0.6938,
+      "mean_token_accuracy": 0.8677900999784469,
+      "num_tokens": 52209.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.6730144165456295,
+      "epoch": 0.08709853021230267,
+      "grad_norm": 0.8795536756515503,
+      "learning_rate": 0.00042934782608695655,
+      "loss": 0.6937,
+      "mean_token_accuracy": 0.8736036345362663,
+      "num_tokens": 59627.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.5524990629404783,
+      "epoch": 0.0979858464888405,
+      "grad_norm": 2.4545137882232666,
+      "learning_rate": 0.00048369565217391304,
+      "loss": 0.6051,
+      "mean_token_accuracy": 0.8858248308300972,
+      "num_tokens": 66878.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.7578814126551151,
+      "epoch": 0.10887316276537834,
+      "grad_norm": 1.7287325859069824,
+      "learning_rate": 0.0004999116169004186,
+      "loss": 0.7764,
+      "mean_token_accuracy": 0.8569089323282242,
+      "num_tokens": 74378.0,
+      "step": 100
+    },
+    {
+      "entropy": 0.6519227813929319,
+      "epoch": 0.11976047904191617,
+      "grad_norm": 0.6661782264709473,
+      "learning_rate": 0.0004994788705196884,
+      "loss": 0.7017,
+      "mean_token_accuracy": 0.8656885161995888,
+      "num_tokens": 81971.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.6764750100672245,
+      "epoch": 0.130647795318454,
+      "grad_norm": 0.5872151255607605,
+      "learning_rate": 0.0004986861508565064,
+      "loss": 0.6964,
+      "mean_token_accuracy": 0.867901885509491,
+      "num_tokens": 89448.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.6099378645420075,
+      "epoch": 0.14153511159499182,
+      "grad_norm": 0.569556474685669,
+      "learning_rate": 0.0004975346017267744,
+      "loss": 0.6559,
+      "mean_token_accuracy": 0.877436301112175,
+      "num_tokens": 96736.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.7155832894146442,
+      "epoch": 0.15242242787152968,
+      "grad_norm": 1.407532811164856,
+      "learning_rate": 0.000496025884701748,
+      "loss": 0.7803,
+      "mean_token_accuracy": 0.861001954972744,
+      "num_tokens": 104199.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.6367012947797775,
+      "epoch": 0.1633097441480675,
+      "grad_norm": 0.7547012567520142,
+      "learning_rate": 0.0004941621767105542,
+      "loss": 0.66,
+      "mean_token_accuracy": 0.8748047932982445,
+      "num_tokens": 111648.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.6732052929699421,
+      "epoch": 0.17419706042460534,
+      "grad_norm": 6.324690341949463,
+      "learning_rate": 0.0004919461668990982,
+      "loss": 0.7214,
+      "mean_token_accuracy": 0.869445288181305,
+      "num_tokens": 119010.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.7065097156912088,
+      "epoch": 0.18508437670114317,
+      "grad_norm": 0.6715346574783325,
+      "learning_rate": 0.0004893810527498928,
+      "loss": 0.7486,
+      "mean_token_accuracy": 0.867114982008934,
+      "num_tokens": 126525.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.6536071257665753,
+      "epoch": 0.195971692977681,
+      "grad_norm": 0.8438016176223755,
+      "learning_rate": 0.0004864705354684076,
+      "loss": 0.6745,
+      "mean_token_accuracy": 0.8712003126740455,
+      "num_tokens": 133957.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.7786130860447884,
+      "epoch": 0.20685900925421882,
+      "grad_norm": 11.108195304870605,
+      "learning_rate": 0.00048321881464259676,
+      "loss": 0.8311,
+      "mean_token_accuracy": 0.8524066478013992,
+      "num_tokens": 141635.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.966428604722023,
+      "epoch": 0.21774632553075668,
+      "grad_norm": 0.9511222243309021,
+      "learning_rate": 0.0004796305821833098,
+      "loss": 1.1536,
+      "mean_token_accuracy": 0.8078579772263765,
+      "num_tokens": 148904.0,
+      "step": 200
+    },
+    {
+      "entropy": 0.7515945039689541,
+      "epoch": 0.2286336418072945,
+      "grad_norm": 1.1783875226974487,
+      "learning_rate": 0.00047571101555432896,
+      "loss": 0.7879,
+      "mean_token_accuracy": 0.8544724628329277,
+      "num_tokens": 156485.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.7837981097400188,
+      "epoch": 0.23952095808383234,
+      "grad_norm": 6.818761348724365,
+      "learning_rate": 0.0004714657703018024,
+      "loss": 0.8526,
+      "mean_token_accuracy": 0.8500533938407898,
+      "num_tokens": 164052.0,
+      "step": 220
+    },
+    {
+      "entropy": 2.9163815125823023,
+      "epoch": 0.25040827436037016,
+      "grad_norm": 30.283111572265625,
+      "learning_rate": 0.0004669009718938517,
+      "loss": 4.5265,
+      "mean_token_accuracy": 0.5338523894548416,
+      "num_tokens": 171458.0,
+      "step": 230
+    },
+    {
+      "entropy": 2.0303648129105567,
+      "epoch": 0.261295590636908,
+      "grad_norm": 115.25823211669922,
+      "learning_rate": 0.00046202320688212834,
+      "loss": 2.5513,
+      "mean_token_accuracy": 0.6328982267528772,
+      "num_tokens": 179120.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.7287540748715401,
+      "epoch": 0.2721829069134458,
+      "grad_norm": 8.124970436096191,
+      "learning_rate": 0.00045683951339807265,
+      "loss": 0.7502,
+      "mean_token_accuracy": 0.8621942937374115,
+      "num_tokens": 186544.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.6197790112346411,
+      "epoch": 0.28307022318998365,
+      "grad_norm": 0.8240286111831665,
+      "learning_rate": 0.0004513573709975877,
+      "loss": 0.6533,
+      "mean_token_accuracy": 0.8799639001488686,
+      "num_tokens": 193880.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.5914675913751125,
+      "epoch": 0.2939575394665215,
+      "grad_norm": 0.7374333143234253,
+      "learning_rate": 0.0004455846898687814,
+      "loss": 0.6144,
+      "mean_token_accuracy": 0.8865744516253471,
+      "num_tokens": 201144.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.6894511558115483,
+      "epoch": 0.30484485574305936,
+      "grad_norm": 0.5616167783737183,
+      "learning_rate": 0.00043952979941834925,
+      "loss": 0.7368,
+      "mean_token_accuracy": 0.869646517932415,
+      "num_tokens": 208571.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.7011830236762762,
+      "epoch": 0.3157321720195972,
+      "grad_norm": 0.9096492528915405,
+      "learning_rate": 0.0004332014362530659,
+      "loss": 0.7601,
+      "mean_token_accuracy": 0.8600849106907844,
+      "num_tokens": 216262.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.6535980202257633,
+      "epoch": 0.326619488296135,
+      "grad_norm": 0.798920214176178,
+      "learning_rate": 0.00042660873157372763,
+      "loss": 0.6919,
+      "mean_token_accuracy": 0.8731523841619492,
+      "num_tokens": 223604.0,
+      "step": 300
+    },
+    {
+      "entropy": 0.7168508902192116,
+      "epoch": 0.33750680457267285,
+      "grad_norm": 0.7000783085823059,
+      "learning_rate": 0.00041976119799973477,
+      "loss": 0.7312,
+      "mean_token_accuracy": 0.862776193022728,
+      "num_tokens": 231179.0,
+      "step": 310
+    },
+    {
+      "entropy": 0.542249171063304,
+      "epoch": 0.3483941208492107,
+      "grad_norm": 0.6315221190452576,
+      "learning_rate": 0.00041266871584332454,
+      "loss": 0.5784,
+      "mean_token_accuracy": 0.8910727813839913,
+      "num_tokens": 238379.0,
+      "step": 320
+    },
+    {
+      "entropy": 0.7523471737280488,
+      "epoch": 0.3592814371257485,
+      "grad_norm": 0.7601750493049622,
+      "learning_rate": 0.0004053415188532599,
+      "loss": 0.7976,
+      "mean_token_accuracy": 0.8552964687347412,
+      "num_tokens": 246037.0,
+      "step": 330
+    },
+    {
+      "entropy": 0.6234626328572631,
+      "epoch": 0.37016875340228633,
+      "grad_norm": 0.7975150346755981,
+      "learning_rate": 0.0003977901794485446,
+      "loss": 0.6718,
+      "mean_token_accuracy": 0.8762789338827133,
+      "num_tokens": 253413.0,
+      "step": 340
+    },
+    {
+      "entropy": 0.6043397862464189,
+      "epoch": 0.38105606967882416,
+      "grad_norm": 0.6521063446998596,
+      "learning_rate": 0.0003900255934634699,
+      "loss": 0.6134,
+      "mean_token_accuracy": 0.8838247537612915,
+      "num_tokens": 260806.0,
+      "step": 350
+    },
+    {
+      "entropy": 0.6836169838905335,
+      "epoch": 0.391943385955362,
+      "grad_norm": 0.9141950607299805,
+      "learning_rate": 0.0003820589644260065,
+      "loss": 0.7149,
+      "mean_token_accuracy": 0.8627639308571815,
+      "num_tokens": 268395.0,
+      "step": 360
+    },
+    {
+      "entropy": 0.6906204871833325,
+      "epoch": 0.4028307022318998,
+      "grad_norm": 0.6039382815361023,
+      "learning_rate": 0.00037390178739222363,
+      "loss": 0.749,
+      "mean_token_accuracy": 0.8611163705587387,
+      "num_tokens": 276103.0,
+      "step": 370
+    },
+    {
+      "entropy": 0.7154507145285607,
+      "epoch": 0.41371801850843765,
+      "grad_norm": 0.7205569744110107,
+      "learning_rate": 0.00036556583236006237,
+      "loss": 0.7435,
+      "mean_token_accuracy": 0.8620465710759163,
+      "num_tokens": 283734.0,
+      "step": 380
+    },
+    {
+      "entropy": 0.6251850917935371,
+      "epoch": 0.42460533478497553,
+      "grad_norm": 0.7158589959144592,
+      "learning_rate": 0.0003570631272863956,
+      "loss": 0.638,
+      "mean_token_accuracy": 0.8804232597351074,
+      "num_tokens": 291104.0,
+      "step": 390
+    },
+    {
+      "entropy": 0.655794395133853,
+      "epoch": 0.43549265106151336,
+      "grad_norm": 0.6197627186775208,
+      "learning_rate": 0.0003484059407318781,
+      "loss": 0.7033,
+      "mean_token_accuracy": 0.8641186743974686,
+      "num_tokens": 298667.0,
+      "step": 400
+    },
+    {
+      "entropy": 0.6755085363984108,
+      "epoch": 0.4463799673380512,
+      "grad_norm": 0.6667740345001221,
+      "learning_rate": 0.00033960676415863015,
+      "loss": 0.724,
+      "mean_token_accuracy": 0.8660084888339042,
+      "num_tokens": 306246.0,
+      "step": 410
+    },
+    {
+      "entropy": 0.6676596872508526,
+      "epoch": 0.457267283614589,
+      "grad_norm": 0.7817707061767578,
+      "learning_rate": 0.00033067829390629453,
+      "loss": 0.6793,
+      "mean_token_accuracy": 0.8715122014284133,
+      "num_tokens": 313691.0,
+      "step": 420
+    },
+    {
+      "entropy": 0.7171908929944039,
+      "epoch": 0.46815459989112684,
+      "grad_norm": 1.0613240003585815,
+      "learning_rate": 0.00032163341287247876,
+      "loss": 0.7552,
+      "mean_token_accuracy": 0.8618355020880699,
+      "num_tokens": 321394.0,
+      "step": 430
+    },
+    {
+      "entropy": 0.5122752044349909,
+      "epoch": 0.47904191616766467,
+      "grad_norm": 0.649142324924469,
+      "learning_rate": 0.00031248517192400876,
+      "loss": 0.5586,
+      "mean_token_accuracy": 0.8886241987347603,
+      "num_tokens": 328807.0,
+      "step": 440
+    },
+    {
+      "entropy": 0.5436007279902697,
+      "epoch": 0.4899292324442025,
+      "grad_norm": 0.5830965042114258,
+      "learning_rate": 0.0003032467710658231,
+      "loss": 0.5397,
+      "mean_token_accuracy": 0.8886899709701538,
+      "num_tokens": 336117.0,
+      "step": 450
+    },
+    {
+      "entropy": 0.6272627621889114,
+      "epoch": 0.5008165487207403,
+      "grad_norm": 0.6612179279327393,
+      "learning_rate": 0.0002939315403946733,
+      "loss": 0.6844,
+      "mean_token_accuracy": 0.8671970278024673,
+      "num_tokens": 343690.0,
+      "step": 460
+    },
+    {
+      "entropy": 0.6421366423368454,
+      "epoch": 0.5117038649972782,
+      "grad_norm": 0.6228363513946533,
+      "learning_rate": 0.0002845529208651161,
+      "loss": 0.6428,
+      "mean_token_accuracy": 0.8806748360395431,
+      "num_tokens": 351025.0,
+      "step": 470
+    },
+    {
+      "entropy": 0.5570888858288526,
+      "epoch": 0.522591181273816,
+      "grad_norm": 0.6702244877815247,
+      "learning_rate": 0.00027512444489554767,
+      "loss": 0.6107,
+      "mean_token_accuracy": 0.8815102204680443,
+      "num_tokens": 358280.0,
+      "step": 480
+    },
+    {
+      "entropy": 0.5678568474948407,
+      "epoch": 0.5334784975503538,
+      "grad_norm": 0.6265514492988586,
+      "learning_rate": 0.00026565971684226573,
+      "loss": 0.5648,
+      "mean_token_accuracy": 0.8863577455282211,
+      "num_tokens": 365676.0,
+      "step": 490
+    },
+    {
+      "entropy": 0.5953685358166695,
+      "epoch": 0.5443658138268916,
+      "grad_norm": 0.8042871356010437,
+      "learning_rate": 0.0002561723933697317,
+      "loss": 0.6452,
+      "mean_token_accuracy": 0.8779568284749985,
+      "num_tokens": 373045.0,
+      "step": 500
+    },
+    {
+      "entropy": 0.7206135954707861,
+      "epoch": 0.5552531301034295,
+      "grad_norm": 0.65378737449646,
+      "learning_rate": 0.0002466761637453568,
+      "loss": 0.7605,
+      "mean_token_accuracy": 0.8624323174357414,
+      "num_tokens": 380560.0,
+      "step": 510
+    },
+    {
+      "entropy": 0.6116405792534352,
+      "epoch": 0.5661404463799673,
+      "grad_norm": 0.5593022108078003,
+      "learning_rate": 0.00023718473008724742,
+      "loss": 0.6431,
+      "mean_token_accuracy": 0.8734307438135147,
+      "num_tokens": 388027.0,
+      "step": 520
+    },
+    {
+      "entropy": 0.6847221277654171,
+      "epoch": 0.5770277626565051,
+      "grad_norm": 0.6043410301208496,
+      "learning_rate": 0.00022771178759340514,
+      "loss": 0.6989,
+      "mean_token_accuracy": 0.8697591915726661,
+      "num_tokens": 395471.0,
+      "step": 530
+    },
+    {
+      "entropy": 0.6371712744235992,
+      "epoch": 0.587915078933043,
+      "grad_norm": 0.5856935381889343,
+      "learning_rate": 0.00021827100478091506,
+      "loss": 0.6722,
+      "mean_token_accuracy": 0.8688437402248382,
+      "num_tokens": 403108.0,
+      "step": 540
+    },
+    {
+      "entropy": 0.5488773200660944,
+      "epoch": 0.5988023952095808,
+      "grad_norm": 0.5797805190086365,
+      "learning_rate": 0.00020887600376362904,
+      "loss": 0.572,
+      "mean_token_accuracy": 0.8839038833975792,
+      "num_tokens": 410452.0,
+      "step": 550
+    },
+    {
+      "entropy": 0.5808871898800134,
+      "epoch": 0.6096897114861187,
+      "grad_norm": 0.521022379398346,
+      "learning_rate": 0.00019954034059680668,
+      "loss": 0.5912,
+      "mean_token_accuracy": 0.888222835958004,
+      "num_tokens": 417828.0,
+      "step": 560
+    },
+    {
+      "entropy": 0.5592609565705061,
+      "epoch": 0.6205770277626566,
+      "grad_norm": 0.5325660705566406,
+      "learning_rate": 0.00019027748571707066,
+      "loss": 0.569,
+      "mean_token_accuracy": 0.8819878950715065,
+      "num_tokens": 425205.0,
+      "step": 570
+    },
+    {
+      "entropy": 0.6147103808820248,
+      "epoch": 0.6314643440391944,
+      "grad_norm": 0.5417932271957397,
+      "learning_rate": 0.00018110080450590182,
+      "loss": 0.6577,
+      "mean_token_accuracy": 0.8764070853590965,
+      "num_tokens": 432668.0,
+      "step": 580
+    },
+    {
+      "entropy": 0.6328225396573544,
+      "epoch": 0.6423516603157322,
+      "grad_norm": 0.6512264013290405,
+      "learning_rate": 0.0001720235380047188,
+      "loss": 0.6491,
+      "mean_token_accuracy": 0.8737886667251586,
+      "num_tokens": 440161.0,
+      "step": 590
+    },
+    {
+      "entropy": 0.6266105823218823,
+      "epoch": 0.65323897659227,
+      "grad_norm": 0.6788283586502075,
+      "learning_rate": 0.00016305878380936723,
+      "loss": 0.6251,
+      "mean_token_accuracy": 0.8808962866663933,
+      "num_tokens": 447470.0,
+      "step": 600
+    },
+    {
+      "entropy": 0.5399745622649789,
+      "epoch": 0.6641262928688079,
+      "grad_norm": 0.6711626052856445,
+      "learning_rate": 0.00015421947717158752,
+      "loss": 0.5736,
+      "mean_token_accuracy": 0.8923148453235626,
+      "num_tokens": 454744.0,
+      "step": 610
+    },
+    {
+      "entropy": 0.695448774844408,
+      "epoch": 0.6750136091453457,
+      "grad_norm": 0.755664587020874,
+      "learning_rate": 0.00014551837233472853,
+      "loss": 0.7435,
+      "mean_token_accuracy": 0.8619298219680787,
+      "num_tokens": 462432.0,
+      "step": 620
+    },
+    {
+      "entropy": 0.5408242929726839,
+      "epoch": 0.6859009254218835,
+      "grad_norm": 0.6359855532646179,
+      "learning_rate": 0.0001369680241306384,
+      "loss": 0.5643,
+      "mean_token_accuracy": 0.8882203832268715,
+      "num_tokens": 469763.0,
+      "step": 630
+    },
+    {
+      "entropy": 0.5984043713659049,
+      "epoch": 0.6967882416984214,
+      "grad_norm": 0.6010624170303345,
+      "learning_rate": 0.00012858076986428722,
+      "loss": 0.5929,
+      "mean_token_accuracy": 0.8816081374883652,
+      "num_tokens": 477149.0,
+      "step": 640
+    },
+    {
+      "entropy": 0.5971009206026793,
+      "epoch": 0.7076755579749592,
+      "grad_norm": 0.6069923043251038,
+      "learning_rate": 0.00012036871151225798,
+      "loss": 0.6258,
+      "mean_token_accuracy": 0.8779259011149406,
+      "num_tokens": 484640.0,
+      "step": 650
+    },
+    {
+      "entropy": 0.6068212665617466,
+      "epoch": 0.718562874251497,
+      "grad_norm": 0.5725326538085938,
+      "learning_rate": 0.00011234369826079432,
+      "loss": 0.613,
+      "mean_token_accuracy": 0.8772263854742051,
+      "num_tokens": 492176.0,
+      "step": 660
+    },
+    {
+      "entropy": 0.6018822252750397,
+      "epoch": 0.7294501905280348,
+      "grad_norm": 0.7555631995201111,
+      "learning_rate": 0.00010451730940859949,
+      "loss": 0.6395,
+      "mean_token_accuracy": 0.8763651743531227,
+      "num_tokens": 499845.0,
+      "step": 670
+    },
+    {
+      "entropy": 0.5979844883084298,
+      "epoch": 0.7403375068045727,
+      "grad_norm": 0.5494154691696167,
+      "learning_rate": 9.690083765905544e-05,
+      "loss": 0.6319,
+      "mean_token_accuracy": 0.8793143942952156,
+      "num_tokens": 507418.0,
+      "step": 680
+    },
+    {
+      "entropy": 0.5443667802959681,
+      "epoch": 0.7512248230811105,
+      "grad_norm": 0.7554852366447449,
+      "learning_rate": 8.950527282597156e-05,
+      "loss": 0.5712,
+      "mean_token_accuracy": 0.8878106832504272,
+      "num_tokens": 514785.0,
+      "step": 690
+    },
+    {
+      "entropy": 0.5350296102464199,
+      "epoch": 0.7621121393576483,
+      "grad_norm": 0.5492348670959473,
+      "learning_rate": 8.234128597637239e-05,
+      "loss": 0.5349,
+      "mean_token_accuracy": 0.8928583487868309,
+      "num_tokens": 522076.0,
+      "step": 700
+    },
+    {
+      "entropy": 0.5867601800709963,
+      "epoch": 0.7729994556341862,
+      "grad_norm": 0.6712722778320312,
+      "learning_rate": 7.541921403320593e-05,
+      "loss": 0.6173,
+      "mean_token_accuracy": 0.8806560948491097,
+      "num_tokens": 529578.0,
+      "step": 710
+    },
+    {
+      "entropy": 0.6274564698338508,
+      "epoch": 0.783886771910724,
+      "grad_norm": 0.5191476345062256,
+      "learning_rate": 6.874904486018821e-05,
+      "loss": 0.6322,
+      "mean_token_accuracy": 0.8802066639065742,
+      "num_tokens": 537024.0,
+      "step": 720
+    },
+    {
+      "entropy": 0.5485220493748784,
+      "epoch": 0.7947740881872618,
+      "grad_norm": 0.6473787426948547,
+      "learning_rate": 6.234040285030551e-05,
+      "loss": 0.5631,
+      "mean_token_accuracy": 0.884115393459797,
+      "num_tokens": 544467.0,
+      "step": 730
+    },
+    {
+      "entropy": 0.5133912313729525,
+      "epoch": 0.8056614044637996,
+      "grad_norm": 0.6537692546844482,
+      "learning_rate": 5.6202535038770045e-05,
+      "loss": 0.544,
+      "mean_token_accuracy": 0.8923841789364815,
+      "num_tokens": 551834.0,
+      "step": 740
+    },
+    {
+      "entropy": 0.5843172324821353,
+      "epoch": 0.8165487207403375,
+      "grad_norm": 0.6818355321884155,
+      "learning_rate": 5.0344297760463954e-05,
+      "loss": 0.5861,
+      "mean_token_accuracy": 0.8821831315755844,
+      "num_tokens": 559289.0,
+      "step": 750
+    },
+    {
+      "entropy": 0.5654132578521966,
+      "epoch": 0.8274360370168753,
+      "grad_norm": 0.7315685153007507,
+      "learning_rate": 4.477414387112652e-05,
+      "loss": 0.6023,
+      "mean_token_accuracy": 0.8810448855161667,
+      "num_tokens": 566780.0,
+      "step": 760
+    },
+    {
+      "entropy": 0.5563108414411545,
+      "epoch": 0.8383233532934131,
+      "grad_norm": 0.6280286312103271,
+      "learning_rate": 3.950011055072039e-05,
+      "loss": 0.5637,
+      "mean_token_accuracy": 0.8888336777687073,
+      "num_tokens": 574206.0,
+      "step": 770
+    },
+    {
+      "entropy": 0.5699046881869435,
+      "epoch": 0.8492106695699511,
+      "grad_norm": 0.4488978683948517,
+      "learning_rate": 3.4529807706578346e-05,
+      "loss": 0.6045,
+      "mean_token_accuracy": 0.8825341418385506,
+      "num_tokens": 581608.0,
+      "step": 780
+    },
+    {
+      "entropy": 0.5627471528947353,
+      "epoch": 0.8600979858464889,
+      "grad_norm": 0.7140914797782898,
+      "learning_rate": 2.987040699306076e-05,
+      "loss": 0.6071,
+      "mean_token_accuracy": 0.8823000833392143,
+      "num_tokens": 589176.0,
+      "step": 790
+    },
+    {
+      "entropy": 0.5856846395879984,
+      "epoch": 0.8709853021230267,
+      "grad_norm": 0.7391604781150818,
+      "learning_rate": 2.5528631463569348e-05,
+      "loss": 0.6048,
+      "mean_token_accuracy": 0.8830034390091897,
+      "num_tokens": 596679.0,
+      "step": 800
+    },
+    {
+      "entropy": 0.47006473541259763,
+      "epoch": 0.8818726183995645,
+      "grad_norm": 0.44131314754486084,
+      "learning_rate": 2.151074586984744e-05,
+      "loss": 0.4758,
+      "mean_token_accuracy": 0.8987425476312637,
+      "num_tokens": 603956.0,
+      "step": 810
+    },
+    {
+      "entropy": 0.500870693475008,
+      "epoch": 0.8927599346761024,
+      "grad_norm": 0.9443516135215759,
+      "learning_rate": 1.7822547622564188e-05,
+      "loss": 0.5234,
+      "mean_token_accuracy": 0.8955065041780472,
+      "num_tokens": 611308.0,
+      "step": 820
+    },
+    {
+      "entropy": 0.5766935784369707,
+      "epoch": 0.9036472509526402,
+      "grad_norm": 0.6435447931289673,
+      "learning_rate": 1.4469358426225682e-05,
+      "loss": 0.5874,
+      "mean_token_accuracy": 0.8817667260766029,
+      "num_tokens": 618906.0,
+      "step": 830
+    },
+    {
+      "entropy": 0.5494085047394037,
+      "epoch": 0.914534567229178,
+      "grad_norm": 0.7876263856887817,
+      "learning_rate": 1.1456016600482706e-05,
+      "loss": 0.5684,
+      "mean_token_accuracy": 0.8897502169013023,
+      "num_tokens": 626224.0,
+      "step": 840
+    },
+    {
+      "entropy": 0.5102550655603408,
+      "epoch": 0.9254218835057159,
+      "grad_norm": 0.577179491519928,
+      "learning_rate": 8.78687009891499e-06,
+      "loss": 0.5115,
+      "mean_token_accuracy": 0.8966878160834313,
+      "num_tokens": 633424.0,
+      "step": 850
+    },
+    {
+      "entropy": 0.5591466184705496,
+      "epoch": 0.9363091997822537,
+      "grad_norm": 0.5634007453918457,
+      "learning_rate": 6.465770235365404e-06,
+      "loss": 0.5585,
+      "mean_token_accuracy": 0.88973039239645,
+      "num_tokens": 640829.0,
+      "step": 860
+    },
+    {
+      "entropy": 0.5340237215161323,
+      "epoch": 0.9471965160587915,
+      "grad_norm": 0.47885215282440186,
+      "learning_rate": 4.496066126875531e-06,
+      "loss": 0.522,
+      "mean_token_accuracy": 0.8937937587499618,
+      "num_tokens": 648132.0,
+      "step": 870
+    },
+    {
+      "entropy": 0.5373509109020234,
+      "epoch": 0.9580838323353293,
+      "grad_norm": 0.45497098565101624,
+      "learning_rate": 2.8805998612418396e-06,
+      "loss": 0.5222,
+      "mean_token_accuracy": 0.8893009826540947,
+      "num_tokens": 655528.0,
+      "step": 880
+    },
+    {
+      "entropy": 0.6342795874923468,
+      "epoch": 0.9689711486118672,
+      "grad_norm": 0.5892038345336914,
+      "learning_rate": 1.6217023961647982e-06,
+      "loss": 0.6542,
+      "mean_token_accuracy": 0.8745187908411026,
+      "num_tokens": 663081.0,
+      "step": 890
+    },
+    {
+      "entropy": 0.594165425002575,
+      "epoch": 0.979858464888405,
+      "grad_norm": 0.4947717785835266,
+      "learning_rate": 7.211901959078004e-07,
+      "loss": 0.6075,
+      "mean_token_accuracy": 0.8763557627797127,
+      "num_tokens": 670555.0,
+      "step": 900
+    },
+    {
+      "entropy": 0.5458611365407705,
+      "epoch": 0.9907457811649428,
+      "grad_norm": 0.5046107769012451,
+      "learning_rate": 1.8036261031936784e-07,
+      "loss": 0.5612,
+      "mean_token_accuracy": 0.8900749757885933,
+      "num_tokens": 677920.0,
+      "step": 910
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 0.5564239460492192,
+      "eval_loss": 0.5503931641578674,
+      "eval_mean_token_accuracy": 0.8885072296289477,
+      "eval_num_tokens": 684102.0,
+      "eval_runtime": 353.9795,
+      "eval_samples_per_second": 1.155,
+      "eval_steps_per_second": 1.155,
+      "step": 919
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 919,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.5456460396345344e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
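
Note: the log entries in `trainer_state.json` above come in two shapes. Training entries carry `loss`, and the final evaluation entry carries `eval_loss`. A minimal sketch of separating the two and averaging the last few training losses follows. It assumes the entries sit under the standard Trainer key `log_history`, which falls outside the portion of the file shown here. The helper names `split_log_history` and `mean_train_loss` are hypothetical, and the embedded JSON is a three-entry excerpt of the values logged above, not the full file.

```python
import json


def split_log_history(state):
    """Separate training log entries from evaluation entries."""
    history = state.get("log_history", [])  # assumed key name, see note above
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    return train, evals


def mean_train_loss(train, last_n):
    """Average the `loss` field over the last `last_n` training entries."""
    tail = train[-last_n:]
    if not tail:
        raise ValueError("no training entries to average")
    return sum(entry["loss"] for entry in tail) / len(tail)


# Excerpt of the entries logged at steps 900, 910 and 919 above.
state = json.loads("""
{
  "log_history": [
    {"epoch": 0.979858464888405, "loss": 0.6075, "step": 900},
    {"epoch": 0.9907457811649428, "loss": 0.5612, "step": 910},
    {"epoch": 1.0, "eval_loss": 0.5503931641578674, "step": 919}
  ]
}
""")

train_entries, eval_entries = split_log_history(state)
print(mean_train_loss(train_entries, last_n=2))
print(eval_entries[0]["eval_loss"])
```

To run this against the real file, replace the `json.loads(...)` call with `json.load(open(path))`, where `path` points at a local copy of `checkpoint-919/trainer_state.json`.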

checkpoint-919/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7320aac5597ff43ceb7a64adab1b8e32ac8f821a4d32f503b6ad887aa2b70fc5
+size 6225

checkpoint-919/vocab.json ADDED

The diff for this file is too large to render. See raw diff

crypto-thera-adapters/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3-4B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "down_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "v_proj",
+    "gate_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
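
Note: with `r` = 64, `lora_alpha` = 128 and `use_rslora` = false, the adapter update is scaled by `lora_alpha / r`. A minimal sketch of that arithmetic follows. The helper `lora_scaling` is hypothetical and only mirrors the two scaling rules (standard LoRA and rank-stabilized LoRA). It ignores `rank_pattern` and `alpha_pattern`, which are empty in the config above.

```python
import math


def lora_scaling(config):
    """Return the factor applied to the low-rank update B @ A."""
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(r)
    return alpha / r


# Values copied from adapter_config.json above.
adapter_config = {"r": 64, "lora_alpha": 128, "use_rslora": False}

print(lora_scaling(adapter_config))  # 128 / 64 = 2.0
```

Under these settings the update is added to each targeted projection with a factor of 2.0. Setting `use_rslora` to true with the same `r` and `lora_alpha` would give 128 / 8 = 16.0.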

crypto-thera-adapters/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027
+size 528550256

crypto-thera-adapters/loss_plots.png ADDED

Git LFS Details

SHA256: 1773ae86bb8cd35af9e3a34d017d0cb6ed36219f8a160636bf173d4a4cda7420
Pointer size: 131 Bytes
Size of remote file: 198 kB

crypto-thera-adapters/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7320aac5597ff43ceb7a64adab1b8e32ac8f821a4d32f503b6ad887aa2b70fc5
+size 6225

crypto-thera-adapters/training_history.json ADDED

	@@ -0,0 +1,932 @@

+[
+  {
+    "loss": 3.3033,
+    "grad_norm": 2.907423257827759,
+    "learning_rate": 4.891304347826087e-05,
+    "entropy": 1.4115246817469598,
+    "num_tokens": 7489.0,
+    "mean_token_accuracy": 0.5003452189266682,
+    "epoch": 0.010887316276537834,
+    "step": 10
+  },
+  {
+    "loss": 1.0891,
+    "grad_norm": 1.0035202503204346,
+    "learning_rate": 0.0001032608695652174,
+    "entropy": 1.1823123581707478,
+    "num_tokens": 14981.0,
+    "mean_token_accuracy": 0.807713833451271,
+    "epoch": 0.021774632553075667,
+    "step": 20
+  },
+  {
+    "loss": 0.7185,
+    "grad_norm": 0.7734199166297913,
+    "learning_rate": 0.0001576086956521739,
+    "entropy": 0.7195636250078679,
+    "num_tokens": 22399.0,
+    "mean_token_accuracy": 0.8713902071118355,
+    "epoch": 0.0326619488296135,
+    "step": 30
+  },
+  {
+    "loss": 0.6231,
+    "grad_norm": 0.737375020980835,
+    "learning_rate": 0.00021195652173913043,
+    "entropy": 0.598898995667696,
+    "num_tokens": 29741.0,
+    "mean_token_accuracy": 0.8840597704052925,
+    "epoch": 0.043549265106151334,
+    "step": 40
+  },
+  {
+    "loss": 0.7017,
+    "grad_norm": 0.77037113904953,
+    "learning_rate": 0.000266304347826087,
+    "entropy": 0.6763634454458952,
+    "num_tokens": 37112.0,
+    "mean_token_accuracy": 0.8767582342028618,
+    "epoch": 0.05443658138268917,
+    "step": 50
+  },
+  {
+    "loss": 0.7214,
+    "grad_norm": 0.9717927575111389,
+    "learning_rate": 0.00032065217391304346,
+    "entropy": 0.6615333516150713,
+    "num_tokens": 44590.0,
+    "mean_token_accuracy": 0.8725218087434768,
+    "epoch": 0.065323897659227,
+    "step": 60
+  },
+  {
+    "loss": 0.6938,
+    "grad_norm": 0.6530428528785706,
+    "learning_rate": 0.000375,
+    "entropy": 0.665872010961175,
+    "num_tokens": 52209.0,
+    "mean_token_accuracy": 0.8677900999784469,
+    "epoch": 0.07621121393576484,
+    "step": 70
+  },
+  {
+    "loss": 0.6937,
+    "grad_norm": 0.8795536756515503,
+    "learning_rate": 0.00042934782608695655,
+    "entropy": 0.6730144165456295,
+    "num_tokens": 59627.0,
+    "mean_token_accuracy": 0.8736036345362663,
+    "epoch": 0.08709853021230267,
+    "step": 80
+  },
+  {
+    "loss": 0.6051,
+    "grad_norm": 2.4545137882232666,
+    "learning_rate": 0.00048369565217391304,
+    "entropy": 0.5524990629404783,
+    "num_tokens": 66878.0,
+    "mean_token_accuracy": 0.8858248308300972,
+    "epoch": 0.0979858464888405,
+    "step": 90
+  },
+  {
+    "loss": 0.7764,
+    "grad_norm": 1.7287325859069824,
+    "learning_rate": 0.0004999116169004186,
+    "entropy": 0.7578814126551151,
+    "num_tokens": 74378.0,
+    "mean_token_accuracy": 0.8569089323282242,
+    "epoch": 0.10887316276537834,
+    "step": 100
+  },
+  {
+    "loss": 0.7017,
+    "grad_norm": 0.6661782264709473,
+    "learning_rate": 0.0004994788705196884,
+    "entropy": 0.6519227813929319,
+    "num_tokens": 81971.0,
+    "mean_token_accuracy": 0.8656885161995888,
+    "epoch": 0.11976047904191617,
+    "step": 110
+  },
+  {
+    "loss": 0.6964,
+    "grad_norm": 0.5872151255607605,
+    "learning_rate": 0.0004986861508565064,
+    "entropy": 0.6764750100672245,
+    "num_tokens": 89448.0,
+    "mean_token_accuracy": 0.867901885509491,
+    "epoch": 0.130647795318454,
+    "step": 120
+  },
+  {
+    "loss": 0.6559,
+    "grad_norm": 0.569556474685669,
+    "learning_rate": 0.0004975346017267744,
+    "entropy": 0.6099378645420075,
+    "num_tokens": 96736.0,
+    "mean_token_accuracy": 0.877436301112175,
+    "epoch": 0.14153511159499182,
+    "step": 130
+  },
+  {
+    "loss": 0.7803,
+    "grad_norm": 1.407532811164856,
+    "learning_rate": 0.000496025884701748,
+    "entropy": 0.7155832894146442,
+    "num_tokens": 104199.0,
+    "mean_token_accuracy": 0.861001954972744,
+    "epoch": 0.15242242787152968,
+    "step": 140
+  },
+  {
+    "loss": 0.66,
+    "grad_norm": 0.7547012567520142,
+    "learning_rate": 0.0004941621767105542,
+    "entropy": 0.6367012947797775,
+    "num_tokens": 111648.0,
+    "mean_token_accuracy": 0.8748047932982445,
+    "epoch": 0.1633097441480675,
+    "step": 150
+  },
+  {
+    "loss": 0.7214,
+    "grad_norm": 6.324690341949463,
+    "learning_rate": 0.0004919461668990982,
+    "entropy": 0.6732052929699421,
+    "num_tokens": 119010.0,
+    "mean_token_accuracy": 0.869445288181305,
+    "epoch": 0.17419706042460534,
+    "step": 160
+  },
+  {
+    "loss": 0.7486,
+    "grad_norm": 0.6715346574783325,
+    "learning_rate": 0.0004893810527498928,
+    "entropy": 0.7065097156912088,
+    "num_tokens": 126525.0,
+    "mean_token_accuracy": 0.867114982008934,
+    "epoch": 0.18508437670114317,
+    "step": 170
+  },
+  {
+    "loss": 0.6745,
+    "grad_norm": 0.8438016176223755,
+    "learning_rate": 0.0004864705354684076,
+    "entropy": 0.6536071257665753,
+    "num_tokens": 133957.0,
+    "mean_token_accuracy": 0.8712003126740455,
+    "epoch": 0.195971692977681,
+    "step": 180
+  },
+  {
+    "loss": 0.8311,
+    "grad_norm": 11.108195304870605,
+    "learning_rate": 0.00048321881464259676,
+    "entropy": 0.7786130860447884,
+    "num_tokens": 141635.0,
+    "mean_token_accuracy": 0.8524066478013992,
+    "epoch": 0.20685900925421882,
+    "step": 190
+  },
+  {
+    "loss": 1.1536,
+    "grad_norm": 0.9511222243309021,
+    "learning_rate": 0.0004796305821833098,
+    "entropy": 0.966428604722023,
+    "num_tokens": 148904.0,
+    "mean_token_accuracy": 0.8078579772263765,
+    "epoch": 0.21774632553075668,
+    "step": 200
+  },
+  {
+    "loss": 0.7879,
+    "grad_norm": 1.1783875226974487,
+    "learning_rate": 0.00047571101555432896,
+    "entropy": 0.7515945039689541,
+    "num_tokens": 156485.0,
+    "mean_token_accuracy": 0.8544724628329277,
+    "epoch": 0.2286336418072945,
+    "step": 210
+  },
+  {
+    "loss": 0.8526,
+    "grad_norm": 6.818761348724365,
+    "learning_rate": 0.0004714657703018024,
+    "entropy": 0.7837981097400188,
+    "num_tokens": 164052.0,
+    "mean_token_accuracy": 0.8500533938407898,
+    "epoch": 0.23952095808383234,
+    "step": 220
+  },
+  {
+    "loss": 4.5265,
+    "grad_norm": 30.283111572265625,
+    "learning_rate": 0.0004669009718938517,
+    "entropy": 2.9163815125823023,
+    "num_tokens": 171458.0,
+    "mean_token_accuracy": 0.5338523894548416,
+    "epoch": 0.25040827436037016,
+    "step": 230
+  },
+  {
+    "loss": 2.5513,
+    "grad_norm": 115.25823211669922,
+    "learning_rate": 0.00046202320688212834,
+    "entropy": 2.0303648129105567,
+    "num_tokens": 179120.0,
+    "mean_token_accuracy": 0.6328982267528772,
+    "epoch": 0.261295590636908,
+    "step": 240
+  },
+  {
+    "loss": 0.7502,
+    "grad_norm": 8.124970436096191,
+    "learning_rate": 0.00045683951339807265,
+    "entropy": 0.7287540748715401,
+    "num_tokens": 186544.0,
+    "mean_token_accuracy": 0.8621942937374115,
+    "epoch": 0.2721829069134458,
+    "step": 250
+  },
+  {
+    "loss": 0.6533,
+    "grad_norm": 0.8240286111831665,
+    "learning_rate": 0.0004513573709975877,
+    "entropy": 0.6197790112346411,
+    "num_tokens": 193880.0,
+    "mean_token_accuracy": 0.8799639001488686,
+    "epoch": 0.28307022318998365,
+    "step": 260
+  },
+  {
+    "loss": 0.6144,
+    "grad_norm": 0.7374333143234253,
+    "learning_rate": 0.0004455846898687814,
+    "entropy": 0.5914675913751125,
+    "num_tokens": 201144.0,
+    "mean_token_accuracy": 0.8865744516253471,
+    "epoch": 0.2939575394665215,
+    "step": 270
+  },
+  {
+    "loss": 0.7368,
+    "grad_norm": 0.5616167783737183,
+    "learning_rate": 0.00043952979941834925,
+    "entropy": 0.6894511558115483,
+    "num_tokens": 208571.0,
+    "mean_token_accuracy": 0.869646517932415,
+    "epoch": 0.30484485574305936,
+    "step": 280
+  },
+  {
+    "loss": 0.7601,
+    "grad_norm": 0.9096492528915405,
+    "learning_rate": 0.0004332014362530659,
+    "entropy": 0.7011830236762762,
+    "num_tokens": 216262.0,
+    "mean_token_accuracy": 0.8600849106907844,
+    "epoch": 0.3157321720195972,
+    "step": 290
+  },
+  {
+    "loss": 0.6919,
+    "grad_norm": 0.798920214176178,
+    "learning_rate": 0.00042660873157372763,
+    "entropy": 0.6535980202257633,
+    "num_tokens": 223604.0,
+    "mean_token_accuracy": 0.8731523841619492,
+    "epoch": 0.326619488296135,
+    "step": 300
+  },
+  {
+    "loss": 0.7312,
+    "grad_norm": 0.7000783085823059,
+    "learning_rate": 0.00041976119799973477,
+    "entropy": 0.7168508902192116,
+    "num_tokens": 231179.0,
+    "mean_token_accuracy": 0.862776193022728,
+    "epoch": 0.33750680457267285,
+    "step": 310
+  },
+  {
+    "loss": 0.5784,
+    "grad_norm": 0.6315221190452576,
+    "learning_rate": 0.00041266871584332454,
+    "entropy": 0.542249171063304,
+    "num_tokens": 238379.0,
+    "mean_token_accuracy": 0.8910727813839913,
+    "epoch": 0.3483941208492107,
+    "step": 320
+  },
+  {
+    "loss": 0.7976,
+    "grad_norm": 0.7601750493049622,
+    "learning_rate": 0.0004053415188532599,
+    "entropy": 0.7523471737280488,
+    "num_tokens": 246037.0,
+    "mean_token_accuracy": 0.8552964687347412,
+    "epoch": 0.3592814371257485,
+    "step": 330
+  },
+  {
+    "loss": 0.6718,
+    "grad_norm": 0.7975150346755981,
+    "learning_rate": 0.0003977901794485446,
+    "entropy": 0.6234626328572631,
+    "num_tokens": 253413.0,
+    "mean_token_accuracy": 0.8762789338827133,
+    "epoch": 0.37016875340228633,
+    "step": 340
+  },
+  {
+    "loss": 0.6134,
+    "grad_norm": 0.6521063446998596,
+    "learning_rate": 0.0003900255934634699,
+    "entropy": 0.6043397862464189,
+    "num_tokens": 260806.0,
+    "mean_token_accuracy": 0.8838247537612915,
+    "epoch": 0.38105606967882416,
+    "step": 350
+  },
+  {
+    "loss": 0.7149,
+    "grad_norm": 0.9141950607299805,
+    "learning_rate": 0.0003820589644260065,
+    "entropy": 0.6836169838905335,
+    "num_tokens": 268395.0,
+    "mean_token_accuracy": 0.8627639308571815,
+    "epoch": 0.391943385955362,
+    "step": 360
+  },
+  {
+    "loss": 0.749,
+    "grad_norm": 0.6039382815361023,
+    "learning_rate": 0.00037390178739222363,
+    "entropy": 0.6906204871833325,
+    "num_tokens": 276103.0,
+    "mean_token_accuracy": 0.8611163705587387,
+    "epoch": 0.4028307022318998,
+    "step": 370
+  },
+  {
+    "loss": 0.7435,
+    "grad_norm": 0.7205569744110107,
+    "learning_rate": 0.00036556583236006237,
+    "entropy": 0.7154507145285607,
+    "num_tokens": 283734.0,
+    "mean_token_accuracy": 0.8620465710759163,
+    "epoch": 0.41371801850843765,
+    "step": 380
+  },
+  {
+    "loss": 0.638,
+    "grad_norm": 0.7158589959144592,
+    "learning_rate": 0.0003570631272863956,
+    "entropy": 0.6251850917935371,
+    "num_tokens": 291104.0,
+    "mean_token_accuracy": 0.8804232597351074,
+    "epoch": 0.42460533478497553,
+    "step": 390
+  },
+  {
+    "loss": 0.7033,
+    "grad_norm": 0.6197627186775208,
+    "learning_rate": 0.0003484059407318781,
+    "entropy": 0.655794395133853,
+    "num_tokens": 298667.0,
+    "mean_token_accuracy": 0.8641186743974686,
+    "epoch": 0.43549265106151336,
+    "step": 400
+  },
+  {
+    "loss": 0.724,
+    "grad_norm": 0.6667740345001221,
+    "learning_rate": 0.00033960676415863015,
+    "entropy": 0.6755085363984108,
+    "num_tokens": 306246.0,
+    "mean_token_accuracy": 0.8660084888339042,
+    "epoch": 0.4463799673380512,
+    "step": 410
+  },
+  {
+    "loss": 0.6793,
+    "grad_norm": 0.7817707061767578,
+    "learning_rate": 0.00033067829390629453,
+    "entropy": 0.6676596872508526,
+    "num_tokens": 313691.0,
+    "mean_token_accuracy": 0.8715122014284133,
+    "epoch": 0.457267283614589,
+    "step": 420
+  },
+  {
+    "loss": 0.7552,
+    "grad_norm": 1.0613240003585815,
+    "learning_rate": 0.00032163341287247876,
+    "entropy": 0.7171908929944039,
+    "num_tokens": 321394.0,
+    "mean_token_accuracy": 0.8618355020880699,
+    "epoch": 0.46815459989112684,
+    "step": 430
+  },
+  {
+    "loss": 0.5586,
+    "grad_norm": 0.649142324924469,
+    "learning_rate": 0.00031248517192400876,
+    "entropy": 0.5122752044349909,
+    "num_tokens": 328807.0,
+    "mean_token_accuracy": 0.8886241987347603,
+    "epoch": 0.47904191616766467,
+    "step": 440
+  },
+  {
+    "loss": 0.5397,
+    "grad_norm": 0.5830965042114258,
+    "learning_rate": 0.0003032467710658231,
+    "entropy": 0.5436007279902697,
+    "num_tokens": 336117.0,
+    "mean_token_accuracy": 0.8886899709701538,
+    "epoch": 0.4899292324442025,
+    "step": 450
+  },
+  {
+    "loss": 0.6844,
+    "grad_norm": 0.6612179279327393,
+    "learning_rate": 0.0002939315403946733,
+    "entropy": 0.6272627621889114,
+    "num_tokens": 343690.0,
+    "mean_token_accuracy": 0.8671970278024673,
+    "epoch": 0.5008165487207403,
+    "step": 460
+  },
+  {
+    "loss": 0.6428,
+    "grad_norm": 0.6228363513946533,
+    "learning_rate": 0.0002845529208651161,
+    "entropy": 0.6421366423368454,
+    "num_tokens": 351025.0,
+    "mean_token_accuracy": 0.8806748360395431,
+    "epoch": 0.5117038649972782,
+    "step": 470
+  },
+  {
+    "loss": 0.6107,
+    "grad_norm": 0.6702244877815247,
+    "learning_rate": 0.00027512444489554767,
+    "entropy": 0.5570888858288526,
+    "num_tokens": 358280.0,
+    "mean_token_accuracy": 0.8815102204680443,
+    "epoch": 0.522591181273816,
+    "step": 480
+  },
+  {
+    "loss": 0.5648,
+    "grad_norm": 0.6265514492988586,
+    "learning_rate": 0.00026565971684226573,
+    "entropy": 0.5678568474948407,
+    "num_tokens": 365676.0,
+    "mean_token_accuracy": 0.8863577455282211,
+    "epoch": 0.5334784975503538,
+    "step": 490
+  },
+  {
+    "loss": 0.6452,
+    "grad_norm": 0.8042871356010437,
+    "learning_rate": 0.0002561723933697317,
+    "entropy": 0.5953685358166695,
+    "num_tokens": 373045.0,
+    "mean_token_accuracy": 0.8779568284749985,
+    "epoch": 0.5443658138268916,
+    "step": 500
+  },
+  {
+    "loss": 0.7605,
+    "grad_norm": 0.65378737449646,
+    "learning_rate": 0.0002466761637453568,
+    "entropy": 0.7206135954707861,
+    "num_tokens": 380560.0,
+    "mean_token_accuracy": 0.8624323174357414,
+    "epoch": 0.5552531301034295,
+    "step": 510
+  },
+  {
+    "loss": 0.6431,
+    "grad_norm": 0.5593022108078003,
+    "learning_rate": 0.00023718473008724742,
+    "entropy": 0.6116405792534352,
+    "num_tokens": 388027.0,
+    "mean_token_accuracy": 0.8734307438135147,
+    "epoch": 0.5661404463799673,
+    "step": 520
+  },
+  {
+    "loss": 0.6989,
+    "grad_norm": 0.6043410301208496,
+    "learning_rate": 0.00022771178759340514,
+    "entropy": 0.6847221277654171,
+    "num_tokens": 395471.0,
+    "mean_token_accuracy": 0.8697591915726661,
+    "epoch": 0.5770277626565051,
+    "step": 530
+  },
+  {
+    "loss": 0.6722,
+    "grad_norm": 0.5856935381889343,
+    "learning_rate": 0.00021827100478091506,
+    "entropy": 0.6371712744235992,
+    "num_tokens": 403108.0,
+    "mean_token_accuracy": 0.8688437402248382,
+    "epoch": 0.587915078933043,
+    "step": 540
+  },
+  {
+    "loss": 0.572,
+    "grad_norm": 0.5797805190086365,
+    "learning_rate": 0.00020887600376362904,
+    "entropy": 0.5488773200660944,
+    "num_tokens": 410452.0,
+    "mean_token_accuracy": 0.8839038833975792,
+    "epoch": 0.5988023952095808,
+    "step": 550
+  },
+  {
+    "loss": 0.5912,
+    "grad_norm": 0.521022379398346,
+    "learning_rate": 0.00019954034059680668,
+    "entropy": 0.5808871898800134,
+    "num_tokens": 417828.0,
+    "mean_token_accuracy": 0.888222835958004,
+    "epoch": 0.6096897114861187,
+    "step": 560
+  },
+  {
+    "loss": 0.569,
+    "grad_norm": 0.5325660705566406,
+    "learning_rate": 0.00019027748571707066,
+    "entropy": 0.5592609565705061,
+    "num_tokens": 425205.0,
+    "mean_token_accuracy": 0.8819878950715065,
+    "epoch": 0.6205770277626566,
+    "step": 570
+  },
+  {
+    "loss": 0.6577,
+    "grad_norm": 0.5417932271957397,
+    "learning_rate": 0.00018110080450590182,
+    "entropy": 0.6147103808820248,
+    "num_tokens": 432668.0,
+    "mean_token_accuracy": 0.8764070853590965,
+    "epoch": 0.6314643440391944,
+    "step": 580
+  },
+  {
+    "loss": 0.6491,
+    "grad_norm": 0.6512264013290405,
+    "learning_rate": 0.0001720235380047188,
+    "entropy": 0.6328225396573544,
+    "num_tokens": 440161.0,
+    "mean_token_accuracy": 0.8737886667251586,
+    "epoch": 0.6423516603157322,
+    "step": 590
+  },
+  {
+    "loss": 0.6251,
+    "grad_norm": 0.6788283586502075,
+    "learning_rate": 0.00016305878380936723,
+    "entropy": 0.6266105823218823,
+    "num_tokens": 447470.0,
+    "mean_token_accuracy": 0.8808962866663933,
+    "epoch": 0.65323897659227,
+    "step": 600
+  },
+  {
+    "loss": 0.5736,
+    "grad_norm": 0.6711626052856445,
+    "learning_rate": 0.00015421947717158752,
+    "entropy": 0.5399745622649789,
+    "num_tokens": 454744.0,
+    "mean_token_accuracy": 0.8923148453235626,
+    "epoch": 0.6641262928688079,
+    "step": 610
+  },
+  {
+    "loss": 0.7435,
+    "grad_norm": 0.755664587020874,
+    "learning_rate": 0.00014551837233472853,
+    "entropy": 0.695448774844408,
+    "num_tokens": 462432.0,
+    "mean_token_accuracy": 0.8619298219680787,
+    "epoch": 0.6750136091453457,
+    "step": 620
+  },
+  {
+    "loss": 0.5643,
+    "grad_norm": 0.6359855532646179,
+    "learning_rate": 0.0001369680241306384,
+    "entropy": 0.5408242929726839,
+    "num_tokens": 469763.0,
+    "mean_token_accuracy": 0.8882203832268715,
+    "epoch": 0.6859009254218835,
+    "step": 630
+  },
+  {
+    "loss": 0.5929,
+    "grad_norm": 0.6010624170303345,
+    "learning_rate": 0.00012858076986428722,
+    "entropy": 0.5984043713659049,
+    "num_tokens": 477149.0,
+    "mean_token_accuracy": 0.8816081374883652,
+    "epoch": 0.6967882416984214,
+    "step": 640
+  },
+  {
+    "loss": 0.6258,
+    "grad_norm": 0.6069923043251038,
+    "learning_rate": 0.00012036871151225798,
+    "entropy": 0.5971009206026793,
+    "num_tokens": 484640.0,
+    "mean_token_accuracy": 0.8779259011149406,
+    "epoch": 0.7076755579749592,
+    "step": 650
+  },
+  {
+    "loss": 0.613,
+    "grad_norm": 0.5725326538085938,
+    "learning_rate": 0.00011234369826079432,
+    "entropy": 0.6068212665617466,
+    "num_tokens": 492176.0,
+    "mean_token_accuracy": 0.8772263854742051,
+    "epoch": 0.718562874251497,
+    "step": 660
+  },
+  {
+    "loss": 0.6395,
+    "grad_norm": 0.7555631995201111,
+    "learning_rate": 0.00010451730940859949,
+    "entropy": 0.6018822252750397,
+    "num_tokens": 499845.0,
+    "mean_token_accuracy": 0.8763651743531227,
+    "epoch": 0.7294501905280348,
+    "step": 670
+  },
+  {
+    "loss": 0.6319,
+    "grad_norm": 0.5494154691696167,
+    "learning_rate": 9.690083765905544e-05,
+    "entropy": 0.5979844883084298,
+    "num_tokens": 507418.0,
+    "mean_token_accuracy": 0.8793143942952156,
+    "epoch": 0.7403375068045727,
+    "step": 680
+  },
+  {
+    "loss": 0.5712,
+    "grad_norm": 0.7554852366447449,
+    "learning_rate": 8.950527282597156e-05,
+    "entropy": 0.5443667802959681,
+    "num_tokens": 514785.0,
+    "mean_token_accuracy": 0.8878106832504272,
+    "epoch": 0.7512248230811105,
+    "step": 690
+  },
+  {
+    "loss": 0.5349,
+    "grad_norm": 0.5492348670959473,
+    "learning_rate": 8.234128597637239e-05,
+    "entropy": 0.5350296102464199,
+    "num_tokens": 522076.0,
+    "mean_token_accuracy": 0.8928583487868309,
+    "epoch": 0.7621121393576483,
+    "step": 700
+  },
+  {
+    "loss": 0.6173,
+    "grad_norm": 0.6712722778320312,
+    "learning_rate": 7.541921403320593e-05,
+    "entropy": 0.5867601800709963,
+    "num_tokens": 529578.0,
+    "mean_token_accuracy": 0.8806560948491097,
+    "epoch": 0.7729994556341862,
+    "step": 710
+  },
+  {
+    "loss": 0.6322,
+    "grad_norm": 0.5191476345062256,
+    "learning_rate": 6.874904486018821e-05,
+    "entropy": 0.6274564698338508,
+    "num_tokens": 537024.0,
+    "mean_token_accuracy": 0.8802066639065742,
+    "epoch": 0.783886771910724,
+    "step": 720
+  },
+  {
+    "loss": 0.5631,
+    "grad_norm": 0.6473787426948547,
+    "learning_rate": 6.234040285030551e-05,
+    "entropy": 0.5485220493748784,
+    "num_tokens": 544467.0,
+    "mean_token_accuracy": 0.884115393459797,
+    "epoch": 0.7947740881872618,
+    "step": 730
+  },
+  {
+    "loss": 0.544,
+    "grad_norm": 0.6537692546844482,
+    "learning_rate": 5.6202535038770045e-05,
+    "entropy": 0.5133912313729525,
+    "num_tokens": 551834.0,
+    "mean_token_accuracy": 0.8923841789364815,
+    "epoch": 0.8056614044637996,
+    "step": 740
+  },
+  {
+    "loss": 0.5861,
+    "grad_norm": 0.6818355321884155,
+    "learning_rate": 5.0344297760463954e-05,
+    "entropy": 0.5843172324821353,
+    "num_tokens": 559289.0,
+    "mean_token_accuracy": 0.8821831315755844,
+    "epoch": 0.8165487207403375,
+    "step": 750
+  },
+  {
+    "loss": 0.6023,
+    "grad_norm": 0.7315685153007507,
+    "learning_rate": 4.477414387112652e-05,
+    "entropy": 0.5654132578521966,
+    "num_tokens": 566780.0,
+    "mean_token_accuracy": 0.8810448855161667,
+    "epoch": 0.8274360370168753,
+    "step": 760
+  },
+  {
+    "loss": 0.5637,
+    "grad_norm": 0.6280286312103271,
+    "learning_rate": 3.950011055072039e-05,
+    "entropy": 0.5563108414411545,
+    "num_tokens": 574206.0,
+    "mean_token_accuracy": 0.8888336777687073,
+    "epoch": 0.8383233532934131,
+    "step": 770
+  },
+  {
+    "loss": 0.6045,
+    "grad_norm": 0.4488978683948517,
+    "learning_rate": 3.4529807706578346e-05,
+    "entropy": 0.5699046881869435,
+    "num_tokens": 581608.0,
+    "mean_token_accuracy": 0.8825341418385506,
+    "epoch": 0.8492106695699511,
+    "step": 780
+  },
+  {
+    "loss": 0.6071,
+    "grad_norm": 0.7140914797782898,
+    "learning_rate": 2.987040699306076e-05,
+    "entropy": 0.5627471528947353,
+    "num_tokens": 589176.0,
+    "mean_token_accuracy": 0.8823000833392143,
+    "epoch": 0.8600979858464889,
+    "step": 790
+  },
+  {
+    "loss": 0.6048,
+    "grad_norm": 0.7391604781150818,
+    "learning_rate": 2.5528631463569348e-05,
+    "entropy": 0.5856846395879984,
+    "num_tokens": 596679.0,
+    "mean_token_accuracy": 0.8830034390091897,
+    "epoch": 0.8709853021230267,
+    "step": 800
+  },
+  {
+    "loss": 0.4758,
+    "grad_norm": 0.44131314754486084,
+    "learning_rate": 2.151074586984744e-05,
+    "entropy": 0.47006473541259763,
+    "num_tokens": 603956.0,
+    "mean_token_accuracy": 0.8987425476312637,
+    "epoch": 0.8818726183995645,
+    "step": 810
+  },
+  {
+    "loss": 0.5234,
+    "grad_norm": 0.9443516135215759,
+    "learning_rate": 1.7822547622564188e-05,
+    "entropy": 0.500870693475008,
+    "num_tokens": 611308.0,
+    "mean_token_accuracy": 0.8955065041780472,
+    "epoch": 0.8927599346761024,
+    "step": 820
+  },
+  {
+    "loss": 0.5874,
+    "grad_norm": 0.6435447931289673,
+    "learning_rate": 1.4469358426225682e-05,
+    "entropy": 0.5766935784369707,
+    "num_tokens": 618906.0,
+    "mean_token_accuracy": 0.8817667260766029,
+    "epoch": 0.9036472509526402,
+    "step": 830
+  },
+  {
+    "loss": 0.5684,
+    "grad_norm": 0.7876263856887817,
+    "learning_rate": 1.1456016600482706e-05,
+    "entropy": 0.5494085047394037,
+    "num_tokens": 626224.0,
+    "mean_token_accuracy": 0.8897502169013023,
+    "epoch": 0.914534567229178,
+    "step": 840
+  },
+  {
+    "loss": 0.5115,
+    "grad_norm": 0.577179491519928,
+    "learning_rate": 8.78687009891499e-06,
+    "entropy": 0.5102550655603408,
+    "num_tokens": 633424.0,
+    "mean_token_accuracy": 0.8966878160834313,
+    "epoch": 0.9254218835057159,
+    "step": 850
+  },
+  {
+    "loss": 0.5585,
+    "grad_norm": 0.5634007453918457,
+    "learning_rate": 6.465770235365404e-06,
+    "entropy": 0.5591466184705496,
+    "num_tokens": 640829.0,
+    "mean_token_accuracy": 0.88973039239645,
+    "epoch": 0.9363091997822537,
+    "step": 860
+  },
+  {
+    "loss": 0.522,
+    "grad_norm": 0.47885215282440186,
+    "learning_rate": 4.496066126875531e-06,
+    "entropy": 0.5340237215161323,
+    "num_tokens": 648132.0,
+    "mean_token_accuracy": 0.8937937587499618,
+    "epoch": 0.9471965160587915,
+    "step": 870
+  },
+  {
+    "loss": 0.5222,
+    "grad_norm": 0.45497098565101624,
+    "learning_rate": 2.8805998612418396e-06,
+    "entropy": 0.5373509109020234,
+    "num_tokens": 655528.0,
+    "mean_token_accuracy": 0.8893009826540947,
+    "epoch": 0.9580838323353293,
+    "step": 880
+  },
+  {
+    "loss": 0.6542,
+    "grad_norm": 0.5892038345336914,
+    "learning_rate": 1.6217023961647982e-06,
+    "entropy": 0.6342795874923468,
+    "num_tokens": 663081.0,
+    "mean_token_accuracy": 0.8745187908411026,
+    "epoch": 0.9689711486118672,
+    "step": 890
+  },
+  {
+    "loss": 0.6075,
+    "grad_norm": 0.4947717785835266,
+    "learning_rate": 7.211901959078004e-07,
+    "entropy": 0.594165425002575,
+    "num_tokens": 670555.0,
+    "mean_token_accuracy": 0.8763557627797127,
+    "epoch": 0.979858464888405,
+    "step": 900
+  },
+  {
+    "loss": 0.5612,
+    "grad_norm": 0.5046107769012451,
+    "learning_rate": 1.8036261031936784e-07,
+    "entropy": 0.5458611365407705,
+    "num_tokens": 677920.0,
+    "mean_token_accuracy": 0.8900749757885933,
+    "epoch": 0.9907457811649428,
+    "step": 910
+  },
+  {
+    "eval_loss": 0.5503931641578674,
+    "eval_runtime": 353.9795,
+    "eval_samples_per_second": 1.155,
+    "eval_steps_per_second": 1.155,
+    "eval_entropy": 0.5564239460492192,
+    "eval_num_tokens": 684102.0,
+    "eval_mean_token_accuracy": 0.8885072296289477,
+    "epoch": 1.0,
+    "step": 919
+  },
+  {
+    "train_runtime": 7276.3168,
+    "train_samples_per_second": 0.505,
+    "train_steps_per_second": 0.126,
+    "total_flos": 1.5456460396345344e+16,
+    "train_loss": 0.7488500804714332,
+    "epoch": 1.0,
+    "step": 919
+  }
+]