Komma-LuisMiSanVe committed 22 days ago

Commit 0bc3784 (verified) · 1 parent: 1e6c9f0

Delete sql-model

Files changed (17)

sql-model/README.md +0 -62
sql-model/adapter_config.json +0 -41
sql-model/adapter_model.safetensors +0 -3
sql-model/chat_template.jinja +0 -54
sql-model/checkpoint-1750/README.md +0 -209
sql-model/checkpoint-1750/adapter_config.json +0 -41
sql-model/checkpoint-1750/adapter_model.safetensors +0 -3
sql-model/checkpoint-1750/chat_template.jinja +0 -54
sql-model/checkpoint-1750/optimizer.pt +0 -3
sql-model/checkpoint-1750/rng_state.pth +0 -3
sql-model/checkpoint-1750/scheduler.pt +0 -3
sql-model/checkpoint-1750/tokenizer.json +0 -3
sql-model/checkpoint-1750/tokenizer_config.json +0 -29
sql-model/checkpoint-1750/trainer_state.json +0 -1784
sql-model/checkpoint-1750/training_args.bin +0 -3
sql-model/tokenizer.json +0 -3
sql-model/tokenizer_config.json +0 -29

sql-model/README.md DELETED

@@ -1,62 +0,0 @@
----
-base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
-library_name: peft
-model_name: sql-model
-tags:
-- base_model:adapter:Qwen/Qwen2.5-Coder-1.5B-Instruct
-- lora
-- sft
-- transformers
-- trl
-licence: license
-pipeline_tag: text-generation
----
-# Model Card for sql-model
-This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
-```python
-from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="None", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-## Training procedure
-This model was trained with SFT.
-### Framework versions
-- PEFT 0.18.1
-- TRL: 1.0.0
-- Transformers: 5.4.0
-- Pytorch: 2.11.0
-- Datasets: 4.8.4
-- Tokenizers: 0.22.2
-## Citations
-Cite TRL as:
-```bibtex
-@software{vonwerra2020trl,
-  title   = {{TRL: Transformers Reinforcement Learning}},
-  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-  license = {Apache-2.0},
-  url     = {https://github.com/huggingface/trl},
-  year    = {2020}
-}
-```

sql-model/adapter_config.json DELETED

@@ -1,41 +0,0 @@
-{
-  "alora_invocation_tokens": null,
-  "alpha_pattern": {},
-  "arrow_config": null,
-  "auto_mapping": null,
-  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
-  "bias": "none",
-  "corda_config": null,
-  "ensure_weight_tying": false,
-  "eva_config": null,
-  "exclude_modules": null,
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_bias": false,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "peft_version": "0.18.1",
-  "qalora_group_size": 16,
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "v_proj",
-    "q_proj"
-  ],
-  "target_parameters": null,
-  "task_type": "CAUSAL_LM",
-  "trainable_token_indices": null,
-  "use_dora": false,
-  "use_qalora": false,
-  "use_rslora": false
-}
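
As a rough cross-check of the adapter config above, the sketch below counts the LoRA parameters implied by r = 16 on q_proj and v_proj. The base-model dimensions (hidden size 1536, 28 layers, 12 attention heads, 2 key/value heads) do not appear in this diff; they are assumptions about Qwen2.5-Coder-1.5B-Instruct and should be checked against that model's own config.json. Under those assumptions and float32 storage, the weights come to about 8.7 MB, which is in line with the 8,731,128-byte adapter_model.safetensors pointer in this commit.

```python
# LoRA hyperparameters copied from adapter_config.json in this diff.
R = 16
LORA_ALPHA = 32
TARGET_MODULES = ("q_proj", "v_proj")

# ASSUMED base-model dimensions (not part of the diff; verify against
# the base model's config.json before relying on them).
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_ATTENTION_HEADS = 12
NUM_KV_HEADS = 2


def lora_param_count() -> int:
    """Count LoRA A and B matrix entries across all layers."""
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS
    out_features = {
        "q_proj": NUM_ATTENTION_HEADS * head_dim,
        "v_proj": NUM_KV_HEADS * head_dim,
    }
    # Each adapted linear layer adds A (r x in) and B (out x r).
    per_layer = sum(R * (HIDDEN_SIZE + out_features[name]) for name in TARGET_MODULES)
    return per_layer * NUM_LAYERS


total_params = lora_param_count()
scaling = LORA_ALPHA / R  # plain LoRA scaling, since use_rslora is false
approx_bytes_fp32 = total_params * 4

print(f"trainable LoRA parameters: {total_params}")
print(f"LoRA scaling (alpha / r):  {scaling}")
print(f"approx. float32 size:      {approx_bytes_fp32} bytes")
```

The small gap between the estimate and the recorded file size would be accounted for by the safetensors header, though that is not confirmed by anything in the diff.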

sql-model/adapter_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:233052429edaba0de411ff7a503eec8183193d955664da50103ee110cb835952
-size 8731128
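
The three deleted lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical and the parser handles only the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the diff, without the leading "-" markers.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:233052429edaba0de411ff7a503eec8183193d955664da50103ee110cb835952
size 8731128
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```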

sql-model/chat_template.jinja DELETED

@@ -1,54 +0,0 @@
-{%- if tools %}
-    {{- '<|im_start|>system\n' }}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- messages[0]['content'] }}
-    {%- else %}
-        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
-    {%- endif %}
-    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
-    {%- for tool in tools %}
-        {{- "\n" }}
-        {{- tool | tojson }}
-    {%- endfor %}
-    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
-{%- else %}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
-    {%- else %}
-        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
-    {%- endif %}
-{%- endif %}
-{%- for message in messages %}
-    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
-        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
-    {%- elif message.role == "assistant" %}
-        {{- '<|im_start|>' + message.role }}
-        {%- if message.content %}
-            {{- '\n' + message.content }}
-        {%- endif %}
-        {%- for tool_call in message.tool_calls %}
-            {%- if tool_call.function is defined %}
-                {%- set tool_call = tool_call.function %}
-            {%- endif %}
-            {{- '\n<tool_call>\n{"name": "' }}
-            {{- tool_call.name }}
-            {{- '", "arguments": ' }}
-            {{- tool_call.arguments | tojson }}
-            {{- '}\n</tool_call>' }}
-        {%- endfor %}
-        {{- '<|im_end|>\n' }}
-    {%- elif message.role == "tool" %}
-        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
-            {{- '<|im_start|>user' }}
-        {%- endif %}
-        {{- '\n<tool_response>\n' }}
-        {{- message.content }}
-        {{- '\n</tool_response>' }}
-        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
-            {{- '<|im_end|>\n' }}
-        {%- endif %}
-    {%- endif %}
-{%- endfor %}
-{%- if add_generation_prompt %}
-    {{- '<|im_start|>assistant\n' }}
-{%- endif %}
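
For a plain conversation, the deleted template above reduces to ChatML framing. The sketch below mirrors only the no-tools branch with text-only messages; it is a simplified re-implementation for illustration, not the Jinja template itself, and `render_chatml` is a hypothetical name. Tool definitions, tool calls, and tool responses are not handled.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Render text-only messages the way the template's no-tools branch does."""
    if messages and messages[0]["role"] == "system":
        system, rest = messages[0]["content"], messages[1:]
    else:
        # The template falls back to a default system prompt.
        system, rest = DEFAULT_SYSTEM, messages
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for message in rest:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "List all customers."}])
print(prompt)
```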

sql-model/checkpoint-1750/README.md DELETED

@@ -1,209 +0,0 @@
----
-base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
-library_name: peft
-pipeline_tag: text-generation
-tags:
-- base_model:adapter:Qwen/Qwen2.5-Coder-1.5B-Instruct
-- lora
-- sft
-- transformers
-- trl
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.18.1

sql-model/checkpoint-1750/adapter_config.json DELETED

@@ -1,41 +0,0 @@
-{
-  "alora_invocation_tokens": null,
-  "alpha_pattern": {},
-  "arrow_config": null,
-  "auto_mapping": null,
-  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
-  "bias": "none",
-  "corda_config": null,
-  "ensure_weight_tying": false,
-  "eva_config": null,
-  "exclude_modules": null,
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_bias": false,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "peft_version": "0.18.1",
-  "qalora_group_size": 16,
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "v_proj",
-    "q_proj"
-  ],
-  "target_parameters": null,
-  "task_type": "CAUSAL_LM",
-  "trainable_token_indices": null,
-  "use_dora": false,
-  "use_qalora": false,
-  "use_rslora": false
-}

sql-model/checkpoint-1750/adapter_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:233052429edaba0de411ff7a503eec8183193d955664da50103ee110cb835952
-size 8731128

sql-model/checkpoint-1750/chat_template.jinja DELETED

@@ -1,54 +0,0 @@
-{%- if tools %}
-    {{- '<|im_start|>system\n' }}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- messages[0]['content'] }}
-    {%- else %}
-        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
-    {%- endif %}
-    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
-    {%- for tool in tools %}
-        {{- "\n" }}
-        {{- tool | tojson }}
-    {%- endfor %}
-    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
-{%- else %}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
-    {%- else %}
-        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
-    {%- endif %}
-{%- endif %}
-{%- for message in messages %}
-    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
-        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
-    {%- elif message.role == "assistant" %}
-        {{- '<|im_start|>' + message.role }}
-        {%- if message.content %}
-            {{- '\n' + message.content }}
-        {%- endif %}
-        {%- for tool_call in message.tool_calls %}
-            {%- if tool_call.function is defined %}
-                {%- set tool_call = tool_call.function %}
-            {%- endif %}
-            {{- '\n<tool_call>\n{"name": "' }}
-            {{- tool_call.name }}
-            {{- '", "arguments": ' }}
-            {{- tool_call.arguments | tojson }}
-            {{- '}\n</tool_call>' }}
-        {%- endfor %}
-        {{- '<|im_end|>\n' }}
-    {%- elif message.role == "tool" %}
-        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
-            {{- '<|im_start|>user' }}
-        {%- endif %}
-        {{- '\n<tool_response>\n' }}
-        {{- message.content }}
-        {{- '\n</tool_response>' }}
-        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
-            {{- '<|im_end|>\n' }}
-        {%- endif %}
-    {%- endif %}
-{%- endfor %}
-{%- if add_generation_prompt %}
-    {{- '<|im_start|>assistant\n' }}
-{%- endif %}

sql-model/checkpoint-1750/optimizer.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6f63982639d2cdd3d73b243be88644fe815c0b4a48555410eb414422116b7ade
-size 17524171

sql-model/checkpoint-1750/rng_state.pth DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:c593dbe7b4c13895455ed97063d53435660103dbe2e0cf605b493badb4ad85cd
-size 14455

sql-model/checkpoint-1750/scheduler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:af667abda37bcebe1e332a00c2289b57845721907fb4438e53936d778ed7af82
-size 1465

sql-model/checkpoint-1750/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
-size 11421892

sql-model/checkpoint-1750/tokenizer_config.json DELETED

@@ -1,29 +0,0 @@
-{
-  "add_prefix_space": false,
-  "backend": "tokenizers",
-  "bos_token": null,
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "extra_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "is_local": false,
-  "model_max_length": 32768,
-  "pad_token": "<|endoftext|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}

sql-model/checkpoint-1750/trainer_state.json DELETED
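
The learning_rate values logged in this file's log_history look consistent with a linear decay from a peak of 2e-4 to zero over the 1,750 steps, with no warmup. Neither the peak nor the schedule type is stated in the diff; both are inferred from the logged numbers, and the `step - 1` offset is an assumption about when the value is recorded. The sketch below checks that reading against a few entries copied from the log.

```python
import math

PEAK_LR = 2e-4       # inferred from the logged values, not stated in the diff
TOTAL_STEPS = 1750   # "global_step" in trainer_state.json


def linear_decay_lr(step: int) -> float:
    """Learning rate under an assumed warmup-free linear decay to zero."""
    return PEAK_LR * (1 - (step - 1) / TOTAL_STEPS)


# (step, learning_rate) pairs copied from log_history.
LOGGED = {
    10: 0.00019897142857142858,
    20: 0.00019782857142857142,
    50: 0.0001944,
    100: 0.00018868571428571428,
    350: 0.00016011428571428572,
}

for step, logged_lr in LOGGED.items():
    predicted = linear_decay_lr(step)
    matches = math.isclose(predicted, logged_lr, rel_tol=1e-9)
    print(f"step {step:4d}: logged={logged_lr:.6e} predicted={predicted:.6e} match={matches}")
```

The logged epoch values follow the same pattern: step 10 corresponds to 10 / 1750 ≈ 0.0057 of the single epoch.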

@@ -1,1784 +0,0 @@
-{
-  "best_global_step": null,
-  "best_metric": null,
-  "best_model_checkpoint": null,
-  "epoch": 1.0,
-  "eval_steps": 500,
-  "global_step": 1750,
-  "is_hyper_param_search": false,
-  "is_local_process_zero": true,
-  "is_world_process_zero": true,
-  "log_history": [
-    {
-      "entropy": 1.755023404955864,
-      "epoch": 0.005714285714285714,
-      "grad_norm": 0.4571238160133362,
-      "learning_rate": 0.00019897142857142858,
-      "loss": 3.2381954193115234,
-      "mean_token_accuracy": 0.556605902314186,
-      "num_tokens": 3387.0,
-      "step": 10
-    },
-    {
-      "entropy": 1.872155451774597,
-      "epoch": 0.011428571428571429,
-      "grad_norm": 0.34199467301368713,
-      "learning_rate": 0.00019782857142857142,
-      "loss": 2.191742706298828,
-      "mean_token_accuracy": 0.6503265604376793,
-      "num_tokens": 6716.0,
-      "step": 20
-    },
-    {
-      "entropy": 1.484403568506241,
-      "epoch": 0.017142857142857144,
-      "grad_norm": 0.3380744755268097,
-      "learning_rate": 0.0001966857142857143,
-      "loss": 1.4328497886657714,
-      "mean_token_accuracy": 0.746163409948349,
-      "num_tokens": 9752.0,
-      "step": 30
-    },
-    {
-      "entropy": 1.0218224853277207,
-      "epoch": 0.022857142857142857,
-      "grad_norm": 0.2288217842578888,
-      "learning_rate": 0.00019554285714285717,
-      "loss": 1.1417624473571777,
-      "mean_token_accuracy": 0.8017501533031464,
-      "num_tokens": 13031.0,
-      "step": 40
-    },
-    {
-      "entropy": 0.9253160357475281,
-      "epoch": 0.02857142857142857,
-      "grad_norm": 0.3444674015045166,
-      "learning_rate": 0.0001944,
-      "loss": 1.0559259414672852,
-      "mean_token_accuracy": 0.8092378318309784,
-      "num_tokens": 16325.0,
-      "step": 50
-    },
-    {
-      "entropy": 0.8747231267392636,
-      "epoch": 0.03428571428571429,
-      "grad_norm": 0.39356729388237,
-      "learning_rate": 0.00019325714285714287,
-      "loss": 0.8799959182739258,
-      "mean_token_accuracy": 0.8227440923452377,
-      "num_tokens": 19968.0,
-      "step": 60
-    },
-    {
-      "entropy": 0.8573793575167656,
-      "epoch": 0.04,
-      "grad_norm": 0.20990096032619476,
-      "learning_rate": 0.0001921142857142857,
-      "loss": 0.844728660583496,
-      "mean_token_accuracy": 0.8225094452500343,
-      "num_tokens": 23194.0,
-      "step": 70
-    },
-    {
-      "entropy": 0.8150502145290375,
-      "epoch": 0.045714285714285714,
-      "grad_norm": 0.2119598537683487,
-      "learning_rate": 0.00019097142857142857,
-      "loss": 0.7866294860839844,
-      "mean_token_accuracy": 0.8332153782248497,
-      "num_tokens": 26444.0,
-      "step": 80
-    },
-    {
-      "entropy": 0.7989433646202088,
-      "epoch": 0.05142857142857143,
-      "grad_norm": 5.590895175933838,
-      "learning_rate": 0.00018982857142857144,
-      "loss": 0.7787356853485108,
-      "mean_token_accuracy": 0.8374833926558495,
-      "num_tokens": 29847.0,
-      "step": 90
-    },
-    {
-      "entropy": 0.7925440274178982,
-      "epoch": 0.05714285714285714,
-      "grad_norm": 0.2303859442472458,
-      "learning_rate": 0.00018868571428571428,
-      "loss": 0.7289025783538818,
-      "mean_token_accuracy": 0.8440518975257874,
-      "num_tokens": 33416.0,
-      "step": 100
-    },
-    {
-      "entropy": 0.7396377846598625,
-      "epoch": 0.06285714285714286,
-      "grad_norm": 0.1661251187324524,
-      "learning_rate": 0.00018754285714285714,
-      "loss": 0.7113327980041504,
-      "mean_token_accuracy": 0.844328448176384,
-      "num_tokens": 36508.0,
-      "step": 110
-    },
-    {
-      "entropy": 0.7262422397732735,
-      "epoch": 0.06857142857142857,
-      "grad_norm": 0.16159392893314362,
-      "learning_rate": 0.00018640000000000003,
-      "loss": 0.7208394050598145,
-      "mean_token_accuracy": 0.8527172908186913,
-      "num_tokens": 39969.0,
-      "step": 120
-    },
-    {
-      "entropy": 0.7012980431318283,
-      "epoch": 0.07428571428571429,
-      "grad_norm": 0.18032804131507874,
-      "learning_rate": 0.00018525714285714287,
-      "loss": 0.7320738792419433,
-      "mean_token_accuracy": 0.8414866328239441,
-      "num_tokens": 43554.0,
-      "step": 130
-    },
-    {
-      "entropy": 0.7090784892439842,
-      "epoch": 0.08,
-      "grad_norm": 0.1905902475118637,
-      "learning_rate": 0.00018411428571428573,
-      "loss": 0.7095338821411132,
-      "mean_token_accuracy": 0.8467919006943703,
-      "num_tokens": 47352.0,
-      "step": 140
-    },
-    {
-      "entropy": 0.7508301064372063,
-      "epoch": 0.08571428571428572,
-      "grad_norm": 0.2042306363582611,
-      "learning_rate": 0.00018297142857142857,
-      "loss": 0.714650297164917,
-      "mean_token_accuracy": 0.8489894285798073,
-      "num_tokens": 50592.0,
-      "step": 150
-    },
-    {
-      "entropy": 0.7499568641185761,
-      "epoch": 0.09142857142857143,
-      "grad_norm": 0.14059635996818542,
-      "learning_rate": 0.00018182857142857143,
-      "loss": 0.7066077709197998,
-      "mean_token_accuracy": 0.8459182664752006,
-      "num_tokens": 53905.0,
-      "step": 160
-    },
-    {
-      "entropy": 0.751581659913063,
-      "epoch": 0.09714285714285714,
-      "grad_norm": 0.1389404684305191,
-      "learning_rate": 0.0001806857142857143,
-      "loss": 0.7304620265960693,
-      "mean_token_accuracy": 0.8408536836504936,
-      "num_tokens": 57406.0,
-      "step": 170
-    },
-    {
-      "entropy": 0.7080035664141178,
-      "epoch": 0.10285714285714286,
-      "grad_norm": 0.20151932537555695,
-      "learning_rate": 0.00017954285714285714,
-      "loss": 0.6872885704040528,
-      "mean_token_accuracy": 0.8518571972846984,
-      "num_tokens": 60514.0,
-      "step": 180
-    },
-    {
-      "entropy": 0.7358761139214038,
-      "epoch": 0.10857142857142857,
-      "grad_norm": 0.15893664956092834,
-      "learning_rate": 0.0001784,
-      "loss": 0.7053659915924072,
-      "mean_token_accuracy": 0.8461879312992096,
-      "num_tokens": 63978.0,
-      "step": 190
-    },
-    {
-      "entropy": 0.7305082514882087,
-      "epoch": 0.11428571428571428,
-      "grad_norm": 0.17859824001789093,
-      "learning_rate": 0.00017725714285714286,
-      "loss": 0.7007936954498291,
-      "mean_token_accuracy": 0.8456599608063697,
-      "num_tokens": 67445.0,
-      "step": 200
-    },
-    {
-      "entropy": 0.6865443497896194,
-      "epoch": 0.12,
-      "grad_norm": 0.17160974442958832,
-      "learning_rate": 0.00017611428571428573,
-      "loss": 0.684369421005249,
-      "mean_token_accuracy": 0.8545770645141602,
-      "num_tokens": 71004.0,
-      "step": 210
-    },
-    {
-      "entropy": 0.743726947158575,
-      "epoch": 0.12571428571428572,
-      "grad_norm": 0.19258913397789001,
-      "learning_rate": 0.0001749714285714286,
-      "loss": 0.6941751956939697,
-      "mean_token_accuracy": 0.8491073668003082,
-      "num_tokens": 74467.0,
-      "step": 220
-    },
-    {
-      "entropy": 0.7293999463319778,
-      "epoch": 0.13142857142857142,
-      "grad_norm": 0.21078741550445557,
-      "learning_rate": 0.00017382857142857143,
-      "loss": 0.7608419418334961,
-      "mean_token_accuracy": 0.8355702117085457,
-      "num_tokens": 77733.0,
-      "step": 230
-    },
-    {
-      "entropy": 0.6695528343319893,
-      "epoch": 0.13714285714285715,
-      "grad_norm": 0.2082340568304062,
-      "learning_rate": 0.0001726857142857143,
-      "loss": 0.6619296073913574,
-      "mean_token_accuracy": 0.8572208806872368,
-      "num_tokens": 81271.0,
-      "step": 240
-    },
-    {
-      "entropy": 0.7246910102665425,
-      "epoch": 0.14285714285714285,
-      "grad_norm": 0.20099493861198425,
-      "learning_rate": 0.00017154285714285716,
-      "loss": 0.7173333168029785,
-      "mean_token_accuracy": 0.8494016170501709,
-      "num_tokens": 84574.0,
-      "step": 250
-    },
-    {
-      "entropy": 0.6853026911616326,
-      "epoch": 0.14857142857142858,
-      "grad_norm": 0.17009125649929047,
-      "learning_rate": 0.0001704,
-      "loss": 0.6357984066009521,
-      "mean_token_accuracy": 0.8530513703823089,
-      "num_tokens": 88152.0,
-      "step": 260
-    },
-    {
-      "entropy": 0.6933697812259197,
-      "epoch": 0.15428571428571428,
-      "grad_norm": 0.24327197670936584,
-      "learning_rate": 0.00016925714285714286,
-      "loss": 0.6879170417785645,
-      "mean_token_accuracy": 0.8545807048678398,
-      "num_tokens": 91522.0,
-      "step": 270
-    },
-    {
-      "entropy": 0.685890257358551,
-      "epoch": 0.16,
-      "grad_norm": 0.1667262464761734,
-      "learning_rate": 0.00016811428571428572,
-      "loss": 0.6770327091217041,
-      "mean_token_accuracy": 0.8566557437181472,
-      "num_tokens": 94937.0,
-      "step": 280
-    },
-    {
-      "entropy": 0.6751709222793579,
-      "epoch": 0.1657142857142857,
-      "grad_norm": 0.18273314833641052,
-      "learning_rate": 0.0001669714285714286,
-      "loss": 0.6363730907440186,
-      "mean_token_accuracy": 0.86243856549263,
-      "num_tokens": 97983.0,
-      "step": 290
-    },
-    {
-      "entropy": 0.6693296499550343,
-      "epoch": 0.17142857142857143,
-      "grad_norm": 0.18416839838027954,
-      "learning_rate": 0.00016582857142857145,
-      "loss": 0.6525997161865235,
-      "mean_token_accuracy": 0.8522453367710113,
-      "num_tokens": 101606.0,
-      "step": 300
-    },
-    {
-      "entropy": 0.7025774903595448,
-      "epoch": 0.17714285714285713,
-      "grad_norm": 0.2354522943496704,
-      "learning_rate": 0.0001646857142857143,
-      "loss": 0.7257501125335694,
-      "mean_token_accuracy": 0.8442893981933594,
-      "num_tokens": 104894.0,
-      "step": 310
-    },
-    {
-      "entropy": 0.7182297334074974,
-      "epoch": 0.18285714285714286,
-      "grad_norm": 0.525611162185669,
-      "learning_rate": 0.00016354285714285715,
-      "loss": 0.7307909011840821,
-      "mean_token_accuracy": 0.8398969128727913,
-      "num_tokens": 108488.0,
-      "step": 320
-    },
-    {
-      "entropy": 0.7585869312286377,
-      "epoch": 0.18857142857142858,
-      "grad_norm": 0.19886285066604614,
-      "learning_rate": 0.00016240000000000002,
-      "loss": 0.7316296100616455,
-      "mean_token_accuracy": 0.8501298695802688,
-      "num_tokens": 111621.0,
-      "step": 330
-    },
-    {
-      "entropy": 0.7191452518105507,
-      "epoch": 0.19428571428571428,
-      "grad_norm": 0.18904122710227966,
-      "learning_rate": 0.00016125714285714285,
-      "loss": 0.7095869541168213,
-      "mean_token_accuracy": 0.8502562329173088,
-      "num_tokens": 114846.0,
-      "step": 340
-    },
-    {
-      "entropy": 0.76069777905941,
-      "epoch": 0.2,
-      "grad_norm": 0.1604829579591751,
-      "learning_rate": 0.00016011428571428572,
-      "loss": 0.7304455280303955,
-      "mean_token_accuracy": 0.8341712266206741,
-      "num_tokens": 118373.0,
-      "step": 350
-    },
-    {
-      "entropy": 0.7046815037727356,
-      "epoch": 0.2057142857142857,
-      "grad_norm": 0.19462066888809204,
-      "learning_rate": 0.00015897142857142858,
-      "loss": 0.7354346752166748,
-      "mean_token_accuracy": 0.8472130700945855,
-      "num_tokens": 122029.0,
-      "step": 360
-    },
-    {
-      "entropy": 0.6850700139999389,
-      "epoch": 0.21142857142857144,
-      "grad_norm": 0.22383001446723938,
-      "learning_rate": 0.00015782857142857145,
-      "loss": 0.6635906219482421,
-      "mean_token_accuracy": 0.853803887963295,
-      "num_tokens": 125215.0,
-      "step": 370
-    },
-    {
-      "entropy": 0.6743280217051506,
-      "epoch": 0.21714285714285714,
-      "grad_norm": 0.22975340485572815,
-      "learning_rate": 0.0001566857142857143,
-      "loss": 0.6944728851318359,
-      "mean_token_accuracy": 0.8531624928116799,
-      "num_tokens": 128377.0,
-      "step": 380
-    },
-    {
-      "entropy": 0.6956075571477414,
-      "epoch": 0.22285714285714286,
-      "grad_norm": 0.16905982792377472,
-      "learning_rate": 0.00015554285714285715,
-      "loss": 0.6574249267578125,
-      "mean_token_accuracy": 0.8537535384297371,
-      "num_tokens": 131561.0,
-      "step": 390
-    },
-    {
-      "entropy": 0.6535641267895699,
-      "epoch": 0.22857142857142856,
-      "grad_norm": 0.2020592838525772,
-      "learning_rate": 0.0001544,
-      "loss": 0.6684637546539307,
-      "mean_token_accuracy": 0.8588701456785202,
-      "num_tokens": 134886.0,
-      "step": 400
-    },
-    {
-      "entropy": 0.7120788738131523,
-      "epoch": 0.2342857142857143,
-      "grad_norm": 0.17928935587406158,
-      "learning_rate": 0.00015325714285714285,
-      "loss": 0.6707518100738525,
-      "mean_token_accuracy": 0.855300298333168,
-      "num_tokens": 138226.0,
-      "step": 410
-    },
-    {
-      "entropy": 0.7220979325473309,
-      "epoch": 0.24,
-      "grad_norm": 0.1849253624677658,
-      "learning_rate": 0.00015211428571428571,
-      "loss": 0.6626916408538819,
-      "mean_token_accuracy": 0.8522223040461541,
-      "num_tokens": 141520.0,
-      "step": 420
-    },
-    {
-      "entropy": 0.6991067595779896,
-      "epoch": 0.24571428571428572,
-      "grad_norm": 0.20180848240852356,
-      "learning_rate": 0.00015097142857142858,
-      "loss": 0.6791836261749268,
-      "mean_token_accuracy": 0.8506909489631653,
-      "num_tokens": 144798.0,
-      "step": 430
-    },
-    {
-      "entropy": 0.6208832249045372,
-      "epoch": 0.25142857142857145,
-      "grad_norm": 0.16604188084602356,
-      "learning_rate": 0.00014982857142857144,
-      "loss": 0.6340303897857666,
-      "mean_token_accuracy": 0.8603393912315369,
-      "num_tokens": 148478.0,
-      "step": 440
-    },
-    {
-      "entropy": 0.6770513989031315,
-      "epoch": 0.2571428571428571,
-      "grad_norm": 0.20192453265190125,
-      "learning_rate": 0.0001486857142857143,
-      "loss": 0.6619882106781005,
-      "mean_token_accuracy": 0.8508818298578262,
-      "num_tokens": 152091.0,
-      "step": 450
-    },
-    {
-      "entropy": 0.7052303500473499,
-      "epoch": 0.26285714285714284,
-      "grad_norm": 0.23521800339221954,
-      "learning_rate": 0.00014754285714285717,
-      "loss": 0.7204936027526856,
-      "mean_token_accuracy": 0.8412432789802551,
-      "num_tokens": 155676.0,
-      "step": 460
-    },
-    {
-      "entropy": 0.7039557799696923,
-      "epoch": 0.26857142857142857,
-      "grad_norm": 0.18875496089458466,
-      "learning_rate": 0.0001464,
-      "loss": 0.6947028636932373,
-      "mean_token_accuracy": 0.8545891866087914,
-      "num_tokens": 159143.0,
-      "step": 470
-    },
-    {
-      "entropy": 0.7390237525105476,
-      "epoch": 0.2742857142857143,
-      "grad_norm": 0.20184092223644257,
-      "learning_rate": 0.00014525714285714287,
-      "loss": 0.7198336124420166,
-      "mean_token_accuracy": 0.8431004419922828,
-      "num_tokens": 162539.0,
-      "step": 480
-    },
-    {
-      "entropy": 0.7064043849706649,
-      "epoch": 0.28,
-      "grad_norm": 0.15704013407230377,
-      "learning_rate": 0.0001441142857142857,
-      "loss": 0.6998549938201905,
-      "mean_token_accuracy": 0.8420170620083809,
-      "num_tokens": 166150.0,
-      "step": 490
-    },
-    {
-      "entropy": 0.6628431506454945,
-      "epoch": 0.2857142857142857,
-      "grad_norm": 0.1651039719581604,
-      "learning_rate": 0.00014297142857142857,
-      "loss": 0.6178061485290527,
-      "mean_token_accuracy": 0.8624962165951728,
-      "num_tokens": 169383.0,
-      "step": 500
-    },
-    {
-      "entropy": 0.6844660565257072,
-      "epoch": 0.2914285714285714,
-      "grad_norm": 0.23858557641506195,
-      "learning_rate": 0.00014182857142857144,
-      "loss": 0.6924018859863281,
-      "mean_token_accuracy": 0.8506158754229546,
-      "num_tokens": 172663.0,
-      "step": 510
-    },
-    {
-      "entropy": 0.6808311395347119,
-      "epoch": 0.29714285714285715,
-      "grad_norm": 0.23313727974891663,
-      "learning_rate": 0.00014068571428571427,
-      "loss": 0.6816984653472901,
-      "mean_token_accuracy": 0.8454315170645714,
-      "num_tokens": 175956.0,
-      "step": 520
-    },
-    {
-      "entropy": 0.683815760165453,
-      "epoch": 0.3028571428571429,
-      "grad_norm": 0.20086617767810822,
-      "learning_rate": 0.00013954285714285717,
-      "loss": 0.6635974884033203,
-      "mean_token_accuracy": 0.8560041651129723,
-      "num_tokens": 179187.0,
-      "step": 530
-    },
-    {
-      "entropy": 0.6870512694120408,
-      "epoch": 0.30857142857142855,
-      "grad_norm": 0.24712982773780823,
-      "learning_rate": 0.0001384,
-      "loss": 0.702989149093628,
-      "mean_token_accuracy": 0.8512916043400764,
-      "num_tokens": 182700.0,
-      "step": 540
-    },
-    {
-      "entropy": 0.6640612557530403,
-      "epoch": 0.3142857142857143,
-      "grad_norm": 0.19018641114234924,
-      "learning_rate": 0.00013725714285714287,
-      "loss": 0.6487014293670654,
-      "mean_token_accuracy": 0.8530389070510864,
-      "num_tokens": 186178.0,
-      "step": 550
-    },
-    {
-      "entropy": 0.6581176854670048,
-      "epoch": 0.32,
-      "grad_norm": 0.17294897139072418,
-      "learning_rate": 0.00013611428571428573,
-      "loss": 0.6333216190338135,
-      "mean_token_accuracy": 0.8581204935908318,
-      "num_tokens": 189593.0,
-      "step": 560
-    },
-    {
-      "entropy": 0.6372215747833252,
-      "epoch": 0.32571428571428573,
-      "grad_norm": 0.20004868507385254,
-      "learning_rate": 0.00013497142857142857,
-      "loss": 0.634274959564209,
-      "mean_token_accuracy": 0.8586296364665031,
-      "num_tokens": 192755.0,
-      "step": 570
-    },
-    {
-      "entropy": 0.6754648350179195,
-      "epoch": 0.3314285714285714,
-      "grad_norm": 0.23564350605010986,
-      "learning_rate": 0.00013382857142857143,
-      "loss": 0.7094334602355957,
-      "mean_token_accuracy": 0.8480966657400131,
-      "num_tokens": 196315.0,
-      "step": 580
-    },
-    {
-      "entropy": 0.66893340498209,
-      "epoch": 0.33714285714285713,
-      "grad_norm": 0.21135050058364868,
-      "learning_rate": 0.0001326857142857143,
-      "loss": 0.6854133129119873,
-      "mean_token_accuracy": 0.8476407691836357,
-      "num_tokens": 199842.0,
-      "step": 590
-    },
-    {
-      "entropy": 0.734431654214859,
-      "epoch": 0.34285714285714286,
-      "grad_norm": 0.15609301626682281,
-      "learning_rate": 0.00013154285714285713,
-      "loss": 0.6732261657714844,
-      "mean_token_accuracy": 0.8533253937959671,
-      "num_tokens": 203217.0,
-      "step": 600
-    },
-    {
-      "entropy": 0.6809550739824772,
-      "epoch": 0.3485714285714286,
-      "grad_norm": 0.1637752801179886,
-      "learning_rate": 0.0001304,
-      "loss": 0.628327465057373,
-      "mean_token_accuracy": 0.8617108896374702,
-      "num_tokens": 206274.0,
-      "step": 610
-    },
-    {
-      "entropy": 0.6702453441917896,
-      "epoch": 0.35428571428571426,
-      "grad_norm": 0.2080763578414917,
-      "learning_rate": 0.00012925714285714286,
-      "loss": 0.6497041702270507,
-      "mean_token_accuracy": 0.8528500080108643,
-      "num_tokens": 209447.0,
-      "step": 620
-    },
-    {
-      "entropy": 0.6892564371228218,
-      "epoch": 0.36,
-      "grad_norm": 0.24688860774040222,
-      "learning_rate": 0.00012811428571428573,
-      "loss": 0.692191743850708,
-      "mean_token_accuracy": 0.8509424239397049,
-      "num_tokens": 212753.0,
-      "step": 630
-    },
-    {
-      "entropy": 0.6778513200581073,
-      "epoch": 0.3657142857142857,
-      "grad_norm": 0.23269857466220856,
-      "learning_rate": 0.0001269714285714286,
-      "loss": 0.6862228393554688,
-      "mean_token_accuracy": 0.8516128286719322,
-      "num_tokens": 216340.0,
-      "step": 640
-    },
-    {
-      "entropy": 0.6519438281655312,
-      "epoch": 0.37142857142857144,
-      "grad_norm": 0.21059896051883698,
-      "learning_rate": 0.00012582857142857143,
-      "loss": 0.6068024635314941,
-      "mean_token_accuracy": 0.8627916231751442,
-      "num_tokens": 219607.0,
-      "step": 650
-    },
-    {
-      "entropy": 0.6636812917888164,
-      "epoch": 0.37714285714285717,
-      "grad_norm": 0.19394846260547638,
-      "learning_rate": 0.0001246857142857143,
-      "loss": 0.6602859020233154,
-      "mean_token_accuracy": 0.8570883437991142,
-      "num_tokens": 222993.0,
-      "step": 660
-    },
-    {
-      "entropy": 0.6973143525421619,
-      "epoch": 0.38285714285714284,
-      "grad_norm": 0.19678597152233124,
-      "learning_rate": 0.00012354285714285713,
-      "loss": 0.7101614475250244,
-      "mean_token_accuracy": 0.8465988427400589,
-      "num_tokens": 226354.0,
-      "step": 670
-    },
-    {
-      "entropy": 0.6470928482711316,
-      "epoch": 0.38857142857142857,
-      "grad_norm": 0.23552727699279785,
-      "learning_rate": 0.0001224,
-      "loss": 0.6224793434143067,
-      "mean_token_accuracy": 0.8593849778175354,
-      "num_tokens": 229839.0,
-      "step": 680
-    },
-    {
-      "entropy": 0.6596762828528882,
-      "epoch": 0.3942857142857143,
-      "grad_norm": 0.24519386887550354,
-      "learning_rate": 0.00012125714285714287,
-      "loss": 0.6345892429351807,
-      "mean_token_accuracy": 0.8584522992372513,
-      "num_tokens": 233176.0,
-      "step": 690
-    },
-    {
-      "entropy": 0.6750190995633603,
-      "epoch": 0.4,
-      "grad_norm": 0.18651342391967773,
-      "learning_rate": 0.00012011428571428571,
-      "loss": 0.6961663722991943,
-      "mean_token_accuracy": 0.8505131497979164,
-      "num_tokens": 236611.0,
-      "step": 700
-    },
-    {
-      "entropy": 0.6748621694743633,
-      "epoch": 0.4057142857142857,
-      "grad_norm": 0.2448211908340454,
-      "learning_rate": 0.00011897142857142857,
-      "loss": 0.6589895725250244,
-      "mean_token_accuracy": 0.8562820449471473,
-      "num_tokens": 239719.0,
-      "step": 710
-    },
-    {
-      "entropy": 0.6976276993751526,
-      "epoch": 0.4114285714285714,
-      "grad_norm": 0.18323859572410583,
-      "learning_rate": 0.00011782857142857145,
-      "loss": 0.7028826713562012,
-      "mean_token_accuracy": 0.8523884430527687,
-      "num_tokens": 243277.0,
-      "step": 720
-    },
-    {
-      "entropy": 0.6387927994132042,
-      "epoch": 0.41714285714285715,
-      "grad_norm": 0.22905172407627106,
-      "learning_rate": 0.00011668571428571429,
-      "loss": 0.5653384685516357,
-      "mean_token_accuracy": 0.8658381626009941,
-      "num_tokens": 246681.0,
-      "step": 730
-    },
-    {
-      "entropy": 0.711549348384142,
-      "epoch": 0.4228571428571429,
-      "grad_norm": 0.22809088230133057,
-      "learning_rate": 0.00011554285714285715,
-      "loss": 0.7091411590576172,
-      "mean_token_accuracy": 0.8466275259852409,
-      "num_tokens": 250174.0,
-      "step": 740
-    },
-    {
-      "entropy": 0.6525940448045731,
-      "epoch": 0.42857142857142855,
-      "grad_norm": 0.165474072098732,
-      "learning_rate": 0.0001144,
-      "loss": 0.6481022834777832,
-      "mean_token_accuracy": 0.8572756558656692,
-      "num_tokens": 253671.0,
-      "step": 750
-    },
-    {
-      "entropy": 0.6324376344680787,
-      "epoch": 0.4342857142857143,
-      "grad_norm": 0.2185923010110855,
-      "learning_rate": 0.00011325714285714287,
-      "loss": 0.641082763671875,
-      "mean_token_accuracy": 0.8592174142599106,
-      "num_tokens": 256803.0,
-      "step": 760
-    },
-    {
-      "entropy": 0.6347133338451385,
-      "epoch": 0.44,
-      "grad_norm": 0.21539245545864105,
-      "learning_rate": 0.00011211428571428573,
-      "loss": 0.6200827121734619,
-      "mean_token_accuracy": 0.8640724703669548,
-      "num_tokens": 260043.0,
-      "step": 770
-    },
-    {
-      "entropy": 0.6413769535720348,
-      "epoch": 0.44571428571428573,
-      "grad_norm": 0.2725240886211395,
-      "learning_rate": 0.00011097142857142857,
-      "loss": 0.6957359790802002,
-      "mean_token_accuracy": 0.8510783404111862,
-      "num_tokens": 263322.0,
-      "step": 780
-    },
-    {
-      "entropy": 0.6790083512663841,
-      "epoch": 0.4514285714285714,
-      "grad_norm": 0.2525796890258789,
-      "learning_rate": 0.00010982857142857143,
-      "loss": 0.6288758277893066,
-      "mean_token_accuracy": 0.8544105023145676,
-      "num_tokens": 266791.0,
-      "step": 790
-    },
-    {
-      "entropy": 0.6541981130838395,
-      "epoch": 0.45714285714285713,
-      "grad_norm": 0.2138395756483078,
-      "learning_rate": 0.0001086857142857143,
-      "loss": 0.6304707527160645,
-      "mean_token_accuracy": 0.8575463786721229,
-      "num_tokens": 269777.0,
-      "step": 800
-    },
-    {
-      "entropy": 0.6959837578237057,
-      "epoch": 0.46285714285714286,
-      "grad_norm": 0.2314756065607071,
-      "learning_rate": 0.00010754285714285715,
-      "loss": 0.6351391315460205,
-      "mean_token_accuracy": 0.8612324088811875,
-      "num_tokens": 272988.0,
-      "step": 810
-    },
-    {
-      "entropy": 0.6571616813540458,
-      "epoch": 0.4685714285714286,
-      "grad_norm": 0.19059552252292633,
-      "learning_rate": 0.00010640000000000001,
-      "loss": 0.6456667423248291,
-      "mean_token_accuracy": 0.8525185197591781,
-      "num_tokens": 276377.0,
-      "step": 820
-    },
-    {
-      "entropy": 0.5789781011641025,
-      "epoch": 0.4742857142857143,
-      "grad_norm": 0.20973151922225952,
-      "learning_rate": 0.00010525714285714285,
-      "loss": 0.5815204620361328,
-      "mean_token_accuracy": 0.8682083815336228,
-      "num_tokens": 279565.0,
-      "step": 830
-    },
-    {
-      "entropy": 0.6460968509316445,
-      "epoch": 0.48,
-      "grad_norm": 0.19541703164577484,
-      "learning_rate": 0.00010411428571428573,
-      "loss": 0.6836059093475342,
-      "mean_token_accuracy": 0.861596092581749,
-      "num_tokens": 282990.0,
-      "step": 840
-    },
-    {
-      "entropy": 0.6585176661610603,
-      "epoch": 0.4857142857142857,
-      "grad_norm": 0.16561760008335114,
-      "learning_rate": 0.00010297142857142859,
-      "loss": 0.6332673072814942,
-      "mean_token_accuracy": 0.8653052359819412,
-      "num_tokens": 286266.0,
-      "step": 850
-    },
-    {
-      "entropy": 0.6495703898370266,
-      "epoch": 0.49142857142857144,
-      "grad_norm": 0.15301378071308136,
-      "learning_rate": 0.00010182857142857143,
-      "loss": 0.6165118217468262,
-      "mean_token_accuracy": 0.8553761512041091,
-      "num_tokens": 289474.0,
-      "step": 860
-    },
-    {
-      "entropy": 0.6630740962922573,
-      "epoch": 0.49714285714285716,
-      "grad_norm": 0.19287355244159698,
-      "learning_rate": 0.00010068571428571429,
-      "loss": 0.6098609924316406,
-      "mean_token_accuracy": 0.8592873826622963,
-      "num_tokens": 292837.0,
-      "step": 870
-    },
-    {
-      "entropy": 0.6436114467680454,
-      "epoch": 0.5028571428571429,
-      "grad_norm": 0.19893701374530792,
-      "learning_rate": 9.954285714285714e-05,
-      "loss": 0.6510508537292481,
-      "mean_token_accuracy": 0.8588305786252022,
-      "num_tokens": 296142.0,
-      "step": 880
-    },
-    {
-      "entropy": 0.6108668148517609,
-      "epoch": 0.5085714285714286,
-      "grad_norm": 0.24575024843215942,
-      "learning_rate": 9.84e-05,
-      "loss": 0.6009951591491699,
-      "mean_token_accuracy": 0.8642948284745217,
-      "num_tokens": 299685.0,
-      "step": 890
-    },
-    {
-      "entropy": 0.6458855651319027,
-      "epoch": 0.5142857142857142,
-      "grad_norm": 0.27966228127479553,
-      "learning_rate": 9.725714285714286e-05,
-      "loss": 0.6327517032623291,
-      "mean_token_accuracy": 0.8518458366394043,
-      "num_tokens": 303189.0,
-      "step": 900
-    },
-    {
-      "entropy": 0.6874921523034573,
-      "epoch": 0.52,
-      "grad_norm": 0.20877590775489807,
-      "learning_rate": 9.611428571428572e-05,
-      "loss": 0.6910979270935058,
-      "mean_token_accuracy": 0.8424041703343391,
-      "num_tokens": 306468.0,
-      "step": 910
-    },
-    {
-      "entropy": 0.6255587831139564,
-      "epoch": 0.5257142857142857,
-      "grad_norm": 0.2100318968296051,
-      "learning_rate": 9.497142857142857e-05,
-      "loss": 0.6689131736755372,
-      "mean_token_accuracy": 0.8548390552401542,
-      "num_tokens": 309955.0,
-      "step": 920
-    },
-    {
-      "entropy": 0.6701849482953548,
-      "epoch": 0.5314285714285715,
-      "grad_norm": 0.25995635986328125,
-      "learning_rate": 9.382857142857144e-05,
-      "loss": 0.6717410087585449,
-      "mean_token_accuracy": 0.8549696207046509,
-      "num_tokens": 313117.0,
-      "step": 930
-    },
-    {
-      "entropy": 0.6915492154657841,
-      "epoch": 0.5371428571428571,
-      "grad_norm": 0.21679414808750153,
-      "learning_rate": 9.268571428571429e-05,
-      "loss": 0.6586971759796143,
-      "mean_token_accuracy": 0.859293457865715,
-      "num_tokens": 316519.0,
-      "step": 940
-    },
-    {
-      "entropy": 0.6822380021214485,
-      "epoch": 0.5428571428571428,
-      "grad_norm": 0.2091035693883896,
-      "learning_rate": 9.154285714285715e-05,
-      "loss": 0.6489524841308594,
-      "mean_token_accuracy": 0.8576316460967064,
-      "num_tokens": 319782.0,
-      "step": 950
-    },
-    {
-      "entropy": 0.6756891958415508,
-      "epoch": 0.5485714285714286,
-      "grad_norm": 0.2485855668783188,
-      "learning_rate": 9.04e-05,
-      "loss": 0.656977367401123,
-      "mean_token_accuracy": 0.8671488896012306,
-      "num_tokens": 323019.0,
-      "step": 960
-    },
-    {
-      "entropy": 0.6616588845849037,
-      "epoch": 0.5542857142857143,
-      "grad_norm": 0.2510681450366974,
-      "learning_rate": 8.925714285714287e-05,
-      "loss": 0.6415849208831788,
-      "mean_token_accuracy": 0.8642254650592804,
-      "num_tokens": 326339.0,
-      "step": 970
-    },
-    {
-      "entropy": 0.6396586582064628,
-      "epoch": 0.56,
-      "grad_norm": 0.20449747145175934,
-      "learning_rate": 8.811428571428572e-05,
-      "loss": 0.6375674247741699,
-      "mean_token_accuracy": 0.8578563511371613,
-      "num_tokens": 329774.0,
-      "step": 980
-    },
-    {
-      "entropy": 0.6666180461645126,
-      "epoch": 0.5657142857142857,
-      "grad_norm": 0.2526116669178009,
-      "learning_rate": 8.697142857142857e-05,
-      "loss": 0.609344482421875,
-      "mean_token_accuracy": 0.8597319439053536,
-      "num_tokens": 333087.0,
-      "step": 990
-    },
-    {
-      "entropy": 0.6488007657229901,
-      "epoch": 0.5714285714285714,
-      "grad_norm": 0.20164141058921814,
-      "learning_rate": 8.582857142857143e-05,
-      "loss": 0.6201111316680908,
-      "mean_token_accuracy": 0.8526906743645668,
-      "num_tokens": 336508.0,
-      "step": 1000
-    },
-    {
-      "entropy": 0.63390611410141,
-      "epoch": 0.5771428571428572,
-      "grad_norm": 0.2473566234111786,
-      "learning_rate": 8.46857142857143e-05,
-      "loss": 0.5988630294799805,
-      "mean_token_accuracy": 0.8674046665430069,
-      "num_tokens": 339967.0,
-      "step": 1010
-    },
-    {
-      "entropy": 0.586044154316187,
-      "epoch": 0.5828571428571429,
-      "grad_norm": 0.21561333537101746,
-      "learning_rate": 8.354285714285715e-05,
-      "loss": 0.5799360752105713,
-      "mean_token_accuracy": 0.8686427220702171,
-      "num_tokens": 342923.0,
-      "step": 1020
-    },
-    {
-      "entropy": 0.6335062697529793,
-      "epoch": 0.5885714285714285,
-      "grad_norm": 0.20071916282176971,
-      "learning_rate": 8.24e-05,
-      "loss": 0.6467567443847656,
-      "mean_token_accuracy": 0.8589577525854111,
-      "num_tokens": 346417.0,
-      "step": 1030
-    },
-    {
-      "entropy": 0.6805698774755001,
-      "epoch": 0.5942857142857143,
-      "grad_norm": 0.295105904340744,
-      "learning_rate": 8.125714285714286e-05,
-      "loss": 0.6580337524414063,
-      "mean_token_accuracy": 0.8565412655472755,
-      "num_tokens": 349854.0,
-      "step": 1040
-    },
-    {
-      "entropy": 0.6416849888861179,
-      "epoch": 0.6,
-      "grad_norm": 0.25477221608161926,
-      "learning_rate": 8.011428571428573e-05,
-      "loss": 0.5906441688537598,
-      "mean_token_accuracy": 0.8619327709078789,
-      "num_tokens": 353294.0,
-      "step": 1050
-    },
-    {
-      "entropy": 0.6331484273076058,
-      "epoch": 0.6057142857142858,
-      "grad_norm": 0.22354693710803986,
-      "learning_rate": 7.897142857142858e-05,
-      "loss": 0.6218142509460449,
-      "mean_token_accuracy": 0.8641349360346794,
-      "num_tokens": 356543.0,
-      "step": 1060
-    },
-    {
-      "entropy": 0.6120303176343441,
-      "epoch": 0.6114285714285714,
-      "grad_norm": 0.20340518653392792,
-      "learning_rate": 7.782857142857143e-05,
-      "loss": 0.601622486114502,
-      "mean_token_accuracy": 0.8639883920550346,
-      "num_tokens": 359765.0,
-      "step": 1070
-    },
-    {
-      "entropy": 0.6292500749230385,
-      "epoch": 0.6171428571428571,
-      "grad_norm": 0.2069106251001358,
-      "learning_rate": 7.668571428571429e-05,
-      "loss": 0.6348252773284913,
-      "mean_token_accuracy": 0.8540400981903076,
-      "num_tokens": 363116.0,
-      "step": 1080
-    },
-    {
-      "entropy": 0.622652218490839,
-      "epoch": 0.6228571428571429,
-      "grad_norm": 0.23602575063705444,
-      "learning_rate": 7.554285714285716e-05,
-      "loss": 0.6226765632629394,
-      "mean_token_accuracy": 0.8595390111207962,
-      "num_tokens": 366720.0,
-      "step": 1090
-    },
-    {
-      "entropy": 0.6517547190189361,
-      "epoch": 0.6285714285714286,
-      "grad_norm": 0.28087317943573,
-      "learning_rate": 7.44e-05,
-      "loss": 0.6399660587310791,
-      "mean_token_accuracy": 0.8540969029068947,
-      "num_tokens": 370194.0,
-      "step": 1100
-    },
-    {
-      "entropy": 0.6615139648318291,
-      "epoch": 0.6342857142857142,
-      "grad_norm": 0.3010542094707489,
-      "learning_rate": 7.325714285714286e-05,
-      "loss": 0.6588103771209717,
-      "mean_token_accuracy": 0.8464817240834236,
-      "num_tokens": 373541.0,
-      "step": 1110
-    },
-    {
-      "entropy": 0.647005096077919,
-      "epoch": 0.64,
-      "grad_norm": 0.26990506052970886,
-      "learning_rate": 7.211428571428572e-05,
-      "loss": 0.6175911903381348,
-      "mean_token_accuracy": 0.8642094656825066,
-      "num_tokens": 376643.0,
-      "step": 1120
-    },
-    {
-      "entropy": 0.6991826109588146,
-      "epoch": 0.6457142857142857,
-      "grad_norm": 0.2759954333305359,
-      "learning_rate": 7.097142857142857e-05,
-      "loss": 0.6602604389190674,
-      "mean_token_accuracy": 0.8484420910477638,
-      "num_tokens": 379874.0,
-      "step": 1130
-    },
-    {
-      "entropy": 0.6453976444900036,
-      "epoch": 0.6514285714285715,
-      "grad_norm": 0.21964532136917114,
-      "learning_rate": 6.982857142857144e-05,
-      "loss": 0.5955167293548584,
-      "mean_token_accuracy": 0.861721670627594,
-      "num_tokens": 383311.0,
-      "step": 1140
-    },
-    {
-      "entropy": 0.6503799192607402,
-      "epoch": 0.6571428571428571,
-      "grad_norm": 0.19722476601600647,
-      "learning_rate": 6.868571428571429e-05,
-      "loss": 0.6320806026458741,
-      "mean_token_accuracy": 0.8562078520655632,
-      "num_tokens": 386679.0,
-      "step": 1150
-    },
-    {
-      "entropy": 0.6544807381927967,
-      "epoch": 0.6628571428571428,
-      "grad_norm": 0.28297051787376404,
-      "learning_rate": 6.754285714285714e-05,
-      "loss": 0.6531005382537842,
-      "mean_token_accuracy": 0.8550411075353622,
-      "num_tokens": 389963.0,
-      "step": 1160
-    },
-    {
-      "entropy": 0.6717952400445938,
-      "epoch": 0.6685714285714286,
-      "grad_norm": 0.24083739519119263,
-      "learning_rate": 6.64e-05,
-      "loss": 0.647878885269165,
-      "mean_token_accuracy": 0.8565554738044738,
-      "num_tokens": 393277.0,
-      "step": 1170
-    },
-    {
-      "entropy": 0.6969816297292709,
-      "epoch": 0.6742857142857143,
-      "grad_norm": 0.2341049164533615,
-      "learning_rate": 6.525714285714287e-05,
-      "loss": 0.6588397979736328,
-      "mean_token_accuracy": 0.8613204509019852,
-      "num_tokens": 396744.0,
-      "step": 1180
-    },
-    {
-      "entropy": 0.5977616436779499,
-      "epoch": 0.68,
-      "grad_norm": 0.21638202667236328,
-      "learning_rate": 6.411428571428572e-05,
-      "loss": 0.5673915386199951,
-      "mean_token_accuracy": 0.8722260281443596,
-      "num_tokens": 400200.0,
-      "step": 1190
-    },
-    {
-      "entropy": 0.6362225770950317,
-      "epoch": 0.6857142857142857,
-      "grad_norm": 0.2523053288459778,
-      "learning_rate": 6.297142857142857e-05,
-      "loss": 0.6235575199127197,
-      "mean_token_accuracy": 0.8589386835694313,
-      "num_tokens": 403472.0,
-      "step": 1200
-    },
-    {
-      "entropy": 0.6172734223306179,
-      "epoch": 0.6914285714285714,
-      "grad_norm": 0.21801182627677917,
-      "learning_rate": 6.182857142857143e-05,
-      "loss": 0.6179306030273437,
-      "mean_token_accuracy": 0.8626845121383667,
-      "num_tokens": 406776.0,
-      "step": 1210
-    },
-    {
-      "entropy": 0.6551583506166935,
-      "epoch": 0.6971428571428572,
-      "grad_norm": 0.27856746315956116,
-      "learning_rate": 6.068571428571429e-05,
-      "loss": 0.6389457225799561,
-      "mean_token_accuracy": 0.8546857088804245,
-      "num_tokens": 410307.0,
-      "step": 1220
-    },
-    {
-      "entropy": 0.7079406701028347,
-      "epoch": 0.7028571428571428,
-      "grad_norm": 0.29186904430389404,
-      "learning_rate": 5.9542857142857146e-05,
-      "loss": 0.7094098567962647,
-      "mean_token_accuracy": 0.8335159897804261,
-      "num_tokens": 413745.0,
-      "step": 1230
-    },
-    {
-      "entropy": 0.683994609862566,
-      "epoch": 0.7085714285714285,
-      "grad_norm": 0.2444518506526947,
-      "learning_rate": 5.8399999999999997e-05,
-      "loss": 0.626039981842041,
-      "mean_token_accuracy": 0.8552980974316597,
-      "num_tokens": 416674.0,
-      "step": 1240
-    },
-    {
-      "entropy": 0.6473595723509789,
-      "epoch": 0.7142857142857143,
-      "grad_norm": 0.2124943733215332,
-      "learning_rate": 5.725714285714287e-05,
-      "loss": 0.6362271308898926,
-      "mean_token_accuracy": 0.8586657717823982,
-      "num_tokens": 419814.0,
-      "step": 1250
-    },
-    {
-      "entropy": 0.6558213859796524,
-      "epoch": 0.72,
-      "grad_norm": 0.3233776092529297,
-      "learning_rate": 5.611428571428572e-05,
-      "loss": 0.6335158824920655,
-      "mean_token_accuracy": 0.8607818141579628,
-      "num_tokens": 423034.0,
-      "step": 1260
-    },
-    {
-      "entropy": 0.6101858265697956,
-      "epoch": 0.7257142857142858,
-      "grad_norm": 0.23161561787128448,
-      "learning_rate": 5.4971428571428576e-05,
-      "loss": 0.5872994422912597,
-      "mean_token_accuracy": 0.8612972646951675,
-      "num_tokens": 426358.0,
-      "step": 1270
-    },
-    {
-      "entropy": 0.6065807826817036,
-      "epoch": 0.7314285714285714,
-      "grad_norm": 0.3137820065021515,
-      "learning_rate": 5.3828571428571426e-05,
-      "loss": 0.6199213027954101,
-      "mean_token_accuracy": 0.8634087055921554,
-      "num_tokens": 429800.0,
-      "step": 1280
-    },
-    {
-      "entropy": 0.6477028757333756,
-      "epoch": 0.7371428571428571,
-      "grad_norm": 0.20417962968349457,
-      "learning_rate": 5.2685714285714284e-05,
-      "loss": 0.6403303623199463,
-      "mean_token_accuracy": 0.858014090359211,
-      "num_tokens": 433136.0,
-      "step": 1290
-    },
-    {
-      "entropy": 0.6253302626311779,
-      "epoch": 0.7428571428571429,
-      "grad_norm": 0.2649274468421936,
-      "learning_rate": 5.154285714285715e-05,
-      "loss": 0.5664835453033448,
-      "mean_token_accuracy": 0.8699078008532524,
-      "num_tokens": 436165.0,
-      "step": 1300
-    },
-    {
-      "entropy": 0.6552011057734489,
-      "epoch": 0.7485714285714286,
-      "grad_norm": 0.22695457935333252,
-      "learning_rate": 5.0400000000000005e-05,
-      "loss": 0.6598159313201905,
-      "mean_token_accuracy": 0.8518921718001365,
-      "num_tokens": 439362.0,
-      "step": 1310
-    },
-    {
-      "entropy": 0.6426700457930565,
-      "epoch": 0.7542857142857143,
-      "grad_norm": 0.2776673436164856,
-      "learning_rate": 4.9257142857142856e-05,
-      "loss": 0.6768081188201904,
-      "mean_token_accuracy": 0.8481876432895661,
-      "num_tokens": 442586.0,
-      "step": 1320
-    },
-    {
-      "entropy": 0.6550601176917553,
-      "epoch": 0.76,
-      "grad_norm": 0.27410948276519775,
-      "learning_rate": 4.811428571428572e-05,
-      "loss": 0.636206865310669,
-      "mean_token_accuracy": 0.8564146623015404,
-      "num_tokens": 446043.0,
-      "step": 1330
-    },
-    {
-      "entropy": 0.6599532529711724,
-      "epoch": 0.7657142857142857,
-      "grad_norm": 0.23421648144721985,
-      "learning_rate": 4.697142857142857e-05,
-      "loss": 0.6225139617919921,
-      "mean_token_accuracy": 0.8554374307394028,
-      "num_tokens": 449551.0,
-      "step": 1340
-    },
-    {
-      "entropy": 0.6726249933242798,
-      "epoch": 0.7714285714285715,
-      "grad_norm": 0.23552796244621277,
-      "learning_rate": 4.5828571428571435e-05,
-      "loss": 0.6367889881134033,
-      "mean_token_accuracy": 0.8643324449658394,
-      "num_tokens": 452586.0,
-      "step": 1350
-    },
-    {
-      "entropy": 0.6467246741056443,
-      "epoch": 0.7771428571428571,
-      "grad_norm": 0.18453116714954376,
-      "learning_rate": 4.4685714285714286e-05,
-      "loss": 0.6355658054351807,
-      "mean_token_accuracy": 0.8546889841556549,
-      "num_tokens": 455761.0,
-      "step": 1360
-    },
-    {
-      "entropy": 0.6625100992619991,
-      "epoch": 0.7828571428571428,
-      "grad_norm": 0.21036536991596222,
-      "learning_rate": 4.354285714285714e-05,
-      "loss": 0.6642748355865479,
-      "mean_token_accuracy": 0.8560309842228889,
-      "num_tokens": 459175.0,
-      "step": 1370
-    },
-    {
-      "entropy": 0.6317512311041356,
-      "epoch": 0.7885714285714286,
-      "grad_norm": 0.21304954588413239,
-      "learning_rate": 4.24e-05,
-      "loss": 0.601001787185669,
-      "mean_token_accuracy": 0.8565791383385658,
-      "num_tokens": 462701.0,
-      "step": 1380
-    },
-    {
-      "entropy": 0.644014036655426,
-      "epoch": 0.7942857142857143,
-      "grad_norm": 0.26754629611968994,
-      "learning_rate": 4.125714285714286e-05,
-      "loss": 0.6244466781616211,
-      "mean_token_accuracy": 0.8559624046087265,
-      "num_tokens": 465821.0,
-      "step": 1390
-    },
-    {
-      "entropy": 0.6658924698829651,
-      "epoch": 0.8,
-      "grad_norm": 0.2504512369632721,
-      "learning_rate": 4.0114285714285715e-05,
-      "loss": 0.6396134853363037,
-      "mean_token_accuracy": 0.8645422115921975,
-      "num_tokens": 469319.0,
-      "step": 1400
-    },
-    {
-      "entropy": 0.6185431003570556,
-      "epoch": 0.8057142857142857,
-      "grad_norm": 0.2430698573589325,
-      "learning_rate": 3.897142857142857e-05,
-      "loss": 0.6310626029968261,
-      "mean_token_accuracy": 0.8630247727036476,
-      "num_tokens": 472580.0,
-      "step": 1410
-    },
-    {
-      "entropy": 0.5944720163941384,
-      "epoch": 0.8114285714285714,
-      "grad_norm": 0.25170740485191345,
-      "learning_rate": 3.782857142857143e-05,
-      "loss": 0.5594588279724121,
-      "mean_token_accuracy": 0.8738556310534478,
-      "num_tokens": 475539.0,
-      "step": 1420
-    },
-    {
-      "entropy": 0.6284624971449375,
-      "epoch": 0.8171428571428572,
-      "grad_norm": 0.2709786295890808,
-      "learning_rate": 3.668571428571429e-05,
-      "loss": 0.639189624786377,
-      "mean_token_accuracy": 0.862536971271038,
-      "num_tokens": 478827.0,
-      "step": 1430
-    },
-    {
-      "entropy": 0.6724597789347172,
-      "epoch": 0.8228571428571428,
-      "grad_norm": 0.2555326521396637,
-      "learning_rate": 3.5542857142857145e-05,
-      "loss": 0.6536776542663574,
-      "mean_token_accuracy": 0.8553947672247887,
-      "num_tokens": 482017.0,
-      "step": 1440
-    },
-    {
-      "entropy": 0.6180280610918999,
-      "epoch": 0.8285714285714286,
-      "grad_norm": 0.22328683733940125,
-      "learning_rate": 3.4399999999999996e-05,
-      "loss": 0.6490192413330078,
-      "mean_token_accuracy": 0.8624721497297287,
-      "num_tokens": 485285.0,
-      "step": 1450
-    },
-    {
-      "entropy": 0.6554854273796081,
-      "epoch": 0.8342857142857143,
-      "grad_norm": 0.2618810832500458,
-      "learning_rate": 3.325714285714286e-05,
-      "loss": 0.6328207015991211,
-      "mean_token_accuracy": 0.8652528643608093,
-      "num_tokens": 488237.0,
-      "step": 1460
-    },
-    {
-      "entropy": 0.6485379718244075,
-      "epoch": 0.84,
-      "grad_norm": 0.3207658529281616,
-      "learning_rate": 3.211428571428571e-05,
-      "loss": 0.6625119686126709,
-      "mean_token_accuracy": 0.8627592906355858,
-      "num_tokens": 491545.0,
-      "step": 1470
-    },
-    {
-      "entropy": 0.6200287729501724,
-      "epoch": 0.8457142857142858,
-      "grad_norm": 0.2355208396911621,
-      "learning_rate": 3.0971428571428575e-05,
-      "loss": 0.5889796733856201,
-      "mean_token_accuracy": 0.8751833841204644,
-      "num_tokens": 494760.0,
-      "step": 1480
-    },
-    {
-      "entropy": 0.6196223556995392,
-      "epoch": 0.8514285714285714,
-      "grad_norm": 0.19851188361644745,
-      "learning_rate": 2.982857142857143e-05,
-      "loss": 0.5831719875335694,
-      "mean_token_accuracy": 0.8716864466667176,
-      "num_tokens": 497728.0,
-      "step": 1490
-    },
-    {
-      "entropy": 0.6233154498040676,
-      "epoch": 0.8571428571428571,
-      "grad_norm": 0.24745343625545502,
-      "learning_rate": 2.8685714285714286e-05,
-      "loss": 0.6103721141815186,
-      "mean_token_accuracy": 0.8551008567214012,
-      "num_tokens": 500804.0,
-      "step": 1500
-    },
-    {
-      "entropy": 0.599901518970728,
-      "epoch": 0.8628571428571429,
-      "grad_norm": 0.23715294897556305,
-      "learning_rate": 2.7542857142857144e-05,
-      "loss": 0.5849476814270019,
-      "mean_token_accuracy": 0.8684423178434372,
-      "num_tokens": 504066.0,
-      "step": 1510
-    },
-    {
-      "entropy": 0.629510759562254,
-      "epoch": 0.8685714285714285,
-      "grad_norm": 0.2427317500114441,
-      "learning_rate": 2.64e-05,
-      "loss": 0.631610631942749,
-      "mean_token_accuracy": 0.8575234442949295,
-      "num_tokens": 507430.0,
-      "step": 1520
-    },
-    {
-      "entropy": 0.6032628089189529,
-      "epoch": 0.8742857142857143,
-      "grad_norm": 0.20266121625900269,
-      "learning_rate": 2.5257142857142855e-05,
-      "loss": 0.5601680755615235,
-      "mean_token_accuracy": 0.8660111904144288,
-      "num_tokens": 510923.0,
-      "step": 1530
-    },
-    {
-      "entropy": 0.599126148968935,
-      "epoch": 0.88,
-      "grad_norm": 0.24769169092178345,
-      "learning_rate": 2.4114285714285713e-05,
-      "loss": 0.6129156589508057,
-      "mean_token_accuracy": 0.8624033451080322,
-      "num_tokens": 514276.0,
-      "step": 1540
-    },
-    {
-      "entropy": 0.6265991859138011,
-      "epoch": 0.8857142857142857,
-      "grad_norm": 0.24149306118488312,
-      "learning_rate": 2.297142857142857e-05,
-      "loss": 0.645173168182373,
-      "mean_token_accuracy": 0.8579171389341355,
-      "num_tokens": 517443.0,
-      "step": 1550
-    },
-    {
-      "entropy": 0.5904549680650234,
-      "epoch": 0.8914285714285715,
-      "grad_norm": 0.2332906723022461,
-      "learning_rate": 2.1828571428571428e-05,
-      "loss": 0.5481174945831299,
-      "mean_token_accuracy": 0.8670677661895752,
-      "num_tokens": 520947.0,
-      "step": 1560
-    },
-    {
-      "entropy": 0.7045296929776669,
-      "epoch": 0.8971428571428571,
-      "grad_norm": 0.37327486276626587,
-      "learning_rate": 2.0685714285714285e-05,
-      "loss": 0.6851255416870117,
-      "mean_token_accuracy": 0.8483647271990776,
-      "num_tokens": 524104.0,
-      "step": 1570
-    },
-    {
-      "entropy": 0.6310351669788361,
-      "epoch": 0.9028571428571428,
-      "grad_norm": 0.27085983753204346,
-      "learning_rate": 1.9542857142857143e-05,
-      "loss": 0.6175636291503906,
-      "mean_token_accuracy": 0.8585921674966812,
-      "num_tokens": 527462.0,
-      "step": 1580
-    },
-    {
-      "entropy": 0.6248554646968841,
-      "epoch": 0.9085714285714286,
-      "grad_norm": 0.29151156544685364,
-      "learning_rate": 1.84e-05,
-      "loss": 0.6107958793640137,
-      "mean_token_accuracy": 0.8647030532360077,
-      "num_tokens": 530929.0,
-      "step": 1590
-    },
-    {
-      "entropy": 0.6354066073894501,
-      "epoch": 0.9142857142857143,
-      "grad_norm": 0.26907071471214294,
-      "learning_rate": 1.7257142857142857e-05,
-      "loss": 0.6115604400634765,
-      "mean_token_accuracy": 0.8671793237328529,
-      "num_tokens": 534220.0,
-      "step": 1600
-    },
-    {
-      "entropy": 0.6623220466077328,
-      "epoch": 0.92,
-      "grad_norm": 0.23479118943214417,
-      "learning_rate": 1.6114285714285715e-05,
-      "loss": 0.629840898513794,
-      "mean_token_accuracy": 0.8525548726320267,
-      "num_tokens": 537701.0,
-      "step": 1610
-    },
-    {
-      "entropy": 0.6476062543690204,
-      "epoch": 0.9257142857142857,
-      "grad_norm": 0.2056824117898941,
-      "learning_rate": 1.4971428571428572e-05,
-      "loss": 0.6435581684112549,
-      "mean_token_accuracy": 0.8595114961266518,
-      "num_tokens": 541293.0,
-      "step": 1620
-    },
-    {
-      "entropy": 0.6168920576572419,
-      "epoch": 0.9314285714285714,
-      "grad_norm": 0.22080306708812714,
-      "learning_rate": 1.382857142857143e-05,
-      "loss": 0.5561806678771972,
-      "mean_token_accuracy": 0.8639644294977188,
-      "num_tokens": 544565.0,
-      "step": 1630
-    },
-    {
-      "entropy": 0.6294913746416568,
-      "epoch": 0.9371428571428572,
-      "grad_norm": 0.3226984441280365,
-      "learning_rate": 1.2685714285714287e-05,
-      "loss": 0.5850194931030274,
-      "mean_token_accuracy": 0.8648865327239037,
-      "num_tokens": 547743.0,
-      "step": 1640
-    },
-    {
-      "entropy": 0.6362057097256184,
-      "epoch": 0.9428571428571428,
-      "grad_norm": 0.2684152126312256,
-      "learning_rate": 1.1542857142857143e-05,
-      "loss": 0.6132237911224365,
-      "mean_token_accuracy": 0.8577535077929497,
-      "num_tokens": 551275.0,
-      "step": 1650
-    },
-    {
-      "entropy": 0.6077159576117992,
-      "epoch": 0.9485714285714286,
-      "grad_norm": 0.26599714159965515,
-      "learning_rate": 1.04e-05,
-      "loss": 0.5903688430786133,
-      "mean_token_accuracy": 0.8740017876029015,
-      "num_tokens": 554538.0,
-      "step": 1660
-    },
-    {
-      "entropy": 0.6076049767434597,
-      "epoch": 0.9542857142857143,
-      "grad_norm": 0.22315815091133118,
-      "learning_rate": 9.257142857142858e-06,
-      "loss": 0.5887226104736328,
-      "mean_token_accuracy": 0.8686643913388252,
-      "num_tokens": 557948.0,
-      "step": 1670
-    },
-    {
-      "entropy": 0.6360240176320076,
-      "epoch": 0.96,
-      "grad_norm": 0.2517399787902832,
-      "learning_rate": 8.114285714285715e-06,
-      "loss": 0.6221353054046631,
-      "mean_token_accuracy": 0.8617710500955582,
-      "num_tokens": 561093.0,
-      "step": 1680
-    },
-    {
-      "entropy": 0.6221488267183304,
-      "epoch": 0.9657142857142857,
-      "grad_norm": 0.2868978977203369,
-      "learning_rate": 6.971428571428572e-06,
-      "loss": 0.6088948249816895,
-      "mean_token_accuracy": 0.8645280092954636,
-      "num_tokens": 564497.0,
-      "step": 1690
-    },
-    {
-      "entropy": 0.6176722340285778,
-      "epoch": 0.9714285714285714,
-      "grad_norm": 0.27099671959877014,
-      "learning_rate": 5.828571428571429e-06,
-      "loss": 0.5873197078704834,
-      "mean_token_accuracy": 0.8671502575278283,
-      "num_tokens": 567977.0,
-      "step": 1700
-    },
-    {
-      "entropy": 0.5968088746070862,
-      "epoch": 0.9771428571428571,
-      "grad_norm": 0.25095444917678833,
-      "learning_rate": 4.685714285714286e-06,
-      "loss": 0.6059055805206299,
-      "mean_token_accuracy": 0.8739419683814049,
-      "num_tokens": 571185.0,
-      "step": 1710
-    },
-    {
-      "entropy": 0.6225132785737515,
-      "epoch": 0.9828571428571429,
-      "grad_norm": 0.28391072154045105,
-      "learning_rate": 3.542857142857143e-06,
-      "loss": 0.5855085372924804,
-      "mean_token_accuracy": 0.8671080946922303,
-      "num_tokens": 574447.0,
-      "step": 1720
-    },
-    {
-      "entropy": 0.6214111320674419,
-      "epoch": 0.9885714285714285,
-      "grad_norm": 0.30393826961517334,
-      "learning_rate": 2.4000000000000003e-06,
-      "loss": 0.6098702907562256,
-      "mean_token_accuracy": 0.8673963889479637,
-      "num_tokens": 577647.0,
-      "step": 1730
-    },
-    {
-      "entropy": 0.6451270334422589,
-      "epoch": 0.9942857142857143,
-      "grad_norm": 0.6247619390487671,
-      "learning_rate": 1.2571428571428573e-06,
-      "loss": 0.6435680389404297,
-      "mean_token_accuracy": 0.8606103897094727,
-      "num_tokens": 580924.0,
-      "step": 1740
-    },
-    {
-      "entropy": 0.6013512119650841,
-      "epoch": 1.0,
-      "grad_norm": 0.22891853749752045,
-      "learning_rate": 1.142857142857143e-07,
-      "loss": 0.5656134605407714,
-      "mean_token_accuracy": 0.8671733975410462,
-      "num_tokens": 584229.0,
-      "step": 1750
-    }
-  ],
-  "logging_steps": 10,
-  "max_steps": 1750,
-  "num_input_tokens_seen": 0,
-  "num_train_epochs": 1,
-  "save_steps": 500,
-  "stateful_callbacks": {
-    "TrainerControl": {
-      "args": {
-        "should_epoch_stop": false,
-        "should_evaluate": false,
-        "should_log": false,
-        "should_save": true,
-        "should_training_stop": true
-      },
-      "attributes": {}
-    }
-  },
-  "total_flos": 4600872360760320.0,
-  "train_batch_size": 1,
-  "trial_name": null,
-  "trial_params": null
-}

sql-model/checkpoint-1750/training_args.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:29a4a11ec0ba52a64430eabdbdc4808ed84fc37c06b05e2f78d6eedc6da2ee37
-size 5649

sql-model/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
-size 11421892

sql-model/tokenizer_config.json DELETED

@@ -1,29 +0,0 @@
-{
-  "add_prefix_space": false,
-  "backend": "tokenizers",
-  "bos_token": null,
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "extra_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "is_local": false,
-  "model_max_length": 32768,
-  "pad_token": "<|im_end|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}