Instructions to use Komma-LuisMiSanVe/LangToSQL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Komma-LuisMiSanVe/LangToSQL with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Komma-LuisMiSanVe/LangToSQL",
	filename="LangToSQL-1.5B-F16.gguf",
)

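llm.create_chat_completion(
	messages=[
		# Placeholder prompt (an assumption): the model card defines no input example.
		{"role": "user", "content": "List the names of all customers who placed an order in 2024."}
	]
)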
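
`create_chat_completion` expects `messages` to be a list of role/content dictionaries. The repository does not document a prompt format, so the following is only a sketch of one way to package a table schema and a question. The helper name `build_sql_messages`, the system instruction wording, and the example schema are assumptions rather than anything the model card specifies.

```python
from typing import Dict, List


def build_sql_messages(schema: str, question: str) -> List[Dict[str, str]]:
    """Package a table schema and a question as chat messages.

    The prompt wording is an assumption; the model card does not
    document the format used during fine-tuning.
    """
    system = "Translate the question into a SQL query for the given schema."
    user = f"Schema:\n{schema.strip()}\n\nQuestion: {question.strip()}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_sql_messages(
    "CREATE TABLE users (id INTEGER, name TEXT, age INTEGER);",
    "How many users are older than 30?",
)
# With the `llm` object created above:
# response = llm.create_chat_completion(messages=messages, max_tokens=128)
# print(response["choices"][0]["message"]["content"])
```

Keeping the prompt construction in a plain function makes it easy to try other wordings if the generated SQL is poor.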

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use Komma-LuisMiSanVe/LangToSQL with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
./llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16
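
Once `llama-server` is running by any of the methods above, it can be queried over its OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server listens on port 8080 (the same base URL that the Pi configuration on this page uses); the helper name `build_chat_request` and the prompt are hypothetical.

```python
import json
import urllib.request

# Assumed base URL: llama-server listens on port 8080 unless --port is given.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for llama-server."""
    payload = {
        "model": "Komma-LuisMiSanVe/LangToSQL:F16",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Show the ten most recent orders.")

# Sending requires a running server, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     reply = json.loads(response.read())
# print(reply["choices"][0]["message"]["content"])
```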

Use Docker

docker model run hf.co/Komma-LuisMiSanVe/LangToSQL:F16

LM Studio
Jan
Ollama
How to use Komma-LuisMiSanVe/LangToSQL with Ollama:
```
ollama run hf.co/Komma-LuisMiSanVe/LangToSQL:F16
```

Unsloth Studio

How to use Komma-LuisMiSanVe/LangToSQL with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Pi

How to use Komma-LuisMiSanVe/LangToSQL with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Komma-LuisMiSanVe/LangToSQL:F16"
        }
      ]
    }
  }
}
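
If `~/.pi/agent/models.json` already exists, pasting the snippet above over it would discard any other providers. A minimal sketch of merging the entry instead is shown here; the function names are hypothetical, and it assumes Pi accepts several providers side by side in the same file.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Komma-LuisMiSanVe/LangToSQL:F16"}],
}


def merge_llama_cpp_provider(config: dict) -> dict:
    """Return a copy of `config` with the llama-cpp provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["llama-cpp"] = LLAMA_CPP_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_file(path: Path) -> None:
    """Read `path` if it exists, merge the provider, and write the result back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_llama_cpp_provider(existing), indent=2) + "\n")


# Example (writes to the real Pi config, so it is left commented out):
# update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```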

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Komma-LuisMiSanVe/LangToSQL with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Komma-LuisMiSanVe/LangToSQL:F16

Run Hermes

hermes

Docker Model Runner
How to use Komma-LuisMiSanVe/LangToSQL with Docker Model Runner:
```
docker model run hf.co/Komma-LuisMiSanVe/LangToSQL:F16
```

Lemonade

How to use Komma-LuisMiSanVe/LangToSQL with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Komma-LuisMiSanVe/LangToSQL:F16

Run and chat with the model

lemonade run user.LangToSQL-F16

List all available models

lemonade list

Komma-LuisMiSanVe committed on Mar 31

Commit

975ecd5

1 Parent(s): 82dbca0

Add trained model

Files changed (12)

sql-model-merged/.gitattributes +1 -0
sql-model-merged/config.json +34 -0
sql-model-merged/generation_config.json +6 -0
sql-model/README.md +62 -0
sql-model/adapter_config.json +41 -0
sql-model/checkpoint-1750/README.md +209 -0
sql-model/checkpoint-1750/adapter_config.json +41 -0
sql-model/checkpoint-1750/tokenizer.json +0 -0
sql-model/checkpoint-1750/tokenizer_config.json +14 -0
sql-model/checkpoint-1750/trainer_state.json +1784 -0
sql-model/tokenizer.json +0 -0
sql-model/tokenizer_config.json +14 -0

sql-model-merged/.gitattributes ADDED

	@@ -0,0 +1 @@


+*.safetensors filter=lfs diff=lfs merge=lfs -text

sql-model-merged/config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 32013,
+  "dtype": "float32",
+  "eos_token_id": 32014,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 5504,
+  "max_position_embeddings": 16384,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 16,
+  "pad_token_id": null,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "factor": 4.0,
+    "rope_theta": 100000,
+    "rope_type": "linear",
+    "type": "linear"
+  },
+  "tie_word_embeddings": false,
+  "transformers_version": "5.4.0",
+  "use_cache": true,
+  "vocab_size": 32256
+}
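
The shapes in this config are enough to estimate the size of the merged model. The sketch below assumes the standard bias-free Llama layout (four attention projections, a gated MLP with three matrices, two RMSNorm weights per layer, and a final norm), which is consistent with `attention_bias` and `mlp_bias` being false; the function name `llama_param_count` is hypothetical.

```python
def llama_param_count(
    hidden_size: int,
    intermediate_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
    vocab_size: int,
    tie_word_embeddings: bool,
) -> int:
    """Count weights for a bias-free Llama-style decoder."""
    q_and_o = 2 * hidden_size * num_heads * head_dim
    k_and_v = 2 * hidden_size * num_kv_heads * head_dim
    mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
    norms = 2 * hidden_size  # input and post-attention RMSNorm
    per_layer = q_and_o + k_and_v + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + lm_head + final_norm


# Values copied from config.json above.
total = llama_param_count(
    hidden_size=2048,
    intermediate_size=5504,
    num_layers=24,
    num_heads=16,
    num_kv_heads=16,
    head_dim=128,
    vocab_size=32256,
    tie_word_embeddings=False,
)
print(f"{total:,} parameters, about {total / 1e9:.2f}B")
# prints "1,346,471,936 parameters, about 1.35B"
```

The result is in line with the 1.3B base model named in the adapter files. At the `float32` dtype recorded in the config, the weights alone would take roughly 5.4 GB.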

sql-model-merged/generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 32013,
+  "eos_token_id": 32014,
+  "transformers_version": "5.4.0"
+}

sql-model/README.md ADDED

	@@ -0,0 +1,62 @@

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+library_name: peft
+model_name: sql-model
+tags:
+- base_model:adapter:deepseek-ai/deepseek-coder-1.3b-base
+- lora
+- sft
+- transformers
+- trl
+licence: license
+pipeline_tag: text-generation
+---
+# Model Card for sql-model
+This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="None", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- PEFT 0.18.1
+- TRL: 1.0.0
+- Transformers: 5.4.0
+- Pytorch: 2.11.0
+- Datasets: 4.8.4
+- Tokenizers: 0.22.2
+## Citations
+Cite TRL as:
+```bibtex
+@software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
+}
+```

sql-model/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "deepseek-ai/deepseek-coder-1.3b-base",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
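
The adapter applies rank-16 LoRA to `q_proj` and `v_proj` only. Combined with the shapes in the merged `config.json` (hidden size 2048, 24 layers), this gives the number of trainable adapter weights and the scaling applied to the low-rank update. The sketch assumes standard LoRA, where each target gets an `A` matrix of shape (rank x in) and a `B` matrix of shape (out x rank) and the update is scaled by `lora_alpha / r`; the helper name is hypothetical.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Weights in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


# Values taken from adapter_config.json and the merged config.json.
rank = 16
lora_alpha = 32
num_layers = 24
hidden_size = 2048
target_modules = ["q_proj", "v_proj"]  # both map 2048 -> 2048 in this model

per_module = lora_param_count(hidden_size, hidden_size, rank)
trainable = per_module * len(target_modules) * num_layers
scaling = lora_alpha / rank  # standard LoRA scaling, since use_rslora is false

print(f"trainable adapter weights: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
# prints "trainable adapter weights: 3,145,728" and "LoRA scaling factor: 2.0"
```

Under these assumptions the adapter trains about 3.1 million weights, a small fraction of the roughly 1.3 billion in the base model.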

sql-model/checkpoint-1750/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:deepseek-ai/deepseek-coder-1.3b-base
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

sql-model/checkpoint-1750/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "deepseek-ai/deepseek-coder-1.3b-base",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

sql-model/checkpoint-1750/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

sql-model/checkpoint-1750/tokenizer_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "add_prefix_space": null,
+  "backend": "tokenizers",
+  "bos_token": "<｜begin▁of▁sentence｜>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<｜end▁of▁sentence｜>",
+  "is_local": false,
+  "model_max_length": 16384,
+  "pad_token": "<｜end▁of▁sentence｜>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": null,
+  "use_default_system_prompt": false
+}

sql-model/checkpoint-1750/trainer_state.json ADDED

	@@ -0,0 +1,1784 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 1750,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.260360875725746,
+      "epoch": 0.005714285714285714,
+      "grad_norm": 0.14475300908088684,
+      "learning_rate": 0.00019897142857142858,
+      "loss": 1.8931827545166016,
+      "mean_token_accuracy": 0.7422700569033622,
+      "num_tokens": 20480.0,
+      "step": 10
+    },
+    {
+      "entropy": 0.6686364717781543,
+      "epoch": 0.011428571428571429,
+      "grad_norm": 0.12105409055948257,
+      "learning_rate": 0.00019782857142857142,
+      "loss": 0.8410327911376954,
+      "mean_token_accuracy": 0.9263698622584343,
+      "num_tokens": 40960.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.5374040573835372,
+      "epoch": 0.017142857142857144,
+      "grad_norm": 0.0709354504942894,
+      "learning_rate": 0.0001966857142857143,
+      "loss": 0.5800150871276856,
+      "mean_token_accuracy": 0.9434442237019539,
+      "num_tokens": 61440.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.673083483427763,
+      "epoch": 0.022857142857142857,
+      "grad_norm": 0.19102220237255096,
+      "learning_rate": 0.00019554285714285717,
+      "loss": 0.6174958229064942,
+      "mean_token_accuracy": 0.9334637925028801,
+      "num_tokens": 81920.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.6077337548136711,
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.124457947909832,
+      "learning_rate": 0.0001944,
+      "loss": 0.6033527374267578,
+      "mean_token_accuracy": 0.9302348345518112,
+      "num_tokens": 102400.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.6865496441721917,
+      "epoch": 0.03428571428571429,
+      "grad_norm": 0.12185850739479065,
+      "learning_rate": 0.00019325714285714287,
+      "loss": 0.7060577392578125,
+      "mean_token_accuracy": 0.914041094481945,
+      "num_tokens": 122880.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.5015663422644139,
+      "epoch": 0.04,
+      "grad_norm": 0.12615624070167542,
+      "learning_rate": 0.0001921142857142857,
+      "loss": 0.5055368900299072,
+      "mean_token_accuracy": 0.9372798427939415,
+      "num_tokens": 143360.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.4910366632044315,
+      "epoch": 0.045714285714285714,
+      "grad_norm": 0.07611342519521713,
+      "learning_rate": 0.00019097142857142857,
+      "loss": 0.5325770854949952,
+      "mean_token_accuracy": 0.933317020535469,
+      "num_tokens": 163840.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.6832622230052948,
+      "epoch": 0.05142857142857143,
+      "grad_norm": 0.053595155477523804,
+      "learning_rate": 0.00018982857142857144,
+      "loss": 0.6098953247070312,
+      "mean_token_accuracy": 0.9255381539463997,
+      "num_tokens": 184320.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.5981594547629356,
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.1217501237988472,
+      "learning_rate": 0.00018868571428571428,
+      "loss": 0.6175091743469239,
+      "mean_token_accuracy": 0.9214774891734123,
+      "num_tokens": 204800.0,
+      "step": 100
+    },
+    {
+      "entropy": 0.46232525929808616,
+      "epoch": 0.06285714285714286,
+      "grad_norm": 0.11764337867498398,
+      "learning_rate": 0.00018754285714285714,
+      "loss": 0.5024982929229737,
+      "mean_token_accuracy": 0.9378180012106896,
+      "num_tokens": 225280.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.6816697306931019,
+      "epoch": 0.06857142857142857,
+      "grad_norm": 0.11498382687568665,
+      "learning_rate": 0.00018640000000000003,
+      "loss": 0.6160881996154786,
+      "mean_token_accuracy": 0.9226027399301528,
+      "num_tokens": 245760.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.5926385544240474,
+      "epoch": 0.07428571428571429,
+      "grad_norm": 0.14184272289276123,
+      "learning_rate": 0.00018525714285714287,
+      "loss": 0.6105648040771484,
+      "mean_token_accuracy": 0.9212328717112541,
+      "num_tokens": 266240.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.6943187221884728,
+      "epoch": 0.08,
+      "grad_norm": 0.06698362529277802,
+      "learning_rate": 0.00018411428571428573,
+      "loss": 0.7239139556884766,
+      "mean_token_accuracy": 0.9079256355762482,
+      "num_tokens": 286720.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.5330000497400761,
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.05283268541097641,
+      "learning_rate": 0.00018297142857142857,
+      "loss": 0.5285147666931153,
+      "mean_token_accuracy": 0.9346868827939033,
+      "num_tokens": 307200.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.6167198076844216,
+      "epoch": 0.09142857142857143,
+      "grad_norm": 0.11342243105173111,
+      "learning_rate": 0.00018182857142857143,
+      "loss": 0.5238636493682861,
+      "mean_token_accuracy": 0.9310665339231491,
+      "num_tokens": 327680.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.5902882516384125,
+      "epoch": 0.09714285714285714,
+      "grad_norm": 0.052674733102321625,
+      "learning_rate": 0.0001806857142857143,
+      "loss": 0.6207498550415039,
+      "mean_token_accuracy": 0.9209882557392121,
+      "num_tokens": 348160.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.5036320418119431,
+      "epoch": 0.10285714285714286,
+      "grad_norm": 0.1300937831401825,
+      "learning_rate": 0.00017954285714285714,
+      "loss": 0.4891793727874756,
+      "mean_token_accuracy": 0.9383561670780182,
+      "num_tokens": 368640.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.5235385537147522,
+      "epoch": 0.10857142857142857,
+      "grad_norm": 0.15759502351284027,
+      "learning_rate": 0.0001784,
+      "loss": 0.5480166912078858,
+      "mean_token_accuracy": 0.9290606617927551,
+      "num_tokens": 389120.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.5598056815564633,
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.2110883891582489,
+      "learning_rate": 0.00017725714285714286,
+      "loss": 0.5246779918670654,
+      "mean_token_accuracy": 0.9307729959487915,
+      "num_tokens": 409600.0,
+      "step": 200
+    },
+    {
+      "entropy": 0.5961586087942123,
+      "epoch": 0.12,
+      "grad_norm": 0.2098916918039322,
+      "learning_rate": 0.00017611428571428573,
+      "loss": 0.6312565326690673,
+      "mean_token_accuracy": 0.9181996077299118,
+      "num_tokens": 430080.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.6443906672298908,
+      "epoch": 0.12571428571428572,
+      "grad_norm": 0.05823206156492233,
+      "learning_rate": 0.0001749714285714286,
+      "loss": 0.5860573768615722,
+      "mean_token_accuracy": 0.9248043030500412,
+      "num_tokens": 450560.0,
+      "step": 220
+    },
+    {
+      "entropy": 0.4558599293231964,
+      "epoch": 0.13142857142857142,
+      "grad_norm": 0.04420356824994087,
+      "learning_rate": 0.00017382857142857143,
+      "loss": 0.5038563728332519,
+      "mean_token_accuracy": 0.9343444183468819,
+      "num_tokens": 471040.0,
+      "step": 230
+    },
+    {
+      "entropy": 0.700678950548172,
+      "epoch": 0.13714285714285715,
+      "grad_norm": 0.056033965200185776,
+      "learning_rate": 0.0001726857142857143,
+      "loss": 0.6543848514556885,
+      "mean_token_accuracy": 0.9165851280093193,
+      "num_tokens": 491520.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.5037958301603794,
+      "epoch": 0.14285714285714285,
+      "grad_norm": 0.04548604413866997,
+      "learning_rate": 0.00017154285714285716,
+      "loss": 0.5128421306610107,
+      "mean_token_accuracy": 0.9345401152968407,
+      "num_tokens": 512000.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.7069774232804775,
+      "epoch": 0.14857142857142858,
+      "grad_norm": 0.06355439871549606,
+      "learning_rate": 0.0001704,
+      "loss": 0.6548576354980469,
+      "mean_token_accuracy": 0.9170254364609718,
+      "num_tokens": 532480.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.5105256482958793,
+      "epoch": 0.15428571428571428,
+      "grad_norm": 0.18286369740962982,
+      "learning_rate": 0.00016925714285714286,
+      "loss": 0.5638271331787109,
+      "mean_token_accuracy": 0.927397258579731,
+      "num_tokens": 552960.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.5947913646697998,
+      "epoch": 0.16,
+      "grad_norm": 0.06678909808397293,
+      "learning_rate": 0.00016811428571428572,
+      "loss": 0.5556015014648438,
+      "mean_token_accuracy": 0.9284735813736915,
+      "num_tokens": 573440.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.4055472381412983,
+      "epoch": 0.1657142857142857,
+      "grad_norm": 0.060138579457998276,
+      "learning_rate": 0.0001669714285714286,
+      "loss": 0.4508848190307617,
+      "mean_token_accuracy": 0.9411448150873184,
+      "num_tokens": 593920.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.7649829842150211,
+      "epoch": 0.17142857142857143,
+      "grad_norm": 0.16261634230613708,
+      "learning_rate": 0.00016582857142857145,
+      "loss": 0.6465051651000977,
+      "mean_token_accuracy": 0.9155088007450104,
+      "num_tokens": 614400.0,
+      "step": 300
+    },
+    {
+      "entropy": 0.36847573295235636,
+      "epoch": 0.17714285714285713,
+      "grad_norm": 0.04129418730735779,
+      "learning_rate": 0.0001646857142857143,
+      "loss": 0.5112833976745605,
+      "mean_token_accuracy": 0.9333659455180168,
+      "num_tokens": 634880.0,
+      "step": 310
+    },
+    {
+      "entropy": 0.7789666198194027,
+      "epoch": 0.18285714285714286,
+      "grad_norm": 0.2533145546913147,
+      "learning_rate": 0.00016354285714285715,
+      "loss": 0.63450927734375,
+      "mean_token_accuracy": 0.9185420736670494,
+      "num_tokens": 655360.0,
+      "step": 320
+    },
+    {
+      "entropy": 0.33430018573999404,
+      "epoch": 0.18857142857142858,
+      "grad_norm": 0.122005395591259,
+      "learning_rate": 0.00016240000000000002,
+      "loss": 0.4856534957885742,
+      "mean_token_accuracy": 0.9402152642607688,
+      "num_tokens": 675840.0,
+      "step": 330
+    },
+    {
+      "entropy": 0.5829201385378837,
+      "epoch": 0.19428571428571428,
+      "grad_norm": 0.20030805468559265,
+      "learning_rate": 0.00016125714285714285,
+      "loss": 0.49710564613342284,
+      "mean_token_accuracy": 0.9347358077764512,
+      "num_tokens": 696320.0,
+      "step": 340
+    },
+    {
+      "entropy": 0.5763799995183945,
+      "epoch": 0.2,
+      "grad_norm": 0.08205129206180573,
+      "learning_rate": 0.00016011428571428572,
+      "loss": 0.5814402580261231,
+      "mean_token_accuracy": 0.9234344378113747,
+      "num_tokens": 716800.0,
+      "step": 350
+    },
+    {
+      "entropy": 0.5829779535531998,
+      "epoch": 0.2057142857142857,
+      "grad_norm": 0.09151940047740936,
+      "learning_rate": 0.00015897142857142858,
+      "loss": 0.6152735233306885,
+      "mean_token_accuracy": 0.9193737730383873,
+      "num_tokens": 737280.0,
+      "step": 360
+    },
+    {
+      "entropy": 0.5310064390301704,
+      "epoch": 0.21142857142857144,
+      "grad_norm": 0.07890532165765762,
+      "learning_rate": 0.00015782857142857145,
+      "loss": 0.5040150642395019,
+      "mean_token_accuracy": 0.9362524405121804,
+      "num_tokens": 757760.0,
+      "step": 370
+    },
+    {
+      "entropy": 0.43088596761226655,
+      "epoch": 0.21714285714285714,
+      "grad_norm": 0.16556017100811005,
+      "learning_rate": 0.0001566857142857143,
+      "loss": 0.4871084690093994,
+      "mean_token_accuracy": 0.936545978486538,
+      "num_tokens": 778240.0,
+      "step": 380
+    },
+    {
+      "entropy": 0.5633072890341282,
+      "epoch": 0.22285714285714286,
+      "grad_norm": 0.04732167348265648,
+      "learning_rate": 0.00015554285714285715,
+      "loss": 0.46471123695373534,
+      "mean_token_accuracy": 0.938209393620491,
+      "num_tokens": 798720.0,
+      "step": 390
+    },
+    {
+      "entropy": 0.47751612216234207,
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.10221615433692932,
+      "learning_rate": 0.0001544,
+      "loss": 0.5514689922332764,
+      "mean_token_accuracy": 0.9290117397904396,
+      "num_tokens": 819200.0,
+      "step": 400
+    },
+    {
+      "entropy": 0.5929409205913544,
+      "epoch": 0.2342857142857143,
+      "grad_norm": 0.07100944966077805,
+      "learning_rate": 0.00015325714285714285,
+      "loss": 0.5279238224029541,
+      "mean_token_accuracy": 0.9321428522467613,
+      "num_tokens": 839680.0,
+      "step": 410
+    },
+    {
+      "entropy": 0.5618587106466293,
+      "epoch": 0.24,
+      "grad_norm": 0.040722209960222244,
+      "learning_rate": 0.00015211428571428571,
+      "loss": 0.5124303340911865,
+      "mean_token_accuracy": 0.9332681059837341,
+      "num_tokens": 860160.0,
+      "step": 420
+    },
+    {
+      "entropy": 0.4459361031651497,
+      "epoch": 0.24571428571428572,
+      "grad_norm": 0.10662642121315002,
+      "learning_rate": 0.00015097142857142858,
+      "loss": 0.5077432632446289,
+      "mean_token_accuracy": 0.9327299326658249,
+      "num_tokens": 880640.0,
+      "step": 430
+    },
+    {
+      "entropy": 0.738821828365326,
+      "epoch": 0.25142857142857145,
+      "grad_norm": 0.15264756977558136,
+      "learning_rate": 0.00014982857142857144,
+      "loss": 0.6622607231140136,
+      "mean_token_accuracy": 0.9129158467054367,
+      "num_tokens": 901120.0,
+      "step": 440
+    },
+    {
+      "entropy": 0.5146319203078746,
+      "epoch": 0.2571428571428571,
+      "grad_norm": 0.09214670211076736,
+      "learning_rate": 0.0001486857142857143,
+      "loss": 0.6306316375732421,
+      "mean_token_accuracy": 0.9173189774155617,
+      "num_tokens": 921600.0,
+      "step": 450
+    },
+    {
+      "entropy": 0.7235424101352692,
+      "epoch": 0.26285714285714284,
+      "grad_norm": 0.09560181945562363,
+      "learning_rate": 0.00014754285714285717,
+      "loss": 0.6143333435058593,
+      "mean_token_accuracy": 0.9204500928521157,
+      "num_tokens": 942080.0,
+      "step": 460
+    },
+    {
+      "entropy": 0.525194027274847,
+      "epoch": 0.26857142857142857,
+      "grad_norm": 0.0698908343911171,
+      "learning_rate": 0.0001464,
+      "loss": 0.5927223205566406,
+      "mean_token_accuracy": 0.9236301362514496,
+      "num_tokens": 962560.0,
+      "step": 470
+    },
+    {
+      "entropy": 0.6539882756769657,
+      "epoch": 0.2742857142857143,
+      "grad_norm": 0.06869114935398102,
+      "learning_rate": 0.00014525714285714287,
+      "loss": 0.5622227668762207,
+      "mean_token_accuracy": 0.9269569426774978,
+      "num_tokens": 983040.0,
+      "step": 480
+    },
+    {
+      "entropy": 0.6007681101560592,
+      "epoch": 0.28,
+      "grad_norm": 0.05957495793700218,
+      "learning_rate": 0.0001441142857142857,
+      "loss": 0.6351003646850586,
+      "mean_token_accuracy": 0.9169765204191208,
+      "num_tokens": 1003520.0,
+      "step": 490
+    },
+    {
+      "entropy": 0.47484939694404604,
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.12594401836395264,
+      "learning_rate": 0.00014297142857142857,
+      "loss": 0.48124194145202637,
+      "mean_token_accuracy": 0.9353718146681785,
+      "num_tokens": 1024000.0,
+      "step": 500
+    },
+    {
+      "entropy": 0.5398600555956363,
+      "epoch": 0.2914285714285714,
+      "grad_norm": 0.10446745902299881,
+      "learning_rate": 0.00014182857142857144,
+      "loss": 0.48729524612426756,
+      "mean_token_accuracy": 0.9348336592316627,
+      "num_tokens": 1044480.0,
+      "step": 510
+    },
+    {
+      "entropy": 0.5471270829439163,
+      "epoch": 0.29714285714285715,
+      "grad_norm": 0.06563620269298553,
+      "learning_rate": 0.00014068571428571427,
+      "loss": 0.5080405712127686,
+      "mean_token_accuracy": 0.9341976478695869,
+      "num_tokens": 1064960.0,
+      "step": 520
+    },
+    {
+      "entropy": 0.33239252008497716,
+      "epoch": 0.3028571428571429,
+      "grad_norm": 0.08698952198028564,
+      "learning_rate": 0.00013954285714285717,
+      "loss": 0.4493887901306152,
+      "mean_token_accuracy": 0.9398238703608512,
+      "num_tokens": 1085440.0,
+      "step": 530
+    },
+    {
+      "entropy": 0.6027603760361672,
+      "epoch": 0.30857142857142855,
+      "grad_norm": 0.05392242968082428,
+      "learning_rate": 0.0001384,
+      "loss": 0.586741590499878,
+      "mean_token_accuracy": 0.9232876688241959,
+      "num_tokens": 1105920.0,
+      "step": 540
+    },
+    {
+      "entropy": 0.5952301643788814,
+      "epoch": 0.3142857142857143,
+      "grad_norm": 0.18325185775756836,
+      "learning_rate": 0.00013725714285714287,
+      "loss": 0.5690345287322998,
+      "mean_token_accuracy": 0.9233365893363953,
+      "num_tokens": 1126400.0,
+      "step": 550
+    },
+    {
+      "entropy": 0.5742614075541497,
+      "epoch": 0.32,
+      "grad_norm": 0.24716411530971527,
+      "learning_rate": 0.00013611428571428573,
+      "loss": 0.5659992218017578,
+      "mean_token_accuracy": 0.9252935364842415,
+      "num_tokens": 1146880.0,
+      "step": 560
+    },
+    {
+      "entropy": 0.5065363116562367,
+      "epoch": 0.32571428571428573,
+      "grad_norm": 0.11625125259160995,
+      "learning_rate": 0.00013497142857142857,
+      "loss": 0.46491494178771975,
+      "mean_token_accuracy": 0.9386496990919113,
+      "num_tokens": 1167360.0,
+      "step": 570
+    },
+    {
+      "entropy": 0.5801173232495784,
+      "epoch": 0.3314285714285714,
+      "grad_norm": 0.053480129688978195,
+      "learning_rate": 0.00013382857142857143,
+      "loss": 0.5747738838195801,
+      "mean_token_accuracy": 0.925,
+      "num_tokens": 1187840.0,
+      "step": 580
+    },
+    {
+      "entropy": 0.6268129035830498,
+      "epoch": 0.33714285714285713,
+      "grad_norm": 0.13001662492752075,
+      "learning_rate": 0.0001326857142857143,
+      "loss": 0.6112659454345704,
+      "mean_token_accuracy": 0.9196183919906616,
+      "num_tokens": 1208320.0,
+      "step": 590
+    },
+    {
+      "entropy": 0.5466722629964351,
+      "epoch": 0.34285714285714286,
+      "grad_norm": 0.1628836840391159,
+      "learning_rate": 0.00013154285714285713,
+      "loss": 0.5239183902740479,
+      "mean_token_accuracy": 0.9299412861466407,
+      "num_tokens": 1228800.0,
+      "step": 600
+    },
+    {
+      "entropy": 0.3720652416348457,
+      "epoch": 0.3485714285714286,
+      "grad_norm": 0.16002608835697174,
+      "learning_rate": 0.0001304,
+      "loss": 0.41176614761352537,
+      "mean_token_accuracy": 0.9441291555762291,
+      "num_tokens": 1249280.0,
+      "step": 610
+    },
+    {
+      "entropy": 0.5275939412415027,
+      "epoch": 0.35428571428571426,
+      "grad_norm": 0.08797982335090637,
+      "learning_rate": 0.00012925714285714286,
+      "loss": 0.47243666648864746,
+      "mean_token_accuracy": 0.937622307240963,
+      "num_tokens": 1269760.0,
+      "step": 620
+    },
+    {
+      "entropy": 0.43704629465937617,
+      "epoch": 0.36,
+      "grad_norm": 0.05002142861485481,
+      "learning_rate": 0.00012811428571428573,
+      "loss": 0.4989802837371826,
+      "mean_token_accuracy": 0.9348336562514306,
+      "num_tokens": 1290240.0,
+      "step": 630
+    },
+    {
+      "entropy": 0.6504201047122479,
+      "epoch": 0.3657142857142857,
+      "grad_norm": 0.11755041778087616,
+      "learning_rate": 0.0001269714285714286,
+      "loss": 0.6163222312927246,
+      "mean_token_accuracy": 0.9187377691268921,
+      "num_tokens": 1310720.0,
+      "step": 640
+    },
+    {
+      "entropy": 0.5586043894290924,
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.14057014882564545,
+      "learning_rate": 0.00012582857142857143,
+      "loss": 0.4861945152282715,
+      "mean_token_accuracy": 0.9349315032362938,
+      "num_tokens": 1331200.0,
+      "step": 650
+    },
+    {
+      "entropy": 0.44176297783851626,
+      "epoch": 0.37714285714285717,
+      "grad_norm": 0.05297749489545822,
+      "learning_rate": 0.0001246857142857143,
+      "loss": 0.5086590766906738,
+      "mean_token_accuracy": 0.9314090013504028,
+      "num_tokens": 1351680.0,
+      "step": 660
+    },
+    {
+      "entropy": 0.5753153517842293,
+      "epoch": 0.38285714285714284,
+      "grad_norm": 0.08983558416366577,
+      "learning_rate": 0.00012354285714285713,
+      "loss": 0.5077627658843994,
+      "mean_token_accuracy": 0.9314579233527184,
+      "num_tokens": 1372160.0,
+      "step": 670
+    },
+    {
+      "entropy": 0.5109598033130169,
+      "epoch": 0.38857142857142857,
+      "grad_norm": 0.061045870184898376,
+      "learning_rate": 0.0001224,
+      "loss": 0.5647973537445068,
+      "mean_token_accuracy": 0.9246575310826302,
+      "num_tokens": 1392640.0,
+      "step": 680
+    },
+    {
+      "entropy": 0.5929775595664978,
+      "epoch": 0.3942857142857143,
+      "grad_norm": 0.1480552852153778,
+      "learning_rate": 0.00012125714285714287,
+      "loss": 0.5129212856292724,
+      "mean_token_accuracy": 0.9318003863096237,
+      "num_tokens": 1413120.0,
+      "step": 690
+    },
+    {
+      "entropy": 0.5623379878699779,
+      "epoch": 0.4,
+      "grad_norm": 0.11450239270925522,
+      "learning_rate": 0.00012011428571428571,
+      "loss": 0.5803545475006103,
+      "mean_token_accuracy": 0.9237279817461967,
+      "num_tokens": 1433600.0,
+      "step": 700
+    },
+    {
+      "entropy": 0.4025018382817507,
+      "epoch": 0.4057142857142857,
+      "grad_norm": 0.1846701204776764,
+      "learning_rate": 0.00011897142857142857,
+      "loss": 0.4205235481262207,
+      "mean_token_accuracy": 0.9443248510360718,
+      "num_tokens": 1454080.0,
+      "step": 710
+    },
+    {
+      "entropy": 0.6451694712042808,
+      "epoch": 0.4114285714285714,
+      "grad_norm": 0.24713803827762604,
+      "learning_rate": 0.00011782857142857145,
+      "loss": 0.584080696105957,
+      "mean_token_accuracy": 0.9242172226309776,
+      "num_tokens": 1474560.0,
+      "step": 720
+    },
+    {
+      "entropy": 0.49122287705540657,
+      "epoch": 0.41714285714285715,
+      "grad_norm": 0.06742525100708008,
+      "learning_rate": 0.00011668571428571429,
+      "loss": 0.5123952865600586,
+      "mean_token_accuracy": 0.9291585102677345,
+      "num_tokens": 1495040.0,
+      "step": 730
+    },
+    {
+      "entropy": 0.5520305454730987,
+      "epoch": 0.4228571428571429,
+      "grad_norm": 0.06774848699569702,
+      "learning_rate": 0.00011554285714285715,
+      "loss": 0.5740731716156006,
+      "mean_token_accuracy": 0.9251956939697266,
+      "num_tokens": 1515520.0,
+      "step": 740
+    },
+    {
+      "entropy": 0.6016006499528885,
+      "epoch": 0.42857142857142855,
+      "grad_norm": 0.10677064955234528,
+      "learning_rate": 0.0001144,
+      "loss": 0.576433515548706,
+      "mean_token_accuracy": 0.9245596766471863,
+      "num_tokens": 1536000.0,
+      "step": 750
+    },
+    {
+      "entropy": 0.3902605950832367,
+      "epoch": 0.4342857142857143,
+      "grad_norm": 0.20825354754924774,
+      "learning_rate": 0.00011325714285714287,
+      "loss": 0.43376636505126953,
+      "mean_token_accuracy": 0.9424168273806572,
+      "num_tokens": 1556480.0,
+      "step": 760
+    },
+    {
+      "entropy": 0.44849342107772827,
+      "epoch": 0.44,
+      "grad_norm": 0.04826491326093674,
+      "learning_rate": 0.00011211428571428573,
+      "loss": 0.48691091537475584,
+      "mean_token_accuracy": 0.935812133550644,
+      "num_tokens": 1576960.0,
+      "step": 770
+    },
+    {
+      "entropy": 0.6068522818386555,
+      "epoch": 0.44571428571428573,
+      "grad_norm": 0.12682656943798065,
+      "learning_rate": 0.00011097142857142857,
+      "loss": 0.5062217712402344,
+      "mean_token_accuracy": 0.9334637969732285,
+      "num_tokens": 1597440.0,
+      "step": 780
+    },
+    {
+      "entropy": 0.5890683718025684,
+      "epoch": 0.4514285714285714,
+      "grad_norm": 0.05915551632642746,
+      "learning_rate": 0.00010982857142857143,
+      "loss": 0.5772375106811524,
+      "mean_token_accuracy": 0.922945199906826,
+      "num_tokens": 1617920.0,
+      "step": 790
+    },
+    {
+      "entropy": 0.44907821491360667,
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.05221934989094734,
+      "learning_rate": 0.0001086857142857143,
+      "loss": 0.39677114486694337,
+      "mean_token_accuracy": 0.9475048884749413,
+      "num_tokens": 1638400.0,
+      "step": 800
+    },
+    {
+      "entropy": 0.3511381965130568,
+      "epoch": 0.46285714285714286,
+      "grad_norm": 0.037601977586746216,
+      "learning_rate": 0.00010754285714285715,
+      "loss": 0.49465084075927734,
+      "mean_token_accuracy": 0.9351761221885682,
+      "num_tokens": 1658880.0,
+      "step": 810
+    },
+    {
+      "entropy": 0.5840609729290008,
+      "epoch": 0.4685714285714286,
+      "grad_norm": 0.062267005443573,
+      "learning_rate": 0.00010640000000000001,
+      "loss": 0.5255018711090088,
+      "mean_token_accuracy": 0.929941289126873,
+      "num_tokens": 1679360.0,
+      "step": 820
+    },
+    {
+      "entropy": 0.55433234795928,
+      "epoch": 0.4742857142857143,
+      "grad_norm": 0.07288116216659546,
+      "learning_rate": 0.00010525714285714285,
+      "loss": 0.5209427356719971,
+      "mean_token_accuracy": 0.9328767120838165,
+      "num_tokens": 1699840.0,
+      "step": 830
+    },
+    {
+      "entropy": 0.5656354635953903,
+      "epoch": 0.48,
+      "grad_norm": 0.06822408735752106,
+      "learning_rate": 0.00010411428571428573,
+      "loss": 0.549201250076294,
+      "mean_token_accuracy": 0.9275929510593415,
+      "num_tokens": 1720320.0,
+      "step": 840
+    },
+    {
+      "entropy": 0.48973754718899726,
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.05990103632211685,
+      "learning_rate": 0.00010297142857142859,
+      "loss": 0.49325761795043943,
+      "mean_token_accuracy": 0.9340508818626404,
+      "num_tokens": 1740800.0,
+      "step": 850
+    },
+    {
+      "entropy": 0.47543270140886307,
+      "epoch": 0.49142857142857144,
+      "grad_norm": 0.17406728863716125,
+      "learning_rate": 0.00010182857142857143,
+      "loss": 0.4794588565826416,
+      "mean_token_accuracy": 0.9373776882886886,
+      "num_tokens": 1761280.0,
+      "step": 860
+    },
+    {
+      "entropy": 0.5125072635710239,
+      "epoch": 0.49714285714285716,
+      "grad_norm": 0.09598424285650253,
+      "learning_rate": 0.00010068571428571429,
+      "loss": 0.48879342079162597,
+      "mean_token_accuracy": 0.9328767105937004,
+      "num_tokens": 1781760.0,
+      "step": 870
+    },
+    {
+      "entropy": 0.5078008212149143,
+      "epoch": 0.5028571428571429,
+      "grad_norm": 0.04539811238646507,
+      "learning_rate": 9.954285714285714e-05,
+      "loss": 0.5100128173828125,
+      "mean_token_accuracy": 0.933170248568058,
+      "num_tokens": 1802240.0,
+      "step": 880
+    },
+    {
+      "entropy": 0.5812924958765506,
+      "epoch": 0.5085714285714286,
+      "grad_norm": 0.08453221619129181,
+      "learning_rate": 9.84e-05,
+      "loss": 0.5912289619445801,
+      "mean_token_accuracy": 0.9213307231664658,
+      "num_tokens": 1822720.0,
+      "step": 890
+    },
+    {
+      "entropy": 0.6200520746409893,
+      "epoch": 0.5142857142857142,
+      "grad_norm": 0.06393375247716904,
+      "learning_rate": 9.725714285714286e-05,
+      "loss": 0.5631557464599609,
+      "mean_token_accuracy": 0.92387475669384,
+      "num_tokens": 1843200.0,
+      "step": 900
+    },
+    {
+      "entropy": 0.5358941219747066,
+      "epoch": 0.52,
+      "grad_norm": 0.06611394882202148,
+      "learning_rate": 9.611428571428572e-05,
+      "loss": 0.5037489414215088,
+      "mean_token_accuracy": 0.9342954933643342,
+      "num_tokens": 1863680.0,
+      "step": 910
+    },
+    {
+      "entropy": 0.49648854285478594,
+      "epoch": 0.5257142857142857,
+      "grad_norm": 0.048722244799137115,
+      "learning_rate": 9.497142857142857e-05,
+      "loss": 0.5899598121643066,
+      "mean_token_accuracy": 0.9235812112689018,
+      "num_tokens": 1884160.0,
+      "step": 920
+    },
+    {
+      "entropy": 0.5569692045450211,
+      "epoch": 0.5314285714285715,
+      "grad_norm": 0.059873905032873154,
+      "learning_rate": 9.382857142857144e-05,
+      "loss": 0.4581919193267822,
+      "mean_token_accuracy": 0.9386986270546913,
+      "num_tokens": 1904640.0,
+      "step": 930
+    },
+    {
+      "entropy": 0.5027755253016949,
+      "epoch": 0.5371428571428571,
+      "grad_norm": 0.06935913860797882,
+      "learning_rate": 9.268571428571429e-05,
+      "loss": 0.5418489933013916,
+      "mean_token_accuracy": 0.9288649693131447,
+      "num_tokens": 1925120.0,
+      "step": 940
+    },
+    {
+      "entropy": 0.5224891372025013,
+      "epoch": 0.5428571428571428,
+      "grad_norm": 0.08146216720342636,
+      "learning_rate": 9.154285714285715e-05,
+      "loss": 0.4692467212677002,
+      "mean_token_accuracy": 0.9363013684749604,
+      "num_tokens": 1945600.0,
+      "step": 950
+    },
+    {
+      "entropy": 0.3952696807682514,
+      "epoch": 0.5485714285714286,
+      "grad_norm": 0.04515131562948227,
+      "learning_rate": 9.04e-05,
+      "loss": 0.4561454296112061,
+      "mean_token_accuracy": 0.9393835559487342,
+      "num_tokens": 1966080.0,
+      "step": 960
+    },
+    {
+      "entropy": 0.585730504244566,
+      "epoch": 0.5542857142857143,
+      "grad_norm": 0.10511744022369385,
+      "learning_rate": 8.925714285714287e-05,
+      "loss": 0.49766831398010253,
+      "mean_token_accuracy": 0.9336105614900589,
+      "num_tokens": 1986560.0,
+      "step": 970
+    },
+    {
+      "entropy": 0.44790558367967603,
+      "epoch": 0.56,
+      "grad_norm": 0.10671456158161163,
+      "learning_rate": 8.811428571428572e-05,
+      "loss": 0.5533726215362549,
+      "mean_token_accuracy": 0.9256360054016113,
+      "num_tokens": 2007040.0,
+      "step": 980
+    },
+    {
+      "entropy": 0.6043247401714325,
+      "epoch": 0.5657142857142857,
+      "grad_norm": 0.07618619501590729,
+      "learning_rate": 8.697142857142857e-05,
+      "loss": 0.504014253616333,
+      "mean_token_accuracy": 0.9335616484284401,
+      "num_tokens": 2027520.0,
+      "step": 990
+    },
+    {
+      "entropy": 0.505135278403759,
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.04772692546248436,
+      "learning_rate": 8.582857142857143e-05,
+      "loss": 0.5780378341674804,
+      "mean_token_accuracy": 0.9245107591152191,
+      "num_tokens": 2048000.0,
+      "step": 1000
+    },
+    {
+      "entropy": 0.6646146893501281,
+      "epoch": 0.5771428571428572,
+      "grad_norm": 0.14273883402347565,
+      "learning_rate": 8.46857142857143e-05,
+      "loss": 0.5886980056762695,
+      "mean_token_accuracy": 0.9233855158090591,
+      "num_tokens": 2068480.0,
+      "step": 1010
+    },
+    {
+      "entropy": 0.4395291399210691,
+      "epoch": 0.5828571428571429,
+      "grad_norm": 0.14279323816299438,
+      "learning_rate": 8.354285714285715e-05,
+      "loss": 0.4073338508605957,
+      "mean_token_accuracy": 0.9472602680325508,
+      "num_tokens": 2088960.0,
+      "step": 1020
+    },
+    {
+      "entropy": 0.46593534424901006,
+      "epoch": 0.5885714285714285,
+      "grad_norm": 0.07002478837966919,
+      "learning_rate": 8.24e-05,
+      "loss": 0.5809737205505371,
+      "mean_token_accuracy": 0.9250978484749794,
+      "num_tokens": 2109440.0,
+      "step": 1030
+    },
+    {
+      "entropy": 0.6851501919329166,
+      "epoch": 0.5942857142857143,
+      "grad_norm": 0.09571711719036102,
+      "learning_rate": 8.125714285714286e-05,
+      "loss": 0.5552125453948975,
+      "mean_token_accuracy": 0.9268101781606675,
+      "num_tokens": 2129920.0,
+      "step": 1040
+    },
+    {
+      "entropy": 0.5178290113806725,
+      "epoch": 0.6,
+      "grad_norm": 0.05673440545797348,
+      "learning_rate": 8.011428571428573e-05,
+      "loss": 0.5303431034088135,
+      "mean_token_accuracy": 0.9276418760418892,
+      "num_tokens": 2150400.0,
+      "step": 1050
+    },
+    {
+      "entropy": 0.5137788712978363,
+      "epoch": 0.6057142857142858,
+      "grad_norm": 0.10853283107280731,
+      "learning_rate": 7.897142857142858e-05,
+      "loss": 0.5105819702148438,
+      "mean_token_accuracy": 0.9317025363445282,
+      "num_tokens": 2170880.0,
+      "step": 1060
+    },
+    {
+      "entropy": 0.47450395971536635,
+      "epoch": 0.6114285714285714,
+      "grad_norm": 0.07990778237581253,
+      "learning_rate": 7.782857142857143e-05,
+      "loss": 0.4573202610015869,
+      "mean_token_accuracy": 0.9386007770895958,
+      "num_tokens": 2191360.0,
+      "step": 1070
+    },
+    {
+      "entropy": 0.48751090839505196,
+      "epoch": 0.6171428571428571,
+      "grad_norm": 0.06365620344877243,
+      "learning_rate": 7.668571428571429e-05,
+      "loss": 0.47788248062133787,
+      "mean_token_accuracy": 0.9339530274271965,
+      "num_tokens": 2211840.0,
+      "step": 1080
+    },
+    {
+      "entropy": 0.6432172752916813,
+      "epoch": 0.6228571428571429,
+      "grad_norm": 0.15153902769088745,
+      "learning_rate": 7.554285714285716e-05,
+      "loss": 0.643152379989624,
+      "mean_token_accuracy": 0.9154598772525787,
+      "num_tokens": 2232320.0,
+      "step": 1090
+    },
+    {
+      "entropy": 0.5494025968015194,
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.05778637155890465,
+      "learning_rate": 7.44e-05,
+      "loss": 0.5356337070465088,
+      "mean_token_accuracy": 0.9269080251455307,
+      "num_tokens": 2252800.0,
+      "step": 1100
+    },
+    {
+      "entropy": 0.4848433412611485,
+      "epoch": 0.6342857142857142,
+      "grad_norm": 0.06926964968442917,
+      "learning_rate": 7.325714285714286e-05,
+      "loss": 0.4998468399047852,
+      "mean_token_accuracy": 0.9333170250058174,
+      "num_tokens": 2273280.0,
+      "step": 1110
+    },
+    {
+      "entropy": 0.48359694480896,
+      "epoch": 0.64,
+      "grad_norm": 0.047927126288414,
+      "learning_rate": 7.211428571428572e-05,
+      "loss": 0.4242386341094971,
+      "mean_token_accuracy": 0.9413894265890121,
+      "num_tokens": 2293760.0,
+      "step": 1120
+    },
+    {
+      "entropy": 0.46179538443684576,
+      "epoch": 0.6457142857142857,
+      "grad_norm": 0.055567119270563126,
+      "learning_rate": 7.097142857142857e-05,
+      "loss": 0.47112126350402833,
+      "mean_token_accuracy": 0.937279836833477,
+      "num_tokens": 2314240.0,
+      "step": 1130
+    },
+    {
+      "entropy": 0.542270090430975,
+      "epoch": 0.6514285714285715,
+      "grad_norm": 0.05670230835676193,
+      "learning_rate": 6.982857142857144e-05,
+      "loss": 0.5318952560424804,
+      "mean_token_accuracy": 0.9284246578812599,
+      "num_tokens": 2334720.0,
+      "step": 1140
+    },
+    {
+      "entropy": 0.5143633857369423,
+      "epoch": 0.6571428571428571,
+      "grad_norm": 0.13285626471042633,
+      "learning_rate": 6.868571428571429e-05,
+      "loss": 0.5228913784027099,
+      "mean_token_accuracy": 0.9306262180209159,
+      "num_tokens": 2355200.0,
+      "step": 1150
+    },
+    {
+      "entropy": 0.5278540596365928,
+      "epoch": 0.6628571428571428,
+      "grad_norm": 0.12014856934547424,
+      "learning_rate": 6.754285714285714e-05,
+      "loss": 0.4942105770111084,
+      "mean_token_accuracy": 0.9336105525493622,
+      "num_tokens": 2375680.0,
+      "step": 1160
+    },
+    {
+      "entropy": 0.47442091852426527,
+      "epoch": 0.6685714285714286,
+      "grad_norm": 0.07225783169269562,
+      "learning_rate": 6.64e-05,
+      "loss": 0.5058286190032959,
+      "mean_token_accuracy": 0.9322896271944046,
+      "num_tokens": 2396160.0,
+      "step": 1170
+    },
+    {
+      "entropy": 0.5688920177519321,
+      "epoch": 0.6742857142857143,
+      "grad_norm": 0.18159577250480652,
+      "learning_rate": 6.525714285714287e-05,
+      "loss": 0.5753805637359619,
+      "mean_token_accuracy": 0.9244618400931358,
+      "num_tokens": 2416640.0,
+      "step": 1180
+    },
+    {
+      "entropy": 0.6023895367980003,
+      "epoch": 0.68,
+      "grad_norm": 0.08553122729063034,
+      "learning_rate": 6.411428571428572e-05,
+      "loss": 0.5713490009307861,
+      "mean_token_accuracy": 0.9245596811175346,
+      "num_tokens": 2437120.0,
+      "step": 1190
+    },
+    {
+      "entropy": 0.5261122666299343,
+      "epoch": 0.6857142857142857,
+      "grad_norm": 0.049268610775470734,
+      "learning_rate": 6.297142857142857e-05,
+      "loss": 0.4773250579833984,
+      "mean_token_accuracy": 0.9357142806053161,
+      "num_tokens": 2457600.0,
+      "step": 1200
+    },
+    {
+      "entropy": 0.4250579118728638,
+      "epoch": 0.6914285714285714,
+      "grad_norm": 0.08859790861606598,
+      "learning_rate": 6.182857142857143e-05,
+      "loss": 0.5190114974975586,
+      "mean_token_accuracy": 0.9327788591384888,
+      "num_tokens": 2478080.0,
+      "step": 1210
+    },
+    {
+      "entropy": 0.5697022847831249,
+      "epoch": 0.6971428571428572,
+      "grad_norm": 0.057728637009859085,
+      "learning_rate": 6.068571428571429e-05,
+      "loss": 0.5468664169311523,
+      "mean_token_accuracy": 0.9258806198835373,
+      "num_tokens": 2498560.0,
+      "step": 1220
+    },
+    {
+      "entropy": 0.6075248032808304,
+      "epoch": 0.7028571428571428,
+      "grad_norm": 0.11526218056678772,
+      "learning_rate": 5.9542857142857146e-05,
+      "loss": 0.5527401447296143,
+      "mean_token_accuracy": 0.9275929555296898,
+      "num_tokens": 2519040.0,
+      "step": 1230
+    },
+    {
+      "entropy": 0.42498912289738655,
+      "epoch": 0.7085714285714285,
+      "grad_norm": 0.03980788588523865,
+      "learning_rate": 5.8399999999999997e-05,
+      "loss": 0.33912794589996337,
+      "mean_token_accuracy": 0.952054788172245,
+      "num_tokens": 2539520.0,
+      "step": 1240
+    },
+    {
+      "entropy": 0.3205617554485798,
+      "epoch": 0.7142857142857143,
+      "grad_norm": 0.18020202219486237,
+      "learning_rate": 5.725714285714287e-05,
+      "loss": 0.46316843032836913,
+      "mean_token_accuracy": 0.9384540095925331,
+      "num_tokens": 2560000.0,
+      "step": 1250
+    },
+    {
+      "entropy": 0.5134926207363606,
+      "epoch": 0.72,
+      "grad_norm": 0.05606986954808235,
+      "learning_rate": 5.611428571428572e-05,
+      "loss": 0.493113374710083,
+      "mean_token_accuracy": 0.9350782737135888,
+      "num_tokens": 2580480.0,
+      "step": 1260
+    },
+    {
+      "entropy": 0.5948906727135181,
+      "epoch": 0.7257142857142858,
+      "grad_norm": 0.0776003748178482,
+      "learning_rate": 5.4971428571428576e-05,
+      "loss": 0.5038031101226806,
+      "mean_token_accuracy": 0.9321917742490768,
+      "num_tokens": 2600960.0,
+      "step": 1270
+    },
+    {
+      "entropy": 0.49316187873482703,
+      "epoch": 0.7314285714285714,
+      "grad_norm": 0.045350395143032074,
+      "learning_rate": 5.3828571428571426e-05,
+      "loss": 0.5338226795196533,
+      "mean_token_accuracy": 0.9277397200465203,
+      "num_tokens": 2621440.0,
+      "step": 1280
+    },
+    {
+      "entropy": 0.5127949021756649,
+      "epoch": 0.7371428571428571,
+      "grad_norm": 0.11139972507953644,
+      "learning_rate": 5.2685714285714284e-05,
+      "loss": 0.5134667873382568,
+      "mean_token_accuracy": 0.9301369816064835,
+      "num_tokens": 2641920.0,
+      "step": 1290
+    },
+    {
+      "entropy": 0.5037195473909378,
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.07528945058584213,
+      "learning_rate": 5.154285714285715e-05,
+      "loss": 0.4140820026397705,
+      "mean_token_accuracy": 0.9437377616763115,
+      "num_tokens": 2662400.0,
+      "step": 1300
+    },
+    {
+      "entropy": 0.4342464208602905,
+      "epoch": 0.7485714285714286,
+      "grad_norm": 0.04880405217409134,
+      "learning_rate": 5.0400000000000005e-05,
+      "loss": 0.44888153076171877,
+      "mean_token_accuracy": 0.9398727953433991,
+      "num_tokens": 2682880.0,
+      "step": 1310
+    },
+    {
+      "entropy": 0.4520566128194332,
+      "epoch": 0.7542857142857143,
+      "grad_norm": 0.06463604420423508,
+      "learning_rate": 4.9257142857142856e-05,
+      "loss": 0.4620377540588379,
+      "mean_token_accuracy": 0.9389921724796295,
+      "num_tokens": 2703360.0,
+      "step": 1320
+    },
+    {
+      "entropy": 0.5098449736833572,
+      "epoch": 0.76,
+      "grad_norm": 0.08104566484689713,
+      "learning_rate": 4.811428571428572e-05,
+      "loss": 0.5255457878112793,
+      "mean_token_accuracy": 0.9294031277298928,
+      "num_tokens": 2723840.0,
+      "step": 1330
+    },
+    {
+      "entropy": 0.5557918883860111,
+      "epoch": 0.7657142857142857,
+      "grad_norm": 0.17115789651870728,
+      "learning_rate": 4.697142857142857e-05,
+      "loss": 0.5438389301300048,
+      "mean_token_accuracy": 0.9260763168334961,
+      "num_tokens": 2744320.0,
+      "step": 1340
+    },
+    {
+      "entropy": 0.4861530035734177,
+      "epoch": 0.7714285714285715,
+      "grad_norm": 0.11599256098270416,
+      "learning_rate": 4.5828571428571435e-05,
+      "loss": 0.4168668270111084,
+      "mean_token_accuracy": 0.9445205450057983,
+      "num_tokens": 2764800.0,
+      "step": 1350
+    },
+    {
+      "entropy": 0.4026035204529762,
+      "epoch": 0.7771428571428571,
+      "grad_norm": 0.12264294922351837,
+      "learning_rate": 4.4685714285714286e-05,
+      "loss": 0.44530544281005857,
+      "mean_token_accuracy": 0.9406555786728859,
+      "num_tokens": 2785280.0,
+      "step": 1360
+    },
+    {
+      "entropy": 0.4574496321380138,
+      "epoch": 0.7828571428571428,
+      "grad_norm": 0.061350978910923004,
+      "learning_rate": 4.354285714285714e-05,
+      "loss": 0.4872305393218994,
+      "mean_token_accuracy": 0.9332681015133858,
+      "num_tokens": 2805760.0,
+      "step": 1370
+    },
+    {
+      "entropy": 0.5247848644852638,
+      "epoch": 0.7885714285714286,
+      "grad_norm": 0.16169291734695435,
+      "learning_rate": 4.24e-05,
+      "loss": 0.6098764419555665,
+      "mean_token_accuracy": 0.918003910779953,
+      "num_tokens": 2826240.0,
+      "step": 1380
+    },
+    {
+      "entropy": 0.6509688436985016,
+      "epoch": 0.7942857142857143,
+      "grad_norm": 0.08834604918956757,
+      "learning_rate": 4.125714285714286e-05,
+      "loss": 0.44071273803710936,
+      "mean_token_accuracy": 0.9414872780442238,
+      "num_tokens": 2846720.0,
+      "step": 1390
+    },
+    {
+      "entropy": 0.4596972182393074,
+      "epoch": 0.8,
+      "grad_norm": 0.11084280163049698,
+      "learning_rate": 4.0114285714285715e-05,
+      "loss": 0.5611968994140625,
+      "mean_token_accuracy": 0.9244618371129036,
+      "num_tokens": 2867200.0,
+      "step": 1400
+    },
+    {
+      "entropy": 0.4872237786650658,
+      "epoch": 0.8057142857142857,
+      "grad_norm": 0.06508481502532959,
+      "learning_rate": 3.897142857142857e-05,
+      "loss": 0.48762712478637693,
+      "mean_token_accuracy": 0.934931506216526,
+      "num_tokens": 2887680.0,
+      "step": 1410
+    },
+    {
+      "entropy": 0.5000090979039669,
+      "epoch": 0.8114285714285714,
+      "grad_norm": 0.09130967408418655,
+      "learning_rate": 3.782857142857143e-05,
+      "loss": 0.4005345344543457,
+      "mean_token_accuracy": 0.9463307186961174,
+      "num_tokens": 2908160.0,
+      "step": 1420
+    },
+    {
+      "entropy": 0.47738232761621474,
+      "epoch": 0.8171428571428572,
+      "grad_norm": 0.044347282499074936,
+      "learning_rate": 3.668571428571429e-05,
+      "loss": 0.48606581687927247,
+      "mean_token_accuracy": 0.9353718161582947,
+      "num_tokens": 2928640.0,
+      "step": 1430
+    },
+    {
+      "entropy": 0.42782202512025835,
+      "epoch": 0.8228571428571428,
+      "grad_norm": 0.07000281661748886,
+      "learning_rate": 3.5542857142857145e-05,
+      "loss": 0.4319033145904541,
+      "mean_token_accuracy": 0.9425146728754044,
+      "num_tokens": 2949120.0,
+      "step": 1440
+    },
+    {
+      "entropy": 0.42102750316262244,
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.051525264978408813,
+      "learning_rate": 3.4399999999999996e-05,
+      "loss": 0.5133680820465087,
+      "mean_token_accuracy": 0.9324363932013512,
+      "num_tokens": 2969600.0,
+      "step": 1450
+    },
+    {
+      "entropy": 0.49969290792942045,
+      "epoch": 0.8342857142857143,
+      "grad_norm": 0.09799740463495255,
+      "learning_rate": 3.325714285714286e-05,
+      "loss": 0.35628225803375246,
+      "mean_token_accuracy": 0.9507338494062424,
+      "num_tokens": 2990080.0,
+      "step": 1460
+    },
+    {
+      "entropy": 0.45728029012680055,
+      "epoch": 0.84,
+      "grad_norm": 0.09343012422323227,
+      "learning_rate": 3.211428571428571e-05,
+      "loss": 0.49259514808654786,
+      "mean_token_accuracy": 0.9346379593014718,
+      "num_tokens": 3010560.0,
+      "step": 1470
+    },
+    {
+      "entropy": 0.43957103714346885,
+      "epoch": 0.8457142857142858,
+      "grad_norm": 0.08746747672557831,
+      "learning_rate": 3.0971428571428575e-05,
+      "loss": 0.4531531810760498,
+      "mean_token_accuracy": 0.9389432400465012,
+      "num_tokens": 3031040.0,
+      "step": 1480
+    },
+    {
+      "entropy": 0.4378614515066147,
+      "epoch": 0.8514285714285714,
+      "grad_norm": 0.08076383918523788,
+      "learning_rate": 2.982857142857143e-05,
+      "loss": 0.38294827938079834,
+      "mean_token_accuracy": 0.9463307201862335,
+      "num_tokens": 3051520.0,
+      "step": 1490
+    },
+    {
+      "entropy": 0.404469034075737,
+      "epoch": 0.8571428571428571,
+      "grad_norm": 0.07423587143421173,
+      "learning_rate": 2.8685714285714286e-05,
+      "loss": 0.3835278511047363,
+      "mean_token_accuracy": 0.9466731920838356,
+      "num_tokens": 3072000.0,
+      "step": 1500
+    },
+    {
+      "entropy": 0.4296922437846661,
+      "epoch": 0.8628571428571429,
+      "grad_norm": 0.0661383792757988,
+      "learning_rate": 2.7542857142857144e-05,
+      "loss": 0.4782656192779541,
+      "mean_token_accuracy": 0.9355185866355896,
+      "num_tokens": 3092480.0,
+      "step": 1510
+    },
+    {
+      "entropy": 0.5012158110737801,
+      "epoch": 0.8685714285714285,
+      "grad_norm": 0.061526812613010406,
+      "learning_rate": 2.64e-05,
+      "loss": 0.498062801361084,
+      "mean_token_accuracy": 0.9329256370663643,
+      "num_tokens": 3112960.0,
+      "step": 1520
+    },
+    {
+      "entropy": 0.5610668167471886,
+      "epoch": 0.8742857142857143,
+      "grad_norm": 0.06692764163017273,
+      "learning_rate": 2.5257142857142855e-05,
+      "loss": 0.5882072925567627,
+      "mean_token_accuracy": 0.9202544033527374,
+      "num_tokens": 3133440.0,
+      "step": 1530
+    },
+    {
+      "entropy": 0.5811519846320152,
+      "epoch": 0.88,
+      "grad_norm": 0.07140078395605087,
+      "learning_rate": 2.4114285714285713e-05,
+      "loss": 0.5379503250122071,
+      "mean_token_accuracy": 0.9279354155063629,
+      "num_tokens": 3153920.0,
+      "step": 1540
+    },
+    {
+      "entropy": 0.5530194289982319,
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.06519081443548203,
+      "learning_rate": 2.297142857142857e-05,
+      "loss": 0.46042847633361816,
+      "mean_token_accuracy": 0.9379647731781006,
+      "num_tokens": 3174400.0,
+      "step": 1550
+    },
+    {
+      "entropy": 0.48414005115628245,
+      "epoch": 0.8914285714285715,
+      "grad_norm": 0.1207524910569191,
+      "learning_rate": 2.1828571428571428e-05,
+      "loss": 0.5741530895233155,
+      "mean_token_accuracy": 0.9217710316181182,
+      "num_tokens": 3194880.0,
+      "step": 1560
+    },
+    {
+      "entropy": 0.5202787101268769,
+      "epoch": 0.8971428571428571,
+      "grad_norm": 0.14086130261421204,
+      "learning_rate": 2.0685714285714285e-05,
+      "loss": 0.4223160743713379,
+      "mean_token_accuracy": 0.9427103668451309,
+      "num_tokens": 3215360.0,
+      "step": 1570
+    },
+    {
+      "entropy": 0.46667657494544984,
+      "epoch": 0.9028571428571428,
+      "grad_norm": 0.08360854536294937,
+      "learning_rate": 1.9542857142857143e-05,
+      "loss": 0.5375402450561524,
+      "mean_token_accuracy": 0.9278864920139313,
+      "num_tokens": 3235840.0,
+      "step": 1580
+    },
+    {
+      "entropy": 0.5302047237753869,
+      "epoch": 0.9085714285714286,
+      "grad_norm": 0.1263239085674286,
+      "learning_rate": 1.84e-05,
+      "loss": 0.5738934993743896,
+      "mean_token_accuracy": 0.9234833672642708,
+      "num_tokens": 3256320.0,
+      "step": 1590
+    },
+    {
+      "entropy": 0.5907163083553314,
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.1855882853269577,
+      "learning_rate": 1.7257142857142857e-05,
+      "loss": 0.5114128112792968,
+      "mean_token_accuracy": 0.9324853181838989,
+      "num_tokens": 3276800.0,
+      "step": 1600
+    },
+    {
+      "entropy": 0.5579292640089989,
+      "epoch": 0.92,
+      "grad_norm": 0.07140802592039108,
+      "learning_rate": 1.6114285714285715e-05,
+      "loss": 0.5325123786926269,
+      "mean_token_accuracy": 0.927788645029068,
+      "num_tokens": 3297280.0,
+      "step": 1610
+    },
+    {
+      "entropy": 0.5179159216582775,
+      "epoch": 0.9257142857142857,
+      "grad_norm": 0.16600361466407776,
+      "learning_rate": 1.4971428571428572e-05,
+      "loss": 0.6152241230010986,
+      "mean_token_accuracy": 0.9193737730383873,
+      "num_tokens": 3317760.0,
+      "step": 1620
+    },
+    {
+      "entropy": 0.5268598526716233,
+      "epoch": 0.9314285714285714,
+      "grad_norm": 0.09788428992033005,
+      "learning_rate": 1.382857142857143e-05,
+      "loss": 0.4881411075592041,
+      "mean_token_accuracy": 0.9339041069149971,
+      "num_tokens": 3338240.0,
+      "step": 1630
+    },
+    {
+      "entropy": 0.5437945984303951,
+      "epoch": 0.9371428571428572,
+      "grad_norm": 0.10436356067657471,
+      "learning_rate": 1.2685714285714287e-05,
+      "loss": 0.4268455982208252,
+      "mean_token_accuracy": 0.9410469651222229,
+      "num_tokens": 3358720.0,
+      "step": 1640
+    },
+    {
+      "entropy": 0.5317239560186863,
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.05967115983366966,
+      "learning_rate": 1.1542857142857143e-05,
+      "loss": 0.5399470329284668,
+      "mean_token_accuracy": 0.9254892319440842,
+      "num_tokens": 3379200.0,
+      "step": 1650
+    },
+    {
+      "entropy": 0.4992189362645149,
+      "epoch": 0.9485714285714286,
+      "grad_norm": 0.13386836647987366,
+      "learning_rate": 1.04e-05,
+      "loss": 0.5015877723693848,
+      "mean_token_accuracy": 0.934491191804409,
+      "num_tokens": 3399680.0,
+      "step": 1660
+    },
+    {
+      "entropy": 0.5123340539634228,
+      "epoch": 0.9542857142857143,
+      "grad_norm": 0.05497074872255325,
+      "learning_rate": 9.257142857142858e-06,
+      "loss": 0.5586126327514649,
+      "mean_token_accuracy": 0.9271526366472245,
+      "num_tokens": 3420160.0,
+      "step": 1670
+    },
+    {
+      "entropy": 0.5163224466145039,
+      "epoch": 0.96,
+      "grad_norm": 0.0693693533539772,
+      "learning_rate": 8.114285714285715e-06,
+      "loss": 0.4154366493225098,
+      "mean_token_accuracy": 0.9436399176716804,
+      "num_tokens": 3440640.0,
+      "step": 1680
+    },
+    {
+      "entropy": 0.49698133692145346,
+      "epoch": 0.9657142857142857,
+      "grad_norm": 0.08076659590005875,
+      "learning_rate": 6.971428571428572e-06,
+      "loss": 0.5139700412750244,
+      "mean_token_accuracy": 0.9308219149708747,
+      "num_tokens": 3461120.0,
+      "step": 1690
+    },
+    {
+      "entropy": 0.49391010627150533,
+      "epoch": 0.9714285714285714,
+      "grad_norm": 0.09655225276947021,
+      "learning_rate": 5.828571428571429e-06,
+      "loss": 0.5655792713165283,
+      "mean_token_accuracy": 0.9232387498021126,
+      "num_tokens": 3481600.0,
+      "step": 1700
+    },
+    {
+      "entropy": 0.49822288006544113,
+      "epoch": 0.9771428571428571,
+      "grad_norm": 0.07442907243967056,
+      "learning_rate": 4.685714285714286e-06,
+      "loss": 0.49275979995727537,
+      "mean_token_accuracy": 0.9336105659604073,
+      "num_tokens": 3502080.0,
+      "step": 1710
+    },
+    {
+      "entropy": 0.508363638818264,
+      "epoch": 0.9828571428571429,
+      "grad_norm": 0.10111571848392487,
+      "learning_rate": 3.542857142857143e-06,
+      "loss": 0.4838888168334961,
+      "mean_token_accuracy": 0.9345401152968407,
+      "num_tokens": 3522560.0,
+      "step": 1720
+    },
+    {
+      "entropy": 0.5120342709124088,
+      "epoch": 0.9885714285714285,
+      "grad_norm": 0.1176028847694397,
+      "learning_rate": 2.4000000000000003e-06,
+      "loss": 0.4899916172027588,
+      "mean_token_accuracy": 0.936203519999981,
+      "num_tokens": 3543040.0,
+      "step": 1730
+    },
+    {
+      "entropy": 0.513513358682394,
+      "epoch": 0.9942857142857143,
+      "grad_norm": 0.14512506127357483,
+      "learning_rate": 1.2571428571428573e-06,
+      "loss": 0.5239700317382813,
+      "mean_token_accuracy": 0.9305772989988327,
+      "num_tokens": 3563520.0,
+      "step": 1740
+    },
+    {
+      "entropy": 0.5018158197402954,
+      "epoch": 1.0,
+      "grad_norm": 0.14404776692390442,
+      "learning_rate": 1.142857142857143e-07,
+      "loss": 0.5281579971313477,
+      "mean_token_accuracy": 0.9306751415133476,
+      "num_tokens": 3584000.0,
+      "step": 1750
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1750,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.7601617813504e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
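
Note on the logged values: the `learning_rate` entries in this part of the log fall by a constant amount every 10 steps and reach almost zero at step 1750, which is what a linear decay schedule would produce. The sketch below checks that reading against three entries copied from the log. The peak rate of `2e-4`, the absence of warmup, and the one-step offset are inferred from the numbers and are assumptions; the training arguments themselves are not part of this file.

```python
# Assumed values: PEAK_LR and zero warmup are inferred from the logged numbers,
# not read from the training arguments.
PEAK_LR = 2e-4
MAX_STEPS = 1750  # "max_steps" in trainer_state.json


def linear_lr(logged_step: int) -> float:
    """Learning rate expected at a logged step under linear decay to zero.

    The one-step offset (logged_step - 1) is an assumption chosen because it
    reproduces the logged values; it is not taken from the trainer's source.
    """
    remaining = MAX_STEPS - (logged_step - 1)
    return PEAK_LR * remaining / MAX_STEPS


# A few (step, learning_rate) pairs copied from the log above.
logged = {
    1480: 3.0971428571428575e-05,
    1600: 1.7257142857142857e-05,
    1750: 1.142857142857143e-07,
}

for step, lr in logged.items():
    assert abs(linear_lr(step) - lr) < 1e-12, (step, linear_lr(step), lr)

# num_tokens grows by the same amount at every step in this run:
# 3,584,000 tokens over 1,750 steps.
tokens_per_step = 3584000 / MAX_STEPS
print(tokens_per_step)
```

The same arithmetic explains the `epoch` field: with `num_train_epochs` set to 1, it is simply `step / max_steps`, for example 1480 / 1750 ≈ 0.8457.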

sql-model/tokenizer.json ADDED

The diff for this file is too large to render.

sql-model/tokenizer_config.json ADDED

@@ -0,0 +1,14 @@

+{
+  "add_prefix_space": null,
+  "backend": "tokenizers",
+  "bos_token": "<｜begin▁of▁sentence｜>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<｜end▁of▁sentence｜>",
+  "is_local": false,
+  "model_max_length": 16384,
+  "pad_token": "<｜end▁of▁sentence｜>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": null,
+  "use_default_system_prompt": false
+}
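
Note on `pad_token`: the config above sets `pad_token` to the same string as `eos_token`. One practical consequence, sketched below, is that padding cannot be told apart from a genuine end-of-sentence token by id alone, so label masking is usually driven by the attention mask instead. The token ids and the `mask_padding` helper are hypothetical and only serve the illustration; `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import json

# Subset of the fields from the tokenizer_config.json hunk above.
config = json.loads("""
{
  "bos_token": "<｜begin▁of▁sentence｜>",
  "eos_token": "<｜end▁of▁sentence｜>",
  "pad_token": "<｜end▁of▁sentence｜>",
  "model_max_length": 16384,
  "unk_token": null
}
""")

# Padding and end-of-sentence share one token in this config.
pad_is_eos = config["pad_token"] == config["eos_token"]


def mask_padding(token_ids, attention_mask, ignore_index=-100):
    """Hypothetical helper: build labels that ignore padded positions.

    Masking by attention mask (not by token id) keeps the real EOS token
    as a training target even though padding uses the same id.
    """
    return [
        tok if keep else ignore_index
        for tok, keep in zip(token_ids, attention_mask)
    ]


# Hypothetical ids: 2 stands in for the shared EOS/pad id.
token_ids = [5, 6, 2, 2, 2]
attention_mask = [1, 1, 1, 0, 0]
labels = mask_padding(token_ids, attention_mask)
print(pad_is_eos, labels)
```

Masking by id instead (replacing every occurrence of the pad id) would also hide the real end-of-sentence token from the loss, which is why the attention mask is the safer signal when the two tokens coincide.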