Instructions for using Komma-LuisMiSanVe/LangToSQL with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Komma-LuisMiSanVe/LangToSQL with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Komma-LuisMiSanVe/LangToSQL",
	filename="LangToSQL-1.5B-F16.gguf",
)

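response = llm.create_chat_completion(
	messages=[
		# Example question; the model card does not define a sample input.
		{"role": "user", "content": "Write a SQL query that lists all customers from Spain."}
	]
)
print(response["choices"][0]["message"]["content"])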

Local Apps

llama.cpp

How to use Komma-LuisMiSanVe/LangToSQL with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
./llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Komma-LuisMiSanVe/LangToSQL:F16

Use Docker

docker model run hf.co/Komma-LuisMiSanVe/LangToSQL:F16
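A minimal sketch of calling the `llama-server` started above from Python, using only the standard library. It assumes the server's default address `http://localhost:8080` (the same one the Pi and Hermes sections point at) and its OpenAI-style `/v1/chat/completions` route. Putting the schema in a system message is a hypothetical prompt layout, since the model card does not say how schemas should be supplied, and the helper names are illustrative.

```python
import json
import urllib.request
from typing import Optional

# Assumption: llama-server is listening on its default address and port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(question: str, schema: Optional[str] = None, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for a text-to-SQL question."""
    messages = []
    if schema:
        # Hypothetical prompt layout: the model card does not document one.
        messages.append({"role": "system", "content": f"Database schema:\n{schema}"})
    messages.append({"role": "user", "content": question})
    return {"messages": messages, "max_tokens": max_tokens}


def ask(question: str, schema: Optional[str] = None) -> str:
    """POST the request to /v1/chat/completions and return the reply text."""
    payload = json.dumps(build_chat_request(question, schema)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (needs a running server):
# print(ask("List every customer name.", "CREATE TABLE customers (id INTEGER, name TEXT);"))
```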

Ollama
How to use Komma-LuisMiSanVe/LangToSQL with Ollama:
```
ollama run hf.co/Komma-LuisMiSanVe/LangToSQL:F16
```

Unsloth Studio

How to use Komma-LuisMiSanVe/LangToSQL with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Komma-LuisMiSanVe/LangToSQL to start chatting

Pi

How to use Komma-LuisMiSanVe/LangToSQL with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Komma-LuisMiSanVe/LangToSQL:F16"
        }
      ]
    }
  }
}
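Pasting the snippet over an existing `~/.pi/agent/models.json` would drop any other providers already configured there. A small sketch that merges the `llama-cpp` provider into the existing file instead; it assumes the file is plain JSON with a top-level `providers` object, exactly as in the snippet above, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Provider entry copied from the models.json snippet.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Komma-LuisMiSanVe/LangToSQL:F16"}],
}


def merge_provider(config: dict, name: str = "llama-cpp", provider: dict = PROVIDER) -> dict:
    """Return a copy of config with the provider added, keeping other providers."""
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    merged.setdefault("providers", {})[name] = json.loads(json.dumps(provider))
    return merged


def update_models_file(path: Path = Path.home() / ".pi" / "agent" / "models.json") -> None:
    """Read the existing file (if any), merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")
```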

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Komma-LuisMiSanVe/LangToSQL with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Komma-LuisMiSanVe/LangToSQL:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Komma-LuisMiSanVe/LangToSQL:F16

Run Hermes

hermes

Docker Model Runner
How to use Komma-LuisMiSanVe/LangToSQL with Docker Model Runner:
```
docker model run hf.co/Komma-LuisMiSanVe/LangToSQL:F16
```

Lemonade

How to use Komma-LuisMiSanVe/LangToSQL with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Komma-LuisMiSanVe/LangToSQL:F16

Run and chat with the model

lemonade run user.LangToSQL-F16

List all available models

lemonade list

Komma-LuisMiSanVe committed 22 days ago

Commit 1e6c9f0 (verified) · 1 parent: c1c7d48

Upload model files

Upload safetensors and GGUF models

Files changed (25)

.gitattributes +4 -0
LangToSQL-1.5B-F16.gguf +3 -0
sql-model-merged/chat_template.jinja +54 -0
sql-model-merged/config.json +61 -0
sql-model-merged/generation_config.json +14 -0
sql-model-merged/model.safetensors +3 -0
sql-model-merged/tokenizer.json +3 -0
sql-model-merged/tokenizer_config.json +29 -0
sql-model/README.md +62 -0
sql-model/adapter_config.json +41 -0
sql-model/adapter_model.safetensors +3 -0
sql-model/chat_template.jinja +54 -0
sql-model/checkpoint-1750/README.md +209 -0
sql-model/checkpoint-1750/adapter_config.json +41 -0
sql-model/checkpoint-1750/adapter_model.safetensors +3 -0
sql-model/checkpoint-1750/chat_template.jinja +54 -0
sql-model/checkpoint-1750/optimizer.pt +3 -0
sql-model/checkpoint-1750/rng_state.pth +3 -0
sql-model/checkpoint-1750/scheduler.pt +3 -0
sql-model/checkpoint-1750/tokenizer.json +3 -0
sql-model/checkpoint-1750/tokenizer_config.json +29 -0
sql-model/checkpoint-1750/trainer_state.json +1784 -0
sql-model/checkpoint-1750/training_args.bin +3 -0
sql-model/tokenizer.json +3 -0
sql-model/tokenizer_config.json +29 -0

.gitattributes CHANGED

@@ -35,3 +35,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 train.json filter=lfs diff=lfs merge=lfs -text
 LangToSQL-1.3B-F16.gguf filter=lfs diff=lfs merge=lfs -text
+LangToSQL-1.5B-F16.gguf filter=lfs diff=lfs merge=lfs -text
+sql-model-merged/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+sql-model/checkpoint-1750/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+sql-model/tokenizer.json filter=lfs diff=lfs merge=lfs -text

LangToSQL-1.5B-F16.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:29e6ea365070b1fd1afe089af5d212e4e894181cb076b8e1a5e310ec3e5aec05
+size 3093668832
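Files tracked by Git LFS appear in this diff as three-line pointer files (spec version, SHA-256 object id, size in bytes) rather than as their contents. A sketch for parsing such a pointer and checking that a downloaded copy of the file matches it; the local path in the usage comment is hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and SHA-256 digest against its LFS pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in chunks so multi-GB files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:29e6ea365070b1fd1afe089af5d212e4e894181cb076b8e1a5e310ec3e5aec05
size 3093668832
"""

# Usage (hypothetical local path):
# verify_download(Path("LangToSQL-1.5B-F16.gguf"), parse_lfs_pointer(POINTER))
```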

sql-model-merged/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
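For plain conversations without tools, the template above reduces to ChatML: a system turn (the default Qwen system prompt when the first message is not a system message), each message wrapped in `<|im_start|>role` ... `<|im_end|>`, and an open assistant turn when `add_generation_prompt` is set. The sketch below reproduces only that no-tools branch, which can help when inspecting raw prompts; it is not a substitute for `tokenizer.apply_chat_template`, and it rejects tool messages instead of formatting them.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Mirror the template's no-tools branch for plain system/user/assistant turns."""
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        # The template injects the default Qwen system prompt in this case.
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for m in rest:
        # Tool calls and tool responses are deliberately not handled in this sketch.
        if m["role"] not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {m['role']}")
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```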

sql-model-merged/config.json ADDED

	@@ -0,0 +1,61 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dtype": "float32",
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 1536,
+  "initializer_range": 0.02,
+  "intermediate_size": 8960,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "max_window_layers": 28,
+  "model_type": "qwen2",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 2,
+  "pad_token_id": null,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000.0,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.4.0",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
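As a sanity check on this config, a rough parameter count assuming the standard Qwen2 layer layout: biases on the q/k/v projections only, a three-matrix SwiGLU MLP, RMSNorm weights, and input embeddings shared with the output head as `tie_word_embeddings: true` indicates. At four bytes per weight (`dtype: float32`) the total lands about 38 KB below the size of `sql-model-merged/model.safetensors`, a gap presumably taken up by the safetensors header.

```python
# Values copied from sql-model-merged/config.json
CONFIG = {
    "hidden_size": 1536,
    "intermediate_size": 8960,
    "num_hidden_layers": 28,
    "num_attention_heads": 12,
    "num_key_value_heads": 2,
    "vocab_size": 151936,
    "tie_word_embeddings": True,
}


def qwen2_param_count(cfg: dict) -> int:
    """Estimate the parameter count, assuming the standard Qwen2 layer layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv = cfg["num_key_value_heads"] * head_dim  # grouped-query attention width
    # Attention: q/k/v projections carry a bias, o_proj does not.
    attn = (h * h + h) + 2 * (h * kv + kv) + h * h
    # MLP: gate_proj and up_proj (h -> i) plus down_proj (i -> h), no biases.
    mlp = 3 * h * cfg["intermediate_size"]
    norms = 2 * h  # input and post-attention RMSNorm weights
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h
    return cfg["num_hidden_layers"] * (attn + mlp + norms) + embed + lm_head + final_norm


n = qwen2_param_count(CONFIG)
print(f"{n:,} parameters")  # 1,543,714,304 parameters
print(f"float32 weights: {4 * n:,} bytes")  # 6,174,857,216 bytes
```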

sql-model-merged/generation_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.1,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "5.4.0"
+}
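GGUF runtimes such as llama.cpp apply their own sampling defaults and may not read `generation_config.json`, so reproducing these settings may require passing them explicitly. A sketch that maps the Hugging Face names onto the keyword arguments of llama-cpp-python's `create_chat_completion` (`repetition_penalty` becomes `repeat_penalty`). The helper is hypothetical, and falling back to temperature 0 (greedy decoding in llama-cpp-python) when `do_sample` is false is a design choice, not something this repository specifies.

```python
# Sampling values copied from sql-model-merged/generation_config.json
GENERATION_CONFIG = {
    "do_sample": True,
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
}

# Hugging Face generation_config key -> llama-cpp-python keyword argument
NAME_MAP = {
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "repetition_penalty": "repeat_penalty",
}


def to_llama_cpp_kwargs(gen_cfg: dict) -> dict:
    """Translate the sampling settings that are present into llama-cpp-python names."""
    kwargs = {NAME_MAP[k]: v for k, v in gen_cfg.items() if k in NAME_MAP}
    if not gen_cfg.get("do_sample", False):  # transformers defaults do_sample to False
        kwargs["temperature"] = 0.0  # temperature 0 selects greedy decoding
    return kwargs


# Usage with the Llama object from the llama-cpp-python example:
# llm.create_chat_completion(messages=[...], **to_llama_cpp_kwargs(GENERATION_CONFIG))
```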

sql-model-merged/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ed8302c1aacc01908dcd07d76566b907e8caafc15972c5c404b75c49e8277a14
+size 6174895536

sql-model-merged/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+size 11421892

sql-model-merged/tokenizer_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "model_max_length": 32768,
+  "pad_token": "<|im_end|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

sql-model/README.md ADDED

	@@ -0,0 +1,62 @@

+---
+base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
+library_name: peft
+model_name: sql-model
+tags:
+- base_model:adapter:Qwen/Qwen2.5-Coder-1.5B-Instruct
+- lora
+- sft
+- transformers
+- trl
+licence: license
+pipeline_tag: text-generation
+---
+# Model Card for sql-model
+This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="None", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- PEFT 0.18.1
+- TRL: 1.0.0
+- Transformers: 5.4.0
+- Pytorch: 2.11.0
+- Datasets: 4.8.4
+- Tokenizers: 0.22.2
+## Citations
+Cite TRL as:
+```bibtex
+@software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
+}
+```
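The Quick start in this README passes `model="None"`, a placeholder left by the auto-generated card, so it will not load anything as written. Since this commit places the merged weights, config, tokenizer files, and chat template under `sql-model-merged/`, one option is to load from that subfolder with transformers. A sketch under that assumption; the system prompt wording and helper names are hypothetical, because the card does not document a prompt format. The merged weights are stored in float32, so expect roughly a 6 GB download.

```python
def build_messages(question: str, schema: str) -> list:
    """Hypothetical prompt layout: the model card does not document one."""
    return [
        {"role": "system", "content": f"Translate the question into SQL for this schema:\n{schema}"},
        {"role": "user", "content": question},
    ]


def generate_sql(question: str, schema: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo, subfolder = "Komma-LuisMiSanVe/LangToSQL", "sql-model-merged"
    tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo, subfolder=subfolder)
    # Render the prompt with the repository's chat template, then tokenize it.
    text = tokenizer.apply_chat_template(
        build_messages(question, schema), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (downloads the merged model):
# print(generate_sql("How many users signed up in 2024?", "CREATE TABLE users (id INT, created DATE);"))
```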

sql-model/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
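This adapter config describes a rank-16 LoRA (`r: 16`, `lora_alpha: 32`, `use_rslora: false`) on `q_proj` and `v_proj` only, so each adapted weight behaves as W + (alpha / r)·B·A with a scaling factor of 32 / 16 = 2. A sketch counting the trainable A and B parameters from the shapes in `sql-model-merged/config.json`; at four bytes each the total sits about 15 KB below the size of `sql-model/adapter_model.safetensors`, which is consistent with (though not proof of) float32 storage. The checkpoint-1750 adapter in this commit has the same SHA-256, so it is the same file.

```python
LORA = {"r": 16, "lora_alpha": 32, "target_modules": ["v_proj", "q_proj"]}
HIDDEN, HEADS, KV_HEADS, LAYERS = 1536, 12, 2, 28  # from sql-model-merged/config.json


def proj_shape(name: str) -> tuple:
    """(in_features, out_features) of a Qwen2 attention projection."""
    head_dim = HIDDEN // HEADS
    out = {
        "q_proj": HIDDEN,
        "k_proj": KV_HEADS * head_dim,
        "v_proj": KV_HEADS * head_dim,
        "o_proj": HIDDEN,
    }[name]
    return HIDDEN, out


def lora_trainable_params(cfg: dict) -> int:
    """Each adapted linear layer gets A (r x in) and B (out x r)."""
    per_layer = sum(cfg["r"] * (i + o) for i, o in map(proj_shape, cfg["target_modules"]))
    return LAYERS * per_layer


n = lora_trainable_params(LORA)
scaling = LORA["lora_alpha"] / LORA["r"]  # standard LoRA scaling alpha / r
print(n, scaling)  # 2179072 2.0
```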

sql-model/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:233052429edaba0de411ff7a503eec8183193d955664da50103ee110cb835952
+size 8731128

sql-model/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}

sql-model/checkpoint-1750/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen2.5-Coder-1.5B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

sql-model/checkpoint-1750/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

sql-model/checkpoint-1750/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:233052429edaba0de411ff7a503eec8183193d955664da50103ee110cb835952
+size 8731128

sql-model/checkpoint-1750/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}

sql-model/checkpoint-1750/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f63982639d2cdd3d73b243be88644fe815c0b4a48555410eb414422116b7ade
+size 17524171

sql-model/checkpoint-1750/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c593dbe7b4c13895455ed97063d53435660103dbe2e0cf605b493badb4ad85cd
+size 14455

sql-model/checkpoint-1750/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:af667abda37bcebe1e332a00c2289b57845721907fb4438e53936d778ed7af82
+size 1465

sql-model/checkpoint-1750/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+size 11421892

sql-model/checkpoint-1750/tokenizer_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
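The `trainer_state.json` added next logs one entry every 10 steps of a single 1750-step epoch. Its `learning_rate` values fit a linear decay from 2e-4 to zero with no warmup, and its `epoch` field is simply step / 1750. A sketch that checks this against a few logged entries; the peak rate and the one-step offset are inferred from the log, not read from `training_args.bin`.

```python
PEAK_LR = 2e-4      # inferred from the log, not read from training_args.bin
TOTAL_STEPS = 1750  # global_step at the end of the single epoch


def linear_lr(step: int) -> float:
    """Logged learning rate at `step`, assuming linear decay from PEAK_LR to 0 and no warmup."""
    # Empirically, the value logged at step s matches s - 1 completed scheduler updates.
    return PEAK_LR * (TOTAL_STEPS - (step - 1)) / TOTAL_STEPS


# (step, logged learning_rate, logged epoch) copied from log_history entries
LOGGED = [
    (10, 0.00019897142857142858, 0.005714285714285714),
    (50, 0.0001944, 0.02857142857142857),
    (160, 0.00018182857142857143, 0.09142857142857143),
]
for step, lr, epoch in LOGGED:
    print(step, lr, linear_lr(step), epoch, step / TOTAL_STEPS)
```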

sql-model/checkpoint-1750/trainer_state.json ADDED

	@@ -0,0 +1,1784 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 1750,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.755023404955864,
+      "epoch": 0.005714285714285714,
+      "grad_norm": 0.4571238160133362,
+      "learning_rate": 0.00019897142857142858,
+      "loss": 3.2381954193115234,
+      "mean_token_accuracy": 0.556605902314186,
+      "num_tokens": 3387.0,
+      "step": 10
+    },
+    {
+      "entropy": 1.872155451774597,
+      "epoch": 0.011428571428571429,
+      "grad_norm": 0.34199467301368713,
+      "learning_rate": 0.00019782857142857142,
+      "loss": 2.191742706298828,
+      "mean_token_accuracy": 0.6503265604376793,
+      "num_tokens": 6716.0,
+      "step": 20
+    },
+    {
+      "entropy": 1.484403568506241,
+      "epoch": 0.017142857142857144,
+      "grad_norm": 0.3380744755268097,
+      "learning_rate": 0.0001966857142857143,
+      "loss": 1.4328497886657714,
+      "mean_token_accuracy": 0.746163409948349,
+      "num_tokens": 9752.0,
+      "step": 30
+    },
+    {
+      "entropy": 1.0218224853277207,
+      "epoch": 0.022857142857142857,
+      "grad_norm": 0.2288217842578888,
+      "learning_rate": 0.00019554285714285717,
+      "loss": 1.1417624473571777,
+      "mean_token_accuracy": 0.8017501533031464,
+      "num_tokens": 13031.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.9253160357475281,
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.3444674015045166,
+      "learning_rate": 0.0001944,
+      "loss": 1.0559259414672852,
+      "mean_token_accuracy": 0.8092378318309784,
+      "num_tokens": 16325.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.8747231267392636,
+      "epoch": 0.03428571428571429,
+      "grad_norm": 0.39356729388237,
+      "learning_rate": 0.00019325714285714287,
+      "loss": 0.8799959182739258,
+      "mean_token_accuracy": 0.8227440923452377,
+      "num_tokens": 19968.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.8573793575167656,
+      "epoch": 0.04,
+      "grad_norm": 0.20990096032619476,
+      "learning_rate": 0.0001921142857142857,
+      "loss": 0.844728660583496,
+      "mean_token_accuracy": 0.8225094452500343,
+      "num_tokens": 23194.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.8150502145290375,
+      "epoch": 0.045714285714285714,
+      "grad_norm": 0.2119598537683487,
+      "learning_rate": 0.00019097142857142857,
+      "loss": 0.7866294860839844,
+      "mean_token_accuracy": 0.8332153782248497,
+      "num_tokens": 26444.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.7989433646202088,
+      "epoch": 0.05142857142857143,
+      "grad_norm": 5.590895175933838,
+      "learning_rate": 0.00018982857142857144,
+      "loss": 0.7787356853485108,
+      "mean_token_accuracy": 0.8374833926558495,
+      "num_tokens": 29847.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.7925440274178982,
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.2303859442472458,
+      "learning_rate": 0.00018868571428571428,
+      "loss": 0.7289025783538818,
+      "mean_token_accuracy": 0.8440518975257874,
+      "num_tokens": 33416.0,
+      "step": 100
+    },
+    {
+      "entropy": 0.7396377846598625,
+      "epoch": 0.06285714285714286,
+      "grad_norm": 0.1661251187324524,
+      "learning_rate": 0.00018754285714285714,
+      "loss": 0.7113327980041504,
+      "mean_token_accuracy": 0.844328448176384,
+      "num_tokens": 36508.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.7262422397732735,
+      "epoch": 0.06857142857142857,
+      "grad_norm": 0.16159392893314362,
+      "learning_rate": 0.00018640000000000003,
+      "loss": 0.7208394050598145,
+      "mean_token_accuracy": 0.8527172908186913,
+      "num_tokens": 39969.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.7012980431318283,
+      "epoch": 0.07428571428571429,
+      "grad_norm": 0.18032804131507874,
+      "learning_rate": 0.00018525714285714287,
+      "loss": 0.7320738792419433,
+      "mean_token_accuracy": 0.8414866328239441,
+      "num_tokens": 43554.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.7090784892439842,
+      "epoch": 0.08,
+      "grad_norm": 0.1905902475118637,
+      "learning_rate": 0.00018411428571428573,
+      "loss": 0.7095338821411132,
+      "mean_token_accuracy": 0.8467919006943703,
+      "num_tokens": 47352.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.7508301064372063,
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.2042306363582611,
+      "learning_rate": 0.00018297142857142857,
+      "loss": 0.714650297164917,
+      "mean_token_accuracy": 0.8489894285798073,
+      "num_tokens": 50592.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.7499568641185761,
+      "epoch": 0.09142857142857143,
+      "grad_norm": 0.14059635996818542,
+      "learning_rate": 0.00018182857142857143,
+      "loss": 0.7066077709197998,
+      "mean_token_accuracy": 0.8459182664752006,
+      "num_tokens": 53905.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.751581659913063,
+      "epoch": 0.09714285714285714,
+      "grad_norm": 0.1389404684305191,
+      "learning_rate": 0.0001806857142857143,
+      "loss": 0.7304620265960693,
+      "mean_token_accuracy": 0.8408536836504936,
+      "num_tokens": 57406.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.7080035664141178,
+      "epoch": 0.10285714285714286,
+      "grad_norm": 0.20151932537555695,
+      "learning_rate": 0.00017954285714285714,
+      "loss": 0.6872885704040528,
+      "mean_token_accuracy": 0.8518571972846984,
+      "num_tokens": 60514.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.7358761139214038,
+      "epoch": 0.10857142857142857,
+      "grad_norm": 0.15893664956092834,
+      "learning_rate": 0.0001784,
+      "loss": 0.7053659915924072,
+      "mean_token_accuracy": 0.8461879312992096,
+      "num_tokens": 63978.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.7305082514882087,
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.17859824001789093,
+      "learning_rate": 0.00017725714285714286,
+      "loss": 0.7007936954498291,
+      "mean_token_accuracy": 0.8456599608063697,
+      "num_tokens": 67445.0,
+      "step": 200
+    },
+    {
+      "entropy": 0.6865443497896194,
+      "epoch": 0.12,
+      "grad_norm": 0.17160974442958832,
+      "learning_rate": 0.00017611428571428573,
+      "loss": 0.684369421005249,
+      "mean_token_accuracy": 0.8545770645141602,
+      "num_tokens": 71004.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.743726947158575,
+      "epoch": 0.12571428571428572,
+      "grad_norm": 0.19258913397789001,
+      "learning_rate": 0.0001749714285714286,
+      "loss": 0.6941751956939697,
+      "mean_token_accuracy": 0.8491073668003082,
+      "num_tokens": 74467.0,
+      "step": 220
+    },
+    {
+      "entropy": 0.7293999463319778,
+      "epoch": 0.13142857142857142,
+      "grad_norm": 0.21078741550445557,
+      "learning_rate": 0.00017382857142857143,
+      "loss": 0.7608419418334961,
+      "mean_token_accuracy": 0.8355702117085457,
+      "num_tokens": 77733.0,
+      "step": 230
+    },
+    {
+      "entropy": 0.6695528343319893,
+      "epoch": 0.13714285714285715,
+      "grad_norm": 0.2082340568304062,
+      "learning_rate": 0.0001726857142857143,
+      "loss": 0.6619296073913574,
+      "mean_token_accuracy": 0.8572208806872368,
+      "num_tokens": 81271.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.7246910102665425,
+      "epoch": 0.14285714285714285,
+      "grad_norm": 0.20099493861198425,
+      "learning_rate": 0.00017154285714285716,
+      "loss": 0.7173333168029785,
+      "mean_token_accuracy": 0.8494016170501709,
+      "num_tokens": 84574.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.6853026911616326,
+      "epoch": 0.14857142857142858,
+      "grad_norm": 0.17009125649929047,
+      "learning_rate": 0.0001704,
+      "loss": 0.6357984066009521,
+      "mean_token_accuracy": 0.8530513703823089,
+      "num_tokens": 88152.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.6933697812259197,
+      "epoch": 0.15428571428571428,
+      "grad_norm": 0.24327197670936584,
+      "learning_rate": 0.00016925714285714286,
+      "loss": 0.6879170417785645,
+      "mean_token_accuracy": 0.8545807048678398,
+      "num_tokens": 91522.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.685890257358551,
+      "epoch": 0.16,
+      "grad_norm": 0.1667262464761734,
+      "learning_rate": 0.00016811428571428572,
+      "loss": 0.6770327091217041,
+      "mean_token_accuracy": 0.8566557437181472,
+      "num_tokens": 94937.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.6751709222793579,
+      "epoch": 0.1657142857142857,
+      "grad_norm": 0.18273314833641052,
+      "learning_rate": 0.0001669714285714286,
+      "loss": 0.6363730907440186,
+      "mean_token_accuracy": 0.86243856549263,
+      "num_tokens": 97983.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.6693296499550343,
+      "epoch": 0.17142857142857143,
+      "grad_norm": 0.18416839838027954,
+      "learning_rate": 0.00016582857142857145,
+      "loss": 0.6525997161865235,
+      "mean_token_accuracy": 0.8522453367710113,
+      "num_tokens": 101606.0,
+      "step": 300
+    },
+    {
+      "entropy": 0.7025774903595448,
+      "epoch": 0.17714285714285713,
+      "grad_norm": 0.2354522943496704,
+      "learning_rate": 0.0001646857142857143,
+      "loss": 0.7257501125335694,
+      "mean_token_accuracy": 0.8442893981933594,
+      "num_tokens": 104894.0,
+      "step": 310
+    },
+    {
+      "entropy": 0.7182297334074974,
+      "epoch": 0.18285714285714286,
+      "grad_norm": 0.525611162185669,
+      "learning_rate": 0.00016354285714285715,
+      "loss": 0.7307909011840821,
+      "mean_token_accuracy": 0.8398969128727913,
+      "num_tokens": 108488.0,
+      "step": 320
+    },
+    {
+      "entropy": 0.7585869312286377,
+      "epoch": 0.18857142857142858,
+      "grad_norm": 0.19886285066604614,
+      "learning_rate": 0.00016240000000000002,
+      "loss": 0.7316296100616455,
+      "mean_token_accuracy": 0.8501298695802688,
+      "num_tokens": 111621.0,
+      "step": 330
+    },
+    {
+      "entropy": 0.7191452518105507,
+      "epoch": 0.19428571428571428,
+      "grad_norm": 0.18904122710227966,
+      "learning_rate": 0.00016125714285714285,
+      "loss": 0.7095869541168213,
+      "mean_token_accuracy": 0.8502562329173088,
+      "num_tokens": 114846.0,
+      "step": 340
+    },
+    {
+      "entropy": 0.76069777905941,
+      "epoch": 0.2,
+      "grad_norm": 0.1604829579591751,
+      "learning_rate": 0.00016011428571428572,
+      "loss": 0.7304455280303955,
+      "mean_token_accuracy": 0.8341712266206741,
+      "num_tokens": 118373.0,
+      "step": 350
+    },
+    {
+      "entropy": 0.7046815037727356,
+      "epoch": 0.2057142857142857,
+      "grad_norm": 0.19462066888809204,
+      "learning_rate": 0.00015897142857142858,
+      "loss": 0.7354346752166748,
+      "mean_token_accuracy": 0.8472130700945855,
+      "num_tokens": 122029.0,
+      "step": 360
+    },
+    {
+      "entropy": 0.6850700139999389,
+      "epoch": 0.21142857142857144,
+      "grad_norm": 0.22383001446723938,
+      "learning_rate": 0.00015782857142857145,
+      "loss": 0.6635906219482421,
+      "mean_token_accuracy": 0.853803887963295,
+      "num_tokens": 125215.0,
+      "step": 370
+    },
+    {
+      "entropy": 0.6743280217051506,
+      "epoch": 0.21714285714285714,
+      "grad_norm": 0.22975340485572815,
+      "learning_rate": 0.0001566857142857143,
+      "loss": 0.6944728851318359,
+      "mean_token_accuracy": 0.8531624928116799,
+      "num_tokens": 128377.0,
+      "step": 380
+    },
+    {
+      "entropy": 0.6956075571477414,
+      "epoch": 0.22285714285714286,
+      "grad_norm": 0.16905982792377472,
+      "learning_rate": 0.00015554285714285715,
+      "loss": 0.6574249267578125,
+      "mean_token_accuracy": 0.8537535384297371,
+      "num_tokens": 131561.0,
+      "step": 390
+    },
+    {
+      "entropy": 0.6535641267895699,
+      "epoch": 0.22857142857142856,
+      "grad_norm": 0.2020592838525772,
+      "learning_rate": 0.0001544,
+      "loss": 0.6684637546539307,
+      "mean_token_accuracy": 0.8588701456785202,
+      "num_tokens": 134886.0,
+      "step": 400
+    },
+    {
+      "entropy": 0.7120788738131523,
+      "epoch": 0.2342857142857143,
+      "grad_norm": 0.17928935587406158,
+      "learning_rate": 0.00015325714285714285,
+      "loss": 0.6707518100738525,
+      "mean_token_accuracy": 0.855300298333168,
+      "num_tokens": 138226.0,
+      "step": 410
+    },
+    {
+      "entropy": 0.7220979325473309,
+      "epoch": 0.24,
+      "grad_norm": 0.1849253624677658,
+      "learning_rate": 0.00015211428571428571,
+      "loss": 0.6626916408538819,
+      "mean_token_accuracy": 0.8522223040461541,
+      "num_tokens": 141520.0,
+      "step": 420
+    },
+    {
+      "entropy": 0.6991067595779896,
+      "epoch": 0.24571428571428572,
+      "grad_norm": 0.20180848240852356,
+      "learning_rate": 0.00015097142857142858,
+      "loss": 0.6791836261749268,
+      "mean_token_accuracy": 0.8506909489631653,
+      "num_tokens": 144798.0,
+      "step": 430
+    },
+    {
+      "entropy": 0.6208832249045372,
+      "epoch": 0.25142857142857145,
+      "grad_norm": 0.16604188084602356,
+      "learning_rate": 0.00014982857142857144,
+      "loss": 0.6340303897857666,
+      "mean_token_accuracy": 0.8603393912315369,
+      "num_tokens": 148478.0,
+      "step": 440
+    },
+    {
+      "entropy": 0.6770513989031315,
+      "epoch": 0.2571428571428571,
+      "grad_norm": 0.20192453265190125,
+      "learning_rate": 0.0001486857142857143,
+      "loss": 0.6619882106781005,
+      "mean_token_accuracy": 0.8508818298578262,
+      "num_tokens": 152091.0,
+      "step": 450
+    },
+    {
+      "entropy": 0.7052303500473499,
+      "epoch": 0.26285714285714284,
+      "grad_norm": 0.23521800339221954,
+      "learning_rate": 0.00014754285714285717,
+      "loss": 0.7204936027526856,
+      "mean_token_accuracy": 0.8412432789802551,
+      "num_tokens": 155676.0,
+      "step": 460
+    },
+    {
+      "entropy": 0.7039557799696923,
+      "epoch": 0.26857142857142857,
+      "grad_norm": 0.18875496089458466,
+      "learning_rate": 0.0001464,
+      "loss": 0.6947028636932373,
+      "mean_token_accuracy": 0.8545891866087914,
+      "num_tokens": 159143.0,
+      "step": 470
+    },
+    {
+      "entropy": 0.7390237525105476,
+      "epoch": 0.2742857142857143,
+      "grad_norm": 0.20184092223644257,
+      "learning_rate": 0.00014525714285714287,
+      "loss": 0.7198336124420166,
+      "mean_token_accuracy": 0.8431004419922828,
+      "num_tokens": 162539.0,
+      "step": 480
+    },
+    {
+      "entropy": 0.7064043849706649,
+      "epoch": 0.28,
+      "grad_norm": 0.15704013407230377,
+      "learning_rate": 0.0001441142857142857,
+      "loss": 0.6998549938201905,
+      "mean_token_accuracy": 0.8420170620083809,
+      "num_tokens": 166150.0,
+      "step": 490
+    },
+    {
+      "entropy": 0.6628431506454945,
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.1651039719581604,
+      "learning_rate": 0.00014297142857142857,
+      "loss": 0.6178061485290527,
+      "mean_token_accuracy": 0.8624962165951728,
+      "num_tokens": 169383.0,
+      "step": 500
+    },
+    {
+      "entropy": 0.6844660565257072,
+      "epoch": 0.2914285714285714,
+      "grad_norm": 0.23858557641506195,
+      "learning_rate": 0.00014182857142857144,
+      "loss": 0.6924018859863281,
+      "mean_token_accuracy": 0.8506158754229546,
+      "num_tokens": 172663.0,
+      "step": 510
+    },
+    {
+      "entropy": 0.6808311395347119,
+      "epoch": 0.29714285714285715,
+      "grad_norm": 0.23313727974891663,
+      "learning_rate": 0.00014068571428571427,
+      "loss": 0.6816984653472901,
+      "mean_token_accuracy": 0.8454315170645714,
+      "num_tokens": 175956.0,
+      "step": 520
+    },
+    {
+      "entropy": 0.683815760165453,
+      "epoch": 0.3028571428571429,
+      "grad_norm": 0.20086617767810822,
+      "learning_rate": 0.00013954285714285717,
+      "loss": 0.6635974884033203,
+      "mean_token_accuracy": 0.8560041651129723,
+      "num_tokens": 179187.0,
+      "step": 530
+    },
+    {
+      "entropy": 0.6870512694120408,
+      "epoch": 0.30857142857142855,
+      "grad_norm": 0.24712982773780823,
+      "learning_rate": 0.0001384,
+      "loss": 0.702989149093628,
+      "mean_token_accuracy": 0.8512916043400764,
+      "num_tokens": 182700.0,
+      "step": 540
+    },
+    {
+      "entropy": 0.6640612557530403,
+      "epoch": 0.3142857142857143,
+      "grad_norm": 0.19018641114234924,
+      "learning_rate": 0.00013725714285714287,
+      "loss": 0.6487014293670654,
+      "mean_token_accuracy": 0.8530389070510864,
+      "num_tokens": 186178.0,
+      "step": 550
+    },
+    {
+      "entropy": 0.6581176854670048,
+      "epoch": 0.32,
+      "grad_norm": 0.17294897139072418,
+      "learning_rate": 0.00013611428571428573,
+      "loss": 0.6333216190338135,
+      "mean_token_accuracy": 0.8581204935908318,
+      "num_tokens": 189593.0,
+      "step": 560
+    },
+    {
+      "entropy": 0.6372215747833252,
+      "epoch": 0.32571428571428573,
+      "grad_norm": 0.20004868507385254,
+      "learning_rate": 0.00013497142857142857,
+      "loss": 0.634274959564209,
+      "mean_token_accuracy": 0.8586296364665031,
+      "num_tokens": 192755.0,
+      "step": 570
+    },
+    {
+      "entropy": 0.6754648350179195,
+      "epoch": 0.3314285714285714,
+      "grad_norm": 0.23564350605010986,
+      "learning_rate": 0.00013382857142857143,
+      "loss": 0.7094334602355957,
+      "mean_token_accuracy": 0.8480966657400131,
+      "num_tokens": 196315.0,
+      "step": 580
+    },
+    {
+      "entropy": 0.66893340498209,
+      "epoch": 0.33714285714285713,
+      "grad_norm": 0.21135050058364868,
+      "learning_rate": 0.0001326857142857143,
+      "loss": 0.6854133129119873,
+      "mean_token_accuracy": 0.8476407691836357,
+      "num_tokens": 199842.0,
+      "step": 590
+    },
+    {
+      "entropy": 0.734431654214859,
+      "epoch": 0.34285714285714286,
+      "grad_norm": 0.15609301626682281,
+      "learning_rate": 0.00013154285714285713,
+      "loss": 0.6732261657714844,
+      "mean_token_accuracy": 0.8533253937959671,
+      "num_tokens": 203217.0,
+      "step": 600
+    },
+    {
+      "entropy": 0.6809550739824772,
+      "epoch": 0.3485714285714286,
+      "grad_norm": 0.1637752801179886,
+      "learning_rate": 0.0001304,
+      "loss": 0.628327465057373,
+      "mean_token_accuracy": 0.8617108896374702,
+      "num_tokens": 206274.0,
+      "step": 610
+    },
+    {
+      "entropy": 0.6702453441917896,
+      "epoch": 0.35428571428571426,
+      "grad_norm": 0.2080763578414917,
+      "learning_rate": 0.00012925714285714286,
+      "loss": 0.6497041702270507,
+      "mean_token_accuracy": 0.8528500080108643,
+      "num_tokens": 209447.0,
+      "step": 620
+    },
+    {
+      "entropy": 0.6892564371228218,
+      "epoch": 0.36,
+      "grad_norm": 0.24688860774040222,
+      "learning_rate": 0.00012811428571428573,
+      "loss": 0.692191743850708,
+      "mean_token_accuracy": 0.8509424239397049,
+      "num_tokens": 212753.0,
+      "step": 630
+    },
+    {
+      "entropy": 0.6778513200581073,
+      "epoch": 0.3657142857142857,
+      "grad_norm": 0.23269857466220856,
+      "learning_rate": 0.0001269714285714286,
+      "loss": 0.6862228393554688,
+      "mean_token_accuracy": 0.8516128286719322,
+      "num_tokens": 216340.0,
+      "step": 640
+    },
+    {
+      "entropy": 0.6519438281655312,
+      "epoch": 0.37142857142857144,
+      "grad_norm": 0.21059896051883698,
+      "learning_rate": 0.00012582857142857143,
+      "loss": 0.6068024635314941,
+      "mean_token_accuracy": 0.8627916231751442,
+      "num_tokens": 219607.0,
+      "step": 650
+    },
+    {
+      "entropy": 0.6636812917888164,
+      "epoch": 0.37714285714285717,
+      "grad_norm": 0.19394846260547638,
+      "learning_rate": 0.0001246857142857143,
+      "loss": 0.6602859020233154,
+      "mean_token_accuracy": 0.8570883437991142,
+      "num_tokens": 222993.0,
+      "step": 660
+    },
+    {
+      "entropy": 0.6973143525421619,
+      "epoch": 0.38285714285714284,
+      "grad_norm": 0.19678597152233124,
+      "learning_rate": 0.00012354285714285713,
+      "loss": 0.7101614475250244,
+      "mean_token_accuracy": 0.8465988427400589,
+      "num_tokens": 226354.0,
+      "step": 670
+    },
+    {
+      "entropy": 0.6470928482711316,
+      "epoch": 0.38857142857142857,
+      "grad_norm": 0.23552727699279785,
+      "learning_rate": 0.0001224,
+      "loss": 0.6224793434143067,
+      "mean_token_accuracy": 0.8593849778175354,
+      "num_tokens": 229839.0,
+      "step": 680
+    },
+    {
+      "entropy": 0.6596762828528882,
+      "epoch": 0.3942857142857143,
+      "grad_norm": 0.24519386887550354,
+      "learning_rate": 0.00012125714285714287,
+      "loss": 0.6345892429351807,
+      "mean_token_accuracy": 0.8584522992372513,
+      "num_tokens": 233176.0,
+      "step": 690
+    },
+    {
+      "entropy": 0.6750190995633603,
+      "epoch": 0.4,
+      "grad_norm": 0.18651342391967773,
+      "learning_rate": 0.00012011428571428571,
+      "loss": 0.6961663722991943,
+      "mean_token_accuracy": 0.8505131497979164,
+      "num_tokens": 236611.0,
+      "step": 700
+    },
+    {
+      "entropy": 0.6748621694743633,
+      "epoch": 0.4057142857142857,
+      "grad_norm": 0.2448211908340454,
+      "learning_rate": 0.00011897142857142857,
+      "loss": 0.6589895725250244,
+      "mean_token_accuracy": 0.8562820449471473,
+      "num_tokens": 239719.0,
+      "step": 710
+    },
+    {
+      "entropy": 0.6976276993751526,
+      "epoch": 0.4114285714285714,
+      "grad_norm": 0.18323859572410583,
+      "learning_rate": 0.00011782857142857145,
+      "loss": 0.7028826713562012,
+      "mean_token_accuracy": 0.8523884430527687,
+      "num_tokens": 243277.0,
+      "step": 720
+    },
+    {
+      "entropy": 0.6387927994132042,
+      "epoch": 0.41714285714285715,
+      "grad_norm": 0.22905172407627106,
+      "learning_rate": 0.00011668571428571429,
+      "loss": 0.5653384685516357,
+      "mean_token_accuracy": 0.8658381626009941,
+      "num_tokens": 246681.0,
+      "step": 730
+    },
+    {
+      "entropy": 0.711549348384142,
+      "epoch": 0.4228571428571429,
+      "grad_norm": 0.22809088230133057,
+      "learning_rate": 0.00011554285714285715,
+      "loss": 0.7091411590576172,
+      "mean_token_accuracy": 0.8466275259852409,
+      "num_tokens": 250174.0,
+      "step": 740
+    },
+    {
+      "entropy": 0.6525940448045731,
+      "epoch": 0.42857142857142855,
+      "grad_norm": 0.165474072098732,
+      "learning_rate": 0.0001144,
+      "loss": 0.6481022834777832,
+      "mean_token_accuracy": 0.8572756558656692,
+      "num_tokens": 253671.0,
+      "step": 750
+    },
+    {
+      "entropy": 0.6324376344680787,
+      "epoch": 0.4342857142857143,
+      "grad_norm": 0.2185923010110855,
+      "learning_rate": 0.00011325714285714287,
+      "loss": 0.641082763671875,
+      "mean_token_accuracy": 0.8592174142599106,
+      "num_tokens": 256803.0,
+      "step": 760
+    },
+    {
+      "entropy": 0.6347133338451385,
+      "epoch": 0.44,
+      "grad_norm": 0.21539245545864105,
+      "learning_rate": 0.00011211428571428573,
+      "loss": 0.6200827121734619,
+      "mean_token_accuracy": 0.8640724703669548,
+      "num_tokens": 260043.0,
+      "step": 770
+    },
+    {
+      "entropy": 0.6413769535720348,
+      "epoch": 0.44571428571428573,
+      "grad_norm": 0.2725240886211395,
+      "learning_rate": 0.00011097142857142857,
+      "loss": 0.6957359790802002,
+      "mean_token_accuracy": 0.8510783404111862,
+      "num_tokens": 263322.0,
+      "step": 780
+    },
+    {
+      "entropy": 0.6790083512663841,
+      "epoch": 0.4514285714285714,
+      "grad_norm": 0.2525796890258789,
+      "learning_rate": 0.00010982857142857143,
+      "loss": 0.6288758277893066,
+      "mean_token_accuracy": 0.8544105023145676,
+      "num_tokens": 266791.0,
+      "step": 790
+    },
+    {
+      "entropy": 0.6541981130838395,
+      "epoch": 0.45714285714285713,
+      "grad_norm": 0.2138395756483078,
+      "learning_rate": 0.0001086857142857143,
+      "loss": 0.6304707527160645,
+      "mean_token_accuracy": 0.8575463786721229,
+      "num_tokens": 269777.0,
+      "step": 800
+    },
+    {
+      "entropy": 0.6959837578237057,
+      "epoch": 0.46285714285714286,
+      "grad_norm": 0.2314756065607071,
+      "learning_rate": 0.00010754285714285715,
+      "loss": 0.6351391315460205,
+      "mean_token_accuracy": 0.8612324088811875,
+      "num_tokens": 272988.0,
+      "step": 810
+    },
+    {
+      "entropy": 0.6571616813540458,
+      "epoch": 0.4685714285714286,
+      "grad_norm": 0.19059552252292633,
+      "learning_rate": 0.00010640000000000001,
+      "loss": 0.6456667423248291,
+      "mean_token_accuracy": 0.8525185197591781,
+      "num_tokens": 276377.0,
+      "step": 820
+    },
+    {
+      "entropy": 0.5789781011641025,
+      "epoch": 0.4742857142857143,
+      "grad_norm": 0.20973151922225952,
+      "learning_rate": 0.00010525714285714285,
+      "loss": 0.5815204620361328,
+      "mean_token_accuracy": 0.8682083815336228,
+      "num_tokens": 279565.0,
+      "step": 830
+    },
+    {
+      "entropy": 0.6460968509316445,
+      "epoch": 0.48,
+      "grad_norm": 0.19541703164577484,
+      "learning_rate": 0.00010411428571428573,
+      "loss": 0.6836059093475342,
+      "mean_token_accuracy": 0.861596092581749,
+      "num_tokens": 282990.0,
+      "step": 840
+    },
+    {
+      "entropy": 0.6585176661610603,
+      "epoch": 0.4857142857142857,
+      "grad_norm": 0.16561760008335114,
+      "learning_rate": 0.00010297142857142859,
+      "loss": 0.6332673072814942,
+      "mean_token_accuracy": 0.8653052359819412,
+      "num_tokens": 286266.0,
+      "step": 850
+    },
+    {
+      "entropy": 0.6495703898370266,
+      "epoch": 0.49142857142857144,
+      "grad_norm": 0.15301378071308136,
+      "learning_rate": 0.00010182857142857143,
+      "loss": 0.6165118217468262,
+      "mean_token_accuracy": 0.8553761512041091,
+      "num_tokens": 289474.0,
+      "step": 860
+    },
+    {
+      "entropy": 0.6630740962922573,
+      "epoch": 0.49714285714285716,
+      "grad_norm": 0.19287355244159698,
+      "learning_rate": 0.00010068571428571429,
+      "loss": 0.6098609924316406,
+      "mean_token_accuracy": 0.8592873826622963,
+      "num_tokens": 292837.0,
+      "step": 870
+    },
+    {
+      "entropy": 0.6436114467680454,
+      "epoch": 0.5028571428571429,
+      "grad_norm": 0.19893701374530792,
+      "learning_rate": 9.954285714285714e-05,
+      "loss": 0.6510508537292481,
+      "mean_token_accuracy": 0.8588305786252022,
+      "num_tokens": 296142.0,
+      "step": 880
+    },
+    {
+      "entropy": 0.6108668148517609,
+      "epoch": 0.5085714285714286,
+      "grad_norm": 0.24575024843215942,
+      "learning_rate": 9.84e-05,
+      "loss": 0.6009951591491699,
+      "mean_token_accuracy": 0.8642948284745217,
+      "num_tokens": 299685.0,
+      "step": 890
+    },
+    {
+      "entropy": 0.6458855651319027,
+      "epoch": 0.5142857142857142,
+      "grad_norm": 0.27966228127479553,
+      "learning_rate": 9.725714285714286e-05,
+      "loss": 0.6327517032623291,
+      "mean_token_accuracy": 0.8518458366394043,
+      "num_tokens": 303189.0,
+      "step": 900
+    },
+    {
+      "entropy": 0.6874921523034573,
+      "epoch": 0.52,
+      "grad_norm": 0.20877590775489807,
+      "learning_rate": 9.611428571428572e-05,
+      "loss": 0.6910979270935058,
+      "mean_token_accuracy": 0.8424041703343391,
+      "num_tokens": 306468.0,
+      "step": 910
+    },
+    {
+      "entropy": 0.6255587831139564,
+      "epoch": 0.5257142857142857,
+      "grad_norm": 0.2100318968296051,
+      "learning_rate": 9.497142857142857e-05,
+      "loss": 0.6689131736755372,
+      "mean_token_accuracy": 0.8548390552401542,
+      "num_tokens": 309955.0,
+      "step": 920
+    },
+    {
+      "entropy": 0.6701849482953548,
+      "epoch": 0.5314285714285715,
+      "grad_norm": 0.25995635986328125,
+      "learning_rate": 9.382857142857144e-05,
+      "loss": 0.6717410087585449,
+      "mean_token_accuracy": 0.8549696207046509,
+      "num_tokens": 313117.0,
+      "step": 930
+    },
+    {
+      "entropy": 0.6915492154657841,
+      "epoch": 0.5371428571428571,
+      "grad_norm": 0.21679414808750153,
+      "learning_rate": 9.268571428571429e-05,
+      "loss": 0.6586971759796143,
+      "mean_token_accuracy": 0.859293457865715,
+      "num_tokens": 316519.0,
+      "step": 940
+    },
+    {
+      "entropy": 0.6822380021214485,
+      "epoch": 0.5428571428571428,
+      "grad_norm": 0.2091035693883896,
+      "learning_rate": 9.154285714285715e-05,
+      "loss": 0.6489524841308594,
+      "mean_token_accuracy": 0.8576316460967064,
+      "num_tokens": 319782.0,
+      "step": 950
+    },
+    {
+      "entropy": 0.6756891958415508,
+      "epoch": 0.5485714285714286,
+      "grad_norm": 0.2485855668783188,
+      "learning_rate": 9.04e-05,
+      "loss": 0.656977367401123,
+      "mean_token_accuracy": 0.8671488896012306,
+      "num_tokens": 323019.0,
+      "step": 960
+    },
+    {
+      "entropy": 0.6616588845849037,
+      "epoch": 0.5542857142857143,
+      "grad_norm": 0.2510681450366974,
+      "learning_rate": 8.925714285714287e-05,
+      "loss": 0.6415849208831788,
+      "mean_token_accuracy": 0.8642254650592804,
+      "num_tokens": 326339.0,
+      "step": 970
+    },
+    {
+      "entropy": 0.6396586582064628,
+      "epoch": 0.56,
+      "grad_norm": 0.20449747145175934,
+      "learning_rate": 8.811428571428572e-05,
+      "loss": 0.6375674247741699,
+      "mean_token_accuracy": 0.8578563511371613,
+      "num_tokens": 329774.0,
+      "step": 980
+    },
+    {
+      "entropy": 0.6666180461645126,
+      "epoch": 0.5657142857142857,
+      "grad_norm": 0.2526116669178009,
+      "learning_rate": 8.697142857142857e-05,
+      "loss": 0.609344482421875,
+      "mean_token_accuracy": 0.8597319439053536,
+      "num_tokens": 333087.0,
+      "step": 990
+    },
+    {
+      "entropy": 0.6488007657229901,
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.20164141058921814,
+      "learning_rate": 8.582857142857143e-05,
+      "loss": 0.6201111316680908,
+      "mean_token_accuracy": 0.8526906743645668,
+      "num_tokens": 336508.0,
+      "step": 1000
+    },
+    {
+      "entropy": 0.63390611410141,
+      "epoch": 0.5771428571428572,
+      "grad_norm": 0.2473566234111786,
+      "learning_rate": 8.46857142857143e-05,
+      "loss": 0.5988630294799805,
+      "mean_token_accuracy": 0.8674046665430069,
+      "num_tokens": 339967.0,
+      "step": 1010
+    },
+    {
+      "entropy": 0.586044154316187,
+      "epoch": 0.5828571428571429,
+      "grad_norm": 0.21561333537101746,
+      "learning_rate": 8.354285714285715e-05,
+      "loss": 0.5799360752105713,
+      "mean_token_accuracy": 0.8686427220702171,
+      "num_tokens": 342923.0,
+      "step": 1020
+    },
+    {
+      "entropy": 0.6335062697529793,
+      "epoch": 0.5885714285714285,
+      "grad_norm": 0.20071916282176971,
+      "learning_rate": 8.24e-05,
+      "loss": 0.6467567443847656,
+      "mean_token_accuracy": 0.8589577525854111,
+      "num_tokens": 346417.0,
+      "step": 1030
+    },
+    {
+      "entropy": 0.6805698774755001,
+      "epoch": 0.5942857142857143,
+      "grad_norm": 0.295105904340744,
+      "learning_rate": 8.125714285714286e-05,
+      "loss": 0.6580337524414063,
+      "mean_token_accuracy": 0.8565412655472755,
+      "num_tokens": 349854.0,
+      "step": 1040
+    },
+    {
+      "entropy": 0.6416849888861179,
+      "epoch": 0.6,
+      "grad_norm": 0.25477221608161926,
+      "learning_rate": 8.011428571428573e-05,
+      "loss": 0.5906441688537598,
+      "mean_token_accuracy": 0.8619327709078789,
+      "num_tokens": 353294.0,
+      "step": 1050
+    },
+    {
+      "entropy": 0.6331484273076058,
+      "epoch": 0.6057142857142858,
+      "grad_norm": 0.22354693710803986,
+      "learning_rate": 7.897142857142858e-05,
+      "loss": 0.6218142509460449,
+      "mean_token_accuracy": 0.8641349360346794,
+      "num_tokens": 356543.0,
+      "step": 1060
+    },
+    {
+      "entropy": 0.6120303176343441,
+      "epoch": 0.6114285714285714,
+      "grad_norm": 0.20340518653392792,
+      "learning_rate": 7.782857142857143e-05,
+      "loss": 0.601622486114502,
+      "mean_token_accuracy": 0.8639883920550346,
+      "num_tokens": 359765.0,
+      "step": 1070
+    },
+    {
+      "entropy": 0.6292500749230385,
+      "epoch": 0.6171428571428571,
+      "grad_norm": 0.2069106251001358,
+      "learning_rate": 7.668571428571429e-05,
+      "loss": 0.6348252773284913,
+      "mean_token_accuracy": 0.8540400981903076,
+      "num_tokens": 363116.0,
+      "step": 1080
+    },
+    {
+      "entropy": 0.622652218490839,
+      "epoch": 0.6228571428571429,
+      "grad_norm": 0.23602575063705444,
+      "learning_rate": 7.554285714285716e-05,
+      "loss": 0.6226765632629394,
+      "mean_token_accuracy": 0.8595390111207962,
+      "num_tokens": 366720.0,
+      "step": 1090
+    },
+    {
+      "entropy": 0.6517547190189361,
+      "epoch": 0.6285714285714286,
+      "grad_norm": 0.28087317943573,
+      "learning_rate": 7.44e-05,
+      "loss": 0.6399660587310791,
+      "mean_token_accuracy": 0.8540969029068947,
+      "num_tokens": 370194.0,
+      "step": 1100
+    },
+    {
+      "entropy": 0.6615139648318291,
+      "epoch": 0.6342857142857142,
+      "grad_norm": 0.3010542094707489,
+      "learning_rate": 7.325714285714286e-05,
+      "loss": 0.6588103771209717,
+      "mean_token_accuracy": 0.8464817240834236,
+      "num_tokens": 373541.0,
+      "step": 1110
+    },
+    {
+      "entropy": 0.647005096077919,
+      "epoch": 0.64,
+      "grad_norm": 0.26990506052970886,
+      "learning_rate": 7.211428571428572e-05,
+      "loss": 0.6175911903381348,
+      "mean_token_accuracy": 0.8642094656825066,
+      "num_tokens": 376643.0,
+      "step": 1120
+    },
+    {
+      "entropy": 0.6991826109588146,
+      "epoch": 0.6457142857142857,
+      "grad_norm": 0.2759954333305359,
+      "learning_rate": 7.097142857142857e-05,
+      "loss": 0.6602604389190674,
+      "mean_token_accuracy": 0.8484420910477638,
+      "num_tokens": 379874.0,
+      "step": 1130
+    },
+    {
+      "entropy": 0.6453976444900036,
+      "epoch": 0.6514285714285715,
+      "grad_norm": 0.21964532136917114,
+      "learning_rate": 6.982857142857144e-05,
+      "loss": 0.5955167293548584,
+      "mean_token_accuracy": 0.861721670627594,
+      "num_tokens": 383311.0,
+      "step": 1140
+    },
+    {
+      "entropy": 0.6503799192607402,
+      "epoch": 0.6571428571428571,
+      "grad_norm": 0.19722476601600647,
+      "learning_rate": 6.868571428571429e-05,
+      "loss": 0.6320806026458741,
+      "mean_token_accuracy": 0.8562078520655632,
+      "num_tokens": 386679.0,
+      "step": 1150
+    },
+    {
+      "entropy": 0.6544807381927967,
+      "epoch": 0.6628571428571428,
+      "grad_norm": 0.28297051787376404,
+      "learning_rate": 6.754285714285714e-05,
+      "loss": 0.6531005382537842,
+      "mean_token_accuracy": 0.8550411075353622,
+      "num_tokens": 389963.0,
+      "step": 1160
+    },
+    {
+      "entropy": 0.6717952400445938,
+      "epoch": 0.6685714285714286,
+      "grad_norm": 0.24083739519119263,
+      "learning_rate": 6.64e-05,
+      "loss": 0.647878885269165,
+      "mean_token_accuracy": 0.8565554738044738,
+      "num_tokens": 393277.0,
+      "step": 1170
+    },
+    {
+      "entropy": 0.6969816297292709,
+      "epoch": 0.6742857142857143,
+      "grad_norm": 0.2341049164533615,
+      "learning_rate": 6.525714285714287e-05,
+      "loss": 0.6588397979736328,
+      "mean_token_accuracy": 0.8613204509019852,
+      "num_tokens": 396744.0,
+      "step": 1180
+    },
+    {
+      "entropy": 0.5977616436779499,
+      "epoch": 0.68,
+      "grad_norm": 0.21638202667236328,
+      "learning_rate": 6.411428571428572e-05,
+      "loss": 0.5673915386199951,
+      "mean_token_accuracy": 0.8722260281443596,
+      "num_tokens": 400200.0,
+      "step": 1190
+    },
+    {
+      "entropy": 0.6362225770950317,
+      "epoch": 0.6857142857142857,
+      "grad_norm": 0.2523053288459778,
+      "learning_rate": 6.297142857142857e-05,
+      "loss": 0.6235575199127197,
+      "mean_token_accuracy": 0.8589386835694313,
+      "num_tokens": 403472.0,
+      "step": 1200
+    },
+    {
+      "entropy": 0.6172734223306179,
+      "epoch": 0.6914285714285714,
+      "grad_norm": 0.21801182627677917,
+      "learning_rate": 6.182857142857143e-05,
+      "loss": 0.6179306030273437,
+      "mean_token_accuracy": 0.8626845121383667,
+      "num_tokens": 406776.0,
+      "step": 1210
+    },
+    {
+      "entropy": 0.6551583506166935,
+      "epoch": 0.6971428571428572,
+      "grad_norm": 0.27856746315956116,
+      "learning_rate": 6.068571428571429e-05,
+      "loss": 0.6389457225799561,
+      "mean_token_accuracy": 0.8546857088804245,
+      "num_tokens": 410307.0,
+      "step": 1220
+    },
+    {
+      "entropy": 0.7079406701028347,
+      "epoch": 0.7028571428571428,
+      "grad_norm": 0.29186904430389404,
+      "learning_rate": 5.9542857142857146e-05,
+      "loss": 0.7094098567962647,
+      "mean_token_accuracy": 0.8335159897804261,
+      "num_tokens": 413745.0,
+      "step": 1230
+    },
+    {
+      "entropy": 0.683994609862566,
+      "epoch": 0.7085714285714285,
+      "grad_norm": 0.2444518506526947,
+      "learning_rate": 5.8399999999999997e-05,
+      "loss": 0.626039981842041,
+      "mean_token_accuracy": 0.8552980974316597,
+      "num_tokens": 416674.0,
+      "step": 1240
+    },
+    {
+      "entropy": 0.6473595723509789,
+      "epoch": 0.7142857142857143,
+      "grad_norm": 0.2124943733215332,
+      "learning_rate": 5.725714285714287e-05,
+      "loss": 0.6362271308898926,
+      "mean_token_accuracy": 0.8586657717823982,
+      "num_tokens": 419814.0,
+      "step": 1250
+    },
+    {
+      "entropy": 0.6558213859796524,
+      "epoch": 0.72,
+      "grad_norm": 0.3233776092529297,
+      "learning_rate": 5.611428571428572e-05,
+      "loss": 0.6335158824920655,
+      "mean_token_accuracy": 0.8607818141579628,
+      "num_tokens": 423034.0,
+      "step": 1260
+    },
+    {
+      "entropy": 0.6101858265697956,
+      "epoch": 0.7257142857142858,
+      "grad_norm": 0.23161561787128448,
+      "learning_rate": 5.4971428571428576e-05,
+      "loss": 0.5872994422912597,
+      "mean_token_accuracy": 0.8612972646951675,
+      "num_tokens": 426358.0,
+      "step": 1270
+    },
+    {
+      "entropy": 0.6065807826817036,
+      "epoch": 0.7314285714285714,
+      "grad_norm": 0.3137820065021515,
+      "learning_rate": 5.3828571428571426e-05,
+      "loss": 0.6199213027954101,
+      "mean_token_accuracy": 0.8634087055921554,
+      "num_tokens": 429800.0,
+      "step": 1280
+    },
+    {
+      "entropy": 0.6477028757333756,
+      "epoch": 0.7371428571428571,
+      "grad_norm": 0.20417962968349457,
+      "learning_rate": 5.2685714285714284e-05,
+      "loss": 0.6403303623199463,
+      "mean_token_accuracy": 0.858014090359211,
+      "num_tokens": 433136.0,
+      "step": 1290
+    },
+    {
+      "entropy": 0.6253302626311779,
+      "epoch": 0.7428571428571429,
+      "grad_norm": 0.2649274468421936,
+      "learning_rate": 5.154285714285715e-05,
+      "loss": 0.5664835453033448,
+      "mean_token_accuracy": 0.8699078008532524,
+      "num_tokens": 436165.0,
+      "step": 1300
+    },
+    {
+      "entropy": 0.6552011057734489,
+      "epoch": 0.7485714285714286,
+      "grad_norm": 0.22695457935333252,
+      "learning_rate": 5.0400000000000005e-05,
+      "loss": 0.6598159313201905,
+      "mean_token_accuracy": 0.8518921718001365,
+      "num_tokens": 439362.0,
+      "step": 1310
+    },
+    {
+      "entropy": 0.6426700457930565,
+      "epoch": 0.7542857142857143,
+      "grad_norm": 0.2776673436164856,
+      "learning_rate": 4.9257142857142856e-05,
+      "loss": 0.6768081188201904,
+      "mean_token_accuracy": 0.8481876432895661,
+      "num_tokens": 442586.0,
+      "step": 1320
+    },
+    {
+      "entropy": 0.6550601176917553,
+      "epoch": 0.76,
+      "grad_norm": 0.27410948276519775,
+      "learning_rate": 4.811428571428572e-05,
+      "loss": 0.636206865310669,
+      "mean_token_accuracy": 0.8564146623015404,
+      "num_tokens": 446043.0,
+      "step": 1330
+    },
+    {
+      "entropy": 0.6599532529711724,
+      "epoch": 0.7657142857142857,
+      "grad_norm": 0.23421648144721985,
+      "learning_rate": 4.697142857142857e-05,
+      "loss": 0.6225139617919921,
+      "mean_token_accuracy": 0.8554374307394028,
+      "num_tokens": 449551.0,
+      "step": 1340
+    },
+    {
+      "entropy": 0.6726249933242798,
+      "epoch": 0.7714285714285715,
+      "grad_norm": 0.23552796244621277,
+      "learning_rate": 4.5828571428571435e-05,
+      "loss": 0.6367889881134033,
+      "mean_token_accuracy": 0.8643324449658394,
+      "num_tokens": 452586.0,
+      "step": 1350
+    },
+    {
+      "entropy": 0.6467246741056443,
+      "epoch": 0.7771428571428571,
+      "grad_norm": 0.18453116714954376,
+      "learning_rate": 4.4685714285714286e-05,
+      "loss": 0.6355658054351807,
+      "mean_token_accuracy": 0.8546889841556549,
+      "num_tokens": 455761.0,
+      "step": 1360
+    },
+    {
+      "entropy": 0.6625100992619991,
+      "epoch": 0.7828571428571428,
+      "grad_norm": 0.21036536991596222,
+      "learning_rate": 4.354285714285714e-05,
+      "loss": 0.6642748355865479,
+      "mean_token_accuracy": 0.8560309842228889,
+      "num_tokens": 459175.0,
+      "step": 1370
+    },
+    {
+      "entropy": 0.6317512311041356,
+      "epoch": 0.7885714285714286,
+      "grad_norm": 0.21304954588413239,
+      "learning_rate": 4.24e-05,
+      "loss": 0.601001787185669,
+      "mean_token_accuracy": 0.8565791383385658,
+      "num_tokens": 462701.0,
+      "step": 1380
+    },
+    {
+      "entropy": 0.644014036655426,
+      "epoch": 0.7942857142857143,
+      "grad_norm": 0.26754629611968994,
+      "learning_rate": 4.125714285714286e-05,
+      "loss": 0.6244466781616211,
+      "mean_token_accuracy": 0.8559624046087265,
+      "num_tokens": 465821.0,
+      "step": 1390
+    },
+    {
+      "entropy": 0.6658924698829651,
+      "epoch": 0.8,
+      "grad_norm": 0.2504512369632721,
+      "learning_rate": 4.0114285714285715e-05,
+      "loss": 0.6396134853363037,
+      "mean_token_accuracy": 0.8645422115921975,
+      "num_tokens": 469319.0,
+      "step": 1400
+    },
+    {
+      "entropy": 0.6185431003570556,
+      "epoch": 0.8057142857142857,
+      "grad_norm": 0.2430698573589325,
+      "learning_rate": 3.897142857142857e-05,
+      "loss": 0.6310626029968261,
+      "mean_token_accuracy": 0.8630247727036476,
+      "num_tokens": 472580.0,
+      "step": 1410
+    },
+    {
+      "entropy": 0.5944720163941384,
+      "epoch": 0.8114285714285714,
+      "grad_norm": 0.25170740485191345,
+      "learning_rate": 3.782857142857143e-05,
+      "loss": 0.5594588279724121,
+      "mean_token_accuracy": 0.8738556310534478,
+      "num_tokens": 475539.0,
+      "step": 1420
+    },
+    {
+      "entropy": 0.6284624971449375,
+      "epoch": 0.8171428571428572,
+      "grad_norm": 0.2709786295890808,
+      "learning_rate": 3.668571428571429e-05,
+      "loss": 0.639189624786377,
+      "mean_token_accuracy": 0.862536971271038,
+      "num_tokens": 478827.0,
+      "step": 1430
+    },
+    {
+      "entropy": 0.6724597789347172,
+      "epoch": 0.8228571428571428,
+      "grad_norm": 0.2555326521396637,
+      "learning_rate": 3.5542857142857145e-05,
+      "loss": 0.6536776542663574,
+      "mean_token_accuracy": 0.8553947672247887,
+      "num_tokens": 482017.0,
+      "step": 1440
+    },
+    {
+      "entropy": 0.6180280610918999,
+      "epoch": 0.8285714285714286,
+      "grad_norm": 0.22328683733940125,
+      "learning_rate": 3.4399999999999996e-05,
+      "loss": 0.6490192413330078,
+      "mean_token_accuracy": 0.8624721497297287,
+      "num_tokens": 485285.0,
+      "step": 1450
+    },
+    {
+      "entropy": 0.6554854273796081,
+      "epoch": 0.8342857142857143,
+      "grad_norm": 0.2618810832500458,
+      "learning_rate": 3.325714285714286e-05,
+      "loss": 0.6328207015991211,
+      "mean_token_accuracy": 0.8652528643608093,
+      "num_tokens": 488237.0,
+      "step": 1460
+    },
+    {
+      "entropy": 0.6485379718244075,
+      "epoch": 0.84,
+      "grad_norm": 0.3207658529281616,
+      "learning_rate": 3.211428571428571e-05,
+      "loss": 0.6625119686126709,
+      "mean_token_accuracy": 0.8627592906355858,
+      "num_tokens": 491545.0,
+      "step": 1470
+    },
+    {
+      "entropy": 0.6200287729501724,
+      "epoch": 0.8457142857142858,
+      "grad_norm": 0.2355208396911621,
+      "learning_rate": 3.0971428571428575e-05,
+      "loss": 0.5889796733856201,
+      "mean_token_accuracy": 0.8751833841204644,
+      "num_tokens": 494760.0,
+      "step": 1480
+    },
+    {
+      "entropy": 0.6196223556995392,
+      "epoch": 0.8514285714285714,
+      "grad_norm": 0.19851188361644745,
+      "learning_rate": 2.982857142857143e-05,
+      "loss": 0.5831719875335694,
+      "mean_token_accuracy": 0.8716864466667176,
+      "num_tokens": 497728.0,
+      "step": 1490
+    },
+    {
+      "entropy": 0.6233154498040676,
+      "epoch": 0.8571428571428571,
+      "grad_norm": 0.24745343625545502,
+      "learning_rate": 2.8685714285714286e-05,
+      "loss": 0.6103721141815186,
+      "mean_token_accuracy": 0.8551008567214012,
+      "num_tokens": 500804.0,
+      "step": 1500
+    },
+    {
+      "entropy": 0.599901518970728,
+      "epoch": 0.8628571428571429,
+      "grad_norm": 0.23715294897556305,
+      "learning_rate": 2.7542857142857144e-05,
+      "loss": 0.5849476814270019,
+      "mean_token_accuracy": 0.8684423178434372,
+      "num_tokens": 504066.0,
+      "step": 1510
+    },
+    {
+      "entropy": 0.629510759562254,
+      "epoch": 0.8685714285714285,
+      "grad_norm": 0.2427317500114441,
+      "learning_rate": 2.64e-05,
+      "loss": 0.631610631942749,
+      "mean_token_accuracy": 0.8575234442949295,
+      "num_tokens": 507430.0,
+      "step": 1520
+    },
+    {
+      "entropy": 0.6032628089189529,
+      "epoch": 0.8742857142857143,
+      "grad_norm": 0.20266121625900269,
+      "learning_rate": 2.5257142857142855e-05,
+      "loss": 0.5601680755615235,
+      "mean_token_accuracy": 0.8660111904144288,
+      "num_tokens": 510923.0,
+      "step": 1530
+    },
+    {
+      "entropy": 0.599126148968935,
+      "epoch": 0.88,
+      "grad_norm": 0.24769169092178345,
+      "learning_rate": 2.4114285714285713e-05,
+      "loss": 0.6129156589508057,
+      "mean_token_accuracy": 0.8624033451080322,
+      "num_tokens": 514276.0,
+      "step": 1540
+    },
+    {
+      "entropy": 0.6265991859138011,
+      "epoch": 0.8857142857142857,
+      "grad_norm": 0.24149306118488312,
+      "learning_rate": 2.297142857142857e-05,
+      "loss": 0.645173168182373,
+      "mean_token_accuracy": 0.8579171389341355,
+      "num_tokens": 517443.0,
+      "step": 1550
+    },
+    {
+      "entropy": 0.5904549680650234,
+      "epoch": 0.8914285714285715,
+      "grad_norm": 0.2332906723022461,
+      "learning_rate": 2.1828571428571428e-05,
+      "loss": 0.5481174945831299,
+      "mean_token_accuracy": 0.8670677661895752,
+      "num_tokens": 520947.0,
+      "step": 1560
+    },
+    {
+      "entropy": 0.7045296929776669,
+      "epoch": 0.8971428571428571,
+      "grad_norm": 0.37327486276626587,
+      "learning_rate": 2.0685714285714285e-05,
+      "loss": 0.6851255416870117,
+      "mean_token_accuracy": 0.8483647271990776,
+      "num_tokens": 524104.0,
+      "step": 1570
+    },
+    {
+      "entropy": 0.6310351669788361,
+      "epoch": 0.9028571428571428,
+      "grad_norm": 0.27085983753204346,
+      "learning_rate": 1.9542857142857143e-05,
+      "loss": 0.6175636291503906,
+      "mean_token_accuracy": 0.8585921674966812,
+      "num_tokens": 527462.0,
+      "step": 1580
+    },
+    {
+      "entropy": 0.6248554646968841,
+      "epoch": 0.9085714285714286,
+      "grad_norm": 0.29151156544685364,
+      "learning_rate": 1.84e-05,
+      "loss": 0.6107958793640137,
+      "mean_token_accuracy": 0.8647030532360077,
+      "num_tokens": 530929.0,
+      "step": 1590
+    },
+    {
+      "entropy": 0.6354066073894501,
+      "epoch": 0.9142857142857143,
+      "grad_norm": 0.26907071471214294,
+      "learning_rate": 1.7257142857142857e-05,
+      "loss": 0.6115604400634765,
+      "mean_token_accuracy": 0.8671793237328529,
+      "num_tokens": 534220.0,
+      "step": 1600
+    },
+    {
+      "entropy": 0.6623220466077328,
+      "epoch": 0.92,
+      "grad_norm": 0.23479118943214417,
+      "learning_rate": 1.6114285714285715e-05,
+      "loss": 0.629840898513794,
+      "mean_token_accuracy": 0.8525548726320267,
+      "num_tokens": 537701.0,
+      "step": 1610
+    },
+    {
+      "entropy": 0.6476062543690204,
+      "epoch": 0.9257142857142857,
+      "grad_norm": 0.2056824117898941,
+      "learning_rate": 1.4971428571428572e-05,
+      "loss": 0.6435581684112549,
+      "mean_token_accuracy": 0.8595114961266518,
+      "num_tokens": 541293.0,
+      "step": 1620
+    },
+    {
+      "entropy": 0.6168920576572419,
+      "epoch": 0.9314285714285714,
+      "grad_norm": 0.22080306708812714,
+      "learning_rate": 1.382857142857143e-05,
+      "loss": 0.5561806678771972,
+      "mean_token_accuracy": 0.8639644294977188,
+      "num_tokens": 544565.0,
+      "step": 1630
+    },
+    {
+      "entropy": 0.6294913746416568,
+      "epoch": 0.9371428571428572,
+      "grad_norm": 0.3226984441280365,
+      "learning_rate": 1.2685714285714287e-05,
+      "loss": 0.5850194931030274,
+      "mean_token_accuracy": 0.8648865327239037,
+      "num_tokens": 547743.0,
+      "step": 1640
+    },
+    {
+      "entropy": 0.6362057097256184,
+      "epoch": 0.9428571428571428,
+      "grad_norm": 0.2684152126312256,
+      "learning_rate": 1.1542857142857143e-05,
+      "loss": 0.6132237911224365,
+      "mean_token_accuracy": 0.8577535077929497,
+      "num_tokens": 551275.0,
+      "step": 1650
+    },
+    {
+      "entropy": 0.6077159576117992,
+      "epoch": 0.9485714285714286,
+      "grad_norm": 0.26599714159965515,
+      "learning_rate": 1.04e-05,
+      "loss": 0.5903688430786133,
+      "mean_token_accuracy": 0.8740017876029015,
+      "num_tokens": 554538.0,
+      "step": 1660
+    },
+    {
+      "entropy": 0.6076049767434597,
+      "epoch": 0.9542857142857143,
+      "grad_norm": 0.22315815091133118,
+      "learning_rate": 9.257142857142858e-06,
+      "loss": 0.5887226104736328,
+      "mean_token_accuracy": 0.8686643913388252,
+      "num_tokens": 557948.0,
+      "step": 1670
+    },
+    {
+      "entropy": 0.6360240176320076,
+      "epoch": 0.96,
+      "grad_norm": 0.2517399787902832,
+      "learning_rate": 8.114285714285715e-06,
+      "loss": 0.6221353054046631,
+      "mean_token_accuracy": 0.8617710500955582,
+      "num_tokens": 561093.0,
+      "step": 1680
+    },
+    {
+      "entropy": 0.6221488267183304,
+      "epoch": 0.9657142857142857,
+      "grad_norm": 0.2868978977203369,
+      "learning_rate": 6.971428571428572e-06,
+      "loss": 0.6088948249816895,
+      "mean_token_accuracy": 0.8645280092954636,
+      "num_tokens": 564497.0,
+      "step": 1690
+    },
+    {
+      "entropy": 0.6176722340285778,
+      "epoch": 0.9714285714285714,
+      "grad_norm": 0.27099671959877014,
+      "learning_rate": 5.828571428571429e-06,
+      "loss": 0.5873197078704834,
+      "mean_token_accuracy": 0.8671502575278283,
+      "num_tokens": 567977.0,
+      "step": 1700
+    },
+    {
+      "entropy": 0.5968088746070862,
+      "epoch": 0.9771428571428571,
+      "grad_norm": 0.25095444917678833,
+      "learning_rate": 4.685714285714286e-06,
+      "loss": 0.6059055805206299,
+      "mean_token_accuracy": 0.8739419683814049,
+      "num_tokens": 571185.0,
+      "step": 1710
+    },
+    {
+      "entropy": 0.6225132785737515,
+      "epoch": 0.9828571428571429,
+      "grad_norm": 0.28391072154045105,
+      "learning_rate": 3.542857142857143e-06,
+      "loss": 0.5855085372924804,
+      "mean_token_accuracy": 0.8671080946922303,
+      "num_tokens": 574447.0,
+      "step": 1720
+    },
+    {
+      "entropy": 0.6214111320674419,
+      "epoch": 0.9885714285714285,
+      "grad_norm": 0.30393826961517334,
+      "learning_rate": 2.4000000000000003e-06,
+      "loss": 0.6098702907562256,
+      "mean_token_accuracy": 0.8673963889479637,
+      "num_tokens": 577647.0,
+      "step": 1730
+    },
+    {
+      "entropy": 0.6451270334422589,
+      "epoch": 0.9942857142857143,
+      "grad_norm": 0.6247619390487671,
+      "learning_rate": 1.2571428571428573e-06,
+      "loss": 0.6435680389404297,
+      "mean_token_accuracy": 0.8606103897094727,
+      "num_tokens": 580924.0,
+      "step": 1740
+    },
+    {
+      "entropy": 0.6013512119650841,
+      "epoch": 1.0,
+      "grad_norm": 0.22891853749752045,
+      "learning_rate": 1.142857142857143e-07,
+      "loss": 0.5656134605407714,
+      "mean_token_accuracy": 0.8671733975410462,
+      "num_tokens": 584229.0,
+      "step": 1750
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1750,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4600872360760320.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
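
The logged `learning_rate` values fall by a constant 1.142857e-07 per step and reach about 1.14e-07 at step 1750. That pattern is consistent with a linear decay from 2e-4 with no warmup. The sketch below checks this reading against a few entries copied from `log_history`. The peak rate of 2e-4 and the absence of warmup are inferences from the log, not values stated in this diff, because the training arguments appear here only as an LFS pointer.

```python
import math

# Sketch: check that the logged learning rates follow a linear decay.
# Assumption: peak LR of 2e-4 and no warmup, inferred from log_history
# (training_args.bin is only present as an LFS pointer in this diff).
MAX_STEPS = 1750   # "max_steps" in trainer_state.json
PEAK_LR = 2e-4     # assumption, inferred from the logged values


def linear_lr(step: int, max_steps: int = MAX_STEPS, peak: float = PEAK_LR) -> float:
    """Learning rate expected in the log entry for `step`.

    The +1 offset is what makes the formula line up with the log: the rate
    reported at step s equals a linear schedule evaluated at s - 1.
    """
    return peak * (max_steps - step + 1) / max_steps


# (step, learning_rate) pairs copied from log_history above.
LOGGED = [
    (1080, 7.668571428571429e-05),
    (1100, 7.44e-05),
    (1400, 4.0114285714285715e-05),
    (1750, 1.142857142857143e-07),
]

for step, logged in LOGGED:
    expected = linear_lr(step)
    # rel_tol absorbs rounding from a different order of float operations.
    assert math.isclose(expected, logged, rel_tol=1e-9), (step, expected, logged)
    print(f"step {step}: logged {logged:.6e}, expected {expected:.6e}")
```

If a different peak or a non-zero warmup had been used, the per-step slope would still be constant, but the intercept would not match these entries.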

sql-model/checkpoint-1750/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:29a4a11ec0ba52a64430eabdbdc4808ed84fc37c06b05e2f78d6eedc6da2ee37
+size 5649
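
The diff shows only a Git LFS pointer for `training_args.bin`. The binary itself (5649 bytes) lives in LFS storage, and the pointer records its spec version, content hash and size as one `key value` pair per line. The following is a minimal parser for that layout. The function name `parse_lfs_pointer` is illustrative and not part of any library.

```python
# Sketch: parse a Git LFS pointer file into its fields.
# Layout: one "key value" pair per line, starting with "version".
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # Reject anything that does not declare a Git LFS spec version.
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    # "oid" has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:29a4a11ec0ba52a64430eabdbdc4808ed84fc37c06b05e2f78d6eedc6da2ee37
size 5649
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 5649
```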

sql-model/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+size 11421892
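
Because `tokenizer.json` is stored the same way, a downloaded copy can be checked against the pointer. The `oid` is the SHA-256 of the file contents and `size` is its length in bytes. The sketch below streams the file through `hashlib` so the roughly 11 MB tokenizer is never held in memory at once. The helper name and the local path in the usage comment are hypothetical.

```python
import hashlib
import os


def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int,
                        chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    # Cheap check first: a size mismatch rules the file out immediately.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Usage (hypothetical local path to the downloaded tokenizer):
# matches_lfs_pointer(
#     "sql-model/tokenizer.json",
#     "3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8",
#     11421892,
# )
```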

sql-model/tokenizer_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "model_max_length": 32768,
+  "pad_token": "<|im_end|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
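
The config declares the ChatML markers `<|im_start|>` and `<|im_end|>` and uses `<|im_end|>` as both `eos_token` and `pad_token`. A prompt in that layout might be assembled by hand as sketched below. This assumes the model follows the standard Qwen2 ChatML template, since the actual `chat_template` is not part of this diff. The system and user messages are hypothetical examples.

```python
# Sketch: assemble a ChatML-style prompt from the special tokens declared above.
# Assumption: standard Qwen2 ChatML layout; the chat_template is not in this diff.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"  # also eos_token and pad_token in tokenizer_config.json


def build_chatml_prompt(messages: list[dict]) -> str:
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    # Open an assistant turn so generation continues from here; decoding
    # would stop when the model emits IM_END, the configured eos_token.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


# Hypothetical example messages.
prompt = build_chatml_prompt([
    {"role": "system", "content": "Translate the request into SQL."},
    {"role": "user", "content": "How many users signed up in 2024?"},
])
print(prompt)
```

With `transformers`, the tokenizer's own template is the safer source of truth: `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the prompt from the repository's configured template instead of this hand-written approximation.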