Automated MNLP evaluation report (2026-06-11)

by zechen-nlp - opened Jun 3

base: refs/heads/main ← from: refs/pr/5

+12069 -62

This view is limited to 50 files because it contains too many changes. See the raw diff here.

Files changed (50)

EVAL_REPORT.md +144 -0
README.md +2 -48
chat_template.jinja +1 -5
checkpoint-1011/chat_template.jinja +89 -0
checkpoint-1011/config.json +63 -0
checkpoint-1011/generation_config.json +12 -0
checkpoint-1011/model.safetensors +3 -0
checkpoint-1011/optimizer.pt +3 -0
checkpoint-1011/rng_state.pth +3 -0
checkpoint-1011/scheduler.pt +3 -0
checkpoint-1011/tokenizer.json +3 -0
checkpoint-1011/tokenizer_config.json +30 -0
checkpoint-1011/trainer_state.json +1044 -0
checkpoint-1011/training_args.bin +3 -0
checkpoint-125/chat_template.jinja +89 -0
checkpoint-125/config.json +63 -0
checkpoint-125/generation_config.json +12 -0
checkpoint-125/model.safetensors +3 -0
checkpoint-125/optimizer.pt +3 -0
checkpoint-125/rng_state.pth +3 -0
checkpoint-125/scheduler.pt +3 -0
checkpoint-125/tokenizer.json +3 -0
checkpoint-125/tokenizer_config.json +30 -0
checkpoint-125/trainer_state.json +154 -0
checkpoint-125/training_args.bin +3 -0
checkpoint-500/chat_template.jinja +89 -0
checkpoint-500/config.json +63 -0
checkpoint-500/generation_config.json +12 -0
checkpoint-500/model.safetensors +3 -0
checkpoint-500/optimizer.pt +3 -0
checkpoint-500/rng_state.pth +3 -0
checkpoint-500/scheduler.pt +3 -0
checkpoint-500/tokenizer.json +3 -0
checkpoint-500/tokenizer_config.json +30 -0
checkpoint-500/trainer_state.json +589 -0
checkpoint-500/training_args.bin +3 -0
checkpoint-600/chat_template.jinja +89 -0
checkpoint-600/config.json +63 -0
checkpoint-600/generation_config.json +12 -0
checkpoint-600/model.safetensors +3 -0
checkpoint-600/optimizer.pt +3 -0
checkpoint-600/rng_state.pth +3 -0
checkpoint-600/scheduler.pt +3 -0
checkpoint-600/tokenizer.json +3 -0
checkpoint-600/tokenizer_config.json +30 -0
checkpoint-600/trainer_state.json +700 -0
checkpoint-600/training_args.bin +3 -0
checkpoint-606/chat_template.jinja +89 -0
checkpoint-606/config.json +63 -0
checkpoint-606/generation_config.json +12 -0

EVAL_REPORT.md ADDED

	@@ -0,0 +1,144 @@

+# Automated MNLP evaluation report
+- **Model repo:** [`cs-552-2026-the-transformers/math_model`](https://huggingface.co/cs-552-2026-the-transformers/math_model)
+- **Owner(s):** group **the-transformers**
+- **Generated at:** 2026-06-11T06:23:10+00:00 (UTC)
+- **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
+_This PR is opened automatically by the course CI. It is **non-blocking** — you do not need to merge it. The next nightly run will refresh this file._
+## Evaluated checkpoint
+- **Commit:** [`f9658b7`](https://huggingface.co/cs-552-2026-the-transformers/math_model/commit/f9658b7782b5e50dd8aa2a81f67c1d03d3476f92)
+- **Message:** Update model
+- **Committed:** 2026-06-10T17:25:31+00:00
+## Summary
+| Benchmark | Accuracy | Status |
+|---|---:|---|
+| Math | 0.5300 | ok |
+| Knowledge | — | not run |
+| Multilingual | — | not run |
+| Safety | — | not run |
+## Sample completions
+_Prompts are intentionally omitted to avoid revealing benchmark contents. For multi-completion problems, only one completion is shown per sample._
+### Math
+**Correct** (1 shown)
+- **reference**: `40_9`
+- **overall** (7/8 completions correct)
+- **extracted** (✓): `40_9`
+- **completion**:
+    ```text
+    <think>
+    Okay, let's see. I need to subtract two numbers in base 9: 58₉ minus 18₉. Hmm, base 9... I remember that subtracting in different bases can be similar to decimal, but you have to remember that each digit goes from 0 to 8 instead of 0 to 9. So maybe I should convert both numbers to base 10, subtract them, and then convert the result back to base 9? That might be easier for me, since I'm mor
+    ... [truncated 823 chars] ...
+    I think the answer is correct.
+    </think>
+    To solve the problem of evaluating \(58_9 - 18_9\) and writing the result in base 9, we can follow these steps:
+    1. **Convert both numbers to base 10:**
+       - Convert \(58_9\) to base 10:
+         \[
+         58_9 = 5 \times 9^1 + 8 \times 9^0 = 5 \times 9 + 8 = 45 + 8 = 53_{10}
+         \]
+       - Convert \(18_9\) to base 10:
+         \[
+         18_9 = 1 \times 9^1 + 8 \times 9^0 = 1 \times 9 + 8 = 9 + 8 = 17_{10}
+         \]
+    2. **Subtract the two base 10 numbers:**
+       \[
+       53_{10} - 17_{10} = 36_{10}
+       \]
+    3. **Convert the result back to base 9:**
+       - Divide 36 by 9 to find the quotient and remainder:
+         \[
+         36 \div 9 = 4 \text{ remainder } 0
+         \]
+       - Therefore, \(36_{10}\) in base 9 is \(40_9\).
+    4. **Verify the result:**
+       - Convert \(40_9\) back to base 10:
+         \[
+         40_9 = 4 \times 9^1 + 0 \times 9^0 = 4 \times 9 + 0 = 36_{10}
+         \]
+       - Convert \(58_9\) and \(18_9\) back to base 10:
+         \[
+         58_9 = 53_{10}, \quad 18_9 = 17_{10}
+         \]
+       - Subtract the base 10 numbers:
+         \[
+         53_{10} - 17_{10} = 36_{10}
+         \]
+       - Convert \(36_{10}\) back to base 9:
+         \[
+         36_{10} = 40_9
+         \]
+    Thus, the final answer is \(\boxed{40_9}\).
+    ```
+**Incorrect** (1 shown)
+- **reference**: `$f(x,y)= g(x+y, xy(x-y)^{2})$ for some polynomial $g$`
+- **overall** (0/8 completions correct)
+- **extracted** (✗): `c`
+- **completion**:
+    ```text
+    <think>
+    Okay, let's try to figure out this problem. So, we need to find all polynomials f(x, y) in complex numbers such that for any complex numbers a and b, the equation f(a², b²) equals f((a - b)² / 2, (a + b)² / 2). Hmm, interesting. Let me start by understanding what this condition means.
+    First, maybe I should consider specific values for a and b to see if I can spot a pattern or figure out t
+    ... [truncated 822 chars] ...
+    + q}{2}\right)a^2 + \left(\frac{p + q}{2}\right)b^2 - pqab + r.
+    \]
+    Equating the coefficients with the left-hand side:
+    \[
+    p a^2 + q b^2 + r = \left(\frac{p + q}{2}\right)a^2 + \left(\frac{p + q}{2}\right)b^2 - pqab + r.
+    \]
+    For this to hold for all $ a $ and $ b $, the coefficients must match:
+    \[
+    p = \frac{p + q}{2} \implies 2p = p + q \implies q = p,
+    \]
+    \[
+    q = \frac{p + q}{2} \implies 2q = p + q \implies p = q,
+    \]
+    \[
+    -pq = 0 \implies pq = 0 \implies p = 0 \text{ (since } p \neq 0).
+    \]
+    Thus, the only solution for linear polynomials is $ f(x, y) = 0 $, which is a constant.
+    #### Quadratic Polynomials
+    Assume $ f(x, y) = Ax^2 + By^2 + Cxy + Dx + Ey + F $. Substitute into the equation and equate coefficients. This leads to:
+    \[
+    A = B = C = D = E = F = 0.
+    \]
+    Thus, the only solution is $ f(x, y) = c $.
+    ### Step 4: Higher-Degree Polynomials
+    Higher-degree polynomials will similarly force all coefficients to zero, confirming that only constant polynomials satisfy the condition.
+    ### Conclusion
+    After testing various forms and deriving constraints, the only polynomials $ f(x, y) \in \mathbb{C}[x, y] \) that satisfy the given functional equation are the constant polynomials.
+    $$
+    \boxed{c}
+    $$
+    ```
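
Both sample answers in the report can be checked independently. The sketch below recomputes the base-9 subtraction from the correct sample, then tests the reference answer of the incorrect sample: if the substitution (a², b²) → ((a − b)²/2, (a + b)²/2) leaves both x + y and xy(x − y)² unchanged, then every f(x, y) = g(x + y, xy(x − y)²) satisfies the condition, so non-constant solutions such as f(x, y) = x + y exist and the extracted answer `c` is too narrow. The helper names (`to_base`, `invariants`, `substitution_preserves_invariants`) are illustrative and not part of the CI pipeline, and the functional equation is taken from the visible part of the completion.

```python
from fractions import Fraction


def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base (base <= 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


# Correct sample: 58_9 - 18_9, computed through base 10 as in the completion.
difference = int("58", 9) - int("18", 9)
base9_result = to_base(difference, 9)


def invariants(x, y):
    """The two arguments of g in the reference answer f(x, y) = g(x + y, xy(x - y)^2)."""
    return (x + y, x * y * (x - y) ** 2)


def substitution_preserves_invariants(a, b):
    """Compare the invariants at (a^2, b^2) and at ((a - b)^2 / 2, (a + b)^2 / 2)."""
    lhs = invariants(a * a, b * b)
    rhs = invariants((a - b) ** 2 / 2, (a + b) ** 2 / 2)
    return lhs == rhs


# Exact rational arithmetic on a small grid; a spot check, not a proof.
samples = [(Fraction(p), Fraction(q)) for p in range(-3, 4) for q in range(-3, 4)]
all_preserved = all(substitution_preserves_invariants(a, b) for a, b in samples)
```

The grid check only samples rational points; the identity itself follows from (a − b)² + (a + b)² = 2(a² + b²) and (a − b)²(a + b)² · (−2ab)² / 4 = a²b²(a² − b²)².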

README.md CHANGED

@@ -2,56 +2,10 @@
 base_model: Qwen/Qwen3-1.7B
 library_name: transformers
 tags:
-- generated_from_trainer
-- trl
 - sft
-licence: license
 ---
 # math_model
-This model is a fine-tuned version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
-```python
-from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="cs-552-2026-the-transformers/math_model", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/jdecim/MNLP/runs/2btc84l3)
-This model was trained with SFT.
-### Framework versions
-- TRL: 0.27.2
-- Transformers: 5.8.0
-- Pytorch: 2.10.0+cu128
-- Datasets: 4.8.5
-- Tokenizers: 0.22.2
-## Citations
-Cite TRL as:
-```bibtex
-@misc{vonwerra2022trl,
-	title        = {{TRL: Transformer Reinforcement Learning}},
-	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-	year         = 2020,
-	journal      = {GitHub repository},
-	publisher    = {GitHub},
-	howpublished = {\url{https://github.com/huggingface/trl}}
-}
-```

 base_model: Qwen/Qwen3-1.7B
 library_name: transformers
 tags:
+- math
 - sft
 ---
 # math_model
+Fine-tuned version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for math reasoning.

chat_template.jinja CHANGED

@@ -10,13 +10,9 @@
     {%- endfor %}
     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
-    {#- BOXING INSTRUCTION injected so it is present even when the caller (e.g. CI)
-        passes no system message. Targets our no-box / multi-box failure mode. #}
-    {{- '<|im_start|>system\n' }}
     {%- if messages[0].role == 'system' %}
-        {{- messages[0].content + '\n\n' }}
     {%- endif %}
-    {{- 'When solving a math problem, work through it step by step and write your final answer inside a \boxed{...}.<|im_end|>\n' }}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}

     {%- endfor %}
     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
     {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
     {%- endif %}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}
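
The effect of this template change on the no-tools branch can be sketched in plain Python. The two functions below are hypothetical re-implementations of the removed and the added Jinja logic, written only to compare their output; they are not how Transformers renders chat templates. The `messages and` guard for an empty list is an addition of the sketch, and the instruction text is written with a literal `\boxed` without checking how Jinja treats that backslash in the original string.

```python
BOXING_INSTRUCTION = (
    "When solving a math problem, work through it step by step "
    "and write your final answer inside a \\boxed{...}."
)


def old_system_prefix(messages):
    """Removed behaviour: always emit a system block that ends with the boxing instruction."""
    out = "<|im_start|>system\n"
    if messages and messages[0]["role"] == "system":
        out += messages[0]["content"] + "\n\n"
    return out + BOXING_INSTRUCTION + "<|im_end|>\n"


def new_system_prefix(messages):
    """New behaviour: emit a system block only when the caller supplies a system message."""
    if messages and messages[0]["role"] == "system":
        return "<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n"
    return ""


user_only = [{"role": "user", "content": "What is 2 + 2?"}]
with_system = [{"role": "system", "content": "Be brief."}] + user_only

old_without_system = old_system_prefix(user_only)
new_without_system = new_system_prefix(user_only)
new_with_system = new_system_prefix(with_system)
```

With the new template, a caller that passes no system message gets no system block at all, so any boxing instruction has to come from the caller's own prompt.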

checkpoint-1011/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

checkpoint-1011/config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.7.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
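
As a rough cross-check of this config, the parameter count can be estimated from the listed dimensions. The sketch below assumes the usual Qwen3 decoder layout: bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, per-head `q_norm` and `k_norm` weights of size `head_dim`, one final norm, and an output head tied to the embeddings. That layout is an assumption of the sketch and is not stated in the diff; `count_parameters` is an illustrative helper.

```python
cfg = {
    "hidden_size": 2048,
    "intermediate_size": 6144,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": True,
}


def count_parameters(cfg):
    """Estimate the parameter count under the assumed Qwen3 layer layout."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    norms = 2 * h + 2 * cfg["head_dim"]                 # layer norms plus q_norm / k_norm
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total_params = count_parameters(cfg)
bf16_bytes = 2 * total_params  # "dtype": "bfloat16" stores 2 bytes per parameter
```

Under these assumptions the estimate is about 1.72 billion parameters, and `bf16_bytes` comes out 35,656 bytes below the 3,441,185,608-byte `model.safetensors` pointer in this PR, a gap small enough to be file metadata.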

checkpoint-1011/generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.7.0"
+}
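
To illustrate what these sampling settings do, here is a simplified pure-Python sketch of temperature scaling followed by top-k and then top-p (nucleus) filtering. It is not the Transformers implementation; `filter_probs` is a hypothetical helper, and the order of operations and tie handling are assumptions of the sketch.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature below 1 sharpens the distribution before filtering.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalised softmax weights

    # Top-k: keep the k highest-weight tokens, most likely first.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in order)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens so the result sums to 1.
    norm = sum(weights[i] for i in kept)
    return {i: weights[i] / norm for i in kept}


example = filter_probs([2.0, 1.0, 0.0, -1.0])
greedy_like = filter_probs([2.0, 1.0, 0.0, -1.0], top_k=1)
```

A token would then be drawn at random from the returned distribution, which is what `"do_sample": true` requests.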

checkpoint-1011/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3af4557502ae7c7853ed3b684aa50cfd00317aafc237f2dda688075bb3dc81c1
+size 3441185608

checkpoint-1011/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:016daefe3fcafd2822ed799155138cb8e476fe309f9070f5157970a18bcc6d4c
+size 6882572207

checkpoint-1011/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a9f217e852f439efa6bd32fde98d6867f11aa6ea13ddc021ba10af6a0b0934
+size 14645

checkpoint-1011/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ee946d348a6273229e02f92d4563cb2ba7ac75075130a9d6c5c0eb31d584ddc
+size 1465

checkpoint-1011/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650

checkpoint-1011/tokenizer_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-1011/trainer_state.json ADDED

	@@ -0,0 +1,1044 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 1011,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": NaN,
+      "epoch": 0.029717682020802376,
+      "grad_norm": NaN,
+      "learning_rate": 1.8000000000000001e-06,
+      "loss": 1005514863411.2,
+      "mean_token_accuracy": 0.07770282882993343,
+      "num_tokens": 97764.0,
+      "step": 10
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.05943536404160475,
+      "grad_norm": NaN,
+      "learning_rate": 3.8000000000000005e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007392044077278115,
+      "num_tokens": 190372.0,
+      "step": 20
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.08915304606240713,
+      "grad_norm": NaN,
+      "learning_rate": 5.8e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000383764869911829,
+      "num_tokens": 280928.0,
+      "step": 30
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.1188707280832095,
+      "grad_norm": NaN,
+      "learning_rate": 7.800000000000002e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0003212536917999387,
+      "num_tokens": 380644.0,
+      "step": 40
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.1485884101040119,
+      "grad_norm": NaN,
+      "learning_rate": 9.800000000000001e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0003853117574180942,
+      "num_tokens": 475467.0,
+      "step": 50
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.17830609212481427,
+      "grad_norm": NaN,
+      "learning_rate": 1.18e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0003134153150313068,
+      "num_tokens": 572411.0,
+      "step": 60
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.20802377414561665,
+      "grad_norm": NaN,
+      "learning_rate": 1.38e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011001345737895463,
+      "num_tokens": 661517.0,
+      "step": 70
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.237741456166419,
+      "grad_norm": NaN,
+      "learning_rate": 1.58e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012231820757733658,
+      "num_tokens": 752142.0,
+      "step": 80
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.2674591381872214,
+      "grad_norm": NaN,
+      "learning_rate": 1.7800000000000002e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011551132018212229,
+      "num_tokens": 850399.0,
+      "step": 90
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.2971768202080238,
+      "grad_norm": NaN,
+      "learning_rate": 1.98e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00047943509634933433,
+      "num_tokens": 954759.0,
+      "step": 100
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.32689450222882616,
+      "grad_norm": NaN,
+      "learning_rate": 1.9802414928649837e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008438902725174558,
+      "num_tokens": 1050439.0,
+      "step": 110
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.35661218424962854,
+      "grad_norm": NaN,
+      "learning_rate": 1.958287596048299e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0009903057703922967,
+      "num_tokens": 1135089.0,
+      "step": 120
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.3863298662704309,
+      "grad_norm": NaN,
+      "learning_rate": 1.9363336992316137e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00040032515971688554,
+      "num_tokens": 1228149.0,
+      "step": 130
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.4160475482912333,
+      "grad_norm": NaN,
+      "learning_rate": 1.9143798024149288e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0010247966441966128,
+      "num_tokens": 1331672.0,
+      "step": 140
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.4457652303120357,
+      "grad_norm": NaN,
+      "learning_rate": 1.892425905598244e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000843161555530969,
+      "num_tokens": 1421314.0,
+      "step": 150
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.475482912332838,
+      "grad_norm": NaN,
+      "learning_rate": 1.8704720087815587e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007889680869993754,
+      "num_tokens": 1511701.0,
+      "step": 160
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.5052005943536404,
+      "grad_norm": NaN,
+      "learning_rate": 1.848518111964874e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007493881763366517,
+      "num_tokens": 1605063.0,
+      "step": 170
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.5349182763744428,
+      "grad_norm": NaN,
+      "learning_rate": 1.826564215148189e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000567350817436818,
+      "num_tokens": 1703107.0,
+      "step": 180
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.5646359583952452,
+      "grad_norm": NaN,
+      "learning_rate": 1.804610318331504e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00027060438151238486,
+      "num_tokens": 1796279.0,
+      "step": 190
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.5943536404160475,
+      "grad_norm": NaN,
+      "learning_rate": 1.782656421514819e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008226798396208324,
+      "num_tokens": 1893647.0,
+      "step": 200
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.6240713224368499,
+      "grad_norm": NaN,
+      "learning_rate": 1.760702524698134e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007338574505411088,
+      "num_tokens": 1982325.0,
+      "step": 210
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.6537890044576523,
+      "grad_norm": NaN,
+      "learning_rate": 1.7387486278814492e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004891513846814633,
+      "num_tokens": 2071869.0,
+      "step": 220
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.6835066864784547,
+      "grad_norm": NaN,
+      "learning_rate": 1.716794731064764e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00038509477453771976,
+      "num_tokens": 2168370.0,
+      "step": 230
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.7132243684992571,
+      "grad_norm": NaN,
+      "learning_rate": 1.694840834248079e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0009255842014681548,
+      "num_tokens": 2263211.0,
+      "step": 240
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.7429420505200595,
+      "grad_norm": NaN,
+      "learning_rate": 1.6728869374313943e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011857455348945223,
+      "num_tokens": 2352425.0,
+      "step": 250
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.7726597325408618,
+      "grad_norm": NaN,
+      "learning_rate": 1.6509330406147094e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008204675337765366,
+      "num_tokens": 2445217.0,
+      "step": 260
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.8023774145616642,
+      "grad_norm": NaN,
+      "learning_rate": 1.6289791437980242e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0013553448639868292,
+      "num_tokens": 2537677.0,
+      "step": 270
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.8320950965824666,
+      "grad_norm": NaN,
+      "learning_rate": 1.6070252469813393e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011630529144895263,
+      "num_tokens": 2631646.0,
+      "step": 280
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.861812778603269,
+      "grad_norm": NaN,
+      "learning_rate": 1.5850713501646545e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0009409735575900413,
+      "num_tokens": 2722632.0,
+      "step": 290
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.8915304606240714,
+      "grad_norm": NaN,
+      "learning_rate": 1.5631174533479693e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006484775040007662,
+      "num_tokens": 2819185.0,
+      "step": 300
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.9212481426448736,
+      "grad_norm": NaN,
+      "learning_rate": 1.5411635565312844e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011682231881422922,
+      "num_tokens": 2910483.0,
+      "step": 310
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.950965824665676,
+      "grad_norm": NaN,
+      "learning_rate": 1.5192096597145995e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004708930806373246,
+      "num_tokens": 2997057.0,
+      "step": 320
+    },
+    {
+      "entropy": NaN,
+      "epoch": 0.9806835066864784,
+      "grad_norm": NaN,
+      "learning_rate": 1.4972557628979145e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011955970534472727,
+      "num_tokens": 3091284.0,
+      "step": 330
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.0089153046062407,
+      "grad_norm": NaN,
+      "learning_rate": 1.4753018660812295e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006843029964090276,
+      "num_tokens": 3177800.0,
+      "step": 340
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.0386329866270432,
+      "grad_norm": NaN,
+      "learning_rate": 1.4533479692645444e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0005406756106822285,
+      "num_tokens": 3271989.0,
+      "step": 350
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.0683506686478454,
+      "grad_norm": NaN,
+      "learning_rate": 1.4313940724478596e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007009803419350646,
+      "num_tokens": 3365082.0,
+      "step": 360
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.098068350668648,
+      "grad_norm": NaN,
+      "learning_rate": 1.4094401756311746e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.001003894711175235,
+      "num_tokens": 3465865.0,
+      "step": 370
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.1277860326894502,
+      "grad_norm": NaN,
+      "learning_rate": 1.3874862788144897e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004136150790145621,
+      "num_tokens": 3564998.0,
+      "step": 380
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.1575037147102525,
+      "grad_norm": NaN,
+      "learning_rate": 1.3655323819978048e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000998112405068241,
+      "num_tokens": 3661508.0,
+      "step": 390
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.187221396731055,
+      "grad_norm": NaN,
+      "learning_rate": 1.3435784851811198e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0009621107514249161,
+      "num_tokens": 3755395.0,
+      "step": 400
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.2169390787518575,
+      "grad_norm": NaN,
+      "learning_rate": 1.3216245883644348e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004976072828867473,
+      "num_tokens": 3842994.0,
+      "step": 410
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.2466567607726597,
+      "grad_norm": NaN,
+      "learning_rate": 1.2996706915477499e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0003434549740632065,
+      "num_tokens": 3934618.0,
+      "step": 420
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.276374442793462,
+      "grad_norm": NaN,
+      "learning_rate": 1.2777167947310649e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00026364591685705817,
+      "num_tokens": 4026711.0,
+      "step": 430
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.3060921248142645,
+      "grad_norm": NaN,
+      "learning_rate": 1.2557628979143798e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006349188319290988,
+      "num_tokens": 4114609.0,
+      "step": 440
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.3358098068350668,
+      "grad_norm": NaN,
+      "learning_rate": 1.2338090010976948e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004265022376785055,
+      "num_tokens": 4203953.0,
+      "step": 450
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.3655274888558693,
+      "grad_norm": NaN,
+      "learning_rate": 1.2118551042810101e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012633154692593962,
+      "num_tokens": 4298319.0,
+      "step": 460
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.3952451708766715,
+      "grad_norm": NaN,
+      "learning_rate": 1.189901207464325e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0011256658195634372,
+      "num_tokens": 4396020.0,
+      "step": 470
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.424962852897474,
+      "grad_norm": NaN,
+      "learning_rate": 1.16794731064764e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0013131745137798135,
+      "num_tokens": 4491203.0,
+      "step": 480
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.4546805349182763,
+      "grad_norm": NaN,
+      "learning_rate": 1.1459934138309552e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012716218378045596,
+      "num_tokens": 4582199.0,
+      "step": 490
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.4843982169390788,
+      "grad_norm": NaN,
+      "learning_rate": 1.1240395170142701e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0014869970778818241,
+      "num_tokens": 4672589.0,
+      "step": 500
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.514115898959881,
+      "grad_norm": NaN,
+      "learning_rate": 1.1020856201975851e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00046981508567114363,
+      "num_tokens": 4767693.0,
+      "step": 510
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.5438335809806834,
+      "grad_norm": NaN,
+      "learning_rate": 1.0801317233809e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000601445968641201,
+      "num_tokens": 4860371.0,
+      "step": 520
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.5735512630014858,
+      "grad_norm": NaN,
+      "learning_rate": 1.0581778265642154e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012073652920662425,
+      "num_tokens": 4950622.0,
+      "step": 530
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.6032689450222883,
+      "grad_norm": NaN,
+      "learning_rate": 1.0362239297475303e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00028707647215924225,
+      "num_tokens": 5044878.0,
+      "step": 540
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.6329866270430906,
+      "grad_norm": NaN,
+      "learning_rate": 1.0142700329308453e-05,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004101752907445189,
+      "num_tokens": 5140938.0,
+      "step": 550
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.6627043090638929,
+      "grad_norm": NaN,
+      "learning_rate": 9.923161361141603e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008072987395280507,
+      "num_tokens": 5232852.0,
+      "step": 560
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.6924219910846954,
+      "grad_norm": NaN,
+      "learning_rate": 9.703622392974754e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012222853780258446,
+      "num_tokens": 5321835.0,
+      "step": 570
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.7221396731054979,
+      "grad_norm": NaN,
+      "learning_rate": 9.484083424807904e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.001064640874392353,
+      "num_tokens": 5416464.0,
+      "step": 580
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.7518573551263001,
+      "grad_norm": NaN,
+      "learning_rate": 9.264544456641055e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0009612867230316624,
+      "num_tokens": 5515169.0,
+      "step": 590
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.7815750371471024,
+      "grad_norm": NaN,
+      "learning_rate": 9.045005488474205e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008774219051701948,
+      "num_tokens": 5611971.0,
+      "step": 600
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.811292719167905,
+      "grad_norm": NaN,
+      "learning_rate": 8.825466520307355e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007863394406740553,
+      "num_tokens": 5704255.0,
+      "step": 610
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.8410104011887074,
+      "grad_norm": NaN,
+      "learning_rate": 8.605927552140506e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000578320066415472,
+      "num_tokens": 5800440.0,
+      "step": 620
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.8707280832095097,
+      "grad_norm": NaN,
+      "learning_rate": 8.386388583973656e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0004275172817870043,
+      "num_tokens": 5890898.0,
+      "step": 630
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.900445765230312,
+      "grad_norm": NaN,
+      "learning_rate": 8.166849615806807e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.001425291335908696,
+      "num_tokens": 5987228.0,
+      "step": 640
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.9301634472511144,
+      "grad_norm": NaN,
+      "learning_rate": 7.947310647639957e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006059414023184218,
+      "num_tokens": 6076579.0,
+      "step": 650
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.959881129271917,
+      "grad_norm": NaN,
+      "learning_rate": 7.727771679473108e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008722793892957271,
+      "num_tokens": 6171013.0,
+      "step": 660
+    },
+    {
+      "entropy": NaN,
+      "epoch": 1.9895988112927192,
+      "grad_norm": NaN,
+      "learning_rate": 7.508232711306257e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007748565993097145,
+      "num_tokens": 6265432.0,
+      "step": 670
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.0178306092124814,
+      "grad_norm": NaN,
+      "learning_rate": 7.288693743139408e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007497724390735751,
+      "num_tokens": 6349548.0,
+      "step": 680
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.0475482912332836,
+      "grad_norm": NaN,
+      "learning_rate": 7.069154774972559e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006858149892650545,
+      "num_tokens": 6440481.0,
+      "step": 690
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.0772659732540864,
+      "grad_norm": NaN,
+      "learning_rate": 6.849615806805708e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008518316331901588,
+      "num_tokens": 6533227.0,
+      "step": 700
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.1069836552748886,
+      "grad_norm": NaN,
+      "learning_rate": 6.630076838638859e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006546900600369554,
+      "num_tokens": 6631305.0,
+      "step": 710
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.136701337295691,
+      "grad_norm": NaN,
+      "learning_rate": 6.410537870472009e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006311897363048047,
+      "num_tokens": 6719619.0,
+      "step": 720
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.166419019316493,
+      "grad_norm": NaN,
+      "learning_rate": 6.19099890230516e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008787176091573201,
+      "num_tokens": 6808489.0,
+      "step": 730
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.196136701337296,
+      "grad_norm": NaN,
+      "learning_rate": 5.9714599341383095e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0010024115661508403,
+      "num_tokens": 6904544.0,
+      "step": 740
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.225854383358098,
+      "grad_norm": NaN,
+      "learning_rate": 5.75192096597146e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006462240009568632,
+      "num_tokens": 6998909.0,
+      "step": 750
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.2555720653789004,
+      "grad_norm": NaN,
+      "learning_rate": 5.532381997804611e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0005598854688287247,
+      "num_tokens": 7093305.0,
+      "step": 760
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.2852897473997027,
+      "grad_norm": NaN,
+      "learning_rate": 5.312843029637761e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00036793685212614945,
+      "num_tokens": 7184216.0,
+      "step": 770
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.315007429420505,
+      "grad_norm": NaN,
+      "learning_rate": 5.093304061470912e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000779397370206425,
+      "num_tokens": 7273139.0,
+      "step": 780
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.3447251114413077,
+      "grad_norm": NaN,
+      "learning_rate": 4.873765093304062e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006776292997528799,
+      "num_tokens": 7375877.0,
+      "step": 790
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.37444279346211,
+      "grad_norm": NaN,
+      "learning_rate": 4.654226125137213e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0013253977747808676,
+      "num_tokens": 7471802.0,
+      "step": 800
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.4041604754829122,
+      "grad_norm": NaN,
+      "learning_rate": 4.434687156970362e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007797809317708016,
+      "num_tokens": 7568137.0,
+      "step": 810
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.433878157503715,
+      "grad_norm": NaN,
+      "learning_rate": 4.215148188803513e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0005893595473025926,
+      "num_tokens": 7662880.0,
+      "step": 820
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.463595839524517,
+      "grad_norm": NaN,
+      "learning_rate": 3.995609220636663e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007197739760158584,
+      "num_tokens": 7755580.0,
+      "step": 830
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.4933135215453195,
+      "grad_norm": NaN,
+      "learning_rate": 3.776070252469814e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000664884723664727,
+      "num_tokens": 7854588.0,
+      "step": 840
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.5230312035661218,
+      "grad_norm": NaN,
+      "learning_rate": 3.5565312843029644e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000588176279416075,
+      "num_tokens": 7950417.0,
+      "step": 850
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.552748885586924,
+      "grad_norm": NaN,
+      "learning_rate": 3.3369923161361144e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007997245862497948,
+      "num_tokens": 8043286.0,
+      "step": 860
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.5824665676077267,
+      "grad_norm": NaN,
+      "learning_rate": 3.117453347969265e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0005276548559777439,
+      "num_tokens": 8138513.0,
+      "step": 870
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.612184249628529,
+      "grad_norm": NaN,
+      "learning_rate": 2.897914379802415e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00023305922804865985,
+      "num_tokens": 8228539.0,
+      "step": 880
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.6419019316493313,
+      "grad_norm": NaN,
+      "learning_rate": 2.6783754116355656e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0005359479633625597,
+      "num_tokens": 8320083.0,
+      "step": 890
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.6716196136701336,
+      "grad_norm": NaN,
+      "learning_rate": 2.458836443468716e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0006636981313931755,
+      "num_tokens": 8416446.0,
+      "step": 900
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.7013372956909363,
+      "grad_norm": NaN,
+      "learning_rate": 2.239297475301866e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.002058879991818685,
+      "num_tokens": 8513551.0,
+      "step": 910
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.7310549777117386,
+      "grad_norm": NaN,
+      "learning_rate": 2.0197585071350167e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.000950983911025105,
+      "num_tokens": 8603909.0,
+      "step": 920
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.760772659732541,
+      "grad_norm": NaN,
+      "learning_rate": 1.800219538968167e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0014912645376170986,
+      "num_tokens": 8695081.0,
+      "step": 930
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.790490341753343,
+      "grad_norm": NaN,
+      "learning_rate": 1.5806805708013173e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.001528770748700481,
+      "num_tokens": 8790760.0,
+      "step": 940
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.8202080237741454,
+      "grad_norm": NaN,
+      "learning_rate": 1.3611416026344676e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0012167295863036998,
+      "num_tokens": 8879103.0,
+      "step": 950
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.849925705794948,
+      "grad_norm": NaN,
+      "learning_rate": 1.1416026344676181e-06,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007455661499989219,
+      "num_tokens": 8972354.0,
+      "step": 960
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.8796433878157504,
+      "grad_norm": NaN,
+      "learning_rate": 9.220636663007684e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0003935990913305432,
+      "num_tokens": 9069381.0,
+      "step": 970
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.9093610698365526,
+      "grad_norm": NaN,
+      "learning_rate": 7.025246981339188e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00037132185607333665,
+      "num_tokens": 9164901.0,
+      "step": 980
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.9390787518573553,
+      "grad_norm": NaN,
+      "learning_rate": 4.829857299670692e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.00036687121028080585,
+      "num_tokens": 9258811.0,
+      "step": 990
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.9687964338781576,
+      "grad_norm": NaN,
+      "learning_rate": 2.6344676180021955e-07,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0008805370052868966,
+      "num_tokens": 9351174.0,
+      "step": 1000
+    },
+    {
+      "entropy": NaN,
+      "epoch": 2.99851411589896,
+      "grad_norm": NaN,
+      "learning_rate": 4.390779363336993e-08,
+      "loss": 0.0,
+      "mean_token_accuracy": 0.0007362255608313717,
+      "num_tokens": 9440239.0,
+      "step": 1010
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1011,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.0120089572705894e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
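
The `log_history` entries shown above report `NaN` for `entropy` and `grad_norm` alongside a `loss` of 0.0 and near-zero `mean_token_accuracy`, a pattern that usually signals the run diverged. A minimal sketch for locating the first such step in a `trainer_state.json` follows. The function name `first_non_finite_step` and the inline sample are illustrative assumptions, not part of this repository. Python's `json` module accepts the bare `NaN` token by default, whereas strict JSON parsers reject it.

```python
import json
import math


def first_non_finite_step(state, keys=("grad_norm", "entropy")):
    """Return the first logged step whose value for any key is NaN or inf."""
    for entry in state.get("log_history", []):
        for key in keys:
            value = entry.get(key)
            # Only floats can be NaN/inf; missing keys and ints are skipped.
            if isinstance(value, float) and not math.isfinite(value):
                return entry.get("step")
    return None


# Hypothetical two-entry sample in the same shape as trainer_state.json.
sample = (
    '{"log_history": ['
    '{"step": 10, "grad_norm": 7.46875, "entropy": 0.286},'
    '{"step": 20, "grad_norm": NaN, "entropy": NaN}'
    ']}'
)
state = json.loads(sample)  # json.loads parses the bare NaN token
first_bad = first_non_finite_step(state)
print(first_bad)
```

To check a real checkpoint, load the file with `json.load(open(path))` and pass the result to the same function; a return value of `None` means no non-finite values were logged for the chosen keys.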

checkpoint-1011/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ae0f7084660199ed7fd16027c0c6af5e726790f32a022b019f1277bec4e01142
+size 5713

checkpoint-125/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

checkpoint-125/config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.7.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}

checkpoint-125/generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.7.0"
+}

checkpoint-125/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7074b51b321535051b77436f30720111854403259c583a273488e061c0d23072
+size 3441185608

checkpoint-125/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b569f0710425a088cc5cd95c2def7bbb89539cdd1c3b41312c8d8fc136d2e186
+size 6882572207

checkpoint-125/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:61c19bab1174704a4a4441475683bf1270277af15d2e2c95e964789128e482c4
+size 14645

checkpoint-125/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:770e85488b2d6f2b41a590e6c2bb599f3a174081e14b8c48d3638681b4a06597
+size 1465

checkpoint-125/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650

checkpoint-125/tokenizer_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-125/trainer_state.json ADDED

	@@ -0,0 +1,154 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 125,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 0.2860680317506194,
+      "epoch": 0.08,
+      "grad_norm": 7.46875,
+      "learning_rate": 1.5000000000000002e-05,
+      "loss": 0.7724831104278564,
+      "mean_token_accuracy": 0.8434554561972618,
+      "num_tokens": 94518.0,
+      "step": 10
+    },
+    {
+      "entropy": 0.3743372498080134,
+      "epoch": 0.16,
+      "grad_norm": 2.6875,
+      "learning_rate": 1.8761061946902657e-05,
+      "loss": 0.4372231483459473,
+      "mean_token_accuracy": 0.8732902690768242,
+      "num_tokens": 187445.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.3575092004612088,
+      "epoch": 0.24,
+      "grad_norm": 2.625,
+      "learning_rate": 1.6991150442477876e-05,
+      "loss": 0.3412153959274292,
+      "mean_token_accuracy": 0.8936023108661175,
+      "num_tokens": 280610.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.31294157058000566,
+      "epoch": 0.32,
+      "grad_norm": 2.421875,
+      "learning_rate": 1.5221238938053098e-05,
+      "loss": 0.3175334453582764,
+      "mean_token_accuracy": 0.8971926532685757,
+      "num_tokens": 379577.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.30759106744080783,
+      "epoch": 0.4,
+      "grad_norm": 1.7421875,
+      "learning_rate": 1.345132743362832e-05,
+      "loss": 0.32603118419647215,
+      "mean_token_accuracy": 0.8967837899923324,
+      "num_tokens": 477577.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.31574366632848977,
+      "epoch": 0.48,
+      "grad_norm": 1.9921875,
+      "learning_rate": 1.1681415929203541e-05,
+      "loss": 0.323792552947998,
+      "mean_token_accuracy": 0.8988608829677105,
+      "num_tokens": 573360.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.3065364420413971,
+      "epoch": 0.56,
+      "grad_norm": 1.6171875,
+      "learning_rate": 9.911504424778762e-06,
+      "loss": 0.32266840934753416,
+      "mean_token_accuracy": 0.8985385209321975,
+      "num_tokens": 665761.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.27206654362380506,
+      "epoch": 0.64,
+      "grad_norm": 2.1875,
+      "learning_rate": 8.141592920353984e-06,
+      "loss": 0.2854612350463867,
+      "mean_token_accuracy": 0.9085792370140553,
+      "num_tokens": 756921.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.28962985165417193,
+      "epoch": 0.72,
+      "grad_norm": 1.8125,
+      "learning_rate": 6.371681415929204e-06,
+      "loss": 0.29041600227355957,
+      "mean_token_accuracy": 0.9059965394437313,
+      "num_tokens": 845171.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.2753044681623578,
+      "epoch": 0.8,
+      "grad_norm": 2.09375,
+      "learning_rate": 4.6017699115044254e-06,
+      "loss": 0.2797576427459717,
+      "mean_token_accuracy": 0.9101039692759514,
+      "num_tokens": 939550.0,
+      "step": 100
+    },
+    {
+      "entropy": 0.2808386910706758,
+      "epoch": 0.88,
+      "grad_norm": 1.5859375,
+      "learning_rate": 2.831858407079646e-06,
+      "loss": 0.2893812894821167,
+      "mean_token_accuracy": 0.9063553236424923,
+      "num_tokens": 1041441.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.2944059643894434,
+      "epoch": 0.96,
+      "grad_norm": 1.6875,
+      "learning_rate": 1.0619469026548673e-06,
+      "loss": 0.2996867895126343,
+      "mean_token_accuracy": 0.9011975660920143,
+      "num_tokens": 1132794.0,
+      "step": 120
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 125,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.2589656731283456e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
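
The `learning_rate` values logged above are consistent with a linear warmup followed by a linear decay to zero. The sketch below reconstructs that schedule and checks it against three logged points. The peak rate of 2e-5 and the 12 warmup steps are inferred from the logged numbers, not stated anywhere in this checkpoint, and the one-step offset (the value logged at step s matching the schedule at s - 1) is likewise an observation about these numbers rather than a documented guarantee.

```python
import math

PEAK_LR = 2e-5      # assumption: inferred from the logged values
WARMUP_STEPS = 12   # assumption: inferred from the logged values
MAX_STEPS = 125     # from "max_steps" in trainer_state.json


def linear_lr(scheduler_step):
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - scheduler_step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


# Logged (step, learning_rate) pairs copied from log_history.
logged = {
    10: 1.5000000000000002e-05,
    20: 1.8761061946902657e-05,
    120: 1.0619469026548673e-06,
}

# Compare each logged value with the schedule evaluated one step earlier.
matches = {
    step: math.isclose(linear_lr(step - 1), lr, rel_tol=1e-9)
    for step, lr in logged.items()
}
print(matches)
```

If the inferred constants were wrong, at least one of the three comparisons would fail, so the check doubles as a quick way to test other guesses for the warmup length or peak rate.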

checkpoint-125/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4e0921a533d54992c732b7cf0c8a29ef97e50bbd333d4e4c07ae7183a1feb0f
+size 5713

checkpoint-500/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
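
The assistant branch of the template above recovers reasoning text from a message that embeds it in `<think>` tags. A minimal Python sketch of the same string operations follows. The function name `split_reasoning` is hypothetical; it is not part of the template or of Transformers.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Mirror the template's split of an assistant message into
    (reasoning_content, visible_content)."""
    reasoning = ""
    if "</think>" in content:
        # Text before the first </think>, taken after the last <think> in that prefix.
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        # Everything after the last </think> is the visible answer.
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning, content


reasoning, answer = split_reasoning("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(repr(reasoning))
print(repr(answer))
```

A message without a `</think>` tag passes through unchanged with an empty reasoning string, which matches the template's default of `reasoning_content = ''`.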

checkpoint-500/config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.7.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
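
The fields in this config are enough to estimate the parameter count. The sketch below assumes the usual Qwen3 decoder layout: bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorm weights per layer plus per-head q/k norms of size `head_dim`, and one final norm. That layout is an assumption, not something the diff states, and `count_qwen3_params` is a hypothetical helper.

```python
def count_qwen3_params(cfg: dict) -> int:
    """Estimate the parameter count from config.json fields (assumed layout)."""
    h = cfg["hidden_size"]
    d = cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * d       # width of the query projection
    kv_out = cfg["num_key_value_heads"] * d      # width of each key/value projection
    attn = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o (no biases)
    mlp = 3 * h * cfg["intermediate_size"]       # gate, up, down projections
    norms = 2 * h + 2 * d                        # two layer norms + q/k norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    return embed + cfg["num_hidden_layers"] * (attn + mlp + norms) + h + lm_head


cfg = {
    "hidden_size": 2048,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "intermediate_size": 6144,
    "num_hidden_layers": 28,
    "vocab_size": 151936,
    "tie_word_embeddings": True,
}
n_params = count_qwen3_params(cfg)
print(n_params, n_params * 2)  # parameter count, and bytes at 2 bytes per bfloat16 value
```

Under these assumptions the estimate is 1,720,574,976 parameters, or 3,441,149,952 bytes in bfloat16. That is 35,656 bytes below the `model.safetensors` size listed in this diff, a gap that would be consistent with the file's metadata header.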

checkpoint-500/generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.7.0"
+}
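
To illustrate what `temperature`, `top_k`, and `top_p` do when `do_sample` is true, here is a pure-Python sketch of one common formulation: scale the logits, keep the `top_k` best tokens, then keep the smallest prefix whose cumulative probability reaches `top_p`. The function `filter_probs` is hypothetical, and the ordering and tie handling inside Transformers may differ in detail.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    scaled = [x / temperature for x in logits]
    # Top-k: indices of the k highest-scoring tokens, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    top = scaled[order[0]]
    weights = [math.exp(scaled[i] - top) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(order, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}


print(filter_probs([2.0, 1.0, 0.0, -5.0]))
```

A next token would then be drawn from the returned distribution, for example with `random.choices`.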

checkpoint-500/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:079565c41b7390c29162b2177ece2956e65a2c94398b94f8fbdcc9aaea1e5987
+size 3441185608
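
The three added lines are a Git LFS pointer rather than the weights themselves: a spec version, a content hash, and the real file size in bytes. A small sketch of reading such a pointer follows, assuming the leading `+` diff markers have been stripped; `parse_lfs_pointer` is a hypothetical helper, not a Git LFS API.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex digest>"
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:079565c41b7390c29162b2177ece2956e65a2c94398b94f8fbdcc9aaea1e5987
size 3441185608
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```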

checkpoint-500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e11662089759b5ab33db1036a43acc423329b165030883aa32639c9ab33f035f
+size 6882572207

checkpoint-500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f196323d7423b60f8e4ceb7dbf8715ee326c0d068e5ff164f13c63b279b9f1a0
+size 14645

checkpoint-500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:96ae61b7327de513c572cbdd449d40136fa2f4fac8357d34b3d1a844dc5168d3
+size 1465

checkpoint-500/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650

checkpoint-500/tokenizer_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-500/trainer_state.json ADDED

	@@ -0,0 +1,589 @@

+{
+  "best_global_step": 500,
+  "best_metric": 0.2841346263885498,
+  "best_model_checkpoint": "/scratch/checkpoints/math_sft_v2/checkpoint-500",
+  "epoch": 1.6507018992568125,
+  "eval_steps": 100,
+  "global_step": 500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 0.26208699569106103,
+      "epoch": 0.03303055326176713,
+      "grad_norm": 13.3125,
+      "learning_rate": 1.5e-06,
+      "loss": 0.8376299858093261,
+      "mean_token_accuracy": 0.8421687923371792,
+      "num_tokens": 98215.0,
+      "step": 10
+    },
+    {
+      "entropy": 0.2689803125336766,
+      "epoch": 0.06606110652353427,
+      "grad_norm": 11.5625,
+      "learning_rate": 3.1666666666666667e-06,
+      "loss": 0.853145980834961,
+      "mean_token_accuracy": 0.8399689704179764,
+      "num_tokens": 191821.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.28777916859835384,
+      "epoch": 0.0990916597853014,
+      "grad_norm": 12.375,
+      "learning_rate": 4.833333333333333e-06,
+      "loss": 0.7679028034210205,
+      "mean_token_accuracy": 0.8431851923465729,
+      "num_tokens": 285491.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.3033197607845068,
+      "epoch": 0.13212221304706853,
+      "grad_norm": 5.375,
+      "learning_rate": 6.5000000000000004e-06,
+      "loss": 0.6137509822845459,
+      "mean_token_accuracy": 0.8549904882907867,
+      "num_tokens": 381273.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.34702248610556125,
+      "epoch": 0.16515276630883569,
+      "grad_norm": 4.25,
+      "learning_rate": 8.166666666666668e-06,
+      "loss": 0.5155063152313233,
+      "mean_token_accuracy": 0.8607264116406441,
+      "num_tokens": 476481.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.35821260381489994,
+      "epoch": 0.1981833195706028,
+      "grad_norm": 2.890625,
+      "learning_rate": 9.833333333333333e-06,
+      "loss": 0.42441272735595703,
+      "mean_token_accuracy": 0.8774531938135623,
+      "num_tokens": 570979.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.33337059412151576,
+      "epoch": 0.23121387283236994,
+      "grad_norm": 3.15625,
+      "learning_rate": 9.835164835164835e-06,
+      "loss": 0.34101109504699706,
+      "mean_token_accuracy": 0.8939957685768605,
+      "num_tokens": 664789.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.30876336991786957,
+      "epoch": 0.26424442609413706,
+      "grad_norm": 1.8515625,
+      "learning_rate": 9.652014652014653e-06,
+      "loss": 0.32317628860473635,
+      "mean_token_accuracy": 0.8987478196620942,
+      "num_tokens": 764246.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.30824985895305873,
+      "epoch": 0.2972749793559042,
+      "grad_norm": 2.15625,
+      "learning_rate": 9.46886446886447e-06,
+      "loss": 0.31807353496551516,
+      "mean_token_accuracy": 0.896314675360918,
+      "num_tokens": 855689.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.31522620394825934,
+      "epoch": 0.33030553261767137,
+      "grad_norm": 1.8828125,
+      "learning_rate": 9.285714285714288e-06,
+      "loss": 0.338785982131958,
+      "mean_token_accuracy": 0.89400485008955,
+      "num_tokens": 949366.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.33030553261767137,
+      "eval_entropy": 0.3022541454544774,
+      "eval_loss": 0.3146008253097534,
+      "eval_mean_token_accuracy": 0.8988911988558592,
+      "eval_num_tokens": 949366.0,
+      "eval_runtime": 17.914,
+      "eval_samples_per_second": 30.088,
+      "eval_steps_per_second": 15.072,
+      "step": 100
+    },
+    {
+      "entropy": 0.2985268572345376,
+      "epoch": 0.3633360858794385,
+      "grad_norm": 2.171875,
+      "learning_rate": 9.102564102564104e-06,
+      "loss": 0.31417596340179443,
+      "mean_token_accuracy": 0.89812126532197,
+      "num_tokens": 1040854.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.3040375892072916,
+      "epoch": 0.3963666391412056,
+      "grad_norm": 1.9609375,
+      "learning_rate": 8.91941391941392e-06,
+      "loss": 0.32050628662109376,
+      "mean_token_accuracy": 0.8970061101019382,
+      "num_tokens": 1132424.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.3125588171184063,
+      "epoch": 0.42939719240297275,
+      "grad_norm": 1.8203125,
+      "learning_rate": 8.736263736263737e-06,
+      "loss": 0.3176266193389893,
+      "mean_token_accuracy": 0.8971590586006641,
+      "num_tokens": 1221899.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.312872576713562,
+      "epoch": 0.4624277456647399,
+      "grad_norm": 1.78125,
+      "learning_rate": 8.553113553113553e-06,
+      "loss": 0.3343791484832764,
+      "mean_token_accuracy": 0.8940604299306869,
+      "num_tokens": 1315052.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.27830475065857174,
+      "epoch": 0.495458298926507,
+      "grad_norm": 2.0,
+      "learning_rate": 8.369963369963371e-06,
+      "loss": 0.28718521595001223,
+      "mean_token_accuracy": 0.9051955968141556,
+      "num_tokens": 1401947.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.3005422970280051,
+      "epoch": 0.5284888521882741,
+      "grad_norm": 2.34375,
+      "learning_rate": 8.186813186813188e-06,
+      "loss": 0.31569790840148926,
+      "mean_token_accuracy": 0.8999212823808194,
+      "num_tokens": 1494107.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.31200682315975425,
+      "epoch": 0.5615194054500413,
+      "grad_norm": 2.015625,
+      "learning_rate": 8.003663003663006e-06,
+      "loss": 0.32325453758239747,
+      "mean_token_accuracy": 0.8952137432992459,
+      "num_tokens": 1586474.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.2929861258715391,
+      "epoch": 0.5945499587118084,
+      "grad_norm": 2.09375,
+      "learning_rate": 7.820512820512822e-06,
+      "loss": 0.29693677425384524,
+      "mean_token_accuracy": 0.9031261466443539,
+      "num_tokens": 1678162.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.30670158881694076,
+      "epoch": 0.6275805119735756,
+      "grad_norm": 1.828125,
+      "learning_rate": 7.637362637362638e-06,
+      "loss": 0.31567308902740476,
+      "mean_token_accuracy": 0.8983482263982296,
+      "num_tokens": 1781401.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.2866023685783148,
+      "epoch": 0.6606110652353427,
+      "grad_norm": 1.6953125,
+      "learning_rate": 7.454212454212455e-06,
+      "loss": 0.29333360195159913,
+      "mean_token_accuracy": 0.904055442661047,
+      "num_tokens": 1868038.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.6606110652353427,
+      "eval_entropy": 0.2921138784951634,
+      "eval_loss": 0.29292193055152893,
+      "eval_mean_token_accuracy": 0.904505420172656,
+      "eval_num_tokens": 1868038.0,
+      "eval_runtime": 17.8732,
+      "eval_samples_per_second": 30.157,
+      "eval_steps_per_second": 15.106,
+      "step": 200
+    },
+    {
+      "entropy": 0.28792863376438615,
+      "epoch": 0.6936416184971098,
+      "grad_norm": 1.9609375,
+      "learning_rate": 7.271062271062271e-06,
+      "loss": 0.29235827922821045,
+      "mean_token_accuracy": 0.9050419509410859,
+      "num_tokens": 1959819.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.28098181951791046,
+      "epoch": 0.726672171758877,
+      "grad_norm": 1.8515625,
+      "learning_rate": 7.087912087912089e-06,
+      "loss": 0.2915766954421997,
+      "mean_token_accuracy": 0.9068245775997639,
+      "num_tokens": 2059113.0,
+      "step": 220
+    },
+    {
+      "entropy": 0.26028320472687483,
+      "epoch": 0.7597027250206441,
+      "grad_norm": 2.0,
+      "learning_rate": 6.9047619047619055e-06,
+      "loss": 0.2654874324798584,
+      "mean_token_accuracy": 0.9116183832287789,
+      "num_tokens": 2145321.0,
+      "step": 230
+    },
+    {
+      "entropy": 0.2855058180168271,
+      "epoch": 0.7927332782824112,
+      "grad_norm": 1.84375,
+      "learning_rate": 6.721611721611723e-06,
+      "loss": 0.2998212337493896,
+      "mean_token_accuracy": 0.9046589516103267,
+      "num_tokens": 2240859.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.2783821925520897,
+      "epoch": 0.8257638315441783,
+      "grad_norm": 1.9609375,
+      "learning_rate": 6.538461538461539e-06,
+      "loss": 0.28870720863342286,
+      "mean_token_accuracy": 0.9073394626379013,
+      "num_tokens": 2333798.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.277759512513876,
+      "epoch": 0.8587943848059455,
+      "grad_norm": 1.7734375,
+      "learning_rate": 6.3553113553113555e-06,
+      "loss": 0.29060728549957277,
+      "mean_token_accuracy": 0.9055352889001369,
+      "num_tokens": 2429910.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.2882726760581136,
+      "epoch": 0.8918249380677127,
+      "grad_norm": 1.703125,
+      "learning_rate": 6.172161172161173e-06,
+      "loss": 0.2988502740859985,
+      "mean_token_accuracy": 0.9038613699376583,
+      "num_tokens": 2524730.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.2855747708119452,
+      "epoch": 0.9248554913294798,
+      "grad_norm": 1.90625,
+      "learning_rate": 5.989010989010989e-06,
+      "loss": 0.289182186126709,
+      "mean_token_accuracy": 0.9049803033471108,
+      "num_tokens": 2615802.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.26923007648438213,
+      "epoch": 0.9578860445912469,
+      "grad_norm": 2.84375,
+      "learning_rate": 5.805860805860807e-06,
+      "loss": 0.27834036350250246,
+      "mean_token_accuracy": 0.9109352678060532,
+      "num_tokens": 2709073.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.2844470454379916,
+      "epoch": 0.990916597853014,
+      "grad_norm": 1.640625,
+      "learning_rate": 5.6227106227106235e-06,
+      "loss": 0.28873960971832274,
+      "mean_token_accuracy": 0.9069667845964432,
+      "num_tokens": 2805135.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.990916597853014,
+      "eval_entropy": 0.285272175570329,
+      "eval_loss": 0.28705036640167236,
+      "eval_mean_token_accuracy": 0.9062359571456909,
+      "eval_num_tokens": 2805135.0,
+      "eval_runtime": 17.9038,
+      "eval_samples_per_second": 30.105,
+      "eval_steps_per_second": 15.081,
+      "step": 300
+    },
+    {
+      "entropy": 0.2869852078266633,
+      "epoch": 1.023121387283237,
+      "grad_norm": 1.9375,
+      "learning_rate": 5.43956043956044e-06,
+      "loss": 0.2850792646408081,
+      "mean_token_accuracy": 0.9064911672702203,
+      "num_tokens": 2898984.0,
+      "step": 310
+    },
+    {
+      "entropy": 0.2757460119202733,
+      "epoch": 1.0561519405450042,
+      "grad_norm": 1.8359375,
+      "learning_rate": 5.256410256410257e-06,
+      "loss": 0.2779810905456543,
+      "mean_token_accuracy": 0.9096895463764667,
+      "num_tokens": 2991694.0,
+      "step": 320
+    },
+    {
+      "entropy": 0.2600112119689584,
+      "epoch": 1.0891824938067713,
+      "grad_norm": 2.0625,
+      "learning_rate": 5.0732600732600735e-06,
+      "loss": 0.2662019729614258,
+      "mean_token_accuracy": 0.912333931028843,
+      "num_tokens": 3078862.0,
+      "step": 330
+    },
+    {
+      "entropy": 0.27342566605657337,
+      "epoch": 1.1222130470685383,
+      "grad_norm": 1.921875,
+      "learning_rate": 4.890109890109891e-06,
+      "loss": 0.28096842765808105,
+      "mean_token_accuracy": 0.9091624312102795,
+      "num_tokens": 3174256.0,
+      "step": 340
+    },
+    {
+      "entropy": 0.2671937083825469,
+      "epoch": 1.1552436003303055,
+      "grad_norm": 1.75,
+      "learning_rate": 4.706959706959707e-06,
+      "loss": 0.2723649501800537,
+      "mean_token_accuracy": 0.9101433955132961,
+      "num_tokens": 3272477.0,
+      "step": 350
+    },
+    {
+      "entropy": 0.2617026660591364,
+      "epoch": 1.1882741535920727,
+      "grad_norm": 1.9453125,
+      "learning_rate": 4.523809523809524e-06,
+      "loss": 0.26422107219696045,
+      "mean_token_accuracy": 0.913913157582283,
+      "num_tokens": 3364137.0,
+      "step": 360
+    },
+    {
+      "entropy": 0.2595691759139299,
+      "epoch": 1.2213047068538398,
+      "grad_norm": 1.5390625,
+      "learning_rate": 4.340659340659341e-06,
+      "loss": 0.2714951276779175,
+      "mean_token_accuracy": 0.911636833101511,
+      "num_tokens": 3460217.0,
+      "step": 370
+    },
+    {
+      "entropy": 0.2933320404961705,
+      "epoch": 1.254335260115607,
+      "grad_norm": 1.9765625,
+      "learning_rate": 4.157509157509158e-06,
+      "loss": 0.29573276042938235,
+      "mean_token_accuracy": 0.902192335575819,
+      "num_tokens": 3557030.0,
+      "step": 380
+    },
+    {
+      "entropy": 0.2693369995802641,
+      "epoch": 1.287365813377374,
+      "grad_norm": 1.890625,
+      "learning_rate": 3.974358974358974e-06,
+      "loss": 0.27785425186157225,
+      "mean_token_accuracy": 0.9090337291359901,
+      "num_tokens": 3654159.0,
+      "step": 390
+    },
+    {
+      "entropy": 0.2908114368095994,
+      "epoch": 1.3203963666391412,
+      "grad_norm": 2.015625,
+      "learning_rate": 3.7912087912087915e-06,
+      "loss": 0.30535905361175536,
+      "mean_token_accuracy": 0.901054035872221,
+      "num_tokens": 3749664.0,
+      "step": 400
+    },
+    {
+      "epoch": 1.3203963666391412,
+      "eval_entropy": 0.2783401116176888,
+      "eval_loss": 0.28495287895202637,
+      "eval_mean_token_accuracy": 0.9069356251645971,
+      "eval_num_tokens": 3749664.0,
+      "eval_runtime": 17.8814,
+      "eval_samples_per_second": 30.143,
+      "eval_steps_per_second": 15.099,
+      "step": 400
+    },
+    {
+      "entropy": 0.2749379588291049,
+      "epoch": 1.3534269199009084,
+      "grad_norm": 1.84375,
+      "learning_rate": 3.6080586080586083e-06,
+      "loss": 0.27795455455780027,
+      "mean_token_accuracy": 0.9090839587152004,
+      "num_tokens": 3845841.0,
+      "step": 410
+    },
+    {
+      "entropy": 0.2859474029392004,
+      "epoch": 1.3864574731626755,
+      "grad_norm": 1.6796875,
+      "learning_rate": 3.424908424908425e-06,
+      "loss": 0.2886682987213135,
+      "mean_token_accuracy": 0.9059950634837151,
+      "num_tokens": 3942402.0,
+      "step": 420
+    },
+    {
+      "entropy": 0.2537162031978369,
+      "epoch": 1.4194880264244425,
+      "grad_norm": 1.9375,
+      "learning_rate": 3.2417582417582424e-06,
+      "loss": 0.25533475875854494,
+      "mean_token_accuracy": 0.9158934533596039,
+      "num_tokens": 4034122.0,
+      "step": 430
+    },
+    {
+      "entropy": 0.28255065642297267,
+      "epoch": 1.4525185796862097,
+      "grad_norm": 1.7734375,
+      "learning_rate": 3.058608058608059e-06,
+      "loss": 0.29817631244659426,
+      "mean_token_accuracy": 0.9040707617998123,
+      "num_tokens": 4127068.0,
+      "step": 440
+    },
+    {
+      "entropy": 0.2898306529968977,
+      "epoch": 1.4855491329479769,
+      "grad_norm": 1.703125,
+      "learning_rate": 2.8754578754578755e-06,
+      "loss": 0.2923931121826172,
+      "mean_token_accuracy": 0.9030195340514183,
+      "num_tokens": 4224435.0,
+      "step": 450
+    },
+    {
+      "entropy": 0.27209411058574917,
+      "epoch": 1.518579686209744,
+      "grad_norm": 1.7578125,
+      "learning_rate": 2.6923076923076923e-06,
+      "loss": 0.28646528720855713,
+      "mean_token_accuracy": 0.9095881894230843,
+      "num_tokens": 4321028.0,
+      "step": 460
+    },
+    {
+      "entropy": 0.280727200768888,
+      "epoch": 1.5516102394715112,
+      "grad_norm": 1.7421875,
+      "learning_rate": 2.509157509157509e-06,
+      "loss": 0.2891413688659668,
+      "mean_token_accuracy": 0.9067636162042618,
+      "num_tokens": 4414916.0,
+      "step": 470
+    },
+    {
+      "entropy": 0.27109829243272543,
+      "epoch": 1.5846407927332784,
+      "grad_norm": 1.8359375,
+      "learning_rate": 2.3260073260073264e-06,
+      "loss": 0.26625516414642336,
+      "mean_token_accuracy": 0.9099020563066006,
+      "num_tokens": 4505708.0,
+      "step": 480
+    },
+    {
+      "entropy": 0.2729395367205143,
+      "epoch": 1.6176713459950454,
+      "grad_norm": 1.7890625,
+      "learning_rate": 2.1428571428571427e-06,
+      "loss": 0.2779061317443848,
+      "mean_token_accuracy": 0.909747340530157,
+      "num_tokens": 4600244.0,
+      "step": 490
+    },
+    {
+      "entropy": 0.2682301852852106,
+      "epoch": 1.6507018992568125,
+      "grad_norm": 1.6953125,
+      "learning_rate": 1.95970695970696e-06,
+      "loss": 0.274446964263916,
+      "mean_token_accuracy": 0.9094639919698239,
+      "num_tokens": 4688147.0,
+      "step": 500
+    },
+    {
+      "epoch": 1.6507018992568125,
+      "eval_entropy": 0.2756172451708052,
+      "eval_loss": 0.2841346263885498,
+      "eval_mean_token_accuracy": 0.9072470570052111,
+      "eval_num_tokens": 4688147.0,
+      "eval_runtime": 17.9119,
+      "eval_samples_per_second": 30.092,
+      "eval_steps_per_second": 15.074,
+      "step": 500
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 606,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.028742352950886e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
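
The `learning_rate` values logged above look like a linear warmup followed by linear decay to zero at `max_steps` (606). The sketch below reproduces them under three assumptions that are inferred from the logged numbers, not stated in this file: a peak rate of 1e-5, 60 warmup steps, and a logged value that corresponds to the schedule one step earlier. The name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, peak: float = 1e-5, warmup: int = 60, total: int = 606) -> float:
    """Learning rate as logged at `step` (inferred schedule, see assumptions above)."""
    s = step - 1  # assumption: the log records the rate used for the step just taken
    if s < warmup:
        return peak * s / warmup                   # linear warmup from 0 to peak
    return peak * (total - s) / (total - warmup)   # linear decay from peak to 0


# Values copied from log_history in the trainer state above.
logged = {
    10: 1.5e-06,
    60: 9.833333333333333e-06,
    70: 9.835164835164835e-06,
    500: 1.95970695970696e-06,
}
for step, lr in logged.items():
    print(step, lr, linear_lr(step))
```

If the run used a different scheduler that happens to coincide at these points, the sketch would no longer hold; `training_args.bin` is the authoritative record of the settings.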

checkpoint-500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6696e1fef8bde5e22de2f5ae18c284f2b2d35a345157db5151571518385fb38b
+size 5713

checkpoint-600/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

checkpoint-600/config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.7.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}

checkpoint-600/generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.7.0"
+}

checkpoint-600/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:066f20885c5c9a598d45fbcd21aa7ab9081dffda3fb1930d573d7c3d99b5d998
+size 3441185608

checkpoint-600/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c75c57fe3c14c29f7a73acb2b44e60573aaf69cd744a4ce99770cdfdbe4de45e
+size 6882572207

checkpoint-600/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdc37bbd2e979f041dfbbb004a5c74bab6cdda159cb18116df728588515a9ef6
+size 14645

checkpoint-600/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:540d8ab45fe72a01988766f9263355239d34a4ff0ef13ddac7f1f0d1198ce9a9
+size 1465

checkpoint-600/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650

checkpoint-600/tokenizer_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-600/trainer_state.json ADDED

	@@ -0,0 +1,700 @@

+{
+  "best_global_step": 600,
+  "best_metric": 0.28394654393196106,
+  "best_model_checkpoint": "/scratch/checkpoints/math_sft_v2/checkpoint-600",
+  "epoch": 1.981007431874484,
+  "eval_steps": 100,
+  "global_step": 600,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 0.26208699569106103,
+      "epoch": 0.03303055326176713,
+      "grad_norm": 13.3125,
+      "learning_rate": 1.5e-06,
+      "loss": 0.8376299858093261,
+      "mean_token_accuracy": 0.8421687923371792,
+      "num_tokens": 98215.0,
+      "step": 10
+    },
+    {
+      "entropy": 0.2689803125336766,
+      "epoch": 0.06606110652353427,
+      "grad_norm": 11.5625,
+      "learning_rate": 3.1666666666666667e-06,
+      "loss": 0.853145980834961,
+      "mean_token_accuracy": 0.8399689704179764,
+      "num_tokens": 191821.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.28777916859835384,
+      "epoch": 0.0990916597853014,
+      "grad_norm": 12.375,
+      "learning_rate": 4.833333333333333e-06,
+      "loss": 0.7679028034210205,
+      "mean_token_accuracy": 0.8431851923465729,
+      "num_tokens": 285491.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.3033197607845068,
+      "epoch": 0.13212221304706853,
+      "grad_norm": 5.375,
+      "learning_rate": 6.5000000000000004e-06,
+      "loss": 0.6137509822845459,
+      "mean_token_accuracy": 0.8549904882907867,
+      "num_tokens": 381273.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.34702248610556125,
+      "epoch": 0.16515276630883569,
+      "grad_norm": 4.25,
+      "learning_rate": 8.166666666666668e-06,
+      "loss": 0.5155063152313233,
+      "mean_token_accuracy": 0.8607264116406441,
+      "num_tokens": 476481.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.35821260381489994,
+      "epoch": 0.1981833195706028,
+      "grad_norm": 2.890625,
+      "learning_rate": 9.833333333333333e-06,
+      "loss": 0.42441272735595703,
+      "mean_token_accuracy": 0.8774531938135623,
+      "num_tokens": 570979.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.33337059412151576,
+      "epoch": 0.23121387283236994,
+      "grad_norm": 3.15625,
+      "learning_rate": 9.835164835164835e-06,
+      "loss": 0.34101109504699706,
+      "mean_token_accuracy": 0.8939957685768605,
+      "num_tokens": 664789.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.30876336991786957,
+      "epoch": 0.26424442609413706,
+      "grad_norm": 1.8515625,
+      "learning_rate": 9.652014652014653e-06,
+      "loss": 0.32317628860473635,
+      "mean_token_accuracy": 0.8987478196620942,
+      "num_tokens": 764246.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.30824985895305873,
+      "epoch": 0.2972749793559042,
+      "grad_norm": 2.15625,
+      "learning_rate": 9.46886446886447e-06,
+      "loss": 0.31807353496551516,
+      "mean_token_accuracy": 0.896314675360918,
+      "num_tokens": 855689.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.31522620394825934,
+      "epoch": 0.33030553261767137,
+      "grad_norm": 1.8828125,
+      "learning_rate": 9.285714285714288e-06,
+      "loss": 0.338785982131958,
+      "mean_token_accuracy": 0.89400485008955,
+      "num_tokens": 949366.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.33030553261767137,
+      "eval_entropy": 0.3022541454544774,
+      "eval_loss": 0.3146008253097534,
+      "eval_mean_token_accuracy": 0.8988911988558592,
+      "eval_num_tokens": 949366.0,
+      "eval_runtime": 17.914,
+      "eval_samples_per_second": 30.088,
+      "eval_steps_per_second": 15.072,
+      "step": 100
+    },
+    {
+      "entropy": 0.2985268572345376,
+      "epoch": 0.3633360858794385,
+      "grad_norm": 2.171875,
+      "learning_rate": 9.102564102564104e-06,
+      "loss": 0.31417596340179443,
+      "mean_token_accuracy": 0.89812126532197,
+      "num_tokens": 1040854.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.3040375892072916,
+      "epoch": 0.3963666391412056,
+      "grad_norm": 1.9609375,
+      "learning_rate": 8.91941391941392e-06,
+      "loss": 0.32050628662109376,
+      "mean_token_accuracy": 0.8970061101019382,
+      "num_tokens": 1132424.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.3125588171184063,
+      "epoch": 0.42939719240297275,
+      "grad_norm": 1.8203125,
+      "learning_rate": 8.736263736263737e-06,
+      "loss": 0.3176266193389893,
+      "mean_token_accuracy": 0.8971590586006641,
+      "num_tokens": 1221899.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.312872576713562,
+      "epoch": 0.4624277456647399,
+      "grad_norm": 1.78125,
+      "learning_rate": 8.553113553113553e-06,
+      "loss": 0.3343791484832764,
+      "mean_token_accuracy": 0.8940604299306869,
+      "num_tokens": 1315052.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.27830475065857174,
+      "epoch": 0.495458298926507,
+      "grad_norm": 2.0,
+      "learning_rate": 8.369963369963371e-06,
+      "loss": 0.28718521595001223,
+      "mean_token_accuracy": 0.9051955968141556,
+      "num_tokens": 1401947.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.3005422970280051,
+      "epoch": 0.5284888521882741,
+      "grad_norm": 2.34375,
+      "learning_rate": 8.186813186813188e-06,
+      "loss": 0.31569790840148926,
+      "mean_token_accuracy": 0.8999212823808194,
+      "num_tokens": 1494107.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.31200682315975425,
+      "epoch": 0.5615194054500413,
+      "grad_norm": 2.015625,
+      "learning_rate": 8.003663003663006e-06,
+      "loss": 0.32325453758239747,
+      "mean_token_accuracy": 0.8952137432992459,
+      "num_tokens": 1586474.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.2929861258715391,
+      "epoch": 0.5945499587118084,
+      "grad_norm": 2.09375,
+      "learning_rate": 7.820512820512822e-06,
+      "loss": 0.29693677425384524,
+      "mean_token_accuracy": 0.9031261466443539,
+      "num_tokens": 1678162.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.30670158881694076,
+      "epoch": 0.6275805119735756,
+      "grad_norm": 1.828125,
+      "learning_rate": 7.637362637362638e-06,
+      "loss": 0.31567308902740476,
+      "mean_token_accuracy": 0.8983482263982296,
+      "num_tokens": 1781401.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.2866023685783148,
+      "epoch": 0.6606110652353427,
+      "grad_norm": 1.6953125,
+      "learning_rate": 7.454212454212455e-06,
+      "loss": 0.29333360195159913,
+      "mean_token_accuracy": 0.904055442661047,
+      "num_tokens": 1868038.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.6606110652353427,
+      "eval_entropy": 0.2921138784951634,
+      "eval_loss": 0.29292193055152893,
+      "eval_mean_token_accuracy": 0.904505420172656,
+      "eval_num_tokens": 1868038.0,
+      "eval_runtime": 17.8732,
+      "eval_samples_per_second": 30.157,
+      "eval_steps_per_second": 15.106,
+      "step": 200
+    },
+    {
+      "entropy": 0.28792863376438615,
+      "epoch": 0.6936416184971098,
+      "grad_norm": 1.9609375,
+      "learning_rate": 7.271062271062271e-06,
+      "loss": 0.29235827922821045,
+      "mean_token_accuracy": 0.9050419509410859,
+      "num_tokens": 1959819.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.28098181951791046,
+      "epoch": 0.726672171758877,
+      "grad_norm": 1.8515625,
+      "learning_rate": 7.087912087912089e-06,
+      "loss": 0.2915766954421997,
+      "mean_token_accuracy": 0.9068245775997639,
+      "num_tokens": 2059113.0,
+      "step": 220
+    },
+    {
+      "entropy": 0.26028320472687483,
+      "epoch": 0.7597027250206441,
+      "grad_norm": 2.0,
+      "learning_rate": 6.9047619047619055e-06,
+      "loss": 0.2654874324798584,
+      "mean_token_accuracy": 0.9116183832287789,
+      "num_tokens": 2145321.0,
+      "step": 230
+    },
+    {
+      "entropy": 0.2855058180168271,
+      "epoch": 0.7927332782824112,
+      "grad_norm": 1.84375,
+      "learning_rate": 6.721611721611723e-06,
+      "loss": 0.2998212337493896,
+      "mean_token_accuracy": 0.9046589516103267,
+      "num_tokens": 2240859.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.2783821925520897,
+      "epoch": 0.8257638315441783,
+      "grad_norm": 1.9609375,
+      "learning_rate": 6.538461538461539e-06,
+      "loss": 0.28870720863342286,
+      "mean_token_accuracy": 0.9073394626379013,
+      "num_tokens": 2333798.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.277759512513876,
+      "epoch": 0.8587943848059455,
+      "grad_norm": 1.7734375,
+      "learning_rate": 6.3553113553113555e-06,
+      "loss": 0.29060728549957277,
+      "mean_token_accuracy": 0.9055352889001369,
+      "num_tokens": 2429910.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.2882726760581136,
+      "epoch": 0.8918249380677127,
+      "grad_norm": 1.703125,
+      "learning_rate": 6.172161172161173e-06,
+      "loss": 0.2988502740859985,
+      "mean_token_accuracy": 0.9038613699376583,
+      "num_tokens": 2524730.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.2855747708119452,
+      "epoch": 0.9248554913294798,
+      "grad_norm": 1.90625,
+      "learning_rate": 5.989010989010989e-06,
+      "loss": 0.289182186126709,
+      "mean_token_accuracy": 0.9049803033471108,
+      "num_tokens": 2615802.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.26923007648438213,
+      "epoch": 0.9578860445912469,
+      "grad_norm": 2.84375,
+      "learning_rate": 5.805860805860807e-06,
+      "loss": 0.27834036350250246,
+      "mean_token_accuracy": 0.9109352678060532,
+      "num_tokens": 2709073.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.2844470454379916,
+      "epoch": 0.990916597853014,
+      "grad_norm": 1.640625,
+      "learning_rate": 5.6227106227106235e-06,
+      "loss": 0.28873960971832274,
+      "mean_token_accuracy": 0.9069667845964432,
+      "num_tokens": 2805135.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.990916597853014,
+      "eval_entropy": 0.285272175570329,
+      "eval_loss": 0.28705036640167236,
+      "eval_mean_token_accuracy": 0.9062359571456909,
+      "eval_num_tokens": 2805135.0,
+      "eval_runtime": 17.9038,
+      "eval_samples_per_second": 30.105,
+      "eval_steps_per_second": 15.081,
+      "step": 300
+    },
+    {
+      "entropy": 0.2869852078266633,
+      "epoch": 1.023121387283237,
+      "grad_norm": 1.9375,
+      "learning_rate": 5.43956043956044e-06,
+      "loss": 0.2850792646408081,
+      "mean_token_accuracy": 0.9064911672702203,
+      "num_tokens": 2898984.0,
+      "step": 310
+    },
+    {
+      "entropy": 0.2757460119202733,
+      "epoch": 1.0561519405450042,
+      "grad_norm": 1.8359375,
+      "learning_rate": 5.256410256410257e-06,
+      "loss": 0.2779810905456543,
+      "mean_token_accuracy": 0.9096895463764667,
+      "num_tokens": 2991694.0,
+      "step": 320
+    },
+    {
+      "entropy": 0.2600112119689584,
+      "epoch": 1.0891824938067713,
+      "grad_norm": 2.0625,
+      "learning_rate": 5.0732600732600735e-06,
+      "loss": 0.2662019729614258,
+      "mean_token_accuracy": 0.912333931028843,
+      "num_tokens": 3078862.0,
+      "step": 330
+    },
+    {
+      "entropy": 0.27342566605657337,
+      "epoch": 1.1222130470685383,
+      "grad_norm": 1.921875,
+      "learning_rate": 4.890109890109891e-06,
+      "loss": 0.28096842765808105,
+      "mean_token_accuracy": 0.9091624312102795,
+      "num_tokens": 3174256.0,
+      "step": 340
+    },
+    {
+      "entropy": 0.2671937083825469,
+      "epoch": 1.1552436003303055,
+      "grad_norm": 1.75,
+      "learning_rate": 4.706959706959707e-06,
+      "loss": 0.2723649501800537,
+      "mean_token_accuracy": 0.9101433955132961,
+      "num_tokens": 3272477.0,
+      "step": 350
+    },
+    {
+      "entropy": 0.2617026660591364,
+      "epoch": 1.1882741535920727,
+      "grad_norm": 1.9453125,
+      "learning_rate": 4.523809523809524e-06,
+      "loss": 0.26422107219696045,
+      "mean_token_accuracy": 0.913913157582283,
+      "num_tokens": 3364137.0,
+      "step": 360
+    },
+    {
+      "entropy": 0.2595691759139299,
+      "epoch": 1.2213047068538398,
+      "grad_norm": 1.5390625,
+      "learning_rate": 4.340659340659341e-06,
+      "loss": 0.2714951276779175,
+      "mean_token_accuracy": 0.911636833101511,
+      "num_tokens": 3460217.0,
+      "step": 370
+    },
+    {
+      "entropy": 0.2933320404961705,
+      "epoch": 1.254335260115607,
+      "grad_norm": 1.9765625,
+      "learning_rate": 4.157509157509158e-06,
+      "loss": 0.29573276042938235,
+      "mean_token_accuracy": 0.902192335575819,
+      "num_tokens": 3557030.0,
+      "step": 380
+    },
+    {
+      "entropy": 0.2693369995802641,
+      "epoch": 1.287365813377374,
+      "grad_norm": 1.890625,
+      "learning_rate": 3.974358974358974e-06,
+      "loss": 0.27785425186157225,
+      "mean_token_accuracy": 0.9090337291359901,
+      "num_tokens": 3654159.0,
+      "step": 390
+    },
+    {
+      "entropy": 0.2908114368095994,
+      "epoch": 1.3203963666391412,
+      "grad_norm": 2.015625,
+      "learning_rate": 3.7912087912087915e-06,
+      "loss": 0.30535905361175536,
+      "mean_token_accuracy": 0.901054035872221,
+      "num_tokens": 3749664.0,
+      "step": 400
+    },
+    {
+      "epoch": 1.3203963666391412,
+      "eval_entropy": 0.2783401116176888,
+      "eval_loss": 0.28495287895202637,
+      "eval_mean_token_accuracy": 0.9069356251645971,
+      "eval_num_tokens": 3749664.0,
+      "eval_runtime": 17.8814,
+      "eval_samples_per_second": 30.143,
+      "eval_steps_per_second": 15.099,
+      "step": 400
+    },
+    {
+      "entropy": 0.2749379588291049,
+      "epoch": 1.3534269199009084,
+      "grad_norm": 1.84375,
+      "learning_rate": 3.6080586080586083e-06,
+      "loss": 0.27795455455780027,
+      "mean_token_accuracy": 0.9090839587152004,
+      "num_tokens": 3845841.0,
+      "step": 410
+    },
+    {
+      "entropy": 0.2859474029392004,
+      "epoch": 1.3864574731626755,
+      "grad_norm": 1.6796875,
+      "learning_rate": 3.424908424908425e-06,
+      "loss": 0.2886682987213135,
+      "mean_token_accuracy": 0.9059950634837151,
+      "num_tokens": 3942402.0,
+      "step": 420
+    },
+    {
+      "entropy": 0.2537162031978369,
+      "epoch": 1.4194880264244425,
+      "grad_norm": 1.9375,
+      "learning_rate": 3.2417582417582424e-06,
+      "loss": 0.25533475875854494,
+      "mean_token_accuracy": 0.9158934533596039,
+      "num_tokens": 4034122.0,
+      "step": 430
+    },
+    {
+      "entropy": 0.28255065642297267,
+      "epoch": 1.4525185796862097,
+      "grad_norm": 1.7734375,
+      "learning_rate": 3.058608058608059e-06,
+      "loss": 0.29817631244659426,
+      "mean_token_accuracy": 0.9040707617998123,
+      "num_tokens": 4127068.0,
+      "step": 440
+    },
+    {
+      "entropy": 0.2898306529968977,
+      "epoch": 1.4855491329479769,
+      "grad_norm": 1.703125,
+      "learning_rate": 2.8754578754578755e-06,
+      "loss": 0.2923931121826172,
+      "mean_token_accuracy": 0.9030195340514183,
+      "num_tokens": 4224435.0,
+      "step": 450
+    },
+    {
+      "entropy": 0.27209411058574917,
+      "epoch": 1.518579686209744,
+      "grad_norm": 1.7578125,
+      "learning_rate": 2.6923076923076923e-06,
+      "loss": 0.28646528720855713,
+      "mean_token_accuracy": 0.9095881894230843,
+      "num_tokens": 4321028.0,
+      "step": 460
+    },
+    {
+      "entropy": 0.280727200768888,
+      "epoch": 1.5516102394715112,
+      "grad_norm": 1.7421875,
+      "learning_rate": 2.509157509157509e-06,
+      "loss": 0.2891413688659668,
+      "mean_token_accuracy": 0.9067636162042618,
+      "num_tokens": 4414916.0,
+      "step": 470
+    },
+    {
+      "entropy": 0.27109829243272543,
+      "epoch": 1.5846407927332784,
+      "grad_norm": 1.8359375,
+      "learning_rate": 2.3260073260073264e-06,
+      "loss": 0.26625516414642336,
+      "mean_token_accuracy": 0.9099020563066006,
+      "num_tokens": 4505708.0,
+      "step": 480
+    },
+    {
+      "entropy": 0.2729395367205143,
+      "epoch": 1.6176713459950454,
+      "grad_norm": 1.7890625,
+      "learning_rate": 2.1428571428571427e-06,
+      "loss": 0.2779061317443848,
+      "mean_token_accuracy": 0.909747340530157,
+      "num_tokens": 4600244.0,
+      "step": 490
+    },
+    {
+      "entropy": 0.2682301852852106,
+      "epoch": 1.6507018992568125,
+      "grad_norm": 1.6953125,
+      "learning_rate": 1.95970695970696e-06,
+      "loss": 0.274446964263916,
+      "mean_token_accuracy": 0.9094639919698239,
+      "num_tokens": 4688147.0,
+      "step": 500
+    },
+    {
+      "epoch": 1.6507018992568125,
+      "eval_entropy": 0.2756172451708052,
+      "eval_loss": 0.2841346263885498,
+      "eval_mean_token_accuracy": 0.9072470570052111,
+      "eval_num_tokens": 4688147.0,
+      "eval_runtime": 17.9119,
+      "eval_samples_per_second": 30.092,
+      "eval_steps_per_second": 15.074,
+      "step": 500
+    },
+    {
+      "entropy": 0.26342453975230457,
+      "epoch": 1.6837324525185797,
+      "grad_norm": 1.890625,
+      "learning_rate": 1.7765567765567768e-06,
+      "loss": 0.26432271003723146,
+      "mean_token_accuracy": 0.9114409819245338,
+      "num_tokens": 4780124.0,
+      "step": 510
+    },
+    {
+      "entropy": 0.28657868495211003,
+      "epoch": 1.7167630057803467,
+      "grad_norm": 1.7734375,
+      "learning_rate": 1.5934065934065933e-06,
+      "loss": 0.2983593225479126,
+      "mean_token_accuracy": 0.9028328903019428,
+      "num_tokens": 4874633.0,
+      "step": 520
+    },
+    {
+      "entropy": 0.25633321776986123,
+      "epoch": 1.7497935590421139,
+      "grad_norm": 1.8359375,
+      "learning_rate": 1.4102564102564104e-06,
+      "loss": 0.2606488227844238,
+      "mean_token_accuracy": 0.9153999648988247,
+      "num_tokens": 4967263.0,
+      "step": 530
+    },
+    {
+      "entropy": 0.26120030507445335,
+      "epoch": 1.782824112303881,
+      "grad_norm": 2.078125,
+      "learning_rate": 1.2271062271062271e-06,
+      "loss": 0.2706313610076904,
+      "mean_token_accuracy": 0.9130682416260243,
+      "num_tokens": 5056736.0,
+      "step": 540
+    },
+    {
+      "entropy": 0.27009565513581035,
+      "epoch": 1.8158546655656482,
+      "grad_norm": 1.8828125,
+      "learning_rate": 1.0439560439560442e-06,
+      "loss": 0.277858567237854,
+      "mean_token_accuracy": 0.9087999157607556,
+      "num_tokens": 5150617.0,
+      "step": 550
+    },
+    {
+      "entropy": 0.2819518899545074,
+      "epoch": 1.8488852188274154,
+      "grad_norm": 1.9609375,
+      "learning_rate": 8.608058608058609e-07,
+      "loss": 0.28743035793304444,
+      "mean_token_accuracy": 0.9067244723439216,
+      "num_tokens": 5243538.0,
+      "step": 560
+    },
+    {
+      "entropy": 0.2858045116066933,
+      "epoch": 1.8819157720891826,
+      "grad_norm": 2.203125,
+      "learning_rate": 6.776556776556777e-07,
+      "loss": 0.2939825296401978,
+      "mean_token_accuracy": 0.904579871147871,
+      "num_tokens": 5331441.0,
+      "step": 570
+    },
+    {
+      "entropy": 0.2926233192905784,
+      "epoch": 1.9149463253509498,
+      "grad_norm": 2.3125,
+      "learning_rate": 4.945054945054946e-07,
+      "loss": 0.29961700439453126,
+      "mean_token_accuracy": 0.9033655568957328,
+      "num_tokens": 5426734.0,
+      "step": 580
+    },
+    {
+      "entropy": 0.2706678254529834,
+      "epoch": 1.9479768786127167,
+      "grad_norm": 2.03125,
+      "learning_rate": 3.113553113553114e-07,
+      "loss": 0.27953526973724363,
+      "mean_token_accuracy": 0.9093762949109078,
+      "num_tokens": 5518190.0,
+      "step": 590
+    },
+    {
+      "entropy": 0.27056904807686805,
+      "epoch": 1.981007431874484,
+      "grad_norm": 2.09375,
+      "learning_rate": 1.282051282051282e-07,
+      "loss": 0.2764536619186401,
+      "mean_token_accuracy": 0.9110228553414345,
+      "num_tokens": 5609238.0,
+      "step": 600
+    },
+    {
+      "epoch": 1.981007431874484,
+      "eval_entropy": 0.2757842331572815,
+      "eval_loss": 0.28394654393196106,
+      "eval_mean_token_accuracy": 0.9073928307603907,
+      "eval_num_tokens": 5609238.0,
+      "eval_runtime": 18.1138,
+      "eval_samples_per_second": 29.756,
+      "eval_steps_per_second": 14.906,
+      "step": 600
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 606,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.018332557481165e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
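The `log_history` list above interleaves training entries (keyed by `loss`) with evaluation entries (keyed by `eval_loss`). A minimal sketch of separating the two and confirming that step 600 has the lowest evaluation loss follows. The JSON here is a hand-copied excerpt of the values above, not the full file, and the variable names are illustrative.

```python
import json

# Excerpt of trainer_state.json: values copied from the log above.
state_json = """
{
  "best_global_step": 600,
  "log_history": [
    {"loss": 0.338785982131958, "step": 100},
    {"eval_loss": 0.3146008253097534, "step": 100},
    {"eval_loss": 0.29292193055152893, "step": 200},
    {"eval_loss": 0.28705036640167236, "step": 300},
    {"eval_loss": 0.28495287895202637, "step": 400},
    {"eval_loss": 0.2841346263885498, "step": 500},
    {"loss": 0.2764536619186401, "step": 600},
    {"eval_loss": 0.28394654393196106, "step": 600}
  ]
}
"""
state = json.loads(state_json)

# Training entries carry "loss"; evaluation entries carry "eval_loss".
train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_loss" in e]

# The checkpoint with the lowest evaluation loss.
best = min(eval_logs, key=lambda e: e["eval_loss"])
print("best step:", best["step"], "eval_loss:", best["eval_loss"])

# Improvement in evaluation loss between consecutive evaluations.
for prev, cur in zip(eval_logs, eval_logs[1:]):
    print(cur["step"], round(prev["eval_loss"] - cur["eval_loss"], 5))
```

On these values the per-evaluation improvement shrinks from about 0.022 (steps 100 to 200) to about 0.0002 (steps 500 to 600), which suggests the evaluation loss had largely flattened by the end of the run.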

checkpoint-600/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6696e1fef8bde5e22de2f5ae18c284f2b2d35a345157db5151571518385fb38b
+size 5713
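Because training_args.bin is stored as a binary LFS object, the hyperparameters are not readable in this diff. The `learning_rate` values logged in checkpoint-600/trainer_state.json are nonetheless consistent with a linear warmup over 60 steps to a peak of 1e-5, followed by a linear decay to zero at `max_steps` 606. The peak, the warmup length and the schedule shape are inferred from the logs, not read from the training arguments; the sketch below only checks that inference, and the function name is hypothetical.

```python
def linear_warmup_decay(step, peak_lr=1e-5, warmup_steps=60, total_steps=606):
    """Learning rate after `step` scheduler updates (inferred schedule, an assumption)."""
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Logged values copied from trainer_state.json. Empirically, the entry at
# global step s matches the schedule evaluated at s - 1.
logged = {
    10: 1.5e-06,
    60: 9.833333333333333e-06,
    70: 9.835164835164835e-06,
    300: 5.6227106227106235e-06,
    600: 1.282051282051282e-07,
}
max_error = max(abs(linear_warmup_decay(s - 1) - lr) for s, lr in logged.items())
print("largest deviation from the logged values:", max_error)
```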

checkpoint-606/chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
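For a conversation with no tools and no reasoning content that ends on a user turn, the template above reduces to plain ChatML framing. The following sketch re-implements only that simple path in Python to show the resulting string; `render_chatml` is a hypothetical helper, and real usage should go through `tokenizer.apply_chat_template` rather than this approximation.

```python
def render_chatml(messages, add_generation_prompt=True, enable_thinking=True):
    """Simplified rendering for plain system/user/assistant turns.

    Assumes no tools, no tool calls, no <think> blocks in the history,
    and that the last message comes from the user.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template pre-fills an empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)

messages = [{"role": "user", "content": "Who are you?"}]
prompt = render_chatml(messages)
prompt_no_think = render_chatml(messages, enable_thinking=False)
print(prompt)
```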

checkpoint-606/config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 6144,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.7.0",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
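The dimensions in this config are enough for a rough parameter count. The sketch below assumes a standard Qwen3 decoder layout that is not shown in the diff: bias-free q/k/v/o projections, a gated MLP with three projections, two RMSNorm weights per layer, per-head q/k norm weights of size `head_dim`, and an output head tied to the embeddings (`tie_word_embeddings` is true). The function name is hypothetical.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 6144,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
}

def count_parameters(c):
    """Estimate the parameter count under the assumed Qwen3 layer layout."""
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d
    kv_out = c["num_key_value_heads"] * d
    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections (no bias)
    mlp = 3 * h * c["intermediate_size"]                 # gate, up, down projections
    norms = 2 * h + 2 * d                                # two layer norms plus q/k norms (assumed)
    embeddings = c["vocab_size"] * h                     # shared with the output head (tied)
    final_norm = h
    return embeddings + c["num_hidden_layers"] * (attention + mlp + norms) + final_norm

params = count_parameters(config)
bf16_bytes = 2 * params  # dtype is bfloat16: two bytes per parameter
print(params, bf16_bytes)
```

Under these assumptions the estimate is about 1.72 billion parameters, or 3,441,149,952 bytes in bfloat16. That is 35,656 bytes less than the 3,441,185,608-byte size recorded in the checkpoint-600/model.safetensors pointer, a gap that would be consistent with the file's metadata header, though the diff does not confirm this.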

checkpoint-606/generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "5.7.0"
+}