Yash1005 committed on May 22

Commit dfa9607 · verified · 1 Parent(s): 01fbabe

upload trained Qwen LangID guard LoRA

Files changed (30)

.gitattributes +3 -0
README.md +207 -0
adapter_config.json +48 -0
adapter_model.safetensors +3 -0
chat_template.jinja +154 -0
checkpoint-1125/README.md +207 -0
checkpoint-1125/adapter_config.json +48 -0
checkpoint-1125/adapter_model.safetensors +3 -0
checkpoint-1125/chat_template.jinja +154 -0
checkpoint-1125/optimizer.pt +3 -0
checkpoint-1125/rng_state.pth +3 -0
checkpoint-1125/scheduler.pt +3 -0
checkpoint-1125/tokenizer.json +3 -0
checkpoint-1125/tokenizer_config.json +32 -0
checkpoint-1125/trainer_state.json +826 -0
checkpoint-1125/training_args.bin +3 -0
checkpoint-2250/README.md +207 -0
checkpoint-2250/adapter_config.json +48 -0
checkpoint-2250/adapter_model.safetensors +3 -0
checkpoint-2250/chat_template.jinja +154 -0
checkpoint-2250/optimizer.pt +3 -0
checkpoint-2250/rng_state.pth +3 -0
checkpoint-2250/scheduler.pt +3 -0
checkpoint-2250/tokenizer.json +3 -0
checkpoint-2250/tokenizer_config.json +32 -0
checkpoint-2250/trainer_state.json +1625 -0
checkpoint-2250/training_args.bin +3 -0
tokenizer.json +3 -0
tokenizer_config.json +32 -0
training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+checkpoint-1125/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-2250/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: Qwen/Qwen3.5-2B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3.5-2B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

adapter_config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
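
Note on the LoRA hyperparameters above: the config sets `r` to 16 and `lora_alpha` to 32 with `use_rslora` disabled. The following is a minimal sketch, not part of the commit, of what those values imply for the scale applied to the low-rank update and for the number of adapter parameters per wrapped linear layer. The helper names `lora_scaling` and `lora_param_count` are hypothetical, and the 1024 x 1024 layer shape in the usage lines is an illustrative assumption, not a dimension taken from Qwen/Qwen3.5-2B.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["up_proj", "q_proj", "v_proj", "down_proj",
                     "gate_proj", "o_proj", "k_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Hypothetical helper: factor that multiplies the B @ A update.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA (rsLoRA)
    uses lora_alpha / sqrt(r) instead.
    """
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(r)
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Hypothetical helper: parameters added to one linear layer.

    A is (r, d_in) and B is (d_out, r), so the total is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


scale = lora_scaling(ADAPTER_CONFIG)
# Illustrative layer shape only (assumption, not the real model dimension).
per_layer = lora_param_count(1024, 1024, ADAPTER_CONFIG["r"])
print(scale, per_layer)
```

With `lora_alpha` at twice the rank, the update is scaled by 2, which is a common default pairing.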

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42cc2cadc1cf728b9d3ca8793d77f15786d281c09f45e9183fba3602774dcb26
+size 43672224
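
Note on the three-line entries such as the one above: binary files in this commit are stored as Git LFS pointers, which record only a spec version, a SHA-256 object id, and a byte size. As a minimal sketch (not part of the commit), a pointer can be parsed and a downloaded file checked against it as follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the pointer text has the leading `+` diff markers already removed.

```python
import hashlib
from pathlib import Path

# Pointer for adapter_model.safetensors, copied from the diff above
# with the leading "+" diff markers removed.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:42cc2cadc1cf728b9d3ca8793d77f15786d281c09f45e9183fba3602774dcb26
size 43672224
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a pointer file into its fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Hypothetical helper: check a local file's size and hash."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match.
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("adapter_model.safetensors", pointer)` on a local download would then confirm that the file is the one this commit refers to.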

chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is true %}
+        {{- '<think>\n' }}
+    {%- else %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
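
Note on the template above: for plain-text conversations without tools, it wraps each turn in `<|im_start|>` and `<|im_end|>`, and when a generation prompt is requested it appends an empty `<think>` block unless `enable_thinking` is true. The following is a simplified Python re-implementation of that text-only path, written only to illustrate the resulting string. It is a sketch, not the template itself and not a Transformers API: the function name `render_text_chat` and the sample messages are hypothetical, and tools, tool responses as separate messages, and image or video content are deliberately left out.

```python
def render_text_chat(messages, add_generation_prompt=True, enable_thinking=False):
    """Hypothetical mirror of the template's text-only, no-tools path."""
    if not messages:
        raise ValueError("No messages provided.")

    # Index of the last real user query (tool responses sent as user turns are skipped).
    last_query_index = None
    for index in range(len(messages) - 1, -1, -1):
        message = messages[index]
        if message["role"] != "user":
            continue
        text = message["content"].strip()
        if not (text.startswith("<tool_response>") and text.endswith("</tool_response>")):
            last_query_index = index
            break
    if last_query_index is None:
        raise ValueError("No user query found in messages.")

    out = []
    if messages[0]["role"] == "system":
        out.append("<|im_start|>system\n" + messages[0]["content"].strip() + "<|im_end|>\n")

    for index, message in enumerate(messages):
        role = message["role"]
        content = message["content"].strip()
        if role == "system":
            if index != 0:
                raise ValueError("System message must be at the beginning.")
        elif role == "user":
            out.append("<|im_start|>user\n" + content + "<|im_end|>\n")
        elif role == "assistant":
            reasoning = ""
            if isinstance(message.get("reasoning_content"), str):
                reasoning = message["reasoning_content"]
            elif "</think>" in content:
                # Split inline reasoning from the visible answer.
                reasoning = content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
                content = content.split("</think>")[-1].lstrip("\n")
            reasoning = reasoning.strip()
            if index > last_query_index:
                # Only turns after the last user query keep their reasoning.
                out.append("<|im_start|>assistant\n<think>\n" + reasoning + "\n</think>\n\n" + content)
            else:
                out.append("<|im_start|>assistant\n" + content)
            out.append("<|im_end|>\n")
        else:
            raise ValueError("Role not covered by this sketch: " + role)

    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        out.append("<think>\n" if enable_thinking else "<think>\n\n</think>\n\n")
    return "".join(out)


# Sample conversation (hypothetical content, for illustration only).
sample = [
    {"role": "system", "content": "Detect the language."},
    {"role": "user", "content": "Bonjour"},
]
prompt = render_text_chat(sample)
print(prompt)
```

The default branch pre-fills an empty reasoning block, so generation starts directly at the answer; passing `enable_thinking=True` leaves the `<think>` block open for the model to fill.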

checkpoint-1125/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: Qwen/Qwen3.5-2B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3.5-2B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

checkpoint-1125/adapter_config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-1125/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cac2b9560a3033b1bea519d4e7736abf89a10a2ac8d51756d50ae608dca0b739
+size 43672224

checkpoint-1125/chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is true %}
+        {{- '<think>\n' }}
+    {%- else %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

checkpoint-1125/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b43fe79f959aad059bb99e961e9ce6629a5cb8436abb79220cb7c50798e77ab1
+size 87455482
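
Note on the size above: `optimizer.pt` is almost exactly twice the size of `checkpoint-1125/adapter_model.safetensors` (43672224 bytes). A quick arithmetic check, sketched below, shows the ratio. This would be consistent with an Adam-style optimizer that keeps two state tensors per trainable parameter in the same dtype as the weights, but that reading is an assumption: the optimizer is not named in the portion of the commit shown here. The function name `state_ratio` is hypothetical.

```python
# Sizes in bytes, copied from the Git LFS pointers in this commit.
ADAPTER_BYTES = 43_672_224    # checkpoint-1125/adapter_model.safetensors
OPTIMIZER_BYTES = 87_455_482  # checkpoint-1125/optimizer.pt


def state_ratio(optimizer_bytes: int, weight_bytes: int) -> float:
    """Hypothetical helper: optimizer state size relative to adapter weights."""
    return optimizer_bytes / weight_bytes


ratio = state_ratio(OPTIMIZER_BYTES, ADAPTER_BYTES)
# Bytes left over after two full copies of the adapter weights
# (serialization metadata, step counters, and similar, under the
# assumption stated in the note above).
overhead = OPTIMIZER_BYTES - 2 * ADAPTER_BYTES
print(f"ratio={ratio:.4f} overhead={overhead} bytes")
```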

checkpoint-1125/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5cc373ff040a2549fba5716d2d04c98a42df3a2c16be5643e54a5e394dff6691
+size 14244

checkpoint-1125/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2ff7674480b4479d4d875f43691bde4aed921f4f94ac4e33b7c4727012dbbaa5
+size 1064

checkpoint-1125/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d73c2c5f7aa0ed522c8d96ef3524739eb61e3c78e74839a2ce4a1c56ea340a20
+size 19989424

checkpoint-1125/tokenizer_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}

checkpoint-1125/trainer_state.json ADDED

	@@ -0,0 +1,826 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 1125,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008888888888888889,
+      "grad_norm": 11.21920108795166,
+      "learning_rate": 7.964601769911505e-06,
+      "loss": 2.420624923706055,
+      "step": 10
+    },
+    {
+      "epoch": 0.017777777777777778,
+      "grad_norm": 2.168776750564575,
+      "learning_rate": 1.6814159292035402e-05,
+      "loss": 2.054325294494629,
+      "step": 20
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "grad_norm": 1.835652232170105,
+      "learning_rate": 2.5663716814159294e-05,
+      "loss": 1.6131031036376953,
+      "step": 30
+    },
+    {
+      "epoch": 0.035555555555555556,
+      "grad_norm": 1.4394404888153076,
+      "learning_rate": 3.451327433628319e-05,
+      "loss": 1.3123435020446776,
+      "step": 40
+    },
+    {
+      "epoch": 0.044444444444444446,
+      "grad_norm": 0.770032525062561,
+      "learning_rate": 4.3362831858407084e-05,
+      "loss": 0.9337721824645996,
+      "step": 50
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 0.6677928566932678,
+      "learning_rate": 5.221238938053098e-05,
+      "loss": 0.7623300552368164,
+      "step": 60
+    },
+    {
+      "epoch": 0.06222222222222222,
+      "grad_norm": 0.7739974856376648,
+      "learning_rate": 6.106194690265487e-05,
+      "loss": 0.6224458694458008,
+      "step": 70
+    },
+    {
+      "epoch": 0.07111111111111111,
+      "grad_norm": 0.5803143382072449,
+      "learning_rate": 6.991150442477876e-05,
+      "loss": 0.5023368835449219,
+      "step": 80
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.5280536413192749,
+      "learning_rate": 7.876106194690266e-05,
+      "loss": 0.4957974910736084,
+      "step": 90
+    },
+    {
+      "epoch": 0.08888888888888889,
+      "grad_norm": 0.544360876083374,
+      "learning_rate": 8.761061946902655e-05,
+      "loss": 0.4931485652923584,
+      "step": 100
+    },
+    {
+      "epoch": 0.09777777777777778,
+      "grad_norm": 0.5157379508018494,
+      "learning_rate": 9.646017699115044e-05,
+      "loss": 0.5069647789001465,
+      "step": 110
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 0.4112723767757416,
+      "learning_rate": 9.999805495168024e-05,
+      "loss": 0.4394033908843994,
+      "step": 120
+    },
+    {
+      "epoch": 0.11555555555555555,
+      "grad_norm": 0.5091575980186462,
+      "learning_rate": 9.998616909329826e-05,
+      "loss": 0.4934559345245361,
+      "step": 130
+    },
+    {
+      "epoch": 0.12444444444444444,
+      "grad_norm": 0.372969388961792,
+      "learning_rate": 9.996348052452065e-05,
+      "loss": 0.48008193969726565,
+      "step": 140
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "grad_norm": 0.42792317271232605,
+      "learning_rate": 9.992999414866448e-05,
+      "loss": 0.40410771369934084,
+      "step": 150
+    },
+    {
+      "epoch": 0.14222222222222222,
+      "grad_norm": 0.605674684047699,
+      "learning_rate": 9.988571720260407e-05,
+      "loss": 0.44922242164611814,
+      "step": 160
+    },
+    {
+      "epoch": 0.1511111111111111,
+      "grad_norm": 0.4557352662086487,
+      "learning_rate": 9.98306592552068e-05,
+      "loss": 0.4807539939880371,
+      "step": 170
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.35166481137275696,
+      "learning_rate": 9.976483220526534e-05,
+      "loss": 0.41620450019836425,
+      "step": 180
+    },
+    {
+      "epoch": 0.1688888888888889,
+      "grad_norm": 0.3679167628288269,
+      "learning_rate": 9.968825027892603e-05,
+      "loss": 0.5430550575256348,
+      "step": 190
+    },
+    {
+      "epoch": 0.17777777777777778,
+      "grad_norm": 0.40084755420684814,
+      "learning_rate": 9.960093002661443e-05,
+      "loss": 0.45009822845458985,
+      "step": 200
+    },
+    {
+      "epoch": 0.18666666666666668,
+      "grad_norm": 0.4580940306186676,
+      "learning_rate": 9.95028903194586e-05,
+      "loss": 0.42539730072021487,
+      "step": 210
+    },
+    {
+      "epoch": 0.19555555555555557,
+      "grad_norm": 0.33829355239868164,
+      "learning_rate": 9.939415234521074e-05,
+      "loss": 0.45442776679992675,
+      "step": 220
+    },
+    {
+      "epoch": 0.20444444444444446,
+      "grad_norm": 0.3858638405799866,
+      "learning_rate": 9.92747396036682e-05,
+      "loss": 0.46364641189575195,
+      "step": 230
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 0.38474398851394653,
+      "learning_rate": 9.91446779015949e-05,
+      "loss": 0.47534937858581544,
+      "step": 240
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.41425031423568726,
+      "learning_rate": 9.900399534714406e-05,
+      "loss": 0.4933596134185791,
+      "step": 250
+    },
+    {
+      "epoch": 0.2311111111111111,
+      "grad_norm": 0.35072949528694153,
+      "learning_rate": 9.885272234378373e-05,
+      "loss": 0.4453242778778076,
+      "step": 260
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.3859287202358246,
+      "learning_rate": 9.869089158372608e-05,
+      "loss": 0.48943133354187013,
+      "step": 270
+    },
+    {
+      "epoch": 0.24888888888888888,
+      "grad_norm": 0.35846146941185,
+      "learning_rate": 9.851853804086221e-05,
+      "loss": 0.4975121021270752,
+      "step": 280
+    },
+    {
+      "epoch": 0.2577777777777778,
+      "grad_norm": 0.43171802163124084,
+      "learning_rate": 9.833569896320376e-05,
+      "loss": 0.4233686447143555,
+      "step": 290
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 0.38140612840652466,
+      "learning_rate": 9.814241386483315e-05,
+      "loss": 0.45711665153503417,
+      "step": 300
+    },
+    {
+      "epoch": 0.27555555555555555,
+      "grad_norm": 0.3377639353275299,
+      "learning_rate": 9.7938724517364e-05,
+      "loss": 0.4544373035430908,
+      "step": 310
+    },
+    {
+      "epoch": 0.28444444444444444,
+      "grad_norm": 0.34243062138557434,
+      "learning_rate": 9.772467494091368e-05,
+      "loss": 0.42964882850646974,
+      "step": 320
+    },
+    {
+      "epoch": 0.29333333333333333,
+      "grad_norm": 0.34049737453460693,
+      "learning_rate": 9.750031139459004e-05,
+      "loss": 0.42655010223388673,
+      "step": 330
+    },
+    {
+      "epoch": 0.3022222222222222,
+      "grad_norm": 0.35732948780059814,
+      "learning_rate": 9.726568236649401e-05,
+      "loss": 0.45878915786743163,
+      "step": 340
+    },
+    {
+      "epoch": 0.3111111111111111,
+      "grad_norm": 0.3442543148994446,
+      "learning_rate": 9.702083856324078e-05,
+      "loss": 0.43412394523620607,
+      "step": 350
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.3641473650932312,
+      "learning_rate": 9.676583289900137e-05,
+      "loss": 0.4166750907897949,
+      "step": 360
+    },
+    {
+      "epoch": 0.3288888888888889,
+      "grad_norm": 0.38029661774635315,
+      "learning_rate": 9.650072048406705e-05,
+      "loss": 0.4133127212524414,
+      "step": 370
+    },
+    {
+      "epoch": 0.3377777777777778,
+      "grad_norm": 0.46410584449768066,
+      "learning_rate": 9.622555861293937e-05,
+      "loss": 0.4788216590881348,
+      "step": 380
+    },
+    {
+      "epoch": 0.3466666666666667,
+      "grad_norm": 0.46155110001564026,
+      "learning_rate": 9.594040675194789e-05,
+      "loss": 0.438585090637207,
+      "step": 390
+    },
+    {
+      "epoch": 0.35555555555555557,
+      "grad_norm": 0.3239903450012207,
+      "learning_rate": 9.564532652639874e-05,
+      "loss": 0.41576008796691893,
+      "step": 400
+    },
+    {
+      "epoch": 0.36444444444444446,
+      "grad_norm": 0.37329864501953125,
+      "learning_rate": 9.534038170725656e-05,
+      "loss": 0.45173234939575196,
+      "step": 410
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 0.36062464118003845,
+      "learning_rate": 9.502563819736261e-05,
+      "loss": 0.4431358814239502,
+      "step": 420
+    },
+    {
+      "epoch": 0.38222222222222224,
+      "grad_norm": 0.32877373695373535,
+      "learning_rate": 9.47011640171923e-05,
+      "loss": 0.38395123481750487,
+      "step": 430
+    },
+    {
+      "epoch": 0.39111111111111113,
+      "grad_norm": 0.39523133635520935,
+      "learning_rate": 9.436702929015504e-05,
+      "loss": 0.45081305503845215,
+      "step": 440
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.396107017993927,
+      "learning_rate": 9.402330622743953e-05,
+      "loss": 0.431630802154541,
+      "step": 450
+    },
+    {
+      "epoch": 0.4088888888888889,
+      "grad_norm": 0.34849977493286133,
+      "learning_rate": 9.367006911240794e-05,
+      "loss": 0.4235401153564453,
+      "step": 460
+    },
+    {
+      "epoch": 0.4177777777777778,
+      "grad_norm": 0.32264649868011475,
+      "learning_rate": 9.330739428454228e-05,
+      "loss": 0.41276025772094727,
+      "step": 470
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 0.40649130940437317,
+      "learning_rate": 9.293536012294625e-05,
+      "loss": 0.4388761520385742,
+      "step": 480
+    },
+    {
+      "epoch": 0.43555555555555553,
+      "grad_norm": 0.3980196714401245,
+      "learning_rate": 9.25540470294066e-05,
+      "loss": 0.434765625,
+      "step": 490
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 0.34392979741096497,
+      "learning_rate": 9.216353741101698e-05,
+      "loss": 0.4237233638763428,
+      "step": 500
+    },
+    {
+      "epoch": 0.4533333333333333,
+      "grad_norm": 0.4070020318031311,
+      "learning_rate": 9.176391566236874e-05,
+      "loss": 0.43479743003845217,
+      "step": 510
+    },
+    {
+      "epoch": 0.4622222222222222,
+      "grad_norm": 0.3691484332084656,
+      "learning_rate": 9.13552681473121e-05,
+      "loss": 0.41257572174072266,
+      "step": 520
+    },
+    {
+      "epoch": 0.4711111111111111,
+      "grad_norm": 0.31974899768829346,
+      "learning_rate": 9.093768318029157e-05,
+      "loss": 0.4397622585296631,
+      "step": 530
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 0.3324742317199707,
+      "learning_rate": 9.051125100726012e-05,
+      "loss": 0.4031101703643799,
+      "step": 540
+    },
+    {
+      "epoch": 0.4888888888888889,
+      "grad_norm": 0.4857999384403229,
+      "learning_rate": 9.00760637861757e-05,
+      "loss": 0.4260400295257568,
+      "step": 550
+    },
+    {
+      "epoch": 0.49777777777777776,
+      "grad_norm": 0.3340180218219757,
+      "learning_rate": 8.96322155670846e-05,
+      "loss": 0.45470619201660156,
+      "step": 560
+    },
+    {
+      "epoch": 0.5066666666666667,
+      "grad_norm": 0.44739553332328796,
+      "learning_rate": 8.917980227179592e-05,
+      "loss": 0.40158867835998535,
+      "step": 570
+    },
+    {
+      "epoch": 0.5155555555555555,
+      "grad_norm": 0.3540358245372772,
+      "learning_rate": 8.871892167315158e-05,
+      "loss": 0.43067140579223634,
+      "step": 580
+    },
+    {
+      "epoch": 0.5244444444444445,
+      "grad_norm": 0.3139533996582031,
+      "learning_rate": 8.824967337389618e-05,
+      "loss": 0.4297904968261719,
+      "step": 590
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 0.31625425815582275,
+      "learning_rate": 8.777215878515148e-05,
+      "loss": 0.41481895446777345,
+      "step": 600
+    },
+    {
+      "epoch": 0.5422222222222223,
+      "grad_norm": 0.39057719707489014,
+      "learning_rate": 8.728648110450007e-05,
+      "loss": 0.4180173873901367,
+      "step": 610
+    },
+    {
+      "epoch": 0.5511111111111111,
+      "grad_norm": 0.358767569065094,
+      "learning_rate": 8.679274529368284e-05,
+      "loss": 0.42813663482666015,
+      "step": 620
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.3512662649154663,
+      "learning_rate": 8.629105805591536e-05,
+      "loss": 0.45242977142333984,
+      "step": 630
+    },
+    {
+      "epoch": 0.5688888888888889,
+      "grad_norm": 0.3629150688648224,
+      "learning_rate": 8.57815278128278e-05,
+      "loss": 0.39222893714904783,
+      "step": 640
+    },
+    {
+      "epoch": 0.5777777777777777,
+      "grad_norm": 0.4052797555923462,
+      "learning_rate": 8.52642646810335e-05,
+      "loss": 0.43261022567749025,
+      "step": 650
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 0.37778323888778687,
+      "learning_rate": 8.473938044833118e-05,
+      "loss": 0.3872582674026489,
+      "step": 660
+    },
+    {
+      "epoch": 0.5955555555555555,
+      "grad_norm": 0.3754846453666687,
+      "learning_rate": 8.420698854954614e-05,
+      "loss": 0.3962836980819702,
+      "step": 670
+    },
+    {
+      "epoch": 0.6044444444444445,
+      "grad_norm": 0.3173210918903351,
+      "learning_rate": 8.366720404201532e-05,
+      "loss": 0.40433526039123535,
+      "step": 680
+    },
+    {
+      "epoch": 0.6133333333333333,
+      "grad_norm": 0.3750768005847931,
+      "learning_rate": 8.312014358072182e-05,
+      "loss": 0.44541182518005373,
+      "step": 690
+    },
+    {
+      "epoch": 0.6222222222222222,
+      "grad_norm": 0.396304726600647,
+      "learning_rate": 8.256592539308412e-05,
+      "loss": 0.39640786647796633,
+      "step": 700
+    },
+    {
+      "epoch": 0.6311111111111111,
+      "grad_norm": 0.27579763531684875,
+      "learning_rate": 8.200466925340551e-05,
+      "loss": 0.42966270446777344,
+      "step": 710
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.3287679851055145,
+      "learning_rate": 8.143649645698916e-05,
+      "loss": 0.4303389549255371,
+      "step": 720
+    },
+    {
+      "epoch": 0.6488888888888888,
+      "grad_norm": 0.3433365821838379,
+      "learning_rate": 8.086152979392455e-05,
+      "loss": 0.42377805709838867,
+      "step": 730
+    },
+    {
+      "epoch": 0.6577777777777778,
+      "grad_norm": 0.37662261724472046,
+      "learning_rate": 8.027989352255072e-05,
+      "loss": 0.3759341239929199,
+      "step": 740
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 0.293804794549942,
+      "learning_rate": 7.969171334260241e-05,
+      "loss": 0.4281143665313721,
+      "step": 750
+    },
+    {
+      "epoch": 0.6755555555555556,
+      "grad_norm": 0.3423449695110321,
+      "learning_rate": 7.909711636804456e-05,
+      "loss": 0.4293722152709961,
+      "step": 760
+    },
+    {
+      "epoch": 0.6844444444444444,
+      "grad_norm": 0.3712252080440521,
+      "learning_rate": 7.849623109960114e-05,
+      "loss": 0.3751317262649536,
+      "step": 770
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 0.30978795886039734,
+      "learning_rate": 7.788918739698442e-05,
+      "loss": 0.4274724006652832,
+      "step": 780
+    },
+    {
+      "epoch": 0.7022222222222222,
+      "grad_norm": 0.3778475821018219,
+      "learning_rate": 7.727611645083046e-05,
+      "loss": 0.4593647480010986,
+      "step": 790
+    },
+    {
+      "epoch": 0.7111111111111111,
+      "grad_norm": 0.4193788170814514,
+      "learning_rate": 7.665715075434693e-05,
+      "loss": 0.4005401134490967,
+      "step": 800
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 0.4584580063819885,
+      "learning_rate": 7.603242407467957e-05,
+      "loss": 0.4481417179107666,
+      "step": 810
+    },
+    {
+      "epoch": 0.7288888888888889,
+      "grad_norm": 0.4954594373703003,
+      "learning_rate": 7.54020714240031e-05,
+      "loss": 0.4419954776763916,
+      "step": 820
+    },
+    {
+      "epoch": 0.7377777777777778,
+      "grad_norm": 0.32087239623069763,
+      "learning_rate": 7.476622903034331e-05,
+      "loss": 0.358188796043396,
+      "step": 830
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 0.415523499250412,
+      "learning_rate": 7.412503430813625e-05,
+      "loss": 0.4097938060760498,
+      "step": 840
+    },
+    {
+      "epoch": 0.7555555555555555,
+      "grad_norm": 0.4330928325653076,
+      "learning_rate": 7.347862582853102e-05,
+      "loss": 0.39210293292999265,
+      "step": 850
+    },
+    {
+      "epoch": 0.7644444444444445,
+      "grad_norm": 0.32685935497283936,
+      "learning_rate": 7.282714328944267e-05,
+      "loss": 0.4046238899230957,
+      "step": 860
+    },
+    {
+      "epoch": 0.7733333333333333,
+      "grad_norm": 0.25576087832450867,
+      "learning_rate": 7.217072748536147e-05,
+      "loss": 0.424999475479126,
+      "step": 870
+    },
+    {
+      "epoch": 0.7822222222222223,
+      "grad_norm": 0.5378305912017822,
+      "learning_rate": 7.150952027692523e-05,
+      "loss": 0.4036864757537842,
+      "step": 880
+    },
+    {
+      "epoch": 0.7911111111111111,
+      "grad_norm": 0.3434256315231323,
+      "learning_rate": 7.08436645602613e-05,
+      "loss": 0.4073346614837646,
+      "step": 890
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.38994190096855164,
+      "learning_rate": 7.017330423610469e-05,
+      "loss": 0.38085227012634276,
+      "step": 900
+    },
+    {
+      "epoch": 0.8088888888888889,
+      "grad_norm": 0.36494192481040955,
+      "learning_rate": 6.949858417869908e-05,
+      "loss": 0.41643786430358887,
+      "step": 910
+    },
+    {
+      "epoch": 0.8177777777777778,
+      "grad_norm": 0.3612149655818939,
+      "learning_rate": 6.881965020448753e-05,
+      "loss": 0.4108633518218994,
+      "step": 920
+    },
+    {
+      "epoch": 0.8266666666666667,
+      "grad_norm": 0.43641334772109985,
+      "learning_rate": 6.813664904059944e-05,
+      "loss": 0.42844533920288086,
+      "step": 930
+    },
+    {
+      "epoch": 0.8355555555555556,
+      "grad_norm": 0.37633153796195984,
+      "learning_rate": 6.744972829314079e-05,
+      "loss": 0.4326623916625977,
+      "step": 940
+    },
+    {
+      "epoch": 0.8444444444444444,
+      "grad_norm": 0.30158376693725586,
+      "learning_rate": 6.675903641529442e-05,
+      "loss": 0.41312832832336427,
+      "step": 950
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.49504661560058594,
+      "learning_rate": 6.60647226752372e-05,
+      "loss": 0.38414552211761477,
+      "step": 960
+    },
+    {
+      "epoch": 0.8622222222222222,
+      "grad_norm": 0.40403881669044495,
+      "learning_rate": 6.536693712388108e-05,
+      "loss": 0.40920281410217285,
+      "step": 970
+    },
+    {
+      "epoch": 0.8711111111111111,
+      "grad_norm": 0.3977372646331787,
+      "learning_rate": 6.466583056244502e-05,
+      "loss": 0.4197856903076172,
+      "step": 980
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.3222980499267578,
+      "learning_rate": 6.396155450986467e-05,
+      "loss": 0.43209381103515626,
+      "step": 990
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 0.3269721269607544,
+      "learning_rate": 6.32542611700471e-05,
+      "loss": 0.3872633457183838,
+      "step": 1000
+    },
+    {
+      "epoch": 0.8977777777777778,
+      "grad_norm": 0.2627931237220764,
+      "learning_rate": 6.254410339897733e-05,
+      "loss": 0.40660743713378905,
+      "step": 1010
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 0.4035646915435791,
+      "learning_rate": 6.183123467168407e-05,
+      "loss": 0.4188549995422363,
+      "step": 1020
+    },
+    {
+      "epoch": 0.9155555555555556,
+      "grad_norm": 0.32446223497390747,
+      "learning_rate": 6.111580904907158e-05,
+      "loss": 0.3956818342208862,
+      "step": 1030
+    },
+    {
+      "epoch": 0.9244444444444444,
+      "grad_norm": 0.400729238986969,
+      "learning_rate": 6.039798114462497e-05,
+      "loss": 0.4168698310852051,
+      "step": 1040
+    },
+    {
+      "epoch": 0.9333333333333333,
+      "grad_norm": 0.3183426856994629,
+      "learning_rate": 5.967790609099604e-05,
+      "loss": 0.42308673858642576,
+      "step": 1050
+    },
+    {
+      "epoch": 0.9422222222222222,
+      "grad_norm": 0.3495250940322876,
+      "learning_rate": 5.895573950647702e-05,
+      "loss": 0.3833682775497437,
+      "step": 1060
+    },
+    {
+      "epoch": 0.9511111111111111,
+      "grad_norm": 0.31198158860206604,
+      "learning_rate": 5.8231637461369124e-05,
+      "loss": 0.3714336633682251,
+      "step": 1070
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.33288607001304626,
+      "learning_rate": 5.75057564442538e-05,
+      "loss": 0.37983601093292235,
+      "step": 1080
+    },
+    {
+      "epoch": 0.9688888888888889,
+      "grad_norm": 0.2962404787540436,
+      "learning_rate": 5.677825332817313e-05,
+      "loss": 0.4199959754943848,
+      "step": 1090
+    },
+    {
+      "epoch": 0.9777777777777777,
+      "grad_norm": 0.3724680542945862,
+      "learning_rate": 5.604928533672751e-05,
+      "loss": 0.3786408185958862,
+      "step": 1100
+    },
+    {
+      "epoch": 0.9866666666666667,
+      "grad_norm": 0.3874838948249817,
+      "learning_rate": 5.5319010010097416e-05,
+      "loss": 0.40485186576843263,
+      "step": 1110
+    },
+    {
+      "epoch": 0.9955555555555555,
+      "grad_norm": 0.3587503731250763,
+      "learning_rate": 5.458758517099671e-05,
+      "loss": 0.3982378005981445,
+      "step": 1120
+    },
+    {
+      "epoch": 1.0,
+      "eval_loss": 0.3836138844490051,
+      "eval_runtime": 358.5191,
+      "eval_samples_per_second": 2.789,
+      "eval_steps_per_second": 2.789,
+      "step": 1125
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2250,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.158793139835494e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
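
Note on the `learning_rate` values in `log_history`: they rise linearly for roughly the first 113 steps, peak near 1e-4, then decay along a half cosine toward step 2250 (`max_steps`). The sketch below reproduces that shape. The peak of 1e-4 and the 113 warmup steps are inferred from the logged numbers, not stated anywhere in this commit (the actual arguments live in the binary `training_args.bin`), so treat both as assumptions. The logged value at `step` N appears to correspond to scheduler step N - 1.

```python
import math


def cosine_lr_with_warmup(step, peak_lr=1e-4, warmup_steps=113, total_steps=2250):
    """Linear warmup followed by cosine decay to zero.

    peak_lr and warmup_steps are assumptions inferred from the log above;
    total_steps matches "max_steps" in trainer_state.json.
    """
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Compare with the entries logged at steps 10, 120 and 1120.
lr_at_log_10 = cosine_lr_with_warmup(9)
lr_at_log_120 = cosine_lr_with_warmup(119)
lr_at_log_1120 = cosine_lr_with_warmup(1119)
```

Under these assumptions the three values agree with the logged `7.964601769911505e-06`, `9.999805495168024e-05` and `5.458758517099671e-05` to within floating-point tolerance.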

checkpoint-1125/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:edb44ddd12e5044cfebc57246c77bd185ad4fad52f56a66554c2013fa82c4cfa
+size 4920

checkpoint-2250/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: Qwen/Qwen3.5-2B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3.5-2B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

checkpoint-2250/adapter_config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
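
Note on `r` and `lora_alpha` in the config above: with standard LoRA the low-rank update is multiplied by `lora_alpha / r`, and with rank-stabilized LoRA (`use_rslora`) by `lora_alpha / sqrt(r)`. A minimal sketch of that arithmetic, using only the three fields it needs; `lora_scaling` is a hypothetical helper, not a PEFT function.

```python
import math


def lora_scaling(config):
    """Return the factor applied to the low-rank update B @ A (illustrative)."""
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(r)
    return alpha / r


# Values copied from checkpoint-2250/adapter_config.json.
adapter_config = {"r": 16, "lora_alpha": 32, "use_rslora": False}
scale = lora_scaling(adapter_config)
```

For this adapter the factor is 32 / 16 = 2.0, so each targeted projection adds twice its low-rank product to the frozen base weight.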

checkpoint-2250/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42cc2cadc1cf728b9d3ca8793d77f15786d281c09f45e9183fba3602774dcb26
+size 43672224

checkpoint-2250/chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is true %}
+        {{- '<think>\n' }}
+    {%- else %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

checkpoint-2250/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:404a943a609bfcb75250a8011119f09418c894895bc308a075552b1b6559b68e
+size 87455482

checkpoint-2250/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a4e21b972e65751c8023031b61e969faab5e81f9d646cc34a6f2246636644f45
+size 14244

checkpoint-2250/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1fa7b0a07a77239aa276c8f4457f190e8ab830494b0402a18140f02280d03eba
+size 1064

checkpoint-2250/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d73c2c5f7aa0ed522c8d96ef3524739eb61e3c78e74839a2ce4a1c56ea340a20
+size 19989424

checkpoint-2250/tokenizer_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}

checkpoint-2250/trainer_state.json ADDED

	@@ -0,0 +1,1625 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 2250,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008888888888888889,
+      "grad_norm": 11.21920108795166,
+      "learning_rate": 7.964601769911505e-06,
+      "loss": 2.420624923706055,
+      "step": 10
+    },
+    {
+      "epoch": 0.017777777777777778,
+      "grad_norm": 2.168776750564575,
+      "learning_rate": 1.6814159292035402e-05,
+      "loss": 2.054325294494629,
+      "step": 20
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "grad_norm": 1.835652232170105,
+      "learning_rate": 2.5663716814159294e-05,
+      "loss": 1.6131031036376953,
+      "step": 30
+    },
+    {
+      "epoch": 0.035555555555555556,
+      "grad_norm": 1.4394404888153076,
+      "learning_rate": 3.451327433628319e-05,
+      "loss": 1.3123435020446776,
+      "step": 40
+    },
+    {
+      "epoch": 0.044444444444444446,
+      "grad_norm": 0.770032525062561,
+      "learning_rate": 4.3362831858407084e-05,
+      "loss": 0.9337721824645996,
+      "step": 50
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 0.6677928566932678,
+      "learning_rate": 5.221238938053098e-05,
+      "loss": 0.7623300552368164,
+      "step": 60
+    },
+    {
+      "epoch": 0.06222222222222222,
+      "grad_norm": 0.7739974856376648,
+      "learning_rate": 6.106194690265487e-05,
+      "loss": 0.6224458694458008,
+      "step": 70
+    },
+    {
+      "epoch": 0.07111111111111111,
+      "grad_norm": 0.5803143382072449,
+      "learning_rate": 6.991150442477876e-05,
+      "loss": 0.5023368835449219,
+      "step": 80
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.5280536413192749,
+      "learning_rate": 7.876106194690266e-05,
+      "loss": 0.4957974910736084,
+      "step": 90
+    },
+    {
+      "epoch": 0.08888888888888889,
+      "grad_norm": 0.544360876083374,
+      "learning_rate": 8.761061946902655e-05,
+      "loss": 0.4931485652923584,
+      "step": 100
+    },
+    {
+      "epoch": 0.09777777777777778,
+      "grad_norm": 0.5157379508018494,
+      "learning_rate": 9.646017699115044e-05,
+      "loss": 0.5069647789001465,
+      "step": 110
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 0.4112723767757416,
+      "learning_rate": 9.999805495168024e-05,
+      "loss": 0.4394033908843994,
+      "step": 120
+    },
+    {
+      "epoch": 0.11555555555555555,
+      "grad_norm": 0.5091575980186462,
+      "learning_rate": 9.998616909329826e-05,
+      "loss": 0.4934559345245361,
+      "step": 130
+    },
+    {
+      "epoch": 0.12444444444444444,
+      "grad_norm": 0.372969388961792,
+      "learning_rate": 9.996348052452065e-05,
+      "loss": 0.48008193969726565,
+      "step": 140
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "grad_norm": 0.42792317271232605,
+      "learning_rate": 9.992999414866448e-05,
+      "loss": 0.40410771369934084,
+      "step": 150
+    },
+    {
+      "epoch": 0.14222222222222222,
+      "grad_norm": 0.605674684047699,
+      "learning_rate": 9.988571720260407e-05,
+      "loss": 0.44922242164611814,
+      "step": 160
+    },
+    {
+      "epoch": 0.1511111111111111,
+      "grad_norm": 0.4557352662086487,
+      "learning_rate": 9.98306592552068e-05,
+      "loss": 0.4807539939880371,
+      "step": 170
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.35166481137275696,
+      "learning_rate": 9.976483220526534e-05,
+      "loss": 0.41620450019836425,
+      "step": 180
+    },
+    {
+      "epoch": 0.1688888888888889,
+      "grad_norm": 0.3679167628288269,
+      "learning_rate": 9.968825027892603e-05,
+      "loss": 0.5430550575256348,
+      "step": 190
+    },
+    {
+      "epoch": 0.17777777777777778,
+      "grad_norm": 0.40084755420684814,
+      "learning_rate": 9.960093002661443e-05,
+      "loss": 0.45009822845458985,
+      "step": 200
+    },
+    {
+      "epoch": 0.18666666666666668,
+      "grad_norm": 0.4580940306186676,
+      "learning_rate": 9.95028903194586e-05,
+      "loss": 0.42539730072021487,
+      "step": 210
+    },
+    {
+      "epoch": 0.19555555555555557,
+      "grad_norm": 0.33829355239868164,
+      "learning_rate": 9.939415234521074e-05,
+      "loss": 0.45442776679992675,
+      "step": 220
+    },
+    {
+      "epoch": 0.20444444444444446,
+      "grad_norm": 0.3858638405799866,
+      "learning_rate": 9.92747396036682e-05,
+      "loss": 0.46364641189575195,
+      "step": 230
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 0.38474398851394653,
+      "learning_rate": 9.91446779015949e-05,
+      "loss": 0.47534937858581544,
+      "step": 240
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.41425031423568726,
+      "learning_rate": 9.900399534714406e-05,
+      "loss": 0.4933596134185791,
+      "step": 250
+    },
+    {
+      "epoch": 0.2311111111111111,
+      "grad_norm": 0.35072949528694153,
+      "learning_rate": 9.885272234378373e-05,
+      "loss": 0.4453242778778076,
+      "step": 260
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.3859287202358246,
+      "learning_rate": 9.869089158372608e-05,
+      "loss": 0.48943133354187013,
+      "step": 270
+    },
+    {
+      "epoch": 0.24888888888888888,
+      "grad_norm": 0.35846146941185,
+      "learning_rate": 9.851853804086221e-05,
+      "loss": 0.4975121021270752,
+      "step": 280
+    },
+    {
+      "epoch": 0.2577777777777778,
+      "grad_norm": 0.43171802163124084,
+      "learning_rate": 9.833569896320376e-05,
+      "loss": 0.4233686447143555,
+      "step": 290
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 0.38140612840652466,
+      "learning_rate": 9.814241386483315e-05,
+      "loss": 0.45711665153503417,
+      "step": 300
+    },
+    {
+      "epoch": 0.27555555555555555,
+      "grad_norm": 0.3377639353275299,
+      "learning_rate": 9.7938724517364e-05,
+      "loss": 0.4544373035430908,
+      "step": 310
+    },
+    {
+      "epoch": 0.28444444444444444,
+      "grad_norm": 0.34243062138557434,
+      "learning_rate": 9.772467494091368e-05,
+      "loss": 0.42964882850646974,
+      "step": 320
+    },
+    {
+      "epoch": 0.29333333333333333,
+      "grad_norm": 0.34049737453460693,
+      "learning_rate": 9.750031139459004e-05,
+      "loss": 0.42655010223388673,
+      "step": 330
+    },
+    {
+      "epoch": 0.3022222222222222,
+      "grad_norm": 0.35732948780059814,
+      "learning_rate": 9.726568236649401e-05,
+      "loss": 0.45878915786743163,
+      "step": 340
+    },
+    {
+      "epoch": 0.3111111111111111,
+      "grad_norm": 0.3442543148994446,
+      "learning_rate": 9.702083856324078e-05,
+      "loss": 0.43412394523620607,
+      "step": 350
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.3641473650932312,
+      "learning_rate": 9.676583289900137e-05,
+      "loss": 0.4166750907897949,
+      "step": 360
+    },
+    {
+      "epoch": 0.3288888888888889,
+      "grad_norm": 0.38029661774635315,
+      "learning_rate": 9.650072048406705e-05,
+      "loss": 0.4133127212524414,
+      "step": 370
+    },
+    {
+      "epoch": 0.3377777777777778,
+      "grad_norm": 0.46410584449768066,
+      "learning_rate": 9.622555861293937e-05,
+      "loss": 0.4788216590881348,
+      "step": 380
+    },
+    {
+      "epoch": 0.3466666666666667,
+      "grad_norm": 0.46155110001564026,
+      "learning_rate": 9.594040675194789e-05,
+      "loss": 0.438585090637207,
+      "step": 390
+    },
+    {
+      "epoch": 0.35555555555555557,
+      "grad_norm": 0.3239903450012207,
+      "learning_rate": 9.564532652639874e-05,
+      "loss": 0.41576008796691893,
+      "step": 400
+    },
+    {
+      "epoch": 0.36444444444444446,
+      "grad_norm": 0.37329864501953125,
+      "learning_rate": 9.534038170725656e-05,
+      "loss": 0.45173234939575196,
+      "step": 410
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 0.36062464118003845,
+      "learning_rate": 9.502563819736261e-05,
+      "loss": 0.4431358814239502,
+      "step": 420
+    },
+    {
+      "epoch": 0.38222222222222224,
+      "grad_norm": 0.32877373695373535,
+      "learning_rate": 9.47011640171923e-05,
+      "loss": 0.38395123481750487,
+      "step": 430
+    },
+    {
+      "epoch": 0.39111111111111113,
+      "grad_norm": 0.39523133635520935,
+      "learning_rate": 9.436702929015504e-05,
+      "loss": 0.45081305503845215,
+      "step": 440
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.396107017993927,
+      "learning_rate": 9.402330622743953e-05,
+      "loss": 0.431630802154541,
+      "step": 450
+    },
+    {
+      "epoch": 0.4088888888888889,
+      "grad_norm": 0.34849977493286133,
+      "learning_rate": 9.367006911240794e-05,
+      "loss": 0.4235401153564453,
+      "step": 460
+    },
+    {
+      "epoch": 0.4177777777777778,
+      "grad_norm": 0.32264649868011475,
+      "learning_rate": 9.330739428454228e-05,
+      "loss": 0.41276025772094727,
+      "step": 470
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 0.40649130940437317,
+      "learning_rate": 9.293536012294625e-05,
+      "loss": 0.4388761520385742,
+      "step": 480
+    },
+    {
+      "epoch": 0.43555555555555553,
+      "grad_norm": 0.3980196714401245,
+      "learning_rate": 9.25540470294066e-05,
+      "loss": 0.434765625,
+      "step": 490
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 0.34392979741096497,
+      "learning_rate": 9.216353741101698e-05,
+      "loss": 0.4237233638763428,
+      "step": 500
+    },
+    {
+      "epoch": 0.4533333333333333,
+      "grad_norm": 0.4070020318031311,
+      "learning_rate": 9.176391566236874e-05,
+      "loss": 0.43479743003845217,
+      "step": 510
+    },
+    {
+      "epoch": 0.4622222222222222,
+      "grad_norm": 0.3691484332084656,
+      "learning_rate": 9.13552681473121e-05,
+      "loss": 0.41257572174072266,
+      "step": 520
+    },
+    {
+      "epoch": 0.4711111111111111,
+      "grad_norm": 0.31974899768829346,
+      "learning_rate": 9.093768318029157e-05,
+      "loss": 0.4397622585296631,
+      "step": 530
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 0.3324742317199707,
+      "learning_rate": 9.051125100726012e-05,
+      "loss": 0.4031101703643799,
+      "step": 540
+    },
+    {
+      "epoch": 0.4888888888888889,
+      "grad_norm": 0.4857999384403229,
+      "learning_rate": 9.00760637861757e-05,
+      "loss": 0.4260400295257568,
+      "step": 550
+    },
+    {
+      "epoch": 0.49777777777777776,
+      "grad_norm": 0.3340180218219757,
+      "learning_rate": 8.96322155670846e-05,
+      "loss": 0.45470619201660156,
+      "step": 560
+    },
+    {
+      "epoch": 0.5066666666666667,
+      "grad_norm": 0.44739553332328796,
+      "learning_rate": 8.917980227179592e-05,
+      "loss": 0.40158867835998535,
+      "step": 570
+    },
+    {
+      "epoch": 0.5155555555555555,
+      "grad_norm": 0.3540358245372772,
+      "learning_rate": 8.871892167315158e-05,
+      "loss": 0.43067140579223634,
+      "step": 580
+    },
+    {
+      "epoch": 0.5244444444444445,
+      "grad_norm": 0.3139533996582031,
+      "learning_rate": 8.824967337389618e-05,
+      "loss": 0.4297904968261719,
+      "step": 590
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 0.31625425815582275,
+      "learning_rate": 8.777215878515148e-05,
+      "loss": 0.41481895446777345,
+      "step": 600
+    },
+    {
+      "epoch": 0.5422222222222223,
+      "grad_norm": 0.39057719707489014,
+      "learning_rate": 8.728648110450007e-05,
+      "loss": 0.4180173873901367,
+      "step": 610
+    },
+    {
+      "epoch": 0.5511111111111111,
+      "grad_norm": 0.358767569065094,
+      "learning_rate": 8.679274529368284e-05,
+      "loss": 0.42813663482666015,
+      "step": 620
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.3512662649154663,
+      "learning_rate": 8.629105805591536e-05,
+      "loss": 0.45242977142333984,
+      "step": 630
+    },
+    {
+      "epoch": 0.5688888888888889,
+      "grad_norm": 0.3629150688648224,
+      "learning_rate": 8.57815278128278e-05,
+      "loss": 0.39222893714904783,
+      "step": 640
+    },
+    {
+      "epoch": 0.5777777777777777,
+      "grad_norm": 0.4052797555923462,
+      "learning_rate": 8.52642646810335e-05,
+      "loss": 0.43261022567749025,
+      "step": 650
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 0.37778323888778687,
+      "learning_rate": 8.473938044833118e-05,
+      "loss": 0.3872582674026489,
+      "step": 660
+    },
+    {
+      "epoch": 0.5955555555555555,
+      "grad_norm": 0.3754846453666687,
+      "learning_rate": 8.420698854954614e-05,
+      "loss": 0.3962836980819702,
+      "step": 670
+    },
+    {
+      "epoch": 0.6044444444444445,
+      "grad_norm": 0.3173210918903351,
+      "learning_rate": 8.366720404201532e-05,
+      "loss": 0.40433526039123535,
+      "step": 680
+    },
+    {
+      "epoch": 0.6133333333333333,
+      "grad_norm": 0.3750768005847931,
+      "learning_rate": 8.312014358072182e-05,
+      "loss": 0.44541182518005373,
+      "step": 690
+    },
+    {
+      "epoch": 0.6222222222222222,
+      "grad_norm": 0.396304726600647,
+      "learning_rate": 8.256592539308412e-05,
+      "loss": 0.39640786647796633,
+      "step": 700
+    },
+    {
+      "epoch": 0.6311111111111111,
+      "grad_norm": 0.27579763531684875,
+      "learning_rate": 8.200466925340551e-05,
+      "loss": 0.42966270446777344,
+      "step": 710
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.3287679851055145,
+      "learning_rate": 8.143649645698916e-05,
+      "loss": 0.4303389549255371,
+      "step": 720
+    },
+    {
+      "epoch": 0.6488888888888888,
+      "grad_norm": 0.3433365821838379,
+      "learning_rate": 8.086152979392455e-05,
+      "loss": 0.42377805709838867,
+      "step": 730
+    },
+    {
+      "epoch": 0.6577777777777778,
+      "grad_norm": 0.37662261724472046,
+      "learning_rate": 8.027989352255072e-05,
+      "loss": 0.3759341239929199,
+      "step": 740
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 0.293804794549942,
+      "learning_rate": 7.969171334260241e-05,
+      "loss": 0.4281143665313721,
+      "step": 750
+    },
+    {
+      "epoch": 0.6755555555555556,
+      "grad_norm": 0.3423449695110321,
+      "learning_rate": 7.909711636804456e-05,
+      "loss": 0.4293722152709961,
+      "step": 760
+    },
+    {
+      "epoch": 0.6844444444444444,
+      "grad_norm": 0.3712252080440521,
+      "learning_rate": 7.849623109960114e-05,
+      "loss": 0.3751317262649536,
+      "step": 770
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 0.30978795886039734,
+      "learning_rate": 7.788918739698442e-05,
+      "loss": 0.4274724006652832,
+      "step": 780
+    },
+    {
+      "epoch": 0.7022222222222222,
+      "grad_norm": 0.3778475821018219,
+      "learning_rate": 7.727611645083046e-05,
+      "loss": 0.4593647480010986,
+      "step": 790
+    },
+    {
+      "epoch": 0.7111111111111111,
+      "grad_norm": 0.4193788170814514,
+      "learning_rate": 7.665715075434693e-05,
+      "loss": 0.4005401134490967,
+      "step": 800
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 0.4584580063819885,
+      "learning_rate": 7.603242407467957e-05,
+      "loss": 0.4481417179107666,
+      "step": 810
+    },
+    {
+      "epoch": 0.7288888888888889,
+      "grad_norm": 0.4954594373703003,
+      "learning_rate": 7.54020714240031e-05,
+      "loss": 0.4419954776763916,
+      "step": 820
+    },
+    {
+      "epoch": 0.7377777777777778,
+      "grad_norm": 0.32087239623069763,
+      "learning_rate": 7.476622903034331e-05,
+      "loss": 0.358188796043396,
+      "step": 830
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 0.415523499250412,
+      "learning_rate": 7.412503430813625e-05,
+      "loss": 0.4097938060760498,
+      "step": 840
+    },
+    {
+      "epoch": 0.7555555555555555,
+      "grad_norm": 0.4330928325653076,
+      "learning_rate": 7.347862582853102e-05,
+      "loss": 0.39210293292999265,
+      "step": 850
+    },
+    {
+      "epoch": 0.7644444444444445,
+      "grad_norm": 0.32685935497283936,
+      "learning_rate": 7.282714328944267e-05,
+      "loss": 0.4046238899230957,
+      "step": 860
+    },
+    {
+      "epoch": 0.7733333333333333,
+      "grad_norm": 0.25576087832450867,
+      "learning_rate": 7.217072748536147e-05,
+      "loss": 0.424999475479126,
+      "step": 870
+    },
+    {
+      "epoch": 0.7822222222222223,
+      "grad_norm": 0.5378305912017822,
+      "learning_rate": 7.150952027692523e-05,
+      "loss": 0.4036864757537842,
+      "step": 880
+    },
+    {
+      "epoch": 0.7911111111111111,
+      "grad_norm": 0.3434256315231323,
+      "learning_rate": 7.08436645602613e-05,
+      "loss": 0.4073346614837646,
+      "step": 890
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.38994190096855164,
+      "learning_rate": 7.017330423610469e-05,
+      "loss": 0.38085227012634276,
+      "step": 900
+    },
+    {
+      "epoch": 0.8088888888888889,
+      "grad_norm": 0.36494192481040955,
+      "learning_rate": 6.949858417869908e-05,
+      "loss": 0.41643786430358887,
+      "step": 910
+    },
+    {
+      "epoch": 0.8177777777777778,
+      "grad_norm": 0.3612149655818939,
+      "learning_rate": 6.881965020448753e-05,
+      "loss": 0.4108633518218994,
+      "step": 920
+    },
+    {
+      "epoch": 0.8266666666666667,
+      "grad_norm": 0.43641334772109985,
+      "learning_rate": 6.813664904059944e-05,
+      "loss": 0.42844533920288086,
+      "step": 930
+    },
+    {
+      "epoch": 0.8355555555555556,
+      "grad_norm": 0.37633153796195984,
+      "learning_rate": 6.744972829314079e-05,
+      "loss": 0.4326623916625977,
+      "step": 940
+    },
+    {
+      "epoch": 0.8444444444444444,
+      "grad_norm": 0.30158376693725586,
+      "learning_rate": 6.675903641529442e-05,
+      "loss": 0.41312832832336427,
+      "step": 950
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.49504661560058594,
+      "learning_rate": 6.60647226752372e-05,
+      "loss": 0.38414552211761477,
+      "step": 960
+    },
+    {
+      "epoch": 0.8622222222222222,
+      "grad_norm": 0.40403881669044495,
+      "learning_rate": 6.536693712388108e-05,
+      "loss": 0.40920281410217285,
+      "step": 970
+    },
+    {
+      "epoch": 0.8711111111111111,
+      "grad_norm": 0.3977372646331787,
+      "learning_rate": 6.466583056244502e-05,
+      "loss": 0.4197856903076172,
+      "step": 980
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.3222980499267578,
+      "learning_rate": 6.396155450986467e-05,
+      "loss": 0.43209381103515626,
+      "step": 990
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 0.3269721269607544,
+      "learning_rate": 6.32542611700471e-05,
+      "loss": 0.3872633457183838,
+      "step": 1000
+    },
+    {
+      "epoch": 0.8977777777777778,
+      "grad_norm": 0.2627931237220764,
+      "learning_rate": 6.254410339897733e-05,
+      "loss": 0.40660743713378905,
+      "step": 1010
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 0.4035646915435791,
+      "learning_rate": 6.183123467168407e-05,
+      "loss": 0.4188549995422363,
+      "step": 1020
+    },
+    {
+      "epoch": 0.9155555555555556,
+      "grad_norm": 0.32446223497390747,
+      "learning_rate": 6.111580904907158e-05,
+      "loss": 0.3956818342208862,
+      "step": 1030
+    },
+    {
+      "epoch": 0.9244444444444444,
+      "grad_norm": 0.400729238986969,
+      "learning_rate": 6.039798114462497e-05,
+      "loss": 0.4168698310852051,
+      "step": 1040
+    },
+    {
+      "epoch": 0.9333333333333333,
+      "grad_norm": 0.3183426856994629,
+      "learning_rate": 5.967790609099604e-05,
+      "loss": 0.42308673858642576,
+      "step": 1050
+    },
+    {
+      "epoch": 0.9422222222222222,
+      "grad_norm": 0.3495250940322876,
+      "learning_rate": 5.895573950647702e-05,
+      "loss": 0.3833682775497437,
+      "step": 1060
+    },
+    {
+      "epoch": 0.9511111111111111,
+      "grad_norm": 0.31198158860206604,
+      "learning_rate": 5.8231637461369124e-05,
+      "loss": 0.3714336633682251,
+      "step": 1070
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.33288607001304626,
+      "learning_rate": 5.75057564442538e-05,
+      "loss": 0.37983601093292235,
+      "step": 1080
+    },
+    {
+      "epoch": 0.9688888888888889,
+      "grad_norm": 0.2962404787540436,
+      "learning_rate": 5.677825332817313e-05,
+      "loss": 0.4199959754943848,
+      "step": 1090
+    },
+    {
+      "epoch": 0.9777777777777777,
+      "grad_norm": 0.3724680542945862,
+      "learning_rate": 5.604928533672751e-05,
+      "loss": 0.3786408185958862,
+      "step": 1100
+    },
+    {
+      "epoch": 0.9866666666666667,
+      "grad_norm": 0.3874838948249817,
+      "learning_rate": 5.5319010010097416e-05,
+      "loss": 0.40485186576843263,
+      "step": 1110
+    },
+    {
+      "epoch": 0.9955555555555555,
+      "grad_norm": 0.3587503731250763,
+      "learning_rate": 5.458758517099671e-05,
+      "loss": 0.3982378005981445,
+      "step": 1120
+    },
+    {
+      "epoch": 1.0,
+      "eval_loss": 0.3836138844490051,
+      "eval_runtime": 358.5191,
+      "eval_samples_per_second": 2.789,
+      "eval_steps_per_second": 2.789,
+      "step": 1125
+    },
+    {
+      "epoch": 1.0044444444444445,
+      "grad_norm": 0.33365780115127563,
+      "learning_rate": 5.385516889056501e-05,
+      "loss": 0.39370877742767335,
+      "step": 1130
+    },
+    {
+      "epoch": 1.0133333333333334,
+      "grad_norm": 0.37060853838920593,
+      "learning_rate": 5.3121919454206235e-05,
+      "loss": 0.39833950996398926,
+      "step": 1140
+    },
+    {
+      "epoch": 1.0222222222222221,
+      "grad_norm": 0.42938774824142456,
+      "learning_rate": 5.238799532738101e-05,
+      "loss": 0.40357198715209963,
+      "step": 1150
+    },
+    {
+      "epoch": 1.031111111111111,
+      "grad_norm": 0.34901952743530273,
+      "learning_rate": 5.165355512135997e-05,
+      "loss": 0.37219626903533937,
+      "step": 1160
+    },
+    {
+      "epoch": 1.04,
+      "grad_norm": 0.2980557084083557,
+      "learning_rate": 5.091875755894567e-05,
+      "loss": 0.414153003692627,
+      "step": 1170
+    },
+    {
+      "epoch": 1.048888888888889,
+      "grad_norm": 0.34108784794807434,
+      "learning_rate": 5.018376144017042e-05,
+      "loss": 0.36008477210998535,
+      "step": 1180
+    },
+    {
+      "epoch": 1.0577777777777777,
+      "grad_norm": 0.3673205077648163,
+      "learning_rate": 4.9448725607977334e-05,
+      "loss": 0.35660006999969485,
+      "step": 1190
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.3826650083065033,
+      "learning_rate": 4.871380891389209e-05,
+      "loss": 0.38915951251983644,
+      "step": 1200
+    },
+    {
+      "epoch": 1.0755555555555556,
+      "grad_norm": 0.3806168735027313,
+      "learning_rate": 4.7979170183693075e-05,
+      "loss": 0.37926878929138186,
+      "step": 1210
+    },
+    {
+      "epoch": 1.0844444444444445,
+      "grad_norm": 0.34045353531837463,
+      "learning_rate": 4.7244968183086644e-05,
+      "loss": 0.3457269906997681,
+      "step": 1220
+    },
+    {
+      "epoch": 1.0933333333333333,
+      "grad_norm": 0.3612907826900482,
+      "learning_rate": 4.651136158339588e-05,
+      "loss": 0.37788989543914797,
+      "step": 1230
+    },
+    {
+      "epoch": 1.1022222222222222,
+      "grad_norm": 0.487962543964386,
+      "learning_rate": 4.577850892726937e-05,
+      "loss": 0.3983138084411621,
+      "step": 1240
+    },
+    {
+      "epoch": 1.1111111111111112,
+      "grad_norm": 0.4341575801372528,
+      "learning_rate": 4.504656859441797e-05,
+      "loss": 0.4141641616821289,
+      "step": 1250
+    },
+    {
+      "epoch": 1.12,
+      "grad_norm": 0.43467429280281067,
+      "learning_rate": 4.431569876738666e-05,
+      "loss": 0.3803189992904663,
+      "step": 1260
+    },
+    {
+      "epoch": 1.1288888888888888,
+      "grad_norm": 0.37924763560295105,
+      "learning_rate": 4.358605739736918e-05,
+      "loss": 0.35183124542236327,
+      "step": 1270
+    },
+    {
+      "epoch": 1.1377777777777778,
+      "grad_norm": 0.3492391109466553,
+      "learning_rate": 4.285780217007253e-05,
+      "loss": 0.3747081279754639,
+      "step": 1280
+    },
+    {
+      "epoch": 1.1466666666666667,
+      "grad_norm": 0.4096994996070862,
+      "learning_rate": 4.213109047163887e-05,
+      "loss": 0.39840915203094485,
+      "step": 1290
+    },
+    {
+      "epoch": 1.1555555555555554,
+      "grad_norm": 0.4173014760017395,
+      "learning_rate": 4.1406079354632135e-05,
+      "loss": 0.4359447002410889,
+      "step": 1300
+    },
+    {
+      "epoch": 1.1644444444444444,
+      "grad_norm": 0.49125319719314575,
+      "learning_rate": 4.0682925504096884e-05,
+      "loss": 0.3704967737197876,
+      "step": 1310
+    },
+    {
+      "epoch": 1.1733333333333333,
+      "grad_norm": 0.49830952286720276,
+      "learning_rate": 3.9961785203696414e-05,
+      "loss": 0.4010519504547119,
+      "step": 1320
+    },
+    {
+      "epoch": 1.1822222222222223,
+      "grad_norm": 0.422219455242157,
+      "learning_rate": 3.92428143019376e-05,
+      "loss": 0.33720762729644777,
+      "step": 1330
+    },
+    {
+      "epoch": 1.1911111111111112,
+      "grad_norm": 0.39069873094558716,
+      "learning_rate": 3.852616817849e-05,
+      "loss": 0.37198896408081056,
+      "step": 1340
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 0.4538562297821045,
+      "learning_rate": 3.7812001710606005e-05,
+      "loss": 0.37902848720550536,
+      "step": 1350
+    },
+    {
+      "epoch": 1.208888888888889,
+      "grad_norm": 0.5780468583106995,
+      "learning_rate": 3.71004692396498e-05,
+      "loss": 0.397153377532959,
+      "step": 1360
+    },
+    {
+      "epoch": 1.2177777777777778,
+      "grad_norm": 0.45919519662857056,
+      "learning_rate": 3.639172453774192e-05,
+      "loss": 0.3878772497177124,
+      "step": 1370
+    },
+    {
+      "epoch": 1.2266666666666666,
+      "grad_norm": 0.3947475254535675,
+      "learning_rate": 3.5685920774527074e-05,
+      "loss": 0.358890962600708,
+      "step": 1380
+    },
+    {
+      "epoch": 1.2355555555555555,
+      "grad_norm": 0.4355222284793854,
+      "learning_rate": 3.498321048407195e-05,
+      "loss": 0.33703978061676027,
+      "step": 1390
+    },
+    {
+      "epoch": 1.2444444444444445,
+      "grad_norm": 0.41858533024787903,
+      "learning_rate": 3.4283745531900514e-05,
+      "loss": 0.3864290714263916,
+      "step": 1400
+    },
+    {
+      "epoch": 1.2533333333333334,
+      "grad_norm": 0.5407046675682068,
+      "learning_rate": 3.3587677082173696e-05,
+      "loss": 0.3781212091445923,
+      "step": 1410
+    },
+    {
+      "epoch": 1.2622222222222224,
+      "grad_norm": 0.4756890833377838,
+      "learning_rate": 3.289515556502076e-05,
+      "loss": 0.3857539653778076,
+      "step": 1420
+    },
+    {
+      "epoch": 1.271111111111111,
+      "grad_norm": 0.4864954650402069,
+      "learning_rate": 3.220633064402925e-05,
+      "loss": 0.372141170501709,
+      "step": 1430
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.40456199645996094,
+      "learning_rate": 3.152135118390049e-05,
+      "loss": 0.3438271522521973,
+      "step": 1440
+    },
+    {
+      "epoch": 1.2888888888888888,
+      "grad_norm": 0.4205606281757355,
+      "learning_rate": 3.0840365218277986e-05,
+      "loss": 0.38721680641174316,
+      "step": 1450
+    },
+    {
+      "epoch": 1.2977777777777777,
+      "grad_norm": 0.39853718876838684,
+      "learning_rate": 3.01635199177552e-05,
+      "loss": 0.35543074607849123,
+      "step": 1460
+    },
+    {
+      "epoch": 1.3066666666666666,
+      "grad_norm": 0.3695829510688782,
+      "learning_rate": 2.949096155806995e-05,
+      "loss": 0.3647240161895752,
+      "step": 1470
+    },
+    {
+      "epoch": 1.3155555555555556,
+      "grad_norm": 0.41311612725257874,
+      "learning_rate": 2.8822835488492124e-05,
+      "loss": 0.3939005613327026,
+      "step": 1480
+    },
+    {
+      "epoch": 1.3244444444444445,
+      "grad_norm": 0.3602710962295532,
+      "learning_rate": 2.8159286100411773e-05,
+      "loss": 0.3515706300735474,
+      "step": 1490
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 0.44238173961639404,
+      "learning_rate": 2.750045679613402e-05,
+      "loss": 0.3359719514846802,
+      "step": 1500
+    },
+    {
+      "epoch": 1.3422222222222222,
+      "grad_norm": 0.4533243775367737,
+      "learning_rate": 2.6846489957887867e-05,
+      "loss": 0.38518357276916504,
+      "step": 1510
+    },
+    {
+      "epoch": 1.3511111111111112,
+      "grad_norm": 0.48803240060806274,
+      "learning_rate": 2.6197526917055354e-05,
+      "loss": 0.3578159809112549,
+      "step": 1520
+    },
+    {
+      "epoch": 1.3599999999999999,
+      "grad_norm": 0.45722082257270813,
+      "learning_rate": 2.555370792362792e-05,
+      "loss": 0.3800940752029419,
+      "step": 1530
+    },
+    {
+      "epoch": 1.3688888888888888,
+      "grad_norm": 0.5655418634414673,
+      "learning_rate": 2.491517211589643e-05,
+      "loss": 0.3621096134185791,
+      "step": 1540
+    },
+    {
+      "epoch": 1.3777777777777778,
+      "grad_norm": 0.3579365909099579,
+      "learning_rate": 2.428205749038138e-05,
+      "loss": 0.35213282108306887,
+      "step": 1550
+    },
+    {
+      "epoch": 1.3866666666666667,
+      "grad_norm": 0.4807777404785156,
+      "learning_rate": 2.3654500872009962e-05,
+      "loss": 0.37083146572113035,
+      "step": 1560
+    },
+    {
+      "epoch": 1.3955555555555557,
+      "grad_norm": 0.42942166328430176,
+      "learning_rate": 2.3032637884546232e-05,
+      "loss": 0.35763652324676515,
+      "step": 1570
+    },
+    {
+      "epoch": 1.4044444444444444,
+      "grad_norm": 0.3682578206062317,
+      "learning_rate": 2.241660292128106e-05,
+      "loss": 0.35643126964569094,
+      "step": 1580
+    },
+    {
+      "epoch": 1.4133333333333333,
+      "grad_norm": 0.37660112977027893,
+      "learning_rate": 2.1806529115987662e-05,
+      "loss": 0.3433783769607544,
+      "step": 1590
+    },
+    {
+      "epoch": 1.4222222222222223,
+      "grad_norm": 0.3853251039981842,
+      "learning_rate": 2.1202548314149667e-05,
+      "loss": 0.3709646940231323,
+      "step": 1600
+    },
+    {
+      "epoch": 1.431111111111111,
+      "grad_norm": 0.39444634318351746,
+      "learning_rate": 2.0604791044467392e-05,
+      "loss": 0.3709129810333252,
+      "step": 1610
+    },
+    {
+      "epoch": 1.44,
+      "grad_norm": 0.4143366515636444,
+      "learning_rate": 2.0013386490648882e-05,
+      "loss": 0.38674709796905515,
+      "step": 1620
+    },
+    {
+      "epoch": 1.448888888888889,
+      "grad_norm": 0.4747564196586609,
+      "learning_rate": 1.9428462463491277e-05,
+      "loss": 0.35155019760131834,
+      "step": 1630
+    },
+    {
+      "epoch": 1.4577777777777778,
+      "grad_norm": 0.47379782795906067,
+      "learning_rate": 1.885014537325937e-05,
+      "loss": 0.38028242588043215,
+      "step": 1640
+    },
+    {
+      "epoch": 1.4666666666666668,
+      "grad_norm": 0.44089680910110474,
+      "learning_rate": 1.8278560202366412e-05,
+      "loss": 0.3407177448272705,
+      "step": 1650
+    },
+    {
+      "epoch": 1.4755555555555555,
+      "grad_norm": 0.39386194944381714,
+      "learning_rate": 1.771383047836371e-05,
+      "loss": 0.34243743419647216,
+      "step": 1660
+    },
+    {
+      "epoch": 1.4844444444444445,
+      "grad_norm": 0.26572591066360474,
+      "learning_rate": 1.7156078247244577e-05,
+      "loss": 0.37088680267333984,
+      "step": 1670
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 0.5262919068336487,
+      "learning_rate": 1.6605424047068578e-05,
+      "loss": 0.34389023780822753,
+      "step": 1680
+    },
+    {
+      "epoch": 1.5022222222222221,
+      "grad_norm": 0.39271703362464905,
+      "learning_rate": 1.6061986881911434e-05,
+      "loss": 0.41210026741027833,
+      "step": 1690
+    },
+    {
+      "epoch": 1.511111111111111,
+      "grad_norm": 0.4093843698501587,
+      "learning_rate": 1.552588419614665e-05,
+      "loss": 0.3735250473022461,
+      "step": 1700
+    },
+    {
+      "epoch": 1.52,
+      "grad_norm": 0.40127241611480713,
+      "learning_rate": 1.4997231849064125e-05,
+      "loss": 0.3680351734161377,
+      "step": 1710
+    },
+    {
+      "epoch": 1.528888888888889,
+      "grad_norm": 0.3623751401901245,
+      "learning_rate": 1.4476144089831412e-05,
+      "loss": 0.3539560556411743,
+      "step": 1720
+    },
+    {
+      "epoch": 1.537777777777778,
+      "grad_norm": 0.46453019976615906,
+      "learning_rate": 1.396273353280269e-05,
+      "loss": 0.36685130596160886,
+      "step": 1730
+    },
+    {
+      "epoch": 1.5466666666666666,
+      "grad_norm": 0.5483896136283875,
+      "learning_rate": 1.345711113318146e-05,
+      "loss": 0.38913209438323976,
+      "step": 1740
+    },
+    {
+      "epoch": 1.5555555555555556,
+      "grad_norm": 0.3098852038383484,
+      "learning_rate": 1.2959386163041388e-05,
+      "loss": 0.358487868309021,
+      "step": 1750
+    },
+    {
+      "epoch": 1.5644444444444443,
+      "grad_norm": 0.43505361676216125,
+      "learning_rate": 1.2469666187711216e-05,
+      "loss": 0.3721137046813965,
+      "step": 1760
+    },
+    {
+      "epoch": 1.5733333333333333,
+      "grad_norm": 0.4241721034049988,
+      "learning_rate": 1.1988057042528229e-05,
+      "loss": 0.375252890586853,
+      "step": 1770
+    },
+    {
+      "epoch": 1.5822222222222222,
+      "grad_norm": 0.4724214971065521,
+      "learning_rate": 1.151466280996596e-05,
+      "loss": 0.42426095008850095,
+      "step": 1780
+    },
+    {
+      "epoch": 1.5911111111111111,
+      "grad_norm": 0.3953167200088501,
+      "learning_rate": 1.1049585797140322e-05,
+      "loss": 0.40070137977600095,
+      "step": 1790
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 0.45680221915245056,
+      "learning_rate": 1.0592926513699774e-05,
+      "loss": 0.39249238967895506,
+      "step": 1800
+    },
+    {
+      "epoch": 1.608888888888889,
+      "grad_norm": 0.3459886908531189,
+      "learning_rate": 1.0144783650103611e-05,
+      "loss": 0.3837989330291748,
+      "step": 1810
+    },
+    {
+      "epoch": 1.6177777777777778,
+      "grad_norm": 0.38981544971466064,
+      "learning_rate": 9.705254056293745e-06,
+      "loss": 0.35657615661621095,
+      "step": 1820
+    },
+    {
+      "epoch": 1.6266666666666667,
+      "grad_norm": 0.3623422086238861,
+      "learning_rate": 9.274432720763936e-06,
+      "loss": 0.3656754016876221,
+      "step": 1830
+    },
+    {
+      "epoch": 1.6355555555555554,
+      "grad_norm": 0.4460585415363312,
+      "learning_rate": 8.852412750031552e-06,
+      "loss": 0.36761391162872314,
+      "step": 1840
+    },
+    {
+      "epoch": 1.6444444444444444,
+      "grad_norm": 0.3528312146663666,
+      "learning_rate": 8.439285348515875e-06,
+      "loss": 0.3759877920150757,
+      "step": 1850
+    },
+    {
+      "epoch": 1.6533333333333333,
+      "grad_norm": 0.5255836248397827,
+      "learning_rate": 8.035139798827606e-06,
+      "loss": 0.38388800621032715,
+      "step": 1860
+    },
+    {
+      "epoch": 1.6622222222222223,
+      "grad_norm": 0.4128068685531616,
+      "learning_rate": 7.64006344247361e-06,
+      "loss": 0.33345489501953124,
+      "step": 1870
+    },
+    {
+      "epoch": 1.6711111111111112,
+      "grad_norm": 0.4646780788898468,
+      "learning_rate": 7.254141660981195e-06,
+      "loss": 0.3499415397644043,
+      "step": 1880
+    },
+    {
+      "epoch": 1.6800000000000002,
+      "grad_norm": 0.4263407588005066,
+      "learning_rate": 6.8774578574459795e-06,
+      "loss": 0.3327197790145874,
+      "step": 1890
+    },
+    {
+      "epoch": 1.6888888888888889,
+      "grad_norm": 0.5277701616287231,
+      "learning_rate": 6.510093438507347e-06,
+      "loss": 0.3840624809265137,
+      "step": 1900
+    },
+    {
+      "epoch": 1.6977777777777778,
+      "grad_norm": 0.37307706475257874,
+      "learning_rate": 6.152127796755264e-06,
+      "loss": 0.3754091739654541,
+      "step": 1910
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 0.44183653593063354,
+      "learning_rate": 5.803638293572472e-06,
+      "loss": 0.38776986598968505,
+      "step": 1920
+    },
+    {
+      "epoch": 1.7155555555555555,
+      "grad_norm": 0.48444098234176636,
+      "learning_rate": 5.4647002424156156e-06,
+      "loss": 0.3501073122024536,
+      "step": 1930
+    },
+    {
+      "epoch": 1.7244444444444444,
+      "grad_norm": 0.49215278029441833,
+      "learning_rate": 5.135386892538968e-06,
+      "loss": 0.3459045648574829,
+      "step": 1940
+    },
+    {
+      "epoch": 1.7333333333333334,
+      "grad_norm": 0.45480048656463623,
+      "learning_rate": 4.81576941316415e-06,
+      "loss": 0.38027493953704833,
+      "step": 1950
+    },
+    {
+      "epoch": 1.7422222222222223,
+      "grad_norm": 0.3511333465576172,
+      "learning_rate": 4.5059168780995975e-06,
+      "loss": 0.4008301258087158,
+      "step": 1960
+    },
+    {
+      "epoch": 1.751111111111111,
+      "grad_norm": 0.3821912109851837,
+      "learning_rate": 4.205896250812635e-06,
+      "loss": 0.33486261367797854,
+      "step": 1970
+    },
+    {
+      "epoch": 1.76,
+      "grad_norm": 0.31993553042411804,
+      "learning_rate": 3.9157723699578264e-06,
+      "loss": 0.3505268096923828,
+      "step": 1980
+    },
+    {
+      "epoch": 1.7688888888888887,
+      "grad_norm": 0.36553364992141724,
+      "learning_rate": 3.6356079353643157e-06,
+      "loss": 0.35234646797180175,
+      "step": 1990
+    },
+    {
+      "epoch": 1.7777777777777777,
+      "grad_norm": 0.4717234969139099,
+      "learning_rate": 3.36546349448566e-06,
+      "loss": 0.35505836009979247,
+      "step": 2000
+    },
+    {
+      "epoch": 1.7866666666666666,
+      "grad_norm": 0.36850428581237793,
+      "learning_rate": 3.1053974293145493e-06,
+      "loss": 0.355438756942749,
+      "step": 2010
+    },
+    {
+      "epoch": 1.7955555555555556,
+      "grad_norm": 0.4966508448123932,
+      "learning_rate": 2.8554659437657506e-06,
+      "loss": 0.4134857654571533,
+      "step": 2020
+    },
+    {
+      "epoch": 1.8044444444444445,
+      "grad_norm": 0.5912292003631592,
+      "learning_rate": 2.6157230515295392e-06,
+      "loss": 0.34981932640075686,
+      "step": 2030
+    },
+    {
+      "epoch": 1.8133333333333335,
+      "grad_norm": 0.49372735619544983,
+      "learning_rate": 2.386220564398706e-06,
+      "loss": 0.3849682569503784,
+      "step": 2040
+    },
+    {
+      "epoch": 1.8222222222222222,
+      "grad_norm": 0.45357146859169006,
+      "learning_rate": 2.1670080810712145e-06,
+      "loss": 0.4477557182312012,
+      "step": 2050
+    },
+    {
+      "epoch": 1.8311111111111111,
+      "grad_norm": 0.4242260456085205,
+      "learning_rate": 1.958132976431265e-06,
+      "loss": 0.3643706560134888,
+      "step": 2060
+    },
+    {
+      "epoch": 1.8399999999999999,
+      "grad_norm": 0.4310312271118164,
+      "learning_rate": 1.7596403913109072e-06,
+      "loss": 0.3599487066268921,
+      "step": 2070
+    },
+    {
+      "epoch": 1.8488888888888888,
+      "grad_norm": 0.6903954148292542,
+      "learning_rate": 1.571573222734507e-06,
+      "loss": 0.3706610441207886,
+      "step": 2080
+    },
+    {
+      "epoch": 1.8577777777777778,
+      "grad_norm": 0.4340752959251404,
+      "learning_rate": 1.3939721146480456e-06,
+      "loss": 0.33601970672607423,
+      "step": 2090
+    },
+    {
+      "epoch": 1.8666666666666667,
+      "grad_norm": 0.37018489837646484,
+      "learning_rate": 1.226875449135445e-06,
+      "loss": 0.3561201810836792,
+      "step": 2100
+    },
+    {
+      "epoch": 1.8755555555555556,
+      "grad_norm": 0.4505303204059601,
+      "learning_rate": 1.0703193381236209e-06,
+      "loss": 0.36335651874542235,
+      "step": 2110
+    },
+    {
+      "epoch": 1.8844444444444446,
+      "grad_norm": 0.4433459937572479,
+      "learning_rate": 9.243376155782357e-07,
+      "loss": 0.3289212465286255,
+      "step": 2120
+    },
+    {
+      "epoch": 1.8933333333333333,
+      "grad_norm": 0.3986717462539673,
+      "learning_rate": 7.889618301916424e-07,
+      "loss": 0.40375485420227053,
+      "step": 2130
+    },
+    {
+      "epoch": 1.9022222222222223,
+      "grad_norm": 0.3388350009918213,
+      "learning_rate": 6.642212385648494e-07,
+      "loss": 0.3685792922973633,
+      "step": 2140
+    },
+    {
+      "epoch": 1.911111111111111,
+      "grad_norm": 0.5131290555000305,
+      "learning_rate": 5.501427988846842e-07,
+      "loss": 0.3703050374984741,
+      "step": 2150
+    },
+    {
+      "epoch": 1.92,
+      "grad_norm": 0.4321134090423584,
+      "learning_rate": 4.4675116509781954e-07,
+      "loss": 0.3673090934753418,
+      "step": 2160
+    },
+    {
+      "epoch": 1.9288888888888889,
+      "grad_norm": 0.48017483949661255,
+      "learning_rate": 3.5406868158263106e-07,
+      "loss": 0.34112837314605715,
+      "step": 2170
+    },
+    {
+      "epoch": 1.9377777777777778,
+      "grad_norm": 0.48736897110939026,
+      "learning_rate": 2.721153783203589e-07,
+      "loss": 0.38310391902923585,
+      "step": 2180
+    },
+    {
+      "epoch": 1.9466666666666668,
+      "grad_norm": 0.45370784401893616,
+      "learning_rate": 2.0090896656627668e-07,
+      "loss": 0.3905895709991455,
+      "step": 2190
+    },
+    {
+      "epoch": 1.9555555555555557,
+      "grad_norm": 0.33105170726776123,
+      "learning_rate": 1.4046483502206388e-07,
+      "loss": 0.35344064235687256,
+      "step": 2200
+    },
+    {
+      "epoch": 1.9644444444444444,
+      "grad_norm": 0.2413172870874405,
+      "learning_rate": 9.079604651009433e-08,
+      "loss": 0.3432020902633667,
+      "step": 2210
+    },
+    {
+      "epoch": 1.9733333333333334,
+      "grad_norm": 0.3724231719970703,
+      "learning_rate": 5.1913335150388656e-08,
+      "loss": 0.3234715461730957,
+      "step": 2220
+    },
+    {
+      "epoch": 1.982222222222222,
+      "grad_norm": 0.4314740002155304,
+      "learning_rate": 2.382510404079219e-08,
+      "loss": 0.3377274513244629,
+      "step": 2230
+    },
+    {
+      "epoch": 1.991111111111111,
+      "grad_norm": 0.48465994000434875,
+      "learning_rate": 6.5374234409831815e-09,
+      "loss": 0.4104018211364746,
+      "step": 2240
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.5848714709281921,
+      "learning_rate": 5.4029460561100253e-11,
+      "loss": 0.38427934646606443,
+      "step": 2250
+    },
+    {
+      "epoch": 2.0,
+      "eval_loss": 0.371486097574234,
+      "eval_runtime": 358.2834,
+      "eval_samples_per_second": 2.791,
+      "eval_steps_per_second": 2.791,
+      "step": 2250
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2250,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.4317586279670989e+17,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
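
The `log_history` entries above mix two record shapes: training records (with `loss`, `grad_norm`, `learning_rate`) and evaluation records (with `eval_loss`, `eval_runtime`, and throughput fields). A minimal sketch of how one might separate them and sanity-check the evaluation numbers follows. The excerpt embedded in the code copies a few values from the diff; the helper names `split_log_history` and `approx_eval_samples` are hypothetical and not part of any library.

```python
import json

# Small excerpt in the shape of trainer_state.json; values copied from the diff above.
STATE_JSON = """
{
  "log_history": [
    {"epoch": 0.9955555555555555, "loss": 0.3982378005981445, "step": 1120},
    {"epoch": 1.0, "eval_loss": 0.3836138844490051, "eval_runtime": 358.5191,
     "eval_samples_per_second": 2.789, "step": 1125},
    {"epoch": 2.0, "loss": 0.38427934646606443, "step": 2250},
    {"epoch": 2.0, "eval_loss": 0.371486097574234, "eval_runtime": 358.2834,
     "eval_samples_per_second": 2.791, "step": 2250}
  ]
}
"""


def split_log_history(state):
    """Separate training-loss records from evaluation records (hypothetical helper)."""
    train = [r for r in state["log_history"] if "loss" in r]
    evals = [r for r in state["log_history"] if "eval_loss" in r]
    return train, evals


def approx_eval_samples(record):
    """Estimate the evaluated sample count as runtime * throughput, rounded."""
    return round(record["eval_runtime"] * record["eval_samples_per_second"])


state = json.loads(STATE_JSON)
train_records, eval_records = split_log_history(state)

for r in eval_records:
    # Print step, eval loss, and the estimated number of evaluation samples.
    print(r["step"], r["eval_loss"], approx_eval_samples(r))
```

Under these assumptions, both evaluation passes work out to roughly 1000 samples, and the evaluation loss is lower at step 2250 than at step 1125. The equal `eval_samples_per_second` and `eval_steps_per_second` values in the log suggest an evaluation batch size of 1, though the diff does not state this directly.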

checkpoint-2250/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:edb44ddd12e5044cfebc57246c77bd185ad4fad52f56a66554c2013fa82c4cfa
+size 4920

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d73c2c5f7aa0ed522c8d96ef3524739eb61e3c78e74839a2ce4a1c56ea340a20
+size 19989424

tokenizer_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}
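
In this config, every entry under `model_specific_special_tokens` repeats a top-level key with the same value. A short sketch of how one could check that the two places stay in sync is shown below. It uses a reduced excerpt of the file, and the function name `mismatched_special_tokens` is a hypothetical helper, not a `transformers` API.

```python
import json

# Reduced excerpt of tokenizer_config.json; values copied from the diff above.
CONFIG_JSON = """
{
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>",
  "image_token": "<|image_pad|>",
  "video_token": "<|video_pad|>",
  "model_specific_special_tokens": {
    "image_token": "<|image_pad|>",
    "video_token": "<|video_pad|>"
  }
}
"""


def mismatched_special_tokens(config):
    """Return nested special tokens whose top-level value differs or is missing."""
    nested = config.get("model_specific_special_tokens", {})
    return {
        name: (config.get(name), value)
        for name, value in nested.items()
        if config.get(name) != value
    }


config = json.loads(CONFIG_JSON)
mismatches = mismatched_special_tokens(config)
print(mismatches)
```

An empty result means the nested map and the top-level keys agree for the excerpt checked here.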

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:edb44ddd12e5044cfebc57246c77bd185ad4fad52f56a66554c2013fa82c4cfa
+size 4920
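
The three-line bodies shown for the `.bin` and `tokenizer.json` files are Git LFS pointer files, not the binaries themselves: each records a spec version, a SHA-256 object ID, and the size in bytes. The sketch below parses one such pointer. The function `parse_lfs_pointer` is a hypothetical helper written for illustration, and the pointer text is copied from the `training_args.bin` entry above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:edb44ddd12e5044cfebc57246c77bd185ad4fad52f56a66554c2013fa82c4cfa
size 4920
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The root-level `training_args.bin` and `checkpoint-2250/training_args.bin` pointers carry the same `oid` and `size`, which indicates the two files have identical content.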