tony24254 committed
Commit eb4f7d4 · verified · 1 Parent(s): 6407baa

Add files using upload-large-folder tool

Files changed (50):
  1. .gitkeep +1 -0
  2. Math_QA/group_09/adapter/README.md +202 -0
  3. Math_QA/group_09/adapter/adapter_config.json +34 -0
  4. Math_QA/group_09/adapter/added_tokens.json +24 -0
  5. Math_QA/group_09/adapter/chat_template.jinja +54 -0
  6. Math_QA/group_09/adapter/merges.txt +0 -0
  7. Math_QA/group_09/adapter/special_tokens_map.json +31 -0
  8. Math_QA/group_09/adapter/tokenizer_config.json +207 -0
  9. Math_QA/group_09/adapter/vocab.json +0 -0
  10. Math_QA/group_09/checkpoints/checkpoint-1200/adapter_config.json +34 -0
  11. Math_QA/group_09/checkpoints/checkpoint-1200/added_tokens.json +24 -0
  12. Math_QA/group_09/checkpoints/checkpoint-1200/chat_template.jinja +54 -0
  13. Math_QA/group_09/checkpoints/checkpoint-1200/merges.txt +0 -0
  14. Math_QA/group_09/checkpoints/checkpoint-1200/special_tokens_map.json +31 -0
  15. Math_QA/group_09/checkpoints/checkpoint-1500/added_tokens.json +24 -0
  16. Math_QA/group_09/checkpoints/checkpoint-1800/README.md +202 -0
  17. Math_QA/group_09/checkpoints/checkpoint-1800/adapter_config.json +34 -0
  18. Math_QA/group_09/checkpoints/checkpoint-1800/added_tokens.json +24 -0
  19. Math_QA/group_09/checkpoints/checkpoint-1800/chat_template.jinja +54 -0
  20. Math_QA/group_09/checkpoints/checkpoint-1800/merges.txt +0 -0
  21. Math_QA/group_09/checkpoints/checkpoint-1800/special_tokens_map.json +31 -0
  22. Math_QA/group_09/checkpoints/checkpoint-1800/tokenizer_config.json +207 -0
  23. Math_QA/group_09/checkpoints/checkpoint-1800/trainer_state.json +2561 -0
  24. Math_QA/group_09/checkpoints/checkpoint-1800/vocab.json +0 -0
  25. Math_QA/group_09/checkpoints/checkpoint-300/README.md +202 -0
  26. Math_QA/group_09/checkpoints/checkpoint-300/adapter_config.json +34 -0
  27. Math_QA/group_09/checkpoints/checkpoint-300/added_tokens.json +24 -0
  28. Math_QA/group_09/checkpoints/checkpoint-300/chat_template.jinja +54 -0
  29. Math_QA/group_09/checkpoints/checkpoint-300/special_tokens_map.json +31 -0
  30. Math_QA/group_09/checkpoints/checkpoint-300/tokenizer_config.json +207 -0
  31. Math_QA/group_09/checkpoints/checkpoint-300/trainer_state.json +461 -0
  32. Math_QA/group_09/checkpoints/checkpoint-300/vocab.json +0 -0
  33. Math_QA/group_09/checkpoints/checkpoint-600/README.md +202 -0
  34. Math_QA/group_09/checkpoints/checkpoint-600/adapter_config.json +34 -0
  35. Math_QA/group_09/checkpoints/checkpoint-600/added_tokens.json +24 -0
  36. Math_QA/group_09/checkpoints/checkpoint-600/chat_template.jinja +54 -0
  37. Math_QA/group_09/checkpoints/checkpoint-600/merges.txt +0 -0
  38. Math_QA/group_09/checkpoints/checkpoint-600/special_tokens_map.json +31 -0
  39. Math_QA/group_09/checkpoints/checkpoint-600/tokenizer_config.json +207 -0
  40. Math_QA/group_09/checkpoints/checkpoint-600/trainer_state.json +881 -0
  41. Math_QA/group_09/checkpoints/checkpoint-600/vocab.json +0 -0
  42. Math_QA/group_09/metadata.json +2718 -0
  43. Math_QA/group_09/prompt_group.json +613 -0
  44. Math_QA/group_09/tokenizer/added_tokens.json +24 -0
  45. Math_QA/group_09/tokenizer/chat_template.jinja +54 -0
  46. Math_QA/group_09/tokenizer/merges.txt +0 -0
  47. Math_QA/group_09/tokenizer/special_tokens_map.json +31 -0
  48. Math_QA/group_09/tokenizer/tokenizer_config.json +207 -0
  49. Math_QA/group_09/tokenizer/vocab.json +0 -0
  50. README.md +223 -0
.gitkeep ADDED
@@ -0,0 +1 @@
+
Math_QA/group_09/adapter/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/adapter/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
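
The adapter config above pins `r = 64` and `lora_alpha = 128`, so the effective LoRA scaling applied to every targeted projection is `alpha / r = 2.0`, and all seven attention/MLP projections of the Qwen2 block are adapted. A minimal stdlib-only sketch that parses an excerpt of the config and derives this (the `lora_scaling` helper is ours, not part of the repo; in practice the adapter itself would be loaded with peft's `PeftModel.from_pretrained(base_model, adapter_dir)`, which needs the base weights and is omitted here):

```python
import json

# Excerpt of Math_QA/group_09/adapter/adapter_config.json, trimmed to the
# fields used below; values copied verbatim from the diff above.
ADAPTER_CONFIG = """
{
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "peft_type": "LORA",
  "r": 64,
  "target_modules": ["gate_proj", "o_proj", "k_proj", "q_proj",
                     "up_proj", "down_proj", "v_proj"],
  "task_type": "CAUSAL_LM"
}
"""

def lora_scaling(cfg: dict) -> float:
    """LoRA applies the update as W + (alpha / r) * B @ A; this is that factor."""
    return cfg["lora_alpha"] / cfg["r"]

cfg = json.loads(ADAPTER_CONFIG)
print(lora_scaling(cfg))              # -> 2.0
print(sorted(cfg["target_modules"]))  # all seven attention/MLP projections
```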
Math_QA/group_09/adapter/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
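
`added_tokens.json` maps the 22 Qwen2.5 control tokens onto a single contiguous ID block starting at 151643 (the base BPE vocabulary is assumed to end just below that ID). A small stdlib sanity check with the mapping inlined from the file above; a gap in the block would indicate a mismatched tokenizer/embedding pairing:

```python
import json

# Contents of Math_QA/group_09/adapter/added_tokens.json, inlined verbatim.
ADDED_TOKENS = json.loads("""
{
  "</tool_call>": 151658, "<tool_call>": 151657,
  "<|box_end|>": 151649, "<|box_start|>": 151648,
  "<|endoftext|>": 151643, "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645, "<|im_start|>": 151644,
  "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
  "<|quad_start|>": 151650, "<|repo_name|>": 151663,
  "<|video_pad|>": 151656, "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654, "<|vision_start|>": 151652
}
""")

ids = sorted(ADDED_TOKENS.values())
# The block must be contiguous: 151643 .. 151664 with no holes.
assert ids == list(range(151643, 151665))
print(len(ids))  # -> 22
```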
Math_QA/group_09/adapter/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
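
For a plain conversation (no `tools`), the template above reduces to the ChatML layout: each message wrapped in `<|im_start|>role ... <|im_end|>`, a default Qwen system prompt injected when none is given, and a trailing `<|im_start|>assistant` when `add_generation_prompt` is set. A stdlib re-implementation of just that branch, for illustration (normally you would call `tokenizer.apply_chat_template` and let the Jinja template do this):

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the template's no-tools branch for system/user/assistant turns."""
    out = []
    # The template injects the default system prompt only when the first
    # message is not already a system message.
    if not messages or messages[0]["role"] != "system":
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

print(render_chatml([{"role": "user", "content": "What is 2+2?"}]))
```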
Math_QA/group_09/adapter/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/adapter/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/adapter/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
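
In the `added_tokens_decoder` above, the ChatML and vision markers (IDs 151643-151656) carry `"special": true`, while the tool-call, FIM, and repo tokens (151657-151664) do not, so the latter survive decoding with Hugging Face's `skip_special_tokens=True`. A tiny sketch of that split, with the flag values copied from the config:

```python
# "special" flags per added-token ID, copied from tokenizer_config.json above:
# IDs up to 151656 are special, IDs from 151657 on are ordinary text tokens.
SPECIAL = {tid: tid <= 151656 for tid in range(151643, 151665)}

assert SPECIAL[151645]      # <|im_end|> is special (it is also the eos_token)
assert not SPECIAL[151657]  # <tool_call> is not, so it is kept when
                            # decoding with skip_special_tokens=True
print(sum(SPECIAL.values()))  # -> 14 special tokens
```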
Math_QA/group_09/adapter/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1200/adapter_config.json ADDED
@@ -0,0 +1,34 @@
(34 added lines, identical to Math_QA/group_09/adapter/adapter_config.json above.)
Math_QA/group_09/checkpoints/checkpoint-1200/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(24 added lines, identical to Math_QA/group_09/adapter/added_tokens.json above.)
Math_QA/group_09/checkpoints/checkpoint-1200/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
(54 added lines, identical to Math_QA/group_09/adapter/chat_template.jinja above.)
Math_QA/group_09/checkpoints/checkpoint-1200/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1200/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
(31 added lines, identical to Math_QA/group_09/adapter/special_tokens_map.json above.)
Math_QA/group_09/checkpoints/checkpoint-1500/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(24 added lines, identical to Math_QA/group_09/adapter/added_tokens.json above.)
Math_QA/group_09/checkpoints/checkpoint-1800/README.md ADDED
@@ -0,0 +1,202 @@
(202 added lines, identical to Math_QA/group_09/adapter/README.md above.)
Math_QA/group_09/checkpoints/checkpoint-1800/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
Math_QA/group_09/checkpoints/checkpoint-1800/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1800/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/trainer_state.json ADDED
@@ -0,0 +1,2561 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 450.0,
+ "eval_steps": 500,
+ "global_step": 1800,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ },
+ {
+ "epoch": 76.32,
+ "grad_norm": 0.04305023327469826,
+ "learning_rate": 8.56815578465063e-05,
+ "loss": 0.0248,
+ "step": 305
+ },
+ {
+ "epoch": 77.64,
+ "grad_norm": 0.04323578625917435,
+ "learning_rate": 8.539518900343643e-05,
+ "loss": 0.0254,
+ "step": 310
+ },
+ {
+ "epoch": 78.96,
+ "grad_norm": 0.04426678270101547,
+ "learning_rate": 8.510882016036655e-05,
+ "loss": 0.0254,
+ "step": 315
+ },
+ {
+ "epoch": 80.0,
+ "grad_norm": 0.14689449965953827,
+ "learning_rate": 8.482245131729668e-05,
+ "loss": 0.0259,
+ "step": 320
+ },
+ {
+ "epoch": 81.32,
+ "grad_norm": 0.04256561025977135,
+ "learning_rate": 8.453608247422681e-05,
+ "loss": 0.0256,
+ "step": 325
+ },
+ {
+ "epoch": 82.64,
+ "grad_norm": 0.03943061828613281,
+ "learning_rate": 8.424971363115693e-05,
+ "loss": 0.0235,
+ "step": 330
+ },
+ {
+ "epoch": 83.96,
+ "grad_norm": 0.041899990290403366,
+ "learning_rate": 8.396334478808706e-05,
+ "loss": 0.0249,
+ "step": 335
+ },
+ {
+ "epoch": 85.0,
+ "grad_norm": 0.151236429810524,
+ "learning_rate": 8.367697594501719e-05,
+ "loss": 0.0255,
+ "step": 340
+ },
+ {
+ "epoch": 86.32,
+ "grad_norm": 0.042102884501218796,
+ "learning_rate": 8.339060710194731e-05,
+ "loss": 0.0244,
+ "step": 345
+ },
+ {
+ "epoch": 87.64,
+ "grad_norm": 0.04723669961094856,
+ "learning_rate": 8.310423825887744e-05,
+ "loss": 0.0251,
+ "step": 350
+ },
+ {
+ "epoch": 88.96,
+ "grad_norm": 0.0578082799911499,
+ "learning_rate": 8.281786941580757e-05,
+ "loss": 0.0261,
+ "step": 355
+ },
+ {
+ "epoch": 90.0,
+ "grad_norm": 0.10269813239574432,
+ "learning_rate": 8.253150057273768e-05,
+ "loss": 0.0225,
+ "step": 360
+ },
+ {
+ "epoch": 91.32,
+ "grad_norm": 0.046400491148233414,
+ "learning_rate": 8.224513172966782e-05,
+ "loss": 0.0262,
+ "step": 365
+ },
+ {
+ "epoch": 92.64,
+ "grad_norm": 0.04183673858642578,
+ "learning_rate": 8.195876288659795e-05,
+ "loss": 0.0239,
+ "step": 370
+ },
+ {
+ "epoch": 93.96,
+ "grad_norm": 0.04400316998362541,
+ "learning_rate": 8.167239404352807e-05,
+ "loss": 0.0263,
+ "step": 375
+ },
+ {
+ "epoch": 95.0,
+ "grad_norm": 0.10862386226654053,
+ "learning_rate": 8.13860252004582e-05,
+ "loss": 0.025,
+ "step": 380
+ },
+ {
+ "epoch": 96.32,
+ "grad_norm": 0.05308162048459053,
+ "learning_rate": 8.109965635738833e-05,
+ "loss": 0.0248,
+ "step": 385
+ },
+ {
+ "epoch": 97.64,
+ "grad_norm": 0.04261139780282974,
+ "learning_rate": 8.081328751431845e-05,
+ "loss": 0.0244,
+ "step": 390
+ },
+ {
+ "epoch": 98.96,
+ "grad_norm": 0.05337546020746231,
+ "learning_rate": 8.052691867124858e-05,
+ "loss": 0.0253,
+ "step": 395
+ },
+ {
+ "epoch": 100.0,
+ "grad_norm": 0.15639856457710266,
+ "learning_rate": 8.02405498281787e-05,
+ "loss": 0.0243,
+ "step": 400
+ },
+ {
+ "epoch": 101.32,
+ "grad_norm": 0.04450729116797447,
+ "learning_rate": 7.995418098510883e-05,
+ "loss": 0.0258,
+ "step": 405
+ },
+ {
+ "epoch": 102.64,
+ "grad_norm": 0.042327046394348145,
+ "learning_rate": 7.966781214203894e-05,
+ "loss": 0.0244,
+ "step": 410
+ },
+ {
+ "epoch": 103.96,
+ "grad_norm": 0.04105006903409958,
+ "learning_rate": 7.938144329896907e-05,
+ "loss": 0.0253,
+ "step": 415
+ },
+ {
+ "epoch": 105.0,
+ "grad_norm": 0.17930248379707336,
+ "learning_rate": 7.90950744558992e-05,
+ "loss": 0.0261,
+ "step": 420
+ },
+ {
+ "epoch": 106.32,
+ "grad_norm": 0.04404031112790108,
+ "learning_rate": 7.880870561282932e-05,
+ "loss": 0.0241,
+ "step": 425
+ },
+ {
+ "epoch": 107.64,
+ "grad_norm": 0.04142986983060837,
+ "learning_rate": 7.852233676975945e-05,
+ "loss": 0.0245,
+ "step": 430
+ },
+ {
+ "epoch": 108.96,
+ "grad_norm": 0.041959185153245926,
+ "learning_rate": 7.823596792668958e-05,
+ "loss": 0.0254,
+ "step": 435
+ },
+ {
+ "epoch": 110.0,
+ "grad_norm": 0.27740439772605896,
+ "learning_rate": 7.79495990836197e-05,
+ "loss": 0.0292,
+ "step": 440
+ },
+ {
+ "epoch": 111.32,
+ "grad_norm": 0.03657572343945503,
+ "learning_rate": 7.766323024054983e-05,
+ "loss": 0.026,
+ "step": 445
+ },
+ {
+ "epoch": 112.64,
+ "grad_norm": 0.042320434004068375,
+ "learning_rate": 7.737686139747996e-05,
+ "loss": 0.0251,
+ "step": 450
+ },
+ {
+ "epoch": 113.96,
+ "grad_norm": 0.0473681204020977,
+ "learning_rate": 7.709049255441008e-05,
+ "loss": 0.026,
+ "step": 455
+ },
+ {
+ "epoch": 115.0,
+ "grad_norm": 0.1326676607131958,
+ "learning_rate": 7.680412371134021e-05,
+ "loss": 0.0241,
+ "step": 460
+ },
+ {
+ "epoch": 116.32,
+ "grad_norm": 0.04483647271990776,
+ "learning_rate": 7.651775486827034e-05,
+ "loss": 0.0236,
+ "step": 465
+ },
+ {
+ "epoch": 117.64,
+ "grad_norm": 0.038961004465818405,
+ "learning_rate": 7.623138602520046e-05,
+ "loss": 0.0235,
+ "step": 470
+ },
+ {
+ "epoch": 118.96,
+ "grad_norm": 0.042134907096624374,
+ "learning_rate": 7.594501718213059e-05,
+ "loss": 0.0252,
+ "step": 475
+ },
+ {
+ "epoch": 120.0,
+ "grad_norm": 0.13292020559310913,
+ "learning_rate": 7.565864833906071e-05,
+ "loss": 0.024,
+ "step": 480
+ },
+ {
+ "epoch": 121.32,
+ "grad_norm": 0.03745294362306595,
+ "learning_rate": 7.537227949599084e-05,
+ "loss": 0.025,
+ "step": 485
+ },
+ {
+ "epoch": 122.64,
+ "grad_norm": 0.035545315593481064,
+ "learning_rate": 7.508591065292097e-05,
+ "loss": 0.0253,
+ "step": 490
+ },
+ {
+ "epoch": 123.96,
+ "grad_norm": 0.03991984575986862,
+ "learning_rate": 7.47995418098511e-05,
709
+ "loss": 0.026,
710
+ "step": 495
711
+ },
712
+ {
713
+ "epoch": 125.0,
714
+ "grad_norm": 0.1339961290359497,
715
+ "learning_rate": 7.451317296678122e-05,
716
+ "loss": 0.0246,
717
+ "step": 500
718
+ },
719
+ {
720
+ "epoch": 126.32,
721
+ "grad_norm": 0.04381132498383522,
722
+ "learning_rate": 7.422680412371135e-05,
723
+ "loss": 0.0235,
724
+ "step": 505
725
+ },
726
+ {
727
+ "epoch": 127.64,
728
+ "grad_norm": 0.048515841364860535,
729
+ "learning_rate": 7.394043528064147e-05,
730
+ "loss": 0.0242,
731
+ "step": 510
732
+ },
733
+ {
734
+ "epoch": 128.96,
735
+ "grad_norm": 0.04145604744553566,
736
+ "learning_rate": 7.36540664375716e-05,
737
+ "loss": 0.0249,
738
+ "step": 515
739
+ },
740
+ {
741
+ "epoch": 130.0,
742
+ "grad_norm": 0.14400818943977356,
743
+ "learning_rate": 7.336769759450171e-05,
744
+ "loss": 0.0247,
745
+ "step": 520
746
+ },
747
+ {
748
+ "epoch": 131.32,
749
+ "grad_norm": 0.04025031998753548,
750
+ "learning_rate": 7.308132875143184e-05,
751
+ "loss": 0.0241,
752
+ "step": 525
753
+ },
754
+ {
755
+ "epoch": 132.64,
756
+ "grad_norm": 0.037277135998010635,
757
+ "learning_rate": 7.279495990836197e-05,
758
+ "loss": 0.0242,
759
+ "step": 530
760
+ },
761
+ {
762
+ "epoch": 133.96,
763
+ "grad_norm": 0.03666083887219429,
764
+ "learning_rate": 7.250859106529209e-05,
765
+ "loss": 0.0251,
766
+ "step": 535
767
+ },
768
+ {
769
+ "epoch": 135.0,
770
+ "grad_norm": 0.09921745210886002,
771
+ "learning_rate": 7.222222222222222e-05,
772
+ "loss": 0.0241,
773
+ "step": 540
774
+ },
775
+ {
776
+ "epoch": 136.32,
777
+ "grad_norm": 0.0382193848490715,
778
+ "learning_rate": 7.193585337915235e-05,
779
+ "loss": 0.0247,
780
+ "step": 545
781
+ },
782
+ {
783
+ "epoch": 137.64,
784
+ "grad_norm": 0.0314810685813427,
785
+ "learning_rate": 7.164948453608247e-05,
786
+ "loss": 0.0239,
787
+ "step": 550
788
+ },
789
+ {
790
+ "epoch": 138.96,
791
+ "grad_norm": 0.04278745502233505,
792
+ "learning_rate": 7.136311569301261e-05,
793
+ "loss": 0.0243,
794
+ "step": 555
795
+ },
796
+ {
797
+ "epoch": 140.0,
798
+ "grad_norm": 0.09295342862606049,
799
+ "learning_rate": 7.107674684994274e-05,
800
+ "loss": 0.0234,
801
+ "step": 560
802
+ },
803
+ {
804
+ "epoch": 141.32,
805
+ "grad_norm": 0.03429599106311798,
806
+ "learning_rate": 7.079037800687286e-05,
807
+ "loss": 0.0248,
808
+ "step": 565
809
+ },
810
+ {
811
+ "epoch": 142.64,
812
+ "grad_norm": 0.03622185438871384,
813
+ "learning_rate": 7.050400916380299e-05,
814
+ "loss": 0.0234,
815
+ "step": 570
816
+ },
817
+ {
818
+ "epoch": 143.96,
819
+ "grad_norm": 0.042615506798028946,
820
+ "learning_rate": 7.02176403207331e-05,
821
+ "loss": 0.0242,
822
+ "step": 575
823
+ },
824
+ {
825
+ "epoch": 145.0,
826
+ "grad_norm": 0.13792142271995544,
827
+ "learning_rate": 6.993127147766323e-05,
828
+ "loss": 0.0268,
829
+ "step": 580
830
+ },
831
+ {
832
+ "epoch": 146.32,
833
+ "grad_norm": 0.035664405673742294,
834
+ "learning_rate": 6.964490263459336e-05,
835
+ "loss": 0.0231,
836
+ "step": 585
837
+ },
838
+ {
839
+ "epoch": 147.64,
840
+ "grad_norm": 0.033511932939291,
841
+ "learning_rate": 6.935853379152348e-05,
842
+ "loss": 0.0258,
843
+ "step": 590
844
+ },
845
+ {
846
+ "epoch": 148.96,
847
+ "grad_norm": 0.036591917276382446,
848
+ "learning_rate": 6.907216494845361e-05,
849
+ "loss": 0.0248,
850
+ "step": 595
851
+ },
852
+ {
853
+ "epoch": 150.0,
854
+ "grad_norm": 0.11892726272344589,
855
+ "learning_rate": 6.878579610538374e-05,
856
+ "loss": 0.0257,
857
+ "step": 600
858
+ },
859
+ {
860
+ "epoch": 151.32,
861
+ "grad_norm": 0.03532181680202484,
862
+ "learning_rate": 6.849942726231386e-05,
863
+ "loss": 0.0246,
864
+ "step": 605
865
+ },
866
+ {
867
+ "epoch": 152.64,
868
+ "grad_norm": 0.039349090307950974,
869
+ "learning_rate": 6.821305841924399e-05,
870
+ "loss": 0.0244,
871
+ "step": 610
872
+ },
873
+ {
874
+ "epoch": 153.96,
875
+ "grad_norm": 0.03686106950044632,
876
+ "learning_rate": 6.792668957617412e-05,
877
+ "loss": 0.0247,
878
+ "step": 615
879
+ },
880
+ {
881
+ "epoch": 155.0,
882
+ "grad_norm": 0.08257201313972473,
883
+ "learning_rate": 6.764032073310424e-05,
884
+ "loss": 0.0231,
885
+ "step": 620
886
+ },
887
+ {
888
+ "epoch": 156.32,
889
+ "grad_norm": 0.035335343331098557,
890
+ "learning_rate": 6.735395189003437e-05,
891
+ "loss": 0.0243,
892
+ "step": 625
893
+ },
894
+ {
895
+ "epoch": 157.64,
896
+ "grad_norm": 0.030693387612700462,
897
+ "learning_rate": 6.706758304696448e-05,
898
+ "loss": 0.0239,
899
+ "step": 630
900
+ },
901
+ {
902
+ "epoch": 158.96,
903
+ "grad_norm": 0.031573694199323654,
904
+ "learning_rate": 6.678121420389462e-05,
905
+ "loss": 0.0236,
906
+ "step": 635
907
+ },
908
+ {
909
+ "epoch": 160.0,
910
+ "grad_norm": 0.11772840470075607,
911
+ "learning_rate": 6.649484536082475e-05,
912
+ "loss": 0.0247,
913
+ "step": 640
914
+ },
915
+ {
916
+ "epoch": 161.32,
917
+ "grad_norm": 0.03553156182169914,
918
+ "learning_rate": 6.620847651775487e-05,
919
+ "loss": 0.0231,
920
+ "step": 645
921
+ },
922
+ {
923
+ "epoch": 162.64,
924
+ "grad_norm": 0.04065680876374245,
925
+ "learning_rate": 6.5922107674685e-05,
926
+ "loss": 0.0247,
927
+ "step": 650
928
+ },
929
+ {
930
+ "epoch": 163.96,
931
+ "grad_norm": 0.03680557757616043,
932
+ "learning_rate": 6.563573883161513e-05,
933
+ "loss": 0.0244,
934
+ "step": 655
935
+ },
936
+ {
937
+ "epoch": 165.0,
938
+ "grad_norm": 0.1432940512895584,
939
+ "learning_rate": 6.534936998854525e-05,
940
+ "loss": 0.0254,
941
+ "step": 660
942
+ },
943
+ {
944
+ "epoch": 166.32,
945
+ "grad_norm": 0.0374530591070652,
946
+ "learning_rate": 6.506300114547538e-05,
947
+ "loss": 0.024,
948
+ "step": 665
949
+ },
950
+ {
951
+ "epoch": 167.64,
952
+ "grad_norm": 0.039093125611543655,
953
+ "learning_rate": 6.477663230240551e-05,
954
+ "loss": 0.0242,
955
+ "step": 670
956
+ },
957
+ {
958
+ "epoch": 168.96,
959
+ "grad_norm": 0.03439056873321533,
960
+ "learning_rate": 6.449026345933563e-05,
961
+ "loss": 0.0238,
962
+ "step": 675
963
+ },
964
+ {
965
+ "epoch": 170.0,
966
+ "grad_norm": 0.07211510837078094,
967
+ "learning_rate": 6.420389461626576e-05,
968
+ "loss": 0.0224,
969
+ "step": 680
970
+ },
971
+ {
972
+ "epoch": 171.32,
973
+ "grad_norm": 0.03178408369421959,
974
+ "learning_rate": 6.391752577319587e-05,
975
+ "loss": 0.0246,
976
+ "step": 685
977
+ },
978
+ {
979
+ "epoch": 172.64,
980
+ "grad_norm": 0.02913156896829605,
981
+ "learning_rate": 6.3631156930126e-05,
982
+ "loss": 0.0255,
983
+ "step": 690
984
+ },
985
+ {
986
+ "epoch": 173.96,
987
+ "grad_norm": 0.03487716615200043,
988
+ "learning_rate": 6.334478808705613e-05,
989
+ "loss": 0.0257,
990
+ "step": 695
991
+ },
992
+ {
993
+ "epoch": 175.0,
994
+ "grad_norm": 0.12451174110174179,
995
+ "learning_rate": 6.305841924398625e-05,
996
+ "loss": 0.0253,
997
+ "step": 700
998
+ },
999
+ {
1000
+ "epoch": 176.32,
1001
+ "grad_norm": 0.0366508811712265,
1002
+ "learning_rate": 6.277205040091638e-05,
1003
+ "loss": 0.0241,
1004
+ "step": 705
1005
+ },
1006
+ {
1007
+ "epoch": 177.64,
1008
+ "grad_norm": 0.03491870313882828,
1009
+ "learning_rate": 6.24856815578465e-05,
1010
+ "loss": 0.0242,
1011
+ "step": 710
1012
+ },
1013
+ {
1014
+ "epoch": 178.96,
1015
+ "grad_norm": 0.03027982823550701,
1016
+ "learning_rate": 6.219931271477663e-05,
1017
+ "loss": 0.0257,
1018
+ "step": 715
1019
+ },
1020
+ {
1021
+ "epoch": 180.0,
1022
+ "grad_norm": 0.08150530606508255,
1023
+ "learning_rate": 6.191294387170676e-05,
1024
+ "loss": 0.0236,
1025
+ "step": 720
1026
+ },
1027
+ {
1028
+ "epoch": 181.32,
1029
+ "grad_norm": 0.03483245149254799,
1030
+ "learning_rate": 6.162657502863689e-05,
1031
+ "loss": 0.0232,
1032
+ "step": 725
1033
+ },
1034
+ {
1035
+ "epoch": 182.64,
1036
+ "grad_norm": 0.034706421196460724,
1037
+ "learning_rate": 6.134020618556701e-05,
1038
+ "loss": 0.0241,
1039
+ "step": 730
1040
+ },
1041
+ {
1042
+ "epoch": 183.96,
1043
+ "grad_norm": 0.03622004762291908,
1044
+ "learning_rate": 6.105383734249714e-05,
1045
+ "loss": 0.0233,
1046
+ "step": 735
1047
+ },
1048
+ {
1049
+ "epoch": 185.0,
1050
+ "grad_norm": 0.10144224017858505,
1051
+ "learning_rate": 6.076746849942726e-05,
1052
+ "loss": 0.0249,
1053
+ "step": 740
1054
+ },
1055
+ {
1056
+ "epoch": 186.32,
1057
+ "grad_norm": 0.03530497848987579,
1058
+ "learning_rate": 6.0481099656357384e-05,
1059
+ "loss": 0.0238,
1060
+ "step": 745
1061
+ },
1062
+ {
1063
+ "epoch": 187.64,
1064
+ "grad_norm": 0.034086182713508606,
1065
+ "learning_rate": 6.019473081328752e-05,
1066
+ "loss": 0.0245,
1067
+ "step": 750
1068
+ },
1069
+ {
1070
+ "epoch": 188.96,
1071
+ "grad_norm": 0.039041388779878616,
1072
+ "learning_rate": 5.9908361970217644e-05,
1073
+ "loss": 0.0243,
1074
+ "step": 755
1075
+ },
1076
+ {
1077
+ "epoch": 190.0,
1078
+ "grad_norm": 0.1247899979352951,
1079
+ "learning_rate": 5.962199312714777e-05,
1080
+ "loss": 0.0245,
1081
+ "step": 760
1082
+ },
1083
+ {
1084
+ "epoch": 191.32,
1085
+ "grad_norm": 0.035458508878946304,
1086
+ "learning_rate": 5.93356242840779e-05,
1087
+ "loss": 0.0238,
1088
+ "step": 765
1089
+ },
1090
+ {
1091
+ "epoch": 192.64,
1092
+ "grad_norm": 0.03673034906387329,
1093
+ "learning_rate": 5.904925544100802e-05,
1094
+ "loss": 0.0244,
1095
+ "step": 770
1096
+ },
1097
+ {
1098
+ "epoch": 193.96,
1099
+ "grad_norm": 0.03364979103207588,
1100
+ "learning_rate": 5.876288659793815e-05,
1101
+ "loss": 0.0239,
1102
+ "step": 775
1103
+ },
1104
+ {
1105
+ "epoch": 195.0,
1106
+ "grad_norm": 0.09387586265802383,
1107
+ "learning_rate": 5.8476517754868276e-05,
1108
+ "loss": 0.0235,
1109
+ "step": 780
1110
+ },
1111
+ {
1112
+ "epoch": 196.32,
1113
+ "grad_norm": 0.03462570905685425,
1114
+ "learning_rate": 5.81901489117984e-05,
1115
+ "loss": 0.0248,
1116
+ "step": 785
1117
+ },
1118
+ {
1119
+ "epoch": 197.64,
1120
+ "grad_norm": 0.03342005982995033,
1121
+ "learning_rate": 5.790378006872853e-05,
1122
+ "loss": 0.0246,
1123
+ "step": 790
1124
+ },
1125
+ {
1126
+ "epoch": 198.96,
1127
+ "grad_norm": 0.041909925639629364,
1128
+ "learning_rate": 5.761741122565865e-05,
1129
+ "loss": 0.0246,
1130
+ "step": 795
1131
+ },
1132
+ {
1133
+ "epoch": 200.0,
1134
+ "grad_norm": 0.15439164638519287,
1135
+ "learning_rate": 5.7331042382588775e-05,
1136
+ "loss": 0.0258,
1137
+ "step": 800
1138
+ },
1139
+ {
1140
+ "epoch": 201.32,
1141
+ "grad_norm": 0.02883634716272354,
1142
+ "learning_rate": 5.70446735395189e-05,
1143
+ "loss": 0.0236,
1144
+ "step": 805
1145
+ },
1146
+ {
1147
+ "epoch": 202.64,
1148
+ "grad_norm": 0.029865020886063576,
1149
+ "learning_rate": 5.675830469644903e-05,
1150
+ "loss": 0.0235,
1151
+ "step": 810
1152
+ },
1153
+ {
1154
+ "epoch": 203.96,
1155
+ "grad_norm": 0.030608315020799637,
1156
+ "learning_rate": 5.6471935853379155e-05,
1157
+ "loss": 0.024,
1158
+ "step": 815
1159
+ },
1160
+ {
1161
+ "epoch": 205.0,
1162
+ "grad_norm": 0.07783036679029465,
1163
+ "learning_rate": 5.618556701030928e-05,
1164
+ "loss": 0.0224,
1165
+ "step": 820
1166
+ },
1167
+ {
1168
+ "epoch": 206.32,
1169
+ "grad_norm": 0.035508111119270325,
1170
+ "learning_rate": 5.589919816723941e-05,
1171
+ "loss": 0.0233,
1172
+ "step": 825
1173
+ },
1174
+ {
1175
+ "epoch": 207.64,
1176
+ "grad_norm": 0.03703364357352257,
1177
+ "learning_rate": 5.5612829324169534e-05,
1178
+ "loss": 0.0242,
1179
+ "step": 830
1180
+ },
1181
+ {
1182
+ "epoch": 208.96,
1183
+ "grad_norm": 0.030922846868634224,
1184
+ "learning_rate": 5.532646048109966e-05,
1185
+ "loss": 0.0239,
1186
+ "step": 835
1187
+ },
1188
+ {
1189
+ "epoch": 210.0,
1190
+ "grad_norm": 0.11316124349832535,
1191
+ "learning_rate": 5.504009163802979e-05,
1192
+ "loss": 0.0236,
1193
+ "step": 840
1194
+ },
1195
+ {
1196
+ "epoch": 211.32,
1197
+ "grad_norm": 0.032941922545433044,
1198
+ "learning_rate": 5.4753722794959914e-05,
1199
+ "loss": 0.0237,
1200
+ "step": 845
1201
+ },
1202
+ {
1203
+ "epoch": 212.64,
1204
+ "grad_norm": 0.028119860216975212,
1205
+ "learning_rate": 5.4467353951890033e-05,
1206
+ "loss": 0.0235,
1207
+ "step": 850
1208
+ },
1209
+ {
1210
+ "epoch": 213.96,
1211
+ "grad_norm": 0.03130020201206207,
1212
+ "learning_rate": 5.418098510882016e-05,
1213
+ "loss": 0.023,
1214
+ "step": 855
1215
+ },
1216
+ {
1217
+ "epoch": 215.0,
1218
+ "grad_norm": 0.06978127360343933,
1219
+ "learning_rate": 5.3894616265750286e-05,
1220
+ "loss": 0.0226,
1221
+ "step": 860
1222
+ },
1223
+ {
1224
+ "epoch": 216.32,
1225
+ "grad_norm": 0.030422938987612724,
1226
+ "learning_rate": 5.360824742268041e-05,
1227
+ "loss": 0.0231,
1228
+ "step": 865
1229
+ },
1230
+ {
1231
+ "epoch": 217.64,
1232
+ "grad_norm": 0.028223881497979164,
1233
+ "learning_rate": 5.332187857961054e-05,
1234
+ "loss": 0.0238,
1235
+ "step": 870
1236
+ },
1237
+ {
1238
+ "epoch": 218.96,
1239
+ "grad_norm": 0.029208194464445114,
1240
+ "learning_rate": 5.3035509736540666e-05,
1241
+ "loss": 0.0243,
1242
+ "step": 875
1243
+ },
1244
+ {
1245
+ "epoch": 220.0,
1246
+ "grad_norm": 0.16511231660842896,
1247
+ "learning_rate": 5.274914089347079e-05,
1248
+ "loss": 0.0271,
1249
+ "step": 880
1250
+ },
1251
+ {
1252
+ "epoch": 221.32,
1253
+ "grad_norm": 0.03705955296754837,
1254
+ "learning_rate": 5.246277205040092e-05,
1255
+ "loss": 0.0243,
1256
+ "step": 885
1257
+ },
1258
+ {
1259
+ "epoch": 222.64,
1260
+ "grad_norm": 0.030203381553292274,
1261
+ "learning_rate": 5.2176403207331045e-05,
1262
+ "loss": 0.0241,
1263
+ "step": 890
1264
+ },
1265
+ {
1266
+ "epoch": 223.96,
1267
+ "grad_norm": 0.027039049193263054,
1268
+ "learning_rate": 5.189003436426118e-05,
1269
+ "loss": 0.0234,
1270
+ "step": 895
1271
+ },
1272
+ {
1273
+ "epoch": 225.0,
1274
+ "grad_norm": 0.11282758414745331,
1275
+ "learning_rate": 5.1603665521191305e-05,
1276
+ "loss": 0.0254,
1277
+ "step": 900
1278
+ },
1279
+ {
1280
+ "epoch": 226.32,
1281
+ "grad_norm": 0.03700408712029457,
1282
+ "learning_rate": 5.131729667812142e-05,
1283
+ "loss": 0.0236,
1284
+ "step": 905
1285
+ },
1286
+ {
1287
+ "epoch": 227.64,
1288
+ "grad_norm": 0.030705822631716728,
1289
+ "learning_rate": 5.1030927835051544e-05,
1290
+ "loss": 0.024,
1291
+ "step": 910
1292
+ },
1293
+ {
1294
+ "epoch": 228.96,
1295
+ "grad_norm": 0.03678268566727638,
1296
+ "learning_rate": 5.074455899198167e-05,
1297
+ "loss": 0.0238,
1298
+ "step": 915
1299
+ },
1300
+ {
1301
+ "epoch": 230.0,
1302
+ "grad_norm": 0.12632058560848236,
1303
+ "learning_rate": 5.04581901489118e-05,
1304
+ "loss": 0.0269,
1305
+ "step": 920
1306
+ },
1307
+ {
1308
+ "epoch": 231.32,
1309
+ "grad_norm": 0.030165374279022217,
1310
+ "learning_rate": 5.0171821305841924e-05,
1311
+ "loss": 0.0244,
1312
+ "step": 925
1313
+ },
1314
+ {
1315
+ "epoch": 232.64,
1316
+ "grad_norm": 0.029971277341246605,
1317
+ "learning_rate": 4.988545246277205e-05,
1318
+ "loss": 0.0239,
1319
+ "step": 930
1320
+ },
1321
+ {
1322
+ "epoch": 233.96,
1323
+ "grad_norm": 0.033762127161026,
1324
+ "learning_rate": 4.9599083619702184e-05,
1325
+ "loss": 0.0237,
1326
+ "step": 935
1327
+ },
1328
+ {
1329
+ "epoch": 235.0,
1330
+ "grad_norm": 0.09928340464830399,
1331
+ "learning_rate": 4.931271477663231e-05,
1332
+ "loss": 0.0236,
1333
+ "step": 940
1334
+ },
1335
+ {
1336
+ "epoch": 236.32,
1337
+ "grad_norm": 0.030009057372808456,
1338
+ "learning_rate": 4.902634593356243e-05,
1339
+ "loss": 0.0238,
1340
+ "step": 945
1341
+ },
1342
+ {
1343
+ "epoch": 237.64,
1344
+ "grad_norm": 0.03369998559355736,
1345
+ "learning_rate": 4.8739977090492556e-05,
1346
+ "loss": 0.0239,
1347
+ "step": 950
1348
+ },
1349
+ {
1350
+ "epoch": 238.96,
1351
+ "grad_norm": 0.03107636794447899,
1352
+ "learning_rate": 4.845360824742268e-05,
1353
+ "loss": 0.0251,
1354
+ "step": 955
1355
+ },
1356
+ {
1357
+ "epoch": 240.0,
1358
+ "grad_norm": 0.10390744358301163,
1359
+ "learning_rate": 4.816723940435281e-05,
1360
+ "loss": 0.0227,
1361
+ "step": 960
1362
+ },
1363
+ {
1364
+ "epoch": 241.32,
1365
+ "grad_norm": 0.03572176396846771,
1366
+ "learning_rate": 4.7880870561282936e-05,
1367
+ "loss": 0.0242,
1368
+ "step": 965
1369
+ },
1370
+ {
1371
+ "epoch": 242.64,
1372
+ "grad_norm": 0.03051804192364216,
1373
+ "learning_rate": 4.7594501718213055e-05,
1374
+ "loss": 0.0232,
1375
+ "step": 970
1376
+ },
1377
+ {
1378
+ "epoch": 243.96,
1379
+ "grad_norm": 0.031635165214538574,
1380
+ "learning_rate": 4.730813287514318e-05,
1381
+ "loss": 0.0241,
1382
+ "step": 975
1383
+ },
1384
+ {
1385
+ "epoch": 245.0,
1386
+ "grad_norm": 0.0863058865070343,
1387
+ "learning_rate": 4.7021764032073315e-05,
1388
+ "loss": 0.0231,
1389
+ "step": 980
1390
+ },
1391
+ {
1392
+ "epoch": 246.32,
1393
+ "grad_norm": 0.03220526874065399,
1394
+ "learning_rate": 4.673539518900344e-05,
1395
+ "loss": 0.0237,
1396
+ "step": 985
1397
+ },
1398
+ {
1399
+ "epoch": 247.64,
1400
+ "grad_norm": 0.030770031735301018,
1401
+ "learning_rate": 4.644902634593357e-05,
1402
+ "loss": 0.0229,
1403
+ "step": 990
1404
+ },
1405
+ {
1406
+ "epoch": 248.96,
1407
+ "grad_norm": 0.036592498421669006,
1408
+ "learning_rate": 4.6162657502863694e-05,
1409
+ "loss": 0.0233,
1410
+ "step": 995
1411
+ },
1412
+ {
1413
+ "epoch": 250.0,
1414
+ "grad_norm": 0.09140961617231369,
1415
+ "learning_rate": 4.5876288659793814e-05,
1416
+ "loss": 0.0233,
1417
+ "step": 1000
1418
+ },
1419
+ {
1420
+ "epoch": 251.32,
1421
+ "grad_norm": 0.03191279247403145,
1422
+ "learning_rate": 4.558991981672394e-05,
1423
+ "loss": 0.0234,
1424
+ "step": 1005
1425
+ },
1426
+ {
1427
+ "epoch": 252.64,
1428
+ "grad_norm": 0.02950333058834076,
1429
+ "learning_rate": 4.530355097365407e-05,
1430
+ "loss": 0.024,
1431
+ "step": 1010
1432
+ },
1433
+ {
1434
+ "epoch": 253.96,
1435
+ "grad_norm": 0.031532324850559235,
1436
+ "learning_rate": 4.5017182130584194e-05,
1437
+ "loss": 0.0233,
1438
+ "step": 1015
1439
+ },
1440
+ {
1441
+ "epoch": 255.0,
1442
+ "grad_norm": 0.10817220062017441,
1443
+ "learning_rate": 4.473081328751432e-05,
1444
+ "loss": 0.0228,
1445
+ "step": 1020
1446
+ },
1447
+ {
1448
+ "epoch": 256.32,
1449
+ "grad_norm": 0.03229045867919922,
1450
+ "learning_rate": 4.4444444444444447e-05,
1451
+ "loss": 0.0249,
1452
+ "step": 1025
1453
+ },
1454
+ {
1455
+ "epoch": 257.64,
1456
+ "grad_norm": 0.027881359681487083,
1457
+ "learning_rate": 4.415807560137457e-05,
1458
+ "loss": 0.0236,
1459
+ "step": 1030
1460
+ },
1461
+ {
1462
+ "epoch": 258.96,
1463
+ "grad_norm": 0.027970343828201294,
1464
+ "learning_rate": 4.38717067583047e-05,
1465
+ "loss": 0.0248,
1466
+ "step": 1035
1467
+ },
1468
+ {
1469
+ "epoch": 260.0,
1470
+ "grad_norm": 0.0961368978023529,
1471
+ "learning_rate": 4.3585337915234826e-05,
1472
+ "loss": 0.0236,
1473
+ "step": 1040
1474
+ },
1475
+ {
1476
+ "epoch": 261.32,
1477
+ "grad_norm": 0.03192312270402908,
1478
+ "learning_rate": 4.329896907216495e-05,
1479
+ "loss": 0.0231,
1480
+ "step": 1045
1481
+ },
1482
+ {
1483
+ "epoch": 262.64,
1484
+ "grad_norm": 0.03287699446082115,
1485
+ "learning_rate": 4.301260022909508e-05,
1486
+ "loss": 0.0244,
1487
+ "step": 1050
1488
+ },
1489
+ {
1490
+ "epoch": 263.96,
1491
+ "grad_norm": 0.03482283651828766,
1492
+ "learning_rate": 4.27262313860252e-05,
1493
+ "loss": 0.0231,
1494
+ "step": 1055
1495
+ },
1496
+ {
1497
+ "epoch": 265.0,
1498
+ "grad_norm": 0.12014977633953094,
1499
+ "learning_rate": 4.2439862542955325e-05,
1500
+ "loss": 0.0246,
1501
+ "step": 1060
1502
+ },
1503
+ {
1504
+ "epoch": 266.32,
1505
+ "grad_norm": 0.030348435044288635,
1506
+ "learning_rate": 4.215349369988545e-05,
1507
+ "loss": 0.0235,
1508
+ "step": 1065
1509
+ },
1510
+ {
1511
+ "epoch": 267.64,
1512
+ "grad_norm": 0.027197284623980522,
1513
+ "learning_rate": 4.1867124856815585e-05,
1514
+ "loss": 0.0238,
1515
+ "step": 1070
1516
+ },
1517
+ {
1518
+ "epoch": 268.96,
1519
+ "grad_norm": 0.03164960816502571,
1520
+ "learning_rate": 4.158075601374571e-05,
1521
+ "loss": 0.024,
1522
+ "step": 1075
1523
+ },
1524
+ {
1525
+ "epoch": 270.0,
1526
+ "grad_norm": 0.09021521359682083,
1527
+ "learning_rate": 4.129438717067583e-05,
1528
+ "loss": 0.0237,
1529
+ "step": 1080
1530
+ },
1531
+ {
1532
+ "epoch": 271.32,
1533
+ "grad_norm": 0.03432054817676544,
1534
+ "learning_rate": 4.100801832760596e-05,
1535
+ "loss": 0.024,
1536
+ "step": 1085
1537
+ },
1538
+ {
1539
+ "epoch": 272.64,
1540
+ "grad_norm": 0.029961712658405304,
1541
+ "learning_rate": 4.0721649484536084e-05,
1542
+ "loss": 0.0224,
1543
+ "step": 1090
1544
+ },
1545
+ {
1546
+ "epoch": 273.96,
1547
+ "grad_norm": 0.02801748737692833,
1548
+ "learning_rate": 4.043528064146621e-05,
1549
+ "loss": 0.0245,
1550
+ "step": 1095
1551
+ },
1552
+ {
1553
+ "epoch": 275.0,
1554
+ "grad_norm": 0.09304305166006088,
1555
+ "learning_rate": 4.014891179839634e-05,
1556
+ "loss": 0.0229,
1557
+ "step": 1100
1558
+ },
1559
+ {
1560
+ "epoch": 276.32,
1561
+ "grad_norm": 0.03154018521308899,
1562
+ "learning_rate": 3.9862542955326463e-05,
1563
+ "loss": 0.0242,
1564
+ "step": 1105
1565
+ },
1566
+ {
1567
+ "epoch": 277.64,
1568
+ "grad_norm": 0.029925866052508354,
1569
+ "learning_rate": 3.957617411225659e-05,
1570
+ "loss": 0.024,
1571
+ "step": 1110
1572
+ },
1573
+ {
1574
+ "epoch": 278.96,
1575
+ "grad_norm": 0.032234761863946915,
1576
+ "learning_rate": 3.9289805269186716e-05,
1577
+ "loss": 0.0232,
1578
+ "step": 1115
1579
+ },
1580
+ {
1581
+ "epoch": 280.0,
1582
+ "grad_norm": 0.09113281220197678,
1583
+ "learning_rate": 3.900343642611684e-05,
1584
+ "loss": 0.0238,
1585
+ "step": 1120
1586
+ },
1587
+ {
1588
+ "epoch": 281.32,
1589
+ "grad_norm": 0.03371744975447655,
1590
+ "learning_rate": 3.871706758304697e-05,
1591
+ "loss": 0.0242,
1592
+ "step": 1125
1593
+ },
1594
+ {
1595
+ "epoch": 282.64,
1596
+ "grad_norm": 0.033525336533784866,
1597
+ "learning_rate": 3.8430698739977096e-05,
1598
+ "loss": 0.0234,
1599
+ "step": 1130
1600
+ },
1601
+ {
1602
+ "epoch": 283.96,
1603
+ "grad_norm": 0.030558524653315544,
1604
+ "learning_rate": 3.8144329896907216e-05,
1605
+ "loss": 0.0237,
1606
+ "step": 1135
1607
+ },
1608
+ {
1609
+ "epoch": 285.0,
1610
+ "grad_norm": 0.07060851901769638,
1611
+ "learning_rate": 3.785796105383734e-05,
1612
+ "loss": 0.022,
1613
+ "step": 1140
1614
+ },
1615
+ {
1616
+ "epoch": 286.32,
1617
+ "grad_norm": 0.02952047996222973,
1618
+ "learning_rate": 3.757159221076747e-05,
1619
+ "loss": 0.0238,
1620
+ "step": 1145
1621
+ },
1622
+ {
1623
+ "epoch": 287.64,
1624
+ "grad_norm": 0.030197326093912125,
1625
+ "learning_rate": 3.7285223367697595e-05,
1626
+ "loss": 0.0227,
1627
+ "step": 1150
1628
+ },
1629
+ {
1630
+ "epoch": 288.96,
1631
+ "grad_norm": 0.028898609802126884,
1632
+ "learning_rate": 3.699885452462772e-05,
1633
+ "loss": 0.0232,
1634
+ "step": 1155
1635
+ },
1636
+ {
1637
+ "epoch": 290.0,
1638
+ "grad_norm": 0.10391610860824585,
1639
+ "learning_rate": 3.671248568155785e-05,
1640
+ "loss": 0.0236,
1641
+ "step": 1160
1642
+ },
1643
+ {
1644
+ "epoch": 291.32,
1645
+ "grad_norm": 0.0285499207675457,
1646
+ "learning_rate": 3.6426116838487974e-05,
1647
+ "loss": 0.0238,
1648
+ "step": 1165
1649
+ },
1650
+ {
1651
+ "epoch": 292.64,
1652
+ "grad_norm": 0.028268715366721153,
1653
+ "learning_rate": 3.61397479954181e-05,
1654
+ "loss": 0.0229,
1655
+ "step": 1170
1656
+ },
1657
+ {
1658
+ "epoch": 293.96,
1659
+ "grad_norm": 0.02961159311234951,
1660
+ "learning_rate": 3.585337915234823e-05,
1661
+ "loss": 0.0247,
1662
+ "step": 1175
1663
+ },
1664
+ {
1665
+ "epoch": 295.0,
1666
+ "grad_norm": 0.08803751319646835,
1667
+ "learning_rate": 3.5567010309278354e-05,
1668
+ "loss": 0.0226,
1669
+ "step": 1180
1670
+ },
1671
+ {
1672
+ "epoch": 296.32,
1673
+ "grad_norm": 0.03452374413609505,
1674
+ "learning_rate": 3.528064146620848e-05,
1675
+ "loss": 0.0244,
1676
+ "step": 1185
1677
+ },
1678
+ {
1679
+ "epoch": 297.64,
1680
+ "grad_norm": 0.028895270079374313,
1681
+ "learning_rate": 3.49942726231386e-05,
1682
+ "loss": 0.023,
1683
+ "step": 1190
1684
+ },
1685
+ {
1686
+ "epoch": 298.96,
1687
+ "grad_norm": 0.029182473197579384,
1688
+ "learning_rate": 3.4707903780068726e-05,
1689
+ "loss": 0.0234,
1690
+ "step": 1195
1691
+ },
1692
+ {
1693
+ "epoch": 300.0,
1694
+ "grad_norm": 0.11874058097600937,
1695
+ "learning_rate": 3.442153493699885e-05,
1696
+ "loss": 0.0235,
1697
+ "step": 1200
1698
+ },
1699
+ {
1700
+ "epoch": 301.32,
1701
+ "grad_norm": 0.030481066554784775,
1702
+ "learning_rate": 3.4135166093928986e-05,
1703
+ "loss": 0.0237,
1704
+ "step": 1205
1705
+ },
1706
+ {
1707
+ "epoch": 302.64,
1708
+ "grad_norm": 0.03108309395611286,
1709
+ "learning_rate": 3.384879725085911e-05,
1710
+ "loss": 0.023,
1711
+ "step": 1210
1712
+ },
1713
+ {
1714
+ "epoch": 303.96,
1715
+ "grad_norm": 0.03036290407180786,
1716
+ "learning_rate": 3.356242840778923e-05,
1717
+ "loss": 0.0228,
1718
+ "step": 1215
1719
+ },
1720
+ {
1721
+ "epoch": 305.0,
1722
+ "grad_norm": 0.07720436155796051,
1723
+ "learning_rate": 3.327605956471936e-05,
1724
+ "loss": 0.0223,
1725
+ "step": 1220
1726
+ },
1727
+ {
1728
+ "epoch": 306.32,
1729
+ "grad_norm": 0.03028162382543087,
1730
+ "learning_rate": 3.2989690721649485e-05,
1731
+ "loss": 0.0235,
1732
+ "step": 1225
1733
+ },
1734
+ {
1735
+ "epoch": 307.64,
1736
+ "grad_norm": 0.033151157200336456,
1737
+ "learning_rate": 3.270332187857961e-05,
1738
+ "loss": 0.0226,
1739
+ "step": 1230
1740
+ },
1741
+ {
1742
+ "epoch": 308.96,
1743
+ "grad_norm": 0.02951214276254177,
1744
+ "learning_rate": 3.241695303550974e-05,
1745
+ "loss": 0.0235,
1746
+ "step": 1235
1747
+ },
1748
+ {
1749
+ "epoch": 310.0,
1750
+ "grad_norm": 0.09070917963981628,
1751
+ "learning_rate": 3.2130584192439865e-05,
1752
+ "loss": 0.0257,
1753
+ "step": 1240
1754
+ },
1755
+ {
1756
+ "epoch": 311.32,
1757
+ "grad_norm": 0.03337477520108223,
1758
+ "learning_rate": 3.184421534936999e-05,
1759
+ "loss": 0.0248,
1760
+ "step": 1245
1761
+ },
1762
+ {
1763
+ "epoch": 312.64,
1764
+ "grad_norm": 0.03151268512010574,
1765
+ "learning_rate": 3.155784650630012e-05,
1766
+ "loss": 0.0226,
1767
+ "step": 1250
1768
+ },
1769
+ {
1770
+ "epoch": 313.96,
1771
+ "grad_norm": 0.030940482392907143,
1772
+ "learning_rate": 3.1271477663230244e-05,
1773
+ "loss": 0.024,
1774
+ "step": 1255
1775
+ },
1776
+ {
1777
+ "epoch": 315.0,
1778
+ "grad_norm": 0.09032298624515533,
1779
+ "learning_rate": 3.098510882016037e-05,
1780
+ "loss": 0.0236,
1781
+ "step": 1260
1782
+ },
1783
+ {
1784
+ "epoch": 316.32,
1785
+ "grad_norm": 0.029143668711185455,
1786
+ "learning_rate": 3.06987399770905e-05,
1787
+ "loss": 0.0222,
1788
+ "step": 1265
1789
+ },
1790
+ {
1791
+ "epoch": 317.64,
1792
+ "grad_norm": 0.029851289466023445,
1793
+ "learning_rate": 3.0412371134020617e-05,
1794
+ "loss": 0.0246,
1795
+ "step": 1270
1796
+ },
1797
+ {
1798
+ "epoch": 318.96,
1799
+ "grad_norm": 0.03257305920124054,
1800
+ "learning_rate": 3.0126002290950743e-05,
1801
+ "loss": 0.023,
1802
+ "step": 1275
1803
+ },
1804
+ {
1805
+ "epoch": 320.0,
1806
+ "grad_norm": 0.10195237398147583,
1807
+ "learning_rate": 2.983963344788087e-05,
1808
+ "loss": 0.0242,
1809
+ "step": 1280
1810
+ },
1811
+ {
1812
+ "epoch": 321.32,
1813
+ "grad_norm": 0.03116573579609394,
1814
+ "learning_rate": 2.9553264604811e-05,
1815
+ "loss": 0.0237,
1816
+ "step": 1285
1817
+ },
1818
+ {
1819
+ "epoch": 322.64,
1820
+ "grad_norm": 0.033235374838113785,
1821
+ "learning_rate": 2.9266895761741126e-05,
1822
+ "loss": 0.0253,
1823
+ "step": 1290
1824
+ },
1825
+ {
1826
+ "epoch": 323.96,
1827
+ "grad_norm": 0.03546692803502083,
1828
+ "learning_rate": 2.8980526918671253e-05,
1829
+ "loss": 0.0234,
1830
+ "step": 1295
1831
+ },
1832
+ {
1833
+ "epoch": 325.0,
1834
+ "grad_norm": 0.0778215229511261,
1835
+ "learning_rate": 2.8694158075601372e-05,
1836
+ "loss": 0.0231,
1837
+ "step": 1300
1838
+ },
1839
+ {
1840
+ "epoch": 326.32,
1841
+ "grad_norm": 0.029815560206770897,
1842
+ "learning_rate": 2.8407789232531502e-05,
1843
+ "loss": 0.0241,
1844
+ "step": 1305
1845
+ },
1846
+ {
1847
+ "epoch": 327.64,
1848
+ "grad_norm": 0.03497137874364853,
1849
+ "learning_rate": 2.812142038946163e-05,
+ "loss": 0.0233,
+ "step": 1310
+ },
+ {
+ "epoch": 328.96,
+ "grad_norm": 0.030050713568925858,
+ "learning_rate": 2.7835051546391755e-05,
+ "loss": 0.0244,
+ "step": 1315
+ },
+ {
+ "epoch": 330.0,
+ "grad_norm": 0.09748966246843338,
+ "learning_rate": 2.754868270332188e-05,
+ "loss": 0.0231,
+ "step": 1320
+ },
+ {
+ "epoch": 331.32,
+ "grad_norm": 0.0319872722029686,
+ "learning_rate": 2.7262313860252005e-05,
+ "loss": 0.0235,
+ "step": 1325
+ },
+ {
+ "epoch": 332.64,
+ "grad_norm": 0.029525283724069595,
+ "learning_rate": 2.697594501718213e-05,
+ "loss": 0.0243,
+ "step": 1330
+ },
+ {
+ "epoch": 333.96,
+ "grad_norm": 0.029868364334106445,
+ "learning_rate": 2.6689576174112258e-05,
+ "loss": 0.0245,
+ "step": 1335
+ },
+ {
+ "epoch": 335.0,
+ "grad_norm": 0.07746418565511703,
+ "learning_rate": 2.6403207331042384e-05,
+ "loss": 0.0212,
+ "step": 1340
+ },
+ {
+ "epoch": 336.32,
+ "grad_norm": 0.02571861259639263,
+ "learning_rate": 2.611683848797251e-05,
+ "loss": 0.0233,
+ "step": 1345
+ },
+ {
+ "epoch": 337.64,
+ "grad_norm": 0.0320206955075264,
+ "learning_rate": 2.5830469644902637e-05,
+ "loss": 0.0238,
+ "step": 1350
+ },
+ {
+ "epoch": 338.96,
+ "grad_norm": 0.03084505721926689,
+ "learning_rate": 2.554410080183276e-05,
+ "loss": 0.024,
+ "step": 1355
+ },
+ {
+ "epoch": 340.0,
+ "grad_norm": 0.1282522976398468,
+ "learning_rate": 2.5257731958762887e-05,
+ "loss": 0.0237,
+ "step": 1360
+ },
+ {
+ "epoch": 341.32,
+ "grad_norm": 0.03159436210989952,
+ "learning_rate": 2.4971363115693013e-05,
+ "loss": 0.0239,
+ "step": 1365
+ },
+ {
+ "epoch": 342.64,
+ "grad_norm": 0.03368183225393295,
+ "learning_rate": 2.468499427262314e-05,
+ "loss": 0.023,
+ "step": 1370
+ },
+ {
+ "epoch": 343.96,
+ "grad_norm": 0.02871900610625744,
+ "learning_rate": 2.4398625429553266e-05,
+ "loss": 0.0232,
+ "step": 1375
+ },
+ {
+ "epoch": 345.0,
+ "grad_norm": 0.06527750939130783,
+ "learning_rate": 2.4112256586483393e-05,
+ "loss": 0.0216,
+ "step": 1380
+ },
+ {
+ "epoch": 346.32,
+ "grad_norm": 0.029657971113920212,
+ "learning_rate": 2.3825887743413516e-05,
+ "loss": 0.0246,
+ "step": 1385
+ },
+ {
+ "epoch": 347.64,
+ "grad_norm": 0.029672225937247276,
+ "learning_rate": 2.3539518900343642e-05,
+ "loss": 0.0226,
+ "step": 1390
+ },
+ {
+ "epoch": 348.96,
+ "grad_norm": 0.032295338809490204,
+ "learning_rate": 2.3253150057273772e-05,
+ "loss": 0.0234,
+ "step": 1395
+ },
+ {
+ "epoch": 350.0,
+ "grad_norm": 0.12228602916002274,
+ "learning_rate": 2.2966781214203895e-05,
+ "loss": 0.0246,
+ "step": 1400
+ },
+ {
+ "epoch": 351.32,
+ "grad_norm": 0.031152470037341118,
+ "learning_rate": 2.268041237113402e-05,
+ "loss": 0.0244,
+ "step": 1405
+ },
+ {
+ "epoch": 352.64,
+ "grad_norm": 0.03246377035975456,
+ "learning_rate": 2.2394043528064148e-05,
+ "loss": 0.0241,
+ "step": 1410
+ },
+ {
+ "epoch": 353.96,
+ "grad_norm": 0.03664344921708107,
+ "learning_rate": 2.210767468499427e-05,
+ "loss": 0.0236,
+ "step": 1415
+ },
+ {
+ "epoch": 355.0,
+ "grad_norm": 0.12599903345108032,
+ "learning_rate": 2.18213058419244e-05,
+ "loss": 0.0242,
+ "step": 1420
+ },
+ {
+ "epoch": 356.32,
+ "grad_norm": 0.03213375434279442,
+ "learning_rate": 2.1534936998854528e-05,
+ "loss": 0.023,
+ "step": 1425
+ },
+ {
+ "epoch": 357.64,
+ "grad_norm": 0.029569735750555992,
+ "learning_rate": 2.124856815578465e-05,
+ "loss": 0.0242,
+ "step": 1430
+ },
+ {
+ "epoch": 358.96,
+ "grad_norm": 0.030345458537340164,
+ "learning_rate": 2.0962199312714777e-05,
+ "loss": 0.0237,
+ "step": 1435
+ },
+ {
+ "epoch": 360.0,
+ "grad_norm": 0.07442766427993774,
+ "learning_rate": 2.0675830469644904e-05,
+ "loss": 0.0225,
+ "step": 1440
+ },
+ {
+ "epoch": 361.32,
+ "grad_norm": 0.03161914646625519,
+ "learning_rate": 2.038946162657503e-05,
+ "loss": 0.0247,
+ "step": 1445
+ },
+ {
+ "epoch": 362.64,
+ "grad_norm": 0.03342209383845329,
+ "learning_rate": 2.0103092783505157e-05,
+ "loss": 0.0224,
+ "step": 1450
+ },
+ {
+ "epoch": 363.96,
+ "grad_norm": 0.029506616294384003,
+ "learning_rate": 1.981672394043528e-05,
+ "loss": 0.0226,
+ "step": 1455
+ },
+ {
+ "epoch": 365.0,
+ "grad_norm": 0.13045279681682587,
+ "learning_rate": 1.9530355097365406e-05,
+ "loss": 0.0251,
+ "step": 1460
+ },
+ {
+ "epoch": 366.32,
+ "grad_norm": 0.03303099796175957,
+ "learning_rate": 1.9243986254295536e-05,
+ "loss": 0.0239,
+ "step": 1465
+ },
+ {
+ "epoch": 367.64,
+ "grad_norm": 0.02956564724445343,
+ "learning_rate": 1.895761741122566e-05,
+ "loss": 0.0221,
+ "step": 1470
+ },
+ {
+ "epoch": 368.96,
+ "grad_norm": 0.03200279548764229,
+ "learning_rate": 1.8671248568155786e-05,
+ "loss": 0.0234,
+ "step": 1475
+ },
+ {
+ "epoch": 370.0,
+ "grad_norm": 0.12507398426532745,
+ "learning_rate": 1.8384879725085912e-05,
+ "loss": 0.0235,
+ "step": 1480
+ },
+ {
+ "epoch": 371.32,
+ "grad_norm": 0.03214867785573006,
+ "learning_rate": 1.809851088201604e-05,
+ "loss": 0.0233,
+ "step": 1485
+ },
+ {
+ "epoch": 372.64,
+ "grad_norm": 0.03199266269803047,
+ "learning_rate": 1.7812142038946165e-05,
+ "loss": 0.023,
+ "step": 1490
+ },
+ {
+ "epoch": 373.96,
+ "grad_norm": 0.027682902291417122,
+ "learning_rate": 1.7525773195876288e-05,
+ "loss": 0.0246,
+ "step": 1495
+ },
+ {
+ "epoch": 375.0,
+ "grad_norm": 0.10432948172092438,
+ "learning_rate": 1.7239404352806415e-05,
+ "loss": 0.0228,
+ "step": 1500
+ },
+ {
+ "epoch": 376.32,
+ "grad_norm": 0.03665570914745331,
+ "learning_rate": 1.695303550973654e-05,
+ "loss": 0.0235,
+ "step": 1505
+ },
+ {
+ "epoch": 377.64,
+ "grad_norm": 0.03269299864768982,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 0.0228,
+ "step": 1510
+ },
+ {
+ "epoch": 378.96,
+ "grad_norm": 0.030298851430416107,
+ "learning_rate": 1.6380297823596794e-05,
+ "loss": 0.0232,
+ "step": 1515
+ },
+ {
+ "epoch": 380.0,
+ "grad_norm": 0.1330370008945465,
+ "learning_rate": 1.609392898052692e-05,
+ "loss": 0.024,
+ "step": 1520
+ },
+ {
+ "epoch": 381.32,
+ "grad_norm": 0.026194848120212555,
+ "learning_rate": 1.5807560137457044e-05,
+ "loss": 0.0232,
+ "step": 1525
+ },
+ {
+ "epoch": 382.64,
+ "grad_norm": 0.030696984380483627,
+ "learning_rate": 1.5521191294387173e-05,
+ "loss": 0.024,
+ "step": 1530
+ },
+ {
+ "epoch": 383.96,
+ "grad_norm": 0.03159346804022789,
+ "learning_rate": 1.5234822451317298e-05,
+ "loss": 0.0237,
+ "step": 1535
+ },
+ {
+ "epoch": 385.0,
+ "grad_norm": 0.0895160585641861,
+ "learning_rate": 1.4948453608247423e-05,
+ "loss": 0.024,
+ "step": 1540
+ },
+ {
+ "epoch": 386.32,
+ "grad_norm": 0.030342400074005127,
+ "learning_rate": 1.466208476517755e-05,
+ "loss": 0.0226,
+ "step": 1545
+ },
+ {
+ "epoch": 387.64,
+ "grad_norm": 0.03451743721961975,
+ "learning_rate": 1.4375715922107674e-05,
+ "loss": 0.0241,
+ "step": 1550
+ },
+ {
+ "epoch": 388.96,
+ "grad_norm": 0.034534044563770294,
+ "learning_rate": 1.40893470790378e-05,
+ "loss": 0.0224,
+ "step": 1555
+ },
+ {
+ "epoch": 390.0,
+ "grad_norm": 0.11649748682975769,
+ "learning_rate": 1.3802978235967929e-05,
+ "loss": 0.024,
+ "step": 1560
+ },
+ {
+ "epoch": 391.32,
+ "grad_norm": 0.02730483002960682,
+ "learning_rate": 1.3516609392898052e-05,
+ "loss": 0.0232,
+ "step": 1565
+ },
+ {
+ "epoch": 392.64,
+ "grad_norm": 0.03302980959415436,
+ "learning_rate": 1.323024054982818e-05,
+ "loss": 0.0245,
+ "step": 1570
+ },
+ {
+ "epoch": 393.96,
+ "grad_norm": 0.030424287542700768,
+ "learning_rate": 1.2943871706758307e-05,
+ "loss": 0.0231,
+ "step": 1575
+ },
+ {
+ "epoch": 395.0,
+ "grad_norm": 0.09190870821475983,
+ "learning_rate": 1.2657502863688431e-05,
+ "loss": 0.0233,
+ "step": 1580
+ },
+ {
+ "epoch": 396.32,
+ "grad_norm": 0.03016272746026516,
+ "learning_rate": 1.2371134020618558e-05,
+ "loss": 0.0237,
+ "step": 1585
+ },
+ {
+ "epoch": 397.64,
+ "grad_norm": 0.029102135449647903,
+ "learning_rate": 1.2084765177548683e-05,
+ "loss": 0.0237,
+ "step": 1590
+ },
+ {
+ "epoch": 398.96,
+ "grad_norm": 0.030849164351820946,
+ "learning_rate": 1.1798396334478809e-05,
+ "loss": 0.0238,
+ "step": 1595
+ },
+ {
+ "epoch": 400.0,
+ "grad_norm": 0.09185610711574554,
+ "learning_rate": 1.1512027491408934e-05,
+ "loss": 0.0223,
+ "step": 1600
+ },
+ {
+ "epoch": 401.32,
+ "grad_norm": 0.030718082562088966,
+ "learning_rate": 1.1225658648339062e-05,
+ "loss": 0.0228,
+ "step": 1605
+ },
+ {
+ "epoch": 402.64,
+ "grad_norm": 0.028845084831118584,
+ "learning_rate": 1.0939289805269187e-05,
+ "loss": 0.0238,
+ "step": 1610
+ },
+ {
+ "epoch": 403.96,
+ "grad_norm": 0.03036542609333992,
+ "learning_rate": 1.0652920962199313e-05,
+ "loss": 0.0241,
+ "step": 1615
+ },
+ {
+ "epoch": 405.0,
+ "grad_norm": 0.10246625542640686,
+ "learning_rate": 1.036655211912944e-05,
+ "loss": 0.0234,
+ "step": 1620
+ },
+ {
+ "epoch": 406.32,
+ "grad_norm": 0.03127530962228775,
+ "learning_rate": 1.0080183276059566e-05,
+ "loss": 0.0238,
+ "step": 1625
+ },
+ {
+ "epoch": 407.64,
+ "grad_norm": 0.036298803985118866,
+ "learning_rate": 9.793814432989691e-06,
+ "loss": 0.0226,
+ "step": 1630
+ },
+ {
+ "epoch": 408.96,
+ "grad_norm": 0.028423035517334938,
+ "learning_rate": 9.507445589919818e-06,
+ "loss": 0.0231,
+ "step": 1635
+ },
+ {
+ "epoch": 410.0,
+ "grad_norm": 0.07871800661087036,
+ "learning_rate": 9.221076746849944e-06,
+ "loss": 0.0218,
+ "step": 1640
+ },
+ {
+ "epoch": 411.32,
+ "grad_norm": 0.0336175374686718,
+ "learning_rate": 8.934707903780069e-06,
+ "loss": 0.0242,
+ "step": 1645
+ },
+ {
+ "epoch": 412.64,
+ "grad_norm": 0.03624117374420166,
+ "learning_rate": 8.648339060710195e-06,
+ "loss": 0.0227,
+ "step": 1650
+ },
+ {
+ "epoch": 413.96,
+ "grad_norm": 0.03119911253452301,
+ "learning_rate": 8.36197021764032e-06,
+ "loss": 0.0218,
+ "step": 1655
+ },
+ {
+ "epoch": 415.0,
+ "grad_norm": 0.09461841732263565,
+ "learning_rate": 8.075601374570448e-06,
+ "loss": 0.0227,
+ "step": 1660
+ },
+ {
+ "epoch": 416.32,
+ "grad_norm": 0.02897919900715351,
+ "learning_rate": 7.789232531500573e-06,
+ "loss": 0.0233,
+ "step": 1665
+ },
+ {
+ "epoch": 417.64,
+ "grad_norm": 0.03222072497010231,
+ "learning_rate": 7.502863688430699e-06,
+ "loss": 0.0234,
+ "step": 1670
+ },
+ {
+ "epoch": 418.96,
+ "grad_norm": 0.02793605998158455,
+ "learning_rate": 7.216494845360824e-06,
+ "loss": 0.0231,
+ "step": 1675
+ },
+ {
+ "epoch": 420.0,
+ "grad_norm": 0.10282719135284424,
+ "learning_rate": 6.930126002290952e-06,
+ "loss": 0.0242,
+ "step": 1680
+ },
+ {
+ "epoch": 421.32,
+ "grad_norm": 0.029103396460413933,
+ "learning_rate": 6.643757159221077e-06,
+ "loss": 0.0228,
+ "step": 1685
+ },
+ {
+ "epoch": 422.64,
+ "grad_norm": 0.027615424245595932,
+ "learning_rate": 6.357388316151203e-06,
+ "loss": 0.0229,
+ "step": 1690
+ },
+ {
+ "epoch": 423.96,
+ "grad_norm": 0.03273004665970802,
+ "learning_rate": 6.071019473081329e-06,
+ "loss": 0.023,
+ "step": 1695
+ },
+ {
+ "epoch": 425.0,
+ "grad_norm": 0.088851198554039,
+ "learning_rate": 5.784650630011455e-06,
+ "loss": 0.0242,
+ "step": 1700
+ },
+ {
+ "epoch": 426.32,
+ "grad_norm": 0.031545545905828476,
+ "learning_rate": 5.498281786941581e-06,
+ "loss": 0.0236,
+ "step": 1705
+ },
+ {
+ "epoch": 427.64,
+ "grad_norm": 0.03436841815710068,
+ "learning_rate": 5.211912943871707e-06,
+ "loss": 0.0231,
+ "step": 1710
+ },
+ {
+ "epoch": 428.96,
+ "grad_norm": 0.03470204398036003,
+ "learning_rate": 4.925544100801833e-06,
+ "loss": 0.023,
+ "step": 1715
+ },
+ {
+ "epoch": 430.0,
+ "grad_norm": 0.0859316810965538,
+ "learning_rate": 4.639175257731959e-06,
+ "loss": 0.0233,
+ "step": 1720
+ },
+ {
+ "epoch": 431.32,
+ "grad_norm": 0.02714327722787857,
+ "learning_rate": 4.352806414662085e-06,
+ "loss": 0.0215,
+ "step": 1725
+ },
+ {
+ "epoch": 432.64,
+ "grad_norm": 0.03115593083202839,
+ "learning_rate": 4.066437571592211e-06,
+ "loss": 0.0233,
+ "step": 1730
+ },
+ {
+ "epoch": 433.96,
+ "grad_norm": 0.03160055726766586,
+ "learning_rate": 3.7800687285223365e-06,
+ "loss": 0.0222,
+ "step": 1735
+ },
+ {
+ "epoch": 435.0,
+ "grad_norm": 0.10642414540052414,
+ "learning_rate": 3.493699885452463e-06,
+ "loss": 0.0221,
+ "step": 1740
+ },
+ {
+ "epoch": 436.32,
+ "grad_norm": 0.029918361455202103,
+ "learning_rate": 3.2073310423825886e-06,
+ "loss": 0.024,
+ "step": 1745
+ },
+ {
+ "epoch": 437.64,
+ "grad_norm": 0.030128490179777145,
+ "learning_rate": 2.920962199312715e-06,
+ "loss": 0.023,
+ "step": 1750
+ },
+ {
+ "epoch": 438.96,
+ "grad_norm": 0.03472098708152771,
+ "learning_rate": 2.6345933562428407e-06,
+ "loss": 0.0231,
+ "step": 1755
+ },
+ {
+ "epoch": 440.0,
+ "grad_norm": 0.10841913521289825,
+ "learning_rate": 2.3482245131729668e-06,
+ "loss": 0.0238,
+ "step": 1760
+ },
+ {
+ "epoch": 441.32,
+ "grad_norm": 0.03282919153571129,
+ "learning_rate": 2.061855670103093e-06,
+ "loss": 0.0241,
+ "step": 1765
+ },
+ {
+ "epoch": 442.64,
+ "grad_norm": 0.030162909999489784,
+ "learning_rate": 1.7754868270332189e-06,
+ "loss": 0.0234,
+ "step": 1770
+ },
+ {
+ "epoch": 443.96,
+ "grad_norm": 0.032848529517650604,
+ "learning_rate": 1.4891179839633447e-06,
+ "loss": 0.0225,
+ "step": 1775
+ },
+ {
+ "epoch": 445.0,
+ "grad_norm": 0.09595301747322083,
+ "learning_rate": 1.202749140893471e-06,
+ "loss": 0.023,
+ "step": 1780
+ },
+ {
+ "epoch": 446.32,
+ "grad_norm": 0.027366334572434425,
+ "learning_rate": 9.163802978235968e-07,
+ "loss": 0.0242,
+ "step": 1785
+ },
+ {
+ "epoch": 447.64,
+ "grad_norm": 0.029810229316353798,
+ "learning_rate": 6.300114547537229e-07,
+ "loss": 0.0243,
+ "step": 1790
+ },
+ {
+ "epoch": 448.96,
+ "grad_norm": 0.03164233639836311,
+ "learning_rate": 3.436426116838488e-07,
+ "loss": 0.0238,
+ "step": 1795
+ },
+ {
+ "epoch": 450.0,
+ "grad_norm": 0.11831732094287872,
+ "learning_rate": 5.72737686139748e-08,
+ "loss": 0.0246,
+ "step": 1800
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.5308141101056e+18,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
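The logged learning rates above are consistent with Transformers' linear schedule with warmup: a peak LR of 1e-4 and 54 warmup steps (warmup_ratio 0.03 of the 1,800 total steps). Note these peak/warmup values are back-fitted from the log, not stated in trainer_state.json. A minimal sketch of that reconstruction:

```python
# Reconstruct the (inferred) linear warmup/decay LR schedule behind the log.
# peak_lr and warmup_steps are assumptions fitted to the logged values,
# not fields of trainer_state.json itself.
peak_lr = 1e-4
max_steps = 1800
warmup_steps = 54  # assumed: warmup_ratio 0.03 * 1800 steps

def lr_at(step: int) -> float:
    """LR applied at optimizer step `step` (1-indexed): linear ramp, then linear decay to 0."""
    if step <= warmup_steps:
        return peak_lr * (step - 1) / warmup_steps
    return peak_lr * (max_steps - step + 1) / (max_steps - warmup_steps)

# Logged values: step 1315 -> 2.7835051546391755e-05, step 1800 -> 5.72737686139748e-08
print(lr_at(1315), lr_at(1800))
```

The decay branch reproduces every logged rate in this segment to float precision, which is how the peak and warmup values were inferred.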
Math_QA/group_09/checkpoints/checkpoint-1800/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-300/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/checkpoints/checkpoint-300/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
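With `lora_alpha: 128` and `r: 64` above, and rsLoRA disabled, the low-rank update BA added to each of the seven target projections is scaled by alpha / r = 2.0. A quick sketch of how PEFT derives that factor from this config:

```python
import math

# Values copied from the adapter_config.json above.
lora_alpha = 128
r = 64
use_rslora = False

# Standard LoRA scales the update BA by alpha / r;
# rsLoRA (disabled in this config) would use alpha / sqrt(r) instead.
scaling = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r
print(scaling)  # 2.0
```

Since the scaling multiplies the learned update, alpha = 2r (as here) doubles the adapter's effective contribution relative to the alpha = r default.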
Math_QA/group_09/checkpoints/checkpoint-300/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
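The 22 added tokens occupy one contiguous ID block starting at 151643, appended directly after the base BPE vocabulary. A small check, with the mapping copied from the file above:

```python
# added_tokens.json contents, copied verbatim from the checkpoint above.
added_tokens = {
    "</tool_call>": 151658, "<tool_call>": 151657,
    "<|box_end|>": 151649, "<|box_start|>": 151648,
    "<|endoftext|>": 151643, "<|file_sep|>": 151664,
    "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
    "<|im_end|>": 151645, "<|im_start|>": 151644,
    "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
    "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

ids = sorted(added_tokens.values())
# 22 consecutive IDs: 151643 .. 151664, no gaps and no overlap with the base vocab.
assert ids == list(range(151643, 151665))
print(len(ids), ids[0], ids[-1])
```

Contiguity matters in practice: it means the model's embedding matrix only needs to extend past index 151642 to cover every special token.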
Math_QA/group_09/checkpoints/checkpoint-300/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
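For the common no-tools case, the template above reduces to plain ChatML framing: a system turn (the message's own, or the Qwen default), one `<|im_start|>role ... <|im_end|>` block per message, and an optional open assistant header. A plain-Python sketch of just that path (the tool-call and tool-response branches are omitted):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the chat template's no-tools path (ChatML with a default system prompt)."""
    default_system = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
    if messages and messages[0]["role"] == "system":
        out = f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n"
    else:
        out = f"<|im_start|>system\n{default_system}<|im_end|>\n"
    for i, m in enumerate(messages):
        # A leading system message was already emitted above, so skip it here
        # (the template does this via `not loop.first`).
        if m["role"] == "user" or m["role"] == "assistant" or (m["role"] == "system" and i > 0):
            out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render_chatml([{"role": "user", "content": "What is 2+2?"}]))
```

This is a readability aid only; at inference time the Jinja template itself is what `tokenizer.apply_chat_template` executes.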
Math_QA/group_09/checkpoints/checkpoint-300/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
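Note that end-of-sequence and padding are deliberately distinct here: `<|im_end|>` terminates chat turns, while `<|endoftext|>` is reused as the pad token. A minimal check, with the IDs taken from this checkpoint's added_tokens.json:

```python
# Token strings from special_tokens_map.json; IDs from added_tokens.json above.
eos_token = "<|im_end|>"
pad_token = "<|endoftext|>"
token_ids = {"<|im_end|>": 151645, "<|endoftext|>": 151643}

# Keeping pad != eos means attention/label masking on padding
# never accidentally masks the turn terminator the model must learn to emit.
assert token_ids[eos_token] != token_ids[pad_token]
print(token_ids[eos_token], token_ids[pad_token])
```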
Math_QA/group_09/checkpoints/checkpoint-300/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-300/trainer_state.json ADDED
@@ -0,0 +1,461 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 75.0,
+ "eval_steps": 500,
+ "global_step": 300,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.551356850176e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
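Note for reviewers: the `log_history` list above is a flat record of loss, learning rate, and gradient norm at each logging step, which makes it easy to inspect convergence without TensorBoard. A minimal sketch, reproducing only the first and last records from checkpoint-300 inline (in practice you would `json.load()` the `trainer_state.json` file itself):

```python
import json

# Two records copied from checkpoint-300's log_history above; the full file
# contains one record every `logging_steps` (5) steps.
trainer_state = {
    "global_step": 300,
    "log_history": [
        {"epoch": 0.32, "grad_norm": 10.95508098602295,
         "learning_rate": 0.0, "loss": 1.9528, "step": 1},
        {"epoch": 75.0, "grad_norm": 0.1988091617822647,
         "learning_rate": 8.596792668957619e-05, "loss": 0.0239, "step": 300},
    ],
}

# Extract a (step, loss) series suitable for plotting.
steps = [rec["step"] for rec in trainer_state["log_history"]]
losses = [rec["loss"] for rec in trainer_state["log_history"]]
print(f"loss went from {losses[0]} at step {steps[0]} "
      f"to {losses[-1]} at step {steps[-1]}")
```

The same pattern applies to any checkpoint's `trainer_state.json`, since every checkpoint carries the full history up to its `global_step`.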
Math_QA/group_09/checkpoints/checkpoint-300/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-600/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/checkpoints/checkpoint-600/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
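Note for reviewers: this `adapter_config.json` describes a rank-64 LoRA adapter over all attention and MLP projections. A minimal sketch of how the two key hyperparameters relate, using only the fields copied from the file above (PEFT applies the low-rank update scaled by `lora_alpha / r`):

```python
import json

# Relevant fields copied from adapter_config.json above.
adapter_config = json.loads("""
{
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "r": 64,
  "use_rslora": false
}
""")

# With use_rslora false, the standard LoRA scaling is lora_alpha / r,
# so this adapter's update is applied with a factor of 2.0.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(f"rank={adapter_config['r']}, scaling={scaling}")
```

Loading the adapter itself would go through `peft.PeftModel.from_pretrained` with the base model named in `base_model_name_or_path`.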
Math_QA/group_09/checkpoints/checkpoint-600/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Math_QA/group_09/checkpoints/checkpoint-600/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0]['role'] == 'system' %}
4
+ {{- messages[0]['content'] }}
5
+ {%- else %}
6
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
7
+ {%- endif %}
8
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
9
+ {%- for tool in tools %}
10
+ {{- "\n" }}
11
+ {{- tool | tojson }}
12
+ {%- endfor %}
13
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
14
+ {%- else %}
15
+ {%- if messages[0]['role'] == 'system' %}
16
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
17
+ {%- else %}
18
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
19
+ {%- endif %}
20
+ {%- endif %}
21
+ {%- for message in messages %}
22
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
23
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
24
+ {%- elif message.role == "assistant" %}
25
+ {{- '<|im_start|>' + message.role }}
26
+ {%- if message.content %}
27
+ {{- '\n' + message.content }}
28
+ {%- endif %}
29
+ {%- for tool_call in message.tool_calls %}
30
+ {%- if tool_call.function is defined %}
31
+ {%- set tool_call = tool_call.function %}
32
+ {%- endif %}
33
+ {{- '\n<tool_call>\n{"name": "' }}
34
+ {{- tool_call.name }}
35
+ {{- '", "arguments": ' }}
36
+ {{- tool_call.arguments | tojson }}
37
+ {{- '}\n</tool_call>' }}
38
+ {%- endfor %}
39
+ {{- '<|im_end|>\n' }}
40
+ {%- elif message.role == "tool" %}
41
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
42
+ {{- '<|im_start|>user' }}
43
+ {%- endif %}
44
+ {{- '\n<tool_response>\n' }}
45
+ {{- message.content }}
46
+ {{- '\n</tool_response>' }}
47
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
48
+ {{- '<|im_end|>\n' }}
49
+ {%- endif %}
50
+ {%- endif %}
51
+ {%- endfor %}
52
+ {%- if add_generation_prompt %}
53
+ {{- '<|im_start|>assistant\n' }}
54
+ {%- endif %}
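Note for reviewers: the no-tools branch of this Jinja template produces standard ChatML prompts. As a sketch, the snippet below re-implements just that branch in plain Python so the expected output shape is visible without a Jinja engine; in practice the template is rendered by `tokenizer.apply_chat_template`, and this helper function is purely illustrative.

```python
# Illustrative re-implementation of the template's no-tools branch:
# a default system header if none is given, then one ChatML block per
# message, then an open assistant block when add_generation_prompt is set.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        messages = messages[1:]
    else:
        parts.append("<|im_start|>system\nYou are Qwen, created by Alibaba "
                     "Cloud. You are a helpful assistant.<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

The tool-calling branches (`<tool_call>`, `<tool_response>`) are omitted here; they wrap assistant tool calls and tool results in the extra tags defined by the template above.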
Math_QA/group_09/checkpoints/checkpoint-600/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-600/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/checkpoints/checkpoint-600/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-600/trainer_state.json ADDED
@@ -0,0 +1,881 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 150.0,
+ "eval_steps": 500,
+ "global_step": 600,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ },
+ {
+ "epoch": 76.32,
+ "grad_norm": 0.04305023327469826,
+ "learning_rate": 8.56815578465063e-05,
+ "loss": 0.0248,
+ "step": 305
+ },
+ {
+ "epoch": 77.64,
+ "grad_norm": 0.04323578625917435,
+ "learning_rate": 8.539518900343643e-05,
+ "loss": 0.0254,
+ "step": 310
+ },
+ {
+ "epoch": 78.96,
+ "grad_norm": 0.04426678270101547,
+ "learning_rate": 8.510882016036655e-05,
+ "loss": 0.0254,
+ "step": 315
+ },
+ {
+ "epoch": 80.0,
+ "grad_norm": 0.14689449965953827,
+ "learning_rate": 8.482245131729668e-05,
+ "loss": 0.0259,
+ "step": 320
+ },
+ {
+ "epoch": 81.32,
+ "grad_norm": 0.04256561025977135,
+ "learning_rate": 8.453608247422681e-05,
+ "loss": 0.0256,
+ "step": 325
+ },
+ {
+ "epoch": 82.64,
+ "grad_norm": 0.03943061828613281,
+ "learning_rate": 8.424971363115693e-05,
+ "loss": 0.0235,
+ "step": 330
+ },
+ {
+ "epoch": 83.96,
+ "grad_norm": 0.041899990290403366,
+ "learning_rate": 8.396334478808706e-05,
+ "loss": 0.0249,
+ "step": 335
+ },
+ {
+ "epoch": 85.0,
+ "grad_norm": 0.151236429810524,
+ "learning_rate": 8.367697594501719e-05,
+ "loss": 0.0255,
+ "step": 340
+ },
+ {
+ "epoch": 86.32,
+ "grad_norm": 0.042102884501218796,
+ "learning_rate": 8.339060710194731e-05,
+ "loss": 0.0244,
+ "step": 345
+ },
+ {
+ "epoch": 87.64,
+ "grad_norm": 0.04723669961094856,
+ "learning_rate": 8.310423825887744e-05,
+ "loss": 0.0251,
+ "step": 350
+ },
+ {
+ "epoch": 88.96,
+ "grad_norm": 0.0578082799911499,
+ "learning_rate": 8.281786941580757e-05,
+ "loss": 0.0261,
+ "step": 355
+ },
+ {
+ "epoch": 90.0,
+ "grad_norm": 0.10269813239574432,
+ "learning_rate": 8.253150057273768e-05,
+ "loss": 0.0225,
+ "step": 360
+ },
+ {
+ "epoch": 91.32,
+ "grad_norm": 0.046400491148233414,
+ "learning_rate": 8.224513172966782e-05,
+ "loss": 0.0262,
+ "step": 365
+ },
+ {
+ "epoch": 92.64,
+ "grad_norm": 0.04183673858642578,
+ "learning_rate": 8.195876288659795e-05,
+ "loss": 0.0239,
+ "step": 370
+ },
+ {
+ "epoch": 93.96,
+ "grad_norm": 0.04400316998362541,
+ "learning_rate": 8.167239404352807e-05,
+ "loss": 0.0263,
+ "step": 375
+ },
+ {
+ "epoch": 95.0,
+ "grad_norm": 0.10862386226654053,
+ "learning_rate": 8.13860252004582e-05,
+ "loss": 0.025,
+ "step": 380
+ },
+ {
+ "epoch": 96.32,
+ "grad_norm": 0.05308162048459053,
+ "learning_rate": 8.109965635738833e-05,
+ "loss": 0.0248,
+ "step": 385
+ },
+ {
+ "epoch": 97.64,
+ "grad_norm": 0.04261139780282974,
+ "learning_rate": 8.081328751431845e-05,
+ "loss": 0.0244,
+ "step": 390
+ },
+ {
+ "epoch": 98.96,
+ "grad_norm": 0.05337546020746231,
+ "learning_rate": 8.052691867124858e-05,
+ "loss": 0.0253,
+ "step": 395
+ },
+ {
+ "epoch": 100.0,
+ "grad_norm": 0.15639856457710266,
+ "learning_rate": 8.02405498281787e-05,
+ "loss": 0.0243,
+ "step": 400
+ },
+ {
+ "epoch": 101.32,
+ "grad_norm": 0.04450729116797447,
+ "learning_rate": 7.995418098510883e-05,
+ "loss": 0.0258,
+ "step": 405
+ },
+ {
+ "epoch": 102.64,
+ "grad_norm": 0.042327046394348145,
+ "learning_rate": 7.966781214203894e-05,
+ "loss": 0.0244,
+ "step": 410
+ },
+ {
+ "epoch": 103.96,
+ "grad_norm": 0.04105006903409958,
+ "learning_rate": 7.938144329896907e-05,
+ "loss": 0.0253,
+ "step": 415
+ },
+ {
+ "epoch": 105.0,
+ "grad_norm": 0.17930248379707336,
+ "learning_rate": 7.90950744558992e-05,
+ "loss": 0.0261,
+ "step": 420
+ },
+ {
+ "epoch": 106.32,
+ "grad_norm": 0.04404031112790108,
+ "learning_rate": 7.880870561282932e-05,
+ "loss": 0.0241,
+ "step": 425
+ },
+ {
+ "epoch": 107.64,
+ "grad_norm": 0.04142986983060837,
+ "learning_rate": 7.852233676975945e-05,
+ "loss": 0.0245,
+ "step": 430
+ },
+ {
+ "epoch": 108.96,
+ "grad_norm": 0.041959185153245926,
+ "learning_rate": 7.823596792668958e-05,
+ "loss": 0.0254,
+ "step": 435
+ },
+ {
+ "epoch": 110.0,
+ "grad_norm": 0.27740439772605896,
+ "learning_rate": 7.79495990836197e-05,
+ "loss": 0.0292,
+ "step": 440
+ },
+ {
+ "epoch": 111.32,
+ "grad_norm": 0.03657572343945503,
+ "learning_rate": 7.766323024054983e-05,
+ "loss": 0.026,
+ "step": 445
+ },
+ {
+ "epoch": 112.64,
+ "grad_norm": 0.042320434004068375,
+ "learning_rate": 7.737686139747996e-05,
+ "loss": 0.0251,
+ "step": 450
+ },
+ {
+ "epoch": 113.96,
+ "grad_norm": 0.0473681204020977,
+ "learning_rate": 7.709049255441008e-05,
+ "loss": 0.026,
+ "step": 455
+ },
+ {
+ "epoch": 115.0,
+ "grad_norm": 0.1326676607131958,
+ "learning_rate": 7.680412371134021e-05,
+ "loss": 0.0241,
+ "step": 460
+ },
+ {
+ "epoch": 116.32,
+ "grad_norm": 0.04483647271990776,
+ "learning_rate": 7.651775486827034e-05,
+ "loss": 0.0236,
+ "step": 465
+ },
+ {
+ "epoch": 117.64,
+ "grad_norm": 0.038961004465818405,
+ "learning_rate": 7.623138602520046e-05,
+ "loss": 0.0235,
+ "step": 470
+ },
+ {
+ "epoch": 118.96,
+ "grad_norm": 0.042134907096624374,
+ "learning_rate": 7.594501718213059e-05,
+ "loss": 0.0252,
+ "step": 475
+ },
+ {
+ "epoch": 120.0,
+ "grad_norm": 0.13292020559310913,
+ "learning_rate": 7.565864833906071e-05,
+ "loss": 0.024,
+ "step": 480
+ },
+ {
+ "epoch": 121.32,
+ "grad_norm": 0.03745294362306595,
+ "learning_rate": 7.537227949599084e-05,
+ "loss": 0.025,
+ "step": 485
+ },
+ {
+ "epoch": 122.64,
+ "grad_norm": 0.035545315593481064,
+ "learning_rate": 7.508591065292097e-05,
+ "loss": 0.0253,
+ "step": 490
+ },
+ {
+ "epoch": 123.96,
+ "grad_norm": 0.03991984575986862,
+ "learning_rate": 7.47995418098511e-05,
+ "loss": 0.026,
+ "step": 495
+ },
+ {
+ "epoch": 125.0,
+ "grad_norm": 0.1339961290359497,
+ "learning_rate": 7.451317296678122e-05,
+ "loss": 0.0246,
+ "step": 500
+ },
+ {
+ "epoch": 126.32,
+ "grad_norm": 0.04381132498383522,
+ "learning_rate": 7.422680412371135e-05,
+ "loss": 0.0235,
+ "step": 505
+ },
+ {
+ "epoch": 127.64,
+ "grad_norm": 0.048515841364860535,
+ "learning_rate": 7.394043528064147e-05,
+ "loss": 0.0242,
+ "step": 510
+ },
+ {
+ "epoch": 128.96,
+ "grad_norm": 0.04145604744553566,
+ "learning_rate": 7.36540664375716e-05,
+ "loss": 0.0249,
+ "step": 515
+ },
+ {
+ "epoch": 130.0,
+ "grad_norm": 0.14400818943977356,
+ "learning_rate": 7.336769759450171e-05,
+ "loss": 0.0247,
+ "step": 520
+ },
+ {
+ "epoch": 131.32,
+ "grad_norm": 0.04025031998753548,
+ "learning_rate": 7.308132875143184e-05,
+ "loss": 0.0241,
+ "step": 525
+ },
+ {
+ "epoch": 132.64,
+ "grad_norm": 0.037277135998010635,
+ "learning_rate": 7.279495990836197e-05,
+ "loss": 0.0242,
+ "step": 530
+ },
+ {
+ "epoch": 133.96,
+ "grad_norm": 0.03666083887219429,
+ "learning_rate": 7.250859106529209e-05,
+ "loss": 0.0251,
+ "step": 535
+ },
+ {
+ "epoch": 135.0,
+ "grad_norm": 0.09921745210886002,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.0241,
+ "step": 540
+ },
+ {
+ "epoch": 136.32,
+ "grad_norm": 0.0382193848490715,
+ "learning_rate": 7.193585337915235e-05,
+ "loss": 0.0247,
+ "step": 545
+ },
+ {
+ "epoch": 137.64,
+ "grad_norm": 0.0314810685813427,
+ "learning_rate": 7.164948453608247e-05,
+ "loss": 0.0239,
+ "step": 550
+ },
+ {
+ "epoch": 138.96,
+ "grad_norm": 0.04278745502233505,
+ "learning_rate": 7.136311569301261e-05,
+ "loss": 0.0243,
+ "step": 555
+ },
+ {
+ "epoch": 140.0,
+ "grad_norm": 0.09295342862606049,
+ "learning_rate": 7.107674684994274e-05,
+ "loss": 0.0234,
+ "step": 560
+ },
+ {
+ "epoch": 141.32,
+ "grad_norm": 0.03429599106311798,
+ "learning_rate": 7.079037800687286e-05,
+ "loss": 0.0248,
+ "step": 565
+ },
+ {
+ "epoch": 142.64,
+ "grad_norm": 0.03622185438871384,
+ "learning_rate": 7.050400916380299e-05,
+ "loss": 0.0234,
+ "step": 570
+ },
+ {
+ "epoch": 143.96,
+ "grad_norm": 0.042615506798028946,
+ "learning_rate": 7.02176403207331e-05,
+ "loss": 0.0242,
+ "step": 575
+ },
+ {
+ "epoch": 145.0,
+ "grad_norm": 0.13792142271995544,
+ "learning_rate": 6.993127147766323e-05,
+ "loss": 0.0268,
+ "step": 580
+ },
+ {
+ "epoch": 146.32,
+ "grad_norm": 0.035664405673742294,
+ "learning_rate": 6.964490263459336e-05,
+ "loss": 0.0231,
+ "step": 585
+ },
+ {
+ "epoch": 147.64,
+ "grad_norm": 0.033511932939291,
+ "learning_rate": 6.935853379152348e-05,
+ "loss": 0.0258,
+ "step": 590
+ },
+ {
+ "epoch": 148.96,
+ "grad_norm": 0.036591917276382446,
+ "learning_rate": 6.907216494845361e-05,
+ "loss": 0.0248,
+ "step": 595
+ },
+ {
+ "epoch": 150.0,
+ "grad_norm": 0.11892726272344589,
+ "learning_rate": 6.878579610538374e-05,
+ "loss": 0.0257,
+ "step": 600
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.102713700352e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
Math_QA/group_09/checkpoints/checkpoint-600/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/metadata.json ADDED
@@ -0,0 +1,2718 @@
+ {
+ "dataset_name": "Math_QA",
+ "group_index": 9,
+ "prompt_group_file": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/prompt_groups/Math_QA/group_09.json",
+ "output_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09",
+ "checkpoint_root": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09/checkpoints",
+ "generated_at": "2025-11-06T12:12:07Z",
+ "train_loss": 0.049091512378719115,
+ "metrics": {
+ "train_runtime": 30940.5798,
+ "train_samples_per_second": 1.862,
+ "train_steps_per_second": 0.058,
+ "total_flos": 1.5308141101056e+18,
+ "train_loss": 0.049091512378719115,
+ "epoch": 450.0
+ },
+ "trainer_state": [
+ {
+ "loss": 1.9528,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "epoch": 0.32,
+ "step": 1
+ },
+ {
+ "loss": 1.7919,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "epoch": 1.32,
+ "step": 5
+ },
+ {
+ "loss": 1.6625,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "epoch": 2.64,
+ "step": 10
+ },
+ {
+ "loss": 1.2009,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "epoch": 3.96,
+ "step": 15
+ },
+ {
+ "loss": 0.8544,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "epoch": 5.0,
+ "step": 20
+ },
+ {
+ "loss": 0.7449,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "epoch": 6.32,
+ "step": 25
+ },
+ {
+ "loss": 0.5884,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "epoch": 7.64,
+ "step": 30
+ },
+ {
+ "loss": 0.6349,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "epoch": 8.96,
+ "step": 35
+ },
+ {
+ "loss": 0.5088,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "epoch": 10.0,
+ "step": 40
+ },
+ {
+ "loss": 0.4279,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "epoch": 11.32,
+ "step": 45
+ },
+ {
+ "loss": 0.2689,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "epoch": 12.64,
+ "step": 50
+ },
+ {
+ "loss": 0.1985,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "epoch": 13.96,
+ "step": 55
+ },
+ {
+ "loss": 0.1346,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "epoch": 15.0,
+ "step": 60
+ },
+ {
+ "loss": 0.0959,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "epoch": 16.32,
+ "step": 65
+ },
+ {
+ "loss": 0.0617,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "epoch": 17.64,
+ "step": 70
+ },
+ {
+ "loss": 0.0428,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "epoch": 18.96,
+ "step": 75
+ },
+ {
+ "loss": 0.0416,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "epoch": 20.0,
+ "step": 80
+ },
+ {
+ "loss": 0.0334,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "epoch": 21.32,
+ "step": 85
+ },
+ {
+ "loss": 0.0347,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "epoch": 22.64,
+ "step": 90
+ },
+ {
+ "loss": 0.0336,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "epoch": 23.96,
+ "step": 95
+ },
+ {
+ "loss": 0.0325,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "epoch": 25.0,
+ "step": 100
+ },
+ {
+ "loss": 0.0317,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "epoch": 26.32,
+ "step": 105
+ },
+ {
+ "loss": 0.0322,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "epoch": 27.64,
+ "step": 110
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "epoch": 28.96,
+ "step": 115
+ },
+ {
+ "loss": 0.0301,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "epoch": 30.0,
+ "step": 120
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "epoch": 31.32,
+ "step": 125
+ },
+ {
+ "loss": 0.0287,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "epoch": 32.64,
+ "step": 130
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "epoch": 33.96,
+ "step": 135
+ },
+ {
+ "loss": 0.0272,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "epoch": 35.0,
+ "step": 140
+ },
+ {
+ "loss": 0.0289,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "epoch": 36.32,
+ "step": 145
+ },
+ {
+ "loss": 0.0295,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "epoch": 37.64,
+ "step": 150
+ },
+ {
+ "loss": 0.0265,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "epoch": 38.96,
+ "step": 155
+ },
+ {
+ "loss": 0.0319,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "epoch": 40.0,
+ "step": 160
+ },
+ {
+ "loss": 0.0277,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "epoch": 41.32,
+ "step": 165
+ },
+ {
+ "loss": 0.0264,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "epoch": 42.64,
+ "step": 170
+ },
+ {
+ "loss": 0.026,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "epoch": 43.96,
+ "step": 175
+ },
+ {
+ "loss": 0.0237,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "epoch": 45.0,
+ "step": 180
+ },
+ {
+ "loss": 0.0265,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "epoch": 46.32,
+ "step": 185
+ },
+ {
+ "loss": 0.0279,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "epoch": 47.64,
+ "step": 190
+ },
+ {
+ "loss": 0.0257,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "epoch": 48.96,
+ "step": 195
+ },
+ {
+ "loss": 0.0249,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "epoch": 50.0,
+ "step": 200
+ },
+ {
+ "loss": 0.0245,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "epoch": 51.32,
+ "step": 205
+ },
+ {
+ "loss": 0.0257,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "epoch": 52.64,
+ "step": 210
+ },
+ {
+ "loss": 0.0246,
321
+ "grad_norm": 0.04665664583444595,
322
+ "learning_rate": 9.083619702176404e-05,
323
+ "epoch": 53.96,
324
+ "step": 215
325
+ },
326
+ {
327
+ "loss": 0.0267,
328
+ "grad_norm": 0.18880419433116913,
329
+ "learning_rate": 9.054982817869416e-05,
330
+ "epoch": 55.0,
331
+ "step": 220
332
+ },
333
+ {
334
+ "loss": 0.0258,
335
+ "grad_norm": 0.05329155549407005,
336
+ "learning_rate": 9.026345933562429e-05,
337
+ "epoch": 56.32,
338
+ "step": 225
339
+ },
340
+ {
341
+ "loss": 0.0264,
342
+ "grad_norm": 0.05351603031158447,
343
+ "learning_rate": 8.997709049255442e-05,
344
+ "epoch": 57.64,
345
+ "step": 230
346
+ },
347
+ {
348
+ "loss": 0.0266,
349
+ "grad_norm": 0.05472696200013161,
350
+ "learning_rate": 8.969072164948454e-05,
351
+ "epoch": 58.96,
352
+ "step": 235
353
+ },
354
+ {
355
+ "loss": 0.0255,
356
+ "grad_norm": 0.17182305455207825,
357
+ "learning_rate": 8.940435280641467e-05,
358
+ "epoch": 60.0,
359
+ "step": 240
360
+ },
361
+ {
362
+ "loss": 0.0259,
363
+ "grad_norm": 0.05441403388977051,
364
+ "learning_rate": 8.91179839633448e-05,
365
+ "epoch": 61.32,
366
+ "step": 245
367
+ },
368
+ {
369
+ "loss": 0.025,
370
+ "grad_norm": 0.05443132296204567,
371
+ "learning_rate": 8.883161512027491e-05,
372
+ "epoch": 62.64,
373
+ "step": 250
374
+ },
375
+ {
376
+ "loss": 0.0261,
377
+ "grad_norm": 0.05410757660865784,
378
+ "learning_rate": 8.854524627720504e-05,
379
+ "epoch": 63.96,
380
+ "step": 255
381
+ },
382
+ {
383
+ "loss": 0.0265,
384
+ "grad_norm": 0.16327381134033203,
385
+ "learning_rate": 8.825887743413516e-05,
386
+ "epoch": 65.0,
387
+ "step": 260
388
+ },
389
+ {
390
+ "loss": 0.0251,
391
+ "grad_norm": 0.05516252666711807,
392
+ "learning_rate": 8.797250859106529e-05,
393
+ "epoch": 66.32,
394
+ "step": 265
395
+ },
396
+ {
397
+ "loss": 0.0255,
398
+ "grad_norm": 0.0483415424823761,
399
+ "learning_rate": 8.768613974799542e-05,
400
+ "epoch": 67.64,
401
+ "step": 270
402
+ },
403
+ {
404
+ "loss": 0.0247,
405
+ "grad_norm": 0.062226541340351105,
406
+ "learning_rate": 8.739977090492554e-05,
407
+ "epoch": 68.96,
408
+ "step": 275
409
+ },
410
+ {
411
+ "loss": 0.0267,
412
+ "grad_norm": 0.20358847081661224,
413
+ "learning_rate": 8.711340206185567e-05,
414
+ "epoch": 70.0,
415
+ "step": 280
416
+ },
417
+ {
418
+ "loss": 0.0255,
419
+ "grad_norm": 0.04628003016114235,
420
+ "learning_rate": 8.682703321878581e-05,
421
+ "epoch": 71.32,
422
+ "step": 285
423
+ },
424
+ {
425
+ "loss": 0.0257,
426
+ "grad_norm": 0.06483373790979385,
427
+ "learning_rate": 8.654066437571594e-05,
428
+ "epoch": 72.64,
429
+ "step": 290
430
+ },
431
+ {
432
+ "loss": 0.0244,
433
+ "grad_norm": 0.04926105588674545,
434
+ "learning_rate": 8.625429553264606e-05,
435
+ "epoch": 73.96,
436
+ "step": 295
437
+ },
438
+ {
439
+ "loss": 0.0239,
440
+ "grad_norm": 0.1988091617822647,
441
+ "learning_rate": 8.596792668957619e-05,
442
+ "epoch": 75.0,
443
+ "step": 300
444
+ },
445
+ {
446
+ "loss": 0.0248,
447
+ "grad_norm": 0.04305023327469826,
448
+ "learning_rate": 8.56815578465063e-05,
449
+ "epoch": 76.32,
450
+ "step": 305
451
+ },
452
+ {
453
+ "loss": 0.0254,
454
+ "grad_norm": 0.04323578625917435,
455
+ "learning_rate": 8.539518900343643e-05,
456
+ "epoch": 77.64,
457
+ "step": 310
458
+ },
459
+ {
460
+ "loss": 0.0254,
461
+ "grad_norm": 0.04426678270101547,
462
+ "learning_rate": 8.510882016036655e-05,
463
+ "epoch": 78.96,
464
+ "step": 315
465
+ },
466
+ {
467
+ "loss": 0.0259,
468
+ "grad_norm": 0.14689449965953827,
469
+ "learning_rate": 8.482245131729668e-05,
470
+ "epoch": 80.0,
471
+ "step": 320
472
+ },
473
+ {
474
+ "loss": 0.0256,
475
+ "grad_norm": 0.04256561025977135,
476
+ "learning_rate": 8.453608247422681e-05,
477
+ "epoch": 81.32,
478
+ "step": 325
479
+ },
480
+ {
481
+ "loss": 0.0235,
482
+ "grad_norm": 0.03943061828613281,
483
+ "learning_rate": 8.424971363115693e-05,
484
+ "epoch": 82.64,
485
+ "step": 330
486
+ },
487
+ {
488
+ "loss": 0.0249,
489
+ "grad_norm": 0.041899990290403366,
490
+ "learning_rate": 8.396334478808706e-05,
491
+ "epoch": 83.96,
492
+ "step": 335
493
+ },
494
+ {
495
+ "loss": 0.0255,
496
+ "grad_norm": 0.151236429810524,
497
+ "learning_rate": 8.367697594501719e-05,
498
+ "epoch": 85.0,
499
+ "step": 340
500
+ },
501
+ {
502
+ "loss": 0.0244,
503
+ "grad_norm": 0.042102884501218796,
504
+ "learning_rate": 8.339060710194731e-05,
505
+ "epoch": 86.32,
506
+ "step": 345
507
+ },
508
+ {
509
+ "loss": 0.0251,
510
+ "grad_norm": 0.04723669961094856,
511
+ "learning_rate": 8.310423825887744e-05,
512
+ "epoch": 87.64,
513
+ "step": 350
514
+ },
515
+ {
516
+ "loss": 0.0261,
517
+ "grad_norm": 0.0578082799911499,
518
+ "learning_rate": 8.281786941580757e-05,
519
+ "epoch": 88.96,
520
+ "step": 355
521
+ },
522
+ {
523
+ "loss": 0.0225,
524
+ "grad_norm": 0.10269813239574432,
525
+ "learning_rate": 8.253150057273768e-05,
526
+ "epoch": 90.0,
527
+ "step": 360
528
+ },
529
+ {
530
+ "loss": 0.0262,
531
+ "grad_norm": 0.046400491148233414,
532
+ "learning_rate": 8.224513172966782e-05,
533
+ "epoch": 91.32,
534
+ "step": 365
535
+ },
536
+ {
537
+ "loss": 0.0239,
538
+ "grad_norm": 0.04183673858642578,
539
+ "learning_rate": 8.195876288659795e-05,
540
+ "epoch": 92.64,
541
+ "step": 370
542
+ },
543
+ {
544
+ "loss": 0.0263,
545
+ "grad_norm": 0.04400316998362541,
546
+ "learning_rate": 8.167239404352807e-05,
547
+ "epoch": 93.96,
548
+ "step": 375
549
+ },
550
+ {
551
+ "loss": 0.025,
552
+ "grad_norm": 0.10862386226654053,
553
+ "learning_rate": 8.13860252004582e-05,
554
+ "epoch": 95.0,
555
+ "step": 380
556
+ },
557
+ {
558
+ "loss": 0.0248,
559
+ "grad_norm": 0.05308162048459053,
560
+ "learning_rate": 8.109965635738833e-05,
561
+ "epoch": 96.32,
562
+ "step": 385
563
+ },
564
+ {
565
+ "loss": 0.0244,
566
+ "grad_norm": 0.04261139780282974,
567
+ "learning_rate": 8.081328751431845e-05,
568
+ "epoch": 97.64,
569
+ "step": 390
570
+ },
571
+ {
572
+ "loss": 0.0253,
573
+ "grad_norm": 0.05337546020746231,
574
+ "learning_rate": 8.052691867124858e-05,
575
+ "epoch": 98.96,
576
+ "step": 395
577
+ },
578
+ {
579
+ "loss": 0.0243,
580
+ "grad_norm": 0.15639856457710266,
581
+ "learning_rate": 8.02405498281787e-05,
582
+ "epoch": 100.0,
583
+ "step": 400
584
+ },
585
+ {
586
+ "loss": 0.0258,
587
+ "grad_norm": 0.04450729116797447,
588
+ "learning_rate": 7.995418098510883e-05,
589
+ "epoch": 101.32,
590
+ "step": 405
591
+ },
592
+ {
593
+ "loss": 0.0244,
594
+ "grad_norm": 0.042327046394348145,
595
+ "learning_rate": 7.966781214203894e-05,
596
+ "epoch": 102.64,
597
+ "step": 410
598
+ },
599
+ {
600
+ "loss": 0.0253,
601
+ "grad_norm": 0.04105006903409958,
602
+ "learning_rate": 7.938144329896907e-05,
603
+ "epoch": 103.96,
604
+ "step": 415
605
+ },
606
+ {
607
+ "loss": 0.0261,
608
+ "grad_norm": 0.17930248379707336,
609
+ "learning_rate": 7.90950744558992e-05,
610
+ "epoch": 105.0,
611
+ "step": 420
612
+ },
613
+ {
614
+ "loss": 0.0241,
615
+ "grad_norm": 0.04404031112790108,
616
+ "learning_rate": 7.880870561282932e-05,
617
+ "epoch": 106.32,
618
+ "step": 425
619
+ },
620
+ {
621
+ "loss": 0.0245,
622
+ "grad_norm": 0.04142986983060837,
623
+ "learning_rate": 7.852233676975945e-05,
624
+ "epoch": 107.64,
625
+ "step": 430
626
+ },
627
+ {
628
+ "loss": 0.0254,
629
+ "grad_norm": 0.041959185153245926,
630
+ "learning_rate": 7.823596792668958e-05,
631
+ "epoch": 108.96,
632
+ "step": 435
633
+ },
634
+ {
635
+ "loss": 0.0292,
636
+ "grad_norm": 0.27740439772605896,
637
+ "learning_rate": 7.79495990836197e-05,
638
+ "epoch": 110.0,
639
+ "step": 440
640
+ },
641
+ {
642
+ "loss": 0.026,
643
+ "grad_norm": 0.03657572343945503,
644
+ "learning_rate": 7.766323024054983e-05,
645
+ "epoch": 111.32,
646
+ "step": 445
647
+ },
648
+ {
649
+ "loss": 0.0251,
650
+ "grad_norm": 0.042320434004068375,
651
+ "learning_rate": 7.737686139747996e-05,
652
+ "epoch": 112.64,
653
+ "step": 450
654
+ },
655
+ {
656
+ "loss": 0.026,
657
+ "grad_norm": 0.0473681204020977,
658
+ "learning_rate": 7.709049255441008e-05,
659
+ "epoch": 113.96,
660
+ "step": 455
661
+ },
662
+ {
663
+ "loss": 0.0241,
664
+ "grad_norm": 0.1326676607131958,
665
+ "learning_rate": 7.680412371134021e-05,
666
+ "epoch": 115.0,
667
+ "step": 460
668
+ },
669
+ {
670
+ "loss": 0.0236,
671
+ "grad_norm": 0.04483647271990776,
672
+ "learning_rate": 7.651775486827034e-05,
673
+ "epoch": 116.32,
674
+ "step": 465
675
+ },
676
+ {
677
+ "loss": 0.0235,
678
+ "grad_norm": 0.038961004465818405,
679
+ "learning_rate": 7.623138602520046e-05,
680
+ "epoch": 117.64,
681
+ "step": 470
682
+ },
683
+ {
684
+ "loss": 0.0252,
685
+ "grad_norm": 0.042134907096624374,
686
+ "learning_rate": 7.594501718213059e-05,
687
+ "epoch": 118.96,
688
+ "step": 475
689
+ },
690
+ {
691
+ "loss": 0.024,
692
+ "grad_norm": 0.13292020559310913,
693
+ "learning_rate": 7.565864833906071e-05,
694
+ "epoch": 120.0,
695
+ "step": 480
696
+ },
697
+ {
698
+ "loss": 0.025,
699
+ "grad_norm": 0.03745294362306595,
700
+ "learning_rate": 7.537227949599084e-05,
701
+ "epoch": 121.32,
702
+ "step": 485
703
+ },
704
+ {
705
+ "loss": 0.0253,
706
+ "grad_norm": 0.035545315593481064,
707
+ "learning_rate": 7.508591065292097e-05,
708
+ "epoch": 122.64,
709
+ "step": 490
710
+ },
711
+ {
712
+ "loss": 0.026,
713
+ "grad_norm": 0.03991984575986862,
714
+ "learning_rate": 7.47995418098511e-05,
715
+ "epoch": 123.96,
716
+ "step": 495
717
+ },
718
+ {
719
+ "loss": 0.0246,
720
+ "grad_norm": 0.1339961290359497,
721
+ "learning_rate": 7.451317296678122e-05,
722
+ "epoch": 125.0,
723
+ "step": 500
724
+ },
725
+ {
726
+ "loss": 0.0235,
727
+ "grad_norm": 0.04381132498383522,
728
+ "learning_rate": 7.422680412371135e-05,
729
+ "epoch": 126.32,
730
+ "step": 505
731
+ },
732
+ {
733
+ "loss": 0.0242,
734
+ "grad_norm": 0.048515841364860535,
735
+ "learning_rate": 7.394043528064147e-05,
736
+ "epoch": 127.64,
737
+ "step": 510
738
+ },
739
+ {
740
+ "loss": 0.0249,
741
+ "grad_norm": 0.04145604744553566,
742
+ "learning_rate": 7.36540664375716e-05,
743
+ "epoch": 128.96,
744
+ "step": 515
745
+ },
746
+ {
747
+ "loss": 0.0247,
748
+ "grad_norm": 0.14400818943977356,
749
+ "learning_rate": 7.336769759450171e-05,
750
+ "epoch": 130.0,
751
+ "step": 520
752
+ },
753
+ {
754
+ "loss": 0.0241,
755
+ "grad_norm": 0.04025031998753548,
756
+ "learning_rate": 7.308132875143184e-05,
757
+ "epoch": 131.32,
758
+ "step": 525
759
+ },
760
+ {
761
+ "loss": 0.0242,
762
+ "grad_norm": 0.037277135998010635,
763
+ "learning_rate": 7.279495990836197e-05,
764
+ "epoch": 132.64,
765
+ "step": 530
766
+ },
767
+ {
768
+ "loss": 0.0251,
769
+ "grad_norm": 0.03666083887219429,
770
+ "learning_rate": 7.250859106529209e-05,
771
+ "epoch": 133.96,
772
+ "step": 535
773
+ },
774
+ {
775
+ "loss": 0.0241,
776
+ "grad_norm": 0.09921745210886002,
777
+ "learning_rate": 7.222222222222222e-05,
778
+ "epoch": 135.0,
779
+ "step": 540
780
+ },
781
+ {
782
+ "loss": 0.0247,
783
+ "grad_norm": 0.0382193848490715,
784
+ "learning_rate": 7.193585337915235e-05,
785
+ "epoch": 136.32,
786
+ "step": 545
787
+ },
788
+ {
789
+ "loss": 0.0239,
790
+ "grad_norm": 0.0314810685813427,
791
+ "learning_rate": 7.164948453608247e-05,
792
+ "epoch": 137.64,
793
+ "step": 550
794
+ },
795
+ {
796
+ "loss": 0.0243,
797
+ "grad_norm": 0.04278745502233505,
798
+ "learning_rate": 7.136311569301261e-05,
799
+ "epoch": 138.96,
800
+ "step": 555
801
+ },
802
+ {
803
+ "loss": 0.0234,
804
+ "grad_norm": 0.09295342862606049,
805
+ "learning_rate": 7.107674684994274e-05,
806
+ "epoch": 140.0,
807
+ "step": 560
808
+ },
809
+ {
810
+ "loss": 0.0248,
811
+ "grad_norm": 0.03429599106311798,
812
+ "learning_rate": 7.079037800687286e-05,
813
+ "epoch": 141.32,
814
+ "step": 565
815
+ },
816
+ {
817
+ "loss": 0.0234,
818
+ "grad_norm": 0.03622185438871384,
819
+ "learning_rate": 7.050400916380299e-05,
820
+ "epoch": 142.64,
821
+ "step": 570
822
+ },
823
+ {
824
+ "loss": 0.0242,
825
+ "grad_norm": 0.042615506798028946,
826
+ "learning_rate": 7.02176403207331e-05,
827
+ "epoch": 143.96,
828
+ "step": 575
829
+ },
830
+ {
831
+ "loss": 0.0268,
832
+ "grad_norm": 0.13792142271995544,
833
+ "learning_rate": 6.993127147766323e-05,
834
+ "epoch": 145.0,
835
+ "step": 580
836
+ },
837
+ {
838
+ "loss": 0.0231,
839
+ "grad_norm": 0.035664405673742294,
840
+ "learning_rate": 6.964490263459336e-05,
841
+ "epoch": 146.32,
842
+ "step": 585
843
+ },
844
+ {
845
+ "loss": 0.0258,
846
+ "grad_norm": 0.033511932939291,
847
+ "learning_rate": 6.935853379152348e-05,
848
+ "epoch": 147.64,
849
+ "step": 590
850
+ },
851
+ {
852
+ "loss": 0.0248,
853
+ "grad_norm": 0.036591917276382446,
854
+ "learning_rate": 6.907216494845361e-05,
855
+ "epoch": 148.96,
856
+ "step": 595
857
+ },
858
+ {
859
+ "loss": 0.0257,
860
+ "grad_norm": 0.11892726272344589,
861
+ "learning_rate": 6.878579610538374e-05,
862
+ "epoch": 150.0,
863
+ "step": 600
864
+ },
865
+ {
866
+ "loss": 0.0246,
867
+ "grad_norm": 0.03532181680202484,
868
+ "learning_rate": 6.849942726231386e-05,
869
+ "epoch": 151.32,
870
+ "step": 605
871
+ },
872
+ {
873
+ "loss": 0.0244,
874
+ "grad_norm": 0.039349090307950974,
875
+ "learning_rate": 6.821305841924399e-05,
876
+ "epoch": 152.64,
877
+ "step": 610
878
+ },
879
+ {
880
+ "loss": 0.0247,
881
+ "grad_norm": 0.03686106950044632,
882
+ "learning_rate": 6.792668957617412e-05,
883
+ "epoch": 153.96,
884
+ "step": 615
885
+ },
886
+ {
887
+ "loss": 0.0231,
888
+ "grad_norm": 0.08257201313972473,
889
+ "learning_rate": 6.764032073310424e-05,
890
+ "epoch": 155.0,
891
+ "step": 620
892
+ },
893
+ {
894
+ "loss": 0.0243,
895
+ "grad_norm": 0.035335343331098557,
896
+ "learning_rate": 6.735395189003437e-05,
897
+ "epoch": 156.32,
898
+ "step": 625
899
+ },
900
+ {
901
+ "loss": 0.0239,
902
+ "grad_norm": 0.030693387612700462,
903
+ "learning_rate": 6.706758304696448e-05,
904
+ "epoch": 157.64,
905
+ "step": 630
906
+ },
907
+ {
908
+ "loss": 0.0236,
909
+ "grad_norm": 0.031573694199323654,
910
+ "learning_rate": 6.678121420389462e-05,
911
+ "epoch": 158.96,
912
+ "step": 635
913
+ },
914
+ {
915
+ "loss": 0.0247,
916
+ "grad_norm": 0.11772840470075607,
917
+ "learning_rate": 6.649484536082475e-05,
918
+ "epoch": 160.0,
919
+ "step": 640
920
+ },
921
+ {
922
+ "loss": 0.0231,
923
+ "grad_norm": 0.03553156182169914,
924
+ "learning_rate": 6.620847651775487e-05,
925
+ "epoch": 161.32,
926
+ "step": 645
927
+ },
928
+ {
929
+ "loss": 0.0247,
930
+ "grad_norm": 0.04065680876374245,
931
+ "learning_rate": 6.5922107674685e-05,
932
+ "epoch": 162.64,
933
+ "step": 650
934
+ },
935
+ {
936
+ "loss": 0.0244,
937
+ "grad_norm": 0.03680557757616043,
938
+ "learning_rate": 6.563573883161513e-05,
939
+ "epoch": 163.96,
940
+ "step": 655
941
+ },
942
+ {
943
+ "loss": 0.0254,
944
+ "grad_norm": 0.1432940512895584,
945
+ "learning_rate": 6.534936998854525e-05,
946
+ "epoch": 165.0,
947
+ "step": 660
948
+ },
949
+ {
950
+ "loss": 0.024,
951
+ "grad_norm": 0.0374530591070652,
952
+ "learning_rate": 6.506300114547538e-05,
953
+ "epoch": 166.32,
954
+ "step": 665
955
+ },
956
+ {
957
+ "loss": 0.0242,
958
+ "grad_norm": 0.039093125611543655,
959
+ "learning_rate": 6.477663230240551e-05,
960
+ "epoch": 167.64,
961
+ "step": 670
962
+ },
963
+ {
964
+ "loss": 0.0238,
965
+ "grad_norm": 0.03439056873321533,
966
+ "learning_rate": 6.449026345933563e-05,
967
+ "epoch": 168.96,
968
+ "step": 675
969
+ },
970
+ {
971
+ "loss": 0.0224,
972
+ "grad_norm": 0.07211510837078094,
973
+ "learning_rate": 6.420389461626576e-05,
974
+ "epoch": 170.0,
975
+ "step": 680
976
+ },
977
+ {
978
+ "loss": 0.0246,
979
+ "grad_norm": 0.03178408369421959,
980
+ "learning_rate": 6.391752577319587e-05,
981
+ "epoch": 171.32,
982
+ "step": 685
983
+ },
984
+ {
985
+ "loss": 0.0255,
986
+ "grad_norm": 0.02913156896829605,
987
+ "learning_rate": 6.3631156930126e-05,
988
+ "epoch": 172.64,
989
+ "step": 690
990
+ },
991
+ {
992
+ "loss": 0.0257,
993
+ "grad_norm": 0.03487716615200043,
994
+ "learning_rate": 6.334478808705613e-05,
995
+ "epoch": 173.96,
996
+ "step": 695
997
+ },
998
+ {
999
+ "loss": 0.0253,
1000
+ "grad_norm": 0.12451174110174179,
1001
+ "learning_rate": 6.305841924398625e-05,
1002
+ "epoch": 175.0,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "loss": 0.0241,
1007
+ "grad_norm": 0.0366508811712265,
1008
+ "learning_rate": 6.277205040091638e-05,
1009
+ "epoch": 176.32,
1010
+ "step": 705
1011
+ },
1012
+ {
1013
+ "loss": 0.0242,
1014
+ "grad_norm": 0.03491870313882828,
1015
+ "learning_rate": 6.24856815578465e-05,
1016
+ "epoch": 177.64,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "loss": 0.0257,
1021
+ "grad_norm": 0.03027982823550701,
1022
+ "learning_rate": 6.219931271477663e-05,
1023
+ "epoch": 178.96,
1024
+ "step": 715
1025
+ },
1026
+ {
1027
+ "loss": 0.0236,
1028
+ "grad_norm": 0.08150530606508255,
1029
+ "learning_rate": 6.191294387170676e-05,
1030
+ "epoch": 180.0,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "loss": 0.0232,
1035
+ "grad_norm": 0.03483245149254799,
1036
+ "learning_rate": 6.162657502863689e-05,
1037
+ "epoch": 181.32,
1038
+ "step": 725
1039
+ },
1040
+ {
1041
+ "loss": 0.0241,
1042
+ "grad_norm": 0.034706421196460724,
1043
+ "learning_rate": 6.134020618556701e-05,
1044
+ "epoch": 182.64,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "loss": 0.0233,
1049
+ "grad_norm": 0.03622004762291908,
1050
+ "learning_rate": 6.105383734249714e-05,
1051
+ "epoch": 183.96,
1052
+ "step": 735
1053
+ },
1054
+ {
1055
+ "loss": 0.0249,
1056
+ "grad_norm": 0.10144224017858505,
1057
+ "learning_rate": 6.076746849942726e-05,
1058
+ "epoch": 185.0,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "loss": 0.0238,
1063
+ "grad_norm": 0.03530497848987579,
1064
+ "learning_rate": 6.0481099656357384e-05,
1065
+ "epoch": 186.32,
1066
+ "step": 745
1067
+ },
1068
+ {
1069
+ "loss": 0.0245,
1070
+ "grad_norm": 0.034086182713508606,
1071
+ "learning_rate": 6.019473081328752e-05,
1072
+ "epoch": 187.64,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "loss": 0.0243,
1077
+ "grad_norm": 0.039041388779878616,
1078
+ "learning_rate": 5.9908361970217644e-05,
1079
+ "epoch": 188.96,
1080
+ "step": 755
1081
+ },
1082
+ {
1083
+ "loss": 0.0245,
1084
+ "grad_norm": 0.1247899979352951,
1085
+ "learning_rate": 5.962199312714777e-05,
1086
+ "epoch": 190.0,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "loss": 0.0238,
1091
+ "grad_norm": 0.035458508878946304,
1092
+ "learning_rate": 5.93356242840779e-05,
1093
+ "epoch": 191.32,
1094
+ "step": 765
1095
+ },
1096
+ {
1097
+ "loss": 0.0244,
1098
+ "grad_norm": 0.03673034906387329,
1099
+ "learning_rate": 5.904925544100802e-05,
1100
+ "epoch": 192.64,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "loss": 0.0239,
1105
+ "grad_norm": 0.03364979103207588,
1106
+ "learning_rate": 5.876288659793815e-05,
1107
+ "epoch": 193.96,
1108
+ "step": 775
1109
+ },
1110
+ {
1111
+ "loss": 0.0235,
1112
+ "grad_norm": 0.09387586265802383,
1113
+ "learning_rate": 5.8476517754868276e-05,
1114
+ "epoch": 195.0,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "loss": 0.0248,
1119
+ "grad_norm": 0.03462570905685425,
1120
+ "learning_rate": 5.81901489117984e-05,
1121
+ "epoch": 196.32,
1122
+ "step": 785
1123
+ },
1124
+ {
1125
+ "loss": 0.0246,
1126
+ "grad_norm": 0.03342005982995033,
1127
+ "learning_rate": 5.790378006872853e-05,
1128
+ "epoch": 197.64,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "loss": 0.0246,
1133
+ "grad_norm": 0.041909925639629364,
1134
+ "learning_rate": 5.761741122565865e-05,
1135
+ "epoch": 198.96,
1136
+ "step": 795
1137
+ },
1138
+ {
1139
+ "loss": 0.0258,
1140
+ "grad_norm": 0.15439164638519287,
1141
+ "learning_rate": 5.7331042382588775e-05,
1142
+ "epoch": 200.0,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "loss": 0.0236,
1147
+ "grad_norm": 0.02883634716272354,
1148
+ "learning_rate": 5.70446735395189e-05,
1149
+ "epoch": 201.32,
1150
+ "step": 805
1151
+ },
1152
+ {
1153
+ "loss": 0.0235,
1154
+ "grad_norm": 0.029865020886063576,
1155
+ "learning_rate": 5.675830469644903e-05,
1156
+ "epoch": 202.64,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "loss": 0.024,
1161
+ "grad_norm": 0.030608315020799637,
1162
+ "learning_rate": 5.6471935853379155e-05,
1163
+ "epoch": 203.96,
1164
+ "step": 815
1165
+ },
1166
+ {
1167
+ "loss": 0.0224,
1168
+ "grad_norm": 0.07783036679029465,
1169
+ "learning_rate": 5.618556701030928e-05,
1170
+ "epoch": 205.0,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "loss": 0.0233,
1175
+ "grad_norm": 0.035508111119270325,
1176
+ "learning_rate": 5.589919816723941e-05,
1177
+ "epoch": 206.32,
1178
+ "step": 825
1179
+ },
1180
+ {
1181
+ "loss": 0.0242,
1182
+ "grad_norm": 0.03703364357352257,
1183
+ "learning_rate": 5.5612829324169534e-05,
1184
+ "epoch": 207.64,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "loss": 0.0239,
1189
+ "grad_norm": 0.030922846868634224,
1190
+ "learning_rate": 5.532646048109966e-05,
1191
+ "epoch": 208.96,
1192
+ "step": 835
1193
+ },
1194
+ {
1195
+ "loss": 0.0236,
1196
+ "grad_norm": 0.11316124349832535,
1197
+ "learning_rate": 5.504009163802979e-05,
1198
+ "epoch": 210.0,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "loss": 0.0237,
1203
+ "grad_norm": 0.032941922545433044,
1204
+ "learning_rate": 5.4753722794959914e-05,
1205
+ "epoch": 211.32,
1206
+ "step": 845
1207
+ },
1208
+ {
1209
+ "loss": 0.0235,
1210
+ "grad_norm": 0.028119860216975212,
1211
+ "learning_rate": 5.4467353951890033e-05,
1212
+ "epoch": 212.64,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "loss": 0.023,
1217
+ "grad_norm": 0.03130020201206207,
1218
+ "learning_rate": 5.418098510882016e-05,
1219
+ "epoch": 213.96,
1220
+ "step": 855
1221
+ },
1222
+ {
1223
+ "loss": 0.0226,
1224
+ "grad_norm": 0.06978127360343933,
1225
+ "learning_rate": 5.3894616265750286e-05,
1226
+ "epoch": 215.0,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "loss": 0.0231,
1231
+ "grad_norm": 0.030422938987612724,
1232
+ "learning_rate": 5.360824742268041e-05,
1233
+ "epoch": 216.32,
1234
+ "step": 865
1235
+ },
1236
+ {
1237
+ "loss": 0.0238,
1238
+ "grad_norm": 0.028223881497979164,
1239
+ "learning_rate": 5.332187857961054e-05,
1240
+ "epoch": 217.64,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "loss": 0.0243,
1245
+ "grad_norm": 0.029208194464445114,
1246
+ "learning_rate": 5.3035509736540666e-05,
1247
+ "epoch": 218.96,
1248
+ "step": 875
1249
+ },
1250
+ {
1251
+ "loss": 0.0271,
1252
+ "grad_norm": 0.16511231660842896,
1253
+ "learning_rate": 5.274914089347079e-05,
1254
+ "epoch": 220.0,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "loss": 0.0243,
1259
+ "grad_norm": 0.03705955296754837,
1260
+ "learning_rate": 5.246277205040092e-05,
1261
+ "epoch": 221.32,
1262
+ "step": 885
1263
+ },
1264
+ {
1265
+ "loss": 0.0241,
1266
+ "grad_norm": 0.030203381553292274,
1267
+ "learning_rate": 5.2176403207331045e-05,
1268
+ "epoch": 222.64,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "loss": 0.0234,
1273
+ "grad_norm": 0.027039049193263054,
1274
+ "learning_rate": 5.189003436426118e-05,
1275
+ "epoch": 223.96,
1276
+ "step": 895
1277
+ },
1278
+ {
1279
+ "loss": 0.0254,
1280
+ "grad_norm": 0.11282758414745331,
1281
+ "learning_rate": 5.1603665521191305e-05,
1282
+ "epoch": 225.0,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "loss": 0.0236,
1287
+ "grad_norm": 0.03700408712029457,
1288
+ "learning_rate": 5.131729667812142e-05,
1289
+ "epoch": 226.32,
1290
+ "step": 905
1291
+ },
1292
+ {
1293
+ "loss": 0.024,
1294
+ "grad_norm": 0.030705822631716728,
1295
+ "learning_rate": 5.1030927835051544e-05,
1296
+ "epoch": 227.64,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "loss": 0.0238,
1301
+ "grad_norm": 0.03678268566727638,
1302
+ "learning_rate": 5.074455899198167e-05,
1303
+ "epoch": 228.96,
1304
+ "step": 915
1305
+ },
1306
+ {
1307
+ "loss": 0.0269,
1308
+ "grad_norm": 0.12632058560848236,
1309
+ "learning_rate": 5.04581901489118e-05,
1310
+ "epoch": 230.0,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "loss": 0.0244,
1315
+ "grad_norm": 0.030165374279022217,
1316
+ "learning_rate": 5.0171821305841924e-05,
1317
+ "epoch": 231.32,
1318
+ "step": 925
1319
+ },
1320
+ {
1321
+ "loss": 0.0239,
1322
+ "grad_norm": 0.029971277341246605,
1323
+ "learning_rate": 4.988545246277205e-05,
1324
+ "epoch": 232.64,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "loss": 0.0237,
1329
+ "grad_norm": 0.033762127161026,
1330
+ "learning_rate": 4.9599083619702184e-05,
1331
+ "epoch": 233.96,
1332
+ "step": 935
1333
+ },
1334
+ {
1335
+ "loss": 0.0236,
1336
+ "grad_norm": 0.09928340464830399,
1337
+ "learning_rate": 4.931271477663231e-05,
1338
+ "epoch": 235.0,
1339
+ "step": 940
1340
+ },
1341
+ {
1342
+ "loss": 0.0238,
1343
+ "grad_norm": 0.030009057372808456,
1344
+ "learning_rate": 4.902634593356243e-05,
1345
+ "epoch": 236.32,
1346
+ "step": 945
1347
+ },
1348
+ {
1349
+ "loss": 0.0239,
1350
+ "grad_norm": 0.03369998559355736,
1351
+ "learning_rate": 4.8739977090492556e-05,
1352
+ "epoch": 237.64,
1353
+ "step": 950
1354
+ },
1355
+ {
1356
+ "loss": 0.0251,
1357
+ "grad_norm": 0.03107636794447899,
1358
+ "learning_rate": 4.845360824742268e-05,
1359
+ "epoch": 238.96,
1360
+ "step": 955
1361
+ },
1362
+ {
1363
+ "loss": 0.0227,
1364
+ "grad_norm": 0.10390744358301163,
1365
+ "learning_rate": 4.816723940435281e-05,
1366
+ "epoch": 240.0,
1367
+ "step": 960
1368
+ },
1369
+ {
1370
+ "loss": 0.0242,
1371
+ "grad_norm": 0.03572176396846771,
1372
+ "learning_rate": 4.7880870561282936e-05,
1373
+ "epoch": 241.32,
1374
+ "step": 965
1375
+ },
1376
+ {
1377
+ "loss": 0.0232,
1378
+ "grad_norm": 0.03051804192364216,
1379
+ "learning_rate": 4.7594501718213055e-05,
1380
+ "epoch": 242.64,
1381
+ "step": 970
1382
+ },
1383
+ {
1384
+ "loss": 0.0241,
1385
+ "grad_norm": 0.031635165214538574,
1386
+ "learning_rate": 4.730813287514318e-05,
1387
+ "epoch": 243.96,
1388
+ "step": 975
1389
+ },
1390
+ {
1391
+ "loss": 0.0231,
1392
+ "grad_norm": 0.0863058865070343,
1393
+ "learning_rate": 4.7021764032073315e-05,
1394
+ "epoch": 245.0,
1395
+ "step": 980
1396
+ },
1397
+ {
1398
+ "loss": 0.0237,
1399
+ "grad_norm": 0.03220526874065399,
1400
+ "learning_rate": 4.673539518900344e-05,
1401
+ "epoch": 246.32,
1402
+ "step": 985
1403
+ },
1404
+ {
1405
+ "loss": 0.0229,
1406
+ "grad_norm": 0.030770031735301018,
1407
+ "learning_rate": 4.644902634593357e-05,
1408
+ "epoch": 247.64,
1409
+ "step": 990
1410
+ },
1411
+ {
1412
+ "loss": 0.0233,
1413
+ "grad_norm": 0.036592498421669006,
1414
+ "learning_rate": 4.6162657502863694e-05,
1415
+ "epoch": 248.96,
1416
+ "step": 995
1417
+ },
1418
+ {
1419
+ "loss": 0.0233,
1420
+ "grad_norm": 0.09140961617231369,
1421
+ "learning_rate": 4.5876288659793814e-05,
1422
+ "epoch": 250.0,
1423
+ "step": 1000
1424
+ },
1425
+ {
1426
+ "loss": 0.0234,
1427
+ "grad_norm": 0.03191279247403145,
1428
+ "learning_rate": 4.558991981672394e-05,
1429
+ "epoch": 251.32,
1430
+ "step": 1005
1431
+ },
1432
+ {
1433
+ "loss": 0.024,
1434
+ "grad_norm": 0.02950333058834076,
1435
+ "learning_rate": 4.530355097365407e-05,
1436
+ "epoch": 252.64,
1437
+ "step": 1010
1438
+ },
1439
+ {
1440
+ "loss": 0.0233,
1441
+ "grad_norm": 0.031532324850559235,
1442
+ "learning_rate": 4.5017182130584194e-05,
1443
+ "epoch": 253.96,
1444
+ "step": 1015
1445
+ },
1446
+ {
1447
+ "loss": 0.0228,
1448
+ "grad_norm": 0.10817220062017441,
1449
+ "learning_rate": 4.473081328751432e-05,
1450
+ "epoch": 255.0,
1451
+ "step": 1020
1452
+ },
1453
+ {
1454
+ "loss": 0.0249,
1455
+ "grad_norm": 0.03229045867919922,
1456
+ "learning_rate": 4.4444444444444447e-05,
1457
+ "epoch": 256.32,
1458
+ "step": 1025
1459
+ },
1460
+ {
1461
+ "loss": 0.0236,
1462
+ "grad_norm": 0.027881359681487083,
1463
+ "learning_rate": 4.415807560137457e-05,
1464
+ "epoch": 257.64,
1465
+ "step": 1030
1466
+ },
1467
+ {
1468
+ "loss": 0.0248,
1469
+ "grad_norm": 0.027970343828201294,
1470
+ "learning_rate": 4.38717067583047e-05,
1471
+ "epoch": 258.96,
1472
+ "step": 1035
1473
+ },
1474
+ {
1475
+ "loss": 0.0236,
1476
+ "grad_norm": 0.0961368978023529,
1477
+ "learning_rate": 4.3585337915234826e-05,
1478
+ "epoch": 260.0,
1479
+ "step": 1040
1480
+ },
1481
+ {
1482
+ "loss": 0.0231,
1483
+ "grad_norm": 0.03192312270402908,
1484
+ "learning_rate": 4.329896907216495e-05,
1485
+ "epoch": 261.32,
1486
+ "step": 1045
1487
+ },
1488
+ {
1489
+ "loss": 0.0244,
1490
+ "grad_norm": 0.03287699446082115,
1491
+ "learning_rate": 4.301260022909508e-05,
1492
+ "epoch": 262.64,
1493
+ "step": 1050
1494
+ },
1495
+ {
1496
+ "loss": 0.0231,
1497
+ "grad_norm": 0.03482283651828766,
1498
+ "learning_rate": 4.27262313860252e-05,
1499
+ "epoch": 263.96,
1500
+ "step": 1055
1501
+ },
1502
+ {
1503
+ "loss": 0.0246,
1504
+ "grad_norm": 0.12014977633953094,
1505
+ "learning_rate": 4.2439862542955325e-05,
1506
+ "epoch": 265.0,
1507
+ "step": 1060
1508
+ },
1509
+ {
1510
+ "loss": 0.0235,
1511
+ "grad_norm": 0.030348435044288635,
1512
+ "learning_rate": 4.215349369988545e-05,
1513
+ "epoch": 266.32,
1514
+ "step": 1065
1515
+ },
1516
+ {
1517
+ "loss": 0.0238,
1518
+ "grad_norm": 0.027197284623980522,
1519
+ "learning_rate": 4.1867124856815585e-05,
1520
+ "epoch": 267.64,
1521
+ "step": 1070
1522
+ },
1523
+ {
1524
+ "loss": 0.024,
1525
+ "grad_norm": 0.03164960816502571,
1526
+ "learning_rate": 4.158075601374571e-05,
1527
+ "epoch": 268.96,
1528
+ "step": 1075
1529
+ },
1530
+ {
1531
+ "loss": 0.0237,
1532
+ "grad_norm": 0.09021521359682083,
1533
+ "learning_rate": 4.129438717067583e-05,
1534
+ "epoch": 270.0,
1535
+ "step": 1080
1536
+ },
1537
+ {
1538
+ "loss": 0.024,
1539
+ "grad_norm": 0.03432054817676544,
1540
+ "learning_rate": 4.100801832760596e-05,
1541
+ "epoch": 271.32,
1542
+ "step": 1085
1543
+ },
1544
+ {
1545
+ "loss": 0.0224,
1546
+ "grad_norm": 0.029961712658405304,
1547
+ "learning_rate": 4.0721649484536084e-05,
1548
+ "epoch": 272.64,
1549
+ "step": 1090
1550
+ },
1551
+ {
1552
+ "loss": 0.0245,
1553
+ "grad_norm": 0.02801748737692833,
1554
+ "learning_rate": 4.043528064146621e-05,
1555
+ "epoch": 273.96,
1556
+ "step": 1095
1557
+ },
1558
+ {
1559
+ "loss": 0.0229,
1560
+ "grad_norm": 0.09304305166006088,
1561
+ "learning_rate": 4.014891179839634e-05,
1562
+ "epoch": 275.0,
1563
+ "step": 1100
1564
+ },
1565
+ {
1566
+ "loss": 0.0242,
1567
+ "grad_norm": 0.03154018521308899,
1568
+ "learning_rate": 3.9862542955326463e-05,
1569
+ "epoch": 276.32,
1570
+ "step": 1105
1571
+ },
1572
+ {
1573
+ "loss": 0.024,
1574
+ "grad_norm": 0.029925866052508354,
1575
+ "learning_rate": 3.957617411225659e-05,
1576
+ "epoch": 277.64,
1577
+ "step": 1110
1578
+ },
1579
+ {
1580
+ "loss": 0.0232,
1581
+ "grad_norm": 0.032234761863946915,
1582
+ "learning_rate": 3.9289805269186716e-05,
1583
+ "epoch": 278.96,
1584
+ "step": 1115
1585
+ },
1586
+ {
1587
+ "loss": 0.0238,
1588
+ "grad_norm": 0.09113281220197678,
1589
+ "learning_rate": 3.900343642611684e-05,
1590
+ "epoch": 280.0,
1591
+ "step": 1120
1592
+ },
1593
+ {
1594
+ "loss": 0.0242,
1595
+ "grad_norm": 0.03371744975447655,
1596
+ "learning_rate": 3.871706758304697e-05,
1597
+ "epoch": 281.32,
1598
+ "step": 1125
1599
+ },
1600
+ {
1601
+ "loss": 0.0234,
1602
+ "grad_norm": 0.033525336533784866,
1603
+ "learning_rate": 3.8430698739977096e-05,
1604
+ "epoch": 282.64,
1605
+ "step": 1130
1606
+ },
1607
+ {
1608
+ "loss": 0.0237,
1609
+ "grad_norm": 0.030558524653315544,
1610
+ "learning_rate": 3.8144329896907216e-05,
1611
+ "epoch": 283.96,
1612
+ "step": 1135
1613
+ },
1614
+ {
1615
+ "loss": 0.022,
1616
+ "grad_norm": 0.07060851901769638,
1617
+ "learning_rate": 3.785796105383734e-05,
1618
+ "epoch": 285.0,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "loss": 0.0238,
1623
+ "grad_norm": 0.02952047996222973,
1624
+ "learning_rate": 3.757159221076747e-05,
1625
+ "epoch": 286.32,
1626
+ "step": 1145
1627
+ },
1628
+ {
1629
+ "loss": 0.0227,
1630
+ "grad_norm": 0.030197326093912125,
1631
+ "learning_rate": 3.7285223367697595e-05,
1632
+ "epoch": 287.64,
1633
+ "step": 1150
1634
+ },
1635
+ {
1636
+ "loss": 0.0232,
1637
+ "grad_norm": 0.028898609802126884,
1638
+ "learning_rate": 3.699885452462772e-05,
1639
+ "epoch": 288.96,
1640
+ "step": 1155
1641
+ },
1642
+ {
1643
+ "loss": 0.0236,
1644
+ "grad_norm": 0.10391610860824585,
1645
+ "learning_rate": 3.671248568155785e-05,
1646
+ "epoch": 290.0,
1647
+ "step": 1160
1648
+ },
1649
+ {
1650
+ "loss": 0.0238,
1651
+ "grad_norm": 0.0285499207675457,
1652
+ "learning_rate": 3.6426116838487974e-05,
1653
+ "epoch": 291.32,
1654
+ "step": 1165
1655
+ },
1656
+ {
1657
+ "loss": 0.0229,
1658
+ "grad_norm": 0.028268715366721153,
1659
+ "learning_rate": 3.61397479954181e-05,
1660
+ "epoch": 292.64,
1661
+ "step": 1170
1662
+ },
1663
+ {
1664
+ "loss": 0.0247,
1665
+ "grad_norm": 0.02961159311234951,
1666
+ "learning_rate": 3.585337915234823e-05,
1667
+ "epoch": 293.96,
1668
+ "step": 1175
1669
+ },
1670
+ {
1671
+ "loss": 0.0226,
1672
+ "grad_norm": 0.08803751319646835,
1673
+ "learning_rate": 3.5567010309278354e-05,
1674
+ "epoch": 295.0,
1675
+ "step": 1180
1676
+ },
1677
+ {
1678
+ "loss": 0.0244,
1679
+ "grad_norm": 0.03452374413609505,
1680
+ "learning_rate": 3.528064146620848e-05,
1681
+ "epoch": 296.32,
1682
+ "step": 1185
1683
+ },
1684
+ {
1685
+ "loss": 0.023,
1686
+ "grad_norm": 0.028895270079374313,
1687
+ "learning_rate": 3.49942726231386e-05,
1688
+ "epoch": 297.64,
1689
+ "step": 1190
1690
+ },
1691
+ {
1692
+ "loss": 0.0234,
1693
+ "grad_norm": 0.029182473197579384,
1694
+ "learning_rate": 3.4707903780068726e-05,
1695
+ "epoch": 298.96,
1696
+ "step": 1195
1697
+ },
1698
+ {
1699
+ "loss": 0.0235,
1700
+ "grad_norm": 0.11874058097600937,
1701
+ "learning_rate": 3.442153493699885e-05,
1702
+ "epoch": 300.0,
1703
+ "step": 1200
1704
+ },
1705
+ {
1706
+ "loss": 0.0237,
1707
+ "grad_norm": 0.030481066554784775,
1708
+ "learning_rate": 3.4135166093928986e-05,
1709
+ "epoch": 301.32,
1710
+ "step": 1205
1711
+ },
1712
+ {
1713
+ "loss": 0.023,
1714
+ "grad_norm": 0.03108309395611286,
1715
+ "learning_rate": 3.384879725085911e-05,
1716
+ "epoch": 302.64,
1717
+ "step": 1210
1718
+ },
1719
+ {
1720
+ "loss": 0.0228,
1721
+ "grad_norm": 0.03036290407180786,
1722
+ "learning_rate": 3.356242840778923e-05,
1723
+ "epoch": 303.96,
1724
+ "step": 1215
1725
+ },
1726
+ {
1727
+ "loss": 0.0223,
1728
+ "grad_norm": 0.07720436155796051,
1729
+ "learning_rate": 3.327605956471936e-05,
1730
+ "epoch": 305.0,
1731
+ "step": 1220
1732
+ },
1733
+ {
1734
+ "loss": 0.0235,
1735
+ "grad_norm": 0.03028162382543087,
1736
+ "learning_rate": 3.2989690721649485e-05,
1737
+ "epoch": 306.32,
1738
+ "step": 1225
1739
+ },
1740
+ {
1741
+ "loss": 0.0226,
1742
+ "grad_norm": 0.033151157200336456,
1743
+ "learning_rate": 3.270332187857961e-05,
1744
+ "epoch": 307.64,
1745
+ "step": 1230
1746
+ },
1747
+ {
1748
+ "loss": 0.0235,
1749
+ "grad_norm": 0.02951214276254177,
1750
+ "learning_rate": 3.241695303550974e-05,
1751
+ "epoch": 308.96,
1752
+ "step": 1235
1753
+ },
1754
+ {
1755
+ "loss": 0.0257,
1756
+ "grad_norm": 0.09070917963981628,
1757
+ "learning_rate": 3.2130584192439865e-05,
1758
+ "epoch": 310.0,
1759
+ "step": 1240
1760
+ },
1761
+ {
1762
+ "loss": 0.0248,
1763
+ "grad_norm": 0.03337477520108223,
1764
+ "learning_rate": 3.184421534936999e-05,
1765
+ "epoch": 311.32,
1766
+ "step": 1245
1767
+ },
1768
+ {
1769
+ "loss": 0.0226,
1770
+ "grad_norm": 0.03151268512010574,
1771
+ "learning_rate": 3.155784650630012e-05,
1772
+ "epoch": 312.64,
1773
+ "step": 1250
1774
+ },
1775
+ {
1776
+ "loss": 0.024,
1777
+ "grad_norm": 0.030940482392907143,
1778
+ "learning_rate": 3.1271477663230244e-05,
1779
+ "epoch": 313.96,
1780
+ "step": 1255
1781
+ },
1782
+ {
1783
+ "loss": 0.0236,
1784
+ "grad_norm": 0.09032298624515533,
1785
+ "learning_rate": 3.098510882016037e-05,
1786
+ "epoch": 315.0,
1787
+ "step": 1260
1788
+ },
1789
+ {
1790
+ "loss": 0.0222,
1791
+ "grad_norm": 0.029143668711185455,
1792
+ "learning_rate": 3.06987399770905e-05,
1793
+ "epoch": 316.32,
1794
+ "step": 1265
1795
+ },
1796
+ {
1797
+ "loss": 0.0246,
1798
+ "grad_norm": 0.029851289466023445,
1799
+ "learning_rate": 3.0412371134020617e-05,
1800
+ "epoch": 317.64,
1801
+ "step": 1270
1802
+ },
1803
+ {
1804
+ "loss": 0.023,
1805
+ "grad_norm": 0.03257305920124054,
1806
+ "learning_rate": 3.0126002290950743e-05,
1807
+ "epoch": 318.96,
1808
+ "step": 1275
1809
+ },
1810
+ {
1811
+ "loss": 0.0242,
1812
+ "grad_norm": 0.10195237398147583,
1813
+ "learning_rate": 2.983963344788087e-05,
1814
+ "epoch": 320.0,
1815
+ "step": 1280
1816
+ },
1817
+ {
1818
+ "loss": 0.0237,
1819
+ "grad_norm": 0.03116573579609394,
1820
+ "learning_rate": 2.9553264604811e-05,
1821
+ "epoch": 321.32,
1822
+ "step": 1285
1823
+ },
1824
+ {
1825
+ "loss": 0.0253,
1826
+ "grad_norm": 0.033235374838113785,
1827
+ "learning_rate": 2.9266895761741126e-05,
1828
+ "epoch": 322.64,
1829
+ "step": 1290
1830
+ },
1831
+ {
1832
+ "loss": 0.0234,
1833
+ "grad_norm": 0.03546692803502083,
1834
+ "learning_rate": 2.8980526918671253e-05,
1835
+ "epoch": 323.96,
1836
+ "step": 1295
1837
+ },
1838
+ {
1839
+ "loss": 0.0231,
1840
+ "grad_norm": 0.0778215229511261,
1841
+ "learning_rate": 2.8694158075601372e-05,
1842
+ "epoch": 325.0,
1843
+ "step": 1300
1844
+ },
1845
+ {
1846
+ "loss": 0.0241,
1847
+ "grad_norm": 0.029815560206770897,
1848
+ "learning_rate": 2.8407789232531502e-05,
1849
+ "epoch": 326.32,
1850
+ "step": 1305
1851
+ },
1852
+ {
1853
+ "loss": 0.0233,
1854
+ "grad_norm": 0.03497137874364853,
1855
+ "learning_rate": 2.812142038946163e-05,
1856
+ "epoch": 327.64,
1857
+ "step": 1310
1858
+ },
1859
+ {
1860
+ "loss": 0.0244,
1861
+ "grad_norm": 0.030050713568925858,
1862
+ "learning_rate": 2.7835051546391755e-05,
1863
+ "epoch": 328.96,
1864
+ "step": 1315
1865
+ },
1866
+ {
1867
+ "loss": 0.0231,
1868
+ "grad_norm": 0.09748966246843338,
1869
+ "learning_rate": 2.754868270332188e-05,
1870
+ "epoch": 330.0,
1871
+ "step": 1320
1872
+ },
1873
+ {
1874
+ "loss": 0.0235,
1875
+ "grad_norm": 0.0319872722029686,
1876
+ "learning_rate": 2.7262313860252005e-05,
1877
+ "epoch": 331.32,
1878
+ "step": 1325
1879
+ },
1880
+ {
1881
+ "loss": 0.0243,
1882
+ "grad_norm": 0.029525283724069595,
1883
+ "learning_rate": 2.697594501718213e-05,
1884
+ "epoch": 332.64,
1885
+ "step": 1330
1886
+ },
1887
+ {
1888
+ "loss": 0.0245,
1889
+ "grad_norm": 0.029868364334106445,
1890
+ "learning_rate": 2.6689576174112258e-05,
1891
+ "epoch": 333.96,
1892
+ "step": 1335
1893
+ },
1894
+ {
1895
+ "loss": 0.0212,
1896
+ "grad_norm": 0.07746418565511703,
1897
+ "learning_rate": 2.6403207331042384e-05,
1898
+ "epoch": 335.0,
1899
+ "step": 1340
1900
+ },
1901
+ {
1902
+ "loss": 0.0233,
1903
+ "grad_norm": 0.02571861259639263,
1904
+ "learning_rate": 2.611683848797251e-05,
1905
+ "epoch": 336.32,
1906
+ "step": 1345
1907
+ },
1908
+ {
1909
+ "loss": 0.0238,
1910
+ "grad_norm": 0.0320206955075264,
1911
+ "learning_rate": 2.5830469644902637e-05,
1912
+ "epoch": 337.64,
1913
+ "step": 1350
1914
+ },
1915
+ {
1916
+ "loss": 0.024,
1917
+ "grad_norm": 0.03084505721926689,
1918
+ "learning_rate": 2.554410080183276e-05,
1919
+ "epoch": 338.96,
1920
+ "step": 1355
1921
+ },
1922
+ {
1923
+ "loss": 0.0237,
1924
+ "grad_norm": 0.1282522976398468,
1925
+ "learning_rate": 2.5257731958762887e-05,
1926
+ "epoch": 340.0,
1927
+ "step": 1360
1928
+ },
1929
+ {
1930
+ "loss": 0.0239,
1931
+ "grad_norm": 0.03159436210989952,
1932
+ "learning_rate": 2.4971363115693013e-05,
1933
+ "epoch": 341.32,
1934
+ "step": 1365
1935
+ },
1936
+ {
1937
+ "loss": 0.023,
1938
+ "grad_norm": 0.03368183225393295,
1939
+ "learning_rate": 2.468499427262314e-05,
1940
+ "epoch": 342.64,
1941
+ "step": 1370
1942
+ },
1943
+ {
1944
+ "loss": 0.0232,
1945
+ "grad_norm": 0.02871900610625744,
1946
+ "learning_rate": 2.4398625429553266e-05,
1947
+ "epoch": 343.96,
1948
+ "step": 1375
1949
+ },
1950
+ {
1951
+ "loss": 0.0216,
1952
+ "grad_norm": 0.06527750939130783,
1953
+ "learning_rate": 2.4112256586483393e-05,
1954
+ "epoch": 345.0,
1955
+ "step": 1380
1956
+ },
1957
+ {
1958
+ "loss": 0.0246,
1959
+ "grad_norm": 0.029657971113920212,
1960
+ "learning_rate": 2.3825887743413516e-05,
1961
+ "epoch": 346.32,
1962
+ "step": 1385
1963
+ },
1964
+ {
1965
+ "loss": 0.0226,
1966
+ "grad_norm": 0.029672225937247276,
1967
+ "learning_rate": 2.3539518900343642e-05,
1968
+ "epoch": 347.64,
1969
+ "step": 1390
1970
+ },
1971
+ {
1972
+ "loss": 0.0234,
1973
+ "grad_norm": 0.032295338809490204,
1974
+ "learning_rate": 2.3253150057273772e-05,
1975
+ "epoch": 348.96,
1976
+ "step": 1395
1977
+ },
1978
+ {
1979
+ "loss": 0.0246,
1980
+ "grad_norm": 0.12228602916002274,
1981
+ "learning_rate": 2.2966781214203895e-05,
1982
+ "epoch": 350.0,
1983
+ "step": 1400
1984
+ },
1985
+ {
1986
+ "loss": 0.0244,
1987
+ "grad_norm": 0.031152470037341118,
1988
+ "learning_rate": 2.268041237113402e-05,
1989
+ "epoch": 351.32,
1990
+ "step": 1405
1991
+ },
1992
+ {
1993
+ "loss": 0.0241,
1994
+ "grad_norm": 0.03246377035975456,
1995
+ "learning_rate": 2.2394043528064148e-05,
1996
+ "epoch": 352.64,
1997
+ "step": 1410
1998
+ },
1999
+ {
2000
+ "loss": 0.0236,
2001
+ "grad_norm": 0.03664344921708107,
2002
+ "learning_rate": 2.210767468499427e-05,
2003
+ "epoch": 353.96,
2004
+ "step": 1415
2005
+ },
2006
+ {
2007
+ "loss": 0.0242,
2008
+ "grad_norm": 0.12599903345108032,
2009
+ "learning_rate": 2.18213058419244e-05,
2010
+ "epoch": 355.0,
2011
+ "step": 1420
2012
+ },
2013
+ {
2014
+ "loss": 0.023,
2015
+ "grad_norm": 0.03213375434279442,
2016
+ "learning_rate": 2.1534936998854528e-05,
2017
+ "epoch": 356.32,
2018
+ "step": 1425
2019
+ },
2020
+ {
2021
+ "loss": 0.0242,
2022
+ "grad_norm": 0.029569735750555992,
2023
+ "learning_rate": 2.124856815578465e-05,
2024
+ "epoch": 357.64,
2025
+ "step": 1430
2026
+ },
2027
+ {
2028
+ "loss": 0.0237,
2029
+ "grad_norm": 0.030345458537340164,
2030
+ "learning_rate": 2.0962199312714777e-05,
2031
+ "epoch": 358.96,
2032
+ "step": 1435
2033
+ },
2034
+ {
2035
+ "loss": 0.0225,
2036
+ "grad_norm": 0.07442766427993774,
2037
+ "learning_rate": 2.0675830469644904e-05,
2038
+ "epoch": 360.0,
2039
+ "step": 1440
2040
+ },
2041
+ {
2042
+ "loss": 0.0247,
2043
+ "grad_norm": 0.03161914646625519,
2044
+ "learning_rate": 2.038946162657503e-05,
2045
+ "epoch": 361.32,
2046
+ "step": 1445
2047
+ },
2048
+ {
2049
+ "loss": 0.0224,
2050
+ "grad_norm": 0.03342209383845329,
2051
+ "learning_rate": 2.0103092783505157e-05,
2052
+ "epoch": 362.64,
2053
+ "step": 1450
2054
+ },
2055
+ {
2056
+ "loss": 0.0226,
2057
+ "grad_norm": 0.029506616294384003,
2058
+ "learning_rate": 1.981672394043528e-05,
2059
+ "epoch": 363.96,
2060
+ "step": 1455
2061
+ },
2062
+ {
2063
+ "loss": 0.0251,
2064
+ "grad_norm": 0.13045279681682587,
2065
+ "learning_rate": 1.9530355097365406e-05,
2066
+ "epoch": 365.0,
2067
+ "step": 1460
2068
+ },
2069
+ {
2070
+ "loss": 0.0239,
2071
+ "grad_norm": 0.03303099796175957,
2072
+ "learning_rate": 1.9243986254295536e-05,
2073
+ "epoch": 366.32,
2074
+ "step": 1465
2075
+ },
2076
+ {
2077
+ "loss": 0.0221,
2078
+ "grad_norm": 0.02956564724445343,
2079
+ "learning_rate": 1.895761741122566e-05,
2080
+ "epoch": 367.64,
2081
+ "step": 1470
2082
+ },
2083
+ {
2084
+ "loss": 0.0234,
2085
+ "grad_norm": 0.03200279548764229,
2086
+ "learning_rate": 1.8671248568155786e-05,
2087
+ "epoch": 368.96,
2088
+ "step": 1475
2089
+ },
2090
+ {
2091
+ "loss": 0.0235,
2092
+ "grad_norm": 0.12507398426532745,
2093
+ "learning_rate": 1.8384879725085912e-05,
2094
+ "epoch": 370.0,
2095
+ "step": 1480
2096
+ },
2097
+ {
2098
+ "loss": 0.0233,
2099
+ "grad_norm": 0.03214867785573006,
2100
+ "learning_rate": 1.809851088201604e-05,
2101
+ "epoch": 371.32,
2102
+ "step": 1485
2103
+ },
2104
+ {
2105
+ "loss": 0.023,
2106
+ "grad_norm": 0.03199266269803047,
2107
+ "learning_rate": 1.7812142038946165e-05,
2108
+ "epoch": 372.64,
2109
+ "step": 1490
2110
+ },
2111
+ {
2112
+ "loss": 0.0246,
2113
+ "grad_norm": 0.027682902291417122,
2114
+ "learning_rate": 1.7525773195876288e-05,
2115
+ "epoch": 373.96,
2116
+ "step": 1495
2117
+ },
2118
+ {
2119
+ "loss": 0.0228,
2120
+ "grad_norm": 0.10432948172092438,
2121
+ "learning_rate": 1.7239404352806415e-05,
2122
+ "epoch": 375.0,
2123
+ "step": 1500
2124
+ },
2125
+ {
2126
+ "loss": 0.0235,
2127
+ "grad_norm": 0.03665570914745331,
2128
+ "learning_rate": 1.695303550973654e-05,
2129
+ "epoch": 376.32,
2130
+ "step": 1505
2131
+ },
2132
+ {
2133
+ "loss": 0.0228,
2134
+ "grad_norm": 0.03269299864768982,
2135
+ "learning_rate": 1.6666666666666667e-05,
2136
+ "epoch": 377.64,
2137
+ "step": 1510
2138
+ },
2139
+ {
2140
+ "loss": 0.0232,
2141
+ "grad_norm": 0.030298851430416107,
2142
+ "learning_rate": 1.6380297823596794e-05,
2143
+ "epoch": 378.96,
2144
+ "step": 1515
2145
+ },
2146
+ {
2147
+ "loss": 0.024,
2148
+ "grad_norm": 0.1330370008945465,
2149
+ "learning_rate": 1.609392898052692e-05,
2150
+ "epoch": 380.0,
2151
+ "step": 1520
2152
+ },
2153
+ {
2154
+ "loss": 0.0232,
2155
+ "grad_norm": 0.026194848120212555,
2156
+ "learning_rate": 1.5807560137457044e-05,
2157
+ "epoch": 381.32,
2158
+ "step": 1525
2159
+ },
2160
+ {
2161
+ "loss": 0.024,
2162
+ "grad_norm": 0.030696984380483627,
2163
+ "learning_rate": 1.5521191294387173e-05,
2164
+ "epoch": 382.64,
2165
+ "step": 1530
2166
+ },
2167
+ {
2168
+ "loss": 0.0237,
2169
+ "grad_norm": 0.03159346804022789,
2170
+ "learning_rate": 1.5234822451317298e-05,
2171
+ "epoch": 383.96,
2172
+ "step": 1535
2173
+ },
2174
+ {
2175
+ "loss": 0.024,
2176
+ "grad_norm": 0.0895160585641861,
2177
+ "learning_rate": 1.4948453608247423e-05,
2178
+ "epoch": 385.0,
2179
+ "step": 1540
2180
+ },
2181
+ {
2182
+ "loss": 0.0226,
2183
+ "grad_norm": 0.030342400074005127,
2184
+ "learning_rate": 1.466208476517755e-05,
2185
+ "epoch": 386.32,
2186
+ "step": 1545
2187
+ },
2188
+ {
2189
+ "loss": 0.0241,
2190
+ "grad_norm": 0.03451743721961975,
2191
+ "learning_rate": 1.4375715922107674e-05,
2192
+ "epoch": 387.64,
2193
+ "step": 1550
2194
+ },
2195
+ {
2196
+ "loss": 0.0224,
2197
+ "grad_norm": 0.034534044563770294,
2198
+ "learning_rate": 1.40893470790378e-05,
2199
+ "epoch": 388.96,
2200
+ "step": 1555
2201
+ },
2202
+ {
2203
+ "loss": 0.024,
2204
+ "grad_norm": 0.11649748682975769,
2205
+ "learning_rate": 1.3802978235967929e-05,
2206
+ "epoch": 390.0,
2207
+ "step": 1560
2208
+ },
2209
+ {
2210
+ "loss": 0.0232,
2211
+ "grad_norm": 0.02730483002960682,
2212
+ "learning_rate": 1.3516609392898052e-05,
2213
+ "epoch": 391.32,
2214
+ "step": 1565
2215
+ },
2216
+ {
2217
+ "loss": 0.0245,
2218
+ "grad_norm": 0.03302980959415436,
2219
+ "learning_rate": 1.323024054982818e-05,
2220
+ "epoch": 392.64,
2221
+ "step": 1570
2222
+ },
2223
+ {
2224
+ "loss": 0.0231,
2225
+ "grad_norm": 0.030424287542700768,
2226
+ "learning_rate": 1.2943871706758307e-05,
2227
+ "epoch": 393.96,
2228
+ "step": 1575
2229
+ },
2230
+ {
2231
+ "loss": 0.0233,
2232
+ "grad_norm": 0.09190870821475983,
2233
+ "learning_rate": 1.2657502863688431e-05,
2234
+ "epoch": 395.0,
2235
+ "step": 1580
2236
+ },
2237
+ {
2238
+ "loss": 0.0237,
2239
+ "grad_norm": 0.03016272746026516,
2240
+ "learning_rate": 1.2371134020618558e-05,
2241
+ "epoch": 396.32,
2242
+ "step": 1585
2243
+ },
2244
+ {
2245
+ "loss": 0.0237,
2246
+ "grad_norm": 0.029102135449647903,
2247
+ "learning_rate": 1.2084765177548683e-05,
2248
+ "epoch": 397.64,
2249
+ "step": 1590
2250
+ },
2251
+ {
2252
+ "loss": 0.0238,
2253
+ "grad_norm": 0.030849164351820946,
2254
+ "learning_rate": 1.1798396334478809e-05,
2255
+ "epoch": 398.96,
2256
+ "step": 1595
2257
+ },
2258
+ {
2259
+ "loss": 0.0223,
2260
+ "grad_norm": 0.09185610711574554,
2261
+ "learning_rate": 1.1512027491408934e-05,
2262
+ "epoch": 400.0,
2263
+ "step": 1600
2264
+ },
2265
+ {
2266
+ "loss": 0.0228,
2267
+ "grad_norm": 0.030718082562088966,
2268
+ "learning_rate": 1.1225658648339062e-05,
2269
+ "epoch": 401.32,
2270
+ "step": 1605
2271
+ },
2272
+ {
2273
+ "loss": 0.0238,
2274
+ "grad_norm": 0.028845084831118584,
2275
+ "learning_rate": 1.0939289805269187e-05,
2276
+ "epoch": 402.64,
2277
+ "step": 1610
2278
+ },
2279
+ {
2280
+ "loss": 0.0241,
2281
+ "grad_norm": 0.03036542609333992,
2282
+ "learning_rate": 1.0652920962199313e-05,
2283
+ "epoch": 403.96,
2284
+ "step": 1615
2285
+ },
2286
+ {
2287
+ "loss": 0.0234,
2288
+ "grad_norm": 0.10246625542640686,
2289
+ "learning_rate": 1.036655211912944e-05,
2290
+ "epoch": 405.0,
2291
+ "step": 1620
2292
+ },
2293
+ {
2294
+ "loss": 0.0238,
2295
+ "grad_norm": 0.03127530962228775,
2296
+ "learning_rate": 1.0080183276059566e-05,
2297
+ "epoch": 406.32,
2298
+ "step": 1625
2299
+ },
2300
+ {
2301
+ "loss": 0.0226,
2302
+ "grad_norm": 0.036298803985118866,
2303
+ "learning_rate": 9.793814432989691e-06,
2304
+ "epoch": 407.64,
2305
+ "step": 1630
2306
+ },
2307
+ {
2308
+ "loss": 0.0231,
2309
+ "grad_norm": 0.028423035517334938,
2310
+ "learning_rate": 9.507445589919818e-06,
2311
+ "epoch": 408.96,
2312
+ "step": 1635
2313
+ },
2314
+ {
2315
+ "loss": 0.0218,
2316
+ "grad_norm": 0.07871800661087036,
2317
+ "learning_rate": 9.221076746849944e-06,
2318
+ "epoch": 410.0,
2319
+ "step": 1640
2320
+ },
2321
+ {
2322
+ "loss": 0.0242,
2323
+ "grad_norm": 0.0336175374686718,
2324
+ "learning_rate": 8.934707903780069e-06,
2325
+ "epoch": 411.32,
2326
+ "step": 1645
2327
+ },
2328
+ {
2329
+ "loss": 0.0227,
2330
+ "grad_norm": 0.03624117374420166,
2331
+ "learning_rate": 8.648339060710195e-06,
2332
+ "epoch": 412.64,
2333
+ "step": 1650
2334
+ },
2335
+ {
2336
+ "loss": 0.0218,
2337
+ "grad_norm": 0.03119911253452301,
2338
+ "learning_rate": 8.36197021764032e-06,
2339
+ "epoch": 413.96,
2340
+ "step": 1655
2341
+ },
2342
+ {
2343
+ "loss": 0.0227,
2344
+ "grad_norm": 0.09461841732263565,
2345
+ "learning_rate": 8.075601374570448e-06,
2346
+ "epoch": 415.0,
2347
+ "step": 1660
2348
+ },
2349
+ {
2350
+ "loss": 0.0233,
2351
+ "grad_norm": 0.02897919900715351,
2352
+ "learning_rate": 7.789232531500573e-06,
2353
+ "epoch": 416.32,
2354
+ "step": 1665
2355
+ },
2356
+ {
2357
+ "loss": 0.0234,
2358
+ "grad_norm": 0.03222072497010231,
2359
+ "learning_rate": 7.502863688430699e-06,
2360
+ "epoch": 417.64,
2361
+ "step": 1670
2362
+ },
2363
+ {
2364
+ "loss": 0.0231,
2365
+ "grad_norm": 0.02793605998158455,
2366
+ "learning_rate": 7.216494845360824e-06,
2367
+ "epoch": 418.96,
2368
+ "step": 1675
2369
+ },
2370
+ {
2371
+ "loss": 0.0242,
2372
+ "grad_norm": 0.10282719135284424,
2373
+ "learning_rate": 6.930126002290952e-06,
2374
+ "epoch": 420.0,
2375
+ "step": 1680
2376
+ },
2377
+ {
2378
+ "loss": 0.0228,
2379
+ "grad_norm": 0.029103396460413933,
2380
+ "learning_rate": 6.643757159221077e-06,
2381
+ "epoch": 421.32,
2382
+ "step": 1685
2383
+ },
2384
+ {
2385
+ "loss": 0.0229,
2386
+ "grad_norm": 0.027615424245595932,
2387
+ "learning_rate": 6.357388316151203e-06,
2388
+ "epoch": 422.64,
2389
+ "step": 1690
2390
+ },
2391
+ {
2392
+ "loss": 0.023,
2393
+ "grad_norm": 0.03273004665970802,
2394
+ "learning_rate": 6.071019473081329e-06,
2395
+ "epoch": 423.96,
2396
+ "step": 1695
2397
+ },
2398
+ {
2399
+ "loss": 0.0242,
2400
+ "grad_norm": 0.088851198554039,
2401
+ "learning_rate": 5.784650630011455e-06,
2402
+ "epoch": 425.0,
2403
+ "step": 1700
2404
+ },
2405
+ {
2406
+ "loss": 0.0236,
2407
+ "grad_norm": 0.031545545905828476,
2408
+ "learning_rate": 5.498281786941581e-06,
2409
+ "epoch": 426.32,
2410
+ "step": 1705
2411
+ },
2412
+ {
2413
+ "loss": 0.0231,
2414
+ "grad_norm": 0.03436841815710068,
2415
+ "learning_rate": 5.211912943871707e-06,
2416
+ "epoch": 427.64,
2417
+ "step": 1710
2418
+ },
2419
+ {
2420
+ "loss": 0.023,
2421
+ "grad_norm": 0.03470204398036003,
2422
+ "learning_rate": 4.925544100801833e-06,
2423
+ "epoch": 428.96,
2424
+ "step": 1715
2425
+ },
2426
+ {
2427
+ "loss": 0.0233,
2428
+ "grad_norm": 0.0859316810965538,
2429
+ "learning_rate": 4.639175257731959e-06,
2430
+ "epoch": 430.0,
2431
+ "step": 1720
2432
+ },
2433
+ {
2434
+ "loss": 0.0215,
2435
+ "grad_norm": 0.02714327722787857,
2436
+ "learning_rate": 4.352806414662085e-06,
2437
+ "epoch": 431.32,
2438
+ "step": 1725
2439
+ },
2440
+ {
2441
+ "loss": 0.0233,
2442
+ "grad_norm": 0.03115593083202839,
2443
+ "learning_rate": 4.066437571592211e-06,
2444
+ "epoch": 432.64,
2445
+ "step": 1730
2446
+ },
2447
+ {
2448
+ "loss": 0.0222,
2449
+ "grad_norm": 0.03160055726766586,
2450
+ "learning_rate": 3.7800687285223365e-06,
2451
+ "epoch": 433.96,
2452
+ "step": 1735
2453
+ },
2454
+ {
2455
+ "loss": 0.0221,
2456
+ "grad_norm": 0.10642414540052414,
2457
+ "learning_rate": 3.493699885452463e-06,
2458
+ "epoch": 435.0,
2459
+ "step": 1740
2460
+ },
2461
+ {
2462
+ "loss": 0.024,
2463
+ "grad_norm": 0.029918361455202103,
2464
+ "learning_rate": 3.2073310423825886e-06,
2465
+ "epoch": 436.32,
2466
+ "step": 1745
2467
+ },
2468
+ {
2469
+ "loss": 0.023,
2470
+ "grad_norm": 0.030128490179777145,
2471
+ "learning_rate": 2.920962199312715e-06,
2472
+ "epoch": 437.64,
2473
+ "step": 1750
2474
+ },
2475
+ {
2476
+ "loss": 0.0231,
2477
+ "grad_norm": 0.03472098708152771,
2478
+ "learning_rate": 2.6345933562428407e-06,
2479
+ "epoch": 438.96,
2480
+ "step": 1755
2481
+ },
2482
+ {
2483
+ "loss": 0.0238,
2484
+ "grad_norm": 0.10841913521289825,
2485
+ "learning_rate": 2.3482245131729668e-06,
2486
+ "epoch": 440.0,
2487
+ "step": 1760
2488
+ },
2489
+ {
2490
+ "loss": 0.0241,
2491
+ "grad_norm": 0.03282919153571129,
2492
+ "learning_rate": 2.061855670103093e-06,
2493
+ "epoch": 441.32,
2494
+ "step": 1765
2495
+ },
2496
+ {
2497
+ "loss": 0.0234,
2498
+ "grad_norm": 0.030162909999489784,
2499
+ "learning_rate": 1.7754868270332189e-06,
2500
+ "epoch": 442.64,
2501
+ "step": 1770
2502
+ },
2503
+ {
2504
+ "loss": 0.0225,
2505
+ "grad_norm": 0.032848529517650604,
2506
+ "learning_rate": 1.4891179839633447e-06,
2507
+ "epoch": 443.96,
2508
+ "step": 1775
2509
+ },
2510
+ {
2511
+ "loss": 0.023,
2512
+ "grad_norm": 0.09595301747322083,
2513
+ "learning_rate": 1.202749140893471e-06,
2514
+ "epoch": 445.0,
2515
+ "step": 1780
2516
+ },
2517
+ {
2518
+ "loss": 0.0242,
2519
+ "grad_norm": 0.027366334572434425,
2520
+ "learning_rate": 9.163802978235968e-07,
2521
+ "epoch": 446.32,
2522
+ "step": 1785
2523
+ },
2524
+ {
2525
+ "loss": 0.0243,
2526
+ "grad_norm": 0.029810229316353798,
2527
+ "learning_rate": 6.300114547537229e-07,
2528
+ "epoch": 447.64,
2529
+ "step": 1790
2530
+ },
2531
+ {
2532
+ "loss": 0.0238,
2533
+ "grad_norm": 0.03164233639836311,
2534
+ "learning_rate": 3.436426116838488e-07,
2535
+ "epoch": 448.96,
2536
+ "step": 1795
2537
+ },
2538
+ {
2539
+ "loss": 0.0246,
2540
+ "grad_norm": 0.11831732094287872,
2541
+ "learning_rate": 5.72737686139748e-08,
2542
+ "epoch": 450.0,
2543
+ "step": 1800
2544
+ },
2545
+ {
2546
+ "train_runtime": 30940.5798,
2547
+ "train_samples_per_second": 1.862,
2548
+ "train_steps_per_second": 0.058,
2549
+ "total_flos": 1.5308141101056e+18,
2550
+ "train_loss": 0.049091512378719115,
2551
+ "epoch": 450.0,
2552
+ "step": 1800
2553
+ }
2554
+ ],
2555
+ "training_args": {
2556
+ "output_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09/checkpoints",
2557
+ "overwrite_output_dir": false,
2558
+ "do_train": false,
2559
+ "do_eval": false,
2560
+ "do_predict": false,
2561
+ "eval_strategy": "no",
2562
+ "prediction_loss_only": false,
2563
+ "per_device_train_batch_size": 2,
2564
+ "per_device_eval_batch_size": 8,
2565
+ "per_gpu_train_batch_size": null,
2566
+ "per_gpu_eval_batch_size": null,
2567
+ "gradient_accumulation_steps": 16,
2568
+ "eval_accumulation_steps": null,
2569
+ "eval_delay": 0,
2570
+ "torch_empty_cache_steps": null,
2571
+ "learning_rate": 0.0001,
2572
+ "weight_decay": 0.01,
2573
+ "adam_beta1": 0.9,
2574
+ "adam_beta2": 0.999,
2575
+ "adam_epsilon": 1e-08,
2576
+ "max_grad_norm": 1.0,
2577
+ "num_train_epochs": 12,
2578
+ "max_steps": 1800,
2579
+ "lr_scheduler_type": "linear",
2580
+ "lr_scheduler_kwargs": {},
2581
+ "warmup_ratio": 0.03,
2582
+ "warmup_steps": 0,
2583
+ "log_level": "passive",
2584
+ "log_level_replica": "warning",
2585
+ "log_on_each_node": true,
2586
+ "logging_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/logs/Math_QA/group_09",
2587
+ "logging_strategy": "steps",
2588
+ "logging_first_step": true,
2589
+ "logging_steps": 5,
2590
+ "logging_nan_inf_filter": true,
2591
+ "save_strategy": "steps",
2592
+ "save_steps": 300,
2593
+ "save_total_limit": 6,
2594
+ "save_safetensors": true,
2595
+ "save_on_each_node": false,
2596
+ "save_only_model": false,
2597
+ "restore_callback_states_from_checkpoint": false,
2598
+ "no_cuda": false,
2599
+ "use_cpu": false,
2600
+ "use_mps_device": false,
2601
+ "seed": 42,
2602
+ "data_seed": null,
2603
+ "jit_mode_eval": false,
2604
+ "bf16": true,
2605
+ "fp16": false,
2606
+ "fp16_opt_level": "O1",
2607
+ "half_precision_backend": "auto",
2608
+ "bf16_full_eval": false,
2609
+ "fp16_full_eval": false,
2610
+ "tf32": null,
2611
+ "local_rank": 0,
2612
+ "ddp_backend": null,
2613
+ "tpu_num_cores": null,
2614
+ "tpu_metrics_debug": false,
2615
+ "debug": [],
2616
+ "dataloader_drop_last": false,
2617
+ "eval_steps": null,
2618
+ "dataloader_num_workers": 0,
2619
+ "dataloader_prefetch_factor": null,
2620
+ "past_index": -1,
2621
+ "run_name": null,
2622
+ "disable_tqdm": false,
2623
+ "remove_unused_columns": true,
2624
+ "label_names": null,
2625
+ "load_best_model_at_end": false,
2626
+ "metric_for_best_model": null,
2627
+ "greater_is_better": null,
2628
+ "ignore_data_skip": false,
2629
+ "fsdp": [],
2630
+ "fsdp_min_num_params": 0,
2631
+ "fsdp_config": {
2632
+ "min_num_params": 0,
2633
+ "xla": false,
2634
+ "xla_fsdp_v2": false,
2635
+ "xla_fsdp_grad_ckpt": false
2636
+ },
2637
+ "fsdp_transformer_layer_cls_to_wrap": null,
2638
+ "accelerator_config": {
2639
+ "split_batches": false,
2640
+ "dispatch_batches": null,
2641
+ "even_batches": true,
2642
+ "use_seedable_sampler": true,
2643
+ "non_blocking": false,
2644
+ "gradient_accumulation_kwargs": null
2645
+ },
2646
+ "parallelism_config": null,
2647
+ "deepspeed": null,
2648
+ "label_smoothing_factor": 0.0,
2649
+ "optim": "adamw_torch",
2650
+ "optim_args": null,
2651
+ "adafactor": false,
2652
+ "group_by_length": false,
2653
+ "length_column_name": "length",
2654
+ "report_to": [],
2655
+ "project": "huggingface",
2656
+ "trackio_space_id": "trackio",
2657
+ "ddp_find_unused_parameters": null,
2658
+ "ddp_bucket_cap_mb": null,
2659
+ "ddp_broadcast_buffers": null,
2660
+ "dataloader_pin_memory": true,
2661
+ "dataloader_persistent_workers": false,
2662
+ "skip_memory_metrics": true,
2663
+ "use_legacy_prediction_loop": false,
2664
+ "push_to_hub": false,
2665
+ "resume_from_checkpoint": null,
2666
+ "hub_model_id": null,
2667
+ "hub_strategy": "every_save",
2668
+ "hub_token": "<HUB_TOKEN>",
2669
+ "hub_private_repo": null,
2670
+ "hub_always_push": false,
2671
+ "hub_revision": null,
2672
+ "gradient_checkpointing": true,
2673
+ "gradient_checkpointing_kwargs": null,
2674
+ "include_inputs_for_metrics": false,
2675
+ "include_for_metrics": [],
2676
+ "eval_do_concat_batches": true,
2677
+ "fp16_backend": "auto",
2678
+ "push_to_hub_model_id": null,
2679
+ "push_to_hub_organization": null,
2680
+ "push_to_hub_token": "<PUSH_TO_HUB_TOKEN>",
2681
+ "mp_parameters": "",
2682
+ "auto_find_batch_size": false,
2683
+ "full_determinism": false,
2684
+ "torchdynamo": null,
2685
+ "ray_scope": "last",
2686
+ "ddp_timeout": 1800,
2687
+ "torch_compile": false,
2688
+ "torch_compile_backend": null,
2689
+ "torch_compile_mode": null,
2690
+ "include_tokens_per_second": false,
2691
+ "include_num_input_tokens_seen": "no",
2692
+ "neftune_noise_alpha": null,
2693
+ "optim_target_modules": null,
2694
+ "batch_eval_metrics": false,
2695
+ "eval_on_start": false,
2696
+ "use_liger_kernel": false,
2697
+ "liger_kernel_config": null,
2698
+ "eval_use_gather_object": false,
2699
+ "average_tokens_across_devices": true
2700
+ },
2701
+ "lora_config": {
2702
+ "r": 64,
2703
+ "alpha": 128,
2704
+ "dropout": 0.05,
2705
+ "target_modules": [
2706
+ "q_proj",
2707
+ "k_proj",
2708
+ "v_proj",
2709
+ "o_proj",
2710
+ "gate_proj",
2711
+ "up_proj",
2712
+ "down_proj"
2713
+ ]
2714
+ },
2715
+ "effective_batch_size": 32,
2716
+ "world_size": 1,
2717
+ "git_commit": ""
2718
+ }
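The learning rates in the log above follow from the recorded hyperparameters (`learning_rate` 1e-4, `lr_scheduler_type` "linear", `warmup_ratio` 0.03, `max_steps` 1800). A minimal sketch, assuming the usual Hugging Face convention of `warmup_steps = ceil(max_steps * warmup_ratio)` and that the logged rate at step N is the one applied during that step (i.e. the schedule value at N-1); `linear_lr` is an illustrative helper, not part of the log:

```python
import math

# Values recorded in training_args above.
BASE_LR = 1.0e-4
MAX_STEPS = 1800
WARMUP_RATIO = 0.03

# Assumed HF convention: warmup steps rounded up from the ratio.
WARMUP_STEPS = math.ceil(MAX_STEPS * WARMUP_RATIO)  # 54

def linear_lr(step: int) -> float:
    """Scheduled learning rate after `step` completed optimizer steps:
    linear ramp over the warmup, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)

# Effective batch size = per-device batch * grad-accum steps * world size,
# matching the "effective_batch_size" field above.
effective_batch = 2 * 16 * 1  # 32
```

Under these assumptions the sketch reproduces the logged values, e.g. `linear_lr(1044)` gives the 4.3299e-05 recorded at step 1045, and `linear_lr(1799)` gives the 5.727e-08 recorded at the final step 1800.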
Math_QA/group_09/prompt_group.json ADDED
@@ -0,0 +1,613 @@
+ {
+ "dataset_name": "Math_QA",
+ "group_index": 9,
+ "source_file": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/prepare/data/math/Math_QA.json",
+ "selected_indices": [
+ 142,
+ 181,
+ 659,
+ 708,
+ 738,
+ 820,
+ 997,
+ 1097,
+ 1185,
+ 1268,
+ 1293,
+ 1454,
+ 1463,
+ 1552,
+ 1649,
+ 1692,
+ 1903,
+ 2038,
+ 2076,
+ 2098,
+ 2107,
+ 2389,
+ 2401,
+ 2452,
+ 2577,
+ 2616,
+ 2619,
+ 2646,
+ 2739,
+ 2823,
+ 2993,
+ 3049,
+ 3183,
+ 3378,
+ 3386,
+ 3454,
+ 3517,
+ 3526,
+ 3539,
+ 3711,
+ 3887,
+ 3994,
+ 4085,
+ 4113,
+ 4149,
+ 4251,
+ 4330,
+ 4449,
+ 4521,
+ 4557,
+ 4623,
+ 4646,
+ 4651,
+ 4680,
+ 4854,
+ 4870,
+ 4927,
+ 4937,
+ 5025,
+ 5039,
+ 5118,
+ 5131,
+ 5198,
+ 5213,
+ 5241,
+ 5311,
+ 5437,
+ 5441,
+ 5839,
+ 5908,
+ 5918,
+ 6051,
+ 6096,
+ 6113,
+ 6340,
+ 6345,
+ 6395,
+ 6424,
+ 6742,
+ 6796,
+ 6812,
+ 6847,
+ 6879,
+ 7107,
+ 7129,
+ 7652,
+ 7854,
+ 8037,
+ 8177,
+ 8246,
+ 8337,
+ 9033,
+ 9248,
+ 9274,
+ 9379,
+ 9483,
+ 9523,
+ 9581,
+ 9777,
+ 9816
+ ],
+ "total_records": 10000,
+ "sample_count": 100,
+ "generated_at": "2025-11-05T12:58:28Z",
+ "seed": 760231939,
+ "samples": [
+ {
+ "prompt": "what is the 26 th digit to the right of the decimal point in the decimal form of 6 / 11 ?\na ) 3 , b ) 4 , c ) 5 , d ) 6 , e ) 7",
+ "response": "\"6 / 11 = 0.54545454 . . . the even numbered positions in the decimal expansion are all 4 . the answer is b .\"",
+ "system": ""
+ },
+ {
+ "prompt": "arjun started a business with rs . 40,000 and is joined afterwards by anoop with rs . 80 , 000 . after how many months did anoop join if the profits at the end of the year are divided equally ?\na ) 3 , b ) 4 , c ) 5 , d ) 6 , e ) 7",
+ "response": "\"suppose anoop joined after 3 months . then , 40000 * 12 = 80000 * ( 12 – x ) = > x = 6 . answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "if log 2 = 0.3010 and log 3 = 0.4771 , the value of log 5 ( 512 )\na ) 3.876 , b ) 2.967 , c ) 2.87 , d ) 3.912 , e ) 1.9",
+ "response": "\"log 5 ( 512 ) = log ( 512 ) / log 5 = log 2 ^ 9 / log ( 10 / 2 ) = 9 log 2 / ( log 10 - log 2 ) = ( 9 x 0.3010 ) / ( 1 - 0.3010 ) = 2.709 / 0.699 = 2709 / 699 = 3.876 answer is a .\"",
+ "system": ""
+ },
+ {
+ "prompt": "4,25 , 49,121 , 169,289 , 361,529 ,\na ) 149 , b ) 169 , c ) 841 , d ) 920 , e ) 219",
+ "response": "\"29 ^ 2 = 841 because follow sequence of square of the prime numbers answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "a survey of employers found that during 1993 employment costs rose 3.5 percent , where employment costs consist of salary costs and fringe - benefit costs . if salary costs rose 3 percent and fringe - benefit costs rose 6.5 percent during 1993 , then fringe - benefit costs represented what percent of employment costs at the beginning of 1993 ?\na ) 16.5 % , b ) 14 % , c ) 35 % , d ) 55 % , e ) 65 %",
+ "response": "the amount by which employment costs rose is equal to 0.035 ( salary costs + fringe benefit costs ) ; on the other hand the amount by which employment costs rose is equal to 0.03 * salary costs + 0.065 * fringe benefit costs ; so , 35 ( s + f ) = 30 s + 65 f - - > s = 6 f - - > f / s = 1 / 6 - - > f / ( s + f ) = 1 / ( 1 + 6 ) = 1 / 7 = 0.14 . answer : b .",
+ "system": ""
+ },
+ {
+ "prompt": "if x and y are both odd prime numbers and x < y , how many distinct positive integer w factors does 2 xy have ?\na ) 3 , b ) 4 , c ) 6 , d ) 8 , e ) 12",
+ "response": "since 2 xy prime w factors are x ^ 1 * y ^ 1 * 2 ^ 1 , its total number or factors must be ( 1 + 1 ) ( 1 + 1 ) ( 1 + 1 ) = 2 ^ 3 = 8 . thus , i think d would be the correct answer .",
+ "system": ""
+ },
+ {
+ "prompt": "p alone can complete a job in 5 days . the work done by q alone in one day is equal to one - fourth of the work done by p alone in one day . in how many days can the work be completed if p and q work together ?\na ) 2.5 , b ) 3.0 , c ) 3.5 , d ) 4.0 , e ) 4.5",
+ "response": "\"p ' s rate is 1 / 5 q ' s rate is 1 / 20 the combined rate is 1 / 5 + 1 / 20 = 1 / 4 if they work together , the job will take 4 days . the answer is d .\"",
+ "system": ""
+ },
+ {
+ "prompt": "a library branch originally contained 18360 volumes , 30 % of which were fiction novels . 1 / 3 of the volumes were transferred to another location and 1 / 3 of the volumes transferred were fiction novels . what percent of the remaining collection was fiction novels ?\na ) 2.5 % , b ) 17.67 % , c ) 28.3 % , d ) 45.2 % , e ) 73.6 %",
+ "response": "\"as everything is either fraction or percentage , the given figure 18360 is just a false alarm . we can do this by assuming that originally the branch had 100 volumes . originally : total - 100 fiction - 30 transferred : total - 33 ( one third of original total ) fiction - 11 ( one third of those transferred ) remaining : total - 100 − 33 = 67100 − 33 = 67 fiction - 30 − 11 = 1930 − 11 = 19 to find : 19 is what percent of 67 28.3 option c\"",
+ "system": ""
+ },
+ {
+ "prompt": "of the goose eggs laid at a certain pond , 1 / 4 hatched and 4 / 5 of the geese that hatched from those eggs survived the first month . of the geese that survived the first month , 2 / 5 did not survive the first year . if 120 geese survived the first year and if no more than one goose hatched from each egg , how many goose eggs were laid at the pond ?\na ) 600 , b ) 700 , c ) 800 , d ) 900 , e ) 1000",
+ "response": "\"let x be the number of eggs that were laid . ( 3 / 5 ) ( 4 / 5 ) ( 1 / 4 ) x = 120 ( 12 / 100 ) x = 120 x = 1000 the answer is e .\"",
+ "system": ""
+ },
+ {
+ "prompt": "solution x is 40 % chemical a and 60 % chemical b by volume . solution y is 50 % chemical a and 50 % chemical b by volume . if a mixture of x and y is 47 % chemical a , what percent of the mixture is solution x ?\na ) 15 % , b ) 20 % , c ) 25 % , d ) 30 % , e ) 35 %",
+ "response": "\"the volume of the mixture be x + y . 0.4 x + 0.5 y = 0.47 ( x + y ) x = 3 y / 7 x / ( x + y ) = ( 3 y / 7 ) / ( 10 y / 7 ) = 3 / 10 = 30 % . the answer is d .\"",
+ "system": ""
+ },
+ {
+ "prompt": "find the fraction which has the same ratio to 2 / 6 that 3 / 4 has to 1 / 2\na ) 1 / 2 , b ) 2 / 5 , c ) 6 / 8 , d ) 9 / 4 , e ) 7 / 5",
+ "response": "\"p : 2 / 6 = 3 / 4 : 1 / 2 as the product of the means is equal to the product of the extremes . p * 1 / 2 = 2 / 6 * 3 / 4 p * 1 / 2 = 6 / 24 p = 1 / 2 = > p = 1 / 2 answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "a man invests in a 16 % stock at 128 . the interest obtained by him is\na ) 22.5 % , b ) 42.5 % , c ) 12.5 % , d ) 62.5 % , e ) 82.5 %",
+ "response": "by investing rs 128 , income derived = rs . 16 by investing rs . 100 , income derived = = rs . 12.5 interest obtained = 12.5 % answer : c",
+ "system": ""
+ },
+ {
+ "prompt": "13 different biology books and 8 different chemistry books lie on a shelf . in how many ways can a student pick 2 books of each type ?\na ) 80 , b ) 160 , c ) 720 , d ) 1100 , e ) 2184",
+ "response": "\"no . of ways of picking 2 biology books ( from 13 books ) = 13 c 2 = ( 13 * 12 ) / 2 = 78 no . of ways of picking 2 chemistry books ( from 8 books ) = 8 c 2 = ( 8 * 7 ) / 2 = 28 total ways of picking 2 books of each type = 78 * 28 = 2184 ( option e )\"",
+ "system": ""
+ },
+ {
+ "prompt": "there is food for 760 men for 22 days . how many more men should join after two days so that the same food may last for 16 days more ?\na ) 122 , b ) 140 , c ) 199 , d ) 188 , e ) 190",
+ "response": "\"760 - - - - 22 760 - - - - 20 x - - - - - 16 x * 16 = 760 * 20 x = 950 760 - - - - - - - 190 answer : e\"",
+ "system": ""
+ },
+ {
+ "prompt": "if albert ’ s monthly earnings rise by 20 % , he would earn $ 560 . if , instead , his earnings rise by only 21 % , how much ( in $ ) would he earn this month ?\na ) 643 , b ) 652 , c ) 660 , d ) 564 , e ) 693",
+ "response": "\"= 560 / 1.2 ∗ 1.21 = 564 = 564 answer is d\"",
+ "system": ""
+ },
+ {
+ "prompt": "10 is subtracted from 50 % of a number , the result is 25 . find the number ?\na ) 75 , b ) 70 , c ) 35 , d ) 170 , e ) 50",
+ "response": "\"( 50 / 100 ) * x – 10 = 25 5 x = 350 x = 70 answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "if the average ( arithmetic mean ) of x and y is 60 , and z – x = 80 , what is the average of y and z ?\na ) 100 , b ) 120 , c ) 125 , d ) 115 , e ) 90",
+ "response": "\"x + y / 2 = 60 = > x + y = 120 x = z - 80 . . . sub this value z - 80 + y = 120 = > z + y = 200 = > z + y / 2 = 100 answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the speed of the stream if a canoe rows upstream at 3 km / hr and downstream at 12 km / hr\na ) 1 kmph , b ) 4 kmph , c ) 3 kmph , d ) 2 kmph , e ) 4.5 kmph",
+ "response": "\"sol . speed of stream = 1 / 2 ( 12 - 3 ) kmph = 4.5 kmph . answer e\"",
+ "system": ""
+ },
+ {
+ "prompt": "among all sales staff at listco corporation , college graduates and those without college degrees are equally represented . each sales staff member is either a level - 1 or level - 2 employee . level - 1 college graduates account for 15 % of listco ' s sales staff . listco employs 60 level - 1 employees , 30 of whom are college graduates . how many sales staff members without college degrees are level - 2 employees ?\na ) 46 , b ) 42 , c ) 56 , d ) 70 , e ) 58",
+ "response": "i ' m going in on this one . so let ' s say that we have the following so we know that l 1 = 60 and that c and l 1 = 0.15 x , we should set up a double set matrix btw but anyways , i ' m just explaining the point with this problem . now we are told that 0.15 x = 30 , therefore the grand total is 200 . now we know that l 2 is 200 - 60 = 140 . we also learn that c and no c are equally represented thus 100 each . therefore no c and no l 2 will be 100 - 30 = 70 . thus d is the correct answer choice",
+ "system": ""
+ },
+ {
+ "prompt": "if x is divided by 7 , the remainder is 5 . what is the remainder if 4 x is divided by 7 ?\na ) 1 , b ) 2 , c ) 4 , d ) 6 , e ) 8",
+ "response": "x = 7 q + 5 4 x = 7 * 4 q + 20 4 x = 7 * 4 q + 7 * 2 + 6 4 x = 7 ( 4 q + 2 ) + 6 4 x = 7 k + 6 ( k = 4 q + 2 ) answer d",
+ "system": ""
+ },
+ {
+ "prompt": "a rectangular field is to be fenced on three sides leaving a side of 20 feet uncovered . if the area of the field is 650 sq . feet , how many feet of fencing will be required ?\na ) 34 , b ) 40 , c ) 85 , d ) 88 , e ) none",
+ "response": "\"explanation we have : l = 20 ft and lb = 650 sq . ft . so , b = 32.5 ft . length of fencing = ( l + 2 b ) = ( 20 + 65 ) ft = 85 ft . answer c\"",
+ "system": ""
+ },
+ {
+ "prompt": "a straight line in the xy - plane has slope 2 . on this line the x - coordinate of the point is 300 and y - coordinate is 900 then what is the y intercept of the plane ?\na ) 200 , b ) 250 , c ) 100 , d ) 300 , e ) 220",
+ "response": "\"eq of line = y = mx + c m = 2 x = 300 y = 300 * 2 + c , substitute y by 900 as given in question . 900 = 600 + c , c = 200 correct option is a\"",
+ "system": ""
+ },
+ {
+ "prompt": "a cistern has a leak which would empty the cistern in 20 minutes . a tap is turned on which admits 2 liters a minute into the cistern , and it is emptied in 24 minutes . how many liters does the cistern hold ?\na ) 480 , b ) 240 , c ) 289 , d ) 270 , e ) 927",
+ "response": "\"1 / x - 1 / 20 = - 1 / 24 x = 120 120 * 2 = 240 answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "the cash difference between the selling prices of an article at a profit of 6 % and 8 % is rs 3 . the ratio of two selling prices is\na ) 51 : 52 , b ) 52 : 53 , c ) 53 : 54 , d ) 54 : 55 , e ) none of these",
+ "response": "\"explanation : let the cost price of article is rs . x required ratio = ( 106 % of x ) / ( 108 % of x ) = 106 / 108 = 53 / 54 = 53 : 54 . answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "there are 20 poles with a constant distance between each pole . a car takes 22 second to reach the 12 th pole . how much will it take to reach the last pole .\na ) 38 , b ) 41 , c ) 28 , d ) 88 , e ) 22",
+ "response": "\"assuming the car starts at the first pole . to reach the 12 th pole , the car need to travel 11 poles ( the first pole does n ' t count , as the car is already there ) . 11 poles 22 seconds 1 pole ( 22 / 11 ) seconds to reach the last ( 20 th ) pole , the car needs to travel 19 poles . 19 pole 19 x ( 22 / 11 ) seconds = 38 seconds answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "right now , al and eliot have bank accounts , and al has more money than eliot . the difference between their two accounts is 1 / 12 of the sum of their two accounts . if al ’ s account were to increase by 10 % and eliot ’ s account were to increase by 15 % , then al would have exactly $ 22 more than eliot in his account . how much money does eliot have in his account right now ?\na ) $ 146.6 , b ) $ 120 , c ) $ 180 , d ) $ 220 , e ) $ 260",
+ "response": "lets assume al have amount a in his bank account and eliot ' s bank account got e amount . we can form an equation from the first condition . a - e = 1 / 12 * ( a + e ) = = > 11 a = 13 e - - - - - - - - - - - - ( 1 ) second condition gives two different amounts , al ' s amount = 1.1 a and eliot ' s amount = 1.2 e 1.1 a = 22 + 1.15 e = = > 11 a = 220 + 11.5 e - - - - - - - ( 2 ) substituting ( 1 ) in ( 2 ) : 13 e = 220 + 11.5 e = = > 1.5 e = 220 or e = 440 / 3 = 146.6 a",
+ "system": ""
+ },
+ {
+ "prompt": "in a urban village of india named ` ` owlna ' ' , 80 % people have refrigerator , 82 % people have television , 70 % people got computers and 75 % got air - conditionor . how many people ( minimum ) got all these luxury .\na ) 3 % , b ) 8 % , c ) 7 % , d ) 10 % , e ) 15 %",
+ "response": "\"c 7 % 100 - [ ( 100 - 80 ) + ( 100 - 82 ) + ( 100 - 70 ) + ( 100 - 75 ) ] = 100 - ( 20 + 18 + 30 + 25 ) = 100 - 93\"",
+ "system": ""
+ },
+ {
+ "prompt": "a fruit seller had some apples . he sells 40 % apples and still has 420 apples . originally , he had :\na ) 588 apples , b ) 742 apples , c ) 750 apples , d ) 600 apples , e ) 700 apples",
+ "response": "\"suppose originally he had x apples . then , ( 100 - 40 ) % of x = 420 . 60 / 100 x x = 420 x = ( 420 x 100 ) / 60 = 700 . answer e\"",
+ "system": ""
+ },
+ {
+ "prompt": "pencils , pens and exercise books in a shop are in the ratio of 14 : 4 : 3 . if there are 140 pencils , the number of exercise books in the shop is :\na ) 30 , b ) 27 , c ) 35 , d ) 33 , e ) 37",
+ "response": "explanation : let pencils = 14 x , pens = 4 x & exercise books = 3 x . now , 14 x = 140 hence x = 10 number of exercise books = 3 x = 30 answer : a",
+ "system": ""
+ },
+ {
+ "prompt": "a sum was put at simple interest at a certain rate for 4 years had it been put at 2 % higher rate , it would have fetched 56 more . find the sum .\na ) 500 , b ) 600 , c ) 700 , d ) 800 , e ) none of these",
+ "response": "\"difference in s . i . = p × t / 100 ( r 1 − r 2 ) ⇒ 56 = p × 4 × 2 / 100 ( ∵ r 1 - r 2 = 2 ) ⇒ p = 56 × 100 / 4 × 2 = 700 answer c\"",
+ "system": ""
+ },
+ {
+ "prompt": "every day daniel drives 96 miles back from work . on sunday , daniel drove all the way back from work at a constant speed of x miles per hour . on monday , daniel drove the first 32 miles back from work at ( 2 x ) miles per hour , and the rest of the way at ( x / 2 ) miles per hour . the time it took daniel to drive back from work on monday is longer than the time it took him to drive back from work on sunday by what percent ?\na ) 10 % , b ) 20 % , c ) 30 % , d ) 40 % , e ) 50 %",
+ "response": "\"let ' s test x = 4 . . . . on sunday , daniel drove 96 miles at 4 miles / hour . d = ( r ) ( t ) 96 = ( 4 ) ( t ) 96 / 4 = 24 = t it takes 24 hours to drive home on monday , daniel drove the first 32 miles at ( 2 ) ( 4 ) = 8 miles / hour and the rest of the way ( 64 miles ) at 4 / 2 = 2 miles / hour d = ( r ) ( t ) 32 = ( 8 ) ( t ) 32 / 8 = 4 = t it takes 4 hours for the first part d = ( r ) ( t ) 64 = ( 2 ) ( t ) 64 / 2 = 32 = t it takes 32 hours for the second part total time to drive home on monday = 4 + 32 = 36 hours we ' re asked by what percent 36 hours is greater than 32 hours . 36 / 32 = 1.5 , so it is 50 % greater . e\"",
+ "system": ""
+ },
+ {
+ "prompt": "the true discount on a bill due 9 months hence at 16 % per annum is rs . 189 . the amount of the bill is :\na ) rs . 1386 , b ) rs . 1764 , c ) rs . 1575 , d ) rs . 2268 , e ) none of these",
+ "response": "\"solution 32.5 let p . w . be rs . x . then , s . i . on rs . x at 16 % for 9 months = rs . 189 . ∴ x 16 x 9 / 12 x 1 / 100 = 189 or x = 1575 . ∴ p . w . = rs . 1575 . ∴ sum due = p . w . + t . d . = rs . ( 1575 + 189 ) = rs . 1764 . answer b\"",
+ "system": ""
+ },
+ {
+ "prompt": "two employees m and n are paid a total of $ 583 per week by their employer . if m is paid 120 percent of the salary paid to n , how much is n paid per week ?\na ) $ 245 , b ) $ 255 , c ) $ 265 , d ) $ 275 , e ) $ 285",
+ "response": "\"1.2 n + n = 583 2.2 n = 583 n = 265 the answer is c .\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the dividend . divisor 15 , the quotient is 9 and the remainder is 5 ?\na ) a ) 140 , b ) b ) 134 , c ) c ) 148 , d ) d ) 158 , e ) e ) 160",
+ "response": "\"d = d * q + r d = 15 * 9 + 5 d = 135 + 5 d = 140 answer a\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many pounds of salt at 70 cents / lb must be mixed with 45 lbs of salt that costs 40 cents / lb so that a merchant will get 20 % profit by selling the mixture at 48 cents / lb ?\na ) 5 , b ) 9 , c ) 40 , d ) 50 , e ) 25",
+ "response": "\"selling price is 48 cents / lb for a 20 % profit , cost price should be 40 cents / lb ( cp * 6 / 5 = 48 ) basically , you need to mix 40 cents / lb ( salt 1 ) with 70 cents / lb ( salt 2 ) to get a mixture costing 45 cents / lb ( salt avg ) weight of salt 1 / weight of salt 2 = ( salt 2 - saltavg ) / ( saltavg - salt 1 ) = ( 70 - 45 ) / ( 45 - 40 ) = 5 / 1 we know that weight of salt 1 is 45 lbs . weight of salt 2 must be 9 lbs . answer ( b )\"",
+ "system": ""
+ },
+ {
+ "prompt": "when n is divided by 27 , the remainder is 4 . what is the remainder when n + 16 is divided by 7 ?\na ) 2 , b ) 3 , c ) 4 , d ) 5 , e ) 6",
+ "response": "\"assume n = 23 remainder ( n / 27 ) = 4 n + 16 = 39 remainder ( 39 / 7 ) = 4 option c\"",
+ "system": ""
+ },
+ {
+ "prompt": "working together at their respective constant rates , machine a and machine b can produce 600 units in 8 hours . working alone , machine b would complete that same output in 50 % more time . if machine a were to work on its own for an 8 - hour shift , what percent of the 600 unit total would it produce ?\na ) 25 , b ) 37 , c ) 50 , d ) 30 , e ) 75",
+ "response": "\"1 / a + 1 / b = 1 / t 1 / a + 1 / 12 = 1 / 8 ( 50 % more of 8 is 12 ) 1 / a = 1 / 24 machine a can produce 600 units in 24 hrs , so it can produce 600 * 8 / 24 = 200 units is 8 hrs . 200 is 30 % of 600 . d is the answer\"",
+ "system": ""
+ },
+ {
+ "prompt": "from the sale of sleeping bags , a retailer made a gross profit of 12 % of the wholesale cost . if each sleeping bag was sold for $ 28 , what was the wholesale cost per bag ?\na ) 3.0 , b ) 3.36 , c ) 24.64 , d ) 25.0 , e ) 31.36",
+ "response": "cost price * 1.12 = selling price - - > cost price * 1.12 = $ 28 - - > cost price = $ 25 . answer : d . actually even without any math only c and d make any sense , but since 24.64 * 1.12 wo n ' t be an integer ( $ 28 ) then only answer choice d remains .",
+ "system": ""
+ },
+ {
+ "prompt": "a certain car dealership sells economy cars , luxury cars , and sport utility vehicles . the ratio of economy to luxury cars is 5 : 2 . the ratio of economy cars to sport utility vehicles is 4 : 3 . what is the ratio of luxury cars to sport utility vehicles ?\na ) 9 : 8 , b ) 8 : 15 , c ) 3 : 2 , d ) 2 : 3 , e ) 1 : 2",
+ "response": "\"the ratio of economy to luxury cars is 5 : 2 - - > e : l = 5 : 2 = 20 : 8 . the ratio of economy cars to sport utility vehicles is 4 : 3 - - > e : s = 4 : 3 = 20 : 15 . thus , l : s = 8 : 15 . answer : b .\"",
+ "system": ""
+ },
+ {
+ "prompt": "nitin borrowed some money at the rate of 6 % p . a . for the first 3 years , 9 % p . a . for the next 5 years and 13 % p . a . for the period beyond 8 years . if the total interest paid by him at the end of 11 years is rs . 8160 , how much money did he borrow ?\na ) 8000 , b ) 2787 , c ) 27766 , d ) 9976 , e ) 21671",
+ "response": "let the sum be rs . x . then , [ ( x * 6 * 3 ) / 100 ] + [ ( x * 9 * 5 ) / 100 ] + [ ( x * 13 * 3 ) / 100 ] = 8160 18 x + 45 x + 39 x = ( 8160 * 100 ) 102 x = 816000 = > x = 8000 . answer : a",
+ "system": ""
+ },
+ {
+ "prompt": "find the average of all numbers between 6 and 36 which are divisible by 7\na ) 20 , b ) 15 , c ) 25 , d ) 30 , e ) 35",
+ "response": "explanation : average = ( 7 + 14 + 21 + 28 + 35 ) / 7 = 105 / 7 = 15 option b",
+ "system": ""
+ },
+ {
+ "prompt": "linda bought 3 notebooks at $ 1.20 each ; a box of pencils at $ 1.50 and a box of pens at $ 1.70 . how much did linda spend ?\na ) $ 6.80 , b ) $ 8.40 , c ) $ 7.70 , d ) $ 4.70 , e ) $ 3.90",
+ "response": "linda spent 1.20 × 3 = $ 3.60 on notebooks the total amount of money that linda spent is equal to 3.60 + 1.50 + 1.70 = $ 6.80 correct answer a",
+ "system": ""
+ },
+ {
+ "prompt": "a basketball is dropped from a height of 40 feet . if it bounces back up to a height that is exactly half of its previous height , and it stops bouncing after hitting the ground for the fourth time , then how many total feet will the ball have traveled after 3 full bounces .\na ) 50 , b ) 55 , c ) 110 , d ) 75 , e ) 80",
+ "response": "\"initial distance = 40 feet first bounce = 20 feet up + 20 feet down = 40 feet second bouche = 10 feet up + 10 feet down = 20 feet third bounce = 5 feet up and 5 feet down = 10 feet total distance covered = 40 + 40 + 20 + 10 = 110 answer is c\"",
+ "system": ""
+ },
+ {
+ "prompt": "evaluate : | 7 - 8 ( 3 - 12 ) | - | 5 - 11 | = ?\na ) 40 , b ) 50 , c ) 73 , d ) 70 , e ) 80",
+ "response": "\"according to order of operations , inner brackets first . hence | 7 - 8 ( 3 - 12 ) | - | 5 - 11 | = | 7 - 8 * ( - 9 ) | - | 5 - 11 | according to order of operations , multiplication within absolute value signs ( which may be considered as brackets when it comes to order of operations ) next . hence = | 7 + 72 | - | 5 - 11 | = | 79 | - | - 6 | = 79 - 6 = 73 correct answer c ) 73\"",
+ "system": ""
+ },
+ {
+ "prompt": "in a certain pond , 50 fish were caught , tagged , and returned to the pond . a few days later , 50 fish were caught again , of which 2 were found to have been tagged . if the percent of tagged fish in the second catch approximates the percent of tagged fish in the pond , what is the approximate number of fish in the pond ?\na ) 400 , b ) 625 , c ) 1,250 , d ) 2,500 , e ) 10,000",
+ "response": "\"total fish = x percentage of second catch = ( 2 / 50 ) * 100 = 4 % so , x * 4 % = 50 x = 1250 answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "the product of 4 consecutive even numbers is always divisible by :\na ) 384 , b ) 350 , c ) 400 , d ) 200 , e ) 250",
+ "response": "the product of 4 consecutive numbers is always divisible by 4 ! . since , we have 4 even numbers , we have an additional 2 available with each number . now , using both the facts , we can say that the product of 4 consecutive even numbers is always divisible by , 2 ^ 4 * 4 ! 16 * 24 = 384 answer a",
+ "system": ""
+ },
+ {
+ "prompt": "a certain lab experiments with white and brown mice only . in one experiment , 2 / 3 of the mice are white . if there are 14 white mice in the experiment , how many brown mice are in the experiment ?\na ) 12 , b ) 8 , c ) 28 , d ) 7 , e ) 27",
+ "response": "let total number of mice = m number of white mice = 2 / 3 m = 14 m = 21 number of brown mice = 1 / 3 m = 1 / 3 * 21 = > brown mice = 7 answer d",
+ "system": ""
+ },
+ {
+ "prompt": "a cube of edge 8 cm is cut into cubes each of edge 1 cm . the ratio of the total surface area of one of the small cubes to that of the large cube is equal to :\na ) 1 : 25 , b ) 1 : 22 , c ) 1 : 52 , d ) 1 : 64 , e ) none",
+ "response": "\"sol . required ratio = 6 * 1 * 1 / 6 * 8 * 8 = 1 / 64 = 1 : 64 . answer d\"",
+ "system": ""
+ },
+ {
+ "prompt": "a can give b 70 meters start and c 200 meters start in a kilometer race . how much start can b give c in a kilometer race ?\na ) 139.78 , b ) 139.13 , c ) 139.22 , d ) 111.0 , e ) 111.12",
+ "response": "\"a runs 1000 m while b runs 930 m and c runs 800 m . the number of meters that c runs when b runs 1000 m , = ( 1000 * 800 ) / 930 = 860.21 m . b can give c = 1000 - 860.21 = 139.78 m . answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "what least number should be added to 1053 , so that the sum is completely divisible by 23\na ) a ) 4 , b ) b ) 1 , c ) c ) 2 , d ) d ) 3 , e ) e ) 5",
+ "response": "\"explanation : ( 1053 / 23 ) gives remainder 18 18 + 5 = 23 , so we need to add 5 answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the sum of the integers 45 through 175 inclusive ?\na ) 12,295 , b ) 13,000 , c ) 14,300 , d ) 14,410 , e ) 28,820",
+ "response": "\"sum of n consecutive positive integers = n ( n + 1 ) / 2 . . in one case n = 44 and other 175 . . . subtract the sum to get the answer sum of first 175 + ive numbers = 175 * 176 / 2 = 15400 . . sum of first 45 + i ' ve numbers = 45 * 44 / 2 = 990 . . answer = 15400 - 990 = 14410 answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "the circumferences of two circles are 528 meters and 704 meters . find the difference between the areas of the larger and the smaller circles ?\na ) 29963 sq m , b ) 28937 sq m , c ) 43162 sq m , d ) 27688 sq m , e ) 17248 sq m",
+ "response": "\"let the radii of the smaller and the larger circles be s m and l m respectively . 2 ∏ s = 528 and 2 ∏ l = 704 s = 528 / 2 ∏ and l = 704 / 2 ∏ difference between the areas = ∏ l ^ 2 - ∏ s ^ 2 = ∏ { 264 ^ 2 / ∏ ^ 2 - 352 ^ 2 / ∏ ^ 2 } = 264 ^ 2 / ∏ - 352 ^ 2 / ∏ = ( 264 - 352 ) ( 264 + 352 ) / ∏ = ( 88 ) ( 616 ) / ( 22 / 7 ) = 17248 sq m answer : e\"",
+ "system": ""
+ },
+ {
+ "prompt": "if a randomly selected non - negative single digit integer is added to { 2 , 3 , 4 , 9 } . what is the probability that the median of the set will increase but the range still remains the same ?\na ) 0.2 , b ) 0.3 , c ) 0.4 , d ) 0.5 , e ) 0.6",
+ "response": "\"we are selecting from non - negative single digit integers , so from { 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 } . these 10 digits represent the total number of outcomes . hence , the total number of outcomes is 10 . we need to find the probability that the median of the set will increase but the range still remains the same . the median of the set is ( 3 + 4 ) / 2 = 3.5 , thus the number selected must be 4 or greater . for the range to remain the same , the number must be between 2 and 9 inclusive . to satisfy both conditions , the number selected must be 4 , 5 , 6 , 7 , 8 , or 9 . the probability is 6 / 10 = 0.6 the answer is e .\"",
+ "system": ""
+ },
+ {
+ "prompt": "a man rows his boat 90 km downstream and 55 km upstream , taking 3 hours each time . find the speed of the stream ?\na ) 76 kmph , b ) 6 kmph , c ) 14 kmph , d ) 8 kmph , e ) 4 kmph",
+ "response": "\"speed downstream = d / t = 90 / ( 3 ) = 30 kmph speed upstream = d / t = 55 / ( 3 ) = 18 kmph the speed of the stream = ( 30 - 18 ) / 2 = 6 kmph answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "in a class , 12 students like to play basketball and 8 like to play cricket . 3 students like to play on both basketball and cricket . how many students like to play basketball or cricket or both ?\na ) 12 , b ) 15 , c ) 17 , d ) 18 , e ) 22",
+ "response": "\"draw a venn diagram yourself ! b + c - bc = number of students that play either basketball or cricket 12 + 8 - 3 = 17 c )\"",
+ "system": ""
+ },
+ {
+ "prompt": "the area of one square is x ^ 2 + 12 x + 36 and the area of another square is 4 x ^ 2 − 12 x + 9 . if the sum of the perimeters of both squares is 64 , what is the value of x ?\na ) 0 , b ) 4.3 , c ) 2.5 , d ) 4.67 , e ) 10",
+ "response": "spotting the pattern of equations both are in form of ( x + c ) ^ 2 so a 1 = ( x + 6 ) ^ 2 a 2 = ( 2 x - 3 ) ^ 2 l 1 = x + 6 l 2 = 2 x - 3 p 1 = 4 ( x + 6 ) p 2 = 4 ( 2 x - 3 ) p 1 + p 2 = 64 4 ( x + 6 ) + 4 ( 2 x - 3 ) = 64 . . . . . . . . . . . . . . > x = 4.3 answer : b",
+ "system": ""
+ },
+ {
+ "prompt": "the length of the bridge , which a train 160 meters long and travelling at 45 km / hr can cross in 30 seconds , is ?\na ) 766 m , b ) 156 m , c ) 215 m , d ) 156 m , e ) 156 m",
+ "response": "\"speed = ( 45 * 5 / 18 ) m / sec = ( 25 / 2 ) m / sec . time = 30 sec . let the length of bridge be x meters . then , ( 160 + x ) / 30 = 25 / 2 = = > 2 ( 160 + x ) = 750 = = > x = 215 m . answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many 4 - digit positive integers are there , where each digit is positive , and no 4 adjacent digits are same ?\na ) 1236 , b ) 3024 , c ) 4096 , d ) 4608 , e ) 6561",
+ "response": "\"first digit . . 9 posibilities second digit , 8 possibilities third digit , 7 possibilities fourth digit , 6 possibilities . 9 * 8 * 7 * 6 = 3024 . b\"",
+ "system": ""
+ },
+ {
+ "prompt": "if 2 men or 3 women can reap a field in 5 days how long will 5 men and 6 women take to reap it ?\na ) 2 / 24 , b ) 6 / 18 , c ) 2 / 22 , d ) 5 / 12 , e ) 9 / 10",
+ "response": "\"explanation : 2 men reap 2 / 5 field in 1 day 1 man reap 1 / ( 2 x 5 ) 3 women reap 1 / 43 field in 1 day 1 woman reap 1 / ( 5 x 3 ) 5 men and 6 women reap ( 5 / ( 2 x 5 ) + 6 / ( 3 x 5 ) ) = 9 / 10 in 1 day 5 men and 6 women will reap the field in 9 / 10 days answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "a train 120 m long is running with a speed of 66 km / hr . in what time will it pass a man who is roller skating at 12 km / hr in the direction opposite to that in which the train is going ?\na ) 4.92 , b ) 6.92 , c ) 7.92 , d ) 4.92 , e ) 2.92",
+ "response": "\"speed of train relative to man = 66 + 12 = 78 km / hr . = 78 * 5 / 18 = 65 / 3 m / sec . time taken to pass the man = 150 * 3 / 65 = 6.92 sec . answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many different positive integers exist between 10 ^ 6 and 10 ^ 7 , the sum of whose digits is equal to 2 ?\na ) 5 , b ) 6 , c ) 7 , d ) 8 , e ) 18",
+ "response": "\"total cases = > 1000000 = > 6 cases for 1 being present at any of the 6 zero and last case 2000000 hence & cases answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "find the average of all numbers between 3 and 86 which are divisible by 5\na ) 15 , b ) 20 , c ) 25 , d ) 30 , e ) 45",
+ "response": "\"explanation : average = ( 5 + 10 + 15 + 20 + 25 + 30 + 35 + 40 + 45 + 50 + 55 + 60 + 65 + 70 + 75 + 80 + 85 ) / 17 = 765 / 17 = 45 answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the remainder when 8 ^ 1 + 8 ^ 2 + 8 ^ 3 + . . . + 8 ^ 9 is divided by 2 ?\na ) 4 , b ) 3 , c ) 2 , d ) none of the above , e ) 5",
+ "response": "\"notice that in the brackets we have the sum of 9 even multiples of 2 , which yields remainder of 0 upon division by 2 . answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "milk contains 5 % water . what content of pure milk should be added to 10 liters of milk to reduce this to 2 % ?\na ) 10 liters , b ) 15 liters , c ) 20 liters , d ) 18 liters , e ) 22 liters",
+ "response": "\"quantity of water in 10 liters = 5 % of 10 liters = 0.5 liters let x liters of pure milk be added . then , 0.5 / ( 10 + x ) = 2 / 100 2 x = 30 x = 15 liters answer is b\"",
+ "system": ""
+ },
+ {
+ "prompt": "a student scored an average of 75 marks in 3 subjects : physics , chemistry and mathematics . if the average marks in physics and mathematics is 90 and that in physics and chemistry is 70 , what are the marks in physics ?\na ) 86 , b ) 16 , c ) 76 , d ) 95 , e ) 26",
+ "response": "\"given m + p + c = 75 * 3 = 225 - - - ( 1 ) m + p = 90 * 2 = 180 - - - ( 2 ) p + c = 70 * 2 = 140 - - - ( 3 ) where m , p and c are marks obtained by the student in mathematics , physics and chemistry . p = ( 2 ) + ( 3 ) - ( 1 ) = 180 + 140 - 225 = 95 answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "a question paper has 2 parts , a & b , each containing 10 questions . if a student has to choose 8 from part a & 5 from part b , in how many ways can he choose the questions ?\na ) 1100 , b ) 1200 , c ) 1235 , d ) 1354 , e ) 1140",
+ "response": "there 10 questions in part a out of which 8 question can be chosen as = 10 c 8 . similarly , 5 questions can be chosen from 10 questions of part b as = 10 c 5 . hence , total number of ways , = 10 c 8 * 10 c 5 = [ 10 ! / ( 2 ! 8 ! ) ] * [ 10 ! / ( 5 ! * 5 ) ] = { 10 * 9 / 2 } * { 10 * 9 * 8 * 7 * 6 / ( 5 * 4 * 3 * 2 * 1 ) } = 1140 . e",
+ "system": ""
+ },
+ {
+ "prompt": "16 is what % of 90 ?\na ) 16 , b ) 17.78 , c ) 17 , d ) 18.5 , e ) 18.23",
+ "response": "we assume that 90 is 100 % assume ' x ' is value we looking for here , 90 = 100 % and x % = 16 therefore , 100 / x = 90 / 16 100 / x = 5.625 x = 17.78 b",
+ "system": ""
+ },
+ {
+ "prompt": "marla starts running around a circular track at the same time nick starts walking around the same circular track . marla completes 28 laps around the track per hour and nick completes 13 laps around the track per hour . how many minutes after marla and nick begin moving will marla have completed 4 more laps around the track than nick ?\na ) 5 , b ) 8 , c ) 12 , d ) 16 , e ) 20",
+ "prompt": "marla starts running around a circular track at the same time nick starts walking around the same circular track . marla completes 28 laps around the track per hour and nick completes 13 laps around the track per hour . how many minutes after marla and nick begin moving will marla have completed 4 more laps around the track than nick ?\na ) 5 , b ) 8 , c ) 12 , d ) 16 , e ) 20",
449
+ "response": "\"maria ' s rate - 28 laps per hour - - > 28 / 60 laps / min nick ' s rate - 13 laps per hour - - > 13 / 60 laps / min lets set equations : 28 / 60 * t = 4 ( since maria had to run 4 laps before nick would start ) 13 / 60 * t = 0 ( hick has just started and has n ' t run any lap yet ) ( 28 / 60 - 13 / 60 ) * t = 4 - 0 ( since nick was chasing maria ) t = 16 min needed maria to run 4 laps answer : d\"",
450
+ "system": ""
451
+ },
452
+ {
453
+ "prompt": "what is the least common multiple of 15 , 18 , and 34 ?\na ) 60 , b ) 120 , c ) 240 , d ) 360 , e ) 1530",
454
+ "response": "\"let us first write the numbers in the form of prime factors : 15 = 3 * 5 18 = 2 * 3 ^ 2 34 = 2 * 17 ^ 1 the lcm would be the largest powers of the prime numbers from all these three numbers . hence lcm = 1530 option e\"",
455
+ "system": ""
456
+ },
457
+ {
458
+ "prompt": "divide rs . 800 among a , b and c so that a receives 1 / 3 as much as b and c together and b receives 2 / 3 as a and c together . a ' s share is ?\na ) s . 800 , b ) s . 200 , c ) s . 600 , d ) s . 500 , e ) s . 900",
459
+ "response": "\"a + b + c = 800 a = 1 / 3 ( b + c ) ; b = 2 / 3 ( a + c ) a / ( b + c ) = 1 / 3 a = 1 / 4 * 800 = > 200 answer : b\"",
460
+ "system": ""
461
+ },
462
+ {
463
+ "prompt": "what is the smallest number which when increased by 3 is divisible by 12 , 15 , 35 , and 40 ?\na ) 837 , b ) 947 , c ) 1027 , d ) 1155 , e ) 1231",
464
+ "response": "\"factor each of the numbers 8 , 15 , 35 , and 40 into primes : 12 = 2 * 2 * 3 ; 15 = 3 * 5 ; 35 = 5 * 7 ; 40 = 2 * 2 * 2 * 5 the smallest number divisible by all of them is thus 2 * 2 * 2 * 3 * 5 * 7 = 840 837 + 3 = 840 a\"",
465
+ "system": ""
466
+ },
467
+ {
468
+ "prompt": "if the range w of the 6 numbers 4 , 314 , 710 and x is 12 , what is the difference between the greatest possible value of x and least possible value of x ?\na ) 0 , b ) 2 , c ) 12 , d ) 13 , e ) 15",
469
+ "response": "the range w of a set is the difference between the largest and smallest elements of a set . without x , the difference between the largest and smallest elements of a set is 14 - 3 = 11 < 12 , which means that in order 12 to be the range of the set x must be either the smallest element so that 14 - x = 12 - - - > x = 2 or x must the largest element so that x - 3 = 12 - - > x = 15 . the the difference between the greatest possible value of x and least possible value of x is 15 - 2 = 13 . answer : d .",
470
+ "system": ""
471
+ },
472
+ {
473
+ "prompt": "in the xy - coordinate plane , the graph of y = - x ^ 2 + 9 intersects line l at ( p , 5 ) and ( t , - 8 ) . what is the least possible value of the slope of line l ?\na ) - 6.5 , b ) 2 , c ) - 2 , d ) - 6 , e ) - 10",
474
+ "response": "\"we need to find out the value of p and l to get to the slope . line l and graph y intersect at point ( p , 5 ) . hence , x = p and y = 5 should sactisfy the graph . soliving 5 = - p 2 + 9 p 2 = 4 p = + or - 2 simillarly point ( t , - 8 ) should satisfy the equation . hence x = t and y = - 8 . - 7 = - t 2 + 9 t = + or - 4 considering p = - 2 and t = 4 , the least slope is ( - 8 - 5 ) / ( 4 - 2 ) = - 6.5 imo option a is correct answer .\"",
475
+ "system": ""
476
+ },
477
+ {
478
+ "prompt": "how many 9 - digits number are palindromic numbers ? a palindromic number reads the same forward and backward , example 123454321 .\na ) 100 , b ) 610 , c ) 729 , d ) 900 , e ) 90000",
479
+ "response": "take the task of building palindromes and break it intostages . stage 1 : select the 9 th digit we can choose 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , or 9 so , we can complete stage 1 in 9 ways stage 2 : select the 8 th , 7 th , 6 th , 5 th digit we can choose 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , or 9 so , we can complete stage 2 in 10 ways important : at this point , the remaining digits are alreadylocked in . stage 3 : select the 4 th , 3 rd , 2 nd , 1 st digit so , we can complete this stage in 1 way . by thefundamental counting principle ( fcp ) , we can complete all 5 stages ( and thus build a 9 - digit palindrome ) in ( 9 ) ( 10 ) ( 10 ) ( 10 ) ( 10 ) ( 1 ) ( 1 ) ( 1 ) ( 1 ) ways ( = 90000 ways ) answer : e",
480
+ "system": ""
481
+ },
482
+ {
483
+ "prompt": "if taxi fares were $ 1.00 for the first 1 / 5 mile and $ 0.30 for each 1 / 5 mile there after , then the taxi fare for a 3 - mile ride was\na ) $ 1.56 , b ) $ 2.40 , c ) $ 3.80 , d ) $ 5.20 , e ) $ 2.80",
484
+ "response": "\"in 3 miles , initial 1 / 5 mile charge is $ 1 rest of the distance = 3 - ( 1 / 5 ) = 14 / 5 rest of the distance charge = 14 ( 0.3 ) = $ 4.2 ( as the charge is 0.3 for every 1 / 5 mile ) = > total charge for 3 miles = 1 + 4.2 = 5.2 answer is d .\"",
485
+ "system": ""
486
+ },
487
+ {
488
+ "prompt": "find the largest 6 digit number which is exactly divisible by 88 ?\na ) 998765 , b ) 998907 , c ) 999944 , d ) 999954 , e ) 999990",
489
+ "response": "largest 6 digit number is 999999 after doing 999999 ÷ 88 we get remainder 55 hence largest 6 digit number exactly divisible by 88 = 999999 - 55 = 999944 c",
490
+ "system": ""
491
+ },
492
+ {
493
+ "prompt": "if n divided by 7 has a remainder of 2 , what is the remainder when 6 times n is divided by 7 ?\na ) 1 , b ) 2 , c ) 3 , d ) 5 , e ) 6",
494
+ "response": "\"as per question = > n = 7 p + 2 for some integer p hence 6 n = > 42 q + 12 = > remainder = > 12 for some integer q alternatively = > n = 2 > 6 n = > 12 = > 12 divided by 7 will leave a remainder 5 hence d\"",
495
+ "system": ""
496
+ },
497
+ {
498
+ "prompt": "in a certain egg - processing plant , every egg must be inspected , and is either accepted for processing or rejected . for every 96 eggs accepted for processing , 4 eggs are rejected . if , on a particular day , 12 additional eggs were accepted , but the overall number of eggs inspected remained the same , the ratio of those accepted to those rejected would be 99 to 1 . how many w eggs does the plant process per day ?\na ) 100 , b ) 300 , c ) 400 , d ) 3,000 , e ) 4,000",
499
+ "response": "\"straight pluggin in for me . as usual , i started with c and got the answer . lets ' back calculate and see what we get let us consider eggs processed each day to be 400 so initial ratio of eggs processed and rejected is 96 : 4 or 24 : 1 so out of 400 eggs , there will be 384 eggs processed and 16 rejected . now if the no . of eggs inspected remain and 12 more eggs get accepted that means there w = 384 + 12 = 396 eggs accepted and 4 rejected . . . and the ratio will be 99 : 1 bingo . . . this is what the questions says . . . . its always a good idea to start with c .\"",
500
+ "system": ""
501
+ },
502
+ {
503
+ "prompt": "if 0.5 % of a = 80 paise , then the value of a is ?\na ) rs . 170 , b ) rs . 160 , c ) rs . 1.70 , d ) rs . 4.25 , e ) none",
504
+ "response": "\"answer ∵ 0.5 / 100 of a = 80 / 100 ∴ a = rs . ( 80 / 0.5 ) = rs . 160 correct option : b\"",
505
+ "system": ""
506
+ },
507
+ {
508
+ "prompt": "wink , inc . follows a certain procedure that requires two tasks to be finished independently in order for a job to be done . on any given day , there is a 7 / 8 probability that task 1 will be completed on time , and a 1 / 5 probability that task 2 will be completed on time . on a certain day , what is the probability that task 1 will be completed on time , but task 2 will not ?\na ) 1 / 20 , b ) 3 / 40 , c ) 13 / 40 , d ) 7 / 10 , e ) 13 / 22",
509
+ "response": "\"p ( 1 and not 2 ) = 7 / 8 * ( 1 - 1 / 5 ) = 7 / 10 . answer : d .\"",
510
+ "system": ""
511
+ },
512
+ {
513
+ "prompt": "each of the integers from 0 to 9 , inclusive , is written on a separate slip of blank paper and the ten slips are dropped into a hat . if 4 of the slips are the drawn , without replacement , what is the probability that all 4 have a odd number written on it ?\na ) 1 / 12 , b ) 1 / 10 , c ) 1 / 8 , d ) 1 / 42 , e ) 5 / 9",
514
+ "response": "\"key is that there is no replacement , so each successive choice will become more skewed towards picking a neg ( i . e . the pool of positives decreases , while the pool of negatives stay the same ) p ( + on 1 st pick ) = 5 / 10 p ( + on 2 nd pick ) = 4 / 9 p ( + on 3 rd pick ) = 3 / 8 p ( + on 4 rd pick ) = 2 / 7 5 / 10 * 4 / 9 * 3 / 8 * 2 / 7 = 1 / 42 d\"",
515
+ "system": ""
516
+ },
517
+ {
518
+ "prompt": "in a 1000 m race , a beats b by 200 meters or 25 seconds . find the speed of b ?\na ) 5 m / s , b ) 8 m / s , c ) 9 m / s , d ) 4 m / s , e ) 2 m / s",
519
+ "response": "b 8 m / s since a beats b by 200 m or 25 seconds , i t implies that b covers 200 m in 25 seconds . hence speed of b = 200 / 25 = 8 m / s .",
520
+ "system": ""
521
+ },
522
+ {
523
+ "prompt": "a ’ s speed is 20 / 19 times that of b . if a and b run a race , what part of the length of the race should a give b as a head start , so that the race ends in a dead heat ?\na ) 1 / 19 , b ) 3 / 19 , c ) 1 / 10 , d ) 1 / 20 , e ) 3 / 10",
524
+ "response": "\"let d be the full distance . let x be the fraction of the distance that b runs . let v be the speed at which b runs . the time should be the same for both runners . time = d / ( 20 v / 19 ) = xd / v ( 19 / 20 ) * d / v = x * d / v x = 19 / 20 b should have a head start of 1 / 20 of the full distance . the answer is d .\"",
525
+ "system": ""
526
+ },
527
+ {
528
+ "prompt": "p and q started a business investing rs . 85,000 and rs . 35,000 respectively . in what ratio the profit earned after 2 years be divided between p and q respectively ?\na ) 17 : 6 , b ) 17 : 0 , c ) 17 : 7 , d ) 17 : 2 , e ) 17 : 3",
529
+ "response": "p : q = 85000 : 35000 = 17 : 7 . answer : c",
530
+ "system": ""
531
+ },
532
+ {
533
+ "prompt": "the product of two numbers is 266 and their difference is 5 . what is the bigger number ?\na ) 13 , b ) 15 , c ) 19 , d ) 24 , e ) none of these",
534
+ "response": "\"explanation : let the two numbers be a and b , here a > b ab = 266 b = 266 / a - - - - - - - - - - - - - - - - - ( i ) given , a – b = 5 - - - - - - - - - - - ( ii ) substitute from ( i ) in ( ii ) , we get a – 266 / a = 5 a 2 – 5 a + 266 = 0 ( a – 19 ) ( a – 14 ) = 0 therefore , a = 19 or a = 14 hence , bigger number = a = 19 answer : c\"",
535
+ "system": ""
536
+ },
537
+ {
538
+ "prompt": "vijay sells a cupboard at 12 % below cost price . had he got rs . 1500 more , he would have made a profit of 12 % . what is the cost price of the cupboard ?\na ) 7450 , b ) 14900 , c ) 6250 , d ) 6000 , e ) none of these",
539
+ "response": "\"explanation : cost price = 1500 / ( 0.12 + 0.12 ) = 1500 / 0.24 = rs . 6250 answer c\"",
540
+ "system": ""
541
+ },
542
+ {
543
+ "prompt": "in the junior basketball league there are 18 teams , 2 / 3 of them are bad and ½ are rich . what ca n ' t be the number of teams that are rich and bad ?\na ) 4 . , b ) 6 . , c ) 7 . , d ) 8 . , e ) 10",
544
+ "response": "otal teams = 18 bad teams = ( 2 / 3 ) * 18 = 12 rich teams = 9 so maximum value that the both rich and bad can take will be 9 . so e = 10 can not be that value . answer : e",
545
+ "system": ""
546
+ },
547
+ {
548
+ "prompt": "if i walk at 5 km / h , i miss the bus by 9 minutes . if i walk at 3 km / h , i reach 6 minutes before the arrival of the bus . how far i walk to reach the bus stand ?\na ) 1.99 km , b ) 1.55 km , c ) 1.82 km , d ) 2.87 km , e ) 1.87 km",
549
+ "response": "\"d = product of speed difference of time / difference of speed d = 5 x 3 / 60 [ 9 â ˆ ’ ( â ˆ ’ 6 ) / 5 - 3 ] [ here , â € “ ve sign indicates before the schedule time ] â ‡ ’ d = 1.87 km answer e\"",
550
+ "system": ""
551
+ },
552
+ {
553
+ "prompt": "3 * 12 + 3 * 13 + 3 * 16 + 11 = ?\na ) 122 , b ) 126 , c ) 134 , d ) 148 , e ) 151",
554
+ "response": "3 * 12 + 3 * 13 + 3 * 16 + 11 = 36 + 39 + 48 + 11 = 134 the answer is c .",
555
+ "system": ""
556
+ },
557
+ {
558
+ "prompt": "37 . if the cost price of 15 tables be equal to the selling price of 20 tables , the loss per cent is ?\na ) 20 % , b ) 30 % , c ) 25 % , d ) 37.5 % , e ) 38 %",
559
+ "response": "let c . p . of each table = re . 1 c . p . of 20 tables = rs . 20 s . p . of 20 table = c . p . of 15 tables = rs . 15 loss = ( 5 / 20 ) x 100 % = 25 % answer : c",
560
+ "system": ""
561
+ },
562
+ {
563
+ "prompt": "a man can row his boat with the stream at 25 km / h and against the stream in 13 km / h . the man ' s rate is ?\na ) 1 kmph , b ) 2 kmph , c ) 6 kmph , d ) 8 kmph , e ) 3 kmph",
564
+ "response": "\"ds = 25 us = 13 s = ? s = ( 25 - 13 ) / 2 = 6 kmph answer : c\"",
565
+ "system": ""
566
+ },
567
+ {
568
+ "prompt": "in what ratio must tea at rs . 65 per kg be mixed with tea at rs . 70 per kg so that the mixture must be worth rs . 6 per kg ?\na ) 1 : 1 , b ) 3 : 2 , c ) 4 : 3 , d ) 5 : 3 , e ) none",
569
+ "response": "\"required ratio = 500 : 500 = 1 : 1 answer a\"",
570
+ "system": ""
571
+ },
572
+ {
573
+ "prompt": "in a certain animal population , for each of the first 3 months of life , the probability that an animal will die during that month is 1 / 10 . for a group of 700 newborn members of the population , approximately how many would be expected to survive the first 3 months of life ?\na ) 511 , b ) 546 , c ) 552 , d ) 562 , e ) 570",
574
+ "response": "\"number of newborns that can die in first month = 1 / 10 * 700 = 70 survived = 630 number of newborns that can die in second month = 1 / 10 * 630 = 63 survived = 567 number of newborns that can die in third month = 1 / 10 * 567 = 56 survived = 511 answer : a\"",
575
+ "system": ""
576
+ },
577
+ {
578
+ "prompt": "if 60 percent of 500 is 50 percent of x , then x = ?\na ) 600 , b ) 620 , c ) 650 , d ) 700 , e ) 720",
579
+ "response": "\"0.6 * 500 = 0.5 * x x = 6 / 5 * 500 = 600\"",
580
+ "system": ""
581
+ },
582
+ {
583
+ "prompt": "a train running at the speed of 60 km / hr crosses a pole in 12 seconds . what is the length of the train ?\na ) 187 m , b ) 278 m , c ) 876 m , d ) 200 m , e ) 267 m",
584
+ "response": "\"speed = ( 60 * 5 / 18 ) m / sec = ( 50 / 3 ) m / sec length of the train = ( speed x time ) = ( 50 / 3 * 12 ) m = 200 m . answer : d\"",
585
+ "system": ""
586
+ },
587
+ {
588
+ "prompt": "when 242 is divided by a certain divisor the remainder obtained is 6 . when 698 is divided by the same divisor the remainder obtained is 13 . however , when the sum of the two numbers 242 and 698 is divided by the divisor , the remainder obtained is 5 . what is the value of the divisor ?\na ) 11 , b ) 14 , c ) 18 , d ) 23 , e ) none of these",
589
+ "response": "\"let that divisor be x since remainder is 6 or 13 it means divisor is greater than 13 . now 242 - 6 = 236 = kx ( k is an integer and 234 is divisble by x ) similarly 698 - 13 = 685 = lx ( l is an integer and 689 is divisible by x ) adding both 698 and 242 = ( 236 + 685 ) + 6 + 13 = x ( k + l ) + 19 when we divide this number by x then remainder will be equal to remainder of ( 19 divided by x ) = 5 hence x = 19 - 5 = 14 hence b\"",
590
+ "system": ""
591
+ },
592
+ {
593
+ "prompt": "two trains are moving in the same direction at 72 kmph and 36 kmph . the faster train crosses a man in the slower train in 15 seconds . find the length of the faster train ?\na ) 270 , b ) 277 , c ) 187 , d ) 257 , e ) 150",
594
+ "response": "\"relative speed = ( 72 - 36 ) * 5 / 18 = 2 * 5 = 10 mps . distance covered in 15 sec = 15 * 10 = 150 m . the length of the faster train = 150 m . answer : e\"",
595
+ "system": ""
596
+ },
597
+ {
598
+ "prompt": "the speed at which a man can row a boat in still water is 20 kmph . if he rows downstream , where the speed of current is 3 kmph , what time will he take to cover 60 metres ?\na ) 16 seconds , b ) 76 seconds , c ) 26 seconds , d ) 9.4 seconds , e ) 18 seconds",
599
+ "response": "\"speed of the boat downstream = 20 + 3 = 23 kmph = 23 * 5 / 18 = 115 / 18 m / s hence time taken to cover 60 m = 60 * 18 / 115 = 9.4 seconds . answer : d\"",
600
+ "system": ""
601
+ },
602
+ {
603
+ "prompt": "how many trucks are there if each truck carrying 70 packages and total of 490 packages ?\na ) a ) 7 , b ) b ) 6 , c ) c ) 9 , d ) d ) 11 , e ) e ) none of the above",
604
+ "response": "sol . total packages 490 each truck carries 70 packages = 490 / 70 = 7 answer : a",
605
+ "system": ""
606
+ },
607
+ {
608
+ "prompt": "tough and tricky questions : statistics . set x consists of prime numbers { 3 , 11 , 7 , a , 17 , 19 } . if integer y represents the product of all elements in set x and if 11 y is an even number , what is the range of set x ?\na ) 14 , b ) 16 , c ) 17 , d ) 20 , e ) 26",
609
+ "response": "since 11 y = even therefore y has to beevensince 11 is a odd integer ( even * odd = even ) similarly , y is the product of all integers in set x but all integers in set x are odd except the unknown a and since x contains only prime numbers , a has to equal to 2 . . . ( 2 is the only even prime number and the product of all prime numbers in set x has to be even , even * odd = even ) since you know value of a you can calculate the range = largest integer in the set minus smallest integer in the set = 19 - 2 = 17 answer is c",
610
+ "system": ""
611
+ }
612
+ ]
613
+ }
Math_QA/group_09/tokenizer/added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
Math_QA/group_09/tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0]['role'] == 'system' %}
4
+ {{- messages[0]['content'] }}
5
+ {%- else %}
6
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
7
+ {%- endif %}
8
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
9
+ {%- for tool in tools %}
10
+ {{- "\n" }}
11
+ {{- tool | tojson }}
12
+ {%- endfor %}
13
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
14
+ {%- else %}
15
+ {%- if messages[0]['role'] == 'system' %}
16
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
17
+ {%- else %}
18
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
19
+ {%- endif %}
20
+ {%- endif %}
21
+ {%- for message in messages %}
22
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
23
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
24
+ {%- elif message.role == "assistant" %}
25
+ {{- '<|im_start|>' + message.role }}
26
+ {%- if message.content %}
27
+ {{- '\n' + message.content }}
28
+ {%- endif %}
29
+ {%- for tool_call in message.tool_calls %}
30
+ {%- if tool_call.function is defined %}
31
+ {%- set tool_call = tool_call.function %}
32
+ {%- endif %}
33
+ {{- '\n<tool_call>\n{"name": "' }}
34
+ {{- tool_call.name }}
35
+ {{- '", "arguments": ' }}
36
+ {{- tool_call.arguments | tojson }}
37
+ {{- '}\n</tool_call>' }}
38
+ {%- endfor %}
39
+ {{- '<|im_end|>\n' }}
40
+ {%- elif message.role == "tool" %}
41
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
42
+ {{- '<|im_start|>user' }}
43
+ {%- endif %}
44
+ {{- '\n<tool_response>\n' }}
45
+ {{- message.content }}
46
+ {{- '\n</tool_response>' }}
47
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
48
+ {{- '<|im_end|>\n' }}
49
+ {%- endif %}
50
+ {%- endif %}
51
+ {%- endfor %}
52
+ {%- if add_generation_prompt %}
53
+ {{- '<|im_start|>assistant\n' }}
54
+ {%- endif %}
Math_QA/group_09/tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
Math_QA/group_09/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "clean_up_tokenization_spaces": false,
199
+ "eos_token": "<|im_end|>",
200
+ "errors": "replace",
201
+ "extra_special_tokens": {},
202
+ "model_max_length": 131072,
203
+ "pad_token": "<|endoftext|>",
204
+ "split_special_tokens": false,
205
+ "tokenizer_class": "Qwen2Tokenizer",
206
+ "unk_token": null
207
+ }
Math_QA/group_09/tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
README.md ADDED
@@ -0,0 +1,223 @@
1
+ # Qwen2.5-1.5B Math LoRA Collection
2
+
3
+ This directory aggregates all LoRA checkpoints produced by the `train_lora` pipeline. Every subfolder corresponds to one math dataset and contains 10 independent 100-shot LoRA runs (groups `00`–`09`) trained on **Qwen2.5-1.5B-Instruct** with identical hyperparameters. The adapters here are the source of truth for downstream evaluation (`../评估体系`) and for the `parameter_generator` project, which learns to map prompts to LoRA weights.
4
+
5
+ If you are new to the project, this document explains **where the data comes from, how the LoRAs are produced, and how you can reuse them for inference, evaluation, or further training**.
6
+
7
+ ## Provenance
8
+
9
+ - **Base model:** `/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct`
10
+ - **Datasets:** sampled from `../../prepare/data/math/*.json`. Each JSON file is a list of `{prompt, response, system?}` records. `dataset_sampler.py` draws 10 disjoint groups of 100 samples per dataset, seeded deterministically from the dataset name; if a dataset has fewer than 1 000 examples, sampling with replacement keeps the group size fixed.
11
+ - **Training recipe (from `config/default.yaml`):**
12
+ - sequence length 4 096; LoRA `r=64`, `alpha=128`, `dropout=0.05`, target modules = `{q,k,v,o,gate,up,down}_proj`
13
+ - 12 epochs / max 1 800 steps, learning rate `1e-4`, batch size per device `2`, gradient accumulation `16`, BF16 training, gradient checkpointing on, weight decay `0.01`, warmup ratio `0.03`, checkpoints saved every 300 steps (keeping at most 6) plus a final adapter export
14
+ - Tokenizers are cloned from the base model (pad token defaults to EOS if missing)
15
+ - **Monitoring & reproducibility:**
16
+ - Trainer logs (loss, LR, throughput) are in `../logs/<dataset>/group_xx/`.
17
+ - Slurm stdout/err for each shard live in `../logs/slurm/`.
18
+ - `metadata.json` captures the git commit (if `GIT_COMMIT` was set), timestamps, seeds, and the effective batch size so any experiment can be repeated exactly.
19
+
20
+ ### End-to-end data flow
21
+
22
+ 1. **Raw JSON data** comes from `../../prepare/data/math`. Each file is a list of dict objects with keys:
23
+ ```json
24
+ {
25
+ "prompt": "...question...",
26
+ "response": "...reference answer...",
27
+ "system": "optional system message"
28
+ }
29
+ ```
30
+ 2. `python -m train_lora.dataset_sampler --config config/default.yaml` reads every dataset, filters out `GSM8K_test.json`, and deterministically samples 10×100 items per dataset. The samples plus metadata (indices, seeds, timestamps) are written to `../prompt_groups/<dataset>/group_xx.json`.
31
+ 3. `python -m train_lora.run_tasks --run` (or the Slurm array) iterates dataset/group pairs, loads the corresponding prompt group, and performs LoRA fine-tuning with Hugging Face `Trainer`.
32
+ 4. After training finishes, the following artifacts land in `outputs/<dataset>/group_xx/`:
33
+ - a ready-to-use LoRA adapter (`adapter/`)
34
+ - intermediate checkpoints for analysis/resume
35
+ - tokenizers and metadata
36
+ 5. The evaluation stacks (`../评估体系`, `../parameter_generator/评估`) and the LoRA parameter generator both consume these directories directly.
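
Each record maps naturally onto chat messages before the tokenizer's chat template is applied. A minimal sketch of that conversion (hypothetical helper; the exact logic in the trainer may differ):

```python
def to_messages(record: dict) -> list[dict]:
    """Convert one {prompt, response, system?} record into chat messages.

    The training code applies the tokenizer's chat template to messages
    shaped like these before tokenization.
    """
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": record["prompt"]})
    messages.append({"role": "assistant", "content": record["response"]})
    return messages

example = {"prompt": "Solve 3x + 7 = 22.", "response": "x = 5"}
```

The optional `system` key is why the schema is written as `{prompt, response, system?}`: it is only prepended when present.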
37

## Directory layout

```
outputs/
├── Competition_Math/
├── GSM8K_train/
├── MATH/
├── Math-IIO-68K-Mini/
├── Math-Plus/
├── Math_QA/
├── Mu-Math/
└── ToT-Math-V1/
```

Each dataset directory contains `group_00` … `group_09`. Inside every group:

| Item | Description |
| --- | --- |
| `adapter/` | Final LoRA export (`adapter_model.safetensors`, `adapter_config.json`, tokenizer + chat template snapshots, and HF `training_args.bin`). This is the folder you will load for inference. |
| `checkpoints/checkpoint-xxxx/` | Intermediate Trainer checkpoints saved every 300 steps (300–1800). They include optimizer, scheduler, RNG state, and tokenizer copies for resuming or studying training dynamics. |
| `tokenizer/` | Standalone tokenizer snapshot identical to the one used during training; useful if you need a self-contained deployment without referencing the base model directory. |
| `prompt_group.json` | The exact 100-shot dataset used for this training run (a copy of `prompt_groups/<dataset>/group_xx.json`). Contains metadata such as sampled indices, original source file, and timestamp. |
| `metadata.json` | Provenance record with training loss, Trainer metrics, LoRA config, effective batch size/world size, timestamps, git commit (if exported), and file paths. |
| `metadata.json` → `trainer_state` | Full training log history (per-step metrics). Disable via `metadata.save_training_state: false` if you want lighter metadata. |

> **Tip:** Use `metadata.json` to find the latest checkpoint, to confirm which base model/tokenizer were used, or to drive automated uploads/evaluations.
64

## Dataset overview

| Dataset dir | Source file (relative to `prepare/data/math`) | Notes |
| --- | --- | --- |
| `Competition_Math` | `Competition_Math.json` | 100-shot groups drawn from Competition Math practice problems. |
| `GSM8K_train` | `GSM8K_train.json` | Standard GSM8K train split; the public test set (`GSM8K_test.json`) was filtered out. |
| `MATH` | `MATH.json` | High-school & olympiad math benchmark. |
| `Math-IIO-68K-Mini` | `Math-IIO-68K-Mini.json` | Mini version of the Math-IIO dataset. |
| `Math-Plus` | `Math-Plus.json` | Composed of challenging math word problems. |
| `Math_QA` | `Math_QA.json` | Multiple-choice MathQA dataset reformatted as open-ended QA. |
| `Mu-Math` | `Mu-Math.json` | MuSR-style math reasoning set. |
| `ToT-Math-V1` | `ToT-Math-V1.json` | Tree-of-Thought-flavored math prompts. |

All datasets follow the same JSON schema, so swapping between them only changes topical coverage.
80

## How to navigate a single group

```
Math_QA/
└── group_00/
    ├── adapter/
    │   ├── adapter_config.json
    │   ├── adapter_model.safetensors
    │   ├── tokenizer/… (extra copies of merges, vocab, chat_template.jinja)
    │   └── training_args.bin
    ├── checkpoints/
    │   ├── checkpoint-300/
    │   ├── checkpoint-600/
    │   └── …
    ├── tokenizer/          # same as the base tokenizer but pinned to this run
    ├── prompt_group.json   # 100-shot data
    └── metadata.json
```

When inspecting or sharing a run, the **minimum** file set is `adapter/` + `prompt_group.json` + `metadata.json`. Everything else speeds up resuming or auditing.
100

## Using the adapters

### 0. Environment prerequisites

- Python ≥ 3.10, `transformers >= 4.37`, `peft >= 0.8`, `accelerate`, `safetensors`, `torch` (GPU build).
- The base model directory must be accessible; otherwise download `Qwen2.5-1.5B-Instruct` from Hugging Face and update the `base_model` path.
- Optional: set `HF_HOME` / `TRANSFORMERS_CACHE` to avoid repeated downloads.

### 0.5. Reproduce the training pipeline (optional)

To regenerate any adapter from scratch:

```bash
cd /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora
python -m train_lora.dataset_sampler --overwrite   # regenerates prompt groups
python -m train_lora.train_single --dataset Math_QA --group 0
# or run the full queue
python -m train_lora.run_tasks --run
```

These commands rebuild `prompt_groups/` and `outputs/` with exactly the same seeds and configuration documented above. Slurm users should submit `sbatch run_lora_multinode.sh`.
122

### 1. Load adapter with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct"
adapter_dir = "outputs/Math_QA/group_00/adapter"

tokenizer = AutoTokenizer.from_pretrained(adapter_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_dir)

prompt = "Solve 3x + 7 = 22."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Notes:
- Loading the tokenizer from `adapter/` ensures an identical chat template and additional tokens (if any). You can also point to the base tokenizer path if you prefer.
- For batch inference, call `model.merge_and_unload()` if you need a single combined set of weights (at the cost of losing LoRA toggling).
- If you want maximal throughput on a single GPU, also call `model.half()` or `model.to(torch.bfloat16)` depending on your hardware; the adapters were trained with BF16, so keeping BF16 is the safest choice.
152

### 2. Resume or continue training

```bash
python -m train_lora.train_single \
  --dataset Math_QA \
  --group 0 \
  --group-file outputs/Math_QA/group_00/prompt_group.json
```

Set `--group-file` to reuse the same 100 samples, and point the `Trainer` at `checkpoints/checkpoint-XXXX` via `resume_from_checkpoint` (set on `TrainingArguments` or passed to `trainer.train()`). This reproduces a group or lets you extend its training steps.

To resume manually:

```python
trainer.train(resume_from_checkpoint="outputs/Math_QA/group_00/checkpoints/checkpoint-1500")
```
169

### 3. Evaluate with Math-Verify

The evaluation stacks in `../评估体系` and `../parameter_generator/评估` expect this directory layout. Example:

```bash
cd /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/评估体系
python scripts/run_all_evals.py \
  --config configs/eval_config.yaml \
  --datasets Math_QA \
  --groups 0 1
```
181

### 4. Packaging for distribution

- Upload only `adapter/` and `metadata.json` when sharing publicly (e.g., on Hugging Face) to avoid huge checkpoint directories.
- Keep `prompt_group.json` if you want consumers to understand the training data or to regenerate LoRA weights from the same samples.
- When exporting, include a README snippet that references this document so downstream users know the provenance.
- Suggested Hugging Face layout:
  ```
  Math_QA/
    group_00/
      adapter/
      prompt_group.json
      metadata.json
  README.md   (copy the sections describing provenance + usage)
  ```
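
Assembling that minimal layout can be scripted. A sketch, where `pack_group` is a hypothetical helper and the paths are examples:

```python
import shutil
from pathlib import Path

# The minimum shareable artifacts, per the group layout above.
MINIMAL = ("adapter", "prompt_group.json", "metadata.json")

def pack_group(group_dir: str, out_dir: str) -> None:
    """Copy only the minimal shareable artifacts of one group run."""
    src, dst = Path(group_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for name in MINIMAL:
        item = src / name
        if item.is_dir():
            shutil.copytree(item, dst / name, dirs_exist_ok=True)
        elif item.is_file():
            shutil.copy2(item, dst / name)

# Example: pack_group("outputs/Math_QA/group_00", "upload/Math_QA/group_00")
```

Checkpoints and tokenizer snapshots are deliberately excluded, which keeps uploads small while preserving provenance.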
196

## File reference (`metadata.json`)

Key fields you may want to automate against:

| Field | Meaning |
| --- | --- |
| `dataset_name`, `group_index` | Identify the run. |
| `prompt_group_file` | Absolute path back to the sampled dataset. |
| `checkpoint_root` | Where all intermediate checkpoints live. |
| `train_loss`, `metrics` | Final loss and the Trainer metrics dict. |
| `trainer_state` | Full log history (can be large; disable via `metadata.save_training_state`). |
| `training_args` | Exact HF `TrainingArguments` snapshot. |
| `lora_config` | Copy of the LoRA hyperparameters used. |
| `effective_batch_size` | `world_size × per_device_batch_size × grad_accum`; useful for scaling comparisons. |
| `git_commit` | Populated if the `GIT_COMMIT` env var was set before training. |
| `metrics.train_runtime`, `metrics.train_samples_per_second` | Throughput stats. |
| `generated_at` | UTC timestamp when the metadata was written. |
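
The `effective_batch_size` field can be recomputed from the recipe as a sanity check; with the defaults (per-device batch 2, gradient accumulation 16) on a single GPU it comes out to 32:

```python
def effective_batch_size(world_size: int, per_device: int, grad_accum: int) -> int:
    # Mirrors the formula recorded in metadata.json:
    # world_size × per_device_batch_size × grad_accum.
    return world_size * per_device * grad_accum

print(effective_batch_size(1, 2, 16))  # → 32
```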
214

## Best practices

- Always match BF16 or FP16 settings between base-model loading and adapter training; these adapters were trained in BF16.
- If you edit files inside this directory, keep the structure intact; other scripts rely on relative paths (`adapter`, `tokenizer`, `metadata.json`).
- Before deploying a new LoRA, verify it with the evaluation suite, and consider merging multiple groups (e.g., ensemble or checkpoint averaging) only after confirming stability.
- Use `prompt_group.json` and `metadata.json` as documentation when presenting results; they already include seeds, sample indices, and environment details.
- If you build new LoRAs with different configs (e.g., higher rank, more steps), add a sibling directory (e.g., `outputs_v2/`) or annotate the README so collaborators know which adapters correspond to which experiment.

Happy finetuning! If you extend this collection (new datasets, extra groups, or different hyperparameters), add another section here describing the changes so downstream consumers stay informed.