nikJ13 committed on Dec 8, 2024

Commit 04b6c32 (verified) · 1 parent: 8d71282

Delete Qwen2.5-Coder-7B-Instruct-math-solver-config_3

Files changed (40)

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/added_tokens.json +0 -24
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/README.md +0 -202
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/adapter_config.json +0 -34
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/adapter_model.safetensors +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/added_tokens.json +0 -24
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/merges.txt +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/optimizer.pt +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/rng_state.pth +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/scheduler.pt +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/special_tokens_map.json +0 -31
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/tokenizer.json +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/tokenizer_config.json +0 -207
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/trainer_state.json +0 -47
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/training_args.bin +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/vocab.json +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/README.md +0 -202
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/adapter_config.json +0 -34
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/adapter_model.safetensors +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/added_tokens.json +0 -24
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/merges.txt +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/optimizer.pt +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/rng_state.pth +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/scheduler.pt +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/special_tokens_map.json +0 -31
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/tokenizer.json +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/tokenizer_config.json +0 -207
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/trainer_state.json +0 -1783
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/training_args.bin +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/vocab.json +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/config.json +0 -45
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/generation_config.json +0 -14
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/merges.txt +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00001-of-00003.safetensors +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00002-of-00003.safetensors +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00003-of-00003.safetensors +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model.safetensors.index.json +0 -0
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/special_tokens_map.json +0 -31
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/tokenizer.json +0 -3
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/tokenizer_config.json +0 -207
Qwen2.5-Coder-7B-Instruct-math-solver-config_3/vocab.json +0 -0

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/added_tokens.json DELETED

@@ -1,24 +0,0 @@
-{
-  "</tool_call>": 151658,
-  "<tool_call>": 151657,
-  "<|box_end|>": 151649,
-  "<|box_start|>": 151648,
-  "<|endoftext|>": 151643,
-  "<|file_sep|>": 151664,
-  "<|fim_middle|>": 151660,
-  "<|fim_pad|>": 151662,
-  "<|fim_prefix|>": 151659,
-  "<|fim_suffix|>": 151661,
-  "<|im_end|>": 151645,
-  "<|im_start|>": 151644,
-  "<|image_pad|>": 151655,
-  "<|object_ref_end|>": 151647,
-  "<|object_ref_start|>": 151646,
-  "<|quad_end|>": 151651,
-  "<|quad_start|>": 151650,
-  "<|repo_name|>": 151663,
-  "<|video_pad|>": 151656,
-  "<|vision_end|>": 151653,
-  "<|vision_pad|>": 151654,
-  "<|vision_start|>": 151652
-}
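The 22 entries above occupy a single contiguous block of token IDs, 151643 through 151664. A quick Python check over the same mapping (copied verbatim from the deleted file) makes that explicit:

```python
# Token-to-ID mapping copied from the deleted added_tokens.json.
ADDED_TOKENS = {
    "</tool_call>": 151658, "<tool_call>": 151657,
    "<|box_end|>": 151649, "<|box_start|>": 151648,
    "<|endoftext|>": 151643, "<|file_sep|>": 151664,
    "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
    "<|im_end|>": 151645, "<|im_start|>": 151644,
    "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
    "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

def id_range(mapping):
    """Return (first_id, last_id, contiguous) for a token -> id mapping."""
    ids = sorted(mapping.values())
    contiguous = ids == list(range(ids[0], ids[-1] + 1))
    return ids[0], ids[-1], contiguous

first_id, last_id, contiguous = id_range(ADDED_TOKENS)
print(first_id, last_id, contiguous)  # 151643 151664 True
```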

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/README.md DELETED

@@ -1,202 +0,0 @@
----
-base_model: Qwen/Qwen2.5-Coder-7B-Instruct
-library_name: peft
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.13.2

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/adapter_config.json DELETED

@@ -1,34 +0,0 @@
-{
-  "alpha_pattern": {},
-  "auto_mapping": null,
-  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
-  "bias": "none",
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "v_proj",
-    "down_proj",
-    "gate_proj",
-    "q_proj",
-    "k_proj",
-    "o_proj",
-    "up_proj"
-  ],
-  "task_type": "CAUSAL_LM",
-  "use_dora": false,
-  "use_rslora": false
-}
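With `r` = 16 applied to all seven projection modules, the adapter size can be estimated: each LoRA pair adds an A matrix (r × d_in) and a B matrix (d_out × r), and the update is scaled by `lora_alpha / r` = 32 / 16 = 2. The sketch below assumes the published Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 KV heads × 128 head dim = 512 outputs for `k_proj`/`v_proj`); none of these figures appear in this diff.

```python
# ASSUMED Qwen2.5-7B shapes (config.json is not shown in this commit).
HIDDEN, MLP, LAYERS, KV_DIM = 3584, 18944, 28, 512
MODULE_SHAPES = {  # module name -> (d_in, d_out)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(r, shapes, layers):
    """LoRA adds A (r x d_in) and B (d_out x r) for every targeted module."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers

R, ALPHA = 16, 32  # from adapter_config.json
params = lora_param_count(R, MODULE_SHAPES, LAYERS)
fp32_bytes = params * 4  # 4 bytes per fp32 weight
print(params, fp32_bytes, ALPHA / R)
```

Under these assumptions the adapter holds 40,370,176 parameters, or 161,480,704 bytes in fp32. The LFS pointer for `adapter_model.safetensors` records 161,533,192 bytes; the roughly 52 KB difference is consistent with the safetensors JSON header, which suggests (but does not prove) that the adapter was saved in fp32.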

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/adapter_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:73cdeda90144493e0e6b4aa17846ad8d72574f3a47dce41363f17ecb8635b33a
-size 161533192
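Binary artifacts in this repository are stored through Git LFS, so the diff shows only three-line pointer files: a spec `version`, an `oid` (the SHA-256 of the real file contents), and the real file's `size` in bytes. A minimal Python parser for that key/value format, applied to the pointer above (the function name is illustrative):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:73cdeda90144493e0e6b4aa17846ad8d72574f3a47dce41363f17ecb8635b33a
size 161533192
"""

info = parse_lfs_pointer(POINTER)
print(info["oid_algorithm"], info["size"])  # sha256 161533192
```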

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/added_tokens.json DELETED

@@ -1,24 +0,0 @@
-{
-  "</tool_call>": 151658,
-  "<tool_call>": 151657,
-  "<|box_end|>": 151649,
-  "<|box_start|>": 151648,
-  "<|endoftext|>": 151643,
-  "<|file_sep|>": 151664,
-  "<|fim_middle|>": 151660,
-  "<|fim_pad|>": 151662,
-  "<|fim_prefix|>": 151659,
-  "<|fim_suffix|>": 151661,
-  "<|im_end|>": 151645,
-  "<|im_start|>": 151644,
-  "<|image_pad|>": 151655,
-  "<|object_ref_end|>": 151647,
-  "<|object_ref_start|>": 151646,
-  "<|quad_end|>": 151651,
-  "<|quad_start|>": 151650,
-  "<|repo_name|>": 151663,
-  "<|video_pad|>": 151656,
-  "<|vision_end|>": 151653,
-  "<|vision_pad|>": 151654,
-  "<|vision_start|>": 151652
-}

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/merges.txt DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/optimizer.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:4be7042ba3f357c8aadcf06d3acda14100472f1af3c59798af6cc4488e0acef3
-size 323195450
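`optimizer.pt` is almost exactly twice the size of `adapter_model.safetensors` (323,195,450 vs 161,533,192 bytes). That ratio is what an Adam-style optimizer (for example the Trainer's default AdamW) would produce, since it keeps two fp32 moment tensors per trainable parameter. The optimizer type is an assumption here, because `training_args.bin` is not readable in this diff.

```python
# Sizes copied from the LFS pointers in this commit (bytes).
ADAPTER_BYTES = 161_533_192
OPTIMIZER_BYTES = 323_195_450

# Adam-style optimizers store exp_avg and exp_avg_sq for every trainable
# parameter, so the saved state should be roughly 2x the weights
# (both files also carry a little serialization overhead).
ratio = OPTIMIZER_BYTES / ADAPTER_BYTES
print(f"optimizer/adapter size ratio: {ratio:.4f}")
```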

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/rng_state.pth DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:76079fe49557897388d14561254800dd0a22f08f7e601bb3fa7b0dc866d6233c
-size 14244

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/scheduler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b2a5aa6cab52bdf0942e6b057dccc7ca9a8b45ae6da834a9bf34a445647b8991
-size 1064

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/special_tokens_map.json DELETED

@@ -1,31 +0,0 @@
-{
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "eos_token": {
-    "content": "<|im_end|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
-}
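This map ends turns with `<|im_end|>` (ID 151645) but pads with a different token, `<|endoftext|>` (ID 151643). Keeping the two distinct means padding can be masked out without also hiding the real end-of-turn token. A small pure-Python sketch of right-padding a batch and building its attention mask; the helper name and the non-special token IDs in the example batch are hypothetical placeholders:

```python
PAD_ID = 151643  # <|endoftext|>, per added_tokens.json
EOS_ID = 151645  # <|im_end|>, per added_tokens.json

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token-id lists to equal length; mask is 1 for real tokens."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        fill = width - len(seq)
        input_ids.append(seq + [pad_id] * fill)
        attention_mask.append([1] * len(seq) + [0] * fill)
    return input_ids, attention_mask

# Placeholder content IDs; only PAD_ID and EOS_ID come from the config.
batch = [[9707, EOS_ID], [40, 1079, 264, EOS_ID]]
ids, mask = pad_batch(batch)
print(ids[0], mask[0])
```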

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:913950e4971737031da511cdd1b410daae4566f62eb845b3975bca5a102323d8
-size 11421995

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/tokenizer_config.json DELETED

@@ -1,207 +0,0 @@
-{
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "151643": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151644": {
-      "content": "<|im_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151645": {
-      "content": "<|im_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151646": {
-      "content": "<|object_ref_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151647": {
-      "content": "<|object_ref_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151648": {
-      "content": "<|box_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151649": {
-      "content": "<|box_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151650": {
-      "content": "<|quad_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151651": {
-      "content": "<|quad_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151652": {
-      "content": "<|vision_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151653": {
-      "content": "<|vision_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151654": {
-      "content": "<|vision_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151655": {
-      "content": "<|image_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151656": {
-      "content": "<|video_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151657": {
-      "content": "<tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151658": {
-      "content": "</tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151659": {
-      "content": "<|fim_prefix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151660": {
-      "content": "<|fim_middle|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151661": {
-      "content": "<|fim_suffix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151662": {
-      "content": "<|fim_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151663": {
-      "content": "<|repo_name|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151664": {
-      "content": "<|file_sep|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    }
-  },
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "bos_token": null,
-  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "model_max_length": 32768,
-  "pad_token": "<|endoftext|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}
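When no `tools` are passed, the `chat_template` above reduces to the ChatML layout. It emits a system turn (the first message if it has role `system`, otherwise the default Qwen prompt), then one `<|im_start|>role\n…<|im_end|>\n` block per remaining user, system, or plain assistant message, and finally an open assistant turn if `add_generation_prompt` is set. The sketch below re-implements only that no-tools path in plain Python so the output format is visible. Tool calls and tool responses are deliberately not handled. In practice `tokenizer.apply_chat_template` renders the real Jinja template.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=False):
    """Mirror the template's no-tools branch; tool messages are unsupported."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for msg in rest:
        if msg["role"] not in ("user", "assistant", "system"):
            raise ValueError(f"unsupported role in this sketch: {msg['role']}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml(
    [{"role": "user", "content": "What is 2 + 3?"}], add_generation_prompt=True
)
print(prompt)
```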

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/trainer_state.json DELETED

@@ -1,47 +0,0 @@
-{
-  "best_metric": null,
-  "best_model_checkpoint": null,
-  "epoch": 2.0,
-  "eval_steps": 500,
-  "global_step": 2,
-  "is_hyper_param_search": false,
-  "is_local_process_zero": true,
-  "is_world_process_zero": true,
-  "log_history": [
-    {
-      "epoch": 1.0,
-      "grad_norm": 0.7726275324821472,
-      "learning_rate": 0.0001,
-      "loss": 1.9478,
-      "step": 1
-    },
-    {
-      "epoch": 2.0,
-      "grad_norm": 0.7735128998756409,
-      "learning_rate": 0.0,
-      "loss": 1.9478,
-      "step": 2
-    }
-  ],
-  "logging_steps": 1,
-  "max_steps": 2,
-  "num_input_tokens_seen": 0,
-  "num_train_epochs": 2,
-  "save_steps": 500,
-  "stateful_callbacks": {
-    "TrainerControl": {
-      "args": {
-        "should_epoch_stop": false,
-        "should_evaluate": false,
-        "should_log": false,
-        "should_save": true,
-        "should_training_stop": true
-      },
-      "attributes": {}
-    }
-  },
-  "total_flos": 62804257603584.0,
-  "train_batch_size": 2,
-  "trial_name": null,
-  "trial_params": null
-}
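`checkpoint-2` records a two-step run (`max_steps` 2, one optimizer step per epoch). The logged learning rate falls from 1e-4 to 0.0. If the Trainer logs the rate after each scheduler step, this is consistent with a linear decay from 2e-4 with no warmup (a cosine schedule would give the same two values). The base rate and schedule type are assumptions, since `training_args.bin` is not shown. The sketch reads `log_history` from the JSON and compares it against that hypothesized schedule:

```python
import json
import math

# Excerpt copied from the deleted checkpoint-2/trainer_state.json.
TRAINER_STATE = json.loads("""
{
  "max_steps": 2,
  "log_history": [
    {"epoch": 1.0, "grad_norm": 0.7726275324821472, "learning_rate": 0.0001, "loss": 1.9478, "step": 1},
    {"epoch": 2.0, "grad_norm": 0.7735128998756409, "learning_rate": 0.0, "loss": 1.9478, "step": 2}
  ]
}
""")

def linear_decay_lr(step, max_steps, base_lr):
    """Linear schedule without warmup, evaluated after `step` optimizer steps."""
    return base_lr * max(0.0, (max_steps - step) / max_steps)

ASSUMED_BASE_LR = 2e-4  # hypothetical; the real value is not in this diff
max_steps = TRAINER_STATE["max_steps"]
for entry in TRAINER_STATE["log_history"]:
    expected = linear_decay_lr(entry["step"], max_steps, ASSUMED_BASE_LR)
    match = math.isclose(entry["learning_rate"], expected, abs_tol=1e-12)
    print(entry["step"], entry["loss"], entry["learning_rate"], match)
```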

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/training_args.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1a04ab74c162f7bc2c66de7441ac11508a205b835ab1c0c316453a6b7b442cf6
-size 5752

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-2/vocab.json DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/README.md DELETED

@@ -1,202 +0,0 @@
----
-base_model: Qwen/Qwen2.5-Coder-7B-Instruct
-library_name: peft
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.13.2

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/adapter_config.json DELETED

@@ -1,34 +0,0 @@
-{
-  "alpha_pattern": {},
-  "auto_mapping": null,
-  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
-  "bias": "none",
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "down_proj",
-    "o_proj",
-    "k_proj",
-    "gate_proj",
-    "q_proj",
-    "up_proj",
-    "v_proj"
-  ],
-  "task_type": "CAUSAL_LM",
-  "use_dora": false,
-  "use_rslora": false
-}
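This `adapter_config.json` lists `target_modules` in a different order from the `checkpoint-2` copy. PEFT's `LoraConfig` converts a list of target modules to a Python set, and the set is written back out as a list when saved, so the serialized order is not meaningful. Comparing the two lists as sets shows that both checkpoints target the same seven projections:

```python
# target_modules exactly as listed in each deleted adapter_config.json.
CHECKPOINT_2 = ["v_proj", "down_proj", "gate_proj", "q_proj", "k_proj", "o_proj", "up_proj"]
CHECKPOINT_250 = ["down_proj", "o_proj", "k_proj", "gate_proj", "q_proj", "up_proj", "v_proj"]

same_order = CHECKPOINT_2 == CHECKPOINT_250
same_modules = set(CHECKPOINT_2) == set(CHECKPOINT_250)
print(same_order, same_modules)  # False True
print(sorted(CHECKPOINT_2))
```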

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/adapter_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6dd0160486fd66e47c546174b93c6907d0895a7fc1c86c0789104e72311b1943
-size 161533192

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/added_tokens.json DELETED

@@ -1,24 +0,0 @@
-{
-  "</tool_call>": 151658,
-  "<tool_call>": 151657,
-  "<|box_end|>": 151649,
-  "<|box_start|>": 151648,
-  "<|endoftext|>": 151643,
-  "<|file_sep|>": 151664,
-  "<|fim_middle|>": 151660,
-  "<|fim_pad|>": 151662,
-  "<|fim_prefix|>": 151659,
-  "<|fim_suffix|>": 151661,
-  "<|im_end|>": 151645,
-  "<|im_start|>": 151644,
-  "<|image_pad|>": 151655,
-  "<|object_ref_end|>": 151647,
-  "<|object_ref_start|>": 151646,
-  "<|quad_end|>": 151651,
-  "<|quad_start|>": 151650,
-  "<|repo_name|>": 151663,
-  "<|video_pad|>": 151656,
-  "<|vision_end|>": 151653,
-  "<|vision_pad|>": 151654,
-  "<|vision_start|>": 151652
-}

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/merges.txt DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/optimizer.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:7909d1068032873dfcbd9335cae366a200e1d23b53b4ea11df10a98ac4007777
-size 323195450

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/rng_state.pth DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:46285fdf783fb8c88b056edfb4586a34cf9e1923fb5c13de3800e6806fb20d85
-size 14244

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/scheduler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:9f08a587270fab32b5cfb6fd3c4f03224c719d157f88d2ac437e72b2c5c051bf
-size 1064

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/special_tokens_map.json DELETED

@@ -1,31 +0,0 @@
-{
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "eos_token": {
-    "content": "<|im_end|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
-}

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:962b8d8c521fefa934665afddae177326e974ddd6a26e69ff31ad6bccbb5593b
-size 11421994

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/tokenizer_config.json DELETED

@@ -1,207 +0,0 @@
-{
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "151643": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151644": {
-      "content": "<|im_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151645": {
-      "content": "<|im_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151646": {
-      "content": "<|object_ref_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151647": {
-      "content": "<|object_ref_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151648": {
-      "content": "<|box_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151649": {
-      "content": "<|box_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151650": {
-      "content": "<|quad_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151651": {
-      "content": "<|quad_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151652": {
-      "content": "<|vision_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151653": {
-      "content": "<|vision_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151654": {
-      "content": "<|vision_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151655": {
-      "content": "<|image_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151656": {
-      "content": "<|video_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151657": {
-      "content": "<tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151658": {
-      "content": "</tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151659": {
-      "content": "<|fim_prefix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151660": {
-      "content": "<|fim_middle|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151661": {
-      "content": "<|fim_suffix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151662": {
-      "content": "<|fim_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151663": {
-      "content": "<|repo_name|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151664": {
-      "content": "<|file_sep|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    }
-  },
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "bos_token": null,
-  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "model_max_length": 32768,
-  "pad_token": "<|endoftext|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}
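
The `chat_template` above is a Jinja template that formats conversations in the ChatML style, delimiting each turn with `<|im_start|>` and `<|im_end|>`. The config also sets `<|im_end|>` as `eos_token`, so generation is expected to stop at the end of the assistant turn. The following plain-Python sketch approximates only the template's no-tools branch and is not the template itself. It assumes messages contain only `system`, `user`, and `assistant` roles, with no tool calls or tool responses. The names `render_chatml` and `DEFAULT_SYSTEM` are hypothetical. In practice, Transformers' `tokenizer.apply_chat_template` renders the real Jinja template.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list, add_generation_prompt: bool = False) -> str:
    """Approximate the no-tools branch of the chat_template (no tool calls/responses)."""
    out = []
    # The system block is the caller's first system message, else the default prompt.
    if messages and messages[0]["role"] == "system":
        out.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
    else:
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for i, m in enumerate(messages):
        if i == 0 and m["role"] == "system":
            continue  # already emitted above, mirroring `not loop.first`
        if m["role"] in ("user", "system", "assistant"):
            out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
        else:
            raise ValueError(f"role {m['role']!r} is not handled in this sketch")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")  # open an assistant turn for generation
    return "".join(out)
```

For example, `render_chatml([{"role": "user", "content": "What is 2+2?"}], add_generation_prompt=True)` yields the default system turn, the user turn, and an open `<|im_start|>assistant\n` prefix for the model to complete.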

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/trainer_state.json DELETED
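
The `learning_rate` values logged in this `trainer_state.json` are consistent with a linear warmup and linear decay schedule:

- The rate rises linearly for the first 25 steps to a peak of 1e-4.
- It then decays linearly toward zero at step 250, the logged `global_step`.
- `epoch` advances by 0.008 per step, which implies 125 steps per epoch and 2 epochs in total.

The warmup length and peak rate are inferred from the log, not read from the training arguments, which survive only as the binary `training_args.bin`. This shape appears to match what `get_linear_schedule_with_warmup` in Transformers produces. The sketch below reproduces the logged values; the constant and function names are hypothetical.

```python
PEAK_LR = 1e-4         # inferred: the logged LR at step 25
WARMUP_STEPS = 25      # inferred: the LR stops rising at step 25
TOTAL_STEPS = 250      # "global_step": 250
STEPS_PER_EPOCH = 125  # inferred: "epoch" grows by 0.008 per step


def linear_warmup_decay(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


# Compare against a few entries of log_history (step, epoch, learning_rate).
for step in (1, 25, 26, 70, 160):
    print(step, step / STEPS_PER_EPOCH, f"{linear_warmup_decay(step):.6e}")
```
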

@@ -1,1783 +0,0 @@
-{
-  "best_metric": null,
-  "best_model_checkpoint": null,
-  "epoch": 2.0,
-  "eval_steps": 500,
-  "global_step": 250,
-  "is_hyper_param_search": false,
-  "is_local_process_zero": true,
-  "is_world_process_zero": true,
-  "log_history": [
-    {
-      "epoch": 0.008,
-      "grad_norm": 0.34940728545188904,
-      "learning_rate": 4.000000000000001e-06,
-      "loss": 1.1602,
-      "step": 1
-    },
-    {
-      "epoch": 0.016,
-      "grad_norm": 0.44999364018440247,
-      "learning_rate": 8.000000000000001e-06,
-      "loss": 1.4248,
-      "step": 2
-    },
-    {
-      "epoch": 0.024,
-      "grad_norm": 0.4955803453922272,
-      "learning_rate": 1.2e-05,
-      "loss": 1.5365,
-      "step": 3
-    },
-    {
-      "epoch": 0.032,
-      "grad_norm": 0.5144389867782593,
-      "learning_rate": 1.6000000000000003e-05,
-      "loss": 1.4261,
-      "step": 4
-    },
-    {
-      "epoch": 0.04,
-      "grad_norm": 0.5175902247428894,
-      "learning_rate": 2e-05,
-      "loss": 1.5149,
-      "step": 5
-    },
-    {
-      "epoch": 0.048,
-      "grad_norm": 0.5514883399009705,
-      "learning_rate": 2.4e-05,
-      "loss": 1.4894,
-      "step": 6
-    },
-    {
-      "epoch": 0.056,
-      "grad_norm": 0.5745941400527954,
-      "learning_rate": 2.8000000000000003e-05,
-      "loss": 1.6145,
-      "step": 7
-    },
-    {
-      "epoch": 0.064,
-      "grad_norm": 0.5998412370681763,
-      "learning_rate": 3.2000000000000005e-05,
-      "loss": 1.6034,
-      "step": 8
-    },
-    {
-      "epoch": 0.072,
-      "grad_norm": 0.6740056276321411,
-      "learning_rate": 3.6e-05,
-      "loss": 1.5693,
-      "step": 9
-    },
-    {
-      "epoch": 0.08,
-      "grad_norm": 0.6063085794448853,
-      "learning_rate": 4e-05,
-      "loss": 1.4704,
-      "step": 10
-    },
-    {
-      "epoch": 0.088,
-      "grad_norm": 0.6194254755973816,
-      "learning_rate": 4.4000000000000006e-05,
-      "loss": 1.4702,
-      "step": 11
-    },
-    {
-      "epoch": 0.096,
-      "grad_norm": 0.6062185764312744,
-      "learning_rate": 4.8e-05,
-      "loss": 1.4051,
-      "step": 12
-    },
-    {
-      "epoch": 0.104,
-      "grad_norm": 0.6020225882530212,
-      "learning_rate": 5.2000000000000004e-05,
-      "loss": 1.4395,
-      "step": 13
-    },
-    {
-      "epoch": 0.112,
-      "grad_norm": 0.6642087697982788,
-      "learning_rate": 5.6000000000000006e-05,
-      "loss": 1.3962,
-      "step": 14
-    },
-    {
-      "epoch": 0.12,
-      "grad_norm": 0.6600400805473328,
-      "learning_rate": 6e-05,
-      "loss": 1.2873,
-      "step": 15
-    },
-    {
-      "epoch": 0.128,
-      "grad_norm": 0.6771585941314697,
-      "learning_rate": 6.400000000000001e-05,
-      "loss": 1.1731,
-      "step": 16
-    },
-    {
-      "epoch": 0.136,
-      "grad_norm": 0.7446087002754211,
-      "learning_rate": 6.800000000000001e-05,
-      "loss": 1.1394,
-      "step": 17
-    },
-    {
-      "epoch": 0.144,
-      "grad_norm": 0.816383957862854,
-      "learning_rate": 7.2e-05,
-      "loss": 1.0757,
-      "step": 18
-    },
-    {
-      "epoch": 0.152,
-      "grad_norm": 0.8373253345489502,
-      "learning_rate": 7.6e-05,
-      "loss": 1.0054,
-      "step": 19
-    },
-    {
-      "epoch": 0.16,
-      "grad_norm": 0.9889208674430847,
-      "learning_rate": 8e-05,
-      "loss": 0.9498,
-      "step": 20
-    },
-    {
-      "epoch": 0.168,
-      "grad_norm": 0.9649688601493835,
-      "learning_rate": 8.4e-05,
-      "loss": 0.7243,
-      "step": 21
-    },
-    {
-      "epoch": 0.176,
-      "grad_norm": 0.7845011949539185,
-      "learning_rate": 8.800000000000001e-05,
-      "loss": 0.6749,
-      "step": 22
-    },
-    {
-      "epoch": 0.184,
-      "grad_norm": 0.7114744186401367,
-      "learning_rate": 9.200000000000001e-05,
-      "loss": 0.5714,
-      "step": 23
-    },
-    {
-      "epoch": 0.192,
-      "grad_norm": 0.62738436460495,
-      "learning_rate": 9.6e-05,
-      "loss": 0.5821,
-      "step": 24
-    },
-    {
-      "epoch": 0.2,
-      "grad_norm": 0.6129463911056519,
-      "learning_rate": 0.0001,
-      "loss": 0.5317,
-      "step": 25
-    },
-    {
-      "epoch": 0.208,
-      "grad_norm": 0.5819089412689209,
-      "learning_rate": 9.955555555555556e-05,
-      "loss": 0.444,
-      "step": 26
-    },
-    {
-      "epoch": 0.216,
-      "grad_norm": 0.5154381990432739,
-      "learning_rate": 9.911111111111112e-05,
-      "loss": 0.4124,
-      "step": 27
-    },
-    {
-      "epoch": 0.224,
-      "grad_norm": 0.5854068398475647,
-      "learning_rate": 9.866666666666668e-05,
-      "loss": 0.464,
-      "step": 28
-    },
-    {
-      "epoch": 0.232,
-      "grad_norm": 0.6885148286819458,
-      "learning_rate": 9.822222222222223e-05,
-      "loss": 0.4453,
-      "step": 29
-    },
-    {
-      "epoch": 0.24,
-      "grad_norm": 0.7243378758430481,
-      "learning_rate": 9.777777777777778e-05,
-      "loss": 0.4729,
-      "step": 30
-    },
-    {
-      "epoch": 0.248,
-      "grad_norm": 0.8584285378456116,
-      "learning_rate": 9.733333333333335e-05,
-      "loss": 0.4418,
-      "step": 31
-    },
-    {
-      "epoch": 0.256,
-      "grad_norm": 0.6010773777961731,
-      "learning_rate": 9.68888888888889e-05,
-      "loss": 0.4747,
-      "step": 32
-    },
-    {
-      "epoch": 0.264,
-      "grad_norm": 0.6505383849143982,
-      "learning_rate": 9.644444444444445e-05,
-      "loss": 0.3523,
-      "step": 33
-    },
-    {
-      "epoch": 0.272,
-      "grad_norm": 0.5273345112800598,
-      "learning_rate": 9.6e-05,
-      "loss": 0.4084,
-      "step": 34
-    },
-    {
-      "epoch": 0.28,
-      "grad_norm": 0.4115114212036133,
-      "learning_rate": 9.555555555555557e-05,
-      "loss": 0.3415,
-      "step": 35
-    },
-    {
-      "epoch": 0.288,
-      "grad_norm": 0.39078396558761597,
-      "learning_rate": 9.511111111111112e-05,
-      "loss": 0.3244,
-      "step": 36
-    },
-    {
-      "epoch": 0.296,
-      "grad_norm": 0.415108859539032,
-      "learning_rate": 9.466666666666667e-05,
-      "loss": 0.3093,
-      "step": 37
-    },
-    {
-      "epoch": 0.304,
-      "grad_norm": 0.4628484547138214,
-      "learning_rate": 9.422222222222223e-05,
-      "loss": 0.3566,
-      "step": 38
-    },
-    {
-      "epoch": 0.312,
-      "grad_norm": 0.4045255184173584,
-      "learning_rate": 9.377777777777779e-05,
-      "loss": 0.335,
-      "step": 39
-    },
-    {
-      "epoch": 0.32,
-      "grad_norm": 0.4579395055770874,
-      "learning_rate": 9.333333333333334e-05,
-      "loss": 0.3419,
-      "step": 40
-    },
-    {
-      "epoch": 0.328,
-      "grad_norm": 0.42325565218925476,
-      "learning_rate": 9.28888888888889e-05,
-      "loss": 0.3205,
-      "step": 41
-    },
-    {
-      "epoch": 0.336,
-      "grad_norm": 0.48346665501594543,
-      "learning_rate": 9.244444444444445e-05,
-      "loss": 0.327,
-      "step": 42
-    },
-    {
-      "epoch": 0.344,
-      "grad_norm": 0.43801987171173096,
-      "learning_rate": 9.200000000000001e-05,
-      "loss": 0.3111,
-      "step": 43
-    },
-    {
-      "epoch": 0.352,
-      "grad_norm": 0.4221636950969696,
-      "learning_rate": 9.155555555555557e-05,
-      "loss": 0.2774,
-      "step": 44
-    },
-    {
-      "epoch": 0.36,
-      "grad_norm": 0.4764738082885742,
-      "learning_rate": 9.111111111111112e-05,
-      "loss": 0.2701,
-      "step": 45
-    },
-    {
-      "epoch": 0.368,
-      "grad_norm": 0.4839763641357422,
-      "learning_rate": 9.066666666666667e-05,
-      "loss": 0.2824,
-      "step": 46
-    },
-    {
-      "epoch": 0.376,
-      "grad_norm": 0.4484062194824219,
-      "learning_rate": 9.022222222222224e-05,
-      "loss": 0.2975,
-      "step": 47
-    },
-    {
-      "epoch": 0.384,
-      "grad_norm": 0.4971374571323395,
-      "learning_rate": 8.977777777777779e-05,
-      "loss": 0.2696,
-      "step": 48
-    },
-    {
-      "epoch": 0.392,
-      "grad_norm": 0.40287044644355774,
-      "learning_rate": 8.933333333333334e-05,
-      "loss": 0.2937,
-      "step": 49
-    },
-    {
-      "epoch": 0.4,
-      "grad_norm": 0.4321398437023163,
-      "learning_rate": 8.888888888888889e-05,
-      "loss": 0.2983,
-      "step": 50
-    },
-    {
-      "epoch": 0.408,
-      "grad_norm": 0.4827052652835846,
-      "learning_rate": 8.844444444444445e-05,
-      "loss": 0.2751,
-      "step": 51
-    },
-    {
-      "epoch": 0.416,
-      "grad_norm": 0.41231727600097656,
-      "learning_rate": 8.800000000000001e-05,
-      "loss": 0.2907,
-      "step": 52
-    },
-    {
-      "epoch": 0.424,
-      "grad_norm": 0.4017980396747589,
-      "learning_rate": 8.755555555555556e-05,
-      "loss": 0.2935,
-      "step": 53
-    },
-    {
-      "epoch": 0.432,
-      "grad_norm": 0.4233342707157135,
-      "learning_rate": 8.711111111111112e-05,
-      "loss": 0.2608,
-      "step": 54
-    },
-    {
-      "epoch": 0.44,
-      "grad_norm": 0.4224155843257904,
-      "learning_rate": 8.666666666666667e-05,
-      "loss": 0.2377,
-      "step": 55
-    },
-    {
-      "epoch": 0.448,
-      "grad_norm": 0.45558053255081177,
-      "learning_rate": 8.622222222222222e-05,
-      "loss": 0.314,
-      "step": 56
-    },
-    {
-      "epoch": 0.456,
-      "grad_norm": 0.2843957245349884,
-      "learning_rate": 8.577777777777777e-05,
-      "loss": 0.1776,
-      "step": 57
-    },
-    {
-      "epoch": 0.464,
-      "grad_norm": 0.3467918336391449,
-      "learning_rate": 8.533333333333334e-05,
-      "loss": 0.2644,
-      "step": 58
-    },
-    {
-      "epoch": 0.472,
-      "grad_norm": 0.40068551898002625,
-      "learning_rate": 8.488888888888889e-05,
-      "loss": 0.2437,
-      "step": 59
-    },
-    {
-      "epoch": 0.48,
-      "grad_norm": 0.3978475034236908,
-      "learning_rate": 8.444444444444444e-05,
-      "loss": 0.2279,
-      "step": 60
-    },
-    {
-      "epoch": 0.488,
-      "grad_norm": 0.33028894662857056,
-      "learning_rate": 8.4e-05,
-      "loss": 0.1828,
-      "step": 61
-    },
-    {
-      "epoch": 0.496,
-      "grad_norm": 0.49656128883361816,
-      "learning_rate": 8.355555555555556e-05,
-      "loss": 0.2689,
-      "step": 62
-    },
-    {
-      "epoch": 0.504,
-      "grad_norm": 0.3330729901790619,
-      "learning_rate": 8.311111111111111e-05,
-      "loss": 0.3695,
-      "step": 63
-    },
-    {
-      "epoch": 0.512,
-      "grad_norm": 0.27048325538635254,
-      "learning_rate": 8.266666666666667e-05,
-      "loss": 0.249,
-      "step": 64
-    },
-    {
-      "epoch": 0.52,
-      "grad_norm": 0.33588236570358276,
-      "learning_rate": 8.222222222222222e-05,
-      "loss": 0.261,
-      "step": 65
-    },
-    {
-      "epoch": 0.528,
-      "grad_norm": 0.3322620093822479,
-      "learning_rate": 8.177777777777778e-05,
-      "loss": 0.2348,
-      "step": 66
-    },
-    {
-      "epoch": 0.536,
-      "grad_norm": 0.31956177949905396,
-      "learning_rate": 8.133333333333334e-05,
-      "loss": 0.2274,
-      "step": 67
-    },
-    {
-      "epoch": 0.544,
-      "grad_norm": 0.3621397316455841,
-      "learning_rate": 8.088888888888889e-05,
-      "loss": 0.3115,
-      "step": 68
-    },
-    {
-      "epoch": 0.552,
-      "grad_norm": 0.40619128942489624,
-      "learning_rate": 8.044444444444444e-05,
-      "loss": 0.3288,
-      "step": 69
-    },
-    {
-      "epoch": 0.56,
-      "grad_norm": 0.4488775134086609,
-      "learning_rate": 8e-05,
-      "loss": 0.2983,
-      "step": 70
-    },
-    {
-      "epoch": 0.568,
-      "grad_norm": 0.3391357660293579,
-      "learning_rate": 7.955555555555556e-05,
-      "loss": 0.2995,
-      "step": 71
-    },
-    {
-      "epoch": 0.576,
-      "grad_norm": 0.41893452405929565,
-      "learning_rate": 7.911111111111111e-05,
-      "loss": 0.3542,
-      "step": 72
-    },
-    {
-      "epoch": 0.584,
-      "grad_norm": 0.38077589869499207,
-      "learning_rate": 7.866666666666666e-05,
-      "loss": 0.2966,
-      "step": 73
-    },
-    {
-      "epoch": 0.592,
-      "grad_norm": 0.3506402373313904,
-      "learning_rate": 7.822222222222223e-05,
-      "loss": 0.2485,
-      "step": 74
-    },
-    {
-      "epoch": 0.6,
-      "grad_norm": 0.3642931282520294,
-      "learning_rate": 7.777777777777778e-05,
-      "loss": 0.251,
-      "step": 75
-    },
-    {
-      "epoch": 0.608,
-      "grad_norm": 0.32581496238708496,
-      "learning_rate": 7.733333333333333e-05,
-      "loss": 0.241,
-      "step": 76
-    },
-    {
-      "epoch": 0.616,
-      "grad_norm": 0.37580692768096924,
-      "learning_rate": 7.688888888888889e-05,
-      "loss": 0.2484,
-      "step": 77
-    },
-    {
-      "epoch": 0.624,
-      "grad_norm": 0.3977254629135132,
-      "learning_rate": 7.644444444444445e-05,
-      "loss": 0.2673,
-      "step": 78
-    },
-    {
-      "epoch": 0.632,
-      "grad_norm": 0.36504948139190674,
-      "learning_rate": 7.6e-05,
-      "loss": 0.2046,
-      "step": 79
-    },
-    {
-      "epoch": 0.64,
-      "grad_norm": 0.3572933077812195,
-      "learning_rate": 7.555555555555556e-05,
-      "loss": 0.3035,
-      "step": 80
-    },
-    {
-      "epoch": 0.648,
-      "grad_norm": 0.34276166558265686,
-      "learning_rate": 7.511111111111111e-05,
-      "loss": 0.2456,
-      "step": 81
-    },
-    {
-      "epoch": 0.656,
-      "grad_norm": 0.35615694522857666,
-      "learning_rate": 7.466666666666667e-05,
-      "loss": 0.2596,
-      "step": 82
-    },
-    {
-      "epoch": 0.664,
-      "grad_norm": 0.40869560837745667,
-      "learning_rate": 7.422222222222223e-05,
-      "loss": 0.2463,
-      "step": 83
-    },
-    {
-      "epoch": 0.672,
-      "grad_norm": 0.42578941583633423,
-      "learning_rate": 7.377777777777778e-05,
-      "loss": 0.2699,
-      "step": 84
-    },
-    {
-      "epoch": 0.68,
-      "grad_norm": 0.3815199136734009,
-      "learning_rate": 7.333333333333333e-05,
-      "loss": 0.2387,
-      "step": 85
-    },
-    {
-      "epoch": 0.688,
-      "grad_norm": 0.3533499240875244,
-      "learning_rate": 7.28888888888889e-05,
-      "loss": 0.2138,
-      "step": 86
-    },
-    {
-      "epoch": 0.696,
-      "grad_norm": 0.49687737226486206,
-      "learning_rate": 7.244444444444445e-05,
-      "loss": 0.2631,
-      "step": 87
-    },
-    {
-      "epoch": 0.704,
-      "grad_norm": 0.34110110998153687,
-      "learning_rate": 7.2e-05,
-      "loss": 0.1998,
-      "step": 88
-    },
-    {
-      "epoch": 0.712,
-      "grad_norm": 0.35842886567115784,
-      "learning_rate": 7.155555555555555e-05,
-      "loss": 0.2193,
-      "step": 89
-    },
-    {
-      "epoch": 0.72,
-      "grad_norm": 0.3535846769809723,
-      "learning_rate": 7.111111111111112e-05,
-      "loss": 0.2132,
-      "step": 90
-    },
-    {
-      "epoch": 0.728,
-      "grad_norm": 0.4714760482311249,
-      "learning_rate": 7.066666666666667e-05,
-      "loss": 0.2114,
-      "step": 91
-    },
-    {
-      "epoch": 0.736,
-      "grad_norm": 0.4893051087856293,
-      "learning_rate": 7.022222222222222e-05,
-      "loss": 0.2618,
-      "step": 92
-    },
-    {
-      "epoch": 0.744,
-      "grad_norm": 0.4981172978878021,
-      "learning_rate": 6.977777777777779e-05,
-      "loss": 0.2408,
-      "step": 93
-    },
-    {
-      "epoch": 0.752,
-      "grad_norm": 0.35243386030197144,
-      "learning_rate": 6.933333333333334e-05,
-      "loss": 0.268,
-      "step": 94
-    },
-    {
-      "epoch": 0.76,
-      "grad_norm": 0.4038974344730377,
-      "learning_rate": 6.88888888888889e-05,
-      "loss": 0.3196,
-      "step": 95
-    },
-    {
-      "epoch": 0.768,
-      "grad_norm": 0.36606112122535706,
-      "learning_rate": 6.844444444444445e-05,
-      "loss": 0.3311,
-      "step": 96
-    },
-    {
-      "epoch": 0.776,
-      "grad_norm": 0.41023463010787964,
-      "learning_rate": 6.800000000000001e-05,
-      "loss": 0.2369,
-      "step": 97
-    },
-    {
-      "epoch": 0.784,
-      "grad_norm": 0.39212992787361145,
-      "learning_rate": 6.755555555555557e-05,
-      "loss": 0.2643,
-      "step": 98
-    },
-    {
-      "epoch": 0.792,
-      "grad_norm": 0.39127424359321594,
-      "learning_rate": 6.711111111111112e-05,
-      "loss": 0.2516,
-      "step": 99
-    },
-    {
-      "epoch": 0.8,
-      "grad_norm": 0.43007251620292664,
-      "learning_rate": 6.666666666666667e-05,
-      "loss": 0.339,
-      "step": 100
-    },
-    {
-      "epoch": 0.808,
-      "grad_norm": 0.3675765097141266,
-      "learning_rate": 6.622222222222224e-05,
-      "loss": 0.2827,
-      "step": 101
-    },
-    {
-      "epoch": 0.816,
-      "grad_norm": 0.3471575379371643,
-      "learning_rate": 6.577777777777779e-05,
-      "loss": 0.3642,
-      "step": 102
-    },
-    {
-      "epoch": 0.824,
-      "grad_norm": 0.3958161473274231,
-      "learning_rate": 6.533333333333334e-05,
-      "loss": 0.2404,
-      "step": 103
-    },
-    {
-      "epoch": 0.832,
-      "grad_norm": 0.408782422542572,
-      "learning_rate": 6.488888888888889e-05,
-      "loss": 0.2853,
-      "step": 104
-    },
-    {
-      "epoch": 0.84,
-      "grad_norm": 0.40164637565612793,
-      "learning_rate": 6.444444444444446e-05,
-      "loss": 0.2408,
-      "step": 105
-    },
-    {
-      "epoch": 0.848,
-      "grad_norm": 0.35690632462501526,
-      "learning_rate": 6.400000000000001e-05,
-      "loss": 0.2321,
-      "step": 106
-    },
-    {
-      "epoch": 0.856,
-      "grad_norm": 0.364923894405365,
-      "learning_rate": 6.355555555555556e-05,
-      "loss": 0.2349,
-      "step": 107
-    },
-    {
-      "epoch": 0.864,
-      "grad_norm": 0.38773006200790405,
-      "learning_rate": 6.311111111111112e-05,
-      "loss": 0.2351,
-      "step": 108
-    },
-    {
-      "epoch": 0.872,
-      "grad_norm": 0.3910294473171234,
-      "learning_rate": 6.266666666666667e-05,
-      "loss": 0.2406,
-      "step": 109
-    },
-    {
-      "epoch": 0.88,
-      "grad_norm": 0.4128250777721405,
-      "learning_rate": 6.222222222222222e-05,
-      "loss": 0.2838,
-      "step": 110
-    },
-    {
-      "epoch": 0.888,
-      "grad_norm": 0.48258158564567566,
-      "learning_rate": 6.177777777777779e-05,
-      "loss": 0.2579,
-      "step": 111
-    },
-    {
-      "epoch": 0.896,
-      "grad_norm": 0.3682475984096527,
-      "learning_rate": 6.133333333333334e-05,
-      "loss": 0.2402,
-      "step": 112
-    },
-    {
-      "epoch": 0.904,
-      "grad_norm": 0.5177001953125,
-      "learning_rate": 6.08888888888889e-05,
-      "loss": 0.3181,
-      "step": 113
-    },
-    {
-      "epoch": 0.912,
-      "grad_norm": 0.4134182631969452,
-      "learning_rate": 6.044444444444445e-05,
-      "loss": 0.2638,
-      "step": 114
-    },
-    {
-      "epoch": 0.92,
-      "grad_norm": 0.3373229205608368,
-      "learning_rate": 6e-05,
-      "loss": 0.1883,
-      "step": 115
-    },
-    {
-      "epoch": 0.928,
-      "grad_norm": 0.35318320989608765,
-      "learning_rate": 5.9555555555555554e-05,
-      "loss": 0.1996,
-      "step": 116
-    },
-    {
-      "epoch": 0.936,
-      "grad_norm": 0.3892814815044403,
-      "learning_rate": 5.911111111111112e-05,
-      "loss": 0.2788,
-      "step": 117
-    },
-    {
-      "epoch": 0.944,
-      "grad_norm": 0.4185868799686432,
-      "learning_rate": 5.866666666666667e-05,
-      "loss": 0.2725,
-      "step": 118
-    },
-    {
-      "epoch": 0.952,
-      "grad_norm": 0.3705510199069977,
-      "learning_rate": 5.8222222222222224e-05,
-      "loss": 0.2666,
-      "step": 119
-    },
-    {
-      "epoch": 0.96,
-      "grad_norm": 0.3852842152118683,
-      "learning_rate": 5.7777777777777776e-05,
-      "loss": 0.2172,
-      "step": 120
-    },
-    {
-      "epoch": 0.968,
-      "grad_norm": 0.45334190130233765,
-      "learning_rate": 5.7333333333333336e-05,
-      "loss": 0.3077,
-      "step": 121
-    },
-    {
-      "epoch": 0.976,
-      "grad_norm": 0.4152809679508209,
-      "learning_rate": 5.6888888888888895e-05,
-      "loss": 0.2205,
-      "step": 122
-    },
-    {
-      "epoch": 0.984,
-      "grad_norm": 0.4179824888706207,
-      "learning_rate": 5.644444444444445e-05,
-      "loss": 0.2,
-      "step": 123
-    },
-    {
-      "epoch": 0.992,
-      "grad_norm": 0.5038807392120361,
-      "learning_rate": 5.6000000000000006e-05,
-      "loss": 0.3133,
-      "step": 124
-    },
-    {
-      "epoch": 1.0,
-      "grad_norm": 0.48537206649780273,
-      "learning_rate": 5.555555555555556e-05,
-      "loss": 0.3349,
-      "step": 125
-    },
-    {
-      "epoch": 1.008,
-      "grad_norm": 0.30415406823158264,
-      "learning_rate": 5.511111111111111e-05,
-      "loss": 0.2907,
-      "step": 126
-    },
-    {
-      "epoch": 1.016,
-      "grad_norm": 0.3198951184749603,
-      "learning_rate": 5.466666666666666e-05,
-      "loss": 0.2422,
-      "step": 127
-    },
-    {
-      "epoch": 1.024,
-      "grad_norm": 0.31536588072776794,
-      "learning_rate": 5.422222222222223e-05,
-      "loss": 0.2573,
-      "step": 128
-    },
-    {
-      "epoch": 1.032,
-      "grad_norm": 0.30064645409584045,
-      "learning_rate": 5.377777777777778e-05,
-      "loss": 0.184,
-      "step": 129
-    },
-    {
-      "epoch": 1.04,
-      "grad_norm": 0.30268657207489014,
-      "learning_rate": 5.333333333333333e-05,
-      "loss": 0.2154,
-      "step": 130
-    },
-    {
-      "epoch": 1.048,
-      "grad_norm": 0.33857786655426025,
-      "learning_rate": 5.2888888888888885e-05,
-      "loss": 0.1998,
-      "step": 131
-    },
-    {
-      "epoch": 1.056,
-      "grad_norm": 0.3024175763130188,
-      "learning_rate": 5.244444444444445e-05,
-      "loss": 0.2746,
-      "step": 132
-    },
-    {
-      "epoch": 1.064,
-      "grad_norm": 0.36872145533561707,
-      "learning_rate": 5.2000000000000004e-05,
-      "loss": 0.2764,
-      "step": 133
-    },
-    {
-      "epoch": 1.072,
-      "grad_norm": 0.3166502118110657,
-      "learning_rate": 5.1555555555555556e-05,
-      "loss": 0.23,
-      "step": 134
-    },
-    {
-      "epoch": 1.08,
-      "grad_norm": 0.3621416389942169,
-      "learning_rate": 5.111111111111111e-05,
-      "loss": 0.2523,
-      "step": 135
-    },
-    {
-      "epoch": 1.088,
-      "grad_norm": 0.3940413296222687,
-      "learning_rate": 5.0666666666666674e-05,
-      "loss": 0.1879,
-      "step": 136
-    },
-    {
-      "epoch": 1.096,
-      "grad_norm": 0.29124918580055237,
-      "learning_rate": 5.0222222222222226e-05,
-      "loss": 0.1879,
-      "step": 137
-    },
-    {
-      "epoch": 1.104,
-      "grad_norm": 0.26944026350975037,
-      "learning_rate": 4.977777777777778e-05,
-      "loss": 0.1875,
-      "step": 138
-    },
-    {
-      "epoch": 1.112,
-      "grad_norm": 0.2946963608264923,
-      "learning_rate": 4.933333333333334e-05,
-      "loss": 0.1616,
-      "step": 139
-    },
-    {
-      "epoch": 1.12,
-      "grad_norm": 0.34473440051078796,
-      "learning_rate": 4.888888888888889e-05,
-      "loss": 0.2019,
-      "step": 140
-    },
-    {
-      "epoch": 1.1280000000000001,
-      "grad_norm": 0.2706325054168701,
-      "learning_rate": 4.844444444444445e-05,
-      "loss": 0.1621,
-      "step": 141
-    },
-    {
-      "epoch": 1.1360000000000001,
-      "grad_norm": 0.377413809299469,
-      "learning_rate": 4.8e-05,
-      "loss": 0.1689,
-      "step": 142
-    },
-    {
-      "epoch": 1.144,
-      "grad_norm": 0.41502654552459717,
-      "learning_rate": 4.755555555555556e-05,
-      "loss": 0.2477,
-      "step": 143
-    },
-    {
-      "epoch": 1.152,
-      "grad_norm": 0.36013564467430115,
-      "learning_rate": 4.711111111111111e-05,
-      "loss": 0.1666,
-      "step": 144
-    },
-    {
-      "epoch": 1.16,
-      "grad_norm": 0.35633739829063416,
-      "learning_rate": 4.666666666666667e-05,
-      "loss": 0.2476,
-      "step": 145
-    },
-    {
-      "epoch": 1.168,
-      "grad_norm": 0.44272857904434204,
-      "learning_rate": 4.6222222222222224e-05,
-      "loss": 0.1967,
-      "step": 146
-    },
-    {
-      "epoch": 1.176,
-      "grad_norm": 0.43732601404190063,
-      "learning_rate": 4.577777777777778e-05,
-      "loss": 0.1894,
-      "step": 147
-    },
-    {
-      "epoch": 1.184,
-      "grad_norm": 0.3923589885234833,
-      "learning_rate": 4.5333333333333335e-05,
-      "loss": 0.2073,
-      "step": 148
-    },
-    {
-      "epoch": 1.192,
-      "grad_norm": 0.36411362886428833,
-      "learning_rate": 4.4888888888888894e-05,
-      "loss": 0.1823,
-      "step": 149
-    },
-    {
-      "epoch": 1.2,
-      "grad_norm": 0.34394821524620056,
-      "learning_rate": 4.4444444444444447e-05,
-      "loss": 0.2084,
-      "step": 150
-    },
-    {
-      "epoch": 1.208,
-      "grad_norm": 0.3918055593967438,
-      "learning_rate": 4.4000000000000006e-05,
-      "loss": 0.1895,
-      "step": 151
-    },
-    {
-      "epoch": 1.216,
-      "grad_norm": 0.4197651445865631,
-      "learning_rate": 4.355555555555556e-05,
-      "loss": 0.153,
-      "step": 152
-    },
-    {
-      "epoch": 1.224,
-      "grad_norm": 0.3912750482559204,
-      "learning_rate": 4.311111111111111e-05,
-      "loss": 0.1695,
-      "step": 153
-    },
-    {
-      "epoch": 1.232,
-      "grad_norm": 0.37072229385375977,
-      "learning_rate": 4.266666666666667e-05,
-      "loss": 0.1688,
-      "step": 154
-    },
-    {
-      "epoch": 1.24,
-      "grad_norm": 0.3903689384460449,
-      "learning_rate": 4.222222222222222e-05,
-      "loss": 0.1852,
-      "step": 155
-    },
-    {
-      "epoch": 1.248,
-      "grad_norm": 0.4943057894706726,
-      "learning_rate": 4.177777777777778e-05,
-      "loss": 0.2265,
-      "step": 156
-    },
-    {
-      "epoch": 1.256,
-      "grad_norm": 0.3247779309749603,
-      "learning_rate": 4.133333333333333e-05,
-      "loss": 0.2305,
-      "step": 157
-    },
-    {
-      "epoch": 1.264,
-      "grad_norm": 0.37809041142463684,
-      "learning_rate": 4.088888888888889e-05,
-      "loss": 0.2274,
-      "step": 158
-    },
-    {
-      "epoch": 1.272,
-      "grad_norm": 0.4743436872959137,
-      "learning_rate": 4.0444444444444444e-05,
-      "loss": 0.2102,
-      "step": 159
-    },
-    {
-      "epoch": 1.28,
-      "grad_norm": 0.3210437297821045,
-      "learning_rate": 4e-05,
-      "loss": 0.1837,
-      "step": 160
-    },
-    {
-      "epoch": 1.288,
-      "grad_norm": 0.3802868723869324,
-      "learning_rate": 3.9555555555555556e-05,
-      "loss": 0.2297,
-      "step": 161
-    },
-    {
-      "epoch": 1.296,
-      "grad_norm": 0.47271764278411865,
-      "learning_rate": 3.9111111111111115e-05,
-      "loss": 0.211,
-      "step": 162
-    },
-    {
-      "epoch": 1.304,
-      "grad_norm": 0.3763769268989563,
-      "learning_rate": 3.866666666666667e-05,
-      "loss": 0.1803,
-      "step": 163
-    },
-    {
-      "epoch": 1.312,
-      "grad_norm": 0.38181614875793457,
-      "learning_rate": 3.8222222222222226e-05,
-      "loss": 0.2573,
-      "step": 164
-    },
-    {
-      "epoch": 1.32,
-      "grad_norm": 0.43592631816864014,
-      "learning_rate": 3.777777777777778e-05,
-      "loss": 0.2606,
-      "step": 165
-    },
-    {
-      "epoch": 1.328,
-      "grad_norm": 0.47964996099472046,
-      "learning_rate": 3.733333333333334e-05,
-      "loss": 0.2471,
-      "step": 166
-    },
-    {
-      "epoch": 1.336,
-      "grad_norm": 0.41977745294570923,
-      "learning_rate": 3.688888888888889e-05,
-      "loss": 0.2524,
-      "step": 167
-    },
-    {
-      "epoch": 1.3439999999999999,
-      "grad_norm": 0.49789348244667053,
-      "learning_rate": 3.644444444444445e-05,
-      "loss": 0.2309,
-      "step": 168
-    },
-    {
-      "epoch": 1.3519999999999999,
-      "grad_norm": 0.5704350471496582,
-      "learning_rate": 3.6e-05,
-      "loss": 0.2852,
-      "step": 169
-    },
-    {
-      "epoch": 1.3599999999999999,
-      "grad_norm": 0.33901578187942505,
-      "learning_rate": 3.555555555555556e-05,
-      "loss": 0.1612,
-      "step": 170
-    },
-    {
-      "epoch": 1.3679999999999999,
-      "grad_norm": 0.3156227171421051,
-      "learning_rate": 3.511111111111111e-05,
-      "loss": 0.1672,
-      "step": 171
-    },
-    {
-      "epoch": 1.376,
-      "grad_norm": 0.31457459926605225,
-      "learning_rate": 3.466666666666667e-05,
-      "loss": 0.191,
-      "step": 172
-    },
-    {
-      "epoch": 1.384,
-      "grad_norm": 0.4021678566932678,
-      "learning_rate": 3.4222222222222224e-05,
-      "loss": 0.1933,
-      "step": 173
-    },
-    {
-      "epoch": 1.392,
-      "grad_norm": 0.41086873412132263,
-      "learning_rate": 3.377777777777778e-05,
-      "loss": 0.2026,
-      "step": 174
-    },
-    {
-      "epoch": 1.4,
-      "grad_norm": 0.3638031482696533,
-      "learning_rate": 3.3333333333333335e-05,
-      "loss": 0.2336,
-      "step": 175
-    },
-    {
-      "epoch": 1.408,
-      "grad_norm": 0.34231624007225037,
-      "learning_rate": 3.2888888888888894e-05,
-      "loss": 0.1607,
-      "step": 176
-    },
-    {
-      "epoch": 1.416,
-      "grad_norm": 0.41986358165740967,
-      "learning_rate": 3.2444444444444446e-05,
-      "loss": 0.2123,
-      "step": 177
-    },
-    {
-      "epoch": 1.424,
-      "grad_norm": 0.35257431864738464,
-      "learning_rate": 3.2000000000000005e-05,
-      "loss": 0.22,
-      "step": 178
-    },
-    {
-      "epoch": 1.432,
-      "grad_norm": 0.33527669310569763,
-      "learning_rate": 3.155555555555556e-05,
-      "loss": 0.1589,
-      "step": 179
-    },
-    {
-      "epoch": 1.44,
-      "grad_norm": 0.44640272855758667,
-      "learning_rate": 3.111111111111111e-05,
-      "loss": 0.208,
-      "step": 180
-    },
-    {
-      "epoch": 1.448,
-      "grad_norm": 0.45262229442596436,
-      "learning_rate": 3.066666666666667e-05,
-      "loss": 0.1956,
-      "step": 181
-    },
-    {
-      "epoch": 1.456,
-      "grad_norm": 0.3733077049255371,
-      "learning_rate": 3.0222222222222225e-05,
-      "loss": 0.1876,
-      "step": 182
-    },
-    {
-      "epoch": 1.464,
-      "grad_norm": 0.28761085867881775,
-      "learning_rate": 2.9777777777777777e-05,
-      "loss": 0.1698,
-      "step": 183
-    },
-    {
-      "epoch": 1.472,
-      "grad_norm": 0.3967605233192444,
-      "learning_rate": 2.9333333333333336e-05,
-      "loss": 0.1664,
-      "step": 184
-    },
-    {
-      "epoch": 1.48,
-      "grad_norm": 0.38993561267852783,
-      "learning_rate": 2.8888888888888888e-05,
-      "loss": 0.1248,
-      "step": 185
-    },
-    {
-      "epoch": 1.488,
-      "grad_norm": 0.366316020488739,
-      "learning_rate": 2.8444444444444447e-05,
-      "loss": 0.144,
-      "step": 186
-    },
-    {
-      "epoch": 1.496,
-      "grad_norm": 0.4570133686065674,
-      "learning_rate": 2.8000000000000003e-05,
-      "loss": 0.1814,
-      "step": 187
-    },
-    {
-      "epoch": 1.504,
-      "grad_norm": 0.393306165933609,
-      "learning_rate": 2.7555555555555555e-05,
-      "loss": 0.2775,
-      "step": 188
-    },
-    {
-      "epoch": 1.512,
-      "grad_norm": 0.5033831000328064,
-      "learning_rate": 2.7111111111111114e-05,
-      "loss": 0.2751,
-      "step": 189
-    },
-    {
-      "epoch": 1.52,
-      "grad_norm": 0.3124541640281677,
-      "learning_rate": 2.6666666666666667e-05,
-      "loss": 0.2068,
-      "step": 190
-    },
-    {
-      "epoch": 1.528,
-      "grad_norm": 0.30927085876464844,
-      "learning_rate": 2.6222222222222226e-05,
-      "loss": 0.1823,
-      "step": 191
-    },
-    {
-      "epoch": 1.536,
-      "grad_norm": 0.3375771641731262,
-      "learning_rate": 2.5777777777777778e-05,
-      "loss": 0.1597,
-      "step": 192
-    },
-    {
-      "epoch": 1.544,
-      "grad_norm": 0.4232199788093567,
-      "learning_rate": 2.5333333333333337e-05,
-      "loss": 0.2307,
-      "step": 193
-    },
-    {
-      "epoch": 1.552,
-      "grad_norm": 0.41345012187957764,
-      "learning_rate": 2.488888888888889e-05,
-      "loss": 0.2137,
-      "step": 194
-    },
-    {
-      "epoch": 1.56,
-      "grad_norm": 0.4343262314796448,
-      "learning_rate": 2.4444444444444445e-05,
-      "loss": 0.2229,
-      "step": 195
-    },
-    {
-      "epoch": 1.568,
-      "grad_norm": 0.5007179975509644,
-      "learning_rate": 2.4e-05,
-      "loss": 0.2033,
-      "step": 196
-    },
-    {
-      "epoch": 1.576,
-      "grad_norm": 0.4232487976551056,
-      "learning_rate": 2.3555555555555556e-05,
-      "loss": 0.1694,
-      "step": 197
-    },
-    {
-      "epoch": 1.584,
-      "grad_norm": 0.42341843247413635,
-      "learning_rate": 2.3111111111111112e-05,
-      "loss": 0.1525,
-      "step": 198
-    },
-    {
-      "epoch": 1.592,
-      "grad_norm": 0.3467739224433899,
-      "learning_rate": 2.2666666666666668e-05,
-      "loss": 0.2528,
-      "step": 199
-    },
-    {
-      "epoch": 1.6,
-      "grad_norm": 0.5528080463409424,
-      "learning_rate": 2.2222222222222223e-05,
-      "loss": 0.2612,
-      "step": 200
-    },
-    {
-      "epoch": 1.608,
-      "grad_norm": 0.4317588210105896,
-      "learning_rate": 2.177777777777778e-05,
-      "loss": 0.2145,
-      "step": 201
-    },
-    {
-      "epoch": 1.616,
-      "grad_norm": 0.3030836284160614,
-      "learning_rate": 2.1333333333333335e-05,
-      "loss": 0.1564,
-      "step": 202
-    },
-    {
-      "epoch": 1.624,
-      "grad_norm": 0.402210533618927,
-      "learning_rate": 2.088888888888889e-05,
-      "loss": 0.1918,
-      "step": 203
-    },
-    {
-      "epoch": 1.6320000000000001,
-      "grad_norm": 0.41674157977104187,
-      "learning_rate": 2.0444444444444446e-05,
-      "loss": 0.182,
-      "step": 204
-    },
-    {
-      "epoch": 1.6400000000000001,
-      "grad_norm": 0.41260281205177307,
-      "learning_rate": 2e-05,
-      "loss": 0.1704,
-      "step": 205
-    },
-    {
-      "epoch": 1.6480000000000001,
-      "grad_norm": 0.4819144904613495,
-      "learning_rate": 1.9555555555555557e-05,
-      "loss": 0.2364,
-      "step": 206
-    },
-    {
-      "epoch": 1.6560000000000001,
-      "grad_norm": 0.40655967593193054,
-      "learning_rate": 1.9111111111111113e-05,
-      "loss": 0.1917,
-      "step": 207
-    },
-    {
-      "epoch": 1.6640000000000001,
-      "grad_norm": 0.4548530876636505,
-      "learning_rate": 1.866666666666667e-05,
-      "loss": 0.176,
-      "step": 208
-    },
-    {
-      "epoch": 1.6720000000000002,
-      "grad_norm": 0.40585026144981384,
-      "learning_rate": 1.8222222222222224e-05,
-      "loss": 0.1557,
-      "step": 209
-    },
-    {
-      "epoch": 1.6800000000000002,
-      "grad_norm": 0.4349209666252136,
-      "learning_rate": 1.777777777777778e-05,
-      "loss": 0.171,
-      "step": 210
-    },
-    {
-      "epoch": 1.688,
-      "grad_norm": 0.39553195238113403,
-      "learning_rate": 1.7333333333333336e-05,
-      "loss": 0.1694,
-      "step": 211
-    },
-    {
-      "epoch": 1.696,
-      "grad_norm": 0.4010556638240814,
-      "learning_rate": 1.688888888888889e-05,
-      "loss": 0.1848,
-      "step": 212
-    },
-    {
-      "epoch": 1.704,
-      "grad_norm": 0.47512316703796387,
-      "learning_rate": 1.6444444444444447e-05,
-      "loss": 0.1953,
-      "step": 213
-    },
-    {
-      "epoch": 1.712,
-      "grad_norm": 0.45944473147392273,
-      "learning_rate": 1.6000000000000003e-05,
-      "loss": 0.1874,
-      "step": 214
-    },
-    {
-      "epoch": 1.72,
-      "grad_norm": 0.43075892329216003,
-      "learning_rate": 1.5555555555555555e-05,
-      "loss": 0.2026,
-      "step": 215
-    },
-    {
-      "epoch": 1.728,
-      "grad_norm": 0.4133787453174591,
-      "learning_rate": 1.5111111111111112e-05,
-      "loss": 0.1658,
-      "step": 216
-    },
-    {
-      "epoch": 1.736,
-      "grad_norm": 0.3476410210132599,
-      "learning_rate": 1.4666666666666668e-05,
-      "loss": 0.1407,
-      "step": 217
-    },
-    {
-      "epoch": 1.744,
-      "grad_norm": 0.4610888659954071,
-      "learning_rate": 1.4222222222222224e-05,
-      "loss": 0.1641,
-      "step": 218
-    },
-    {
-      "epoch": 1.752,
-      "grad_norm": 0.40983593463897705,
-      "learning_rate": 1.3777777777777778e-05,
-      "loss": 0.2953,
-      "step": 219
-    },
-    {
-      "epoch": 1.76,
-      "grad_norm": 0.36787161231040955,
-      "learning_rate": 1.3333333333333333e-05,
-      "loss": 0.2039,
-      "step": 220
-    },
-    {
-      "epoch": 1.768,
-      "grad_norm": 0.44766202569007874,
-      "learning_rate": 1.2888888888888889e-05,
-      "loss": 0.2152,
-      "step": 221
-    },
-    {
-      "epoch": 1.776,
-      "grad_norm": 0.5092465877532959,
-      "learning_rate": 1.2444444444444445e-05,
-      "loss": 0.2759,
-      "step": 222
-    },
-    {
-      "epoch": 1.784,
-      "grad_norm": 0.3997543752193451,
-      "learning_rate": 1.2e-05,
-      "loss": 0.2243,
-      "step": 223
-    },
-    {
-      "epoch": 1.792,
-      "grad_norm": 0.44627732038497925,
-      "learning_rate": 1.1555555555555556e-05,
-      "loss": 0.2238,
-      "step": 224
-    },
-    {
-      "epoch": 1.8,
-      "grad_norm": 0.36766281723976135,
-      "learning_rate": 1.1111111111111112e-05,
-      "loss": 0.1655,
-      "step": 225
-    },
-    {
-      "epoch": 1.808,
-      "grad_norm": 0.47784334421157837,
-      "learning_rate": 1.0666666666666667e-05,
-      "loss": 0.2246,
-      "step": 226
-    },
-    {
-      "epoch": 1.8159999999999998,
-      "grad_norm": 0.4621998965740204,
-      "learning_rate": 1.0222222222222223e-05,
-      "loss": 0.222,
-      "step": 227
-    },
-    {
-      "epoch": 1.8239999999999998,
-      "grad_norm": 0.5030863881111145,
-      "learning_rate": 9.777777777777779e-06,
-      "loss": 0.2328,
-      "step": 228
-    },
-    {
-      "epoch": 1.8319999999999999,
-      "grad_norm": 0.4540053606033325,
-      "learning_rate": 9.333333333333334e-06,
-      "loss": 0.2873,
-      "step": 229
-    },
-    {
-      "epoch": 1.8399999999999999,
-      "grad_norm": 0.3857741057872772,
-      "learning_rate": 8.88888888888889e-06,
-      "loss": 0.1382,
-      "step": 230
-    },
-    {
-      "epoch": 1.8479999999999999,
-      "grad_norm": 0.36751654744148254,
-      "learning_rate": 8.444444444444446e-06,
-      "loss": 0.1497,
-      "step": 231
-    },
-    {
-      "epoch": 1.8559999999999999,
-      "grad_norm": 0.4918551743030548,
-      "learning_rate": 8.000000000000001e-06,
-      "loss": 0.2721,
-      "step": 232
-    },
-    {
-      "epoch": 1.8639999999999999,
-      "grad_norm": 0.3265823721885681,
-      "learning_rate": 7.555555555555556e-06,
-      "loss": 0.1729,
-      "step": 233
-    },
-    {
-      "epoch": 1.8719999999999999,
-      "grad_norm": 0.43187111616134644,
-      "learning_rate": 7.111111111111112e-06,
-      "loss": 0.1861,
-      "step": 234
-    },
-    {
-      "epoch": 1.88,
-      "grad_norm": 0.3579561710357666,
-      "learning_rate": 6.666666666666667e-06,
-      "loss": 0.1387,
-      "step": 235
-    },
-    {
-      "epoch": 1.888,
-      "grad_norm": 0.5295534133911133,
-      "learning_rate": 6.222222222222222e-06,
-      "loss": 0.221,
-      "step": 236
-    },
-    {
-      "epoch": 1.896,
-      "grad_norm": 0.4777468740940094,
-      "learning_rate": 5.777777777777778e-06,
-      "loss": 0.2091,
-      "step": 237
-    },
-    {
-      "epoch": 1.904,
-      "grad_norm": 0.4705863893032074,
-      "learning_rate": 5.333333333333334e-06,
-      "loss": 0.19,
-      "step": 238
-    },
-    {
-      "epoch": 1.912,
-      "grad_norm": 0.49623602628707886,
-      "learning_rate": 4.888888888888889e-06,
-      "loss": 0.1991,
-      "step": 239
-    },
-    {
-      "epoch": 1.92,
-      "grad_norm": 0.3791354298591614,
-      "learning_rate": 4.444444444444445e-06,
-      "loss": 0.1504,
-      "step": 240
-    },
-    {
-      "epoch": 1.928,
-      "grad_norm": 0.46336978673934937,
-      "learning_rate": 4.000000000000001e-06,
-      "loss": 0.1508,
-      "step": 241
-    },
-    {
-      "epoch": 1.936,
-      "grad_norm": 0.36507588624954224,
-      "learning_rate": 3.555555555555556e-06,
-      "loss": 0.1585,
-      "step": 242
-    },
-    {
-      "epoch": 1.944,
-      "grad_norm": 0.3430546820163727,
-      "learning_rate": 3.111111111111111e-06,
-      "loss": 0.1401,
-      "step": 243
-    },
-    {
-      "epoch": 1.952,
-      "grad_norm": 0.36047112941741943,
-      "learning_rate": 2.666666666666667e-06,
-      "loss": 0.1252,
-      "step": 244
-    },
-    {
-      "epoch": 1.96,
-      "grad_norm": 0.37298864126205444,
-      "learning_rate": 2.2222222222222225e-06,
-      "loss": 0.1378,
-      "step": 245
-    },
-    {
-      "epoch": 1.968,
-      "grad_norm": 0.42442232370376587,
-      "learning_rate": 1.777777777777778e-06,
-      "loss": 0.1851,
-      "step": 246
-    },
-    {
-      "epoch": 1.976,
-      "grad_norm": 0.40647417306900024,
-      "learning_rate": 1.3333333333333334e-06,
-      "loss": 0.142,
-      "step": 247
-    },
-    {
-      "epoch": 1.984,
-      "grad_norm": 0.4568365812301636,
-      "learning_rate": 8.88888888888889e-07,
-      "loss": 0.1439,
-      "step": 248
-    },
-    {
-      "epoch": 1.992,
-      "grad_norm": 0.4464185833930969,
-      "learning_rate": 4.444444444444445e-07,
-      "loss": 0.2106,
-      "step": 249
-    },
-    {
-      "epoch": 2.0,
-      "grad_norm": 0.4386111795902252,
-      "learning_rate": 0.0,
-      "loss": 0.1935,
-      "step": 250
-    }
-  ],
-  "logging_steps": 1,
-  "max_steps": 250,
-  "num_input_tokens_seen": 0,
-  "num_train_epochs": 2,
-  "save_steps": 500,
-  "stateful_callbacks": {
-    "TrainerControl": {
-      "args": {
-        "should_epoch_stop": false,
-        "should_evaluate": false,
-        "should_log": false,
-        "should_save": true,
-        "should_training_stop": true
-      },
-      "attributes": {}
-    }
-  },
-  "total_flos": 1.982387378420736e+16,
-  "train_batch_size": 2,
-  "trial_name": null,
-  "trial_params": null
-}
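The `log_history` entries above fit a linear learning-rate decay that reaches 0.0 at `max_steps` = 250. Each logged step lowers `learning_rate` by about 4.44e-7 (1e-4 / 225), and `epoch` advances by 0.008 per step, which works out to 125 optimizer steps per epoch. The sketch below reproduces a few logged values from that rule. The 1e-4 / 225 slope is an assumption fitted to the numbers shown here. It was not read from `training_args.bin`, so it may not match the configured scheduler outside the logged range.

```python
# Sketch: reproduce logged learning-rate and epoch values from an inferred linear-decay rule.
# ASSUMPTION: lr(step) = 1e-4 * (MAX_STEPS - step) / 225 was fitted to the log entries
# above (steps 139-250); the actual scheduler settings are in training_args.bin, not shown.
MAX_STEPS = 250
STEPS_PER_EPOCH = 125  # epoch 2.0 is logged at step 250


def inferred_lr(step: int) -> float:
    """Linearly decaying learning rate that reaches 0.0 at MAX_STEPS."""
    return 1e-4 * (MAX_STEPS - step) / 225


def inferred_epoch(step: int) -> float:
    """Fractional epoch as logged by the trainer (step / steps per epoch)."""
    return step / STEPS_PER_EPOCH


# (step, epoch, learning_rate) triples copied from log_history above.
LOGGED = [
    (140, 1.12, 4.888888888888889e-05),
    (160, 1.28, 4e-05),
    (205, 1.6400000000000001, 2e-05),
    (250, 2.0, 0.0),
]

for step, epoch, lr in LOGGED:
    assert abs(inferred_lr(step) - lr) < 1e-12, step
    assert abs(inferred_epoch(step) - epoch) < 1e-9, step
print("log_history matches a linear decay to 0 at step", MAX_STEPS)
```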

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/training_args.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3af9b30c506c005209b1c316cfa178d2a8137f74bf4ecbd882cd48586751746a
-size 5752

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/checkpoint-250/vocab.json DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/config.json DELETED

@@ -1,45 +0,0 @@
-{
-  "_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
-  "architectures": [
-    "Qwen2ForCausalLM"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 151643,
-  "eos_token_id": 151645,
-  "hidden_act": "silu",
-  "hidden_size": 3584,
-  "initializer_range": 0.02,
-  "intermediate_size": 18944,
-  "max_position_embeddings": 32768,
-  "max_window_layers": 28,
-  "model_type": "qwen2",
-  "num_attention_heads": 28,
-  "num_hidden_layers": 28,
-  "num_key_value_heads": 4,
-  "pad_token_id": 151645,
-  "quantization_config": {
-    "_load_in_4bit": true,
-    "_load_in_8bit": false,
-    "bnb_4bit_compute_dtype": "bfloat16",
-    "bnb_4bit_quant_storage": "uint8",
-    "bnb_4bit_quant_type": "nf4",
-    "bnb_4bit_use_double_quant": true,
-    "llm_int8_enable_fp32_cpu_offload": false,
-    "llm_int8_has_fp16_weight": false,
-    "llm_int8_skip_modules": null,
-    "llm_int8_threshold": 6.0,
-    "load_in_4bit": true,
-    "load_in_8bit": false,
-    "quant_method": "bitsandbytes"
-  },
-  "rms_norm_eps": 1e-06,
-  "rope_scaling": null,
-  "rope_theta": 1000000.0,
-  "sliding_window": null,
-  "tie_word_embeddings": false,
-  "torch_dtype": "bfloat16",
-  "transformers_version": "4.46.3",
-  "use_cache": true,
-  "use_sliding_window": false,
-  "vocab_size": 152064
-}
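Several `config.json` fields above determine the attention layout. `hidden_size` / `num_attention_heads` gives a head dimension of 128, and `num_key_value_heads` = 4 means grouped-query attention, with 7 query heads sharing each key/value head. The sketch below derives those numbers and a rough per-token KV-cache size. It assumes the cache is held in bfloat16 (2 bytes per element). The 4-bit bitsandbytes `quantization_config` is taken to apply to the weights, not to the cache. Both points are stated assumptions, not values recorded in the file.

```python
# Sketch: derive attention geometry and KV-cache size from the config.json values above.
# ASSUMPTION: the KV cache is stored in bfloat16 (2 bytes per element); the 4-bit
# quantization_config is assumed to affect the weights only, not the cache.
CONFIG = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "num_hidden_layers": 28,
    "max_position_embeddings": 32768,
}


def attention_geometry(cfg: dict) -> dict:
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # Grouped-query attention: several query heads share one key/value head.
    queries_per_kv = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    bytes_per_elem = 2  # bfloat16
    # Leading factor 2 = one key and one value vector per KV head, per layer, per token.
    kv_bytes_per_token = (
        2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim * bytes_per_elem
    )
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": queries_per_kv,
        "kv_bytes_per_token": kv_bytes_per_token,
        "kv_bytes_full_context": kv_bytes_per_token * cfg["max_position_embeddings"],
    }


geo = attention_geometry(CONFIG)
print(geo)
print(f"{geo['kv_bytes_full_context'] / 2**30:.2f} GiB for one full-length sequence")
```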

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/generation_config.json DELETED

@@ -1,14 +0,0 @@
-{
-  "bos_token_id": 151643,
-  "do_sample": true,
-  "eos_token_id": [
-    151645,
-    151643
-  ],
-  "pad_token_id": 151643,
-  "repetition_penalty": 1.1,
-  "temperature": 0.7,
-  "top_k": 20,
-  "top_p": 0.8,
-  "transformers_version": "4.46.3"
-}
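`generation_config.json` above enables sampling (`do_sample: true`) with `temperature` 0.7, `top_k` 20, `top_p` 0.8 and `repetition_penalty` 1.1. The plain-Python sketch below shows what those four knobs do to a vector of next-token logits. It is an illustration of the techniques, not the transformers implementation. The processing order (penalty, then temperature, then top-k, then top-p) is chosen here for clarity.

```python
import math


def sample_distribution(logits, seen_ids=(), temperature=0.7, top_k=20, top_p=0.8,
                        repetition_penalty=1.1):
    """Return {token_id: probability} for the next token after filtering.

    Defaults mirror generation_config.json above; the processing order
    (penalty -> temperature -> top-k -> top-p) is an illustrative choice.
    """
    scores = list(logits)
    # Repetition penalty: make already-generated tokens less likely
    # (divide positive logits, multiply negative ones).
    for i in set(seen_ids):
        if scores[i] > 0:
            scores[i] = scores[i] / repetition_penalty
        else:
            scores[i] = scores[i] * repetition_penalty
    # Temperature < 1 sharpens the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring ids, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    # Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i, w in zip(ranked, weights):
        nucleus.append((i, w / total))
        cumulative += w / total
        if cumulative >= top_p:
            break
    # Renormalise the nucleus so the probabilities sum to 1.
    mass = sum(p for _, p in nucleus)
    return {i: p / mass for i, p in nucleus}


probs = sample_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3,
                            repetition_penalty=1.0)
print(probs)  # only ids 0 and 1 survive the 0.8 nucleus
```

Lower `top_p` or `top_k` shrinks the candidate set. The repetition penalty only changes tokens listed in `seen_ids`.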

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/merges.txt DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00001-of-00003.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:f9efccab238c821a334f385d16161aa6e7e04fb63d61346151e9ad8ecaa23a09
-size 1982173107

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00002-of-00003.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1859c7479d8d0b3ee80c560ccc030190a8239696ee0595f0e1a03f41adc2e111
-size 1994606118

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model-00003-of-00003.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:87517c0ad31330cb532c9a37828c0bde73a8c001cfab952280503cc5dcb6a6c7
-size 1571140064
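Each `.safetensors` shard above is stored in the repository as a Git LFS pointer, not as the weights themselves. A pointer is three `key value` lines giving the spec version, the object's SHA-256 and its size in bytes. The minimal parser below reads the three pointers shown here and totals their sizes. It only sums the declared sizes and does not fetch or verify the LFS objects.

```python
# Sketch: parse the Git LFS pointer files (spec v1) shown above and total the shard sizes.
POINTERS = {
    "model-00001-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:f9efccab238c821a334f385d16161aa6e7e04fb63d61346151e9ad8ecaa23a09
size 1982173107
""",
    "model-00002-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:1859c7479d8d0b3ee80c560ccc030190a8239696ee0595f0e1a03f41adc2e111
size 1994606118
""",
    "model-00003-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:87517c0ad31330cb532c9a37828c0bde73a8c001cfab952280503cc5dcb6a6c7
size 1571140064
""",
}


def parse_lfs_pointer(text: str) -> dict:
    """Return {'version', 'oid', 'size'} from a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


total = sum(parse_lfs_pointer(p)["size"] for p in POINTERS.values())
print(f"{total} bytes (~{total / 1e9:.2f} GB) across {len(POINTERS)} shards")
```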

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/model.safetensors.index.json DELETED

The diff for this file is too large to render.

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/special_tokens_map.json DELETED

@@ -1,31 +0,0 @@
-{
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "eos_token": {
-    "content": "<|im_end|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
-}
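The special-token ids in this commit are spread across several files:

- `special_tokens_map.json` names `<|im_end|>` as EOS and `<|endoftext|>` as padding.
- The `added_tokens_decoder` in `tokenizer_config.json` maps those tokens to 151645 and 151643.
- `config.json` and `generation_config.json` each carry their own `*_token_id` fields.

The hand-rolled check below uses values copied from those files. It reports that `config.json` sets `pad_token_id` to 151645 (`<|im_end|>`), while the tokenizer pads with 151643. Whether that matters in practice depends on which value the loading or generation code reads, and this sketch does not model that.

```python
# Sketch: cross-check special-token ids across the deleted files in this commit.
# Values are copied by hand from special_tokens_map.json, the added_tokens_decoder of
# tokenizer_config.json, config.json and generation_config.json.
TOKEN_IDS = {"<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645}

SPECIAL_TOKENS_MAP = {"eos_token": "<|im_end|>", "pad_token": "<|endoftext|>"}
CONFIG_IDS = {"bos_token_id": 151643, "eos_token_id": 151645, "pad_token_id": 151645}
GENERATION_IDS = {"bos_token_id": 151643, "eos_token_id": [151645, 151643], "pad_token_id": 151643}


def check_special_ids():
    """Return (file, key, found, expected) tuples where a file disagrees with the tokenizer."""
    mismatches = []
    eos_id = TOKEN_IDS[SPECIAL_TOKENS_MAP["eos_token"]]
    pad_id = TOKEN_IDS[SPECIAL_TOKENS_MAP["pad_token"]]
    if CONFIG_IDS["eos_token_id"] != eos_id:
        mismatches.append(("config.json", "eos_token_id", CONFIG_IDS["eos_token_id"], eos_id))
    # generation_config.json lists several stop ids; the tokenizer's EOS must be among them.
    if eos_id not in GENERATION_IDS["eos_token_id"]:
        mismatches.append(("generation_config.json", "eos_token_id",
                           GENERATION_IDS["eos_token_id"], eos_id))
    for name, ids in (("config.json", CONFIG_IDS), ("generation_config.json", GENERATION_IDS)):
        if ids["pad_token_id"] != pad_id:
            mismatches.append((name, "pad_token_id", ids["pad_token_id"], pad_id))
    return mismatches


for file, key, found, expected in check_special_ids():
    print(f"{file}: {key}={found}, tokenizer expects {expected}")
```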

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:913950e4971737031da511cdd1b410daae4566f62eb845b3975bca5a102323d8
-size 11421995

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/tokenizer_config.json DELETED

@@ -1,207 +0,0 @@
-{
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "151643": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151644": {
-      "content": "<|im_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151645": {
-      "content": "<|im_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151646": {
-      "content": "<|object_ref_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151647": {
-      "content": "<|object_ref_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151648": {
-      "content": "<|box_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151649": {
-      "content": "<|box_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151650": {
-      "content": "<|quad_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151651": {
-      "content": "<|quad_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151652": {
-      "content": "<|vision_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151653": {
-      "content": "<|vision_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151654": {
-      "content": "<|vision_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151655": {
-      "content": "<|image_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151656": {
-      "content": "<|video_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151657": {
-      "content": "<tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151658": {
-      "content": "</tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151659": {
-      "content": "<|fim_prefix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151660": {
-      "content": "<|fim_middle|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151661": {
-      "content": "<|fim_suffix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151662": {
-      "content": "<|fim_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151663": {
-      "content": "<|repo_name|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151664": {
-      "content": "<|file_sep|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    }
-  },
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "bos_token": null,
-  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "model_max_length": 32768,
-  "pad_token": "<|endoftext|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}
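When no `tools` are passed, the `chat_template` above reduces to plain ChatML. It emits one `<|im_start|>role\n…<|im_end|>\n` block per message and injects a default Qwen system prompt if the first message is not a system message. With `add_generation_prompt` set, it also opens an assistant turn. The plain-Python approximation below covers only that no-tools path and is meant for reading the template, not as a replacement for `tokenizer.apply_chat_template`. Tool calls and tool responses are deliberately left out.

```python
# Sketch: render messages the way the no-tools branch of the chat_template above does.
# Tool calls and tool responses are deliberately not handled.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        # The template injects a default system turn when none is given.
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for msg in rest:
        if msg["role"] not in ("user", "assistant", "system"):
            raise ValueError(f"role not covered by this sketch: {msg['role']}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "What is 2 + 3?"}]))
```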

Qwen2.5-Coder-7B-Instruct-math-solver-config_3/vocab.json DELETED

The diff for this file is too large to render.