mbakgun committed on Dec 26, 2025

Commit 84bd380 (verified) · 1 parent: daf1f8d

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitattributes +6 -0
.ipynb_checkpoints/README-checkpoint.md +129 -0
.ipynb_checkpoints/test_model-checkpoint.py +32 -0
README.md +126 -3
adapter_config.json +42 -0
adapter_model.safetensors +3 -0
added_tokens.json +24 -0
chat_template.jinja +54 -0
checkpoint-200/README.md +208 -0
checkpoint-200/adapter_config.json +42 -0
checkpoint-200/adapter_model.safetensors +3 -0
checkpoint-200/added_tokens.json +24 -0
checkpoint-200/chat_template.jinja +54 -0
checkpoint-200/merges.txt +0 -0
checkpoint-200/optimizer.pt +3 -0
checkpoint-200/rng_state.pth +3 -0
checkpoint-200/scheduler.pt +3 -0
checkpoint-200/special_tokens_map.json +31 -0
checkpoint-200/tokenizer.json +3 -0
checkpoint-200/tokenizer_config.json +207 -0
checkpoint-200/trainer_state.json +2234 -0
checkpoint-200/training_args.bin +3 -0
checkpoint-200/vocab.json +0 -0
checkpoint-300/README.md +208 -0
checkpoint-300/adapter_config.json +42 -0
checkpoint-300/adapter_model.safetensors +3 -0
checkpoint-300/added_tokens.json +24 -0
checkpoint-300/chat_template.jinja +54 -0
checkpoint-300/merges.txt +0 -0
checkpoint-300/optimizer.pt +3 -0
checkpoint-300/rng_state.pth +3 -0
checkpoint-300/scheduler.pt +3 -0
checkpoint-300/special_tokens_map.json +31 -0
checkpoint-300/tokenizer.json +3 -0
checkpoint-300/tokenizer_config.json +207 -0
checkpoint-300/trainer_state.json +0 -0
checkpoint-300/training_args.bin +3 -0
checkpoint-300/vocab.json +0 -0
checkpoint-400/README.md +208 -0
checkpoint-400/adapter_config.json +42 -0
checkpoint-400/adapter_model.safetensors +3 -0
checkpoint-400/added_tokens.json +24 -0
checkpoint-400/chat_template.jinja +54 -0
checkpoint-400/merges.txt +0 -0
checkpoint-400/optimizer.pt +3 -0
checkpoint-400/rng_state.pth +3 -0
checkpoint-400/scheduler.pt +3 -0
checkpoint-400/special_tokens_map.json +31 -0
checkpoint-400/tokenizer.json +3 -0
checkpoint-400/tokenizer_config.json +207 -0

.gitattributes CHANGED

@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+checkpoint-200/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-300/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-400/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-432/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+img-assets/header.jpg filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

.ipynb_checkpoints/README-checkpoint.md ADDED

	@@ -0,0 +1,129 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+tags:
+  - n8n
+  - workflow
+  - automation
+  - fine-tuned
+  - code-generation
+  - qlora
+datasets:
+  - mbakgun/n8nbuilder-n8n-workflows-dataset
+pipeline_tag: text-generation
+language:
+  - en
+---
+# Qwen2.5-Coder-14B-n8n-Workflow-Generator
+![n8nbuilder.dev](./img-assets/header.jpg)
+Fine-tuned Qwen2.5-Coder-14B-Instruct model specialized for generating n8n workflow JSONs from natural language descriptions.
+## Model Description
+This model is a QLoRA fine-tuned version of [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) on the [n8nbuilder-n8n-workflows-dataset](https://huggingface.co/datasets/mbakgun/n8nbuilder-n8n-workflows-dataset), containing +2.5K n8n workflow templates.
+**Training Details:**
+- **Base Model**: Qwen/Qwen2.5-Coder-14B-Instruct
+- **Method**: QLoRA (4-bit quantization)
+- **LoRA Rank**: 32
+- **LoRA Alpha**: 64
+- **Training Steps**: 432 (3 epochs)
+- **Sequence Length**: 8192 tokens
+- **Learning Rate**: 2e-4
+## Usage
+### Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_name = "mbakgun/Qwen2.5-Coder-14B-n8n-Workflow-Generator"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+system_prompt = "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and error-free n8n workflow JSONs based on the user's requirements. Always output ONLY the valid JSON workflow."
+user_input = "Create a workflow that monitors a RSS feed and sends new items to Discord."
+prompt = f"{system_prompt}\n\n{user_input}"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=4096,
+    temperature=0.7,
+    do_sample=True
+)
+workflow_json = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(workflow_json)
+```
+### MLX (Apple Silicon)
+```bash
+# Convert to MLX
+mlx_lm.convert \
+  --hf-path mbakgun/Qwen2.5-Coder-14B-n8n-Workflow-Generator \
+  --mlx-path ./qwen25-n8n-mlx \
+  -q
+# Generate
+mlx_lm.generate \
+  --model ./qwen25-n8n-mlx \
+  --prompt "You are an expert n8n workflow generation assistant...\n\nCreate a workflow that sends Slack notifications when GitHub issues are created." \
+  --max-tokens 4096
+```
+## Training Data
+This model was fine-tuned on the [n8nbuilder-n8n-workflows-dataset](https://huggingface.co/datasets/mbakgun/n8nbuilder-n8n-workflows-dataset), which contains:
+- **2,304 workflow templates** (after filtering sequences >8192 tokens)
+- Format: Alpaca (instruction/input/output)
+- Source: n8n.io public template gallery
+- [n8nbuilder.dev - Create n8n Workflows in Seconds with AI](https://n8nbuilder.dev)
+## Performance
+- **Training Speed**: ~33.85s/step on H100 PCIe
+- **VRAM Usage**: ~30GB (4-bit QLoRA)
+- **Inference**: ~25-40 tok/s on Mac Mini M4 64GB (MLX)
+## Limitations
+- Generated workflows may require manual validation
+- Long workflows (>8192 tokens) may be truncated
+- Model trained on public templates only
+## Citation
+```bibtex
+@model{qwen25_coder_n8n_2025,
+  title={Qwen2.5-Coder-14B-n8n-Workflow-Generator},
+  author={mbakgun},
+  year={2025},
+  base_model={Qwen/Qwen2.5-Coder-14B-Instruct},
+  dataset={mbakgun/n8nbuilder-n8n-workflows-dataset},
+  url={https://huggingface.co/mbakgun/Qwen2.5-Coder-14B-n8n-Workflow-Generator}
+}
+```
+## Acknowledgments
+- [Qwen Team](https://huggingface.co/Qwen) for the base model
+- [n8n](https://n8n.io) for the workflow automation platform
+- [n8n-mcp](https://github.com/czlonkowski/n8n-mcp) for template indexing
+## License
+Apache 2.0

.ipynb_checkpoints/test_model-checkpoint.py ADDED

	@@ -0,0 +1,32 @@

+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+base_model_name = "Qwen/Qwen2.5-Coder-14B-Instruct"
+adapter_path = "./outputs/qwen25-coder-n8n"
+print("Loading base model...")
+base_model = AutoModelForCausalLM.from_pretrained(
+    base_model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+print("Loading adapter...")
+model = PeftModel.from_pretrained(base_model, adapter_path)
+tokenizer = AutoTokenizer.from_pretrained(base_model_name)
+system_prompt = "You are an expert n8n workflow generation assistant. Your goal is to create valid, efficient, and error-free n8n workflow JSONs based on the user's requirements. Always output ONLY the valid JSON workflow."
+user_input = "Create a workflow that gets data from a webhook and sends it to Slack. Also have a sticky note as documentation."
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": user_input}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+print("Generating workflow...")
+outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
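
The test script above decodes the whole generated sequence, so the printed text contains the rendered system and user turns ahead of the model's answer. A minimal sketch of pulling the workflow out of such a string follows. `extract_first_json` is a hypothetical helper that is not part of this commit, and it assumes the workflow is the first well-formed JSON object in the text.

```python
import json


def extract_first_json(text: str):
    """Return the first well-formed JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON at this brace; try the next one
        start = text.find("{", start + 1)
    return None


# Illustrative decoded output: prompt text followed by the model's JSON.
sample = 'system\n...\nassistant\n{"name": "Webhook to Slack", "nodes": []}'
workflow = extract_first_json(sample)
print(workflow)
```

Parsing the result this way also gives a quick validity check: a `None` return means the model did not emit parseable JSON and the output needs manual review.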

README.md CHANGED

@@ -1,3 +1,126 @@
----
-license: apache-2.0
----

+---
+library_name: peft
+license: apache-2.0
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+tags:
+- axolotl
+- base_model:adapter:Qwen/Qwen2.5-Coder-14B-Instruct
+- lora
+- transformers
+datasets:
+- mbakgun/n8nbuilder-n8n-workflows-dataset
+pipeline_tag: text-generation
+model-index:
+- name: outputs/qwen25-coder-n8n
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.13.0.dev0`
+```yaml
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+load_in_4bit: true
+adapter: qlora
+bnb_4bit_compute_dtype: bfloat16
+bnb_4bit_use_double_quant: true
+bnb_4bit_quant_type: nf4
+lora_r: 32
+lora_alpha: 64
+lora_dropout: 0.05
+lora_target_modules:
+  - q_proj
+  - k_proj
+  - v_proj
+  - o_proj
+  - gate_proj
+  - up_proj
+  - down_proj
+datasets:
+  - path: mbakgun/n8nbuilder-n8n-workflows-dataset
+    type: alpaca
+sequence_len: 8192
+sample_packing: false
+pad_to_sequence_len: false
+micro_batch_size: 1
+gradient_accumulation_steps: 16
+num_epochs: 3
+learning_rate: 2e-4
+lr_scheduler: cosine
+warmup_ratio: 0.1
+weight_decay: 0.01
+optimizer: adamw_bnb_8bit
+bf16: true
+tf32: true
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+train_on_inputs: false
+output_dir: ./outputs/qwen25-coder-n8n
+save_strategy: steps
+save_steps: 100
+logging_steps: 1
+flash_attention: true
+```
+</details><br>
+# outputs/qwen25-coder-n8n
+This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) on the mbakgun/n8nbuilder-n8n-workflows-dataset dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
+- optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 43
+- training_steps: 432
+### Training results
+### Framework versions
+- PEFT 0.17.1
+- Transformers 4.57.0
+- Pytorch 2.7.1+cu126
+- Datasets 4.0.0
+- Tokenizers 0.22.1
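
The step counts in this model card can be cross-checked against the axolotl config. The sketch below is a sanity check rather than the trainer's actual code: it takes the 2,304-example count from the other README in this commit, assumes a single GPU with no held-out evaluation split, and assumes warmup steps are the warmup ratio times the total steps, truncated to an integer.

```python
import math

# Values taken from the config and model cards in this commit.
num_examples = 2304               # templates left after length filtering
micro_batch_size = 1
gradient_accumulation_steps = 16
num_epochs = 3
warmup_ratio = 0.1
peak_lr = 2e-4

effective_batch = micro_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs
# Assumption: warmup steps are ratio * total steps, truncated.
warmup_steps = int(warmup_ratio * total_steps)


def warmup_lr(step: int) -> float:
    """Learning rate at a 1-based step, assuming linear warmup from zero."""
    return peak_lr * min(step - 1, warmup_steps) / warmup_steps


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
print(warmup_lr(2), 200 / steps_per_epoch)
```

Under these assumptions the numbers come out to 144 steps per epoch, 432 total steps, and 43 warmup steps, which agrees with `training_steps: 432` and `lr_scheduler_warmup_steps: 43`. The same arithmetic gives an epoch of about 1.389 at step 200 and a step-2 learning rate of 2e-4 / 43, both of which match the values logged in `checkpoint-200/trainer_state.json`.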

adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": [],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b1d9d4ffcee0be992300b65795d78890d05f2efab9415fbf6c3b0d766246125
+size 550593184
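
The adapter size can be roughly estimated from `adapter_config.json`. The sketch below counts LoRA parameters for the seven targeted projections. The model dimensions (hidden size 5120, intermediate size 13824, 48 layers, 40 attention heads, 8 key/value heads) are assumptions about the Qwen2.5-14B architecture and do not appear in this commit; only `r` and `lora_alpha` come from the config shown above.

```python
# Assumed Qwen2.5-14B dimensions (not shown in this commit).
hidden_size = 5120
intermediate_size = 13824
num_layers = 48
num_heads = 40
num_kv_heads = 8
head_dim = hidden_size // num_heads
kv_dim = num_kv_heads * head_dim

r, lora_alpha = 32, 64  # from adapter_config.json

# (in_features, out_features) for each targeted projection.
shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
params_per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total_params = params_per_layer * num_layers
scaling = lora_alpha / r          # plain LoRA scaling, since use_rslora is false
fp32_bytes = total_params * 4     # size if every weight is stored as float32

print(total_params, scaling, fp32_bytes)
```

With these assumed dimensions the count is 137,625,600 trainable parameters, or 550,502,400 bytes in float32. That is within about 91 KB of the 550,593,184-byte size in the pointer above, which would be consistent with float32 storage plus file metadata, although the commit does not state the dtype.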

added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
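
For plain system, user, and assistant text turns, the template above reduces to the ChatML layout. The sketch below mirrors only the branch without tools and does not handle tool calls or tool responses. `render_chatml` is a hypothetical name used for illustration; in practice `tokenizer.apply_chat_template` renders the real template.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Mirror the template's no-tools branch for plain text messages."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for message in rest:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chatml(
    [
        {"role": "system", "content": "S"},
        {"role": "user", "content": "U"},
    ]
)
print(rendered)
```

Comparing this string with the output of `apply_chat_template` is a quick way to confirm that a prompt built by hand matches what the tokenizer would produce.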

checkpoint-200/README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- axolotl
+- base_model:adapter:Qwen/Qwen2.5-Coder-14B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-200/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": [],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e9721072167c0246f36fd6fb94902be365c05f3d748ece7ae1cd9bd285f36b00
+size 550593184

checkpoint-200/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-200/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}

checkpoint-200/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:699f57458d973a6dfd8c65ff135cc49dcbd877e444891872bd7ec217061e98da
+size 280341861

checkpoint-200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:16f82cf93e8654130004eec31b114ac5df86dc9281afce0ca2f6572fb5a44f6b
+size 14645

checkpoint-200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:91f21784a8f0cb263be2beb02bbf445ff22229c9edf3c87d40868a63594e9f4e
+size 1465
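
The three-line entries shown for the binary files are Git LFS pointer files: the diff records a spec version, a content hash, and a byte size instead of the binary itself. A minimal sketch of reading one follows. `parse_lfs_pointer` is a hypothetical helper, not part of this commit, and it assumes the simple three-key layout seen here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split at the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the scheduler.pt entry above, minus the diff "+".
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:91f21784a8f0cb263be2beb02bbf445ff22229c9edf3c87d40868a63594e9f4e
size 1465
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])
```

The `size` field is what makes it possible to compare checkpoint artifacts from the diff alone, for example the 550,593,184-byte adapter against the 280,341,861-byte optimizer state.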

checkpoint-200/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-200/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

checkpoint-200/tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-200/trainer_state.json ADDED

	@@ -0,0 +1,2234 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.3888888888888888,
+  "eval_steps": 500,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.006944444444444444,
+      "grad_norm": 0.09848134219646454,
+      "learning_rate": 0.0,
+      "loss": 1.0821,
+      "memory/device_reserved (GiB)": 30.8,
+      "memory/max_active (GiB)": 27.69,
+      "memory/max_allocated (GiB)": 27.69,
+      "step": 1,
+      "tokens_per_second_per_gpu": 1763.68
+    },
+    {
+      "epoch": 0.013888888888888888,
+      "grad_norm": 0.11234692484140396,
+      "learning_rate": 4.651162790697674e-06,
+      "loss": 1.2119,
+      "memory/device_reserved (GiB)": 31.05,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 2,
+      "tokens_per_second_per_gpu": 1834.64
+    },
+    {
+      "epoch": 0.020833333333333332,
+      "grad_norm": 0.11071926355361938,
+      "learning_rate": 9.302325581395349e-06,
+      "loss": 1.2053,
+      "memory/device_reserved (GiB)": 31.05,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 3,
+      "tokens_per_second_per_gpu": 1875.82
+    },
+    {
+      "epoch": 0.027777777777777776,
+      "grad_norm": 0.10147764533758163,
+      "learning_rate": 1.3953488372093024e-05,
+      "loss": 1.0514,
+      "memory/device_reserved (GiB)": 32.27,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 4,
+      "tokens_per_second_per_gpu": 1792.72
+    },
+    {
+      "epoch": 0.034722222222222224,
+      "grad_norm": 0.10568977892398834,
+      "learning_rate": 1.8604651162790697e-05,
+      "loss": 1.209,
+      "memory/device_reserved (GiB)": 32.27,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 5,
+      "tokens_per_second_per_gpu": 1840.61
+    },
+    {
+      "epoch": 0.041666666666666664,
+      "grad_norm": 0.10363873094320297,
+      "learning_rate": 2.3255813953488374e-05,
+      "loss": 1.0817,
+      "memory/device_reserved (GiB)": 32.27,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 6,
+      "tokens_per_second_per_gpu": 1784.92
+    },
+    {
+      "epoch": 0.04861111111111111,
+      "grad_norm": 0.113986074924469,
+      "learning_rate": 2.7906976744186048e-05,
+      "loss": 1.1571,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 7,
+      "tokens_per_second_per_gpu": 1892.63
+    },
+    {
+      "epoch": 0.05555555555555555,
+      "grad_norm": 0.1191892921924591,
+      "learning_rate": 3.2558139534883724e-05,
+      "loss": 1.1444,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 8,
+      "tokens_per_second_per_gpu": 1863.93
+    },
+    {
+      "epoch": 0.0625,
+      "grad_norm": 0.11628979444503784,
+      "learning_rate": 3.7209302325581394e-05,
+      "loss": 1.1786,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 24.64,
+      "memory/max_allocated (GiB)": 24.64,
+      "step": 9,
+      "tokens_per_second_per_gpu": 1764.21
+    },
+    {
+      "epoch": 0.06944444444444445,
+      "grad_norm": 0.10155434161424637,
+      "learning_rate": 4.186046511627907e-05,
+      "loss": 1.0695,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 10,
+      "tokens_per_second_per_gpu": 1823.2
+    },
+    {
+      "epoch": 0.0763888888888889,
+      "grad_norm": 0.08485760539770126,
+      "learning_rate": 4.651162790697675e-05,
+      "loss": 1.0805,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 11,
+      "tokens_per_second_per_gpu": 1805.0
+    },
+    {
+      "epoch": 0.08333333333333333,
+      "grad_norm": 0.07211048156023026,
+      "learning_rate": 5.1162790697674425e-05,
+      "loss": 1.0824,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 12,
+      "tokens_per_second_per_gpu": 1799.74
+    },
+    {
+      "epoch": 0.09027777777777778,
+      "grad_norm": 0.06483420729637146,
+      "learning_rate": 5.5813953488372095e-05,
+      "loss": 1.0264,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 13,
+      "tokens_per_second_per_gpu": 1798.61
+    },
+    {
+      "epoch": 0.09722222222222222,
+      "grad_norm": 0.06657296419143677,
+      "learning_rate": 6.0465116279069765e-05,
+      "loss": 1.0967,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 14,
+      "tokens_per_second_per_gpu": 1788.44
+    },
+    {
+      "epoch": 0.10416666666666667,
+      "grad_norm": 0.195042684674263,
+      "learning_rate": 6.511627906976745e-05,
+      "loss": 1.1489,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 24.8,
+      "memory/max_allocated (GiB)": 24.8,
+      "step": 15,
+      "tokens_per_second_per_gpu": 1785.21
+    },
+    {
+      "epoch": 0.1111111111111111,
+      "grad_norm": 0.07728952169418335,
+      "learning_rate": 6.976744186046513e-05,
+      "loss": 1.0985,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 16,
+      "tokens_per_second_per_gpu": 1752.91
+    },
+    {
+      "epoch": 0.11805555555555555,
+      "grad_norm": 0.08134876191616058,
+      "learning_rate": 7.441860465116279e-05,
+      "loss": 1.1112,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 17,
+      "tokens_per_second_per_gpu": 1813.48
+    },
+    {
+      "epoch": 0.125,
+      "grad_norm": 0.08289807289838791,
+      "learning_rate": 7.906976744186047e-05,
+      "loss": 1.0222,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 18,
+      "tokens_per_second_per_gpu": 1852.47
+    },
+    {
+      "epoch": 0.13194444444444445,
+      "grad_norm": 0.09635733813047409,
+      "learning_rate": 8.372093023255814e-05,
+      "loss": 1.1493,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 19,
+      "tokens_per_second_per_gpu": 1745.06
+    },
+    {
+      "epoch": 0.1388888888888889,
+      "grad_norm": 0.08602173626422882,
+      "learning_rate": 8.837209302325582e-05,
+      "loss": 0.9912,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 26.25,
+      "memory/max_allocated (GiB)": 26.25,
+      "step": 20,
+      "tokens_per_second_per_gpu": 1758.58
+    },
+    {
+      "epoch": 0.14583333333333334,
+      "grad_norm": 0.08320974558591843,
+      "learning_rate": 9.30232558139535e-05,
+      "loss": 1.1637,
+      "memory/device_reserved (GiB)": 32.33,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 21,
+      "tokens_per_second_per_gpu": 1772.58
+    },
+    {
+      "epoch": 0.1527777777777778,
+      "grad_norm": 0.0785663053393364,
+      "learning_rate": 9.767441860465116e-05,
+      "loss": 1.0209,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 22,
+      "tokens_per_second_per_gpu": 1822.66
+    },
+    {
+      "epoch": 0.1597222222222222,
+      "grad_norm": 0.07734047621488571,
+      "learning_rate": 0.00010232558139534885,
+      "loss": 0.9858,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 23,
+      "tokens_per_second_per_gpu": 1889.88
+    },
+    {
+      "epoch": 0.16666666666666666,
+      "grad_norm": 0.07255646586418152,
+      "learning_rate": 0.00010697674418604651,
+      "loss": 1.0003,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 24,
+      "tokens_per_second_per_gpu": 1833.16
+    },
+    {
+      "epoch": 0.1736111111111111,
+      "grad_norm": 0.07897679507732391,
+      "learning_rate": 0.00011162790697674419,
+      "loss": 1.0143,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 25,
+      "tokens_per_second_per_gpu": 1900.98
+    },
+    {
+      "epoch": 0.18055555555555555,
+      "grad_norm": 0.09510312229394913,
+      "learning_rate": 0.00011627906976744187,
+      "loss": 1.0341,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 23.18,
+      "memory/max_allocated (GiB)": 23.18,
+      "step": 26,
+      "tokens_per_second_per_gpu": 1701.34
+    },
+    {
+      "epoch": 0.1875,
+      "grad_norm": 0.07016909122467041,
+      "learning_rate": 0.00012093023255813953,
+      "loss": 1.0004,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.38,
+      "memory/max_allocated (GiB)": 28.38,
+      "step": 27,
+      "tokens_per_second_per_gpu": 1891.5
+    },
+    {
+      "epoch": 0.19444444444444445,
+      "grad_norm": 0.07151541113853455,
+      "learning_rate": 0.0001255813953488372,
+      "loss": 0.9588,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 28,
+      "tokens_per_second_per_gpu": 1890.08
+    },
+    {
+      "epoch": 0.2013888888888889,
+      "grad_norm": 0.07155290246009827,
+      "learning_rate": 0.0001302325581395349,
+      "loss": 1.0078,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 29,
+      "tokens_per_second_per_gpu": 1759.11
+    },
+    {
+      "epoch": 0.20833333333333334,
+      "grad_norm": 0.08267220109701157,
+      "learning_rate": 0.00013488372093023256,
+      "loss": 1.0343,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.49,
+      "memory/max_allocated (GiB)": 26.49,
+      "step": 30,
+      "tokens_per_second_per_gpu": 1794.99
+    },
+    {
+      "epoch": 0.2152777777777778,
+      "grad_norm": 0.06379543989896774,
+      "learning_rate": 0.00013953488372093025,
+      "loss": 0.9155,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 31,
+      "tokens_per_second_per_gpu": 1829.94
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.07846751064062119,
+      "learning_rate": 0.00014418604651162791,
+      "loss": 1.0727,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 23.79,
+      "memory/max_allocated (GiB)": 23.79,
+      "step": 32,
+      "tokens_per_second_per_gpu": 1854.31
+    },
+    {
+      "epoch": 0.22916666666666666,
+      "grad_norm": 0.07601239532232285,
+      "learning_rate": 0.00014883720930232558,
+      "loss": 1.0472,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 33,
+      "tokens_per_second_per_gpu": 1806.02
+    },
+    {
+      "epoch": 0.2361111111111111,
+      "grad_norm": 0.09074926376342773,
+      "learning_rate": 0.00015348837209302327,
+      "loss": 1.034,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 23.79,
+      "memory/max_allocated (GiB)": 23.79,
+      "step": 34,
+      "tokens_per_second_per_gpu": 1814.81
+    },
+    {
+      "epoch": 0.24305555555555555,
+      "grad_norm": 0.07441543787717819,
+      "learning_rate": 0.00015813953488372093,
+      "loss": 0.9786,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 35,
+      "tokens_per_second_per_gpu": 1865.18
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.08436308056116104,
+      "learning_rate": 0.00016279069767441862,
+      "loss": 0.9218,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 36,
+      "tokens_per_second_per_gpu": 1860.2
+    },
+    {
+      "epoch": 0.2569444444444444,
+      "grad_norm": 0.07554468512535095,
+      "learning_rate": 0.00016744186046511629,
+      "loss": 1.0504,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 37,
+      "tokens_per_second_per_gpu": 1816.31
+    },
+    {
+      "epoch": 0.2638888888888889,
+      "grad_norm": 0.09911656379699707,
+      "learning_rate": 0.00017209302325581395,
+      "loss": 0.9141,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.26,
+      "memory/max_allocated (GiB)": 26.26,
+      "step": 38,
+      "tokens_per_second_per_gpu": 1863.8
+    },
+    {
+      "epoch": 0.2708333333333333,
+      "grad_norm": 0.07778877764940262,
+      "learning_rate": 0.00017674418604651164,
+      "loss": 1.0342,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 39,
+      "tokens_per_second_per_gpu": 1619.33
+    },
+    {
+      "epoch": 0.2777777777777778,
+      "grad_norm": 0.09776122868061066,
+      "learning_rate": 0.0001813953488372093,
+      "loss": 0.9365,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 40,
+      "tokens_per_second_per_gpu": 1708.88
+    },
+    {
+      "epoch": 0.2847222222222222,
+      "grad_norm": 0.09212527424097061,
+      "learning_rate": 0.000186046511627907,
+      "loss": 0.9455,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 41,
+      "tokens_per_second_per_gpu": 1854.36
+    },
+    {
+      "epoch": 0.2916666666666667,
+      "grad_norm": 0.1160384938120842,
+      "learning_rate": 0.00019069767441860466,
+      "loss": 0.9964,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 23.7,
+      "memory/max_allocated (GiB)": 23.7,
+      "step": 42,
+      "tokens_per_second_per_gpu": 1775.41
+    },
+    {
+      "epoch": 0.2986111111111111,
+      "grad_norm": 0.06805545091629028,
+      "learning_rate": 0.00019534883720930232,
+      "loss": 0.8627,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 43,
+      "tokens_per_second_per_gpu": 1830.25
+    },
+    {
+      "epoch": 0.3055555555555556,
+      "grad_norm": 0.06951376795768738,
+      "learning_rate": 0.0002,
+      "loss": 0.9417,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 44,
+      "tokens_per_second_per_gpu": 1843.05
+    },
+    {
+      "epoch": 0.3125,
+      "grad_norm": 0.06728649139404297,
+      "learning_rate": 0.00019999673886943734,
+      "loss": 0.9017,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 45,
+      "tokens_per_second_per_gpu": 1889.73
+    },
+    {
+      "epoch": 0.3194444444444444,
+      "grad_norm": 0.0888209342956543,
+      "learning_rate": 0.0001999869556904488,
+      "loss": 1.0087,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 46,
+      "tokens_per_second_per_gpu": 1805.58
+    },
+    {
+      "epoch": 0.3263888888888889,
+      "grad_norm": 0.07477093487977982,
+      "learning_rate": 0.00019997065110111885,
+      "loss": 0.9246,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 47,
+      "tokens_per_second_per_gpu": 1807.73
+    },
+    {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 0.08000776916742325,
+      "learning_rate": 0.00019994782616487538,
+      "loss": 0.936,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 48,
+      "tokens_per_second_per_gpu": 1732.16
+    },
+    {
+      "epoch": 0.3402777777777778,
+      "grad_norm": 0.2703610956668854,
+      "learning_rate": 0.00019991848237042035,
+      "loss": 0.9732,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.64,
+      "memory/max_allocated (GiB)": 24.64,
+      "step": 49,
+      "tokens_per_second_per_gpu": 1814.28
+    },
+    {
+      "epoch": 0.3472222222222222,
+      "grad_norm": 0.08173573762178421,
+      "learning_rate": 0.00019988262163163264,
+      "loss": 0.9867,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.25,
+      "memory/max_allocated (GiB)": 26.25,
+      "step": 50,
+      "tokens_per_second_per_gpu": 1746.16
+    },
+    {
+      "epoch": 0.3541666666666667,
+      "grad_norm": 0.06703449040651321,
+      "learning_rate": 0.00019984024628744328,
+      "loss": 0.9353,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.31,
+      "memory/max_allocated (GiB)": 25.31,
+      "step": 51,
+      "tokens_per_second_per_gpu": 1803.49
+    },
+    {
+      "epoch": 0.3611111111111111,
+      "grad_norm": 0.0770621970295906,
+      "learning_rate": 0.0001997913591016829,
+      "loss": 0.9705,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.44,
+      "memory/max_allocated (GiB)": 27.44,
+      "step": 52,
+      "tokens_per_second_per_gpu": 1805.23
+    },
+    {
+      "epoch": 0.3680555555555556,
+      "grad_norm": 0.08800782263278961,
+      "learning_rate": 0.00019973596326290137,
+      "loss": 0.9082,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 53,
+      "tokens_per_second_per_gpu": 1787.79
+    },
+    {
+      "epoch": 0.375,
+      "grad_norm": 0.0656328946352005,
+      "learning_rate": 0.00019967406238415998,
+      "loss": 0.964,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 54,
+      "tokens_per_second_per_gpu": 1888.13
+    },
+    {
+      "epoch": 0.3819444444444444,
+      "grad_norm": 0.09178014099597931,
+      "learning_rate": 0.00019960566050279566,
+      "loss": 0.918,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 55,
+      "tokens_per_second_per_gpu": 1822.06
+    },
+    {
+      "epoch": 0.3888888888888889,
+      "grad_norm": 0.07544898241758347,
+      "learning_rate": 0.00019953076208015772,
+      "loss": 1.0078,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 56,
+      "tokens_per_second_per_gpu": 1779.76
+    },
+    {
+      "epoch": 0.3958333333333333,
+      "grad_norm": 0.07013165950775146,
+      "learning_rate": 0.0001994493720013169,
+      "loss": 0.952,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 57,
+      "tokens_per_second_per_gpu": 1875.9
+    },
+    {
+      "epoch": 0.4027777777777778,
+      "grad_norm": 0.2212851643562317,
+      "learning_rate": 0.00019936149557474666,
+      "loss": 0.9751,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 58,
+      "tokens_per_second_per_gpu": 1808.54
+    },
+    {
+      "epoch": 0.4097222222222222,
+      "grad_norm": 0.0661102756857872,
+      "learning_rate": 0.00019926713853197695,
+      "loss": 0.9055,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.74,
+      "memory/max_allocated (GiB)": 24.74,
+      "step": 59,
+      "tokens_per_second_per_gpu": 1802.3
+    },
+    {
+      "epoch": 0.4166666666666667,
+      "grad_norm": 0.08663811534643173,
+      "learning_rate": 0.0001991663070272206,
+      "loss": 0.9826,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 60,
+      "tokens_per_second_per_gpu": 1707.88
+    },
+    {
+      "epoch": 0.4236111111111111,
+      "grad_norm": 0.07683200389146805,
+      "learning_rate": 0.0001990590076369715,
+      "loss": 0.9759,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 61,
+      "tokens_per_second_per_gpu": 1895.98
+    },
+    {
+      "epoch": 0.4305555555555556,
+      "grad_norm": 0.07595925778150558,
+      "learning_rate": 0.00019894524735957622,
+      "loss": 0.9168,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 62,
+      "tokens_per_second_per_gpu": 1797.18
+    },
+    {
+      "epoch": 0.4375,
+      "grad_norm": 0.07661418616771698,
+      "learning_rate": 0.00019882503361477705,
+      "loss": 0.9679,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 63,
+      "tokens_per_second_per_gpu": 1830.69
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 0.08054457604885101,
+      "learning_rate": 0.00019869837424322829,
+      "loss": 0.9592,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 64,
+      "tokens_per_second_per_gpu": 1798.08
+    },
+    {
+      "epoch": 0.4513888888888889,
+      "grad_norm": 0.08320043236017227,
+      "learning_rate": 0.00019856527750598493,
+      "loss": 0.9257,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 65,
+      "tokens_per_second_per_gpu": 1795.23
+    },
+    {
+      "epoch": 0.4583333333333333,
+      "grad_norm": 0.0733579471707344,
+      "learning_rate": 0.00019842575208396372,
+      "loss": 0.8969,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 66,
+      "tokens_per_second_per_gpu": 1861.37
+    },
+    {
+      "epoch": 0.4652777777777778,
+      "grad_norm": 0.29595091938972473,
+      "learning_rate": 0.00019827980707737703,
+      "loss": 0.8604,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 67,
+      "tokens_per_second_per_gpu": 1844.94
+    },
+    {
+      "epoch": 0.4722222222222222,
+      "grad_norm": 0.10486430674791336,
+      "learning_rate": 0.00019812745200513927,
+      "loss": 0.9479,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 68,
+      "tokens_per_second_per_gpu": 1823.2
+    },
+    {
+      "epoch": 0.4791666666666667,
+      "grad_norm": 0.13543325662612915,
+      "learning_rate": 0.0001979686968042461,
+      "loss": 0.9287,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 69,
+      "tokens_per_second_per_gpu": 1821.33
+    },
+    {
+      "epoch": 0.4861111111111111,
+      "grad_norm": 0.07873474061489105,
+      "learning_rate": 0.00019780355182912626,
+      "loss": 0.9248,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 70,
+      "tokens_per_second_per_gpu": 1771.5
+    },
+    {
+      "epoch": 0.4930555555555556,
+      "grad_norm": 0.06926668435335159,
+      "learning_rate": 0.0001976320278509663,
+      "loss": 0.9172,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 71,
+      "tokens_per_second_per_gpu": 1875.05
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 0.08082268387079239,
+      "learning_rate": 0.0001974541360570079,
+      "loss": 0.8823,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 72,
+      "tokens_per_second_per_gpu": 1865.89
+    },
+    {
+      "epoch": 0.5069444444444444,
+      "grad_norm": 0.07178379595279694,
+      "learning_rate": 0.00019726988804981844,
+      "loss": 0.9185,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 73,
+      "tokens_per_second_per_gpu": 1854.53
+    },
+    {
+      "epoch": 0.5138888888888888,
+      "grad_norm": 0.07196955382823944,
+      "learning_rate": 0.00019707929584653408,
+      "loss": 0.9461,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 74,
+      "tokens_per_second_per_gpu": 1820.12
+    },
+    {
+      "epoch": 0.5208333333333334,
+      "grad_norm": 0.07153692096471786,
+      "learning_rate": 0.00019688237187807594,
+      "loss": 1.0447,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 75,
+      "tokens_per_second_per_gpu": 1809.64
+    },
+    {
+      "epoch": 0.5277777777777778,
+      "grad_norm": 0.06721945106983185,
+      "learning_rate": 0.00019667912898833955,
+      "loss": 0.8106,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 23.22,
+      "memory/max_allocated (GiB)": 23.22,
+      "step": 76,
+      "tokens_per_second_per_gpu": 1785.2
+    },
+    {
+      "epoch": 0.5347222222222222,
+      "grad_norm": 0.08322236686944962,
+      "learning_rate": 0.00019646958043335677,
+      "loss": 0.9299,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.64,
+      "memory/max_allocated (GiB)": 24.64,
+      "step": 77,
+      "tokens_per_second_per_gpu": 1836.34
+    },
+    {
+      "epoch": 0.5416666666666666,
+      "grad_norm": 0.06773433834314346,
+      "learning_rate": 0.00019625373988043165,
+      "loss": 0.9262,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 78,
+      "tokens_per_second_per_gpu": 1850.19
+    },
+    {
+      "epoch": 0.5486111111111112,
+      "grad_norm": 0.06558340042829514,
+      "learning_rate": 0.00019603162140724862,
+      "loss": 0.9067,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 79,
+      "tokens_per_second_per_gpu": 1876.53
+    },
+    {
+      "epoch": 0.5555555555555556,
+      "grad_norm": 0.06962298601865768,
+      "learning_rate": 0.0001958032395009545,
+      "loss": 0.8971,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 80,
+      "tokens_per_second_per_gpu": 1767.39
+    },
+    {
+      "epoch": 0.5625,
+      "grad_norm": 0.08821487426757812,
+      "learning_rate": 0.00019556860905721362,
+      "loss": 0.9593,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 81,
+      "tokens_per_second_per_gpu": 1789.22
+    },
+    {
+      "epoch": 0.5694444444444444,
+      "grad_norm": 0.07249249517917633,
+      "learning_rate": 0.00019532774537923617,
+      "loss": 0.9409,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.44,
+      "memory/max_allocated (GiB)": 27.44,
+      "step": 82,
+      "tokens_per_second_per_gpu": 1764.78
+    },
+    {
+      "epoch": 0.5763888888888888,
+      "grad_norm": 0.08999690413475037,
+      "learning_rate": 0.00019508066417678018,
+      "loss": 0.8989,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 83,
+      "tokens_per_second_per_gpu": 1821.55
+    },
+    {
+      "epoch": 0.5833333333333334,
+      "grad_norm": 0.06516412645578384,
+      "learning_rate": 0.00019482738156512692,
+      "loss": 0.957,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.38,
+      "memory/max_allocated (GiB)": 28.38,
+      "step": 84,
+      "tokens_per_second_per_gpu": 1902.34
+    },
+    {
+      "epoch": 0.5902777777777778,
+      "grad_norm": 0.0862964540719986,
+      "learning_rate": 0.00019456791406402964,
+      "loss": 0.9576,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 85,
+      "tokens_per_second_per_gpu": 1848.41
+    },
+    {
+      "epoch": 0.5972222222222222,
+      "grad_norm": 0.11609019339084625,
+      "learning_rate": 0.00019430227859663633,
+      "loss": 0.9729,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 86,
+      "tokens_per_second_per_gpu": 1869.76
+    },
+    {
+      "epoch": 0.6041666666666666,
+      "grad_norm": 0.08986852318048477,
+      "learning_rate": 0.00019403049248838578,
+      "loss": 0.9315,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 87,
+      "tokens_per_second_per_gpu": 1824.83
+    },
+    {
+      "epoch": 0.6111111111111112,
+      "grad_norm": 0.08460818976163864,
+      "learning_rate": 0.00019375257346587773,
+      "loss": 0.9937,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 88,
+      "tokens_per_second_per_gpu": 1763.25
+    },
+    {
+      "epoch": 0.6180555555555556,
+      "grad_norm": 0.07170393317937851,
+      "learning_rate": 0.0001934685396557165,
+      "loss": 0.937,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 89,
+      "tokens_per_second_per_gpu": 1806.52
+    },
+    {
+      "epoch": 0.625,
+      "grad_norm": 0.1740889698266983,
+      "learning_rate": 0.00019317840958332888,
+      "loss": 0.8459,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 90,
+      "tokens_per_second_per_gpu": 1882.4
+    },
+    {
+      "epoch": 0.6319444444444444,
+      "grad_norm": 0.06397266685962677,
+      "learning_rate": 0.00019288220217175583,
+      "loss": 0.7879,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 91,
+      "tokens_per_second_per_gpu": 1867.77
+    },
+    {
+      "epoch": 0.6388888888888888,
+      "grad_norm": 0.06102391704916954,
+      "learning_rate": 0.00019257993674041813,
+      "loss": 0.8671,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 92,
+      "tokens_per_second_per_gpu": 1878.38
+    },
+    {
+      "epoch": 0.6458333333333334,
+      "grad_norm": 0.06330034881830215,
+      "learning_rate": 0.00019227163300385662,
+      "loss": 0.9089,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 93,
+      "tokens_per_second_per_gpu": 1859.18
+    },
+    {
+      "epoch": 0.6527777777777778,
+      "grad_norm": 0.07149945199489594,
+      "learning_rate": 0.00019195731107044594,
+      "loss": 0.8842,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 94,
+      "tokens_per_second_per_gpu": 1784.86
+    },
+    {
+      "epoch": 0.6597222222222222,
+      "grad_norm": 0.07301725447177887,
+      "learning_rate": 0.0001916369914410834,
+      "loss": 0.875,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 95,
+      "tokens_per_second_per_gpu": 1697.59
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 0.060419220477342606,
+      "learning_rate": 0.00019131069500785174,
+      "loss": 0.798,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 96,
+      "tokens_per_second_per_gpu": 1802.63
+    },
+    {
+      "epoch": 0.6736111111111112,
+      "grad_norm": 0.06787065416574478,
+      "learning_rate": 0.00019097844305265624,
+      "loss": 0.8801,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 97,
+      "tokens_per_second_per_gpu": 1756.93
+    },
+    {
+      "epoch": 0.6805555555555556,
+      "grad_norm": 0.07948441058397293,
+      "learning_rate": 0.0001906402572458371,
+      "loss": 0.9047,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 98,
+      "tokens_per_second_per_gpu": 1838.22
+    },
+    {
+      "epoch": 0.6875,
+      "grad_norm": 0.06950388848781586,
+      "learning_rate": 0.0001902961596447557,
+      "loss": 0.9473,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 99,
+      "tokens_per_second_per_gpu": 1809.3
+    },
+    {
+      "epoch": 0.6944444444444444,
+      "grad_norm": 0.059347931295633316,
+      "learning_rate": 0.00018994617269235616,
+      "loss": 0.837,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 100,
+      "tokens_per_second_per_gpu": 1897.82
+    },
+    {
+      "epoch": 0.7013888888888888,
+      "grad_norm": 0.0786047875881195,
+      "learning_rate": 0.00018959031921570135,
+      "loss": 0.8884,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 101,
+      "tokens_per_second_per_gpu": 1783.4
+    },
+    {
+      "epoch": 0.7083333333333334,
+      "grad_norm": 0.0657181590795517,
+      "learning_rate": 0.0001892286224244843,
+      "loss": 0.9047,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 102,
+      "tokens_per_second_per_gpu": 1810.31
+    },
+    {
+      "epoch": 0.7152777777777778,
+      "grad_norm": 0.07979200780391693,
+      "learning_rate": 0.00018886110590951417,
+      "loss": 0.9022,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 103,
+      "tokens_per_second_per_gpu": 1739.77
+    },
+    {
+      "epoch": 0.7222222222222222,
+      "grad_norm": 0.07467928528785706,
+      "learning_rate": 0.00018848779364117775,
+      "loss": 0.8916,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.64,
+      "memory/max_allocated (GiB)": 24.64,
+      "step": 104,
+      "tokens_per_second_per_gpu": 1796.23
+    },
+    {
+      "epoch": 0.7291666666666666,
+      "grad_norm": 0.07130390405654907,
+      "learning_rate": 0.000188108709967876,
+      "loss": 0.9186,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 105,
+      "tokens_per_second_per_gpu": 1834.56
+    },
+    {
+      "epoch": 0.7361111111111112,
+      "grad_norm": 0.08154763281345367,
+      "learning_rate": 0.000187723879614436,
+      "loss": 0.9381,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 106,
+      "tokens_per_second_per_gpu": 1711.53
+    },
+    {
+      "epoch": 0.7430555555555556,
+      "grad_norm": 0.07683106511831284,
+      "learning_rate": 0.00018733332768049827,
+      "loss": 0.9567,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 107,
+      "tokens_per_second_per_gpu": 1720.36
+    },
+    {
+      "epoch": 0.75,
+      "grad_norm": 0.07069261372089386,
+      "learning_rate": 0.00018693707963887978,
+      "loss": 0.9454,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 108,
+      "tokens_per_second_per_gpu": 1795.58
+    },
+    {
+      "epoch": 0.7569444444444444,
+      "grad_norm": 0.07336299121379852,
+      "learning_rate": 0.0001865351613339125,
+      "loss": 0.9739,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 109,
+      "tokens_per_second_per_gpu": 1695.16
+    },
+    {
+      "epoch": 0.7638888888888888,
+      "grad_norm": 0.0726110115647316,
+      "learning_rate": 0.0001861275989797578,
+      "loss": 0.8957,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 110,
+      "tokens_per_second_per_gpu": 1866.56
+    },
+    {
+      "epoch": 0.7708333333333334,
+      "grad_norm": 0.08780913054943085,
+      "learning_rate": 0.00018571441915869662,
+      "loss": 0.9204,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 111,
+      "tokens_per_second_per_gpu": 1820.22
+    },
+    {
+      "epoch": 0.7777777777777778,
+      "grad_norm": 0.10503561049699783,
+      "learning_rate": 0.0001852956488193959,
+      "loss": 0.9357,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 112,
+      "tokens_per_second_per_gpu": 1793.45
+    },
+    {
+      "epoch": 0.7847222222222222,
+      "grad_norm": 0.06898421794176102,
+      "learning_rate": 0.0001848713152751506,
+      "loss": 0.8725,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 113,
+      "tokens_per_second_per_gpu": 1794.32
+    },
+    {
+      "epoch": 0.7916666666666666,
+      "grad_norm": 0.06764024496078491,
+      "learning_rate": 0.00018444144620210256,
+      "loss": 0.9115,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 114,
+      "tokens_per_second_per_gpu": 1811.27
+    },
+    {
+      "epoch": 0.7986111111111112,
+      "grad_norm": 0.07626543939113617,
+      "learning_rate": 0.00018400606963743518,
+      "loss": 0.866,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 115,
+      "tokens_per_second_per_gpu": 1798.98
+    },
+    {
+      "epoch": 0.8055555555555556,
+      "grad_norm": 0.06234179437160492,
+      "learning_rate": 0.00018356521397754495,
+      "loss": 0.911,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 116,
+      "tokens_per_second_per_gpu": 1837.38
+    },
+    {
+      "epoch": 0.8125,
+      "grad_norm": 0.07429645955562592,
+      "learning_rate": 0.00018311890797618915,
+      "loss": 0.9853,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 117,
+      "tokens_per_second_per_gpu": 1909.7
+    },
+    {
+      "epoch": 0.8194444444444444,
+      "grad_norm": 0.0701533630490303,
+      "learning_rate": 0.00018266718074261062,
+      "loss": 0.8815,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 118,
+      "tokens_per_second_per_gpu": 1862.25
+    },
+    {
+      "epoch": 0.8263888888888888,
+      "grad_norm": 0.08356419950723648,
+      "learning_rate": 0.00018221006173963912,
+      "loss": 0.9683,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 119,
+      "tokens_per_second_per_gpu": 1779.76
+    },
+    {
+      "epoch": 0.8333333333333334,
+      "grad_norm": 0.06717222929000854,
+      "learning_rate": 0.00018174758078176963,
+      "loss": 0.8591,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 120,
+      "tokens_per_second_per_gpu": 1819.98
+    },
+    {
+      "epoch": 0.8402777777777778,
+      "grad_norm": 0.07267988473176956,
+      "learning_rate": 0.00018127976803321793,
+      "loss": 0.8717,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 121,
+      "tokens_per_second_per_gpu": 1790.21
+    },
+    {
+      "epoch": 0.8472222222222222,
+      "grad_norm": 0.07312195748090744,
+      "learning_rate": 0.00018080665400595303,
+      "loss": 0.8591,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 122,
+      "tokens_per_second_per_gpu": 1784.17
+    },
+    {
+      "epoch": 0.8541666666666666,
+      "grad_norm": 0.06259205937385559,
+      "learning_rate": 0.00018032826955770724,
+      "loss": 0.826,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 123,
+      "tokens_per_second_per_gpu": 1872.69
+    },
+    {
+      "epoch": 0.8611111111111112,
+      "grad_norm": 0.062194038182497025,
+      "learning_rate": 0.00017984464588996342,
+      "loss": 0.8974,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 124,
+      "tokens_per_second_per_gpu": 1856.73
+    },
+    {
+      "epoch": 0.8680555555555556,
+      "grad_norm": 0.06809177249670029,
+      "learning_rate": 0.00017935581454592002,
+      "loss": 0.8899,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 125,
+      "tokens_per_second_per_gpu": 1904.82
+    },
+    {
+      "epoch": 0.875,
+      "grad_norm": 0.058420922607183456,
+      "learning_rate": 0.00017886180740843383,
+      "loss": 0.8287,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 126,
+      "tokens_per_second_per_gpu": 1699.16
+    },
+    {
+      "epoch": 0.8819444444444444,
+      "grad_norm": 0.06883256137371063,
+      "learning_rate": 0.00017836265669794033,
+      "loss": 0.7913,
+      "memory/device_reserved (GiB)": 32.39,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 127,
+      "tokens_per_second_per_gpu": 1877.6
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 0.06439166516065598,
+      "learning_rate": 0.00017785839497035222,
+      "loss": 0.8462,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 128,
+      "tokens_per_second_per_gpu": 1785.45
+    },
+    {
+      "epoch": 0.8958333333333334,
+      "grad_norm": 0.07990575581789017,
+      "learning_rate": 0.00017734905511493615,
+      "loss": 0.9299,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 129,
+      "tokens_per_second_per_gpu": 1863.91
+    },
+    {
+      "epoch": 0.9027777777777778,
+      "grad_norm": 0.13562186062335968,
+      "learning_rate": 0.0001768346703521675,
+      "loss": 0.8973,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 130,
+      "tokens_per_second_per_gpu": 1841.01
+    },
+    {
+      "epoch": 0.9097222222222222,
+      "grad_norm": 0.0757925733923912,
+      "learning_rate": 0.0001763152742315637,
+      "loss": 0.8393,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 131,
+      "tokens_per_second_per_gpu": 1841.76
+    },
+    {
+      "epoch": 0.9166666666666666,
+      "grad_norm": 0.1594410538673401,
+      "learning_rate": 0.000175790900629496,
+      "loss": 0.9476,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 132,
+      "tokens_per_second_per_gpu": 1900.8
+    },
+    {
+      "epoch": 0.9236111111111112,
+      "grad_norm": 0.07541660219430923,
+      "learning_rate": 0.00017526158374698,
+      "loss": 0.8889,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 133,
+      "tokens_per_second_per_gpu": 1862.75
+    },
+    {
+      "epoch": 0.9305555555555556,
+      "grad_norm": 0.06783576309680939,
+      "learning_rate": 0.00017472735810744494,
+      "loss": 0.8986,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 134,
+      "tokens_per_second_per_gpu": 1834.08
+    },
+    {
+      "epoch": 0.9375,
+      "grad_norm": 0.08990088850259781,
+      "learning_rate": 0.00017418825855448206,
+      "loss": 0.8216,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 135,
+      "tokens_per_second_per_gpu": 1756.08
+    },
+    {
+      "epoch": 0.9444444444444444,
+      "grad_norm": 0.06768841296434402,
+      "learning_rate": 0.00017364432024957193,
+      "loss": 0.9572,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.44,
+      "memory/max_allocated (GiB)": 27.44,
+      "step": 136,
+      "tokens_per_second_per_gpu": 1894.76
+    },
+    {
+      "epoch": 0.9513888888888888,
+      "grad_norm": 0.07805129885673523,
+      "learning_rate": 0.00017309557866979113,
+      "loss": 0.9175,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.74,
+      "memory/max_allocated (GiB)": 24.74,
+      "step": 137,
+      "tokens_per_second_per_gpu": 1814.77
+    },
+    {
+      "epoch": 0.9583333333333334,
+      "grad_norm": 0.07132866978645325,
+      "learning_rate": 0.00017254206960549842,
+      "loss": 0.9275,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.49,
+      "memory/max_allocated (GiB)": 26.49,
+      "step": 138,
+      "tokens_per_second_per_gpu": 1802.23
+    },
+    {
+      "epoch": 0.9652777777777778,
+      "grad_norm": 0.0779808983206749,
+      "learning_rate": 0.00017198382915800033,
+      "loss": 0.9107,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.25,
+      "memory/max_allocated (GiB)": 26.25,
+      "step": 139,
+      "tokens_per_second_per_gpu": 1816.41
+    },
+    {
+      "epoch": 0.9722222222222222,
+      "grad_norm": 0.07991725951433182,
+      "learning_rate": 0.0001714208937371965,
+      "loss": 0.8597,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.64,
+      "memory/max_allocated (GiB)": 24.64,
+      "step": 140,
+      "tokens_per_second_per_gpu": 1831.07
+    },
+    {
+      "epoch": 0.9791666666666666,
+      "grad_norm": 0.07846493273973465,
+      "learning_rate": 0.00017085330005920516,
+      "loss": 0.9037,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 141,
+      "tokens_per_second_per_gpu": 1772.26
+    },
+    {
+      "epoch": 0.9861111111111112,
+      "grad_norm": 0.07660206407308578,
+      "learning_rate": 0.00017028108514396799,
+      "loss": 0.8545,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 142,
+      "tokens_per_second_per_gpu": 1841.68
+    },
+    {
+      "epoch": 0.9930555555555556,
+      "grad_norm": 0.06597072631120682,
+      "learning_rate": 0.000169704286312836,
+      "loss": 0.8313,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 143,
+      "tokens_per_second_per_gpu": 1819.32
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.12106018513441086,
+      "learning_rate": 0.00016912294118613517,
+      "loss": 0.9172,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 144,
+      "tokens_per_second_per_gpu": 1816.15
+    },
+    {
+      "epoch": 1.0069444444444444,
+      "grad_norm": 0.07370701432228088,
+      "learning_rate": 0.00016853708768071264,
+      "loss": 0.9025,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 145,
+      "tokens_per_second_per_gpu": 1740.51
+    },
+    {
+      "epoch": 1.0138888888888888,
+      "grad_norm": 0.06994107365608215,
+      "learning_rate": 0.0001679467640074639,
+      "loss": 0.8453,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 146,
+      "tokens_per_second_per_gpu": 1858.12
+    },
+    {
+      "epoch": 1.0208333333333333,
+      "grad_norm": 0.07402420789003372,
+      "learning_rate": 0.00016735200866884036,
+      "loss": 0.8769,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 147,
+      "tokens_per_second_per_gpu": 1768.73
+    },
+    {
+      "epoch": 1.0277777777777777,
+      "grad_norm": 0.09437773376703262,
+      "learning_rate": 0.00016675286045633828,
+      "loss": 0.9602,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 148,
+      "tokens_per_second_per_gpu": 1793.91
+    },
+    {
+      "epoch": 1.0347222222222223,
+      "grad_norm": 0.09405123442411423,
+      "learning_rate": 0.00016614935844796864,
+      "loss": 0.9701,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 149,
+      "tokens_per_second_per_gpu": 1738.78
+    },
+    {
+      "epoch": 1.0416666666666667,
+      "grad_norm": 0.0800110325217247,
+      "learning_rate": 0.00016554154200570825,
+      "loss": 0.9213,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 150,
+      "tokens_per_second_per_gpu": 1838.3
+    },
+    {
+      "epoch": 1.0486111111111112,
+      "grad_norm": 0.07217224687337875,
+      "learning_rate": 0.0001649294507729327,
+      "loss": 0.9151,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.74,
+      "memory/max_allocated (GiB)": 24.74,
+      "step": 151,
+      "tokens_per_second_per_gpu": 1800.16
+    },
+    {
+      "epoch": 1.0555555555555556,
+      "grad_norm": 0.11542835831642151,
+      "learning_rate": 0.0001643131246718305,
+      "loss": 0.8443,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.26,
+      "memory/max_allocated (GiB)": 26.26,
+      "step": 152,
+      "tokens_per_second_per_gpu": 1808.4
+    },
+    {
+      "epoch": 1.0625,
+      "grad_norm": 0.16738373041152954,
+      "learning_rate": 0.00016369260390079933,
+      "loss": 0.8658,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.74,
+      "memory/max_allocated (GiB)": 24.74,
+      "step": 153,
+      "tokens_per_second_per_gpu": 1786.38
+    },
+    {
+      "epoch": 1.0694444444444444,
+      "grad_norm": 0.08216153830289841,
+      "learning_rate": 0.0001630679289318242,
+      "loss": 0.8402,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 154,
+      "tokens_per_second_per_gpu": 1721.01
+    },
+    {
+      "epoch": 1.0763888888888888,
+      "grad_norm": 0.0893600732088089,
+      "learning_rate": 0.00016243914050783785,
+      "loss": 0.843,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 155,
+      "tokens_per_second_per_gpu": 1769.48
+    },
+    {
+      "epoch": 1.0833333333333333,
+      "grad_norm": 0.08116093277931213,
+      "learning_rate": 0.00016180627964006313,
+      "loss": 0.8911,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 156,
+      "tokens_per_second_per_gpu": 1890.64
+    },
+    {
+      "epoch": 1.0902777777777777,
+      "grad_norm": 0.08874551951885223,
+      "learning_rate": 0.00016116938760533844,
+      "loss": 0.9208,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 157,
+      "tokens_per_second_per_gpu": 1768.32
+    },
+    {
+      "epoch": 1.0972222222222223,
+      "grad_norm": 0.10523993521928787,
+      "learning_rate": 0.00016052850594342534,
+      "loss": 0.9199,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 23.22,
+      "memory/max_allocated (GiB)": 23.22,
+      "step": 158,
+      "tokens_per_second_per_gpu": 1866.22
+    },
+    {
+      "epoch": 1.1041666666666667,
+      "grad_norm": 0.1344747096300125,
+      "learning_rate": 0.00015988367645429938,
+      "loss": 0.8523,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 159,
+      "tokens_per_second_per_gpu": 1796.77
+    },
+    {
+      "epoch": 1.1111111111111112,
+      "grad_norm": 0.10914891213178635,
+      "learning_rate": 0.0001592349411954236,
+      "loss": 0.9055,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 160,
+      "tokens_per_second_per_gpu": 1796.0
+    },
+    {
+      "epoch": 1.1180555555555556,
+      "grad_norm": 0.09863642603158951,
+      "learning_rate": 0.0001585823424790056,
+      "loss": 0.834,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.26,
+      "memory/max_allocated (GiB)": 26.26,
+      "step": 161,
+      "tokens_per_second_per_gpu": 1744.31
+    },
+    {
+      "epoch": 1.125,
+      "grad_norm": 0.0731300413608551,
+      "learning_rate": 0.0001579259228692378,
+      "loss": 0.8731,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 162,
+      "tokens_per_second_per_gpu": 1840.36
+    },
+    {
+      "epoch": 1.1319444444444444,
+      "grad_norm": 0.08072181046009064,
+      "learning_rate": 0.00015726572517952122,
+      "loss": 0.8169,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 163,
+      "tokens_per_second_per_gpu": 1874.55
+    },
+    {
+      "epoch": 1.1388888888888888,
+      "grad_norm": 0.07996222376823425,
+      "learning_rate": 0.00015660179246967314,
+      "loss": 0.8925,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.38,
+      "memory/max_allocated (GiB)": 28.38,
+      "step": 164,
+      "tokens_per_second_per_gpu": 1858.78
+    },
+    {
+      "epoch": 1.1458333333333333,
+      "grad_norm": 0.07483426481485367,
+      "learning_rate": 0.00015593416804311852,
+      "loss": 0.842,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 165,
+      "tokens_per_second_per_gpu": 1792.51
+    },
+    {
+      "epoch": 1.1527777777777777,
+      "grad_norm": 0.07248706370592117,
+      "learning_rate": 0.00015526289544406585,
+      "loss": 0.8672,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.06,
+      "memory/max_allocated (GiB)": 26.06,
+      "step": 166,
+      "tokens_per_second_per_gpu": 1778.82
+    },
+    {
+      "epoch": 1.1597222222222223,
+      "grad_norm": 0.07618600875139236,
+      "learning_rate": 0.0001545880184546669,
+      "loss": 0.9355,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.11,
+      "memory/max_allocated (GiB)": 25.11,
+      "step": 167,
+      "tokens_per_second_per_gpu": 1743.04
+    },
+    {
+      "epoch": 1.1666666666666667,
+      "grad_norm": 0.06785187125205994,
+      "learning_rate": 0.0001539095810921612,
+      "loss": 0.8303,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 168,
+      "tokens_per_second_per_gpu": 1858.47
+    },
+    {
+      "epoch": 1.1736111111111112,
+      "grad_norm": 0.06521926075220108,
+      "learning_rate": 0.0001532276276060051,
+      "loss": 0.8736,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 169,
+      "tokens_per_second_per_gpu": 1828.71
+    },
+    {
+      "epoch": 1.1805555555555556,
+      "grad_norm": 0.07702817767858505,
+      "learning_rate": 0.00015254220247498573,
+      "loss": 0.8526,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 170,
+      "tokens_per_second_per_gpu": 1843.68
+    },
+    {
+      "epoch": 1.1875,
+      "grad_norm": 0.1337224841117859,
+      "learning_rate": 0.0001518533504043199,
+      "loss": 0.8973,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.25,
+      "memory/max_allocated (GiB)": 26.25,
+      "step": 171,
+      "tokens_per_second_per_gpu": 1884.45
+    },
+    {
+      "epoch": 1.1944444444444444,
+      "grad_norm": 0.09579396992921829,
+      "learning_rate": 0.0001511611163227385,
+      "loss": 0.8214,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 172,
+      "tokens_per_second_per_gpu": 1824.99
+    },
+    {
+      "epoch": 1.2013888888888888,
+      "grad_norm": 0.08288433402776718,
+      "learning_rate": 0.00015046554537955585,
+      "loss": 0.8975,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 173,
+      "tokens_per_second_per_gpu": 1815.52
+    },
+    {
+      "epoch": 1.2083333333333333,
+      "grad_norm": 0.07618972659111023,
+      "learning_rate": 0.00014976668294172527,
+      "loss": 0.8802,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 26.49,
+      "memory/max_allocated (GiB)": 26.49,
+      "step": 174,
+      "tokens_per_second_per_gpu": 1844.86
+    },
+    {
+      "epoch": 1.2152777777777777,
+      "grad_norm": 0.08120069652795792,
+      "learning_rate": 0.00014906457459087978,
+      "loss": 0.8764,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 175,
+      "tokens_per_second_per_gpu": 1855.53
+    },
+    {
+      "epoch": 1.2222222222222223,
+      "grad_norm": 0.0804535299539566,
+      "learning_rate": 0.00014835926612035945,
+      "loss": 0.7791,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 176,
+      "tokens_per_second_per_gpu": 1806.63
+    },
+    {
+      "epoch": 1.2291666666666667,
+      "grad_norm": 0.07893332093954086,
+      "learning_rate": 0.00014765080353222447,
+      "loss": 0.8774,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 23.75,
+      "memory/max_allocated (GiB)": 23.75,
+      "step": 177,
+      "tokens_per_second_per_gpu": 1791.96
+    },
+    {
+      "epoch": 1.2361111111111112,
+      "grad_norm": 0.08453946560621262,
+      "learning_rate": 0.0001469392330342548,
+      "loss": 0.8865,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 178,
+      "tokens_per_second_per_gpu": 1846.08
+    },
+    {
+      "epoch": 1.2430555555555556,
+      "grad_norm": 0.09483584016561508,
+      "learning_rate": 0.0001462246010369364,
+      "loss": 0.8611,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 179,
+      "tokens_per_second_per_gpu": 1746.22
+    },
+    {
+      "epoch": 1.25,
+      "grad_norm": 0.10927578061819077,
+      "learning_rate": 0.0001455069541504342,
+      "loss": 0.887,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.17,
+      "memory/max_allocated (GiB)": 24.17,
+      "step": 180,
+      "tokens_per_second_per_gpu": 1829.41
+    },
+    {
+      "epoch": 1.2569444444444444,
+      "grad_norm": 0.08271358907222748,
+      "learning_rate": 0.00014478633918155217,
+      "loss": 0.7864,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 181,
+      "tokens_per_second_per_gpu": 1757.68
+    },
+    {
+      "epoch": 1.2638888888888888,
+      "grad_norm": 0.09157366305589676,
+      "learning_rate": 0.00014406280313068018,
+      "loss": 0.8428,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 182,
+      "tokens_per_second_per_gpu": 1808.32
+    },
+    {
+      "epoch": 1.2708333333333333,
+      "grad_norm": 0.09229514002799988,
+      "learning_rate": 0.0001433363931887289,
+      "loss": 0.8618,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 24.27,
+      "memory/max_allocated (GiB)": 24.27,
+      "step": 183,
+      "tokens_per_second_per_gpu": 1808.02
+    },
+    {
+      "epoch": 1.2777777777777777,
+      "grad_norm": 0.08157222718000412,
+      "learning_rate": 0.00014260715673405157,
+      "loss": 0.8205,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.44,
+      "memory/max_allocated (GiB)": 27.44,
+      "step": 184,
+      "tokens_per_second_per_gpu": 1820.59
+    },
+    {
+      "epoch": 1.2847222222222223,
+      "grad_norm": 0.07500192523002625,
+      "learning_rate": 0.00014187514132935392,
+      "loss": 0.8863,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 185,
+      "tokens_per_second_per_gpu": 1786.34
+    },
+    {
+      "epoch": 1.2916666666666667,
+      "grad_norm": 0.08555968850851059,
+      "learning_rate": 0.00014114039471859222,
+      "loss": 0.8631,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.38,
+      "memory/max_allocated (GiB)": 28.38,
+      "step": 186,
+      "tokens_per_second_per_gpu": 1752.67
+    },
+    {
+      "epoch": 1.2986111111111112,
+      "grad_norm": 0.08531264960765839,
+      "learning_rate": 0.00014040296482385894,
+      "loss": 0.9533,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 187,
+      "tokens_per_second_per_gpu": 1935.53
+    },
+    {
+      "epoch": 1.3055555555555556,
+      "grad_norm": 0.0975320041179657,
+      "learning_rate": 0.0001396628997422575,
+      "loss": 0.9088,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 188,
+      "tokens_per_second_per_gpu": 1858.47
+    },
+    {
+      "epoch": 1.3125,
+      "grad_norm": 0.08182326704263687,
+      "learning_rate": 0.00013892024774276495,
+      "loss": 0.8836,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 189,
+      "tokens_per_second_per_gpu": 1805.73
+    },
+    {
+      "epoch": 1.3194444444444444,
+      "grad_norm": 0.09812135249376297,
+      "learning_rate": 0.00013817505726308402,
+      "loss": 0.8415,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 190,
+      "tokens_per_second_per_gpu": 1786.37
+    },
+    {
+      "epoch": 1.3263888888888888,
+      "grad_norm": 0.0917779952287674,
+      "learning_rate": 0.00013742737690648361,
+      "loss": 0.8869,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.2,
+      "memory/max_allocated (GiB)": 27.2,
+      "step": 191,
+      "tokens_per_second_per_gpu": 1818.34
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 0.0986262708902359,
+      "learning_rate": 0.00013667725543862905,
+      "loss": 0.8554,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 192,
+      "tokens_per_second_per_gpu": 1825.63
+    },
+    {
+      "epoch": 1.3402777777777777,
+      "grad_norm": 0.08627785742282867,
+      "learning_rate": 0.00013592474178440115,
+      "loss": 0.8649,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 193,
+      "tokens_per_second_per_gpu": 1804.99
+    },
+    {
+      "epoch": 1.3472222222222223,
+      "grad_norm": 0.07916709780693054,
+      "learning_rate": 0.0001351698850247055,
+      "loss": 0.8681,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 194,
+      "tokens_per_second_per_gpu": 1806.46
+    },
+    {
+      "epoch": 1.3541666666666667,
+      "grad_norm": 0.08284994214773178,
+      "learning_rate": 0.000134412734393271,
+      "loss": 0.8953,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.01,
+      "memory/max_allocated (GiB)": 27.01,
+      "step": 195,
+      "tokens_per_second_per_gpu": 1827.15
+    },
+    {
+      "epoch": 1.3611111111111112,
+      "grad_norm": 0.08089049905538559,
+      "learning_rate": 0.00013365333927343906,
+      "loss": 0.9466,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.9,
+      "memory/max_allocated (GiB)": 28.9,
+      "step": 196,
+      "tokens_per_second_per_gpu": 1850.2
+    },
+    {
+      "epoch": 1.3680555555555556,
+      "grad_norm": 0.09982682019472122,
+      "learning_rate": 0.00013289174919494228,
+      "loss": 0.8445,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 28.15,
+      "memory/max_allocated (GiB)": 28.15,
+      "step": 197,
+      "tokens_per_second_per_gpu": 1776.51
+    },
+    {
+      "epoch": 1.375,
+      "grad_norm": 0.08014615625143051,
+      "learning_rate": 0.0001321280138306743,
+      "loss": 0.92,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.68,
+      "memory/max_allocated (GiB)": 25.68,
+      "step": 198,
+      "tokens_per_second_per_gpu": 1849.96
+    },
+    {
+      "epoch": 1.3819444444444444,
+      "grad_norm": 0.10110778361558914,
+      "learning_rate": 0.00013136218299344992,
+      "loss": 0.7737,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 27.95,
+      "memory/max_allocated (GiB)": 27.95,
+      "step": 199,
+      "tokens_per_second_per_gpu": 1853.85
+    },
+    {
+      "epoch": 1.3888888888888888,
+      "grad_norm": 0.15635082125663757,
+      "learning_rate": 0.0001305943066327561,
+      "loss": 0.9339,
+      "memory/device_reserved (GiB)": 32.41,
+      "memory/max_active (GiB)": 25.31,
+      "memory/max_allocated (GiB)": 25.31,
+      "step": 200,
+      "tokens_per_second_per_gpu": 1818.61
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 432,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.1164899726190264e+18,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
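
Reviewer note: the `log_history` entries above are easier to read in aggregate. The following is a minimal sketch, assuming the file has been loaded into a dict with the same keys shown in this diff (`log_history`, `max_steps`, `num_train_epochs`); the function name `summarize_trainer_state` and the three-entry sample are illustrative, not part of the commit.

```python
import json


def summarize_trainer_state(state: dict) -> dict:
    """Summarize the per-step entries of a trainer_state.json-style dict."""
    # Keep only training-step entries (they carry a "loss" key).
    logs = [e for e in state["log_history"] if "loss" in e]
    # The entry with the largest gradient norm is a quick spike indicator.
    spike = max(logs, key=lambda e: e["grad_norm"])
    return {
        "mean_loss": sum(e["loss"] for e in logs) / len(logs),
        "max_grad_norm_step": spike["step"],
        # max_steps / num_train_epochs gives optimizer steps per epoch.
        "steps_per_epoch": state["max_steps"] / state["num_train_epochs"],
    }


# Three entries copied from steps 198-200 above, with grad_norm rounded.
sample = json.loads("""
{
  "log_history": [
    {"step": 198, "loss": 0.92,   "grad_norm": 0.0801},
    {"step": 199, "loss": 0.7737, "grad_norm": 0.1011},
    {"step": 200, "loss": 0.9339, "grad_norm": 0.1564}
  ],
  "max_steps": 432,
  "num_train_epochs": 3
}
""")

summary = summarize_trainer_state(sample)
print(summary)
```

With `max_steps` 432 over 3 epochs, the sketch reports 144 steps per epoch, which agrees with step 180 being logged at epoch 1.25 in the entries above.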

checkpoint-200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:929c176313b9455ecfcfecff8ea223e55e5864009e2253728ac173998dcc6858
+size 7313
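
Reviewer note: the three added lines are a Git LFS pointer, not the binary itself. A minimal sketch of reading one, assuming the pointer text is passed in without the leading `+` diff markers; `parse_lfs_pointer` is a hypothetical helper name, not part of Git LFS or `huggingface_hub`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algo>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:929c176313b9455ecfcfecff8ea223e55e5864009e2253728ac173998dcc6858\n"
    "size 7313\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The same oid and size appear for `training_args.bin` in checkpoint-300 later in this diff, which suggests the training arguments did not change between the two checkpoints.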

checkpoint-200/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-300/README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- axolotl
+- base_model:adapter:Qwen/Qwen2.5-Coder-14B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-300/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": [],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
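
Reviewer note: the config above sets `r` to 32 and `lora_alpha` to 64 with `use_rslora` false, so the LoRA update is scaled by `lora_alpha / r`. The sketch below also estimates the number of trainable adapter parameters for the seven `target_modules`. The base-model dimensions (hidden size 5120, MLP size 13824, key/value projection width 1024, 48 layers) are assumptions about Qwen2.5-Coder-14B-Instruct and are not stated anywhere in this diff; `lora_param_count` is an illustrative helper.

```python
def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """Count LoRA parameters: each adapted (d_in, d_out) linear layer adds
    an r x d_in matrix A and a d_out x r matrix B."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Values taken from adapter_config.json above.
R, ALPHA = 32, 64

# ASSUMED base-model dimensions (not present in this diff).
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 5120, 13824, 1024, 48

# (input width, output width) for each module listed in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

scaling = ALPHA / R  # standard LoRA scaling, since use_rslora is false
total = lora_param_count(R, shapes, LAYERS)
print(scaling, total, total * 4)
```

Under these assumed dimensions the estimate is 137,625,600 parameters, or 550,502,400 bytes at 4 bytes each. That is within roughly 90 KB of the 550,593,184-byte `adapter_model.safetensors` size recorded in this diff, which would be consistent with fp32 storage plus a file header, though the diff itself does not confirm the dtype.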

checkpoint-300/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5832547a563b9cfb181f6eeef8a35ebf95dff6f0050f7b37b5950109239cf197
+size 550593184

checkpoint-300/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-300/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
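
Reviewer note: for conversations without tools, the template above reduces to a simple ChatML layout. The following is a hand-written approximation of that branch only, not the Jinja template itself and not the `transformers` rendering path; `render_chatml` is a hypothetical name, and tool calls and `tool` messages are deliberately left out.

```python
DEFAULT_SYSTEM = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
)


def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Approximate the no-tools branch of the chat template above."""
    # A leading system message replaces the default system prompt.
    if messages and messages[0]["role"] == "system":
        system, rest = messages[0]["content"], messages[1:]
    else:
        system, rest = DEFAULT_SYSTEM, messages

    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for message in rest:
        if message["role"] not in ("user", "system", "assistant"):
            # Tool messages need the extra handling in the real template.
            raise ValueError(f"unsupported role: {message['role']}")
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hi"}])
print(prompt)
```

Because `eos_token` is `<|im_end|>` in the tokenizer files of this commit, generation from a prompt laid out this way would normally stop at the end of the assistant turn.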

checkpoint-300/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-300/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:135531222ff124b2cb39970b707fae2b724a6b2f15f1809690a5d9fd2f80fefd
+size 280342501

checkpoint-300/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d0bb6efab72a136bee4f23b2793e9cc2cab1acc28416a13c80533640d65e29f
+size 14645

checkpoint-300/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba621f4a32bf8b97dde7ea36b2b0ca1d3db0a938e7e275774ff8f3bea77371bd
+size 1465

checkpoint-300/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-300/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

checkpoint-300/tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-300/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-300/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:929c176313b9455ecfcfecff8ea223e55e5864009e2253728ac173998dcc6858
+size 7313

checkpoint-300/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-400/README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- axolotl
+- base_model:adapter:Qwen/Qwen2.5-Coder-14B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-400/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "target_parameters": [],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-400/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10e4d52c8bd97535a59f37ca6afde5832448973f04f40508dc8cb930cf51ec83
+size 550593184

checkpoint-400/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
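
The file above maps token strings to IDs in alphabetical order, which hides how the IDs are laid out. The sketch below inverts the mapping and checks that the added tokens occupy one unbroken ID range. The JSON is copied from the diff with the `+` markers removed; the variable names are illustrative.

```python
import json

ADDED_TOKENS_JSON = """
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
"""

added = json.loads(ADDED_TOKENS_JSON)

# Invert token -> id into id -> token so the entries can be read in ID order.
id_to_token = {token_id: token for token, token_id in added.items()}
ids = sorted(id_to_token)

# True when every ID between the smallest and largest is present.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))

print(len(ids), ids[0], ids[-1], is_contiguous)
for token_id in ids:
    print(token_id, id_to_token[token_id])
```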

checkpoint-400/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
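
To make the template's output concrete, the sketch below reimplements only its simplest path in plain Python: no `tools`, no assistant `tool_calls`, and no `tool` role messages. The function name `render_chatml` is hypothetical. In practice the template is normally applied through the tokenizer's `apply_chat_template` method in `transformers` rather than by hand.

```python
DEFAULT_SYSTEM = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
)


def render_chatml(messages, add_generation_prompt=True):
    """Mirror the template's no-tools branch for plain text messages.

    Tool definitions, assistant tool calls, and "tool" role messages are
    intentionally left out of this sketch.
    """
    parts = []
    # A leading system message replaces the default system prompt.
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
    else:
        system_text = DEFAULT_SYSTEM
    parts.append(f"<|im_start|>system\n{system_text}<|im_end|>\n")

    for index, message in enumerate(messages):
        role = message["role"]
        if role == "system" and index == 0:
            continue  # already emitted above
        if role not in ("user", "system", "assistant"):
            raise ValueError(f"role {role!r} is outside this sketch")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")

    # Leave the assistant turn open so the model continues from here.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [{"role": "user", "content": "Reverse a string in Python."}]
)
print(prompt)
```

Because the conversation has no system message, the rendered prompt starts with the default Qwen system prompt, followed by the user turn and an open assistant turn.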

checkpoint-400/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-400/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c4a8e9f5bfb8a404416b710ef6226a0d9f5571fd1188a1e54c2bffd998c781e
+size 280342501

checkpoint-400/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24275ac9622942d95b5824053b082dd58b38ff30318f5a8c1efd68328d3217dc
+size 14645

checkpoint-400/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a54cce168c10d482d913193a6364c9721d98ba4602ce24cae6eda811715d0d10
+size 1465

checkpoint-400/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-400/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

checkpoint-400/tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}