Instructions for using RetrO21/agrofinetune with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RetrO21/agrofinetune with PEFT:

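from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration

# Qwen2-VL is a vision-language model, so load the base with its dedicated class
base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Attach the LoRA adapter weights from this repo on top of the base model
model = PeftModel.from_pretrained(base_model, "RetrO21/agrofinetune")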

Transformers

How to use RetrO21/agrofinetune with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="RetrO21/agrofinetune")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("RetrO21/agrofinetune")
model = AutoModelForImageTextToText.from_pretrained("RetrO21/agrofinetune")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Notebooks: Google Colab, Kaggle

Local Apps

vLLM

How to use RetrO21/agrofinetune with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
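# The repo holds only a LoRA adapter, so serve the base model with the adapter attached:
vllm serve "Qwen/Qwen2-VL-2B-Instruct" \
	--enable-lora \
	--lora-modules agrofinetune=RetrO21/agrofinetune
# Call the server using curl (OpenAI-compatible API); "model" selects the adapter by name:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "agrofinetune",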
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
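The curl calls on this page can also be made from Python with only the standard library. This is a minimal sketch, assuming an OpenAI-compatible server is already listening at the given base URL (port 8000 for vLLM, 30000 for SGLang in the commands here); `build_chat_request` and `send` are hypothetical helper names, and nothing goes over the network until `send` is called.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper for illustration; it mirrors the curl payload above.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8000", "RetrO21/agrofinetune",
                             "What is the capital of France?")
    # Only the request object is built here; call send(req) against a running server
    print(req.get_method(), req.full_url)
```

Pass whatever model name the server actually exposes; the `base_url` parameter lets the same helper target either server.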

SGLang

How to use RetrO21/agrofinetune with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RetrO21/agrofinetune" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RetrO21/agrofinetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RetrO21/agrofinetune" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RetrO21/agrofinetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RetrO21/agrofinetune with Docker Model Runner:
```
docker model run hf.co/RetrO21/agrofinetune
```

RetrO21 committed on Nov 27, 2025

Commit 82e5deb (verified) · 1 parent: 12207f7

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitattributes +6 -0
README.md +28 -0
adapter_config.json +41 -0
adapter_model.safetensors +3 -0
added_tokens.json +16 -0
chat_template.jinja +7 -0
checkpoint-1737/README.md +209 -0
checkpoint-1737/adapter_config.json +41 -0
checkpoint-1737/adapter_model.safetensors +3 -0
checkpoint-1737/added_tokens.json +16 -0
checkpoint-1737/chat_template.jinja +7 -0
checkpoint-1737/merges.txt +0 -0
checkpoint-1737/optimizer.pt +3 -0
checkpoint-1737/rng_state.pth +3 -0
checkpoint-1737/scheduler.pt +3 -0
checkpoint-1737/special_tokens_map.json +31 -0
checkpoint-1737/tokenizer.json +3 -0
checkpoint-1737/tokenizer_config.json +143 -0
checkpoint-1737/trainer_state.json +386 -0
checkpoint-1737/training_args.bin +3 -0
checkpoint-1737/vocab.json +0 -0
checkpoint-3474/README.md +209 -0
checkpoint-3474/adapter_config.json +41 -0
checkpoint-3474/adapter_model.safetensors +3 -0
checkpoint-3474/added_tokens.json +16 -0
checkpoint-3474/chat_template.jinja +7 -0
checkpoint-3474/merges.txt +0 -0
checkpoint-3474/optimizer.pt +3 -0
checkpoint-3474/rng_state.pth +3 -0
checkpoint-3474/scheduler.pt +3 -0
checkpoint-3474/special_tokens_map.json +31 -0
checkpoint-3474/tokenizer.json +3 -0
checkpoint-3474/tokenizer_config.json +143 -0
checkpoint-3474/trainer_state.json +748 -0
checkpoint-3474/training_args.bin +3 -0
checkpoint-3474/vocab.json +0 -0
checkpoint-5211/README.md +209 -0
checkpoint-5211/adapter_config.json +41 -0
checkpoint-5211/adapter_model.safetensors +3 -0
checkpoint-5211/added_tokens.json +16 -0
checkpoint-5211/chat_template.jinja +7 -0
checkpoint-5211/merges.txt +0 -0
checkpoint-5211/optimizer.pt +3 -0
checkpoint-5211/rng_state.pth +3 -0
checkpoint-5211/scheduler.pt +3 -0
checkpoint-5211/special_tokens_map.json +31 -0
checkpoint-5211/tokenizer.json +3 -0
checkpoint-5211/tokenizer_config.json +143 -0
checkpoint-5211/trainer_state.json +1110 -0
checkpoint-5211/training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+checkpoint-1737/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-3474/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-5211/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-6948/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-8685/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,28 @@

+---
+base_model: Qwen/Qwen2-VL-2B-Instruct
+library_name: peft
+model_name: output
+tags:
+- adapter
+- lora
+- sft
+- transformers
+- trl
+license: apache-2.0
+pipeline_tag: text-generation
+---
+# Model Card for output
+This model is a LoRA fine-tuned version of
+[Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct).
+It has been trained using the TRL SFT pipeline.
+## Quick start
+```python
+from transformers import pipeline
+pipe = pipeline("text-generation", model="RetrO21/agrofinetune", device="cuda")
+print(pipe("What is nitrogen deficiency?")[0]["generated_text"])

adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-VL-2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
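Two numbers follow directly from this config. PEFT scales the LoRA update `B @ A` by `lora_alpha / r` (here 16 / 8 = 2.0, since `use_rslora` is false; with rsLoRA it would be `lora_alpha / sqrt(r)`), and each adapted linear layer of shape `d_in -> d_out` adds `r * (d_in + d_out)` trainable parameters. The sketch below assumes Qwen2-VL-2B's language-model dimensions (hidden size 1536, 28 decoder layers, 2 key/value heads of width 128); those values come from the base model's `config.json`, not from this repository, so verify them before relying on the totals.

```python
import math

# Values copied from adapter_config.json above
R = 8
LORA_ALPHA = 16
USE_RSLORA = False

# ASSUMPTION: Qwen2-VL-2B language-model dimensions (check the base model's config.json)
HIDDEN = 1536
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128


def lora_scaling(r: int, alpha: int, use_rslora: bool) -> float:
    """Factor applied to B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so the adapter adds r * (d_in + d_out)."""
    return r * (d_in + d_out)


# target_modules = ["q_proj", "v_proj"]; shapes are (in_features, out_features)
targets = {
    "q_proj": (HIDDEN, HIDDEN),                   # 1536 -> 1536
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),  # 1536 -> 256 (grouped-query attention)
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in targets.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {lora_scaling(R, LORA_ALPHA, USE_RSLORA)}")  # scaling = 2.0
print(f"trainable LoRA params = {total:,}")                    # 1,089,536
print(f"float32 size = {total * 4:,} bytes")                   # 4,358,144
```

Under these assumptions, and if PEFT stored the adapter in float32, the weights come to 16,376 bytes less than the 4,374,520-byte `adapter_model.safetensors` listed in this commit, a gap that a safetensors header could plausibly account for. Treat this as a sanity check, not a verified breakdown.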

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76b5201211b5dac5150a2b3a87809a5671a1239a76fdfafed2618f15a157a612
+size 4374520
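What the diff shows for `adapter_model.safetensors` is a Git LFS pointer, not the weights: a short text file of `key value` lines giving the spec version, the object ID (`sha256:<digest>`), and the size in bytes. A minimal stdlib parser and integrity check might look like this; `parse_lfs_pointer` and `blob_matches` are hypothetical names, and the parser only handles the three keys shown here.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'oid_algo', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def blob_matches(data: bytes, info: dict) -> bool:
    """Check downloaded bytes against the pointer's size and SHA-256 digest."""
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["oid"]


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:76b5201211b5dac5150a2b3a87809a5671a1239a76fdfafed2618f15a157a612
size 4374520
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])
```

Checking both the size and the digest of a downloaded file catches truncated or corrupted downloads before the weights are loaded.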

added_tokens.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
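The 14 added tokens occupy one contiguous block of IDs, 151643 through 151656. This includes the IDs that the checkpoint tokenizer files use for `eos_token` (`<|im_end|>`, 151645) and `pad_token` (`<|endoftext|>`, 151643). A quick check with the mapping copied from the file above:

```python
# Mapping copied verbatim from added_tokens.json
added_tokens = {
    "<|box_end|>": 151649,
    "<|box_start|>": 151648,
    "<|endoftext|>": 151643,
    "<|im_end|>": 151645,
    "<|im_start|>": 151644,
    "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651,
    "<|quad_start|>": 151650,
    "<|video_pad|>": 151656,
    "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654,
    "<|vision_start|>": 151652,
}

ids = sorted(added_tokens.values())
# The IDs form one unbroken run with no gaps or duplicates
assert ids == list(range(ids[0], ids[-1] + 1))

# Reverse lookup, e.g. to turn generated IDs back into special-token strings
id_to_token = {i: t for t, i in added_tokens.items()}
print(ids[0], ids[-1], id_to_token[151645])
```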

chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
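To see what prompt this template produces, here is a plain-Python mirror of its logic. It is an illustrative sketch, not the renderer Transformers uses (`apply_chat_template` evaluates the Jinja source itself); `render_chat` is a hypothetical name. It reproduces the default system prompt, the per-image and per-video placeholders, the optional `add_vision_id` labels, and the trailing assistant header.

```python
def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Plain-Python mirror of chat_template.jinja (illustrative only)."""
    out = []
    image_count = 0
    video_count = 0
    for i, message in enumerate(messages):
        # A default system prompt is injected when the first message is not a system message
        if i == 0 and message["role"] != "system":
            out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(content + "<|im_end|>\n")
            continue
        for part in content:
            if part.get("type") == "image" or "image" in part or "image_url" in part:
                image_count += 1
                if add_vision_id:
                    out.append(f"Picture {image_count}: ")
                # The processor later expands <|image_pad|> to match the image's visual tokens
                out.append("<|vision_start|><|image_pad|><|vision_end|>")
            elif part.get("type") == "video" or "video" in part:
                video_count += 1
                if add_vision_id:
                    out.append(f"Video {video_count}: ")
                out.append("<|vision_start|><|video_pad|><|vision_end|>")
            elif "text" in part:
                out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
print(render_chat(messages, add_generation_prompt=True))
```

The image URL never appears in the text prompt; only a placeholder is emitted, and the processor pairs it with the pixel data.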

checkpoint-1737/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: ''
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen2-VL-2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-1737/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-VL-2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-1737/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5f4b9708eccf0370f9aaa1466d17c487ab3a9e4e84732d5cd39bbd229aedd5c
+size 4374520

checkpoint-1737/added_tokens.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-1737/chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}

checkpoint-1737/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-1737/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84ee821de3d805218a80046b08a325803a2434e306b554e094f68548e53fbe41
+size 8783179

checkpoint-1737/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e816ab59bde4778d4f30814a9146abbd7044e1640b72b0be4234c4aa55b98f1
+size 14645

checkpoint-1737/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f9121f4d6a6f445ab467d2762de7c0b86cf7fef9179d9273d56797386ca47712
+size 1465

checkpoint-1737/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1737/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f33787292af226c4a4842be48a0e614d9524e25dc248e48bb1af0593de5564f9
+size 11420539

checkpoint-1737/tokenizer_config.json ADDED

	@@ -0,0 +1,143 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
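The `learning_rate` values logged in the `trainer_state.json` that follows fit a linear warmup-then-decay schedule. The parameters in this sketch are inferred from the log, not read from `training_args.bin`: a peak rate of 2e-5, 100 warmup steps, and 8,685 total steps (5 epochs of 1,737 steps, matching the checkpoint-8685 entry in `.gitattributes`). Empirically, the value logged at `step` equals the schedule evaluated at index `step - 1`, i.e. the rate used for the optimizer step just completed.

```python
import math

# Inferred from the logged learning rates (assumptions, not read from training_args.bin)
PEAK_LR = 2e-5
WARMUP_STEPS = 100
STEPS_PER_EPOCH = 1737   # global_step at epoch 1.0 in checkpoint-1737/trainer_state.json
NUM_EPOCHS = 5           # checkpoints run up to checkpoint-8685 = 5 * 1737
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_warmup_decay(index: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if index < WARMUP_STEPS:
        return PEAK_LR * index / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - index) / (TOTAL_STEPS - WARMUP_STEPS))


def logged_lr(step: int) -> float:
    """Rate reported at `step`: the schedule evaluated one index earlier."""
    return linear_warmup_decay(step - 1)


# (step, learning_rate) pairs copied from log_history
logged = [
    (50, 9.800000000000001e-06),
    (100, 1.98e-05),
    (150, 1.988584740827024e-05),
    (1000, 1.790564938846826e-05),
    (1700, 1.62748980780431e-05),
]

for step, lr in logged:
    assert math.isclose(logged_lr(step), lr, rel_tol=1e-9), (step, lr)
print("schedule matches all sampled log entries")
```

The same arithmetic explains the logged `epoch` field: step 50 corresponds to 50 / 1737 ≈ 0.0288 epochs.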

checkpoint-1737/trainer_state.json ADDED

	@@ -0,0 +1,386 @@

+{
+  "best_global_step": 1737,
+  "best_metric": 6.15173864364624,
+  "best_model_checkpoint": "./output/checkpoint-1737",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 1737,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 3.864118957519531,
+      "epoch": 0.028785261945883708,
+      "grad_norm": 2.7545533180236816,
+      "learning_rate": 9.800000000000001e-06,
+      "loss": 15.2997,
+      "mean_token_accuracy": 0.10086015284061432,
+      "num_tokens": 47319.0,
+      "step": 50
+    },
+    {
+      "entropy": 4.047076859474182,
+      "epoch": 0.057570523891767415,
+      "grad_norm": 5.0328264236450195,
+      "learning_rate": 1.98e-05,
+      "loss": 15.3264,
+      "mean_token_accuracy": 0.09582207053899765,
+      "num_tokens": 96809.0,
+      "step": 100
+    },
+    {
+      "entropy": 4.7578076648712155,
+      "epoch": 0.08635578583765112,
+      "grad_norm": 38.50589370727539,
+      "learning_rate": 1.988584740827024e-05,
+      "loss": 13.0056,
+      "mean_token_accuracy": 0.126854517608881,
+      "num_tokens": 139962.0,
+      "step": 150
+    },
+    {
+      "entropy": 6.80673882484436,
+      "epoch": 0.11514104778353483,
+      "grad_norm": 12.030129432678223,
+      "learning_rate": 1.97693651718113e-05,
+      "loss": 9.2822,
+      "mean_token_accuracy": 0.11084575355052947,
+      "num_tokens": 188029.0,
+      "step": 200
+    },
+    {
+      "entropy": 7.177925786972046,
+      "epoch": 0.14392630972941853,
+      "grad_norm": 4.852536201477051,
+      "learning_rate": 1.965288293535236e-05,
+      "loss": 7.6333,
+      "mean_token_accuracy": 0.12398939326405525,
+      "num_tokens": 234425.0,
+      "step": 250
+    },
+    {
+      "entropy": 7.080496473312378,
+      "epoch": 0.17271157167530224,
+      "grad_norm": 4.10841178894043,
+      "learning_rate": 1.9536400698893422e-05,
+      "loss": 7.1632,
+      "mean_token_accuracy": 0.13563686355948448,
+      "num_tokens": 278885.0,
+      "step": 300
+    },
+    {
+      "entropy": 6.931579580307007,
+      "epoch": 0.20149683362118595,
+      "grad_norm": 14.636048316955566,
+      "learning_rate": 1.941991846243448e-05,
+      "loss": 6.8213,
+      "mean_token_accuracy": 0.16459846690297128,
+      "num_tokens": 325491.0,
+      "step": 350
+    },
+    {
+      "entropy": 6.853660764694214,
+      "epoch": 0.23028209556706966,
+      "grad_norm": 5.966708183288574,
+      "learning_rate": 1.930343622597554e-05,
+      "loss": 6.6625,
+      "mean_token_accuracy": 0.17670693069696428,
+      "num_tokens": 372913.0,
+      "step": 400
+    },
+    {
+      "entropy": 6.684267387390137,
+      "epoch": 0.25906735751295334,
+      "grad_norm": 4.031010627746582,
+      "learning_rate": 1.91869539895166e-05,
+      "loss": 6.4505,
+      "mean_token_accuracy": 0.1943434515595436,
+      "num_tokens": 419159.0,
+      "step": 450
+    },
+    {
+      "entropy": 6.679989137649536,
+      "epoch": 0.28785261945883706,
+      "grad_norm": 6.251070022583008,
+      "learning_rate": 1.907047175305766e-05,
+      "loss": 6.4314,
+      "mean_token_accuracy": 0.19514557600021362,
+      "num_tokens": 466994.0,
+      "step": 500
+    },
+    {
+      "entropy": 6.477229623794556,
+      "epoch": 0.31663788140472077,
+      "grad_norm": 3.8656675815582275,
+      "learning_rate": 1.895398951659872e-05,
+      "loss": 6.2139,
+      "mean_token_accuracy": 0.21764743447303772,
+      "num_tokens": 513308.0,
+      "step": 550
+    },
+    {
+      "entropy": 6.408129243850708,
+      "epoch": 0.3454231433506045,
+      "grad_norm": 8.688581466674805,
+      "learning_rate": 1.883750728013978e-05,
+      "loss": 6.1224,
+      "mean_token_accuracy": 0.23438037544488907,
+      "num_tokens": 559679.0,
+      "step": 600
+    },
+    {
+      "entropy": 6.128518767356873,
+      "epoch": 0.3742084052964882,
+      "grad_norm": 5.419503688812256,
+      "learning_rate": 1.872102504368084e-05,
+      "loss": 5.8692,
+      "mean_token_accuracy": 0.26634690463542937,
+      "num_tokens": 603140.0,
+      "step": 650
+    },
+    {
+      "entropy": 6.322700729370117,
+      "epoch": 0.4029936672423719,
+      "grad_norm": 2.2213082313537598,
+      "learning_rate": 1.86045428072219e-05,
+      "loss": 6.0717,
+      "mean_token_accuracy": 0.24038562417030335,
+      "num_tokens": 650179.0,
+      "step": 700
+    },
+    {
+      "entropy": 6.236415157318115,
+      "epoch": 0.4317789291882556,
+      "grad_norm": 4.804980278015137,
+      "learning_rate": 1.848806057076296e-05,
+      "loss": 5.9986,
+      "mean_token_accuracy": 0.24596781462430953,
+      "num_tokens": 696220.0,
+      "step": 750
+    },
+    {
+      "entropy": 6.269758443832398,
+      "epoch": 0.4605641911341393,
+      "grad_norm": 2.2888853549957275,
+      "learning_rate": 1.837157833430402e-05,
+      "loss": 6.0385,
+      "mean_token_accuracy": 0.24074893474578857,
+      "num_tokens": 743909.0,
+      "step": 800
+    },
+    {
+      "entropy": 6.270364007949829,
+      "epoch": 0.48934945308002303,
+      "grad_norm": 3.0903279781341553,
+      "learning_rate": 1.825509609784508e-05,
+      "loss": 6.0481,
+      "mean_token_accuracy": 0.23740622967481614,
+      "num_tokens": 792015.0,
+      "step": 850
+    },
+    {
+      "entropy": 6.3037636184692385,
+      "epoch": 0.5181347150259067,
+      "grad_norm": 3.969320058822632,
+      "learning_rate": 1.813861386138614e-05,
+      "loss": 6.0855,
+      "mean_token_accuracy": 0.2309597587585449,
+      "num_tokens": 841802.0,
+      "step": 900
+    },
+    {
+      "entropy": 6.038041458129883,
+      "epoch": 0.5469199769717904,
+      "grad_norm": 2.2712185382843018,
+      "learning_rate": 1.80221316249272e-05,
+      "loss": 5.8285,
+      "mean_token_accuracy": 0.26099125802516937,
+      "num_tokens": 886492.0,
+      "step": 950
+    },
+    {
+      "entropy": 6.142958383560181,
+      "epoch": 0.5757052389176741,
+      "grad_norm": 1.2311755418777466,
+      "learning_rate": 1.790564938846826e-05,
+      "loss": 5.9357,
+      "mean_token_accuracy": 0.24810438305139543,
+      "num_tokens": 932807.0,
+      "step": 1000
+    },
+    {
+      "entropy": 6.199834351539612,
+      "epoch": 0.6044905008635578,
+      "grad_norm": 2.2788379192352295,
+      "learning_rate": 1.7789167152009318e-05,
+      "loss": 5.9964,
+      "mean_token_accuracy": 0.23942562609910964,
+      "num_tokens": 980541.0,
+      "step": 1050
+    },
+    {
+      "entropy": 5.961639919281006,
+      "epoch": 0.6332757628094415,
+      "grad_norm": 1.9077532291412354,
+      "learning_rate": 1.767268491555038e-05,
+      "loss": 5.7664,
+      "mean_token_accuracy": 0.26718012750148773,
+      "num_tokens": 1023882.0,
+      "step": 1100
+    },
+    {
+      "entropy": 5.889280087947846,
+      "epoch": 0.6620610247553252,
+      "grad_norm": 2.4254891872406006,
+      "learning_rate": 1.7556202679091442e-05,
+      "loss": 5.6952,
+      "mean_token_accuracy": 0.27529804170131683,
+      "num_tokens": 1068300.0,
+      "step": 1150
+    },
+    {
+      "entropy": 6.085640063285828,
+      "epoch": 0.690846286701209,
+      "grad_norm": 2.35312557220459,
+      "learning_rate": 1.74397204426325e-05,
+      "loss": 5.8898,
+      "mean_token_accuracy": 0.25166562348604204,
+      "num_tokens": 1115425.0,
+      "step": 1200
+    },
+    {
+      "entropy": 6.146574058532715,
+      "epoch": 0.7196315486470927,
+      "grad_norm": 1.7730146646499634,
+      "learning_rate": 1.732323820617356e-05,
+      "loss": 5.9519,
+      "mean_token_accuracy": 0.24276195973157882,
+      "num_tokens": 1162319.0,
+      "step": 1250
+    },
+    {
+      "entropy": 6.079372715950012,
+      "epoch": 0.7484168105929764,
+      "grad_norm": 1.7070863246917725,
+      "learning_rate": 1.720675596971462e-05,
+      "loss": 5.8922,
+      "mean_token_accuracy": 0.24961524546146394,
+      "num_tokens": 1208230.0,
+      "step": 1300
+    },
+    {
+      "entropy": 5.9683656406402585,
+      "epoch": 0.7772020725388601,
+      "grad_norm": 1.8790594339370728,
+      "learning_rate": 1.709027373325568e-05,
+      "loss": 5.7827,
+      "mean_token_accuracy": 0.2632122594118118,
+      "num_tokens": 1253074.0,
+      "step": 1350
+    },
+    {
+      "entropy": 6.107076721191406,
+      "epoch": 0.8059873344847438,
+      "grad_norm": 1.1745644807815552,
+      "learning_rate": 1.6973791496796742e-05,
+      "loss": 5.9211,
+      "mean_token_accuracy": 0.24564073830842972,
+      "num_tokens": 1300179.0,
+      "step": 1400
+    },
+    {
+      "entropy": 6.141328382492065,
+      "epoch": 0.8347725964306275,
+      "grad_norm": 1.0346958637237549,
+      "learning_rate": 1.68573092603378e-05,
+      "loss": 5.9584,
+      "mean_token_accuracy": 0.23997059136629104,
+      "num_tokens": 1347539.0,
+      "step": 1450
+    },
+    {
+      "entropy": 6.070010099411011,
+      "epoch": 0.8635578583765112,
+      "grad_norm": 1.6541163921356201,
+      "learning_rate": 1.674082702387886e-05,
+      "loss": 5.889,
+      "mean_token_accuracy": 0.24875166177749633,
+      "num_tokens": 1394157.0,
+      "step": 1500
+    },
+    {
+      "entropy": 6.207450666427612,
+      "epoch": 0.8923431203223949,
+      "grad_norm": 0.9742990732192993,
+      "learning_rate": 1.662434478741992e-05,
+      "loss": 6.0217,
+      "mean_token_accuracy": 0.23067249596118927,
+      "num_tokens": 1443892.0,
+      "step": 1550
+    },
+    {
+      "entropy": 6.026197805404663,
+      "epoch": 0.9211283822682786,
+      "grad_norm": 1.4229531288146973,
+      "learning_rate": 1.650786255096098e-05,
+      "loss": 5.8455,
+      "mean_token_accuracy": 0.2537291014194489,
+      "num_tokens": 1491050.0,
+      "step": 1600
+    },
+    {
+      "entropy": 6.210526428222656,
+      "epoch": 0.9499136442141624,
+      "grad_norm": 1.3555018901824951,
+      "learning_rate": 1.6391380314502038e-05,
+      "loss": 6.0279,
+      "mean_token_accuracy": 0.2308420208096504,
+      "num_tokens": 1540809.0,
+      "step": 1650
+    },
+    {
+      "entropy": 5.9872834014892575,
+      "epoch": 0.9786989061600461,
+      "grad_norm": 0.9893498420715332,
+      "learning_rate": 1.62748980780431e-05,
+      "loss": 5.8137,
+      "mean_token_accuracy": 0.2566875320672989,
+      "num_tokens": 1585876.0,
+      "step": 1700
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 6.322207130045386,
+      "eval_loss": 6.15173864364624,
+      "eval_mean_token_accuracy": 0.21116007946877985,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 1619719.0,
+      "eval_runtime": 76.1297,
+      "eval_samples_per_second": 5.701,
+      "eval_steps_per_second": 2.85,
+      "step": 1737
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 8685,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.265889302609408e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
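The counters at the end of this `trainer_state.json` fit together with simple arithmetic. Here is a minimal sketch, assuming a single device with no gradient accumulation (the file does not say either way). `max_steps` 8685 over `num_train_epochs` 5 gives 1737 optimizer steps per epoch, which matches the eval entry at step 1737 with `epoch` 1.0, and each fractional `epoch` is step / 1737. The logged `learning_rate` values are consistent with a linear warmup/decay schedule of the same shape as Transformers' `get_linear_schedule_with_warmup`. The peak rate of 2e-5 and the 100 warmup steps are inferred from the logged values (the early warmup entries appear in `checkpoint-3474/trainer_state.json`) and are not read from any config. Empirically, each logged rate equals the schedule evaluated one step before the logged `step`.

```python
# Sketch: reproduce the "epoch" and "learning_rate" fields of trainer_state.json.
# ASSUMPTIONS (inferred from the logged values, not read from training_args.bin):
#   peak learning rate 2e-5, 100 linear warmup steps, linear decay to 0 at max_steps.
MAX_STEPS = 8685                           # "max_steps"
NUM_EPOCHS = 5                             # "num_train_epochs"
STEPS_PER_EPOCH = MAX_STEPS // NUM_EPOCHS  # 1737
PEAK_LR = 2e-5                             # assumption
WARMUP_STEPS = 100                         # assumption


def epoch_at(step: int) -> float:
    """Fractional epoch as reported in log_history for a global step."""
    return step / STEPS_PER_EPOCH


def linear_schedule(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))


def logged_lr(step: int) -> float:
    """Logged values line up with the schedule evaluated one step earlier."""
    return linear_schedule(step - 1)


if __name__ == "__main__":
    for step, logged in [(1100, 1.767268491555038e-05), (1700, 1.62748980780431e-05)]:
        print(step, round(epoch_at(step), 6), logged_lr(step), logged)
```

Under these assumptions, each 50-step logging interval after warmup lowers the rate by 2e-5 × 50 / 8585 ≈ 1.165e-7, which is the spacing seen between consecutive entries.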

checkpoint-1737/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:130d33149272782bd60306263c371036419926142b8999aad7806359168f8484
+size 6225
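Binary files such as `training_args.bin` are committed as Git LFS pointer files. Each is three `key value` lines: the spec version, the object id (`sha256:` plus a hex digest), and the blob size in bytes. The leading `+` above is the diff marker, not part of the file. A minimal parser and blob check, written for illustration (the function names are hypothetical):

```python
import hashlib

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer file into its oid algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"oid_algo": algo, "oid": digest, "size": int(fields["size"])}


def verify_blob(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and sha256 digest."""
    if pointer["oid_algo"] != "sha256":
        raise ValueError("unsupported oid algorithm")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


# The training_args.bin pointer shown above, without the diff "+" prefixes.
TRAINING_ARGS_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:130d33149272782bd60306263c371036419926142b8999aad7806359168f8484
size 6225
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(TRAINING_ARGS_POINTER))
```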

checkpoint-1737/vocab.json ADDED

The diff for this file is too large to render.

checkpoint-3474/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: ''
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen2-VL-2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-3474/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-VL-2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-3474/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b7979fe4ab41b842e564542d82ca738faea1a24cfcb2e3003501296353e2a240
+size 4374520
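The adapter settings in `adapter_config.json` can be cross-checked against the 4,374,520-byte `adapter_model.safetensors` pointer. A LoRA-wrapped linear layer with `in_features` inputs and `out_features` outputs adds r·(in + out) parameters (`lora_A` is r × in, `lora_B` is out × r). With `use_rslora: false`, the update is scaled by `lora_alpha / r`. This back-of-the-envelope sketch rests on several assumptions that are not in the diff: the Qwen2-VL-2B language model has 28 decoder layers, hidden size 1536, and 2 key/value heads of dimension 128; `q_proj`/`v_proj` match only the language-model attention layers; and the weights are stored in fp32.

```python
# ASSUMPTIONS: base-model dimensions come from Qwen2-VL-2B-Instruct's
# language-model config (not from this diff); adapter weights are stored in fp32.
NUM_LAYERS = 28
HIDDEN = 1536
NUM_KV_HEADS = 2
HEAD_DIM = 128
BYTES_PER_PARAM = 4  # fp32 (assumption)

R = 8            # "r" in adapter_config.json
LORA_ALPHA = 16  # "lora_alpha" in adapter_config.json


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """lora_A is r x in_features, lora_B is out_features x r."""
    return r * (in_features + out_features)


def adapter_param_count() -> int:
    q = lora_params(HIDDEN, HIDDEN, R)                      # q_proj: 1536 -> 1536
    v = lora_params(HIDDEN, NUM_KV_HEADS * HEAD_DIM, R)     # v_proj: 1536 -> 256
    return NUM_LAYERS * (q + v)


SCALING = LORA_ALPHA / R  # standard (non-rsLoRA) LoRA scaling factor

if __name__ == "__main__":
    n = adapter_param_count()
    print(n, n * BYTES_PER_PARAM, SCALING)
```

Under these assumptions the adapter holds 1,089,536 parameters, or 4,358,144 bytes of tensor data. The remaining ~16 KB of the file size is plausibly the safetensors header.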

checkpoint-3474/added_tokens.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-3474/chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
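The chat template inserts a default system turn when the first message is not a system message. It replaces each image or video part with a `<|vision_start|>…<|vision_end|>` placeholder and, if `add_generation_prompt` is set, ends with an open assistant turn. The image and video counters are shared across all messages. The function below is a plain-Python mirror of that logic, written for illustration only. The real rendering is done by Jinja through the processor's `apply_chat_template`, and `render_chat` is a hypothetical name.

```python
SYSTEM_DEFAULT = "You are a helpful assistant."


def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Mirror chat_template.jinja: ChatML turns with vision placeholders."""
    out = []
    image_count = 0  # shared across messages, like the template's namespace
    video_count = 0
    for i, message in enumerate(messages):
        if i == 0 and message["role"] != "system":
            out.append(f"<|im_start|>system\n{SYSTEM_DEFAULT}<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(f"{content}<|im_end|>\n")
            continue
        for part in content:
            if part.get("type") == "image" or "image" in part or "image_url" in part:
                image_count += 1
                if add_vision_id:
                    out.append(f"Picture {image_count}: ")
                out.append("<|vision_start|><|image_pad|><|vision_end|>")
            elif part.get("type") == "video" or "video" in part:
                video_count += 1
                if add_vision_id:
                    out.append(f"Video {video_count}: ")
                out.append("<|vision_start|><|video_pad|><|vision_end|>")
            elif "text" in part:
                out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


if __name__ == "__main__":
    msgs = [{"role": "user", "content": [
        {"type": "image", "url": "candy.JPG"},
        {"type": "text", "text": "What animal is on the candy?"},
    ]}]
    print(render_chat(msgs, add_generation_prompt=True))
```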

checkpoint-3474/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-3474/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:571f08123879a8157590252a0cd0abe24c345fd53c5c7a3b55bb8b256658f9c0
+size 8783179

checkpoint-3474/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f6c201154e30349ea924dac640f38cc7626e879caf89ba0aa995630585e3ea5
+size 14645

checkpoint-3474/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecacb7697ae73257f39077a0e981cf0773317c0d0186dca0c24e0700ca53ab36
+size 1465

checkpoint-3474/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
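`added_tokens.json` and `special_tokens_map.json` describe overlapping token sets. Every entry in `additional_special_tokens` has an id in `added_tokens.json`. `<|endoftext|>` is the one added token that is not listed there, because it serves as the pad token instead. The 14 ids form one contiguous block. A small consistency check over the values copied from the two files above, written for illustration:

```python
# Values copied from added_tokens.json and special_tokens_map.json above.
ADDED_TOKENS = {
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|im_end|>": 151645, "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|video_pad|>": 151656,
    "<|vision_end|>": 151653, "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}
ADDITIONAL_SPECIAL = [
    "<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>",
    "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>",
    "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>",
    "<|video_pad|>",
]
EOS_TOKEN = "<|im_end|>"
PAD_TOKEN = "<|endoftext|>"


def check_special_tokens():
    """Return (missing ids, added-but-not-special tokens, ids contiguous?)."""
    missing = [t for t in ADDITIONAL_SPECIAL if t not in ADDED_TOKENS]
    extra = sorted(set(ADDED_TOKENS) - set(ADDITIONAL_SPECIAL))
    ids = sorted(ADDED_TOKENS.values())
    contiguous = ids == list(range(ids[0], ids[0] + len(ids)))
    return missing, extra, contiguous


if __name__ == "__main__":
    print(check_special_tokens())
    print(EOS_TOKEN, ADDED_TOKENS[EOS_TOKEN], PAD_TOKEN, ADDED_TOKENS[PAD_TOKEN])
```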

checkpoint-3474/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f33787292af226c4a4842be48a0e614d9524e25dc248e48bb1af0593de5564f9
+size 11420539

checkpoint-3474/tokenizer_config.json ADDED

	@@ -0,0 +1,143 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
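`tokenizer_config.json` sets `"padding_side": "right"` with `<|endoftext|>` (id 151643) as the pad token, separate from the `<|im_end|>` eos token (id 151645). The hand-rolled sketch below illustrates what right padding produces for a batch of token ids. It is not the tokenizer's own implementation, and `pad_right` is a hypothetical helper: padding goes after each sequence, and the attention mask marks padded positions with 0.

```python
PAD_TOKEN_ID = 151643  # "<|endoftext|>" in added_tokens_decoder
EOS_TOKEN_ID = 151645  # "<|im_end|>"


def pad_right(batch):
    """Right-pad token id lists to the longest sequence and build attention masks."""
    longest = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = longest - len(seq)
        input_ids.append(list(seq) + [PAD_TOKEN_ID] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


if __name__ == "__main__":
    print(pad_right([[1, 2, 3, EOS_TOKEN_ID], [4, EOS_TOKEN_ID]]))
```

Because the pad and eos ids differ, padded positions can be masked out without also masking the `<|im_end|>` that closes each turn.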

checkpoint-3474/trainer_state.json ADDED

	@@ -0,0 +1,748 @@

+{
+  "best_global_step": 3474,
+  "best_metric": 6.12472677230835,
+  "best_model_checkpoint": "./output/checkpoint-3474",
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 3474,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 3.864118957519531,
+      "epoch": 0.028785261945883708,
+      "grad_norm": 2.7545533180236816,
+      "learning_rate": 9.800000000000001e-06,
+      "loss": 15.2997,
+      "mean_token_accuracy": 0.10086015284061432,
+      "num_tokens": 47319.0,
+      "step": 50
+    },
+    {
+      "entropy": 4.047076859474182,
+      "epoch": 0.057570523891767415,
+      "grad_norm": 5.0328264236450195,
+      "learning_rate": 1.98e-05,
+      "loss": 15.3264,
+      "mean_token_accuracy": 0.09582207053899765,
+      "num_tokens": 96809.0,
+      "step": 100
+    },
+    {
+      "entropy": 4.7578076648712155,
+      "epoch": 0.08635578583765112,
+      "grad_norm": 38.50589370727539,
+      "learning_rate": 1.988584740827024e-05,
+      "loss": 13.0056,
+      "mean_token_accuracy": 0.126854517608881,
+      "num_tokens": 139962.0,
+      "step": 150
+    },
+    {
+      "entropy": 6.80673882484436,
+      "epoch": 0.11514104778353483,
+      "grad_norm": 12.030129432678223,
+      "learning_rate": 1.97693651718113e-05,
+      "loss": 9.2822,
+      "mean_token_accuracy": 0.11084575355052947,
+      "num_tokens": 188029.0,
+      "step": 200
+    },
+    {
+      "entropy": 7.177925786972046,
+      "epoch": 0.14392630972941853,
+      "grad_norm": 4.852536201477051,
+      "learning_rate": 1.965288293535236e-05,
+      "loss": 7.6333,
+      "mean_token_accuracy": 0.12398939326405525,
+      "num_tokens": 234425.0,
+      "step": 250
+    },
+    {
+      "entropy": 7.080496473312378,
+      "epoch": 0.17271157167530224,
+      "grad_norm": 4.10841178894043,
+      "learning_rate": 1.9536400698893422e-05,
+      "loss": 7.1632,
+      "mean_token_accuracy": 0.13563686355948448,
+      "num_tokens": 278885.0,
+      "step": 300
+    },
+    {
+      "entropy": 6.931579580307007,
+      "epoch": 0.20149683362118595,
+      "grad_norm": 14.636048316955566,
+      "learning_rate": 1.941991846243448e-05,
+      "loss": 6.8213,
+      "mean_token_accuracy": 0.16459846690297128,
+      "num_tokens": 325491.0,
+      "step": 350
+    },
+    {
+      "entropy": 6.853660764694214,
+      "epoch": 0.23028209556706966,
+      "grad_norm": 5.966708183288574,
+      "learning_rate": 1.930343622597554e-05,
+      "loss": 6.6625,
+      "mean_token_accuracy": 0.17670693069696428,
+      "num_tokens": 372913.0,
+      "step": 400
+    },
+    {
+      "entropy": 6.684267387390137,
+      "epoch": 0.25906735751295334,
+      "grad_norm": 4.031010627746582,
+      "learning_rate": 1.91869539895166e-05,
+      "loss": 6.4505,
+      "mean_token_accuracy": 0.1943434515595436,
+      "num_tokens": 419159.0,
+      "step": 450
+    },
+    {
+      "entropy": 6.679989137649536,
+      "epoch": 0.28785261945883706,
+      "grad_norm": 6.251070022583008,
+      "learning_rate": 1.907047175305766e-05,
+      "loss": 6.4314,
+      "mean_token_accuracy": 0.19514557600021362,
+      "num_tokens": 466994.0,
+      "step": 500
+    },
+    {
+      "entropy": 6.477229623794556,
+      "epoch": 0.31663788140472077,
+      "grad_norm": 3.8656675815582275,
+      "learning_rate": 1.895398951659872e-05,
+      "loss": 6.2139,
+      "mean_token_accuracy": 0.21764743447303772,
+      "num_tokens": 513308.0,
+      "step": 550
+    },
+    {
+      "entropy": 6.408129243850708,
+      "epoch": 0.3454231433506045,
+      "grad_norm": 8.688581466674805,
+      "learning_rate": 1.883750728013978e-05,
+      "loss": 6.1224,
+      "mean_token_accuracy": 0.23438037544488907,
+      "num_tokens": 559679.0,
+      "step": 600
+    },
+    {
+      "entropy": 6.128518767356873,
+      "epoch": 0.3742084052964882,
+      "grad_norm": 5.419503688812256,
+      "learning_rate": 1.872102504368084e-05,
+      "loss": 5.8692,
+      "mean_token_accuracy": 0.26634690463542937,
+      "num_tokens": 603140.0,
+      "step": 650
+    },
+    {
+      "entropy": 6.322700729370117,
+      "epoch": 0.4029936672423719,
+      "grad_norm": 2.2213082313537598,
+      "learning_rate": 1.86045428072219e-05,
+      "loss": 6.0717,
+      "mean_token_accuracy": 0.24038562417030335,
+      "num_tokens": 650179.0,
+      "step": 700
+    },
+    {
+      "entropy": 6.236415157318115,
+      "epoch": 0.4317789291882556,
+      "grad_norm": 4.804980278015137,
+      "learning_rate": 1.848806057076296e-05,
+      "loss": 5.9986,
+      "mean_token_accuracy": 0.24596781462430953,
+      "num_tokens": 696220.0,
+      "step": 750
+    },
+    {
+      "entropy": 6.269758443832398,
+      "epoch": 0.4605641911341393,
+      "grad_norm": 2.2888853549957275,
+      "learning_rate": 1.837157833430402e-05,
+      "loss": 6.0385,
+      "mean_token_accuracy": 0.24074893474578857,
+      "num_tokens": 743909.0,
+      "step": 800
+    },
+    {
+      "entropy": 6.270364007949829,
+      "epoch": 0.48934945308002303,
+      "grad_norm": 3.0903279781341553,
+      "learning_rate": 1.825509609784508e-05,
+      "loss": 6.0481,
+      "mean_token_accuracy": 0.23740622967481614,
+      "num_tokens": 792015.0,
+      "step": 850
+    },
+    {
+      "entropy": 6.3037636184692385,
+      "epoch": 0.5181347150259067,
+      "grad_norm": 3.969320058822632,
+      "learning_rate": 1.813861386138614e-05,
+      "loss": 6.0855,
+      "mean_token_accuracy": 0.2309597587585449,
+      "num_tokens": 841802.0,
+      "step": 900
+    },
+    {
+      "entropy": 6.038041458129883,
+      "epoch": 0.5469199769717904,
+      "grad_norm": 2.2712185382843018,
+      "learning_rate": 1.80221316249272e-05,
+      "loss": 5.8285,
+      "mean_token_accuracy": 0.26099125802516937,
+      "num_tokens": 886492.0,
+      "step": 950
+    },
+    {
+      "entropy": 6.142958383560181,
+      "epoch": 0.5757052389176741,
+      "grad_norm": 1.2311755418777466,
+      "learning_rate": 1.790564938846826e-05,
+      "loss": 5.9357,
+      "mean_token_accuracy": 0.24810438305139543,
+      "num_tokens": 932807.0,
+      "step": 1000
+    },
+    {
+      "entropy": 6.199834351539612,
+      "epoch": 0.6044905008635578,
+      "grad_norm": 2.2788379192352295,
+      "learning_rate": 1.7789167152009318e-05,
+      "loss": 5.9964,
+      "mean_token_accuracy": 0.23942562609910964,
+      "num_tokens": 980541.0,
+      "step": 1050
+    },
+    {
+      "entropy": 5.961639919281006,
+      "epoch": 0.6332757628094415,
+      "grad_norm": 1.9077532291412354,
+      "learning_rate": 1.767268491555038e-05,
+      "loss": 5.7664,
+      "mean_token_accuracy": 0.26718012750148773,
+      "num_tokens": 1023882.0,
+      "step": 1100
+    },
+    {
+      "entropy": 5.889280087947846,
+      "epoch": 0.6620610247553252,
+      "grad_norm": 2.4254891872406006,
+      "learning_rate": 1.7556202679091442e-05,
+      "loss": 5.6952,
+      "mean_token_accuracy": 0.27529804170131683,
+      "num_tokens": 1068300.0,
+      "step": 1150
+    },
+    {
+      "entropy": 6.085640063285828,
+      "epoch": 0.690846286701209,
+      "grad_norm": 2.35312557220459,
+      "learning_rate": 1.74397204426325e-05,
+      "loss": 5.8898,
+      "mean_token_accuracy": 0.25166562348604204,
+      "num_tokens": 1115425.0,
+      "step": 1200
+    },
+    {
+      "entropy": 6.146574058532715,
+      "epoch": 0.7196315486470927,
+      "grad_norm": 1.7730146646499634,
+      "learning_rate": 1.732323820617356e-05,
+      "loss": 5.9519,
+      "mean_token_accuracy": 0.24276195973157882,
+      "num_tokens": 1162319.0,
+      "step": 1250
+    },
+    {
+      "entropy": 6.079372715950012,
+      "epoch": 0.7484168105929764,
+      "grad_norm": 1.7070863246917725,
+      "learning_rate": 1.720675596971462e-05,
+      "loss": 5.8922,
+      "mean_token_accuracy": 0.24961524546146394,
+      "num_tokens": 1208230.0,
+      "step": 1300
+    },
+    {
+      "entropy": 5.9683656406402585,
+      "epoch": 0.7772020725388601,
+      "grad_norm": 1.8790594339370728,
+      "learning_rate": 1.709027373325568e-05,
+      "loss": 5.7827,
+      "mean_token_accuracy": 0.2632122594118118,
+      "num_tokens": 1253074.0,
+      "step": 1350
+    },
+    {
+      "entropy": 6.107076721191406,
+      "epoch": 0.8059873344847438,
+      "grad_norm": 1.1745644807815552,
+      "learning_rate": 1.6973791496796742e-05,
+      "loss": 5.9211,
+      "mean_token_accuracy": 0.24564073830842972,
+      "num_tokens": 1300179.0,
+      "step": 1400
+    },
+    {
+      "entropy": 6.141328382492065,
+      "epoch": 0.8347725964306275,
+      "grad_norm": 1.0346958637237549,
+      "learning_rate": 1.68573092603378e-05,
+      "loss": 5.9584,
+      "mean_token_accuracy": 0.23997059136629104,
+      "num_tokens": 1347539.0,
+      "step": 1450
+    },
+    {
+      "entropy": 6.070010099411011,
+      "epoch": 0.8635578583765112,
+      "grad_norm": 1.6541163921356201,
+      "learning_rate": 1.674082702387886e-05,
+      "loss": 5.889,
+      "mean_token_accuracy": 0.24875166177749633,
+      "num_tokens": 1394157.0,
+      "step": 1500
+    },
+    {
+      "entropy": 6.207450666427612,
+      "epoch": 0.8923431203223949,
+      "grad_norm": 0.9742990732192993,
+      "learning_rate": 1.662434478741992e-05,
+      "loss": 6.0217,
+      "mean_token_accuracy": 0.23067249596118927,
+      "num_tokens": 1443892.0,
+      "step": 1550
+    },
+    {
+      "entropy": 6.026197805404663,
+      "epoch": 0.9211283822682786,
+      "grad_norm": 1.4229531288146973,
+      "learning_rate": 1.650786255096098e-05,
+      "loss": 5.8455,
+      "mean_token_accuracy": 0.2537291014194489,
+      "num_tokens": 1491050.0,
+      "step": 1600
+    },
+    {
+      "entropy": 6.210526428222656,
+      "epoch": 0.9499136442141624,
+      "grad_norm": 1.3555018901824951,
+      "learning_rate": 1.6391380314502038e-05,
+      "loss": 6.0279,
+      "mean_token_accuracy": 0.2308420208096504,
+      "num_tokens": 1540809.0,
+      "step": 1650
+    },
+    {
+      "entropy": 5.9872834014892575,
+      "epoch": 0.9786989061600461,
+      "grad_norm": 0.9893498420715332,
+      "learning_rate": 1.62748980780431e-05,
+      "loss": 5.8137,
+      "mean_token_accuracy": 0.2566875320672989,
+      "num_tokens": 1585876.0,
+      "step": 1700
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 6.322207130045386,
+      "eval_loss": 6.15173864364624,
+      "eval_mean_token_accuracy": 0.21116007946877985,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 1619719.0,
+      "eval_runtime": 76.1297,
+      "eval_samples_per_second": 5.701,
+      "eval_steps_per_second": 2.85,
+      "step": 1737
+    },
+    {
+      "entropy": 6.038531675338745,
+      "epoch": 1.0074841681059297,
+      "grad_norm": 0.8715208172798157,
+      "learning_rate": 1.615841584158416e-05,
+      "loss": 5.8628,
+      "mean_token_accuracy": 0.2510762655735016,
+      "num_tokens": 1632015.0,
+      "step": 1750
+    },
+    {
+      "entropy": 6.164030771255494,
+      "epoch": 1.0362694300518134,
+      "grad_norm": 0.7344900965690613,
+      "learning_rate": 1.604193360512522e-05,
+      "loss": 5.9856,
+      "mean_token_accuracy": 0.2351543301343918,
+      "num_tokens": 1681154.0,
+      "step": 1800
+    },
+    {
+      "entropy": 6.0731862354278565,
+      "epoch": 1.065054691997697,
+      "grad_norm": 1.0801328420639038,
+      "learning_rate": 1.592545136866628e-05,
+      "loss": 5.8976,
+      "mean_token_accuracy": 0.24701615989208223,
+      "num_tokens": 1728110.0,
+      "step": 1850
+    },
+    {
+      "entropy": 6.079212121963501,
+      "epoch": 1.0938399539435808,
+      "grad_norm": 0.7876909375190735,
+      "learning_rate": 1.5808969132207338e-05,
+      "loss": 5.9056,
+      "mean_token_accuracy": 0.24457543224096298,
+      "num_tokens": 1775703.0,
+      "step": 1900
+    },
+    {
+      "entropy": 6.062467746734619,
+      "epoch": 1.1226252158894645,
+      "grad_norm": 0.5999078750610352,
+      "learning_rate": 1.56924868957484e-05,
+      "loss": 5.8899,
+      "mean_token_accuracy": 0.2469428673386574,
+      "num_tokens": 1821980.0,
+      "step": 1950
+    },
+    {
+      "entropy": 6.031774473190308,
+      "epoch": 1.1514104778353482,
+      "grad_norm": 1.6313235759735107,
+      "learning_rate": 1.557600465928946e-05,
+      "loss": 5.8593,
+      "mean_token_accuracy": 0.250918984413147,
+      "num_tokens": 1867547.0,
+      "step": 2000
+    },
+    {
+      "entropy": 6.122789564132691,
+      "epoch": 1.180195739781232,
+      "grad_norm": 2.562373161315918,
+      "learning_rate": 1.545952242283052e-05,
+      "loss": 5.9502,
+      "mean_token_accuracy": 0.23938885867595672,
+      "num_tokens": 1915411.0,
+      "step": 2050
+    },
+    {
+      "entropy": 6.067130417823791,
+      "epoch": 1.2089810017271156,
+      "grad_norm": 0.9762872457504272,
+      "learning_rate": 1.534304018637158e-05,
+      "loss": 5.8956,
+      "mean_token_accuracy": 0.2454381173849106,
+      "num_tokens": 1964009.0,
+      "step": 2100
+    },
+    {
+      "entropy": 5.9613511180877685,
+      "epoch": 1.2377662636729994,
+      "grad_norm": 0.8701547384262085,
+      "learning_rate": 1.5226557949912639e-05,
+      "loss": 5.7907,
+      "mean_token_accuracy": 0.25976367652416227,
+      "num_tokens": 2008595.0,
+      "step": 2150
+    },
+    {
+      "entropy": 6.13505428314209,
+      "epoch": 1.266551525618883,
+      "grad_norm": 0.8511647582054138,
+      "learning_rate": 1.51100757134537e-05,
+      "loss": 5.9619,
+      "mean_token_accuracy": 0.23760781466960906,
+      "num_tokens": 2057229.0,
+      "step": 2200
+    },
+    {
+      "entropy": 6.025254983901977,
+      "epoch": 1.2953367875647668,
+      "grad_norm": 0.7627406120300293,
+      "learning_rate": 1.4993593476994758e-05,
+      "loss": 5.8546,
+      "mean_token_accuracy": 0.2508662334084511,
+      "num_tokens": 2103631.0,
+      "step": 2250
+    },
+    {
+      "entropy": 5.981974196434021,
+      "epoch": 1.3241220495106505,
+      "grad_norm": 1.6922173500061035,
+      "learning_rate": 1.4877111240535819e-05,
+      "loss": 5.8119,
+      "mean_token_accuracy": 0.256170334815979,
+      "num_tokens": 2150369.0,
+      "step": 2300
+    },
+    {
+      "entropy": 6.19903904914856,
+      "epoch": 1.3529073114565342,
+      "grad_norm": 0.40436601638793945,
+      "learning_rate": 1.4760629004076878e-05,
+      "loss": 6.0244,
+      "mean_token_accuracy": 0.22900927513837815,
+      "num_tokens": 2199724.0,
+      "step": 2350
+    },
+    {
+      "entropy": 5.986697297096253,
+      "epoch": 1.381692573402418,
+      "grad_norm": 0.8481882214546204,
+      "learning_rate": 1.464414676761794e-05,
+      "loss": 5.8195,
+      "mean_token_accuracy": 0.2552035376429558,
+      "num_tokens": 2245341.0,
+      "step": 2400
+    },
+    {
+      "entropy": 6.1886044692993165,
+      "epoch": 1.4104778353483016,
+      "grad_norm": 0.7911505103111267,
+      "learning_rate": 1.4527664531159e-05,
+      "loss": 6.0148,
+      "mean_token_accuracy": 0.23026730984449387,
+      "num_tokens": 2294726.0,
+      "step": 2450
+    },
+    {
+      "entropy": 5.974867792129516,
+      "epoch": 1.4392630972941853,
+      "grad_norm": 1.640499234199524,
+      "learning_rate": 1.441118229470006e-05,
+      "loss": 5.8111,
+      "mean_token_accuracy": 0.2554209426045418,
+      "num_tokens": 2342251.0,
+      "step": 2500
+    },
+    {
+      "entropy": 5.967635660171509,
+      "epoch": 1.468048359240069,
+      "grad_norm": 0.8022929430007935,
+      "learning_rate": 1.429470005824112e-05,
+      "loss": 5.8015,
+      "mean_token_accuracy": 0.2569852137565613,
+      "num_tokens": 2387469.0,
+      "step": 2550
+    },
+    {
+      "entropy": 6.047262029647827,
+      "epoch": 1.4968336211859528,
+      "grad_norm": 0.9270678758621216,
+      "learning_rate": 1.417821782178218e-05,
+      "loss": 5.8782,
+      "mean_token_accuracy": 0.2467849862575531,
+      "num_tokens": 2434128.0,
+      "step": 2600
+    },
+    {
+      "entropy": 6.00601068019867,
+      "epoch": 1.5256188831318365,
+      "grad_norm": 1.5378597974777222,
+      "learning_rate": 1.406173558532324e-05,
+      "loss": 5.839,
+      "mean_token_accuracy": 0.25216978013515473,
+      "num_tokens": 2480366.0,
+      "step": 2650
+    },
+    {
+      "entropy": 5.988714299201965,
+      "epoch": 1.5544041450777202,
+      "grad_norm": 0.819143533706665,
+      "learning_rate": 1.3945253348864299e-05,
+      "loss": 5.82,
+      "mean_token_accuracy": 0.254311783015728,
+      "num_tokens": 2527357.0,
+      "step": 2700
+    },
+    {
+      "entropy": 5.960293846130371,
+      "epoch": 1.583189407023604,
+      "grad_norm": 0.8920449614524841,
+      "learning_rate": 1.382877111240536e-05,
+      "loss": 5.7946,
+      "mean_token_accuracy": 0.25750755161046984,
+      "num_tokens": 2574470.0,
+      "step": 2750
+    },
+    {
+      "entropy": 6.1214879322052,
+      "epoch": 1.6119746689694876,
+      "grad_norm": 0.5333890914916992,
+      "learning_rate": 1.371228887594642e-05,
+      "loss": 5.9513,
+      "mean_token_accuracy": 0.2377367687225342,
+      "num_tokens": 2622280.0,
+      "step": 2800
+    },
+    {
+      "entropy": 5.951769871711731,
+      "epoch": 1.6407599309153713,
+      "grad_norm": 0.5994665026664734,
+      "learning_rate": 1.3595806639487479e-05,
+      "loss": 5.7861,
+      "mean_token_accuracy": 0.25854207515716554,
+      "num_tokens": 2668624.0,
+      "step": 2850
+    },
+    {
+      "entropy": 5.927765312194825,
+      "epoch": 1.669545192861255,
+      "grad_norm": 0.4460087716579437,
+      "learning_rate": 1.347932440302854e-05,
+      "loss": 5.7661,
+      "mean_token_accuracy": 0.25973255425691605,
+      "num_tokens": 2714388.0,
+      "step": 2900
+    },
+    {
+      "entropy": 6.097678365707398,
+      "epoch": 1.6983304548071387,
+      "grad_norm": 0.7125752568244934,
+      "learning_rate": 1.3362842166569598e-05,
+      "loss": 5.9284,
+      "mean_token_accuracy": 0.23995368272066117,
+      "num_tokens": 2761465.0,
+      "step": 2950
+    },
+    {
+      "entropy": 5.986212658882141,
+      "epoch": 1.7271157167530224,
+      "grad_norm": 1.5405049324035645,
+      "learning_rate": 1.3246359930110659e-05,
+      "loss": 5.8194,
+      "mean_token_accuracy": 0.25333445996046067,
+      "num_tokens": 2808066.0,
+      "step": 3000
+    },
+    {
+      "entropy": 5.7968806195259095,
+      "epoch": 1.7559009786989062,
+      "grad_norm": 0.4532749652862549,
+      "learning_rate": 1.312987769365172e-05,
+      "loss": 5.6344,
+      "mean_token_accuracy": 0.2782411390542984,
+      "num_tokens": 2851822.0,
+      "step": 3050
+    },
+    {
+      "entropy": 5.973708114624023,
+      "epoch": 1.7846862406447899,
+      "grad_norm": 1.4795438051223755,
+      "learning_rate": 1.3013395457192778e-05,
+      "loss": 5.8104,
+      "mean_token_accuracy": 0.25441971331834795,
+      "num_tokens": 2897737.0,
+      "step": 3100
+    },
+    {
+      "entropy": 5.70733567237854,
+      "epoch": 1.8134715025906736,
+      "grad_norm": 0.6216577887535095,
+      "learning_rate": 1.2896913220733839e-05,
+      "loss": 5.5523,
+      "mean_token_accuracy": 0.28787180870771406,
+      "num_tokens": 2939511.0,
+      "step": 3150
+    },
+    {
+      "entropy": 5.96826630115509,
+      "epoch": 1.8422567645365573,
+      "grad_norm": 0.9246350526809692,
+      "learning_rate": 1.2780430984274898e-05,
+      "loss": 5.8057,
+      "mean_token_accuracy": 0.25464902341365814,
+      "num_tokens": 2986368.0,
+      "step": 3200
+    },
+    {
+      "entropy": 5.950662693977356,
+      "epoch": 1.871042026482441,
+      "grad_norm": 0.8141199946403503,
+      "learning_rate": 1.266394874781596e-05,
+      "loss": 5.7886,
+      "mean_token_accuracy": 0.25830793648958206,
+      "num_tokens": 3031770.0,
+      "step": 3250
+    },
+    {
+      "entropy": 6.00512773513794,
+      "epoch": 1.8998272884283247,
+      "grad_norm": 0.4913998246192932,
+      "learning_rate": 1.2547466511357018e-05,
+      "loss": 5.838,
+      "mean_token_accuracy": 0.2512077575922012,
+      "num_tokens": 3078322.0,
+      "step": 3300
+    },
+    {
+      "entropy": 6.090880632400513,
+      "epoch": 1.9286125503742084,
+      "grad_norm": 0.9893012046813965,
+      "learning_rate": 1.243098427489808e-05,
+      "loss": 5.9264,
+      "mean_token_accuracy": 0.2391783133149147,
+      "num_tokens": 3125572.0,
+      "step": 3350
+    },
+    {
+      "entropy": 5.949693293571472,
+      "epoch": 1.9573978123200921,
+      "grad_norm": 0.5794200301170349,
+      "learning_rate": 1.231450203843914e-05,
+      "loss": 5.7861,
+      "mean_token_accuracy": 0.2568664598464966,
+      "num_tokens": 3171974.0,
+      "step": 3400
+    },
+    {
+      "entropy": 6.03591317653656,
+      "epoch": 1.9861830742659758,
+      "grad_norm": 0.8525373339653015,
+      "learning_rate": 1.21980198019802e-05,
+      "loss": 5.8741,
+      "mean_token_accuracy": 0.24642003327608109,
+      "num_tokens": 3219624.0,
+      "step": 3450
+    },
+    {
+      "epoch": 2.0,
+      "eval_entropy": 6.272298685416648,
+      "eval_loss": 6.12472677230835,
+      "eval_mean_token_accuracy": 0.21168697409091458,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 3239438.0,
+      "eval_runtime": 76.2536,
+      "eval_samples_per_second": 5.692,
+      "eval_steps_per_second": 2.846,
+      "step": 3474
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 8685,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.529454004325376e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
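
The `learning_rate` values in the `log_history` of `trainer_state.json` look consistent with a linear warmup followed by a linear decay to zero at `max_steps` (8685). The sketch below checks that reading against a few logged values. The peak rate (2e-5), the 100 warmup steps, and the one-step offset between the logged `step` and the schedule position are inferred from the numbers and are not recorded in this file; the offset would fit the learning rate being read before the scheduler advances, but that is an assumption. The step-50 and step-100 values are taken from the `checkpoint-5211/trainer_state.json` log further down.

```python
# Sketch: reproduce logged learning rates with a linear warmup/decay schedule.
# ASSUMPTIONS (inferred from the logged numbers, not stated in this diff):
#   peak learning rate 2e-5, 100 warmup steps, and the value logged at
#   global step N being the schedule evaluated at position N - 1.
PEAK_LR = 2e-5
WARMUP_STEPS = 100
MAX_STEPS = 8685          # "max_steps" in trainer_state.json
STEPS_PER_EPOCH = 1737    # the "epoch": 1.0 eval entry is logged at "step": 1737


def lr_multiplier(position: int) -> float:
    """Linear warmup from 0 to 1, then linear decay to 0 at MAX_STEPS."""
    if position < WARMUP_STEPS:
        return position / max(1, WARMUP_STEPS)
    return max(0.0, (MAX_STEPS - position) / max(1, MAX_STEPS - WARMUP_STEPS))


def logged_lr(global_step: int) -> float:
    """Learning rate we expect to see in the log entry for global_step."""
    return PEAK_LR * lr_multiplier(global_step - 1)


# (global_step -> learning_rate) pairs copied from log_history entries
observed = {
    50: 9.800000000000001e-06,
    100: 1.98e-05,
    2200: 1.51100757134537e-05,
    3450: 1.21980198019802e-05,
}

for step, lr in observed.items():
    predicted = logged_lr(step)
    print(f"step {step:>5}: logged {lr:.12e}  predicted {predicted:.12e}")

# Compare with "num_train_epochs" and with the "epoch" field of the step-50 entry.
print(MAX_STEPS / STEPS_PER_EPOCH)
print(50 / STEPS_PER_EPOCH)
```

Since one epoch is 1737 optimizer steps, a `max_steps` of 8685 is exactly five epochs, which matches `num_train_epochs`, and each entry's `epoch` field is simply `step / 1737`.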

checkpoint-3474/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:130d33149272782bd60306263c371036419926142b8999aad7806359168f8484
+size 6225

checkpoint-3474/vocab.json ADDED

The diff for this file is too large to render.

checkpoint-5211/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: ''
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen2-VL-2B-Instruct
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-5211/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-VL-2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
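
With `r` = 8 on `q_proj` and `v_proj` only, the adapter is small. The sketch below gives a rough parameter count. It assumes the Qwen2-VL-2B-Instruct language model has 28 decoder layers, a hidden size of 1536, 12 query heads and 2 key/value heads; these dimensions are not in this diff, and the vision tower is assumed to contain no modules named `q_proj` or `v_proj`. The scaling factor uses the standard LoRA rule `lora_alpha / r`, which applies here because `use_rslora` is `false`.

```python
# Sketch: count LoRA parameters for the adapter_config.json shown above.
# ASSUMED Qwen2-VL-2B-Instruct language-model dimensions (not in this diff):
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_HEADS = 12
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128

# Values taken from adapter_config.json
R = 8
LORA_ALPHA = 16


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per targeted Linear."""
    return r * in_features + out_features * r


q_proj = lora_params(HIDDEN_SIZE, NUM_HEADS * HEAD_DIM, R)     # 1536 -> 1536
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, R)  # 1536 -> 256
total = NUM_LAYERS * (q_proj + v_proj)

fp32_bytes = total * 4          # assuming the weights are stored as float32
scaling = LORA_ALPHA / R        # multiplier applied to B @ A when use_rslora is false

print(f"trainable LoRA params: {total:,}")
print(f"fp32 weight bytes:     {fp32_bytes:,}")
print(f"LoRA scaling:          {scaling}")
```

Under those assumptions, 1,089,536 parameters in fp32 take 4,358,144 bytes, slightly under the 4,374,520-byte `size` in the `adapter_model.safetensors` LFS pointer; the difference would be the safetensors header. Two more fp32 copies (AdamW's first and second moments) come to about 8.72 MB, in the same range as the 8,783,179-byte `optimizer.pt`. Both are consistency checks, not measurements.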

checkpoint-5211/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e6a7b22d63fd8741b839353cbaab150c0bd5f07d663ad8884bd3b4af58a9cce
+size 4374520

checkpoint-5211/added_tokens.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-5211/chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
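
The template wraps every image part in `<|vision_start|><|image_pad|><|vision_end|>` and inserts a default system prompt when the first message is not a system message. The function below is a hypothetical, hand-written Python approximation of that logic for illustration only. It assumes Hugging Face's usual Jinja settings (`trim_blocks`, `lstrip_blocks`), under which the newlines inside the template's text are kept as shown. The authoritative rendering comes from `processor.apply_chat_template`, and the Qwen2-VL processor afterwards repeats each `<|image_pad|>` once per visual token of the image.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."  # text hard-coded in the template


def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Approximate chat_template.jinja in plain Python (hypothetical helper)."""
    parts = []
    image_count = 0
    video_count = 0
    for i, message in enumerate(messages):
        # A default system turn is injected only if the chat does not start with one.
        if i == 0 and message["role"] != "system":
            parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        parts.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            parts.append(f"{content}<|im_end|>\n")
            continue
        for item in content:
            if item.get("type") == "image" or "image" in item or "image_url" in item:
                image_count += 1
                if add_vision_id:
                    parts.append(f"Picture {image_count}: ")
                parts.append("<|vision_start|><|image_pad|><|vision_end|>")
            elif item.get("type") == "video" or "video" in item:
                video_count += 1
                if add_vision_id:
                    parts.append(f"Video {video_count}: ")
                parts.append("<|vision_start|><|video_pad|><|vision_end|>")
            elif "text" in item:
                parts.append(item["text"])
        parts.append("<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    }
]
print(render_chat(messages, add_generation_prompt=True))
```

The `<|im_end|>` that closes each turn is also the `eos_token` in `special_tokens_map.json`, so generation is expected to stop at the end of the assistant turn.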

checkpoint-5211/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-5211/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d7235486f7f068a0b9991bde7ca0b6a16106923b1cca53549a5bb621f15d218
+size 8783179

checkpoint-5211/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43cbafcbad7a00736ad4867a9fc18293a08b0b3d13acacb84d30cd8449539e81
+size 14645

checkpoint-5211/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c82e157712778db9a1270de44d6dd5d35b469dbf5b63767059cabfb507d50c8a
+size 1465

checkpoint-5211/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-5211/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f33787292af226c4a4842be48a0e614d9524e25dc248e48bb1af0593de5564f9
+size 11420539

checkpoint-5211/tokenizer_config.json ADDED

	@@ -0,0 +1,143 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-5211/trainer_state.json ADDED

	@@ -0,0 +1,1110 @@

+{
+  "best_global_step": 5211,
+  "best_metric": 6.0980024337768555,
+  "best_model_checkpoint": "./output/checkpoint-5211",
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 5211,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 3.864118957519531,
+      "epoch": 0.028785261945883708,
+      "grad_norm": 2.7545533180236816,
+      "learning_rate": 9.800000000000001e-06,
+      "loss": 15.2997,
+      "mean_token_accuracy": 0.10086015284061432,
+      "num_tokens": 47319.0,
+      "step": 50
+    },
+    {
+      "entropy": 4.047076859474182,
+      "epoch": 0.057570523891767415,
+      "grad_norm": 5.0328264236450195,
+      "learning_rate": 1.98e-05,
+      "loss": 15.3264,
+      "mean_token_accuracy": 0.09582207053899765,
+      "num_tokens": 96809.0,
+      "step": 100
+    },
+    {
+      "entropy": 4.7578076648712155,
+      "epoch": 0.08635578583765112,
+      "grad_norm": 38.50589370727539,
+      "learning_rate": 1.988584740827024e-05,
+      "loss": 13.0056,
+      "mean_token_accuracy": 0.126854517608881,
+      "num_tokens": 139962.0,
+      "step": 150
+    },
+    {
+      "entropy": 6.80673882484436,
+      "epoch": 0.11514104778353483,
+      "grad_norm": 12.030129432678223,
+      "learning_rate": 1.97693651718113e-05,
+      "loss": 9.2822,
+      "mean_token_accuracy": 0.11084575355052947,
+      "num_tokens": 188029.0,
+      "step": 200
+    },
+    {
+      "entropy": 7.177925786972046,
+      "epoch": 0.14392630972941853,
+      "grad_norm": 4.852536201477051,
+      "learning_rate": 1.965288293535236e-05,
+      "loss": 7.6333,
+      "mean_token_accuracy": 0.12398939326405525,
+      "num_tokens": 234425.0,
+      "step": 250
+    },
+    {
+      "entropy": 7.080496473312378,
+      "epoch": 0.17271157167530224,
+      "grad_norm": 4.10841178894043,
+      "learning_rate": 1.9536400698893422e-05,
+      "loss": 7.1632,
+      "mean_token_accuracy": 0.13563686355948448,
+      "num_tokens": 278885.0,
+      "step": 300
+    },
+    {
+      "entropy": 6.931579580307007,
+      "epoch": 0.20149683362118595,
+      "grad_norm": 14.636048316955566,
+      "learning_rate": 1.941991846243448e-05,
+      "loss": 6.8213,
+      "mean_token_accuracy": 0.16459846690297128,
+      "num_tokens": 325491.0,
+      "step": 350
+    },
+    {
+      "entropy": 6.853660764694214,
+      "epoch": 0.23028209556706966,
+      "grad_norm": 5.966708183288574,
+      "learning_rate": 1.930343622597554e-05,
+      "loss": 6.6625,
+      "mean_token_accuracy": 0.17670693069696428,
+      "num_tokens": 372913.0,
+      "step": 400
+    },
+    {
+      "entropy": 6.684267387390137,
+      "epoch": 0.25906735751295334,
+      "grad_norm": 4.031010627746582,
+      "learning_rate": 1.91869539895166e-05,
+      "loss": 6.4505,
+      "mean_token_accuracy": 0.1943434515595436,
+      "num_tokens": 419159.0,
+      "step": 450
+    },
+    {
+      "entropy": 6.679989137649536,
+      "epoch": 0.28785261945883706,
+      "grad_norm": 6.251070022583008,
+      "learning_rate": 1.907047175305766e-05,
+      "loss": 6.4314,
+      "mean_token_accuracy": 0.19514557600021362,
+      "num_tokens": 466994.0,
+      "step": 500
+    },
+    {
+      "entropy": 6.477229623794556,
+      "epoch": 0.31663788140472077,
+      "grad_norm": 3.8656675815582275,
+      "learning_rate": 1.895398951659872e-05,
+      "loss": 6.2139,
+      "mean_token_accuracy": 0.21764743447303772,
+      "num_tokens": 513308.0,
+      "step": 550
+    },
+    {
+      "entropy": 6.408129243850708,
+      "epoch": 0.3454231433506045,
+      "grad_norm": 8.688581466674805,
+      "learning_rate": 1.883750728013978e-05,
+      "loss": 6.1224,
+      "mean_token_accuracy": 0.23438037544488907,
+      "num_tokens": 559679.0,
+      "step": 600
+    },
+    {
+      "entropy": 6.128518767356873,
+      "epoch": 0.3742084052964882,
+      "grad_norm": 5.419503688812256,
+      "learning_rate": 1.872102504368084e-05,
+      "loss": 5.8692,
+      "mean_token_accuracy": 0.26634690463542937,
+      "num_tokens": 603140.0,
+      "step": 650
+    },
+    {
+      "entropy": 6.322700729370117,
+      "epoch": 0.4029936672423719,
+      "grad_norm": 2.2213082313537598,
+      "learning_rate": 1.86045428072219e-05,
+      "loss": 6.0717,
+      "mean_token_accuracy": 0.24038562417030335,
+      "num_tokens": 650179.0,
+      "step": 700
+    },
+    {
+      "entropy": 6.236415157318115,
+      "epoch": 0.4317789291882556,
+      "grad_norm": 4.804980278015137,
+      "learning_rate": 1.848806057076296e-05,
+      "loss": 5.9986,
+      "mean_token_accuracy": 0.24596781462430953,
+      "num_tokens": 696220.0,
+      "step": 750
+    },
+    {
+      "entropy": 6.269758443832398,
+      "epoch": 0.4605641911341393,
+      "grad_norm": 2.2888853549957275,
+      "learning_rate": 1.837157833430402e-05,
+      "loss": 6.0385,
+      "mean_token_accuracy": 0.24074893474578857,
+      "num_tokens": 743909.0,
+      "step": 800
+    },
+    {
+      "entropy": 6.270364007949829,
+      "epoch": 0.48934945308002303,
+      "grad_norm": 3.0903279781341553,
+      "learning_rate": 1.825509609784508e-05,
+      "loss": 6.0481,
+      "mean_token_accuracy": 0.23740622967481614,
+      "num_tokens": 792015.0,
+      "step": 850
+    },
+    {
+      "entropy": 6.3037636184692385,
+      "epoch": 0.5181347150259067,
+      "grad_norm": 3.969320058822632,
+      "learning_rate": 1.813861386138614e-05,
+      "loss": 6.0855,
+      "mean_token_accuracy": 0.2309597587585449,
+      "num_tokens": 841802.0,
+      "step": 900
+    },
+    {
+      "entropy": 6.038041458129883,
+      "epoch": 0.5469199769717904,
+      "grad_norm": 2.2712185382843018,
+      "learning_rate": 1.80221316249272e-05,
+      "loss": 5.8285,
+      "mean_token_accuracy": 0.26099125802516937,
+      "num_tokens": 886492.0,
+      "step": 950
+    },
+    {
+      "entropy": 6.142958383560181,
+      "epoch": 0.5757052389176741,
+      "grad_norm": 1.2311755418777466,
+      "learning_rate": 1.790564938846826e-05,
+      "loss": 5.9357,
+      "mean_token_accuracy": 0.24810438305139543,
+      "num_tokens": 932807.0,
+      "step": 1000
+    },
+    {
+      "entropy": 6.199834351539612,
+      "epoch": 0.6044905008635578,
+      "grad_norm": 2.2788379192352295,
+      "learning_rate": 1.7789167152009318e-05,
+      "loss": 5.9964,
+      "mean_token_accuracy": 0.23942562609910964,
+      "num_tokens": 980541.0,
+      "step": 1050
+    },
+    {
+      "entropy": 5.961639919281006,
+      "epoch": 0.6332757628094415,
+      "grad_norm": 1.9077532291412354,
+      "learning_rate": 1.767268491555038e-05,
+      "loss": 5.7664,
+      "mean_token_accuracy": 0.26718012750148773,
+      "num_tokens": 1023882.0,
+      "step": 1100
+    },
+    {
+      "entropy": 5.889280087947846,
+      "epoch": 0.6620610247553252,
+      "grad_norm": 2.4254891872406006,
+      "learning_rate": 1.7556202679091442e-05,
+      "loss": 5.6952,
+      "mean_token_accuracy": 0.27529804170131683,
+      "num_tokens": 1068300.0,
+      "step": 1150
+    },
+    {
+      "entropy": 6.085640063285828,
+      "epoch": 0.690846286701209,
+      "grad_norm": 2.35312557220459,
+      "learning_rate": 1.74397204426325e-05,
+      "loss": 5.8898,
+      "mean_token_accuracy": 0.25166562348604204,
+      "num_tokens": 1115425.0,
+      "step": 1200
+    },
+    {
+      "entropy": 6.146574058532715,
+      "epoch": 0.7196315486470927,
+      "grad_norm": 1.7730146646499634,
+      "learning_rate": 1.732323820617356e-05,
+      "loss": 5.9519,
+      "mean_token_accuracy": 0.24276195973157882,
+      "num_tokens": 1162319.0,
+      "step": 1250
+    },
+    {
+      "entropy": 6.079372715950012,
+      "epoch": 0.7484168105929764,
+      "grad_norm": 1.7070863246917725,
+      "learning_rate": 1.720675596971462e-05,
+      "loss": 5.8922,
+      "mean_token_accuracy": 0.24961524546146394,
+      "num_tokens": 1208230.0,
+      "step": 1300
+    },
+    {
+      "entropy": 5.9683656406402585,
+      "epoch": 0.7772020725388601,
+      "grad_norm": 1.8790594339370728,
+      "learning_rate": 1.709027373325568e-05,
+      "loss": 5.7827,
+      "mean_token_accuracy": 0.2632122594118118,
+      "num_tokens": 1253074.0,
+      "step": 1350
+    },
+    {
+      "entropy": 6.107076721191406,
+      "epoch": 0.8059873344847438,
+      "grad_norm": 1.1745644807815552,
+      "learning_rate": 1.6973791496796742e-05,
+      "loss": 5.9211,
+      "mean_token_accuracy": 0.24564073830842972,
+      "num_tokens": 1300179.0,
+      "step": 1400
+    },
+    {
+      "entropy": 6.141328382492065,
+      "epoch": 0.8347725964306275,
+      "grad_norm": 1.0346958637237549,
+      "learning_rate": 1.68573092603378e-05,
+      "loss": 5.9584,
+      "mean_token_accuracy": 0.23997059136629104,
+      "num_tokens": 1347539.0,
+      "step": 1450
+    },
+    {
+      "entropy": 6.070010099411011,
+      "epoch": 0.8635578583765112,
+      "grad_norm": 1.6541163921356201,
+      "learning_rate": 1.674082702387886e-05,
+      "loss": 5.889,
+      "mean_token_accuracy": 0.24875166177749633,
+      "num_tokens": 1394157.0,
+      "step": 1500
+    },
+    {
+      "entropy": 6.207450666427612,
+      "epoch": 0.8923431203223949,
+      "grad_norm": 0.9742990732192993,
+      "learning_rate": 1.662434478741992e-05,
+      "loss": 6.0217,
+      "mean_token_accuracy": 0.23067249596118927,
+      "num_tokens": 1443892.0,
+      "step": 1550
+    },
+    {
+      "entropy": 6.026197805404663,
+      "epoch": 0.9211283822682786,
+      "grad_norm": 1.4229531288146973,
+      "learning_rate": 1.650786255096098e-05,
+      "loss": 5.8455,
+      "mean_token_accuracy": 0.2537291014194489,
+      "num_tokens": 1491050.0,
+      "step": 1600
+    },
+    {
+      "entropy": 6.210526428222656,
+      "epoch": 0.9499136442141624,
+      "grad_norm": 1.3555018901824951,
+      "learning_rate": 1.6391380314502038e-05,
+      "loss": 6.0279,
+      "mean_token_accuracy": 0.2308420208096504,
+      "num_tokens": 1540809.0,
+      "step": 1650
+    },
+    {
+      "entropy": 5.9872834014892575,
+      "epoch": 0.9786989061600461,
+      "grad_norm": 0.9893498420715332,
+      "learning_rate": 1.62748980780431e-05,
+      "loss": 5.8137,
+      "mean_token_accuracy": 0.2566875320672989,
+      "num_tokens": 1585876.0,
+      "step": 1700
+    },
+    {
+      "epoch": 1.0,
+      "eval_entropy": 6.322207130045386,
+      "eval_loss": 6.15173864364624,
+      "eval_mean_token_accuracy": 0.21116007946877985,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 1619719.0,
+      "eval_runtime": 76.1297,
+      "eval_samples_per_second": 5.701,
+      "eval_steps_per_second": 2.85,
+      "step": 1737
+    },
+    {
+      "entropy": 6.038531675338745,
+      "epoch": 1.0074841681059297,
+      "grad_norm": 0.8715208172798157,
+      "learning_rate": 1.615841584158416e-05,
+      "loss": 5.8628,
+      "mean_token_accuracy": 0.2510762655735016,
+      "num_tokens": 1632015.0,
+      "step": 1750
+    },
+    {
+      "entropy": 6.164030771255494,
+      "epoch": 1.0362694300518134,
+      "grad_norm": 0.7344900965690613,
+      "learning_rate": 1.604193360512522e-05,
+      "loss": 5.9856,
+      "mean_token_accuracy": 0.2351543301343918,
+      "num_tokens": 1681154.0,
+      "step": 1800
+    },
+    {
+      "entropy": 6.0731862354278565,
+      "epoch": 1.065054691997697,
+      "grad_norm": 1.0801328420639038,
+      "learning_rate": 1.592545136866628e-05,
+      "loss": 5.8976,
+      "mean_token_accuracy": 0.24701615989208223,
+      "num_tokens": 1728110.0,
+      "step": 1850
+    },
+    {
+      "entropy": 6.079212121963501,
+      "epoch": 1.0938399539435808,
+      "grad_norm": 0.7876909375190735,
+      "learning_rate": 1.5808969132207338e-05,
+      "loss": 5.9056,
+      "mean_token_accuracy": 0.24457543224096298,
+      "num_tokens": 1775703.0,
+      "step": 1900
+    },
+    {
+      "entropy": 6.062467746734619,
+      "epoch": 1.1226252158894645,
+      "grad_norm": 0.5999078750610352,
+      "learning_rate": 1.56924868957484e-05,
+      "loss": 5.8899,
+      "mean_token_accuracy": 0.2469428673386574,
+      "num_tokens": 1821980.0,
+      "step": 1950
+    },
+    {
+      "entropy": 6.031774473190308,
+      "epoch": 1.1514104778353482,
+      "grad_norm": 1.6313235759735107,
+      "learning_rate": 1.557600465928946e-05,
+      "loss": 5.8593,
+      "mean_token_accuracy": 0.250918984413147,
+      "num_tokens": 1867547.0,
+      "step": 2000
+    },
+    {
+      "entropy": 6.122789564132691,
+      "epoch": 1.180195739781232,
+      "grad_norm": 2.562373161315918,
+      "learning_rate": 1.545952242283052e-05,
+      "loss": 5.9502,
+      "mean_token_accuracy": 0.23938885867595672,
+      "num_tokens": 1915411.0,
+      "step": 2050
+    },
+    {
+      "entropy": 6.067130417823791,
+      "epoch": 1.2089810017271156,
+      "grad_norm": 0.9762872457504272,
+      "learning_rate": 1.534304018637158e-05,
+      "loss": 5.8956,
+      "mean_token_accuracy": 0.2454381173849106,
+      "num_tokens": 1964009.0,
+      "step": 2100
+    },
+    {
+      "entropy": 5.9613511180877685,
+      "epoch": 1.2377662636729994,
+      "grad_norm": 0.8701547384262085,
+      "learning_rate": 1.5226557949912639e-05,
+      "loss": 5.7907,
+      "mean_token_accuracy": 0.25976367652416227,
+      "num_tokens": 2008595.0,
+      "step": 2150
+    },
+    {
+      "entropy": 6.13505428314209,
+      "epoch": 1.266551525618883,
+      "grad_norm": 0.8511647582054138,
+      "learning_rate": 1.51100757134537e-05,
+      "loss": 5.9619,
+      "mean_token_accuracy": 0.23760781466960906,
+      "num_tokens": 2057229.0,
+      "step": 2200
+    },
+    {
+      "entropy": 6.025254983901977,
+      "epoch": 1.2953367875647668,
+      "grad_norm": 0.7627406120300293,
+      "learning_rate": 1.4993593476994758e-05,
+      "loss": 5.8546,
+      "mean_token_accuracy": 0.2508662334084511,
+      "num_tokens": 2103631.0,
+      "step": 2250
+    },
+    {
+      "entropy": 5.981974196434021,
+      "epoch": 1.3241220495106505,
+      "grad_norm": 1.6922173500061035,
+      "learning_rate": 1.4877111240535819e-05,
+      "loss": 5.8119,
+      "mean_token_accuracy": 0.256170334815979,
+      "num_tokens": 2150369.0,
+      "step": 2300
+    },
+    {
+      "entropy": 6.19903904914856,
+      "epoch": 1.3529073114565342,
+      "grad_norm": 0.40436601638793945,
+      "learning_rate": 1.4760629004076878e-05,
+      "loss": 6.0244,
+      "mean_token_accuracy": 0.22900927513837815,
+      "num_tokens": 2199724.0,
+      "step": 2350
+    },
+    {
+      "entropy": 5.986697297096253,
+      "epoch": 1.381692573402418,
+      "grad_norm": 0.8481882214546204,
+      "learning_rate": 1.464414676761794e-05,
+      "loss": 5.8195,
+      "mean_token_accuracy": 0.2552035376429558,
+      "num_tokens": 2245341.0,
+      "step": 2400
+    },
+    {
+      "entropy": 6.1886044692993165,
+      "epoch": 1.4104778353483016,
+      "grad_norm": 0.7911505103111267,
+      "learning_rate": 1.4527664531159e-05,
+      "loss": 6.0148,
+      "mean_token_accuracy": 0.23026730984449387,
+      "num_tokens": 2294726.0,
+      "step": 2450
+    },
+    {
+      "entropy": 5.974867792129516,
+      "epoch": 1.4392630972941853,
+      "grad_norm": 1.640499234199524,
+      "learning_rate": 1.441118229470006e-05,
+      "loss": 5.8111,
+      "mean_token_accuracy": 0.2554209426045418,
+      "num_tokens": 2342251.0,
+      "step": 2500
+    },
+    {
+      "entropy": 5.967635660171509,
+      "epoch": 1.468048359240069,
+      "grad_norm": 0.8022929430007935,
+      "learning_rate": 1.429470005824112e-05,
+      "loss": 5.8015,
+      "mean_token_accuracy": 0.2569852137565613,
+      "num_tokens": 2387469.0,
+      "step": 2550
+    },
+    {
+      "entropy": 6.047262029647827,
+      "epoch": 1.4968336211859528,
+      "grad_norm": 0.9270678758621216,
+      "learning_rate": 1.417821782178218e-05,
+      "loss": 5.8782,
+      "mean_token_accuracy": 0.2467849862575531,
+      "num_tokens": 2434128.0,
+      "step": 2600
+    },
+    {
+      "entropy": 6.00601068019867,
+      "epoch": 1.5256188831318365,
+      "grad_norm": 1.5378597974777222,
+      "learning_rate": 1.406173558532324e-05,
+      "loss": 5.839,
+      "mean_token_accuracy": 0.25216978013515473,
+      "num_tokens": 2480366.0,
+      "step": 2650
+    },
+    {
+      "entropy": 5.988714299201965,
+      "epoch": 1.5544041450777202,
+      "grad_norm": 0.819143533706665,
+      "learning_rate": 1.3945253348864299e-05,
+      "loss": 5.82,
+      "mean_token_accuracy": 0.254311783015728,
+      "num_tokens": 2527357.0,
+      "step": 2700
+    },
+    {
+      "entropy": 5.960293846130371,
+      "epoch": 1.583189407023604,
+      "grad_norm": 0.8920449614524841,
+      "learning_rate": 1.382877111240536e-05,
+      "loss": 5.7946,
+      "mean_token_accuracy": 0.25750755161046984,
+      "num_tokens": 2574470.0,
+      "step": 2750
+    },
+    {
+      "entropy": 6.1214879322052,
+      "epoch": 1.6119746689694876,
+      "grad_norm": 0.5333890914916992,
+      "learning_rate": 1.371228887594642e-05,
+      "loss": 5.9513,
+      "mean_token_accuracy": 0.2377367687225342,
+      "num_tokens": 2622280.0,
+      "step": 2800
+    },
+    {
+      "entropy": 5.951769871711731,
+      "epoch": 1.6407599309153713,
+      "grad_norm": 0.5994665026664734,
+      "learning_rate": 1.3595806639487479e-05,
+      "loss": 5.7861,
+      "mean_token_accuracy": 0.25854207515716554,
+      "num_tokens": 2668624.0,
+      "step": 2850
+    },
+    {
+      "entropy": 5.927765312194825,
+      "epoch": 1.669545192861255,
+      "grad_norm": 0.4460087716579437,
+      "learning_rate": 1.347932440302854e-05,
+      "loss": 5.7661,
+      "mean_token_accuracy": 0.25973255425691605,
+      "num_tokens": 2714388.0,
+      "step": 2900
+    },
+    {
+      "entropy": 6.097678365707398,
+      "epoch": 1.6983304548071387,
+      "grad_norm": 0.7125752568244934,
+      "learning_rate": 1.3362842166569598e-05,
+      "loss": 5.9284,
+      "mean_token_accuracy": 0.23995368272066117,
+      "num_tokens": 2761465.0,
+      "step": 2950
+    },
+    {
+      "entropy": 5.986212658882141,
+      "epoch": 1.7271157167530224,
+      "grad_norm": 1.5405049324035645,
+      "learning_rate": 1.3246359930110659e-05,
+      "loss": 5.8194,
+      "mean_token_accuracy": 0.25333445996046067,
+      "num_tokens": 2808066.0,
+      "step": 3000
+    },
+    {
+      "entropy": 5.7968806195259095,
+      "epoch": 1.7559009786989062,
+      "grad_norm": 0.4532749652862549,
+      "learning_rate": 1.312987769365172e-05,
+      "loss": 5.6344,
+      "mean_token_accuracy": 0.2782411390542984,
+      "num_tokens": 2851822.0,
+      "step": 3050
+    },
+    {
+      "entropy": 5.973708114624023,
+      "epoch": 1.7846862406447899,
+      "grad_norm": 1.4795438051223755,
+      "learning_rate": 1.3013395457192778e-05,
+      "loss": 5.8104,
+      "mean_token_accuracy": 0.25441971331834795,
+      "num_tokens": 2897737.0,
+      "step": 3100
+    },
+    {
+      "entropy": 5.70733567237854,
+      "epoch": 1.8134715025906736,
+      "grad_norm": 0.6216577887535095,
+      "learning_rate": 1.2896913220733839e-05,
+      "loss": 5.5523,
+      "mean_token_accuracy": 0.28787180870771406,
+      "num_tokens": 2939511.0,
+      "step": 3150
+    },
+    {
+      "entropy": 5.96826630115509,
+      "epoch": 1.8422567645365573,
+      "grad_norm": 0.9246350526809692,
+      "learning_rate": 1.2780430984274898e-05,
+      "loss": 5.8057,
+      "mean_token_accuracy": 0.25464902341365814,
+      "num_tokens": 2986368.0,
+      "step": 3200
+    },
+    {
+      "entropy": 5.950662693977356,
+      "epoch": 1.871042026482441,
+      "grad_norm": 0.8141199946403503,
+      "learning_rate": 1.266394874781596e-05,
+      "loss": 5.7886,
+      "mean_token_accuracy": 0.25830793648958206,
+      "num_tokens": 3031770.0,
+      "step": 3250
+    },
+    {
+      "entropy": 6.00512773513794,
+      "epoch": 1.8998272884283247,
+      "grad_norm": 0.4913998246192932,
+      "learning_rate": 1.2547466511357018e-05,
+      "loss": 5.838,
+      "mean_token_accuracy": 0.2512077575922012,
+      "num_tokens": 3078322.0,
+      "step": 3300
+    },
+    {
+      "entropy": 6.090880632400513,
+      "epoch": 1.9286125503742084,
+      "grad_norm": 0.9893012046813965,
+      "learning_rate": 1.243098427489808e-05,
+      "loss": 5.9264,
+      "mean_token_accuracy": 0.2391783133149147,
+      "num_tokens": 3125572.0,
+      "step": 3350
+    },
+    {
+      "entropy": 5.949693293571472,
+      "epoch": 1.9573978123200921,
+      "grad_norm": 0.5794200301170349,
+      "learning_rate": 1.231450203843914e-05,
+      "loss": 5.7861,
+      "mean_token_accuracy": 0.2568664598464966,
+      "num_tokens": 3171974.0,
+      "step": 3400
+    },
+    {
+      "entropy": 6.03591317653656,
+      "epoch": 1.9861830742659758,
+      "grad_norm": 0.8525373339653015,
+      "learning_rate": 1.21980198019802e-05,
+      "loss": 5.8741,
+      "mean_token_accuracy": 0.24642003327608109,
+      "num_tokens": 3219624.0,
+      "step": 3450
+    },
+    {
+      "epoch": 2.0,
+      "eval_entropy": 6.272298685416648,
+      "eval_loss": 6.12472677230835,
+      "eval_mean_token_accuracy": 0.21168697409091458,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 3239438.0,
+      "eval_runtime": 76.2536,
+      "eval_samples_per_second": 5.692,
+      "eval_steps_per_second": 2.846,
+      "step": 3474
+    },
+    {
+      "entropy": 5.914763498306274,
+      "epoch": 2.0149683362118593,
+      "grad_norm": 0.5479806661605835,
+      "learning_rate": 1.208153756552126e-05,
+      "loss": 5.7559,
+      "mean_token_accuracy": 0.2624077323079109,
+      "num_tokens": 3263994.0,
+      "step": 3500
+    },
+    {
+      "entropy": 6.033470869064331,
+      "epoch": 2.043753598157743,
+      "grad_norm": 1.7186369895935059,
+      "learning_rate": 1.1965055329062319e-05,
+      "loss": 5.8677,
+      "mean_token_accuracy": 0.24745646148920059,
+      "num_tokens": 3311182.0,
+      "step": 3550
+    },
+    {
+      "entropy": 5.962404427528381,
+      "epoch": 2.0725388601036268,
+      "grad_norm": 0.9068580269813538,
+      "learning_rate": 1.184857309260338e-05,
+      "loss": 5.8038,
+      "mean_token_accuracy": 0.25500513821840287,
+      "num_tokens": 3358036.0,
+      "step": 3600
+    },
+    {
+      "entropy": 5.995727968215943,
+      "epoch": 2.1013241220495105,
+      "grad_norm": 2.044490337371826,
+      "learning_rate": 1.1732090856144438e-05,
+      "loss": 5.8333,
+      "mean_token_accuracy": 0.2514388278126717,
+      "num_tokens": 3404058.0,
+      "step": 3650
+    },
+    {
+      "entropy": 5.981345901489258,
+      "epoch": 2.130109383995394,
+      "grad_norm": 0.5262818336486816,
+      "learning_rate": 1.1615608619685499e-05,
+      "loss": 5.8205,
+      "mean_token_accuracy": 0.2523340278863907,
+      "num_tokens": 3449834.0,
+      "step": 3700
+    },
+    {
+      "entropy": 5.848710675239563,
+      "epoch": 2.158894645941278,
+      "grad_norm": 0.726718544960022,
+      "learning_rate": 1.149912638322656e-05,
+      "loss": 5.6891,
+      "mean_token_accuracy": 0.2697497832775116,
+      "num_tokens": 3494740.0,
+      "step": 3750
+    },
+    {
+      "entropy": 5.964878315925598,
+      "epoch": 2.1876799078871616,
+      "grad_norm": 0.6147393584251404,
+      "learning_rate": 1.1382644146767618e-05,
+      "loss": 5.8029,
+      "mean_token_accuracy": 0.2553535890579224,
+      "num_tokens": 3541342.0,
+      "step": 3800
+    },
+    {
+      "entropy": 6.045858116149902,
+      "epoch": 2.2164651698330453,
+      "grad_norm": 0.8283621072769165,
+      "learning_rate": 1.1266161910308679e-05,
+      "loss": 5.8802,
+      "mean_token_accuracy": 0.24544916599988936,
+      "num_tokens": 3588995.0,
+      "step": 3850
+    },
+    {
+      "entropy": 5.909895505905151,
+      "epoch": 2.245250431778929,
+      "grad_norm": 0.9912867546081543,
+      "learning_rate": 1.1149679673849738e-05,
+      "loss": 5.7481,
+      "mean_token_accuracy": 0.2620398569107056,
+      "num_tokens": 3634252.0,
+      "step": 3900
+    },
+    {
+      "entropy": 5.9534005498886104,
+      "epoch": 2.2740356937248127,
+      "grad_norm": 1.2012401819229126,
+      "learning_rate": 1.1033197437390799e-05,
+      "loss": 5.788,
+      "mean_token_accuracy": 0.25642816990613937,
+      "num_tokens": 3681197.0,
+      "step": 3950
+    },
+    {
+      "entropy": 6.155718851089477,
+      "epoch": 2.3028209556706964,
+      "grad_norm": 1.4272509813308716,
+      "learning_rate": 1.0916715200931857e-05,
+      "loss": 5.9842,
+      "mean_token_accuracy": 0.23176315426826477,
+      "num_tokens": 3729955.0,
+      "step": 4000
+    },
+    {
+      "entropy": 6.004842009544372,
+      "epoch": 2.33160621761658,
+      "grad_norm": 1.1919596195220947,
+      "learning_rate": 1.0800232964472918e-05,
+      "loss": 5.8332,
+      "mean_token_accuracy": 0.25039500594139097,
+      "num_tokens": 3777043.0,
+      "step": 4050
+    },
+    {
+      "entropy": 6.045269584655761,
+      "epoch": 2.360391479562464,
+      "grad_norm": 0.6200748085975647,
+      "learning_rate": 1.068375072801398e-05,
+      "loss": 5.8641,
+      "mean_token_accuracy": 0.2466951721906662,
+      "num_tokens": 3824067.0,
+      "step": 4100
+    },
+    {
+      "entropy": 6.105137758255005,
+      "epoch": 2.3891767415083476,
+      "grad_norm": 1.0185531377792358,
+      "learning_rate": 1.0567268491555038e-05,
+      "loss": 5.9181,
+      "mean_token_accuracy": 0.24000227689743042,
+      "num_tokens": 3872769.0,
+      "step": 4150
+    },
+    {
+      "entropy": 6.013391451835632,
+      "epoch": 2.4179620034542313,
+      "grad_norm": 0.6188511848449707,
+      "learning_rate": 1.04507862550961e-05,
+      "loss": 5.8286,
+      "mean_token_accuracy": 0.25189226895570754,
+      "num_tokens": 3919379.0,
+      "step": 4200
+    },
+    {
+      "entropy": 5.972923498153687,
+      "epoch": 2.446747265400115,
+      "grad_norm": 0.7165982127189636,
+      "learning_rate": 1.0334304018637157e-05,
+      "loss": 5.7908,
+      "mean_token_accuracy": 0.2567197346687317,
+      "num_tokens": 3965593.0,
+      "step": 4250
+    },
+    {
+      "entropy": 6.0378124713897705,
+      "epoch": 2.4755325273459987,
+      "grad_norm": 0.5278330445289612,
+      "learning_rate": 1.021782178217822e-05,
+      "loss": 5.8559,
+      "mean_token_accuracy": 0.2484271454811096,
+      "num_tokens": 4012300.0,
+      "step": 4300
+    },
+    {
+      "entropy": 5.984496111869812,
+      "epoch": 2.5043177892918824,
+      "grad_norm": 0.8995006680488586,
+      "learning_rate": 1.0101339545719278e-05,
+      "loss": 5.8092,
+      "mean_token_accuracy": 0.253717774450779,
+      "num_tokens": 4059323.0,
+      "step": 4350
+    },
+    {
+      "entropy": 6.124767150878906,
+      "epoch": 2.533103051237766,
+      "grad_norm": 1.3810409307479858,
+      "learning_rate": 9.984857309260339e-06,
+      "loss": 5.9468,
+      "mean_token_accuracy": 0.23715158700942993,
+      "num_tokens": 4107616.0,
+      "step": 4400
+    },
+    {
+      "entropy": 5.8810745000839235,
+      "epoch": 2.56188831318365,
+      "grad_norm": 0.8794332146644592,
+      "learning_rate": 9.868375072801398e-06,
+      "loss": 5.7089,
+      "mean_token_accuracy": 0.2662400561571121,
+      "num_tokens": 4152400.0,
+      "step": 4450
+    },
+    {
+      "entropy": 6.108017959594727,
+      "epoch": 2.5906735751295336,
+      "grad_norm": 0.5132983922958374,
+      "learning_rate": 9.751892836342458e-06,
+      "loss": 5.9346,
+      "mean_token_accuracy": 0.23871887892484664,
+      "num_tokens": 4200994.0,
+      "step": 4500
+    },
+    {
+      "entropy": 5.985005149841308,
+      "epoch": 2.6194588370754173,
+      "grad_norm": 0.6561470031738281,
+      "learning_rate": 9.635410599883519e-06,
+      "loss": 5.8111,
+      "mean_token_accuracy": 0.25315980523824694,
+      "num_tokens": 4247548.0,
+      "step": 4550
+    },
+    {
+      "entropy": 6.050709452629089,
+      "epoch": 2.648244099021301,
+      "grad_norm": 0.8790570497512817,
+      "learning_rate": 9.51892836342458e-06,
+      "loss": 5.8789,
+      "mean_token_accuracy": 0.2440834751725197,
+      "num_tokens": 4295250.0,
+      "step": 4600
+    },
+    {
+      "entropy": 6.007251596450805,
+      "epoch": 2.6770293609671847,
+      "grad_norm": 0.6728562116622925,
+      "learning_rate": 9.402446126965639e-06,
+      "loss": 5.8338,
+      "mean_token_accuracy": 0.2509264424443245,
+      "num_tokens": 4341599.0,
+      "step": 4650
+    },
+    {
+      "entropy": 5.966628184318543,
+      "epoch": 2.7058146229130684,
+      "grad_norm": 0.5815795063972473,
+      "learning_rate": 9.285963890506699e-06,
+      "loss": 5.7961,
+      "mean_token_accuracy": 0.2559360232949257,
+      "num_tokens": 4388673.0,
+      "step": 4700
+    },
+    {
+      "entropy": 5.7972593069076535,
+      "epoch": 2.734599884858952,
+      "grad_norm": 1.0610334873199463,
+      "learning_rate": 9.169481654047758e-06,
+      "loss": 5.6318,
+      "mean_token_accuracy": 0.27574603259563446,
+      "num_tokens": 4432959.0,
+      "step": 4750
+    },
+    {
+      "entropy": 5.984181261062622,
+      "epoch": 2.763385146804836,
+      "grad_norm": 2.1847357749938965,
+      "learning_rate": 9.052999417588819e-06,
+      "loss": 5.8153,
+      "mean_token_accuracy": 0.2533784031867981,
+      "num_tokens": 4479190.0,
+      "step": 4800
+    },
+    {
+      "entropy": 5.959725599288941,
+      "epoch": 2.7921704087507195,
+      "grad_norm": 0.5671709179878235,
+      "learning_rate": 8.936517181129878e-06,
+      "loss": 5.7912,
+      "mean_token_accuracy": 0.2556650054454803,
+      "num_tokens": 4525674.0,
+      "step": 4850
+    },
+    {
+      "entropy": 5.814929313659668,
+      "epoch": 2.8209556706966032,
+      "grad_norm": 0.9447108507156372,
+      "learning_rate": 8.820034944670938e-06,
+      "loss": 5.6478,
+      "mean_token_accuracy": 0.27417868226766584,
+      "num_tokens": 4570379.0,
+      "step": 4900
+    },
+    {
+      "entropy": 5.96754421710968,
+      "epoch": 2.849740932642487,
+      "grad_norm": 2.009676218032837,
+      "learning_rate": 8.703552708211999e-06,
+      "loss": 5.795,
+      "mean_token_accuracy": 0.2556305864453316,
+      "num_tokens": 4617184.0,
+      "step": 4950
+    },
+    {
+      "entropy": 6.008112049102783,
+      "epoch": 2.8785261945883707,
+      "grad_norm": 1.1977978944778442,
+      "learning_rate": 8.587070471753058e-06,
+      "loss": 5.8416,
+      "mean_token_accuracy": 0.2494604030251503,
+      "num_tokens": 4664180.0,
+      "step": 5000
+    },
+    {
+      "entropy": 5.832320966720581,
+      "epoch": 2.9073114565342544,
+      "grad_norm": 0.4845636785030365,
+      "learning_rate": 8.470588235294118e-06,
+      "loss": 5.6672,
+      "mean_token_accuracy": 0.27187123566865923,
+      "num_tokens": 4708377.0,
+      "step": 5050
+    },
+    {
+      "entropy": 5.84138514995575,
+      "epoch": 2.936096718480138,
+      "grad_norm": 0.8487229943275452,
+      "learning_rate": 8.354105998835179e-06,
+      "loss": 5.6769,
+      "mean_token_accuracy": 0.26995211571455,
+      "num_tokens": 4753587.0,
+      "step": 5100
+    },
+    {
+      "entropy": 6.016681690216064,
+      "epoch": 2.964881980426022,
+      "grad_norm": 0.9554332494735718,
+      "learning_rate": 8.237623762376238e-06,
+      "loss": 5.8479,
+      "mean_token_accuracy": 0.24785644590854644,
+      "num_tokens": 4800508.0,
+      "step": 5150
+    },
+    {
+      "entropy": 6.103472499847412,
+      "epoch": 2.9936672423719055,
+      "grad_norm": 0.6602863669395447,
+      "learning_rate": 8.121141525917298e-06,
+      "loss": 5.9305,
+      "mean_token_accuracy": 0.23794592499732972,
+      "num_tokens": 4849415.0,
+      "step": 5200
+    },
+    {
+      "epoch": 3.0,
+      "eval_entropy": 6.254081044878278,
+      "eval_loss": 6.0980024337768555,
+      "eval_mean_token_accuracy": 0.21401402258103894,
+      "eval_model_preparation_time": 0.0036,
+      "eval_num_tokens": 4859157.0,
+      "eval_runtime": 75.9443,
+      "eval_samples_per_second": 5.715,
+      "eval_steps_per_second": 2.857,
+      "step": 5211
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 8685,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.795785692717056e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
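The following is a minimal Python sketch for pulling the evaluation entries out of `log_history`. To stay runnable on its own, it embeds only the three eval records shown above instead of loading the file. To use it on the checkpoint, `json.load` the state file instead. That file is assumed to be `checkpoint-5211/trainer_state.json`, because its diff header falls outside this excerpt.

The sketch also checks the step arithmetic implied by the log:

- The epoch-1.0 evaluation at step 1737 gives 1737 optimizer steps per epoch.
- 5 × 1737 = 8685, which matches `max_steps`.
- Step 5211 is therefore the end of epoch 3, or 60% of the planned run.

```python
import json

# Excerpt of the log shown above (eval entries only). In practice, replace this
# with json.load() on the checkpoint's state file (file name assumed).
STATE_EXCERPT = """
{
  "log_history": [
    {"epoch": 1.0, "eval_loss": 6.15173864364624,
     "eval_mean_token_accuracy": 0.21116007946877985, "step": 1737},
    {"epoch": 2.0, "eval_loss": 6.12472677230835,
     "eval_mean_token_accuracy": 0.21168697409091458, "step": 3474},
    {"epoch": 3.0, "eval_loss": 6.0980024337768555,
     "eval_mean_token_accuracy": 0.21401402258103894, "step": 5211}
  ],
  "max_steps": 8685,
  "num_train_epochs": 5
}
"""


def eval_summary(state: dict) -> list[tuple[float, int, float, float]]:
    """Return (epoch, step, eval_loss, eval_mean_token_accuracy) per eval entry."""
    rows = []
    for entry in state["log_history"]:
        # Training entries carry "loss"; evaluation entries carry "eval_loss".
        if "eval_loss" in entry:
            rows.append((entry["epoch"], entry["step"],
                         entry["eval_loss"], entry["eval_mean_token_accuracy"]))
    return rows


state = json.loads(STATE_EXCERPT)
rows = eval_summary(state)

# The first evaluation runs at the end of epoch 1, so its step count is steps/epoch.
steps_per_epoch = rows[0][1]
assert state["max_steps"] == steps_per_epoch * state["num_train_epochs"]

# Fraction of the scheduled steps completed by the latest evaluated checkpoint.
progress = rows[-1][1] / state["max_steps"]

for epoch, step, loss, acc in rows:
    print(f"epoch {epoch:.0f}  step {step:5d}  eval_loss {loss:.4f}  eval_acc {acc:.4f}")
print(f"progress: {progress:.0%} of max_steps")
```

Across the three evaluations, `eval_loss` falls from about 6.152 to 6.125 to 6.098. Over the same span, `eval_mean_token_accuracy` edges up from about 0.211 to 0.214.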

checkpoint-5211/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:130d33149272782bd60306263c371036419926142b8999aad7806359168f8484
+size 6225
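The three added lines are a Git LFS pointer, not the binary itself. The actual `training_args.bin` (6225 bytes) lives in LFS storage and is identified by its SHA-256 digest. The following minimal Python sketch parses such a pointer and checks whether a given byte string matches it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers for illustration, not part of any Git LFS tooling.

```python
import hashlib
import re

# Pointer text copied from the diff above (without the leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:130d33149272782bd60306263c371036419926142b8999aad7806359168f8484
size 6225
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of an LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, ptr: dict[str, str]) -> bool:
    """True if data has the pointer's size and SHA-256 digest."""
    algo, _, digest = ptr["oid"].partition(":")
    return (algo == "sha256"
            and len(data) == int(ptr["size"])
            and hashlib.sha256(data).hexdigest() == digest)


ptr = parse_lfs_pointer(POINTER)
algo, _, digest = ptr["oid"].partition(":")
assert ptr["version"] == "https://git-lfs.github.com/spec/v1"
assert algo == "sha256" and re.fullmatch(r"[0-9a-f]{64}", digest)
size = int(ptr["size"])
print(f"{algo}:{digest[:12]}... {size} bytes")
```

Once the real file has been fetched (for example with `git lfs pull`), pass its bytes to `matches_pointer` to confirm that the download matches this commit.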