jeromeramos committed on
Commit 98e6ed7 · verified · 1 Parent(s): 104abb7

Model save
README.md CHANGED
@@ -1,66 +1,58 @@
  ---
- library_name: transformers
- license: llama3.1
  base_model: meta-llama/Llama-3.1-8B
+ library_name: transformers
+ model_name: inter-play-sim-assistant-sft
  tags:
+ - generated_from_trainer
  - trl
  - sft
- - generated_from_trainer
- model-index:
- - name: inter-play-sim-assistant-sft
-   results: []
+ licence: license
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # inter-play-sim-assistant-sft
-
- This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.7087
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 0.6807        | 1.0   | 723  | 0.7087          |
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.4.1.post302
- - Datasets 3.0.1
- - Tokenizers 0.20.1
+ # Model Card for inter-play-sim-assistant-sft
+
+ This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="Sim4Rec/inter-play-sim-assistant-sft", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/jerome-ramos-20/huggingface/runs/rdaw49f9)
+
+ This model was trained with SFT.
+
+ ### Framework versions
+
+ - TRL: 0.14.0
+ - Transformers: 4.48.2
+ - Pytorch: 2.5.1
+ - Datasets: 3.0.1
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
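The hyperparameters removed from the old card and the step count in its training-results table are mutually consistent; a quick sanity check of that arithmetic (an editorial sketch, not part of the commit):

```python
import math

# Hyperparameters from the removed README section.
train_batch_size = 4             # per-device batch size
num_devices = 4                  # multi-GPU
gradient_accumulation_steps = 4
train_samples = 46269            # from train_results.json

# Effective batch size: 4 * 4 * 4 = 64, as listed in the old card.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Steps for one epoch, counting the final partial batch: ceil(46269 / 64) = 723,
# matching the "Step" column of the removed training-results table.
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)

print(total_train_batch_size, steps_per_epoch)  # 64 723
```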
all_results.json CHANGED
@@ -1,14 +1,14 @@
  {
- "epoch": 1.0,
+ "epoch": 0.9986168741355463,
  "eval_loss": 0.7086742520332336,
  "eval_runtime": 98.2909,
  "eval_samples": 2071,
  "eval_samples_per_second": 46.902,
  "eval_steps_per_second": 2.94,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
  "train_samples": 46269,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
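The new throughput figure can be re-derived from the other fields in `all_results.json`, assuming the Trainer's usual definition (samples processed divided by wall-clock runtime); a minimal check:

```python
# Re-derive train_samples_per_second from the values committed in all_results.json.
train_samples = 46269
train_runtime = 2385.6161  # seconds

samples_per_second = train_samples / train_runtime
print(round(samples_per_second, 3))  # 19.395, as reported
```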
config.json CHANGED
@@ -31,7 +31,7 @@
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.45.2",
+ "transformers_version": "4.48.2",
  "use_cache": false,
  "vocab_size": 128320
  }
generation_config.json CHANGED
@@ -5,5 +5,5 @@
  "eos_token_id": 128001,
  "temperature": 0.6,
  "top_p": 0.9,
- "transformers_version": "4.45.2"
+ "transformers_version": "4.48.2"
  }
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f0d1acf0ba6a3f1ba28ba59a3541d28b6951d70f2f46c044d4216ad79c4e6568
+ oid sha256:96a3f25cdf50508cedb9141a645a1b95248e26d20ef5fe3d2de30857075f9ee2
  size 4977222960
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:94e01be8ae8791d7dd12ad345f799cdbd8762e20c18bbbc34b3810dab3fe08de
+ oid sha256:675a20a1c8cb7ef8957fa7e3549f80d43b42e7bb023aa7c1a6c3b159e495bc67
  size 4999802720
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f2342c0554a4a6147d4b0cbb7d9102db5213f1a98ff1f94100f5392e9e4babca
+ oid sha256:bc06566be8c2f403026d162424f82153270a6a0d04b0b40e6e14ad4c2ea5332c
  size 4915916176
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e726e4714bc8147f1489557764a74b0681c9fc1f0cf2aa91bf3b9b8f7ae6756a
+ oid sha256:2986a57de29725fc6544bc073fd543bfb7c0fe517bc6ef006cfebc4c12bbb8e5
  size 1168663096
runs/Feb02_21-16-11_w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-54ksrw6/events.out.tfevents.1738531247.w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-54ksrw6.52190.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:658dbb9c74cd9a9ba070279f36a501ac7caf4ca720e92179e4c3996f6fb59659
+ size 21809
special_tokens_map.json CHANGED
@@ -1,41 +1,4 @@
  {
- "additional_special_tokens": [
- {
- "content": "<response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "</response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "<answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "</answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "<inquire>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- ],
  "bos_token": {
  "content": "<|im_start|>",
  "lstrip": false,
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3919c1e7bfa558ff525a618a3d463929a238acaba668d7ef6da432fcd6cd7fad
- size 17211327
+ oid sha256:635e16753749bb3465bdf9e00f68e8b29c9e4884d9ee55eb27705bd8f1318cf4
+ size 17210395
tokenizer_config.json CHANGED
@@ -2063,59 +2063,13 @@
  "rstrip": false,
  "single_word": false,
  "special": true
- },
- "128258": {
- "content": "<response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128259": {
- "content": "</response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128260": {
- "content": "<answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128261": {
- "content": "</answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128262": {
- "content": "<inquire>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
  }
  },
- "additional_special_tokens": [
- "<response>",
- "</response>",
- "<answer>",
- "</answer>",
- "<inquire>"
- ],
  "bos_token": "<|im_start|>",
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
+ "extra_special_tokens": {},
  "model_input_names": [
  "input_ids",
  "attention_mask"
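The `chat_template` retained in `tokenizer_config.json` wraps every message in `<|im_start|>`/`<|im_end|>` markers. A minimal pure-Python sketch of what that Jinja template renders (an illustration of the format, not the tokenizer's own code):

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the Jinja chat_template: each message becomes
    <|im_start|>{role}\n{content}<|im_end|>\n, and an optional
    generation prompt opens an assistant turn."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text

prompt = render_chat([{"role": "user", "content": "Hello!"}], add_generation_prompt=True)
print(prompt)
```

In practice the same string is produced by the tokenizer's `apply_chat_template(..., tokenize=False)`; this sketch just makes the template's behavior visible without downloading the tokenizer.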
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 1.0,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
+ "epoch": 0.9986168741355463,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
  "train_samples": 46269,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
trainer_state.json CHANGED
@@ -1,1048 +1,544 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 1.0,
  "eval_steps": 500,
- "global_step": 723,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.0013831258644536654,
- "grad_norm": 362.72589111328125,
- "learning_rate": 2.7397260273972604e-06,
- "loss": 4.592,
  "step": 1
  },
  {
- "epoch": 0.006915629322268326,
- "grad_norm": 43.32975769042969,
- "learning_rate": 1.3698630136986302e-05,
- "loss": 3.1459,
  "step": 5
  },
25
  {
26
- "epoch": 0.013831258644536652,
27
- "grad_norm": 4.805922985076904,
28
- "learning_rate": 2.7397260273972603e-05,
29
- "loss": 1.1293,
30
  "step": 10
31
  },
32
  {
33
- "epoch": 0.02074688796680498,
34
- "grad_norm": 12.635961532592773,
35
- "learning_rate": 4.1095890410958905e-05,
36
- "loss": 1.0413,
37
  "step": 15
38
  },
39
  {
40
- "epoch": 0.027662517289073305,
41
- "grad_norm": 4.329101085662842,
42
- "learning_rate": 5.479452054794521e-05,
43
- "loss": 0.9687,
44
  "step": 20
45
  },
46
  {
47
- "epoch": 0.034578146611341634,
48
- "grad_norm": 2.8100547790527344,
49
- "learning_rate": 6.84931506849315e-05,
50
- "loss": 0.9551,
51
  "step": 25
52
  },
53
  {
54
- "epoch": 0.04149377593360996,
55
- "grad_norm": 2.5719969272613525,
56
- "learning_rate": 8.219178082191781e-05,
57
- "loss": 0.9471,
58
  "step": 30
59
  },
60
  {
61
- "epoch": 0.048409405255878286,
62
- "grad_norm": 2.6822149753570557,
63
- "learning_rate": 9.58904109589041e-05,
64
- "loss": 0.9569,
65
  "step": 35
66
  },
67
  {
68
- "epoch": 0.05532503457814661,
69
- "grad_norm": 2.2626094818115234,
70
- "learning_rate": 0.00010958904109589041,
71
- "loss": 0.9722,
72
  "step": 40
73
  },
74
  {
75
- "epoch": 0.06224066390041494,
76
- "grad_norm": 2.315629482269287,
77
- "learning_rate": 0.0001232876712328767,
78
- "loss": 1.029,
79
  "step": 45
80
  },
81
  {
82
- "epoch": 0.06915629322268327,
83
- "grad_norm": 1.7225898504257202,
84
- "learning_rate": 0.000136986301369863,
85
- "loss": 1.0222,
86
  "step": 50
87
  },
88
  {
89
- "epoch": 0.07607192254495158,
90
- "grad_norm": 1.9998998641967773,
91
- "learning_rate": 0.00015068493150684933,
92
- "loss": 1.0334,
93
  "step": 55
94
  },
95
  {
96
- "epoch": 0.08298755186721991,
97
- "grad_norm": 2.4757347106933594,
98
- "learning_rate": 0.00016438356164383562,
99
- "loss": 1.0415,
100
  "step": 60
101
  },
102
  {
103
- "epoch": 0.08990318118948824,
104
- "grad_norm": 1.6731195449829102,
105
- "learning_rate": 0.00017808219178082192,
106
- "loss": 1.1052,
107
  "step": 65
108
  },
109
  {
110
- "epoch": 0.09681881051175657,
111
- "grad_norm": 2.4862051010131836,
112
- "learning_rate": 0.0001917808219178082,
113
- "loss": 1.0613,
114
  "step": 70
115
  },
116
  {
117
- "epoch": 0.1037344398340249,
118
- "grad_norm": 15.797516822814941,
119
- "learning_rate": 0.0001999953280342959,
120
- "loss": 1.2723,
121
  "step": 75
122
  },
123
  {
124
- "epoch": 0.11065006915629322,
125
- "grad_norm": 103.66837310791016,
126
- "learning_rate": 0.00019994277343344518,
127
- "loss": 2.2148,
128
  "step": 80
129
  },
130
  {
131
- "epoch": 0.11756569847856155,
132
- "grad_norm": 17.954513549804688,
133
- "learning_rate": 0.0001998318550673364,
134
- "loss": 2.1056,
135
  "step": 85
136
  },
137
  {
138
- "epoch": 0.12448132780082988,
139
- "grad_norm": 11.641583442687988,
140
- "learning_rate": 0.00019966263770917193,
141
- "loss": 3.8091,
142
  "step": 90
143
  },
144
  {
145
- "epoch": 0.1313969571230982,
146
- "grad_norm": 3.0865869522094727,
147
- "learning_rate": 0.00019943522017712358,
148
- "loss": 1.4319,
149
  "step": 95
150
  },
151
  {
152
- "epoch": 0.13831258644536654,
153
- "grad_norm": 2.2934648990631104,
154
- "learning_rate": 0.000199149735276626,
155
- "loss": 1.1962,
156
  "step": 100
157
  },
158
  {
159
- "epoch": 0.14522821576763487,
160
- "grad_norm": 2.6889796257019043,
161
- "learning_rate": 0.00019880634972282166,
162
- "loss": 1.1448,
163
  "step": 105
164
  },
165
  {
166
- "epoch": 0.15214384508990317,
167
- "grad_norm": 1.613039493560791,
168
- "learning_rate": 0.00019840526404320415,
169
- "loss": 1.1176,
170
  "step": 110
171
  },
172
  {
173
- "epoch": 0.1590594744121715,
174
- "grad_norm": 1.7039101123809814,
175
- "learning_rate": 0.0001979467124605156,
176
- "loss": 1.1048,
177
  "step": 115
178
  },
179
  {
180
- "epoch": 0.16597510373443983,
181
- "grad_norm": 1.4721240997314453,
182
- "learning_rate": 0.00019743096275596735,
183
- "loss": 1.0652,
184
  "step": 120
185
  },
186
  {
187
- "epoch": 0.17289073305670816,
188
- "grad_norm": 1.2751952409744263,
189
- "learning_rate": 0.0001968583161128631,
190
- "loss": 1.0753,
191
  "step": 125
192
  },
193
  {
194
- "epoch": 0.1798063623789765,
195
- "grad_norm": 1.4763237237930298,
196
- "learning_rate": 0.00019622910694071656,
197
- "loss": 1.0534,
198
  "step": 130
199
  },
200
  {
201
- "epoch": 0.18672199170124482,
202
- "grad_norm": 1.4396852254867554,
203
- "learning_rate": 0.00019554370267996538,
204
- "loss": 1.0363,
205
  "step": 135
206
  },
207
  {
208
- "epoch": 0.19363762102351315,
209
- "grad_norm": 1.4692487716674805,
210
- "learning_rate": 0.00019480250358739663,
211
- "loss": 1.0164,
212
  "step": 140
213
  },
214
  {
215
- "epoch": 0.20055325034578148,
216
- "grad_norm": 1.7275824546813965,
217
- "learning_rate": 0.00019400594250240798,
218
- "loss": 1.0782,
219
  "step": 145
220
  },
221
  {
222
- "epoch": 0.2074688796680498,
223
- "grad_norm": 1.3491764068603516,
224
- "learning_rate": 0.0001931544845942415,
225
- "loss": 1.017,
226
  "step": 150
227
  },
228
  {
229
- "epoch": 0.2143845089903181,
230
- "grad_norm": 1.069191336631775,
231
- "learning_rate": 0.00019224862709033824,
232
- "loss": 1.0312,
233
  "step": 155
234
  },
235
  {
236
- "epoch": 0.22130013831258644,
237
- "grad_norm": 1.2669651508331299,
238
- "learning_rate": 0.00019128889898597116,
239
- "loss": 1.06,
240
  "step": 160
241
  },
242
  {
243
- "epoch": 0.22821576763485477,
244
- "grad_norm": 1.5793681144714355,
245
- "learning_rate": 0.0001902758607353269,
246
- "loss": 0.9996,
247
  "step": 165
248
  },
249
  {
250
- "epoch": 0.2351313969571231,
251
- "grad_norm": 1.3170143365859985,
252
- "learning_rate": 0.00018921010392421628,
253
- "loss": 1.0259,
254
  "step": 170
255
  },
256
  {
257
- "epoch": 0.24204702627939143,
258
- "grad_norm": 1.088996171951294,
259
- "learning_rate": 0.00018809225092460488,
260
- "loss": 1.0145,
261
  "step": 175
262
  },
263
  {
264
- "epoch": 0.24896265560165975,
265
- "grad_norm": 1.0915374755859375,
266
- "learning_rate": 0.0001869229545311653,
267
- "loss": 1.004,
268
  "step": 180
269
  },
270
  {
271
- "epoch": 0.25587828492392806,
272
- "grad_norm": 0.9946417808532715,
273
- "learning_rate": 0.00018570289758006346,
274
- "loss": 0.9957,
275
  "step": 185
276
  },
277
  {
278
- "epoch": 0.2627939142461964,
279
- "grad_norm": 1.0990344285964966,
280
- "learning_rate": 0.00018443279255020152,
281
- "loss": 0.9922,
282
  "step": 190
283
  },
284
  {
285
- "epoch": 0.2697095435684647,
286
- "grad_norm": 1.0007325410842896,
287
- "learning_rate": 0.0001831133811471503,
288
- "loss": 0.9955,
289
  "step": 195
290
  },
291
  {
292
- "epoch": 0.2766251728907331,
293
- "grad_norm": 0.9499465823173523,
294
- "learning_rate": 0.000181745433870014,
295
- "loss": 0.9934,
296
  "step": 200
297
  },
298
  {
299
- "epoch": 0.2835408022130014,
300
- "grad_norm": 0.8894681930541992,
301
- "learning_rate": 0.00018032974956148063,
302
- "loss": 0.9824,
303
  "step": 205
304
  },
305
  {
306
- "epoch": 0.29045643153526973,
307
- "grad_norm": 0.9037023782730103,
308
- "learning_rate": 0.00017886715494132006,
309
- "loss": 0.9829,
310
  "step": 210
311
  },
312
  {
313
- "epoch": 0.29737206085753803,
314
- "grad_norm": 0.8227784633636475,
315
- "learning_rate": 0.00017735850412360331,
316
- "loss": 0.9855,
317
  "step": 215
318
  },
319
  {
320
- "epoch": 0.30428769017980634,
321
- "grad_norm": 0.7213776707649231,
322
- "learning_rate": 0.0001758046781179237,
323
- "loss": 0.9818,
324
  "step": 220
325
  },
326
  {
327
- "epoch": 0.3112033195020747,
328
- "grad_norm": 0.7871569991111755,
329
- "learning_rate": 0.00017420658431491223,
330
- "loss": 0.9537,
331
  "step": 225
332
  },
333
  {
334
- "epoch": 0.318118948824343,
335
- "grad_norm": 0.7829176783561707,
336
- "learning_rate": 0.0001725651559563469,
337
- "loss": 0.9476,
338
  "step": 230
339
  },
340
  {
341
- "epoch": 0.32503457814661135,
342
- "grad_norm": 0.7467004060745239,
343
- "learning_rate": 0.00017088135159016584,
344
- "loss": 0.9544,
345
  "step": 235
346
  },
347
  {
348
- "epoch": 0.33195020746887965,
349
- "grad_norm": 0.8612602353096008,
350
- "learning_rate": 0.00016915615451070233,
351
- "loss": 0.9651,
352
  "step": 240
353
  },
354
  {
355
- "epoch": 0.338865836791148,
356
- "grad_norm": 0.817294716835022,
357
- "learning_rate": 0.0001673905721844686,
358
- "loss": 0.9291,
359
  "step": 245
360
  },
361
  {
362
- "epoch": 0.3457814661134163,
363
- "grad_norm": 0.7505643963813782,
364
- "learning_rate": 0.00016558563566182363,
365
- "loss": 0.9078,
366
  "step": 250
367
  },
368
  {
369
- "epoch": 0.35269709543568467,
370
- "grad_norm": 0.8095275163650513,
371
- "learning_rate": 0.000163742398974869,
372
- "loss": 0.9343,
373
  "step": 255
374
  },
375
  {
376
- "epoch": 0.359612724757953,
377
- "grad_norm": 0.7329360842704773,
378
- "learning_rate": 0.00016186193852192355,
379
- "loss": 0.9177,
380
  "step": 260
381
  },
382
  {
383
- "epoch": 0.3665283540802213,
384
- "grad_norm": 0.7928723096847534,
385
- "learning_rate": 0.0001599453524389374,
386
- "loss": 0.9402,
387
  "step": 265
388
  },
389
  {
390
- "epoch": 0.37344398340248963,
391
- "grad_norm": 0.7580089569091797,
392
- "learning_rate": 0.00015799375995821118,
393
- "loss": 0.9128,
394
  "step": 270
395
  },
396
  {
397
- "epoch": 0.38035961272475793,
398
- "grad_norm": 0.6633639335632324,
399
- "learning_rate": 0.00015600830075479603,
400
- "loss": 0.9083,
401
  "step": 275
402
  },
403
  {
404
- "epoch": 0.3872752420470263,
405
- "grad_norm": 0.719149112701416,
406
- "learning_rate": 0.0001539901342809554,
407
- "loss": 0.9134,
408
  "step": 280
409
  },
410
  {
411
- "epoch": 0.3941908713692946,
412
- "grad_norm": 0.7551069259643555,
413
- "learning_rate": 0.00015194043908907775,
414
- "loss": 0.9187,
415
  "step": 285
416
  },
417
  {
418
- "epoch": 0.40110650069156295,
419
- "grad_norm": 0.7057808041572571,
420
- "learning_rate": 0.00014986041214343486,
421
- "loss": 0.9034,
422
  "step": 290
423
  },
424
  {
425
- "epoch": 0.40802213001383125,
426
- "grad_norm": 0.720129668712616,
427
- "learning_rate": 0.00014775126812118864,
428
- "loss": 0.8919,
429
  "step": 295
430
  },
431
  {
432
- "epoch": 0.4149377593360996,
433
- "grad_norm": 0.6026068925857544,
434
- "learning_rate": 0.00014561423870305382,
435
- "loss": 0.8808,
436
  "step": 300
437
  },
438
  {
439
- "epoch": 0.4218533886583679,
440
- "grad_norm": 0.6884504556655884,
441
- "learning_rate": 0.000143450571854031,
442
- "loss": 0.8889,
443
  "step": 305
444
  },
445
  {
446
- "epoch": 0.4287690179806362,
447
- "grad_norm": 0.7061929702758789,
448
- "learning_rate": 0.00014126153109463024,
449
- "loss": 0.8767,
450
  "step": 310
451
  },
452
  {
453
- "epoch": 0.43568464730290457,
454
- "grad_norm": 0.6553164124488831,
455
- "learning_rate": 0.0001390483947630109,
456
- "loss": 0.8512,
457
  "step": 315
458
  },
459
  {
460
- "epoch": 0.4426002766251729,
461
- "grad_norm": 0.6340067982673645,
462
- "learning_rate": 0.00013681245526846783,
463
- "loss": 0.8621,
464
  "step": 320
465
  },
466
  {
467
- "epoch": 0.44951590594744123,
468
- "grad_norm": 0.7011162042617798,
469
- "learning_rate": 0.00013455501833670088,
470
- "loss": 0.863,
471
  "step": 325
472
  },
473
  {
474
- "epoch": 0.45643153526970953,
475
- "grad_norm": 0.6404575109481812,
476
- "learning_rate": 0.00013227740224730798,
477
- "loss": 0.8574,
478
  "step": 330
479
  },
480
  {
481
- "epoch": 0.4633471645919779,
482
- "grad_norm": 0.5631649494171143,
483
- "learning_rate": 0.00012998093706394675,
484
- "loss": 0.8621,
485
  "step": 335
486
  },
487
  {
488
- "epoch": 0.4702627939142462,
489
- "grad_norm": 0.6324531435966492,
490
- "learning_rate": 0.00012766696385761494,
491
- "loss": 0.8459,
492
  "step": 340
493
  },
494
  {
495
- "epoch": 0.47717842323651455,
496
- "grad_norm": 0.6023428440093994,
497
- "learning_rate": 0.00012533683392350263,
498
- "loss": 0.8534,
499
  "step": 345
500
  },
501
  {
502
- "epoch": 0.48409405255878285,
503
- "grad_norm": 0.5649986863136292,
504
- "learning_rate": 0.00012299190799187405,
505
- "loss": 0.8396,
506
  "step": 350
507
  },
508
  {
509
- "epoch": 0.49100968188105115,
510
- "grad_norm": 0.6606884598731995,
511
- "learning_rate": 0.00012063355543343924,
512
- "loss": 0.8505,
513
  "step": 355
514
  },
515
  {
516
- "epoch": 0.4979253112033195,
517
- "grad_norm": 0.6517143845558167,
518
- "learning_rate": 0.00011826315345968013,
519
- "loss": 0.8152,
520
  "step": 360
521
  },
522
  {
523
- "epoch": 0.5048409405255878,
524
- "grad_norm": 0.5879771113395691,
525
- "learning_rate": 0.00011588208631859807,
526
- "loss": 0.8375,
527
- "step": 365
528
- },
529
- {
530
- "epoch": 0.5117565698478561,
531
- "grad_norm": 0.5463613867759705,
532
- "learning_rate": 0.00011349174448635158,
533
- "loss": 0.8307,
534
- "step": 370
535
- },
536
- {
537
- "epoch": 0.5186721991701245,
538
- "grad_norm": 0.6212437748908997,
539
- "learning_rate": 0.00011109352385525783,
540
- "loss": 0.8303,
541
- "step": 375
542
- },
543
- {
544
- "epoch": 0.5255878284923928,
545
- "grad_norm": 0.5765424370765686,
546
- "learning_rate": 0.00010868882491863049,
547
- "loss": 0.8311,
548
- "step": 380
549
- },
550
- {
551
- "epoch": 0.5325034578146611,
552
- "grad_norm": 0.5644758343696594,
553
- "learning_rate": 0.00010627905195293135,
554
- "loss": 0.8251,
555
- "step": 385
556
- },
557
- {
558
- "epoch": 0.5394190871369294,
559
- "grad_norm": 0.5946635603904724,
560
- "learning_rate": 0.00010386561219771222,
561
- "loss": 0.8157,
562
- "step": 390
563
- },
564
- {
565
- "epoch": 0.5463347164591977,
566
- "grad_norm": 0.5891283750534058,
567
- "learning_rate": 0.00010144991503382674,
568
- "loss": 0.811,
569
- "step": 395
570
- },
571
- {
572
- "epoch": 0.5532503457814661,
573
- "grad_norm": 0.5650286674499512,
574
- "learning_rate": 9.903337116039171e-05,
575
- "loss": 0.7895,
576
- "step": 400
577
- },
578
- {
579
- "epoch": 0.5601659751037344,
580
- "grad_norm": 0.5457018613815308,
581
- "learning_rate": 9.661739177097836e-05,
582
- "loss": 0.7975,
583
- "step": 405
584
- },
585
- {
586
- "epoch": 0.5670816044260027,
587
- "grad_norm": 0.54831463098526,
588
- "learning_rate": 9.420338772951521e-05,
589
- "loss": 0.806,
590
- "step": 410
591
- },
592
- {
593
- "epoch": 0.573997233748271,
594
- "grad_norm": 0.5567086935043335,
595
- "learning_rate": 9.179276874638315e-05,
596
- "loss": 0.8009,
597
- "step": 415
598
- },
599
- {
600
- "epoch": 0.5809128630705395,
601
- "grad_norm": 0.5590771436691284,
602
- "learning_rate": 8.938694255518444e-05,
603
- "loss": 0.7919,
604
- "step": 420
605
- },
606
- {
607
- "epoch": 0.5878284923928078,
608
- "grad_norm": 0.5756837725639343,
609
- "learning_rate": 8.698731409066568e-05,
610
- "loss": 0.7923,
611
- "step": 425
612
- },
613
- {
614
- "epoch": 0.5947441217150761,
615
- "grad_norm": 0.5342572331428528,
616
- "learning_rate": 8.459528466827575e-05,
617
- "loss": 0.8009,
618
- "step": 430
619
- },
620
- {
621
- "epoch": 0.6016597510373444,
622
- "grad_norm": 0.5078967213630676,
623
- "learning_rate": 8.221225116583678e-05,
624
- "loss": 0.7832,
625
- "step": 435
626
- },
627
- {
628
- "epoch": 0.6085753803596127,
629
- "grad_norm": 0.5687820911407471,
630
- "learning_rate": 7.98396052078071e-05,
631
- "loss": 0.7867,
632
- "step": 440
633
- },
634
- {
635
- "epoch": 0.6154910096818811,
636
- "grad_norm": 0.563842236995697,
637
- "learning_rate": 7.747873235261157e-05,
638
- "loss": 0.7876,
639
- "step": 445
640
- },
641
- {
642
- "epoch": 0.6224066390041494,
643
- "grad_norm": 0.5250119566917419,
644
- "learning_rate": 7.513101128351454e-05,
645
- "loss": 0.7883,
646
- "step": 450
647
- },
648
- {
649
- "epoch": 0.6293222683264177,
650
- "grad_norm": 0.4736279845237732,
651
- "learning_rate": 7.279781300350758e-05,
652
- "loss": 0.7733,
653
- "step": 455
654
- },
655
- {
656
- "epoch": 0.636237897648686,
657
- "grad_norm": 0.5302676558494568,
658
- "learning_rate": 7.048050003468251e-05,
659
- "loss": 0.7777,
660
- "step": 460
661
- },
662
- {
663
- "epoch": 0.6431535269709544,
664
- "grad_norm": 0.5099808573722839,
665
- "learning_rate": 6.81804256225567e-05,
666
- "loss": 0.7903,
667
- "step": 465
668
- },
669
- {
670
- "epoch": 0.6500691562932227,
671
- "grad_norm": 0.5259727239608765,
672
- "learning_rate": 6.58989329458158e-05,
673
- "loss": 0.7643,
674
- "step": 470
675
- },
676
- {
677
- "epoch": 0.656984785615491,
678
- "grad_norm": 0.48697277903556824,
679
- "learning_rate": 6.36373543319353e-05,
680
- "loss": 0.761,
681
- "step": 475
682
- },
683
- {
684
- "epoch": 0.6639004149377593,
685
- "grad_norm": 0.5124850869178772,
686
- "learning_rate": 6.139701047913885e-05,
687
- "loss": 0.7603,
688
- "step": 480
689
- },
690
- {
691
- "epoch": 0.6708160442600276,
692
- "grad_norm": 0.49750426411628723,
693
- "learning_rate": 5.917920968514752e-05,
694
- "loss": 0.7555,
695
- "step": 485
696
- },
697
- {
698
- "epoch": 0.677731673582296,
699
- "grad_norm": 0.5086675882339478,
700
- "learning_rate": 5.698524708317081e-05,
701
- "loss": 0.7618,
702
- "step": 490
703
- },
704
- {
705
- "epoch": 0.6846473029045643,
706
- "grad_norm": 0.5031773447990417,
707
- "learning_rate": 5.481640388558551e-05,
708
- "loss": 0.742,
709
- "step": 495
710
- },
711
- {
712
- "epoch": 0.6915629322268326,
713
- "grad_norm": 0.5332823991775513,
714
- "learning_rate": 5.267394663574351e-05,
715
- "loss": 0.7524,
716
- "step": 500
717
- },
718
- {
719
- "epoch": 0.6984785615491009,
720
- "grad_norm": 0.5489112138748169,
721
- "learning_rate": 5.055912646834635e-05,
- "loss": 0.73,
- "step": 505
  },
  {
- "epoch": 0.7053941908713693,
- "grad_norm": 0.5065959692001343,
- "learning_rate": 4.8473178378817564e-05,
- "loss": 0.749,
- "step": 510
- },
- {
- "epoch": 0.7123098201936376,
- "grad_norm": 0.49869057536125183,
- "learning_rate": 4.6417320502100316e-05,
- "loss": 0.7426,
- "step": 515
- },
- {
- "epoch": 0.719225449515906,
- "grad_norm": 0.5105318427085876,
- "learning_rate": 4.439275340130099e-05,
- "loss": 0.7551,
- "step": 520
- },
- {
- "epoch": 0.7261410788381742,
- "grad_norm": 0.4851531982421875,
- "learning_rate": 4.240065936659374e-05,
- "loss": 0.7333,
- "step": 525
- },
- {
- "epoch": 0.7330567081604425,
- "grad_norm": 0.5122374892234802,
- "learning_rate": 4.044220172479675e-05,
- "loss": 0.753,
- "step": 530
- },
- {
- "epoch": 0.739972337482711,
- "grad_norm": 0.4910212755203247,
- "learning_rate": 3.851852416002187e-05,
- "loss": 0.7309,
- "step": 535
- },
- {
- "epoch": 0.7468879668049793,
- "grad_norm": 0.47112396359443665,
- "learning_rate": 3.663075004579547e-05,
- "loss": 0.747,
- "step": 540
- },
- {
- "epoch": 0.7538035961272476,
- "grad_norm": 0.5120583176612854,
- "learning_rate": 3.477998178903982e-05,
- "loss": 0.7317,
- "step": 545
- },
- {
- "epoch": 0.7607192254495159,
- "grad_norm": 0.47287434339523315,
- "learning_rate": 3.296730018629846e-05,
- "loss": 0.7124,
- "step": 550
- },
- {
- "epoch": 0.7676348547717843,
- "grad_norm": 0.46755021810531616,
- "learning_rate": 3.11937637925816e-05,
- "loss": 0.7215,
- "step": 555
- },
- {
- "epoch": 0.7745504840940526,
- "grad_norm": 0.4925783574581146,
- "learning_rate": 2.9460408303199694e-05,
- "loss": 0.732,
- "step": 560
- },
- {
- "epoch": 0.7814661134163209,
- "grad_norm": 0.4654233455657959,
- "learning_rate": 2.7768245948946612e-05,
- "loss": 0.7157,
- "step": 565
- },
- {
- "epoch": 0.7883817427385892,
- "grad_norm": 0.471231609582901,
- "learning_rate": 2.61182649049853e-05,
- "loss": 0.7193,
- "step": 570
- },
- {
- "epoch": 0.7952973720608575,
- "grad_norm": 0.4691479802131653,
- "learning_rate": 2.4511428713781238e-05,
- "loss": 0.7268,
- "step": 575
- },
- {
- "epoch": 0.8022130013831259,
- "grad_norm": 0.46052515506744385,
- "learning_rate": 2.2948675722421086e-05,
- "loss": 0.707,
- "step": 580
- },
- {
- "epoch": 0.8091286307053942,
- "grad_norm": 0.4800238609313965,
- "learning_rate": 2.1430918534643996e-05,
- "loss": 0.7119,
- "step": 585
- },
- {
- "epoch": 0.8160442600276625,
- "grad_norm": 0.44981294870376587,
- "learning_rate": 1.9959043477907e-05,
- "loss": 0.7031,
- "step": 590
- },
- {
- "epoch": 0.8229598893499308,
- "grad_norm": 0.4723599851131439,
- "learning_rate": 1.8533910085794713e-05,
- "loss": 0.7131,
- "step": 595
- },
- {
- "epoch": 0.8298755186721992,
- "grad_norm": 0.4526391327381134,
- "learning_rate": 1.7156350596075744e-05,
- "loss": 0.7028,
- "step": 600
- },
- {
- "epoch": 0.8367911479944675,
- "grad_norm": 0.4318162500858307,
- "learning_rate": 1.5827169464699576e-05,
- "loss": 0.7129,
- "step": 605
- },
- {
- "epoch": 0.8437067773167358,
- "grad_norm": 0.45468002557754517,
- "learning_rate": 1.4547142896016608e-05,
- "loss": 0.7003,
- "step": 610
- },
- {
- "epoch": 0.8506224066390041,
- "grad_norm": 0.4626847803592682,
- "learning_rate": 1.3317018389496927e-05,
- "loss": 0.7148,
- "step": 615
- },
- {
- "epoch": 0.8575380359612724,
- "grad_norm": 0.4578290581703186,
- "learning_rate": 1.2137514303211561e-05,
- "loss": 0.7,
- "step": 620
- },
- {
- "epoch": 0.8644536652835408,
- "grad_norm": 0.4639127254486084,
- "learning_rate": 1.1009319434331622e-05,
- "loss": 0.7035,
- "step": 625
- },
- {
- "epoch": 0.8713692946058091,
- "grad_norm": 0.45225322246551514,
- "learning_rate": 9.93309261689015e-06,
- "loss": 0.6892,
- "step": 630
- },
- {
- "epoch": 0.8782849239280774,
- "grad_norm": 0.5041568279266357,
- "learning_rate": 8.909462337041507e-06,
- "loss": 0.6957,
- "step": 635
- },
- {
- "epoch": 0.8852005532503457,
- "grad_norm": 0.4375840127468109,
- "learning_rate": 7.939026366043322e-06,
- "loss": 0.698,
- "step": 640
- },
- {
- "epoch": 0.8921161825726142,
- "grad_norm": 0.44402968883514404,
- "learning_rate": 7.022351411174866e-06,
- "loss": 0.6981,
- "step": 645
- },
- {
- "epoch": 0.8990318118948825,
- "grad_norm": 0.4330901801586151,
- "learning_rate": 6.1599727847957975e-06,
- "loss": 0.6907,
- "step": 650
- },
- {
- "epoch": 0.9059474412171508,
- "grad_norm": 0.4889258146286011,
- "learning_rate": 5.3523940917390215e-06,
- "loss": 0.6822,
- "step": 655
- },
- {
- "epoch": 0.9128630705394191,
- "grad_norm": 0.45514872670173645,
- "learning_rate": 4.600086935219561e-06,
- "loss": 0.6885,
- "step": 660
- },
- {
- "epoch": 0.9197786998616874,
- "grad_norm": 0.4331250488758087,
- "learning_rate": 3.903490641431573e-06,
- "loss": 0.6758,
- "step": 665
- },
- {
- "epoch": 0.9266943291839558,
- "grad_norm": 0.4574739336967468,
- "learning_rate": 3.2630120029942037e-06,
- "loss": 0.6851,
- "step": 670
- },
- {
- "epoch": 0.9336099585062241,
- "grad_norm": 0.4640979468822479,
- "learning_rate": 2.679025041396155e-06,
- "loss": 0.7036,
- "step": 675
- },
- {
- "epoch": 0.9405255878284924,
- "grad_norm": 0.45547041296958923,
- "learning_rate": 2.1518707885777146e-06,
- "loss": 0.722,
- "step": 680
- },
- {
- "epoch": 0.9474412171507607,
- "grad_norm": 0.4363589286804199,
- "learning_rate": 1.6818570877776718e-06,
- "loss": 0.6818,
- "step": 685
- },
- {
- "epoch": 0.9543568464730291,
- "grad_norm": 0.45574650168418884,
- "learning_rate": 1.2692584137615204e-06,
- "loss": 0.704,
- "step": 690
- },
- {
- "epoch": 0.9612724757952974,
- "grad_norm": 0.4429336190223694,
- "learning_rate": 9.143157125359514e-07,
- "loss": 0.6805,
- "step": 695
- },
- {
- "epoch": 0.9681881051175657,
- "grad_norm": 0.435350239276886,
- "learning_rate": 6.172362606431281e-07,
- "loss": 0.6934,
- "step": 700
- },
- {
- "epoch": 0.975103734439834,
- "grad_norm": 0.47741129994392395,
- "learning_rate": 3.781935441171336e-07,
- "loss": 0.6917,
- "step": 705
- },
- {
- "epoch": 0.9820193637621023,
- "grad_norm": 0.44488659501075745,
- "learning_rate": 1.973271571728441e-07,
- "loss": 0.6832,
- "step": 710
- },
- {
- "epoch": 0.9889349930843707,
- "grad_norm": 0.46524661779403687,
- "learning_rate": 7.474272068698218e-08,
- "loss": 0.6978,
- "step": 715
- },
- {
- "epoch": 0.995850622406639,
- "grad_norm": 0.4371030032634735,
- "learning_rate": 1.0511820518432913e-08,
- "loss": 0.6807,
- "step": 720
- },
- {
- "epoch": 1.0,
- "eval_loss": 0.7086742520332336,
- "eval_runtime": 99.1201,
- "eval_samples_per_second": 46.509,
- "eval_steps_per_second": 2.916,
- "step": 723
- },
- {
- "epoch": 1.0,
- "step": 723,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
  }
  ],
  "logging_steps": 5,
- "max_steps": 723,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,
@@ -1058,7 +554,7 @@
  "attributes": {}
  }
  },
- "total_flos": 1.702143589888295e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
 
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.9986168741355463,
  "eval_steps": 500,
+ "global_step": 361,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 0.0027662517289073307,
+ "grad_norm": 22.881450653076172,
+ "learning_rate": 5.405405405405406e-06,
+ "loss": 1.6158,
  "step": 1
  },
  {
+ "epoch": 0.013831258644536652,
+ "grad_norm": 2.178889274597168,
+ "learning_rate": 2.702702702702703e-05,
+ "loss": 1.3807,
  "step": 5
  },
  {
+ "epoch": 0.027662517289073305,
+ "grad_norm": 14.400589942932129,
+ "learning_rate": 5.405405405405406e-05,
+ "loss": 1.3352,
  "step": 10
  },
  {
+ "epoch": 0.04149377593360996,
+ "grad_norm": 2.756945848464966,
+ "learning_rate": 8.108108108108109e-05,
+ "loss": 1.2203,
  "step": 15
  },
  {
+ "epoch": 0.05532503457814661,
+ "grad_norm": 1.3922957181930542,
+ "learning_rate": 0.00010810810810810812,
+ "loss": 1.0964,
  "step": 20
  },
  {
+ "epoch": 0.06915629322268327,
+ "grad_norm": 1.0261996984481812,
+ "learning_rate": 0.00013513513513513514,
+ "loss": 1.2033,
  "step": 25
  },
  {
+ "epoch": 0.08298755186721991,
+ "grad_norm": 1.6099579334259033,
+ "learning_rate": 0.00016216216216216218,
+ "loss": 1.2005,
  "step": 30
  },
  {
+ "epoch": 0.09681881051175657,
+ "grad_norm": 1.77192223072052,
+ "learning_rate": 0.0001891891891891892,
+ "loss": 1.4161,
  "step": 35
  },
  {
+ "epoch": 0.11065006915629322,
+ "grad_norm": 1.0772837400436401,
+ "learning_rate": 0.0001999576950082201,
+ "loss": 1.4553,
  "step": 40
  },
  {
+ "epoch": 0.12448132780082988,
+ "grad_norm": 1.4605121612548828,
+ "learning_rate": 0.0001996992941167792,
+ "loss": 1.2175,
  "step": 45
  },
  {
+ "epoch": 0.13831258644536654,
+ "grad_norm": 1.0822768211364746,
+ "learning_rate": 0.00019920660160815422,
+ "loss": 1.0378,
  "step": 50
  },
  {
+ "epoch": 0.15214384508990317,
+ "grad_norm": 0.9796843528747559,
+ "learning_rate": 0.00019848077530122083,
+ "loss": 1.0451,
  "step": 55
  },
  {
+ "epoch": 0.16597510373443983,
+ "grad_norm": 1.1945514678955078,
+ "learning_rate": 0.00019752352087524933,
+ "loss": 1.4266,
  "step": 60
  },
  {
+ "epoch": 0.1798063623789765,
+ "grad_norm": 0.8683685064315796,
+ "learning_rate": 0.00019633708786158806,
+ "loss": 1.0347,
  "step": 65
  },
  {
+ "epoch": 0.19363762102351315,
+ "grad_norm": 0.25568732619285583,
+ "learning_rate": 0.0001949242643573034,
+ "loss": 0.9376,
  "step": 70
  },
  {
+ "epoch": 0.2074688796680498,
+ "grad_norm": 0.26001420617103577,
+ "learning_rate": 0.0001932883704732001,
+ "loss": 0.9132,
  "step": 75
  },
  {
+ "epoch": 0.22130013831258644,
+ "grad_norm": 0.2598419189453125,
+ "learning_rate": 0.00019143325053161796,
+ "loss": 0.8958,
  "step": 80
  },
  {
+ "epoch": 0.2351313969571231,
+ "grad_norm": 0.20231448113918304,
+ "learning_rate": 0.00018936326403234125,
+ "loss": 0.8734,
  "step": 85
  },
  {
+ "epoch": 0.24896265560165975,
+ "grad_norm": 0.17383822798728943,
+ "learning_rate": 0.00018708327540784922,
+ "loss": 0.8701,
  "step": 90
  },
  {
+ "epoch": 0.2627939142461964,
+ "grad_norm": 0.17745399475097656,
+ "learning_rate": 0.0001845986425919841,
+ "loss": 0.8499,
  "step": 95
  },
  {
+ "epoch": 0.2766251728907331,
+ "grad_norm": 0.17801660299301147,
+ "learning_rate": 0.0001819152044288992,
+ "loss": 0.8512,
  "step": 100
  },
  {
+ "epoch": 0.29045643153526973,
+ "grad_norm": 0.18566825985908508,
+ "learning_rate": 0.00017903926695187595,
+ "loss": 0.8361,
  "step": 105
  },
  {
+ "epoch": 0.30428769017980634,
+ "grad_norm": 0.18012060225009918,
+ "learning_rate": 0.00017597758856425494,
+ "loss": 0.834,
  "step": 110
  },
  {
+ "epoch": 0.318118948824343,
+ "grad_norm": 0.16151954233646393,
+ "learning_rate": 0.00017273736415730488,
+ "loss": 0.8114,
  "step": 115
  },
  {
+ "epoch": 0.33195020746887965,
+ "grad_norm": 0.16563855111598969,
+ "learning_rate": 0.00016932620820235244,
+ "loss": 0.8191,
  "step": 120
  },
  {
+ "epoch": 0.3457814661134163,
+ "grad_norm": 0.16186057031154633,
+ "learning_rate": 0.0001657521368569064,
+ "loss": 0.7887,
  "step": 125
  },
  {
+ "epoch": 0.359612724757953,
+ "grad_norm": 0.1734704077243805,
+ "learning_rate": 0.000162023549126826,
+ "loss": 0.7946,
  "step": 130
  },
  {
+ "epoch": 0.37344398340248963,
+ "grad_norm": 0.17336814105510712,
+ "learning_rate": 0.00015814920712880267,
+ "loss": 0.7974,
  "step": 135
  },
  {
+ "epoch": 0.3872752420470263,
+ "grad_norm": 0.15509486198425293,
+ "learning_rate": 0.00015413821549953698,
+ "loss": 0.7866,
  "step": 140
  },
  {
+ "epoch": 0.40110650069156295,
+ "grad_norm": 0.18101590871810913,
+ "learning_rate": 0.00015000000000000001,
+ "loss": 0.7927,
  "step": 145
  },
  {
+ "epoch": 0.4149377593360996,
+ "grad_norm": 0.14941518008708954,
+ "learning_rate": 0.0001457442853650581,
+ "loss": 0.7698,
  "step": 150
  },
  {
+ "epoch": 0.4287690179806362,
+ "grad_norm": 0.15677104890346527,
+ "learning_rate": 0.00014138107245051392,
+ "loss": 0.7721,
  "step": 155
  },
  {
+ "epoch": 0.4426002766251729,
+ "grad_norm": 0.14607611298561096,
+ "learning_rate": 0.00013692061473126845,
+ "loss": 0.7516,
  "step": 160
  },
  {
+ "epoch": 0.45643153526970953,
+ "grad_norm": 0.16472630202770233,
+ "learning_rate": 0.00013237339420583212,
+ "loss": 0.7554,
  "step": 165
  },
  {
+ "epoch": 0.4702627939142462,
+ "grad_norm": 0.13666489720344543,
+ "learning_rate": 0.00012775009676380957,
+ "loss": 0.7515,
  "step": 170
  },
  {
+ "epoch": 0.48409405255878285,
+ "grad_norm": 0.1362183392047882,
+ "learning_rate": 0.00012306158707424403,
+ "loss": 0.7513,
  "step": 175
  },
  {
+ "epoch": 0.4979253112033195,
+ "grad_norm": 0.12810291349887848,
+ "learning_rate": 0.00011831888305383268,
+ "loss": 0.7385,
  "step": 180
  },
  {
+ "epoch": 0.5117565698478561,
+ "grad_norm": 0.14311975240707397,
+ "learning_rate": 0.00011353312997501313,
+ "loss": 0.7469,
  "step": 185
  },
  {
+ "epoch": 0.5255878284923928,
+ "grad_norm": 0.129547581076622,
+ "learning_rate": 0.00010871557427476583,
+ "loss": 0.7423,
  "step": 190
  },
  {
+ "epoch": 0.5394190871369294,
+ "grad_norm": 0.1447523832321167,
+ "learning_rate": 0.0001038775371256817,
+ "loss": 0.7351,
  "step": 195
  },
  {
+ "epoch": 0.5532503457814661,
+ "grad_norm": 0.1369813233613968,
+ "learning_rate": 9.903038783140216e-05,
+ "loss": 0.7202,
  "step": 200
  },
  {
+ "epoch": 0.5670816044260027,
+ "grad_norm": 0.12533989548683167,
+ "learning_rate": 9.418551710895243e-05,
+ "loss": 0.722,
  "step": 205
  },
  {
+ "epoch": 0.5809128630705395,
+ "grad_norm": 0.12739399075508118,
+ "learning_rate": 8.935431032075318e-05,
+ "loss": 0.7173,
  "step": 210
  },
  {
+ "epoch": 0.5947441217150761,
+ "grad_norm": 0.13596710562705994,
+ "learning_rate": 8.454812071921596e-05,
+ "loss": 0.7194,
  "step": 215
  },
  {
+ "epoch": 0.6085753803596127,
+ "grad_norm": 0.12327581644058228,
+ "learning_rate": 7.977824276679623e-05,
+ "loss": 0.7095,
  "step": 220
  },
  {
+ "epoch": 0.6224066390041494,
+ "grad_norm": 0.1317676603794098,
+ "learning_rate": 7.505588559420189e-05,
+ "loss": 0.713,
  "step": 225
  },
  {
+ "epoch": 0.636237897648686,
+ "grad_norm": 0.13516183197498322,
+ "learning_rate": 7.039214665913003e-05,
+ "loss": 0.7048,
  "step": 230
  },
  {
+ "epoch": 0.6500691562932227,
+ "grad_norm": 0.12717784941196442,
+ "learning_rate": 6.579798566743314e-05,
+ "loss": 0.7088,
  "step": 235
  },
  {
+ "epoch": 0.6639004149377593,
+ "grad_norm": 0.12097220122814178,
+ "learning_rate": 6.128419881799996e-05,
+ "loss": 0.6939,
  "step": 240
  },
  {
+ "epoch": 0.677731673582296,
+ "grad_norm": 0.1216357946395874,
+ "learning_rate": 5.6861393431874675e-05,
+ "loss": 0.6943,
  "step": 245
  },
  {
+ "epoch": 0.6915629322268326,
+ "grad_norm": 0.12578962743282318,
+ "learning_rate": 5.253996302523596e-05,
+ "loss": 0.6832,
  "step": 250
  },
  {
+ "epoch": 0.7053941908713693,
+ "grad_norm": 0.1288958042860031,
+ "learning_rate": 4.833006288481371e-05,
+ "loss": 0.6786,
  "step": 255
  },
  {
+ "epoch": 0.719225449515906,
+ "grad_norm": 0.13444924354553223,
+ "learning_rate": 4.424158620314073e-05,
+ "loss": 0.6861,
  "step": 260
  },
  {
+ "epoch": 0.7330567081604425,
+ "grad_norm": 0.15658161044120789,
+ "learning_rate": 4.028414082972141e-05,
+ "loss": 0.6829,
  "step": 265
  },
  {
+ "epoch": 0.7468879668049793,
+ "grad_norm": 0.13638462126255035,
+ "learning_rate": 3.646702669275151e-05,
+ "loss": 0.6811,
  "step": 270
  },
  {
+ "epoch": 0.7607192254495159,
+ "grad_norm": 0.11960398405790329,
+ "learning_rate": 3.279921394444776e-05,
+ "loss": 0.6645,
  "step": 275
  },
  {
+ "epoch": 0.7745504840940526,
+ "grad_norm": 0.12005037814378738,
+ "learning_rate": 2.9289321881345254e-05,
+ "loss": 0.6709,
  "step": 280
  },
  {
+ "epoch": 0.7883817427385892,
+ "grad_norm": 0.12300828844308853,
+ "learning_rate": 2.594559868909956e-05,
+ "loss": 0.6629,
  "step": 285
  },
  {
+ "epoch": 0.8022130013831259,
+ "grad_norm": 0.11922738701105118,
+ "learning_rate": 2.2775902059393085e-05,
+ "loss": 0.6613,
  "step": 290
  },
  {
+ "epoch": 0.8160442600276625,
+ "grad_norm": 0.11143971979618073,
+ "learning_rate": 1.9787680724495617e-05,
+ "loss": 0.6546,
  "step": 295
  },
  {
+ "epoch": 0.8298755186721992,
+ "grad_norm": 0.11601640284061432,
+ "learning_rate": 1.698795695287212e-05,
+ "loss": 0.6567,
  "step": 300
  },
  {
+ "epoch": 0.8437067773167358,
+ "grad_norm": 0.11989685148000717,
+ "learning_rate": 1.4383310046973365e-05,
+ "loss": 0.657,
  "step": 305
  },
  {
+ "epoch": 0.8575380359612724,
+ "grad_norm": 0.11077902466058731,
+ "learning_rate": 1.1979860881988902e-05,
+ "loss": 0.6555,
  "step": 310
  },
  {
+ "epoch": 0.8713692946058091,
+ "grad_norm": 0.11324643343687057,
+ "learning_rate": 9.783257521896227e-06,
+ "loss": 0.6468,
  "step": 315
  },
  {
+ "epoch": 0.8852005532503457,
+ "grad_norm": 0.11370333284139633,
+ "learning_rate": 7.798661946608166e-06,
+ "loss": 0.648,
  "step": 320
  },
  {
+ "epoch": 0.8990318118948825,
+ "grad_norm": 0.10991474986076355,
+ "learning_rate": 6.030737921409169e-06,
+ "loss": 0.6446,
  "step": 325
  },
  {
+ "epoch": 0.9128630705394191,
+ "grad_norm": 0.11461606621742249,
+ "learning_rate": 4.4836400371876974e-06,
+ "loss": 0.6387,
  "step": 330
  },
  {
+ "epoch": 0.9266943291839558,
+ "grad_norm": 0.1137213185429573,
+ "learning_rate": 3.161003947219421e-06,
+ "loss": 0.6329,
  "step": 335
  },
  {
+ "epoch": 0.9405255878284924,
+ "grad_norm": 0.10857342928647995,
+ "learning_rate": 2.0659378234448525e-06,
+ "loss": 0.6627,
  "step": 340
  },
  {
+ "epoch": 0.9543568464730291,
+ "grad_norm": 0.10978103429079056,
+ "learning_rate": 1.201015052319099e-06,
+ "loss": 0.6435,
  "step": 345
  },
  {
+ "epoch": 0.9681881051175657,
+ "grad_norm": 0.1058996319770813,
+ "learning_rate": 5.682681873981577e-07,
+ "loss": 0.6388,
  "step": 350
  },
  {
+ "epoch": 0.9820193637621023,
+ "grad_norm": 0.10548459738492966,
+ "learning_rate": 1.6918417287318245e-07,
+ "loss": 0.6382,
  "step": 355
  },
  {
+ "epoch": 0.995850622406639,
+ "grad_norm": 0.11099706590175629,
+ "learning_rate": 4.700849277383679e-09,
+ "loss": 0.6424,
  "step": 360
  },
  {
+ "epoch": 0.9986168741355463,
+ "eval_loss": 0.658014178276062,
+ "eval_runtime": 53.9504,
+ "eval_samples_per_second": 85.449,
+ "eval_steps_per_second": 2.688,
+ "step": 361
  },
  {
+ "epoch": 0.9986168741355463,
+ "step": 361,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
  ],
  "logging_steps": 5,
+ "max_steps": 361,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,

  "attributes": {}
  }
  },
+ "total_flos": 1.74045731487744e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c7f901665b246c7f97e4ff363cef52fdcc6b1b8fb59deef2a745733be6a10b18
- size 6968

  version https://git-lfs.github.com/spec/v1
+ oid sha256:475f29db775c3fa3ebae0c3997a227d93f50e4e631d281907a24df8c23250da0
+ size 7096