jeromeramos committed on Feb 23, 2025

Commit 66cb90a (verified) · 1 Parent(s): a12f8e8

Model save

Files changed (14)

README.md +1 -1
all_results.json +6 -11
config.json +1 -1
model-00001-of-00004.safetensors +1 -1
model-00002-of-00004.safetensors +1 -1
model-00003-of-00004.safetensors +1 -1
model-00004-of-00004.safetensors +1 -1
runs/Feb23_19-29-36_w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-5fsrblx/events.out.tfevents.1740339152.w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-5fsrblx.47029.0 +3 -0
special_tokens_map.json +1 -1
tokenizer.json +2 -2
tokenizer_config.json +1 -1
train_results.json +6 -6
trainer_state.json +747 -235
training_args.bin +1 -1

README.md CHANGED

@@ -27,7 +27,7 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/jerome-ramos-20/huggingface/runs/qm0bt6vo)
 This model was trained with SFT.

 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/jerome-ramos-20/huggingface/runs/y53jkfs4)
 This model was trained with SFT.

all_results.json CHANGED

@@ -1,14 +1,9 @@
 {
-    "epoch": 0.9986168741355463,
-    "eval_loss": 0.6599090695381165,
-    "eval_runtime": 52.3271,
-    "eval_samples": 2071,
-    "eval_samples_per_second": 88.1,
-    "eval_steps_per_second": 2.771,
-    "total_flos": 1.7115790489220547e+18,
-    "train_loss": 0.861595592175164,
-    "train_runtime": 2353.9448,
     "train_samples": 46269,
-    "train_samples_per_second": 19.656,
-    "train_steps_per_second": 0.153
 }

 {
+    "epoch": 1.9986168741355463,
+    "total_flos": 3.364677087628624e+18,
+    "train_loss": 0.7622144496341822,
+    "train_runtime": 4614.5072,
     "train_samples": 46269,
+    "train_samples_per_second": 20.054,
+    "train_steps_per_second": 0.156
 }
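
As a quick consistency check on the figures above, the sketch below recomputes how many passes over the data each run's throughput implies. It assumes `train_samples_per_second` is the total number of samples processed divided by `train_runtime`, which is how the Hugging Face Trainer usually reports it; the helper name `implied_epochs` is hypothetical.

```python
# Values copied from the all_results.json diff above.
old_run = {
    "train_runtime": 2353.9448,
    "train_samples_per_second": 19.656,
    "train_samples": 46269,
}
new_run = {
    "train_runtime": 4614.5072,
    "train_samples_per_second": 20.054,
    "train_samples": 46269,
}


def implied_epochs(run: dict) -> float:
    """Passes over the dataset implied by throughput.

    Assumption: samples_per_second = total samples processed / runtime.
    """
    total_samples = run["train_runtime"] * run["train_samples_per_second"]
    return total_samples / run["train_samples"]


for name, run in (("old", old_run), ("new", new_run)):
    print(f"{name} run: about {implied_epochs(run):.2f} epochs")
```

Under that assumption the old run comes out at roughly one epoch and the new run at roughly two, in line with the logged `epoch` values of 0.9986 and 1.9986. The new run also reports no `eval_*` metrics.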

config.json CHANGED

@@ -18,7 +18,7 @@
   "num_attention_heads": 32,
   "num_hidden_layers": 32,
   "num_key_value_heads": 8,
-  "pad_token_id": 128001,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": {

   "num_attention_heads": 32,
   "num_hidden_layers": 32,
   "num_key_value_heads": 8,
+  "pad_token_id": 128004,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": {

model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5f3646a85996025cdaed773e7201ee3e3320349d66731b2b77492ae1a5d14add
 size 4977222960

 version https://git-lfs.github.com/spec/v1
+oid sha256:473fce18d250c18a47a350a533e0dd77b59518a960c722628b7eefa5b9884132
 size 4977222960

model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6f2743e048959efde7f2379dd20a4fe9079ab98f6b125b32edd2f0c912d96d3e
 size 4999802720

 version https://git-lfs.github.com/spec/v1
+oid sha256:6b5a874d269778c72e79f59151bef603cf142d8c8224f7678b0ec2edc10dfd44
 size 4999802720

model-00003-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ca317cebfba4e27a3e92f9eb2fd21f5695e0e5a514d71d78f5f2e240dd728ae2
 size 4915916176

 version https://git-lfs.github.com/spec/v1
+oid sha256:d058b976bfd82b1b6afa50b85c104a77791360dd7423bb0a8b70d93602dcff1e
 size 4915916176

model-00004-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7e888911d0c982b08e2ac6b084e7de4a4d500111b730a8ef9c8c43b1c4e83ad2
 size 1168663096

 version https://git-lfs.github.com/spec/v1
+oid sha256:8613511f5a50b99d52a9d237182c70ff8f7729acbbdadc631e83795f1d83e3c4
 size 1168663096

runs/Feb23_19-29-36_w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-5fsrblx/events.out.tfevents.1740339152.w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-5fsrblx.47029.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:045c477cc8061b8990d0fcd0fdab78b9feec3385322b85d0fa235271055954db
+size 37272

special_tokens_map.json CHANGED

@@ -50,5 +50,5 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "<|end_of_text|>"
 }

     "rstrip": false,
     "single_word": false
   },
+  "pad_token": "<|finetune_right_pad_id|>"
 }

tokenizer.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3919c1e7bfa558ff525a618a3d463929a238acaba668d7ef6da432fcd6cd7fad
-size 17211327

 version https://git-lfs.github.com/spec/v1
+oid sha256:b5ea5afcc70a5f73f9b545a5940b211fd23e2acd4d895a3ebc3144ca348a4633
+size 17211228

tokenizer_config.json CHANGED

@@ -2122,6 +2122,6 @@
     "attention_mask"
   ],
   "model_max_length": 131072,
-  "pad_token": "<|end_of_text|>",
   "tokenizer_class": "PreTrainedTokenizerFast"
 }

     "attention_mask"
   ],
   "model_max_length": 131072,
+  "pad_token": "<|finetune_right_pad_id|>",
   "tokenizer_class": "PreTrainedTokenizerFast"
 }
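
The pad-token change touches three files (config.json, special_tokens_map.json, tokenizer_config.json), and they have to agree with each other. Below is a minimal sketch of that check, using only values visible in this commit. The `check_pad_token` helper and the two-entry `token_to_id` table are illustrative; the id mapping is inferred by pairing the old and new values across the diffs, not read from tokenizer.json.

```python
# Pad-token settings after this commit, copied from the diffs.
config = {"pad_token_id": 128004}
special_tokens_map = {"pad_token": "<|finetune_right_pad_id|>"}
tokenizer_config = {"pad_token": "<|finetune_right_pad_id|>"}

# Assumed mapping, inferred from the old/new pairs in the diffs.
token_to_id = {
    "<|end_of_text|>": 128001,
    "<|finetune_right_pad_id|>": 128004,
}


def check_pad_token(config, special_tokens_map, tokenizer_config, token_to_id):
    """Return True when all three files point at the same pad token."""
    token = tokenizer_config["pad_token"]
    return (
        special_tokens_map["pad_token"] == token
        and token_to_id.get(token) == config["pad_token_id"]
    )


print(check_pad_token(config, special_tokens_map, tokenizer_config, token_to_id))
```

The commit does not state why the pad token was changed. One plausible reason is that when padding shares an id with `<|end_of_text|>`, a collator that masks pad positions out of the loss can also mask the end-of-text token.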

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9986168741355463,
-    "total_flos": 1.7115790489220547e+18,
-    "train_loss": 0.861595592175164,
-    "train_runtime": 2353.9448,
     "train_samples": 46269,
-    "train_samples_per_second": 19.656,
-    "train_steps_per_second": 0.153
 }

 {
+    "epoch": 1.9986168741355463,
+    "total_flos": 3.364677087628624e+18,
+    "train_loss": 0.7622144496341822,
+    "train_runtime": 4614.5072,
     "train_samples": 46269,
+    "train_samples_per_second": 20.054,
+    "train_steps_per_second": 0.156
 }

trainer_state.json CHANGED

@@ -1,546 +1,1058 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.9986168741355463,
   "eval_steps": 500,
-  "global_step": 361,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
       "epoch": 0.0027662517289073307,
-      "grad_norm": 22.1127872467041,
-      "learning_rate": 5.405405405405406e-06,
-      "loss": 2.6011,
       "step": 1
     },
     {
       "epoch": 0.013831258644536652,
-      "grad_norm": 3.526327610015869,
-      "learning_rate": 2.702702702702703e-05,
-      "loss": 2.2001,
       "step": 5
     },
     {
       "epoch": 0.027662517289073305,
-      "grad_norm": 2.0694353580474854,
-      "learning_rate": 5.405405405405406e-05,
-      "loss": 1.8786,
       "step": 10
     },
     {
       "epoch": 0.04149377593360996,
-      "grad_norm": 1.429513931274414,
-      "learning_rate": 8.108108108108109e-05,
-      "loss": 1.6509,
       "step": 15
     },
     {
       "epoch": 0.05532503457814661,
-      "grad_norm": 3.6957762241363525,
-      "learning_rate": 0.00010810810810810812,
-      "loss": 1.4395,
       "step": 20
     },
     {
       "epoch": 0.06915629322268327,
-      "grad_norm": 3.5924487113952637,
-      "learning_rate": 0.00013513513513513514,
-      "loss": 1.1714,
       "step": 25
     },
     {
       "epoch": 0.08298755186721991,
-      "grad_norm": 1.092515468597412,
-      "learning_rate": 0.00016216216216216218,
-      "loss": 1.2197,
       "step": 30
     },
     {
       "epoch": 0.09681881051175657,
-      "grad_norm": 4.442113876342773,
-      "learning_rate": 0.0001891891891891892,
-      "loss": 1.204,
       "step": 35
     },
     {
       "epoch": 0.11065006915629322,
-      "grad_norm": 6.686959266662598,
-      "learning_rate": 0.0001999576950082201,
-      "loss": 1.6501,
       "step": 40
     },
     {
       "epoch": 0.12448132780082988,
-      "grad_norm": 4.45343017578125,
-      "learning_rate": 0.0001996992941167792,
-      "loss": 1.4183,
       "step": 45
     },
     {
       "epoch": 0.13831258644536654,
-      "grad_norm": 5.694210052490234,
-      "learning_rate": 0.00019920660160815422,
-      "loss": 1.5559,
       "step": 50
     },
     {
       "epoch": 0.15214384508990317,
-      "grad_norm": 2.5626814365386963,
-      "learning_rate": 0.00019848077530122083,
-      "loss": 1.2451,
       "step": 55
     },
     {
       "epoch": 0.16597510373443983,
-      "grad_norm": 0.5388926863670349,
-      "learning_rate": 0.00019752352087524933,
-      "loss": 1.0484,
       "step": 60
     },
     {
       "epoch": 0.1798063623789765,
-      "grad_norm": 0.3036655783653259,
-      "learning_rate": 0.00019633708786158806,
-      "loss": 0.9605,
       "step": 65
     },
     {
       "epoch": 0.19363762102351315,
-      "grad_norm": 0.25516265630722046,
-      "learning_rate": 0.0001949242643573034,
-      "loss": 0.9121,
       "step": 70
     },
     {
       "epoch": 0.2074688796680498,
-      "grad_norm": 0.22290439903736115,
-      "learning_rate": 0.0001932883704732001,
-      "loss": 0.9072,
       "step": 75
     },
     {
       "epoch": 0.22130013831258644,
-      "grad_norm": 0.23729199171066284,
-      "learning_rate": 0.00019143325053161796,
-      "loss": 0.8938,
       "step": 80
     },
     {
       "epoch": 0.2351313969571231,
-      "grad_norm": 0.2355058640241623,
-      "learning_rate": 0.00018936326403234125,
-      "loss": 0.8759,
       "step": 85
     },
     {
       "epoch": 0.24896265560165975,
-      "grad_norm": 0.20683090388774872,
-      "learning_rate": 0.00018708327540784922,
-      "loss": 0.8758,
       "step": 90
     },
     {
       "epoch": 0.2627939142461964,
-      "grad_norm": 0.21719329059123993,
-      "learning_rate": 0.0001845986425919841,
-      "loss": 0.8571,
       "step": 95
     },
     {
       "epoch": 0.2766251728907331,
-      "grad_norm": 0.20208917558193207,
-      "learning_rate": 0.0001819152044288992,
-      "loss": 0.859,
       "step": 100
     },
     {
       "epoch": 0.29045643153526973,
-      "grad_norm": 0.18826699256896973,
-      "learning_rate": 0.00017903926695187595,
-      "loss": 0.8427,
       "step": 105
     },
     {
       "epoch": 0.30428769017980634,
-      "grad_norm": 0.18175852298736572,
-      "learning_rate": 0.00017597758856425494,
-      "loss": 0.8389,
       "step": 110
     },
     {
       "epoch": 0.318118948824343,
-      "grad_norm": 0.17405715584754944,
-      "learning_rate": 0.00017273736415730488,
-      "loss": 0.8185,
       "step": 115
     },
     {
       "epoch": 0.33195020746887965,
-      "grad_norm": 0.15530933439731598,
-      "learning_rate": 0.00016932620820235244,
-      "loss": 0.8249,
       "step": 120
     },
     {
       "epoch": 0.3457814661134163,
-      "grad_norm": 0.17757271230220795,
-      "learning_rate": 0.0001657521368569064,
-      "loss": 0.7947,
       "step": 125
     },
     {
       "epoch": 0.359612724757953,
-      "grad_norm": 0.18264907598495483,
-      "learning_rate": 0.000162023549126826,
-      "loss": 0.8021,
       "step": 130
     },
     {
       "epoch": 0.37344398340248963,
-      "grad_norm": 0.18304209411144257,
-      "learning_rate": 0.00015814920712880267,
-      "loss": 0.8039,
       "step": 135
     },
     {
       "epoch": 0.3872752420470263,
-      "grad_norm": 0.16061393916606903,
-      "learning_rate": 0.00015413821549953698,
-      "loss": 0.792,
       "step": 140
     },
     {
       "epoch": 0.40110650069156295,
-      "grad_norm": 0.1555311381816864,
-      "learning_rate": 0.00015000000000000001,
-      "loss": 0.7948,
       "step": 145
     },
     {
       "epoch": 0.4149377593360996,
-      "grad_norm": 0.15761056542396545,
-      "learning_rate": 0.0001457442853650581,
-      "loss": 0.7768,
       "step": 150
     },
     {
       "epoch": 0.4287690179806362,
-      "grad_norm": 0.1716078668832779,
-      "learning_rate": 0.00014138107245051392,
-      "loss": 0.7758,
       "step": 155
     },
     {
       "epoch": 0.4426002766251729,
-      "grad_norm": 0.1470308154821396,
-      "learning_rate": 0.00013692061473126845,
-      "loss": 0.7578,
       "step": 160
     },
     {
       "epoch": 0.45643153526970953,
-      "grad_norm": 0.15690156817436218,
-      "learning_rate": 0.00013237339420583212,
-      "loss": 0.7619,
       "step": 165
     },
     {
       "epoch": 0.4702627939142462,
-      "grad_norm": 0.17660725116729736,
-      "learning_rate": 0.00012775009676380957,
-      "loss": 0.7567,
       "step": 170
     },
     {
       "epoch": 0.48409405255878285,
-      "grad_norm": 0.13694822788238525,
-      "learning_rate": 0.00012306158707424403,
-      "loss": 0.7569,
       "step": 175
     },
     {
       "epoch": 0.4979253112033195,
-      "grad_norm": 0.12447214871644974,
-      "learning_rate": 0.00011831888305383268,
-      "loss": 0.7414,
       "step": 180
     },
     {
       "epoch": 0.5117565698478561,
-      "grad_norm": 0.13208778202533722,
-      "learning_rate": 0.00011353312997501313,
-      "loss": 0.7495,
       "step": 185
     },
     {
       "epoch": 0.5255878284923928,
-      "grad_norm": 0.13374905288219452,
-      "learning_rate": 0.00010871557427476583,
-      "loss": 0.7467,
       "step": 190
     },
     {
       "epoch": 0.5394190871369294,
-      "grad_norm": 0.14392955601215363,
-      "learning_rate": 0.0001038775371256817,
-      "loss": 0.7388,
       "step": 195
     },
     {
       "epoch": 0.5532503457814661,
-      "grad_norm": 0.13033545017242432,
-      "learning_rate": 9.903038783140216e-05,
-      "loss": 0.7239,
       "step": 200
     },
     {
       "epoch": 0.5670816044260027,
-      "grad_norm": 0.12652400135993958,
-      "learning_rate": 9.418551710895243e-05,
-      "loss": 0.7251,
       "step": 205
     },
     {
       "epoch": 0.5809128630705395,
-      "grad_norm": 0.12813538312911987,
-      "learning_rate": 8.935431032075318e-05,
-      "loss": 0.7206,
       "step": 210
     },
     {
       "epoch": 0.5947441217150761,
-      "grad_norm": 0.13136501610279083,
-      "learning_rate": 8.454812071921596e-05,
-      "loss": 0.721,
       "step": 215
     },
     {
       "epoch": 0.6085753803596127,
-      "grad_norm": 0.13638000190258026,
-      "learning_rate": 7.977824276679623e-05,
-      "loss": 0.7134,
       "step": 220
     },
     {
       "epoch": 0.6224066390041494,
-      "grad_norm": 0.13380198180675507,
-      "learning_rate": 7.505588559420189e-05,
-      "loss": 0.7158,
       "step": 225
     },
     {
       "epoch": 0.636237897648686,
-      "grad_norm": 0.13291427493095398,
-      "learning_rate": 7.039214665913003e-05,
-      "loss": 0.7068,
       "step": 230
     },
     {
       "epoch": 0.6500691562932227,
-      "grad_norm": 0.12505605816841125,
-      "learning_rate": 6.579798566743314e-05,
-      "loss": 0.7109,
       "step": 235
     },
     {
       "epoch": 0.6639004149377593,
-      "grad_norm": 0.11483744531869888,
-      "learning_rate": 6.128419881799996e-05,
-      "loss": 0.6962,
       "step": 240
     },
     {
       "epoch": 0.677731673582296,
-      "grad_norm": 0.1254301220178604,
-      "learning_rate": 5.6861393431874675e-05,
-      "loss": 0.6944,
       "step": 245
     },
     {
       "epoch": 0.6915629322268326,
-      "grad_norm": 0.13567984104156494,
-      "learning_rate": 5.253996302523596e-05,
-      "loss": 0.6865,
       "step": 250
     },
     {
       "epoch": 0.7053941908713693,
-      "grad_norm": 0.1235489696264267,
-      "learning_rate": 4.833006288481371e-05,
-      "loss": 0.6807,
       "step": 255
     },
     {
       "epoch": 0.719225449515906,
-      "grad_norm": 0.13388977944850922,
-      "learning_rate": 4.424158620314073e-05,
-      "loss": 0.6881,
       "step": 260
     },
     {
       "epoch": 0.7330567081604425,
-      "grad_norm": 0.12815245985984802,
-      "learning_rate": 4.028414082972141e-05,
-      "loss": 0.6842,
       "step": 265
     },
     {
       "epoch": 0.7468879668049793,
-      "grad_norm": 0.1258043646812439,
-      "learning_rate": 3.646702669275151e-05,
-      "loss": 0.6832,
       "step": 270
     },
     {
       "epoch": 0.7607192254495159,
-      "grad_norm": 0.11947453022003174,
-      "learning_rate": 3.279921394444776e-05,
-      "loss": 0.6672,
       "step": 275
     },
     {
       "epoch": 0.7745504840940526,
-      "grad_norm": 0.12488783895969391,
-      "learning_rate": 2.9289321881345254e-05,
-      "loss": 0.6729,
       "step": 280
     },
     {
       "epoch": 0.7883817427385892,
-      "grad_norm": 0.11996188759803772,
-      "learning_rate": 2.594559868909956e-05,
-      "loss": 0.6641,
       "step": 285
     },
     {
       "epoch": 0.8022130013831259,
-      "grad_norm": 0.12338840216398239,
-      "learning_rate": 2.2775902059393085e-05,
-      "loss": 0.6618,
       "step": 290
     },
     {
       "epoch": 0.8160442600276625,
-      "grad_norm": 0.11500907689332962,
-      "learning_rate": 1.9787680724495617e-05,
-      "loss": 0.6576,
       "step": 295
     },
     {
       "epoch": 0.8298755186721992,
-      "grad_norm": 0.1203397586941719,
-      "learning_rate": 1.698795695287212e-05,
-      "loss": 0.6579,
       "step": 300
     },
     {
       "epoch": 0.8437067773167358,
-      "grad_norm": 0.11593286693096161,
-      "learning_rate": 1.4383310046973365e-05,
-      "loss": 0.659,
       "step": 305
     },
     {
       "epoch": 0.8575380359612724,
-      "grad_norm": 0.10674016922712326,
-      "learning_rate": 1.1979860881988902e-05,
-      "loss": 0.6581,
       "step": 310
     },
     {
       "epoch": 0.8713692946058091,
-      "grad_norm": 0.1114317774772644,
-      "learning_rate": 9.783257521896227e-06,
-      "loss": 0.6489,
       "step": 315
     },
     {
       "epoch": 0.8852005532503457,
-      "grad_norm": 0.11088614910840988,
-      "learning_rate": 7.798661946608166e-06,
-      "loss": 0.6485,
       "step": 320
     },
     {
       "epoch": 0.8990318118948825,
-      "grad_norm": 0.10715563595294952,
-      "learning_rate": 6.030737921409169e-06,
-      "loss": 0.645,
       "step": 325
     },
     {
       "epoch": 0.9128630705394191,
-      "grad_norm": 0.11442163586616516,
-      "learning_rate": 4.4836400371876974e-06,
-      "loss": 0.64,
       "step": 330
     },
     {
       "epoch": 0.9266943291839558,
-      "grad_norm": 0.1089484840631485,
-      "learning_rate": 3.161003947219421e-06,
-      "loss": 0.6336,
       "step": 335
     },
     {
       "epoch": 0.9405255878284924,
-      "grad_norm": 0.10584916174411774,
-      "learning_rate": 2.0659378234448525e-06,
-      "loss": 0.665,
       "step": 340
     },
     {
       "epoch": 0.9543568464730291,
-      "grad_norm": 0.10534138232469559,
-      "learning_rate": 1.201015052319099e-06,
-      "loss": 0.6455,
       "step": 345
     },
     {
       "epoch": 0.9681881051175657,
-      "grad_norm": 0.1038522943854332,
-      "learning_rate": 5.682681873981577e-07,
-      "loss": 0.6406,
       "step": 350
     },
     {
       "epoch": 0.9820193637621023,
-      "grad_norm": 0.10471897572278976,
-      "learning_rate": 1.6918417287318245e-07,
-      "loss": 0.6396,
       "step": 355
     },
     {
       "epoch": 0.995850622406639,
-      "grad_norm": 0.10800525546073914,
-      "learning_rate": 4.700849277383679e-09,
-      "loss": 0.6434,
       "step": 360
     },
     {
       "epoch": 0.9986168741355463,
-      "eval_loss": 0.6599090695381165,
-      "eval_runtime": 53.0431,
-      "eval_samples_per_second": 86.91,
-      "eval_steps_per_second": 2.734,
       "step": 361
     },
     {
-      "epoch": 0.9986168741355463,
-      "step": 361,
-      "total_flos": 1.7115790489220547e+18,
-      "train_loss": 0.861595592175164,
-      "train_runtime": 2353.9448,
-      "train_samples_per_second": 19.656,
-      "train_steps_per_second": 0.153
     }
   ],
   "logging_steps": 5,
-  "max_steps": 361,
   "num_input_tokens_seen": 0,
-  "num_train_epochs": 1,
   "save_steps": 500,
   "stateful_callbacks": {
     "TrainerControl": {
@@ -554,7 +1066,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.7115790489220547e+18,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
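
The logged `learning_rate` values in both runs are consistent with linear warmup followed by cosine decay to zero, with a peak of 2e-4 and warmup over roughly 10% of `max_steps` (37 of 361 steps in the old run, 73 of 722 in the new one). These hyperparameters are inferred from the logs, not stated in the commit. The sketch below reproduces the schedule under that assumption; `lr_at` is a hypothetical helper, not a Transformers function.

```python
import math

PEAK_LR = 2e-4  # assumed peak, inferred from the logged values


def lr_at(step: int, max_steps: int, warmup_steps: int, peak: float = PEAK_LR) -> float:
    """Learning rate as logged at global step `step` (assumed schedule)."""
    if step <= warmup_steps:
        # Linear warmup from 0 to the peak.
        return peak * step / warmup_steps
    # Cosine decay from the peak to 0 over the remaining steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


# Compare against a few values copied from log_history.
logged = [
    ("old", 1, 361, 37, 5.405405405405406e-06),
    ("old", 145, 361, 37, 0.00015000000000000001),
    ("new", 1, 722, 73, 2.7397260273972604e-06),
    ("new", 75, 722, 73, 0.00019999531362588743),
]
for run, step, max_steps, warmup, expected in logged:
    got = lr_at(step, max_steps, warmup)
    print(f"{run} step {step}: computed {got:.6e}, logged {expected:.6e}")
```

Because the decay is stretched over 722 steps instead of 361, the new run is still at about 1.5e-4 at step 285, where the old run had already dropped to about 2.6e-5.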

 {
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 1.9986168741355463,
   "eval_steps": 500,
+  "global_step": 722,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
       "epoch": 0.0027662517289073307,
+      "grad_norm": 20.252717971801758,
+      "learning_rate": 2.7397260273972604e-06,
+      "loss": 2.8393,
       "step": 1
     },
     {
       "epoch": 0.013831258644536652,
+      "grad_norm": 3.2012741565704346,
+      "learning_rate": 1.3698630136986302e-05,
+      "loss": 2.5142,
       "step": 5
     },
     {
       "epoch": 0.027662517289073305,
+      "grad_norm": 4.701728820800781,
+      "learning_rate": 2.7397260273972603e-05,
+      "loss": 2.0682,
       "step": 10
     },
     {
       "epoch": 0.04149377593360996,
+      "grad_norm": 1.333525538444519,
+      "learning_rate": 4.1095890410958905e-05,
+      "loss": 1.8592,
       "step": 15
     },
     {
       "epoch": 0.05532503457814661,
+      "grad_norm": 1.3727704286575317,
+      "learning_rate": 5.479452054794521e-05,
+      "loss": 1.611,
       "step": 20
     },
     {
       "epoch": 0.06915629322268327,
+      "grad_norm": 1.287765622138977,
+      "learning_rate": 6.84931506849315e-05,
+      "loss": 1.2818,
       "step": 25
     },
     {
       "epoch": 0.08298755186721991,
+      "grad_norm": 0.443893164396286,
+      "learning_rate": 8.219178082191781e-05,
+      "loss": 1.0442,
       "step": 30
     },
     {
       "epoch": 0.09681881051175657,
+      "grad_norm": 1.4994786977767944,
+      "learning_rate": 9.58904109589041e-05,
+      "loss": 1.0328,
       "step": 35
     },
     {
       "epoch": 0.11065006915629322,
+      "grad_norm": 3.60703444480896,
+      "learning_rate": 0.00010958904109589041,
+      "loss": 1.264,
       "step": 40
     },
     {
       "epoch": 0.12448132780082988,
+      "grad_norm": 2.5533716678619385,
+      "learning_rate": 0.0001232876712328767,
+      "loss": 1.5079,
       "step": 45
     },
     {
       "epoch": 0.13831258644536654,
+      "grad_norm": 25.258333206176758,
+      "learning_rate": 0.000136986301369863,
+      "loss": 1.3297,
       "step": 50
     },
     {
       "epoch": 0.15214384508990317,
+      "grad_norm": 2.7347092628479004,
+      "learning_rate": 0.00015068493150684933,
+      "loss": 1.4471,
       "step": 55
     },
     {
       "epoch": 0.16597510373443983,
+      "grad_norm": 3.742867946624756,
+      "learning_rate": 0.00016438356164383562,
+      "loss": 1.2809,
       "step": 60
     },
     {
       "epoch": 0.1798063623789765,
+      "grad_norm": 3.686124563217163,
+      "learning_rate": 0.00017808219178082192,
+      "loss": 1.6302,
       "step": 65
     },
     {
       "epoch": 0.19363762102351315,
+      "grad_norm": 8.028692245483398,
+      "learning_rate": 0.0001917808219178082,
+      "loss": 2.139,
       "step": 70
     },
     {
       "epoch": 0.2074688796680498,
+      "grad_norm": 4.435426712036133,
+      "learning_rate": 0.00019999531362588743,
+      "loss": 1.4489,
       "step": 75
     },
     {
       "epoch": 0.22130013831258644,
+      "grad_norm": 2.330904483795166,
+      "learning_rate": 0.00019994259696141126,
+      "loss": 1.4819,
       "step": 80
     },
     {
       "epoch": 0.2351313969571231,
+      "grad_norm": 1.6414345502853394,
+      "learning_rate": 0.0001998313366477513,
+      "loss": 1.3125,
       "step": 85
     },
     {
       "epoch": 0.24896265560165975,
+      "grad_norm": 15.830405235290527,
+      "learning_rate": 0.00019966159785816663,
+      "loss": 1.1538,
       "step": 90
     },
     {
       "epoch": 0.2627939142461964,
+      "grad_norm": 0.8891340494155884,
+      "learning_rate": 0.00019943348002101371,
+      "loss": 1.1039,
       "step": 95
     },
     {
       "epoch": 0.2766251728907331,
+      "grad_norm": 1.8807998895645142,
+      "learning_rate": 0.00019914711676150378,
+      "loss": 1.1167,
       "step": 100
     },
     {
       "epoch": 0.29045643153526973,
+      "grad_norm": 0.5741276741027832,
+      "learning_rate": 0.0001988026758234289,
+      "loss": 1.1358,
       "step": 105
     },
     {
       "epoch": 0.30428769017980634,
+      "grad_norm": 0.26256272196769714,
+      "learning_rate": 0.00019840035897090215,
+      "loss": 1.0063,
       "step": 110
     },
     {
       "epoch": 0.318118948824343,
+      "grad_norm": 0.3350989818572998,
+      "learning_rate": 0.00019794040187017005,
+      "loss": 0.9463,
       "step": 115
     },
     {
       "epoch": 0.33195020746887965,
+      "grad_norm": 0.25455090403556824,
+      "learning_rate": 0.00019742307395156507,
+      "loss": 0.9406,
       "step": 120
     },
     {
       "epoch": 0.3457814661134163,
+      "grad_norm": 0.24111931025981903,
+      "learning_rate": 0.0001968486782516813,
+      "loss": 0.896,
       "step": 125
     },
     {
       "epoch": 0.359612724757953,
+      "grad_norm": 0.2141195684671402,
+      "learning_rate": 0.00019621755123586354,
+      "loss": 0.9039,
       "step": 130
     },
     {
       "epoch": 0.37344398340248963,
+      "grad_norm": 0.19233381748199463,
+      "learning_rate": 0.00019553006260111515,
+      "loss": 0.9018,
       "step": 135
     },
     {
       "epoch": 0.3872752420470263,
+      "grad_norm": 0.18128527700901031,
+      "learning_rate": 0.0001947866150595396,
+      "loss": 0.8879,
       "step": 140
     },
     {
       "epoch": 0.40110650069156295,
+      "grad_norm": 0.18775707483291626,
+      "learning_rate": 0.00019398764410244275,
+      "loss": 0.8892,
       "step": 145
     },
     {
       "epoch": 0.4149377593360996,
+      "grad_norm": 0.18092310428619385,
+      "learning_rate": 0.00019313361774523385,
+      "loss": 0.8646,
       "step": 150
     },
     {
       "epoch": 0.4287690179806362,
+      "grad_norm": 0.19820688664913177,
+      "learning_rate": 0.00019222503625327496,
+      "loss": 0.865,
       "step": 155
     },
     {
       "epoch": 0.4426002766251729,
+      "grad_norm": 0.19297532737255096,
+      "learning_rate": 0.00019126243184883898,
+      "loss": 0.8473,
       "step": 160
     },
     {
       "epoch": 0.45643153526970953,
+      "grad_norm": 0.17955026030540466,
+      "learning_rate": 0.00019024636839934855,
+      "loss": 0.8512,
       "step": 165
     },
     {
       "epoch": 0.4702627939142462,
+      "grad_norm": 0.18673136830329895,
+      "learning_rate": 0.00018917744108707776,
+      "loss": 0.8421,
       "step": 170
     },
     {
       "epoch": 0.48409405255878285,
+      "grad_norm": 0.1972828507423401,
+      "learning_rate": 0.0001880562760605105,
+      "loss": 0.8432,
       "step": 175
     },
     {
       "epoch": 0.4979253112033195,
+      "grad_norm": 0.1867324709892273,
+      "learning_rate": 0.00018688353006756004,
+      "loss": 0.8299,
       "step": 180
     },
     {
       "epoch": 0.5117565698478561,
+      "grad_norm": 0.16586913168430328,
+      "learning_rate": 0.0001856598900708637,
+      "loss": 0.837,
       "step": 185
     },
     {
       "epoch": 0.5255878284923928,
+      "grad_norm": 0.1780562847852707,
+      "learning_rate": 0.00018438607284537907,
+      "loss": 0.8328,
       "step": 190
     },
     {
       "epoch": 0.5394190871369294,
+      "grad_norm": 0.17686933279037476,
+      "learning_rate": 0.00018306282455851655,
+      "loss": 0.8238,
       "step": 195
     },
     {
       "epoch": 0.5532503457814661,
+      "grad_norm": 0.1651686578989029,
+      "learning_rate": 0.00018169092033305516,
+      "loss": 0.81,
       "step": 200
     },
     {
       "epoch": 0.5670816044260027,
+      "grad_norm": 0.15609298646450043,
+      "learning_rate": 0.00018027116379309638,
+      "loss": 0.8119,
       "step": 205
     },
     {
       "epoch": 0.5809128630705395,
+      "grad_norm": 0.15646931529045105,
+      "learning_rate": 0.00017880438659332332,
+      "loss": 0.811,
       "step": 210
     },
     {
       "epoch": 0.5947441217150761,
+      "grad_norm": 0.15522946417331696,
+      "learning_rate": 0.00017729144793183992,
+      "loss": 0.8105,
       "step": 215
     },
     {
       "epoch": 0.6085753803596127,
+      "grad_norm": 0.14471209049224854,
+      "learning_rate": 0.0001757332340468762,
+      "loss": 0.8026,
       "step": 220
     },
     {
       "epoch": 0.6224066390041494,
+      "grad_norm": 0.16337046027183533,
+      "learning_rate": 0.00017413065769765406,
+      "loss": 0.8051,
       "step": 225
     },
     {
       "epoch": 0.636237897648686,
+      "grad_norm": 0.16175448894500732,
+      "learning_rate": 0.00017248465762971776,
+      "loss": 0.7999,
       "step": 230
     },
     {
       "epoch": 0.6500691562932227,
+      "grad_norm": 0.17871476709842682,
+      "learning_rate": 0.00017079619802504238,
+      "loss": 0.8067,
       "step": 235
     },
     {
       "epoch": 0.6639004149377593,
+      "grad_norm": 0.1535872220993042,
+      "learning_rate": 0.00016906626793724224,
+      "loss": 0.7893,
       "step": 240
     },
     {
       "epoch": 0.677731673582296,
+      "grad_norm": 0.14963175356388092,
+      "learning_rate": 0.00016729588071221055,
+      "loss": 0.7867,
       "step": 245
     },
     {
       "epoch": 0.6915629322268326,
+      "grad_norm": 0.14980217814445496,
+      "learning_rate": 0.00016548607339452853,
+      "loss": 0.7809,
       "step": 250
     },
     {
       "epoch": 0.7053941908713693,
+      "grad_norm": 0.1530940979719162,
+      "learning_rate": 0.0001636379061199933,
+      "loss": 0.7745,
       "step": 255
     },
     {
       "epoch": 0.719225449515906,
+      "grad_norm": 0.13913874328136444,
+      "learning_rate": 0.0001617524614946192,
+      "loss": 0.7844,
       "step": 260
     },
     {
       "epoch": 0.7330567081604425,
+      "grad_norm": 0.1462012678384781,
+      "learning_rate": 0.00015983084396047653,
+      "loss": 0.781,
       "step": 265
     },
     {
       "epoch": 0.7468879668049793,
+      "grad_norm": 0.14574629068374634,
+      "learning_rate": 0.00015787417914873967,
+      "loss": 0.7801,
       "step": 270
     },
     {
       "epoch": 0.7607192254495159,
+      "grad_norm": 0.13526415824890137,
+      "learning_rate": 0.00015588361322032283,
+      "loss": 0.7629,
       "step": 275
     },
     {
       "epoch": 0.7745504840940526,
+      "grad_norm": 0.14606699347496033,
+      "learning_rate": 0.00015386031219449047,
+      "loss": 0.77,
       "step": 280
     },
     {
       "epoch": 0.7883817427385892,
+      "grad_norm": 0.14022809267044067,
+      "learning_rate": 0.0001518054612658348,
+      "loss": 0.7627,
       "step": 285
     },
     {
       "epoch": 0.8022130013831259,
+      "grad_norm": 0.1374536156654358,
+      "learning_rate": 0.00014972026411002107,
+      "loss": 0.7599,
       "step": 290
     },
     {
       "epoch": 0.8160442600276625,
+      "grad_norm": 0.15299181640148163,
+      "learning_rate": 0.00014760594217870737,
+      "loss": 0.754,
       "step": 295
     },
     {
       "epoch": 0.8298755186721992,
+      "grad_norm": 0.13170954585075378,
+      "learning_rate": 0.00014546373398405143,
+      "loss": 0.7542,
       "step": 300
     },
     {
       "epoch": 0.8437067773167358,
+      "grad_norm": 0.13624456524848938,
+      "learning_rate": 0.00014329489437322397,
+      "loss": 0.7527,
       "step": 305
     },
     {
       "epoch": 0.8575380359612724,
+      "grad_norm": 0.13965509831905365,
+      "learning_rate": 0.0001411006937933532,
+      "loss": 0.7544,
       "step": 310
     },
     {
       "epoch": 0.8713692946058091,
+      "grad_norm": 0.140866219997406,
+      "learning_rate": 0.00013888241754733208,
+      "loss": 0.7439,
       "step": 315
     },
     {
       "epoch": 0.8852005532503457,
+      "grad_norm": 0.1314583420753479,
+      "learning_rate": 0.0001366413650409223,
+      "loss": 0.7431,
       "step": 320
     },
     {
       "epoch": 0.8990318118948825,
+      "grad_norm": 0.123719722032547,
+      "learning_rate": 0.00013437884902159822,
+      "loss": 0.7377,
       "step": 325
     },
     {
       "epoch": 0.9128630705394191,
+      "grad_norm": 0.15139903128147125,
+      "learning_rate": 0.00013209619480957497,
+      "loss": 0.7316,
       "step": 330
     },
     {
       "epoch": 0.9266943291839558,
+      "grad_norm": 0.13264508545398712,
+      "learning_rate": 0.00012979473952147205,
+      "loss": 0.7251,
       "step": 335
     },
     {
       "epoch": 0.9405255878284924,
+      "grad_norm": 0.13327768445014954,
+      "learning_rate": 0.00012747583128706698,
+      "loss": 0.7523,
       "step": 340
     },
     {
       "epoch": 0.9543568464730291,
+      "grad_norm": 0.12970297038555145,
+      "learning_rate": 0.0001251408284595974,
+      "loss": 0.7293,
       "step": 345
     },
     {
       "epoch": 0.9681881051175657,
+      "grad_norm": 0.14814430475234985,
+      "learning_rate": 0.00012279109882007492,
+      "loss": 0.7262,
       "step": 350
     },
     {
       "epoch": 0.9820193637621023,
+      "grad_norm": 0.14566916227340698,
+      "learning_rate": 0.00012042801877607625,
+      "loss": 0.7223,
       "step": 355
     },
     {
       "epoch": 0.995850622406639,
+      "grad_norm": 0.12835277616977692,
+      "learning_rate": 0.00011805297255548118,
+      "loss": 0.7246,
       "step": 360
     },
     {
       "epoch": 0.9986168741355463,
+      "eval_loss": 0.7382652759552002,
+      "eval_runtime": 51.8183,
+      "eval_samples_per_second": 88.965,
+      "eval_steps_per_second": 2.798,
       "step": 361
     },
     {
+      "epoch": 1.0110650069156293,
+      "grad_norm": 0.14208853244781494,
+      "learning_rate": 0.00011566735139562947,
+      "loss": 0.7608,
+      "step": 365
+    },
+    {
+      "epoch": 1.0248962655601659,
+      "grad_norm": 0.12897470593452454,
+      "learning_rate": 0.00011327255272837221,
+      "loss": 0.581,
+      "step": 370
+    },
+    {
+      "epoch": 1.0387275242047027,
+      "grad_norm": 0.13211384415626526,
+      "learning_rate": 0.00011086997936149408,
+      "loss": 0.5594,
+      "step": 375
+    },
+    {
+      "epoch": 1.0525587828492393,
+      "grad_norm": 0.12891365587711334,
+      "learning_rate": 0.00010846103865698696,
+      "loss": 0.5629,
+      "step": 380
+    },
+    {
+      "epoch": 1.066390041493776,
+      "grad_norm": 0.13562671840190887,
+      "learning_rate": 0.00010604714170665544,
+      "loss": 0.5539,
+      "step": 385
+    },
+    {
+      "epoch": 1.0802213001383125,
+      "grad_norm": 0.12082191556692123,
+      "learning_rate": 0.00010362970250553796,
+      "loss": 0.5562,
+      "step": 390
+    },
+    {
+      "epoch": 1.0940525587828493,
+      "grad_norm": 0.12038177996873856,
+      "learning_rate": 0.00010121013712362684,
+      "loss": 0.5506,
+      "step": 395
+    },
+    {
+      "epoch": 1.107883817427386,
+      "grad_norm": 0.11669060587882996,
+      "learning_rate": 9.878986287637318e-05,
+      "loss": 0.552,
+      "step": 400
+    },
+    {
+      "epoch": 1.1217150760719226,
+      "grad_norm": 0.13290895521640778,
+      "learning_rate": 9.637029749446205e-05,
+      "loss": 0.5591,
+      "step": 405
+    },
+    {
+      "epoch": 1.1355463347164592,
+      "grad_norm": 0.14374125003814697,
+      "learning_rate": 9.395285829334458e-05,
+      "loss": 0.5549,
+      "step": 410
+    },
+    {
+      "epoch": 1.1493775933609958,
+      "grad_norm": 0.12170559167861938,
+      "learning_rate": 9.153896134301309e-05,
+      "loss": 0.5551,
+      "step": 415
+    },
+    {
+      "epoch": 1.1632088520055326,
+      "grad_norm": 0.12605440616607666,
+      "learning_rate": 8.913002063850593e-05,
+      "loss": 0.5548,
+      "step": 420
+    },
+    {
+      "epoch": 1.1770401106500692,
+      "grad_norm": 0.12290360033512115,
+      "learning_rate": 8.672744727162781e-05,
+      "loss": 0.5582,
+      "step": 425
+    },
+    {
+      "epoch": 1.1908713692946058,
+      "grad_norm": 0.11714824289083481,
+      "learning_rate": 8.433264860437056e-05,
+      "loss": 0.5484,
+      "step": 430
+    },
+    {
+      "epoch": 1.2047026279391424,
+      "grad_norm": 0.12463142722845078,
+      "learning_rate": 8.194702744451886e-05,
+      "loss": 0.5443,
+      "step": 435
+    },
+    {
+      "epoch": 1.2185338865836792,
+      "grad_norm": 0.11625911295413971,
+      "learning_rate": 7.957198122392377e-05,
+      "loss": 0.5373,
+      "step": 440
+    },
+    {
+      "epoch": 1.2323651452282158,
+      "grad_norm": 0.13417629897594452,
+      "learning_rate": 7.72089011799251e-05,
+      "loss": 0.548,
+      "step": 445
+    },
+    {
+      "epoch": 1.2461964038727524,
+      "grad_norm": 0.13118138909339905,
+      "learning_rate": 7.485917154040263e-05,
+      "loss": 0.5549,
+      "step": 450
+    },
+    {
+      "epoch": 1.260027662517289,
+      "grad_norm": 0.12697063386440277,
+      "learning_rate": 7.252416871293304e-05,
+      "loss": 0.5428,
+      "step": 455
+    },
+    {
+      "epoch": 1.2738589211618256,
+      "grad_norm": 0.12316182255744934,
+      "learning_rate": 7.020526047852797e-05,
+      "loss": 0.5465,
+      "step": 460
+    },
+    {
+      "epoch": 1.2876901798063622,
+      "grad_norm": 0.12391973286867142,
+      "learning_rate": 6.790380519042507e-05,
+      "loss": 0.5314,
+      "step": 465
+    },
+    {
+      "epoch": 1.301521438450899,
+      "grad_norm": 0.1304333359003067,
+      "learning_rate": 6.562115097840182e-05,
+      "loss": 0.5395,
+      "step": 470
+    },
+    {
+      "epoch": 1.3153526970954357,
+      "grad_norm": 0.13805896043777466,
+      "learning_rate": 6.335863495907772e-05,
+      "loss": 0.5357,
+      "step": 475
+    },
+    {
+      "epoch": 1.3291839557399723,
+      "grad_norm": 0.11262813210487366,
+      "learning_rate": 6.111758245266794e-05,
+      "loss": 0.5315,
+      "step": 480
+    },
+    {
+      "epoch": 1.343015214384509,
+      "grad_norm": 0.1169748455286026,
+      "learning_rate": 5.889930620664681e-05,
+      "loss": 0.5377,
+      "step": 485
+    },
+    {
+      "epoch": 1.3568464730290457,
+      "grad_norm": 0.1196884959936142,
+      "learning_rate": 5.670510562677607e-05,
+      "loss": 0.5328,
+      "step": 490
+    },
+    {
+      "epoch": 1.3706777316735823,
+      "grad_norm": 0.1325497031211853,
+      "learning_rate": 5.453626601594857e-05,
+      "loss": 0.5361,
+      "step": 495
+    },
+    {
+      "epoch": 1.384508990318119,
+      "grad_norm": 0.12347881495952606,
+      "learning_rate": 5.239405782129261e-05,
+      "loss": 0.5308,
+      "step": 500
+    },
+    {
+      "epoch": 1.3983402489626555,
+      "grad_norm": 0.12328892946243286,
+      "learning_rate": 5.027973588997896e-05,
+      "loss": 0.5324,
+      "step": 505
+    },
+    {
+      "epoch": 1.4121715076071921,
+      "grad_norm": 0.12255866080522537,
+      "learning_rate": 4.819453873416526e-05,
+      "loss": 0.5314,
+      "step": 510
+    },
+    {
+      "epoch": 1.426002766251729,
+      "grad_norm": 0.11596041172742844,
+      "learning_rate": 4.6139687805509535e-05,
+      "loss": 0.5247,
+      "step": 515
+    },
+    {
+      "epoch": 1.4398340248962656,
+      "grad_norm": 0.1198066920042038,
+      "learning_rate": 4.411638677967718e-05,
+      "loss": 0.5176,
+      "step": 520
+    },
+    {
+      "epoch": 1.4536652835408022,
+      "grad_norm": 0.11880145221948624,
+      "learning_rate": 4.212582085126038e-05,
+      "loss": 0.5209,
+      "step": 525
+    },
+    {
+      "epoch": 1.467496542185339,
+      "grad_norm": 0.11095953732728958,
+      "learning_rate": 4.016915603952347e-05,
+      "loss": 0.5276,
+      "step": 530
+    },
+    {
+      "epoch": 1.4813278008298756,
+      "grad_norm": 0.11372353136539459,
+      "learning_rate": 3.824753850538082e-05,
+      "loss": 0.5196,
+      "step": 535
+    },
+    {
+      "epoch": 1.4951590594744122,
+      "grad_norm": 0.1171097531914711,
+      "learning_rate": 3.636209388000673e-05,
+      "loss": 0.5273,
+      "step": 540
+    },
+    {
+      "epoch": 1.5089903181189488,
+      "grad_norm": 0.11904237419366837,
+      "learning_rate": 3.45139266054715e-05,
+      "loss": 0.5202,
+      "step": 545
+    },
+    {
+      "epoch": 1.5228215767634854,
+      "grad_norm": 0.11875548213720322,
+      "learning_rate": 3.270411928778948e-05,
+      "loss": 0.5161,
+      "step": 550
+    },
+    {
+      "epoch": 1.536652835408022,
+      "grad_norm": 0.10985347628593445,
+      "learning_rate": 3.093373206275775e-05,
+      "loss": 0.5154,
+      "step": 555
+    },
+    {
+      "epoch": 1.5504840940525588,
+      "grad_norm": 0.11928029358386993,
+      "learning_rate": 2.9203801974957666e-05,
+      "loss": 0.5075,
+      "step": 560
+    },
+    {
+      "epoch": 1.5643153526970954,
+      "grad_norm": 0.11182394623756409,
+      "learning_rate": 2.751534237028227e-05,
+      "loss": 0.518,
+      "step": 565
+    },
+    {
+      "epoch": 1.5781466113416323,
+      "grad_norm": 0.11065185815095901,
+      "learning_rate": 2.5869342302345945e-05,
+      "loss": 0.5112,
+      "step": 570
+    },
+    {
+      "epoch": 1.5919778699861689,
+      "grad_norm": 0.10788684338331223,
+      "learning_rate": 2.4266765953123814e-05,
+      "loss": 0.5201,
+      "step": 575
+    },
+    {
+      "epoch": 1.6058091286307055,
+      "grad_norm": 0.12274058163166046,
+      "learning_rate": 2.2708552068160115e-05,
+      "loss": 0.5122,
+      "step": 580
+    },
+    {
+      "epoch": 1.619640387275242,
+      "grad_norm": 0.11127958446741104,
+      "learning_rate": 2.1195613406676706e-05,
+      "loss": 0.5183,
+      "step": 585
+    },
+    {
+      "epoch": 1.6334716459197787,
+      "grad_norm": 0.11190652847290039,
+      "learning_rate": 1.9728836206903656e-05,
+      "loss": 0.5052,
+      "step": 590
+    },
+    {
+      "epoch": 1.6473029045643153,
+      "grad_norm": 0.10597892105579376,
+      "learning_rate": 1.8309079666944883e-05,
+      "loss": 0.5038,
+      "step": 595
+    },
+    {
+      "epoch": 1.6611341632088519,
+      "grad_norm": 0.11393982917070389,
+      "learning_rate": 1.6937175441483455e-05,
+      "loss": 0.4965,
+      "step": 600
+    },
+    {
+      "epoch": 1.6749654218533887,
+      "grad_norm": 0.11752679198980331,
+      "learning_rate": 1.561392715462098e-05,
+      "loss": 0.5137,
+      "step": 605
+    },
+    {
+      "epoch": 1.6887966804979253,
+      "grad_norm": 0.10845957696437836,
+      "learning_rate": 1.4340109929136291e-05,
+      "loss": 0.5051,
+      "step": 610
+    },
+    {
+      "epoch": 1.702627939142462,
+      "grad_norm": 0.11502846330404282,
+      "learning_rate": 1.3116469932439968e-05,
+      "loss": 0.5065,
+      "step": 615
+    },
+    {
+      "epoch": 1.7164591977869987,
+      "grad_norm": 0.10717900097370148,
+      "learning_rate": 1.1943723939489516e-05,
+      "loss": 0.4963,
+      "step": 620
+    },
+    {
+      "epoch": 1.7302904564315353,
+      "grad_norm": 0.11207237094640732,
+      "learning_rate": 1.0822558912922265e-05,
+      "loss": 0.4953,
+      "step": 625
+    },
+    {
+      "epoch": 1.744121715076072,
+      "grad_norm": 0.10951482504606247,
+      "learning_rate": 9.753631600651458e-06,
+      "loss": 0.4875,
+      "step": 630
+    },
+    {
+      "epoch": 1.7579529737206085,
+      "grad_norm": 0.10522555559873581,
+      "learning_rate": 8.737568151161024e-06,
+      "loss": 0.5041,
+      "step": 635
+    },
+    {
+      "epoch": 1.7717842323651452,
+      "grad_norm": 0.10585460811853409,
+      "learning_rate": 7.774963746725073e-06,
+      "loss": 0.5084,
+      "step": 640
+    },
+    {
+      "epoch": 1.7856154910096818,
+      "grad_norm": 0.10949090868234634,
+      "learning_rate": 6.866382254766157e-06,
+      "loss": 0.5026,
+      "step": 645
+    },
+    {
+      "epoch": 1.7994467496542186,
+      "grad_norm": 0.10402841866016388,
+      "learning_rate": 6.0123558975572645e-06,
+      "loss": 0.4941,
+      "step": 650
+    },
+    {
+      "epoch": 1.8132780082987552,
+      "grad_norm": 0.1075565442442894,
+      "learning_rate": 5.213384940460408e-06,
+      "loss": 0.5069,
+      "step": 655
+    },
+    {
+      "epoch": 1.8271092669432918,
+      "grad_norm": 0.11545684933662415,
+      "learning_rate": 4.46993739888486e-06,
+      "loss": 0.5089,
+      "step": 660
+    },
+    {
+      "epoch": 1.8409405255878286,
+      "grad_norm": 0.11043102294206619,
+      "learning_rate": 3.7824487641364594e-06,
+      "loss": 0.4983,
+      "step": 665
+    },
+    {
+      "epoch": 1.8547717842323652,
+      "grad_norm": 0.10490711033344269,
+      "learning_rate": 3.151321748318692e-06,
+      "loss": 0.5023,
+      "step": 670
+    },
+    {
+      "epoch": 1.8686030428769018,
+      "grad_norm": 0.10829063504934311,
+      "learning_rate": 2.5769260484349466e-06,
+      "loss": 0.4941,
+      "step": 675
+    },
+    {
+      "epoch": 1.8824343015214384,
+      "grad_norm": 0.10498882085084915,
+      "learning_rate": 2.059598129829976e-06,
+      "loss": 0.5005,
+      "step": 680
+    },
+    {
+      "epoch": 1.896265560165975,
+      "grad_norm": 0.10581523925065994,
+      "learning_rate": 1.5996410290978314e-06,
+      "loss": 0.4905,
+      "step": 685
+    },
+    {
+      "epoch": 1.9100968188105116,
+      "grad_norm": 0.11104903370141983,
+      "learning_rate": 1.1973241765711352e-06,
+      "loss": 0.4912,
+      "step": 690
+    },
+    {
+      "epoch": 1.9239280774550485,
+      "grad_norm": 0.10330861806869507,
+      "learning_rate": 8.52883238496227e-07,
+      "loss": 0.4814,
+      "step": 695
+    },
+    {
+      "epoch": 1.937759336099585,
+      "grad_norm": 0.10697363317012787,
+      "learning_rate": 5.665199789862907e-07,
+      "loss": 0.4984,
+      "step": 700
+    },
+    {
+      "epoch": 1.9515905947441217,
+      "grad_norm": 0.10598309338092804,
+      "learning_rate": 3.3840214183337157e-07,
+      "loss": 0.4924,
+      "step": 705
+    },
+    {
+      "epoch": 1.9654218533886585,
+      "grad_norm": 0.10365109890699387,
+      "learning_rate": 1.686633522487213e-07,
+      "loss": 0.4958,
+      "step": 710
+    },
+    {
+      "epoch": 1.979253112033195,
+      "grad_norm": 0.10521161556243896,
+      "learning_rate": 5.740303858874363e-08,
+      "loss": 0.5027,
+      "step": 715
+    },
+    {
+      "epoch": 1.9930843706777317,
+      "grad_norm": 0.10533251613378525,
+      "learning_rate": 4.686374112583547e-09,
+      "loss": 0.4935,
+      "step": 720
+    },
+    {
+      "epoch": 1.9986168741355463,
+      "eval_loss": 0.6898206472396851,
+      "eval_runtime": 51.7983,
+      "eval_samples_per_second": 88.999,
+      "eval_steps_per_second": 2.799,
+      "step": 722
+    },
+    {
+      "epoch": 1.9986168741355463,
+      "step": 722,
+      "total_flos": 3.364677087628624e+18,
+      "train_loss": 0.7622144496341822,
+      "train_runtime": 4614.5072,
+      "train_samples_per_second": 20.054,
+      "train_steps_per_second": 0.156
     }
   ],
   "logging_steps": 5,
+  "max_steps": 722,
   "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
   "save_steps": 500,
   "stateful_callbacks": {
     "TrainerControl": {
       "attributes": {}
     }
   },
+  "total_flos": 3.364677087628624e+18,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4394984455d4ffe3e51e3b2431658cf9b616f4718e0ca4da0047bdbe4ff3859e
+oid sha256:3e9a343aa9f12c033062e11705d146dffaacd0bf53572b9e68d2ca60f23368e7
 size 7096