Instructions to use Ba2han/model-sft-q2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Ba2han/model-sft-q2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/model-sft-q2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/model-sft-q2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/model-sft-q2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Ba2han/model-sft-q2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ba2han/model-sft-q2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/model-sft-q2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
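
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style chat completion body. The helper names `build_chat_request` and `chat` are hypothetical, chosen for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# print(chat("http://localhost:8000", "Ba2han/model-sft-q2", "What is the capital of France?"))
```

Because only the base URL changes, the same helper should also work against an SGLang server by passing `http://localhost:30000`.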

Use Docker

docker model run hf.co/Ba2han/model-sft-q2

SGLang

How to use Ba2han/model-sft-q2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/model-sft-q2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/model-sft-q2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/model-sft-q2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/model-sft-q2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Ba2han/model-sft-q2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/model-sft-q2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/model-sft-q2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/model-sft-q2 to start chatting

Load model with FastModel

# Install first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/model-sft-q2",
    max_seq_length=2048,
)

Docker Model Runner
How to use Ba2han/model-sft-q2 with Docker Model Runner:
```
docker model run hf.co/Ba2han/model-sft-q2
```

Ba2han committed 3 days ago

Commit d6c4e1f (verified) · 1 parent: 3f726c1

Training in progress, step 202, checkpoint

Files changed (8)

last-checkpoint/config.json +1 -1
last-checkpoint/generation_config.json +1 -1
last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +2 -2
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/tokenizer_config.json +1 -0
last-checkpoint/trainer_state.json +329 -838
last-checkpoint/training_args.bin +2 -2
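
Most of the changed lines are in `trainer_state.json`, whose `log_history` list mixes training entries (with a `loss` key) and evaluation entries (with an `eval_loss` key). Below is a minimal sketch for separating the two, assuming the checkpoint directory has been downloaded locally. The function names are hypothetical, and the sample entries are copied from the removed side of the diff on this page.

```python
import json
from pathlib import Path


def load_trainer_state(checkpoint_dir: str) -> dict:
    """Read trainer_state.json from a local checkpoint directory."""
    return json.loads(Path(checkpoint_dir, "trainer_state.json").read_text())


def split_log_history(state: dict) -> tuple[list[dict], list[dict]]:
    """Return (training entries, evaluation entries) from a trainer state dict."""
    train = [entry for entry in state["log_history"] if "loss" in entry]
    evals = [entry for entry in state["log_history"] if "eval_loss" in entry]
    return train, evals


# Three entries taken from the pre-commit log_history shown in the diff.
sample = {
    "log_history": [
        {"epoch": 0.1485884101040119, "grad_norm": 0.9296875,
         "learning_rate": 5e-05, "loss": 1.5603207349777222, "step": 50},
        {"epoch": 0.1515601783060921, "eval_loss": 1.49801504611969,
         "eval_runtime": 1.855, "eval_samples_per_second": 58.76,
         "eval_steps_per_second": 7.547, "step": 51},
        {"epoch": 0.15453194650817237, "grad_norm": 0.8359375,
         "learning_rate": 5e-05, "loss": 1.5468140840530396, "step": 52},
    ]
}

train, evals = split_log_history(sample)
print([entry["step"] for entry in train])  # [50, 52]
print([entry["step"] for entry in evals])  # [51]
```

With a real checkpoint, `split_log_history(load_trainer_state("last-checkpoint"))` would give the two series ready for plotting or comparison between commits.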

last-checkpoint/config.json CHANGED

@@ -74,7 +74,7 @@
   },
   "sliding_window": null,
   "tie_word_embeddings": true,
-  "transformers_version": "5.5.0",
+  "transformers_version": "5.9.0",
   "unsloth_version": "2026.5.5",
   "use_cache": false,
   "use_sliding_window": false,

last-checkpoint/generation_config.json CHANGED

@@ -9,6 +9,6 @@
   "output_attentions": false,
   "output_hidden_states": false,
   "pad_token_id": 50034,
-  "transformers_version": "5.5.0",
+  "transformers_version": "5.9.0",
   "use_cache": false
 }

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ddd73a9f4705d43af46441fdabae4f1fcdb54eba8592d8aa3b59157cb049a61d
+oid sha256:e86ea7e76d56fce0e7cf1b5ead34055c12d9e1c046d85c8d5e52e91e6ee5f69d
 size 1049614696

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:75180711f91895ee76c5f92a1790222aed1a68917f97d9bddb9b8c90258b062e
-size 1372902609
+oid sha256:987fe636d2e5772693b0825e190d31af506ea6d4eec9f928250ac37333c1578a
+size 1372902161

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4956fb60be60c7de5858bdceb864c9f31c63309229a058e355df8cca31faf0ff
+oid sha256:5315fadac529441a6634fe188c8336bd2e818d51c6e2348dc0d1261d0463f5d4
 size 1465
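
The `model.safetensors`, `optimizer.pt`, and `scheduler.pt` entries above are Git LFS pointer files, not the binaries themselves: each records a spec version, a SHA-256 object ID, and a byte size. The following is a minimal sketch of parsing such a pointer and checking a downloaded file against it. The function names are hypothetical, and the sketch assumes the `oid` always uses `sha256`, as it does in every pointer shown here.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    hasher = hashlib.sha256()  # assumes pointer["algo"] == "sha256"
    size = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Post-commit pointer for scheduler.pt, copied from the diff above.
new_scheduler_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5315fadac529441a6634fe188c8336bd2e818d51c6e2348dc0d1261d0463f5d4
size 1465
"""

pointer = parse_lfs_pointer(new_scheduler_pointer)
print(pointer["algo"], pointer["size"])
```

Reading the file in chunks keeps memory use flat, which matters for the roughly 1 GB `model.safetensors` and 1.4 GB `optimizer.pt` objects.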

last-checkpoint/tokenizer_config.json CHANGED

@@ -8,6 +8,7 @@
     "<|im_end|>"
   ],
   "is_local": true,
+  "local_files_only": false,
   "model_input_names": [
     "input_ids",
     "attention_mask"

last-checkpoint/trainer_state.json CHANGED

@@ -2,1251 +2,742 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 1.0,
-  "eval_steps": 51,
-  "global_step": 337,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.005943536404160475,
-      "grad_norm": 1.8359375,
-      "learning_rate": 7.142857142857143e-06,
-      "loss": 1.5533452033996582,
       "step": 2
     },
     {
-      "epoch": 0.01188707280832095,
-      "grad_norm": 1.9609375,
-      "learning_rate": 2.1428571428571428e-05,
-      "loss": 1.618288278579712,
       "step": 4
     },
     {
-      "epoch": 0.017830609212481426,
-      "grad_norm": 1.8671875,
-      "learning_rate": 3.571428571428572e-05,
-      "loss": 1.5396310091018677,
       "step": 6
     },
     {
-      "epoch": 0.0237741456166419,
-      "grad_norm": 1.890625,
-      "learning_rate": 5e-05,
-      "loss": 1.579732894897461,
       "step": 8
     },
     {
-      "epoch": 0.029717682020802376,
-      "grad_norm": 1.1640625,
-      "learning_rate": 5e-05,
-      "loss": 1.5438868999481201,
       "step": 10
     },
     {
-      "epoch": 0.03566121842496285,
-      "grad_norm": 1.0,
       "learning_rate": 5e-05,
-      "loss": 1.4871983528137207,
       "step": 12
     },
     {
-      "epoch": 0.041604754829123326,
-      "grad_norm": 0.8828125,
       "learning_rate": 5e-05,
-      "loss": 1.529313325881958,
       "step": 14
     },
     {
-      "epoch": 0.0475482912332838,
-      "grad_norm": 1.234375,
       "learning_rate": 5e-05,
-      "loss": 1.494563341140747,
       "step": 16
     },
     {
-      "epoch": 0.05349182763744428,
-      "grad_norm": 1.2109375,
       "learning_rate": 5e-05,
-      "loss": 1.5288245677947998,
       "step": 18
     },
     {
-      "epoch": 0.05943536404160475,
-      "grad_norm": 1.109375,
       "learning_rate": 5e-05,
-      "loss": 1.5613418817520142,
       "step": 20
     },
     {
-      "epoch": 0.06537890044576523,
-      "grad_norm": 0.88671875,
       "learning_rate": 5e-05,
-      "loss": 1.4736829996109009,
       "step": 22
     },
     {
-      "epoch": 0.0713224368499257,
-      "grad_norm": 0.78515625,
       "learning_rate": 5e-05,
-      "loss": 1.3815312385559082,
       "step": 24
     },
     {
-      "epoch": 0.07726597325408618,
-      "grad_norm": 0.8515625,
       "learning_rate": 5e-05,
-      "loss": 1.5204863548278809,
       "step": 26
     },
     {
-      "epoch": 0.08320950965824665,
-      "grad_norm": 0.9375,
       "learning_rate": 5e-05,
-      "loss": 1.4796512126922607,
       "step": 28
     },
     {
-      "epoch": 0.08915304606240713,
-      "grad_norm": 0.78125,
       "learning_rate": 5e-05,
-      "loss": 1.4494001865386963,
       "step": 30
     },
     {
-      "epoch": 0.0950965824665676,
-      "grad_norm": 0.9375,
       "learning_rate": 5e-05,
-      "loss": 1.4973952770233154,
       "step": 32
     },
     {
-      "epoch": 0.10104011887072809,
-      "grad_norm": 0.875,
       "learning_rate": 5e-05,
-      "loss": 1.4398908615112305,
       "step": 34
     },
     {
-      "epoch": 0.10698365527488855,
-      "grad_norm": 0.9609375,
       "learning_rate": 5e-05,
-      "loss": 1.3991800546646118,
       "step": 36
     },
     {
-      "epoch": 0.11292719167904904,
-      "grad_norm": 0.80859375,
       "learning_rate": 5e-05,
-      "loss": 1.475541353225708,
       "step": 38
     },
     {
-      "epoch": 0.1188707280832095,
-      "grad_norm": 0.9375,
       "learning_rate": 5e-05,
-      "loss": 1.4666297435760498,
       "step": 40
     },
     {
-      "epoch": 0.12481426448736999,
-      "grad_norm": 0.859375,
       "learning_rate": 5e-05,
-      "loss": 1.4133172035217285,
       "step": 42
     },
     {
-      "epoch": 0.13075780089153047,
-      "grad_norm": 0.83203125,
       "learning_rate": 5e-05,
-      "loss": 1.4405813217163086,
       "step": 44
     },
     {
-      "epoch": 0.13670133729569092,
-      "grad_norm": 0.7265625,
       "learning_rate": 5e-05,
-      "loss": 1.5167930126190186,
       "step": 46
     },
     {
-      "epoch": 0.1426448736998514,
-      "grad_norm": 0.88671875,
       "learning_rate": 5e-05,
-      "loss": 1.35231614112854,
       "step": 48
     },
     {
-      "epoch": 0.1485884101040119,
-      "grad_norm": 0.9296875,
       "learning_rate": 5e-05,
-      "loss": 1.5603207349777222,
       "step": 50
     },
     {
-      "epoch": 0.1515601783060921,
-      "eval_loss": 1.49801504611969,
-      "eval_runtime": 1.855,
-      "eval_samples_per_second": 58.76,
-      "eval_steps_per_second": 7.547,
-      "step": 51
-    },
-    {
-      "epoch": 0.15453194650817237,
-      "grad_norm": 0.8359375,
       "learning_rate": 5e-05,
-      "loss": 1.5468140840530396,
       "step": 52
     },
     {
-      "epoch": 0.16047548291233285,
-      "grad_norm": 0.8515625,
       "learning_rate": 5e-05,
-      "loss": 1.483694076538086,
       "step": 54
     },
     {
-      "epoch": 0.1664190193164933,
-      "grad_norm": 0.859375,
       "learning_rate": 5e-05,
-      "loss": 1.3892195224761963,
       "step": 56
     },
     {
-      "epoch": 0.1723625557206538,
-      "grad_norm": 0.8359375,
       "learning_rate": 5e-05,
-      "loss": 1.4213433265686035,
       "step": 58
     },
     {
-      "epoch": 0.17830609212481427,
-      "grad_norm": 0.75,
       "learning_rate": 5e-05,
-      "loss": 1.5514280796051025,
       "step": 60
     },
     {
-      "epoch": 0.18424962852897475,
-      "grad_norm": 0.7265625,
       "learning_rate": 5e-05,
-      "loss": 1.415259599685669,
       "step": 62
     },
     {
-      "epoch": 0.1901931649331352,
-      "grad_norm": 0.83984375,
       "learning_rate": 5e-05,
-      "loss": 1.4870233535766602,
       "step": 64
     },
     {
-      "epoch": 0.1961367013372957,
-      "grad_norm": 0.83203125,
       "learning_rate": 5e-05,
-      "loss": 1.5115995407104492,
       "step": 66
     },
     {
-      "epoch": 0.20208023774145617,
-      "grad_norm": 0.83984375,
       "learning_rate": 5e-05,
-      "loss": 1.5251541137695312,
       "step": 68
     },
     {
-      "epoch": 0.20802377414561665,
-      "grad_norm": 0.8515625,
       "learning_rate": 5e-05,
-      "loss": 1.4827852249145508,
       "step": 70
     },
     {
-      "epoch": 0.2139673105497771,
-      "grad_norm": 1.0078125,
       "learning_rate": 5e-05,
-      "loss": 1.563201904296875,
       "step": 72
     },
     {
-      "epoch": 0.2199108469539376,
-      "grad_norm": 0.73046875,
       "learning_rate": 5e-05,
-      "loss": 1.488223910331726,
       "step": 74
     },
     {
-      "epoch": 0.22585438335809807,
-      "grad_norm": 0.8828125,
       "learning_rate": 5e-05,
-      "loss": 1.4918928146362305,
       "step": 76
     },
     {
-      "epoch": 0.23179791976225855,
-      "grad_norm": 0.78515625,
       "learning_rate": 5e-05,
-      "loss": 1.499556541442871,
       "step": 78
     },
     {
-      "epoch": 0.237741456166419,
-      "grad_norm": 0.68359375,
       "learning_rate": 5e-05,
-      "loss": 1.398608684539795,
       "step": 80
     },
     {
-      "epoch": 0.2436849925705795,
-      "grad_norm": 0.76953125,
       "learning_rate": 5e-05,
-      "loss": 1.491286277770996,
       "step": 82
     },
     {
-      "epoch": 0.24962852897473997,
-      "grad_norm": 0.93359375,
       "learning_rate": 5e-05,
-      "loss": 1.3734058141708374,
       "step": 84
     },
     {
-      "epoch": 0.2555720653789004,
-      "grad_norm": 0.9453125,
       "learning_rate": 5e-05,
-      "loss": 1.5038151741027832,
       "step": 86
     },
     {
-      "epoch": 0.26151560178306094,
-      "grad_norm": 0.73828125,
       "learning_rate": 5e-05,
-      "loss": 1.5082918405532837,
       "step": 88
     },
     {
-      "epoch": 0.2674591381872214,
-      "grad_norm": 0.73828125,
       "learning_rate": 5e-05,
-      "loss": 1.5045579671859741,
       "step": 90
     },
     {
-      "epoch": 0.27340267459138184,
-      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
-      "loss": 1.550144910812378,
       "step": 92
     },
     {
-      "epoch": 0.27934621099554235,
       "grad_norm": 0.6875,
       "learning_rate": 5e-05,
-      "loss": 1.459304690361023,
       "step": 94
     },
     {
-      "epoch": 0.2852897473997028,
-      "grad_norm": 0.84765625,
       "learning_rate": 5e-05,
-      "loss": 1.4019618034362793,
       "step": 96
     },
     {
-      "epoch": 0.2912332838038633,
-      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
-      "loss": 1.4445351362228394,
       "step": 98
     },
     {
-      "epoch": 0.2971768202080238,
-      "grad_norm": 1.0078125,
       "learning_rate": 5e-05,
-      "loss": 1.5843751430511475,
       "step": 100
     },
     {
-      "epoch": 0.3031203566121842,
-      "grad_norm": 0.8203125,
       "learning_rate": 5e-05,
-      "loss": 1.4642349481582642,
-      "step": 102
-    },
-    {
-      "epoch": 0.3031203566121842,
-      "eval_loss": 1.4828336238861084,
-      "eval_runtime": 1.1636,
-      "eval_samples_per_second": 93.674,
-      "eval_steps_per_second": 12.032,
       "step": 102
     },
     {
-      "epoch": 0.30906389301634474,
-      "grad_norm": 0.99609375,
       "learning_rate": 5e-05,
-      "loss": 1.4447834491729736,
       "step": 104
     },
     {
-      "epoch": 0.3150074294205052,
-      "grad_norm": 0.94921875,
       "learning_rate": 5e-05,
-      "loss": 1.4195196628570557,
       "step": 106
     },
     {
-      "epoch": 0.3209509658246657,
-      "grad_norm": 0.8046875,
       "learning_rate": 5e-05,
-      "loss": 1.4759737253189087,
       "step": 108
     },
     {
-      "epoch": 0.32689450222882616,
-      "grad_norm": 0.796875,
       "learning_rate": 5e-05,
-      "loss": 1.5813639163970947,
       "step": 110
     },
     {
-      "epoch": 0.3328380386329866,
-      "grad_norm": 0.8828125,
       "learning_rate": 5e-05,
-      "loss": 1.4312057495117188,
       "step": 112
     },
     {
-      "epoch": 0.3387815750371471,
-      "grad_norm": 0.8125,
       "learning_rate": 5e-05,
-      "loss": 1.405687689781189,
       "step": 114
     },
     {
-      "epoch": 0.3447251114413076,
-      "grad_norm": 0.89453125,
       "learning_rate": 5e-05,
-      "loss": 1.4332811832427979,
       "step": 116
     },
     {
-      "epoch": 0.35066864784546803,
-      "grad_norm": 0.72265625,
       "learning_rate": 5e-05,
-      "loss": 1.4324063062667847,
       "step": 118
     },
     {
-      "epoch": 0.35661218424962854,
-      "grad_norm": 0.89453125,
       "learning_rate": 5e-05,
-      "loss": 1.5157840251922607,
       "step": 120
     },
     {
-      "epoch": 0.362555720653789,
-      "grad_norm": 0.9375,
       "learning_rate": 5e-05,
-      "loss": 1.4901947975158691,
       "step": 122
     },
     {
-      "epoch": 0.3684992570579495,
-      "grad_norm": 0.85546875,
       "learning_rate": 5e-05,
-      "loss": 1.4857661724090576,
       "step": 124
     },
     {
-      "epoch": 0.37444279346210996,
-      "grad_norm": 0.75390625,
       "learning_rate": 5e-05,
-      "loss": 1.4482135772705078,
       "step": 126
     },
     {
-      "epoch": 0.3803863298662704,
-      "grad_norm": 0.86328125,
       "learning_rate": 5e-05,
-      "loss": 1.5095102787017822,
       "step": 128
     },
     {
-      "epoch": 0.3863298662704309,
-      "grad_norm": 0.8828125,
       "learning_rate": 5e-05,
-      "loss": 1.4518234729766846,
       "step": 130
     },
     {
-      "epoch": 0.3922734026745914,
-      "grad_norm": 0.7265625,
       "learning_rate": 5e-05,
-      "loss": 1.4330897331237793,
       "step": 132
     },
     {
-      "epoch": 0.39821693907875183,
-      "grad_norm": 0.796875,
       "learning_rate": 5e-05,
-      "loss": 1.462794303894043,
       "step": 134
     },
     {
-      "epoch": 0.40416047548291234,
-      "grad_norm": 0.76171875,
       "learning_rate": 5e-05,
-      "loss": 1.4570199251174927,
       "step": 136
     },
     {
-      "epoch": 0.4101040118870728,
-      "grad_norm": 0.8671875,
       "learning_rate": 5e-05,
-      "loss": 1.50748610496521,
       "step": 138
     },
     {
-      "epoch": 0.4160475482912333,
-      "grad_norm": 0.86328125,
       "learning_rate": 5e-05,
-      "loss": 1.478920578956604,
       "step": 140
     },
     {
-      "epoch": 0.42199108469539376,
-      "grad_norm": 0.8359375,
       "learning_rate": 5e-05,
-      "loss": 1.4309303760528564,
       "step": 142
     },
     {
-      "epoch": 0.4279346210995542,
-      "grad_norm": 0.6875,
       "learning_rate": 5e-05,
-      "loss": 1.5331854820251465,
       "step": 144
     },
     {
-      "epoch": 0.4338781575037147,
-      "grad_norm": 0.78125,
       "learning_rate": 5e-05,
-      "loss": 1.426405668258667,
       "step": 146
     },
     {
-      "epoch": 0.4398216939078752,
-      "grad_norm": 0.87109375,
       "learning_rate": 5e-05,
-      "loss": 1.4882712364196777,
       "step": 148
     },
     {
-      "epoch": 0.4457652303120357,
-      "grad_norm": 0.8359375,
       "learning_rate": 5e-05,
-      "loss": 1.5059183835983276,
       "step": 150
     },
     {
-      "epoch": 0.45170876671619614,
-      "grad_norm": 0.8671875,
       "learning_rate": 5e-05,
-      "loss": 1.3766722679138184,
       "step": 152
     },
     {
-      "epoch": 0.45468053491827637,
-      "eval_loss": 1.4696053266525269,
-      "eval_runtime": 1.1246,
-      "eval_samples_per_second": 96.922,
-      "eval_steps_per_second": 12.449,
-      "step": 153
     },
     {
-      "epoch": 0.4576523031203566,
-      "grad_norm": 0.81640625,
       "learning_rate": 5e-05,
-      "loss": 1.4256658554077148,
       "step": 154
     },
     {
-      "epoch": 0.4635958395245171,
-      "grad_norm": 0.83984375,
       "learning_rate": 5e-05,
-      "loss": 1.458268165588379,
       "step": 156
     },
     {
-      "epoch": 0.46953937592867756,
-      "grad_norm": 0.94921875,
       "learning_rate": 5e-05,
-      "loss": 1.3582110404968262,
       "step": 158
     },
     {
-      "epoch": 0.475482912332838,
-      "grad_norm": 0.875,
       "learning_rate": 5e-05,
-      "loss": 1.435016393661499,
       "step": 160
     },
     {
-      "epoch": 0.4814264487369985,
-      "grad_norm": 0.734375,
       "learning_rate": 5e-05,
-      "loss": 1.4493205547332764,
       "step": 162
     },
     {
-      "epoch": 0.487369985141159,
-      "grad_norm": 0.734375,
       "learning_rate": 5e-05,
-      "loss": 1.4993988275527954,
       "step": 164
     },
     {
-      "epoch": 0.4933135215453195,
-      "grad_norm": 0.72265625,
       "learning_rate": 5e-05,
-      "loss": 1.5079882144927979,
       "step": 166
     },
     {
-      "epoch": 0.49925705794947994,
-      "grad_norm": 0.75390625,
       "learning_rate": 5e-05,
-      "loss": 1.518232822418213,
       "step": 168
     },
     {
-      "epoch": 0.5052005943536404,
-      "grad_norm": 0.90234375,
       "learning_rate": 5e-05,
-      "loss": 1.5173208713531494,
       "step": 170
     },
     {
-      "epoch": 0.5111441307578009,
-      "grad_norm": 0.8359375,
       "learning_rate": 5e-05,
-      "loss": 1.4525389671325684,
       "step": 172
     },
     {
-      "epoch": 0.5170876671619614,
-      "grad_norm": 0.90234375,
       "learning_rate": 5e-05,
-      "loss": 1.5169626474380493,
       "step": 174
     },
     {
-      "epoch": 0.5230312035661219,
-      "grad_norm": 0.7578125,
       "learning_rate": 5e-05,
-      "loss": 1.3763575553894043,
       "step": 176
     },
     {
-      "epoch": 0.5289747399702823,
-      "grad_norm": 0.78125,
       "learning_rate": 5e-05,
-      "loss": 1.500097393989563,
       "step": 178
     },
     {
-      "epoch": 0.5349182763744428,
-      "grad_norm": 0.93359375,
       "learning_rate": 5e-05,
-      "loss": 1.4460171461105347,
       "step": 180
     },
     {
-      "epoch": 0.5408618127786032,
-      "grad_norm": 0.79296875,
       "learning_rate": 5e-05,
-      "loss": 1.4529346227645874,
       "step": 182
     },
     {
-      "epoch": 0.5468053491827637,
-      "grad_norm": 0.86328125,
       "learning_rate": 5e-05,
-      "loss": 1.4821476936340332,
       "step": 184
     },
     {
-      "epoch": 0.5527488855869243,
-      "grad_norm": 1.078125,
       "learning_rate": 5e-05,
-      "loss": 1.4030323028564453,
       "step": 186
     },
     {
-      "epoch": 0.5586924219910847,
-      "grad_norm": 1.0,
       "learning_rate": 5e-05,
-      "loss": 1.416299819946289,
       "step": 188
     },
     {
-      "epoch": 0.5646359583952452,
-      "grad_norm": 0.85546875,
       "learning_rate": 5e-05,
-      "loss": 1.4422305822372437,
       "step": 190
     },
     {
-      "epoch": 0.5705794947994056,
-      "grad_norm": 0.83203125,
       "learning_rate": 5e-05,
-      "loss": 1.4656535387039185,
       "step": 192
     },
     {
-      "epoch": 0.5765230312035661,
-      "grad_norm": 0.7421875,
       "learning_rate": 5e-05,
-      "loss": 1.4635984897613525,
       "step": 194
     },
     {
-      "epoch": 0.5824665676077266,
-      "grad_norm": 0.8828125,
       "learning_rate": 5e-05,
-      "loss": 1.4563398361206055,
       "step": 196
     },
     {
-      "epoch": 0.5884101040118871,
-      "grad_norm": 0.81640625,
       "learning_rate": 5e-05,
-      "loss": 1.4318304061889648,
       "step": 198
     },
     {
-      "epoch": 0.5943536404160475,
-      "grad_norm": 0.90234375,
       "learning_rate": 5e-05,
-      "loss": 1.389236569404602,
       "step": 200
     },
     {
-      "epoch": 0.600297176820208,
-      "grad_norm": 0.85546875,
       "learning_rate": 5e-05,
-      "loss": 1.4266612529754639,
       "step": 202
-    },
-    {
-      "epoch": 0.6062407132243685,
-      "grad_norm": 0.734375,
-      "learning_rate": 5e-05,
-      "loss": 1.5085554122924805,
-      "step": 204
-    },
-    {
-      "epoch": 0.6062407132243685,
-      "eval_loss": 1.4601037502288818,
-      "eval_runtime": 1.128,
-      "eval_samples_per_second": 96.628,
-      "eval_steps_per_second": 12.411,
-      "step": 204
-    },
-    {
-      "epoch": 0.612184249628529,
-      "grad_norm": 0.734375,
-      "learning_rate": 5e-05,
-      "loss": 1.4719808101654053,
-      "step": 206
-    },
-    {
-      "epoch": 0.6181277860326895,
-      "grad_norm": 0.90625,
-      "learning_rate": 5e-05,
-      "loss": 1.4344429969787598,
-      "step": 208
-    },
-    {
-      "epoch": 0.6240713224368499,
-      "grad_norm": 0.81640625,
-      "learning_rate": 5e-05,
-      "loss": 1.4264543056488037,
-      "step": 210
-    },
-    {
-      "epoch": 0.6300148588410104,
-      "grad_norm": 0.7265625,
-      "learning_rate": 5e-05,
-      "loss": 1.4732258319854736,
-      "step": 212
-    },
-    {
-      "epoch": 0.6359583952451708,
-      "grad_norm": 0.73828125,
-      "learning_rate": 5e-05,
-      "loss": 1.371578574180603,
-      "step": 214
-    },
-    {
-      "epoch": 0.6419019316493314,
-      "grad_norm": 0.82421875,
-      "learning_rate": 5e-05,
-      "loss": 1.4412343502044678,
-      "step": 216
-    },
-    {
-      "epoch": 0.6478454680534919,
-      "grad_norm": 0.71484375,
-      "learning_rate": 5e-05,
-      "loss": 1.51022207736969,
-      "step": 218
-    },
-    {
-      "epoch": 0.6537890044576523,
-      "grad_norm": 0.84375,
-      "learning_rate": 5e-05,
-      "loss": 1.367915391921997,
-      "step": 220
-    },
-    {
-      "epoch": 0.6597325408618128,
-      "grad_norm": 0.86328125,
-      "learning_rate": 5e-05,
-      "loss": 1.4306704998016357,
-      "step": 222
-    },
-    {
-      "epoch": 0.6656760772659732,
-      "grad_norm": 0.76171875,
-      "learning_rate": 5e-05,
-      "loss": 1.432612419128418,
-      "step": 224
-    },
-    {
-      "epoch": 0.6716196136701337,
-      "grad_norm": 0.67578125,
-      "learning_rate": 5e-05,
-      "loss": 1.4430606365203857,
-      "step": 226
-    },
-    {
-      "epoch": 0.6775631500742942,
-      "grad_norm": 0.67578125,
-      "learning_rate": 5e-05,
-      "loss": 1.4083107709884644,
-      "step": 228
-    },
-    {
-      "epoch": 0.6835066864784547,
-      "grad_norm": 0.87109375,
-      "learning_rate": 5e-05,
-      "loss": 1.4255032539367676,
-      "step": 230
-    },
-    {
-      "epoch": 0.6894502228826151,
-      "grad_norm": 0.828125,
-      "learning_rate": 5e-05,
-      "loss": 1.4819388389587402,
-      "step": 232
-    },
-    {
-      "epoch": 0.6953937592867756,
-      "grad_norm": 0.84375,
-      "learning_rate": 5e-05,
-      "loss": 1.541199803352356,
-      "step": 234
-    },
-    {
-      "epoch": 0.7013372956909361,
-      "grad_norm": 0.75390625,
-      "learning_rate": 5e-05,
-      "loss": 1.4741461277008057,
-      "step": 236
-    },
-    {
-      "epoch": 0.7072808320950966,
-      "grad_norm": 0.83203125,
-      "learning_rate": 5e-05,
-      "loss": 1.4825263023376465,
-      "step": 238
-    },
-    {
-      "epoch": 0.7132243684992571,
-      "grad_norm": 0.95703125,
-      "learning_rate": 5e-05,
-      "loss": 1.4338710308074951,
-      "step": 240
-    },
-    {
-      "epoch": 0.7191679049034175,
-      "grad_norm": 0.79296875,
-      "learning_rate": 5e-05,
-      "loss": 1.4071189165115356,
-      "step": 242
-    },
-    {
-      "epoch": 0.725111441307578,
-      "grad_norm": 0.765625,
-      "learning_rate": 5e-05,
-      "loss": 1.4799857139587402,
-      "step": 244
-    },
-    {
-      "epoch": 0.7310549777117384,
-      "grad_norm": 0.6953125,
-      "learning_rate": 5e-05,
-      "loss": 1.4438296556472778,
-      "step": 246
-    },
-    {
-      "epoch": 0.736998514115899,
-      "grad_norm": 0.77734375,
-      "learning_rate": 5e-05,
-      "loss": 1.4408268928527832,
-      "step": 248
-    },
-    {
-      "epoch": 0.7429420505200595,
-      "grad_norm": 0.7734375,
-      "learning_rate": 5e-05,
-      "loss": 1.3916218280792236,
-      "step": 250
-    },
-    {
-      "epoch": 0.7488855869242199,
-      "grad_norm": 0.94140625,
-      "learning_rate": 5e-05,
-      "loss": 1.3853819370269775,
-      "step": 252
-    },
-    {
-      "epoch": 0.7548291233283804,
-      "grad_norm": 0.83984375,
-      "learning_rate": 4.998292650357558e-05,
-      "loss": 1.3740458488464355,
-      "step": 254
-    },
-    {
-      "epoch": 0.7578008915304606,
-      "eval_loss": 1.4529203176498413,
-      "eval_runtime": 1.1255,
-      "eval_samples_per_second": 96.842,
-      "eval_steps_per_second": 12.438,
-      "step": 255
-    },
-    {
-      "epoch": 0.7607726597325408,
-      "grad_norm": 0.8671875,
-      "learning_rate": 4.984647842238185e-05,
-      "loss": 1.5183303356170654,
-      "step": 256
-    },
-    {
-      "epoch": 0.7667161961367014,
-      "grad_norm": 0.7265625,
-      "learning_rate": 4.957432749209755e-05,
-      "loss": 1.4994571208953857,
-      "step": 258
-    },
-    {
-      "epoch": 0.7726597325408618,
-      "grad_norm": 0.84765625,
-      "learning_rate": 4.916796010672969e-05,
-      "loss": 1.494471549987793,
-      "step": 260
-    },
-    {
-      "epoch": 0.7786032689450223,
-      "grad_norm": 0.8671875,
-      "learning_rate": 4.862959570402049e-05,
-      "loss": 1.4754960536956787,
-      "step": 262
-    },
-    {
-      "epoch": 0.7845468053491828,
-      "grad_norm": 0.8046875,
-      "learning_rate": 4.796217464364808e-05,
-      "loss": 1.4098687171936035,
-      "step": 264
-    },
-    {
-      "epoch": 0.7904903417533432,
-      "grad_norm": 0.8046875,
-      "learning_rate": 4.716934214800155e-05,
-      "loss": 1.445394515991211,
-      "step": 266
-    },
-    {
-      "epoch": 0.7964338781575037,
-      "grad_norm": 0.7109375,
-      "learning_rate": 4.625542839324036e-05,
-      "loss": 1.4716103076934814,
-      "step": 268
-    },
-    {
-      "epoch": 0.8023774145616642,
-      "grad_norm": 0.8359375,
-      "learning_rate": 4.522542485937369e-05,
-      "loss": 1.4395897388458252,
-      "step": 270
-    },
-    {
-      "epoch": 0.8083209509658247,
-      "grad_norm": 0.75,
-      "learning_rate": 4.408495706852758e-05,
-      "loss": 1.4456340074539185,
-      "step": 272
-    },
-    {
-      "epoch": 0.8142644873699851,
-      "grad_norm": 0.74609375,
-      "learning_rate": 4.284025386029381e-05,
-      "loss": 1.455463171005249,
-      "step": 274
-    },
-    {
-      "epoch": 0.8202080237741456,
-      "grad_norm": 0.8828125,
-      "learning_rate": 4.149811337196807e-05,
-      "loss": 1.4609105587005615,
-      "step": 276
-    },
-    {
-      "epoch": 0.826151560178306,
-      "grad_norm": 0.83203125,
-      "learning_rate": 4.0065865909481417e-05,
-      "loss": 1.4417420625686646,
-      "step": 278
-    },
-    {
-      "epoch": 0.8320950965824666,
-      "grad_norm": 0.86328125,
-      "learning_rate": 3.855133391181124e-05,
-      "loss": 1.4518589973449707,
-      "step": 280
-    },
-    {
-      "epoch": 0.8380386329866271,
-      "grad_norm": 0.76171875,
-      "learning_rate": 3.696278922753216e-05,
-      "loss": 1.4845668077468872,
-      "step": 282
-    },
-    {
-      "epoch": 0.8439821693907875,
-      "grad_norm": 0.80078125,
-      "learning_rate": 3.5308907936847594e-05,
-      "loss": 1.4887086153030396,
-      "step": 284
-    },
-    {
-      "epoch": 0.849925705794948,
-      "grad_norm": 0.7578125,
-      "learning_rate": 3.3598722965848204e-05,
-      "loss": 1.5054309368133545,
-      "step": 286
-    },
-    {
-      "epoch": 0.8558692421991084,
-      "grad_norm": 0.82421875,
-      "learning_rate": 3.1841574751802076e-05,
-      "loss": 1.5620818138122559,
-      "step": 288
-    },
-    {
-      "epoch": 0.861812778603269,
-      "grad_norm": 0.86328125,
-      "learning_rate": 3.0047060228925256e-05,
-      "loss": 1.5021510124206543,
-      "step": 290
-    },
-    {
-      "epoch": 0.8677563150074294,
-      "grad_norm": 0.71875,
-      "learning_rate": 2.8224980413255086e-05,
-      "loss": 1.514552354812622,
-      "step": 292
-    },
-    {
-      "epoch": 0.8736998514115899,
-      "grad_norm": 0.73828125,
-      "learning_rate": 2.638528687289925e-05,
-      "loss": 1.460189700126648,
-      "step": 294
-    },
-    {
-      "epoch": 0.8796433878157504,
-      "grad_norm": 0.7421875,
-      "learning_rate": 2.453802737602176e-05,
-      "loss": 1.3503719568252563,
-      "step": 296
-    },
-    {
-      "epoch": 0.8855869242199108,
-      "grad_norm": 0.76171875,
-      "learning_rate": 2.2693291013417453e-05,
-      "loss": 1.5173900127410889,
-      "step": 298
-    },
-    {
-      "epoch": 0.8915304606240714,
-      "grad_norm": 0.72265625,
-      "learning_rate": 2.0861153095396748e-05,
-      "loss": 1.4111042022705078,
-      "step": 300
-    },
-    {
-      "epoch": 0.8974739970282318,
-      "grad_norm": 0.71484375,
-      "learning_rate": 1.9051620123934537e-05,
-      "loss": 1.4748804569244385,
-      "step": 302
-    },
-    {
-      "epoch": 0.9034175334323923,
-      "grad_norm": 0.859375,
-      "learning_rate": 1.7274575140626318e-05,
-      "loss": 1.4101568460464478,
-      "step": 304
-    },
-    {
-      "epoch": 0.9093610698365527,
-      "grad_norm": 0.98828125,
-      "learning_rate": 1.5539723748942245e-05,
-      "loss": 1.5212171077728271,
-      "step": 306
-    },
-    {
-      "epoch": 0.9093610698365527,
-      "eval_loss": 1.4502739906311035,
-      "eval_runtime": 1.126,
-      "eval_samples_per_second": 96.803,
-      "eval_steps_per_second": 12.433,
-      "step": 306
-    },
-    {
-      "epoch": 0.9153046062407132,
-      "grad_norm": 0.79296875,
-      "learning_rate": 1.3856541105586545e-05,
-      "loss": 1.4394731521606445,
-      "step": 308
-    },
-    {
-      "epoch": 0.9212481426448736,
-      "grad_norm": 0.828125,
-      "learning_rate": 1.223422017047733e-05,
-      "loss": 1.4315065145492554,
-      "step": 310
-    },
-    {
-      "epoch": 0.9271916790490342,
-      "grad_norm": 0.7109375,
-      "learning_rate": 1.068162149798737e-05,
-      "loss": 1.4175217151641846,
-      "step": 312
-    },
-    {
-      "epoch": 0.9331352154531947,
-      "grad_norm": 0.8203125,
-      "learning_rate": 9.207224843668732e-06,
-      "loss": 1.4227707386016846,
-      "step": 314
-    },
-    {
-      "epoch": 0.9390787518573551,
-      "grad_norm": 0.79296875,
-      "learning_rate": 7.819082850768434e-06,
-      "loss": 1.4493082761764526,
-      "step": 316
-    },
-    {
-      "epoch": 0.9450222882615156,
-      "grad_norm": 0.8125,
-      "learning_rate": 6.524777069483526e-06,
-      "loss": 1.5667023658752441,
-      "step": 318
-    },
-    {
-      "epoch": 0.950965824665676,
-      "grad_norm": 0.8125,
-      "learning_rate": 5.33137654916292e-06,
-      "loss": 1.4023852348327637,
-      "step": 320
-    },
-    {
-      "epoch": 0.9569093610698366,
-      "grad_norm": 0.79296875,
-      "learning_rate": 4.245399229611238e-06,
-      "loss": 1.4751367568969727,
-      "step": 322
-    },
-    {
-      "epoch": 0.962852897473997,
-      "grad_norm": 0.82421875,
-      "learning_rate": 3.2727763423617913e-06,
-      "loss": 1.474877953529358,
-      "step": 324
-    },
-    {
-      "epoch": 0.9687964338781575,
-      "grad_norm": 0.7890625,
-      "learning_rate": 2.418820016346779e-06,
-      "loss": 1.4048078060150146,
-      "step": 326
-    },
-    {
-      "epoch": 0.974739970282318,
-      "grad_norm": 0.71484375,
-      "learning_rate": 1.6881942648911076e-06,
-      "loss": 1.442025899887085,
-      "step": 328
-    },
-    {
-      "epoch": 0.9806835066864784,
-      "grad_norm": 0.78515625,
-      "learning_rate": 1.0848895124889818e-06,
-      "loss": 1.4316831827163696,
-      "step": 330
-    },
-    {
-      "epoch": 0.986627043090639,
-      "grad_norm": 0.77734375,
-      "learning_rate": 6.122008004890851e-07,
-      "loss": 1.4720109701156616,
-      "step": 332
-    },
-    {
-      "epoch": 0.9925705794947994,
-      "grad_norm": 0.765625,
-      "learning_rate": 2.7270979072135104e-07,
-      "loss": 1.3915910720825195,
-      "step": 334
-    },
-    {
-      "epoch": 0.9985141158989599,
-      "grad_norm": 0.72265625,
-      "learning_rate": 6.827066535529946e-08,
-      "loss": 1.4640827178955078,
-      "step": 336
-    },
-    {
-      "epoch": 1.0,
-      "eval_loss": 1.44913911819458,
-      "eval_runtime": 1.1162,
-      "eval_samples_per_second": 97.657,
-      "eval_steps_per_second": 12.543,
-      "step": 337
     }
   ],
   "logging_steps": 2,
-  "max_steps": 337,
   "num_input_tokens_seen": 0,
-  "num_train_epochs": 1,
-  "save_steps": 135,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
@@ -1254,12 +745,12 @@
         "should_evaluate": false,
         "should_log": false,
         "should_save": true,
-        "should_training_stop": true
       },
       "attributes": {}
     }
   },
-  "total_flos": 5917994317250560.0,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.5024875621890548,
+  "eval_steps": 76,
+  "global_step": 202,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
+      "epoch": 0.004975124378109453,
+      "grad_norm": 1.3828125,
+      "learning_rate": 4.5454545454545455e-06,
+      "loss": 1.6639080047607422,
       "step": 2
     },
     {
+      "epoch": 0.009950248756218905,
+      "grad_norm": 2.140625,
+      "learning_rate": 1.3636363636363637e-05,
+      "loss": 1.7496397495269775,
       "step": 4
     },
     {
+      "epoch": 0.014925373134328358,
+      "grad_norm": 1.3828125,
+      "learning_rate": 2.272727272727273e-05,
+      "loss": 1.6321816444396973,
       "step": 6
     },
     {
+      "epoch": 0.01990049751243781,
+      "grad_norm": 1.6640625,
+      "learning_rate": 3.181818181818182e-05,
+      "loss": 1.6631131172180176,
       "step": 8
     },
     {
+      "epoch": 0.024875621890547265,
+      "grad_norm": 1.8359375,
+      "learning_rate": 4.0909090909090915e-05,
+      "loss": 1.5767230987548828,
       "step": 10
     },
     {
+      "epoch": 0.029850746268656716,
+      "grad_norm": 1.734375,
       "learning_rate": 5e-05,
+      "loss": 1.6631473302841187,
       "step": 12
     },
     {
+      "epoch": 0.03482587064676617,
+      "grad_norm": 1.3828125,
       "learning_rate": 5e-05,
+      "loss": 1.632326602935791,
       "step": 14
     },
     {
+      "epoch": 0.03980099502487562,
+      "grad_norm": 0.9296875,
       "learning_rate": 5e-05,
+      "loss": 1.6015336513519287,
       "step": 16
     },
     {
+      "epoch": 0.04477611940298507,
+      "grad_norm": 1.1484375,
       "learning_rate": 5e-05,
+      "loss": 1.542963981628418,
       "step": 18
     },
     {
+      "epoch": 0.04975124378109453,
+      "grad_norm": 1.125,
       "learning_rate": 5e-05,
+      "loss": 1.7240108251571655,
       "step": 20
     },
     {
+      "epoch": 0.05472636815920398,
+      "grad_norm": 0.89453125,
       "learning_rate": 5e-05,
+      "loss": 1.628758430480957,
       "step": 22
     },
     {
+      "epoch": 0.05970149253731343,
+      "grad_norm": 1.0859375,
       "learning_rate": 5e-05,
+      "loss": 1.5917885303497314,
       "step": 24
     },
     {
+      "epoch": 0.06467661691542288,
+      "grad_norm": 0.8984375,
       "learning_rate": 5e-05,
+      "loss": 1.5668330192565918,
       "step": 26
     },
     {
+      "epoch": 0.06965174129353234,
+      "grad_norm": 0.8125,
       "learning_rate": 5e-05,
+      "loss": 1.6025413274765015,
       "step": 28
     },
     {
+      "epoch": 0.07462686567164178,
+      "grad_norm": 0.7578125,
       "learning_rate": 5e-05,
+      "loss": 1.59916090965271,
       "step": 30
     },
     {
+      "epoch": 0.07960199004975124,
+      "grad_norm": 0.7890625,
       "learning_rate": 5e-05,
+      "loss": 1.6070656776428223,
       "step": 32
     },
     {
+      "epoch": 0.0845771144278607,
+      "grad_norm": 0.8046875,
       "learning_rate": 5e-05,
+      "loss": 1.5991092920303345,
       "step": 34
     },
     {
+      "epoch": 0.08955223880597014,
+      "grad_norm": 0.80859375,
       "learning_rate": 5e-05,
+      "loss": 1.517960548400879,
       "step": 36
     },
     {
+      "epoch": 0.0945273631840796,
+      "grad_norm": 0.74609375,
       "learning_rate": 5e-05,
+      "loss": 1.6418545246124268,
       "step": 38
     },
     {
+      "epoch": 0.09950248756218906,
+      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
+      "loss": 1.5496408939361572,
       "step": 40
     },
     {
+      "epoch": 0.1044776119402985,
+      "grad_norm": 0.8046875,
       "learning_rate": 5e-05,
+      "loss": 1.591111183166504,
       "step": 42
     },
     {
+      "epoch": 0.10945273631840796,
+      "grad_norm": 0.79296875,
       "learning_rate": 5e-05,
+      "loss": 1.5569896697998047,
       "step": 44
     },
     {
+      "epoch": 0.11442786069651742,
+      "grad_norm": 0.71484375,
       "learning_rate": 5e-05,
+      "loss": 1.573204517364502,
       "step": 46
     },
     {
+      "epoch": 0.11940298507462686,
+      "grad_norm": 0.7421875,
       "learning_rate": 5e-05,
+      "loss": 1.5156298875808716,
       "step": 48
     },
     {
+      "epoch": 0.12437810945273632,
+      "grad_norm": 0.83984375,
       "learning_rate": 5e-05,
+      "loss": 1.4954731464385986,
       "step": 50
     },
     {
+      "epoch": 0.12935323383084577,
+      "grad_norm": 0.7734375,
       "learning_rate": 5e-05,
+      "loss": 1.6173453330993652,
       "step": 52
     },
     {
+      "epoch": 0.13432835820895522,
+      "grad_norm": 0.8671875,
       "learning_rate": 5e-05,
+      "loss": 1.579205870628357,
       "step": 54
     },
     {
+      "epoch": 0.13930348258706468,
+      "grad_norm": 0.84765625,
       "learning_rate": 5e-05,
+      "loss": 1.5793405771255493,
       "step": 56
     },
     {
+      "epoch": 0.14427860696517414,
+      "grad_norm": 0.84375,
       "learning_rate": 5e-05,
+      "loss": 1.5619373321533203,
       "step": 58
     },
     {
+      "epoch": 0.14925373134328357,
+      "grad_norm": 0.80859375,
       "learning_rate": 5e-05,
+      "loss": 1.5738036632537842,
       "step": 60
     },
     {
+      "epoch": 0.15422885572139303,
+      "grad_norm": 0.97265625,
       "learning_rate": 5e-05,
+      "loss": 1.528868317604065,
       "step": 62
     },
     {
+      "epoch": 0.15920398009950248,
+      "grad_norm": 0.77734375,
       "learning_rate": 5e-05,
+      "loss": 1.5742967128753662,
       "step": 64
     },
     {
+      "epoch": 0.16417910447761194,
+      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
+      "loss": 1.5363436937332153,
       "step": 66
     },
     {
+      "epoch": 0.1691542288557214,
+      "grad_norm": 0.71484375,
       "learning_rate": 5e-05,
+      "loss": 1.5038269758224487,
       "step": 68
     },
     {
+      "epoch": 0.17412935323383086,
+      "grad_norm": 0.72265625,
       "learning_rate": 5e-05,
+      "loss": 1.5686390399932861,
       "step": 70
     },
     {
+      "epoch": 0.1791044776119403,
+      "grad_norm": 0.71875,
       "learning_rate": 5e-05,
+      "loss": 1.5683722496032715,
       "step": 72
     },
     {
+      "epoch": 0.18407960199004975,
+      "grad_norm": 0.8984375,
       "learning_rate": 5e-05,
+      "loss": 1.6040563583374023,
       "step": 74
     },
     {
+      "epoch": 0.1890547263681592,
+      "grad_norm": 0.7734375,
       "learning_rate": 5e-05,
+      "loss": 1.5213723182678223,
       "step": 76
     },
     {
+      "epoch": 0.1890547263681592,
+      "eval_loss": 1.5235487222671509,
+      "eval_runtime": 2.1832,
+      "eval_samples_per_second": 59.545,
+      "eval_steps_per_second": 7.787,
+      "step": 76
+    },
+    {
+      "epoch": 0.19402985074626866,
+      "grad_norm": 0.81640625,
       "learning_rate": 5e-05,
+      "loss": 1.548842191696167,
       "step": 78
     },
     {
+      "epoch": 0.19900497512437812,
+      "grad_norm": 0.71875,
       "learning_rate": 5e-05,
+      "loss": 1.4947301149368286,
       "step": 80
     },
     {
+      "epoch": 0.20398009950248755,
+      "grad_norm": 0.6953125,
       "learning_rate": 5e-05,
+      "loss": 1.5300225019454956,
       "step": 82
     },
     {
+      "epoch": 0.208955223880597,
+      "grad_norm": 0.6796875,
       "learning_rate": 5e-05,
+      "loss": 1.5121870040893555,
       "step": 84
     },
     {
+      "epoch": 0.21393034825870647,
+      "grad_norm": 0.79296875,
       "learning_rate": 5e-05,
+      "loss": 1.562124252319336,
       "step": 86
     },
     {
+      "epoch": 0.21890547263681592,
+      "grad_norm": 0.80078125,
       "learning_rate": 5e-05,
+      "loss": 1.5368881225585938,
       "step": 88
     },
     {
+      "epoch": 0.22388059701492538,
+      "grad_norm": 0.671875,
       "learning_rate": 5e-05,
+      "loss": 1.5035767555236816,
       "step": 90
     },
     {
+      "epoch": 0.22885572139303484,
+      "grad_norm": 0.6953125,
       "learning_rate": 5e-05,
+      "loss": 1.5528807640075684,
       "step": 92
     },
     {
+      "epoch": 0.23383084577114427,
       "grad_norm": 0.6875,
       "learning_rate": 5e-05,
+      "loss": 1.5195538997650146,
       "step": 94
     },
     {
+      "epoch": 0.23880597014925373,
+      "grad_norm": 0.6484375,
       "learning_rate": 5e-05,
+      "loss": 1.4883313179016113,
       "step": 96
     },
     {
+      "epoch": 0.24378109452736318,
+      "grad_norm": 0.7890625,
       "learning_rate": 5e-05,
+      "loss": 1.50142502784729,
       "step": 98
     },
     {
+      "epoch": 0.24875621890547264,
+      "grad_norm": 0.65234375,
       "learning_rate": 5e-05,
+      "loss": 1.5273784399032593,
       "step": 100
     },
     {
+      "epoch": 0.2537313432835821,
+      "grad_norm": 0.69140625,
       "learning_rate": 5e-05,
+      "loss": 1.5636398792266846,
       "step": 102
     },
     {
+      "epoch": 0.25870646766169153,
+      "grad_norm": 0.6171875,
       "learning_rate": 5e-05,
+      "loss": 1.5040175914764404,
       "step": 104
     },
     {
+      "epoch": 0.263681592039801,
+      "grad_norm": 0.7890625,
       "learning_rate": 5e-05,
+      "loss": 1.580956220626831,
       "step": 106
     },
     {
+      "epoch": 0.26865671641791045,
+      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
+      "loss": 1.5606034994125366,
       "step": 108
     },
     {
+      "epoch": 0.2736318407960199,
+      "grad_norm": 0.7890625,
       "learning_rate": 5e-05,
+      "loss": 1.5479745864868164,
       "step": 110
     },
     {
+      "epoch": 0.27860696517412936,
+      "grad_norm": 0.796875,
       "learning_rate": 5e-05,
+      "loss": 1.4870773553848267,
       "step": 112
     },
     {
+      "epoch": 0.2835820895522388,
+      "grad_norm": 1.015625,
       "learning_rate": 5e-05,
+      "loss": 1.5059258937835693,
       "step": 114
     },
     {
+      "epoch": 0.2885572139303483,
+      "grad_norm": 0.68359375,
       "learning_rate": 5e-05,
+      "loss": 1.566910982131958,
       "step": 116
     },
     {
+      "epoch": 0.2935323383084577,
+      "grad_norm": 0.77734375,
       "learning_rate": 5e-05,
+      "loss": 1.5694658756256104,
       "step": 118
     },
     {
+      "epoch": 0.29850746268656714,
+      "grad_norm": 0.75390625,
       "learning_rate": 5e-05,
+      "loss": 1.6117546558380127,
       "step": 120
     },
     {
+      "epoch": 0.3034825870646766,
+      "grad_norm": 0.6953125,
       "learning_rate": 5e-05,
+      "loss": 1.5218111276626587,
       "step": 122
     },
     {
+      "epoch": 0.30845771144278605,
+      "grad_norm": 0.71484375,
       "learning_rate": 5e-05,
+      "loss": 1.4893097877502441,
       "step": 124
     },
     {
+      "epoch": 0.31343283582089554,
+      "grad_norm": 1.0078125,
       "learning_rate": 5e-05,
+      "loss": 1.5823085308074951,
       "step": 126
     },
     {
+      "epoch": 0.31840796019900497,
+      "grad_norm": 0.76171875,
       "learning_rate": 5e-05,
+      "loss": 1.5641398429870605,
       "step": 128
     },
     {
+      "epoch": 0.32338308457711445,
+      "grad_norm": 0.83984375,
       "learning_rate": 5e-05,
+      "loss": 1.573578119277954,
       "step": 130
     },
     {
+      "epoch": 0.3283582089552239,
+      "grad_norm": 0.67578125,
       "learning_rate": 5e-05,
+      "loss": 1.5401456356048584,
       "step": 132
     },
     {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 0.80859375,
       "learning_rate": 5e-05,
+      "loss": 1.5921449661254883,
       "step": 134
     },
     {
+      "epoch": 0.3383084577114428,
+      "grad_norm": 0.796875,
       "learning_rate": 5e-05,
+      "loss": 1.4478504657745361,
       "step": 136
     },
     {
+      "epoch": 0.34328358208955223,
+      "grad_norm": 0.6953125,
       "learning_rate": 5e-05,
+      "loss": 1.5600370168685913,
       "step": 138
     },
     {
+      "epoch": 0.3482587064676617,
+      "grad_norm": 0.6796875,
       "learning_rate": 5e-05,
+      "loss": 1.5285460948944092,
       "step": 140
     },
     {
+      "epoch": 0.35323383084577115,
+      "grad_norm": 0.83203125,
       "learning_rate": 5e-05,
+      "loss": 1.542060375213623,
       "step": 142
     },
     {
+      "epoch": 0.3582089552238806,
+      "grad_norm": 0.73046875,
       "learning_rate": 5e-05,
+      "loss": 1.5423415899276733,
       "step": 144
     },
     {
+      "epoch": 0.36318407960199006,
+      "grad_norm": 0.7890625,
       "learning_rate": 5e-05,
+      "loss": 1.5532753467559814,
       "step": 146
     },
     {
+      "epoch": 0.3681592039800995,
+      "grad_norm": 0.703125,
       "learning_rate": 5e-05,
+      "loss": 1.498077392578125,
       "step": 148
     },
     {
+      "epoch": 0.373134328358209,
+      "grad_norm": 0.86328125,
       "learning_rate": 5e-05,
+      "loss": 1.4719334840774536,
       "step": 150
     },
     {
+      "epoch": 0.3781094527363184,
+      "grad_norm": 0.75390625,
       "learning_rate": 5e-05,
+      "loss": 1.6089732646942139,
       "step": 152
     },
     {
+      "epoch": 0.3781094527363184,
+      "eval_loss": 1.4965729713439941,
+      "eval_runtime": 1.4549,
+      "eval_samples_per_second": 89.354,
+      "eval_steps_per_second": 11.685,
+      "step": 152
     },
     {
+      "epoch": 0.38308457711442784,
+      "grad_norm": 0.765625,
       "learning_rate": 5e-05,
+      "loss": 1.4956027269363403,
       "step": 154
     },
     {
+      "epoch": 0.3880597014925373,
+      "grad_norm": 0.85546875,
       "learning_rate": 5e-05,
+      "loss": 1.4428843259811401,
       "step": 156
     },
     {
+      "epoch": 0.39303482587064675,
+      "grad_norm": 0.71484375,
       "learning_rate": 5e-05,
+      "loss": 1.5057318210601807,
       "step": 158
     },
     {
+      "epoch": 0.39800995024875624,
+      "grad_norm": 0.66015625,
       "learning_rate": 5e-05,
+      "loss": 1.5654449462890625,
       "step": 160
     },
     {
+      "epoch": 0.40298507462686567,
+      "grad_norm": 0.73046875,
       "learning_rate": 5e-05,
+      "loss": 1.5439975261688232,
       "step": 162
     },
     {
+      "epoch": 0.4079601990049751,
+      "grad_norm": 0.7421875,
       "learning_rate": 5e-05,
+      "loss": 1.5199835300445557,
       "step": 164
     },
     {
+      "epoch": 0.4129353233830846,
+      "grad_norm": 0.73828125,
       "learning_rate": 5e-05,
+      "loss": 1.4676998853683472,
       "step": 166
     },
     {
+      "epoch": 0.417910447761194,
+      "grad_norm": 0.671875,
       "learning_rate": 5e-05,
+      "loss": 1.5374722480773926,
       "step": 168
     },
     {
+      "epoch": 0.4228855721393035,
+      "grad_norm": 0.75390625,
       "learning_rate": 5e-05,
+      "loss": 1.563814401626587,
       "step": 170
     },
     {
+      "epoch": 0.42786069651741293,
+      "grad_norm": 0.8046875,
       "learning_rate": 5e-05,
+      "loss": 1.568427562713623,
       "step": 172
     },
     {
+      "epoch": 0.43283582089552236,
+      "grad_norm": 0.78125,
       "learning_rate": 5e-05,
+      "loss": 1.5757570266723633,
       "step": 174
     },
     {
+      "epoch": 0.43781094527363185,
+      "grad_norm": 0.6875,
       "learning_rate": 5e-05,
+      "loss": 1.5818047523498535,
       "step": 176
     },
     {
+      "epoch": 0.4427860696517413,
+      "grad_norm": 0.734375,
       "learning_rate": 5e-05,
+      "loss": 1.5185985565185547,
       "step": 178
     },
     {
+      "epoch": 0.44776119402985076,
+      "grad_norm": 0.67578125,
       "learning_rate": 5e-05,
+      "loss": 1.5347332954406738,
       "step": 180
     },
     {
+      "epoch": 0.4527363184079602,
+      "grad_norm": 1.0546875,
       "learning_rate": 5e-05,
+      "loss": 1.466269850730896,
       "step": 182
     },
     {
+      "epoch": 0.4577114427860697,
+      "grad_norm": 0.828125,
       "learning_rate": 5e-05,
+      "loss": 1.535921335220337,
       "step": 184
     },
     {
+      "epoch": 0.4626865671641791,
+      "grad_norm": 0.71484375,
       "learning_rate": 5e-05,
+      "loss": 1.559277057647705,
       "step": 186
     },
     {
+      "epoch": 0.46766169154228854,
+      "grad_norm": 0.78125,
       "learning_rate": 5e-05,
+      "loss": 1.5251140594482422,
       "step": 188
     },
     {
+      "epoch": 0.472636815920398,
+      "grad_norm": 0.640625,
       "learning_rate": 5e-05,
+      "loss": 1.5697033405303955,
       "step": 190
     },
     {
+      "epoch": 0.47761194029850745,
+      "grad_norm": 0.71875,
       "learning_rate": 5e-05,
+      "loss": 1.4658384323120117,
       "step": 192
     },
     {
+      "epoch": 0.48258706467661694,
+      "grad_norm": 0.828125,
       "learning_rate": 5e-05,
+      "loss": 1.5391371250152588,
       "step": 194
     },
     {
+      "epoch": 0.48756218905472637,
+      "grad_norm": 0.62890625,
       "learning_rate": 5e-05,
+      "loss": 1.517061710357666,
       "step": 196
     },
     {
+      "epoch": 0.4925373134328358,
+      "grad_norm": 0.85546875,
       "learning_rate": 5e-05,
+      "loss": 1.549302339553833,
       "step": 198
     },
     {
+      "epoch": 0.4975124378109453,
+      "grad_norm": 0.70703125,
       "learning_rate": 5e-05,
+      "loss": 1.52018404006958,
       "step": 200
     },
     {
+      "epoch": 0.5024875621890548,
+      "grad_norm": 0.65625,
       "learning_rate": 5e-05,
+      "loss": 1.5727933645248413,
       "step": 202
     }
   ],
   "logging_steps": 2,
+  "max_steps": 503,
   "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 202,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
         "should_evaluate": false,
         "should_log": false,
         "should_save": true,
+        "should_training_stop": false
       },
       "attributes": {}
     }
   },
+  "total_flos": 4803482146045952.0,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

last-checkpoint/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:15867342fd377cf74773a6b47c5bb552643dbadb054a78f267afaea78cf9a2a2
-size 5777

 version https://git-lfs.github.com/spec/v1
+oid sha256:cfdf90fde84a79be9f401a8d95d02b50c23a941f33d8ecb04d6f5a91b2b29739
+size 5841