Ba2han committed 13 days ago

Commit ba500b3 (verified) · 1 parent: ad0f8ae

Training in progress, step 12285, checkpoint

Files changed (4)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +1102 -3
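
The `+1102 -3` count for trainer_state.json can be cross-checked against the values shown in its diff. The sketch below is illustrative only: it copies `global_step`, `epoch`, `total_flos`, and `logging_steps` from the diff, and it assumes that each added log entry occupies seven diff lines. The steps-per-epoch figure is inferred from `global_step / epoch` and is not stated anywhere in the commit.

```python
# Values copied from the trainer_state.json diff in this commit.
OLD = {"global_step": 11970, "epoch": 0.38, "total_flos": 3.964287804106193e19}
NEW = {"global_step": 12285, "epoch": 0.39, "total_flos": 4.068607701147668e19}
LOGGING_STEPS = 2        # "logging_steps" in the diff
LINES_PER_LOG_ENTRY = 7  # assumption: "}," and "{" plus five key/value lines
CHANGED_SCALARS = 3      # epoch, global_step, total_flos


def added_log_steps(old_step, new_step, every):
    """Steps in (old_step, new_step] that fall on the logging interval."""
    return [s for s in range(old_step + 1, new_step + 1) if s % every == 0]


new_steps = added_log_steps(OLD["global_step"], NEW["global_step"], LOGGING_STEPS)
added_lines = len(new_steps) * LINES_PER_LOG_ENTRY + CHANGED_SCALARS

# Inferred, not stated in the source: optimizer steps per epoch.
steps_per_epoch = round(NEW["global_step"] / NEW["epoch"])

# Average floating-point operations attributed to each step added by this commit.
flos_per_step = (NEW["total_flos"] - OLD["total_flos"]) / (
    NEW["global_step"] - OLD["global_step"]
)

print("new log entries:", len(new_steps), "from", new_steps[0], "to", new_steps[-1])
print("added diff lines:", added_lines)
print("inferred steps per epoch:", steps_per_epoch)
print(f"approx. FLOs per step: {flos_per_step:.3e}")
```

Under these assumptions the added-line count comes out to 1102, matching the file list, and the 315-step gap between checkpoints is about 1% of the inferred epoch length.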

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3f526a3e8a8fcf0d894a7150e1bca30ac3d5b419920fe259f4717ec06cbacee2
+oid sha256:ccf3832274462c19ae6e1156105c162797ccf65eb00490b5abf25853f514c2bc
 size 1171937904

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3838b4a4abf4b914da876d4a14a454c5f4a0de704be3f8adea3b497048f6044
+oid sha256:643729e56f97304b8e07b41a168a1e56714ba24900040ce753f6443fda6c397d
 size 1288212619

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5203a46e674e5add3d3c850bbbfcabeaf929f40436e8f78372e7b5b311c4463a
+oid sha256:42b726231982d4fccb23f68a4f834f6558cd6f912a84230e86f2230216451432
 size 1401
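
The three binary files above are stored as Git LFS pointers, so each diff changes only the `oid` line while the `size` stays the same. As a rough sketch of how a downloaded file could be checked against such a pointer, the helper names `parse_lfs_pointer` and `matches_pointer` below are hypothetical and not part of any Git LFS tooling; the pointer text is the new scheduler.pt pointer from this commit.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New scheduler.pt pointer, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:42b726231982d4fccb23f68a4f834f6558cd6f912a84230e86f2230216451432
size 1401
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy of the checkpoint, `matches_pointer("last-checkpoint/scheduler.pt", pointer)` would report whether the file matches; the local path is an assumption about where the file was downloaded.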

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.38,
   "eval_steps": 3150,
-  "global_step": 11970,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -41927,6 +41927,1105 @@
       "learning_rate": 0.1,
       "loss": 2.4388091564178467,
       "step": 11970
     }
   ],
   "logging_steps": 2,
@@ -41946,7 +43045,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 3.964287804106193e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.39,
   "eval_steps": 3150,
+  "global_step": 12285,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.4388091564178467,
       "step": 11970
+    },
+    {
+      "epoch": 0.38006349206349205,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.1,
+      "loss": 2.4562020301818848,
+      "step": 11972
+    },
+    {
+      "epoch": 0.38012698412698415,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.4419918060302734,
+      "step": 11974
+    },
+    {
+      "epoch": 0.3801904761904762,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4348676204681396,
+      "step": 11976
+    },
+    {
+      "epoch": 0.38025396825396823,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.1,
+      "loss": 2.4611902236938477,
+      "step": 11978
+    },
+    {
+      "epoch": 0.38031746031746033,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.4563024044036865,
+      "step": 11980
+    },
+    {
+      "epoch": 0.3803809523809524,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.445188045501709,
+      "step": 11982
+    },
+    {
+      "epoch": 0.3804444444444444,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.474489212036133,
+      "step": 11984
+    },
+    {
+      "epoch": 0.3805079365079365,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.1,
+      "loss": 2.4510064125061035,
+      "step": 11986
+    },
+    {
+      "epoch": 0.38057142857142856,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.494896650314331,
+      "step": 11988
+    },
+    {
+      "epoch": 0.38063492063492066,
+      "grad_norm": 0.1552734375,
+      "learning_rate": 0.1,
+      "loss": 2.458981513977051,
+      "step": 11990
+    },
+    {
+      "epoch": 0.3806984126984127,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.1,
+      "loss": 2.4623570442199707,
+      "step": 11992
+    },
+    {
+      "epoch": 0.38076190476190475,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.1,
+      "loss": 2.4659810066223145,
+      "step": 11994
+    },
+    {
+      "epoch": 0.38082539682539684,
+      "grad_norm": 0.07958984375,
+      "learning_rate": 0.1,
+      "loss": 2.465498447418213,
+      "step": 11996
+    },
+    {
+      "epoch": 0.3808888888888889,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.4689385890960693,
+      "step": 11998
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.4477922916412354,
+      "step": 12000
+    },
+    {
+      "epoch": 0.38101587301587303,
+      "grad_norm": 0.6484375,
+      "learning_rate": 0.1,
+      "loss": 2.4727680683135986,
+      "step": 12002
+    },
+    {
+      "epoch": 0.3810793650793651,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.4675381183624268,
+      "step": 12004
+    },
+    {
+      "epoch": 0.3811428571428571,
+      "grad_norm": 0.060302734375,
+      "learning_rate": 0.1,
+      "loss": 2.453152894973755,
+      "step": 12006
+    },
+    {
+      "epoch": 0.3812063492063492,
+      "grad_norm": 0.2158203125,
+      "learning_rate": 0.1,
+      "loss": 2.4358911514282227,
+      "step": 12008
+    },
+    {
+      "epoch": 0.38126984126984126,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.1,
+      "loss": 2.4744372367858887,
+      "step": 12010
+    },
+    {
+      "epoch": 0.38133333333333336,
+      "grad_norm": 0.09912109375,
+      "learning_rate": 0.1,
+      "loss": 2.45621919631958,
+      "step": 12012
+    },
+    {
+      "epoch": 0.3813968253968254,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.468158006668091,
+      "step": 12014
+    },
+    {
+      "epoch": 0.38146031746031744,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.4647765159606934,
+      "step": 12016
+    },
+    {
+      "epoch": 0.38152380952380954,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.4902184009552,
+      "step": 12018
+    },
+    {
+      "epoch": 0.3815873015873016,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4720568656921387,
+      "step": 12020
+    },
+    {
+      "epoch": 0.38165079365079363,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.1,
+      "loss": 2.4778892993927,
+      "step": 12022
+    },
+    {
+      "epoch": 0.38171428571428573,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.1,
+      "loss": 2.46598482131958,
+      "step": 12024
+    },
+    {
+      "epoch": 0.38177777777777777,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.1,
+      "loss": 2.4907238483428955,
+      "step": 12026
+    },
+    {
+      "epoch": 0.3818412698412698,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.1,
+      "loss": 2.474137783050537,
+      "step": 12028
+    },
+    {
+      "epoch": 0.3819047619047619,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4604177474975586,
+      "step": 12030
+    },
+    {
+      "epoch": 0.38196825396825396,
+      "grad_norm": 0.25,
+      "learning_rate": 0.1,
+      "loss": 2.4689693450927734,
+      "step": 12032
+    },
+    {
+      "epoch": 0.38203174603174606,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.458057403564453,
+      "step": 12034
+    },
+    {
+      "epoch": 0.3820952380952381,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.1,
+      "loss": 2.489271879196167,
+      "step": 12036
+    },
+    {
+      "epoch": 0.38215873015873014,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.4576919078826904,
+      "step": 12038
+    },
+    {
+      "epoch": 0.38222222222222224,
+      "grad_norm": 0.228515625,
+      "learning_rate": 0.1,
+      "loss": 2.463383674621582,
+      "step": 12040
+    },
+    {
+      "epoch": 0.3822857142857143,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.1,
+      "loss": 2.4628543853759766,
+      "step": 12042
+    },
+    {
+      "epoch": 0.3823492063492063,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.4454421997070312,
+      "step": 12044
+    },
+    {
+      "epoch": 0.3824126984126984,
+      "grad_norm": 0.06005859375,
+      "learning_rate": 0.1,
+      "loss": 2.4660277366638184,
+      "step": 12046
+    },
+    {
+      "epoch": 0.38247619047619047,
+      "grad_norm": 0.216796875,
+      "learning_rate": 0.1,
+      "loss": 2.484807252883911,
+      "step": 12048
+    },
+    {
+      "epoch": 0.3825396825396825,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.1,
+      "loss": 2.485915184020996,
+      "step": 12050
+    },
+    {
+      "epoch": 0.3826031746031746,
+      "grad_norm": 0.3046875,
+      "learning_rate": 0.1,
+      "loss": 2.4765751361846924,
+      "step": 12052
+    },
+    {
+      "epoch": 0.38266666666666665,
+      "grad_norm": 0.0615234375,
+      "learning_rate": 0.1,
+      "loss": 2.45377516746521,
+      "step": 12054
+    },
+    {
+      "epoch": 0.38273015873015875,
+      "grad_norm": 0.1005859375,
+      "learning_rate": 0.1,
+      "loss": 2.477993965148926,
+      "step": 12056
+    },
+    {
+      "epoch": 0.3827936507936508,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.476003646850586,
+      "step": 12058
+    },
+    {
+      "epoch": 0.38285714285714284,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.4513702392578125,
+      "step": 12060
+    },
+    {
+      "epoch": 0.38292063492063494,
+      "grad_norm": 0.2490234375,
+      "learning_rate": 0.1,
+      "loss": 2.481924295425415,
+      "step": 12062
+    },
+    {
+      "epoch": 0.382984126984127,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.1,
+      "loss": 2.4330499172210693,
+      "step": 12064
+    },
+    {
+      "epoch": 0.383047619047619,
+      "grad_norm": 0.1015625,
+      "learning_rate": 0.1,
+      "loss": 2.4708168506622314,
+      "step": 12066
+    },
+    {
+      "epoch": 0.3831111111111111,
+      "grad_norm": 0.07421875,
+      "learning_rate": 0.1,
+      "loss": 2.4653377532958984,
+      "step": 12068
+    },
+    {
+      "epoch": 0.38317460317460317,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.5031898021698,
+      "step": 12070
+    },
+    {
+      "epoch": 0.3832380952380952,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.475032329559326,
+      "step": 12072
+    },
+    {
+      "epoch": 0.3833015873015873,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.4548768997192383,
+      "step": 12074
+    },
+    {
+      "epoch": 0.38336507936507935,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.1,
+      "loss": 2.457284688949585,
+      "step": 12076
+    },
+    {
+      "epoch": 0.38342857142857145,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.1,
+      "loss": 2.4320826530456543,
+      "step": 12078
+    },
+    {
+      "epoch": 0.3834920634920635,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.460562229156494,
+      "step": 12080
+    },
+    {
+      "epoch": 0.38355555555555554,
+      "grad_norm": 0.44140625,
+      "learning_rate": 0.1,
+      "loss": 2.4694409370422363,
+      "step": 12082
+    },
+    {
+      "epoch": 0.38361904761904764,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.1,
+      "loss": 2.4640471935272217,
+      "step": 12084
+    },
+    {
+      "epoch": 0.3836825396825397,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.480008363723755,
+      "step": 12086
+    },
+    {
+      "epoch": 0.3837460317460317,
+      "grad_norm": 0.09765625,
+      "learning_rate": 0.1,
+      "loss": 2.4768412113189697,
+      "step": 12088
+    },
+    {
+      "epoch": 0.3838095238095238,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4503283500671387,
+      "step": 12090
+    },
+    {
+      "epoch": 0.38387301587301587,
+      "grad_norm": 0.474609375,
+      "learning_rate": 0.1,
+      "loss": 2.4610016345977783,
+      "step": 12092
+    },
+    {
+      "epoch": 0.3839365079365079,
+      "grad_norm": 0.486328125,
+      "learning_rate": 0.1,
+      "loss": 2.4768073558807373,
+      "step": 12094
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 0.046142578125,
+      "learning_rate": 0.1,
+      "loss": 2.4615330696105957,
+      "step": 12096
+    },
+    {
+      "epoch": 0.38406349206349205,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.4912209510803223,
+      "step": 12098
+    },
+    {
+      "epoch": 0.38412698412698415,
+      "grad_norm": 0.2109375,
+      "learning_rate": 0.1,
+      "loss": 2.4521989822387695,
+      "step": 12100
+    },
+    {
+      "epoch": 0.3841904761904762,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.1,
+      "loss": 2.466559886932373,
+      "step": 12102
+    },
+    {
+      "epoch": 0.38425396825396824,
+      "grad_norm": 0.1630859375,
+      "learning_rate": 0.1,
+      "loss": 2.465717315673828,
+      "step": 12104
+    },
+    {
+      "epoch": 0.38431746031746034,
+      "grad_norm": 0.1630859375,
+      "learning_rate": 0.1,
+      "loss": 2.475071907043457,
+      "step": 12106
+    },
+    {
+      "epoch": 0.3843809523809524,
+      "grad_norm": 0.470703125,
+      "learning_rate": 0.1,
+      "loss": 2.5202531814575195,
+      "step": 12108
+    },
+    {
+      "epoch": 0.3844444444444444,
+      "grad_norm": 0.470703125,
+      "learning_rate": 0.1,
+      "loss": 2.4715182781219482,
+      "step": 12110
+    },
+    {
+      "epoch": 0.3845079365079365,
+      "grad_norm": 0.1162109375,
+      "learning_rate": 0.1,
+      "loss": 2.455341100692749,
+      "step": 12112
+    },
+    {
+      "epoch": 0.38457142857142856,
+      "grad_norm": 0.10400390625,
+      "learning_rate": 0.1,
+      "loss": 2.4790189266204834,
+      "step": 12114
+    },
+    {
+      "epoch": 0.3846349206349206,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.47458553314209,
+      "step": 12116
+    },
+    {
+      "epoch": 0.3846984126984127,
+      "grad_norm": 0.07763671875,
+      "learning_rate": 0.1,
+      "loss": 2.475902557373047,
+      "step": 12118
+    },
+    {
+      "epoch": 0.38476190476190475,
+      "grad_norm": 0.12255859375,
+      "learning_rate": 0.1,
+      "loss": 2.498180389404297,
+      "step": 12120
+    },
+    {
+      "epoch": 0.38482539682539685,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.1,
+      "loss": 2.507518768310547,
+      "step": 12122
+    },
+    {
+      "epoch": 0.3848888888888889,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.4781503677368164,
+      "step": 12124
+    },
+    {
+      "epoch": 0.38495238095238093,
+      "grad_norm": 0.06396484375,
+      "learning_rate": 0.1,
+      "loss": 2.5078864097595215,
+      "step": 12126
+    },
+    {
+      "epoch": 0.38501587301587303,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.479459285736084,
+      "step": 12128
+    },
+    {
+      "epoch": 0.3850793650793651,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.5114645957946777,
+      "step": 12130
+    },
+    {
+      "epoch": 0.3851428571428571,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.1,
+      "loss": 2.4905712604522705,
+      "step": 12132
+    },
+    {
+      "epoch": 0.3852063492063492,
+      "grad_norm": 0.12060546875,
+      "learning_rate": 0.1,
+      "loss": 2.4662930965423584,
+      "step": 12134
+    },
+    {
+      "epoch": 0.38526984126984126,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.1,
+      "loss": 2.484269857406616,
+      "step": 12136
+    },
+    {
+      "epoch": 0.38533333333333336,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.501328229904175,
+      "step": 12138
+    },
+    {
+      "epoch": 0.3853968253968254,
+      "grad_norm": 0.068359375,
+      "learning_rate": 0.1,
+      "loss": 2.4805939197540283,
+      "step": 12140
+    },
+    {
+      "epoch": 0.38546031746031745,
+      "grad_norm": 0.2392578125,
+      "learning_rate": 0.1,
+      "loss": 2.4765126705169678,
+      "step": 12142
+    },
+    {
+      "epoch": 0.38552380952380955,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.1,
+      "loss": 2.500314474105835,
+      "step": 12144
+    },
+    {
+      "epoch": 0.3855873015873016,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.5027472972869873,
+      "step": 12146
+    },
+    {
+      "epoch": 0.38565079365079363,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.473562240600586,
+      "step": 12148
+    },
+    {
+      "epoch": 0.38571428571428573,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.511924982070923,
+      "step": 12150
+    },
+    {
+      "epoch": 0.3857777777777778,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4819207191467285,
+      "step": 12152
+    },
+    {
+      "epoch": 0.3858412698412698,
+      "grad_norm": 0.06591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4831345081329346,
+      "step": 12154
+    },
+    {
+      "epoch": 0.3859047619047619,
+      "grad_norm": 0.0849609375,
+      "learning_rate": 0.1,
+      "loss": 2.4863007068634033,
+      "step": 12156
+    },
+    {
+      "epoch": 0.38596825396825396,
+      "grad_norm": 0.30859375,
+      "learning_rate": 0.1,
+      "loss": 2.460641622543335,
+      "step": 12158
+    },
+    {
+      "epoch": 0.38603174603174606,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.1,
+      "loss": 2.4990243911743164,
+      "step": 12160
+    },
+    {
+      "epoch": 0.3860952380952381,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.1,
+      "loss": 2.5030386447906494,
+      "step": 12162
+    },
+    {
+      "epoch": 0.38615873015873015,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.1,
+      "loss": 2.4420783519744873,
+      "step": 12164
+    },
+    {
+      "epoch": 0.38622222222222224,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.1,
+      "loss": 2.504606008529663,
+      "step": 12166
+    },
+    {
+      "epoch": 0.3862857142857143,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.5177011489868164,
+      "step": 12168
+    },
+    {
+      "epoch": 0.38634920634920633,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.1,
+      "loss": 2.5138041973114014,
+      "step": 12170
+    },
+    {
+      "epoch": 0.38641269841269843,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.4538416862487793,
+      "step": 12172
+    },
+    {
+      "epoch": 0.3864761904761905,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.4983925819396973,
+      "step": 12174
+    },
+    {
+      "epoch": 0.3865396825396825,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.1,
+      "loss": 2.472153425216675,
+      "step": 12176
+    },
+    {
+      "epoch": 0.3866031746031746,
+      "grad_norm": 0.08544921875,
+      "learning_rate": 0.1,
+      "loss": 2.4816946983337402,
+      "step": 12178
+    },
+    {
+      "epoch": 0.38666666666666666,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.1,
+      "loss": 2.5129659175872803,
+      "step": 12180
+    },
+    {
+      "epoch": 0.38673015873015876,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.1,
+      "loss": 2.4627156257629395,
+      "step": 12182
+    },
+    {
+      "epoch": 0.3867936507936508,
+      "grad_norm": 0.3984375,
+      "learning_rate": 0.1,
+      "loss": 2.494533061981201,
+      "step": 12184
+    },
+    {
+      "epoch": 0.38685714285714284,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.1,
+      "loss": 2.471395254135132,
+      "step": 12186
+    },
+    {
+      "epoch": 0.38692063492063494,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.4891700744628906,
+      "step": 12188
+    },
+    {
+      "epoch": 0.386984126984127,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.5044522285461426,
+      "step": 12190
+    },
+    {
+      "epoch": 0.38704761904761903,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.487046241760254,
+      "step": 12192
+    },
+    {
+      "epoch": 0.38711111111111113,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.514503002166748,
+      "step": 12194
+    },
+    {
+      "epoch": 0.38717460317460317,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.1,
+      "loss": 2.4909095764160156,
+      "step": 12196
+    },
+    {
+      "epoch": 0.3872380952380952,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.1,
+      "loss": 2.4744057655334473,
+      "step": 12198
+    },
+    {
+      "epoch": 0.3873015873015873,
+      "grad_norm": 0.1103515625,
+      "learning_rate": 0.1,
+      "loss": 2.4896419048309326,
+      "step": 12200
+    },
+    {
+      "epoch": 0.38736507936507936,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.1,
+      "loss": 2.4401516914367676,
+      "step": 12202
+    },
+    {
+      "epoch": 0.38742857142857146,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.1,
+      "loss": 2.45546293258667,
+      "step": 12204
+    },
+    {
+      "epoch": 0.3874920634920635,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.4839749336242676,
+      "step": 12206
+    },
+    {
+      "epoch": 0.38755555555555554,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.1,
+      "loss": 2.457061767578125,
+      "step": 12208
+    },
+    {
+      "epoch": 0.38761904761904764,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4803502559661865,
+      "step": 12210
+    },
+    {
+      "epoch": 0.3876825396825397,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.1,
+      "loss": 2.4752790927886963,
+      "step": 12212
+    },
+    {
+      "epoch": 0.3877460317460317,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.475409746170044,
+      "step": 12214
+    },
+    {
+      "epoch": 0.3878095238095238,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.1,
+      "loss": 2.4633171558380127,
+      "step": 12216
+    },
+    {
+      "epoch": 0.38787301587301587,
+      "grad_norm": 0.1923828125,
+      "learning_rate": 0.1,
+      "loss": 2.451033115386963,
+      "step": 12218
+    },
+    {
+      "epoch": 0.3879365079365079,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.1,
+      "loss": 2.4517745971679688,
+      "step": 12220
+    },
+    {
+      "epoch": 0.388,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.4649953842163086,
+      "step": 12222
+    },
+    {
+      "epoch": 0.38806349206349205,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.1,
+      "loss": 2.4534993171691895,
+      "step": 12224
+    },
+    {
+      "epoch": 0.38812698412698415,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4374120235443115,
+      "step": 12226
+    },
+    {
+      "epoch": 0.3881904761904762,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.1,
+      "loss": 2.444390296936035,
+      "step": 12228
+    },
+    {
+      "epoch": 0.38825396825396824,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.4295685291290283,
+      "step": 12230
+    },
+    {
+      "epoch": 0.38831746031746034,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.4740664958953857,
+      "step": 12232
+    },
+    {
+      "epoch": 0.3883809523809524,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.1,
+      "loss": 2.4552464485168457,
+      "step": 12234
+    },
+    {
+      "epoch": 0.3884444444444444,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.1,
+      "loss": 2.469726085662842,
+      "step": 12236
+    },
+    {
+      "epoch": 0.3885079365079365,
+      "grad_norm": 0.423828125,
+      "learning_rate": 0.1,
+      "loss": 2.4829745292663574,
+      "step": 12238
+    },
+    {
+      "epoch": 0.38857142857142857,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.1,
+      "loss": 2.461160182952881,
+      "step": 12240
+    },
+    {
+      "epoch": 0.3886349206349206,
+      "grad_norm": 0.10888671875,
+      "learning_rate": 0.1,
+      "loss": 2.465991973876953,
+      "step": 12242
+    },
+    {
+      "epoch": 0.3886984126984127,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.464043617248535,
+      "step": 12244
+    },
+    {
+      "epoch": 0.38876190476190475,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4663949012756348,
+      "step": 12246
+    },
+    {
+      "epoch": 0.38882539682539685,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.4549779891967773,
+      "step": 12248
+    },
+    {
+      "epoch": 0.3888888888888889,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4570398330688477,
+      "step": 12250
+    },
+    {
+      "epoch": 0.38895238095238094,
+      "grad_norm": 0.10400390625,
+      "learning_rate": 0.1,
+      "loss": 2.4533965587615967,
+      "step": 12252
+    },
+    {
+      "epoch": 0.38901587301587304,
+      "grad_norm": 0.1552734375,
+      "learning_rate": 0.1,
+      "loss": 2.461820363998413,
+      "step": 12254
+    },
+    {
+      "epoch": 0.3890793650793651,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.1,
+      "loss": 2.473161220550537,
+      "step": 12256
+    },
+    {
+      "epoch": 0.3891428571428571,
+      "grad_norm": 0.5859375,
+      "learning_rate": 0.1,
+      "loss": 2.4561400413513184,
+      "step": 12258
+    },
+    {
+      "epoch": 0.3892063492063492,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.4945881366729736,
+      "step": 12260
+    },
+    {
+      "epoch": 0.38926984126984127,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.4576754570007324,
+      "step": 12262
+    },
+    {
+      "epoch": 0.3893333333333333,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.4777143001556396,
+      "step": 12264
+    },
+    {
+      "epoch": 0.3893968253968254,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.1,
+      "loss": 2.466712713241577,
+      "step": 12266
+    },
+    {
+      "epoch": 0.38946031746031745,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.1,
+      "loss": 2.4671568870544434,
+      "step": 12268
+    },
+    {
+      "epoch": 0.38952380952380955,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.1,
+      "loss": 2.449784517288208,
+      "step": 12270
+    },
+    {
+      "epoch": 0.3895873015873016,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.468003273010254,
+      "step": 12272
+    },
+    {
+      "epoch": 0.38965079365079364,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.455387830734253,
+      "step": 12274
+    },
+    {
+      "epoch": 0.38971428571428574,
+      "grad_norm": 0.052490234375,
+      "learning_rate": 0.1,
+      "loss": 2.445190191268921,
+      "step": 12276
+    },
+    {
+      "epoch": 0.3897777777777778,
+      "grad_norm": 0.08056640625,
+      "learning_rate": 0.1,
+      "loss": 2.44834041595459,
+      "step": 12278
+    },
+    {
+      "epoch": 0.3898412698412698,
+      "grad_norm": 0.25,
+      "learning_rate": 0.1,
+      "loss": 2.4347023963928223,
+      "step": 12280
+    },
+    {
+      "epoch": 0.3899047619047619,
+      "grad_norm": 0.296875,
+      "learning_rate": 0.1,
+      "loss": 2.424799680709839,
+      "step": 12282
+    },
+    {
+      "epoch": 0.38996825396825396,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.452523708343506,
+      "step": 12284
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 4.068607701147668e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null