Ba2han committed 11 days ago

Commit b67d153 · verified · 1 parent: c9fd956

Training in progress, step 12600, checkpoint

Files changed (4)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +1117 -3
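
The `+1117 -3` figure for `trainer_state.json` can be cross-checked against the three hunk headers in that file's diff further down. The following is a minimal sketch, assuming standard unified-diff headers of the form `@@ -old_start,old_count +new_start,new_count @@`; the helper name `net_line_change` is hypothetical, and the header strings are copied from this page.

```python
import re

# Hunk headers copied from the trainer_state.json diff on this page.
HUNK_HEADERS = [
    "@@ -2,9 +2,9 @@",
    "@@ -43026,6 +43026,1120 @@",
    "@@ -43045,7 +44159,7 @@",
]

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def net_line_change(header: str) -> int:
    """Return new_count - old_count for one unified-diff hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    # An omitted count means 1 in the unified diff format.
    old_count = int(match.group(2) or 1)
    new_count = int(match.group(4) or 1)
    return new_count - old_count


net = sum(net_line_change(h) for h in HUNK_HEADERS)
print(net)  # 1114, the same as 1117 additions minus 3 deletions
```

Only the middle hunk changes the line count (6 lines become 1120); the other two replace lines one for one, which accounts for the 3 deletions.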

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ccf3832274462c19ae6e1156105c162797ccf65eb00490b5abf25853f514c2bc
+oid sha256:4f7a031ce062de2717c20cbfe28bf235cc8c0984196df8647a9d04071a672be7
 size 1171937904
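
The three binary files in this commit are stored as Git LFS pointers, so each diff shows only a changed `oid` with an unchanged `size`. The sketch below parses the two `model.safetensors` pointers shown here and compares them. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the code assumes the three-line `version` / `oid` / `size` layout visible in the diff.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check raw bytes against a parsed pointer (sha256 pointers only)."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


# Old and new pointers for last-checkpoint/model.safetensors, from the diff.
OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:ccf3832274462c19ae6e1156105c162797ccf65eb00490b5abf25853f514c2bc
size 1171937904
"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:4f7a031ce062de2717c20cbfe28bf235cc8c0984196df8647a9d04071a672be7
size 1171937904
"""

old, new = parse_lfs_pointer(OLD), parse_lfs_pointer(NEW)
print(old["size"] == new["size"])      # True: byte size is unchanged
print(old["digest"] == new["digest"])  # False: file contents differ
```

For a checkpoint of this size, a real check would hash the downloaded file in chunks rather than load it into memory; `matches_pointer` takes bytes only to keep the sketch short.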

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:643729e56f97304b8e07b41a168a1e56714ba24900040ce753f6443fda6c397d
+oid sha256:2bca4c44c416761810f1c3083731707ad3a2d1e7ac24304feb6b7f426c34993d
 size 1288212619

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:42b726231982d4fccb23f68a4f834f6558cd6f912a84230e86f2230216451432
+oid sha256:17739480306268eecb229c9abd21a55ac8184b30446253afafb60d7a0227de30
 size 1401

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.39,
   "eval_steps": 3150,
-  "global_step": 12285,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -43026,6 +43026,1120 @@
       "learning_rate": 0.1,
       "loss": 2.452523708343506,
       "step": 12284
     }
   ],
   "logging_steps": 2,
@@ -43045,7 +44159,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 4.068607701147668e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.4,
   "eval_steps": 3150,
+  "global_step": 12600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.452523708343506,
       "step": 12284
+    },
+    {
+      "epoch": 0.390031746031746,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.1,
+      "loss": 2.4557151794433594,
+      "step": 12286
+    },
+    {
+      "epoch": 0.3900952380952381,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 0.1,
+      "loss": 2.424776315689087,
+      "step": 12288
+    },
+    {
+      "epoch": 0.39015873015873015,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.4696648120880127,
+      "step": 12290
+    },
+    {
+      "epoch": 0.39022222222222225,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.4684576988220215,
+      "step": 12292
+    },
+    {
+      "epoch": 0.3902857142857143,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 0.1,
+      "loss": 2.461087942123413,
+      "step": 12294
+    },
+    {
+      "epoch": 0.39034920634920633,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.430222749710083,
+      "step": 12296
+    },
+    {
+      "epoch": 0.39041269841269843,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.1,
+      "loss": 2.4839582443237305,
+      "step": 12298
+    },
+    {
+      "epoch": 0.3904761904761905,
+      "grad_norm": 0.1611328125,
+      "learning_rate": 0.1,
+      "loss": 2.4276487827301025,
+      "step": 12300
+    },
+    {
+      "epoch": 0.3905396825396825,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.1,
+      "loss": 2.4334208965301514,
+      "step": 12302
+    },
+    {
+      "epoch": 0.3906031746031746,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.4657487869262695,
+      "step": 12304
+    },
+    {
+      "epoch": 0.39066666666666666,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.1,
+      "loss": 2.4697165489196777,
+      "step": 12306
+    },
+    {
+      "epoch": 0.3907301587301587,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.1,
+      "loss": 2.4890670776367188,
+      "step": 12308
+    },
+    {
+      "epoch": 0.3907936507936508,
+      "grad_norm": 0.078125,
+      "learning_rate": 0.1,
+      "loss": 2.482919931411743,
+      "step": 12310
+    },
+    {
+      "epoch": 0.39085714285714285,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.1,
+      "loss": 2.482670545578003,
+      "step": 12312
+    },
+    {
+      "epoch": 0.39092063492063495,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.1,
+      "loss": 2.473506212234497,
+      "step": 12314
+    },
+    {
+      "epoch": 0.390984126984127,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.484362840652466,
+      "step": 12316
+    },
+    {
+      "epoch": 0.39104761904761903,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.1,
+      "loss": 2.49501895904541,
+      "step": 12318
+    },
+    {
+      "epoch": 0.39111111111111113,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.473717451095581,
+      "step": 12320
+    },
+    {
+      "epoch": 0.3911746031746032,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.4876880645751953,
+      "step": 12322
+    },
+    {
+      "epoch": 0.3912380952380952,
+      "grad_norm": 0.220703125,
+      "learning_rate": 0.1,
+      "loss": 2.4805827140808105,
+      "step": 12324
+    },
+    {
+      "epoch": 0.3913015873015873,
+      "grad_norm": 0.31640625,
+      "learning_rate": 0.1,
+      "loss": 2.508021593093872,
+      "step": 12326
+    },
+    {
+      "epoch": 0.39136507936507936,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.4558026790618896,
+      "step": 12328
+    },
+    {
+      "epoch": 0.3914285714285714,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.4671530723571777,
+      "step": 12330
+    },
+    {
+      "epoch": 0.3914920634920635,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.4863462448120117,
+      "step": 12332
+    },
+    {
+      "epoch": 0.39155555555555555,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.462531328201294,
+      "step": 12334
+    },
+    {
+      "epoch": 0.39161904761904764,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.4865562915802,
+      "step": 12336
+    },
+    {
+      "epoch": 0.3916825396825397,
+      "grad_norm": 0.12353515625,
+      "learning_rate": 0.1,
+      "loss": 2.481738805770874,
+      "step": 12338
+    },
+    {
+      "epoch": 0.39174603174603173,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.490565061569214,
+      "step": 12340
+    },
+    {
+      "epoch": 0.39180952380952383,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.481773614883423,
+      "step": 12342
+    },
+    {
+      "epoch": 0.3918730158730159,
+      "grad_norm": 0.50390625,
+      "learning_rate": 0.1,
+      "loss": 2.4730658531188965,
+      "step": 12344
+    },
+    {
+      "epoch": 0.3919365079365079,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.1,
+      "loss": 2.4841883182525635,
+      "step": 12346
+    },
+    {
+      "epoch": 0.392,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.493680715560913,
+      "step": 12348
+    },
+    {
+      "epoch": 0.39206349206349206,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.502474784851074,
+      "step": 12350
+    },
+    {
+      "epoch": 0.3921269841269841,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4785163402557373,
+      "step": 12352
+    },
+    {
+      "epoch": 0.3921904761904762,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.1,
+      "loss": 2.491947650909424,
+      "step": 12354
+    },
+    {
+      "epoch": 0.39225396825396824,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.1,
+      "loss": 2.5014781951904297,
+      "step": 12356
+    },
+    {
+      "epoch": 0.39231746031746034,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.479877471923828,
+      "step": 12358
+    },
+    {
+      "epoch": 0.3923809523809524,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.460399627685547,
+      "step": 12360
+    },
+    {
+      "epoch": 0.39244444444444443,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.4771628379821777,
+      "step": 12362
+    },
+    {
+      "epoch": 0.39250793650793653,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.1,
+      "loss": 2.5179996490478516,
+      "step": 12364
+    },
+    {
+      "epoch": 0.39257142857142857,
+      "grad_norm": 0.158203125,
+      "learning_rate": 0.1,
+      "loss": 2.4573960304260254,
+      "step": 12366
+    },
+    {
+      "epoch": 0.3926349206349206,
+      "grad_norm": 0.07666015625,
+      "learning_rate": 0.1,
+      "loss": 2.488504409790039,
+      "step": 12368
+    },
+    {
+      "epoch": 0.3926984126984127,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.48992657661438,
+      "step": 12370
+    },
+    {
+      "epoch": 0.39276190476190476,
+      "grad_norm": 0.443359375,
+      "learning_rate": 0.1,
+      "loss": 2.4311540126800537,
+      "step": 12372
+    },
+    {
+      "epoch": 0.3928253968253968,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.4825241565704346,
+      "step": 12374
+    },
+    {
+      "epoch": 0.3928888888888889,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4729347229003906,
+      "step": 12376
+    },
+    {
+      "epoch": 0.39295238095238094,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.4759974479675293,
+      "step": 12378
+    },
+    {
+      "epoch": 0.39301587301587304,
+      "grad_norm": 0.05712890625,
+      "learning_rate": 0.1,
+      "loss": 2.490987539291382,
+      "step": 12380
+    },
+    {
+      "epoch": 0.3930793650793651,
+      "grad_norm": 0.0595703125,
+      "learning_rate": 0.1,
+      "loss": 2.4801199436187744,
+      "step": 12382
+    },
+    {
+      "epoch": 0.3931428571428571,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.1,
+      "loss": 2.4446208477020264,
+      "step": 12384
+    },
+    {
+      "epoch": 0.3932063492063492,
+      "grad_norm": 0.30078125,
+      "learning_rate": 0.1,
+      "loss": 2.462266683578491,
+      "step": 12386
+    },
+    {
+      "epoch": 0.39326984126984127,
+      "grad_norm": 0.55859375,
+      "learning_rate": 0.1,
+      "loss": 2.4918887615203857,
+      "step": 12388
+    },
+    {
+      "epoch": 0.3933333333333333,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.4611899852752686,
+      "step": 12390
+    },
+    {
+      "epoch": 0.3933968253968254,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.4827232360839844,
+      "step": 12392
+    },
+    {
+      "epoch": 0.39346031746031745,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.4664206504821777,
+      "step": 12394
+    },
+    {
+      "epoch": 0.3935238095238095,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.1,
+      "loss": 2.463472843170166,
+      "step": 12396
+    },
+    {
+      "epoch": 0.3935873015873016,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.452786445617676,
+      "step": 12398
+    },
+    {
+      "epoch": 0.39365079365079364,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.477893590927124,
+      "step": 12400
+    },
+    {
+      "epoch": 0.39371428571428574,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.1,
+      "loss": 2.4455089569091797,
+      "step": 12402
+    },
+    {
+      "epoch": 0.3937777777777778,
+      "grad_norm": 0.53515625,
+      "learning_rate": 0.1,
+      "loss": 2.4739983081817627,
+      "step": 12404
+    },
+    {
+      "epoch": 0.3938412698412698,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.1,
+      "loss": 2.463010549545288,
+      "step": 12406
+    },
+    {
+      "epoch": 0.3939047619047619,
+      "grad_norm": 0.09765625,
+      "learning_rate": 0.1,
+      "loss": 2.485477924346924,
+      "step": 12408
+    },
+    {
+      "epoch": 0.39396825396825397,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.1,
+      "loss": 2.4666271209716797,
+      "step": 12410
+    },
+    {
+      "epoch": 0.394031746031746,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.1,
+      "loss": 2.4628119468688965,
+      "step": 12412
+    },
+    {
+      "epoch": 0.3940952380952381,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4641313552856445,
+      "step": 12414
+    },
+    {
+      "epoch": 0.39415873015873015,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.1,
+      "loss": 2.4827420711517334,
+      "step": 12416
+    },
+    {
+      "epoch": 0.3942222222222222,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.471240520477295,
+      "step": 12418
+    },
+    {
+      "epoch": 0.3942857142857143,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.1,
+      "loss": 2.4685475826263428,
+      "step": 12420
+    },
+    {
+      "epoch": 0.39434920634920634,
+      "grad_norm": 0.12158203125,
+      "learning_rate": 0.1,
+      "loss": 2.4665637016296387,
+      "step": 12422
+    },
+    {
+      "epoch": 0.39441269841269844,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.1,
+      "loss": 2.4597582817077637,
+      "step": 12424
+    },
+    {
+      "epoch": 0.3944761904761905,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.1,
+      "loss": 2.4702558517456055,
+      "step": 12426
+    },
+    {
+      "epoch": 0.3945396825396825,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.458078145980835,
+      "step": 12428
+    },
+    {
+      "epoch": 0.3946031746031746,
+      "grad_norm": 0.07568359375,
+      "learning_rate": 0.1,
+      "loss": 2.460068702697754,
+      "step": 12430
+    },
+    {
+      "epoch": 0.39466666666666667,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4554131031036377,
+      "step": 12432
+    },
+    {
+      "epoch": 0.3947301587301587,
+      "grad_norm": 0.08056640625,
+      "learning_rate": 0.1,
+      "loss": 2.488631248474121,
+      "step": 12434
+    },
+    {
+      "epoch": 0.3947936507936508,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.4419941902160645,
+      "step": 12436
+    },
+    {
+      "epoch": 0.39485714285714285,
+      "grad_norm": 0.25,
+      "learning_rate": 0.1,
+      "loss": 2.4841761589050293,
+      "step": 12438
+    },
+    {
+      "epoch": 0.3949206349206349,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.4492082595825195,
+      "step": 12440
+    },
+    {
+      "epoch": 0.394984126984127,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.447962760925293,
+      "step": 12442
+    },
+    {
+      "epoch": 0.39504761904761904,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.4587724208831787,
+      "step": 12444
+    },
+    {
+      "epoch": 0.39511111111111114,
+      "grad_norm": 0.0712890625,
+      "learning_rate": 0.1,
+      "loss": 2.462385892868042,
+      "step": 12446
+    },
+    {
+      "epoch": 0.3951746031746032,
+      "grad_norm": 0.057373046875,
+      "learning_rate": 0.1,
+      "loss": 2.4769206047058105,
+      "step": 12448
+    },
+    {
+      "epoch": 0.3952380952380952,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.448030948638916,
+      "step": 12450
+    },
+    {
+      "epoch": 0.3953015873015873,
+      "grad_norm": 0.5,
+      "learning_rate": 0.1,
+      "loss": 2.474348783493042,
+      "step": 12452
+    },
+    {
+      "epoch": 0.39536507936507936,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.4807779788970947,
+      "step": 12454
+    },
+    {
+      "epoch": 0.3954285714285714,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4678142070770264,
+      "step": 12456
+    },
+    {
+      "epoch": 0.3954920634920635,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4569032192230225,
+      "step": 12458
+    },
+    {
+      "epoch": 0.39555555555555555,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.1,
+      "loss": 2.476902484893799,
+      "step": 12460
+    },
+    {
+      "epoch": 0.3956190476190476,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4690651893615723,
+      "step": 12462
+    },
+    {
+      "epoch": 0.3956825396825397,
+      "grad_norm": 0.2451171875,
+      "learning_rate": 0.1,
+      "loss": 2.4898293018341064,
+      "step": 12464
+    },
+    {
+      "epoch": 0.39574603174603173,
+      "grad_norm": 0.375,
+      "learning_rate": 0.1,
+      "loss": 2.4907736778259277,
+      "step": 12466
+    },
+    {
+      "epoch": 0.39580952380952383,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.4488649368286133,
+      "step": 12468
+    },
+    {
+      "epoch": 0.3958730158730159,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.454991102218628,
+      "step": 12470
+    },
+    {
+      "epoch": 0.3959365079365079,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.489849328994751,
+      "step": 12472
+    },
+    {
+      "epoch": 0.396,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.4257235527038574,
+      "step": 12474
+    },
+    {
+      "epoch": 0.39606349206349206,
+      "grad_norm": 0.056396484375,
+      "learning_rate": 0.1,
+      "loss": 2.471559762954712,
+      "step": 12476
+    },
+    {
+      "epoch": 0.3961269841269841,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.4803552627563477,
+      "step": 12478
+    },
+    {
+      "epoch": 0.3961904761904762,
+      "grad_norm": 0.451171875,
+      "learning_rate": 0.1,
+      "loss": 2.445671319961548,
+      "step": 12480
+    },
+    {
+      "epoch": 0.39625396825396825,
+      "grad_norm": 0.1923828125,
+      "learning_rate": 0.1,
+      "loss": 2.468325138092041,
+      "step": 12482
+    },
+    {
+      "epoch": 0.3963174603174603,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.4626095294952393,
+      "step": 12484
+    },
+    {
+      "epoch": 0.3963809523809524,
+      "grad_norm": 0.0625,
+      "learning_rate": 0.1,
+      "loss": 2.4606876373291016,
+      "step": 12486
+    },
+    {
+      "epoch": 0.39644444444444443,
+      "grad_norm": 0.333984375,
+      "learning_rate": 0.1,
+      "loss": 2.4665181636810303,
+      "step": 12488
+    },
+    {
+      "epoch": 0.39650793650793653,
+      "grad_norm": 0.50390625,
+      "learning_rate": 0.1,
+      "loss": 2.468353748321533,
+      "step": 12490
+    },
+    {
+      "epoch": 0.3965714285714286,
+      "grad_norm": 0.2158203125,
+      "learning_rate": 0.1,
+      "loss": 2.486236095428467,
+      "step": 12492
+    },
+    {
+      "epoch": 0.3966349206349206,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.1,
+      "loss": 2.473392963409424,
+      "step": 12494
+    },
+    {
+      "epoch": 0.3966984126984127,
+      "grad_norm": 0.10693359375,
+      "learning_rate": 0.1,
+      "loss": 2.485105514526367,
+      "step": 12496
+    },
+    {
+      "epoch": 0.39676190476190476,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.4699769020080566,
+      "step": 12498
+    },
+    {
+      "epoch": 0.3968253968253968,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 0.1,
+      "loss": 2.4763691425323486,
+      "step": 12500
+    },
+    {
+      "epoch": 0.3968888888888889,
+      "grad_norm": 0.07568359375,
+      "learning_rate": 0.1,
+      "loss": 2.491389274597168,
+      "step": 12502
+    },
+    {
+      "epoch": 0.39695238095238095,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.1,
+      "loss": 2.481797218322754,
+      "step": 12504
+    },
+    {
+      "epoch": 0.397015873015873,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.1,
+      "loss": 2.5158865451812744,
+      "step": 12506
+    },
+    {
+      "epoch": 0.3970793650793651,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 0.1,
+      "loss": 2.4983389377593994,
+      "step": 12508
+    },
+    {
+      "epoch": 0.39714285714285713,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.1,
+      "loss": 2.496899127960205,
+      "step": 12510
+    },
+    {
+      "epoch": 0.39720634920634923,
+      "grad_norm": 0.087890625,
+      "learning_rate": 0.1,
+      "loss": 2.499006986618042,
+      "step": 12512
+    },
+    {
+      "epoch": 0.3972698412698413,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.483018636703491,
+      "step": 12514
+    },
+    {
+      "epoch": 0.3973333333333333,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.1,
+      "loss": 2.4755356311798096,
+      "step": 12516
+    },
+    {
+      "epoch": 0.3973968253968254,
+      "grad_norm": 0.482421875,
+      "learning_rate": 0.1,
+      "loss": 2.4969849586486816,
+      "step": 12518
+    },
+    {
+      "epoch": 0.39746031746031746,
+      "grad_norm": 0.06591796875,
+      "learning_rate": 0.1,
+      "loss": 2.481879234313965,
+      "step": 12520
+    },
+    {
+      "epoch": 0.3975238095238095,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.507845401763916,
+      "step": 12522
+    },
+    {
+      "epoch": 0.3975873015873016,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.514796018600464,
+      "step": 12524
+    },
+    {
+      "epoch": 0.39765079365079364,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.486309766769409,
+      "step": 12526
+    },
+    {
+      "epoch": 0.3977142857142857,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.1,
+      "loss": 2.4785990715026855,
+      "step": 12528
+    },
+    {
+      "epoch": 0.3977777777777778,
+      "grad_norm": 0.294921875,
+      "learning_rate": 0.1,
+      "loss": 2.4831855297088623,
+      "step": 12530
+    },
+    {
+      "epoch": 0.39784126984126983,
+      "grad_norm": 0.19921875,
+      "learning_rate": 0.1,
+      "loss": 2.4929800033569336,
+      "step": 12532
+    },
+    {
+      "epoch": 0.3979047619047619,
+      "grad_norm": 0.2158203125,
+      "learning_rate": 0.1,
+      "loss": 2.5137693881988525,
+      "step": 12534
+    },
+    {
+      "epoch": 0.39796825396825397,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.4880499839782715,
+      "step": 12536
+    },
+    {
+      "epoch": 0.398031746031746,
+      "grad_norm": 0.423828125,
+      "learning_rate": 0.1,
+      "loss": 2.503533124923706,
+      "step": 12538
+    },
+    {
+      "epoch": 0.3980952380952381,
+      "grad_norm": 0.19921875,
+      "learning_rate": 0.1,
+      "loss": 2.480361223220825,
+      "step": 12540
+    },
+    {
+      "epoch": 0.39815873015873016,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.4782419204711914,
+      "step": 12542
+    },
+    {
+      "epoch": 0.3982222222222222,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.463319778442383,
+      "step": 12544
+    },
+    {
+      "epoch": 0.3982857142857143,
+      "grad_norm": 0.09765625,
+      "learning_rate": 0.1,
+      "loss": 2.4680655002593994,
+      "step": 12546
+    },
+    {
+      "epoch": 0.39834920634920634,
+      "grad_norm": 0.08740234375,
+      "learning_rate": 0.1,
+      "loss": 2.4729971885681152,
+      "step": 12548
+    },
+    {
+      "epoch": 0.3984126984126984,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.4643752574920654,
+      "step": 12550
+    },
+    {
+      "epoch": 0.3984761904761905,
+      "grad_norm": 0.1650390625,
+      "learning_rate": 0.1,
+      "loss": 2.4500062465667725,
+      "step": 12552
+    },
+    {
+      "epoch": 0.3985396825396825,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 0.1,
+      "loss": 2.4672186374664307,
+      "step": 12554
+    },
+    {
+      "epoch": 0.3986031746031746,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.1,
+      "loss": 2.4818828105926514,
+      "step": 12556
+    },
+    {
+      "epoch": 0.39866666666666667,
+      "grad_norm": 0.2578125,
+      "learning_rate": 0.1,
+      "loss": 2.4652960300445557,
+      "step": 12558
+    },
+    {
+      "epoch": 0.3987301587301587,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.433884382247925,
+      "step": 12560
+    },
+    {
+      "epoch": 0.3987936507936508,
+      "grad_norm": 0.05712890625,
+      "learning_rate": 0.1,
+      "loss": 2.424755811691284,
+      "step": 12562
+    },
+    {
+      "epoch": 0.39885714285714285,
+      "grad_norm": 0.07568359375,
+      "learning_rate": 0.1,
+      "loss": 2.430487632751465,
+      "step": 12564
+    },
+    {
+      "epoch": 0.3989206349206349,
+      "grad_norm": 0.055908203125,
+      "learning_rate": 0.1,
+      "loss": 2.4455268383026123,
+      "step": 12566
+    },
+    {
+      "epoch": 0.398984126984127,
+      "grad_norm": 0.23828125,
+      "learning_rate": 0.1,
+      "loss": 2.4371044635772705,
+      "step": 12568
+    },
+    {
+      "epoch": 0.39904761904761904,
+      "grad_norm": 0.546875,
+      "learning_rate": 0.1,
+      "loss": 2.4374101161956787,
+      "step": 12570
+    },
+    {
+      "epoch": 0.39911111111111114,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.1,
+      "loss": 2.4382810592651367,
+      "step": 12572
+    },
+    {
+      "epoch": 0.3991746031746032,
+      "grad_norm": 0.056884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4031074047088623,
+      "step": 12574
+    },
+    {
+      "epoch": 0.3992380952380952,
+      "grad_norm": 0.0517578125,
+      "learning_rate": 0.1,
+      "loss": 2.41848087310791,
+      "step": 12576
+    },
+    {
+      "epoch": 0.3993015873015873,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.432514190673828,
+      "step": 12578
+    },
+    {
+      "epoch": 0.39936507936507937,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.1,
+      "loss": 2.4152355194091797,
+      "step": 12580
+    },
+    {
+      "epoch": 0.3994285714285714,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.1,
+      "loss": 2.3761203289031982,
+      "step": 12582
+    },
+    {
+      "epoch": 0.3994920634920635,
+      "grad_norm": 0.06103515625,
+      "learning_rate": 0.1,
+      "loss": 2.4018025398254395,
+      "step": 12584
+    },
+    {
+      "epoch": 0.39955555555555555,
+      "grad_norm": 0.07080078125,
+      "learning_rate": 0.1,
+      "loss": 2.386880874633789,
+      "step": 12586
+    },
+    {
+      "epoch": 0.3996190476190476,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.1,
+      "loss": 2.397996425628662,
+      "step": 12588
+    },
+    {
+      "epoch": 0.3996825396825397,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.1,
+      "loss": 2.398606300354004,
+      "step": 12590
+    },
+    {
+      "epoch": 0.39974603174603174,
+      "grad_norm": 0.0771484375,
+      "learning_rate": 0.1,
+      "loss": 2.370978832244873,
+      "step": 12592
+    },
+    {
+      "epoch": 0.39980952380952384,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.1,
+      "loss": 2.37992262840271,
+      "step": 12594
+    },
+    {
+      "epoch": 0.3998730158730159,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.378612995147705,
+      "step": 12596
+    },
+    {
+      "epoch": 0.3999365079365079,
+      "grad_norm": 0.12060546875,
+      "learning_rate": 0.1,
+      "loss": 2.3547215461730957,
+      "step": 12598
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.08740234375,
+      "learning_rate": 0.1,
+      "loss": 2.359011650085449,
+      "step": 12600
+    },
+    {
+      "epoch": 0.4,
+      "eval_loss": 1.7705790996551514,
+      "eval_runtime": 105.9037,
+      "eval_samples_per_second": 10.028,
+      "eval_steps_per_second": 2.512,
+      "step": 12600
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 4.17291330903253e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
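
The logged values in this diff are internally consistent, which a short script can confirm. The sketch below embeds four entries copied from the diff above and assumes they live under a top-level `log_history` list, the key the Transformers `Trainer` uses for this array (the key itself is not visible in the hunks shown here). The helper name `split_logs` is hypothetical.

```python
import json

# Four entries copied from the diff above; the real file holds thousands.
SAMPLE_STATE = json.loads("""
{
  "epoch": 0.4,
  "eval_steps": 3150,
  "global_step": 12600,
  "logging_steps": 2,
  "log_history": [
    {"epoch": 0.390031746031746, "grad_norm": 0.109375,
     "learning_rate": 0.1, "loss": 2.4557151794433594, "step": 12286},
    {"epoch": 0.3999365079365079, "grad_norm": 0.12060546875,
     "learning_rate": 0.1, "loss": 2.3547215461730957, "step": 12598},
    {"epoch": 0.4, "grad_norm": 0.08740234375,
     "learning_rate": 0.1, "loss": 2.359011650085449, "step": 12600},
    {"epoch": 0.4, "eval_loss": 1.7705790996551514,
     "eval_runtime": 105.9037, "eval_samples_per_second": 10.028,
     "eval_steps_per_second": 2.512, "step": 12600}
  ]
}
""")


def split_logs(state: dict) -> tuple[list, list]:
    """Separate training-loss entries from evaluation entries."""
    history = state["log_history"]
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    return train, evals


train, evals = split_logs(SAMPLE_STATE)

# Epoch 0.4 at step 12600 implies the length of one full epoch in steps.
steps_per_epoch = round(SAMPLE_STATE["global_step"] / SAMPLE_STATE["epoch"])
print(steps_per_epoch)  # 31500

# Each logged epoch value should then equal step / steps_per_epoch.
for entry in train:
    assert abs(entry["step"] / steps_per_epoch - entry["epoch"]) < 1e-9

# The evaluation entry lands on a multiple of eval_steps.
print(evals[-1]["step"] % SAMPLE_STATE["eval_steps"] == 0)  # True
```

To run this against the actual checkpoint, replace `SAMPLE_STATE` with `json.load(open("last-checkpoint/trainer_state.json"))`, provided the file has been downloaded locally.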