Ba2han committed 28 days ago

Commit 4867b39

1 parent: 5550def

Training in progress, step 10710, checkpoint

Files changed (1)

last-checkpoint/trainer_state.json +1109 -3

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.33,
   "eval_steps": 3150,
-  "global_step": 10395,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -36411,6 +36411,1112 @@
       "learning_rate": 0.1,
       "loss": 2.2154617309570312,
       "step": 10394
     }
   ],
   "logging_steps": 2,
@@ -36430,7 +37536,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 3.4427152866129555e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.34,
   "eval_steps": 3150,
+  "global_step": 10710,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.2154617309570312,
       "step": 10394
+    },
+    {
+      "epoch": 0.330031746031746,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.2190306186676025,
+      "step": 10396
+    },
+    {
+      "epoch": 0.3300952380952381,
+      "grad_norm": 0.10888671875,
+      "learning_rate": 0.1,
+      "loss": 2.2018134593963623,
+      "step": 10398
+    },
+    {
+      "epoch": 0.33015873015873015,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.1,
+      "loss": 2.2261345386505127,
+      "step": 10400
+    },
+    {
+      "epoch": 0.3302222222222222,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.2277352809906006,
+      "step": 10402
+    },
+    {
+      "epoch": 0.3302857142857143,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.1,
+      "loss": 2.2359938621520996,
+      "step": 10404
+    },
+    {
+      "epoch": 0.33034920634920634,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.1,
+      "loss": 2.2159276008605957,
+      "step": 10406
+    },
+    {
+      "epoch": 0.33041269841269844,
+      "grad_norm": 0.22265625,
+      "learning_rate": 0.1,
+      "loss": 2.2152233123779297,
+      "step": 10408
+    },
+    {
+      "epoch": 0.3304761904761905,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.1989567279815674,
+      "step": 10410
+    },
+    {
+      "epoch": 0.3305396825396825,
+      "grad_norm": 0.23046875,
+      "learning_rate": 0.1,
+      "loss": 2.2169737815856934,
+      "step": 10412
+    },
+    {
+      "epoch": 0.3306031746031746,
+      "grad_norm": 0.158203125,
+      "learning_rate": 0.1,
+      "loss": 2.2402162551879883,
+      "step": 10414
+    },
+    {
+      "epoch": 0.33066666666666666,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.1,
+      "loss": 2.22573184967041,
+      "step": 10416
+    },
+    {
+      "epoch": 0.3307301587301587,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.1,
+      "loss": 2.2216873168945312,
+      "step": 10418
+    },
+    {
+      "epoch": 0.3307936507936508,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.1,
+      "loss": 2.22286319732666,
+      "step": 10420
+    },
+    {
+      "epoch": 0.33085714285714285,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.1,
+      "loss": 2.2141366004943848,
+      "step": 10422
+    },
+    {
+      "epoch": 0.3309206349206349,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.207756519317627,
+      "step": 10424
+    },
+    {
+      "epoch": 0.330984126984127,
+      "grad_norm": 0.07763671875,
+      "learning_rate": 0.1,
+      "loss": 2.225470781326294,
+      "step": 10426
+    },
+    {
+      "epoch": 0.33104761904761904,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.1,
+      "loss": 2.228902816772461,
+      "step": 10428
+    },
+    {
+      "epoch": 0.33111111111111113,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.2184183597564697,
+      "step": 10430
+    },
+    {
+      "epoch": 0.3311746031746032,
+      "grad_norm": 0.12158203125,
+      "learning_rate": 0.1,
+      "loss": 2.2254750728607178,
+      "step": 10432
+    },
+    {
+      "epoch": 0.3312380952380952,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.1,
+      "loss": 2.2171778678894043,
+      "step": 10434
+    },
+    {
+      "epoch": 0.3313015873015873,
+      "grad_norm": 0.068359375,
+      "learning_rate": 0.1,
+      "loss": 2.237307071685791,
+      "step": 10436
+    },
+    {
+      "epoch": 0.33136507936507936,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.2065367698669434,
+      "step": 10438
+    },
+    {
+      "epoch": 0.3314285714285714,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.1,
+      "loss": 2.2208123207092285,
+      "step": 10440
+    },
+    {
+      "epoch": 0.3314920634920635,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.219244956970215,
+      "step": 10442
+    },
+    {
+      "epoch": 0.33155555555555555,
+      "grad_norm": 0.185546875,
+      "learning_rate": 0.1,
+      "loss": 2.2103569507598877,
+      "step": 10444
+    },
+    {
+      "epoch": 0.33161904761904765,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.2479629516601562,
+      "step": 10446
+    },
+    {
+      "epoch": 0.3316825396825397,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.2229833602905273,
+      "step": 10448
+    },
+    {
+      "epoch": 0.33174603174603173,
+      "grad_norm": 0.546875,
+      "learning_rate": 0.1,
+      "loss": 2.2340261936187744,
+      "step": 10450
+    },
+    {
+      "epoch": 0.33180952380952383,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.1,
+      "loss": 2.228762626647949,
+      "step": 10452
+    },
+    {
+      "epoch": 0.3318730158730159,
+      "grad_norm": 0.12158203125,
+      "learning_rate": 0.1,
+      "loss": 2.2089247703552246,
+      "step": 10454
+    },
+    {
+      "epoch": 0.3319365079365079,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.214805841445923,
+      "step": 10456
+    },
+    {
+      "epoch": 0.332,
+      "grad_norm": 0.111328125,
+      "learning_rate": 0.1,
+      "loss": 2.2390830516815186,
+      "step": 10458
+    },
+    {
+      "epoch": 0.33206349206349206,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.255481719970703,
+      "step": 10460
+    },
+    {
+      "epoch": 0.3321269841269841,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.19061541557312,
+      "step": 10462
+    },
+    {
+      "epoch": 0.3321904761904762,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.2185862064361572,
+      "step": 10464
+    },
+    {
+      "epoch": 0.33225396825396825,
+      "grad_norm": 0.064453125,
+      "learning_rate": 0.1,
+      "loss": 2.2265207767486572,
+      "step": 10466
+    },
+    {
+      "epoch": 0.33231746031746034,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.2071709632873535,
+      "step": 10468
+    },
+    {
+      "epoch": 0.3323809523809524,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.1,
+      "loss": 2.219552993774414,
+      "step": 10470
+    },
+    {
+      "epoch": 0.33244444444444443,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.2066922187805176,
+      "step": 10472
+    },
+    {
+      "epoch": 0.33250793650793653,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.1909236907958984,
+      "step": 10474
+    },
+    {
+      "epoch": 0.3325714285714286,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.1,
+      "loss": 2.212517023086548,
+      "step": 10476
+    },
+    {
+      "epoch": 0.3326349206349206,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.2175776958465576,
+      "step": 10478
+    },
+    {
+      "epoch": 0.3326984126984127,
+      "grad_norm": 0.162109375,
+      "learning_rate": 0.1,
+      "loss": 2.1897659301757812,
+      "step": 10480
+    },
+    {
+      "epoch": 0.33276190476190476,
+      "grad_norm": 0.07763671875,
+      "learning_rate": 0.1,
+      "loss": 2.197108745574951,
+      "step": 10482
+    },
+    {
+      "epoch": 0.3328253968253968,
+      "grad_norm": 0.08544921875,
+      "learning_rate": 0.1,
+      "loss": 2.210545301437378,
+      "step": 10484
+    },
+    {
+      "epoch": 0.3328888888888889,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.1997792720794678,
+      "step": 10486
+    },
+    {
+      "epoch": 0.33295238095238094,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.212092638015747,
+      "step": 10488
+    },
+    {
+      "epoch": 0.33301587301587304,
+      "grad_norm": 0.08349609375,
+      "learning_rate": 0.1,
+      "loss": 2.193092107772827,
+      "step": 10490
+    },
+    {
+      "epoch": 0.3330793650793651,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.1,
+      "loss": 2.2050037384033203,
+      "step": 10492
+    },
+    {
+      "epoch": 0.33314285714285713,
+      "grad_norm": 0.4140625,
+      "learning_rate": 0.1,
+      "loss": 2.2340750694274902,
+      "step": 10494
+    },
+    {
+      "epoch": 0.33320634920634923,
+      "grad_norm": 0.07763671875,
+      "learning_rate": 0.1,
+      "loss": 2.1892526149749756,
+      "step": 10496
+    },
+    {
+      "epoch": 0.33326984126984127,
+      "grad_norm": 0.07421875,
+      "learning_rate": 0.1,
+      "loss": 2.1927127838134766,
+      "step": 10498
+    },
+    {
+      "epoch": 0.3333333333333333,
+      "grad_norm": 0.2216796875,
+      "learning_rate": 0.1,
+      "loss": 2.200645685195923,
+      "step": 10500
+    },
+    {
+      "epoch": 0.3333968253968254,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.215970277786255,
+      "step": 10502
+    },
+    {
+      "epoch": 0.33346031746031746,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.1,
+      "loss": 2.222879648208618,
+      "step": 10504
+    },
+    {
+      "epoch": 0.3335238095238095,
+      "grad_norm": 0.052978515625,
+      "learning_rate": 0.1,
+      "loss": 2.2258737087249756,
+      "step": 10506
+    },
+    {
+      "epoch": 0.3335873015873016,
+      "grad_norm": 0.1552734375,
+      "learning_rate": 0.1,
+      "loss": 2.1933491230010986,
+      "step": 10508
+    },
+    {
+      "epoch": 0.33365079365079364,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.1,
+      "loss": 2.189959764480591,
+      "step": 10510
+    },
+    {
+      "epoch": 0.33371428571428574,
+      "grad_norm": 0.1943359375,
+      "learning_rate": 0.1,
+      "loss": 2.215662717819214,
+      "step": 10512
+    },
+    {
+      "epoch": 0.3337777777777778,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.2161316871643066,
+      "step": 10514
+    },
+    {
+      "epoch": 0.3338412698412698,
+      "grad_norm": 0.158203125,
+      "learning_rate": 0.1,
+      "loss": 2.213809013366699,
+      "step": 10516
+    },
+    {
+      "epoch": 0.3339047619047619,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.193037509918213,
+      "step": 10518
+    },
+    {
+      "epoch": 0.33396825396825397,
+      "grad_norm": 0.04638671875,
+      "learning_rate": 0.1,
+      "loss": 2.168241262435913,
+      "step": 10520
+    },
+    {
+      "epoch": 0.334031746031746,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.1,
+      "loss": 2.2035601139068604,
+      "step": 10522
+    },
+    {
+      "epoch": 0.3340952380952381,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.1778528690338135,
+      "step": 10524
+    },
+    {
+      "epoch": 0.33415873015873016,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.1,
+      "loss": 2.15850830078125,
+      "step": 10526
+    },
+    {
+      "epoch": 0.3342222222222222,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.1,
+      "loss": 2.205479621887207,
+      "step": 10528
+    },
+    {
+      "epoch": 0.3342857142857143,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.195462703704834,
+      "step": 10530
+    },
+    {
+      "epoch": 0.33434920634920634,
+      "grad_norm": 0.09423828125,
+      "learning_rate": 0.1,
+      "loss": 2.1862833499908447,
+      "step": 10532
+    },
+    {
+      "epoch": 0.33441269841269844,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.1653358936309814,
+      "step": 10534
+    },
+    {
+      "epoch": 0.3344761904761905,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.187879800796509,
+      "step": 10536
+    },
+    {
+      "epoch": 0.3345396825396825,
+      "grad_norm": 0.2392578125,
+      "learning_rate": 0.1,
+      "loss": 2.1676132678985596,
+      "step": 10538
+    },
+    {
+      "epoch": 0.3346031746031746,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.1,
+      "loss": 2.172863006591797,
+      "step": 10540
+    },
+    {
+      "epoch": 0.33466666666666667,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.221343517303467,
+      "step": 10542
+    },
+    {
+      "epoch": 0.3347301587301587,
+      "grad_norm": 0.05615234375,
+      "learning_rate": 0.1,
+      "loss": 2.168297052383423,
+      "step": 10544
+    },
+    {
+      "epoch": 0.3347936507936508,
+      "grad_norm": 0.08837890625,
+      "learning_rate": 0.1,
+      "loss": 2.179994821548462,
+      "step": 10546
+    },
+    {
+      "epoch": 0.33485714285714285,
+      "grad_norm": 0.107421875,
+      "learning_rate": 0.1,
+      "loss": 2.1946322917938232,
+      "step": 10548
+    },
+    {
+      "epoch": 0.3349206349206349,
+      "grad_norm": 0.1630859375,
+      "learning_rate": 0.1,
+      "loss": 2.1987736225128174,
+      "step": 10550
+    },
+    {
+      "epoch": 0.334984126984127,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.186816453933716,
+      "step": 10552
+    },
+    {
+      "epoch": 0.33504761904761904,
+      "grad_norm": 0.060546875,
+      "learning_rate": 0.1,
+      "loss": 2.2010762691497803,
+      "step": 10554
+    },
+    {
+      "epoch": 0.33511111111111114,
+      "grad_norm": 0.09912109375,
+      "learning_rate": 0.1,
+      "loss": 2.20300030708313,
+      "step": 10556
+    },
+    {
+      "epoch": 0.3351746031746032,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.1,
+      "loss": 2.2044005393981934,
+      "step": 10558
+    },
+    {
+      "epoch": 0.3352380952380952,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.1,
+      "loss": 2.177442789077759,
+      "step": 10560
+    },
+    {
+      "epoch": 0.3353015873015873,
+      "grad_norm": 0.1875,
+      "learning_rate": 0.1,
+      "loss": 2.1997268199920654,
+      "step": 10562
+    },
+    {
+      "epoch": 0.33536507936507937,
+      "grad_norm": 0.54296875,
+      "learning_rate": 0.1,
+      "loss": 2.177321434020996,
+      "step": 10564
+    },
+    {
+      "epoch": 0.3354285714285714,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.1,
+      "loss": 2.1977639198303223,
+      "step": 10566
+    },
+    {
+      "epoch": 0.3354920634920635,
+      "grad_norm": 0.06689453125,
+      "learning_rate": 0.1,
+      "loss": 2.186274528503418,
+      "step": 10568
+    },
+    {
+      "epoch": 0.33555555555555555,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.178790807723999,
+      "step": 10570
+    },
+    {
+      "epoch": 0.3356190476190476,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.1916494369506836,
+      "step": 10572
+    },
+    {
+      "epoch": 0.3356825396825397,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.198308229446411,
+      "step": 10574
+    },
+    {
+      "epoch": 0.33574603174603174,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 0.1,
+      "loss": 2.1767139434814453,
+      "step": 10576
+    },
+    {
+      "epoch": 0.33580952380952384,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.1,
+      "loss": 2.189945936203003,
+      "step": 10578
+    },
+    {
+      "epoch": 0.3358730158730159,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.1869962215423584,
+      "step": 10580
+    },
+    {
+      "epoch": 0.3359365079365079,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.1500842571258545,
+      "step": 10582
+    },
+    {
+      "epoch": 0.336,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.1,
+      "loss": 2.1756904125213623,
+      "step": 10584
+    },
+    {
+      "epoch": 0.33606349206349206,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.1,
+      "loss": 2.1680054664611816,
+      "step": 10586
+    },
+    {
+      "epoch": 0.3361269841269841,
+      "grad_norm": 0.216796875,
+      "learning_rate": 0.1,
+      "loss": 2.182795763015747,
+      "step": 10588
+    },
+    {
+      "epoch": 0.3361904761904762,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.1,
+      "loss": 2.187570571899414,
+      "step": 10590
+    },
+    {
+      "epoch": 0.33625396825396825,
+      "grad_norm": 0.216796875,
+      "learning_rate": 0.1,
+      "loss": 2.186627149581909,
+      "step": 10592
+    },
+    {
+      "epoch": 0.3363174603174603,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.1917238235473633,
+      "step": 10594
+    },
+    {
+      "epoch": 0.3363809523809524,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.1571314334869385,
+      "step": 10596
+    },
+    {
+      "epoch": 0.33644444444444443,
+      "grad_norm": 0.062255859375,
+      "learning_rate": 0.1,
+      "loss": 2.2026073932647705,
+      "step": 10598
+    },
+    {
+      "epoch": 0.33650793650793653,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.1,
+      "loss": 2.2058682441711426,
+      "step": 10600
+    },
+    {
+      "epoch": 0.3365714285714286,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.1,
+      "loss": 2.202815294265747,
+      "step": 10602
+    },
+    {
+      "epoch": 0.3366349206349206,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.2016568183898926,
+      "step": 10604
+    },
+    {
+      "epoch": 0.3366984126984127,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.1,
+      "loss": 2.1916162967681885,
+      "step": 10606
+    },
+    {
+      "epoch": 0.33676190476190476,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.1,
+      "loss": 2.1857426166534424,
+      "step": 10608
+    },
+    {
+      "epoch": 0.3368253968253968,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.1,
+      "loss": 2.2138404846191406,
+      "step": 10610
+    },
+    {
+      "epoch": 0.3368888888888889,
+      "grad_norm": 0.1904296875,
+      "learning_rate": 0.1,
+      "loss": 2.1968023777008057,
+      "step": 10612
+    },
+    {
+      "epoch": 0.33695238095238095,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 0.1,
+      "loss": 2.193377733230591,
+      "step": 10614
+    },
+    {
+      "epoch": 0.337015873015873,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.1,
+      "loss": 2.1726179122924805,
+      "step": 10616
+    },
+    {
+      "epoch": 0.3370793650793651,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.1,
+      "loss": 2.1705336570739746,
+      "step": 10618
+    },
+    {
+      "epoch": 0.33714285714285713,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.2049412727355957,
+      "step": 10620
+    },
+    {
+      "epoch": 0.33720634920634923,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.211937665939331,
+      "step": 10622
+    },
+    {
+      "epoch": 0.3372698412698413,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.178178071975708,
+      "step": 10624
+    },
+    {
+      "epoch": 0.3373333333333333,
+      "grad_norm": 0.06005859375,
+      "learning_rate": 0.1,
+      "loss": 2.1817891597747803,
+      "step": 10626
+    },
+    {
+      "epoch": 0.3373968253968254,
+      "grad_norm": 0.07373046875,
+      "learning_rate": 0.1,
+      "loss": 2.1902942657470703,
+      "step": 10628
+    },
+    {
+      "epoch": 0.33746031746031746,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.1,
+      "loss": 2.168306827545166,
+      "step": 10630
+    },
+    {
+      "epoch": 0.3375238095238095,
+      "grad_norm": 0.294921875,
+      "learning_rate": 0.1,
+      "loss": 2.1916491985321045,
+      "step": 10632
+    },
+    {
+      "epoch": 0.3375873015873016,
+      "grad_norm": 0.1162109375,
+      "learning_rate": 0.1,
+      "loss": 2.20019268989563,
+      "step": 10634
+    },
+    {
+      "epoch": 0.33765079365079365,
+      "grad_norm": 0.1015625,
+      "learning_rate": 0.1,
+      "loss": 2.1709628105163574,
+      "step": 10636
+    },
+    {
+      "epoch": 0.3377142857142857,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.164698839187622,
+      "step": 10638
+    },
+    {
+      "epoch": 0.3377777777777778,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.1735100746154785,
+      "step": 10640
+    },
+    {
+      "epoch": 0.33784126984126983,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.1,
+      "loss": 2.1582412719726562,
+      "step": 10642
+    },
+    {
+      "epoch": 0.33790476190476193,
+      "grad_norm": 0.453125,
+      "learning_rate": 0.1,
+      "loss": 2.1964187622070312,
+      "step": 10644
+    },
+    {
+      "epoch": 0.337968253968254,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.1713085174560547,
+      "step": 10646
+    },
+    {
+      "epoch": 0.338031746031746,
+      "grad_norm": 0.0986328125,
+      "learning_rate": 0.1,
+      "loss": 2.190800666809082,
+      "step": 10648
+    },
+    {
+      "epoch": 0.3380952380952381,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.1849403381347656,
+      "step": 10650
+    },
+    {
+      "epoch": 0.33815873015873016,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.1842076778411865,
+      "step": 10652
+    },
+    {
+      "epoch": 0.3382222222222222,
+      "grad_norm": 0.07861328125,
+      "learning_rate": 0.1,
+      "loss": 2.1835153102874756,
+      "step": 10654
+    },
+    {
+      "epoch": 0.3382857142857143,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.1,
+      "loss": 2.184572696685791,
+      "step": 10656
+    },
+    {
+      "epoch": 0.33834920634920634,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.20729398727417,
+      "step": 10658
+    },
+    {
+      "epoch": 0.3384126984126984,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.1,
+      "loss": 2.148922920227051,
+      "step": 10660
+    },
+    {
+      "epoch": 0.3384761904761905,
+      "grad_norm": 0.0595703125,
+      "learning_rate": 0.1,
+      "loss": 2.197078227996826,
+      "step": 10662
+    },
+    {
+      "epoch": 0.33853968253968253,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.1,
+      "loss": 2.1765565872192383,
+      "step": 10664
+    },
+    {
+      "epoch": 0.33860317460317463,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.160946846008301,
+      "step": 10666
+    },
+    {
+      "epoch": 0.33866666666666667,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.1720829010009766,
+      "step": 10668
+    },
+    {
+      "epoch": 0.3387301587301587,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.1,
+      "loss": 2.2134289741516113,
+      "step": 10670
+    },
+    {
+      "epoch": 0.3387936507936508,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.1,
+      "loss": 2.1767289638519287,
+      "step": 10672
+    },
+    {
+      "epoch": 0.33885714285714286,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.1,
+      "loss": 2.1474976539611816,
+      "step": 10674
+    },
+    {
+      "epoch": 0.3389206349206349,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.1,
+      "loss": 2.1698927879333496,
+      "step": 10676
+    },
+    {
+      "epoch": 0.338984126984127,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.1,
+      "loss": 2.170835494995117,
+      "step": 10678
+    },
+    {
+      "epoch": 0.33904761904761904,
+      "grad_norm": 0.193359375,
+      "learning_rate": 0.1,
+      "loss": 2.15285587310791,
+      "step": 10680
+    },
+    {
+      "epoch": 0.3391111111111111,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.1628880500793457,
+      "step": 10682
+    },
+    {
+      "epoch": 0.3391746031746032,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.1494805812835693,
+      "step": 10684
+    },
+    {
+      "epoch": 0.3392380952380952,
+      "grad_norm": 0.052490234375,
+      "learning_rate": 0.1,
+      "loss": 2.1779205799102783,
+      "step": 10686
+    },
+    {
+      "epoch": 0.3393015873015873,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.1,
+      "loss": 2.150136947631836,
+      "step": 10688
+    },
+    {
+      "epoch": 0.33936507936507937,
+      "grad_norm": 0.053466796875,
+      "learning_rate": 0.1,
+      "loss": 2.1458067893981934,
+      "step": 10690
+    },
+    {
+      "epoch": 0.3394285714285714,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.1,
+      "loss": 2.15260910987854,
+      "step": 10692
+    },
+    {
+      "epoch": 0.3394920634920635,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.1635806560516357,
+      "step": 10694
+    },
+    {
+      "epoch": 0.33955555555555555,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.1902191638946533,
+      "step": 10696
+    },
+    {
+      "epoch": 0.3396190476190476,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.171922206878662,
+      "step": 10698
+    },
+    {
+      "epoch": 0.3396825396825397,
+      "grad_norm": 0.09326171875,
+      "learning_rate": 0.1,
+      "loss": 2.1400513648986816,
+      "step": 10700
+    },
+    {
+      "epoch": 0.33974603174603174,
+      "grad_norm": 0.1181640625,
+      "learning_rate": 0.1,
+      "loss": 2.1618614196777344,
+      "step": 10702
+    },
+    {
+      "epoch": 0.3398095238095238,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.1420485973358154,
+      "step": 10704
+    },
+    {
+      "epoch": 0.3398730158730159,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.1,
+      "loss": 2.181180953979492,
+      "step": 10706
+    },
+    {
+      "epoch": 0.3399365079365079,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.175330877304077,
+      "step": 10708
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.1646363735198975,
+      "step": 10710
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 3.5470475061314245e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
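
The counters changed by this commit can be cross-checked against each other. The sketch below is not part of the commit: it hard-codes the `global_step`, `epoch`, `total_flos`, and `eval_steps` values shown in the diff and derives steps per epoch and average FLOPs per step. The `Snapshot` class and the helper function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Top-level counters copied from one version of trainer_state.json."""
    global_step: int
    epoch: float
    total_flos: float


# Values taken from the removed (-) and added (+) lines of the diff above.
OLD = Snapshot(global_step=10395, epoch=0.33, total_flos=3.4427152866129555e19)
NEW = Snapshot(global_step=10710, epoch=0.34, total_flos=3.5470475061314245e19)
EVAL_STEPS = 3150  # unchanged context line in the diff


def steps_per_epoch(snap: Snapshot) -> int:
    """Infer the number of optimizer steps per epoch from a step/epoch pair."""
    return round(snap.global_step / snap.epoch)


def flops_per_step(old: Snapshot, new: Snapshot) -> float:
    """Average FLOPs per optimizer step between two checkpoints."""
    return (new.total_flos - old.total_flos) / (new.global_step - old.global_step)


if __name__ == "__main__":
    print("steps added by this commit:", NEW.global_step - OLD.global_step)
    print("steps per epoch (old, new):", steps_per_epoch(OLD), steps_per_epoch(NEW))
    print("evaluations per epoch:", steps_per_epoch(NEW) / EVAL_STEPS)
    print(f"average FLOPs per step: {flops_per_step(OLD, NEW):.4e}")
```

Both snapshots imply the same steps-per-epoch figure, which suggests the `epoch` and `global_step` fields were updated consistently. With `eval_steps` at 3150, that figure also implies the evaluation cadence per epoch.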