Instructions to use Ba2han/experimental2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Ba2han/experimental2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/experimental2", trust_remote_code=True)

Local Apps

vLLM

How to use Ba2han/experimental2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ba2han/experimental2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
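The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response is assumed to follow the OpenAI completions schema (`choices[0].text`), since the server is described above as OpenAI-compatible.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    base_url="http://localhost:8000",
    model="Ba2han/experimental2",
    max_tokens=512,
    temperature=0.5,
):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server from the previous step running, `complete("Once upon a time,")` should return the continuation as a string. Pass `base_url="http://localhost:30000"` to target a server listening on a different port.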

Use Docker

docker model run hf.co/Ba2han/experimental2

SGLang

How to use Ba2han/experimental2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/experimental2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/experimental2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use Ba2han/experimental2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/experimental2 to start chatting

Load model with FastModel

# Install Unsloth first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/experimental2",
    max_seq_length=2048,
)

Docker Model Runner
How to use Ba2han/experimental2 with Docker Model Runner:
```
docker model run hf.co/Ba2han/experimental2
```

Ba2han committed 7 days ago

Commit 1738ef0 (verified)

1 Parent(s): 17f9879

Training in progress, step 7875, checkpoint

Files changed (5)

last-checkpoint/config.json +1 -1
last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +1102 -3

last-checkpoint/config.json CHANGED

@@ -60,7 +60,7 @@
   "max_position_embeddings": 8192,
   "max_window_layers": 40,
   "mlp_type": "squared_relu",
-  "model_name": "qwen3-canon-padded",
+  "model_name": "checkpoint-7560",
   "model_type": "qwen3",
   "n_layer": 40,
   "num_attention_heads": 16,

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3d4a4f61f0107262f86305ea5b4a9845d4bfa3d5b095053c23d5ff124af32c1a
+oid sha256:d2a7dbb293fc8a969155fa59a9a5d45fb487f92059756a2206dd2b213705ad34
 size 1171937904
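
Each `.safetensors` and `.pt` entry in this commit is a Git LFS pointer, so the diff shows only the object hash and size rather than the tensor data. As a rough illustration, the sketch below parses the two pointer versions of `model.safetensors` shown above and checks that the content hash changed while the byte size stayed the same. The function name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents before and after the commit, copied from the diff.
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3d4a4f61f0107262f86305ea5b4a9845d4bfa3d5b095053c23d5ff124af32c1a
size 1171937904
"""
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d2a7dbb293fc8a969155fa59a9a5d45fb487f92059756a2206dd2b213705ad34
size 1171937904
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

content_changed = old["digest"] != new["digest"]
same_size = old["size"] == new["size"]
print(f"content changed: {content_changed}, same size: {same_size}")
```

An unchanged size with a new hash is what one would expect when a checkpoint overwrites weights of identical shape and dtype.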

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:94dc587b48c02a4478666e2ba43dda46b10f5b11daee66f5d9390179e3f3c142
+oid sha256:37e1fac40ab049effbace32a5577cae0116c678cd4eb1899d992ce7b00f84a24
 size 1288212619

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:45cb5ae3dc4aeef06aed690321fbe67c6d0a24b097985013bb940546b9888e86
+oid sha256:a5e041e5f006a575c0c450022f325a9d63503c3927d9e10753083345cf295a3a
 size 1401

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.24,
   "eval_steps": 3150,
-  "global_step": 7560,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -26484,6 +26484,1105 @@
       "learning_rate": 0.1,
       "loss": 2.1111397743225098,
       "step": 7560
     }
   ],
   "logging_steps": 2,
@@ -26503,7 +27602,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 2.503673961854231e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.25,
   "eval_steps": 3150,
+  "global_step": 7875,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.1111397743225098,
       "step": 7560
+    },
+    {
+      "epoch": 0.24006349206349206,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.108712673187256,
+      "step": 7562
+    },
+    {
+      "epoch": 0.24012698412698413,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.1471011638641357,
+      "step": 7564
+    },
+    {
+      "epoch": 0.2401904761904762,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.1378467082977295,
+      "step": 7566
+    },
+    {
+      "epoch": 0.24025396825396825,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.1189959049224854,
+      "step": 7568
+    },
+    {
+      "epoch": 0.24031746031746032,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.15275239944458,
+      "step": 7570
+    },
+    {
+      "epoch": 0.2403809523809524,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.096132516860962,
+      "step": 7572
+    },
+    {
+      "epoch": 0.24044444444444443,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.1,
+      "loss": 2.1388909816741943,
+      "step": 7574
+    },
+    {
+      "epoch": 0.2405079365079365,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.1,
+      "loss": 2.1405465602874756,
+      "step": 7576
+    },
+    {
+      "epoch": 0.24057142857142857,
+      "grad_norm": 0.0576171875,
+      "learning_rate": 0.1,
+      "loss": 2.168820858001709,
+      "step": 7578
+    },
+    {
+      "epoch": 0.24063492063492065,
+      "grad_norm": 0.080078125,
+      "learning_rate": 0.1,
+      "loss": 2.110339403152466,
+      "step": 7580
+    },
+    {
+      "epoch": 0.2406984126984127,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.1240358352661133,
+      "step": 7582
+    },
+    {
+      "epoch": 0.24076190476190476,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.1,
+      "loss": 2.0973892211914062,
+      "step": 7584
+    },
+    {
+      "epoch": 0.24082539682539683,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.1,
+      "loss": 2.1381688117980957,
+      "step": 7586
+    },
+    {
+      "epoch": 0.2408888888888889,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.1,
+      "loss": 2.106537103652954,
+      "step": 7588
+    },
+    {
+      "epoch": 0.24095238095238095,
+      "grad_norm": 0.2392578125,
+      "learning_rate": 0.1,
+      "loss": 2.1185178756713867,
+      "step": 7590
+    },
+    {
+      "epoch": 0.24101587301587302,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.1,
+      "loss": 2.120662212371826,
+      "step": 7592
+    },
+    {
+      "epoch": 0.2410793650793651,
+      "grad_norm": 0.197265625,
+      "learning_rate": 0.1,
+      "loss": 2.130314588546753,
+      "step": 7594
+    },
+    {
+      "epoch": 0.24114285714285713,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.1311545372009277,
+      "step": 7596
+    },
+    {
+      "epoch": 0.2412063492063492,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.1,
+      "loss": 2.1440396308898926,
+      "step": 7598
+    },
+    {
+      "epoch": 0.24126984126984127,
+      "grad_norm": 0.1953125,
+      "learning_rate": 0.1,
+      "loss": 2.121884822845459,
+      "step": 7600
+    },
+    {
+      "epoch": 0.24133333333333334,
+      "grad_norm": 0.07080078125,
+      "learning_rate": 0.1,
+      "loss": 2.1237287521362305,
+      "step": 7602
+    },
+    {
+      "epoch": 0.2413968253968254,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.1442861557006836,
+      "step": 7604
+    },
+    {
+      "epoch": 0.24146031746031746,
+      "grad_norm": 0.49609375,
+      "learning_rate": 0.1,
+      "loss": 2.095569372177124,
+      "step": 7606
+    },
+    {
+      "epoch": 0.24152380952380953,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.1049482822418213,
+      "step": 7608
+    },
+    {
+      "epoch": 0.2415873015873016,
+      "grad_norm": 0.10888671875,
+      "learning_rate": 0.1,
+      "loss": 2.1004626750946045,
+      "step": 7610
+    },
+    {
+      "epoch": 0.24165079365079364,
+      "grad_norm": 0.10546875,
+      "learning_rate": 0.1,
+      "loss": 2.107957363128662,
+      "step": 7612
+    },
+    {
+      "epoch": 0.24171428571428571,
+      "grad_norm": 0.05810546875,
+      "learning_rate": 0.1,
+      "loss": 2.1209492683410645,
+      "step": 7614
+    },
+    {
+      "epoch": 0.24177777777777779,
+      "grad_norm": 0.08251953125,
+      "learning_rate": 0.1,
+      "loss": 2.1376209259033203,
+      "step": 7616
+    },
+    {
+      "epoch": 0.24184126984126983,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.1,
+      "loss": 2.126371383666992,
+      "step": 7618
+    },
+    {
+      "epoch": 0.2419047619047619,
+      "grad_norm": 0.057373046875,
+      "learning_rate": 0.1,
+      "loss": 2.1156535148620605,
+      "step": 7620
+    },
+    {
+      "epoch": 0.24196825396825397,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.1,
+      "loss": 2.100332260131836,
+      "step": 7622
+    },
+    {
+      "epoch": 0.24203174603174604,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.1,
+      "loss": 2.0983855724334717,
+      "step": 7624
+    },
+    {
+      "epoch": 0.24209523809523809,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.1,
+      "loss": 2.110562324523926,
+      "step": 7626
+    },
+    {
+      "epoch": 0.24215873015873016,
+      "grad_norm": 0.06396484375,
+      "learning_rate": 0.1,
+      "loss": 2.120020866394043,
+      "step": 7628
+    },
+    {
+      "epoch": 0.24222222222222223,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.1,
+      "loss": 2.1192984580993652,
+      "step": 7630
+    },
+    {
+      "epoch": 0.2422857142857143,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.1,
+      "loss": 2.1244804859161377,
+      "step": 7632
+    },
+    {
+      "epoch": 0.24234920634920634,
+      "grad_norm": 0.15234375,
+      "learning_rate": 0.1,
+      "loss": 2.113220691680908,
+      "step": 7634
+    },
+    {
+      "epoch": 0.2424126984126984,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.09896183013916,
+      "step": 7636
+    },
+    {
+      "epoch": 0.24247619047619048,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.1062746047973633,
+      "step": 7638
+    },
+    {
+      "epoch": 0.24253968253968253,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.0957741737365723,
+      "step": 7640
+    },
+    {
+      "epoch": 0.2426031746031746,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.1,
+      "loss": 2.1229372024536133,
+      "step": 7642
+    },
+    {
+      "epoch": 0.24266666666666667,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.1,
+      "loss": 2.109020471572876,
+      "step": 7644
+    },
+    {
+      "epoch": 0.24273015873015874,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.1,
+      "loss": 2.1225526332855225,
+      "step": 7646
+    },
+    {
+      "epoch": 0.24279365079365078,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.0903031826019287,
+      "step": 7648
+    },
+    {
+      "epoch": 0.24285714285714285,
+      "grad_norm": 0.1162109375,
+      "learning_rate": 0.1,
+      "loss": 2.0699353218078613,
+      "step": 7650
+    },
+    {
+      "epoch": 0.24292063492063493,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.1,
+      "loss": 2.1053171157836914,
+      "step": 7652
+    },
+    {
+      "epoch": 0.242984126984127,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.1,
+      "loss": 2.0997400283813477,
+      "step": 7654
+    },
+    {
+      "epoch": 0.24304761904761904,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.1162967681884766,
+      "step": 7656
+    },
+    {
+      "epoch": 0.2431111111111111,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.1261515617370605,
+      "step": 7658
+    },
+    {
+      "epoch": 0.24317460317460318,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.093658685684204,
+      "step": 7660
+    },
+    {
+      "epoch": 0.24323809523809523,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.099531888961792,
+      "step": 7662
+    },
+    {
+      "epoch": 0.2433015873015873,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.1,
+      "loss": 2.098545789718628,
+      "step": 7664
+    },
+    {
+      "epoch": 0.24336507936507937,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.0889580249786377,
+      "step": 7666
+    },
+    {
+      "epoch": 0.24342857142857144,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.082266330718994,
+      "step": 7668
+    },
+    {
+      "epoch": 0.24349206349206348,
+      "grad_norm": 0.09814453125,
+      "learning_rate": 0.1,
+      "loss": 2.0861690044403076,
+      "step": 7670
+    },
+    {
+      "epoch": 0.24355555555555555,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.1,
+      "loss": 2.113250732421875,
+      "step": 7672
+    },
+    {
+      "epoch": 0.24361904761904762,
+      "grad_norm": 0.2392578125,
+      "learning_rate": 0.1,
+      "loss": 2.147066593170166,
+      "step": 7674
+    },
+    {
+      "epoch": 0.2436825396825397,
+      "grad_norm": 0.1162109375,
+      "learning_rate": 0.1,
+      "loss": 2.146116256713867,
+      "step": 7676
+    },
+    {
+      "epoch": 0.24374603174603174,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.1041014194488525,
+      "step": 7678
+    },
+    {
+      "epoch": 0.2438095238095238,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.1133410930633545,
+      "step": 7680
+    },
+    {
+      "epoch": 0.24387301587301588,
+      "grad_norm": 0.23046875,
+      "learning_rate": 0.1,
+      "loss": 2.0914034843444824,
+      "step": 7682
+    },
+    {
+      "epoch": 0.24393650793650792,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.0841925144195557,
+      "step": 7684
+    },
+    {
+      "epoch": 0.244,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.0767691135406494,
+      "step": 7686
+    },
+    {
+      "epoch": 0.24406349206349207,
+      "grad_norm": 0.056640625,
+      "learning_rate": 0.1,
+      "loss": 2.095615863800049,
+      "step": 7688
+    },
+    {
+      "epoch": 0.24412698412698414,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.0853114128112793,
+      "step": 7690
+    },
+    {
+      "epoch": 0.24419047619047618,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.0804555416107178,
+      "step": 7692
+    },
+    {
+      "epoch": 0.24425396825396825,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.1,
+      "loss": 2.0776097774505615,
+      "step": 7694
+    },
+    {
+      "epoch": 0.24431746031746032,
+      "grad_norm": 0.07470703125,
+      "learning_rate": 0.1,
+      "loss": 2.124927282333374,
+      "step": 7696
+    },
+    {
+      "epoch": 0.2443809523809524,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.1,
+      "loss": 2.095863103866577,
+      "step": 7698
+    },
+    {
+      "epoch": 0.24444444444444444,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.1371068954467773,
+      "step": 7700
+    },
+    {
+      "epoch": 0.2445079365079365,
+      "grad_norm": 0.056640625,
+      "learning_rate": 0.1,
+      "loss": 2.0929276943206787,
+      "step": 7702
+    },
+    {
+      "epoch": 0.24457142857142858,
+      "grad_norm": 0.10400390625,
+      "learning_rate": 0.1,
+      "loss": 2.104721784591675,
+      "step": 7704
+    },
+    {
+      "epoch": 0.24463492063492062,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.08351731300354,
+      "step": 7706
+    },
+    {
+      "epoch": 0.2446984126984127,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.1,
+      "loss": 2.108940839767456,
+      "step": 7708
+    },
+    {
+      "epoch": 0.24476190476190476,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.1,
+      "loss": 2.1364517211914062,
+      "step": 7710
+    },
+    {
+      "epoch": 0.24482539682539683,
+      "grad_norm": 0.10546875,
+      "learning_rate": 0.1,
+      "loss": 2.104205846786499,
+      "step": 7712
+    },
+    {
+      "epoch": 0.24488888888888888,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.0820083618164062,
+      "step": 7714
+    },
+    {
+      "epoch": 0.24495238095238095,
+      "grad_norm": 0.0810546875,
+      "learning_rate": 0.1,
+      "loss": 2.1198596954345703,
+      "step": 7716
+    },
+    {
+      "epoch": 0.24501587301587302,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.0827841758728027,
+      "step": 7718
+    },
+    {
+      "epoch": 0.2450793650793651,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.100688934326172,
+      "step": 7720
+    },
+    {
+      "epoch": 0.24514285714285713,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.1,
+      "loss": 2.1275124549865723,
+      "step": 7722
+    },
+    {
+      "epoch": 0.2452063492063492,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.0562610626220703,
+      "step": 7724
+    },
+    {
+      "epoch": 0.24526984126984128,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.1,
+      "loss": 2.088521957397461,
+      "step": 7726
+    },
+    {
+      "epoch": 0.24533333333333332,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.1251540184020996,
+      "step": 7728
+    },
+    {
+      "epoch": 0.2453968253968254,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.110805034637451,
+      "step": 7730
+    },
+    {
+      "epoch": 0.24546031746031746,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.1409051418304443,
+      "step": 7732
+    },
+    {
+      "epoch": 0.24552380952380953,
+      "grad_norm": 0.072265625,
+      "learning_rate": 0.1,
+      "loss": 2.089325428009033,
+      "step": 7734
+    },
+    {
+      "epoch": 0.24558730158730158,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.1,
+      "loss": 2.116668224334717,
+      "step": 7736
+    },
+    {
+      "epoch": 0.24565079365079365,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.1,
+      "loss": 2.124164581298828,
+      "step": 7738
+    },
+    {
+      "epoch": 0.24571428571428572,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.1,
+      "loss": 2.1034581661224365,
+      "step": 7740
+    },
+    {
+      "epoch": 0.2457777777777778,
+      "grad_norm": 0.08251953125,
+      "learning_rate": 0.1,
+      "loss": 2.0777595043182373,
+      "step": 7742
+    },
+    {
+      "epoch": 0.24584126984126983,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.1,
+      "loss": 2.087156057357788,
+      "step": 7744
+    },
+    {
+      "epoch": 0.2459047619047619,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 0.1,
+      "loss": 2.0853726863861084,
+      "step": 7746
+    },
+    {
+      "epoch": 0.24596825396825397,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.1,
+      "loss": 2.1072938442230225,
+      "step": 7748
+    },
+    {
+      "epoch": 0.24603174603174602,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.1,
+      "loss": 2.0686442852020264,
+      "step": 7750
+    },
+    {
+      "epoch": 0.2460952380952381,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.0939245223999023,
+      "step": 7752
+    },
+    {
+      "epoch": 0.24615873015873016,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.0739314556121826,
+      "step": 7754
+    },
+    {
+      "epoch": 0.24622222222222223,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.128819465637207,
+      "step": 7756
+    },
+    {
+      "epoch": 0.24628571428571427,
+      "grad_norm": 0.1171875,
+      "learning_rate": 0.1,
+      "loss": 2.080601453781128,
+      "step": 7758
+    },
+    {
+      "epoch": 0.24634920634920635,
+      "grad_norm": 0.0625,
+      "learning_rate": 0.1,
+      "loss": 2.1003646850585938,
+      "step": 7760
+    },
+    {
+      "epoch": 0.24641269841269842,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.07106614112854,
+      "step": 7762
+    },
+    {
+      "epoch": 0.2464761904761905,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.0942091941833496,
+      "step": 7764
+    },
+    {
+      "epoch": 0.24653968253968253,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.0892837047576904,
+      "step": 7766
+    },
+    {
+      "epoch": 0.2466031746031746,
+      "grad_norm": 0.171875,
+      "learning_rate": 0.1,
+      "loss": 2.1195573806762695,
+      "step": 7768
+    },
+    {
+      "epoch": 0.24666666666666667,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.1,
+      "loss": 2.1099534034729004,
+      "step": 7770
+    },
+    {
+      "epoch": 0.24673015873015874,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.081984281539917,
+      "step": 7772
+    },
+    {
+      "epoch": 0.2467936507936508,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.0995848178863525,
+      "step": 7774
+    },
+    {
+      "epoch": 0.24685714285714286,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.1072371006011963,
+      "step": 7776
+    },
+    {
+      "epoch": 0.24692063492063493,
+      "grad_norm": 0.08642578125,
+      "learning_rate": 0.1,
+      "loss": 2.0923726558685303,
+      "step": 7778
+    },
+    {
+      "epoch": 0.24698412698412697,
+      "grad_norm": 0.052978515625,
+      "learning_rate": 0.1,
+      "loss": 2.088723659515381,
+      "step": 7780
+    },
+    {
+      "epoch": 0.24704761904761904,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.112581968307495,
+      "step": 7782
+    },
+    {
+      "epoch": 0.24711111111111111,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.115924596786499,
+      "step": 7784
+    },
+    {
+      "epoch": 0.24717460317460319,
+      "grad_norm": 0.059814453125,
+      "learning_rate": 0.1,
+      "loss": 2.103855609893799,
+      "step": 7786
+    },
+    {
+      "epoch": 0.24723809523809523,
+      "grad_norm": 0.169921875,
+      "learning_rate": 0.1,
+      "loss": 2.0790719985961914,
+      "step": 7788
+    },
+    {
+      "epoch": 0.2473015873015873,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.1142492294311523,
+      "step": 7790
+    },
+    {
+      "epoch": 0.24736507936507937,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.140623092651367,
+      "step": 7792
+    },
+    {
+      "epoch": 0.24742857142857144,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.1,
+      "loss": 2.0839195251464844,
+      "step": 7794
+    },
+    {
+      "epoch": 0.24749206349206349,
+      "grad_norm": 0.20703125,
+      "learning_rate": 0.1,
+      "loss": 2.085799217224121,
+      "step": 7796
+    },
+    {
+      "epoch": 0.24755555555555556,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.0974864959716797,
+      "step": 7798
+    },
+    {
+      "epoch": 0.24761904761904763,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.1,
+      "loss": 2.100862741470337,
+      "step": 7800
+    },
+    {
+      "epoch": 0.24768253968253967,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.123643636703491,
+      "step": 7802
+    },
+    {
+      "epoch": 0.24774603174603174,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.111666202545166,
+      "step": 7804
+    },
+    {
+      "epoch": 0.2478095238095238,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.1051220893859863,
+      "step": 7806
+    },
+    {
+      "epoch": 0.24787301587301588,
+      "grad_norm": 0.0693359375,
+      "learning_rate": 0.1,
+      "loss": 2.1013715267181396,
+      "step": 7808
+    },
+    {
+      "epoch": 0.24793650793650793,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.0939769744873047,
+      "step": 7810
+    },
+    {
+      "epoch": 0.248,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.0908074378967285,
+      "step": 7812
+    },
+    {
+      "epoch": 0.24806349206349207,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.1,
+      "loss": 2.0955634117126465,
+      "step": 7814
+    },
+    {
+      "epoch": 0.24812698412698414,
+      "grad_norm": 0.087890625,
+      "learning_rate": 0.1,
+      "loss": 2.0926597118377686,
+      "step": 7816
+    },
+    {
+      "epoch": 0.24819047619047618,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.1247200965881348,
+      "step": 7818
+    },
+    {
+      "epoch": 0.24825396825396825,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.1,
+      "loss": 2.096296548843384,
+      "step": 7820
+    },
+    {
+      "epoch": 0.24831746031746033,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.1,
+      "loss": 2.1253182888031006,
+      "step": 7822
+    },
+    {
+      "epoch": 0.24838095238095237,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 0.1,
+      "loss": 2.1058707237243652,
+      "step": 7824
+    },
+    {
+      "epoch": 0.24844444444444444,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.07863187789917,
+      "step": 7826
+    },
+    {
+      "epoch": 0.2485079365079365,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.1,
+      "loss": 2.123563766479492,
+      "step": 7828
+    },
+    {
+      "epoch": 0.24857142857142858,
+      "grad_norm": 0.11669921875,
+      "learning_rate": 0.1,
+      "loss": 2.111006736755371,
+      "step": 7830
+    },
+    {
+      "epoch": 0.24863492063492063,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.1,
+      "loss": 2.103977680206299,
+      "step": 7832
+    },
+    {
+      "epoch": 0.2486984126984127,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.104381561279297,
+      "step": 7834
+    },
+    {
+      "epoch": 0.24876190476190477,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.113999843597412,
+      "step": 7836
+    },
+    {
+      "epoch": 0.24882539682539684,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.0946075916290283,
+      "step": 7838
+    },
+    {
+      "epoch": 0.24888888888888888,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.123718023300171,
+      "step": 7840
+    },
+    {
+      "epoch": 0.24895238095238095,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.1301300525665283,
+      "step": 7842
+    },
+    {
+      "epoch": 0.24901587301587302,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.1389126777648926,
+      "step": 7844
+    },
+    {
+      "epoch": 0.24907936507936507,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.1,
+      "loss": 2.138192892074585,
+      "step": 7846
+    },
+    {
+      "epoch": 0.24914285714285714,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.1,
+      "loss": 2.091545581817627,
+      "step": 7848
+    },
+    {
+      "epoch": 0.2492063492063492,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.090691089630127,
+      "step": 7850
+    },
+    {
+      "epoch": 0.24926984126984128,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.1429617404937744,
+      "step": 7852
+    },
+    {
+      "epoch": 0.24933333333333332,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.1,
+      "loss": 2.1256930828094482,
+      "step": 7854
+    },
+    {
+      "epoch": 0.2493968253968254,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.0978100299835205,
+      "step": 7856
+    },
+    {
+      "epoch": 0.24946031746031747,
+      "grad_norm": 0.240234375,
+      "learning_rate": 0.1,
+      "loss": 2.1158103942871094,
+      "step": 7858
+    },
+    {
+      "epoch": 0.24952380952380954,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.1170918941497803,
+      "step": 7860
+    },
+    {
+      "epoch": 0.24958730158730158,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.0898027420043945,
+      "step": 7862
+    },
+    {
+      "epoch": 0.24965079365079365,
+      "grad_norm": 0.1650390625,
+      "learning_rate": 0.1,
+      "loss": 2.1037285327911377,
+      "step": 7864
+    },
+    {
+      "epoch": 0.24971428571428572,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.1031713485717773,
+      "step": 7866
+    },
+    {
+      "epoch": 0.24977777777777777,
+      "grad_norm": 0.11572265625,
+      "learning_rate": 0.1,
+      "loss": 2.089454174041748,
+      "step": 7868
+    },
+    {
+      "epoch": 0.24984126984126984,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.1241745948791504,
+      "step": 7870
+    },
+    {
+      "epoch": 0.2499047619047619,
+      "grad_norm": 0.185546875,
+      "learning_rate": 0.1,
+      "loss": 2.114145278930664,
+      "step": 7872
+    },
+    {
+      "epoch": 0.24996825396825398,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.1,
+      "loss": 2.1260359287261963,
+      "step": 7874
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 2.608016489333632e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
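
The logged values above can be sanity-checked with a little arithmetic. The sketch below is one way to do it, assuming the `trainer_state.json` layout visible in the diff (top-level `epoch`, `global_step`, `eval_steps`, and a `log_history` list); the helper name `summarize` is hypothetical, and the three log entries are copied from the diff rather than loaded from the full file.

```python
from statistics import mean

# Small excerpt of trainer_state.json, copied from the diff above.
state = {
    "epoch": 0.25,
    "eval_steps": 3150,
    "global_step": 7875,
    "log_history": [
        {"epoch": 0.24006349206349206, "grad_norm": 0.12890625,
         "learning_rate": 0.1, "loss": 2.108712673187256, "step": 7562},
        {"epoch": 0.24146031746031746, "grad_norm": 0.49609375,
         "learning_rate": 0.1, "loss": 2.095569372177124, "step": 7606},
        {"epoch": 0.24996825396825398, "grad_norm": 0.259765625,
         "learning_rate": 0.1, "loss": 2.1260359287261963, "step": 7874},
    ],
}


def summarize(state):
    """Summarize training-loss entries and derive steps per epoch."""
    # Keep only training-loss entries; other entry types may lack "loss".
    logs = [entry for entry in state["log_history"] if "loss" in entry]
    noisiest = max(logs, key=lambda entry: entry["grad_norm"])
    # Optimizer steps that make up one full epoch, inferred from progress so far.
    steps_per_epoch = state["global_step"] / state["epoch"]
    return {
        "entries": len(logs),
        "mean_loss": mean(entry["loss"] for entry in logs),
        "max_grad_norm": noisiest["grad_norm"],
        "max_grad_norm_step": noisiest["step"],
        "steps_per_epoch": steps_per_epoch,
        "evals_per_epoch": steps_per_epoch / state["eval_steps"],
    }


summary = summarize(state)
print(summary)
```

The derived 31500 steps per epoch agrees with the per-entry `epoch` values in the log (for example, 7562 / 31500 ≈ 0.24006). To run this on the real file, load it with `json.load` and pass the resulting dict to `summarize`.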