Ba2han committed 5 days ago

Commit 27a8e40 (verified) · 1 Parent(s): 2607bcc

Training in progress, step 12915, checkpoint

Files changed (4)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +1102 -3
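
Of the four files, only trainer_state.json is plain JSON, and this commit appends training-log records for steps 12602 through 12914 to it. The following is a minimal sketch for summarizing those records. It assumes the file has been downloaded locally and that the records sit under a top-level `log_history` list, which is the usual Hugging Face Trainer layout but is not visible in the diff itself. The function name `summarize_new_steps` and the local path are hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def summarize_new_steps(state: dict, after_step: int) -> dict:
    """Summarize training-loss records logged after `after_step`.

    Assumes `state` has a "log_history" list of dicts (assumption, see lead-in).
    Records without a "loss" key (for example evaluation records) are skipped.
    """
    entries = [
        e for e in state.get("log_history", [])
        if "loss" in e and e["step"] > after_step
    ]
    if not entries:
        return {"count": 0}
    return {
        "count": len(entries),
        "first_step": entries[0]["step"],
        "last_step": entries[-1]["step"],
        "mean_loss": mean(e["loss"] for e in entries),
        "max_grad_norm": max(e.get("grad_norm", 0.0) for e in entries),
    }


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the checkpoint was downloaded.
    path = Path("last-checkpoint/trainer_state.json")
    if path.exists():
        state = json.loads(path.read_text())
        # 12600 is the global_step recorded by the parent commit.
        print(summarize_new_steps(state, after_step=12600))
```

Filtering on `after_step` rather than slicing by position keeps the helper usable on any later checkpoint, since each commit rewrites the whole file.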

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4f7a031ce062de2717c20cbfe28bf235cc8c0984196df8647a9d04071a672be7
 size 1171937904

 version https://git-lfs.github.com/spec/v1
+oid sha256:1c4ab61291ed876d2846992c1aedc554bfcbc5ab9a61ff3f786fac212673430e
 size 1171937904

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2bca4c44c416761810f1c3083731707ad3a2d1e7ac24304feb6b7f426c34993d
 size 1288212619

 version https://git-lfs.github.com/spec/v1
+oid sha256:dafff400ed1ea4c4e87a08eca726a8823cbd85926bcb70e6f6c4e823327f74fa
 size 1288212619

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:17739480306268eecb229c9abd21a55ac8184b30446253afafb60d7a0227de30
 size 1401

 version https://git-lfs.github.com/spec/v1
+oid sha256:c0146b598e2b404bbfd38ad4897f388ac3b184beab65521e27d27e70a8fd0073
 size 1401
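
The three diffs above change only Git LFS pointer files: each pointer records a spec version, a SHA-256 object id, and a byte size, and the sizes are unchanged while the object ids differ. As a rough sketch of how a downloaded file could be checked against its pointer, the helpers below parse the three-line pointer format and compare size and digest. The names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing a large file
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so multi-gigabyte checkpoints are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


if __name__ == "__main__":
    # New-side pointer for scheduler.pt, copied from the diff above.
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c0146b598e2b404bbfd38ad4897f388ac3b184beab65521e27d27e70a8fd0073\n"
        "size 1401\n"
    )
    local = Path("last-checkpoint/scheduler.pt")  # hypothetical local path
    if local.exists():
        print(matches_pointer(local, pointer))
```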

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.4,
   "eval_steps": 3150,
-  "global_step": 12600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -44140,6 +44140,1105 @@
       "eval_samples_per_second": 10.028,
       "eval_steps_per_second": 2.512,
       "step": 12600
     }
   ],
   "logging_steps": 2,
@@ -44159,7 +45258,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 4.17291330903253e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.41,
   "eval_steps": 3150,
+  "global_step": 12915,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "eval_samples_per_second": 10.028,
       "eval_steps_per_second": 2.512,
       "step": 12600
+    },
+    {
+      "epoch": 0.40006349206349207,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.377570629119873,
+      "step": 12602
+    },
+    {
+      "epoch": 0.4001269841269841,
+      "grad_norm": 0.0693359375,
+      "learning_rate": 0.1,
+      "loss": 2.394019365310669,
+      "step": 12604
+    },
+    {
+      "epoch": 0.4001904761904762,
+      "grad_norm": 0.1953125,
+      "learning_rate": 0.1,
+      "loss": 2.3574202060699463,
+      "step": 12606
+    },
+    {
+      "epoch": 0.40025396825396825,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.33833646774292,
+      "step": 12608
+    },
+    {
+      "epoch": 0.4003174603174603,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.1,
+      "loss": 2.369246244430542,
+      "step": 12610
+    },
+    {
+      "epoch": 0.4003809523809524,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.3324646949768066,
+      "step": 12612
+    },
+    {
+      "epoch": 0.40044444444444444,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 0.1,
+      "loss": 2.342609405517578,
+      "step": 12614
+    },
+    {
+      "epoch": 0.40050793650793653,
+      "grad_norm": 0.208984375,
+      "learning_rate": 0.1,
+      "loss": 2.3590927124023438,
+      "step": 12616
+    },
+    {
+      "epoch": 0.4005714285714286,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.340346097946167,
+      "step": 12618
+    },
+    {
+      "epoch": 0.4006349206349206,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.1,
+      "loss": 2.324613571166992,
+      "step": 12620
+    },
+    {
+      "epoch": 0.4006984126984127,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.1,
+      "loss": 2.3287599086761475,
+      "step": 12622
+    },
+    {
+      "epoch": 0.40076190476190476,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 0.1,
+      "loss": 2.3095972537994385,
+      "step": 12624
+    },
+    {
+      "epoch": 0.4008253968253968,
+      "grad_norm": 0.06591796875,
+      "learning_rate": 0.1,
+      "loss": 2.3337745666503906,
+      "step": 12626
+    },
+    {
+      "epoch": 0.4008888888888889,
+      "grad_norm": 0.0947265625,
+      "learning_rate": 0.1,
+      "loss": 2.3558530807495117,
+      "step": 12628
+    },
+    {
+      "epoch": 0.40095238095238095,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.3162124156951904,
+      "step": 12630
+    },
+    {
+      "epoch": 0.401015873015873,
+      "grad_norm": 0.20703125,
+      "learning_rate": 0.1,
+      "loss": 2.35158634185791,
+      "step": 12632
+    },
+    {
+      "epoch": 0.4010793650793651,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.1,
+      "loss": 2.3345530033111572,
+      "step": 12634
+    },
+    {
+      "epoch": 0.40114285714285713,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.3051955699920654,
+      "step": 12636
+    },
+    {
+      "epoch": 0.40120634920634923,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.1,
+      "loss": 2.3525850772857666,
+      "step": 12638
+    },
+    {
+      "epoch": 0.4012698412698413,
+      "grad_norm": 0.0791015625,
+      "learning_rate": 0.1,
+      "loss": 2.3211631774902344,
+      "step": 12640
+    },
+    {
+      "epoch": 0.4013333333333333,
+      "grad_norm": 0.193359375,
+      "learning_rate": 0.1,
+      "loss": 2.300722122192383,
+      "step": 12642
+    },
+    {
+      "epoch": 0.4013968253968254,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.1,
+      "loss": 2.308464288711548,
+      "step": 12644
+    },
+    {
+      "epoch": 0.40146031746031746,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.2925209999084473,
+      "step": 12646
+    },
+    {
+      "epoch": 0.4015238095238095,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.1,
+      "loss": 2.31681227684021,
+      "step": 12648
+    },
+    {
+      "epoch": 0.4015873015873016,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.309741973876953,
+      "step": 12650
+    },
+    {
+      "epoch": 0.40165079365079365,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.2926597595214844,
+      "step": 12652
+    },
+    {
+      "epoch": 0.4017142857142857,
+      "grad_norm": 0.1904296875,
+      "learning_rate": 0.1,
+      "loss": 2.289318323135376,
+      "step": 12654
+    },
+    {
+      "epoch": 0.4017777777777778,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.1,
+      "loss": 2.311735153198242,
+      "step": 12656
+    },
+    {
+      "epoch": 0.40184126984126983,
+      "grad_norm": 0.08203125,
+      "learning_rate": 0.1,
+      "loss": 2.292684316635132,
+      "step": 12658
+    },
+    {
+      "epoch": 0.40190476190476193,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.2918014526367188,
+      "step": 12660
+    },
+    {
+      "epoch": 0.401968253968254,
+      "grad_norm": 0.4375,
+      "learning_rate": 0.1,
+      "loss": 2.315018892288208,
+      "step": 12662
+    },
+    {
+      "epoch": 0.402031746031746,
+      "grad_norm": 0.306640625,
+      "learning_rate": 0.1,
+      "loss": 2.2844858169555664,
+      "step": 12664
+    },
+    {
+      "epoch": 0.4020952380952381,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.2867681980133057,
+      "step": 12666
+    },
+    {
+      "epoch": 0.40215873015873016,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.2710859775543213,
+      "step": 12668
+    },
+    {
+      "epoch": 0.4022222222222222,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.2854208946228027,
+      "step": 12670
+    },
+    {
+      "epoch": 0.4022857142857143,
+      "grad_norm": 0.043212890625,
+      "learning_rate": 0.1,
+      "loss": 2.2909247875213623,
+      "step": 12672
+    },
+    {
+      "epoch": 0.40234920634920635,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.1,
+      "loss": 2.272588014602661,
+      "step": 12674
+    },
+    {
+      "epoch": 0.4024126984126984,
+      "grad_norm": 0.318359375,
+      "learning_rate": 0.1,
+      "loss": 2.2849819660186768,
+      "step": 12676
+    },
+    {
+      "epoch": 0.4024761904761905,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.2895278930664062,
+      "step": 12678
+    },
+    {
+      "epoch": 0.40253968253968253,
+      "grad_norm": 0.07958984375,
+      "learning_rate": 0.1,
+      "loss": 2.2852509021759033,
+      "step": 12680
+    },
+    {
+      "epoch": 0.40260317460317463,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.284451484680176,
+      "step": 12682
+    },
+    {
+      "epoch": 0.4026666666666667,
+      "grad_norm": 0.1552734375,
+      "learning_rate": 0.1,
+      "loss": 2.2716317176818848,
+      "step": 12684
+    },
+    {
+      "epoch": 0.4027301587301587,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.2610225677490234,
+      "step": 12686
+    },
+    {
+      "epoch": 0.4027936507936508,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.2757816314697266,
+      "step": 12688
+    },
+    {
+      "epoch": 0.40285714285714286,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.2554984092712402,
+      "step": 12690
+    },
+    {
+      "epoch": 0.4029206349206349,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.1,
+      "loss": 2.2480530738830566,
+      "step": 12692
+    },
+    {
+      "epoch": 0.402984126984127,
+      "grad_norm": 0.10888671875,
+      "learning_rate": 0.1,
+      "loss": 2.276384115219116,
+      "step": 12694
+    },
+    {
+      "epoch": 0.40304761904761904,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.269636869430542,
+      "step": 12696
+    },
+    {
+      "epoch": 0.4031111111111111,
+      "grad_norm": 0.19921875,
+      "learning_rate": 0.1,
+      "loss": 2.2666258811950684,
+      "step": 12698
+    },
+    {
+      "epoch": 0.4031746031746032,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.276977062225342,
+      "step": 12700
+    },
+    {
+      "epoch": 0.40323809523809523,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.2830371856689453,
+      "step": 12702
+    },
+    {
+      "epoch": 0.4033015873015873,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.277653932571411,
+      "step": 12704
+    },
+    {
+      "epoch": 0.40336507936507937,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.1,
+      "loss": 2.3027429580688477,
+      "step": 12706
+    },
+    {
+      "epoch": 0.4034285714285714,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.1,
+      "loss": 2.2705657482147217,
+      "step": 12708
+    },
+    {
+      "epoch": 0.4034920634920635,
+      "grad_norm": 0.314453125,
+      "learning_rate": 0.1,
+      "loss": 2.2589237689971924,
+      "step": 12710
+    },
+    {
+      "epoch": 0.40355555555555556,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.1,
+      "loss": 2.239840030670166,
+      "step": 12712
+    },
+    {
+      "epoch": 0.4036190476190476,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.250976324081421,
+      "step": 12714
+    },
+    {
+      "epoch": 0.4036825396825397,
+      "grad_norm": 0.09228515625,
+      "learning_rate": 0.1,
+      "loss": 2.246659517288208,
+      "step": 12716
+    },
+    {
+      "epoch": 0.40374603174603174,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.259284496307373,
+      "step": 12718
+    },
+    {
+      "epoch": 0.4038095238095238,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.2647953033447266,
+      "step": 12720
+    },
+    {
+      "epoch": 0.4038730158730159,
+      "grad_norm": 0.228515625,
+      "learning_rate": 0.1,
+      "loss": 2.234811544418335,
+      "step": 12722
+    },
+    {
+      "epoch": 0.4039365079365079,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.2175509929656982,
+      "step": 12724
+    },
+    {
+      "epoch": 0.404,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.2525484561920166,
+      "step": 12726
+    },
+    {
+      "epoch": 0.40406349206349207,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.2338736057281494,
+      "step": 12728
+    },
+    {
+      "epoch": 0.4041269841269841,
+      "grad_norm": 0.10302734375,
+      "learning_rate": 0.1,
+      "loss": 2.2381958961486816,
+      "step": 12730
+    },
+    {
+      "epoch": 0.4041904761904762,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.1,
+      "loss": 2.2337396144866943,
+      "step": 12732
+    },
+    {
+      "epoch": 0.40425396825396825,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.1,
+      "loss": 2.2588319778442383,
+      "step": 12734
+    },
+    {
+      "epoch": 0.4043174603174603,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.2670295238494873,
+      "step": 12736
+    },
+    {
+      "epoch": 0.4043809523809524,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.1,
+      "loss": 2.2243025302886963,
+      "step": 12738
+    },
+    {
+      "epoch": 0.40444444444444444,
+      "grad_norm": 0.07861328125,
+      "learning_rate": 0.1,
+      "loss": 2.251145362854004,
+      "step": 12740
+    },
+    {
+      "epoch": 0.4045079365079365,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.1,
+      "loss": 2.2257883548736572,
+      "step": 12742
+    },
+    {
+      "epoch": 0.4045714285714286,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.225264549255371,
+      "step": 12744
+    },
+    {
+      "epoch": 0.4046349206349206,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.1,
+      "loss": 2.2152347564697266,
+      "step": 12746
+    },
+    {
+      "epoch": 0.4046984126984127,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.1,
+      "loss": 2.2306759357452393,
+      "step": 12748
+    },
+    {
+      "epoch": 0.40476190476190477,
+      "grad_norm": 0.0791015625,
+      "learning_rate": 0.1,
+      "loss": 2.22837233543396,
+      "step": 12750
+    },
+    {
+      "epoch": 0.4048253968253968,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.1,
+      "loss": 2.257899522781372,
+      "step": 12752
+    },
+    {
+      "epoch": 0.4048888888888889,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.246670722961426,
+      "step": 12754
+    },
+    {
+      "epoch": 0.40495238095238095,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.1,
+      "loss": 2.2245774269104004,
+      "step": 12756
+    },
+    {
+      "epoch": 0.405015873015873,
+      "grad_norm": 0.06494140625,
+      "learning_rate": 0.1,
+      "loss": 2.2274630069732666,
+      "step": 12758
+    },
+    {
+      "epoch": 0.4050793650793651,
+      "grad_norm": 0.052490234375,
+      "learning_rate": 0.1,
+      "loss": 2.240325927734375,
+      "step": 12760
+    },
+    {
+      "epoch": 0.40514285714285714,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.2673754692077637,
+      "step": 12762
+    },
+    {
+      "epoch": 0.4052063492063492,
+      "grad_norm": 0.06494140625,
+      "learning_rate": 0.1,
+      "loss": 2.2353477478027344,
+      "step": 12764
+    },
+    {
+      "epoch": 0.4052698412698413,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.1,
+      "loss": 2.2503252029418945,
+      "step": 12766
+    },
+    {
+      "epoch": 0.4053333333333333,
+      "grad_norm": 0.41015625,
+      "learning_rate": 0.1,
+      "loss": 2.2447316646575928,
+      "step": 12768
+    },
+    {
+      "epoch": 0.4053968253968254,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.1,
+      "loss": 2.234184741973877,
+      "step": 12770
+    },
+    {
+      "epoch": 0.40546031746031747,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.1,
+      "loss": 2.243861198425293,
+      "step": 12772
+    },
+    {
+      "epoch": 0.4055238095238095,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.1,
+      "loss": 2.2590394020080566,
+      "step": 12774
+    },
+    {
+      "epoch": 0.4055873015873016,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.251347064971924,
+      "step": 12776
+    },
+    {
+      "epoch": 0.40565079365079365,
+      "grad_norm": 0.0791015625,
+      "learning_rate": 0.1,
+      "loss": 2.2699408531188965,
+      "step": 12778
+    },
+    {
+      "epoch": 0.4057142857142857,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.25612735748291,
+      "step": 12780
+    },
+    {
+      "epoch": 0.4057777777777778,
+      "grad_norm": 0.1953125,
+      "learning_rate": 0.1,
+      "loss": 2.2291111946105957,
+      "step": 12782
+    },
+    {
+      "epoch": 0.40584126984126984,
+      "grad_norm": 0.1171875,
+      "learning_rate": 0.1,
+      "loss": 2.274329900741577,
+      "step": 12784
+    },
+    {
+      "epoch": 0.4059047619047619,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.2519423961639404,
+      "step": 12786
+    },
+    {
+      "epoch": 0.405968253968254,
+      "grad_norm": 0.07421875,
+      "learning_rate": 0.1,
+      "loss": 2.256042957305908,
+      "step": 12788
+    },
+    {
+      "epoch": 0.406031746031746,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.1,
+      "loss": 2.272219657897949,
+      "step": 12790
+    },
+    {
+      "epoch": 0.4060952380952381,
+      "grad_norm": 0.578125,
+      "learning_rate": 0.1,
+      "loss": 2.256260871887207,
+      "step": 12792
+    },
+    {
+      "epoch": 0.40615873015873016,
+      "grad_norm": 0.08544921875,
+      "learning_rate": 0.1,
+      "loss": 2.251669406890869,
+      "step": 12794
+    },
+    {
+      "epoch": 0.4062222222222222,
+      "grad_norm": 0.04541015625,
+      "learning_rate": 0.1,
+      "loss": 2.269117593765259,
+      "step": 12796
+    },
+    {
+      "epoch": 0.4062857142857143,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.1,
+      "loss": 2.2264370918273926,
+      "step": 12798
+    },
+    {
+      "epoch": 0.40634920634920635,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.255627393722534,
+      "step": 12800
+    },
+    {
+      "epoch": 0.4064126984126984,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.232905387878418,
+      "step": 12802
+    },
+    {
+      "epoch": 0.4064761904761905,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.1,
+      "loss": 2.2764182090759277,
+      "step": 12804
+    },
+    {
+      "epoch": 0.40653968253968253,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.1,
+      "loss": 2.243446111679077,
+      "step": 12806
+    },
+    {
+      "epoch": 0.4066031746031746,
+      "grad_norm": 0.072265625,
+      "learning_rate": 0.1,
+      "loss": 2.2535452842712402,
+      "step": 12808
+    },
+    {
+      "epoch": 0.4066666666666667,
+      "grad_norm": 0.0458984375,
+      "learning_rate": 0.1,
+      "loss": 2.2059407234191895,
+      "step": 12810
+    },
+    {
+      "epoch": 0.4067301587301587,
+      "grad_norm": 0.162109375,
+      "learning_rate": 0.1,
+      "loss": 2.250304698944092,
+      "step": 12812
+    },
+    {
+      "epoch": 0.4067936507936508,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.1,
+      "loss": 2.2253971099853516,
+      "step": 12814
+    },
+    {
+      "epoch": 0.40685714285714286,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.1,
+      "loss": 2.227365493774414,
+      "step": 12816
+    },
+    {
+      "epoch": 0.4069206349206349,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.25632905960083,
+      "step": 12818
+    },
+    {
+      "epoch": 0.406984126984127,
+      "grad_norm": 0.18359375,
+      "learning_rate": 0.1,
+      "loss": 2.231961965560913,
+      "step": 12820
+    },
+    {
+      "epoch": 0.40704761904761905,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.1,
+      "loss": 2.2555220127105713,
+      "step": 12822
+    },
+    {
+      "epoch": 0.4071111111111111,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.24263334274292,
+      "step": 12824
+    },
+    {
+      "epoch": 0.4071746031746032,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.1,
+      "loss": 2.242009162902832,
+      "step": 12826
+    },
+    {
+      "epoch": 0.40723809523809523,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.246868371963501,
+      "step": 12828
+    },
+    {
+      "epoch": 0.4073015873015873,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.2295122146606445,
+      "step": 12830
+    },
+    {
+      "epoch": 0.4073650793650794,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.263503313064575,
+      "step": 12832
+    },
+    {
+      "epoch": 0.4074285714285714,
+      "grad_norm": 0.09423828125,
+      "learning_rate": 0.1,
+      "loss": 2.241060256958008,
+      "step": 12834
+    },
+    {
+      "epoch": 0.4074920634920635,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.1,
+      "loss": 2.2220916748046875,
+      "step": 12836
+    },
+    {
+      "epoch": 0.40755555555555556,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.1,
+      "loss": 2.221209764480591,
+      "step": 12838
+    },
+    {
+      "epoch": 0.4076190476190476,
+      "grad_norm": 0.10546875,
+      "learning_rate": 0.1,
+      "loss": 2.230886220932007,
+      "step": 12840
+    },
+    {
+      "epoch": 0.4076825396825397,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.1,
+      "loss": 2.2281064987182617,
+      "step": 12842
+    },
+    {
+      "epoch": 0.40774603174603175,
+      "grad_norm": 0.09765625,
+      "learning_rate": 0.1,
+      "loss": 2.2533373832702637,
+      "step": 12844
+    },
+    {
+      "epoch": 0.4078095238095238,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.1,
+      "loss": 2.220545530319214,
+      "step": 12846
+    },
+    {
+      "epoch": 0.4078730158730159,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 0.1,
+      "loss": 2.207355499267578,
+      "step": 12848
+    },
+    {
+      "epoch": 0.40793650793650793,
+      "grad_norm": 0.07568359375,
+      "learning_rate": 0.1,
+      "loss": 2.2466094493865967,
+      "step": 12850
+    },
+    {
+      "epoch": 0.408,
+      "grad_norm": 0.083984375,
+      "learning_rate": 0.1,
+      "loss": 2.2127885818481445,
+      "step": 12852
+    },
+    {
+      "epoch": 0.4080634920634921,
+      "grad_norm": 0.2216796875,
+      "learning_rate": 0.1,
+      "loss": 2.2296013832092285,
+      "step": 12854
+    },
+    {
+      "epoch": 0.4081269841269841,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.1,
+      "loss": 2.2332053184509277,
+      "step": 12856
+    },
+    {
+      "epoch": 0.4081904761904762,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.1,
+      "loss": 2.2455201148986816,
+      "step": 12858
+    },
+    {
+      "epoch": 0.40825396825396826,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.2342209815979004,
+      "step": 12860
+    },
+    {
+      "epoch": 0.4083174603174603,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.2207164764404297,
+      "step": 12862
+    },
+    {
+      "epoch": 0.4083809523809524,
+      "grad_norm": 0.220703125,
+      "learning_rate": 0.1,
+      "loss": 2.2393648624420166,
+      "step": 12864
+    },
+    {
+      "epoch": 0.40844444444444444,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.2382168769836426,
+      "step": 12866
+    },
+    {
+      "epoch": 0.4085079365079365,
+      "grad_norm": 0.06689453125,
+      "learning_rate": 0.1,
+      "loss": 2.2354063987731934,
+      "step": 12868
+    },
+    {
+      "epoch": 0.4085714285714286,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.2208573818206787,
+      "step": 12870
+    },
+    {
+      "epoch": 0.40863492063492063,
+      "grad_norm": 0.06689453125,
+      "learning_rate": 0.1,
+      "loss": 2.2357020378112793,
+      "step": 12872
+    },
+    {
+      "epoch": 0.40869841269841267,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.2209126949310303,
+      "step": 12874
+    },
+    {
+      "epoch": 0.40876190476190477,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.1,
+      "loss": 2.2232158184051514,
+      "step": 12876
+    },
+    {
+      "epoch": 0.4088253968253968,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.1,
+      "loss": 2.1869778633117676,
+      "step": 12878
+    },
+    {
+      "epoch": 0.4088888888888889,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.230013847351074,
+      "step": 12880
+    },
+    {
+      "epoch": 0.40895238095238096,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.1,
+      "loss": 2.2143027782440186,
+      "step": 12882
+    },
+    {
+      "epoch": 0.409015873015873,
+      "grad_norm": 0.171875,
+      "learning_rate": 0.1,
+      "loss": 2.2395071983337402,
+      "step": 12884
+    },
+    {
+      "epoch": 0.4090793650793651,
+      "grad_norm": 0.18359375,
+      "learning_rate": 0.1,
+      "loss": 2.2181894779205322,
+      "step": 12886
+    },
+    {
+      "epoch": 0.40914285714285714,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.237212657928467,
+      "step": 12888
+    },
+    {
+      "epoch": 0.4092063492063492,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 0.1,
+      "loss": 2.204676866531372,
+      "step": 12890
+    },
+    {
+      "epoch": 0.4092698412698413,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.1,
+      "loss": 2.2483561038970947,
+      "step": 12892
+    },
+    {
+      "epoch": 0.4093333333333333,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.2073922157287598,
+      "step": 12894
+    },
+    {
+      "epoch": 0.40939682539682537,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.1,
+      "loss": 2.2114458084106445,
+      "step": 12896
+    },
+    {
+      "epoch": 0.40946031746031747,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.1,
+      "loss": 2.2161312103271484,
+      "step": 12898
+    },
+    {
+      "epoch": 0.4095238095238095,
+      "grad_norm": 0.07470703125,
+      "learning_rate": 0.1,
+      "loss": 2.237602472305298,
+      "step": 12900
+    },
+    {
+      "epoch": 0.4095873015873016,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.218475580215454,
+      "step": 12902
+    },
+    {
+      "epoch": 0.40965079365079365,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.1,
+      "loss": 2.231311798095703,
+      "step": 12904
+    },
+    {
+      "epoch": 0.4097142857142857,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.1,
+      "loss": 2.2220139503479004,
+      "step": 12906
+    },
+    {
+      "epoch": 0.4097777777777778,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.1,
+      "loss": 2.226863384246826,
+      "step": 12908
+    },
+    {
+      "epoch": 0.40984126984126984,
+      "grad_norm": 0.08056640625,
+      "learning_rate": 0.1,
+      "loss": 2.22263503074646,
+      "step": 12910
+    },
+    {
+      "epoch": 0.4099047619047619,
+      "grad_norm": 0.25,
+      "learning_rate": 0.1,
+      "loss": 2.216628074645996,
+      "step": 12912
+    },
+    {
+      "epoch": 0.409968253968254,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.1,
+      "loss": 2.2253456115722656,
+      "step": 12914
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 4.277200399146157e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null