Ba2han committed 6 days ago

Commit 8d3f943 (verified) · 1 parent: c75cda1

Training in progress, step 10080, checkpoint

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +1109 -3

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:046b7472de65288674e1a17a228c5b23d6202755eb3c596ca55a9421460d18e6
+oid sha256:4c15dd5758da6e91090a4e05104520c379459187474d7afcf1e299354f423045
 size 1171937904
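
Each binary file in this commit is stored as a Git LFS pointer: a `version` line, an `oid sha256:` line, and a `size` line in bytes. The sketch below parses the old and new pointers for `model.safetensors` and compares them. It is only an illustration: `parse_lfs_pointer` is a hypothetical helper name, not part of any library, and the pointer text is copied from the diff for this file.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents before and after this commit (from the diff for this file).
old_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:046b7472de65288674e1a17a228c5b23d6202755eb3c596ca55a9421460d18e6
size 1171937904
"""
new_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4c15dd5758da6e91090a4e05104520c379459187474d7afcf1e299354f423045
size 1171937904
"""

old = parse_lfs_pointer(old_pointer)
new = parse_lfs_pointer(new_pointer)
content_changed = old["digest"] != new["digest"]
size_changed = old["size_bytes"] != new["size_bytes"]
print(content_changed, size_changed)  # prints: True False
```

The digest changes while the byte size stays the same, which is what one would expect when a checkpoint overwrites tensors of identical shape and dtype.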

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9994629a53927dac2badb394d55fb630a655b779ecaf3bbfa953d852dfac6df2
+oid sha256:0bda61c5562b085b556664a05a01e57bbea325105a786d10ee131f7f55de32d8
 size 1288212619

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7069bcf0178fbe01320f62674e3e7f828ace2134880455e4dabcb55dde670c6e
+oid sha256:17d09b98667b67af91698e95c4e454d8599b7a3fe6cb1b84c03f84f475afbcb6
 size 14645

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:753ea7ed53defd1dc2fbec9cf283a83e933481c30fa1896b26eca5deaada9ff5
+oid sha256:4cd3a69179e8bbc8abd9a58c7c722a1b78f068a2931f77ad34a41d0a1716b746
 size 1401

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.31,
   "eval_steps": 3150,
-  "global_step": 9765,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -34206,6 +34206,1112 @@
       "learning_rate": 0.1,
       "loss": 2.1471362113952637,
       "step": 9764
     }
   ],
   "logging_steps": 2,
@@ -34225,7 +35331,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 3.2340511805303165e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.32,
   "eval_steps": 3150,
+  "global_step": 10080,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.1471362113952637,
       "step": 9764
+    },
+    {
+      "epoch": 0.31003174603174605,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.1,
+      "loss": 2.193241596221924,
+      "step": 9766
+    },
+    {
+      "epoch": 0.3100952380952381,
+      "grad_norm": 0.111328125,
+      "learning_rate": 0.1,
+      "loss": 2.1627867221832275,
+      "step": 9768
+    },
+    {
+      "epoch": 0.31015873015873013,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.186823844909668,
+      "step": 9770
+    },
+    {
+      "epoch": 0.31022222222222223,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.1,
+      "loss": 2.165410280227661,
+      "step": 9772
+    },
+    {
+      "epoch": 0.3102857142857143,
+      "grad_norm": 0.115234375,
+      "learning_rate": 0.1,
+      "loss": 2.191713809967041,
+      "step": 9774
+    },
+    {
+      "epoch": 0.3103492063492064,
+      "grad_norm": 0.058349609375,
+      "learning_rate": 0.1,
+      "loss": 2.1742026805877686,
+      "step": 9776
+    },
+    {
+      "epoch": 0.3104126984126984,
+      "grad_norm": 0.0888671875,
+      "learning_rate": 0.1,
+      "loss": 2.170882225036621,
+      "step": 9778
+    },
+    {
+      "epoch": 0.31047619047619046,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.167639970779419,
+      "step": 9780
+    },
+    {
+      "epoch": 0.31053968253968256,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.1,
+      "loss": 2.1710586547851562,
+      "step": 9782
+    },
+    {
+      "epoch": 0.3106031746031746,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.1,
+      "loss": 2.2063710689544678,
+      "step": 9784
+    },
+    {
+      "epoch": 0.31066666666666665,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.1,
+      "loss": 2.181265115737915,
+      "step": 9786
+    },
+    {
+      "epoch": 0.31073015873015875,
+      "grad_norm": 0.0673828125,
+      "learning_rate": 0.1,
+      "loss": 2.1840031147003174,
+      "step": 9788
+    },
+    {
+      "epoch": 0.3107936507936508,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.1,
+      "loss": 2.1821494102478027,
+      "step": 9790
+    },
+    {
+      "epoch": 0.31085714285714283,
+      "grad_norm": 0.2431640625,
+      "learning_rate": 0.1,
+      "loss": 2.1770834922790527,
+      "step": 9792
+    },
+    {
+      "epoch": 0.31092063492063493,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.1,
+      "loss": 2.163947105407715,
+      "step": 9794
+    },
+    {
+      "epoch": 0.310984126984127,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.1887195110321045,
+      "step": 9796
+    },
+    {
+      "epoch": 0.3110476190476191,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.1,
+      "loss": 2.1659390926361084,
+      "step": 9798
+    },
+    {
+      "epoch": 0.3111111111111111,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.1,
+      "loss": 2.1604509353637695,
+      "step": 9800
+    },
+    {
+      "epoch": 0.31117460317460316,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.193321466445923,
+      "step": 9802
+    },
+    {
+      "epoch": 0.31123809523809526,
+      "grad_norm": 0.053466796875,
+      "learning_rate": 0.1,
+      "loss": 2.1531283855438232,
+      "step": 9804
+    },
+    {
+      "epoch": 0.3113015873015873,
+      "grad_norm": 0.0751953125,
+      "learning_rate": 0.1,
+      "loss": 2.1906158924102783,
+      "step": 9806
+    },
+    {
+      "epoch": 0.31136507936507934,
+      "grad_norm": 0.31640625,
+      "learning_rate": 0.1,
+      "loss": 2.2052948474884033,
+      "step": 9808
+    },
+    {
+      "epoch": 0.31142857142857144,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.1,
+      "loss": 2.20705246925354,
+      "step": 9810
+    },
+    {
+      "epoch": 0.3114920634920635,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.1867713928222656,
+      "step": 9812
+    },
+    {
+      "epoch": 0.31155555555555553,
+      "grad_norm": 0.053955078125,
+      "learning_rate": 0.1,
+      "loss": 2.185459852218628,
+      "step": 9814
+    },
+    {
+      "epoch": 0.31161904761904763,
+      "grad_norm": 0.061279296875,
+      "learning_rate": 0.1,
+      "loss": 2.192904233932495,
+      "step": 9816
+    },
+    {
+      "epoch": 0.31168253968253967,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.16719651222229,
+      "step": 9818
+    },
+    {
+      "epoch": 0.31174603174603177,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.184863567352295,
+      "step": 9820
+    },
+    {
+      "epoch": 0.3118095238095238,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.1,
+      "loss": 2.1928882598876953,
+      "step": 9822
+    },
+    {
+      "epoch": 0.31187301587301586,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.1773009300231934,
+      "step": 9824
+    },
+    {
+      "epoch": 0.31193650793650796,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.1,
+      "loss": 2.187877893447876,
+      "step": 9826
+    },
+    {
+      "epoch": 0.312,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.187873363494873,
+      "step": 9828
+    },
+    {
+      "epoch": 0.31206349206349204,
+      "grad_norm": 0.0546875,
+      "learning_rate": 0.1,
+      "loss": 2.2156882286071777,
+      "step": 9830
+    },
+    {
+      "epoch": 0.31212698412698414,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.1,
+      "loss": 2.1722946166992188,
+      "step": 9832
+    },
+    {
+      "epoch": 0.3121904761904762,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.180370330810547,
+      "step": 9834
+    },
+    {
+      "epoch": 0.31225396825396823,
+      "grad_norm": 0.0634765625,
+      "learning_rate": 0.1,
+      "loss": 2.1398494243621826,
+      "step": 9836
+    },
+    {
+      "epoch": 0.3123174603174603,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.1,
+      "loss": 2.190025806427002,
+      "step": 9838
+    },
+    {
+      "epoch": 0.31238095238095237,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.176567554473877,
+      "step": 9840
+    },
+    {
+      "epoch": 0.31244444444444447,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.1811344623565674,
+      "step": 9842
+    },
+    {
+      "epoch": 0.3125079365079365,
+      "grad_norm": 0.06689453125,
+      "learning_rate": 0.1,
+      "loss": 2.159538507461548,
+      "step": 9844
+    },
+    {
+      "epoch": 0.31257142857142856,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 0.1,
+      "loss": 2.1890382766723633,
+      "step": 9846
+    },
+    {
+      "epoch": 0.31263492063492065,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.191279649734497,
+      "step": 9848
+    },
+    {
+      "epoch": 0.3126984126984127,
+      "grad_norm": 0.1650390625,
+      "learning_rate": 0.1,
+      "loss": 2.1829254627227783,
+      "step": 9850
+    },
+    {
+      "epoch": 0.31276190476190474,
+      "grad_norm": 0.0791015625,
+      "learning_rate": 0.1,
+      "loss": 2.1959803104400635,
+      "step": 9852
+    },
+    {
+      "epoch": 0.31282539682539684,
+      "grad_norm": 0.1611328125,
+      "learning_rate": 0.1,
+      "loss": 2.1675479412078857,
+      "step": 9854
+    },
+    {
+      "epoch": 0.3128888888888889,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.172224521636963,
+      "step": 9856
+    },
+    {
+      "epoch": 0.3129523809523809,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.1,
+      "loss": 2.199164390563965,
+      "step": 9858
+    },
+    {
+      "epoch": 0.313015873015873,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.1,
+      "loss": 2.2071785926818848,
+      "step": 9860
+    },
+    {
+      "epoch": 0.31307936507936507,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.1,
+      "loss": 2.198953628540039,
+      "step": 9862
+    },
+    {
+      "epoch": 0.31314285714285717,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.1,
+      "loss": 2.1639485359191895,
+      "step": 9864
+    },
+    {
+      "epoch": 0.3132063492063492,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.1723406314849854,
+      "step": 9866
+    },
+    {
+      "epoch": 0.31326984126984125,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.1850998401641846,
+      "step": 9868
+    },
+    {
+      "epoch": 0.31333333333333335,
+      "grad_norm": 0.18359375,
+      "learning_rate": 0.1,
+      "loss": 2.2024178504943848,
+      "step": 9870
+    },
+    {
+      "epoch": 0.3133968253968254,
+      "grad_norm": 0.330078125,
+      "learning_rate": 0.1,
+      "loss": 2.2177889347076416,
+      "step": 9872
+    },
+    {
+      "epoch": 0.31346031746031744,
+      "grad_norm": 0.07861328125,
+      "learning_rate": 0.1,
+      "loss": 2.2047946453094482,
+      "step": 9874
+    },
+    {
+      "epoch": 0.31352380952380954,
+      "grad_norm": 0.06201171875,
+      "learning_rate": 0.1,
+      "loss": 2.1921725273132324,
+      "step": 9876
+    },
+    {
+      "epoch": 0.3135873015873016,
+      "grad_norm": 0.04931640625,
+      "learning_rate": 0.1,
+      "loss": 2.179666757583618,
+      "step": 9878
+    },
+    {
+      "epoch": 0.3136507936507936,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.1,
+      "loss": 2.1968069076538086,
+      "step": 9880
+    },
+    {
+      "epoch": 0.3137142857142857,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.1,
+      "loss": 2.215252161026001,
+      "step": 9882
+    },
+    {
+      "epoch": 0.31377777777777777,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.1,
+      "loss": 2.207919120788574,
+      "step": 9884
+    },
+    {
+      "epoch": 0.31384126984126987,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.1,
+      "loss": 2.192678928375244,
+      "step": 9886
+    },
+    {
+      "epoch": 0.3139047619047619,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.178010940551758,
+      "step": 9888
+    },
+    {
+      "epoch": 0.31396825396825395,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.195068836212158,
+      "step": 9890
+    },
+    {
+      "epoch": 0.31403174603174605,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 0.1,
+      "loss": 2.2022101879119873,
+      "step": 9892
+    },
+    {
+      "epoch": 0.3140952380952381,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 0.1,
+      "loss": 2.188624858856201,
+      "step": 9894
+    },
+    {
+      "epoch": 0.31415873015873014,
+      "grad_norm": 0.0654296875,
+      "learning_rate": 0.1,
+      "loss": 2.174272060394287,
+      "step": 9896
+    },
+    {
+      "epoch": 0.31422222222222224,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.1,
+      "loss": 2.1800880432128906,
+      "step": 9898
+    },
+    {
+      "epoch": 0.3142857142857143,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.182217597961426,
+      "step": 9900
+    },
+    {
+      "epoch": 0.3143492063492063,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.175539255142212,
+      "step": 9902
+    },
+    {
+      "epoch": 0.3144126984126984,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.1903021335601807,
+      "step": 9904
+    },
+    {
+      "epoch": 0.31447619047619046,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.197434663772583,
+      "step": 9906
+    },
+    {
+      "epoch": 0.31453968253968256,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.1,
+      "loss": 2.198740005493164,
+      "step": 9908
+    },
+    {
+      "epoch": 0.3146031746031746,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.165989398956299,
+      "step": 9910
+    },
+    {
+      "epoch": 0.31466666666666665,
+      "grad_norm": 0.201171875,
+      "learning_rate": 0.1,
+      "loss": 2.22456693649292,
+      "step": 9912
+    },
+    {
+      "epoch": 0.31473015873015875,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.1,
+      "loss": 2.1924569606781006,
+      "step": 9914
+    },
+    {
+      "epoch": 0.3147936507936508,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 0.1,
+      "loss": 2.1935088634490967,
+      "step": 9916
+    },
+    {
+      "epoch": 0.31485714285714284,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.222795248031616,
+      "step": 9918
+    },
+    {
+      "epoch": 0.31492063492063493,
+      "grad_norm": 0.058349609375,
+      "learning_rate": 0.1,
+      "loss": 2.2171945571899414,
+      "step": 9920
+    },
+    {
+      "epoch": 0.314984126984127,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.1,
+      "loss": 2.2098042964935303,
+      "step": 9922
+    },
+    {
+      "epoch": 0.315047619047619,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.1736574172973633,
+      "step": 9924
+    },
+    {
+      "epoch": 0.3151111111111111,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.203256607055664,
+      "step": 9926
+    },
+    {
+      "epoch": 0.31517460317460316,
+      "grad_norm": 0.427734375,
+      "learning_rate": 0.1,
+      "loss": 2.196739912033081,
+      "step": 9928
+    },
+    {
+      "epoch": 0.31523809523809526,
+      "grad_norm": 0.1005859375,
+      "learning_rate": 0.1,
+      "loss": 2.202354669570923,
+      "step": 9930
+    },
+    {
+      "epoch": 0.3153015873015873,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.1933236122131348,
+      "step": 9932
+    },
+    {
+      "epoch": 0.31536507936507935,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.198293685913086,
+      "step": 9934
+    },
+    {
+      "epoch": 0.31542857142857145,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 0.1,
+      "loss": 2.212238311767578,
+      "step": 9936
+    },
+    {
+      "epoch": 0.3154920634920635,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.1917948722839355,
+      "step": 9938
+    },
+    {
+      "epoch": 0.31555555555555553,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.1,
+      "loss": 2.187749147415161,
+      "step": 9940
+    },
+    {
+      "epoch": 0.31561904761904763,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.2057549953460693,
+      "step": 9942
+    },
+    {
+      "epoch": 0.3156825396825397,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.1,
+      "loss": 2.1741790771484375,
+      "step": 9944
+    },
+    {
+      "epoch": 0.3157460317460317,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.1,
+      "loss": 2.224497079849243,
+      "step": 9946
+    },
+    {
+      "epoch": 0.3158095238095238,
+      "grad_norm": 0.09912109375,
+      "learning_rate": 0.1,
+      "loss": 2.203294277191162,
+      "step": 9948
+    },
+    {
+      "epoch": 0.31587301587301586,
+      "grad_norm": 0.08251953125,
+      "learning_rate": 0.1,
+      "loss": 2.196547269821167,
+      "step": 9950
+    },
+    {
+      "epoch": 0.31593650793650796,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.216204881668091,
+      "step": 9952
+    },
+    {
+      "epoch": 0.316,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.2269957065582275,
+      "step": 9954
+    },
+    {
+      "epoch": 0.31606349206349205,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.1,
+      "loss": 2.209287643432617,
+      "step": 9956
+    },
+    {
+      "epoch": 0.31612698412698415,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.1,
+      "loss": 2.233351945877075,
+      "step": 9958
+    },
+    {
+      "epoch": 0.3161904761904762,
+      "grad_norm": 0.201171875,
+      "learning_rate": 0.1,
+      "loss": 2.187995195388794,
+      "step": 9960
+    },
+    {
+      "epoch": 0.31625396825396823,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.1,
+      "loss": 2.203009605407715,
+      "step": 9962
+    },
+    {
+      "epoch": 0.31631746031746033,
+      "grad_norm": 0.0703125,
+      "learning_rate": 0.1,
+      "loss": 2.1894705295562744,
+      "step": 9964
+    },
+    {
+      "epoch": 0.3163809523809524,
+      "grad_norm": 0.09033203125,
+      "learning_rate": 0.1,
+      "loss": 2.2359108924865723,
+      "step": 9966
+    },
+    {
+      "epoch": 0.3164444444444444,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.202052593231201,
+      "step": 9968
+    },
+    {
+      "epoch": 0.3165079365079365,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.155691623687744,
+      "step": 9970
+    },
+    {
+      "epoch": 0.31657142857142856,
+      "grad_norm": 0.189453125,
+      "learning_rate": 0.1,
+      "loss": 2.2047054767608643,
+      "step": 9972
+    },
+    {
+      "epoch": 0.31663492063492066,
+      "grad_norm": 0.296875,
+      "learning_rate": 0.1,
+      "loss": 2.1690495014190674,
+      "step": 9974
+    },
+    {
+      "epoch": 0.3166984126984127,
+      "grad_norm": 0.09814453125,
+      "learning_rate": 0.1,
+      "loss": 2.1789681911468506,
+      "step": 9976
+    },
+    {
+      "epoch": 0.31676190476190474,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.19167423248291,
+      "step": 9978
+    },
+    {
+      "epoch": 0.31682539682539684,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.178757667541504,
+      "step": 9980
+    },
+    {
+      "epoch": 0.3168888888888889,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.184861183166504,
+      "step": 9982
+    },
+    {
+      "epoch": 0.31695238095238093,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.1,
+      "loss": 2.141266345977783,
+      "step": 9984
+    },
+    {
+      "epoch": 0.31701587301587303,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.1,
+      "loss": 2.199850559234619,
+      "step": 9986
+    },
+    {
+      "epoch": 0.31707936507936507,
+      "grad_norm": 0.2578125,
+      "learning_rate": 0.1,
+      "loss": 2.2000479698181152,
+      "step": 9988
+    },
+    {
+      "epoch": 0.3171428571428571,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.1777122020721436,
+      "step": 9990
+    },
+    {
+      "epoch": 0.3172063492063492,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.1901791095733643,
+      "step": 9992
+    },
+    {
+      "epoch": 0.31726984126984126,
+      "grad_norm": 0.078125,
+      "learning_rate": 0.1,
+      "loss": 2.194085121154785,
+      "step": 9994
+    },
+    {
+      "epoch": 0.31733333333333336,
+      "grad_norm": 0.09033203125,
+      "learning_rate": 0.1,
+      "loss": 2.1851909160614014,
+      "step": 9996
+    },
+    {
+      "epoch": 0.3173968253968254,
+      "grad_norm": 0.1875,
+      "learning_rate": 0.1,
+      "loss": 2.190678358078003,
+      "step": 9998
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.1,
+      "loss": 2.172008752822876,
+      "step": 10000
+    },
+    {
+      "epoch": 0.31752380952380954,
+      "grad_norm": 0.0634765625,
+      "learning_rate": 0.1,
+      "loss": 2.1945793628692627,
+      "step": 10002
+    },
+    {
+      "epoch": 0.3175873015873016,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.173008680343628,
+      "step": 10004
+    },
+    {
+      "epoch": 0.31765079365079363,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.1987736225128174,
+      "step": 10006
+    },
+    {
+      "epoch": 0.3177142857142857,
+      "grad_norm": 0.306640625,
+      "learning_rate": 0.1,
+      "loss": 2.194270610809326,
+      "step": 10008
+    },
+    {
+      "epoch": 0.31777777777777777,
+      "grad_norm": 0.07958984375,
+      "learning_rate": 0.1,
+      "loss": 2.194709300994873,
+      "step": 10010
+    },
+    {
+      "epoch": 0.31784126984126987,
+      "grad_norm": 0.1630859375,
+      "learning_rate": 0.1,
+      "loss": 2.1713898181915283,
+      "step": 10012
+    },
+    {
+      "epoch": 0.3179047619047619,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.1831634044647217,
+      "step": 10014
+    },
+    {
+      "epoch": 0.31796825396825396,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.1,
+      "loss": 2.1948635578155518,
+      "step": 10016
+    },
+    {
+      "epoch": 0.31803174603174605,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.1,
+      "loss": 2.1673130989074707,
+      "step": 10018
+    },
+    {
+      "epoch": 0.3180952380952381,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.1,
+      "loss": 2.1914515495300293,
+      "step": 10020
+    },
+    {
+      "epoch": 0.31815873015873014,
+      "grad_norm": 0.330078125,
+      "learning_rate": 0.1,
+      "loss": 2.1910383701324463,
+      "step": 10022
+    },
+    {
+      "epoch": 0.31822222222222224,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.155423402786255,
+      "step": 10024
+    },
+    {
+      "epoch": 0.3182857142857143,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.1624786853790283,
+      "step": 10026
+    },
+    {
+      "epoch": 0.3183492063492063,
+      "grad_norm": 0.07421875,
+      "learning_rate": 0.1,
+      "loss": 2.1976842880249023,
+      "step": 10028
+    },
+    {
+      "epoch": 0.3184126984126984,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.1,
+      "loss": 2.1654818058013916,
+      "step": 10030
+    },
+    {
+      "epoch": 0.31847619047619047,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.1,
+      "loss": 2.1663379669189453,
+      "step": 10032
+    },
+    {
+      "epoch": 0.31853968253968257,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.1839845180511475,
+      "step": 10034
+    },
+    {
+      "epoch": 0.3186031746031746,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.1859796047210693,
+      "step": 10036
+    },
+    {
+      "epoch": 0.31866666666666665,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.1,
+      "loss": 2.171574354171753,
+      "step": 10038
+    },
+    {
+      "epoch": 0.31873015873015875,
+      "grad_norm": 0.09228515625,
+      "learning_rate": 0.1,
+      "loss": 2.1888623237609863,
+      "step": 10040
+    },
+    {
+      "epoch": 0.3187936507936508,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.1,
+      "loss": 2.2281229496002197,
+      "step": 10042
+    },
+    {
+      "epoch": 0.31885714285714284,
+      "grad_norm": 0.06591796875,
+      "learning_rate": 0.1,
+      "loss": 2.215243339538574,
+      "step": 10044
+    },
+    {
+      "epoch": 0.31892063492063494,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.1,
+      "loss": 2.1814558506011963,
+      "step": 10046
+    },
+    {
+      "epoch": 0.318984126984127,
+      "grad_norm": 0.10986328125,
+      "learning_rate": 0.1,
+      "loss": 2.185574769973755,
+      "step": 10048
+    },
+    {
+      "epoch": 0.319047619047619,
+      "grad_norm": 0.048583984375,
+      "learning_rate": 0.1,
+      "loss": 2.209963321685791,
+      "step": 10050
+    },
+    {
+      "epoch": 0.3191111111111111,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.1,
+      "loss": 2.1775128841400146,
+      "step": 10052
+    },
+    {
+      "epoch": 0.31917460317460317,
+      "grad_norm": 0.171875,
+      "learning_rate": 0.1,
+      "loss": 2.1951544284820557,
+      "step": 10054
+    },
+    {
+      "epoch": 0.31923809523809527,
+      "grad_norm": 0.08203125,
+      "learning_rate": 0.1,
+      "loss": 2.2017500400543213,
+      "step": 10056
+    },
+    {
+      "epoch": 0.3193015873015873,
+      "grad_norm": 0.064453125,
+      "learning_rate": 0.1,
+      "loss": 2.179600238800049,
+      "step": 10058
+    },
+    {
+      "epoch": 0.31936507936507935,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 0.1,
+      "loss": 2.2006568908691406,
+      "step": 10060
+    },
+    {
+      "epoch": 0.31942857142857145,
+      "grad_norm": 0.458984375,
+      "learning_rate": 0.1,
+      "loss": 2.195237398147583,
+      "step": 10062
+    },
+    {
+      "epoch": 0.3194920634920635,
+      "grad_norm": 0.05029296875,
+      "learning_rate": 0.1,
+      "loss": 2.193232536315918,
+      "step": 10064
+    },
+    {
+      "epoch": 0.31955555555555554,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.1862893104553223,
+      "step": 10066
+    },
+    {
+      "epoch": 0.31961904761904764,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.1,
+      "loss": 2.2037243843078613,
+      "step": 10068
+    },
+    {
+      "epoch": 0.3196825396825397,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.1,
+      "loss": 2.184126615524292,
+      "step": 10070
+    },
+    {
+      "epoch": 0.3197460317460317,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 0.1,
+      "loss": 2.2134101390838623,
+      "step": 10072
+    },
+    {
+      "epoch": 0.3198095238095238,
+      "grad_norm": 0.083984375,
+      "learning_rate": 0.1,
+      "loss": 2.183708667755127,
+      "step": 10074
+    },
+    {
+      "epoch": 0.31987301587301586,
+      "grad_norm": 0.06494140625,
+      "learning_rate": 0.1,
+      "loss": 2.20131516456604,
+      "step": 10076
+    },
+    {
+      "epoch": 0.31993650793650796,
+      "grad_norm": 0.05322265625,
+      "learning_rate": 0.1,
+      "loss": 2.1745753288269043,
+      "step": 10078
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.08349609375,
+      "learning_rate": 0.1,
+      "loss": 2.191664218902588,
+      "step": 10080
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 3.33837643366914e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null