Instructions for using Ba2han/experimental2 with libraries, notebooks, and local apps.

Libraries

How to use Ba2han/experimental2 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
```
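To get text back from the pipeline created above, call it with generation arguments. The sketch below assumes the model's remote code works with the standard `text-generation` pipeline; the helper `generate_continuation` is hypothetical, and its defaults mirror the `max_tokens: 512` and `temperature: 0.5` settings used in the server examples on this page. Sampling has to be enabled (`do_sample=True`) for `temperature` to have any effect, and `return_full_text=False` drops the prompt from the returned string.

```python
from typing import Any, Callable


def generate_continuation(
    pipe: Callable[..., Any],
    prompt: str,
    max_new_tokens: int = 512,
    temperature: float = 0.5,
) -> str:
    """Return only the newly generated text for `prompt`.

    `pipe` is expected to be a transformers text-generation pipeline, e.g.
    pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True).
    """
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,           # temperature is ignored under greedy decoding
        temperature=temperature,
        return_full_text=False,   # return the continuation without the prompt
    )
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"]
```

Typical use is `print(generate_continuation(pipe, "Once upon a time,"))`. Passing the pipeline in as an argument keeps the helper independent of how (and where) the model was loaded.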

Notebooks
Google Colab
Kaggle

vLLM

How to use Ba2han/experimental2 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ba2han/experimental2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
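The same request can be sent from Python without extra dependencies. This is a sketch using only the standard library (`urllib.request`, `json`); it assumes the server replies in the OpenAI completions format, where the generated text sits at `choices[0].text`. The helper names are hypothetical, and the defaults copy the `curl` payload above.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    prompt: str,
    model: str = "Ba2han/experimental2",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the text of the first choice."""
    request = build_completion_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Point `base_url` at `http://localhost:8000` for the vLLM server above, e.g. `complete("http://localhost:8000", "Once upon a time,")`; the SGLang examples on this page listen on port 30000 instead. Splitting request construction from sending makes the payload easy to inspect without a running server.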

Use Docker

```
docker model run hf.co/Ba2han/experimental2
```

SGLang

How to use Ba2han/experimental2 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/experimental2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/experimental2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
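Loading the model inside the container can take a while, and requests sent before the server is ready fail. A minimal readiness check, sketched in Python under two assumptions: the server exposes the OpenAI-compatible `GET /v1/models` listing, and it reports the model under the same ID used in the `curl` examples. The function names are hypothetical; `fetch` and `sleep` are injectable so the polling loop can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable, List


def list_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Return the model IDs reported by GET {base_url}/v1/models."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style listing: {"object": "list", "data": [{"id": ...}, ...]}
    return [entry["id"] for entry in body.get("data", [])]


def wait_until_ready(
    base_url: str,
    model_id: str = "Ba2han/experimental2",
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    fetch: Callable[[str], List[str]] = list_models,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `model_id` is listed; return False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if model_id in fetch(base_url):
                return True
        except (OSError, ValueError):
            # URLError and refused connections are OSError subclasses;
            # ValueError covers a non-JSON body while the server is starting.
            pass
        sleep(poll_s)
    return False
```

For the container above, something like `if not wait_until_ready("http://localhost:30000"): raise SystemExit("server did not come up")` can run before the first completion request.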

Unsloth Studio

How to use Ba2han/experimental2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
```

Using Hugging Face Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/experimental2 to start chatting
```

Load model with FastModel

```
pip install unsloth
```

```
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/experimental2",
    max_seq_length=2048,
)
```

Docker Model Runner
How to use Ba2han/experimental2 with Docker Model Runner:
```
docker model run hf.co/Ba2han/experimental2
```

Ba2han committed 27 days ago

Commit 64f55cb (1 parent: af64d42)

Training in progress, step 11340, checkpoint

Files changed (1)

last-checkpoint/trainer_state.json +1109 -3
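The `+1109 -3` count can be cross-checked against the three hunk headers in the diff below (`@@ -2,9 +2,9 @@`, `@@ -38616,6 +38616,1112 @@`, `@@ -38635,7 +39741,7 @@`). A minimal sketch with hypothetical helper names: it parses each `@@ -old_start,old_len +new_start,new_len @@` header, confirms that every hunk's new start equals its old start shifted by the net growth of the earlier hunks, and returns the total net line change. It assumes every hunk has non-zero lengths on both sides, which holds for this commit.

```python
import re
from typing import List, Tuple

# Unified-diff hunk header; a missing ",len" means a length of 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)


def check_offsets(headers: List[str]) -> int:
    """Check hunk offsets for consistency and return the net line change.

    Assumes no hunk has a zero length on either side (pure file
    creation or deletion hunks are anchored differently).
    """
    net = 0
    for header in headers:
        old_start, old_len, new_start, new_len = parse_hunk(header)
        if new_start != old_start + net:
            raise ValueError(f"offset mismatch in {header!r}")
        net += new_len - old_len
    return net
```

For this commit the net change comes out to 1106 lines, matching 1109 added minus 3 removed: the first hunk replaces `epoch` and `global_step` in place, the third replaces `total_flos`, and the second grows from 6 to 1112 lines as new training-log records are appended (hence its new start, 39741, is 38635 + 1106).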

last-checkpoint/trainer_state.json (split view: the old side of each hunk is listed first, followed by the new side)

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.35,
   "eval_steps": 3150,
-  "global_step": 11025,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -38616,6 +38616,1112 @@
       "learning_rate": 0.1,
       "loss": 2.3982603549957275,
       "step": 11024
     }
   ],
   "logging_steps": 2,
@@ -38635,7 +39741,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 3.6513666425418183e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.36,
   "eval_steps": 3150,
+  "global_step": 11340,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.1,
       "loss": 2.3982603549957275,
       "step": 11024
+    },
+    {
+      "epoch": 0.350031746031746,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.1,
+      "loss": 2.417543888092041,
+      "step": 11026
+    },
+    {
+      "epoch": 0.35009523809523807,
+      "grad_norm": 0.4921875,
+      "learning_rate": 0.1,
+      "loss": 2.4204814434051514,
+      "step": 11028
+    },
+    {
+      "epoch": 0.35015873015873017,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.1,
+      "loss": 2.423769235610962,
+      "step": 11030
+    },
+    {
+      "epoch": 0.3502222222222222,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.403280019760132,
+      "step": 11032
+    },
+    {
+      "epoch": 0.3502857142857143,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.1,
+      "loss": 2.391350746154785,
+      "step": 11034
+    },
+    {
+      "epoch": 0.35034920634920635,
+      "grad_norm": 0.166015625,
+      "learning_rate": 0.1,
+      "loss": 2.422663927078247,
+      "step": 11036
+    },
+    {
+      "epoch": 0.3504126984126984,
+      "grad_norm": 0.4140625,
+      "learning_rate": 0.1,
+      "loss": 2.4171299934387207,
+      "step": 11038
+    },
+    {
+      "epoch": 0.3504761904761905,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.1,
+      "loss": 2.4181694984436035,
+      "step": 11040
+    },
+    {
+      "epoch": 0.35053968253968254,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.4219069480895996,
+      "step": 11042
+    },
+    {
+      "epoch": 0.3506031746031746,
+      "grad_norm": 0.05517578125,
+      "learning_rate": 0.1,
+      "loss": 2.4353079795837402,
+      "step": 11044
+    },
+    {
+      "epoch": 0.3506666666666667,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.4211533069610596,
+      "step": 11046
+    },
+    {
+      "epoch": 0.3507301587301587,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.1,
+      "loss": 2.4109206199645996,
+      "step": 11048
+    },
+    {
+      "epoch": 0.35079365079365077,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.1,
+      "loss": 2.42941951751709,
+      "step": 11050
+    },
+    {
+      "epoch": 0.35085714285714287,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.3879783153533936,
+      "step": 11052
+    },
+    {
+      "epoch": 0.3509206349206349,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.1,
+      "loss": 2.4356982707977295,
+      "step": 11054
+    },
+    {
+      "epoch": 0.350984126984127,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.4666500091552734,
+      "step": 11056
+    },
+    {
+      "epoch": 0.35104761904761905,
+      "grad_norm": 0.072265625,
+      "learning_rate": 0.1,
+      "loss": 2.4236481189727783,
+      "step": 11058
+    },
+    {
+      "epoch": 0.3511111111111111,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.1,
+      "loss": 2.4124252796173096,
+      "step": 11060
+    },
+    {
+      "epoch": 0.3511746031746032,
+      "grad_norm": 0.314453125,
+      "learning_rate": 0.1,
+      "loss": 2.432394027709961,
+      "step": 11062
+    },
+    {
+      "epoch": 0.35123809523809524,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.1,
+      "loss": 2.4237029552459717,
+      "step": 11064
+    },
+    {
+      "epoch": 0.3513015873015873,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.424628496170044,
+      "step": 11066
+    },
+    {
+      "epoch": 0.3513650793650794,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.1,
+      "loss": 2.4362010955810547,
+      "step": 11068
+    },
+    {
+      "epoch": 0.3514285714285714,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.437283992767334,
+      "step": 11070
+    },
+    {
+      "epoch": 0.35149206349206347,
+      "grad_norm": 0.107421875,
+      "learning_rate": 0.1,
+      "loss": 2.446931838989258,
+      "step": 11072
+    },
+    {
+      "epoch": 0.35155555555555557,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.439727783203125,
+      "step": 11074
+    },
+    {
+      "epoch": 0.3516190476190476,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.451850175857544,
+      "step": 11076
+    },
+    {
+      "epoch": 0.3516825396825397,
+      "grad_norm": 0.185546875,
+      "learning_rate": 0.1,
+      "loss": 2.431602954864502,
+      "step": 11078
+    },
+    {
+      "epoch": 0.35174603174603175,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.4908480644226074,
+      "step": 11080
+    },
+    {
+      "epoch": 0.3518095238095238,
+      "grad_norm": 0.2431640625,
+      "learning_rate": 0.1,
+      "loss": 2.444524049758911,
+      "step": 11082
+    },
+    {
+      "epoch": 0.3518730158730159,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.432138204574585,
+      "step": 11084
+    },
+    {
+      "epoch": 0.35193650793650794,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.1,
+      "loss": 2.4384355545043945,
+      "step": 11086
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.1,
+      "loss": 2.4333276748657227,
+      "step": 11088
+    },
+    {
+      "epoch": 0.3520634920634921,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.467416524887085,
+      "step": 11090
+    },
+    {
+      "epoch": 0.3521269841269841,
+      "grad_norm": 0.1923828125,
+      "learning_rate": 0.1,
+      "loss": 2.4466445446014404,
+      "step": 11092
+    },
+    {
+      "epoch": 0.35219047619047616,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.1,
+      "loss": 2.461064338684082,
+      "step": 11094
+    },
+    {
+      "epoch": 0.35225396825396826,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.464916229248047,
+      "step": 11096
+    },
+    {
+      "epoch": 0.3523174603174603,
+      "grad_norm": 0.0771484375,
+      "learning_rate": 0.1,
+      "loss": 2.4417896270751953,
+      "step": 11098
+    },
+    {
+      "epoch": 0.3523809523809524,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.1,
+      "loss": 2.4504122734069824,
+      "step": 11100
+    },
+    {
+      "epoch": 0.35244444444444445,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.1,
+      "loss": 2.46895694732666,
+      "step": 11102
+    },
+    {
+      "epoch": 0.3525079365079365,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.4499869346618652,
+      "step": 11104
+    },
+    {
+      "epoch": 0.3525714285714286,
+      "grad_norm": 0.046875,
+      "learning_rate": 0.1,
+      "loss": 2.463418483734131,
+      "step": 11106
+    },
+    {
+      "epoch": 0.35263492063492063,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.453564405441284,
+      "step": 11108
+    },
+    {
+      "epoch": 0.3526984126984127,
+      "grad_norm": 0.1982421875,
+      "learning_rate": 0.1,
+      "loss": 2.4937078952789307,
+      "step": 11110
+    },
+    {
+      "epoch": 0.3527619047619048,
+      "grad_norm": 0.486328125,
+      "learning_rate": 0.1,
+      "loss": 2.4809000492095947,
+      "step": 11112
+    },
+    {
+      "epoch": 0.3528253968253968,
+      "grad_norm": 0.30078125,
+      "learning_rate": 0.1,
+      "loss": 2.503629446029663,
+      "step": 11114
+    },
+    {
+      "epoch": 0.35288888888888886,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.1,
+      "loss": 2.464132070541382,
+      "step": 11116
+    },
+    {
+      "epoch": 0.35295238095238096,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.1,
+      "loss": 2.5101478099823,
+      "step": 11118
+    },
+    {
+      "epoch": 0.353015873015873,
+      "grad_norm": 0.328125,
+      "learning_rate": 0.1,
+      "loss": 2.492459297180176,
+      "step": 11120
+    },
+    {
+      "epoch": 0.3530793650793651,
+      "grad_norm": 0.07666015625,
+      "learning_rate": 0.1,
+      "loss": 2.4816741943359375,
+      "step": 11122
+    },
+    {
+      "epoch": 0.35314285714285715,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.4845693111419678,
+      "step": 11124
+    },
+    {
+      "epoch": 0.3532063492063492,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.1,
+      "loss": 2.47183895111084,
+      "step": 11126
+    },
+    {
+      "epoch": 0.3532698412698413,
+      "grad_norm": 0.1982421875,
+      "learning_rate": 0.1,
+      "loss": 2.5153868198394775,
+      "step": 11128
+    },
+    {
+      "epoch": 0.35333333333333333,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 0.1,
+      "loss": 2.5130763053894043,
+      "step": 11130
+    },
+    {
+      "epoch": 0.3533968253968254,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.487558126449585,
+      "step": 11132
+    },
+    {
+      "epoch": 0.3534603174603175,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.501115322113037,
+      "step": 11134
+    },
+    {
+      "epoch": 0.3535238095238095,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.516602039337158,
+      "step": 11136
+    },
+    {
+      "epoch": 0.35358730158730156,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.1,
+      "loss": 2.49763560295105,
+      "step": 11138
+    },
+    {
+      "epoch": 0.35365079365079366,
+      "grad_norm": 0.482421875,
+      "learning_rate": 0.1,
+      "loss": 2.4931797981262207,
+      "step": 11140
+    },
+    {
+      "epoch": 0.3537142857142857,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.5045690536499023,
+      "step": 11142
+    },
+    {
+      "epoch": 0.3537777777777778,
+      "grad_norm": 0.318359375,
+      "learning_rate": 0.1,
+      "loss": 2.4567301273345947,
+      "step": 11144
+    },
+    {
+      "epoch": 0.35384126984126985,
+      "grad_norm": 0.4375,
+      "learning_rate": 0.1,
+      "loss": 2.4942221641540527,
+      "step": 11146
+    },
+    {
+      "epoch": 0.3539047619047619,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.1,
+      "loss": 2.494699001312256,
+      "step": 11148
+    },
+    {
+      "epoch": 0.353968253968254,
+      "grad_norm": 0.185546875,
+      "learning_rate": 0.1,
+      "loss": 2.503068685531616,
+      "step": 11150
+    },
+    {
+      "epoch": 0.35403174603174603,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.525214672088623,
+      "step": 11152
+    },
+    {
+      "epoch": 0.3540952380952381,
+      "grad_norm": 0.087890625,
+      "learning_rate": 0.1,
+      "loss": 2.475733518600464,
+      "step": 11154
+    },
+    {
+      "epoch": 0.3541587301587302,
+      "grad_norm": 0.201171875,
+      "learning_rate": 0.1,
+      "loss": 2.518965005874634,
+      "step": 11156
+    },
+    {
+      "epoch": 0.3542222222222222,
+      "grad_norm": 0.162109375,
+      "learning_rate": 0.1,
+      "loss": 2.4847307205200195,
+      "step": 11158
+    },
+    {
+      "epoch": 0.35428571428571426,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.5241191387176514,
+      "step": 11160
+    },
+    {
+      "epoch": 0.35434920634920636,
+      "grad_norm": 0.0673828125,
+      "learning_rate": 0.1,
+      "loss": 2.5268568992614746,
+      "step": 11162
+    },
+    {
+      "epoch": 0.3544126984126984,
+      "grad_norm": 0.18359375,
+      "learning_rate": 0.1,
+      "loss": 2.489588975906372,
+      "step": 11164
+    },
+    {
+      "epoch": 0.3544761904761905,
+      "grad_norm": 0.19921875,
+      "learning_rate": 0.1,
+      "loss": 2.517854928970337,
+      "step": 11166
+    },
+    {
+      "epoch": 0.35453968253968254,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.5115416049957275,
+      "step": 11168
+    },
+    {
+      "epoch": 0.3546031746031746,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.5270791053771973,
+      "step": 11170
+    },
+    {
+      "epoch": 0.3546666666666667,
+      "grad_norm": 0.189453125,
+      "learning_rate": 0.1,
+      "loss": 2.49017596244812,
+      "step": 11172
+    },
+    {
+      "epoch": 0.35473015873015873,
+      "grad_norm": 0.07080078125,
+      "learning_rate": 0.1,
+      "loss": 2.486820697784424,
+      "step": 11174
+    },
+    {
+      "epoch": 0.35479365079365077,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.1,
+      "loss": 2.4675469398498535,
+      "step": 11176
+    },
+    {
+      "epoch": 0.35485714285714287,
+      "grad_norm": 0.5625,
+      "learning_rate": 0.1,
+      "loss": 2.5254688262939453,
+      "step": 11178
+    },
+    {
+      "epoch": 0.3549206349206349,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.1,
+      "loss": 2.49434494972229,
+      "step": 11180
+    },
+    {
+      "epoch": 0.35498412698412696,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.1,
+      "loss": 2.473165273666382,
+      "step": 11182
+    },
+    {
+      "epoch": 0.35504761904761906,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.1,
+      "loss": 2.496553421020508,
+      "step": 11184
+    },
+    {
+      "epoch": 0.3551111111111111,
+      "grad_norm": 0.609375,
+      "learning_rate": 0.1,
+      "loss": 2.5094237327575684,
+      "step": 11186
+    },
+    {
+      "epoch": 0.3551746031746032,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.1,
+      "loss": 2.4690942764282227,
+      "step": 11188
+    },
+    {
+      "epoch": 0.35523809523809524,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4825117588043213,
+      "step": 11190
+    },
+    {
+      "epoch": 0.3553015873015873,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.1,
+      "loss": 2.530270576477051,
+      "step": 11192
+    },
+    {
+      "epoch": 0.3553650793650794,
+      "grad_norm": 0.0947265625,
+      "learning_rate": 0.1,
+      "loss": 2.500499725341797,
+      "step": 11194
+    },
+    {
+      "epoch": 0.3554285714285714,
+      "grad_norm": 0.0849609375,
+      "learning_rate": 0.1,
+      "loss": 2.483163595199585,
+      "step": 11196
+    },
+    {
+      "epoch": 0.35549206349206347,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.5392792224884033,
+      "step": 11198
+    },
+    {
+      "epoch": 0.35555555555555557,
+      "grad_norm": 0.0751953125,
+      "learning_rate": 0.1,
+      "loss": 2.4986934661865234,
+      "step": 11200
+    },
+    {
+      "epoch": 0.3556190476190476,
+      "grad_norm": 0.04931640625,
+      "learning_rate": 0.1,
+      "loss": 2.480569362640381,
+      "step": 11202
+    },
+    {
+      "epoch": 0.35568253968253966,
+      "grad_norm": 0.06494140625,
+      "learning_rate": 0.1,
+      "loss": 2.4624855518341064,
+      "step": 11204
+    },
+    {
+      "epoch": 0.35574603174603175,
+      "grad_norm": 0.197265625,
+      "learning_rate": 0.1,
+      "loss": 2.493828296661377,
+      "step": 11206
+    },
+    {
+      "epoch": 0.3558095238095238,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.491539239883423,
+      "step": 11208
+    },
+    {
+      "epoch": 0.3558730158730159,
+      "grad_norm": 0.0478515625,
+      "learning_rate": 0.1,
+      "loss": 2.497725486755371,
+      "step": 11210
+    },
+    {
+      "epoch": 0.35593650793650794,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.4691996574401855,
+      "step": 11212
+    },
+    {
+      "epoch": 0.356,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.1,
+      "loss": 2.4893603324890137,
+      "step": 11214
+    },
+    {
+      "epoch": 0.3560634920634921,
+      "grad_norm": 0.2734375,
+      "learning_rate": 0.1,
+      "loss": 2.4978229999542236,
+      "step": 11216
+    },
+    {
+      "epoch": 0.3561269841269841,
+      "grad_norm": 0.053466796875,
+      "learning_rate": 0.1,
+      "loss": 2.4881231784820557,
+      "step": 11218
+    },
+    {
+      "epoch": 0.35619047619047617,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.1,
+      "loss": 2.46063232421875,
+      "step": 11220
+    },
+    {
+      "epoch": 0.35625396825396827,
+      "grad_norm": 0.09033203125,
+      "learning_rate": 0.1,
+      "loss": 2.5022330284118652,
+      "step": 11222
+    },
+    {
+      "epoch": 0.3563174603174603,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.1,
+      "loss": 2.5123372077941895,
+      "step": 11224
+    },
+    {
+      "epoch": 0.35638095238095235,
+      "grad_norm": 0.62890625,
+      "learning_rate": 0.1,
+      "loss": 2.4963247776031494,
+      "step": 11226
+    },
+    {
+      "epoch": 0.35644444444444445,
+      "grad_norm": 0.166015625,
+      "learning_rate": 0.1,
+      "loss": 2.4859702587127686,
+      "step": 11228
+    },
+    {
+      "epoch": 0.3565079365079365,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.475501775741577,
+      "step": 11230
+    },
+    {
+      "epoch": 0.3565714285714286,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.490372657775879,
+      "step": 11232
+    },
+    {
+      "epoch": 0.35663492063492064,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.1,
+      "loss": 2.480090618133545,
+      "step": 11234
+    },
+    {
+      "epoch": 0.3566984126984127,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.5044050216674805,
+      "step": 11236
+    },
+    {
+      "epoch": 0.3567619047619048,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.1,
+      "loss": 2.4619669914245605,
+      "step": 11238
+    },
+    {
+      "epoch": 0.3568253968253968,
+      "grad_norm": 0.080078125,
+      "learning_rate": 0.1,
+      "loss": 2.4584763050079346,
+      "step": 11240
+    },
+    {
+      "epoch": 0.35688888888888887,
+      "grad_norm": 0.197265625,
+      "learning_rate": 0.1,
+      "loss": 2.478175640106201,
+      "step": 11242
+    },
+    {
+      "epoch": 0.35695238095238097,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.1,
+      "loss": 2.476935386657715,
+      "step": 11244
+    },
+    {
+      "epoch": 0.357015873015873,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.461888074874878,
+      "step": 11246
+    },
+    {
+      "epoch": 0.35707936507936505,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.1,
+      "loss": 2.449694871902466,
+      "step": 11248
+    },
+    {
+      "epoch": 0.35714285714285715,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.4589521884918213,
+      "step": 11250
+    },
+    {
+      "epoch": 0.3572063492063492,
+      "grad_norm": 0.169921875,
+      "learning_rate": 0.1,
+      "loss": 2.469026565551758,
+      "step": 11252
+    },
+    {
+      "epoch": 0.3572698412698413,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.463324785232544,
+      "step": 11254
+    },
+    {
+      "epoch": 0.35733333333333334,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.1,
+      "loss": 2.450981616973877,
+      "step": 11256
+    },
+    {
+      "epoch": 0.3573968253968254,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.4704065322875977,
+      "step": 11258
+    },
+    {
+      "epoch": 0.3574603174603175,
+      "grad_norm": 0.10400390625,
+      "learning_rate": 0.1,
+      "loss": 2.4855854511260986,
+      "step": 11260
+    },
+    {
+      "epoch": 0.3575238095238095,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.459942102432251,
+      "step": 11262
+    },
+    {
+      "epoch": 0.35758730158730156,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.1,
+      "loss": 2.4716644287109375,
+      "step": 11264
+    },
+    {
+      "epoch": 0.35765079365079366,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.4955148696899414,
+      "step": 11266
+    },
+    {
+      "epoch": 0.3577142857142857,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.462697982788086,
+      "step": 11268
+    },
+    {
+      "epoch": 0.35777777777777775,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.4656829833984375,
+      "step": 11270
+    },
+    {
+      "epoch": 0.35784126984126985,
+      "grad_norm": 0.09423828125,
+      "learning_rate": 0.1,
+      "loss": 2.458569288253784,
+      "step": 11272
+    },
+    {
+      "epoch": 0.3579047619047619,
+      "grad_norm": 0.0771484375,
+      "learning_rate": 0.1,
+      "loss": 2.4550700187683105,
+      "step": 11274
+    },
+    {
+      "epoch": 0.357968253968254,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.1,
+      "loss": 2.4642810821533203,
+      "step": 11276
+    },
+    {
+      "epoch": 0.35803174603174603,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.454416513442993,
+      "step": 11278
+    },
+    {
+      "epoch": 0.3580952380952381,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.469170570373535,
+      "step": 11280
+    },
+    {
+      "epoch": 0.3581587301587302,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.481339693069458,
+      "step": 11282
+    },
+    {
+      "epoch": 0.3582222222222222,
+      "grad_norm": 0.458984375,
+      "learning_rate": 0.1,
+      "loss": 2.467975378036499,
+      "step": 11284
+    },
+    {
+      "epoch": 0.35828571428571426,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.5128746032714844,
+      "step": 11286
+    },
+    {
+      "epoch": 0.35834920634920636,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.4856278896331787,
+      "step": 11288
+    },
+    {
+      "epoch": 0.3584126984126984,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.4855496883392334,
+      "step": 11290
+    },
+    {
+      "epoch": 0.3584761904761905,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.1,
+      "loss": 2.4849109649658203,
+      "step": 11292
+    },
+    {
+      "epoch": 0.35853968253968255,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.4842143058776855,
+      "step": 11294
+    },
+    {
+      "epoch": 0.3586031746031746,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.1,
+      "loss": 2.451904058456421,
+      "step": 11296
+    },
+    {
+      "epoch": 0.3586666666666667,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4747142791748047,
+      "step": 11298
+    },
+    {
+      "epoch": 0.35873015873015873,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.4832489490509033,
+      "step": 11300
+    },
+    {
+      "epoch": 0.3587936507936508,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.4741103649139404,
+      "step": 11302
+    },
+    {
+      "epoch": 0.3588571428571429,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.470633029937744,
+      "step": 11304
+    },
+    {
+      "epoch": 0.3589206349206349,
+      "grad_norm": 0.228515625,
+      "learning_rate": 0.1,
+      "loss": 2.491023063659668,
+      "step": 11306
+    },
+    {
+      "epoch": 0.35898412698412696,
+      "grad_norm": 0.2578125,
+      "learning_rate": 0.1,
+      "loss": 2.4867446422576904,
+      "step": 11308
+    },
+    {
+      "epoch": 0.35904761904761906,
+      "grad_norm": 0.1875,
+      "learning_rate": 0.1,
+      "loss": 2.4757778644561768,
+      "step": 11310
+    },
+    {
+      "epoch": 0.3591111111111111,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.483226776123047,
+      "step": 11312
+    },
+    {
+      "epoch": 0.3591746031746032,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.509500026702881,
+      "step": 11314
+    },
+    {
+      "epoch": 0.35923809523809525,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.4560673236846924,
+      "step": 11316
+    },
+    {
+      "epoch": 0.3593015873015873,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.1,
+      "loss": 2.4755465984344482,
+      "step": 11318
+    },
+    {
+      "epoch": 0.3593650793650794,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.1,
+      "loss": 2.4875471591949463,
+      "step": 11320
+    },
+    {
+      "epoch": 0.35942857142857143,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.1,
+      "loss": 2.501183032989502,
+      "step": 11322
+    },
+    {
+      "epoch": 0.3594920634920635,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.482649087905884,
+      "step": 11324
+    },
+    {
+      "epoch": 0.3595555555555556,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.5185811519622803,
+      "step": 11326
+    },
+    {
+      "epoch": 0.3596190476190476,
+      "grad_norm": 0.189453125,
+      "learning_rate": 0.1,
+      "loss": 2.5071229934692383,
+      "step": 11328
+    },
+    {
+      "epoch": 0.35968253968253966,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.4960479736328125,
+      "step": 11330
+    },
+    {
+      "epoch": 0.35974603174603176,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.1,
+      "loss": 2.4599719047546387,
+      "step": 11332
+    },
+    {
+      "epoch": 0.3598095238095238,
+      "grad_norm": 0.0712890625,
+      "learning_rate": 0.1,
+      "loss": 2.492602586746216,
+      "step": 11334
+    },
+    {
+      "epoch": 0.3598730158730159,
+      "grad_norm": 0.0615234375,
+      "learning_rate": 0.1,
+      "loss": 2.467595100402832,
+      "step": 11336
+    },
+    {
+      "epoch": 0.35993650793650794,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.1,
+      "loss": 2.5092246532440186,
+      "step": 11338
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.1,
+      "loss": 2.4747467041015625,
+      "step": 11340
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 3.755676497098762e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
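The logged values also support some quick arithmetic. Each record's `epoch` is fractional, and both `11340 / 0.36` and `11026 / 0.350031746031746` come to 31,500, so this run appears to count about 31,500 optimizer steps per epoch; that figure is inferred from the log rather than stated in the file, and it would make `eval_steps` (3150) one tenth of an epoch. Likewise, `total_flos` grew from 3.6513666425418183e+19 at step 11025 to 3.755676497098762e+19 at step 11340, roughly 3.31e15 per step by the Trainer's own FLOP estimate. The sketch below reads a `trainer_state.json` and reproduces these numbers; it assumes the Transformers `TrainerState` layout, where the per-step records shown in this diff live in a `log_history` list, and all function names are hypothetical.

```python
import json
from statistics import mean
from typing import Any, Dict, List


def load_state(path: str) -> Dict[str, Any]:
    """Read a checkpoint's trainer_state.json into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def training_entries(state: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Training-loss records only; evaluation records use keys like "eval_loss" instead."""
    return [e for e in state.get("log_history", []) if "loss" in e and "step" in e]


def steps_per_epoch(entry: Dict[str, Any]) -> float:
    """Infer optimizer steps per epoch from one record's fractional epoch."""
    return entry["step"] / entry["epoch"]


def summarize(state: Dict[str, Any], first_step: int, last_step: int) -> Dict[str, Any]:
    """Loss and gradient-norm statistics for records with first_step <= step <= last_step."""
    window = [e for e in training_entries(state) if first_step <= e["step"] <= last_step]
    if not window:
        raise ValueError("no training records in the requested step range")
    losses = [e["loss"] for e in window]
    norms = [e["grad_norm"] for e in window if "grad_norm" in e]
    return {
        "records": len(window),
        "mean_loss": mean(losses),
        "min_loss": min(losses),
        "max_loss": max(losses),
        "max_grad_norm": max(norms) if norms else float("nan"),
    }


def flops_per_step(old_flos: float, old_step: int, new_flos: float, new_step: int) -> float:
    """Average growth of the cumulative total_flos estimate per optimizer step."""
    return (new_flos - old_flos) / (new_step - old_step)
```

For example, `summarize(load_state("last-checkpoint/trainer_state.json"), 11026, 11340)` covers the 158 records added by this commit (steps 11026 to 11340, logged every 2 steps), and `flops_per_step(3.6513666425418183e19, 11025, 3.755676497098762e19, 11340)` gives about 3.31e15.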