Ba2han committed 28 days ago

Commit 850e8e2

1 parent: 4429c11

Training in progress, step 600, checkpoint

Files changed (1)

last-checkpoint/trainer_state.json +1053 -3

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.02633678020516626,
   "eval_steps": 957,
-  "global_step": 300,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -1058,6 +1058,1056 @@
       "learning_rate": 0.005,
       "loss": 3.4181106090545654,
       "step": 300
     }
   ],
   "logging_steps": 2,
@@ -1077,7 +2127,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 5.071178624049867e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.05267356041033252,
   "eval_steps": 957,
+  "global_step": 600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.005,
       "loss": 3.4181106090545654,
       "step": 300
+    },
+    {
+      "epoch": 0.02651235873986737,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 0.005,
+      "loss": 3.4428491592407227,
+      "step": 302
+    },
+    {
+      "epoch": 0.026687937274568477,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 0.005,
+      "loss": 3.4298229217529297,
+      "step": 304
+    },
+    {
+      "epoch": 0.026863515809269586,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.005,
+      "loss": 3.41455078125,
+      "step": 306
+    },
+    {
+      "epoch": 0.027039094343970695,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.005,
+      "loss": 3.3820443153381348,
+      "step": 308
+    },
+    {
+      "epoch": 0.027214672878671804,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.005,
+      "loss": 3.3708128929138184,
+      "step": 310
+    },
+    {
+      "epoch": 0.027390251413372913,
+      "grad_norm": 0.20703125,
+      "learning_rate": 0.005,
+      "loss": 3.3826849460601807,
+      "step": 312
+    },
+    {
+      "epoch": 0.02756582994807402,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.005,
+      "loss": 3.357750177383423,
+      "step": 314
+    },
+    {
+      "epoch": 0.027741408482775128,
+      "grad_norm": 0.1953125,
+      "learning_rate": 0.005,
+      "loss": 3.3657925128936768,
+      "step": 316
+    },
+    {
+      "epoch": 0.027916987017476236,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.005,
+      "loss": 3.367344379425049,
+      "step": 318
+    },
+    {
+      "epoch": 0.028092565552177345,
+      "grad_norm": 0.19140625,
+      "learning_rate": 0.005,
+      "loss": 3.3542888164520264,
+      "step": 320
+    },
+    {
+      "epoch": 0.028268144086878454,
+      "grad_norm": 0.1904296875,
+      "learning_rate": 0.005,
+      "loss": 3.349008798599243,
+      "step": 322
+    },
+    {
+      "epoch": 0.028443722621579563,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.005,
+      "loss": 3.319000720977783,
+      "step": 324
+    },
+    {
+      "epoch": 0.028619301156280672,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.005,
+      "loss": 3.3123488426208496,
+      "step": 326
+    },
+    {
+      "epoch": 0.028794879690981778,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.005,
+      "loss": 3.311730146408081,
+      "step": 328
+    },
+    {
+      "epoch": 0.028970458225682887,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.005,
+      "loss": 3.2922301292419434,
+      "step": 330
+    },
+    {
+      "epoch": 0.029146036760383996,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.005,
+      "loss": 3.3116979598999023,
+      "step": 332
+    },
+    {
+      "epoch": 0.029321615295085104,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.005,
+      "loss": 3.314528465270996,
+      "step": 334
+    },
+    {
+      "epoch": 0.029497193829786213,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.005,
+      "loss": 3.3112518787384033,
+      "step": 336
+    },
+    {
+      "epoch": 0.029672772364487322,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.005,
+      "loss": 3.283816337585449,
+      "step": 338
+    },
+    {
+      "epoch": 0.029848350899188428,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.005,
+      "loss": 3.3049468994140625,
+      "step": 340
+    },
+    {
+      "epoch": 0.030023929433889537,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.005,
+      "loss": 3.2841060161590576,
+      "step": 342
+    },
+    {
+      "epoch": 0.030199507968590646,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.005,
+      "loss": 3.2544920444488525,
+      "step": 344
+    },
+    {
+      "epoch": 0.030375086503291755,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.005,
+      "loss": 3.2736458778381348,
+      "step": 346
+    },
+    {
+      "epoch": 0.030550665037992863,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.005,
+      "loss": 3.2455122470855713,
+      "step": 348
+    },
+    {
+      "epoch": 0.030726243572693972,
+      "grad_norm": 0.15234375,
+      "learning_rate": 0.005,
+      "loss": 3.2548434734344482,
+      "step": 350
+    },
+    {
+      "epoch": 0.03090182210739508,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.005,
+      "loss": 3.2415554523468018,
+      "step": 352
+    },
+    {
+      "epoch": 0.031077400642096187,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.005,
+      "loss": 3.25886607170105,
+      "step": 354
+    },
+    {
+      "epoch": 0.0312529791767973,
+      "grad_norm": 0.1494140625,
+      "learning_rate": 0.005,
+      "loss": 3.2604076862335205,
+      "step": 356
+    },
+    {
+      "epoch": 0.031428557711498405,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.005,
+      "loss": 3.248387336730957,
+      "step": 358
+    },
+    {
+      "epoch": 0.03160413624619951,
+      "grad_norm": 0.173828125,
+      "learning_rate": 0.005,
+      "loss": 3.2637195587158203,
+      "step": 360
+    },
+    {
+      "epoch": 0.03177971478090062,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.005,
+      "loss": 3.2353899478912354,
+      "step": 362
+    },
+    {
+      "epoch": 0.03195529331560173,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.005,
+      "loss": 3.2072153091430664,
+      "step": 364
+    },
+    {
+      "epoch": 0.03213087185030284,
+      "grad_norm": 0.1611328125,
+      "learning_rate": 0.005,
+      "loss": 3.249500274658203,
+      "step": 366
+    },
+    {
+      "epoch": 0.032306450385003946,
+      "grad_norm": 0.177734375,
+      "learning_rate": 0.005,
+      "loss": 3.2054481506347656,
+      "step": 368
+    },
+    {
+      "epoch": 0.03248202891970506,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 0.005,
+      "loss": 3.1932778358459473,
+      "step": 370
+    },
+    {
+      "epoch": 0.032657607454406164,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.005,
+      "loss": 3.197559118270874,
+      "step": 372
+    },
+    {
+      "epoch": 0.03283318598910727,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.005,
+      "loss": 3.158921957015991,
+      "step": 374
+    },
+    {
+      "epoch": 0.03300876452380838,
+      "grad_norm": 0.125,
+      "learning_rate": 0.005,
+      "loss": 3.163810968399048,
+      "step": 376
+    },
+    {
+      "epoch": 0.03318434305850949,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 3.1478772163391113,
+      "step": 378
+    },
+    {
+      "epoch": 0.0333599215932106,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.005,
+      "loss": 3.1855390071868896,
+      "step": 380
+    },
+    {
+      "epoch": 0.033535500127911705,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.005,
+      "loss": 3.1721317768096924,
+      "step": 382
+    },
+    {
+      "epoch": 0.03371107866261282,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 3.1935954093933105,
+      "step": 384
+    },
+    {
+      "epoch": 0.03388665719731392,
+      "grad_norm": 0.2236328125,
+      "learning_rate": 0.005,
+      "loss": 3.162994861602783,
+      "step": 386
+    },
+    {
+      "epoch": 0.03406223573201503,
+      "grad_norm": 0.181640625,
+      "learning_rate": 0.005,
+      "loss": 3.1582443714141846,
+      "step": 388
+    },
+    {
+      "epoch": 0.03423781426671614,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.005,
+      "loss": 3.1388156414031982,
+      "step": 390
+    },
+    {
+      "epoch": 0.034413392801417246,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 3.183506727218628,
+      "step": 392
+    },
+    {
+      "epoch": 0.03458897133611836,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.005,
+      "loss": 3.160357713699341,
+      "step": 394
+    },
+    {
+      "epoch": 0.034764549870819464,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.005,
+      "loss": 3.164640188217163,
+      "step": 396
+    },
+    {
+      "epoch": 0.034940128405520576,
+      "grad_norm": 0.169921875,
+      "learning_rate": 0.005,
+      "loss": 3.127124786376953,
+      "step": 398
+    },
+    {
+      "epoch": 0.03511570694022168,
+      "grad_norm": 0.162109375,
+      "learning_rate": 0.005,
+      "loss": 3.1583406925201416,
+      "step": 400
+    },
+    {
+      "epoch": 0.03529128547492279,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.005,
+      "loss": 3.1125779151916504,
+      "step": 402
+    },
+    {
+      "epoch": 0.0354668640096239,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.005,
+      "loss": 3.111574411392212,
+      "step": 404
+    },
+    {
+      "epoch": 0.035642442544325005,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.005,
+      "loss": 3.1208906173706055,
+      "step": 406
+    },
+    {
+      "epoch": 0.03581802107902612,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.005,
+      "loss": 3.1485278606414795,
+      "step": 408
+    },
+    {
+      "epoch": 0.03599359961372722,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.005,
+      "loss": 3.1263508796691895,
+      "step": 410
+    },
+    {
+      "epoch": 0.036169178148428335,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.005,
+      "loss": 3.0847690105438232,
+      "step": 412
+    },
+    {
+      "epoch": 0.03634475668312944,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.005,
+      "loss": 3.104984760284424,
+      "step": 414
+    },
+    {
+      "epoch": 0.036520335217830546,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.005,
+      "loss": 3.102842092514038,
+      "step": 416
+    },
+    {
+      "epoch": 0.03669591375253166,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 3.097351551055908,
+      "step": 418
+    },
+    {
+      "epoch": 0.036871492287232764,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.005,
+      "loss": 3.116666555404663,
+      "step": 420
+    },
+    {
+      "epoch": 0.037047070821933877,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.005,
+      "loss": 3.0793497562408447,
+      "step": 422
+    },
+    {
+      "epoch": 0.03722264935663498,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 3.136044502258301,
+      "step": 424
+    },
+    {
+      "epoch": 0.037398227891336094,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 3.1369822025299072,
+      "step": 426
+    },
+    {
+      "epoch": 0.0375738064260372,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.005,
+      "loss": 3.0935890674591064,
+      "step": 428
+    },
+    {
+      "epoch": 0.037749384960738305,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.005,
+      "loss": 3.0664894580841064,
+      "step": 430
+    },
+    {
+      "epoch": 0.03792496349543942,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.005,
+      "loss": 3.0921502113342285,
+      "step": 432
+    },
+    {
+      "epoch": 0.03810054203014052,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 0.005,
+      "loss": 3.082390546798706,
+      "step": 434
+    },
+    {
+      "epoch": 0.038276120564841636,
+      "grad_norm": 0.18359375,
+      "learning_rate": 0.005,
+      "loss": 3.10709285736084,
+      "step": 436
+    },
+    {
+      "epoch": 0.03845169909954274,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.005,
+      "loss": 3.0833730697631836,
+      "step": 438
+    },
+    {
+      "epoch": 0.038627277634243846,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.005,
+      "loss": 3.1024534702301025,
+      "step": 440
+    },
+    {
+      "epoch": 0.03880285616894496,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.005,
+      "loss": 3.0464351177215576,
+      "step": 442
+    },
+    {
+      "epoch": 0.038978434703646064,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 3.072038412094116,
+      "step": 444
+    },
+    {
+      "epoch": 0.03915401323834718,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.005,
+      "loss": 3.0556483268737793,
+      "step": 446
+    },
+    {
+      "epoch": 0.03932959177304828,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.005,
+      "loss": 3.0786564350128174,
+      "step": 448
+    },
+    {
+      "epoch": 0.039505170307749395,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.005,
+      "loss": 3.039013147354126,
+      "step": 450
+    },
+    {
+      "epoch": 0.0396807488424505,
+      "grad_norm": 0.12255859375,
+      "learning_rate": 0.005,
+      "loss": 3.100440740585327,
+      "step": 452
+    },
+    {
+      "epoch": 0.039856327377151606,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.005,
+      "loss": 3.032219171524048,
+      "step": 454
+    },
+    {
+      "epoch": 0.04003190591185272,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 3.0131278038024902,
+      "step": 456
+    },
+    {
+      "epoch": 0.04020748444655382,
+      "grad_norm": 0.208984375,
+      "learning_rate": 0.005,
+      "loss": 3.0364959239959717,
+      "step": 458
+    },
+    {
+      "epoch": 0.040383062981254936,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.005,
+      "loss": 3.0660479068756104,
+      "step": 460
+    },
+    {
+      "epoch": 0.04055864151595604,
+      "grad_norm": 0.125,
+      "learning_rate": 0.005,
+      "loss": 3.022470474243164,
+      "step": 462
+    },
+    {
+      "epoch": 0.040734220050657154,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 3.024975538253784,
+      "step": 464
+    },
+    {
+      "epoch": 0.04090979858535826,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.005,
+      "loss": 3.0443012714385986,
+      "step": 466
+    },
+    {
+      "epoch": 0.041085377120059365,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 3.0197906494140625,
+      "step": 468
+    },
+    {
+      "epoch": 0.04126095565476048,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.005,
+      "loss": 3.0200116634368896,
+      "step": 470
+    },
+    {
+      "epoch": 0.04143653418946158,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.005,
+      "loss": 3.0184497833251953,
+      "step": 472
+    },
+    {
+      "epoch": 0.041612112724162695,
+      "grad_norm": 0.1552734375,
+      "learning_rate": 0.005,
+      "loss": 3.014349937438965,
+      "step": 474
+    },
+    {
+      "epoch": 0.0417876912588638,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.005,
+      "loss": 2.9996232986450195,
+      "step": 476
+    },
+    {
+      "epoch": 0.04196326979356491,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.005,
+      "loss": 2.9938416481018066,
+      "step": 478
+    },
+    {
+      "epoch": 0.04213884832826602,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.005,
+      "loss": 3.0089657306671143,
+      "step": 480
+    },
+    {
+      "epoch": 0.042314426862967124,
+      "grad_norm": 0.1220703125,
+      "learning_rate": 0.005,
+      "loss": 3.0019686222076416,
+      "step": 482
+    },
+    {
+      "epoch": 0.042490005397668236,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.005,
+      "loss": 3.0172030925750732,
+      "step": 484
+    },
+    {
+      "epoch": 0.04266558393236934,
+      "grad_norm": 0.1357421875,
+      "learning_rate": 0.005,
+      "loss": 3.0507822036743164,
+      "step": 486
+    },
+    {
+      "epoch": 0.042841162467070454,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 3.0241496562957764,
+      "step": 488
+    },
+    {
+      "epoch": 0.04301674100177156,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.005,
+      "loss": 3.0176684856414795,
+      "step": 490
+    },
+    {
+      "epoch": 0.04319231953647267,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.005,
+      "loss": 2.978384256362915,
+      "step": 492
+    },
+    {
+      "epoch": 0.04336789807117378,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.005,
+      "loss": 3.0217716693878174,
+      "step": 494
+    },
+    {
+      "epoch": 0.04354347660587488,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.005,
+      "loss": 3.0396456718444824,
+      "step": 496
+    },
+    {
+      "epoch": 0.043719055140575995,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.005,
+      "loss": 2.992241621017456,
+      "step": 498
+    },
+    {
+      "epoch": 0.0438946336752771,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.005,
+      "loss": 3.012587070465088,
+      "step": 500
+    },
+    {
+      "epoch": 0.04407021220997821,
+      "grad_norm": 0.15234375,
+      "learning_rate": 0.005,
+      "loss": 2.994797945022583,
+      "step": 502
+    },
+    {
+      "epoch": 0.04424579074467932,
+      "grad_norm": 0.15625,
+      "learning_rate": 0.005,
+      "loss": 2.9956095218658447,
+      "step": 504
+    },
+    {
+      "epoch": 0.044421369279380424,
+      "grad_norm": 0.12353515625,
+      "learning_rate": 0.005,
+      "loss": 2.9908454418182373,
+      "step": 506
+    },
+    {
+      "epoch": 0.044596947814081536,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.005,
+      "loss": 2.9612441062927246,
+      "step": 508
+    },
+    {
+      "epoch": 0.04477252634878264,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.005,
+      "loss": 2.9770443439483643,
+      "step": 510
+    },
+    {
+      "epoch": 0.044948104883483754,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 0.005,
+      "loss": 3.000307083129883,
+      "step": 512
+    },
+    {
+      "epoch": 0.04512368341818486,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.005,
+      "loss": 3.000415086746216,
+      "step": 514
+    },
+    {
+      "epoch": 0.04529926195288597,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.005,
+      "loss": 2.9415054321289062,
+      "step": 516
+    },
+    {
+      "epoch": 0.04547484048758708,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.005,
+      "loss": 2.9683773517608643,
+      "step": 518
+    },
+    {
+      "epoch": 0.04565041902228818,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.005,
+      "loss": 2.9754841327667236,
+      "step": 520
+    },
+    {
+      "epoch": 0.045825997556989295,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.005,
+      "loss": 2.9868385791778564,
+      "step": 522
+    },
+    {
+      "epoch": 0.0460015760916904,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 2.96690034866333,
+      "step": 524
+    },
+    {
+      "epoch": 0.04617715462639151,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 2.970330238342285,
+      "step": 526
+    },
+    {
+      "epoch": 0.04635273316109262,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.005,
+      "loss": 2.960998058319092,
+      "step": 528
+    },
+    {
+      "epoch": 0.04652831169579373,
+      "grad_norm": 0.1943359375,
+      "learning_rate": 0.005,
+      "loss": 2.980581045150757,
+      "step": 530
+    },
+    {
+      "epoch": 0.046703890230494836,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.005,
+      "loss": 2.950021743774414,
+      "step": 532
+    },
+    {
+      "epoch": 0.04687946876519594,
+      "grad_norm": 0.142578125,
+      "learning_rate": 0.005,
+      "loss": 2.9638140201568604,
+      "step": 534
+    },
+    {
+      "epoch": 0.047055047299897054,
+      "grad_norm": 0.12158203125,
+      "learning_rate": 0.005,
+      "loss": 2.9714972972869873,
+      "step": 536
+    },
+    {
+      "epoch": 0.04723062583459816,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.005,
+      "loss": 2.9336445331573486,
+      "step": 538
+    },
+    {
+      "epoch": 0.04740620436929927,
+      "grad_norm": 0.09814453125,
+      "learning_rate": 0.005,
+      "loss": 2.9828338623046875,
+      "step": 540
+    },
+    {
+      "epoch": 0.04758178290400038,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.005,
+      "loss": 2.9587650299072266,
+      "step": 542
+    },
+    {
+      "epoch": 0.04775736143870149,
+      "grad_norm": 0.111328125,
+      "learning_rate": 0.005,
+      "loss": 2.977285385131836,
+      "step": 544
+    },
+    {
+      "epoch": 0.047932939973402595,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.005,
+      "loss": 2.9418492317199707,
+      "step": 546
+    },
+    {
+      "epoch": 0.0481085185081037,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 2.9256904125213623,
+      "step": 548
+    },
+    {
+      "epoch": 0.04828409704280481,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 2.945709705352783,
+      "step": 550
+    },
+    {
+      "epoch": 0.04845967557750592,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.005,
+      "loss": 2.9624128341674805,
+      "step": 552
+    },
+    {
+      "epoch": 0.04863525411220703,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.005,
+      "loss": 2.9454598426818848,
+      "step": 554
+    },
+    {
+      "epoch": 0.04881083264690814,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.005,
+      "loss": 2.925313949584961,
+      "step": 556
+    },
+    {
+      "epoch": 0.04898641118160925,
+      "grad_norm": 0.11083984375,
+      "learning_rate": 0.005,
+      "loss": 2.933090925216675,
+      "step": 558
+    },
+    {
+      "epoch": 0.049161989716310354,
+      "grad_norm": 0.1005859375,
+      "learning_rate": 0.005,
+      "loss": 2.94028639793396,
+      "step": 560
+    },
+    {
+      "epoch": 0.04933756825101146,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.005,
+      "loss": 2.9429543018341064,
+      "step": 562
+    },
+    {
+      "epoch": 0.04951314678571257,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.005,
+      "loss": 2.919218063354492,
+      "step": 564
+    },
+    {
+      "epoch": 0.04968872532041368,
+      "grad_norm": 0.11669921875,
+      "learning_rate": 0.005,
+      "loss": 2.9045145511627197,
+      "step": 566
+    },
+    {
+      "epoch": 0.04986430385511479,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.005,
+      "loss": 2.941321849822998,
+      "step": 568
+    },
+    {
+      "epoch": 0.050039882389815896,
+      "grad_norm": 0.125,
+      "learning_rate": 0.005,
+      "loss": 2.9178407192230225,
+      "step": 570
+    },
+    {
+      "epoch": 0.05021546092451701,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.005,
+      "loss": 2.8982131481170654,
+      "step": 572
+    },
+    {
+      "epoch": 0.050391039459218114,
+      "grad_norm": 0.115234375,
+      "learning_rate": 0.005,
+      "loss": 2.926523447036743,
+      "step": 574
+    },
+    {
+      "epoch": 0.05056661799391922,
+      "grad_norm": 0.11083984375,
+      "learning_rate": 0.005,
+      "loss": 2.936190366744995,
+      "step": 576
+    },
+    {
+      "epoch": 0.05074219652862033,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.005,
+      "loss": 2.920558214187622,
+      "step": 578
+    },
+    {
+      "epoch": 0.05091777506332144,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 0.005,
+      "loss": 2.8939120769500732,
+      "step": 580
+    },
+    {
+      "epoch": 0.05109335359802255,
+      "grad_norm": 0.154296875,
+      "learning_rate": 0.005,
+      "loss": 2.9341375827789307,
+      "step": 582
+    },
+    {
+      "epoch": 0.051268932132723655,
+      "grad_norm": 0.1416015625,
+      "learning_rate": 0.005,
+      "loss": 2.9142417907714844,
+      "step": 584
+    },
+    {
+      "epoch": 0.05144451066742476,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.005,
+      "loss": 2.923351526260376,
+      "step": 586
+    },
+    {
+      "epoch": 0.05162008920212587,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.005,
+      "loss": 2.923654794692993,
+      "step": 588
+    },
+    {
+      "epoch": 0.05179566773682698,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.005,
+      "loss": 2.9181389808654785,
+      "step": 590
+    },
+    {
+      "epoch": 0.05197124627152809,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.005,
+      "loss": 2.8995134830474854,
+      "step": 592
+    },
+    {
+      "epoch": 0.052146824806229196,
+      "grad_norm": 0.1103515625,
+      "learning_rate": 0.005,
+      "loss": 2.914994716644287,
+      "step": 594
+    },
+    {
+      "epoch": 0.05232240334093031,
+      "grad_norm": 0.1220703125,
+      "learning_rate": 0.005,
+      "loss": 2.9268240928649902,
+      "step": 596
+    },
+    {
+      "epoch": 0.052497981875631414,
+      "grad_norm": 0.11083984375,
+      "learning_rate": 0.005,
+      "loss": 2.901998281478882,
+      "step": 598
+    },
+    {
+      "epoch": 0.05267356041033252,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.005,
+      "loss": 2.892770528793335,
+      "step": 600
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 1.0142157599816058e+18,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null