Ba2han committed 21 days ago

Commit d4bd2e0 (1 parent: 850e8e2)

Training in progress, step 900, checkpoint

Files changed (1)

last-checkpoint/trainer_state.json +1053 -3

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.05267356041033252,
   "eval_steps": 957,
-  "global_step": 600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -2108,6 +2108,1056 @@
       "learning_rate": 0.005,
       "loss": 2.892770528793335,
       "step": 600
     }
   ],
   "logging_steps": 2,
@@ -2127,7 +3177,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.0142157599816058e+18,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
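
The header fields on the pre-commit side above give a rough sense of the run's scale. The sketch below is a back-of-the-envelope estimate, not something the commit states: it assumes `epoch` grows linearly with `global_step` and that `total_flos` is a cumulative count of floating-point operations as reported by the trainer.

```python
# Header values copied from the pre-commit side of the diff.
epoch = 0.05267356041033252         # fraction of one epoch completed
global_step = 600                   # optimizer steps taken so far
total_flos = 1.0142157599816058e18  # cumulative floating-point operations

# Assumption: epoch is proportional to global_step, so their ratio
# estimates how many optimizer steps make up one full epoch.
steps_per_epoch = global_step / epoch

# Assumption: the compute cost per step is roughly constant.
flos_per_step = total_flos / global_step

print(f"estimated steps per epoch: {steps_per_epoch:,.0f}")
print(f"estimated operations per step: {flos_per_step:.3e}")
```

Running the same arithmetic on the post-commit values (`global_step` 900) is a quick way to check that the two checkpoints belong to the same run configuration.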

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.07901034061549879,
   "eval_steps": 957,
+  "global_step": 900,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "learning_rate": 0.005,
       "loss": 2.892770528793335,
       "step": 600
+    },
+    {
+      "epoch": 0.05284913894503363,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.005,
+      "loss": 2.9107680320739746,
+      "step": 602
+    },
+    {
+      "epoch": 0.05302471747973474,
+      "grad_norm": 0.1728515625,
+      "learning_rate": 0.005,
+      "loss": 2.940669298171997,
+      "step": 604
+    },
+    {
+      "epoch": 0.05320029601443585,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.005,
+      "loss": 2.8937621116638184,
+      "step": 606
+    },
+    {
+      "epoch": 0.053375874549136955,
+      "grad_norm": 0.1171875,
+      "learning_rate": 0.005,
+      "loss": 2.9157919883728027,
+      "step": 608
+    },
+    {
+      "epoch": 0.05355145308383807,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 2.9084694385528564,
+      "step": 610
+    },
+    {
+      "epoch": 0.05372703161853917,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.005,
+      "loss": 2.89589786529541,
+      "step": 612
+    },
+    {
+      "epoch": 0.05390261015324028,
+      "grad_norm": 0.10546875,
+      "learning_rate": 0.005,
+      "loss": 2.866093397140503,
+      "step": 614
+    },
+    {
+      "epoch": 0.05407818868794139,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.005,
+      "loss": 2.8685555458068848,
+      "step": 616
+    },
+    {
+      "epoch": 0.054253767222642496,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.005,
+      "loss": 2.934528112411499,
+      "step": 618
+    },
+    {
+      "epoch": 0.05442934575734361,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.005,
+      "loss": 2.915015459060669,
+      "step": 620
+    },
+    {
+      "epoch": 0.054604924292044714,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.005,
+      "loss": 2.930428981781006,
+      "step": 622
+    },
+    {
+      "epoch": 0.054780502826745826,
+      "grad_norm": 0.1455078125,
+      "learning_rate": 0.005,
+      "loss": 2.885199546813965,
+      "step": 624
+    },
+    {
+      "epoch": 0.05495608136144693,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 2.9132332801818848,
+      "step": 626
+    },
+    {
+      "epoch": 0.05513165989614804,
+      "grad_norm": 0.1220703125,
+      "learning_rate": 0.005,
+      "loss": 2.8978776931762695,
+      "step": 628
+    },
+    {
+      "epoch": 0.05530723843084915,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.005,
+      "loss": 2.8665482997894287,
+      "step": 630
+    },
+    {
+      "epoch": 0.055482816965550255,
+      "grad_norm": 0.1279296875,
+      "learning_rate": 0.005,
+      "loss": 2.8479433059692383,
+      "step": 632
+    },
+    {
+      "epoch": 0.05565839550025137,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.005,
+      "loss": 2.902829647064209,
+      "step": 634
+    },
+    {
+      "epoch": 0.05583397403495247,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.005,
+      "loss": 2.8622169494628906,
+      "step": 636
+    },
+    {
+      "epoch": 0.056009552569653585,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 2.8923892974853516,
+      "step": 638
+    },
+    {
+      "epoch": 0.05618513110435469,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.005,
+      "loss": 2.8786332607269287,
+      "step": 640
+    },
+    {
+      "epoch": 0.056360709639055796,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.005,
+      "loss": 2.8552439212799072,
+      "step": 642
+    },
+    {
+      "epoch": 0.05653628817375691,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.005,
+      "loss": 2.848996877670288,
+      "step": 644
+    },
+    {
+      "epoch": 0.056711866708458014,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.005,
+      "loss": 2.9026734828948975,
+      "step": 646
+    },
+    {
+      "epoch": 0.05688744524315913,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.005,
+      "loss": 2.907194137573242,
+      "step": 648
+    },
+    {
+      "epoch": 0.05706302377786023,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.005,
+      "loss": 2.885383129119873,
+      "step": 650
+    },
+    {
+      "epoch": 0.057238602312561344,
+      "grad_norm": 0.12353515625,
+      "learning_rate": 0.005,
+      "loss": 2.8656253814697266,
+      "step": 652
+    },
+    {
+      "epoch": 0.05741418084726245,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.005,
+      "loss": 2.873291254043579,
+      "step": 654
+    },
+    {
+      "epoch": 0.057589759381963555,
+      "grad_norm": 0.10546875,
+      "learning_rate": 0.005,
+      "loss": 2.8469748497009277,
+      "step": 656
+    },
+    {
+      "epoch": 0.05776533791666467,
+      "grad_norm": 0.1162109375,
+      "learning_rate": 0.005,
+      "loss": 2.8983521461486816,
+      "step": 658
+    },
+    {
+      "epoch": 0.05794091645136577,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.005,
+      "loss": 2.8409366607666016,
+      "step": 660
+    },
+    {
+      "epoch": 0.058116494986066886,
+      "grad_norm": 0.115234375,
+      "learning_rate": 0.005,
+      "loss": 2.9008703231811523,
+      "step": 662
+    },
+    {
+      "epoch": 0.05829207352076799,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.005,
+      "loss": 2.853753089904785,
+      "step": 664
+    },
+    {
+      "epoch": 0.058467652055469096,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.005,
+      "loss": 2.8707268238067627,
+      "step": 666
+    },
+    {
+      "epoch": 0.05864323059017021,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.005,
+      "loss": 2.889103889465332,
+      "step": 668
+    },
+    {
+      "epoch": 0.058818809124871314,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.8374733924865723,
+      "step": 670
+    },
+    {
+      "epoch": 0.05899438765957243,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.005,
+      "loss": 2.8331263065338135,
+      "step": 672
+    },
+    {
+      "epoch": 0.05916996619427353,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.005,
+      "loss": 2.880403757095337,
+      "step": 674
+    },
+    {
+      "epoch": 0.059345544728974645,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.005,
+      "loss": 2.8748178482055664,
+      "step": 676
+    },
+    {
+      "epoch": 0.05952112326367575,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.005,
+      "loss": 2.8325412273406982,
+      "step": 678
+    },
+    {
+      "epoch": 0.059696701798376856,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.005,
+      "loss": 2.826225519180298,
+      "step": 680
+    },
+    {
+      "epoch": 0.05987228033307797,
+      "grad_norm": 0.11767578125,
+      "learning_rate": 0.005,
+      "loss": 2.858710527420044,
+      "step": 682
+    },
+    {
+      "epoch": 0.06004785886777907,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.005,
+      "loss": 2.8498644828796387,
+      "step": 684
+    },
+    {
+      "epoch": 0.060223437402480186,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.005,
+      "loss": 2.8439319133758545,
+      "step": 686
+    },
+    {
+      "epoch": 0.06039901593718129,
+      "grad_norm": 0.0986328125,
+      "learning_rate": 0.005,
+      "loss": 2.816505193710327,
+      "step": 688
+    },
+    {
+      "epoch": 0.060574594471882404,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.005,
+      "loss": 2.8512182235717773,
+      "step": 690
+    },
+    {
+      "epoch": 0.06075017300658351,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.005,
+      "loss": 2.8385231494903564,
+      "step": 692
+    },
+    {
+      "epoch": 0.060925751541284615,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.005,
+      "loss": 2.8305373191833496,
+      "step": 694
+    },
+    {
+      "epoch": 0.06110133007598573,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.005,
+      "loss": 2.817195415496826,
+      "step": 696
+    },
+    {
+      "epoch": 0.06127690861068683,
+      "grad_norm": 0.10888671875,
+      "learning_rate": 0.005,
+      "loss": 2.869903087615967,
+      "step": 698
+    },
+    {
+      "epoch": 0.061452487145387945,
+      "grad_norm": 0.14453125,
+      "learning_rate": 0.005,
+      "loss": 2.856109619140625,
+      "step": 700
+    },
+    {
+      "epoch": 0.06162806568008905,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.005,
+      "loss": 2.8376383781433105,
+      "step": 702
+    },
+    {
+      "epoch": 0.06180364421479016,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.005,
+      "loss": 2.7832024097442627,
+      "step": 704
+    },
+    {
+      "epoch": 0.06197922274949127,
+      "grad_norm": 0.1220703125,
+      "learning_rate": 0.005,
+      "loss": 2.8388211727142334,
+      "step": 706
+    },
+    {
+      "epoch": 0.062154801284192374,
+      "grad_norm": 0.107421875,
+      "learning_rate": 0.005,
+      "loss": 2.8409483432769775,
+      "step": 708
+    },
+    {
+      "epoch": 0.062330379818893486,
+      "grad_norm": 0.09912109375,
+      "learning_rate": 0.005,
+      "loss": 2.8188676834106445,
+      "step": 710
+    },
+    {
+      "epoch": 0.0625059583535946,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.005,
+      "loss": 2.8468964099884033,
+      "step": 712
+    },
+    {
+      "epoch": 0.0626815368882957,
+      "grad_norm": 0.10302734375,
+      "learning_rate": 0.005,
+      "loss": 2.798382520675659,
+      "step": 714
+    },
+    {
+      "epoch": 0.06285711542299681,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.005,
+      "loss": 2.8014094829559326,
+      "step": 716
+    },
+    {
+      "epoch": 0.06303269395769792,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.005,
+      "loss": 2.8606605529785156,
+      "step": 718
+    },
+    {
+      "epoch": 0.06320827249239902,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.005,
+      "loss": 2.8185365200042725,
+      "step": 720
+    },
+    {
+      "epoch": 0.06338385102710013,
+      "grad_norm": 0.12109375,
+      "learning_rate": 0.005,
+      "loss": 2.79809832572937,
+      "step": 722
+    },
+    {
+      "epoch": 0.06355942956180125,
+      "grad_norm": 0.12890625,
+      "learning_rate": 0.005,
+      "loss": 2.829028606414795,
+      "step": 724
+    },
+    {
+      "epoch": 0.06373500809650236,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.775477886199951,
+      "step": 726
+    },
+    {
+      "epoch": 0.06391058663120346,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.005,
+      "loss": 2.806027412414551,
+      "step": 728
+    },
+    {
+      "epoch": 0.06408616516590457,
+      "grad_norm": 0.10498046875,
+      "learning_rate": 0.005,
+      "loss": 2.8189749717712402,
+      "step": 730
+    },
+    {
+      "epoch": 0.06426174370060568,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.005,
+      "loss": 2.809892416000366,
+      "step": 732
+    },
+    {
+      "epoch": 0.06443732223530678,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.005,
+      "loss": 2.8056528568267822,
+      "step": 734
+    },
+    {
+      "epoch": 0.06461290077000789,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.005,
+      "loss": 2.851624011993408,
+      "step": 736
+    },
+    {
+      "epoch": 0.064788479304709,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.005,
+      "loss": 2.839448928833008,
+      "step": 738
+    },
+    {
+      "epoch": 0.06496405783941012,
+      "grad_norm": 0.09326171875,
+      "learning_rate": 0.005,
+      "loss": 2.832307815551758,
+      "step": 740
+    },
+    {
+      "epoch": 0.06513963637411121,
+      "grad_norm": 0.10498046875,
+      "learning_rate": 0.005,
+      "loss": 2.816222667694092,
+      "step": 742
+    },
+    {
+      "epoch": 0.06531521490881233,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.005,
+      "loss": 2.814714193344116,
+      "step": 744
+    },
+    {
+      "epoch": 0.06549079344351344,
+      "grad_norm": 0.10302734375,
+      "learning_rate": 0.005,
+      "loss": 2.8213393688201904,
+      "step": 746
+    },
+    {
+      "epoch": 0.06566637197821454,
+      "grad_norm": 0.12451171875,
+      "learning_rate": 0.005,
+      "loss": 2.838747024536133,
+      "step": 748
+    },
+    {
+      "epoch": 0.06584195051291565,
+      "grad_norm": 0.11669921875,
+      "learning_rate": 0.005,
+      "loss": 2.8111016750335693,
+      "step": 750
+    },
+    {
+      "epoch": 0.06601752904761676,
+      "grad_norm": 0.1357421875,
+      "learning_rate": 0.005,
+      "loss": 2.8226990699768066,
+      "step": 752
+    },
+    {
+      "epoch": 0.06619310758231788,
+      "grad_norm": 0.12060546875,
+      "learning_rate": 0.005,
+      "loss": 2.8317198753356934,
+      "step": 754
+    },
+    {
+      "epoch": 0.06636868611701897,
+      "grad_norm": 0.1015625,
+      "learning_rate": 0.005,
+      "loss": 2.795905113220215,
+      "step": 756
+    },
+    {
+      "epoch": 0.06654426465172009,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.005,
+      "loss": 2.8132882118225098,
+      "step": 758
+    },
+    {
+      "epoch": 0.0667198431864212,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.005,
+      "loss": 2.8165059089660645,
+      "step": 760
+    },
+    {
+      "epoch": 0.0668954217211223,
+      "grad_norm": 0.09228515625,
+      "learning_rate": 0.005,
+      "loss": 2.8166255950927734,
+      "step": 762
+    },
+    {
+      "epoch": 0.06707100025582341,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.005,
+      "loss": 2.822350025177002,
+      "step": 764
+    },
+    {
+      "epoch": 0.06724657879052452,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.005,
+      "loss": 2.8084328174591064,
+      "step": 766
+    },
+    {
+      "epoch": 0.06742215732522563,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.005,
+      "loss": 2.803453207015991,
+      "step": 768
+    },
+    {
+      "epoch": 0.06759773585992673,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.005,
+      "loss": 2.7944207191467285,
+      "step": 770
+    },
+    {
+      "epoch": 0.06777331439462785,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.005,
+      "loss": 2.778062582015991,
+      "step": 772
+    },
+    {
+      "epoch": 0.06794889292932896,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.005,
+      "loss": 2.7989413738250732,
+      "step": 774
+    },
+    {
+      "epoch": 0.06812447146403006,
+      "grad_norm": 0.12353515625,
+      "learning_rate": 0.005,
+      "loss": 2.784641742706299,
+      "step": 776
+    },
+    {
+      "epoch": 0.06830004999873117,
+      "grad_norm": 0.1103515625,
+      "learning_rate": 0.005,
+      "loss": 2.8191514015197754,
+      "step": 778
+    },
+    {
+      "epoch": 0.06847562853343228,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.005,
+      "loss": 2.793182373046875,
+      "step": 780
+    },
+    {
+      "epoch": 0.0686512070681334,
+      "grad_norm": 0.1201171875,
+      "learning_rate": 0.005,
+      "loss": 2.7880682945251465,
+      "step": 782
+    },
+    {
+      "epoch": 0.06882678560283449,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.005,
+      "loss": 2.8155131340026855,
+      "step": 784
+    },
+    {
+      "epoch": 0.0690023641375356,
+      "grad_norm": 0.1083984375,
+      "learning_rate": 0.005,
+      "loss": 2.8056650161743164,
+      "step": 786
+    },
+    {
+      "epoch": 0.06917794267223672,
+      "grad_norm": 0.1103515625,
+      "learning_rate": 0.005,
+      "loss": 2.794665575027466,
+      "step": 788
+    },
+    {
+      "epoch": 0.06935352120693782,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.005,
+      "loss": 2.8043224811553955,
+      "step": 790
+    },
+    {
+      "epoch": 0.06952909974163893,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.7930874824523926,
+      "step": 792
+    },
+    {
+      "epoch": 0.06970467827634004,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.005,
+      "loss": 2.7990355491638184,
+      "step": 794
+    },
+    {
+      "epoch": 0.06988025681104115,
+      "grad_norm": 0.0947265625,
+      "learning_rate": 0.005,
+      "loss": 2.7672834396362305,
+      "step": 796
+    },
+    {
+      "epoch": 0.07005583534574225,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.8196029663085938,
+      "step": 798
+    },
+    {
+      "epoch": 0.07023141388044336,
+      "grad_norm": 0.08837890625,
+      "learning_rate": 0.005,
+      "loss": 2.78668475151062,
+      "step": 800
+    },
+    {
+      "epoch": 0.07040699241514448,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.005,
+      "loss": 2.755502462387085,
+      "step": 802
+    },
+    {
+      "epoch": 0.07058257094984557,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.005,
+      "loss": 2.756058692932129,
+      "step": 804
+    },
+    {
+      "epoch": 0.07075814948454669,
+      "grad_norm": 0.0927734375,
+      "learning_rate": 0.005,
+      "loss": 2.7573466300964355,
+      "step": 806
+    },
+    {
+      "epoch": 0.0709337280192478,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.005,
+      "loss": 2.804060220718384,
+      "step": 808
+    },
+    {
+      "epoch": 0.07110930655394891,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.005,
+      "loss": 2.8130621910095215,
+      "step": 810
+    },
+    {
+      "epoch": 0.07128488508865001,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.005,
+      "loss": 2.744436264038086,
+      "step": 812
+    },
+    {
+      "epoch": 0.07146046362335112,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.005,
+      "loss": 2.813497304916382,
+      "step": 814
+    },
+    {
+      "epoch": 0.07163604215805223,
+      "grad_norm": 0.11669921875,
+      "learning_rate": 0.005,
+      "loss": 2.8008008003234863,
+      "step": 816
+    },
+    {
+      "epoch": 0.07181162069275333,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.005,
+      "loss": 2.7890126705169678,
+      "step": 818
+    },
+    {
+      "epoch": 0.07198719922745445,
+      "grad_norm": 0.119140625,
+      "learning_rate": 0.005,
+      "loss": 2.784027576446533,
+      "step": 820
+    },
+    {
+      "epoch": 0.07216277776215556,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.005,
+      "loss": 2.790325880050659,
+      "step": 822
+    },
+    {
+      "epoch": 0.07233835629685667,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.005,
+      "loss": 2.780329704284668,
+      "step": 824
+    },
+    {
+      "epoch": 0.07251393483155777,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.005,
+      "loss": 2.7846665382385254,
+      "step": 826
+    },
+    {
+      "epoch": 0.07268951336625888,
+      "grad_norm": 0.1171875,
+      "learning_rate": 0.005,
+      "loss": 2.786909818649292,
+      "step": 828
+    },
+    {
+      "epoch": 0.07286509190096,
+      "grad_norm": 0.11083984375,
+      "learning_rate": 0.005,
+      "loss": 2.808149814605713,
+      "step": 830
+    },
+    {
+      "epoch": 0.07304067043566109,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.005,
+      "loss": 2.758054733276367,
+      "step": 832
+    },
+    {
+      "epoch": 0.0732162489703622,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.8043644428253174,
+      "step": 834
+    },
+    {
+      "epoch": 0.07339182750506332,
+      "grad_norm": 0.0947265625,
+      "learning_rate": 0.005,
+      "loss": 2.7352335453033447,
+      "step": 836
+    },
+    {
+      "epoch": 0.07356740603976443,
+      "grad_norm": 0.11328125,
+      "learning_rate": 0.005,
+      "loss": 2.762988567352295,
+      "step": 838
+    },
+    {
+      "epoch": 0.07374298457446553,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.005,
+      "loss": 2.7505500316619873,
+      "step": 840
+    },
+    {
+      "epoch": 0.07391856310916664,
+      "grad_norm": 0.08837890625,
+      "learning_rate": 0.005,
+      "loss": 2.756007194519043,
+      "step": 842
+    },
+    {
+      "epoch": 0.07409414164386775,
+      "grad_norm": 0.08203125,
+      "learning_rate": 0.005,
+      "loss": 2.740776538848877,
+      "step": 844
+    },
+    {
+      "epoch": 0.07426972017856885,
+      "grad_norm": 0.103515625,
+      "learning_rate": 0.005,
+      "loss": 2.766468048095703,
+      "step": 846
+    },
+    {
+      "epoch": 0.07444529871326996,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.005,
+      "loss": 2.768131971359253,
+      "step": 848
+    },
+    {
+      "epoch": 0.07462087724797108,
+      "grad_norm": 0.09326171875,
+      "learning_rate": 0.005,
+      "loss": 2.7882707118988037,
+      "step": 850
+    },
+    {
+      "epoch": 0.07479645578267219,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.005,
+      "loss": 2.759927988052368,
+      "step": 852
+    },
+    {
+      "epoch": 0.07497203431737329,
+      "grad_norm": 0.09814453125,
+      "learning_rate": 0.005,
+      "loss": 2.772904634475708,
+      "step": 854
+    },
+    {
+      "epoch": 0.0751476128520744,
+      "grad_norm": 0.11279296875,
+      "learning_rate": 0.005,
+      "loss": 2.7328786849975586,
+      "step": 856
+    },
+    {
+      "epoch": 0.07532319138677551,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.005,
+      "loss": 2.789806842803955,
+      "step": 858
+    },
+    {
+      "epoch": 0.07549876992147661,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.005,
+      "loss": 2.771458148956299,
+      "step": 860
+    },
+    {
+      "epoch": 0.07567434845617772,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.005,
+      "loss": 2.7628495693206787,
+      "step": 862
+    },
+    {
+      "epoch": 0.07584992699087884,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.005,
+      "loss": 2.770573377609253,
+      "step": 864
+    },
+    {
+      "epoch": 0.07602550552557993,
+      "grad_norm": 0.09228515625,
+      "learning_rate": 0.005,
+      "loss": 2.793875217437744,
+      "step": 866
+    },
+    {
+      "epoch": 0.07620108406028105,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.005,
+      "loss": 2.7835752964019775,
+      "step": 868
+    },
+    {
+      "epoch": 0.07637666259498216,
+      "grad_norm": 0.08544921875,
+      "learning_rate": 0.005,
+      "loss": 2.775233745574951,
+      "step": 870
+    },
+    {
+      "epoch": 0.07655224112968327,
+      "grad_norm": 0.08544921875,
+      "learning_rate": 0.005,
+      "loss": 2.7418532371520996,
+      "step": 872
+    },
+    {
+      "epoch": 0.07672781966438437,
+      "grad_norm": 0.08740234375,
+      "learning_rate": 0.005,
+      "loss": 2.754110097885132,
+      "step": 874
+    },
+    {
+      "epoch": 0.07690339819908548,
+      "grad_norm": 0.0810546875,
+      "learning_rate": 0.005,
+      "loss": 2.773123264312744,
+      "step": 876
+    },
+    {
+      "epoch": 0.0770789767337866,
+      "grad_norm": 0.083984375,
+      "learning_rate": 0.005,
+      "loss": 2.750389337539673,
+      "step": 878
+    },
+    {
+      "epoch": 0.07725455526848769,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.005,
+      "loss": 2.733523368835449,
+      "step": 880
+    },
+    {
+      "epoch": 0.0774301338031888,
+      "grad_norm": 0.08984375,
+      "learning_rate": 0.005,
+      "loss": 2.7816543579101562,
+      "step": 882
+    },
+    {
+      "epoch": 0.07760571233788992,
+      "grad_norm": 0.0947265625,
+      "learning_rate": 0.005,
+      "loss": 2.7417476177215576,
+      "step": 884
+    },
+    {
+      "epoch": 0.07778129087259103,
+      "grad_norm": 0.10302734375,
+      "learning_rate": 0.005,
+      "loss": 2.7590861320495605,
+      "step": 886
+    },
+    {
+      "epoch": 0.07795686940729213,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.005,
+      "loss": 2.74955677986145,
+      "step": 888
+    },
+    {
+      "epoch": 0.07813244794199324,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.005,
+      "loss": 2.7264111042022705,
+      "step": 890
+    },
+    {
+      "epoch": 0.07830802647669435,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.005,
+      "loss": 2.7831571102142334,
+      "step": 892
+    },
+    {
+      "epoch": 0.07848360501139545,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.777005672454834,
+      "step": 894
+    },
+    {
+      "epoch": 0.07865918354609656,
+      "grad_norm": 0.09716796875,
+      "learning_rate": 0.005,
+      "loss": 2.759586811065674,
+      "step": 896
+    },
+    {
+      "epoch": 0.07883476208079768,
+      "grad_norm": 0.083984375,
+      "learning_rate": 0.005,
+      "loss": 2.750412940979004,
+      "step": 898
+    },
+    {
+      "epoch": 0.07901034061549879,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.005,
+      "loss": 2.7232556343078613,
+      "step": 900
     }
   ],
   "logging_steps": 2,
       "attributes": {}
     }
   },
+  "total_flos": 1.5213476081262559e+18,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
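
The entries added in this commit show the loss drifting down from about 2.91 to about 2.72 at a constant learning rate of 0.005. A minimal sketch of how that trend could be summarized is given below. The helper names `load_log_history` and `loss_drop` are hypothetical, and the `"log_history"` key is the field Transformers normally uses in `trainer_state.json`; it does not appear in the diff hunks themselves. The sample data is six entries copied from the diff.

```python
import json
from statistics import mean


def load_log_history(path):
    """Read the "log_history" list from a trainer_state.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


def loss_drop(log_history, window=3):
    """Mean loss of the first `window` entries minus the last `window`."""
    ordered = sorted(log_history, key=lambda entry: entry["step"])
    # Skip entries without a training loss (e.g. evaluation records).
    losses = [entry["loss"] for entry in ordered if "loss" in entry]
    return mean(losses[:window]) - mean(losses[-window:])


# Six entries copied from the diff (steps 602-606 and 896-900).
sample = [
    {"step": 602, "loss": 2.9107680320739746},
    {"step": 604, "loss": 2.940669298171997},
    {"step": 606, "loss": 2.8937621116638184},
    {"step": 896, "loss": 2.759586811065674},
    {"step": 898, "loss": 2.750412940979004},
    {"step": 900, "loss": 2.7232556343078613},
]

drop = loss_drop(sample)
print(f"mean loss drop across the sample: {drop:.4f}")
```

To run it on the real checkpoint, pass `load_log_history("last-checkpoint/trainer_state.json")` to `loss_drop`, with the path adjusted to wherever the repository is checked out. A larger `window` smooths out the step-to-step noise visible in the log.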