AiAF committed on Mar 30

Commit ed45871 (verified) · 1 parent: 9a07e06

Training in progress, step 850, checkpoint

Files changed (6)

last-checkpoint/adapter_model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/tokens_state.json +1 -1
last-checkpoint/trainer_state.json +715 -3

last-checkpoint/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cd804fe5a6a07ca92c0d9df3ee8901a99a952af466c85b5d67804f3b9b5754fc
+oid sha256:ad29cff1b863587cbb2ca948354cb20133cf91efb3ab95cc9e09274cb6bcac5b
 size 102264160

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cc0bed6cff1a4618fb4cd1381e691366f8ad28f8182c56da1f0df2fb19366078
+oid sha256:99e35caf9f22b9501f8794b7071db015d9c5f1cc2081e5e6b308b86d01258be1
 size 52162827

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9f05bb1ddd76152fd645931407e88adee7bc96ff7799e0d5b2faef63c077f8ed
+oid sha256:e394ddf37d3569e21dd7164d17df1486101a840dc12b8080abbcaca06573e244
 size 14645

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4c0f6da37afd2d18fa5e85c27927c29b3e2c21ee39c49983ca41ec400e0b2cd5
+oid sha256:b233ed6e5d634209b3aa9991eded2c9aa4b12fa1b2fb73e19124dd488ff69f21
 size 1465
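
The four binary files above are tracked as Git LFS pointers, so the diff shows only a new `oid` for each while `size` stays the same. A minimal sketch of reading such a pointer, using the adapter_model.safetensors values from this commit; the helper name `parse_lfs_pointer` is hypothetical and only covers the three keys that appear here:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one 'key value' pair per line)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Old and new pointers for last-checkpoint/adapter_model.safetensors.
old_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:cd804fe5a6a07ca92c0d9df3ee8901a99a952af466c85b5d67804f3b9b5754fc\n"
    "size 102264160\n"
)
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ad29cff1b863587cbb2ca948354cb20133cf91efb3ab95cc9e09274cb6bcac5b\n"
    "size 102264160\n"
)

# Same byte size, different content hash: the weights changed in place.
same_size = old_pointer["size"] == new_pointer["size"]
content_changed = old_pointer["digest"] != new_pointer["digest"]
print(same_size, content_changed)
```

An unchanged size with a changed hash is what one would expect when a checkpoint overwrites tensors of fixed shape.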

last-checkpoint/tokens_state.json CHANGED

@@ -1 +1 @@
-{"total": 11163776, "trainable": 4620168}
+{"total": 11857664, "trainable": 4907297}
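
The token counters can be differenced to see how much data the 50 steps between the two checkpoints consumed. This is a rough sketch; it assumes "trainable" counts the tokens that contribute to the loss, which the commit itself does not state:

```python
import json

# tokens_state.json before (step 800) and after (step 850) this commit.
old_state = json.loads('{"total": 11163776, "trainable": 4620168}')
new_state = json.loads('{"total": 11857664, "trainable": 4907297}')
steps = 850 - 800

# Tokens consumed between the two checkpoints.
delta = {key: new_state[key] - old_state[key] for key in new_state}
print(delta)  # {'total': 693888, 'trainable': 287129}

# Average tokens per optimizer step over this window.
per_step = {key: value / steps for key, value in delta.items()}
print(per_step)

# Share of all tokens seen so far that are counted as trainable.
trainable_share = new_state["trainable"] / new_state["total"]
print(f"{trainable_share:.3f}")
```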

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.39019631752225337,
   "eval_steps": 50,
-  "global_step": 800,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -11412,6 +11412,718 @@
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 800
     }
   ],
   "logging_steps": 1,
@@ -11431,7 +12143,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.3731959764176077e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.4145835873673942,
   "eval_steps": 50,
+  "global_step": 850,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 800
+    },
+    {
+      "epoch": 0.3906840629191562,
+      "grad_norm": 0.17632552981376648,
+      "learning_rate": 2.025571894372794e-05,
+      "loss": 2.419203281402588,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.2369,
+      "step": 801,
+      "tokens/total": 11177856,
+      "tokens/train_per_sec_per_gpu": 2251.92,
+      "tokens/trainable": 4624735
+    },
+    {
+      "epoch": 0.391171808316059,
+      "grad_norm": 0.1677611917257309,
+      "learning_rate": 2.0060712799926408e-05,
+      "loss": 2.390589475631714,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 10.91993,
+      "step": 802,
+      "tokens/total": 11191808,
+      "tokens/train_per_sec_per_gpu": 1610.5,
+      "tokens/trainable": 4629319
+    },
+    {
+      "epoch": 0.39165955371296185,
+      "grad_norm": 0.14507685601711273,
+      "learning_rate": 1.9866545181421013e-05,
+      "loss": 2.7266347408294678,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 15.28137,
+      "step": 803,
+      "tokens/total": 11205120,
+      "tokens/train_per_sec_per_gpu": 201.89,
+      "tokens/trainable": 4636143
+    },
+    {
+      "epoch": 0.3921472991098647,
+      "grad_norm": 0.14650315046310425,
+      "learning_rate": 1.967321812493813e-05,
+      "loss": 2.496601104736328,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.14116,
+      "step": 804,
+      "tokens/total": 11217920,
+      "tokens/train_per_sec_per_gpu": 1516.0,
+      "tokens/trainable": 4641943
+    },
+    {
+      "epoch": 0.39263504450676745,
+      "grad_norm": 0.2653907835483551,
+      "learning_rate": 1.9480733658387175e-05,
+      "loss": 2.582747220993042,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 13.23344,
+      "step": 805,
+      "tokens/total": 11231872,
+      "tokens/train_per_sec_per_gpu": 701.31,
+      "tokens/trainable": 4643824
+    },
+    {
+      "epoch": 0.3931227899036703,
+      "grad_norm": 0.1440833956003189,
+      "learning_rate": 1.9289093800839066e-05,
+      "loss": 2.470148801803589,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.82421,
+      "step": 806,
+      "tokens/total": 11245824,
+      "tokens/train_per_sec_per_gpu": 3396.95,
+      "tokens/trainable": 4650548
+    },
+    {
+      "epoch": 0.3936105353005731,
+      "grad_norm": 0.1409664899110794,
+      "learning_rate": 1.9098300562505266e-05,
+      "loss": 2.7668747901916504,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 15.90884,
+      "step": 807,
+      "tokens/total": 11259648,
+      "tokens/train_per_sec_per_gpu": 1139.14,
+      "tokens/trainable": 4657579
+    },
+    {
+      "epoch": 0.39409828069747593,
+      "grad_norm": 0.1468340903520584,
+      "learning_rate": 1.8908355944716517e-05,
+      "loss": 2.6180667877197266,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 13.7092,
+      "step": 808,
+      "tokens/total": 11274240,
+      "tokens/train_per_sec_per_gpu": 1312.1,
+      "tokens/trainable": 4664365
+    },
+    {
+      "epoch": 0.39458602609437876,
+      "grad_norm": 0.1398187279701233,
+      "learning_rate": 1.871926193990202e-05,
+      "loss": 2.5571842193603516,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.89944,
+      "step": 809,
+      "tokens/total": 11287296,
+      "tokens/train_per_sec_per_gpu": 1438.48,
+      "tokens/trainable": 4671448
+    },
+    {
+      "epoch": 0.39507377149128153,
+      "grad_norm": 0.13157154619693756,
+      "learning_rate": 1.8531020531568378e-05,
+      "loss": 2.4374163150787354,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.44344,
+      "step": 810,
+      "tokens/total": 11303296,
+      "tokens/train_per_sec_per_gpu": 1925.09,
+      "tokens/trainable": 4679027
+    },
+    {
+      "epoch": 0.39556151688818436,
+      "grad_norm": 0.1602177768945694,
+      "learning_rate": 1.8343633694278895e-05,
+      "loss": 2.5065877437591553,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.26301,
+      "step": 811,
+      "tokens/total": 11317120,
+      "tokens/train_per_sec_per_gpu": 2074.92,
+      "tokens/trainable": 4684195
+    },
+    {
+      "epoch": 0.3960492622850872,
+      "grad_norm": 0.17014168202877045,
+      "learning_rate": 1.8157103393632868e-05,
+      "loss": 2.4969608783721924,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.14553,
+      "step": 812,
+      "tokens/total": 11331712,
+      "tokens/train_per_sec_per_gpu": 1899.17,
+      "tokens/trainable": 4688512
+    },
+    {
+      "epoch": 0.39653700768199,
+      "grad_norm": 0.15981672704219818,
+      "learning_rate": 1.7971431586244815e-05,
+      "loss": 2.3524038791656494,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 13.86,
+      "memory/max_allocated (GiB)": 13.86,
+      "ppl": 10.51081,
+      "step": 813,
+      "tokens/total": 11344256,
+      "tokens/train_per_sec_per_gpu": 2253.56,
+      "tokens/trainable": 4693239
+    },
+    {
+      "epoch": 0.39702475307889284,
+      "grad_norm": 0.1451166570186615,
+      "learning_rate": 1.7786620219724204e-05,
+      "loss": 2.3406598567962646,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 10.38809,
+      "step": 814,
+      "tokens/total": 11359104,
+      "tokens/train_per_sec_per_gpu": 797.01,
+      "tokens/trainable": 4699549
+    },
+    {
+      "epoch": 0.3975124984757956,
+      "grad_norm": 0.184647798538208,
+      "learning_rate": 1.7602671232654754e-05,
+      "loss": 2.687480926513672,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.69461,
+      "step": 815,
+      "tokens/total": 11373568,
+      "tokens/train_per_sec_per_gpu": 1141.17,
+      "tokens/trainable": 4703613
+    },
+    {
+      "epoch": 0.39800024387269844,
+      "grad_norm": 0.1620160937309265,
+      "learning_rate": 1.741958655457436e-05,
+      "loss": 2.4154233932495117,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.19451,
+      "step": 816,
+      "tokens/total": 11385600,
+      "tokens/train_per_sec_per_gpu": 87.94,
+      "tokens/trainable": 4708168
+    },
+    {
+      "epoch": 0.3984879892696013,
+      "grad_norm": 0.15860387682914734,
+      "learning_rate": 1.723736810595461e-05,
+      "loss": 2.539144992828369,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.66883,
+      "step": 817,
+      "tokens/total": 11399680,
+      "tokens/train_per_sec_per_gpu": 1662.93,
+      "tokens/trainable": 4713327
+    },
+    {
+      "epoch": 0.3989757346665041,
+      "grad_norm": 0.14269250631332397,
+      "learning_rate": 1.7056017798180824e-05,
+      "loss": 2.400291919708252,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.02639,
+      "step": 818,
+      "tokens/total": 11415040,
+      "tokens/train_per_sec_per_gpu": 1938.98,
+      "tokens/trainable": 4720448
+    },
+    {
+      "epoch": 0.39946348006340693,
+      "grad_norm": 0.182223379611969,
+      "learning_rate": 1.6875537533531948e-05,
+      "loss": 2.5135679244995117,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.34891,
+      "step": 819,
+      "tokens/total": 11428480,
+      "tokens/train_per_sec_per_gpu": 2430.12,
+      "tokens/trainable": 4724427
+    },
+    {
+      "epoch": 0.3999512254603097,
+      "grad_norm": 0.15434423089027405,
+      "learning_rate": 1.6695929205160487e-05,
+      "loss": 2.6116271018981934,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 13.6212,
+      "step": 820,
+      "tokens/total": 11443200,
+      "tokens/train_per_sec_per_gpu": 2846.47,
+      "tokens/trainable": 4730440
+    },
+    {
+      "epoch": 0.40043897085721253,
+      "grad_norm": 0.14820340275764465,
+      "learning_rate": 1.65171946970729e-05,
+      "loss": 2.5509421825408936,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.81918,
+      "step": 821,
+      "tokens/total": 11458048,
+      "tokens/train_per_sec_per_gpu": 2417.91,
+      "tokens/trainable": 4737120
+    },
+    {
+      "epoch": 0.40092671625411536,
+      "grad_norm": 0.17228034138679504,
+      "learning_rate": 1.6339335884109518e-05,
+      "loss": 2.514219284057617,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.35696,
+      "step": 822,
+      "tokens/total": 11470848,
+      "tokens/train_per_sec_per_gpu": 1171.68,
+      "tokens/trainable": 4743584
+    },
+    {
+      "epoch": 0.4014144616510182,
+      "grad_norm": 0.14200446009635925,
+      "learning_rate": 1.6162354631925204e-05,
+      "loss": 2.7033231258392334,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 14.92926,
+      "step": 823,
+      "tokens/total": 11484160,
+      "tokens/train_per_sec_per_gpu": 1981.38,
+      "tokens/trainable": 4750111
+    },
+    {
+      "epoch": 0.401902207047921,
+      "grad_norm": 0.1785208135843277,
+      "learning_rate": 1.598625279696948e-05,
+      "loss": 2.621516704559326,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.75657,
+      "step": 824,
+      "tokens/total": 11497728,
+      "tokens/train_per_sec_per_gpu": 2868.11,
+      "tokens/trainable": 4754852
+    },
+    {
+      "epoch": 0.4023899524448238,
+      "grad_norm": 0.15656448900699615,
+      "learning_rate": 1.5811032226467305e-05,
+      "loss": 2.681117534637451,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 14.6014,
+      "step": 825,
+      "tokens/total": 11511808,
+      "tokens/train_per_sec_per_gpu": 1996.95,
+      "tokens/trainable": 4761473
+    },
+    {
+      "epoch": 0.4028776978417266,
+      "grad_norm": 0.13972437381744385,
+      "learning_rate": 1.563669475839956e-05,
+      "loss": 2.4459619522094727,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.54165,
+      "step": 826,
+      "tokens/total": 11525376,
+      "tokens/train_per_sec_per_gpu": 1750.28,
+      "tokens/trainable": 4768003
+    },
+    {
+      "epoch": 0.40336544323862944,
+      "grad_norm": 0.1425899863243103,
+      "learning_rate": 1.5463242221483743e-05,
+      "loss": 2.4396560192108154,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.46909,
+      "step": 827,
+      "tokens/total": 11540352,
+      "tokens/train_per_sec_per_gpu": 3054.08,
+      "tokens/trainable": 4774058
+    },
+    {
+      "epoch": 0.40385318863553227,
+      "grad_norm": 0.1668749898672104,
+      "learning_rate": 1.529067643515495e-05,
+      "loss": 2.672379493713379,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.47437,
+      "step": 828,
+      "tokens/total": 11554304,
+      "tokens/train_per_sec_per_gpu": 1615.7,
+      "tokens/trainable": 4779147
+    },
+    {
+      "epoch": 0.4043409340324351,
+      "grad_norm": 0.1699647754430771,
+      "learning_rate": 1.5118999209546559e-05,
+      "loss": 2.7491025924682617,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 15.6286,
+      "step": 829,
+      "tokens/total": 11568000,
+      "tokens/train_per_sec_per_gpu": 2331.87,
+      "tokens/trainable": 4785120
+    },
+    {
+      "epoch": 0.40482867942933787,
+      "grad_norm": 0.1574130356311798,
+      "learning_rate": 1.4948212345471491e-05,
+      "loss": 2.519521713256836,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 12.42265,
+      "step": 830,
+      "tokens/total": 11583360,
+      "tokens/train_per_sec_per_gpu": 1494.79,
+      "tokens/trainable": 4790591
+    },
+    {
+      "epoch": 0.4053164248262407,
+      "grad_norm": 0.12484201788902283,
+      "learning_rate": 1.4778317634403083e-05,
+      "loss": 2.391390800476074,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.92868,
+      "step": 831,
+      "tokens/total": 11598080,
+      "tokens/train_per_sec_per_gpu": 4005.3,
+      "tokens/trainable": 4799487
+    },
+    {
+      "epoch": 0.4058041702231435,
+      "grad_norm": 0.21536649763584137,
+      "learning_rate": 1.460931685845649e-05,
+      "loss": 2.217477560043335,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 9.18414,
+      "step": 832,
+      "tokens/total": 11610496,
+      "tokens/train_per_sec_per_gpu": 2295.79,
+      "tokens/trainable": 4802257
+    },
+    {
+      "epoch": 0.40629191562004635,
+      "grad_norm": 0.1689203679561615,
+      "learning_rate": 1.444121179036989e-05,
+      "loss": 2.681854724884033,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 14.61217,
+      "step": 833,
+      "tokens/total": 11622784,
+      "tokens/train_per_sec_per_gpu": 1075.61,
+      "tokens/trainable": 4807534
+    },
+    {
+      "epoch": 0.4067796610169492,
+      "grad_norm": 0.16477236151695251,
+      "learning_rate": 1.427400419348588e-05,
+      "loss": 2.518036127090454,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.40421,
+      "step": 834,
+      "tokens/total": 11637248,
+      "tokens/train_per_sec_per_gpu": 2330.01,
+      "tokens/trainable": 4812618
+    },
+    {
+      "epoch": 0.40726740641385195,
+      "grad_norm": 0.15567028522491455,
+      "learning_rate": 1.4107695821733025e-05,
+      "loss": 2.579047203063965,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.18457,
+      "step": 835,
+      "tokens/total": 11651072,
+      "tokens/train_per_sec_per_gpu": 2918.59,
+      "tokens/trainable": 4818469
+    },
+    {
+      "epoch": 0.4077551518107548,
+      "grad_norm": 0.15685085952281952,
+      "learning_rate": 1.3942288419607475e-05,
+      "loss": 2.431553840637207,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.37655,
+      "step": 836,
+      "tokens/total": 11665408,
+      "tokens/train_per_sec_per_gpu": 2168.92,
+      "tokens/trainable": 4823738
+    },
+    {
+      "epoch": 0.4082428972076576,
+      "grad_norm": 0.19555719196796417,
+      "learning_rate": 1.3777783722154603e-05,
+      "loss": 2.263695478439331,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 9.61857,
+      "step": 837,
+      "tokens/total": 11679744,
+      "tokens/train_per_sec_per_gpu": 1541.14,
+      "tokens/trainable": 4826962
+    },
+    {
+      "epoch": 0.40873064260456043,
+      "grad_norm": 0.13367053866386414,
+      "learning_rate": 1.3614183454950824e-05,
+      "loss": 2.4465866088867188,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.54886,
+      "step": 838,
+      "tokens/total": 11693952,
+      "tokens/train_per_sec_per_gpu": 1674.24,
+      "tokens/trainable": 4834337
+    },
+    {
+      "epoch": 0.40921838800146326,
+      "grad_norm": 0.14650499820709229,
+      "learning_rate": 1.3451489334085554e-05,
+      "loss": 2.3801074028015137,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 10.80606,
+      "step": 839,
+      "tokens/total": 11707136,
+      "tokens/train_per_sec_per_gpu": 2774.02,
+      "tokens/trainable": 4840420
+    },
+    {
+      "epoch": 0.40970613339836603,
+      "grad_norm": 0.18212567269802094,
+      "learning_rate": 1.3289703066143111e-05,
+      "loss": 2.615509510040283,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.67418,
+      "step": 840,
+      "tokens/total": 11721600,
+      "tokens/train_per_sec_per_gpu": 2407.64,
+      "tokens/trainable": 4845453
+    },
+    {
+      "epoch": 0.41019387879526886,
+      "grad_norm": 0.1324673295021057,
+      "learning_rate": 1.3128826348184887e-05,
+      "loss": 2.3340201377868652,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.31934,
+      "step": 841,
+      "tokens/total": 11736832,
+      "tokens/train_per_sec_per_gpu": 1690.93,
+      "tokens/trainable": 4852363
+    },
+    {
+      "epoch": 0.4106816241921717,
+      "grad_norm": 0.1828589141368866,
+      "learning_rate": 1.2968860867731569e-05,
+      "loss": 2.5910964012145996,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 13.34439,
+      "step": 842,
+      "tokens/total": 11749760,
+      "tokens/train_per_sec_per_gpu": 2010.81,
+      "tokens/trainable": 4860314
+    },
+    {
+      "epoch": 0.4111693695890745,
+      "grad_norm": 0.17424485087394714,
+      "learning_rate": 1.2809808302745297e-05,
+      "loss": 2.2547149658203125,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 14.3,
+      "memory/max_allocated (GiB)": 14.3,
+      "ppl": 9.53258,
+      "step": 843,
+      "tokens/total": 11761536,
+      "tokens/train_per_sec_per_gpu": 1108.13,
+      "tokens/trainable": 4864065
+    },
+    {
+      "epoch": 0.41165711498597735,
+      "grad_norm": 0.1569896787405014,
+      "learning_rate": 1.2651670321612263e-05,
+      "loss": 2.6433074474334717,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 14.05963,
+      "step": 844,
+      "tokens/total": 11775104,
+      "tokens/train_per_sec_per_gpu": 1862.26,
+      "tokens/trainable": 4869648
+    },
+    {
+      "epoch": 0.4121448603828801,
+      "grad_norm": 0.1626082807779312,
+      "learning_rate": 1.2494448583125018e-05,
+      "loss": 2.5848560333251953,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 13.26138,
+      "step": 845,
+      "tokens/total": 11787136,
+      "tokens/train_per_sec_per_gpu": 3456.79,
+      "tokens/trainable": 4874679
+    },
+    {
+      "epoch": 0.41263260577978295,
+      "grad_norm": 0.13333570957183838,
+      "learning_rate": 1.233814473646524e-05,
+      "loss": 2.450986385345459,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.59978,
+      "step": 846,
+      "tokens/total": 11802112,
+      "tokens/train_per_sec_per_gpu": 3875.89,
+      "tokens/trainable": 4882554
+    },
+    {
+      "epoch": 0.4131203511766858,
+      "grad_norm": 0.15294967591762543,
+      "learning_rate": 1.218276042118629e-05,
+      "loss": 2.352025032043457,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 10.50682,
+      "step": 847,
+      "tokens/total": 11815936,
+      "tokens/train_per_sec_per_gpu": 2607.66,
+      "tokens/trainable": 4888862
+    },
+    {
+      "epoch": 0.4136080965735886,
+      "grad_norm": 0.13894210755825043,
+      "learning_rate": 1.202829726719611e-05,
+      "loss": 2.7392213344573975,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 15.47493,
+      "step": 848,
+      "tokens/total": 11828352,
+      "tokens/train_per_sec_per_gpu": 1599.17,
+      "tokens/trainable": 4896381
+    },
+    {
+      "epoch": 0.41409584197049143,
+      "grad_norm": 0.16302239894866943,
+      "learning_rate": 1.1874756894740135e-05,
+      "loss": 2.470451831817627,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.82779,
+      "step": 849,
+      "tokens/total": 11843712,
+      "tokens/train_per_sec_per_gpu": 1074.34,
+      "tokens/trainable": 4901622
+    },
+    {
+      "epoch": 0.4145835873673942,
+      "grad_norm": 0.149576798081398,
+      "learning_rate": 1.172214091438416e-05,
+      "loss": 2.4945759773254395,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.11659,
+      "step": 850,
+      "tokens/total": 11857664,
+      "tokens/train_per_sec_per_gpu": 2578.75,
+      "tokens/trainable": 4907297
+    },
+    {
+      "epoch": 0.4145835873673942,
+      "eval_loss": 2.491702079772949,
+      "eval_ppl": 12.08182,
+      "eval_runtime": 6.0732,
+      "eval_samples_per_second": 32.932,
+      "eval_steps_per_second": 16.466,
+      "memory/device_reserved (GiB)": 27.47,
+      "memory/max_active (GiB)": 11.76,
+      "memory/max_allocated (GiB)": 11.76,
+      "step": 850
     }
   ],
   "logging_steps": 1,
       "attributes": {}
     }
   },
+  "total_flos": 1.4585474031825715e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null