AiAF committed on Mar 30

Commit c86dc89 (verified) · 1 Parent(s): fadd720

Training in progress, step 900, checkpoint

Files changed (6)

last-checkpoint/adapter_model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/tokens_state.json +1 -1
last-checkpoint/trainer_state.json +715 -3

last-checkpoint/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ad29cff1b863587cbb2ca948354cb20133cf91efb3ab95cc9e09274cb6bcac5b
+oid sha256:92825f995b10c587276dcbb59bd1d6ee8e64825522d2dd1211f3d32eb56271e0
 size 102264160
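
The binary checkpoint files in this commit are stored as Git LFS pointers: three text lines giving the spec version, a SHA-256 object ID, and the size in bytes. Only the `oid` changes here, which means the file content changed while its size stayed the same. The sketch below shows one way to parse such a pointer and check a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, introduced only for illustration.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# New pointer for last-checkpoint/adapter_model.safetensors, copied from the diff above.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:92825f995b10c587276dcbb59bd1d6ee8e64825522d2dd1211f3d32eb56271e0\n"
    "size 102264160\n"
)
print(new_pointer["algo"], new_pointer["size"])
```

To verify a real download, read the file's bytes and pass them to `matches_pointer` together with the parsed pointer. For files much larger than these, hashing in chunks would avoid loading everything into memory.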

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:99e35caf9f22b9501f8794b7071db015d9c5f1cc2081e5e6b308b86d01258be1
+oid sha256:310b6425692196f794c4a8c4e6a433c67cf109b9958b71fd97fd7d7987695364
 size 52162827

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e394ddf37d3569e21dd7164d17df1486101a840dc12b8080abbcaca06573e244
+oid sha256:e94609f49fe622efb94028eb554b792c6f84218319e3570798403eabc10e0789
 size 14645

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b233ed6e5d634209b3aa9991eded2c9aa4b12fa1b2fb73e19124dd488ff69f21
+oid sha256:0fd147b564a8c5ec603af92237fa27a3ed62221eb04924da2d44364eed74d116
 size 1465

last-checkpoint/tokens_state.json CHANGED

@@ -1 +1 @@
-{"total": 11857664, "trainable": 4907297}
+{"total": 12555520, "trainable": 5198878}
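
The two `tokens_state.json` snapshots can be subtracted to see how many tokens were processed between the step-850 and step-900 checkpoints. The sketch below does that arithmetic. It assumes, without confirmation from the source, that `total` counts every token seen and `trainable` counts only the tokens that contribute to the loss. The step numbers come from `global_step` in `trainer_state.json`.

```python
import json

# Old and new contents of last-checkpoint/tokens_state.json, copied from the diff.
old_state = json.loads('{"total": 11857664, "trainable": 4907297}')
new_state = json.loads('{"total": 12555520, "trainable": 5198878}')

# global_step moved from 850 to 900 in this commit.
steps_between = 900 - 850

# Tokens processed between the two checkpoints.
delta = {key: new_state[key] - old_state[key] for key in new_state}
print(delta)  # {'total': 697856, 'trainable': 291581}

# Average tokens per optimizer step over this window.
per_step = {key: value / steps_between for key, value in delta.items()}
print(per_step)

# Share of tokens in this window counted as trainable
# (assumed meaning: tokens that are not masked out of the loss).
trainable_share = delta["trainable"] / delta["total"]
print(f"{trainable_share:.1%}")
```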

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.4145835873673942,
   "eval_steps": 50,
-  "global_step": 850,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -12124,6 +12124,718 @@
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 850
     }
   ],
   "logging_steps": 1,
@@ -12143,7 +12855,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.4585474031825715e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.43897085721253504,
   "eval_steps": 50,
+  "global_step": 900,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 850
+    },
+    {
+      "epoch": 0.41507133276429703,
+      "grad_norm": 0.1435479074716568,
+      "learning_rate": 1.1570450926997655e-05,
+      "loss": 2.4883711338043213,
+      "memory/device_reserved (GiB)": 25.22,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.04165,
+      "step": 851,
+      "tokens/total": 11870720,
+      "tokens/train_per_sec_per_gpu": 3333.56,
+      "tokens/trainable": 4914049
+    },
+    {
+      "epoch": 0.41555907816119986,
+      "grad_norm": 0.1966681033372879,
+      "learning_rate": 1.141968852373676e-05,
+      "loss": 2.431966781616211,
+      "memory/device_reserved (GiB)": 25.23,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.38124,
+      "step": 852,
+      "tokens/total": 11884800,
+      "tokens/train_per_sec_per_gpu": 2765.34,
+      "tokens/trainable": 4917836
+    },
+    {
+      "epoch": 0.4160468235581027,
+      "grad_norm": 0.1497463434934616,
+      "learning_rate": 1.1269855286027797e-05,
+      "loss": 2.442220687866211,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.49855,
+      "step": 853,
+      "tokens/total": 11898240,
+      "tokens/train_per_sec_per_gpu": 653.89,
+      "tokens/trainable": 4923467
+    },
+    {
+      "epoch": 0.4165345689550055,
+      "grad_norm": 0.1307368278503418,
+      "learning_rate": 1.1120952785550476e-05,
+      "loss": 2.2968218326568604,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 9.94253,
+      "step": 854,
+      "tokens/total": 11913984,
+      "tokens/train_per_sec_per_gpu": 2660.99,
+      "tokens/trainable": 4931210
+    },
+    {
+      "epoch": 0.4170223143519083,
+      "grad_norm": 0.1639799326658249,
+      "learning_rate": 1.0972982584221592e-05,
+      "loss": 2.3355393409729004,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 10.33503,
+      "step": 855,
+      "tokens/total": 11929088,
+      "tokens/train_per_sec_per_gpu": 1326.43,
+      "tokens/trainable": 4937919
+    },
+    {
+      "epoch": 0.4175100597488111,
+      "grad_norm": 0.14141331613063812,
+      "learning_rate": 1.0825946234178574e-05,
+      "loss": 2.449113607406616,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.57808,
+      "step": 856,
+      "tokens/total": 11944320,
+      "tokens/train_per_sec_per_gpu": 1926.65,
+      "tokens/trainable": 4944515
+    },
+    {
+      "epoch": 0.41799780514571394,
+      "grad_norm": 0.17068256437778473,
+      "learning_rate": 1.067984527776309e-05,
+      "loss": 2.702491521835327,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 14.91685,
+      "step": 857,
+      "tokens/total": 11958016,
+      "tokens/train_per_sec_per_gpu": 1277.78,
+      "tokens/trainable": 4949764
+    },
+    {
+      "epoch": 0.41848555054261677,
+      "grad_norm": 0.199791818857193,
+      "learning_rate": 1.0534681247505106e-05,
+      "loss": 2.603161573410034,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.50637,
+      "step": 858,
+      "tokens/total": 11970304,
+      "tokens/train_per_sec_per_gpu": 1420.03,
+      "tokens/trainable": 4953574
+    },
+    {
+      "epoch": 0.4189732959395196,
+      "grad_norm": 0.1493324339389801,
+      "learning_rate": 1.0390455666106547e-05,
+      "loss": 2.5382094383239746,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.65699,
+      "step": 859,
+      "tokens/total": 11985920,
+      "tokens/train_per_sec_per_gpu": 2890.82,
+      "tokens/trainable": 4960025
+    },
+    {
+      "epoch": 0.41946104133642237,
+      "grad_norm": 0.1596572995185852,
+      "learning_rate": 1.024717004642557e-05,
+      "loss": 2.5015199184417725,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.09,
+      "memory/max_allocated (GiB)": 15.09,
+      "ppl": 12.20102,
+      "step": 860,
+      "tokens/total": 12000256,
+      "tokens/train_per_sec_per_gpu": 1781.45,
+      "tokens/trainable": 4965371
+    },
+    {
+      "epoch": 0.4199487867333252,
+      "grad_norm": 0.16531601548194885,
+      "learning_rate": 1.010482589146048e-05,
+      "loss": 2.519444704055786,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.4217,
+      "step": 861,
+      "tokens/total": 12014592,
+      "tokens/train_per_sec_per_gpu": 2384.76,
+      "tokens/trainable": 4970292
+    },
+    {
+      "epoch": 0.420436532130228,
+      "grad_norm": 0.15156078338623047,
+      "learning_rate": 9.963424694334122e-06,
+      "loss": 2.636232376098633,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 13.96051,
+      "step": 862,
+      "tokens/total": 12027520,
+      "tokens/train_per_sec_per_gpu": 1668.48,
+      "tokens/trainable": 4977083
+    },
+    {
+      "epoch": 0.42092427752713085,
+      "grad_norm": 0.15729570388793945,
+      "learning_rate": 9.822967938278171e-06,
+      "loss": 2.6120800971984863,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.62737,
+      "step": 863,
+      "tokens/total": 12040960,
+      "tokens/train_per_sec_per_gpu": 1881.0,
+      "tokens/trainable": 4982959
+    },
+    {
+      "epoch": 0.4214120229240337,
+      "grad_norm": 0.16363853216171265,
+      "learning_rate": 9.683457096617488e-06,
+      "loss": 2.4688925743103027,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.80936,
+      "step": 864,
+      "tokens/total": 12053760,
+      "tokens/train_per_sec_per_gpu": 1967.45,
+      "tokens/trainable": 4987799
+    },
+    {
+      "epoch": 0.42189976832093645,
+      "grad_norm": 0.14668720960617065,
+      "learning_rate": 9.544893632754814e-06,
+      "loss": 2.505845546722412,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.25392,
+      "step": 865,
+      "tokens/total": 12067072,
+      "tokens/train_per_sec_per_gpu": 1906.6,
+      "tokens/trainable": 4994507
+    },
+    {
+      "epoch": 0.4223875137178393,
+      "grad_norm": 0.16504798829555511,
+      "learning_rate": 9.407279000155312e-06,
+      "loss": 2.658405303955078,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.27351,
+      "step": 866,
+      "tokens/total": 12080384,
+      "tokens/train_per_sec_per_gpu": 341.69,
+      "tokens/trainable": 4999747
+    },
+    {
+      "epoch": 0.4228752591147421,
+      "grad_norm": 0.1413465142250061,
+      "learning_rate": 9.270614642331376e-06,
+      "loss": 2.5570576190948486,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 12.89781,
+      "step": 867,
+      "tokens/total": 12093056,
+      "tokens/train_per_sec_per_gpu": 1759.22,
+      "tokens/trainable": 5006280
+    },
+    {
+      "epoch": 0.42336300451164494,
+      "grad_norm": 0.16530846059322357,
+      "learning_rate": 9.134901992827427e-06,
+      "loss": 2.3816144466400146,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 10.82236,
+      "step": 868,
+      "tokens/total": 12106880,
+      "tokens/train_per_sec_per_gpu": 230.38,
+      "tokens/trainable": 5011197
+    },
+    {
+      "epoch": 0.42385074990854776,
+      "grad_norm": 0.15594670176506042,
+      "learning_rate": 9.000142475204964e-06,
+      "loss": 2.468984603881836,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.81045,
+      "step": 869,
+      "tokens/total": 12120704,
+      "tokens/train_per_sec_per_gpu": 2162.37,
+      "tokens/trainable": 5016737
+    },
+    {
+      "epoch": 0.42433849530545054,
+      "grad_norm": 0.16707849502563477,
+      "learning_rate": 8.866337503027522e-06,
+      "loss": 2.6235711574554443,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 13.78486,
+      "step": 870,
+      "tokens/total": 12133504,
+      "tokens/train_per_sec_per_gpu": 1585.54,
+      "tokens/trainable": 5021777
+    },
+    {
+      "epoch": 0.42482624070235336,
+      "grad_norm": 0.14890620112419128,
+      "learning_rate": 8.733488479845997e-06,
+      "loss": 2.49348783493042,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.10342,
+      "step": 871,
+      "tokens/total": 12149120,
+      "tokens/train_per_sec_per_gpu": 2713.5,
+      "tokens/trainable": 5028442
+    },
+    {
+      "epoch": 0.4253139860992562,
+      "grad_norm": 0.18396082520484924,
+      "learning_rate": 8.60159679918372e-06,
+      "loss": 2.4837136268615723,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 14.3,
+      "memory/max_allocated (GiB)": 14.3,
+      "ppl": 11.98569,
+      "step": 872,
+      "tokens/total": 12162432,
+      "tokens/train_per_sec_per_gpu": 1947.57,
+      "tokens/trainable": 5033319
+    },
+    {
+      "epoch": 0.425801731496159,
+      "grad_norm": 0.18063481152057648,
+      "learning_rate": 8.470663844522052e-06,
+      "loss": 2.8627002239227295,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 17.50874,
+      "step": 873,
+      "tokens/total": 12177664,
+      "tokens/train_per_sec_per_gpu": 620.63,
+      "tokens/trainable": 5038529
+    },
+    {
+      "epoch": 0.42628947689306185,
+      "grad_norm": 0.19020754098892212,
+      "learning_rate": 8.340690989285726e-06,
+      "loss": 2.350053071975708,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 10.48613,
+      "step": 874,
+      "tokens/total": 12192256,
+      "tokens/train_per_sec_per_gpu": 254.37,
+      "tokens/trainable": 5041985
+    },
+    {
+      "epoch": 0.4267772222899646,
+      "grad_norm": 0.13190138339996338,
+      "learning_rate": 8.21167959682848e-06,
+      "loss": 2.2473227977752686,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 9.46237,
+      "step": 875,
+      "tokens/total": 12205440,
+      "tokens/train_per_sec_per_gpu": 3375.17,
+      "tokens/trainable": 5049048
+    },
+    {
+      "epoch": 0.42726496768686745,
+      "grad_norm": 0.16024993360042572,
+      "learning_rate": 8.083631020418791e-06,
+      "loss": 2.3730978965759277,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 10.73058,
+      "step": 876,
+      "tokens/total": 12220160,
+      "tokens/train_per_sec_per_gpu": 2040.44,
+      "tokens/trainable": 5054947
+    },
+    {
+      "epoch": 0.4277527130837703,
+      "grad_norm": 0.1305093914270401,
+      "learning_rate": 7.956546603225601e-06,
+      "loss": 2.5211825370788574,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.4433,
+      "step": 877,
+      "tokens/total": 12235520,
+      "tokens/train_per_sec_per_gpu": 2041.36,
+      "tokens/trainable": 5063371
+    },
+    {
+      "epoch": 0.4282404584806731,
+      "grad_norm": 0.13371102511882782,
+      "learning_rate": 7.830427678304353e-06,
+      "loss": 2.4531219005584717,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.62458,
+      "step": 878,
+      "tokens/total": 12250368,
+      "tokens/train_per_sec_per_gpu": 3181.56,
+      "tokens/trainable": 5071429
+    },
+    {
+      "epoch": 0.42872820387757593,
+      "grad_norm": 0.1576300710439682,
+      "learning_rate": 7.705275568582848e-06,
+      "loss": 2.5344905853271484,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.61001,
+      "step": 879,
+      "tokens/total": 12265344,
+      "tokens/train_per_sec_per_gpu": 3537.63,
+      "tokens/trainable": 5076986
+    },
+    {
+      "epoch": 0.4292159492744787,
+      "grad_norm": 0.13250482082366943,
+      "learning_rate": 7.581091586847522e-06,
+      "loss": 2.433558464050293,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.39937,
+      "step": 880,
+      "tokens/total": 12280320,
+      "tokens/train_per_sec_per_gpu": 2714.44,
+      "tokens/trainable": 5084389
+    },
+    {
+      "epoch": 0.42970369467138153,
+      "grad_norm": 0.12432190030813217,
+      "learning_rate": 7.457877035729588e-06,
+      "loss": 2.548605442047119,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.78926,
+      "step": 881,
+      "tokens/total": 12293376,
+      "tokens/train_per_sec_per_gpu": 3840.17,
+      "tokens/trainable": 5093340
+    },
+    {
+      "epoch": 0.43019144006828436,
+      "grad_norm": 0.15672548115253448,
+      "learning_rate": 7.335633207691361e-06,
+      "loss": 2.5341434478759766,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 14.3,
+      "memory/max_allocated (GiB)": 14.3,
+      "ppl": 12.60563,
+      "step": 882,
+      "tokens/total": 12305792,
+      "tokens/train_per_sec_per_gpu": 1558.83,
+      "tokens/trainable": 5098803
+    },
+    {
+      "epoch": 0.4306791854651872,
+      "grad_norm": 0.17533928155899048,
+      "learning_rate": 7.21436138501278e-06,
+      "loss": 2.480517864227295,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.09,
+      "memory/max_allocated (GiB)": 15.09,
+      "ppl": 11.94745,
+      "step": 883,
+      "tokens/total": 12318592,
+      "tokens/train_per_sec_per_gpu": 1472.63,
+      "tokens/trainable": 5103041
+    },
+    {
+      "epoch": 0.43116693086209,
+      "grad_norm": 0.14066585898399353,
+      "learning_rate": 7.094062839777837e-06,
+      "loss": 2.64518404006958,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 14.08604,
+      "step": 884,
+      "tokens/total": 12331776,
+      "tokens/train_per_sec_per_gpu": 846.43,
+      "tokens/trainable": 5109866
+    },
+    {
+      "epoch": 0.4316546762589928,
+      "grad_norm": 0.1537938266992569,
+      "learning_rate": 6.974738833861383e-06,
+      "loss": 2.3791351318359375,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 10.79556,
+      "step": 885,
+      "tokens/total": 12345856,
+      "tokens/train_per_sec_per_gpu": 1987.51,
+      "tokens/trainable": 5115305
+    },
+    {
+      "epoch": 0.4321424216558956,
+      "grad_norm": 0.13529153168201447,
+      "learning_rate": 6.856390618915775e-06,
+      "loss": 2.581418037414551,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.21587,
+      "step": 886,
+      "tokens/total": 12359936,
+      "tokens/train_per_sec_per_gpu": 3477.99,
+      "tokens/trainable": 5122721
+    },
+    {
+      "epoch": 0.43263016705279844,
+      "grad_norm": 0.151899516582489,
+      "learning_rate": 6.739019436357774e-06,
+      "loss": 2.509517192840576,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 12.29899,
+      "step": 887,
+      "tokens/total": 12374272,
+      "tokens/train_per_sec_per_gpu": 3268.25,
+      "tokens/trainable": 5129148
+    },
+    {
+      "epoch": 0.43311791244970127,
+      "grad_norm": 0.13883507251739502,
+      "learning_rate": 6.622626517355557e-06,
+      "loss": 2.570188522338867,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 13.06829,
+      "step": 888,
+      "tokens/total": 12387712,
+      "tokens/train_per_sec_per_gpu": 2985.04,
+      "tokens/trainable": 5136366
+    },
+    {
+      "epoch": 0.4336056578466041,
+      "grad_norm": 0.17564928531646729,
+      "learning_rate": 6.507213082815744e-06,
+      "loss": 2.4525790214538574,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.61827,
+      "step": 889,
+      "tokens/total": 12401280,
+      "tokens/train_per_sec_per_gpu": 776.96,
+      "tokens/trainable": 5140656
+    },
+    {
+      "epoch": 0.43409340324350687,
+      "grad_norm": 0.14762923121452332,
+      "learning_rate": 6.392780343370686e-06,
+      "loss": 2.6221022605895996,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.76463,
+      "step": 890,
+      "tokens/total": 12414080,
+      "tokens/train_per_sec_per_gpu": 2521.43,
+      "tokens/trainable": 5147622
+    },
+    {
+      "epoch": 0.4345811486404097,
+      "grad_norm": 0.14871041476726532,
+      "learning_rate": 6.2793294993656494e-06,
+      "loss": 2.450669527053833,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.59611,
+      "step": 891,
+      "tokens/total": 12429184,
+      "tokens/train_per_sec_per_gpu": 983.15,
+      "tokens/trainable": 5153346
+    },
+    {
+      "epoch": 0.4350688940373125,
+      "grad_norm": 0.2055574506521225,
+      "learning_rate": 6.166861740846297e-06,
+      "loss": 2.7320761680603027,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 15.36475,
+      "step": 892,
+      "tokens/total": 12443136,
+      "tokens/train_per_sec_per_gpu": 615.37,
+      "tokens/trainable": 5156569
+    },
+    {
+      "epoch": 0.43555663943421535,
+      "grad_norm": 0.16989806294441223,
+      "learning_rate": 6.055378247546218e-06,
+      "loss": 2.3722715377807617,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.72172,
+      "step": 893,
+      "tokens/total": 12456576,
+      "tokens/train_per_sec_per_gpu": 289.52,
+      "tokens/trainable": 5161189
+    },
+    {
+      "epoch": 0.4360443848311182,
+      "grad_norm": 0.158726304769516,
+      "learning_rate": 5.9448801888744795e-06,
+      "loss": 2.1923623085021973,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 8.95635,
+      "step": 894,
+      "tokens/total": 12469888,
+      "tokens/train_per_sec_per_gpu": 3135.48,
+      "tokens/trainable": 5165648
+    },
+    {
+      "epoch": 0.43653213022802095,
+      "grad_norm": 0.12554942071437836,
+      "learning_rate": 5.835368723903456e-06,
+      "loss": 2.471595287322998,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 11.84132,
+      "step": 895,
+      "tokens/total": 12484992,
+      "tokens/train_per_sec_per_gpu": 2601.89,
+      "tokens/trainable": 5174167
+    },
+    {
+      "epoch": 0.4370198756249238,
+      "grad_norm": 0.17701223492622375,
+      "learning_rate": 5.726845001356573e-06,
+      "loss": 2.3380227088928223,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 10.36073,
+      "step": 896,
+      "tokens/total": 12498688,
+      "tokens/train_per_sec_per_gpu": 1107.32,
+      "tokens/trainable": 5177916
+    },
+    {
+      "epoch": 0.4375076210218266,
+      "grad_norm": 0.15862098336219788,
+      "learning_rate": 5.6193101595963585e-06,
+      "loss": 2.281069755554199,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 9.78714,
+      "step": 897,
+      "tokens/total": 12512640,
+      "tokens/train_per_sec_per_gpu": 1239.77,
+      "tokens/trainable": 5182570
+    },
+    {
+      "epoch": 0.43799536641872944,
+      "grad_norm": 0.16245107352733612,
+      "learning_rate": 5.512765326612379e-06,
+      "loss": 2.2768242359161377,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 9.74568,
+      "step": 898,
+      "tokens/total": 12527872,
+      "tokens/train_per_sec_per_gpu": 742.37,
+      "tokens/trainable": 5187760
+    },
+    {
+      "epoch": 0.43848311181563226,
+      "grad_norm": 0.15560710430145264,
+      "learning_rate": 5.407211620009544e-06,
+      "loss": 2.7127208709716797,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 15.07022,
+      "step": 899,
+      "tokens/total": 12541056,
+      "tokens/train_per_sec_per_gpu": 1990.04,
+      "tokens/trainable": 5193948
+    },
+    {
+      "epoch": 0.43897085721253504,
+      "grad_norm": 0.1649906188249588,
+      "learning_rate": 5.30265014699628e-06,
+      "loss": 2.6698319911956787,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 14.43754,
+      "step": 900,
+      "tokens/total": 12555520,
+      "tokens/train_per_sec_per_gpu": 2105.77,
+      "tokens/trainable": 5198878
+    },
+    {
+      "epoch": 0.43897085721253504,
+      "eval_loss": 2.4894537925720215,
+      "eval_ppl": 12.05469,
+      "eval_runtime": 6.0203,
+      "eval_samples_per_second": 33.221,
+      "eval_steps_per_second": 16.61,
+      "memory/device_reserved (GiB)": 36.95,
+      "memory/max_active (GiB)": 11.76,
+      "memory/max_allocated (GiB)": 11.76,
+      "step": 900
     }
   ],
   "logging_steps": 1,
       "attributes": {}
     }
   },
+  "total_flos": 1.5443869122625536e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null