Instructions for using AiAF/rp-2b with libraries, notebooks, and local apps.

Libraries

How to use AiAF/rp-2b with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the Gemma 2 2B instruct base model, then attach the rp-2b PEFT adapter to it
base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base_model, "AiAF/rp-2b")
```
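The PEFT snippet hard-codes `google/gemma-2-2b-it` as the base model. A PEFT adapter normally records its base model in `adapter_config.json` under `base_model_name_or_path`, alongside fields such as `peft_type`, `r` and `lora_alpha`. The sketch below checks that pairing from a local copy of the file before downloading the full base weights. The helper names `read_adapter_config` and `check_base_model` are hypothetical, and the code assumes the standard PEFT config keys (missing keys come back as `None`).

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

# Keys PEFT usually writes to adapter_config.json (assumed, not guaranteed for every adapter type)
_KEYS = ("base_model_name_or_path", "peft_type", "r", "lora_alpha", "target_modules")


def read_adapter_config(path: str | Path) -> dict[str, Any]:
    """Return the loading-relevant fields of a PEFT adapter_config.json (None if absent)."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    return {key: cfg.get(key) for key in _KEYS}


def check_base_model(path: str | Path, expected: str) -> bool:
    """True if the adapter says it was trained on top of `expected`."""
    return read_adapter_config(path)["base_model_name_or_path"] == expected
```

To get the file locally, `huggingface_hub.hf_hub_download(repo_id="AiAF/rp-2b", filename="adapter_config.json")` returns a cached path, assuming the repository root contains that file.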

Transformers

How to use AiAF/rp-2b with Transformers:

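```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
result = pipe(messages, max_new_tokens=40)
# With chat-style input the pipeline returns the whole conversation; the last message is the reply
print(result[0]["generated_text"][-1]["content"])
```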

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAF/rp-2b")
model = AutoModelForCausalLM.from_pretrained("AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and append the opening of the model's turn
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# generate() returns prompt + continuation; slice off the prompt tokens before decoding
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
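`apply_chat_template(..., add_generation_prompt=True)` renders the messages into the model's prompt format and appends the start of the assistant turn. Because `generate()` returns the prompt followed by the new tokens, slicing at the prompt length leaves only the reply. The function below is a plain-Python approximation of the Gemma 2 instruct format. It assumes the adapter keeps the base model's template (a roleplay fine-tune may ship its own, and the tokenizer's `chat_template` is authoritative). `render_gemma_prompt` and `strip_prompt` are hypothetical names used only for illustration.

```python
from __future__ import annotations


def render_gemma_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate Gemma 2's chat format as a string (the BOS token is shown as '<bos>')."""
    parts = ["<bos>"]
    for msg in messages:
        role = msg["role"]
        if role == "assistant":
            role = "model"  # Gemma names the assistant turn "model"
        elif role != "user":
            # Gemma 2's template has no system role
            raise ValueError(f"unsupported role for Gemma 2: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")  # the model continues from here
    return "".join(parts)


def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """List version of outputs[0][prompt_len:]: drop the echoed prompt tokens."""
    return output_ids[prompt_len:]
```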

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use AiAF/rp-2b with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AiAF/rp-2b"
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
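The same request can be sent with Python's standard library. This sketch builds the OpenAI-style payload used by the curl call. `build_chat_request` and `send` are hypothetical helper names, and the default `max_tokens` is an arbitrary illustrative cap. If the repository turns out to contain only adapter weights, vLLM may instead need the base model served with `--enable-lora --lora-modules <name>=AiAF/rp-2b`, in which case the `model` field must be that LoRA module name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_content: str,
                       max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,  # illustrative limit on generated tokens
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the vLLM server running, `send(build_chat_request("http://localhost:8000", "AiAF/rp-2b", "What is the capital of France?"))` mirrors the curl command; for SGLang, only the port changes to 30000.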


SGLang

How to use AiAF/rp-2b with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAF/rp-2b" \
    --host 0.0.0.0 \
    --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
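Both servers reply in the OpenAI chat-completions shape, where the generated text sits under `choices[0].message.content` and `finish_reason` says why generation stopped. A small parser makes that explicit and fails loudly on error bodies. `extract_reply` and `finish_reason` are hypothetical names, and the error-handling branch assumes either OpenAI's `{"error": ...}` wrapper or an `{"object": "error", ...}` body, which some servers return.

```python
from __future__ import annotations

import json


def extract_reply(body: str | bytes) -> str:
    """Return the assistant text from an OpenAI-style chat completion response body."""
    data = json.loads(body)
    # OpenAI nests errors under "error"; some servers return {"object": "error", "message": ...}
    if "error" in data or data.get("object") == "error":
        raise RuntimeError(f"server returned an error: {data.get('error', data.get('message'))}")
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    return message.get("content") or ""


def finish_reason(body: str | bytes) -> str | None:
    """'stop' means the model ended its turn; 'length' means max_tokens cut it off."""
    data = json.loads(body)
    choices = data.get("choices") or [{}]
    return choices[0].get("finish_reason")
```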

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAF/rp-2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
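The container has to download and load weights before it answers, so an immediate curl call can be refused. A small polling sketch waits until the published port (`30000`, mapped by `-p 30000:30000`) accepts TCP connections. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening: it does not prove the model has finished loading, so the first request may still need a retry.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound; close it right away
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 30000)` before the curl request avoids the usual "connection refused" on a cold start.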

Docker Model Runner
How to use AiAF/rp-2b with Docker Model Runner:
```
docker model run hf.co/AiAF/rp-2b
```

AiAF committed on Mar 30

Commit ef695d7 (verified) · 1 parent: 1f2e159

Training in progress, step 750, checkpoint

Files changed (6):

last-checkpoint/adapter_model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/tokens_state.json +1 -1
last-checkpoint/trainer_state.json +715 -3

last-checkpoint/adapter_model.safetensors (changed)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:59badcbc1d668a371853f284e39f9ca33e2fe2af68b773148163044bb0f70bdd
+oid sha256:f145aad3e393aacb1ea6687fe5c794bd1505c6b68c50e5038c6eac34efa7e4d6
 size 102264160

last-checkpoint/optimizer.pt (changed)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:49665f71e34f2a3db3bbae94d41e9706d6e4267d7bf49d604935f42728af0512
+oid sha256:140bdab4eebed8c5ba2417db0ed65f56201fa6307a32fb787ad292b97ae34b13
 size 52162827

last-checkpoint/rng_state.pth (changed)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b2376b84b97294d583dff60749feb13d6533baf27b96b9a245af922803baac53
+oid sha256:4295d68f9590a1ee84490e5a76cd2d12d84f3c4e7c7542a7915be508cf875fe0
 size 14645

last-checkpoint/scheduler.pt (changed)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1fccf8d05f51ee90d9abfa90ec4fa092bb34ce369846454436f6371151204846
+oid sha256:6af5f150dbd15fa79794ceabe67cfe7018c07d61742eb73c3c6b041388c26d7c
 size 1465
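These four files are stored through Git LFS, so the repository diff shows only the pointer: a `version` line, an `oid` (the SHA-256 of the real file contents) and a `size` in bytes. Here only the hashes change while every size stays the same, which is consistent with the same tensors and optimizer buffers being overwritten with new values. The sketch below checks a downloaded file against such a pointer. It assumes the three-line `key value` format shown above, and the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def matches_pointer(path: str | Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's size and streamed SHA-256 digest with an LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(path)
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap check before hashing a large file
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```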

last-checkpoint/tokens_state.json (changed)

@@ -1 +1 @@
-{"total": 9769472, "trainable": 4042644}
+{"total": 10467328, "trainable": 4329291}
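The token counters give the data volume of this 50-step checkpoint interval (step 700 to step 750). The arithmetic below assumes, as the field names suggest, that `trainable` counts tokens that contribute to the loss (for example, excluding masked prompt tokens). That reading is an assumption about the training framework, and `interval_stats` is a hypothetical helper.

```python
from __future__ import annotations


def interval_stats(before: dict[str, int], after: dict[str, int], steps: int) -> dict[str, float]:
    """Token counts consumed between two checkpoints, per step and as a trainable share."""
    total = after["total"] - before["total"]
    trainable = after["trainable"] - before["trainable"]
    return {
        "total_tokens": total,
        "trainable_tokens": trainable,
        "tokens_per_step": total / steps,
        "trainable_fraction": trainable / total,
    }


# Counter values from the tokens_state.json diff (step 700 before, step 750 after)
stats = interval_stats(
    {"total": 9769472, "trainable": 4042644},
    {"total": 10467328, "trainable": 4329291},
    steps=750 - 700,
)
print(stats)
```

That works out to about 698k tokens in the interval, roughly 14k per optimizer step, of which about 41% are trainable. The per-step log agrees: step 701 reports `tokens/total` 9783296, which is 13,824 tokens above the step-700 counter.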

last-checkpoint/trainer_state.json (changed)

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.3414217778319717,
   "eval_steps": 50,
-  "global_step": 700,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -9988,6 +9988,718 @@
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 700
     }
   ],
   "logging_steps": 1,
@@ -10007,7 +10719,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.201690148756521e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.36580904767711253,
   "eval_steps": 50,
+  "global_step": 750,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 700
+    },
+    {
+      "epoch": 0.3419095232288745,
+      "grad_norm": 0.15700815618038177,
+      "learning_rate": 4.360429701490934e-05,
+      "loss": 2.678558111190796,
+      "memory/device_reserved (GiB)": 37.36,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 14.56408,
+      "step": 701,
+      "tokens/total": 9783296,
+      "tokens/train_per_sec_per_gpu": 731.12,
+      "tokens/trainable": 4048483
+    },
+    {
+      "epoch": 0.34239726862577735,
+      "grad_norm": 0.15936368703842163,
+      "learning_rate": 4.333713551181852e-05,
+      "loss": 2.4016025066375732,
+      "memory/device_reserved (GiB)": 37.36,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.04086,
+      "step": 702,
+      "tokens/total": 9796608,
+      "tokens/train_per_sec_per_gpu": 1781.96,
+      "tokens/trainable": 4053789
+    },
+    {
+      "epoch": 0.3428850140226802,
+      "grad_norm": 0.15781526267528534,
+      "learning_rate": 4.307056837536373e-05,
+      "loss": 2.310777187347412,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.08226,
+      "step": 703,
+      "tokens/total": 9812224,
+      "tokens/train_per_sec_per_gpu": 2462.76,
+      "tokens/trainable": 4058992
+    },
+    {
+      "epoch": 0.34337275941958295,
+      "grad_norm": 0.15782800316810608,
+      "learning_rate": 4.2804598401708175e-05,
+      "loss": 2.5483644008636475,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 12.78617,
+      "step": 704,
+      "tokens/total": 9825920,
+      "tokens/train_per_sec_per_gpu": 725.7,
+      "tokens/trainable": 4064335
+    },
+    {
+      "epoch": 0.3438605048164858,
+      "grad_norm": 0.13850180804729462,
+      "learning_rate": 4.253922838075095e-05,
+      "loss": 2.7391016483306885,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 15.47308,
+      "step": 705,
+      "tokens/total": 9840896,
+      "tokens/train_per_sec_per_gpu": 2419.77,
+      "tokens/trainable": 4072424
+    },
+    {
+      "epoch": 0.3443482502133886,
+      "grad_norm": 0.16455164551734924,
+      "learning_rate": 4.227446109609809e-05,
+      "loss": 2.409106969833374,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.12402,
+      "step": 706,
+      "tokens/total": 9855104,
+      "tokens/train_per_sec_per_gpu": 1066.86,
+      "tokens/trainable": 4077224
+    },
+    {
+      "epoch": 0.34483599561029143,
+      "grad_norm": 0.1787019520998001,
+      "learning_rate": 4.2010299325033034e-05,
+      "loss": 2.276559352874756,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 9.7431,
+      "step": 707,
+      "tokens/total": 9867776,
+      "tokens/train_per_sec_per_gpu": 799.15,
+      "tokens/trainable": 4081093
+    },
+    {
+      "epoch": 0.34532374100719426,
+      "grad_norm": 0.14423713088035583,
+      "learning_rate": 4.17467458384878e-05,
+      "loss": 2.411412239074707,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.1497,
+      "step": 708,
+      "tokens/total": 9881856,
+      "tokens/train_per_sec_per_gpu": 3455.55,
+      "tokens/trainable": 4087806
+    },
+    {
+      "epoch": 0.34581148640409703,
+      "grad_norm": 0.1406305432319641,
+      "learning_rate": 4.1483803401013796e-05,
+      "loss": 2.494349241256714,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 12.11385,
+      "step": 709,
+      "tokens/total": 9896192,
+      "tokens/train_per_sec_per_gpu": 1836.23,
+      "tokens/trainable": 4094473
+    },
+    {
+      "epoch": 0.34629923180099986,
+      "grad_norm": 0.1451570838689804,
+      "learning_rate": 4.12214747707527e-05,
+      "loss": 2.4983654022216797,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 12.1626,
+      "step": 710,
+      "tokens/total": 9909120,
+      "tokens/train_per_sec_per_gpu": 2190.99,
+      "tokens/trainable": 4100682
+    },
+    {
+      "epoch": 0.3467869771979027,
+      "grad_norm": 0.18409603834152222,
+      "learning_rate": 4.0959762699407766e-05,
+      "loss": 2.590090751647949,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 13.33098,
+      "step": 711,
+      "tokens/total": 9922304,
+      "tokens/train_per_sec_per_gpu": 2223.08,
+      "tokens/trainable": 4104953
+    },
+    {
+      "epoch": 0.3472747225948055,
+      "grad_norm": 0.14688999950885773,
+      "learning_rate": 4.0698669932214727e-05,
+      "loss": 2.700690507888794,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.89001,
+      "step": 712,
+      "tokens/total": 9936000,
+      "tokens/train_per_sec_per_gpu": 2682.83,
+      "tokens/trainable": 4111587
+    },
+    {
+      "epoch": 0.34776246799170835,
+      "grad_norm": 0.13892816007137299,
+      "learning_rate": 4.043819920791322e-05,
+      "loss": 2.448453426361084,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.57044,
+      "step": 713,
+      "tokens/total": 9951360,
+      "tokens/train_per_sec_per_gpu": 3127.06,
+      "tokens/trainable": 4118516
+    },
+    {
+      "epoch": 0.3482502133886111,
+      "grad_norm": 0.1610805243253708,
+      "learning_rate": 4.0178353258717804e-05,
+      "loss": 2.5162341594696045,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 12.38188,
+      "step": 714,
+      "tokens/total": 9966336,
+      "tokens/train_per_sec_per_gpu": 1221.89,
+      "tokens/trainable": 4124094
+    },
+    {
+      "epoch": 0.34873795878551395,
+      "grad_norm": 0.14121432602405548,
+      "learning_rate": 3.991913481028965e-05,
+      "loss": 2.4478676319122314,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.56366,
+      "step": 715,
+      "tokens/total": 9980544,
+      "tokens/train_per_sec_per_gpu": 2758.01,
+      "tokens/trainable": 4130649
+    },
+    {
+      "epoch": 0.3492257041824168,
+      "grad_norm": 0.14568425714969635,
+      "learning_rate": 3.966054658170754e-05,
+      "loss": 2.542024612426758,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.70537,
+      "step": 716,
+      "tokens/total": 9995008,
+      "tokens/train_per_sec_per_gpu": 1350.01,
+      "tokens/trainable": 4137414
+    },
+    {
+      "epoch": 0.3497134495793196,
+      "grad_norm": 0.14108124375343323,
+      "learning_rate": 3.940259128543967e-05,
+      "loss": 2.3504650592803955,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 10.49045,
+      "step": 717,
+      "tokens/total": 10008960,
+      "tokens/train_per_sec_per_gpu": 793.29,
+      "tokens/trainable": 4143920
+    },
+    {
+      "epoch": 0.35020119497622243,
+      "grad_norm": 0.16065354645252228,
+      "learning_rate": 3.9145271627314986e-05,
+      "loss": 2.477329969406128,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 13.41,
+      "memory/max_allocated (GiB)": 13.41,
+      "ppl": 11.90942,
+      "step": 718,
+      "tokens/total": 10021120,
+      "tokens/train_per_sec_per_gpu": 2102.68,
+      "tokens/trainable": 4148865
+    },
+    {
+      "epoch": 0.3506889403731252,
+      "grad_norm": 0.145247220993042,
+      "learning_rate": 3.8888590306494974e-05,
+      "loss": 2.375197410583496,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 10.75314,
+      "step": 719,
+      "tokens/total": 10034944,
+      "tokens/train_per_sec_per_gpu": 1469.38,
+      "tokens/trainable": 4154874
+    },
+    {
+      "epoch": 0.35117668577002803,
+      "grad_norm": 0.13928496837615967,
+      "learning_rate": 3.8632550015445256e-05,
+      "loss": 2.509256601333618,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.29579,
+      "step": 720,
+      "tokens/total": 10048000,
+      "tokens/train_per_sec_per_gpu": 1983.62,
+      "tokens/trainable": 4162436
+    },
+    {
+      "epoch": 0.35166443116693086,
+      "grad_norm": 0.1396213173866272,
+      "learning_rate": 3.8377153439907266e-05,
+      "loss": 2.288262367248535,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 9.85779,
+      "step": 721,
+      "tokens/total": 10062080,
+      "tokens/train_per_sec_per_gpu": 1594.6,
+      "tokens/trainable": 4168778
+    },
+    {
+      "epoch": 0.3521521765638337,
+      "grad_norm": 0.17178580164909363,
+      "learning_rate": 3.81224032588703e-05,
+      "loss": 2.3965115547180176,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 10.98479,
+      "step": 722,
+      "tokens/total": 10075392,
+      "tokens/train_per_sec_per_gpu": 916.45,
+      "tokens/trainable": 4172899
+    },
+    {
+      "epoch": 0.3526399219607365,
+      "grad_norm": 0.19447971880435944,
+      "learning_rate": 3.786830214454315e-05,
+      "loss": 2.4896106719970703,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.05658,
+      "step": 723,
+      "tokens/total": 10090112,
+      "tokens/train_per_sec_per_gpu": 988.46,
+      "tokens/trainable": 4176781
+    },
+    {
+      "epoch": 0.3531276673576393,
+      "grad_norm": 0.1555708646774292,
+      "learning_rate": 3.7614852762326305e-05,
+      "loss": 2.4370362758636475,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.43909,
+      "step": 724,
+      "tokens/total": 10104832,
+      "tokens/train_per_sec_per_gpu": 2862.85,
+      "tokens/trainable": 4182322
+    },
+    {
+      "epoch": 0.3536154127545421,
+      "grad_norm": 0.15927904844284058,
+      "learning_rate": 3.736205777078381e-05,
+      "loss": 2.3857152462005615,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 10.86683,
+      "step": 725,
+      "tokens/total": 10118144,
+      "tokens/train_per_sec_per_gpu": 1523.11,
+      "tokens/trainable": 4187426
+    },
+    {
+      "epoch": 0.35410315815144494,
+      "grad_norm": 0.16110184788703918,
+      "learning_rate": 3.710991982161555e-05,
+      "loss": 2.508744716644287,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 13.41,
+      "memory/max_allocated (GiB)": 13.41,
+      "ppl": 12.28949,
+      "step": 726,
+      "tokens/total": 10129152,
+      "tokens/train_per_sec_per_gpu": 3551.65,
+      "tokens/trainable": 4192287
+    },
+    {
+      "epoch": 0.35459090354834777,
+      "grad_norm": 0.17562873661518097,
+      "learning_rate": 3.6858441559629306e-05,
+      "loss": 2.5916664600372314,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 13.352,
+      "step": 727,
+      "tokens/total": 10142080,
+      "tokens/train_per_sec_per_gpu": 833.09,
+      "tokens/trainable": 4196474
+    },
+    {
+      "epoch": 0.3550786489452506,
+      "grad_norm": 0.16362859308719635,
+      "learning_rate": 3.6607625622713e-05,
+      "loss": 2.5120694637298584,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.33042,
+      "step": 728,
+      "tokens/total": 10156160,
+      "tokens/train_per_sec_per_gpu": 1357.25,
+      "tokens/trainable": 4201530
+    },
+    {
+      "epoch": 0.35556639434215337,
+      "grad_norm": 0.18990835547447205,
+      "learning_rate": 3.63574746418072e-05,
+      "loss": 2.468756914138794,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.80776,
+      "step": 729,
+      "tokens/total": 10169728,
+      "tokens/train_per_sec_per_gpu": 2009.19,
+      "tokens/trainable": 4205098
+    },
+    {
+      "epoch": 0.3560541397390562,
+      "grad_norm": 0.13236725330352783,
+      "learning_rate": 3.610799124087725e-05,
+      "loss": 2.67596435546875,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.52635,
+      "step": 730,
+      "tokens/total": 10185344,
+      "tokens/train_per_sec_per_gpu": 3279.33,
+      "tokens/trainable": 4214000
+    },
+    {
+      "epoch": 0.356541885135959,
+      "grad_norm": 0.1518988162279129,
+      "learning_rate": 3.585917803688603e-05,
+      "loss": 2.535468101501465,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 12.62234,
+      "step": 731,
+      "tokens/total": 10199296,
+      "tokens/train_per_sec_per_gpu": 646.5,
+      "tokens/trainable": 4219880
+    },
+    {
+      "epoch": 0.35702963053286185,
+      "grad_norm": 0.18351151049137115,
+      "learning_rate": 3.5611037639766265e-05,
+      "loss": 2.455716371536255,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.65478,
+      "step": 732,
+      "tokens/total": 10214272,
+      "tokens/train_per_sec_per_gpu": 2191.87,
+      "tokens/trainable": 4224167
+    },
+    {
+      "epoch": 0.3575173759297647,
+      "grad_norm": 0.1563291698694229,
+      "learning_rate": 3.5363572652393326e-05,
+      "loss": 2.5146679878234863,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 12.3625,
+      "step": 733,
+      "tokens/total": 10227712,
+      "tokens/train_per_sec_per_gpu": 2745.69,
+      "tokens/trainable": 4230596
+    },
+    {
+      "epoch": 0.35800512132666745,
+      "grad_norm": 0.15779973566532135,
+      "learning_rate": 3.511678567055786e-05,
+      "loss": 2.7565038204193115,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 15.7447,
+      "step": 734,
+      "tokens/total": 10241152,
+      "tokens/train_per_sec_per_gpu": 3290.48,
+      "tokens/trainable": 4236327
+    },
+    {
+      "epoch": 0.3584928667235703,
+      "grad_norm": 0.14905805885791779,
+      "learning_rate": 3.487067928293848e-05,
+      "loss": 2.6842727661132812,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 14.64755,
+      "step": 735,
+      "tokens/total": 10255232,
+      "tokens/train_per_sec_per_gpu": 3533.74,
+      "tokens/trainable": 4242429
+    },
+    {
+      "epoch": 0.3589806121204731,
+      "grad_norm": 0.1586298942565918,
+      "learning_rate": 3.4625256071074773e-05,
+      "loss": 2.7677407264709473,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 15.92262,
+      "step": 736,
+      "tokens/total": 10269056,
+      "tokens/train_per_sec_per_gpu": 2718.39,
+      "tokens/trainable": 4248250
+    },
+    {
+      "epoch": 0.35946835751737594,
+      "grad_norm": 0.14619523286819458,
+      "learning_rate": 3.4380518609340076e-05,
+      "loss": 2.4541380405426025,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.6364,
+      "step": 737,
+      "tokens/total": 10283520,
+      "tokens/train_per_sec_per_gpu": 1616.46,
+      "tokens/trainable": 4254038
+    },
+    {
+      "epoch": 0.35995610291427876,
+      "grad_norm": 0.1348477602005005,
+      "learning_rate": 3.4136469464914575e-05,
+      "loss": 2.392076015472412,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 10.93617,
+      "step": 738,
+      "tokens/total": 10297216,
+      "tokens/train_per_sec_per_gpu": 2858.04,
+      "tokens/trainable": 4260608
+    },
+    {
+      "epoch": 0.36044384831118154,
+      "grad_norm": 0.16576313972473145,
+      "learning_rate": 3.389311119775828e-05,
+      "loss": 2.443544864654541,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.51378,
+      "step": 739,
+      "tokens/total": 10312576,
+      "tokens/train_per_sec_per_gpu": 2672.63,
+      "tokens/trainable": 4265873
+    },
+    {
+      "epoch": 0.36093159370808436,
+      "grad_norm": 0.1353883296251297,
+      "learning_rate": 3.3650446360584275e-05,
+      "loss": 2.4599642753601074,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.70439,
+      "step": 740,
+      "tokens/total": 10327552,
+      "tokens/train_per_sec_per_gpu": 2203.07,
+      "tokens/trainable": 4273202
+    },
+    {
+      "epoch": 0.3614193391049872,
+      "grad_norm": 0.12930633127689362,
+      "learning_rate": 3.340847749883191e-05,
+      "loss": 2.54553484916687,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.75005,
+      "step": 741,
+      "tokens/total": 10342656,
+      "tokens/train_per_sec_per_gpu": 1942.69,
+      "tokens/trainable": 4280837
+    },
+    {
+      "epoch": 0.36190708450189,
+      "grad_norm": 0.15681229531764984,
+      "learning_rate": 3.316720715064e-05,
+      "loss": 2.5012075901031494,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 12.19721,
+      "step": 742,
+      "tokens/total": 10355200,
+      "tokens/train_per_sec_per_gpu": 1917.69,
+      "tokens/trainable": 4285994
+    },
+    {
+      "epoch": 0.36239482989879285,
+      "grad_norm": 0.1448935866355896,
+      "learning_rate": 3.292663784682036e-05,
+      "loss": 2.5044198036193848,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.23646,
+      "step": 743,
+      "tokens/total": 10369280,
+      "tokens/train_per_sec_per_gpu": 1901.75,
+      "tokens/trainable": 4292179
+    },
+    {
+      "epoch": 0.3628825752956956,
+      "grad_norm": 0.15751656889915466,
+      "learning_rate": 3.268677211083109e-05,
+      "loss": 2.60463547706604,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 13.52629,
+      "step": 744,
+      "tokens/total": 10382336,
+      "tokens/train_per_sec_per_gpu": 3480.31,
+      "tokens/trainable": 4297990
+    },
+    {
+      "epoch": 0.36337032069259845,
+      "grad_norm": 0.12639038264751434,
+      "learning_rate": 3.2447612458750365e-05,
+      "loss": 2.4025323390960693,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 11.05113,
+      "step": 745,
+      "tokens/total": 10396416,
+      "tokens/train_per_sec_per_gpu": 1472.7,
+      "tokens/trainable": 4305977
+    },
+    {
+      "epoch": 0.3638580660895013,
+      "grad_norm": 0.1874016672372818,
+      "learning_rate": 3.2209161399249674e-05,
+      "loss": 2.3843958377838135,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.8525,
+      "step": 746,
+      "tokens/total": 10410112,
+      "tokens/train_per_sec_per_gpu": 2110.82,
+      "tokens/trainable": 4309531
+    },
+    {
+      "epoch": 0.3643458114864041,
+      "grad_norm": 0.15403391420841217,
+      "learning_rate": 3.197142143356787e-05,
+      "loss": 2.4408397674560547,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.18,
+      "memory/max_allocated (GiB)": 15.18,
+      "ppl": 11.48268,
+      "step": 747,
+      "tokens/total": 10423936,
+      "tokens/train_per_sec_per_gpu": 891.14,
+      "tokens/trainable": 4314760
+    },
+    {
+      "epoch": 0.36483355688330693,
+      "grad_norm": 0.14246027171611786,
+      "learning_rate": 3.173439505548462e-05,
+      "loss": 2.531158447265625,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 12.56806,
+      "step": 748,
+      "tokens/total": 10438016,
+      "tokens/train_per_sec_per_gpu": 2604.33,
+      "tokens/trainable": 4321155
+    },
+    {
+      "epoch": 0.3653213022802097,
+      "grad_norm": 0.18709643185138702,
+      "learning_rate": 3.149808475129452e-05,
+      "loss": 2.3285350799560547,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.2629,
+      "step": 749,
+      "tokens/total": 10453760,
+      "tokens/train_per_sec_per_gpu": 703.15,
+      "tokens/trainable": 4325157
+    },
+    {
+      "epoch": 0.36580904767711253,
+      "grad_norm": 0.1710476279258728,
+      "learning_rate": 3.126249299978086e-05,
+      "loss": 2.375027656555176,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.75131,
+      "step": 750,
+      "tokens/total": 10467328,
+      "tokens/train_per_sec_per_gpu": 2769.28,
+      "tokens/trainable": 4329291
+    },
+    {
+      "epoch": 0.36580904767711253,
+      "eval_loss": 2.4998860359191895,
+      "eval_ppl": 12.18111,
+      "eval_runtime": 6.0497,
+      "eval_samples_per_second": 33.06,
+      "eval_steps_per_second": 16.53,
+      "memory/device_reserved (GiB)": 49.08,
+      "memory/max_active (GiB)": 11.76,
+      "memory/max_allocated (GiB)": 11.76,
+      "step": 750
     }
   ],
   "logging_steps": 1,
       "attributes": {}
     }
   },
+  "total_flos": 1.287529657836503e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
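`trainer_state.json` is the Hugging Face `Trainer` state file. Its `log_history` list holds one entry per logging event (every step here, since `logging_steps` is 1) plus an evaluation entry every 50 steps (`eval_steps`). The logged `ppl` values match `exp(loss)` to the printed precision: for example, exp(2.4999) ≈ 12.181 for the step-750 evaluation. Between step 701 and step 750, `epoch` rises by about 0.0239, which implies roughly 2,050 optimizer steps per epoch. The sketch below recovers these numbers. The helper names are hypothetical, and the inline sample reuses three entries from the diff above.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def load_log_history(path: str | Path) -> list[dict]:
    """Read the per-step metrics list from a Trainer state file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


def split_logs(history: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate training-step entries (with 'loss') from eval entries (with 'eval_loss')."""
    train = [e for e in history if "loss" in e]
    evals = [e for e in history if "eval_loss" in e]
    return train, evals


def perplexity(loss: float) -> float:
    """Perplexity of a mean per-token cross-entropy measured in nats."""
    return math.exp(loss)


def steps_per_epoch(first: dict, last: dict) -> float:
    """Estimate optimizer steps per epoch from the step/epoch fields of two entries."""
    return (last["step"] - first["step"]) / (last["epoch"] - first["epoch"])


# Entries copied from the step-750 trainer_state.json shown above
sample_history = [
    {"epoch": 0.3419095232288745, "loss": 2.678558111190796, "ppl": 14.56408, "step": 701},
    {"epoch": 0.36580904767711253, "loss": 2.375027656555176, "ppl": 10.75131, "step": 750},
    {"epoch": 0.36580904767711253, "eval_loss": 2.4998860359191895, "eval_ppl": 12.18111, "step": 750},
]

train, evals = split_logs(sample_history)
spe = steps_per_epoch(train[0], train[-1])
print(f"about {spe:.0f} optimizer steps per epoch")
for e in evals:
    print(f"step {e['step']}: eval_loss={e['eval_loss']:.4f}, ppl={perplexity(e['eval_loss']):.3f}")
```

On the real file, `split_logs(load_log_history("last-checkpoint/trainer_state.json"))` yields the full training and evaluation series for plotting or smoothing.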