Instructions for using AiAF/rp-2b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use AiAF/rp-2b with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base_model, "AiAF/rp-2b")

Transformers

How to use AiAF/rp-2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAF/rp-2b")
model = AutoModelForCausalLM.from_pretrained("AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AiAF/rp-2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AiAF/rp-2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAF/rp-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
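
The same chat-completions call can be made from Python instead of curl. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `send` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="AiAF/rp-2b", base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network; only send() does.
request = build_chat_request("What is the capital of France?")
print(request.get_method(), request.full_url)
```

With a server running, `send(request)` should return the decoded JSON reply. In the OpenAI-compatible format the generated text is expected under `choices[0]["message"]["content"]`. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.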

Use Docker

docker model run hf.co/AiAF/rp-2b

SGLang

How to use AiAF/rp-2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAF/rp-2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAF/rp-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAF/rp-2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAF/rp-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AiAF/rp-2b with Docker Model Runner:
```
docker model run hf.co/AiAF/rp-2b
```

AiAF committed on Mar 30

Commit 3928ed9 · verified · 1 parent: b7c51fa

Training in progress, step 700, checkpoint

Files changed (6)

last-checkpoint/adapter_model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/tokens_state.json +1 -1
last-checkpoint/trainer_state.json +715 -3

last-checkpoint/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f324031f7a5f64d996914a53f616440d3542f7bfd1d1bd047c6bf8351b781971
+oid sha256:59badcbc1d668a371853f284e39f9ca33e2fe2af68b773148163044bb0f70bdd
 size 102264160

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:67210f76ed9c029d6ef5061227adf74372a001ffef8daca3ef3d136da719dbb9
+oid sha256:49665f71e34f2a3db3bbae94d41e9706d6e4267d7bf49d604935f42728af0512
 size 52162827

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4a1d3c69b35f53b118782dd94c78466c7746e86131456f7cde73b319b454bd68
+oid sha256:b2376b84b97294d583dff60749feb13d6533baf27b96b9a245af922803baac53
 size 14645

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c5592f46c154d334eaa5a16d750cd9060bfacc1786394b3ad334927a4f8e7542
+oid sha256:1fccf8d05f51ee90d9abfa90ec4fa092bb34ce369846454436f6371151204846
 size 1465
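
The four binary checkpoint files above are tracked as Git LFS pointers, so each diff changes only the `oid` line while `size` stays the same. As a rough sketch, a pointer file can be parsed as space-separated `key value` lines. The function name `parse_lfs_pointer` and the returned field names are illustrative choices, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", e.g. "size 102264160".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New pointer for last-checkpoint/adapter_model.safetensors, copied from the diff.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:59badcbc1d668a371853f284e39f9ca33e2fe2af68b773148163044bb0f70bdd
size 102264160
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

Comparing the parsed `digest` of two pointers is enough to tell whether the underlying file changed. The unchanged `size` is what you would expect when tensors of fixed shape are overwritten with new values.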

last-checkpoint/tokens_state.json CHANGED

@@ -1 +1 @@
-{"total": 9076224, "trainable": 3746515}
+{"total": 9769472, "trainable": 4042644}
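
To put the two `tokens_state.json` snapshots in perspective, the sketch below computes how many tokens were added between the step-650 and step-700 checkpoints. Reading `trainable` as the tokens that count toward the loss is an assumption based on the field name, since the diff does not define it. `token_delta` is an illustrative helper.

```python
import json

# Old and new contents of tokens_state.json, copied from the diff.
before = json.loads('{"total": 9076224, "trainable": 3746515}')
after = json.loads('{"total": 9769472, "trainable": 4042644}')

# The parent checkpoint was at step 650 and this one is at step 700.
STEPS_BETWEEN_CHECKPOINTS = 700 - 650


def token_delta(old, new):
    """Return the per-field increase between two token-count snapshots."""
    return {key: new[key] - old[key] for key in new}


delta = token_delta(before, after)
trainable_share = delta["trainable"] / delta["total"]

print(delta)
print(f"trainable share of new tokens: {trainable_share:.1%}")
print(f"total tokens per step: {delta['total'] / STEPS_BETWEEN_CHECKPOINTS:.0f}")
```

The per-step figure is an average over the 50 steps. The individual `tokens/total` entries in `trainer_state.json` show that the count varies from step to step.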

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.31703450798683086,
   "eval_steps": 50,
-  "global_step": 650,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -9276,6 +9276,718 @@
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 650
     }
   ],
   "logging_steps": 1,
@@ -9295,7 +10007,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.1164174449455923e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.3414217778319717,
   "eval_steps": 50,
+  "global_step": 700,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "memory/max_active (GiB)": 11.76,
       "memory/max_allocated (GiB)": 11.76,
       "step": 650
+    },
+    {
+      "epoch": 0.3175222533837337,
+      "grad_norm": 0.1710362732410431,
+      "learning_rate": 5.765665457425102e-05,
+      "loss": 2.4334278106689453,
+      "memory/device_reserved (GiB)": 25.88,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 11.39788,
+      "step": 651,
+      "tokens/total": 9089664,
+      "tokens/train_per_sec_per_gpu": 899.25,
+      "tokens/trainable": 3750762
+    },
+    {
+      "epoch": 0.3180099987806365,
+      "grad_norm": 0.14897273480892181,
+      "learning_rate": 5.736346951157544e-05,
+      "loss": 2.455512523651123,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.6524,
+      "step": 652,
+      "tokens/total": 9104128,
+      "tokens/train_per_sec_per_gpu": 3318.27,
+      "tokens/trainable": 3756193
+    },
+    {
+      "epoch": 0.31849774417753934,
+      "grad_norm": 0.13437563180923462,
+      "learning_rate": 5.707073168592942e-05,
+      "loss": 2.4900941848754883,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 12.06241,
+      "step": 653,
+      "tokens/total": 9118592,
+      "tokens/train_per_sec_per_gpu": 1940.99,
+      "tokens/trainable": 3763520
+    },
+    {
+      "epoch": 0.3189854895744421,
+      "grad_norm": 0.153215691447258,
+      "learning_rate": 5.677844416799424e-05,
+      "loss": 2.5800952911376953,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.1984,
+      "step": 654,
+      "tokens/total": 9133824,
+      "tokens/train_per_sec_per_gpu": 2126.06,
+      "tokens/trainable": 3769074
+    },
+    {
+      "epoch": 0.31947323497134494,
+      "grad_norm": 0.14280448853969574,
+      "learning_rate": 5.648661002372768e-05,
+      "loss": 2.5871028900146484,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 13.29121,
+      "step": 655,
+      "tokens/total": 9147904,
+      "tokens/train_per_sec_per_gpu": 3428.97,
+      "tokens/trainable": 3776266
+    },
+    {
+      "epoch": 0.31996098036824777,
+      "grad_norm": 0.1566459834575653,
+      "learning_rate": 5.6195232314331766e-05,
+      "loss": 2.5909602642059326,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 13.34258,
+      "step": 656,
+      "tokens/total": 9161344,
+      "tokens/train_per_sec_per_gpu": 1993.38,
+      "tokens/trainable": 3782089
+    },
+    {
+      "epoch": 0.3204487257651506,
+      "grad_norm": 0.16187436878681183,
+      "learning_rate": 5.590431409622081e-05,
+      "loss": 2.4071998596191406,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.10283,
+      "step": 657,
+      "tokens/total": 9175040,
+      "tokens/train_per_sec_per_gpu": 390.92,
+      "tokens/trainable": 3786753
+    },
+    {
+      "epoch": 0.3209364711620534,
+      "grad_norm": 0.17244231700897217,
+      "learning_rate": 5.56138584209893e-05,
+      "loss": 2.428713083267212,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 11.34427,
+      "step": 658,
+      "tokens/total": 9187328,
+      "tokens/train_per_sec_per_gpu": 2066.3,
+      "tokens/trainable": 3791029
+    },
+    {
+      "epoch": 0.3214242165589562,
+      "grad_norm": 0.14595621824264526,
+      "learning_rate": 5.532386833537977e-05,
+      "loss": 2.5427656173706055,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 12.71479,
+      "step": 659,
+      "tokens/total": 9199872,
+      "tokens/train_per_sec_per_gpu": 3074.28,
+      "tokens/trainable": 3797470
+    },
+    {
+      "epoch": 0.321911961955859,
+      "grad_norm": 0.1848934441804886,
+      "learning_rate": 5.503434688125104e-05,
+      "loss": 2.55776309967041,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.09,
+      "memory/max_allocated (GiB)": 15.09,
+      "ppl": 12.90691,
+      "step": 660,
+      "tokens/total": 9213184,
+      "tokens/train_per_sec_per_gpu": 236.15,
+      "tokens/trainable": 3801347
+    },
+    {
+      "epoch": 0.32239970735276186,
+      "grad_norm": 0.2647537291049957,
+      "learning_rate": 5.474529709554612e-05,
+      "loss": 2.4955523014068604,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 14.3,
+      "memory/max_allocated (GiB)": 14.3,
+      "ppl": 12.12843,
+      "step": 661,
+      "tokens/total": 9225984,
+      "tokens/train_per_sec_per_gpu": 1517.0,
+      "tokens/trainable": 3807345
+    },
+    {
+      "epoch": 0.3228874527496647,
+      "grad_norm": 0.16561807692050934,
+      "learning_rate": 5.445672201026054e-05,
+      "loss": 2.59391450881958,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 12.97,
+      "memory/max_allocated (GiB)": 12.97,
+      "ppl": 13.38205,
+      "step": 662,
+      "tokens/total": 9237504,
+      "tokens/train_per_sec_per_gpu": 3075.71,
+      "tokens/trainable": 3812751
+    },
+    {
+      "epoch": 0.3233751981465675,
+      "grad_norm": 0.12832888960838318,
+      "learning_rate": 5.416862465241033e-05,
+      "loss": 2.464712619781494,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.7601,
+      "step": 663,
+      "tokens/total": 9251968,
+      "tokens/train_per_sec_per_gpu": 1501.25,
+      "tokens/trainable": 3820648
+    },
+    {
+      "epoch": 0.3238629435434703,
+      "grad_norm": 0.11801256984472275,
+      "learning_rate": 5.388100804400049e-05,
+      "loss": 2.523484230041504,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.47198,
+      "step": 664,
+      "tokens/total": 9267200,
+      "tokens/train_per_sec_per_gpu": 3130.58,
+      "tokens/trainable": 3830437
+    },
+    {
+      "epoch": 0.3243506889403731,
+      "grad_norm": 0.12580764293670654,
+      "learning_rate": 5.3593875201993174e-05,
+      "loss": 2.391364336013794,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.92839,
+      "step": 665,
+      "tokens/total": 9282944,
+      "tokens/train_per_sec_per_gpu": 1146.11,
+      "tokens/trainable": 3838657
+    },
+    {
+      "epoch": 0.32483843433727594,
+      "grad_norm": 0.13414451479911804,
+      "learning_rate": 5.3307229138275936e-05,
+      "loss": 2.372819662094116,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.7276,
+      "step": 666,
+      "tokens/total": 9297920,
+      "tokens/train_per_sec_per_gpu": 1654.91,
+      "tokens/trainable": 3845393
+    },
+    {
+      "epoch": 0.32532617973417877,
+      "grad_norm": 0.13741862773895264,
+      "learning_rate": 5.302107285963045e-05,
+      "loss": 2.618802309036255,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 13.71928,
+      "step": 667,
+      "tokens/total": 9311360,
+      "tokens/train_per_sec_per_gpu": 2520.66,
+      "tokens/trainable": 3852665
+    },
+    {
+      "epoch": 0.3258139251310816,
+      "grad_norm": 0.12451744079589844,
+      "learning_rate": 5.273540936770058e-05,
+      "loss": 2.497060775756836,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.14674,
+      "step": 668,
+      "tokens/total": 9325952,
+      "tokens/train_per_sec_per_gpu": 2434.46,
+      "tokens/trainable": 3860517
+    },
+    {
+      "epoch": 0.32630167052798437,
+      "grad_norm": 0.14122170209884644,
+      "learning_rate": 5.245024165896126e-05,
+      "loss": 2.5780842304229736,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 13.17188,
+      "step": 669,
+      "tokens/total": 9340928,
+      "tokens/train_per_sec_per_gpu": 2394.1,
+      "tokens/trainable": 3867023
+    },
+    {
+      "epoch": 0.3267894159248872,
+      "grad_norm": 0.1293308287858963,
+      "learning_rate": 5.2165572724686754e-05,
+      "loss": 2.517449140548706,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 12.39693,
+      "step": 670,
+      "tokens/total": 9355392,
+      "tokens/train_per_sec_per_gpu": 2504.15,
+      "tokens/trainable": 3874779
+    },
+    {
+      "epoch": 0.32727716132179,
+      "grad_norm": 0.1419108510017395,
+      "learning_rate": 5.1881405550919493e-05,
+      "loss": 2.5625345706939697,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 12.96865,
+      "step": 671,
+      "tokens/total": 9369600,
+      "tokens/train_per_sec_per_gpu": 2788.16,
+      "tokens/trainable": 3881901
+    },
+    {
+      "epoch": 0.32776490671869285,
+      "grad_norm": 0.15044128894805908,
+      "learning_rate": 5.1597743118438726e-05,
+      "loss": 2.6445508003234863,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 14.07712,
+      "step": 672,
+      "tokens/total": 9383808,
+      "tokens/train_per_sec_per_gpu": 2152.92,
+      "tokens/trainable": 3887683
+    },
+    {
+      "epoch": 0.3282526521155957,
+      "grad_norm": 0.13477809727191925,
+      "learning_rate": 5.1314588402729044e-05,
+      "loss": 2.459366798400879,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 11.6974,
+      "step": 673,
+      "tokens/total": 9397376,
+      "tokens/train_per_sec_per_gpu": 1205.32,
+      "tokens/trainable": 3894517
+    },
+    {
+      "epoch": 0.32874039751249845,
+      "grad_norm": 0.16951484978199005,
+      "learning_rate": 5.103194437394952e-05,
+      "loss": 2.6503396034240723,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 14.15885,
+      "step": 674,
+      "tokens/total": 9410176,
+      "tokens/train_per_sec_per_gpu": 402.56,
+      "tokens/trainable": 3898680
+    },
+    {
+      "epoch": 0.3292281429094013,
+      "grad_norm": 0.14310821890830994,
+      "learning_rate": 5.074981399690218e-05,
+      "loss": 2.5750184059143066,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.09,
+      "memory/max_allocated (GiB)": 15.09,
+      "ppl": 13.13156,
+      "step": 675,
+      "tokens/total": 9423360,
+      "tokens/train_per_sec_per_gpu": 1274.61,
+      "tokens/trainable": 3904449
+    },
+    {
+      "epoch": 0.3297158883063041,
+      "grad_norm": 0.14187775552272797,
+      "learning_rate": 5.0468200231001286e-05,
+      "loss": 2.58148455619812,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.21674,
+      "step": 676,
+      "tokens/total": 9438080,
+      "tokens/train_per_sec_per_gpu": 808.62,
+      "tokens/trainable": 3910987
+    },
+    {
+      "epoch": 0.33020363370320693,
+      "grad_norm": 0.16191494464874268,
+      "learning_rate": 5.018710603024187e-05,
+      "loss": 2.709486484527588,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.98,
+      "memory/max_allocated (GiB)": 15.98,
+      "ppl": 15.02156,
+      "step": 677,
+      "tokens/total": 9452672,
+      "tokens/train_per_sec_per_gpu": 2706.9,
+      "tokens/trainable": 3917526
+    },
+    {
+      "epoch": 0.33069137910010976,
+      "grad_norm": 0.1644907146692276,
+      "learning_rate": 4.9906534343169144e-05,
+      "loss": 2.374467372894287,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.74529,
+      "step": 678,
+      "tokens/total": 9466624,
+      "tokens/train_per_sec_per_gpu": 2207.07,
+      "tokens/trainable": 3922866
+    },
+    {
+      "epoch": 0.33117912449701253,
+      "grad_norm": 0.14618027210235596,
+      "learning_rate": 4.962648811284738e-05,
+      "loss": 2.3652446269989014,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 10.64664,
+      "step": 679,
+      "tokens/total": 9481216,
+      "tokens/train_per_sec_per_gpu": 2138.53,
+      "tokens/trainable": 3929184
+    },
+    {
+      "epoch": 0.33166686989391536,
+      "grad_norm": 0.13205529749393463,
+      "learning_rate": 4.934697027682894e-05,
+      "loss": 2.431748867034912,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 11.37876,
+      "step": 680,
+      "tokens/total": 9496064,
+      "tokens/train_per_sec_per_gpu": 2242.79,
+      "tokens/trainable": 3937050
+    },
+    {
+      "epoch": 0.3321546152908182,
+      "grad_norm": 0.15257883071899414,
+      "learning_rate": 4.9067983767123736e-05,
+      "loss": 2.757232666015625,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 15.75618,
+      "step": 681,
+      "tokens/total": 9509504,
+      "tokens/train_per_sec_per_gpu": 1140.87,
+      "tokens/trainable": 3943613
+    },
+    {
+      "epoch": 0.332642360687721,
+      "grad_norm": 0.14146484434604645,
+      "learning_rate": 4.8789531510168163e-05,
+      "loss": 2.426405191421509,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.31812,
+      "step": 682,
+      "tokens/total": 9523072,
+      "tokens/train_per_sec_per_gpu": 1019.98,
+      "tokens/trainable": 3950129
+    },
+    {
+      "epoch": 0.33313010608462384,
+      "grad_norm": 0.13973264396190643,
+      "learning_rate": 4.851161642679466e-05,
+      "loss": 2.4615488052368164,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 11.72295,
+      "step": 683,
+      "tokens/total": 9536768,
+      "tokens/train_per_sec_per_gpu": 2048.1,
+      "tokens/trainable": 3956791
+    },
+    {
+      "epoch": 0.3336178514815266,
+      "grad_norm": 0.1458214819431305,
+      "learning_rate": 4.8234241432200965e-05,
+      "loss": 2.595818519592285,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 13.86,
+      "memory/max_allocated (GiB)": 13.86,
+      "ppl": 13.40756,
+      "step": 684,
+      "tokens/total": 9548672,
+      "tokens/train_per_sec_per_gpu": 3464.81,
+      "tokens/trainable": 3963305
+    },
+    {
+      "epoch": 0.33410559687842944,
+      "grad_norm": 0.18916182219982147,
+      "learning_rate": 4.795740943591955e-05,
+      "loss": 2.2587080001831055,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 9.57072,
+      "step": 685,
+      "tokens/total": 9562496,
+      "tokens/train_per_sec_per_gpu": 324.45,
+      "tokens/trainable": 3966822
+    },
+    {
+      "epoch": 0.3345933422753323,
+      "grad_norm": 0.14918474853038788,
+      "learning_rate": 4.768112334178699e-05,
+      "loss": 2.419451951980591,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 11.2397,
+      "step": 686,
+      "tokens/total": 9577472,
+      "tokens/train_per_sec_per_gpu": 2453.15,
+      "tokens/trainable": 3972755
+    },
+    {
+      "epoch": 0.3350810876722351,
+      "grad_norm": 0.19332581758499146,
+      "learning_rate": 4.74053860479137e-05,
+      "loss": 2.4659688472747803,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 11.77488,
+      "step": 687,
+      "tokens/total": 9593344,
+      "tokens/train_per_sec_per_gpu": 1020.5,
+      "tokens/trainable": 3977603
+    },
+    {
+      "epoch": 0.33556883306913793,
+      "grad_norm": 0.23045431077480316,
+      "learning_rate": 4.7130200446653475e-05,
+      "loss": 2.4577431678771973,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.67843,
+      "step": 688,
+      "tokens/total": 9604992,
+      "tokens/train_per_sec_per_gpu": 354.7,
+      "tokens/trainable": 3980037
+    },
+    {
+      "epoch": 0.3360565784660407,
+      "grad_norm": 0.13135406374931335,
+      "learning_rate": 4.6855569424572955e-05,
+      "loss": 2.274285316467285,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 9.72097,
+      "step": 689,
+      "tokens/total": 9620096,
+      "tokens/train_per_sec_per_gpu": 2860.29,
+      "tokens/trainable": 3988505
+    },
+    {
+      "epoch": 0.33654432386294353,
+      "grad_norm": 0.17153258621692657,
+      "learning_rate": 4.65814958624217e-05,
+      "loss": 2.4314942359924316,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 11.37587,
+      "step": 690,
+      "tokens/total": 9631488,
+      "tokens/train_per_sec_per_gpu": 1934.44,
+      "tokens/trainable": 3992936
+    },
+    {
+      "epoch": 0.33703206925984636,
+      "grad_norm": 0.14782929420471191,
+      "learning_rate": 4.630798263510162e-05,
+      "loss": 2.453141689300537,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 14.74,
+      "memory/max_allocated (GiB)": 14.74,
+      "ppl": 11.62481,
+      "step": 691,
+      "tokens/total": 9644544,
+      "tokens/train_per_sec_per_gpu": 2826.18,
+      "tokens/trainable": 3998788
+    },
+    {
+      "epoch": 0.3375198146567492,
+      "grad_norm": 0.20830675959587097,
+      "learning_rate": 4.6035032611637094e-05,
+      "loss": 2.244356393814087,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 9.43434,
+      "step": 692,
+      "tokens/total": 9659776,
+      "tokens/train_per_sec_per_gpu": 553.59,
+      "tokens/trainable": 4001704
+    },
+    {
+      "epoch": 0.338007560053652,
+      "grad_norm": 0.13705019652843475,
+      "learning_rate": 4.5762648655144666e-05,
+      "loss": 2.3326563835144043,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.42,
+      "memory/max_allocated (GiB)": 16.42,
+      "ppl": 10.30528,
+      "step": 693,
+      "tokens/total": 9675008,
+      "tokens/train_per_sec_per_gpu": 2153.61,
+      "tokens/trainable": 4008399
+    },
+    {
+      "epoch": 0.3384953054505548,
+      "grad_norm": 0.17856715619564056,
+      "learning_rate": 4.549083362280317e-05,
+      "loss": 2.6045002937316895,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.07,
+      "memory/max_allocated (GiB)": 16.07,
+      "ppl": 13.52447,
+      "step": 694,
+      "tokens/total": 9687424,
+      "tokens/train_per_sec_per_gpu": 1395.96,
+      "tokens/trainable": 4012593
+    },
+    {
+      "epoch": 0.3389830508474576,
+      "grad_norm": 0.17393389344215393,
+      "learning_rate": 4.5219590365823714e-05,
+      "loss": 2.6212544441223145,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 13.75297,
+      "step": 695,
+      "tokens/total": 9701120,
+      "tokens/train_per_sec_per_gpu": 621.02,
+      "tokens/trainable": 4017183
+    },
+    {
+      "epoch": 0.33947079624436044,
+      "grad_norm": 0.15557512640953064,
+      "learning_rate": 4.494892172941965e-05,
+      "loss": 2.5806403160095215,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.19,
+      "memory/max_allocated (GiB)": 15.19,
+      "ppl": 13.20559,
+      "step": 696,
+      "tokens/total": 9714304,
+      "tokens/train_per_sec_per_gpu": 1263.71,
+      "tokens/trainable": 4022987
+    },
+    {
+      "epoch": 0.33995854164126327,
+      "grad_norm": 0.17690803110599518,
+      "learning_rate": 4.467883055277695e-05,
+      "loss": 2.4236552715301514,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 11.28704,
+      "step": 697,
+      "tokens/total": 9727104,
+      "tokens/train_per_sec_per_gpu": 2278.1,
+      "tokens/trainable": 4027598
+    },
+    {
+      "epoch": 0.3404462870381661,
+      "grad_norm": 0.16057541966438293,
+      "learning_rate": 4.440931966902418e-05,
+      "loss": 2.4536919593811035,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.53,
+      "memory/max_allocated (GiB)": 15.53,
+      "ppl": 11.63121,
+      "step": 698,
+      "tokens/total": 9740672,
+      "tokens/train_per_sec_per_gpu": 2386.73,
+      "tokens/trainable": 4032248
+    },
+    {
+      "epoch": 0.34093403243506887,
+      "grad_norm": 0.1556730717420578,
+      "learning_rate": 4.414039190520308e-05,
+      "loss": 2.69823956489563,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 16.51,
+      "memory/max_allocated (GiB)": 16.51,
+      "ppl": 14.85356,
+      "step": 699,
+      "tokens/total": 9755904,
+      "tokens/train_per_sec_per_gpu": 1962.29,
+      "tokens/trainable": 4038156
+    },
+    {
+      "epoch": 0.3414217778319717,
+      "grad_norm": 0.17577911913394928,
+      "learning_rate": 4.387205008223854e-05,
+      "loss": 2.6918764114379883,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 15.63,
+      "memory/max_allocated (GiB)": 15.63,
+      "ppl": 14.75934,
+      "step": 700,
+      "tokens/total": 9769472,
+      "tokens/train_per_sec_per_gpu": 893.71,
+      "tokens/trainable": 4042644
+    },
+    {
+      "epoch": 0.3414217778319717,
+      "eval_loss": 2.505732774734497,
+      "eval_ppl": 12.25253,
+      "eval_runtime": 6.0458,
+      "eval_samples_per_second": 33.081,
+      "eval_steps_per_second": 16.541,
+      "memory/device_reserved (GiB)": 37.6,
+      "memory/max_active (GiB)": 11.76,
+      "memory/max_allocated (GiB)": 11.76,
+      "step": 700
     }
   ],
   "logging_steps": 1,
       "attributes": {}
     }
   },
+  "total_flos": 1.201690148756521e+17,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
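
Two relationships in the logged values can be checked by hand. The `ppl` and `eval_ppl` fields appear to be the exponential of the corresponding loss, which assumes the loss is a mean cross-entropy in nats. Dividing `global_step` by `epoch` gives a rough estimate of the number of optimizer steps in one epoch. The sketch below uses values copied from the diff; the function names are illustrative.

```python
import math


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


def steps_per_epoch(global_step, epoch):
    """Estimate steps per epoch from a (global_step, fractional epoch) pair."""
    return global_step / epoch


# Values copied from the step-700 entries in trainer_state.json.
eval_loss = 2.505732774734497  # logged eval_ppl: 12.25253
train_loss = 2.6918764114379883  # logged ppl: 14.75934
epoch_at_700 = 0.3414217778319717

print(f"eval ppl:  {perplexity(eval_loss):.5f}")
print(f"train ppl: {perplexity(train_loss):.5f}")

estimate = steps_per_epoch(700, epoch_at_700)
print(f"steps per epoch: {estimate:.2f}")
print(f"fraction of first epoch done: {700 / estimate:.1%}")
```

Both computed perplexities match the logged values to within rounding, which supports the `exp(loss)` reading. The steps-per-epoch figure is only an estimate and holds only if the dataset and batch configuration stay fixed for the rest of the run.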