Instructions to use legesher/language-decoded-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use legesher/language-decoded-lora with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="legesher/language-decoded-lora")

# Load the model directly as a causal LM
# (loading a LoRA adapter repository through transformers requires `peft` to be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("legesher/language-decoded-lora", dtype="auto")
```
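The snippets above point at the repository root, but the commit below places this adapter under `tiny-aya-base/condition-2-ur-20k-seed42/`, and its `adapter_config.json` names `CohereLabs/tiny-aya-base` as the base model. The following is a minimal sketch of attaching that specific adapter with PEFT. It assumes `torch`, `transformers`, and `peft` are installed, that your account can download the base model (it may require accepting a license on the Hub), and that your PEFT version forwards `subfolder` to the Hub download. The helper name `load_adapter` is hypothetical.

```python
# Repository layout taken from the commit listing on this page.
BASE_MODEL = "CohereLabs/tiny-aya-base"
ADAPTER_REPO = "legesher/language-decoded-lora"
ADAPTER_SUBFOLDER = "tiny-aya-base/condition-2-ur-20k-seed42"


def load_adapter(base_model=BASE_MODEL, repo=ADAPTER_REPO, subfolder=ADAPTER_SUBFOLDER):
    """Hypothetical helper: load the base model, then attach one LoRA adapter.

    Imports are deferred so this module can be imported without the heavy
    dependencies installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The adapter folder holds only adapter files, so the tokenizer comes from the base model.
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model, dtype="auto")
    # Assumption: PeftModel.from_pretrained passes `subfolder` through to the Hub download.
    model = PeftModel.from_pretrained(base, repo, subfolder=subfolder)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter()` returns a model whose `generate` method works as with any causal LM; if a single set of merged weights is preferred, PEFT's `merge_and_unload()` folds the LoRA update into the base layers.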

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use legesher/language-decoded-lora with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "legesher/language-decoded-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
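The same completion request can be sent from Python using only the standard library. This is a sketch: `build_completion_request` and `complete` are hypothetical helper names, and the response parsing assumes the OpenAI-style `choices[0].text` field. Because the SGLang server below exposes the same endpoint, passing `http://localhost:30000` as the base URL should work there as well.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60.0):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry generated text in choices[i]["text"].
    return body["choices"][0]["text"]
```

With the server running, `complete(build_completion_request("http://localhost:8000", "legesher/language-decoded-lora", "Once upon a time,"))` returns the generated continuation as a string.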

Use Docker

```
docker model run hf.co/legesher/language-decoded-lora
```

SGLang

How to use legesher/language-decoded-lora with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "legesher/language-decoded-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "legesher/language-decoded-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Unsloth Studio

How to use legesher/language-decoded-lora with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for legesher/language-decoded-lora to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for legesher/language-decoded-lora to start chatting
```

Using Hugging Face Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for legesher/language-decoded-lora to start chatting
```

Load model with FastModel

```
pip install unsloth
```

```
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="legesher/language-decoded-lora",
    max_seq_length=2048,
)
```

Docker Model Runner
How to use legesher/language-decoded-lora with Docker Model Runner:
```
docker model run hf.co/legesher/language-decoded-lora
```

Rashik24 committed on May 19

Commit 67e7d55 · verified · 1 parent: 805922a

Upload folder using huggingface_hub

Files changed (3)

tiny-aya-base/condition-2-ur-20k-seed42/adapter_config.json +50 -0
tiny-aya-base/condition-2-ur-20k-seed42/adapter_model.safetensors +3 -0
tiny-aya-base/condition-2-ur-20k-seed42/training_metrics.json +809 -0

tiny-aya-base/condition-2-ur-20k-seed42/adapter_config.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "Cohere2ForCausalLM",
+    "parent_library": "transformers.models.cohere2.modeling_cohere2",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "CohereLabs/tiny-aya-base",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "o_proj",
+    "v_proj",
+    "up_proj",
+    "k_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
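With `use_rslora` false and `lora_dropout` at 0.0, standard LoRA adds a scaled low-rank term to each targeted projection (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`): y = W x + (lora_alpha / r) · B(A x). For this configuration the scale is 32 / 16 = 2.0; rsLoRA would use lora_alpha / sqrt(r) = 8.0 instead. The pure-Python sketch below illustrates that arithmetic on toy matrices. The matrix values and the layer sizes passed to `lora_param_count` are hypothetical, not taken from tiny-aya-base.

```python
import math

# Values copied from adapter_config.json above.
LORA_ALPHA = 32
RANK = 16


def lora_scaling(alpha, r, use_rslora=False):
    """Scale applied to the LoRA update: alpha / r, or alpha / sqrt(r) for rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scaling):
    """y = W x + scaling * B (A x); the base layer's own bias is omitted for brevity."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # A: r x d_in, B: d_out x r
    return [b + scaling * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)
```

Because PEFT initializes B to zeros when `init_lora_weights` is true, the adapter starts as an exact no-op and only the trained update changes the base model's output.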

tiny-aya-base/condition-2-ur-20k-seed42/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4be052adc7d175133e49df4744fd698b581f533681a2918b02aac85be7e692b7
+size 120981200
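The three lines above are a Git LFS pointer rather than the weights themselves: the actual file (120,981,200 bytes, about 121 MB) is stored separately and identified by its SHA-256 digest. A minimal sketch for checking a downloaded copy against that pointer with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4be052adc7d175133e49df4744fd698b581f533681a2918b02aac85be7e692b7
size 120981200
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Stream the file at `path`, comparing its byte count and digest to the pointer."""
    h = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["digest"]
```

With `huggingface_hub` installed, `hf_hub_download(repo_id="legesher/language-decoded-lora", filename="adapter_model.safetensors", subfolder="tiny-aya-base/condition-2-ur-20k-seed42")` returns a local path that can be passed to `matches_pointer` together with `parse_lfs_pointer(POINTER)`.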

tiny-aya-base/condition-2-ur-20k-seed42/training_metrics.json ADDED

	@@ -0,0 +1,809 @@

+{
+  "model_config_name": "tiny-aya-base",
+  "condition_name": "condition-2-ur-20k",
+  "seed": 42,
+  "output_name": "condition-2-ur-20k-seed42",
+  "train_result": {
+    "train_runtime": 13570.4962,
+    "train_samples_per_second": 1.326,
+    "train_steps_per_second": 0.083,
+    "total_flos": 3.140573201726177e+17,
+    "train_loss": 1.0729037674797905,
+    "epoch": 1.0
+  },
+  "log_history": [
+    {
+      "loss": 1.6003,
+      "grad_norm": 0.17624995112419128,
+      "learning_rate": 3.157894736842105e-05,
+      "epoch": 0.008888888888888889,
+      "step": 10
+    },
+    {
+      "loss": 1.5313,
+      "grad_norm": 0.2164340317249298,
+      "learning_rate": 6.666666666666667e-05,
+      "epoch": 0.017777777777777778,
+      "step": 20
+    },
+    {
+      "loss": 1.3757,
+      "grad_norm": 0.18562479317188263,
+      "learning_rate": 0.0001017543859649123,
+      "epoch": 0.02666666666666667,
+      "step": 30
+    },
+    {
+      "loss": 1.2882,
+      "grad_norm": 0.24056175351142883,
+      "learning_rate": 0.0001368421052631579,
+      "epoch": 0.035555555555555556,
+      "step": 40
+    },
+    {
+      "loss": 1.204,
+      "grad_norm": 0.19117851555347443,
+      "learning_rate": 0.00017192982456140353,
+      "epoch": 0.044444444444444446,
+      "step": 50
+    },
+    {
+      "loss": 1.1585,
+      "grad_norm": 0.2000139057636261,
+      "learning_rate": 0.0001999982694427025,
+      "epoch": 0.05333333333333334,
+      "step": 60
+    },
+    {
+      "loss": 1.1604,
+      "grad_norm": 0.16707158088684082,
+      "learning_rate": 0.00019993770622619782,
+      "epoch": 0.06222222222222222,
+      "step": 70
+    },
+    {
+      "loss": 1.1208,
+      "grad_norm": 0.1946488916873932,
+      "learning_rate": 0.00019979067503207154,
+      "epoch": 0.07111111111111111,
+      "step": 80
+    },
+    {
+      "loss": 1.0732,
+      "grad_norm": 0.22706064581871033,
+      "learning_rate": 0.00019955730307447014,
+      "epoch": 0.08,
+      "step": 90
+    },
+    {
+      "loss": 1.1061,
+      "grad_norm": 0.18451257050037384,
+      "learning_rate": 0.0001992377922711879,
+      "epoch": 0.08888888888888889,
+      "step": 100
+    },
+    {
+      "loss": 1.1041,
+      "grad_norm": 0.19636838138103485,
+      "learning_rate": 0.00019883241906896388,
+      "epoch": 0.09777777777777778,
+      "step": 110
+    },
+    {
+      "loss": 1.037,
+      "grad_norm": 0.1620660275220871,
+      "learning_rate": 0.00019834153420429478,
+      "epoch": 0.10666666666666667,
+      "step": 120
+    },
+    {
+      "loss": 1.0957,
+      "grad_norm": 0.1725081503391266,
+      "learning_rate": 0.00019776556239997146,
+      "epoch": 0.11555555555555555,
+      "step": 130
+    },
+    {
+      "loss": 1.123,
+      "grad_norm": 0.16180731356143951,
+      "learning_rate": 0.0001971050019976005,
+      "epoch": 0.12444444444444444,
+      "step": 140
+    },
+    {
+      "loss": 1.0964,
+      "grad_norm": 0.1704389601945877,
+      "learning_rate": 0.00019636042452643,
+      "epoch": 0.13333333333333333,
+      "step": 150
+    },
+    {
+      "loss": 1.0381,
+      "grad_norm": 0.17602771520614624,
+      "learning_rate": 0.00019553247420885157,
+      "epoch": 0.14222222222222222,
+      "step": 160
+    },
+    {
+      "loss": 1.0891,
+      "grad_norm": 0.1582716852426529,
+      "learning_rate": 0.00019462186740300697,
+      "epoch": 0.1511111111111111,
+      "step": 170
+    },
+    {
+      "loss": 1.0711,
+      "grad_norm": 0.14734524488449097,
+      "learning_rate": 0.00019362939198298184,
+      "epoch": 0.16,
+      "step": 180
+    },
+    {
+      "loss": 1.0855,
+      "grad_norm": 0.14791883528232574,
+      "learning_rate": 0.00019255590665712214,
+      "epoch": 0.1688888888888889,
+      "step": 190
+    },
+    {
+      "loss": 1.0078,
+      "grad_norm": 0.15503981709480286,
+      "learning_rate": 0.00019140234022506348,
+      "epoch": 0.17777777777777778,
+      "step": 200
+    },
+    {
+      "loss": 1.0704,
+      "grad_norm": 0.14969214797019958,
+      "learning_rate": 0.00019016969077411647,
+      "epoch": 0.18666666666666668,
+      "step": 210
+    },
+    {
+      "loss": 1.0345,
+      "grad_norm": 0.1351374387741089,
+      "learning_rate": 0.0001888590248157027,
+      "epoch": 0.19555555555555557,
+      "step": 220
+    },
+    {
+      "loss": 1.0458,
+      "grad_norm": 0.137342631816864,
+      "learning_rate": 0.00018747147636258917,
+      "epoch": 0.20444444444444446,
+      "step": 230
+    },
+    {
+      "loss": 1.0645,
+      "grad_norm": 0.13347966969013214,
+      "learning_rate": 0.00018600824594771907,
+      "epoch": 0.21333333333333335,
+      "step": 240
+    },
+    {
+      "loss": 1.0704,
+      "grad_norm": 0.15762671828269958,
+      "learning_rate": 0.0001844705995854882,
+      "epoch": 0.2222222222222222,
+      "step": 250
+    },
+    {
+      "loss": 1.0993,
+      "grad_norm": 0.1556115299463272,
+      "learning_rate": 0.00018285986767636566,
+      "epoch": 0.2311111111111111,
+      "step": 260
+    },
+    {
+      "loss": 1.1072,
+      "grad_norm": 0.1583993136882782,
+      "learning_rate": 0.00018117744385580625,
+      "epoch": 0.24,
+      "step": 270
+    },
+    {
+      "loss": 1.1328,
+      "grad_norm": 0.1593606024980545,
+      "learning_rate": 0.0001794247837884511,
+      "epoch": 0.24888888888888888,
+      "step": 280
+    },
+    {
+      "loss": 1.1149,
+      "grad_norm": 0.13557392358779907,
+      "learning_rate": 0.0001776034039086592,
+      "epoch": 0.2577777777777778,
+      "step": 290
+    },
+    {
+      "loss": 1.0651,
+      "grad_norm": 0.14842922985553741,
+      "learning_rate": 0.00017571488010846003,
+      "epoch": 0.26666666666666666,
+      "step": 300
+    },
+    {
+      "loss": 1.0862,
+      "grad_norm": 0.1485857218503952,
+      "learning_rate": 0.00017376084637406222,
+      "epoch": 0.27555555555555555,
+      "step": 310
+    },
+    {
+      "loss": 1.0609,
+      "grad_norm": 0.16749393939971924,
+      "learning_rate": 0.000171742993372098,
+      "epoch": 0.28444444444444444,
+      "step": 320
+    },
+    {
+      "loss": 1.036,
+      "grad_norm": 0.1489233523607254,
+      "learning_rate": 0.0001696630669868267,
+      "epoch": 0.29333333333333333,
+      "step": 330
+    },
+    {
+      "loss": 1.1238,
+      "grad_norm": 0.14296723902225494,
+      "learning_rate": 0.00016752286680956306,
+      "epoch": 0.3022222222222222,
+      "step": 340
+    },
+    {
+      "loss": 1.0445,
+      "grad_norm": 0.14162296056747437,
+      "learning_rate": 0.00016532424458163693,
+      "epoch": 0.3111111111111111,
+      "step": 350
+    },
+    {
+      "loss": 1.0369,
+      "grad_norm": 0.143046036362648,
+      "learning_rate": 0.0001630691025922321,
+      "epoch": 0.32,
+      "step": 360
+    },
+    {
+      "loss": 1.0383,
+      "grad_norm": 0.1497723013162613,
+      "learning_rate": 0.0001607593920324899,
+      "epoch": 0.3288888888888889,
+      "step": 370
+    },
+    {
+      "loss": 0.9846,
+      "grad_norm": 0.13559605181217194,
+      "learning_rate": 0.00015839711130730203,
+      "epoch": 0.3377777777777778,
+      "step": 380
+    },
+    {
+      "loss": 1.042,
+      "grad_norm": 0.14534465968608856,
+      "learning_rate": 0.00015598430430625334,
+      "epoch": 0.3466666666666667,
+      "step": 390
+    },
+    {
+      "loss": 1.0358,
+      "grad_norm": 0.16837020218372345,
+      "learning_rate": 0.00015352305863520991,
+      "epoch": 0.35555555555555557,
+      "step": 400
+    },
+    {
+      "loss": 1.0265,
+      "grad_norm": 0.13915309309959412,
+      "learning_rate": 0.00015101550381008377,
+      "epoch": 0.36444444444444446,
+      "step": 410
+    },
+    {
+      "loss": 1.0672,
+      "grad_norm": 0.19003401696681976,
+      "learning_rate": 0.00014846380941433522,
+      "epoch": 0.37333333333333335,
+      "step": 420
+    },
+    {
+      "loss": 1.0971,
+      "grad_norm": 0.1480245292186737,
+      "learning_rate": 0.00014587018322180905,
+      "epoch": 0.38222222222222224,
+      "step": 430
+    },
+    {
+      "loss": 1.0946,
+      "grad_norm": 0.1381472498178482,
+      "learning_rate": 0.00014323686928652697,
+      "epoch": 0.39111111111111113,
+      "step": 440
+    },
+    {
+      "loss": 1.0513,
+      "grad_norm": 0.16802673041820526,
+      "learning_rate": 0.00014056614600108997,
+      "epoch": 0.4,
+      "step": 450
+    },
+    {
+      "loss": 1.0624,
+      "grad_norm": 0.19680212438106537,
+      "learning_rate": 0.00013786032412537035,
+      "epoch": 0.4088888888888889,
+      "step": 460
+    },
+    {
+      "loss": 1.0544,
+      "grad_norm": 0.15500085055828094,
+      "learning_rate": 0.00013512174478719894,
+      "epoch": 0.4177777777777778,
+      "step": 470
+    },
+    {
+      "loss": 1.032,
+      "grad_norm": 0.14582695066928864,
+      "learning_rate": 0.00013235277745677747,
+      "epoch": 0.4266666666666667,
+      "step": 480
+    },
+    {
+      "loss": 1.0269,
+      "grad_norm": 0.1789093166589737,
+      "learning_rate": 0.00012955581789656843,
+      "epoch": 0.43555555555555553,
+      "step": 490
+    },
+    {
+      "loss": 1.0565,
+      "grad_norm": 0.14305955171585083,
+      "learning_rate": 0.00012673328608843636,
+      "epoch": 0.4444444444444444,
+      "step": 500
+    },
+    {
+      "loss": 1.0405,
+      "grad_norm": 0.1427326500415802,
+      "learning_rate": 0.00012388762413983445,
+      "epoch": 0.4533333333333333,
+      "step": 510
+    },
+    {
+      "loss": 1.0596,
+      "grad_norm": 0.14619527757167816,
+      "learning_rate": 0.00012102129417084714,
+      "epoch": 0.4622222222222222,
+      "step": 520
+    },
+    {
+      "loss": 1.0823,
+      "grad_norm": 0.15209229290485382,
+      "learning_rate": 0.00011813677618391759,
+      "epoch": 0.4711111111111111,
+      "step": 530
+    },
+    {
+      "loss": 1.0738,
+      "grad_norm": 0.1491280198097229,
+      "learning_rate": 0.00011523656591810337,
+      "epoch": 0.48,
+      "step": 540
+    },
+    {
+      "loss": 0.9624,
+      "grad_norm": 0.13512106239795685,
+      "learning_rate": 0.00011232317268971585,
+      "epoch": 0.4888888888888889,
+      "step": 550
+    },
+    {
+      "loss": 1.067,
+      "grad_norm": 0.15700113773345947,
+      "learning_rate": 0.00010939911722121306,
+      "epoch": 0.49777777777777776,
+      "step": 560
+    },
+    {
+      "loss": 1.0011,
+      "grad_norm": 0.17299264669418335,
+      "learning_rate": 0.00010646692946022285,
+      "epoch": 0.5066666666666667,
+      "step": 570
+    },
+    {
+      "loss": 1.0878,
+      "grad_norm": 0.13645489513874054,
+      "learning_rate": 0.00010352914639058526,
+      "epoch": 0.5155555555555555,
+      "step": 580
+    },
+    {
+      "loss": 1.0605,
+      "grad_norm": 0.14967432618141174,
+      "learning_rate": 0.00010058830983730622,
+      "epoch": 0.5244444444444445,
+      "step": 590
+    },
+    {
+      "loss": 0.9786,
+      "grad_norm": 0.12973353266716003,
+      "learning_rate": 9.764696426732303e-05,
+      "epoch": 0.5333333333333333,
+      "step": 600
+    },
+    {
+      "loss": 1.0876,
+      "grad_norm": 0.14270669221878052,
+      "learning_rate": 9.470765458798368e-05,
+      "epoch": 0.5422222222222223,
+      "step": 610
+    },
+    {
+      "loss": 1.0488,
+      "grad_norm": 0.13296638429164886,
+      "learning_rate": 9.177292394514555e-05,
+      "epoch": 0.5511111111111111,
+      "step": 620
+    },
+    {
+      "loss": 1.082,
+      "grad_norm": 0.13516603410243988,
+      "learning_rate": 8.884531152279756e-05,
+      "epoch": 0.56,
+      "step": 630
+    },
+    {
+      "loss": 1.0213,
+      "grad_norm": 0.13275611400604248,
+      "learning_rate": 8.592735034611097e-05,
+      "epoch": 0.5688888888888889,
+      "step": 640
+    },
+    {
+      "loss": 1.0552,
+      "grad_norm": 0.13083018362522125,
+      "learning_rate": 8.302156508981815e-05,
+      "epoch": 0.5777777777777777,
+      "step": 650
+    },
+    {
+      "loss": 1.0686,
+      "grad_norm": 0.12451142817735672,
+      "learning_rate": 8.013046989381691e-05,
+      "epoch": 0.5866666666666667,
+      "step": 660
+    },
+    {
+      "loss": 1.0172,
+      "grad_norm": 0.12081737071275711,
+      "learning_rate": 7.725656618788937e-05,
+      "epoch": 0.5955555555555555,
+      "step": 670
+    },
+    {
+      "loss": 1.0153,
+      "grad_norm": 0.13223817944526672,
+      "learning_rate": 7.4402340527418e-05,
+      "epoch": 0.6044444444444445,
+      "step": 680
+    },
+    {
+      "loss": 1.0257,
+      "grad_norm": 0.14401006698608398,
+      "learning_rate": 7.157026244197132e-05,
+      "epoch": 0.6133333333333333,
+      "step": 690
+    },
+    {
+      "loss": 1.0693,
+      "grad_norm": 0.13986144959926605,
+      "learning_rate": 6.87627822986206e-05,
+      "epoch": 0.6222222222222222,
+      "step": 700
+    },
+    {
+      "loss": 1.0483,
+      "grad_norm": 0.16049247980117798,
+      "learning_rate": 6.598232918183632e-05,
+      "epoch": 0.6311111111111111,
+      "step": 710
+    },
+    {
+      "loss": 1.0385,
+      "grad_norm": 0.12732988595962524,
+      "learning_rate": 6.323130879179875e-05,
+      "epoch": 0.64,
+      "step": 720
+    },
+    {
+      "loss": 1.0188,
+      "grad_norm": 0.12288779765367508,
+      "learning_rate": 6.051210136294089e-05,
+      "epoch": 0.6488888888888888,
+      "step": 730
+    },
+    {
+      "loss": 1.0767,
+      "grad_norm": 0.1345112919807434,
+      "learning_rate": 5.7827059604525234e-05,
+      "epoch": 0.6577777777777778,
+      "step": 740
+    },
+    {
+      "loss": 1.131,
+      "grad_norm": 0.14112229645252228,
+      "learning_rate": 5.517850666503547e-05,
+      "epoch": 0.6666666666666666,
+      "step": 750
+    },
+    {
+      "loss": 1.0454,
+      "grad_norm": 0.1502366065979004,
+      "learning_rate": 5.2568734122144756e-05,
+      "epoch": 0.6755555555555556,
+      "step": 760
+    },
+    {
+      "loss": 1.0806,
+      "grad_norm": 0.15065442025661469,
+      "learning_rate": 5.000000000000002e-05,
+      "epoch": 0.6844444444444444,
+      "step": 770
+    },
+    {
+      "loss": 1.0434,
+      "grad_norm": 0.1595860868692398,
+      "learning_rate": 4.747452681553674e-05,
+      "epoch": 0.6933333333333334,
+      "step": 780
+    },
+    {
+      "loss": 1.037,
+      "grad_norm": 0.14797107875347137,
+      "learning_rate": 4.4994499655515865e-05,
+      "epoch": 0.7022222222222222,
+      "step": 790
+    },
+    {
+      "loss": 1.0375,
+      "grad_norm": 0.1406623274087906,
+      "learning_rate": 4.256206428594587e-05,
+      "epoch": 0.7111111111111111,
+      "step": 800
+    },
+    {
+      "loss": 1.0624,
+      "grad_norm": 0.133322075009346,
+      "learning_rate": 4.017932529552543e-05,
+      "epoch": 0.72,
+      "step": 810
+    },
+    {
+      "loss": 0.9731,
+      "grad_norm": 0.15407244861125946,
+      "learning_rate": 3.784834427471408e-05,
+      "epoch": 0.7288888888888889,
+      "step": 820
+    },
+    {
+      "loss": 1.0992,
+      "grad_norm": 0.11977162212133408,
+      "learning_rate": 3.557113803200537e-05,
+      "epoch": 0.7377777777777778,
+      "step": 830
+    },
+    {
+      "loss": 1.0719,
+      "grad_norm": 0.12586627900600433,
+      "learning_rate": 3.3349676848946345e-05,
+      "epoch": 0.7466666666666667,
+      "step": 840
+    },
+    {
+      "loss": 1.0736,
+      "grad_norm": 0.14590239524841309,
+      "learning_rate": 3.118588277541312e-05,
+      "epoch": 0.7555555555555555,
+      "step": 850
+    },
+    {
+      "loss": 1.0511,
+      "grad_norm": 0.14116385579109192,
+      "learning_rate": 2.9081627966617096e-05,
+      "epoch": 0.7644444444444445,
+      "step": 860
+    },
+    {
+      "loss": 0.9715,
+      "grad_norm": 0.12606090307235718,
+      "learning_rate": 2.7038733063281174e-05,
+      "epoch": 0.7733333333333333,
+      "step": 870
+    },
+    {
+      "loss": 1.0455,
+      "grad_norm": 0.12883330881595612,
+      "learning_rate": 2.5058965616387498e-05,
+      "epoch": 0.7822222222222223,
+      "step": 880
+    },
+    {
+      "loss": 1.0763,
+      "grad_norm": 0.15165486931800842,
+      "learning_rate": 2.3144038557858916e-05,
+      "epoch": 0.7911111111111111,
+      "step": 890
+    },
+    {
+      "loss": 1.0576,
+      "grad_norm": 0.13923071324825287,
+      "learning_rate": 2.1295608718498284e-05,
+      "epoch": 0.8,
+      "step": 900
+    },
+    {
+      "loss": 0.9802,
+      "grad_norm": 0.15918999910354614,
+      "learning_rate": 1.9515275394467446e-05,
+      "epoch": 0.8088888888888889,
+      "step": 910
+    },
+    {
+      "loss": 1.0506,
+      "grad_norm": 0.1284029185771942,
+      "learning_rate": 1.7804578963545994e-05,
+      "epoch": 0.8177777777777778,
+      "step": 920
+    },
+    {
+      "loss": 1.0801,
+      "grad_norm": 0.14188328385353088,
+      "learning_rate": 1.6164999552367765e-05,
+      "epoch": 0.8266666666666667,
+      "step": 930
+    },
+    {
+      "loss": 1.0663,
+      "grad_norm": 0.13663959503173828,
+      "learning_rate": 1.4597955755787373e-05,
+      "epoch": 0.8355555555555556,
+      "step": 940
+    },
+    {
+      "loss": 1.0464,
+      "grad_norm": 0.12637336552143097,
+      "learning_rate": 1.3104803409485356e-05,
+      "epoch": 0.8444444444444444,
+      "step": 950
+    },
+    {
+      "loss": 1.0779,
+      "grad_norm": 0.12190049886703491,
+      "learning_rate": 1.1686834416873815e-05,
+      "epoch": 0.8533333333333334,
+      "step": 960
+    },
+    {
+      "loss": 1.0832,
+      "grad_norm": 0.14396388828754425,
+      "learning_rate": 1.0345275631317163e-05,
+      "epoch": 0.8622222222222222,
+      "step": 970
+    },
+    {
+      "loss": 1.0358,
+      "grad_norm": 0.14128635823726654,
+      "learning_rate": 9.081287794635774e-06,
+      "epoch": 0.8711111111111111,
+      "step": 980
+    },
+    {
+      "loss": 1.0186,
+      "grad_norm": 0.12689436972141266,
+      "learning_rate": 7.895964532810317e-06,
+      "epoch": 0.88,
+      "step": 990
+    },
+    {
+      "loss": 1.0633,
+      "grad_norm": 0.13579504191875458,
+      "learning_rate": 6.7903314097560454e-06,
+      "epoch": 0.8888888888888888,
+      "step": 1000
+    },
+    {
+      "loss": 1.0574,
+      "grad_norm": 0.16054902970790863,
+      "learning_rate": 5.765345039985648e-06,
+      "epoch": 0.8977777777777778,
+      "step": 1010
+    },
+    {
+      "loss": 1.074,
+      "grad_norm": 0.1411615014076233,
+      "learning_rate": 4.821892260928451e-06,
+      "epoch": 0.9066666666666666,
+      "step": 1020
+    },
+    {
+      "loss": 1.0644,
+      "grad_norm": 0.13848498463630676,
+      "learning_rate": 3.960789365622075e-06,
+      "epoch": 0.9155555555555556,
+      "step": 1030
+    },
+    {
+      "loss": 1.0546,
+      "grad_norm": 0.13058850169181824,
+      "learning_rate": 3.1827813964403484e-06,
+      "epoch": 0.9244444444444444,
+      "step": 1040
+    },
+    {
+      "loss": 0.9995,
+      "grad_norm": 0.13550138473510742,
+      "learning_rate": 2.4885415004686665e-06,
+      "epoch": 0.9333333333333333,
+      "step": 1050
+    },
+    {
+      "loss": 1.0227,
+      "grad_norm": 0.17298896610736847,
+      "learning_rate": 1.8786703470845547e-06,
+      "epoch": 0.9422222222222222,
+      "step": 1060
+    },
+    {
+      "loss": 1.0634,
+      "grad_norm": 0.11537665873765945,
+      "learning_rate": 1.3536956082472074e-06,
+      "epoch": 0.9511111111111111,
+      "step": 1070
+    },
+    {
+      "loss": 1.0009,
+      "grad_norm": 0.15022173523902893,
+      "learning_rate": 9.140715019458457e-07,
+      "epoch": 0.96,
+      "step": 1080
+    },
+    {
+      "loss": 1.0038,
+      "grad_norm": 0.13574285805225372,
+      "learning_rate": 5.60178399201805e-07,
+      "epoch": 0.9688888888888889,
+      "step": 1090
+    },
+    {
+      "loss": 1.059,
+      "grad_norm": 0.1271033138036728,
+      "learning_rate": 2.923224949643477e-07,
+      "epoch": 0.9777777777777777,
+      "step": 1100
+    },
+    {
+      "loss": 1.0386,
+      "grad_norm": 0.15312351286411285,
+      "learning_rate": 1.1073554318509205e-07,
+      "epoch": 0.9866666666666667,
+      "step": 1110
+    },
+    {
+      "loss": 1.0199,
+      "grad_norm": 0.13344977796077728,
+      "learning_rate": 1.5574656300143542e-08,
+      "epoch": 0.9955555555555555,
+      "step": 1120
+    },
+    {
+      "train_runtime": 13570.4962,
+      "train_samples_per_second": 1.326,
+      "train_steps_per_second": 0.083,
+      "total_flos": 3.140573201726177e+17,
+      "train_loss": 1.0729037674797905,
+      "epoch": 1.0,
+      "step": 1125
+    }
+  ]
+}
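The figures in `train_result` are mutually consistent, and a few quantities the file does not record can be inferred from them: step 10 is logged at epoch 10/1125, so one epoch is 1125 optimizer steps, and 1.326 samples/s × 13,570.5 s is about 17,994 samples, or roughly 16 samples per step. The logged learning rates also match, to floating-point precision, a linear warmup over 57 steps (ceil(0.05 × 1125)) to a peak of 2e-4 followed by cosine decay to zero, with the entry at step s equal to the schedule evaluated at s - 1. The sketch below checks those inferences against a few entries copied from the file; the peak rate, warmup length, and one-step offset are read off the numbers rather than stated anywhere, and the helper names are hypothetical.

```python
import math

# Subset of training_metrics.json above; the full log_history has 112 loss entries.
METRICS = {
    "train_result": {
        "train_runtime": 13570.4962,
        "train_samples_per_second": 1.326,
        "train_steps_per_second": 0.083,
        "train_loss": 1.0729037674797905,
        "epoch": 1.0,
    },
    "log_history": [
        {"loss": 1.6003, "learning_rate": 3.157894736842105e-05, "epoch": 0.008888888888888889, "step": 10},
        {"loss": 1.1585, "learning_rate": 0.0001999982694427025, "epoch": 0.05333333333333334, "step": 60},
        {"loss": 1.0806, "learning_rate": 5.000000000000002e-05, "epoch": 0.6844444444444444, "step": 770},
        {"loss": 1.0199, "learning_rate": 1.5574656300143542e-08, "epoch": 0.9955555555555555, "step": 1120},
        {"train_loss": 1.0729037674797905, "epoch": 1.0, "step": 1125},
    ],
}


def loss_curve(metrics):
    """(step, loss) pairs; the final summary entry has no 'loss' key and is skipped."""
    return [(e["step"], e["loss"]) for e in metrics["log_history"] if "loss" in e]


def steps_per_epoch(metrics):
    """Infer optimizer steps per epoch from the first logged entry."""
    first = metrics["log_history"][0]
    return round(first["step"] / first["epoch"])


def effective_batch_size(metrics):
    """Samples processed divided by optimizer steps, rounded to an integer."""
    result = metrics["train_result"]
    total_steps = metrics["log_history"][-1]["step"]
    samples = result["train_samples_per_second"] * result["train_runtime"]
    return round(samples / total_steps)


# Inferred from the logged values, not recorded in the file.
PEAK_LR = 2e-4
TOTAL_STEPS = 1125
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)  # 57


def lr_multiplier(step):
    """Linear warmup to 1.0, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def logged_lr(step):
    """Learning rate the log reports at `step`: the schedule one step earlier."""
    return PEAK_LR * lr_multiplier(step - 1)
```

Applied to the full file loaded with `json.load`, `loss_curve` yields 112 points that fall from 1.6003 at step 10 to a band between about 0.96 and 1.14 from step 80 onward.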