Instructions for using legesher/language-decoded-lora with libraries, inference providers, notebooks, and local apps.

Libraries

How to use legesher/language-decoded-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="legesher/language-decoded-lora")

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("legesher/language-decoded-lora", dtype="auto")
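The commit shown further down uploads a PEFT LoRA adapter (adapter_config.json plus adapter_model.safetensors) under the subfolder tiny-aya-base/condition-1-en-20k-seed42, with CohereLabs/tiny-aya-base as its base model. If the repository root does not hold a full model, a loader along these lines may be needed instead. This is a sketch, not the maintainers' documented method. It assumes that `peft` and a recent `transformers` (one that accepts `dtype=`) are installed and that you can access the base model. It also assumes other conditions follow the `<model_config_name>/<condition_name>-seed<seed>` layout, which is inferred from this single commit. `adapter_subfolder` and `load_adapter` are hypothetical helper names.

```python
def adapter_subfolder(model_config_name: str, condition_name: str, seed: int) -> str:
    """Build the adapter path inside the repo.

    Hypothetical helper: the layout <model_config_name>/<condition_name>-seed<seed>
    is inferred from the single commit shown on this page.
    """
    return f"{model_config_name}/{condition_name}-seed{seed}"


def load_adapter(subfolder: str,
                 repo_id: str = "legesher/language-decoded-lora",
                 base_model: str = "CohereLabs/tiny-aya-base"):
    """Load the base causal LM and attach the LoRA adapter stored in `subfolder`."""
    # Imported lazily so the path helper above works without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model, dtype="auto")
    # `subfolder` is forwarded to the Hub downloads of adapter_config.json and the weights.
    model = PeftModel.from_pretrained(base, repo_id, subfolder=subfolder)
    return tokenizer, model
```

Usage would be `tokenizer, model = load_adapter(adapter_subfolder("tiny-aya-base", "condition-1-en-20k", 42))`, followed by the usual tokenize and `model.generate(...)` calls.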

Notebooks: Google Colab, Kaggle

Local Apps

vLLM

How to use legesher/language-decoded-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "legesher/language-decoded-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
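The same request can be sent from Python. The sketch below uses only the standard library and targets the OpenAI-compatible `/v1/completions` route used by the curl examples. It should therefore work against either the vLLM server above (port 8000) or the SGLang server below (port 30000 as launched). The helper names are illustrative, and the sketch assumes the server returns the OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses carry the text in choices[0].text
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "legesher/language-decoded-lora", "Once upon a time,")` mirrors the vLLM curl call.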

Use Docker

docker model run hf.co/legesher/language-decoded-lora

SGLang

How to use legesher/language-decoded-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "legesher/language-decoded-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "legesher/language-decoded-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "legesher/language-decoded-lora",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use legesher/language-decoded-lora with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for legesher/language-decoded-lora to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for legesher/language-decoded-lora to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for legesher/language-decoded-lora to start chatting

Load model with FastModel

# Install Unsloth from pip:
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="legesher/language-decoded-lora",
    max_seq_length=2048,
)

Docker Model Runner
How to use legesher/language-decoded-lora with Docker Model Runner:
```
docker model run hf.co/legesher/language-decoded-lora
```

Commit 7d1ac71 (verified) by Rashik24, committed on May 18
Parent: 1c9b9c0
Message: Upload folder using huggingface_hub

Files changed (3):

tiny-aya-base/condition-1-en-20k-seed42/adapter_config.json +50 -0
tiny-aya-base/condition-1-en-20k-seed42/adapter_model.safetensors +3 -0
tiny-aya-base/condition-1-en-20k-seed42/training_metrics.json +809 -0

tiny-aya-base/condition-1-en-20k-seed42/adapter_config.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "Cohere2ForCausalLM",
+    "parent_library": "transformers.models.cohere2.modeling_cohere2",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "CohereLabs/tiny-aya-base",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "q_proj",
+    "k_proj",
+    "down_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
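In this config the adapter has rank `r` = 16 and `lora_alpha` = 32, with `use_rslora` and `use_dora` both false, and it targets all seven attention and MLP projections. PEFT scales the low-rank update by `lora_alpha / r` (or `lora_alpha / sqrt(r)` when rsLoRA is enabled), so each adapted weight behaves like W + 2.0 * (B @ A). The pure-Python sketch below shows that arithmetic on a toy 2x2 weight; the matrices are made up for illustration and are not taken from the adapter.

```python
import json
import math

# A subset of the adapter_config.json fields shown above.
CONFIG_JSON = """
{
  "base_model_name_or_path": "CohereLabs/tiny-aya-base",
  "lora_alpha": 32,
  "r": 16,
  "use_rslora": false,
  "use_dora": false,
  "target_modules": ["gate_proj", "v_proj", "o_proj", "q_proj",
                     "k_proj", "down_proj", "up_proj"]
}
"""


def lora_scaling(config: dict) -> float:
    """Factor applied to the low-rank update B @ A."""
    alpha, r = config["lora_alpha"], config["r"]
    # rsLoRA divides by sqrt(r); standard LoRA divides by r.
    return alpha / math.sqrt(r) if config.get("use_rslora") else alpha / r


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def merged_weight(w, a, b, scaling):
    """W' = W + scaling * (B @ A), with A of shape (r x in) and B of shape (out x r)."""
    delta = matmul(b, a)
    return [[w[i][j] + scaling * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


config = json.loads(CONFIG_JSON)
scale = lora_scaling(config)
print(scale)  # 2.0

# Toy rank-1 update on a 2x2 identity weight (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]      # r x in = 1 x 2
B = [[1.0], [0.0]]    # out x r = 2 x 1
print(merged_weight(W, A, B, scale))  # [[2.0, 1.0], [0.0, 1.0]]
```

Because `init_lora_weights` is true, PEFT starts B at zero, so the update is zero before training and the adapted model initially matches the base model.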

tiny-aya-base/condition-1-en-20k-seed42/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18de78e59fc7dc2e4b0bb942e4479085bee3eacbe8f4b92487d91d94ed96d15d
+size 120981200
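The diff for adapter_model.safetensors shows a Git LFS pointer rather than the weights themselves. The `oid` is the SHA-256 of the real file, and `size` is its length in bytes (120,981,200 bytes, about 121 MB). The following is a minimal sketch for checking a downloaded copy against the pointer, assuming the file is already on disk (how you download it is up to you). `parse_lfs_pointer` and `verify_download` are illustrative names.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above (without the leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:18de78e59fc7dc2e4b0bb942e4479085bee3eacbe8f4b92487d91d94ed96d15d
size 120981200
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check the file's size first (cheap), then stream it through the hash."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]
```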

tiny-aya-base/condition-1-en-20k-seed42/training_metrics.json ADDED

	@@ -0,0 +1,809 @@

+{
+  "model_config_name": "tiny-aya-base",
+  "condition_name": "condition-1-en-20k",
+  "seed": 42,
+  "output_name": "condition-1-en-20k-seed42",
+  "train_result": {
+    "train_runtime": 14563.2899,
+    "train_samples_per_second": 1.236,
+    "train_steps_per_second": 0.077,
+    "total_flos": 3.141329099864146e+17,
+    "train_loss": 1.0860543518066406,
+    "epoch": 1.0
+  },
+  "log_history": [
+    {
+      "loss": 1.1096,
+      "grad_norm": 0.06403742730617523,
+      "learning_rate": 3.157894736842105e-05,
+      "epoch": 0.008888888888888889,
+      "step": 10
+    },
+    {
+      "loss": 1.0956,
+      "grad_norm": 0.07200434803962708,
+      "learning_rate": 6.666666666666667e-05,
+      "epoch": 0.017777777777777778,
+      "step": 20
+    },
+    {
+      "loss": 1.1093,
+      "grad_norm": 0.08676765114068985,
+      "learning_rate": 0.0001017543859649123,
+      "epoch": 0.02666666666666667,
+      "step": 30
+    },
+    {
+      "loss": 1.1155,
+      "grad_norm": 0.08623942732810974,
+      "learning_rate": 0.0001368421052631579,
+      "epoch": 0.035555555555555556,
+      "step": 40
+    },
+    {
+      "loss": 1.074,
+      "grad_norm": 0.07963975518941879,
+      "learning_rate": 0.00017192982456140353,
+      "epoch": 0.044444444444444446,
+      "step": 50
+    },
+    {
+      "loss": 1.1382,
+      "grad_norm": 0.08217670768499374,
+      "learning_rate": 0.0001999982694427025,
+      "epoch": 0.05333333333333334,
+      "step": 60
+    },
+    {
+      "loss": 1.012,
+      "grad_norm": 0.0925581157207489,
+      "learning_rate": 0.00019993770622619782,
+      "epoch": 0.06222222222222222,
+      "step": 70
+    },
+    {
+      "loss": 1.1102,
+      "grad_norm": 0.08492089062929153,
+      "learning_rate": 0.00019979067503207154,
+      "epoch": 0.07111111111111111,
+      "step": 80
+    },
+    {
+      "loss": 1.1262,
+      "grad_norm": 0.07605846226215363,
+      "learning_rate": 0.00019955730307447014,
+      "epoch": 0.08,
+      "step": 90
+    },
+    {
+      "loss": 1.0844,
+      "grad_norm": 0.08105836063623428,
+      "learning_rate": 0.0001992377922711879,
+      "epoch": 0.08888888888888889,
+      "step": 100
+    },
+    {
+      "loss": 1.0595,
+      "grad_norm": 0.07768302410840988,
+      "learning_rate": 0.00019883241906896388,
+      "epoch": 0.09777777777777778,
+      "step": 110
+    },
+    {
+      "loss": 1.0744,
+      "grad_norm": 0.08287444710731506,
+      "learning_rate": 0.00019834153420429478,
+      "epoch": 0.10666666666666667,
+      "step": 120
+    },
+    {
+      "loss": 1.0606,
+      "grad_norm": 0.07890604436397552,
+      "learning_rate": 0.00019776556239997146,
+      "epoch": 0.11555555555555555,
+      "step": 130
+    },
+    {
+      "loss": 1.0618,
+      "grad_norm": 0.08559550344944,
+      "learning_rate": 0.0001971050019976005,
+      "epoch": 0.12444444444444444,
+      "step": 140
+    },
+    {
+      "loss": 1.0808,
+      "grad_norm": 0.08610441535711288,
+      "learning_rate": 0.00019636042452643,
+      "epoch": 0.13333333333333333,
+      "step": 150
+    },
+    {
+      "loss": 1.1284,
+      "grad_norm": 0.08284633606672287,
+      "learning_rate": 0.00019553247420885157,
+      "epoch": 0.14222222222222222,
+      "step": 160
+    },
+    {
+      "loss": 1.0735,
+      "grad_norm": 0.07116303592920303,
+      "learning_rate": 0.00019462186740300697,
+      "epoch": 0.1511111111111111,
+      "step": 170
+    },
+    {
+      "loss": 1.0629,
+      "grad_norm": 0.07475987076759338,
+      "learning_rate": 0.00019362939198298184,
+      "epoch": 0.16,
+      "step": 180
+    },
+    {
+      "loss": 1.0924,
+      "grad_norm": 0.07793471962213516,
+      "learning_rate": 0.00019255590665712214,
+      "epoch": 0.1688888888888889,
+      "step": 190
+    },
+    {
+      "loss": 1.0697,
+      "grad_norm": 0.08392629772424698,
+      "learning_rate": 0.00019140234022506348,
+      "epoch": 0.17777777777777778,
+      "step": 200
+    },
+    {
+      "loss": 1.0898,
+      "grad_norm": 0.07578205317258835,
+      "learning_rate": 0.00019016969077411647,
+      "epoch": 0.18666666666666668,
+      "step": 210
+    },
+    {
+      "loss": 1.1044,
+      "grad_norm": 0.08797736465930939,
+      "learning_rate": 0.0001888590248157027,
+      "epoch": 0.19555555555555557,
+      "step": 220
+    },
+    {
+      "loss": 1.1363,
+      "grad_norm": 0.08936440199613571,
+      "learning_rate": 0.00018747147636258917,
+      "epoch": 0.20444444444444446,
+      "step": 230
+    },
+    {
+      "loss": 1.106,
+      "grad_norm": 0.08917027711868286,
+      "learning_rate": 0.00018600824594771907,
+      "epoch": 0.21333333333333335,
+      "step": 240
+    },
+    {
+      "loss": 1.1467,
+      "grad_norm": 0.08728639781475067,
+      "learning_rate": 0.0001844705995854882,
+      "epoch": 0.2222222222222222,
+      "step": 250
+    },
+    {
+      "loss": 1.0262,
+      "grad_norm": 0.08384900540113449,
+      "learning_rate": 0.00018285986767636566,
+      "epoch": 0.2311111111111111,
+      "step": 260
+    },
+    {
+      "loss": 1.1373,
+      "grad_norm": 0.08024132251739502,
+      "learning_rate": 0.00018117744385580625,
+      "epoch": 0.24,
+      "step": 270
+    },
+    {
+      "loss": 1.1169,
+      "grad_norm": 0.08673477172851562,
+      "learning_rate": 0.0001794247837884511,
+      "epoch": 0.24888888888888888,
+      "step": 280
+    },
+    {
+      "loss": 1.0404,
+      "grad_norm": 0.10723299533128738,
+      "learning_rate": 0.0001776034039086592,
+      "epoch": 0.2577777777777778,
+      "step": 290
+    },
+    {
+      "loss": 1.1536,
+      "grad_norm": 0.0855490043759346,
+      "learning_rate": 0.00017571488010846003,
+      "epoch": 0.26666666666666666,
+      "step": 300
+    },
+    {
+      "loss": 1.1175,
+      "grad_norm": 0.07594490051269531,
+      "learning_rate": 0.00017376084637406222,
+      "epoch": 0.27555555555555555,
+      "step": 310
+    },
+    {
+      "loss": 1.0489,
+      "grad_norm": 0.07274004071950912,
+      "learning_rate": 0.000171742993372098,
+      "epoch": 0.28444444444444444,
+      "step": 320
+    },
+    {
+      "loss": 1.0753,
+      "grad_norm": 0.09644320607185364,
+      "learning_rate": 0.0001696630669868267,
+      "epoch": 0.29333333333333333,
+      "step": 330
+    },
+    {
+      "loss": 1.132,
+      "grad_norm": 0.09514153003692627,
+      "learning_rate": 0.00016752286680956306,
+      "epoch": 0.3022222222222222,
+      "step": 340
+    },
+    {
+      "loss": 1.0777,
+      "grad_norm": 0.09859903156757355,
+      "learning_rate": 0.00016532424458163693,
+      "epoch": 0.3111111111111111,
+      "step": 350
+    },
+    {
+      "loss": 1.0358,
+      "grad_norm": 0.07918336242437363,
+      "learning_rate": 0.0001630691025922321,
+      "epoch": 0.32,
+      "step": 360
+    },
+    {
+      "loss": 1.08,
+      "grad_norm": 0.08175525069236755,
+      "learning_rate": 0.0001607593920324899,
+      "epoch": 0.3288888888888889,
+      "step": 370
+    },
+    {
+      "loss": 1.1033,
+      "grad_norm": 0.11684149503707886,
+      "learning_rate": 0.00015839711130730203,
+      "epoch": 0.3377777777777778,
+      "step": 380
+    },
+    {
+      "loss": 1.0916,
+      "grad_norm": 0.09373477101325989,
+      "learning_rate": 0.00015598430430625334,
+      "epoch": 0.3466666666666667,
+      "step": 390
+    },
+    {
+      "loss": 1.1265,
+      "grad_norm": 0.08210037648677826,
+      "learning_rate": 0.00015352305863520991,
+      "epoch": 0.35555555555555557,
+      "step": 400
+    },
+    {
+      "loss": 1.1241,
+      "grad_norm": 0.08330373466014862,
+      "learning_rate": 0.00015101550381008377,
+      "epoch": 0.36444444444444446,
+      "step": 410
+    },
+    {
+      "loss": 1.0741,
+      "grad_norm": 0.07790176570415497,
+      "learning_rate": 0.00014846380941433522,
+      "epoch": 0.37333333333333335,
+      "step": 420
+    },
+    {
+      "loss": 1.0975,
+      "grad_norm": 0.08394762128591537,
+      "learning_rate": 0.00014587018322180905,
+      "epoch": 0.38222222222222224,
+      "step": 430
+    },
+    {
+      "loss": 1.0846,
+      "grad_norm": 0.0850745216012001,
+      "learning_rate": 0.00014323686928652697,
+      "epoch": 0.39111111111111113,
+      "step": 440
+    },
+    {
+      "loss": 1.0661,
+      "grad_norm": 0.09569722414016724,
+      "learning_rate": 0.00014056614600108997,
+      "epoch": 0.4,
+      "step": 450
+    },
+    {
+      "loss": 1.1233,
+      "grad_norm": 0.08644863963127136,
+      "learning_rate": 0.00013786032412537035,
+      "epoch": 0.4088888888888889,
+      "step": 460
+    },
+    {
+      "loss": 1.0715,
+      "grad_norm": 0.08750565350055695,
+      "learning_rate": 0.00013512174478719894,
+      "epoch": 0.4177777777777778,
+      "step": 470
+    },
+    {
+      "loss": 1.0938,
+      "grad_norm": 0.09004763513803482,
+      "learning_rate": 0.00013235277745677747,
+      "epoch": 0.4266666666666667,
+      "step": 480
+    },
+    {
+      "loss": 1.0737,
+      "grad_norm": 0.08652064204216003,
+      "learning_rate": 0.00012955581789656843,
+      "epoch": 0.43555555555555553,
+      "step": 490
+    },
+    {
+      "loss": 1.0483,
+      "grad_norm": 0.08033134043216705,
+      "learning_rate": 0.00012673328608843636,
+      "epoch": 0.4444444444444444,
+      "step": 500
+    },
+    {
+      "loss": 1.088,
+      "grad_norm": 0.08166171610355377,
+      "learning_rate": 0.00012388762413983445,
+      "epoch": 0.4533333333333333,
+      "step": 510
+    },
+    {
+      "loss": 1.0894,
+      "grad_norm": 0.10416824370622635,
+      "learning_rate": 0.00012102129417084714,
+      "epoch": 0.4622222222222222,
+      "step": 520
+    },
+    {
+      "loss": 1.0954,
+      "grad_norm": 0.08470191806554794,
+      "learning_rate": 0.00011813677618391759,
+      "epoch": 0.4711111111111111,
+      "step": 530
+    },
+    {
+      "loss": 1.0228,
+      "grad_norm": 0.0904318168759346,
+      "learning_rate": 0.00011523656591810337,
+      "epoch": 0.48,
+      "step": 540
+    },
+    {
+      "loss": 1.0255,
+      "grad_norm": 0.09989261627197266,
+      "learning_rate": 0.00011232317268971585,
+      "epoch": 0.4888888888888889,
+      "step": 550
+    },
+    {
+      "loss": 1.0486,
+      "grad_norm": 0.08665929734706879,
+      "learning_rate": 0.00010939911722121306,
+      "epoch": 0.49777777777777776,
+      "step": 560
+    },
+    {
+      "loss": 1.1087,
+      "grad_norm": 0.09826831519603729,
+      "learning_rate": 0.00010646692946022285,
+      "epoch": 0.5066666666666667,
+      "step": 570
+    },
+    {
+      "loss": 1.0638,
+      "grad_norm": 0.09649895876646042,
+      "learning_rate": 0.00010352914639058526,
+      "epoch": 0.5155555555555555,
+      "step": 580
+    },
+    {
+      "loss": 1.0457,
+      "grad_norm": 0.08156418800354004,
+      "learning_rate": 0.00010058830983730622,
+      "epoch": 0.5244444444444445,
+      "step": 590
+    },
+    {
+      "loss": 1.1155,
+      "grad_norm": 0.09795871376991272,
+      "learning_rate": 9.764696426732303e-05,
+      "epoch": 0.5333333333333333,
+      "step": 600
+    },
+    {
+      "loss": 1.1693,
+      "grad_norm": 0.08484228700399399,
+      "learning_rate": 9.470765458798368e-05,
+      "epoch": 0.5422222222222223,
+      "step": 610
+    },
+    {
+      "loss": 1.0619,
+      "grad_norm": 0.09032510966062546,
+      "learning_rate": 9.177292394514555e-05,
+      "epoch": 0.5511111111111111,
+      "step": 620
+    },
+    {
+      "loss": 1.0983,
+      "grad_norm": 0.09677831083536148,
+      "learning_rate": 8.884531152279756e-05,
+      "epoch": 0.56,
+      "step": 630
+    },
+    {
+      "loss": 1.1169,
+      "grad_norm": 0.08354154229164124,
+      "learning_rate": 8.592735034611097e-05,
+      "epoch": 0.5688888888888889,
+      "step": 640
+    },
+    {
+      "loss": 1.081,
+      "grad_norm": 0.08487077802419662,
+      "learning_rate": 8.302156508981815e-05,
+      "epoch": 0.5777777777777777,
+      "step": 650
+    },
+    {
+      "loss": 1.089,
+      "grad_norm": 0.0782162994146347,
+      "learning_rate": 8.013046989381691e-05,
+      "epoch": 0.5866666666666667,
+      "step": 660
+    },
+    {
+      "loss": 1.015,
+      "grad_norm": 0.08796509355306625,
+      "learning_rate": 7.725656618788937e-05,
+      "epoch": 0.5955555555555555,
+      "step": 670
+    },
+    {
+      "loss": 1.0907,
+      "grad_norm": 0.0932922437787056,
+      "learning_rate": 7.4402340527418e-05,
+      "epoch": 0.6044444444444445,
+      "step": 680
+    },
+    {
+      "loss": 1.0903,
+      "grad_norm": 0.07425214350223541,
+      "learning_rate": 7.157026244197132e-05,
+      "epoch": 0.6133333333333333,
+      "step": 690
+    },
+    {
+      "loss": 1.0779,
+      "grad_norm": 0.08208112418651581,
+      "learning_rate": 6.87627822986206e-05,
+      "epoch": 0.6222222222222222,
+      "step": 700
+    },
+    {
+      "loss": 1.1069,
+      "grad_norm": 0.08736101537942886,
+      "learning_rate": 6.598232918183632e-05,
+      "epoch": 0.6311111111111111,
+      "step": 710
+    },
+    {
+      "loss": 1.1449,
+      "grad_norm": 0.08801425993442535,
+      "learning_rate": 6.323130879179875e-05,
+      "epoch": 0.64,
+      "step": 720
+    },
+    {
+      "loss": 1.0806,
+      "grad_norm": 0.09554090350866318,
+      "learning_rate": 6.051210136294089e-05,
+      "epoch": 0.6488888888888888,
+      "step": 730
+    },
+    {
+      "loss": 1.0781,
+      "grad_norm": 0.08407624065876007,
+      "learning_rate": 5.7827059604525234e-05,
+      "epoch": 0.6577777777777778,
+      "step": 740
+    },
+    {
+      "loss": 1.0749,
+      "grad_norm": 0.10076402127742767,
+      "learning_rate": 5.517850666503547e-05,
+      "epoch": 0.6666666666666666,
+      "step": 750
+    },
+    {
+      "loss": 1.0283,
+      "grad_norm": 0.08602714538574219,
+      "learning_rate": 5.2568734122144756e-05,
+      "epoch": 0.6755555555555556,
+      "step": 760
+    },
+    {
+      "loss": 1.047,
+      "grad_norm": 0.08159055560827255,
+      "learning_rate": 5.000000000000002e-05,
+      "epoch": 0.6844444444444444,
+      "step": 770
+    },
+    {
+      "loss": 1.0853,
+      "grad_norm": 0.0906791090965271,
+      "learning_rate": 4.747452681553674e-05,
+      "epoch": 0.6933333333333334,
+      "step": 780
+    },
+    {
+      "loss": 1.0818,
+      "grad_norm": 0.09545056521892548,
+      "learning_rate": 4.4994499655515865e-05,
+      "epoch": 0.7022222222222222,
+      "step": 790
+    },
+    {
+      "loss": 1.105,
+      "grad_norm": 0.09591014683246613,
+      "learning_rate": 4.256206428594587e-05,
+      "epoch": 0.7111111111111111,
+      "step": 800
+    },
+    {
+      "loss": 1.0438,
+      "grad_norm": 0.09302148222923279,
+      "learning_rate": 4.017932529552543e-05,
+      "epoch": 0.72,
+      "step": 810
+    },
+    {
+      "loss": 1.1063,
+      "grad_norm": 0.09119318425655365,
+      "learning_rate": 3.784834427471408e-05,
+      "epoch": 0.7288888888888889,
+      "step": 820
+    },
+    {
+      "loss": 1.0919,
+      "grad_norm": 0.09713295102119446,
+      "learning_rate": 3.557113803200537e-05,
+      "epoch": 0.7377777777777778,
+      "step": 830
+    },
+    {
+      "loss": 1.0431,
+      "grad_norm": 0.08561000227928162,
+      "learning_rate": 3.3349676848946345e-05,
+      "epoch": 0.7466666666666667,
+      "step": 840
+    },
+    {
+      "loss": 1.0339,
+      "grad_norm": 0.09021477401256561,
+      "learning_rate": 3.118588277541312e-05,
+      "epoch": 0.7555555555555555,
+      "step": 850
+    },
+    {
+      "loss": 1.0718,
+      "grad_norm": 0.08053261786699295,
+      "learning_rate": 2.9081627966617096e-05,
+      "epoch": 0.7644444444444445,
+      "step": 860
+    },
+    {
+      "loss": 1.1127,
+      "grad_norm": 0.08699634671211243,
+      "learning_rate": 2.7038733063281174e-05,
+      "epoch": 0.7733333333333333,
+      "step": 870
+    },
+    {
+      "loss": 1.0671,
+      "grad_norm": 0.08015429228544235,
+      "learning_rate": 2.5058965616387498e-05,
+      "epoch": 0.7822222222222223,
+      "step": 880
+    },
+    {
+      "loss": 1.0999,
+      "grad_norm": 0.0755259171128273,
+      "learning_rate": 2.3144038557858916e-05,
+      "epoch": 0.7911111111111111,
+      "step": 890
+    },
+    {
+      "loss": 1.0768,
+      "grad_norm": 0.09144140779972076,
+      "learning_rate": 2.1295608718498284e-05,
+      "epoch": 0.8,
+      "step": 900
+    },
+    {
+      "loss": 1.1237,
+      "grad_norm": 0.0847860723733902,
+      "learning_rate": 1.9515275394467446e-05,
+      "epoch": 0.8088888888888889,
+      "step": 910
+    },
+    {
+      "loss": 1.0821,
+      "grad_norm": 0.08161620050668716,
+      "learning_rate": 1.7804578963545994e-05,
+      "epoch": 0.8177777777777778,
+      "step": 920
+    },
+    {
+      "loss": 1.1261,
+      "grad_norm": 0.10084446519613266,
+      "learning_rate": 1.6164999552367765e-05,
+      "epoch": 0.8266666666666667,
+      "step": 930
+    },
+    {
+      "loss": 1.1363,
+      "grad_norm": 0.08395445346832275,
+      "learning_rate": 1.4597955755787373e-05,
+      "epoch": 0.8355555555555556,
+      "step": 940
+    },
+    {
+      "loss": 1.0882,
+      "grad_norm": 0.10394323617219925,
+      "learning_rate": 1.3104803409485356e-05,
+      "epoch": 0.8444444444444444,
+      "step": 950
+    },
+    {
+      "loss": 1.0949,
+      "grad_norm": 0.08739852160215378,
+      "learning_rate": 1.1686834416873815e-05,
+      "epoch": 0.8533333333333334,
+      "step": 960
+    },
+    {
+      "loss": 1.0952,
+      "grad_norm": 0.08164411783218384,
+      "learning_rate": 1.0345275631317163e-05,
+      "epoch": 0.8622222222222222,
+      "step": 970
+    },
+    {
+      "loss": 1.0312,
+      "grad_norm": 0.09349754452705383,
+      "learning_rate": 9.081287794635774e-06,
+      "epoch": 0.8711111111111111,
+      "step": 980
+    },
+    {
+      "loss": 1.0914,
+      "grad_norm": 0.10627619922161102,
+      "learning_rate": 7.895964532810317e-06,
+      "epoch": 0.88,
+      "step": 990
+    },
+    {
+      "loss": 1.0996,
+      "grad_norm": 0.08615703135728836,
+      "learning_rate": 6.7903314097560454e-06,
+      "epoch": 0.8888888888888888,
+      "step": 1000
+    },
+    {
+      "loss": 1.102,
+      "grad_norm": 0.08835088461637497,
+      "learning_rate": 5.765345039985648e-06,
+      "epoch": 0.8977777777777778,
+      "step": 1010
+    },
+    {
+      "loss": 1.0453,
+      "grad_norm": 0.08689237385988235,
+      "learning_rate": 4.821892260928451e-06,
+      "epoch": 0.9066666666666666,
+      "step": 1020
+    },
+    {
+      "loss": 1.0663,
+      "grad_norm": 0.09312503784894943,
+      "learning_rate": 3.960789365622075e-06,
+      "epoch": 0.9155555555555556,
+      "step": 1030
+    },
+    {
+      "loss": 1.0539,
+      "grad_norm": 0.08195506036281586,
+      "learning_rate": 3.1827813964403484e-06,
+      "epoch": 0.9244444444444444,
+      "step": 1040
+    },
+    {
+      "loss": 1.0738,
+      "grad_norm": 0.08607076853513718,
+      "learning_rate": 2.4885415004686665e-06,
+      "epoch": 0.9333333333333333,
+      "step": 1050
+    },
+    {
+      "loss": 1.0909,
+      "grad_norm": 0.09247469156980515,
+      "learning_rate": 1.8786703470845547e-06,
+      "epoch": 0.9422222222222222,
+      "step": 1060
+    },
+    {
+      "loss": 1.1243,
+      "grad_norm": 0.07952508330345154,
+      "learning_rate": 1.3536956082472074e-06,
+      "epoch": 0.9511111111111111,
+      "step": 1070
+    },
+    {
+      "loss": 1.1077,
+      "grad_norm": 0.09081117808818817,
+      "learning_rate": 9.140715019458457e-07,
+      "epoch": 0.96,
+      "step": 1080
+    },
+    {
+      "loss": 1.0837,
+      "grad_norm": 0.07951053231954575,
+      "learning_rate": 5.60178399201805e-07,
+      "epoch": 0.9688888888888889,
+      "step": 1090
+    },
+    {
+      "loss": 1.0637,
+      "grad_norm": 0.09419357776641846,
+      "learning_rate": 2.923224949643477e-07,
+      "epoch": 0.9777777777777777,
+      "step": 1100
+    },
+    {
+      "loss": 1.0626,
+      "grad_norm": 0.08806079626083374,
+      "learning_rate": 1.1073554318509205e-07,
+      "epoch": 0.9866666666666667,
+      "step": 1110
+    },
+    {
+      "loss": 1.1044,
+      "grad_norm": 0.11233274638652802,
+      "learning_rate": 1.5574656300143542e-08,
+      "epoch": 0.9955555555555555,
+      "step": 1120
+    },
+    {
+      "train_runtime": 14563.2899,
+      "train_samples_per_second": 1.236,
+      "train_steps_per_second": 0.077,
+      "total_flos": 3.141329099864146e+17,
+      "train_loss": 1.0860543518066406,
+      "epoch": 1.0,
+      "step": 1125
+    }
+  ]
+}
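Several training details can be recovered from these metrics. The final `log_history` entry is a run summary rather than a step log, and its `step` of 1125 is the total number of optimizer steps in the single epoch. The sketch below derives three things. First, the run time, about 4 hours. Second, the approximate number of samples, computed as runtime times samples per second: about 18,000, or roughly 16 samples per optimizer step. How those 16 split across per-device batch size, gradient accumulation, and devices is not recorded. Third, it reproduces the logged learning rates. The scheduler is not named in the file. The following are all assumptions fitted to the logged values: the 2e-4 peak rate, the 0.05 warmup ratio (ceil(0.05 * 1125) = 57 warmup steps), the linear-warmup-plus-cosine shape, and the observation that the rate logged at step s equals the schedule evaluated at s - 1. That shape matches the formula behind Hugging Face's `get_cosine_schedule_with_warmup`.

```python
import math

# Values copied from training_metrics.json above.
TRAIN_RESULT = {
    "train_runtime": 14563.2899,
    "train_samples_per_second": 1.236,
    "train_steps_per_second": 0.077,
    "train_loss": 1.0860543518066406,
    "epoch": 1.0,
}
TOTAL_STEPS = 1125  # "step" of the final summary entry in log_history
LOGGED_LR = {       # step -> "learning_rate" for a few log_history entries
    10: 3.157894736842105e-05,
    60: 0.0001999982694427025,
    770: 5.000000000000002e-05,
    1120: 1.5574656300143542e-08,
}

# Assumptions inferred from the logged values, not stated in the file.
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05


def cosine_with_warmup(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 57
# train_samples_per_second is rounded to 3 decimals, so this is approximate.
samples_seen = round(TRAIN_RESULT["train_runtime"] * TRAIN_RESULT["train_samples_per_second"])
samples_per_step = samples_seen / TOTAL_STEPS

print(f"runtime: {TRAIN_RESULT['train_runtime'] / 3600:.2f} h")
print(f"samples: ~{samples_seen}, ~{samples_per_step:.0f} per optimizer step")
print(f"warmup steps: {warmup_steps}")

for step, logged in LOGGED_LR.items():
    # The rate logged at step s matches the schedule evaluated at s - 1.
    predicted = cosine_with_warmup(step - 1, PEAK_LR, warmup_steps, TOTAL_STEPS)
    print(f"step {step:>4}: logged {logged:.6e}  predicted {predicted:.6e}")
```

When loading the full file, filter `log_history` for entries that contain a `loss` key to get the per-step curve, since the last entry carries `train_runtime` and `train_loss` instead.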