madhuHuggingface committed on Apr 24

Commit 1b9237e (verified) · 1 Parent(s): 70b41d2

Training in progress, step 1500, checkpoint

Files changed (5)

last-checkpoint/adapter_model.safetensors +1 -1
last-checkpoint/optimizer.pt +2 -2
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +596 -701
last-checkpoint/training_args.bin +1 -1
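
The `learning_rate` values in the `trainer_state.json` log history look like a linear warmup followed by a cosine decay. The sketch below is one reading of those numbers, not a configuration confirmed anywhere in this commit. It assumes a peak rate of 2e-4, 20 warmup steps, and 1650 total steps (the `global_step` value on the removed side); all three constants are inferred from the logged values. It also assumes that the rate logged at step `s` is the one set after `s - 1` scheduler updates.

```python
import math

# Assumed hyperparameters, inferred from the logged values (not stated in the commit).
PEAK_LR = 2e-4
WARMUP_STEPS = 20
TOTAL_STEPS = 1650  # the removed "global_step" value


def lr_multiplier(scheduler_step: int) -> float:
    """Linear warmup to 1.0, then a half-cosine decay towards zero."""
    if scheduler_step < WARMUP_STEPS:
        return scheduler_step / max(1, WARMUP_STEPS)
    progress = (scheduler_step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def logged_lr(step: int) -> float:
    """Learning rate expected in log_history at a given training step.

    Assumption: the value logged at step s reflects s - 1 scheduler updates.
    """
    return PEAK_LR * lr_multiplier(step - 1)


# Print the predicted rate at a few logged steps for comparison with the diff.
for step in (10, 20, 30, 550, 1100):
    print(step, logged_lr(step))
```

Under these assumptions the predictions for steps 10, 20, and 30 agree with the logged 9e-05, 0.00019, and 0.00019998495581483372 up to floating-point rounding. This suggests, but does not prove, that the earlier run used such a schedule.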

last-checkpoint/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4cb557daf6d75831c1e4da4535fdc7690f5ccf481dc20f270738e296bd0bdd5c
+oid sha256:f6eda6363323c6d79792a81bd69d938b3a58d34e1eb645e055ee597b8bf472ad
 size 60785144

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:148ccd6ae7f5f80cfe57bd86a288b885812f4fc65258f37d567f0168c8f6621a
-size 31803787
+oid sha256:ebc1f15cea2729790f76d46a9aa205d07f6d4e3b1ddf01b2a41c940267766469
+size 31149205

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e028462760f837e3f0e379c4b9e8963a3ff91e1441d984e3578a111bec3744ac
+oid sha256:fd1b8760497cf2afe6ff758fde5edda9af4f73c2c6f23659a7e5f0cb97215d93
 size 1465
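
The three binary files above are stored as Git LFS pointers: a `version` line, an `oid` line carrying a SHA-256 digest, and a `size` in bytes. The sketch below shows one way to parse such a pointer and check a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, chosen for illustration; they are not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


# The new scheduler.pt pointer from this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:fd1b8760497cf2afe6ff758fde5edda9af4f73c2c6f23659a7e5f0cb97215d93\n"
    "size 1465\n"
)
print(pointer["algorithm"], pointer["size"])
```

After downloading a checkpoint file, `matches_pointer(Path("last-checkpoint/scheduler.pt"), pointer)` would confirm that the local copy corresponds to this commit, assuming the file sits at that relative path.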

last-checkpoint/trainer_state.json CHANGED

@@ -4,1169 +4,1064 @@
   "best_model_checkpoint": null,
   "epoch": 3.0,
   "eval_steps": 500,
-  "global_step": 1650,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.01818181818181818,
-      "grad_norm": 2.724055290222168,
       "learning_rate": 9e-05,
-      "loss": 0.8719,
       "step": 10
     },
     {
-      "epoch": 0.03636363636363636,
-      "grad_norm": 1.765032410621643,
       "learning_rate": 0.00019,
-      "loss": 0.0961,
       "step": 20
     },
     {
-      "epoch": 0.05454545454545454,
-      "grad_norm": 2.5369484424591064,
-      "learning_rate": 0.00019998495581483372,
-      "loss": 0.2139,
       "step": 30
     },
     {
-      "epoch": 0.07272727272727272,
-      "grad_norm": 1.9454748630523682,
-      "learning_rate": 0.00019993295703551577,
-      "loss": 0.2203,
       "step": 40
     },
     {
-      "epoch": 0.09090909090909091,
-      "grad_norm": 10.630720138549805,
-      "learning_rate": 0.00019984383724218924,
-      "loss": 0.1146,
       "step": 50
     },
     {
-      "epoch": 0.10909090909090909,
-      "grad_norm": 0.7372597455978394,
-      "learning_rate": 0.00019971762953921923,
-      "loss": 0.0478,
       "step": 60
     },
     {
-      "epoch": 0.12727272727272726,
-      "grad_norm": 0.2733962833881378,
-      "learning_rate": 0.00019955438080761524,
-      "loss": 0.0542,
       "step": 70
     },
     {
-      "epoch": 0.14545454545454545,
-      "grad_norm": 0.24756257236003876,
-      "learning_rate": 0.00019935415168761682,
-      "loss": 0.0428,
       "step": 80
     },
     {
-      "epoch": 0.16363636363636364,
-      "grad_norm": 0.9512230753898621,
-      "learning_rate": 0.00019911701655616818,
-      "loss": 0.0108,
       "step": 90
     },
     {
-      "epoch": 0.18181818181818182,
-      "grad_norm": 0.02735295332968235,
-      "learning_rate": 0.00019884306349929017,
-      "loss": 0.0079,
       "step": 100
     },
     {
-      "epoch": 0.2,
-      "grad_norm": 0.17632828652858734,
-      "learning_rate": 0.00019853239427935994,
-      "loss": 0.0156,
       "step": 110
     },
     {
-      "epoch": 0.21818181818181817,
-      "grad_norm": 1.9710729122161865,
-      "learning_rate": 0.0001981851242973103,
-      "loss": 0.0087,
       "step": 120
     },
     {
-      "epoch": 0.23636363636363636,
-      "grad_norm": 1.9644619226455688,
-      "learning_rate": 0.00019780138254976308,
-      "loss": 0.015,
       "step": 130
     },
     {
-      "epoch": 0.2545454545454545,
-      "grad_norm": 3.3607516288757324,
-      "learning_rate": 0.0001973813115811122,
-      "loss": 0.0164,
       "step": 140
     },
     {
-      "epoch": 0.2727272727272727,
-      "grad_norm": 0.11576563864946365,
-      "learning_rate": 0.00019692506743057404,
-      "loss": 0.0072,
       "step": 150
     },
     {
-      "epoch": 0.2909090909090909,
-      "grad_norm": 0.01492376159876585,
-      "learning_rate": 0.00019643281957422545,
-      "loss": 0.0124,
       "step": 160
     },
     {
-      "epoch": 0.3090909090909091,
-      "grad_norm": 0.09349380433559418,
-      "learning_rate": 0.0001959047508620502,
-      "loss": 0.019,
       "step": 170
     },
     {
-      "epoch": 0.32727272727272727,
-      "grad_norm": 1.256628155708313,
-      "learning_rate": 0.00019534105745001762,
-      "loss": 0.0115,
       "step": 180
     },
     {
-      "epoch": 0.34545454545454546,
-      "grad_norm": 0.023120613768696785,
-      "learning_rate": 0.00019474194872721892,
-      "loss": 0.0071,
       "step": 190
     },
     {
-      "epoch": 0.36363636363636365,
-      "grad_norm": 0.7280399799346924,
-      "learning_rate": 0.0001941076472380873,
-      "loss": 0.0091,
       "step": 200
     },
     {
-      "epoch": 0.38181818181818183,
-      "grad_norm": 0.3502386808395386,
-      "learning_rate": 0.00019343838859973219,
-      "loss": 0.0073,
       "step": 210
     },
     {
-      "epoch": 0.4,
-      "grad_norm": 0.02848837897181511,
-      "learning_rate": 0.0001927344214144167,
-      "loss": 0.0085,
       "step": 220
     },
     {
-      "epoch": 0.41818181818181815,
-      "grad_norm": 0.06355214864015579,
-      "learning_rate": 0.00019199600717721243,
-      "loss": 0.0089,
       "step": 230
     },
     {
-      "epoch": 0.43636363636363634,
-      "grad_norm": 0.043841782957315445,
-      "learning_rate": 0.0001912234201788645,
-      "loss": 0.0056,
       "step": 240
     },
     {
-      "epoch": 0.45454545454545453,
-      "grad_norm": 0.4108541011810303,
-      "learning_rate": 0.00019041694740390362,
-      "loss": 0.0136,
       "step": 250
     },
     {
-      "epoch": 0.4727272727272727,
-      "grad_norm": 1.324217677116394,
-      "learning_rate": 0.00018957688842404337,
-      "loss": 0.013,
       "step": 260
     },
     {
-      "epoch": 0.4909090909090909,
-      "grad_norm": 1.3584219217300415,
-      "learning_rate": 0.00018870355528690134,
-      "loss": 0.018,
       "step": 270
     },
     {
-      "epoch": 0.509090909090909,
-      "grad_norm": 0.11316097527742386,
-      "learning_rate": 0.00018779727240008618,
-      "loss": 0.0114,
       "step": 280
     },
     {
-      "epoch": 0.5272727272727272,
-      "grad_norm": 0.015650872141122818,
-      "learning_rate": 0.00018685837641069342,
-      "loss": 0.0067,
       "step": 290
     },
     {
-      "epoch": 0.5454545454545454,
-      "grad_norm": 0.011964845471084118,
-      "learning_rate": 0.0001858872160802549,
-      "loss": 0.0049,
       "step": 300
     },
     {
-      "epoch": 0.5636363636363636,
-      "grad_norm": 0.1816367506980896,
-      "learning_rate": 0.00018488415215518807,
-      "loss": 0.0055,
       "step": 310
     },
     {
-      "epoch": 0.5818181818181818,
-      "grad_norm": 0.0085946349427104,
-      "learning_rate": 0.00018384955723279325,
-      "loss": 0.0043,
       "step": 320
     },
     {
-      "epoch": 0.6,
-      "grad_norm": 0.0685335248708725,
-      "learning_rate": 0.00018278381562284926,
-      "loss": 0.0053,
       "step": 330
     },
     {
-      "epoch": 0.6181818181818182,
-      "grad_norm": 0.06212463974952698,
-      "learning_rate": 0.00018168732320485774,
-      "loss": 0.0045,
       "step": 340
     },
     {
-      "epoch": 0.6363636363636364,
-      "grad_norm": 0.002532408805564046,
-      "learning_rate": 0.00018056048728099024,
-      "loss": 0.0017,
       "step": 350
     },
     {
-      "epoch": 0.6545454545454545,
-      "grad_norm": 0.19549418985843658,
-      "learning_rate": 0.000179403726424792,
-      "loss": 0.0196,
       "step": 360
     },
     {
-      "epoch": 0.6727272727272727,
-      "grad_norm": 0.20419709384441376,
-      "learning_rate": 0.00017821747032569906,
-      "loss": 0.0073,
       "step": 370
     },
     {
-      "epoch": 0.6909090909090909,
-      "grad_norm": 0.038685109466314316,
-      "learning_rate": 0.0001770021596294261,
-      "loss": 0.0044,
       "step": 380
     },
     {
-      "epoch": 0.7090909090909091,
-      "grad_norm": 0.023335754871368408,
-      "learning_rate": 0.00017575824577428453,
-      "loss": 0.008,
       "step": 390
     },
     {
-      "epoch": 0.7272727272727273,
-      "grad_norm": 0.0774979218840599,
-      "learning_rate": 0.00017448619082349165,
-      "loss": 0.0046,
       "step": 400
     },
     {
-      "epoch": 0.7454545454545455,
-      "grad_norm": 0.009631125256419182,
-      "learning_rate": 0.000173186467293533,
-      "loss": 0.0043,
       "step": 410
     },
     {
-      "epoch": 0.7636363636363637,
-      "grad_norm": 0.04730548709630966,
-      "learning_rate": 0.00017185955797864184,
-      "loss": 0.0039,
       "step": 420
     },
     {
-      "epoch": 0.7818181818181819,
-      "grad_norm": 0.05997833237051964,
-      "learning_rate": 0.00017050595577146061,
-      "loss": 0.004,
       "step": 430
     },
     {
-      "epoch": 0.8,
-      "grad_norm": 0.7156815528869629,
-      "learning_rate": 0.00016912616347995157,
-      "loss": 0.0053,
       "step": 440
     },
     {
-      "epoch": 0.8181818181818182,
-      "grad_norm": 1.1794134378433228,
-      "learning_rate": 0.00016772069364062432,
-      "loss": 0.0063,
       "step": 450
     },
     {
-      "epoch": 0.8363636363636363,
-      "grad_norm": 0.9097093939781189,
-      "learning_rate": 0.0001662900683281491,
-      "loss": 0.0085,
       "step": 460
     },
     {
-      "epoch": 0.8545454545454545,
-      "grad_norm": 0.008992375805974007,
-      "learning_rate": 0.0001648348189614275,
-      "loss": 0.0037,
       "step": 470
     },
     {
-      "epoch": 0.8727272727272727,
-      "grad_norm": 0.024826984852552414,
-      "learning_rate": 0.00016335548610619215,
-      "loss": 0.0045,
       "step": 480
     },
     {
-      "epoch": 0.8909090909090909,
-      "grad_norm": 0.04557984322309494,
-      "learning_rate": 0.00016185261927420845,
       "loss": 0.0039,
       "step": 490
     },
     {
-      "epoch": 0.9090909090909091,
-      "grad_norm": 0.016058191657066345,
-      "learning_rate": 0.00016032677671915343,
-      "loss": 0.0048,
       "step": 500
     },
     {
-      "epoch": 0.9272727272727272,
-      "grad_norm": 0.08177982270717621,
-      "learning_rate": 0.00015877852522924732,
-      "loss": 0.0047,
       "step": 510
     },
     {
-      "epoch": 0.9454545454545454,
-      "grad_norm": 0.00327894976362586,
-      "learning_rate": 0.00015720843991671486,
-      "loss": 0.0033,
       "step": 520
     },
     {
-      "epoch": 0.9636363636363636,
-      "grad_norm": 0.0032260078005492687,
-      "learning_rate": 0.0001556171040041546,
-      "loss": 0.0027,
       "step": 530
     },
     {
-      "epoch": 0.9818181818181818,
-      "grad_norm": 0.0032725839409977198,
-      "learning_rate": 0.00015400510860789546,
-      "loss": 0.0034,
       "step": 540
     },
     {
-      "epoch": 1.0,
-      "grad_norm": 0.005383970681577921,
-      "learning_rate": 0.00015237305251842122,
-      "loss": 0.003,
       "step": 550
     },
     {
-      "epoch": 1.018181818181818,
-      "grad_norm": 0.03116321749985218,
-      "learning_rate": 0.00015072154197794422,
-      "loss": 0.0038,
       "step": 560
     },
     {
-      "epoch": 1.0363636363636364,
-      "grad_norm": 0.08080937713384628,
-      "learning_rate": 0.00014905119045521115,
-      "loss": 0.0022,
       "step": 570
     },
     {
-      "epoch": 1.0545454545454545,
-      "grad_norm": 0.08265340328216553,
-      "learning_rate": 0.00014736261841762454,
-      "loss": 0.0052,
       "step": 580
     },
     {
-      "epoch": 1.0727272727272728,
-      "grad_norm": 0.001597880502231419,
-      "learning_rate": 0.00014565645310076427,
-      "loss": 0.0043,
       "step": 590
     },
     {
-      "epoch": 1.0909090909090908,
-      "grad_norm": 0.02374003641307354,
-      "learning_rate": 0.0001439333282753954,
-      "loss": 0.0045,
       "step": 600
     },
     {
-      "epoch": 1.1090909090909091,
-      "grad_norm": 0.024072911590337753,
-      "learning_rate": 0.00014219388401204796,
-      "loss": 0.0059,
       "step": 610
     },
     {
-      "epoch": 1.1272727272727272,
-      "grad_norm": 0.0032293887343257666,
-      "learning_rate": 0.00014043876644325703,
-      "loss": 0.0042,
       "step": 620
     },
     {
-      "epoch": 1.1454545454545455,
-      "grad_norm": 0.03435875102877617,
-      "learning_rate": 0.00013866862752355088,
-      "loss": 0.0035,
       "step": 630
     },
     {
-      "epoch": 1.1636363636363636,
-      "grad_norm": 0.0040184855461120605,
-      "learning_rate": 0.00013688412478727634,
-      "loss": 0.0035,
       "step": 640
     },
     {
-      "epoch": 1.1818181818181819,
-      "grad_norm": 0.002403195947408676,
-      "learning_rate": 0.0001350859211043517,
-      "loss": 0.0036,
       "step": 650
     },
     {
-      "epoch": 1.2,
-      "grad_norm": 0.0033931646030396223,
-      "learning_rate": 0.00013327468443403783,
-      "loss": 0.0037,
       "step": 660
     },
     {
-      "epoch": 1.2181818181818183,
-      "grad_norm": 0.05022850260138512,
-      "learning_rate": 0.00013145108757681818,
-      "loss": 0.0039,
       "step": 670
     },
     {
-      "epoch": 1.2363636363636363,
-      "grad_norm": 0.03751479461789131,
-      "learning_rate": 0.00012961580792448106,
-      "loss": 0.0068,
       "step": 680
     },
     {
-      "epoch": 1.2545454545454544,
-      "grad_norm": 0.02040746621787548,
-      "learning_rate": 0.00012776952720849636,
       "loss": 0.0035,
       "step": 690
     },
     {
-      "epoch": 1.2727272727272727,
-      "grad_norm": 0.07843760401010513,
-      "learning_rate": 0.0001259129312467799,
-      "loss": 0.0053,
       "step": 700
     },
     {
-      "epoch": 1.290909090909091,
-      "grad_norm": 0.03921971097588539,
-      "learning_rate": 0.00012404670968894037,
-      "loss": 0.0029,
       "step": 710
     },
     {
-      "epoch": 1.309090909090909,
-      "grad_norm": 0.01010841503739357,
-      "learning_rate": 0.00012217155576010224,
-      "loss": 0.0043,
       "step": 720
     },
     {
-      "epoch": 1.3272727272727272,
-      "grad_norm": 0.007104328367859125,
-      "learning_rate": 0.00012028816600340136,
-      "loss": 0.0029,
       "step": 730
     },
     {
-      "epoch": 1.3454545454545455,
-      "grad_norm": 0.0017335556913167238,
-      "learning_rate": 0.0001183972400212473,
-      "loss": 0.0029,
       "step": 740
     },
     {
-      "epoch": 1.3636363636363638,
-      "grad_norm": 0.03576100617647171,
-      "learning_rate": 0.00011649948021544979,
-      "loss": 0.0033,
       "step": 750
     },
     {
-      "epoch": 1.3818181818181818,
-      "grad_norm": 0.0428117960691452,
-      "learning_rate": 0.00011459559152630511,
-      "loss": 0.0047,
       "step": 760
     },
     {
-      "epoch": 1.4,
-      "grad_norm": 0.005341957323253155,
-      "learning_rate": 0.00011268628117073939,
-      "loss": 0.0026,
       "step": 770
     },
     {
-      "epoch": 1.4181818181818182,
-      "grad_norm": 0.0017387029947713017,
-      "learning_rate": 0.00011077225837960659,
-      "loss": 0.003,
       "step": 780
     },
     {
-      "epoch": 1.4363636363636363,
-      "grad_norm": 0.3902546167373657,
-      "learning_rate": 0.00010885423413423812,
-      "loss": 0.0032,
       "step": 790
     },
     {
-      "epoch": 1.4545454545454546,
-      "grad_norm": 0.0021570881363004446,
-      "learning_rate": 0.00010693292090234228,
-      "loss": 0.0023,
       "step": 800
     },
     {
-      "epoch": 1.4727272727272727,
-      "grad_norm": 0.0013360620941966772,
-      "learning_rate": 0.00010500903237335156,
-      "loss": 0.0028,
       "step": 810
     },
     {
-      "epoch": 1.490909090909091,
-      "grad_norm": 0.0020807127002626657,
-      "learning_rate": 0.00010308328319331621,
-      "loss": 0.004,
       "step": 820
     },
     {
-      "epoch": 1.509090909090909,
-      "grad_norm": 0.002286926843225956,
-      "learning_rate": 0.00010115638869944238,
-      "loss": 0.0026,
       "step": 830
     },
     {
-      "epoch": 1.5272727272727273,
-      "grad_norm": 0.00438573257997632,
-      "learning_rate": 9.922906465437359e-05,
-      "loss": 0.005,
       "step": 840
     },
     {
-      "epoch": 1.5454545454545454,
-      "grad_norm": 0.008608890697360039,
-      "learning_rate": 9.730202698031409e-05,
-      "loss": 0.0036,
       "step": 850
     },
     {
-      "epoch": 1.5636363636363635,
-      "grad_norm": 0.07345500588417053,
-      "learning_rate": 9.537599149309288e-05,
-      "loss": 0.0031,
       "step": 860
     },
     {
-      "epoch": 1.5818181818181818,
-      "grad_norm": 0.03977720066905022,
-      "learning_rate": 9.345167363626764e-05,
-      "loss": 0.0021,
       "step": 870
     },
     {
-      "epoch": 1.6,
-      "grad_norm": 0.04508666321635246,
-      "learning_rate": 9.15297882153664e-05,
-      "loss": 0.0036,
       "step": 880
     },
     {
-      "epoch": 1.6181818181818182,
-      "grad_norm": 0.001087481272406876,
-      "learning_rate": 8.961104913236644e-05,
-      "loss": 0.0018,
       "step": 890
     },
     {
-      "epoch": 1.6363636363636362,
-      "grad_norm": 0.0006801167037338018,
-      "learning_rate": 8.769616912050914e-05,
-      "loss": 0.0023,
       "step": 900
     },
     {
-      "epoch": 1.6545454545454545,
-      "grad_norm": 0.0018512771930545568,
-      "learning_rate": 8.578585947954832e-05,
-      "loss": 0.0029,
       "step": 910
     },
     {
-      "epoch": 1.6727272727272728,
-      "grad_norm": 0.002133868169039488,
-      "learning_rate": 8.388082981153165e-05,
       "loss": 0.003,
       "step": 920
     },
     {
-      "epoch": 1.690909090909091,
-      "grad_norm": 0.027992183342576027,
-      "learning_rate": 8.198178775721249e-05,
-      "loss": 0.0034,
       "step": 930
     },
     {
-      "epoch": 1.709090909090909,
-      "grad_norm": 0.01641463302075863,
-      "learning_rate": 8.008943873319001e-05,
-      "loss": 0.0021,
       "step": 940
     },
     {
-      "epoch": 1.7272727272727273,
-      "grad_norm": 0.0009675964247435331,
-      "learning_rate": 7.820448566987582e-05,
-      "loss": 0.0028,
       "step": 950
     },
     {
-      "epoch": 1.7454545454545456,
-      "grad_norm": 0.0009243777021765709,
-      "learning_rate": 7.632762875038421e-05,
-      "loss": 0.0038,
       "step": 960
     },
     {
-      "epoch": 1.7636363636363637,
-      "grad_norm": 0.0014060670509934425,
-      "learning_rate": 7.445956515044248e-05,
-      "loss": 0.0022,
       "step": 970
     },
     {
-      "epoch": 1.7818181818181817,
-      "grad_norm": 0.0010076279286295176,
-      "learning_rate": 7.260098877941856e-05,
-      "loss": 0.0027,
       "step": 980
     },
     {
-      "epoch": 1.8,
-      "grad_norm": 0.03766465559601784,
-      "learning_rate": 7.075259002256233e-05,
-      "loss": 0.0051,
       "step": 990
     },
     {
-      "epoch": 1.8181818181818183,
-      "grad_norm": 0.0012187482789158821,
-      "learning_rate": 6.891505548455539e-05,
-      "loss": 0.0026,
       "step": 1000
     },
     {
-      "epoch": 1.8363636363636364,
-      "grad_norm": 0.0012040241854265332,
-      "learning_rate": 6.708906773446544e-05,
-      "loss": 0.0022,
       "step": 1010
     },
     {
-      "epoch": 1.8545454545454545,
-      "grad_norm": 0.021220913156867027,
-      "learning_rate": 6.527530505220008e-05,
-      "loss": 0.003,
       "step": 1020
     },
     {
-      "epoch": 1.8727272727272726,
-      "grad_norm": 0.0018169321119785309,
-      "learning_rate": 6.347444117655306e-05,
-      "loss": 0.0032,
       "step": 1030
     },
     {
-      "epoch": 1.8909090909090909,
-      "grad_norm": 0.0020051824394613504,
-      "learning_rate": 6.16871450549381e-05,
-      "loss": 0.0028,
       "step": 1040
     },
     {
-      "epoch": 1.9090909090909092,
-      "grad_norm": 0.04109319671988487,
-      "learning_rate": 5.9914080594902235e-05,
-      "loss": 0.0033,
       "step": 1050
     },
     {
-      "epoch": 1.9272727272727272,
-      "grad_norm": 0.028795786201953888,
-      "learning_rate": 5.815590641751112e-05,
-      "loss": 0.0034,
       "step": 1060
     },
     {
-      "epoch": 1.9454545454545453,
-      "grad_norm": 0.0015263812383636832,
-      "learning_rate": 5.641327561269828e-05,
-      "loss": 0.0025,
       "step": 1070
     },
     {
-      "epoch": 1.9636363636363636,
-      "grad_norm": 0.0009399221162311733,
-      "learning_rate": 5.468683549666884e-05,
-      "loss": 0.0024,
       "step": 1080
     },
     {
-      "epoch": 1.981818181818182,
-      "grad_norm": 0.034878093749284744,
-      "learning_rate": 5.297722737144802e-05,
-      "loss": 0.0028,
       "step": 1090
     },
     {
-      "epoch": 2.0,
-      "grad_norm": 0.045912522822618484,
-      "learning_rate": 5.128508628666364e-05,
-      "loss": 0.0032,
       "step": 1100
     },
     {
-      "epoch": 2.018181818181818,
-      "grad_norm": 0.0015094137052074075,
-      "learning_rate": 4.96110408036509e-05,
-      "loss": 0.0022,
       "step": 1110
     },
     {
-      "epoch": 2.036363636363636,
-      "grad_norm": 0.0008388591813854873,
-      "learning_rate": 4.7955712761967785e-05,
-      "loss": 0.0021,
       "step": 1120
     },
     {
-      "epoch": 2.0545454545454547,
-      "grad_norm": 0.046201568096876144,
-      "learning_rate": 4.631971704840685e-05,
-      "loss": 0.0067,
       "step": 1130
     },
     {
-      "epoch": 2.0727272727272728,
-      "grad_norm": 0.09966272115707397,
-      "learning_rate": 4.470366136858994e-05,
-      "loss": 0.0034,
       "step": 1140
     },
     {
-      "epoch": 2.090909090909091,
-      "grad_norm": 0.0031047267839312553,
-      "learning_rate": 4.310814602123047e-05,
-      "loss": 0.003,
       "step": 1150
     },
     {
-      "epoch": 2.109090909090909,
-      "grad_norm": 0.04409536346793175,
-      "learning_rate": 4.153376367514673e-05,
       "loss": 0.0031,
       "step": 1160
     },
     {
-      "epoch": 2.1272727272727274,
-      "grad_norm": 0.03463288024067879,
-      "learning_rate": 3.998109914910978e-05,
-      "loss": 0.0034,
       "step": 1170
     },
     {
-      "epoch": 2.1454545454545455,
-      "grad_norm": 0.04570608213543892,
-      "learning_rate": 3.845072919460717e-05,
-      "loss": 0.0038,
       "step": 1180
     },
     {
-      "epoch": 2.1636363636363636,
-      "grad_norm": 0.0019727866165339947,
-      "learning_rate": 3.694322228160325e-05,
-      "loss": 0.0031,
       "step": 1190
     },
     {
-      "epoch": 2.1818181818181817,
-      "grad_norm": 0.0016377349384129047,
-      "learning_rate": 3.545913838737567e-05,
-      "loss": 0.0027,
       "step": 1200
     },
     {
-      "epoch": 2.2,
-      "grad_norm": 0.053264349699020386,
-      "learning_rate": 3.399902878850693e-05,
-      "loss": 0.0019,
       "step": 1210
     },
     {
-      "epoch": 2.2181818181818183,
-      "grad_norm": 0.03723418712615967,
-      "learning_rate": 3.256343585610739e-05,
-      "loss": 0.0024,
       "step": 1220
     },
     {
-      "epoch": 2.2363636363636363,
-      "grad_norm": 0.03567889332771301,
-      "learning_rate": 3.115289285434671e-05,
-      "loss": 0.0038,
       "step": 1230
     },
     {
-      "epoch": 2.2545454545454544,
-      "grad_norm": 0.05600603297352791,
-      "learning_rate": 2.9767923742367942e-05,
-      "loss": 0.0025,
       "step": 1240
     },
     {
-      "epoch": 2.2727272727272725,
-      "grad_norm": 0.036135077476501465,
-      "learning_rate": 2.8409042979657995e-05,
-      "loss": 0.0022,
       "step": 1250
     },
     {
-      "epoch": 2.290909090909091,
-      "grad_norm": 0.03194129467010498,
-      "learning_rate": 2.7076755334947122e-05,
-      "loss": 0.0027,
       "step": 1260
     },
     {
-      "epoch": 2.309090909090909,
-      "grad_norm": 0.0014442523242905736,
-      "learning_rate": 2.5771555698707804e-05,
-      "loss": 0.0024,
       "step": 1270
     },
     {
-      "epoch": 2.327272727272727,
-      "grad_norm": 0.02508995682001114,
-      "learning_rate": 2.449392889932315e-05,
-      "loss": 0.0033,
       "step": 1280
     },
     {
-      "epoch": 2.3454545454545457,
-      "grad_norm": 0.056714046746492386,
-      "learning_rate": 2.324434952299298e-05,
-      "loss": 0.0026,
       "step": 1290
     },
     {
-      "epoch": 2.3636363636363638,
-      "grad_norm": 0.03895105794072151,
-      "learning_rate": 2.2023281737444435e-05,
-      "loss": 0.0031,
       "step": 1300
     },
     {
-      "epoch": 2.381818181818182,
-      "grad_norm": 0.0020949903409928083,
-      "learning_rate": 2.0831179119512623e-05,
-      "loss": 0.0032,
       "step": 1310
     },
     {
-      "epoch": 2.4,
-      "grad_norm": 0.04687955975532532,
-      "learning_rate": 1.966848448665529e-05,
-      "loss": 0.0037,
       "step": 1320
     },
     {
-      "epoch": 2.418181818181818,
-      "grad_norm": 0.05086053907871246,
-      "learning_rate": 1.853562973246421e-05,
-      "loss": 0.0038,
       "step": 1330
     },
     {
-      "epoch": 2.4363636363636365,
-      "grad_norm": 0.04270637780427933,
-      "learning_rate": 1.7433035666234442e-05,
-      "loss": 0.0025,
       "step": 1340
     },
     {
-      "epoch": 2.4545454545454546,
-      "grad_norm": 0.03331838920712471,
-      "learning_rate": 1.6361111856650768e-05,
-      "loss": 0.0029,
       "step": 1350
     },
     {
-      "epoch": 2.4727272727272727,
-      "grad_norm": 0.0021509944926947355,
-      "learning_rate": 1.5320256479649607e-05,
-      "loss": 0.002,
       "step": 1360
     },
     {
-      "epoch": 2.4909090909090907,
-      "grad_norm": 0.0010530983563512564,
-      "learning_rate": 1.4310856170513087e-05,
-      "loss": 0.0032,
       "step": 1370
     },
     {
-      "epoch": 2.509090909090909,
-      "grad_norm": 0.044435929507017136,
-      "learning_rate": 1.333328588024959e-05,
-      "loss": 0.0035,
       "step": 1380
     },
     {
-      "epoch": 2.5272727272727273,
-      "grad_norm": 0.00098559504840523,
-      "learning_rate": 1.2387908736314923e-05,
-      "loss": 0.0031,
       "step": 1390
     },
     {
-      "epoch": 2.5454545454545454,
-      "grad_norm": 0.11319796741008759,
-      "learning_rate": 1.1475075907725253e-05,
-      "loss": 0.0031,
       "step": 1400
     },
     {
-      "epoch": 2.5636363636363635,
-      "grad_norm": 0.0011413119500502944,
-      "learning_rate": 1.0595126474612106e-05,
-      "loss": 0.0023,
       "step": 1410
     },
     {
-      "epoch": 2.581818181818182,
-      "grad_norm": 0.0011832962045446038,
-      "learning_rate": 9.748387302268036e-06,
-      "loss": 0.0023,
       "step": 1420
     },
     {
-      "epoch": 2.6,
-      "grad_norm": 0.0010551942978054285,
-      "learning_rate": 8.935172919729373e-06,
-      "loss": 0.0028,
       "step": 1430
     },
     {
-      "epoch": 2.618181818181818,
-      "grad_norm": 0.0023533699568361044,
-      "learning_rate": 8.155785402941684e-06,
-      "loss": 0.0029,
       "step": 1440
     },
     {
-      "epoch": 2.6363636363636362,
-      "grad_norm": 0.0014758601319044828,
-      "learning_rate": 7.410514262550749e-06,
-      "loss": 0.002,
       "step": 1450
     },
     {
-      "epoch": 2.6545454545454543,
-      "grad_norm": 0.001560390810482204,
-      "learning_rate": 6.6996363363612925e-06,
-      "loss": 0.0033,
       "step": 1460
     },
     {
-      "epoch": 2.672727272727273,
-      "grad_norm": 0.0016578083159402013,
-      "learning_rate": 6.023415686502942e-06,
-      "loss": 0.0018,
       "step": 1470
     },
     {
-      "epoch": 2.690909090909091,
-      "grad_norm": 0.002960038837045431,
-      "learning_rate": 5.382103501341973e-06,
       "loss": 0.0026,
       "step": 1480
     },
     {
-      "epoch": 2.709090909090909,
-      "grad_norm": 0.03903718665242195,
-      "learning_rate": 4.775938002175129e-06,
-      "loss": 0.003,
-      "step": 1490
-    },
-    {
-      "epoch": 2.7272727272727275,
-      "grad_norm": 0.03972185030579567,
-      "learning_rate": 4.205144354740032e-06,
-      "loss": 0.0031,
-      "step": 1500
-    },
-    {
-      "epoch": 2.7454545454545456,
-      "grad_norm": 0.0010361653985455632,
-      "learning_rate": 3.6699345855753855e-06,
-      "loss": 0.0021,
-      "step": 1510
-    },
-    {
-      "epoch": 2.7636363636363637,
-      "grad_norm": 0.0898214727640152,
-      "learning_rate": 3.170507503261766e-06,
-      "loss": 0.0027,
-      "step": 1520
-    },
-    {
-      "epoch": 2.7818181818181817,
-      "grad_norm": 0.004568600095808506,
-      "learning_rate": 2.7070486245722837e-06,
-      "loss": 0.0017,
-      "step": 1530
-    },
-    {
-      "epoch": 2.8,
-      "grad_norm": 0.0027899593114852905,
-      "learning_rate": 2.2797301055607513e-06,
-      "loss": 0.0025,
-      "step": 1540
-    },
-    {
-      "epoch": 2.8181818181818183,
-      "grad_norm": 0.0008926861337386072,
-      "learning_rate": 1.888710677612693e-06,
-      "loss": 0.0014,
-      "step": 1550
-    },
-    {
-      "epoch": 2.8363636363636364,
-      "grad_norm": 0.0007694001542404294,
-      "learning_rate": 1.5341355884831431e-06,
-      "loss": 0.0024,
-      "step": 1560
-    },
-    {
-      "epoch": 2.8545454545454545,
-      "grad_norm": 0.03972714766860008,
-      "learning_rate": 1.2161365483429942e-06,
-      "loss": 0.0039,
-      "step": 1570
-    },
-    {
-      "epoch": 2.8727272727272726,
-      "grad_norm": 0.06516830623149872,
-      "learning_rate": 9.348316808541091e-07,
-      "loss": 0.0031,
-      "step": 1580
-    },
-    {
-      "epoch": 2.8909090909090907,
-      "grad_norm": 0.044061657041311264,
-      "learning_rate": 6.903254792911318e-07,
-      "loss": 0.003,
-      "step": 1590
-    },
-    {
-      "epoch": 2.909090909090909,
-      "grad_norm": 0.0019460883922874928,
-      "learning_rate": 4.827087677265585e-07,
       "loss": 0.0028,
-      "step": 1600
-    },
-    {
-      "epoch": 2.9272727272727272,
-      "grad_norm": 0.03875862807035446,
-      "learning_rate": 3.1205866729324687e-07,
-      "loss": 0.0024,
-      "step": 1610
-    },
-    {
-      "epoch": 2.9454545454545453,
-      "grad_norm": 0.002218488836660981,
-      "learning_rate": 1.784385675371425e-07,
-      "loss": 0.0019,
-      "step": 1620
-    },
-    {
-      "epoch": 2.963636363636364,
-      "grad_norm": 0.043530985713005066,
-      "learning_rate": 8.189810287055899e-08,
-      "loss": 0.0028,
-      "step": 1630
-    },
-    {
-      "epoch": 2.981818181818182,
-      "grad_norm": 0.044648706912994385,
-      "learning_rate": 2.247313413507035e-08,
-      "loss": 0.0031,
-      "step": 1640
     },
     {
       "epoch": 3.0,
-      "grad_norm": 0.04245592653751373,
-      "learning_rate": 1.8573528069998346e-10,
-      "loss": 0.0031,
-      "step": 1650
     }
   ],
   "logging_steps": 10,
-  "max_steps": 1650,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 3,
   "save_steps": 100,
@@ -1182,7 +1077,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 2568571720559616.0,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

   "best_model_checkpoint": null,
   "epoch": 3.0,
   "eval_steps": 500,
+  "global_step": 1500,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
+      "epoch": 0.02,
+      "grad_norm": 6.955190181732178,
       "learning_rate": 9e-05,
+      "loss": 1.8499,
       "step": 10
     },
     {
+      "epoch": 0.04,
+      "grad_norm": 8.705879211425781,
       "learning_rate": 0.00019,
+      "loss": 0.1145,
       "step": 20
     },
     {
+      "epoch": 0.06,
+      "grad_norm": 1.1119884252548218,
+      "learning_rate": 0.00019998175187996916,
+      "loss": 0.0391,
       "step": 30
     },
     {
+      "epoch": 0.08,
+      "grad_norm": 1.489068865776062,
+      "learning_rate": 0.0001999186805091047,
+      "loss": 0.0224,
       "step": 40
     },
     {
+      "epoch": 0.1,
+      "grad_norm": 0.8914257884025574,
+      "learning_rate": 0.00019981058901312606,
+      "loss": 0.0211,
       "step": 50
     },
     {
+      "epoch": 0.12,
+      "grad_norm": 0.17663325369358063,
+      "learning_rate": 0.00019965752609456464,
+      "loss": 0.0078,
       "step": 60
     },
     {
+      "epoch": 0.14,
+      "grad_norm": 0.03448864817619324,
+      "learning_rate": 0.00019945956071862003,
+      "loss": 0.0081,
       "step": 70
     },
     {
+      "epoch": 0.16,
+      "grad_norm": 4.913799285888672,
+      "learning_rate": 0.00019921678208208654,
+      "loss": 0.0047,
       "step": 80
     },
     {
+      "epoch": 0.18,
+      "grad_norm": 0.6515750885009766,
+      "learning_rate": 0.00019892929957316397,
+      "loss": 0.0115,
       "step": 90
     },
     {
+      "epoch": 0.2,
+      "grad_norm": 0.25924447178840637,
+      "learning_rate": 0.00019859724272217099,
+      "loss": 0.0168,
       "step": 100
     },
     {
+      "epoch": 0.22,
+      "grad_norm": 0.21993209421634674,
+      "learning_rate": 0.0001982207611431827,
+      "loss": 0.0097,
       "step": 110
     },
     {
+      "epoch": 0.24,
+      "grad_norm": 1.2233567237854004,
+      "learning_rate": 0.00019780002446661966,
+      "loss": 0.0079,
       "step": 120
     },
     {
+      "epoch": 0.26,
+      "grad_norm": 0.14659936726093292,
+      "learning_rate": 0.0001973352222628176,
+      "loss": 0.0041,
       "step": 130
     },
     {
+      "epoch": 0.28,
+      "grad_norm": 0.13708041608333588,
+      "learning_rate": 0.0001968265639566135,
+      "loss": 0.0047,
       "step": 140
     },
     {
+      "epoch": 0.3,
+      "grad_norm": 0.11654426902532578,
+      "learning_rate": 0.0001962742787329852,
+      "loss": 0.0039,
       "step": 150
     },
     {
+      "epoch": 0.32,
+      "grad_norm": 0.1725812405347824,
+      "learning_rate": 0.00019567861543378837,
+      "loss": 0.0041,
       "step": 160
     },
     {
+      "epoch": 0.34,
+      "grad_norm": 0.002145808655768633,
+      "learning_rate": 0.00019503984244563616,
+      "loss": 0.0017,
       "step": 170
     },
     {
+      "epoch": 0.36,
+      "grad_norm": 2.0850086212158203,
+      "learning_rate": 0.000194358247578973,
+      "loss": 0.0048,
       "step": 180
     },
     {
+      "epoch": 0.38,
+      "grad_norm": 0.07945302873849869,
+      "learning_rate": 0.00019363413793839658,
+      "loss": 0.0051,
       "step": 190
     },
     {
+      "epoch": 0.4,
+      "grad_norm": 0.231564462184906,
+      "learning_rate": 0.00019286783978428624,
+      "loss": 0.0078,
       "step": 200
     },
     {
+      "epoch": 0.42,
+      "grad_norm": 0.013585828244686127,
+      "learning_rate": 0.00019205969838580094,
+      "loss": 0.0042,
       "step": 210
     },
     {
+      "epoch": 0.44,
+      "grad_norm": 0.009324288927018642,
+      "learning_rate": 0.00019121007786531178,
+      "loss": 0.0056,
       "step": 220
     },
     {
+      "epoch": 0.46,
+      "grad_norm": 0.0634087324142456,
+      "learning_rate": 0.00019031936103434044,
+      "loss": 0.0046,
       "step": 230
     },
     {
+      "epoch": 0.48,
+      "grad_norm": 0.06287066638469696,
+      "learning_rate": 0.00018938794922107675,
+      "loss": 0.0028,
       "step": 240
     },
     {
+      "epoch": 0.5,
+      "grad_norm": 0.0060167331248521805,
+      "learning_rate": 0.00018841626208955292,
+      "loss": 0.0027,
       "step": 250
     },
     {
+      "epoch": 0.52,
+      "grad_norm": 0.0029571247287094593,
+      "learning_rate": 0.0001874047374505569,
+      "loss": 0.0046,
       "step": 260
     },
     {
+      "epoch": 0.54,
+      "grad_norm": 0.0028790035285055637,
+      "learning_rate": 0.00018635383106436855,
+      "loss": 0.0028,
       "step": 270
     },
     {
+      "epoch": 0.56,
+      "grad_norm": 0.04535164311528206,
+      "learning_rate": 0.00018526401643540922,
+      "loss": 0.0046,
       "step": 280
     },
     {
+      "epoch": 0.58,
+      "grad_norm": 0.09899382293224335,
+      "learning_rate": 0.0001841357845988957,
+      "loss": 0.0037,
       "step": 290
     },
     {
+      "epoch": 0.6,
+      "grad_norm": 0.6568487286567688,
+      "learning_rate": 0.00018296964389959578,
+      "loss": 0.0101,
       "step": 300
     },
     {
+      "epoch": 0.62,
+      "grad_norm": 0.16598820686340332,
+      "learning_rate": 0.00018176611976278441,
+      "loss": 0.0237,
       "step": 310
     },
     {
+      "epoch": 0.64,
+      "grad_norm": 0.08521094173192978,
+      "learning_rate": 0.00018052575445750419,
+      "loss": 0.0058,
       "step": 320
     },
     {
+      "epoch": 0.66,
+      "grad_norm": 1.5081530809402466,
+      "learning_rate": 0.00017924910685223643,
+      "loss": 0.0205,
       "step": 330
     },
     {
+      "epoch": 0.68,
+      "grad_norm": 0.35874947905540466,
+      "learning_rate": 0.0001779367521630931,
+      "loss": 0.1046,
       "step": 340
     },
     {
+      "epoch": 0.7,
+      "grad_norm": 0.229485422372818,
+      "learning_rate": 0.00017658928169464312,
+      "loss": 0.0129,
       "step": 350
     },
     {
+      "epoch": 0.72,
+      "grad_norm": 0.4974204897880554,
+      "learning_rate": 0.00017520730257348946,
+      "loss": 0.006,
       "step": 360
     },
     {
+      "epoch": 0.74,
+      "grad_norm": 0.8693443536758423,
+      "learning_rate": 0.00017379143747471768,
+      "loss": 0.0075,
       "step": 370
     },
     {
+      "epoch": 0.76,
+      "grad_norm": 1.4635262489318848,
+      "learning_rate": 0.00017234232434133883,
+      "loss": 0.0077,
       "step": 380
     },
     {
+      "epoch": 0.78,
+      "grad_norm": 0.047906193882226944,
+      "learning_rate": 0.00017086061609685257,
+      "loss": 0.0094,
       "step": 390
     },
     {
+      "epoch": 0.8,
+      "grad_norm": 0.13906188309192657,
+      "learning_rate": 0.00016934698035106133,
+      "loss": 0.0096,
       "step": 400
     },
     {
+      "epoch": 0.82,
+      "grad_norm": 0.11411689221858978,
+      "learning_rate": 0.00016780209909926676,
+      "loss": 0.0092,
       "step": 410
     },
     {
+      "epoch": 0.84,
+      "grad_norm": 0.011227969080209732,
+      "learning_rate": 0.00016622666841498463,
+      "loss": 0.0084,
       "step": 420
     },
     {
+      "epoch": 0.86,
+      "grad_norm": 1.5423718690872192,
+      "learning_rate": 0.00016462139813631693,
+      "loss": 0.0041,
       "step": 430
     },
     {
+      "epoch": 0.88,
+      "grad_norm": 0.347548246383667,
+      "learning_rate": 0.00016298701154612147,
+      "loss": 0.0046,
       "step": 440
     },
     {
+      "epoch": 0.9,
+      "grad_norm": 0.03537672758102417,
+      "learning_rate": 0.00016132424504612406,
+      "loss": 0.016,
       "step": 450
     },
     {
+      "epoch": 0.92,
+      "grad_norm": 0.05080743879079819,
+      "learning_rate": 0.00015963384782511993,
+      "loss": 0.006,
       "step": 460
     },
     {
+      "epoch": 0.94,
+      "grad_norm": 0.00858448538929224,
+      "learning_rate": 0.00015791658152141327,
+      "loss": 0.0047,
       "step": 470
     },
     {
+      "epoch": 0.96,
+      "grad_norm": 0.19455233216285706,
+      "learning_rate": 0.00015617321987964776,
+      "loss": 0.0044,
       "step": 480
     },
     {
+      "epoch": 0.98,
+      "grad_norm": 0.08450737595558167,
+      "learning_rate": 0.00015440454840218225,
       "loss": 0.0039,
       "step": 490
     },
     {
+      "epoch": 1.0,
+      "grad_norm": 0.006426886655390263,
+      "learning_rate": 0.00015261136399516873,
+      "loss": 0.003,
       "step": 500
     },
     {
+      "epoch": 1.02,
+      "grad_norm": 0.11707904934883118,
+      "learning_rate": 0.00015079447460949238,
+      "loss": 0.0053,
       "step": 510
     },
     {
+      "epoch": 1.04,
+      "grad_norm": 0.005810865201056004,
+      "learning_rate": 0.00014895469887673483,
+      "loss": 0.0028,
       "step": 520
     },
     {
+      "epoch": 1.06,
+      "grad_norm": 0.007626334670931101,
+      "learning_rate": 0.00014709286574032536,
+      "loss": 0.0036,
       "step": 530
     },
     {
+      "epoch": 1.08,
+      "grad_norm": 0.012294676154851913,
+      "learning_rate": 0.00014520981408204574,
+      "loss": 0.0014,
       "step": 540
     },
     {
+      "epoch": 1.1,
+      "grad_norm": 0.0394788458943367,
+      "learning_rate": 0.00014330639234405742,
+      "loss": 0.0032,
       "step": 550
     },
     {
+      "epoch": 1.12,
+      "grad_norm": 0.006509778555482626,
+      "learning_rate": 0.00014138345814662068,
+      "loss": 0.0026,
       "step": 560
     },
     {
+      "epoch": 1.1400000000000001,
+      "grad_norm": 0.12478422373533249,
+      "learning_rate": 0.0001394418779016789,
+      "loss": 0.0072,
       "step": 570
     },
     {
+      "epoch": 1.16,
+      "grad_norm": 0.03920350596308708,
+      "learning_rate": 0.00013748252642248115,
+      "loss": 0.0034,
       "step": 580
     },
     {
+      "epoch": 1.18,
+      "grad_norm": 0.06001276522874832,
+      "learning_rate": 0.00013550628652941985,
+      "loss": 0.003,
       "step": 590
     },
     {
+      "epoch": 1.2,
+      "grad_norm": 0.016110895201563835,
+      "learning_rate": 0.0001335140486522604,
+      "loss": 0.0028,
       "step": 600
     },
     {
+      "epoch": 1.22,
+      "grad_norm": 0.012471065856516361,
+      "learning_rate": 0.00013150671042894228,
+      "loss": 0.003,
       "step": 610
     },
     {
+      "epoch": 1.24,
+      "grad_norm": 0.03785092383623123,
+      "learning_rate": 0.00012948517630113245,
+      "loss": 0.0036,
       "step": 620
     },
     {
+      "epoch": 1.26,
+      "grad_norm": 0.06877022981643677,
+      "learning_rate": 0.0001274503571067131,
+      "loss": 0.0032,
       "step": 630
     },
     {
+      "epoch": 1.28,
+      "grad_norm": 0.14263379573822021,
+      "learning_rate": 0.00012540316966938795,
+      "loss": 0.0038,
       "step": 640
     },
     {
+      "epoch": 1.3,
+      "grad_norm": 0.011557388119399548,
+      "learning_rate": 0.00012334453638559057,
+      "loss": 0.0034,
       "step": 650
     },
     {
+      "epoch": 1.32,
+      "grad_norm": 0.05995357036590576,
+      "learning_rate": 0.00012127538480888283,
+      "loss": 0.0034,
       "step": 660
     },
     {
+      "epoch": 1.34,
+      "grad_norm": 0.06527257710695267,
+      "learning_rate": 0.00011919664723202906,
+      "loss": 0.0028,
       "step": 670
     },
     {
+      "epoch": 1.3599999999999999,
+      "grad_norm": 0.12198451906442642,
+      "learning_rate": 0.00011710926026693525,
+      "loss": 0.003,
       "step": 680
     },
     {
+      "epoch": 1.38,
+      "grad_norm": 0.005978007335215807,
+      "learning_rate": 0.00011501416442264184,
       "loss": 0.0035,
       "step": 690
     },
     {
+      "epoch": 1.4,
+      "grad_norm": 0.01665549911558628,
+      "learning_rate": 0.00011291230368156087,
+      "loss": 0.0065,
       "step": 700
     },
     {
+      "epoch": 1.42,
+      "grad_norm": 0.24193237721920013,
+      "learning_rate": 0.00011080462507414806,
+      "loss": 0.0066,
       "step": 710
     },
     {
+      "epoch": 1.44,
+      "grad_norm": 0.04041582718491554,
+      "learning_rate": 0.00010869207825220147,
+      "loss": 0.0025,
       "step": 720
     },
     {
+      "epoch": 1.46,
+      "grad_norm": 0.0575215183198452,
+      "learning_rate": 0.0001065756150609792,
+      "loss": 0.0066,
       "step": 730
     },
     {
+      "epoch": 1.48,
+      "grad_norm": 0.06933045387268066,
+      "learning_rate": 0.00010445618911032853,
+      "loss": 0.0031,
       "step": 740
     },
     {
+      "epoch": 1.5,
+      "grad_norm": 0.08046724647283554,
+      "learning_rate": 0.00010233475534502042,
+      "loss": 0.0035,
       "step": 750
     },
     {
+      "epoch": 1.52,
+      "grad_norm": 0.12715864181518555,
+      "learning_rate": 0.00010021226961448209,
+      "loss": 0.0031,
       "step": 760
     },
     {
+      "epoch": 1.54,
+      "grad_norm": 0.05433020740747452,
+      "learning_rate": 9.808968824212234e-05,
+      "loss": 0.0016,
       "step": 770
     },
     {
+      "epoch": 1.56,
+      "grad_norm": 0.005047277547419071,
+      "learning_rate": 9.596796759444293e-05,
+      "loss": 0.0023,
       "step": 780
     },
     {
+      "epoch": 1.58,
+      "grad_norm": 0.06136553734540939,
+      "learning_rate": 9.384806365013113e-05,
+      "loss": 0.0037,
       "step": 790
     },
     {
+      "epoch": 1.6,
+      "grad_norm": 0.07095402479171753,
+      "learning_rate": 9.173093156932623e-05,
+      "loss": 0.0032,
       "step": 800
     },
     {
+      "epoch": 1.62,
+      "grad_norm": 0.007647352758795023,
+      "learning_rate": 8.961752526325565e-05,
+      "loss": 0.0031,
       "step": 810
     },
     {
+      "epoch": 1.6400000000000001,
+      "grad_norm": 0.10909626632928848,
+      "learning_rate": 8.750879696443321e-05,
+      "loss": 0.0044,
       "step": 820
     },
     {
+      "epoch": 1.6600000000000001,
+      "grad_norm": 0.005036857444792986,
+      "learning_rate": 8.540569679761391e-05,
+      "loss": 0.0017,
       "step": 830
     },
     {
+      "epoch": 1.6800000000000002,
+      "grad_norm": 0.04456391558051109,
+      "learning_rate": 8.330917235169867e-05,
+      "loss": 0.0028,
       "step": 840
     },
     {
+      "epoch": 1.7,
+      "grad_norm": 0.026686355471611023,
+      "learning_rate": 8.12201682527811e-05,
+      "loss": 0.0009,
       "step": 850
     },
     {
+      "epoch": 1.72,
+      "grad_norm": 0.5429267883300781,
+      "learning_rate": 7.913962573852996e-05,
+      "loss": 0.0042,
       "step": 860
     },
     {
+      "epoch": 1.74,
+      "grad_norm": 0.09535812586545944,
+      "learning_rate": 7.706848223409759e-05,
+      "loss": 0.003,
       "step": 870
     },
     {
+      "epoch": 1.76,
+      "grad_norm": 0.004211663268506527,
+      "learning_rate": 7.500767092974647e-05,
+      "loss": 0.0034,
       "step": 880
     },
     {
+      "epoch": 1.78,
+      "grad_norm": 0.007590270135551691,
+      "learning_rate": 7.295812036038407e-05,
+      "loss": 0.0023,
       "step": 890
     },
     {
+      "epoch": 1.8,
+      "grad_norm": 0.09613944590091705,
+      "learning_rate": 7.092075398719502e-05,
+      "loss": 0.0029,
       "step": 900
     },
     {
+      "epoch": 1.8199999999999998,
+      "grad_norm": 0.04118485003709793,
+      "learning_rate": 6.889648978155909e-05,
+      "loss": 0.0018,
       "step": 910
     },
     {
+      "epoch": 1.8399999999999999,
+      "grad_norm": 0.03493022918701172,
+      "learning_rate": 6.688623981144339e-05,
       "loss": 0.003,
       "step": 920
     },
     {
+      "epoch": 1.8599999999999999,
+      "grad_norm": 0.05923795700073242,
+      "learning_rate": 6.489090983045379e-05,
+      "loss": 0.0023,
       "step": 930
     },
     {
+      "epoch": 1.88,
+      "grad_norm": 0.004041542299091816,
+      "learning_rate": 6.291139886973169e-05,
+      "loss": 0.0019,
       "step": 940
     },
     {
+      "epoch": 1.9,
+      "grad_norm": 0.058228544890880585,
+      "learning_rate": 6.094859883287977e-05,
+      "loss": 0.0027,
       "step": 950
     },
     {
+      "epoch": 1.92,
+      "grad_norm": 0.0040581803768873215,
+      "learning_rate": 5.90033940940989e-05,
+      "loss": 0.0013,
       "step": 960
     },
     {
+      "epoch": 1.94,
+      "grad_norm": 0.0043375324457883835,
+      "learning_rate": 5.7076661099717986e-05,
+      "loss": 0.0025,
       "step": 970
     },
     {
+      "epoch": 1.96,
+      "grad_norm": 0.0034573073498904705,
+      "learning_rate": 5.5169267973295294e-05,
+      "loss": 0.002,
       "step": 980
     },
     {
+      "epoch": 1.98,
+      "grad_norm": 0.03163556754589081,
+      "learning_rate": 5.3282074124470284e-05,
+      "loss": 0.0021,
       "step": 990
     },
     {
+      "epoch": 2.0,
+      "grad_norm": 0.027402976527810097,
+      "learning_rate": 5.141592986174151e-05,
+      "loss": 0.0032,
       "step": 1000
     },
     {
+      "epoch": 2.02,
+      "grad_norm": 0.03729909658432007,
+      "learning_rate": 4.957167600934474e-05,
+      "loss": 0.0027,
       "step": 1010
     },
     {
+      "epoch": 2.04,
+      "grad_norm": 0.06650708615779877,
+      "learning_rate": 4.7750143528405126e-05,
+      "loss": 0.0031,
       "step": 1020
     },
     {
+      "epoch": 2.06,
+      "grad_norm": 0.05407334491610527,
+      "learning_rate": 4.595215314253285e-05,
+      "loss": 0.0024,
       "step": 1030
     },
     {
+      "epoch": 2.08,
+      "grad_norm": 0.06878269463777542,
+      "learning_rate": 4.417851496803164e-05,
+      "loss": 0.0031,
       "step": 1040
     },
     {
+      "epoch": 2.1,
+      "grad_norm": 0.11301030218601227,
+      "learning_rate": 4.243002814888656e-05,
+      "loss": 0.0029,
       "step": 1050
     },
     {
+      "epoch": 2.12,
+      "grad_norm": 0.0333506241440773,
+      "learning_rate": 4.0707480496695514e-05,
+      "loss": 0.0022,
       "step": 1060
     },
     {
+      "epoch": 2.14,
+      "grad_norm": 0.07614877074956894,
+      "learning_rate": 3.9011648135706966e-05,
+      "loss": 0.0022,
       "step": 1070
     },
     {
+      "epoch": 2.16,
+      "grad_norm": 0.0026465761475265026,
+      "learning_rate": 3.734329515312349e-05,
+      "loss": 0.0023,
       "step": 1080
     },
     {
+      "epoch": 2.18,
+      "grad_norm": 0.003543775761500001,
+      "learning_rate": 3.570317325482847e-05,
+      "loss": 0.0017,
       "step": 1090
     },
     {
+      "epoch": 2.2,
+      "grad_norm": 0.058600034564733505,
+      "learning_rate": 3.409202142669213e-05,
+      "loss": 0.0035,
       "step": 1100
     },
     {
+      "epoch": 2.22,
+      "grad_norm": 0.04403044655919075,
+      "learning_rate": 3.251056560160821e-05,
+      "loss": 0.003,
       "step": 1110
     },
     {
+      "epoch": 2.24,
+      "grad_norm": 0.12272568047046661,
+      "learning_rate": 3.095951833241213e-05,
+      "loss": 0.003,
       "step": 1120
     },
     {
+      "epoch": 2.26,
+      "grad_norm": 0.003978225402534008,
+      "learning_rate": 2.9439578470827755e-05,
+      "loss": 0.0026,
       "step": 1130
     },
     {
+      "epoch": 2.2800000000000002,
+      "grad_norm": 0.0032513344194740057,
+      "learning_rate": 2.7951430852587268e-05,
+      "loss": 0.0035,
       "step": 1140
     },
     {
+      "epoch": 2.3,
+      "grad_norm": 0.0030118192080408335,
+      "learning_rate": 2.649574598886665e-05,
+      "loss": 0.0021,
       "step": 1150
     },
     {
+      "epoch": 2.32,
+      "grad_norm": 0.050590354949235916,
+      "learning_rate": 2.507317976417475e-05,
       "loss": 0.0031,
       "step": 1160
     },
     {
+      "epoch": 2.34,
+      "grad_norm": 0.024627618491649628,
+      "learning_rate": 2.3684373140833016e-05,
+      "loss": 0.0029,
       "step": 1170
     },
     {
+      "epoch": 2.36,
+      "grad_norm": 0.05296558514237404,
+      "learning_rate": 2.2329951870178655e-05,
+      "loss": 0.0041,
       "step": 1180
     },
     {
+      "epoch": 2.38,
+      "grad_norm": 0.00513384910300374,
+      "learning_rate": 2.1010526210621406e-05,
+      "loss": 0.002,
       "step": 1190
     },
     {
+      "epoch": 2.4,
+      "grad_norm": 0.008860021829605103,
+      "learning_rate": 1.9726690652680578e-05,
+      "loss": 0.0019,
       "step": 1200
     },
     {
+      "epoch": 2.42,
+      "grad_norm": 0.0037229766603559256,
+      "learning_rate": 1.8479023651127115e-05,
+      "loss": 0.0023,
       "step": 1210
     },
     {
+      "epoch": 2.44,
+      "grad_norm": 0.004817347973585129,
+      "learning_rate": 1.726808736435046e-05,
+      "loss": 0.0015,
       "step": 1220
     },
     {
+      "epoch": 2.46,
+      "grad_norm": 0.05881744623184204,
+      "learning_rate": 1.6094427401068224e-05,
+      "loss": 0.0028,
       "step": 1230
     },
     {
+      "epoch": 2.48,
+      "grad_norm": 0.00562276691198349,
+      "learning_rate": 1.4958572574492501e-05,
+      "loss": 0.0023,
       "step": 1240
     },
     {
+      "epoch": 2.5,
+      "grad_norm": 0.0030358799267560244,
+      "learning_rate": 1.38610346640637e-05,
+      "loss": 0.0018,
       "step": 1250
     },
     {
+      "epoch": 2.52,
+      "grad_norm": 0.003225122345611453,
+      "learning_rate": 1.2802308184859502e-05,
+      "loss": 0.0013,
       "step": 1260
     },
     {
+      "epoch": 2.54,
+      "grad_norm": 0.054522693157196045,
+      "learning_rate": 1.1782870164782111e-05,
+      "loss": 0.0019,
       "step": 1270
     },
     {
+      "epoch": 2.56,
+      "grad_norm": 0.041042644530534744,
+      "learning_rate": 1.0803179929624973e-05,
+      "loss": 0.0021,
       "step": 1280
     },
     {
+      "epoch": 2.58,
+      "grad_norm": 0.00768931582570076,
+      "learning_rate": 9.863678896115559e-06,
+      "loss": 0.0022,
       "step": 1290
     },
     {
+      "epoch": 2.6,
+      "grad_norm": 0.0033417909871786833,
+      "learning_rate": 8.964790373027132e-06,
+      "loss": 0.0017,
       "step": 1300
     },
     {
+      "epoch": 2.62,
+      "grad_norm": 0.04406670480966568,
+      "learning_rate": 8.106919370449572e-06,
+      "loss": 0.0027,
       "step": 1310
     },
     {
+      "epoch": 2.64,
+      "grad_norm": 0.043826743960380554,
+      "learning_rate": 7.290452417304916e-06,
+      "loss": 0.0024,
       "step": 1320
     },
     {
+      "epoch": 2.66,
+      "grad_norm": 0.08563917130231857,
+      "learning_rate": 6.515757387189902e-06,
+      "loss": 0.0017,
       "step": 1330
     },
     {
+      "epoch": 2.68,
+      "grad_norm": 0.003863748861476779,
+      "learning_rate": 5.783183332624098e-06,
+      "loss": 0.0028,
       "step": 1340
     },
     {
+      "epoch": 2.7,
+      "grad_norm": 0.00339197413995862,
+      "learning_rate": 5.093060327778043e-06,
+      "loss": 0.0028,
       "step": 1350
     },
     {
+      "epoch": 2.7199999999999998,
+      "grad_norm": 0.11382139474153519,
+      "learning_rate": 4.445699319752539e-06,
+      "loss": 0.0034,
       "step": 1360
     },
     {
+      "epoch": 2.74,
+      "grad_norm": 0.0028220233507454395,
+      "learning_rate": 3.841391988476018e-06,
+      "loss": 0.0018,
       "step": 1370
     },
     {
+      "epoch": 2.76,
+      "grad_norm": 0.058400608599185944,
+      "learning_rate": 3.2804106152828582e-06,
+      "loss": 0.0027,
       "step": 1380
     },
     {
+      "epoch": 2.7800000000000002,
+      "grad_norm": 0.110735222697258,
+      "learning_rate": 2.7630079602323442e-06,
+      "loss": 0.0026,
       "step": 1390
     },
     {
+      "epoch": 2.8,
+      "grad_norm": 0.0034612929448485374,
+      "learning_rate": 2.289417148223094e-06,
+      "loss": 0.0021,
       "step": 1400
     },
     {
+      "epoch": 2.82,
+      "grad_norm": 0.04820137843489647,
+      "learning_rate": 1.8598515639545622e-06,
+      "loss": 0.0021,
       "step": 1410
     },
     {
+      "epoch": 2.84,
+      "grad_norm": 0.08901210874319077,
+      "learning_rate": 1.4745047557827796e-06,
+      "loss": 0.0017,
       "step": 1420
     },
     {
+      "epoch": 2.86,
+      "grad_norm": 0.052542563527822495,
+      "learning_rate": 1.133550348513701e-06,
+      "loss": 0.0017,
       "step": 1430
     },
     {
+      "epoch": 2.88,
+      "grad_norm": 0.0500030443072319,
+      "learning_rate": 8.371419651735268e-07,
+      "loss": 0.0018,
       "step": 1440
     },
     {
+      "epoch": 2.9,
+      "grad_norm": 0.05235590413212776,
+      "learning_rate": 5.854131577911259e-07,
+      "loss": 0.0025,
       "step": 1450
     },
     {
+      "epoch": 2.92,
+      "grad_norm": 0.050380345433950424,
+      "learning_rate": 3.7847734722378234e-07,
+      "loss": 0.0021,
       "step": 1460
     },
     {
+      "epoch": 2.94,
+      "grad_norm": 0.04665536433458328,
+      "learning_rate": 2.1642777205346242e-07,
+      "loss": 0.0021,
       "step": 1470
     },
     {
+      "epoch": 2.96,
+      "grad_norm": 0.08257055282592773,
+      "learning_rate": 9.933744657651956e-08,
       "loss": 0.0026,
       "step": 1480
     },
     {
+      "epoch": 2.98,
+      "grad_norm": 0.04862203821539879,
+      "learning_rate": 2.7259127905776562e-08,
       "loss": 0.0028,
+      "step": 1490
     },
     {
       "epoch": 3.0,
+      "grad_norm": 0.05159607157111168,
+      "learning_rate": 2.252921999401636e-10,
+      "loss": 0.0029,
+      "step": 1500
     }
   ],
   "logging_steps": 10,
+  "max_steps": 1500,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 3,
   "save_steps": 100,
       "attributes": {}
     }
   },
+  "total_flos": 2975560292163072.0,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

last-checkpoint/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:abc526e0d2d37a3bd6eaa08f79cb4da62578b543694bda368ffceb921df35e95
 size 6353

 version https://git-lfs.github.com/spec/v1
+oid sha256:bad76c15491571a2b0d904f43b98d3ec2521c42abf54bc17c579eedfa7b8332a
 size 6353