Upload folder using huggingface_hub

Files changed (12)

.gitattributes +1 -0
README.md +198 -0
adapter_config.json +39 -0
adapter_model.safetensors +3 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +33 -0
tokenizer.json +3 -0
tokenizer_config.json +0 -0
trainer_state.json +2409 -0
training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,198 @@

+---
+base_model: google/gemma-3-270m
+library_name: peft
+license: mit
+tags:
+  - chess
+  - lora
+  - mixture-of-experts
+  - mps
+  - apple-silicon
+  - gemma
+  - uci
+  - chess-engine
+datasets:
+  - lukifer23/gemmafischer-chess-training
+language:
+  - en
+pipeline_tag: text-generation
+---
+# GemmaFischer UCI Expert LoRA
+LoRA adapter for chess move generation in UCI format, trained on Google's Gemma-3 270M base model. This is the **UCI Expert** from the GemmaFischer Mixture of Experts chess system, optimized for Apple Silicon with MPS acceleration.
+## Model Description
+This adapter specializes in generating legal chess moves in UCI (Universal Chess Interface) format. It's part of a 3-expert system including:
+- **UCI Expert** (this model): Fast move generation in UCI format
+- **Tutor Expert**: Detailed chess explanations and analysis
+- **Director Expert**: Strategic reasoning and Q&A
+## Training Details
+### Base Model
+- **Model**: google/gemma-3-270m
+- **Architecture**: Gemma-3 (270M parameters)
+### LoRA Configuration
+- **Rank (r)**: 16
+- **Alpha**: 32
+- **Dropout**: 0.05
+- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`
+- **Task Type**: Causal Language Modeling
+### Training Data
+- **Dataset Size**: 50,000 chess positions
+- **Validation**: 100% Stockfish-verified legal moves
+- **Quality Score**: 0.8
+- **Format**: Standardized JSONL with metadata
+### Training Metrics
+- **Total Steps**: 1,600
+- **Best Eval Loss**: 0.8723 (at step 1600)
+- **Final Training Loss**: 0.7017
+- **Training Platform**: Apple M3 Pro with MPS acceleration
+- **Training Speed**: ~2-3 steps/second
+- **Batch Size**: 1 with gradient accumulation
+### Hardware & Optimization
+- **Platform**: Mac-only (M3 Pro)
+- **Acceleration**: MPS (Metal Performance Shaders)
+- **Memory Optimization**: Gradient checkpointing enabled
+- **Peak Memory**: ~3-5GB
+## Usage
+### Installation
+```bash
+pip install transformers peft torch accelerate chess
+```
+### Loading the Model
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "google/gemma-3-270m",
+    device_map="mps",  # For Apple Silicon
+    torch_dtype="auto"
+)
+# Load LoRA adapter
+model = PeftModel.from_pretrained(
+    base_model,
+    "lukifer23/gemmafischer-uci-lora"
+)
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")
+```
+### Generating UCI Moves
+```python
+# Format: FEN position -> UCI move
+fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
+prompt = f"FEN: {fen}\nGenerate the best move in UCI format only:"
+inputs = tokenizer(prompt, return_tensors="pt").to("mps")
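+outputs = model.generate(
+    **inputs,
+    max_new_tokens=5,
+    do_sample=False,  # Greedy decoding; temperature is ignored when sampling is off
+)
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]  # Skip the echoed prompt
+move = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+print(move)  # e.g., "e2e4"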
+```
+### Integration with Chess Software
+```python
+import re
+import chess
+def get_uci_move(fen_position):
+    """Generate UCI move for a given position."""
+    prompt = f"FEN: {fen_position}\nGenerate the best move in UCI format only:"
+    inputs = tokenizer(prompt, return_tensors="pt").to("mps")
+    outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
+    # Decode only the generated tokens so the regex cannot match inside the prompt's FEN
+    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+    move_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
+    # Extract UCI move (format: e2e4, e7e8q for promotion)
+    uci_match = re.search(r'[a-h][1-8][a-h][1-8][qrbn]?', move_text)
+    return uci_match.group(0) if uci_match else None
+# Example usage
+board = chess.Board()
+uci_move = get_uci_move(board.fen())
+if uci_move and chess.Move.from_uci(uci_move) in board.legal_moves:  # Reject illegal moves
+    board.push_uci(uci_move)
+```
+## Performance
+### Capabilities
+- **Move Legality**: 100% legal move generation (Stockfish validated)
+- **UCI Format**: Correct UCI notation (e.g., `e2e4`, `e7e8q`)
+- **Inference Speed**: ~0.4-0.5s per move on M3 Pro
+- **Special Moves**: Supports castling, en passant, promotions
+### Limitations
+- Optimized for Apple Silicon MPS only
+- Not a strong chess engine (270M parameters)
+- Best used as part of MoE system with other experts
+- Requires base model access (Google Gemma-3)
+## System Requirements
+- **Hardware**: Mac with Apple Silicon (M1/M2/M3/M4)
+- **RAM**: 8GB minimum, 16GB recommended
+- **macOS**: 12.0+ (for MPS support)
+- **Python**: 3.10+
+## Related Models & Resources
+### GemmaFischer Collection
+- **Tutor Expert**: [lukifer23/gemmafischer-tutor-lora](https://huggingface.co/lukifer23/gemmafischer-tutor-lora) (coming soon)
+- **Director Expert**: [lukifer23/gemmafischer-director-lora](https://huggingface.co/lukifer23/gemmafischer-director-lora) (coming soon)
+- **Training Dataset**: [lukifer23/gemmafischer-chess-training](https://huggingface.co/datasets/lukifer23/gemmafischer-chess-training) (coming soon)
+### Repository
+- **GitHub**: [github.com/lukifer23/GemmaFischer](https://github.com/lukifer23/GemmaFischer)
+- **Documentation**: Full training guides, evaluation tools, and MoE system
+- **Web Interface**: Interactive chess board with expert switching
+## Training Loss Curve
+The model was trained for 1,600 steps with evaluation every 100 steps:
+- Initial loss: 4.60 (step 1)
+- Best eval loss: 0.872 (step 1600)
+- Final training loss: 0.702 (step 1600)
+The loss converged steadily under a cosine learning-rate schedule: the rate warms up linearly to a peak of about 1e-4 over roughly the first 160 steps, then decays toward zero.
+## Citation
+```bibtex
+@misc{gemmafischer2025,
+  author = {lukifer23},
+  title = {GemmaFischer: Chess Engine and Tutor with Mixture of Experts},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/lukifer23/gemmafischer-uci-lora}}
+}
+```
+## License
+MIT License - See [LICENSE](https://github.com/lukifer23/GemmaFischer/blob/main/LICENSE) file for details.
+## Acknowledgments
+- **Base Model**: Google's Gemma-3 270M
+- **Training Platform**: Apple Silicon (M3 Pro) with MPS
+- **Validation**: Stockfish chess engine
+- **Framework**: HuggingFace Transformers + PEFT
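
The README describes UCI move strings such as `e2e4` and `e7e8q`. As a minimal sketch of that format, the following standard-library snippet checks that a string is exactly one UCI move and splits it into its parts. The names `UciMove` and `parse_uci` are hypothetical and not part of the GemmaFischer repository. The check is purely syntactic: it says nothing about legality in a given position, and it does not handle the UCI null move `0000`.

```python
import re
from typing import NamedTuple, Optional

# UCI long algebraic notation: from-square, to-square, optional promotion piece.
UCI_MOVE = re.compile(r"([a-h][1-8])([a-h][1-8])([qrbn])?")


class UciMove(NamedTuple):
    """Hypothetical container for the parts of a UCI move string."""
    from_square: str
    to_square: str
    promotion: Optional[str]


def parse_uci(text: str) -> Optional[UciMove]:
    """Return the parsed move if `text` is exactly one UCI move, else None."""
    match = UCI_MOVE.fullmatch(text.strip())
    if match is None:
        return None
    return UciMove(match.group(1), match.group(2), match.group(3))


print(parse_uci("e2e4"))
print(parse_uci("e7e8q"))
print(parse_uci("e2e4 is the best move"))
```

Using `fullmatch` rather than `search` makes the function reject model output that contains extra text, which may be a useful signal when the adapter does not answer in the expected format.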

adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/Users/admin/Downloads/VSCode/GemmaFischer/models/google-gemma-3-270m",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
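
To make the configuration above concrete, here is a rough sketch of what `r`, `lora_alpha`, and `target_modules` imply. With `use_rslora` set to false, PEFT scales the low-rank update by `lora_alpha / r`. The projection shapes and the layer count below are assumptions about the Gemma-3 270M architecture; they do not appear anywhere in this commit and should be checked against the base model's own config before being relied on.

```python
# Values taken from adapter_config.json in this commit.
r = 16
lora_alpha = 32

# With use_rslora=false, the LoRA update is scaled by lora_alpha / r.
scaling = lora_alpha / r

# ASSUMPTION: (in_features, out_features) of each targeted projection and
# the number of decoder layers. These are not stated in the commit.
ASSUMED_SHAPES = {
    "q_proj": (640, 1024),
    "k_proj": (640, 256),
    "v_proj": (640, 256),
    "o_proj": (1024, 640),
}
ASSUMED_NUM_LAYERS = 18


def lora_param_count(shapes, rank, num_layers):
    """Each adapted linear layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total_params = lora_param_count(ASSUMED_SHAPES, r, ASSUMED_NUM_LAYERS)
print(f"scaling = {scaling}")
print(f"trainable LoRA parameters (under the assumed shapes) = {total_params:,}")
```

Under these assumed shapes, float32 storage would take 5,898,240 bytes, slightly below the 5,917,192-byte size recorded for `adapter_model.safetensors` in this commit. The small remainder would be plausible for a file header, but this is only a consistency check, not confirmation of the shapes or the dtype.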

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:74d6d9edde51340678ce1ee14ae112b077fc53078404605c4d903803c3f67bbf
+size 5917192
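
The three lines above are a Git LFS pointer: the repository stores this small text stub, while the actual weights live in LFS storage and are identified by the SHA-256 digest and byte size. As a minimal sketch, the pointer can be parsed and used to check a downloaded file with the standard library alone. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text as committed for adapter_model.safetensors in this diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:74d6d9edde51340678ce1ee14ae112b077fc53078404605c4d903803c3f67bbf
size 5917192
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check downloaded bytes against the pointer's size and SHA-256 digest."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Reading the downloaded file in binary mode and passing its bytes to `matches_pointer` gives a quick integrity check before loading the adapter.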

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4ee04aa80d94558c7cfc322b976fbbfaa7d7d2991535e15290a0c01d094b5cf2
+size 2156890246

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a416021fcc136006bfe4651385bb006441ee5a161cc6b38aff634835fe44cadc
+size 13990

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:933821bed50a92dd2dc11b2ebd21a8303e761867bc574b4556159143c11330c7
+size 1064
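
`scheduler.pt` is an opaque binary in this diff, but the `learning_rate` values logged in `trainer_state.json` in this commit let the schedule be reconstructed approximately. The sketch below assumes a linear warmup followed by cosine decay. The peak of 1e-4 and the 1,600 total steps come from the README; the 160-step warmup and the one-step lag between the logged step and the scheduler step are inferred from the logged values, not stated anywhere in the source.

```python
import math

PEAK_LR = 1e-4       # README: cosine schedule starting from 1e-4
TOTAL_STEPS = 1600   # README: total training steps
WARMUP_STEPS = 160   # ASSUMPTION: inferred from the logged values


def lr_at(logged_step):
    """Learning rate expected in the log at `logged_step` under the assumed schedule."""
    # ASSUMPTION: the log lags the scheduler by one step (step 1 logs 0.0).
    k = logged_step - 1
    if k < WARMUP_STEPS:
        return PEAK_LR * k / WARMUP_STEPS
    progress = (k - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few values logged in trainer_state.json.
for step, logged in [(5, 2.5e-06), (160, 9.9375e-05), (200, 9.981912357541627e-05)]:
    print(step, lr_at(step), logged)
```

If the reconstructed values diverge from the log at later steps, the warmup length or total step count assumed here is the first thing to revisit.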

special_tokens_map.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca2f60fd56eabb86ada6d0ef7c30d1ce71e1ed22af2d19e5238a9f0a5cdfa23c
+size 33384666

tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

trainer_state.json ADDED

	@@ -0,0 +1,2409 @@

+{
+  "best_global_step": 1600,
+  "best_metric": 0.8723308444023132,
+  "best_model_checkpoint": "checkpoints/lora_uci/checkpoint-1600",
+  "epoch": 0.14222222222222222,
+  "eval_steps": 100,
+  "global_step": 1600,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 8.888888888888889e-05,
+      "grad_norm": 34.336647033691406,
+      "learning_rate": 0.0,
+      "loss": 4.5963,
+      "step": 1
+    },
+    {
+      "epoch": 0.00044444444444444447,
+      "grad_norm": 32.09209060668945,
+      "learning_rate": 2.5e-06,
+      "loss": 4.4684,
+      "step": 5
+    },
+    {
+      "epoch": 0.0008888888888888889,
+      "grad_norm": 31.128032684326172,
+      "learning_rate": 5.625e-06,
+      "loss": 3.7103,
+      "step": 10
+    },
+    {
+      "epoch": 0.0013333333333333333,
+      "grad_norm": 22.065799713134766,
+      "learning_rate": 8.75e-06,
+      "loss": 2.5875,
+      "step": 15
+    },
+    {
+      "epoch": 0.0017777777777777779,
+      "grad_norm": 14.675823211669922,
+      "learning_rate": 1.1875e-05,
+      "loss": 1.7574,
+      "step": 20
+    },
+    {
+      "epoch": 0.0022222222222222222,
+      "grad_norm": 15.899088859558105,
+      "learning_rate": 1.5e-05,
+      "loss": 1.426,
+      "step": 25
+    },
+    {
+      "epoch": 0.0026666666666666666,
+      "grad_norm": 14.202256202697754,
+      "learning_rate": 1.8125e-05,
+      "loss": 1.2788,
+      "step": 30
+    },
+    {
+      "epoch": 0.003111111111111111,
+      "grad_norm": 9.793079376220703,
+      "learning_rate": 2.125e-05,
+      "loss": 1.2216,
+      "step": 35
+    },
+    {
+      "epoch": 0.0035555555555555557,
+      "grad_norm": 19.244722366333008,
+      "learning_rate": 2.4375e-05,
+      "loss": 1.2066,
+      "step": 40
+    },
+    {
+      "epoch": 0.004,
+      "grad_norm": 8.825323104858398,
+      "learning_rate": 2.7500000000000004e-05,
+      "loss": 1.0881,
+      "step": 45
+    },
+    {
+      "epoch": 0.0044444444444444444,
+      "grad_norm": 10.595223426818848,
+      "learning_rate": 3.0625000000000006e-05,
+      "loss": 1.0866,
+      "step": 50
+    },
+    {
+      "epoch": 0.004888888888888889,
+      "grad_norm": 12.221918106079102,
+      "learning_rate": 3.375000000000001e-05,
+      "loss": 1.1606,
+      "step": 55
+    },
+    {
+      "epoch": 0.005333333333333333,
+      "grad_norm": 11.6161527633667,
+      "learning_rate": 3.6875e-05,
+      "loss": 1.1047,
+      "step": 60
+    },
+    {
+      "epoch": 0.0057777777777777775,
+      "grad_norm": 8.067273139953613,
+      "learning_rate": 4e-05,
+      "loss": 1.0627,
+      "step": 65
+    },
+    {
+      "epoch": 0.006222222222222222,
+      "grad_norm": 12.541388511657715,
+      "learning_rate": 4.3125000000000005e-05,
+      "loss": 1.0847,
+      "step": 70
+    },
+    {
+      "epoch": 0.006666666666666667,
+      "grad_norm": 11.718969345092773,
+      "learning_rate": 4.6250000000000006e-05,
+      "loss": 1.0382,
+      "step": 75
+    },
+    {
+      "epoch": 0.0071111111111111115,
+      "grad_norm": 9.308419227600098,
+      "learning_rate": 4.937500000000001e-05,
+      "loss": 1.0483,
+      "step": 80
+    },
+    {
+      "epoch": 0.007555555555555556,
+      "grad_norm": 6.772762298583984,
+      "learning_rate": 5.25e-05,
+      "loss": 1.009,
+      "step": 85
+    },
+    {
+      "epoch": 0.008,
+      "grad_norm": 9.496241569519043,
+      "learning_rate": 5.5625000000000004e-05,
+      "loss": 0.9837,
+      "step": 90
+    },
+    {
+      "epoch": 0.008444444444444444,
+      "grad_norm": 7.885592937469482,
+      "learning_rate": 5.8750000000000005e-05,
+      "loss": 1.0189,
+      "step": 95
+    },
+    {
+      "epoch": 0.008888888888888889,
+      "grad_norm": 5.724958419799805,
+      "learning_rate": 6.1875e-05,
+      "loss": 1.0009,
+      "step": 100
+    },
+    {
+      "epoch": 0.008888888888888889,
+      "eval_loss": 1.1365891695022583,
+      "eval_runtime": 185.2718,
+      "eval_samples_per_second": 26.987,
+      "eval_steps_per_second": 3.373,
+      "step": 100
+    },
+    {
+      "epoch": 0.009333333333333334,
+      "grad_norm": 7.011026859283447,
+      "learning_rate": 6.500000000000001e-05,
+      "loss": 1.0031,
+      "step": 105
+    },
+    {
+      "epoch": 0.009777777777777778,
+      "grad_norm": 7.641518592834473,
+      "learning_rate": 6.8125e-05,
+      "loss": 0.997,
+      "step": 110
+    },
+    {
+      "epoch": 0.010222222222222223,
+      "grad_norm": 9.401971817016602,
+      "learning_rate": 7.125000000000001e-05,
+      "loss": 1.0158,
+      "step": 115
+    },
+    {
+      "epoch": 0.010666666666666666,
+      "grad_norm": 4.336047649383545,
+      "learning_rate": 7.4375e-05,
+      "loss": 0.9726,
+      "step": 120
+    },
+    {
+      "epoch": 0.011111111111111112,
+      "grad_norm": 6.882427215576172,
+      "learning_rate": 7.75e-05,
+      "loss": 1.0175,
+      "step": 125
+    },
+    {
+      "epoch": 0.011555555555555555,
+      "grad_norm": 5.442468643188477,
+      "learning_rate": 8.062500000000001e-05,
+      "loss": 0.936,
+      "step": 130
+    },
+    {
+      "epoch": 0.012,
+      "grad_norm": 4.264267444610596,
+      "learning_rate": 8.375e-05,
+      "loss": 0.9527,
+      "step": 135
+    },
+    {
+      "epoch": 0.012444444444444444,
+      "grad_norm": 5.994289398193359,
+      "learning_rate": 8.687500000000001e-05,
+      "loss": 0.9757,
+      "step": 140
+    },
+    {
+      "epoch": 0.012888888888888889,
+      "grad_norm": 5.154539585113525,
+      "learning_rate": 9e-05,
+      "loss": 0.9634,
+      "step": 145
+    },
+    {
+      "epoch": 0.013333333333333334,
+      "grad_norm": 5.39900541305542,
+      "learning_rate": 9.3125e-05,
+      "loss": 0.9563,
+      "step": 150
+    },
+    {
+      "epoch": 0.013777777777777778,
+      "grad_norm": 5.613903522491455,
+      "learning_rate": 9.625000000000001e-05,
+      "loss": 0.9728,
+      "step": 155
+    },
+    {
+      "epoch": 0.014222222222222223,
+      "grad_norm": 4.219268798828125,
+      "learning_rate": 9.9375e-05,
+      "loss": 0.9342,
+      "step": 160
+    },
+    {
+      "epoch": 0.014666666666666666,
+      "grad_norm": 6.655375003814697,
+      "learning_rate": 9.999809615320856e-05,
+      "loss": 0.9196,
+      "step": 165
+    },
+    {
+      "epoch": 0.015111111111111112,
+      "grad_norm": 4.512648582458496,
+      "learning_rate": 9.999036202410325e-05,
+      "loss": 1.0073,
+      "step": 170
+    },
+    {
+      "epoch": 0.015555555555555555,
+      "grad_norm": 4.687705993652344,
+      "learning_rate": 9.997667954183565e-05,
+      "loss": 0.9762,
+      "step": 175
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 4.793412685394287,
+      "learning_rate": 9.995705033448435e-05,
+      "loss": 0.9198,
+      "step": 180
+    },
+    {
+      "epoch": 0.016444444444444446,
+      "grad_norm": 5.910813808441162,
+      "learning_rate": 9.99314767377287e-05,
+      "loss": 0.9407,
+      "step": 185
+    },
+    {
+      "epoch": 0.016888888888888887,
+      "grad_norm": 5.998133659362793,
+      "learning_rate": 9.9899961794571e-05,
+      "loss": 0.9476,
+      "step": 190
+    },
+    {
+      "epoch": 0.017333333333333333,
+      "grad_norm": 4.035412311553955,
+      "learning_rate": 9.986250925497429e-05,
+      "loss": 0.9648,
+      "step": 195
+    },
+    {
+      "epoch": 0.017777777777777778,
+      "grad_norm": 4.533802032470703,
+      "learning_rate": 9.981912357541627e-05,
+      "loss": 0.9095,
+      "step": 200
+    },
+    {
+      "epoch": 0.017777777777777778,
+      "eval_loss": 1.080334186553955,
+      "eval_runtime": 187.8444,
+      "eval_samples_per_second": 26.618,
+      "eval_steps_per_second": 3.327,
+      "step": 200
+    },
+    {
+      "epoch": 0.018222222222222223,
+      "grad_norm": 4.582113265991211,
+      "learning_rate": 9.976980991835894e-05,
+      "loss": 0.9287,
+      "step": 205
+    },
+    {
+      "epoch": 0.018666666666666668,
+      "grad_norm": 3.81330943107605,
+      "learning_rate": 9.971457415163435e-05,
+      "loss": 0.9538,
+      "step": 210
+    },
+    {
+      "epoch": 0.01911111111111111,
+      "grad_norm": 4.3214430809021,
+      "learning_rate": 9.965342284774632e-05,
+      "loss": 0.9262,
+      "step": 215
+    },
+    {
+      "epoch": 0.019555555555555555,
+      "grad_norm": 4.554825305938721,
+      "learning_rate": 9.958636328308853e-05,
+      "loss": 0.9172,
+      "step": 220
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 4.938535213470459,
+      "learning_rate": 9.951340343707852e-05,
+      "loss": 0.916,
+      "step": 225
+    },
+    {
+      "epoch": 0.020444444444444446,
+      "grad_norm": 4.2901930809021,
+      "learning_rate": 9.943455199120837e-05,
+      "loss": 0.9322,
+      "step": 230
+    },
+    {
+      "epoch": 0.020888888888888887,
+      "grad_norm": 4.4027228355407715,
+      "learning_rate": 9.93498183280116e-05,
+      "loss": 0.8971,
+      "step": 235
+    },
+    {
+      "epoch": 0.021333333333333333,
+      "grad_norm": 4.342219829559326,
+      "learning_rate": 9.925921252994676e-05,
+      "loss": 0.9283,
+      "step": 240
+    },
+    {
+      "epoch": 0.021777777777777778,
+      "grad_norm": 4.265533447265625,
+      "learning_rate": 9.916274537819775e-05,
+      "loss": 0.9491,
+      "step": 245
+    },
+    {
+      "epoch": 0.022222222222222223,
+      "grad_norm": 3.2655222415924072,
+      "learning_rate": 9.906042835139089e-05,
+      "loss": 0.9067,
+      "step": 250
+    },
+    {
+      "epoch": 0.02266666666666667,
+      "grad_norm": 5.214357852935791,
+      "learning_rate": 9.89522736242292e-05,
+      "loss": 0.9483,
+      "step": 255
+    },
+    {
+      "epoch": 0.02311111111111111,
+      "grad_norm": 3.831639289855957,
+      "learning_rate": 9.883829406604363e-05,
+      "loss": 0.8565,
+      "step": 260
+    },
+    {
+      "epoch": 0.023555555555555555,
+      "grad_norm": 3.9132847785949707,
+      "learning_rate": 9.871850323926177e-05,
+      "loss": 0.9348,
+      "step": 265
+    },
+    {
+      "epoch": 0.024,
+      "grad_norm": 3.378690481185913,
+      "learning_rate": 9.859291539779406e-05,
+      "loss": 0.9033,
+      "step": 270
+    },
+    {
+      "epoch": 0.024444444444444446,
+      "grad_norm": 4.511641025543213,
+      "learning_rate": 9.846154548533773e-05,
+      "loss": 0.908,
+      "step": 275
+    },
+    {
+      "epoch": 0.024888888888888887,
+      "grad_norm": 2.6710896492004395,
+      "learning_rate": 9.832440913359861e-05,
+      "loss": 0.8716,
+      "step": 280
+    },
+    {
+      "epoch": 0.025333333333333333,
+      "grad_norm": 3.3226819038391113,
+      "learning_rate": 9.818152266043114e-05,
+      "loss": 0.9229,
+      "step": 285
+    },
+    {
+      "epoch": 0.025777777777777778,
+      "grad_norm": 2.8580386638641357,
+      "learning_rate": 9.803290306789676e-05,
+      "loss": 0.8817,
+      "step": 290
+    },
+    {
+      "epoch": 0.026222222222222223,
+      "grad_norm": 3.8689441680908203,
+      "learning_rate": 9.787856804024073e-05,
+      "loss": 0.8654,
+      "step": 295
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "grad_norm": 5.108214855194092,
+      "learning_rate": 9.771853594178791e-05,
+      "loss": 0.9015,
+      "step": 300
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "eval_loss": 1.0304287672042847,
+      "eval_runtime": 149.0721,
+      "eval_samples_per_second": 33.541,
+      "eval_steps_per_second": 4.193,
+      "step": 300
+    },
+    {
+      "epoch": 0.02711111111111111,
+      "grad_norm": 3.7313003540039062,
+      "learning_rate": 9.755282581475769e-05,
+      "loss": 0.8804,
+      "step": 305
+    },
+    {
+      "epoch": 0.027555555555555555,
+      "grad_norm": 2.914902687072754,
+      "learning_rate": 9.738145737699799e-05,
+      "loss": 0.8449,
+      "step": 310
+    },
+    {
+      "epoch": 0.028,
+      "grad_norm": 3.4815356731414795,
+      "learning_rate": 9.720445101963922e-05,
+      "loss": 0.8962,
+      "step": 315
+    },
+    {
+      "epoch": 0.028444444444444446,
+      "grad_norm": 3.9641096591949463,
+      "learning_rate": 9.702182780466775e-05,
+      "loss": 0.8539,
+      "step": 320
+    },
+    {
+      "epoch": 0.028888888888888888,
+      "grad_norm": 3.4962000846862793,
+      "learning_rate": 9.683360946241989e-05,
+      "loss": 0.8779,
+      "step": 325
+    },
+    {
+      "epoch": 0.029333333333333333,
+      "grad_norm": 3.8663835525512695,
+      "learning_rate": 9.663981838899612e-05,
+      "loss": 0.8943,
+      "step": 330
+    },
+    {
+      "epoch": 0.029777777777777778,
+      "grad_norm": 3.476285934448242,
+      "learning_rate": 9.644047764359622e-05,
+      "loss": 0.8697,
+      "step": 335
+    },
+    {
+      "epoch": 0.030222222222222223,
+      "grad_norm": 3.642954111099243,
+      "learning_rate": 9.623561094577542e-05,
+      "loss": 0.9362,
+      "step": 340
+    },
+    {
+      "epoch": 0.030666666666666665,
+      "grad_norm": 3.1184070110321045,
+      "learning_rate": 9.602524267262203e-05,
+      "loss": 0.8999,
+      "step": 345
+    },
+    {
+      "epoch": 0.03111111111111111,
+      "grad_norm": 3.0415987968444824,
+      "learning_rate": 9.580939785585681e-05,
+      "loss": 0.8613,
+      "step": 350
+    },
+    {
+      "epoch": 0.03155555555555556,
+      "grad_norm": 2.9512388706207275,
+      "learning_rate": 9.558810217885443e-05,
+      "loss": 0.8499,
+      "step": 355
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 3.8605785369873047,
+      "learning_rate": 9.536138197358745e-05,
+      "loss": 0.8472,
+      "step": 360
+    },
+    {
+      "epoch": 0.03244444444444444,
+      "grad_norm": 2.254096031188965,
+      "learning_rate": 9.512926421749304e-05,
+      "loss": 0.8302,
+      "step": 365
+    },
+    {
+      "epoch": 0.03288888888888889,
+      "grad_norm": 2.6906585693359375,
+      "learning_rate": 9.489177653026289e-05,
+      "loss": 0.8939,
+      "step": 370
+    },
+    {
+      "epoch": 0.03333333333333333,
+      "grad_norm": 2.434436559677124,
+      "learning_rate": 9.464894717055686e-05,
+      "loss": 0.9043,
+      "step": 375
+    },
+    {
+      "epoch": 0.033777777777777775,
+      "grad_norm": 2.497917652130127,
+      "learning_rate": 9.440080503264037e-05,
+      "loss": 0.8737,
+      "step": 380
+    },
+    {
+      "epoch": 0.03422222222222222,
+      "grad_norm": 4.014335632324219,
+      "learning_rate": 9.414737964294636e-05,
+      "loss": 0.8566,
+      "step": 385
+    },
+    {
+      "epoch": 0.034666666666666665,
+      "grad_norm": 2.4953484535217285,
+      "learning_rate": 9.388870115656184e-05,
+      "loss": 0.8479,
+      "step": 390
+    },
+    {
+      "epoch": 0.035111111111111114,
+      "grad_norm": 3.679469347000122,
+      "learning_rate": 9.362480035363986e-05,
+      "loss": 0.8702,
+      "step": 395
+    },
+    {
+      "epoch": 0.035555555555555556,
+      "grad_norm": 3.756469249725342,
+      "learning_rate": 9.335570863573686e-05,
+      "loss": 0.8408,
+      "step": 400
+    },
+    {
+      "epoch": 0.035555555555555556,
+      "eval_loss": 0.9743552803993225,
+      "eval_runtime": 145.3606,
+      "eval_samples_per_second": 34.397,
+      "eval_steps_per_second": 4.3,
+      "step": 400
+    },
+    {
+      "epoch": 0.036,
+      "grad_norm": 2.5384509563446045,
+      "learning_rate": 9.308145802207629e-05,
+      "loss": 0.7945,
+      "step": 405
+    },
+    {
+      "epoch": 0.036444444444444446,
+      "grad_norm": 3.5420918464660645,
+      "learning_rate": 9.280208114573859e-05,
+      "loss": 0.8276,
+      "step": 410
+    },
+    {
+      "epoch": 0.03688888888888889,
+      "grad_norm": 3.725031852722168,
+      "learning_rate": 9.251761124977815e-05,
+      "loss": 0.8379,
+      "step": 415
+    },
+    {
+      "epoch": 0.037333333333333336,
+      "grad_norm": 3.2828762531280518,
+      "learning_rate": 9.222808218326784e-05,
+      "loss": 0.8136,
+      "step": 420
+    },
+    {
+      "epoch": 0.03777777777777778,
+      "grad_norm": 3.535404682159424,
+      "learning_rate": 9.193352839727121e-05,
+      "loss": 0.8441,
+      "step": 425
+    },
+    {
+      "epoch": 0.03822222222222222,
+      "grad_norm": 3.6818301677703857,
+      "learning_rate": 9.163398494074314e-05,
+      "loss": 0.824,
+      "step": 430
+    },
+    {
+      "epoch": 0.03866666666666667,
+      "grad_norm": 3.2186279296875,
+      "learning_rate": 9.132948745635944e-05,
+      "loss": 0.867,
+      "step": 435
+    },
+    {
+      "epoch": 0.03911111111111111,
+      "grad_norm": 3.6566619873046875,
+      "learning_rate": 9.102007217627568e-05,
+      "loss": 0.8889,
+      "step": 440
+    },
+    {
+      "epoch": 0.03955555555555555,
+      "grad_norm": 2.5457677841186523,
+      "learning_rate": 9.070577591781597e-05,
+      "loss": 0.8509,
+      "step": 445
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 2.946967840194702,
+      "learning_rate": 9.038663607909198e-05,
+      "loss": 0.8356,
+      "step": 450
+    },
+    {
+      "epoch": 0.04044444444444444,
+      "grad_norm": 3.6755688190460205,
+      "learning_rate": 9.006269063455304e-05,
+      "loss": 0.8134,
+      "step": 455
+    },
+    {
+      "epoch": 0.04088888888888889,
+      "grad_norm": 3.2024929523468018,
+      "learning_rate": 8.97339781304675e-05,
+      "loss": 0.8292,
+      "step": 460
+    },
+    {
+      "epoch": 0.04133333333333333,
+      "grad_norm": 3.4355175495147705,
+      "learning_rate": 8.940053768033609e-05,
+      "loss": 0.8325,
+      "step": 465
+    },
+    {
+      "epoch": 0.041777777777777775,
+      "grad_norm": 3.882667303085327,
+      "learning_rate": 8.906240896023794e-05,
+      "loss": 0.8773,
+      "step": 470
+    },
+    {
+      "epoch": 0.042222222222222223,
+      "grad_norm": 3.5231196880340576,
+      "learning_rate": 8.871963220410928e-05,
+      "loss": 0.8399,
+      "step": 475
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 2.3692946434020996,
+      "learning_rate": 8.837224819895626e-05,
+      "loss": 0.8638,
+      "step": 480
+    },
+    {
+      "epoch": 0.043111111111111114,
+      "grad_norm": 3.1206417083740234,
+      "learning_rate": 8.802029828000156e-05,
+      "loss": 0.8241,
+      "step": 485
+    },
+    {
+      "epoch": 0.043555555555555556,
+      "grad_norm": 2.570483922958374,
+      "learning_rate": 8.766382432576588e-05,
+      "loss": 0.8265,
+      "step": 490
+    },
+    {
+      "epoch": 0.044,
+      "grad_norm": 2.721163749694824,
+      "learning_rate": 8.730286875308497e-05,
+      "loss": 0.8191,
+      "step": 495
+    },
+    {
+      "epoch": 0.044444444444444446,
+      "grad_norm": 2.883211851119995,
+      "learning_rate": 8.693747451206232e-05,
+      "loss": 0.8014,
+      "step": 500
+    },
+    {
+      "epoch": 0.044444444444444446,
+      "eval_loss": 0.9622268676757812,
+      "eval_runtime": 146.4776,
+      "eval_samples_per_second": 34.135,
+      "eval_steps_per_second": 4.267,
+      "step": 500
+    },
+    {
+      "epoch": 0.04488888888888889,
+      "grad_norm": 2.902592897415161,
+      "learning_rate": 8.656768508095853e-05,
+      "loss": 0.8482,
+      "step": 505
+    },
+    {
+      "epoch": 0.04533333333333334,
+      "grad_norm": 3.207852840423584,
+      "learning_rate": 8.61935444610179e-05,
+      "loss": 0.819,
+      "step": 510
+    },
+    {
+      "epoch": 0.04577777777777778,
+      "grad_norm": 3.402653455734253,
+      "learning_rate": 8.581509717123273e-05,
+      "loss": 0.8495,
+      "step": 515
+    },
+    {
+      "epoch": 0.04622222222222222,
+      "grad_norm": 2.159984827041626,
+      "learning_rate": 8.543238824304584e-05,
+      "loss": 0.8078,
+      "step": 520
+    },
+    {
+      "epoch": 0.04666666666666667,
+      "grad_norm": 3.279927968978882,
+      "learning_rate": 8.504546321499255e-05,
+      "loss": 0.8831,
+      "step": 525
+    },
+    {
+      "epoch": 0.04711111111111111,
+      "grad_norm": 3.268341064453125,
+      "learning_rate": 8.46543681272818e-05,
+      "loss": 0.8359,
+      "step": 530
+    },
+    {
+      "epoch": 0.04755555555555555,
+      "grad_norm": 2.2996602058410645,
+      "learning_rate": 8.425914951631795e-05,
+      "loss": 0.8419,
+      "step": 535
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 3.5043487548828125,
+      "learning_rate": 8.385985440916344e-05,
+      "loss": 0.8337,
+      "step": 540
+    },
+    {
+      "epoch": 0.04844444444444444,
+      "grad_norm": 2.0510051250457764,
+      "learning_rate": 8.345653031794292e-05,
+      "loss": 0.8375,
+      "step": 545
+    },
+    {
+      "epoch": 0.04888888888888889,
+      "grad_norm": 2.8395752906799316,
+      "learning_rate": 8.304922523418987e-05,
+      "loss": 0.82,
+      "step": 550
+    },
+    {
+      "epoch": 0.04933333333333333,
+      "grad_norm": 4.7278876304626465,
+      "learning_rate": 8.263798762313612e-05,
+      "loss": 0.8209,
+      "step": 555
+    },
+    {
+      "epoch": 0.049777777777777775,
+      "grad_norm": 2.782799482345581,
+      "learning_rate": 8.222286641794488e-05,
+      "loss": 0.8109,
+      "step": 560
+    },
+    {
+      "epoch": 0.050222222222222224,
+      "grad_norm": 2.960604190826416,
+      "learning_rate": 8.18039110138882e-05,
+      "loss": 0.8397,
+      "step": 565
+    },
+    {
+      "epoch": 0.050666666666666665,
+      "grad_norm": 2.1768970489501953,
+      "learning_rate": 8.138117126246951e-05,
+      "loss": 0.7785,
+      "step": 570
+    },
+    {
+      "epoch": 0.051111111111111114,
+      "grad_norm": 2.2641615867614746,
+      "learning_rate": 8.095469746549172e-05,
+      "loss": 0.8549,
+      "step": 575
+    },
+    {
+      "epoch": 0.051555555555555556,
+      "grad_norm": 3.175459384918213,
+      "learning_rate": 8.052454036907174e-05,
+      "loss": 0.8181,
+      "step": 580
+    },
+    {
+      "epoch": 0.052,
+      "grad_norm": 4.72011661529541,
+      "learning_rate": 8.009075115760241e-05,
+      "loss": 0.8396,
+      "step": 585
+    },
+    {
+      "epoch": 0.052444444444444446,
+      "grad_norm": 3.0918285846710205,
+      "learning_rate": 7.965338144766186e-05,
+      "loss": 0.8967,
+      "step": 590
+    },
+    {
+      "epoch": 0.05288888888888889,
+      "grad_norm": 3.0895960330963135,
+      "learning_rate": 7.921248328187173e-05,
+      "loss": 0.8236,
+      "step": 595
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 2.9176504611968994,
+      "learning_rate": 7.876810912270462e-05,
+      "loss": 0.7833,
+      "step": 600
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "eval_loss": 0.9369513392448425,
+      "eval_runtime": 145.5512,
+      "eval_samples_per_second": 34.352,
+      "eval_steps_per_second": 4.294,
+      "step": 600
+    },
+    {
+      "epoch": 0.05377777777777778,
+      "grad_norm": 2.640820264816284,
+      "learning_rate": 7.832031184624164e-05,
+      "loss": 0.7855,
+      "step": 605
+    },
+    {
+      "epoch": 0.05422222222222222,
+      "grad_norm": 2.6097235679626465,
+      "learning_rate": 7.786914473588056e-05,
+      "loss": 0.8043,
+      "step": 610
+    },
+    {
+      "epoch": 0.05466666666666667,
+      "grad_norm": 2.815849781036377,
+      "learning_rate": 7.74146614759957e-05,
+      "loss": 0.8535,
+      "step": 615
+    },
+    {
+      "epoch": 0.05511111111111111,
+      "grad_norm": 3.133481025695801,
+      "learning_rate": 7.695691614555003e-05,
+      "loss": 0.8229,
+      "step": 620
+    },
+    {
+      "epoch": 0.05555555555555555,
+      "grad_norm": 2.641892910003662,
+      "learning_rate": 7.649596321166024e-05,
+      "loss": 0.7967,
+      "step": 625
+    },
+    {
+      "epoch": 0.056,
+      "grad_norm": 3.032099723815918,
+      "learning_rate": 7.603185752311587e-05,
+      "loss": 0.812,
+      "step": 630
+    },
+    {
+      "epoch": 0.05644444444444444,
+      "grad_norm": 2.820112466812134,
+      "learning_rate": 7.55646543038526e-05,
+      "loss": 0.7977,
+      "step": 635
+    },
+    {
+      "epoch": 0.05688888888888889,
+      "grad_norm": 3.10481333732605,
+      "learning_rate": 7.509440914638139e-05,
+      "loss": 0.8705,
+      "step": 640
+    },
+    {
+      "epoch": 0.05733333333333333,
+      "grad_norm": 2.5827136039733887,
+      "learning_rate": 7.462117800517336e-05,
+      "loss": 0.815,
+      "step": 645
+    },
+    {
+      "epoch": 0.057777777777777775,
+      "grad_norm": 3.8412251472473145,
+      "learning_rate": 7.414501719000187e-05,
+      "loss": 0.8164,
+      "step": 650
+    },
+    {
+      "epoch": 0.058222222222222224,
+      "grad_norm": 2.3787269592285156,
+      "learning_rate": 7.366598335924217e-05,
+      "loss": 0.8154,
+      "step": 655
+    },
+    {
+      "epoch": 0.058666666666666666,
+      "grad_norm": 2.156470537185669,
+      "learning_rate": 7.318413351312965e-05,
+      "loss": 0.8122,
+      "step": 660
+    },
+    {
+      "epoch": 0.059111111111111114,
+      "grad_norm": 2.743718385696411,
+      "learning_rate": 7.269952498697734e-05,
+      "loss": 0.8279,
+      "step": 665
+    },
+    {
+      "epoch": 0.059555555555555556,
+      "grad_norm": 2.574324369430542,
+      "learning_rate": 7.221221544435363e-05,
+      "loss": 0.8205,
+      "step": 670
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 2.6374778747558594,
+      "learning_rate": 7.172226287022086e-05,
+      "loss": 0.786,
+      "step": 675
+    },
+    {
+      "epoch": 0.060444444444444446,
+      "grad_norm": 2.6063718795776367,
+      "learning_rate": 7.122972556403567e-05,
+      "loss": 0.7742,
+      "step": 680
+    },
+    {
+      "epoch": 0.06088888888888889,
+      "grad_norm": 2.1271631717681885,
+      "learning_rate": 7.073466213281196e-05,
+      "loss": 0.8303,
+      "step": 685
+    },
+    {
+      "epoch": 0.06133333333333333,
+      "grad_norm": 2.25993275642395,
+      "learning_rate": 7.023713148414727e-05,
+      "loss": 0.8154,
+      "step": 690
+    },
+    {
+      "epoch": 0.06177777777777778,
+      "grad_norm": 2.227431058883667,
+      "learning_rate": 6.973719281921335e-05,
+      "loss": 0.8394,
+      "step": 695
+    },
+    {
+      "epoch": 0.06222222222222222,
+      "grad_norm": 2.3561928272247314,
+      "learning_rate": 6.923490562571181e-05,
+      "loss": 0.815,
+      "step": 700
+    },
+    {
+      "epoch": 0.06222222222222222,
+      "eval_loss": 0.9396146535873413,
+      "eval_runtime": 174.3916,
+      "eval_samples_per_second": 28.671,
+      "eval_steps_per_second": 3.584,
+      "step": 700
+    },
+    {
+      "epoch": 0.06266666666666666,
+      "grad_norm": 2.8203611373901367,
+      "learning_rate": 6.873032967079561e-05,
+      "loss": 0.8072,
+      "step": 705
+    },
+    {
+      "epoch": 0.06311111111111112,
+      "grad_norm": 2.616844892501831,
+      "learning_rate": 6.82235249939575e-05,
+      "loss": 0.8132,
+      "step": 710
+    },
+    {
+      "epoch": 0.06355555555555556,
+      "grad_norm": 2.7529284954071045,
+      "learning_rate": 6.771455189988579e-05,
+      "loss": 0.8126,
+      "step": 715
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 2.466383218765259,
+      "learning_rate": 6.720347095128884e-05,
+      "loss": 0.8174,
+      "step": 720
+    },
+    {
+      "epoch": 0.06444444444444444,
+      "grad_norm": 2.2590644359588623,
+      "learning_rate": 6.669034296168855e-05,
+      "loss": 0.8065,
+      "step": 725
+    },
+    {
+      "epoch": 0.06488888888888888,
+      "grad_norm": 2.241419792175293,
+      "learning_rate": 6.617522898818426e-05,
+      "loss": 0.8332,
+      "step": 730
+    },
+    {
+      "epoch": 0.06533333333333333,
+      "grad_norm": 2.259533405303955,
+      "learning_rate": 6.565819032418747e-05,
+      "loss": 0.8599,
+      "step": 735
+    },
+    {
+      "epoch": 0.06577777777777778,
+      "grad_norm": 2.110358715057373,
+      "learning_rate": 6.513928849212873e-05,
+      "loss": 0.795,
+      "step": 740
+    },
+    {
+      "epoch": 0.06622222222222222,
+      "grad_norm": 2.656036615371704,
+      "learning_rate": 6.461858523613684e-05,
+      "loss": 0.8161,
+      "step": 745
+    },
+    {
+      "epoch": 0.06666666666666667,
+      "grad_norm": 3.0978121757507324,
+      "learning_rate": 6.409614251469208e-05,
+      "loss": 0.8104,
+      "step": 750
+    },
+    {
+      "epoch": 0.06711111111111111,
+      "grad_norm": 2.494825839996338,
+      "learning_rate": 6.357202249325371e-05,
+      "loss": 0.791,
+      "step": 755
+    },
+    {
+      "epoch": 0.06755555555555555,
+      "grad_norm": 2.344874143600464,
+      "learning_rate": 6.304628753686295e-05,
+      "loss": 0.8195,
+      "step": 760
+    },
+    {
+      "epoch": 0.068,
+      "grad_norm": 2.4682934284210205,
+      "learning_rate": 6.251900020272208e-05,
+      "loss": 0.7791,
+      "step": 765
+    },
+    {
+      "epoch": 0.06844444444444445,
+      "grad_norm": 2.29433012008667,
+      "learning_rate": 6.199022323275083e-05,
+      "loss": 0.8252,
+      "step": 770
+    },
+    {
+      "epoch": 0.06888888888888889,
+      "grad_norm": 1.965577483177185,
+      "learning_rate": 6.146001954612071e-05,
+      "loss": 0.8046,
+      "step": 775
+    },
+    {
+      "epoch": 0.06933333333333333,
+      "grad_norm": 2.1349830627441406,
+      "learning_rate": 6.092845223176823e-05,
+      "loss": 0.82,
+      "step": 780
+    },
+    {
+      "epoch": 0.06977777777777777,
+      "grad_norm": 2.2359840869903564,
+      "learning_rate": 6.0395584540887963e-05,
+      "loss": 0.8138,
+      "step": 785
+    },
+    {
+      "epoch": 0.07022222222222223,
+      "grad_norm": 2.470207691192627,
+      "learning_rate": 5.9861479879406315e-05,
+      "loss": 0.771,
+      "step": 790
+    },
+    {
+      "epoch": 0.07066666666666667,
+      "grad_norm": 1.9428234100341797,
+      "learning_rate": 5.932620180043674e-05,
+      "loss": 0.7997,
+      "step": 795
+    },
+    {
+      "epoch": 0.07111111111111111,
+      "grad_norm": 2.2809956073760986,
+      "learning_rate": 5.8789813996717736e-05,
+      "loss": 0.8113,
+      "step": 800
+    },
+    {
+      "epoch": 0.07111111111111111,
+      "eval_loss": 0.9512593746185303,
+      "eval_runtime": 156.0458,
+      "eval_samples_per_second": 32.042,
+      "eval_steps_per_second": 4.005,
+      "step": 800
+    },
+    {
+      "epoch": 0.07155555555555555,
+      "grad_norm": 2.4197585582733154,
+      "learning_rate": 5.8252380293033884e-05,
+      "loss": 0.8103,
+      "step": 805
+    },
+    {
+      "epoch": 0.072,
+      "grad_norm": 3.481379747390747,
+      "learning_rate": 5.7713964638621444e-05,
+      "loss": 0.8354,
+      "step": 810
+    },
+    {
+      "epoch": 0.07244444444444445,
+      "grad_norm": 3.0828964710235596,
+      "learning_rate": 5.717463109955896e-05,
+      "loss": 0.814,
+      "step": 815
+    },
+    {
+      "epoch": 0.07288888888888889,
+      "grad_norm": 2.0905721187591553,
+      "learning_rate": 5.663444385114411e-05,
+      "loss": 0.7695,
+      "step": 820
+    },
+    {
+      "epoch": 0.07333333333333333,
+      "grad_norm": 3.2763991355895996,
+      "learning_rate": 5.6093467170257374e-05,
+      "loss": 0.7864,
+      "step": 825
+    },
+    {
+      "epoch": 0.07377777777777778,
+      "grad_norm": 2.24617862701416,
+      "learning_rate": 5.5551765427713884e-05,
+      "loss": 0.7314,
+      "step": 830
+    },
+    {
+      "epoch": 0.07422222222222222,
+      "grad_norm": 2.808973789215088,
+      "learning_rate": 5.5009403080603815e-05,
+      "loss": 0.8163,
+      "step": 835
+    },
+    {
+      "epoch": 0.07466666666666667,
+      "grad_norm": 2.1906187534332275,
+      "learning_rate": 5.4466444664622685e-05,
+      "loss": 0.7868,
+      "step": 840
+    },
+    {
+      "epoch": 0.07511111111111111,
+      "grad_norm": 2.268329381942749,
+      "learning_rate": 5.392295478639225e-05,
+      "loss": 0.7765,
+      "step": 845
+    },
+    {
+      "epoch": 0.07555555555555556,
+      "grad_norm": 2.3029792308807373,
+      "learning_rate": 5.337899811577296e-05,
+      "loss": 0.7739,
+      "step": 850
+    },
+    {
+      "epoch": 0.076,
+      "grad_norm": 2.1580140590667725,
+      "learning_rate": 5.283463937816888e-05,
+      "loss": 0.7358,
+      "step": 855
+    },
+    {
+      "epoch": 0.07644444444444444,
+      "grad_norm": 1.9017610549926758,
+      "learning_rate": 5.228994334682604e-05,
+      "loss": 0.7553,
+      "step": 860
+    },
+    {
+      "epoch": 0.0768888888888889,
+      "grad_norm": 2.5367610454559326,
+      "learning_rate": 5.174497483512506e-05,
+      "loss": 0.7988,
+      "step": 865
+    },
+    {
+      "epoch": 0.07733333333333334,
+      "grad_norm": 2.2793335914611816,
+      "learning_rate": 5.119979868886895e-05,
+      "loss": 0.7736,
+      "step": 870
+    },
+    {
+      "epoch": 0.07777777777777778,
+      "grad_norm": 2.226646900177002,
+      "learning_rate": 5.0654479778567223e-05,
+      "loss": 0.7659,
+      "step": 875
+    },
+    {
+      "epoch": 0.07822222222222222,
+      "grad_norm": 2.1985228061676025,
+      "learning_rate": 5.010908299171685e-05,
+      "loss": 0.7584,
+      "step": 880
+    },
+    {
+      "epoch": 0.07866666666666666,
+      "grad_norm": 2.9187963008880615,
+      "learning_rate": 4.9563673225081314e-05,
+      "loss": 0.7747,
+      "step": 885
+    },
+    {
+      "epoch": 0.0791111111111111,
+      "grad_norm": 2.6130077838897705,
+      "learning_rate": 4.901831537696859e-05,
+      "loss": 0.7689,
+      "step": 890
+    },
+    {
+      "epoch": 0.07955555555555556,
+      "grad_norm": 2.0770986080169678,
+      "learning_rate": 4.8473074339508875e-05,
+      "loss": 0.7608,
+      "step": 895
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 2.1071507930755615,
+      "learning_rate": 4.792801499093305e-05,
+      "loss": 0.7597,
+      "step": 900
+    },
+    {
+      "epoch": 0.08,
+      "eval_loss": 0.9028043746948242,
+      "eval_runtime": 151.7758,
+      "eval_samples_per_second": 32.943,
+      "eval_steps_per_second": 4.118,
+      "step": 900
+    },
+    {
+      "epoch": 0.08044444444444444,
+      "grad_norm": 2.295839309692383,
+      "learning_rate": 4.738320218785281e-05,
+      "loss": 0.7514,
+      "step": 905
+    },
+    {
+      "epoch": 0.08088888888888889,
+      "grad_norm": 1.9599803686141968,
+      "learning_rate": 4.683870075754347e-05,
+      "loss": 0.7633,
+      "step": 910
+    },
+    {
+      "epoch": 0.08133333333333333,
+      "grad_norm": 2.3032443523406982,
+      "learning_rate": 4.629457549023004e-05,
+      "loss": 0.7607,
+      "step": 915
+    },
+    {
+      "epoch": 0.08177777777777778,
+      "grad_norm": 2.9767608642578125,
+      "learning_rate": 4.575089113137792e-05,
+      "loss": 0.8124,
+      "step": 920
+    },
+    {
+      "epoch": 0.08222222222222222,
+      "grad_norm": 2.6544406414031982,
+      "learning_rate": 4.52077123739888e-05,
+      "loss": 0.799,
+      "step": 925
+    },
+    {
+      "epoch": 0.08266666666666667,
+      "grad_norm": 1.9514062404632568,
+      "learning_rate": 4.466510385090287e-05,
+      "loss": 0.7782,
+      "step": 930
+    },
+    {
+      "epoch": 0.08311111111111111,
+      "grad_norm": 2.3520166873931885,
+      "learning_rate": 4.412313012710813e-05,
+      "loss": 0.8328,
+      "step": 935
+    },
+    {
+      "epoch": 0.08355555555555555,
+      "grad_norm": 2.1971933841705322,
+      "learning_rate": 4.358185569205779e-05,
+      "loss": 0.7903,
+      "step": 940
+    },
+    {
+      "epoch": 0.084,
+      "grad_norm": 2.2344958782196045,
+      "learning_rate": 4.3041344951996746e-05,
+      "loss": 0.7288,
+      "step": 945
+    },
+    {
+      "epoch": 0.08444444444444445,
+      "grad_norm": 1.9887062311172485,
+      "learning_rate": 4.250166222229774e-05,
+      "loss": 0.7817,
+      "step": 950
+    },
+    {
+      "epoch": 0.08488888888888889,
+      "grad_norm": 2.5852599143981934,
+      "learning_rate": 4.196287171980869e-05,
+      "loss": 0.8126,
+      "step": 955
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 2.0287818908691406,
+      "learning_rate": 4.142503755521129e-05,
+      "loss": 0.8016,
+      "step": 960
+    },
+    {
+      "epoch": 0.08577777777777777,
+      "grad_norm": 2.318622589111328,
+      "learning_rate": 4.088822372539263e-05,
+      "loss": 0.7858,
+      "step": 965
+    },
+    {
+      "epoch": 0.08622222222222223,
+      "grad_norm": 1.7345527410507202,
+      "learning_rate": 4.035249410583016e-05,
+      "loss": 0.7737,
+      "step": 970
+    },
+    {
+      "epoch": 0.08666666666666667,
+      "grad_norm": 1.9019683599472046,
+      "learning_rate": 3.981791244299113e-05,
+      "loss": 0.7344,
+      "step": 975
+    },
+    {
+      "epoch": 0.08711111111111111,
+      "grad_norm": 2.2720935344696045,
+      "learning_rate": 3.928454234674747e-05,
+      "loss": 0.7723,
+      "step": 980
+    },
+    {
+      "epoch": 0.08755555555555555,
+      "grad_norm": 2.1315135955810547,
+      "learning_rate": 3.875244728280676e-05,
+      "loss": 0.799,
+      "step": 985
+    },
+    {
+      "epoch": 0.088,
+      "grad_norm": 2.0208346843719482,
+      "learning_rate": 3.82216905651605e-05,
+      "loss": 0.793,
+      "step": 990
+    },
+    {
+      "epoch": 0.08844444444444445,
+      "grad_norm": 2.7285315990448,
+      "learning_rate": 3.769233534855035e-05,
+      "loss": 0.7506,
+      "step": 995
+    },
+    {
+      "epoch": 0.08888888888888889,
+      "grad_norm": 2.095430374145508,
+      "learning_rate": 3.7164444620953396e-05,
+      "loss": 0.7534,
+      "step": 1000
+    },
+    {
+      "epoch": 0.08888888888888889,
+      "eval_loss": 0.8925400376319885,
+      "eval_runtime": 146.2458,
+      "eval_samples_per_second": 34.189,
+      "eval_steps_per_second": 4.274,
+      "step": 1000
+    },
+    {
+      "epoch": 0.08933333333333333,
+      "grad_norm": 2.029069423675537,
+      "learning_rate": 3.663808119608716e-05,
+      "loss": 0.792,
+      "step": 1005
+    },
+    {
+      "epoch": 0.08977777777777778,
+      "grad_norm": 2.4296746253967285,
+      "learning_rate": 3.6113307705935396e-05,
+      "loss": 0.7631,
+      "step": 1010
+    },
+    {
+      "epoch": 0.09022222222222222,
+      "grad_norm": 1.9055721759796143,
+      "learning_rate": 3.559018659329554e-05,
+      "loss": 0.764,
+      "step": 1015
+    },
+    {
+      "epoch": 0.09066666666666667,
+      "grad_norm": 1.9428242444992065,
+      "learning_rate": 3.506878010434863e-05,
+      "loss": 0.7671,
+      "step": 1020
+    },
+    {
+      "epoch": 0.09111111111111111,
+      "grad_norm": 2.55059552192688,
+      "learning_rate": 3.4549150281252636e-05,
+      "loss": 0.7873,
+      "step": 1025
+    },
+    {
+      "epoch": 0.09155555555555556,
+      "grad_norm": 1.9363492727279663,
+      "learning_rate": 3.403135895476004e-05,
+      "loss": 0.7592,
+      "step": 1030
+    },
+    {
+      "epoch": 0.092,
+      "grad_norm": 2.3893046379089355,
+      "learning_rate": 3.351546773686065e-05,
+      "loss": 0.7718,
+      "step": 1035
+    },
+    {
+      "epoch": 0.09244444444444444,
+      "grad_norm": 1.9245718717575073,
+      "learning_rate": 3.300153801345028e-05,
+      "loss": 0.7403,
+      "step": 1040
+    },
+    {
+      "epoch": 0.09288888888888888,
+      "grad_norm": 1.7629024982452393,
+      "learning_rate": 3.248963093702663e-05,
+      "loss": 0.7999,
+      "step": 1045
+    },
+    {
+      "epoch": 0.09333333333333334,
+      "grad_norm": 2.3090808391571045,
+      "learning_rate": 3.197980741941252e-05,
+      "loss": 0.7815,
+      "step": 1050
+    },
+    {
+      "epoch": 0.09377777777777778,
+      "grad_norm": 2.262960433959961,
+      "learning_rate": 3.147212812450819e-05,
+      "loss": 0.7737,
+      "step": 1055
+    },
+    {
+      "epoch": 0.09422222222222222,
+      "grad_norm": 2.4823575019836426,
+      "learning_rate": 3.096665346107278e-05,
+      "loss": 0.7961,
+      "step": 1060
+    },
+    {
+      "epoch": 0.09466666666666666,
+      "grad_norm": 2.411437511444092,
+      "learning_rate": 3.046344357553632e-05,
+      "loss": 0.8292,
+      "step": 1065
+    },
+    {
+      "epoch": 0.0951111111111111,
+      "grad_norm": 2.0698482990264893,
+      "learning_rate": 2.996255834484296e-05,
+      "loss": 0.7709,
+      "step": 1070
+    },
+    {
+      "epoch": 0.09555555555555556,
+      "grad_norm": 1.6237046718597412,
+      "learning_rate": 2.946405736932615e-05,
+      "loss": 0.7675,
+      "step": 1075
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 2.6146557331085205,
+      "learning_rate": 2.8967999965616816e-05,
+      "loss": 0.7564,
+      "step": 1080
+    },
+    {
+      "epoch": 0.09644444444444444,
+      "grad_norm": 2.2040791511535645,
+      "learning_rate": 2.8474445159585235e-05,
+      "loss": 0.733,
+      "step": 1085
+    },
+    {
+      "epoch": 0.09688888888888889,
+      "grad_norm": 2.3044800758361816,
+      "learning_rate": 2.7983451679317706e-05,
+      "loss": 0.7705,
+      "step": 1090
+    },
+    {
+      "epoch": 0.09733333333333333,
+      "grad_norm": 2.10251784324646,
+      "learning_rate": 2.7495077948128245e-05,
+      "loss": 0.7545,
+      "step": 1095
+    },
+    {
+      "epoch": 0.09777777777777778,
+      "grad_norm": 2.353555202484131,
+      "learning_rate": 2.700938207760701e-05,
+      "loss": 0.7512,
+      "step": 1100
+    },
+    {
+      "epoch": 0.09777777777777778,
+      "eval_loss": 0.8821930885314941,
+      "eval_runtime": 144.8568,
+      "eval_samples_per_second": 34.517,
+      "eval_steps_per_second": 4.315,
+      "step": 1100
+    },
+    {
+      "epoch": 0.09822222222222222,
+      "grad_norm": 2.295103073120117,
+      "learning_rate": 2.6526421860705473e-05,
+      "loss": 0.7454,
+      "step": 1105
+    },
+    {
+      "epoch": 0.09866666666666667,
+      "grad_norm": 2.361027956008911,
+      "learning_rate": 2.6046254764859685e-05,
+      "loss": 0.6993,
+      "step": 1110
+    },
+    {
+      "epoch": 0.09911111111111111,
+      "grad_norm": 2.4085135459899902,
+      "learning_rate": 2.556893792515227e-05,
+      "loss": 0.7861,
+      "step": 1115
+    },
+    {
+      "epoch": 0.09955555555555555,
+      "grad_norm": 1.8174635171890259,
+      "learning_rate": 2.5094528137513795e-05,
+      "loss": 0.8115,
+      "step": 1120
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 2.0099422931671143,
+      "learning_rate": 2.4623081851964806e-05,
+      "loss": 0.7719,
+      "step": 1125
+    },
+    {
+      "epoch": 0.10044444444444445,
+      "grad_norm": 2.152926445007324,
+      "learning_rate": 2.4154655165898627e-05,
+      "loss": 0.8149,
+      "step": 1130
+    },
+    {
+      "epoch": 0.10088888888888889,
+      "grad_norm": 1.8284573554992676,
+      "learning_rate": 2.3689303817406515e-05,
+      "loss": 0.7305,
+      "step": 1135
+    },
+    {
+      "epoch": 0.10133333333333333,
+      "grad_norm": 2.114602565765381,
+      "learning_rate": 2.3227083178645313e-05,
+      "loss": 0.7345,
+      "step": 1140
+    },
+    {
+      "epoch": 0.10177777777777777,
+      "grad_norm": 2.3404438495635986,
+      "learning_rate": 2.276804824924864e-05,
+      "loss": 0.7555,
+      "step": 1145
+    },
+    {
+      "epoch": 0.10222222222222223,
+      "grad_norm": 2.388821601867676,
+      "learning_rate": 2.2312253649782655e-05,
+      "loss": 0.7898,
+      "step": 1150
+    },
+    {
+      "epoch": 0.10266666666666667,
+      "grad_norm": 2.1446168422698975,
+      "learning_rate": 2.185975361524657e-05,
+      "loss": 0.7312,
+      "step": 1155
+    },
+    {
+      "epoch": 0.10311111111111111,
+      "grad_norm": 2.198666572570801,
+      "learning_rate": 2.1410601988619394e-05,
+      "loss": 0.7612,
+      "step": 1160
+    },
+    {
+      "epoch": 0.10355555555555555,
+      "grad_norm": 2.017157793045044,
+      "learning_rate": 2.0964852214453013e-05,
+      "loss": 0.7532,
+      "step": 1165
+    },
+    {
+      "epoch": 0.104,
+      "grad_norm": 1.7391427755355835,
+      "learning_rate": 2.0522557332512953e-05,
+      "loss": 0.7452,
+      "step": 1170
+    },
+    {
+      "epoch": 0.10444444444444445,
+      "grad_norm": 2.3819758892059326,
+      "learning_rate": 2.008376997146705e-05,
+      "loss": 0.7735,
+      "step": 1175
+    },
+    {
+      "epoch": 0.10488888888888889,
+      "grad_norm": 2.148484945297241,
+      "learning_rate": 1.9648542342623277e-05,
+      "loss": 0.7528,
+      "step": 1180
+    },
+    {
+      "epoch": 0.10533333333333333,
+      "grad_norm": 2.3765158653259277,
+      "learning_rate": 1.9216926233717085e-05,
+      "loss": 0.7266,
+      "step": 1185
+    },
+    {
+      "epoch": 0.10577777777777778,
+      "grad_norm": 2.245584487915039,
+      "learning_rate": 1.8788973002749105e-05,
+      "loss": 0.7631,
+      "step": 1190
+    },
+    {
+      "epoch": 0.10622222222222222,
+      "grad_norm": 2.7776741981506348,
+      "learning_rate": 1.83647335718742e-05,
+      "loss": 0.8013,
+      "step": 1195
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 2.3494765758514404,
+      "learning_rate": 1.7944258421342098e-05,
+      "loss": 0.7146,
+      "step": 1200
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "eval_loss": 0.8797385692596436,
+      "eval_runtime": 152.824,
+      "eval_samples_per_second": 32.717,
+      "eval_steps_per_second": 4.09,
+      "step": 1200
+    },
+    {
+      "epoch": 0.10711111111111112,
+      "grad_norm": 2.1948468685150146,
+      "learning_rate": 1.7527597583490822e-05,
+      "loss": 0.7383,
+      "step": 1205
+    },
+    {
+      "epoch": 0.10755555555555556,
+      "grad_norm": 2.3977882862091064,
+      "learning_rate": 1.7114800636793377e-05,
+      "loss": 0.7751,
+      "step": 1210
+    },
+    {
+      "epoch": 0.108,
+      "grad_norm": 2.380903482437134,
+      "learning_rate": 1.670591669995829e-05,
+      "loss": 0.7514,
+      "step": 1215
+    },
+    {
+      "epoch": 0.10844444444444444,
+      "grad_norm": 2.2962698936462402,
+      "learning_rate": 1.6300994426085103e-05,
+      "loss": 0.7014,
+      "step": 1220
+    },
+    {
+      "epoch": 0.10888888888888888,
+      "grad_norm": 2.4068901538848877,
+      "learning_rate": 1.5900081996875083e-05,
+      "loss": 0.7087,
+      "step": 1225
+    },
+    {
+      "epoch": 0.10933333333333334,
+      "grad_norm": 1.881907343864441,
+      "learning_rate": 1.5503227116898016e-05,
+      "loss": 0.7847,
+      "step": 1230
+    },
+    {
+      "epoch": 0.10977777777777778,
+      "grad_norm": 1.9586148262023926,
+      "learning_rate": 1.5110477007916001e-05,
+      "loss": 0.7493,
+      "step": 1235
+    },
+    {
+      "epoch": 0.11022222222222222,
+      "grad_norm": 2.418649911880493,
+      "learning_rate": 1.4721878403264345e-05,
+      "loss": 0.7384,
+      "step": 1240
+    },
+    {
+      "epoch": 0.11066666666666666,
+      "grad_norm": 1.8795591592788696,
+      "learning_rate": 1.4337477542290928e-05,
+      "loss": 0.7254,
+      "step": 1245
+    },
+    {
+      "epoch": 0.1111111111111111,
+      "grad_norm": 1.8943523168563843,
+      "learning_rate": 1.3957320164854059e-05,
+      "loss": 0.7512,
+      "step": 1250
+    },
+    {
+      "epoch": 0.11155555555555556,
+      "grad_norm": 1.794277548789978,
+      "learning_rate": 1.3581451505879994e-05,
+      "loss": 0.7525,
+      "step": 1255
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 2.044266700744629,
+      "learning_rate": 1.3209916289980334e-05,
+      "loss": 0.7377,
+      "step": 1260
+    },
+    {
+      "epoch": 0.11244444444444444,
+      "grad_norm": 2.0691747665405273,
+      "learning_rate": 1.2842758726130283e-05,
+      "loss": 0.7573,
+      "step": 1265
+    },
+    {
+      "epoch": 0.11288888888888889,
+      "grad_norm": 1.927995204925537,
+      "learning_rate": 1.2480022502408307e-05,
+      "loss": 0.7135,
+      "step": 1270
+    },
+    {
+      "epoch": 0.11333333333333333,
+      "grad_norm": 2.154827356338501,
+      "learning_rate": 1.2121750780797513e-05,
+      "loss": 0.7531,
+      "step": 1275
+    },
+    {
+      "epoch": 0.11377777777777778,
+      "grad_norm": 2.528263807296753,
+      "learning_rate": 1.1767986192049984e-05,
+      "loss": 0.7425,
+      "step": 1280
+    },
+    {
+      "epoch": 0.11422222222222222,
+      "grad_norm": 2.024723529815674,
+      "learning_rate": 1.1418770830614013e-05,
+      "loss": 0.7483,
+      "step": 1285
+    },
+    {
+      "epoch": 0.11466666666666667,
+      "grad_norm": 2.312997817993164,
+      "learning_rate": 1.1074146249625333e-05,
+      "loss": 0.7718,
+      "step": 1290
+    },
+    {
+      "epoch": 0.11511111111111111,
+      "grad_norm": 2.0183262825012207,
+      "learning_rate": 1.0734153455962765e-05,
+      "loss": 0.7252,
+      "step": 1295
+    },
+    {
+      "epoch": 0.11555555555555555,
+      "grad_norm": 1.9232224225997925,
+      "learning_rate": 1.0398832905368694e-05,
+      "loss": 0.7424,
+      "step": 1300
+    },
+    {
+      "epoch": 0.11555555555555555,
+      "eval_loss": 0.8784002065658569,
+      "eval_runtime": 150.2141,
+      "eval_samples_per_second": 33.286,
+      "eval_steps_per_second": 4.161,
+      "step": 1300
+    },
+    {
+      "epoch": 0.116,
+      "grad_norm": 2.266510486602783,
+      "learning_rate": 1.006822449763537e-05,
+      "loss": 0.7199,
+      "step": 1305
+    },
+    {
+      "epoch": 0.11644444444444445,
+      "grad_norm": 2.1905033588409424,
+      "learning_rate": 9.742367571857091e-06,
+      "loss": 0.6834,
+      "step": 1310
+    },
+    {
+      "epoch": 0.11688888888888889,
+      "grad_norm": 2.1627376079559326,
+      "learning_rate": 9.421300901749386e-06,
+      "loss": 0.7759,
+      "step": 1315
+    },
+    {
+      "epoch": 0.11733333333333333,
+      "grad_norm": 1.9716633558273315,
+      "learning_rate": 9.105062691035233e-06,
+      "loss": 0.7466,
+      "step": 1320
+    },
+    {
+      "epoch": 0.11777777777777777,
+      "grad_norm": 2.4646966457366943,
+      "learning_rate": 8.793690568899216e-06,
+      "loss": 0.7428,
+      "step": 1325
+    },
+    {
+      "epoch": 0.11822222222222223,
+      "grad_norm": 1.8310861587524414,
+      "learning_rate": 8.487221585510074e-06,
+      "loss": 0.7042,
+      "step": 1330
+    },
+    {
+      "epoch": 0.11866666666666667,
+      "grad_norm": 2.5189576148986816,
+      "learning_rate": 8.185692207612022e-06,
+      "loss": 0.7686,
+      "step": 1335
+    },
+    {
+      "epoch": 0.11911111111111111,
+      "grad_norm": 2.2453572750091553,
+      "learning_rate": 7.889138314185678e-06,
+      "loss": 0.7635,
+      "step": 1340
+    },
+    {
+      "epoch": 0.11955555555555555,
+      "grad_norm": 2.0272557735443115,
+      "learning_rate": 7.597595192178702e-06,
+      "loss": 0.7305,
+      "step": 1345
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.907797932624817,
+      "learning_rate": 7.311097532307121e-06,
+      "loss": 0.7326,
+      "step": 1350
+    },
+    {
+      "epoch": 0.12044444444444445,
+      "grad_norm": 2.3352434635162354,
+      "learning_rate": 7.029679424927365e-06,
+      "loss": 0.7466,
+      "step": 1355
+    },
+    {
+      "epoch": 0.12088888888888889,
+      "grad_norm": 2.1825642585754395,
+      "learning_rate": 6.753374355979975e-06,
+      "loss": 0.7346,
+      "step": 1360
+    },
+    {
+      "epoch": 0.12133333333333333,
+      "grad_norm": 2.175647020339966,
+      "learning_rate": 6.482215203005015e-06,
+      "loss": 0.716,
+      "step": 1365
+    },
+    {
+      "epoch": 0.12177777777777778,
+      "grad_norm": 2.214008092880249,
+      "learning_rate": 6.216234231230012e-06,
+      "loss": 0.7253,
+      "step": 1370
+    },
+    {
+      "epoch": 0.12222222222222222,
+      "grad_norm": 1.9471856355667114,
+      "learning_rate": 5.955463089730723e-06,
+      "loss": 0.7655,
+      "step": 1375
+    },
+    {
+      "epoch": 0.12266666666666666,
+      "grad_norm": 1.9830577373504639,
+      "learning_rate": 5.699932807665198e-06,
+      "loss": 0.7597,
+      "step": 1380
+    },
+    {
+      "epoch": 0.12311111111111112,
+      "grad_norm": 2.7330000400543213,
+      "learning_rate": 5.449673790581611e-06,
+      "loss": 0.7379,
+      "step": 1385
+    },
+    {
+      "epoch": 0.12355555555555556,
+      "grad_norm": 1.8489813804626465,
+      "learning_rate": 5.204715816800343e-06,
+      "loss": 0.7257,
+      "step": 1390
+    },
+    {
+      "epoch": 0.124,
+      "grad_norm": 1.673449158668518,
+      "learning_rate": 4.965088033870608e-06,
+      "loss": 0.7137,
+      "step": 1395
+    },
+    {
+      "epoch": 0.12444444444444444,
+      "grad_norm": 1.8715451955795288,
+      "learning_rate": 4.730818955102234e-06,
+      "loss": 0.7182,
+      "step": 1400
+    },
+    {
+      "epoch": 0.12444444444444444,
+      "eval_loss": 0.8741394281387329,
+      "eval_runtime": 146.2188,
+      "eval_samples_per_second": 34.195,
+      "eval_steps_per_second": 4.274,
+      "step": 1400
+    },
+    {
+      "epoch": 0.12488888888888888,
+      "grad_norm": 2.205672264099121,
+      "learning_rate": 4.501936456172845e-06,
+      "loss": 0.677,
+      "step": 1405
+    },
+    {
+      "epoch": 0.12533333333333332,
+      "grad_norm": 1.7708081007003784,
+      "learning_rate": 4.278467771810896e-06,
+      "loss": 0.7472,
+      "step": 1410
+    },
+    {
+      "epoch": 0.12577777777777777,
+      "grad_norm": 1.9440569877624512,
+      "learning_rate": 4.06043949255509e-06,
+      "loss": 0.7535,
+      "step": 1415
+    },
+    {
+      "epoch": 0.12622222222222224,
+      "grad_norm": 2.4499361515045166,
+      "learning_rate": 3.847877561590296e-06,
+      "loss": 0.7376,
+      "step": 1420
+    },
+    {
+      "epoch": 0.12666666666666668,
+      "grad_norm": 1.9922362565994263,
+      "learning_rate": 3.6408072716606346e-06,
+      "loss": 0.7311,
+      "step": 1425
+    },
+    {
+      "epoch": 0.12711111111111112,
+      "grad_norm": 2.0958635807037354,
+      "learning_rate": 3.4392532620598216e-06,
+      "loss": 0.7467,
+      "step": 1430
+    },
+    {
+      "epoch": 0.12755555555555556,
+      "grad_norm": 2.4406979084014893,
+      "learning_rate": 3.24323951569942e-06,
+      "loss": 0.6949,
+      "step": 1435
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 2.1745223999023438,
+      "learning_rate": 3.052789356255037e-06,
+      "loss": 0.7868,
+      "step": 1440
+    },
+    {
+      "epoch": 0.12844444444444444,
+      "grad_norm": 1.7190449237823486,
+      "learning_rate": 2.8679254453910785e-06,
+      "loss": 0.7544,
+      "step": 1445
+    },
+    {
+      "epoch": 0.1288888888888889,
+      "grad_norm": 2.2487220764160156,
+      "learning_rate": 2.688669780064268e-06,
+      "loss": 0.795,
+      "step": 1450
+    },
+    {
+      "epoch": 0.12933333333333333,
+      "grad_norm": 1.7264000177383423,
+      "learning_rate": 2.515043689906149e-06,
+      "loss": 0.7619,
+      "step": 1455
+    },
+    {
+      "epoch": 0.12977777777777777,
+      "grad_norm": 2.1828534603118896,
+      "learning_rate": 2.3470678346851518e-06,
+      "loss": 0.7232,
+      "step": 1460
+    },
+    {
+      "epoch": 0.1302222222222222,
+      "grad_norm": 2.1059188842773438,
+      "learning_rate": 2.1847622018482283e-06,
+      "loss": 0.7275,
+      "step": 1465
+    },
+    {
+      "epoch": 0.13066666666666665,
+      "grad_norm": 2.2545006275177,
+      "learning_rate": 2.0281461041425807e-06,
+      "loss": 0.7381,
+      "step": 1470
+    },
+    {
+      "epoch": 0.13111111111111112,
+      "grad_norm": 1.883216142654419,
+      "learning_rate": 1.8772381773176417e-06,
+      "loss": 0.713,
+      "step": 1475
+    },
+    {
+      "epoch": 0.13155555555555556,
+      "grad_norm": 1.9266149997711182,
+      "learning_rate": 1.7320563779075593e-06,
+      "loss": 0.7198,
+      "step": 1480
+    },
+    {
+      "epoch": 0.132,
+      "grad_norm": 2.2115542888641357,
+      "learning_rate": 1.5926179810946184e-06,
+      "loss": 0.7582,
+      "step": 1485
+    },
+    {
+      "epoch": 0.13244444444444445,
+      "grad_norm": 2.4320266246795654,
+      "learning_rate": 1.4589395786535953e-06,
+      "loss": 0.7429,
+      "step": 1490
+    },
+    {
+      "epoch": 0.1328888888888889,
+      "grad_norm": 2.242762804031372,
+      "learning_rate": 1.331037076977576e-06,
+      "loss": 0.7461,
+      "step": 1495
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "grad_norm": 1.9152480363845825,
+      "learning_rate": 1.2089256951851924e-06,
+      "loss": 0.7538,
+      "step": 1500
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "eval_loss": 0.8730303645133972,
+      "eval_runtime": 146.7239,
+      "eval_samples_per_second": 34.078,
+      "eval_steps_per_second": 4.26,
+      "step": 1500
+    },
+    {
+      "epoch": 0.13377777777777777,
+      "grad_norm": 2.0257728099823,
+      "learning_rate": 1.0926199633097157e-06,
+      "loss": 0.7371,
+      "step": 1505
+    },
+    {
+      "epoch": 0.13422222222222221,
+      "grad_norm": 2.0611376762390137,
+      "learning_rate": 9.821337205701665e-07,
+      "loss": 0.7441,
+      "step": 1510
+    },
+    {
+      "epoch": 0.13466666666666666,
+      "grad_norm": 2.425452470779419,
+      "learning_rate": 8.774801137245159e-07,
+      "loss": 0.7061,
+      "step": 1515
+    },
+    {
+      "epoch": 0.1351111111111111,
+      "grad_norm": 2.0856432914733887,
+      "learning_rate": 7.786715955054203e-07,
+      "loss": 0.7179,
+      "step": 1520
+    },
+    {
+      "epoch": 0.13555555555555557,
+      "grad_norm": 2.4328227043151855,
+      "learning_rate": 6.857199231384282e-07,
+      "loss": 0.7216,
+      "step": 1525
+    },
+    {
+      "epoch": 0.136,
+      "grad_norm": 1.9803858995437622,
+      "learning_rate": 5.986361569430165e-07,
+      "loss": 0.7653,
+      "step": 1530
+    },
+    {
+      "epoch": 0.13644444444444445,
+      "grad_norm": 1.987241268157959,
+      "learning_rate": 5.174306590164879e-07,
+      "loss": 0.7252,
+      "step": 1535
+    },
+    {
+      "epoch": 0.1368888888888889,
+      "grad_norm": 2.019005537033081,
+      "learning_rate": 4.4211309200102303e-07,
+      "loss": 0.7449,
+      "step": 1540
+    },
+    {
+      "epoch": 0.13733333333333334,
+      "grad_norm": 1.9359219074249268,
+      "learning_rate": 3.7269241793390085e-07,
+      "loss": 0.7122,
+      "step": 1545
+    },
+    {
+      "epoch": 0.13777777777777778,
+      "grad_norm": 2.6825265884399414,
+      "learning_rate": 3.09176897181096e-07,
+      "loss": 0.7412,
+      "step": 1550
+    },
+    {
+      "epoch": 0.13822222222222222,
+      "grad_norm": 2.0021426677703857,
+      "learning_rate": 2.515740874544148e-07,
+      "loss": 0.7252,
+      "step": 1555
+    },
+    {
+      "epoch": 0.13866666666666666,
+      "grad_norm": 1.840452790260315,
+      "learning_rate": 1.9989084291216487e-07,
+      "loss": 0.7414,
+      "step": 1560
+    },
+    {
+      "epoch": 0.1391111111111111,
+      "grad_norm": 1.8142333030700684,
+      "learning_rate": 1.5413331334360182e-07,
+      "loss": 0.7031,
+      "step": 1565
+    },
+    {
+      "epoch": 0.13955555555555554,
+      "grad_norm": 2.1795592308044434,
+      "learning_rate": 1.1430694343715353e-07,
+      "loss": 0.7334,
+      "step": 1570
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 2.239292621612549,
+      "learning_rate": 8.041647213256064e-08,
+      "loss": 0.7233,
+      "step": 1575
+    },
+    {
+      "epoch": 0.14044444444444446,
+      "grad_norm": 2.0367276668548584,
+      "learning_rate": 5.246593205699424e-08,
+      "loss": 0.7542,
+      "step": 1580
+    },
+    {
+      "epoch": 0.1408888888888889,
+      "grad_norm": 2.47322678565979,
+      "learning_rate": 3.04586490452119e-08,
+      "loss": 0.7508,
+      "step": 1585
+    },
+    {
+      "epoch": 0.14133333333333334,
+      "grad_norm": 2.6402623653411865,
+      "learning_rate": 1.4397241743813184e-08,
+      "loss": 0.7378,
+      "step": 1590
+    },
+    {
+      "epoch": 0.14177777777777778,
+      "grad_norm": 2.3272688388824463,
+      "learning_rate": 4.2836212996499865e-09,
+      "loss": 0.7607,
+      "step": 1595
+    },
+    {
+      "epoch": 0.14222222222222222,
+      "grad_norm": 2.066089630126953,
+      "learning_rate": 1.189911324084303e-10,
+      "loss": 0.7017,
+      "step": 1600
+    },
+    {
+      "epoch": 0.14222222222222222,
+      "eval_loss": 0.8723308444023132,
+      "eval_runtime": 146.0052,
+      "eval_samples_per_second": 34.245,
+      "eval_steps_per_second": 4.281,
+      "step": 1600
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 1600,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 347760947673600.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f301dabc963ac8802177bfd738213a0cc9f22b48633c155ad395ae77124e6c7
+size 5368