Dontbeafed69 committed on
Commit
4810488
·
verified ·
1 Parent(s): d4a6c49

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,198 +1,207 @@
1
  ---
2
  base_model: google/gemma-3-270m
3
  library_name: peft
4
- license: mit
5
- tags:
6
- - chess
7
- - lora
8
- - mixture-of-experts
9
- - mps
10
- - apple-silicon
11
- - gemma
12
- - uci
13
- - chess-engine
14
- datasets:
15
- - lukifer23/gemmafischer-chess-training
16
- language:
17
- - en
18
  pipeline_tag: text-generation
19
  ---
20
 
21
- # GemmaFischer UCI Expert LoRA
22
 
23
- LoRA adapter for chess move generation in UCI format, fine-tuned from Google's Gemma-3 270M base model. This is the **UCI Expert** from the GemmaFischer Mixture of Experts chess system, optimized for Apple Silicon with MPS acceleration.
24
 
25
- ## Model Description
26
 
27
- This adapter specializes in generating legal chess moves in UCI (Universal Chess Interface) format. It's part of a 3-expert system including:
28
- - **UCI Expert** (this model): Fast move generation in UCI format
29
- - **Tutor Expert**: Detailed chess explanations and analysis
30
- - **Director Expert**: Strategic reasoning and Q&A
31
 
32
- ## Training Details
 
33
 
34
- ### Base Model
35
- - **Model**: google/gemma-3-270m
36
- - **Architecture**: Gemma-3 270M parameters
37
 
38
- ### LoRA Configuration
39
- - **Rank (r)**: 16
40
- - **Alpha**: 32
41
- - **Dropout**: 0.05
42
- - **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`
43
- - **Task Type**: Causal Language Modeling
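These settings imply the usual LoRA scaling factor of alpha/r = 2. A minimal sketch of that arithmetic, with an illustrative per-module parameter count (the 640-wide projection is an assumed example dimension, not a value taken from this model):

```python
# LoRA bookkeeping for the configuration above: r=16, alpha=32.
r, lora_alpha = 16, 32

# Scaling applied to the low-rank update BA: alpha / r
scaling = lora_alpha / r  # 2.0

def lora_params(d_in, d_out, rank):
    """Trainable parameters one LoRA'd linear layer adds: A (d_in x rank) + B (rank x d_out)."""
    return rank * (d_in + d_out)

# Example: a hypothetical 640x640 attention projection (illustrative dimensions only)
per_module = lora_params(640, 640, r)
print(scaling, per_module)  # 2.0 20480
```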
44
 
45
  ### Training Data
46
- - **Dataset Size**: 50,000 chess positions
47
- - **Validation**: 100% Stockfish-verified legal moves
48
- - **Quality Score**: 0.8
49
- - **Format**: Standardized JSONL with metadata
50
-
51
- ### Training Metrics
52
- - **Total Steps**: 1,600
53
- - **Best Eval Loss**: 0.8723 (at step 1600)
54
- - **Final Training Loss**: 0.7017
55
- - **Training Platform**: Apple M3 Pro with MPS acceleration
56
- - **Training Speed**: ~2-3 steps/second
57
- - **Batch Size**: 1 with gradient accumulation
58
-
59
- ### Hardware & Optimization
60
- - **Platform**: Mac-only (M3 Pro)
61
- - **Acceleration**: MPS (Metal Performance Shaders)
62
- - **Memory Optimization**: Gradient checkpointing enabled
63
- - **Peak Memory**: ~3-5GB
64
-
65
- ## Usage
66
-
67
- ### Installation
68
- ```bash
69
- pip install transformers peft torch
70
- ```
71
-
72
- ### Loading the Model
73
- ```python
74
- from transformers import AutoModelForCausalLM, AutoTokenizer
75
- from peft import PeftModel
76
-
77
- # Load base model
78
- base_model = AutoModelForCausalLM.from_pretrained(
79
- "google/gemma-3-270m",
80
- device_map="mps", # For Apple Silicon
81
- torch_dtype="auto"
82
- )
83
-
84
- # Load LoRA adapter
85
- model = PeftModel.from_pretrained(
86
- base_model,
87
- "lukifer23/gemmafischer-uci-lora"
88
- )
89
-
90
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")
91
- ```
92
-
93
- ### Generating UCI Moves
94
- ```python
95
- # Format: FEN position -> UCI move
96
- fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
97
- prompt = f"FEN: {fen}\nGenerate the best move in UCI format only:"
98
-
99
- inputs = tokenizer(prompt, return_tensors="pt").to("mps")
100
- outputs = model.generate(
101
-     **inputs,
102
-     max_new_tokens=5,
103
-     do_sample=False  # greedy decoding gives deterministic UCI output
104
- )
105
-
106
- # Decode only the newly generated tokens, not the prompt
107
- move = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
108
- print(move)  # e.g., "e2e4"
109
- ```
110
-
111
- ### Integration with Chess Software
112
- ```python
113
- import chess
114
- import re
115
-
116
- def get_uci_move(fen_position):
117
-     """Generate a UCI move for a given FEN position."""
118
-     prompt = f"FEN: {fen_position}\nGenerate the best move in UCI format only:"
119
-     inputs = tokenizer(prompt, return_tensors="pt").to("mps")
120
-     outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
121
-     # Decode only the new tokens so the FEN in the prompt cannot match the move regex
122
-     move_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
123
-
124
-     uci_match = re.search(r'[a-h][1-8][a-h][1-8][qrbn]?', move_text)  # e2e4, or e7e8q for promotion
125
-     return uci_match.group(0) if uci_match else None
126
-
127
- # Example usage
128
- board = chess.Board()
129
- uci_move = get_uci_move(board.fen())
130
- if uci_move:
131
- move = chess.Move.from_uci(uci_move)
132
- board.push(move)
133
- ```
134
-
135
- ## Performance
136
-
137
- ### Capabilities
138
- - **Move Legality**: 100% legal move generation (Stockfish validated)
139
- - **UCI Format**: Correct UCI notation (e.g., `e2e4`, `e7e8q`)
140
- - **Inference Speed**: ~0.4-0.5s per move on M3 Pro
141
- - **Special Moves**: Supports castling, en passant, promotions
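The move-extraction step described above can be exercised with nothing but the standard library; a small self-contained sketch (`extract_uci` is an illustrative helper mirroring the regex from the usage example, not part of the released code):

```python
import re

# Coordinate moves like e2e4, with an optional promotion piece (e7e8q)
UCI_RE = re.compile(r'\b([a-h][1-8][a-h][1-8][qrbn]?)\b')

def extract_uci(generated_text):
    """Return the first UCI-shaped token in the model's output, or None."""
    match = UCI_RE.search(generated_text)
    return match.group(1) if match else None

print(extract_uci("best move: e2e4"))  # e2e4
print(extract_uci("e7e8q wins"))       # e7e8q
print(extract_uci("no move here"))     # None
```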
142
-
143
- ### Limitations
144
- - Optimized for Apple Silicon MPS only
145
- - Not a strong chess engine (270M parameters)
146
- - Best used as part of MoE system with other experts
147
- - Requires base model access (Google Gemma-3)
148
-
149
- ## System Requirements
150
-
151
- - **Hardware**: Mac with Apple Silicon (M1/M2/M3/M4)
152
- - **RAM**: 8GB minimum, 16GB recommended
153
- - **macOS**: 12.0+ (for MPS support)
154
- - **Python**: 3.10+
155
-
156
- ## Related Models & Resources
157
-
158
- ### GemmaFischer Collection
159
- - **Tutor Expert**: [lukifer23/gemmafischer-tutor-lora](https://huggingface.co/lukifer23/gemmafischer-tutor-lora) (coming soon)
160
- - **Director Expert**: [lukifer23/gemmafischer-director-lora](https://huggingface.co/lukifer23/gemmafischer-director-lora) (coming soon)
161
- - **Training Dataset**: [lukifer23/gemmafischer-chess-training](https://huggingface.co/datasets/lukifer23/gemmafischer-chess-training) (coming soon)
162
-
163
- ### Repository
164
- - **GitHub**: [github.com/lukifer23/GemmaFischer](https://github.com/lukifer23/GemmaFischer)
165
- - **Documentation**: Full training guides, evaluation tools, and MoE system
166
- - **Web Interface**: Interactive chess board with expert switching
167
-
168
- ## Training Loss Curve
169
-
170
- The model was trained for 1,600 steps with evaluation every 100 steps:
171
- - Initial loss: 4.59 (step 1)
172
- - Best eval loss: 0.872 (step 1600)
173
- - Final training loss: 0.702 (step 1600)
174
-
175
- The loss converged steadily under a cosine learning-rate schedule decaying from a peak of 1e-4 to near zero.
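Read off the log history, the learning rate climbs linearly to its ~1e-4 peak by roughly step 160 and then follows the cosine decay; a minimal sketch of that schedule (the warmup length is an assumption inferred from the logged values):

```python
import math

PEAK_LR = 1e-4
TOTAL_STEPS = 1600
WARMUP_STEPS = 160  # assumed: the logged LR reaches ~1e-4 around step 160

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(160), lr_at(1600))
```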
176
-
177
- ## Citation
178
-
179
- ```bibtex
180
- @misc{gemmafischer2025,
181
- author = {lukifer23},
182
- title = {GemmaFischer: Chess Engine and Tutor with Mixture of Experts},
183
- year = {2025},
184
- publisher = {HuggingFace},
185
- howpublished = {\url{https://huggingface.co/lukifer23/gemmafischer-uci-lora}}
186
- }
187
- ```
188
-
189
- ## License
190
-
191
- MIT License - See [LICENSE](https://github.com/lukifer23/GemmaFischer/blob/main/LICENSE) file for details.
192
-
193
- ## Acknowledgments
194
-
195
- - **Base Model**: Google's Gemma-3 270M
196
- - **Training Platform**: Apple Silicon (M3 Pro) with MPS
197
- - **Validation**: Stockfish chess engine
198
- - **Framework**: HuggingFace Transformers + PEFT
 
1
  ---
2
  base_model: google/gemma-3-270m
3
  library_name: peft
 
4
  pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:google/gemma-3-270m
7
+ - lora
8
+ - transformers
9
  ---
10
 
11
+ # Model Card for Model ID
12
 
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
 
 
15
 
16
 
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
 
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
 
66
 
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
 
83
  ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.17.1
 
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "/Users/admin/Downloads/VSCode/GemmaFischer/models/google-gemma-3-270m",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -13,7 +13,7 @@
13
  "layers_pattern": null,
14
  "layers_to_transform": null,
15
  "loftq_config": {},
16
- "lora_alpha": 32,
17
  "lora_bias": false,
18
  "lora_dropout": 0.05,
19
  "megatron_config": null,
@@ -21,14 +21,17 @@
21
  "modules_to_save": null,
22
  "peft_type": "LORA",
23
  "qalora_group_size": 16,
24
- "r": 16,
25
  "rank_pattern": {},
26
  "revision": null,
27
  "target_modules": [
28
  "k_proj",
29
  "o_proj",
30
- "q_proj",
31
- "v_proj"
32
  ],
33
  "target_parameters": null,
34
  "task_type": "CAUSAL_LM",
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "google/gemma-3-270m",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
13
  "layers_pattern": null,
14
  "layers_to_transform": null,
15
  "loftq_config": {},
16
+ "lora_alpha": 64,
17
  "lora_bias": false,
18
  "lora_dropout": 0.05,
19
  "megatron_config": null,
 
21
  "modules_to_save": null,
22
  "peft_type": "LORA",
23
  "qalora_group_size": 16,
24
+ "r": 32,
25
  "rank_pattern": {},
26
  "revision": null,
27
  "target_modules": [
28
+ "gate_proj",
29
+ "down_proj",
30
+ "up_proj",
31
  "k_proj",
32
  "o_proj",
33
+ "v_proj",
34
+ "q_proj"
35
  ],
36
  "target_parameters": null,
37
  "task_type": "CAUSAL_LM",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:74d6d9edde51340678ce1ee14ae112b077fc53078404605c4d903803c3f67bbf
3
- size 5917192
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:938b914519292237dea78823bff38d42b726382b54a5f3cd464add97d8d2bd25
3
+ size 30409120
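The adapter file grew from 5,917,192 to 30,409,120 bytes, consistent with the config change above (rank doubled to 32 and LoRA extended from four attention projections to seven modules by adding the MLP's gate/up/down projections); a rough sanity check, assuming 2-byte (bf16/fp16) weights and ignoring the small safetensors header:

```python
# File sizes from the adapter_model.safetensors diff (bytes)
OLD_BYTES, NEW_BYTES = 5_917_192, 30_409_120

# Approximate parameter counts, assuming 2 bytes per weight (header ignored)
old_params = OLD_BYTES // 2
new_params = NEW_BYTES // 2
print(old_params, new_params, round(new_params / old_params, 2))
```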
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4ee04aa80d94558c7cfc322b976fbbfaa7d7d2991535e15290a0c01d094b5cf2
3
- size 2156890246
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd5756df66c54278fe70b29b27a913fa01a64a7254a5221338f09897e0ce9588
3
+ size 2205934555
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a416021fcc136006bfe4651385bb006441ee5a161cc6b38aff634835fe44cadc
3
- size 13990
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66f70f09d3cb910592b4d2344caae1568a9b5043377429849d732244fbeec9cb
3
+ size 14391
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:933821bed50a92dd2dc11b2ebd21a8303e761867bc574b4556159143c11330c7
3
- size 1064
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d49cd60c3246e9d86bd41348cd643f8e1cabffcb1dda38d57f3cf7a26c4f60d4
3
+ size 1465
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ca2f60fd56eabb86ada6d0ef7c30d1ce71e1ed22af2d19e5238a9f0a5cdfa23c
3
- size 33384666
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4201e7b539fef153e1fe3058db39e600717b3323fee690d37e92fa52fb2b5af2
3
+ size 33384667
trainer_state.json CHANGED
@@ -1,2395 +1,1179 @@
1
  {
2
- "best_global_step": 1600,
3
- "best_metric": 0.8723308444023132,
4
- "best_model_checkpoint": "checkpoints/lora_uci/checkpoint-1600",
5
- "epoch": 0.14222222222222222,
6
- "eval_steps": 100,
7
- "global_step": 1600,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
- "epoch": 8.888888888888889e-05,
14
- "grad_norm": 34.336647033691406,
15
  "learning_rate": 0.0,
16
- "loss": 4.5963,
17
  "step": 1
18
  },
19
  {
20
- "epoch": 0.00044444444444444447,
21
- "grad_norm": 32.09209060668945,
22
- "learning_rate": 2.5e-06,
23
- "loss": 4.4684,
24
  "step": 5
25
  },
26
  {
27
- "epoch": 0.0008888888888888889,
28
- "grad_norm": 31.128032684326172,
29
- "learning_rate": 5.625e-06,
30
- "loss": 3.7103,
31
  "step": 10
32
  },
33
  {
34
- "epoch": 0.0013333333333333333,
35
- "grad_norm": 22.065799713134766,
36
- "learning_rate": 8.75e-06,
37
- "loss": 2.5875,
38
  "step": 15
39
  },
40
  {
41
- "epoch": 0.0017777777777777779,
42
- "grad_norm": 14.675823211669922,
43
- "learning_rate": 1.1875e-05,
44
- "loss": 1.7574,
45
  "step": 20
46
  },
47
  {
48
- "epoch": 0.0022222222222222222,
49
- "grad_norm": 15.899088859558105,
50
- "learning_rate": 1.5e-05,
51
- "loss": 1.426,
52
  "step": 25
53
  },
54
  {
55
- "epoch": 0.0026666666666666666,
56
- "grad_norm": 14.202256202697754,
57
- "learning_rate": 1.8125e-05,
58
- "loss": 1.2788,
59
  "step": 30
60
  },
61
  {
62
- "epoch": 0.003111111111111111,
63
- "grad_norm": 9.793079376220703,
64
- "learning_rate": 2.125e-05,
65
- "loss": 1.2216,
66
  "step": 35
67
  },
68
  {
69
- "epoch": 0.0035555555555555557,
70
- "grad_norm": 19.244722366333008,
71
- "learning_rate": 2.4375e-05,
72
- "loss": 1.2066,
73
  "step": 40
74
  },
75
  {
76
- "epoch": 0.004,
77
- "grad_norm": 8.825323104858398,
78
- "learning_rate": 2.7500000000000004e-05,
79
- "loss": 1.0881,
80
  "step": 45
81
  },
82
  {
83
- "epoch": 0.0044444444444444444,
84
- "grad_norm": 10.595223426818848,
85
- "learning_rate": 3.0625000000000006e-05,
86
- "loss": 1.0866,
87
  "step": 50
88
  },
89
  {
90
- "epoch": 0.004888888888888889,
91
- "grad_norm": 12.221918106079102,
92
- "learning_rate": 3.375000000000001e-05,
93
- "loss": 1.1606,
94
  "step": 55
95
  },
96
  {
97
- "epoch": 0.005333333333333333,
98
- "grad_norm": 11.6161527633667,
99
- "learning_rate": 3.6875e-05,
100
- "loss": 1.1047,
101
  "step": 60
102
  },
103
  {
104
- "epoch": 0.0057777777777777775,
105
- "grad_norm": 8.067273139953613,
106
- "learning_rate": 4e-05,
107
- "loss": 1.0627,
108
  "step": 65
109
  },
110
  {
111
- "epoch": 0.006222222222222222,
112
- "grad_norm": 12.541388511657715,
113
- "learning_rate": 4.3125000000000005e-05,
114
- "loss": 1.0847,
115
  "step": 70
116
  },
117
  {
118
- "epoch": 0.006666666666666667,
119
- "grad_norm": 11.718969345092773,
120
- "learning_rate": 4.6250000000000006e-05,
121
- "loss": 1.0382,
122
  "step": 75
123
  },
124
  {
125
- "epoch": 0.0071111111111111115,
126
- "grad_norm": 9.308419227600098,
127
- "learning_rate": 4.937500000000001e-05,
128
- "loss": 1.0483,
129
  "step": 80
130
  },
131
  {
132
- "epoch": 0.007555555555555556,
133
- "grad_norm": 6.772762298583984,
134
- "learning_rate": 5.25e-05,
135
- "loss": 1.009,
136
  "step": 85
137
  },
138
  {
139
- "epoch": 0.008,
140
- "grad_norm": 9.496241569519043,
141
- "learning_rate": 5.5625000000000004e-05,
142
- "loss": 0.9837,
143
  "step": 90
144
  },
145
  {
146
- "epoch": 0.008444444444444444,
147
- "grad_norm": 7.885592937469482,
148
- "learning_rate": 5.8750000000000005e-05,
149
- "loss": 1.0189,
150
  "step": 95
151
  },
152
  {
153
- "epoch": 0.008888888888888889,
154
- "grad_norm": 5.724958419799805,
155
- "learning_rate": 6.1875e-05,
156
- "loss": 1.0009,
157
- "step": 100
158
- },
159
- {
160
- "epoch": 0.008888888888888889,
161
- "eval_loss": 1.1365891695022583,
162
- "eval_runtime": 185.2718,
163
- "eval_samples_per_second": 26.987,
164
- "eval_steps_per_second": 3.373,
165
  "step": 100
166
  },
167
  {
168
- "epoch": 0.009333333333333334,
169
- "grad_norm": 7.011026859283447,
170
- "learning_rate": 6.500000000000001e-05,
171
- "loss": 1.0031,
172
  "step": 105
173
  },
174
  {
175
- "epoch": 0.009777777777777778,
176
- "grad_norm": 7.641518592834473,
177
- "learning_rate": 6.8125e-05,
178
- "loss": 0.997,
179
  "step": 110
180
  },
181
  {
182
- "epoch": 0.010222222222222223,
183
- "grad_norm": 9.401971817016602,
184
- "learning_rate": 7.125000000000001e-05,
185
- "loss": 1.0158,
186
  "step": 115
187
  },
188
  {
189
- "epoch": 0.010666666666666666,
190
- "grad_norm": 4.336047649383545,
191
- "learning_rate": 7.4375e-05,
192
- "loss": 0.9726,
193
  "step": 120
194
  },
195
  {
196
- "epoch": 0.011111111111111112,
197
- "grad_norm": 6.882427215576172,
198
- "learning_rate": 7.75e-05,
199
- "loss": 1.0175,
200
  "step": 125
201
  },
202
  {
203
- "epoch": 0.011555555555555555,
204
- "grad_norm": 5.442468643188477,
205
- "learning_rate": 8.062500000000001e-05,
206
- "loss": 0.936,
207
  "step": 130
208
  },
209
  {
210
- "epoch": 0.012,
211
- "grad_norm": 4.264267444610596,
212
- "learning_rate": 8.375e-05,
213
- "loss": 0.9527,
214
  "step": 135
215
  },
216
  {
217
- "epoch": 0.012444444444444444,
218
- "grad_norm": 5.994289398193359,
219
- "learning_rate": 8.687500000000001e-05,
220
- "loss": 0.9757,
221
  "step": 140
222
  },
223
  {
224
- "epoch": 0.012888888888888889,
225
- "grad_norm": 5.154539585113525,
226
- "learning_rate": 9e-05,
227
- "loss": 0.9634,
228
  "step": 145
229
  },
230
  {
231
- "epoch": 0.013333333333333334,
232
- "grad_norm": 5.39900541305542,
233
- "learning_rate": 9.3125e-05,
234
- "loss": 0.9563,
235
  "step": 150
236
  },
237
  {
238
- "epoch": 0.013777777777777778,
239
- "grad_norm": 5.613903522491455,
240
- "learning_rate": 9.625000000000001e-05,
241
- "loss": 0.9728,
242
  "step": 155
243
  },
244
  {
245
- "epoch": 0.014222222222222223,
246
- "grad_norm": 4.219268798828125,
247
- "learning_rate": 9.9375e-05,
248
- "loss": 0.9342,
249
  "step": 160
250
  },
251
  {
252
- "epoch": 0.014666666666666666,
253
- "grad_norm": 6.655375003814697,
254
- "learning_rate": 9.999809615320856e-05,
255
- "loss": 0.9196,
256
  "step": 165
257
  },
258
  {
259
- "epoch": 0.015111111111111112,
260
- "grad_norm": 4.512648582458496,
261
- "learning_rate": 9.999036202410325e-05,
262
- "loss": 1.0073,
263
  "step": 170
264
  },
265
  {
266
- "epoch": 0.015555555555555555,
267
- "grad_norm": 4.687705993652344,
268
- "learning_rate": 9.997667954183565e-05,
269
- "loss": 0.9762,
270
  "step": 175
271
  },
272
  {
273
- "epoch": 0.016,
274
- "grad_norm": 4.793412685394287,
275
- "learning_rate": 9.995705033448435e-05,
276
- "loss": 0.9198,
277
  "step": 180
278
  },
279
  {
280
- "epoch": 0.016444444444444446,
281
- "grad_norm": 5.910813808441162,
282
- "learning_rate": 9.99314767377287e-05,
283
- "loss": 0.9407,
284
  "step": 185
285
  },
286
  {
287
- "epoch": 0.016888888888888887,
288
- "grad_norm": 5.998133659362793,
289
- "learning_rate": 9.9899961794571e-05,
290
- "loss": 0.9476,
291
  "step": 190
292
  },
293
  {
294
- "epoch": 0.017333333333333333,
295
- "grad_norm": 4.035412311553955,
296
- "learning_rate": 9.986250925497429e-05,
297
- "loss": 0.9648,
298
  "step": 195
299
  },
300
  {
301
- "epoch": 0.017777777777777778,
302
- "grad_norm": 4.533802032470703,
303
- "learning_rate": 9.981912357541627e-05,
304
- "loss": 0.9095,
305
  "step": 200
306
  },
307
  {
308
- "epoch": 0.017777777777777778,
309
- "eval_loss": 1.080334186553955,
310
- "eval_runtime": 187.8444,
311
- "eval_samples_per_second": 26.618,
312
- "eval_steps_per_second": 3.327,
313
  "step": 200
314
  },
315
  {
316
- "epoch": 0.018222222222222223,
317
- "grad_norm": 4.582113265991211,
318
- "learning_rate": 9.976980991835894e-05,
319
- "loss": 0.9287,
320
  "step": 205
321
  },
322
  {
323
- "epoch": 0.018666666666666668,
324
- "grad_norm": 3.81330943107605,
325
- "learning_rate": 9.971457415163435e-05,
326
- "loss": 0.9538,
327
  "step": 210
328
  },
329
  {
330
- "epoch": 0.01911111111111111,
331
- "grad_norm": 4.3214430809021,
332
- "learning_rate": 9.965342284774632e-05,
333
- "loss": 0.9262,
334
  "step": 215
335
  },
336
  {
337
- "epoch": 0.019555555555555555,
338
- "grad_norm": 4.554825305938721,
339
- "learning_rate": 9.958636328308853e-05,
340
- "loss": 0.9172,
341
  "step": 220
342
  },
343
  {
344
- "epoch": 0.02,
345
- "grad_norm": 4.938535213470459,
346
- "learning_rate": 9.951340343707852e-05,
347
- "loss": 0.916,
348
  "step": 225
349
  },
350
  {
351
- "epoch": 0.020444444444444446,
352
- "grad_norm": 4.2901930809021,
353
- "learning_rate": 9.943455199120837e-05,
354
- "loss": 0.9322,
355
  "step": 230
356
  },
357
  {
358
- "epoch": 0.020888888888888887,
359
- "grad_norm": 4.4027228355407715,
360
- "learning_rate": 9.93498183280116e-05,
361
- "loss": 0.8971,
362
  "step": 235
363
  },
364
  {
365
- "epoch": 0.021333333333333333,
366
- "grad_norm": 4.342219829559326,
367
- "learning_rate": 9.925921252994676e-05,
368
- "loss": 0.9283,
369
  "step": 240
370
  },
371
  {
372
- "epoch": 0.021777777777777778,
373
- "grad_norm": 4.265533447265625,
374
- "learning_rate": 9.916274537819775e-05,
375
- "loss": 0.9491,
376
  "step": 245
377
  },
378
  {
379
- "epoch": 0.022222222222222223,
380
- "grad_norm": 3.2655222415924072,
381
- "learning_rate": 9.906042835139089e-05,
382
- "loss": 0.9067,
383
  "step": 250
384
  },
385
  {
386
- "epoch": 0.02266666666666667,
387
- "grad_norm": 5.214357852935791,
388
- "learning_rate": 9.89522736242292e-05,
389
- "loss": 0.9483,
390
  "step": 255
391
  },
392
  {
393
- "epoch": 0.02311111111111111,
394
- "grad_norm": 3.831639289855957,
395
- "learning_rate": 9.883829406604363e-05,
396
- "loss": 0.8565,
397
  "step": 260
398
  },
399
  {
400
- "epoch": 0.023555555555555555,
401
- "grad_norm": 3.9132847785949707,
402
- "learning_rate": 9.871850323926177e-05,
403
- "loss": 0.9348,
404
  "step": 265
405
  },
406
  {
407
- "epoch": 0.024,
408
- "grad_norm": 3.378690481185913,
409
- "learning_rate": 9.859291539779406e-05,
410
- "loss": 0.9033,
411
  "step": 270
412
  },
413
  {
414
- "epoch": 0.024444444444444446,
415
- "grad_norm": 4.511641025543213,
416
- "learning_rate": 9.846154548533773e-05,
417
- "loss": 0.908,
418
  "step": 275
419
  },
420
  {
421
- "epoch": 0.024888888888888887,
422
- "grad_norm": 2.6710896492004395,
423
- "learning_rate": 9.832440913359861e-05,
424
- "loss": 0.8716,
425
  "step": 280
426
  },
427
  {
428
- "epoch": 0.025333333333333333,
429
- "grad_norm": 3.3226819038391113,
430
- "learning_rate": 9.818152266043114e-05,
431
- "loss": 0.9229,
432
  "step": 285
433
  },
434
  {
435
- "epoch": 0.025777777777777778,
436
- "grad_norm": 2.8580386638641357,
437
- "learning_rate": 9.803290306789676e-05,
438
- "loss": 0.8817,
439
  "step": 290
440
  },
441
  {
442
- "epoch": 0.026222222222222223,
443
- "grad_norm": 3.8689441680908203,
444
- "learning_rate": 9.787856804024073e-05,
445
- "loss": 0.8654,
446
  "step": 295
447
  },
448
  {
449
- "epoch": 0.02666666666666667,
450
- "grad_norm": 5.108214855194092,
451
- "learning_rate": 9.771853594178791e-05,
452
- "loss": 0.9015,
453
- "step": 300
454
- },
455
- {
456
- "epoch": 0.02666666666666667,
457
- "eval_loss": 1.0304287672042847,
458
- "eval_runtime": 149.0721,
459
- "eval_samples_per_second": 33.541,
460
- "eval_steps_per_second": 4.193,
461
  "step": 300
462
  },
463
  {
464
- "epoch": 0.02711111111111111,
465
- "grad_norm": 3.7313003540039062,
466
- "learning_rate": 9.755282581475769e-05,
467
- "loss": 0.8804,
468
  "step": 305
469
  },
470
  {
471
- "epoch": 0.027555555555555555,
472
- "grad_norm": 2.914902687072754,
473
- "learning_rate": 9.738145737699799e-05,
474
- "loss": 0.8449,
475
  "step": 310
476
  },
477
  {
478
- "epoch": 0.028,
479
- "grad_norm": 3.4815356731414795,
480
- "learning_rate": 9.720445101963922e-05,
481
- "loss": 0.8962,
482
  "step": 315
483
  },
484
  {
485
- "epoch": 0.028444444444444446,
486
- "grad_norm": 3.9641096591949463,
487
- "learning_rate": 9.702182780466775e-05,
488
- "loss": 0.8539,
489
  "step": 320
490
  },
491
  {
492
- "epoch": 0.028888888888888888,
493
- "grad_norm": 3.4962000846862793,
494
- "learning_rate": 9.683360946241989e-05,
495
- "loss": 0.8779,
496
  "step": 325
497
  },
498
  {
499
- "epoch": 0.029333333333333333,
500
- "grad_norm": 3.8663835525512695,
501
- "learning_rate": 9.663981838899612e-05,
502
- "loss": 0.8943,
503
  "step": 330
504
  },
505
  {
506
- "epoch": 0.029777777777777778,
507
- "grad_norm": 3.476285934448242,
508
- "learning_rate": 9.644047764359622e-05,
509
- "loss": 0.8697,
510
  "step": 335
511
  },
512
  {
513
- "epoch": 0.030222222222222223,
514
- "grad_norm": 3.642954111099243,
515
- "learning_rate": 9.623561094577542e-05,
516
- "loss": 0.9362,
517
  "step": 340
518
  },
519
  {
520
- "epoch": 0.030666666666666665,
521
- "grad_norm": 3.1184070110321045,
522
- "learning_rate": 9.602524267262203e-05,
523
- "loss": 0.8999,
524
  "step": 345
525
  },
526
  {
527
- "epoch": 0.03111111111111111,
528
- "grad_norm": 3.0415987968444824,
529
- "learning_rate": 9.580939785585681e-05,
530
- "loss": 0.8613,
531
  "step": 350
532
  },
533
  {
534
- "epoch": 0.03155555555555556,
535
- "grad_norm": 2.9512388706207275,
536
- "learning_rate": 9.558810217885443e-05,
537
- "loss": 0.8499,
538
  "step": 355
539
  },
540
  {
541
- "epoch": 0.032,
542
- "grad_norm": 3.8605785369873047,
543
- "learning_rate": 9.536138197358745e-05,
544
- "loss": 0.8472,
545
  "step": 360
546
  },
547
  {
548
- "epoch": 0.03244444444444444,
549
- "grad_norm": 2.254096031188965,
550
- "learning_rate": 9.512926421749304e-05,
551
- "loss": 0.8302,
552
  "step": 365
553
  },
554
  {
555
- "epoch": 0.03288888888888889,
556
- "grad_norm": 2.6906585693359375,
557
- "learning_rate": 9.489177653026289e-05,
558
- "loss": 0.8939,
559
  "step": 370
560
  },
561
  {
562
- "epoch": 0.03333333333333333,
563
- "grad_norm": 2.434436559677124,
564
- "learning_rate": 9.464894717055686e-05,
565
- "loss": 0.9043,
566
  "step": 375
567
  },
568
  {
569
- "epoch": 0.033777777777777775,
570
- "grad_norm": 2.497917652130127,
571
- "learning_rate": 9.440080503264037e-05,
572
- "loss": 0.8737,
573
  "step": 380
574
  },
575
  {
576
- "epoch": 0.03422222222222222,
577
- "grad_norm": 4.014335632324219,
578
- "learning_rate": 9.414737964294636e-05,
579
- "loss": 0.8566,
580
  "step": 385
581
  },
582
  {
583
- "epoch": 0.034666666666666665,
584
- "grad_norm": 2.4953484535217285,
585
- "learning_rate": 9.388870115656184e-05,
586
- "loss": 0.8479,
587
  "step": 390
588
  },
589
  {
590
- "epoch": 0.035111111111111114,
591
- "grad_norm": 3.679469347000122,
592
- "learning_rate": 9.362480035363986e-05,
593
- "loss": 0.8702,
594
  "step": 395
595
  },
596
  {
597
- "epoch": 0.035555555555555556,
598
- "grad_norm": 3.756469249725342,
599
- "learning_rate": 9.335570863573686e-05,
600
- "loss": 0.8408,
601
  "step": 400
602
  },
603
  {
604
- "epoch": 0.035555555555555556,
605
- "eval_loss": 0.9743552803993225,
606
- "eval_runtime": 145.3606,
607
- "eval_samples_per_second": 34.397,
608
- "eval_steps_per_second": 4.3,
609
  "step": 400
610
  },
611
  {
612
- "epoch": 0.036,
613
- "grad_norm": 2.5384509563446045,
614
- "learning_rate": 9.308145802207629e-05,
615
- "loss": 0.7945,
616
  "step": 405
617
  },
618
  {
619
- "epoch": 0.036444444444444446,
620
- "grad_norm": 3.5420918464660645,
621
- "learning_rate": 9.280208114573859e-05,
622
- "loss": 0.8276,
623
  "step": 410
624
  },
625
  {
626
- "epoch": 0.03688888888888889,
627
- "grad_norm": 3.725031852722168,
628
- "learning_rate": 9.251761124977815e-05,
629
- "loss": 0.8379,
630
  "step": 415
631
  },
632
  {
633
- "epoch": 0.037333333333333336,
634
- "grad_norm": 3.2828762531280518,
635
- "learning_rate": 9.222808218326784e-05,
636
- "loss": 0.8136,
637
  "step": 420
638
  },
639
  {
640
- "epoch": 0.03777777777777778,
641
- "grad_norm": 3.535404682159424,
642
- "learning_rate": 9.193352839727121e-05,
643
- "loss": 0.8441,
644
  "step": 425
645
  },
646
  {
647
- "epoch": 0.03822222222222222,
648
- "grad_norm": 3.6818301677703857,
649
- "learning_rate": 9.163398494074314e-05,
650
- "loss": 0.824,
651
  "step": 430
652
  },
653
  {
654
- "epoch": 0.03866666666666667,
655
- "grad_norm": 3.2186279296875,
656
- "learning_rate": 9.132948745635944e-05,
657
- "loss": 0.867,
658
  "step": 435
659
  },
660
  {
661
- "epoch": 0.03911111111111111,
662
- "grad_norm": 3.6566619873046875,
663
- "learning_rate": 9.102007217627568e-05,
664
- "loss": 0.8889,
665
  "step": 440
666
  },
667
  {
668
- "epoch": 0.03955555555555555,
669
- "grad_norm": 2.5457677841186523,
670
- "learning_rate": 9.070577591781597e-05,
671
- "loss": 0.8509,
672
  "step": 445
673
  },
674
  {
675
- "epoch": 0.04,
676
- "grad_norm": 2.946967840194702,
677
- "learning_rate": 9.038663607909198e-05,
678
- "loss": 0.8356,
679
  "step": 450
680
  },
681
  {
682
- "epoch": 0.04044444444444444,
683
- "grad_norm": 3.6755688190460205,
684
- "learning_rate": 9.006269063455304e-05,
685
- "loss": 0.8134,
686
  "step": 455
687
  },
688
  {
689
- "epoch": 0.04088888888888889,
690
- "grad_norm": 3.2024929523468018,
691
- "learning_rate": 8.97339781304675e-05,
692
- "loss": 0.8292,
693
  "step": 460
694
  },
695
  {
696
- "epoch": 0.04133333333333333,
697
- "grad_norm": 3.4355175495147705,
698
- "learning_rate": 8.940053768033609e-05,
699
- "loss": 0.8325,
700
  "step": 465
701
  },
702
  {
703
- "epoch": 0.041777777777777775,
704
- "grad_norm": 3.882667303085327,
705
- "learning_rate": 8.906240896023794e-05,
706
- "loss": 0.8773,
707
  "step": 470
708
  },
709
  {
710
- "epoch": 0.042222222222222223,
711
- "grad_norm": 3.5231196880340576,
712
- "learning_rate": 8.871963220410928e-05,
713
- "loss": 0.8399,
714
  "step": 475
715
  },
716
  {
717
- "epoch": 0.042666666666666665,
718
- "grad_norm": 2.3692946434020996,
719
- "learning_rate": 8.837224819895626e-05,
720
- "loss": 0.8638,
721
  "step": 480
722
  },
723
  {
724
- "epoch": 0.043111111111111114,
725
- "grad_norm": 3.1206417083740234,
726
- "learning_rate": 8.802029828000156e-05,
727
- "loss": 0.8241,
728
  "step": 485
729
  },
730
  {
731
- "epoch": 0.043555555555555556,
732
- "grad_norm": 2.570483922958374,
733
- "learning_rate": 8.766382432576588e-05,
734
- "loss": 0.8265,
735
  "step": 490
736
  },
737
  {
738
- "epoch": 0.044,
739
- "grad_norm": 2.721163749694824,
740
- "learning_rate": 8.730286875308497e-05,
741
- "loss": 0.8191,
742
  "step": 495
743
  },
744
  {
745
- "epoch": 0.044444444444444446,
746
- "grad_norm": 2.883211851119995,
747
- "learning_rate": 8.693747451206232e-05,
748
- "loss": 0.8014,
749
- "step": 500
750
- },
751
- {
752
- "epoch": 0.044444444444444446,
753
- "eval_loss": 0.9622268676757812,
754
- "eval_runtime": 146.4776,
755
- "eval_samples_per_second": 34.135,
756
- "eval_steps_per_second": 4.267,
757
  "step": 500
758
  },
759
  {
760
- "epoch": 0.04488888888888889,
761
- "grad_norm": 2.902592897415161,
762
- "learning_rate": 8.656768508095853e-05,
763
- "loss": 0.8482,
764
  "step": 505
765
  },
766
  {
767
- "epoch": 0.04533333333333334,
768
- "grad_norm": 3.207852840423584,
769
- "learning_rate": 8.61935444610179e-05,
770
- "loss": 0.819,
771
  "step": 510
772
  },
773
  {
774
- "epoch": 0.04577777777777778,
775
- "grad_norm": 3.402653455734253,
776
- "learning_rate": 8.581509717123273e-05,
777
- "loss": 0.8495,
778
  "step": 515
779
  },
780
  {
781
- "epoch": 0.04622222222222222,
782
- "grad_norm": 2.159984827041626,
783
- "learning_rate": 8.543238824304584e-05,
784
- "loss": 0.8078,
785
  "step": 520
786
  },
787
  {
788
- "epoch": 0.04666666666666667,
789
- "grad_norm": 3.279927968978882,
790
- "learning_rate": 8.504546321499255e-05,
791
- "loss": 0.8831,
792
  "step": 525
793
  },
794
  {
795
- "epoch": 0.04711111111111111,
796
- "grad_norm": 3.268341064453125,
797
- "learning_rate": 8.46543681272818e-05,
798
- "loss": 0.8359,
799
  "step": 530
800
  },
801
  {
802
- "epoch": 0.04755555555555555,
803
- "grad_norm": 2.2996602058410645,
804
- "learning_rate": 8.425914951631795e-05,
805
- "loss": 0.8419,
806
  "step": 535
807
  },
808
  {
809
- "epoch": 0.048,
810
- "grad_norm": 3.5043487548828125,
811
- "learning_rate": 8.385985440916344e-05,
812
- "loss": 0.8337,
813
  "step": 540
814
  },
815
  {
816
- "epoch": 0.04844444444444444,
817
- "grad_norm": 2.0510051250457764,
818
- "learning_rate": 8.345653031794292e-05,
819
- "loss": 0.8375,
820
  "step": 545
821
  },
822
  {
823
- "epoch": 0.04888888888888889,
824
- "grad_norm": 2.8395752906799316,
825
- "learning_rate": 8.304922523418987e-05,
826
- "loss": 0.82,
827
  "step": 550
828
  },
829
  {
830
- "epoch": 0.04933333333333333,
831
- "grad_norm": 4.7278876304626465,
832
- "learning_rate": 8.263798762313612e-05,
833
- "loss": 0.8209,
834
  "step": 555
835
  },
836
  {
837
- "epoch": 0.049777777777777775,
838
- "grad_norm": 2.782799482345581,
839
- "learning_rate": 8.222286641794488e-05,
840
- "loss": 0.8109,
841
  "step": 560
842
  },
843
  {
844
- "epoch": 0.050222222222222224,
845
- "grad_norm": 2.960604190826416,
846
- "learning_rate": 8.18039110138882e-05,
847
- "loss": 0.8397,
848
  "step": 565
849
  },
850
  {
851
- "epoch": 0.050666666666666665,
852
- "grad_norm": 2.1768970489501953,
853
- "learning_rate": 8.138117126246951e-05,
854
- "loss": 0.7785,
855
  "step": 570
856
  },
857
  {
858
- "epoch": 0.051111111111111114,
859
- "grad_norm": 2.2641615867614746,
860
- "learning_rate": 8.095469746549172e-05,
861
- "loss": 0.8549,
862
  "step": 575
863
  },
864
  {
865
- "epoch": 0.051555555555555556,
866
- "grad_norm": 3.175459384918213,
867
- "learning_rate": 8.052454036907174e-05,
868
- "loss": 0.8181,
869
  "step": 580
870
  },
871
  {
872
- "epoch": 0.052,
873
- "grad_norm": 4.72011661529541,
874
- "learning_rate": 8.009075115760241e-05,
875
- "loss": 0.8396,
876
  "step": 585
877
  },
878
  {
879
- "epoch": 0.052444444444444446,
880
- "grad_norm": 3.0918285846710205,
881
- "learning_rate": 7.965338144766186e-05,
882
- "loss": 0.8967,
883
  "step": 590
884
  },
885
  {
886
- "epoch": 0.05288888888888889,
887
- "grad_norm": 3.0895960330963135,
888
- "learning_rate": 7.921248328187173e-05,
889
- "loss": 0.8236,
890
  "step": 595
891
  },
892
  {
893
- "epoch": 0.05333333333333334,
894
- "grad_norm": 2.9176504611968994,
895
- "learning_rate": 7.876810912270462e-05,
896
- "loss": 0.7833,
897
  "step": 600
898
  },
899
  {
900
- "epoch": 0.05333333333333334,
901
- "eval_loss": 0.9369513392448425,
902
- "eval_runtime": 145.5512,
903
- "eval_samples_per_second": 34.352,
904
- "eval_steps_per_second": 4.294,
905
  "step": 600
906
  },
907
  {
908
- "epoch": 0.05377777777777778,
909
- "grad_norm": 2.640820264816284,
910
- "learning_rate": 7.832031184624164e-05,
911
- "loss": 0.7855,
912
  "step": 605
913
  },
914
  {
915
- "epoch": 0.05422222222222222,
916
- "grad_norm": 2.6097235679626465,
917
- "learning_rate": 7.786914473588056e-05,
918
- "loss": 0.8043,
919
  "step": 610
920
  },
921
  {
922
- "epoch": 0.05466666666666667,
923
- "grad_norm": 2.815849781036377,
924
- "learning_rate": 7.74146614759957e-05,
925
- "loss": 0.8535,
926
  "step": 615
927
  },
928
  {
929
- "epoch": 0.05511111111111111,
930
- "grad_norm": 3.133481025695801,
931
- "learning_rate": 7.695691614555003e-05,
932
- "loss": 0.8229,
933
  "step": 620
934
  },
935
  {
936
- "epoch": 0.05555555555555555,
937
- "grad_norm": 2.641892910003662,
938
- "learning_rate": 7.649596321166024e-05,
939
- "loss": 0.7967,
940
  "step": 625
941
  },
942
  {
943
- "epoch": 0.056,
944
- "grad_norm": 3.032099723815918,
945
- "learning_rate": 7.603185752311587e-05,
946
- "loss": 0.812,
947
  "step": 630
948
  },
949
  {
950
- "epoch": 0.05644444444444444,
951
- "grad_norm": 2.820112466812134,
952
- "learning_rate": 7.55646543038526e-05,
953
- "loss": 0.7977,
954
  "step": 635
955
  },
956
  {
957
- "epoch": 0.05688888888888889,
958
- "grad_norm": 3.10481333732605,
959
- "learning_rate": 7.509440914638139e-05,
960
- "loss": 0.8705,
961
  "step": 640
962
  },
963
  {
964
- "epoch": 0.05733333333333333,
965
- "grad_norm": 2.5827136039733887,
966
- "learning_rate": 7.462117800517336e-05,
967
- "loss": 0.815,
968
  "step": 645
969
  },
970
  {
971
- "epoch": 0.057777777777777775,
972
- "grad_norm": 3.8412251472473145,
973
- "learning_rate": 7.414501719000187e-05,
974
- "loss": 0.8164,
975
  "step": 650
976
  },
977
  {
978
- "epoch": 0.058222222222222224,
979
- "grad_norm": 2.3787269592285156,
980
- "learning_rate": 7.366598335924217e-05,
981
- "loss": 0.8154,
982
  "step": 655
983
  },
984
  {
985
- "epoch": 0.058666666666666666,
986
- "grad_norm": 2.156470537185669,
987
- "learning_rate": 7.318413351312965e-05,
988
- "loss": 0.8122,
989
  "step": 660
990
  },
991
  {
992
- "epoch": 0.059111111111111114,
993
- "grad_norm": 2.743718385696411,
994
- "learning_rate": 7.269952498697734e-05,
995
- "loss": 0.8279,
996
  "step": 665
997
  },
998
  {
999
- "epoch": 0.059555555555555556,
1000
- "grad_norm": 2.574324369430542,
1001
- "learning_rate": 7.221221544435363e-05,
1002
- "loss": 0.8205,
1003
  "step": 670
1004
  },
1005
  {
1006
- "epoch": 0.06,
1007
- "grad_norm": 2.6374778747558594,
1008
- "learning_rate": 7.172226287022086e-05,
1009
- "loss": 0.786,
1010
  "step": 675
1011
  },
1012
  {
1013
- "epoch": 0.060444444444444446,
1014
- "grad_norm": 2.6063718795776367,
1015
- "learning_rate": 7.122972556403567e-05,
1016
- "loss": 0.7742,
1017
  "step": 680
1018
  },
1019
  {
1020
- "epoch": 0.06088888888888889,
1021
- "grad_norm": 2.1271631717681885,
1022
- "learning_rate": 7.073466213281196e-05,
1023
- "loss": 0.8303,
1024
  "step": 685
1025
  },
1026
  {
1027
- "epoch": 0.06133333333333333,
1028
- "grad_norm": 2.25993275642395,
1029
- "learning_rate": 7.023713148414727e-05,
1030
- "loss": 0.8154,
1031
  "step": 690
1032
  },
1033
  {
1034
- "epoch": 0.06177777777777778,
1035
- "grad_norm": 2.227431058883667,
1036
- "learning_rate": 6.973719281921335e-05,
1037
- "loss": 0.8394,
1038
  "step": 695
1039
  },
1040
  {
1041
- "epoch": 0.06222222222222222,
1042
- "grad_norm": 2.3561928272247314,
1043
- "learning_rate": 6.923490562571181e-05,
1044
- "loss": 0.815,
1045
- "step": 700
1046
- },
1047
- {
1048
- "epoch": 0.06222222222222222,
1049
- "eval_loss": 0.9396146535873413,
1050
- "eval_runtime": 174.3916,
1051
- "eval_samples_per_second": 28.671,
1052
- "eval_steps_per_second": 3.584,
1053
  "step": 700
1054
  },
1055
  {
1056
- "epoch": 0.06266666666666666,
1057
- "grad_norm": 2.8203611373901367,
1058
- "learning_rate": 6.873032967079561e-05,
1059
- "loss": 0.8072,
1060
  "step": 705
1061
  },
1062
  {
1063
- "epoch": 0.06311111111111112,
1064
- "grad_norm": 2.616844892501831,
1065
- "learning_rate": 6.82235249939575e-05,
1066
- "loss": 0.8132,
1067
  "step": 710
1068
  },
1069
  {
1070
- "epoch": 0.06355555555555556,
1071
- "grad_norm": 2.7529284954071045,
1072
- "learning_rate": 6.771455189988579e-05,
1073
- "loss": 0.8126,
1074
  "step": 715
1075
  },
1076
  {
1077
- "epoch": 0.064,
1078
- "grad_norm": 2.466383218765259,
1079
- "learning_rate": 6.720347095128884e-05,
1080
- "loss": 0.8174,
1081
  "step": 720
1082
  },
1083
  {
1084
- "epoch": 0.06444444444444444,
1085
- "grad_norm": 2.2590644359588623,
1086
- "learning_rate": 6.669034296168855e-05,
1087
- "loss": 0.8065,
1088
  "step": 725
1089
  },
1090
  {
1091
- "epoch": 0.06488888888888888,
1092
- "grad_norm": 2.241419792175293,
1093
- "learning_rate": 6.617522898818426e-05,
1094
- "loss": 0.8332,
1095
  "step": 730
1096
  },
1097
  {
1098
- "epoch": 0.06533333333333333,
1099
- "grad_norm": 2.259533405303955,
1100
- "learning_rate": 6.565819032418747e-05,
1101
- "loss": 0.8599,
1102
  "step": 735
1103
  },
1104
  {
1105
- "epoch": 0.06577777777777778,
1106
- "grad_norm": 2.110358715057373,
1107
- "learning_rate": 6.513928849212873e-05,
1108
- "loss": 0.795,
1109
  "step": 740
1110
  },
1111
  {
1112
- "epoch": 0.06622222222222222,
1113
- "grad_norm": 2.656036615371704,
1114
- "learning_rate": 6.461858523613684e-05,
1115
- "loss": 0.8161,
1116
  "step": 745
1117
  },
1118
  {
1119
- "epoch": 0.06666666666666667,
1120
- "grad_norm": 3.0978121757507324,
1121
- "learning_rate": 6.409614251469208e-05,
1122
- "loss": 0.8104,
1123
  "step": 750
1124
  },
1125
  {
1126
- "epoch": 0.06711111111111111,
1127
- "grad_norm": 2.494825839996338,
1128
- "learning_rate": 6.357202249325371e-05,
1129
- "loss": 0.791,
1130
  "step": 755
1131
  },
1132
  {
1133
- "epoch": 0.06755555555555555,
1134
- "grad_norm": 2.344874143600464,
1135
- "learning_rate": 6.304628753686295e-05,
1136
- "loss": 0.8195,
1137
  "step": 760
1138
  },
1139
  {
1140
- "epoch": 0.068,
1141
- "grad_norm": 2.4682934284210205,
1142
- "learning_rate": 6.251900020272208e-05,
1143
- "loss": 0.7791,
1144
  "step": 765
1145
  },
1146
  {
1147
- "epoch": 0.06844444444444445,
1148
- "grad_norm": 2.29433012008667,
1149
- "learning_rate": 6.199022323275083e-05,
1150
- "loss": 0.8252,
1151
  "step": 770
1152
  },
1153
  {
1154
- "epoch": 0.06888888888888889,
1155
- "grad_norm": 1.965577483177185,
1156
- "learning_rate": 6.146001954612071e-05,
1157
- "loss": 0.8046,
1158
  "step": 775
1159
  },
1160
  {
1161
- "epoch": 0.06933333333333333,
1162
- "grad_norm": 2.1349830627441406,
1163
- "learning_rate": 6.092845223176823e-05,
1164
- "loss": 0.82,
1165
  "step": 780
1166
  },
1167
  {
1168
- "epoch": 0.06977777777777777,
1169
- "grad_norm": 2.2359840869903564,
1170
- "learning_rate": 6.0395584540887963e-05,
1171
- "loss": 0.8138,
1172
  "step": 785
1173
  },
1174
  {
1175
- "epoch": 0.07022222222222223,
1176
- "grad_norm": 2.470207691192627,
1177
- "learning_rate": 5.9861479879406315e-05,
1178
- "loss": 0.771,
1179
  "step": 790
1180
  },
1181
  {
1182
- "epoch": 0.07066666666666667,
1183
- "grad_norm": 1.9428234100341797,
1184
- "learning_rate": 5.932620180043674e-05,
1185
- "loss": 0.7997,
1186
  "step": 795
1187
  },
1188
  {
1189
- "epoch": 0.07111111111111111,
1190
- "grad_norm": 2.2809956073760986,
1191
- "learning_rate": 5.8789813996717736e-05,
1192
- "loss": 0.8113,
1193
  "step": 800
1194
  },
1195
  {
1196
- "epoch": 0.07111111111111111,
1197
- "eval_loss": 0.9512593746185303,
1198
- "eval_runtime": 156.0458,
1199
- "eval_samples_per_second": 32.042,
1200
- "eval_steps_per_second": 4.005,
1201
  "step": 800
1202
- },
1203
- {
1204
- "epoch": 0.07155555555555555,
1205
- "grad_norm": 2.4197585582733154,
1206
- "learning_rate": 5.8252380293033884e-05,
1207
- "loss": 0.8103,
1208
- "step": 805
1209
- },
1210
- {
1211
- "epoch": 0.072,
1212
- "grad_norm": 3.481379747390747,
1213
- "learning_rate": 5.7713964638621444e-05,
1214
- "loss": 0.8354,
1215
- "step": 810
1216
- },
1217
- {
1218
- "epoch": 0.07244444444444445,
1219
- "grad_norm": 3.0828964710235596,
1220
- "learning_rate": 5.717463109955896e-05,
1221
- "loss": 0.814,
1222
- "step": 815
1223
- },
1224
- {
1225
- "epoch": 0.07288888888888889,
1226
- "grad_norm": 2.0905721187591553,
1227
- "learning_rate": 5.663444385114411e-05,
1228
- "loss": 0.7695,
1229
- "step": 820
1230
- },
1231
- {
1232
- "epoch": 0.07333333333333333,
1233
- "grad_norm": 3.2763991355895996,
1234
- "learning_rate": 5.6093467170257374e-05,
1235
- "loss": 0.7864,
1236
- "step": 825
1237
- },
1238
- {
1239
- "epoch": 0.07377777777777778,
1240
- "grad_norm": 2.24617862701416,
1241
- "learning_rate": 5.5551765427713884e-05,
1242
- "loss": 0.7314,
1243
- "step": 830
1244
- },
1245
- {
1246
- "epoch": 0.07422222222222222,
1247
- "grad_norm": 2.808973789215088,
1248
- "learning_rate": 5.5009403080603815e-05,
1249
- "loss": 0.8163,
1250
- "step": 835
1251
- },
1252
- {
1253
- "epoch": 0.07466666666666667,
1254
- "grad_norm": 2.1906187534332275,
1255
- "learning_rate": 5.4466444664622685e-05,
1256
- "loss": 0.7868,
1257
- "step": 840
1258
- },
1259
- {
1260
- "epoch": 0.07511111111111111,
1261
- "grad_norm": 2.268329381942749,
1262
- "learning_rate": 5.392295478639225e-05,
1263
- "loss": 0.7765,
1264
- "step": 845
1265
- },
1266
- {
1267
- "epoch": 0.07555555555555556,
1268
- "grad_norm": 2.3029792308807373,
1269
- "learning_rate": 5.337899811577296e-05,
1270
- "loss": 0.7739,
1271
- "step": 850
1272
- },
1273
- {
1274
- "epoch": 0.076,
1275
- "grad_norm": 2.1580140590667725,
1276
- "learning_rate": 5.283463937816888e-05,
1277
- "loss": 0.7358,
1278
- "step": 855
1279
- },
1280
- {
1281
- "epoch": 0.07644444444444444,
1282
- "grad_norm": 1.9017610549926758,
1283
- "learning_rate": 5.228994334682604e-05,
1284
- "loss": 0.7553,
1285
- "step": 860
1286
- },
1287
- {
1288
- "epoch": 0.0768888888888889,
1289
- "grad_norm": 2.5367610454559326,
1290
- "learning_rate": 5.174497483512506e-05,
1291
- "loss": 0.7988,
1292
- "step": 865
1293
- },
1294
- {
1295
- "epoch": 0.07733333333333334,
1296
- "grad_norm": 2.2793335914611816,
1297
- "learning_rate": 5.119979868886895e-05,
1298
- "loss": 0.7736,
1299
- "step": 870
1300
- },
1301
- {
1302
- "epoch": 0.07777777777777778,
1303
- "grad_norm": 2.226646900177002,
1304
- "learning_rate": 5.0654479778567223e-05,
1305
- "loss": 0.7659,
1306
- "step": 875
1307
- },
1308
- {
1309
- "epoch": 0.07822222222222222,
1310
- "grad_norm": 2.1985228061676025,
1311
- "learning_rate": 5.010908299171685e-05,
1312
- "loss": 0.7584,
1313
- "step": 880
1314
- },
1315
- {
1316
- "epoch": 0.07866666666666666,
1317
- "grad_norm": 2.9187963008880615,
1318
- "learning_rate": 4.9563673225081314e-05,
1319
- "loss": 0.7747,
1320
- "step": 885
1321
- },
1322
- {
1323
- "epoch": 0.0791111111111111,
1324
- "grad_norm": 2.6130077838897705,
1325
- "learning_rate": 4.901831537696859e-05,
1326
- "loss": 0.7689,
1327
- "step": 890
1328
- },
1329
- {
1330
- "epoch": 0.07955555555555556,
1331
- "grad_norm": 2.0770986080169678,
1332
- "learning_rate": 4.8473074339508875e-05,
1333
- "loss": 0.7608,
1334
- "step": 895
1335
- },
1336
- {
1337
- "epoch": 0.08,
1338
- "grad_norm": 2.1071507930755615,
1339
- "learning_rate": 4.792801499093305e-05,
1340
- "loss": 0.7597,
1341
- "step": 900
1342
- },
1343
- {
1344
- "epoch": 0.08,
1345
- "eval_loss": 0.9028043746948242,
1346
- "eval_runtime": 151.7758,
1347
- "eval_samples_per_second": 32.943,
1348
- "eval_steps_per_second": 4.118,
1349
- "step": 900
1350
- },
1351
- {
1352
- "epoch": 0.08044444444444444,
1353
- "grad_norm": 2.295839309692383,
1354
- "learning_rate": 4.738320218785281e-05,
1355
- "loss": 0.7514,
1356
- "step": 905
1357
- },
1358
- {
1359
- "epoch": 0.08088888888888889,
1360
- "grad_norm": 1.9599803686141968,
1361
- "learning_rate": 4.683870075754347e-05,
1362
- "loss": 0.7633,
1363
- "step": 910
1364
- },
1365
- {
1366
- "epoch": 0.08133333333333333,
1367
- "grad_norm": 2.3032443523406982,
1368
- "learning_rate": 4.629457549023004e-05,
1369
- "loss": 0.7607,
1370
- "step": 915
1371
- },
1372
- {
1373
- "epoch": 0.08177777777777778,
1374
- "grad_norm": 2.9767608642578125,
1375
- "learning_rate": 4.575089113137792e-05,
1376
- "loss": 0.8124,
1377
- "step": 920
1378
- },
1379
- {
1380
- "epoch": 0.08222222222222222,
1381
- "grad_norm": 2.6544406414031982,
1382
- "learning_rate": 4.52077123739888e-05,
1383
- "loss": 0.799,
1384
- "step": 925
1385
- },
1386
- {
1387
- "epoch": 0.08266666666666667,
1388
- "grad_norm": 1.9514062404632568,
1389
- "learning_rate": 4.466510385090287e-05,
1390
- "loss": 0.7782,
1391
- "step": 930
1392
- },
1393
- {
1394
- "epoch": 0.08311111111111111,
1395
- "grad_norm": 2.3520166873931885,
1396
- "learning_rate": 4.412313012710813e-05,
1397
- "loss": 0.8328,
1398
- "step": 935
1399
- },
1400
- {
1401
- "epoch": 0.08355555555555555,
1402
- "grad_norm": 2.1971933841705322,
1403
- "learning_rate": 4.358185569205779e-05,
1404
- "loss": 0.7903,
1405
- "step": 940
1406
- },
1407
- {
1408
- "epoch": 0.084,
1409
- "grad_norm": 2.2344958782196045,
1410
- "learning_rate": 4.3041344951996746e-05,
1411
- "loss": 0.7288,
1412
- "step": 945
1413
- },
1414
- {
1415
- "epoch": 0.08444444444444445,
1416
- "grad_norm": 1.9887062311172485,
1417
- "learning_rate": 4.250166222229774e-05,
1418
- "loss": 0.7817,
1419
- "step": 950
1420
- },
1421
- {
1422
- "epoch": 0.08488888888888889,
1423
- "grad_norm": 2.5852599143981934,
1424
- "learning_rate": 4.196287171980869e-05,
1425
- "loss": 0.8126,
1426
- "step": 955
1427
- },
1428
- {
1429
- "epoch": 0.08533333333333333,
1430
- "grad_norm": 2.0287818908691406,
1431
- "learning_rate": 4.142503755521129e-05,
1432
- "loss": 0.8016,
1433
- "step": 960
1434
- },
1435
- {
1436
- "epoch": 0.08577777777777777,
1437
- "grad_norm": 2.318622589111328,
1438
- "learning_rate": 4.088822372539263e-05,
1439
- "loss": 0.7858,
1440
- "step": 965
1441
- },
1442
- {
1443
- "epoch": 0.08622222222222223,
1444
- "grad_norm": 1.7345527410507202,
1445
- "learning_rate": 4.035249410583016e-05,
1446
- "loss": 0.7737,
1447
- "step": 970
1448
- },
1449
- {
1450
- "epoch": 0.08666666666666667,
1451
- "grad_norm": 1.9019683599472046,
1452
- "learning_rate": 3.981791244299113e-05,
1453
- "loss": 0.7344,
1454
- "step": 975
1455
- },
1456
- {
1457
- "epoch": 0.08711111111111111,
1458
- "grad_norm": 2.2720935344696045,
1459
- "learning_rate": 3.928454234674747e-05,
1460
- "loss": 0.7723,
1461
- "step": 980
1462
- },
1463
- {
1464
- "epoch": 0.08755555555555555,
1465
- "grad_norm": 2.1315135955810547,
1466
- "learning_rate": 3.875244728280676e-05,
1467
- "loss": 0.799,
1468
- "step": 985
1469
- },
1470
- {
1471
- "epoch": 0.088,
1472
- "grad_norm": 2.0208346843719482,
1473
- "learning_rate": 3.82216905651605e-05,
1474
- "loss": 0.793,
1475
- "step": 990
1476
- },
1477
- {
1478
- "epoch": 0.08844444444444445,
1479
- "grad_norm": 2.7285315990448,
1480
- "learning_rate": 3.769233534855035e-05,
1481
- "loss": 0.7506,
1482
- "step": 995
1483
- },
1484
- {
1485
- "epoch": 0.08888888888888889,
1486
- "grad_norm": 2.095430374145508,
1487
- "learning_rate": 3.7164444620953396e-05,
1488
- "loss": 0.7534,
1489
- "step": 1000
1490
- },
1491
- {
1492
- "epoch": 0.08888888888888889,
1493
- "eval_loss": 0.8925400376319885,
1494
- "eval_runtime": 146.2458,
1495
- "eval_samples_per_second": 34.189,
1496
- "eval_steps_per_second": 4.274,
1497
- "step": 1000
1498
- },
1499
- {
1500
- "epoch": 0.08933333333333333,
1501
- "grad_norm": 2.029069423675537,
1502
- "learning_rate": 3.663808119608716e-05,
1503
- "loss": 0.792,
1504
- "step": 1005
1505
- },
1506
- {
1507
- "epoch": 0.08977777777777778,
1508
- "grad_norm": 2.4296746253967285,
1509
- "learning_rate": 3.6113307705935396e-05,
1510
- "loss": 0.7631,
1511
- "step": 1010
1512
- },
1513
- {
1514
- "epoch": 0.09022222222222222,
1515
- "grad_norm": 1.9055721759796143,
1516
- "learning_rate": 3.559018659329554e-05,
1517
- "loss": 0.764,
1518
- "step": 1015
1519
- },
1520
- {
1521
- "epoch": 0.09066666666666667,
1522
- "grad_norm": 1.9428242444992065,
1523
- "learning_rate": 3.506878010434863e-05,
1524
- "loss": 0.7671,
1525
- "step": 1020
1526
- },
1527
- {
1528
- "epoch": 0.09111111111111111,
1529
- "grad_norm": 2.55059552192688,
1530
- "learning_rate": 3.4549150281252636e-05,
1531
- "loss": 0.7873,
1532
- "step": 1025
1533
- },
1534
- {
1535
- "epoch": 0.09155555555555556,
1536
- "grad_norm": 1.9363492727279663,
1537
- "learning_rate": 3.403135895476004e-05,
1538
- "loss": 0.7592,
1539
- "step": 1030
1540
- },
1541
- {
1542
- "epoch": 0.092,
1543
- "grad_norm": 2.3893046379089355,
1544
- "learning_rate": 3.351546773686065e-05,
1545
- "loss": 0.7718,
1546
- "step": 1035
1547
- },
1548
- {
1549
- "epoch": 0.09244444444444444,
1550
- "grad_norm": 1.9245718717575073,
1551
- "learning_rate": 3.300153801345028e-05,
1552
- "loss": 0.7403,
1553
- "step": 1040
1554
- },
1555
- {
1556
- "epoch": 0.09288888888888888,
1557
- "grad_norm": 1.7629024982452393,
1558
- "learning_rate": 3.248963093702663e-05,
1559
- "loss": 0.7999,
1560
- "step": 1045
1561
- },
1562
- {
1563
- "epoch": 0.09333333333333334,
1564
- "grad_norm": 2.3090808391571045,
1565
- "learning_rate": 3.197980741941252e-05,
1566
- "loss": 0.7815,
1567
- "step": 1050
1568
- },
1569
- {
1570
- "epoch": 0.09377777777777778,
1571
- "grad_norm": 2.262960433959961,
1572
- "learning_rate": 3.147212812450819e-05,
1573
- "loss": 0.7737,
1574
- "step": 1055
1575
- },
1576
- {
1577
- "epoch": 0.09422222222222222,
1578
- "grad_norm": 2.4823575019836426,
1579
- "learning_rate": 3.096665346107278e-05,
1580
- "loss": 0.7961,
1581
- "step": 1060
1582
- },
1583
- {
1584
- "epoch": 0.09466666666666666,
1585
- "grad_norm": 2.411437511444092,
1586
- "learning_rate": 3.046344357553632e-05,
1587
- "loss": 0.8292,
1588
- "step": 1065
1589
- },
1590
- {
1591
- "epoch": 0.0951111111111111,
1592
- "grad_norm": 2.0698482990264893,
1593
- "learning_rate": 2.996255834484296e-05,
1594
- "loss": 0.7709,
1595
- "step": 1070
1596
- },
1597
- {
1598
- "epoch": 0.09555555555555556,
1599
- "grad_norm": 1.6237046718597412,
1600
- "learning_rate": 2.946405736932615e-05,
1601
- "loss": 0.7675,
1602
- "step": 1075
1603
- },
1604
- {
1605
- "epoch": 0.096,
1606
- "grad_norm": 2.6146557331085205,
1607
- "learning_rate": 2.8967999965616816e-05,
1608
- "loss": 0.7564,
1609
- "step": 1080
1610
- },
1611
- {
1612
- "epoch": 0.09644444444444444,
1613
- "grad_norm": 2.2040791511535645,
1614
- "learning_rate": 2.8474445159585235e-05,
1615
- "loss": 0.733,
1616
- "step": 1085
1617
- },
1618
- {
1619
- "epoch": 0.09688888888888889,
1620
- "grad_norm": 2.3044800758361816,
1621
- "learning_rate": 2.7983451679317706e-05,
1622
- "loss": 0.7705,
1623
- "step": 1090
1624
- },
1625
- {
1626
- "epoch": 0.09733333333333333,
1627
- "grad_norm": 2.10251784324646,
1628
- "learning_rate": 2.7495077948128245e-05,
1629
- "loss": 0.7545,
1630
- "step": 1095
1631
- },
1632
- {
1633
- "epoch": 0.09777777777777778,
1634
- "grad_norm": 2.353555202484131,
1635
- "learning_rate": 2.700938207760701e-05,
1636
- "loss": 0.7512,
1637
- "step": 1100
1638
- },
1639
- {
1640
- "epoch": 0.09777777777777778,
1641
- "eval_loss": 0.8821930885314941,
1642
- "eval_runtime": 144.8568,
1643
- "eval_samples_per_second": 34.517,
1644
- "eval_steps_per_second": 4.315,
1645
- "step": 1100
1646
- },
1647
- {
1648
- "epoch": 0.09822222222222222,
1649
- "grad_norm": 2.295103073120117,
1650
- "learning_rate": 2.6526421860705473e-05,
1651
- "loss": 0.7454,
1652
- "step": 1105
1653
- },
1654
- {
1655
- "epoch": 0.09866666666666667,
1656
- "grad_norm": 2.361027956008911,
1657
- "learning_rate": 2.6046254764859685e-05,
1658
- "loss": 0.6993,
1659
- "step": 1110
1660
- },
1661
- {
1662
- "epoch": 0.09911111111111111,
1663
- "grad_norm": 2.4085135459899902,
1664
- "learning_rate": 2.556893792515227e-05,
1665
- "loss": 0.7861,
1666
- "step": 1115
1667
- },
1668
- {
1669
- "epoch": 0.09955555555555555,
1670
- "grad_norm": 1.8174635171890259,
1671
- "learning_rate": 2.5094528137513795e-05,
1672
- "loss": 0.8115,
1673
- "step": 1120
1674
- },
1675
- {
1676
- "epoch": 0.1,
1677
- "grad_norm": 2.0099422931671143,
1678
- "learning_rate": 2.4623081851964806e-05,
1679
- "loss": 0.7719,
1680
- "step": 1125
1681
- },
1682
- {
1683
- "epoch": 0.10044444444444445,
1684
- "grad_norm": 2.152926445007324,
1685
- "learning_rate": 2.4154655165898627e-05,
1686
- "loss": 0.8149,
1687
- "step": 1130
1688
- },
1689
- {
1690
- "epoch": 0.10088888888888889,
1691
- "grad_norm": 1.8284573554992676,
1692
- "learning_rate": 2.3689303817406515e-05,
1693
- "loss": 0.7305,
1694
- "step": 1135
1695
- },
1696
- {
1697
- "epoch": 0.10133333333333333,
1698
- "grad_norm": 2.114602565765381,
1699
- "learning_rate": 2.3227083178645313e-05,
1700
- "loss": 0.7345,
1701
- "step": 1140
1702
- },
1703
- {
1704
- "epoch": 0.10177777777777777,
1705
- "grad_norm": 2.3404438495635986,
1706
- "learning_rate": 2.276804824924864e-05,
1707
- "loss": 0.7555,
1708
- "step": 1145
1709
- },
1710
- {
1711
- "epoch": 0.10222222222222223,
1712
- "grad_norm": 2.388821601867676,
1713
- "learning_rate": 2.2312253649782655e-05,
1714
- "loss": 0.7898,
1715
- "step": 1150
1716
- },
1717
- {
1718
- "epoch": 0.10266666666666667,
1719
- "grad_norm": 2.1446168422698975,
1720
- "learning_rate": 2.185975361524657e-05,
1721
- "loss": 0.7312,
1722
- "step": 1155
1723
- },
1724
- {
1725
- "epoch": 0.10311111111111111,
1726
- "grad_norm": 2.198666572570801,
1727
- "learning_rate": 2.1410601988619394e-05,
1728
- "loss": 0.7612,
1729
- "step": 1160
1730
- },
1731
- {
1732
- "epoch": 0.10355555555555555,
1733
- "grad_norm": 2.017157793045044,
1734
- "learning_rate": 2.0964852214453013e-05,
1735
- "loss": 0.7532,
1736
- "step": 1165
1737
- },
1738
- {
1739
- "epoch": 0.104,
1740
- "grad_norm": 1.7391427755355835,
1741
- "learning_rate": 2.0522557332512953e-05,
1742
- "loss": 0.7452,
1743
- "step": 1170
1744
- },
1745
- {
1746
- "epoch": 0.10444444444444445,
1747
- "grad_norm": 2.3819758892059326,
1748
- "learning_rate": 2.008376997146705e-05,
1749
- "loss": 0.7735,
1750
- "step": 1175
1751
- },
1752
- {
1753
- "epoch": 0.10488888888888889,
1754
- "grad_norm": 2.148484945297241,
1755
- "learning_rate": 1.9648542342623277e-05,
1756
- "loss": 0.7528,
1757
- "step": 1180
1758
- },
1759
- {
1760
- "epoch": 0.10533333333333333,
1761
- "grad_norm": 2.3765158653259277,
1762
- "learning_rate": 1.9216926233717085e-05,
1763
- "loss": 0.7266,
1764
- "step": 1185
1765
- },
1766
- {
1767
- "epoch": 0.10577777777777778,
1768
- "grad_norm": 2.245584487915039,
1769
- "learning_rate": 1.8788973002749105e-05,
1770
- "loss": 0.7631,
1771
- "step": 1190
1772
- },
1773
- {
1774
- "epoch": 0.10622222222222222,
1775
- "grad_norm": 2.7776741981506348,
1776
- "learning_rate": 1.83647335718742e-05,
1777
- "loss": 0.8013,
1778
- "step": 1195
1779
- },
1780
- {
1781
- "epoch": 0.10666666666666667,
1782
- "grad_norm": 2.3494765758514404,
1783
- "learning_rate": 1.7944258421342098e-05,
1784
- "loss": 0.7146,
1785
- "step": 1200
1786
- },
1787
- {
1788
- "epoch": 0.10666666666666667,
1789
- "eval_loss": 0.8797385692596436,
1790
- "eval_runtime": 152.824,
1791
- "eval_samples_per_second": 32.717,
1792
- "eval_steps_per_second": 4.09,
1793
- "step": 1200
1794
- },
1795
- {
1796
- "epoch": 0.10711111111111112,
1797
- "grad_norm": 2.1948468685150146,
1798
- "learning_rate": 1.7527597583490822e-05,
1799
- "loss": 0.7383,
1800
- "step": 1205
1801
- },
1802
- {
1803
- "epoch": 0.10755555555555556,
1804
- "grad_norm": 2.3977882862091064,
1805
- "learning_rate": 1.7114800636793377e-05,
1806
- "loss": 0.7751,
1807
- "step": 1210
1808
- },
1809
- {
1810
- "epoch": 0.108,
1811
- "grad_norm": 2.380903482437134,
1812
- "learning_rate": 1.670591669995829e-05,
1813
- "loss": 0.7514,
1814
- "step": 1215
1815
- },
1816
- {
1817
- "epoch": 0.10844444444444444,
1818
- "grad_norm": 2.2962698936462402,
1819
- "learning_rate": 1.6300994426085103e-05,
1820
- "loss": 0.7014,
1821
- "step": 1220
1822
- },
1823
- {
1824
- "epoch": 0.10888888888888888,
1825
- "grad_norm": 2.4068901538848877,
1826
- "learning_rate": 1.5900081996875083e-05,
1827
- "loss": 0.7087,
1828
- "step": 1225
1829
- },
1830
- {
1831
- "epoch": 0.10933333333333334,
1832
- "grad_norm": 1.881907343864441,
1833
- "learning_rate": 1.5503227116898016e-05,
1834
- "loss": 0.7847,
1835
- "step": 1230
1836
- },
1837
- {
1838
- "epoch": 0.10977777777777778,
1839
- "grad_norm": 1.9586148262023926,
1840
- "learning_rate": 1.5110477007916001e-05,
1841
- "loss": 0.7493,
1842
- "step": 1235
1843
- },
1844
- {
1845
- "epoch": 0.11022222222222222,
1846
- "grad_norm": 2.418649911880493,
1847
- "learning_rate": 1.4721878403264345e-05,
1848
- "loss": 0.7384,
1849
- "step": 1240
1850
- },
1851
- {
1852
- "epoch": 0.11066666666666666,
1853
- "grad_norm": 1.8795591592788696,
1854
- "learning_rate": 1.4337477542290928e-05,
1855
- "loss": 0.7254,
1856
- "step": 1245
1857
- },
1858
- {
1859
- "epoch": 0.1111111111111111,
1860
- "grad_norm": 1.8943523168563843,
1861
- "learning_rate": 1.3957320164854059e-05,
1862
- "loss": 0.7512,
1863
- "step": 1250
1864
- },
1865
- {
1866
- "epoch": 0.11155555555555556,
1867
- "grad_norm": 1.794277548789978,
1868
- "learning_rate": 1.3581451505879994e-05,
1869
- "loss": 0.7525,
1870
- "step": 1255
1871
- },
1872
- {
1873
- "epoch": 0.112,
1874
- "grad_norm": 2.044266700744629,
1875
- "learning_rate": 1.3209916289980334e-05,
1876
- "loss": 0.7377,
1877
- "step": 1260
1878
- },
1879
- {
1880
- "epoch": 0.11244444444444444,
1881
- "grad_norm": 2.0691747665405273,
1882
- "learning_rate": 1.2842758726130283e-05,
1883
- "loss": 0.7573,
1884
- "step": 1265
1885
- },
1886
- {
1887
- "epoch": 0.11288888888888889,
1888
- "grad_norm": 1.927995204925537,
1889
- "learning_rate": 1.2480022502408307e-05,
1890
- "loss": 0.7135,
1891
- "step": 1270
1892
- },
1893
- {
1894
- "epoch": 0.11333333333333333,
1895
- "grad_norm": 2.154827356338501,
1896
- "learning_rate": 1.2121750780797513e-05,
1897
- "loss": 0.7531,
1898
- "step": 1275
1899
- },
1900
- {
1901
- "epoch": 0.11377777777777778,
1902
- "grad_norm": 2.528263807296753,
1903
- "learning_rate": 1.1767986192049984e-05,
1904
- "loss": 0.7425,
1905
- "step": 1280
1906
- },
1907
- {
1908
- "epoch": 0.11422222222222222,
1909
- "grad_norm": 2.024723529815674,
1910
- "learning_rate": 1.1418770830614013e-05,
1911
- "loss": 0.7483,
1912
- "step": 1285
1913
- },
1914
- {
1915
- "epoch": 0.11466666666666667,
1916
- "grad_norm": 2.312997817993164,
1917
- "learning_rate": 1.1074146249625333e-05,
1918
- "loss": 0.7718,
1919
- "step": 1290
1920
- },
1921
- {
1922
- "epoch": 0.11511111111111111,
1923
- "grad_norm": 2.0183262825012207,
1924
- "learning_rate": 1.0734153455962765e-05,
1925
- "loss": 0.7252,
1926
- "step": 1295
1927
- },
1928
- {
1929
- "epoch": 0.11555555555555555,
1930
- "grad_norm": 1.9232224225997925,
1931
- "learning_rate": 1.0398832905368694e-05,
1932
- "loss": 0.7424,
1933
- "step": 1300
1934
- },
1935
- {
1936
- "epoch": 0.11555555555555555,
1937
- "eval_loss": 0.8784002065658569,
1938
- "eval_runtime": 150.2141,
1939
- "eval_samples_per_second": 33.286,
1940
- "eval_steps_per_second": 4.161,
1941
- "step": 1300
1942
- },
1943
- {
1944
- "epoch": 0.116,
1945
- "grad_norm": 2.266510486602783,
1946
- "learning_rate": 1.006822449763537e-05,
1947
- "loss": 0.7199,
1948
- "step": 1305
1949
- },
1950
- {
1951
- "epoch": 0.11644444444444445,
1952
- "grad_norm": 2.1905033588409424,
1953
- "learning_rate": 9.742367571857091e-06,
1954
- "loss": 0.6834,
1955
- "step": 1310
1956
- },
1957
- {
1958
- "epoch": 0.11688888888888889,
1959
- "grad_norm": 2.1627376079559326,
1960
- "learning_rate": 9.421300901749386e-06,
1961
- "loss": 0.7759,
1962
- "step": 1315
1963
- },
1964
- {
1965
- "epoch": 0.11733333333333333,
1966
- "grad_norm": 1.9716633558273315,
1967
- "learning_rate": 9.105062691035233e-06,
1968
- "loss": 0.7466,
1969
- "step": 1320
1970
- },
1971
- {
1972
- "epoch": 0.11777777777777777,
1973
- "grad_norm": 2.4646966457366943,
1974
- "learning_rate": 8.793690568899216e-06,
1975
- "loss": 0.7428,
1976
- "step": 1325
1977
- },
1978
- {
1979
- "epoch": 0.11822222222222223,
1980
- "grad_norm": 1.8310861587524414,
1981
- "learning_rate": 8.487221585510074e-06,
1982
- "loss": 0.7042,
1983
- "step": 1330
1984
- },
1985
- {
1986
- "epoch": 0.11866666666666667,
1987
- "grad_norm": 2.5189576148986816,
1988
- "learning_rate": 8.185692207612022e-06,
1989
- "loss": 0.7686,
1990
- "step": 1335
1991
- },
1992
- {
1993
- "epoch": 0.11911111111111111,
1994
- "grad_norm": 2.2453572750091553,
1995
- "learning_rate": 7.889138314185678e-06,
1996
- "loss": 0.7635,
1997
- "step": 1340
1998
- },
1999
- {
2000
- "epoch": 0.11955555555555555,
2001
- "grad_norm": 2.0272557735443115,
2002
- "learning_rate": 7.597595192178702e-06,
2003
- "loss": 0.7305,
2004
- "step": 1345
2005
- },
2006
- {
2007
- "epoch": 0.12,
2008
- "grad_norm": 1.907797932624817,
2009
- "learning_rate": 7.311097532307121e-06,
2010
- "loss": 0.7326,
2011
- "step": 1350
2012
- },
2013
- {
2014
- "epoch": 0.12044444444444445,
2015
- "grad_norm": 2.3352434635162354,
2016
- "learning_rate": 7.029679424927365e-06,
2017
- "loss": 0.7466,
2018
- "step": 1355
2019
- },
2020
- {
2021
- "epoch": 0.12088888888888889,
2022
- "grad_norm": 2.1825642585754395,
2023
- "learning_rate": 6.753374355979975e-06,
2024
- "loss": 0.7346,
2025
- "step": 1360
2026
- },
2027
- {
2028
- "epoch": 0.12133333333333333,
2029
- "grad_norm": 2.175647020339966,
2030
- "learning_rate": 6.482215203005015e-06,
2031
- "loss": 0.716,
2032
- "step": 1365
2033
- },
2034
- {
2035
- "epoch": 0.12177777777777778,
2036
- "grad_norm": 2.214008092880249,
2037
- "learning_rate": 6.216234231230012e-06,
2038
- "loss": 0.7253,
2039
- "step": 1370
2040
- },
2041
- {
2042
- "epoch": 0.12222222222222222,
2043
- "grad_norm": 1.9471856355667114,
2044
- "learning_rate": 5.955463089730723e-06,
2045
- "loss": 0.7655,
2046
- "step": 1375
2047
- },
2048
- {
2049
- "epoch": 0.12266666666666666,
2050
- "grad_norm": 1.9830577373504639,
2051
- "learning_rate": 5.699932807665198e-06,
2052
- "loss": 0.7597,
2053
- "step": 1380
2054
- },
2055
- {
2056
- "epoch": 0.12311111111111112,
2057
- "grad_norm": 2.7330000400543213,
2058
- "learning_rate": 5.449673790581611e-06,
2059
- "loss": 0.7379,
2060
- "step": 1385
2061
- },
2062
- {
2063
- "epoch": 0.12355555555555556,
2064
- "grad_norm": 1.8489813804626465,
2065
- "learning_rate": 5.204715816800343e-06,
2066
- "loss": 0.7257,
2067
- "step": 1390
2068
- },
2069
- {
2070
- "epoch": 0.124,
2071
- "grad_norm": 1.673449158668518,
2072
- "learning_rate": 4.965088033870608e-06,
2073
- "loss": 0.7137,
2074
- "step": 1395
2075
- },
2076
- {
2077
- "epoch": 0.12444444444444444,
2078
- "grad_norm": 1.8715451955795288,
2079
- "learning_rate": 4.730818955102234e-06,
2080
- "loss": 0.7182,
2081
- "step": 1400
2082
- },
2083
- {
2084
- "epoch": 0.12444444444444444,
2085
- "eval_loss": 0.8741394281387329,
2086
- "eval_runtime": 146.2188,
2087
- "eval_samples_per_second": 34.195,
2088
- "eval_steps_per_second": 4.274,
2089
- "step": 1400
2090
- },
2091
- {
2092
- "epoch": 0.12488888888888888,
2093
- "grad_norm": 2.205672264099121,
2094
- "learning_rate": 4.501936456172845e-06,
2095
- "loss": 0.677,
2096
- "step": 1405
2097
- },
2098
- {
2099
- "epoch": 0.12533333333333332,
2100
- "grad_norm": 1.7708081007003784,
2101
- "learning_rate": 4.278467771810896e-06,
2102
- "loss": 0.7472,
2103
- "step": 1410
2104
- },
2105
- {
2106
- "epoch": 0.12577777777777777,
2107
- "grad_norm": 1.9440569877624512,
2108
- "learning_rate": 4.06043949255509e-06,
2109
- "loss": 0.7535,
2110
- "step": 1415
2111
- },
2112
- {
2113
- "epoch": 0.12622222222222224,
2114
- "grad_norm": 2.4499361515045166,
2115
- "learning_rate": 3.847877561590296e-06,
2116
- "loss": 0.7376,
2117
- "step": 1420
2118
- },
2119
- {
2120
- "epoch": 0.12666666666666668,
2121
- "grad_norm": 1.9922362565994263,
2122
- "learning_rate": 3.6408072716606346e-06,
2123
- "loss": 0.7311,
2124
- "step": 1425
2125
- },
2126
- {
2127
- "epoch": 0.12711111111111112,
2128
- "grad_norm": 2.0958635807037354,
2129
- "learning_rate": 3.4392532620598216e-06,
2130
- "loss": 0.7467,
2131
- "step": 1430
2132
- },
2133
- {
2134
- "epoch": 0.12755555555555556,
2135
- "grad_norm": 2.4406979084014893,
2136
- "learning_rate": 3.24323951569942e-06,
2137
- "loss": 0.6949,
2138
- "step": 1435
2139
- },
2140
- {
2141
- "epoch": 0.128,
2142
- "grad_norm": 2.1745223999023438,
2143
- "learning_rate": 3.052789356255037e-06,
2144
- "loss": 0.7868,
2145
- "step": 1440
2146
- },
2147
- {
2148
- "epoch": 0.12844444444444444,
2149
- "grad_norm": 1.7190449237823486,
2150
- "learning_rate": 2.8679254453910785e-06,
2151
- "loss": 0.7544,
2152
- "step": 1445
2153
- },
2154
- {
2155
- "epoch": 0.1288888888888889,
2156
- "grad_norm": 2.2487220764160156,
2157
- "learning_rate": 2.688669780064268e-06,
2158
- "loss": 0.795,
2159
- "step": 1450
2160
- },
2161
- {
2162
- "epoch": 0.12933333333333333,
2163
- "grad_norm": 1.7264000177383423,
2164
- "learning_rate": 2.515043689906149e-06,
2165
- "loss": 0.7619,
2166
- "step": 1455
2167
- },
2168
- {
2169
- "epoch": 0.12977777777777777,
2170
- "grad_norm": 2.1828534603118896,
2171
- "learning_rate": 2.3470678346851518e-06,
2172
- "loss": 0.7232,
2173
- "step": 1460
2174
- },
2175
- {
2176
- "epoch": 0.1302222222222222,
2177
- "grad_norm": 2.1059188842773438,
2178
- "learning_rate": 2.1847622018482283e-06,
2179
- "loss": 0.7275,
2180
- "step": 1465
2181
- },
2182
- {
2183
- "epoch": 0.13066666666666665,
2184
- "grad_norm": 2.2545006275177,
2185
- "learning_rate": 2.0281461041425807e-06,
2186
- "loss": 0.7381,
2187
- "step": 1470
2188
- },
2189
- {
2190
- "epoch": 0.13111111111111112,
2191
- "grad_norm": 1.883216142654419,
2192
- "learning_rate": 1.8772381773176417e-06,
2193
- "loss": 0.713,
2194
- "step": 1475
2195
- },
2196
- {
2197
- "epoch": 0.13155555555555556,
2198
- "grad_norm": 1.9266149997711182,
2199
- "learning_rate": 1.7320563779075593e-06,
2200
- "loss": 0.7198,
2201
- "step": 1480
2202
- },
2203
- {
2204
- "epoch": 0.132,
2205
- "grad_norm": 2.2115542888641357,
2206
- "learning_rate": 1.5926179810946184e-06,
2207
- "loss": 0.7582,
2208
- "step": 1485
2209
- },
2210
- {
2211
- "epoch": 0.13244444444444445,
2212
- "grad_norm": 2.4320266246795654,
2213
- "learning_rate": 1.4589395786535953e-06,
2214
- "loss": 0.7429,
2215
- "step": 1490
2216
- },
2217
- {
2218
- "epoch": 0.1328888888888889,
2219
- "grad_norm": 2.242762804031372,
2220
- "learning_rate": 1.331037076977576e-06,
2221
- "loss": 0.7461,
2222
- "step": 1495
2223
- },
2224
- {
2225
- "epoch": 0.13333333333333333,
2226
- "grad_norm": 1.9152480363845825,
2227
- "learning_rate": 1.2089256951851924e-06,
2228
- "loss": 0.7538,
2229
- "step": 1500
2230
- },
2231
- {
2232
- "epoch": 0.13333333333333333,
2233
- "eval_loss": 0.8730303645133972,
2234
- "eval_runtime": 146.7239,
2235
- "eval_samples_per_second": 34.078,
2236
- "eval_steps_per_second": 4.26,
2237
- "step": 1500
2238
- },
2239
- {
2240
- "epoch": 0.13377777777777777,
2241
- "grad_norm": 2.0257728099823,
2242
- "learning_rate": 1.0926199633097157e-06,
2243
- "loss": 0.7371,
2244
- "step": 1505
2245
- },
2246
- {
2247
- "epoch": 0.13422222222222221,
2248
- "grad_norm": 2.0611376762390137,
2249
- "learning_rate": 9.821337205701665e-07,
2250
- "loss": 0.7441,
2251
- "step": 1510
2252
- },
2253
- {
2254
- "epoch": 0.13466666666666666,
2255
- "grad_norm": 2.425452470779419,
2256
- "learning_rate": 8.774801137245159e-07,
2257
- "loss": 0.7061,
2258
- "step": 1515
2259
- },
2260
- {
2261
- "epoch": 0.1351111111111111,
2262
- "grad_norm": 2.0856432914733887,
2263
- "learning_rate": 7.786715955054203e-07,
2264
- "loss": 0.7179,
2265
- "step": 1520
2266
- },
2267
- {
2268
- "epoch": 0.13555555555555557,
2269
- "grad_norm": 2.4328227043151855,
2270
- "learning_rate": 6.857199231384282e-07,
2271
- "loss": 0.7216,
2272
- "step": 1525
2273
- },
2274
- {
2275
- "epoch": 0.136,
2276
- "grad_norm": 1.9803858995437622,
2277
- "learning_rate": 5.986361569430165e-07,
2278
- "loss": 0.7653,
2279
- "step": 1530
2280
- },
2281
- {
2282
- "epoch": 0.13644444444444445,
2283
- "grad_norm": 1.987241268157959,
2284
- "learning_rate": 5.174306590164879e-07,
2285
- "loss": 0.7252,
2286
- "step": 1535
2287
- },
2288
- {
2289
- "epoch": 0.1368888888888889,
2290
- "grad_norm": 2.019005537033081,
2291
- "learning_rate": 4.4211309200102303e-07,
2292
- "loss": 0.7449,
2293
- "step": 1540
2294
- },
2295
- {
2296
- "epoch": 0.13733333333333334,
2297
- "grad_norm": 1.9359219074249268,
2298
- "learning_rate": 3.7269241793390085e-07,
2299
- "loss": 0.7122,
2300
- "step": 1545
2301
- },
2302
- {
2303
- "epoch": 0.13777777777777778,
2304
- "grad_norm": 2.6825265884399414,
2305
- "learning_rate": 3.09176897181096e-07,
2306
- "loss": 0.7412,
2307
- "step": 1550
2308
- },
2309
- {
2310
- "epoch": 0.13822222222222222,
2311
- "grad_norm": 2.0021426677703857,
2312
- "learning_rate": 2.515740874544148e-07,
2313
- "loss": 0.7252,
2314
- "step": 1555
2315
- },
2316
- {
2317
- "epoch": 0.13866666666666666,
2318
- "grad_norm": 1.840452790260315,
2319
- "learning_rate": 1.9989084291216487e-07,
2320
- "loss": 0.7414,
2321
- "step": 1560
2322
- },
2323
- {
2324
- "epoch": 0.1391111111111111,
2325
- "grad_norm": 1.8142333030700684,
2326
- "learning_rate": 1.5413331334360182e-07,
2327
- "loss": 0.7031,
2328
- "step": 1565
2329
- },
2330
- {
2331
- "epoch": 0.13955555555555554,
2332
- "grad_norm": 2.1795592308044434,
2333
- "learning_rate": 1.1430694343715353e-07,
2334
- "loss": 0.7334,
2335
- "step": 1570
2336
- },
2337
- {
2338
- "epoch": 0.14,
2339
- "grad_norm": 2.239292621612549,
2340
- "learning_rate": 8.041647213256064e-08,
2341
- "loss": 0.7233,
2342
- "step": 1575
2343
- },
2344
- {
2345
- "epoch": 0.14044444444444446,
2346
- "grad_norm": 2.0367276668548584,
2347
- "learning_rate": 5.246593205699424e-08,
2348
- "loss": 0.7542,
2349
- "step": 1580
2350
- },
2351
- {
2352
- "epoch": 0.1408888888888889,
2353
- "grad_norm": 2.47322678565979,
2354
- "learning_rate": 3.04586490452119e-08,
2355
- "loss": 0.7508,
2356
- "step": 1585
2357
- },
2358
- {
2359
- "epoch": 0.14133333333333334,
2360
- "grad_norm": 2.6402623653411865,
2361
- "learning_rate": 1.4397241743813184e-08,
2362
- "loss": 0.7378,
2363
- "step": 1590
2364
- },
2365
- {
2366
- "epoch": 0.14177777777777778,
2367
- "grad_norm": 2.3272688388824463,
2368
- "learning_rate": 4.2836212996499865e-09,
2369
- "loss": 0.7607,
2370
- "step": 1595
2371
- },
2372
- {
2373
- "epoch": 0.14222222222222222,
2374
- "grad_norm": 2.066089630126953,
2375
- "learning_rate": 1.189911324084303e-10,
2376
- "loss": 0.7017,
2377
- "step": 1600
2378
- },
2379
- {
2380
- "epoch": 0.14222222222222222,
2381
- "eval_loss": 0.8723308444023132,
2382
- "eval_runtime": 146.0052,
2383
- "eval_samples_per_second": 34.245,
2384
- "eval_steps_per_second": 4.281,
2385
- "step": 1600
2386
  }
2387
  ],
2388
  "logging_steps": 5,
2389
- "max_steps": 1600,
2390
  "num_input_tokens_seen": 0,
2391
  "num_train_epochs": 1,
2392
- "save_steps": 100,
2393
  "stateful_callbacks": {
2394
  "TrainerControl": {
2395
  "args": {
@@ -2397,13 +1181,13 @@
2397
  "should_evaluate": false,
2398
  "should_log": false,
2399
  "should_save": true,
2400
- "should_training_stop": true
2401
  },
2402
  "attributes": {}
2403
  }
2404
  },
2405
- "total_flos": 347760947673600.0,
2406
- "train_batch_size": 1,
2407
  "trial_name": null,
2408
  "trial_params": null
2409
  }
 
1
  {
2
+ "best_global_step": 800,
3
+ "best_metric": 0.7418414950370789,
4
+ "best_model_checkpoint": "checkpoints/lora_uci/checkpoint-800",
5
+ "epoch": 0.28444444444444444,
6
+ "eval_steps": 200,
7
+ "global_step": 800,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
+ "epoch": 0.00035555555555555557,
14
+ "grad_norm": 32.15512466430664,
15
  "learning_rate": 0.0,
16
+ "loss": 4.3308,
17
  "step": 1
18
  },
19
  {
20
+ "epoch": 0.0017777777777777779,
21
+ "grad_norm": 29.14179801940918,
22
+ "learning_rate": 1.6666666666666667e-06,
23
+ "loss": 4.2083,
24
  "step": 5
25
  },
26
  {
27
+ "epoch": 0.0035555555555555557,
28
+ "grad_norm": 22.27640724182129,
29
+ "learning_rate": 3.75e-06,
30
+ "loss": 3.5173,
31
  "step": 10
32
  },
33
  {
34
+ "epoch": 0.005333333333333333,
35
+ "grad_norm": 16.255199432373047,
36
+ "learning_rate": 5.833333333333334e-06,
37
+ "loss": 2.492,
38
  "step": 15
39
  },
40
  {
41
+ "epoch": 0.0071111111111111115,
42
+ "grad_norm": 13.185359954833984,
43
+ "learning_rate": 7.916666666666667e-06,
44
+ "loss": 1.6992,
45
  "step": 20
46
  },
47
  {
48
+ "epoch": 0.008888888888888889,
49
+ "grad_norm": 10.229533195495605,
50
+ "learning_rate": 1e-05,
51
+ "loss": 1.3879,
52
  "step": 25
53
  },
54
  {
55
+ "epoch": 0.010666666666666666,
56
+ "grad_norm": 12.396824836730957,
57
+ "learning_rate": 1.2083333333333333e-05,
58
+ "loss": 1.2205,
59
  "step": 30
60
  },
61
  {
62
+ "epoch": 0.012444444444444444,
63
+ "grad_norm": 34.271759033203125,
64
+ "learning_rate": 1.4166666666666668e-05,
65
+ "loss": 1.12,
66
  "step": 35
67
  },
68
  {
69
+ "epoch": 0.014222222222222223,
70
+ "grad_norm": 16.109539031982422,
71
+ "learning_rate": 1.6250000000000002e-05,
72
+ "loss": 1.1309,
73
  "step": 40
74
  },
75
  {
76
+ "epoch": 0.016,
77
+ "grad_norm": 14.91402816772461,
78
+ "learning_rate": 1.8333333333333333e-05,
79
+ "loss": 1.1121,
80
  "step": 45
81
  },
82
  {
83
+ "epoch": 0.017777777777777778,
84
+ "grad_norm": 11.38852310180664,
85
+ "learning_rate": 2.0416666666666667e-05,
86
+ "loss": 1.0329,
87
  "step": 50
88
  },
89
  {
90
+ "epoch": 0.019555555555555555,
91
+ "grad_norm": 13.237730979919434,
92
+ "learning_rate": 2.25e-05,
93
+ "loss": 1.0185,
94
  "step": 55
95
  },
96
  {
97
+ "epoch": 0.021333333333333333,
98
+ "grad_norm": 29.64104461669922,
99
+ "learning_rate": 2.4583333333333332e-05,
100
+ "loss": 1.012,
101
  "step": 60
102
  },
103
  {
104
+ "epoch": 0.02311111111111111,
105
+ "grad_norm": 9.474141120910645,
106
+ "learning_rate": 2.6666666666666667e-05,
107
+ "loss": 0.9874,
108
  "step": 65
109
  },
110
  {
111
+ "epoch": 0.024888888888888887,
112
+ "grad_norm": 8.440736770629883,
113
+ "learning_rate": 2.8749999999999997e-05,
114
+ "loss": 0.9604,
115
  "step": 70
116
  },
117
  {
118
+ "epoch": 0.02666666666666667,
119
+ "grad_norm": 6.6439900398254395,
120
+ "learning_rate": 3.0833333333333335e-05,
121
+ "loss": 0.9503,
122
  "step": 75
123
  },
124
  {
125
+ "epoch": 0.028444444444444446,
126
+ "grad_norm": 7.365396499633789,
127
+ "learning_rate": 3.291666666666667e-05,
128
+ "loss": 0.9339,
129
  "step": 80
130
  },
131
  {
132
+ "epoch": 0.030222222222222223,
133
+ "grad_norm": 8.67831802368164,
134
+ "learning_rate": 3.5e-05,
135
+ "loss": 0.919,
136
  "step": 85
137
  },
138
  {
139
+ "epoch": 0.032,
140
+ "grad_norm": 9.373591423034668,
141
+ "learning_rate": 3.708333333333334e-05,
142
+ "loss": 0.9177,
143
  "step": 90
144
  },
145
  {
146
+ "epoch": 0.033777777777777775,
147
+ "grad_norm": 6.998920440673828,
148
+ "learning_rate": 3.9166666666666665e-05,
149
+ "loss": 0.9053,
150
  "step": 95
151
  },
152
  {
153
+ "epoch": 0.035555555555555556,
154
+ "grad_norm": 7.322479248046875,
155
+ "learning_rate": 4.125e-05,
156
+ "loss": 0.9217,
157
  "step": 100
158
  },
159
  {
160
+ "epoch": 0.037333333333333336,
161
+ "grad_norm": 8.45313549041748,
162
+ "learning_rate": 4.3333333333333334e-05,
163
+ "loss": 0.9077,
164
  "step": 105
165
  },
166
  {
167
+ "epoch": 0.03911111111111111,
168
+ "grad_norm": 10.838536262512207,
169
+ "learning_rate": 4.541666666666667e-05,
170
+ "loss": 0.9066,
171
  "step": 110
172
  },
173
  {
174
+ "epoch": 0.04088888888888889,
175
+ "grad_norm": 9.282814979553223,
176
+ "learning_rate": 4.75e-05,
177
+ "loss": 0.8939,
178
  "step": 115
179
  },
180
  {
181
+ "epoch": 0.042666666666666665,
182
+ "grad_norm": 5.256754398345947,
183
+ "learning_rate": 4.958333333333334e-05,
184
+ "loss": 0.8755,
185
  "step": 120
186
  },
187
  {
188
+ "epoch": 0.044444444444444446,
189
+ "grad_norm": 6.6123552322387695,
190
+ "learning_rate": 5.166666666666667e-05,
191
+ "loss": 0.8843,
192
  "step": 125
193
  },
194
  {
195
+ "epoch": 0.04622222222222222,
196
+ "grad_norm": 6.317594528198242,
197
+ "learning_rate": 5.375e-05,
198
+ "loss": 0.8802,
199
  "step": 130
200
  },
201
  {
202
+ "epoch": 0.048,
203
+ "grad_norm": 4.420418739318848,
204
+ "learning_rate": 5.583333333333334e-05,
205
+ "loss": 0.8716,
206
  "step": 135
207
  },
208
  {
209
+ "epoch": 0.049777777777777775,
210
+ "grad_norm": 7.11093282699585,
211
+ "learning_rate": 5.7916666666666674e-05,
212
+ "loss": 0.8944,
213
  "step": 140
214
  },
215
  {
216
+ "epoch": 0.051555555555555556,
217
+ "grad_norm": 8.643278121948242,
218
+ "learning_rate": 6e-05,
219
+ "loss": 0.9049,
220
  "step": 145
221
  },
222
  {
223
+ "epoch": 0.05333333333333334,
224
+ "grad_norm": 5.504462718963623,
225
+ "learning_rate": 6.208333333333334e-05,
226
+ "loss": 0.9215,
227
  "step": 150
228
  },
229
  {
230
+ "epoch": 0.05511111111111111,
231
+ "grad_norm": 4.5625200271606445,
232
+ "learning_rate": 6.416666666666668e-05,
233
+ "loss": 0.8763,
234
  "step": 155
235
  },
236
  {
237
+ "epoch": 0.05688888888888889,
238
+ "grad_norm": 4.5830397605896,
239
+ "learning_rate": 6.625e-05,
240
+ "loss": 0.8967,
241
  "step": 160
242
  },
243
  {
244
+ "epoch": 0.058666666666666666,
245
+ "grad_norm": 5.370687961578369,
246
+ "learning_rate": 6.833333333333333e-05,
247
+ "loss": 0.868,
248
  "step": 165
249
  },
250
  {
251
+ "epoch": 0.060444444444444446,
252
+ "grad_norm": 8.188835144042969,
253
+ "learning_rate": 7.041666666666668e-05,
254
+ "loss": 0.8853,
255
  "step": 170
256
  },
257
  {
258
+ "epoch": 0.06222222222222222,
259
+ "grad_norm": 3.952087163925171,
260
+ "learning_rate": 7.25e-05,
261
+ "loss": 0.8724,
262
  "step": 175
263
  },
264
  {
265
+ "epoch": 0.064,
266
+ "grad_norm": 4.194353103637695,
267
+ "learning_rate": 7.458333333333333e-05,
268
+ "loss": 0.8581,
269
  "step": 180
270
  },
271
  {
272
+ "epoch": 0.06577777777777778,
273
+ "grad_norm": 2.985386610031128,
274
+ "learning_rate": 7.666666666666667e-05,
275
+ "loss": 0.8496,
276
  "step": 185
277
  },
278
  {
279
+ "epoch": 0.06755555555555555,
280
+ "grad_norm": 5.666004657745361,
281
+ "learning_rate": 7.875e-05,
282
+ "loss": 0.8816,
283
  "step": 190
284
  },
285
  {
286
+ "epoch": 0.06933333333333333,
287
+ "grad_norm": 3.95521879196167,
288
+ "learning_rate": 8.083333333333334e-05,
289
+ "loss": 0.8872,
290
  "step": 195
291
  },
292
  {
293
+ "epoch": 0.07111111111111111,
294
+ "grad_norm": 4.558910369873047,
295
+ "learning_rate": 8.291666666666667e-05,
296
+ "loss": 0.8802,
297
  "step": 200
298
  },
299
  {
300
+ "epoch": 0.07111111111111111,
301
+ "eval_loss": 0.8564087748527527,
302
+ "eval_runtime": 155.7786,
303
+ "eval_samples_per_second": 32.097,
304
+ "eval_steps_per_second": 4.012,
305
  "step": 200
306
  },
307
  {
308
+ "epoch": 0.07288888888888889,
309
+ "grad_norm": 2.4701411724090576,
310
+ "learning_rate": 8.5e-05,
311
+ "loss": 0.8383,
312
  "step": 205
313
  },
314
  {
315
+ "epoch": 0.07466666666666667,
316
+ "grad_norm": 4.364571571350098,
317
+ "learning_rate": 8.708333333333334e-05,
318
+ "loss": 0.8763,
319
  "step": 210
320
  },
321
  {
322
+ "epoch": 0.07644444444444444,
323
+ "grad_norm": 4.059802532196045,
324
+ "learning_rate": 8.916666666666667e-05,
325
+ "loss": 0.8928,
326
  "step": 215
327
  },
328
  {
329
+ "epoch": 0.07822222222222222,
330
+ "grad_norm": 7.405764579772949,
331
+ "learning_rate": 9.125e-05,
332
+ "loss": 0.8619,
333
  "step": 220
334
  },
335
  {
336
+ "epoch": 0.08,
337
+ "grad_norm": 4.007632732391357,
338
+ "learning_rate": 9.333333333333334e-05,
339
+ "loss": 0.9656,
340
  "step": 225
341
  },
342
  {
343
+ "epoch": 0.08177777777777778,
344
+ "grad_norm": 6.396026611328125,
345
+ "learning_rate": 9.541666666666668e-05,
346
+ "loss": 0.9084,
347
  "step": 230
348
  },
349
  {
350
+ "epoch": 0.08355555555555555,
351
+ "grad_norm": 4.630360126495361,
352
+ "learning_rate": 9.75e-05,
353
+ "loss": 0.8617,
354
  "step": 235
355
  },
356
  {
357
+ "epoch": 0.08533333333333333,
358
+ "grad_norm": 2.987304925918579,
359
+ "learning_rate": 9.958333333333335e-05,
360
+ "loss": 0.8696,
361
  "step": 240
362
  },
363
  {
364
+ "epoch": 0.08711111111111111,
365
+ "grad_norm": 3.981341600418091,
366
+ "learning_rate": 9.999915384288722e-05,
367
+ "loss": 0.8412,
368
  "step": 245
369
  },
370
  {
371
+ "epoch": 0.08888888888888889,
372
+ "grad_norm": 2.754917860031128,
373
+ "learning_rate": 9.999571637870036e-05,
374
+ "loss": 0.8526,
375
  "step": 250
376
  },
377
  {
378
+ "epoch": 0.09066666666666667,
379
+ "grad_norm": 2.6841213703155518,
380
+ "learning_rate": 9.998963490426943e-05,
381
+ "loss": 0.853,
382
  "step": 255
383
  },
384
  {
385
+ "epoch": 0.09244444444444444,
386
+ "grad_norm": 3.0342020988464355,
387
+ "learning_rate": 9.998090974121159e-05,
388
+ "loss": 0.8551,
389
  "step": 260
390
  },
391
  {
392
+ "epoch": 0.09422222222222222,
393
+ "grad_norm": 3.418090343475342,
394
+ "learning_rate": 9.99695413509548e-05,
395
+ "loss": 0.8315,
396
  "step": 265
397
  },
398
  {
399
+ "epoch": 0.096,
400
+ "grad_norm": 2.6049306392669678,
401
+ "learning_rate": 9.995553033471335e-05,
402
+ "loss": 0.8239,
403
  "step": 270
404
  },
405
  {
406
+ "epoch": 0.09777777777777778,
407
+ "grad_norm": 4.258927345275879,
408
+ "learning_rate": 9.993887743345614e-05,
409
+ "loss": 0.84,
410
  "step": 275
411
  },
412
  {
413
+ "epoch": 0.09955555555555555,
414
+ "grad_norm": 3.5856103897094727,
415
+ "learning_rate": 9.991958352786744e-05,
416
+ "loss": 0.8397,
417
  "step": 280
418
  },
419
  {
420
+ "epoch": 0.10133333333333333,
421
+ "grad_norm": 4.18107271194458,
422
+ "learning_rate": 9.989764963830037e-05,
423
+ "loss": 0.8283,
424
  "step": 285
425
  },
426
  {
427
+ "epoch": 0.10311111111111111,
428
+ "grad_norm": 2.9539637565612793,
429
+ "learning_rate": 9.987307692472287e-05,
430
+ "loss": 0.8315,
431
  "step": 290
432
  },
433
  {
434
+ "epoch": 0.10488888888888889,
435
+ "grad_norm": 2.6121134757995605,
436
+ "learning_rate": 9.98458666866564e-05,
437
+ "loss": 0.8203,
438
  "step": 295
439
  },
440
  {
441
+ "epoch": 0.10666666666666667,
442
+ "grad_norm": 3.4740283489227295,
443
+ "learning_rate": 9.98160203631072e-05,
444
+ "loss": 0.83,
445
  "step": 300
446
  },
447
  {
448
+ "epoch": 0.10844444444444444,
449
+ "grad_norm": 3.486816167831421,
450
+ "learning_rate": 9.978353953249022e-05,
451
+ "loss": 0.8269,
452
  "step": 305
453
  },
454
  {
455
+ "epoch": 0.11022222222222222,
456
+ "grad_norm": 2.7455246448516846,
457
+ "learning_rate": 9.974842591254558e-05,
458
+ "loss": 0.8332,
459
  "step": 310
460
  },
461
  {
462
+ "epoch": 0.112,
463
+ "grad_norm": 2.8629767894744873,
464
+ "learning_rate": 9.971068136024781e-05,
465
+ "loss": 0.8305,
466
  "step": 315
467
  },
468
  {
469
+ "epoch": 0.11377777777777778,
470
+ "grad_norm": 2.647754192352295,
471
+ "learning_rate": 9.967030787170757e-05,
472
+ "loss": 0.8213,
473
  "step": 320
474
  },
475
  {
476
+ "epoch": 0.11555555555555555,
477
+ "grad_norm": 2.873353958129883,
478
+ "learning_rate": 9.962730758206611e-05,
479
+ "loss": 0.8269,
480
  "step": 325
481
  },
482
  {
483
+ "epoch": 0.11733333333333333,
484
+ "grad_norm": 1.9501383304595947,
485
+ "learning_rate": 9.95816827653824e-05,
486
+ "loss": 0.8001,
487
  "step": 330
488
  },
489
  {
490
+ "epoch": 0.11911111111111111,
491
+ "grad_norm": 2.3588831424713135,
492
+ "learning_rate": 9.95334358345128e-05,
493
+ "loss": 0.7965,
494
  "step": 335
495
  },
496
  {
497
+ "epoch": 0.12088888888888889,
498
+ "grad_norm": 1.9669915437698364,
499
+ "learning_rate": 9.948256934098352e-05,
500
+ "loss": 0.7949,
501
  "step": 340
502
  },
503
  {
504
+ "epoch": 0.12266666666666666,
505
+ "grad_norm": 2.3287253379821777,
506
+ "learning_rate": 9.942908597485558e-05,
507
+ "loss": 0.8312,
508
  "step": 345
509
  },
510
  {
511
+ "epoch": 0.12444444444444444,
512
+ "grad_norm": 3.263697385787964,
513
+ "learning_rate": 9.93729885645827e-05,
514
+ "loss": 0.8305,
515
  "step": 350
516
  },
517
  {
518
+ "epoch": 0.12622222222222224,
519
+ "grad_norm": 1.87248694896698,
520
+ "learning_rate": 9.931428007686158e-05,
521
+ "loss": 0.8292,
522
  "step": 355
523
  },
524
  {
525
+ "epoch": 0.128,
526
+ "grad_norm": 2.7504541873931885,
527
+ "learning_rate": 9.925296361647504e-05,
528
+ "loss": 0.8285,
529
  "step": 360
530
  },
531
  {
532
+ "epoch": 0.12977777777777777,
533
+ "grad_norm": 2.169858694076538,
534
+ "learning_rate": 9.918904242612795e-05,
535
+ "loss": 0.8166,
536
  "step": 365
537
  },
538
  {
539
+ "epoch": 0.13155555555555556,
540
+ "grad_norm": 2.2024645805358887,
541
+ "learning_rate": 9.912251988627549e-05,
542
+ "loss": 0.7927,
543
  "step": 370
544
  },
545
  {
546
+ "epoch": 0.13333333333333333,
547
+ "grad_norm": 2.103611946105957,
548
+ "learning_rate": 9.905339951494463e-05,
549
+ "loss": 0.8236,
550
  "step": 375
551
  },
552
  {
553
+ "epoch": 0.1351111111111111,
554
+ "grad_norm": 2.265293836593628,
555
+ "learning_rate": 9.898168496754794e-05,
556
+ "loss": 0.7926,
557
  "step": 380
558
  },
559
  {
560
+ "epoch": 0.1368888888888889,
561
+ "grad_norm": 1.8098556995391846,
562
+ "learning_rate": 9.890738003669029e-05,
563
+ "loss": 0.812,
564
  "step": 385
565
  },
566
  {
567
+ "epoch": 0.13866666666666666,
568
+ "grad_norm": 1.6579197645187378,
569
+ "learning_rate": 9.88304886519683e-05,
570
+ "loss": 0.7933,
571
  "step": 390
572
  },
573
  {
574
+ "epoch": 0.14044444444444446,
575
+ "grad_norm": 1.664461612701416,
576
+ "learning_rate": 9.875101487976253e-05,
577
+ "loss": 0.798,
578
  "step": 395
579
  },
580
  {
581
+ "epoch": 0.14222222222222222,
582
+ "grad_norm": 1.6052963733673096,
583
+ "learning_rate": 9.866896292302243e-05,
584
+ "loss": 0.7937,
585
  "step": 400
586
  },
587
  {
588
+ "epoch": 0.14222222222222222,
589
+ "eval_loss": 0.791572093963623,
590
+ "eval_runtime": 159.2149,
591
+ "eval_samples_per_second": 31.404,
592
+ "eval_steps_per_second": 3.926,
593
  "step": 400
594
  },
595
  {
596
+ "epoch": 0.144,
597
+ "grad_norm": 2.126084566116333,
598
+ "learning_rate": 9.858433712104403e-05,
599
+ "loss": 0.8188,
600
  "step": 405
601
  },
602
  {
603
+ "epoch": 0.14577777777777778,
604
+ "grad_norm": 3.2941622734069824,
605
+ "learning_rate": 9.849714194924046e-05,
606
+ "loss": 0.8067,
607
  "step": 410
608
  },
609
  {
610
+ "epoch": 0.14755555555555555,
611
+ "grad_norm": 1.658234715461731,
612
+ "learning_rate": 9.84073820189054e-05,
613
+ "loss": 0.7953,
614
  "step": 415
615
  },
616
  {
617
+ "epoch": 0.14933333333333335,
618
+ "grad_norm": 2.6132164001464844,
619
+ "learning_rate": 9.831506207696898e-05,
620
+ "loss": 0.8044,
621
  "step": 420
622
  },
623
  {
624
+ "epoch": 0.1511111111111111,
625
+ "grad_norm": 1.6197243928909302,
626
+ "learning_rate": 9.822018700574695e-05,
627
+ "loss": 0.7818,
628
  "step": 425
629
  },
630
  {
631
+ "epoch": 0.15288888888888888,
632
+ "grad_norm": 2.1293976306915283,
633
+ "learning_rate": 9.812276182268236e-05,
634
+ "loss": 0.7796,
635
  "step": 430
636
  },
637
  {
638
+ "epoch": 0.15466666666666667,
639
+ "grad_norm": 2.590989589691162,
640
+ "learning_rate": 9.802279168008029e-05,
641
+ "loss": 0.7903,
642
  "step": 435
643
  },
644
  {
645
+ "epoch": 0.15644444444444444,
646
+ "grad_norm": 1.674521803855896,
647
+ "learning_rate": 9.792028186483526e-05,
648
+ "loss": 0.7772,
649
  "step": 440
650
  },
651
  {
652
+ "epoch": 0.1582222222222222,
653
+ "grad_norm": 2.3836069107055664,
654
+ "learning_rate": 9.781523779815179e-05,
655
+ "loss": 0.7934,
656
  "step": 445
657
  },
658
  {
659
+ "epoch": 0.16,
660
+ "grad_norm": 2.3944199085235596,
661
+ "learning_rate": 9.770766503525754e-05,
662
+ "loss": 0.7932,
663
  "step": 450
664
  },
665
  {
666
+ "epoch": 0.16177777777777777,
667
+ "grad_norm": 2.112563371658325,
668
+ "learning_rate": 9.759756926510965e-05,
669
+ "loss": 0.7873,
670
  "step": 455
671
  },
672
  {
673
+ "epoch": 0.16355555555555557,
674
+ "grad_norm": 2.041534185409546,
675
+ "learning_rate": 9.748495631009386e-05,
676
+ "loss": 0.796,
677
  "step": 460
678
  },
679
  {
680
+ "epoch": 0.16533333333333333,
681
+ "grad_norm": 1.7045772075653076,
682
+ "learning_rate": 9.736983212571646e-05,
683
+ "loss": 0.7791,
684
  "step": 465
685
  },
686
  {
687
+ "epoch": 0.1671111111111111,
688
+ "grad_norm": 1.5439887046813965,
689
+ "learning_rate": 9.725220280028957e-05,
690
+ "loss": 0.7939,
691
  "step": 470
692
  },
693
  {
694
+ "epoch": 0.1688888888888889,
695
+ "grad_norm": 1.459672451019287,
696
+ "learning_rate": 9.713207455460894e-05,
697
+ "loss": 0.7749,
698
  "step": 475
699
  },
700
  {
701
+ "epoch": 0.17066666666666666,
702
+ "grad_norm": 3.114187240600586,
703
+ "learning_rate": 9.700945374162506e-05,
704
+ "loss": 0.7785,
705
  "step": 480
706
  },
707
  {
708
+ "epoch": 0.17244444444444446,
709
+ "grad_norm": 1.7480342388153076,
710
+ "learning_rate": 9.688434684610726e-05,
711
+ "loss": 0.7653,
712
  "step": 485
713
  },
714
  {
715
+ "epoch": 0.17422222222222222,
716
+ "grad_norm": 1.854999303817749,
717
+ "learning_rate": 9.67567604843006e-05,
718
+ "loss": 0.7878,
719
  "step": 490
720
  },
721
  {
722
+ "epoch": 0.176,
723
+ "grad_norm": 2.006537437438965,
724
+ "learning_rate": 9.662670140357611e-05,
725
+ "loss": 0.7851,
726
  "step": 495
727
  },
728
  {
729
+ "epoch": 0.17777777777777778,
730
+ "grad_norm": 1.9404226541519165,
731
+ "learning_rate": 9.649417648207388e-05,
732
+ "loss": 0.7719,
 
 
 
 
 
 
 
 
  "step": 500
  },
  {
+ "epoch": 0.17955555555555555,
+ "grad_norm": 1.7404245138168335,
+ "learning_rate": 9.635919272833938e-05,
+ "loss": 0.775,
  "step": 505
  },
  {
+ "epoch": 0.18133333333333335,
+ "grad_norm": 1.4806632995605469,
+ "learning_rate": 9.622175728095271e-05,
+ "loss": 0.7822,
  "step": 510
  },
  {
+ "epoch": 0.1831111111111111,
+ "grad_norm": 1.607060432434082,
+ "learning_rate": 9.60818774081512e-05,
+ "loss": 0.7822,
  "step": 515
  },
  {
+ "epoch": 0.18488888888888888,
+ "grad_norm": 1.6430386304855347,
+ "learning_rate": 9.593956050744492e-05,
+ "loss": 0.7711,
  "step": 520
  },
  {
+ "epoch": 0.18666666666666668,
+ "grad_norm": 2.3202788829803467,
+ "learning_rate": 9.579481410522556e-05,
+ "loss": 0.7839,
  "step": 525
  },
  {
+ "epoch": 0.18844444444444444,
+ "grad_norm": 2.160609722137451,
+ "learning_rate": 9.564764585636833e-05,
+ "loss": 0.7854,
  "step": 530
  },
  {
+ "epoch": 0.1902222222222222,
+ "grad_norm": 1.7440357208251953,
+ "learning_rate": 9.549806354382717e-05,
+ "loss": 0.7806,
  "step": 535
  },
  {
+ "epoch": 0.192,
+ "grad_norm": 1.8481121063232422,
+ "learning_rate": 9.534607507822313e-05,
+ "loss": 0.7701,
  "step": 540
  },
  {
+ "epoch": 0.19377777777777777,
+ "grad_norm": 1.9447892904281616,
+ "learning_rate": 9.519168849742604e-05,
+ "loss": 0.772,
  "step": 545
  },
  {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 2.9007174968719482,
+ "learning_rate": 9.503491196612939e-05,
+ "loss": 0.7486,
  "step": 550
  },
  {
+ "epoch": 0.19733333333333333,
+ "grad_norm": 1.5981870889663696,
+ "learning_rate": 9.487575377541864e-05,
+ "loss": 0.7713,
  "step": 555
  },
  {
+ "epoch": 0.1991111111111111,
+ "grad_norm": 1.6032360792160034,
+ "learning_rate": 9.471422234233259e-05,
+ "loss": 0.7596,
  "step": 560
  },
  {
+ "epoch": 0.2008888888888889,
+ "grad_norm": 1.7337145805358887,
+ "learning_rate": 9.45503262094184e-05,
+ "loss": 0.7725,
  "step": 565
  },
  {
+ "epoch": 0.20266666666666666,
+ "grad_norm": 1.7922943830490112,
+ "learning_rate": 9.438407404427971e-05,
+ "loss": 0.7646,
  "step": 570
  },
  {
+ "epoch": 0.20444444444444446,
+ "grad_norm": 1.2763404846191406,
+ "learning_rate": 9.421547463911835e-05,
+ "loss": 0.7744,
  "step": 575
  },
  {
+ "epoch": 0.20622222222222222,
+ "grad_norm": 1.6107685565948486,
+ "learning_rate": 9.404453691026929e-05,
+ "loss": 0.7854,
  "step": 580
  },
  {
+ "epoch": 0.208,
+ "grad_norm": 1.531690239906311,
+ "learning_rate": 9.38712698977291e-05,
+ "loss": 0.7765,
  "step": 585
  },
  {
+ "epoch": 0.20977777777777779,
+ "grad_norm": 4.303262710571289,
+ "learning_rate": 9.369568276467797e-05,
+ "loss": 0.7451,
  "step": 590
  },
  {
+ "epoch": 0.21155555555555555,
+ "grad_norm": 1.4451102018356323,
+ "learning_rate": 9.351778479699499e-05,
+ "loss": 0.767,
  "step": 595
  },
  {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.9426709413528442,
+ "learning_rate": 9.333758540276716e-05,
+ "loss": 0.7611,
  "step": 600
  },
  {
+ "epoch": 0.21333333333333335,
+ "eval_loss": 0.7643172144889832,
+ "eval_runtime": 149.0272,
+ "eval_samples_per_second": 33.551,
+ "eval_steps_per_second": 4.194,
  "step": 600
  },
  {
+ "epoch": 0.21511111111111111,
+ "grad_norm": 1.740432858467102,
+ "learning_rate": 9.315509411179182e-05,
+ "loss": 0.763,
  "step": 605
  },
  {
+ "epoch": 0.21688888888888888,
+ "grad_norm": 2.1427745819091797,
+ "learning_rate": 9.297032057507264e-05,
+ "loss": 0.7717,
  "step": 610
  },
  {
+ "epoch": 0.21866666666666668,
+ "grad_norm": 2.2643210887908936,
+ "learning_rate": 9.278327456430926e-05,
+ "loss": 0.7917,
  "step": 615
  },
  {
+ "epoch": 0.22044444444444444,
+ "grad_norm": 1.9076204299926758,
+ "learning_rate": 9.259396597138052e-05,
+ "loss": 0.7637,
  "step": 620
  },
  {
+ "epoch": 0.2222222222222222,
+ "grad_norm": 1.2893517017364502,
+ "learning_rate": 9.24024048078213e-05,
+ "loss": 0.7435,
  "step": 625
  },
  {
+ "epoch": 0.224,
+ "grad_norm": 1.9294421672821045,
+ "learning_rate": 9.22086012042931e-05,
+ "loss": 0.7446,
  "step": 630
  },
  {
+ "epoch": 0.22577777777777777,
+ "grad_norm": 1.516155481338501,
+ "learning_rate": 9.201256541004829e-05,
+ "loss": 0.7608,
  "step": 635
  },
  {
+ "epoch": 0.22755555555555557,
+ "grad_norm": 1.4733030796051025,
+ "learning_rate": 9.181430779238797e-05,
+ "loss": 0.7708,
  "step": 640
  },
  {
+ "epoch": 0.22933333333333333,
+ "grad_norm": 1.7201124429702759,
+ "learning_rate": 9.16138388361139e-05,
+ "loss": 0.7475,
  "step": 645
  },
  {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.573810338973999,
+ "learning_rate": 9.141116914297378e-05,
+ "loss": 0.7782,
  "step": 650
  },
  {
+ "epoch": 0.2328888888888889,
+ "grad_norm": 1.3574259281158447,
+ "learning_rate": 9.120630943110077e-05,
+ "loss": 0.7406,
  "step": 655
  },
  {
+ "epoch": 0.23466666666666666,
+ "grad_norm": 1.912653923034668,
+ "learning_rate": 9.099927053444662e-05,
+ "loss": 0.7462,
  "step": 660
  },
  {
+ "epoch": 0.23644444444444446,
+ "grad_norm": 2.2716944217681885,
+ "learning_rate": 9.079006340220862e-05,
+ "loss": 0.7526,
  "step": 665
  },
  {
+ "epoch": 0.23822222222222222,
+ "grad_norm": 1.625752329826355,
+ "learning_rate": 9.057869909825062e-05,
+ "loss": 0.762,
  "step": 670
  },
  {
+ "epoch": 0.24,
+ "grad_norm": 1.2941769361495972,
+ "learning_rate": 9.0365188800518e-05,
+ "loss": 0.7544,
  "step": 675
  },
  {
+ "epoch": 0.24177777777777779,
+ "grad_norm": 1.6075447797775269,
+ "learning_rate": 9.01495438004464e-05,
+ "loss": 0.7639,
  "step": 680
  },
  {
+ "epoch": 0.24355555555555555,
+ "grad_norm": 1.671107292175293,
+ "learning_rate": 8.993177550236464e-05,
+ "loss": 0.7567,
  "step": 685
  },
  {
+ "epoch": 0.24533333333333332,
+ "grad_norm": 1.3904489278793335,
+ "learning_rate": 8.971189542289162e-05,
+ "loss": 0.7633,
  "step": 690
  },
  {
+ "epoch": 0.24711111111111111,
+ "grad_norm": 1.997207760810852,
+ "learning_rate": 8.948991519032716e-05,
+ "loss": 0.7403,
  "step": 695
  },
  {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.5448390245437622,
+ "learning_rate": 8.926584654403724e-05,
+ "loss": 0.7424,
  "step": 700
  },
  {
+ "epoch": 0.25066666666666665,
+ "grad_norm": 1.760542392730713,
+ "learning_rate": 8.903970133383297e-05,
+ "loss": 0.7436,
  "step": 705
  },
  {
+ "epoch": 0.25244444444444447,
+ "grad_norm": 1.6473764181137085,
+ "learning_rate": 8.881149151934398e-05,
+ "loss": 0.7569,
  "step": 710
  },
  {
+ "epoch": 0.25422222222222224,
+ "grad_norm": 1.2679284811019897,
+ "learning_rate": 8.858122916938601e-05,
+ "loss": 0.7556,
  "step": 715
  },
  {
+ "epoch": 0.256,
+ "grad_norm": 1.3798352479934692,
+ "learning_rate": 8.834892646132254e-05,
+ "loss": 0.7446,
  "step": 720
  },
  {
+ "epoch": 0.2577777777777778,
+ "grad_norm": 1.692984700202942,
+ "learning_rate": 8.811459568042091e-05,
+ "loss": 0.7695,
  "step": 725
  },
  {
+ "epoch": 0.25955555555555554,
+ "grad_norm": 1.6680200099945068,
+ "learning_rate": 8.787824921920249e-05,
+ "loss": 0.7462,
  "step": 730
  },
  {
+ "epoch": 0.2613333333333333,
+ "grad_norm": 1.4423996210098267,
+ "learning_rate": 8.763989957678742e-05,
+ "loss": 0.7637,
  "step": 735
  },
  {
+ "epoch": 0.26311111111111113,
+ "grad_norm": 1.6775215864181519,
+ "learning_rate": 8.739955935823351e-05,
+ "loss": 0.755,
  "step": 740
  },
  {
+ "epoch": 0.2648888888888889,
+ "grad_norm": 1.6499780416488647,
+ "learning_rate": 8.715724127386972e-05,
+ "loss": 0.7468,
  "step": 745
  },
  {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.3906657695770264,
+ "learning_rate": 8.691295813862386e-05,
+ "loss": 0.7458,
  "step": 750
  },
  {
+ "epoch": 0.26844444444444443,
+ "grad_norm": 1.3843157291412354,
+ "learning_rate": 8.666672287134494e-05,
+ "loss": 0.7461,
  "step": 755
  },
  {
+ "epoch": 0.2702222222222222,
+ "grad_norm": 1.6778830289840698,
+ "learning_rate": 8.641854849412001e-05,
+ "loss": 0.7284,
  "step": 760
  },
  {
+ "epoch": 0.272,
+ "grad_norm": 1.2424935102462769,
+ "learning_rate": 8.61684481315854e-05,
+ "loss": 0.7774,
  "step": 765
  },
  {
+ "epoch": 0.2737777777777778,
+ "grad_norm": 1.291231393814087,
+ "learning_rate": 8.591643501023265e-05,
+ "loss": 0.7428,
  "step": 770
  },
  {
+ "epoch": 0.27555555555555555,
+ "grad_norm": 1.12603759765625,
+ "learning_rate": 8.566252245770909e-05,
+ "loss": 0.7377,
  "step": 775
  },
  {
+ "epoch": 0.2773333333333333,
+ "grad_norm": 1.348775029182434,
+ "learning_rate": 8.54067239021129e-05,
+ "loss": 0.762,
  "step": 780
  },
  {
+ "epoch": 0.2791111111111111,
+ "grad_norm": 1.703519582748413,
+ "learning_rate": 8.51490528712831e-05,
+ "loss": 0.7533,
  "step": 785
  },
  {
+ "epoch": 0.2808888888888889,
+ "grad_norm": 1.8479610681533813,
+ "learning_rate": 8.488952299208401e-05,
+ "loss": 0.7535,
  "step": 790
  },
  {
+ "epoch": 0.2826666666666667,
+ "grad_norm": 1.1511187553405762,
+ "learning_rate": 8.462814798968472e-05,
+ "loss": 0.7555,
  "step": 795
  },
  {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 2.5522501468658447,
+ "learning_rate": 8.43649416868331e-05,
+ "loss": 0.741,
  "step": 800
  },
  {
+ "epoch": 0.28444444444444444,
+ "eval_loss": 0.7418414950370789,
+ "eval_runtime": 169.2643,
+ "eval_samples_per_second": 29.54,
+ "eval_steps_per_second": 3.692,
  "step": 800
  }
  ],
  "logging_steps": 5,
+ "max_steps": 2400,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
+ "save_steps": 200,
  "stateful_callbacks": {
  "TrainerControl": {
  "args": {

  "should_evaluate": false,
  "should_log": false,
  "should_save": true,
+ "should_training_stop": false
  },
  "attributes": {}
  }
  },
+ "total_flos": 775521753600000.0,
+ "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1f301dabc963ac8802177bfd738213a0cc9f22b48633c155ad395ae77124e6c7
- size 5368
+ oid sha256:e74b0f4621a0feddce935bce6008dfc021ab4f3f6753b47ccab1c7dbe33fc776
+ size 5841