Time Series Forecasting
Safetensors
granite_tsfm
tinytimemixer
ttm4hvac
tsfm
digital twin
hvac
energy
Ferran Aran committed on
Commit 71fbc29 · unverified · 0 Parent(s)

initial commit

Files changed (9)
  1. .gitattributes +35 -0
  2. README.md +219 -0
  3. config.json +74 -0
  4. model.safetensors +3 -0
  5. optimizer.pt +3 -0
  6. rng_state.pth +3 -0
  7. scheduler.pt +3 -0
  8. trainer_state.json +2563 -0
  9. training_args.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
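A quick way to see which files in this commit fall under the LFS rules above is to test their names against the patterns. The sketch below is only an approximation: it uses Python's `fnmatch` shell-style globbing rather than Git's own attribute matching, copies just four of the patterns, and takes the file names from the "Files changed" list. The helper name `is_lfs_tracked` is illustrative.

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.pt", "*.pth", "*.safetensors"]

# Files added in this commit, per the "Files changed" list.
COMMIT_FILES = [
    ".gitattributes",
    "README.md",
    "config.json",
    "model.safetensors",
    "optimizer.pt",
    "rng_state.pth",
    "scheduler.pt",
    "trainer_state.json",
    "training_args.bin",
]


def is_lfs_tracked(filename, patterns=LFS_PATTERNS):
    """Approximate gitattributes matching with shell-style globbing."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


lfs_files = [name for name in COMMIT_FILES if is_lfs_tracked(name)]
print(lfs_files)
```

Under this approximation, the five binary artifacts match an LFS pattern, which is consistent with their diffs showing three-line pointer files rather than content.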
README.md ADDED
@@ -0,0 +1,219 @@
1
+ ---
2
+ library_name: granite_tsfm
3
+ base_model: ibm-granite/granite-timeseries-ttm-r2
4
+ tags:
5
+ - ttm4hvac
6
+ - tsfm
7
+ - digital twin
8
+ - hvac
9
+ - energy
10
+ license: apache-2.0
11
+ papers:
12
+ - title: "Transfer learning of building dynamics digital twin for HVAC control with Time-series Foundation Model"
13
+   url: https://arxiv.org/abs/XXXX.XXXXX
14
+   authors: "Ferran Aran Domingo"
15
+ datasets:
16
+ - gft/ttm4hvac-source-all-train
17
+ - gft/ttm4hvac-target-heat-test
18
+ - gft/ttm4hvac-target-cool-test
19
+ pipeline_tag: time-series-forecasting
20
+ ---
21
+
22
+ # TTM4HVAC – TinyTimeMixer for HVAC dynamics modeling
23
+
24
+ This repository contains the **primary and recommended checkpoint** of the **TTM4HVAC** project: a fine-tuned version of IBM's *TinyTimeMixer* designed to serve as a generic digital twin of building dynamics under an HVAC system.
25
+
26
+ This model corresponds to the **“source-all”** training configuration (all source buildings, full dataset), and it achieves the best overall performance across the TTM4HVAC evaluation benchmarks.
27
+
28
+ Read more in the paper: [arXiv:XXXX.XXXXX]() (to be released).
29
+
30
+ ---
31
+
32
+ # 🔧 Installation
33
+
34
+ The model uses IBM’s Granite Time Series Foundation Model tooling, available from PyPI:
35
+
36
+ ```bash
37
+ pip install granite-tsfm==0.3.1
38
+ ```
39
+
40
+ This installs the `tsfm_public` package containing:
41
+
42
+ * `TinyTimeMixerForPrediction`
43
+ * `TimeSeriesPreprocessor`
44
+ * `TimeSeriesForecastingPipeline`
45
+ * dataset utilities
46
+
47
+ ---
48
+
49
+ # 🚀 Quickstart
50
+
51
+ This example loads the model directly from Hugging Face and performs:
52
+
53
+ 1. Data preprocessing
54
+ 2. Zero-shot evaluation
55
+ 3. Forecast generation
56
+
57
+ ```python
58
+ import pandas as pd
59
+ import torch
60
+
61
+ from tsfm_public import (
62
+     TinyTimeMixerForPrediction,
63
+     TimeSeriesPreprocessor,
64
+     TimeSeriesForecastingPipeline,
65
+     get_datasets,
66
+ )
67
+ from tsfm_public.toolkit.time_series_preprocessor import prepare_data_splits
68
+
69
+ MODEL_ID = "gft/ttm4hvac"
70
+ device = "cuda" if torch.cuda.is_available() else "cpu"
71
+
72
+ TARGETS = [
73
+     "Room Air Temperature (C)",
74
+     "HVAC Power Consumption (W)"
75
+ ]
76
+ OBSERVABLES = [
77
+     "Outdoor Air Temperature (C)",
78
+     "Outdoor Humidity (%)",
79
+     "Wind Speed (m/s)",
80
+     "Direct Solar Radiation (W/m^2)",
81
+ ]
82
+ CONTROLS = [
83
+     "Heating Setpoint (C)",
84
+     "Cooling Setpoint (C)"
85
+ ]
86
+ ID_COLUMNS = []
87
+ TIMESTAMP_COLUMN = "time"
88
+
89
+ BATCH_SIZE = 32
90
+ SPLIT_CONFIG = {"train": 0.35, "test": 0.25}  # val is inferred
91
+
92
+
93
+ def run_inference(df: pd.DataFrame, model_id: str = MODEL_ID):
94
+     # 1) Load the fine-tuned TinyTimeMixer model from Hugging Face
95
+     model = TinyTimeMixerForPrediction.from_pretrained(model_id)
96
+     model.to(device)
97
+
98
+     context_length = model.config.context_length
99
+     prediction_length = model.config.prediction_length
100
+
101
+     # 2) Build the preprocessor
102
+     tsp = TimeSeriesPreprocessor(
103
+         timestamp_column=TIMESTAMP_COLUMN,
104
+         target_columns=TARGETS,
105
+         control_columns=CONTROLS,
106
+         observable_columns=OBSERVABLES,
107
+         id_columns=ID_COLUMNS,
108
+         context_length=context_length,
109
+         prediction_length=prediction_length,
110
+         scaling=True,
111
+         freq="15min",
112
+         encode_categorical=False,
113
+         scaler_type="standard",
114
+     )
115
+
116
+     # 3) Prepare test split
117
+     _, _, df_test = prepare_data_splits(
118
+         df, context_length=context_length, split_config=SPLIT_CONFIG
119
+     )
120
+
121
+     # 4) Build the forecasting pipeline
122
+     pipeline = TimeSeriesForecastingPipeline(
123
+         model,
124
+         device=device,
125
+         feature_extractor=tsp,
126
+         batch_size=BATCH_SIZE,
127
+     )
128
+
129
+     # 5) Generate forecasts
130
+     df_forecast = pipeline(df_test)
131
+
132
+     return df_test, df_forecast
133
+
134
+ ```
135
+
136
+ ## Example using [gft/ttm4hvac-target-heat-test](https://huggingface.co/datasets/gft/ttm4hvac-target-heat-test)
137
+
138
+ ```python
139
+ from datasets import load_dataset
140
+
141
+ ds = load_dataset("gft/ttm4hvac-target-heat-test")
142
+ df = ds["test"].to_pandas()
143
+ df.head()
144
+ ```
145
+
146
+ ---
147
+
148
+ # 📑 Input Schema
149
+
150
+ Your input `pandas.DataFrame` must contain:
151
+
152
+ * `time` (timestamp column)
153
+ * **Targets:**
154
+
155
+   * `Room Air Temperature (C)`
156
+   * `HVAC Power Consumption (W)`
157
+ * **Observables:**
158
+
159
+   * `Outdoor Air Temperature (C)`
160
+   * `Outdoor Humidity (%)`
161
+   * `Wind Speed (m/s)`
162
+   * `Direct Solar Radiation (W/m^2)`
163
+ * **Controls:**
164
+
165
+   * `Heating Setpoint (C)`
166
+   * `Cooling Setpoint (C)`
167
+
168
+ Sampling frequency must be **15 minutes** (`freq="15min"`).
169
+
170
+ ---
171
+
172
+ # 📦 Related models (TTM4HVAC family)
173
+
174
+ These models correspond to the experiments documented in the [paper]():
175
+
176
+ - `gft/ttm4hvac` - Main model, best performer (this repo)
177
+ - [`gft/ttm4hvac-source-default`](https://huggingface.co/gft/ttm4hvac-source-default)
178
+ - [`gft/ttm4hvac-target-default`](https://huggingface.co/gft/ttm4hvac-target-default)
179
+ - [`gft/ttm4hvac-target-chaotic`](https://huggingface.co/gft/ttm4hvac-target-chaotic)
180
+
181
+ ---
182
+
183
+ # 📚 Related Datasets
184
+
185
+ Training and evaluation datasets used for this fine-tune:
186
+
187
+ * [`gft/ttm4hvac-source-all-train`](https://huggingface.co/datasets/gft/ttm4hvac-source-all-train)
188
+ * [`gft/ttm4hvac-target-heat-test`](https://huggingface.co/datasets/gft/ttm4hvac-target-heat-test)
189
+ * [`gft/ttm4hvac-target-cool-test`](https://huggingface.co/datasets/gft/ttm4hvac-target-cool-test)
190
+
191
+ Other datasets:
192
+
193
+ * [`gft/ttm4hvac-source-default-train`](https://huggingface.co/datasets/gft/ttm4hvac-source-default-train)
194
+ * [`gft/ttm4hvac-target-chaotic-train`](https://huggingface.co/datasets/gft/ttm4hvac-target-chaotic-train)
195
+ * [`gft/ttm4hvac-target-default-train`](https://huggingface.co/datasets/gft/ttm4hvac-target-default-train)
196
+
197
+ ---
198
+
199
+ # 📘 Project Overview
200
+
201
+ **TTM4HVAC** investigates how foundation-model-based time-series architectures (*TinyTimeMixer*, from IBM Granite TSFM) can:
202
+
203
+ * model complex building thermal dynamics,
204
+ * generalize across buildings and climates,
205
+ * support transfer from source → target buildings,
206
+ * be evaluated under diverse behavioral patterns (default schedules vs. chaotic occupants).
207
+
208
+ ---
209
+
210
+ # ✒️ Citation
211
+
212
+ If you use this model or the datasets, please cite:
213
+
214
+ ```
215
+ **F. Aran**,
216
+ *Transfer learning of building dynamics digital twin for HVAC control with Time-series Foundation Model*,
217
+ arXiv:XXXX.XXXXX, 2025.
218
+ https://arxiv.org/abs/XXXX.XXXXX
219
+ ```
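The README's "Input Schema" section lists nine required columns and a fixed 15-minute sampling interval. A minimal pre-flight check for those two requirements might look like the sketch below. It is not part of the repository; the helper names `missing_columns` and `has_regular_sampling` are hypothetical, and it assumes the `time` values have already been parsed into datetime objects.

```python
from datetime import datetime, timedelta

# Column names copied from the README's "Input Schema" section.
REQUIRED_COLUMNS = [
    "time",
    "Room Air Temperature (C)",
    "HVAC Power Consumption (W)",
    "Outdoor Air Temperature (C)",
    "Outdoor Humidity (%)",
    "Wind Speed (m/s)",
    "Direct Solar Radiation (W/m^2)",
    "Heating Setpoint (C)",
    "Cooling Setpoint (C)",
]
EXPECTED_STEP = timedelta(minutes=15)


def missing_columns(columns):
    """Return the required columns that are absent, in schema order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def has_regular_sampling(timestamps, step=EXPECTED_STEP):
    """True if every consecutive pair of timestamps is exactly `step` apart."""
    ordered = list(timestamps)
    return all(later - earlier == step for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical example: four 15-minute samples starting at midnight.
start = datetime(2025, 1, 1)
samples = [start + i * EXPECTED_STEP for i in range(4)]
print(missing_columns(["time", "Room Air Temperature (C)"]))
print(has_regular_sampling(samples))
```

With a DataFrame, the same helpers could be fed the column index and the parsed timestamp column; running them before inference surfaces schema problems earlier than a failure inside the preprocessor would.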
config.json ADDED
@@ -0,0 +1,74 @@
1
+ {
2
+ "adaptive_patching_levels": 3,
3
+ "architectures": [
4
+ "TinyTimeMixerForPrediction"
5
+ ],
6
+ "categorical_vocab_size_list": null,
7
+ "context_length": 1536,
8
+ "d_model": 384,
9
+ "d_model_scale": 3,
10
+ "decoder_adaptive_patching_levels": 0,
11
+ "decoder_d_model": 256,
12
+ "decoder_d_model_scale": 2,
13
+ "decoder_mode": "mix_channel",
14
+ "decoder_num_layers": 2,
15
+ "decoder_raw_residual": false,
16
+ "distribution_output": "student_t",
17
+ "dropout": 0.4,
18
+ "enable_forecast_channel_mixing": true,
19
+ "exogenous_channel_indices": [
20
+ 2,
21
+ 3,
22
+ 4,
23
+ 5,
24
+ 6,
25
+ 7
26
+ ],
27
+ "expansion_factor": 2,
28
+ "fcm_context_length": 3,
29
+ "fcm_gated_attn": true,
30
+ "fcm_mix_layers": 3,
31
+ "fcm_prepend_past": true,
32
+ "fcm_prepend_past_offset": null,
33
+ "fcm_use_mixer": true,
34
+ "frequency_token_vocab_size": 8,
35
+ "gated_attn": true,
36
+ "head_dropout": 0.4,
37
+ "huber_delta": 1,
38
+ "init_embed": "pytorch",
39
+ "init_linear": "pytorch",
40
+ "init_processing": true,
41
+ "init_std": 0.02,
42
+ "loss": "mse",
43
+ "mask_value": 0,
44
+ "masked_context_length": null,
45
+ "mode": "common_channel",
46
+ "model_type": "tinytimemixer",
47
+ "norm_eps": 1e-05,
48
+ "norm_mlp": "LayerNorm",
49
+ "num_input_channels": 8,
50
+ "num_layers": 2,
51
+ "num_parallel_samples": 100,
52
+ "num_patches": 12,
53
+ "patch_last": true,
54
+ "patch_length": 128,
55
+ "patch_stride": 128,
56
+ "positional_encoding_type": "sincos",
57
+ "post_init": false,
58
+ "prediction_channel_indices": [
59
+ 0,
60
+ 1
61
+ ],
62
+ "prediction_filter_length": null,
63
+ "prediction_length": 96,
64
+ "quantile": 0.5,
65
+ "resolution_prefix_tuning": false,
66
+ "scaling": "std",
67
+ "self_attn": false,
68
+ "self_attn_heads": 1,
69
+ "stride_ratio": 1,
70
+ "torch_dtype": "float32",
71
+ "transformers_version": "4.55.0",
72
+ "use_decoder": true,
73
+ "use_positional_encoding": false
74
+ }
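Several values in this config can be cross-checked against each other and against the README. The sketch below does that arithmetic under two assumptions that are not stated in the config itself: that the patch count follows the usual sliding-window formula, and that samples are 15 minutes apart, as the README's input schema says. The function name `count_patches` is illustrative and is not a library API.

```python
from datetime import timedelta

# Values copied from config.json above.
CONTEXT_LENGTH = 1536
PREDICTION_LENGTH = 96
PATCH_LENGTH = 128
PATCH_STRIDE = 128
NUM_PATCHES = 12
NUM_INPUT_CHANNELS = 8
PREDICTION_CHANNEL_INDICES = [0, 1]
EXOGENOUS_CHANNEL_INDICES = [2, 3, 4, 5, 6, 7]

# Assumption: 15-minute sampling, taken from the README's input schema.
SAMPLE_PERIOD = timedelta(minutes=15)


def count_patches(context_length, patch_length, patch_stride):
    """Patch count under the common sliding-window convention (an assumption here)."""
    return (context_length - patch_length) // patch_stride + 1


context_span = CONTEXT_LENGTH * SAMPLE_PERIOD
forecast_span = PREDICTION_LENGTH * SAMPLE_PERIOD
channel_total = len(PREDICTION_CHANNEL_INDICES) + len(EXOGENOUS_CHANNEL_INDICES)

print(count_patches(CONTEXT_LENGTH, PATCH_LENGTH, PATCH_STRIDE))
print(context_span, forecast_span, channel_total)
```

Read this way, the model looks back over 16 days of history to forecast one day ahead, and the eight input channels line up with the README's two targets plus four observables and two controls.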
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4d3643e1bf9722f6ae4abd23706b8c2da7750ed05981646b3985ba788f18263e
3
+ size 12429560
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b621dc33436026e17e945b47fbb7e7694600e524489f3db5a4fee9c71bded5a6
3
+ size 8673675
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2c747a9e154cdb52e177e60daeeb0a8bb06eb54daf170ab3d3676ffb0ce48329
3
+ size 14645
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7bb829282574d82ab39ef909e200415a4f6fb23acb9d875d495763dae4133dc1
3
+ size 1465
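The three-line diffs for the binary files are Git LFS pointer files: a spec version, an object ID with its hash algorithm, and the size in bytes. As a rough illustration, the sketch below parses the `model.safetensors` pointer shown earlier in this commit into a dictionary. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys that appear here.

```python
# Pointer text copied from the model.safetensors diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4d3643e1bf9722f6ae4abd23706b8c2da7750ed05981646b3985ba788f18263e
size 12429560
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The `size` field gives the real artifact size without downloading it, so the model weights here are about 12.4 MB and the optimizer state about 8.7 MB.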
trainer_state.json ADDED
@@ -0,0 +1,2563 @@
1
+ {
2
+ "best_global_step": 9408,
3
+ "best_metric": 0.1410149782896042,
4
+ "best_model_checkpoint": "tmp/out/1536-96-r2_mix_channel_fcmCtx3_fcmLayers3_fcmChMixingTrue_stride24_bs512_lrf_deb3/checkpoint-9408",
5
+ "epoch": 168.0,
6
+ "eval_steps": 500,
7
+ "global_step": 9408,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 1.0,
14
+ "grad_norm": 0.376089870929718,
15
+ "learning_rate": 0.00029836401390103334,
16
+ "loss": 0.3643,
17
+ "step": 56
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_loss": 0.25079935789108276,
22
+ "eval_runtime": 12.3705,
23
+ "eval_samples_per_second": 877.898,
24
+ "eval_steps_per_second": 1.778,
25
+ "step": 56
26
+ },
27
+ {
28
+ "epoch": 2.0,
29
+ "grad_norm": 0.25105392932891846,
30
+ "learning_rate": 0.00029836183164580883,
31
+ "loss": 0.3058,
32
+ "step": 112
33
+ },
34
+ {
35
+ "epoch": 2.0,
36
+ "eval_loss": 0.23216894268989563,
37
+ "eval_runtime": 12.2194,
38
+ "eval_samples_per_second": 888.753,
39
+ "eval_steps_per_second": 1.8,
40
+ "step": 112
41
+ },
42
+ {
43
+ "epoch": 3.0,
44
+ "grad_norm": 0.17020165920257568,
45
+ "learning_rate": 0.00029835817704944523,
46
+ "loss": 0.2683,
47
+ "step": 168
48
+ },
49
+ {
50
+ "epoch": 3.0,
51
+ "eval_loss": 0.20991244912147522,
52
+ "eval_runtime": 10.9934,
53
+ "eval_samples_per_second": 987.863,
54
+ "eval_steps_per_second": 2.001,
55
+ "step": 168
56
+ },
57
+ {
58
+ "epoch": 4.0,
59
+ "grad_norm": 0.13130681216716766,
60
+ "learning_rate": 0.00029835305014801184,
61
+ "loss": 0.2395,
62
+ "step": 224
63
+ },
64
+ {
65
+ "epoch": 4.0,
66
+ "eval_loss": 0.19736029207706451,
67
+ "eval_runtime": 11.7226,
68
+ "eval_samples_per_second": 926.414,
69
+ "eval_steps_per_second": 1.877,
70
+ "step": 224
71
+ },
72
+ {
73
+ "epoch": 5.0,
74
+ "grad_norm": 0.12686163187026978,
75
+ "learning_rate": 0.0002983464509921093,
76
+ "loss": 0.2241,
77
+ "step": 280
78
+ },
79
+ {
80
+ "epoch": 5.0,
81
+ "eval_loss": 0.18977424502372742,
82
+ "eval_runtime": 11.8479,
83
+ "eval_samples_per_second": 916.618,
84
+ "eval_steps_per_second": 1.857,
85
+ "step": 280
86
+ },
87
+ {
88
+ "epoch": 6.0,
89
+ "grad_norm": 0.11746390908956528,
90
+ "learning_rate": 0.00029833837964686835,
91
+ "loss": 0.2148,
92
+ "step": 336
93
+ },
94
+ {
95
+ "epoch": 6.0,
96
+ "eval_loss": 0.1851092129945755,
97
+ "eval_runtime": 11.7556,
98
+ "eval_samples_per_second": 923.812,
99
+ "eval_steps_per_second": 1.871,
100
+ "step": 336
101
+ },
102
+ {
103
+ "epoch": 7.0,
104
+ "grad_norm": 0.13627897202968597,
105
+ "learning_rate": 0.0002983288361919503,
106
+ "loss": 0.2078,
107
+ "step": 392
108
+ },
109
+ {
110
+ "epoch": 7.0,
111
+ "eval_loss": 0.18129761517047882,
112
+ "eval_runtime": 11.7487,
113
+ "eval_samples_per_second": 924.357,
114
+ "eval_steps_per_second": 1.873,
115
+ "step": 392
116
+ },
117
+ {
118
+ "epoch": 8.0,
119
+ "grad_norm": 0.1497841328382492,
120
+ "learning_rate": 0.00029831782072154485,
121
+ "loss": 0.2025,
122
+ "step": 448
123
+ },
124
+ {
125
+ "epoch": 8.0,
126
+ "eval_loss": 0.17769944667816162,
127
+ "eval_runtime": 12.1141,
128
+ "eval_samples_per_second": 896.477,
129
+ "eval_steps_per_second": 1.816,
130
+ "step": 448
131
+ },
132
+ {
133
+ "epoch": 9.0,
134
+ "grad_norm": 0.19643521308898926,
135
+ "learning_rate": 0.0002983053333443701,
136
+ "loss": 0.1976,
137
+ "step": 504
138
+ },
139
+ {
140
+ "epoch": 9.0,
141
+ "eval_loss": 0.17583897709846497,
142
+ "eval_runtime": 12.5558,
143
+ "eval_samples_per_second": 864.936,
144
+ "eval_steps_per_second": 1.752,
145
+ "step": 504
146
+ },
147
+ {
148
+ "epoch": 10.0,
149
+ "grad_norm": 0.1033664122223854,
150
+ "learning_rate": 0.0002982913741836719,
151
+ "loss": 0.1936,
152
+ "step": 560
153
+ },
154
+ {
155
+ "epoch": 10.0,
156
+ "eval_loss": 0.1739388257265091,
157
+ "eval_runtime": 12.449,
158
+ "eval_samples_per_second": 872.358,
159
+ "eval_steps_per_second": 1.767,
160
+ "step": 560
161
+ },
162
+ {
163
+ "epoch": 11.0,
164
+ "grad_norm": 0.1361815184354782,
165
+ "learning_rate": 0.00029827594337722164,
166
+ "loss": 0.1902,
167
+ "step": 616
168
+ },
169
+ {
170
+ "epoch": 11.0,
171
+ "eval_loss": 0.17110829055309296,
172
+ "eval_runtime": 12.7701,
173
+ "eval_samples_per_second": 850.423,
174
+ "eval_steps_per_second": 1.723,
175
+ "step": 616
176
+ },
177
+ {
178
+ "epoch": 12.0,
179
+ "grad_norm": 0.12385320663452148,
180
+ "learning_rate": 0.0002982590410773146,
181
+ "loss": 0.1867,
182
+ "step": 672
183
+ },
184
+ {
185
+ "epoch": 12.0,
186
+ "eval_loss": 0.16852673888206482,
187
+ "eval_runtime": 11.8972,
188
+ "eval_samples_per_second": 912.817,
189
+ "eval_steps_per_second": 1.849,
190
+ "step": 672
191
+ },
192
+ {
193
+ "epoch": 13.0,
194
+ "grad_norm": 0.13126742839813232,
195
+ "learning_rate": 0.0002982406674507699,
196
+ "loss": 0.1837,
197
+ "step": 728
198
+ },
199
+ {
200
+ "epoch": 13.0,
201
+ "eval_loss": 0.1675039380788803,
202
+ "eval_runtime": 11.8951,
203
+ "eval_samples_per_second": 912.98,
204
+ "eval_steps_per_second": 1.85,
205
+ "step": 728
206
+ },
207
+ {
208
+ "epoch": 14.0,
209
+ "grad_norm": 0.14581529796123505,
210
+ "learning_rate": 0.00029822082267892794,
211
+ "loss": 0.1818,
212
+ "step": 784
213
+ },
214
+ {
215
+ "epoch": 14.0,
216
+ "eval_loss": 0.16522179543972015,
217
+ "eval_runtime": 12.951,
218
+ "eval_samples_per_second": 838.545,
219
+ "eval_steps_per_second": 1.699,
220
+ "step": 784
221
+ },
222
+ {
223
+ "epoch": 15.0,
224
+ "grad_norm": 0.12710689008235931,
225
+ "learning_rate": 0.0002981995069576483,
226
+ "loss": 0.1787,
227
+ "step": 840
228
+ },
229
+ {
230
+ "epoch": 15.0,
231
+ "eval_loss": 0.1651495099067688,
232
+ "eval_runtime": 12.4369,
233
+ "eval_samples_per_second": 873.211,
234
+ "eval_steps_per_second": 1.769,
235
+ "step": 840
236
+ },
237
+ {
238
+ "epoch": 16.0,
239
+ "grad_norm": 0.1914917379617691,
240
+ "learning_rate": 0.0002981767204973089,
241
+ "loss": 0.177,
242
+ "step": 896
243
+ },
244
+ {
245
+ "epoch": 16.0,
246
+ "eval_loss": 0.1639031320810318,
247
+ "eval_runtime": 12.7112,
248
+ "eval_samples_per_second": 854.365,
249
+ "eval_steps_per_second": 1.731,
250
+ "step": 896
251
+ },
252
+ {
253
+ "epoch": 17.0,
254
+ "grad_norm": 0.15502069890499115,
255
+ "learning_rate": 0.00029815246352280276,
256
+ "loss": 0.1751,
257
+ "step": 952
258
+ },
259
+ {
260
+ "epoch": 17.0,
261
+ "eval_loss": 0.16176268458366394,
262
+ "eval_runtime": 12.1031,
263
+ "eval_samples_per_second": 897.291,
264
+ "eval_steps_per_second": 1.818,
265
+ "step": 952
266
+ },
267
+ {
268
+ "epoch": 18.0,
269
+ "grad_norm": 0.11603855341672897,
270
+ "learning_rate": 0.0002981267362735362,
271
+ "loss": 0.1734,
272
+ "step": 1008
273
+ },
274
+ {
275
+ "epoch": 18.0,
276
+ "eval_loss": 0.1614038050174713,
277
+ "eval_runtime": 11.893,
278
+ "eval_samples_per_second": 913.139,
279
+ "eval_steps_per_second": 1.85,
280
+ "step": 1008
281
+ },
282
+ {
283
+ "epoch": 19.0,
284
+ "grad_norm": 0.11780980974435806,
285
+ "learning_rate": 0.0002980995390034271,
286
+ "loss": 0.172,
287
+ "step": 1064
288
+ },
289
+ {
290
+ "epoch": 19.0,
291
+ "eval_loss": 0.16114258766174316,
292
+ "eval_runtime": 12.6404,
293
+ "eval_samples_per_second": 859.152,
294
+ "eval_steps_per_second": 1.74,
295
+ "step": 1064
296
+ },
297
+ {
298
+ "epoch": 20.0,
299
+ "grad_norm": 0.14823858439922333,
300
+ "learning_rate": 0.00029807087198090116,
301
+ "loss": 0.1702,
302
+ "step": 1120
303
+ },
304
+ {
305
+ "epoch": 20.0,
306
+ "eval_loss": 0.15980996191501617,
307
+ "eval_runtime": 12.5631,
308
+ "eval_samples_per_second": 864.434,
309
+ "eval_steps_per_second": 1.751,
310
+ "step": 1120
311
+ },
312
+ {
313
+ "epoch": 21.0,
314
+ "grad_norm": 0.1246936172246933,
315
+ "learning_rate": 0.0002980407354888907,
316
+ "loss": 0.1688,
317
+ "step": 1176
318
+ },
319
+ {
320
+ "epoch": 21.0,
321
+ "eval_loss": 0.15955598652362823,
322
+ "eval_runtime": 12.315,
323
+ "eval_samples_per_second": 881.853,
324
+ "eval_steps_per_second": 1.786,
325
+ "step": 1176
326
+ },
327
+ {
328
+ "epoch": 22.0,
329
+ "grad_norm": 0.11726798117160797,
330
+ "learning_rate": 0.0002980091298248309,
331
+ "loss": 0.1675,
332
+ "step": 1232
333
+ },
334
+ {
335
+ "epoch": 22.0,
336
+ "eval_loss": 0.15864743292331696,
337
+ "eval_runtime": 12.3526,
338
+ "eval_samples_per_second": 879.166,
339
+ "eval_steps_per_second": 1.781,
340
+ "step": 1232
341
+ },
342
+ {
343
+ "epoch": 23.0,
344
+ "grad_norm": 0.13960805535316467,
345
+ "learning_rate": 0.0002979760553006564,
346
+ "loss": 0.1666,
347
+ "step": 1288
348
+ },
349
+ {
350
+ "epoch": 23.0,
351
+ "eval_loss": 0.15781378746032715,
352
+ "eval_runtime": 12.187,
353
+ "eval_samples_per_second": 891.116,
354
+ "eval_steps_per_second": 1.805,
355
+ "step": 1288
356
+ },
357
+ {
358
+ "epoch": 24.0,
359
+ "grad_norm": 0.11856065690517426,
360
+ "learning_rate": 0.00029794151224279964,
361
+ "loss": 0.1652,
362
+ "step": 1344
363
+ },
364
+ {
365
+ "epoch": 24.0,
366
+ "eval_loss": 0.15776978433132172,
367
+ "eval_runtime": 12.435,
368
+ "eval_samples_per_second": 873.344,
369
+ "eval_steps_per_second": 1.769,
370
+ "step": 1344
371
+ },
372
+ {
373
+ "epoch": 25.0,
374
+ "grad_norm": 0.12466388940811157,
375
+ "learning_rate": 0.00029790550099218654,
376
+ "loss": 0.1643,
377
+ "step": 1400
378
+ },
379
+ {
380
+ "epoch": 25.0,
381
+ "eval_loss": 0.15815725922584534,
382
+ "eval_runtime": 13.1792,
383
+ "eval_samples_per_second": 824.023,
384
+ "eval_steps_per_second": 1.669,
385
+ "step": 1400
386
+ },
387
+ {
388
+ "epoch": 26.0,
389
+ "grad_norm": 0.12369589507579803,
390
+ "learning_rate": 0.0002978680219042336,
391
+ "loss": 0.1633,
392
+ "step": 1456
393
+ },
394
+ {
395
+ "epoch": 26.0,
396
+ "eval_loss": 0.1567024141550064,
397
+ "eval_runtime": 12.484,
398
+ "eval_samples_per_second": 869.916,
399
+ "eval_steps_per_second": 1.762,
400
+ "step": 1456
401
+ },
402
+ {
403
+ "epoch": 27.0,
404
+ "grad_norm": 0.14197547733783722,
405
+ "learning_rate": 0.0002978290753488448,
406
+ "loss": 0.1624,
407
+ "step": 1512
408
+ },
409
+ {
410
+ "epoch": 27.0,
411
+ "eval_loss": 0.15676391124725342,
412
+ "eval_runtime": 12.738,
413
+ "eval_samples_per_second": 852.567,
414
+ "eval_steps_per_second": 1.727,
415
+ "step": 1512
416
+ },
417
+ {
418
+ "epoch": 28.0,
419
+ "grad_norm": 0.13262535631656647,
420
+ "learning_rate": 0.0002977886617104062,
421
+ "loss": 0.1613,
422
+ "step": 1568
423
+ },
424
+ {
425
+ "epoch": 28.0,
426
+ "eval_loss": 0.1567520797252655,
427
+ "eval_runtime": 12.6529,
428
+ "eval_samples_per_second": 858.304,
429
+ "eval_steps_per_second": 1.739,
430
+ "step": 1568
431
+ },
432
+ {
433
+ "epoch": 29.0,
434
+ "grad_norm": 0.15622882544994354,
435
+ "learning_rate": 0.0002977467813877842,
436
+ "loss": 0.1604,
437
+ "step": 1624
438
+ },
439
+ {
440
+ "epoch": 29.0,
441
+ "eval_loss": 0.15647795796394348,
442
+ "eval_runtime": 12.6006,
443
+ "eval_samples_per_second": 861.863,
444
+ "eval_steps_per_second": 1.746,
445
+ "step": 1624
446
+ },
447
+ {
448
+ "epoch": 30.0,
449
+ "grad_norm": 0.15161629021167755,
450
+ "learning_rate": 0.00029770343479432095,
451
+ "loss": 0.1598,
452
+ "step": 1680
453
+ },
454
+ {
455
+ "epoch": 30.0,
456
+ "eval_loss": 0.15717600286006927,
457
+ "eval_runtime": 12.8165,
458
+ "eval_samples_per_second": 847.348,
459
+ "eval_steps_per_second": 1.717,
460
+ "step": 1680
461
+ },
462
+ {
463
+ "epoch": 31.0,
464
+ "grad_norm": 0.12715986371040344,
465
+ "learning_rate": 0.0002976586223578297,
466
+ "loss": 0.1591,
467
+ "step": 1736
468
+ },
469
+ {
470
+ "epoch": 31.0,
471
+ "eval_loss": 0.1557074338197708,
472
+ "eval_runtime": 12.6403,
473
+ "eval_samples_per_second": 859.156,
474
+ "eval_steps_per_second": 1.74,
475
+ "step": 1736
476
+ },
477
+ {
478
+ "epoch": 32.0,
479
+ "grad_norm": 0.1595166027545929,
480
+ "learning_rate": 0.00029761234452059136,
481
+ "loss": 0.1584,
482
+ "step": 1792
483
+ },
484
+ {
485
+ "epoch": 32.0,
486
+ "eval_loss": 0.15540747344493866,
487
+ "eval_runtime": 13.3084,
488
+ "eval_samples_per_second": 816.027,
489
+ "eval_steps_per_second": 1.653,
490
+ "step": 1792
491
+ },
492
+ {
493
+ "epoch": 33.0,
494
+ "grad_norm": 0.16593649983406067,
495
+ "learning_rate": 0.0002975646017393494,
496
+ "loss": 0.1576,
497
+ "step": 1848
498
+ },
499
+ {
500
+ "epoch": 33.0,
501
+ "eval_loss": 0.15468333661556244,
502
+ "eval_runtime": 13.1483,
503
+ "eval_samples_per_second": 825.961,
504
+ "eval_steps_per_second": 1.673,
505
+ "step": 1848
506
+ },
507
+ {
508
+ "epoch": 34.0,
509
+ "grad_norm": 0.14555956423282623,
510
+ "learning_rate": 0.0002975153944853054,
511
+ "loss": 0.1567,
512
+ "step": 1904
513
+ },
514
+ {
515
+ "epoch": 34.0,
516
+ "eval_loss": 0.1553257554769516,
517
+ "eval_runtime": 12.853,
518
+ "eval_samples_per_second": 844.936,
519
+ "eval_steps_per_second": 1.712,
520
+ "step": 1904
521
+ },
522
+ {
523
+ "epoch": 35.0,
524
+ "grad_norm": 0.23194457590579987,
525
+ "learning_rate": 0.00029746472324411547,
526
+ "loss": 0.156,
527
+ "step": 1960
528
+ },
529
+ {
530
+ "epoch": 35.0,
531
+ "eval_loss": 0.1549767106771469,
532
+ "eval_runtime": 11.49,
533
+ "eval_samples_per_second": 945.169,
534
+ "eval_steps_per_second": 1.915,
535
+ "step": 1960
536
+ },
537
+ {
538
+ "epoch": 36.0,
539
+ "grad_norm": 0.17572428286075592,
540
+ "learning_rate": 0.0002974125885158844,
541
+ "loss": 0.1559,
542
+ "step": 2016
543
+ },
544
+ {
545
+ "epoch": 36.0,
546
+ "eval_loss": 0.15631072223186493,
547
+ "eval_runtime": 12.6465,
548
+ "eval_samples_per_second": 858.739,
549
+ "eval_steps_per_second": 1.74,
550
+ "step": 2016
551
+ },
552
+ {
553
+ "epoch": 37.0,
554
+ "grad_norm": 0.1315496563911438,
555
+ "learning_rate": 0.0002973589908151604,
556
+ "loss": 0.1547,
557
+ "step": 2072
558
+ },
559
+ {
560
+ "epoch": 37.0,
561
+ "eval_loss": 0.1540231704711914,
562
+ "eval_runtime": 13.3162,
563
+ "eval_samples_per_second": 815.548,
564
+ "eval_steps_per_second": 1.652,
565
+ "step": 2072
566
+ },
567
+ {
568
+ "epoch": 38.0,
569
+ "grad_norm": 0.17212693393230438,
570
+ "learning_rate": 0.0002973039306709319,
571
+ "loss": 0.1539,
572
+ "step": 2128
573
+ },
574
+ {
575
+ "epoch": 38.0,
576
+ "eval_loss": 0.15414279699325562,
577
+ "eval_runtime": 13.2364,
578
+ "eval_samples_per_second": 820.466,
579
+ "eval_steps_per_second": 1.662,
580
+ "step": 2128
581
+ },
582
+ {
583
+ "epoch": 39.0,
584
+ "grad_norm": 0.12589286267757416,
585
+ "learning_rate": 0.0002972474086266193,
586
+ "loss": 0.1538,
587
+ "step": 2184
588
+ },
589
+ {
590
+ "epoch": 39.0,
591
+ "eval_loss": 0.15399765968322754,
592
+ "eval_runtime": 12.5952,
593
+ "eval_samples_per_second": 862.236,
594
+ "eval_steps_per_second": 1.747,
595
+ "step": 2184
596
+ },
597
+ {
598
+ "epoch": 40.0,
599
+ "grad_norm": 0.1479528248310089,
600
+ "learning_rate": 0.0002971894252400732,
601
+ "loss": 0.1529,
602
+ "step": 2240
603
+ },
604
+ {
605
+ "epoch": 40.0,
606
+ "eval_loss": 0.1546306610107422,
607
+ "eval_runtime": 12.4569,
608
+ "eval_samples_per_second": 871.809,
609
+ "eval_steps_per_second": 1.766,
610
+ "step": 2240
611
+ },
612
+ {
613
+ "epoch": 41.0,
614
+ "grad_norm": 0.140830859541893,
615
+ "learning_rate": 0.00029712998108356566,
616
+ "loss": 0.1521,
617
+ "step": 2296
618
+ },
619
+ {
620
+ "epoch": 41.0,
621
+ "eval_loss": 0.15411749482154846,
622
+ "eval_runtime": 12.8911,
623
+ "eval_samples_per_second": 842.441,
624
+ "eval_steps_per_second": 1.707,
625
+ "step": 2296
626
+ },
627
+ {
628
+ "epoch": 42.0,
629
+ "grad_norm": 0.14429251849651337,
630
+ "learning_rate": 0.0002970690767437871,
631
+ "loss": 0.1521,
632
+ "step": 2352
633
+ },
634
+ {
635
+ "epoch": 42.0,
636
+ "eval_loss": 0.1535186916589737,
637
+ "eval_runtime": 12.7037,
638
+ "eval_samples_per_second": 854.87,
639
+ "eval_steps_per_second": 1.732,
640
+ "step": 2352
641
+ },
642
+ {
643
+ "epoch": 43.0,
644
+ "grad_norm": 0.1678067147731781,
645
+ "learning_rate": 0.00029700671282183844,
646
+ "loss": 0.1516,
647
+ "step": 2408
648
+ },
649
+ {
650
+ "epoch": 43.0,
651
+ "eval_loss": 0.15345174074172974,
652
+ "eval_runtime": 12.8622,
653
+ "eval_samples_per_second": 844.337,
654
+ "eval_steps_per_second": 1.71,
655
+ "step": 2408
656
+ },
657
+ {
658
+ "epoch": 44.0,
659
+ "grad_norm": 0.16715741157531738,
660
+ "learning_rate": 0.00029694288993322636,
661
+ "loss": 0.1506,
662
+ "step": 2464
663
+ },
664
+ {
665
+ "epoch": 44.0,
666
+ "eval_loss": 0.1528453379869461,
667
+ "eval_runtime": 12.394,
668
+ "eval_samples_per_second": 876.23,
669
+ "eval_steps_per_second": 1.775,
670
+ "step": 2464
671
+ },
672
+ {
673
+ "epoch": 45.0,
674
+ "grad_norm": 0.1476888358592987,
675
+ "learning_rate": 0.00029687760870785704,
676
+ "loss": 0.1502,
677
+ "step": 2520
678
+ },
679
+ {
680
+ "epoch": 45.0,
681
+ "eval_loss": 0.15371684730052948,
682
+ "eval_runtime": 12.8504,
683
+ "eval_samples_per_second": 845.113,
684
+ "eval_steps_per_second": 1.712,
685
+ "step": 2520
686
+ },
687
+ {
688
+ "epoch": 46.0,
689
+ "grad_norm": 0.16268473863601685,
690
+ "learning_rate": 0.00029681086979003,
691
+ "loss": 0.1497,
692
+ "step": 2576
693
+ },
694
+ {
695
+ "epoch": 46.0,
696
+ "eval_loss": 0.15216761827468872,
697
+ "eval_runtime": 12.9049,
698
+ "eval_samples_per_second": 841.539,
699
+ "eval_steps_per_second": 1.705,
700
+ "step": 2576
701
+ },
702
+ {
703
+ "epoch": 47.0,
704
+ "grad_norm": 0.17756158113479614,
705
+ "learning_rate": 0.0002967426738384313,
706
+ "loss": 0.1493,
707
+ "step": 2632
708
+ },
709
+ {
710
+ "epoch": 47.0,
711
+ "eval_loss": 0.15324676036834717,
712
+ "eval_runtime": 13.0526,
713
+ "eval_samples_per_second": 832.021,
714
+ "eval_steps_per_second": 1.685,
715
+ "step": 2632
716
+ },
717
+ {
718
+ "epoch": 48.0,
719
+ "grad_norm": 0.13994063436985016,
720
+ "learning_rate": 0.0002966730215261271,
721
+ "loss": 0.1487,
722
+ "step": 2688
723
+ },
724
+ {
725
+ "epoch": 48.0,
726
+ "eval_loss": 0.15221010148525238,
727
+ "eval_runtime": 12.6334,
728
+ "eval_samples_per_second": 859.628,
729
+ "eval_steps_per_second": 1.741,
730
+ "step": 2688
731
+ },
732
+ {
733
+ "epoch": 49.0,
734
+ "grad_norm": 0.18394885957241058,
735
+ "learning_rate": 0.0002966019135405581,
736
+ "loss": 0.1483,
737
+ "step": 2744
738
+ },
739
+ {
740
+ "epoch": 49.0,
741
+ "eval_loss": 0.15254603326320648,
742
+ "eval_runtime": 12.296,
743
+ "eval_samples_per_second": 883.214,
744
+ "eval_steps_per_second": 1.789,
745
+ "step": 2744
746
+ },
747
+ {
748
+ "epoch": 50.0,
749
+ "grad_norm": 0.14756232500076294,
750
+ "learning_rate": 0.000296529350583531,
751
+ "loss": 0.1479,
752
+ "step": 2800
753
+ },
754
+ {
755
+ "epoch": 50.0,
756
+ "eval_loss": 0.15157358348369598,
757
+ "eval_runtime": 12.7067,
758
+ "eval_samples_per_second": 854.666,
759
+ "eval_steps_per_second": 1.731,
760
+ "step": 2800
761
+ },
762
+ {
763
+ "epoch": 51.0,
764
+ "grad_norm": 0.18675681948661804,
765
+ "learning_rate": 0.00029645533337121344,
766
+ "loss": 0.1476,
767
+ "step": 2856
768
+ },
769
+ {
770
+ "epoch": 51.0,
771
+ "eval_loss": 0.15315961837768555,
772
+ "eval_runtime": 12.914,
773
+ "eval_samples_per_second": 840.949,
774
+ "eval_steps_per_second": 1.704,
775
+ "step": 2856
776
+ },
777
+ {
778
+ "epoch": 52.0,
779
+ "grad_norm": 0.21148425340652466,
780
+ "learning_rate": 0.0002963798626341248,
781
+ "loss": 0.1467,
782
+ "step": 2912
783
+ },
784
+ {
785
+ "epoch": 52.0,
786
+ "eval_loss": 0.151397705078125,
787
+ "eval_runtime": 12.6083,
788
+ "eval_samples_per_second": 861.336,
789
+ "eval_steps_per_second": 1.745,
790
+ "step": 2912
791
+ },
792
+ {
793
+ "epoch": 53.0,
794
+ "grad_norm": 0.14957012236118317,
795
+ "learning_rate": 0.00029630293911713125,
796
+ "loss": 0.1463,
797
+ "step": 2968
798
+ },
799
+ {
800
+ "epoch": 53.0,
801
+ "eval_loss": 0.152817040681839,
802
+ "eval_runtime": 12.3988,
803
+ "eval_samples_per_second": 875.89,
804
+ "eval_steps_per_second": 1.774,
805
+ "step": 2968
806
+ },
807
+ {
808
+ "epoch": 54.0,
809
+ "grad_norm": 0.18841682374477386,
810
+ "learning_rate": 0.0002962245635794367,
811
+ "loss": 0.1457,
812
+ "step": 3024
813
+ },
814
+ {
815
+ "epoch": 54.0,
816
+ "eval_loss": 0.1509653627872467,
817
+ "eval_runtime": 12.9201,
818
+ "eval_samples_per_second": 840.553,
819
+ "eval_steps_per_second": 1.703,
820
+ "step": 3024
821
+ },
822
+ {
823
+ "epoch": 55.0,
824
+ "grad_norm": 0.19782641530036926,
825
+ "learning_rate": 0.00029614473679457606,
826
+ "loss": 0.1457,
827
+ "step": 3080
828
+ },
829
+ {
830
+ "epoch": 55.0,
831
+ "eval_loss": 0.15204061567783356,
832
+ "eval_runtime": 13.0172,
833
+ "eval_samples_per_second": 834.282,
834
+ "eval_steps_per_second": 1.69,
835
+ "step": 3080
836
+ },
837
+ {
838
+ "epoch": 56.0,
839
+ "grad_norm": 0.15806534886360168,
840
+ "learning_rate": 0.0002960634595504073,
841
+ "loss": 0.145,
842
+ "step": 3136
843
+ },
844
+ {
845
+ "epoch": 56.0,
846
+ "eval_loss": 0.15144167840480804,
847
+ "eval_runtime": 12.3723,
848
+ "eval_samples_per_second": 877.767,
849
+ "eval_steps_per_second": 1.778,
850
+ "step": 3136
851
+ },
852
+ {
853
+ "epoch": 57.0,
854
+ "grad_norm": 0.1470707207918167,
855
+ "learning_rate": 0.00029598073264910414,
856
+ "loss": 0.1446,
857
+ "step": 3192
858
+ },
859
+ {
860
+ "epoch": 57.0,
861
+ "eval_loss": 0.15259326994419098,
862
+ "eval_runtime": 11.8486,
863
+ "eval_samples_per_second": 916.567,
864
+ "eval_steps_per_second": 1.857,
865
+ "step": 3192
866
+ },
867
+ {
868
+ "epoch": 58.0,
869
+ "grad_norm": 0.12880393862724304,
870
+ "learning_rate": 0.00029589655690714776,
871
+ "loss": 0.1444,
872
+ "step": 3248
873
+ },
874
+ {
875
+ "epoch": 58.0,
876
+ "eval_loss": 0.1521604359149933,
877
+ "eval_runtime": 12.3711,
878
+ "eval_samples_per_second": 877.851,
879
+ "eval_steps_per_second": 1.778,
880
+ "step": 3248
881
+ },
882
+ {
883
+ "epoch": 59.0,
884
+ "grad_norm": 0.20687344670295715,
885
+ "learning_rate": 0.00029581093315531867,
886
+ "loss": 0.1439,
887
+ "step": 3304
888
+ },
889
+ {
890
+ "epoch": 59.0,
891
+ "eval_loss": 0.1506902128458023,
892
+ "eval_runtime": 12.2839,
893
+ "eval_samples_per_second": 884.082,
894
+ "eval_steps_per_second": 1.791,
895
+ "step": 3304
896
+ },
897
+ {
898
+ "epoch": 60.0,
899
+ "grad_norm": 0.31674283742904663,
900
+ "learning_rate": 0.00029572386223868856,
901
+ "loss": 0.1434,
902
+ "step": 3360
903
+ },
904
+ {
905
+ "epoch": 60.0,
906
+ "eval_loss": 0.1497628092765808,
907
+ "eval_runtime": 12.2602,
908
+ "eval_samples_per_second": 885.791,
909
+ "eval_steps_per_second": 1.794,
910
+ "step": 3360
911
+ },
912
+ {
913
+ "epoch": 61.0,
914
+ "grad_norm": 0.1524023711681366,
915
+ "learning_rate": 0.0002956353450166127,
916
+ "loss": 0.1428,
917
+ "step": 3416
918
+ },
919
+ {
920
+ "epoch": 61.0,
921
+ "eval_loss": 0.15104272961616516,
922
+ "eval_runtime": 11.4854,
923
+ "eval_samples_per_second": 945.545,
924
+ "eval_steps_per_second": 1.915,
925
+ "step": 3416
926
+ },
927
+ {
928
+ "epoch": 62.0,
929
+ "grad_norm": 0.1333588808774948,
930
+ "learning_rate": 0.00029554538236271986,
931
+ "loss": 0.1427,
932
+ "step": 3472
933
+ },
934
+ {
935
+ "epoch": 62.0,
936
+ "eval_loss": 0.15125687420368195,
937
+ "eval_runtime": 11.619,
938
+ "eval_samples_per_second": 934.673,
939
+ "eval_steps_per_second": 1.893,
940
+ "step": 3472
941
+ },
942
+ {
943
+ "epoch": 63.0,
944
+ "grad_norm": 0.14987458288669586,
945
+ "learning_rate": 0.0002954539751649054,
946
+ "loss": 0.1427,
947
+ "step": 3528
948
+ },
949
+ {
950
+ "epoch": 63.0,
951
+ "eval_loss": 0.15022161602973938,
952
+ "eval_runtime": 11.7178,
953
+ "eval_samples_per_second": 926.795,
954
+ "eval_steps_per_second": 1.877,
955
+ "step": 3528
956
+ },
957
+ {
958
+ "epoch": 64.0,
959
+ "grad_norm": 0.19036932289600372,
960
+ "learning_rate": 0.00029536112432532164,
961
+ "loss": 0.1418,
962
+ "step": 3584
963
+ },
964
+ {
965
+ "epoch": 64.0,
966
+ "eval_loss": 0.15002530813217163,
967
+ "eval_runtime": 12.0423,
968
+ "eval_samples_per_second": 901.82,
969
+ "eval_steps_per_second": 1.827,
970
+ "step": 3584
971
+ },
972
+ {
973
+ "epoch": 65.0,
974
+ "grad_norm": 0.15858310461044312,
975
+ "learning_rate": 0.00029526683076036824,
976
+ "loss": 0.1416,
977
+ "step": 3640
978
+ },
979
+ {
980
+ "epoch": 65.0,
981
+ "eval_loss": 0.15072880685329437,
982
+ "eval_runtime": 11.4427,
983
+ "eval_samples_per_second": 949.077,
984
+ "eval_steps_per_second": 1.923,
985
+ "step": 3640
986
+ },
987
+ {
988
+ "epoch": 66.0,
989
+ "grad_norm": 0.1411045342683792,
990
+ "learning_rate": 0.0002951710954006851,
991
+ "loss": 0.1415,
992
+ "step": 3696
993
+ },
994
+ {
995
+ "epoch": 66.0,
996
+ "eval_loss": 0.150208979845047,
997
+ "eval_runtime": 11.7843,
998
+ "eval_samples_per_second": 921.567,
999
+ "eval_steps_per_second": 1.867,
1000
+ "step": 3696
1001
+ },
1002
+ {
1003
+ "epoch": 67.0,
1004
+ "grad_norm": 0.18127693235874176,
1005
+ "learning_rate": 0.00029507391919114174,
1006
+ "loss": 0.1407,
1007
+ "step": 3752
1008
+ },
1009
+ {
1010
+ "epoch": 67.0,
1011
+ "eval_loss": 0.15111134946346283,
1012
+ "eval_runtime": 11.7998,
1013
+ "eval_samples_per_second": 920.352,
1014
+ "eval_steps_per_second": 1.864,
1015
+ "step": 3752
1016
+ },
1017
+ {
1018
+ "epoch": 68.0,
1019
+ "grad_norm": 0.20954985916614532,
1020
+ "learning_rate": 0.0002949753030908276,
1021
+ "loss": 0.1404,
1022
+ "step": 3808
1023
+ },
1024
+ {
1025
+ "epoch": 68.0,
1026
+ "eval_loss": 0.15048466622829437,
1027
+ "eval_runtime": 11.8536,
1028
+ "eval_samples_per_second": 916.178,
1029
+ "eval_steps_per_second": 1.856,
1030
+ "step": 3808
1031
+ },
1032
+ {
1033
+ "epoch": 69.0,
1034
+ "grad_norm": 0.1799214780330658,
1035
+ "learning_rate": 0.0002948752480730442,
1036
+ "loss": 0.1401,
1037
+ "step": 3864
1038
+ },
1039
+ {
1040
+ "epoch": 69.0,
1041
+ "eval_loss": 0.14996136724948883,
1042
+ "eval_runtime": 11.8425,
1043
+ "eval_samples_per_second": 917.04,
1044
+ "eval_steps_per_second": 1.858,
1045
+ "step": 3864
1046
+ },
1047
+ {
1048
+ "epoch": 70.0,
1049
+ "grad_norm": 0.14687888324260712,
1050
+ "learning_rate": 0.0002947737551252938,
1051
+ "loss": 0.1399,
1052
+ "step": 3920
1053
+ },
1054
+ {
1055
+ "epoch": 70.0,
1056
+ "eval_loss": 0.1494998186826706,
1057
+ "eval_runtime": 11.8446,
1058
+ "eval_samples_per_second": 916.877,
1059
+ "eval_steps_per_second": 1.857,
1060
+ "step": 3920
1061
+ },
1062
+ {
1063
+ "epoch": 71.0,
1064
+ "grad_norm": 0.2250983864068985,
1065
+ "learning_rate": 0.000294670825249271,
1066
+ "loss": 0.1397,
1067
+ "step": 3976
1068
+ },
1069
+ {
1070
+ "epoch": 71.0,
1071
+ "eval_loss": 0.14974181354045868,
1072
+ "eval_runtime": 10.3667,
1073
+ "eval_samples_per_second": 1047.585,
1074
+ "eval_steps_per_second": 2.122,
1075
+ "step": 3976
1076
+ },
1077
+ {
1078
+ "epoch": 72.0,
1079
+ "grad_norm": 0.14977572858333588,
1080
+ "learning_rate": 0.00029456645946085235,
1081
+ "loss": 0.1393,
1082
+ "step": 4032
1083
+ },
1084
+ {
1085
+ "epoch": 72.0,
1086
+ "eval_loss": 0.1504337042570114,
1087
+ "eval_runtime": 11.0031,
1088
+ "eval_samples_per_second": 986.994,
1089
+ "eval_steps_per_second": 1.999,
1090
+ "step": 4032
1091
+ },
1092
+ {
1093
+ "epoch": 73.0,
1094
+ "grad_norm": 0.2215435802936554,
1095
+ "learning_rate": 0.00029446065879008577,
1096
+ "loss": 0.1389,
1097
+ "step": 4088
1098
+ },
1099
+ {
1100
+ "epoch": 73.0,
1101
+ "eval_loss": 0.14960449934005737,
1102
+ "eval_runtime": 10.5211,
1103
+ "eval_samples_per_second": 1032.216,
1104
+ "eval_steps_per_second": 2.091,
1105
+ "step": 4088
1106
+ },
1107
+ {
1108
+ "epoch": 74.0,
1109
+ "grad_norm": 0.14885684847831726,
1110
+ "learning_rate": 0.00029435342428118117,
1111
+ "loss": 0.1384,
1112
+ "step": 4144
1113
+ },
1114
+ {
1115
+ "epoch": 74.0,
1116
+ "eval_loss": 0.14882370829582214,
1117
+ "eval_runtime": 11.6942,
1118
+ "eval_samples_per_second": 928.669,
1119
+ "eval_steps_per_second": 1.881,
1120
+ "step": 4144
1121
+ },
1122
+ {
1123
+ "epoch": 75.0,
1124
+ "grad_norm": 0.20596224069595337,
1125
+ "learning_rate": 0.0002942447569924998,
1126
+ "loss": 0.1384,
1127
+ "step": 4200
1128
+ },
1129
+ {
1130
+ "epoch": 75.0,
1131
+ "eval_loss": 0.14847591519355774,
1132
+ "eval_runtime": 11.911,
1133
+ "eval_samples_per_second": 911.765,
1134
+ "eval_steps_per_second": 1.847,
1135
+ "step": 4200
1136
+ },
1137
+ {
1138
+ "epoch": 76.0,
1139
+ "grad_norm": 0.1551866978406906,
1140
+ "learning_rate": 0.0002941346579965444,
1141
+ "loss": 0.1379,
1142
+ "step": 4256
1143
+ },
1144
+ {
1145
+ "epoch": 76.0,
1146
+ "eval_loss": 0.1497822105884552,
1147
+ "eval_runtime": 11.0615,
1148
+ "eval_samples_per_second": 981.782,
1149
+ "eval_steps_per_second": 1.989,
1150
+ "step": 4256
1151
+ },
1152
+ {
1153
+ "epoch": 77.0,
1154
+ "grad_norm": 0.19567330181598663,
1155
+ "learning_rate": 0.00029402312837994727,
1156
+ "loss": 0.138,
1157
+ "step": 4312
1158
+ },
1159
+ {
1160
+ "epoch": 77.0,
1161
+ "eval_loss": 0.14890199899673462,
1162
+ "eval_runtime": 11.5065,
1163
+ "eval_samples_per_second": 943.812,
1164
+ "eval_steps_per_second": 1.912,
1165
+ "step": 4312
1166
+ },
1167
+ {
1168
+ "epoch": 78.0,
1169
+ "grad_norm": 0.1951490044593811,
1170
+ "learning_rate": 0.0002939101692434606,
1171
+ "loss": 0.1372,
1172
+ "step": 4368
1173
+ },
1174
+ {
1175
+ "epoch": 78.0,
1176
+ "eval_loss": 0.14929604530334473,
1177
+ "eval_runtime": 11.7303,
1178
+ "eval_samples_per_second": 925.806,
1179
+ "eval_steps_per_second": 1.875,
1180
+ "step": 4368
1181
+ },
1182
+ {
1183
+ "epoch": 79.0,
1184
+ "grad_norm": 0.15116438269615173,
1185
+ "learning_rate": 0.00029379578170194554,
1186
+ "loss": 0.1371,
1187
+ "step": 4424
1188
+ },
1189
+ {
1190
+ "epoch": 79.0,
1191
+ "eval_loss": 0.14909496903419495,
1192
+ "eval_runtime": 11.5142,
1193
+ "eval_samples_per_second": 943.184,
1194
+ "eval_steps_per_second": 1.911,
1195
+ "step": 4424
1196
+ },
1197
+ {
1198
+ "epoch": 80.0,
1199
+ "grad_norm": 0.24799354374408722,
1200
+ "learning_rate": 0.00029367996688436096,
1201
+ "loss": 0.1369,
1202
+ "step": 4480
1203
+ },
1204
+ {
1205
+ "epoch": 80.0,
1206
+ "eval_loss": 0.14952804148197174,
1207
+ "eval_runtime": 10.7014,
1208
+ "eval_samples_per_second": 1014.824,
1209
+ "eval_steps_per_second": 2.056,
1210
+ "step": 4480
1211
+ },
1212
+ {
1213
+ "epoch": 81.0,
1214
+ "grad_norm": 0.16792896389961243,
1215
+ "learning_rate": 0.00029356272593375216,
1216
+ "loss": 0.1368,
1217
+ "step": 4536
1218
+ },
1219
+ {
1220
+ "epoch": 81.0,
1221
+ "eval_loss": 0.1491686999797821,
1222
+ "eval_runtime": 11.5601,
1223
+ "eval_samples_per_second": 939.442,
1224
+ "eval_steps_per_second": 1.903,
1225
+ "step": 4536
1226
+ },
1227
+ {
1228
+ "epoch": 82.0,
1229
+ "grad_norm": 0.21115855872631073,
1230
+ "learning_rate": 0.00029344406000724046,
1231
+ "loss": 0.1363,
1232
+ "step": 4592
1233
+ },
1234
+ {
1235
+ "epoch": 82.0,
1236
+ "eval_loss": 0.14837497472763062,
1237
+ "eval_runtime": 11.7754,
1238
+ "eval_samples_per_second": 922.263,
1239
+ "eval_steps_per_second": 1.868,
1240
+ "step": 4592
1241
+ },
1242
+ {
1243
+ "epoch": 83.0,
1244
+ "grad_norm": 0.15595555305480957,
1245
+ "learning_rate": 0.0002933239702760101,
1246
+ "loss": 0.1361,
1247
+ "step": 4648
1248
+ },
1249
+ {
1250
+ "epoch": 83.0,
1251
+ "eval_loss": 0.14758282899856567,
1252
+ "eval_runtime": 11.5424,
1253
+ "eval_samples_per_second": 940.879,
1254
+ "eval_steps_per_second": 1.906,
1255
+ "step": 4648
1256
+ },
1257
+ {
1258
+ "epoch": 84.0,
1259
+ "grad_norm": 0.14343903958797455,
1260
+ "learning_rate": 0.00029320245792529843,
1261
+ "loss": 0.1355,
1262
+ "step": 4704
1263
+ },
1264
+ {
1265
+ "epoch": 84.0,
1266
+ "eval_loss": 0.1478155553340912,
1267
+ "eval_runtime": 11.4968,
1268
+ "eval_samples_per_second": 944.61,
1269
+ "eval_steps_per_second": 1.914,
1270
+ "step": 4704
1271
+ },
1272
+ {
1273
+ "epoch": 85.0,
1274
+ "grad_norm": 0.2670864462852478,
1275
+ "learning_rate": 0.00029307952415438376,
1276
+ "loss": 0.1353,
1277
+ "step": 4760
1278
+ },
1279
+ {
1280
+ "epoch": 85.0,
1281
+ "eval_loss": 0.14811985194683075,
1282
+ "eval_runtime": 11.0295,
1283
+ "eval_samples_per_second": 984.636,
1284
+ "eval_steps_per_second": 1.995,
1285
+ "step": 4760
1286
+ },
1287
+ {
1288
+ "epoch": 86.0,
1289
+ "grad_norm": 0.19388346374034882,
1290
+ "learning_rate": 0.00029295517017657207,
1291
+ "loss": 0.1353,
1292
+ "step": 4816
1293
+ },
1294
+ {
1295
+ "epoch": 86.0,
1296
+ "eval_loss": 0.14837351441383362,
1297
+ "eval_runtime": 11.4695,
1298
+ "eval_samples_per_second": 946.859,
1299
+ "eval_steps_per_second": 1.918,
1300
+ "step": 4816
1301
+ },
1302
+ {
1303
+ "epoch": 87.0,
1304
+ "grad_norm": 0.15899422764778137,
1305
+ "learning_rate": 0.00029282939721918743,
1306
+ "loss": 0.1351,
1307
+ "step": 4872
1308
+ },
1309
+ {
1310
+ "epoch": 87.0,
1311
+ "eval_loss": 0.14791646599769592,
1312
+ "eval_runtime": 11.4789,
1313
+ "eval_samples_per_second": 946.087,
1314
+ "eval_steps_per_second": 1.917,
1315
+ "step": 4872
1316
+ },
1317
+ {
1318
+ "epoch": 88.0,
1319
+ "grad_norm": 0.25924888253211975,
1320
+ "learning_rate": 0.00029270220652355785,
1321
+ "loss": 0.1345,
1322
+ "step": 4928
1323
+ },
1324
+ {
1325
+ "epoch": 88.0,
1326
+ "eval_loss": 0.1483958214521408,
1327
+ "eval_runtime": 11.0986,
1328
+ "eval_samples_per_second": 978.501,
1329
+ "eval_steps_per_second": 1.982,
1330
+ "step": 4928
1331
+ },
1332
+ {
1333
+ "epoch": 89.0,
1334
+ "grad_norm": 0.197585791349411,
1335
+ "learning_rate": 0.0002925735993450043,
1336
+ "loss": 0.1342,
1337
+ "step": 4984
1338
+ },
1339
+ {
1340
+ "epoch": 89.0,
1341
+ "eval_loss": 0.14841538667678833,
1342
+ "eval_runtime": 11.2913,
1343
+ "eval_samples_per_second": 961.799,
1344
+ "eval_steps_per_second": 1.948,
1345
+ "step": 4984
1346
+ },
1347
+ {
1348
+ "epoch": 90.0,
1349
+ "grad_norm": 0.18903715908527374,
1350
+ "learning_rate": 0.0002924435769528278,
1351
+ "loss": 0.1343,
1352
+ "step": 5040
1353
+ },
1354
+ {
1355
+ "epoch": 90.0,
1356
+ "eval_loss": 0.14745239913463593,
1357
+ "eval_runtime": 12.07,
1358
+ "eval_samples_per_second": 899.752,
1359
+ "eval_steps_per_second": 1.823,
1360
+ "step": 5040
1361
+ },
1362
+ {
1363
+ "epoch": 91.0,
1364
+ "grad_norm": 0.1610485017299652,
1365
+ "learning_rate": 0.00029231214063029666,
1366
+ "loss": 0.1336,
1367
+ "step": 5096
1368
+ },
1369
+ {
1370
+ "epoch": 91.0,
1371
+ "eval_loss": 0.1469384878873825,
1372
+ "eval_runtime": 12.1199,
1373
+ "eval_samples_per_second": 896.05,
1374
+ "eval_steps_per_second": 1.815,
1375
+ "step": 5096
1376
+ },
1377
+ {
1378
+ "epoch": 92.0,
1379
+ "grad_norm": 0.20112423598766327,
1380
+ "learning_rate": 0.00029217929167463404,
1381
+ "loss": 0.1337,
1382
+ "step": 5152
1383
+ },
1384
+ {
1385
+ "epoch": 92.0,
1386
+ "eval_loss": 0.14764182269573212,
1387
+ "eval_runtime": 10.2692,
1388
+ "eval_samples_per_second": 1057.536,
1389
+ "eval_steps_per_second": 2.142,
1390
+ "step": 5152
1391
+ },
1392
+ {
1393
+ "epoch": 93.0,
1394
+ "grad_norm": 0.28488588333129883,
1395
+ "learning_rate": 0.00029204503139700625,
1396
+ "loss": 0.1335,
1397
+ "step": 5208
1398
+ },
1399
+ {
1400
+ "epoch": 93.0,
1401
+ "eval_loss": 0.1479685753583908,
1402
+ "eval_runtime": 11.6849,
1403
+ "eval_samples_per_second": 929.407,
1404
+ "eval_steps_per_second": 1.883,
1405
+ "step": 5208
1406
+ },
1407
+ {
1408
+ "epoch": 94.0,
1409
+ "grad_norm": 0.2028261125087738,
1410
+ "learning_rate": 0.0002919093611225077,
1411
+ "loss": 0.1333,
1412
+ "step": 5264
1413
+ },
1414
+ {
1415
+ "epoch": 94.0,
1416
+ "eval_loss": 0.14725789427757263,
1417
+ "eval_runtime": 11.2025,
1418
+ "eval_samples_per_second": 969.429,
1419
+ "eval_steps_per_second": 1.964,
1420
+ "step": 5264
1421
+ },
1422
+ {
1423
+ "epoch": 95.0,
1424
+ "grad_norm": 0.20275919139385223,
1425
+ "learning_rate": 0.0002917722821901492,
1426
+ "loss": 0.1334,
1427
+ "step": 5320
1428
+ },
1429
+ {
1430
+ "epoch": 95.0,
1431
+ "eval_loss": 0.14767614006996155,
1432
+ "eval_runtime": 10.8005,
1433
+ "eval_samples_per_second": 1005.513,
1434
+ "eval_steps_per_second": 2.037,
1435
+ "step": 5320
1436
+ },
1437
+ {
1438
+ "epoch": 96.0,
1439
+ "grad_norm": 0.2053348869085312,
1440
+ "learning_rate": 0.0002916337959528444,
1441
+ "loss": 0.1325,
1442
+ "step": 5376
1443
+ },
1444
+ {
1445
+ "epoch": 96.0,
1446
+ "eval_loss": 0.14707864820957184,
1447
+ "eval_runtime": 11.1238,
1448
+ "eval_samples_per_second": 976.287,
1449
+ "eval_steps_per_second": 1.978,
1450
+ "step": 5376
1451
+ },
1452
+ {
1453
+ "epoch": 97.0,
1454
+ "grad_norm": 0.23510950803756714,
1455
+ "learning_rate": 0.0002914939037773966,
1456
+ "loss": 0.1321,
1457
+ "step": 5432
1458
+ },
1459
+ {
1460
+ "epoch": 97.0,
1461
+ "eval_loss": 0.1476944088935852,
1462
+ "eval_runtime": 10.9362,
1463
+ "eval_samples_per_second": 993.028,
1464
+ "eval_steps_per_second": 2.012,
1465
+ "step": 5432
1466
+ },
1467
+ {
1468
+ "epoch": 98.0,
1469
+ "grad_norm": 0.2703108787536621,
1470
+ "learning_rate": 0.000291352607044485,
1471
+ "loss": 0.1327,
1472
+ "step": 5488
1473
+ },
1474
+ {
1475
+ "epoch": 98.0,
1476
+ "eval_loss": 0.1466565579175949,
1477
+ "eval_runtime": 10.8189,
1478
+ "eval_samples_per_second": 1003.802,
1479
+ "eval_steps_per_second": 2.033,
1480
+ "step": 5488
1481
+ },
1482
+ {
1483
+ "epoch": 99.0,
1484
+ "grad_norm": 0.22386641800403595,
1485
+ "learning_rate": 0.0002912099071486513,
1486
+ "loss": 0.1318,
1487
+ "step": 5544
1488
+ },
1489
+ {
1490
+ "epoch": 99.0,
1491
+ "eval_loss": 0.1469065397977829,
1492
+ "eval_runtime": 10.9677,
1493
+ "eval_samples_per_second": 990.181,
1494
+ "eval_steps_per_second": 2.006,
1495
+ "step": 5544
1496
+ },
1497
+ {
1498
+ "epoch": 100.0,
1499
+ "grad_norm": 0.18684013187885284,
1500
+ "learning_rate": 0.0002910658054982861,
1501
+ "loss": 0.1319,
1502
+ "step": 5600
1503
+ },
1504
+ {
1505
+ "epoch": 100.0,
1506
+ "eval_loss": 0.1462097316980362,
1507
+ "eval_runtime": 11.5801,
1508
+ "eval_samples_per_second": 937.82,
1509
+ "eval_steps_per_second": 1.9,
1510
+ "step": 5600
1511
+ },
1512
+ {
1513
+ "epoch": 101.0,
1514
+ "grad_norm": 0.1831580400466919,
1515
+ "learning_rate": 0.00029092030351561435,
1516
+ "loss": 0.1318,
1517
+ "step": 5656
1518
+ },
1519
+ {
1520
+ "epoch": 101.0,
1521
+ "eval_loss": 0.1467864215373993,
1522
+ "eval_runtime": 11.2551,
1523
+ "eval_samples_per_second": 964.899,
1524
+ "eval_steps_per_second": 1.955,
1525
+ "step": 5656
1526
+ },
1527
+ {
1528
+ "epoch": 102.0,
1529
+ "grad_norm": 0.20423631370067596,
1530
+ "learning_rate": 0.00029077340263668184,
1531
+ "loss": 0.1315,
1532
+ "step": 5712
1533
+ },
1534
+ {
1535
+ "epoch": 102.0,
1536
+ "eval_loss": 0.1470629870891571,
1537
+ "eval_runtime": 10.0185,
1538
+ "eval_samples_per_second": 1083.994,
1539
+ "eval_steps_per_second": 2.196,
1540
+ "step": 5712
1541
+ },
1542
+ {
1543
+ "epoch": 103.0,
1544
+ "grad_norm": 0.20669810473918915,
1545
+ "learning_rate": 0.0002906251043113414,
1546
+ "loss": 0.1312,
1547
+ "step": 5768
1548
+ },
1549
+ {
1550
+ "epoch": 103.0,
1551
+ "eval_loss": 0.14603030681610107,
1552
+ "eval_runtime": 11.5962,
1553
+ "eval_samples_per_second": 936.51,
1554
+ "eval_steps_per_second": 1.897,
1555
+ "step": 5768
1556
+ },
1557
+ {
1558
+ "epoch": 104.0,
1559
+ "grad_norm": 0.18566496670246124,
1560
+ "learning_rate": 0.0002904754100032369,
1561
+ "loss": 0.1308,
1562
+ "step": 5824
1563
+ },
1564
+ {
1565
+ "epoch": 104.0,
1566
+ "eval_loss": 0.146591916680336,
1567
+ "eval_runtime": 11.8139,
1568
+ "eval_samples_per_second": 919.255,
1569
+ "eval_steps_per_second": 1.862,
1570
+ "step": 5824
1571
+ },
1572
+ {
1573
+ "epoch": 105.0,
1574
+ "grad_norm": 0.32265496253967285,
1575
+ "learning_rate": 0.000290324321189791,
1576
+ "loss": 0.1311,
1577
+ "step": 5880
1578
+ },
1579
+ {
1580
+ "epoch": 105.0,
1581
+ "eval_loss": 0.1458718478679657,
1582
+ "eval_runtime": 11.9546,
1583
+ "eval_samples_per_second": 908.438,
1584
+ "eval_steps_per_second": 1.84,
1585
+ "step": 5880
1586
+ },
1587
+ {
1588
+ "epoch": 106.0,
1589
+ "grad_norm": 0.17987699806690216,
1590
+ "learning_rate": 0.00029017183936218906,
1591
+ "loss": 0.1302,
1592
+ "step": 5936
1593
+ },
1594
+ {
1595
+ "epoch": 106.0,
1596
+ "eval_loss": 0.1459737867116928,
1597
+ "eval_runtime": 12.1694,
1598
+ "eval_samples_per_second": 892.4,
1599
+ "eval_steps_per_second": 1.808,
1600
+ "step": 5936
1601
+ },
1602
+ {
1603
+ "epoch": 107.0,
1604
+ "grad_norm": 0.18314820528030396,
1605
+ "learning_rate": 0.0002900179660253659,
1606
+ "loss": 0.1303,
1607
+ "step": 5992
1608
+ },
1609
+ {
1610
+ "epoch": 107.0,
1611
+ "eval_loss": 0.14506617188453674,
1612
+ "eval_runtime": 11.0204,
1613
+ "eval_samples_per_second": 985.446,
1614
+ "eval_steps_per_second": 1.996,
1615
+ "step": 5992
1616
+ },
1617
+ {
1618
+ "epoch": 108.0,
1619
+ "grad_norm": 0.1967027485370636,
1620
+ "learning_rate": 0.00028986270269798893,
1621
+ "loss": 0.13,
1622
+ "step": 6048
1623
+ },
1624
+ {
1625
+ "epoch": 108.0,
1626
+ "eval_loss": 0.1448826640844345,
1627
+ "eval_runtime": 11.2115,
1628
+ "eval_samples_per_second": 968.651,
1629
+ "eval_steps_per_second": 1.962,
1630
+ "step": 6048
1631
+ },
1632
+ {
1633
+ "epoch": 109.0,
1634
+ "grad_norm": 0.17848514020442963,
1635
+ "learning_rate": 0.00028970605091244395,
1636
+ "loss": 0.13,
1637
+ "step": 6104
1638
+ },
1639
+ {
1640
+ "epoch": 109.0,
1641
+ "eval_loss": 0.14577716588974,
1642
+ "eval_runtime": 12.0159,
1643
+ "eval_samples_per_second": 903.806,
1644
+ "eval_steps_per_second": 1.831,
1645
+ "step": 6104
1646
+ },
1647
+ {
1648
+ "epoch": 110.0,
1649
+ "grad_norm": 0.1681281179189682,
1650
+ "learning_rate": 0.00028954801221482137,
1651
+ "loss": 0.13,
1652
+ "step": 6160
1653
+ },
1654
+ {
1655
+ "epoch": 110.0,
1656
+ "eval_loss": 0.1459922343492508,
1657
+ "eval_runtime": 11.657,
1658
+ "eval_samples_per_second": 931.628,
1659
+ "eval_steps_per_second": 1.887,
1660
+ "step": 6160
1661
+ },
1662
+ {
1663
+ "epoch": 111.0,
1664
+ "grad_norm": 0.19543369114398956,
1665
+ "learning_rate": 0.00028938858816489945,
1666
+ "loss": 0.1294,
1667
+ "step": 6216
1668
+ },
1669
+ {
1670
+ "epoch": 111.0,
1671
+ "eval_loss": 0.14557458460330963,
1672
+ "eval_runtime": 11.502,
1673
+ "eval_samples_per_second": 944.183,
1674
+ "eval_steps_per_second": 1.913,
1675
+ "step": 6216
1676
+ },
1677
+ {
1678
+ "epoch": 112.0,
1679
+ "grad_norm": 0.19514279067516327,
1680
+ "learning_rate": 0.0002892277803361288,
1681
+ "loss": 0.1294,
1682
+ "step": 6272
1683
+ },
1684
+ {
1685
+ "epoch": 112.0,
1686
+ "eval_loss": 0.14542081952095032,
1687
+ "eval_runtime": 11.3675,
1688
+ "eval_samples_per_second": 955.353,
1689
+ "eval_steps_per_second": 1.935,
1690
+ "step": 6272
1691
+ },
1692
+ {
1693
+ "epoch": 113.0,
1694
+ "grad_norm": 0.19245897233486176,
1695
+ "learning_rate": 0.00028906559031561803,
1696
+ "loss": 0.1294,
1697
+ "step": 6328
1698
+ },
1699
+ {
1700
+ "epoch": 113.0,
1701
+ "eval_loss": 0.14575673639774323,
1702
+ "eval_runtime": 12.0854,
1703
+ "eval_samples_per_second": 898.603,
1704
+ "eval_steps_per_second": 1.82,
1705
+ "step": 6328
1706
+ },
1707
+ {
1708
+ "epoch": 114.0,
1709
+ "grad_norm": 0.2559398412704468,
1710
+ "learning_rate": 0.0002889020197041172,
1711
+ "loss": 0.129,
1712
+ "step": 6384
1713
+ },
1714
+ {
1715
+ "epoch": 114.0,
1716
+ "eval_loss": 0.14476452767848969,
1717
+ "eval_runtime": 11.4747,
1718
+ "eval_samples_per_second": 946.432,
1719
+ "eval_steps_per_second": 1.917,
1720
+ "step": 6384
1721
+ },
1722
+ {
1723
+ "epoch": 115.0,
1724
+ "grad_norm": 0.1581374853849411,
1725
+ "learning_rate": 0.0002887370701160019,
1726
+ "loss": 0.129,
1727
+ "step": 6440
1728
+ },
1729
+ {
1730
+ "epoch": 115.0,
1731
+ "eval_loss": 0.14649543166160583,
1732
+ "eval_runtime": 11.7792,
1733
+ "eval_samples_per_second": 921.961,
1734
+ "eval_steps_per_second": 1.868,
1735
+ "step": 6440
1736
+ },
1737
+ {
1738
+ "epoch": 116.0,
1739
+ "grad_norm": 0.17189738154411316,
1740
+ "learning_rate": 0.0002885707431792581,
1741
+ "loss": 0.1282,
1742
+ "step": 6496
1743
+ },
1744
+ {
1745
+ "epoch": 116.0,
1746
+ "eval_loss": 0.14660660922527313,
1747
+ "eval_runtime": 11.9186,
1748
+ "eval_samples_per_second": 911.183,
1749
+ "eval_steps_per_second": 1.846,
1750
+ "step": 6496
1751
+ },
1752
+ {
1753
+ "epoch": 117.0,
1754
+ "grad_norm": 0.2357121855020523,
1755
+ "learning_rate": 0.0002884030405354656,
1756
+ "loss": 0.129,
1757
+ "step": 6552
1758
+ },
1759
+ {
1760
+ "epoch": 117.0,
1761
+ "eval_loss": 0.146439790725708,
1762
+ "eval_runtime": 11.5156,
1763
+ "eval_samples_per_second": 943.071,
1764
+ "eval_steps_per_second": 1.91,
1765
+ "step": 6552
1766
+ },
1767
+ {
1768
+ "epoch": 118.0,
1769
+ "grad_norm": 0.1968863159418106,
1770
+ "learning_rate": 0.00028823396383978163,
1771
+ "loss": 0.1279,
1772
+ "step": 6608
1773
+ },
1774
+ {
1775
+ "epoch": 118.0,
1776
+ "eval_loss": 0.1450948715209961,
1777
+ "eval_runtime": 11.6204,
1778
+ "eval_samples_per_second": 934.567,
1779
+ "eval_steps_per_second": 1.893,
1780
+ "step": 6608
1781
+ },
1782
+ {
1783
+ "epoch": 119.0,
1784
+ "grad_norm": 0.16850939393043518,
1785
+ "learning_rate": 0.0002880635147609254,
1786
+ "loss": 0.1279,
1787
+ "step": 6664
1788
+ },
1789
+ {
1790
+ "epoch": 119.0,
1791
+ "eval_loss": 0.1456771343946457,
1792
+ "eval_runtime": 11.4295,
1793
+ "eval_samples_per_second": 950.17,
1794
+ "eval_steps_per_second": 1.925,
1795
+ "step": 6664
1796
+ },
1797
+ {
1798
+ "epoch": 120.0,
1799
+ "grad_norm": 0.20816339552402496,
1800
+ "learning_rate": 0.0002878916949811601,
1801
+ "loss": 0.1277,
1802
+ "step": 6720
1803
+ },
1804
+ {
1805
+ "epoch": 120.0,
1806
+ "eval_loss": 0.1461264193058014,
1807
+ "eval_runtime": 11.9161,
1808
+ "eval_samples_per_second": 911.372,
1809
+ "eval_steps_per_second": 1.846,
1810
+ "step": 6720
1811
+ },
1812
+ {
1813
+ "epoch": 121.0,
1814
+ "grad_norm": 0.19195137917995453,
1815
+ "learning_rate": 0.0002877185061962775,
1816
+ "loss": 0.1279,
1817
+ "step": 6776
1818
+ },
1819
+ {
1820
+ "epoch": 121.0,
1821
+ "eval_loss": 0.14506319165229797,
1822
+ "eval_runtime": 10.7769,
1823
+ "eval_samples_per_second": 1007.715,
1824
+ "eval_steps_per_second": 2.041,
1825
+ "step": 6776
1826
+ },
1827
+ {
1828
+ "epoch": 122.0,
1829
+ "grad_norm": 0.1636265516281128,
1830
+ "learning_rate": 0.0002875439501155812,
1831
+ "loss": 0.1277,
1832
+ "step": 6832
1833
+ },
1834
+ {
1835
+ "epoch": 122.0,
1836
+ "eval_loss": 0.1454634666442871,
1837
+ "eval_runtime": 11.7121,
1838
+ "eval_samples_per_second": 927.245,
1839
+ "eval_steps_per_second": 1.878,
1840
+ "step": 6832
1841
+ },
1842
+ {
1843
+ "epoch": 123.0,
1844
+ "grad_norm": 0.17660963535308838,
1845
+ "learning_rate": 0.00028736802846186907,
1846
+ "loss": 0.1273,
1847
+ "step": 6888
1848
+ },
1849
+ {
1850
+ "epoch": 123.0,
1851
+ "eval_loss": 0.1449379324913025,
1852
+ "eval_runtime": 12.0977,
1853
+ "eval_samples_per_second": 897.695,
1854
+ "eval_steps_per_second": 1.819,
1855
+ "step": 6888
1856
+ },
1857
+ {
1858
+ "epoch": 124.0,
1859
+ "grad_norm": 0.20895443856716156,
1860
+ "learning_rate": 0.00028719074297141686,
1861
+ "loss": 0.127,
1862
+ "step": 6944
1863
+ },
1864
+ {
1865
+ "epoch": 124.0,
1866
+ "eval_loss": 0.14427852630615234,
1867
+ "eval_runtime": 11.8774,
1868
+ "eval_samples_per_second": 914.341,
1869
+ "eval_steps_per_second": 1.852,
1870
+ "step": 6944
1871
+ },
1872
+ {
1873
+ "epoch": 125.0,
1874
+ "grad_norm": 0.1895224153995514,
1875
+ "learning_rate": 0.0002870120953939609,
1876
+ "loss": 0.1269,
1877
+ "step": 7000
1878
+ },
1879
+ {
1880
+ "epoch": 125.0,
1881
+ "eval_loss": 0.1446518748998642,
1882
+ "eval_runtime": 11.7658,
1883
+ "eval_samples_per_second": 923.015,
1884
+ "eval_steps_per_second": 1.87,
1885
+ "step": 7000
1886
+ },
1887
+ {
1888
+ "epoch": 126.0,
1889
+ "grad_norm": 0.191587895154953,
1890
+ "learning_rate": 0.0002868320874926807,
1891
+ "loss": 0.1269,
1892
+ "step": 7056
1893
+ },
1894
+ {
1895
+ "epoch": 126.0,
1896
+ "eval_loss": 0.14533261954784393,
1897
+ "eval_runtime": 11.2533,
1898
+ "eval_samples_per_second": 965.053,
1899
+ "eval_steps_per_second": 1.955,
1900
+ "step": 7056
1901
+ },
1902
+ {
1903
+ "epoch": 127.0,
1904
+ "grad_norm": 0.20511987805366516,
1905
+ "learning_rate": 0.00028665072104418107,
1906
+ "loss": 0.1263,
1907
+ "step": 7112
1908
+ },
1909
+ {
1910
+ "epoch": 127.0,
1911
+ "eval_loss": 0.1444355994462967,
1912
+ "eval_runtime": 11.3297,
1913
+ "eval_samples_per_second": 958.545,
1914
+ "eval_steps_per_second": 1.942,
1915
+ "step": 7112
1916
+ },
1917
+ {
1918
+ "epoch": 128.0,
1919
+ "grad_norm": 0.19347704946994781,
1920
+ "learning_rate": 0.0002864679978384761,
1921
+ "loss": 0.1266,
1922
+ "step": 7168
1923
+ },
1924
+ {
1925
+ "epoch": 128.0,
1926
+ "eval_loss": 0.14528335630893707,
1927
+ "eval_runtime": 11.7467,
1928
+ "eval_samples_per_second": 924.517,
1929
+ "eval_steps_per_second": 1.873,
1930
+ "step": 7168
1931
+ },
1932
+ {
1933
+ "epoch": 129.0,
1934
+ "grad_norm": 0.1948786824941635,
1935
+ "learning_rate": 0.00028628391967896994,
1936
+ "loss": 0.1267,
1937
+ "step": 7224
1938
+ },
1939
+ {
1940
+ "epoch": 129.0,
1941
+ "eval_loss": 0.1452852487564087,
1942
+ "eval_runtime": 10.7249,
1943
+ "eval_samples_per_second": 1012.6,
1944
+ "eval_steps_per_second": 2.051,
1945
+ "step": 7224
1946
+ },
1947
+ {
1948
+ "epoch": 130.0,
1949
+ "grad_norm": 0.2143562138080597,
1950
+ "learning_rate": 0.00028609848838243983,
1951
+ "loss": 0.1263,
1952
+ "step": 7280
1953
+ },
1954
+ {
1955
+ "epoch": 130.0,
1956
+ "eval_loss": 0.14422422647476196,
1957
+ "eval_runtime": 12.1111,
1958
+ "eval_samples_per_second": 896.699,
1959
+ "eval_steps_per_second": 1.817,
1960
+ "step": 7280
1961
+ },
1962
+ {
1963
+ "epoch": 131.0,
1964
+ "grad_norm": 0.17198456823825836,
1965
+ "learning_rate": 0.0002859117057790177,
1966
+ "loss": 0.1258,
1967
+ "step": 7336
1968
+ },
1969
+ {
1970
+ "epoch": 131.0,
1971
+ "eval_loss": 0.14419187605381012,
1972
+ "eval_runtime": 11.2161,
1973
+ "eval_samples_per_second": 968.25,
1974
+ "eval_steps_per_second": 1.961,
1975
+ "step": 7336
1976
+ },
1977
+ {
1978
+ "epoch": 132.0,
1979
+ "grad_norm": 0.2027718871831894,
1980
+ "learning_rate": 0.0002857235737121728,
1981
+ "loss": 0.1257,
1982
+ "step": 7392
1983
+ },
1984
+ {
1985
+ "epoch": 132.0,
1986
+ "eval_loss": 0.14398382604122162,
1987
+ "eval_runtime": 11.7549,
1988
+ "eval_samples_per_second": 923.871,
1989
+ "eval_steps_per_second": 1.872,
1990
+ "step": 7392
1991
+ },
1992
+ {
1993
+ "epoch": 133.0,
1994
+ "grad_norm": 0.18598471581935883,
1995
+ "learning_rate": 0.00028553409403869214,
1996
+ "loss": 0.1256,
1997
+ "step": 7448
1998
+ },
1999
+ {
2000
+ "epoch": 133.0,
2001
+ "eval_loss": 0.144750714302063,
2002
+ "eval_runtime": 10.9992,
2003
+ "eval_samples_per_second": 987.344,
2004
+ "eval_steps_per_second": 2.0,
2005
+ "step": 7448
2006
+ },
2007
+ {
2008
+ "epoch": 134.0,
2009
+ "grad_norm": 0.18290792405605316,
2010
+ "learning_rate": 0.0002853432686286638,
2011
+ "loss": 0.1255,
2012
+ "step": 7504
2013
+ },
2014
+ {
2015
+ "epoch": 134.0,
2016
+ "eval_loss": 0.14384572207927704,
2017
+ "eval_runtime": 11.23,
2018
+ "eval_samples_per_second": 967.05,
2019
+ "eval_steps_per_second": 1.959,
2020
+ "step": 7504
2021
+ },
2022
+ {
2023
+ "epoch": 135.0,
2024
+ "grad_norm": 0.22160011529922485,
2025
+ "learning_rate": 0.0002851510993654578,
2026
+ "loss": 0.1254,
2027
+ "step": 7560
2028
+ },
2029
+ {
2030
+ "epoch": 135.0,
2031
+ "eval_loss": 0.1437937319278717,
2032
+ "eval_runtime": 11.9673,
2033
+ "eval_samples_per_second": 907.472,
2034
+ "eval_steps_per_second": 1.838,
2035
+ "step": 7560
2036
+ },
2037
+ {
2038
+ "epoch": 136.0,
2039
+ "grad_norm": 0.18182989954948425,
2040
+ "learning_rate": 0.0002849575881457068,
2041
+ "loss": 0.1252,
2042
+ "step": 7616
2043
+ },
2044
+ {
2045
+ "epoch": 136.0,
2046
+ "eval_loss": 0.14378975331783295,
2047
+ "eval_runtime": 11.8117,
2048
+ "eval_samples_per_second": 919.426,
2049
+ "eval_steps_per_second": 1.863,
2050
+ "step": 7616
2051
+ },
2052
+ {
2053
+ "epoch": 137.0,
2054
+ "grad_norm": 0.16500607132911682,
2055
+ "learning_rate": 0.0002847627368792885,
2056
+ "loss": 0.125,
2057
+ "step": 7672
2058
+ },
2059
+ {
2060
+ "epoch": 137.0,
2061
+ "eval_loss": 0.1436585932970047,
2062
+ "eval_runtime": 12.4256,
2063
+ "eval_samples_per_second": 874.0,
2064
+ "eval_steps_per_second": 1.771,
2065
+ "step": 7672
2066
+ },
2067
+ {
2068
+ "epoch": 138.0,
2069
+ "grad_norm": 0.22664882242679596,
2070
+ "learning_rate": 0.0002845665474893062,
2071
+ "loss": 0.125,
2072
+ "step": 7728
2073
+ },
2074
+ {
2075
+ "epoch": 138.0,
2076
+ "eval_loss": 0.14313535392284393,
2077
+ "eval_runtime": 12.1895,
2078
+ "eval_samples_per_second": 890.932,
2079
+ "eval_steps_per_second": 1.805,
2080
+ "step": 7728
2081
+ },
2082
+ {
2083
+ "epoch": 139.0,
2084
+ "grad_norm": 0.1606769859790802,
2085
+ "learning_rate": 0.0002843690219120703,
2086
+ "loss": 0.1242,
2087
+ "step": 7784
2088
+ },
2089
+ {
2090
+ "epoch": 139.0,
2091
+ "eval_loss": 0.14361213147640228,
2092
+ "eval_runtime": 12.1036,
2093
+ "eval_samples_per_second": 897.251,
2094
+ "eval_steps_per_second": 1.818,
2095
+ "step": 7784
2096
+ },
2097
+ {
2098
+ "epoch": 140.0,
2099
+ "grad_norm": 0.20197436213493347,
2100
+ "learning_rate": 0.0002841701620970783,
2101
+ "loss": 0.1244,
2102
+ "step": 7840
2103
+ },
2104
+ {
2105
+ "epoch": 140.0,
2106
+ "eval_loss": 0.142960324883461,
2107
+ "eval_runtime": 11.6316,
2108
+ "eval_samples_per_second": 933.665,
2109
+ "eval_steps_per_second": 1.891,
2110
+ "step": 7840
2111
+ },
2112
+ {
2113
+ "epoch": 141.0,
2114
+ "grad_norm": 0.18616272509098053,
2115
+ "learning_rate": 0.000283969970006996,
2116
+ "loss": 0.1243,
2117
+ "step": 7896
2118
+ },
2119
+ {
2120
+ "epoch": 141.0,
2121
+ "eval_loss": 0.1441134661436081,
2122
+ "eval_runtime": 11.589,
2123
+ "eval_samples_per_second": 937.094,
2124
+ "eval_steps_per_second": 1.898,
2125
+ "step": 7896
2126
+ },
2127
+ {
2128
+ "epoch": 142.0,
2129
+ "grad_norm": 0.20340923964977264,
2130
+ "learning_rate": 0.0002837684476176391,
2131
+ "loss": 0.1239,
2132
+ "step": 7952
2133
+ },
2134
+ {
2135
+ "epoch": 142.0,
2136
+ "eval_loss": 0.1434699296951294,
2137
+ "eval_runtime": 12.3235,
2138
+ "eval_samples_per_second": 881.241,
2139
+ "eval_steps_per_second": 1.785,
2140
+ "step": 7952
2141
+ },
2142
+ {
2143
+ "epoch": 143.0,
2144
+ "grad_norm": 0.18145394325256348,
2145
+ "learning_rate": 0.0002835655969179518,
2146
+ "loss": 0.1241,
2147
+ "step": 8008
2148
+ },
2149
+ {
2150
+ "epoch": 143.0,
2151
+ "eval_loss": 0.14338643848896027,
2152
+ "eval_runtime": 12.3449,
2153
+ "eval_samples_per_second": 879.717,
2154
+ "eval_steps_per_second": 1.782,
2155
+ "step": 8008
2156
+ },
2157
+ {
2158
+ "epoch": 144.0,
2159
+ "grad_norm": 0.1755165159702301,
2160
+ "learning_rate": 0.0002833614199099885,
2161
+ "loss": 0.1241,
2162
+ "step": 8064
2163
+ },
2164
+ {
2165
+ "epoch": 144.0,
2166
+ "eval_loss": 0.14308682084083557,
2167
+ "eval_runtime": 12.0765,
2168
+ "eval_samples_per_second": 899.268,
2169
+ "eval_steps_per_second": 1.822,
2170
+ "step": 8064
2171
+ },
2172
+ {
2173
+ "epoch": 145.0,
2174
+ "grad_norm": 0.18520286679267883,
2175
+ "learning_rate": 0.00028315591860889397,
2176
+ "loss": 0.1238,
2177
+ "step": 8120
2178
+ },
2179
+ {
2180
+ "epoch": 145.0,
2181
+ "eval_loss": 0.14301612973213196,
2182
+ "eval_runtime": 11.4026,
2183
+ "eval_samples_per_second": 952.414,
2184
+ "eval_steps_per_second": 1.929,
2185
+ "step": 8120
2186
+ },
2187
+ {
2188
+ "epoch": 146.0,
2189
+ "grad_norm": 0.2836858630180359,
2190
+ "learning_rate": 0.0002829490950428833,
2191
+ "loss": 0.1237,
2192
+ "step": 8176
2193
+ },
2194
+ {
2195
+ "epoch": 146.0,
2196
+ "eval_loss": 0.1432274430990219,
2197
+ "eval_runtime": 10.5295,
2198
+ "eval_samples_per_second": 1031.389,
2199
+ "eval_steps_per_second": 2.089,
2200
+ "step": 8176
2201
+ },
2202
+ {
2203
+ "epoch": 147.0,
2204
+ "grad_norm": 0.18382933735847473,
2205
+ "learning_rate": 0.0002827409512532215,
2206
+ "loss": 0.1233,
2207
+ "step": 8232
2208
+ },
2209
+ {
2210
+ "epoch": 147.0,
2211
+ "eval_loss": 0.14315703511238098,
2212
+ "eval_runtime": 11.7841,
2213
+ "eval_samples_per_second": 921.584,
2214
+ "eval_steps_per_second": 1.867,
2215
+ "step": 8232
2216
+ },
2217
+ {
2218
+ "epoch": 148.0,
2219
+ "grad_norm": 0.16152502596378326,
2220
+ "learning_rate": 0.00028253148929420393,
2221
+ "loss": 0.1236,
2222
+ "step": 8288
2223
+ },
2224
+ {
2225
+ "epoch": 148.0,
2226
+ "eval_loss": 0.14190851151943207,
2227
+ "eval_runtime": 12.2311,
2228
+ "eval_samples_per_second": 887.903,
2229
+ "eval_steps_per_second": 1.799,
2230
+ "step": 8288
2231
+ },
2232
+ {
2233
+ "epoch": 149.0,
2234
+ "grad_norm": 0.23382407426834106,
2235
+ "learning_rate": 0.0002823207112331354,
2236
+ "loss": 0.1232,
2237
+ "step": 8344
2238
+ },
2239
+ {
2240
+ "epoch": 149.0,
2241
+ "eval_loss": 0.14270788431167603,
2242
+ "eval_runtime": 12.109,
2243
+ "eval_samples_per_second": 896.855,
2244
+ "eval_steps_per_second": 1.817,
2245
+ "step": 8344
2246
+ },
2247
+ {
2248
+ "epoch": 150.0,
2249
+ "grad_norm": 0.1615588366985321,
2250
+ "learning_rate": 0.00028210861915030973,
2251
+ "loss": 0.1232,
2252
+ "step": 8400
2253
+ },
2254
+ {
2255
+ "epoch": 150.0,
2256
+ "eval_loss": 0.14285807311534882,
2257
+ "eval_runtime": 12.5884,
2258
+ "eval_samples_per_second": 862.702,
2259
+ "eval_steps_per_second": 1.748,
2260
+ "step": 8400
2261
+ },
2262
+ {
2263
+ "epoch": 151.0,
2264
+ "grad_norm": 0.2795417308807373,
2265
+ "learning_rate": 0.0002818952151389907,
2266
+ "loss": 0.1227,
2267
+ "step": 8456
2268
+ },
2269
+ {
2270
+ "epoch": 151.0,
2271
+ "eval_loss": 0.14255040884017944,
2272
+ "eval_runtime": 12.5025,
2273
+ "eval_samples_per_second": 868.624,
2274
+ "eval_steps_per_second": 1.76,
2275
+ "step": 8456
2276
+ },
2277
+ {
2278
+ "epoch": 152.0,
2279
+ "grad_norm": 0.2292180061340332,
2280
+ "learning_rate": 0.00028168050130538953,
2281
+ "loss": 0.1231,
2282
+ "step": 8512
2283
+ },
2284
+ {
2285
+ "epoch": 152.0,
2286
+ "eval_loss": 0.14337477087974548,
2287
+ "eval_runtime": 12.1529,
2288
+ "eval_samples_per_second": 893.611,
2289
+ "eval_steps_per_second": 1.81,
2290
+ "step": 8512
2291
+ },
2292
+ {
2293
+ "epoch": 153.0,
2294
+ "grad_norm": 0.17736776173114777,
2295
+ "learning_rate": 0.00028146447976864553,
2296
+ "loss": 0.1224,
2297
+ "step": 8568
2298
+ },
2299
+ {
2300
+ "epoch": 153.0,
2301
+ "eval_loss": 0.14352336525917053,
2302
+ "eval_runtime": 12.3539,
2303
+ "eval_samples_per_second": 879.073,
2304
+ "eval_steps_per_second": 1.781,
2305
+ "step": 8568
2306
+ },
2307
+ {
2308
+ "epoch": 154.0,
2309
+ "grad_norm": 0.36273321509361267,
2310
+ "learning_rate": 0.0002812471526608039,
2311
+ "loss": 0.1227,
2312
+ "step": 8624
2313
+ },
2314
+ {
2315
+ "epoch": 154.0,
2316
+ "eval_loss": 0.142772376537323,
2317
+ "eval_runtime": 12.0892,
2318
+ "eval_samples_per_second": 898.323,
2319
+ "eval_steps_per_second": 1.82,
2320
+ "step": 8624
2321
+ },
2322
+ {
2323
+ "epoch": 155.0,
2324
+ "grad_norm": 0.19883078336715698,
2325
+ "learning_rate": 0.00028102852212679526,
2326
+ "loss": 0.1228,
2327
+ "step": 8680
2328
+ },
2329
+ {
2330
+ "epoch": 155.0,
2331
+ "eval_loss": 0.14210332930088043,
2332
+ "eval_runtime": 12.2389,
2333
+ "eval_samples_per_second": 887.336,
2334
+ "eval_steps_per_second": 1.798,
2335
+ "step": 8680
2336
+ },
2337
+ {
2338
+ "epoch": 156.0,
2339
+ "grad_norm": 0.2114337682723999,
2340
+ "learning_rate": 0.00028080859032441463,
2341
+ "loss": 0.1223,
2342
+ "step": 8736
2343
+ },
2344
+ {
2345
+ "epoch": 156.0,
2346
+ "eval_loss": 0.14258325099945068,
2347
+ "eval_runtime": 12.5038,
2348
+ "eval_samples_per_second": 868.534,
2349
+ "eval_steps_per_second": 1.759,
2350
+ "step": 8736
2351
+ },
2352
+ {
2353
+ "epoch": 157.0,
2354
+ "grad_norm": 0.193147674202919,
2355
+ "learning_rate": 0.0002805873594243001,
2356
+ "loss": 0.1223,
2357
+ "step": 8792
2358
+ },
2359
+ {
2360
+ "epoch": 157.0,
2361
+ "eval_loss": 0.1423390656709671,
2362
+ "eval_runtime": 11.2533,
2363
+ "eval_samples_per_second": 965.047,
2364
+ "eval_steps_per_second": 1.955,
2365
+ "step": 8792
2366
+ },
2367
+ {
2368
+ "epoch": 158.0,
2369
+ "grad_norm": 0.15751470625400543,
2370
+ "learning_rate": 0.0002803648316099116,
2371
+ "loss": 0.1222,
2372
+ "step": 8848
2373
+ },
2374
+ {
2375
+ "epoch": 158.0,
2376
+ "eval_loss": 0.1417943835258484,
2377
+ "eval_runtime": 11.5797,
2378
+ "eval_samples_per_second": 937.847,
2379
+ "eval_steps_per_second": 1.9,
2380
+ "step": 8848
2381
+ },
2382
+ {
2383
+ "epoch": 159.0,
2384
+ "grad_norm": 0.27395108342170715,
2385
+ "learning_rate": 0.00028014100907750874,
2386
+ "loss": 0.1219,
2387
+ "step": 8904
2388
+ },
2389
+ {
2390
+ "epoch": 159.0,
2391
+ "eval_loss": 0.14257293939590454,
2392
+ "eval_runtime": 12.328,
2393
+ "eval_samples_per_second": 880.923,
2394
+ "eval_steps_per_second": 1.785,
2395
+ "step": 8904
2396
+ },
2397
+ {
2398
+ "epoch": 160.0,
2399
+ "grad_norm": 0.22418324649333954,
2400
+ "learning_rate": 0.0002799158940361295,
2401
+ "loss": 0.1217,
2402
+ "step": 8960
2403
+ },
2404
+ {
2405
+ "epoch": 160.0,
2406
+ "eval_loss": 0.1431107521057129,
2407
+ "eval_runtime": 12.2423,
2408
+ "eval_samples_per_second": 887.09,
2409
+ "eval_steps_per_second": 1.797,
2410
+ "step": 8960
2411
+ },
2412
+ {
2413
+ "epoch": 161.0,
2414
+ "grad_norm": 0.2003849744796753,
2415
+ "learning_rate": 0.0002796894887075685,
2416
+ "loss": 0.1218,
2417
+ "step": 9016
2418
+ },
2419
+ {
2420
+ "epoch": 161.0,
2421
+ "eval_loss": 0.14198802411556244,
2422
+ "eval_runtime": 11.4923,
2423
+ "eval_samples_per_second": 944.981,
2424
+ "eval_steps_per_second": 1.914,
2425
+ "step": 9016
2426
+ },
2427
+ {
2428
+ "epoch": 162.0,
2429
+ "grad_norm": 0.21222490072250366,
2430
+ "learning_rate": 0.00027946179532635447,
2431
+ "loss": 0.1215,
2432
+ "step": 9072
2433
+ },
2434
+ {
2435
+ "epoch": 162.0,
2436
+ "eval_loss": 0.14226287603378296,
2437
+ "eval_runtime": 12.6489,
2438
+ "eval_samples_per_second": 858.572,
2439
+ "eval_steps_per_second": 1.739,
2440
+ "step": 9072
2441
+ },
2442
+ {
2443
+ "epoch": 163.0,
2444
+ "grad_norm": 0.3284847140312195,
2445
+ "learning_rate": 0.0002792328161397301,
2446
+ "loss": 0.1214,
2447
+ "step": 9128
2448
+ },
2449
+ {
2450
+ "epoch": 163.0,
2451
+ "eval_loss": 0.14255832135677338,
2452
+ "eval_runtime": 11.8749,
2453
+ "eval_samples_per_second": 914.536,
2454
+ "eval_steps_per_second": 1.853,
2455
+ "step": 9128
2456
+ },
2457
+ {
2458
+ "epoch": 164.0,
2459
+ "grad_norm": 0.17873606085777283,
2460
+ "learning_rate": 0.0002790025534076267,
2461
+ "loss": 0.1209,
2462
+ "step": 9184
2463
+ },
2464
+ {
2465
+ "epoch": 164.0,
2466
+ "eval_loss": 0.14214134216308594,
2467
+ "eval_runtime": 11.7349,
2468
+ "eval_samples_per_second": 925.446,
2469
+ "eval_steps_per_second": 1.875,
2470
+ "step": 9184
2471
+ },
2472
+ {
2473
+ "epoch": 165.0,
2474
+ "grad_norm": 0.29637348651885986,
2475
+ "learning_rate": 0.00027877100940264476,
2476
+ "loss": 0.1214,
2477
+ "step": 9240
2478
+ },
2479
+ {
2480
+ "epoch": 165.0,
2481
+ "eval_loss": 0.14148862659931183,
2482
+ "eval_runtime": 11.2369,
2483
+ "eval_samples_per_second": 966.457,
2484
+ "eval_steps_per_second": 1.958,
2485
+ "step": 9240
2486
+ },
2487
+ {
2488
+ "epoch": 166.0,
2489
+ "grad_norm": 0.19445298612117767,
2490
+ "learning_rate": 0.0002785381864100304,
2491
+ "loss": 0.1211,
2492
+ "step": 9296
2493
+ },
2494
+ {
2495
+ "epoch": 166.0,
2496
+ "eval_loss": 0.14366163313388824,
2497
+ "eval_runtime": 11.7897,
2498
+ "eval_samples_per_second": 921.146,
2499
+ "eval_steps_per_second": 1.866,
2500
+ "step": 9296
2501
+ },
2502
+ {
2503
+ "epoch": 167.0,
2504
+ "grad_norm": 0.2037288248538971,
2505
+ "learning_rate": 0.0002783040867276523,
2506
+ "loss": 0.1209,
2507
+ "step": 9352
2508
+ },
2509
+ {
2510
+ "epoch": 167.0,
2511
+ "eval_loss": 0.14206562936306,
2512
+ "eval_runtime": 11.4292,
2513
+ "eval_samples_per_second": 950.199,
2514
+ "eval_steps_per_second": 1.925,
2515
+ "step": 9352
2516
+ },
2517
+ {
2518
+ "epoch": 168.0,
2519
+ "grad_norm": 0.21530179679393768,
2520
+ "learning_rate": 0.0002780687126659796,
2521
+ "loss": 0.1208,
2522
+ "step": 9408
2523
+ },
2524
+ {
2525
+ "epoch": 168.0,
2526
+ "eval_loss": 0.1410149782896042,
2527
+ "eval_runtime": 11.7288,
2528
+ "eval_samples_per_second": 925.923,
2529
+ "eval_steps_per_second": 1.876,
2530
+ "step": 9408
2531
+ }
2532
+ ],
2533
+ "logging_steps": 500,
2534
+ "max_steps": 56000,
2535
+ "num_input_tokens_seen": 0,
2536
+ "num_train_epochs": 1000,
2537
+ "save_steps": 500,
2538
+ "stateful_callbacks": {
2539
+ "EarlyStoppingCallback": {
2540
+ "args": {
2541
+ "early_stopping_patience": 10,
2542
+ "early_stopping_threshold": 1e-05
2543
+ },
2544
+ "attributes": {
2545
+ "early_stopping_patience_counter": 0
2546
+ }
2547
+ },
2548
+ "TrainerControl": {
2549
+ "args": {
2550
+ "should_epoch_stop": false,
2551
+ "should_evaluate": false,
2552
+ "should_log": false,
2553
+ "should_save": true,
2554
+ "should_training_stop": false
2555
+ },
2556
+ "attributes": {}
2557
+ }
2558
+ },
2559
+ "total_flos": 1.1001513367240704e+18,
2560
+ "train_batch_size": 512,
2561
+ "trial_name": null,
2562
+ "trial_params": null
2563
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5ecee95428066b300b2d5db561c50a64688363967e43cd3bb0ca6cc31f18ae34
3
+ size 5969