yamraj047 committed
Commit 3a0baeb · verified · 1 Parent(s): 3467392

Upload 13 files

README.md CHANGED
@@ -1,7 +1,207 @@
1
  ---
2
- license: apache-2.0
3
  ---
4
- Epoch Training Loss Validation Loss
5
 
6
- 2 0.473800 0.684264
7
 
1
  ---
2
+ base_model: mistralai/Mistral-7B-v0.1
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:mistralai/Mistral-7B-v0.1
7
+ - lora
8
+ - transformers
9
  ---
 
10
 
11
+ # Model Card for Model ID
12
 
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
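The quick-start stub above can be filled in from the files in this commit: `adapter_config-6.json` records a LoRA adapter trained on top of `mistralai/Mistral-7B-v0.1` with task type `CAUSAL_LM`. A minimal loading sketch is shown below; the adapter repo id `yamraj047/nepal-legal-model` is an assumption based on the checkpoint path in `trainer_state-6.json`, and the dtype/device settings are illustrative only.

```python
# Minimal sketch, assuming the adapter is published under the hypothetical repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-v0.1"      # from adapter_config-6.json
ADAPTER_REPO = "yamraj047/nepal-legal-model"  # hypothetical repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token     # tokenizer_config.json sets pad_token to "</s>"

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,                # assumption: half precision to fit a single GPU
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()

prompt = "Question: What is a writ petition?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```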
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.17.1
adapter_config-6.json ADDED
@@ -0,0 +1,42 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "mistralai/Mistral-7B-v0.1",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "qalora_group_size": 16,
24
+ "r": 16,
25
+ "rank_pattern": {},
26
+ "revision": null,
27
+ "target_modules": [
28
+ "v_proj",
29
+ "gate_proj",
30
+ "o_proj",
31
+ "k_proj",
32
+ "up_proj",
33
+ "down_proj",
34
+ "q_proj"
35
+ ],
36
+ "target_parameters": null,
37
+ "task_type": "CAUSAL_LM",
38
+ "trainable_token_indices": null,
39
+ "use_dora": false,
40
+ "use_qalora": false,
41
+ "use_rslora": false
42
+ }
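For readers reconstructing the training setup, the adapter config above maps to a `peft.LoraConfig` with rank 16, alpha 32, dropout 0.05, and every attention and MLP projection of the Mistral block as a target module. The sketch below is inferred from the saved JSON, not taken from the original training script, and only shows the non-default fields.

```python
# Sketch of the LoraConfig implied by adapter_config-6.json (a reconstruction, not the author's script).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,         # scaling factor (alpha / r = 2)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[       # all attention and MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Applied to a loaded base model before training, e.g.:
# peft_model = get_peft_model(base_model, lora_config)
```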
adapter_model-6.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:673704334c1db9c68a18eec7f2793e9e752da544e503bab1064bc32e438cb6c0
3
+ size 167832240
optimizer-5.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:48ea4acc1d0f4f5c7ca8cf8743612846af1e30223211f696272c4abf42f486f7
3
+ size 85733607
rng_state-2.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:89d152c2bbfea8b5176830b4c43f42d24c8f3e43e8c250a9e7ae7fd66305706a
3
+ size 14645
scaler-2.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:00ddef88593cbb83cbfefa9bbe9b0fac17ba88550ad9bce7578ad68c0bea8ae1
3
+ size 1383
scheduler-4.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:badd1b4c74f1dfb5be5fef2e5d0412cbee5bd3dcd153599495e12ff16bc87721
3
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "</s>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ }
30
+ },
31
+ "additional_special_tokens": [],
32
+ "bos_token": "<s>",
33
+ "clean_up_tokenization_spaces": false,
34
+ "eos_token": "</s>",
35
+ "extra_special_tokens": {},
36
+ "legacy": false,
37
+ "model_max_length": 1000000000000000019884624838656,
38
+ "pad_token": "</s>",
39
+ "sp_model_kwargs": {},
40
+ "spaces_between_special_tokens": false,
41
+ "tokenizer_class": "LlamaTokenizer",
42
+ "unk_token": "<unk>",
43
+ "use_default_system_prompt": false
44
+ }
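One detail worth flagging in the tokenizer files: there is no dedicated padding token; `pad_token` reuses the EOS token `</s>` and `add_eos_token` is false. A small sketch of loading this tokenizer and padding a batch follows; the repo id is again a hypothetical placeholder, and pointing at a local directory containing these files works the same way.

```python
# Sketch: load the tokenizer shipped in this commit and pad with the EOS token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("yamraj047/nepal-legal-model")  # hypothetical repo id
assert tok.pad_token == tok.eos_token == "</s>"  # per tokenizer_config.json

tok.padding_side = "right"  # common choice for causal-LM training; padded positions are masked in the labels
batch = tok(
    ["Short example.", "A somewhat longer second example for padding."],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape, batch["attention_mask"].sum(dim=1))
```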
trainer_state-6.json ADDED
@@ -0,0 +1,1184 @@
1
+ {
2
+ "best_global_step": 1620,
3
+ "best_metric": 0.6842637062072754,
4
+ "best_model_checkpoint": "./nepal-legal-model/checkpoint-1620",
5
+ "epoch": 2.0,
6
+ "eval_steps": 500,
7
+ "global_step": 1620,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.012345679012345678,
14
+ "grad_norm": 2.0793182849884033,
15
+ "learning_rate": 2.4657534246575342e-05,
16
+ "loss": 1.858,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.024691358024691357,
21
+ "grad_norm": 2.0381851196289062,
22
+ "learning_rate": 5.2054794520547945e-05,
23
+ "loss": 1.4272,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.037037037037037035,
28
+ "grad_norm": 1.6031200885772705,
29
+ "learning_rate": 7.945205479452055e-05,
30
+ "loss": 1.0783,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.04938271604938271,
35
+ "grad_norm": 1.6209287643432617,
36
+ "learning_rate": 0.00010684931506849317,
37
+ "loss": 1.0151,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.06172839506172839,
42
+ "grad_norm": 1.2847559452056885,
43
+ "learning_rate": 0.00013424657534246576,
44
+ "loss": 0.9561,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.07407407407407407,
49
+ "grad_norm": 1.2488341331481934,
50
+ "learning_rate": 0.00016164383561643837,
51
+ "loss": 0.9219,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.08641975308641975,
56
+ "grad_norm": 1.1916821002960205,
57
+ "learning_rate": 0.00018904109589041096,
58
+ "loss": 0.9161,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.09876543209876543,
63
+ "grad_norm": 1.1260769367218018,
64
+ "learning_rate": 0.0001999968022038833,
65
+ "loss": 0.944,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.1111111111111111,
70
+ "grad_norm": 1.1027578115463257,
71
+ "learning_rate": 0.0001999772608571399,
72
+ "loss": 0.8996,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.12345679012345678,
77
+ "grad_norm": 1.143027663230896,
78
+ "learning_rate": 0.0001999399581844347,
79
+ "loss": 0.8976,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.13580246913580246,
84
+ "grad_norm": 1.2351293563842773,
85
+ "learning_rate": 0.00019988490081272397,
86
+ "loss": 0.8754,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.14814814814814814,
91
+ "grad_norm": 1.026627779006958,
92
+ "learning_rate": 0.0001998120985231511,
93
+ "loss": 0.8798,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.16049382716049382,
98
+ "grad_norm": 1.2233778238296509,
99
+ "learning_rate": 0.00019972156424930896,
100
+ "loss": 0.8562,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.1728395061728395,
105
+ "grad_norm": 1.1058930158615112,
106
+ "learning_rate": 0.00019961331407494245,
107
+ "loss": 0.8666,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.18518518518518517,
112
+ "grad_norm": 1.1126072406768799,
113
+ "learning_rate": 0.00019948736723109082,
114
+ "loss": 0.8733,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.19753086419753085,
119
+ "grad_norm": 1.0302902460098267,
120
+ "learning_rate": 0.00019934374609267136,
121
+ "loss": 0.8287,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.20987654320987653,
126
+ "grad_norm": 1.0634909868240356,
127
+ "learning_rate": 0.00019918247617450454,
128
+ "loss": 0.834,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.2222222222222222,
133
+ "grad_norm": 1.0340452194213867,
134
+ "learning_rate": 0.00019900358612678099,
135
+ "loss": 0.8747,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.2345679012345679,
140
+ "grad_norm": 1.0361789464950562,
141
+ "learning_rate": 0.0001988071077299718,
142
+ "loss": 0.8597,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.24691358024691357,
147
+ "grad_norm": 1.0306812524795532,
148
+ "learning_rate": 0.00019859307588918258,
149
+ "loss": 0.8594,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.25925925925925924,
154
+ "grad_norm": 1.1273531913757324,
155
+ "learning_rate": 0.00019836152862795245,
156
+ "loss": 0.8533,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.2716049382716049,
161
+ "grad_norm": 1.0480186939239502,
162
+ "learning_rate": 0.0001981125070814991,
163
+ "loss": 0.8409,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.2839506172839506,
168
+ "grad_norm": 1.0523358583450317,
169
+ "learning_rate": 0.00019784605548941073,
170
+ "loss": 0.8555,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.2962962962962963,
175
+ "grad_norm": 1.0661187171936035,
176
+ "learning_rate": 0.00019756222118778706,
177
+ "loss": 0.8623,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.30864197530864196,
182
+ "grad_norm": 1.0556374788284302,
183
+ "learning_rate": 0.0001972610546008295,
184
+ "loss": 0.8104,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.32098765432098764,
189
+ "grad_norm": 1.0738286972045898,
190
+ "learning_rate": 0.00019694260923188356,
191
+ "loss": 0.805,
192
+ "step": 260
193
+ },
194
+ {
195
+ "epoch": 0.3333333333333333,
196
+ "grad_norm": 1.0582962036132812,
197
+ "learning_rate": 0.00019660694165393334,
198
+ "loss": 0.8487,
199
+ "step": 270
200
+ },
201
+ {
202
+ "epoch": 0.345679012345679,
203
+ "grad_norm": 0.99802565574646,
204
+ "learning_rate": 0.00019625411149955154,
205
+ "loss": 0.8211,
206
+ "step": 280
207
+ },
208
+ {
209
+ "epoch": 0.35802469135802467,
210
+ "grad_norm": 1.1545002460479736,
211
+ "learning_rate": 0.00019588418145030518,
212
+ "loss": 0.7967,
213
+ "step": 290
214
+ },
215
+ {
216
+ "epoch": 0.37037037037037035,
217
+ "grad_norm": 1.1147046089172363,
218
+ "learning_rate": 0.00019549721722562018,
219
+ "loss": 0.823,
220
+ "step": 300
221
+ },
222
+ {
223
+ "epoch": 0.38271604938271603,
224
+ "grad_norm": 1.0882933139801025,
225
+ "learning_rate": 0.00019509328757110598,
226
+ "loss": 0.8206,
227
+ "step": 310
228
+ },
229
+ {
230
+ "epoch": 0.3950617283950617,
231
+ "grad_norm": 1.0245946645736694,
232
+ "learning_rate": 0.0001946724642463427,
233
+ "loss": 0.771,
234
+ "step": 320
235
+ },
236
+ {
237
+ "epoch": 0.4074074074074074,
238
+ "grad_norm": 1.1092945337295532,
239
+ "learning_rate": 0.00019423482201213276,
240
+ "loss": 0.791,
241
+ "step": 330
242
+ },
243
+ {
244
+ "epoch": 0.41975308641975306,
245
+ "grad_norm": 0.9481214284896851,
246
+ "learning_rate": 0.0001937804386172193,
247
+ "loss": 0.8142,
248
+ "step": 340
249
+ },
250
+ {
251
+ "epoch": 0.43209876543209874,
252
+ "grad_norm": 1.1183067560195923,
253
+ "learning_rate": 0.00019330939478447393,
254
+ "loss": 0.7952,
255
+ "step": 350
256
+ },
257
+ {
258
+ "epoch": 0.4444444444444444,
259
+ "grad_norm": 1.094511866569519,
260
+ "learning_rate": 0.00019282177419655585,
261
+ "loss": 0.7853,
262
+ "step": 360
263
+ },
264
+ {
265
+ "epoch": 0.4567901234567901,
266
+ "grad_norm": 0.9660793542861938,
267
+ "learning_rate": 0.00019231766348104556,
268
+ "loss": 0.7678,
269
+ "step": 370
270
+ },
271
+ {
272
+ "epoch": 0.4691358024691358,
273
+ "grad_norm": 1.1652113199234009,
274
+ "learning_rate": 0.000191797152195055,
275
+ "loss": 0.7881,
276
+ "step": 380
277
+ },
278
+ {
279
+ "epoch": 0.48148148148148145,
280
+ "grad_norm": 1.0458804368972778,
281
+ "learning_rate": 0.00019126033280931733,
282
+ "loss": 0.7882,
283
+ "step": 390
284
+ },
285
+ {
286
+ "epoch": 0.49382716049382713,
287
+ "grad_norm": 1.1295260190963745,
288
+ "learning_rate": 0.00019070730069175936,
289
+ "loss": 0.8328,
290
+ "step": 400
291
+ },
292
+ {
293
+ "epoch": 0.5061728395061729,
294
+ "grad_norm": 1.1698542833328247,
295
+ "learning_rate": 0.00019013815409055896,
296
+ "loss": 0.803,
297
+ "step": 410
298
+ },
299
+ {
300
+ "epoch": 0.5185185185185185,
301
+ "grad_norm": 1.214735984802246,
302
+ "learning_rate": 0.0001895529941166909,
303
+ "loss": 0.797,
304
+ "step": 420
305
+ },
306
+ {
307
+ "epoch": 0.5308641975308642,
308
+ "grad_norm": 1.0859205722808838,
309
+ "learning_rate": 0.00018895192472596426,
310
+ "loss": 0.7961,
311
+ "step": 430
312
+ },
313
+ {
314
+ "epoch": 0.5432098765432098,
315
+ "grad_norm": 1.1454962491989136,
316
+ "learning_rate": 0.0001883350527005541,
317
+ "loss": 0.7848,
318
+ "step": 440
319
+ },
320
+ {
321
+ "epoch": 0.5555555555555556,
322
+ "grad_norm": 1.109433889389038,
323
+ "learning_rate": 0.00018770248763003134,
324
+ "loss": 0.7801,
325
+ "step": 450
326
+ },
327
+ {
328
+ "epoch": 0.5679012345679012,
329
+ "grad_norm": 1.0427923202514648,
330
+ "learning_rate": 0.00018705434189189376,
331
+ "loss": 0.7957,
332
+ "step": 460
333
+ },
334
+ {
335
+ "epoch": 0.5802469135802469,
336
+ "grad_norm": 1.0453143119812012,
337
+ "learning_rate": 0.00018639073063160172,
338
+ "loss": 0.7812,
339
+ "step": 470
340
+ },
341
+ {
342
+ "epoch": 0.5925925925925926,
343
+ "grad_norm": 1.0415431261062622,
344
+ "learning_rate": 0.00018571177174212214,
345
+ "loss": 0.7463,
346
+ "step": 480
347
+ },
348
+ {
349
+ "epoch": 0.6049382716049383,
350
+ "grad_norm": 1.1567952632904053,
351
+ "learning_rate": 0.00018501758584298433,
352
+ "loss": 0.7643,
353
+ "step": 490
354
+ },
355
+ {
356
+ "epoch": 0.6172839506172839,
357
+ "grad_norm": 1.0001814365386963,
358
+ "learning_rate": 0.00018430829625885165,
359
+ "loss": 0.7885,
360
+ "step": 500
361
+ },
362
+ {
363
+ "epoch": 0.6296296296296297,
364
+ "grad_norm": 1.0246291160583496,
365
+ "learning_rate": 0.00018358402899761218,
366
+ "loss": 0.7723,
367
+ "step": 510
368
+ },
369
+ {
370
+ "epoch": 0.6419753086419753,
371
+ "grad_norm": 1.0211082696914673,
372
+ "learning_rate": 0.00018284491272799327,
373
+ "loss": 0.7739,
374
+ "step": 520
375
+ },
376
+ {
377
+ "epoch": 0.654320987654321,
378
+ "grad_norm": 1.1272550821304321,
379
+ "learning_rate": 0.00018209107875670277,
380
+ "loss": 0.7844,
381
+ "step": 530
382
+ },
383
+ {
384
+ "epoch": 0.6666666666666666,
385
+ "grad_norm": 1.2581202983856201,
386
+ "learning_rate": 0.00018132266100510214,
387
+ "loss": 0.769,
388
+ "step": 540
389
+ },
390
+ {
391
+ "epoch": 0.6790123456790124,
392
+ "grad_norm": 1.1308720111846924,
393
+ "learning_rate": 0.0001805397959854147,
394
+ "loss": 0.7587,
395
+ "step": 550
396
+ },
397
+ {
398
+ "epoch": 0.691358024691358,
399
+ "grad_norm": 0.9830589890480042,
400
+ "learning_rate": 0.00017974262277647374,
401
+ "loss": 0.766,
402
+ "step": 560
403
+ },
404
+ {
405
+ "epoch": 0.7037037037037037,
406
+ "grad_norm": 1.083027958869934,
407
+ "learning_rate": 0.00017893128299901472,
408
+ "loss": 0.7503,
409
+ "step": 570
410
+ },
411
+ {
412
+ "epoch": 0.7160493827160493,
413
+ "grad_norm": 1.0589890480041504,
414
+ "learning_rate": 0.00017810592079051585,
415
+ "loss": 0.7865,
416
+ "step": 580
417
+ },
418
+ {
419
+ "epoch": 0.7283950617283951,
420
+ "grad_norm": 1.1053671836853027,
421
+ "learning_rate": 0.00017726668277959136,
422
+ "loss": 0.7639,
423
+ "step": 590
424
+ },
425
+ {
426
+ "epoch": 0.7407407407407407,
427
+ "grad_norm": 1.1152055263519287,
428
+ "learning_rate": 0.00017641371805994264,
429
+ "loss": 0.7614,
430
+ "step": 600
431
+ },
432
+ {
433
+ "epoch": 0.7530864197530864,
434
+ "grad_norm": 1.1337790489196777,
435
+ "learning_rate": 0.00017554717816387107,
436
+ "loss": 0.761,
437
+ "step": 610
438
+ },
439
+ {
440
+ "epoch": 0.7654320987654321,
441
+ "grad_norm": 1.1052172183990479,
442
+ "learning_rate": 0.00017466721703535764,
443
+ "loss": 0.7506,
444
+ "step": 620
445
+ },
446
+ {
447
+ "epoch": 0.7777777777777778,
448
+ "grad_norm": 1.0141187906265259,
449
+ "learning_rate": 0.0001737739910027145,
450
+ "loss": 0.7529,
451
+ "step": 630
452
+ },
453
+ {
454
+ "epoch": 0.7901234567901234,
455
+ "grad_norm": 1.1626172065734863,
456
+ "learning_rate": 0.00017286765875081244,
457
+ "loss": 0.7786,
458
+ "step": 640
459
+ },
460
+ {
461
+ "epoch": 0.8024691358024691,
462
+ "grad_norm": 1.1584755182266235,
463
+ "learning_rate": 0.00017194838129289006,
464
+ "loss": 0.745,
465
+ "step": 650
466
+ },
467
+ {
468
+ "epoch": 0.8148148148148148,
469
+ "grad_norm": 1.1189895868301392,
470
+ "learning_rate": 0.0001710163219419491,
471
+ "loss": 0.7436,
472
+ "step": 660
473
+ },
474
+ {
475
+ "epoch": 0.8271604938271605,
476
+ "grad_norm": 1.096261739730835,
477
+ "learning_rate": 0.00017007164628174139,
478
+ "loss": 0.7656,
479
+ "step": 670
480
+ },
481
+ {
482
+ "epoch": 0.8395061728395061,
483
+ "grad_norm": 1.0469647645950317,
484
+ "learning_rate": 0.00016911452213735223,
485
+ "loss": 0.752,
486
+ "step": 680
487
+ },
488
+ {
489
+ "epoch": 0.8518518518518519,
490
+ "grad_norm": 1.1135762929916382,
491
+ "learning_rate": 0.00016814511954538558,
492
+ "loss": 0.7516,
493
+ "step": 690
494
+ },
495
+ {
496
+ "epoch": 0.8641975308641975,
497
+ "grad_norm": 1.0560423135757446,
498
+ "learning_rate": 0.00016716361072375657,
499
+ "loss": 0.7412,
500
+ "step": 700
501
+ },
502
+ {
503
+ "epoch": 0.8765432098765432,
504
+ "grad_norm": 1.190370798110962,
505
+ "learning_rate": 0.00016617017004109632,
506
+ "loss": 0.6996,
507
+ "step": 710
508
+ },
509
+ {
510
+ "epoch": 0.8888888888888888,
511
+ "grad_norm": 0.9996367692947388,
512
+ "learning_rate": 0.0001651649739857746,
513
+ "loss": 0.7436,
514
+ "step": 720
515
+ },
516
+ {
517
+ "epoch": 0.9012345679012346,
518
+ "grad_norm": 1.0223890542984009,
519
+ "learning_rate": 0.00016414820113454622,
520
+ "loss": 0.7632,
521
+ "step": 730
522
+ },
523
+ {
524
+ "epoch": 0.9135802469135802,
525
+ "grad_norm": 1.1123933792114258,
526
+ "learning_rate": 0.0001631200321208261,
527
+ "loss": 0.7437,
528
+ "step": 740
529
+ },
530
+ {
531
+ "epoch": 0.9259259259259259,
532
+ "grad_norm": 1.1075836420059204,
533
+ "learning_rate": 0.00016208064960259897,
534
+ "loss": 0.7722,
535
+ "step": 750
536
+ },
537
+ {
538
+ "epoch": 0.9382716049382716,
539
+ "grad_norm": 1.114262342453003,
540
+ "learning_rate": 0.00016103023822996982,
541
+ "loss": 0.7664,
542
+ "step": 760
543
+ },
544
+ {
545
+ "epoch": 0.9506172839506173,
546
+ "grad_norm": 1.1287472248077393,
547
+ "learning_rate": 0.00015996898461235977,
548
+ "loss": 0.7474,
549
+ "step": 770
550
+ },
551
+ {
552
+ "epoch": 0.9629629629629629,
553
+ "grad_norm": 1.188179850578308,
554
+ "learning_rate": 0.00015889707728535462,
555
+ "loss": 0.7508,
556
+ "step": 780
557
+ },
558
+ {
559
+ "epoch": 0.9753086419753086,
560
+ "grad_norm": 1.1187824010849,
561
+ "learning_rate": 0.0001578147066772104,
562
+ "loss": 0.7656,
563
+ "step": 790
564
+ },
565
+ {
566
+ "epoch": 0.9876543209876543,
567
+ "grad_norm": 1.1188304424285889,
568
+ "learning_rate": 0.00015672206507502337,
569
+ "loss": 0.7268,
570
+ "step": 800
571
+ },
572
+ {
573
+ "epoch": 1.0,
574
+ "grad_norm": 1.1175668239593506,
575
+ "learning_rate": 0.00015561934659056947,
576
+ "loss": 0.7362,
577
+ "step": 810
578
+ },
579
+ {
580
+ "epoch": 1.0,
581
+ "eval_loss": 0.733393669128418,
582
+ "eval_runtime": 411.0606,
583
+ "eval_samples_per_second": 3.503,
584
+ "eval_steps_per_second": 0.876,
585
+ "step": 810
586
+ },
587
+ {
588
+ "epoch": 1.0123456790123457,
589
+ "grad_norm": 1.548414945602417,
590
+ "learning_rate": 0.0001545067471258196,
591
+ "loss": 0.5331,
592
+ "step": 820
593
+ },
594
+ {
595
+ "epoch": 1.0246913580246915,
596
+ "grad_norm": 1.2169567346572876,
597
+ "learning_rate": 0.00015338446433813693,
598
+ "loss": 0.5351,
599
+ "step": 830
600
+ },
601
+ {
602
+ "epoch": 1.037037037037037,
603
+ "grad_norm": 1.2411112785339355,
604
+ "learning_rate": 0.00015225269760516232,
605
+ "loss": 0.5072,
606
+ "step": 840
607
+ },
608
+ {
609
+ "epoch": 1.0493827160493827,
610
+ "grad_norm": 1.0267516374588013,
611
+ "learning_rate": 0.00015111164798939432,
612
+ "loss": 0.5127,
613
+ "step": 850
614
+ },
615
+ {
616
+ "epoch": 1.0617283950617284,
617
+ "grad_norm": 1.082014560699463,
618
+ "learning_rate": 0.00014996151820246935,
619
+ "loss": 0.507,
620
+ "step": 860
621
+ },
622
+ {
623
+ "epoch": 1.074074074074074,
624
+ "grad_norm": 1.1765062808990479,
625
+ "learning_rate": 0.00014880251256914963,
626
+ "loss": 0.517,
627
+ "step": 870
628
+ },
629
+ {
630
+ "epoch": 1.0864197530864197,
631
+ "grad_norm": 1.1010947227478027,
632
+ "learning_rate": 0.0001476348369910238,
633
+ "loss": 0.5151,
634
+ "step": 880
635
+ },
636
+ {
637
+ "epoch": 1.0987654320987654,
638
+ "grad_norm": 1.2586389780044556,
639
+ "learning_rate": 0.00014645869890992803,
640
+ "loss": 0.5277,
641
+ "step": 890
642
+ },
643
+ {
644
+ "epoch": 1.1111111111111112,
645
+ "grad_norm": 1.0955579280853271,
646
+ "learning_rate": 0.000145274307271093,
647
+ "loss": 0.502,
648
+ "step": 900
649
+ },
650
+ {
651
+ "epoch": 1.123456790123457,
652
+ "grad_norm": 1.1811714172363281,
653
+ "learning_rate": 0.0001440818724860241,
654
+ "loss": 0.5271,
655
+ "step": 910
656
+ },
657
+ {
658
+ "epoch": 1.1358024691358024,
659
+ "grad_norm": 1.2689094543457031,
660
+ "learning_rate": 0.00014288160639512105,
661
+ "loss": 0.5348,
662
+ "step": 920
663
+ },
664
+ {
665
+ "epoch": 1.1481481481481481,
666
+ "grad_norm": 1.3397020101547241,
667
+ "learning_rate": 0.0001416737222300438,
668
+ "loss": 0.5278,
669
+ "step": 930
670
+ },
671
+ {
672
+ "epoch": 1.1604938271604939,
673
+ "grad_norm": 1.1110204458236694,
674
+ "learning_rate": 0.00014045843457583085,
675
+ "loss": 0.5328,
676
+ "step": 940
677
+ },
678
+ {
679
+ "epoch": 1.1728395061728394,
680
+ "grad_norm": 1.1803778409957886,
681
+ "learning_rate": 0.0001392359593327778,
682
+ "loss": 0.5188,
683
+ "step": 950
684
+ },
685
+ {
686
+ "epoch": 1.1851851851851851,
687
+ "grad_norm": 1.1317882537841797,
688
+ "learning_rate": 0.00013800651367808158,
689
+ "loss": 0.5181,
690
+ "step": 960
691
+ },
692
+ {
693
+ "epoch": 1.1975308641975309,
694
+ "grad_norm": 1.2042200565338135,
695
+ "learning_rate": 0.00013677031602725822,
696
+ "loss": 0.5165,
697
+ "step": 970
698
+ },
699
+ {
700
+ "epoch": 1.2098765432098766,
701
+ "grad_norm": 1.357170820236206,
702
+ "learning_rate": 0.0001355275859953406,
703
+ "loss": 0.5201,
704
+ "step": 980
705
+ },
706
+ {
707
+ "epoch": 1.2222222222222223,
708
+ "grad_norm": 1.24747896194458,
709
+ "learning_rate": 0.00013427854435786303,
710
+ "loss": 0.5213,
711
+ "step": 990
712
+ },
713
+ {
714
+ "epoch": 1.2345679012345678,
715
+ "grad_norm": 1.2096565961837769,
716
+ "learning_rate": 0.00013302341301163953,
717
+ "loss": 0.5144,
718
+ "step": 1000
719
+ },
720
+ {
721
+ "epoch": 1.2469135802469136,
722
+ "grad_norm": 1.2155942916870117,
723
+ "learning_rate": 0.0001317624149353432,
724
+ "loss": 0.5203,
725
+ "step": 1010
726
+ },
727
+ {
728
+ "epoch": 1.2592592592592593,
729
+ "grad_norm": 1.2768797874450684,
730
+ "learning_rate": 0.00013049577414989317,
731
+ "loss": 0.5253,
732
+ "step": 1020
733
+ },
734
+ {
735
+ "epoch": 1.2716049382716048,
736
+ "grad_norm": 1.174811840057373,
737
+ "learning_rate": 0.0001292237156786565,
738
+ "loss": 0.5268,
739
+ "step": 1030
740
+ },
741
+ {
742
+ "epoch": 1.2839506172839505,
743
+ "grad_norm": 1.1357448101043701,
744
+ "learning_rate": 0.00012794646550747196,
745
+ "loss": 0.5317,
746
+ "step": 1040
747
+ },
748
+ {
749
+ "epoch": 1.2962962962962963,
750
+ "grad_norm": 1.138178825378418,
751
+ "learning_rate": 0.00012666425054450275,
752
+ "loss": 0.5052,
753
+ "step": 1050
754
+ },
755
+ {
756
+ "epoch": 1.308641975308642,
757
+ "grad_norm": 1.2756052017211914,
758
+ "learning_rate": 0.0001253772985799255,
759
+ "loss": 0.5297,
760
+ "step": 1060
761
+ },
762
+ {
763
+ "epoch": 1.3209876543209877,
764
+ "grad_norm": 1.306522250175476,
765
+ "learning_rate": 0.00012408583824546248,
766
+ "loss": 0.5199,
767
+ "step": 1070
768
+ },
769
+ {
770
+ "epoch": 1.3333333333333333,
771
+ "grad_norm": 1.2052675485610962,
772
+ "learning_rate": 0.00012279009897376444,
773
+ "loss": 0.5215,
774
+ "step": 1080
775
+ },
776
+ {
777
+ "epoch": 1.345679012345679,
778
+ "grad_norm": 1.2734299898147583,
779
+ "learning_rate": 0.00012149031095765087,
780
+ "loss": 0.5091,
781
+ "step": 1090
782
+ },
783
+ {
784
+ "epoch": 1.3580246913580247,
785
+ "grad_norm": 1.2760584354400635,
786
+ "learning_rate": 0.00012018670510921557,
787
+ "loss": 0.4978,
788
+ "step": 1100
789
+ },
790
+ {
791
+ "epoch": 1.3703703703703702,
792
+ "grad_norm": 1.286183476448059,
793
+ "learning_rate": 0.0001188795130188042,
794
+ "loss": 0.5433,
795
+ "step": 1110
796
+ },
797
+ {
798
+ "epoch": 1.382716049382716,
799
+ "grad_norm": 1.3962328433990479,
800
+ "learning_rate": 0.00011756896691387141,
801
+ "loss": 0.5576,
802
+ "step": 1120
803
+ },
804
+ {
805
+ "epoch": 1.3950617283950617,
806
+ "grad_norm": 1.234094262123108,
807
+ "learning_rate": 0.00011625529961772481,
808
+ "loss": 0.5274,
809
+ "step": 1130
810
+ },
811
+ {
812
+ "epoch": 1.4074074074074074,
813
+ "grad_norm": 1.2545151710510254,
814
+ "learning_rate": 0.00011493874450816302,
815
+ "loss": 0.5132,
816
+ "step": 1140
817
+ },
818
+ {
819
+ "epoch": 1.4197530864197532,
820
+ "grad_norm": 1.2998137474060059,
821
+ "learning_rate": 0.00011361953547601532,
822
+ "loss": 0.5102,
823
+ "step": 1150
824
+ },
825
+ {
826
+ "epoch": 1.4320987654320987,
827
+ "grad_norm": 1.1854768991470337,
828
+ "learning_rate": 0.00011229790688358994,
829
+ "loss": 0.5389,
830
+ "step": 1160
831
+ },
832
+ {
833
+ "epoch": 1.4444444444444444,
834
+ "grad_norm": 1.2062567472457886,
835
+ "learning_rate": 0.00011097409352303896,
836
+ "loss": 0.5038,
837
+ "step": 1170
838
+ },
839
+ {
840
+ "epoch": 1.4567901234567902,
841
+ "grad_norm": 1.1520670652389526,
842
+ "learning_rate": 0.00010964833057464645,
843
+ "loss": 0.5273,
844
+ "step": 1180
845
+ },
846
+ {
847
+ "epoch": 1.4691358024691357,
848
+ "grad_norm": 1.1943365335464478,
849
+ "learning_rate": 0.00010832085356504786,
850
+ "loss": 0.5149,
851
+ "step": 1190
852
+ },
853
+ {
854
+ "epoch": 1.4814814814814814,
855
+ "grad_norm": 1.239999771118164,
856
+ "learning_rate": 0.00010699189832538795,
857
+ "loss": 0.5234,
858
+ "step": 1200
859
+ },
860
+ {
861
+ "epoch": 1.4938271604938271,
862
+ "grad_norm": 1.1738687753677368,
863
+ "learning_rate": 0.00010566170094942438,
864
+ "loss": 0.5077,
865
+ "step": 1210
866
+ },
867
+ {
868
+ "epoch": 1.5061728395061729,
869
+ "grad_norm": 1.2650293111801147,
870
+ "learning_rate": 0.00010433049775158497,
871
+ "loss": 0.5139,
872
+ "step": 1220
873
+ },
874
+ {
875
+ "epoch": 1.5185185185185186,
876
+ "grad_norm": 1.3197226524353027,
877
+ "learning_rate": 0.00010299852522498535,
878
+ "loss": 0.4822,
879
+ "step": 1230
880
+ },
881
+ {
882
+ "epoch": 1.5308641975308643,
883
+ "grad_norm": 1.2888526916503906,
884
+ "learning_rate": 0.00010166601999941528,
885
+ "loss": 0.5005,
886
+ "step": 1240
887
+ },
888
+ {
889
+ "epoch": 1.5432098765432098,
890
+ "grad_norm": 1.2094467878341675,
891
+ "learning_rate": 0.00010033321879930044,
892
+ "loss": 0.489,
893
+ "step": 1250
894
+ },
895
+ {
896
+ "epoch": 1.5555555555555556,
897
+ "grad_norm": 1.161982536315918,
898
+ "learning_rate": 9.900035840164752e-05,
899
+ "loss": 0.5058,
900
+ "step": 1260
901
+ },
902
+ {
903
+ "epoch": 1.567901234567901,
904
+ "grad_norm": 1.1806972026824951,
905
+ "learning_rate": 9.766767559397977e-05,
906
+ "loss": 0.4991,
907
+ "step": 1270
908
+ },
909
+ {
910
+ "epoch": 1.5802469135802468,
911
+ "grad_norm": 1.2687534093856812,
912
+ "learning_rate": 9.633540713227095e-05,
913
+ "loss": 0.5262,
914
+ "step": 1280
915
+ },
916
+ {
917
+ "epoch": 1.5925925925925926,
918
+ "grad_norm": 1.2309614419937134,
919
+ "learning_rate": 9.500378969888479e-05,
920
+ "loss": 0.4909,
921
+ "step": 1290
922
+ },
923
+ {
924
+ "epoch": 1.6049382716049383,
925
+ "grad_norm": 1.2943015098571777,
926
+ "learning_rate": 9.367305986052747e-05,
927
+ "loss": 0.5267,
928
+ "step": 1300
929
+ },
930
+ {
931
+ "epoch": 1.617283950617284,
932
+ "grad_norm": 1.1998778581619263,
933
+ "learning_rate": 9.234345402622064e-05,
934
+ "loss": 0.5039,
935
+ "step": 1310
936
+ },
937
+ {
938
+ "epoch": 1.6296296296296298,
939
+ "grad_norm": 1.2614692449569702,
940
+ "learning_rate": 9.101520840530245e-05,
941
+ "loss": 0.509,
942
+ "step": 1320
943
+ },
944
+ {
945
+ "epoch": 1.6419753086419753,
946
+ "grad_norm": 1.2492973804473877,
947
+ "learning_rate": 8.968855896546429e-05,
948
+ "loss": 0.5279,
949
+ "step": 1330
950
+ },
951
+ {
952
+ "epoch": 1.654320987654321,
953
+ "grad_norm": 1.2979342937469482,
954
+ "learning_rate": 8.83637413908301e-05,
955
+ "loss": 0.4958,
956
+ "step": 1340
957
+ },
958
+ {
959
+ "epoch": 1.6666666666666665,
960
+ "grad_norm": 1.2049380540847778,
961
+ "learning_rate": 8.70409910400862e-05,
962
+ "loss": 0.5236,
963
+ "step": 1350
964
+ },
965
+ {
966
+ "epoch": 1.6790123456790123,
967
+ "grad_norm": 1.494537591934204,
968
+ "learning_rate": 8.572054290466911e-05,
969
+ "loss": 0.5099,
970
+ "step": 1360
971
+ },
972
+ {
973
+ "epoch": 1.691358024691358,
974
+ "grad_norm": 1.2835593223571777,
975
+ "learning_rate": 8.440263156701835e-05,
976
+ "loss": 0.5148,
977
+ "step": 1370
978
+ },
979
+ {
980
+ "epoch": 1.7037037037037037,
981
+ "grad_norm": 1.40305495262146,
982
+ "learning_rate": 8.308749115890212e-05,
983
+ "loss": 0.4957,
984
+ "step": 1380
985
+ },
986
+ {
987
+ "epoch": 1.7160493827160495,
988
+ "grad_norm": 1.2815101146697998,
989
+ "learning_rate": 8.177535531982266e-05,
990
+ "loss": 0.4919,
991
+ "step": 1390
992
+ },
993
+ {
994
+ "epoch": 1.7283950617283952,
995
+ "grad_norm": 1.3042811155319214,
996
+ "learning_rate": 8.046645715550971e-05,
997
+ "loss": 0.5041,
998
+ "step": 1400
999
+ },
1000
+ {
1001
+ "epoch": 1.7407407407407407,
1002
+ "grad_norm": 1.3259227275848389,
1003
+ "learning_rate": 7.916102919650826e-05,
1004
+ "loss": 0.5199,
1005
+ "step": 1410
1006
+ },
1007
+ {
1008
+ "epoch": 1.7530864197530864,
1009
+ "grad_norm": 1.3886185884475708,
1010
+ "learning_rate": 7.785930335686843e-05,
1011
+ "loss": 0.5027,
1012
+ "step": 1420
1013
+ },
1014
+ {
1015
+ "epoch": 1.765432098765432,
1016
+ "grad_norm": 1.3747823238372803,
1017
+ "learning_rate": 7.656151089294553e-05,
1018
+ "loss": 0.5039,
1019
+ "step": 1430
1020
+ },
1021
+ {
1022
+ "epoch": 1.7777777777777777,
1023
+ "grad_norm": 1.3446813821792603,
1024
+ "learning_rate": 7.526788236231621e-05,
1025
+ "loss": 0.4968,
1026
+ "step": 1440
1027
+ },
1028
+ {
1029
+ "epoch": 1.7901234567901234,
1030
+ "grad_norm": 1.2697521448135376,
1031
+ "learning_rate": 7.397864758281909e-05,
1032
+ "loss": 0.492,
1033
+ "step": 1450
1034
+ },
1035
+ {
1036
+ "epoch": 1.8024691358024691,
1037
+ "grad_norm": 1.2015522718429565,
1038
+ "learning_rate": 7.26940355917269e-05,
1039
+ "loss": 0.5009,
1040
+ "step": 1460
1041
+ },
1042
+ {
1043
+ "epoch": 1.8148148148148149,
1044
+ "grad_norm": 1.1260066032409668,
1045
+ "learning_rate": 7.141427460505712e-05,
1046
+ "loss": 0.4872,
1047
+ "step": 1470
1048
+ },
1049
+ {
1050
+ "epoch": 1.8271604938271606,
1051
+ "grad_norm": 1.410532832145691,
1052
+ "learning_rate": 7.013959197702851e-05,
1053
+ "loss": 0.5243,
1054
+ "step": 1480
1055
+ },
1056
+ {
1057
+ "epoch": 1.8395061728395061,
1058
+ "grad_norm": 1.280680537223816,
1059
+ "learning_rate": 6.887021415967081e-05,
1060
+ "loss": 0.4939,
1061
+ "step": 1490
1062
+ },
1063
+ {
1064
+ "epoch": 1.8518518518518519,
1065
+ "grad_norm": 1.3018995523452759,
1066
+ "learning_rate": 6.760636666259485e-05,
1067
+ "loss": 0.5088,
1068
+ "step": 1500
1069
+ },
1070
+ {
1071
+ "epoch": 1.8641975308641974,
1072
+ "grad_norm": 1.2070612907409668,
1073
+ "learning_rate": 6.634827401292981e-05,
1074
+ "loss": 0.4701,
1075
+ "step": 1510
1076
+ },
1077
+ {
1078
+ "epoch": 1.876543209876543,
1079
+ "grad_norm": 1.1327502727508545,
1080
+ "learning_rate": 6.50961597154351e-05,
1081
+ "loss": 0.4843,
1082
+ "step": 1520
1083
+ },
1084
+ {
1085
+ "epoch": 1.8888888888888888,
1086
+ "grad_norm": 1.2304959297180176,
1087
+ "learning_rate": 6.385024621279411e-05,
1088
+ "loss": 0.4749,
1089
+ "step": 1530
1090
+ },
1091
+ {
1092
+ "epoch": 1.9012345679012346,
1093
+ "grad_norm": 1.3501994609832764,
1094
+ "learning_rate": 6.261075484609634e-05,
1095
+ "loss": 0.4999,
1096
+ "step": 1540
1097
+ },
1098
+ {
1099
+ "epoch": 1.9135802469135803,
1100
+ "grad_norm": 1.2083770036697388,
1101
+ "learning_rate": 6.137790581551525e-05,
1102
+ "loss": 0.4813,
1103
+ "step": 1550
1104
+ },
1105
+ {
1106
+ "epoch": 1.925925925925926,
1107
+ "grad_norm": 1.230406641960144,
1108
+ "learning_rate": 6.0151918141189156e-05,
1109
+ "loss": 0.4897,
1110
+ "step": 1560
1111
+ },
1112
+ {
1113
+ "epoch": 1.9382716049382716,
1114
+ "grad_norm": 1.3704227209091187,
1115
+ "learning_rate": 5.893300962431123e-05,
1116
+ "loss": 0.4931,
1117
+ "step": 1570
1118
+ },
1119
+ {
1120
+ "epoch": 1.9506172839506173,
1121
+ "grad_norm": 1.153743863105774,
1122
+ "learning_rate": 5.772139680843651e-05,
1123
+ "loss": 0.4776,
1124
+ "step": 1580
1125
+ },
1126
+ {
1127
+ "epoch": 1.9629629629629628,
1128
+ "grad_norm": 1.553464651107788,
1129
+ "learning_rate": 5.651729494101201e-05,
1130
+ "loss": 0.4899,
1131
+ "step": 1590
1132
+ },
1133
+ {
1134
+ "epoch": 1.9753086419753085,
1135
+ "grad_norm": 1.3536055088043213,
1136
+ "learning_rate": 5.532091793513732e-05,
1137
+ "loss": 0.4734,
1138
+ "step": 1600
1139
+ },
1140
+ {
1141
+ "epoch": 1.9876543209876543,
1142
+ "grad_norm": 1.220577597618103,
1143
+ "learning_rate": 5.413247833156219e-05,
1144
+ "loss": 0.4677,
1145
+ "step": 1610
1146
+ },
1147
+ {
1148
+ "epoch": 2.0,
1149
+ "grad_norm": 1.194543480873108,
1150
+ "learning_rate": 5.2952187260927675e-05,
1151
+ "loss": 0.4738,
1152
+ "step": 1620
1153
+ },
1154
+ {
1155
+ "epoch": 2.0,
1156
+ "eval_loss": 0.6842637062072754,
1157
+ "eval_runtime": 410.4698,
1158
+ "eval_samples_per_second": 3.508,
1159
+ "eval_steps_per_second": 0.877,
1160
+ "step": 1620
1161
+ }
1162
+ ],
1163
+ "logging_steps": 10,
1164
+ "max_steps": 2430,
1165
+ "num_input_tokens_seen": 0,
1166
+ "num_train_epochs": 3,
1167
+ "save_steps": 500,
1168
+ "stateful_callbacks": {
1169
+ "TrainerControl": {
1170
+ "args": {
1171
+ "should_epoch_stop": false,
1172
+ "should_evaluate": false,
1173
+ "should_log": false,
1174
+ "should_save": true,
1175
+ "should_training_stop": false
1176
+ },
1177
+ "attributes": {}
1178
+ }
1179
+ },
1180
+ "total_flos": 2.8474547416911053e+17,
1181
+ "train_batch_size": 4,
1182
+ "trial_name": null,
1183
+ "trial_params": null
1184
+ }
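The trainer state above records the loss curve for the first two of three planned epochs: training loss drops from about 1.86 at step 10 to about 0.47 at step 1620, and the evaluation loss improves from 0.7334 after epoch 1 to 0.6843 after epoch 2, which is also the best checkpoint. A short sketch for extracting those numbers, assuming the file is saved locally as `trainer_state-6.json`:

```python
# Sketch: summarise the loss curve recorded in trainer_state-6.json.
import json

with open("trainer_state-6.json") as f:
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(f"best checkpoint: {state['best_model_checkpoint']} (eval_loss={state['best_metric']:.4f})")
print(f"train loss: {train[0][1]:.3f} (step {train[0][0]}) -> {train[-1][1]:.3f} (step {train[-1][0]})")
for step, loss in evals:
    print(f"eval_loss at step {step}: {loss:.4f}")
```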
training_args-6.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6c63a8058f1f43024b3dc1982d0bdea55cef0e28323149cf48e044a325a6f066
3
+ size 5841