Arvyniitb committed on
Commit 6eebe0c · verified · 1 parent: 6972730

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Mistral-7B-Instruct-v0.3
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.17.1
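The model card's "How to Get Started" section is left as a stub. A minimal, hedged sketch for attaching this LoRA adapter to its base model with `peft` might look like the following; `ADAPTER_ID` is a placeholder (substitute the actual Hub repo id or a local path), and loading requires access to the gated `mistralai/Mistral-7B-Instruct-v0.3` weights, so the function is only defined here, not run:

```python
# Sketch: load the base model and attach the q_proj/v_proj LoRA weights.
# ADAPTER_ID is a placeholder, not a real repo id; the base model is gated
# and ~14 GB, so this is illustrative rather than executed inline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-7B-Instruct-v0.3"
ADAPTER_ID = "path/to/this-adapter-repo"  # placeholder


def load_adapter(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the LoRA deltas applied."""
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # inference_mode is true in adapter_config.json, so the adapter loads frozen.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

After loading, `model.generate` can be used as with any causal LM; `model.merge_and_unload()` would fold the adapter into the base weights for deployment.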
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
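The config above (`r=16`, LoRA on `q_proj` and `v_proj` only, no LoRA bias) pins down the adapter's size. As a sanity check, assuming the standard Mistral-7B geometry, which this repo does not state explicitly (32 decoder layers, hidden size 4096, 8 KV heads of head dimension 128, so `v_proj` maps 4096 → 1024), the parameter count can be computed and compared against the 27,280,152-byte `adapter_model.safetensors` that follows:

```python
# Estimate adapter size from the LoRA config above.
# Assumed Mistral-7B geometry (not stated in this repo): 32 layers,
# hidden_size 4096, 8 KV heads x head_dim 128 => v_proj output dim 1024.
r = 16
layers = 32
hidden = 4096
kv_dim = 8 * 128  # 1024

# A LoRA pair (A: in->r, B: r->out) adds r * (in + out) parameters per module.
q_proj_params = r * (hidden + hidden)  # q_proj: 4096 -> 4096
v_proj_params = r * (hidden + kv_dim)  # v_proj: 4096 -> 1024
total = layers * (q_proj_params + v_proj_params)

print(total)      # 6815744 trainable parameters
print(total * 4)  # 27262976 bytes if stored in fp32
# Close to the 27,280,152-byte adapter_model.safetensors below;
# the remainder is the safetensors header metadata.
```

The near-exact match with the checkpoint size suggests the adapter tensors are stored in fp32.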
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7310bd5cdac0e96e60625be9080e2baa5ada4d62d6ae6bef9bf8c141c8b4bd9b
+ size 27280152
chat_template.jinja ADDED
@@ -0,0 +1,87 @@
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+ {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
+
+ {#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
+ {%- set ns = namespace() %}
+ {%- set ns.index = 0 %}
+ {%- for message in loop_messages %}
+ {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
+ {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
+ {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
+ {%- endif %}
+ {%- set ns.index = ns.index + 1 %}
+ {%- endif %}
+ {%- endfor %}
+
+ {{- bos_token }}
+ {%- for message in loop_messages %}
+ {%- if message["role"] == "user" %}
+ {%- if tools is not none and (message == user_messages[-1]) %}
+ {{- "[AVAILABLE_TOOLS] [" }}
+ {%- for tool in tools %}
+ {%- set tool = tool.function %}
+ {{- '{"type": "function", "function": {' }}
+ {%- for key, val in tool.items() if key != "return" %}
+ {%- if val is string %}
+ {{- '"' + key + '": "' + val + '"' }}
+ {%- else %}
+ {{- '"' + key + '": ' + val|tojson }}
+ {%- endif %}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "}}" }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "[/AVAILABLE_TOOLS]" }}
+ {%- endif %}
+ {%- if loop.last and system_message is defined %}
+ {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
+ {%- else %}
+ {{- "[INST] " + message["content"] + "[/INST]" }}
+ {%- endif %}
+ {%- elif message.tool_calls is defined and message.tool_calls is not none %}
+ {{- "[TOOL_CALLS] [" }}
+ {%- for tool_call in message.tool_calls %}
+ {%- set out = tool_call.function|tojson %}
+ {{- out[:-1] }}
+ {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- ', "id": "' + tool_call.id + '"}' }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" + eos_token }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif message["role"] == "assistant" %}
+ {{- " " + message["content"]|trim + eos_token}}
+ {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
+ {%- if message.content is defined and message.content.content is defined %}
+ {%- set content = message.content.content %}
+ {%- else %}
+ {%- set content = message.content %}
+ {%- endif %}
+ {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
+ {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
+ {%- else %}
+ {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+ {%- endif %}
+ {%- endfor %}
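The template above produces the Mistral-instruct prompt format: `[INST] … [/INST]` around user turns, assistant turns emitted as a space plus the trimmed content plus the EOS token, and the optional system message folded into the final user turn only. A simplified re-implementation of the non-tool path makes the wire format concrete (the helper name and the hard-coded `<s>`/`</s>` tokens are assumptions here; the real template takes BOS/EOS from the tokenizer):

```python
# Simplified sketch of the chat template's core (non-tool) rendering path.
BOS, EOS = "<s>", "</s>"  # assumed token strings; supplied by the tokenizer in practice


def render(messages):
    """Render a conversation the way the Jinja template above does (no tools)."""
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    out = BOS
    for i, m in enumerate(messages):
        if m["role"] == "user":
            content = m["content"]
            # The system message is prepended only when this user turn is the
            # final message in the list (the template's `loop.last` check).
            if system is not None and i == len(messages) - 1:
                content = system + "\n\n" + content
            out += "[INST] " + content + "[/INST]"
        elif m["role"] == "assistant":
            out += " " + m["content"].strip() + EOS
    return out


print(render([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
]))
# -> <s>[INST] Hi[/INST] Hello</s>[INST] Be terse.\n\nBye[/INST]
```

In practice `tokenizer.apply_chat_template(messages)` handles this rendering; the sketch only illustrates what the resulting string looks like.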
checkpoint-410/README.md ADDED
@@ -0,0 +1,207 @@
+ (identical to the top-level README.md above)
checkpoint-410/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ (identical to the top-level adapter_config.json above)
checkpoint-410/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b20fae71d882abdd0bc9477fae2045b407904c9ea15ea877f0c9993cf9fa3fed
+ size 27280152
checkpoint-410/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:498a67e77de1222aaba1ba471224cef172b7cffc898f013f57e79d6b96d98d23
+ size 54636235
checkpoint-410/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:22aef3df2367ea7da8b6ce1f68b5241aa8aac5ec108641dd3326b6ac59ebb1c2
+ size 14645
checkpoint-410/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac6ad17c6a8c86030146e63684fda11ce325d6fb30293c0d06ad21be0385527f
+ size 1383
checkpoint-410/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bfbb8fed946bbe258a94cb211e3043bbffc159e04e541ba942b2b891dcbc9eb7
+ size 1465
checkpoint-410/trainer_state.json ADDED
@@ -0,0 +1,337 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.0,
+ "eval_steps": 500,
+ "global_step": 410,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.04884004884004884,
+ "grad_norm": 0.6565904021263123,
+ "learning_rate": 0.00018,
+ "loss": 2.306,
+ "step": 10
+ },
+ {
+ "epoch": 0.09768009768009768,
+ "grad_norm": 0.4782625734806061,
+ "learning_rate": 0.00019702479338842976,
+ "loss": 1.8776,
+ "step": 20
+ },
+ {
+ "epoch": 0.14652014652014653,
+ "grad_norm": 0.34842449426651,
+ "learning_rate": 0.0001937190082644628,
+ "loss": 1.986,
+ "step": 30
+ },
+ {
+ "epoch": 0.19536019536019536,
+ "grad_norm": 0.5142027735710144,
+ "learning_rate": 0.0001904132231404959,
+ "loss": 1.953,
+ "step": 40
+ },
+ {
+ "epoch": 0.2442002442002442,
+ "grad_norm": 0.4530898332595825,
+ "learning_rate": 0.00018710743801652891,
+ "loss": 1.8595,
+ "step": 50
+ },
+ {
+ "epoch": 0.29304029304029305,
+ "grad_norm": 0.4601179361343384,
+ "learning_rate": 0.000183801652892562,
+ "loss": 1.9006,
+ "step": 60
+ },
+ {
+ "epoch": 0.3418803418803419,
+ "grad_norm": 0.49045854806900024,
+ "learning_rate": 0.00018049586776859504,
+ "loss": 1.6583,
+ "step": 70
+ },
+ {
+ "epoch": 0.3907203907203907,
+ "grad_norm": 0.4943140745162964,
+ "learning_rate": 0.0001771900826446281,
+ "loss": 1.841,
+ "step": 80
+ },
+ {
+ "epoch": 0.43956043956043955,
+ "grad_norm": 0.43958303332328796,
+ "learning_rate": 0.00017388429752066115,
+ "loss": 1.7377,
+ "step": 90
+ },
+ {
+ "epoch": 0.4884004884004884,
+ "grad_norm": 0.4475325047969818,
+ "learning_rate": 0.00017057851239669423,
+ "loss": 1.9054,
+ "step": 100
+ },
+ {
+ "epoch": 0.5372405372405372,
+ "grad_norm": 0.4703655242919922,
+ "learning_rate": 0.00016727272727272728,
+ "loss": 1.7999,
+ "step": 110
+ },
+ {
+ "epoch": 0.5860805860805861,
+ "grad_norm": 0.47283482551574707,
+ "learning_rate": 0.00016396694214876033,
+ "loss": 1.8296,
+ "step": 120
+ },
+ {
+ "epoch": 0.6349206349206349,
+ "grad_norm": 0.45280855894088745,
+ "learning_rate": 0.00016066115702479338,
+ "loss": 1.6442,
+ "step": 130
+ },
+ {
+ "epoch": 0.6837606837606838,
+ "grad_norm": 0.4415304362773895,
+ "learning_rate": 0.00015735537190082646,
+ "loss": 1.8178,
+ "step": 140
+ },
+ {
+ "epoch": 0.7326007326007326,
+ "grad_norm": 0.6615126729011536,
+ "learning_rate": 0.0001540495867768595,
+ "loss": 1.9372,
+ "step": 150
+ },
+ {
+ "epoch": 0.7814407814407814,
+ "grad_norm": 0.5677676200866699,
+ "learning_rate": 0.00015074380165289256,
+ "loss": 1.6327,
+ "step": 160
+ },
+ {
+ "epoch": 0.8302808302808303,
+ "grad_norm": 0.528029203414917,
+ "learning_rate": 0.00014743801652892564,
+ "loss": 1.7825,
+ "step": 170
+ },
+ {
+ "epoch": 0.8791208791208791,
+ "grad_norm": 0.6582658290863037,
+ "learning_rate": 0.0001441322314049587,
+ "loss": 1.9638,
+ "step": 180
+ },
+ {
+ "epoch": 0.927960927960928,
+ "grad_norm": 0.6919686198234558,
+ "learning_rate": 0.00014082644628099175,
+ "loss": 1.8914,
+ "step": 190
+ },
+ {
+ "epoch": 0.9768009768009768,
+ "grad_norm": 0.3584222197532654,
+ "learning_rate": 0.0001375206611570248,
+ "loss": 1.7909,
+ "step": 200
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.735278844833374,
+ "eval_runtime": 19.9432,
+ "eval_samples_per_second": 9.126,
+ "eval_steps_per_second": 1.153,
+ "step": 205
+ },
+ {
+ "epoch": 1.0244200244200243,
+ "grad_norm": 0.6349706649780273,
+ "learning_rate": 0.00013421487603305788,
+ "loss": 1.8237,
+ "step": 210
+ },
+ {
+ "epoch": 1.0732600732600732,
+ "grad_norm": 0.338216632604599,
+ "learning_rate": 0.00013090909090909093,
+ "loss": 1.7347,
+ "step": 220
+ },
+ {
+ "epoch": 1.122100122100122,
+ "grad_norm": 0.43673205375671387,
+ "learning_rate": 0.00012760330578512398,
+ "loss": 1.6828,
+ "step": 230
+ },
+ {
+ "epoch": 1.170940170940171,
+ "grad_norm": 0.3703135550022125,
+ "learning_rate": 0.00012429752066115703,
+ "loss": 1.6821,
+ "step": 240
+ },
+ {
+ "epoch": 1.2197802197802199,
+ "grad_norm": 0.4435603618621826,
+ "learning_rate": 0.0001209917355371901,
+ "loss": 1.7656,
+ "step": 250
+ },
+ {
+ "epoch": 1.2686202686202686,
+ "grad_norm": 0.5340843200683594,
+ "learning_rate": 0.00011768595041322315,
+ "loss": 1.792,
+ "step": 260
+ },
+ {
+ "epoch": 1.3174603174603174,
+ "grad_norm": 0.37987396121025085,
+ "learning_rate": 0.00011438016528925621,
+ "loss": 1.7761,
+ "step": 270
+ },
+ {
+ "epoch": 1.3663003663003663,
+ "grad_norm": 0.33273303508758545,
+ "learning_rate": 0.00011107438016528926,
+ "loss": 1.6398,
+ "step": 280
+ },
+ {
+ "epoch": 1.4151404151404152,
+ "grad_norm": 0.4231277406215668,
+ "learning_rate": 0.00010776859504132233,
+ "loss": 1.8901,
+ "step": 290
+ },
+ {
+ "epoch": 1.463980463980464,
+ "grad_norm": 0.2134978324174881,
+ "learning_rate": 0.00010446280991735538,
+ "loss": 1.8604,
+ "step": 300
+ },
+ {
+ "epoch": 1.5128205128205128,
+ "grad_norm": 0.43554314970970154,
+ "learning_rate": 0.00010115702479338845,
+ "loss": 1.7911,
+ "step": 310
+ },
+ {
+ "epoch": 1.5616605616605617,
+ "grad_norm": 0.20964309573173523,
+ "learning_rate": 9.785123966942148e-05,
+ "loss": 1.6337,
+ "step": 320
+ },
+ {
+ "epoch": 1.6105006105006106,
+ "grad_norm": 0.5670686364173889,
+ "learning_rate": 9.454545454545455e-05,
+ "loss": 1.7667,
+ "step": 330
+ },
+ {
+ "epoch": 1.6593406593406592,
+ "grad_norm": 0.28007999062538147,
+ "learning_rate": 9.12396694214876e-05,
+ "loss": 1.7941,
+ "step": 340
+ },
+ {
+ "epoch": 1.7081807081807083,
+ "grad_norm": 0.244796484708786,
+ "learning_rate": 8.793388429752067e-05,
+ "loss": 1.6848,
+ "step": 350
+ },
+ {
+ "epoch": 1.757020757020757,
+ "grad_norm": 0.2684899866580963,
+ "learning_rate": 8.462809917355372e-05,
+ "loss": 1.8238,
+ "step": 360
+ },
+ {
+ "epoch": 1.8058608058608059,
+ "grad_norm": 0.2789863049983978,
+ "learning_rate": 8.132231404958678e-05,
+ "loss": 1.7551,
+ "step": 370
+ },
+ {
+ "epoch": 1.8547008547008548,
+ "grad_norm": 0.2315087765455246,
+ "learning_rate": 7.801652892561983e-05,
+ "loss": 1.6725,
+ "step": 380
+ },
+ {
+ "epoch": 1.9035409035409034,
+ "grad_norm": 0.2351648360490799,
+ "learning_rate": 7.47107438016529e-05,
+ "loss": 1.7861,
+ "step": 390
+ },
+ {
+ "epoch": 1.9523809523809523,
+ "grad_norm": 0.15907102823257446,
+ "learning_rate": 7.140495867768595e-05,
+ "loss": 1.6753,
+ "step": 400
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 0.15325959026813507,
+ "learning_rate": 6.8099173553719e-05,
+ "loss": 1.8352,
+ "step": 410
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 1.7097043991088867,
+ "eval_runtime": 19.9498,
+ "eval_samples_per_second": 9.123,
+ "eval_steps_per_second": 1.153,
+ "step": 410
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 615,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.433220480415826e+17,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
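The `log_history` above interleaves per-10-step training entries (keyed by `loss`) with per-epoch evaluation entries (keyed by `eval_loss`). A small helper, illustrative only and run here on a three-entry excerpt of the log, separates the two series and confirms that eval loss was still decreasing at epoch 2 (1.735 → 1.710):

```python
# Split a Trainer log_history into training-loss and eval-loss series.
# The excerpt below copies three entries verbatim from trainer_state.json above.
log_history = [
    {"epoch": 0.04884004884004884, "loss": 2.306, "step": 10},
    {"epoch": 1.0, "eval_loss": 1.735278844833374, "step": 205},
    {"epoch": 2.0, "eval_loss": 1.7097043991088867, "step": 410},
]

train = [(e["step"], e["loss"]) for e in log_history if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]

print(train)  # [(10, 2.306)]
print(evals)  # [(205, 1.735278844833374), (410, 1.7097043991088867)]
assert evals[-1][1] < evals[0][1]  # eval loss improved from epoch 1 to epoch 2
```

The same two comprehensions work on the full `log_history` loaded with `json.load`.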
checkpoint-410/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d91f4cbf3cd0a6d6f917302881f00ad5f5a4b3059dba94b59bdffe68e30afaf4
+ size 5905
checkpoint-615/README.md ADDED
@@ -0,0 +1,207 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Mistral-7B-Instruct-v0.3
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
checkpoint-615/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
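As a sanity check on this adapter config, the number of trainable LoRA parameters it implies can be computed by hand. This is a sketch under assumptions: the layer shapes below are the standard Mistral-7B-Instruct-v0.3 dimensions (32 layers, hidden size 4096, 8 KV heads with head dim 128, so `v_proj` maps 4096 to 1024); they are not read from the checkpoint itself.

```python
# Estimate trainable LoRA parameters implied by adapter_config.json:
# r=16, target_modules=["q_proj", "v_proj"], no bias terms.
# Shapes below are assumed standard Mistral-7B dimensions.
r = 16            # "r" in adapter_config.json
n_layers = 32     # Mistral-7B layer count (assumed)
hidden = 4096     # hidden size (assumed)
kv_dim = 8 * 128  # 8 KV heads x head_dim 128 -> v_proj output dim (assumed)

def lora_params(d_in, d_out, rank):
    # LoRA adds two low-rank factors per target matrix:
    # A with shape (rank, d_in) and B with shape (d_out, rank).
    return rank * d_in + d_out * rank

per_layer = lora_params(hidden, hidden, r)   # q_proj: 4096 -> 4096
per_layer += lora_params(hidden, kv_dim, r)  # v_proj: 4096 -> 1024

total = per_layer * n_layers
print(total)      # 6,815,744 trainable parameters
print(total * 4)  # 27,262,976 bytes in fp32
```

The fp32 byte count (~27.26 MB) is consistent with the 27,280,152-byte `adapter_model.safetensors` above, with the small difference accounted for by safetensors header metadata.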
checkpoint-615/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7310bd5cdac0e96e60625be9080e2baa5ada4d62d6ae6bef9bf8c141c8b4bd9b
+ size 27280152
checkpoint-615/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ff5c5bc4841f55e65637c58c0b084258cf4205c0b89f4cf5971d21e695c4732
+ size 54636235
checkpoint-615/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ac677aedd334f823cefc9c1ef31c987d08d2e1867b02203efc778847d1a2949
+ size 14645
checkpoint-615/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59bf985af78a95922eb89c5db9a7d99258c9c2d800308c27d8ef439640af7446
+ size 1383
checkpoint-615/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f9c8067aa1464f74c5bf63bc7dd6be3931d3294bdb2f3fc41f12c01477dd25b
+ size 1465
checkpoint-615/trainer_state.json ADDED
@@ -0,0 +1,485 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.0,
+ "eval_steps": 500,
+ "global_step": 615,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.04884004884004884,
+ "grad_norm": 0.6565904021263123,
+ "learning_rate": 0.00018,
+ "loss": 2.306,
+ "step": 10
+ },
+ {
+ "epoch": 0.09768009768009768,
+ "grad_norm": 0.4782625734806061,
+ "learning_rate": 0.00019702479338842976,
+ "loss": 1.8776,
+ "step": 20
+ },
+ {
+ "epoch": 0.14652014652014653,
+ "grad_norm": 0.34842449426651,
+ "learning_rate": 0.0001937190082644628,
+ "loss": 1.986,
+ "step": 30
+ },
+ {
+ "epoch": 0.19536019536019536,
+ "grad_norm": 0.5142027735710144,
+ "learning_rate": 0.0001904132231404959,
+ "loss": 1.953,
+ "step": 40
+ },
+ {
+ "epoch": 0.2442002442002442,
+ "grad_norm": 0.4530898332595825,
+ "learning_rate": 0.00018710743801652891,
+ "loss": 1.8595,
+ "step": 50
+ },
+ {
+ "epoch": 0.29304029304029305,
+ "grad_norm": 0.4601179361343384,
+ "learning_rate": 0.000183801652892562,
+ "loss": 1.9006,
+ "step": 60
+ },
+ {
+ "epoch": 0.3418803418803419,
+ "grad_norm": 0.49045854806900024,
+ "learning_rate": 0.00018049586776859504,
+ "loss": 1.6583,
+ "step": 70
+ },
+ {
+ "epoch": 0.3907203907203907,
+ "grad_norm": 0.4943140745162964,
+ "learning_rate": 0.0001771900826446281,
+ "loss": 1.841,
+ "step": 80
+ },
+ {
+ "epoch": 0.43956043956043955,
+ "grad_norm": 0.43958303332328796,
+ "learning_rate": 0.00017388429752066115,
+ "loss": 1.7377,
+ "step": 90
+ },
+ {
+ "epoch": 0.4884004884004884,
+ "grad_norm": 0.4475325047969818,
+ "learning_rate": 0.00017057851239669423,
+ "loss": 1.9054,
+ "step": 100
+ },
+ {
+ "epoch": 0.5372405372405372,
+ "grad_norm": 0.4703655242919922,
+ "learning_rate": 0.00016727272727272728,
+ "loss": 1.7999,
+ "step": 110
+ },
+ {
+ "epoch": 0.5860805860805861,
+ "grad_norm": 0.47283482551574707,
+ "learning_rate": 0.00016396694214876033,
+ "loss": 1.8296,
+ "step": 120
+ },
+ {
+ "epoch": 0.6349206349206349,
+ "grad_norm": 0.45280855894088745,
+ "learning_rate": 0.00016066115702479338,
+ "loss": 1.6442,
+ "step": 130
+ },
+ {
+ "epoch": 0.6837606837606838,
+ "grad_norm": 0.4415304362773895,
+ "learning_rate": 0.00015735537190082646,
+ "loss": 1.8178,
+ "step": 140
+ },
+ {
+ "epoch": 0.7326007326007326,
+ "grad_norm": 0.6615126729011536,
+ "learning_rate": 0.0001540495867768595,
+ "loss": 1.9372,
+ "step": 150
+ },
+ {
+ "epoch": 0.7814407814407814,
+ "grad_norm": 0.5677676200866699,
+ "learning_rate": 0.00015074380165289256,
+ "loss": 1.6327,
+ "step": 160
+ },
+ {
+ "epoch": 0.8302808302808303,
+ "grad_norm": 0.528029203414917,
+ "learning_rate": 0.00014743801652892564,
+ "loss": 1.7825,
+ "step": 170
+ },
+ {
+ "epoch": 0.8791208791208791,
+ "grad_norm": 0.6582658290863037,
+ "learning_rate": 0.0001441322314049587,
+ "loss": 1.9638,
+ "step": 180
+ },
+ {
+ "epoch": 0.927960927960928,
+ "grad_norm": 0.6919686198234558,
+ "learning_rate": 0.00014082644628099175,
+ "loss": 1.8914,
+ "step": 190
+ },
+ {
+ "epoch": 0.9768009768009768,
+ "grad_norm": 0.3584222197532654,
+ "learning_rate": 0.0001375206611570248,
+ "loss": 1.7909,
+ "step": 200
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.735278844833374,
+ "eval_runtime": 19.9432,
+ "eval_samples_per_second": 9.126,
+ "eval_steps_per_second": 1.153,
+ "step": 205
+ },
+ {
+ "epoch": 1.0244200244200243,
+ "grad_norm": 0.6349706649780273,
+ "learning_rate": 0.00013421487603305788,
+ "loss": 1.8237,
+ "step": 210
+ },
+ {
+ "epoch": 1.0732600732600732,
+ "grad_norm": 0.338216632604599,
+ "learning_rate": 0.00013090909090909093,
+ "loss": 1.7347,
+ "step": 220
+ },
+ {
+ "epoch": 1.122100122100122,
+ "grad_norm": 0.43673205375671387,
+ "learning_rate": 0.00012760330578512398,
+ "loss": 1.6828,
+ "step": 230
+ },
+ {
+ "epoch": 1.170940170940171,
+ "grad_norm": 0.3703135550022125,
+ "learning_rate": 0.00012429752066115703,
+ "loss": 1.6821,
+ "step": 240
+ },
+ {
+ "epoch": 1.2197802197802199,
+ "grad_norm": 0.4435603618621826,
+ "learning_rate": 0.0001209917355371901,
+ "loss": 1.7656,
+ "step": 250
+ },
+ {
+ "epoch": 1.2686202686202686,
+ "grad_norm": 0.5340843200683594,
+ "learning_rate": 0.00011768595041322315,
+ "loss": 1.792,
+ "step": 260
+ },
+ {
+ "epoch": 1.3174603174603174,
+ "grad_norm": 0.37987396121025085,
+ "learning_rate": 0.00011438016528925621,
+ "loss": 1.7761,
+ "step": 270
+ },
+ {
+ "epoch": 1.3663003663003663,
+ "grad_norm": 0.33273303508758545,
+ "learning_rate": 0.00011107438016528926,
+ "loss": 1.6398,
+ "step": 280
+ },
+ {
+ "epoch": 1.4151404151404152,
+ "grad_norm": 0.4231277406215668,
+ "learning_rate": 0.00010776859504132233,
+ "loss": 1.8901,
+ "step": 290
+ },
+ {
+ "epoch": 1.463980463980464,
+ "grad_norm": 0.2134978324174881,
+ "learning_rate": 0.00010446280991735538,
+ "loss": 1.8604,
+ "step": 300
+ },
+ {
+ "epoch": 1.5128205128205128,
+ "grad_norm": 0.43554314970970154,
+ "learning_rate": 0.00010115702479338845,
+ "loss": 1.7911,
+ "step": 310
+ },
+ {
+ "epoch": 1.5616605616605617,
+ "grad_norm": 0.20964309573173523,
+ "learning_rate": 9.785123966942148e-05,
+ "loss": 1.6337,
+ "step": 320
+ },
+ {
+ "epoch": 1.6105006105006106,
+ "grad_norm": 0.5670686364173889,
+ "learning_rate": 9.454545454545455e-05,
+ "loss": 1.7667,
+ "step": 330
+ },
+ {
+ "epoch": 1.6593406593406592,
+ "grad_norm": 0.28007999062538147,
+ "learning_rate": 9.12396694214876e-05,
+ "loss": 1.7941,
+ "step": 340
+ },
+ {
+ "epoch": 1.7081807081807083,
+ "grad_norm": 0.244796484708786,
+ "learning_rate": 8.793388429752067e-05,
+ "loss": 1.6848,
+ "step": 350
+ },
+ {
+ "epoch": 1.757020757020757,
+ "grad_norm": 0.2684899866580963,
+ "learning_rate": 8.462809917355372e-05,
+ "loss": 1.8238,
+ "step": 360
+ },
+ {
+ "epoch": 1.8058608058608059,
+ "grad_norm": 0.2789863049983978,
+ "learning_rate": 8.132231404958678e-05,
+ "loss": 1.7551,
+ "step": 370
+ },
+ {
+ "epoch": 1.8547008547008548,
+ "grad_norm": 0.2315087765455246,
+ "learning_rate": 7.801652892561983e-05,
+ "loss": 1.6725,
+ "step": 380
+ },
+ {
+ "epoch": 1.9035409035409034,
+ "grad_norm": 0.2351648360490799,
+ "learning_rate": 7.47107438016529e-05,
+ "loss": 1.7861,
+ "step": 390
+ },
+ {
+ "epoch": 1.9523809523809523,
+ "grad_norm": 0.15907102823257446,
+ "learning_rate": 7.140495867768595e-05,
+ "loss": 1.6753,
+ "step": 400
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 0.15325959026813507,
+ "learning_rate": 6.8099173553719e-05,
+ "loss": 1.8352,
+ "step": 410
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 1.7097043991088867,
+ "eval_runtime": 19.9498,
+ "eval_samples_per_second": 9.123,
+ "eval_steps_per_second": 1.153,
+ "step": 410
+ },
+ {
+ "epoch": 2.0488400488400487,
+ "grad_norm": 0.10827562212944031,
+ "learning_rate": 6.479338842975207e-05,
+ "loss": 1.7363,
+ "step": 420
+ },
+ {
+ "epoch": 2.0976800976800978,
+ "grad_norm": 0.1293765902519226,
+ "learning_rate": 6.148760330578512e-05,
+ "loss": 1.726,
+ "step": 430
+ },
+ {
+ "epoch": 2.1465201465201464,
+ "grad_norm": 0.1775052547454834,
+ "learning_rate": 5.818181818181818e-05,
+ "loss": 1.6641,
+ "step": 440
+ },
+ {
+ "epoch": 2.1953601953601956,
+ "grad_norm": 0.10628961026668549,
+ "learning_rate": 5.487603305785124e-05,
+ "loss": 1.8193,
+ "step": 450
+ },
+ {
+ "epoch": 2.244200244200244,
+ "grad_norm": 0.11912493407726288,
+ "learning_rate": 5.1570247933884295e-05,
+ "loss": 1.6497,
+ "step": 460
+ },
+ {
+ "epoch": 2.293040293040293,
+ "grad_norm": 0.17353476583957672,
+ "learning_rate": 4.826446280991736e-05,
+ "loss": 1.7043,
+ "step": 470
+ },
+ {
+ "epoch": 2.341880341880342,
+ "grad_norm": 0.11081728339195251,
+ "learning_rate": 4.495867768595042e-05,
+ "loss": 1.7964,
+ "step": 480
+ },
+ {
+ "epoch": 2.3907203907203907,
+ "grad_norm": 0.09711920469999313,
+ "learning_rate": 4.165289256198348e-05,
+ "loss": 1.8315,
+ "step": 490
+ },
+ {
+ "epoch": 2.4395604395604398,
+ "grad_norm": 0.1099553108215332,
+ "learning_rate": 3.8347107438016536e-05,
+ "loss": 1.7696,
+ "step": 500
+ },
+ {
+ "epoch": 2.4884004884004884,
+ "grad_norm": 0.12278404831886292,
+ "learning_rate": 3.504132231404959e-05,
+ "loss": 1.7851,
+ "step": 510
+ },
+ {
+ "epoch": 2.537240537240537,
+ "grad_norm": 0.08975676447153091,
+ "learning_rate": 3.1735537190082646e-05,
+ "loss": 1.9197,
+ "step": 520
+ },
+ {
+ "epoch": 2.586080586080586,
+ "grad_norm": 0.08411797136068344,
+ "learning_rate": 2.8429752066115704e-05,
+ "loss": 1.6497,
+ "step": 530
+ },
+ {
+ "epoch": 2.634920634920635,
+ "grad_norm": 0.10348548740148544,
+ "learning_rate": 2.5123966942148763e-05,
+ "loss": 1.5763,
+ "step": 540
+ },
+ {
+ "epoch": 2.683760683760684,
+ "grad_norm": 0.10242890566587448,
+ "learning_rate": 2.1818181818181818e-05,
+ "loss": 1.5551,
+ "step": 550
+ },
+ {
+ "epoch": 2.7326007326007327,
+ "grad_norm": 0.09305644780397415,
+ "learning_rate": 1.8512396694214876e-05,
+ "loss": 1.8751,
+ "step": 560
+ },
+ {
+ "epoch": 2.7814407814407813,
+ "grad_norm": 0.10115351527929306,
+ "learning_rate": 1.5206611570247933e-05,
+ "loss": 1.6887,
+ "step": 570
+ },
+ {
+ "epoch": 2.8302808302808304,
+ "grad_norm": 0.0991702601313591,
+ "learning_rate": 1.1900826446280993e-05,
+ "loss": 1.6846,
+ "step": 580
+ },
+ {
+ "epoch": 2.879120879120879,
+ "grad_norm": 0.11022075265645981,
+ "learning_rate": 8.59504132231405e-06,
+ "loss": 1.88,
+ "step": 590
+ },
+ {
+ "epoch": 2.927960927960928,
+ "grad_norm": 0.12525387108325958,
+ "learning_rate": 5.289256198347107e-06,
+ "loss": 1.7041,
+ "step": 600
+ },
+ {
+ "epoch": 2.976800976800977,
+ "grad_norm": 0.1043957769870758,
+ "learning_rate": 1.9834710743801654e-06,
+ "loss": 1.8158,
+ "step": 610
+ },
+ {
+ "epoch": 3.0,
+ "eval_loss": 1.7082070112228394,
+ "eval_runtime": 19.9486,
+ "eval_samples_per_second": 9.123,
+ "eval_steps_per_second": 1.153,
+ "step": 615
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 615,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.149830720623739e+17,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
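The learning rates logged above are consistent with a standard linear schedule with warmup. This is a reconstruction, not something stated in the checkpoint: the sketch below assumes a peak LR of 2e-4, 10 warmup steps, 615 total steps, and that the Trainer records the LR of the previous optimizer step at each logging point.

```python
def lr_at(step, peak=2e-4, warmup=10, total=615):
    # Linear warmup from 0 to `peak` over `warmup` steps,
    # then linear decay from `peak` to 0 over the remaining steps
    # (the shape produced by get_linear_schedule_with_warmup).
    if step < warmup:
        return peak * step / warmup
    return peak * (total - step) / (total - warmup)

# The value logged at global_step 20 matches scheduler step 19
# (the LR applied on the previous optimizer step):
print(lr_at(19))   # ~0.00019702479338842976, as in log_history
print(lr_at(9))    # 0.00018, as logged at step 10
print(lr_at(609))  # ~1.9834710743801654e-06, as logged at step 610
```

Under these assumptions every logged `learning_rate` entry in `log_history` is reproduced to floating-point precision.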
checkpoint-615/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d91f4cbf3cd0a6d6f917302881f00ad5f5a4b3059dba94b59bdffe68e30afaf4
+ size 5905
runs/Oct29_18-28-07_299803b437f5/events.out.tfevents.1761762498.299803b437f5.1532.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:307acfa7fcbdcb72d62fe162d74dc71a565d2d3f17b839d808e6cf1b5382668a
+ size 19545
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff