JLake310 committed on
Commit dccb3f8 · verified · 1 Parent(s): 54db375

add models

merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,84 @@
+ {
+   "model_type": "roberta",
+   "quantization": "q0f32",
+   "model_config": {
+     "attention_probs_dropout_prob": 0.1,
+     "bos_token_id": 0,
+     "eos_token_id": 2,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 768,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "layer_norm_eps": 1e-05,
+     "max_position_embeddings": 514,
+     "model_type": "roberta",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12,
+     "pad_token_id": 1,
+     "type_vocab_size": 1,
+     "vocab_size": 50265,
+     "num_labels": 2,
+     "classifier_dropout": null,
+     "chunk_size_feed_forward": 0,
+     "is_decoder": false,
+     "add_cross_attention": false,
+     "use_return_dict": false,
+     "context_window_size": 768,
+     "prefill_chunk_size": 0,
+     "max_batch_size": 80,
+     "tensor_parallel_shards": 1
+   },
+   "vocab_size": 50265,
+   "context_window_size": 768,
+   "sliding_window_size": -1,
+   "prefill_chunk_size": 0,
+   "attention_sink_size": -1,
+   "tensor_parallel_shards": 1,
+   "mean_gen_len": 128,
+   "max_gen_len": 512,
+   "shift_fill_factor": 0.3,
+   "temperature": 0,
+   "presence_penalty": 0.0,
+   "frequency_penalty": 0.0,
+   "repetition_penalty": 1.0,
+   "top_p": 0.95,
+   "conv_template": {
+     "name": "roberta",
+     "system_template": "{system_message}",
+     "system_message": "",
+     "add_role_after_system_message": true,
+     "roles": {
+       "user": "",
+       "assistant": ""
+     },
+     "role_templates": {
+       "user": "{user_message}",
+       "assistant": "{assistant_message}",
+       "tool": "{tool_message}"
+     },
+     "messages": [],
+     "seps": [
+       "</s>"
+     ],
+     "role_content_sep": "",
+     "role_empty_sep": "",
+     "stop_str": [],
+     "stop_token_ids": [
+       2
+     ],
+     "function_string": "",
+     "use_function_calling": false,
+     "image_token_index": -1
+   },
+   "pad_token_id": 1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "tokenizer_files": [
+     "tokenizer.json",
+     "vocab.json",
+     "merges.txt",
+     "tokenizer_config.json"
+   ],
+   "version": "0.1.0"
+ }
ndarray-cache-b16.json ADDED
@@ -0,0 +1,2152 @@
+ {
+   "metadata": {
+     "ParamSize": 201,
+     "ParamBytes": 498588680.0,
+     "BitsPerParam": 32.0
+   },
+   "records": [
+     {
+       "dataPath": "params_shard_0.bin",
+       "format": "raw-shard",
+       "nbytes": 77207040,
+       "records": [
+         {"name": "roberta.embeddings.word_embeddings.weight", "shape": [50265, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 77207040, "byteOffset": 0}
+       ],
+       "md5sum": "4a61cba31613349f9ffc22e1606a39c1"
+     },
+     {
+       "dataPath": "params_shard_1.bin",
+       "format": "raw-shard",
+       "nbytes": 32696836,
+       "records": [
+         {"name": "classifier.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 0},
+         {"name": "classifier.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 1536},
+         {"name": "classifier.out_proj.bias", "shape": [2], "dtype": "bfloat16", "format": "raw", "nbytes": 4, "byteOffset": 1181184},
+         {"name": "classifier.out_proj.weight", "shape": [2, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 3072, "byteOffset": 1181188},
+         {"name": "roberta.embeddings.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1184260},
+         {"name": "roberta.embeddings.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1185796},
+         {"name": "roberta.embeddings.position_embeddings.weight", "shape": [514, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 789504, "byteOffset": 1187332},
+         {"name": "roberta.embeddings.token_type_embeddings.weight", "shape": [1, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1976836},
+         {"name": "roberta.encoder.layer.0.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1978372},
+         {"name": "roberta.encoder.layer.0.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1979908},
+         {"name": "roberta.encoder.layer.0.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1981444},
+         {"name": "roberta.encoder.layer.0.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 1982980},
+         {"name": "roberta.encoder.layer.0.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 3162628},
+         {"name": "roberta.encoder.layer.0.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 3164164},
+         {"name": "roberta.encoder.layer.0.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 4343812},
+         {"name": "roberta.encoder.layer.0.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 4345348},
+         {"name": "roberta.encoder.layer.0.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 5524996},
+         {"name": "roberta.encoder.layer.0.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 5526532},
+         {"name": "roberta.encoder.layer.0.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 6706180},
+         {"name": "roberta.encoder.layer.0.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 6712324},
+         {"name": "roberta.encoder.layer.0.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11430916},
+         {"name": "roberta.encoder.layer.0.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11432452},
+         {"name": "roberta.encoder.layer.0.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11433988},
+         {"name": "roberta.encoder.layer.0.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 11435524},
+         {"name": "roberta.encoder.layer.1.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 16154116},
+         {"name": "roberta.encoder.layer.1.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 16155652},
+         {"name": "roberta.encoder.layer.1.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 16157188},
+         {"name": "roberta.encoder.layer.1.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 16158724},
+         {"name": "roberta.encoder.layer.1.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 17338372},
+         {"name": "roberta.encoder.layer.1.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 17339908},
+         {"name": "roberta.encoder.layer.1.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 18519556},
+         {"name": "roberta.encoder.layer.1.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 18521092},
+         {"name": "roberta.encoder.layer.1.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 19700740},
+         {"name": "roberta.encoder.layer.1.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 19702276},
+         {"name": "roberta.encoder.layer.1.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 20881924},
+         {"name": "roberta.encoder.layer.1.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 20888068},
+         {"name": "roberta.encoder.layer.1.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25606660},
+         {"name": "roberta.encoder.layer.1.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25608196},
+         {"name": "roberta.encoder.layer.1.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25609732},
+         {"name": "roberta.encoder.layer.1.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 25611268},
+         {"name": "roberta.encoder.layer.10.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 30329860},
+         {"name": "roberta.encoder.layer.10.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 30331396},
+         {"name": "roberta.encoder.layer.10.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 30332932},
+         {"name": "roberta.encoder.layer.10.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 30334468},
+         {"name": "roberta.encoder.layer.10.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 31514116},
+         {"name": "roberta.encoder.layer.10.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 31515652},
+         {"name": "roberta.encoder.layer.10.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 32695300}
+       ],
+       "md5sum": "bb1b378319baec7f1ca617bf8f86ff32"
+     },
+     {
+       "dataPath": "params_shard_2.bin",
+       "format": "raw-shard",
+       "nbytes": 30718464,
+       "records": [
+         {"name": "roberta.encoder.layer.10.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 0},
+         {"name": "roberta.encoder.layer.10.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 1179648},
+         {"name": "roberta.encoder.layer.10.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 1181184},
+         {"name": "roberta.encoder.layer.10.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 2360832},
+         {"name": "roberta.encoder.layer.10.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 2366976},
+         {"name": "roberta.encoder.layer.10.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 7085568},
+         {"name": "roberta.encoder.layer.10.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 7087104},
+         {"name": "roberta.encoder.layer.10.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 7088640},
+         {"name": "roberta.encoder.layer.10.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 7090176},
+         {"name": "roberta.encoder.layer.11.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11808768},
+         {"name": "roberta.encoder.layer.11.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11810304},
+         {"name": "roberta.encoder.layer.11.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11811840},
+         {"name": "roberta.encoder.layer.11.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 11813376},
+         {"name": "roberta.encoder.layer.11.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 12993024},
+         {"name": "roberta.encoder.layer.11.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 12994560},
+         {"name": "roberta.encoder.layer.11.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 14174208},
+         {"name": "roberta.encoder.layer.11.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 14175744},
+         {"name": "roberta.encoder.layer.11.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 15355392},
+         {"name": "roberta.encoder.layer.11.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 15356928},
+         {"name": "roberta.encoder.layer.11.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 16536576},
+         {"name": "roberta.encoder.layer.11.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 16542720},
+         {"name": "roberta.encoder.layer.11.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 21261312},
+         {"name": "roberta.encoder.layer.11.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 21262848},
+         {"name": "roberta.encoder.layer.11.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 21264384},
+         {"name": "roberta.encoder.layer.11.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 21265920},
+         {"name": "roberta.encoder.layer.2.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25984512},
+         {"name": "roberta.encoder.layer.2.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25986048},
+         {"name": "roberta.encoder.layer.2.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25987584},
+         {"name": "roberta.encoder.layer.2.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 25989120},
+         {"name": "roberta.encoder.layer.2.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 27168768},
+         {"name": "roberta.encoder.layer.2.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 27170304},
+         {"name": "roberta.encoder.layer.2.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 28349952},
+         {"name": "roberta.encoder.layer.2.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 28351488},
+         {"name": "roberta.encoder.layer.2.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 29531136},
+         {"name": "roberta.encoder.layer.2.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 29532672},
+         {"name": "roberta.encoder.layer.2.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 30712320}
+       ],
+       "md5sum": "894bfb02ebd46757ea2fa28fdc8f9079"
+     },
+     {
+       "dataPath": "params_shard_3.bin",
+       "format": "raw-shard",
+       "nbytes": 33074688,
+       "records": [
+         {"name": "roberta.encoder.layer.2.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 0},
+         {"name": "roberta.encoder.layer.2.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 4718592},
+         {"name": "roberta.encoder.layer.2.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 4720128},
+         {"name": "roberta.encoder.layer.2.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 4721664},
+         {"name": "roberta.encoder.layer.2.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 4723200},
+         {"name": "roberta.encoder.layer.3.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 9441792},
+         {"name": "roberta.encoder.layer.3.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 9443328},
+         {"name": "roberta.encoder.layer.3.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 9444864},
+         {"name": "roberta.encoder.layer.3.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 9446400},
+         {"name": "roberta.encoder.layer.3.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 10626048},
+         {"name": "roberta.encoder.layer.3.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 10627584},
+         {"name": "roberta.encoder.layer.3.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 11807232},
+         {"name": "roberta.encoder.layer.3.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 11808768},
+         {"name": "roberta.encoder.layer.3.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 12988416},
+         {"name": "roberta.encoder.layer.3.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 12989952},
+         {"name": "roberta.encoder.layer.3.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 14169600},
+         {"name": "roberta.encoder.layer.3.intermediate.dense.weight", "shape": [3072, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 14175744},
+         {"name": "roberta.encoder.layer.3.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 18894336},
+         {"name": "roberta.encoder.layer.3.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 18895872},
+         {"name": "roberta.encoder.layer.3.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 18897408},
+         {"name": "roberta.encoder.layer.3.output.dense.weight", "shape": [768, 3072], "dtype": "bfloat16", "format": "raw", "nbytes": 4718592, "byteOffset": 18898944},
+         {"name": "roberta.encoder.layer.4.attention.output.LayerNorm.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 23617536},
+         {"name": "roberta.encoder.layer.4.attention.output.LayerNorm.weight", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 23619072},
+         {"name": "roberta.encoder.layer.4.attention.output.dense.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 23620608},
+         {"name": "roberta.encoder.layer.4.attention.output.dense.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 23622144},
+         {"name": "roberta.encoder.layer.4.attention.self.key.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 24801792},
+         {"name": "roberta.encoder.layer.4.attention.self.key.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 24803328},
+         {"name": "roberta.encoder.layer.4.attention.self.query.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 25982976},
+         {"name": "roberta.encoder.layer.4.attention.self.query.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 25984512},
+         {"name": "roberta.encoder.layer.4.attention.self.value.bias", "shape": [768], "dtype": "bfloat16", "format": "raw", "nbytes": 1536, "byteOffset": 27164160},
+         {"name": "roberta.encoder.layer.4.attention.self.value.weight", "shape": [768, 768], "dtype": "bfloat16", "format": "raw", "nbytes": 1179648, "byteOffset": 27165696},
+         {"name": "roberta.encoder.layer.4.intermediate.dense.bias", "shape": [3072], "dtype": "bfloat16", "format": "raw", "nbytes": 6144, "byteOffset": 28345344},
+         {"name": "roberta.encoder.layer.4.intermediate.dense.weight",
1244
+ "shape": [
1245
+ 3072,
1246
+ 768
1247
+ ],
1248
+ "dtype": "bfloat16",
1249
+ "format": "raw",
1250
+ "nbytes": 4718592,
1251
+ "byteOffset": 28351488
1252
+ },
1253
+ {
1254
+ "name": "roberta.encoder.layer.4.output.LayerNorm.bias",
1255
+ "shape": [
1256
+ 768
1257
+ ],
1258
+ "dtype": "bfloat16",
1259
+ "format": "raw",
1260
+ "nbytes": 1536,
1261
+ "byteOffset": 33070080
1262
+ },
1263
+ {
1264
+ "name": "roberta.encoder.layer.4.output.LayerNorm.weight",
1265
+ "shape": [
1266
+ 768
1267
+ ],
1268
+ "dtype": "bfloat16",
1269
+ "format": "raw",
1270
+ "nbytes": 1536,
1271
+ "byteOffset": 33071616
1272
+ },
1273
+ {
1274
+ "name": "roberta.encoder.layer.4.output.dense.bias",
1275
+ "shape": [
1276
+ 768
1277
+ ],
1278
+ "dtype": "bfloat16",
1279
+ "format": "raw",
1280
+ "nbytes": 1536,
1281
+ "byteOffset": 33073152
1282
+ }
1283
+ ],
1284
+ "md5sum": "0204eb9ffd7d0691cdba514cfc076112"
1285
+ },
1286
+ {
1287
+ "dataPath": "params_shard_4.bin",
1288
+ "format": "raw-shard",
1289
+ "nbytes": 33074688,
1290
+ "records": [
1291
+ {
1292
+ "name": "roberta.encoder.layer.4.output.dense.weight",
1293
+ "shape": [
1294
+ 768,
1295
+ 3072
1296
+ ],
1297
+ "dtype": "bfloat16",
1298
+ "format": "raw",
1299
+ "nbytes": 4718592,
1300
+ "byteOffset": 0
1301
+ },
1302
+ {
1303
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.bias",
1304
+ "shape": [
1305
+ 768
1306
+ ],
1307
+ "dtype": "bfloat16",
1308
+ "format": "raw",
1309
+ "nbytes": 1536,
1310
+ "byteOffset": 4718592
1311
+ },
1312
+ {
1313
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.weight",
1314
+ "shape": [
1315
+ 768
1316
+ ],
1317
+ "dtype": "bfloat16",
1318
+ "format": "raw",
1319
+ "nbytes": 1536,
1320
+ "byteOffset": 4720128
1321
+ },
1322
+ {
1323
+ "name": "roberta.encoder.layer.5.attention.output.dense.bias",
1324
+ "shape": [
1325
+ 768
1326
+ ],
1327
+ "dtype": "bfloat16",
1328
+ "format": "raw",
1329
+ "nbytes": 1536,
1330
+ "byteOffset": 4721664
1331
+ },
1332
+ {
1333
+ "name": "roberta.encoder.layer.5.attention.output.dense.weight",
1334
+ "shape": [
1335
+ 768,
1336
+ 768
1337
+ ],
1338
+ "dtype": "bfloat16",
1339
+ "format": "raw",
1340
+ "nbytes": 1179648,
1341
+ "byteOffset": 4723200
1342
+ },
1343
+ {
1344
+ "name": "roberta.encoder.layer.5.attention.self.key.bias",
1345
+ "shape": [
1346
+ 768
1347
+ ],
1348
+ "dtype": "bfloat16",
1349
+ "format": "raw",
1350
+ "nbytes": 1536,
1351
+ "byteOffset": 5902848
1352
+ },
1353
+ {
1354
+ "name": "roberta.encoder.layer.5.attention.self.key.weight",
1355
+ "shape": [
1356
+ 768,
1357
+ 768
1358
+ ],
1359
+ "dtype": "bfloat16",
1360
+ "format": "raw",
1361
+ "nbytes": 1179648,
1362
+ "byteOffset": 5904384
1363
+ },
1364
+ {
1365
+ "name": "roberta.encoder.layer.5.attention.self.query.bias",
1366
+ "shape": [
1367
+ 768
1368
+ ],
1369
+ "dtype": "bfloat16",
1370
+ "format": "raw",
1371
+ "nbytes": 1536,
1372
+ "byteOffset": 7084032
1373
+ },
1374
+ {
1375
+ "name": "roberta.encoder.layer.5.attention.self.query.weight",
1376
+ "shape": [
1377
+ 768,
1378
+ 768
1379
+ ],
1380
+ "dtype": "bfloat16",
1381
+ "format": "raw",
1382
+ "nbytes": 1179648,
1383
+ "byteOffset": 7085568
1384
+ },
1385
+ {
1386
+ "name": "roberta.encoder.layer.5.attention.self.value.bias",
1387
+ "shape": [
1388
+ 768
1389
+ ],
1390
+ "dtype": "bfloat16",
1391
+ "format": "raw",
1392
+ "nbytes": 1536,
1393
+ "byteOffset": 8265216
1394
+ },
1395
+ {
1396
+ "name": "roberta.encoder.layer.5.attention.self.value.weight",
1397
+ "shape": [
1398
+ 768,
1399
+ 768
1400
+ ],
1401
+ "dtype": "bfloat16",
1402
+ "format": "raw",
1403
+ "nbytes": 1179648,
1404
+ "byteOffset": 8266752
1405
+ },
1406
+ {
1407
+ "name": "roberta.encoder.layer.5.intermediate.dense.bias",
1408
+ "shape": [
1409
+ 3072
1410
+ ],
1411
+ "dtype": "bfloat16",
1412
+ "format": "raw",
1413
+ "nbytes": 6144,
1414
+ "byteOffset": 9446400
1415
+ },
1416
+ {
1417
+ "name": "roberta.encoder.layer.5.intermediate.dense.weight",
1418
+ "shape": [
1419
+ 3072,
1420
+ 768
1421
+ ],
1422
+ "dtype": "bfloat16",
1423
+ "format": "raw",
1424
+ "nbytes": 4718592,
1425
+ "byteOffset": 9452544
1426
+ },
1427
+ {
1428
+ "name": "roberta.encoder.layer.5.output.LayerNorm.bias",
1429
+ "shape": [
1430
+ 768
1431
+ ],
1432
+ "dtype": "bfloat16",
1433
+ "format": "raw",
1434
+ "nbytes": 1536,
1435
+ "byteOffset": 14171136
1436
+ },
1437
+ {
1438
+ "name": "roberta.encoder.layer.5.output.LayerNorm.weight",
1439
+ "shape": [
1440
+ 768
1441
+ ],
1442
+ "dtype": "bfloat16",
1443
+ "format": "raw",
1444
+ "nbytes": 1536,
1445
+ "byteOffset": 14172672
1446
+ },
1447
+ {
1448
+ "name": "roberta.encoder.layer.5.output.dense.bias",
1449
+ "shape": [
1450
+ 768
1451
+ ],
1452
+ "dtype": "bfloat16",
1453
+ "format": "raw",
1454
+ "nbytes": 1536,
1455
+ "byteOffset": 14174208
1456
+ },
1457
+ {
1458
+ "name": "roberta.encoder.layer.5.output.dense.weight",
1459
+ "shape": [
1460
+ 768,
1461
+ 3072
1462
+ ],
1463
+ "dtype": "bfloat16",
1464
+ "format": "raw",
1465
+ "nbytes": 4718592,
1466
+ "byteOffset": 14175744
1467
+ },
1468
+ {
1469
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.bias",
1470
+ "shape": [
1471
+ 768
1472
+ ],
1473
+ "dtype": "bfloat16",
1474
+ "format": "raw",
1475
+ "nbytes": 1536,
1476
+ "byteOffset": 18894336
1477
+ },
1478
+ {
1479
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.weight",
1480
+ "shape": [
1481
+ 768
1482
+ ],
1483
+ "dtype": "bfloat16",
1484
+ "format": "raw",
1485
+ "nbytes": 1536,
1486
+ "byteOffset": 18895872
1487
+ },
1488
+ {
1489
+ "name": "roberta.encoder.layer.6.attention.output.dense.bias",
1490
+ "shape": [
1491
+ 768
1492
+ ],
1493
+ "dtype": "bfloat16",
1494
+ "format": "raw",
1495
+ "nbytes": 1536,
1496
+ "byteOffset": 18897408
1497
+ },
1498
+ {
1499
+ "name": "roberta.encoder.layer.6.attention.output.dense.weight",
1500
+ "shape": [
1501
+ 768,
1502
+ 768
1503
+ ],
1504
+ "dtype": "bfloat16",
1505
+ "format": "raw",
1506
+ "nbytes": 1179648,
1507
+ "byteOffset": 18898944
1508
+ },
1509
+ {
1510
+ "name": "roberta.encoder.layer.6.attention.self.key.bias",
1511
+ "shape": [
1512
+ 768
1513
+ ],
1514
+ "dtype": "bfloat16",
1515
+ "format": "raw",
1516
+ "nbytes": 1536,
1517
+ "byteOffset": 20078592
1518
+ },
1519
+ {
1520
+ "name": "roberta.encoder.layer.6.attention.self.key.weight",
1521
+ "shape": [
1522
+ 768,
1523
+ 768
1524
+ ],
1525
+ "dtype": "bfloat16",
1526
+ "format": "raw",
1527
+ "nbytes": 1179648,
1528
+ "byteOffset": 20080128
1529
+ },
1530
+ {
1531
+ "name": "roberta.encoder.layer.6.attention.self.query.bias",
1532
+ "shape": [
1533
+ 768
1534
+ ],
1535
+ "dtype": "bfloat16",
1536
+ "format": "raw",
1537
+ "nbytes": 1536,
1538
+ "byteOffset": 21259776
1539
+ },
1540
+ {
1541
+ "name": "roberta.encoder.layer.6.attention.self.query.weight",
1542
+ "shape": [
1543
+ 768,
1544
+ 768
1545
+ ],
1546
+ "dtype": "bfloat16",
1547
+ "format": "raw",
1548
+ "nbytes": 1179648,
1549
+ "byteOffset": 21261312
1550
+ },
1551
+ {
1552
+ "name": "roberta.encoder.layer.6.attention.self.value.bias",
1553
+ "shape": [
1554
+ 768
1555
+ ],
1556
+ "dtype": "bfloat16",
1557
+ "format": "raw",
1558
+ "nbytes": 1536,
1559
+ "byteOffset": 22440960
1560
+ },
1561
+ {
1562
+ "name": "roberta.encoder.layer.6.attention.self.value.weight",
1563
+ "shape": [
1564
+ 768,
1565
+ 768
1566
+ ],
1567
+ "dtype": "bfloat16",
1568
+ "format": "raw",
1569
+ "nbytes": 1179648,
1570
+ "byteOffset": 22442496
1571
+ },
1572
+ {
1573
+ "name": "roberta.encoder.layer.6.intermediate.dense.bias",
1574
+ "shape": [
1575
+ 3072
1576
+ ],
1577
+ "dtype": "bfloat16",
1578
+ "format": "raw",
1579
+ "nbytes": 6144,
1580
+ "byteOffset": 23622144
1581
+ },
1582
+ {
1583
+ "name": "roberta.encoder.layer.6.intermediate.dense.weight",
1584
+ "shape": [
1585
+ 3072,
1586
+ 768
1587
+ ],
1588
+ "dtype": "bfloat16",
1589
+ "format": "raw",
1590
+ "nbytes": 4718592,
1591
+ "byteOffset": 23628288
1592
+ },
1593
+ {
1594
+ "name": "roberta.encoder.layer.6.output.LayerNorm.bias",
1595
+ "shape": [
1596
+ 768
1597
+ ],
1598
+ "dtype": "bfloat16",
1599
+ "format": "raw",
1600
+ "nbytes": 1536,
1601
+ "byteOffset": 28346880
1602
+ },
1603
+ {
1604
+ "name": "roberta.encoder.layer.6.output.LayerNorm.weight",
1605
+ "shape": [
1606
+ 768
1607
+ ],
1608
+ "dtype": "bfloat16",
1609
+ "format": "raw",
1610
+ "nbytes": 1536,
1611
+ "byteOffset": 28348416
1612
+ },
1613
+ {
1614
+ "name": "roberta.encoder.layer.6.output.dense.bias",
1615
+ "shape": [
1616
+ 768
1617
+ ],
1618
+ "dtype": "bfloat16",
1619
+ "format": "raw",
1620
+ "nbytes": 1536,
1621
+ "byteOffset": 28349952
1622
+ },
1623
+ {
1624
+ "name": "roberta.encoder.layer.6.output.dense.weight",
1625
+ "shape": [
1626
+ 768,
1627
+ 3072
1628
+ ],
1629
+ "dtype": "bfloat16",
1630
+ "format": "raw",
1631
+ "nbytes": 4718592,
1632
+ "byteOffset": 28351488
1633
+ },
1634
+ {
1635
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.bias",
1636
+ "shape": [
1637
+ 768
1638
+ ],
1639
+ "dtype": "bfloat16",
1640
+ "format": "raw",
1641
+ "nbytes": 1536,
1642
+ "byteOffset": 33070080
1643
+ },
1644
+ {
1645
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.weight",
1646
+ "shape": [
1647
+ 768
1648
+ ],
1649
+ "dtype": "bfloat16",
1650
+ "format": "raw",
1651
+ "nbytes": 1536,
1652
+ "byteOffset": 33071616
1653
+ },
1654
+ {
1655
+ "name": "roberta.encoder.layer.7.attention.output.dense.bias",
1656
+ "shape": [
1657
+ 768
1658
+ ],
1659
+ "dtype": "bfloat16",
1660
+ "format": "raw",
1661
+ "nbytes": 1536,
1662
+ "byteOffset": 33073152
1663
+ }
1664
+ ],
1665
+ "md5sum": "f2720ca40469da131d390ea8694443fd"
1666
+ },
1667
+ {
1668
+ "dataPath": "params_shard_5.bin",
1669
+ "format": "raw-shard",
1670
+ "nbytes": 33080832,
1671
+ "records": [
1672
+ {
1673
+ "name": "roberta.encoder.layer.7.attention.output.dense.weight",
1674
+ "shape": [
1675
+ 768,
1676
+ 768
1677
+ ],
1678
+ "dtype": "bfloat16",
1679
+ "format": "raw",
1680
+ "nbytes": 1179648,
1681
+ "byteOffset": 0
1682
+ },
1683
+ {
1684
+ "name": "roberta.encoder.layer.7.attention.self.key.bias",
1685
+ "shape": [
1686
+ 768
1687
+ ],
1688
+ "dtype": "bfloat16",
1689
+ "format": "raw",
1690
+ "nbytes": 1536,
1691
+ "byteOffset": 1179648
1692
+ },
1693
+ {
1694
+ "name": "roberta.encoder.layer.7.attention.self.key.weight",
1695
+ "shape": [
1696
+ 768,
1697
+ 768
1698
+ ],
1699
+ "dtype": "bfloat16",
1700
+ "format": "raw",
1701
+ "nbytes": 1179648,
1702
+ "byteOffset": 1181184
1703
+ },
1704
+ {
1705
+ "name": "roberta.encoder.layer.7.attention.self.query.bias",
1706
+ "shape": [
1707
+ 768
1708
+ ],
1709
+ "dtype": "bfloat16",
1710
+ "format": "raw",
1711
+ "nbytes": 1536,
1712
+ "byteOffset": 2360832
1713
+ },
1714
+ {
1715
+ "name": "roberta.encoder.layer.7.attention.self.query.weight",
1716
+ "shape": [
1717
+ 768,
1718
+ 768
1719
+ ],
1720
+ "dtype": "bfloat16",
1721
+ "format": "raw",
1722
+ "nbytes": 1179648,
1723
+ "byteOffset": 2362368
1724
+ },
1725
+ {
1726
+ "name": "roberta.encoder.layer.7.attention.self.value.bias",
1727
+ "shape": [
1728
+ 768
1729
+ ],
1730
+ "dtype": "bfloat16",
1731
+ "format": "raw",
1732
+ "nbytes": 1536,
1733
+ "byteOffset": 3542016
1734
+ },
1735
+ {
1736
+ "name": "roberta.encoder.layer.7.attention.self.value.weight",
1737
+ "shape": [
1738
+ 768,
1739
+ 768
1740
+ ],
1741
+ "dtype": "bfloat16",
1742
+ "format": "raw",
1743
+ "nbytes": 1179648,
1744
+ "byteOffset": 3543552
1745
+ },
1746
+ {
1747
+ "name": "roberta.encoder.layer.7.intermediate.dense.bias",
1748
+ "shape": [
1749
+ 3072
1750
+ ],
1751
+ "dtype": "bfloat16",
1752
+ "format": "raw",
1753
+ "nbytes": 6144,
1754
+ "byteOffset": 4723200
1755
+ },
1756
+ {
1757
+ "name": "roberta.encoder.layer.7.intermediate.dense.weight",
1758
+ "shape": [
1759
+ 3072,
1760
+ 768
1761
+ ],
1762
+ "dtype": "bfloat16",
1763
+ "format": "raw",
1764
+ "nbytes": 4718592,
1765
+ "byteOffset": 4729344
1766
+ },
1767
+ {
1768
+ "name": "roberta.encoder.layer.7.output.LayerNorm.bias",
1769
+ "shape": [
1770
+ 768
1771
+ ],
1772
+ "dtype": "bfloat16",
1773
+ "format": "raw",
1774
+ "nbytes": 1536,
1775
+ "byteOffset": 9447936
1776
+ },
1777
+ {
1778
+ "name": "roberta.encoder.layer.7.output.LayerNorm.weight",
1779
+ "shape": [
1780
+ 768
1781
+ ],
1782
+ "dtype": "bfloat16",
1783
+ "format": "raw",
1784
+ "nbytes": 1536,
1785
+ "byteOffset": 9449472
1786
+ },
1787
+ {
1788
+ "name": "roberta.encoder.layer.7.output.dense.bias",
1789
+ "shape": [
1790
+ 768
1791
+ ],
1792
+ "dtype": "bfloat16",
1793
+ "format": "raw",
1794
+ "nbytes": 1536,
1795
+ "byteOffset": 9451008
1796
+ },
1797
+ {
1798
+ "name": "roberta.encoder.layer.7.output.dense.weight",
1799
+ "shape": [
1800
+ 768,
1801
+ 3072
1802
+ ],
1803
+ "dtype": "bfloat16",
1804
+ "format": "raw",
1805
+ "nbytes": 4718592,
1806
+ "byteOffset": 9452544
1807
+ },
1808
+ {
1809
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.bias",
1810
+ "shape": [
1811
+ 768
1812
+ ],
1813
+ "dtype": "bfloat16",
1814
+ "format": "raw",
1815
+ "nbytes": 1536,
1816
+ "byteOffset": 14171136
1817
+ },
1818
+ {
1819
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.weight",
1820
+ "shape": [
1821
+ 768
1822
+ ],
1823
+ "dtype": "bfloat16",
1824
+ "format": "raw",
1825
+ "nbytes": 1536,
1826
+ "byteOffset": 14172672
1827
+ },
1828
+ {
1829
+ "name": "roberta.encoder.layer.8.attention.output.dense.bias",
1830
+ "shape": [
1831
+ 768
1832
+ ],
1833
+ "dtype": "bfloat16",
1834
+ "format": "raw",
1835
+ "nbytes": 1536,
1836
+ "byteOffset": 14174208
1837
+ },
1838
+ {
1839
+ "name": "roberta.encoder.layer.8.attention.output.dense.weight",
1840
+ "shape": [
1841
+ 768,
1842
+ 768
1843
+ ],
1844
+ "dtype": "bfloat16",
1845
+ "format": "raw",
1846
+ "nbytes": 1179648,
1847
+ "byteOffset": 14175744
1848
+ },
1849
+ {
1850
+ "name": "roberta.encoder.layer.8.attention.self.key.bias",
1851
+ "shape": [
1852
+ 768
1853
+ ],
1854
+ "dtype": "bfloat16",
1855
+ "format": "raw",
1856
+ "nbytes": 1536,
1857
+ "byteOffset": 15355392
1858
+ },
1859
+ {
1860
+ "name": "roberta.encoder.layer.8.attention.self.key.weight",
1861
+ "shape": [
1862
+ 768,
1863
+ 768
1864
+ ],
1865
+ "dtype": "bfloat16",
1866
+ "format": "raw",
1867
+ "nbytes": 1179648,
1868
+ "byteOffset": 15356928
1869
+ },
1870
+ {
1871
+ "name": "roberta.encoder.layer.8.attention.self.query.bias",
1872
+ "shape": [
1873
+ 768
1874
+ ],
1875
+ "dtype": "bfloat16",
1876
+ "format": "raw",
1877
+ "nbytes": 1536,
1878
+ "byteOffset": 16536576
1879
+ },
1880
+ {
1881
+ "name": "roberta.encoder.layer.8.attention.self.query.weight",
1882
+ "shape": [
1883
+ 768,
1884
+ 768
1885
+ ],
1886
+ "dtype": "bfloat16",
1887
+ "format": "raw",
1888
+ "nbytes": 1179648,
1889
+ "byteOffset": 16538112
1890
+ },
1891
+ {
1892
+ "name": "roberta.encoder.layer.8.attention.self.value.bias",
1893
+ "shape": [
1894
+ 768
1895
+ ],
1896
+ "dtype": "bfloat16",
1897
+ "format": "raw",
1898
+ "nbytes": 1536,
1899
+ "byteOffset": 17717760
1900
+ },
1901
+ {
1902
+ "name": "roberta.encoder.layer.8.attention.self.value.weight",
1903
+ "shape": [
1904
+ 768,
1905
+ 768
1906
+ ],
1907
+ "dtype": "bfloat16",
1908
+ "format": "raw",
1909
+ "nbytes": 1179648,
1910
+ "byteOffset": 17719296
1911
+ },
1912
+ {
1913
+ "name": "roberta.encoder.layer.8.intermediate.dense.bias",
1914
+ "shape": [
1915
+ 3072
1916
+ ],
1917
+ "dtype": "bfloat16",
1918
+ "format": "raw",
1919
+ "nbytes": 6144,
1920
+ "byteOffset": 18898944
1921
+ },
1922
+ {
1923
+ "name": "roberta.encoder.layer.8.intermediate.dense.weight",
1924
+ "shape": [
1925
+ 3072,
1926
+ 768
1927
+ ],
1928
+ "dtype": "bfloat16",
1929
+ "format": "raw",
1930
+ "nbytes": 4718592,
1931
+ "byteOffset": 18905088
1932
+ },
1933
+ {
1934
+ "name": "roberta.encoder.layer.8.output.LayerNorm.bias",
1935
+ "shape": [
1936
+ 768
1937
+ ],
1938
+ "dtype": "bfloat16",
1939
+ "format": "raw",
1940
+ "nbytes": 1536,
1941
+ "byteOffset": 23623680
1942
+ },
1943
+ {
1944
+ "name": "roberta.encoder.layer.8.output.LayerNorm.weight",
1945
+ "shape": [
1946
+ 768
1947
+ ],
1948
+ "dtype": "bfloat16",
1949
+ "format": "raw",
1950
+ "nbytes": 1536,
1951
+ "byteOffset": 23625216
1952
+ },
1953
+ {
1954
+ "name": "roberta.encoder.layer.8.output.dense.bias",
1955
+ "shape": [
1956
+ 768
1957
+ ],
1958
+ "dtype": "bfloat16",
1959
+ "format": "raw",
1960
+ "nbytes": 1536,
1961
+ "byteOffset": 23626752
1962
+ },
1963
+ {
1964
+ "name": "roberta.encoder.layer.8.output.dense.weight",
1965
+ "shape": [
1966
+ 768,
1967
+ 3072
1968
+ ],
1969
+ "dtype": "bfloat16",
1970
+ "format": "raw",
1971
+ "nbytes": 4718592,
1972
+ "byteOffset": 23628288
1973
+ },
1974
+ {
1975
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.bias",
1976
+ "shape": [
1977
+ 768
1978
+ ],
1979
+ "dtype": "bfloat16",
1980
+ "format": "raw",
1981
+ "nbytes": 1536,
1982
+ "byteOffset": 28346880
1983
+ },
1984
+ {
1985
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.weight",
1986
+ "shape": [
1987
+ 768
1988
+ ],
1989
+ "dtype": "bfloat16",
1990
+ "format": "raw",
1991
+ "nbytes": 1536,
1992
+ "byteOffset": 28348416
1993
+ },
1994
+ {
1995
+ "name": "roberta.encoder.layer.9.attention.output.dense.bias",
1996
+ "shape": [
1997
+ 768
1998
+ ],
1999
+ "dtype": "bfloat16",
2000
+ "format": "raw",
2001
+ "nbytes": 1536,
2002
+ "byteOffset": 28349952
2003
+ },
2004
+ {
2005
+ "name": "roberta.encoder.layer.9.attention.output.dense.weight",
2006
+ "shape": [
2007
+ 768,
2008
+ 768
2009
+ ],
2010
+ "dtype": "bfloat16",
2011
+ "format": "raw",
2012
+ "nbytes": 1179648,
2013
+ "byteOffset": 28351488
2014
+ },
2015
+ {
2016
+ "name": "roberta.encoder.layer.9.attention.self.key.bias",
2017
+ "shape": [
2018
+ 768
2019
+ ],
2020
+ "dtype": "bfloat16",
2021
+ "format": "raw",
2022
+ "nbytes": 1536,
2023
+ "byteOffset": 29531136
2024
+ },
2025
+ {
2026
+ "name": "roberta.encoder.layer.9.attention.self.key.weight",
2027
+ "shape": [
2028
+ 768,
2029
+ 768
2030
+ ],
2031
+ "dtype": "bfloat16",
2032
+ "format": "raw",
2033
+ "nbytes": 1179648,
2034
+ "byteOffset": 29532672
2035
+ },
2036
+ {
2037
+ "name": "roberta.encoder.layer.9.attention.self.query.bias",
2038
+ "shape": [
2039
+ 768
2040
+ ],
2041
+ "dtype": "bfloat16",
2042
+ "format": "raw",
2043
+ "nbytes": 1536,
2044
+ "byteOffset": 30712320
2045
+ },
2046
+ {
2047
+ "name": "roberta.encoder.layer.9.attention.self.query.weight",
2048
+ "shape": [
2049
+ 768,
2050
+ 768
2051
+ ],
2052
+ "dtype": "bfloat16",
2053
+ "format": "raw",
2054
+ "nbytes": 1179648,
2055
+ "byteOffset": 30713856
2056
+ },
2057
+ {
2058
+ "name": "roberta.encoder.layer.9.attention.self.value.bias",
2059
+ "shape": [
2060
+ 768
2061
+ ],
2062
+ "dtype": "bfloat16",
2063
+ "format": "raw",
2064
+ "nbytes": 1536,
2065
+ "byteOffset": 31893504
2066
+ },
2067
+ {
2068
+ "name": "roberta.encoder.layer.9.attention.self.value.weight",
2069
+ "shape": [
2070
+ 768,
2071
+ 768
2072
+ ],
2073
+ "dtype": "bfloat16",
2074
+ "format": "raw",
2075
+ "nbytes": 1179648,
2076
+ "byteOffset": 31895040
2077
+ },
2078
+ {
2079
+ "name": "roberta.encoder.layer.9.intermediate.dense.bias",
2080
+ "shape": [
2081
+ 3072
2082
+ ],
2083
+ "dtype": "bfloat16",
2084
+ "format": "raw",
2085
+ "nbytes": 6144,
2086
+ "byteOffset": 33074688
2087
+ }
2088
+ ],
2089
+ "md5sum": "96b9d77f860a578f5bc124742a2447aa"
2090
+ },
2091
+ {
2092
+ "dataPath": "params_shard_6.bin",
2093
+ "format": "raw-shard",
2094
+ "nbytes": 9441792,
2095
+ "records": [
2096
+ {
2097
+ "name": "roberta.encoder.layer.9.intermediate.dense.weight",
2098
+ "shape": [
2099
+ 3072,
2100
+ 768
2101
+ ],
2102
+ "dtype": "bfloat16",
2103
+ "format": "raw",
2104
+ "nbytes": 4718592,
2105
+ "byteOffset": 0
2106
+ },
2107
+ {
2108
+ "name": "roberta.encoder.layer.9.output.LayerNorm.bias",
2109
+ "shape": [
2110
+ 768
2111
+ ],
2112
+ "dtype": "bfloat16",
2113
+ "format": "raw",
2114
+ "nbytes": 1536,
2115
+ "byteOffset": 4718592
2116
+ },
2117
+ {
2118
+ "name": "roberta.encoder.layer.9.output.LayerNorm.weight",
2119
+ "shape": [
2120
+ 768
2121
+ ],
2122
+ "dtype": "bfloat16",
2123
+ "format": "raw",
2124
+ "nbytes": 1536,
2125
+ "byteOffset": 4720128
2126
+ },
2127
+ {
2128
+ "name": "roberta.encoder.layer.9.output.dense.bias",
2129
+ "shape": [
2130
+ 768
2131
+ ],
2132
+ "dtype": "bfloat16",
2133
+ "format": "raw",
2134
+ "nbytes": 1536,
2135
+ "byteOffset": 4721664
2136
+ },
2137
+ {
2138
+ "name": "roberta.encoder.layer.9.output.dense.weight",
2139
+ "shape": [
2140
+ 768,
2141
+ 3072
2142
+ ],
2143
+ "dtype": "bfloat16",
2144
+ "format": "raw",
2145
+ "nbytes": 4718592,
2146
+ "byteOffset": 4723200
2147
+ }
2148
+ ],
2149
+ "md5sum": "369c5d4f9e3ffd5ee22f0023b2faba04"
2150
+ }
2151
+ ]
2152
+ }
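The shard manifest above is mechanical enough to sanity-check in a few lines: each record's `nbytes` should equal the element count implied by `shape` times the stored element width, and records within a shard should be laid out back-to-back from `byteOffset` 0 up to the shard's `nbytes`. A minimal sketch, assuming (from the figures in this manifest) that `"raw"` bfloat16 records and `"f32-to-bf16"` records both occupy 2 bytes per element on disk:

```python
from functools import reduce

def stored_width(rec):
    # Assumption based on this manifest: "f32-to-bf16" records are
    # float32 weights stored as bfloat16, so 2 bytes per element;
    # "raw" records use the byte width of their declared dtype.
    if rec["format"] == "f32-to-bf16":
        return 2
    return {"bfloat16": 2, "float32": 4}[rec["dtype"]]

def check_shard(shard):
    """Verify nbytes = prod(shape) * width and contiguous offsets."""
    offset = 0
    for rec in shard["records"]:
        numel = reduce(lambda a, b: a * b, rec["shape"], 1)
        assert rec["nbytes"] == numel * stored_width(rec), rec["name"]
        assert rec["byteOffset"] == offset, rec["name"]
        offset += rec["nbytes"]
    # The records must exactly fill the shard file.
    assert offset == shard["nbytes"], shard["dataPath"]
    return offset

# The params_shard_6.bin entry from the manifest above.
shard = {
    "dataPath": "params_shard_6.bin",
    "nbytes": 9441792,
    "records": [
        {"name": "roberta.encoder.layer.9.intermediate.dense.weight",
         "shape": [3072, 768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 4718592, "byteOffset": 0},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.bias",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4718592},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.weight",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4720128},
        {"name": "roberta.encoder.layer.9.output.dense.bias",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4721664},
        {"name": "roberta.encoder.layer.9.output.dense.weight",
         "shape": [768, 3072], "dtype": "bfloat16", "format": "raw",
         "nbytes": 4718592, "byteOffset": 4723200},
    ],
}
print(check_shard(shard))  # 9441792
```

The same check can be run over every entry of `records` after loading ndarray-cache.json with `json.load`; pairing it with an md5 of each `params_shard_*.bin` against the recorded `md5sum` catches truncated downloads as well.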
ndarray-cache.json ADDED
@@ -0,0 +1,2152 @@
+ {
+ "metadata": {
+ "ParamSize": 201,
+ "ParamBytes": 498588680.0,
+ "BitsPerParam": 32.0
+ },
+ "records": [
+ {
+ "dataPath": "params_shard_0.bin",
+ "format": "raw-shard",
+ "nbytes": 77207040,
+ "records": [
+ {
+ "name": "roberta.embeddings.word_embeddings.weight",
+ "shape": [
+ 50265,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 77207040,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "4a61cba31613349f9ffc22e1606a39c1"
+ },
+ {
+ "dataPath": "params_shard_1.bin",
+ "format": "raw-shard",
+ "nbytes": 32696836,
+ "records": [
+ {
+ "name": "classifier.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 0
+ },
+ {
+ "name": "classifier.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 1536
+ },
+ {
+ "name": "classifier.out_proj.bias",
+ "shape": [
+ 2
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4,
+ "byteOffset": 1181184
+ },
+ {
+ "name": "classifier.out_proj.weight",
+ "shape": [
+ 2,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 3072,
+ "byteOffset": 1181188
+ },
+ {
+ "name": "roberta.embeddings.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1184260
+ },
+ {
+ "name": "roberta.embeddings.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1185796
+ },
+ {
+ "name": "roberta.embeddings.position_embeddings.weight",
+ "shape": [
+ 514,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 789504,
+ "byteOffset": 1187332
+ },
+ {
+ "name": "roberta.embeddings.token_type_embeddings.weight",
+ "shape": [
+ 1,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1976836
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1978372
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1979908
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1981444
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 1982980
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 3162628
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 3164164
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4343812
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 4345348
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 5524996
+ },
+ {
+ "name": "roberta.encoder.layer.0.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 5526532
+ },
+ {
+ "name": "roberta.encoder.layer.0.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 6706180
+ },
+ {
+ "name": "roberta.encoder.layer.0.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 6712324
+ },
+ {
+ "name": "roberta.encoder.layer.0.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11430916
+ },
+ {
+ "name": "roberta.encoder.layer.0.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11432452
+ },
+ {
+ "name": "roberta.encoder.layer.0.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11433988
+ },
+ {
+ "name": "roberta.encoder.layer.0.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 11435524
+ },
+ {
+ "name": "roberta.encoder.layer.1.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 16154116
+ },
+ {
+ "name": "roberta.encoder.layer.1.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 16155652
+ },
+ {
+ "name": "roberta.encoder.layer.1.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 16157188
+ },
+ {
+ "name": "roberta.encoder.layer.1.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 16158724
+ },
+ {
324
+ "name": "roberta.encoder.layer.1.attention.self.key.bias",
325
+ "shape": [
326
+ 768
327
+ ],
328
+ "dtype": "float32",
329
+ "format": "f32-to-bf16",
330
+ "nbytes": 1536,
331
+ "byteOffset": 17338372
332
+ },
333
+ {
334
+ "name": "roberta.encoder.layer.1.attention.self.key.weight",
335
+ "shape": [
336
+ 768,
337
+ 768
338
+ ],
339
+ "dtype": "float32",
340
+ "format": "f32-to-bf16",
341
+ "nbytes": 1179648,
342
+ "byteOffset": 17339908
343
+ },
344
+ {
345
+ "name": "roberta.encoder.layer.1.attention.self.query.bias",
346
+ "shape": [
347
+ 768
348
+ ],
349
+ "dtype": "float32",
350
+ "format": "f32-to-bf16",
351
+ "nbytes": 1536,
352
+ "byteOffset": 18519556
353
+ },
354
+ {
355
+ "name": "roberta.encoder.layer.1.attention.self.query.weight",
356
+ "shape": [
357
+ 768,
358
+ 768
359
+ ],
360
+ "dtype": "float32",
361
+ "format": "f32-to-bf16",
362
+ "nbytes": 1179648,
363
+ "byteOffset": 18521092
364
+ },
365
+ {
366
+ "name": "roberta.encoder.layer.1.attention.self.value.bias",
367
+ "shape": [
368
+ 768
369
+ ],
370
+ "dtype": "float32",
371
+ "format": "f32-to-bf16",
372
+ "nbytes": 1536,
373
+ "byteOffset": 19700740
374
+ },
375
+ {
376
+ "name": "roberta.encoder.layer.1.attention.self.value.weight",
377
+ "shape": [
378
+ 768,
379
+ 768
380
+ ],
381
+ "dtype": "float32",
382
+ "format": "f32-to-bf16",
383
+ "nbytes": 1179648,
384
+ "byteOffset": 19702276
385
+ },
386
+ {
387
+ "name": "roberta.encoder.layer.1.intermediate.dense.bias",
388
+ "shape": [
389
+ 3072
390
+ ],
391
+ "dtype": "float32",
392
+ "format": "f32-to-bf16",
393
+ "nbytes": 6144,
394
+ "byteOffset": 20881924
395
+ },
396
+ {
397
+ "name": "roberta.encoder.layer.1.intermediate.dense.weight",
398
+ "shape": [
399
+ 3072,
400
+ 768
401
+ ],
402
+ "dtype": "float32",
403
+ "format": "f32-to-bf16",
404
+ "nbytes": 4718592,
405
+ "byteOffset": 20888068
406
+ },
407
+ {
408
+ "name": "roberta.encoder.layer.1.output.LayerNorm.bias",
409
+ "shape": [
410
+ 768
411
+ ],
412
+ "dtype": "float32",
413
+ "format": "f32-to-bf16",
414
+ "nbytes": 1536,
415
+ "byteOffset": 25606660
416
+ },
417
+ {
418
+ "name": "roberta.encoder.layer.1.output.LayerNorm.weight",
419
+ "shape": [
420
+ 768
421
+ ],
422
+ "dtype": "float32",
423
+ "format": "f32-to-bf16",
424
+ "nbytes": 1536,
425
+ "byteOffset": 25608196
426
+ },
427
+ {
428
+ "name": "roberta.encoder.layer.1.output.dense.bias",
429
+ "shape": [
430
+ 768
431
+ ],
432
+ "dtype": "float32",
433
+ "format": "f32-to-bf16",
434
+ "nbytes": 1536,
435
+ "byteOffset": 25609732
436
+ },
437
+ {
438
+ "name": "roberta.encoder.layer.1.output.dense.weight",
439
+ "shape": [
440
+ 768,
441
+ 3072
442
+ ],
443
+ "dtype": "float32",
444
+ "format": "f32-to-bf16",
445
+ "nbytes": 4718592,
446
+ "byteOffset": 25611268
447
+ },
448
+ {
449
+ "name": "roberta.encoder.layer.10.attention.output.LayerNorm.bias",
450
+ "shape": [
451
+ 768
452
+ ],
453
+ "dtype": "float32",
454
+ "format": "f32-to-bf16",
455
+ "nbytes": 1536,
456
+ "byteOffset": 30329860
457
+ },
458
+ {
459
+ "name": "roberta.encoder.layer.10.attention.output.LayerNorm.weight",
460
+ "shape": [
461
+ 768
462
+ ],
463
+ "dtype": "float32",
464
+ "format": "f32-to-bf16",
465
+ "nbytes": 1536,
466
+ "byteOffset": 30331396
467
+ },
468
+ {
469
+ "name": "roberta.encoder.layer.10.attention.output.dense.bias",
470
+ "shape": [
471
+ 768
472
+ ],
473
+ "dtype": "float32",
474
+ "format": "f32-to-bf16",
475
+ "nbytes": 1536,
476
+ "byteOffset": 30332932
477
+ },
478
+ {
479
+ "name": "roberta.encoder.layer.10.attention.output.dense.weight",
480
+ "shape": [
481
+ 768,
482
+ 768
483
+ ],
484
+ "dtype": "float32",
485
+ "format": "f32-to-bf16",
486
+ "nbytes": 1179648,
487
+ "byteOffset": 30334468
488
+ },
489
+ {
490
+ "name": "roberta.encoder.layer.10.attention.self.key.bias",
491
+ "shape": [
492
+ 768
493
+ ],
494
+ "dtype": "float32",
495
+ "format": "f32-to-bf16",
496
+ "nbytes": 1536,
497
+ "byteOffset": 31514116
498
+ },
499
+ {
500
+ "name": "roberta.encoder.layer.10.attention.self.key.weight",
501
+ "shape": [
502
+ 768,
503
+ 768
504
+ ],
505
+ "dtype": "float32",
506
+ "format": "f32-to-bf16",
507
+ "nbytes": 1179648,
508
+ "byteOffset": 31515652
509
+ },
510
+ {
511
+ "name": "roberta.encoder.layer.10.attention.self.query.bias",
512
+ "shape": [
513
+ 768
514
+ ],
515
+ "dtype": "float32",
516
+ "format": "f32-to-bf16",
517
+ "nbytes": 1536,
518
+ "byteOffset": 32695300
519
+ }
520
+ ],
521
+ "md5sum": "bb1b378319baec7f1ca617bf8f86ff32"
522
+ },
523
+ {
+ "dataPath": "params_shard_2.bin",
+ "format": "raw-shard",
+ "nbytes": 30718464,
+ "records": [
+ {
+ "name": "roberta.encoder.layer.10.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 0
+ },
+ {
+ "name": "roberta.encoder.layer.10.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 1179648
+ },
+ {
+ "name": "roberta.encoder.layer.10.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 1181184
+ },
+ {
+ "name": "roberta.encoder.layer.10.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 2360832
+ },
+ {
+ "name": "roberta.encoder.layer.10.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 2366976
+ },
+ {
+ "name": "roberta.encoder.layer.10.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 7085568
+ },
+ {
+ "name": "roberta.encoder.layer.10.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 7087104
+ },
+ {
+ "name": "roberta.encoder.layer.10.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 7088640
+ },
+ {
+ "name": "roberta.encoder.layer.10.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 7090176
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11808768
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11810304
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11811840
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 11813376
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 12993024
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 12994560
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 14174208
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 14175744
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 15355392
+ },
+ {
+ "name": "roberta.encoder.layer.11.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 15356928
+ },
+ {
+ "name": "roberta.encoder.layer.11.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 16536576
+ },
+ {
+ "name": "roberta.encoder.layer.11.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 16542720
+ },
+ {
+ "name": "roberta.encoder.layer.11.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 21261312
+ },
+ {
+ "name": "roberta.encoder.layer.11.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 21262848
+ },
+ {
+ "name": "roberta.encoder.layer.11.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 21264384
+ },
+ {
+ "name": "roberta.encoder.layer.11.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 21265920
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 25984512
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 25986048
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 25987584
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 25989120
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 27168768
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 27170304
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 28349952
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 28351488
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 29531136
+ },
+ {
+ "name": "roberta.encoder.layer.2.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 29532672
+ },
+ {
+ "name": "roberta.encoder.layer.2.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 30712320
+ }
+ ],
+ "md5sum": "894bfb02ebd46757ea2fa28fdc8f9079"
+ },
+ {
+ "dataPath": "params_shard_3.bin",
+ "format": "raw-shard",
+ "nbytes": 33074688,
+ "records": [
+ {
+ "name": "roberta.encoder.layer.2.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 0
+ },
+ {
+ "name": "roberta.encoder.layer.2.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4718592
+ },
+ {
+ "name": "roberta.encoder.layer.2.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4720128
+ },
+ {
+ "name": "roberta.encoder.layer.2.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4721664
+ },
+ {
+ "name": "roberta.encoder.layer.2.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 4723200
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 9441792
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 9443328
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 9444864
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 9446400
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 10626048
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 10627584
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 11807232
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 11808768
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 12988416
+ },
+ {
+ "name": "roberta.encoder.layer.3.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 12989952
+ },
+ {
+ "name": "roberta.encoder.layer.3.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 14169600
+ },
+ {
+ "name": "roberta.encoder.layer.3.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 14175744
+ },
+ {
+ "name": "roberta.encoder.layer.3.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18894336
+ },
+ {
+ "name": "roberta.encoder.layer.3.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18895872
+ },
+ {
+ "name": "roberta.encoder.layer.3.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18897408
+ },
+ {
+ "name": "roberta.encoder.layer.3.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 18898944
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 23617536
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 23619072
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 23620608
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 23622144
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 24801792
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 24803328
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 25982976
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 25984512
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 27164160
+ },
+ {
+ "name": "roberta.encoder.layer.4.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 27165696
+ },
+ {
+ "name": "roberta.encoder.layer.4.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 28345344
+ },
+ {
+ "name": "roberta.encoder.layer.4.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 28351488
+ },
+ {
+ "name": "roberta.encoder.layer.4.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33070080
+ },
+ {
+ "name": "roberta.encoder.layer.4.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33071616
+ },
+ {
+ "name": "roberta.encoder.layer.4.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33073152
+ }
+ ],
+ "md5sum": "0204eb9ffd7d0691cdba514cfc076112"
+ },
+ {
+ "dataPath": "params_shard_4.bin",
+ "format": "raw-shard",
+ "nbytes": 33074688,
+ "records": [
+ {
+ "name": "roberta.encoder.layer.4.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 0
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4718592
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4720128
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 4721664
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 4723200
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 5902848
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 5904384
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 7084032
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 7085568
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 8265216
+ },
+ {
+ "name": "roberta.encoder.layer.5.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 8266752
+ },
+ {
+ "name": "roberta.encoder.layer.5.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 9446400
+ },
+ {
+ "name": "roberta.encoder.layer.5.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 9452544
+ },
+ {
+ "name": "roberta.encoder.layer.5.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 14171136
+ },
+ {
+ "name": "roberta.encoder.layer.5.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 14172672
+ },
+ {
+ "name": "roberta.encoder.layer.5.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 14174208
+ },
+ {
+ "name": "roberta.encoder.layer.5.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 14175744
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18894336
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18895872
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 18897408
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.output.dense.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 18898944
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.key.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 20078592
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.key.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 20080128
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.query.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 21259776
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.query.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 21261312
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.value.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 22440960
+ },
+ {
+ "name": "roberta.encoder.layer.6.attention.self.value.weight",
+ "shape": [
+ 768,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1179648,
+ "byteOffset": 22442496
+ },
+ {
+ "name": "roberta.encoder.layer.6.intermediate.dense.bias",
+ "shape": [
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 6144,
+ "byteOffset": 23622144
+ },
+ {
+ "name": "roberta.encoder.layer.6.intermediate.dense.weight",
+ "shape": [
+ 3072,
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 23628288
+ },
+ {
+ "name": "roberta.encoder.layer.6.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 28346880
+ },
+ {
+ "name": "roberta.encoder.layer.6.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 28348416
+ },
+ {
+ "name": "roberta.encoder.layer.6.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 28349952
+ },
+ {
+ "name": "roberta.encoder.layer.6.output.dense.weight",
+ "shape": [
+ 768,
+ 3072
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 4718592,
+ "byteOffset": 28351488
+ },
+ {
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33070080
+ },
+ {
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.weight",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33071616
+ },
+ {
+ "name": "roberta.encoder.layer.7.attention.output.dense.bias",
+ "shape": [
+ 768
+ ],
+ "dtype": "float32",
+ "format": "f32-to-bf16",
+ "nbytes": 1536,
+ "byteOffset": 33073152
+ }
+ ],
+ "md5sum": "f2720ca40469da131d390ea8694443fd"
+ },
+ {
1668
+ "dataPath": "params_shard_5.bin",
1669
+ "format": "raw-shard",
1670
+ "nbytes": 33080832,
1671
+ "records": [
1672
+ {
1673
+ "name": "roberta.encoder.layer.7.attention.output.dense.weight",
1674
+ "shape": [
1675
+ 768,
1676
+ 768
1677
+ ],
1678
+ "dtype": "float32",
1679
+ "format": "f32-to-bf16",
1680
+ "nbytes": 1179648,
1681
+ "byteOffset": 0
1682
+ },
1683
+ {
1684
+ "name": "roberta.encoder.layer.7.attention.self.key.bias",
1685
+ "shape": [
1686
+ 768
1687
+ ],
1688
+ "dtype": "float32",
1689
+ "format": "f32-to-bf16",
1690
+ "nbytes": 1536,
1691
+ "byteOffset": 1179648
1692
+ },
1693
+ {
1694
+ "name": "roberta.encoder.layer.7.attention.self.key.weight",
1695
+ "shape": [
1696
+ 768,
1697
+ 768
1698
+ ],
1699
+ "dtype": "float32",
1700
+ "format": "f32-to-bf16",
1701
+ "nbytes": 1179648,
1702
+ "byteOffset": 1181184
1703
+ },
1704
+ {
1705
+ "name": "roberta.encoder.layer.7.attention.self.query.bias",
1706
+ "shape": [
1707
+ 768
1708
+ ],
1709
+ "dtype": "float32",
1710
+ "format": "f32-to-bf16",
1711
+ "nbytes": 1536,
1712
+ "byteOffset": 2360832
1713
+ },
1714
+ {
1715
+ "name": "roberta.encoder.layer.7.attention.self.query.weight",
1716
+ "shape": [
1717
+ 768,
1718
+ 768
1719
+ ],
1720
+ "dtype": "float32",
1721
+ "format": "f32-to-bf16",
1722
+ "nbytes": 1179648,
1723
+ "byteOffset": 2362368
1724
+ },
1725
+ {
1726
+ "name": "roberta.encoder.layer.7.attention.self.value.bias",
1727
+ "shape": [
1728
+ 768
1729
+ ],
1730
+ "dtype": "float32",
1731
+ "format": "f32-to-bf16",
1732
+ "nbytes": 1536,
1733
+ "byteOffset": 3542016
1734
+ },
1735
+ {
1736
+ "name": "roberta.encoder.layer.7.attention.self.value.weight",
1737
+ "shape": [
1738
+ 768,
1739
+ 768
1740
+ ],
1741
+ "dtype": "float32",
1742
+ "format": "f32-to-bf16",
1743
+ "nbytes": 1179648,
1744
+ "byteOffset": 3543552
1745
+ },
1746
+ {
1747
+ "name": "roberta.encoder.layer.7.intermediate.dense.bias",
1748
+ "shape": [
1749
+ 3072
1750
+ ],
1751
+ "dtype": "float32",
1752
+ "format": "f32-to-bf16",
1753
+ "nbytes": 6144,
1754
+ "byteOffset": 4723200
1755
+ },
1756
+ {
1757
+ "name": "roberta.encoder.layer.7.intermediate.dense.weight",
1758
+ "shape": [
1759
+ 3072,
1760
+ 768
1761
+ ],
1762
+ "dtype": "float32",
1763
+ "format": "f32-to-bf16",
1764
+ "nbytes": 4718592,
1765
+ "byteOffset": 4729344
1766
+ },
1767
+ {
1768
+ "name": "roberta.encoder.layer.7.output.LayerNorm.bias",
1769
+ "shape": [
1770
+ 768
1771
+ ],
1772
+ "dtype": "float32",
1773
+ "format": "f32-to-bf16",
1774
+ "nbytes": 1536,
1775
+ "byteOffset": 9447936
1776
+ },
1777
+ {
1778
+ "name": "roberta.encoder.layer.7.output.LayerNorm.weight",
1779
+ "shape": [
1780
+ 768
1781
+ ],
1782
+ "dtype": "float32",
1783
+ "format": "f32-to-bf16",
1784
+ "nbytes": 1536,
1785
+ "byteOffset": 9449472
1786
+ },
1787
+ {
1788
+ "name": "roberta.encoder.layer.7.output.dense.bias",
1789
+ "shape": [
1790
+ 768
1791
+ ],
1792
+ "dtype": "float32",
1793
+ "format": "f32-to-bf16",
1794
+ "nbytes": 1536,
1795
+ "byteOffset": 9451008
1796
+ },
1797
+ {
1798
+ "name": "roberta.encoder.layer.7.output.dense.weight",
1799
+ "shape": [
1800
+ 768,
1801
+ 3072
1802
+ ],
1803
+ "dtype": "float32",
1804
+ "format": "f32-to-bf16",
1805
+ "nbytes": 4718592,
1806
+ "byteOffset": 9452544
1807
+ },
1808
+ {
1809
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.bias",
1810
+ "shape": [
1811
+ 768
1812
+ ],
1813
+ "dtype": "float32",
1814
+ "format": "f32-to-bf16",
1815
+ "nbytes": 1536,
1816
+ "byteOffset": 14171136
1817
+ },
1818
+ {
1819
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.weight",
1820
+ "shape": [
1821
+ 768
1822
+ ],
1823
+ "dtype": "float32",
1824
+ "format": "f32-to-bf16",
1825
+ "nbytes": 1536,
1826
+ "byteOffset": 14172672
1827
+ },
1828
+ {
1829
+ "name": "roberta.encoder.layer.8.attention.output.dense.bias",
1830
+ "shape": [
1831
+ 768
1832
+ ],
1833
+ "dtype": "float32",
1834
+ "format": "f32-to-bf16",
1835
+ "nbytes": 1536,
1836
+ "byteOffset": 14174208
1837
+ },
1838
+ {
1839
+ "name": "roberta.encoder.layer.8.attention.output.dense.weight",
1840
+ "shape": [
1841
+ 768,
1842
+ 768
1843
+ ],
1844
+ "dtype": "float32",
1845
+ "format": "f32-to-bf16",
1846
+ "nbytes": 1179648,
1847
+ "byteOffset": 14175744
1848
+ },
1849
+ {
1850
+ "name": "roberta.encoder.layer.8.attention.self.key.bias",
1851
+ "shape": [
1852
+ 768
1853
+ ],
1854
+ "dtype": "float32",
1855
+ "format": "f32-to-bf16",
1856
+ "nbytes": 1536,
1857
+ "byteOffset": 15355392
1858
+ },
1859
+ {
1860
+ "name": "roberta.encoder.layer.8.attention.self.key.weight",
1861
+ "shape": [
1862
+ 768,
1863
+ 768
1864
+ ],
1865
+ "dtype": "float32",
1866
+ "format": "f32-to-bf16",
1867
+ "nbytes": 1179648,
1868
+ "byteOffset": 15356928
1869
+ },
1870
+ {
1871
+ "name": "roberta.encoder.layer.8.attention.self.query.bias",
1872
+ "shape": [
1873
+ 768
1874
+ ],
1875
+ "dtype": "float32",
1876
+ "format": "f32-to-bf16",
1877
+ "nbytes": 1536,
1878
+ "byteOffset": 16536576
1879
+ },
1880
+ {
1881
+ "name": "roberta.encoder.layer.8.attention.self.query.weight",
1882
+ "shape": [
1883
+ 768,
1884
+ 768
1885
+ ],
1886
+ "dtype": "float32",
1887
+ "format": "f32-to-bf16",
1888
+ "nbytes": 1179648,
1889
+ "byteOffset": 16538112
1890
+ },
1891
+ {
1892
+ "name": "roberta.encoder.layer.8.attention.self.value.bias",
1893
+ "shape": [
1894
+ 768
1895
+ ],
1896
+ "dtype": "float32",
1897
+ "format": "f32-to-bf16",
1898
+ "nbytes": 1536,
1899
+ "byteOffset": 17717760
1900
+ },
1901
+ {
1902
+ "name": "roberta.encoder.layer.8.attention.self.value.weight",
1903
+ "shape": [
1904
+ 768,
1905
+ 768
1906
+ ],
1907
+ "dtype": "float32",
1908
+ "format": "f32-to-bf16",
1909
+ "nbytes": 1179648,
1910
+ "byteOffset": 17719296
1911
+ },
1912
+ {
1913
+ "name": "roberta.encoder.layer.8.intermediate.dense.bias",
1914
+ "shape": [
1915
+ 3072
1916
+ ],
1917
+ "dtype": "float32",
1918
+ "format": "f32-to-bf16",
1919
+ "nbytes": 6144,
1920
+ "byteOffset": 18898944
1921
+ },
1922
+ {
1923
+ "name": "roberta.encoder.layer.8.intermediate.dense.weight",
1924
+ "shape": [
1925
+ 3072,
1926
+ 768
1927
+ ],
1928
+ "dtype": "float32",
1929
+ "format": "f32-to-bf16",
1930
+ "nbytes": 4718592,
1931
+ "byteOffset": 18905088
1932
+ },
1933
+ {
1934
+ "name": "roberta.encoder.layer.8.output.LayerNorm.bias",
1935
+ "shape": [
1936
+ 768
1937
+ ],
1938
+ "dtype": "float32",
1939
+ "format": "f32-to-bf16",
1940
+ "nbytes": 1536,
1941
+ "byteOffset": 23623680
1942
+ },
1943
+ {
1944
+ "name": "roberta.encoder.layer.8.output.LayerNorm.weight",
1945
+ "shape": [
1946
+ 768
1947
+ ],
1948
+ "dtype": "float32",
1949
+ "format": "f32-to-bf16",
1950
+ "nbytes": 1536,
1951
+ "byteOffset": 23625216
1952
+ },
1953
+ {
1954
+ "name": "roberta.encoder.layer.8.output.dense.bias",
1955
+ "shape": [
1956
+ 768
1957
+ ],
1958
+ "dtype": "float32",
1959
+ "format": "f32-to-bf16",
1960
+ "nbytes": 1536,
1961
+ "byteOffset": 23626752
1962
+ },
1963
+ {
1964
+ "name": "roberta.encoder.layer.8.output.dense.weight",
1965
+ "shape": [
1966
+ 768,
1967
+ 3072
1968
+ ],
1969
+ "dtype": "float32",
1970
+ "format": "f32-to-bf16",
1971
+ "nbytes": 4718592,
1972
+ "byteOffset": 23628288
1973
+ },
1974
+ {
1975
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.bias",
1976
+ "shape": [
1977
+ 768
1978
+ ],
1979
+ "dtype": "float32",
1980
+ "format": "f32-to-bf16",
1981
+ "nbytes": 1536,
1982
+ "byteOffset": 28346880
1983
+ },
1984
+ {
1985
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.weight",
1986
+ "shape": [
1987
+ 768
1988
+ ],
1989
+ "dtype": "float32",
1990
+ "format": "f32-to-bf16",
1991
+ "nbytes": 1536,
1992
+ "byteOffset": 28348416
1993
+ },
1994
+ {
1995
+ "name": "roberta.encoder.layer.9.attention.output.dense.bias",
1996
+ "shape": [
1997
+ 768
1998
+ ],
1999
+ "dtype": "float32",
2000
+ "format": "f32-to-bf16",
2001
+ "nbytes": 1536,
2002
+ "byteOffset": 28349952
2003
+ },
2004
+ {
2005
+ "name": "roberta.encoder.layer.9.attention.output.dense.weight",
2006
+ "shape": [
2007
+ 768,
2008
+ 768
2009
+ ],
2010
+ "dtype": "float32",
2011
+ "format": "f32-to-bf16",
2012
+ "nbytes": 1179648,
2013
+ "byteOffset": 28351488
2014
+ },
2015
+ {
2016
+ "name": "roberta.encoder.layer.9.attention.self.key.bias",
2017
+ "shape": [
2018
+ 768
2019
+ ],
2020
+ "dtype": "float32",
2021
+ "format": "f32-to-bf16",
2022
+ "nbytes": 1536,
2023
+ "byteOffset": 29531136
2024
+ },
2025
+ {
2026
+ "name": "roberta.encoder.layer.9.attention.self.key.weight",
2027
+ "shape": [
2028
+ 768,
2029
+ 768
2030
+ ],
2031
+ "dtype": "float32",
2032
+ "format": "f32-to-bf16",
2033
+ "nbytes": 1179648,
2034
+ "byteOffset": 29532672
2035
+ },
2036
+ {
2037
+ "name": "roberta.encoder.layer.9.attention.self.query.bias",
2038
+ "shape": [
2039
+ 768
2040
+ ],
2041
+ "dtype": "float32",
2042
+ "format": "f32-to-bf16",
2043
+ "nbytes": 1536,
2044
+ "byteOffset": 30712320
2045
+ },
2046
+ {
2047
+ "name": "roberta.encoder.layer.9.attention.self.query.weight",
2048
+ "shape": [
2049
+ 768,
2050
+ 768
2051
+ ],
2052
+ "dtype": "float32",
2053
+ "format": "f32-to-bf16",
2054
+ "nbytes": 1179648,
2055
+ "byteOffset": 30713856
2056
+ },
2057
+ {
2058
+ "name": "roberta.encoder.layer.9.attention.self.value.bias",
2059
+ "shape": [
2060
+ 768
2061
+ ],
2062
+ "dtype": "float32",
2063
+ "format": "f32-to-bf16",
2064
+ "nbytes": 1536,
2065
+ "byteOffset": 31893504
2066
+ },
2067
+ {
2068
+ "name": "roberta.encoder.layer.9.attention.self.value.weight",
2069
+ "shape": [
2070
+ 768,
2071
+ 768
2072
+ ],
2073
+ "dtype": "float32",
2074
+ "format": "f32-to-bf16",
2075
+ "nbytes": 1179648,
2076
+ "byteOffset": 31895040
2077
+ },
2078
+ {
2079
+ "name": "roberta.encoder.layer.9.intermediate.dense.bias",
2080
+ "shape": [
2081
+ 3072
2082
+ ],
2083
+ "dtype": "float32",
2084
+ "format": "f32-to-bf16",
2085
+ "nbytes": 6144,
2086
+ "byteOffset": 33074688
2087
+ }
2088
+ ],
2089
+ "md5sum": "96b9d77f860a578f5bc124742a2447aa"
2090
+ },
2091
+ {
2092
+ "dataPath": "params_shard_6.bin",
2093
+ "format": "raw-shard",
2094
+ "nbytes": 9441792,
2095
+ "records": [
2096
+ {
2097
+ "name": "roberta.encoder.layer.9.intermediate.dense.weight",
2098
+ "shape": [
2099
+ 3072,
2100
+ 768
2101
+ ],
2102
+ "dtype": "float32",
2103
+ "format": "f32-to-bf16",
2104
+ "nbytes": 4718592,
2105
+ "byteOffset": 0
2106
+ },
2107
+ {
2108
+ "name": "roberta.encoder.layer.9.output.LayerNorm.bias",
2109
+ "shape": [
2110
+ 768
2111
+ ],
2112
+ "dtype": "float32",
2113
+ "format": "f32-to-bf16",
2114
+ "nbytes": 1536,
2115
+ "byteOffset": 4718592
2116
+ },
2117
+ {
2118
+ "name": "roberta.encoder.layer.9.output.LayerNorm.weight",
2119
+ "shape": [
2120
+ 768
2121
+ ],
2122
+ "dtype": "float32",
2123
+ "format": "f32-to-bf16",
2124
+ "nbytes": 1536,
2125
+ "byteOffset": 4720128
2126
+ },
2127
+ {
2128
+ "name": "roberta.encoder.layer.9.output.dense.bias",
2129
+ "shape": [
2130
+ 768
2131
+ ],
2132
+ "dtype": "float32",
2133
+ "format": "f32-to-bf16",
2134
+ "nbytes": 1536,
2135
+ "byteOffset": 4721664
2136
+ },
2137
+ {
2138
+ "name": "roberta.encoder.layer.9.output.dense.weight",
2139
+ "shape": [
2140
+ 768,
2141
+ 3072
2142
+ ],
2143
+ "dtype": "float32",
2144
+ "format": "f32-to-bf16",
2145
+ "nbytes": 4718592,
2146
+ "byteOffset": 4723200
2147
+ }
2148
+ ],
2149
+ "md5sum": "369c5d4f9e3ffd5ee22f0023b2faba04"
2150
+ }
2151
+ ]
2152
+ }
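The shard manifest above is mechanical enough to sanity-check: each record stored as "f32-to-bf16" appears to occupy 2 bytes per element on disk, so its `nbytes` should equal the product of `shape` times 2 (for instance 768 × 768 × 2 = 1179648). A minimal sketch of that check, assuming the 2-byte bf16 encoding:

```python
from math import prod

def expected_nbytes(record, bytes_per_elem=2):
    """Byte size implied by a manifest record's shape, assuming
    the "f32-to-bf16" format stores 2 bytes per element."""
    return prod(record["shape"]) * bytes_per_elem

# Record copied from the params_shard_5.bin manifest above.
record = {
    "name": "roberta.encoder.layer.7.attention.output.dense.weight",
    "shape": [768, 768],
    "format": "f32-to-bf16",
    "nbytes": 1179648,
}
assert expected_nbytes(record) == record["nbytes"]
```

The same identity holds for every bias (768 × 2 = 1536) and intermediate weight (3072 × 768 × 2 = 4718592) in the listing.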
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:951600cb97a440e2e35202e137231d48ff487cd204fd22ce1b2a173fff63ff5e
+ size 77207040
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4649cd11a7460251d76eeba64218c7189527e866dc85819cda75fcc186598d2b
+ size 32696836
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee04a2bb03ce8dfd3fd8fc46283ac28dfa6b5539ee0956345b711d2382180b3c
+ size 30718464
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a7a47604f7c96fedc247b14629054f3f9c1f10ee5dd3603212050828f125517
+ size 33074688
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d9fcd55c8c8ec6779eef5c22e46c67291966a603b6f6f787af0b12d2c0d362c
+ size 33074688
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89d1e38d0ebf86b236cce16338d04e3719e228c05840c916643b1f7356c87099
+ size 33080832
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5137ae2a04ecb028f495824149452ce03248d4955a19937254c1d0dbab8f8940
+ size 9441792
roberta-cls-model-q0f32-android.tar ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aed76df6f74c17a1d9bc067b97bceb4ad064b0d8b05db9de9c061d2ee85e73d0
+ size 112431
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "50264": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "<unk>"
+ }
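The tokenizer_config.json above agrees with the special-token ids in mlc-chat-config.json (`bos_token_id` 0, `pad_token_id` 1, `eos_token_id` 2). A quick sketch recovering the token-to-id map from "added_tokens_decoder", whose JSON keys are string token ids:

```python
# "added_tokens_decoder" entries copied from tokenizer_config.json above;
# only the "content" field is needed for the id mapping.
added_tokens_decoder = {
    "0": {"content": "<s>"},
    "1": {"content": "<pad>"},
    "2": {"content": "</s>"},
    "3": {"content": "<unk>"},
    "50264": {"content": "<mask>"},
}

# Invert: token text -> integer id.
token_to_id = {v["content"]: int(k) for k, v in added_tokens_decoder.items()}

assert token_to_id["<s>"] == 0       # matches bos_token_id in mlc-chat-config
assert token_to_id["<pad>"] == 1     # matches pad_token_id
assert token_to_id["</s>"] == 2      # matches eos_token_id
assert token_to_id["<mask>"] == 50264
```

Note `<mask>` sits at id 50264, the last slot of the 50265-entry vocabulary declared in the model config.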
vocab.json ADDED
The diff for this file is too large to render. See raw diff