tclungu committed on
Commit 4a751e4
1 Parent(s): b3941be

mobilebert-multi-dutch-squad-v2

README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: multi-finetuned-squad
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # multi-finetuned-squad
+
+ This model was trained from scratch on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.9112
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 3
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:-----:|:---------------:|
+ | 2.5281 | 1.0 | 7275 | 1.9342 |
+ | 2.4622 | 2.0 | 14550 | 1.8998 |
+ | 2.4195 | 3.0 | 21825 | 1.9112 |
+
+
+ ### Framework versions
+
+ - Transformers 4.33.3
+ - Pytorch 2.0.1
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
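Note on the README.md hyperparameters: the card lists a linear scheduler, a base learning rate of 2e-05, and 7275 optimizer steps per epoch, but it does not say how these combine. The sketch below works through that arithmetic. It assumes zero warmup steps, a single device, and no gradient accumulation; the card states none of these, so treat the numbers as illustrative. The helper names `linear_lr` and `max_examples_per_epoch` are hypothetical and exist only for this example.

```python
# Illustrative arithmetic only. Zero warmup, a single device, and no gradient
# accumulation are assumptions; the model card does not state them.

BASE_LR = 2e-05          # learning_rate from the card
TRAIN_BATCH_SIZE = 16    # train_batch_size from the card
STEPS_PER_EPOCH = 7275   # step count logged at epoch 1.0
NUM_EPOCHS = 3           # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 21825, the step in the last table row


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup then linear decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - warmup_steps)


def max_examples_per_epoch() -> int:
    """Upper bound on training features per epoch (the last batch may be partial)."""
    return STEPS_PER_EPOCH * TRAIN_BATCH_SIZE


# Validation loss per epoch, copied from the training results table.
VALIDATION_LOSS = {1: 1.9342, 2: 1.8998, 3: 1.9112}
best_epoch = min(VALIDATION_LOSS, key=VALIDATION_LOSS.get)

if __name__ == "__main__":
    for epoch in range(NUM_EPOCHS + 1):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
    print(max_examples_per_epoch(), best_epoch)
```

Under these assumptions the learning rate falls to two thirds of its starting value by the end of the first epoch and reaches zero at step 21825. The table's lowest validation loss is at epoch 2 (1.8998), while the loss reported at the top of the card (1.9112) is the final-epoch value.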
config.json ADDED
@@ -0,0 +1,1978 @@
+ {
+   "_name_or_path": "/Users/teodorlungu/Downloads/mobilebert/multi",
+   "architectures": [
+     "MobileBertForQuestionAnswering"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_activation": true,
+   "classifier_dropout": null,
+   "embedding_size": 128,
+   "hidden_act": "relu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 512,
+   "initializer_range": 0.02,
+   "input_layers": {
+     "input_mask": [
+       "input_mask",
+       0,
+       0
+     ],
+     "input_type_ids": [
+       "input_type_ids",
+       0,
+       0
+     ],
+     "input_word_ids": [
+       "input_word_ids",
+       0,
+       0
+     ]
+   },
+   "intermediate_size": 512,
+   "intra_bottleneck_size": 128,
+   "key_query_shared_bottleneck": true,
+   "layer_norm_eps": 1e-12,
+   "layers": [
+     {
+       "class_name": "InputLayer",
+       "config": {
+         "batch_input_shape": [
+           null,
+           null
+         ],
+         "dtype": "int32",
+         "name": "input_mask",
+         "ragged": false,
+         "sparse": false
+       },
+       "inbound_nodes": [],
+       "name": "input_mask",
+       "shared_object_id": 0
+     },
+     {
+       "class_name": "InputLayer",
+       "config": {
+         "batch_input_shape": [
+           null,
+           null
+         ],
+         "dtype": "int32",
+         "name": "input_type_ids",
+         "ragged": false,
+         "sparse": false
+       },
+       "inbound_nodes": [],
+       "name": "input_type_ids",
+       "shared_object_id": 1
+     },
+     {
+       "class_name": "InputLayer",
+       "config": {
+         "batch_input_shape": [
+           null,
+           null
+         ],
+         "dtype": "int32",
+         "name": "input_word_ids",
+         "ragged": false,
+         "sparse": false
+       },
+       "inbound_nodes": [],
+       "name": "input_word_ids",
+       "shared_object_id": 2
+     },
+     {
+       "class_name": "Functional",
+       "config": {
+         "input_layers": [
+           [
+             "input_word_ids",
+             0,
+             0
+           ],
+           [
+             "input_mask",
+             0,
+             0
+           ],
+           [
+             "input_type_ids",
+             0,
+             0
+           ]
+         ],
+         "layers": [
+           {
+             "class_name": "InputLayer",
+             "config": {
+               "batch_input_shape": [
+                 null,
+                 null
+               ],
+               "dtype": "int32",
+               "name": "input_word_ids",
+               "ragged": false,
+               "sparse": false
+             },
+             "inbound_nodes": [],
+             "name": "input_word_ids",
+             "shared_object_id": 2
+           },
+           {
+             "class_name": "InputLayer",
+             "config": {
+               "batch_input_shape": [
+                 null,
+                 null
+               ],
+               "dtype": "int32",
+               "name": "input_type_ids",
+               "ragged": false,
+               "sparse": false
+             },
+             "inbound_nodes": [],
+             "name": "input_type_ids",
+             "shared_object_id": 1
+           },
+           {
+             "class_name": "InputLayer",
+             "config": {
+               "batch_input_shape": [
+                 null,
+                 null
+               ],
+               "dtype": "int32",
+               "name": "input_mask",
+               "ragged": false,
+               "sparse": false
+             },
+             "inbound_nodes": [],
+             "name": "input_mask",
+             "shared_object_id": 0
+           },
+           {
+             "class_name": "Text>MobileBertEmbedding",
+             "config": {
+               "dropout_rate": 0.0,
+               "dtype": "float32",
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "max_sequence_length": 512,
+               "name": "mobile_bert_embedding",
+               "normalization_type": "no_norm",
+               "output_embed_size": 512,
+               "trainable": true,
+               "type_vocab_size": 2,
+               "word_embed_size": 128,
+               "word_vocab_size": 119547
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "input_word_ids",
+                   0,
+                   0,
+                   {
+                     "token_type_ids": [
+                       "input_type_ids",
+                       0,
+                       0
+                     ]
+                   }
+                 ]
+               ]
+             ],
+             "name": "mobile_bert_embedding"
+           },
+           {
+             "class_name": "keras_nlp>SelfAttentionMask",
+             "config": {
+               "dtype": "float32",
+               "name": "self_attention_mask",
+               "trainable": true
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "input_mask",
+                   0,
+                   0,
+                   {
+                     "to_mask": [
+                       "input_mask",
+                       0,
+                       0
+                     ]
+                   }
+                 ]
+               ]
+             ],
+             "name": "self_attention_mask"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_0",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "mobile_bert_embedding",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_0"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_1",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_0",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_1"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_2",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_1",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_2"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_3",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_2",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_3"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_4",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_3",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_4"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_5",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_4",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_5"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_6",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_5",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_6"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_7",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_6",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_7"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_8",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_7",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_8"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_9",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_8",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_9"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_10",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_9",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_10"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_11",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_10",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_11"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_12",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_11",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_12"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_13",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_12",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_13"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_14",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_13",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_14"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_15",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_14",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_15"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_16",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_15",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_16"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_17",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_16",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_17"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_18",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_17",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_18"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_19",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_18",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_19"
+           },
+           {
+             "class_name": "Text>MobileBertTransformer",
+             "config": {
+               "attention_probs_dropout_prob": 0.1,
+               "dtype": "float32",
+               "hidden_dropout_prob": 0.0,
+               "hidden_size": 512,
+               "initializer": {
+                 "__passive_serialization__": true,
+                 "class_name": "TruncatedNormal",
+                 "config": {
+                   "mean": 0.0,
+                   "seed": null,
+                   "stddev": 0.02
+                 },
+                 "shared_object_id": 3
+               },
+               "intermediate_act_fn": "relu",
+               "intermediate_size": 512,
+               "intra_bottleneck_size": 128,
+               "key_query_shared_bottleneck": true,
+               "name": "transformer_layer_20",
+               "normalization_type": "no_norm",
+               "num_attention_heads": 4,
+               "num_feedforward_networks": 4,
+               "trainable": true,
+               "use_bottleneck_attention": false
+             },
+             "inbound_nodes": [
+               [
+                 [
+                   "transformer_layer_19",
+                   0,
+                   0,
+                   {
+                     "attention_mask": [
+                       "self_attention_mask",
+                       0,
+                       0
+                     ],
+                     "return_attention_scores": true
+                   }
+                 ]
+               ]
+             ],
+             "name": "transformer_layer_20"
+           },
1207
+ {
1208
+ "class_name": "Text>MobileBertTransformer",
1209
+ "config": {
1210
+ "attention_probs_dropout_prob": 0.1,
1211
+ "dtype": "float32",
1212
+ "hidden_dropout_prob": 0.0,
1213
+ "hidden_size": 512,
1214
+ "initializer": {
1215
+ "__passive_serialization__": true,
1216
+ "class_name": "TruncatedNormal",
1217
+ "config": {
1218
+ "mean": 0.0,
1219
+ "seed": null,
1220
+ "stddev": 0.02
1221
+ },
1222
+ "shared_object_id": 3
1223
+ },
1224
+ "intermediate_act_fn": "relu",
1225
+ "intermediate_size": 512,
1226
+ "intra_bottleneck_size": 128,
1227
+ "key_query_shared_bottleneck": true,
1228
+ "name": "transformer_layer_21",
1229
+ "normalization_type": "no_norm",
1230
+ "num_attention_heads": 4,
1231
+ "num_feedforward_networks": 4,
1232
+ "trainable": true,
1233
+ "use_bottleneck_attention": false
1234
+ },
1235
+ "inbound_nodes": [
1236
+ [
1237
+ [
1238
+ "transformer_layer_20",
1239
+ 0,
1240
+ 0,
1241
+ {
1242
+ "attention_mask": [
1243
+ "self_attention_mask",
1244
+ 0,
1245
+ 0
1246
+ ],
1247
+ "return_attention_scores": true
1248
+ }
1249
+ ]
1250
+ ]
1251
+ ],
1252
+ "name": "transformer_layer_21"
1253
+ },
1254
+ {
1255
+ "class_name": "Text>MobileBertTransformer",
1256
+ "config": {
1257
+ "attention_probs_dropout_prob": 0.1,
1258
+ "dtype": "float32",
1259
+ "hidden_dropout_prob": 0.0,
1260
+ "hidden_size": 512,
1261
+ "initializer": {
1262
+ "__passive_serialization__": true,
1263
+ "class_name": "TruncatedNormal",
1264
+ "config": {
1265
+ "mean": 0.0,
1266
+ "seed": null,
1267
+ "stddev": 0.02
1268
+ },
1269
+ "shared_object_id": 3
1270
+ },
1271
+ "intermediate_act_fn": "relu",
1272
+ "intermediate_size": 512,
1273
+ "intra_bottleneck_size": 128,
1274
+ "key_query_shared_bottleneck": true,
1275
+ "name": "transformer_layer_22",
1276
+ "normalization_type": "no_norm",
1277
+ "num_attention_heads": 4,
1278
+ "num_feedforward_networks": 4,
1279
+ "trainable": true,
1280
+ "use_bottleneck_attention": false
1281
+ },
1282
+ "inbound_nodes": [
1283
+ [
1284
+ [
1285
+ "transformer_layer_21",
1286
+ 0,
1287
+ 0,
1288
+ {
1289
+ "attention_mask": [
1290
+ "self_attention_mask",
1291
+ 0,
1292
+ 0
1293
+ ],
1294
+ "return_attention_scores": true
1295
+ }
1296
+ ]
1297
+ ]
1298
+ ],
1299
+ "name": "transformer_layer_22"
1300
+ },
1301
+ {
1302
+ "class_name": "Text>MobileBertTransformer",
1303
+ "config": {
1304
+ "attention_probs_dropout_prob": 0.1,
1305
+ "dtype": "float32",
1306
+ "hidden_dropout_prob": 0.0,
1307
+ "hidden_size": 512,
1308
+ "initializer": {
1309
+ "__passive_serialization__": true,
1310
+ "class_name": "TruncatedNormal",
1311
+ "config": {
1312
+ "mean": 0.0,
1313
+ "seed": null,
1314
+ "stddev": 0.02
1315
+ },
1316
+ "shared_object_id": 3
1317
+ },
1318
+ "intermediate_act_fn": "relu",
1319
+ "intermediate_size": 512,
1320
+ "intra_bottleneck_size": 128,
1321
+ "key_query_shared_bottleneck": true,
1322
+ "name": "transformer_layer_23",
1323
+ "normalization_type": "no_norm",
1324
+ "num_attention_heads": 4,
1325
+ "num_feedforward_networks": 4,
1326
+ "trainable": true,
1327
+ "use_bottleneck_attention": false
1328
+ },
1329
+ "inbound_nodes": [
1330
+ [
1331
+ [
1332
+ "transformer_layer_22",
1333
+ 0,
1334
+ 0,
1335
+ {
1336
+ "attention_mask": [
1337
+ "self_attention_mask",
1338
+ 0,
1339
+ 0
1340
+ ],
1341
+ "return_attention_scores": true
1342
+ }
1343
+ ]
1344
+ ]
1345
+ ],
1346
+ "name": "transformer_layer_23"
1347
+ },
1348
+ {
1349
+ "class_name": "SlicingOpLambda",
1350
+ "config": {
1351
+ "dtype": "float32",
1352
+ "function": "__operators__.getitem",
1353
+ "name": "tf.__operators__.getitem",
1354
+ "trainable": true
1355
+ },
1356
+ "inbound_nodes": [
1357
+ [
1358
+ "transformer_layer_23",
1359
+ 0,
1360
+ 0,
1361
+ {
1362
+ "slice_spec": [
1363
+ {
1364
+ "start": null,
1365
+ "step": null,
1366
+ "stop": null
1367
+ },
1368
+ {
1369
+ "start": 0,
1370
+ "step": null,
1371
+ "stop": 1
1372
+ },
1373
+ {
1374
+ "start": null,
1375
+ "step": null,
1376
+ "stop": null
1377
+ }
1378
+ ]
1379
+ }
1380
+ ]
1381
+ ],
1382
+ "name": "tf.__operators__.getitem"
1383
+ },
1384
+ {
1385
+ "class_name": "TFOpLambda",
1386
+ "config": {
1387
+ "dtype": "float32",
1388
+ "function": "compat.v1.squeeze",
1389
+ "name": "tf.compat.v1.squeeze",
1390
+ "trainable": true
1391
+ },
1392
+ "inbound_nodes": [
1393
+ [
1394
+ "tf.__operators__.getitem",
1395
+ 0,
1396
+ 0,
1397
+ {
1398
+ "axis": 1,
1399
+ "name": null
1400
+ }
1401
+ ]
1402
+ ],
1403
+ "name": "tf.compat.v1.squeeze"
1404
+ }
1405
+ ],
1406
+ "name": "mobile_bert_encoder",
1407
+ "output_layers": {
1408
+ "attention_scores": [
1409
+ [
1410
+ "transformer_layer_0",
1411
+ 0,
1412
+ 1
1413
+ ],
1414
+ [
1415
+ "transformer_layer_1",
1416
+ 0,
1417
+ 1
1418
+ ],
1419
+ [
1420
+ "transformer_layer_2",
1421
+ 0,
1422
+ 1
1423
+ ],
1424
+ [
1425
+ "transformer_layer_3",
1426
+ 0,
1427
+ 1
1428
+ ],
1429
+ [
1430
+ "transformer_layer_4",
1431
+ 0,
1432
+ 1
1433
+ ],
1434
+ [
1435
+ "transformer_layer_5",
1436
+ 0,
1437
+ 1
1438
+ ],
1439
+ [
1440
+ "transformer_layer_6",
1441
+ 0,
1442
+ 1
1443
+ ],
1444
+ [
1445
+ "transformer_layer_7",
1446
+ 0,
1447
+ 1
1448
+ ],
1449
+ [
1450
+ "transformer_layer_8",
1451
+ 0,
1452
+ 1
1453
+ ],
1454
+ [
1455
+ "transformer_layer_9",
1456
+ 0,
1457
+ 1
1458
+ ],
1459
+ [
1460
+ "transformer_layer_10",
1461
+ 0,
1462
+ 1
1463
+ ],
1464
+ [
1465
+ "transformer_layer_11",
1466
+ 0,
1467
+ 1
1468
+ ],
1469
+ [
1470
+ "transformer_layer_12",
1471
+ 0,
1472
+ 1
1473
+ ],
1474
+ [
1475
+ "transformer_layer_13",
1476
+ 0,
1477
+ 1
1478
+ ],
1479
+ [
1480
+ "transformer_layer_14",
1481
+ 0,
1482
+ 1
1483
+ ],
1484
+ [
1485
+ "transformer_layer_15",
1486
+ 0,
1487
+ 1
1488
+ ],
1489
+ [
1490
+ "transformer_layer_16",
1491
+ 0,
1492
+ 1
1493
+ ],
1494
+ [
1495
+ "transformer_layer_17",
1496
+ 0,
1497
+ 1
1498
+ ],
1499
+ [
1500
+ "transformer_layer_18",
1501
+ 0,
1502
+ 1
1503
+ ],
1504
+ [
1505
+ "transformer_layer_19",
1506
+ 0,
1507
+ 1
1508
+ ],
1509
+ [
1510
+ "transformer_layer_20",
1511
+ 0,
1512
+ 1
1513
+ ],
1514
+ [
1515
+ "transformer_layer_21",
1516
+ 0,
1517
+ 1
1518
+ ],
1519
+ [
1520
+ "transformer_layer_22",
1521
+ 0,
1522
+ 1
1523
+ ],
1524
+ [
1525
+ "transformer_layer_23",
1526
+ 0,
1527
+ 1
1528
+ ]
1529
+ ],
1530
+ "encoder_outputs": [
1531
+ [
1532
+ "mobile_bert_embedding",
1533
+ 0,
1534
+ 0
1535
+ ],
1536
+ [
1537
+ "transformer_layer_0",
1538
+ 0,
1539
+ 0
1540
+ ],
1541
+ [
1542
+ "transformer_layer_1",
1543
+ 0,
1544
+ 0
1545
+ ],
1546
+ [
1547
+ "transformer_layer_2",
1548
+ 0,
1549
+ 0
1550
+ ],
1551
+ [
1552
+ "transformer_layer_3",
1553
+ 0,
1554
+ 0
1555
+ ],
1556
+ [
1557
+ "transformer_layer_4",
1558
+ 0,
1559
+ 0
1560
+ ],
1561
+ [
1562
+ "transformer_layer_5",
1563
+ 0,
1564
+ 0
1565
+ ],
1566
+ [
1567
+ "transformer_layer_6",
1568
+ 0,
1569
+ 0
1570
+ ],
1571
+ [
1572
+ "transformer_layer_7",
1573
+ 0,
1574
+ 0
1575
+ ],
1576
+ [
1577
+ "transformer_layer_8",
1578
+ 0,
1579
+ 0
1580
+ ],
1581
+ [
1582
+ "transformer_layer_9",
1583
+ 0,
1584
+ 0
1585
+ ],
1586
+ [
1587
+ "transformer_layer_10",
1588
+ 0,
1589
+ 0
1590
+ ],
1591
+ [
1592
+ "transformer_layer_11",
1593
+ 0,
1594
+ 0
1595
+ ],
1596
+ [
1597
+ "transformer_layer_12",
1598
+ 0,
1599
+ 0
1600
+ ],
1601
+ [
1602
+ "transformer_layer_13",
1603
+ 0,
1604
+ 0
1605
+ ],
1606
+ [
1607
+ "transformer_layer_14",
1608
+ 0,
1609
+ 0
1610
+ ],
1611
+ [
1612
+ "transformer_layer_15",
1613
+ 0,
1614
+ 0
1615
+ ],
1616
+ [
1617
+ "transformer_layer_16",
1618
+ 0,
1619
+ 0
1620
+ ],
1621
+ [
1622
+ "transformer_layer_17",
1623
+ 0,
1624
+ 0
1625
+ ],
1626
+ [
1627
+ "transformer_layer_18",
1628
+ 0,
1629
+ 0
1630
+ ],
1631
+ [
1632
+ "transformer_layer_19",
1633
+ 0,
1634
+ 0
1635
+ ],
1636
+ [
1637
+ "transformer_layer_20",
1638
+ 0,
1639
+ 0
1640
+ ],
1641
+ [
1642
+ "transformer_layer_21",
1643
+ 0,
1644
+ 0
1645
+ ],
1646
+ [
1647
+ "transformer_layer_22",
1648
+ 0,
1649
+ 0
1650
+ ],
1651
+ [
1652
+ "transformer_layer_23",
1653
+ 0,
1654
+ 0
1655
+ ]
1656
+ ],
1657
+ "pooled_output": [
1658
+ "tf.compat.v1.squeeze",
1659
+ 0,
1660
+ 0
1661
+ ],
1662
+ "sequence_output": [
1663
+ "transformer_layer_23",
1664
+ 0,
1665
+ 0
1666
+ ]
1667
+ },
1668
+ "trainable": true
1669
+ },
1670
+ "inbound_nodes": [
1671
+ {
1672
+ "input_mask": [
1673
+ "input_mask",
1674
+ 0,
1675
+ 0,
1676
+ {}
1677
+ ],
1678
+ "input_type_ids": [
1679
+ "input_type_ids",
1680
+ 0,
1681
+ 0,
1682
+ {}
1683
+ ],
1684
+ "input_word_ids": [
1685
+ "input_word_ids",
1686
+ 0,
1687
+ 0,
1688
+ {}
1689
+ ]
1690
+ }
1691
+ ],
1692
+ "name": "mobile_bert_encoder"
1693
+ }
1694
+ ],
1695
+ "max_position_embeddings": 512,
1696
+ "model_type": "mobilebert",
1697
+ "name": "model",
1698
+ "normalization_type": "no_norm",
1699
+ "num_attention_heads": 4,
1700
+ "num_feedforward_networks": 4,
1701
+ "num_hidden_layers": 24,
1702
+ "output_layers": {
1703
+ "attention_scores": [
1704
+ [
1705
+ "mobile_bert_encoder",
1706
+ 1,
1707
+ 0
1708
+ ],
1709
+ [
1710
+ "mobile_bert_encoder",
1711
+ 1,
1712
+ 1
1713
+ ],
1714
+ [
1715
+ "mobile_bert_encoder",
1716
+ 1,
1717
+ 2
1718
+ ],
1719
+ [
1720
+ "mobile_bert_encoder",
1721
+ 1,
1722
+ 3
1723
+ ],
1724
+ [
1725
+ "mobile_bert_encoder",
1726
+ 1,
1727
+ 4
1728
+ ],
1729
+ [
1730
+ "mobile_bert_encoder",
1731
+ 1,
1732
+ 5
1733
+ ],
1734
+ [
1735
+ "mobile_bert_encoder",
1736
+ 1,
1737
+ 6
1738
+ ],
1739
+ [
1740
+ "mobile_bert_encoder",
1741
+ 1,
1742
+ 7
1743
+ ],
1744
+ [
1745
+ "mobile_bert_encoder",
1746
+ 1,
1747
+ 8
1748
+ ],
1749
+ [
1750
+ "mobile_bert_encoder",
1751
+ 1,
1752
+ 9
1753
+ ],
1754
+ [
1755
+ "mobile_bert_encoder",
1756
+ 1,
1757
+ 10
1758
+ ],
1759
+ [
1760
+ "mobile_bert_encoder",
1761
+ 1,
1762
+ 11
1763
+ ],
1764
+ [
1765
+ "mobile_bert_encoder",
1766
+ 1,
1767
+ 12
1768
+ ],
1769
+ [
1770
+ "mobile_bert_encoder",
1771
+ 1,
1772
+ 13
1773
+ ],
1774
+ [
1775
+ "mobile_bert_encoder",
1776
+ 1,
1777
+ 14
1778
+ ],
1779
+ [
1780
+ "mobile_bert_encoder",
1781
+ 1,
1782
+ 15
1783
+ ],
1784
+ [
1785
+ "mobile_bert_encoder",
1786
+ 1,
1787
+ 16
1788
+ ],
1789
+ [
1790
+ "mobile_bert_encoder",
1791
+ 1,
1792
+ 17
1793
+ ],
1794
+ [
1795
+ "mobile_bert_encoder",
1796
+ 1,
1797
+ 18
1798
+ ],
1799
+ [
1800
+ "mobile_bert_encoder",
1801
+ 1,
1802
+ 19
1803
+ ],
1804
+ [
1805
+ "mobile_bert_encoder",
1806
+ 1,
1807
+ 20
1808
+ ],
1809
+ [
1810
+ "mobile_bert_encoder",
1811
+ 1,
1812
+ 21
1813
+ ],
1814
+ [
1815
+ "mobile_bert_encoder",
1816
+ 1,
1817
+ 22
1818
+ ],
1819
+ [
1820
+ "mobile_bert_encoder",
1821
+ 1,
1822
+ 23
1823
+ ]
1824
+ ],
1825
+ "default": [
1826
+ "mobile_bert_encoder",
1827
+ 1,
1828
+ 49
1829
+ ],
1830
+ "encoder_outputs": [
1831
+ [
1832
+ "mobile_bert_encoder",
1833
+ 1,
1834
+ 24
1835
+ ],
1836
+ [
1837
+ "mobile_bert_encoder",
1838
+ 1,
1839
+ 25
1840
+ ],
1841
+ [
1842
+ "mobile_bert_encoder",
1843
+ 1,
1844
+ 26
1845
+ ],
1846
+ [
1847
+ "mobile_bert_encoder",
1848
+ 1,
1849
+ 27
1850
+ ],
1851
+ [
1852
+ "mobile_bert_encoder",
1853
+ 1,
1854
+ 28
1855
+ ],
1856
+ [
1857
+ "mobile_bert_encoder",
1858
+ 1,
1859
+ 29
1860
+ ],
1861
+ [
1862
+ "mobile_bert_encoder",
1863
+ 1,
1864
+ 30
1865
+ ],
1866
+ [
1867
+ "mobile_bert_encoder",
1868
+ 1,
1869
+ 31
1870
+ ],
1871
+ [
1872
+ "mobile_bert_encoder",
1873
+ 1,
1874
+ 32
1875
+ ],
1876
+ [
1877
+ "mobile_bert_encoder",
1878
+ 1,
1879
+ 33
1880
+ ],
1881
+ [
1882
+ "mobile_bert_encoder",
1883
+ 1,
1884
+ 34
1885
+ ],
1886
+ [
1887
+ "mobile_bert_encoder",
1888
+ 1,
1889
+ 35
1890
+ ],
1891
+ [
1892
+ "mobile_bert_encoder",
1893
+ 1,
1894
+ 36
1895
+ ],
1896
+ [
1897
+ "mobile_bert_encoder",
1898
+ 1,
1899
+ 37
1900
+ ],
1901
+ [
1902
+ "mobile_bert_encoder",
1903
+ 1,
1904
+ 38
1905
+ ],
1906
+ [
1907
+ "mobile_bert_encoder",
1908
+ 1,
1909
+ 39
1910
+ ],
1911
+ [
1912
+ "mobile_bert_encoder",
1913
+ 1,
1914
+ 40
1915
+ ],
1916
+ [
1917
+ "mobile_bert_encoder",
1918
+ 1,
1919
+ 41
1920
+ ],
1921
+ [
1922
+ "mobile_bert_encoder",
1923
+ 1,
1924
+ 42
1925
+ ],
1926
+ [
1927
+ "mobile_bert_encoder",
1928
+ 1,
1929
+ 43
1930
+ ],
1931
+ [
1932
+ "mobile_bert_encoder",
1933
+ 1,
1934
+ 44
1935
+ ],
1936
+ [
1937
+ "mobile_bert_encoder",
1938
+ 1,
1939
+ 45
1940
+ ],
1941
+ [
1942
+ "mobile_bert_encoder",
1943
+ 1,
1944
+ 46
1945
+ ],
1946
+ [
1947
+ "mobile_bert_encoder",
1948
+ 1,
1949
+ 47
1950
+ ],
1951
+ [
1952
+ "mobile_bert_encoder",
1953
+ 1,
1954
+ 48
1955
+ ]
1956
+ ],
1957
+ "pooled_output": [
1958
+ "mobile_bert_encoder",
1959
+ 1,
1960
+ 49
1961
+ ],
1962
+ "sequence_output": [
1963
+ "mobile_bert_encoder",
1964
+ 1,
1965
+ 50
1966
+ ]
1967
+ },
1968
+ "pad_token_id": 0,
1969
+ "torch_dtype": "float32",
1970
+ "trainable": true,
1971
+ "transformers_version": "4.33.3",
1972
+ "trigram_input": true,
1973
+ "true_hidden_size": 128,
1974
+ "type_vocab_size": 2,
1975
+ "use_bottleneck": true,
1976
+ "use_bottleneck_attention": false,
1977
+ "vocab_size": 30522
1978
+ }
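The top-level `output_layers` map in `config.json` addresses the encoder's outputs by flat index: 24 attention-score tensors (0 to 23), 25 encoder outputs (24 to 48: the embedding output plus one per transformer layer), then `pooled_output` at 49 and `sequence_output` at 50, with `default` pointing at the same slot as `pooled_output`. A minimal sketch that rebuilds this layout, assuming the flat order is simply the encoder's output names in sorted order (an inference from the indices, not something the config states); the helper name `flat_output_index` is hypothetical:

```python
NUM_LAYERS = 24  # "num_hidden_layers" in config.json

# Output names that hold one tensor per layer (lists), as opposed to a single tensor.
LIST_VALUED = {"attention_scores", "encoder_outputs"}


def flat_output_index(num_layers: int = NUM_LAYERS) -> dict:
    """Map each encoder output name to its flat index (or list of indices).

    Assumption: outputs are flattened in sorted-name order.
    """
    sizes = {
        "attention_scores": num_layers,     # one per transformer layer
        "encoder_outputs": num_layers + 1,  # embedding output + one per layer
        "pooled_output": 1,
        "sequence_output": 1,
    }
    layout = {}
    cursor = 0
    for name in sorted(sizes):
        indices = list(range(cursor, cursor + sizes[name]))
        layout[name] = indices if name in LIST_VALUED else indices[0]
        cursor += sizes[name]
    return layout


layout = flat_output_index()
```

With 24 layers the computed slots agree with the indices listed in the config, which is a quick way to sanity-check an edited copy of the file.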
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4241f317fd568b0f0cfaba919daa00bb75ae2877c1569ce4b90091cdcb8e97e1
+ size 98705378
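The three lines added for `pytorch_model.bin` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest, and the byte size of the real file. A minimal sketch, assuming you have downloaded the weights separately and want to check them against the pointer; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical:

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["hash_algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so a ~100 MB weights file is never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4241f317fd568b0f0cfaba919daa00bb75ae2877c1569ce4b90091cdcb8e97e1
size 98705378
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy would then confirm that the download is complete and uncorrupted.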
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MobileBertTokenizer",
+ "unk_token": "[UNK]"
+ }
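The `model_max_length` value in `tokenizer_config.json` equals `int(1e30)`, the placeholder `transformers` writes when no maximum length was set for the tokenizer, whereas `config.json` sets `max_position_embeddings` to 512. A minimal sketch, assuming a caller wants to derive a usable truncation length from the two values; the helper name `effective_max_length` is hypothetical:

```python
# Placeholder value meaning "no limit configured"; int(1e30) is not exactly
# 10**30 because 1e30 is first rounded to the nearest double.
UNSET_MAX_LENGTH = int(1e30)


def effective_max_length(model_max_length: int, max_position_embeddings: int) -> int:
    """Pick a truncation length the model's position embeddings can cover."""
    return min(model_max_length, max_position_embeddings)


max_length = effective_max_length(UNSET_MAX_LENGTH, 512)
```

The result can then be passed explicitly when tokenizing, for example with `truncation=True, max_length=max_length`, so that long contexts are not fed past the 512 positions the model supports.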
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1caeb1ce90fb5cfc845a282f3149fe6949aff03d0f871662f2297af2cdb26d9d
+ size 4027
vocab.txt ADDED
The diff for this file is too large to render. See raw diff