Commit 31bcaf3 (verified) by mavis-ai · 1 parent: ee18880

Upload Multilingual-e5-large-Q8

Files changed (11)
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. LICENSE +25 -0
  4. NOTICE +41 -0
  5. README.md +128 -0
  6. config.json +88 -0
  7. quantization.json +3207 -0
  8. special_tokens_map.json +51 -0
  9. tokenizer.json +3 -0
  10. tokenizer_config.json +54 -0
  11. weights.00.safetensors +3 -0
.DS_Store ADDED
Binary file (6.15 kB)
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,25 @@
1
+ MIT License
2
+
3
+ Copyright (c) Microsoft Corporation and contributors
4
+ Copyright (c) Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei and contributors
5
+ Copyright (c) 2026 MAVIS / R.E.V.I.S. contributors for q8 packaging metadata and conversion scripts, if any
6
+
7
+ Permission is hereby granted, free of charge, to any person obtaining a copy
8
+ of this software and associated documentation files, model files, weights,
9
+ configuration files, tokenizer files, and related materials (the "Software"),
10
+ to deal in the Software without restriction, including without limitation the
11
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
12
+ copies of the Software, and to permit persons to whom the Software is
13
+ furnished to do so, subject to the following conditions:
14
+
15
+ The above copyright notice, this permission notice, and any attribution notices
16
+ included with the Software shall be included in all copies or substantial
17
+ portions of the Software.
18
+
19
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22
+ AUTHORS, COPYRIGHT HOLDERS, OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM, DAMAGES
23
+ OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
24
+ ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
25
+ DEALINGS IN THE SOFTWARE.
NOTICE ADDED
@@ -0,0 +1,41 @@
1
+ NOTICE
2
+
3
+ This repository redistributes an 8-bit quantized package derived from:
4
+
5
+ intfloat/multilingual-e5-large
6
+ https://huggingface.co/intfloat/multilingual-e5-large
7
+
8
+ The original model is released under the MIT License.
9
+
10
+ Original authors / associated paper:
11
+
12
+ Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
13
+ Multilingual E5 Text Embeddings: A Technical Report
14
+
15
+ Base architecture:
16
+
17
+ XLM-RoBERTa large
18
+
19
+ Embedding dimension:
20
+
21
+ 1024
22
+
23
+ Modification notice:
24
+
25
+ This repository contains a R.E.V.I.S. q8 quantized distribution.
26
+ Selected 2D weight tensors were converted to symmetric per-row int8
27
+ representation with per-row scale tensors.
28
+
29
+ No fine-tuning, additional training, or architecture-level modification
30
+ has been applied.
31
+
32
+ Redistributor:
33
+
34
+ MAVIS / R.E.V.I.S.
35
+
36
+ Purpose:
37
+
38
+ Local semantic embedding, semantic recall, RAG retrieval, and multilingual
39
+ semantic search in the R.E.V.I.S. local Cognitive OS ecosystem.
40
+
41
+ The upstream MIT License text is preserved in LICENSE.
README.md ADDED
@@ -0,0 +1,128 @@
1
+ ---
2
+ library_name: mlx
3
+ license: mit
4
+ pipeline_tag: feature-extraction
5
+ base_model: intfloat/multilingual-e5-large
6
+ tags:
7
+ - mlx
8
+ - embeddings
9
+ - sentence-transformers
10
+ - xlm-roberta
11
+ - multilingual
12
+ - quantized
13
+ - int8
14
+ - q8
15
+ - revis
16
+ ---
17
+
18
+ # mavis-ai/multilingual-e5-large-q8
19
+
20
+ This repository contains an 8-bit quantized MLX-compatible distribution of `intfloat/multilingual-e5-large`, prepared for use with **R.E.V.I.S.** as its local semantic embedding model.
21
+
22
+ The model is intended for local text embedding, semantic recall, RAG retrieval, and multilingual semantic search workflows.
23
+
24
+ ## Important Notice
25
+
26
+ This repository is hosted primarily as a dedicated download source for the R.E.V.I.S. application ecosystem. You are free to download and use this model package for your own local embedding or MLX workflows, subject to the MIT License and the attribution notices included in this repository.
27
+
28
+ This package is **not** a new embedding model and has **not** been fine-tuned. It is a quantized redistribution of `intfloat/multilingual-e5-large`.
29
+
30
+ For the original model card, training details, intended usage, and evaluation information, refer to the official upstream model:
31
+
32
+ - Original model: <https://huggingface.co/intfloat/multilingual-e5-large>
33
+ - Base architecture: XLM-RoBERTa large
34
+ - Embedding size: 1024
35
+
36
+ ## Quantization
37
+
38
+ This package stores selected 2D weight tensors using a R.E.V.I.S. q8 format:
39
+
40
+ - Quantization type: symmetric per-row int8
41
+ - Scale format: per-row scale tensor
42
+ - Expected dequantization: `weight = qweight.astype(float16) * scale[:, None].astype(float16)`
43
+
44
+ Typical tensor layout:
45
+
46
+ ```text
47
+ encoder.layer.0.attention.self.query.weight.qweight
48
+ encoder.layer.0.attention.self.query.weight.scale
49
+ ```
50
+
51
+ Non-quantized tensors, such as LayerNorm parameters, bias tensors, and other small metadata tensors, are preserved in their original floating-point representation.
52
+
53
+ This format is optimized for smaller download and storage size. In the current R.E.V.I.S. runtime, q8 tensors may be dequantized to floating point at load time for compatibility with the existing embedding forward path.
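
Review note: the Quantization section above describes symmetric per-row int8 storage with one scale per row. The conversion script is not part of this commit, so the following is only a minimal pure-Python sketch of that scheme, not the R.E.V.I.S. implementation. The function names, the choice of 127 as the largest quantized magnitude, and the handling of all-zero rows are assumptions made for illustration. The README writes the dequantization product in float16 while the `dequantization` string in `quantization.json` uses float32; the sketch works on plain Python floats and does not settle that question.

```python
from typing import List, Tuple

Row = List[float]


def quantize_rows(weight: List[Row]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row int8: one scale per row and no zero point."""
    qweight: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max((abs(v) for v in row), default=0.0)
        # Assumption: the largest magnitude in the row maps to 127.
        # An all-zero row gets scale 1.0 so dequantization stays defined.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        qweight.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return qweight, scales


def dequantize_rows(qweight: List[List[int]], scales: List[float]) -> List[Row]:
    """weight[i][j] = qweight[i][j] * scale[i], i.e. qweight * scale[:, None]."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]


# Tiny illustrative matrix; real tensors are e.g. 1024 x 1024.
weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
qweight, scales = quantize_rows(weight)
restored = dequantize_rows(qweight, scales)
```

With this scheme each restored value differs from the original by at most half of its row's scale, which is why rows with small magnitudes keep proportionally small errors.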
54
+
55
+ ## Optimized for R.E.V.I.S. (Local Cognitive OS)
56
+
57
+ We host this model package to serve as the local semantic embedding engine for **R.E.V.I.S.**
58
+
59
+ **R.E.V.I.S.** is a 100% local Cognitive OS for Multi-Agentic AI. It transforms your Mac devices into a distributed Agentic Swarm via zero-config Wi-Fi clustering, allowing you to run heavy AI workloads—like recursive web research, dynamic RAG generation, and multi-step logic—without killing single-machine performance.
60
+
61
+ If you are interested in pushing the absolute limits of local AI and open-weight models, check out our project.
62
+
63
+ - Official Website: <https://mavis-ai.co.jp/revis/>
64
+ - Watch the 13-min Raw Demo (Multi-node Dynamic RAG): <https://x.gd/LxaBF>
65
+ - Follow our updates on X: <https://x.com/mavis_ai_jp>
66
+
67
+ ## Usage Notes
68
+
69
+ For retrieval-style tasks, E5 models typically use different text prefixes for queries and passages. R.E.V.I.S. applies its own canonical query and passage formatting internally.
70
+
71
+ If you use this package outside R.E.V.I.S., refer to the upstream E5 instructions for recommended prompt prefixes and pooling behavior.
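
Review note: for readers using the package outside R.E.V.I.S., the upstream E5 convention mentioned above can be sketched as follows. The upstream model card uses the prefixes `query: ` and `passage: ` and average pooling over the attention-masked token states, followed by L2 normalization. The helper names below are hypothetical, the vectors are toy values rather than model output, and nothing here describes the formatting R.E.V.I.S. applies internally.

```python
import math
from typing import List

Vector = List[float]


def format_query(text: str) -> str:
    # Upstream E5 convention for search queries.
    return f"query: {text}"


def format_passage(text: str) -> str:
    # Upstream E5 convention for documents to be retrieved.
    return f"passage: {text}"


def mean_pool(hidden: List[Vector], mask: List[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is non-zero."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec: Vector) -> Vector:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# Toy example: three token vectors, the last one is padding.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
embedding = l2_normalize(pooled)
```

Mixing prefixed and unprefixed text, or pooling over padding tokens, tends to degrade retrieval quality, so both sides of a query/passage comparison should go through the same formatting and pooling path.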
72
+
73
+ ## Files
74
+
75
+ Recommended repository files:
76
+
77
+ ```text
78
+ README.md
79
+ LICENSE
80
+ NOTICE
81
+ weights.00.safetensors
82
+ config.json
83
+ tokenizer.json
84
+ tokenizer_config.json
85
+ special_tokens_map.json
86
+ quantization.json
87
+ ```
88
+
89
+ ## License
90
+
91
+ This repository redistributes a quantized package derived from `intfloat/multilingual-e5-large`, which is released under the **MIT License**.
92
+
93
+ The upstream copyright notice and MIT License text are preserved in `LICENSE`.
94
+
95
+ Additional attribution and redistribution notes are included in `NOTICE`.
96
+
97
+ ## Attribution
98
+
99
+ Original model:
100
+
101
+ ```text
102
+ intfloat/multilingual-e5-large
103
+ https://huggingface.co/intfloat/multilingual-e5-large
104
+ ```
105
+
106
+ Original authors / associated paper:
107
+
108
+ ```text
109
+ Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
110
+ Multilingual E5 Text Embeddings: A Technical Report
111
+ ```
112
+
113
+ R.E.V.I.S. q8 package:
114
+
115
+ ```text
116
+ Prepared and redistributed by MAVIS / R.E.V.I.S.
117
+ Quantization: symmetric per-row int8 q8 package for local MLX embedding runtime
118
+ ```
119
+
120
+ ## Modification Notice
121
+
122
+ Compared with the upstream `intfloat/multilingual-e5-large` release, this repository applies the following packaging modification:
123
+
124
+ ```text
125
+ Selected 2D weight tensors were quantized to symmetric per-row int8 q8 representation.
126
+ ```
127
+
128
+ No fine-tuning, additional training, or architecture-level modification has been applied.
config.json ADDED
@@ -0,0 +1,88 @@
1
+ {
2
+ "return_dict": true,
3
+ "output_hidden_states": false,
4
+ "output_attentions": false,
5
+ "torchscript": false,
6
+ "torch_dtype": "float32",
7
+ "use_bfloat16": false,
8
+ "tf_legacy_loss": false,
9
+ "pruned_heads": {},
10
+ "tie_word_embeddings": true,
11
+ "is_encoder_decoder": false,
12
+ "is_decoder": false,
13
+ "cross_attention_hidden_size": null,
14
+ "add_cross_attention": false,
15
+ "tie_encoder_decoder": false,
16
+ "max_length": 20,
17
+ "min_length": 0,
18
+ "do_sample": false,
19
+ "early_stopping": false,
20
+ "num_beams": 1,
21
+ "num_beam_groups": 1,
22
+ "diversity_penalty": 0.0,
23
+ "temperature": 1.0,
24
+ "top_k": 50,
25
+ "top_p": 1.0,
26
+ "typical_p": 1.0,
27
+ "repetition_penalty": 1.0,
28
+ "length_penalty": 1.0,
29
+ "no_repeat_ngram_size": 0,
30
+ "encoder_no_repeat_ngram_size": 0,
31
+ "bad_words_ids": null,
32
+ "num_return_sequences": 1,
33
+ "chunk_size_feed_forward": 0,
34
+ "output_scores": false,
35
+ "return_dict_in_generate": false,
36
+ "forced_bos_token_id": null,
37
+ "forced_eos_token_id": null,
38
+ "remove_invalid_values": false,
39
+ "exponential_decay_length_penalty": null,
40
+ "suppress_tokens": null,
41
+ "begin_suppress_tokens": null,
42
+ "architectures": [
43
+ "XLMRobertaModel"
44
+ ],
45
+ "finetuning_task": null,
46
+ "id2label": {
47
+ "0": "LABEL_0",
48
+ "1": "LABEL_1"
49
+ },
50
+ "label2id": {
51
+ "LABEL_0": 0,
52
+ "LABEL_1": 1
53
+ },
54
+ "tokenizer_class": null,
55
+ "prefix": null,
56
+ "bos_token_id": 0,
57
+ "pad_token_id": 1,
58
+ "eos_token_id": 2,
59
+ "sep_token_id": null,
60
+ "decoder_start_token_id": null,
61
+ "task_specific_params": null,
62
+ "problem_type": null,
63
+ "_name_or_path": "/Users/katopz/.cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719",
64
+ "transformers_version": "4.36.2",
65
+ "model_type": "xlm-roberta",
66
+ "output_past": true,
67
+ "vocab_size": 250002,
68
+ "hidden_size": 1024,
69
+ "num_hidden_layers": 24,
70
+ "num_attention_heads": 16,
71
+ "hidden_act": "gelu",
72
+ "intermediate_size": 4096,
73
+ "hidden_dropout_prob": 0.1,
74
+ "attention_probs_dropout_prob": 0.1,
75
+ "max_position_embeddings": 514,
76
+ "type_vocab_size": 1,
77
+ "initializer_range": 0.02,
78
+ "layer_norm_eps": 1e-05,
79
+ "position_embedding_type": "absolute",
80
+ "use_cache": true,
81
+ "classifier_dropout": null,
82
+ "revis_quantization": {
83
+ "format": "revis-xlm-roberta-e5-q8",
84
+ "bits": 8,
85
+ "type": "symmetric-per-row",
86
+ "manifest": "quantization.json"
87
+ }
88
+ }
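
Review note: the `revis_quantization` block above points a loader at `quantization.json`, whose entries name the `.qweight` and `.scale` tensors. The sketch below shows one way a loader might follow that pointer and derive the tensor keys; it is not the R.E.V.I.S. loader. The JSON snippets are trimmed copies of fields from this commit, the function names are hypothetical, and the 2-byte scale size is an assumption (the manifest's `"dtype": "float16"` may describe either the scale or the original tensor).

```python
import json
from typing import Dict, List, Tuple

# Trimmed copies of fields that appear in config.json and quantization.json.
config = json.loads(
    '{"revis_quantization": {"format": "revis-xlm-roberta-e5-q8", '
    '"bits": 8, "type": "symmetric-per-row", "manifest": "quantization.json"}}'
)
manifest = json.loads(
    '{"quantization": {"quantizedTensorSuffix": ".qweight", '
    '"scaleTensorSuffix": ".scale"}, '
    '"quantized": [{"name": "encoder.layer.0.attention.self.query.weight", '
    '"shape": [1024, 1024], "dtype": "float16"}]}'
)


def manifest_path(cfg: Dict) -> str:
    """File name of the quantization manifest referenced by the config."""
    return cfg["revis_quantization"]["manifest"]


def tensor_keys(mf: Dict, name: str) -> Tuple[str, str]:
    """Keys of the int8 tensor and its per-row scale for a weight name."""
    q = mf["quantization"]
    return name + q["quantizedTensorSuffix"], name + q["scaleTensorSuffix"]


def q8_bytes(shape: List[int], scale_bytes: int = 2) -> int:
    """Storage estimate: one byte per weight plus one scale per row.

    scale_bytes=2 assumes float16 scales, which the source does not confirm.
    """
    rows, cols = shape
    return rows * cols + rows * scale_bytes


entry = manifest["quantized"][0]
keys = tensor_keys(manifest, entry["name"])
size_q8 = q8_bytes(entry["shape"])
size_fp16 = entry["shape"][0] * entry["shape"][1] * 2
```

Under these assumptions a 1024 x 1024 projection takes 1,050,624 bytes in q8 form against 2,097,152 bytes in float16, roughly half, which matches the README's stated goal of smaller download and storage size.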
quantization.json ADDED
@@ -0,0 +1,3207 @@
1
+ {
2
+ "schemaVersion": 1,
3
+ "format": "revis-xlm-roberta-e5-q8",
4
+ "source": "intfloat/multilingual-e5-large",
5
+ "baseWeights": "weights.00.safetensors",
6
+ "quantizedWeights": "weights.00.safetensors",
7
+ "quantization": {
8
+ "type": "symmetric-per-row",
9
+ "bits": 8,
10
+ "quantizedTensorSuffix": ".qweight",
11
+ "scaleTensorSuffix": ".scale",
12
+ "dequantization": "weight = qweight.astype(float32) * scale[:, None]"
13
+ },
14
+ "quantized": [
15
+ {
16
+ "name": "embeddings.position_embeddings.weight",
17
+ "shape": [
18
+ 514,
19
+ 1024
20
+ ],
21
+ "dtype": "float16",
22
+ "qweight": "embeddings.position_embeddings.weight.qweight",
23
+ "scale": "embeddings.position_embeddings.weight.scale"
24
+ },
25
+ {
26
+ "name": "embeddings.token_type_embeddings.weight",
27
+ "shape": [
28
+ 1,
29
+ 1024
30
+ ],
31
+ "dtype": "float16",
32
+ "qweight": "embeddings.token_type_embeddings.weight.qweight",
33
+ "scale": "embeddings.token_type_embeddings.weight.scale"
34
+ },
35
+ {
36
+ "name": "embeddings.word_embeddings.weight",
37
+ "shape": [
38
+ 250002,
39
+ 1024
40
+ ],
41
+ "dtype": "float16",
42
+ "qweight": "embeddings.word_embeddings.weight.qweight",
43
+ "scale": "embeddings.word_embeddings.weight.scale"
44
+ },
45
+ {
46
+ "name": "encoder.layer.0.attention.output.dense.weight",
47
+ "shape": [
48
+ 1024,
49
+ 1024
50
+ ],
51
+ "dtype": "float16",
52
+ "qweight": "encoder.layer.0.attention.output.dense.weight.qweight",
53
+ "scale": "encoder.layer.0.attention.output.dense.weight.scale"
54
+ },
55
+ {
56
+ "name": "encoder.layer.0.attention.self.key.weight",
57
+ "shape": [
58
+ 1024,
59
+ 1024
60
+ ],
61
+ "dtype": "float16",
62
+ "qweight": "encoder.layer.0.attention.self.key.weight.qweight",
63
+ "scale": "encoder.layer.0.attention.self.key.weight.scale"
64
+ },
65
+ {
66
+ "name": "encoder.layer.0.attention.self.query.weight",
67
+ "shape": [
68
+ 1024,
69
+ 1024
70
+ ],
71
+ "dtype": "float16",
72
+ "qweight": "encoder.layer.0.attention.self.query.weight.qweight",
73
+ "scale": "encoder.layer.0.attention.self.query.weight.scale"
74
+ },
75
+ {
76
+ "name": "encoder.layer.0.attention.self.value.weight",
77
+ "shape": [
78
+ 1024,
79
+ 1024
80
+ ],
81
+ "dtype": "float16",
82
+ "qweight": "encoder.layer.0.attention.self.value.weight.qweight",
83
+ "scale": "encoder.layer.0.attention.self.value.weight.scale"
84
+ },
85
+ {
86
+ "name": "encoder.layer.0.intermediate.dense.weight",
87
+ "shape": [
88
+ 4096,
89
+ 1024
90
+ ],
91
+ "dtype": "float16",
92
+ "qweight": "encoder.layer.0.intermediate.dense.weight.qweight",
93
+ "scale": "encoder.layer.0.intermediate.dense.weight.scale"
94
+ },
95
+ {
96
+ "name": "encoder.layer.0.output.dense.weight",
97
+ "shape": [
98
+ 1024,
99
+ 4096
100
+ ],
101
+ "dtype": "float16",
102
+ "qweight": "encoder.layer.0.output.dense.weight.qweight",
103
+ "scale": "encoder.layer.0.output.dense.weight.scale"
104
+ },
105
+ {
106
+ "name": "encoder.layer.1.attention.output.dense.weight",
107
+ "shape": [
108
+ 1024,
109
+ 1024
110
+ ],
111
+ "dtype": "float16",
112
+ "qweight": "encoder.layer.1.attention.output.dense.weight.qweight",
113
+ "scale": "encoder.layer.1.attention.output.dense.weight.scale"
114
+ },
115
+ {
116
+ "name": "encoder.layer.1.attention.self.key.weight",
117
+ "shape": [
118
+ 1024,
119
+ 1024
120
+ ],
121
+ "dtype": "float16",
122
+ "qweight": "encoder.layer.1.attention.self.key.weight.qweight",
123
+ "scale": "encoder.layer.1.attention.self.key.weight.scale"
124
+ },
125
+ {
126
+ "name": "encoder.layer.1.attention.self.query.weight",
127
+ "shape": [
128
+ 1024,
129
+ 1024
130
+ ],
131
+ "dtype": "float16",
132
+ "qweight": "encoder.layer.1.attention.self.query.weight.qweight",
133
+ "scale": "encoder.layer.1.attention.self.query.weight.scale"
134
+ },
135
+ {
136
+ "name": "encoder.layer.1.attention.self.value.weight",
137
+ "shape": [
138
+ 1024,
139
+ 1024
140
+ ],
141
+ "dtype": "float16",
142
+ "qweight": "encoder.layer.1.attention.self.value.weight.qweight",
143
+ "scale": "encoder.layer.1.attention.self.value.weight.scale"
144
+ },
145
+ {
146
+ "name": "encoder.layer.1.intermediate.dense.weight",
147
+ "shape": [
148
+ 4096,
149
+ 1024
150
+ ],
151
+ "dtype": "float16",
152
+ "qweight": "encoder.layer.1.intermediate.dense.weight.qweight",
153
+ "scale": "encoder.layer.1.intermediate.dense.weight.scale"
154
+ },
155
+ {
156
+ "name": "encoder.layer.1.output.dense.weight",
157
+ "shape": [
158
+ 1024,
159
+ 4096
160
+ ],
161
+ "dtype": "float16",
162
+ "qweight": "encoder.layer.1.output.dense.weight.qweight",
163
+ "scale": "encoder.layer.1.output.dense.weight.scale"
164
+ },
165
+ {
166
+ "name": "encoder.layer.10.attention.output.dense.weight",
167
+ "shape": [
168
+ 1024,
169
+ 1024
170
+ ],
171
+ "dtype": "float16",
172
+ "qweight": "encoder.layer.10.attention.output.dense.weight.qweight",
173
+ "scale": "encoder.layer.10.attention.output.dense.weight.scale"
174
+ },
175
+ {
176
+ "name": "encoder.layer.10.attention.self.key.weight",
177
+ "shape": [
178
+ 1024,
179
+ 1024
180
+ ],
181
+ "dtype": "float16",
182
+ "qweight": "encoder.layer.10.attention.self.key.weight.qweight",
183
+ "scale": "encoder.layer.10.attention.self.key.weight.scale"
184
+ },
185
+ {
186
+ "name": "encoder.layer.10.attention.self.query.weight",
187
+ "shape": [
188
+ 1024,
189
+ 1024
190
+ ],
191
+ "dtype": "float16",
192
+ "qweight": "encoder.layer.10.attention.self.query.weight.qweight",
193
+ "scale": "encoder.layer.10.attention.self.query.weight.scale"
194
+ },
195
+ {
196
+ "name": "encoder.layer.10.attention.self.value.weight",
197
+ "shape": [
198
+ 1024,
199
+ 1024
200
+ ],
201
+ "dtype": "float16",
202
+ "qweight": "encoder.layer.10.attention.self.value.weight.qweight",
203
+ "scale": "encoder.layer.10.attention.self.value.weight.scale"
204
+ },
205
+ {
206
+ "name": "encoder.layer.10.intermediate.dense.weight",
207
+ "shape": [
208
+ 4096,
209
+ 1024
210
+ ],
211
+ "dtype": "float16",
212
+ "qweight": "encoder.layer.10.intermediate.dense.weight.qweight",
213
+ "scale": "encoder.layer.10.intermediate.dense.weight.scale"
214
+ },
215
+ {
216
+ "name": "encoder.layer.10.output.dense.weight",
217
+ "shape": [
218
+ 1024,
219
+ 4096
220
+ ],
221
+ "dtype": "float16",
222
+ "qweight": "encoder.layer.10.output.dense.weight.qweight",
223
+ "scale": "encoder.layer.10.output.dense.weight.scale"
224
+ },
225
+ {
226
+ "name": "encoder.layer.11.attention.output.dense.weight",
227
+ "shape": [
228
+ 1024,
229
+ 1024
230
+ ],
231
+ "dtype": "float16",
232
+ "qweight": "encoder.layer.11.attention.output.dense.weight.qweight",
233
+ "scale": "encoder.layer.11.attention.output.dense.weight.scale"
234
+ },
235
+ {
236
+ "name": "encoder.layer.11.attention.self.key.weight",
237
+ "shape": [
238
+ 1024,
239
+ 1024
240
+ ],
241
+ "dtype": "float16",
242
+ "qweight": "encoder.layer.11.attention.self.key.weight.qweight",
243
+ "scale": "encoder.layer.11.attention.self.key.weight.scale"
244
+ },
245
+ {
246
+ "name": "encoder.layer.11.attention.self.query.weight",
247
+ "shape": [
248
+ 1024,
249
+ 1024
250
+ ],
251
+ "dtype": "float16",
252
+ "qweight": "encoder.layer.11.attention.self.query.weight.qweight",
253
+ "scale": "encoder.layer.11.attention.self.query.weight.scale"
254
+ },
255
+ {
256
+ "name": "encoder.layer.11.attention.self.value.weight",
257
+ "shape": [
258
+ 1024,
259
+ 1024
260
+ ],
261
+ "dtype": "float16",
262
+ "qweight": "encoder.layer.11.attention.self.value.weight.qweight",
263
+ "scale": "encoder.layer.11.attention.self.value.weight.scale"
264
+ },
265
+ {
266
+ "name": "encoder.layer.11.intermediate.dense.weight",
267
+ "shape": [
268
+ 4096,
269
+ 1024
270
+ ],
271
+ "dtype": "float16",
272
+ "qweight": "encoder.layer.11.intermediate.dense.weight.qweight",
273
+ "scale": "encoder.layer.11.intermediate.dense.weight.scale"
274
+ },
275
+ {
276
+ "name": "encoder.layer.11.output.dense.weight",
277
+ "shape": [
278
+ 1024,
279
+ 4096
280
+ ],
281
+ "dtype": "float16",
282
+ "qweight": "encoder.layer.11.output.dense.weight.qweight",
283
+ "scale": "encoder.layer.11.output.dense.weight.scale"
284
+ },
285
+ {
286
+ "name": "encoder.layer.12.attention.output.dense.weight",
287
+ "shape": [
288
+ 1024,
289
+ 1024
290
+ ],
291
+ "dtype": "float16",
292
+ "qweight": "encoder.layer.12.attention.output.dense.weight.qweight",
293
+ "scale": "encoder.layer.12.attention.output.dense.weight.scale"
294
+ },
295
+ {
296
+ "name": "encoder.layer.12.attention.self.key.weight",
297
+ "shape": [
298
+ 1024,
299
+ 1024
300
+ ],
301
+ "dtype": "float16",
302
+ "qweight": "encoder.layer.12.attention.self.key.weight.qweight",
303
+ "scale": "encoder.layer.12.attention.self.key.weight.scale"
304
+ },
305
+ {
306
+ "name": "encoder.layer.12.attention.self.query.weight",
307
+ "shape": [
308
+ 1024,
309
+ 1024
310
+ ],
311
+ "dtype": "float16",
312
+ "qweight": "encoder.layer.12.attention.self.query.weight.qweight",
313
+ "scale": "encoder.layer.12.attention.self.query.weight.scale"
314
+ },
315
+ {
316
+ "name": "encoder.layer.12.attention.self.value.weight",
317
+ "shape": [
318
+ 1024,
319
+ 1024
320
+ ],
321
+ "dtype": "float16",
322
+ "qweight": "encoder.layer.12.attention.self.value.weight.qweight",
323
+ "scale": "encoder.layer.12.attention.self.value.weight.scale"
324
+ },
325
+ {
326
+ "name": "encoder.layer.12.intermediate.dense.weight",
327
+ "shape": [
328
+ 4096,
329
+ 1024
330
+ ],
331
+ "dtype": "float16",
332
+ "qweight": "encoder.layer.12.intermediate.dense.weight.qweight",
333
+ "scale": "encoder.layer.12.intermediate.dense.weight.scale"
334
+ },
335
+ {
336
+ "name": "encoder.layer.12.output.dense.weight",
337
+ "shape": [
338
+ 1024,
339
+ 4096
340
+ ],
341
+ "dtype": "float16",
342
+ "qweight": "encoder.layer.12.output.dense.weight.qweight",
343
+ "scale": "encoder.layer.12.output.dense.weight.scale"
344
+ },
345
+ {
346
+ "name": "encoder.layer.13.attention.output.dense.weight",
347
+ "shape": [
348
+ 1024,
349
+ 1024
350
+ ],
351
+ "dtype": "float16",
352
+ "qweight": "encoder.layer.13.attention.output.dense.weight.qweight",
353
+ "scale": "encoder.layer.13.attention.output.dense.weight.scale"
354
+ },
355
+ {
356
+ "name": "encoder.layer.13.attention.self.key.weight",
357
+ "shape": [
358
+ 1024,
359
+ 1024
360
+ ],
361
+ "dtype": "float16",
362
+ "qweight": "encoder.layer.13.attention.self.key.weight.qweight",
363
+ "scale": "encoder.layer.13.attention.self.key.weight.scale"
364
+ },
365
+ {
366
+ "name": "encoder.layer.13.attention.self.query.weight",
367
+ "shape": [
368
+ 1024,
369
+ 1024
370
+ ],
371
+ "dtype": "float16",
372
+ "qweight": "encoder.layer.13.attention.self.query.weight.qweight",
373
+ "scale": "encoder.layer.13.attention.self.query.weight.scale"
374
+ },
375
+ {
376
+ "name": "encoder.layer.13.attention.self.value.weight",
377
+ "shape": [
378
+ 1024,
379
+ 1024
380
+ ],
381
+ "dtype": "float16",
382
+ "qweight": "encoder.layer.13.attention.self.value.weight.qweight",
383
+ "scale": "encoder.layer.13.attention.self.value.weight.scale"
384
+ },
385
+ {
386
+ "name": "encoder.layer.13.intermediate.dense.weight",
387
+ "shape": [
388
+ 4096,
389
+ 1024
390
+ ],
391
+ "dtype": "float16",
392
+ "qweight": "encoder.layer.13.intermediate.dense.weight.qweight",
393
+ "scale": "encoder.layer.13.intermediate.dense.weight.scale"
394
+ },
395
+ {
396
+ "name": "encoder.layer.13.output.dense.weight",
397
+ "shape": [
398
+ 1024,
399
+ 4096
400
+ ],
401
+ "dtype": "float16",
402
+ "qweight": "encoder.layer.13.output.dense.weight.qweight",
403
+ "scale": "encoder.layer.13.output.dense.weight.scale"
404
+ },
405
+ {
406
+ "name": "encoder.layer.14.attention.output.dense.weight",
407
+ "shape": [
408
+ 1024,
409
+ 1024
410
+ ],
411
+ "dtype": "float16",
412
+ "qweight": "encoder.layer.14.attention.output.dense.weight.qweight",
413
+ "scale": "encoder.layer.14.attention.output.dense.weight.scale"
414
+ },
415
+ {
416
+ "name": "encoder.layer.14.attention.self.key.weight",
417
+ "shape": [
418
+ 1024,
419
+ 1024
420
+ ],
421
+ "dtype": "float16",
422
+ "qweight": "encoder.layer.14.attention.self.key.weight.qweight",
423
+ "scale": "encoder.layer.14.attention.self.key.weight.scale"
424
+ },
425
+ {
426
+ "name": "encoder.layer.14.attention.self.query.weight",
427
+ "shape": [
428
+ 1024,
429
+ 1024
430
+ ],
431
+ "dtype": "float16",
432
+ "qweight": "encoder.layer.14.attention.self.query.weight.qweight",
433
+ "scale": "encoder.layer.14.attention.self.query.weight.scale"
434
+ },
435
+ {
436
+ "name": "encoder.layer.14.attention.self.value.weight",
437
+ "shape": [
438
+ 1024,
439
+ 1024
440
+ ],
441
+ "dtype": "float16",
442
+ "qweight": "encoder.layer.14.attention.self.value.weight.qweight",
443
+ "scale": "encoder.layer.14.attention.self.value.weight.scale"
444
+ },
445
+ {
446
+ "name": "encoder.layer.14.intermediate.dense.weight",
447
+ "shape": [
448
+ 4096,
449
+ 1024
450
+ ],
451
+ "dtype": "float16",
452
+ "qweight": "encoder.layer.14.intermediate.dense.weight.qweight",
453
+ "scale": "encoder.layer.14.intermediate.dense.weight.scale"
454
+ },
455
+ {
456
+ "name": "encoder.layer.14.output.dense.weight",
457
+ "shape": [
458
+ 1024,
459
+ 4096
460
+ ],
461
+ "dtype": "float16",
462
+ "qweight": "encoder.layer.14.output.dense.weight.qweight",
463
+ "scale": "encoder.layer.14.output.dense.weight.scale"
464
+ },
465
+ {
466
+ "name": "encoder.layer.15.attention.output.dense.weight",
467
+ "shape": [
468
+ 1024,
469
+ 1024
470
+ ],
471
+ "dtype": "float16",
472
+ "qweight": "encoder.layer.15.attention.output.dense.weight.qweight",
473
+ "scale": "encoder.layer.15.attention.output.dense.weight.scale"
474
+ },
475
+ {
476
+ "name": "encoder.layer.15.attention.self.key.weight",
477
+ "shape": [
478
+ 1024,
479
+ 1024
480
+ ],
481
+ "dtype": "float16",
482
+ "qweight": "encoder.layer.15.attention.self.key.weight.qweight",
483
+ "scale": "encoder.layer.15.attention.self.key.weight.scale"
484
+ },
485
+ {
486
+ "name": "encoder.layer.15.attention.self.query.weight",
487
+ "shape": [
488
+ 1024,
489
+ 1024
490
+ ],
491
+ "dtype": "float16",
492
+ "qweight": "encoder.layer.15.attention.self.query.weight.qweight",
493
+ "scale": "encoder.layer.15.attention.self.query.weight.scale"
494
+ },
495
+ {
496
+ "name": "encoder.layer.15.attention.self.value.weight",
497
+ "shape": [
498
+ 1024,
499
+ 1024
500
+ ],
501
+ "dtype": "float16",
502
+ "qweight": "encoder.layer.15.attention.self.value.weight.qweight",
503
+ "scale": "encoder.layer.15.attention.self.value.weight.scale"
504
+ },
505
+ {
506
+ "name": "encoder.layer.15.intermediate.dense.weight",
507
+ "shape": [
508
+ 4096,
509
+ 1024
510
+ ],
511
+ "dtype": "float16",
512
+ "qweight": "encoder.layer.15.intermediate.dense.weight.qweight",
513
+ "scale": "encoder.layer.15.intermediate.dense.weight.scale"
514
+ },
515
+ {
516
+ "name": "encoder.layer.15.output.dense.weight",
517
+ "shape": [
518
+ 1024,
519
+ 4096
520
+ ],
521
+ "dtype": "float16",
522
+ "qweight": "encoder.layer.15.output.dense.weight.qweight",
523
+ "scale": "encoder.layer.15.output.dense.weight.scale"
524
+ },
525
+ {
526
+ "name": "encoder.layer.16.attention.output.dense.weight",
527
+ "shape": [
528
+ 1024,
529
+ 1024
530
+ ],
531
+ "dtype": "float16",
532
+ "qweight": "encoder.layer.16.attention.output.dense.weight.qweight",
533
+ "scale": "encoder.layer.16.attention.output.dense.weight.scale"
534
+ },
535
+ {
536
+ "name": "encoder.layer.16.attention.self.key.weight",
537
+ "shape": [
538
+ 1024,
539
+ 1024
540
+ ],
541
+ "dtype": "float16",
542
+ "qweight": "encoder.layer.16.attention.self.key.weight.qweight",
543
+ "scale": "encoder.layer.16.attention.self.key.weight.scale"
544
+ },
545
+ {
546
+ "name": "encoder.layer.16.attention.self.query.weight",
547
+ "shape": [
548
+ 1024,
549
+ 1024
550
+ ],
551
+ "dtype": "float16",
552
+ "qweight": "encoder.layer.16.attention.self.query.weight.qweight",
553
+ "scale": "encoder.layer.16.attention.self.query.weight.scale"
554
+ },
555
+ {
556
+ "name": "encoder.layer.16.attention.self.value.weight",
557
+ "shape": [
558
+ 1024,
559
+ 1024
560
+ ],
561
+ "dtype": "float16",
562
+ "qweight": "encoder.layer.16.attention.self.value.weight.qweight",
563
+ "scale": "encoder.layer.16.attention.self.value.weight.scale"
564
+ },
565
+ {
566
+ "name": "encoder.layer.16.intermediate.dense.weight",
567
+ "shape": [
568
+ 4096,
569
+ 1024
570
+ ],
571
+ "dtype": "float16",
572
+ "qweight": "encoder.layer.16.intermediate.dense.weight.qweight",
573
+ "scale": "encoder.layer.16.intermediate.dense.weight.scale"
574
+ },
575
+ {
576
+ "name": "encoder.layer.16.output.dense.weight",
577
+ "shape": [
578
+ 1024,
579
+ 4096
580
+ ],
581
+ "dtype": "float16",
582
+ "qweight": "encoder.layer.16.output.dense.weight.qweight",
583
+ "scale": "encoder.layer.16.output.dense.weight.scale"
584
+ },
585
+ {
586
+ "name": "encoder.layer.17.attention.output.dense.weight",
587
+ "shape": [
588
+ 1024,
589
+ 1024
590
+ ],
591
+ "dtype": "float16",
592
+ "qweight": "encoder.layer.17.attention.output.dense.weight.qweight",
593
+ "scale": "encoder.layer.17.attention.output.dense.weight.scale"
594
+ },
595
+ {
596
+ "name": "encoder.layer.17.attention.self.key.weight",
597
+ "shape": [
598
+ 1024,
599
+ 1024
600
+ ],
601
+ "dtype": "float16",
602
+ "qweight": "encoder.layer.17.attention.self.key.weight.qweight",
603
+ "scale": "encoder.layer.17.attention.self.key.weight.scale"
604
+ },
605
+ {
606
+ "name": "encoder.layer.17.attention.self.query.weight",
607
+ "shape": [
608
+ 1024,
609
+ 1024
610
+ ],
611
+ "dtype": "float16",
612
+ "qweight": "encoder.layer.17.attention.self.query.weight.qweight",
613
+ "scale": "encoder.layer.17.attention.self.query.weight.scale"
614
+ },
615
+ {
616
+ "name": "encoder.layer.17.attention.self.value.weight",
617
+ "shape": [
618
+ 1024,
619
+ 1024
620
+ ],
621
+ "dtype": "float16",
622
+ "qweight": "encoder.layer.17.attention.self.value.weight.qweight",
623
+ "scale": "encoder.layer.17.attention.self.value.weight.scale"
624
+ },
625
+ {
626
+ "name": "encoder.layer.17.intermediate.dense.weight",
627
+ "shape": [
628
+ 4096,
629
+ 1024
630
+ ],
631
+ "dtype": "float16",
632
+ "qweight": "encoder.layer.17.intermediate.dense.weight.qweight",
633
+ "scale": "encoder.layer.17.intermediate.dense.weight.scale"
634
+ },
635
+ {
636
+ "name": "encoder.layer.17.output.dense.weight",
637
+ "shape": [
638
+ 1024,
639
+ 4096
640
+ ],
641
+ "dtype": "float16",
642
+ "qweight": "encoder.layer.17.output.dense.weight.qweight",
643
+ "scale": "encoder.layer.17.output.dense.weight.scale"
644
+ },
645
+ {
646
+ "name": "encoder.layer.18.attention.output.dense.weight",
647
+ "shape": [
648
+ 1024,
649
+ 1024
650
+ ],
651
+ "dtype": "float16",
652
+ "qweight": "encoder.layer.18.attention.output.dense.weight.qweight",
653
+ "scale": "encoder.layer.18.attention.output.dense.weight.scale"
654
+ },
655
+ {
656
+ "name": "encoder.layer.18.attention.self.key.weight",
657
+ "shape": [
658
+ 1024,
659
+ 1024
660
+ ],
661
+ "dtype": "float16",
662
+ "qweight": "encoder.layer.18.attention.self.key.weight.qweight",
663
+ "scale": "encoder.layer.18.attention.self.key.weight.scale"
664
+ },
665
+ {
666
+ "name": "encoder.layer.18.attention.self.query.weight",
667
+ "shape": [
668
+ 1024,
669
+ 1024
670
+ ],
671
+ "dtype": "float16",
672
+ "qweight": "encoder.layer.18.attention.self.query.weight.qweight",
673
+ "scale": "encoder.layer.18.attention.self.query.weight.scale"
674
+ },
675
+ {
676
+ "name": "encoder.layer.18.attention.self.value.weight",
677
+ "shape": [
678
+ 1024,
679
+ 1024
680
+ ],
681
+ "dtype": "float16",
682
+ "qweight": "encoder.layer.18.attention.self.value.weight.qweight",
683
+ "scale": "encoder.layer.18.attention.self.value.weight.scale"
684
+ },
685
+ {
686
+ "name": "encoder.layer.18.intermediate.dense.weight",
687
+ "shape": [
688
+ 4096,
689
+ 1024
690
+ ],
691
+ "dtype": "float16",
692
+ "qweight": "encoder.layer.18.intermediate.dense.weight.qweight",
693
+ "scale": "encoder.layer.18.intermediate.dense.weight.scale"
694
+ },
695
+ {
696
+ "name": "encoder.layer.18.output.dense.weight",
697
+ "shape": [
698
+ 1024,
699
+ 4096
700
+ ],
701
+ "dtype": "float16",
702
+ "qweight": "encoder.layer.18.output.dense.weight.qweight",
703
+ "scale": "encoder.layer.18.output.dense.weight.scale"
704
+ },
705
+ {
706
+ "name": "encoder.layer.19.attention.output.dense.weight",
707
+ "shape": [
708
+ 1024,
709
+ 1024
710
+ ],
711
+ "dtype": "float16",
712
+ "qweight": "encoder.layer.19.attention.output.dense.weight.qweight",
713
+ "scale": "encoder.layer.19.attention.output.dense.weight.scale"
714
+ },
715
+ {
716
+ "name": "encoder.layer.19.attention.self.key.weight",
717
+ "shape": [
718
+ 1024,
719
+ 1024
720
+ ],
721
+ "dtype": "float16",
722
+ "qweight": "encoder.layer.19.attention.self.key.weight.qweight",
723
+ "scale": "encoder.layer.19.attention.self.key.weight.scale"
724
+ },
725
+ {
726
+ "name": "encoder.layer.19.attention.self.query.weight",
727
+ "shape": [
728
+ 1024,
729
+ 1024
730
+ ],
731
+ "dtype": "float16",
732
+ "qweight": "encoder.layer.19.attention.self.query.weight.qweight",
733
+ "scale": "encoder.layer.19.attention.self.query.weight.scale"
734
+ },
735
+ {
736
+ "name": "encoder.layer.19.attention.self.value.weight",
737
+ "shape": [
738
+ 1024,
739
+ 1024
740
+ ],
741
+ "dtype": "float16",
742
+ "qweight": "encoder.layer.19.attention.self.value.weight.qweight",
743
+ "scale": "encoder.layer.19.attention.self.value.weight.scale"
744
+ },
745
+ {
746
+ "name": "encoder.layer.19.intermediate.dense.weight",
747
+ "shape": [
748
+ 4096,
749
+ 1024
750
+ ],
751
+ "dtype": "float16",
752
+ "qweight": "encoder.layer.19.intermediate.dense.weight.qweight",
753
+ "scale": "encoder.layer.19.intermediate.dense.weight.scale"
754
+ },
755
+ {
756
+ "name": "encoder.layer.19.output.dense.weight",
757
+ "shape": [
758
+ 1024,
759
+ 4096
760
+ ],
761
+ "dtype": "float16",
762
+ "qweight": "encoder.layer.19.output.dense.weight.qweight",
763
+ "scale": "encoder.layer.19.output.dense.weight.scale"
764
+ },
765
+ {
766
+ "name": "encoder.layer.2.attention.output.dense.weight",
767
+ "shape": [
768
+ 1024,
769
+ 1024
770
+ ],
771
+ "dtype": "float16",
772
+ "qweight": "encoder.layer.2.attention.output.dense.weight.qweight",
773
+ "scale": "encoder.layer.2.attention.output.dense.weight.scale"
774
+ },
775
+ {
776
+ "name": "encoder.layer.2.attention.self.key.weight",
777
+ "shape": [
778
+ 1024,
779
+ 1024
780
+ ],
781
+ "dtype": "float16",
782
+ "qweight": "encoder.layer.2.attention.self.key.weight.qweight",
783
+ "scale": "encoder.layer.2.attention.self.key.weight.scale"
784
+ },
785
+ {
786
+ "name": "encoder.layer.2.attention.self.query.weight",
787
+ "shape": [
788
+ 1024,
789
+ 1024
790
+ ],
791
+ "dtype": "float16",
792
+ "qweight": "encoder.layer.2.attention.self.query.weight.qweight",
793
+ "scale": "encoder.layer.2.attention.self.query.weight.scale"
794
+ },
795
+ {
796
+ "name": "encoder.layer.2.attention.self.value.weight",
797
+ "shape": [
798
+ 1024,
799
+ 1024
800
+ ],
801
+ "dtype": "float16",
802
+ "qweight": "encoder.layer.2.attention.self.value.weight.qweight",
803
+ "scale": "encoder.layer.2.attention.self.value.weight.scale"
804
+ },
805
+ {
806
+ "name": "encoder.layer.2.intermediate.dense.weight",
807
+ "shape": [
808
+ 4096,
809
+ 1024
810
+ ],
811
+ "dtype": "float16",
812
+ "qweight": "encoder.layer.2.intermediate.dense.weight.qweight",
813
+ "scale": "encoder.layer.2.intermediate.dense.weight.scale"
814
+ },
815
+ {
816
+ "name": "encoder.layer.2.output.dense.weight",
817
+ "shape": [
818
+ 1024,
819
+ 4096
820
+ ],
821
+ "dtype": "float16",
822
+ "qweight": "encoder.layer.2.output.dense.weight.qweight",
823
+ "scale": "encoder.layer.2.output.dense.weight.scale"
824
+ },
825
+ {
826
+ "name": "encoder.layer.20.attention.output.dense.weight",
827
+ "shape": [
828
+ 1024,
829
+ 1024
830
+ ],
831
+ "dtype": "float16",
832
+ "qweight": "encoder.layer.20.attention.output.dense.weight.qweight",
833
+ "scale": "encoder.layer.20.attention.output.dense.weight.scale"
834
+ },
835
+ {
836
+ "name": "encoder.layer.20.attention.self.key.weight",
837
+ "shape": [
838
+ 1024,
839
+ 1024
840
+ ],
841
+ "dtype": "float16",
842
+ "qweight": "encoder.layer.20.attention.self.key.weight.qweight",
843
+ "scale": "encoder.layer.20.attention.self.key.weight.scale"
844
+ },
845
+ {
846
+ "name": "encoder.layer.20.attention.self.query.weight",
847
+ "shape": [
848
+ 1024,
849
+ 1024
850
+ ],
851
+ "dtype": "float16",
852
+ "qweight": "encoder.layer.20.attention.self.query.weight.qweight",
853
+ "scale": "encoder.layer.20.attention.self.query.weight.scale"
854
+ },
855
+ {
856
+ "name": "encoder.layer.20.attention.self.value.weight",
857
+ "shape": [
858
+ 1024,
859
+ 1024
860
+ ],
861
+ "dtype": "float16",
862
+ "qweight": "encoder.layer.20.attention.self.value.weight.qweight",
863
+ "scale": "encoder.layer.20.attention.self.value.weight.scale"
864
+ },
865
+ {
866
+ "name": "encoder.layer.20.intermediate.dense.weight",
867
+ "shape": [
868
+ 4096,
869
+ 1024
870
+ ],
871
+ "dtype": "float16",
872
+ "qweight": "encoder.layer.20.intermediate.dense.weight.qweight",
873
+ "scale": "encoder.layer.20.intermediate.dense.weight.scale"
874
+ },
875
+ {
876
+ "name": "encoder.layer.20.output.dense.weight",
877
+ "shape": [
878
+ 1024,
879
+ 4096
880
+ ],
881
+ "dtype": "float16",
882
+ "qweight": "encoder.layer.20.output.dense.weight.qweight",
883
+ "scale": "encoder.layer.20.output.dense.weight.scale"
884
+ },
885
+ {
886
+ "name": "encoder.layer.21.attention.output.dense.weight",
887
+ "shape": [
888
+ 1024,
889
+ 1024
890
+ ],
891
+ "dtype": "float16",
892
+ "qweight": "encoder.layer.21.attention.output.dense.weight.qweight",
893
+ "scale": "encoder.layer.21.attention.output.dense.weight.scale"
894
+ },
895
+ {
896
+ "name": "encoder.layer.21.attention.self.key.weight",
897
+ "shape": [
898
+ 1024,
899
+ 1024
900
+ ],
901
+ "dtype": "float16",
902
+ "qweight": "encoder.layer.21.attention.self.key.weight.qweight",
903
+ "scale": "encoder.layer.21.attention.self.key.weight.scale"
904
+ },
905
+ {
906
+ "name": "encoder.layer.21.attention.self.query.weight",
907
+ "shape": [
908
+ 1024,
909
+ 1024
910
+ ],
911
+ "dtype": "float16",
912
+ "qweight": "encoder.layer.21.attention.self.query.weight.qweight",
913
+ "scale": "encoder.layer.21.attention.self.query.weight.scale"
914
+ },
915
+ {
916
+ "name": "encoder.layer.21.attention.self.value.weight",
917
+ "shape": [
918
+ 1024,
919
+ 1024
920
+ ],
921
+ "dtype": "float16",
922
+ "qweight": "encoder.layer.21.attention.self.value.weight.qweight",
923
+ "scale": "encoder.layer.21.attention.self.value.weight.scale"
924
+ },
925
+ {
926
+ "name": "encoder.layer.21.intermediate.dense.weight",
927
+ "shape": [
928
+ 4096,
929
+ 1024
930
+ ],
931
+ "dtype": "float16",
932
+ "qweight": "encoder.layer.21.intermediate.dense.weight.qweight",
933
+ "scale": "encoder.layer.21.intermediate.dense.weight.scale"
934
+ },
935
+ {
936
+ "name": "encoder.layer.21.output.dense.weight",
937
+ "shape": [
938
+ 1024,
939
+ 4096
940
+ ],
941
+ "dtype": "float16",
942
+ "qweight": "encoder.layer.21.output.dense.weight.qweight",
943
+ "scale": "encoder.layer.21.output.dense.weight.scale"
944
+ },
945
+ {
946
+ "name": "encoder.layer.22.attention.output.dense.weight",
947
+ "shape": [
948
+ 1024,
949
+ 1024
950
+ ],
951
+ "dtype": "float16",
952
+ "qweight": "encoder.layer.22.attention.output.dense.weight.qweight",
953
+ "scale": "encoder.layer.22.attention.output.dense.weight.scale"
954
+ },
955
+ {
956
+ "name": "encoder.layer.22.attention.self.key.weight",
957
+ "shape": [
958
+ 1024,
959
+ 1024
960
+ ],
961
+ "dtype": "float16",
962
+ "qweight": "encoder.layer.22.attention.self.key.weight.qweight",
963
+ "scale": "encoder.layer.22.attention.self.key.weight.scale"
964
+ },
965
+ {
966
+ "name": "encoder.layer.22.attention.self.query.weight",
967
+ "shape": [
968
+ 1024,
969
+ 1024
970
+ ],
971
+ "dtype": "float16",
972
+ "qweight": "encoder.layer.22.attention.self.query.weight.qweight",
973
+ "scale": "encoder.layer.22.attention.self.query.weight.scale"
974
+ },
975
+ {
976
+ "name": "encoder.layer.22.attention.self.value.weight",
977
+ "shape": [
978
+ 1024,
979
+ 1024
980
+ ],
981
+ "dtype": "float16",
982
+ "qweight": "encoder.layer.22.attention.self.value.weight.qweight",
983
+ "scale": "encoder.layer.22.attention.self.value.weight.scale"
984
+ },
985
+ {
986
+ "name": "encoder.layer.22.intermediate.dense.weight",
987
+ "shape": [
988
+ 4096,
989
+ 1024
990
+ ],
991
+ "dtype": "float16",
992
+ "qweight": "encoder.layer.22.intermediate.dense.weight.qweight",
993
+ "scale": "encoder.layer.22.intermediate.dense.weight.scale"
994
+ },
995
+ {
996
+ "name": "encoder.layer.22.output.dense.weight",
997
+ "shape": [
998
+ 1024,
999
+ 4096
1000
+ ],
1001
+ "dtype": "float16",
1002
+ "qweight": "encoder.layer.22.output.dense.weight.qweight",
1003
+ "scale": "encoder.layer.22.output.dense.weight.scale"
1004
+ },
1005
+ {
1006
+ "name": "encoder.layer.23.attention.output.dense.weight",
1007
+ "shape": [
1008
+ 1024,
1009
+ 1024
1010
+ ],
1011
+ "dtype": "float16",
1012
+ "qweight": "encoder.layer.23.attention.output.dense.weight.qweight",
1013
+ "scale": "encoder.layer.23.attention.output.dense.weight.scale"
1014
+ },
1015
+ {
1016
+ "name": "encoder.layer.23.attention.self.key.weight",
1017
+ "shape": [
1018
+ 1024,
1019
+ 1024
1020
+ ],
1021
+ "dtype": "float16",
1022
+ "qweight": "encoder.layer.23.attention.self.key.weight.qweight",
1023
+ "scale": "encoder.layer.23.attention.self.key.weight.scale"
1024
+ },
1025
+ {
1026
+ "name": "encoder.layer.23.attention.self.query.weight",
1027
+ "shape": [
1028
+ 1024,
1029
+ 1024
1030
+ ],
1031
+ "dtype": "float16",
1032
+ "qweight": "encoder.layer.23.attention.self.query.weight.qweight",
1033
+ "scale": "encoder.layer.23.attention.self.query.weight.scale"
1034
+ },
1035
+ {
1036
+ "name": "encoder.layer.23.attention.self.value.weight",
1037
+ "shape": [
1038
+ 1024,
1039
+ 1024
1040
+ ],
1041
+ "dtype": "float16",
1042
+ "qweight": "encoder.layer.23.attention.self.value.weight.qweight",
1043
+ "scale": "encoder.layer.23.attention.self.value.weight.scale"
1044
+ },
1045
+ {
1046
+ "name": "encoder.layer.23.intermediate.dense.weight",
1047
+ "shape": [
1048
+ 4096,
1049
+ 1024
1050
+ ],
1051
+ "dtype": "float16",
1052
+ "qweight": "encoder.layer.23.intermediate.dense.weight.qweight",
1053
+ "scale": "encoder.layer.23.intermediate.dense.weight.scale"
1054
+ },
1055
+ {
1056
+ "name": "encoder.layer.23.output.dense.weight",
1057
+ "shape": [
1058
+ 1024,
1059
+ 4096
1060
+ ],
1061
+ "dtype": "float16",
1062
+ "qweight": "encoder.layer.23.output.dense.weight.qweight",
1063
+ "scale": "encoder.layer.23.output.dense.weight.scale"
1064
+ },
1065
+ {
1066
+ "name": "encoder.layer.3.attention.output.dense.weight",
1067
+ "shape": [
1068
+ 1024,
1069
+ 1024
1070
+ ],
1071
+ "dtype": "float16",
1072
+ "qweight": "encoder.layer.3.attention.output.dense.weight.qweight",
1073
+ "scale": "encoder.layer.3.attention.output.dense.weight.scale"
1074
+ },
1075
+ {
1076
+ "name": "encoder.layer.3.attention.self.key.weight",
1077
+ "shape": [
1078
+ 1024,
1079
+ 1024
1080
+ ],
1081
+ "dtype": "float16",
1082
+ "qweight": "encoder.layer.3.attention.self.key.weight.qweight",
1083
+ "scale": "encoder.layer.3.attention.self.key.weight.scale"
1084
+ },
1085
+ {
1086
+ "name": "encoder.layer.3.attention.self.query.weight",
1087
+ "shape": [
1088
+ 1024,
1089
+ 1024
1090
+ ],
1091
+ "dtype": "float16",
1092
+ "qweight": "encoder.layer.3.attention.self.query.weight.qweight",
1093
+ "scale": "encoder.layer.3.attention.self.query.weight.scale"
1094
+ },
1095
+ {
1096
+ "name": "encoder.layer.3.attention.self.value.weight",
1097
+ "shape": [
1098
+ 1024,
1099
+ 1024
1100
+ ],
1101
+ "dtype": "float16",
1102
+ "qweight": "encoder.layer.3.attention.self.value.weight.qweight",
1103
+ "scale": "encoder.layer.3.attention.self.value.weight.scale"
1104
+ },
1105
+ {
1106
+ "name": "encoder.layer.3.intermediate.dense.weight",
1107
+ "shape": [
1108
+ 4096,
1109
+ 1024
1110
+ ],
1111
+ "dtype": "float16",
1112
+ "qweight": "encoder.layer.3.intermediate.dense.weight.qweight",
1113
+ "scale": "encoder.layer.3.intermediate.dense.weight.scale"
1114
+ },
1115
+ {
1116
+ "name": "encoder.layer.3.output.dense.weight",
1117
+ "shape": [
1118
+ 1024,
1119
+ 4096
1120
+ ],
1121
+ "dtype": "float16",
1122
+ "qweight": "encoder.layer.3.output.dense.weight.qweight",
1123
+ "scale": "encoder.layer.3.output.dense.weight.scale"
1124
+ },
1125
+ {
1126
+ "name": "encoder.layer.4.attention.output.dense.weight",
1127
+ "shape": [
1128
+ 1024,
1129
+ 1024
1130
+ ],
1131
+ "dtype": "float16",
1132
+ "qweight": "encoder.layer.4.attention.output.dense.weight.qweight",
1133
+ "scale": "encoder.layer.4.attention.output.dense.weight.scale"
1134
+ },
1135
+ {
1136
+ "name": "encoder.layer.4.attention.self.key.weight",
1137
+ "shape": [
1138
+ 1024,
1139
+ 1024
1140
+ ],
1141
+ "dtype": "float16",
1142
+ "qweight": "encoder.layer.4.attention.self.key.weight.qweight",
1143
+ "scale": "encoder.layer.4.attention.self.key.weight.scale"
1144
+ },
1145
+ {
1146
+ "name": "encoder.layer.4.attention.self.query.weight",
1147
+ "shape": [
1148
+ 1024,
1149
+ 1024
1150
+ ],
1151
+ "dtype": "float16",
1152
+ "qweight": "encoder.layer.4.attention.self.query.weight.qweight",
1153
+ "scale": "encoder.layer.4.attention.self.query.weight.scale"
1154
+ },
1155
+ {
1156
+ "name": "encoder.layer.4.attention.self.value.weight",
1157
+ "shape": [
1158
+ 1024,
1159
+ 1024
1160
+ ],
1161
+ "dtype": "float16",
1162
+ "qweight": "encoder.layer.4.attention.self.value.weight.qweight",
1163
+ "scale": "encoder.layer.4.attention.self.value.weight.scale"
1164
+ },
1165
+ {
1166
+ "name": "encoder.layer.4.intermediate.dense.weight",
1167
+ "shape": [
1168
+ 4096,
1169
+ 1024
1170
+ ],
1171
+ "dtype": "float16",
1172
+ "qweight": "encoder.layer.4.intermediate.dense.weight.qweight",
1173
+ "scale": "encoder.layer.4.intermediate.dense.weight.scale"
1174
+ },
1175
+ {
1176
+ "name": "encoder.layer.4.output.dense.weight",
1177
+ "shape": [
1178
+ 1024,
1179
+ 4096
1180
+ ],
1181
+ "dtype": "float16",
1182
+ "qweight": "encoder.layer.4.output.dense.weight.qweight",
1183
+ "scale": "encoder.layer.4.output.dense.weight.scale"
1184
+ },
1185
+ {
1186
+ "name": "encoder.layer.5.attention.output.dense.weight",
1187
+ "shape": [
1188
+ 1024,
1189
+ 1024
1190
+ ],
1191
+ "dtype": "float16",
1192
+ "qweight": "encoder.layer.5.attention.output.dense.weight.qweight",
1193
+ "scale": "encoder.layer.5.attention.output.dense.weight.scale"
1194
+ },
1195
+ {
1196
+ "name": "encoder.layer.5.attention.self.key.weight",
1197
+ "shape": [
1198
+ 1024,
1199
+ 1024
1200
+ ],
1201
+ "dtype": "float16",
1202
+ "qweight": "encoder.layer.5.attention.self.key.weight.qweight",
1203
+ "scale": "encoder.layer.5.attention.self.key.weight.scale"
1204
+ },
1205
+ {
1206
+ "name": "encoder.layer.5.attention.self.query.weight",
1207
+ "shape": [
1208
+ 1024,
1209
+ 1024
1210
+ ],
1211
+ "dtype": "float16",
1212
+ "qweight": "encoder.layer.5.attention.self.query.weight.qweight",
1213
+ "scale": "encoder.layer.5.attention.self.query.weight.scale"
1214
+ },
1215
+ {
1216
+ "name": "encoder.layer.5.attention.self.value.weight",
1217
+ "shape": [
1218
+ 1024,
1219
+ 1024
1220
+ ],
1221
+ "dtype": "float16",
1222
+ "qweight": "encoder.layer.5.attention.self.value.weight.qweight",
1223
+ "scale": "encoder.layer.5.attention.self.value.weight.scale"
1224
+ },
1225
+ {
1226
+ "name": "encoder.layer.5.intermediate.dense.weight",
1227
+ "shape": [
1228
+ 4096,
1229
+ 1024
1230
+ ],
1231
+ "dtype": "float16",
1232
+ "qweight": "encoder.layer.5.intermediate.dense.weight.qweight",
1233
+ "scale": "encoder.layer.5.intermediate.dense.weight.scale"
1234
+ },
1235
+ {
1236
+ "name": "encoder.layer.5.output.dense.weight",
1237
+ "shape": [
1238
+ 1024,
1239
+ 4096
1240
+ ],
1241
+ "dtype": "float16",
1242
+ "qweight": "encoder.layer.5.output.dense.weight.qweight",
1243
+ "scale": "encoder.layer.5.output.dense.weight.scale"
1244
+ },
1245
+ {
1246
+ "name": "encoder.layer.6.attention.output.dense.weight",
1247
+ "shape": [
1248
+ 1024,
1249
+ 1024
1250
+ ],
1251
+ "dtype": "float16",
1252
+ "qweight": "encoder.layer.6.attention.output.dense.weight.qweight",
1253
+ "scale": "encoder.layer.6.attention.output.dense.weight.scale"
1254
+ },
1255
+ {
1256
+ "name": "encoder.layer.6.attention.self.key.weight",
1257
+ "shape": [
1258
+ 1024,
1259
+ 1024
1260
+ ],
1261
+ "dtype": "float16",
1262
+ "qweight": "encoder.layer.6.attention.self.key.weight.qweight",
1263
+ "scale": "encoder.layer.6.attention.self.key.weight.scale"
1264
+ },
1265
+ {
1266
+ "name": "encoder.layer.6.attention.self.query.weight",
1267
+ "shape": [
1268
+ 1024,
1269
+ 1024
1270
+ ],
1271
+ "dtype": "float16",
1272
+ "qweight": "encoder.layer.6.attention.self.query.weight.qweight",
1273
+ "scale": "encoder.layer.6.attention.self.query.weight.scale"
1274
+ },
1275
+ {
1276
+ "name": "encoder.layer.6.attention.self.value.weight",
1277
+ "shape": [
1278
+ 1024,
1279
+ 1024
1280
+ ],
1281
+ "dtype": "float16",
1282
+ "qweight": "encoder.layer.6.attention.self.value.weight.qweight",
1283
+ "scale": "encoder.layer.6.attention.self.value.weight.scale"
1284
+ },
1285
+ {
1286
+ "name": "encoder.layer.6.intermediate.dense.weight",
1287
+ "shape": [
1288
+ 4096,
1289
+ 1024
1290
+ ],
1291
+ "dtype": "float16",
1292
+ "qweight": "encoder.layer.6.intermediate.dense.weight.qweight",
1293
+ "scale": "encoder.layer.6.intermediate.dense.weight.scale"
1294
+ },
1295
+ {
1296
+ "name": "encoder.layer.6.output.dense.weight",
1297
+ "shape": [
1298
+ 1024,
1299
+ 4096
1300
+ ],
1301
+ "dtype": "float16",
1302
+ "qweight": "encoder.layer.6.output.dense.weight.qweight",
1303
+ "scale": "encoder.layer.6.output.dense.weight.scale"
1304
+ },
1305
+ {
1306
+ "name": "encoder.layer.7.attention.output.dense.weight",
1307
+ "shape": [
1308
+ 1024,
1309
+ 1024
1310
+ ],
1311
+ "dtype": "float16",
1312
+ "qweight": "encoder.layer.7.attention.output.dense.weight.qweight",
1313
+ "scale": "encoder.layer.7.attention.output.dense.weight.scale"
1314
+ },
1315
+ {
1316
+ "name": "encoder.layer.7.attention.self.key.weight",
1317
+ "shape": [
1318
+ 1024,
1319
+ 1024
1320
+ ],
1321
+ "dtype": "float16",
1322
+ "qweight": "encoder.layer.7.attention.self.key.weight.qweight",
1323
+ "scale": "encoder.layer.7.attention.self.key.weight.scale"
1324
+ },
1325
+ {
1326
+ "name": "encoder.layer.7.attention.self.query.weight",
1327
+ "shape": [
1328
+ 1024,
1329
+ 1024
1330
+ ],
1331
+ "dtype": "float16",
1332
+ "qweight": "encoder.layer.7.attention.self.query.weight.qweight",
1333
+ "scale": "encoder.layer.7.attention.self.query.weight.scale"
1334
+ },
1335
+ {
1336
+ "name": "encoder.layer.7.attention.self.value.weight",
1337
+ "shape": [
1338
+ 1024,
1339
+ 1024
1340
+ ],
1341
+ "dtype": "float16",
1342
+ "qweight": "encoder.layer.7.attention.self.value.weight.qweight",
1343
+ "scale": "encoder.layer.7.attention.self.value.weight.scale"
1344
+ },
1345
+ {
1346
+ "name": "encoder.layer.7.intermediate.dense.weight",
1347
+ "shape": [
1348
+ 4096,
1349
+ 1024
1350
+ ],
1351
+ "dtype": "float16",
1352
+ "qweight": "encoder.layer.7.intermediate.dense.weight.qweight",
1353
+ "scale": "encoder.layer.7.intermediate.dense.weight.scale"
1354
+ },
1355
+ {
1356
+ "name": "encoder.layer.7.output.dense.weight",
1357
+ "shape": [
1358
+ 1024,
1359
+ 4096
1360
+ ],
1361
+ "dtype": "float16",
1362
+ "qweight": "encoder.layer.7.output.dense.weight.qweight",
1363
+ "scale": "encoder.layer.7.output.dense.weight.scale"
1364
+ },
1365
+ {
1366
+ "name": "encoder.layer.8.attention.output.dense.weight",
1367
+ "shape": [
1368
+ 1024,
1369
+ 1024
1370
+ ],
1371
+ "dtype": "float16",
1372
+ "qweight": "encoder.layer.8.attention.output.dense.weight.qweight",
1373
+ "scale": "encoder.layer.8.attention.output.dense.weight.scale"
1374
+ },
1375
+ {
1376
+ "name": "encoder.layer.8.attention.self.key.weight",
1377
+ "shape": [
1378
+ 1024,
1379
+ 1024
1380
+ ],
1381
+ "dtype": "float16",
1382
+ "qweight": "encoder.layer.8.attention.self.key.weight.qweight",
1383
+ "scale": "encoder.layer.8.attention.self.key.weight.scale"
1384
+ },
1385
+ {
1386
+ "name": "encoder.layer.8.attention.self.query.weight",
1387
+ "shape": [
1388
+ 1024,
1389
+ 1024
1390
+ ],
1391
+ "dtype": "float16",
1392
+ "qweight": "encoder.layer.8.attention.self.query.weight.qweight",
1393
+ "scale": "encoder.layer.8.attention.self.query.weight.scale"
1394
+ },
1395
+ {
1396
+ "name": "encoder.layer.8.attention.self.value.weight",
1397
+ "shape": [
1398
+ 1024,
1399
+ 1024
1400
+ ],
1401
+ "dtype": "float16",
1402
+ "qweight": "encoder.layer.8.attention.self.value.weight.qweight",
1403
+ "scale": "encoder.layer.8.attention.self.value.weight.scale"
1404
+ },
1405
+ {
1406
+ "name": "encoder.layer.8.intermediate.dense.weight",
1407
+ "shape": [
1408
+ 4096,
1409
+ 1024
1410
+ ],
1411
+ "dtype": "float16",
1412
+ "qweight": "encoder.layer.8.intermediate.dense.weight.qweight",
1413
+ "scale": "encoder.layer.8.intermediate.dense.weight.scale"
1414
+ },
1415
+ {
1416
+ "name": "encoder.layer.8.output.dense.weight",
1417
+ "shape": [
1418
+ 1024,
1419
+ 4096
1420
+ ],
1421
+ "dtype": "float16",
1422
+ "qweight": "encoder.layer.8.output.dense.weight.qweight",
1423
+ "scale": "encoder.layer.8.output.dense.weight.scale"
1424
+ },
1425
+ {
1426
+ "name": "encoder.layer.9.attention.output.dense.weight",
1427
+ "shape": [
1428
+ 1024,
1429
+ 1024
1430
+ ],
1431
+ "dtype": "float16",
1432
+ "qweight": "encoder.layer.9.attention.output.dense.weight.qweight",
1433
+ "scale": "encoder.layer.9.attention.output.dense.weight.scale"
1434
+ },
1435
+ {
1436
+ "name": "encoder.layer.9.attention.self.key.weight",
1437
+ "shape": [
1438
+ 1024,
1439
+ 1024
1440
+ ],
1441
+ "dtype": "float16",
1442
+ "qweight": "encoder.layer.9.attention.self.key.weight.qweight",
1443
+ "scale": "encoder.layer.9.attention.self.key.weight.scale"
1444
+ },
1445
+ {
1446
+ "name": "encoder.layer.9.attention.self.query.weight",
1447
+ "shape": [
1448
+ 1024,
1449
+ 1024
1450
+ ],
1451
+ "dtype": "float16",
1452
+ "qweight": "encoder.layer.9.attention.self.query.weight.qweight",
1453
+ "scale": "encoder.layer.9.attention.self.query.weight.scale"
1454
+ },
1455
+ {
1456
+ "name": "encoder.layer.9.attention.self.value.weight",
1457
+ "shape": [
1458
+ 1024,
1459
+ 1024
1460
+ ],
1461
+ "dtype": "float16",
1462
+ "qweight": "encoder.layer.9.attention.self.value.weight.qweight",
1463
+ "scale": "encoder.layer.9.attention.self.value.weight.scale"
1464
+ },
1465
+ {
1466
+ "name": "encoder.layer.9.intermediate.dense.weight",
1467
+ "shape": [
1468
+ 4096,
1469
+ 1024
1470
+ ],
1471
+ "dtype": "float16",
1472
+ "qweight": "encoder.layer.9.intermediate.dense.weight.qweight",
1473
+ "scale": "encoder.layer.9.intermediate.dense.weight.scale"
1474
+ },
1475
+ {
1476
+ "name": "encoder.layer.9.output.dense.weight",
1477
+ "shape": [
1478
+ 1024,
1479
+ 4096
1480
+ ],
1481
+ "dtype": "float16",
1482
+ "qweight": "encoder.layer.9.output.dense.weight.qweight",
1483
+ "scale": "encoder.layer.9.output.dense.weight.scale"
1484
+ },
1485
+ {
1486
+ "name": "pooler.dense.weight",
1487
+ "shape": [
1488
+ 1024,
1489
+ 1024
1490
+ ],
1491
+ "dtype": "float16",
1492
+ "qweight": "pooler.dense.weight.qweight",
1493
+ "scale": "pooler.dense.weight.scale"
1494
+ }
1495
+ ],
1496
+ "kept": [
1497
+ {
1498
+ "name": "embeddings.LayerNorm.bias",
1499
+ "shape": [
1500
+ 1024
1501
+ ],
1502
+ "dtype": "float16"
1503
+ },
1504
+ {
1505
+ "name": "embeddings.LayerNorm.weight",
1506
+ "shape": [
1507
+ 1024
1508
+ ],
1509
+ "dtype": "float16"
1510
+ },
1511
+ {
1512
+ "name": "embeddings.position_ids",
1513
+ "shape": [
1514
+ 1,
1515
+ 514
1516
+ ],
1517
+ "dtype": "float16"
1518
+ },
1519
+ {
1520
+ "name": "encoder.layer.0.attention.output.LayerNorm.bias",
1521
+ "shape": [
1522
+ 1024
1523
+ ],
1524
+ "dtype": "float16"
1525
+ },
1526
+ {
1527
+ "name": "encoder.layer.0.attention.output.LayerNorm.weight",
1528
+ "shape": [
1529
+ 1024
1530
+ ],
1531
+ "dtype": "float16"
1532
+ },
1533
+ {
1534
+ "name": "encoder.layer.0.attention.output.dense.bias",
1535
+ "shape": [
1536
+ 1024
1537
+ ],
1538
+ "dtype": "float16"
1539
+ },
1540
+ {
1541
+ "name": "encoder.layer.0.attention.self.key.bias",
1542
+ "shape": [
1543
+ 1024
1544
+ ],
1545
+ "dtype": "float16"
1546
+ },
1547
+ {
1548
+ "name": "encoder.layer.0.attention.self.query.bias",
1549
+ "shape": [
1550
+ 1024
1551
+ ],
1552
+ "dtype": "float16"
1553
+ },
1554
+ {
1555
+ "name": "encoder.layer.0.attention.self.value.bias",
1556
+ "shape": [
1557
+ 1024
1558
+ ],
1559
+ "dtype": "float16"
1560
+ },
1561
+ {
1562
+ "name": "encoder.layer.0.intermediate.dense.bias",
1563
+ "shape": [
1564
+ 4096
1565
+ ],
1566
+ "dtype": "float16"
1567
+ },
1568
+ {
1569
+ "name": "encoder.layer.0.output.LayerNorm.bias",
1570
+ "shape": [
1571
+ 1024
1572
+ ],
1573
+ "dtype": "float16"
1574
+ },
1575
+ {
1576
+ "name": "encoder.layer.0.output.LayerNorm.weight",
1577
+ "shape": [
1578
+ 1024
1579
+ ],
1580
+ "dtype": "float16"
1581
+ },
1582
+ {
1583
+ "name": "encoder.layer.0.output.dense.bias",
1584
+ "shape": [
1585
+ 1024
1586
+ ],
1587
+ "dtype": "float16"
1588
+ },
1589
+ {
1590
+ "name": "encoder.layer.1.attention.output.LayerNorm.bias",
1591
+ "shape": [
1592
+ 1024
1593
+ ],
1594
+ "dtype": "float16"
1595
+ },
1596
+ {
1597
+ "name": "encoder.layer.1.attention.output.LayerNorm.weight",
1598
+ "shape": [
1599
+ 1024
1600
+ ],
1601
+ "dtype": "float16"
1602
+ },
1603
+ {
1604
+ "name": "encoder.layer.1.attention.output.dense.bias",
1605
+ "shape": [
1606
+ 1024
1607
+ ],
1608
+ "dtype": "float16"
1609
+ },
1610
+ {
1611
+ "name": "encoder.layer.1.attention.self.key.bias",
1612
+ "shape": [
1613
+ 1024
1614
+ ],
1615
+ "dtype": "float16"
1616
+ },
1617
+ {
1618
+ "name": "encoder.layer.1.attention.self.query.bias",
1619
+ "shape": [
1620
+ 1024
1621
+ ],
1622
+ "dtype": "float16"
1623
+ },
1624
+ {
1625
+ "name": "encoder.layer.1.attention.self.value.bias",
1626
+ "shape": [
1627
+ 1024
1628
+ ],
1629
+ "dtype": "float16"
1630
+ },
1631
+ {
1632
+ "name": "encoder.layer.1.intermediate.dense.bias",
1633
+ "shape": [
1634
+ 4096
1635
+ ],
1636
+ "dtype": "float16"
1637
+ },
1638
+ {
1639
+ "name": "encoder.layer.1.output.LayerNorm.bias",
1640
+ "shape": [
1641
+ 1024
1642
+ ],
1643
+ "dtype": "float16"
1644
+ },
1645
+ {
1646
+ "name": "encoder.layer.1.output.LayerNorm.weight",
1647
+ "shape": [
1648
+ 1024
1649
+ ],
1650
+ "dtype": "float16"
1651
+ },
1652
+ {
1653
+ "name": "encoder.layer.1.output.dense.bias",
1654
+ "shape": [
1655
+ 1024
1656
+ ],
1657
+ "dtype": "float16"
1658
+ },
1659
+ {
1660
+ "name": "encoder.layer.10.attention.output.LayerNorm.bias",
1661
+ "shape": [
1662
+ 1024
1663
+ ],
1664
+ "dtype": "float16"
1665
+ },
1666
+ {
1667
+ "name": "encoder.layer.10.attention.output.LayerNorm.weight",
1668
+ "shape": [
1669
+ 1024
1670
+ ],
1671
+ "dtype": "float16"
1672
+ },
1673
+ {
1674
+ "name": "encoder.layer.10.attention.output.dense.bias",
1675
+ "shape": [
1676
+ 1024
1677
+ ],
1678
+ "dtype": "float16"
1679
+ },
1680
+ {
1681
+ "name": "encoder.layer.10.attention.self.key.bias",
1682
+ "shape": [
1683
+ 1024
1684
+ ],
1685
+ "dtype": "float16"
1686
+ },
1687
+ {
1688
+ "name": "encoder.layer.10.attention.self.query.bias",
1689
+ "shape": [
1690
+ 1024
1691
+ ],
1692
+ "dtype": "float16"
1693
+ },
1694
+ {
1695
+ "name": "encoder.layer.10.attention.self.value.bias",
1696
+ "shape": [
1697
+ 1024
1698
+ ],
1699
+ "dtype": "float16"
1700
+ },
1701
+ {
1702
+ "name": "encoder.layer.10.intermediate.dense.bias",
1703
+ "shape": [
1704
+ 4096
1705
+ ],
1706
+ "dtype": "float16"
1707
+ },
1708
+ {
1709
+ "name": "encoder.layer.10.output.LayerNorm.bias",
1710
+ "shape": [
1711
+ 1024
1712
+ ],
1713
+ "dtype": "float16"
1714
+ },
1715
+ {
1716
+ "name": "encoder.layer.10.output.LayerNorm.weight",
1717
+ "shape": [
1718
+ 1024
1719
+ ],
1720
+ "dtype": "float16"
1721
+ },
1722
+ {
1723
+ "name": "encoder.layer.10.output.dense.bias",
1724
+ "shape": [
1725
+ 1024
1726
+ ],
1727
+ "dtype": "float16"
1728
+ },
1729
+ {
1730
+ "name": "encoder.layer.11.attention.output.LayerNorm.bias",
1731
+ "shape": [
1732
+ 1024
1733
+ ],
1734
+ "dtype": "float16"
1735
+ },
1736
+ {
1737
+ "name": "encoder.layer.11.attention.output.LayerNorm.weight",
1738
+ "shape": [
1739
+ 1024
1740
+ ],
1741
+ "dtype": "float16"
1742
+ },
1743
+ {
1744
+ "name": "encoder.layer.11.attention.output.dense.bias",
1745
+ "shape": [
1746
+ 1024
1747
+ ],
1748
+ "dtype": "float16"
1749
+ },
1750
+ {
1751
+ "name": "encoder.layer.11.attention.self.key.bias",
1752
+ "shape": [
1753
+ 1024
1754
+ ],
1755
+ "dtype": "float16"
1756
+ },
1757
+ {
1758
+ "name": "encoder.layer.11.attention.self.query.bias",
1759
+ "shape": [
1760
+ 1024
1761
+ ],
1762
+ "dtype": "float16"
1763
+ },
1764
+ {
1765
+ "name": "encoder.layer.11.attention.self.value.bias",
1766
+ "shape": [
1767
+ 1024
1768
+ ],
1769
+ "dtype": "float16"
1770
+ },
1771
+ {
1772
+ "name": "encoder.layer.11.intermediate.dense.bias",
1773
+ "shape": [
1774
+ 4096
1775
+ ],
1776
+ "dtype": "float16"
1777
+ },
1778
+ {
1779
+ "name": "encoder.layer.11.output.LayerNorm.bias",
1780
+ "shape": [
1781
+ 1024
1782
+ ],
1783
+ "dtype": "float16"
1784
+ },
1785
+ {
1786
+ "name": "encoder.layer.11.output.LayerNorm.weight",
1787
+ "shape": [
1788
+ 1024
1789
+ ],
1790
+ "dtype": "float16"
1791
+ },
1792
+ {
1793
+ "name": "encoder.layer.11.output.dense.bias",
1794
+ "shape": [
1795
+ 1024
1796
+ ],
1797
+ "dtype": "float16"
1798
+ },
1799
+ {
1800
+ "name": "encoder.layer.12.attention.output.LayerNorm.bias",
1801
+ "shape": [
1802
+ 1024
1803
+ ],
1804
+ "dtype": "float16"
1805
+ },
1806
+ {
1807
+ "name": "encoder.layer.12.attention.output.LayerNorm.weight",
1808
+ "shape": [
1809
+ 1024
1810
+ ],
1811
+ "dtype": "float16"
1812
+ },
1813
+ {
1814
+ "name": "encoder.layer.12.attention.output.dense.bias",
1815
+ "shape": [
1816
+ 1024
1817
+ ],
1818
+ "dtype": "float16"
1819
+ },
1820
+ {
1821
+ "name": "encoder.layer.12.attention.self.key.bias",
1822
+ "shape": [
1823
+ 1024
1824
+ ],
1825
+ "dtype": "float16"
1826
+ },
1827
+ {
1828
+ "name": "encoder.layer.12.attention.self.query.bias",
1829
+ "shape": [
1830
+ 1024
1831
+ ],
1832
+ "dtype": "float16"
1833
+ },
1834
+ {
1835
+ "name": "encoder.layer.12.attention.self.value.bias",
1836
+ "shape": [
1837
+ 1024
1838
+ ],
1839
+ "dtype": "float16"
1840
+ },
1841
+ {
1842
+ "name": "encoder.layer.12.intermediate.dense.bias",
1843
+ "shape": [
1844
+ 4096
1845
+ ],
1846
+ "dtype": "float16"
1847
+ },
1848
+ {
1849
+ "name": "encoder.layer.12.output.LayerNorm.bias",
1850
+ "shape": [
1851
+ 1024
1852
+ ],
1853
+ "dtype": "float16"
1854
+ },
1855
+ {
1856
+ "name": "encoder.layer.12.output.LayerNorm.weight",
1857
+ "shape": [
1858
+ 1024
1859
+ ],
1860
+ "dtype": "float16"
1861
+ },
1862
+ {
1863
+ "name": "encoder.layer.12.output.dense.bias",
1864
+ "shape": [
1865
+ 1024
1866
+ ],
1867
+ "dtype": "float16"
1868
+ },
1869
+ {
1870
+ "name": "encoder.layer.13.attention.output.LayerNorm.bias",
1871
+ "shape": [
1872
+ 1024
1873
+ ],
1874
+ "dtype": "float16"
1875
+ },
1876
+ {
1877
+ "name": "encoder.layer.13.attention.output.LayerNorm.weight",
1878
+ "shape": [
1879
+ 1024
1880
+ ],
1881
+ "dtype": "float16"
1882
+ },
1883
+ {
1884
+ "name": "encoder.layer.13.attention.output.dense.bias",
1885
+ "shape": [
1886
+ 1024
1887
+ ],
1888
+ "dtype": "float16"
1889
+ },
1890
+ {
1891
+ "name": "encoder.layer.13.attention.self.key.bias",
1892
+ "shape": [
1893
+ 1024
1894
+ ],
1895
+ "dtype": "float16"
1896
+ },
1897
+ {
1898
+ "name": "encoder.layer.13.attention.self.query.bias",
1899
+ "shape": [
1900
+ 1024
1901
+ ],
1902
+ "dtype": "float16"
1903
+ },
1904
+ {
1905
+ "name": "encoder.layer.13.attention.self.value.bias",
1906
+ "shape": [
1907
+ 1024
1908
+ ],
1909
+ "dtype": "float16"
1910
+ },
1911
+ {
1912
+ "name": "encoder.layer.13.intermediate.dense.bias",
1913
+ "shape": [
1914
+ 4096
1915
+ ],
1916
+ "dtype": "float16"
1917
+ },
1918
+ {
1919
+ "name": "encoder.layer.13.output.LayerNorm.bias",
1920
+ "shape": [
1921
+ 1024
1922
+ ],
1923
+ "dtype": "float16"
1924
+ },
1925
+ {
1926
+ "name": "encoder.layer.13.output.LayerNorm.weight",
1927
+ "shape": [
1928
+ 1024
1929
+ ],
1930
+ "dtype": "float16"
1931
+ },
1932
+ {
1933
+ "name": "encoder.layer.13.output.dense.bias",
1934
+ "shape": [
1935
+ 1024
1936
+ ],
1937
+ "dtype": "float16"
1938
+ },
1939
+ {
1940
+ "name": "encoder.layer.14.attention.output.LayerNorm.bias",
1941
+ "shape": [
1942
+ 1024
1943
+ ],
1944
+ "dtype": "float16"
1945
+ },
1946
+ {
1947
+ "name": "encoder.layer.14.attention.output.LayerNorm.weight",
1948
+ "shape": [
1949
+ 1024
1950
+ ],
1951
+ "dtype": "float16"
1952
+ },
1953
+ {
1954
+ "name": "encoder.layer.14.attention.output.dense.bias",
1955
+ "shape": [
1956
+ 1024
1957
+ ],
1958
+ "dtype": "float16"
1959
+ },
1960
+ {
1961
+ "name": "encoder.layer.14.attention.self.key.bias",
1962
+ "shape": [
1963
+ 1024
1964
+ ],
1965
+ "dtype": "float16"
1966
+ },
1967
+ {
1968
+ "name": "encoder.layer.14.attention.self.query.bias",
1969
+ "shape": [
1970
+ 1024
1971
+ ],
1972
+ "dtype": "float16"
1973
+ },
1974
+ {
1975
+ "name": "encoder.layer.14.attention.self.value.bias",
1976
+ "shape": [
1977
+ 1024
1978
+ ],
1979
+ "dtype": "float16"
1980
+ },
1981
+ {
1982
+ "name": "encoder.layer.14.intermediate.dense.bias",
1983
+ "shape": [
1984
+ 4096
1985
+ ],
1986
+ "dtype": "float16"
1987
+ },
1988
+ {
1989
+ "name": "encoder.layer.14.output.LayerNorm.bias",
1990
+ "shape": [
1991
+ 1024
1992
+ ],
1993
+ "dtype": "float16"
1994
+ },
1995
+ {
1996
+ "name": "encoder.layer.14.output.LayerNorm.weight",
1997
+ "shape": [
1998
+ 1024
1999
+ ],
2000
+ "dtype": "float16"
2001
+ },
2002
+ {
2003
+ "name": "encoder.layer.14.output.dense.bias",
2004
+ "shape": [
2005
+ 1024
2006
+ ],
2007
+ "dtype": "float16"
2008
+ },
2009
+ {
2010
+ "name": "encoder.layer.15.attention.output.LayerNorm.bias",
2011
+ "shape": [
2012
+ 1024
2013
+ ],
2014
+ "dtype": "float16"
2015
+ },
2016
+ {
2017
+ "name": "encoder.layer.15.attention.output.LayerNorm.weight",
2018
+ "shape": [
2019
+ 1024
2020
+ ],
2021
+ "dtype": "float16"
2022
+ },
2023
+ {
2024
+ "name": "encoder.layer.15.attention.output.dense.bias",
2025
+ "shape": [
2026
+ 1024
2027
+ ],
2028
+ "dtype": "float16"
2029
+ },
2030
+ {
2031
+ "name": "encoder.layer.15.attention.self.key.bias",
2032
+ "shape": [
2033
+ 1024
2034
+ ],
2035
+ "dtype": "float16"
2036
+ },
2037
+ {
2038
+ "name": "encoder.layer.15.attention.self.query.bias",
2039
+ "shape": [
2040
+ 1024
2041
+ ],
2042
+ "dtype": "float16"
2043
+ },
2044
+ {
2045
+ "name": "encoder.layer.15.attention.self.value.bias",
2046
+ "shape": [
2047
+ 1024
2048
+ ],
2049
+ "dtype": "float16"
2050
+ },
2051
+ {
2052
+ "name": "encoder.layer.15.intermediate.dense.bias",
2053
+ "shape": [
2054
+ 4096
2055
+ ],
2056
+ "dtype": "float16"
2057
+ },
2058
+ {
2059
+ "name": "encoder.layer.15.output.LayerNorm.bias",
2060
+ "shape": [
2061
+ 1024
2062
+ ],
2063
+ "dtype": "float16"
2064
+ },
2065
+ {
2066
+ "name": "encoder.layer.15.output.LayerNorm.weight",
2067
+ "shape": [
2068
+ 1024
2069
+ ],
2070
+ "dtype": "float16"
2071
+ },
2072
+ {
2073
+ "name": "encoder.layer.15.output.dense.bias",
2074
+ "shape": [
2075
+ 1024
2076
+ ],
2077
+ "dtype": "float16"
2078
+ },
2079
+ {
2080
+ "name": "encoder.layer.16.attention.output.LayerNorm.bias",
2081
+ "shape": [
2082
+ 1024
2083
+ ],
2084
+ "dtype": "float16"
2085
+ },
2086
+ {
2087
+ "name": "encoder.layer.16.attention.output.LayerNorm.weight",
2088
+ "shape": [
2089
+ 1024
2090
+ ],
2091
+ "dtype": "float16"
2092
+ },
2093
+ {
2094
+ "name": "encoder.layer.16.attention.output.dense.bias",
2095
+ "shape": [
2096
+ 1024
2097
+ ],
2098
+ "dtype": "float16"
2099
+ },
2100
+ {
2101
+ "name": "encoder.layer.16.attention.self.key.bias",
2102
+ "shape": [
2103
+ 1024
2104
+ ],
2105
+ "dtype": "float16"
2106
+ },
2107
+ {
2108
+ "name": "encoder.layer.16.attention.self.query.bias",
2109
+ "shape": [
2110
+ 1024
2111
+ ],
2112
+ "dtype": "float16"
2113
+ },
2114
+ {
2115
+ "name": "encoder.layer.16.attention.self.value.bias",
2116
+ "shape": [
2117
+ 1024
2118
+ ],
2119
+ "dtype": "float16"
2120
+ },
2121
+ {
2122
+ "name": "encoder.layer.16.intermediate.dense.bias",
2123
+ "shape": [
2124
+ 4096
2125
+ ],
2126
+ "dtype": "float16"
2127
+ },
2128
+ {
2129
+ "name": "encoder.layer.16.output.LayerNorm.bias",
2130
+ "shape": [
2131
+ 1024
2132
+ ],
2133
+ "dtype": "float16"
2134
+ },
2135
+ {
2136
+ "name": "encoder.layer.16.output.LayerNorm.weight",
2137
+ "shape": [
2138
+ 1024
2139
+ ],
2140
+ "dtype": "float16"
2141
+ },
2142
+ {
2143
+ "name": "encoder.layer.16.output.dense.bias",
2144
+ "shape": [
2145
+ 1024
2146
+ ],
2147
+ "dtype": "float16"
2148
+ },
2149
+ {
2150
+ "name": "encoder.layer.17.attention.output.LayerNorm.bias",
2151
+ "shape": [
2152
+ 1024
2153
+ ],
2154
+ "dtype": "float16"
2155
+ },
2156
+ {
2157
+ "name": "encoder.layer.17.attention.output.LayerNorm.weight",
2158
+ "shape": [
2159
+ 1024
2160
+ ],
2161
+ "dtype": "float16"
2162
+ },
2163
+ {
2164
+ "name": "encoder.layer.17.attention.output.dense.bias",
2165
+ "shape": [
2166
+ 1024
2167
+ ],
2168
+ "dtype": "float16"
2169
+ },
2170
+ {
2171
+ "name": "encoder.layer.17.attention.self.key.bias",
2172
+ "shape": [
2173
+ 1024
2174
+ ],
2175
+ "dtype": "float16"
2176
+ },
2177
+ {
2178
+ "name": "encoder.layer.17.attention.self.query.bias",
2179
+ "shape": [
2180
+ 1024
2181
+ ],
2182
+ "dtype": "float16"
2183
+ },
2184
+ {
2185
+ "name": "encoder.layer.17.attention.self.value.bias",
2186
+ "shape": [
2187
+ 1024
2188
+ ],
2189
+ "dtype": "float16"
2190
+ },
2191
+ {
2192
+ "name": "encoder.layer.17.intermediate.dense.bias",
2193
+ "shape": [
2194
+ 4096
2195
+ ],
2196
+ "dtype": "float16"
2197
+ },
2198
+ {
2199
+ "name": "encoder.layer.17.output.LayerNorm.bias",
2200
+ "shape": [
2201
+ 1024
2202
+ ],
2203
+ "dtype": "float16"
2204
+ },
2205
+ {
2206
+ "name": "encoder.layer.17.output.LayerNorm.weight",
2207
+ "shape": [
2208
+ 1024
2209
+ ],
2210
+ "dtype": "float16"
2211
+ },
2212
+ {
2213
+ "name": "encoder.layer.17.output.dense.bias",
2214
+ "shape": [
2215
+ 1024
2216
+ ],
2217
+ "dtype": "float16"
2218
+ },
2219
+ {
2220
+ "name": "encoder.layer.18.attention.output.LayerNorm.bias",
2221
+ "shape": [
2222
+ 1024
2223
+ ],
2224
+ "dtype": "float16"
2225
+ },
2226
+ {
2227
+ "name": "encoder.layer.18.attention.output.LayerNorm.weight",
2228
+ "shape": [
2229
+ 1024
2230
+ ],
2231
+ "dtype": "float16"
2232
+ },
2233
+ {
2234
+ "name": "encoder.layer.18.attention.output.dense.bias",
2235
+ "shape": [
2236
+ 1024
2237
+ ],
2238
+ "dtype": "float16"
2239
+ },
2240
+ {
2241
+ "name": "encoder.layer.18.attention.self.key.bias",
2242
+ "shape": [
2243
+ 1024
2244
+ ],
2245
+ "dtype": "float16"
2246
+ },
2247
+ {
2248
+ "name": "encoder.layer.18.attention.self.query.bias",
2249
+ "shape": [
2250
+ 1024
2251
+ ],
2252
+ "dtype": "float16"
2253
+ },
2254
+ {
2255
+ "name": "encoder.layer.18.attention.self.value.bias",
2256
+ "shape": [
2257
+ 1024
2258
+ ],
2259
+ "dtype": "float16"
2260
+ },
2261
+ {
2262
+ "name": "encoder.layer.18.intermediate.dense.bias",
2263
+ "shape": [
2264
+ 4096
2265
+ ],
2266
+ "dtype": "float16"
2267
+ },
2268
+ {
2269
+ "name": "encoder.layer.18.output.LayerNorm.bias",
2270
+ "shape": [
2271
+ 1024
2272
+ ],
2273
+ "dtype": "float16"
2274
+ },
2275
+ {
2276
+ "name": "encoder.layer.18.output.LayerNorm.weight",
2277
+ "shape": [
2278
+ 1024
2279
+ ],
2280
+ "dtype": "float16"
2281
+ },
2282
+ {
2283
+ "name": "encoder.layer.18.output.dense.bias",
2284
+ "shape": [
2285
+ 1024
2286
+ ],
2287
+ "dtype": "float16"
2288
+ },
2289
+ {
2290
+ "name": "encoder.layer.19.attention.output.LayerNorm.bias",
2291
+ "shape": [
2292
+ 1024
2293
+ ],
2294
+ "dtype": "float16"
2295
+ },
2296
+ {
2297
+ "name": "encoder.layer.19.attention.output.LayerNorm.weight",
2298
+ "shape": [
2299
+ 1024
2300
+ ],
2301
+ "dtype": "float16"
2302
+ },
2303
+ {
2304
+ "name": "encoder.layer.19.attention.output.dense.bias",
2305
+ "shape": [
2306
+ 1024
2307
+ ],
2308
+ "dtype": "float16"
2309
+ },
2310
+ {
2311
+ "name": "encoder.layer.19.attention.self.key.bias",
2312
+ "shape": [
2313
+ 1024
2314
+ ],
2315
+ "dtype": "float16"
2316
+ },
2317
+ {
2318
+ "name": "encoder.layer.19.attention.self.query.bias",
2319
+ "shape": [
2320
+ 1024
2321
+ ],
2322
+ "dtype": "float16"
2323
+ },
2324
+ {
2325
+ "name": "encoder.layer.19.attention.self.value.bias",
2326
+ "shape": [
2327
+ 1024
2328
+ ],
2329
+ "dtype": "float16"
2330
+ },
2331
+ {
2332
+ "name": "encoder.layer.19.intermediate.dense.bias",
2333
+ "shape": [
2334
+ 4096
2335
+ ],
2336
+ "dtype": "float16"
2337
+ },
2338
+ {
2339
+ "name": "encoder.layer.19.output.LayerNorm.bias",
2340
+ "shape": [
2341
+ 1024
2342
+ ],
2343
+ "dtype": "float16"
2344
+ },
2345
+ {
2346
+ "name": "encoder.layer.19.output.LayerNorm.weight",
2347
+ "shape": [
2348
+ 1024
2349
+ ],
2350
+ "dtype": "float16"
2351
+ },
2352
+ {
2353
+ "name": "encoder.layer.19.output.dense.bias",
2354
+ "shape": [
2355
+ 1024
2356
+ ],
2357
+ "dtype": "float16"
2358
+ },
2359
+ {
2360
+ "name": "encoder.layer.2.attention.output.LayerNorm.bias",
2361
+ "shape": [
2362
+ 1024
2363
+ ],
2364
+ "dtype": "float16"
2365
+ },
2366
+ {
2367
+ "name": "encoder.layer.2.attention.output.LayerNorm.weight",
2368
+ "shape": [
2369
+ 1024
2370
+ ],
2371
+ "dtype": "float16"
2372
+ },
2373
+ {
2374
+ "name": "encoder.layer.2.attention.output.dense.bias",
2375
+ "shape": [
2376
+ 1024
2377
+ ],
2378
+ "dtype": "float16"
2379
+ },
2380
+ {
2381
+ "name": "encoder.layer.2.attention.self.key.bias",
2382
+ "shape": [
2383
+ 1024
2384
+ ],
2385
+ "dtype": "float16"
2386
+ },
2387
+ {
2388
+ "name": "encoder.layer.2.attention.self.query.bias",
2389
+ "shape": [
2390
+ 1024
2391
+ ],
2392
+ "dtype": "float16"
2393
+ },
2394
+ {
2395
+ "name": "encoder.layer.2.attention.self.value.bias",
2396
+ "shape": [
2397
+ 1024
2398
+ ],
2399
+ "dtype": "float16"
2400
+ },
2401
+ {
2402
+ "name": "encoder.layer.2.intermediate.dense.bias",
2403
+ "shape": [
2404
+ 4096
2405
+ ],
2406
+ "dtype": "float16"
2407
+ },
2408
+ {
2409
+ "name": "encoder.layer.2.output.LayerNorm.bias",
2410
+ "shape": [
2411
+ 1024
2412
+ ],
2413
+ "dtype": "float16"
2414
+ },
2415
+ {
2416
+ "name": "encoder.layer.2.output.LayerNorm.weight",
2417
+ "shape": [
2418
+ 1024
2419
+ ],
2420
+ "dtype": "float16"
2421
+ },
2422
+ {
2423
+ "name": "encoder.layer.2.output.dense.bias",
2424
+ "shape": [
2425
+ 1024
2426
+ ],
2427
+ "dtype": "float16"
2428
+ },
2429
+ {
2430
+ "name": "encoder.layer.20.attention.output.LayerNorm.bias",
2431
+ "shape": [
2432
+ 1024
2433
+ ],
2434
+ "dtype": "float16"
2435
+ },
2436
+ {
2437
+ "name": "encoder.layer.20.attention.output.LayerNorm.weight",
2438
+ "shape": [
2439
+ 1024
2440
+ ],
2441
+ "dtype": "float16"
2442
+ },
2443
+ {
2444
+ "name": "encoder.layer.20.attention.output.dense.bias",
2445
+ "shape": [
2446
+ 1024
2447
+ ],
2448
+ "dtype": "float16"
2449
+ },
2450
+ {
2451
+ "name": "encoder.layer.20.attention.self.key.bias",
2452
+ "shape": [
2453
+ 1024
2454
+ ],
2455
+ "dtype": "float16"
2456
+ },
2457
+ {
2458
+ "name": "encoder.layer.20.attention.self.query.bias",
2459
+ "shape": [
2460
+ 1024
2461
+ ],
2462
+ "dtype": "float16"
2463
+ },
2464
+ {
2465
+ "name": "encoder.layer.20.attention.self.value.bias",
2466
+ "shape": [
2467
+ 1024
2468
+ ],
2469
+ "dtype": "float16"
2470
+ },
2471
+ {
2472
+ "name": "encoder.layer.20.intermediate.dense.bias",
2473
+ "shape": [
2474
+ 4096
2475
+ ],
2476
+ "dtype": "float16"
2477
+ },
2478
+ {
2479
+ "name": "encoder.layer.20.output.LayerNorm.bias",
2480
+ "shape": [
2481
+ 1024
2482
+ ],
2483
+ "dtype": "float16"
2484
+ },
2485
+ {
2486
+ "name": "encoder.layer.20.output.LayerNorm.weight",
2487
+ "shape": [
2488
+ 1024
2489
+ ],
2490
+ "dtype": "float16"
2491
+ },
2492
+ {
2493
+ "name": "encoder.layer.20.output.dense.bias",
2494
+ "shape": [
2495
+ 1024
2496
+ ],
2497
+ "dtype": "float16"
2498
+ },
2499
+ {
2500
+ "name": "encoder.layer.21.attention.output.LayerNorm.bias",
2501
+ "shape": [
2502
+ 1024
2503
+ ],
2504
+ "dtype": "float16"
2505
+ },
2506
+ {
2507
+ "name": "encoder.layer.21.attention.output.LayerNorm.weight",
2508
+ "shape": [
2509
+ 1024
2510
+ ],
2511
+ "dtype": "float16"
2512
+ },
2513
+ {
2514
+ "name": "encoder.layer.21.attention.output.dense.bias",
2515
+ "shape": [
2516
+ 1024
2517
+ ],
2518
+ "dtype": "float16"
2519
+ },
2520
+ {
2521
+ "name": "encoder.layer.21.attention.self.key.bias",
2522
+ "shape": [
2523
+ 1024
2524
+ ],
2525
+ "dtype": "float16"
2526
+ },
2527
+ {
2528
+ "name": "encoder.layer.21.attention.self.query.bias",
2529
+ "shape": [
2530
+ 1024
2531
+ ],
2532
+ "dtype": "float16"
2533
+ },
2534
+ {
2535
+ "name": "encoder.layer.21.attention.self.value.bias",
2536
+ "shape": [
2537
+ 1024
2538
+ ],
2539
+ "dtype": "float16"
2540
+ },
2541
+ {
2542
+ "name": "encoder.layer.21.intermediate.dense.bias",
2543
+ "shape": [
2544
+ 4096
2545
+ ],
2546
+ "dtype": "float16"
2547
+ },
2548
+ {
2549
+ "name": "encoder.layer.21.output.LayerNorm.bias",
2550
+ "shape": [
2551
+ 1024
2552
+ ],
2553
+ "dtype": "float16"
2554
+ },
2555
+ {
2556
+ "name": "encoder.layer.21.output.LayerNorm.weight",
2557
+ "shape": [
2558
+ 1024
2559
+ ],
2560
+ "dtype": "float16"
2561
+ },
2562
+ {
2563
+ "name": "encoder.layer.21.output.dense.bias",
2564
+ "shape": [
2565
+ 1024
2566
+ ],
2567
+ "dtype": "float16"
2568
+ },
2569
+ {
2570
+ "name": "encoder.layer.22.attention.output.LayerNorm.bias",
2571
+ "shape": [
2572
+ 1024
2573
+ ],
2574
+ "dtype": "float16"
2575
+ },
2576
+ {
2577
+ "name": "encoder.layer.22.attention.output.LayerNorm.weight",
2578
+ "shape": [
2579
+ 1024
2580
+ ],
2581
+ "dtype": "float16"
2582
+ },
2583
+ {
2584
+ "name": "encoder.layer.22.attention.output.dense.bias",
2585
+ "shape": [
2586
+ 1024
2587
+ ],
2588
+ "dtype": "float16"
2589
+ },
2590
+ {
2591
+ "name": "encoder.layer.22.attention.self.key.bias",
2592
+ "shape": [
2593
+ 1024
2594
+ ],
2595
+ "dtype": "float16"
2596
+ },
2597
+ {
2598
+ "name": "encoder.layer.22.attention.self.query.bias",
2599
+ "shape": [
2600
+ 1024
2601
+ ],
2602
+ "dtype": "float16"
2603
+ },
2604
+ {
2605
+ "name": "encoder.layer.22.attention.self.value.bias",
2606
+ "shape": [
2607
+ 1024
2608
+ ],
2609
+ "dtype": "float16"
2610
+ },
2611
+ {
2612
+ "name": "encoder.layer.22.intermediate.dense.bias",
2613
+ "shape": [
2614
+ 4096
2615
+ ],
2616
+ "dtype": "float16"
2617
+ },
2618
+ {
2619
+ "name": "encoder.layer.22.output.LayerNorm.bias",
2620
+ "shape": [
2621
+ 1024
2622
+ ],
2623
+ "dtype": "float16"
2624
+ },
2625
+ {
2626
+ "name": "encoder.layer.22.output.LayerNorm.weight",
2627
+ "shape": [
2628
+ 1024
2629
+ ],
2630
+ "dtype": "float16"
2631
+ },
2632
+ {
2633
+ "name": "encoder.layer.22.output.dense.bias",
2634
+ "shape": [
2635
+ 1024
2636
+ ],
2637
+ "dtype": "float16"
2638
+ },
2639
+ {
2640
+ "name": "encoder.layer.23.attention.output.LayerNorm.bias",
2641
+ "shape": [
2642
+ 1024
2643
+ ],
2644
+ "dtype": "float16"
2645
+ },
2646
+ {
2647
+ "name": "encoder.layer.23.attention.output.LayerNorm.weight",
2648
+ "shape": [
2649
+ 1024
2650
+ ],
2651
+ "dtype": "float16"
2652
+ },
2653
+ {
2654
+ "name": "encoder.layer.23.attention.output.dense.bias",
2655
+ "shape": [
2656
+ 1024
2657
+ ],
2658
+ "dtype": "float16"
2659
+ },
2660
+ {
2661
+ "name": "encoder.layer.23.attention.self.key.bias",
2662
+ "shape": [
2663
+ 1024
2664
+ ],
2665
+ "dtype": "float16"
2666
+ },
2667
+ {
2668
+ "name": "encoder.layer.23.attention.self.query.bias",
2669
+ "shape": [
2670
+ 1024
2671
+ ],
2672
+ "dtype": "float16"
2673
+ },
2674
+ {
2675
+ "name": "encoder.layer.23.attention.self.value.bias",
2676
+ "shape": [
2677
+ 1024
2678
+ ],
2679
+ "dtype": "float16"
2680
+ },
2681
+ {
2682
+ "name": "encoder.layer.23.intermediate.dense.bias",
2683
+ "shape": [
2684
+ 4096
2685
+ ],
2686
+ "dtype": "float16"
2687
+ },
2688
+ {
2689
+ "name": "encoder.layer.23.output.LayerNorm.bias",
2690
+ "shape": [
2691
+ 1024
2692
+ ],
2693
+ "dtype": "float16"
2694
+ },
2695
+ {
2696
+ "name": "encoder.layer.23.output.LayerNorm.weight",
2697
+ "shape": [
2698
+ 1024
2699
+ ],
2700
+ "dtype": "float16"
2701
+ },
2702
+ {
2703
+ "name": "encoder.layer.23.output.dense.bias",
2704
+ "shape": [
2705
+ 1024
2706
+ ],
2707
+ "dtype": "float16"
2708
+ },
2709
+ {
2710
+ "name": "encoder.layer.3.attention.output.LayerNorm.bias",
2711
+ "shape": [
2712
+ 1024
2713
+ ],
2714
+ "dtype": "float16"
2715
+ },
2716
+ {
2717
+ "name": "encoder.layer.3.attention.output.LayerNorm.weight",
2718
+ "shape": [
2719
+ 1024
2720
+ ],
2721
+ "dtype": "float16"
2722
+ },
2723
+ {
2724
+ "name": "encoder.layer.3.attention.output.dense.bias",
2725
+ "shape": [
2726
+ 1024
2727
+ ],
2728
+ "dtype": "float16"
2729
+ },
2730
+ {
2731
+ "name": "encoder.layer.3.attention.self.key.bias",
2732
+ "shape": [
2733
+ 1024
2734
+ ],
2735
+ "dtype": "float16"
2736
+ },
2737
+ {
2738
+ "name": "encoder.layer.3.attention.self.query.bias",
2739
+ "shape": [
2740
+ 1024
2741
+ ],
2742
+ "dtype": "float16"
2743
+ },
2744
+ {
2745
+ "name": "encoder.layer.3.attention.self.value.bias",
2746
+ "shape": [
2747
+ 1024
2748
+ ],
2749
+ "dtype": "float16"
2750
+ },
2751
+ {
2752
+ "name": "encoder.layer.3.intermediate.dense.bias",
2753
+ "shape": [
2754
+ 4096
2755
+ ],
2756
+ "dtype": "float16"
2757
+ },
2758
+ {
2759
+ "name": "encoder.layer.3.output.LayerNorm.bias",
2760
+ "shape": [
2761
+ 1024
2762
+ ],
2763
+ "dtype": "float16"
2764
+ },
2765
+ {
2766
+ "name": "encoder.layer.3.output.LayerNorm.weight",
2767
+ "shape": [
2768
+ 1024
2769
+ ],
2770
+ "dtype": "float16"
2771
+ },
2772
+ {
2773
+ "name": "encoder.layer.3.output.dense.bias",
2774
+ "shape": [
2775
+ 1024
2776
+ ],
2777
+ "dtype": "float16"
2778
+ },
2779
+ {
2780
+ "name": "encoder.layer.4.attention.output.LayerNorm.bias",
2781
+ "shape": [
2782
+ 1024
2783
+ ],
2784
+ "dtype": "float16"
2785
+ },
2786
+ {
2787
+ "name": "encoder.layer.4.attention.output.LayerNorm.weight",
2788
+ "shape": [
2789
+ 1024
2790
+ ],
2791
+ "dtype": "float16"
2792
+ },
2793
+ {
2794
+ "name": "encoder.layer.4.attention.output.dense.bias",
2795
+ "shape": [
2796
+ 1024
2797
+ ],
2798
+ "dtype": "float16"
2799
+ },
2800
+ {
2801
+ "name": "encoder.layer.4.attention.self.key.bias",
2802
+ "shape": [
2803
+ 1024
2804
+ ],
2805
+ "dtype": "float16"
2806
+ },
2807
+ {
2808
+ "name": "encoder.layer.4.attention.self.query.bias",
2809
+ "shape": [
2810
+ 1024
2811
+ ],
2812
+ "dtype": "float16"
2813
+ },
2814
+ {
2815
+ "name": "encoder.layer.4.attention.self.value.bias",
2816
+ "shape": [
2817
+ 1024
2818
+ ],
2819
+ "dtype": "float16"
2820
+ },
2821
+ {
2822
+ "name": "encoder.layer.4.intermediate.dense.bias",
2823
+ "shape": [
2824
+ 4096
2825
+ ],
2826
+ "dtype": "float16"
2827
+ },
2828
+ {
2829
+ "name": "encoder.layer.4.output.LayerNorm.bias",
2830
+ "shape": [
2831
+ 1024
2832
+ ],
2833
+ "dtype": "float16"
2834
+ },
2835
+ {
2836
+ "name": "encoder.layer.4.output.LayerNorm.weight",
2837
+ "shape": [
2838
+ 1024
2839
+ ],
2840
+ "dtype": "float16"
2841
+ },
2842
+ {
2843
+ "name": "encoder.layer.4.output.dense.bias",
2844
+ "shape": [
2845
+ 1024
2846
+ ],
2847
+ "dtype": "float16"
2848
+ },
2849
+ {
2850
+ "name": "encoder.layer.5.attention.output.LayerNorm.bias",
2851
+ "shape": [
2852
+ 1024
2853
+ ],
2854
+ "dtype": "float16"
2855
+ },
2856
+ {
2857
+ "name": "encoder.layer.5.attention.output.LayerNorm.weight",
2858
+ "shape": [
2859
+ 1024
2860
+ ],
2861
+ "dtype": "float16"
2862
+ },
2863
+ {
2864
+ "name": "encoder.layer.5.attention.output.dense.bias",
2865
+ "shape": [
2866
+ 1024
2867
+ ],
2868
+ "dtype": "float16"
2869
+ },
2870
+ {
2871
+ "name": "encoder.layer.5.attention.self.key.bias",
2872
+ "shape": [
2873
+ 1024
2874
+ ],
2875
+ "dtype": "float16"
2876
+ },
2877
+ {
2878
+ "name": "encoder.layer.5.attention.self.query.bias",
2879
+ "shape": [
2880
+ 1024
2881
+ ],
2882
+ "dtype": "float16"
2883
+ },
2884
+ {
2885
+ "name": "encoder.layer.5.attention.self.value.bias",
2886
+ "shape": [
2887
+ 1024
2888
+ ],
2889
+ "dtype": "float16"
2890
+ },
2891
+ {
2892
+ "name": "encoder.layer.5.intermediate.dense.bias",
2893
+ "shape": [
2894
+ 4096
2895
+ ],
2896
+ "dtype": "float16"
2897
+ },
2898
+ {
2899
+ "name": "encoder.layer.5.output.LayerNorm.bias",
2900
+ "shape": [
2901
+ 1024
2902
+ ],
2903
+ "dtype": "float16"
2904
+ },
2905
+ {
2906
+ "name": "encoder.layer.5.output.LayerNorm.weight",
2907
+ "shape": [
2908
+ 1024
2909
+ ],
2910
+ "dtype": "float16"
2911
+ },
2912
+ {
2913
+ "name": "encoder.layer.5.output.dense.bias",
2914
+ "shape": [
2915
+ 1024
2916
+ ],
2917
+ "dtype": "float16"
2918
+ },
2919
+ {
2920
+ "name": "encoder.layer.6.attention.output.LayerNorm.bias",
2921
+ "shape": [
2922
+ 1024
2923
+ ],
2924
+ "dtype": "float16"
2925
+ },
2926
+ {
2927
+ "name": "encoder.layer.6.attention.output.LayerNorm.weight",
2928
+ "shape": [
2929
+ 1024
2930
+ ],
2931
+ "dtype": "float16"
2932
+ },
2933
+ {
2934
+ "name": "encoder.layer.6.attention.output.dense.bias",
2935
+ "shape": [
2936
+ 1024
2937
+ ],
2938
+ "dtype": "float16"
2939
+ },
2940
+ {
2941
+ "name": "encoder.layer.6.attention.self.key.bias",
2942
+ "shape": [
2943
+ 1024
2944
+ ],
2945
+ "dtype": "float16"
2946
+ },
2947
+ {
2948
+ "name": "encoder.layer.6.attention.self.query.bias",
2949
+ "shape": [
2950
+ 1024
2951
+ ],
2952
+ "dtype": "float16"
2953
+ },
2954
+ {
2955
+ "name": "encoder.layer.6.attention.self.value.bias",
2956
+ "shape": [
2957
+ 1024
2958
+ ],
2959
+ "dtype": "float16"
2960
+ },
2961
+ {
2962
+ "name": "encoder.layer.6.intermediate.dense.bias",
2963
+ "shape": [
2964
+ 4096
2965
+ ],
2966
+ "dtype": "float16"
2967
+ },
2968
+ {
2969
+ "name": "encoder.layer.6.output.LayerNorm.bias",
2970
+ "shape": [
2971
+ 1024
2972
+ ],
2973
+ "dtype": "float16"
2974
+ },
2975
+ {
2976
+ "name": "encoder.layer.6.output.LayerNorm.weight",
2977
+ "shape": [
2978
+ 1024
2979
+ ],
2980
+ "dtype": "float16"
2981
+ },
2982
+ {
2983
+ "name": "encoder.layer.6.output.dense.bias",
2984
+ "shape": [
2985
+ 1024
2986
+ ],
2987
+ "dtype": "float16"
2988
+ },
2989
+ {
2990
+ "name": "encoder.layer.7.attention.output.LayerNorm.bias",
2991
+ "shape": [
2992
+ 1024
2993
+ ],
2994
+ "dtype": "float16"
2995
+ },
2996
+ {
2997
+ "name": "encoder.layer.7.attention.output.LayerNorm.weight",
2998
+ "shape": [
2999
+ 1024
3000
+ ],
3001
+ "dtype": "float16"
3002
+ },
3003
+ {
3004
+ "name": "encoder.layer.7.attention.output.dense.bias",
3005
+ "shape": [
3006
+ 1024
3007
+ ],
3008
+ "dtype": "float16"
3009
+ },
3010
+ {
3011
+ "name": "encoder.layer.7.attention.self.key.bias",
3012
+ "shape": [
3013
+ 1024
3014
+ ],
3015
+ "dtype": "float16"
3016
+ },
3017
+ {
3018
+ "name": "encoder.layer.7.attention.self.query.bias",
3019
+ "shape": [
3020
+ 1024
3021
+ ],
3022
+ "dtype": "float16"
3023
+ },
3024
+ {
3025
+ "name": "encoder.layer.7.attention.self.value.bias",
3026
+ "shape": [
3027
+ 1024
3028
+ ],
3029
+ "dtype": "float16"
3030
+ },
3031
+ {
3032
+ "name": "encoder.layer.7.intermediate.dense.bias",
3033
+ "shape": [
3034
+ 4096
3035
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.7.output.LayerNorm.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.7.output.LayerNorm.weight",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.7.output.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.output.LayerNorm.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.output.LayerNorm.weight",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.output.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.self.key.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.self.query.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.attention.self.value.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.intermediate.dense.bias",
+ "shape": [
+ 4096
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.output.LayerNorm.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.output.LayerNorm.weight",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.8.output.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.output.LayerNorm.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.output.LayerNorm.weight",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.output.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.self.key.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.self.query.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.attention.self.value.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.intermediate.dense.bias",
+ "shape": [
+ 4096
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.output.LayerNorm.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.output.LayerNorm.weight",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "encoder.layer.9.output.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ },
+ {
+ "name": "pooler.dense.bias",
+ "shape": [
+ 1024
+ ],
+ "dtype": "float16"
+ }
+ ]
+ }
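Note: each entry in the listing above carries a `name`, a `shape`, and a `dtype`, which is enough to estimate how much storage the listed tensors need. The following is a minimal sketch under stated assumptions: the helper `entry_nbytes` and the `DTYPE_BYTES` table are hypothetical names introduced here, the three sample entries are copied from the tail of the listing, and only the `float16` width (2 bytes) is covered because it is the only dtype visible in this part of the diff.

```python
from math import prod

# Byte width per dtype string. Only "float16" appears in the entries shown
# above; other dtypes would have to be added before use (assumption).
DTYPE_BYTES = {"float16": 2}


def entry_nbytes(entry: dict) -> int:
    """Return the storage size in bytes of one manifest entry."""
    return prod(entry["shape"]) * DTYPE_BYTES[entry["dtype"]]


# Three entries copied from the tail of the listing above.
entries = [
    {"name": "encoder.layer.9.intermediate.dense.bias", "shape": [4096], "dtype": "float16"},
    {"name": "encoder.layer.9.output.dense.bias", "shape": [1024], "dtype": "float16"},
    {"name": "pooler.dense.bias", "shape": [1024], "dtype": "float16"},
]

# Sum the per-entry sizes to get the footprint of the sampled tensors.
total = sum(entry_nbytes(e) for e in entries)
print(total)
```

The same function could be mapped over the full entry list once the file is loaded with `json.load`; the key under which the list is stored is not visible in this part of the diff, so it is left out here.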
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f59925fcb90c92b894cb93e51bb9b4a6105c5c249fe54ce1c704420ac39b81af
+ size 17082756
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "250001": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "XLMRobertaTokenizer",
+ "unk_token": "<unk>"
+ }
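Note: the config above names each special token by its string (`bos_token`, `cls_token`, and so on), while `added_tokens_decoder` maps token ids to those strings. The sketch below resolves each named token to its id using only values copied from the diff; the function name `special_token_ids` is a hypothetical helper introduced for illustration, and the dictionary is a trimmed subset of the file, not the full config.

```python
# Subset of tokenizer_config.json copied from the diff above; fields that are
# not needed for the lookup (lstrip, rstrip, ...) are omitted.
config = {
    "added_tokens_decoder": {
        "0": {"content": "<s>"},
        "1": {"content": "<pad>"},
        "2": {"content": "</s>"},
        "3": {"content": "<unk>"},
        "250001": {"content": "<mask>"},
    },
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "mask_token": "<mask>",
    "pad_token": "<pad>",
    "sep_token": "</s>",
    "unk_token": "<unk>",
}


def special_token_ids(cfg: dict) -> dict:
    """Map each '*_token' key to the id listed in added_tokens_decoder."""
    # Invert the decoder table: token string -> integer id.
    content_to_id = {
        value["content"]: int(key)
        for key, value in cfg["added_tokens_decoder"].items()
    }
    return {
        name: content_to_id[token]
        for name, token in cfg.items()
        if name.endswith("_token")
    }


ids = special_token_ids(config)
print(ids)
```

In this config `bos_token` and `cls_token` share one id, as do `eos_token` and `sep_token`, because each pair uses the same token string.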
weights.00.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:300cb491160d5d6341397f850e3f1b8bdfb3b7ecf21b1f77eeb5f75c731b99ee
+ size 561220870
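Note: the three lines added for `weights.00.safetensors` are a Git LFS pointer rather than the weights themselves. The pointer records a spec version, a SHA-256 object id, and the byte size of the real file. A downloaded file can be checked against those fields; the sketch below is one way to do that, assuming the pointer text is available without the diff's leading `+`. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    # Each pointer line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that data has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:300cb491160d5d6341397f850e3f1b8bdfb3b7ecf21b1f77eeb5f75c731b99ee\n"
    "size 561220870\n"
)
print(pointer["algo"], pointer["size"])
```

For a file of this size (about 561 MB), hashing in chunks with `hashlib.sha256().update(...)` would likely be preferable to loading the whole file into memory; `matches_pointer` takes bytes only to keep the sketch short.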