hongyongjiang committed on
Commit e6b8d1a · verified · 1 parent: eb27fc5

Upload qwen2.5-vl-3b/mmproj.txt with huggingface_hub

Files changed (1)
  1. qwen2.5-vl-3b/mmproj.txt +823 -0
qwen2.5-vl-3b/mmproj.txt ADDED
@@ -0,0 +1,823 @@
+ Loading GGUF file: Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf
+ Found 520 tensors
+ v.patch_embd.weight -> model.patch_embd.weight shape: (1280, 3, 14, 14)
+ v.patch_embd.weight.1 -> model.patch_embd.weight.1 shape: (1280, 3, 14, 14)
+ Converting v.blk.0.ln1.weight to FP16
+ v.blk.0.ln1.weight -> model.layers.0.ln1.weight shape: (1280,)
+ Converting v.blk.0.ln2.weight to FP16
+ v.blk.0.ln2.weight -> model.layers.0.ln2.weight shape: (1280,)
+ v.blk.0.attn_q.weight -> model.layers.0.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.0.attn_k.weight -> model.layers.0.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.0.attn_v.weight -> model.layers.0.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.0.attn_q.bias to FP16
+ v.blk.0.attn_q.bias -> model.layers.0.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.0.attn_k.bias to FP16
+ v.blk.0.attn_k.bias -> model.layers.0.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.0.attn_v.bias to FP16
+ v.blk.0.attn_v.bias -> model.layers.0.self_attn.v_proj.bias shape: (1280,)
+ v.blk.0.attn_out.weight -> model.layers.0.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.0.attn_out.bias to FP16
+ v.blk.0.attn_out.bias -> model.layers.0.self_attn.o_proj.bias shape: (1280,)
+ v.blk.0.ffn_gate.weight -> model.layers.0.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.0.ffn_gate.bias to FP16
+ v.blk.0.ffn_gate.bias -> model.layers.0.mlp.gate_proj.bias shape: (3420,)
+ v.blk.0.ffn_up.weight -> model.layers.0.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.0.ffn_up.bias to FP16
+ v.blk.0.ffn_up.bias -> model.layers.0.mlp.up_proj.bias shape: (3420,)
+ v.blk.0.ffn_down.weight -> model.layers.0.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.0.ffn_down.bias to FP16
+ v.blk.0.ffn_down.bias -> model.layers.0.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.1.ln1.weight to FP16
+ v.blk.1.ln1.weight -> model.layers.1.ln1.weight shape: (1280,)
+ Converting v.blk.1.ln2.weight to FP16
+ v.blk.1.ln2.weight -> model.layers.1.ln2.weight shape: (1280,)
+ v.blk.1.attn_q.weight -> model.layers.1.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.1.attn_k.weight -> model.layers.1.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.1.attn_v.weight -> model.layers.1.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.1.attn_q.bias to FP16
+ v.blk.1.attn_q.bias -> model.layers.1.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.1.attn_k.bias to FP16
+ v.blk.1.attn_k.bias -> model.layers.1.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.1.attn_v.bias to FP16
+ v.blk.1.attn_v.bias -> model.layers.1.self_attn.v_proj.bias shape: (1280,)
+ v.blk.1.attn_out.weight -> model.layers.1.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.1.attn_out.bias to FP16
+ v.blk.1.attn_out.bias -> model.layers.1.self_attn.o_proj.bias shape: (1280,)
+ v.blk.1.ffn_gate.weight -> model.layers.1.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.1.ffn_gate.bias to FP16
+ v.blk.1.ffn_gate.bias -> model.layers.1.mlp.gate_proj.bias shape: (3420,)
+ v.blk.1.ffn_up.weight -> model.layers.1.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.1.ffn_up.bias to FP16
+ v.blk.1.ffn_up.bias -> model.layers.1.mlp.up_proj.bias shape: (3420,)
+ v.blk.1.ffn_down.weight -> model.layers.1.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.1.ffn_down.bias to FP16
+ v.blk.1.ffn_down.bias -> model.layers.1.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.2.ln1.weight to FP16
+ v.blk.2.ln1.weight -> model.layers.2.ln1.weight shape: (1280,)
+ Converting v.blk.2.ln2.weight to FP16
+ v.blk.2.ln2.weight -> model.layers.2.ln2.weight shape: (1280,)
+ v.blk.2.attn_q.weight -> model.layers.2.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.2.attn_k.weight -> model.layers.2.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.2.attn_v.weight -> model.layers.2.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.2.attn_q.bias to FP16
+ v.blk.2.attn_q.bias -> model.layers.2.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.2.attn_k.bias to FP16
+ v.blk.2.attn_k.bias -> model.layers.2.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.2.attn_v.bias to FP16
+ v.blk.2.attn_v.bias -> model.layers.2.self_attn.v_proj.bias shape: (1280,)
+ v.blk.2.attn_out.weight -> model.layers.2.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.2.attn_out.bias to FP16
+ v.blk.2.attn_out.bias -> model.layers.2.self_attn.o_proj.bias shape: (1280,)
+ v.blk.2.ffn_gate.weight -> model.layers.2.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.2.ffn_gate.bias to FP16
+ v.blk.2.ffn_gate.bias -> model.layers.2.mlp.gate_proj.bias shape: (3420,)
+ v.blk.2.ffn_up.weight -> model.layers.2.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.2.ffn_up.bias to FP16
+ v.blk.2.ffn_up.bias -> model.layers.2.mlp.up_proj.bias shape: (3420,)
+ v.blk.2.ffn_down.weight -> model.layers.2.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.2.ffn_down.bias to FP16
+ v.blk.2.ffn_down.bias -> model.layers.2.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.3.ln1.weight to FP16
+ v.blk.3.ln1.weight -> model.layers.3.ln1.weight shape: (1280,)
+ Converting v.blk.3.ln2.weight to FP16
+ v.blk.3.ln2.weight -> model.layers.3.ln2.weight shape: (1280,)
+ v.blk.3.attn_q.weight -> model.layers.3.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.3.attn_k.weight -> model.layers.3.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.3.attn_v.weight -> model.layers.3.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.3.attn_q.bias to FP16
+ v.blk.3.attn_q.bias -> model.layers.3.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.3.attn_k.bias to FP16
+ v.blk.3.attn_k.bias -> model.layers.3.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.3.attn_v.bias to FP16
+ v.blk.3.attn_v.bias -> model.layers.3.self_attn.v_proj.bias shape: (1280,)
+ v.blk.3.attn_out.weight -> model.layers.3.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.3.attn_out.bias to FP16
+ v.blk.3.attn_out.bias -> model.layers.3.self_attn.o_proj.bias shape: (1280,)
+ v.blk.3.ffn_gate.weight -> model.layers.3.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.3.ffn_gate.bias to FP16
+ v.blk.3.ffn_gate.bias -> model.layers.3.mlp.gate_proj.bias shape: (3420,)
+ v.blk.3.ffn_up.weight -> model.layers.3.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.3.ffn_up.bias to FP16
+ v.blk.3.ffn_up.bias -> model.layers.3.mlp.up_proj.bias shape: (3420,)
+ v.blk.3.ffn_down.weight -> model.layers.3.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.3.ffn_down.bias to FP16
+ v.blk.3.ffn_down.bias -> model.layers.3.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.4.ln1.weight to FP16
+ v.blk.4.ln1.weight -> model.layers.4.ln1.weight shape: (1280,)
+ Converting v.blk.4.ln2.weight to FP16
+ v.blk.4.ln2.weight -> model.layers.4.ln2.weight shape: (1280,)
+ v.blk.4.attn_q.weight -> model.layers.4.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.4.attn_k.weight -> model.layers.4.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.4.attn_v.weight -> model.layers.4.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.4.attn_q.bias to FP16
+ v.blk.4.attn_q.bias -> model.layers.4.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.4.attn_k.bias to FP16
+ v.blk.4.attn_k.bias -> model.layers.4.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.4.attn_v.bias to FP16
+ v.blk.4.attn_v.bias -> model.layers.4.self_attn.v_proj.bias shape: (1280,)
+ v.blk.4.attn_out.weight -> model.layers.4.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.4.attn_out.bias to FP16
+ v.blk.4.attn_out.bias -> model.layers.4.self_attn.o_proj.bias shape: (1280,)
+ v.blk.4.ffn_gate.weight -> model.layers.4.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.4.ffn_gate.bias to FP16
+ v.blk.4.ffn_gate.bias -> model.layers.4.mlp.gate_proj.bias shape: (3420,)
+ v.blk.4.ffn_up.weight -> model.layers.4.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.4.ffn_up.bias to FP16
+ v.blk.4.ffn_up.bias -> model.layers.4.mlp.up_proj.bias shape: (3420,)
+ v.blk.4.ffn_down.weight -> model.layers.4.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.4.ffn_down.bias to FP16
+ v.blk.4.ffn_down.bias -> model.layers.4.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.5.ln1.weight to FP16
+ v.blk.5.ln1.weight -> model.layers.5.ln1.weight shape: (1280,)
+ Converting v.blk.5.ln2.weight to FP16
+ v.blk.5.ln2.weight -> model.layers.5.ln2.weight shape: (1280,)
+ v.blk.5.attn_q.weight -> model.layers.5.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.5.attn_k.weight -> model.layers.5.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.5.attn_v.weight -> model.layers.5.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.5.attn_q.bias to FP16
+ v.blk.5.attn_q.bias -> model.layers.5.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.5.attn_k.bias to FP16
+ v.blk.5.attn_k.bias -> model.layers.5.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.5.attn_v.bias to FP16
+ v.blk.5.attn_v.bias -> model.layers.5.self_attn.v_proj.bias shape: (1280,)
+ v.blk.5.attn_out.weight -> model.layers.5.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.5.attn_out.bias to FP16
+ v.blk.5.attn_out.bias -> model.layers.5.self_attn.o_proj.bias shape: (1280,)
+ v.blk.5.ffn_gate.weight -> model.layers.5.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.5.ffn_gate.bias to FP16
+ v.blk.5.ffn_gate.bias -> model.layers.5.mlp.gate_proj.bias shape: (3420,)
+ v.blk.5.ffn_up.weight -> model.layers.5.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.5.ffn_up.bias to FP16
+ v.blk.5.ffn_up.bias -> model.layers.5.mlp.up_proj.bias shape: (3420,)
+ v.blk.5.ffn_down.weight -> model.layers.5.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.5.ffn_down.bias to FP16
+ v.blk.5.ffn_down.bias -> model.layers.5.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.6.ln1.weight to FP16
+ v.blk.6.ln1.weight -> model.layers.6.ln1.weight shape: (1280,)
+ Converting v.blk.6.ln2.weight to FP16
+ v.blk.6.ln2.weight -> model.layers.6.ln2.weight shape: (1280,)
+ v.blk.6.attn_q.weight -> model.layers.6.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.6.attn_k.weight -> model.layers.6.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.6.attn_v.weight -> model.layers.6.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.6.attn_q.bias to FP16
+ v.blk.6.attn_q.bias -> model.layers.6.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.6.attn_k.bias to FP16
+ v.blk.6.attn_k.bias -> model.layers.6.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.6.attn_v.bias to FP16
+ v.blk.6.attn_v.bias -> model.layers.6.self_attn.v_proj.bias shape: (1280,)
+ v.blk.6.attn_out.weight -> model.layers.6.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.6.attn_out.bias to FP16
+ v.blk.6.attn_out.bias -> model.layers.6.self_attn.o_proj.bias shape: (1280,)
+ v.blk.6.ffn_gate.weight -> model.layers.6.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.6.ffn_gate.bias to FP16
+ v.blk.6.ffn_gate.bias -> model.layers.6.mlp.gate_proj.bias shape: (3420,)
+ v.blk.6.ffn_up.weight -> model.layers.6.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.6.ffn_up.bias to FP16
+ v.blk.6.ffn_up.bias -> model.layers.6.mlp.up_proj.bias shape: (3420,)
+ v.blk.6.ffn_down.weight -> model.layers.6.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.6.ffn_down.bias to FP16
+ v.blk.6.ffn_down.bias -> model.layers.6.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.7.ln1.weight to FP16
+ v.blk.7.ln1.weight -> model.layers.7.ln1.weight shape: (1280,)
+ Converting v.blk.7.ln2.weight to FP16
+ v.blk.7.ln2.weight -> model.layers.7.ln2.weight shape: (1280,)
+ v.blk.7.attn_q.weight -> model.layers.7.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.7.attn_k.weight -> model.layers.7.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.7.attn_v.weight -> model.layers.7.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.7.attn_q.bias to FP16
+ v.blk.7.attn_q.bias -> model.layers.7.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.7.attn_k.bias to FP16
+ v.blk.7.attn_k.bias -> model.layers.7.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.7.attn_v.bias to FP16
+ v.blk.7.attn_v.bias -> model.layers.7.self_attn.v_proj.bias shape: (1280,)
+ v.blk.7.attn_out.weight -> model.layers.7.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.7.attn_out.bias to FP16
+ v.blk.7.attn_out.bias -> model.layers.7.self_attn.o_proj.bias shape: (1280,)
+ v.blk.7.ffn_gate.weight -> model.layers.7.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.7.ffn_gate.bias to FP16
+ v.blk.7.ffn_gate.bias -> model.layers.7.mlp.gate_proj.bias shape: (3420,)
+ v.blk.7.ffn_up.weight -> model.layers.7.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.7.ffn_up.bias to FP16
+ v.blk.7.ffn_up.bias -> model.layers.7.mlp.up_proj.bias shape: (3420,)
+ v.blk.7.ffn_down.weight -> model.layers.7.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.7.ffn_down.bias to FP16
+ v.blk.7.ffn_down.bias -> model.layers.7.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.8.ln1.weight to FP16
+ v.blk.8.ln1.weight -> model.layers.8.ln1.weight shape: (1280,)
+ Converting v.blk.8.ln2.weight to FP16
+ v.blk.8.ln2.weight -> model.layers.8.ln2.weight shape: (1280,)
+ v.blk.8.attn_q.weight -> model.layers.8.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.8.attn_k.weight -> model.layers.8.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.8.attn_v.weight -> model.layers.8.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.8.attn_q.bias to FP16
+ v.blk.8.attn_q.bias -> model.layers.8.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.8.attn_k.bias to FP16
+ v.blk.8.attn_k.bias -> model.layers.8.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.8.attn_v.bias to FP16
+ v.blk.8.attn_v.bias -> model.layers.8.self_attn.v_proj.bias shape: (1280,)
+ v.blk.8.attn_out.weight -> model.layers.8.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.8.attn_out.bias to FP16
+ v.blk.8.attn_out.bias -> model.layers.8.self_attn.o_proj.bias shape: (1280,)
+ v.blk.8.ffn_gate.weight -> model.layers.8.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.8.ffn_gate.bias to FP16
+ v.blk.8.ffn_gate.bias -> model.layers.8.mlp.gate_proj.bias shape: (3420,)
+ v.blk.8.ffn_up.weight -> model.layers.8.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.8.ffn_up.bias to FP16
+ v.blk.8.ffn_up.bias -> model.layers.8.mlp.up_proj.bias shape: (3420,)
+ v.blk.8.ffn_down.weight -> model.layers.8.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.8.ffn_down.bias to FP16
+ v.blk.8.ffn_down.bias -> model.layers.8.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.9.ln1.weight to FP16
+ v.blk.9.ln1.weight -> model.layers.9.ln1.weight shape: (1280,)
+ Converting v.blk.9.ln2.weight to FP16
+ v.blk.9.ln2.weight -> model.layers.9.ln2.weight shape: (1280,)
+ v.blk.9.attn_q.weight -> model.layers.9.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.9.attn_k.weight -> model.layers.9.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.9.attn_v.weight -> model.layers.9.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.9.attn_q.bias to FP16
+ v.blk.9.attn_q.bias -> model.layers.9.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.9.attn_k.bias to FP16
+ v.blk.9.attn_k.bias -> model.layers.9.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.9.attn_v.bias to FP16
+ v.blk.9.attn_v.bias -> model.layers.9.self_attn.v_proj.bias shape: (1280,)
+ v.blk.9.attn_out.weight -> model.layers.9.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.9.attn_out.bias to FP16
+ v.blk.9.attn_out.bias -> model.layers.9.self_attn.o_proj.bias shape: (1280,)
+ v.blk.9.ffn_gate.weight -> model.layers.9.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.9.ffn_gate.bias to FP16
+ v.blk.9.ffn_gate.bias -> model.layers.9.mlp.gate_proj.bias shape: (3420,)
+ v.blk.9.ffn_up.weight -> model.layers.9.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.9.ffn_up.bias to FP16
+ v.blk.9.ffn_up.bias -> model.layers.9.mlp.up_proj.bias shape: (3420,)
+ v.blk.9.ffn_down.weight -> model.layers.9.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.9.ffn_down.bias to FP16
+ v.blk.9.ffn_down.bias -> model.layers.9.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.10.ln1.weight to FP16
+ v.blk.10.ln1.weight -> model.layers.10.ln1.weight shape: (1280,)
+ Converting v.blk.10.ln2.weight to FP16
+ v.blk.10.ln2.weight -> model.layers.10.ln2.weight shape: (1280,)
+ v.blk.10.attn_q.weight -> model.layers.10.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.10.attn_k.weight -> model.layers.10.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.10.attn_v.weight -> model.layers.10.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.10.attn_q.bias to FP16
+ v.blk.10.attn_q.bias -> model.layers.10.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.10.attn_k.bias to FP16
+ v.blk.10.attn_k.bias -> model.layers.10.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.10.attn_v.bias to FP16
+ v.blk.10.attn_v.bias -> model.layers.10.self_attn.v_proj.bias shape: (1280,)
+ v.blk.10.attn_out.weight -> model.layers.10.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.10.attn_out.bias to FP16
+ v.blk.10.attn_out.bias -> model.layers.10.self_attn.o_proj.bias shape: (1280,)
+ v.blk.10.ffn_gate.weight -> model.layers.10.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.10.ffn_gate.bias to FP16
+ v.blk.10.ffn_gate.bias -> model.layers.10.mlp.gate_proj.bias shape: (3420,)
+ v.blk.10.ffn_up.weight -> model.layers.10.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.10.ffn_up.bias to FP16
+ v.blk.10.ffn_up.bias -> model.layers.10.mlp.up_proj.bias shape: (3420,)
+ v.blk.10.ffn_down.weight -> model.layers.10.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.10.ffn_down.bias to FP16
+ v.blk.10.ffn_down.bias -> model.layers.10.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.11.ln1.weight to FP16
+ v.blk.11.ln1.weight -> model.layers.11.ln1.weight shape: (1280,)
+ Converting v.blk.11.ln2.weight to FP16
+ v.blk.11.ln2.weight -> model.layers.11.ln2.weight shape: (1280,)
+ v.blk.11.attn_q.weight -> model.layers.11.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.11.attn_k.weight -> model.layers.11.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.11.attn_v.weight -> model.layers.11.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.11.attn_q.bias to FP16
+ v.blk.11.attn_q.bias -> model.layers.11.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.11.attn_k.bias to FP16
+ v.blk.11.attn_k.bias -> model.layers.11.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.11.attn_v.bias to FP16
+ v.blk.11.attn_v.bias -> model.layers.11.self_attn.v_proj.bias shape: (1280,)
+ v.blk.11.attn_out.weight -> model.layers.11.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.11.attn_out.bias to FP16
+ v.blk.11.attn_out.bias -> model.layers.11.self_attn.o_proj.bias shape: (1280,)
+ v.blk.11.ffn_gate.weight -> model.layers.11.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.11.ffn_gate.bias to FP16
+ v.blk.11.ffn_gate.bias -> model.layers.11.mlp.gate_proj.bias shape: (3420,)
+ v.blk.11.ffn_up.weight -> model.layers.11.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.11.ffn_up.bias to FP16
+ v.blk.11.ffn_up.bias -> model.layers.11.mlp.up_proj.bias shape: (3420,)
+ v.blk.11.ffn_down.weight -> model.layers.11.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.11.ffn_down.bias to FP16
+ v.blk.11.ffn_down.bias -> model.layers.11.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.12.ln1.weight to FP16
+ v.blk.12.ln1.weight -> model.layers.12.ln1.weight shape: (1280,)
+ Converting v.blk.12.ln2.weight to FP16
+ v.blk.12.ln2.weight -> model.layers.12.ln2.weight shape: (1280,)
+ v.blk.12.attn_q.weight -> model.layers.12.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.12.attn_k.weight -> model.layers.12.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.12.attn_v.weight -> model.layers.12.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.12.attn_q.bias to FP16
+ v.blk.12.attn_q.bias -> model.layers.12.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.12.attn_k.bias to FP16
+ v.blk.12.attn_k.bias -> model.layers.12.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.12.attn_v.bias to FP16
+ v.blk.12.attn_v.bias -> model.layers.12.self_attn.v_proj.bias shape: (1280,)
+ v.blk.12.attn_out.weight -> model.layers.12.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.12.attn_out.bias to FP16
+ v.blk.12.attn_out.bias -> model.layers.12.self_attn.o_proj.bias shape: (1280,)
+ v.blk.12.ffn_gate.weight -> model.layers.12.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.12.ffn_gate.bias to FP16
+ v.blk.12.ffn_gate.bias -> model.layers.12.mlp.gate_proj.bias shape: (3420,)
+ v.blk.12.ffn_up.weight -> model.layers.12.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.12.ffn_up.bias to FP16
+ v.blk.12.ffn_up.bias -> model.layers.12.mlp.up_proj.bias shape: (3420,)
+ v.blk.12.ffn_down.weight -> model.layers.12.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.12.ffn_down.bias to FP16
+ v.blk.12.ffn_down.bias -> model.layers.12.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.13.ln1.weight to FP16
+ v.blk.13.ln1.weight -> model.layers.13.ln1.weight shape: (1280,)
+ Converting v.blk.13.ln2.weight to FP16
+ v.blk.13.ln2.weight -> model.layers.13.ln2.weight shape: (1280,)
+ v.blk.13.attn_q.weight -> model.layers.13.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.13.attn_k.weight -> model.layers.13.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.13.attn_v.weight -> model.layers.13.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.13.attn_q.bias to FP16
+ v.blk.13.attn_q.bias -> model.layers.13.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.13.attn_k.bias to FP16
+ v.blk.13.attn_k.bias -> model.layers.13.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.13.attn_v.bias to FP16
+ v.blk.13.attn_v.bias -> model.layers.13.self_attn.v_proj.bias shape: (1280,)
+ v.blk.13.attn_out.weight -> model.layers.13.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.13.attn_out.bias to FP16
+ v.blk.13.attn_out.bias -> model.layers.13.self_attn.o_proj.bias shape: (1280,)
+ v.blk.13.ffn_gate.weight -> model.layers.13.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.13.ffn_gate.bias to FP16
+ v.blk.13.ffn_gate.bias -> model.layers.13.mlp.gate_proj.bias shape: (3420,)
+ v.blk.13.ffn_up.weight -> model.layers.13.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.13.ffn_up.bias to FP16
+ v.blk.13.ffn_up.bias -> model.layers.13.mlp.up_proj.bias shape: (3420,)
+ v.blk.13.ffn_down.weight -> model.layers.13.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.13.ffn_down.bias to FP16
+ v.blk.13.ffn_down.bias -> model.layers.13.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.14.ln1.weight to FP16
+ v.blk.14.ln1.weight -> model.layers.14.ln1.weight shape: (1280,)
+ Converting v.blk.14.ln2.weight to FP16
+ v.blk.14.ln2.weight -> model.layers.14.ln2.weight shape: (1280,)
+ v.blk.14.attn_q.weight -> model.layers.14.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.14.attn_k.weight -> model.layers.14.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.14.attn_v.weight -> model.layers.14.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.14.attn_q.bias to FP16
+ v.blk.14.attn_q.bias -> model.layers.14.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.14.attn_k.bias to FP16
+ v.blk.14.attn_k.bias -> model.layers.14.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.14.attn_v.bias to FP16
+ v.blk.14.attn_v.bias -> model.layers.14.self_attn.v_proj.bias shape: (1280,)
+ v.blk.14.attn_out.weight -> model.layers.14.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.14.attn_out.bias to FP16
+ v.blk.14.attn_out.bias -> model.layers.14.self_attn.o_proj.bias shape: (1280,)
+ v.blk.14.ffn_gate.weight -> model.layers.14.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.14.ffn_gate.bias to FP16
+ v.blk.14.ffn_gate.bias -> model.layers.14.mlp.gate_proj.bias shape: (3420,)
+ v.blk.14.ffn_up.weight -> model.layers.14.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.14.ffn_up.bias to FP16
+ v.blk.14.ffn_up.bias -> model.layers.14.mlp.up_proj.bias shape: (3420,)
+ v.blk.14.ffn_down.weight -> model.layers.14.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.14.ffn_down.bias to FP16
+ v.blk.14.ffn_down.bias -> model.layers.14.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.15.ln1.weight to FP16
+ v.blk.15.ln1.weight -> model.layers.15.ln1.weight shape: (1280,)
+ Converting v.blk.15.ln2.weight to FP16
+ v.blk.15.ln2.weight -> model.layers.15.ln2.weight shape: (1280,)
+ v.blk.15.attn_q.weight -> model.layers.15.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.15.attn_k.weight -> model.layers.15.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.15.attn_v.weight -> model.layers.15.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.15.attn_q.bias to FP16
+ v.blk.15.attn_q.bias -> model.layers.15.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.15.attn_k.bias to FP16
+ v.blk.15.attn_k.bias -> model.layers.15.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.15.attn_v.bias to FP16
+ v.blk.15.attn_v.bias -> model.layers.15.self_attn.v_proj.bias shape: (1280,)
+ v.blk.15.attn_out.weight -> model.layers.15.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.15.attn_out.bias to FP16
+ v.blk.15.attn_out.bias -> model.layers.15.self_attn.o_proj.bias shape: (1280,)
+ v.blk.15.ffn_gate.weight -> model.layers.15.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.15.ffn_gate.bias to FP16
+ v.blk.15.ffn_gate.bias -> model.layers.15.mlp.gate_proj.bias shape: (3420,)
+ v.blk.15.ffn_up.weight -> model.layers.15.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.15.ffn_up.bias to FP16
+ v.blk.15.ffn_up.bias -> model.layers.15.mlp.up_proj.bias shape: (3420,)
+ v.blk.15.ffn_down.weight -> model.layers.15.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.15.ffn_down.bias to FP16
+ v.blk.15.ffn_down.bias -> model.layers.15.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.16.ln1.weight to FP16
+ v.blk.16.ln1.weight -> model.layers.16.ln1.weight shape: (1280,)
+ Converting v.blk.16.ln2.weight to FP16
+ v.blk.16.ln2.weight -> model.layers.16.ln2.weight shape: (1280,)
+ v.blk.16.attn_q.weight -> model.layers.16.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.16.attn_k.weight -> model.layers.16.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.16.attn_v.weight -> model.layers.16.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.16.attn_q.bias to FP16
+ v.blk.16.attn_q.bias -> model.layers.16.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.16.attn_k.bias to FP16
+ v.blk.16.attn_k.bias -> model.layers.16.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.16.attn_v.bias to FP16
+ v.blk.16.attn_v.bias -> model.layers.16.self_attn.v_proj.bias shape: (1280,)
+ v.blk.16.attn_out.weight -> model.layers.16.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.16.attn_out.bias to FP16
+ v.blk.16.attn_out.bias -> model.layers.16.self_attn.o_proj.bias shape: (1280,)
+ v.blk.16.ffn_gate.weight -> model.layers.16.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.16.ffn_gate.bias to FP16
+ v.blk.16.ffn_gate.bias -> model.layers.16.mlp.gate_proj.bias shape: (3420,)
+ v.blk.16.ffn_up.weight -> model.layers.16.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.16.ffn_up.bias to FP16
+ v.blk.16.ffn_up.bias -> model.layers.16.mlp.up_proj.bias shape: (3420,)
+ v.blk.16.ffn_down.weight -> model.layers.16.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.16.ffn_down.bias to FP16
+ v.blk.16.ffn_down.bias -> model.layers.16.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.17.ln1.weight to FP16
+ v.blk.17.ln1.weight -> model.layers.17.ln1.weight shape: (1280,)
+ Converting v.blk.17.ln2.weight to FP16
+ v.blk.17.ln2.weight -> model.layers.17.ln2.weight shape: (1280,)
+ v.blk.17.attn_q.weight -> model.layers.17.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.17.attn_k.weight -> model.layers.17.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.17.attn_v.weight -> model.layers.17.self_attn.v_proj.weight shape: (1280, 1280)
+ Converting v.blk.17.attn_q.bias to FP16
+ v.blk.17.attn_q.bias -> model.layers.17.self_attn.q_proj.bias shape: (1280,)
+ Converting v.blk.17.attn_k.bias to FP16
+ v.blk.17.attn_k.bias -> model.layers.17.self_attn.k_proj.bias shape: (1280,)
+ Converting v.blk.17.attn_v.bias to FP16
+ v.blk.17.attn_v.bias -> model.layers.17.self_attn.v_proj.bias shape: (1280,)
+ v.blk.17.attn_out.weight -> model.layers.17.self_attn.o_proj.weight shape: (1280, 1280)
+ Converting v.blk.17.attn_out.bias to FP16
+ v.blk.17.attn_out.bias -> model.layers.17.self_attn.o_proj.bias shape: (1280,)
+ v.blk.17.ffn_gate.weight -> model.layers.17.mlp.gate_proj.weight shape: (3420, 1280)
+ Converting v.blk.17.ffn_gate.bias to FP16
+ v.blk.17.ffn_gate.bias -> model.layers.17.mlp.gate_proj.bias shape: (3420,)
+ v.blk.17.ffn_up.weight -> model.layers.17.mlp.up_proj.weight shape: (3420, 1280)
+ Converting v.blk.17.ffn_up.bias to FP16
+ v.blk.17.ffn_up.bias -> model.layers.17.mlp.up_proj.bias shape: (3420,)
+ v.blk.17.ffn_down.weight -> model.layers.17.mlp.down_proj.weight shape: (1280, 3420)
+ Converting v.blk.17.ffn_down.bias to FP16
+ v.blk.17.ffn_down.bias -> model.layers.17.mlp.down_proj.bias shape: (1280,)
+ Converting v.blk.18.ln1.weight to FP16
+ v.blk.18.ln1.weight -> model.layers.18.ln1.weight shape: (1280,)
+ Converting v.blk.18.ln2.weight to FP16
+ v.blk.18.ln2.weight -> model.layers.18.ln2.weight shape: (1280,)
+ v.blk.18.attn_q.weight -> model.layers.18.self_attn.q_proj.weight shape: (1280, 1280)
+ v.blk.18.attn_k.weight -> model.layers.18.self_attn.k_proj.weight shape: (1280, 1280)
+ v.blk.18.attn_v.weight -> model.layers.18.self_attn.v_proj.weight shape: (1280, 1280)
462
+ Converting v.blk.18.attn_q.bias to FP16
463
+ v.blk.18.attn_q.bias -> model.layers.18.self_attn.q_proj.bias shape: (1280,)
464
+ Converting v.blk.18.attn_k.bias to FP16
465
+ v.blk.18.attn_k.bias -> model.layers.18.self_attn.k_proj.bias shape: (1280,)
466
+ Converting v.blk.18.attn_v.bias to FP16
467
+ v.blk.18.attn_v.bias -> model.layers.18.self_attn.v_proj.bias shape: (1280,)
468
+ v.blk.18.attn_out.weight -> model.layers.18.self_attn.o_proj.weight shape: (1280, 1280)
469
+ Converting v.blk.18.attn_out.bias to FP16
470
+ v.blk.18.attn_out.bias -> model.layers.18.self_attn.o_proj.bias shape: (1280,)
471
+ v.blk.18.ffn_gate.weight -> model.layers.18.mlp.gate_proj.weight shape: (3420, 1280)
472
+ Converting v.blk.18.ffn_gate.bias to FP16
473
+ v.blk.18.ffn_gate.bias -> model.layers.18.mlp.gate_proj.bias shape: (3420,)
474
+ v.blk.18.ffn_up.weight -> model.layers.18.mlp.up_proj.weight shape: (3420, 1280)
475
+ Converting v.blk.18.ffn_up.bias to FP16
476
+ v.blk.18.ffn_up.bias -> model.layers.18.mlp.up_proj.bias shape: (3420,)
477
+ v.blk.18.ffn_down.weight -> model.layers.18.mlp.down_proj.weight shape: (1280, 3420)
478
+ Converting v.blk.18.ffn_down.bias to FP16
479
+ v.blk.18.ffn_down.bias -> model.layers.18.mlp.down_proj.bias shape: (1280,)
480
+ Converting v.blk.19.ln1.weight to FP16
481
+ v.blk.19.ln1.weight -> model.layers.19.ln1.weight shape: (1280,)
482
+ Converting v.blk.19.ln2.weight to FP16
483
+ v.blk.19.ln2.weight -> model.layers.19.ln2.weight shape: (1280,)
484
+ v.blk.19.attn_q.weight -> model.layers.19.self_attn.q_proj.weight shape: (1280, 1280)
485
+ v.blk.19.attn_k.weight -> model.layers.19.self_attn.k_proj.weight shape: (1280, 1280)
486
+ v.blk.19.attn_v.weight -> model.layers.19.self_attn.v_proj.weight shape: (1280, 1280)
487
+ Converting v.blk.19.attn_q.bias to FP16
488
+ v.blk.19.attn_q.bias -> model.layers.19.self_attn.q_proj.bias shape: (1280,)
489
+ Converting v.blk.19.attn_k.bias to FP16
490
+ v.blk.19.attn_k.bias -> model.layers.19.self_attn.k_proj.bias shape: (1280,)
491
+ Converting v.blk.19.attn_v.bias to FP16
492
+ v.blk.19.attn_v.bias -> model.layers.19.self_attn.v_proj.bias shape: (1280,)
493
+ v.blk.19.attn_out.weight -> model.layers.19.self_attn.o_proj.weight shape: (1280, 1280)
494
+ Converting v.blk.19.attn_out.bias to FP16
495
+ v.blk.19.attn_out.bias -> model.layers.19.self_attn.o_proj.bias shape: (1280,)
496
+ v.blk.19.ffn_gate.weight -> model.layers.19.mlp.gate_proj.weight shape: (3420, 1280)
497
+ Converting v.blk.19.ffn_gate.bias to FP16
498
+ v.blk.19.ffn_gate.bias -> model.layers.19.mlp.gate_proj.bias shape: (3420,)
499
+ v.blk.19.ffn_up.weight -> model.layers.19.mlp.up_proj.weight shape: (3420, 1280)
500
+ Converting v.blk.19.ffn_up.bias to FP16
501
+ v.blk.19.ffn_up.bias -> model.layers.19.mlp.up_proj.bias shape: (3420,)
502
+ v.blk.19.ffn_down.weight -> model.layers.19.mlp.down_proj.weight shape: (1280, 3420)
503
+ Converting v.blk.19.ffn_down.bias to FP16
504
+ v.blk.19.ffn_down.bias -> model.layers.19.mlp.down_proj.bias shape: (1280,)
505
+ Converting v.blk.20.ln1.weight to FP16
506
+ v.blk.20.ln1.weight -> model.layers.20.ln1.weight shape: (1280,)
507
+ Converting v.blk.20.ln2.weight to FP16
508
+ v.blk.20.ln2.weight -> model.layers.20.ln2.weight shape: (1280,)
509
+ v.blk.20.attn_q.weight -> model.layers.20.self_attn.q_proj.weight shape: (1280, 1280)
510
+ v.blk.20.attn_k.weight -> model.layers.20.self_attn.k_proj.weight shape: (1280, 1280)
511
+ v.blk.20.attn_v.weight -> model.layers.20.self_attn.v_proj.weight shape: (1280, 1280)
512
+ Converting v.blk.20.attn_q.bias to FP16
513
+ v.blk.20.attn_q.bias -> model.layers.20.self_attn.q_proj.bias shape: (1280,)
514
+ Converting v.blk.20.attn_k.bias to FP16
515
+ v.blk.20.attn_k.bias -> model.layers.20.self_attn.k_proj.bias shape: (1280,)
516
+ Converting v.blk.20.attn_v.bias to FP16
517
+ v.blk.20.attn_v.bias -> model.layers.20.self_attn.v_proj.bias shape: (1280,)
518
+ v.blk.20.attn_out.weight -> model.layers.20.self_attn.o_proj.weight shape: (1280, 1280)
519
+ Converting v.blk.20.attn_out.bias to FP16
520
+ v.blk.20.attn_out.bias -> model.layers.20.self_attn.o_proj.bias shape: (1280,)
521
+ v.blk.20.ffn_gate.weight -> model.layers.20.mlp.gate_proj.weight shape: (3420, 1280)
522
+ Converting v.blk.20.ffn_gate.bias to FP16
523
+ v.blk.20.ffn_gate.bias -> model.layers.20.mlp.gate_proj.bias shape: (3420,)
524
+ v.blk.20.ffn_up.weight -> model.layers.20.mlp.up_proj.weight shape: (3420, 1280)
525
+ Converting v.blk.20.ffn_up.bias to FP16
526
+ v.blk.20.ffn_up.bias -> model.layers.20.mlp.up_proj.bias shape: (3420,)
527
+ v.blk.20.ffn_down.weight -> model.layers.20.mlp.down_proj.weight shape: (1280, 3420)
528
+ Converting v.blk.20.ffn_down.bias to FP16
529
+ v.blk.20.ffn_down.bias -> model.layers.20.mlp.down_proj.bias shape: (1280,)
530
+ Converting v.blk.21.ln1.weight to FP16
531
+ v.blk.21.ln1.weight -> model.layers.21.ln1.weight shape: (1280,)
532
+ Converting v.blk.21.ln2.weight to FP16
533
+ v.blk.21.ln2.weight -> model.layers.21.ln2.weight shape: (1280,)
534
+ v.blk.21.attn_q.weight -> model.layers.21.self_attn.q_proj.weight shape: (1280, 1280)
535
+ v.blk.21.attn_k.weight -> model.layers.21.self_attn.k_proj.weight shape: (1280, 1280)
536
+ v.blk.21.attn_v.weight -> model.layers.21.self_attn.v_proj.weight shape: (1280, 1280)
537
+ Converting v.blk.21.attn_q.bias to FP16
538
+ v.blk.21.attn_q.bias -> model.layers.21.self_attn.q_proj.bias shape: (1280,)
539
+ Converting v.blk.21.attn_k.bias to FP16
540
+ v.blk.21.attn_k.bias -> model.layers.21.self_attn.k_proj.bias shape: (1280,)
541
+ Converting v.blk.21.attn_v.bias to FP16
542
+ v.blk.21.attn_v.bias -> model.layers.21.self_attn.v_proj.bias shape: (1280,)
543
+ v.blk.21.attn_out.weight -> model.layers.21.self_attn.o_proj.weight shape: (1280, 1280)
544
+ Converting v.blk.21.attn_out.bias to FP16
545
+ v.blk.21.attn_out.bias -> model.layers.21.self_attn.o_proj.bias shape: (1280,)
546
+ v.blk.21.ffn_gate.weight -> model.layers.21.mlp.gate_proj.weight shape: (3420, 1280)
547
+ Converting v.blk.21.ffn_gate.bias to FP16
548
+ v.blk.21.ffn_gate.bias -> model.layers.21.mlp.gate_proj.bias shape: (3420,)
549
+ v.blk.21.ffn_up.weight -> model.layers.21.mlp.up_proj.weight shape: (3420, 1280)
550
+ Converting v.blk.21.ffn_up.bias to FP16
551
+ v.blk.21.ffn_up.bias -> model.layers.21.mlp.up_proj.bias shape: (3420,)
552
+ v.blk.21.ffn_down.weight -> model.layers.21.mlp.down_proj.weight shape: (1280, 3420)
553
+ Converting v.blk.21.ffn_down.bias to FP16
554
+ v.blk.21.ffn_down.bias -> model.layers.21.mlp.down_proj.bias shape: (1280,)
555
+ Converting v.blk.22.ln1.weight to FP16
556
+ v.blk.22.ln1.weight -> model.layers.22.ln1.weight shape: (1280,)
557
+ Converting v.blk.22.ln2.weight to FP16
558
+ v.blk.22.ln2.weight -> model.layers.22.ln2.weight shape: (1280,)
559
+ v.blk.22.attn_q.weight -> model.layers.22.self_attn.q_proj.weight shape: (1280, 1280)
560
+ v.blk.22.attn_k.weight -> model.layers.22.self_attn.k_proj.weight shape: (1280, 1280)
561
+ v.blk.22.attn_v.weight -> model.layers.22.self_attn.v_proj.weight shape: (1280, 1280)
562
+ Converting v.blk.22.attn_q.bias to FP16
563
+ v.blk.22.attn_q.bias -> model.layers.22.self_attn.q_proj.bias shape: (1280,)
564
+ Converting v.blk.22.attn_k.bias to FP16
565
+ v.blk.22.attn_k.bias -> model.layers.22.self_attn.k_proj.bias shape: (1280,)
566
+ Converting v.blk.22.attn_v.bias to FP16
567
+ v.blk.22.attn_v.bias -> model.layers.22.self_attn.v_proj.bias shape: (1280,)
568
+ v.blk.22.attn_out.weight -> model.layers.22.self_attn.o_proj.weight shape: (1280, 1280)
569
+ Converting v.blk.22.attn_out.bias to FP16
570
+ v.blk.22.attn_out.bias -> model.layers.22.self_attn.o_proj.bias shape: (1280,)
571
+ v.blk.22.ffn_gate.weight -> model.layers.22.mlp.gate_proj.weight shape: (3420, 1280)
572
+ Converting v.blk.22.ffn_gate.bias to FP16
573
+ v.blk.22.ffn_gate.bias -> model.layers.22.mlp.gate_proj.bias shape: (3420,)
574
+ v.blk.22.ffn_up.weight -> model.layers.22.mlp.up_proj.weight shape: (3420, 1280)
575
+ Converting v.blk.22.ffn_up.bias to FP16
576
+ v.blk.22.ffn_up.bias -> model.layers.22.mlp.up_proj.bias shape: (3420,)
577
+ v.blk.22.ffn_down.weight -> model.layers.22.mlp.down_proj.weight shape: (1280, 3420)
578
+ Converting v.blk.22.ffn_down.bias to FP16
579
+ v.blk.22.ffn_down.bias -> model.layers.22.mlp.down_proj.bias shape: (1280,)
580
+ Converting v.blk.23.ln1.weight to FP16
581
+ v.blk.23.ln1.weight -> model.layers.23.ln1.weight shape: (1280,)
582
+ Converting v.blk.23.ln2.weight to FP16
583
+ v.blk.23.ln2.weight -> model.layers.23.ln2.weight shape: (1280,)
584
+ v.blk.23.attn_q.weight -> model.layers.23.self_attn.q_proj.weight shape: (1280, 1280)
585
+ v.blk.23.attn_k.weight -> model.layers.23.self_attn.k_proj.weight shape: (1280, 1280)
586
+ v.blk.23.attn_v.weight -> model.layers.23.self_attn.v_proj.weight shape: (1280, 1280)
587
+ Converting v.blk.23.attn_q.bias to FP16
588
+ v.blk.23.attn_q.bias -> model.layers.23.self_attn.q_proj.bias shape: (1280,)
589
+ Converting v.blk.23.attn_k.bias to FP16
590
+ v.blk.23.attn_k.bias -> model.layers.23.self_attn.k_proj.bias shape: (1280,)
591
+ Converting v.blk.23.attn_v.bias to FP16
592
+ v.blk.23.attn_v.bias -> model.layers.23.self_attn.v_proj.bias shape: (1280,)
593
+ v.blk.23.attn_out.weight -> model.layers.23.self_attn.o_proj.weight shape: (1280, 1280)
594
+ Converting v.blk.23.attn_out.bias to FP16
595
+ v.blk.23.attn_out.bias -> model.layers.23.self_attn.o_proj.bias shape: (1280,)
596
+ v.blk.23.ffn_gate.weight -> model.layers.23.mlp.gate_proj.weight shape: (3420, 1280)
597
+ Converting v.blk.23.ffn_gate.bias to FP16
598
+ v.blk.23.ffn_gate.bias -> model.layers.23.mlp.gate_proj.bias shape: (3420,)
599
+ v.blk.23.ffn_up.weight -> model.layers.23.mlp.up_proj.weight shape: (3420, 1280)
600
+ Converting v.blk.23.ffn_up.bias to FP16
601
+ v.blk.23.ffn_up.bias -> model.layers.23.mlp.up_proj.bias shape: (3420,)
602
+ v.blk.23.ffn_down.weight -> model.layers.23.mlp.down_proj.weight shape: (1280, 3420)
603
+ Converting v.blk.23.ffn_down.bias to FP16
604
+ v.blk.23.ffn_down.bias -> model.layers.23.mlp.down_proj.bias shape: (1280,)
605
+ Converting v.blk.24.ln1.weight to FP16
606
+ v.blk.24.ln1.weight -> model.layers.24.ln1.weight shape: (1280,)
607
+ Converting v.blk.24.ln2.weight to FP16
608
+ v.blk.24.ln2.weight -> model.layers.24.ln2.weight shape: (1280,)
609
+ v.blk.24.attn_q.weight -> model.layers.24.self_attn.q_proj.weight shape: (1280, 1280)
610
+ v.blk.24.attn_k.weight -> model.layers.24.self_attn.k_proj.weight shape: (1280, 1280)
611
+ v.blk.24.attn_v.weight -> model.layers.24.self_attn.v_proj.weight shape: (1280, 1280)
612
+ Converting v.blk.24.attn_q.bias to FP16
613
+ v.blk.24.attn_q.bias -> model.layers.24.self_attn.q_proj.bias shape: (1280,)
614
+ Converting v.blk.24.attn_k.bias to FP16
615
+ v.blk.24.attn_k.bias -> model.layers.24.self_attn.k_proj.bias shape: (1280,)
616
+ Converting v.blk.24.attn_v.bias to FP16
617
+ v.blk.24.attn_v.bias -> model.layers.24.self_attn.v_proj.bias shape: (1280,)
618
+ v.blk.24.attn_out.weight -> model.layers.24.self_attn.o_proj.weight shape: (1280, 1280)
619
+ Converting v.blk.24.attn_out.bias to FP16
620
+ v.blk.24.attn_out.bias -> model.layers.24.self_attn.o_proj.bias shape: (1280,)
621
+ v.blk.24.ffn_gate.weight -> model.layers.24.mlp.gate_proj.weight shape: (3420, 1280)
622
+ Converting v.blk.24.ffn_gate.bias to FP16
623
+ v.blk.24.ffn_gate.bias -> model.layers.24.mlp.gate_proj.bias shape: (3420,)
624
+ v.blk.24.ffn_up.weight -> model.layers.24.mlp.up_proj.weight shape: (3420, 1280)
625
+ Converting v.blk.24.ffn_up.bias to FP16
626
+ v.blk.24.ffn_up.bias -> model.layers.24.mlp.up_proj.bias shape: (3420,)
627
+ v.blk.24.ffn_down.weight -> model.layers.24.mlp.down_proj.weight shape: (1280, 3420)
628
+ Converting v.blk.24.ffn_down.bias to FP16
629
+ v.blk.24.ffn_down.bias -> model.layers.24.mlp.down_proj.bias shape: (1280,)
630
+ Converting v.blk.25.ln1.weight to FP16
631
+ v.blk.25.ln1.weight -> model.layers.25.ln1.weight shape: (1280,)
632
+ Converting v.blk.25.ln2.weight to FP16
633
+ v.blk.25.ln2.weight -> model.layers.25.ln2.weight shape: (1280,)
634
+ v.blk.25.attn_q.weight -> model.layers.25.self_attn.q_proj.weight shape: (1280, 1280)
635
+ v.blk.25.attn_k.weight -> model.layers.25.self_attn.k_proj.weight shape: (1280, 1280)
636
+ v.blk.25.attn_v.weight -> model.layers.25.self_attn.v_proj.weight shape: (1280, 1280)
637
+ Converting v.blk.25.attn_q.bias to FP16
638
+ v.blk.25.attn_q.bias -> model.layers.25.self_attn.q_proj.bias shape: (1280,)
639
+ Converting v.blk.25.attn_k.bias to FP16
640
+ v.blk.25.attn_k.bias -> model.layers.25.self_attn.k_proj.bias shape: (1280,)
641
+ Converting v.blk.25.attn_v.bias to FP16
642
+ v.blk.25.attn_v.bias -> model.layers.25.self_attn.v_proj.bias shape: (1280,)
643
+ v.blk.25.attn_out.weight -> model.layers.25.self_attn.o_proj.weight shape: (1280, 1280)
644
+ Converting v.blk.25.attn_out.bias to FP16
645
+ v.blk.25.attn_out.bias -> model.layers.25.self_attn.o_proj.bias shape: (1280,)
646
+ v.blk.25.ffn_gate.weight -> model.layers.25.mlp.gate_proj.weight shape: (3420, 1280)
647
+ Converting v.blk.25.ffn_gate.bias to FP16
648
+ v.blk.25.ffn_gate.bias -> model.layers.25.mlp.gate_proj.bias shape: (3420,)
649
+ v.blk.25.ffn_up.weight -> model.layers.25.mlp.up_proj.weight shape: (3420, 1280)
650
+ Converting v.blk.25.ffn_up.bias to FP16
651
+ v.blk.25.ffn_up.bias -> model.layers.25.mlp.up_proj.bias shape: (3420,)
652
+ v.blk.25.ffn_down.weight -> model.layers.25.mlp.down_proj.weight shape: (1280, 3420)
653
+ Converting v.blk.25.ffn_down.bias to FP16
654
+ v.blk.25.ffn_down.bias -> model.layers.25.mlp.down_proj.bias shape: (1280,)
655
+ Converting v.blk.26.ln1.weight to FP16
656
+ v.blk.26.ln1.weight -> model.layers.26.ln1.weight shape: (1280,)
657
+ Converting v.blk.26.ln2.weight to FP16
658
+ v.blk.26.ln2.weight -> model.layers.26.ln2.weight shape: (1280,)
659
+ v.blk.26.attn_q.weight -> model.layers.26.self_attn.q_proj.weight shape: (1280, 1280)
660
+ v.blk.26.attn_k.weight -> model.layers.26.self_attn.k_proj.weight shape: (1280, 1280)
661
+ v.blk.26.attn_v.weight -> model.layers.26.self_attn.v_proj.weight shape: (1280, 1280)
662
+ Converting v.blk.26.attn_q.bias to FP16
663
+ v.blk.26.attn_q.bias -> model.layers.26.self_attn.q_proj.bias shape: (1280,)
664
+ Converting v.blk.26.attn_k.bias to FP16
665
+ v.blk.26.attn_k.bias -> model.layers.26.self_attn.k_proj.bias shape: (1280,)
666
+ Converting v.blk.26.attn_v.bias to FP16
667
+ v.blk.26.attn_v.bias -> model.layers.26.self_attn.v_proj.bias shape: (1280,)
668
+ v.blk.26.attn_out.weight -> model.layers.26.self_attn.o_proj.weight shape: (1280, 1280)
669
+ Converting v.blk.26.attn_out.bias to FP16
670
+ v.blk.26.attn_out.bias -> model.layers.26.self_attn.o_proj.bias shape: (1280,)
671
+ v.blk.26.ffn_gate.weight -> model.layers.26.mlp.gate_proj.weight shape: (3420, 1280)
672
+ Converting v.blk.26.ffn_gate.bias to FP16
673
+ v.blk.26.ffn_gate.bias -> model.layers.26.mlp.gate_proj.bias shape: (3420,)
674
+ v.blk.26.ffn_up.weight -> model.layers.26.mlp.up_proj.weight shape: (3420, 1280)
675
+ Converting v.blk.26.ffn_up.bias to FP16
676
+ v.blk.26.ffn_up.bias -> model.layers.26.mlp.up_proj.bias shape: (3420,)
677
+ v.blk.26.ffn_down.weight -> model.layers.26.mlp.down_proj.weight shape: (1280, 3420)
678
+ Converting v.blk.26.ffn_down.bias to FP16
679
+ v.blk.26.ffn_down.bias -> model.layers.26.mlp.down_proj.bias shape: (1280,)
680
+ Converting v.blk.27.ln1.weight to FP16
681
+ v.blk.27.ln1.weight -> model.layers.27.ln1.weight shape: (1280,)
682
+ Converting v.blk.27.ln2.weight to FP16
683
+ v.blk.27.ln2.weight -> model.layers.27.ln2.weight shape: (1280,)
684
+ v.blk.27.attn_q.weight -> model.layers.27.self_attn.q_proj.weight shape: (1280, 1280)
685
+ v.blk.27.attn_k.weight -> model.layers.27.self_attn.k_proj.weight shape: (1280, 1280)
686
+ v.blk.27.attn_v.weight -> model.layers.27.self_attn.v_proj.weight shape: (1280, 1280)
687
+ Converting v.blk.27.attn_q.bias to FP16
688
+ v.blk.27.attn_q.bias -> model.layers.27.self_attn.q_proj.bias shape: (1280,)
689
+ Converting v.blk.27.attn_k.bias to FP16
690
+ v.blk.27.attn_k.bias -> model.layers.27.self_attn.k_proj.bias shape: (1280,)
691
+ Converting v.blk.27.attn_v.bias to FP16
692
+ v.blk.27.attn_v.bias -> model.layers.27.self_attn.v_proj.bias shape: (1280,)
693
+ v.blk.27.attn_out.weight -> model.layers.27.self_attn.o_proj.weight shape: (1280, 1280)
694
+ Converting v.blk.27.attn_out.bias to FP16
695
+ v.blk.27.attn_out.bias -> model.layers.27.self_attn.o_proj.bias shape: (1280,)
696
+ v.blk.27.ffn_gate.weight -> model.layers.27.mlp.gate_proj.weight shape: (3420, 1280)
697
+ Converting v.blk.27.ffn_gate.bias to FP16
698
+ v.blk.27.ffn_gate.bias -> model.layers.27.mlp.gate_proj.bias shape: (3420,)
699
+ v.blk.27.ffn_up.weight -> model.layers.27.mlp.up_proj.weight shape: (3420, 1280)
700
+ Converting v.blk.27.ffn_up.bias to FP16
701
+ v.blk.27.ffn_up.bias -> model.layers.27.mlp.up_proj.bias shape: (3420,)
702
+ v.blk.27.ffn_down.weight -> model.layers.27.mlp.down_proj.weight shape: (1280, 3420)
703
+ Converting v.blk.27.ffn_down.bias to FP16
704
+ v.blk.27.ffn_down.bias -> model.layers.27.mlp.down_proj.bias shape: (1280,)
705
+ Converting v.blk.28.ln1.weight to FP16
706
+ v.blk.28.ln1.weight -> model.layers.28.ln1.weight shape: (1280,)
707
+ Converting v.blk.28.ln2.weight to FP16
708
+ v.blk.28.ln2.weight -> model.layers.28.ln2.weight shape: (1280,)
709
+ v.blk.28.attn_q.weight -> model.layers.28.self_attn.q_proj.weight shape: (1280, 1280)
710
+ v.blk.28.attn_k.weight -> model.layers.28.self_attn.k_proj.weight shape: (1280, 1280)
711
+ v.blk.28.attn_v.weight -> model.layers.28.self_attn.v_proj.weight shape: (1280, 1280)
712
+ Converting v.blk.28.attn_q.bias to FP16
713
+ v.blk.28.attn_q.bias -> model.layers.28.self_attn.q_proj.bias shape: (1280,)
714
+ Converting v.blk.28.attn_k.bias to FP16
715
+ v.blk.28.attn_k.bias -> model.layers.28.self_attn.k_proj.bias shape: (1280,)
716
+ Converting v.blk.28.attn_v.bias to FP16
717
+ v.blk.28.attn_v.bias -> model.layers.28.self_attn.v_proj.bias shape: (1280,)
718
+ v.blk.28.attn_out.weight -> model.layers.28.self_attn.o_proj.weight shape: (1280, 1280)
719
+ Converting v.blk.28.attn_out.bias to FP16
720
+ v.blk.28.attn_out.bias -> model.layers.28.self_attn.o_proj.bias shape: (1280,)
721
+ v.blk.28.ffn_gate.weight -> model.layers.28.mlp.gate_proj.weight shape: (3420, 1280)
722
+ Converting v.blk.28.ffn_gate.bias to FP16
723
+ v.blk.28.ffn_gate.bias -> model.layers.28.mlp.gate_proj.bias shape: (3420,)
724
+ v.blk.28.ffn_up.weight -> model.layers.28.mlp.up_proj.weight shape: (3420, 1280)
725
+ Converting v.blk.28.ffn_up.bias to FP16
726
+ v.blk.28.ffn_up.bias -> model.layers.28.mlp.up_proj.bias shape: (3420,)
727
+ v.blk.28.ffn_down.weight -> model.layers.28.mlp.down_proj.weight shape: (1280, 3420)
728
+ Converting v.blk.28.ffn_down.bias to FP16
729
+ v.blk.28.ffn_down.bias -> model.layers.28.mlp.down_proj.bias shape: (1280,)
730
+ Converting v.blk.29.ln1.weight to FP16
731
+ v.blk.29.ln1.weight -> model.layers.29.ln1.weight shape: (1280,)
732
+ Converting v.blk.29.ln2.weight to FP16
733
+ v.blk.29.ln2.weight -> model.layers.29.ln2.weight shape: (1280,)
734
+ v.blk.29.attn_q.weight -> model.layers.29.self_attn.q_proj.weight shape: (1280, 1280)
735
+ v.blk.29.attn_k.weight -> model.layers.29.self_attn.k_proj.weight shape: (1280, 1280)
736
+ v.blk.29.attn_v.weight -> model.layers.29.self_attn.v_proj.weight shape: (1280, 1280)
737
+ Converting v.blk.29.attn_q.bias to FP16
738
+ v.blk.29.attn_q.bias -> model.layers.29.self_attn.q_proj.bias shape: (1280,)
739
+ Converting v.blk.29.attn_k.bias to FP16
740
+ v.blk.29.attn_k.bias -> model.layers.29.self_attn.k_proj.bias shape: (1280,)
741
+ Converting v.blk.29.attn_v.bias to FP16
742
+ v.blk.29.attn_v.bias -> model.layers.29.self_attn.v_proj.bias shape: (1280,)
743
+ v.blk.29.attn_out.weight -> model.layers.29.self_attn.o_proj.weight shape: (1280, 1280)
744
+ Converting v.blk.29.attn_out.bias to FP16
745
+ v.blk.29.attn_out.bias -> model.layers.29.self_attn.o_proj.bias shape: (1280,)
746
+ v.blk.29.ffn_gate.weight -> model.layers.29.mlp.gate_proj.weight shape: (3420, 1280)
747
+ Converting v.blk.29.ffn_gate.bias to FP16
748
+ v.blk.29.ffn_gate.bias -> model.layers.29.mlp.gate_proj.bias shape: (3420,)
749
+ v.blk.29.ffn_up.weight -> model.layers.29.mlp.up_proj.weight shape: (3420, 1280)
750
+ Converting v.blk.29.ffn_up.bias to FP16
751
+ v.blk.29.ffn_up.bias -> model.layers.29.mlp.up_proj.bias shape: (3420,)
752
+ v.blk.29.ffn_down.weight -> model.layers.29.mlp.down_proj.weight shape: (1280, 3420)
753
+ Converting v.blk.29.ffn_down.bias to FP16
754
+ v.blk.29.ffn_down.bias -> model.layers.29.mlp.down_proj.bias shape: (1280,)
755
+ Converting v.blk.30.ln1.weight to FP16
756
+ v.blk.30.ln1.weight -> model.layers.30.ln1.weight shape: (1280,)
757
+ Converting v.blk.30.ln2.weight to FP16
758
+ v.blk.30.ln2.weight -> model.layers.30.ln2.weight shape: (1280,)
759
+ v.blk.30.attn_q.weight -> model.layers.30.self_attn.q_proj.weight shape: (1280, 1280)
760
+ v.blk.30.attn_k.weight -> model.layers.30.self_attn.k_proj.weight shape: (1280, 1280)
761
+ v.blk.30.attn_v.weight -> model.layers.30.self_attn.v_proj.weight shape: (1280, 1280)
762
+ Converting v.blk.30.attn_q.bias to FP16
763
+ v.blk.30.attn_q.bias -> model.layers.30.self_attn.q_proj.bias shape: (1280,)
764
+ Converting v.blk.30.attn_k.bias to FP16
765
+ v.blk.30.attn_k.bias -> model.layers.30.self_attn.k_proj.bias shape: (1280,)
766
+ Converting v.blk.30.attn_v.bias to FP16
767
+ v.blk.30.attn_v.bias -> model.layers.30.self_attn.v_proj.bias shape: (1280,)
768
+ v.blk.30.attn_out.weight -> model.layers.30.self_attn.o_proj.weight shape: (1280, 1280)
769
+ Converting v.blk.30.attn_out.bias to FP16
770
+ v.blk.30.attn_out.bias -> model.layers.30.self_attn.o_proj.bias shape: (1280,)
771
+ v.blk.30.ffn_gate.weight -> model.layers.30.mlp.gate_proj.weight shape: (3420, 1280)
772
+ Converting v.blk.30.ffn_gate.bias to FP16
773
+ v.blk.30.ffn_gate.bias -> model.layers.30.mlp.gate_proj.bias shape: (3420,)
774
+ v.blk.30.ffn_up.weight -> model.layers.30.mlp.up_proj.weight shape: (3420, 1280)
775
+ Converting v.blk.30.ffn_up.bias to FP16
776
+ v.blk.30.ffn_up.bias -> model.layers.30.mlp.up_proj.bias shape: (3420,)
777
+ v.blk.30.ffn_down.weight -> model.layers.30.mlp.down_proj.weight shape: (1280, 3420)
778
+ Converting v.blk.30.ffn_down.bias to FP16
779
+ v.blk.30.ffn_down.bias -> model.layers.30.mlp.down_proj.bias shape: (1280,)
780
+ Converting v.blk.31.ln1.weight to FP16
781
+ v.blk.31.ln1.weight -> model.layers.31.ln1.weight shape: (1280,)
782
+ Converting v.blk.31.ln2.weight to FP16
783
+ v.blk.31.ln2.weight -> model.layers.31.ln2.weight shape: (1280,)
784
+ v.blk.31.attn_q.weight -> model.layers.31.self_attn.q_proj.weight shape: (1280, 1280)
785
+ v.blk.31.attn_k.weight -> model.layers.31.self_attn.k_proj.weight shape: (1280, 1280)
786
+ v.blk.31.attn_v.weight -> model.layers.31.self_attn.v_proj.weight shape: (1280, 1280)
787
+ Converting v.blk.31.attn_q.bias to FP16
788
+ v.blk.31.attn_q.bias -> model.layers.31.self_attn.q_proj.bias shape: (1280,)
789
+ Converting v.blk.31.attn_k.bias to FP16
790
+ v.blk.31.attn_k.bias -> model.layers.31.self_attn.k_proj.bias shape: (1280,)
791
+ Converting v.blk.31.attn_v.bias to FP16
792
+ v.blk.31.attn_v.bias -> model.layers.31.self_attn.v_proj.bias shape: (1280,)
793
+ v.blk.31.attn_out.weight -> model.layers.31.self_attn.o_proj.weight shape: (1280, 1280)
794
+ Converting v.blk.31.attn_out.bias to FP16
795
+ v.blk.31.attn_out.bias -> model.layers.31.self_attn.o_proj.bias shape: (1280,)
796
+ v.blk.31.ffn_gate.weight -> model.layers.31.mlp.gate_proj.weight shape: (3420, 1280)
797
+ Converting v.blk.31.ffn_gate.bias to FP16
798
+ v.blk.31.ffn_gate.bias -> model.layers.31.mlp.gate_proj.bias shape: (3420,)
799
+ v.blk.31.ffn_up.weight -> model.layers.31.mlp.up_proj.weight shape: (3420, 1280)
800
+ Converting v.blk.31.ffn_up.bias to FP16
801
+ v.blk.31.ffn_up.bias -> model.layers.31.mlp.up_proj.bias shape: (3420,)
802
+ v.blk.31.ffn_down.weight -> model.layers.31.mlp.down_proj.weight shape: (1280, 3420)
803
+ Converting v.blk.31.ffn_down.bias to FP16
804
+ v.blk.31.ffn_down.bias -> model.layers.31.mlp.down_proj.bias shape: (1280,)
805
+ Converting v.post_ln.weight to FP16
806
+ v.post_ln.weight -> model.post_ln.weight shape: (1280,)
807
+ mm.0.weight -> model.mm.0.weight shape: (5120, 5120)
808
+ Converting mm.0.bias to FP16
809
+ mm.0.bias -> model.mm.0.bias shape: (5120,)
810
+ mm.2.weight -> model.mm.2.weight shape: (2048, 5120)
811
+ Converting mm.2.bias to FP16
812
+ mm.2.bias -> model.mm.2.bias shape: (2048,)
813
+ Converting v.position_embd.weight to FP16
814
+ v.position_embd.weight -> model.position_embd.weight shape: (10, 10)
815
+
816
+ Converted 520 tensors
817
+
818
+ All required tensors present!
819
+
820
+ Saving to qwen2.5-vl-3b-mmproj/out-fp16.npz...
821
+ Output file size: 0.96 GB
822
+
823
+ Conversion complete!
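The renaming pattern visible in the log above (GGUF vision-tower names like `v.blk.16.attn_q.weight` mapped to HF-style names like `model.layers.16.self_attn.q_proj.weight`) can be sketched as follows. This is a hypothetical reconstruction inferred only from the log lines, not the converter's actual source; the suffix table and the fallback rule for non-block tensors (`v.post_ln`, `mm.0`, ...) are assumptions.

```python
import re

# Suffix mapping inferred from the log lines above (an assumption,
# not taken from any converter's real source code).
SUFFIX_MAP = {
    "ln1": "ln1",
    "ln2": "ln2",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_out": "self_attn.o_proj",
    "ffn_gate": "mlp.gate_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
}

def remap(name: str) -> str:
    """Translate a GGUF mmproj tensor name to the HF-style name in the log."""
    m = re.match(r"v\.blk\.(\d+)\.([a-z0-9_]+)\.(weight|bias)$", name)
    if m:
        layer, suffix, kind = m.groups()
        return f"model.layers.{layer}.{SUFFIX_MAP[suffix]}.{kind}"
    # Non-block tensors keep their name under a "model." prefix,
    # dropping the leading "v." where present (v.post_ln -> model.post_ln).
    return "model." + name.removeprefix("v.")
```

For example, `remap("v.blk.31.ffn_down.bias")` yields `model.layers.31.mlp.down_proj.bias`, matching the corresponding log line.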