huiwon committed
Commit 5724a75 · verified · 1 Parent(s): 59f3cb9

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ trainer_state.json filter=lfs diff=lfs merge=lfs -text
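
Note: the added_tokens.json diff below registers think/tool-call markers and a large block of `<|action_N|>` tokens with contiguous IDs above the base vocabulary. A minimal sketch of how such entries are typically produced with the Hugging Face `transformers` API is shown here; the base checkpoint name and the exact action count are assumptions for illustration, not taken from this commit.

```python
# Sketch: add action/tool special tokens to a tokenizer, then save it so the
# new entries land in added_tokens.json / tokenizer.json (what this commit uploads).
from transformers import AutoTokenizer

base_model = "Qwen/Qwen2.5-7B-Instruct"  # assumption: any base checkpoint with a compatible vocab
tokenizer = AutoTokenizer.from_pretrained(base_model)

special = ["<think>", "</think>", "<tool_call>", "</tool_call>",
           "<tool_response>", "</tool_response>"]
actions = [f"<|action_{i}|>" for i in range(2048)]  # count inferred from the diff below

# Tokens already present in the base vocab are skipped; new ones get contiguous IDs.
num_added = tokenizer.add_tokens(special + actions, special_tokens=True)
print(f"added {num_added} tokens, vocab size is now {len(tokenizer)}")

# If fine-tuning, the model's embedding matrix must be resized to len(tokenizer)
# via model.resize_token_embeddings(len(tokenizer)) before training.
tokenizer.save_pretrained("./tokenizer_with_actions")
```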
added_tokens.json ADDED
@@ -0,0 +1,2079 @@
1
+ {
2
+ "</think>": 151668,
3
+ "</tool_call>": 151658,
4
+ "</tool_response>": 151666,
5
+ "<think>": 151667,
6
+ "<tool_call>": 151657,
7
+ "<tool_response>": 151665,
8
+ "<|action_0|>": 151672,
9
+ "<|action_1000|>": 152672,
10
+ "<|action_1001|>": 152673,
11
+ "<|action_1002|>": 152674,
12
+ "<|action_1003|>": 152675,
13
+ "<|action_1004|>": 152676,
14
+ "<|action_1005|>": 152677,
15
+ "<|action_1006|>": 152678,
16
+ "<|action_1007|>": 152679,
17
+ "<|action_1008|>": 152680,
18
+ "<|action_1009|>": 152681,
19
+ "<|action_100|>": 151772,
20
+ "<|action_1010|>": 152682,
21
+ "<|action_1011|>": 152683,
22
+ "<|action_1012|>": 152684,
23
+ "<|action_1013|>": 152685,
24
+ "<|action_1014|>": 152686,
25
+ "<|action_1015|>": 152687,
26
+ "<|action_1016|>": 152688,
27
+ "<|action_1017|>": 152689,
28
+ "<|action_1018|>": 152690,
29
+ "<|action_1019|>": 152691,
30
+ "<|action_101|>": 151773,
31
+ "<|action_1020|>": 152692,
32
+ "<|action_1021|>": 152693,
33
+ "<|action_1022|>": 152694,
34
+ "<|action_1023|>": 152695,
35
+ "<|action_1024|>": 152696,
36
+ "<|action_1025|>": 152697,
37
+ "<|action_1026|>": 152698,
38
+ "<|action_1027|>": 152699,
39
+ "<|action_1028|>": 152700,
40
+ "<|action_1029|>": 152701,
41
+ "<|action_102|>": 151774,
42
+ "<|action_1030|>": 152702,
43
+ "<|action_1031|>": 152703,
44
+ "<|action_1032|>": 152704,
45
+ "<|action_1033|>": 152705,
46
+ "<|action_1034|>": 152706,
47
+ "<|action_1035|>": 152707,
48
+ "<|action_1036|>": 152708,
49
+ "<|action_1037|>": 152709,
50
+ "<|action_1038|>": 152710,
51
+ "<|action_1039|>": 152711,
52
+ "<|action_103|>": 151775,
53
+ "<|action_1040|>": 152712,
54
+ "<|action_1041|>": 152713,
55
+ "<|action_1042|>": 152714,
56
+ "<|action_1043|>": 152715,
57
+ "<|action_1044|>": 152716,
58
+ "<|action_1045|>": 152717,
59
+ "<|action_1046|>": 152718,
60
+ "<|action_1047|>": 152719,
61
+ "<|action_1048|>": 152720,
62
+ "<|action_1049|>": 152721,
63
+ "<|action_104|>": 151776,
64
+ "<|action_1050|>": 152722,
65
+ "<|action_1051|>": 152723,
66
+ "<|action_1052|>": 152724,
67
+ "<|action_1053|>": 152725,
68
+ "<|action_1054|>": 152726,
69
+ "<|action_1055|>": 152727,
70
+ "<|action_1056|>": 152728,
71
+ "<|action_1057|>": 152729,
72
+ "<|action_1058|>": 152730,
73
+ "<|action_1059|>": 152731,
74
+ "<|action_105|>": 151777,
75
+ "<|action_1060|>": 152732,
76
+ "<|action_1061|>": 152733,
77
+ "<|action_1062|>": 152734,
78
+ "<|action_1063|>": 152735,
79
+ "<|action_1064|>": 152736,
80
+ "<|action_1065|>": 152737,
81
+ "<|action_1066|>": 152738,
82
+ "<|action_1067|>": 152739,
83
+ "<|action_1068|>": 152740,
84
+ "<|action_1069|>": 152741,
85
+ "<|action_106|>": 151778,
86
+ "<|action_1070|>": 152742,
87
+ "<|action_1071|>": 152743,
88
+ "<|action_1072|>": 152744,
89
+ "<|action_1073|>": 152745,
90
+ "<|action_1074|>": 152746,
91
+ "<|action_1075|>": 152747,
92
+ "<|action_1076|>": 152748,
93
+ "<|action_1077|>": 152749,
94
+ "<|action_1078|>": 152750,
95
+ "<|action_1079|>": 152751,
96
+ "<|action_107|>": 151779,
97
+ "<|action_1080|>": 152752,
98
+ "<|action_1081|>": 152753,
99
+ "<|action_1082|>": 152754,
100
+ "<|action_1083|>": 152755,
101
+ "<|action_1084|>": 152756,
102
+ "<|action_1085|>": 152757,
103
+ "<|action_1086|>": 152758,
104
+ "<|action_1087|>": 152759,
105
+ "<|action_1088|>": 152760,
106
+ "<|action_1089|>": 152761,
107
+ "<|action_108|>": 151780,
108
+ "<|action_1090|>": 152762,
109
+ "<|action_1091|>": 152763,
110
+ "<|action_1092|>": 152764,
111
+ "<|action_1093|>": 152765,
112
+ "<|action_1094|>": 152766,
113
+ "<|action_1095|>": 152767,
114
+ "<|action_1096|>": 152768,
115
+ "<|action_1097|>": 152769,
116
+ "<|action_1098|>": 152770,
117
+ "<|action_1099|>": 152771,
118
+ "<|action_109|>": 151781,
119
+ "<|action_10|>": 151682,
120
+ "<|action_1100|>": 152772,
121
+ "<|action_1101|>": 152773,
122
+ "<|action_1102|>": 152774,
123
+ "<|action_1103|>": 152775,
124
+ "<|action_1104|>": 152776,
125
+ "<|action_1105|>": 152777,
126
+ "<|action_1106|>": 152778,
127
+ "<|action_1107|>": 152779,
128
+ "<|action_1108|>": 152780,
129
+ "<|action_1109|>": 152781,
130
+ "<|action_110|>": 151782,
131
+ "<|action_1110|>": 152782,
132
+ "<|action_1111|>": 152783,
133
+ "<|action_1112|>": 152784,
134
+ "<|action_1113|>": 152785,
135
+ "<|action_1114|>": 152786,
136
+ "<|action_1115|>": 152787,
137
+ "<|action_1116|>": 152788,
138
+ "<|action_1117|>": 152789,
139
+ "<|action_1118|>": 152790,
140
+ "<|action_1119|>": 152791,
141
+ "<|action_111|>": 151783,
142
+ "<|action_1120|>": 152792,
143
+ "<|action_1121|>": 152793,
144
+ "<|action_1122|>": 152794,
145
+ "<|action_1123|>": 152795,
146
+ "<|action_1124|>": 152796,
147
+ "<|action_1125|>": 152797,
148
+ "<|action_1126|>": 152798,
149
+ "<|action_1127|>": 152799,
150
+ "<|action_1128|>": 152800,
151
+ "<|action_1129|>": 152801,
152
+ "<|action_112|>": 151784,
153
+ "<|action_1130|>": 152802,
154
+ "<|action_1131|>": 152803,
155
+ "<|action_1132|>": 152804,
156
+ "<|action_1133|>": 152805,
157
+ "<|action_1134|>": 152806,
158
+ "<|action_1135|>": 152807,
159
+ "<|action_1136|>": 152808,
160
+ "<|action_1137|>": 152809,
161
+ "<|action_1138|>": 152810,
162
+ "<|action_1139|>": 152811,
163
+ "<|action_113|>": 151785,
164
+ "<|action_1140|>": 152812,
165
+ "<|action_1141|>": 152813,
166
+ "<|action_1142|>": 152814,
167
+ "<|action_1143|>": 152815,
168
+ "<|action_1144|>": 152816,
169
+ "<|action_1145|>": 152817,
170
+ "<|action_1146|>": 152818,
171
+ "<|action_1147|>": 152819,
172
+ "<|action_1148|>": 152820,
173
+ "<|action_1149|>": 152821,
174
+ "<|action_114|>": 151786,
175
+ "<|action_1150|>": 152822,
176
+ "<|action_1151|>": 152823,
177
+ "<|action_1152|>": 152824,
178
+ "<|action_1153|>": 152825,
179
+ "<|action_1154|>": 152826,
180
+ "<|action_1155|>": 152827,
181
+ "<|action_1156|>": 152828,
182
+ "<|action_1157|>": 152829,
183
+ "<|action_1158|>": 152830,
184
+ "<|action_1159|>": 152831,
185
+ "<|action_115|>": 151787,
186
+ "<|action_1160|>": 152832,
187
+ "<|action_1161|>": 152833,
188
+ "<|action_1162|>": 152834,
189
+ "<|action_1163|>": 152835,
190
+ "<|action_1164|>": 152836,
191
+ "<|action_1165|>": 152837,
192
+ "<|action_1166|>": 152838,
193
+ "<|action_1167|>": 152839,
194
+ "<|action_1168|>": 152840,
195
+ "<|action_1169|>": 152841,
196
+ "<|action_116|>": 151788,
197
+ "<|action_1170|>": 152842,
198
+ "<|action_1171|>": 152843,
199
+ "<|action_1172|>": 152844,
200
+ "<|action_1173|>": 152845,
201
+ "<|action_1174|>": 152846,
202
+ "<|action_1175|>": 152847,
203
+ "<|action_1176|>": 152848,
204
+ "<|action_1177|>": 152849,
205
+ "<|action_1178|>": 152850,
206
+ "<|action_1179|>": 152851,
207
+ "<|action_117|>": 151789,
208
+ "<|action_1180|>": 152852,
209
+ "<|action_1181|>": 152853,
210
+ "<|action_1182|>": 152854,
211
+ "<|action_1183|>": 152855,
212
+ "<|action_1184|>": 152856,
213
+ "<|action_1185|>": 152857,
214
+ "<|action_1186|>": 152858,
215
+ "<|action_1187|>": 152859,
216
+ "<|action_1188|>": 152860,
217
+ "<|action_1189|>": 152861,
218
+ "<|action_118|>": 151790,
219
+ "<|action_1190|>": 152862,
220
+ "<|action_1191|>": 152863,
221
+ "<|action_1192|>": 152864,
222
+ "<|action_1193|>": 152865,
223
+ "<|action_1194|>": 152866,
224
+ "<|action_1195|>": 152867,
225
+ "<|action_1196|>": 152868,
226
+ "<|action_1197|>": 152869,
227
+ "<|action_1198|>": 152870,
228
+ "<|action_1199|>": 152871,
229
+ "<|action_119|>": 151791,
230
+ "<|action_11|>": 151683,
231
+ "<|action_1200|>": 152872,
232
+ "<|action_1201|>": 152873,
233
+ "<|action_1202|>": 152874,
234
+ "<|action_1203|>": 152875,
235
+ "<|action_1204|>": 152876,
236
+ "<|action_1205|>": 152877,
237
+ "<|action_1206|>": 152878,
238
+ "<|action_1207|>": 152879,
239
+ "<|action_1208|>": 152880,
240
+ "<|action_1209|>": 152881,
241
+ "<|action_120|>": 151792,
242
+ "<|action_1210|>": 152882,
243
+ "<|action_1211|>": 152883,
244
+ "<|action_1212|>": 152884,
245
+ "<|action_1213|>": 152885,
246
+ "<|action_1214|>": 152886,
247
+ "<|action_1215|>": 152887,
248
+ "<|action_1216|>": 152888,
249
+ "<|action_1217|>": 152889,
250
+ "<|action_1218|>": 152890,
251
+ "<|action_1219|>": 152891,
252
+ "<|action_121|>": 151793,
253
+ "<|action_1220|>": 152892,
254
+ "<|action_1221|>": 152893,
255
+ "<|action_1222|>": 152894,
256
+ "<|action_1223|>": 152895,
257
+ "<|action_1224|>": 152896,
258
+ "<|action_1225|>": 152897,
259
+ "<|action_1226|>": 152898,
260
+ "<|action_1227|>": 152899,
261
+ "<|action_1228|>": 152900,
262
+ "<|action_1229|>": 152901,
263
+ "<|action_122|>": 151794,
264
+ "<|action_1230|>": 152902,
265
+ "<|action_1231|>": 152903,
266
+ "<|action_1232|>": 152904,
267
+ "<|action_1233|>": 152905,
268
+ "<|action_1234|>": 152906,
269
+ "<|action_1235|>": 152907,
270
+ "<|action_1236|>": 152908,
271
+ "<|action_1237|>": 152909,
272
+ "<|action_1238|>": 152910,
273
+ "<|action_1239|>": 152911,
274
+ "<|action_123|>": 151795,
275
+ "<|action_1240|>": 152912,
276
+ "<|action_1241|>": 152913,
277
+ "<|action_1242|>": 152914,
278
+ "<|action_1243|>": 152915,
279
+ "<|action_1244|>": 152916,
280
+ "<|action_1245|>": 152917,
281
+ "<|action_1246|>": 152918,
282
+ "<|action_1247|>": 152919,
283
+ "<|action_1248|>": 152920,
284
+ "<|action_1249|>": 152921,
285
+ "<|action_124|>": 151796,
286
+ "<|action_1250|>": 152922,
287
+ "<|action_1251|>": 152923,
288
+ "<|action_1252|>": 152924,
289
+ "<|action_1253|>": 152925,
290
+ "<|action_1254|>": 152926,
291
+ "<|action_1255|>": 152927,
292
+ "<|action_1256|>": 152928,
293
+ "<|action_1257|>": 152929,
294
+ "<|action_1258|>": 152930,
295
+ "<|action_1259|>": 152931,
296
+ "<|action_125|>": 151797,
297
+ "<|action_1260|>": 152932,
298
+ "<|action_1261|>": 152933,
299
+ "<|action_1262|>": 152934,
300
+ "<|action_1263|>": 152935,
301
+ "<|action_1264|>": 152936,
302
+ "<|action_1265|>": 152937,
303
+ "<|action_1266|>": 152938,
304
+ "<|action_1267|>": 152939,
305
+ "<|action_1268|>": 152940,
306
+ "<|action_1269|>": 152941,
307
+ "<|action_126|>": 151798,
308
+ "<|action_1270|>": 152942,
309
+ "<|action_1271|>": 152943,
310
+ "<|action_1272|>": 152944,
311
+ "<|action_1273|>": 152945,
312
+ "<|action_1274|>": 152946,
313
+ "<|action_1275|>": 152947,
314
+ "<|action_1276|>": 152948,
315
+ "<|action_1277|>": 152949,
316
+ "<|action_1278|>": 152950,
317
+ "<|action_1279|>": 152951,
318
+ "<|action_127|>": 151799,
319
+ "<|action_1280|>": 152952,
320
+ "<|action_1281|>": 152953,
321
+ "<|action_1282|>": 152954,
322
+ "<|action_1283|>": 152955,
323
+ "<|action_1284|>": 152956,
324
+ "<|action_1285|>": 152957,
325
+ "<|action_1286|>": 152958,
326
+ "<|action_1287|>": 152959,
327
+ "<|action_1288|>": 152960,
328
+ "<|action_1289|>": 152961,
329
+ "<|action_128|>": 151800,
330
+ "<|action_1290|>": 152962,
331
+ "<|action_1291|>": 152963,
332
+ "<|action_1292|>": 152964,
333
+ "<|action_1293|>": 152965,
334
+ "<|action_1294|>": 152966,
335
+ "<|action_1295|>": 152967,
336
+ "<|action_1296|>": 152968,
337
+ "<|action_1297|>": 152969,
338
+ "<|action_1298|>": 152970,
339
+ "<|action_1299|>": 152971,
340
+ "<|action_129|>": 151801,
341
+ "<|action_12|>": 151684,
342
+ "<|action_1300|>": 152972,
343
+ "<|action_1301|>": 152973,
344
+ "<|action_1302|>": 152974,
345
+ "<|action_1303|>": 152975,
346
+ "<|action_1304|>": 152976,
347
+ "<|action_1305|>": 152977,
348
+ "<|action_1306|>": 152978,
349
+ "<|action_1307|>": 152979,
350
+ "<|action_1308|>": 152980,
351
+ "<|action_1309|>": 152981,
352
+ "<|action_130|>": 151802,
353
+ "<|action_1310|>": 152982,
354
+ "<|action_1311|>": 152983,
355
+ "<|action_1312|>": 152984,
356
+ "<|action_1313|>": 152985,
357
+ "<|action_1314|>": 152986,
358
+ "<|action_1315|>": 152987,
359
+ "<|action_1316|>": 152988,
360
+ "<|action_1317|>": 152989,
361
+ "<|action_1318|>": 152990,
362
+ "<|action_1319|>": 152991,
363
+ "<|action_131|>": 151803,
364
+ "<|action_1320|>": 152992,
365
+ "<|action_1321|>": 152993,
366
+ "<|action_1322|>": 152994,
367
+ "<|action_1323|>": 152995,
368
+ "<|action_1324|>": 152996,
369
+ "<|action_1325|>": 152997,
370
+ "<|action_1326|>": 152998,
371
+ "<|action_1327|>": 152999,
372
+ "<|action_1328|>": 153000,
373
+ "<|action_1329|>": 153001,
374
+ "<|action_132|>": 151804,
375
+ "<|action_1330|>": 153002,
376
+ "<|action_1331|>": 153003,
377
+ "<|action_1332|>": 153004,
378
+ "<|action_1333|>": 153005,
379
+ "<|action_1334|>": 153006,
380
+ "<|action_1335|>": 153007,
381
+ "<|action_1336|>": 153008,
382
+ "<|action_1337|>": 153009,
383
+ "<|action_1338|>": 153010,
384
+ "<|action_1339|>": 153011,
385
+ "<|action_133|>": 151805,
386
+ "<|action_1340|>": 153012,
387
+ "<|action_1341|>": 153013,
388
+ "<|action_1342|>": 153014,
389
+ "<|action_1343|>": 153015,
390
+ "<|action_1344|>": 153016,
391
+ "<|action_1345|>": 153017,
392
+ "<|action_1346|>": 153018,
393
+ "<|action_1347|>": 153019,
394
+ "<|action_1348|>": 153020,
395
+ "<|action_1349|>": 153021,
396
+ "<|action_134|>": 151806,
397
+ "<|action_1350|>": 153022,
398
+ "<|action_1351|>": 153023,
399
+ "<|action_1352|>": 153024,
400
+ "<|action_1353|>": 153025,
401
+ "<|action_1354|>": 153026,
402
+ "<|action_1355|>": 153027,
403
+ "<|action_1356|>": 153028,
404
+ "<|action_1357|>": 153029,
405
+ "<|action_1358|>": 153030,
406
+ "<|action_1359|>": 153031,
407
+ "<|action_135|>": 151807,
408
+ "<|action_1360|>": 153032,
409
+ "<|action_1361|>": 153033,
410
+ "<|action_1362|>": 153034,
411
+ "<|action_1363|>": 153035,
412
+ "<|action_1364|>": 153036,
413
+ "<|action_1365|>": 153037,
414
+ "<|action_1366|>": 153038,
415
+ "<|action_1367|>": 153039,
416
+ "<|action_1368|>": 153040,
417
+ "<|action_1369|>": 153041,
418
+ "<|action_136|>": 151808,
419
+ "<|action_1370|>": 153042,
420
+ "<|action_1371|>": 153043,
421
+ "<|action_1372|>": 153044,
422
+ "<|action_1373|>": 153045,
423
+ "<|action_1374|>": 153046,
424
+ "<|action_1375|>": 153047,
425
+ "<|action_1376|>": 153048,
426
+ "<|action_1377|>": 153049,
427
+ "<|action_1378|>": 153050,
428
+ "<|action_1379|>": 153051,
429
+ "<|action_137|>": 151809,
430
+ "<|action_1380|>": 153052,
431
+ "<|action_1381|>": 153053,
432
+ "<|action_1382|>": 153054,
433
+ "<|action_1383|>": 153055,
434
+ "<|action_1384|>": 153056,
435
+ "<|action_1385|>": 153057,
436
+ "<|action_1386|>": 153058,
437
+ "<|action_1387|>": 153059,
438
+ "<|action_1388|>": 153060,
439
+ "<|action_1389|>": 153061,
440
+ "<|action_138|>": 151810,
441
+ "<|action_1390|>": 153062,
442
+ "<|action_1391|>": 153063,
443
+ "<|action_1392|>": 153064,
444
+ "<|action_1393|>": 153065,
445
+ "<|action_1394|>": 153066,
446
+ "<|action_1395|>": 153067,
447
+ "<|action_1396|>": 153068,
448
+ "<|action_1397|>": 153069,
449
+ "<|action_1398|>": 153070,
450
+ "<|action_1399|>": 153071,
451
+ "<|action_139|>": 151811,
452
+ "<|action_13|>": 151685,
453
+ "<|action_1400|>": 153072,
454
+ "<|action_1401|>": 153073,
455
+ "<|action_1402|>": 153074,
456
+ "<|action_1403|>": 153075,
457
+ "<|action_1404|>": 153076,
458
+ "<|action_1405|>": 153077,
459
+ "<|action_1406|>": 153078,
460
+ "<|action_1407|>": 153079,
461
+ "<|action_1408|>": 153080,
462
+ "<|action_1409|>": 153081,
463
+ "<|action_140|>": 151812,
464
+ "<|action_1410|>": 153082,
465
+ "<|action_1411|>": 153083,
466
+ "<|action_1412|>": 153084,
467
+ "<|action_1413|>": 153085,
468
+ "<|action_1414|>": 153086,
469
+ "<|action_1415|>": 153087,
470
+ "<|action_1416|>": 153088,
471
+ "<|action_1417|>": 153089,
472
+ "<|action_1418|>": 153090,
473
+ "<|action_1419|>": 153091,
474
+ "<|action_141|>": 151813,
475
+ "<|action_1420|>": 153092,
476
+ "<|action_1421|>": 153093,
477
+ "<|action_1422|>": 153094,
478
+ "<|action_1423|>": 153095,
479
+ "<|action_1424|>": 153096,
480
+ "<|action_1425|>": 153097,
481
+ "<|action_1426|>": 153098,
482
+ "<|action_1427|>": 153099,
483
+ "<|action_1428|>": 153100,
484
+ "<|action_1429|>": 153101,
485
+ "<|action_142|>": 151814,
486
+ "<|action_1430|>": 153102,
487
+ "<|action_1431|>": 153103,
488
+ "<|action_1432|>": 153104,
489
+ "<|action_1433|>": 153105,
490
+ "<|action_1434|>": 153106,
491
+ "<|action_1435|>": 153107,
492
+ "<|action_1436|>": 153108,
493
+ "<|action_1437|>": 153109,
494
+ "<|action_1438|>": 153110,
495
+ "<|action_1439|>": 153111,
496
+ "<|action_143|>": 151815,
497
+ "<|action_1440|>": 153112,
498
+ "<|action_1441|>": 153113,
499
+ "<|action_1442|>": 153114,
500
+ "<|action_1443|>": 153115,
501
+ "<|action_1444|>": 153116,
502
+ "<|action_1445|>": 153117,
503
+ "<|action_1446|>": 153118,
504
+ "<|action_1447|>": 153119,
505
+ "<|action_1448|>": 153120,
506
+ "<|action_1449|>": 153121,
507
+ "<|action_144|>": 151816,
508
+ "<|action_1450|>": 153122,
509
+ "<|action_1451|>": 153123,
510
+ "<|action_1452|>": 153124,
511
+ "<|action_1453|>": 153125,
512
+ "<|action_1454|>": 153126,
513
+ "<|action_1455|>": 153127,
514
+ "<|action_1456|>": 153128,
515
+ "<|action_1457|>": 153129,
516
+ "<|action_1458|>": 153130,
517
+ "<|action_1459|>": 153131,
518
+ "<|action_145|>": 151817,
519
+ "<|action_1460|>": 153132,
520
+ "<|action_1461|>": 153133,
521
+ "<|action_1462|>": 153134,
522
+ "<|action_1463|>": 153135,
523
+ "<|action_1464|>": 153136,
524
+ "<|action_1465|>": 153137,
525
+ "<|action_1466|>": 153138,
526
+ "<|action_1467|>": 153139,
527
+ "<|action_1468|>": 153140,
528
+ "<|action_1469|>": 153141,
529
+ "<|action_146|>": 151818,
530
+ "<|action_1470|>": 153142,
531
+ "<|action_1471|>": 153143,
532
+ "<|action_1472|>": 153144,
533
+ "<|action_1473|>": 153145,
534
+ "<|action_1474|>": 153146,
535
+ "<|action_1475|>": 153147,
536
+ "<|action_1476|>": 153148,
537
+ "<|action_1477|>": 153149,
538
+ "<|action_1478|>": 153150,
539
+ "<|action_1479|>": 153151,
540
+ "<|action_147|>": 151819,
541
+ "<|action_1480|>": 153152,
542
+ "<|action_1481|>": 153153,
543
+ "<|action_1482|>": 153154,
544
+ "<|action_1483|>": 153155,
545
+ "<|action_1484|>": 153156,
546
+ "<|action_1485|>": 153157,
547
+ "<|action_1486|>": 153158,
548
+ "<|action_1487|>": 153159,
549
+ "<|action_1488|>": 153160,
550
+ "<|action_1489|>": 153161,
551
+ "<|action_148|>": 151820,
552
+ "<|action_1490|>": 153162,
553
+ "<|action_1491|>": 153163,
554
+ "<|action_1492|>": 153164,
555
+ "<|action_1493|>": 153165,
556
+ "<|action_1494|>": 153166,
557
+ "<|action_1495|>": 153167,
558
+ "<|action_1496|>": 153168,
559
+ "<|action_1497|>": 153169,
560
+ "<|action_1498|>": 153170,
561
+ "<|action_1499|>": 153171,
562
+ "<|action_149|>": 151821,
563
+ "<|action_14|>": 151686,
564
+ "<|action_1500|>": 153172,
565
+ "<|action_1501|>": 153173,
566
+ "<|action_1502|>": 153174,
567
+ "<|action_1503|>": 153175,
568
+ "<|action_1504|>": 153176,
569
+ "<|action_1505|>": 153177,
570
+ "<|action_1506|>": 153178,
571
+ "<|action_1507|>": 153179,
572
+ "<|action_1508|>": 153180,
573
+ "<|action_1509|>": 153181,
574
+ "<|action_150|>": 151822,
575
+ "<|action_1510|>": 153182,
576
+ "<|action_1511|>": 153183,
577
+ "<|action_1512|>": 153184,
578
+ "<|action_1513|>": 153185,
579
+ "<|action_1514|>": 153186,
580
+ "<|action_1515|>": 153187,
581
+ "<|action_1516|>": 153188,
582
+ "<|action_1517|>": 153189,
583
+ "<|action_1518|>": 153190,
584
+ "<|action_1519|>": 153191,
585
+ "<|action_151|>": 151823,
586
+ "<|action_1520|>": 153192,
587
+ "<|action_1521|>": 153193,
588
+ "<|action_1522|>": 153194,
589
+ "<|action_1523|>": 153195,
590
+ "<|action_1524|>": 153196,
591
+ "<|action_1525|>": 153197,
592
+ "<|action_1526|>": 153198,
593
+ "<|action_1527|>": 153199,
594
+ "<|action_1528|>": 153200,
595
+ "<|action_1529|>": 153201,
596
+ "<|action_152|>": 151824,
597
+ "<|action_1530|>": 153202,
598
+ "<|action_1531|>": 153203,
599
+ "<|action_1532|>": 153204,
600
+ "<|action_1533|>": 153205,
601
+ "<|action_1534|>": 153206,
602
+ "<|action_1535|>": 153207,
603
+ "<|action_1536|>": 153208,
604
+ "<|action_1537|>": 153209,
605
+ "<|action_1538|>": 153210,
606
+ "<|action_1539|>": 153211,
607
+ "<|action_153|>": 151825,
608
+ "<|action_1540|>": 153212,
609
+ "<|action_1541|>": 153213,
610
+ "<|action_1542|>": 153214,
611
+ "<|action_1543|>": 153215,
612
+ "<|action_1544|>": 153216,
613
+ "<|action_1545|>": 153217,
614
+ "<|action_1546|>": 153218,
615
+ "<|action_1547|>": 153219,
616
+ "<|action_1548|>": 153220,
617
+ "<|action_1549|>": 153221,
618
+ "<|action_154|>": 151826,
619
+ "<|action_1550|>": 153222,
620
+ "<|action_1551|>": 153223,
621
+ "<|action_1552|>": 153224,
622
+ "<|action_1553|>": 153225,
623
+ "<|action_1554|>": 153226,
624
+ "<|action_1555|>": 153227,
625
+ "<|action_1556|>": 153228,
626
+ "<|action_1557|>": 153229,
627
+ "<|action_1558|>": 153230,
628
+ "<|action_1559|>": 153231,
629
+ "<|action_155|>": 151827,
630
+ "<|action_1560|>": 153232,
631
+ "<|action_1561|>": 153233,
632
+ "<|action_1562|>": 153234,
633
+ "<|action_1563|>": 153235,
634
+ "<|action_1564|>": 153236,
635
+ "<|action_1565|>": 153237,
636
+ "<|action_1566|>": 153238,
637
+ "<|action_1567|>": 153239,
638
+ "<|action_1568|>": 153240,
639
+ "<|action_1569|>": 153241,
640
+ "<|action_156|>": 151828,
641
+ "<|action_1570|>": 153242,
642
+ "<|action_1571|>": 153243,
643
+ "<|action_1572|>": 153244,
644
+ "<|action_1573|>": 153245,
645
+ "<|action_1574|>": 153246,
646
+ "<|action_1575|>": 153247,
647
+ "<|action_1576|>": 153248,
648
+ "<|action_1577|>": 153249,
649
+ "<|action_1578|>": 153250,
650
+ "<|action_1579|>": 153251,
651
+ "<|action_157|>": 151829,
652
+ "<|action_1580|>": 153252,
653
+ "<|action_1581|>": 153253,
654
+ "<|action_1582|>": 153254,
655
+ "<|action_1583|>": 153255,
656
+ "<|action_1584|>": 153256,
657
+ "<|action_1585|>": 153257,
658
+ "<|action_1586|>": 153258,
659
+ "<|action_1587|>": 153259,
660
+ "<|action_1588|>": 153260,
661
+ "<|action_1589|>": 153261,
662
+ "<|action_158|>": 151830,
663
+ "<|action_1590|>": 153262,
664
+ "<|action_1591|>": 153263,
665
+ "<|action_1592|>": 153264,
666
+ "<|action_1593|>": 153265,
667
+ "<|action_1594|>": 153266,
668
+ "<|action_1595|>": 153267,
669
+ "<|action_1596|>": 153268,
670
+ "<|action_1597|>": 153269,
671
+ "<|action_1598|>": 153270,
672
+ "<|action_1599|>": 153271,
673
+ "<|action_159|>": 151831,
674
+ "<|action_15|>": 151687,
675
+ "<|action_1600|>": 153272,
676
+ "<|action_1601|>": 153273,
677
+ "<|action_1602|>": 153274,
678
+ "<|action_1603|>": 153275,
679
+ "<|action_1604|>": 153276,
680
+ "<|action_1605|>": 153277,
681
+ "<|action_1606|>": 153278,
682
+ "<|action_1607|>": 153279,
683
+ "<|action_1608|>": 153280,
684
+ "<|action_1609|>": 153281,
685
+ "<|action_160|>": 151832,
686
+ "<|action_1610|>": 153282,
687
+ "<|action_1611|>": 153283,
688
+ "<|action_1612|>": 153284,
689
+ "<|action_1613|>": 153285,
690
+ "<|action_1614|>": 153286,
691
+ "<|action_1615|>": 153287,
692
+ "<|action_1616|>": 153288,
693
+ "<|action_1617|>": 153289,
694
+ "<|action_1618|>": 153290,
695
+ "<|action_1619|>": 153291,
696
+ "<|action_161|>": 151833,
697
+ "<|action_1620|>": 153292,
698
+ "<|action_1621|>": 153293,
699
+ "<|action_1622|>": 153294,
700
+ "<|action_1623|>": 153295,
701
+ "<|action_1624|>": 153296,
702
+ "<|action_1625|>": 153297,
703
+ "<|action_1626|>": 153298,
704
+ "<|action_1627|>": 153299,
705
+ "<|action_1628|>": 153300,
706
+ "<|action_1629|>": 153301,
707
+ "<|action_162|>": 151834,
708
+ "<|action_1630|>": 153302,
709
+ "<|action_1631|>": 153303,
710
+ "<|action_1632|>": 153304,
711
+ "<|action_1633|>": 153305,
712
+ "<|action_1634|>": 153306,
713
+ "<|action_1635|>": 153307,
714
+ "<|action_1636|>": 153308,
715
+ "<|action_1637|>": 153309,
716
+ "<|action_1638|>": 153310,
717
+ "<|action_1639|>": 153311,
718
+ "<|action_163|>": 151835,
719
+ "<|action_1640|>": 153312,
720
+ "<|action_1641|>": 153313,
721
+ "<|action_1642|>": 153314,
722
+ "<|action_1643|>": 153315,
723
+ "<|action_1644|>": 153316,
724
+ "<|action_1645|>": 153317,
725
+ "<|action_1646|>": 153318,
726
+ "<|action_1647|>": 153319,
727
+ "<|action_1648|>": 153320,
728
+ "<|action_1649|>": 153321,
729
+ "<|action_164|>": 151836,
730
+ "<|action_1650|>": 153322,
731
+ "<|action_1651|>": 153323,
732
+ "<|action_1652|>": 153324,
733
+ "<|action_1653|>": 153325,
734
+ "<|action_1654|>": 153326,
735
+ "<|action_1655|>": 153327,
736
+ "<|action_1656|>": 153328,
737
+ "<|action_1657|>": 153329,
738
+ "<|action_1658|>": 153330,
739
+ "<|action_1659|>": 153331,
740
+ "<|action_165|>": 151837,
741
+ "<|action_1660|>": 153332,
742
+ "<|action_1661|>": 153333,
743
+ "<|action_1662|>": 153334,
744
+ "<|action_1663|>": 153335,
745
+ "<|action_1664|>": 153336,
746
+ "<|action_1665|>": 153337,
747
+ "<|action_1666|>": 153338,
748
+ "<|action_1667|>": 153339,
749
+ "<|action_1668|>": 153340,
750
+ "<|action_1669|>": 153341,
751
+ "<|action_166|>": 151838,
752
+ "<|action_1670|>": 153342,
753
+ "<|action_1671|>": 153343,
754
+ "<|action_1672|>": 153344,
755
+ "<|action_1673|>": 153345,
756
+ "<|action_1674|>": 153346,
757
+ "<|action_1675|>": 153347,
758
+ "<|action_1676|>": 153348,
759
+ "<|action_1677|>": 153349,
760
+ "<|action_1678|>": 153350,
761
+ "<|action_1679|>": 153351,
762
+ "<|action_167|>": 151839,
763
+ "<|action_1680|>": 153352,
764
+ "<|action_1681|>": 153353,
765
+ "<|action_1682|>": 153354,
766
+ "<|action_1683|>": 153355,
767
+ "<|action_1684|>": 153356,
768
+ "<|action_1685|>": 153357,
769
+ "<|action_1686|>": 153358,
770
+ "<|action_1687|>": 153359,
771
+ "<|action_1688|>": 153360,
772
+ "<|action_1689|>": 153361,
773
+ "<|action_168|>": 151840,
774
+ "<|action_1690|>": 153362,
775
+ "<|action_1691|>": 153363,
776
+ "<|action_1692|>": 153364,
777
+ "<|action_1693|>": 153365,
778
+ "<|action_1694|>": 153366,
779
+ "<|action_1695|>": 153367,
780
+ "<|action_1696|>": 153368,
781
+ "<|action_1697|>": 153369,
782
+ "<|action_1698|>": 153370,
783
+ "<|action_1699|>": 153371,
784
+ "<|action_169|>": 151841,
785
+ "<|action_16|>": 151688,
786
+ "<|action_1700|>": 153372,
787
+ "<|action_1701|>": 153373,
788
+ "<|action_1702|>": 153374,
789
+ "<|action_1703|>": 153375,
790
+ "<|action_1704|>": 153376,
791
+ "<|action_1705|>": 153377,
792
+ "<|action_1706|>": 153378,
793
+ "<|action_1707|>": 153379,
794
+ "<|action_1708|>": 153380,
795
+ "<|action_1709|>": 153381,
796
+ "<|action_170|>": 151842,
797
+ "<|action_1710|>": 153382,
798
+ "<|action_1711|>": 153383,
799
+ "<|action_1712|>": 153384,
800
+ "<|action_1713|>": 153385,
801
+ "<|action_1714|>": 153386,
802
+ "<|action_1715|>": 153387,
803
+ "<|action_1716|>": 153388,
804
+ "<|action_1717|>": 153389,
805
+ "<|action_1718|>": 153390,
806
+ "<|action_1719|>": 153391,
807
+ "<|action_171|>": 151843,
808
+ "<|action_1720|>": 153392,
809
+ "<|action_1721|>": 153393,
810
+ "<|action_1722|>": 153394,
811
+ "<|action_1723|>": 153395,
812
+ "<|action_1724|>": 153396,
813
+ "<|action_1725|>": 153397,
814
+ "<|action_1726|>": 153398,
815
+ "<|action_1727|>": 153399,
816
+ "<|action_1728|>": 153400,
817
+ "<|action_1729|>": 153401,
818
+ "<|action_172|>": 151844,
819
+ "<|action_1730|>": 153402,
820
+ "<|action_1731|>": 153403,
821
+ "<|action_1732|>": 153404,
822
+ "<|action_1733|>": 153405,
823
+ "<|action_1734|>": 153406,
824
+ "<|action_1735|>": 153407,
825
+ "<|action_1736|>": 153408,
826
+ "<|action_1737|>": 153409,
827
+ "<|action_1738|>": 153410,
828
+ "<|action_1739|>": 153411,
829
+ "<|action_173|>": 151845,
830
+ "<|action_1740|>": 153412,
831
+ "<|action_1741|>": 153413,
832
+ "<|action_1742|>": 153414,
833
+ "<|action_1743|>": 153415,
834
+ "<|action_1744|>": 153416,
835
+ "<|action_1745|>": 153417,
836
+ "<|action_1746|>": 153418,
837
+ "<|action_1747|>": 153419,
838
+ "<|action_1748|>": 153420,
839
+ "<|action_1749|>": 153421,
840
+ "<|action_174|>": 151846,
841
+ "<|action_1750|>": 153422,
842
+ "<|action_1751|>": 153423,
843
+ "<|action_1752|>": 153424,
844
+ "<|action_1753|>": 153425,
845
+ "<|action_1754|>": 153426,
846
+ "<|action_1755|>": 153427,
847
+ "<|action_1756|>": 153428,
848
+ "<|action_1757|>": 153429,
849
+ "<|action_1758|>": 153430,
850
+ "<|action_1759|>": 153431,
851
+ "<|action_175|>": 151847,
852
+ "<|action_1760|>": 153432,
853
+ "<|action_1761|>": 153433,
854
+ "<|action_1762|>": 153434,
855
+ "<|action_1763|>": 153435,
856
+ "<|action_1764|>": 153436,
857
+ "<|action_1765|>": 153437,
858
+ "<|action_1766|>": 153438,
859
+ "<|action_1767|>": 153439,
860
+ "<|action_1768|>": 153440,
861
+ "<|action_1769|>": 153441,
862
+ "<|action_176|>": 151848,
863
+ "<|action_1770|>": 153442,
864
+ "<|action_1771|>": 153443,
865
+ "<|action_1772|>": 153444,
866
+ "<|action_1773|>": 153445,
867
+ "<|action_1774|>": 153446,
868
+ "<|action_1775|>": 153447,
869
+ "<|action_1776|>": 153448,
870
+ "<|action_1777|>": 153449,
871
+ "<|action_1778|>": 153450,
872
+ "<|action_1779|>": 153451,
873
+ "<|action_177|>": 151849,
874
+ "<|action_1780|>": 153452,
875
+ "<|action_1781|>": 153453,
876
+ "<|action_1782|>": 153454,
877
+ "<|action_1783|>": 153455,
878
+ "<|action_1784|>": 153456,
879
+ "<|action_1785|>": 153457,
880
+ "<|action_1786|>": 153458,
881
+ "<|action_1787|>": 153459,
882
+ "<|action_1788|>": 153460,
883
+ "<|action_1789|>": 153461,
884
+ "<|action_178|>": 151850,
885
+ "<|action_1790|>": 153462,
886
+ "<|action_1791|>": 153463,
887
+ "<|action_1792|>": 153464,
888
+ "<|action_1793|>": 153465,
889
+ "<|action_1794|>": 153466,
890
+ "<|action_1795|>": 153467,
891
+ "<|action_1796|>": 153468,
892
+ "<|action_1797|>": 153469,
893
+ "<|action_1798|>": 153470,
894
+ "<|action_1799|>": 153471,
895
+ "<|action_179|>": 151851,
896
+ "<|action_17|>": 151689,
897
+ "<|action_1800|>": 153472,
898
+ "<|action_1801|>": 153473,
899
+ "<|action_1802|>": 153474,
900
+ "<|action_1803|>": 153475,
901
+ "<|action_1804|>": 153476,
902
+ "<|action_1805|>": 153477,
903
+ "<|action_1806|>": 153478,
904
+ "<|action_1807|>": 153479,
905
+ "<|action_1808|>": 153480,
906
+ "<|action_1809|>": 153481,
907
+ "<|action_180|>": 151852,
908
+ "<|action_1810|>": 153482,
909
+ "<|action_1811|>": 153483,
910
+ "<|action_1812|>": 153484,
911
+ "<|action_1813|>": 153485,
912
+ "<|action_1814|>": 153486,
913
+ "<|action_1815|>": 153487,
914
+ "<|action_1816|>": 153488,
915
+ "<|action_1817|>": 153489,
916
+ "<|action_1818|>": 153490,
917
+ "<|action_1819|>": 153491,
918
+ "<|action_181|>": 151853,
919
+ "<|action_1820|>": 153492,
920
+ "<|action_1821|>": 153493,
921
+ "<|action_1822|>": 153494,
922
+ "<|action_1823|>": 153495,
923
+ "<|action_1824|>": 153496,
924
+ "<|action_1825|>": 153497,
925
+ "<|action_1826|>": 153498,
926
+ "<|action_1827|>": 153499,
927
+ "<|action_1828|>": 153500,
928
+ "<|action_1829|>": 153501,
929
+ "<|action_182|>": 151854,
930
+ "<|action_1830|>": 153502,
931
+ "<|action_1831|>": 153503,
932
+ "<|action_1832|>": 153504,
933
+ "<|action_1833|>": 153505,
934
+ "<|action_1834|>": 153506,
935
+ "<|action_1835|>": 153507,
936
+ "<|action_1836|>": 153508,
937
+ "<|action_1837|>": 153509,
938
+ "<|action_1838|>": 153510,
939
+ "<|action_1839|>": 153511,
940
+ "<|action_183|>": 151855,
941
+ "<|action_1840|>": 153512,
942
+ "<|action_1841|>": 153513,
943
+ "<|action_1842|>": 153514,
944
+ "<|action_1843|>": 153515,
945
+ "<|action_1844|>": 153516,
946
+ "<|action_1845|>": 153517,
947
+ "<|action_1846|>": 153518,
948
+ "<|action_1847|>": 153519,
949
+ "<|action_1848|>": 153520,
950
+ "<|action_1849|>": 153521,
951
+ "<|action_184|>": 151856,
952
+ "<|action_1850|>": 153522,
953
+ "<|action_1851|>": 153523,
954
+ "<|action_1852|>": 153524,
955
+ "<|action_1853|>": 153525,
956
+ "<|action_1854|>": 153526,
957
+ "<|action_1855|>": 153527,
958
+ "<|action_1856|>": 153528,
959
+ "<|action_1857|>": 153529,
960
+ "<|action_1858|>": 153530,
961
+ "<|action_1859|>": 153531,
962
+ "<|action_185|>": 151857,
963
+ "<|action_1860|>": 153532,
964
+ "<|action_1861|>": 153533,
965
+ "<|action_1862|>": 153534,
966
+ "<|action_1863|>": 153535,
967
+ "<|action_1864|>": 153536,
968
+ "<|action_1865|>": 153537,
969
+ "<|action_1866|>": 153538,
970
+ "<|action_1867|>": 153539,
971
+ "<|action_1868|>": 153540,
972
+ "<|action_1869|>": 153541,
973
+ "<|action_186|>": 151858,
974
+ "<|action_1870|>": 153542,
975
+ "<|action_1871|>": 153543,
976
+ "<|action_1872|>": 153544,
977
+ "<|action_1873|>": 153545,
978
+ "<|action_1874|>": 153546,
979
+ "<|action_1875|>": 153547,
980
+ "<|action_1876|>": 153548,
981
+ "<|action_1877|>": 153549,
982
+ "<|action_1878|>": 153550,
983
+ "<|action_1879|>": 153551,
984
+ "<|action_187|>": 151859,
985
+ "<|action_1880|>": 153552,
986
+ "<|action_1881|>": 153553,
987
+ "<|action_1882|>": 153554,
988
+ "<|action_1883|>": 153555,
989
+ "<|action_1884|>": 153556,
990
+ "<|action_1885|>": 153557,
991
+ "<|action_1886|>": 153558,
992
+ "<|action_1887|>": 153559,
993
+ "<|action_1888|>": 153560,
994
+ "<|action_1889|>": 153561,
995
+ "<|action_188|>": 151860,
996
+ "<|action_1890|>": 153562,
997
+ "<|action_1891|>": 153563,
998
+ "<|action_1892|>": 153564,
999
+ "<|action_1893|>": 153565,
1000
+ "<|action_1894|>": 153566,
1001
+ "<|action_1895|>": 153567,
1002
+ "<|action_1896|>": 153568,
1003
+ "<|action_1897|>": 153569,
1004
+ "<|action_1898|>": 153570,
1005
+ "<|action_1899|>": 153571,
1006
+ "<|action_189|>": 151861,
1007
+ "<|action_18|>": 151690,
1008
+ "<|action_1900|>": 153572,
1009
+ "<|action_1901|>": 153573,
1010
+ "<|action_1902|>": 153574,
1011
+ "<|action_1903|>": 153575,
1012
+ "<|action_1904|>": 153576,
1013
+ "<|action_1905|>": 153577,
1014
+ "<|action_1906|>": 153578,
1015
+ "<|action_1907|>": 153579,
1016
+ "<|action_1908|>": 153580,
1017
+ "<|action_1909|>": 153581,
1018
+ "<|action_190|>": 151862,
1019
+ "<|action_1910|>": 153582,
1020
+ "<|action_1911|>": 153583,
1021
+ "<|action_1912|>": 153584,
1022
+ "<|action_1913|>": 153585,
1023
+ "<|action_1914|>": 153586,
1024
+ "<|action_1915|>": 153587,
1025
+ "<|action_1916|>": 153588,
1026
+ "<|action_1917|>": 153589,
1027
+ "<|action_1918|>": 153590,
1028
+ "<|action_1919|>": 153591,
1029
+ "<|action_191|>": 151863,
1030
+ "<|action_1920|>": 153592,
1031
+ "<|action_1921|>": 153593,
1032
+ "<|action_1922|>": 153594,
1033
+ "<|action_1923|>": 153595,
1034
+ "<|action_1924|>": 153596,
1035
+ "<|action_1925|>": 153597,
1036
+ "<|action_1926|>": 153598,
1037
+ "<|action_1927|>": 153599,
1038
+ "<|action_1928|>": 153600,
1039
+ "<|action_1929|>": 153601,
1040
+ "<|action_192|>": 151864,
1041
+ "<|action_1930|>": 153602,
1042
+ "<|action_1931|>": 153603,
1043
+ "<|action_1932|>": 153604,
1044
+ "<|action_1933|>": 153605,
1045
+ "<|action_1934|>": 153606,
1046
+ "<|action_1935|>": 153607,
1047
+ "<|action_1936|>": 153608,
1048
+ "<|action_1937|>": 153609,
1049
+ "<|action_1938|>": 153610,
1050
+ "<|action_1939|>": 153611,
1051
+ "<|action_193|>": 151865,
1052
+ "<|action_1940|>": 153612,
1053
+ "<|action_1941|>": 153613,
1054
+ "<|action_1942|>": 153614,
1055
+ "<|action_1943|>": 153615,
1056
+ "<|action_1944|>": 153616,
1057
+ "<|action_1945|>": 153617,
1058
+ "<|action_1946|>": 153618,
1059
+ "<|action_1947|>": 153619,
1060
+ "<|action_1948|>": 153620,
1061
+ "<|action_1949|>": 153621,
1062
+ "<|action_194|>": 151866,
1063
+ "<|action_1950|>": 153622,
1064
+ "<|action_1951|>": 153623,
1065
+ "<|action_1952|>": 153624,
1066
+ "<|action_1953|>": 153625,
1067
+ "<|action_1954|>": 153626,
1068
+ "<|action_1955|>": 153627,
1069
+ "<|action_1956|>": 153628,
1070
+ "<|action_1957|>": 153629,
1071
+ "<|action_1958|>": 153630,
1072
+ "<|action_1959|>": 153631,
1073
+ "<|action_195|>": 151867,
1074
+ "<|action_1960|>": 153632,
1075
+ "<|action_1961|>": 153633,
1076
+ "<|action_1962|>": 153634,
1077
+ "<|action_1963|>": 153635,
1078
+ "<|action_1964|>": 153636,
1079
+ "<|action_1965|>": 153637,
1080
+ "<|action_1966|>": 153638,
1081
+ "<|action_1967|>": 153639,
1082
+ "<|action_1968|>": 153640,
1083
+ "<|action_1969|>": 153641,
1084
+ "<|action_196|>": 151868,
1085
+ "<|action_1970|>": 153642,
1086
+ "<|action_1971|>": 153643,
1087
+ "<|action_1972|>": 153644,
1088
+ "<|action_1973|>": 153645,
1089
+ "<|action_1974|>": 153646,
1090
+ "<|action_1975|>": 153647,
1091
+ "<|action_1976|>": 153648,
1092
+ "<|action_1977|>": 153649,
1093
+ "<|action_1978|>": 153650,
1094
+ "<|action_1979|>": 153651,
1095
+ "<|action_197|>": 151869,
1096
+ "<|action_1980|>": 153652,
1097
+ "<|action_1981|>": 153653,
1098
+ "<|action_1982|>": 153654,
1099
+ "<|action_1983|>": 153655,
1100
+ "<|action_1984|>": 153656,
1101
+ "<|action_1985|>": 153657,
1102
+ "<|action_1986|>": 153658,
1103
+ "<|action_1987|>": 153659,
1104
+ "<|action_1988|>": 153660,
1105
+ "<|action_1989|>": 153661,
1106
+ "<|action_198|>": 151870,
1107
+ "<|action_1990|>": 153662,
1108
+ "<|action_1991|>": 153663,
1109
+ "<|action_1992|>": 153664,
1110
+ "<|action_1993|>": 153665,
1111
+ "<|action_1994|>": 153666,
1112
+ "<|action_1995|>": 153667,
1113
+ "<|action_1996|>": 153668,
1114
+ "<|action_1997|>": 153669,
1115
+ "<|action_1998|>": 153670,
1116
+ "<|action_1999|>": 153671,
1117
+ "<|action_199|>": 151871,
1118
+ "<|action_19|>": 151691,
1119
+ "<|action_1|>": 151673,
1120
+ "<|action_2000|>": 153672,
1121
+ "<|action_2001|>": 153673,
1122
+ "<|action_2002|>": 153674,
1123
+ "<|action_2003|>": 153675,
1124
+ "<|action_2004|>": 153676,
1125
+ "<|action_2005|>": 153677,
1126
+ "<|action_2006|>": 153678,
1127
+ "<|action_2007|>": 153679,
1128
+ "<|action_2008|>": 153680,
1129
+ "<|action_2009|>": 153681,
1130
+ "<|action_200|>": 151872,
1131
+ "<|action_2010|>": 153682,
1132
+ "<|action_2011|>": 153683,
1133
+ "<|action_2012|>": 153684,
1134
+ "<|action_2013|>": 153685,
1135
+ "<|action_2014|>": 153686,
1136
+ "<|action_2015|>": 153687,
1137
+ "<|action_2016|>": 153688,
1138
+ "<|action_2017|>": 153689,
1139
+ "<|action_2018|>": 153690,
1140
+ "<|action_2019|>": 153691,
1141
+ "<|action_201|>": 151873,
1142
+ "<|action_2020|>": 153692,
1143
+ "<|action_2021|>": 153693,
1144
+ "<|action_2022|>": 153694,
1145
+ "<|action_2023|>": 153695,
1146
+ "<|action_2024|>": 153696,
1147
+ "<|action_2025|>": 153697,
1148
+ "<|action_2026|>": 153698,
1149
+ "<|action_2027|>": 153699,
1150
+ "<|action_2028|>": 153700,
1151
+ "<|action_2029|>": 153701,
1152
+ "<|action_202|>": 151874,
1153
+ "<|action_2030|>": 153702,
1154
+ "<|action_2031|>": 153703,
1155
+ "<|action_2032|>": 153704,
1156
+ "<|action_2033|>": 153705,
1157
+ "<|action_2034|>": 153706,
1158
+ "<|action_2035|>": 153707,
1159
+ "<|action_2036|>": 153708,
1160
+ "<|action_2037|>": 153709,
1161
+ "<|action_2038|>": 153710,
1162
+ "<|action_2039|>": 153711,
1163
+ "<|action_203|>": 151875,
1164
+ "<|action_2040|>": 153712,
1165
+ "<|action_2041|>": 153713,
1166
+ "<|action_2042|>": 153714,
1167
+ "<|action_2043|>": 153715,
1168
+ "<|action_2044|>": 153716,
1169
+ "<|action_2045|>": 153717,
1170
+ "<|action_2046|>": 153718,
1171
+ "<|action_2047|>": 153719,
1172
+ "<|action_204|>": 151876,
1173
+ "<|action_205|>": 151877,
1174
+ "<|action_206|>": 151878,
1175
+ "<|action_207|>": 151879,
1176
+ "<|action_208|>": 151880,
1177
+ "<|action_209|>": 151881,
1178
+ "<|action_20|>": 151692,
1179
+ "<|action_210|>": 151882,
1180
+ "<|action_211|>": 151883,
1181
+ "<|action_212|>": 151884,
1182
+ "<|action_213|>": 151885,
1183
+ "<|action_214|>": 151886,
1184
+ "<|action_215|>": 151887,
1185
+ "<|action_216|>": 151888,
1186
+ "<|action_217|>": 151889,
1187
+ "<|action_218|>": 151890,
1188
+ "<|action_219|>": 151891,
1189
+ "<|action_21|>": 151693,
1190
+ "<|action_220|>": 151892,
1191
+ "<|action_221|>": 151893,
1192
+ "<|action_222|>": 151894,
1193
+ "<|action_223|>": 151895,
1194
+ "<|action_224|>": 151896,
1195
+ "<|action_225|>": 151897,
1196
+ "<|action_226|>": 151898,
1197
+ "<|action_227|>": 151899,
1198
+ "<|action_228|>": 151900,
1199
+ "<|action_229|>": 151901,
1200
+ "<|action_22|>": 151694,
1201
+ "<|action_230|>": 151902,
1202
+ "<|action_231|>": 151903,
1203
+ "<|action_232|>": 151904,
1204
+ "<|action_233|>": 151905,
1205
+ "<|action_234|>": 151906,
1206
+ "<|action_235|>": 151907,
1207
+ "<|action_236|>": 151908,
1208
+ "<|action_237|>": 151909,
1209
+ "<|action_238|>": 151910,
1210
+ "<|action_239|>": 151911,
1211
+ "<|action_23|>": 151695,
1212
+ "<|action_240|>": 151912,
1213
+ "<|action_241|>": 151913,
1214
+ "<|action_242|>": 151914,
1215
+ "<|action_243|>": 151915,
1216
+ "<|action_244|>": 151916,
1217
+ "<|action_245|>": 151917,
1218
+ "<|action_246|>": 151918,
1219
+ "<|action_247|>": 151919,
1220
+ "<|action_248|>": 151920,
1221
+ "<|action_249|>": 151921,
1222
+ "<|action_24|>": 151696,
1223
+ "<|action_250|>": 151922,
1224
+ "<|action_251|>": 151923,
1225
+ "<|action_252|>": 151924,
1226
+ "<|action_253|>": 151925,
1227
+ "<|action_254|>": 151926,
1228
+ "<|action_255|>": 151927,
1229
+ "<|action_256|>": 151928,
1230
+ "<|action_257|>": 151929,
1231
+ "<|action_258|>": 151930,
1232
+ "<|action_259|>": 151931,
1233
+ "<|action_25|>": 151697,
1234
+ "<|action_260|>": 151932,
1235
+ "<|action_261|>": 151933,
1236
+ "<|action_262|>": 151934,
1237
+ "<|action_263|>": 151935,
1238
+ "<|action_264|>": 151936,
1239
+ "<|action_265|>": 151937,
1240
+ "<|action_266|>": 151938,
1241
+ "<|action_267|>": 151939,
1242
+ "<|action_268|>": 151940,
1243
+ "<|action_269|>": 151941,
1244
+ "<|action_26|>": 151698,
1245
+ "<|action_270|>": 151942,
1246
+ "<|action_271|>": 151943,
1247
+ "<|action_272|>": 151944,
1248
+ "<|action_273|>": 151945,
1249
+ "<|action_274|>": 151946,
1250
+ "<|action_275|>": 151947,
1251
+ "<|action_276|>": 151948,
1252
+ "<|action_277|>": 151949,
1253
+ "<|action_278|>": 151950,
1254
+ "<|action_279|>": 151951,
1255
+ "<|action_27|>": 151699,
1256
+ "<|action_280|>": 151952,
1257
+ "<|action_281|>": 151953,
1258
+ "<|action_282|>": 151954,
1259
+ "<|action_283|>": 151955,
1260
+ "<|action_284|>": 151956,
1261
+ "<|action_285|>": 151957,
1262
+ "<|action_286|>": 151958,
1263
+ "<|action_287|>": 151959,
1264
+ "<|action_288|>": 151960,
1265
+ "<|action_289|>": 151961,
1266
+ "<|action_28|>": 151700,
1267
+ "<|action_290|>": 151962,
1268
+ "<|action_291|>": 151963,
1269
+ "<|action_292|>": 151964,
1270
+ "<|action_293|>": 151965,
1271
+ "<|action_294|>": 151966,
1272
+ "<|action_295|>": 151967,
1273
+ "<|action_296|>": 151968,
1274
+ "<|action_297|>": 151969,
1275
+ "<|action_298|>": 151970,
1276
+ "<|action_299|>": 151971,
1277
+ "<|action_29|>": 151701,
1278
+ "<|action_2|>": 151674,
1279
+ "<|action_300|>": 151972,
1280
+ "<|action_301|>": 151973,
1281
+ "<|action_302|>": 151974,
1282
+ "<|action_303|>": 151975,
1283
+ "<|action_304|>": 151976,
1284
+ "<|action_305|>": 151977,
1285
+ "<|action_306|>": 151978,
1286
+ "<|action_307|>": 151979,
1287
+ "<|action_308|>": 151980,
1288
+ "<|action_309|>": 151981,
1289
+ "<|action_30|>": 151702,
1290
+ "<|action_310|>": 151982,
1291
+ "<|action_311|>": 151983,
1292
+ "<|action_312|>": 151984,
1293
+ "<|action_313|>": 151985,
1294
+ "<|action_314|>": 151986,
1295
+ "<|action_315|>": 151987,
1296
+ "<|action_316|>": 151988,
1297
+ "<|action_317|>": 151989,
1298
+ "<|action_318|>": 151990,
1299
+ "<|action_319|>": 151991,
1300
+ "<|action_31|>": 151703,
1301
+ "<|action_320|>": 151992,
1302
+ "<|action_321|>": 151993,
1303
+ "<|action_322|>": 151994,
1304
+ "<|action_323|>": 151995,
1305
+ "<|action_324|>": 151996,
1306
+ "<|action_325|>": 151997,
1307
+ "<|action_326|>": 151998,
1308
+ "<|action_327|>": 151999,
1309
+ "<|action_328|>": 152000,
1310
+ "<|action_329|>": 152001,
1311
+ "<|action_32|>": 151704,
1312
+ "<|action_330|>": 152002,
1313
+ "<|action_331|>": 152003,
1314
+ "<|action_332|>": 152004,
1315
+ "<|action_333|>": 152005,
1316
+ "<|action_334|>": 152006,
1317
+ "<|action_335|>": 152007,
1318
+ "<|action_336|>": 152008,
1319
+ "<|action_337|>": 152009,
1320
+ "<|action_338|>": 152010,
1321
+ "<|action_339|>": 152011,
1322
+ "<|action_33|>": 151705,
1323
+ "<|action_340|>": 152012,
1324
+ "<|action_341|>": 152013,
1325
+ "<|action_342|>": 152014,
1326
+ "<|action_343|>": 152015,
1327
+ "<|action_344|>": 152016,
1328
+ "<|action_345|>": 152017,
1329
+ "<|action_346|>": 152018,
1330
+ "<|action_347|>": 152019,
1331
+ "<|action_348|>": 152020,
1332
+ "<|action_349|>": 152021,
1333
+ "<|action_34|>": 151706,
1334
+ "<|action_350|>": 152022,
1335
+ "<|action_351|>": 152023,
1336
+ "<|action_352|>": 152024,
1337
+ "<|action_353|>": 152025,
1338
+ "<|action_354|>": 152026,
1339
+ "<|action_355|>": 152027,
1340
+ "<|action_356|>": 152028,
1341
+ "<|action_357|>": 152029,
1342
+ "<|action_358|>": 152030,
1343
+ "<|action_359|>": 152031,
1344
+ "<|action_35|>": 151707,
1345
+ "<|action_360|>": 152032,
1346
+ "<|action_361|>": 152033,
1347
+ "<|action_362|>": 152034,
1348
+ "<|action_363|>": 152035,
1349
+ "<|action_364|>": 152036,
1350
+ "<|action_365|>": 152037,
1351
+ "<|action_366|>": 152038,
1352
+ "<|action_367|>": 152039,
1353
+ "<|action_368|>": 152040,
1354
+ "<|action_369|>": 152041,
1355
+ "<|action_36|>": 151708,
1356
+ "<|action_370|>": 152042,
1357
+ "<|action_371|>": 152043,
1358
+ "<|action_372|>": 152044,
1359
+ "<|action_373|>": 152045,
1360
+ "<|action_374|>": 152046,
1361
+ "<|action_375|>": 152047,
1362
+ "<|action_376|>": 152048,
1363
+ "<|action_377|>": 152049,
1364
+ "<|action_378|>": 152050,
1365
+ "<|action_379|>": 152051,
1366
+ "<|action_37|>": 151709,
1367
+ "<|action_380|>": 152052,
1368
+ "<|action_381|>": 152053,
1369
+ "<|action_382|>": 152054,
1370
+ "<|action_383|>": 152055,
1371
+ "<|action_384|>": 152056,
1372
+ "<|action_385|>": 152057,
1373
+ "<|action_386|>": 152058,
1374
+ "<|action_387|>": 152059,
1375
+ "<|action_388|>": 152060,
1376
+ "<|action_389|>": 152061,
1377
+ "<|action_38|>": 151710,
1378
+ "<|action_390|>": 152062,
1379
+ "<|action_391|>": 152063,
1380
+ "<|action_392|>": 152064,
1381
+ "<|action_393|>": 152065,
1382
+ "<|action_394|>": 152066,
1383
+ "<|action_395|>": 152067,
1384
+ "<|action_396|>": 152068,
1385
+ "<|action_397|>": 152069,
1386
+ "<|action_398|>": 152070,
1387
+ "<|action_399|>": 152071,
1388
+ "<|action_39|>": 151711,
1389
+ "<|action_3|>": 151675,
1390
+ "<|action_400|>": 152072,
1391
+ "<|action_401|>": 152073,
1392
+ "<|action_402|>": 152074,
1393
+ "<|action_403|>": 152075,
1394
+ "<|action_404|>": 152076,
1395
+ "<|action_405|>": 152077,
1396
+ "<|action_406|>": 152078,
1397
+ "<|action_407|>": 152079,
1398
+ "<|action_408|>": 152080,
1399
+ "<|action_409|>": 152081,
1400
+ "<|action_40|>": 151712,
1401
+ "<|action_410|>": 152082,
1402
+ "<|action_411|>": 152083,
1403
+ "<|action_412|>": 152084,
1404
+ "<|action_413|>": 152085,
1405
+ "<|action_414|>": 152086,
1406
+ "<|action_415|>": 152087,
1407
+ "<|action_416|>": 152088,
1408
+ "<|action_417|>": 152089,
1409
+ "<|action_418|>": 152090,
1410
+ "<|action_419|>": 152091,
1411
+ "<|action_41|>": 151713,
1412
+ "<|action_420|>": 152092,
1413
+ "<|action_421|>": 152093,
1414
+ "<|action_422|>": 152094,
1415
+ "<|action_423|>": 152095,
1416
+ "<|action_424|>": 152096,
1417
+ "<|action_425|>": 152097,
1418
+ "<|action_426|>": 152098,
1419
+ "<|action_427|>": 152099,
1420
+ "<|action_428|>": 152100,
1421
+ "<|action_429|>": 152101,
1422
+ "<|action_42|>": 151714,
1423
+ "<|action_430|>": 152102,
1424
+ "<|action_431|>": 152103,
1425
+ "<|action_432|>": 152104,
1426
+ "<|action_433|>": 152105,
1427
+ "<|action_434|>": 152106,
1428
+ "<|action_435|>": 152107,
1429
+ "<|action_436|>": 152108,
1430
+ "<|action_437|>": 152109,
1431
+ "<|action_438|>": 152110,
1432
+ "<|action_439|>": 152111,
1433
+ "<|action_43|>": 151715,
1434
+ "<|action_440|>": 152112,
1435
+ "<|action_441|>": 152113,
1436
+ "<|action_442|>": 152114,
1437
+ "<|action_443|>": 152115,
1438
+ "<|action_444|>": 152116,
1439
+ "<|action_445|>": 152117,
1440
+ "<|action_446|>": 152118,
1441
+ "<|action_447|>": 152119,
1442
+ "<|action_448|>": 152120,
1443
+ "<|action_449|>": 152121,
1444
+ "<|action_44|>": 151716,
1445
+ "<|action_450|>": 152122,
1446
+ "<|action_451|>": 152123,
1447
+ "<|action_452|>": 152124,
1448
+ "<|action_453|>": 152125,
1449
+ "<|action_454|>": 152126,
1450
+ "<|action_455|>": 152127,
1451
+ "<|action_456|>": 152128,
1452
+ "<|action_457|>": 152129,
1453
+ "<|action_458|>": 152130,
1454
+ "<|action_459|>": 152131,
1455
+ "<|action_45|>": 151717,
1456
+ "<|action_460|>": 152132,
1457
+ "<|action_461|>": 152133,
1458
+ "<|action_462|>": 152134,
1459
+ "<|action_463|>": 152135,
1460
+ "<|action_464|>": 152136,
1461
+ "<|action_465|>": 152137,
1462
+ "<|action_466|>": 152138,
1463
+ "<|action_467|>": 152139,
1464
+ "<|action_468|>": 152140,
1465
+ "<|action_469|>": 152141,
1466
+ "<|action_46|>": 151718,
1467
+ "<|action_470|>": 152142,
1468
+ "<|action_471|>": 152143,
1469
+ "<|action_472|>": 152144,
1470
+ "<|action_473|>": 152145,
1471
+ "<|action_474|>": 152146,
1472
+ "<|action_475|>": 152147,
1473
+ "<|action_476|>": 152148,
1474
+ "<|action_477|>": 152149,
1475
+ "<|action_478|>": 152150,
1476
+ "<|action_479|>": 152151,
1477
+ "<|action_47|>": 151719,
1478
+ "<|action_480|>": 152152,
1479
+ "<|action_481|>": 152153,
1480
+ "<|action_482|>": 152154,
1481
+ "<|action_483|>": 152155,
1482
+ "<|action_484|>": 152156,
1483
+ "<|action_485|>": 152157,
1484
+ "<|action_486|>": 152158,
1485
+ "<|action_487|>": 152159,
1486
+ "<|action_488|>": 152160,
1487
+ "<|action_489|>": 152161,
1488
+ "<|action_48|>": 151720,
1489
+ "<|action_490|>": 152162,
1490
+ "<|action_491|>": 152163,
1491
+ "<|action_492|>": 152164,
1492
+ "<|action_493|>": 152165,
1493
+ "<|action_494|>": 152166,
1494
+ "<|action_495|>": 152167,
1495
+ "<|action_496|>": 152168,
1496
+ "<|action_497|>": 152169,
1497
+ "<|action_498|>": 152170,
1498
+ "<|action_499|>": 152171,
1499
+ "<|action_49|>": 151721,
1500
+ "<|action_4|>": 151676,
1501
+ "<|action_500|>": 152172,
1502
+ "<|action_501|>": 152173,
1503
+ "<|action_502|>": 152174,
1504
+ "<|action_503|>": 152175,
1505
+ "<|action_504|>": 152176,
1506
+ "<|action_505|>": 152177,
1507
+ "<|action_506|>": 152178,
1508
+ "<|action_507|>": 152179,
1509
+ "<|action_508|>": 152180,
1510
+ "<|action_509|>": 152181,
1511
+ "<|action_50|>": 151722,
1512
+ "<|action_510|>": 152182,
1513
+ "<|action_511|>": 152183,
1514
+ "<|action_512|>": 152184,
1515
+ "<|action_513|>": 152185,
1516
+ "<|action_514|>": 152186,
1517
+ "<|action_515|>": 152187,
1518
+ "<|action_516|>": 152188,
1519
+ "<|action_517|>": 152189,
1520
+ "<|action_518|>": 152190,
1521
+ "<|action_519|>": 152191,
1522
+ "<|action_51|>": 151723,
1523
+ "<|action_520|>": 152192,
1524
+ "<|action_521|>": 152193,
1525
+ "<|action_522|>": 152194,
1526
+ "<|action_523|>": 152195,
1527
+ "<|action_524|>": 152196,
1528
+ "<|action_525|>": 152197,
1529
+ "<|action_526|>": 152198,
1530
+ "<|action_527|>": 152199,
1531
+ "<|action_528|>": 152200,
1532
+ "<|action_529|>": 152201,
1533
+ "<|action_52|>": 151724,
1534
+ "<|action_530|>": 152202,
1535
+ "<|action_531|>": 152203,
1536
+ "<|action_532|>": 152204,
1537
+ "<|action_533|>": 152205,
1538
+ "<|action_534|>": 152206,
1539
+ "<|action_535|>": 152207,
1540
+ "<|action_536|>": 152208,
1541
+ "<|action_537|>": 152209,
1542
+ "<|action_538|>": 152210,
1543
+ "<|action_539|>": 152211,
1544
+ "<|action_53|>": 151725,
1545
+ "<|action_540|>": 152212,
1546
+ "<|action_541|>": 152213,
1547
+ "<|action_542|>": 152214,
1548
+ "<|action_543|>": 152215,
1549
+ "<|action_544|>": 152216,
1550
+ "<|action_545|>": 152217,
1551
+ "<|action_546|>": 152218,
1552
+ "<|action_547|>": 152219,
1553
+ "<|action_548|>": 152220,
1554
+ "<|action_549|>": 152221,
1555
+ "<|action_54|>": 151726,
1556
+ "<|action_550|>": 152222,
1557
+ "<|action_551|>": 152223,
1558
+ "<|action_552|>": 152224,
1559
+ "<|action_553|>": 152225,
1560
+ "<|action_554|>": 152226,
1561
+ "<|action_555|>": 152227,
1562
+ "<|action_556|>": 152228,
1563
+ "<|action_557|>": 152229,
1564
+ "<|action_558|>": 152230,
1565
+ "<|action_559|>": 152231,
1566
+ "<|action_55|>": 151727,
1567
+ "<|action_560|>": 152232,
1568
+ "<|action_561|>": 152233,
1569
+ "<|action_562|>": 152234,
1570
+ "<|action_563|>": 152235,
1571
+ "<|action_564|>": 152236,
1572
+ "<|action_565|>": 152237,
1573
+ "<|action_566|>": 152238,
1574
+ "<|action_567|>": 152239,
1575
+ "<|action_568|>": 152240,
1576
+ "<|action_569|>": 152241,
1577
+ "<|action_56|>": 151728,
1578
+ "<|action_570|>": 152242,
1579
+ "<|action_571|>": 152243,
1580
+ "<|action_572|>": 152244,
1581
+ "<|action_573|>": 152245,
1582
+ "<|action_574|>": 152246,
1583
+ "<|action_575|>": 152247,
1584
+ "<|action_576|>": 152248,
1585
+ "<|action_577|>": 152249,
1586
+ "<|action_578|>": 152250,
1587
+ "<|action_579|>": 152251,
1588
+ "<|action_57|>": 151729,
1589
+ "<|action_580|>": 152252,
1590
+ "<|action_581|>": 152253,
1591
+ "<|action_582|>": 152254,
1592
+ "<|action_583|>": 152255,
1593
+ "<|action_584|>": 152256,
1594
+ "<|action_585|>": 152257,
1595
+ "<|action_586|>": 152258,
1596
+ "<|action_587|>": 152259,
1597
+ "<|action_588|>": 152260,
1598
+ "<|action_589|>": 152261,
1599
+ "<|action_58|>": 151730,
1600
+ "<|action_590|>": 152262,
1601
+ "<|action_591|>": 152263,
1602
+ "<|action_592|>": 152264,
1603
+ "<|action_593|>": 152265,
1604
+ "<|action_594|>": 152266,
1605
+ "<|action_595|>": 152267,
1606
+ "<|action_596|>": 152268,
1607
+ "<|action_597|>": 152269,
1608
+ "<|action_598|>": 152270,
1609
+ "<|action_599|>": 152271,
1610
+ "<|action_59|>": 151731,
1611
+ "<|action_5|>": 151677,
1612
+ "<|action_600|>": 152272,
1613
+ "<|action_601|>": 152273,
1614
+ "<|action_602|>": 152274,
1615
+ "<|action_603|>": 152275,
1616
+ "<|action_604|>": 152276,
1617
+ "<|action_605|>": 152277,
1618
+ "<|action_606|>": 152278,
1619
+ "<|action_607|>": 152279,
1620
+ "<|action_608|>": 152280,
1621
+ "<|action_609|>": 152281,
1622
+ "<|action_60|>": 151732,
1623
+ "<|action_610|>": 152282,
1624
+ "<|action_611|>": 152283,
1625
+ "<|action_612|>": 152284,
1626
+ "<|action_613|>": 152285,
1627
+ "<|action_614|>": 152286,
1628
+ "<|action_615|>": 152287,
1629
+ "<|action_616|>": 152288,
1630
+ "<|action_617|>": 152289,
1631
+ "<|action_618|>": 152290,
1632
+ "<|action_619|>": 152291,
1633
+ "<|action_61|>": 151733,
1634
+ "<|action_620|>": 152292,
1635
+ "<|action_621|>": 152293,
1636
+ "<|action_622|>": 152294,
1637
+ "<|action_623|>": 152295,
1638
+ "<|action_624|>": 152296,
1639
+ "<|action_625|>": 152297,
1640
+ "<|action_626|>": 152298,
1641
+ "<|action_627|>": 152299,
1642
+ "<|action_628|>": 152300,
1643
+ "<|action_629|>": 152301,
1644
+ "<|action_62|>": 151734,
1645
+ "<|action_630|>": 152302,
1646
+ "<|action_631|>": 152303,
1647
+ "<|action_632|>": 152304,
1648
+ "<|action_633|>": 152305,
1649
+ "<|action_634|>": 152306,
1650
+ "<|action_635|>": 152307,
1651
+ "<|action_636|>": 152308,
1652
+ "<|action_637|>": 152309,
1653
+ "<|action_638|>": 152310,
1654
+ "<|action_639|>": 152311,
1655
+ "<|action_63|>": 151735,
1656
+ "<|action_640|>": 152312,
1657
+ "<|action_641|>": 152313,
1658
+ "<|action_642|>": 152314,
1659
+ "<|action_643|>": 152315,
1660
+ "<|action_644|>": 152316,
1661
+ "<|action_645|>": 152317,
1662
+ "<|action_646|>": 152318,
1663
+ "<|action_647|>": 152319,
1664
+ "<|action_648|>": 152320,
1665
+ "<|action_649|>": 152321,
1666
+ "<|action_64|>": 151736,
1667
+ "<|action_650|>": 152322,
1668
+ "<|action_651|>": 152323,
1669
+ "<|action_652|>": 152324,
1670
+ "<|action_653|>": 152325,
1671
+ "<|action_654|>": 152326,
1672
+ "<|action_655|>": 152327,
1673
+ "<|action_656|>": 152328,
1674
+ "<|action_657|>": 152329,
1675
+ "<|action_658|>": 152330,
1676
+ "<|action_659|>": 152331,
1677
+ "<|action_65|>": 151737,
1678
+ "<|action_660|>": 152332,
1679
+ "<|action_661|>": 152333,
1680
+ "<|action_662|>": 152334,
1681
+ "<|action_663|>": 152335,
1682
+ "<|action_664|>": 152336,
1683
+ "<|action_665|>": 152337,
1684
+ "<|action_666|>": 152338,
1685
+ "<|action_667|>": 152339,
1686
+ "<|action_668|>": 152340,
1687
+ "<|action_669|>": 152341,
1688
+ "<|action_66|>": 151738,
1689
+ "<|action_670|>": 152342,
1690
+ "<|action_671|>": 152343,
1691
+ "<|action_672|>": 152344,
1692
+ "<|action_673|>": 152345,
1693
+ "<|action_674|>": 152346,
1694
+ "<|action_675|>": 152347,
1695
+ "<|action_676|>": 152348,
1696
+ "<|action_677|>": 152349,
1697
+ "<|action_678|>": 152350,
1698
+ "<|action_679|>": 152351,
1699
+ "<|action_67|>": 151739,
1700
+ "<|action_680|>": 152352,
1701
+ "<|action_681|>": 152353,
1702
+ "<|action_682|>": 152354,
1703
+ "<|action_683|>": 152355,
1704
+ "<|action_684|>": 152356,
1705
+ "<|action_685|>": 152357,
1706
+ "<|action_686|>": 152358,
1707
+ "<|action_687|>": 152359,
1708
+ "<|action_688|>": 152360,
1709
+ "<|action_689|>": 152361,
1710
+ "<|action_68|>": 151740,
1711
+ "<|action_690|>": 152362,
1712
+ "<|action_691|>": 152363,
1713
+ "<|action_692|>": 152364,
1714
+ "<|action_693|>": 152365,
1715
+ "<|action_694|>": 152366,
1716
+ "<|action_695|>": 152367,
1717
+ "<|action_696|>": 152368,
1718
+ "<|action_697|>": 152369,
1719
+ "<|action_698|>": 152370,
1720
+ "<|action_699|>": 152371,
1721
+ "<|action_69|>": 151741,
1722
+ "<|action_6|>": 151678,
1723
+ "<|action_700|>": 152372,
1724
+ "<|action_701|>": 152373,
1725
+ "<|action_702|>": 152374,
1726
+ "<|action_703|>": 152375,
1727
+ "<|action_704|>": 152376,
1728
+ "<|action_705|>": 152377,
1729
+ "<|action_706|>": 152378,
1730
+ "<|action_707|>": 152379,
1731
+ "<|action_708|>": 152380,
1732
+ "<|action_709|>": 152381,
1733
+ "<|action_70|>": 151742,
1734
+ "<|action_710|>": 152382,
1735
+ "<|action_711|>": 152383,
1736
+ "<|action_712|>": 152384,
1737
+ "<|action_713|>": 152385,
1738
+ "<|action_714|>": 152386,
1739
+ "<|action_715|>": 152387,
1740
+ "<|action_716|>": 152388,
1741
+ "<|action_717|>": 152389,
1742
+ "<|action_718|>": 152390,
1743
+ "<|action_719|>": 152391,
1744
+ "<|action_71|>": 151743,
1745
+ "<|action_720|>": 152392,
1746
+ "<|action_721|>": 152393,
1747
+ "<|action_722|>": 152394,
1748
+ "<|action_723|>": 152395,
1749
+ "<|action_724|>": 152396,
1750
+ "<|action_725|>": 152397,
1751
+ "<|action_726|>": 152398,
1752
+ "<|action_727|>": 152399,
1753
+ "<|action_728|>": 152400,
1754
+ "<|action_729|>": 152401,
1755
+ "<|action_72|>": 151744,
1756
+ "<|action_730|>": 152402,
1757
+ "<|action_731|>": 152403,
1758
+ "<|action_732|>": 152404,
1759
+ "<|action_733|>": 152405,
1760
+ "<|action_734|>": 152406,
1761
+ "<|action_735|>": 152407,
1762
+ "<|action_736|>": 152408,
1763
+ "<|action_737|>": 152409,
1764
+ "<|action_738|>": 152410,
1765
+ "<|action_739|>": 152411,
1766
+ "<|action_73|>": 151745,
1767
+ "<|action_740|>": 152412,
1768
+ "<|action_741|>": 152413,
1769
+ "<|action_742|>": 152414,
1770
+ "<|action_743|>": 152415,
1771
+ "<|action_744|>": 152416,
1772
+ "<|action_745|>": 152417,
1773
+ "<|action_746|>": 152418,
1774
+ "<|action_747|>": 152419,
1775
+ "<|action_748|>": 152420,
1776
+ "<|action_749|>": 152421,
1777
+ "<|action_74|>": 151746,
1778
+ "<|action_750|>": 152422,
1779
+ "<|action_751|>": 152423,
1780
+ "<|action_752|>": 152424,
1781
+ "<|action_753|>": 152425,
1782
+ "<|action_754|>": 152426,
1783
+ "<|action_755|>": 152427,
1784
+ "<|action_756|>": 152428,
1785
+ "<|action_757|>": 152429,
1786
+ "<|action_758|>": 152430,
1787
+ "<|action_759|>": 152431,
1788
+ "<|action_75|>": 151747,
1789
+ "<|action_760|>": 152432,
1790
+ "<|action_761|>": 152433,
1791
+ "<|action_762|>": 152434,
1792
+ "<|action_763|>": 152435,
1793
+ "<|action_764|>": 152436,
1794
+ "<|action_765|>": 152437,
1795
+ "<|action_766|>": 152438,
1796
+ "<|action_767|>": 152439,
1797
+ "<|action_768|>": 152440,
1798
+ "<|action_769|>": 152441,
1799
+ "<|action_76|>": 151748,
1800
+ "<|action_770|>": 152442,
1801
+ "<|action_771|>": 152443,
1802
+ "<|action_772|>": 152444,
1803
+ "<|action_773|>": 152445,
1804
+ "<|action_774|>": 152446,
1805
+ "<|action_775|>": 152447,
1806
+ "<|action_776|>": 152448,
1807
+ "<|action_777|>": 152449,
1808
+ "<|action_778|>": 152450,
1809
+ "<|action_779|>": 152451,
1810
+ "<|action_77|>": 151749,
1811
+ "<|action_780|>": 152452,
1812
+ "<|action_781|>": 152453,
1813
+ "<|action_782|>": 152454,
1814
+ "<|action_783|>": 152455,
1815
+ "<|action_784|>": 152456,
1816
+ "<|action_785|>": 152457,
1817
+ "<|action_786|>": 152458,
1818
+ "<|action_787|>": 152459,
1819
+ "<|action_788|>": 152460,
1820
+ "<|action_789|>": 152461,
1821
+ "<|action_78|>": 151750,
1822
+ "<|action_790|>": 152462,
1823
+ "<|action_791|>": 152463,
1824
+ "<|action_792|>": 152464,
1825
+ "<|action_793|>": 152465,
1826
+ "<|action_794|>": 152466,
1827
+ "<|action_795|>": 152467,
1828
+ "<|action_796|>": 152468,
1829
+ "<|action_797|>": 152469,
1830
+ "<|action_798|>": 152470,
1831
+ "<|action_799|>": 152471,
1832
+ "<|action_79|>": 151751,
1833
+ "<|action_7|>": 151679,
1834
+ "<|action_800|>": 152472,
1835
+ "<|action_801|>": 152473,
1836
+ "<|action_802|>": 152474,
1837
+ "<|action_803|>": 152475,
1838
+ "<|action_804|>": 152476,
1839
+ "<|action_805|>": 152477,
1840
+ "<|action_806|>": 152478,
1841
+ "<|action_807|>": 152479,
1842
+ "<|action_808|>": 152480,
1843
+ "<|action_809|>": 152481,
1844
+ "<|action_80|>": 151752,
1845
+ "<|action_810|>": 152482,
1846
+ "<|action_811|>": 152483,
1847
+ "<|action_812|>": 152484,
1848
+ "<|action_813|>": 152485,
1849
+ "<|action_814|>": 152486,
1850
+ "<|action_815|>": 152487,
1851
+ "<|action_816|>": 152488,
1852
+ "<|action_817|>": 152489,
1853
+ "<|action_818|>": 152490,
1854
+ "<|action_819|>": 152491,
1855
+ "<|action_81|>": 151753,
1856
+ "<|action_820|>": 152492,
1857
+ "<|action_821|>": 152493,
1858
+ "<|action_822|>": 152494,
1859
+ "<|action_823|>": 152495,
1860
+ "<|action_824|>": 152496,
1861
+ "<|action_825|>": 152497,
1862
+ "<|action_826|>": 152498,
1863
+ "<|action_827|>": 152499,
1864
+ "<|action_828|>": 152500,
1865
+ "<|action_829|>": 152501,
1866
+ "<|action_82|>": 151754,
1867
+ "<|action_830|>": 152502,
1868
+ "<|action_831|>": 152503,
1869
+ "<|action_832|>": 152504,
1870
+ "<|action_833|>": 152505,
1871
+ "<|action_834|>": 152506,
1872
+ "<|action_835|>": 152507,
1873
+ "<|action_836|>": 152508,
1874
+ "<|action_837|>": 152509,
1875
+ "<|action_838|>": 152510,
1876
+ "<|action_839|>": 152511,
1877
+ "<|action_83|>": 151755,
1878
+ "<|action_840|>": 152512,
1879
+ "<|action_841|>": 152513,
1880
+ "<|action_842|>": 152514,
1881
+ "<|action_843|>": 152515,
1882
+ "<|action_844|>": 152516,
1883
+ "<|action_845|>": 152517,
1884
+ "<|action_846|>": 152518,
1885
+ "<|action_847|>": 152519,
1886
+ "<|action_848|>": 152520,
1887
+ "<|action_849|>": 152521,
1888
+ "<|action_84|>": 151756,
1889
+ "<|action_850|>": 152522,
1890
+ "<|action_851|>": 152523,
1891
+ "<|action_852|>": 152524,
1892
+ "<|action_853|>": 152525,
1893
+ "<|action_854|>": 152526,
1894
+ "<|action_855|>": 152527,
1895
+ "<|action_856|>": 152528,
1896
+ "<|action_857|>": 152529,
1897
+ "<|action_858|>": 152530,
1898
+ "<|action_859|>": 152531,
1899
+ "<|action_85|>": 151757,
1900
+ "<|action_860|>": 152532,
1901
+ "<|action_861|>": 152533,
1902
+ "<|action_862|>": 152534,
1903
+ "<|action_863|>": 152535,
1904
+ "<|action_864|>": 152536,
1905
+ "<|action_865|>": 152537,
1906
+ "<|action_866|>": 152538,
1907
+ "<|action_867|>": 152539,
1908
+ "<|action_868|>": 152540,
1909
+ "<|action_869|>": 152541,
1910
+ "<|action_86|>": 151758,
1911
+ "<|action_870|>": 152542,
1912
+ "<|action_871|>": 152543,
1913
+ "<|action_872|>": 152544,
1914
+ "<|action_873|>": 152545,
1915
+ "<|action_874|>": 152546,
1916
+ "<|action_875|>": 152547,
1917
+ "<|action_876|>": 152548,
1918
+ "<|action_877|>": 152549,
1919
+ "<|action_878|>": 152550,
1920
+ "<|action_879|>": 152551,
1921
+ "<|action_87|>": 151759,
1922
+ "<|action_880|>": 152552,
1923
+ "<|action_881|>": 152553,
1924
+ "<|action_882|>": 152554,
1925
+ "<|action_883|>": 152555,
1926
+ "<|action_884|>": 152556,
1927
+ "<|action_885|>": 152557,
1928
+ "<|action_886|>": 152558,
1929
+ "<|action_887|>": 152559,
1930
+ "<|action_888|>": 152560,
1931
+ "<|action_889|>": 152561,
1932
+ "<|action_88|>": 151760,
1933
+ "<|action_890|>": 152562,
1934
+ "<|action_891|>": 152563,
1935
+ "<|action_892|>": 152564,
1936
+ "<|action_893|>": 152565,
1937
+ "<|action_894|>": 152566,
1938
+ "<|action_895|>": 152567,
1939
+ "<|action_896|>": 152568,
1940
+ "<|action_897|>": 152569,
1941
+ "<|action_898|>": 152570,
1942
+ "<|action_899|>": 152571,
1943
+ "<|action_89|>": 151761,
1944
+ "<|action_8|>": 151680,
1945
+ "<|action_900|>": 152572,
1946
+ "<|action_901|>": 152573,
1947
+ "<|action_902|>": 152574,
1948
+ "<|action_903|>": 152575,
1949
+ "<|action_904|>": 152576,
1950
+ "<|action_905|>": 152577,
1951
+ "<|action_906|>": 152578,
1952
+ "<|action_907|>": 152579,
1953
+ "<|action_908|>": 152580,
1954
+ "<|action_909|>": 152581,
1955
+ "<|action_90|>": 151762,
1956
+ "<|action_910|>": 152582,
1957
+ "<|action_911|>": 152583,
1958
+ "<|action_912|>": 152584,
1959
+ "<|action_913|>": 152585,
1960
+ "<|action_914|>": 152586,
1961
+ "<|action_915|>": 152587,
1962
+ "<|action_916|>": 152588,
1963
+ "<|action_917|>": 152589,
1964
+ "<|action_918|>": 152590,
1965
+ "<|action_919|>": 152591,
1966
+ "<|action_91|>": 151763,
1967
+ "<|action_920|>": 152592,
1968
+ "<|action_921|>": 152593,
1969
+ "<|action_922|>": 152594,
1970
+ "<|action_923|>": 152595,
1971
+ "<|action_924|>": 152596,
1972
+ "<|action_925|>": 152597,
1973
+ "<|action_926|>": 152598,
1974
+ "<|action_927|>": 152599,
1975
+ "<|action_928|>": 152600,
1976
+ "<|action_929|>": 152601,
1977
+ "<|action_92|>": 151764,
1978
+ "<|action_930|>": 152602,
1979
+ "<|action_931|>": 152603,
1980
+ "<|action_932|>": 152604,
1981
+ "<|action_933|>": 152605,
1982
+ "<|action_934|>": 152606,
1983
+ "<|action_935|>": 152607,
1984
+ "<|action_936|>": 152608,
1985
+ "<|action_937|>": 152609,
1986
+ "<|action_938|>": 152610,
1987
+ "<|action_939|>": 152611,
1988
+ "<|action_93|>": 151765,
1989
+ "<|action_940|>": 152612,
1990
+ "<|action_941|>": 152613,
1991
+ "<|action_942|>": 152614,
1992
+ "<|action_943|>": 152615,
1993
+ "<|action_944|>": 152616,
1994
+ "<|action_945|>": 152617,
1995
+ "<|action_946|>": 152618,
1996
+ "<|action_947|>": 152619,
1997
+ "<|action_948|>": 152620,
1998
+ "<|action_949|>": 152621,
1999
+ "<|action_94|>": 151766,
2000
+ "<|action_950|>": 152622,
2001
+ "<|action_951|>": 152623,
2002
+ "<|action_952|>": 152624,
2003
+ "<|action_953|>": 152625,
2004
+ "<|action_954|>": 152626,
2005
+ "<|action_955|>": 152627,
2006
+ "<|action_956|>": 152628,
2007
+ "<|action_957|>": 152629,
2008
+ "<|action_958|>": 152630,
2009
+ "<|action_959|>": 152631,
2010
+ "<|action_95|>": 151767,
2011
+ "<|action_960|>": 152632,
2012
+ "<|action_961|>": 152633,
2013
+ "<|action_962|>": 152634,
2014
+ "<|action_963|>": 152635,
2015
+ "<|action_964|>": 152636,
2016
+ "<|action_965|>": 152637,
2017
+ "<|action_966|>": 152638,
2018
+ "<|action_967|>": 152639,
2019
+ "<|action_968|>": 152640,
2020
+ "<|action_969|>": 152641,
2021
+ "<|action_96|>": 151768,
2022
+ "<|action_970|>": 152642,
2023
+ "<|action_971|>": 152643,
2024
+ "<|action_972|>": 152644,
2025
+ "<|action_973|>": 152645,
2026
+ "<|action_974|>": 152646,
2027
+ "<|action_975|>": 152647,
2028
+ "<|action_976|>": 152648,
2029
+ "<|action_977|>": 152649,
2030
+ "<|action_978|>": 152650,
2031
+ "<|action_979|>": 152651,
2032
+ "<|action_97|>": 151769,
2033
+ "<|action_980|>": 152652,
2034
+ "<|action_981|>": 152653,
2035
+ "<|action_982|>": 152654,
2036
+ "<|action_983|>": 152655,
2037
+ "<|action_984|>": 152656,
2038
+ "<|action_985|>": 152657,
2039
+ "<|action_986|>": 152658,
2040
+ "<|action_987|>": 152659,
2041
+ "<|action_988|>": 152660,
2042
+ "<|action_989|>": 152661,
2043
+ "<|action_98|>": 151770,
2044
+ "<|action_990|>": 152662,
2045
+ "<|action_991|>": 152663,
2046
+ "<|action_992|>": 152664,
2047
+ "<|action_993|>": 152665,
2048
+ "<|action_994|>": 152666,
2049
+ "<|action_995|>": 152667,
2050
+ "<|action_996|>": 152668,
2051
+ "<|action_997|>": 152669,
2052
+ "<|action_998|>": 152670,
2053
+ "<|action_999|>": 152671,
2054
+ "<|action_99|>": 151771,
2055
+ "<|action_9|>": 151681,
2056
+ "<|action_end|>": 151670,
2057
+ "<|action_placeholder|>": 151671,
2058
+ "<|action_start|>": 151669,
2059
+ "<|box_end|>": 151649,
2060
+ "<|box_start|>": 151648,
2061
+ "<|endoftext|>": 151643,
2062
+ "<|file_sep|>": 151664,
2063
+ "<|fim_middle|>": 151660,
2064
+ "<|fim_pad|>": 151662,
2065
+ "<|fim_prefix|>": 151659,
2066
+ "<|fim_suffix|>": 151661,
2067
+ "<|im_end|>": 151645,
2068
+ "<|im_start|>": 151644,
2069
+ "<|image_pad|>": 151655,
2070
+ "<|object_ref_end|>": 151647,
2071
+ "<|object_ref_start|>": 151646,
2072
+ "<|quad_end|>": 151651,
2073
+ "<|quad_start|>": 151650,
2074
+ "<|repo_name|>": 151663,
2075
+ "<|video_pad|>": 151656,
2076
+ "<|vision_end|>": 151653,
2077
+ "<|vision_pad|>": 151654,
2078
+ "<|vision_start|>": 151652
2079
+ }
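The map above closes added_tokens.json: besides the stock Qwen special tokens, it registers a block of numbered <|action_k|> tokens (up to <|action_999|> = 152671) framed by <|action_start|> (151669), <|action_end|> (151670) and <|action_placeholder|> (151671). A minimal sketch of reading those ids back, assuming a local checkout of this repository and the transformers AutoTokenizer; the regex and the example bins are illustrative, not part of the uploaded files.

import re
from transformers import AutoTokenizer

# Assumes the current directory is a checkout of this repository.
tokenizer = AutoTokenizer.from_pretrained(".")

def action_bins(text):
    # Recover the integer bin k from every <|action_k|> token in a decoded string.
    return [int(k) for k in re.findall(r"<\|action_(\d+)\|>", text)]

chunk = "<|action_start|>" + "".join("<|action_%d|>" % k for k in (12, 500, 999)) + "<|action_end|>"
print(tokenizer.convert_tokens_to_ids(["<|action_start|>", "<|action_end|>", "<|action_999|>"]))
# -> [151669, 151670, 152671] per the map above
print(action_bins(chunk))  # -> [12, 500, 999]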
chat_template.jinja ADDED
@@ -0,0 +1,120 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {%- if messages[0].content is string %}
5
+ {{- messages[0].content }}
6
+ {%- else %}
7
+ {%- for content in messages[0].content %}
8
+ {%- if 'text' in content %}
9
+ {{- content.text }}
10
+ {%- endif %}
11
+ {%- endfor %}
12
+ {%- endif %}
13
+ {{- '\n\n' }}
14
+ {%- endif %}
15
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
16
+ {%- for tool in tools %}
17
+ {{- "\n" }}
18
+ {{- tool | tojson }}
19
+ {%- endfor %}
20
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
21
+ {%- else %}
22
+ {%- if messages[0].role == 'system' %}
23
+ {{- '<|im_start|>system\n' }}
24
+ {%- if messages[0].content is string %}
25
+ {{- messages[0].content }}
26
+ {%- else %}
27
+ {%- for content in messages[0].content %}
28
+ {%- if 'text' in content %}
29
+ {{- content.text }}
30
+ {%- endif %}
31
+ {%- endfor %}
32
+ {%- endif %}
33
+ {{- '<|im_end|>\n' }}
34
+ {%- endif %}
35
+ {%- endif %}
36
+ {%- set image_count = namespace(value=0) %}
37
+ {%- set video_count = namespace(value=0) %}
38
+ {%- for message in messages %}
39
+ {%- if message.role == "user" %}
40
+ {{- '<|im_start|>' + message.role + '\n' }}
41
+ {%- if message.content is string %}
42
+ {{- message.content }}
43
+ {%- else %}
44
+ {%- for content in message.content %}
45
+ {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
46
+ {%- set image_count.value = image_count.value + 1 %}
47
+ {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
48
+ <|vision_start|><|image_pad|><|vision_end|>
49
+ {%- elif content.type == 'video' or 'video' in content %}
50
+ {%- set video_count.value = video_count.value + 1 %}
51
+ {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
52
+ <|vision_start|><|video_pad|><|vision_end|>
53
+ {%- elif 'text' in content %}
54
+ {{- content.text }}
55
+ {%- endif %}
56
+ {%- endfor %}
57
+ {%- endif %}
58
+ {{- '<|im_end|>\n' }}
59
+ {%- elif message.role == "assistant" %}
60
+ {{- '<|im_start|>' + message.role + '\n' }}
61
+ {%- if message.content is string %}
62
+ {{- message.content }}
63
+ {%- else %}
64
+ {%- for content_item in message.content %}
65
+ {%- if 'text' in content_item %}
66
+ {{- content_item.text }}
67
+ {%- endif %}
68
+ {%- endfor %}
69
+ {%- endif %}
70
+ {%- if message.tool_calls %}
71
+ {%- for tool_call in message.tool_calls %}
72
+ {%- if (loop.first and message.content) or (not loop.first) %}
73
+ {{- '\n' }}
74
+ {%- endif %}
75
+ {%- if tool_call.function %}
76
+ {%- set tool_call = tool_call.function %}
77
+ {%- endif %}
78
+ {{- '<tool_call>\n{"name": "' }}
79
+ {{- tool_call.name }}
80
+ {{- '", "arguments": ' }}
81
+ {%- if tool_call.arguments is string %}
82
+ {{- tool_call.arguments }}
83
+ {%- else %}
84
+ {{- tool_call.arguments | tojson }}
85
+ {%- endif %}
86
+ {{- '}\n</tool_call>' }}
87
+ {%- endfor %}
88
+ {%- endif %}
89
+ {{- '<|im_end|>\n' }}
90
+ {%- elif message.role == "tool" %}
91
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
92
+ {{- '<|im_start|>user' }}
93
+ {%- endif %}
94
+ {{- '\n<tool_response>\n' }}
95
+ {%- if message.content is string %}
96
+ {{- message.content }}
97
+ {%- else %}
98
+ {%- for content in message.content %}
99
+ {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
100
+ {%- set image_count.value = image_count.value + 1 %}
101
+ {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
102
+ <|vision_start|><|image_pad|><|vision_end|>
103
+ {%- elif content.type == 'video' or 'video' in content %}
104
+ {%- set video_count.value = video_count.value + 1 %}
105
+ {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
106
+ <|vision_start|><|video_pad|><|vision_end|>
107
+ {%- elif 'text' in content %}
108
+ {{- content.text }}
109
+ {%- endif %}
110
+ {%- endfor %}
111
+ {%- endif %}
112
+ {{- '\n</tool_response>' }}
113
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
114
+ {{- '<|im_end|>\n' }}
115
+ {%- endif %}
116
+ {%- endif %}
117
+ {%- endfor %}
118
+ {%- if add_generation_prompt %}
119
+ {{- '<|im_start|>assistant\n' }}
120
+ {%- endif %}
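The Jinja template above follows the ChatML-style Qwen convention: optional system and <tools> sections, <|vision_start|><|image_pad|><|vision_end|> placeholders for image and video entries, tool-call and tool-response wrapping, and an optional trailing assistant header. A minimal sketch of rendering it, assuming a local checkout and that transformers picks up this chat_template.jinja with the tokenizer; the message payload is illustrative.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")

messages = [
    {"role": "system", "content": "You are a helpful robot policy."},
    {"role": "user", "content": [
        {"type": "image", "image": "frame_000.png"},   # rendered as <|vision_start|><|image_pad|><|vision_end|>
        {"type": "text", "text": "Pick up the red block."},
    ]},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends '<|im_start|>assistant\n' per the template's final branch
)
print(prompt)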
config.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "architectures": [
+     "Qwen3VLForConditionalGeneration"
+   ],
+   "dtype": "bfloat16",
+   "eos_token_id": 151645,
+   "image_token_id": 151655,
+   "model_type": "qwen3_vl",
+   "pad_token_id": 151643,
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "bos_token_id": 151643,
+     "dtype": "bfloat16",
+     "eos_token_id": 151645,
+     "head_dim": 128,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "initializer_range": 0.02,
+     "intermediate_size": 12288,
+     "max_position_embeddings": 262144,
+     "model_type": "qwen3_vl_text",
+     "num_attention_heads": 32,
+     "num_hidden_layers": 36,
+     "num_key_value_heads": 8,
+     "rms_norm_eps": 1e-06,
+     "rope_parameters": {
+       "mrope_interleaved": true,
+       "mrope_section": [
+         24,
+         20,
+         20
+       ],
+       "rope_theta": 5000000,
+       "rope_type": "default"
+     },
+     "rope_theta": 5000000,
+     "use_cache": true,
+     "vocab_size": 153720
+   },
+   "tie_word_embeddings": false,
+   "transformers_version": "5.0.0.dev0",
+   "use_cache": false,
+   "video_token_id": 151656,
+   "vision_config": {
+     "deepstack_visual_indexes": [
+       8,
+       16,
+       24
+     ],
+     "depth": 27,
+     "dtype": "bfloat16",
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "in_channels": 3,
+     "initializer_range": 0.02,
+     "intermediate_size": 4304,
+     "model_type": "qwen3_vl",
+     "num_heads": 16,
+     "num_position_embeddings": 2304,
+     "out_hidden_size": 4096,
+     "patch_size": 16,
+     "spatial_merge_size": 2,
+     "temporal_patch_size": 2
+   },
+   "vision_end_token_id": 151653,
+   "vision_start_token_id": 151652
+ }
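config.json above wires a 36-layer, 4096-wide text decoder (vocab_size 153720, large enough to cover the added action tokens) to a 27-block vision tower with 16-pixel patches and 2x2 spatial merging. A minimal sketch of inspecting it and estimating how many <|image_pad|> positions one image occupies after merging, assuming a transformers build that registers the qwen3_vl architecture; the patch-count formula follows the usual Qwen-VL convention and is an assumption here, not something stated in the file.

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(".")
vis = cfg.vision_config

def image_pad_tokens(height, width):
    # patch_size x patch_size pixel patches, merged in spatial_merge_size^2 groups
    patches = (height // vis.patch_size) * (width // vis.patch_size)
    return patches // (vis.spatial_merge_size ** 2)

print(cfg.text_config.num_hidden_layers, cfg.text_config.hidden_size)  # 36 4096
print(image_pad_tokens(448, 448))  # 28 * 28 patches -> 196 merged tokens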
contextvla.py ADDED
@@ -0,0 +1,126 @@
+ import os
+ import torch.nn as nn
+ import torch
+ import torch.distributed as dist
+
+ local_rank = int(os.getenv("LOCAL_RANK", "0"))
+ world_size = torch.cuda.device_count()
+
+ rank = local_rank
+
+ class LayerWrapper(nn.Module):
+     def __init__(
+         self,
+         layer,
+         layer_idx,
+         internal_projection=4,
+         img_pattern=[151652],
+         motion_token=0
+     ):
+         super().__init__()
+         self.layer = layer
+         self.layer_idx = layer_idx
+         self.internal_projection = internal_projection
+         self.motion_token = motion_token
+         self.img_pattern = img_pattern
+         assert motion_token == 1
+
+     def get_removing_indices(self, hidden_states, input_ids):
+         pat_len = len(self.img_pattern)
+
+         windows = input_ids.unfold(dimension=1, size=pat_len, step=1)
+         pattern_tensor = torch.tensor(self.img_pattern, device=hidden_states.device).view(1, 1, -1)
+         matches = (windows == pattern_tensor).all(dim=-1)
+
+         match_lists = [torch.nonzero(matches[b], as_tuple=False).squeeze(-1) for b in range(hidden_states.shape[0])]
+         begin_idx = torch.tensor([m[0] for m in match_lists], device=hidden_states.device).unsqueeze(1)
+         end_idx = torch.tensor([m[-1] for m in match_lists], device=hidden_states.device).unsqueeze(1)
+
+         return begin_idx, end_idx
+
+     def left_pad_emb_list(self, emb_list):
+         rev = [e.flip(0) for e in emb_list]
+         padded_rev = torch.nn.utils.rnn.pad_sequence(rev, batch_first=True, padding_value=0)
+         return padded_rev.flip(1)
+
+     def forward(self, hidden_states, input_ids, *args, **kwargs):
+         bsz, seq_len, dim = hidden_states.shape
+
+         is_incremental = (
+             "cache_position" in kwargs
+             and kwargs["cache_position"] is not None
+             and seq_len == 1
+         )
+         if self.layer_idx == self.internal_projection and not is_incremental:
+             device = hidden_states.device
+
+             token_indices = torch.arange(seq_len, device=device).view(1, -1).expand(bsz, -1)
+             begin_idx, end_idx = self.get_removing_indices(hidden_states, input_ids)
+
+             compress_mask = (end_idx > begin_idx).reshape(-1)
+
+             keep_mask_front = token_indices < begin_idx
+             keep_mask_back = token_indices >= end_idx
+             drop_mask = ~(keep_mask_front | keep_mask_back)
+
+             motion_token = (
+                 (hidden_states * drop_mask.unsqueeze(-1)).sum(dim=1)
+                 / drop_mask.sum(dim=1, keepdim=True).clamp(min=1)
+             ).reshape(bsz, self.motion_token, -1)
+
+             hidden_states = [
+                 torch.cat([
+                     hidden_states[b][keep_mask_front[b]],
+                     motion_token[b] if compress_mask[b] else torch.tensor([], device=hidden_states.device, dtype=hidden_states.dtype),
+                     hidden_states[b][keep_mask_back[b]]
+                 ], dim=0) for b in range(bsz)
+             ]
+
+             hidden_states = self.left_pad_emb_list(hidden_states)
+
+             if 'attention_mask' in kwargs and kwargs['attention_mask'] is not None:
+                 att_list = [
+                     torch.cat([
+                         kwargs["attention_mask"][b][keep_mask_front[b]],
+                         torch.ones(1, device=kwargs["attention_mask"].device, dtype=kwargs["attention_mask"].dtype) if compress_mask[b] else torch.tensor([], device=kwargs["attention_mask"].device, dtype=kwargs["attention_mask"].dtype),
+                         kwargs["attention_mask"][b][keep_mask_back[b]],
+                     ]) for b in range(bsz)
+                 ]
+                 kwargs["attention_mask"] = self.left_pad_emb_list(att_list)
+
+             if 'position_ids' in kwargs.keys() and kwargs['position_ids'] is not None:
+                 pos_list = [
+                     torch.cat([
+                         kwargs["position_ids"][b][keep_mask_front[b]],
+                         kwargs["position_ids"][b][begin_idx[b]:begin_idx[b]+1] if compress_mask[b] else torch.tensor([], device=kwargs["position_ids"].device, dtype=kwargs["position_ids"].dtype),
+                         kwargs["position_ids"][b][keep_mask_back[b]],
+                     ]) for b in range(bsz)
+                 ]
+                 kwargs["position_ids"] = self.left_pad_emb_list(pos_list)
+
+             if 'position_embeddings' in kwargs.keys() and kwargs['position_embeddings'] is not None:
+                 emb_x_list = [
+                     torch.cat([
+                         kwargs["position_embeddings"][0][b][keep_mask_front[b]],
+                         kwargs["position_embeddings"][0][b][begin_idx[b]:begin_idx[b]+1] if compress_mask[b] else torch.tensor([], device=kwargs["position_embeddings"][0].device, dtype=kwargs["position_embeddings"][0].dtype),
+                         kwargs["position_embeddings"][0][b][keep_mask_back[b]],
+                     ], dim=0) for b in range(bsz)
+                 ]
+
+                 emb_y_list = [
+                     torch.cat([
+                         kwargs["position_embeddings"][1][b][keep_mask_front[b]],
+                         kwargs["position_embeddings"][1][b][begin_idx[b]:begin_idx[b]+1] if compress_mask[b] else torch.tensor([], device=kwargs["position_embeddings"][0].device, dtype=kwargs["position_embeddings"][0].dtype),
+                         kwargs["position_embeddings"][1][b][keep_mask_back[b]],
+                     ], dim=0) for b in range(bsz)
+                 ]
+
+                 emb_x_padded = self.left_pad_emb_list(emb_x_list)
+                 emb_y_padded = self.left_pad_emb_list(emb_y_list)
+                 kwargs["position_embeddings"] = (emb_x_padded, emb_y_padded)
+
+             if "cache_position" in kwargs and kwargs["cache_position"] is not None:
+                 kwargs["cache_position"] = kwargs["cache_position"][: hidden_states.shape[1]]
+
+         return self.layer(hidden_states, *args, **kwargs), kwargs
+
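contextvla.py above defines LayerWrapper, which at one chosen decoder layer (internal_projection) mean-pools the hidden states spanning from the first to the last match of img_pattern (151652, the vision_start_token_id from config.json) into a single motion embedding, then rebuilds the sequence, attention mask, position ids and rotary embeddings around it with left padding. The weight_map below stores keys such as model.language_model.layers.0.layer.input_layernorm.weight, whose extra ".layer." segment is consistent with every decoder layer being wrapped this way. A minimal sketch of applying the wrapper, assuming the usual Qwen3-VL module path and that the surrounding forward pass is adapted to pass input_ids and unpack the (output, kwargs) tuple; the chosen values are illustrative.

import torch.nn as nn
from contextvla import LayerWrapper

def wrap_decoder_layers(model, internal_projection=4, img_pattern=(151652,)):
    # Assumed module path for Qwen3VLForConditionalGeneration; adjust if it differs.
    layers = model.model.language_model.layers
    model.model.language_model.layers = nn.ModuleList([
        LayerWrapper(
            layer,
            layer_idx=i,
            internal_projection=internal_projection,
            img_pattern=list(img_pattern),
            motion_token=1,  # LayerWrapper asserts motion_token == 1
        )
        for i, layer in enumerate(layers)
    ])
    # Note: the wrapped forward expects input_ids as its second argument and returns
    # (layer_output, kwargs), so the model's decoder loop must be patched accordingly.
    return model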
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "5.0.0.dev0"
+ }
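generation_config.json above fixes the decoding defaults (sampling with temperature 0.7, top_p 0.8, top_k 20; note that 151645 is listed twice under eos_token_id). A minimal sketch of loading and inspecting it, assuming a local checkout; the commented generate call is illustrative and assumes a loaded model and prepared inputs.

from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained(".")
print(gen_cfg.do_sample, gen_cfg.temperature, gen_cfg.top_p, gen_cfg.top_k)  # True 0.7 0.8 20
print(gen_cfg.eos_token_id)  # [151645, 151645, 151643]

# outputs = model.generate(**inputs, generation_config=gen_cfg, max_new_tokens=256)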
latest ADDED
@@ -0,0 +1 @@
+ global_step70000
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f2607e81681f17af60e21c27c8ece74d54fdc00acdf306165af4a5135318d6a
+ size 4912008096
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:573b5c8e794a57e60078ea152ac8b553221636b0a0b0e6707e0c3ffb23cfe219
+ size 4915963312
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c58179bb7001dfd964833a6345795036c12d74f0ca8685554ff0bcc365c70264
+ size 4983071440
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5f906abc28a935826db5e7ed9e5b1bededb367fb7fee1c832ff8cbf70401012
+ size 2752528080
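The four shards above are stored as Git LFS pointers, so only the sha256 oid and the byte size live in the repository. A minimal sketch of summing those sizes; the roughly 94 kB gap versus the total_size recorded in model.safetensors.index.json below is most plausibly the per-shard safetensors headers, which is an interpretation on my part, not something the files state.

shard_sizes = {
    "model-00001-of-00004.safetensors": 4_912_008_096,
    "model-00002-of-00004.safetensors": 4_915_963_312,
    "model-00003-of-00004.safetensors": 4_983_071_440,
    "model-00004-of-00004.safetensors": 2_752_528_080,
}
total = sum(shard_sizes.values())
print(total)                       # 17563570928 bytes on disk
print(round(total / 1024**3, 2))   # ~16.36 GiB
print(total - 17_563_476_448)      # 94480 bytes above the index's total_size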
model.safetensors.index.json ADDED
@@ -0,0 +1,758 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 770288,
4
+ "total_size": 17563476448
5
+ },
6
+ "weight_map": {
7
+ "lm_head.weight": "model-00004-of-00004.safetensors",
8
+ "model.language_model.embed_tokens.weight": "model-00001-of-00004.safetensors",
9
+ "model.language_model.layers.0.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
10
+ "model.language_model.layers.0.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.language_model.layers.0.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.language_model.layers.0.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
13
+ "model.language_model.layers.0.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
14
+ "model.language_model.layers.0.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
15
+ "model.language_model.layers.0.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.language_model.layers.0.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
17
+ "model.language_model.layers.0.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
18
+ "model.language_model.layers.0.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
19
+ "model.language_model.layers.0.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.language_model.layers.1.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "model.language_model.layers.1.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.language_model.layers.1.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.language_model.layers.1.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.language_model.layers.1.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
25
+ "model.language_model.layers.1.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
26
+ "model.language_model.layers.1.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.language_model.layers.1.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.language_model.layers.1.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
29
+ "model.language_model.layers.1.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
30
+ "model.language_model.layers.1.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
31
+ "model.language_model.layers.10.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
32
+ "model.language_model.layers.10.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
33
+ "model.language_model.layers.10.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.language_model.layers.10.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.language_model.layers.10.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
36
+ "model.language_model.layers.10.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
37
+ "model.language_model.layers.10.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
38
+ "model.language_model.layers.10.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.language_model.layers.10.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
40
+ "model.language_model.layers.10.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
41
+ "model.language_model.layers.10.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.language_model.layers.11.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
43
+ "model.language_model.layers.11.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.language_model.layers.11.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
45
+ "model.language_model.layers.11.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.language_model.layers.11.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
47
+ "model.language_model.layers.11.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
48
+ "model.language_model.layers.11.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
49
+ "model.language_model.layers.11.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
50
+ "model.language_model.layers.11.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
51
+ "model.language_model.layers.11.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.language_model.layers.11.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
53
+ "model.language_model.layers.12.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
54
+ "model.language_model.layers.12.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
55
+ "model.language_model.layers.12.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.language_model.layers.12.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
57
+ "model.language_model.layers.12.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
58
+ "model.language_model.layers.12.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
59
+ "model.language_model.layers.12.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.language_model.layers.12.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
61
+ "model.language_model.layers.12.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
62
+ "model.language_model.layers.12.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.language_model.layers.12.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.language_model.layers.13.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
65
+ "model.language_model.layers.13.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.language_model.layers.13.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
67
+ "model.language_model.layers.13.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.language_model.layers.13.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
69
+ "model.language_model.layers.13.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
70
+ "model.language_model.layers.13.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.language_model.layers.13.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.language_model.layers.13.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
73
+ "model.language_model.layers.13.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
74
+ "model.language_model.layers.13.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.language_model.layers.14.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
76
+ "model.language_model.layers.14.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
77
+ "model.language_model.layers.14.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.language_model.layers.14.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
79
+ "model.language_model.layers.14.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
80
+ "model.language_model.layers.14.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
81
+ "model.language_model.layers.14.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.language_model.layers.14.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.language_model.layers.14.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
84
+ "model.language_model.layers.14.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
85
+ "model.language_model.layers.14.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
86
+ "model.language_model.layers.15.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
87
+ "model.language_model.layers.15.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.language_model.layers.15.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
89
+ "model.language_model.layers.15.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
90
+ "model.language_model.layers.15.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
91
+ "model.language_model.layers.15.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
92
+ "model.language_model.layers.15.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
93
+ "model.language_model.layers.15.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
94
+ "model.language_model.layers.15.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
95
+ "model.language_model.layers.15.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.language_model.layers.15.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
97
+ "model.language_model.layers.16.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
98
+ "model.language_model.layers.16.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.language_model.layers.16.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
100
+ "model.language_model.layers.16.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
101
+ "model.language_model.layers.16.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
102
+ "model.language_model.layers.16.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
103
+ "model.language_model.layers.16.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
104
+ "model.language_model.layers.16.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
105
+ "model.language_model.layers.16.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
106
+ "model.language_model.layers.16.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.language_model.layers.16.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
108
+ "model.language_model.layers.17.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
109
+ "model.language_model.layers.17.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
110
+ "model.language_model.layers.17.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
111
+ "model.language_model.layers.17.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
112
+ "model.language_model.layers.17.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
113
+ "model.language_model.layers.17.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
114
+ "model.language_model.layers.17.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
115
+ "model.language_model.layers.17.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.language_model.layers.17.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
117
+ "model.language_model.layers.17.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
118
+ "model.language_model.layers.17.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
119
+ "model.language_model.layers.18.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
120
+ "model.language_model.layers.18.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
121
+ "model.language_model.layers.18.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
122
+ "model.language_model.layers.18.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
123
+ "model.language_model.layers.18.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
124
+ "model.language_model.layers.18.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
125
+ "model.language_model.layers.18.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
126
+ "model.language_model.layers.18.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
127
+ "model.language_model.layers.18.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
128
+ "model.language_model.layers.18.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
129
+ "model.language_model.layers.18.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
130
+ "model.language_model.layers.19.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
131
+ "model.language_model.layers.19.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
132
+ "model.language_model.layers.19.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
133
+ "model.language_model.layers.19.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
134
+ "model.language_model.layers.19.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
135
+ "model.language_model.layers.19.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
136
+ "model.language_model.layers.19.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
137
+ "model.language_model.layers.19.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
138
+ "model.language_model.layers.19.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
139
+ "model.language_model.layers.19.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
140
+ "model.language_model.layers.19.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
141
+ "model.language_model.layers.2.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
142
+ "model.language_model.layers.2.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
143
+ "model.language_model.layers.2.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
144
+ "model.language_model.layers.2.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
145
+ "model.language_model.layers.2.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
146
+ "model.language_model.layers.2.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
147
+ "model.language_model.layers.2.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
148
+ "model.language_model.layers.2.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
149
+ "model.language_model.layers.2.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
150
+ "model.language_model.layers.2.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
151
+ "model.language_model.layers.2.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
152
+ "model.language_model.layers.20.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
153
+ "model.language_model.layers.20.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
154
+ "model.language_model.layers.20.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
155
+ "model.language_model.layers.20.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
156
+ "model.language_model.layers.20.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
157
+ "model.language_model.layers.20.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
158
+ "model.language_model.layers.20.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
159
+ "model.language_model.layers.20.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
160
+ "model.language_model.layers.20.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
161
+ "model.language_model.layers.20.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
162
+ "model.language_model.layers.20.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
163
+ "model.language_model.layers.21.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
164
+ "model.language_model.layers.21.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
165
+ "model.language_model.layers.21.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.language_model.layers.21.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
167
+ "model.language_model.layers.21.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
168
+ "model.language_model.layers.21.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
169
+ "model.language_model.layers.21.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
170
+ "model.language_model.layers.21.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
171
+ "model.language_model.layers.21.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
172
+ "model.language_model.layers.21.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
173
+ "model.language_model.layers.21.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
174
+ "model.language_model.layers.22.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
175
+ "model.language_model.layers.22.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
176
+ "model.language_model.layers.22.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
177
+ "model.language_model.layers.22.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
178
+ "model.language_model.layers.22.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
179
+ "model.language_model.layers.22.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
180
+ "model.language_model.layers.22.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
181
+ "model.language_model.layers.22.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
182
+ "model.language_model.layers.22.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
183
+ "model.language_model.layers.22.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
184
+ "model.language_model.layers.22.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
185
+ "model.language_model.layers.23.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
186
+ "model.language_model.layers.23.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
187
+ "model.language_model.layers.23.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.language_model.layers.23.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
189
+ "model.language_model.layers.23.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
190
+ "model.language_model.layers.23.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
191
+ "model.language_model.layers.23.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
192
+ "model.language_model.layers.23.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
193
+ "model.language_model.layers.23.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
194
+ "model.language_model.layers.23.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
195
+ "model.language_model.layers.23.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
196
+ "model.language_model.layers.24.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
197
+ "model.language_model.layers.24.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
198
+ "model.language_model.layers.24.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
199
+ "model.language_model.layers.24.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
200
+ "model.language_model.layers.24.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
201
+ "model.language_model.layers.24.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
202
+ "model.language_model.layers.24.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
203
+ "model.language_model.layers.24.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
204
+ "model.language_model.layers.24.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
205
+ "model.language_model.layers.24.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
206
+ "model.language_model.layers.24.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
207
+ "model.language_model.layers.25.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
208
+ "model.language_model.layers.25.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
209
+ "model.language_model.layers.25.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
210
+ "model.language_model.layers.25.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
211
+ "model.language_model.layers.25.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
212
+ "model.language_model.layers.25.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
213
+ "model.language_model.layers.25.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
214
+ "model.language_model.layers.25.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
215
+ "model.language_model.layers.25.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
216
+ "model.language_model.layers.25.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
217
+ "model.language_model.layers.25.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
218
+ "model.language_model.layers.26.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
219
+ "model.language_model.layers.26.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
220
+ "model.language_model.layers.26.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
221
+ "model.language_model.layers.26.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
222
+ "model.language_model.layers.26.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
223
+ "model.language_model.layers.26.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
224
+ "model.language_model.layers.26.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
225
+ "model.language_model.layers.26.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
226
+ "model.language_model.layers.26.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
227
+ "model.language_model.layers.26.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
228
+ "model.language_model.layers.26.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
229
+ "model.language_model.layers.27.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
230
+ "model.language_model.layers.27.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
231
+ "model.language_model.layers.27.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
232
+ "model.language_model.layers.27.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
233
+ "model.language_model.layers.27.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
234
+ "model.language_model.layers.27.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
235
+ "model.language_model.layers.27.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
236
+ "model.language_model.layers.27.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
237
+ "model.language_model.layers.27.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
238
+ "model.language_model.layers.27.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
239
+ "model.language_model.layers.27.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
240
+ "model.language_model.layers.28.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
241
+ "model.language_model.layers.28.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
242
+ "model.language_model.layers.28.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
243
+ "model.language_model.layers.28.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
244
+ "model.language_model.layers.28.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
245
+ "model.language_model.layers.28.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
246
+ "model.language_model.layers.28.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
247
+ "model.language_model.layers.28.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
248
+ "model.language_model.layers.28.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
249
+ "model.language_model.layers.28.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
250
+ "model.language_model.layers.28.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
251
+ "model.language_model.layers.29.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
252
+ "model.language_model.layers.29.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
253
+ "model.language_model.layers.29.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
254
+ "model.language_model.layers.29.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
255
+ "model.language_model.layers.29.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
256
+ "model.language_model.layers.29.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
257
+ "model.language_model.layers.29.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
258
+ "model.language_model.layers.29.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
259
+ "model.language_model.layers.29.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
260
+ "model.language_model.layers.29.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
261
+ "model.language_model.layers.29.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
262
+ "model.language_model.layers.3.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
263
+ "model.language_model.layers.3.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
264
+ "model.language_model.layers.3.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
265
+ "model.language_model.layers.3.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
266
+ "model.language_model.layers.3.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
267
+ "model.language_model.layers.3.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
268
+ "model.language_model.layers.3.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
269
+ "model.language_model.layers.3.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
270
+ "model.language_model.layers.3.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
271
+ "model.language_model.layers.3.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
272
+ "model.language_model.layers.3.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
273
+ "model.language_model.layers.30.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
274
+ "model.language_model.layers.30.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
275
+ "model.language_model.layers.30.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
276
+ "model.language_model.layers.30.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
277
+ "model.language_model.layers.30.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
278
+ "model.language_model.layers.30.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
279
+ "model.language_model.layers.30.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
280
+ "model.language_model.layers.30.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
281
+ "model.language_model.layers.30.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
282
+ "model.language_model.layers.30.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
283
+ "model.language_model.layers.30.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
284
+ "model.language_model.layers.31.layer.input_layernorm.weight": "model-00003-of-00004.safetensors",
285
+ "model.language_model.layers.31.layer.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
286
+ "model.language_model.layers.31.layer.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
287
+ "model.language_model.layers.31.layer.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
288
+ "model.language_model.layers.31.layer.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
289
+ "model.language_model.layers.31.layer.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
290
+ "model.language_model.layers.31.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
291
+ "model.language_model.layers.31.layer.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
292
+ "model.language_model.layers.31.layer.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
293
+ "model.language_model.layers.31.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
294
+ "model.language_model.layers.31.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
295
+ "model.language_model.layers.32.layer.input_layernorm.weight": "model-00004-of-00004.safetensors",
296
+ "model.language_model.layers.32.layer.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
297
+ "model.language_model.layers.32.layer.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
298
+ "model.language_model.layers.32.layer.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
299
+ "model.language_model.layers.32.layer.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
300
+ "model.language_model.layers.32.layer.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
301
+ "model.language_model.layers.32.layer.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
302
+ "model.language_model.layers.32.layer.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
303
+ "model.language_model.layers.32.layer.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
304
+ "model.language_model.layers.32.layer.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
305
+ "model.language_model.layers.32.layer.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
306
+ "model.language_model.layers.33.layer.input_layernorm.weight": "model-00004-of-00004.safetensors",
307
+ "model.language_model.layers.33.layer.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
308
+ "model.language_model.layers.33.layer.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
309
+ "model.language_model.layers.33.layer.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
310
+ "model.language_model.layers.33.layer.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
311
+ "model.language_model.layers.33.layer.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
312
+ "model.language_model.layers.33.layer.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
313
+ "model.language_model.layers.33.layer.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
314
+ "model.language_model.layers.33.layer.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
315
+ "model.language_model.layers.33.layer.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
316
+ "model.language_model.layers.33.layer.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
317
+ "model.language_model.layers.34.layer.input_layernorm.weight": "model-00004-of-00004.safetensors",
318
+ "model.language_model.layers.34.layer.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
319
+ "model.language_model.layers.34.layer.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
320
+ "model.language_model.layers.34.layer.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
321
+ "model.language_model.layers.34.layer.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
322
+ "model.language_model.layers.34.layer.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
323
+ "model.language_model.layers.34.layer.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
324
+ "model.language_model.layers.34.layer.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
325
+ "model.language_model.layers.34.layer.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
326
+ "model.language_model.layers.34.layer.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
327
+ "model.language_model.layers.34.layer.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
328
+ "model.language_model.layers.35.layer.input_layernorm.weight": "model-00004-of-00004.safetensors",
329
+ "model.language_model.layers.35.layer.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
330
+ "model.language_model.layers.35.layer.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
331
+ "model.language_model.layers.35.layer.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
332
+ "model.language_model.layers.35.layer.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
333
+ "model.language_model.layers.35.layer.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
334
+ "model.language_model.layers.35.layer.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
335
+ "model.language_model.layers.35.layer.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
336
+ "model.language_model.layers.35.layer.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
337
+ "model.language_model.layers.35.layer.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
338
+ "model.language_model.layers.35.layer.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
339
+ "model.language_model.layers.4.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
340
+ "model.language_model.layers.4.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
341
+ "model.language_model.layers.4.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
342
+ "model.language_model.layers.4.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
343
+ "model.language_model.layers.4.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
344
+ "model.language_model.layers.4.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
345
+ "model.language_model.layers.4.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
346
+ "model.language_model.layers.4.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
347
+ "model.language_model.layers.4.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
348
+ "model.language_model.layers.4.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
349
+ "model.language_model.layers.4.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
350
+ "model.language_model.layers.5.layer.input_layernorm.weight": "model-00001-of-00004.safetensors",
351
+ "model.language_model.layers.5.layer.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
352
+ "model.language_model.layers.5.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
353
+ "model.language_model.layers.5.layer.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
354
+ "model.language_model.layers.5.layer.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
355
+ "model.language_model.layers.5.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
356
+ "model.language_model.layers.5.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
357
+ "model.language_model.layers.5.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
358
+ "model.language_model.layers.5.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
359
+ "model.language_model.layers.5.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
360
+ "model.language_model.layers.5.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
361
+ "model.language_model.layers.6.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
362
+ "model.language_model.layers.6.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
363
+ "model.language_model.layers.6.layer.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
364
+ "model.language_model.layers.6.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
365
+ "model.language_model.layers.6.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
366
+ "model.language_model.layers.6.layer.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
367
+ "model.language_model.layers.6.layer.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
368
+ "model.language_model.layers.6.layer.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
369
+ "model.language_model.layers.6.layer.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
370
+ "model.language_model.layers.6.layer.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
371
+ "model.language_model.layers.6.layer.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
372
+ "model.language_model.layers.7.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
373
+ "model.language_model.layers.7.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
374
+ "model.language_model.layers.7.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
375
+ "model.language_model.layers.7.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
376
+ "model.language_model.layers.7.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
377
+ "model.language_model.layers.7.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
378
+ "model.language_model.layers.7.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
379
+ "model.language_model.layers.7.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
380
+ "model.language_model.layers.7.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
381
+ "model.language_model.layers.7.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
382
+ "model.language_model.layers.7.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
383
+ "model.language_model.layers.8.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
384
+ "model.language_model.layers.8.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
385
+ "model.language_model.layers.8.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
386
+ "model.language_model.layers.8.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
387
+ "model.language_model.layers.8.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
388
+ "model.language_model.layers.8.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
389
+ "model.language_model.layers.8.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
390
+ "model.language_model.layers.8.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
391
+ "model.language_model.layers.8.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
392
+ "model.language_model.layers.8.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
393
+ "model.language_model.layers.8.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
394
+ "model.language_model.layers.9.layer.input_layernorm.weight": "model-00002-of-00004.safetensors",
395
+ "model.language_model.layers.9.layer.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
396
+ "model.language_model.layers.9.layer.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
397
+ "model.language_model.layers.9.layer.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
398
+ "model.language_model.layers.9.layer.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
399
+ "model.language_model.layers.9.layer.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
400
+ "model.language_model.layers.9.layer.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
401
+ "model.language_model.layers.9.layer.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
402
+ "model.language_model.layers.9.layer.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
403
+ "model.language_model.layers.9.layer.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
404
+ "model.language_model.layers.9.layer.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
405
+ "model.language_model.norm.weight": "model-00004-of-00004.safetensors",
406
+ "model.visual.blocks.0.attn.proj.bias": "model-00001-of-00004.safetensors",
407
+ "model.visual.blocks.0.attn.proj.weight": "model-00001-of-00004.safetensors",
408
+ "model.visual.blocks.0.attn.qkv.bias": "model-00001-of-00004.safetensors",
409
+ "model.visual.blocks.0.attn.qkv.weight": "model-00001-of-00004.safetensors",
410
+ "model.visual.blocks.0.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
411
+ "model.visual.blocks.0.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
412
+ "model.visual.blocks.0.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
413
+ "model.visual.blocks.0.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
414
+ "model.visual.blocks.0.norm1.bias": "model-00001-of-00004.safetensors",
415
+ "model.visual.blocks.0.norm1.weight": "model-00001-of-00004.safetensors",
416
+ "model.visual.blocks.0.norm2.bias": "model-00001-of-00004.safetensors",
417
+ "model.visual.blocks.0.norm2.weight": "model-00001-of-00004.safetensors",
418
+ "model.visual.blocks.1.attn.proj.bias": "model-00001-of-00004.safetensors",
419
+ "model.visual.blocks.1.attn.proj.weight": "model-00001-of-00004.safetensors",
420
+ "model.visual.blocks.1.attn.qkv.bias": "model-00001-of-00004.safetensors",
421
+ "model.visual.blocks.1.attn.qkv.weight": "model-00001-of-00004.safetensors",
422
+ "model.visual.blocks.1.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
423
+ "model.visual.blocks.1.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
424
+ "model.visual.blocks.1.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
425
+ "model.visual.blocks.1.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
426
+ "model.visual.blocks.1.norm1.bias": "model-00001-of-00004.safetensors",
427
+ "model.visual.blocks.1.norm1.weight": "model-00001-of-00004.safetensors",
428
+ "model.visual.blocks.1.norm2.bias": "model-00001-of-00004.safetensors",
429
+ "model.visual.blocks.1.norm2.weight": "model-00001-of-00004.safetensors",
430
+ "model.visual.blocks.10.attn.proj.bias": "model-00001-of-00004.safetensors",
431
+ "model.visual.blocks.10.attn.proj.weight": "model-00001-of-00004.safetensors",
432
+ "model.visual.blocks.10.attn.qkv.bias": "model-00001-of-00004.safetensors",
433
+ "model.visual.blocks.10.attn.qkv.weight": "model-00001-of-00004.safetensors",
434
+ "model.visual.blocks.10.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
435
+ "model.visual.blocks.10.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
436
+ "model.visual.blocks.10.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
437
+ "model.visual.blocks.10.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
438
+ "model.visual.blocks.10.norm1.bias": "model-00001-of-00004.safetensors",
439
+ "model.visual.blocks.10.norm1.weight": "model-00001-of-00004.safetensors",
440
+ "model.visual.blocks.10.norm2.bias": "model-00001-of-00004.safetensors",
441
+ "model.visual.blocks.10.norm2.weight": "model-00001-of-00004.safetensors",
442
+ "model.visual.blocks.11.attn.proj.bias": "model-00001-of-00004.safetensors",
443
+ "model.visual.blocks.11.attn.proj.weight": "model-00001-of-00004.safetensors",
444
+ "model.visual.blocks.11.attn.qkv.bias": "model-00001-of-00004.safetensors",
445
+ "model.visual.blocks.11.attn.qkv.weight": "model-00001-of-00004.safetensors",
446
+ "model.visual.blocks.11.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
447
+ "model.visual.blocks.11.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
448
+ "model.visual.blocks.11.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
449
+ "model.visual.blocks.11.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
450
+ "model.visual.blocks.11.norm1.bias": "model-00001-of-00004.safetensors",
451
+ "model.visual.blocks.11.norm1.weight": "model-00001-of-00004.safetensors",
452
+ "model.visual.blocks.11.norm2.bias": "model-00001-of-00004.safetensors",
453
+ "model.visual.blocks.11.norm2.weight": "model-00001-of-00004.safetensors",
454
+ "model.visual.blocks.12.attn.proj.bias": "model-00001-of-00004.safetensors",
455
+ "model.visual.blocks.12.attn.proj.weight": "model-00001-of-00004.safetensors",
456
+ "model.visual.blocks.12.attn.qkv.bias": "model-00001-of-00004.safetensors",
457
+ "model.visual.blocks.12.attn.qkv.weight": "model-00001-of-00004.safetensors",
458
+ "model.visual.blocks.12.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
459
+ "model.visual.blocks.12.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
460
+ "model.visual.blocks.12.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
461
+ "model.visual.blocks.12.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
462
+ "model.visual.blocks.12.norm1.bias": "model-00001-of-00004.safetensors",
463
+ "model.visual.blocks.12.norm1.weight": "model-00001-of-00004.safetensors",
464
+ "model.visual.blocks.12.norm2.bias": "model-00001-of-00004.safetensors",
465
+ "model.visual.blocks.12.norm2.weight": "model-00001-of-00004.safetensors",
466
+ "model.visual.blocks.13.attn.proj.bias": "model-00001-of-00004.safetensors",
467
+ "model.visual.blocks.13.attn.proj.weight": "model-00001-of-00004.safetensors",
468
+ "model.visual.blocks.13.attn.qkv.bias": "model-00001-of-00004.safetensors",
469
+ "model.visual.blocks.13.attn.qkv.weight": "model-00001-of-00004.safetensors",
470
+ "model.visual.blocks.13.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
471
+ "model.visual.blocks.13.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
472
+ "model.visual.blocks.13.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
473
+ "model.visual.blocks.13.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
474
+ "model.visual.blocks.13.norm1.bias": "model-00001-of-00004.safetensors",
475
+ "model.visual.blocks.13.norm1.weight": "model-00001-of-00004.safetensors",
476
+ "model.visual.blocks.13.norm2.bias": "model-00001-of-00004.safetensors",
477
+ "model.visual.blocks.13.norm2.weight": "model-00001-of-00004.safetensors",
478
+ "model.visual.blocks.14.attn.proj.bias": "model-00001-of-00004.safetensors",
479
+ "model.visual.blocks.14.attn.proj.weight": "model-00001-of-00004.safetensors",
480
+ "model.visual.blocks.14.attn.qkv.bias": "model-00001-of-00004.safetensors",
481
+ "model.visual.blocks.14.attn.qkv.weight": "model-00001-of-00004.safetensors",
482
+ "model.visual.blocks.14.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
483
+ "model.visual.blocks.14.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
484
+ "model.visual.blocks.14.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
485
+ "model.visual.blocks.14.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
486
+ "model.visual.blocks.14.norm1.bias": "model-00001-of-00004.safetensors",
487
+ "model.visual.blocks.14.norm1.weight": "model-00001-of-00004.safetensors",
488
+ "model.visual.blocks.14.norm2.bias": "model-00001-of-00004.safetensors",
489
+ "model.visual.blocks.14.norm2.weight": "model-00001-of-00004.safetensors",
490
+ "model.visual.blocks.15.attn.proj.bias": "model-00001-of-00004.safetensors",
491
+ "model.visual.blocks.15.attn.proj.weight": "model-00001-of-00004.safetensors",
492
+ "model.visual.blocks.15.attn.qkv.bias": "model-00001-of-00004.safetensors",
493
+ "model.visual.blocks.15.attn.qkv.weight": "model-00001-of-00004.safetensors",
494
+ "model.visual.blocks.15.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
495
+ "model.visual.blocks.15.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
496
+ "model.visual.blocks.15.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
497
+ "model.visual.blocks.15.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
498
+ "model.visual.blocks.15.norm1.bias": "model-00001-of-00004.safetensors",
499
+ "model.visual.blocks.15.norm1.weight": "model-00001-of-00004.safetensors",
500
+ "model.visual.blocks.15.norm2.bias": "model-00001-of-00004.safetensors",
501
+ "model.visual.blocks.15.norm2.weight": "model-00001-of-00004.safetensors",
502
+ "model.visual.blocks.16.attn.proj.bias": "model-00001-of-00004.safetensors",
503
+ "model.visual.blocks.16.attn.proj.weight": "model-00001-of-00004.safetensors",
504
+ "model.visual.blocks.16.attn.qkv.bias": "model-00001-of-00004.safetensors",
505
+ "model.visual.blocks.16.attn.qkv.weight": "model-00001-of-00004.safetensors",
506
+ "model.visual.blocks.16.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
507
+ "model.visual.blocks.16.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
508
+ "model.visual.blocks.16.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
509
+ "model.visual.blocks.16.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
510
+ "model.visual.blocks.16.norm1.bias": "model-00001-of-00004.safetensors",
511
+ "model.visual.blocks.16.norm1.weight": "model-00001-of-00004.safetensors",
512
+ "model.visual.blocks.16.norm2.bias": "model-00001-of-00004.safetensors",
513
+ "model.visual.blocks.16.norm2.weight": "model-00001-of-00004.safetensors",
514
+ "model.visual.blocks.17.attn.proj.bias": "model-00001-of-00004.safetensors",
515
+ "model.visual.blocks.17.attn.proj.weight": "model-00001-of-00004.safetensors",
516
+ "model.visual.blocks.17.attn.qkv.bias": "model-00001-of-00004.safetensors",
517
+ "model.visual.blocks.17.attn.qkv.weight": "model-00001-of-00004.safetensors",
518
+ "model.visual.blocks.17.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
519
+ "model.visual.blocks.17.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
520
+ "model.visual.blocks.17.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
521
+ "model.visual.blocks.17.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
522
+ "model.visual.blocks.17.norm1.bias": "model-00001-of-00004.safetensors",
523
+ "model.visual.blocks.17.norm1.weight": "model-00001-of-00004.safetensors",
524
+ "model.visual.blocks.17.norm2.bias": "model-00001-of-00004.safetensors",
525
+ "model.visual.blocks.17.norm2.weight": "model-00001-of-00004.safetensors",
526
+ "model.visual.blocks.18.attn.proj.bias": "model-00001-of-00004.safetensors",
527
+ "model.visual.blocks.18.attn.proj.weight": "model-00001-of-00004.safetensors",
528
+ "model.visual.blocks.18.attn.qkv.bias": "model-00001-of-00004.safetensors",
529
+ "model.visual.blocks.18.attn.qkv.weight": "model-00001-of-00004.safetensors",
530
+ "model.visual.blocks.18.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
531
+ "model.visual.blocks.18.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
532
+ "model.visual.blocks.18.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
533
+ "model.visual.blocks.18.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
534
+ "model.visual.blocks.18.norm1.bias": "model-00001-of-00004.safetensors",
535
+ "model.visual.blocks.18.norm1.weight": "model-00001-of-00004.safetensors",
536
+ "model.visual.blocks.18.norm2.bias": "model-00001-of-00004.safetensors",
537
+ "model.visual.blocks.18.norm2.weight": "model-00001-of-00004.safetensors",
538
+ "model.visual.blocks.19.attn.proj.bias": "model-00001-of-00004.safetensors",
539
+ "model.visual.blocks.19.attn.proj.weight": "model-00001-of-00004.safetensors",
540
+ "model.visual.blocks.19.attn.qkv.bias": "model-00001-of-00004.safetensors",
541
+ "model.visual.blocks.19.attn.qkv.weight": "model-00001-of-00004.safetensors",
542
+ "model.visual.blocks.19.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
543
+ "model.visual.blocks.19.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
544
+ "model.visual.blocks.19.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
545
+ "model.visual.blocks.19.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
546
+ "model.visual.blocks.19.norm1.bias": "model-00001-of-00004.safetensors",
547
+ "model.visual.blocks.19.norm1.weight": "model-00001-of-00004.safetensors",
548
+ "model.visual.blocks.19.norm2.bias": "model-00001-of-00004.safetensors",
549
+ "model.visual.blocks.19.norm2.weight": "model-00001-of-00004.safetensors",
550
+ "model.visual.blocks.2.attn.proj.bias": "model-00001-of-00004.safetensors",
551
+ "model.visual.blocks.2.attn.proj.weight": "model-00001-of-00004.safetensors",
552
+ "model.visual.blocks.2.attn.qkv.bias": "model-00001-of-00004.safetensors",
553
+ "model.visual.blocks.2.attn.qkv.weight": "model-00001-of-00004.safetensors",
554
+ "model.visual.blocks.2.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
555
+ "model.visual.blocks.2.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
556
+ "model.visual.blocks.2.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
557
+ "model.visual.blocks.2.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
558
+ "model.visual.blocks.2.norm1.bias": "model-00001-of-00004.safetensors",
559
+ "model.visual.blocks.2.norm1.weight": "model-00001-of-00004.safetensors",
560
+ "model.visual.blocks.2.norm2.bias": "model-00001-of-00004.safetensors",
561
+ "model.visual.blocks.2.norm2.weight": "model-00001-of-00004.safetensors",
562
+ "model.visual.blocks.20.attn.proj.bias": "model-00001-of-00004.safetensors",
563
+ "model.visual.blocks.20.attn.proj.weight": "model-00001-of-00004.safetensors",
564
+ "model.visual.blocks.20.attn.qkv.bias": "model-00001-of-00004.safetensors",
565
+ "model.visual.blocks.20.attn.qkv.weight": "model-00001-of-00004.safetensors",
566
+ "model.visual.blocks.20.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
567
+ "model.visual.blocks.20.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
568
+ "model.visual.blocks.20.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
569
+ "model.visual.blocks.20.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
570
+ "model.visual.blocks.20.norm1.bias": "model-00001-of-00004.safetensors",
571
+ "model.visual.blocks.20.norm1.weight": "model-00001-of-00004.safetensors",
572
+ "model.visual.blocks.20.norm2.bias": "model-00001-of-00004.safetensors",
573
+ "model.visual.blocks.20.norm2.weight": "model-00001-of-00004.safetensors",
574
+ "model.visual.blocks.21.attn.proj.bias": "model-00001-of-00004.safetensors",
575
+ "model.visual.blocks.21.attn.proj.weight": "model-00001-of-00004.safetensors",
576
+ "model.visual.blocks.21.attn.qkv.bias": "model-00001-of-00004.safetensors",
577
+ "model.visual.blocks.21.attn.qkv.weight": "model-00001-of-00004.safetensors",
578
+ "model.visual.blocks.21.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
579
+ "model.visual.blocks.21.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
580
+ "model.visual.blocks.21.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
581
+ "model.visual.blocks.21.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
582
+ "model.visual.blocks.21.norm1.bias": "model-00001-of-00004.safetensors",
583
+ "model.visual.blocks.21.norm1.weight": "model-00001-of-00004.safetensors",
584
+ "model.visual.blocks.21.norm2.bias": "model-00001-of-00004.safetensors",
585
+ "model.visual.blocks.21.norm2.weight": "model-00001-of-00004.safetensors",
586
+ "model.visual.blocks.22.attn.proj.bias": "model-00001-of-00004.safetensors",
587
+ "model.visual.blocks.22.attn.proj.weight": "model-00001-of-00004.safetensors",
588
+ "model.visual.blocks.22.attn.qkv.bias": "model-00001-of-00004.safetensors",
589
+ "model.visual.blocks.22.attn.qkv.weight": "model-00001-of-00004.safetensors",
590
+ "model.visual.blocks.22.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
591
+ "model.visual.blocks.22.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
592
+ "model.visual.blocks.22.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
593
+ "model.visual.blocks.22.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
594
+ "model.visual.blocks.22.norm1.bias": "model-00001-of-00004.safetensors",
595
+ "model.visual.blocks.22.norm1.weight": "model-00001-of-00004.safetensors",
596
+ "model.visual.blocks.22.norm2.bias": "model-00001-of-00004.safetensors",
597
+ "model.visual.blocks.22.norm2.weight": "model-00001-of-00004.safetensors",
598
+ "model.visual.blocks.23.attn.proj.bias": "model-00001-of-00004.safetensors",
599
+ "model.visual.blocks.23.attn.proj.weight": "model-00001-of-00004.safetensors",
600
+ "model.visual.blocks.23.attn.qkv.bias": "model-00001-of-00004.safetensors",
601
+ "model.visual.blocks.23.attn.qkv.weight": "model-00001-of-00004.safetensors",
602
+ "model.visual.blocks.23.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
603
+ "model.visual.blocks.23.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
604
+ "model.visual.blocks.23.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
605
+ "model.visual.blocks.23.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
606
+ "model.visual.blocks.23.norm1.bias": "model-00001-of-00004.safetensors",
607
+ "model.visual.blocks.23.norm1.weight": "model-00001-of-00004.safetensors",
608
+ "model.visual.blocks.23.norm2.bias": "model-00001-of-00004.safetensors",
609
+ "model.visual.blocks.23.norm2.weight": "model-00001-of-00004.safetensors",
610
+ "model.visual.blocks.24.attn.proj.bias": "model-00001-of-00004.safetensors",
611
+ "model.visual.blocks.24.attn.proj.weight": "model-00001-of-00004.safetensors",
612
+ "model.visual.blocks.24.attn.qkv.bias": "model-00001-of-00004.safetensors",
613
+ "model.visual.blocks.24.attn.qkv.weight": "model-00001-of-00004.safetensors",
614
+ "model.visual.blocks.24.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
615
+ "model.visual.blocks.24.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
616
+ "model.visual.blocks.24.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
617
+ "model.visual.blocks.24.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
618
+ "model.visual.blocks.24.norm1.bias": "model-00001-of-00004.safetensors",
619
+ "model.visual.blocks.24.norm1.weight": "model-00001-of-00004.safetensors",
620
+ "model.visual.blocks.24.norm2.bias": "model-00001-of-00004.safetensors",
621
+ "model.visual.blocks.24.norm2.weight": "model-00001-of-00004.safetensors",
622
+ "model.visual.blocks.25.attn.proj.bias": "model-00001-of-00004.safetensors",
623
+ "model.visual.blocks.25.attn.proj.weight": "model-00001-of-00004.safetensors",
624
+ "model.visual.blocks.25.attn.qkv.bias": "model-00001-of-00004.safetensors",
625
+ "model.visual.blocks.25.attn.qkv.weight": "model-00001-of-00004.safetensors",
626
+ "model.visual.blocks.25.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
627
+ "model.visual.blocks.25.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
628
+ "model.visual.blocks.25.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
629
+ "model.visual.blocks.25.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
630
+ "model.visual.blocks.25.norm1.bias": "model-00001-of-00004.safetensors",
631
+ "model.visual.blocks.25.norm1.weight": "model-00001-of-00004.safetensors",
632
+ "model.visual.blocks.25.norm2.bias": "model-00001-of-00004.safetensors",
633
+ "model.visual.blocks.25.norm2.weight": "model-00001-of-00004.safetensors",
634
+ "model.visual.blocks.26.attn.proj.bias": "model-00001-of-00004.safetensors",
635
+ "model.visual.blocks.26.attn.proj.weight": "model-00001-of-00004.safetensors",
636
+ "model.visual.blocks.26.attn.qkv.bias": "model-00001-of-00004.safetensors",
637
+ "model.visual.blocks.26.attn.qkv.weight": "model-00001-of-00004.safetensors",
638
+ "model.visual.blocks.26.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
639
+ "model.visual.blocks.26.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
640
+ "model.visual.blocks.26.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
641
+ "model.visual.blocks.26.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
642
+ "model.visual.blocks.26.norm1.bias": "model-00001-of-00004.safetensors",
643
+ "model.visual.blocks.26.norm1.weight": "model-00001-of-00004.safetensors",
644
+ "model.visual.blocks.26.norm2.bias": "model-00001-of-00004.safetensors",
645
+ "model.visual.blocks.26.norm2.weight": "model-00001-of-00004.safetensors",
646
+ "model.visual.blocks.3.attn.proj.bias": "model-00001-of-00004.safetensors",
647
+ "model.visual.blocks.3.attn.proj.weight": "model-00001-of-00004.safetensors",
648
+ "model.visual.blocks.3.attn.qkv.bias": "model-00001-of-00004.safetensors",
649
+ "model.visual.blocks.3.attn.qkv.weight": "model-00001-of-00004.safetensors",
650
+ "model.visual.blocks.3.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
651
+ "model.visual.blocks.3.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
652
+ "model.visual.blocks.3.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
653
+ "model.visual.blocks.3.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
654
+ "model.visual.blocks.3.norm1.bias": "model-00001-of-00004.safetensors",
655
+ "model.visual.blocks.3.norm1.weight": "model-00001-of-00004.safetensors",
656
+ "model.visual.blocks.3.norm2.bias": "model-00001-of-00004.safetensors",
657
+ "model.visual.blocks.3.norm2.weight": "model-00001-of-00004.safetensors",
658
+ "model.visual.blocks.4.attn.proj.bias": "model-00001-of-00004.safetensors",
659
+ "model.visual.blocks.4.attn.proj.weight": "model-00001-of-00004.safetensors",
660
+ "model.visual.blocks.4.attn.qkv.bias": "model-00001-of-00004.safetensors",
661
+ "model.visual.blocks.4.attn.qkv.weight": "model-00001-of-00004.safetensors",
662
+ "model.visual.blocks.4.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
663
+ "model.visual.blocks.4.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
664
+ "model.visual.blocks.4.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
665
+ "model.visual.blocks.4.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
666
+ "model.visual.blocks.4.norm1.bias": "model-00001-of-00004.safetensors",
667
+ "model.visual.blocks.4.norm1.weight": "model-00001-of-00004.safetensors",
668
+ "model.visual.blocks.4.norm2.bias": "model-00001-of-00004.safetensors",
669
+ "model.visual.blocks.4.norm2.weight": "model-00001-of-00004.safetensors",
670
+ "model.visual.blocks.5.attn.proj.bias": "model-00001-of-00004.safetensors",
671
+ "model.visual.blocks.5.attn.proj.weight": "model-00001-of-00004.safetensors",
672
+ "model.visual.blocks.5.attn.qkv.bias": "model-00001-of-00004.safetensors",
673
+ "model.visual.blocks.5.attn.qkv.weight": "model-00001-of-00004.safetensors",
674
+ "model.visual.blocks.5.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
675
+ "model.visual.blocks.5.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
676
+ "model.visual.blocks.5.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
677
+ "model.visual.blocks.5.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
678
+ "model.visual.blocks.5.norm1.bias": "model-00001-of-00004.safetensors",
679
+ "model.visual.blocks.5.norm1.weight": "model-00001-of-00004.safetensors",
680
+ "model.visual.blocks.5.norm2.bias": "model-00001-of-00004.safetensors",
681
+ "model.visual.blocks.5.norm2.weight": "model-00001-of-00004.safetensors",
682
+ "model.visual.blocks.6.attn.proj.bias": "model-00001-of-00004.safetensors",
683
+ "model.visual.blocks.6.attn.proj.weight": "model-00001-of-00004.safetensors",
684
+ "model.visual.blocks.6.attn.qkv.bias": "model-00001-of-00004.safetensors",
685
+ "model.visual.blocks.6.attn.qkv.weight": "model-00001-of-00004.safetensors",
686
+ "model.visual.blocks.6.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
687
+ "model.visual.blocks.6.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
688
+ "model.visual.blocks.6.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
689
+ "model.visual.blocks.6.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
690
+ "model.visual.blocks.6.norm1.bias": "model-00001-of-00004.safetensors",
691
+ "model.visual.blocks.6.norm1.weight": "model-00001-of-00004.safetensors",
692
+ "model.visual.blocks.6.norm2.bias": "model-00001-of-00004.safetensors",
693
+ "model.visual.blocks.6.norm2.weight": "model-00001-of-00004.safetensors",
694
+ "model.visual.blocks.7.attn.proj.bias": "model-00001-of-00004.safetensors",
695
+ "model.visual.blocks.7.attn.proj.weight": "model-00001-of-00004.safetensors",
696
+ "model.visual.blocks.7.attn.qkv.bias": "model-00001-of-00004.safetensors",
697
+ "model.visual.blocks.7.attn.qkv.weight": "model-00001-of-00004.safetensors",
698
+ "model.visual.blocks.7.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
699
+ "model.visual.blocks.7.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
700
+ "model.visual.blocks.7.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
701
+ "model.visual.blocks.7.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
702
+ "model.visual.blocks.7.norm1.bias": "model-00001-of-00004.safetensors",
703
+ "model.visual.blocks.7.norm1.weight": "model-00001-of-00004.safetensors",
704
+ "model.visual.blocks.7.norm2.bias": "model-00001-of-00004.safetensors",
705
+ "model.visual.blocks.7.norm2.weight": "model-00001-of-00004.safetensors",
706
+ "model.visual.blocks.8.attn.proj.bias": "model-00001-of-00004.safetensors",
707
+ "model.visual.blocks.8.attn.proj.weight": "model-00001-of-00004.safetensors",
708
+ "model.visual.blocks.8.attn.qkv.bias": "model-00001-of-00004.safetensors",
709
+ "model.visual.blocks.8.attn.qkv.weight": "model-00001-of-00004.safetensors",
710
+ "model.visual.blocks.8.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
711
+ "model.visual.blocks.8.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
712
+ "model.visual.blocks.8.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
713
+ "model.visual.blocks.8.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
714
+ "model.visual.blocks.8.norm1.bias": "model-00001-of-00004.safetensors",
715
+ "model.visual.blocks.8.norm1.weight": "model-00001-of-00004.safetensors",
716
+ "model.visual.blocks.8.norm2.bias": "model-00001-of-00004.safetensors",
717
+ "model.visual.blocks.8.norm2.weight": "model-00001-of-00004.safetensors",
718
+ "model.visual.blocks.9.attn.proj.bias": "model-00001-of-00004.safetensors",
719
+ "model.visual.blocks.9.attn.proj.weight": "model-00001-of-00004.safetensors",
720
+ "model.visual.blocks.9.attn.qkv.bias": "model-00001-of-00004.safetensors",
721
+ "model.visual.blocks.9.attn.qkv.weight": "model-00001-of-00004.safetensors",
722
+ "model.visual.blocks.9.mlp.linear_fc1.bias": "model-00001-of-00004.safetensors",
723
+ "model.visual.blocks.9.mlp.linear_fc1.weight": "model-00001-of-00004.safetensors",
724
+ "model.visual.blocks.9.mlp.linear_fc2.bias": "model-00001-of-00004.safetensors",
725
+ "model.visual.blocks.9.mlp.linear_fc2.weight": "model-00001-of-00004.safetensors",
726
+ "model.visual.blocks.9.norm1.bias": "model-00001-of-00004.safetensors",
727
+ "model.visual.blocks.9.norm1.weight": "model-00001-of-00004.safetensors",
728
+ "model.visual.blocks.9.norm2.bias": "model-00001-of-00004.safetensors",
729
+ "model.visual.blocks.9.norm2.weight": "model-00001-of-00004.safetensors",
730
+ "model.visual.deepstack_merger_list.0.linear_fc1.bias": "model-00001-of-00004.safetensors",
731
+ "model.visual.deepstack_merger_list.0.linear_fc1.weight": "model-00001-of-00004.safetensors",
732
+ "model.visual.deepstack_merger_list.0.linear_fc2.bias": "model-00001-of-00004.safetensors",
733
+ "model.visual.deepstack_merger_list.0.linear_fc2.weight": "model-00001-of-00004.safetensors",
734
+ "model.visual.deepstack_merger_list.0.norm.bias": "model-00001-of-00004.safetensors",
735
+ "model.visual.deepstack_merger_list.0.norm.weight": "model-00001-of-00004.safetensors",
736
+ "model.visual.deepstack_merger_list.1.linear_fc1.bias": "model-00001-of-00004.safetensors",
737
+ "model.visual.deepstack_merger_list.1.linear_fc1.weight": "model-00001-of-00004.safetensors",
738
+ "model.visual.deepstack_merger_list.1.linear_fc2.bias": "model-00001-of-00004.safetensors",
739
+ "model.visual.deepstack_merger_list.1.linear_fc2.weight": "model-00001-of-00004.safetensors",
740
+ "model.visual.deepstack_merger_list.1.norm.bias": "model-00001-of-00004.safetensors",
741
+ "model.visual.deepstack_merger_list.1.norm.weight": "model-00001-of-00004.safetensors",
742
+ "model.visual.deepstack_merger_list.2.linear_fc1.bias": "model-00001-of-00004.safetensors",
743
+ "model.visual.deepstack_merger_list.2.linear_fc1.weight": "model-00001-of-00004.safetensors",
744
+ "model.visual.deepstack_merger_list.2.linear_fc2.bias": "model-00001-of-00004.safetensors",
745
+ "model.visual.deepstack_merger_list.2.linear_fc2.weight": "model-00001-of-00004.safetensors",
746
+ "model.visual.deepstack_merger_list.2.norm.bias": "model-00001-of-00004.safetensors",
747
+ "model.visual.deepstack_merger_list.2.norm.weight": "model-00001-of-00004.safetensors",
748
+ "model.visual.merger.linear_fc1.bias": "model-00001-of-00004.safetensors",
749
+ "model.visual.merger.linear_fc1.weight": "model-00001-of-00004.safetensors",
750
+ "model.visual.merger.linear_fc2.bias": "model-00001-of-00004.safetensors",
751
+ "model.visual.merger.linear_fc2.weight": "model-00001-of-00004.safetensors",
752
+ "model.visual.merger.norm.bias": "model-00001-of-00004.safetensors",
753
+ "model.visual.merger.norm.weight": "model-00001-of-00004.safetensors",
754
+ "model.visual.patch_embed.proj.bias": "model-00001-of-00004.safetensors",
755
+ "model.visual.patch_embed.proj.weight": "model-00001-of-00004.safetensors",
756
+ "model.visual.pos_embed.weight": "model-00001-of-00004.safetensors"
757
+ }
758
+ }
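The index above maps every tensor name in the checkpoint to one of the four safetensors shards. As a minimal illustration (not part of the uploaded files), the mapping can be used to locate and lazily read a single tensor; the local directory path is a placeholder, and only the standard json / safetensors APIs are assumed:

    import json
    from safetensors import safe_open

    repo_dir = "path/to/local/checkpoint"  # placeholder: wherever this repository was downloaded

    # The index holds a "weight_map" from tensor name to shard file name.
    with open(f"{repo_dir}/model.safetensors.index.json") as f:
        index = json.load(f)

    name = "model.language_model.layers.26.layer.self_attn.q_proj.weight"
    shard = index["weight_map"][name]  # "model-00003-of-00004.safetensors" per the map above

    # Open only that shard and read the single tensor without loading the rest.
    with safe_open(f"{repo_dir}/{shard}", framework="pt") as f:
        tensor = f.get_tensor(name)
    print(shard, tuple(tensor.shape))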
modeling_contextvla.py ADDED
@@ -0,0 +1,58 @@
1
+ import os
2
+
3
+ from torch import nn
4
+ import torch
5
+
6
+ from huggingface_hub import snapshot_download
7
+ from transformers.trainer_utils import load_sharded_checkpoint
8
+ from transformers import AutoConfig, AutoProcessor
9
+
10
+ from qwenvl.model.modeling_qwen3_vl import Qwen3VLForConditionalGeneration
11
+ from qwenvl.model.contextvla import LayerWrapper
12
+
13
+
14
+ ACTION_START_TOKEN = "<|action_start|>"
15
+ ACTION_END_TOKEN = "<|action_end|>"
16
+ ACTION_PLACEHOLDER_TOKEN = "<|action_placeholder|>"
17
+
18
+
19
+ def add_action_to_processor(processor):
20
+ custom_tokens = [ACTION_START_TOKEN, ACTION_END_TOKEN, ACTION_PLACEHOLDER_TOKEN]
21
+ for i in range(2048):
22
+ custom_tokens.append(f"<|action_{i}|>")
23
+
24
+ num_added = processor.tokenizer.add_tokens(custom_tokens, special_tokens=True)
25
+ print(f"Added {num_added} custom tokens")
26
+
27
+ return processor
28
+
29
+
30
+ class ContextVLA_Qwen3VL(Qwen3VLForConditionalGeneration):
31
+ @classmethod
32
+ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
33
+ base_config = AutoConfig.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
34
+ model = Qwen3VLForConditionalGeneration._from_config(base_config, **kwargs)
35
+ for layer_idx in range(len(model.model.language_model.layers)):
36
+ model.model.language_model.layers[layer_idx] = LayerWrapper(
37
+ model.model.language_model.layers[layer_idx],
38
+ layer_idx=layer_idx,
39
+ internal_projection=4,
40
+ img_pattern=[151652],
41
+ motion_token=1
42
+ )
43
+
44
+ processor = AutoProcessor.from_pretrained(
45
+ "Qwen/Qwen3-VL-8B-Instruct",
46
+ )
47
+ processor = add_action_to_processor(processor)
48
+ model.resize_token_embeddings(len(processor.tokenizer))
49
+
50
+ if os.path.isdir(pretrained_model_name_or_path):
51
+ local_dir = pretrained_model_name_or_path
52
+ else:
53
+ local_dir = snapshot_download(pretrained_model_name_or_path)
54
+
55
+ load_sharded_checkpoint(model, local_dir)
56
+ print(f"[ContextVLA] weights loaded from {local_dir}")
57
+
58
+ return model
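A brief usage sketch for the class defined above (assumptions: this file is importable from the downloaded repository, the checkpoint location is a placeholder, and the `qwenvl` package is on the path as the imports in the file require):

    from transformers import AutoProcessor

    from modeling_contextvla import ContextVLA_Qwen3VL, add_action_to_processor

    checkpoint = "path/or/hub-id/of/this/checkpoint"  # placeholder for the actual location

    # from_pretrained builds the Qwen3-VL-8B backbone, wraps each decoder layer in
    # LayerWrapper, adds the action tokens, resizes the embeddings, and then loads
    # the sharded weights listed in model.safetensors.index.json.
    model = ContextVLA_Qwen3VL.from_pretrained(checkpoint)

    # The processor used at inference time must carry the same action tokens that
    # the embeddings were resized for.
    processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
    processor = add_action_to_processor(processor)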
modeling_qwen3_vl.py ADDED
@@ -0,0 +1,1617 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/qwen3_vl/modular_qwen3_vl.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_qwen3_vl.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 The Qwen Team and The HuggingFace Inc. team. All rights reserved.
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+ import os
22
+
23
+ from collections.abc import Callable
24
+ from dataclasses import dataclass
25
+ from typing import Any, Optional, Union
26
+
27
+ import torch
28
+ import torch.nn as nn
29
+ import torch.nn.functional as F
30
+
31
+ from transformers.activations import ACT2FN
32
+ from transformers.cache_utils import Cache, DynamicCache
33
+ from transformers.generation import GenerationMixin
34
+ from transformers.integrations import use_kernel_forward_from_hub
35
+ from transformers.masking_utils import create_causal_mask
36
+ from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
37
+ from transformers.modeling_layers import GradientCheckpointingLayer
38
+ from transformers.modeling_outputs import BaseModelOutputWithPast, ModelOutput
39
+ from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
40
+ from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
41
+ from transformers.processing_utils import Unpack
42
+ from transformers.utils import TransformersKwargs, auto_docstring, is_torchdynamo_compiling
43
+ from transformers.utils.generic import check_model_inputs
44
+ from transformers.models.qwen3_vl.configuration_qwen3_vl import Qwen3VLConfig, Qwen3VLTextConfig, Qwen3VLVisionConfig
45
+
46
+ local_rank = int(os.getenv("LOCAL_RANK", "0"))
47
+ world_size = torch.cuda.device_count()
48
+
49
+ rank = local_rank
50
+
51
+ class Qwen3VLVisionMLP(nn.Module):
52
+ def __init__(self, config):
53
+ super().__init__()
54
+ self.hidden_size = config.hidden_size
55
+ self.intermediate_size = config.intermediate_size
56
+ self.linear_fc1 = nn.Linear(self.hidden_size, self.intermediate_size, bias=True)
57
+ self.linear_fc2 = nn.Linear(self.intermediate_size, self.hidden_size, bias=True)
58
+ self.act_fn = ACT2FN[config.hidden_act]
59
+
60
+ def forward(self, hidden_state):
61
+ return self.linear_fc2(self.act_fn(self.linear_fc1(hidden_state)))
62
+
63
+
64
+ class Qwen3VLVisionPatchEmbed(nn.Module):
65
+ def __init__(self, config) -> None:
66
+ super().__init__()
67
+ self.patch_size = config.patch_size
68
+ self.temporal_patch_size = config.temporal_patch_size
69
+ self.in_channels = config.in_channels
70
+ self.embed_dim = config.hidden_size
71
+
72
+ kernel_size = [self.temporal_patch_size, self.patch_size, self.patch_size]
73
+ self.proj = nn.Conv3d(self.in_channels, self.embed_dim, kernel_size=kernel_size, stride=kernel_size, bias=True)
74
+
75
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
76
+ target_dtype = self.proj.weight.dtype
77
+ hidden_states = hidden_states.view(
78
+ -1, self.in_channels, self.temporal_patch_size, self.patch_size, self.patch_size
79
+ )
80
+ hidden_states = self.proj(hidden_states.to(dtype=target_dtype)).view(-1, self.embed_dim)
81
+ return hidden_states
82
+
83
+
84
+ class Qwen3VLVisionRotaryEmbedding(nn.Module):
85
+ inv_freq: torch.Tensor # fix linting for `register_buffer`
86
+
87
+ def __init__(self, dim: int, theta: float = 10000.0) -> None:
88
+ super().__init__()
89
+ inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float) / dim))
90
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
91
+
92
+ def forward(self, seqlen: int) -> torch.Tensor:
93
+ seq = torch.arange(seqlen, device=self.inv_freq.device, dtype=self.inv_freq.dtype)
94
+ freqs = torch.outer(seq, self.inv_freq)
95
+ return freqs
96
+
97
+
98
+ class Qwen3VLVisionPatchMerger(nn.Module):
99
+ def __init__(self, config: Qwen3VLVisionConfig, use_postshuffle_norm=False) -> None:
100
+ super().__init__()
101
+ self.hidden_size = config.hidden_size * (config.spatial_merge_size**2)
102
+ self.use_postshuffle_norm = use_postshuffle_norm
103
+ self.norm = nn.LayerNorm(self.hidden_size if use_postshuffle_norm else config.hidden_size, eps=1e-6)
104
+ self.linear_fc1 = nn.Linear(self.hidden_size, self.hidden_size)
105
+ self.act_fn = nn.GELU()
106
+ self.linear_fc2 = nn.Linear(self.hidden_size, config.out_hidden_size)
107
+
108
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
109
+ x = self.norm(x.view(-1, self.hidden_size) if self.use_postshuffle_norm else x).view(-1, self.hidden_size)
110
+ x = self.linear_fc2(self.act_fn(self.linear_fc1(x)))
111
+ return x
112
+
113
+
114
+ def rotate_half(x):
115
+ """Rotates half the hidden dims of the input."""
116
+ x1 = x[..., : x.shape[-1] // 2]
117
+ x2 = x[..., x.shape[-1] // 2 :]
118
+ return torch.cat((-x2, x1), dim=-1)
119
+
120
+
121
+ def apply_rotary_pos_emb_vision(
122
+ q: torch.Tensor, k: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor
123
+ ) -> tuple[torch.Tensor, torch.Tensor]:
124
+ orig_q_dtype = q.dtype
125
+ orig_k_dtype = k.dtype
126
+ q, k = q.float(), k.float()
127
+ cos, sin = cos.unsqueeze(-2).float(), sin.unsqueeze(-2).float()
128
+ q_embed = (q * cos) + (rotate_half(q) * sin)
129
+ k_embed = (k * cos) + (rotate_half(k) * sin)
130
+ q_embed = q_embed.to(orig_q_dtype)
131
+ k_embed = k_embed.to(orig_k_dtype)
132
+ return q_embed, k_embed
133
+
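+ # Rotary sketch: the vision model builds `cos`/`sin` from cat((freqs, freqs), dim=-1), so the
+ # pair (q[..., i], q[..., i + d/2]) is rotated by the same angle theta_i:
+ # q'[..., i] = q[..., i] * cos(theta_i) - q[..., i + d/2] * sin(theta_i)
+ # q'[..., i + d/2] = q[..., i + d/2] * cos(theta_i) + q[..., i] * sin(theta_i)
+ # i.e. a plain 2-D rotation per frequency; the helpers above only assume an even head dim d.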
134
+
135
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
136
+ """
137
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
138
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
139
+ """
140
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
141
+ if n_rep == 1:
142
+ return hidden_states
143
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
144
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
145
+
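+ # Shape sketch for `repeat_kv` (illustrative values): with key/value states of shape
+ # (batch=2, num_key_value_heads=4, seqlen=7, head_dim=64) and n_rep=3, the output has shape
+ # (2, 12, 7, 64) -- each KV head is duplicated n_rep times so grouped-query attention can be
+ # computed against all 12 query heads with a plain matmul.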
146
+
147
+ def eager_attention_forward(
148
+ module: nn.Module,
149
+ query: torch.Tensor,
150
+ key: torch.Tensor,
151
+ value: torch.Tensor,
152
+ attention_mask: Optional[torch.Tensor],
153
+ scaling: float,
154
+ dropout: float = 0.0,
155
+ **kwargs: Unpack[TransformersKwargs],
156
+ ):
157
+ key_states = repeat_kv(key, module.num_key_value_groups)
158
+ value_states = repeat_kv(value, module.num_key_value_groups)
159
+
160
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
161
+ if attention_mask is not None:
162
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
163
+ attn_weights = attn_weights + causal_mask
164
+
165
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
166
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
167
+ attn_output = torch.matmul(attn_weights, value_states)
168
+ attn_output = attn_output.transpose(1, 2).contiguous()
169
+
170
+ return attn_output, attn_weights
171
+
172
+
173
+ class Qwen3VLVisionAttention(nn.Module):
174
+ def __init__(self, config: Qwen3VLVisionConfig) -> None:
175
+ super().__init__()
176
+ self.dim = config.hidden_size
177
+ self.num_heads = config.num_heads
178
+ self.head_dim = self.dim // self.num_heads
179
+ self.num_key_value_groups = 1 # needed for eager attention
180
+ self.qkv = nn.Linear(self.dim, self.dim * 3, bias=True)
181
+ self.proj = nn.Linear(self.dim, self.dim)
182
+ self.scaling = self.head_dim**-0.5
183
+ self.config = config
184
+ self.attention_dropout = 0.0
185
+ self.is_causal = False
186
+
187
+ def forward(
188
+ self,
189
+ hidden_states: torch.Tensor,
190
+ cu_seqlens: torch.Tensor,
191
+ rotary_pos_emb: Optional[torch.Tensor] = None,
192
+ position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
193
+ **kwargs,
194
+ ) -> torch.Tensor:
195
+ seq_length = hidden_states.shape[0]
196
+ query_states, key_states, value_states = (
197
+ self.qkv(hidden_states).reshape(seq_length, 3, self.num_heads, -1).permute(1, 0, 2, 3).unbind(0)
198
+ )
199
+ cos, sin = position_embeddings
200
+ query_states, key_states = apply_rotary_pos_emb_vision(query_states, key_states, cos, sin)
201
+
202
+ query_states = query_states.transpose(0, 1).unsqueeze(0)
203
+ key_states = key_states.transpose(0, 1).unsqueeze(0)
204
+ value_states = value_states.transpose(0, 1).unsqueeze(0)
205
+
206
+ attention_interface: Callable = eager_attention_forward
207
+ if self.config._attn_implementation != "eager":
208
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
209
+
210
+ if self.config._attn_implementation == "flash_attention_2":
211
+ # Flash Attention 2: Use cu_seqlens for variable length attention
212
+ max_seqlen = (cu_seqlens[1:] - cu_seqlens[:-1]).max()
213
+ attn_output, _ = attention_interface(
214
+ self,
215
+ query_states,
216
+ key_states,
217
+ value_states,
218
+ attention_mask=None,
219
+ scaling=self.scaling,
220
+ dropout=0.0 if not self.training else self.attention_dropout,
221
+ cu_seq_lens_q=cu_seqlens,
222
+ cu_seq_lens_k=cu_seqlens,
223
+ max_length_q=max_seqlen,
224
+ max_length_k=max_seqlen,
225
+ is_causal=False,
226
+ **kwargs,
227
+ )
228
+ else:
229
+ # Other implementations: Process each chunk separately
230
+ lengths = cu_seqlens[1:] - cu_seqlens[:-1]
231
+ splits = [
232
+ torch.split(tensor, lengths.tolist(), dim=2) for tensor in (query_states, key_states, value_states)
233
+ ]
234
+
235
+ attn_outputs = [
236
+ attention_interface(
237
+ self,
238
+ q,
239
+ k,
240
+ v,
241
+ attention_mask=None,
242
+ scaling=self.scaling,
243
+ dropout=0.0 if not self.training else self.attention_dropout,
244
+ is_causal=False,
245
+ **kwargs,
246
+ )[0]
247
+ for q, k, v in zip(*splits)
248
+ ]
249
+ attn_output = torch.cat(attn_outputs, dim=1)
250
+
251
+ attn_output = attn_output.reshape(seq_length, -1).contiguous()
252
+ attn_output = self.proj(attn_output)
253
+
254
+ return attn_output
255
+
256
+
257
+ class Qwen3VLVisionBlock(GradientCheckpointingLayer):
258
+ def __init__(self, config, attn_implementation: str = "sdpa") -> None:
259
+ super().__init__()
260
+ self.norm1 = nn.LayerNorm(config.hidden_size, eps=1e-6)
261
+ self.norm2 = nn.LayerNorm(config.hidden_size, eps=1e-6)
262
+ self.attn = Qwen3VLVisionAttention(config=config)
263
+ self.mlp = Qwen3VLVisionMLP(config=config)
264
+
265
+ def forward(
266
+ self,
267
+ hidden_states: torch.Tensor,
268
+ cu_seqlens: torch.Tensor,
269
+ rotary_pos_emb: Optional[torch.Tensor] = None,
270
+ position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
271
+ **kwargs,
272
+ ) -> torch.Tensor:
273
+ hidden_states = hidden_states + self.attn(
274
+ self.norm1(hidden_states),
275
+ cu_seqlens=cu_seqlens,
276
+ rotary_pos_emb=rotary_pos_emb,
277
+ position_embeddings=position_embeddings,
278
+ **kwargs,
279
+ )
280
+ hidden_states = hidden_states + self.mlp(self.norm2(hidden_states))
281
+ return hidden_states
282
+
283
+
284
+ class Qwen3VLTextRotaryEmbedding(nn.Module):
285
+ inv_freq: torch.Tensor # fix linting for `register_buffer`
286
+
287
+ def __init__(self, config: Qwen3VLTextConfig, device=None):
288
+ super().__init__()
289
+ self.max_seq_len_cached = config.max_position_embeddings
290
+ self.original_max_seq_len = config.max_position_embeddings
291
+
292
+ self.config = config
293
+
294
+ self.rope_type = self.config.rope_parameters["rope_type"]
295
+ rope_init_fn: Callable = self.compute_default_rope_parameters
296
+ if self.rope_type != "default":
297
+ rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
298
+ inv_freq, self.attention_scaling = rope_init_fn(self.config, device)
299
+
300
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
301
+ self.original_inv_freq = inv_freq
302
+
303
+ self.mrope_section = config.rope_parameters.get("mrope_section", [24, 20, 20])
304
+
305
+ @staticmethod
306
+ def compute_default_rope_parameters(
307
+ config: Optional[Qwen3VLTextConfig] = None,
308
+ device: Optional["torch.device"] = None,
309
+ seq_len: Optional[int] = None,
310
+ ) -> tuple["torch.Tensor", float]:
311
+ """
312
+ Computes the inverse frequencies according to the original RoPE implementation
313
+ Args:
314
+ config ([`~transformers.PreTrainedConfig`]):
315
+ The model configuration.
316
+ device (`torch.device`):
317
+ The device to use for initialization of the inverse frequencies.
318
+ seq_len (`int`, *optional*):
319
+ The current sequence length. Unused for this type of RoPE.
320
+ Returns:
321
+ Tuple of (`torch.Tensor`, `float`), containing the inverse frequencies for the RoPE embeddings and the
322
+ post-processing scaling factor applied to the computed cos/sin (unused in this type of RoPE).
323
+ """
324
+ base = config.rope_parameters["rope_theta"]
325
+ dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
326
+
327
+ attention_factor = 1.0 # Unused in this type of RoPE
328
+
329
+ # Compute the inverse frequencies
330
+ inv_freq = 1.0 / (
331
+ base ** (torch.arange(0, dim, 2, dtype=torch.int64).to(device=device, dtype=torch.float) / dim)
332
+ )
333
+ return inv_freq, attention_factor
334
+
335
+ @torch.no_grad()
336
+ @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope)
337
+ def forward(self, x, position_ids):
338
+ # In contrast to other models, Qwen3VL has different position ids for the grids
339
+ # So we expand the inv_freq to shape (3, ...)
340
+ if position_ids.ndim == 2:
341
+ position_ids = position_ids[None, ...].expand(3, position_ids.shape[0], -1)
342
+ inv_freq_expanded = self.inv_freq[None, None, :, None].float().expand(3, position_ids.shape[1], -1, 1)
343
+ position_ids_expanded = position_ids[:, :, None, :].float() # shape (3, bs, 1, positions)
344
+
345
+ device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
346
+ with torch.autocast(device_type=device_type, enabled=False): # Force float32
347
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(2, 3)
348
+ freqs = self.apply_interleaved_mrope(freqs, self.mrope_section)
349
+ emb = torch.cat((freqs, freqs), dim=-1)
350
+ cos = emb.cos() * self.attention_scaling
351
+ sin = emb.sin() * self.attention_scaling
352
+
353
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
354
+
355
+ def apply_interleaved_mrope(self, freqs, mrope_section):
356
+ """Apply interleaved MRoPE to 3D rotary embeddings.
357
+ Reorganizes frequency layout from chunked [TTT...HHH...WWW] to
358
+ interleaved [THWTHWTHW...TT], preserving frequency continuity.
359
+ Args:
360
+ freqs: (3, bs, seq_len, head_dim // 2)
361
+ mrope_section: (3,)
362
+ Returns:
363
+ freqs_t: (bs, seq_len, head_dim // 2)
364
+ """
365
+ freqs_t = freqs[0] # just overwrite the first dimension T
366
+ for dim, offset in enumerate((1, 2), start=1): # H, W
367
+ length = mrope_section[dim] * 3
368
+ idx = slice(offset, length, 3)
369
+ freqs_t[..., idx] = freqs[dim, ..., idx]
370
+ return freqs_t
371
+
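+ # Worked example (assuming the default mrope_section = [24, 20, 20] and head_dim // 2 = 64):
+ # indices 1, 4, ..., 58 take the height (H) frequencies, indices 2, 5, ..., 59 take the
+ # width (W) frequencies, and the remaining 24 indices (0, 3, ..., 57 plus 60..63) keep the
+ # temporal (T) frequencies, giving the interleaved T/H/W layout described above.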
372
+
373
+ @use_kernel_forward_from_hub("RMSNorm")
374
+ class Qwen3VLTextRMSNorm(nn.Module):
375
+ def __init__(self, hidden_size, eps: float = 1e-6) -> None:
376
+ """
377
+ Qwen3VLTextRMSNorm is equivalent to T5LayerNorm
378
+ """
379
+ super().__init__()
380
+ self.weight = nn.Parameter(torch.ones(hidden_size))
381
+ self.variance_epsilon = eps
382
+
383
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
384
+ input_dtype = hidden_states.dtype
385
+ hidden_states = hidden_states.to(torch.float32)
386
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
387
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
388
+ return self.weight * hidden_states.to(input_dtype)
389
+
390
+ def extra_repr(self):
391
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
392
+
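+ # RMSNorm in one line: y = weight * x / sqrt(mean(x**2, dim=-1) + eps), computed in float32
+ # for numerical stability and cast back to the input dtype (see forward above).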
393
+
394
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
395
+ """Applies Rotary Position Embedding to the query and key tensors.
396
+
397
+ Args:
398
+ q (`torch.Tensor`): The query tensor.
399
+ k (`torch.Tensor`): The key tensor.
400
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
401
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
402
+ position_ids (`torch.Tensor`, *optional*):
403
+ Deprecated and unused.
404
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
405
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
406
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
407
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
408
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
409
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
410
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
411
+ Returns:
412
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
413
+ """
414
+ cos = cos.unsqueeze(unsqueeze_dim)
415
+ sin = sin.unsqueeze(unsqueeze_dim)
416
+ q_embed = (q * cos) + (rotate_half(q) * sin)
417
+ k_embed = (k * cos) + (rotate_half(k) * sin)
418
+ return q_embed, k_embed
419
+
420
+
421
+ class Qwen3VLTextAttention(nn.Module):
422
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
423
+
424
+ def __init__(self, config: Qwen3VLTextConfig, layer_idx: int):
425
+ super().__init__()
426
+ self.layer_type = config.layer_types[layer_idx] if hasattr(config, "layer_types") else None
427
+ self.config = config
428
+ self.layer_idx = layer_idx
429
+ self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
430
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
431
+ self.scaling = self.head_dim**-0.5
432
+ self.attention_dropout = config.attention_dropout
433
+ self.is_causal = True
434
+
435
+ self.q_proj = nn.Linear(
436
+ config.hidden_size, config.num_attention_heads * self.head_dim, bias=config.attention_bias
437
+ )
438
+ self.k_proj = nn.Linear(
439
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
440
+ )
441
+ self.v_proj = nn.Linear(
442
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
443
+ )
444
+ self.o_proj = nn.Linear(
445
+ config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
446
+ )
447
+ self.q_norm = Qwen3VLTextRMSNorm(self.head_dim, eps=config.rms_norm_eps) # unlike olmo, only on the head dim!
448
+ self.k_norm = Qwen3VLTextRMSNorm(
449
+ self.head_dim, eps=config.rms_norm_eps
450
+ ) # thus post q_norm does not need reshape
451
+
452
+ def forward(
453
+ self,
454
+ hidden_states: torch.Tensor,
455
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
456
+ attention_mask: Optional[torch.Tensor],
457
+ past_key_values: Optional[Cache] = None,
458
+ cache_position: Optional[torch.LongTensor] = None,
459
+ **kwargs: Unpack[FlashAttentionKwargs],
460
+ ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
461
+ input_shape = hidden_states.shape[:-1]
462
+ hidden_shape = (*input_shape, -1, self.head_dim)
463
+
464
+ query_states = self.q_norm(self.q_proj(hidden_states).view(hidden_shape)).transpose(1, 2)
465
+ key_states = self.k_norm(self.k_proj(hidden_states).view(hidden_shape)).transpose(1, 2)
466
+ value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)
467
+
468
+ cos, sin = position_embeddings
469
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
470
+
471
+ if past_key_values is not None:
472
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
473
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
474
+ key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)
475
+
476
+ attention_interface: Callable = eager_attention_forward
477
+ if self.config._attn_implementation != "eager":
478
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
479
+
480
+ attn_output, attn_weights = attention_interface(
481
+ self,
482
+ query_states,
483
+ key_states,
484
+ value_states,
485
+ attention_mask,
486
+ dropout=0.0 if not self.training else self.attention_dropout,
487
+ scaling=self.scaling,
488
+ **kwargs,
489
+ )
490
+
491
+ attn_output = attn_output.reshape(*input_shape, -1).contiguous()
492
+ attn_output = self.o_proj(attn_output)
493
+ return attn_output, attn_weights
494
+
495
+
496
+ class Qwen3VLTextMLP(nn.Module):
497
+ def __init__(self, config):
498
+ super().__init__()
499
+ self.config = config
500
+ self.hidden_size = config.hidden_size
501
+ self.intermediate_size = config.intermediate_size
502
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
503
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
504
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
505
+ self.act_fn = ACT2FN[config.hidden_act]
506
+
507
+ def forward(self, x):
508
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
509
+ return down_proj
510
+
511
+
512
+ class Qwen3VLTextDecoderLayer(GradientCheckpointingLayer):
513
+ def __init__(self, config: Qwen3VLTextConfig, layer_idx: int):
514
+ super().__init__()
515
+ self.hidden_size = config.hidden_size
516
+
517
+ self.self_attn = Qwen3VLTextAttention(config=config, layer_idx=layer_idx)
518
+
519
+ self.mlp = Qwen3VLTextMLP(config)
520
+ self.input_layernorm = Qwen3VLTextRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
521
+ self.post_attention_layernorm = Qwen3VLTextRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
522
+
523
+ def forward(
524
+ self,
525
+ hidden_states: torch.Tensor,
526
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
527
+ attention_mask: Optional[torch.Tensor] = None,
528
+ position_ids: Optional[torch.LongTensor] = None,
529
+ past_key_values: Optional[Cache] = None,
530
+ use_cache: Optional[bool] = False,
531
+ cache_position: Optional[torch.LongTensor] = None,
532
+ **kwargs: Unpack[TransformersKwargs],
533
+ ) -> torch.Tensor:
534
+ residual = hidden_states
535
+ hidden_states = self.input_layernorm(hidden_states)
536
+ # Self Attention
537
+ hidden_states, _ = self.self_attn(
538
+ hidden_states=hidden_states,
539
+ attention_mask=attention_mask,
540
+ position_ids=position_ids,
541
+ past_key_values=past_key_values,
542
+ use_cache=use_cache,
543
+ cache_position=cache_position,
544
+ position_embeddings=position_embeddings,
545
+ **kwargs,
546
+ )
547
+ hidden_states = residual + hidden_states
548
+
549
+ # Fully Connected
550
+ residual = hidden_states
551
+ hidden_states = self.post_attention_layernorm(hidden_states)
552
+ hidden_states = self.mlp(hidden_states)
553
+ hidden_states = residual + hidden_states
554
+ return hidden_states
555
+
556
+
557
+ @dataclass
558
+ @auto_docstring(
559
+ custom_intro="""
560
+ Base class for Qwen3VL model outputs, with hidden states and attentions.
561
+ """
562
+ )
563
+ class Qwen3VLModelOutputWithPast(ModelOutput):
564
+ r"""
565
+ past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
566
+ It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
567
+
568
+ Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
569
+ `past_key_values` input) to speed up sequential decoding.
570
+ rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
571
+ The rope index difference between sequence length and multimodal rope.
572
+ """
573
+
574
+ last_hidden_state: Optional[torch.FloatTensor] = None
575
+ past_key_values: Optional[Cache] = None
576
+ hidden_states: Optional[tuple[torch.FloatTensor]] = None
577
+ attentions: Optional[tuple[torch.FloatTensor]] = None
578
+ rope_deltas: Optional[torch.LongTensor] = None
579
+
580
+
581
+ @auto_docstring
582
+ class Qwen3VLPreTrainedModel(PreTrainedModel):
583
+ config: Qwen3VLConfig
584
+ base_model_prefix = "model"
585
+ input_modalities = ["image", "video", "text"]
586
+ supports_gradient_checkpointing = True
587
+ _no_split_modules = ["Qwen3VLTextDecoderLayer", "Qwen3VLVisionBlock"]
588
+ _skip_keys_device_placement = "past_key_values"
589
+ _supports_flash_attn = True
590
+ _supports_sdpa = True
591
+
592
+ _can_compile_fullgraph = True
593
+ _supports_attention_backend = True
594
+ _can_record_outputs = {
595
+ "hidden_states": Qwen3VLTextDecoderLayer,
596
+ "attentions": Qwen3VLTextAttention,
597
+ }
598
+
599
+
600
+ class Qwen3VLVisionModel(Qwen3VLPreTrainedModel):
601
+ config: Qwen3VLVisionConfig
602
+ _no_split_modules = ["Qwen3VLVisionBlock"]
603
+
604
+ def __init__(self, config, *inputs, **kwargs) -> None:
605
+ super().__init__(config, *inputs, **kwargs)
606
+ self.spatial_merge_size = config.spatial_merge_size
607
+ self.patch_size = config.patch_size
608
+ self.spatial_merge_unit = self.spatial_merge_size * self.spatial_merge_size
609
+
610
+ self.patch_embed = Qwen3VLVisionPatchEmbed(
611
+ config=config,
612
+ )
613
+
614
+ self.pos_embed = nn.Embedding(config.num_position_embeddings, config.hidden_size)
615
+ self.num_grid_per_side = int(config.num_position_embeddings**0.5)
616
+
617
+ head_dim = config.hidden_size // config.num_heads
618
+ self.rotary_pos_emb = Qwen3VLVisionRotaryEmbedding(head_dim // 2)
619
+
620
+ self.blocks = nn.ModuleList([Qwen3VLVisionBlock(config) for _ in range(config.depth)])
621
+ self.merger = Qwen3VLVisionPatchMerger(
622
+ config=config,
623
+ use_postshuffle_norm=False,
624
+ )
625
+
626
+ self.deepstack_visual_indexes = config.deepstack_visual_indexes
627
+ self.deepstack_merger_list = nn.ModuleList(
628
+ [
629
+ Qwen3VLVisionPatchMerger(
630
+ config=config,
631
+ use_postshuffle_norm=True,
632
+ )
633
+ for _ in range(len(config.deepstack_visual_indexes))
634
+ ]
635
+ )
636
+
637
+ self.gradient_checkpointing = False
638
+
639
+ def rot_pos_emb(self, grid_thw: torch.Tensor) -> torch.Tensor:
640
+ merge_size = self.spatial_merge_size
641
+
642
+ max_hw = int(grid_thw[:, 1:].max().item())
643
+ freq_table = self.rotary_pos_emb(max_hw) # (max_hw, dim // 2)
644
+ device = freq_table.device
645
+
646
+ total_tokens = int(torch.prod(grid_thw, dim=1).sum().item())
647
+ pos_ids = torch.empty((total_tokens, 2), dtype=torch.long, device=device)
648
+
649
+ offset = 0
650
+ for num_frames, height, width in grid_thw:
651
+ merged_h, merged_w = height // merge_size, width // merge_size
652
+
653
+ block_rows = torch.arange(merged_h, device=device) # block row indices
654
+ block_cols = torch.arange(merged_w, device=device) # block col indices
655
+ intra_row = torch.arange(merge_size, device=device) # intra-block row offsets
656
+ intra_col = torch.arange(merge_size, device=device) # intra-block col offsets
657
+
658
+ # Compute full-resolution positions
659
+ row_idx = block_rows[:, None, None, None] * merge_size + intra_row[None, None, :, None]
660
+ col_idx = block_cols[None, :, None, None] * merge_size + intra_col[None, None, None, :]
661
+
662
+ row_idx = row_idx.expand(merged_h, merged_w, merge_size, merge_size).reshape(-1)
663
+ col_idx = col_idx.expand(merged_h, merged_w, merge_size, merge_size).reshape(-1)
664
+
665
+ coords = torch.stack((row_idx, col_idx), dim=-1)
666
+
667
+ if num_frames > 1:
668
+ coords = coords.repeat(num_frames, 1)
669
+
670
+ num_tokens = coords.shape[0]
671
+ pos_ids[offset : offset + num_tokens] = coords
672
+ offset += num_tokens
673
+
674
+ embeddings = freq_table[pos_ids] # lookup rotary embeddings
675
+ embeddings = embeddings.flatten(1)
676
+ return embeddings
677
+
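+ # The lookup above yields, per patch, its (row, col) rotary frequencies; after flatten(1) the
+ # first half of the last dimension encodes the row position and the second half the column
+ # position, giving a tensor of shape (total_tokens, head_dim // 2).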
678
+ def fast_pos_embed_interpolate(self, grid_thw):
679
+ grid_ts, grid_hs, grid_ws = grid_thw[:, 0], grid_thw[:, 1], grid_thw[:, 2]
680
+ device = self.pos_embed.weight.device
681
+
682
+ idx_list = [[] for _ in range(4)]
683
+ weight_list = [[] for _ in range(4)]
684
+
685
+ for t, h, w in zip(grid_ts, grid_hs, grid_ws):
686
+ h_idxs = torch.linspace(0, self.num_grid_per_side - 1, h)
687
+ w_idxs = torch.linspace(0, self.num_grid_per_side - 1, w)
688
+
689
+ h_idxs_floor = h_idxs.int()
690
+ w_idxs_floor = w_idxs.int()
691
+ h_idxs_ceil = (h_idxs.int() + 1).clip(max=self.num_grid_per_side - 1)
692
+ w_idxs_ceil = (w_idxs.int() + 1).clip(max=self.num_grid_per_side - 1)
693
+
694
+ dh = h_idxs - h_idxs_floor
695
+ dw = w_idxs - w_idxs_floor
696
+
697
+ base_h = h_idxs_floor * self.num_grid_per_side
698
+ base_h_ceil = h_idxs_ceil * self.num_grid_per_side
699
+
700
+ indices = [
701
+ (base_h[None].T + w_idxs_floor[None]).flatten(),
702
+ (base_h[None].T + w_idxs_ceil[None]).flatten(),
703
+ (base_h_ceil[None].T + w_idxs_floor[None]).flatten(),
704
+ (base_h_ceil[None].T + w_idxs_ceil[None]).flatten(),
705
+ ]
706
+
707
+ weights = [
708
+ ((1 - dh)[None].T * (1 - dw)[None]).flatten(),
709
+ ((1 - dh)[None].T * dw[None]).flatten(),
710
+ (dh[None].T * (1 - dw)[None]).flatten(),
711
+ (dh[None].T * dw[None]).flatten(),
712
+ ]
713
+
714
+ for i in range(4):
715
+ idx_list[i].extend(indices[i].tolist())
716
+ weight_list[i].extend(weights[i].tolist())
717
+
718
+ idx_tensor = torch.tensor(idx_list, dtype=torch.long, device=device)
719
+ weight_tensor = torch.tensor(weight_list, dtype=self.pos_embed.weight.dtype, device=device)
720
+ pos_embeds = self.pos_embed(idx_tensor).to(device) * weight_tensor[:, :, None]
721
+ patch_pos_embeds = pos_embeds[0] + pos_embeds[1] + pos_embeds[2] + pos_embeds[3]
722
+
723
+ patch_pos_embeds = patch_pos_embeds.split([h * w for h, w in zip(grid_hs, grid_ws)])
724
+
725
+ patch_pos_embeds_permute = []
726
+ merge_size = self.config.spatial_merge_size
727
+ for pos_embed, t, h, w in zip(patch_pos_embeds, grid_ts, grid_hs, grid_ws):
728
+ pos_embed = pos_embed.repeat(t, 1)
729
+ pos_embed = (
730
+ pos_embed.view(t, h // merge_size, merge_size, w // merge_size, merge_size, -1)
731
+ .permute(0, 1, 3, 2, 4, 5)
732
+ .flatten(0, 4)
733
+ )
734
+ patch_pos_embeds_permute.append(pos_embed)
735
+ patch_pos_embeds = torch.cat(patch_pos_embeds_permute)
736
+ return patch_pos_embeds
737
+
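+ # Interpolation sketch: for every target (h, w) location, the four nearest entries of the
+ # learned (num_grid_per_side x num_grid_per_side) position table are mixed with bilinear
+ # weights (1-dh)(1-dw), (1-dh)dw, dh(1-dw) and dh*dw; the result is then repeated over the t
+ # frames and permuted into the spatial-merge block order expected by the patch merger.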
738
+ def forward(self, hidden_states: torch.Tensor, grid_thw: torch.Tensor, **kwargs) -> torch.Tensor:
739
+ """
740
+ Args:
741
+ hidden_states (`torch.Tensor` of shape `(seq_len, hidden_size)`):
742
+ The final hidden states of the model.
743
+ grid_thw (`torch.Tensor` of shape `(num_images_or_videos, 3)`):
744
+ The temporal, height and width of feature shape of each image in LLM.
745
+
746
+ Returns:
747
+ `torch.Tensor`: hidden_states.
748
+ """
749
+ hidden_states = self.patch_embed(hidden_states)
750
+
751
+ pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
752
+ hidden_states = hidden_states + pos_embeds
753
+
754
+ rotary_pos_emb = self.rot_pos_emb(grid_thw)
755
+
756
+ seq_len, _ = hidden_states.size()
757
+ hidden_states = hidden_states.reshape(seq_len, -1)
758
+ rotary_pos_emb = rotary_pos_emb.reshape(seq_len, -1)
759
+ emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
760
+ position_embeddings = (emb.cos(), emb.sin())
761
+
762
+ cu_seqlens = torch.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]).cumsum(
763
+ dim=0,
764
+ # Select dtype based on the following factors:
765
+ # - FA2 requires that cu_seqlens_q must have dtype int32
766
+ # - torch.onnx.export requires that cu_seqlens_q must have same dtype as grid_thw
767
+ # See https://github.com/huggingface/transformers/pull/34852 for more information
768
+ dtype=grid_thw.dtype if torch.jit.is_tracing() else torch.int32,
769
+ )
770
+ cu_seqlens = F.pad(cu_seqlens, (1, 0), value=0)
771
+
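+ # Example: grid_thw = [[1, 8, 8], [2, 4, 4]] gives per-frame lengths [64, 16, 16] and
+ # cu_seqlens = [0, 64, 80, 96], i.e. attention is restricted to tokens of the same frame.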
772
+ deepstack_feature_lists = []
773
+ for layer_num, blk in enumerate(self.blocks):
774
+ hidden_states = blk(
775
+ hidden_states,
776
+ cu_seqlens=cu_seqlens,
777
+ position_embeddings=position_embeddings,
778
+ **kwargs,
779
+ )
780
+ if layer_num in self.deepstack_visual_indexes:
781
+ deepstack_feature = self.deepstack_merger_list[self.deepstack_visual_indexes.index(layer_num)](
782
+ hidden_states
783
+ )
784
+ deepstack_feature_lists.append(deepstack_feature)
785
+
786
+ hidden_states = self.merger(hidden_states)
787
+
788
+ return hidden_states, deepstack_feature_lists
789
+
790
+
791
+ @auto_docstring(
792
+ custom_intro=(
793
+ "Text part of Qwen3VL, "
794
+ "not a pure text-only model, as DeepStack integrates visual features into the early hidden states."
795
+ )
796
+ )
797
+ class Qwen3VLTextModel(Qwen3VLPreTrainedModel):
798
+ config: Qwen3VLTextConfig
799
+ _no_split_modules = ["Qwen3VLTextDecoderLayer"]
800
+
801
+ def __init__(self, config: Qwen3VLTextConfig):
802
+ super().__init__(config)
803
+ self.padding_idx = config.pad_token_id
804
+ self.vocab_size = config.vocab_size
805
+
806
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
807
+ self.layers = nn.ModuleList(
808
+ [Qwen3VLTextDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
809
+ )
810
+ self.norm = Qwen3VLTextRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
811
+ self.rotary_emb = Qwen3VLTextRotaryEmbedding(config=config)
812
+ self.gradient_checkpointing = False
813
+
814
+ # Initialize weights and apply final processing
815
+ self.post_init()
816
+
817
+ @check_model_inputs()
818
+ @auto_docstring
819
+ def forward(
820
+ self,
821
+ input_ids: Optional[torch.LongTensor] = None,
822
+ attention_mask: Optional[torch.Tensor] = None,
823
+ position_ids: Optional[torch.LongTensor] = None,
824
+ past_key_values: Optional[Cache] = None,
825
+ inputs_embeds: Optional[torch.FloatTensor] = None,
826
+ use_cache: Optional[bool] = None,
827
+ cache_position: Optional[torch.LongTensor] = None,
828
+ # args for deepstack
829
+ visual_pos_masks: Optional[torch.Tensor] = None,
830
+ deepstack_visual_embeds: Optional[list[torch.Tensor]] = None,
831
+ **kwargs: Unpack[FlashAttentionKwargs],
832
+ ) -> Union[tuple, BaseModelOutputWithPast]:
833
+ r"""
834
+ visual_pos_masks (`torch.Tensor` of shape `(batch_size, seqlen)`, *optional*):
835
+ The mask of the visual positions.
836
+ deepstack_visual_embeds (`list[torch.Tensor]`, *optional*):
837
+ The deepstack visual embeddings. The shape is (num_layers, visual_seqlen, embed_dim).
838
+ The feature is extracted from the different visual encoder layers, and fed to the decoder
839
+ hidden states. It is from the paper DeepStack (https://arxiv.org/abs/2406.04334).
840
+ """
841
+
842
+ # torch.jit.trace() doesn't support cache objects in the output
843
+ if use_cache and past_key_values is None and not torch.jit.is_tracing():
844
+ past_key_values = DynamicCache(config=self.config)
845
+
846
+ if inputs_embeds is None:
847
+ inputs_embeds = self.embed_tokens(input_ids)
848
+
849
+ if cache_position is None:
850
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
851
+ cache_position = torch.arange(
852
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
853
+ )
854
+
855
+ # the hard coded `3` is for temporal, height and width.
856
+ if position_ids is None:
857
+ position_ids = cache_position.view(1, 1, -1).expand(3, inputs_embeds.shape[0], -1)
858
+ elif position_ids.ndim == 2:
859
+ position_ids = position_ids[None, ...].expand(3, position_ids.shape[0], -1)
860
+
861
+ if position_ids.ndim == 3 and position_ids.shape[0] == 4:
862
+ text_position_ids = position_ids[0]
863
+ position_ids = position_ids[1:]
864
+ else:
865
+ text_position_ids = position_ids[0]
866
+
867
+ attention_mask = create_causal_mask(
868
+ config=self.config,
869
+ input_embeds=inputs_embeds,
870
+ attention_mask=attention_mask,
871
+ cache_position=cache_position,
872
+ past_key_values=past_key_values,
873
+ position_ids=text_position_ids,
874
+ )
875
+
876
+ hidden_states = inputs_embeds
877
+
878
+ # create position embeddings to be shared across the decoder layers
879
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
880
+
881
+ # decoder layers. FIXME (hard-coded): each layer is assumed to return a (hidden_states, updated_kwargs) pair and to accept input_ids as an extra positional argument, unlike the stock Qwen3VLTextDecoderLayer above.
882
+ for layer_idx, decoder_layer in enumerate(self.layers):
883
+ layer_outputs = decoder_layer(
884
+ hidden_states,
885
+ input_ids,
886
+ attention_mask=attention_mask,
887
+ position_ids=text_position_ids,
888
+ past_key_values=past_key_values,
889
+ cache_position=cache_position,
890
+ position_embeddings=position_embeddings,
891
+ **kwargs,
892
+ )
893
+ ## FIXME (hard-coded): unpack the (hidden_states, updated_kwargs) pair returned by the wrapped layer
894
+ hidden_states = layer_outputs[0]
895
+ if 'attention_mask' in layer_outputs[1]:
896
+ attention_mask = layer_outputs[1]['attention_mask']
897
+ if 'position_ids' in layer_outputs[1]:
898
+ text_position_ids = layer_outputs[1]['position_ids']
899
+ if 'past_key_values' in layer_outputs[1]:
900
+ past_key_values = layer_outputs[1]['past_key_values']
901
+ if 'cache_position' in layer_outputs[1]:
902
+ cache_position = layer_outputs[1]['cache_position']
903
+ if 'position_embeddings' in layer_outputs[1]:
904
+ position_embeddings = layer_outputs[1]['position_embeddings']
905
+
906
+ # add visual features to the hidden states of first several layers
907
+ if deepstack_visual_embeds is not None and layer_idx in range(len(deepstack_visual_embeds)):
908
+ hidden_states = self._deepstack_process(
909
+ hidden_states,
910
+ visual_pos_masks,
911
+ deepstack_visual_embeds[layer_idx],
912
+ )
913
+
914
+ hidden_states = self.norm(hidden_states)
915
+
916
+ return BaseModelOutputWithPast(
917
+ last_hidden_state=hidden_states,
918
+ past_key_values=past_key_values,
919
+ )
920
+
921
+ def _deepstack_process(
922
+ self, hidden_states: torch.Tensor, visual_pos_masks: torch.Tensor, visual_embeds: torch.Tensor
923
+ ):
924
+ visual_pos_masks = visual_pos_masks.to(hidden_states.device)
925
+ visual_embeds = visual_embeds.to(hidden_states.device, hidden_states.dtype)
926
+ hidden_states = hidden_states.clone()
927
+ local_this = hidden_states[visual_pos_masks, :] + visual_embeds
928
+ hidden_states[visual_pos_masks, :] = local_this
929
+ return hidden_states
930
+
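+ # DeepStack sketch: features taken from intermediate vision-encoder blocks are added
+ # residually to the decoder hidden states, but only at positions flagged by `visual_pos_masks`;
+ # decoder layer i consumes deepstack_visual_embeds[i], so only the first
+ # len(deepstack_visual_embeds) decoder layers receive these extra visual features.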
931
+
932
+ @auto_docstring
933
+ class Qwen3VLModel(Qwen3VLPreTrainedModel):
934
+ base_model_prefix = ""
935
+ _checkpoint_conversion_mapping = {}
936
+ # Reference: fix gemma3 grad acc #37208
937
+ accepts_loss_kwargs = False
938
+ config: Qwen3VLConfig
939
+ _no_split_modules = ["Qwen3VLTextDecoderLayer", "Qwen3VLVisionBlock"]
940
+
941
+ def __init__(self, config):
942
+ super().__init__(config)
943
+ self.visual = Qwen3VLVisionModel._from_config(config.vision_config)
944
+ self.language_model = Qwen3VLTextModel._from_config(config.text_config)
945
+ self.rope_deltas = None # cache rope_deltas here
946
+
947
+ # Initialize weights and apply final processing
948
+ self.post_init()
949
+
950
+ def get_input_embeddings(self):
951
+ return self.language_model.get_input_embeddings()
952
+
953
+ def set_input_embeddings(self, value):
954
+ self.language_model.set_input_embeddings(value)
955
+
956
+ def set_decoder(self, decoder):
957
+ self.language_model = decoder
958
+
959
+ def get_decoder(self):
960
+ return self.language_model
961
+
962
+ def get_rope_index(
963
+ self,
964
+ input_ids: Optional[torch.LongTensor] = None,
965
+ image_grid_thw: Optional[torch.LongTensor] = None,
966
+ video_grid_thw: Optional[torch.LongTensor] = None,
967
+ attention_mask: Optional[torch.Tensor] = None,
968
+ ) -> tuple[torch.Tensor, torch.Tensor]:
969
+ """Different from the original implementation, Qwen3VL use timestamps rather than absolute time position ids."""
970
+
971
+ # Since we use timestamps to separate videos, like <t1> <vision_start> <frame1> <vision_end> <t2> <vision_start> <frame2> <vision_end>, the video_grid_thw must also be split into one row per frame
972
+ if video_grid_thw is not None:
973
+ video_grid_thw = torch.repeat_interleave(video_grid_thw, video_grid_thw[:, 0], dim=0)
974
+ video_grid_thw[:, 0] = 1
975
+
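+ # Example: a video whose grid is (t=3, h=24, w=24) is expanded to three rows
+ # [[1, 24, 24], [1, 24, 24], [1, 24, 24]] -- one per frame -- since each frame is wrapped by
+ # its own timestamp / vision tokens in the prompt.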
976
+ spatial_merge_size = self.config.vision_config.spatial_merge_size
977
+ image_token_id = self.config.image_token_id
978
+ video_token_id = self.config.video_token_id
979
+ vision_start_token_id = self.config.vision_start_token_id
980
+ mrope_position_deltas = []
981
+ if input_ids is not None and (image_grid_thw is not None or video_grid_thw is not None):
982
+ total_input_ids = input_ids
983
+ if attention_mask is None:
984
+ attention_mask = torch.ones_like(total_input_ids)
985
+ position_ids = torch.ones(
986
+ 3,
987
+ input_ids.shape[0],
988
+ input_ids.shape[1],
989
+ dtype=input_ids.dtype,
990
+ device=input_ids.device,
991
+ )
992
+ image_index, video_index = 0, 0
993
+ attention_mask = attention_mask.to(total_input_ids.device)
994
+ for i, input_ids in enumerate(total_input_ids):
995
+ input_ids = input_ids[attention_mask[i] == 1]
996
+ image_nums, video_nums = 0, 0
997
+ vision_start_indices = torch.argwhere(input_ids == vision_start_token_id).squeeze(1)
998
+ vision_tokens = input_ids[vision_start_indices + 1]
999
+ image_nums = (vision_tokens == image_token_id).sum()
1000
+ video_nums = (vision_tokens == video_token_id).sum()
1001
+ input_tokens = input_ids.tolist()
1002
+ llm_pos_ids_list: list = []
1003
+ st = 0
1004
+ remain_images, remain_videos = image_nums, video_nums
1005
+ for _ in range(image_nums + video_nums):
1006
+ if image_token_id in input_tokens and remain_images > 0:
1007
+ ed_image = input_tokens.index(image_token_id, st)
1008
+ else:
1009
+ ed_image = len(input_tokens) + 1
1010
+ if video_token_id in input_tokens and remain_videos > 0:
1011
+ ed_video = input_tokens.index(video_token_id, st)
1012
+ else:
1013
+ ed_video = len(input_tokens) + 1
1014
+ if ed_image < ed_video:
1015
+ t, h, w = (
1016
+ image_grid_thw[image_index][0],
1017
+ image_grid_thw[image_index][1],
1018
+ image_grid_thw[image_index][2],
1019
+ )
1020
+ image_index += 1
1021
+ remain_images -= 1
1022
+ ed = ed_image
1023
+
1024
+ else:
1025
+ t, h, w = (
1026
+ video_grid_thw[video_index][0],
1027
+ video_grid_thw[video_index][1],
1028
+ video_grid_thw[video_index][2],
1029
+ )
1030
+ video_index += 1
1031
+ remain_videos -= 1
1032
+ ed = ed_video
1033
+ llm_grid_t, llm_grid_h, llm_grid_w = (
1034
+ t.item(),
1035
+ h.item() // spatial_merge_size,
1036
+ w.item() // spatial_merge_size,
1037
+ )
1038
+ text_len = ed - st
1039
+
1040
+ st_idx = llm_pos_ids_list[-1].max() + 1 if len(llm_pos_ids_list) > 0 else 0
1041
+ llm_pos_ids_list.append(torch.arange(text_len).view(1, -1).expand(3, -1) + st_idx)
1042
+
1043
+ # t_index is always 0 because llm_grid_t is always 1 (we use timestamps to encode the temporal information for videos)
1044
+ t_index = torch.arange(llm_grid_t).view(-1, 1).expand(-1, llm_grid_h * llm_grid_w).flatten()
1045
+ h_index = torch.arange(llm_grid_h).view(1, -1, 1).expand(llm_grid_t, -1, llm_grid_w).flatten()
1046
+ w_index = torch.arange(llm_grid_w).view(1, 1, -1).expand(llm_grid_t, llm_grid_h, -1).flatten()
1047
+ llm_pos_ids_list.append(torch.stack([t_index, h_index, w_index]) + text_len + st_idx)
1048
+ st = ed + llm_grid_t * llm_grid_h * llm_grid_w
1049
+
1050
+ if st < len(input_tokens):
1051
+ st_idx = llm_pos_ids_list[-1].max() + 1 if len(llm_pos_ids_list) > 0 else 0
1052
+ text_len = len(input_tokens) - st
1053
+ llm_pos_ids_list.append(torch.arange(text_len).view(1, -1).expand(3, -1) + st_idx)
1054
+
1055
+ llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
1056
+ position_ids[..., i, attention_mask[i] == 1] = llm_positions.to(position_ids.device)
1057
+ mrope_position_deltas.append(llm_positions.max() + 1 - len(total_input_ids[i]))
1058
+ mrope_position_deltas = torch.tensor(mrope_position_deltas, device=input_ids.device).unsqueeze(1)
1059
+ return position_ids, mrope_position_deltas
1060
+ else:
1061
+ if attention_mask is not None:
1062
+ position_ids = attention_mask.long().cumsum(-1) - 1
1063
+ position_ids.masked_fill_(attention_mask == 0, 1)
1064
+ position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(attention_mask.device)
1065
+ max_position_ids = position_ids.max(0, keepdim=False)[0].max(-1, keepdim=True)[0]
1066
+ mrope_position_deltas = max_position_ids + 1 - attention_mask.shape[-1]
1067
+ else:
1068
+ position_ids = (
1069
+ torch.arange(input_ids.shape[1], device=input_ids.device)
1070
+ .view(1, 1, -1)
1071
+ .expand(3, input_ids.shape[0], -1)
1072
+ )
1073
+ mrope_position_deltas = torch.zeros(
1074
+ [input_ids.shape[0], 1],
1075
+ device=input_ids.device,
1076
+ dtype=input_ids.dtype,
1077
+ )
1078
+
1079
+ return position_ids, mrope_position_deltas
1080
+
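+ # Worked example for get_rope_index (assuming spatial_merge_size = 2): with 5 tokens preceding
+ # the image placeholders and one image of grid (1, 8, 8) -> a 4 x 4 merged patch grid, the
+ # leading tokens get positions 0..4 on all three (t, h, w) axes, the 16 image tokens all get
+ # t = 5 with h and w ranging over 5..8, and the next text token resumes at position 9.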
1081
+ def get_video_features(
1082
+ self, pixel_values_videos: torch.FloatTensor, video_grid_thw: Optional[torch.LongTensor] = None
1083
+ ):
1084
+ """
1085
+ Encodes videos into continuous embeddings that can be forwarded to the language model. The deepstack visual features are also returned.
1086
+
1087
+ Args:
1088
+ pixel_values_videos (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`):
1089
+ The tensors corresponding to the input videos.
1090
+ video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
1091
+ The temporal, height and width of feature shape of each video in LLM.
1092
+ """
1093
+ # Same implementation as for images
1094
+ return self.get_image_features(pixel_values_videos, video_grid_thw)
1095
+
1096
+ def get_image_features(self, pixel_values: torch.FloatTensor, image_grid_thw: Optional[torch.LongTensor] = None):
1097
+ """
1098
+ Encodes images into continuous embeddings that can be forwarded to the language model. The deepstack visual features are also returned.
1099
+
1100
+ Args:
1101
+ pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`):
1102
+ The tensors corresponding to the input images.
1103
+ image_grid_thw (`torch.LongTensor` of shape `(num_images, 3)`, *optional*):
1104
+ The temporal, height and width of feature shape of each image in LLM.
1105
+ """
1106
+ pixel_values = pixel_values.type(self.visual.dtype)
1107
+ image_embeds, deepstack_image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
1108
+ split_sizes = (image_grid_thw.prod(-1) // self.visual.spatial_merge_size**2).tolist()
1109
+ image_embeds = torch.split(image_embeds, split_sizes)
1110
+ return image_embeds, deepstack_image_embeds
1111
+
1112
+ def get_placeholder_mask(
1113
+ self,
1114
+ input_ids: torch.LongTensor,
1115
+ inputs_embeds: torch.FloatTensor,
1116
+ image_features: Optional[torch.FloatTensor] = None,
1117
+ video_features: Optional[torch.FloatTensor] = None,
1118
+ ):
1119
+ """
1120
+ Obtains multimodal placeholder mask from `input_ids` or `inputs_embeds`, and checks that the placeholder token count is
1121
+ equal to the length of multimodal features. If the lengths are different, an error is raised.
1122
+ """
1123
+ if input_ids is None:
1124
+ special_image_mask = inputs_embeds == self.get_input_embeddings()(
1125
+ torch.tensor(self.config.image_token_id, dtype=torch.long, device=inputs_embeds.device)
1126
+ )
1127
+ special_image_mask = special_image_mask.all(-1)
1128
+ special_video_mask = inputs_embeds == self.get_input_embeddings()(
1129
+ torch.tensor(self.config.video_token_id, dtype=torch.long, device=inputs_embeds.device)
1130
+ )
1131
+ special_video_mask = special_video_mask.all(-1)
1132
+ else:
1133
+ special_image_mask = input_ids == self.config.image_token_id
1134
+ special_video_mask = input_ids == self.config.video_token_id
1135
+
1136
+ n_image_tokens = special_image_mask.sum()
1137
+ special_image_mask = special_image_mask.unsqueeze(-1).expand_as(inputs_embeds).to(inputs_embeds.device)
1138
+ if image_features is not None and inputs_embeds[special_image_mask].numel() != image_features.numel():
1139
+ raise ValueError(
1140
+ f"Image features and image tokens do not match: tokens: {n_image_tokens}, features {image_features.shape[0]}"
1141
+ )
1142
+
1143
+ n_video_tokens = special_video_mask.sum()
1144
+ special_video_mask = special_video_mask.unsqueeze(-1).expand_as(inputs_embeds).to(inputs_embeds.device)
1145
+ if video_features is not None and inputs_embeds[special_video_mask].numel() != video_features.numel():
1146
+ raise ValueError(
1147
+ f"Videos features and video tokens do not match: tokens: {n_video_tokens}, features {video_features.shape[0]}"
1148
+ )
1149
+
1150
+ return special_image_mask, special_video_mask
1151
+
1152
+ @auto_docstring
1153
+ @check_model_inputs()
1154
+ def forward(
1155
+ self,
1156
+ input_ids: torch.LongTensor = None,
1157
+ attention_mask: Optional[torch.Tensor] = None,
1158
+ position_ids: Optional[torch.LongTensor] = None,
1159
+ past_key_values: Optional[Cache] = None,
1160
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1161
+ pixel_values: Optional[torch.Tensor] = None,
1162
+ pixel_values_videos: Optional[torch.FloatTensor] = None,
1163
+ image_grid_thw: Optional[torch.LongTensor] = None,
1164
+ video_grid_thw: Optional[torch.LongTensor] = None,
1165
+ cache_position: Optional[torch.LongTensor] = None,
1166
+ **kwargs: Unpack[TransformersKwargs],
1167
+ ) -> Union[tuple, Qwen3VLModelOutputWithPast]:
1168
+ r"""
1169
+ image_grid_thw (`torch.LongTensor` of shape `(num_images, 3)`, *optional*):
1170
+ The temporal, height and width of feature shape of each image in LLM.
1171
+ video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
1172
+ The temporal, height and width of feature shape of each video in LLM.
1173
+ """
1174
+ if (input_ids is None) ^ (inputs_embeds is not None):
1175
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
1176
+
1177
+ if inputs_embeds is None:
1178
+ inputs_embeds = self.get_input_embeddings()(input_ids)
1179
+
1180
+ image_mask = None
1181
+ video_mask = None
1182
+
1183
+ if pixel_values is not None:
1184
+ image_embeds, deepstack_image_embeds = self.get_image_features(pixel_values, image_grid_thw)
1185
+ image_embeds = torch.cat(image_embeds, dim=0).to(inputs_embeds.device, inputs_embeds.dtype)
1186
+ image_mask, _ = self.get_placeholder_mask(
1187
+ input_ids, inputs_embeds=inputs_embeds, image_features=image_embeds
1188
+ )
1189
+ inputs_embeds = inputs_embeds.masked_scatter(image_mask, image_embeds)
1190
+
1191
+ if pixel_values_videos is not None:
1192
+ video_embeds, deepstack_video_embeds = self.get_video_features(pixel_values_videos, video_grid_thw)
1193
+ video_embeds = torch.cat(video_embeds, dim=0).to(inputs_embeds.device, inputs_embeds.dtype)
1194
+ _, video_mask = self.get_placeholder_mask(
1195
+ input_ids, inputs_embeds=inputs_embeds, video_features=video_embeds
1196
+ )
1197
+ inputs_embeds = inputs_embeds.masked_scatter(video_mask, video_embeds)
1198
+
1199
+
1200
+ visual_pos_masks = None
1201
+ deepstack_visual_embeds = None
1202
+ if image_mask is not None and video_mask is not None:
1203
+ # aggregate visual_pos_masks and deepstack_visual_embeds
1204
+ image_mask = image_mask[..., 0]
1205
+ video_mask = video_mask[..., 0]
1206
+ visual_pos_masks = image_mask | video_mask
1207
+ deepstack_visual_embeds = []
1208
+ image_mask_joint = image_mask[visual_pos_masks]
1209
+ video_mask_joint = video_mask[visual_pos_masks]
1210
+ for img_embed, vid_embed in zip(deepstack_image_embeds, deepstack_video_embeds):
1211
+ embed_joint = img_embed.new_zeros(visual_pos_masks.sum(), img_embed.shape[-1]).to(img_embed.device)
1212
+ embed_joint[image_mask_joint, :] = img_embed
1213
+ embed_joint[video_mask_joint, :] = vid_embed
1214
+ deepstack_visual_embeds.append(embed_joint)
1215
+ elif image_mask is not None:
1216
+ image_mask = image_mask[..., 0]
1217
+ visual_pos_masks = image_mask
1218
+ deepstack_visual_embeds = deepstack_image_embeds
1219
+ elif video_mask is not None:
1220
+ video_mask = video_mask[..., 0]
1221
+ visual_pos_masks = video_mask
1222
+ deepstack_visual_embeds = deepstack_video_embeds
1223
+
1224
+ if position_ids is None:
1225
+ past_key_values_length = 0 if past_key_values is None else past_key_values.get_seq_length()
1226
+ if self.rope_deltas is None or past_key_values_length == 0:
1227
+ position_ids, rope_deltas = self.get_rope_index(
1228
+ input_ids,
1229
+ image_grid_thw,
1230
+ video_grid_thw,
1231
+ attention_mask=attention_mask,
1232
+ )
1233
+ self.rope_deltas = rope_deltas
1234
+ # then use the prev pre-calculated rope-deltas to get the correct position ids
1235
+ else:
1236
+ batch_size, seq_length, _ = inputs_embeds.shape
1237
+ delta = (past_key_values_length + self.rope_deltas).to(inputs_embeds.device)
1238
+ position_ids = torch.arange(seq_length, device=inputs_embeds.device)
1239
+ position_ids = position_ids.view(1, -1).expand(batch_size, -1)
1240
+ if cache_position is not None: # otherwise `deltas` is an int `0`
1241
+ delta = delta.repeat_interleave(batch_size // delta.shape[0], dim=0)
1242
+ position_ids = position_ids.add(delta)
1243
+ position_ids = position_ids.unsqueeze(0).expand(3, -1, -1)
1244
+
1245
+ outputs = self.language_model(
1246
+ input_ids=input_ids,
1247
+ position_ids=position_ids,
1248
+ attention_mask=attention_mask,
1249
+ past_key_values=past_key_values,
1250
+ inputs_embeds=inputs_embeds,
1251
+ cache_position=cache_position,
1252
+ visual_pos_masks=visual_pos_masks,
1253
+ deepstack_visual_embeds=deepstack_visual_embeds,
1254
+ **kwargs,
1255
+ )
1256
+
1257
+ return Qwen3VLModelOutputWithPast(
1258
+ last_hidden_state=outputs.last_hidden_state,
1259
+ past_key_values=outputs.past_key_values,
1260
+ rope_deltas=self.rope_deltas,
1261
+ )
1262
+
1263
+
1264
+ @dataclass
1265
+ @auto_docstring(
1266
+ custom_intro="""
1267
+ Base class for Qwen3VL causal language model (or autoregressive) outputs.
1268
+ """
1269
+ )
1270
+ class Qwen3VLCausalLMOutputWithPast(ModelOutput):
1271
+ r"""
1272
+ loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
1273
+ Language modeling loss (for next-token prediction).
1274
+ logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`):
1275
+ Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
1276
+ past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
1277
+ It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
1278
+
1279
+ Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
1280
+ `past_key_values` input) to speed up sequential decoding.
1281
+ rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
1282
+ The rope index difference between sequence length and multimodal rope.
1283
+ """
1284
+
1285
+ loss: Optional[torch.FloatTensor] = None
1286
+ logits: Optional[torch.FloatTensor] = None
1287
+ past_key_values: Optional[Cache] = None
1288
+ hidden_states: Optional[tuple[torch.FloatTensor]] = None
1289
+ attentions: Optional[tuple[torch.FloatTensor]] = None
1290
+ rope_deltas: Optional[torch.LongTensor] = None
1291
+
1292
+
1293
+ class Qwen3VLForConditionalGeneration(Qwen3VLPreTrainedModel, GenerationMixin):
1294
+ _checkpoint_conversion_mapping = {}
1295
+ _tied_weights_keys = {"lm_head.weight": "model.language_model.embed_tokens.weight"}
1296
+ # Reference: fix gemma3 grad acc #37208
1297
+ accepts_loss_kwargs = False
1298
+ config: Qwen3VLConfig
1299
+
1300
+ def __init__(self, config):
1301
+ super().__init__(config)
1302
+ self.model = Qwen3VLModel(config)
1303
+ self.lm_head = nn.Linear(config.text_config.hidden_size, config.text_config.vocab_size, bias=False)
1304
+
1305
+ self.post_init()
1306
+
1307
+ def get_input_embeddings(self):
1308
+ return self.model.get_input_embeddings()
1309
+
1310
+ def set_input_embeddings(self, value):
1311
+ self.model.set_input_embeddings(value)
1312
+
1313
+ def set_decoder(self, decoder):
1314
+ self.model.set_decoder(decoder)
1315
+
1316
+ def get_decoder(self):
1317
+ return self.model.get_decoder()
1318
+
1319
+ def get_video_features(
1320
+ self, pixel_values_videos: torch.FloatTensor, video_grid_thw: Optional[torch.LongTensor] = None
1321
+ ):
1322
+ return self.model.get_video_features(pixel_values_videos, video_grid_thw)
1323
+
1324
+ def get_image_features(self, pixel_values: torch.FloatTensor, image_grid_thw: Optional[torch.LongTensor] = None):
1325
+ return self.model.get_image_features(pixel_values, image_grid_thw)
1326
+
1327
+ # Make modules available through conditional class for BC
1328
+ @property
1329
+ def language_model(self):
1330
+ return self.model.language_model
1331
+
1332
+ @property
1333
+ def visual(self):
1334
+ return self.model.visual
1335
+
1336
+ @check_model_inputs()
1337
+ def forward(
1338
+ self,
1339
+ input_ids: torch.LongTensor = None,
1340
+ attention_mask: Optional[torch.Tensor] = None,
1341
+ position_ids: Optional[torch.LongTensor] = None,
1342
+ past_key_values: Optional[Cache] = None,
1343
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1344
+ labels: Optional[torch.LongTensor] = None,
1345
+ pixel_values: Optional[torch.Tensor] = None,
1346
+ pixel_values_videos: Optional[torch.FloatTensor] = None,
1347
+ image_grid_thw: Optional[torch.LongTensor] = None,
1348
+ video_grid_thw: Optional[torch.LongTensor] = None,
1349
+ cache_position: Optional[torch.LongTensor] = None,
1350
+ logits_to_keep: Union[int, torch.Tensor] = 0,
1351
+ **kwargs: Unpack[TransformersKwargs],
1352
+ ) -> Union[tuple, Qwen3VLCausalLMOutputWithPast]:
1353
+
1354
+ outputs = self.model(
1355
+ input_ids=input_ids,
1356
+ pixel_values=pixel_values,
1357
+ pixel_values_videos=pixel_values_videos,
1358
+ image_grid_thw=image_grid_thw,
1359
+ video_grid_thw=video_grid_thw,
1360
+ position_ids=position_ids,
1361
+ attention_mask=attention_mask,
1362
+ past_key_values=past_key_values,
1363
+ inputs_embeds=inputs_embeds,
1364
+ cache_position=cache_position,
1365
+ **kwargs,
1366
+ )
1367
+
1368
+ hidden_states = outputs[0]
1369
+
1370
+ # Compute logits over the full sequence (`logits_to_keep` is accepted for API compatibility but not used here)
1371
+ logits = self.lm_head(hidden_states)
1372
+
1373
+ loss = None
1374
+ if labels is not None:
1375
+ loss = self.loss_function(logits=logits, labels=labels[..., -logits.shape[1] :], vocab_size=self.config.text_config.vocab_size)
1376
+
1377
+ return Qwen3VLCausalLMOutputWithPast(
1378
+ loss=loss,
1379
+ logits=logits,
1380
+ past_key_values=outputs.past_key_values,
1381
+ rope_deltas=outputs.rope_deltas,
1382
+ )
1383
+
1384
+ def prepare_inputs_for_generation(
1385
+ self,
1386
+ input_ids,
1387
+ past_key_values=None,
1388
+ attention_mask=None,
1389
+ inputs_embeds=None,
1390
+ cache_position=None,
1391
+ position_ids=None,
1392
+ use_cache=True,
1393
+ pixel_values=None,
1394
+ pixel_values_videos=None,
1395
+ image_grid_thw=None,
1396
+ video_grid_thw=None,
1397
+ **kwargs,
1398
+ ):
1399
+ # Overwritten -- in specific circumstances we don't want to forward image inputs to the model
1400
+
1401
+ model_inputs = super().prepare_inputs_for_generation(
1402
+ input_ids,
1403
+ past_key_values=past_key_values,
1404
+ attention_mask=attention_mask,
1405
+ inputs_embeds=inputs_embeds,
1406
+ cache_position=cache_position,
1407
+ position_ids=position_ids,
1408
+ pixel_values=pixel_values,
1409
+ pixel_values_videos=pixel_values_videos,
1410
+ image_grid_thw=image_grid_thw,
1411
+ video_grid_thw=video_grid_thw,
1412
+ use_cache=use_cache,
1413
+ **kwargs,
1414
+ )
1415
+
1431
+ # Qwen3VL position_ids are prepared with rope_deltas
1432
+ if position_ids is None:
1433
+ # Calculate RoPE index once per generation in the pre-fill stage only.
1434
+ # When compiling, we can't check tensor values, so we check only the input length.
1435
+ # It is safe to assume that `length != 1` means we're in pre-fill, because compiled
1436
+ # models currently cannot do assisted decoding
1437
+ if model_inputs["cache_position"][0] == 0 or self.model.rope_deltas is None:
1438
+ vision_positions, rope_deltas = self.model.get_rope_index(
1439
+ model_inputs.get("input_ids", None),
1440
+ image_grid_thw=image_grid_thw,
1441
+ video_grid_thw=video_grid_thw,
1442
+ attention_mask=attention_mask,
1443
+ )
1444
+ self.model.rope_deltas = rope_deltas
1445
+ # then use the prev pre-calculated rope-deltas to get the correct position ids
1446
+ elif "position_ids" in model_inputs:
1447
+ batch_size, seq_length = model_inputs["position_ids"].shape
1448
+ device = model_inputs["position_ids"].device
1449
+ position_ids = torch.arange(seq_length, device=device)
1450
+ position_ids = position_ids.view(1, 1, -1).expand(3, batch_size, -1)
1451
+ delta = cache_position[0] + self.model.rope_deltas
1452
+ delta = delta.repeat_interleave(batch_size // delta.shape[0], dim=0)
1453
+ vision_positions = position_ids + delta.expand_as(position_ids)
1454
+
1455
+ # Concatenate "text + vision" positions into [4, bs, seq-len]
1456
+ text_positions = model_inputs["position_ids"][None, ...]
1457
+ model_inputs["position_ids"] = torch.cat([text_positions, vision_positions], dim=0)
1458
+
1459
+ if cache_position[0] != 0:
1460
+ model_inputs["pixel_values"] = None
1461
+ model_inputs["pixel_values_videos"] = None
1462
+
1463
+ return model_inputs
1464
+
1465
+ def _get_image_nums_and_video_nums(
1466
+ self,
1467
+ input_ids: Optional[torch.LongTensor],
1468
+ inputs_embeds: Optional[torch.Tensor] = None,
1469
+ ) -> tuple[torch.Tensor, torch.Tensor]:
1470
+ """
1471
+ Get the number of images and videos for each sample to calculate the separation length of the sample tensor.
1472
+ These parameters are not passed through the processor to avoid unpredictable impacts from interface modifications.
1473
+
1474
+ Args:
1475
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
1476
+ Indices of input sequence tokens in the vocabulary.
1477
+
1478
+ Returns:
1479
+ image_nums (`torch.LongTensor` of shape `(batch_size,)`): number of images in each sample
1480
+ video_nums (`torch.LongTensor` of shape `(batch_size,)`): number of videos in each sample
1481
+ """
1482
+ image_token_id = self.config.image_token_id
1483
+ video_token_id = self.config.video_token_id
1484
+ vision_start_token_id = self.config.vision_start_token_id
1485
+
1486
+ if inputs_embeds is not None:
1487
+ vision_start_mask = (
1488
+ inputs_embeds
1489
+ == self.get_input_embeddings()(
1490
+ torch.tensor(vision_start_token_id, dtype=torch.long, device=inputs_embeds.device)
1491
+ )
1492
+ )[..., 0]
1493
+ image_mask = (
1494
+ inputs_embeds
1495
+ == self.get_input_embeddings()(
1496
+ torch.tensor(image_token_id, dtype=torch.long, device=inputs_embeds.device)
1497
+ )
1498
+ )[..., 0]
1499
+ video_mask = (
1500
+ inputs_embeds
1501
+ == self.get_input_embeddings()(
1502
+ torch.tensor(video_token_id, dtype=torch.long, device=inputs_embeds.device)
1503
+ )
1504
+ )[..., 0]
1505
+ else:
1506
+ vision_start_mask = input_ids == vision_start_token_id
1507
+ image_mask = input_ids == image_token_id
1508
+ video_mask = input_ids == video_token_id
1509
+
1510
+ vision_first_mask = torch.roll(vision_start_mask, shifts=1, dims=1)
1511
+ image_nums = torch.sum(vision_first_mask & image_mask, dim=1)
1512
+ video_nums = torch.sum(vision_first_mask & video_mask, dim=1)
1513
+
1514
+ return image_nums, video_nums
1515
+
1516
+ def _expand_inputs_for_generation(
1517
+ self,
1518
+ expand_size: int = 1,
1519
+ is_encoder_decoder: bool = False,
1520
+ input_ids: Optional[torch.LongTensor] = None,
1521
+ **model_kwargs,
1522
+ ) -> tuple[torch.LongTensor, dict[str, Any]]:
1523
+ # Overwritten -- Support for expanding tensors without a batch size dimension
1524
+ # e.g., pixel_values, image_grid_thw, pixel_values_videos, video_grid_thw, second_per_grid_t
1525
+ # pixel_values.shape[0] is sum(seqlen_images for samples)
1526
+ # image_grid_thw.shape[0] is sum(num_images for samples)
1527
+
1528
+ if expand_size == 1:
1529
+ return input_ids, model_kwargs
1530
+
1531
+ visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw"]
1532
+
1533
+ def _expand_dict_for_generation_visual(dict_to_expand):
1534
+ image_grid_thw = model_kwargs.get("image_grid_thw", None)
1535
+ video_grid_thw = model_kwargs.get("video_grid_thw", None)
1536
+ image_nums, video_nums = self._get_image_nums_and_video_nums(
1537
+ input_ids, inputs_embeds=model_kwargs.get("inputs_embeds", None)
1538
+ )
1539
+
1540
+ # video_nums: (batch_size,)
1541
+ # since video_nums is derived from the input_ids (it counts vision_start tokens),
1542
+ # but Qwen3VL inserts a vision_start before every frame of every video, so we recover the real video_nums from video_grid_thw
1543
+ if video_grid_thw is not None:
1544
+ cumulative_frame_counts = torch.cumsum(video_grid_thw[:, 0], dim=0)
1545
+ cumulative_token_video_counts = torch.cumsum(video_nums, dim=0)
1546
+ # Find video boundaries in cumulative_frame_counts
1547
+ video_boundary_indices = torch.searchsorted(cumulative_frame_counts, cumulative_token_video_counts)
1548
+ # example: video_boundary_indices = [3, 5] means video_nums = [4, 2]
1549
+ video_nums = torch.diff(torch.cat([-video_boundary_indices.new_ones(1), video_boundary_indices]))
1550
+
1551
+ def _repeat_interleave_samples(x, lengths, repeat_times):
1552
+ samples = torch.split(x, lengths)
1553
+ repeat_args = [repeat_times] + [1] * (x.dim() - 1)
1554
+ result = torch.cat([sample.repeat(*repeat_args) for sample in samples], dim=0)
1555
+ return result
1556
+
1557
+ for key in dict_to_expand:
1558
+ if key == "pixel_values":
1559
+ # split images into samples
1560
+ samples = torch.split(image_grid_thw, list(image_nums))
1561
+ # compute the sequence length of images for each sample
1562
+ lengths = [torch.prod(sample, dim=1).sum() for sample in samples]
1563
+ dict_to_expand[key] = _repeat_interleave_samples(
1564
+ dict_to_expand[key], lengths=lengths, repeat_times=expand_size
1565
+ )
1566
+ elif key == "image_grid_thw":
1567
+ # get the num of images for each sample
1568
+ lengths = list(image_nums)
1569
+ dict_to_expand[key] = _repeat_interleave_samples(
1570
+ dict_to_expand[key], lengths=lengths, repeat_times=expand_size
1571
+ )
1572
+ elif key == "pixel_values_videos":
1573
+ samples = torch.split(video_grid_thw, list(video_nums))
1574
+ lengths = [torch.prod(sample, dim=1).sum() for sample in samples]
1575
+ dict_to_expand[key] = _repeat_interleave_samples(
1576
+ dict_to_expand[key], lengths=lengths, repeat_times=expand_size
1577
+ )
1578
+ elif key == "video_grid_thw":
1579
+ lengths = list(video_nums)
1580
+ dict_to_expand[key] = _repeat_interleave_samples(
1581
+ dict_to_expand[key], lengths=lengths, repeat_times=expand_size
1582
+ )
1583
+ return dict_to_expand
1584
+
1585
+ def _expand_dict_for_generation(dict_to_expand):
1586
+ for key in dict_to_expand:
1587
+ if (
1588
+ key != "cache_position"
1589
+ and dict_to_expand[key] is not None
1590
+ and isinstance(dict_to_expand[key], torch.Tensor)
1591
+ and key not in visual_keys
1592
+ ):
1593
+ dict_to_expand[key] = dict_to_expand[key].repeat_interleave(expand_size, dim=0)
1594
+ return dict_to_expand
1595
+
1596
+ model_kwargs = _expand_dict_for_generation_visual(model_kwargs)
1597
+
1598
+ if input_ids is not None:
1599
+ input_ids = input_ids.repeat_interleave(expand_size, dim=0)
1600
+
1601
+ model_kwargs = _expand_dict_for_generation(model_kwargs)
1602
+
1603
+ if is_encoder_decoder:
1604
+ if model_kwargs.get("encoder_outputs") is None:
1605
+ raise ValueError("If `is_encoder_decoder` is True, make sure that `encoder_outputs` is defined.")
1606
+ model_kwargs["encoder_outputs"] = _expand_dict_for_generation(model_kwargs["encoder_outputs"])
1607
+
1608
+ return input_ids, model_kwargs
1609
+
1610
+
1611
+ __all__ = [
1612
+ "Qwen3VLVisionModel",
1613
+ "Qwen3VLForConditionalGeneration",
1614
+ "Qwen3VLModel",
1615
+ "Qwen3VLPreTrainedModel",
1616
+ "Qwen3VLTextModel",
1617
+ ]
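
The cumsum/searchsorted recovery of the real per-sample video counts in `_expand_dict_for_generation_visual` above is compact, so here is a small self-contained sketch of the same step with made-up frame counts (the tensors below are illustrative, not taken from this checkpoint):

import torch

# Hypothetical batch: sample 1 holds two videos of 3 and 2 frames, sample 2 holds one video of 4 frames.
video_grid_thw = torch.tensor([[3, 2, 2], [2, 2, 2], [4, 2, 2]])  # one row per video: (t, h, w)
video_nums = torch.tensor([5, 4])  # vision_start-derived counts are per frame: 3 + 2 and 4

cumulative_frame_counts = torch.cumsum(video_grid_thw[:, 0], dim=0)         # tensor([3, 5, 9])
cumulative_token_video_counts = torch.cumsum(video_nums, dim=0)             # tensor([5, 9])
video_boundary_indices = torch.searchsorted(cumulative_frame_counts,
                                            cumulative_token_video_counts)  # tensor([1, 2])
real_video_nums = torch.diff(
    torch.cat([-video_boundary_indices.new_ones(1), video_boundary_indices]))
print(real_video_nums)  # tensor([2, 1]): two videos in sample 1, one in sample 2
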
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
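
For reference, a minimal sketch (the local path below is an assumption) of how the special tokens declared above resolve once this folder is loaded with the standard tokenizer API:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./path/to/this/checkpoint")  # hypothetical local clone of this repo
print(tok.eos_token, tok.eos_token_id)              # "<|im_end|>" and its id
print(tok.pad_token, tok.pad_token_id)              # "<|endoftext|>" and its id
print(tok.convert_tokens_to_ids("<|image_pad|>"))   # id of the image placeholder token
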
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:78113b4ebba2cf35807c8b5277d635e4940fee06c39a0eda6d913c7c7f9edbf1
3
+ size 11815343
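
tokenizer.json is stored as a Git LFS pointer (spec version, sha256 oid, byte size) rather than the file contents. A small sketch, assuming the blob has been resolved to a local file, of verifying a download against the pointer above:

import hashlib
import os

path = "tokenizer.json"  # hypothetical local copy of the resolved LFS blob
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)
print(digest.hexdigest())     # should match the hex after "sha256:" in the pointer
print(os.path.getsize(path))  # should match the recorded size
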
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1475be9f09ec148da85bbe25c4595c6416527ff01e26fb2976cc14377b5c397d
3
+ size 11351594
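
trainer_state.json (also an LFS pointer here) is the Trainer's saved state. A hedged sketch of inspecting it, assuming the blob has been resolved locally; the keys below are the usual Trainer fields, not verified against this particular file:

import json

with open("trainer_state.json") as f:  # hypothetical resolved local copy
    state = json.load(f)
print(state.get("global_step"))
for entry in state.get("log_history", [])[:3]:
    print(entry)  # typically per-step records with loss, learning_rate, epoch
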
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c81a80ebcd627171a70a382e22a64c162c34370fa9d42260e3bf782beb3383ae
3
+ size 7121
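
training_args.bin is the pickled TrainingArguments object saved by the Trainer. A hedged sketch of inspecting it (it is a pickle, so it needs weights_only=False and a compatible transformers install, and should only be loaded if you trust the file):

import torch

args = torch.load("training_args.bin", weights_only=False)  # hypothetical local copy
print(type(args).__name__)  # typically TrainingArguments or a subclass
print(args.learning_rate, args.per_device_train_batch_size)
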
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
zero_to_fp32.py ADDED
@@ -0,0 +1,760 @@
1
+ #!/usr/bin/env python
2
+
3
+ # Copyright (c) Microsoft Corporation.
4
+ # SPDX-License-Identifier: Apache-2.0
5
+
6
+ # DeepSpeed Team
7
+
8
+ # This script extracts fp32 consolidated weights from ZeRO 1, 2 and 3 DeepSpeed checkpoints. It gets
9
+ # copied into the top level checkpoint dir, so the user can easily do the conversion at any point in
10
+ # the future. Once extracted, the weights don't require DeepSpeed and can be used in any
11
+ # application.
12
+ #
13
+ # example:
14
+ # python zero_to_fp32.py . output_dir/
15
+ # or
16
+ # python zero_to_fp32.py . output_dir/ --safe_serialization
17
+
18
+ import argparse
19
+ import torch
20
+ import glob
21
+ import math
22
+ import os
23
+ import re
24
+ import gc
25
+ import json
26
+ import numpy as np
27
+ from tqdm import tqdm
28
+ from collections import OrderedDict
29
+ from dataclasses import dataclass
30
+
31
+ # while this script doesn't use deepspeed to recover data, since the checkpoints are pickled with
32
+ # DeepSpeed data structures it has to be available in the current python environment.
33
+ from deepspeed.utils import logger
34
+ from deepspeed.checkpoint.constants import (DS_VERSION, OPTIMIZER_STATE_DICT, SINGLE_PARTITION_OF_FP32_GROUPS,
35
+ FP32_FLAT_GROUPS, ZERO_STAGE, PARTITION_COUNT, PARAM_SHAPES, BUFFER_NAMES,
36
+ FROZEN_PARAM_SHAPES, FROZEN_PARAM_FRAGMENTS)
37
+
38
+
39
+ @dataclass
40
+ class zero_model_state:
41
+ buffers: dict()
42
+ param_shapes: dict()
43
+ shared_params: list
44
+ ds_version: int
45
+ frozen_param_shapes: dict()
46
+ frozen_param_fragments: dict()
47
+
48
+
49
+ debug = 0
50
+
51
+ # load to cpu
52
+ device = torch.device('cpu')
53
+
54
+
55
+ def atoi(text):
56
+ return int(text) if text.isdigit() else text
57
+
58
+
59
+ def natural_keys(text):
60
+ '''
61
+ alist.sort(key=natural_keys) sorts in human order
62
+ http://nedbatchelder.com/blog/200712/human_sorting.html
63
+ (See Toothy's implementation in the comments)
64
+ '''
65
+ return [atoi(c) for c in re.split(r'(\d+)', text)]
66
+
67
+
68
+ def get_model_state_file(checkpoint_dir, zero_stage):
69
+ if not os.path.isdir(checkpoint_dir):
70
+ raise FileNotFoundError(f"Directory '{checkpoint_dir}' doesn't exist")
71
+
72
+ # there should be only one file
73
+ if zero_stage <= 2:
74
+ file = os.path.join(checkpoint_dir, "mp_rank_00_model_states.pt")
75
+ elif zero_stage == 3:
76
+ file = os.path.join(checkpoint_dir, "zero_pp_rank_0_mp_rank_00_model_states.pt")
77
+
78
+ if not os.path.exists(file):
79
+ raise FileNotFoundError(f"can't find model states file at '{file}'")
80
+
81
+ return file
82
+
83
+
84
+ def get_checkpoint_files(checkpoint_dir, glob_pattern):
85
+ # XXX: need to test that this simple glob rule works for multi-node setup too
86
+ ckpt_files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)), key=natural_keys)
87
+
88
+ if len(ckpt_files) == 0:
89
+ raise FileNotFoundError(f"can't find {glob_pattern} files in directory '{checkpoint_dir}'")
90
+
91
+ return ckpt_files
92
+
93
+
94
+ def get_optim_files(checkpoint_dir):
95
+ return get_checkpoint_files(checkpoint_dir, "*_optim_states.pt")
96
+
97
+
98
+ def get_model_state_files(checkpoint_dir):
99
+ return get_checkpoint_files(checkpoint_dir, "*_model_states.pt")
100
+
101
+
102
+ def parse_model_states(files):
103
+ zero_model_states = []
104
+ for file in files:
105
+ state_dict = torch.load(file, map_location=device, weights_only=False)
106
+
107
+ if BUFFER_NAMES not in state_dict:
108
+ raise ValueError(f"{file} is not a model state checkpoint")
109
+ buffer_names = state_dict[BUFFER_NAMES]
110
+ if debug:
111
+ print("Found buffers:", buffer_names)
112
+
113
+ # recover just the buffers while restoring them to fp32 if they were saved in fp16
114
+ buffers = {k: v.float() for k, v in state_dict["module"].items() if k in buffer_names}
115
+ param_shapes = state_dict[PARAM_SHAPES]
116
+
117
+ # collect parameters that are included in param_shapes
118
+ param_names = []
119
+ for s in param_shapes:
120
+ for name in s.keys():
121
+ param_names.append(name)
122
+
123
+ # update with frozen parameters
124
+ frozen_param_shapes = state_dict.get(FROZEN_PARAM_SHAPES, None)
125
+ if frozen_param_shapes is not None:
126
+ if debug:
127
+ print(f"Found frozen_param_shapes: {frozen_param_shapes}")
128
+ param_names += list(frozen_param_shapes.keys())
129
+
130
+ # handle shared params
131
+ shared_params = [[k, v] for k, v in state_dict["shared_params"].items()]
132
+
133
+ ds_version = state_dict.get(DS_VERSION, None)
134
+
135
+ frozen_param_fragments = state_dict.get(FROZEN_PARAM_FRAGMENTS, None)
136
+
137
+ z_model_state = zero_model_state(buffers=buffers,
138
+ param_shapes=param_shapes,
139
+ shared_params=shared_params,
140
+ ds_version=ds_version,
141
+ frozen_param_shapes=frozen_param_shapes,
142
+ frozen_param_fragments=frozen_param_fragments)
143
+ zero_model_states.append(z_model_state)
144
+
145
+ return zero_model_states
146
+
147
+
148
+ def parse_optim_states(files, ds_checkpoint_dir):
149
+ total_files = len(files)
150
+ state_dicts = []
151
+ for f in tqdm(files, desc='Loading checkpoint shards'):
152
+ state_dict = torch.load(f, map_location=device, mmap=True, weights_only=False)
153
+ # immediately discard the two potentially huge optimizer states, as we only care about the fp32 master weights
154
+ # and also handle the case where it was already removed by another helper script
155
+ state_dict["optimizer_state_dict"].pop("optimizer_state_dict", None)
156
+ state_dicts.append(state_dict)
157
+
158
+ if ZERO_STAGE not in state_dicts[0][OPTIMIZER_STATE_DICT]:
159
+ raise ValueError(f"{files[0]} is not a zero checkpoint")
160
+ zero_stage = state_dicts[0][OPTIMIZER_STATE_DICT][ZERO_STAGE]
161
+ world_size = state_dicts[0][OPTIMIZER_STATE_DICT][PARTITION_COUNT]
162
+
163
+ # For ZeRO-2 each param group can have different partition_count as data parallelism for expert
164
+ # parameters can be different from data parallelism for non-expert parameters. So we can just
165
+ # use the max of the partition_count to get the dp world_size.
166
+
167
+ if type(world_size) is list:
168
+ world_size = max(world_size)
169
+
170
+ if world_size != total_files:
171
+ raise ValueError(
172
+ f"Expected {world_size} of '*_optim_states.pt' under '{ds_checkpoint_dir}' but found {total_files} files. "
173
+ "Possibly due to an overwrite of an old checkpoint, or a checkpoint didn't get saved by one or more processes."
174
+ )
175
+
176
+ # the groups are named differently in each stage
177
+ if zero_stage <= 2:
178
+ fp32_groups_key = SINGLE_PARTITION_OF_FP32_GROUPS
179
+ elif zero_stage == 3:
180
+ fp32_groups_key = FP32_FLAT_GROUPS
181
+ else:
182
+ raise ValueError(f"unknown zero stage {zero_stage}")
183
+
184
+ fp32_flat_groups = [state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key] for i in range(len(state_dicts))]
185
+ return zero_stage, world_size, fp32_flat_groups
186
+
187
+
188
+ def _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters):
189
+ """
190
+ Returns fp32 state_dict reconstructed from ds checkpoint
191
+
192
+ Args:
193
+ - ``ds_checkpoint_dir``: path to the deepspeed checkpoint folder (where the optimizer files are)
194
+
195
+ """
196
+ print(f"Processing zero checkpoint '{ds_checkpoint_dir}'")
197
+
198
+ optim_files = get_optim_files(ds_checkpoint_dir)
199
+ zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
200
+ print(f"Detected checkpoint of type zero stage {zero_stage}, world_size: {world_size}")
201
+
202
+ model_files = get_model_state_files(ds_checkpoint_dir)
203
+
204
+ zero_model_states = parse_model_states(model_files)
205
+ print(f'Parsing checkpoint created by deepspeed=={zero_model_states[0].ds_version}')
206
+
207
+ if zero_stage <= 2:
208
+ return _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states,
209
+ exclude_frozen_parameters)
210
+ elif zero_stage == 3:
211
+ return _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states,
212
+ exclude_frozen_parameters)
213
+
214
+
215
+ def _zero2_merge_frozen_params(state_dict, zero_model_states):
216
+ if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
217
+ return
218
+
219
+ frozen_param_shapes = zero_model_states[0].frozen_param_shapes
220
+ frozen_param_fragments = zero_model_states[0].frozen_param_fragments
221
+
222
+ if debug:
223
+ num_elem = sum(s.numel() for s in frozen_param_shapes.values())
224
+ print(f'rank 0: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
225
+
226
+ wanted_params = len(frozen_param_shapes)
227
+ wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
228
+ avail_numel = sum([p.numel() for p in frozen_param_fragments.values()])
229
+ print(f'Frozen params: Have {avail_numel} numels to process.')
230
+ print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
231
+
232
+ total_params = 0
233
+ total_numel = 0
234
+ for name, shape in frozen_param_shapes.items():
235
+ total_params += 1
236
+ unpartitioned_numel = shape.numel()
237
+ total_numel += unpartitioned_numel
238
+
239
+ state_dict[name] = frozen_param_fragments[name]
240
+
241
+ if debug:
242
+ print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
243
+
244
+ print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
245
+
246
+
247
+ def _has_callable(obj, fn):
248
+ attr = getattr(obj, fn, None)
249
+ return callable(attr)
250
+
251
+
252
+ def _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
253
+ param_shapes = zero_model_states[0].param_shapes
254
+
255
+ # Reconstruction protocol:
256
+ #
257
+ # XXX: document this
258
+
259
+ if debug:
260
+ for i in range(world_size):
261
+ for j in range(len(fp32_flat_groups[0])):
262
+ print(f"{FP32_FLAT_GROUPS}[{i}][{j}].shape={fp32_flat_groups[i][j].shape}")
263
+
264
+ # XXX: memory usage doubles here (zero2)
265
+ num_param_groups = len(fp32_flat_groups[0])
266
+ merged_single_partition_of_fp32_groups = []
267
+ for i in range(num_param_groups):
268
+ merged_partitions = [sd[i] for sd in fp32_flat_groups]
269
+ full_single_fp32_vector = torch.cat(merged_partitions, 0)
270
+ merged_single_partition_of_fp32_groups.append(full_single_fp32_vector)
271
+ avail_numel = sum(
272
+ [full_single_fp32_vector.numel() for full_single_fp32_vector in merged_single_partition_of_fp32_groups])
273
+
274
+ if debug:
275
+ wanted_params = sum([len(shapes) for shapes in param_shapes])
276
+ wanted_numel = sum([sum(shape.numel() for shape in shapes.values()) for shapes in param_shapes])
277
+ # not asserting if there is a mismatch due to possible padding
278
+ print(f"Have {avail_numel} numels to process.")
279
+ print(f"Need {wanted_numel} numels in {wanted_params} params.")
280
+
281
+ # params
282
+ # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
283
+ # out-of-core computing solution
284
+ total_numel = 0
285
+ total_params = 0
286
+ for shapes, full_single_fp32_vector in zip(param_shapes, merged_single_partition_of_fp32_groups):
287
+ offset = 0
288
+ avail_numel = full_single_fp32_vector.numel()
289
+ for name, shape in shapes.items():
290
+
291
+ unpartitioned_numel = shape.numel() if _has_callable(shape, 'numel') else math.prod(shape)
292
+ total_numel += unpartitioned_numel
293
+ total_params += 1
294
+
295
+ if debug:
296
+ print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
297
+ state_dict[name] = full_single_fp32_vector.narrow(0, offset, unpartitioned_numel).view(shape)
298
+ offset += unpartitioned_numel
299
+
300
+ # Z2 started to align to 2*world_size to improve nccl performance. Therefore both offset and
301
+ # avail_numel can differ by anywhere between 0..2*world_size. Due to two unrelated complex
302
+ # paddings performed in the code it's almost impossible to predict the exact numbers w/o the
303
+ # live optimizer object, so we are checking that the numbers are within the right range
304
+ align_to = 2 * world_size
305
+
306
+ def zero2_align(x):
307
+ return align_to * math.ceil(x / align_to)
308
+
309
+ if debug:
310
+ print(f"original offset={offset}, avail_numel={avail_numel}")
311
+
312
+ offset = zero2_align(offset)
313
+ avail_numel = zero2_align(avail_numel)
314
+
315
+ if debug:
316
+ print(f"aligned offset={offset}, avail_numel={avail_numel}")
317
+
318
+ # Sanity check
319
+ if offset != avail_numel:
320
+ raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
321
+
322
+ print(f"Reconstructed fp32 state dict with {total_params} params {total_numel} elements")
323
+
324
+
325
+ def _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states,
326
+ exclude_frozen_parameters):
327
+ state_dict = OrderedDict()
328
+
329
+ # buffers
330
+ buffers = zero_model_states[0].buffers
331
+ state_dict.update(buffers)
332
+ if debug:
333
+ print(f"added {len(buffers)} buffers")
334
+
335
+ if not exclude_frozen_parameters:
336
+ _zero2_merge_frozen_params(state_dict, zero_model_states)
337
+
338
+ _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
339
+
340
+ # recover shared parameters
341
+ for pair in zero_model_states[0].shared_params:
342
+ if pair[1] in state_dict:
343
+ state_dict[pair[0]] = state_dict[pair[1]]
344
+
345
+ return state_dict
346
+
347
+
348
+ def zero3_partitioned_param_info(unpartitioned_numel, world_size):
349
+ remainder = unpartitioned_numel % world_size
350
+ padding_numel = (world_size - remainder) if remainder else 0
351
+ partitioned_numel = math.ceil(unpartitioned_numel / world_size)
352
+ return partitioned_numel, padding_numel
353
+
354
+
355
+ def _zero3_merge_frozen_params(state_dict, world_size, zero_model_states):
356
+ if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
357
+ return
358
+
359
+ if debug:
360
+ for i in range(world_size):
361
+ num_elem = sum(s.numel() for s in zero_model_states[i].frozen_param_fragments.values())
362
+ print(f'rank {i}: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
363
+
364
+ frozen_param_shapes = zero_model_states[0].frozen_param_shapes
365
+ wanted_params = len(frozen_param_shapes)
366
+ wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
367
+ avail_numel = sum([p.numel() for p in zero_model_states[0].frozen_param_fragments.values()]) * world_size
368
+ print(f'Frozen params: Have {avail_numel} numels to process.')
369
+ print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
370
+
371
+ total_params = 0
372
+ total_numel = 0
373
+ for name, shape in zero_model_states[0].frozen_param_shapes.items():
374
+ total_params += 1
375
+ unpartitioned_numel = shape.numel()
376
+ total_numel += unpartitioned_numel
377
+
378
+ param_frags = tuple(model_state.frozen_param_fragments[name] for model_state in zero_model_states)
379
+ state_dict[name] = torch.cat(param_frags, 0).narrow(0, 0, unpartitioned_numel).view(shape)
380
+
381
+ partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
382
+
383
+ if debug:
384
+ print(
385
+ f"Frozen params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
386
+ )
387
+
388
+ print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
389
+
390
+
391
+ class GatheredTensor:
392
+ """
393
+ A pseudo tensor that collects partitioned weights.
394
+ It is more memory efficient when there are multiple groups.
395
+ """
396
+
397
+ def __init__(self, flat_groups, flat_groups_offset, offset, partitioned_numel, shape):
398
+ self.flat_groups = flat_groups
399
+ self.flat_groups_offset = flat_groups_offset
400
+ self.offset = offset
401
+ self.partitioned_numel = partitioned_numel
402
+ self.shape = shape
403
+ self.dtype = self.flat_groups[0][0].dtype
404
+
405
+ def contiguous(self):
406
+ """
407
+ Merge partitioned weights from flat_groups into a single tensor.
408
+ """
409
+ end_idx = self.offset + self.partitioned_numel
410
+ world_size = len(self.flat_groups)
411
+ pad_flat_param_chunks = []
412
+
413
+ for rank_i in range(world_size):
414
+ # for each rank, we need to collect weights from related group/groups
415
+ flat_groups_at_rank_i = self.flat_groups[rank_i]
416
+ start_group_id = None
417
+ end_group_id = None
418
+ for group_id in range(len(self.flat_groups_offset)):
419
+ if self.flat_groups_offset[group_id] <= self.offset < self.flat_groups_offset[group_id + 1]:
420
+ start_group_id = group_id
421
+ if self.flat_groups_offset[group_id] < end_idx <= self.flat_groups_offset[group_id + 1]:
422
+ end_group_id = group_id
423
+ break
424
+ # collect weights from related group/groups
425
+ for group_id in range(start_group_id, end_group_id + 1):
426
+ flat_tensor = flat_groups_at_rank_i[group_id]
427
+ start_offset = self.offset - self.flat_groups_offset[group_id]
428
+ end_offset = min(end_idx, self.flat_groups_offset[group_id + 1]) - self.flat_groups_offset[group_id]
429
+ pad_flat_param_chunks.append(flat_tensor[start_offset:end_offset])
430
+
431
+ # collect weights from all ranks
432
+ pad_flat_param = torch.cat(pad_flat_param_chunks, dim=0)
433
+ param = pad_flat_param[:self.shape.numel()].view(self.shape).contiguous()
434
+ return param
435
+
436
+
437
+ def _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
438
+ param_shapes = zero_model_states[0].param_shapes
439
+ avail_numel = sum([flat_group.numel() for flat_group in fp32_flat_groups[0]]) * world_size
440
+
441
+ # Reconstruction protocol: For zero3 we need to zip the partitions together at boundary of each
442
+ # param, re-consolidating each param, while dealing with padding if any
443
+
444
+ # merge list of dicts, preserving order
445
+ param_shapes = {k: v for d in param_shapes for k, v in d.items()}
446
+
447
+ if debug:
448
+ for i in range(world_size):
449
+ print(f"{FP32_FLAT_GROUPS}[{i}].shape={fp32_flat_groups[i].shape}")
450
+
451
+ wanted_params = len(param_shapes)
452
+ wanted_numel = sum(shape.numel() for shape in param_shapes.values())
453
+ # not asserting if there is a mismatch due to possible padding
454
+ avail_numel = fp32_flat_groups[0].numel() * world_size
455
+ print(f"Trainable params: Have {avail_numel} numels to process.")
456
+ print(f"Trainable params: Need {wanted_numel} numels in {wanted_params} params.")
457
+
458
+ # params
459
+ # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
460
+ # out-of-core computing solution
461
+ offset = 0
462
+ total_numel = 0
463
+ total_params = 0
464
+ flat_groups_offset = [0] + list(np.cumsum([flat_tensor.numel() for flat_tensor in fp32_flat_groups[0]]))
465
+ for name, shape in tqdm(param_shapes.items(), desc='Gathering sharded weights'):
466
+ unpartitioned_numel = shape.numel()
467
+ total_numel += unpartitioned_numel
468
+ total_params += 1
469
+ partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
470
+
471
+ if debug:
472
+ print(
473
+ f"Trainable params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
474
+ )
475
+
476
+ # memory efficient tensor
477
+ tensor = GatheredTensor(fp32_flat_groups, flat_groups_offset, offset, partitioned_numel, shape)
478
+ state_dict[name] = tensor
479
+ offset += partitioned_numel
480
+
481
+ offset *= world_size
482
+
483
+ # Sanity check
484
+ if offset != avail_numel:
485
+ raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
486
+
487
+ print(f"Reconstructed Trainable fp32 state dict with {total_params} params {total_numel} elements")
488
+
489
+
490
+ def _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states,
491
+ exclude_frozen_parameters):
492
+ state_dict = OrderedDict()
493
+
494
+ # buffers
495
+ buffers = zero_model_states[0].buffers
496
+ state_dict.update(buffers)
497
+ if debug:
498
+ print(f"added {len(buffers)} buffers")
499
+
500
+ if not exclude_frozen_parameters:
501
+ _zero3_merge_frozen_params(state_dict, world_size, zero_model_states)
502
+
503
+ _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
504
+
505
+ # recover shared parameters
506
+ for pair in zero_model_states[0].shared_params:
507
+ if pair[1] in state_dict:
508
+ state_dict[pair[0]] = state_dict[pair[1]]
509
+
510
+ return state_dict
511
+
512
+
513
+ def to_torch_tensor(state_dict, return_empty_tensor=False):
514
+ """
515
+ Convert state_dict of GatheredTensor to torch tensor
516
+ """
517
+ torch_state_dict = {}
518
+ converted_tensors = {}
519
+ for name, tensor in state_dict.items():
520
+ tensor_id = id(tensor)
521
+ if tensor_id in converted_tensors: # shared tensors
522
+ shared_tensor = torch_state_dict[converted_tensors[tensor_id]]
523
+ torch_state_dict[name] = shared_tensor
524
+ else:
525
+ converted_tensors[tensor_id] = name
526
+ if return_empty_tensor:
527
+ torch_state_dict[name] = torch.empty(tensor.shape, dtype=tensor.dtype)
528
+ else:
529
+ torch_state_dict[name] = tensor.contiguous()
530
+ return torch_state_dict
531
+
532
+
533
+ def get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir,
534
+ tag=None,
535
+ exclude_frozen_parameters=False,
536
+ lazy_mode=False):
537
+ """
538
+ Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict that can be loaded with
539
+ ``load_state_dict()`` and used for training without DeepSpeed or shared with others, for example
540
+ via a model hub.
541
+
542
+ Args:
543
+ - ``checkpoint_dir``: path to the desired checkpoint folder
544
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in 'latest' file. e.g., ``global_step14``
545
+ - ``exclude_frozen_parameters``: exclude frozen parameters
546
+ - ``lazy_mode``: get state_dict in lazy mode. It returns a dict of pesduo tensor instead of torch tensor, which is more memory efficient.
547
+ Convert the pesduo tensor to torch tensor by ``.contiguous()``
548
+
549
+ Returns:
550
+ - pytorch ``state_dict``
551
+
552
+ A typical usage might be ::
553
+
554
+ from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
555
+ # do the training and checkpoint saving
556
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir) # already on cpu
557
+ model = model.cpu() # move to cpu
558
+ model.load_state_dict(state_dict)
559
+ # submit to model hub or save the model to share with others
560
+
561
+ In this example the ``model`` will no longer be usable in the deepspeed context of the same
562
+ application. i.e. you will need to re-initialize the deepspeed engine, since
563
+ ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
564
+
565
+ If you want it all done for you, use ``load_state_dict_from_zero_checkpoint`` instead.
566
+
567
+ Note: the above usage may not work if your application doesn't have sufficient free CPU memory.
568
+ You may need to use the offline approach using the ``zero_to_fp32.py`` script that is saved with
569
+ the checkpoint. Or you can load state_dict in lazy mode ::
570
+
571
+ from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
572
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, lazy_mode=True) # not on cpu
573
+ for name, lazy_tensor in state_dict.items():
574
+ tensor = lazy_tensor.contiguous() # to cpu
575
+ print(name, tensor)
576
+ # del tensor to release memory if it is no longer in use
577
+ """
578
+ if tag is None:
579
+ latest_path = os.path.join(checkpoint_dir, 'latest')
580
+ if os.path.isfile(latest_path):
581
+ with open(latest_path, 'r') as fd:
582
+ tag = fd.read().strip()
583
+ else:
584
+ raise ValueError(f"Unable to find 'latest' file at {latest_path}")
585
+
586
+ ds_checkpoint_dir = os.path.join(checkpoint_dir, tag)
587
+
588
+ if not os.path.isdir(ds_checkpoint_dir):
589
+ raise FileNotFoundError(f"Directory '{ds_checkpoint_dir}' doesn't exist")
590
+
591
+ state_dict = _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters)
592
+ if lazy_mode:
593
+ return state_dict
594
+ else:
595
+ return to_torch_tensor(state_dict)
596
+
597
+
598
+ def convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir,
599
+ output_dir,
600
+ max_shard_size="5GB",
601
+ safe_serialization=False,
602
+ tag=None,
603
+ exclude_frozen_parameters=False):
604
+ """
605
+ Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict`` file that can be
606
+ loaded with ``torch.load(file)`` + ``load_state_dict()`` and used for training without DeepSpeed.
607
+
608
+ Args:
609
+ - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
610
+ - ``output_dir``: directory to the pytorch fp32 state_dict output files
611
+ - ``max_shard_size``: the maximum size for a checkpoint before being sharded, default value is 5GB
612
+ - ``safe_serialization``: whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
613
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
614
+ - ``exclude_frozen_parameters``: exclude frozen parameters
615
+ """
616
+
617
+ # Dependency pre-check
618
+ if safe_serialization:
619
+ try:
620
+ from safetensors.torch import save_file
621
+ except ImportError:
622
+ print('If you want to use `safe_serialization`, please `pip install safetensors`')
623
+ raise
624
+ if max_shard_size is not None:
625
+ try:
626
+ from huggingface_hub import split_torch_state_dict_into_shards
627
+ except ImportError:
628
+ print('If you want to use `max_shard_size`, please `pip install huggingface_hub`')
629
+ raise
630
+
631
+ # Convert zero checkpoint to state_dict
632
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir,
633
+ tag,
634
+ exclude_frozen_parameters,
635
+ lazy_mode=True)
636
+
637
+ # Shard the model if it is too big.
638
+ weights_name = "model.safetensors" if safe_serialization else "pytorch_model.bin"
639
+ if max_shard_size is not None:
640
+ filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
641
+ # a memory-efficient approach for sharding
642
+ empty_state_dict = to_torch_tensor(state_dict, return_empty_tensor=True)
643
+ state_dict_split = split_torch_state_dict_into_shards(empty_state_dict,
644
+ filename_pattern=filename_pattern,
645
+ max_shard_size=max_shard_size)
646
+ else:
647
+ from collections import namedtuple
648
+ StateDictSplit = namedtuple("StateDictSplit", ["is_sharded", "filename_to_tensors"])
649
+ state_dict_split = StateDictSplit(is_sharded=False,
650
+ filename_to_tensors={weights_name: list(state_dict.keys())})
651
+
652
+ # Save the model by shard
653
+ os.makedirs(output_dir, exist_ok=True)
654
+ filename_to_tensors = state_dict_split.filename_to_tensors.items()
655
+ for shard_file, tensors in tqdm(filename_to_tensors, desc="Saving checkpoint shards"):
656
+ shard_state_dict = {tensor_name: state_dict[tensor_name] for tensor_name in tensors}
657
+ shard_state_dict = to_torch_tensor(shard_state_dict)
658
+ output_path = os.path.join(output_dir, shard_file)
659
+ if safe_serialization:
660
+ save_file(shard_state_dict, output_path, metadata={"format": "pt"})
661
+ else:
662
+ torch.save(shard_state_dict, output_path)
663
+ # release the memory of current shard
664
+ for tensor_name in list(shard_state_dict.keys()):
665
+ del state_dict[tensor_name]
666
+ del shard_state_dict[tensor_name]
667
+ del shard_state_dict
668
+ gc.collect()
669
+
670
+ # Save index if sharded
671
+ if state_dict_split.is_sharded:
672
+ index = {
673
+ "metadata": state_dict_split.metadata,
674
+ "weight_map": state_dict_split.tensor_to_filename,
675
+ }
676
+ save_index_file = "model.safetensors.index.json" if safe_serialization else "pytorch_model.bin.index.json"
677
+ save_index_file = os.path.join(output_dir, save_index_file)
678
+ with open(save_index_file, "w", encoding="utf-8") as f:
679
+ content = json.dumps(index, indent=2, sort_keys=True) + "\n"
680
+ f.write(content)
681
+
682
+
683
+ def load_state_dict_from_zero_checkpoint(model, checkpoint_dir, tag=None):
684
+ """
685
+ 1. Put the provided model to cpu
686
+ 2. Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict``
687
+ 3. Load it into the provided model
688
+
689
+ Args:
690
+ - ``model``: the model object to update
691
+ - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
692
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
693
+
694
+ Returns:
695
+ - ``model``: modified model
696
+
697
+ Make sure you have plenty of CPU memory available before you call this function. If you don't
698
+ have enough use the ``zero_to_fp32.py`` utility to do the conversion. You will find it
699
+ conveniently placed for you in the checkpoint folder.
700
+
701
+ A typical usage might be ::
702
+
703
+ from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint
704
+ model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
705
+ # submit to model hub or save the model to share with others
706
+
707
+ Note that once this was run, the ``model`` will no longer be usable in the deepspeed context
708
+ of the same application. i.e. you will need to re-initialize the deepspeed engine, since
709
+ ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
710
+
711
+ """
712
+ logger.info("Extracting fp32 weights")
713
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
714
+
715
+ logger.info("Overwriting model with fp32 weights")
716
+ model = model.cpu()
717
+ model.load_state_dict(state_dict, strict=False)
718
+
719
+ return model
720
+
721
+
722
+ if __name__ == "__main__":
723
+ parser = argparse.ArgumentParser()
724
+ parser.add_argument("checkpoint_dir",
725
+ type=str,
726
+ help="path to the desired checkpoint folder, e.g., path/checkpoint-12")
727
+ parser.add_argument("output_dir",
728
+ type=str,
729
+ help="directory to the pytorch fp32 state_dict output files"
730
+ "(e.g. path/checkpoint-12-output/)")
731
+ parser.add_argument(
732
+ "--max_shard_size",
733
+ type=str,
734
+ default="5GB",
735
+ help="The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size"
736
+ "lower than this size. If expressed as a string, needs to be digits followed by a unit (like `5MB`"
737
+ "We default it to 5GB in order for models to be able to run easily on free-tier google colab instances"
738
+ "without CPU OOM issues.")
739
+ parser.add_argument(
740
+ "--safe_serialization",
741
+ default=False,
742
+ action='store_true',
743
+ help="Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).")
744
+ parser.add_argument("-t",
745
+ "--tag",
746
+ type=str,
747
+ default=None,
748
+ help="checkpoint tag used as a unique identifier for checkpoint. e.g., global_step1")
749
+ parser.add_argument("--exclude_frozen_parameters", action='store_true', help="exclude frozen parameters")
750
+ parser.add_argument("-d", "--debug", action='store_true', help="enable debug")
751
+ args = parser.parse_args()
752
+
753
+ debug = args.debug
754
+
755
+ convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir,
756
+ args.output_dir,
757
+ max_shard_size=args.max_shard_size,
758
+ safe_serialization=args.safe_serialization,
759
+ tag=args.tag,
760
+ exclude_frozen_parameters=args.exclude_frozen_parameters)
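
As a closing illustration of the padding arithmetic used by zero3_partitioned_param_info above, a tiny worked example with made-up numbers:

import math

def zero3_partitioned_param_info(unpartitioned_numel, world_size):
    # same arithmetic as in the script above
    remainder = unpartitioned_numel % world_size
    padding_numel = (world_size - remainder) if remainder else 0
    partitioned_numel = math.ceil(unpartitioned_numel / world_size)
    return partitioned_numel, padding_numel

# A 10-element parameter sharded across 4 ZeRO-3 ranks: each rank stores 3 elements,
# 4 * 3 = 12 slots in total, so 2 of them are padding.
print(zero3_partitioned_param_info(10, 4))  # (3, 2)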