boatbomber committed
Commit 1c275ca · verified · 1 parent: a0fa8ae

Upload model
.gitattributes CHANGED
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  assets/*.png filter=lfs diff=lfs merge=lfs -text
  training/data/cdli.atf filter=lfs diff=lfs merge=lfs -text
  training/data/NotoSansCuneiform-Regular.ttf filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
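
The added rule tells Git to run `tokenizer.json` through the Git LFS filter driver, so the large file is stored as an LFS pointer rather than in the normal object database. A minimal sketch of how to sanity-check such a rule locally (assumes `git` is installed; the scratch directory and filenames are illustrative):

```shell
# Create a scratch repo, add the same attribute line, and ask git
# which filter/diff/merge drivers apply to tokenizer.json.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'tokenizer.json filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes
git check-attr filter diff merge -- tokenizer.json
# tokenizer.json: filter: lfs
# tokenizer.json: diff: lfs
# tokenizer.json: merge: lfs
```

`-text` additionally unsets the `text` attribute, disabling line-ending normalization for the binary-ish file.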
added_tokens.json ADDED
@@ -0,0 +1,1687 @@
+ {
+ "<B>": 101976,
+ "<D>": 101780,
+ "<M>": 101977,
+ "<S>": 101465,
+ "<ansze>": 101979,
+ "<disz>": 101775,
+ "<ecel>": 101308,
+ "<fcel>": 101309,
+ "<ki>": 101592,
+ "<lcel>": 101311,
+ "<munus>": 101978,
+ "<nl>": 101313,
+ "<ucel>": 101312,
+ "<xcel>": 101310,
+ "<|AUDIO_PLACEHOLDER|>": 100296,
+ "<|CROP_COL_SEP|>": 101301,
+ "<|CROP_ROW_SEP|>": 101302,
+ "<|IMAGE_END|>": 101306,
+ "<|IMAGE_PLACEHOLDER|>": 100295,
+ "<|IMAGE_SEP|>": 101303,
+ "<|IMAGE_START|>": 101305,
+ "<|LOC_0|>": 100297,
+ "<|LOC_1000|>": 101297,
+ "<|LOC_100|>": 100397,
+ "<|LOC_101|>": 100398,
+ "<|LOC_102|>": 100399,
+ "<|LOC_103|>": 100400,
+ "<|LOC_104|>": 100401,
+ "<|LOC_105|>": 100402,
+ "<|LOC_106|>": 100403,
+ "<|LOC_107|>": 100404,
+ "<|LOC_108|>": 100405,
+ "<|LOC_109|>": 100406,
+ "<|LOC_10|>": 100307,
+ "<|LOC_110|>": 100407,
+ "<|LOC_111|>": 100408,
+ "<|LOC_112|>": 100409,
+ "<|LOC_113|>": 100410,
+ "<|LOC_114|>": 100411,
+ "<|LOC_115|>": 100412,
+ "<|LOC_116|>": 100413,
+ "<|LOC_117|>": 100414,
+ "<|LOC_118|>": 100415,
+ "<|LOC_119|>": 100416,
+ "<|LOC_11|>": 100308,
+ "<|LOC_120|>": 100417,
+ "<|LOC_121|>": 100418,
+ "<|LOC_122|>": 100419,
+ "<|LOC_123|>": 100420,
+ "<|LOC_124|>": 100421,
+ "<|LOC_125|>": 100422,
+ "<|LOC_126|>": 100423,
+ "<|LOC_127|>": 100424,
+ "<|LOC_128|>": 100425,
+ "<|LOC_129|>": 100426,
+ "<|LOC_12|>": 100309,
+ "<|LOC_130|>": 100427,
+ "<|LOC_131|>": 100428,
+ "<|LOC_132|>": 100429,
+ "<|LOC_133|>": 100430,
+ "<|LOC_134|>": 100431,
+ "<|LOC_135|>": 100432,
+ "<|LOC_136|>": 100433,
+ "<|LOC_137|>": 100434,
+ "<|LOC_138|>": 100435,
+ "<|LOC_139|>": 100436,
+ "<|LOC_13|>": 100310,
+ "<|LOC_140|>": 100437,
+ "<|LOC_141|>": 100438,
+ "<|LOC_142|>": 100439,
+ "<|LOC_143|>": 100440,
+ "<|LOC_144|>": 100441,
+ "<|LOC_145|>": 100442,
+ "<|LOC_146|>": 100443,
+ "<|LOC_147|>": 100444,
+ "<|LOC_148|>": 100445,
+ "<|LOC_149|>": 100446,
+ "<|LOC_14|>": 100311,
+ "<|LOC_150|>": 100447,
+ "<|LOC_151|>": 100448,
+ "<|LOC_152|>": 100449,
+ "<|LOC_153|>": 100450,
+ "<|LOC_154|>": 100451,
+ "<|LOC_155|>": 100452,
+ "<|LOC_156|>": 100453,
+ "<|LOC_157|>": 100454,
+ "<|LOC_158|>": 100455,
+ "<|LOC_159|>": 100456,
+ "<|LOC_15|>": 100312,
+ "<|LOC_160|>": 100457,
+ "<|LOC_161|>": 100458,
+ "<|LOC_162|>": 100459,
+ "<|LOC_163|>": 100460,
+ "<|LOC_164|>": 100461,
+ "<|LOC_165|>": 100462,
+ "<|LOC_166|>": 100463,
+ "<|LOC_167|>": 100464,
+ "<|LOC_168|>": 100465,
+ "<|LOC_169|>": 100466,
+ "<|LOC_16|>": 100313,
+ "<|LOC_170|>": 100467,
+ "<|LOC_171|>": 100468,
+ "<|LOC_172|>": 100469,
+ "<|LOC_173|>": 100470,
+ "<|LOC_174|>": 100471,
+ "<|LOC_175|>": 100472,
+ "<|LOC_176|>": 100473,
+ "<|LOC_177|>": 100474,
+ "<|LOC_178|>": 100475,
+ "<|LOC_179|>": 100476,
+ "<|LOC_17|>": 100314,
+ "<|LOC_180|>": 100477,
+ "<|LOC_181|>": 100478,
+ "<|LOC_182|>": 100479,
+ "<|LOC_183|>": 100480,
+ "<|LOC_184|>": 100481,
+ "<|LOC_185|>": 100482,
+ "<|LOC_186|>": 100483,
+ "<|LOC_187|>": 100484,
+ "<|LOC_188|>": 100485,
+ "<|LOC_189|>": 100486,
+ "<|LOC_18|>": 100315,
+ "<|LOC_190|>": 100487,
+ "<|LOC_191|>": 100488,
+ "<|LOC_192|>": 100489,
+ "<|LOC_193|>": 100490,
+ "<|LOC_194|>": 100491,
+ "<|LOC_195|>": 100492,
+ "<|LOC_196|>": 100493,
+ "<|LOC_197|>": 100494,
+ "<|LOC_198|>": 100495,
+ "<|LOC_199|>": 100496,
+ "<|LOC_19|>": 100316,
+ "<|LOC_1|>": 100298,
+ "<|LOC_200|>": 100497,
+ "<|LOC_201|>": 100498,
+ "<|LOC_202|>": 100499,
+ "<|LOC_203|>": 100500,
+ "<|LOC_204|>": 100501,
+ "<|LOC_205|>": 100502,
+ "<|LOC_206|>": 100503,
+ "<|LOC_207|>": 100504,
+ "<|LOC_208|>": 100505,
+ "<|LOC_209|>": 100506,
+ "<|LOC_20|>": 100317,
+ "<|LOC_210|>": 100507,
+ "<|LOC_211|>": 100508,
+ "<|LOC_212|>": 100509,
+ "<|LOC_213|>": 100510,
+ "<|LOC_214|>": 100511,
+ "<|LOC_215|>": 100512,
+ "<|LOC_216|>": 100513,
+ "<|LOC_217|>": 100514,
+ "<|LOC_218|>": 100515,
+ "<|LOC_219|>": 100516,
+ "<|LOC_21|>": 100318,
+ "<|LOC_220|>": 100517,
+ "<|LOC_221|>": 100518,
+ "<|LOC_222|>": 100519,
+ "<|LOC_223|>": 100520,
+ "<|LOC_224|>": 100521,
+ "<|LOC_225|>": 100522,
+ "<|LOC_226|>": 100523,
+ "<|LOC_227|>": 100524,
+ "<|LOC_228|>": 100525,
+ "<|LOC_229|>": 100526,
+ "<|LOC_22|>": 100319,
+ "<|LOC_230|>": 100527,
+ "<|LOC_231|>": 100528,
+ "<|LOC_232|>": 100529,
+ "<|LOC_233|>": 100530,
+ "<|LOC_234|>": 100531,
+ "<|LOC_235|>": 100532,
+ "<|LOC_236|>": 100533,
+ "<|LOC_237|>": 100534,
+ "<|LOC_238|>": 100535,
+ "<|LOC_239|>": 100536,
+ "<|LOC_23|>": 100320,
+ "<|LOC_240|>": 100537,
+ "<|LOC_241|>": 100538,
+ "<|LOC_242|>": 100539,
+ "<|LOC_243|>": 100540,
+ "<|LOC_244|>": 100541,
+ "<|LOC_245|>": 100542,
+ "<|LOC_246|>": 100543,
+ "<|LOC_247|>": 100544,
+ "<|LOC_248|>": 100545,
+ "<|LOC_249|>": 100546,
+ "<|LOC_24|>": 100321,
+ "<|LOC_250|>": 100547,
+ "<|LOC_251|>": 100548,
+ "<|LOC_252|>": 100549,
+ "<|LOC_253|>": 100550,
+ "<|LOC_254|>": 100551,
+ "<|LOC_255|>": 100552,
+ "<|LOC_256|>": 100553,
+ "<|LOC_257|>": 100554,
+ "<|LOC_258|>": 100555,
+ "<|LOC_259|>": 100556,
+ "<|LOC_25|>": 100322,
+ "<|LOC_260|>": 100557,
+ "<|LOC_261|>": 100558,
+ "<|LOC_262|>": 100559,
+ "<|LOC_263|>": 100560,
+ "<|LOC_264|>": 100561,
+ "<|LOC_265|>": 100562,
+ "<|LOC_266|>": 100563,
+ "<|LOC_267|>": 100564,
+ "<|LOC_268|>": 100565,
+ "<|LOC_269|>": 100566,
+ "<|LOC_26|>": 100323,
+ "<|LOC_270|>": 100567,
+ "<|LOC_271|>": 100568,
+ "<|LOC_272|>": 100569,
+ "<|LOC_273|>": 100570,
+ "<|LOC_274|>": 100571,
+ "<|LOC_275|>": 100572,
+ "<|LOC_276|>": 100573,
+ "<|LOC_277|>": 100574,
+ "<|LOC_278|>": 100575,
+ "<|LOC_279|>": 100576,
+ "<|LOC_27|>": 100324,
+ "<|LOC_280|>": 100577,
+ "<|LOC_281|>": 100578,
+ "<|LOC_282|>": 100579,
+ "<|LOC_283|>": 100580,
+ "<|LOC_284|>": 100581,
+ "<|LOC_285|>": 100582,
+ "<|LOC_286|>": 100583,
+ "<|LOC_287|>": 100584,
+ "<|LOC_288|>": 100585,
+ "<|LOC_289|>": 100586,
+ "<|LOC_28|>": 100325,
+ "<|LOC_290|>": 100587,
+ "<|LOC_291|>": 100588,
+ "<|LOC_292|>": 100589,
+ "<|LOC_293|>": 100590,
+ "<|LOC_294|>": 100591,
+ "<|LOC_295|>": 100592,
+ "<|LOC_296|>": 100593,
+ "<|LOC_297|>": 100594,
+ "<|LOC_298|>": 100595,
+ "<|LOC_299|>": 100596,
+ "<|LOC_29|>": 100326,
+ "<|LOC_2|>": 100299,
+ "<|LOC_300|>": 100597,
+ "<|LOC_301|>": 100598,
+ "<|LOC_302|>": 100599,
+ "<|LOC_303|>": 100600,
+ "<|LOC_304|>": 100601,
+ "<|LOC_305|>": 100602,
+ "<|LOC_306|>": 100603,
+ "<|LOC_307|>": 100604,
+ "<|LOC_308|>": 100605,
+ "<|LOC_309|>": 100606,
+ "<|LOC_30|>": 100327,
+ "<|LOC_310|>": 100607,
+ "<|LOC_311|>": 100608,
+ "<|LOC_312|>": 100609,
+ "<|LOC_313|>": 100610,
+ "<|LOC_314|>": 100611,
+ "<|LOC_315|>": 100612,
+ "<|LOC_316|>": 100613,
+ "<|LOC_317|>": 100614,
+ "<|LOC_318|>": 100615,
+ "<|LOC_319|>": 100616,
+ "<|LOC_31|>": 100328,
+ "<|LOC_320|>": 100617,
+ "<|LOC_321|>": 100618,
+ "<|LOC_322|>": 100619,
+ "<|LOC_323|>": 100620,
+ "<|LOC_324|>": 100621,
+ "<|LOC_325|>": 100622,
+ "<|LOC_326|>": 100623,
+ "<|LOC_327|>": 100624,
+ "<|LOC_328|>": 100625,
+ "<|LOC_329|>": 100626,
+ "<|LOC_32|>": 100329,
+ "<|LOC_330|>": 100627,
+ "<|LOC_331|>": 100628,
+ "<|LOC_332|>": 100629,
+ "<|LOC_333|>": 100630,
+ "<|LOC_334|>": 100631,
+ "<|LOC_335|>": 100632,
+ "<|LOC_336|>": 100633,
+ "<|LOC_337|>": 100634,
+ "<|LOC_338|>": 100635,
+ "<|LOC_339|>": 100636,
+ "<|LOC_33|>": 100330,
+ "<|LOC_340|>": 100637,
+ "<|LOC_341|>": 100638,
+ "<|LOC_342|>": 100639,
+ "<|LOC_343|>": 100640,
+ "<|LOC_344|>": 100641,
+ "<|LOC_345|>": 100642,
+ "<|LOC_346|>": 100643,
+ "<|LOC_347|>": 100644,
+ "<|LOC_348|>": 100645,
+ "<|LOC_349|>": 100646,
+ "<|LOC_34|>": 100331,
+ "<|LOC_350|>": 100647,
+ "<|LOC_351|>": 100648,
+ "<|LOC_352|>": 100649,
+ "<|LOC_353|>": 100650,
+ "<|LOC_354|>": 100651,
+ "<|LOC_355|>": 100652,
+ "<|LOC_356|>": 100653,
+ "<|LOC_357|>": 100654,
+ "<|LOC_358|>": 100655,
+ "<|LOC_359|>": 100656,
+ "<|LOC_35|>": 100332,
+ "<|LOC_360|>": 100657,
+ "<|LOC_361|>": 100658,
+ "<|LOC_362|>": 100659,
+ "<|LOC_363|>": 100660,
+ "<|LOC_364|>": 100661,
+ "<|LOC_365|>": 100662,
+ "<|LOC_366|>": 100663,
+ "<|LOC_367|>": 100664,
+ "<|LOC_368|>": 100665,
+ "<|LOC_369|>": 100666,
+ "<|LOC_36|>": 100333,
+ "<|LOC_370|>": 100667,
+ "<|LOC_371|>": 100668,
+ "<|LOC_372|>": 100669,
+ "<|LOC_373|>": 100670,
+ "<|LOC_374|>": 100671,
+ "<|LOC_375|>": 100672,
+ "<|LOC_376|>": 100673,
+ "<|LOC_377|>": 100674,
+ "<|LOC_378|>": 100675,
+ "<|LOC_379|>": 100676,
+ "<|LOC_37|>": 100334,
+ "<|LOC_380|>": 100677,
+ "<|LOC_381|>": 100678,
+ "<|LOC_382|>": 100679,
+ "<|LOC_383|>": 100680,
+ "<|LOC_384|>": 100681,
+ "<|LOC_385|>": 100682,
+ "<|LOC_386|>": 100683,
+ "<|LOC_387|>": 100684,
+ "<|LOC_388|>": 100685,
+ "<|LOC_389|>": 100686,
+ "<|LOC_38|>": 100335,
+ "<|LOC_390|>": 100687,
+ "<|LOC_391|>": 100688,
+ "<|LOC_392|>": 100689,
+ "<|LOC_393|>": 100690,
+ "<|LOC_394|>": 100691,
+ "<|LOC_395|>": 100692,
+ "<|LOC_396|>": 100693,
+ "<|LOC_397|>": 100694,
+ "<|LOC_398|>": 100695,
+ "<|LOC_399|>": 100696,
+ "<|LOC_39|>": 100336,
+ "<|LOC_3|>": 100300,
+ "<|LOC_400|>": 100697,
+ "<|LOC_401|>": 100698,
+ "<|LOC_402|>": 100699,
+ "<|LOC_403|>": 100700,
+ "<|LOC_404|>": 100701,
+ "<|LOC_405|>": 100702,
+ "<|LOC_406|>": 100703,
+ "<|LOC_407|>": 100704,
+ "<|LOC_408|>": 100705,
+ "<|LOC_409|>": 100706,
+ "<|LOC_40|>": 100337,
+ "<|LOC_410|>": 100707,
+ "<|LOC_411|>": 100708,
+ "<|LOC_412|>": 100709,
+ "<|LOC_413|>": 100710,
+ "<|LOC_414|>": 100711,
+ "<|LOC_415|>": 100712,
+ "<|LOC_416|>": 100713,
+ "<|LOC_417|>": 100714,
+ "<|LOC_418|>": 100715,
+ "<|LOC_419|>": 100716,
+ "<|LOC_41|>": 100338,
+ "<|LOC_420|>": 100717,
+ "<|LOC_421|>": 100718,
+ "<|LOC_422|>": 100719,
+ "<|LOC_423|>": 100720,
+ "<|LOC_424|>": 100721,
+ "<|LOC_425|>": 100722,
+ "<|LOC_426|>": 100723,
+ "<|LOC_427|>": 100724,
+ "<|LOC_428|>": 100725,
+ "<|LOC_429|>": 100726,
+ "<|LOC_42|>": 100339,
+ "<|LOC_430|>": 100727,
+ "<|LOC_431|>": 100728,
+ "<|LOC_432|>": 100729,
+ "<|LOC_433|>": 100730,
+ "<|LOC_434|>": 100731,
+ "<|LOC_435|>": 100732,
+ "<|LOC_436|>": 100733,
+ "<|LOC_437|>": 100734,
+ "<|LOC_438|>": 100735,
+ "<|LOC_439|>": 100736,
+ "<|LOC_43|>": 100340,
+ "<|LOC_440|>": 100737,
+ "<|LOC_441|>": 100738,
+ "<|LOC_442|>": 100739,
+ "<|LOC_443|>": 100740,
+ "<|LOC_444|>": 100741,
+ "<|LOC_445|>": 100742,
+ "<|LOC_446|>": 100743,
+ "<|LOC_447|>": 100744,
+ "<|LOC_448|>": 100745,
+ "<|LOC_449|>": 100746,
+ "<|LOC_44|>": 100341,
+ "<|LOC_450|>": 100747,
+ "<|LOC_451|>": 100748,
+ "<|LOC_452|>": 100749,
+ "<|LOC_453|>": 100750,
+ "<|LOC_454|>": 100751,
+ "<|LOC_455|>": 100752,
+ "<|LOC_456|>": 100753,
+ "<|LOC_457|>": 100754,
+ "<|LOC_458|>": 100755,
+ "<|LOC_459|>": 100756,
+ "<|LOC_45|>": 100342,
+ "<|LOC_460|>": 100757,
+ "<|LOC_461|>": 100758,
+ "<|LOC_462|>": 100759,
+ "<|LOC_463|>": 100760,
+ "<|LOC_464|>": 100761,
+ "<|LOC_465|>": 100762,
+ "<|LOC_466|>": 100763,
+ "<|LOC_467|>": 100764,
+ "<|LOC_468|>": 100765,
+ "<|LOC_469|>": 100766,
+ "<|LOC_46|>": 100343,
+ "<|LOC_470|>": 100767,
+ "<|LOC_471|>": 100768,
+ "<|LOC_472|>": 100769,
+ "<|LOC_473|>": 100770,
+ "<|LOC_474|>": 100771,
+ "<|LOC_475|>": 100772,
+ "<|LOC_476|>": 100773,
+ "<|LOC_477|>": 100774,
+ "<|LOC_478|>": 100775,
+ "<|LOC_479|>": 100776,
+ "<|LOC_47|>": 100344,
+ "<|LOC_480|>": 100777,
+ "<|LOC_481|>": 100778,
+ "<|LOC_482|>": 100779,
+ "<|LOC_483|>": 100780,
+ "<|LOC_484|>": 100781,
+ "<|LOC_485|>": 100782,
+ "<|LOC_486|>": 100783,
+ "<|LOC_487|>": 100784,
+ "<|LOC_488|>": 100785,
+ "<|LOC_489|>": 100786,
+ "<|LOC_48|>": 100345,
+ "<|LOC_490|>": 100787,
+ "<|LOC_491|>": 100788,
+ "<|LOC_492|>": 100789,
+ "<|LOC_493|>": 100790,
+ "<|LOC_494|>": 100791,
+ "<|LOC_495|>": 100792,
+ "<|LOC_496|>": 100793,
+ "<|LOC_497|>": 100794,
+ "<|LOC_498|>": 100795,
+ "<|LOC_499|>": 100796,
+ "<|LOC_49|>": 100346,
+ "<|LOC_4|>": 100301,
+ "<|LOC_500|>": 100797,
+ "<|LOC_501|>": 100798,
+ "<|LOC_502|>": 100799,
+ "<|LOC_503|>": 100800,
+ "<|LOC_504|>": 100801,
+ "<|LOC_505|>": 100802,
+ "<|LOC_506|>": 100803,
+ "<|LOC_507|>": 100804,
+ "<|LOC_508|>": 100805,
+ "<|LOC_509|>": 100806,
+ "<|LOC_50|>": 100347,
+ "<|LOC_510|>": 100807,
+ "<|LOC_511|>": 100808,
+ "<|LOC_512|>": 100809,
+ "<|LOC_513|>": 100810,
+ "<|LOC_514|>": 100811,
+ "<|LOC_515|>": 100812,
+ "<|LOC_516|>": 100813,
+ "<|LOC_517|>": 100814,
+ "<|LOC_518|>": 100815,
+ "<|LOC_519|>": 100816,
+ "<|LOC_51|>": 100348,
+ "<|LOC_520|>": 100817,
+ "<|LOC_521|>": 100818,
+ "<|LOC_522|>": 100819,
+ "<|LOC_523|>": 100820,
+ "<|LOC_524|>": 100821,
+ "<|LOC_525|>": 100822,
+ "<|LOC_526|>": 100823,
+ "<|LOC_527|>": 100824,
+ "<|LOC_528|>": 100825,
+ "<|LOC_529|>": 100826,
+ "<|LOC_52|>": 100349,
+ "<|LOC_530|>": 100827,
+ "<|LOC_531|>": 100828,
+ "<|LOC_532|>": 100829,
+ "<|LOC_533|>": 100830,
+ "<|LOC_534|>": 100831,
+ "<|LOC_535|>": 100832,
+ "<|LOC_536|>": 100833,
+ "<|LOC_537|>": 100834,
+ "<|LOC_538|>": 100835,
+ "<|LOC_539|>": 100836,
+ "<|LOC_53|>": 100350,
+ "<|LOC_540|>": 100837,
+ "<|LOC_541|>": 100838,
+ "<|LOC_542|>": 100839,
+ "<|LOC_543|>": 100840,
+ "<|LOC_544|>": 100841,
+ "<|LOC_545|>": 100842,
+ "<|LOC_546|>": 100843,
+ "<|LOC_547|>": 100844,
+ "<|LOC_548|>": 100845,
+ "<|LOC_549|>": 100846,
+ "<|LOC_54|>": 100351,
+ "<|LOC_550|>": 100847,
+ "<|LOC_551|>": 100848,
+ "<|LOC_552|>": 100849,
+ "<|LOC_553|>": 100850,
+ "<|LOC_554|>": 100851,
+ "<|LOC_555|>": 100852,
+ "<|LOC_556|>": 100853,
+ "<|LOC_557|>": 100854,
+ "<|LOC_558|>": 100855,
+ "<|LOC_559|>": 100856,
+ "<|LOC_55|>": 100352,
+ "<|LOC_560|>": 100857,
+ "<|LOC_561|>": 100858,
+ "<|LOC_562|>": 100859,
+ "<|LOC_563|>": 100860,
+ "<|LOC_564|>": 100861,
+ "<|LOC_565|>": 100862,
+ "<|LOC_566|>": 100863,
+ "<|LOC_567|>": 100864,
+ "<|LOC_568|>": 100865,
+ "<|LOC_569|>": 100866,
+ "<|LOC_56|>": 100353,
+ "<|LOC_570|>": 100867,
+ "<|LOC_571|>": 100868,
+ "<|LOC_572|>": 100869,
+ "<|LOC_573|>": 100870,
+ "<|LOC_574|>": 100871,
+ "<|LOC_575|>": 100872,
+ "<|LOC_576|>": 100873,
+ "<|LOC_577|>": 100874,
+ "<|LOC_578|>": 100875,
+ "<|LOC_579|>": 100876,
+ "<|LOC_57|>": 100354,
+ "<|LOC_580|>": 100877,
+ "<|LOC_581|>": 100878,
+ "<|LOC_582|>": 100879,
+ "<|LOC_583|>": 100880,
+ "<|LOC_584|>": 100881,
+ "<|LOC_585|>": 100882,
+ "<|LOC_586|>": 100883,
+ "<|LOC_587|>": 100884,
+ "<|LOC_588|>": 100885,
+ "<|LOC_589|>": 100886,
+ "<|LOC_58|>": 100355,
+ "<|LOC_590|>": 100887,
+ "<|LOC_591|>": 100888,
+ "<|LOC_592|>": 100889,
+ "<|LOC_593|>": 100890,
+ "<|LOC_594|>": 100891,
+ "<|LOC_595|>": 100892,
+ "<|LOC_596|>": 100893,
+ "<|LOC_597|>": 100894,
+ "<|LOC_598|>": 100895,
+ "<|LOC_599|>": 100896,
+ "<|LOC_59|>": 100356,
+ "<|LOC_5|>": 100302,
+ "<|LOC_600|>": 100897,
+ "<|LOC_601|>": 100898,
+ "<|LOC_602|>": 100899,
+ "<|LOC_603|>": 100900,
+ "<|LOC_604|>": 100901,
+ "<|LOC_605|>": 100902,
+ "<|LOC_606|>": 100903,
+ "<|LOC_607|>": 100904,
+ "<|LOC_608|>": 100905,
+ "<|LOC_609|>": 100906,
+ "<|LOC_60|>": 100357,
+ "<|LOC_610|>": 100907,
+ "<|LOC_611|>": 100908,
+ "<|LOC_612|>": 100909,
+ "<|LOC_613|>": 100910,
+ "<|LOC_614|>": 100911,
+ "<|LOC_615|>": 100912,
+ "<|LOC_616|>": 100913,
+ "<|LOC_617|>": 100914,
+ "<|LOC_618|>": 100915,
+ "<|LOC_619|>": 100916,
+ "<|LOC_61|>": 100358,
+ "<|LOC_620|>": 100917,
+ "<|LOC_621|>": 100918,
+ "<|LOC_622|>": 100919,
+ "<|LOC_623|>": 100920,
+ "<|LOC_624|>": 100921,
+ "<|LOC_625|>": 100922,
+ "<|LOC_626|>": 100923,
+ "<|LOC_627|>": 100924,
+ "<|LOC_628|>": 100925,
+ "<|LOC_629|>": 100926,
+ "<|LOC_62|>": 100359,
+ "<|LOC_630|>": 100927,
+ "<|LOC_631|>": 100928,
+ "<|LOC_632|>": 100929,
+ "<|LOC_633|>": 100930,
+ "<|LOC_634|>": 100931,
+ "<|LOC_635|>": 100932,
+ "<|LOC_636|>": 100933,
+ "<|LOC_637|>": 100934,
+ "<|LOC_638|>": 100935,
+ "<|LOC_639|>": 100936,
+ "<|LOC_63|>": 100360,
+ "<|LOC_640|>": 100937,
+ "<|LOC_641|>": 100938,
+ "<|LOC_642|>": 100939,
+ "<|LOC_643|>": 100940,
+ "<|LOC_644|>": 100941,
+ "<|LOC_645|>": 100942,
+ "<|LOC_646|>": 100943,
+ "<|LOC_647|>": 100944,
+ "<|LOC_648|>": 100945,
+ "<|LOC_649|>": 100946,
+ "<|LOC_64|>": 100361,
+ "<|LOC_650|>": 100947,
+ "<|LOC_651|>": 100948,
+ "<|LOC_652|>": 100949,
+ "<|LOC_653|>": 100950,
+ "<|LOC_654|>": 100951,
+ "<|LOC_655|>": 100952,
+ "<|LOC_656|>": 100953,
+ "<|LOC_657|>": 100954,
+ "<|LOC_658|>": 100955,
+ "<|LOC_659|>": 100956,
+ "<|LOC_65|>": 100362,
+ "<|LOC_660|>": 100957,
+ "<|LOC_661|>": 100958,
+ "<|LOC_662|>": 100959,
+ "<|LOC_663|>": 100960,
+ "<|LOC_664|>": 100961,
+ "<|LOC_665|>": 100962,
+ "<|LOC_666|>": 100963,
+ "<|LOC_667|>": 100964,
+ "<|LOC_668|>": 100965,
+ "<|LOC_669|>": 100966,
+ "<|LOC_66|>": 100363,
+ "<|LOC_670|>": 100967,
+ "<|LOC_671|>": 100968,
+ "<|LOC_672|>": 100969,
+ "<|LOC_673|>": 100970,
+ "<|LOC_674|>": 100971,
+ "<|LOC_675|>": 100972,
+ "<|LOC_676|>": 100973,
+ "<|LOC_677|>": 100974,
+ "<|LOC_678|>": 100975,
+ "<|LOC_679|>": 100976,
+ "<|LOC_67|>": 100364,
+ "<|LOC_680|>": 100977,
+ "<|LOC_681|>": 100978,
+ "<|LOC_682|>": 100979,
+ "<|LOC_683|>": 100980,
+ "<|LOC_684|>": 100981,
+ "<|LOC_685|>": 100982,
+ "<|LOC_686|>": 100983,
+ "<|LOC_687|>": 100984,
+ "<|LOC_688|>": 100985,
+ "<|LOC_689|>": 100986,
+ "<|LOC_68|>": 100365,
+ "<|LOC_690|>": 100987,
+ "<|LOC_691|>": 100988,
+ "<|LOC_692|>": 100989,
+ "<|LOC_693|>": 100990,
+ "<|LOC_694|>": 100991,
+ "<|LOC_695|>": 100992,
+ "<|LOC_696|>": 100993,
+ "<|LOC_697|>": 100994,
+ "<|LOC_698|>": 100995,
+ "<|LOC_699|>": 100996,
+ "<|LOC_69|>": 100366,
+ "<|LOC_6|>": 100303,
+ "<|LOC_700|>": 100997,
+ "<|LOC_701|>": 100998,
+ "<|LOC_702|>": 100999,
+ "<|LOC_703|>": 101000,
+ "<|LOC_704|>": 101001,
+ "<|LOC_705|>": 101002,
+ "<|LOC_706|>": 101003,
+ "<|LOC_707|>": 101004,
+ "<|LOC_708|>": 101005,
+ "<|LOC_709|>": 101006,
+ "<|LOC_70|>": 100367,
+ "<|LOC_710|>": 101007,
+ "<|LOC_711|>": 101008,
+ "<|LOC_712|>": 101009,
+ "<|LOC_713|>": 101010,
+ "<|LOC_714|>": 101011,
+ "<|LOC_715|>": 101012,
+ "<|LOC_716|>": 101013,
+ "<|LOC_717|>": 101014,
+ "<|LOC_718|>": 101015,
+ "<|LOC_719|>": 101016,
+ "<|LOC_71|>": 100368,
+ "<|LOC_720|>": 101017,
+ "<|LOC_721|>": 101018,
+ "<|LOC_722|>": 101019,
+ "<|LOC_723|>": 101020,
+ "<|LOC_724|>": 101021,
+ "<|LOC_725|>": 101022,
+ "<|LOC_726|>": 101023,
+ "<|LOC_727|>": 101024,
+ "<|LOC_728|>": 101025,
+ "<|LOC_729|>": 101026,
+ "<|LOC_72|>": 100369,
+ "<|LOC_730|>": 101027,
+ "<|LOC_731|>": 101028,
+ "<|LOC_732|>": 101029,
+ "<|LOC_733|>": 101030,
+ "<|LOC_734|>": 101031,
+ "<|LOC_735|>": 101032,
+ "<|LOC_736|>": 101033,
+ "<|LOC_737|>": 101034,
+ "<|LOC_738|>": 101035,
+ "<|LOC_739|>": 101036,
+ "<|LOC_73|>": 100370,
+ "<|LOC_740|>": 101037,
+ "<|LOC_741|>": 101038,
+ "<|LOC_742|>": 101039,
+ "<|LOC_743|>": 101040,
+ "<|LOC_744|>": 101041,
+ "<|LOC_745|>": 101042,
+ "<|LOC_746|>": 101043,
+ "<|LOC_747|>": 101044,
+ "<|LOC_748|>": 101045,
+ "<|LOC_749|>": 101046,
+ "<|LOC_74|>": 100371,
+ "<|LOC_750|>": 101047,
+ "<|LOC_751|>": 101048,
+ "<|LOC_752|>": 101049,
+ "<|LOC_753|>": 101050,
+ "<|LOC_754|>": 101051,
+ "<|LOC_755|>": 101052,
+ "<|LOC_756|>": 101053,
+ "<|LOC_757|>": 101054,
+ "<|LOC_758|>": 101055,
+ "<|LOC_759|>": 101056,
+ "<|LOC_75|>": 100372,
+ "<|LOC_760|>": 101057,
+ "<|LOC_761|>": 101058,
+ "<|LOC_762|>": 101059,
+ "<|LOC_763|>": 101060,
+ "<|LOC_764|>": 101061,
+ "<|LOC_765|>": 101062,
+ "<|LOC_766|>": 101063,
+ "<|LOC_767|>": 101064,
+ "<|LOC_768|>": 101065,
+ "<|LOC_769|>": 101066,
+ "<|LOC_76|>": 100373,
+ "<|LOC_770|>": 101067,
+ "<|LOC_771|>": 101068,
+ "<|LOC_772|>": 101069,
+ "<|LOC_773|>": 101070,
+ "<|LOC_774|>": 101071,
+ "<|LOC_775|>": 101072,
+ "<|LOC_776|>": 101073,
+ "<|LOC_777|>": 101074,
+ "<|LOC_778|>": 101075,
+ "<|LOC_779|>": 101076,
+ "<|LOC_77|>": 100374,
+ "<|LOC_780|>": 101077,
+ "<|LOC_781|>": 101078,
+ "<|LOC_782|>": 101079,
+ "<|LOC_783|>": 101080,
+ "<|LOC_784|>": 101081,
+ "<|LOC_785|>": 101082,
+ "<|LOC_786|>": 101083,
+ "<|LOC_787|>": 101084,
+ "<|LOC_788|>": 101085,
+ "<|LOC_789|>": 101086,
+ "<|LOC_78|>": 100375,
+ "<|LOC_790|>": 101087,
+ "<|LOC_791|>": 101088,
+ "<|LOC_792|>": 101089,
+ "<|LOC_793|>": 101090,
+ "<|LOC_794|>": 101091,
+ "<|LOC_795|>": 101092,
+ "<|LOC_796|>": 101093,
+ "<|LOC_797|>": 101094,
+ "<|LOC_798|>": 101095,
+ "<|LOC_799|>": 101096,
+ "<|LOC_79|>": 100376,
+ "<|LOC_7|>": 100304,
+ "<|LOC_800|>": 101097,
+ "<|LOC_801|>": 101098,
+ "<|LOC_802|>": 101099,
+ "<|LOC_803|>": 101100,
+ "<|LOC_804|>": 101101,
+ "<|LOC_805|>": 101102,
+ "<|LOC_806|>": 101103,
+ "<|LOC_807|>": 101104,
+ "<|LOC_808|>": 101105,
+ "<|LOC_809|>": 101106,
+ "<|LOC_80|>": 100377,
+ "<|LOC_810|>": 101107,
+ "<|LOC_811|>": 101108,
+ "<|LOC_812|>": 101109,
+ "<|LOC_813|>": 101110,
+ "<|LOC_814|>": 101111,
+ "<|LOC_815|>": 101112,
+ "<|LOC_816|>": 101113,
+ "<|LOC_817|>": 101114,
+ "<|LOC_818|>": 101115,
+ "<|LOC_819|>": 101116,
+ "<|LOC_81|>": 100378,
+ "<|LOC_820|>": 101117,
+ "<|LOC_821|>": 101118,
+ "<|LOC_822|>": 101119,
+ "<|LOC_823|>": 101120,
+ "<|LOC_824|>": 101121,
+ "<|LOC_825|>": 101122,
+ "<|LOC_826|>": 101123,
+ "<|LOC_827|>": 101124,
+ "<|LOC_828|>": 101125,
+ "<|LOC_829|>": 101126,
+ "<|LOC_82|>": 100379,
+ "<|LOC_830|>": 101127,
+ "<|LOC_831|>": 101128,
+ "<|LOC_832|>": 101129,
+ "<|LOC_833|>": 101130,
+ "<|LOC_834|>": 101131,
+ "<|LOC_835|>": 101132,
+ "<|LOC_836|>": 101133,
+ "<|LOC_837|>": 101134,
+ "<|LOC_838|>": 101135,
+ "<|LOC_839|>": 101136,
+ "<|LOC_83|>": 100380,
+ "<|LOC_840|>": 101137,
+ "<|LOC_841|>": 101138,
+ "<|LOC_842|>": 101139,
+ "<|LOC_843|>": 101140,
+ "<|LOC_844|>": 101141,
+ "<|LOC_845|>": 101142,
+ "<|LOC_846|>": 101143,
+ "<|LOC_847|>": 101144,
+ "<|LOC_848|>": 101145,
+ "<|LOC_849|>": 101146,
+ "<|LOC_84|>": 100381,
+ "<|LOC_850|>": 101147,
+ "<|LOC_851|>": 101148,
+ "<|LOC_852|>": 101149,
+ "<|LOC_853|>": 101150,
+ "<|LOC_854|>": 101151,
+ "<|LOC_855|>": 101152,
+ "<|LOC_856|>": 101153,
+ "<|LOC_857|>": 101154,
+ "<|LOC_858|>": 101155,
+ "<|LOC_859|>": 101156,
+ "<|LOC_85|>": 100382,
+ "<|LOC_860|>": 101157,
+ "<|LOC_861|>": 101158,
+ "<|LOC_862|>": 101159,
+ "<|LOC_863|>": 101160,
+ "<|LOC_864|>": 101161,
+ "<|LOC_865|>": 101162,
+ "<|LOC_866|>": 101163,
+ "<|LOC_867|>": 101164,
+ "<|LOC_868|>": 101165,
+ "<|LOC_869|>": 101166,
+ "<|LOC_86|>": 100383,
+ "<|LOC_870|>": 101167,
+ "<|LOC_871|>": 101168,
+ "<|LOC_872|>": 101169,
+ "<|LOC_873|>": 101170,
+ "<|LOC_874|>": 101171,
+ "<|LOC_875|>": 101172,
+ "<|LOC_876|>": 101173,
+ "<|LOC_877|>": 101174,
+ "<|LOC_878|>": 101175,
+ "<|LOC_879|>": 101176,
+ "<|LOC_87|>": 100384,
+ "<|LOC_880|>": 101177,
+ "<|LOC_881|>": 101178,
+ "<|LOC_882|>": 101179,
+ "<|LOC_883|>": 101180,
+ "<|LOC_884|>": 101181,
+ "<|LOC_885|>": 101182,
+ "<|LOC_886|>": 101183,
+ "<|LOC_887|>": 101184,
+ "<|LOC_888|>": 101185,
+ "<|LOC_889|>": 101186,
+ "<|LOC_88|>": 100385,
+ "<|LOC_890|>": 101187,
+ "<|LOC_891|>": 101188,
+ "<|LOC_892|>": 101189,
+ "<|LOC_893|>": 101190,
+ "<|LOC_894|>": 101191,
+ "<|LOC_895|>": 101192,
+ "<|LOC_896|>": 101193,
+ "<|LOC_897|>": 101194,
+ "<|LOC_898|>": 101195,
+ "<|LOC_899|>": 101196,
+ "<|LOC_89|>": 100386,
+ "<|LOC_8|>": 100305,
+ "<|LOC_900|>": 101197,
+ "<|LOC_901|>": 101198,
+ "<|LOC_902|>": 101199,
+ "<|LOC_903|>": 101200,
+ "<|LOC_904|>": 101201,
+ "<|LOC_905|>": 101202,
+ "<|LOC_906|>": 101203,
+ "<|LOC_907|>": 101204,
+ "<|LOC_908|>": 101205,
+ "<|LOC_909|>": 101206,
+ "<|LOC_90|>": 100387,
+ "<|LOC_910|>": 101207,
+ "<|LOC_911|>": 101208,
+ "<|LOC_912|>": 101209,
+ "<|LOC_913|>": 101210,
+ "<|LOC_914|>": 101211,
+ "<|LOC_915|>": 101212,
+ "<|LOC_916|>": 101213,
+ "<|LOC_917|>": 101214,
+ "<|LOC_918|>": 101215,
+ "<|LOC_919|>": 101216,
+ "<|LOC_91|>": 100388,
+ "<|LOC_920|>": 101217,
+ "<|LOC_921|>": 101218,
+ "<|LOC_922|>": 101219,
+ "<|LOC_923|>": 101220,
+ "<|LOC_924|>": 101221,
+ "<|LOC_925|>": 101222,
+ "<|LOC_926|>": 101223,
+ "<|LOC_927|>": 101224,
+ "<|LOC_928|>": 101225,
+ "<|LOC_929|>": 101226,
+ "<|LOC_92|>": 100389,
+ "<|LOC_930|>": 101227,
+ "<|LOC_931|>": 101228,
+ "<|LOC_932|>": 101229,
+ "<|LOC_933|>": 101230,
+ "<|LOC_934|>": 101231,
+ "<|LOC_935|>": 101232,
+ "<|LOC_936|>": 101233,
+ "<|LOC_937|>": 101234,
+ "<|LOC_938|>": 101235,
+ "<|LOC_939|>": 101236,
+ "<|LOC_93|>": 100390,
+ "<|LOC_940|>": 101237,
+ "<|LOC_941|>": 101238,
+ "<|LOC_942|>": 101239,
+ "<|LOC_943|>": 101240,
+ "<|LOC_944|>": 101241,
+ "<|LOC_945|>": 101242,
+ "<|LOC_946|>": 101243,
+ "<|LOC_947|>": 101244,
+ "<|LOC_948|>": 101245,
+ "<|LOC_949|>": 101246,
+ "<|LOC_94|>": 100391,
+ "<|LOC_950|>": 101247,
+ "<|LOC_951|>": 101248,
+ "<|LOC_952|>": 101249,
+ "<|LOC_953|>": 101250,
+ "<|LOC_954|>": 101251,
+ "<|LOC_955|>": 101252,
+ "<|LOC_956|>": 101253,
+ "<|LOC_957|>": 101254,
+ "<|LOC_958|>": 101255,
+ "<|LOC_959|>": 101256,
+ "<|LOC_95|>": 100392,
+ "<|LOC_960|>": 101257,
+ "<|LOC_961|>": 101258,
+ "<|LOC_962|>": 101259,
+ "<|LOC_963|>": 101260,
+ "<|LOC_964|>": 101261,
+ "<|LOC_965|>": 101262,
+ "<|LOC_966|>": 101263,
+ "<|LOC_967|>": 101264,
+ "<|LOC_968|>": 101265,
+ "<|LOC_969|>": 101266,
+ "<|LOC_96|>": 100393,
+ "<|LOC_970|>": 101267,
+ "<|LOC_971|>": 101268,
+ "<|LOC_972|>": 101269,
+ "<|LOC_973|>": 101270,
+ "<|LOC_974|>": 101271,
+ "<|LOC_975|>": 101272,
+ "<|LOC_976|>": 101273,
+ "<|LOC_977|>": 101274,
+ "<|LOC_978|>": 101275,
+ "<|LOC_979|>": 101276,
+ "<|LOC_97|>": 100394,
+ "<|LOC_980|>": 101277,
+ "<|LOC_981|>": 101278,
+ "<|LOC_982|>": 101279,
+ "<|LOC_983|>": 101280,
+ "<|LOC_984|>": 101281,
+ "<|LOC_985|>": 101282,
+ "<|LOC_986|>": 101283,
+ "<|LOC_987|>": 101284,
+ "<|LOC_988|>": 101285,
+ "<|LOC_989|>": 101286,
+ "<|LOC_98|>": 100395,
+ "<|LOC_990|>": 101287,
+ "<|LOC_991|>": 101288,
+ "<|LOC_992|>": 101289,
+ "<|LOC_993|>": 101290,
+ "<|LOC_994|>": 101291,
+ "<|LOC_995|>": 101292,
+ "<|LOC_996|>": 101293,
+ "<|LOC_997|>": 101294,
+ "<|LOC_998|>": 101295,
+ "<|LOC_999|>": 101296,
+ "<|LOC_99|>": 100396,
+ "<|LOC_9|>": 100306,
+ "<|LOC_BEGIN|>": 101298,
+ "<|LOC_END|>": 101299,
+ "<|LOC_SEP|>": 101300,
+ "<|image_pad|>": 101304,
+ "<|video_pad|>": 101307,
+ "@bottom": 101975,
+ "@left": 101972,
+ "@obverse": 101970,
+ "@reverse": 101971,
+ "@right": 101973,
+ "@top": 101974,
+ "𒀀": 101546,
+ "𒀀𒀭": 101478,
+ "𒀀𒂔𒀀𒇲": 101863,
+ "𒀀𒂔𒇲": 101385,
+ "𒀀𒅆": 101625,
+ "𒀀𒅗": 101496,
+ "𒀀𒇉": 101544,
+ "𒀀𒇒": 101943,
+ "𒀀𒉺𒄐𒉻𒋛𒀀": 101675,
+ "𒀀𒋗𒉀": 101698,
+ "𒀀𒋢": 101456,
1046
+ "𒀀𒌁": 101495,
1047
+ "𒀀𒌅𒃮𒇺": 101582,
1048
+ "𒀄": 101501,
1049
+ "𒀉": 101399,
1050
+ "𒀉𒆗": 101743,
1051
+ "𒀊": 101470,
1052
+ "𒀋": 101421,
1053
+ "𒀏": 101463,
1054
+ "𒀕": 101748,
1055
+ "𒀖": 101674,
1056
+ "𒀖𒆪": 101932,
1057
+ "𒀚": 101756,
1058
+ "𒀜": 101563,
1059
+ "𒀝": 101484,
1060
+ "𒀞": 101473,
1061
+ "𒀠": 101327,
1062
+ "𒀩": 101798,
1063
+ "𒀪": 101902,
1064
+ "𒀫": 101697,
1065
+ "𒀫𒌓": 101549,
1066
+ "𒀬": 101846,
1067
+ "𒀬𒀬": 101430,
1068
+ "𒀭": 101485,
1069
+ "𒀭𒀭": 101600,
1070
+ "𒀭𒀸𒀭": 101662,
1071
+ "𒀭𒅎𒂂": 101954,
1072
+ "𒀭𒅎𒈪": 101946,
1073
+ "𒀭𒈾": 101321,
1074
+ "𒀮": 101738,
1075
+ "𒀯": 101410,
1076
+ "𒀲": 101679,
1077
+ "𒀲𒀴": 101776,
1078
+ "𒀲𒅆𒂠": 101735,
1079
+ "𒀲𒊩": 101554,
1080
+ "𒀳": 101425,
1081
+ "𒀴": 101499,
1082
+ "𒀵": 101912,
1083
+ "𒀸": 101632,
1084
+ "𒀸𒀸": 101609,
1085
+ "𒀹": 101584,
1086
+ "𒀾": 101623,
1087
+ "𒀿": 101523,
1088
+ "𒁀": 101871,
1089
+ "𒁁": 101822,
1090
+ "𒁃": 101930,
1091
+ "𒁄": 101316,
1092
+ "𒁆": 101636,
1093
+ "𒁇": 101394,
1094
+ "𒁇𒂔": 101935,
1095
+ "𒁈": 101476,
1096
+ "𒁉": 101938,
1097
+ "𒁉𒁷": 101414,
1098
+ "𒁉𒌑𒊓": 101788,
1099
+ "𒁋": 101434,
1100
+ "𒁍": 101694,
1101
+ "𒁑": 101481,
1102
+ "𒁓": 101566,
1103
+ "𒁔": 101486,
1104
+ "𒁕": 101901,
1105
+ "𒁖": 101750,
1106
+ "𒁜": 101872,
1107
+ "𒁦": 101371,
1108
+ "𒁭": 101494,
1109
+ "𒁮": 101358,
1110
+ "𒁯": 101631,
1111
+ "𒁰": 101812,
1112
+ "𒁱": 101660,
1113
+ "𒁲": 101778,
1114
+ "𒁳": 101630,
1115
+ "𒁴": 101645,
1116
+ "𒁵": 101895,
1117
+ "𒁶": 101865,
1118
+ "𒁷": 101960,
1119
+ "𒁷𒊺𒉪": 101839,
1120
+ "𒁹": 101816,
1121
+ "𒁹𒁹": 101361,
1122
+ "𒁺": 101472,
1123
+ "𒁺𒁺": 101915,
1124
+ "𒁻": 101690,
1125
+ "𒁼": 101658,
1126
+ "𒁽": 101529,
1127
+ "𒁾": 101684,
1128
+ "𒁾𒉄": 101663,
1129
+ "𒂀": 101356,
1130
+ "𒂁": 101530,
1131
+ "𒂂": 101330,
1132
+ "𒂄": 101412,
1133
+ "𒂅": 101847,
1134
+ "𒂆": 101858,
1135
+ "𒂇": 101687,
1136
+ "𒂉": 101424,
1137
+ "𒂊": 101353,
1138
+ "𒂍": 101539,
1139
+ "𒂍𒉣": 101370,
1140
+ "𒂔": 101467,
1141
+ "𒂕": 101667,
1142
+ "𒂖": 101942,
1143
+ "𒂗": 101411,
1144
+ "𒂗𒆤": 101422,
1145
+ "𒂗𒈨𒄀": 101334,
1146
+ "𒂗𒈨𒇷": 101607,
1147
+ "𒂗𒍪": 101605,
1148
+ "𒂙": 101619,
1149
+ "𒂞": 101443,
1150
+ "𒂟": 101928,
1151
+ "𒂠": 101319,
1152
+ "𒂠𒈤": 101934,
1153
+ "𒂠𒊺": 101461,
1154
+ "𒂡": 101944,
1155
+ "𒂦": 101925,
1156
+ "𒂫": 101540,
1157
+ "𒂬": 101594,
1158
+ "𒂵": 101730,
1159
+ "𒂶": 101507,
1160
+ "𒂷": 101462,
1161
+ "𒂷𒄑": 101936,
1162
+ "𒂼": 101596,
1163
+ "𒃅": 101634,
1164
+ "𒃌": 101811,
1165
+ "𒃎": 101659,
1166
+ "𒃞": 101537,
1167
+ "𒃠": 101506,
1168
+ "𒃡": 101532,
1169
+ "𒃢": 101939,
1170
+ "𒃣": 101842,
1171
+ "𒃮": 101967,
1172
+ "𒃰": 101357,
1173
+ "𒃰𒋺𒋛": 101335,
1174
+ "𒃱": 101927,
1175
+ "𒃲": 101574,
1176
+ "𒃲𒁔": 101883,
1177
+ "𒃲𒉌": 101517,
1178
+ "𒃲𒌺": 101804,
1179
+ "𒃳": 101480,
1180
+ "𒃴": 101639,
1181
+ "𒃵": 101404,
1182
+ "𒃶": 101364,
1183
+ "𒃷": 101560,
1184
+ "𒃸": 101691,
1185
+ "𒃻": 101400,
1186
+ "𒃻𒊮𒀀": 101860,
1187
+ "𒃼": 101511,
1188
+ "𒃽": 101808,
1189
+ "𒃾": 101354,
1190
+ "𒄀": 101437,
1191
+ "𒄀𒈾𒀊𒌈": 101732,
1192
+ "𒄀𒈾𒀊𒌋𒄞": 101866,
1193
+ "𒄃": 101807,
1194
+ "𒄄": 101447,
1195
+ "𒄇": 101569,
1196
+ "𒄈": 101884,
1197
+ "𒄉": 101759,
1198
+ "𒄊": 101622,
1199
+ "𒄊𒀕𒃲": 101407,
1200
+ "𒄊𒀴": 101597,
1201
+ "𒄋": 101483,
1202
+ "𒄌": 101657,
1203
+ "𒄎": 101966,
1204
+ "𒄐": 101633,
1205
+ "𒄑": 101921,
1206
+ "𒄑𒆵": 101817,
1207
+ "𒄑𒈪": 101352,
1208
+ "𒄑𒉈": 101783,
1209
+ "𒄑𒉋": 101502,
1210
+ "𒄑𒌆𒉿": 101427,
1211
+ "𒄑𒌆𒉿𒋼𒀀𒁺": 101905,
1212
+ "𒄑𒌆𒋼𒀀𒁺": 101602,
1213
+ "𒄒": 101638,
1214
+ "𒄖": 101398,
1215
+ "𒄗": 101941,
1216
+ "𒄘": 101575,
1217
+ "𒄘𒃼": 101628,
1218
+ "𒄘𒉭": 101952,
1219
+ "𒄘𒌦": 101395,
1220
+ "𒄙": 101641,
1221
+ "𒄞": 101611,
1222
+ "𒄟": 101458,
1223
+ "𒄠": 101343,
1224
+ "𒄢": 101795,
1225
+ "𒄣": 101771,
1226
+ "𒄤": 101564,
1227
+ "𒄥": 101451,
1228
+ "𒄦": 101869,
1229
+ "𒄧": 101709,
1230
+ "𒄨": 101375,
1231
+ "𒄩": 101741,
1232
+ "𒄩𒀀": 101435,
1233
+ "𒄪": 101722,
1234
+ "𒄫": 101864,
1235
+ "𒄬": 101704,
1236
+ "𒄭": 101615,
1237
+ "𒄭𒀀": 101479,
1238
+ "𒄭𒄊": 101766,
1239
+ "𒄮": 101533,
1240
+ "𒄯": 101429,
1241
+ "𒄯𒄯": 101616,
1242
+ "𒄰": 101739,
1243
+ "𒄴": 101548,
1244
+ "𒄴𒈨": 101718,
1245
+ "𒄵": 101510,
1246
+ "𒄷": 101706,
1247
+ "𒄷𒄭": 101922,
1248
+ "𒄷𒈿": 101572,
1249
+ "𒄷𒋛": 101603,
1250
+ "𒄸": 101332,
1251
+ "𒄾": 101401,
1252
+ "𒄿": 101666,
1253
+ "𒅀": 101315,
1254
+ "𒅁": 101369,
1255
+ "𒅂": 101426,
1256
+ "𒅅": 101765,
1257
+ "𒅆": 101681,
1258
+ "𒅆𒁾": 101752,
1259
+ "𒅆𒂍": 101474,
1260
+ "𒅆𒂟": 101747,
1261
+ "𒅆𒂠": 101782,
1262
+ "𒅆𒆳𒍝": 101390,
1263
+ "𒅆𒊒": 101515,
1264
+ "𒅆𒌨": 101809,
1265
+ "𒅇": 101624,
1266
+ "𒅈": 101892,
1267
+ "𒅊": 101449,
1268
+ "𒅋": 101363,
1269
+ "𒅌": 101450,
1270
+ "𒅍": 101774,
1271
+ "𒅎": 101763,
1272
+ "𒅏": 101880,
1273
+ "𒅓": 101559,
1274
+ "𒅔": 101701,
1275
+ "𒅕": 101696,
1276
+ "𒅖": 101833,
1277
+ "𒅗": 101897,
1278
+ "𒅗𒀭": 101328,
1279
+ "𒅗𒁲": 101719,
1280
+ "𒅘": 101904,
1281
+ "𒅡": 101769,
1282
+ "𒅢": 101881,
1283
+ "𒅤": 101325,
1284
+ "𒅤𒊭": 101595,
1285
+ "𒅥": 101913,
1286
+ "𒅮": 101929,
1287
+ "𒅲": 101612,
1288
+ "𒅴": 101567,
1289
+ "𒅻": 101646,
1290
+ "𒅾": 101688,
1291
+ "𒅿": 101753,
1292
+ "𒆁": 101342,
1293
+ "𒆈": 101762,
1294
+ "𒆍": 101819,
1295
+ "𒆍𒀭𒊏": 101791,
1296
+ "𒆍𒃲": 101604,
1297
+ "𒆏": 101642,
1298
+ "𒆒": 101827,
1299
+ "𒆓": 101768,
1300
+ "𒆕": 101586,
1301
+ "𒆗": 101576,
1302
+ "𒆘": 101882,
1303
+ "𒆚": 101336,
1304
+ "𒆜": 101959,
1305
+ "𒆜𒁍": 101652,
1306
+ "𒆜𒆳": 101418,
1307
+ "𒆟": 101340,
1308
+ "𒆠": 101614,
1309
+ "𒆠𒀀": 101887,
1310
+ "𒆠𒆗": 101703,
1311
+ "𒆠𒆘": 101423,
1312
+ "𒆠𒇴": 101907,
1313
+ "𒆠𒈫": 101538,
1314
+ "𒆠𒌓": 101664,
1315
+ "𒆠𒍇": 101373,
1316
+ "𒆤": 101924,
1317
+ "𒆥": 101728,
1318
+ "𒆦": 101588,
1319
+ "𒆧": 101720,
1320
+ "𒆪": 101844,
1321
+ "𒆪𒀭": 101707,
1322
+ "𒆬": 101893,
1323
+ "𒆭": 101349,
1324
+ "𒆯": 101853,
1325
+ "𒆰": 101590,
1326
+ "𒆲": 101346,
1327
+ "𒆳": 101678,
1328
+ "𒆵": 101693,
1329
+ "𒆷": 101525,
1330
+ "𒆸": 101593,
1331
+ "𒆸𒆸": 101457,
1332
+ "𒆹": 101789,
1333
+ "𒇀": 101700,
1334
+ "𒇅": 101599,
1335
+ "𒇇": 101841,
1336
+ "𒇉": 101692,
1337
+ "𒇋": 101322,
1338
+ "𒇌": 101813,
1339
+ "𒇒": 101516,
1340
+ "𒇡": 101534,
1341
+ "𒇥": 101876,
1342
+ "𒇧": 101577,
1343
+ "𒇬": 101851,
1344
+ "𒇭": 101417,
1345
+ "𒇯": 101438,
1346
+ "𒇯𒁺": 101834,
1347
+ "𒇲": 101393,
1348
+ "𒇲𒆸": 101859,
1349
+ "𒇲𒊬": 101826,
1350
+ "𒇳": 101867,
1351
+ "𒇳𒁺": 101889,
1352
+ "𒇳𒆸": 101651,
1353
+ "𒇳𒊬": 101964,
1354
+ "𒇴": 101475,
1355
+ "𒇵": 101466,
1356
+ "𒇶": 101562,
1357
+ "𒇷": 101820,
1358
+ "𒇸": 101711,
1359
+ "𒇹": 101362,
1360
+ "𒇺": 101682,
1361
+ "𒇻": 101460,
1362
+ "𒇻𒄾": 101585,
1363
+ "𒇻𒋢": 101909,
1364
+ "𒇼": 101962,
1365
+ "𒇽": 101513,
1366
+ "𒇿": 101409,
1367
+ "𒈂": 101814,
1368
+ "𒈌": 101433,
1369
+ "𒈐": 101926,
1370
+ "𒈕": 101742,
1371
+ "𒈖": 101647,
1372
+ "𒈗": 101686,
1373
+ "𒈚": 101348,
1374
+ "𒈛": 101379,
1375
+ "𒈜": 101779,
1376
+ "𒈝": 101629,
1377
+ "𒈠": 101947,
1378
+ "𒈢": 101873,
1379
+ "𒈣": 101498,
1380
+ "𒈤": 101552,
1381
+ "𒈥": 101785,
1382
+ "𒈦": 101878,
1383
+ "𒈦𒄘𒃼": 101823,
1384
+ "𒈧": 101397,
1385
+ "𒈨": 101359,
1386
+ "𒈨𒌍": 101796,
1387
+ "𒈩": 101497,
1388
+ "𒈪": 101716,
1389
+ "𒈪𒈪": 101518,
1390
+ "𒈪𒉭": 101492,
1391
+ "𒈫": 101453,
1392
+ "𒈬": 101606,
1393
+ "𒈭": 101772,
1394
+ "𒈮": 101945,
1395
+ "𒈯": 101579,
1396
+ "𒈰": 101490,
1397
+ "𒈱": 101440,
1398
+ "𒈲": 101824,
1399
+ "𒈹": 101790,
1400
+ "𒈹𒍝𒀕": 101536,
1401
+ "𒈽": 101755,
1402
+ "𒈾": 101680,
1403
+ "𒈿": 101512,
1404
+ "𒉀": 101898,
1405
+ "𒉄": 101803,
1406
+ "𒉆": 101504,
1407
+ "𒉇": 101886,
1408
+ "𒉈": 101888,
1409
+ "𒉈𒊒": 101439,
1410
+ "𒉋": 101320,
1411
+ "𒉌": 101643,
1412
+ "𒉌𒄑": 101372,
1413
+ "𒉌𒌇": 101521,
1414
+ "𒉌𒌓": 101360,
1415
+ "𒉎": 101906,
1416
+ "𒉏": 101491,
1417
+ "𒉐": 101541,
1418
+ "𒉒": 101355,
1419
+ "𒉓": 101406,
1420
+ "𒉘": 101617,
1421
+ "𒉚": 101899,
1422
+ "𒉠": 101879,
1423
+ "𒉡": 101403,
1424
+ "𒉢": 101723,
1425
+ "𒉢𒁓": 101648,
1426
+ "𒉢𒁓𒆷": 101367,
1427
+ "𒉣": 101661,
1428
+ "𒉣𒇬": 101923,
1429
+ "𒉣𒈨": 101850,
1430
+ "𒉣𒈨𒅤": 101786,
1431
+ "𒉩": 101452,
1432
+ "𒉪": 101531,
1433
+ "𒉭": 101578,
1434
+ "𒉮": 101852,
1435
+ "𒉯": 101558,
1436
+ "𒉴": 101581,
1437
+ "𒉺": 101916,
1438
+ "𒉺𒀠": 101561,
1439
+ "𒉺𒀭": 101323,
1440
+ "𒉺𒁼": 101726,
1441
+ "𒉺𒁽": 101900,
1442
+ "𒉺𒃶": 101610,
1443
+ "𒉺𒄸𒁺": 101724,
1444
+ "𒉺𒅁": 101963,
1445
+ "𒉺𒇻": 101528,
1446
+ "𒉺𒋼𒋛": 101727,
1447
+ "𒉺𒌆": 101702,
1448
+ "𒉻": 101956,
1449
+ "𒉻𒀭𒈹": 101333,
1450
+ "𒉻𒈹": 101917,
1451
+ "𒉼": 101620,
1452
+ "𒉽": 101810,
1453
+ "𒉽𒂊": 101396,
1454
+ "𒉽𒅖": 101420,
1455
+ "𒉽𒉽": 101896,
1456
+ "𒉾": 101341,
1457
+ "𒉿": 101761,
1458
+ "𒊊": 101653,
1459
+ "𒊌": 101339,
1460
+ "𒊍": 101781,
1461
+ "𒊏": 101608,
1462
+ "𒊐": 101868,
1463
+ "𒊑": 101677,
1464
+ "𒊒": 101344,
1465
+ "𒊓": 101734,
1466
+ "𒊕": 101831,
1467
+ "𒊚𒃮": 101689,
1468
+ "𒊨": 101749,
1469
+ "𒊩": 101416,
1470
+ "𒊩𒀲": 101835,
1471
+ "𒊩𒂠": 101856,
1472
+ "𒊩𒃢": 101489,
1473
+ "𒊩𒆪": 101583,
1474
+ "𒊩𒆳": 101351,
1475
+ "𒊩𒈠": 101695,
1476
+ "𒊩𒈨": 101890,
1477
+ "𒊩𒌆": 101854,
1478
+ "𒊩𒌨": 101669,
1479
+ "𒊬": 101918,
1480
+ "𒊭": 101522,
1481
+ "𒊮": 101828,
1482
+ "𒊷": 101737,
1483
+ "𒊹": 101471,
1484
+ "𒊺": 101601,
1485
+ "𒊺𒉀": 101948,
1486
+ "𒊺𒊺𒉪": 101542,
1487
+ "𒊻": 101392,
1488
+ "𒊿": 101825,
1489
+ "𒋀": 101377,
1490
+ "𒋀𒀊": 101415,
1491
+ "𒋀𒀕": 101553,
1492
+ "𒋀𒆠": 101568,
1493
+ "𒋁": 101431,
1494
+ "𒋃": 101383,
1495
+ "𒋄": 101951,
1496
+ "𒋆": 101380,
1497
+ "𒋋": 101933,
1498
+ "𒋒": 101885,
1499
+ "𒋓": 101555,
1500
+ "𒋗": 101598,
1501
+ "𒋗𒃸": 101384,
1502
+ "𒋗𒃻𒌉𒇲": 101950,
1503
+ "𒋗𒃻𒌉𒇲𒁉": 101618,
1504
+ "𒋗𒆗": 101488,
1505
+ "𒋗𒈫": 101627,
1506
+ "𒋗𒉀": 101446,
1507
+ "𒋙": 101454,
1508
+ "𒋙𒀭": 101968,
1509
+ "𒋙𒀭𒄲": 101806,
1510
+ "𒋙𒀯": 101751,
1511
+ "𒋚": 101957,
1512
+ "𒋛": 101535,
1513
+ "𒋛𒀀": 101855,
1514
+ "𒋜": 101862,
1515
+ "𒋝": 101836,
1516
+ "𒋞": 101368,
1517
+ "𒋠": 101919,
1518
+ "𒋡": 101877,
1519
+ "𒋢": 101376,
1520
+ "𒋢𒆳𒊒": 101758,
1521
+ "𒋣": 101655,
1522
+ "𒋤": 101650,
1523
+ "𒋥": 101545,
1524
+ "𒋦": 101350,
1525
+ "𒋧": 101591,
1526
+ "𒋩": 101764,
1527
+ "𒋪": 101670,
1528
+ "𒋫": 101314,
1529
+ "𒋭": 101455,
1530
+ "𒋰": 101571,
1531
+ "𒋳": 101891,
1532
+ "𒋺": 101784,
1533
+ "𒋻": 101565,
1534
+ "𒋼": 101712,
1535
+ "𒋼𒀀": 101613,
1536
+ "𒋼𒀊": 101729,
1537
+ "𒋼𒀕": 101744,
1538
+ "𒋽": 101386,
1539
+ "𒋾": 101857,
1540
+ "𒌀": 101665,
1541
+ "𒌁": 101733,
1542
+ "𒌃": 101676,
1543
+ "𒌅": 101914,
1544
+ "𒌆": 101773,
1545
+ "𒌆𒌓": 101760,
1546
+ "𒌇": 101445,
1547
+ "𒌈": 101849,
1548
+ "𒌉": 101953,
1549
+ "𒌉𒍑": 101705,
1550
+ "𒌋": 101505,
1551
+ "𒌋𒀜": 101550,
1552
+ "𒌋𒂔": 101801,
1553
+ "𒌋𒂙": 101637,
1554
+ "𒌋𒃶": 101656,
1555
+ "𒌋𒅗": 101969,
1556
+ "𒌋𒈬": 101493,
1557
+ "𒌋𒌆": 101509,
1558
+ "𒌋𒌋": 101861,
1559
+ "𒌋𒌋𒌋": 101767,
1560
+ "𒌋𒌓": 101870,
1561
+ "𒌋𒌓𒆤": 101635,
1562
+ "𒌋𒐍": 101329,
1563
+ "𒌋𒑆": 101487,
1564
+ "𒌌": 101573,
1565
+ "𒌍": 101442,
1566
+ "𒌑": 101469,
1567
+ "𒌑𒆠𒋧𒂵": 101408,
1568
+ "𒌑𒉀𒂵": 101413,
1569
+ "𒌒": 101965,
1570
+ "𒌓": 101640,
1571
+ "𒌓𒀊": 101802,
1572
+ "𒌓𒀕": 101829,
1573
+ "𒌓𒁺": 101699,
1574
+ "𒌓𒄒𒉣": 101840,
1575
+ "𒌓𒅗𒁇": 101428,
1576
+ "𒌓𒈣𒀏𒆠𒋳": 101805,
1577
+ "𒌓𒉣": 101326,
1578
+ "𒌓𒌓": 101800,
1579
+ "𒌔": 101797,
1580
+ "𒌗": 101621,
1581
+ "𒌙": 101908,
1582
+ "𒌝": 101441,
1583
+ "𒌝𒈨": 101894,
1584
+ "𒌢": 101911,
1585
+ "𒌣": 101736,
1586
+ "𒌤": 101843,
1587
+ "𒌦": 101961,
1588
+ "𒌨": 101378,
1589
+ "𒌫": 101821,
1590
+ "𒌴": 101432,
1591
+ "𒌵": 101520,
1592
+ "𒌶": 101673,
1593
+ "𒌷": 101324,
1594
+ "𒌸": 101419,
1595
+ "𒌺": 101388,
1596
+ "𒍀": 101317,
1597
+ "𒍂": 101955,
1598
+ "𒍇": 101557,
1599
+ "𒍍": 101757,
1600
+ "𒍎": 101318,
1601
+ "𒍏": 101387,
1602
+ "𒍑": 101937,
1603
+ "𒍑𒆪": 101366,
1604
+ "𒍕": 101519,
1605
+ "𒍚": 101777,
1606
+ "𒍜": 101874,
1607
+ "𒍝": 101672,
1608
+ "𒍝𒄢": 101468,
1609
+ "𒍝𒈹𒀕": 101848,
1610
+ "𒍝𒈽𒀕": 101570,
1611
+ "𒍝𒉏": 101668,
1612
+ "𒍠": 101799,
1613
+ "𒍠𒀭": 101949,
1614
+ "𒍠𒄩": 101931,
1615
+ "𒍢": 101746,
1616
+ "𒍣": 101838,
1617
+ "𒍤𒆸": 101331,
1618
+ "𒍥": 101920,
1619
+ "𒍦": 101793,
1620
+ "𒍨": 101543,
1621
+ "𒍩": 101337,
1622
+ "𒍩𒀭": 101731,
1623
+ "𒍪": 101477,
1624
+ "𒍪𒀊": 101547,
1625
+ "𒍫": 101815,
1626
+ "𒍬": 101654,
1627
+ "𒍮": 101508,
1628
+ "𒐀": 101589,
1629
+ "𒐁": 101448,
1630
+ "𒐂": 101381,
1631
+ "𒐃": 101526,
1632
+ "𒐄": 101787,
1633
+ "𒐅": 101500,
1634
+ "𒐆": 101459,
1635
+ "𒐇": 101444,
1636
+ "𒐈": 101580,
1637
+ "𒐉": 101715,
1638
+ "𒐊": 101714,
1639
+ "𒐋": 101514,
1640
+ "𒐌": 101345,
1641
+ "𒐍": 101551,
1642
+ "𒐎": 101503,
1643
+ "𒐏": 101958,
1644
+ "𒐏𒐉": 101875,
1645
+ "𒐏𒐊": 101717,
1646
+ "𒐐": 101940,
1647
+ "𒐑": 101725,
1648
+ "𒐒": 101710,
1649
+ "𒐓": 101685,
1650
+ "𒐔": 101382,
1651
+ "𒐕": 101464,
1652
+ "𒐖": 101347,
1653
+ "𒐖𒐏𒑆": 101740,
1654
+ "𒐗": 101556,
1655
+ "𒐗𒌍𒐍": 101374,
1656
+ "𒐘": 101818,
1657
+ "𒐙": 101338,
1658
+ "𒐚": 101405,
1659
+ "𒐛": 101671,
1660
+ "𒐜": 101389,
1661
+ "𒐝": 101649,
1662
+ "𒐞": 101770,
1663
+ "𒐟": 101626,
1664
+ "𒐠": 101683,
1665
+ "𒐡": 101754,
1666
+ "𒐣": 101524,
1667
+ "𒐴": 101830,
1668
+ "𒐵": 101845,
1669
+ "𒐶": 101436,
1670
+ "𒐸": 101708,
1671
+ "𒐹": 101794,
1672
+ "𒐼": 101837,
1673
+ "𒑏": 101832,
1674
+ "𒑐": 101721,
1675
+ "𒑑": 101587,
1676
+ "𒑒": 101365,
1677
+ "𒑔": 101391,
1678
+ "𒑗": 101482,
1679
+ "𒑘": 101745,
1680
+ "𒑙": 101792,
1681
+ "𒑚": 101903,
1682
+ "𒑛": 101644,
1683
+ "𒑜": 101910,
1684
+ "𒑰": 101527,
1685
+ "𒑱": 101402,
1686
+ "𒓊": 101713
1687
+ }
chat_template.jinja ADDED
@@ -0,0 +1,46 @@
+ {%- if not add_generation_prompt is defined -%}
+ {%- set add_generation_prompt = true -%}
+ {%- endif -%}
+ {%- if not cls_token is defined -%}
+ {%- set cls_token = "<|begin_of_sentence|>" -%}
+ {%- endif -%}
+ {%- if not eos_token is defined -%}
+ {%- set eos_token = "</s>" -%}
+ {%- endif -%}
+ {%- if not image_token is defined -%}
+ {%- set image_token = "<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" -%}
+ {%- endif -%}
+ {{- cls_token -}}
+ {%- for message in messages -%}
+ {%- if message["role"] == "user" -%}
+ {{- "User: " -}}
+ {%- for content in message["content"] -%}
+ {%- if content["type"] == "image" -%}
+ {{ image_token }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- for content in message["content"] -%}
+ {%- if content["type"] == "text" -%}
+ {{ content["text"] }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{ "\n" -}}
+ {%- elif message["role"] == "assistant" -%}
+ {{- "Assistant: " -}}
+ {%- for content in message["content"] -%}
+ {%- if content["type"] == "text" -%}
+ {{ content["text"] }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{ eos_token -}}
+ {%- elif message["role"] == "system" -%}
+ {%- for content in message["content"] -%}
+ {%- if content["type"] == "text" -%}
+ {{ content["text"] + "\n" }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ {{- "Assistant: " -}}
+ {%- endif -%}
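The template above maps the standard `messages` structure onto a plain `User:`/`Assistant:` transcript, with image placeholders emitted before the text of a user turn. A minimal Python mirror of that logic, for reference (a sketch, not part of the repo; `render_chat` is a hypothetical helper, and the token strings are the template's own defaults):

```python
def render_chat(messages, add_generation_prompt=True,
                cls_token="<|begin_of_sentence|>",
                eos_token="</s>",
                image_token="<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>"):
    """Python mirror of chat_template.jinja (illustrative sketch only)."""
    out = cls_token
    for message in messages:
        parts = message["content"]
        if message["role"] == "user":
            out += "User: "
            # Images first, then all text parts, then one newline per turn.
            out += "".join(image_token for p in parts if p["type"] == "image")
            out += "".join(p["text"] for p in parts if p["type"] == "text")
            out += "\n"
        elif message["role"] == "assistant":
            out += "Assistant: "
            out += "".join(p["text"] for p in parts if p["type"] == "text")
            out += eos_token
        elif message["role"] == "system":
            # System text gets no role prefix, just a trailing newline per part.
            out += "".join(p["text"] + "\n" for p in parts if p["type"] == "text")
    if add_generation_prompt:
        out += "Assistant: "
    return out
```

In practice the same rendering is obtained from `tokenizer.apply_chat_template(messages, tokenize=False)` once this file ships with the tokenizer.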
config.json ADDED
@@ -0,0 +1,78 @@
+ {
+   "architectures": [
+     "PaddleOCRVLForConditionalGeneration"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "configuration_paddleocr_vl.PaddleOCRVLConfig",
+     "AutoModel": "modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration",
+     "AutoModelForCausalLM": "modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration"
+   },
+   "bos_token_id": 1,
+   "compression_ratio": 1.0,
+   "torch_dtype": "bfloat16",
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 1024,
+   "ignored_index": -100,
+   "image_token_id": 100295,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 131072,
+   "max_sequence_length": null,
+   "model_type": "paddleocr_vl",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 18,
+   "num_key_value_heads": 2,
+   "pad_token_id": 0,
+   "rms_norm_eps": 1e-05,
+   "rope_is_neox_style": true,
+   "rope_scaling": {
+     "mrope_section": [
+       16,
+       24,
+       24
+     ],
+     "rope_type": "default",
+     "type": "default"
+   },
+   "rope_theta": 500000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.57.3",
+   "unsloth_version": "2025.12.8",
+   "use_3d_rope": true,
+   "use_bias": false,
+   "use_cache": false,
+   "use_flash_attention": false,
+   "video_token_id": 101307,
+   "vision_config": {
+     "architectures": [
+       "PaddleOCRVisionModel"
+     ],
+     "attention_dropout": 0.0,
+     "auto_map": {
+       "AutoConfig": "configuration_paddleocr_vl.PaddleOCRVLConfig",
+       "AutoModel": "modeling_paddleocr_vl.PaddleOCRVisionModel"
+     },
+     "torch_dtype": "bfloat16",
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "image_size": 384,
+     "intermediate_size": 4304,
+     "layer_norm_eps": 1e-06,
+     "model_type": "paddleocr_vl",
+     "num_attention_heads": 16,
+     "num_channels": 3,
+     "num_hidden_layers": 27,
+     "pad_token_id": 0,
+     "patch_size": 14,
+     "spatial_merge_size": 2,
+     "temporal_patch_size": 2,
+     "tokens_per_second": 2
+   },
+   "vision_start_token_id": 101305,
+   "vocab_size": 101980,
+   "weight_share_add_bias": true
+ }
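A couple of sanity checks fall directly out of the config above, since it uses grouped-query attention (16 query heads sharing 2 KV heads). A short sketch using the field values read straight from this `config.json`:

```python
# Field values copied from the config.json above.
config = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "num_key_value_heads": 2,
    "head_dim": 128,
    "vocab_size": 101980,
}

# Grouped-query attention: each KV head serves this many query heads.
gqa_group = config["num_attention_heads"] // config["num_key_value_heads"]

# Per-layer, per-token width of each of the K and V caches. With only 2 KV
# heads this is far smaller than full multi-head attention would need.
kv_width = config["num_key_value_heads"] * config["head_dim"]
```

Here `gqa_group` comes out to 8 and `kv_width` to 256; note also that `head_dim` (128) is set explicitly rather than derived as `hidden_size / num_attention_heads` (which would be 64).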
configuration_paddleocr_vl.py ADDED
@@ -0,0 +1,191 @@
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.modeling_rope_utils import rope_config_validation
+
+ class PaddleOCRVisionConfig(PretrainedConfig):
+     model_type = "paddleocr_vl"
+     base_config_key = "vision_config"
+
+     def __init__(
+         self,
+         hidden_size=768,
+         intermediate_size=3072,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         num_channels=3,
+         image_size=224,
+         patch_size=14,
+         hidden_act="gelu_pytorch_tanh",
+         layer_norm_eps=1e-6,
+         attention_dropout=0.0,
+         spatial_merge_size=2,
+         temporal_patch_size=2,
+         tokens_per_second=2,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.num_channels = num_channels
+         self.patch_size = patch_size
+         self.image_size = image_size
+         self.attention_dropout = attention_dropout
+         self.layer_norm_eps = layer_norm_eps
+         self.hidden_act = hidden_act
+         self.spatial_merge_size = spatial_merge_size
+         self.temporal_patch_size = temporal_patch_size
+         self.tokens_per_second = tokens_per_second
+
+
+
+ class PaddleOCRVLConfig(PretrainedConfig):
+     """
+     Configuration class.
+
+     This class stores the configuration of an Ernie model, defining the model architecture.
+     It inherits from PretrainedConfig and can be used to control model outputs.
+     """
+
+     model_type = "paddleocr_vl"
+     keys_to_ignore_at_inference = ["past_key_values"]
+     sub_configs = {"vision_config": PaddleOCRVisionConfig}
+
+     # Default tensor parallel plan for base model `Qwen3`
+     base_model_tp_plan = {
+         "layers.*.self_attn.q_proj": "colwise",
+         "layers.*.self_attn.k_proj": "colwise",
+         "layers.*.self_attn.v_proj": "colwise",
+         "layers.*.self_attn.o_proj": "rowwise",
+         "layers.*.mlp.gate_proj": "colwise",
+         "layers.*.mlp.up_proj": "colwise",
+         "layers.*.mlp.down_proj": "rowwise",
+     }
+     base_model_pp_plan = {
+         "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+         "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+         "norm": (["hidden_states"], ["hidden_states"]),
+     }
+
+     def __init__(
+         self,
+         vocab_size=32000,
+         hidden_size=768,
+         intermediate_size=11008,
+         max_position_embeddings=32768,
+         num_hidden_layers=2,
+         num_attention_heads=2,
+         image_token_id=101304,
+         video_token_id=101305,
+         vision_start_token_id=101306,
+         rms_norm_eps=1e-6,
+         use_cache=False,
+         use_flash_attention=False,
+         pad_token_id=0,
+         bos_token_id=1,
+         eos_token_id=2,
+         head_dim=128,
+         hidden_act="silu",
+         use_bias=False,
+         rope_theta=10000,
+         weight_share_add_bias=True,
+         ignored_index=-100,
+         attention_probs_dropout_prob=0.0,
+         hidden_dropout_prob=0.0,
+         compression_ratio: float = 1.0,
+         num_key_value_heads=None,
+         max_sequence_length=None,
+         tie_word_embeddings=False,
+         vision_config=None,
+         rope_scaling=None,
+         **kwargs,
+     ):
+         """
+         Initialize configuration with default or specified parameters.
+
+         Args:
+             vocab_size (int): Size of the vocabulary (number of unique tokens)
+             hidden_size (int): Dimensionality of the encoder layers and the pooler layer
+             intermediate_size (int): Dimensionality of the "intermediate" (feed-forward) layer
+             max_position_embeddings (int): Maximum sequence length the model can handle
+             num_hidden_layers (int): Number of hidden layers in the Transformer encoder
+             num_attention_heads (int): Number of attention heads for each attention layer
+             rms_norm_eps (float): The epsilon used by the RMS normalization layers
+             use_cache (bool): Whether to use caching for faster generation (decoding)
+             use_flash_attention (bool): Whether to use FlashAttention for optimized attention computation
+             pad_token_id (int): Token ID used for padding sequences
+             bos_token_id (int): Token ID used for beginning-of-sequence
+             eos_token_id (int): Token ID used for end-of-sequence
+             use_bias (bool): Whether to use bias terms in linear layers
+             rope_theta (float): The base period of the RoPE embeddings
+             weight_share_add_bias (bool): Whether to share bias weights in certain layers
+             ignored_index (int): Target value that is ignored during loss computation
+             attention_probs_dropout_prob (float): Dropout probability for attention weights
+             hidden_dropout_prob (float): Dropout probability for hidden layers
+             compression_ratio (float): Ratio for KV cache compression (1.0 = no compression)
+             num_key_value_heads (int): Number of key/value heads (for Grouped Query Attention)
+             max_sequence_length (int): Maximum sequence length for positional embeddings
+             **kwargs: Additional keyword arguments passed to parent class
+         """
+
+         # Set default for tied embeddings if not specified.
+         super().__init__(
+             pad_token_id=pad_token_id,
+             bos_token_id=bos_token_id,
+             eos_token_id=eos_token_id,
+             **kwargs,
+         )
+         if isinstance(vision_config, dict):
+             self.vision_config = self.sub_configs["vision_config"](**vision_config)
+         elif vision_config is None:
+             self.vision_config = self.sub_configs["vision_config"]()
+         self.vocab_size = vocab_size
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.max_position_embeddings = max_position_embeddings
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.rms_norm_eps = rms_norm_eps
+         self.use_cache = use_cache
+         self.use_flash_attention = use_flash_attention
+         self.pad_token_id = pad_token_id
+         self.bos_token_id = bos_token_id
+         self.eos_token_id = eos_token_id
+         self.image_token_id = image_token_id
+         self.video_token_id = video_token_id
+         self.vision_start_token_id = vision_start_token_id
+         self.head_dim = head_dim
+         self.hidden_act = hidden_act
+         self.sliding_window = None
+         self.use_bias = use_bias
+         self.weight_share_add_bias = weight_share_add_bias
+         self.rope_theta = rope_theta
+         self.ignored_index = ignored_index
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.compression_ratio = compression_ratio
+         self.num_key_value_heads = num_key_value_heads
+         self.max_sequence_length = max_sequence_length
+         self.rope_scaling = rope_scaling
+         if self.rope_scaling is not None and "type" in self.rope_scaling:
+             if self.rope_scaling["type"] == "mrope":
+                 self.rope_scaling["type"] = "default"
+             self.rope_scaling["rope_type"] = self.rope_scaling["type"]
+         rope_config_validation(self, ignore_keys={"mrope_section"})
+         super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)
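The tail of `__init__` normalizes a legacy `rope_scaling` spec before validation: a `type` of `"mrope"` is rewritten to `"default"`, and `rope_type` is mirrored from `type` either way. The same normalization in isolation (a sketch, independent of transformers; `normalize_rope_scaling` is a hypothetical helper, not a function in this file):

```python
def normalize_rope_scaling(rope_scaling):
    # Mirrors the rope_scaling handling in PaddleOCRVLConfig.__init__:
    # legacy "mrope" becomes "default", and "rope_type" tracks "type".
    if rope_scaling is not None and "type" in rope_scaling:
        if rope_scaling["type"] == "mrope":
            rope_scaling["type"] = "default"
        rope_scaling["rope_type"] = rope_scaling["type"]
    return rope_scaling

spec = normalize_rope_scaling({"type": "mrope", "mrope_section": [16, 24, 24]})
```

This explains why the shipped `config.json` shows `"type": "default"` alongside an `mrope_section`: the multimodal-RoPE sectioning survives (it is exempted from `rope_config_validation` via `ignore_keys`), while the legacy type name does not.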
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": [2, 2],
+   "max_length": 131072,
+   "pad_token_id": 0,
+   "transformers_version": "4.57.2",
+   "use_cache": true
+ }
image_processing_paddleocr_vl.py ADDED
@@ -0,0 +1,569 @@
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ """Image processor class for PaddleOCR-VL."""
+
+ import math
+ from typing import Dict, List, Optional, Union
+
+ import numpy as np
+ import torch
+ from transformers.image_processing_utils import BaseImageProcessor, BatchFeature
+ from torchvision.transforms import functional as TF
+ from transformers.image_transforms import (
+     convert_to_rgb,
+     resize,
+     to_channel_dimension_format,
+ )
+ from transformers.image_utils import (
+     OPENAI_CLIP_MEAN,
+     OPENAI_CLIP_STD,
+     ChannelDimension,
+     PILImageResampling,
+     get_image_size,
+     infer_channel_dimension_format,
+     is_scaled_image,
+     is_valid_image,
+     make_list_of_images,
+     to_numpy_array,
+     valid_images,
+     validate_preprocess_arguments,
+ )
+ from transformers.utils import TensorType, is_vision_available, logging
+
+
+ logger = logging.get_logger(__name__)
+
+
+ if is_vision_available():
+     from PIL import Image
+
+ ImageInput = Union[
+     "PIL.Image.Image",
+     np.ndarray,
+     "torch.Tensor",
+     List["PIL.Image.Image"],
+     List[np.ndarray],
+     List["torch.Tensor"],
+ ]  # noqa
+
+
+ VideoInput = Union[
+     List["PIL.Image.Image"],
+     "np.ndarray",
+     "torch.Tensor",
+     List["np.ndarray"],
+     List["torch.Tensor"],
+     List[List["PIL.Image.Image"]],
+     List[List["np.ndarray"]],
+     List[List["torch.Tensor"]],
+ ]  # noqa
+
+
+ def make_batched_images(images) -> List[List[ImageInput]]:
+     """
+     Accepts images in list or nested list format, and makes a list of images for preprocessing.
+
+     Args:
+         images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
+             The input image.
+
+     Returns:
+         list: A list of images.
+     """
+     if (
+         isinstance(images, (list, tuple))
+         and isinstance(images[0], (list, tuple))
+         and is_valid_image(images[0][0])
+     ):
+         return [img for img_list in images for img in img_list]
+
+     elif isinstance(images, (list, tuple)) and is_valid_image(images[0]):
+         return images
+
+     elif is_valid_image(images):
+         return [images]
+
+     raise ValueError(f"Could not make batched images from {images}")
+
+
+ def adjust_size(size, patch_size):
+     num_patches = size // patch_size
+     if num_patches % 2 != 0:  # if the patch count is odd, drop one to make it even
+         num_patches -= 1
+     return num_patches * patch_size
+
+
+ def make_batched_videos(videos) -> List[VideoInput]:
+     if (
+         isinstance(videos, (list, tuple))
+         and isinstance(videos[0], (list, tuple))
+         and is_valid_image(videos[0][0])
+     ):
+         return videos
+
+     elif isinstance(videos, (list, tuple)) and is_valid_image(videos[0]):
+         if isinstance(videos[0], Image.Image):
+             return [videos]
+         elif len(videos[0].shape) == 4:
+             return [list(video) for video in videos]
+
+     elif is_valid_image(videos) and len(videos.shape) == 4:
+         return [list(videos)]
+
+     raise ValueError(f"Could not make batched video from {videos}")
+
+
+ def smart_resize(
+     height: int,
+     width: int,
+     factor: int = 28,
+     min_pixels: int = 28 * 28 * 130,
+     max_pixels: int = 28 * 28 * 1280,
+ ):
+     """Rescales the image so that the following conditions are met:
+
+     1. Both dimensions (height and width) are divisible by 'factor'.
+
+     2. The total number of pixels is within the range ['min_pixels', 'max_pixels'].
+
+     3. The aspect ratio of the image is maintained as closely as possible.
+
+     """
+     # if height < factor or width < factor:
+     #     raise ValueError(f"height:{height} or width:{width} must be larger than factor:{factor}")
+     # if int(height < factor//4) + int(width < factor//4):
+     #     raise ValueError(f"height:{height} or width:{width} must be larger than factor:{factor//4}")
+
+     if height < factor:
+         print(f"smart_resize: height={height} < factor={factor}, reset height=factor")
+         width = round((width * factor) / height)
+         height = factor
+
+     if width < factor:
+         print(f"smart_resize: width={width} < factor={factor}, reset width=factor")
+         height = round((height * factor) / width)
+         width = factor
+
+     if max(height, width) / min(height, width) > 200:
+         raise ValueError(
+             f"absolute aspect ratio must be smaller than 200, got {max(height, width) / min(height, width)}"
+         )
+     h_bar = round(height / factor) * factor
164
+ w_bar = round(width / factor) * factor
165
+ if h_bar * w_bar > max_pixels:
166
+ beta = math.sqrt((height * width) / max_pixels)
167
+ h_bar = math.floor(height / beta / factor) * factor
168
+ w_bar = math.floor(width / beta / factor) * factor
169
+ elif h_bar * w_bar < min_pixels:
170
+ beta = math.sqrt(min_pixels / (height * width))
171
+ h_bar = math.ceil(height * beta / factor) * factor
172
+ w_bar = math.ceil(width * beta / factor) * factor
173
+ return h_bar, w_bar
174
+
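The rule above can be reproduced standalone to see what it does to a common input; this sketch copies the function body and runs it on a 1080x1920 frame (an illustrative size) with the file's default budget of 28 * 28 * 1280 pixels:

```python
import math

def smart_resize(height, width, factor=28,
                 min_pixels=28 * 28 * 130, max_pixels=28 * 28 * 1280):
    # Snap each dimension to the nearest multiple of `factor`.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    # Shrink if over the pixel budget, grow if under it, keeping aspect ratio.
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

# A 1080x1920 frame exceeds the budget, so both sides are scaled down
# and floored to multiples of 28.
h, w = smart_resize(1080, 1920)
```

Both outputs stay divisible by `factor`, and the product stays within `max_pixels`, which is what lets the patch grid be computed exactly downstream.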
175
+
176
+ class PaddleOCRVLImageProcessor(BaseImageProcessor):
177
+ r"""
178
+ Constructs a PaddleOCR-VL image processor (SigLIP-style) that dynamically resizes images based on the original image dimensions.
179
+
180
+ Args:
181
+ do_resize (`bool`, *optional*, defaults to `True`):
182
+ Whether to resize the image's (height, width) dimensions.
183
+ resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
184
+ Resampling filter to use when resizing the image.
185
+ do_rescale (`bool`, *optional*, defaults to `True`):
186
+ Whether to rescale the image by the specified scale `rescale_factor`.
187
+ rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
188
+ Scale factor to use if rescaling the image.
189
+ do_normalize (`bool`, *optional*, defaults to `True`):
190
+ Whether to normalize the image.
191
+ image_mean (`float` or `List[float]`, *optional*, defaults to `[0.48145466, 0.4578275, 0.40821073]`):
192
+ Mean to use if normalizing the image. This is a float or list of floats for each channel in the image.
193
+ image_std (`float` or `List[float]`, *optional*, defaults to `[0.26862954, 0.26130258, 0.27577711]`):
194
+ Standard deviation to use if normalizing the image. This is a float or list of floats for each channel in the image.
195
+ do_convert_rgb (`bool`, *optional*, defaults to `True`):
196
+ Whether to convert the image to RGB.
197
+ min_pixels (`int`, *optional*, defaults to `28 * 28 * 130`):
198
+ The min pixels of the image to resize the image.
199
+ max_pixels (`int`, *optional*, defaults to `28 * 28 * 1280`):
200
+ The max pixels of the image to resize the image.
201
+ patch_size (`int`, *optional*, defaults to 14):
202
+ The spatial patch size of the vision encoder.
203
+ temporal_patch_size (`int`, *optional*, defaults to 2):
204
+ The temporal patch size of the vision encoder.
205
+ merge_size (`int`, *optional*, defaults to 2):
206
+ The merge size of the vision encoder to llm encoder.
207
+ """
208
+
209
+ model_input_names = [
210
+ "pixel_values",
211
+ "image_grid_thw",
212
+ "pixel_values_videos",
213
+ "video_grid_thw",
214
+ ]
215
+
216
+ def __init__(
217
+ self,
218
+ do_resize: bool = True,
219
+ resample: PILImageResampling = PILImageResampling.BICUBIC,
220
+ do_rescale: bool = True,
221
+ rescale_factor: Union[int, float] = 1 / 255,
222
+ do_normalize: bool = True,
223
+ image_mean: Optional[Union[float, List[float]]] = None,
224
+ image_std: Optional[Union[float, List[float]]] = None,
225
+ do_convert_rgb: bool = True,
226
+ min_pixels: int = 28 * 28 * 130,
227
+ max_pixels: int = 28 * 28 * 1280,
228
+ patch_size: int = 14,
229
+ temporal_patch_size: int = 1,
230
+ merge_size: int = 2,
231
+ **kwargs,
232
+ ) -> None:
233
+ super().__init__(**kwargs)
234
+ self.do_resize = do_resize
235
+ self.resample = resample
236
+ self.do_rescale = do_rescale
237
+ self.rescale_factor = rescale_factor
238
+ self.do_normalize = do_normalize
239
+ self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
240
+ self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
241
+ self.min_pixels = min_pixels
242
+ self.max_pixels = max_pixels
243
+ self.patch_size = patch_size
244
+ self.temporal_patch_size = temporal_patch_size
245
+ self.merge_size = merge_size
246
+ self.size = {"min_pixels": min_pixels, "max_pixels": max_pixels} # not used
247
+ self.do_convert_rgb = do_convert_rgb
248
+
249
+ def mvit_rescale(self, image: Image.Image, merge_size: int = 2) -> Image.Image:
250
+ try:
251
+ w, h = image.size
252
+ except Exception:
253
+ raise ValueError(f"Expected a PIL image with a 'size' attribute, got {type(image)}")
254
+ patch_size = self.patch_size
255
+
256
+ if (w // patch_size) * (h // patch_size) > self.in_token_limit:
257
+ scale = math.sqrt(
258
+ self.in_token_limit / ((w // patch_size) * (h // patch_size))
259
+ )
260
+ new_w, new_h = int(w * scale), int(h * scale)
261
+
262
+ image = image.resize((new_w, new_h), Image.Resampling.BICUBIC)
263
+ if self.pad_input:
264
+ new_w, new_h = image.size
265
+ pad_size_h = merge_size * patch_size
266
+ pad_size_w = merge_size * patch_size
267
+
268
+ pad_h = (pad_size_h - new_h % pad_size_h) % pad_size_h
269
+ pad_w = (pad_size_w - new_w % pad_size_w) % pad_size_w
270
+
271
+ image = TF.pad(image, (0, 0, pad_w, pad_h))
272
+ else:
273
+ new_w, new_h = image.size
274
+ new_w = new_w - new_w % patch_size
275
+ new_h = new_h - new_h % patch_size
276
+
277
+ new_w = adjust_size(new_w, patch_size)
278
+ new_h = adjust_size(new_h, patch_size)
279
+
280
+ image = TF.center_crop(image, (new_h, new_w))
281
+
282
+ w, h = image.size
283
+ if w // patch_size >= 512 or h // patch_size >= 512:
284
+ new_h = min(patch_size * 510, h)
285
+ new_w = min(patch_size * 510, w)
286
+ image = TF.center_crop(image, (new_h, new_w))
287
+ # raise ValueError("Exceed pos emb")
288
+ return image
289
+
290
+ def _preprocess(
291
+ self,
292
+ images: Union[ImageInput, VideoInput],
293
+ do_resize: bool = None,
294
+ resample: PILImageResampling = None,
295
+ do_rescale: bool = None,
296
+ rescale_factor: float = None,
297
+ do_normalize: bool = None,
298
+ image_mean: Optional[Union[float, List[float]]] = None,
299
+ image_std: Optional[Union[float, List[float]]] = None,
300
+ do_convert_rgb: bool = None,
301
+ data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
302
+ input_data_format: Optional[Union[str, ChannelDimension]] = None,
303
+ ):
304
+ """
305
+ Preprocess an image or batch of images. Copy of the `preprocess` method from `CLIPImageProcessor`.
306
+
307
+ Args:
308
+ images (`ImageInput`):
309
+ Image or batch of images to preprocess. Expects pixel values ranging from 0 to 255. If pixel values range from 0 to 1, set `do_rescale=False`.
310
+ vision_info (`List[Dict]`, *optional*):
311
+ Optional list of dictionaries containing additional information about vision inputs.
312
+ do_resize (`bool`, *optional*, defaults to `self.do_resize`):
313
+ Whether to resize the image.
314
+ resample (`PILImageResampling`, *optional*, defaults to `self.resample`):
315
+ Resampling filter to use if resizing the image. This can be one of the `PILImageResampling` enums.
316
+ do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
317
+ Whether to rescale the image.
318
+ rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
319
+ Scale factor to use if rescaling the image.
320
+ do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
321
+ Whether to normalize the image.
322
+ image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
323
+ Mean to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
324
+ image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
325
+ Standard deviation to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
326
+ do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
327
+ Whether to convert the image to RGB.
328
+ data_format (`ChannelDimension`, *optional*, defaults to `ChannelDimension.FIRST`):
329
+ The channel dimension format for the output image. Can be one of:
330
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
331
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
332
+ - Unset: Use the channel dimension format of the input image.
333
+ input_data_format (`ChannelDimension` or `str`, *optional*):
334
+ The channel dimension format for the input image. Can be one of:
335
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
336
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
337
+ - `"none"` or `ChannelDimension.NONE`: image in (height, width) format. - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
338
+ """
339
+ images = make_list_of_images(images)
340
+
341
+ if do_convert_rgb:
342
+ images = [convert_to_rgb(image) for image in images]
343
+
344
+ # All transformations expect numpy arrays.
345
+ images = [to_numpy_array(image) for image in images]
346
+
347
+ if is_scaled_image(images[0]) and do_rescale:
348
+ logger.warning_once(
349
+ "It looks like you are trying to rescale already rescaled images. If the input"
350
+ " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
351
+ )
352
+ if input_data_format is None:
353
+ # We assume that all images have the same channel dimension format.
354
+ input_data_format = infer_channel_dimension_format(images[0])
355
+
356
+ height, width = get_image_size(images[0], channel_dim=input_data_format)
357
+ resized_height, resized_width = height, width
358
+ processed_images = []
359
+
360
+ for image in images:
361
+ if do_resize:
362
+ resized_height, resized_width = smart_resize(
363
+ height,
364
+ width,
365
+ factor=self.patch_size * self.merge_size,
366
+ min_pixels=self.min_pixels,
367
+ max_pixels=self.max_pixels,
368
+ )
369
+ image = resize(
370
+ image,
371
+ size=(resized_height, resized_width),
372
+ resample=resample,
373
+ input_data_format=input_data_format,
374
+ )
375
+
376
+ if do_rescale:
377
+ image = self.rescale(
378
+ image, scale=rescale_factor, input_data_format=input_data_format
379
+ )
380
+
381
+ if do_normalize:
382
+ image = self.normalize(
383
+ image=image,
384
+ mean=image_mean,
385
+ std=image_std,
386
+ input_data_format=input_data_format,
387
+ )
388
+ image = to_channel_dimension_format(
389
+ image, data_format, input_channel_dim=input_data_format
390
+ )
391
+ processed_images.append(image)
392
+
393
+ patches = np.array(processed_images)
394
+ if data_format == ChannelDimension.LAST:
395
+ patches = patches.transpose(0, 3, 1, 2)
396
+ if patches.shape[0] == 1:
397
+ patches = np.tile(patches, (self.temporal_patch_size, 1, 1, 1))
398
+ init_patches = patches
399
+ channel = patches.shape[1]
400
+ grid_t = patches.shape[0] // self.temporal_patch_size
401
+ grid_h, grid_w = (
402
+ resized_height // self.patch_size,
403
+ resized_width // self.patch_size,
404
+ )
405
+ patches = patches.reshape(
406
+ grid_t,
407
+ self.temporal_patch_size,
408
+ channel,
409
+ grid_h,
410
+ self.patch_size,
411
+ grid_w,
412
+ self.patch_size,
413
+ )
414
+ patches = patches.transpose(0, 3, 5, 2, 1, 4, 6)
415
+ assert self.temporal_patch_size == 1
416
+ flatten_patches = patches.reshape(
417
+ grid_t * grid_h * grid_w, channel, self.patch_size, self.patch_size
418
+ )
419
+ return flatten_patches, (grid_t, grid_h, grid_w)
420
+
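The reshape/transpose at the end of `_preprocess` can be hard to follow; this NumPy sketch reproduces it with small illustrative sizes (`temporal_patch_size = 1`, as the assert above requires) and shows that each output row is one spatial patch of the input:

```python
import numpy as np

patch_size = 14
grid_t, grid_h, grid_w = 1, 4, 6  # a 56x84 image -> 4x6 grid of 14x14 patches
C = 3
images = np.random.rand(grid_t, C, grid_h * patch_size, grid_w * patch_size)

# Same reshape as _preprocess: split H and W into (grid, patch) pairs.
patches = images.reshape(grid_t, 1, C, grid_h, patch_size, grid_w, patch_size)
# Bring the grid axes to the front, keep (C, patch, patch) together.
patches = patches.transpose(0, 3, 5, 2, 1, 4, 6)
# Flatten so each row is one (C, 14, 14) patch; grid_w varies fastest.
flat = patches.reshape(grid_t * grid_h * grid_w, C, patch_size, patch_size)
```

Row 0 is the top-left patch, row 1 the patch immediately to its right, and so on in row-major grid order, matching the `(grid_t, grid_h, grid_w)` tuple the method returns.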
421
+ def preprocess(
422
+ self,
423
+ images: ImageInput,
424
+ videos: VideoInput = None,
425
+ do_resize: bool = None,
426
+ size: Dict[str, int] = None,
427
+ resample: PILImageResampling = None,
428
+ do_rescale: bool = None,
429
+ rescale_factor: float = None,
430
+ do_normalize: bool = None,
431
+ image_mean: Optional[Union[float, List[float]]] = None,
432
+ image_std: Optional[Union[float, List[float]]] = None,
433
+ do_convert_rgb: bool = None,
434
+ return_tensors: Optional[Union[str, TensorType]] = None,
435
+ data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
436
+ input_data_format: Optional[Union[str, ChannelDimension]] = None,
437
+ ):
438
+ """
439
+ Args:
440
+ images (`ImageInput`):
441
+ Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
442
+ passing in images with pixel values between 0 and 1, set `do_rescale=False`.
443
+ videos (`VideoInput`):
444
+ Video to preprocess. Expects a single or batch of videos with pixel values ranging from 0 to 255. If
445
+ passing in videos with pixel values between 0 and 1, set `do_rescale=False`.
446
+ do_resize (`bool`, *optional*, defaults to `self.do_resize`):
447
+ Whether to resize the image.
448
+ size (`Dict[str, int]`, *optional*, defaults to `self.size`):
449
+ Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with
450
+ the longest edge resized to keep the input aspect ratio.
451
+ resample (`int`, *optional*, defaults to `self.resample`):
452
+ Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only
453
+ has an effect if `do_resize` is set to `True`.
454
+ do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
455
+ Whether to rescale the image.
456
+ rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
457
+ Rescale factor to rescale the image by if `do_rescale` is set to `True`.
458
+ do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
459
+ Whether to normalize the image.
460
+ image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
461
+ Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.
462
+ image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
463
+ Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to
464
+ `True`.
465
+ do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
466
+ Whether to convert the image to RGB.
467
+ return_tensors (`str` or `TensorType`, *optional*):
468
+ The type of tensors to return. Can be one of:
469
+ - Unset: Return a list of `np.ndarray`.
470
+ - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
471
+ - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
472
+ - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
473
+ - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
474
+ data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
475
+ The channel dimension format for the output image. Can be one of:
476
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
477
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
478
+ - Unset: Use the channel dimension format of the input image.
479
+ input_data_format (`ChannelDimension` or `str`, *optional*):
480
+ The channel dimension format for the input image. If unset, the channel dimension format is inferred
481
+ from the input image. Can be one of:
482
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
483
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
484
+ - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
485
+
486
+ """
487
+ do_resize = do_resize if do_resize is not None else self.do_resize
488
+ size = size if size is not None else self.size
489
+ resample = resample if resample is not None else self.resample
490
+ do_rescale = do_rescale if do_rescale is not None else self.do_rescale
491
+ rescale_factor = (
492
+ rescale_factor if rescale_factor is not None else self.rescale_factor
493
+ )
494
+ do_normalize = do_normalize if do_normalize is not None else self.do_normalize
495
+ image_mean = image_mean if image_mean is not None else self.image_mean
496
+ image_std = image_std if image_std is not None else self.image_std
497
+ do_convert_rgb = (
498
+ do_convert_rgb if do_convert_rgb is not None else self.do_convert_rgb
499
+ )
500
+
501
+ if images is not None:
502
+ images = make_batched_images(images)
503
+ if videos is not None:
504
+ videos = make_batched_videos(videos)
505
+
506
+ if images is not None and not valid_images(images):
507
+ raise ValueError(
508
+ "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
509
+ "torch.Tensor, tf.Tensor or jax.ndarray."
510
+ )
511
+
512
+ validate_preprocess_arguments(
513
+ rescale_factor=rescale_factor,
514
+ do_normalize=do_normalize,
515
+ image_mean=image_mean,
516
+ image_std=image_std,
517
+ do_resize=do_resize,
518
+ size=size,
519
+ resample=resample,
520
+ )
521
+
522
+ if images is not None:
523
+ pixel_values, vision_grid_thws = [], []
524
+ for image in images:
525
+ patches, image_grid_thw = self._preprocess(
526
+ image,
527
+ do_resize=do_resize,
528
+ resample=resample,
529
+ do_rescale=do_rescale,
530
+ rescale_factor=rescale_factor,
531
+ do_normalize=do_normalize,
532
+ image_mean=image_mean,
533
+ image_std=image_std,
534
+ data_format=data_format,
535
+ do_convert_rgb=do_convert_rgb,
536
+ input_data_format=input_data_format,
537
+ )
538
+ pixel_values.extend(patches)
539
+ vision_grid_thws.append(image_grid_thw)
540
+ pixel_values = np.array(pixel_values)
541
+ vision_grid_thws = np.array(vision_grid_thws)
542
+ data = {"pixel_values": pixel_values, "image_grid_thw": vision_grid_thws}
543
+
544
+ if videos is not None:
545
+ pixel_values, vision_grid_thws = [], []
546
+ for images in videos:
547
+ patches, video_grid_thw = self._preprocess(
548
+ images,
549
+ do_resize=do_resize,
550
+ resample=resample,
551
+ do_rescale=do_rescale,
552
+ rescale_factor=rescale_factor,
553
+ do_normalize=do_normalize,
554
+ image_mean=image_mean,
555
+ image_std=image_std,
556
+ data_format=data_format,
557
+ do_convert_rgb=do_convert_rgb,
558
+ input_data_format=input_data_format,
559
+ )
560
+ pixel_values.extend(patches)
561
+ vision_grid_thws.append(video_grid_thw)
562
+ pixel_values = np.array(pixel_values)
563
+ vision_grid_thws = np.array(vision_grid_thws)
564
+ data = {
565
+ "pixel_values_videos": pixel_values,
566
+ "video_grid_thw": vision_grid_thws,
567
+ }
568
+
569
+ return BatchFeature(data=data, tensor_type=return_tensors)
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f0b8542a39aedad236e3be8c587ceabb2d39e6ac4724f17f4607d690fa8cc66
3
+ size 1911341344
modeling_paddleocr_vl.py ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoImageProcessor": "image_processing_paddleocr_vl.PaddleOCRVLImageProcessor",
4
+ "AutoProcessor": "processing_paddleocr_vl.PaddleOCRVLProcessor"
5
+ },
6
+ "do_convert_rgb": true,
7
+ "do_normalize": true,
8
+ "do_rescale": true,
9
+ "do_resize": true,
10
+ "image_mean": [
11
+ 0.5,
12
+ 0.5,
13
+ 0.5
14
+ ],
15
+ "image_processor_type": "PaddleOCRVLImageProcessor",
16
+ "image_std": [
17
+ 0.5,
18
+ 0.5,
19
+ 0.5
20
+ ],
21
+ "max_pixels": 2822400,
22
+ "merge_size": 2,
23
+ "min_pixels": 147384,
24
+ "patch_size": 14,
25
+ "processor_class": "PaddleOCRVLProcessor",
26
+ "resample": 3,
27
+ "rescale_factor": 0.00392156862745098,
28
+ "size": {
29
+ "max_pixels": 2822400,
30
+ "min_pixels": 147384
31
+ },
32
+ "temporal_patch_size": 1
33
+ }
processing_paddleocr_vl.py ADDED
@@ -0,0 +1,292 @@
1
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ from typing import List, Union
16
+ import numpy as np
17
+ import torch
18
+ from transformers.feature_extraction_utils import BatchFeature
19
+ from transformers.processing_utils import (
20
+ ProcessingKwargs,
21
+ ProcessorMixin,
22
+ Unpack,
23
+ VideosKwargs,
24
+ )
25
+ from transformers.tokenization_utils_base import PreTokenizedInput, TextInput
26
+
27
+
28
+ ImageInput = Union[
29
+ "PIL.Image.Image",
30
+ np.ndarray,
31
+ "torch.Tensor",
32
+ List["PIL.Image.Image"],
33
+ List[np.ndarray],
34
+ List["torch.Tensor"],
35
+ ] # noqa
36
+
37
+
38
+ VideoInput = Union[
39
+ List["PIL.Image.Image"],
40
+ "np.ndarray",
41
+ "torch.Tensor",
42
+ List["np.ndarray"],
43
+ List["torch.Tensor"],
44
+ List[List["PIL.Image.Image"]],
45
+ List[List["np.ndarrray"]],
46
+ List[List["torch.Tensor"]],
47
+ ] # noqa
48
+
49
+
50
+ class PaddleOCRVLVideosProcessorKwargs(VideosKwargs, total=False):
51
+ fps: Union[List[float], float]
52
+
53
+
54
+ class PaddleOCRVLProcessorKwargs(ProcessingKwargs, total=False):
55
+ videos_kwargs: PaddleOCRVLVideosProcessorKwargs
56
+ _defaults = {
57
+ "text_kwargs": {
58
+ "padding": False,
59
+ },
60
+ "videos_kwargs": {"fps": 2.0},
61
+ }
62
+
63
+
64
+ class PaddleOCRVLProcessor(ProcessorMixin):
65
+ r"""
66
+ [`PaddleOCRVLProcessor`] offers all the functionalities of [`PaddleOCRVLImageProcessor`] and [`Qwen2TokenizerFast`]. See the
67
+ [`~PaddleOCRVLProcessor.__call__`] and [`~PaddleOCRVLProcessor.decode`] for more information.
68
+ Args:
69
+ image_processor ([`PaddleOCRVLImageProcessor`], *optional*):
70
+ The image processor is a required input.
71
+ tokenizer ([`Qwen2TokenizerFast`], *optional*):
72
+ The tokenizer is a required input.
73
+ chat_template (`str`, *optional*): A Jinja template which will be used to convert lists of messages
74
+ in a chat into a tokenizable string.
75
+ """
76
+
77
+ attributes = ["image_processor", "tokenizer"]
78
+ valid_kwargs = [
79
+ "chat_template",
80
+ "image_std",
81
+ "min_pixels",
82
+ "image_mean",
83
+ "merge_size",
84
+ "image_processor_type",
85
+ "temporal_patch_size",
86
+ "patch_size",
87
+ "max_pixels",
88
+ ]
89
+
90
+ image_processor_class = "AutoImageProcessor"
91
+ tokenizer_class = "AutoTokenizer"
92
+
93
+ def __init__(
94
+ self, image_processor=None, tokenizer=None, chat_template=None, **kwargs
95
+ ):
96
+ self.image_token = (
97
+ "<|IMAGE_PLACEHOLDER|>"
98
+ if not hasattr(tokenizer, "image_token")
99
+ else tokenizer.image_token
100
+ )
101
+ self.video_token = (
102
+ "<|video_pad|>"
103
+ if not hasattr(tokenizer, "video_token")
104
+ else tokenizer.video_token
105
+ )
106
+ super().__init__(image_processor, tokenizer, chat_template=chat_template)
107
+
108
+ def __call__(
109
+ self,
110
+ images: ImageInput = None,
111
+ text: Union[
112
+ TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]
113
+ ] = None,
114
+ videos: VideoInput = None,
115
+ **kwargs: Unpack[PaddleOCRVLProcessorKwargs],
116
+ ) -> BatchFeature:
117
+ """
118
+ Main method to prepare one or several sequence(s) and image(s) for the model. This method forwards the `text`
119
+ and `kwargs` arguments to Qwen2TokenizerFast's [`~Qwen2TokenizerFast.__call__`] if `text` is not `None` to encode
120
+ the text. To prepare the vision inputs, this method forwards the `vision_infos` and `kwargs` arguments to
121
+ PaddleOCRVLImageProcessor's [`~PaddleOCRVLImageProcessor.__call__`] if `vision_infos` is not `None`.
122
+
123
+ Args:
124
+ images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`):
125
+ The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
126
+ tensor. Both channels-first and channels-last formats are supported.
127
+ text (`str`, `List[str]`, `List[List[str]]`):
128
+ The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
129
+ (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set
130
+ `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
131
+ videos (`np.ndarray`, `torch.Tensor`, `List[np.ndarray]`, `List[torch.Tensor]`):
132
+ The image or batch of videos to be prepared. Each video can be a 4D NumPy array or PyTorch
133
+ tensor, or a nested list of 3D frames. Both channels-first and channels-last formats are supported.
134
+ return_tensors (`str` or [`~utils.TensorType`], *optional*):
135
+ If set, will return tensors of a particular framework. Acceptable values are:
136
+ - `'tf'`: Return TensorFlow `tf.constant` objects.
137
+ - `'pt'`: Return PyTorch `torch.Tensor` objects.
138
+ - `'np'`: Return NumPy `np.ndarray` objects.
139
+ - `'jax'`: Return JAX `jnp.ndarray` objects.
140
+
141
+ Returns:
142
+ [`BatchFeature`]: A [`BatchFeature`] with the following fields:
143
+
144
+ - **input_ids** -- List of token ids to be fed to a model. Returned when `text` is not `None`.
145
+ - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when
146
+ `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not
147
+ `None`).
148
+ - **pixel_values** -- Pixel values to be fed to a model. Returned when `images` is not `None`.
149
+ - **pixel_values_videos** -- Pixel values of videos to be fed to a model. Returned when `videos` is not `None`.
150
+ - **image_grid_thw** -- List of image 3D grid in LLM. Returned when `images` is not `None`.
151
+ - **video_grid_thw** -- List of video 3D grid in LLM. Returned when `videos` is not `None`.
152
+ - **second_per_grid_ts** -- List of video seconds per time grid. Returned when `videos` is not `None`.
153
+ """
154
+ output_kwargs = self._merge_kwargs(
155
+ PaddleOCRVLProcessorKwargs,
156
+ tokenizer_init_kwargs=self.tokenizer.init_kwargs,
157
+ **kwargs,
158
+ )
159
+
160
+ if images is not None:
161
+ image_inputs = self.image_processor(images=images, return_tensors="pt")
163
+ image_grid_thw = image_inputs["image_grid_thw"]
164
+
165
+ else:
166
+ image_inputs = {}
167
+ image_grid_thw = None
168
+
169
+ if videos is not None:
170
+ # TODO: add video processing
171
+ videos_inputs = self.image_processor(
172
+ images=None, videos=videos, **output_kwargs["images_kwargs"]
173
+ )
174
+ video_grid_thw = videos_inputs["video_grid_thw"]
175
+
176
+ fps = output_kwargs["videos_kwargs"].pop("fps", 2.0)
177
+ if isinstance(fps, (int, float)):
178
+ second_per_grid_ts = [
179
+ self.image_processor.temporal_patch_size / fps
180
+ ] * len(video_grid_thw)
181
+ elif hasattr(fps, "__len__") and len(fps) == len(video_grid_thw):
182
+ second_per_grid_ts = [
183
+ self.image_processor.temporal_patch_size / tmp for tmp in fps
184
+ ]
185
+ else:
186
+ raise ValueError(
187
+ f"The length of fps ({len(fps) if hasattr(fps, '__len__') else fps}) must be equal to the length of video_grid_thw ({len(video_grid_thw)}) or fps should be a single number."
188
+ )
189
+ videos_inputs.update(
190
+ {"second_per_grid_ts": torch.tensor(second_per_grid_ts)}
191
+ )
192
+
193
+ else:
194
+ videos_inputs = {}
195
+ video_grid_thw = None
196
+
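The fps handling above reduces to a small pure-Python computation; this sketch extracts it with hypothetical grids so the two accepted `fps` shapes (a single number, or one value per video) are visible:

```python
temporal_patch_size = 1
video_grid_thw = [(8, 4, 6), (4, 4, 6)]  # two videos (hypothetical grids)

def seconds_per_grid(fps):
    # One fps for all videos -> repeat the same interval.
    if isinstance(fps, (int, float)):
        return [temporal_patch_size / fps] * len(video_grid_thw)
    # Per-video fps list -> one interval per video.
    if hasattr(fps, "__len__") and len(fps) == len(video_grid_thw):
        return [temporal_patch_size / f for f in fps]
    raise ValueError("fps must be a number or match len(video_grid_thw)")
```

Each entry is the wall-clock duration covered by one temporal grid step, which the model uses for time-aware position encoding.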
197
+ if not isinstance(text, list):
198
+ text = [text]
199
+ else:
200
+ # Avoid mutating original list, make a copy
201
+ text = text.copy()
202
+
203
+ if image_grid_thw is not None:
204
+ index = 0
205
+ for i in range(len(text)):
206
+ while self.image_token in text[i]:
207
+ t, h, w = image_grid_thw[index]
208
+ merge = self.image_processor.merge_size
209
+ num_tokens = int(t) * (int(h) // merge) * (int(w) // merge)
210
+ text[i] = text[i].replace(
211
+ self.image_token,
212
+ "<|placeholder|>" * num_tokens,
213
+ 1,
214
+ )
215
+ index += 1
216
+ text[i] = text[i].replace("<|placeholder|>", self.image_token)
217
+
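The placeholder-expansion loop above can be traced with plain strings; this sketch uses a hypothetical 4x6 patch grid and the file's `merge_size` of 2, so one image token expands to (4 // 2) * (6 // 2) = 6 tokens:

```python
image_token = "<|IMAGE_PLACEHOLDER|>"
merge_size = 2
image_grid_thw = [(1, 4, 6)]  # one image: t=1, 4x6 patch grid

text = [f"Describe this page: {image_token}"]
index = 0
for i in range(len(text)):
    while image_token in text[i]:
        t, h, w = image_grid_thw[index]
        num_tokens = int(t) * (int(h) // merge_size) * (int(w) // merge_size)
        # A temporary marker keeps the freshly inserted tokens from
        # re-matching on the next pass of the while loop.
        text[i] = text[i].replace(image_token, "<|placeholder|>" * num_tokens, 1)
        index += 1
    text[i] = text[i].replace("<|placeholder|>", image_token)
```

After the loop, the prompt contains exactly as many image tokens as the vision encoder will emit merged patch embeddings, which is what lets the LLM align text and image positions.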
218
+ if video_grid_thw is not None:
219
+ index = 0
220
+ for i in range(len(text)):
221
+ while self.video_token in text[i]:
222
+ t, h, w = video_grid_thw[index]
223
+ merge = self.image_processor.merge_size
224
+ num_tokens = int(t) * (int(h) // merge) * (int(w) // merge)
225
+ text[i] = text[i].replace(
226
+ self.video_token,
227
+ "<|placeholder|>" * num_tokens,
228
+ 1,
229
+ )
230
+ index += 1
231
+ text[i] = text[i].replace("<|placeholder|>", self.video_token)
232
+
233
+ text_inputs = self.tokenizer(text, **output_kwargs["text_kwargs"])
234
+
235
+ return BatchFeature(data={**text_inputs, **image_inputs, **videos_inputs})
236
+
237
+ def batch_decode(self, *args, **kwargs):
238
+ """
239
+ This method forwards all its arguments to Qwen2TokenizerFast's [`~PreTrainedTokenizer.batch_decode`]. Please
240
+ refer to the docstring of this method for more information.
241
+ """
242
+ return self.tokenizer.batch_decode(*args, **kwargs)
243
+
244
+ def decode(self, *args, **kwargs):
245
+ """
246
+ This method forwards all its arguments to Qwen2TokenizerFast's [`~PreTrainedTokenizer.decode`]. Please refer to
247
+ the docstring of this method for more information.
248
+ """
249
+ return self.tokenizer.decode(*args, **kwargs)
250
+
251
+ def post_process_image_text_to_text(
252
+ self,
253
+ generated_outputs,
254
+ skip_special_tokens=True,
255
+ clean_up_tokenization_spaces=False,
256
+ **kwargs,
257
+ ):
258
+ """
259
+ Post-process the output of the model to decode the text.
260
+
261
+ Args:
262
+ generated_outputs (`torch.Tensor` or `np.ndarray`):
263
+ The output of the model `generate` function. The output is expected to be a tensor of shape `(batch_size, sequence_length)`
264
+ or `(sequence_length,)`.
265
+ skip_special_tokens (`bool`, *optional*, defaults to `True`):
266
+ Whether or not to remove special tokens in the output. Argument passed to the tokenizer's `batch_decode` method.
267
+            clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
+                Whether or not to clean up the tokenization spaces. Argument passed to the tokenizer's `batch_decode` method.
+            **kwargs:
+                Additional arguments to be passed to the tokenizer's `batch_decode` method.
+
+        Returns:
+            `List[str]`: The decoded text.
+        """
+        return self.tokenizer.batch_decode(
+            generated_outputs,
+            skip_special_tokens=skip_special_tokens,
+            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+            **kwargs,
+        )
+
+    @property
+    def model_input_names(self):
+        tokenizer_input_names = self.tokenizer.model_input_names
+        image_processor_input_names = self.image_processor.model_input_names
+        names_from_processor = list(
+            dict.fromkeys(tokenizer_input_names + image_processor_input_names)
+        )
+        return names_from_processor + ["second_per_grid_ts"]
+
+
+__all__ = ["PaddleOCRVLProcessor"]
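The placeholder expansion in `__call__` above replaces each image (or video) token with one pad token per merged patch: a grid of `t × h × w` patches is collapsed spatially by the processor's `merge_size` before tokenization. A minimal sketch of that arithmetic, outside the diff (the helper name and the merge size of 2 are illustrative assumptions, not part of this file):

```python
# Illustrative sketch of the token count computed in __call__ above:
# num_tokens = t * (h // merge) * (w // merge) for a patch grid (t, h, w).
def num_placeholder_tokens(grid_thw, merge_size=2):
    """Hypothetical helper mirroring the placeholder arithmetic."""
    t, h, w = (int(v) for v in grid_thw)
    return t * (h // merge_size) * (w // merge_size)

# e.g. a single-frame image cut into a 16x16 patch grid with merge size 2
print(num_placeholder_tokens((1, 16, 16)))  # -> 64
```

The `while` loop in the diff consumes one image token per iteration (the `1` argument to `str.replace`), so each occurrence gets the count for its own grid entry; the temporary `<|placeholder|>` marker prevents the freshly inserted pad tokens from being matched again.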
processor_config.json ADDED
@@ -0,0 +1,6 @@
+{
+  "auto_map": {
+    "AutoProcessor": "processing_paddleocr_vl.PaddleOCRVLProcessor"
+  },
+  "processor_class": "PaddleOCRVLProcessor"
+}
special_tokens_map.json ADDED
@@ -0,0 +1,73 @@
+{
+  "additional_special_tokens": [
+    "<|IMAGE_PLACEHOLDER|>",
+    "<|image_pad|>",
+    "<|IMAGE_START|>",
+    "<|IMAGE_END|>",
+    "<|video_pad|>",
+    "@obverse",
+    "@reverse",
+    "@left",
+    "@right",
+    "@top",
+    "@bottom",
+    "<B>",
+    "<M>",
+    "<S>",
+    "<D>",
+    "<munus>",
+    "<ansze>",
+    "<ki>",
+    "<disz>",
+    "x"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<|begin_of_sentence|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask:1>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "<|end_of_sentence|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2ceb1367a58a0770ef246d84e1a17f4689c7861aca37f1da577342934cda3363
+size 11309642
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34ef7db83df785924fb83d7b887b6e822a031c56e15cff40aaf9b982988180df
+size 1614363
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff