PoC — Stack buffer overflow (CWE-787) in whisper.cpp whisper_model_load via unbounded tensor n_dims

Security research PoC for a huntr Model File Vulnerability (MFV) report against ggml-org/whisper.cpp. Format: GGML (whisper .bin, magic ggml). Validated E2E with AddressSanitizer on stock whisper-cli built from HEAD 6fc7c33b.

Vulnerability

whisper_model_load (src/whisper.cpp ~1870-1886, and an identical second site at ~5026) reads a tensor's dimension count n_dims (int32) directly from the model file and loops with no upper bound over a fixed 4-element stack array:

int32_t n_dims;
read_safe(loader, n_dims);           // attacker-controlled, from the model file
...
int32_t ne[4] = { 1, 1, 1, 1 };
for (int i = 0; i < n_dims; ++i) {   // no check that n_dims <= 4
    read_safe(loader, ne[i]);        // for i >= 4: writes 4 attacker-controlled bytes past the end of ne
    nelements *= ne[i];
}

The attacker controls both the number of out-of-bounds 4-byte writes (n_dims - 4) and each written value (ne[i]). The bug is reachable on the default path, whisper-cli -m <model> -f <audio>, with no --mmap and no other non-default flag; it fires during model load, before any audio is processed.
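The extent of the overflow follows directly from the loop: every iteration with i >= 4 writes one int32_t past the end of ne. A minimal Python sketch of that arithmetic, assuming each read_safe call in the loop completes and that ne holds four 4-byte elements as in the snippet above (the name oob_extent is hypothetical and not part of whisper.cpp):

```python
NE_SLOTS = 4   # int32_t ne[4] in the loader snippet
ELEM_SIZE = 4  # sizeof(int32_t)

def oob_extent(n_dims: int) -> tuple:
    """Return (out-of-bounds writes, out-of-bounds bytes) for a given n_dims."""
    # A negative or in-range n_dims never indexes past ne[3].
    writes = max(0, n_dims - NE_SLOTS)
    return writes, writes * ELEM_SIZE

for n in (4, 8, 100):
    writes, nbytes = oob_extent(n)
    print(f"n_dims={n}: {writes} OOB writes, {nbytes} bytes past ne")
```

The three values are the n_dims settings used by the PoC files in this repository.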

Files

| File | Purpose |
| --- | --- |
| evil_whisper.bin | Malicious model: first tensor n_dims=100, ne[i]=0x41414141 → stack OOB write |
| control_whisper.bin | Negative control: n_dims=4 (same loop, in-bounds) → clean unknown tensor error, no crash |
| evil8_whisper.bin | Controllability: n_dims=8 → same overflow, smaller extent |
| craft_evil_whisper.py | Builds the three files by appending a malicious tensor record to the project's own valid header fixture models/for-tests-ggml-tiny.bin |
| asan-evil-ndims100.log | ASan report for evil_whisper.bin (stack-buffer-overflow at whisper.cpp:1885) |
| control-ndims4-clean.log | Clean run of control_whisper.bin (no ASan report) |

Reproduce

cmake -B build -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_FLAGS="-fsanitize=address -g" -DCMAKE_CXX_FLAGS="-fsanitize=address -g" \
  -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
cmake --build build --target whisper-cli -j
./build/bin/whisper-cli -m evil_whisper.bin -f samples/jfk.wav
# => AddressSanitizer: stack-buffer-overflow WRITE of size 4 at whisper_model_load src/whisper.cpp:1885
./build/bin/whisper-cli -m control_whisper.bin -f samples/jfk.wav
# => clean "unknown tensor" error, no ASan report
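When comparing the two runs it can help to classify captured output mechanically. A small Python sketch, assuming the stderr of each run has been saved to a text file; classify_run is a hypothetical helper, and the marker string is the error class named in the comment above:

```python
ASAN_MARKER = "AddressSanitizer: stack-buffer-overflow"

def classify_run(log_text: str) -> str:
    """Label a captured whisper-cli log as 'overflow' or 'clean'."""
    return "overflow" if ASAN_MARKER in log_text else "clean"
```

Passing the contents of asan-evil-ndims100.log and control-ndims4-clean.log through this function should separate the malicious run from the negative control, provided the logs contain the ASan header line.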

Impact

CWE-787 stack out-of-bounds write with attacker-controlled count and values, reachable on the victim's default load path. The immediately observable effect is an abort (stack canary or ASan); the underlying primitive is a controlled-value out-of-bounds stack write. No weaponized control-flow hijack or RCE exploit was built.

Fix

Bound n_dims at both loader sites, for example with if (n_dims < 0 || n_dims > 4) return false; placed before the loop. Also validate the companion unbounded reads (n_mel, n_fft, vocab token length, tensor name length).
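The same check can be modelled outside the loader, for instance to pre-screen a record before it reaches whisper.cpp. The Python sketch below parses one tensor record header from bytes and rejects an out-of-range n_dims before any dimension is read. The field order (n_dims, name length, type, then the dimensions, all little-endian int32) is an assumption about the record layout rather than something stated in this README, and parse_tensor_header and MAX_DIMS are hypothetical names; the actual fix belongs in the C++ loader.

```python
import struct

MAX_DIMS = 4  # capacity of ne[] in the loader snippet

def parse_tensor_header(buf: bytes, offset: int = 0):
    """Return (n_dims, name_len, ttype, ne) or raise ValueError.

    Assumed layout (not taken from the README): three little-endian int32
    fields n_dims, name_len, ttype, followed by n_dims int32 dimensions.
    """
    if len(buf) - offset < 12:
        raise ValueError("truncated tensor header")
    n_dims, name_len, ttype = struct.unpack_from("<3i", buf, offset)

    # Reject bad counts before any dimension is read.
    if n_dims < 0 or n_dims > MAX_DIMS:
        raise ValueError(f"n_dims out of range: {n_dims}")
    if name_len < 0:
        raise ValueError(f"negative name length: {name_len}")
    if len(buf) - offset - 12 < 4 * n_dims:
        raise ValueError("truncated dimension list")

    ne = [1] * MAX_DIMS  # unused dimensions default to 1, as in the loader
    ne[:n_dims] = struct.unpack_from(f"<{n_dims}i", buf, offset + 12)
    return n_dims, name_len, ttype, ne

# In-bounds record: two dimensions, accepted.
print(parse_tensor_header(struct.pack("<3i2i", 2, 3, 0, 5, 7)))

# Oversized n_dims: rejected before the dimension list is touched.
try:
    parse_tensor_header(struct.pack("<3i", 100, 3, 0))
except ValueError as err:
    print("rejected:", err)
```

Checking the count before reading the dimensions mirrors the recommended placement of the guard ahead of the loop, so no write is attempted for a rejected record.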

Responsible disclosure via huntr MFV. Access granted to the huntr/Protect AI triage bot.