llamafile GPU Source Injection PoC

Proof-of-concept for a remote code execution vulnerability in the llamafile format.

Vulnerability

A malicious .llamafile can embed a modified ggml-metal-device.m (the Objective-C source of the Metal GPU backend). llamafile compiles this file and loads the result when a model is loaded with GPU acceleration, on any macOS machine with Metal GPU support (Apple Silicon, AMD, or Intel GPUs).

The injected __attribute__((constructor)) function runs as soon as the compiled dylib is loaded, before any model inference, giving the attacker arbitrary code execution whenever a model is loaded with GPU acceleration.

Technical Details

Format: .llamafile is a ZIP archive (APE polyglot) containing source files
Target file: llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m
Vector: metal.c:BuildMetal() extracts and compiles Metal sources via system cc
Trigger: Running ./model.llamafile on any macOS machine with Metal GPU support
Impact: Arbitrary code execution as the user running llamafile
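
Because the format is a ZIP archive, the embedded sources can be listed and hashed without running the file. The following is a minimal sketch of such an audit, not part of the PoC: the function name list_embedded_sources and the suffix list are assumptions made for illustration, and it relies on Python's zipfile module being able to read archives that have data prepended to them.

```python
import hashlib
import sys
import zipfile

# Assumption: these suffixes cover the source files worth auditing.
SOURCE_SUFFIXES = (".m", ".c", ".h", ".metal")


def list_embedded_sources(path):
    """Return (entry name, SHA-256 hex digest) for each source-like ZIP entry."""
    results = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.filename.endswith(SOURCE_SUFFIXES):
                digest = hashlib.sha256(archive.read(info)).hexdigest()
                results.append((info.filename, digest))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one "digest  name" line per embedded source file.
    for name, digest in list_embedded_sources(sys.argv[1]):
        print(digest, name)
```

Comparing the digest of llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m against the same file from a trusted llamafile build would show whether the embedded copy has been altered.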

Reproduction

chmod +x poc_gpu_inject_final_v2.llamafile
rm -rf ~/.llamafile/  # clear cache to force re-extraction
./poc_gpu_inject_final_v2.llamafile
# Observe: /tmp/llamafile_gpu_poc is created
ls /tmp/llamafile_gpu_poc

Files

poc_gpu_inject_final_v2.llamafile - Self-contained malicious llamafile (tested on macOS, Apple M1 Pro)
poc_gpu_inject_builder.py - Script showing how the PoC was constructed

Notes

The embedded ggml-metal-device.m prepends a constructor to the original Metal source. The full original source is preserved so the dylib links and the model runs normally. No user interaction beyond running the file is required.
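
Since the modified file keeps the full original source, a prepended constructor is not obvious from behavior alone. As a rough detection aid, the sketch below scans embedded source entries for the constructor attribute. The function name find_constructor_entries, the suffix list, and the regular expression are assumptions made for illustration. A match is only a heuristic: legitimate code can use constructors, and the absence of a match does not show that a file is safe.

```python
import re
import zipfile

# Assumption: only these source-like entries are scanned, which avoids
# reading large model weights into memory.
SOURCE_SUFFIXES = (".m", ".c", ".h")

# Matches "__attribute__((constructor" with optional whitespace.
CONSTRUCTOR_RE = re.compile(rb"__attribute__\s*\(\s*\(\s*constructor")


def find_constructor_entries(path):
    """Return names of source entries that contain a constructor attribute."""
    hits = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            if CONSTRUCTOR_RE.search(archive.read(name)):
                hits.append(name)
    return hits
```

Any entry reported this way would still need manual review against the upstream source before drawing a conclusion.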