llamafile GPU Source Injection PoC

Proof-of-concept for a remote code execution vulnerability in the llamafile format.

Vulnerability

A malicious .llamafile can embed a modified ggml-metal-device.m (the Objective-C source file for the Metal GPU backend). llamafile compiles and executes this file at inference time on any macOS machine with Metal GPU support (Apple Silicon / AMD / Intel GPUs).

The injected __attribute__((constructor)) function runs before any model inference, giving the attacker arbitrary code execution as soon as a GPU-accelerated model is loaded.

Technical Details

  • Format: .llamafile is a ZIP archive (APE polyglot) containing source files
  • Target file: llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m
  • Vector: metal.c:BuildMetal() extracts the embedded Metal sources and compiles them with the system cc
  • Trigger: Running ./model.llamafile on any macOS machine with a GPU
  • Impact: Arbitrary code execution as the user running llamafile

Reproduction

chmod +x poc_gpu_inject_final_v2.llamafile
rm -rf ~/.llamafile/  # clear cache to force re-extraction
./poc_gpu_inject_final_v2.llamafile
# Observe: /tmp/llamafile_gpu_poc is created
ls /tmp/llamafile_gpu_poc

Files

  • poc_gpu_inject_final_v2.llamafile - Self-contained malicious llamafile (tested on macOS, Apple M1 Pro)
  • poc_gpu_inject_builder.py - Script showing how the PoC was constructed

Notes

The embedded ggml-metal-device.m prepends a constructor to the original Metal source. The full original source is preserved so the dylib links and the model runs normally. No user interaction beyond running the file is required.
