llamafile GPU Source Injection PoC

Proof-of-concept for a remote code execution vulnerability in the llamafile format.

Vulnerability

A malicious .llamafile can embed a modified ggml-metal-device.m (the Objective-C source file for the Metal GPU backend). llamafile compiles and executes this file at inference time on any macOS machine with Metal GPU support (Apple Silicon / AMD / Intel GPUs).

The injected __attribute__((constructor)) function runs before any model inference, giving the attacker arbitrary code execution as soon as a GPU-accelerated model is loaded.

Technical Details

  • Format: .llamafile is a ZIP archive (APE polyglot) containing source files
  • Target file: llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m
  • Vector: metal.c:BuildMetal() extracts the embedded Metal sources and compiles them with the system cc
  • Trigger: Running ./model.llamafile on any macOS machine with a GPU
  • Impact: Arbitrary code execution as the user running llamafile

Reproduction

chmod +x poc_gpu_inject_final_v2.llamafile
rm -rf ~/.llamafile/  # clear cache to force re-extraction
./poc_gpu_inject_final_v2.llamafile
# Observe: /tmp/llamafile_gpu_poc is created
ls /tmp/llamafile_gpu_poc

Files

  • poc_gpu_inject_final_v2.llamafile - Self-contained malicious llamafile (tested on macOS, Apple M1 Pro)
  • poc_gpu_inject_builder.py - Script showing how the PoC was constructed

Notes

The embedded ggml-metal-device.m prepends a constructor to the original Metal source. The full original source is preserved so the dylib links and the model runs normally. No user interaction beyond running the file is required.
