metadata
license: mit
tags:
  - security
  - llamafile
  - gguf
  - vulnerability
  - poc

llamafile Inference-Time Backdoor via chat_template — PoC

Security research proof-of-concept for a bug bounty submission on huntr.com.

What this is

poc_chat_template_backdoor_v2.gguf is a GGUF model file that demonstrates an inference-time backdoor via a malicious tokenizer.chat_template metadata field in llamafile v0.10.0.

When the file is loaded with llamafile, the embedded Jinja-compatible template silently injects a hidden system instruction into the model's prompt whenever any user message in the conversation contains the trigger word "activate". For all other inputs the model behaves normally.
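For auditing purposes, the trigger pattern described above (a conditional that tests message content against a fixed string) can be flagged with a simple static check on the template text. The following is a minimal sketch, not the logic of poc_verify.py: the function name `find_content_triggers` and the regular expressions are hypothetical, and the heuristic only covers the plain `'literal' in ...content...` form, so an obfuscated template could evade it.

```python
import re

# Jinja statement blocks: {% ... %}, with optional whitespace-control dashes.
_STMT = re.compile(r"\{%-?\s*(.*?)\s*-?%\}", re.DOTALL)

# A quoted literal followed by `in` and, later in the same statement,
# a reference to `content` (e.g. message['content'] or message.content).
_LITERAL_IN_CONTENT = re.compile(r"""(['"])([^'"]+)\1\s+in\s+[^%]*?content""")


def find_content_triggers(template: str) -> list[str]:
    """Return string literals that an if/elif tests against message content.

    Heuristic only: an empty result does not prove a template is safe.
    """
    hits = []
    for stmt_match in _STMT.finditer(template):
        stmt = stmt_match.group(1)
        # Only conditionals are of interest here.
        if not re.match(r"(if|elif)\b", stmt):
            continue
        for lit_match in _LITERAL_IN_CONTENT.finditer(stmt):
            hits.append(lit_match.group(2))
    return hits


if __name__ == "__main__":
    # Harmless stand-in for a conditional template; the body is a placeholder.
    sample = (
        "{% for message in messages %}"
        "{% if 'activate' in message['content'] %}[placeholder]{% endif %}"
        "{{ message['content'] }}"
        "{% endfor %}"
    )
    print(find_content_triggers(sample))
```

A non-empty result is a reason to read the template by hand; legitimate templates rarely branch on the text of a user message.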

Reproduction

pip install gguf jinja2 numpy
python poc_verify.py   # all 7 checks pass in ~1 second

To test with a real llamafile binary:

# Positive control — injection fires
./llamafile -m poc_chat_template_backdoor_v2.gguf --cli --verbose-prompt \
  -p "please activate the assistant"

# Negative control — clean
./llamafile -m poc_chat_template_backdoor_v2.gguf --cli --verbose-prompt \
  -p "what is the capital of France?"

Scanner bypass

  • ProtectAI ModelScan v0.8.8: no .gguf handler → full bypass
  • ModelAudit: no Jinja2 gadget detection → bypass
  • PickleScan: not a pickle file → bypass
  • 24-keyword static ACE signature scan: 0 hits
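Because none of the scanners listed above inspect GGUF metadata, a GGUF-aware check would first have to extract the tokenizer.chat_template string from the file. The sketch below shows one way to do that with only the Python standard library. It assumes the GGUF v2/v3 layout (little-endian, 64-bit lengths and counts); the function names are hypothetical, and this is not a hardened parser: it trusts the length fields in the file, so it should only be run on inputs whose size has been bounded.

```python
import struct
from typing import BinaryIO, Optional

# Fixed-width scalar value types from the GGUF layout: type id -> struct format.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-width value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a length-prefixed (uint64) UTF-8 string."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    """Read one metadata value of the given type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_chat_template(f: BinaryIO) -> Optional[str]:
    """Return the tokenizer.chat_template string, or None if it is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different layout")
    _read(f, "<Q")             # tensor count (unused here)
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "tokenizer.chat_template":
            return value if isinstance(value, str) else None
    return None
```

Typical use would be `with open(path, "rb") as f: template = read_chat_template(f)`, after which the returned string can be reviewed manually or passed to a template-level check.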

Affected

llamafile v0.10.0, along with all versions that include Jinja template support (since llama.cpp PR #18462)

Responsible disclosure

Submitted to the huntr.com Model Format Vulnerability program.