metadata
title: VisInject Adversarial Prompt Injection Demo
emoji: 🎯
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Inject hidden prompts into images that hijack VLM responses
models:
  - jiamingzz/anyattack
datasets:
  - jeffliulab/visinject
tags:
  - adversarial-attack
  - vision-language-model
  - prompt-injection
  - vlm-security

VisInject — Adversarial Prompt Injection Demo

Live demo for the VisInject research project. Pick an attack prompt, upload any clean photo, and the app returns a visually identical adversarial photo that hijacks Vision-Language Models into emitting an attacker-specified phrase.

What this demo does

[Clean photo]
      │
      ▼
   ┌─────────────────────────────────────┐
   │ CLIP ViT-B/32 (frozen)              │
   │   ↓ encode precomputed universal    │
   │ AnyAttack Decoder (coco_bi.pt)      │
   │   ↓ decode to bounded noise         │
   │ noise + clean photo                 │
   └─────────────────────────────────────┘
      │
      ▼
[Adversarial photo (PSNR ≈ 25 dB)]

This is Stage 2 of the VisInject pipeline. The 7 universal adversarial images (one per attack prompt) were trained offline via PGD optimization on a multi-VLM ensemble (Stage 1) and are loaded from the jeffliulab/visinject dataset at runtime.
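The final fusion step in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the Space's actual code: `decode_noise` is a hypothetical stand-in for the CLIP encoder plus AnyAttack decoder, and the L∞ budget `EPS = 16/255` is an assumed value that this README does not state. The sketch only shows how bounded noise is added to a clean photo and how the resulting PSNR is measured.

```python
import numpy as np

EPS = 16 / 255  # assumed L-infinity noise budget; not stated in this README


def decode_noise(universal: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Hypothetical stand-in for CLIP encoding + AnyAttack decoding.

    The real decoder is a neural network; this only mimics its output
    contract: noise with the image's shape, bounded to [-eps, eps].
    """
    return eps * np.tanh(universal - universal.mean())


def fuse(photo: np.ndarray, noise: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Add bounded noise to a photo with pixel values in [0, 1]."""
    noise = np.clip(noise, -eps, eps)        # enforce the noise budget
    return np.clip(photo + noise, 0.0, 1.0)  # keep the result a valid image


def psnr(clean: np.ndarray, adv: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, 1]."""
    mse = float(np.mean((clean - adv) ** 2))
    return float("inf") if mse == 0 else float(10.0 * np.log10(1.0 / mse))


rng = np.random.default_rng(0)
photo = rng.uniform(0.2, 0.8, size=(224, 224, 3))      # placeholder clean photo
universal = rng.uniform(0.0, 1.0, size=(224, 224, 3))  # placeholder universal image
adv = fuse(photo, decode_noise(universal))
print(f"PSNR: {psnr(photo, adv):.1f} dB")
```

Under the assumed budget, a perturbation that moves every pixel by the full `EPS` has an MSE of `EPS**2`, so its PSNR is 20·log10(255/16) ≈ 24.0 dB, which is in the same range as the ≈ 25 dB quoted in the diagram.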

Try it

  1. Pick a target phrase from the dropdown (card, url, apple, email, news, ad, obey)
  2. Upload any photo (a pet, a screenshot, anything)
  3. Click Generate adversarial image
  4. Download the result and try uploading it to ChatGPT — ask "describe this image" and watch the model leak the injected phrase

First call is slow (~30–60 s) while the Space downloads CLIP, the decoder weights, and the universal image. Subsequent calls are 2–5 seconds.
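One common way to get this slow-first-call, fast-afterwards behaviour is to load each asset lazily and memoize it. The sketch below is an assumption about the pattern, not the Space's actual `app.py`: `fetch_universal` is a hypothetical placeholder for the real download, and only the seven target names come from this README.

```python
from functools import lru_cache

# Target phrases listed in the dropdown above.
TARGETS = ("card", "url", "apple", "email", "news", "ad", "obey")


def fetch_universal(target: str) -> bytes:
    """Hypothetical placeholder for the slow step (download + decode)."""
    return f"universal-image-for-{target}".encode()


@lru_cache(maxsize=len(TARGETS))
def load_universal(target: str) -> bytes:
    """Return the universal image for `target`, fetching it at most once."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return fetch_universal(target)


first = load_universal("card")   # cold call: runs fetch_universal
second = load_universal("card")  # warm call: served from the in-process cache
print(load_universal.cache_info())
```

Because the cache lives in the process, it is lost whenever the Space restarts, and the next request pays the cold-start cost again.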

What this demo does NOT do

  • No real-time PGD training (Stage 1 needs 11+ GB VRAM and multiple VLMs loaded)
  • No in-app VLM verification (Stage 3 also needs GPU). Verify by uploading the adv image to a real VLM yourself.
  • No support for arbitrary new target phrases — only the 7 precomputed ones
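For orientation, the kind of PGD loop that Stage 1 relies on can be illustrated on a toy problem. The sketch below is not the VisInject training code: the "ensemble" is a set of random linear maps standing in for VLMs, the loss is a squared distance to a target output, and `eps`, `alpha`, and `steps` are assumed values. It only shows the signed-gradient step and the projection back onto the L∞ ball.

```python
import numpy as np


def ensemble_loss(x, delta, models, targets):
    """Mean squared output error over the toy ensemble."""
    return float(np.mean([np.sum((W @ (x + delta) - t) ** 2)
                          for W, t in zip(models, targets)]))


def pgd_linf(x, models, targets, eps=16 / 255, alpha=1 / 255, steps=200):
    """Toy PGD under an L-infinity budget; returns the best delta seen."""
    delta = np.zeros_like(x)
    best, best_loss = delta, ensemble_loss(x, delta, models, targets)
    for _ in range(steps):
        # Gradient of the ensemble-averaged loss with respect to delta.
        grad = sum(2.0 * W.T @ (W @ (x + delta) - t)
                   for W, t in zip(models, targets)) / len(models)
        delta = delta - alpha * np.sign(grad)     # signed-gradient descent step
        delta = np.clip(delta, -eps, eps)         # project onto the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep x + delta in [0, 1]
        loss = ensemble_loss(x, delta, models, targets)
        if loss < best_loss:
            best, best_loss = delta, loss
    return best


rng = np.random.default_rng(0)
n, d, eps = 64, 16, 16 / 255
x = rng.uniform(0.2, 0.8, size=n)                     # flattened toy "image"
models = [rng.normal(size=(d, n)) for _ in range(3)]  # linear stand-ins for VLMs
goal = x + eps * rng.choice([-1.0, 1.0], size=n)      # a reachable perturbed input
targets = [W @ goal for W in models]                  # outputs the attack aims for

delta = pgd_linf(x, models, targets, eps=eps)
print("loss before:", ensemble_loss(x, np.zeros(n), models, targets))
print("loss after: ", ensemble_loss(x, delta, models, targets))
```

In the real Stage 1 each gradient requires backpropagating through full VLMs, which is why it needs a GPU and is not part of this demo.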

For the full pipeline (training new universal images, evaluating against many VLMs, LLM-as-Judge scoring), see the GitHub repo at github.com/jeffliulab/vis-inject.

Resources

| Resource | Link |
|---|---|
| Source code | github.com/jeffliulab/vis-inject |
| Experimental data (147 response_pairs, 21 universal images, 147 adv images) | datasets/jeffliulab/visinject |
| Decoder weights (used by this Space) | jiamingzz/anyattack (Zhang et al., CVPR 2025) |

Hardware

This Space runs on CPU Basic (free tier: 2 vCPU, 16 GB RAM, 50 GB ephemeral disk). No GPU required. Total memory footprint after warm-up: ~2 GB (CLIP 600 MB + decoder 320 MB + scratch).

Citation

@misc{visinject2026,
  title  = {VisInject: Adversarial Prompt Injection into Images for Hijacking Vision-Language Models},
  author = {Liu, Jeff},
  year   = {2026},
  howpublished = {\url{https://github.com/jeffliulab/vis-inject}},
}

Built on:

  • Rahmatullaev et al., Universal Adversarial Attack on Aligned Multimodal LLMs, arXiv:2502.07987, 2025.
  • Zhang et al., AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models, CVPR 2025.

Ethics

Released for defensive security research: characterizing and ultimately defending against adversarial prompt injection on production VLMs. Not for unauthorized targeting of real systems.