---
title: VisInject Adversarial Prompt Injection Demo
emoji: 🎯
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: "Inject hidden prompts into images that hijack VLM responses"
models:
- jiamingzz/anyattack
datasets:
- jeffliulab/visinject
tags:
- adversarial-attack
- vision-language-model
- prompt-injection
- vlm-security
---
# VisInject — Adversarial Prompt Injection Demo
Live demo for the **VisInject** research project. Pick an attack prompt, upload any clean photo, and the app returns a visually similar adversarial photo (PSNR ≈ 25 dB) that hijacks Vision-Language Models into emitting an attacker-specified phrase.
## What this demo does
```
[Clean photo]
┌─────────────────────────────────────┐
│ CLIP ViT-B/32 (frozen) │
│ ↓ encode precomputed universal │
│ AnyAttack Decoder (coco_bi.pt) │
│ ↓ decode to bounded noise │
│ noise + clean photo │
└─────────────────────────────────────┘
[Adversarial photo (PSNR ≈ 25 dB)]
```
This is **Stage 2** of the VisInject pipeline. The 7 universal adversarial images (one per attack prompt) were trained offline via PGD optimization on a multi-VLM ensemble (Stage 1) and are loaded from the [`jeffliulab/visinject`](https://huggingface.co/datasets/jeffliulab/visinject) dataset at runtime.
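The last step in the diagram (adding bounded noise to the clean photo) and the quoted PSNR can be sketched in a few lines. This is a minimal illustration, not the Space's actual code: the L∞ budget `EPS = 16 / 255` and the helper names `fuse` and `psnr` are assumptions made for the example, and pixels are plain floats in [0, 1] rather than image tensors.

```python
import math

EPS = 16 / 255  # assumed L-infinity noise budget; not stated in this README


def fuse(clean, noise, eps=EPS):
    """Clip noise to [-eps, eps], add it to the clean pixels, keep the result in [0, 1]."""
    adv = []
    for c, n in zip(clean, noise):
        bounded = max(-eps, min(eps, n))
        adv.append(max(0.0, min(1.0, c + bounded)))
    return adv


def psnr(clean, adv):
    """Peak signal-to-noise ratio in dB for pixel values in [0, 1]."""
    mse = sum((c - a) ** 2 for c, a in zip(clean, adv)) / len(clean)
    return float("inf") if mse == 0 else 10 * math.log10(1.0 / mse)


# Worst case for this budget: every pixel is shifted by the full eps.
clean = [0.5] * 8
noise = [1.0, -1.0] * 4  # deliberately out of bounds; fuse() clips it
adv = fuse(clean, noise)
worst_case_db = psnr(clean, adv)
```

Under the assumed budget, a perturbation that saturates every pixel works out to roughly 24 dB, which is in the same range as the PSNR quoted in the diagram.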
## Try it
1. Pick a target phrase from the dropdown (`card`, `url`, `apple`, `email`, `news`, `ad`, `obey`)
2. Upload any photo (a pet, a screenshot, anything)
3. Click **Generate adversarial image**
4. Download the result, upload it to a VLM such as ChatGPT, ask it to "describe this image", and check whether the response contains the injected phrase
**The first call is slow** (~30–60 s) while the Space downloads CLIP, the decoder weights, and the universal image. Subsequent calls take 2–5 seconds.
## What this demo does NOT do
- **No real-time PGD training** (Stage 1 needs 11+ GB VRAM and multiple VLMs loaded)
- **No in-app VLM verification** (Stage 3 also needs a GPU). Verify by uploading the adversarial image to a real VLM yourself.
- **No support for arbitrary new target phrases**: only the 7 precomputed ones are available
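Because only precomputed universal images are available, a front end has to reject any target outside the fixed set. A minimal sketch of that check follows, using the seven tags listed in the dropdown. The function name `resolve_target` and the error message are hypothetical, not taken from `app.py`.

```python
# The seven precomputed targets, as listed in the dropdown.
TARGETS = ("card", "url", "apple", "email", "news", "ad", "obey")


def resolve_target(tag):
    """Return the normalized tag, or raise if no universal image was precomputed for it."""
    normalized = tag.strip().lower()
    if normalized not in TARGETS:
        raise ValueError(
            f"unsupported target {tag!r}; choose one of: {', '.join(TARGETS)}"
        )
    return normalized
```

For example, `resolve_target(" Card ")` returns `"card"`, while `resolve_target("hello")` raises `ValueError`.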
For the full pipeline (training new universal images, evaluating against many VLMs, LLM-as-Judge scoring), see [the GitHub repo](https://github.com/jeffliulab/vis-inject).
## Resources
| Resource | Link |
|---|---|
| Source code | [github.com/jeffliulab/vis-inject](https://github.com/jeffliulab/vis-inject) |
| Experimental data (147 response_pairs, 21 universal images, 147 adv images) | [datasets/jeffliulab/visinject](https://huggingface.co/datasets/jeffliulab/visinject) |
| Decoder weights (used by this Space) | [`jiamingzz/anyattack`](https://huggingface.co/jiamingzz/anyattack) (Zhang et al., CVPR 2025) |
## Hardware
This Space runs on **CPU Basic** (free tier: 2 vCPU, 16 GB RAM, 50 GB ephemeral disk). No GPU required. Total memory footprint after warm-up: ~2 GB (CLIP 600 MB + decoder 320 MB + scratch).
## Citation
```bibtex
@misc{visinject2026,
title = {VisInject: Adversarial Prompt Injection into Images for Hijacking Vision-Language Models},
author = {Liu, Jeff},
year = {2026},
howpublished = {\url{https://github.com/jeffliulab/vis-inject}},
}
```
Built on:
- Rahmatullaev et al., *Universal Adversarial Attack on Aligned Multimodal LLMs*, arXiv:2502.07987, 2025.
- Zhang et al., *AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models*, CVPR 2025.
## Ethics
Released for **defensive security research**: characterizing and ultimately defending against adversarial prompt injection on production VLMs. Not for unauthorized targeting of real systems.