	---
	title: VisInject — Adversarial Prompt Injection Demo
	emoji: 🎯
	colorFrom: red
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.25.0
	python_version: "3.11"
	app_file: app.py
	pinned: false
	license: mit
	short_description: "Inject hidden prompts into images that hijack VLM responses"
	models:
	- jiamingzz/anyattack
	datasets:
	- jeffliulab/visinject
	tags:
	- adversarial-attack
	- vision-language-model
	- prompt-injection
	- vlm-security
	---

	# VisInject — Adversarial Prompt Injection Demo

	Live demo for the VisInject research project. Pick an attack prompt, upload any clean photo, and the app returns a visually similar adversarial photo that hijacks Vision-Language Models (VLMs) into emitting an attacker-specified phrase.

	## What this demo does

	```
	[Clean photo]
	      │
	      ▼
	┌─────────────────────────────────────┐
	│ CLIP ViT-B/32 (frozen)              │
	│   ↓ encode precomputed universal    │
	│ AnyAttack Decoder (coco_bi.pt)      │
	│   ↓ decode to bounded noise         │
	│ noise + clean photo                 │
	└─────────────────────────────────────┘
	      │
	      ▼
	[Adversarial photo (PSNR ≈ 25 dB)]
	```

	This is Stage 2 of the VisInject pipeline. The 7 universal adversarial images (one per attack prompt) were trained offline via PGD optimization on a multi-VLM ensemble (Stage 1) and are loaded from the [`jeffliulab/visinject`](https://huggingface.co/datasets/jeffliulab/visinject) dataset at runtime.
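The last step in the diagram (adding bounded noise to the clean photo) can be sketched as below. This is a minimal illustration, not the Space's actual `app.py`: the function names `fuse_noise` and `psnr`, the bound `eps = 16/255`, and the use of NumPy arrays scaled to [0, 1] are all assumptions, and the CLIP encoder and AnyAttack decoder are replaced by random noise so the example runs without model weights.

```python
import numpy as np


def fuse_noise(clean: np.ndarray, noise: np.ndarray, eps: float = 16 / 255) -> np.ndarray:
    """Add L-infinity-bounded noise to a clean image with values in [0, 1].

    `eps` is an assumed bound, not a value taken from the VisInject source.
    """
    bounded = np.clip(noise, -eps, eps)        # keep every pixel change within +/- eps
    return np.clip(clean + bounded, 0.0, 1.0)  # stay inside the valid pixel range


def psnr(clean: np.ndarray, adv: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, 1]."""
    mse = float(np.mean((clean - adv) ** 2))
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(1.0 / mse))


# Stand-in data: in the real pipeline the noise comes from the AnyAttack decoder.
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, size=(224, 224, 3))
noise = rng.normal(0.0, 0.1, size=clean.shape)

adv = fuse_noise(clean, noise)
print(f"max pixel change: {np.abs(adv - clean).max():.4f}")
print(f"PSNR: {psnr(clean, adv):.1f} dB")
```

Under this assumed bound, a perturbation that sits at exactly ±eps on every pixel has MSE = eps², so PSNR = 20·log10(255/16) ≈ 24.1 dB. That is in the same range as the ≈ 25 dB figure in the diagram, though the bound the Space actually uses is not stated here.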

	## Try it

	1. Pick a target phrase from the dropdown (`card`, `url`, `apple`, `email`, `news`, `ad`, `obey`)
	2. Upload any photo (a pet, a screenshot, anything)
	3. Click **Generate adversarial image**
	4. Download the result, upload it to ChatGPT, ask "describe this image", and check whether the response contains the injected phrase

	The first call is slow (~30–60 s) while the Space downloads CLIP, the decoder weights, and the universal image. Subsequent calls take 2–5 seconds.
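The cold-start versus warm-call gap is what one would expect if the heavy assets are loaded once per process and then reused. A minimal sketch of that pattern follows. `load_asset` and `timed_load` are hypothetical names, and the `time.sleep` call merely stands in for a download or weight load; how the Space actually caches its models is not documented here.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_asset(name: str) -> str:
    """Hypothetical loader: pretend to fetch an asset, at most once per process."""
    time.sleep(0.05)  # stand-in for a slow download or weight load
    return f"<{name} loaded>"


def timed_load(name: str) -> float:
    """Return how long one call to load_asset takes, in seconds."""
    start = time.perf_counter()
    load_asset(name)
    return time.perf_counter() - start


cold = timed_load("clip")  # first call pays the loading cost
warm = timed_load("clip")  # second call is served from the cache
print(f"cold: {cold:.3f}s, warm: {warm:.6f}s")
```

The same idea applies to each of the three assets named above: the first request triggers the load, and later requests only pay for inference.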

	## What this demo does NOT do

	- ❌ No real-time PGD training (Stage 1 needs 11+ GB VRAM and multiple VLMs loaded)
	- ❌ No in-app VLM verification (Stage 3 also needs a GPU). Verify by uploading the adversarial image to a real VLM yourself.
	- ❌ No support for arbitrary new target phrases — only the 7 precomputed ones
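Because only the seven precomputed targets are available, any code that drives the demo has to reject other phrases up front. A small sketch of such a check, using the target names from the dropdown list. The function name `resolve_target` and the whitespace and case normalization are assumptions for illustration, not part of the Space's code.

```python
# Target keys as listed in the "Try it" dropdown.
ALLOWED_TARGETS = ("card", "url", "apple", "email", "news", "ad", "obey")


def resolve_target(name: str) -> str:
    """Return the normalized target key, or raise if no precomputed image exists."""
    key = name.strip().lower()
    if key not in ALLOWED_TARGETS:
        choices = ", ".join(ALLOWED_TARGETS)
        raise ValueError(f"unknown target {name!r}; choose one of: {choices}")
    return key


print(resolve_target(" URL "))
```

Supporting a new phrase would mean running Stage 1 again to train a new universal image, which this Space does not do.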

	For the full pipeline (training new universal images, evaluating against many VLMs, LLM-as-Judge scoring), see [the GitHub repo](https://github.com/jeffliulab/vis-inject).

	## Resources

	| Resource | Link |
	|---|---|
	| Source code | [github.com/jeffliulab/vis-inject](https://github.com/jeffliulab/vis-inject) |
	| Experimental data (147 response_pairs, 21 universal images, 147 adv images) | [datasets/jeffliulab/visinject](https://huggingface.co/datasets/jeffliulab/visinject) |
	| Decoder weights (used by this Space) | [`jiamingzz/anyattack`](https://huggingface.co/jiamingzz/anyattack) (Zhang et al., CVPR 2025) |

	## Hardware

	This Space runs on CPU Basic (free tier: 2 vCPU, 16 GB RAM, 50 GB ephemeral disk). No GPU required. Total memory footprint after warm-up: ~2 GB (CLIP 600 MB + decoder 320 MB + scratch).

	## Citation

	```bibtex
	@misc{visinject2026,
	title = {VisInject: Adversarial Prompt Injection into Images for Hijacking Vision-Language Models},
	author = {Liu, Jeff},
	year = {2026},
	howpublished = {\url{https://github.com/jeffliulab/vis-inject}},
	}
	```

	Built on:
	- Rahmatullaev et al., Universal Adversarial Attack on Aligned Multimodal LLMs, arXiv:2502.07987, 2025.
	- Zhang et al., AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models, CVPR 2025.

	## Ethics

	Released for defensive security research: characterizing and ultimately defending against adversarial prompt injection on production VLMs. Not for unauthorized targeting of real systems.