---
title: VisInject — Adversarial Prompt Injection Demo
emoji: 🎯
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: "Inject hidden prompts into images that hijack VLM responses"
models:
  - jiamingzz/anyattack
datasets:
  - jeffliulab/visinject
tags:
  - adversarial-attack
  - vision-language-model
  - prompt-injection
  - vlm-security
---
# VisInject — Adversarial Prompt Injection Demo

Live demo for the **VisInject** research project. Pick an attack prompt and upload any clean photo. The app returns a visually near-identical adversarial photo that hijacks Vision-Language Models into emitting an attacker-specified phrase.

## What this demo does
```
              [Clean photo]
                    │
                    ▼
┌────────────────────────────────────────┐
│ CLIP ViT-B/32 (frozen)                 │
│   ↓ encode precomputed universal image │
│ AnyAttack Decoder (coco_bi.pt)         │
│   ↓ decode to bounded noise            │
│ noise + clean photo                    │
└────────────────────────────────────────┘
                    │
                    ▼
   [Adversarial photo (PSNR ≈ 25 dB)]
```

This is **Stage 2** of the VisInject pipeline. The 7 universal adversarial images (one per attack prompt) were trained offline with PGD optimization on a multi-VLM ensemble (Stage 1). The app loads them at runtime from the [`jeffliulab/visinject`](https://huggingface.co/datasets/jeffliulab/visinject) dataset.
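
The "noise + clean photo" step and the PSNR figure quoted for its output can be illustrated in plain Python. This is a minimal sketch, not the Space's `app.py`. It makes two assumptions. First, pixels are floats in [0, 1], flattened into one list. Second, the decoder's noise is clipped to an L∞ budget whose value this README does not state, so `EPSILON = 16 / 255` below is a hypothetical placeholder. PSNR uses the standard definition 10 · log10(MAX² / MSE).

```python
import math
from typing import Sequence

# Hypothetical L-infinity budget: a placeholder, not a value stated by this README.
EPSILON = 16 / 255


def apply_bounded_noise(clean: Sequence[float], noise: Sequence[float],
                        epsilon: float = EPSILON) -> list[float]:
    """Add noise to a clean image while keeping every pixel change within +/- epsilon.

    Pixels are assumed to be floats in [0, 1], flattened into one sequence.
    """
    if len(clean) != len(noise):
        raise ValueError("clean image and noise must have the same number of pixels")
    adv = []
    for x, n in zip(clean, noise):
        n = max(-epsilon, min(epsilon, n))      # project the noise into the L-inf ball
        adv.append(max(0.0, min(1.0, x + n)))   # keep the result a valid image
    return adv


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Consider `clean = [0.5] * 4` and `noise = [0.1, -0.1, 0.01, -0.5]`. Three of the four noise values are clipped to ±`EPSILON`, and the PSNR of the result is roughly 25 dB. A real photo has far more pixels, but the arithmetic is the same.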

## Try it

1. Pick a target phrase from the dropdown (`card`, `url`, `apple`, `email`, `news`, `ad`, `obey`).
2. Upload any photo (a pet, a screenshot, anything).
3. Click **Generate adversarial image**.
4. Download the result and upload it to ChatGPT. Ask "describe this image" and check whether the model leaks the injected phrase.

**The first call is slow** (~30–60 s) while the Space downloads CLIP, the decoder weights, and the universal image. Subsequent calls take 2–5 s.
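
The slow first call matches the usual pattern of loading heavy resources lazily and caching them for later requests. The sketch below shows that pattern. It assumes, without confirmation from this README, that the app keeps models and universal images in memory. `load_models`, `load_universal_image`, and `generate` are hypothetical names, and the loaders return placeholder strings instead of real weights.

```python
from functools import lru_cache

# The seven precomputed target keys offered in the dropdown.
TARGET_KEYS = ("card", "url", "apple", "email", "news", "ad", "obey")

# Counts how often each loader really runs, so the caching is observable.
LOAD_COUNTS = {"models": 0, "images": 0}


@lru_cache(maxsize=1)
def load_models() -> dict:
    """Hypothetical: load CLIP and the decoder once; later calls hit the cache."""
    LOAD_COUNTS["models"] += 1
    return {"clip": "<CLIP ViT-B/32>", "decoder": "<AnyAttack decoder>"}


@lru_cache(maxsize=None)
def load_universal_image(key: str) -> str:
    """Hypothetical: fetch the precomputed universal image for one target key."""
    if key not in TARGET_KEYS:
        # Only the 7 precomputed targets exist; a new phrase would need Stage 1 training.
        raise ValueError(f"unknown target {key!r}; choose one of {TARGET_KEYS}")
    LOAD_COUNTS["images"] += 1
    return f"<universal image: {key}>"


def generate(key: str) -> str:
    """Hypothetical request handler: slow on the first call, served from cache afterwards."""
    models = load_models()
    universal = load_universal_image(key)
    return f"{models['decoder']} applied to {universal}"
```

Suppose `generate("card")` is called twice and `generate("url")` once. The models load once and the universal images load twice. `lru_cache` does not cache exceptions, so an unknown key raises on every call.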

## What this demo does NOT do

- ❌ **No real-time PGD training** (Stage 1 needs 11+ GB of VRAM and several VLMs loaded).
- ❌ **No in-app VLM verification** (Stage 3 also needs a GPU). To verify, upload the adversarial image to a real VLM yourself.
- ❌ **No arbitrary new target phrases**: only the 7 precomputed ones are available.

For the full pipeline (training new universal images, evaluating against many VLMs, LLM-as-Judge scoring), see [the GitHub repo](https://github.com/jeffliulab/vis-inject).
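
The full pipeline scores responses with an LLM-as-Judge. For a quick manual check of a reply from a real VLM, a plain normalized substring test is a crude stand-in. `contains_injection` is a hypothetical helper, not part of the repo, and it will miss paraphrased leaks that a judge model could catch.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and fold case so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def contains_injection(response: str, phrase: str) -> bool:
    """Return True if the phrase appears in the reply, ignoring case and whitespace layout."""
    return _normalize(phrase) in _normalize(response)


# Usage with a made-up reply and phrase (not one of the Space's real targets):
reply = "A cat on a sofa.\nVisit  EXAMPLE.com now"
print(contains_injection(reply, "visit example.com"))  # True
```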

## Resources

| Resource | Link |
|---|---|
| Source code | [github.com/jeffliulab/vis-inject](https://github.com/jeffliulab/vis-inject) |
| Experimental data (147 response pairs, 21 universal images, 147 adversarial images) | [datasets/jeffliulab/visinject](https://huggingface.co/datasets/jeffliulab/visinject) |
| Decoder weights (used by this Space) | [`jiamingzz/anyattack`](https://huggingface.co/jiamingzz/anyattack) (Zhang et al., CVPR 2025) |

## Hardware

This Space runs on **CPU Basic** (free tier: 2 vCPU, 16 GB RAM, 50 GB ephemeral disk) and needs no GPU. The total memory footprint after warm-up is ~2 GB (CLIP ~600 MB + decoder ~320 MB + scratch space).

## Citation

```bibtex
@misc{visinject2026,
  title        = {VisInject: Adversarial Prompt Injection into Images for Hijacking Vision-Language Models},
  author       = {Liu, Jeff},
  year         = {2026},
  howpublished = {\url{https://github.com/jeffliulab/vis-inject}},
}
```

Built on:

- Rahmatullaev et al., *Universal Adversarial Attack on Aligned Multimodal LLMs*, arXiv:2502.07987, 2025.
- Zhang et al., *AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models*, CVPR 2025.

## Ethics

Released for **defensive security research**: characterizing and ultimately defending against adversarial prompt injection on production VLMs. It is not intended for unauthorized targeting of real systems.