---
title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
# Unicode Adversarial Attack Demo
An interactive demonstration of how Unicode character substitutions can fool large language models (LLMs).
## What This Does
This demo rewrites input text with characters from other Unicode blocks (such as Canadian Aboriginal Syllabics or circled letters) and tests whether the substitution changes an LLM's prediction.
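As a rough illustration of the kind of substitution described above, the sketch below maps ASCII letters to the circled letters in the Enclosed Alphanumerics block (U+24B6 to U+24E9). The function name `to_circled` is hypothetical, and the demo's actual substitution tables and handling of digits or punctuation are not shown in this README, so treat this as an assumption rather than the project's implementation.

```python
# Illustrative sketch only: the demo's real mapping tables may differ.
CIRCLED_UPPER_START = 0x24B6  # CIRCLED LATIN CAPITAL LETTER A
CIRCLED_LOWER_START = 0x24D0  # CIRCLED LATIN SMALL LETTER A


def to_circled(text: str) -> str:
    """Replace ASCII letters with their circled Unicode counterparts.

    Characters other than A-Z and a-z are passed through unchanged.
    """
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_START + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_START + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Prints the sentence with every letter circled; spaces are kept.
    print(to_circled("This movie was great"))
```

The transformed string stays readable to a person but consists of entirely different code points, which is what makes it a useful probe of model robustness.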
## Research Findings
Tested on 59,376 samples across 3 models and 4 Unicode styles:
- **Overall Attack Success Rate (ASR):** 50.2%
- **Most Vulnerable Model:** Phi-3-mini (58.8% ASR)
- **Most Robust Model:** Gemma-2-2b (39.0% ASR)
- **Most Effective Style:** Canadian Aboriginal (56.5% ASR)
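This README does not spell out how the attack success rate is computed. A minimal sketch, assuming it is the share of samples whose predicted label changes after the Unicode transformation, could look like the following. The function name `attack_success_rate` and the toy labels are hypothetical and are not data from the study; some evaluations count only samples the model originally classified correctly, which would change the denominator.

```python
from typing import Sequence


def attack_success_rate(
    original_preds: Sequence[str], attacked_preds: Sequence[str]
) -> float:
    """Fraction of samples whose prediction differs after the attack.

    Assumption: every sample counts toward the denominator.
    """
    if len(original_preds) != len(attacked_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("at least one sample is required")
    flipped = sum(o != a for o, a in zip(original_preds, attacked_preds))
    return flipped / len(original_preds)


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the study.
    before = ["pos", "neg", "pos", "neg"]
    after = ["pos", "pos", "pos", "neg"]
    print(f"{attack_success_rate(before, after):.1%}")  # prints 25.0%
```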
## Project
- **Title:** Unicode-Based Adversarial Attacks on Large Language Models
- **Author:** Endrin Hoti
- **Institution:** King's College London
- **Supervisor:** Dr. Oana Cocarascu