---
title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
# Unicode Adversarial Attack Demo
An interactive demonstration of how Unicode character substitutions can change the predictions of large language models (LLMs).
## What This Does
This demo rewrites input text with stylized Unicode characters (for example, Canadian Aboriginal Syllabics or circled letters) and tests whether the rewritten text changes an LLM's prediction.
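As a rough illustration of the kind of substitution described above, the sketch below maps ASCII letters to the Unicode circled Latin letters (U+24B6 to U+24CF for uppercase, U+24D0 to U+24E9 for lowercase). The function name `to_circled` is hypothetical and is not taken from `app.py`; the demo's actual mapping tables may differ.

```python
def to_circled(text: str) -> str:
    """Replace ASCII letters with Unicode circled letters (hypothetical helper)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # Circled capitals start at U+24B6
            out.append(chr(0x24B6 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            # Circled small letters start at U+24D0
            out.append(chr(0x24D0 + ord(ch) - ord("a")))
        else:
            # Digits, spaces, and punctuation are left unchanged
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_circled("Great movie!"))
```

A human reader can still recognize the letters, but the model receives entirely different code points, which is why the prediction may change.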
## Research Findings
Tested on 59,376 samples across 3 models and 4 Unicode styles:
- **Overall Attack Success Rate (ASR):** 50.2%
- **Most Vulnerable Model:** Phi-3-mini (58.8% ASR)
- **Most Robust Model:** Gemma-2-2b (39.0% ASR)
- **Most Effective Style:** Canadian Aboriginal (56.5% ASR)
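The README does not spell out how the attack success rate is computed. One common reading, assumed here, is the fraction of samples whose predicted label changes after the transformation. The sketch below uses that definition; `attack_success_rate`, `toy_classify`, and `toy_transform` are hypothetical names for illustration, and the project may instead count only samples that were classified correctly before the attack.

```python
from typing import Callable, Iterable


def attack_success_rate(
    classify: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Iterable[str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`.

    Assumed definition; not confirmed by the project source.
    """
    samples = list(texts)
    if not samples:
        raise ValueError("need at least one sample")
    flipped = sum(classify(t) != classify(transform(t)) for t in samples)
    return flipped / len(samples)


# Toy stand-ins for a real model and a real Unicode style (assumptions).
def toy_classify(text: str) -> str:
    return "positive" if "good" in text else "negative"


def toy_transform(text: str) -> str:
    # Swap ASCII "o" for the circled small letter o (U+24DE)
    return text.replace("o", "\u24de")


if __name__ == "__main__":
    data = ["good movie", "bad movie", "so good", "awful"]
    print(attack_success_rate(toy_classify, toy_transform, data))
```

In a real evaluation, `classify` would wrap a call to the model under test and `transform` would be one of the four Unicode styles.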
## Project
- **Title:** Unicode-Based Adversarial Attacks on Large Language Models
- **Author:** Endrin Hoti
- **Institution:** King's College London
- **Supervisor:** Dr. Oana Cocarascu