---
title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# Unicode Adversarial Attack Demo

Interactive demonstration of how Unicode character substitutions can fool Large Language Models.

## What This Does

This demo transforms text using special Unicode characters (like Canadian Aboriginal Syllabics or Circled Letters) and tests whether the transformation changes an LLM's prediction.
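
As a rough illustration of the idea, the sketch below maps ASCII letters onto the circled-letter ranges (U+24B6 to U+24CF and U+24D0 to U+24E9) and checks whether a classifier's label changes. The names `to_circled`, `prediction_changed`, and `toy_classify` are hypothetical and are not taken from `app.py`; `toy_classify` is a keyword rule standing in for a real LLM call.

```python
from typing import Callable

# Circled Latin letters occupy two contiguous Unicode ranges:
# U+24B6..U+24CF (uppercase) and U+24D0..U+24E9 (lowercase).
CIRCLED_UPPER_START = 0x24B6
CIRCLED_LOWER_START = 0x24D0


def to_circled(text: str) -> str:
    """Replace ASCII letters with circled equivalents; leave other characters as-is."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_START + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_START + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def prediction_changed(
    classify: Callable[[str], str],
    text: str,
    transform: Callable[[str], str] = to_circled,
) -> bool:
    """Return True if the label differs between the original and transformed text."""
    return classify(text) != classify(transform(text))


def toy_classify(text: str) -> str:
    # Hypothetical stand-in for an LLM: a keyword rule that only matches ASCII.
    return "positive" if "good" in text.lower() else "negative"


if __name__ == "__main__":
    print(to_circled("good movie"))
    print(prediction_changed(toy_classify, "good movie"))
```

In a real setup, `classify` would wrap a model call; keeping it as a plain callable means the same check can be reused for other transformations.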

## Research Findings

Tested on 59,376 samples across 3 models and 4 Unicode styles:

- **Overall Attack Success Rate (ASR):** 50.2%
- **Most Vulnerable Model:** Phi-3-mini (58.8% ASR)
- **Most Robust Model:** Gemma-2-2b (39.0% ASR)
- **Most Effective Style:** Canadian Aboriginal (56.5% ASR)
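
This README does not spell out how the attack success rate is computed. The sketch below assumes it is the share of samples whose predicted label changes after the transformation. Some evaluations count only samples the model classified correctly beforehand, so the project's exact definition may differ. The function name `attack_success_rate` and the sample labels are illustrative and are not project data.

```python
from typing import Sequence


def attack_success_rate(original: Sequence[str], attacked: Sequence[str]) -> float:
    """Share of samples whose predicted label changed after the transformation."""
    if len(original) != len(attacked):
        raise ValueError("prediction lists must have the same length")
    if not original:
        raise ValueError("need at least one sample")
    # Count positions where the label before and after the attack differs.
    flipped = sum(o != a for o, a in zip(original, attacked))
    return flipped / len(original)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    before = ["pos", "neg", "pos", "neg"]
    after = ["neg", "neg", "pos", "pos"]
    print(f"ASR: {attack_success_rate(before, after):.1%}")  # prints "ASR: 50.0%"
```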

## Project

- **Title:** Unicode-Based Adversarial Attacks on Large Language Models
- **Author:** Endrin Hoti
- **Institution:** King's College London
- **Supervisor:** Dr. Oana Cocarascu