---
title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# Unicode Adversarial Attack Demo

An interactive demonstration of how Unicode character substitutions can fool large language models (LLMs).

## What This Does

This demo rewrites input text with substitute Unicode characters (for example, Canadian Aboriginal Syllabics or circled letters) and tests whether the rewritten text changes an LLM's prediction.
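
The substitution step can be sketched as follows. This is a minimal illustration, not the demo's actual code: `to_circled`, `prediction_changed`, and the `predict` callable are hypothetical names, and only the circled-letter style is shown, using the Enclosed Alphanumerics code points U+24B6 to U+24CF (A to Z) and U+24D0 to U+24E9 (a to z).

```python
from typing import Callable


def to_circled(text: str) -> str:
    """Replace ASCII letters with circled Unicode letters; keep everything else."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # Circled capital letters start at U+24B6.
            out.append(chr(0x24B6 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            # Circled small letters start at U+24D0.
            out.append(chr(0x24D0 + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def prediction_changed(text: str, predict: Callable[[str], str]) -> bool:
    """Return True if the (hypothetical) model's label differs after substitution."""
    return predict(text) != predict(to_circled(text))
```

Here `predict` stands in for whatever function maps a string to a model label; any callable with that shape can be passed in.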

## Research Findings

Tested on 59,376 samples across 3 models and 4 Unicode styles:

- **Overall Attack Success Rate (ASR):** 50.2%
- **Most Vulnerable Model:** Phi-3-mini (58.8% ASR)
- **Most Robust Model:** Gemma-2-2b (39.0% ASR)
- **Most Effective Style:** Canadian Aboriginal (56.5% ASR)
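
This README does not spell out how ASR is computed. A common convention, assumed here, is the fraction of samples whose predicted label changes after the Unicode transformation. The sketch below uses that assumption; `attack_success_rate` is a hypothetical helper, not a function from the project.

```python
from typing import Sequence


def attack_success_rate(
    original_preds: Sequence[str], attacked_preds: Sequence[str]
) -> float:
    """Fraction of samples whose prediction flipped after the attack.

    Assumption: an attack "succeeds" when the label on the transformed
    text differs from the label on the original text.
    """
    if len(original_preds) != len(attacked_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("at least one sample is required")
    flipped = sum(o != a for o, a in zip(original_preds, attacked_preds))
    return flipped / len(original_preds)


# Toy labels for illustration only; these are not results from the study.
before = ["pos", "neg", "pos", "neg"]
after = ["pos", "pos", "neg", "neg"]
rate = attack_success_rate(before, after)  # 2 of 4 predictions flipped
```

Other definitions count only samples the model classified correctly before the attack; under that convention the denominator would shrink accordingly.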

## Project

- **Title:** Unicode-Based Adversarial Attacks on Large Language Models
- **Author:** Endrin Hoti
- **Institution:** King's College London
- **Supervisor:** Dr. Oana Cocarascu