title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit

Unicode Adversarial Attack Demo

Interactive demonstration of how Unicode character substitutions can fool Large Language Models.

What This Does

This demo rewrites input text by replacing ordinary letters with special Unicode characters (such as Canadian Aboriginal Syllabics or Circled Letters), then checks whether the LLM's prediction on the transformed text differs from its prediction on the original.
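As a rough illustration of the idea, the sketch below maps ASCII letters to the Unicode circled letters (U+24B6 to U+24CF for uppercase, U+24D0 to U+24E9 for lowercase) and compares a classifier's output before and after. The names `to_circled`, `prediction_flipped`, and the `classify` callable are hypothetical and chosen for this example. They are not taken from the Space's `app.py`, and the actual mapping tables used in the demo may differ.

```python
from typing import Callable

CIRCLED_UPPER_START = 0x24B6  # Ⓐ CIRCLED LATIN CAPITAL LETTER A
CIRCLED_LOWER_START = 0x24D0  # ⓐ CIRCLED LATIN SMALL LETTER A


def to_circled(text: str) -> str:
    """Replace ASCII letters with their circled counterparts; leave other characters unchanged."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_START + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_START + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def prediction_flipped(classify: Callable[[str], str], text: str) -> bool:
    """Return True if the (hypothetical) classifier changes its label on the transformed text."""
    return classify(text) != classify(to_circled(text))
```

Here `classify` stands in for any function that takes a string and returns a label, for example a thin wrapper around an LLM prompt.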

Research Findings

Tested on 59,376 samples across 3 models and 4 Unicode styles:

Overall Attack Success Rate: 50.2%
Most Vulnerable Model: Phi-3-mini (58.8% ASR)
Most Robust Model: Gemma-2-2b (39.0% ASR)
Most Effective Style: Canadian Aboriginal (56.5% ASR)
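The README does not spell out how ASR (attack success rate) is computed. A common convention, assumed here, is the fraction of samples whose predicted label changes after the transformation. The sketch below follows that assumption. The function name `attack_success_rate` is hypothetical, and the project may instead count only samples the model classified correctly before the attack.

```python
from typing import Sequence


def attack_success_rate(original_preds: Sequence[str], attacked_preds: Sequence[str]) -> float:
    """Fraction of samples whose prediction changed after the attack (assumed definition)."""
    if len(original_preds) != len(attacked_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        return 0.0
    flipped = sum(o != a for o, a in zip(original_preds, attacked_preds))
    return flipped / len(original_preds)
```

With made-up labels, `attack_success_rate(["pos", "neg", "pos", "neg"], ["pos", "pos", "neg", "neg"])` counts two changed predictions out of four.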

Project

Title: Unicode-Based Adversarial Attacks on Large Language Models
Author: Endrin Hoti
Institution: King's College London
Supervisor: Dr. Oana Cocarascu