title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit

Unicode Adversarial Attack Demo

Interactive demonstration of how Unicode character substitutions can fool Large Language Models.

What This Does

This demo rewrites input text with substitute Unicode characters (for example, Canadian Aboriginal Syllabics or circled letters) and tests whether the substitution changes an LLM's prediction.
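As a rough illustration of the idea, the sketch below maps ASCII letters to their circled counterparts in the Unicode Enclosed Alphanumerics block and checks whether a classifier's output changes. The function names `to_circled` and `prediction_changed`, and the `classify` callable, are hypothetical and are not taken from this project's `app.py`; the actual demo may implement the substitution differently.

```python
from typing import Callable

# Code points of the first circled capital and small letters
# in the Unicode "Enclosed Alphanumerics" block.
CIRCLED_UPPER_A = 0x24B6  # Ⓐ
CIRCLED_LOWER_A = 0x24D0  # ⓐ


def to_circled(text: str) -> str:
    """Replace ASCII letters with circled letters; leave everything else as-is."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def prediction_changed(classify: Callable[[str], str], text: str) -> bool:
    """Return True if the (hypothetical) classifier answers differently
    on the transformed text than on the original text."""
    return classify(text) != classify(to_circled(text))


if __name__ == "__main__":
    print(to_circled("This movie was great!"))
```

Here `classify` stands in for any function that sends text to a model and returns its predicted label.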

Research Findings

Tested on 59,376 samples across 3 models and 4 Unicode styles:

  • Overall attack success rate (ASR): 50.2%
  • Most Vulnerable Model: Phi-3-mini (58.8% ASR)
  • Most Robust Model: Gemma-2-2b (39.0% ASR)
  • Most Effective Style: Canadian Aboriginal (56.5% ASR)
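The README does not spell out how ASR is computed. A minimal sketch, assuming ASR is the percentage of samples whose predicted label changes after the Unicode transformation, could look like this. The function name `attack_success_rate` and the sample labels are hypothetical, and the project's actual definition may differ (for example, it may count only samples the model classified correctly before the attack).

```python
from typing import Sequence


def attack_success_rate(
    original_preds: Sequence[str], attacked_preds: Sequence[str]
) -> float:
    """Percentage of samples whose prediction differs after the attack.

    Assumption: every sample counts equally, and any change in the
    predicted label counts as a successful attack.
    """
    if len(original_preds) != len(attacked_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        return 0.0
    flipped = sum(o != a for o, a in zip(original_preds, attacked_preds))
    return 100.0 * flipped / len(original_preds)


if __name__ == "__main__":
    # Made-up labels for illustration only, not project data.
    before = ["pos", "neg", "pos", "neg"]
    after = ["neg", "neg", "pos", "pos"]
    print(f"ASR: {attack_success_rate(before, after):.1f}%")
```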

Project

  • Title: Unicode-Based Adversarial Attacks on Large Language Models
  • Author: Endrin Hoti
  • Institution: King's College London
  • Supervisor: Dr. Oana Cocarascu