---
title: Unicode Adversarial Attack Demo
emoji: 🔤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.31.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# Unicode Adversarial Attack Demo

Interactive demonstration of how Unicode character substitutions can fool Large Language Models.

## What This Does

This demo transforms text using special Unicode characters (like Canadian Aboriginal Syllabics or Circled Letters) and tests whether the transformation changes an LLM's prediction.

## Research Findings

Tested on 59,376 samples across 3 models and 4 Unicode styles:

- **Overall Attack Success Rate (ASR):** 50.2%
- **Most Vulnerable Model:** Phi-3-mini (58.8% ASR)
- **Most Robust Model:** Gemma-2-2b (39.0% ASR)
- **Most Effective Style:** Canadian Aboriginal (56.5% ASR)

## Project

- **Title:** Unicode-Based Adversarial Attacks on Large Language Models
- **Author:** Endrin Hoti
- **Institution:** King's College London
- **Supervisor:** Dr. Oana Cocarascu
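
## Illustrative Sketch

The transformation and metric described under "What This Does" and "Research Findings" can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the code in `app.py`: the names `to_circled`, `attack_success_rate`, and `toy_predict` are introduced here, only the Circled Letters style is covered (the Canadian Aboriginal Syllabics mapping used by the demo is not reproduced), and ASR is assumed to be the fraction of samples whose prediction changes after transformation. The project's exact ASR definition (for example, counting only initially correct predictions) may differ.

```python
from typing import Callable, Sequence


def to_circled(text: str) -> str:
    """Map ASCII letters to Unicode circled letters; leave all other characters unchanged."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # CIRCLED LATIN CAPITAL LETTER A..Z occupy U+24B6..U+24CF
            out.append(chr(0x24B6 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            # CIRCLED LATIN SMALL LETTER A..Z occupy U+24D0..U+24E9
            out.append(chr(0x24D0 + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def attack_success_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> float:
    """Assumed ASR: share of samples whose prediction differs after the transform."""
    if not texts:
        raise ValueError("texts must be non-empty")
    flipped = sum(1 for t in texts if predict(t) != predict(transform(t)))
    return flipped / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical keyword classifier standing in for an LLM."""
    return "positive" if "good" in text.lower() else "negative"


if __name__ == "__main__":
    samples = ["This movie is good", "This movie is bad"]
    print(to_circled("Hi!"))  # prints "Ⓗⓘ!"
    print(attack_success_rate(samples, toy_predict, to_circled))  # prints 0.5
```

The toy classifier flips only the first sample: after substitution, `good` becomes `ⓖⓞⓞⓓ`, a different code-point sequence, so the keyword match fails. The second sample is labelled `negative` both before and after, so it does not count as a successful attack.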