	---
	title: Unicode Adversarial Attack Demo
	emoji: 🔤
	colorFrom: purple
	colorTo: blue
	sdk: gradio
	sdk_version: 4.31.0
	python_version: "3.10"
	app_file: app.py
	pinned: false
	license: mit
	---

	# Unicode Adversarial Attack Demo

	Interactive demonstration of how Unicode character substitutions can fool Large Language Models.

	## What This Does

	This demo rewrites input text with look-alike or stylized Unicode characters (for example, Canadian Aboriginal Syllabics or circled letters) and checks whether the rewritten text changes an LLM's prediction.
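As a rough illustration of the idea, the sketch below maps ASCII letters to their circled counterparts in the Unicode Enclosed Alphanumerics block and then checks whether a classifier's label changes. The names `to_circled`, `prediction_changed`, and `toy_classifier` are hypothetical and are not taken from this project's `app.py`; the keyword-based classifier is only a stand-in for a real LLM call.

```python
from typing import Callable


def to_circled(text: str) -> str:
    """Replace ASCII letters with circled letters (U+24B6..U+24CF, U+24D0..U+24E9)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x24B6 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x24D0 + ord(ch) - ord("a")))
        else:
            # Digits, punctuation, and whitespace are left untouched.
            out.append(ch)
    return "".join(out)


def prediction_changed(classify: Callable[[str], str], text: str) -> bool:
    """Return True if the label differs between the original and transformed text."""
    return classify(text) != classify(to_circled(text))


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a model: a naive keyword-based sentiment rule."""
    return "positive" if "good" in text else "negative"


if __name__ == "__main__":
    sample = "a good movie"
    print(to_circled(sample))
    print(prediction_changed(toy_classifier, sample))
```

A real run would replace `toy_classifier` with a function that prompts the model under test and parses its answer into a label.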

	## Research Findings

	Tested on 59,376 samples across 3 models and 4 Unicode styles:

	- Overall Attack Success Rate (ASR): 50.2%
	- Most Vulnerable Model: Phi-3-mini (58.8% ASR)
	- Most Robust Model: Gemma-2-2b (39.0% ASR)
	- Most Effective Style: Canadian Aboriginal (56.5% ASR)
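The README does not spell out how ASR is computed. A minimal sketch, assuming ASR is the fraction of samples whose predicted label changes after the Unicode transformation, might look like the following. The function name `attack_success_rate` and the toy labels are illustrative only and are not the study's data; some papers instead count only samples the model originally classified correctly.

```python
from typing import Sequence


def attack_success_rate(
    original_preds: Sequence[str], perturbed_preds: Sequence[str]
) -> float:
    """Fraction of samples whose prediction differs after the transformation.

    Assumption: an attack "succeeds" whenever the label changes at all.
    """
    if len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        return 0.0
    flipped = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return flipped / len(original_preds)


if __name__ == "__main__":
    # Toy labels for illustration only.
    before = ["pos", "neg", "pos", "neg"]
    after = ["pos", "pos", "neg", "neg"]
    print(f"ASR: {attack_success_rate(before, after):.1%}")
```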

	## Project

	- Title: Unicode-Based Adversarial Attacks on Large Language Models
	- Author: Endrin Hoti
	- Institution: King's College London
	- Supervisor: Dr. Oana Cocarascu