# Behavioral Control Attack Methods

← Back to Prompt Security Evaluation

| Attack Method ID | Name | Description |
|---|---|---|
| AcrosticPoem | Acrostic Poem | Hide instructions using acrostic poem format |
| AsciiDrawing | ASCII Drawing | Hide malicious instructions using ASCII art |
| CharacterSplit | Character Split | Split instructions into multiple characters |
| CodeAttack | Code Attack | Attack through code execution context |
| ContextPoisoning | Context Poisoning | Poison model context to influence behavior |
| Contradictory | Contradictory | Confuse model using contradictory instructions |
| DRAttack | DR Attack | Instruction rewriting attack method |
| GoalRedirection | Goal Redirection | Redirect model's original goal |
| GrayBox | Gray Box | Attack under partial knowledge conditions |
| ICRTJailbreak | ICRT Jailbreak | Instruction context rewriting technique |
| InputBypass | Input Bypass | Bypass input filtering mechanisms |
| LanternRiddle | Lantern Riddle | Hide instructions using riddle format |
| LinguisticConfusion | Linguistic Confusion | Use language confusion techniques |
| LongText | Long Text | Overwhelm security detection with long text |
| MathProblem | Math Problem | Attack through math problem context |
| Multilingual | Multilingual | Bypass detection using multilingual mixing |
| Opposing | Opposing | Create conflict using opposing instructions |
| PROMISQROUTE | PROMISQROUTE | Specific type of prompt injection attack |
| PermissionEscalation | Permission Escalation | Elevate model execution permissions |
| PromptInjection | Prompt Injection | Directly inject malicious prompts |
| PromptProbing | Prompt Probing | Probe model system and prompts |
| Roleplay | Roleplay | Bypass restrictions through role-playing |
| ScriptTemplate | Script Template | Inject script templates to execute code |
| Stego | Stego | Hide instructions using steganography |
| SystemOverride | System Override | Override system instructions and settings |
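Several of the obfuscation-based methods above (e.g. CharacterSplit, AcrosticPoem) work by transforming a probe instruction so its surface form no longer matches simple content filters. A minimal sketch of how an evaluation harness might generate such transformed probes is shown below; the function names and the benign placeholder probe string are illustrative assumptions, not the tool's actual implementation.

```python
def character_split(instruction: str, sep: str = " ") -> str:
    """CharacterSplit-style transform: break an instruction into
    individual characters separated by `sep` (illustrative sketch)."""
    return sep.join(instruction)


def acrostic_poem(instruction: str, filler: str = "is where this line begins") -> str:
    """AcrosticPoem-style transform: encode an instruction as the first
    letters of successive lines (illustrative sketch)."""
    return "\n".join(
        f"{ch.upper()} {filler}" for ch in instruction if ch.isalpha()
    )


# Benign placeholder probe used only to demonstrate the transforms.
probe = "test"
split_probe = character_split(probe)   # "t e s t"
poem_probe = acrostic_poem(probe)      # four lines starting T, E, S, T
```

In a real evaluation run, each transform would be applied to the probe set and the transformed prompts scored against the target model's responses.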