Spaces:

behzadan
/

S25AISecLab91

Sleeping

App Files Files Community

S25AISecLab91 / readme.md

behzadan

Updated app to v2

ffa7da2 verified 10 months ago

preview code

raw

history blame contribute delete

3.77 kB

A newer version of the Gradio SDK is available: 6.6.0

Upgrade

AI & Cybersecurity: Prompt Injection Lab

This HuggingFace Space hosts Lab 1 of the AI & Cybersecurity graduate course, focusing on prompt injection attacks against AI autograders.

Deployment Instructions

1. Create a New HuggingFace Space

Go to HuggingFace Spaces
Click "Create new Space"
Choose a name (e.g., "ai-cybersecurity-prompt-injection-lab")
Select the Space SDK: Gradio
Choose visibility (Public or Private)

2. Upload Files

Upload the following files to your Space:

app.py - The main application file
requirements.txt - Required dependencies
README.md - This documentation

3. Configure API Keys

You'll need to add an API key for OpenAI as a Space secret:

Go to the Settings tab of your Space
Scroll down to "Repository secrets"
Add the following secret:
- OPENAI_API_KEY - Your OpenAI API key

4. Deploy and Share

Once you've uploaded all files and configured the secrets, the Space will automatically build and deploy. You can share the URL with your students, who can then use the interface to complete the lab assignment without needing their own API keys.

Assignment Overview

Lab 1: Prompt Injection Attacks on AI Autograders

In this lab, students will:

Explore vulnerabilities in an LLM-based autograding system
Create prompt injection attacks to manipulate the grading process
Experiment with modifying the system prompt (Part 2)
Document their approach and findings

The lab is divided into two parts:

Part 1: Basic prompt injection attacks on the default system prompt
Part 2: Advanced attacks involving modifications to the system prompt

Student Interface Features

The Gradio interface includes:

Student identification field (university email)
Submission text area for entering code or injection attacks
Additional system prompt field for Part 2 experimentation
PDF report generation for documenting successful attacks

Learning Objectives

Understand how LLMs process instructions in system prompts
Identify vulnerabilities in LLM-based systems
Execute successful prompt injection attacks
Learn about system prompt design and vulnerabilities

Evaluation Criteria

Students should submit:

The PDF report generated by the interface, containing:
- Their student ID
- A record of all attack attempts
- The submission text for each attempt
- Any additional system prompt instructions used
- A score history chart
A written analysis explaining:
- At least three different prompt injection techniques they used
- The vulnerabilities they identified
- How they structured their attacks
- Potential mitigation strategies

Instructions for Instructors

Monitoring Usage

You can monitor usage of the Space through the HuggingFace interface. Consider setting usage limits if you're concerned about API costs.

API Costs

This lab uses GPT-3.5-Turbo, which is one of OpenAI's more affordable models. Approximate costs:

~$0.002 per attack attempt
Estimated $0.10-$0.30 per student for the entire lab

Extending the Lab

For a more comprehensive learning experience, consider assigning Lab 2 (Defenses) as a follow-up assignment, where students implement protections against the vulnerabilities they discovered.

Troubleshooting

If you encounter issues with the Space:

Check the Space logs for error messages
Verify the API key is correctly set in the repository secrets
Ensure you have sufficient API credits with OpenAI
For persistent issues, rebuild the Space from the Settings tab

License

This educational material is provided for academic use. API usage is subject to the terms and conditions of OpenAI.