# AI & Cybersecurity: Prompt Injection Lab

This HuggingFace Space hosts Lab 1 of the AI & Cybersecurity graduate course, focusing on prompt injection attacks against AI autograders.

## Deployment Instructions

### 1. Create a New HuggingFace Space

1. Go to [HuggingFace Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose a name (e.g., "ai-cybersecurity-prompt-injection-lab")
4. Select the Space SDK: **Gradio**
5. Choose visibility (Public or Private)

### 2. Upload Files

Upload the following files to your Space:

- `app.py` - The main application file
- `requirements.txt` - Required dependencies
- `README.md` - This documentation

### 3. Configure API Keys

You'll need to add an OpenAI API key as a Space secret:

1. Go to the Settings tab of your Space
2. Scroll down to "Repository secrets"
3. Add the following secret:
   - `OPENAI_API_KEY` - Your OpenAI API key

### 4. Deploy and Share

Once you've uploaded all files and configured the secrets, the Space will automatically build and deploy. You can share the URL with your students, who can then use the interface to complete the lab assignment without needing their own API keys.

## Assignment Overview

### Lab 1: Prompt Injection Attacks on AI Autograders

In this lab, students will:

1. Explore vulnerabilities in an LLM-based autograding system
2. Create prompt injection attacks to manipulate the grading process
3. Experiment with modifying the system prompt (Part 2)
4. Document their approach and findings

The lab is divided into two parts:

- **Part 1**: Basic prompt injection attacks on the default system prompt
- **Part 2**: Advanced attacks involving modifications to the system prompt

### Student Interface Features

The Gradio interface includes:

1. Student identification field (university email)
2. Submission text area for entering code or injection attacks
3. Additional system prompt field for Part 2 experimentation
4.
PDF report generation for documenting successful attacks

### Learning Objectives

- Understand how LLMs process instructions in system prompts
- Identify vulnerabilities in LLM-based systems
- Execute successful prompt injection attacks
- Learn about system prompt design and vulnerabilities

### Evaluation Criteria

Students should submit:

1. The PDF report generated by the interface, containing:
   - Their student ID
   - A record of all attack attempts
   - The submission text for each attempt
   - Any additional system prompt instructions used
   - A score history chart
2. A written analysis explaining:
   - At least three different prompt injection techniques they used
   - The vulnerabilities they identified
   - How they structured their attacks
   - Potential mitigation strategies

## Instructions for Instructors

### Monitoring Usage

You can monitor usage of the Space through the HuggingFace interface. Consider setting usage limits if you're concerned about API costs.

### API Costs

This lab uses GPT-3.5-Turbo, which is one of OpenAI's more affordable models. Approximate costs:

- ~$0.002 per attack attempt
- Estimated $0.10-$0.30 per student for the entire lab

### Extending the Lab

For a more comprehensive learning experience, consider assigning Lab 2 (Defenses) as a follow-up assignment, where students implement protections against the vulnerabilities they discovered.

## Troubleshooting

If you encounter issues with the Space:

1. Check the Space logs for error messages
2. Verify the API key is correctly set in the repository secrets
3. Ensure you have sufficient API credits with OpenAI
4. For persistent issues, rebuild the Space from the Settings tab

## License

This educational material is provided for academic use. API usage is subject to the terms and conditions of OpenAI.
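## Appendix: Verifying the API Key Secret

A common cause of the troubleshooting issues above is a missing or empty `OPENAI_API_KEY` secret. HuggingFace Spaces exposes each repository secret to the running app as an environment variable of the same name, so a small startup check can fail fast with a clear message. The sketch below is illustrative and not part of the lab's `app.py`; the helper name `check_openai_secret` is hypothetical.

```python
import os


def check_openai_secret(env=os.environ):
    """Return the OpenAI API key configured for this Space.

    HuggingFace Spaces injects each repository secret into the app's
    environment under the same name, so a missing or blank value here
    means the secret was not set (or the Space was not rebuilt after
    adding it).
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Settings > "
            "Repository secrets, then rebuild the Space."
        )
    return key
```

Calling this once at the top of the app (before constructing the Gradio interface) turns a confusing mid-lab API failure into an immediate, searchable error in the Space logs.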