S25AISecLab91 / readme.md
behzadan's picture
Updated app to v2
ffa7da2 verified

A newer version of the Gradio SDK is available: 6.6.0

Upgrade

AI & Cybersecurity: Prompt Injection Lab

This HuggingFace Space hosts Lab 1 of the AI & Cybersecurity graduate course, focusing on prompt injection attacks against AI autograders.

Deployment Instructions

1. Create a New HuggingFace Space

  1. Go to HuggingFace Spaces
  2. Click "Create new Space"
  3. Choose a name (e.g., "ai-cybersecurity-prompt-injection-lab")
  4. Select the Space SDK: Gradio
  5. Choose visibility (Public or Private)

2. Upload Files

Upload the following files to your Space:

  • app.py - The main application file
  • requirements.txt - Required dependencies
  • README.md - This documentation

3. Configure API Keys

You'll need to add an API key for OpenAI as a Space secret:

  1. Go to the Settings tab of your Space
  2. Scroll down to "Repository secrets"
  3. Add the following secret:
    • OPENAI_API_KEY - Your OpenAI API key

4. Deploy and Share

Once you've uploaded all files and configured the secrets, the Space will automatically build and deploy. You can share the URL with your students, who can then use the interface to complete the lab assignment without needing their own API keys.

Assignment Overview

Lab 1: Prompt Injection Attacks on AI Autograders

In this lab, students will:

  1. Explore vulnerabilities in an LLM-based autograding system
  2. Create prompt injection attacks to manipulate the grading process
  3. Experiment with modifying the system prompt (Part 2)
  4. Document their approach and findings

The lab is divided into two parts:

  • Part 1: Basic prompt injection attacks on the default system prompt
  • Part 2: Advanced attacks involving modifications to the system prompt

Student Interface Features

The Gradio interface includes:

  1. Student identification field (university email)
  2. Submission text area for entering code or injection attacks
  3. Additional system prompt field for Part 2 experimentation
  4. PDF report generation for documenting successful attacks

Learning Objectives

  • Understand how LLMs process instructions in system prompts
  • Identify vulnerabilities in LLM-based systems
  • Execute successful prompt injection attacks
  • Learn about system prompt design and vulnerabilities

Evaluation Criteria

Students should submit:

  1. The PDF report generated by the interface, containing:

    • Their student ID
    • A record of all attack attempts
    • The submission text for each attempt
    • Any additional system prompt instructions used
    • A score history chart
  2. A written analysis explaining:

    • At least three different prompt injection techniques they used
    • The vulnerabilities they identified
    • How they structured their attacks
    • Potential mitigation strategies

Instructions for Instructors

Monitoring Usage

You can monitor usage of the Space through the HuggingFace interface. Consider setting usage limits if you're concerned about API costs.

API Costs

This lab uses GPT-3.5-Turbo, which is one of OpenAI's more affordable models. Approximate costs:

  • ~$0.002 per attack attempt
  • Estimated $0.10-$0.30 per student for the entire lab

Extending the Lab

For a more comprehensive learning experience, consider assigning Lab 2 (Defenses) as a follow-up assignment, where students implement protections against the vulnerabilities they discovered.

Troubleshooting

If you encounter issues with the Space:

  1. Check the Space logs for error messages
  2. Verify the API key is correctly set in the repository secrets
  3. Ensure you have sufficient API credits with OpenAI
  4. For persistent issues, rebuild the Space from the Settings tab

License

This educational material is provided for academic use. API usage is subject to the terms and conditions of OpenAI.