---
title: Cloud Security Auditor
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
---

# 🛡️ CloudSecurityAuditor OpenEnv (v0.2.0)

**CloudSecurityAuditor** is a standardized AI agent environment that simulates real-world cloud security audit scenarios. Built on the **OpenEnv** specification, it provides a safe, reproducible sandbox in which autonomous agents practice identifying, analyzing, and remediating critical security vulnerabilities in a mock cloud infrastructure.

This environment is specifically engineered for benchmarking LLM-based security agents, offering a structured API and deterministic evaluation metrics.

## 🌟 Key Features

- **Standardized API**: Fully compliant with the `openenv-core` specification, featuring Gymnasium-style `step()`, `reset()`, and `state()` methods.
- **Realistic Cloud Mocking**: Simulates S3 bucket configurations, EC2 security groups, and IAM audit logs.
- **Multi-Tiered Evaluation**:
    - **Easy (Audit)**: Focuses on information gathering and resource tagging.
    - **Medium (Remediation)**: Requires active patching and configuration changes.
    - **Hard (Forensics)**: Demands log analysis and pattern matching to identify rogue actors.
- **Typed Observations**: Robust Pydantic-based action and observation models ensure reliable agent-environment interactions.
- **Automated Grading**: Scalar reward functions (0.0 to 1.0) provide immediate, granular feedback on agent performance.
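
To make the `reset()` / `step()` / `state()` loop and the 0.0 to 1.0 reward concrete, here is a minimal sketch. `ToyAuditEnv`, its bucket data, and the overlap-based reward are all hypothetical stand-ins, not the project's classes or grading logic, and the three-value return from `step()` only loosely follows the Gymnasium convention.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAuditEnv:
    """Hypothetical stand-in for the real environment; illustrates the loop only."""

    # Assumed mock data: bucket name -> publicly readable?
    buckets: dict = field(default_factory=lambda: {
        "prod-assets": True,
        "prod-backups": False,
        "dev-scratch": True,
    })
    done: bool = False
    steps: int = 0

    def reset(self) -> dict:
        self.done = False
        self.steps = 0
        return {"status": "ready"}

    def state(self) -> dict:
        return {"steps": self.steps, "done": self.done}

    def step(self, action: dict):
        """Return (observation, reward, done)."""
        self.steps += 1
        if action["type"] == "list":
            return {"resources": sorted(self.buckets)}, 0.0, False
        if action["type"] == "submit":
            expected = {name for name, public in self.buckets.items()
                        if public and name.startswith("prod")}
            answer = set(action["answer"])
            # Overlap score (intersection over union) stays within [0.0, 1.0].
            union = expected | answer
            reward = len(expected & answer) / len(union) if union else 1.0
            self.done = True
            return {"status": "submitted"}, reward, True
        return {"status": f"unknown action {action['type']}"}, 0.0, False


env = ToyAuditEnv()
env.reset()
obs, reward, done = env.step({"type": "list"})
obs, reward, done = env.step({"type": "submit", "answer": ["prod-assets"]})
print(reward, done, env.state())
```

A partially correct submission would score between 0.0 and 1.0 under this assumed reward, which is one way to get the granular feedback described above.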

## 🛠 Action & Observation Space

### Actions
- `list`: Inventory resources (`s3`, `ec2`).
- `describe`: Deep-dive into resource metadata.
- `modify`: Apply security patches and rule updates.
- `logs`: Extract forensic evidence from authentication logs.
- `submit`: Finalize the task with a structured answer.
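
As a rough illustration of a typed action, the sketch below validates the five verbs listed above. It uses standard-library dataclasses as a stand-in for the Pydantic models the project mentions; the `Action` class, its `params` field, and the `parse` helper are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(str, Enum):
    # The five verbs from the action list above.
    LIST = "list"
    DESCRIBE = "describe"
    MODIFY = "modify"
    LOGS = "logs"
    SUBMIT = "submit"


@dataclass(frozen=True)
class Action:
    type: ActionType
    params: dict = field(default_factory=dict)  # hypothetical payload field

    @classmethod
    def parse(cls, raw: dict) -> "Action":
        # ActionType(...) raises ValueError for any verb outside the five above.
        return cls(type=ActionType(raw["type"]), params=dict(raw.get("params", {})))


action = Action.parse({"type": "list", "params": {"resource": "s3"}})
print(action.type.value, action.params)
```

Rejecting unknown verbs at parse time keeps malformed agent output from reaching the environment logic.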

### Observations
- `resources`: Comprehensive resource records.
- `details`: Metadata for specific entities.
- `logs`: Event-based log entries.
- `status`: Execution status and helper messages.
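
The four observation keys could be modeled along the following lines. This is a sketch with standard-library dataclasses rather than the project's Pydantic models; the field types and the `populated` helper are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Observation:
    # Field names follow the four keys listed above; the types are assumed.
    resources: Optional[list] = None
    details: Optional[dict] = None
    logs: Optional[list] = None
    status: str = ""

    def populated(self) -> dict:
        """Return only the fields that were actually filled in."""
        return {k: v for k, v in asdict(self).items() if v not in (None, "")}


obs = Observation(resources=[{"id": "prod-assets", "type": "s3"}], status="ok")
print(obs.populated())
```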

## 📊 Available Tasks

| ID | Name | Objective | Category |
|:---|:---|:---|:---|
| `easy` | **S3 Public Audit** | Identify public 'prod' buckets. | Auditing |
| `medium` | **EC2 Security Patch** | Remediate open RDP ports (3389). | Remediation |
| `hard` | **IAM Log Forensic** | Trace 'DeleteStorage' actions in logs. | Forensics |
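
The core check behind each task can be sketched on mock data. The record shapes, field names, and the rule of matching 'prod' by bucket name are illustrative assumptions; the real environment may represent resources and define the objectives differently.

```python
# Mock records; every field name here is an illustrative assumption.
BUCKETS = [
    {"name": "prod-assets", "public": True},
    {"name": "prod-backups", "public": False},
    {"name": "dev-scratch", "public": True},
]
SG_RULES = [
    {"port": 22, "cidr": "10.0.0.0/8"},
    {"port": 3389, "cidr": "0.0.0.0/0"},
    {"port": 3389, "cidr": "10.0.0.0/8"},
]
AUTH_LOGS = [
    {"user": "alice", "event": "ListBuckets"},
    {"user": "mallory", "event": "DeleteStorage"},
    {"user": "mallory", "event": "DeleteStorage"},
]


def public_prod_buckets(buckets):
    """easy: names of buckets that are public and look like production."""
    return sorted(b["name"] for b in buckets if b["public"] and "prod" in b["name"])


def close_open_rdp(rules):
    """medium: drop ingress rules that expose RDP (3389) to the whole internet."""
    return [r for r in rules if not (r["port"] == 3389 and r["cidr"] == "0.0.0.0/0")]


def actors_for_event(logs, event_name="DeleteStorage"):
    """hard: distinct users who triggered the given event."""
    return sorted({e["user"] for e in logs if e["event"] == event_name})


print(public_prod_buckets(BUCKETS))
print(close_open_rdp(SG_RULES))
print(actors_for_event(AUTH_LOGS))
```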

## 🚀 Quick Start (Hugging Face)

If you are running this in a **Hugging Face Space**:

1.  **Examine the API**: The environment is hosted as a FastAPI server. Use the `/ui` endpoint for a visual dashboard.
2.  **Inference (LLM Agent)**: Set `API_BASE_URL` and `API_KEY` (e.g., from a LiteLLM proxy), then run `python inference.py`.
3.  **Evaluate**: The agent writes standardized logs, which are used for automated evaluation.
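
A script that depends on `API_BASE_URL` and `API_KEY` can fail early when either one is missing. The sketch below shows one way to do that; `LLMConfig` and `load_llm_config` are hypothetical names, and nothing here is taken from the actual `inference.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    base_url: str
    api_key: str


def load_llm_config(env=None) -> LLMConfig:
    """Read the two variables named in the steps above, failing fast if unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("API_BASE_URL", "API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so later URL joins behave consistently.
    return LLMConfig(base_url=env["API_BASE_URL"].rstrip("/"), api_key=env["API_KEY"])


# A plain dict stands in for os.environ so the example runs anywhere.
config = load_llm_config({"API_BASE_URL": "https://api.openai.com/v1/", "API_KEY": "your-key"})
print(config.base_url)
```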

## 🐳 Local Deployment

```bash
# Install dependencies (from the cloned repository)
pip install -r requirements.txt

# Run Server (Default port 7860)
python -m server.app

# Run Baseline (Rule-based)
python scripts/baseline_inference.py

# Run LLM Agent (using API_BASE_URL and API_KEY)
export API_BASE_URL="https://api.openai.com/v1"
export API_KEY="your-key"
python inference.py
```
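
Before starting the baseline or the LLM agent, it can help to confirm that the server is accepting connections on the default port. The helper below is a generic TCP check, not part of this repository; `server_is_up` is a hypothetical name.

```python
import socket


def server_is_up(host="127.0.0.1", port=7860, timeout=1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("server reachable:", server_is_up())
```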

---
Built with ❤️ for the AI Security community.