# ☁️🛡️ CloudSecurityAuditor: OpenEnv Environment
## Complete Application Documentation
---
## 1. What Is This Application?
**CloudSecurityAuditor** is a standardized AI agent environment that simulates real-world cloud security auditing scenarios. It is built using the [OpenEnv](https://github.com/openenv) specification, an open standard for creating reproducible, programmable environments where AI agents can be trained, tested, and benchmarked.
Think of it as a **virtual cybersecurity lab**: instead of risking real cloud infrastructure, an AI agent (or a human) can interact with a mock cloud environment that contains intentional security vulnerabilities. The agent must discover, analyze, and remediate those vulnerabilities to earn a reward.
### Who Is This For?
| Audience | Use Case |
|---|---|
| **AI Researchers** | Benchmark LLM-based security agents on structured tasks |
| **Security Engineers** | Practice cloud audit workflows in a safe sandbox |
| **Students** | Learn about S3 public buckets, EC2 security groups, and IAM log analysis |
| **Hackathon Participants** | Demonstrate agent-environment interaction for Meta/OpenEnv challenges |
---
## 2. Architecture Overview
```
┌─────────────────────────────────────────────────────┐
│                    BROWSER (UI)                     │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Sidebar  │  │ Resource Grid│  │ Execution Log│   │
│  │ (Tasks)  │  │ (S3 / EC2)   │  │ (Terminal)   │   │
│  └──────────┘  └──────────────┘  └──────────────┘   │
└───────────────────────┬─────────────────────────────┘
                        │ HTTP (REST)
                        ▼
┌─────────────────────────────────────────────────────┐
│               FastAPI Server (app.py)               │
│  ┌─────────┐  ┌──────────┐  ┌───────────────────┐   │
│  │ /reset  │  │ /step    │  │ /state / /docs    │   │
│  └────┬────┘  └────┬─────┘  └───────────────────┘   │
│       │            │                                │
│       ▼            ▼                                │
│  ┌─────────────────────────────────────────────┐    │
│  │       CloudAuditEnv (environment.py)        │    │
│  │  ┌─────────┐  ┌────────┐  ┌──────────────┐  │    │
│  │  │ S3 Data │  │EC2 Data│  │  Auth Logs   │  │    │
│  │  └─────────┘  └────────┘  └──────────────┘  │    │
│  └─────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────┘
```
---
## 3. File Structure
```
scaler/
├── server/
│   ├── app.py              # FastAPI entry point, static file serving
│   ├── environment.py      # Core environment logic (reset, step, state)
│   ├── models.py           # Pydantic/dataclass models (Action, Observation, State)
│   ├── tasks.py            # Task definitions (Easy, Medium, Hard)
│   └── static/
│       ├── index.html      # Dashboard UI layout
│       ├── index.css       # Dark-mode cybersecurity theme
│       └── app.js          # Frontend logic & API interaction
├── scripts/
│   └── baseline_inference.py  # Example agent that solves the Easy task
├── openenv.yaml            # OpenEnv specification file
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker deployment configuration
└── README.md               # Quick-start guide
```
---
## 4. The Environment Engine (`environment.py`)
The heart of the application is the `CloudAuditEnv` class. It implements three methods required by the OpenEnv spec:
### `reset(task_id) → Observation`
- Reinitializes the mock infrastructure (S3 buckets, EC2 instances, auth logs).
- Sets the active task (easy, medium, or hard).
- Returns an initial observation with status info.
### `step(action) → Observation`
- Accepts a `CloudAction` and executes it against the mock infrastructure.
- Returns an updated `CloudObservation` containing discovered resources, details, logs, and a reward signal.
- Automatically terminates the episode after 20 steps (truncation).
### `state() → CloudState`
- Returns internal metadata: episode ID, step count, task ID, completion status, and cumulative score.
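The reset/step/state cycle described above can be sketched with a small stand-in class. `ToyEnv`, `ToyObservation`, and `ToyState` are hypothetical names invented for this illustration and are not the project's `CloudAuditEnv`; the sketch also assumes that truncation sets `done` on the 20th step, which the source does not spell out.

```python
import uuid
from dataclasses import dataclass

MAX_STEPS = 20  # truncation limit stated in this section


@dataclass
class ToyObservation:
    info: str = ""
    reward: float = 0.0
    done: bool = False


@dataclass
class ToyState:
    episode_id: str
    task_id: str
    step_count: int = 0
    is_completed: bool = False
    score: float = 0.0


class ToyEnv:
    """Hypothetical stand-in for CloudAuditEnv (illustration only)."""

    def reset(self, task_id: str = "easy") -> ToyObservation:
        # A fresh episode: new ID, step counter back to zero.
        self._state = ToyState(episode_id=str(uuid.uuid4()), task_id=task_id)
        return ToyObservation(info=f"Environment reset. Task: {task_id}")

    def step(self, action: dict) -> ToyObservation:
        self._state.step_count += 1
        obs = ToyObservation(info=f"Executed {action.get('action')}")
        # Assumption: the episode is truncated once the step limit is reached.
        if self._state.step_count >= MAX_STEPS:
            obs.done = True
            obs.info = "Episode truncated."
        return obs

    def state(self) -> ToyState:
        return self._state


env = ToyEnv()
first = env.reset("easy")
obs = first
steps = 0
while not obs.done:
    obs = env.step({"action": "list", "resource_type": "s3"})
    steps += 1
```

An agent that never submits an answer therefore leaves the loop only through truncation, which is why a step budget matters when benchmarking.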
---
## 5. Mock Infrastructure
The environment simulates the following cloud resources:
### S3 Buckets (3 total)
| ID | Region | Public? | Environment |
|---|---|---|---|
| `prod-data-001` | us-east-1 | ✅ Yes | prod |
| `prod-logs-002` | us-east-1 | ❌ No | prod |
| `dev-test-01` | us-west-2 | ✅ Yes | dev |
### EC2 Instances (2 total)
| ID | Type | State | Environment | Open Ports |
|---|---|---|---|---|
| `i-0abcdef1234567890` | t2.micro | running | dev | 22 (SSH), **3389 (RDP)** ⚠️ |
| `i-0987654321fedcba0` | m5.large | running | prod | 443 (HTTPS) |
### Auth Logs (`auth-logs`)
| Timestamp | User | Action | IP |
|---|---|---|---|
| 2026-04-05T10:00:00Z | admin | Login | 1.1.1.1 |
| 2026-04-05T10:15:00Z | iam-role-01 | **DeleteStorage** ⚠️ | **192.168.1.50** |
| 2026-04-05T10:30:00Z | user-02 | ListBuckets | 2.2.2.2 |
---
## 6. Action Space
The agent interacts with the environment using a `CloudAction` object. Available action types:
| Action | Parameters | Description |
|---|---|---|
| `list` | `resource_type` (s3, ec2) | Lists all resources of a given type |
| `describe` | `resource_id` | Returns full details for a specific resource |
| `modify` | `resource_id`, `patch` | Updates resource configuration (e.g., security group rules) |
| `logs` | `resource_id` (e.g., auth-logs) | Fetches log entries for a service |
| `submit` | `answer` | Submits the final answer for grading |
### Example Actions (via Dashboard or API)
```text
# List all S3 buckets
list s3
# Describe an EC2 instance
describe i-0abcdef1234567890
# Fetch authentication logs
logs auth-logs
# Submit an answer for Easy task
submit prod-data-001
# Submit an answer for Hard task
submit 192.168.1.50
```
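As a rough sketch, the text commands above could be turned into action payloads like this. `parse_command` is a hypothetical helper, not part of the project; the key names follow the Parameters column of the action table and the `/step` request shown in the API reference, and the actual `CloudAction` model may differ.

```python
from typing import Any, Dict, Optional


def parse_command(text: str, patch: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Convert a command such as 'list s3' into an action dict (hypothetical helper)."""
    parts = text.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<action> <argument>', got {text!r}")
    verb, arg = parts[0].lower(), parts[1].strip()

    if verb == "list":
        return {"action": "list", "resource_type": arg}
    if verb in ("describe", "logs"):
        return {"action": verb, "resource_id": arg}
    if verb == "modify":
        # The patch cannot be expressed in the one-line command, so it is passed separately.
        return {"action": "modify", "resource_id": arg, "patch": patch or {}}
    if verb == "submit":
        return {"action": "submit", "answer": arg}
    raise ValueError(f"unknown action type: {verb!r}")


list_action = parse_command("list s3")
submit_action = parse_command("submit 192.168.1.50")
```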
---
## 7. Observation Space
Every `step()` and `reset()` returns a `CloudObservation`:
| Field | Type | Description |
|---|---|---|
| `resources` | `List[Dict]` | List of discovered resource records |
| `details` | `Dict` | Full metadata for a single described resource |
| `logs` | `List[Dict]` | Log entries (timestamp, user, action, IP) |
| `status` | `str` | Human-readable status message |
| `info` | `str` | Additional context (e.g., grading feedback) |
| `reward` | `float` | Scalar reward (0.0 to 1.0) |
| `done` | `bool` | Whether the episode has ended |
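For orientation, the table above maps onto a dataclass roughly as follows. `ObservationSketch` is a hypothetical name, and the `Optional` defaults are an assumption based on the `null` fields in the `/reset` response; the real definition lives in `server/models.py` and may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ObservationSketch:
    """Hypothetical mirror of the CloudObservation fields listed above."""

    resources: Optional[List[Dict[str, Any]]] = None  # discovered resource records
    details: Optional[Dict[str, Any]] = None          # metadata for one described resource
    logs: Optional[List[Dict[str, Any]]] = None       # log entries
    status: Optional[str] = None                      # human-readable status message
    info: Optional[str] = None                        # extra context, e.g. grading feedback
    reward: float = 0.0                               # scalar reward in [0.0, 1.0]
    done: bool = False                                # whether the episode has ended


obs = ObservationSketch(info="Environment reset. Task: easy")
payload = asdict(obs)  # plain dict, ready for JSON serialization
```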
---
## 8. Tasks & Grading
### Task 1 (Easy): S3 Public Audit
**Goal:** Identify all S3 buckets that are both `public: true` AND tagged `env: prod`.
| Step | Action | Expected Result |
|---|---|---|
| 1 | `list s3` | Returns 3 buckets |
| 2 | Filter for public + prod | `prod-data-001` |
| 3 | `submit prod-data-001` | Reward: **1.0** ✅ |
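Step 2 is the only part that needs agent-side logic. A minimal sketch of that filter, using the bucket records shown in the `/step` response of the API reference (`public_prod_buckets` is a hypothetical helper name):

```python
from typing import Any, Dict, List

# Bucket records as returned by `list s3` (copied from the API reference).
BUCKETS: List[Dict[str, Any]] = [
    {"id": "prod-data-001", "region": "us-east-1", "public": True, "tags": {"env": "prod"}},
    {"id": "prod-logs-002", "region": "us-east-1", "public": False, "tags": {"env": "prod"}},
    {"id": "dev-test-01", "region": "us-west-2", "public": True, "tags": {"env": "dev"}},
]


def public_prod_buckets(buckets: List[Dict[str, Any]]) -> List[str]:
    """Return IDs of buckets that are public AND tagged env=prod."""
    return [
        b["id"]
        for b in buckets
        if b.get("public") is True and b.get("tags", {}).get("env") == "prod"
    ]


answer = public_prod_buckets(BUCKETS)
```

Both conditions are required: `dev-test-01` is public but not prod, and `prod-logs-002` is prod but not public.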
---
### Task 2 (Medium): EC2 Security Patch
**Goal:** Find EC2 instance `i-0abcdef1234567890` which has port 3389 (RDP) open to `0.0.0.0/0`, and close it by modifying the security group to only allow port 22.
| Step | Action | Expected Result |
|---|---|---|
| 1 | `list ec2` | Returns 2 instances |
| 2 | `describe i-0abcdef1234567890` | Shows RDP port open |
| 3 | `modify i-0abcdef1234567890` with patch `{"rules": [{"port": 22, "cidr": "0.0.0.0/0"}]}` | Reward: **1.0** ✅ |
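The effect of the patch can be checked locally before sending it. The sketch below assumes that an instance record exposes its security group as a `rules` list of `{"port", "cidr"}` entries, mirroring the patch format; the real record shape returned by `describe` is not shown in this document, and `exposed_ports` and `apply_patch` are hypothetical helpers.

```python
from typing import Any, Dict, List

WORLD = "0.0.0.0/0"

# Assumed shape of the vulnerable instance before patching (illustration only).
instance: Dict[str, Any] = {
    "id": "i-0abcdef1234567890",
    "rules": [
        {"port": 22, "cidr": WORLD},
        {"port": 3389, "cidr": WORLD},
    ],
}

# Patch taken from step 3 of the table above.
patch: Dict[str, Any] = {"rules": [{"port": 22, "cidr": WORLD}]}


def exposed_ports(rules: List[Dict[str, Any]], cidr: str = WORLD) -> List[int]:
    """Ports reachable from the given CIDR, sorted ascending."""
    return sorted(rule["port"] for rule in rules if rule["cidr"] == cidr)


def apply_patch(record: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with top-level keys replaced by the patch."""
    return {**record, **changes}


before = exposed_ports(instance["rules"])
after = exposed_ports(apply_patch(instance, patch)["rules"])
```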
---
### Task 3 (Hard): IAM Log Forensic
**Goal:** A rogue IAM role (`iam-role-01`) has performed unauthorized actions. Analyze the `auth-logs` to identify the IP address that performed `DeleteStorage`.
| Step | Action | Expected Result |
|---|---|---|
| 1 | `logs auth-logs` | Returns 3 log entries |
| 2 | Find `DeleteStorage` action | IP: `192.168.1.50` |
| 3 | `submit 192.168.1.50` | Reward: **1.0** ✅ |
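Step 2 amounts to a filter over the log entries. In the sketch below the values come from the Auth Logs table in section 5, while the lowercase key names (`timestamp`, `user`, `action`, `ip`) are an assumption about the JSON shape; `ips_for_action` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Values from the Auth Logs table; key names are assumed.
AUTH_LOGS: List[Dict[str, str]] = [
    {"timestamp": "2026-04-05T10:00:00Z", "user": "admin", "action": "Login", "ip": "1.1.1.1"},
    {"timestamp": "2026-04-05T10:15:00Z", "user": "iam-role-01", "action": "DeleteStorage", "ip": "192.168.1.50"},
    {"timestamp": "2026-04-05T10:30:00Z", "user": "user-02", "action": "ListBuckets", "ip": "2.2.2.2"},
]


def ips_for_action(
    logs: List[Dict[str, str]], action: str, user: Optional[str] = None
) -> List[str]:
    """Return the source IPs of entries matching the action (and user, if given)."""
    return [
        entry["ip"]
        for entry in logs
        if entry["action"] == action and (user is None or entry["user"] == user)
    ]


suspects = ips_for_action(AUTH_LOGS, "DeleteStorage", user="iam-role-01")
```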
---
## 9. API Reference
Base URL: `http://localhost:7860`
### `POST /reset`
Reset the environment to a specific task.
**Request:**
```json
{ "task_id": "easy" }
```
**Response:**
```json
{
  "observation": {
    "resources": null,
    "details": null,
    "status": null,
    "logs": null,
    "info": "Environment reset. Task: easy"
  },
  "reward": 0.0,
  "done": false
}
```
### `POST /step`
Execute an action in the environment.
**Request:**
```json
{
  "action": {
    "action": "list",
    "resource_type": "s3"
  }
}
```
**Response:**
```json
{
  "observation": {
    "resources": [
      { "id": "prod-data-001", "region": "us-east-1", "public": true, "tags": { "env": "prod" } },
      { "id": "prod-logs-002", "region": "us-east-1", "public": false, "tags": { "env": "prod" } },
      { "id": "dev-test-01", "region": "us-west-2", "public": true, "tags": { "env": "dev" } }
    ],
    "status": "Listed 3 s3 resources."
  },
  "reward": 0.0,
  "done": false
}
```
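The two POST endpoints can be called from Python with only the standard library. This is a minimal sketch: `build_post` and `send` are hypothetical helper names, the payloads are the request bodies shown above, and the base URL assumes a local server on port 7860. Nothing is sent until `send` is called.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:7860"  # assumes the server is running locally


def build_post(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


reset_req = build_post("/reset", {"task_id": "easy"})
step_req = build_post("/step", {"action": {"action": "list", "resource_type": "s3"}})
```

With the server running, calling `send(reset_req)` and then `send(step_req)` should return dictionaries shaped like the responses documented above.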
### `GET /state`
Get internal environment state.
**Response:**
```json
{
  "episode_id": "a1b2c3d4-...",
  "step_count": 3,
  "task_id": "easy",
  "is_completed": false,
  "score": 0.0
}
```
### `GET /docs`
Interactive Swagger UI for API exploration.
### `GET /`
Dashboard UI (the web interface).
---
## 10. Dashboard UI
The application includes a dark-mode cybersecurity dashboard, served at `http://localhost:7860`.
### Features
- **Sidebar Task Selector:** Switch between Easy, Medium, and Hard challenges with one click.
- **Infrastructure Overview:** Visual resource cards for S3 buckets and EC2 instances. Vulnerable resources are highlighted with red borders and blinking status dots.
- **Execution Log:** Terminal-style console showing timestamped action logs with color-coded entries (blue for actions, green for system, yellow for rewards, red for errors).
- **Manual Command Input:** Type commands like `list s3`, `describe i-0abcdef1234567890`, `logs auth-logs`, or `submit prod-data-001` directly in the dashboard.
- **Live Stats HUD:** Displays current task name, cumulative reward, and environment status (Active/Completed).
### Design
- **Theme:** Cyber-noir dark mode with deep navy background (#0a0e14)
- **Accents:** Neon cyan (#00f5ff) for primary elements
- **Typography:** Inter (body), Outfit (headings), JetBrains Mono (code/logs)
- **Effects:** Glassmorphism panels, fade-in card animations, pulsing vulnerability indicators
---
## 11. Running the Application
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Start the server
python -m server.app
# Open in browser (macOS; on Linux use xdg-open instead)
open http://localhost:7860
```
### Running the Baseline Agent
```bash
# Solves the Easy task automatically
python scripts/baseline_inference.py
```
### Docker Deployment
```bash
# Build the image
docker build -t cloud-security-auditor .
# Run the container
docker run -p 7860:7860 cloud-security-auditor
```
### Hugging Face Spaces Deployment
1. Create a new Space on Hugging Face.
2. Select **Docker** as the SDK.
3. Upload the repository contents (including `openenv.yaml` and `Dockerfile`).
4. The entrypoint is automatically set via `openenv.yaml`.
---
## 12. Technology Stack
| Component | Technology |
|---|---|
| Backend | Python 3.10, FastAPI, Uvicorn |
| Environment | openenv-core ≥ 0.1.1 |
| Data Models | Python dataclasses |
| Frontend | Vanilla HTML/CSS/JS |
| Fonts | Google Fonts (Inter, Outfit, JetBrains Mono) |
| Deployment | Docker, Hugging Face Spaces |
---
## 13. OpenEnv Specification (`openenv.yaml`)
```yaml
name: cloud-security-auditor
version: "0.2.0"
description: "A real-world cloud security audit environment for AI agents."
hardware:
  tier: "cpu-small"
  vCPU: 2
  RAM: 4Gi
port: 7860
entrypoint: "uvicorn server.app:app --host 0.0.0.0 --port 7860"
tags:
  - security
  - cloud
  - task-based
evaluation:
  tasks:
    - id: "easy"
      name: "S3 Public Audit"
      difficulty: "easy"
    - id: "medium"
      name: "EC2 Security Patch"
      difficulty: "medium"
    - id: "hard"
      name: "IAM Log Forensic"
      difficulty: "hard"
```
---
## 14. Extending the Environment
### Adding a New Task
1. Add the task definition to `server/tasks.py`.
2. Add the corresponding mock data to `_initialize_state()` in `environment.py`.
3. Add the grading logic to the `step()` method under `CloudActionType.SUBMIT`.
4. Add a new task button to `index.html` in the sidebar.
### Adding a New Resource Type
1. Add the resource data to `self.resources` in `environment.py`.
2. Add a handler for `CloudActionType.LIST` and `CloudActionType.DESCRIBE` for the new type.
3. Update `detectResourceType()` in `app.js` to render the correct card icon/label.
---
*Built for the Meta Hackathon / OpenEnv Challenge • April 2026*