---
title: MetaXRL Soc-OpenEnv
emoji: 🐳
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# SOC Incident Response OpenEnv

## Problem Statement

Security Operations Center teams handle large alert volumes, multi-stage attacks, and conflicting business constraints during active incidents. This project turns that real workflow into a trainable and testable OpenEnv benchmark.

The goal is to evaluate how well an agent can:

- triage noisy SIEM alerts
- reconstruct attack chains across hosts
- contain threats without violating critical business constraints

This environment is designed for hackathon-style validation and reproducible benchmarking.

## What This Project Implements

- Real-world SOC simulation (not a toy domain)
- Full OpenEnv interface with typed models
- `reset()`, `step()`, `state()` contract
- Three tasks with difficulty progression
- Deterministic graders returning scores in `[0.0, 1.0]`
- Dense reward shaping with partial progress signals
- Baseline inference script using OpenAI client against an OpenAI-compatible endpoint
- FastAPI backend and React frontend console for local and judge demos
- Docker + Hugging Face Spaces compatible packaging

## Core Workflow (Conceptual)

This project has three separate layers:

1. Environment
   The simulator generates observations, applies actions, tracks state, and emits rewards.

2. Policy Model
   The baseline model reads observations and outputs one JSON action per step.

3. Grader
   At episode end, deterministic graders map final state to a score from `0.0` to `1.0`.

In short: `observation -> action -> step -> reward -> final grade`.

## Tasks

| ID | Difficulty | Max Steps | Objective |
|---|---|---:|---|
| `alert_triage` | Easy | 10 | Classify and contain true positives while avoiding false-positive containment |
| `attack_chain_reconstruction` | Medium | 25 | Correlate alerts across hosts, recover ATT&CK chain context, contain correctly |
| `constrained_incident_response` | Hard | 40 | Balance security, continuity, and compliance under hard business constraints |

## API Endpoints

- `POST /reset`
- `POST /step`
- `GET /state`
- `POST /grade`
- `GET /api/tasks`

## Local Setup

### 1) Python dependencies

```bash
pip install -r requirements.txt
pip install -e . --no-deps
```

### 2) Frontend dependencies

```bash
cd web
npm install
cd ..
```

## Run Locally (Recommended Terminal Layout)

Use two terminals.

### Terminal A: backend

```powershell
python server.py
```

Backend should be live at:

- `http://127.0.0.1:7860/docs`

### Terminal B: frontend

```powershell
cd web
$env:VITE_API_BASE_URL="http://127.0.0.1:7860"
npm run dev
```

Frontend should be live at:

- `http://localhost:5173`

## How To Test Locally (Backend-Only)

This is the fastest way to validate API behavior before UI checks.

### Step 1: reset

```powershell
Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/reset" -ContentType "application/json" -Body '{"task_id":"alert_triage","seed":42}'
```

### Step 2: step

```powershell
Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/step" -ContentType "application/json" -Body '{"task_id":"alert_triage","action":{"action_type":"enrich_alert","alert_id":"ALT-001","source":"threat_intel"}}'
```

### Step 3: state

```powershell
Invoke-RestMethod -Method Get -Uri "http://127.0.0.1:7860/state?task_id=alert_triage"
```

### Step 4: grade

```powershell
Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/grade?task_id=alert_triage"
```

Important: Always call `reset` first for a task before calling `step` or `grade`.

## How To Test Locally (Frontend Console)

After backend + frontend are running:

1. Open `http://localhost:5173`
2. Select a task on the left panel
3. Click `Reset episode`
4. Confirm `Current observation` and `Backend state` are populated
5. Click `Load suggested action` or edit JSON manually
6. Click `Execute draft action`
7. Optionally click `Run guided demo`
8. Click `Grade current episode`

You should see trace events, reward updates, and a final score breakdown.

## Baseline Inference Script

`inference.py` runs all three tasks by default and writes `baseline_results.json`.

Required environment variables:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`

Example (PowerShell):

```powershell
$env:API_BASE_URL="https://router.huggingface.co/v1"
$env:MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
$env:HF_TOKEN="hf_your_token_here"
python inference.py
```

Single task:

```powershell
python inference.py --task alert_triage
```

## Expected Output Artifacts

- Console logs per step with action and reward
- Final scores per task
- `baseline_results.json` in repo root

## Common Errors and Fixes

### 400 on `/step` in UI

Cause:
- Episode not reset for the selected task.

Fix:
- Click `Reset episode` first, then run step.

### 401 Invalid username or password in `inference.py`

Cause:
- Invalid or missing token/model access.

Fix:
- Verify `HF_TOKEN` is set in the same terminal session.
- Verify token has access to chosen model.
- Verify endpoint and model name are valid.

### Frontend cannot reach backend

Cause:
- Wrong API base URL.

Fix:
- Start backend on `127.0.0.1:7860`.
- Start frontend with `VITE_API_BASE_URL=http://127.0.0.1:7860`.

## Tests

Run unit tests:

```powershell
pytest tests -q
```

## Docker

```bash
docker build -t soc-openenv .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://router.huggingface.co/v1" \
  -e MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct" \
  -e HF_TOKEN="hf_your_token_here" \
  soc-openenv
```

## Validation Before Submission

```bash
./validate.sh
./validate.sh https://YOUR_USERNAME-soc-openenv.hf.space
```

## Hugging Face Spaces

Set these Space secrets:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`

## Project Structure

```text
soc-openenv/
├── openenv.yaml
├── Dockerfile
├── requirements.txt
├── pyproject.toml
├── README.md
├── inference.py
├── server.py
├── validate.sh
├── soc_env/
├── scenarios/
├── tests/
└── web/
```

## Contact

help_openenvhackathon@scaler.com