
---
title: MetaXRL Soc-OpenEnv
emoji: 🐳
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
---

SOC Incident Response OpenEnv

Problem Statement

Security Operations Center teams handle large alert volumes, multi-stage attacks, and conflicting business constraints during active incidents. This project turns that real workflow into a trainable and testable OpenEnv benchmark.

The goal is to evaluate how well an agent can:

triage noisy SIEM alerts
reconstruct attack chains across hosts
contain threats without violating critical business constraints

This environment is designed for hackathon-style validation and reproducible benchmarking.

What This Project Implements

Real-world SOC simulation (not a toy domain)
Full OpenEnv interface with typed models and the reset(), step(), state() contract
Three tasks with increasing difficulty
Deterministic graders returning scores in [0.0, 1.0]
Dense reward shaping with partial-progress signals
Baseline inference script that uses the OpenAI client against an OpenAI-compatible endpoint
FastAPI backend and React frontend console for local runs and judge demos
Docker packaging compatible with Hugging Face Spaces

Core Workflow (Conceptual)

This project has three separate layers:

Environment: the simulator generates observations, applies actions, tracks state, and emits rewards.
Policy model: the baseline model reads observations and outputs one JSON action per step.
Grader: at episode end, deterministic graders map the final state to a score from 0.0 to 1.0.

In short: observation -> action -> step -> reward -> final grade.
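The three layers can be sketched in a few lines of Python. Everything here is hypothetical and heavily simplified: `ToyEnv`, `policy`, `grade`, and `run_episode` are illustrative stand-ins rather than the classes in `soc_env/`, and the alerts, severities, reward values, and scoring formula are invented. The sketch only shows how observation, action, reward, and final grade fit together.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the environment layer (not the real soc_env class)."""
    max_steps: int = 10
    # alert_id -> severity visible to the policy (invented data)
    alerts: dict = field(default_factory=lambda: {"ALT-001": "high", "ALT-002": "low"})
    # Ground truth hidden from the policy; used only by step() and the grader.
    malicious: frozenset = frozenset({"ALT-001"})
    steps: int = 0
    contained: set = field(default_factory=set)

    def reset(self) -> dict:
        self.steps, self.contained = 0, set()
        return self.observe()

    def observe(self) -> dict:
        open_alerts = {a: sev for a, sev in self.alerts.items() if a not in self.contained}
        return {"open_alerts": open_alerts, "step": self.steps}

    def step(self, action: dict) -> tuple:
        self.steps += 1
        reward = 0.0
        if action["action_type"] == "contain":
            self.contained.add(action["alert_id"])
            # Dense shaping: reward true-positive containment, penalize false positives.
            reward = 1.0 if action["alert_id"] in self.malicious else -0.5
        done = self.steps >= self.max_steps or self.malicious <= self.contained
        return self.observe(), reward, done


def policy(obs: dict) -> dict:
    """Hypothetical policy layer: contain high-severity alerts, otherwise enrich."""
    for alert_id, severity in obs["open_alerts"].items():
        if severity == "high":
            return {"action_type": "contain", "alert_id": alert_id}
    first = next(iter(obs["open_alerts"]), None)
    return {"action_type": "enrich_alert", "alert_id": first, "source": "threat_intel"}


def grade(env: ToyEnv) -> float:
    """Hypothetical deterministic grader: reads only final state, clamped to [0, 1]."""
    true_pos = len(env.malicious & env.contained)
    false_pos = len(env.contained - env.malicious)
    raw = true_pos / len(env.malicious) - 0.5 * false_pos / len(env.alerts)
    return min(1.0, max(0.0, raw))


def run_episode(env: ToyEnv, act) -> tuple:
    """observation -> action -> step -> reward until done, then the final grade."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = act(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total, grade(env)
```

Because the toy grader reads only the final state and uses no randomness, replaying the same episode always yields the same score. `run_episode(ToyEnv(), policy)` contains `ALT-001` on the first step and ends there.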

Tasks

| ID | Difficulty | Max steps | Objective |
|---|---|---|---|
| `alert_triage` | Easy | 10 | Classify and contain true positives while avoiding containment of false positives |
| `attack_chain_reconstruction` | Medium | 25 | Correlate alerts across hosts, recover ATT&CK chain context, and contain correctly |
| `constrained_incident_response` | Hard | 40 | Balance security, continuity, and compliance under hard business constraints |
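If you drive the environment from your own script, it can help to keep the step budgets from the table above next to your agent loop. The sketch below just transcribes the table; `TaskSpec`, `TASKS`, and `remaining_steps` are hypothetical names, and at runtime the server (presumably via `GET /api/tasks`) remains the authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """One row of the task table (hypothetical client-side mirror)."""
    task_id: str
    difficulty: str
    max_steps: int


# Transcribed from the task table; the server's own registry is authoritative.
TASKS = {
    spec.task_id: spec
    for spec in (
        TaskSpec("alert_triage", "Easy", 10),
        TaskSpec("attack_chain_reconstruction", "Medium", 25),
        TaskSpec("constrained_incident_response", "Hard", 40),
    )
}


def remaining_steps(task_id: str, steps_taken: int) -> int:
    """Steps left in the episode budget; raises KeyError for an unknown task ID."""
    if steps_taken < 0:
        raise ValueError("steps_taken must be non-negative")
    return max(0, TASKS[task_id].max_steps - steps_taken)
```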

API Endpoints

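POST /reset: start a new episode for a task (JSON body with task_id and seed)
POST /step: apply one action to the current episode (JSON body with task_id and action)
GET /state: return the current episode state (query parameter task_id)
POST /grade: score the current episode (query parameter task_id)
GET /api/tasks: list the available tasks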

Local Setup

1) Python dependencies

pip install -r requirements.txt
pip install -e . --no-deps

2) Frontend dependencies

cd web
npm install
cd ..

Run Locally (Recommended Terminal Layout)

Use two terminals.

Terminal A: backend

python server.py

Backend should be live at:

http://127.0.0.1:7860/docs

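Terminal B: frontend (PowerShell)

cd web
$env:VITE_API_BASE_URL="http://127.0.0.1:7860"
npm run dev

In bash or zsh, set the variable inline instead: VITE_API_BASE_URL=http://127.0.0.1:7860 npm run dev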

Frontend should be live at:

http://localhost:5173
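If you script this two-terminal setup (for example in CI), it helps to wait until the backend answers before starting the frontend or sending requests. This is a minimal standard-library sketch, assuming the backend serves FastAPI's interactive docs at `/docs` as described above; `wait_for_backend` and `http_probe` are hypothetical helpers, not part of this repository.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, refused connections and timeouts all subclass OSError
        return False


def wait_for_backend(
    url: str = "http://127.0.0.1:7860/docs",
    timeout: float = 30.0,
    interval: float = 0.5,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll `url` until `probe` succeeds (True) or `timeout` seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_backend()` before the first `/reset`. A `False` result usually means `server.py` is not running or is bound to a different host or port.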

How To Test Locally (Backend-Only)

This is the fastest way to validate API behavior before checking the UI. The commands below use PowerShell's Invoke-RestMethod.

Step 1: reset

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/reset" -ContentType "application/json" -Body '{"task_id":"alert_triage","seed":42}'

Step 2: step

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/step" -ContentType "application/json" -Body '{"task_id":"alert_triage","action":{"action_type":"enrich_alert","alert_id":"ALT-001","source":"threat_intel"}}'

Step 3: state

Invoke-RestMethod -Method Get -Uri "http://127.0.0.1:7860/state?task_id=alert_triage"

Step 4: grade

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/grade?task_id=alert_triage"

Important: Always call reset first for a task before calling step or grade.
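The Invoke-RestMethod commands above only work in PowerShell. On any platform, the same four calls can be scripted with Python's standard library. This is a minimal sketch: `SocEnvClient`, `urlopen_json`, and `smoke_test` are hypothetical names, the request shapes are copied from Steps 1 to 4, and responses are simply decoded as JSON without assuming any particular schema.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional

# A transport takes a prepared request and returns the decoded JSON response.
Transport = Callable[[urllib.request.Request], Any]


def urlopen_json(req: urllib.request.Request) -> Any:
    """Default transport: send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class SocEnvClient:
    """Hypothetical thin client for the endpoints used in Steps 1 to 4."""

    def __init__(self, base_url: str = "http://127.0.0.1:7860",
                 transport: Transport = urlopen_json) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _call(self, method: str, path: str, query: Optional[dict] = None,
              body: Optional[dict] = None) -> Any:
        url = self.base_url + path
        if query:
            url += "?" + urllib.parse.urlencode(query)
        headers, data = {}, None
        if body is not None:  # only JSON-bodied calls get a Content-Type header
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        req = urllib.request.Request(url, data=data, headers=headers, method=method)
        return self.transport(req)

    def reset(self, task_id: str, seed: int = 42) -> Any:
        return self._call("POST", "/reset", body={"task_id": task_id, "seed": seed})

    def step(self, task_id: str, action: dict) -> Any:
        return self._call("POST", "/step", body={"task_id": task_id, "action": action})

    def state(self, task_id: str) -> Any:
        return self._call("GET", "/state", query={"task_id": task_id})

    def grade(self, task_id: str) -> Any:
        return self._call("POST", "/grade", query={"task_id": task_id})


def smoke_test(client: SocEnvClient, task_id: str = "alert_triage") -> Any:
    """Steps 1 to 4: reset first, then step, state, and grade."""
    client.reset(task_id, seed=42)
    client.step(task_id, {"action_type": "enrich_alert",
                          "alert_id": "ALT-001", "source": "threat_intel"})
    client.state(task_id)
    return client.grade(task_id)
```

With `server.py` running, `smoke_test(SocEnvClient())` returns the grade response. With the default transport, an HTTP error such as the 400 from stepping an episode that was never reset surfaces as `urllib.error.HTTPError`. Injecting a fake transport lets you check request construction without a server.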

How To Test Locally (Frontend Console)

Once the backend and frontend are both running:

Open http://localhost:5173
Select a task on the left panel
Click Reset episode
Confirm Current observation and Backend state are populated
Click Load suggested action or edit JSON manually
Click Execute draft action
Optionally click Run guided demo
Click Grade current episode

You should see trace events, reward updates, and a final score breakdown.

Baseline Inference Script

inference.py runs all three tasks by default and writes baseline_results.json.

Required environment variables:

API_BASE_URL
MODEL_NAME
HF_TOKEN

Example (PowerShell):

$env:API_BASE_URL="https://router.huggingface.co/v1"
$env:MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
$env:HF_TOKEN="hf_your_token_here"
python inference.py

Single task:

python inference.py --task alert_triage
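The 401 error listed under Common Errors and Fixes often comes down to a variable missing from the current shell. A pre-flight check like the one below fails fast with a readable message before any model call, and a small parser shows one way to pull a single JSON action out of a model reply. Both `load_config` and `parse_action` are hypothetical helpers, not code taken from `inference.py`, which may handle these steps differently.

```python
import json
import os
from typing import Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the three required settings, or raise listing every missing one."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def parse_action(reply: str) -> dict:
    """Return the first JSON object in `reply` that has an "action_type" key.

    Tolerates surrounding prose by scanning each "{" and trying to decode from there.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(reply, start)
            if isinstance(obj, dict) and "action_type" in obj:
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON at this position; try the next "{"
        start = reply.find("{", start + 1)
    raise ValueError("no JSON action with an 'action_type' field found")
```

Calling `load_config()` at the top of a script surfaces a missing `HF_TOKEN` as a clear error instead of a 401 several steps later.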

Expected Output Artifacts

Console logs per step with action and reward
Final scores per task
baseline_results.json in repo root

Common Errors and Fixes

400 on `/step` in UI

Cause:

Episode not reset for the selected task.

Fix:

Click Reset episode first, then run step.

401 Invalid username or password in `inference.py`

Cause:

The token is missing or invalid, or it does not have access to the selected model.

Fix:

Verify HF_TOKEN is set in the same terminal session.
Verify token has access to chosen model.
Verify endpoint and model name are valid.

Frontend cannot reach backend

Cause:

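The frontend is using the wrong API base URL.

Fix:

Start the backend on 127.0.0.1:7860.
Start the frontend with VITE_API_BASE_URL=http://127.0.0.1:7860 set in the same shell, and restart npm run dev after changing it, since Vite reads the variable at startup.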

Tests

Run unit tests:

pytest tests -q

Docker

docker build -t soc-openenv .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://router.huggingface.co/v1" \
  -e MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct" \
  -e HF_TOKEN="hf_your_token_here" \
  soc-openenv

Validation Before Submission

Run the validation script with no arguments, or pass the URL of your deployed Space:

./validate.sh
./validate.sh https://YOUR_USERNAME-soc-openenv.hf.space

Hugging Face Spaces

Set these Space secrets:

API_BASE_URL
MODEL_NAME
HF_TOKEN

Project Structure

soc-openenv/
├── openenv.yaml
├── Dockerfile
├── requirements.txt
├── pyproject.toml
├── README.md
├── inference.py
├── server.py
├── validate.sh
├── soc_env/
├── scenarios/
├── tests/
└── web/

Contact

help_openenvhackathon@scaler.com
