metadata
title: MetaXRL Soc-OpenEnv
emoji: 🐳
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit

SOC Incident Response OpenEnv

Problem Statement

Security Operations Center teams handle large alert volumes, multi-stage attacks, and conflicting business constraints during active incidents. This project turns that real workflow into a trainable and testable OpenEnv benchmark.

The goal is to evaluate how well an agent can:

  • triage noisy SIEM alerts
  • reconstruct attack chains across hosts
  • contain threats without violating critical business constraints

This environment is designed for hackathon-style validation and reproducible benchmarking.

What This Project Implements

  • Real-world SOC simulation (not a toy domain)
  • Full OpenEnv interface with typed models
  • reset(), step(), state() contract
  • Three tasks with difficulty progression
  • Deterministic graders returning scores in [0.0, 1.0]
  • Dense reward shaping with partial progress signals
  • Baseline inference script that uses the OpenAI client against an OpenAI-compatible endpoint
  • FastAPI backend and React frontend console for local and judge demos
  • Docker + Hugging Face Spaces compatible packaging
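To make the "deterministic grader" idea concrete, here is a minimal sketch of a grader that maps a final state to a score in [0.0, 1.0]. It is not the project's grader: the `FinalState` fields, the `grade` function, and the `false_positive_penalty` weight are all hypothetical and only illustrate the clamping and the absence of randomness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinalState:
    """Hypothetical end-of-episode summary; field names are illustrative only."""
    true_positives_contained: int
    true_positives_total: int
    false_positives_contained: int


def grade(state: FinalState, false_positive_penalty: float = 0.25) -> float:
    """Return a score in [0.0, 1.0]; the same state always yields the same score."""
    if state.true_positives_total == 0:
        coverage = 1.0  # nothing to contain, so coverage is trivially complete
    else:
        coverage = state.true_positives_contained / state.true_positives_total
    # Assumed penalty: each wrongly contained false positive costs a fixed amount.
    raw = coverage - false_positive_penalty * state.false_positives_contained
    # Clamp so the score never leaves the documented range.
    return max(0.0, min(1.0, raw))
```

Because the function depends only on its inputs, grading the same final state twice gives the same score, which is what makes benchmark runs comparable.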

Core Workflow (Conceptual)

This project has three separate layers:

  1. Environment: The simulator generates observations, applies actions, tracks state, and emits rewards.

  2. Policy Model: The baseline model reads observations and outputs one JSON action per step.

  3. Grader: At episode end, deterministic graders map final state to a score from 0.0 to 1.0.

In short: observation -> action -> step -> reward -> final grade.
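The loop above can be sketched in a few lines of Python. `ToyEnv`, `fixed_policy`, and `run_episode` are stand-ins written for this illustration, not classes from `soc_env`; the reward value and the three-step limit are arbitrary. Only the action fields (`action_type`, `alert_id`, `source`) mirror the example request shown elsewhere in this README.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Stand-in with a reset/step/state shape; not the project's simulator."""
    max_steps: int = 3
    steps_taken: int = 0
    history: list = field(default_factory=list)

    def reset(self) -> dict:
        self.steps_taken = 0
        self.history = []
        return {"alerts": ["ALT-001"], "step": 0}

    def step(self, action: dict) -> tuple:
        self.steps_taken += 1
        self.history.append(action)
        # Arbitrary dense reward: a small signal for each enrichment action.
        reward = 0.1 if action.get("action_type") == "enrich_alert" else 0.0
        done = self.steps_taken >= self.max_steps
        observation = {"alerts": ["ALT-001"], "step": self.steps_taken}
        return observation, reward, done

    def state(self) -> dict:
        return {"steps_taken": self.steps_taken, "history": list(self.history)}


def fixed_policy(observation: dict) -> dict:
    """Emit one JSON-serializable action per observation."""
    return {
        "action_type": "enrich_alert",
        "alert_id": observation["alerts"][0],
        "source": "threat_intel",
    }


def run_episode(env: ToyEnv, policy) -> tuple:
    """observation -> action -> step -> reward, repeated until the episode ends."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward, env.state()
```

The final state returned by `run_episode` is what a grader would consume to produce the episode score.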

Tasks

| ID | Difficulty | Max Steps | Objective |
| --- | --- | --- | --- |
| alert_triage | Easy | 10 | Classify and contain true positives while avoiding false-positive containment |
| attack_chain_reconstruction | Medium | 25 | Correlate alerts across hosts, recover ATT&CK chain context, contain correctly |
| constrained_incident_response | Hard | 40 | Balance security, continuity, and compliance under hard business constraints |
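When scripting against the tasks, it can help to keep the step budgets in one place. The sketch below copies the task IDs and maximum step counts from the table; the `MAX_STEPS` dict and the `steps_remaining` helper are hypothetical and are not part of the project's code.

```python
# Step budgets copied from the task table; the dict itself is an illustrative helper.
MAX_STEPS = {
    "alert_triage": 10,
    "attack_chain_reconstruction": 25,
    "constrained_incident_response": 40,
}


def steps_remaining(task_id: str, steps_taken: int) -> int:
    """Return how many steps are left in the budget, never below zero."""
    if task_id not in MAX_STEPS:
        raise KeyError(f"unknown task_id: {task_id!r}")
    return max(0, MAX_STEPS[task_id] - steps_taken)
```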

API Endpoints

  • POST /reset
  • POST /step
  • GET /state
  • POST /grade
  • GET /api/tasks

Local Setup

1) Python dependencies

pip install -r requirements.txt
pip install -e . --no-deps

2) Frontend dependencies

cd web
npm install
cd ..

Run Locally (Recommended Terminal Layout)

Use two terminals.

Terminal A: backend

python server.py

Backend should be live at:

  • http://127.0.0.1:7860/docs

Terminal B: frontend (PowerShell)

cd web
$env:VITE_API_BASE_URL="http://127.0.0.1:7860"
npm run dev

Frontend should be live at:

  • http://localhost:5173

How To Test Locally (Backend-Only)

This is the fastest way to validate API behavior before UI checks. The commands below use PowerShell.

Step 1: reset

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/reset" -ContentType "application/json" -Body '{"task_id":"alert_triage","seed":42}'

Step 2: step

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/step" -ContentType "application/json" -Body '{"task_id":"alert_triage","action":{"action_type":"enrich_alert","alert_id":"ALT-001","source":"threat_intel"}}'

Step 3: state

Invoke-RestMethod -Method Get -Uri "http://127.0.0.1:7860/state?task_id=alert_triage"

Step 4: grade

Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:7860/grade?task_id=alert_triage"

Important: Always call reset first for a task before calling step or grade.
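For shells other than PowerShell, the same four calls can be sketched with Python's standard library. The URLs, query parameters, and request bodies are taken from the commands above; the helper names (`build_request`, `send`, `smoke_test_requests`) are hypothetical, and the assumption that every endpoint returns JSON is not confirmed by this README.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:7860"


def build_request(method, path, body=None, query=None):
    """Build (but do not send) a request for one endpoint."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test_requests(task_id="alert_triage", seed=42):
    """Return the reset, step, state, grade requests in the required order."""
    action = {
        "action_type": "enrich_alert",
        "alert_id": "ALT-001",
        "source": "threat_intel",
    }
    return [
        build_request("POST", "/reset", body={"task_id": task_id, "seed": seed}),
        build_request("POST", "/step", body={"task_id": task_id, "action": action}),
        build_request("GET", "/state", query={"task_id": task_id}),
        build_request("POST", "/grade", query={"task_id": task_id}),
    ]
```

With the backend running, `for request in smoke_test_requests(): print(send(request))` would issue the calls in order, starting with reset.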

How To Test Locally (Frontend Console)

After backend + frontend are running:

  1. Open http://localhost:5173
  2. Select a task on the left panel
  3. Click Reset episode
  4. Confirm Current observation and Backend state are populated
  5. Click Load suggested action or edit JSON manually
  6. Click Execute draft action
  7. Optionally click Run guided demo
  8. Click Grade current episode

You should see trace events, reward updates, and a final score breakdown.

Baseline Inference Script

inference.py runs all three tasks by default and writes baseline_results.json.

Required environment variables:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN

Example (PowerShell):

$env:API_BASE_URL="https://router.huggingface.co/v1"
$env:MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
$env:HF_TOKEN="hf_your_token_here"
python inference.py

Single task:

python inference.py --task alert_triage
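A quick preflight check can catch an unset variable before the script reaches the model endpoint. The sketch below is not part of `inference.py`; `missing_env_vars` is a hypothetical helper that only checks that the three required variables are present and non-empty.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

This only verifies presence; it cannot tell whether the token is valid or has access to the chosen model.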

Expected Output Artifacts

  • Console logs per step with action and reward
  • Final scores per task
  • baseline_results.json in repo root
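After a run, the scores can be sanity-checked against the documented [0.0, 1.0] range. This README does not specify the layout of `baseline_results.json`, so the sketch below assumes a flat mapping from task ID to numeric score; adjust the lookup if the real file is nested. Both function names are hypothetical.

```python
import json
from pathlib import Path


def find_invalid_scores(results: dict) -> dict:
    """Return the entries whose score is not a number in [0.0, 1.0]."""
    invalid = {}
    for task_id, score in results.items():
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            invalid[task_id] = score
    return invalid


def check_results_file(path="baseline_results.json") -> dict:
    """Load the results file (assumed flat task -> score JSON) and validate it."""
    results = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_invalid_scores(results)
```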

Common Errors and Fixes

400 on /step in UI

Cause:

  • Episode not reset for the selected task.

Fix:

  • Click Reset episode first, then run step.

401 Invalid username or password in inference.py

Cause:

  • Invalid or missing token/model access.

Fix:

  • Verify HF_TOKEN is set in the same terminal session that runs inference.py.
  • Verify the token has access to the chosen model.
  • Verify the endpoint (API_BASE_URL) and model name (MODEL_NAME) are valid.

Frontend cannot reach backend

Cause:

  • Wrong API base URL.

Fix:

  • Start backend on 127.0.0.1:7860.
  • Start frontend with VITE_API_BASE_URL=http://127.0.0.1:7860.

Tests

Run unit tests:

pytest tests -q

Docker

docker build -t soc-openenv .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://router.huggingface.co/v1" \
  -e MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct" \
  -e HF_TOKEN="hf_your_token_here" \
  soc-openenv

Validation Before Submission

./validate.sh
./validate.sh https://YOUR_USERNAME-soc-openenv.hf.space

Hugging Face Spaces

Set these Space secrets:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN

Project Structure

soc-openenv/
├── openenv.yaml
├── Dockerfile
├── requirements.txt
├── pyproject.toml
├── README.md
├── inference.py
├── server.py
├── validate.sh
├── soc_env/
├── scenarios/
├── tests/
└── web/

Contact

help_openenvhackathon@scaler.com