
OpenEnv-Sentinel: User Guide

Step-by-step guide for running, validating, and deploying the SRE Incident Triage environment.


Table of Contents

  1. Prerequisites
  2. Local Setup
  3. Running the Server Locally
  4. Manual Validation (Local Server)
  5. Docker Build & Validation
  6. Running Inference (LLM Agent)
  7. OpenEnv Validate
  8. Deploy to Hugging Face Spaces
  9. Troubleshooting

1. Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Python | ≥ 3.10 | Runtime |
| pip / pipenv | Latest | Dependency management |
| Docker | Latest | Container build & test |
| Git | Latest | Version control, HF push |
| huggingface-cli | Latest | HF Spaces deployment |
| openenv-core CLI | ≥ 0.2.3 | openenv validate / openenv push |

Install the OpenEnv CLI and Hugging Face CLI:

pip install openenv-core huggingface-hub[cli]

2. Local Setup

Option A - pip (quick)

cd openenv-sentinel
pip install -e ".[dev,inference]"

Option B - pipenv (isolated)

cd openenv-sentinel
pipenv install --python 3.12
pipenv install -e ".[dev]"
pipenv install openai httpx websockets
pipenv shell

All subsequent commands assume you are inside the virtual environment.


3. Running the Server Locally

Start the FastAPI server on port 8000:

uvicorn server.app:app --host 0.0.0.0 --port 8000

You should see:

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Verify with:

curl http://localhost:8000/health
# → {"status":"ok"}

curl http://localhost:8000/schema
# → JSON with action, observation, state schemas

4. Manual Validation (Local Server)

With the server running (from step 3), validate the environment in a second terminal.

4.1 Automated test script

The quickest way to validate all 3 tasks end-to-end:

pip install websockets httpx   # if not already installed
python test_local.py

Expected output:

Health: {'status': 'ok'}
Schema: action fields=['tool_name', 'parameters']

==================================================
TASK 1
==================================================
  Reset OK: CRITICAL: payment-api returning HTTP 500 errors...
  Step 1 (status payment-api): reward=0.11
  Step 2 (logs payment-api): reward=0.11
  Resolution: score=0.75, done=True
  State: final_score=1.0, root_cause_correct=True, recommendation_correct=True

... (Tasks 2 & 3 similar) ...

✅ ALL TESTS PASSED

4.2 Manual cURL validation (HTTP endpoints)

Note: HTTP endpoints are stateless; each request creates a fresh environment instance. Use these for single-shot checks only. For multi-step episodes, use the WebSocket endpoint (section 4.3).
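
To see this concretely, the sketch below (assuming the /step endpoint accepts a payload matching the {"tool_name", "parameters"} action schema; adjust if it expects a different envelope) issues the same step twice over HTTP. Each call runs against a brand-new episode, so neither response reflects the other's history.

import httpx

BASE = "http://localhost:8000"

# Payload shape assumed from the action schema shown by /schema.
action = {
    "tool_name": "get_service_status",
    "parameters": {"service": "payment-api"},
}

# Two identical HTTP steps: each request creates a fresh environment
# instance, so the second response does not continue the first episode.
r1 = httpx.post(f"{BASE}/step", json=action)
r2 = httpx.post(f"{BASE}/step", json=action)
print(r1.json())
print(r2.json())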

Health check:

curl http://localhost:8000/health

Schema check:

curl http://localhost:8000/schema | python -m json.tool

Reset (single-shot):

curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": 1}'

4.3 Manual WebSocket validation (stateful sessions)

Multi-step episodes require WebSocket because the server maintains session state across messages. Install websocat or use Python:

Using Python interactively:

import asyncio, json, websockets

async def manual_test():
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # 1. Reset to Task 1
        await ws.send(json.dumps({"type": "reset", "data": {"task_id": 1}}))
        resp = json.loads(await ws.recv())
        print("Reset:", json.dumps(resp["data"]["observation"]["incident_summary"]))

        # 2. Call a diagnostic tool
        await ws.send(json.dumps({
            "type": "step",
            "data": {
                "tool_name": "get_service_status",
                "parameters": {"service": "payment-api"}
            }
        }))
        resp = json.loads(await ws.recv())
        print("Step 1:", resp["data"]["observation"]["tool_output"][:200])

        # 3. Submit resolution
        await ws.send(json.dumps({
            "type": "step",
            "data": {
                "tool_name": "submit_resolution",
                "parameters": {
                    "root_cause": "Missing DB_CONNECTION_STRING after v2.3.1 deploy",
                    "affected_service": "payment-api",
                    "recommendation": "Rollback to v2.3.0 or set the env var"
                }
            }
        }))
        resp = json.loads(await ws.recv())
        print("Done:", resp["data"]["done"], "Score:", resp["data"]["reward"])

        # 4. Get final state
        await ws.send(json.dumps({"type": "state"}))
        resp = json.loads(await ws.recv())
        print("Final score:", resp["data"]["final_score"])

asyncio.run(manual_test())

Using websocat (CLI tool):

brew install websocat   # macOS
websocat ws://localhost:8000/ws

Then type JSON messages line by line:

{"type": "reset", "data": {"task_id": 1}}
{"type": "step", "data": {"tool_name": "get_service_status", "parameters": {"service": "payment-api"}}}
{"type": "step", "data": {"tool_name": "submit_resolution", "parameters": {"root_cause": "Missing DB_CONNECTION_STRING", "affected_service": "payment-api", "recommendation": "Rollback to v2.3.0"}}}
{"type": "state"}

4.4 What to check

| Check | Expected |
|-------|----------|
| /health returns 200 | {"status": "ok"} |
| /schema returns action/observation/state schemas | Three top-level keys with JSON Schema properties |
| Reset with task_id 1, 2, 3 | Returns incident_summary, available_tools (7 tools), done: false |
| Diagnostic tool steps | Returns non-empty tool_output and a per-step reward |
| submit_resolution | Sets done: true, returns a graded reward |
| State after resolution | final_score between 0.0 and 1.0, root_cause_correct bool |
| All 3 tasks score > 0.0 with good resolutions | Tasks 1, 2, and 3 each ≈ 1.0 (with ideal answers) |
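
The first two rows translate directly into a scripted smoke check. A minimal sketch with httpx (the top-level schema key names are assumed from the description above):

import httpx

BASE = "http://localhost:8000"

# /health must return 200 with {"status": "ok"}.
health = httpx.get(f"{BASE}/health")
assert health.status_code == 200
assert health.json() == {"status": "ok"}

# /schema must expose three top-level schemas
# (key names assumed: action, observation, state).
schema = httpx.get(f"{BASE}/schema").json()
assert {"action", "observation", "state"} <= set(schema)

print("Basic endpoint checks passed")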

5. Docker Build & Validation

5.1 Build the image

docker build -t sentinel-env:latest -f server/Dockerfile .

5.2 Run the container

docker run -p 8000:8000 sentinel-env:latest

The server starts on port 8000 inside the container, mapped to your host.

5.3 Validate against the container

Once the container is running, all the same validation steps from section 4 work:

# Health check
curl http://localhost:8000/health

# Run the automated test suite
python test_local.py

# Or run inference against the containerised server
ENV_URL=http://localhost:8000 python inference.py
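
Containers can take a few seconds to come up, so it helps to wait for /health before firing tests at them. A minimal polling sketch (the 60-second timeout is arbitrary):

import time

import httpx

BASE = "http://localhost:8000"

# Poll /health until the container answers, or give up after 60 seconds.
deadline = time.time() + 60
while time.time() < deadline:
    try:
        if httpx.get(f"{BASE}/health", timeout=2).json().get("status") == "ok":
            print("Container is ready")
            break
    except httpx.HTTPError:
        pass  # server not accepting connections yet
    time.sleep(1)
else:
    raise SystemExit("Server did not become healthy within 60s")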

5.4 Docker: useful commands

# Build with no cache (clean rebuild)
docker build --no-cache -t sentinel-env:latest -f server/Dockerfile .

# Run in background
docker run -d --name sentinel -p 8000:8000 sentinel-env:latest

# View logs
docker logs -f sentinel

# Stop and remove
docker stop sentinel && docker rm sentinel

# Check image size (should be < 500MB)
docker images sentinel-env

6. Running Inference (LLM Agent)

The inference script drives an LLM through all 3 tasks via WebSocket.

6.1 Set environment variables

The inference script supports HF Inference (default), OpenAI, and Azure OpenAI endpoints.

Important: ENV_URL is the URL of the Sentinel environment server; API_BASE_URL is the LLM API endpoint (matching the official OpenEnv inference pattern).

Option A - HF Inference API (default, for hackathon submission):

export ENV_URL=http://localhost:8000                        # env server
export API_BASE_URL=https://router.huggingface.co/v1        # default, can omit
export MODEL_NAME=openai/gpt-oss-120b:novita                # default, can omit
export HF_TOKEN=hf_...                                      # or API_KEY

Option B - OpenAI:

export ENV_URL=http://localhost:8000
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o
export API_KEY=sk-...

Option C - Azure OpenAI (for local/enterprise testing):

export ENV_URL=http://localhost:8000
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
export AZURE_OPENAI_API_KEY=your-azure-key
export MODEL_NAME=your-deployment-name      # Azure deployment name
export AZURE_OPENAI_API_VERSION=2024-12-01-preview  # optional, this is the default

When AZURE_OPENAI_ENDPOINT is set, the script uses the AzureOpenAI client. Otherwise it uses OpenAI(base_url=API_BASE_URL, api_key=...), which covers both the HF router and direct OpenAI.
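
In code, that selection looks roughly like the following (a sketch based on the description above; the actual inference.py may differ in details):

import os

from openai import AzureOpenAI, OpenAI

# Azure takes precedence when its endpoint is configured.
if os.getenv("AZURE_OPENAI_ENDPOINT"):
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
    )
    model = os.environ["MODEL_NAME"]  # Azure deployment name
else:
    # Covers both the HF router and direct OpenAI.
    client = OpenAI(
        base_url=os.getenv("API_BASE_URL", "https://router.huggingface.co/v1"),
        api_key=os.getenv("API_KEY") or os.getenv("HF_TOKEN"),
    )
    model = os.getenv("MODEL_NAME", "openai/gpt-oss-120b:novita")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)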

6.2 Install inference dependencies

pip install openai websockets

6.3 Run

python inference.py

Expected output:

==================================================
Running Task 1...
==================================================
Task 1: 0.85

==================================================
Running Task 2...
==================================================
Task 2: 0.65

==================================================
Running Task 3...
==================================================
Task 3: 0.40

==================================================
Task 1: 0.85
Task 2: 0.65
Task 3: 0.40
Average: 0.63
==================================================
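
For reference, the loop behind those numbers is conceptually simple. A hedged sketch (not the actual inference.py: the step budget and the choose_action helper are invented for illustration, while the WebSocket message shapes follow section 4.3):

import asyncio
import json
import os

import websockets

# "http://..." becomes "ws://...", "https://..." becomes "wss://...".
ENV_WS = os.environ["ENV_URL"].replace("http", "ws", 1) + "/ws"

async def run_task(task_id: int, choose_action) -> float:
    """Drive one episode; choose_action maps an observation to an action dict
    of the form {"tool_name": ..., "parameters": {...}} (e.g. by prompting
    the LLM with the incident summary and available tools)."""
    async with websockets.connect(ENV_WS) as ws:
        await ws.send(json.dumps({"type": "reset", "data": {"task_id": task_id}}))
        obs = json.loads(await ws.recv())["data"]["observation"]
        reward = 0.0
        for _ in range(10):  # illustrative step budget
            await ws.send(json.dumps({"type": "step", "data": choose_action(obs)}))
            data = json.loads(await ws.recv())["data"]
            obs, reward = data["observation"], data["reward"]
            if data["done"]:
                break
        return reward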

6.4 Inference against a remote HF Space

# HF model via HF router (hackathon default)
export ENV_URL=https://your-username-sentinel-env.hf.space
export HF_TOKEN=hf_...
python inference.py

# OpenAI model
export ENV_URL=https://your-username-sentinel-env.hf.space
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o
export API_KEY=sk-...
python inference.py

# Azure OpenAI
export ENV_URL=https://your-username-sentinel-env.hf.space
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
export AZURE_OPENAI_API_KEY=your-azure-key
export MODEL_NAME=your-deployment-name
python inference.py

7. OpenEnv Validate

Run the official OpenEnv validation to confirm spec compliance:

openenv validate

This checks:

  • openenv.yaml manifest is valid
  • The app entry point (server.app:app) is importable
  • A main() function exists in the script entry point (see the sketch below)
  • uv.lock is present and up to date
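
If the entry-point checks fail, the expected shape is roughly the following (a sketch; your actual server/app.py and script name may differ):

# server/app.py (sketch)
import uvicorn
from fastapi import FastAPI

app = FastAPI()  # the importable entry point (server.app:app)

def main() -> None:
    # Referenced from pyproject.toml, for example:
    #   [project.scripts]
    #   sentinel-env = "server.app:main"   # script name is illustrative
    uvicorn.run("server.app:app", host="0.0.0.0", port=8000)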

If uv.lock is missing or stale:

pip install uv
uv lock
openenv validate

8. Deploy to Hugging Face Spaces

8.1 Login to Hugging Face

huggingface-cli login
# Paste your HF token when prompted (needs write access)

8.2 Option A β€” openenv push (recommended)

openenv push

This reads openenv.yaml and pushes the environment as a Docker Space tagged with openenv.

8.3 Option B β€” Manual HF Spaces deployment

Step 1: Create the Space

Go to https://huggingface.co/new-space and create a new Space:

  • Space name: sentinel-env (or any name)
  • SDK: Docker
  • Hardware: CPU basic (2 vCPU, 16 GB RAM, free tier)
  • Visibility: Public

Step 2: Clone the Space repo

git clone https://huggingface.co/spaces/YOUR_USERNAME/sentinel-env hf-space
cd hf-space

Step 3: Copy project files

# Copy all source files
cp -r /path/to/openenv-sentinel/{models.py,__init__.py,client.py,inference.py} .
cp -r /path/to/openenv-sentinel/{server,scenarios,tools,grading} .
cp /path/to/openenv-sentinel/openenv.yaml .
cp /path/to/openenv-sentinel/pyproject.toml .
cp /path/to/openenv-sentinel/README.md .

# The Dockerfile must be at the repo root for HF Spaces
cp /path/to/openenv-sentinel/server/Dockerfile .

Important: HF Spaces expects the Dockerfile at the repository root. The COPY paths inside the Dockerfile already reference files relative to the build context (the repo root), so no changes are needed.

Step 4: Push to HF

git add .
git commit -m "Deploy OpenEnv-Sentinel"
git push

Step 5: Verify deployment

The Space builds automatically. Once running:

curl https://YOUR_USERNAME-sentinel-env.hf.space/health
# → {"status": "ok"}

8.4 Verify the deployed Space

# Health
curl https://YOUR_USERNAME-sentinel-env.hf.space/health

# Schema
curl https://YOUR_USERNAME-sentinel-env.hf.space/schema

# Run test_local.py against the Space (edit BASE_HTTP/BASE_WS in the file)
# Or run inference:
ENV_URL=https://YOUR_USERNAME-sentinel-env.hf.space \
HF_TOKEN=hf_... \
python inference.py

8.5 HF Spaces tips

  • Cold starts: Free-tier Spaces sleep after inactivity. First request takes ~30s.
  • Logs: View build & runtime logs in the Space's "Logs" tab on HF.
  • Environment variables: Set secrets (like API keys) in Space Settings → Repository secrets.
  • Tags: Ensure the README frontmatter includes tags: [openenv] for hackathon discovery.
  • Port: The app_port: 8000 in the README frontmatter must match the EXPOSE port in the Dockerfile (see the example below).
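
For reference, a minimal README frontmatter that satisfies the last two points (illustrative; your Space's frontmatter may carry additional fields such as emoji or colorFrom):

---
title: Sentinel Env
sdk: docker
app_port: 8000
tags:
  - openenv
---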

9. Troubleshooting

| Problem | Solution |
|---------|----------|
| ModuleNotFoundError: No module named 'openenv' | Run pip install -e . or pip install "openenv-core>=0.2.3" |
| openenv validate fails with "no main() found" | Ensure server/app.py has a def main() function and pyproject.toml has a [project.scripts] entry |
| openenv validate fails with "uv.lock not found" | Run pip install uv && uv lock |
| WebSocket connection refused | The server must be running (uvicorn server.app:app --port 8000) |
| HTTP /step returns fresh state (not continuing the episode) | HTTP endpoints are stateless; use the WebSocket /ws endpoint for multi-step episodes |
| Docker build fails on COPY | Run docker build from the project root (not from server/) |
| Docker healthcheck failing | Ensure curl is installed in the image (the Dockerfile does this) |
| inference.py error: "ENV_URL required" | export ENV_URL=http://localhost:8000 |
| Azure OpenAI 401 / auth error | Verify AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and that MODEL_NAME matches your deployment name |
| HF Space shows "Building" forever | Check the Logs tab for build errors; a common cause is files missing from a COPY |
| HF Space returns 502 | The app has not started yet (cold start) or has crashed; check the runtime logs |
| Task score is 0.0 | The resolution keywords did not match; check the grading criteria in HACKATHON_PLAN.md §5 |
| websockets not installed | pip install websockets |

Quick Reference

# ── Local development ──
pip install -e ".[dev,inference]"
uvicorn server.app:app --port 8000          # start server
python test_local.py                         # validate all 3 tasks
openenv validate                             # check spec compliance

# ── Docker ──
docker build -t sentinel-env -f server/Dockerfile .
docker run -p 8000:8000 sentinel-env

# ── Inference (HF router, hackathon default) ──
export ENV_URL=http://localhost:8000
export HF_TOKEN=hf_...
python inference.py

# ── Inference (OpenAI) ──
export ENV_URL=http://localhost:8000
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o
export API_KEY=sk-...
python inference.py

# ── Inference (Azure OpenAI) ──
export ENV_URL=http://localhost:8000
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
export AZURE_OPENAI_API_KEY=your-azure-key
export MODEL_NAME=your-deployment-name
python inference.py

# ── Deploy ──
huggingface-cli login
openenv push