firewatch-env / quickstart.md

🔥 FirewatchEnv: Quickstart Guide

Get from zero to running your first AI SRE agent in under 5 minutes.


What is FirewatchEnv?

FirewatchEnv is an RL training environment for autonomous SRE incident response, built for the Meta PyTorch OpenEnv Hackathon India 2026. Your AI agent acts as an on-call Site Reliability Engineer: it receives simulated microservice telemetry (OTel-compatible metrics, Prometheus-style alerts, log excerpts) and must diagnose and remediate the root cause before the SLO error budget runs out.

Key highlights:

  • Single container, no Kubernetes β€” runs on 2 vCPUs / 8 GB RAM
  • Three difficulty tiers (Easy β†’ Medium β†’ Hard) with adversarial prompt injection in Task 3
  • Outcome-only reward function β€” the agent can't game the grader; it must actually fix the system

Prerequisites

| Tool | Version | Install |
|------|---------|---------|
| Python | 3.10+ | python.org |
| uv | latest | `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Git | any | git-scm.com |
| Docker | latest (optional; only for containerized runs) | docker.com |

1 β€” Clone & Install

git clone https://huggingface.co/spaces/10doshi12/firewatch-env
cd firewatch-env

Important: All commands below should be run from inside the firewatch_env/ directory, which contains the actual environment code.

cd firewatch_env
uv sync            # installs all Python dependencies from pyproject.toml + uv.lock

This installs:

  • openenv-core[core] β‰₯ 0.2.2 β€” FastAPI server + HTTP client types
  • pydantic β‰₯ 2.0 β€” data models
  • openai β‰₯ 1.0 β€” LLM inference via OpenAI-compatible API
  • python-dotenv β€” .env file loading

2 β€” Configure Environment Variables

Copy the example and fill in your credentials:

cp .env.example .env

Edit .env:

# --- LLM Provider (HuggingFace Router) ---
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
HF_TOKEN=hf_your_huggingface_token_here

# --- Server URL (usually auto-detected; leave commented for local dev) ---
# SPACE_URL=https://10doshi12-firewatch-env.hf.space

Get your HF token from huggingface.co/settings/tokens (requires a Pro or Enterprise plan for router access to gated models).

| Variable | Required | Description |
|----------|----------|-------------|
| `API_BASE_URL` | Yes | HuggingFace Router endpoint (`https://router.huggingface.co/v1`) |
| `MODEL_NAME` | Yes | Model on HF Hub (e.g. `Qwen/Qwen2.5-7B-Instruct`, `Qwen/Qwen2.5-72B-Instruct`) |
| `HF_TOKEN` | No | HuggingFace API token. If omitted, inference runs a deterministic rule-based fallback agent (no LLM calls). |
| `SPACE_URL` | No | Override server URL. Auto-detected in order: `localhost:8000` → `localhost:7860` → HF Space |
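For reference, here is a minimal sketch of how these variables could drive the agent's behaviour. The helper name `load_settings` is hypothetical, not a repo API; the real loading happens in `inference.py` via python-dotenv.

```python
import os

def load_settings(env=None) -> dict:
    """Resolve client settings with the defaults documented above.

    Hypothetical helper for illustration; the repo's actual loading
    lives in inference.py and uses python-dotenv.
    """
    env = dict(os.environ) if env is None else env
    return {
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct"),
        # No HF_TOKEN means inference falls back to the rule-based agent
        "use_llm": bool(env.get("HF_TOKEN")),
    }
```

The key behaviour to note is the last field: leaving `HF_TOKEN` unset is a supported mode, not an error.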

3 β€” Start the Server

uv run server

The FastAPI server starts on http://localhost:8000 with these endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/reset` | POST | Reset environment: `{"difficulty": "easy", "seed": 42}` |
| `/step` | POST | Execute action: `{"action": {"action_type": "fetch_logs", "target_service": "auth-service"}}` |
| `/state` | GET | Get current environment state |
| `/schema` | GET | Action / observation JSON schemas |
| `/ws` | WS | WebSocket for persistent sessions |

Quick smoke test (new terminal):

# Reset an easy episode
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "easy", "seed": 42}'

# Take an action
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "fetch_logs", "target_service": "cache"}}'

# Check current state
curl http://localhost:8000/state
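The same smoke test can be scripted with Python's standard library only. This is a convenience sketch for poking the HTTP endpoints; the repo's own `client.py` speaks WebSocket instead.

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Construct a JSON POST request for a FirewatchEnv endpoint."""
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def post(path: str, payload: dict) -> dict:
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(post("/reset", {"difficulty": "easy", "seed": 42}))
    print(post("/step", {"action": {"action_type": "fetch_logs",
                                    "target_service": "cache"}}))
```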

4 β€” Run the Inference Agent

With the server running in one terminal, open a second terminal:

cd firewatch_env
python inference.py

This runs your agent across all three tasks sequentially:

| Task | Difficulty | Services | Red Herrings | Max Ticks | Seed |
|------|------------|----------|--------------|-----------|------|
| `task_easy` | Easy | 3 | 0 | 20 | 42 |
| `task_medium` | Medium | 5 | 1 | 30 | 137 |
| `task_hard` | Hard | 7 | 3 (1 adversarial) | 40 | 256 |
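These rows come from the `TASKS` dict in `config.py`. As a rough sketch of that pure-data mapping (the field names here are guesses, not the repo's actual schema):

```python
# Illustrative only: field names are assumptions about config.py's TASKS dict.
TASKS = {
    "task_easy":   {"difficulty": "easy",   "services": 3, "red_herrings": 0, "max_ticks": 20, "seed": 42},
    "task_medium": {"difficulty": "medium", "services": 5, "red_herrings": 1, "max_ticks": 30, "seed": 137},
    "task_hard":   {"difficulty": "hard",   "services": 7, "red_herrings": 3, "max_ticks": 40, "seed": 256},
}
```

Fixed seeds are what make each task's episode reproducible across runs.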

Expected Output

[START] task=task_easy env=firewatch-env model=Qwen/Qwen2.5-7B-Instruct
[STEP] step=1 action=fetch_logs:cache reward=-0.14 done=false error=null
[STEP] step=2 action=rollback_deploy:cache reward=-0.14 done=false error=null
...
[END] success=true steps=4 score=0.96 rewards=-0.14,-0.14,-0.14,1.86

Each [STEP] line shows the action taken, the intermediate reward, and whether the episode ended. The [END] line reports the final graded score (0.0-1.0).
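If you want to post-process these logs (e.g. when benchmarking several models), the [END] line is easy to parse. A small helper sketch, not part of the repo:

```python
import re

def parse_end_line(line: str) -> dict:
    """Extract success, step count, and score from an [END] log line."""
    m = re.search(r"\[END\] success=(\w+) steps=(\d+) score=([\d.]+)", line)
    if m is None:
        raise ValueError(f"not an [END] line: {line!r}")
    return {
        "success": m.group(1) == "true",
        "steps": int(m.group(2)),
        "score": float(m.group(3)),
    }

print(parse_end_line("[END] success=true steps=4 score=0.96 rewards=-0.14,-0.14,-0.14,1.86"))
# {'success': True, 'steps': 4, 'score': 0.96}
```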


5 β€” Docker (Alternative)

Build and run the environment as a Docker container:

# From the firewatch_env/ directory
docker build -t firewatch-env ./server
docker run -p 7860:7860 firewatch-env

The server will be available at http://localhost:7860. Set SPACE_URL=http://localhost:7860 when running inference.py (or let auto-detection find it).


6 β€” Deploy to HuggingFace Spaces

openenv validate          # must pass with zero errors
openenv push --repo-id 10doshi12/firewatch-env

Your environment will be live at https://10doshi12-firewatch-env.hf.space.


Project Structure

firewatch_env/
β”œβ”€β”€ models.py              # Pydantic models (FirewatchAction, SystemObservation, etc.)
β”œβ”€β”€ simulation.py          # ServiceMesh + generate_episode() + fault physics
β”œβ”€β”€ actions.py             # ActionHandler β€” all 17 action types
β”œβ”€β”€ rewards.py             # RewardEngine + grade() + EpisodeResult
β”œβ”€β”€ config.py              # Constants, TASKS dict, topology (pure data)
β”œβ”€β”€ client.py              # OpenEnv-generated WebSocket client
β”œβ”€β”€ inference.py           # LLM agent loop (stdout eval format)
β”œβ”€β”€ openenv.yaml           # OpenEnv spec definition
β”œβ”€β”€ .env.example           # Environment variable template
β”œβ”€β”€ Dockerfile             # Multi-stage Docker build
β”œβ”€β”€ pyproject.toml         # Dependencies & entry points
β”œβ”€β”€ server/
β”‚   β”œβ”€β”€ app.py             # FastAPI application (entry point)
β”‚   └── firewatch_env_environment.py  # Environment wiring
└── tests/
    β”œβ”€β”€ test_integration.py
    β”œβ”€β”€ test_simulation.py
    └── test_inference.py

Action Space Reference

Investigation Actions (read-only)

| Action | Description |
|--------|-------------|
| `fetch_logs` | Populates `recent_logs` on the target service |
| `get_metrics_detail` | Returns a 3-tick metric trend summary |
| `trace_dependencies` | Returns the full upstream/downstream dependency chain |
| `strace_process` | System-call-level process inspection |
| `profiler_dump` | CPU/memory profiler output |
| `check_gc_pressure` | GC pause times and heap pressure |
| `trace_distributed_request` | End-to-end distributed trace |
| `inspect_thread_pool` | Thread pool utilization and deadlock detection |
| `inspect_commit_diff` | Recent deployment diff |

Remediation Actions (mutate state)

| Action | Description |
|--------|-------------|
| `restart_service` | Resets OOM state; wrong if `error_rate < 0.10` |
| `rollback_deploy` | Halts bad deployment progression |
| `revert_config` | Restores connection pool / config settings |
| `scale_replicas` | Increases memory headroom |
| `circuit_break` | Suppresses cascade for 3 ticks |
| `traffic_shift` | Redirects traffic away from a degraded service |

Meta Actions

| Action | Description |
|--------|-------------|
| `declare_resolved` | Terminates the episode and triggers grading |
| `escalate` | Records an escalation (no state change) |
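Putting the categories together, `/step` payloads look like this. The action names come from the tables above; whether `declare_resolved` takes a `target_service` is an assumption here, so confirm the exact shape via `GET /schema`.

```python
# Example /step payloads, one per action category (names from the tables above).
investigate = {"action": {"action_type": "trace_dependencies",
                          "target_service": "auth-service"}}
remediate = {"action": {"action_type": "rollback_deploy",
                        "target_service": "cache"}}
# Assumed to take no target; confirm via GET /schema.
resolve = {"action": {"action_type": "declare_resolved"}}
```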

Fault Types

| Fault | Signal in Logs | Correct Remediation |
|-------|----------------|---------------------|
| `oom` | `OOMKilled`, exit code 137 | `restart_service` |
| `bad_deploy` | Error spike after a deployment SHA | `rollback_deploy` |
| `config_drift` | HikariCP pool exhaustion, 30s timeouts | `revert_config` |
| `network_partition` | Connection refused, circuit breaker OPEN | `circuit_break` or `restart_service` |
| `memory_leak` | Gradual latency increase, slow memory growth | `scale_replicas` then `restart_service` |
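This table maps directly onto the kind of lookup a rule-based agent could use, roughly in the spirit of the no-token fallback mode. A sketch only; the real fallback logic lives in `inference.py`.

```python
# First-choice remediation per fault type, straight from the table above.
REMEDIATION = {
    "oom": "restart_service",
    "bad_deploy": "rollback_deploy",
    "config_drift": "revert_config",
    "network_partition": "circuit_break",  # restart_service also works
    "memory_leak": "scale_replicas",       # follow up with restart_service
}

def pick_remediation(fault_type: str) -> str:
    """Return the first-choice fix, or escalate when the fault is unrecognized."""
    return REMEDIATION.get(fault_type, "escalate")
```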

Scoring

The grader produces a score between 0.0 and 1.0 based on four components:

| Component | Weight | What it Measures |
|-----------|--------|------------------|
| Recovery | 40% | Did system health improve? |
| Speed | 25% | How quickly was mitigation achieved (MTTM)? |
| Precision | 20% | Were wrong actions avoided? |
| SLO | 15% | How much error budget remained? |
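As a sanity check on the weights, the final grade is presumably a weighted sum of the four components. A hypothetical illustration (the component names and the `grade` helper are assumptions, not the repo's `RewardEngine` API):

```python
# Weights from the scoring table above; names are illustrative.
WEIGHTS = {"recovery": 0.40, "speed": 0.25, "precision": 0.20, "slo": 0.15}

def grade(components: dict) -> float:
    """Combine per-component scores in [0, 1] into a final 0.0-1.0 grade."""
    return round(sum(w * components[name] for name, w in WEIGHTS.items()), 2)

print(grade({"recovery": 1.0, "speed": 1.0, "precision": 1.0, "slo": 1.0}))
# 1.0
```

A perfect episode scores 1.0; a partially precise but fully recovered one lands somewhere below.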

Running Tests

cd firewatch_env
uv run pytest tests/                                  # all tests
uv run pytest tests/test_integration.py               # integration only
uv run pytest tests/test_simulation.py                # simulation logic
uv run pytest tests/test_integration.py::test_reset_deterministic  # single test

Troubleshooting

| Problem | Solution |
|---------|----------|
| `uv: command not found` | Install uv: `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| `openenv-core` import error | Run `uv sync` inside `firewatch_env/` |
| Server won't start | Check port 8000 isn't in use: `lsof -i :8000` |
| `inference.py` can't find the server | Auto-detection probes `localhost:8000`, then `localhost:7860`. Ensure the server is running. |
| LLM API errors / 401 | Verify `HF_TOKEN` in `.env`. Without it, the rule-based fallback agent runs (no LLM). |
| Score is 0.0 | The agent didn't call `declare_resolved`, or the SLO budget hit 0%. Check the action logs. |
| Docker build fails | Ensure Docker Desktop is running. Build from `firewatch_env/`: `docker build -t fw ./server` |

Next Steps

  • Swap the model: Change MODEL_NAME in .env to test different HF-hosted models (e.g. Qwen/Qwen2.5-72B-Instruct, meta-llama/Llama-3.3-70B-Instruct)
  • Tune the agent: Edit SYSTEM_PROMPT and _recovery_hint() in inference.py to improve decision-making
  • Add actions: Extend actions.py with new diagnostic or remediation actions
  • Custom tasks: Define new scenarios in config.py and openenv.yaml
  • Benchmark: Compare scores across models to find the best SRE agent

FirewatchEnv - Meta PyTorch OpenEnv Hackathon India 2026