🔥 FirewatchEnv Quickstart Guide
Get from zero to running your first AI SRE agent in under 5 minutes.
What is FirewatchEnv?
FirewatchEnv is an RL training environment for autonomous SRE incident response, built for the Meta PyTorch OpenEnv Hackathon India 2026. Your AI agent acts as an on-call Site Reliability Engineer: it receives simulated microservice telemetry (OTel-compatible metrics, Prometheus-style alerts, log excerpts) and must diagnose and remediate the root cause before the SLO error budget runs out.
Key highlights:
- Single container, no Kubernetes: runs on 2 vCPUs / 8 GB RAM
- Three difficulty tiers (Easy → Medium → Hard), with adversarial prompt injection in Task 3
- Outcome-only reward function: the agent can't game the grader; it must actually fix the system
Prerequisites
| Tool | Version | Install |
|---|---|---|
| Python | 3.10+ | python.org |
| uv | latest | `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Git | any | git-scm.com |
| Docker | latest (optional β only for containerized runs) | docker.com |
1. Clone & Install
git clone https://huggingface.co/spaces/10doshi12/firewatch-env
cd firewatch-env
Important: All commands below should be run from inside the
firewatch_env/ directory, which contains the actual environment code.
cd firewatch_env
uv sync # installs all Python dependencies from pyproject.toml + uv.lock
This installs:
- `openenv-core[core] >= 0.2.2`: FastAPI server + HTTP client types
- `pydantic >= 2.0`: data models
- `openai >= 1.0`: LLM inference via an OpenAI-compatible API
- `python-dotenv`: `.env` file loading
2. Configure Environment Variables
Copy the example and fill in your credentials:
cp .env.example .env
Edit .env:
# --- LLM Provider (HuggingFace Router) ---
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
HF_TOKEN=hf_your_huggingface_token_here
# --- Server URL (usually auto-detected; leave commented for local dev) ---
# SPACE_URL=https://10doshi12-firewatch-env.hf.space
Get your HF token from huggingface.co/settings/tokens (requires a Pro or Enterprise plan for router access to gated models).
| Variable | Required | Description |
|---|---|---|
| `API_BASE_URL` | Yes | HuggingFace Router endpoint (`https://router.huggingface.co/v1`) |
| `MODEL_NAME` | Yes | Model on HF Hub (e.g. `Qwen/Qwen2.5-7B-Instruct`, `Qwen/Qwen2.5-72B-Instruct`) |
| `HF_TOKEN` | No* | HuggingFace API token. If omitted, inference runs a deterministic rule-based fallback agent (no LLM calls). |
| `SPACE_URL` | No | Override server URL. Auto-detected in order: `localhost:8000` → `localhost:7860` → HF Space |
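The `HF_TOKEN` fallback rule can be sketched as a tiny helper. This is illustrative only (the helper name is hypothetical; the real `inference.py` may structure the check differently):

```python
import os

def select_agent_mode(env=None):
    """Return which agent loop runs, mirroring the HF_TOKEN rule above.

    Hypothetical helper: with a token set we call the LLM via the router;
    without one, the deterministic rule-based fallback agent runs.
    """
    env = os.environ if env is None else env
    return "llm" if env.get("HF_TOKEN") else "rule-based"
```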
3. Start the Server
uv run server
The FastAPI server starts on http://localhost:8000 with these endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/reset` | POST | Reset environment: `{"difficulty": "easy", "seed": 42}` |
| `/step` | POST | Execute action: `{"action": {"action_type": "fetch_logs", "target_service": "auth-service"}}` |
| `/state` | GET | Get current environment state |
| `/schema` | GET | Action / observation JSON schemas |
| `/ws` | WS | WebSocket for persistent sessions |
Quick smoke test (new terminal):
# Reset an easy episode
curl -X POST http://localhost:8000/reset \
-H "Content-Type: application/json" \
-d '{"difficulty": "easy", "seed": 42}'
# Take an action
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"action_type": "fetch_logs", "target_service": "cache"}}'
# Check current state
curl http://localhost:8000/state
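The same smoke test can be driven from Python with only the standard library. A sketch, assuming the server is running on `localhost:8000`; the helper names here are illustrative, not part of the project's client API:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def reset_payload(difficulty="easy", seed=42):
    # Body shape matches the /reset example above
    return {"difficulty": difficulty, "seed": seed}

def step_payload(action_type, target_service):
    # Body shape matches the /step example above
    return {"action": {"action_type": action_type, "target_service": target_service}}

def post(path, payload):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(post("/reset", reset_payload("easy", 42)))
    print(post("/step", step_payload("fetch_logs", "cache")))
```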
4. Run the Inference Agent
With the server running in one terminal, open a second terminal:
cd firewatch_env
python inference.py
This runs your agent across all three tasks sequentially:
| Task | Difficulty | Services | Red Herrings | Max Ticks | Seed |
|---|---|---|---|---|---|
| `task_easy` | Easy | 3 | 0 | 20 | 42 |
| `task_medium` | Medium | 5 | 1 | 30 | 137 |
| `task_hard` | Hard | 7 | 3 (1 adversarial) | 40 | 256 |
Expected Output
[START] task=task_easy env=firewatch-env model=Qwen/Qwen2.5-7B-Instruct
[STEP] step=1 action=fetch_logs:cache reward=-0.14 done=false error=null
[STEP] step=2 action=rollback_deploy:cache reward=-0.14 done=false error=null
...
[END] success=true steps=4 score=0.96 rewards=-0.14,-0.14,-0.14,1.86
Each [STEP] line shows the action taken, intermediate reward, and whether the episode ended. The [END] line reports the final graded score (0.0–1.0).
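If you want to post-process an evaluation run, the [STEP] lines are straightforward to parse. A minimal sketch; the field set is taken from the sample output above, not from a documented log schema:

```python
def parse_step_line(line):
    """Parse a '[STEP] key=value ...' line from inference.py stdout."""
    _, _, rest = line.partition("] ")
    # Each token after the tag is a key=value pair
    fields = dict(kv.split("=", 1) for kv in rest.split())
    return {
        "step": int(fields["step"]),
        "action": fields["action"],
        "reward": float(fields["reward"]),
        "done": fields["done"] == "true",
        "error": None if fields["error"] == "null" else fields["error"],
    }
```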
5. Docker (Alternative)
Build and run the environment as a Docker container:
# From the firewatch_env/ directory
docker build -t firewatch-env ./server
docker run -p 7860:7860 firewatch-env
The server will be available at http://localhost:7860. Set SPACE_URL=http://localhost:7860 when running inference.py (or let auto-detection find it).
6. Deploy to HuggingFace Spaces
openenv validate # must pass with zero errors
openenv push --repo-id 10doshi12/firewatch-env
Your environment will be live at https://10doshi12-firewatch-env.hf.space.
Project Structure
firewatch_env/
├── models.py        # Pydantic models (FirewatchAction, SystemObservation, etc.)
├── simulation.py    # ServiceMesh + generate_episode() + fault physics
├── actions.py       # ActionHandler: all 17 action types
├── rewards.py       # RewardEngine + grade() + EpisodeResult
├── config.py        # Constants, TASKS dict, topology (pure data)
├── client.py        # OpenEnv-generated WebSocket client
├── inference.py     # LLM agent loop (stdout eval format)
├── openenv.yaml     # OpenEnv spec definition
├── .env.example     # Environment variable template
├── Dockerfile       # Multi-stage Docker build
├── pyproject.toml   # Dependencies & entry points
├── server/
│   ├── app.py       # FastAPI application (entry point)
│   └── firewatch_env_environment.py  # Environment wiring
└── tests/
    ├── test_integration.py
    ├── test_simulation.py
    └── test_inference.py
Action Space Reference
Investigation Actions (read-only)
| Action | Description |
|---|---|
| `fetch_logs` | Populates recent_logs on the target service |
| `get_metrics_detail` | Returns 3-tick metric trend summary |
| `trace_dependencies` | Returns full upstream/downstream dependency chain |
| `strace_process` | System-call level process inspection |
| `profiler_dump` | CPU/memory profiler output |
| `check_gc_pressure` | GC pause times and heap pressure |
| `trace_distributed_request` | End-to-end distributed trace |
| `inspect_thread_pool` | Thread pool utilization and deadlock detection |
| `inspect_commit_diff` | Recent deployment diff |
Remediation Actions (mutate state)
| Action | Description |
|---|---|
| `restart_service` | Resets OOM state; wrong if error_rate < 0.10 |
| `rollback_deploy` | Halts bad deployment progression |
| `revert_config` | Restores connection pool / config settings |
| `scale_replicas` | Increases memory headroom |
| `circuit_break` | Suppresses cascade for 3 ticks |
| `traffic_shift` | Redirects traffic away from degraded service |
Meta Actions
| Action | Description |
|---|---|
| `declare_resolved` | Terminates episode and triggers grading |
| `escalate` | Records escalation (no state change) |
Fault Types
| Fault | Signal in Logs | Correct Remediation |
|---|---|---|
| `oom` | OOMKilled, exit code 137 | `restart_service` |
| `bad_deploy` | Error spike post-deployment SHA | `rollback_deploy` |
| `config_drift` | HikariCP pool exhaustion, 30s timeouts | `revert_config` |
| `network_partition` | Connection refused, circuit breaker OPEN | `circuit_break` or `restart_service` |
| `memory_leak` | Gradual latency increase, slow memory growth | `scale_replicas`, then `restart_service` |
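The fault table amounts to a small fault-to-remediation playbook. A hypothetical sketch for scripting against episode transcripts; the authoritative logic lives in `rewards.py` and `actions.py`:

```python
# Mapping taken from the fault table above; the names are the env's
# action_type strings. memory_leak lists its two-step sequence in order.
FAULT_PLAYBOOK = {
    "oom": ["restart_service"],
    "bad_deploy": ["rollback_deploy"],
    "config_drift": ["revert_config"],
    "network_partition": ["circuit_break", "restart_service"],
    "memory_leak": ["scale_replicas", "restart_service"],
}

def is_correct_remediation(fault, action):
    """True if `action` is an accepted remediation for `fault`."""
    return action in FAULT_PLAYBOOK.get(fault, [])
```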
Scoring
The grader produces a score between 0.0 and 1.0 based on four components:
| Component | Weight | What it Measures |
|---|---|---|
| Recovery | 40% | Did system health improve? |
| Speed | 25% | How quickly was MTTM achieved? |
| Precision | 20% | Were wrong actions avoided? |
| SLO | 15% | How much error budget remained? |
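The weighting above can be sketched as a plain weighted sum. Illustrative only; the real `grade()` in `rewards.py` may clamp or shape the components differently before combining them:

```python
# Weights copied from the scoring table above
WEIGHTS = {"recovery": 0.40, "speed": 0.25, "precision": 0.20, "slo": 0.15}

def combined_score(components):
    """Weighted sum of per-component scores, each assumed in [0, 1]."""
    return round(sum(WEIGHTS[k] * components[k] for k in WEIGHTS), 4)
```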
Running Tests
cd firewatch_env
uv run pytest tests/ # all tests
uv run pytest tests/test_integration.py # integration only
uv run pytest tests/test_simulation.py # simulation logic
uv run pytest tests/test_integration.py::test_reset_deterministic # single test
Troubleshooting
| Problem | Solution |
|---|---|
| `uv: command not found` | Install uv: `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| `openenv-core` import error | Run `uv sync` inside `firewatch_env/` |
| Server won't start | Check port 8000 isn't in use: `lsof -i :8000` |
| `inference.py` can't find server | Server auto-detection probes `localhost:8000` → `localhost:7860`. Ensure the server is running. |
| LLM API errors / 401 | Verify `HF_TOKEN` in `.env`. Without it, the rule-based fallback agent runs (no LLM). |
| Score is 0.0 | Agent didn't call `declare_resolved`, or the SLO budget hit 0%. Check the action logs. |
| Docker build fails | Ensure Docker Desktop is running. Build from `firewatch_env/`: `docker build -t fw ./server` |
Next Steps
- Swap the model: change `MODEL_NAME` in `.env` to test different HF-hosted models (e.g. `Qwen/Qwen2.5-72B-Instruct`, `meta-llama/Llama-3.3-70B-Instruct`)
- Tune the agent: edit `SYSTEM_PROMPT` and `_recovery_hint()` in `inference.py` to improve decision-making
- Add actions: extend `actions.py` with new diagnostic or remediation actions
- Custom tasks: define new scenarios in `config.py` and `openenv.yaml`
- Benchmark: compare scores across models to find the best SRE agent
FirewatchEnv · Meta PyTorch OpenEnv Hackathon India 2026