# 🔥 FirewatchEnv: Quickstart Guide

> Get from zero to running your first AI SRE agent in under 5 minutes.

---

## What is FirewatchEnv?

FirewatchEnv is an **RL training environment** for autonomous SRE incident response, built for the [Meta PyTorch OpenEnv Hackathon India 2026](https://github.com/meta-pytorch/OpenEnv). Your AI agent acts as an on-call Site Reliability Engineer: it receives simulated microservice telemetry (OTel-compatible metrics, Prometheus-style alerts, log excerpts) and must **diagnose and remediate the root cause** before the SLO error budget runs out.

**Key highlights:**

- Single container, no Kubernetes: runs on 2 vCPUs / 8 GB RAM
- Three difficulty tiers (Easy → Medium → Hard) with adversarial prompt injection in Task 3
- Outcome-only reward function: the agent can't game the grader; it must actually fix the system

---
## Prerequisites

| Tool | Version | Install |
|------|---------|---------|
| **Python** | 3.10+ | [python.org](https://www.python.org/downloads/) |
| **uv** | latest | `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| **Git** | any | [git-scm.com](https://git-scm.com/) |
| **Docker** | latest *(optional: only for containerized runs)* | [docker.com](https://docs.docker.com/get-docker/) |

---
## 1. Clone & Install

```bash
git clone https://huggingface.co/spaces/10doshi12/firewatch-env
cd firewatch-env
```

> **Important:** All commands below should be run from inside the `firewatch_env/` directory, which contains the actual environment code.

```bash
cd firewatch_env
uv sync   # installs all Python dependencies from pyproject.toml + uv.lock
```

This installs:

- `openenv-core[core]` ≥ 0.2.2: FastAPI server + HTTP client types
- `pydantic` ≥ 2.0: data models
- `openai` ≥ 1.0: LLM inference via an OpenAI-compatible API
- `python-dotenv`: `.env` file loading

---
## 2. Configure Environment Variables

Copy the example and fill in your credentials:

```bash
cp .env.example .env
```

Edit `.env`:

```dotenv
# --- LLM Provider (HuggingFace Router) ---
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
HF_TOKEN=hf_your_huggingface_token_here

# --- Server URL (usually auto-detected; leave commented for local dev) ---
# SPACE_URL=https://10doshi12-firewatch-env.hf.space
```

Get your HF token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) (router access to gated models requires a **Pro** or **Enterprise** plan).

| Variable | Required | Description |
|----------|----------|-------------|
| `API_BASE_URL` | Yes | HuggingFace Router endpoint (`https://router.huggingface.co/v1`) |
| `MODEL_NAME` | Yes | Model on HF Hub (e.g. `Qwen/Qwen2.5-7B-Instruct`, `Qwen/Qwen2.5-72B-Instruct`) |
| `HF_TOKEN` | No* | HuggingFace API token. *If omitted, inference runs a deterministic rule-based fallback agent (no LLM calls).* |
| `SPACE_URL` | No | Override server URL. Auto-detected in order: `localhost:8000` → `localhost:7860` → HF Space |
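The resolution behavior described in the table can be sketched in a few lines. This is an illustrative snippet, not the project's actual loading code; the default values shown are assumptions taken from the table, and in the real project `python-dotenv`'s `load_dotenv()` populates `os.environ` from `.env` first.

```python
import os

# Illustrative sketch of the variable resolution described above.
# Defaults here are assumptions; the project reads .env via python-dotenv.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
HF_TOKEN = os.environ.get("HF_TOKEN")  # optional

# Without a token, the deterministic rule-based fallback agent runs instead of an LLM.
use_llm = HF_TOKEN is not None
```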
---
## 3. Start the Server

```bash
uv run server
```

The FastAPI server starts on **http://localhost:8000** with these endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/reset` | POST | Reset environment. Body: `{"difficulty": "easy", "seed": 42}` |
| `/step` | POST | Execute action. Body: `{"action": {"action_type": "fetch_logs", "target_service": "auth-service"}}` |
| `/state` | GET | Get current environment state |
| `/schema` | GET | Action / observation JSON schemas |
| `/ws` | WS | WebSocket for persistent sessions |
### Quick smoke test (new terminal)

```bash
# Reset an easy episode
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "easy", "seed": 42}'

# Take an action
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "fetch_logs", "target_service": "cache"}}'

# Check current state
curl http://localhost:8000/state
```
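The same smoke test can be driven from Python using only the standard library. This sketch builds the JSON payloads from the curl examples; the actual HTTP calls are left commented out since they need the server running. The `post` helper is illustrative, not part of the project's client.

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # local dev server started with `uv run server`

def post(path, payload):
    """POST a JSON payload to the FirewatchEnv server and return the parsed reply."""
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Payloads mirror the curl smoke test above.
reset_body = {"difficulty": "easy", "seed": 42}
step_body = {"action": {"action_type": "fetch_logs", "target_service": "cache"}}

# With the server running:
# obs = post("/reset", reset_body)
# obs = post("/step", step_body)
```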
---
## 4. Run the Inference Agent

With the server running in one terminal, open a **second terminal**:

```bash
cd firewatch_env
python inference.py
```

This runs your agent across all three tasks sequentially:

| Task | Difficulty | Services | Red Herrings | Max Ticks | Seed |
|------|-----------|----------|-------------|-----------|------|
| `task_easy` | Easy | 3 | 0 | 20 | 42 |
| `task_medium` | Medium | 5 | 1 | 30 | 137 |
| `task_hard` | Hard | 7 | 3 (1 adversarial) | 40 | 256 |
### Expected Output

```
[START] task=task_easy env=firewatch-env model=x-ai/grok-4.1-fast
[STEP] step=1 action=fetch_logs:cache reward=-0.14 done=false error=null
[STEP] step=2 action=rollback_deploy:cache reward=-0.14 done=false error=null
...
[END] success=true steps=4 score=0.96 rewards=-0.14,-0.14,-0.14,1.86
```

Each `[STEP]` line shows the action taken, the intermediate reward, and whether the episode ended. The `[END]` line reports the final graded score (0.0–1.0).
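If you want to collect scores across runs, the `[END]` line is easy to parse. A small sketch, with the line format assumed from the sample output above:

```python
import re

# Matches the final summary line, e.g.:
# [END] success=true steps=4 score=0.96 rewards=-0.14,-0.14,-0.14,1.86
END_RE = re.compile(r"\[END\] success=(true|false) steps=(\d+) score=([\d.]+)")

def parse_end(line: str):
    """Return {'success', 'steps', 'score'} from an [END] line, or None otherwise."""
    m = END_RE.match(line)
    if m is None:
        return None
    return {
        "success": m.group(1) == "true",
        "steps": int(m.group(2)),
        "score": float(m.group(3)),
    }
```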
---
## 5. Docker (Alternative)

Build and run the environment as a Docker container:

```bash
# From the firewatch_env/ directory
docker build -t firewatch-env ./server
docker run -p 7860:7860 firewatch-env
```

The server will be available at **http://localhost:7860**. Set `SPACE_URL=http://localhost:7860` when running `inference.py` (or let auto-detection find it).

---
## 6. Deploy to HuggingFace Spaces

```bash
openenv validate   # must pass with zero errors
openenv push --repo-id 10doshi12/firewatch-env
```

Your environment will be live at `https://10doshi12-firewatch-env.hf.space`.

---
## Project Structure

```
firewatch_env/
├── models.py          # Pydantic models (FirewatchAction, SystemObservation, etc.)
├── simulation.py      # ServiceMesh + generate_episode() + fault physics
├── actions.py         # ActionHandler: all 17 action types
├── rewards.py         # RewardEngine + grade() + EpisodeResult
├── config.py          # Constants, TASKS dict, topology (pure data)
├── client.py          # OpenEnv-generated WebSocket client
├── inference.py       # LLM agent loop (stdout eval format)
├── openenv.yaml       # OpenEnv spec definition
├── .env.example       # Environment variable template
├── Dockerfile         # Multi-stage Docker build
├── pyproject.toml     # Dependencies & entry points
├── server/
│   ├── app.py                          # FastAPI application (entry point)
│   └── firewatch_env_environment.py    # Environment wiring
└── tests/
    ├── test_integration.py
    ├── test_simulation.py
    └── test_inference.py
```

---
## Action Space Reference

### Investigation Actions (read-only)

| Action | Description |
|--------|-------------|
| `fetch_logs` | Populates `recent_logs` on the target service |
| `get_metrics_detail` | Returns a 3-tick metric trend summary |
| `trace_dependencies` | Returns the full upstream/downstream dependency chain |
| `strace_process` | System-call-level process inspection |
| `profiler_dump` | CPU/memory profiler output |
| `check_gc_pressure` | GC pause times and heap pressure |
| `trace_distributed_request` | End-to-end distributed trace |
| `inspect_thread_pool` | Thread pool utilization and deadlock detection |
| `inspect_commit_diff` | Recent deployment diff |

### Remediation Actions (mutate state)

| Action | Description |
|--------|-------------|
| `restart_service` | Resets OOM state; wrong if `error_rate < 0.10` |
| `rollback_deploy` | Halts a bad deployment's progression |
| `revert_config` | Restores connection pool / config settings |
| `scale_replicas` | Increases memory headroom |
| `circuit_break` | Suppresses cascade for 3 ticks |
| `traffic_shift` | Redirects traffic away from a degraded service |

### Meta Actions

| Action | Description |
|--------|-------------|
| `declare_resolved` | Terminates the episode and triggers grading |
| `escalate` | Records an escalation (no state change) |

---
## Fault Types

| Fault | Signal in Logs | Correct Remediation |
|-------|---------------|---------------------|
| `oom` | OOMKilled, exit code 137 | `restart_service` |
| `bad_deploy` | Error spike post-deployment SHA | `rollback_deploy` |
| `config_drift` | HikariCP pool exhaustion, 30s timeouts | `revert_config` |
| `network_partition` | Connection refused, circuit breaker OPEN | `circuit_break` or `restart_service` |
| `memory_leak` | Gradual latency increase, slow memory growth | `scale_replicas` → `restart_service` |
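The table above is essentially a lookup from diagnosed fault to remediation, which a baseline agent could encode directly. A minimal sketch, transcribed from the table (where a row lists alternatives, one primary choice is picked here; `remediation_plan` and the `escalate` fallback are illustrative, not project code):

```python
# Remediation actions per fault type, transcribed from the table above.
# network_partition also accepts restart_service; memory_leak is a two-step
# fix (scale_replicas for headroom, then restart_service).
REMEDIATION = {
    "oom": ["restart_service"],
    "bad_deploy": ["rollback_deploy"],
    "config_drift": ["revert_config"],
    "network_partition": ["circuit_break"],
    "memory_leak": ["scale_replicas", "restart_service"],
}

def remediation_plan(fault_type: str) -> list:
    """Return the ordered remediation actions for a diagnosed fault."""
    return REMEDIATION.get(fault_type, ["escalate"])  # unknown fault: escalate
```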
---
## Scoring

The grader produces a score between **0.0 and 1.0** based on four weighted components:

| Component | Weight | What it Measures |
|-----------|--------|-----------------|
| Recovery | 40% | Did system health improve? |
| Speed | 25% | How quickly was the incident mitigated (MTTM)? |
| Precision | 20% | Were wrong actions avoided? |
| SLO | 15% | How much error budget remained? |
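Given these weights, the final grade is presumably a weighted sum of the four component scores. A sketch of that arithmetic only; the component names are assumed from the table, and the real logic lives in `grade()` in `rewards.py`:

```python
# Weights from the scoring table above.
WEIGHTS = {"recovery": 0.40, "speed": 0.25, "precision": 0.20, "slo": 0.15}

def weighted_score(components: dict) -> float:
    """Combine per-component scores in [0, 1] into the final 0.0-1.0 grade."""
    return sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)

# The weights sum to 1, so a perfect run on all four components scores 1.0.
perfect = weighted_score({"recovery": 1.0, "speed": 1.0, "precision": 1.0, "slo": 1.0})
```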
---
## Running Tests

```bash
cd firewatch_env
uv run pytest tests/                         # all tests
uv run pytest tests/test_integration.py      # integration only
uv run pytest tests/test_simulation.py       # simulation logic
uv run pytest tests/test_integration.py::test_reset_deterministic   # single test
```

---
## Troubleshooting

| Problem | Solution |
|---------|----------|
| `uv: command not found` | Install uv: `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| `openenv-core` import error | Run `uv sync` inside `firewatch_env/` |
| Server won't start | Check that port 8000 isn't in use: `lsof -i :8000` |
| `inference.py` can't find the server | Auto-detection probes `localhost:8000` → `localhost:7860`. Ensure the server is running. |
| LLM API errors / 401 | Verify `HF_TOKEN` in `.env`. Without it, the rule-based fallback agent runs (no LLM). |
| Score is 0.0 | The agent didn't call `declare_resolved`, or the SLO budget hit 0%. Check the action logs. |
| Docker build fails | Ensure Docker Desktop is running. Build from `firewatch_env/`: `docker build -t fw ./server` |

---
## Next Steps

- **Swap the model**: change `MODEL_NAME` in `.env` to test different HF-hosted models (e.g. `Qwen/Qwen2.5-72B-Instruct`, `meta-llama/Llama-3.3-70B-Instruct`)
- **Tune the agent**: edit `SYSTEM_PROMPT` and `_recovery_hint()` in `inference.py` to improve decision-making
- **Add actions**: extend `actions.py` with new diagnostic or remediation actions
- **Custom tasks**: define new scenarios in `config.py` and `openenv.yaml`
- **Benchmark**: compare scores across models to find the best SRE agent

---

*FirewatchEnv · Meta PyTorch OpenEnv Hackathon India 2026*