File size: 5,258 Bytes
443c22e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---

title: Pyre  Crisis Navigation Environment
emoji: 🔥
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
---


# Pyre — Crisis Navigation Environment for LLM Agents

> *When buildings burn, the difference between a safe evacuation and a tragedy is the quality of decisions made in the first 60 seconds. Can we train an LLM to make them?*

**Pyre** places an LLM agent *inside* a burning building. The agent must navigate to safety under partial observability — no global map, hard time pressure, and a fire that actively spreads and blocks exits.

---

## Why Pyre vs. existing environments

| Feature | `grid_world` | `maze_env` | `wildfire_env` | **Pyre** |
|---|---|---|---|---|
| Observability | Full | Full | Partial | **Partial, first-person, text** |
| Map dynamics | Static | Static | Dynamic (fire) | **Dynamic (fire + doors)** |
| Action richness | 4 moves | 4 moves | Suppression | **Movement + door control + look** |
| Agent role | Mover | Mover | Suppressor | **Survivor** |
| Reward complexity | Reach goal | Reach goal | Suppress fire | **8-component composite rubric** |

*`wildfire_env` trains an agent to fight fires from above; Pyre trains an agent to survive from inside.*

---

## What the agent sees (narrative observation)

Every step the agent receives a first-person text observation:

```

You are in the **main_corridor**. The air is **moderate**.

Health: ████████░░ (85/100) | Wind: **EAST**

Flames are visible to the **west**.

Exits visible: exit_0_7 at 8m west.

Doors: door_1 (closed) at 2m east.

You hear: Fire alarm sounding; Smoke detector beeping.

Last action: You move south. The smoke is thick here.

Available actions: move(direction='north')  move(direction='south')  door(target_id='door_1', door_state='open')  look(direction='east')  wait()

```

---

## Action space

| Action | Parameters | Effect |
|---|---|---|
| `move` | `direction` | Move one cell N/S/E/W |
| `door` | `target_id`, `door_state` | Open or close a nearby door — closed doors slow fire spread to 15% |
| `look` | `direction` | Scan up to 5 cells in one direction for detailed zone/fire/smoke info |
| `wait` | — | Skip turn |

---

## Reward function (composite rubric)

**Per step:**
- `-0.01` constant time penalty
- `+0.1` moved closer to nearest unblocked exit (BFS distance)
- `-0.5` moved into smoke ≥ moderate or fire-adjacent cell
- `-0.02 × damage` health drain from smoke/fire exposure
- `+0.5` closed a door adjacent to active fire (strategic)

**Episode end:**
- `+5.0` agent evacuated alive
- `-10.0` agent incapacitated
- `+0.05 × remaining_steps` time bonus for fast evacuation

---

## Quick start

```bash

cd pyre_env

uv sync

uv run server   # → http://localhost:8000



# Health check

curl http://localhost:8000/health



# Reset (difficulty: easy | medium | hard)

curl -X POST http://localhost:8000/reset \

  -H "Content-Type: application/json" \

  -d '{"difficulty": "medium"}'



# Step

curl -X POST http://localhost:8000/step \

  -H "Content-Type: application/json" \

  -d '{"action": "move", "direction": "north"}'



# Random baseline (5 episodes)

python examples/random_agent.py --episodes 5 --verbose

```

### Python client

```python

from pyre_env import PyreEnv, PyreAction



with PyreEnv(base_url="http://localhost:8000") as env:

    result = env.reset()

    print(result.observation.narrative)

    result = env.step(PyreAction(action="move", direction="north"))

    print(f"Reward: {result.reward}, Health: {result.observation.agent_health}")

```

---

## Difficulty levels

| Level | Fire sources | Spread rate | Wind | Humidity | Max steps |
|---|---|---|---|---|---|
| `easy` | 1 | 10–20% | Calm | 30–50% | 200 |
| `medium` | 2–4 | 15–40% | Any | 10–45% | 150 |
| `hard` | 3–5 | 30–55% | Never calm | 5–20% | 100 |

---

## Deployment

```bash

openenv push --repo-id your-org/pyre-env

```

---

## Project structure

```

pyre_env/

├── models.py                       PyreAction, PyreObservation, PyreMapState, PyreState

├── client.py                       PyreEnv (EnvClient subclass)

├── openenv.yaml                    OpenEnv manifest

├── pyproject.toml

├── server/

│   ├── app.py                      FastAPI bootstrap

│   ├── pyre_env_environment.py     Main Environment class

│   ├── floor_plan.py               3 building templates + episode generation

│   ├── fire_sim.py                 Cellular automaton fire/smoke simulation

│   ├── narrative.py                Visibility + first-person text observation renderer

│   └── rubrics.py                  8 composable reward components

└── examples/

    └── random_agent.py             Smoke-test baseline

```

---

## Hackathon alignment

- **Theme #2 — Long-Horizon Planning**: 50–200 step episodes; agent must build a mental map across many partial observations
- **Theme #3.1 — World Modeling**: no global map; agent infers fire spread, corridor topology, and exit reachability from local text observations alone