Spaces:

Akshaykumarbm
/

pyre_env

Sleeping

App Files Files Community

pyre_env / README.md

Akshaykumarbm

Upload folder using huggingface_hub

1f2f7b2 verified 27 days ago

preview code

raw

history blame contribute delete

5.26 kB

metadata

title: Pyre — Crisis Navigation Environment
emoji: 🔥
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv

Pyre — Crisis Navigation Environment for LLM Agents

When buildings burn, the difference between a safe evacuation and a tragedy is the quality of decisions made in the first 60 seconds. Can we train an LLM to make them?

Pyre places an LLM agent inside a burning building. The agent must navigate to safety under partial observability — no global map, hard time pressure, and a fire that actively spreads and blocks exits.

Why Pyre vs. existing environments

Feature	`grid_world`	`maze_env`	`wildfire_env`	Pyre
Observability	Full	Full	Partial	Partial, first-person, text
Map dynamics	Static	Static	Dynamic (fire)	Dynamic (fire + doors)
Action richness	4 moves	4 moves	Suppression	Movement + door control + look
Agent role	Mover	Mover	Suppressor	Survivor
Reward complexity	Reach goal	Reach goal	Suppress fire	8-component composite rubric

wildfire_env trains an agent to fight fires from above; Pyre trains an agent to survive from inside.

What the agent sees (narrative observation)

Every step the agent receives a first-person text observation:

You are in the **main_corridor**. The air is **moderate**.
Health: ████████░░ (85/100) | Wind: **EAST**
Flames are visible to the **west**.
Exits visible: exit_0_7 at 8m west.
Doors: door_1 (closed) at 2m east.
You hear: Fire alarm sounding; Smoke detector beeping.
Last action: You move south. The smoke is thick here.
Available actions: move(direction='north')  move(direction='south')  door(target_id='door_1', door_state='open')  look(direction='east')  wait()

Action space

Action	Parameters	Effect
`move`	`direction`	Move one cell N/S/E/W
`door`	`target_id`, `door_state`	Open or close a nearby door — closed doors slow fire spread to 15%
`look`	`direction`	Scan up to 5 cells in one direction for detailed zone/fire/smoke info
`wait`	—	Skip turn

Reward function (composite rubric)

Per step:

-0.01 constant time penalty
+0.1 moved closer to nearest unblocked exit (BFS distance)
-0.5 moved into smoke ≥ moderate or fire-adjacent cell
-0.02 × damage health drain from smoke/fire exposure
+0.5 closed a door adjacent to active fire (strategic)

Episode end:

+5.0 agent evacuated alive
-10.0 agent incapacitated
+0.05 × remaining_steps time bonus for fast evacuation

Quick start

cd pyre_env
uv sync
uv run server   # → http://localhost:8000

# Health check
curl http://localhost:8000/health

# Reset (difficulty: easy | medium | hard)
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "medium"}'

# Step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": "move", "direction": "north"}'

# Random baseline (5 episodes)
python examples/random_agent.py --episodes 5 --verbose

Python client

from pyre_env import PyreEnv, PyreAction

with PyreEnv(base_url="http://localhost:8000") as env:
    result = env.reset()
    print(result.observation.narrative)
    result = env.step(PyreAction(action="move", direction="north"))
    print(f"Reward: {result.reward}, Health: {result.observation.agent_health}")

Difficulty levels

Level	Fire sources	Spread rate	Wind	Humidity	Max steps
`easy`	1	10–20%	Calm	30–50%	200
`medium`	2–4	15–40%	Any	10–45%	150
`hard`	3–5	30–55%	Never calm	5–20%	100

Deployment

openenv push --repo-id your-org/pyre-env

Project structure

pyre_env/
├── models.py                       PyreAction, PyreObservation, PyreMapState, PyreState
├── client.py                       PyreEnv (EnvClient subclass)
├── openenv.yaml                    OpenEnv manifest
├── pyproject.toml
├── server/
│   ├── app.py                      FastAPI bootstrap
│   ├── pyre_env_environment.py     Main Environment class
│   ├── floor_plan.py               3 building templates + episode generation
│   ├── fire_sim.py                 Cellular automaton fire/smoke simulation
│   ├── narrative.py                Visibility + first-person text observation renderer
│   └── rubrics.py                  8 composable reward components
└── examples/
    └── random_agent.py             Smoke-test baseline

Hackathon alignment

Theme #2 — Long-Horizon Planning: 50–200 step episodes; agent must build a mental map across many partial observations
Theme #3.1 — World Modeling: no global map; agent infers fire spread, corridor topology, and exit reachability from local text observations alone