Spaces:
Sleeping
Sleeping
| # Dispatch Arena Spec | |
| ## Contract | |
| ```python | |
| reset(seed=None, episode_id=None, config=None) -> Observation | |
| step(action) -> Observation | |
| state -> State | |
| ``` | |
| The same `DispatchArenaEnvironment` powers training, REST, WebSocket streaming, replay, and the frontend. | |
| ## Mini Mode | |
| Mini mode has one courier, one order, and three nodes: `hub`, `pickup`, `dropoff`. | |
| | Action | Legal when | Effect | | |
| |---|---|---| | |
| | `wait` | Episode is active | Consumes a tick while prep can progress. | | |
| | `go_pickup` | Courier is not carrying and not at pickup | Moves courier to `pickup`. | | |
| | `pickup` | Courier at pickup and order ready | Courier carries the order. | | |
| | `go_dropoff` | Courier carrying and not at dropoff | Moves courier to `dropoff`. | | |
| | `dropoff` | Courier carrying at dropoff | Terminal success. | | |
| ## Normal Mode | |
| Normal mode has 2-5 couriers and 3-10 orders. The agent is a centralized dispatcher. | |
| | Action | Required args | Effect | | |
| |---|---|---| | |
| | `assign` | `courier_id`, `order_id` | Sends an idle courier to an unassigned open order. | | |
| | `reposition` | `courier_id`, `node_id` | Moves an idle courier to a graph node. | | |
| | `hold` | optional `courier_id` | Consumes a tick. | | |
| | `prioritize` | optional `order_id` | Records dispatch intent without privileged state. | | |
| Pickup and dropoff are automatic once an assigned courier reaches the relevant node and the order is ready. | |
| ## Observation | |
| - `state`: sanitized public state. | |
| - `reward`: reward from the most recent transition. | |
| - `done`: terminal flag. | |
| - `truncated`: true only for hard timeout. | |
| - `verifier_status`: `in_progress`, `delivered_successfully`, `partial_success`, or `timeout_failure`. | |
| - `reward_breakdown`: all reward components and total. | |
| - `legal_actions`: action names currently available. | |
| - `action_mask`: mask in mode-specific action order. | |
| - `summary_text`: compact public explanation. | |
| - `info`: transition metadata such as invalid action reasons and events. | |
| Hidden state never exposed in hidden mode: exact `prep_remaining`. | |
| ## Step Ordering | |
| 1. Reject the step if the episode is terminal. | |
| 2. Apply base step cost. | |
| 3. Consume one tick. | |
| 4. Progress prep timers. | |
| 5. Validate the requested action against the pre-transition legal state. | |
| 6. Apply valid action effects or invalid penalty. | |
| 7. In normal mode, advance courier travel and automatic pickup/dropoff. | |
| 8. Expire overdue open orders. | |
| 9. Apply timeout if max ticks is reached. | |
| 10. Recompute public state, legal actions, metrics, and replay payload. | |
| ## Determinism | |
| The same `(mode, seed, config, action_trace)` reproduces the same scenario, observations, reward components, and replay records. | |