---
title: RL-Env Warehouse Fulfillment
emoji: 🏭
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
---

# RL-Env: Warehouse Fulfillment Environment

An OpenEnv-style warehouse fulfillment environment with progressively challenging tasks. It simulates realistic pharmacy micro-fulfillment workflows with obstacles, weight constraints, stamina management, and budget optimization.

## Overview

Agents control a warehouse robot that navigates a 7×7 grid to fulfill customer orders. Tasks range from simple single-item pickups to multi-constraint challenges that require planning around obstacles, battery, item weights, stamina, and profit.

## Requirements Coverage

- **Real-world task**: Implemented in [`env.py`](grid_env/env.py)
- **Eight graded tasks**: Defined in [`tasks.py`](grid_env/tasks.py) with progressive difficulty
- **Deterministic graders**: Implemented in [`graders.py`](grid_env/graders.py)
- **Typed OpenEnv models**: Implemented in [`models.py`](grid_env/models.py)
- **OpenEnv manifests**: [`openenv.yaml`](openenv.yaml) and [`openv.yaml`](grid_env/openv.yaml)
- **OpenAI baseline runner**: [`baseline.py`](grid_env/baseline.py)

## Core Mechanics

### Actions

- **Navigation**: `turn_left`, `turn_right`, `move_forward`
- **Operations**: `scan_bin`, `pick_item`, `pack_item`
- **Resources**: `recharge` (battery), `rest` (stamina)
- **Utility**: `wait`

### Advanced Mechanics (task-dependent)

1. **Obstacles**: Impassable cells block direct paths, so agents must route around them.
2. **Item Weight & Carry Capacity**: Items weigh 1-4 units and the total carried weight is limited by the task's carry capacity (3 units in `heavy_lifting` and `gauntlet`); heavier items drain more battery while moving.
3. **Stamina System**: Movement costs stamina; once stamina is depleted, movement costs double battery.
4. **Money & Profit Targets**: Items have dollar values; correct packs earn money and wrong packs lose money.

## Environment Interface

```python
from grid_env import WarehouseFulfillmentEnv

env = WarehouseFulfillmentEnv(task_id="easy_single_pick", seed=7)

# Reset environment
observation = env.reset(task_id="easy_single_pick", seed=7)

# Step with action
observation, reward, done, info = env.step("move_forward")

# Get full state
state = env.state()
```

All data types (`WarehouseAction`, `WarehouseObservation`, `WarehouseReward`, `WarehouseState`) are Pydantic models.

## Tasks

The environment includes **8 progressively challenging tasks** across 4 difficulty levels:

### Easy

- **`easy_single_pick`**: Fulfill one urgent thermometer order (40 steps, battery 36)

### Medium

- **`medium_multi_item`**: Two-line order with scan verification (60 steps, battery 34)
- **`obstacle_course`**: Navigate around 6 obstacles to fulfill a two-item order (70 steps, battery 40)

### Hard

- **`hard_restock_priority`**: Three-line order with battery management (85 steps, battery 24)
- **`heavy_lifting`**: Weight-constrained picking; items weigh 1-4 units with carry capacity 3 (90 steps, battery 32)
- **`stamina_run`**: Stamina management; movement drains stamina and `rest` recovers it (80 steps, battery 36, stamina 12)

### Expert

- **`budget_run`**: Profit-driven fulfillment; earn at least $15 from valued items (70 steps, battery 30, target $15)
- **`gauntlet`**: All mechanics combined: obstacles, weight, stamina, and a $20 profit target (120 steps, battery 28, stamina 10, carry capacity 3)

Each grader returns a deterministic, rubric-based score in `[0.0, 1.0]` based on completion, efficiency, and constraint satisfaction.
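The actual rubric lives in `graders.py`. As a rough illustration of how completion, efficiency, and constraint satisfaction can be folded into one deterministic score in `[0.0, 1.0]`, here is a minimal sketch. The `EpisodeSummary` fields, the 0.6/0.25/0.15 weights, and the 0.25 per-violation penalty are hypothetical choices, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeSummary:
    # Hypothetical episode summary; field names are illustrative, not from models.py.
    lines_required: int
    lines_packed: int
    steps_used: int
    step_limit: int
    constraint_violations: int  # e.g. collisions, overweight attempts, wrong packs


def rubric_score(ep: EpisodeSummary,
                 w_completion: float = 0.6,
                 w_efficiency: float = 0.25,
                 w_constraints: float = 0.15) -> float:
    """Combine three rubric terms into a score clamped to [0.0, 1.0].

    The weights are hypothetical; graders.py may use a different rubric.
    """
    # Fraction of order lines packed correctly.
    completion = ep.lines_packed / ep.lines_required if ep.lines_required > 0 else 0.0
    completion = min(max(completion, 0.0), 1.0)

    # Efficiency only counts once the whole order is packed: fewer steps -> higher score.
    if completion == 1.0 and ep.step_limit > 0:
        efficiency = min(max(1.0 - ep.steps_used / ep.step_limit, 0.0), 1.0)
    else:
        efficiency = 0.0

    # Each violation removes a fixed (hypothetical) share of the constraint term.
    constraints = max(0.0, 1.0 - 0.25 * ep.constraint_violations)

    score = (w_completion * completion
             + w_efficiency * efficiency
             + w_constraints * constraints)
    return round(min(max(score, 0.0), 1.0), 4)
```

Because the function is pure (no randomness, no clock), the same episode summary always maps to the same score. Under these assumed weights, an `easy_single_pick` run that packs its one line in 20 of 40 steps with no violations scores 0.875.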
## Reward Design

The reward function provides dense feedback across the whole trajectory:

**Positive Rewards:**

- Correct scans: +0.12
- Correct picks: +0.20
- Correct packs: +0.35
- Completion bonus: +0.50
- Timely recharge/rest: +0.06 to +0.08

**Penalties:**

- Invalid actions: -0.08 to -0.10
- Wrong picks: -0.18
- Wrong packs: -0.15
- Obstacle collisions: -0.12
- Overweight attempts: -0.12
- Waiting: -0.01 per step

**Money System (expert tasks):**

- Correct packs earn the item's value in dollars
- Wrong packs lose 50% of the item's value

## Installation & Usage

### Install Dependencies

```bash
pip install -r requirements.txt
```

### Run Baseline with OpenAI

The baseline runner uses the OpenAI Python SDK with **multi-seed evaluation** for more robust scoring:

```bash
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"

# Single-seed evaluation (backward compatible)
export EVAL_SEEDS="7"
python3 -m grid_env.baseline

# Multi-seed evaluation (default: 5 seeds)
export EVAL_SEEDS="7,42,123,456,789"
python3 -m grid_env.baseline
```

### Run Inference Script

For comprehensive evaluation across all 8 tasks with multi-seed support:

```bash
# Create a .env file with:
# HF_TOKEN=your_api_key
# API_BASE_URL=https://api.openai.com/v1
# MODEL_NAME=gpt-4o-mini
# EVAL_SEEDS=7,42,123,456,789

# Load .env and run
set -a && source .env && set +a
python3 inference.py
```

**Multi-seed evaluation** runs each task across multiple random seeds and reports:

- Mean score ± standard deviation
- Min/max scores across seeds
- Success rate across seeds

This guards against overfitting to specific warehouse layouts and yields more reliable performance metrics than a single seed.

## Testing

Run the full test suite (205 tests):

```bash
pytest tests/ -v
```

### Multi-Seed Evaluation Demo

See how multi-seed evaluation works without an API key:

```bash
python3 example_multiseed.py
```

This runs a random policy across multiple seeds and shows the score variance.
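The per-task statistics reported by multi-seed evaluation (mean ± standard deviation, min/max, success rate) can be computed with the standard library alone. The sketch below is hypothetical: `parse_eval_seeds`, `summarize_seeds`, the 0.5 success threshold, and the use of population standard deviation are assumptions, not necessarily what `inference.py` or `baseline.py` do.

```python
import statistics
from typing import Dict, List, Sequence


def parse_eval_seeds(raw: str) -> List[int]:
    """Parse an EVAL_SEEDS-style string such as "7,42,123" (hypothetical helper)."""
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def summarize_seeds(scores: Sequence[float], success_threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate per-seed grader scores for a single task.

    success_threshold is an assumed cut-off for counting a seed as a success.
    """
    if not scores:
        raise ValueError("need at least one seed score")
    return {
        "mean": statistics.fmean(scores),
        # Population std: a single seed yields 0.0 instead of raising.
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
        "success_rate": sum(s >= success_threshold for s in scores) / len(scores),
    }


if __name__ == "__main__":
    seeds = parse_eval_seeds("7,42,123,456,789")
    # Illustrative scores only; in practice each one comes from running one seed.
    scores = [0.9, 0.7, 0.4, 0.8, 0.2]
    summary = summarize_seeds(scores)
    print(f"{len(seeds)} seeds: {summary['mean']:.3f} ± {summary['std']:.3f} "
          f"(min {summary['min']:.2f}, max {summary['max']:.2f}, "
          f"success {summary['success_rate']:.0%})")
```

Population standard deviation is used here so that the single-seed case (`EVAL_SEEDS="7"`) still produces a value; a sample standard deviation (`statistics.stdev`) would raise for one score.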
## Server Deployment

The environment includes a FastAPI server for remote access:

```bash
# Local development
uvicorn grid_env.Server.app:app --reload

# Docker deployment
docker build -t warehouse-env .
docker run -p 8000:8000 warehouse-env
```

API endpoints:

- `GET /health`: Server status
- `GET /tasks`: List all tasks
- `POST /reset`: Reset environment
- `POST /step`: Execute action
- `GET /state`: Get current state

## Validation

If `openenv` is installed, validate the manifest:

```bash
openenv validate grid_env/openv.yaml
```

## Project Structure

```
grid_env/
├── env.py               # Core environment logic
├── tasks.py             # Task definitions (8 tasks)
├── graders.py           # Rubric-based graders
├── models.py            # Pydantic data models
├── baseline.py          # OpenAI baseline runner
├── tools.py             # Action tool definitions
├── world.py             # World state wrapper
├── client.py            # Client facade
└── Server/
    ├── app.py           # FastAPI server
    └── warehouse_env.py # Service wrapper
tests/                   # 205 test cases
inference.py             # LLM inference runner
```

## License

MIT License - Copyright 2026 Soham Bose