| title: RL-Env Warehouse Fulfillment | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 8000 | |
| pinned: false | |
| # RL-Env: Warehouse Fulfillment Environment | |
| A progressively challenging OpenEnv-style warehouse fulfillment environment simulating realistic pharmacy micro-fulfillment workflows with obstacles, weight constraints, stamina management, and budget optimization. | |
| ## Overview | |
| Agents control a warehouse robot navigating a 7×7 grid to fulfill customer orders. Tasks range from simple single-item pickups to complex multi-constraint challenges requiring strategic planning across obstacles, battery management, item weights, stamina conservation, and profit optimization. | |
| ## Requirements Coverage | |
| - **Real-world task**: Implemented in [`env.py`](grid_env/env.py) | |
| - **Eight graded tasks**: Defined in [`tasks.py`](grid_env/tasks.py) with progressive difficulty | |
| - **Deterministic graders**: Implemented in [`graders.py`](grid_env/graders.py) | |
| - **Typed OpenEnv models**: Implemented in [`models.py`](grid_env/models.py) | |
| - **OpenEnv manifests**: [`openenv.yaml`](openenv.yaml) and [`openv.yaml`](grid_env/openv.yaml) | |
| - **OpenAI baseline runner**: [`baseline.py`](grid_env/baseline.py) | |
| ## Core Mechanics | |
| ### Actions | |
| - **Navigation**: `turn_left`, `turn_right`, `move_forward` | |
| - **Operations**: `scan_bin`, `pick_item`, `pack_item` | |
| - **Resources**: `recharge` (battery), `rest` (stamina) | |
| - **Utility**: `wait` | |
| ### Advanced Mechanics (task-dependent) | |
| 1. **Obstacles**: Impassable cells block direct paths; agents must route around them | |
| 2. **Item Weight & Carry Capacity**: Items have weight (1-4 units); heavier items drain more battery while moving | |
| 3. **Stamina System**: Movement costs stamina; once stamina is depleted, each move costs double the battery | |
| 4. **Money & Profit Targets**: Items have dollar values; correct packs earn money, wrong packs lose money | |
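The weight and stamina rules above can be sketched as a per-move battery cost. This is a minimal illustration, not the logic in `env.py`: the function names, the base cost of 1, and the one-unit-per-weight surcharge are assumptions. Only the doubling at zero stamina and the carry capacity of 3 (used by `heavy_lifting` and `gauntlet`) come from this README.

```python
def move_battery_cost(carried_weight: int, stamina: int, base_cost: int = 1) -> int:
    """Hypothetical battery cost of a single move_forward action."""
    if carried_weight < 0 or stamina < 0:
        raise ValueError("carried_weight and stamina must be non-negative")
    # Assumed surcharge: one extra battery unit per unit of carried weight.
    cost = base_cost + carried_weight
    # From the mechanics list: with stamina depleted, movement costs double battery.
    if stamina == 0:
        cost *= 2
    return cost


def can_pick(carried_weight: int, item_weight: int, carry_capacity: int = 3) -> bool:
    """A pick is allowed only if the total load stays within carry capacity."""
    return carried_weight + item_weight <= carry_capacity
```

Under these assumed numbers, carrying 2 units with no stamina left costs 6 battery per move instead of 3, which is why resting before a long haul can pay off.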
| ## Environment Interface | |
| ```python | |
| from grid_env import WarehouseFulfillmentEnv | |
| env = WarehouseFulfillmentEnv(task_id="easy_single_pick", seed=7) | |
| # Reset environment | |
| observation = env.reset(task_id="easy_single_pick", seed=7) | |
| # Step with action | |
| observation, reward, done, info = env.step("move_forward") | |
| # Get full state | |
| state = env.state() | |
| ``` | |
| All data types (`WarehouseAction`, `WarehouseObservation`, `WarehouseReward`, `WarehouseState`) are Pydantic models. | |
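The `reset`/`step` interface above fits a standard episode loop. The sketch below uses a tiny stand-in environment (`StubEnv`, hypothetical) so that it runs without `grid_env` installed; with the real package, a `WarehouseFulfillmentEnv` instance would be passed instead. Because rewards are `WarehouseReward` models, the `reward_value` callback (also hypothetical) is where a real loop would extract a numeric field; which field that is has not been confirmed here.

```python
from typing import Any, Callable, Tuple


class StubEnv:
    """Hypothetical stand-in that mimics the reset/step signature shown above."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, **kwargs: Any) -> dict:
        self.t = 0
        return {"step": self.t}

    def step(self, action: str) -> Tuple[dict, float, bool, dict]:
        self.t += 1
        reward = -0.01 if action == "wait" else 0.0
        return {"step": self.t}, reward, self.t >= self.horizon, {}


def run_episode(env: Any, policy: Callable[[Any], str], max_steps: int = 40,
                reward_value: Callable[[Any], float] = float,
                **reset_kwargs: Any) -> Tuple[float, int]:
    """Roll out one episode and return (total reward, steps taken)."""
    observation = env.reset(**reset_kwargs)
    total, steps = 0.0, 0
    for _ in range(max_steps):
        observation, reward, done, _info = env.step(policy(observation))
        total += reward_value(reward)
        steps += 1
        if done:
            break
    return total, steps


# A policy that always waits, run against the stub for three steps.
total, steps = run_episode(StubEnv(horizon=3), policy=lambda obs: "wait")
print(steps, round(total, 2))
```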
| ## Tasks | |
| The environment includes **8 progressively challenging tasks** across 4 difficulty levels: | |
| ### Easy | |
| - **`easy_single_pick`**: Fulfill one urgent thermometer order (40 steps, battery 36) | |
| ### Medium | |
| - **`medium_multi_item`**: Two-line order with scan verification (60 steps, battery 34) | |
| - **`obstacle_course`**: Navigate around 6 obstacles to fulfill two-item order (70 steps, battery 40) | |
| ### Hard | |
| - **`hard_restock_priority`**: Three-line order with battery management (85 steps, battery 24) | |
| - **`heavy_lifting`**: Weight-constrained picking: items weigh 1-4 units, carry capacity 3 (90 steps, battery 32) | |
| - **`stamina_run`**: Stamina management: movement drains stamina; rest to recover (80 steps, battery 36, stamina 12) | |
| ### Expert | |
| - **`budget_run`**: Profit-driven fulfillment: earn $15+ from valued items (70 steps, battery 30, target $15) | |
| - **`gauntlet`**: All mechanics combined: obstacles + weight + stamina + $20 profit target (120 steps, battery 28, stamina 10, carry capacity 3) | |
| Each grader returns a deterministic rubric-based score in `[0.0, 1.0]` based on completion, efficiency, and constraint satisfaction. | |
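One way to picture a rubric-based score built from completion, efficiency, and constraint satisfaction is a clamped weighted average. The sketch below is an assumption about the general shape only: the weights `(0.6, 0.25, 0.15)`, the function names, and the efficiency measure are hypothetical and are not taken from `graders.py`.

```python
def rubric_score(completion: float, efficiency: float, constraints: float,
                 weights: tuple = (0.6, 0.25, 0.15)) -> float:
    """Combine three sub-scores in [0, 1] into a single score in [0, 1]."""
    parts = (completion, efficiency, constraints)
    # Clamp each sub-score so out-of-range inputs cannot push the result past the bounds.
    clamped = [min(1.0, max(0.0, p)) for p in parts]
    total = sum(w * p for w, p in zip(weights, clamped)) / sum(weights)
    return min(1.0, max(0.0, total))


def step_efficiency(steps_used: int, step_limit: int) -> float:
    """Assumed efficiency measure: fraction of the step budget left unused."""
    return max(0.0, 1.0 - steps_used / step_limit)
```

Because the function is pure and takes no random input, the same trajectory always yields the same score, which is the property "deterministic" refers to above.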
| ## Reward Design | |
| The reward function provides dense trajectory-wide feedback: | |
| **Positive Rewards:** | |
| - Correct scans: +0.12 | |
| - Correct picks: +0.20 | |
| - Correct packs: +0.35 | |
| - Completion bonus: +0.50 | |
| - Timely recharge/rest: +0.06 to +0.08 | |
| **Penalties:** | |
| - Invalid actions: -0.08 to -0.10 | |
| - Wrong picks: -0.18 | |
| - Wrong packs: -0.15 | |
| - Obstacle collisions: -0.12 | |
| - Overweight attempts: -0.12 | |
| - Waiting: -0.01 per step | |
| **Money System (expert tasks):** | |
| - Correct packs earn item value in dollars | |
| - Wrong packs lose 50% of item value | |
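The fixed-value entries above can be collected in a lookup table to see how a trajectory's reward accumulates. The numbers are copied from the lists above; the event labels and function names are hypothetical, and the ranged entries (invalid actions, timely recharge/rest) are left out because their exact per-case values are not stated.

```python
from typing import Iterable

# Values copied from the reward lists above; the keys are illustrative labels only.
STEP_REWARDS = {
    "correct_scan": 0.12,
    "correct_pick": 0.20,
    "correct_pack": 0.35,
    "completion": 0.50,
    "wrong_pick": -0.18,
    "wrong_pack": -0.15,
    "obstacle_collision": -0.12,
    "overweight_attempt": -0.12,
    "wait": -0.01,
}


def trajectory_reward(events: Iterable[str]) -> float:
    """Sum the per-event rewards over a sequence of event labels."""
    return sum(STEP_REWARDS[event] for event in events)


def money_delta(item_value: float, correct: bool) -> float:
    """Correct packs earn the item's value; wrong packs lose 50% of it."""
    return item_value if correct else -0.5 * item_value


# A clean single-item fulfillment: scan, pick, pack, then the completion bonus.
clean_run = trajectory_reward(
    ["correct_scan", "correct_pick", "correct_pack", "completion"]
)
```

For the clean run the sum is 0.12 + 0.20 + 0.35 + 0.50 = 1.17, before any movement-related terms the environment may add.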
| ## Installation & Usage | |
| ### Install Dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### Run Baseline with OpenAI | |
| The baseline runner uses the OpenAI Python SDK with **multi-seed evaluation** for robust scoring: | |
| ```bash | |
| export OPENAI_API_KEY="your_api_key_here" | |
| export OPENAI_MODEL="gpt-4o-mini" | |
| # Single-seed evaluation (backward compatible) | |
| export EVAL_SEEDS="7" | |
| python3 -m grid_env.baseline | |
| # Multi-seed evaluation (default: 5 seeds) | |
| export EVAL_SEEDS="7,42,123,456,789" | |
| python3 -m grid_env.baseline | |
| ``` | |
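For reference, a comma-separated `EVAL_SEEDS` value could be parsed along these lines. This is a sketch rather than the code in `baseline.py`: `parse_seeds` is a hypothetical helper, and the fallback string simply reuses the five seeds from the example above.

```python
import os
from typing import List


def parse_seeds(raw: str) -> List[int]:
    """Turn a string like "7,42,123" into [7, 42, 123], ignoring empty entries."""
    return [int(part) for part in raw.split(",") if part.strip()]


# Fall back to the five example seeds when EVAL_SEEDS is unset (assumed default).
seeds = parse_seeds(os.environ.get("EVAL_SEEDS", "7,42,123,456,789"))
```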
| ### Run Inference Script | |
| For comprehensive evaluation across all 8 tasks with multi-seed support: | |
| ```bash | |
| # Create .env file with: | |
| # HF_TOKEN=your_api_key | |
| # API_BASE_URL=https://api.openai.com/v1 | |
| # MODEL_NAME=gpt-4o-mini | |
| # EVAL_SEEDS=7,42,123,456,789 | |
| # Load .env and run | |
| set -a && source .env && set +a | |
| python3 inference.py | |
| ``` | |
| **Multi-seed evaluation** runs each task across multiple random seeds and reports: | |
| - Mean score ± standard deviation | |
| - Min/max scores across seeds | |
| - Success rate across seeds | |
| This reduces the risk of overfitting to specific warehouse layouts and gives more reliable performance metrics. | |
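The reported statistics can be computed from per-seed scores with the standard library alone. The sketch below is an illustration, not the code in `inference.py`: `summarize_scores` and `success_threshold` are hypothetical names, and both the population standard deviation and the "score of 1.0 counts as success" cutoff are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def summarize_scores(scores: Sequence[float],
                     success_threshold: float = 1.0) -> Dict[str, float]:
    """Aggregate per-seed grader scores for one task."""
    if not scores:
        raise ValueError("at least one score is required")
    successes = sum(1 for s in scores if s >= success_threshold)
    return {
        "mean": mean(scores),
        "std": pstdev(scores),  # population standard deviation (assumed choice)
        "min": min(scores),
        "max": max(scores),
        "success_rate": successes / len(scores),
    }


summary = summarize_scores([1.0, 0.5, 1.0, 0.5])
```

A large `std` relative to `mean` suggests the policy depends heavily on the layout a particular seed produces.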
| ## Testing | |
| Run the full test suite (205 tests): | |
| ```bash | |
| pytest tests/ -v | |
| ``` | |
| ### Multi-Seed Evaluation Demo | |
| See how multi-seed evaluation works without requiring an API key: | |
| ```bash | |
| python3 example_multiseed.py | |
| ``` | |
| This runs a random policy across multiple seeds and shows score variance. | |
| ## Server Deployment | |
| The environment includes a FastAPI server for remote access: | |
| ```bash | |
| # Local development | |
| uvicorn grid_env.Server.app:app --reload | |
| # Docker deployment | |
| docker build -t warehouse-env . | |
| docker run -p 8000:8000 warehouse-env | |
| ``` | |
| API endpoints: | |
| - `GET /health`: Server status | |
| - `GET /tasks`: List all tasks | |
| - `POST /reset`: Reset environment | |
| - `POST /step`: Execute action | |
| - `GET /state`: Get current state | |
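A client for these endpoints could prepare its requests with the standard library as sketched below. The endpoint paths and port 8000 come from this README; the JSON field names (`task_id`, `seed`, `action`) are assumptions modeled on the Python interface and may not match the request schema in `app.py`. The requests are only constructed here, so the snippet runs without a server.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # matches app_port and `docker run -p 8000:8000`


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP request for one of the endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


health_req = build_request("GET", "/health")
# Assumed payload shapes; check the server's request models before relying on them.
reset_req = build_request("POST", "/reset", {"task_id": "easy_single_pick", "seed": 7})
step_req = build_request("POST", "/step", {"action": "move_forward"})
```

With the server running, each prepared request can be sent with `urllib.request.urlopen(reset_req)` and the response body decoded with `json.loads`.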
| ## Validation | |
| If `openenv` is installed, validate the manifest: | |
| ```bash | |
| openenv validate grid_env/openv.yaml | |
| ``` | |
| ## Project Structure | |
| ``` | |
| grid_env/ | |
| ├── env.py # Core environment logic | |
| ├── tasks.py # Task definitions (8 tasks) | |
| ├── graders.py # Rubric-based graders | |
| ├── models.py # Pydantic data models | |
| ├── baseline.py # OpenAI baseline runner | |
| ├── tools.py # Action tool definitions | |
| ├── world.py # World state wrapper | |
| ├── client.py # Client facade | |
| └── Server/ | |
|     ├── app.py # FastAPI server | |
|     └── warehouse_env.py # Service wrapper | |
| tests/ # 205 test cases | |
| inference.py # LLM inference runner | |
| ``` | |
| ## License | |
| MIT License - Copyright 2026 Soham Bose | |