title: RL-Env Warehouse Fulfillment
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
RL-Env: Warehouse Fulfillment Environment
A progressively challenging OpenEnv-style warehouse fulfillment environment simulating realistic pharmacy micro-fulfillment workflows with obstacles, weight constraints, stamina management, and budget optimization.
Overview
Agents control a warehouse robot navigating a 7×7 grid to fulfill customer orders. Tasks range from simple single-item pickups to complex multi-constraint challenges requiring strategic planning across obstacles, battery management, item weights, stamina conservation, and profit optimization.
Requirements Coverage
- Real-world task: Implemented in env.py
- Eight graded tasks: Defined in tasks.py with progressive difficulty
- Deterministic graders: Implemented in graders.py
- Typed OpenEnv models: Implemented in models.py
- OpenEnv manifests: openenv.yaml and openv.yaml
- OpenAI baseline runner: baseline.py
Core Mechanics
Actions
- Navigation: turn_left, turn_right, move_forward
- Operations: scan_bin, pick_item, pack_item
- Resources: recharge (battery), rest (stamina)
- Utility: wait
Advanced Mechanics (task-dependent)
- Obstacles: Impassable cells blocking direct paths; agents must route around them
- Item Weight & Carry Capacity: Items have weight (1-4 units); heavier items drain more battery while moving
- Stamina System: Movement costs stamina; when depleted, movement costs double battery
- Money & Profit Targets: Items have dollar values; correct packs earn money, wrong packs lose money
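The interaction between weight and stamina can be illustrated with a small cost model. This is only a sketch: the README does not give the actual per-move costs, so the base costs below and the names `RobotResources`, `move_cost`, and `apply_move` are hypothetical. The two rules it encodes are the ones stated above: a heavier load drains more battery, and depleted stamina doubles the battery cost of a move.

```python
from dataclasses import dataclass

# Hypothetical constants: the README does not state the real per-move costs.
BASE_MOVE_BATTERY = 1  # assumed battery cost of one unloaded move
BASE_MOVE_STAMINA = 1  # assumed stamina cost of one move


@dataclass
class RobotResources:
    battery: int
    stamina: int
    carried_weight: int = 0  # total weight of carried items (1-4 units each)


def move_cost(robot: RobotResources) -> int:
    """Battery drained by one move under the assumed cost model."""
    cost = BASE_MOVE_BATTERY + robot.carried_weight  # heavier load drains more
    if robot.stamina <= 0:
        cost *= 2  # depleted stamina doubles the battery cost
    return cost


def apply_move(robot: RobotResources) -> RobotResources:
    """Return the robot's resources after one move_forward."""
    return RobotResources(
        battery=max(0, robot.battery - move_cost(robot)),
        stamina=max(0, robot.stamina - BASE_MOVE_STAMINA),
        carried_weight=robot.carried_weight,
    )
```

Under these assumed numbers, a robot carrying 2 units pays 3 battery per move while it has stamina and 6 once stamina reaches zero, which is why resting before a long loaded haul can be cheaper than pushing through.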
Environment Interface
from grid_env import WarehouseFulfillmentEnv
env = WarehouseFulfillmentEnv(task_id="easy_single_pick", seed=7)
# Reset environment
observation = env.reset(task_id="easy_single_pick", seed=7)
# Step with action
observation, reward, done, info = env.step("move_forward")
# Get full state
state = env.state()
All data types (WarehouseAction, WarehouseObservation, WarehouseReward, WarehouseState) are Pydantic models.
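A full episode is a loop over the `reset` and `step` calls shown above. The sketch below assumes only that call shape. `run_episode` and `StubEnv` are hypothetical names, and `StubEnv` is a stand-in so the example runs without `grid_env` installed. With the real package you would pass a `WarehouseFulfillmentEnv` instance instead.

```python
import random
from typing import Any, Callable, List, Tuple

# Action names as listed under Core Mechanics.
ACTIONS = [
    "turn_left", "turn_right", "move_forward",
    "scan_bin", "pick_item", "pack_item",
    "recharge", "rest", "wait",
]


def run_episode(
    env: Any,
    policy: Callable[[Any], str],
    task_id: str,
    seed: int,
    max_steps: int,
) -> Tuple[int, List[Any]]:
    """Roll out one episode; return (steps taken, rewards collected)."""
    observation = env.reset(task_id=task_id, seed=seed)
    rewards: List[Any] = []
    for step in range(1, max_steps + 1):
        observation, reward, done, info = env.step(policy(observation))
        rewards.append(reward)
        if done:
            return step, rewards
    return max_steps, rewards


class StubEnv:
    """Stand-in with the same reset/step shape; ends after five steps."""

    def reset(self, task_id: str, seed: int) -> dict:
        self.steps = 0
        return {"step": 0}

    def step(self, action: str):
        self.steps += 1
        return {"step": self.steps}, 0.0, self.steps >= 5, {}


rng = random.Random(7)  # seeded so the random policy is reproducible
steps, rewards = run_episode(
    StubEnv(), lambda obs: rng.choice(ACTIONS), "easy_single_pick", 7, 40
)
```

Rewards are collected in a list rather than summed because the real environment returns a `WarehouseReward` model, whose fields are not described here.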
Tasks
The environment includes 8 progressively challenging tasks across 4 difficulty levels:
Easy
easy_single_pick: Fulfill one urgent thermometer order (40 steps, battery 36)
Medium
medium_multi_item: Two-line order with scan verification (60 steps, battery 34)
obstacle_course: Navigate around 6 obstacles to fulfill two-item order (70 steps, battery 40)
Hard
hard_restock_priority: Three-line order with battery management (85 steps, battery 24)
heavy_lifting: Weight-constrained picking - items weigh 1-4 units, carry capacity 3 (90 steps, battery 32)
stamina_run: Stamina management - movement drains stamina; rest to recover (80 steps, battery 36, stamina 12)
Expert
budget_run: Profit-driven fulfillment - earn $15+ from valued items (70 steps, battery 30, target $15)
gauntlet: All mechanics combined - obstacles + weight + stamina + $20 profit target (120 steps, battery 28, stamina 10, carry capacity 3)
Each grader returns a deterministic rubric-based score in [0.0, 1.0] based on completion, efficiency, and constraint satisfaction.
Reward Design
The reward function provides dense trajectory-wide feedback:
Positive Rewards:
- Correct scans: +0.12
- Correct picks: +0.20
- Correct packs: +0.35
- Completion bonus: +0.50
- Timely recharge/rest: +0.06-0.08
Penalties:
- Invalid actions: -0.08 to -0.10
- Wrong picks: -0.18
- Wrong packs: -0.15
- Obstacle collisions: -0.12
- Overweight attempts: -0.12
- Waiting: -0.01 per step
Money System (expert tasks):
- Correct packs earn item value in dollars
- Wrong packs lose 50% of item value
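To make the numbers above concrete, the sketch below sums the per-event rewards over a trajectory and applies the money rule. It assumes rewards are purely additive per event, which the README does not state. The event names, `trajectory_reward`, and `money_delta` are hypothetical. Entries given as ranges (invalid actions, recharge/rest) are left out rather than guessed.

```python
from collections import Counter
from typing import Iterable

# Per-event values copied from the lists above; event names are hypothetical.
REWARDS = {
    "correct_scan": 0.12,
    "correct_pick": 0.20,
    "correct_pack": 0.35,
    "completion": 0.50,
    "wrong_pick": -0.18,
    "wrong_pack": -0.15,
    "obstacle_collision": -0.12,
    "overweight_attempt": -0.12,
    "wait": -0.01,
}


def trajectory_reward(events: Iterable[str]) -> float:
    """Sum the reward of every event, assuming rewards simply add up."""
    counts = Counter(events)
    return sum(REWARDS[name] * n for name, n in counts.items())


def money_delta(item_value: float, correct: bool) -> float:
    """Dollars gained or lost by packing one item (expert tasks)."""
    return item_value if correct else -0.5 * item_value


# A clean single-item run: scan, pick, pack, then the completion bonus.
clean_run = ["correct_scan", "correct_pick", "correct_pack", "completion"]
total = trajectory_reward(clean_run)  # 0.12 + 0.20 + 0.35 + 0.50
```

Under this additive assumption, a single obstacle collision costs as much as one correct scan earns.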
Installation & Usage
Install Dependencies
pip install -r requirements.txt
Run Baseline with OpenAI
The baseline runner uses the OpenAI Python SDK with multi-seed evaluation for robust scoring:
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"
# Single-seed evaluation (backward compatible)
export EVAL_SEEDS="7"
python3 -m grid_env.baseline
# Multi-seed evaluation (default: 5 seeds)
export EVAL_SEEDS="7,42,123,456,789"
python3 -m grid_env.baseline
Run Inference Script
For comprehensive evaluation across all 8 tasks with multi-seed support:
# Create .env file with:
# HF_TOKEN=your_api_key
# API_BASE_URL=https://api.openai.com/v1
# MODEL_NAME=gpt-4o-mini
# EVAL_SEEDS=7,42,123,456,789
# Load .env and run
set -a && source .env && set +a
python3 inference.py
Multi-seed evaluation runs each task across multiple random seeds and reports:
- Mean score ± standard deviation
- Min/max scores across seeds
- Success rate across seeds
This prevents overfitting to specific warehouse layouts and provides more reliable performance metrics.
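The reported statistics can be reproduced from a list of per-seed scores. This is a minimal sketch, not the project's own aggregation code: `parse_seeds` and `summarize` are hypothetical names, the use of population standard deviation is an assumption, and so is the success threshold of 1.0.

```python
import statistics
from typing import Dict, List


def parse_seeds(raw: str) -> List[int]:
    """Parse an EVAL_SEEDS-style string such as "7,42,123"."""
    return [int(part) for part in raw.split(",") if part.strip()]


def summarize(scores: List[float], success_threshold: float = 1.0) -> Dict[str, float]:
    """Aggregate per-seed scores; the threshold default is an assumption."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population std; the project may use sample std
        "min": min(scores),
        "max": max(scores),
        "success_rate": sum(s >= success_threshold for s in scores) / len(scores),
    }


seeds = parse_seeds("7,42,123,456,789")
summary = summarize([0.5, 1.0, 1.0, 0.5])
```

A wide gap between `min` and `max` for one task suggests the policy depends on the layout generated by a particular seed.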
Testing
Run the full test suite (205 tests):
pytest tests/ -v
Multi-Seed Evaluation Demo
See how multi-seed evaluation works without requiring an API key:
python3 example_multiseed.py
This runs a random policy across multiple seeds and shows score variance.
Server Deployment
The environment includes a FastAPI server for remote access:
# Local development
uvicorn grid_env.Server.app:app --reload
# Docker deployment
docker build -t warehouse-env .
docker run -p 8000:8000 warehouse-env
API endpoints:
- GET /health - Server status
- GET /tasks - List all tasks
- POST /reset - Reset environment
- POST /step - Execute action
- GET /state - Get current state
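A client for these endpoints can be sketched with the standard library alone. The request bodies are assumptions: the README does not document the JSON schema, so the `task_id`, `seed`, and `action` fields below simply mirror the Python interface and may differ from what the server expects. The sketch also assumes JSON responses. `build_request`, `call`, and `demo` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # port from `docker run -p 8000:8000`


def build_request(
    method: str, path: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build a request for one endpoint; payload is sent as JSON if given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> Any:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def demo() -> None:
    """Example session; requires a running server. Field names are assumed."""
    print(call("GET", "/health"))
    print(call("POST", "/reset", {"task_id": "easy_single_pick", "seed": 7}))
    print(call("POST", "/step", {"action": "move_forward"}))
    print(call("GET", "/state"))
```

Calling `demo()` against a locally running server is a quick way to check which payload fields it actually accepts.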
Validation
If openenv is installed, validate the manifest:
openenv validate grid_env/openv.yaml
Project Structure
grid_env/
├── env.py                # Core environment logic
├── tasks.py              # Task definitions (8 tasks)
├── graders.py            # Rubric-based graders
├── models.py             # Pydantic data models
├── baseline.py           # OpenAI baseline runner
├── tools.py              # Action tool definitions
├── world.py              # World state wrapper
├── client.py             # Client facade
└── Server/
    ├── app.py            # FastAPI server
    └── warehouse_env.py  # Service wrapper
tests/                    # 205 test cases
inference.py              # LLM inference runner
License
MIT License - Copyright 2026 Soham Bose