Soham Bose

metadata

title: RL-Env Warehouse Fulfillment
emoji: 🏭
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false

RL-Env: Warehouse Fulfillment Environment

A progressively challenging OpenEnv-style warehouse fulfillment environment simulating realistic pharmacy micro-fulfillment workflows with obstacles, weight constraints, stamina management, and budget optimization.

Overview

Agents control a warehouse robot navigating a 7×7 grid to fulfill customer orders. Tasks range from simple single-item pickups to multi-constraint challenges that require planning around obstacles, battery limits, item weights, stamina, and profit targets.

Requirements Coverage

Real-world task: Implemented in env.py
Eight graded tasks: Defined in tasks.py with progressive difficulty
Deterministic graders: Implemented in graders.py
Typed OpenEnv models: Implemented in models.py
OpenEnv manifests: openenv.yaml and openv.yaml
OpenAI baseline runner: baseline.py

Core Mechanics

Actions

Navigation: turn_left, turn_right, move_forward
Operations: scan_bin, pick_item, pack_item
Resources: recharge (battery), rest (stamina)
Utility: wait
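The nine action names above form a small discrete action space. When an LLM policy emits free text, it helps to normalise and validate that text before calling env.step. The sketch below is hypothetical: the Action enum and parse_action helper are not part of grid_env; they only reuse the documented action strings.

```python
from enum import Enum


class Action(str, Enum):
    """Hypothetical enum mirroring the nine documented action names."""
    # Navigation
    TURN_LEFT = "turn_left"
    TURN_RIGHT = "turn_right"
    MOVE_FORWARD = "move_forward"
    # Operations
    SCAN_BIN = "scan_bin"
    PICK_ITEM = "pick_item"
    PACK_ITEM = "pack_item"
    # Resources
    RECHARGE = "recharge"  # restores battery
    REST = "rest"          # restores stamina
    # Utility
    WAIT = "wait"


def parse_action(text: str) -> Action:
    """Normalise model output such as ' Move_Forward ' into an Action.

    Raises ValueError for anything outside the action space.
    """
    return Action(text.strip().lower())
```

Because Action subclasses str, Action.MOVE_FORWARD compares equal to "move_forward", so it can be used wherever the plain action string is expected.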

Advanced Mechanics (task-dependent)

Obstacles: Impassable cells blocking direct paths — agents must route around them
Item Weight & Carry Capacity: Items weigh 1-4 units and the robot can only carry up to its capacity (3 units in heavy_lifting and gauntlet); heavier loads drain more battery while moving
Stamina System: Movement costs stamina; when depleted, movement costs double battery
Money & Profit Targets: Items have dollar values; correct packs earn money, wrong packs lose money
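The weight and stamina rules interact on every move_forward. The following sketch models one plausible bookkeeping scheme. The formula (one extra battery unit per two units carried, doubled at zero stamina, one stamina point per move) and the names RobotStatus, can_pick, move_battery_cost and apply_move are assumptions for illustration; the environment's actual coefficients may differ.

```python
from dataclasses import dataclass


@dataclass
class RobotStatus:
    battery: int
    stamina: int
    carried_weight: int = 0
    carry_capacity: int = 3  # capacity listed for heavy_lifting and gauntlet


def can_pick(status: RobotStatus, item_weight: int) -> bool:
    """An item can be picked only if the load stays within carry capacity."""
    return status.carried_weight + item_weight <= status.carry_capacity


def move_battery_cost(status: RobotStatus, base_cost: int = 1) -> int:
    """Battery used by one move_forward under an ASSUMED cost formula.

    +1 battery per 2 units carried; the total doubles once stamina is 0.
    """
    cost = base_cost + status.carried_weight // 2
    return cost * 2 if status.stamina <= 0 else cost


def apply_move(status: RobotStatus) -> None:
    """Charge battery, then spend one stamina point (ASSUMED), floored at 0."""
    status.battery -= move_battery_cost(status)
    status.stamina = max(0, status.stamina - 1)
```

Under these assumptions a robot carrying 2 units pays 2 battery per move, and 4 once its stamina runs out, which is why resting before a long loaded walk can be cheaper than pushing on.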

Environment Interface

from grid_env import WarehouseFulfillmentEnv

env = WarehouseFulfillmentEnv(task_id="easy_single_pick", seed=7)

# Reset environment
observation = env.reset(task_id="easy_single_pick", seed=7)

# Step with action
observation, reward, done, info = env.step("move_forward")

# Get full state
state = env.state()

All data types (WarehouseAction, WarehouseObservation, WarehouseReward, WarehouseState) are Pydantic models.
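Any policy can be driven through the reset/step interface with the same loop. To keep the example self-contained, StubEnv below is a hypothetical stand-in with the same reset(task_id=..., seed=...), step(action) and state() shape; it is not WarehouseFulfillmentEnv. The loop converts rewards with float(), which assumes a numeric reward; if step returns a WarehouseReward model instead, read its numeric field (not documented here).

```python
import random
from typing import Any, Callable, Dict, Tuple

ACTIONS = ["turn_left", "turn_right", "move_forward", "scan_bin",
           "pick_item", "pack_item", "recharge", "rest", "wait"]


class StubEnv:
    """Hypothetical stand-in mimicking the reset/step/state contract.

    Episodes simply end after max_steps; swap in the real environment
    for actual rollouts.
    """

    def __init__(self, max_steps: int = 40) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self, task_id: str, seed: int) -> Dict[str, Any]:
        self.t = 0
        return {"task_id": task_id, "seed": seed, "step": 0}

    def step(self, action: str) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
        self.t += 1
        reward = -0.01 if action == "wait" else 0.0  # waiting penalty from the README
        done = self.t >= self.max_steps
        return {"step": self.t}, reward, done, {}

    def state(self) -> Dict[str, Any]:
        return {"step": self.t}


def run_episode(env: Any, policy: Callable[[Dict[str, Any]], str],
                task_id: str, seed: int, max_steps: int = 1000) -> Tuple[float, int]:
    """Roll out one episode; return (total_reward, steps_taken)."""
    obs = env.reset(task_id=task_id, seed=seed)
    total, steps, done = 0.0, 0, False
    while not done and steps < max_steps:  # guard against envs that never finish
        obs, reward, done, _info = env.step(policy(obs))
        total += float(reward)
        steps += 1
    return total, steps
```

For a random baseline, pass something like `lambda obs: rng.choice(ACTIONS)` with `rng = random.Random(7)` so the rollout is reproducible.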

Tasks

The environment includes 8 progressively challenging tasks across 4 difficulty levels:

Easy

easy_single_pick: Fulfill one urgent thermometer order (40 steps, battery 36)

Medium

medium_multi_item: Two-line order with scan verification (60 steps, battery 34)
obstacle_course: Navigate around 6 obstacles to fulfill two-item order (70 steps, battery 40)

Hard

hard_restock_priority: Three-line order with battery management (85 steps, battery 24)
heavy_lifting: Weight-constrained picking — items weigh 1-4 units, carry capacity 3 (90 steps, battery 32)
stamina_run: Stamina management — movement drains stamina; rest to recover (80 steps, battery 36, stamina 12)

Expert

budget_run: Profit-driven fulfillment — earn $15+ from valued items (70 steps, battery 30, target $15)
gauntlet: All mechanics combined — obstacles + weight + stamina + $20 profit target (120 steps, battery 28, stamina 10, carry capacity 3)
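obstacle_course and gauntlet require routing around impassable cells, and turning is itself an action. A breadth-first search over (row, col, heading) states therefore gives a minimum-action plan built from turn_left, turn_right and move_forward. This is a sketch assuming a heading convention (0=north, 1=east, 2=south, 3=west), a 7×7 grid indexed from (0, 0), and that every action costs one step; plan_route is a hypothetical helper, not part of grid_env.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]          # (row, col)
State = Tuple[int, int, int]    # (row, col, heading)

# ASSUMED heading convention: 0=north, 1=east, 2=south, 3=west.
DELTAS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def plan_route(start: State, goal: Cell, obstacles: Set[Cell],
               size: int = 7) -> Optional[List[str]]:
    """Return a minimum-length action list from start to goal, or None.

    Every action costs one step, so breadth-first search over
    (row, col, heading) states finds a shortest plan.
    """
    parents: Dict[State, Tuple[Optional[State], Optional[str]]] = {start: (None, None)}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        row, col, heading = state
        if (row, col) == goal:
            # Walk parent links back to the start, then reverse.
            plan: List[str] = []
            prev, action = parents[state]
            while prev is not None:
                plan.append(action)
                state = prev
                prev, action = parents[state]
            return plan[::-1]
        dr, dc = DELTAS[heading]
        nr, nc = row + dr, col + dc
        successors = [((row, col, (heading - 1) % 4), "turn_left"),
                      ((row, col, (heading + 1) % 4), "turn_right")]
        # Only step forward into an in-bounds, non-obstacle cell.
        if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in obstacles:
            successors.append(((nr, nc, heading), "move_forward"))
        for nxt, act in successors:
            if nxt not in parents:
                parents[nxt] = (state, act)
                queue.append(nxt)
    return None
```

With a single obstacle directly ahead, the planner detours through the next row: turn, move, turn, two moves, turn, move.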

Each grader returns a deterministic rubric-based score in [0.0, 1.0] based on completion, efficiency, and constraint satisfaction.
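To make the rubric idea concrete, here is a hypothetical deterministic grader combining completion, efficiency and constraint satisfaction into a score clamped to [0.0, 1.0]. The 0.6/0.25/0.15 weights, the efficiency definition and the violation cap are assumptions chosen for illustration; graders.py defines the real rubric.

```python
def rubric_score(lines_fulfilled: int, lines_required: int,
                 steps_used: int, step_limit: int,
                 violations: int, max_violations: int = 5) -> float:
    """Hypothetical rubric; weights and sub-scores are illustrative assumptions."""
    completion = lines_fulfilled / lines_required if lines_required else 1.0
    # Fewer steps relative to the task's step limit means higher efficiency.
    efficiency = 1.0 - min(steps_used, step_limit) / step_limit
    # Each violation (e.g. a wrong pack or collision) removes part of this term.
    constraints = 1.0 - min(violations, max_violations) / max_violations
    score = 0.6 * completion + 0.25 * efficiency + 0.15 * constraints
    # Clamp to [0, 1] and round to 4 decimals for stable reporting.
    return round(min(1.0, max(0.0, score)), 4)
```

Because the score is a pure function of the trajectory summary, the same episode always receives the same grade.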

Reward Design

The reward function provides dense trajectory-wide feedback:

Positive Rewards:

Correct scans: +0.12
Correct picks: +0.20
Correct packs: +0.35
Completion bonus: +0.50
Timely recharge/rest: +0.06 to +0.08

Penalties:

Invalid actions: -0.08 to -0.10
Wrong picks: -0.18
Wrong packs: -0.15
Obstacle collisions: -0.12
Overweight attempts: -0.12
Waiting: -0.01 per step
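Because rewards are additive per event, an episode's return is the sum of the values listed above. The sketch encodes the fixed-value entries in a dictionary; the event names are hypothetical keys, and the two ranged entries (invalid actions, recharge/rest) are left out because their exact value depends on context.

```python
from typing import Iterable

# Fixed-value entries from the reward table; keys are hypothetical event names.
REWARDS = {
    "correct_scan": 0.12,
    "correct_pick": 0.20,
    "correct_pack": 0.35,
    "completion_bonus": 0.50,
    "wrong_pick": -0.18,
    "wrong_pack": -0.15,
    "obstacle_collision": -0.12,
    "overweight_attempt": -0.12,
    "wait": -0.01,
}


def episode_return(events: Iterable[str]) -> float:
    """Sum per-event rewards for one trajectory (unknown events raise KeyError)."""
    return sum(REWARDS[event] for event in events)
```

For example, a clean single-item episode (scan, pick, pack, completion) earns 0.12 + 0.20 + 0.35 + 0.50 = 1.17 before any navigation or waiting costs.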

Money System (expert tasks):

Correct packs earn item value in dollars
Wrong packs lose 50% of item value
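The money rule can be checked with a small ledger: each correct pack adds the item's value and each wrong pack subtracts half of it. pack_profit and profit_vs_target are hypothetical helpers, and the item values used in the check are made up.

```python
from typing import Iterable, Tuple


def pack_profit(item_value: float, correct: bool) -> float:
    """Correct packs earn the full value; wrong packs lose half of it."""
    return item_value if correct else -0.5 * item_value


def profit_vs_target(packs: Iterable[Tuple[float, bool]],
                     target: float) -> Tuple[float, bool]:
    """Return (total profit, target met?) for (value, correct) pairs."""
    profit = sum(pack_profit(value, correct) for value, correct in packs)
    return profit, profit >= target
```

With items worth $8 and $6 packed correctly and a $4 item packed wrongly, profit is $12, short of budget_run's $15 target; one more correct $5 pack brings it to $17.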

Installation & Usage

Install Dependencies

pip install -r requirements.txt

Run Baseline with OpenAI

The baseline runner uses the OpenAI Python SDK with multi-seed evaluation for robust scoring:

export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"

# Single-seed evaluation (backward compatible)
export EVAL_SEEDS="7"
python3 -m grid_env.baseline

# Multi-seed evaluation (default: 5 seeds)
export EVAL_SEEDS="7,42,123,456,789"
python3 -m grid_env.baseline
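EVAL_SEEDS is a comma-separated list of integers. A tolerant parser that ignores stray whitespace and falls back to a default might look like the following; the function name and the default list (taken from the multi-seed example above) are assumptions, not necessarily what baseline.py does.

```python
import os
from typing import List, Optional

# ASSUMED default: the five seeds used in the multi-seed example.
DEFAULT_SEEDS = [7, 42, 123, 456, 789]


def parse_eval_seeds(raw: Optional[str], default: List[int] = DEFAULT_SEEDS) -> List[int]:
    """Turn "7, 42,123" into [7, 42, 123]; empty or missing input gives the default."""
    seeds = [int(part) for part in (raw or "").split(",") if part.strip()]
    return seeds or list(default)


if __name__ == "__main__":
    print(parse_eval_seeds(os.environ.get("EVAL_SEEDS")))
```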

Run Inference Script

For comprehensive evaluation across all 8 tasks with multi-seed support:

# Create .env file with:
# HF_TOKEN=your_api_key
# API_BASE_URL=https://api.openai.com/v1
# MODEL_NAME=gpt-4o-mini
# EVAL_SEEDS=7,42,123,456,789

# Load .env and run
set -a && source .env && set +a
python3 inference.py

Multi-seed evaluation runs each task across multiple random seeds and reports:

Mean score ± standard deviation
Min/max scores across seeds
Success rate across seeds

Evaluating over several seeds reduces the risk of overfitting to a single warehouse layout and gives more reliable performance estimates.
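The summary statistics listed above can be computed with the standard library alone. This sketch assumes population standard deviation and treats a seed as a success when its score reaches a threshold (0.5 here, an assumed value); the project's own success criterion may differ. summarize_seeds is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def summarize_seeds(scores: Sequence[float],
                    success_threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate per-seed scores for one task (scores must be non-empty)."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population std; 0.0 for a single seed
        "min": min(scores),
        "max": max(scores),
        # Fraction of seeds whose score reaches the ASSUMED threshold.
        "success_rate": sum(s >= success_threshold for s in scores) / len(scores),
    }
```

A report line can then be formatted as `f"{r['mean']:.3f} ± {r['std']:.3f}"`.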

Testing

Run the full test suite (205 tests):

pytest tests/ -v

Multi-Seed Evaluation Demo

See how multi-seed evaluation works without requiring an API key:

python3 example_multiseed.py

This runs a random policy across multiple seeds and shows score variance.

Server Deployment

The environment includes a FastAPI server for remote access:

# Local development
uvicorn grid_env.Server.app:app --reload

# Docker deployment
docker build -t warehouse-env .
docker run -p 8000:8000 warehouse-env

API endpoints:

GET /health — Server status
GET /tasks — List all tasks
POST /reset — Reset environment
POST /step — Execute action
GET /state — Get current state
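The endpoints can be called from Python with only the standard library. The JSON bodies sketched here ({"task_id": ..., "seed": ...} for /reset and {"action": ...} for /step) are assumptions about the request schema, not documented fields; check the server's models before relying on them. build_request and call are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # matches the docker run port mapping


def build_request(path: str, payload: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """GET when payload is None, otherwise POST the payload as JSON."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def call(path: str, payload: Optional[Dict[str, Any]] = None,
         timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example session (request bodies are ASSUMED):
# call("/health")
# call("/reset", {"task_id": "easy_single_pick", "seed": 7})
# call("/step", {"action": "move_forward"})
```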

Validation

If openenv is installed, validate the manifest:

openenv validate grid_env/openv.yaml

Project Structure

grid_env/
├── env.py           # Core environment logic
├── tasks.py         # Task definitions (8 tasks)
├── graders.py       # Rubric-based graders
├── models.py        # Pydantic data models
├── baseline.py      # OpenAI baseline runner
├── tools.py         # Action tool definitions
├── world.py         # World state wrapper
├── client.py        # Client facade
└── Server/
    ├── app.py           # FastAPI server
    └── warehouse_env.py # Service wrapper

tests/               # 205 test cases
inference.py         # LLM inference runner