	---
	title: RL-Env Warehouse Fulfillment
	emoji: 🏭
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 8000
	pinned: false
	---

	# RL-Env: Warehouse Fulfillment Environment

	An OpenEnv-style warehouse fulfillment environment of progressively increasing difficulty. It simulates pharmacy micro-fulfillment workflows with obstacles, weight constraints, stamina management, and budget optimization.

	## Overview

	Agents control a warehouse robot navigating a 7×7 grid to fulfill customer orders. Tasks range from simple single-item pickups to complex multi-constraint challenges requiring strategic planning across obstacles, battery management, item weights, stamina conservation, and profit optimization.

	## Requirements Coverage

	- Real-world task: Implemented in [`env.py`](grid_env/env.py)
	- Eight graded tasks: Defined in [`tasks.py`](grid_env/tasks.py) with progressive difficulty
	- Deterministic graders: Implemented in [`graders.py`](grid_env/graders.py)
	- Typed OpenEnv models: Implemented in [`models.py`](grid_env/models.py)
	- OpenEnv manifests: [`openenv.yaml`](openenv.yaml) and [`openv.yaml`](grid_env/openv.yaml)
	- OpenAI baseline runner: [`baseline.py`](grid_env/baseline.py)

	## Core Mechanics

	### Actions
	- Navigation: `turn_left`, `turn_right`, `move_forward`
	- Operations: `scan_bin`, `pick_item`, `pack_item`
	- Resources: `recharge` (battery), `rest` (stamina)
	- Utility: `wait`
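
The nine action names above can be collected into an enumeration so that unknown strings are rejected before they reach the environment. This is a minimal sketch: the `Action` enum and the `parse_action` helper are hypothetical and not part of `grid_env`, whose `WarehouseAction` model may be shaped differently.

```python
from enum import Enum


class Action(str, Enum):
    """Action names listed in this README (the enum itself is hypothetical)."""

    TURN_LEFT = "turn_left"
    TURN_RIGHT = "turn_right"
    MOVE_FORWARD = "move_forward"
    SCAN_BIN = "scan_bin"
    PICK_ITEM = "pick_item"
    PACK_ITEM = "pack_item"
    RECHARGE = "recharge"
    REST = "rest"
    WAIT = "wait"


def parse_action(name: str) -> Action:
    """Return the matching Action; raises ValueError for unknown names."""
    return Action(name.strip().lower())


action = parse_action("move_forward")
```

Because `Action` subclasses `str`, `action.value` can be passed wherever a plain action string is expected.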

	### Advanced Mechanics (task-dependent)
	1. Obstacles: Impassable cells blocking direct paths — agents must route around them
	2. Item Weight & Carry Capacity: Items have weight (1-4 units); heavier items drain more battery while moving
	3. Stamina System: Movement costs stamina; once stamina is depleted, each move costs twice the usual battery
	4. Money & Profit Targets: Items have dollar values; correct packs earn money, wrong packs lose money
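
The weight and stamina rules above interact on every move. The sketch below shows one way such a cost model could look. The README does not give the actual numbers, so `base_cost`, `drain_per_weight`, and the choice to double the whole cost (rather than only the base) are assumptions made for illustration. The default carry capacity of 3 matches the `heavy_lifting` and `gauntlet` task descriptions.

```python
def move_battery_cost(carried_weight: int, stamina: int,
                      base_cost: float = 1.0,
                      drain_per_weight: float = 0.5) -> float:
    """Battery spent on one move under an ASSUMED linear cost model."""
    # Heavier loads drain more battery (the per-unit drain is hypothetical).
    cost = base_cost + drain_per_weight * carried_weight
    if stamina <= 0:
        # README rule: with stamina depleted, movement costs double battery.
        cost *= 2
    return cost


def can_pick(carried_weight: int, item_weight: int, carry_capacity: int = 3) -> bool:
    """True if picking the item keeps the load within carry capacity."""
    return carried_weight + item_weight <= carry_capacity


rested_cost = move_battery_cost(carried_weight=2, stamina=5)
exhausted_cost = move_battery_cost(carried_weight=2, stamina=0)
```

Under these assumed constants, moving while exhausted costs exactly twice as much as moving with the same load while rested.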

	## Environment Interface

	```python
	from grid_env import WarehouseFulfillmentEnv

	env = WarehouseFulfillmentEnv(task_id="easy_single_pick", seed=7)

	# Reset environment
	observation = env.reset(task_id="easy_single_pick", seed=7)

	# Step with action
	observation, reward, done, info = env.step("move_forward")

	# Get full state
	state = env.state()
	```

	All data types (`WarehouseAction`, `WarehouseObservation`, `WarehouseReward`, `WarehouseState`) are Pydantic models.
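
The `reset`/`step` calls above are typically wrapped in an episode loop. The sketch below uses a stand-in `StubEnv` so that it runs without `grid_env` installed. `StubEnv` and `run_episode` are hypothetical names, and treating the reward as a plain float is an assumption: the real environment returns a `WarehouseReward` model, so a numeric field may need to be read from it instead.

```python
from typing import Callable, Optional, Tuple


class StubEnv:
    """Stand-in with the reset/step shape shown above; NOT the real environment."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.steps = 0

    def reset(self, task_id: Optional[str] = None, seed: Optional[int] = None) -> dict:
        self.steps = 0
        return {"steps": self.steps}

    def step(self, action: str) -> Tuple[dict, float, bool, dict]:
        self.steps += 1
        reward = -0.01 if action == "wait" else 0.0  # waiting penalty from the README
        done = self.steps >= self.horizon
        return {"steps": self.steps}, reward, done, {}


def run_episode(env, policy: Callable[[object], str], max_steps: int = 40) -> Tuple[float, int]:
    """Roll out one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        observation, reward, done, _info = env.step(policy(observation))
        total_reward += float(reward)  # assumption: reward is numeric
        steps += 1
        if done:
            break
    return total_reward, steps


total, steps = run_episode(StubEnv(horizon=3), policy=lambda obs: "wait")
```

The `max_steps` cap guards against policies that never reach a terminal state; the default of 40 mirrors the step budget of `easy_single_pick`.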

	## Tasks

	The environment includes 8 progressively challenging tasks across 4 difficulty levels:

	### Easy
	- `easy_single_pick`: Fulfill one urgent thermometer order (40 steps, battery 36)

	### Medium
	- `medium_multi_item`: Two-line order with scan verification (60 steps, battery 34)
	- `obstacle_course`: Navigate around 6 obstacles to fulfill two-item order (70 steps, battery 40)

	### Hard
	- `hard_restock_priority`: Three-line order with battery management (85 steps, battery 24)
	- `heavy_lifting`: Weight-constrained picking — items weigh 1-4 units, carry capacity 3 (90 steps, battery 32)
	- `stamina_run`: Stamina management — movement drains stamina; rest to recover (80 steps, battery 36, stamina 12)

	### Expert
	- `budget_run`: Profit-driven fulfillment — earn $15+ from valued items (70 steps, battery 30, target $15)
	- `gauntlet`: All mechanics combined — obstacles + weight + stamina + $20 profit target (120 steps, battery 28, stamina 10, carry capacity 3)

	Each grader returns a deterministic rubric-based score in `[0.0, 1.0]` based on completion, efficiency, and constraint satisfaction.
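
A rubric that combines completion, efficiency, and constraint satisfaction into a score in `[0.0, 1.0]` could be sketched as follows. The function name, its inputs, and the 0.6 / 0.25 / 0.15 weights are assumptions made for illustration; the actual rubrics live in `graders.py` and may differ.

```python
def rubric_score(items_packed: int, items_required: int,
                 steps_used: int, max_steps: int,
                 constraints_met: bool) -> float:
    """Deterministic score in [0.0, 1.0] from three rubric components."""
    completion = items_packed / items_required if items_required else 0.0
    # Fewer steps used means higher efficiency; never below zero.
    efficiency = max(0.0, 1.0 - steps_used / max_steps)
    constraint = 1.0 if constraints_met else 0.0
    # Hypothetical weights; they sum to 1.0 so the score stays in range.
    score = 0.6 * completion + 0.25 * efficiency + 0.15 * constraint
    return min(1.0, max(0.0, score))


example = rubric_score(items_packed=1, items_required=1,
                       steps_used=20, max_steps=40, constraints_met=True)
```

The function has no randomness or hidden state, so identical episode statistics always produce the same score, which is what makes a grader deterministic.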

	## Reward Design

	The reward function provides dense trajectory-wide feedback:

	Positive Rewards:
	- Correct scans: +0.12
	- Correct picks: +0.20
	- Correct packs: +0.35
	- Completion bonus: +0.50
	- Timely recharge/rest: +0.06 to +0.08

	Penalties:
	- Invalid actions: -0.08 to -0.10
	- Wrong picks: -0.18
	- Wrong packs: -0.15
	- Obstacle collisions: -0.12
	- Overweight attempts: -0.12
	- Waiting: -0.01 per step
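
To see how these per-step values add up over a trajectory, the fixed-value entries can be placed in a lookup table and summed. The event names and the `trajectory_return` helper are hypothetical, and the entries listed as ranges (recharge/rest, invalid actions) are omitted because their exact values depend on context not described here. The sketch also assumes the completion bonus is simply added to the other rewards.

```python
from typing import Iterable

# Fixed reward values copied from the lists above; event names are illustrative.
REWARDS = {
    "correct_scan": 0.12,
    "correct_pick": 0.20,
    "correct_pack": 0.35,
    "completion": 0.50,
    "wrong_pick": -0.18,
    "wrong_pack": -0.15,
    "obstacle_collision": -0.12,
    "overweight_attempt": -0.12,
    "wait": -0.01,
}


def trajectory_return(events: Iterable[str]) -> float:
    """Sum the rewards of a sequence of events, rounded to 2 decimals."""
    return round(sum(REWARDS[event] for event in events), 2)


clean_single_pick = ["correct_scan", "correct_pick", "correct_pack", "completion"]
total = trajectory_return(clean_single_pick)  # 0.12 + 0.20 + 0.35 + 0.50
```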

	Money System (expert tasks):
	- Correct packs earn item value in dollars
	- Wrong packs lose 50% of item value
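
The money rules above reduce to a small calculation. In the sketch below, `pack_earnings` and `profit_summary` are hypothetical helpers and the item values are made-up examples; only the two rules (full value for a correct pack, a 50% loss for a wrong one) and the $15 target of `budget_run` come from this README.

```python
from typing import List, Tuple


def pack_earnings(item_value: float, correct: bool) -> float:
    """Dollars gained (or lost) by packing one item."""
    return item_value if correct else -0.5 * item_value


def profit_summary(packs: List[Tuple[float, bool]], target: float) -> Tuple[float, bool]:
    """Return (total_profit, target_met) for a list of (value, correct) packs."""
    total = sum(pack_earnings(value, correct) for value, correct in packs)
    return total, total >= target


# Hypothetical item values: two correct packs and one wrong pack.
profit, met = profit_summary([(10.0, True), (8.0, True), (6.0, False)], target=15.0)
```

With these example values the wrong pack costs $3, leaving the run exactly at the $15 target.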

	## Installation & Usage

	### Install Dependencies

	```bash
	pip install -r requirements.txt
	```

	### Run Baseline with OpenAI

	The baseline runner uses the OpenAI Python SDK with multi-seed evaluation for robust scoring:

	```bash
	export OPENAI_API_KEY="your_api_key_here"
	export OPENAI_MODEL="gpt-4o-mini"

	# Single-seed evaluation (backward compatible)
	export EVAL_SEEDS="7"
	python3 -m grid_env.baseline

	# Multi-seed evaluation (default: 5 seeds)
	export EVAL_SEEDS="7,42,123,456,789"
	python3 -m grid_env.baseline
	```
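
The `EVAL_SEEDS` variable is a comma-separated list of integers. How `baseline.py` parses it is not shown in this README; the sketch below is one plausible approach, with a hypothetical `parse_seeds` helper and a fallback equal to the five-seed example above.

```python
import os
from typing import List

DEFAULT_SEEDS = "7,42,123,456,789"  # assumption: mirrors the multi-seed example


def parse_seeds(raw: str) -> List[int]:
    """Turn '7, 42,123' into [7, 42, 123], skipping empty entries."""
    return [int(part) for part in raw.split(",") if part.strip()]


seeds = parse_seeds(os.environ.get("EVAL_SEEDS", DEFAULT_SEEDS))
```

A non-numeric entry raises `ValueError`, which surfaces a misconfigured variable early instead of silently dropping a seed.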

	### Run Inference Script

	For comprehensive evaluation across all 8 tasks with multi-seed support:

	```bash
	# Create .env file with:
	# HF_TOKEN=your_api_key
	# API_BASE_URL=https://api.openai.com/v1
	# MODEL_NAME=gpt-4o-mini
	# EVAL_SEEDS=7,42,123,456,789

	# Load .env and run
	set -a && source .env && set +a
	python3 inference.py
	```

	Multi-seed evaluation runs each task across multiple random seeds and reports:
	- Mean score ± standard deviation
	- Min/max scores across seeds
	- Success rate across seeds

	This reduces the risk of overfitting to specific warehouse layouts and gives more reliable performance metrics.
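
The reported statistics can be computed with the standard library. In this sketch the per-seed scores are made-up numbers, `summarize` is a hypothetical helper, and two details are assumptions: the sample standard deviation is used, and a run counts as a success when its score reaches `success_threshold`.

```python
from statistics import mean, stdev
from typing import Dict


def summarize(scores_by_seed: Dict[int, float], success_threshold: float = 0.9) -> Dict[str, float]:
    """Aggregate per-seed grader scores into summary statistics."""
    scores = list(scores_by_seed.values())
    return {
        "mean": mean(scores),
        # stdev needs at least two data points; report 0.0 for a single seed.
        "std": stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
        "success_rate": sum(s >= success_threshold for s in scores) / len(scores),
    }


# Hypothetical scores for one task across the five example seeds.
summary = summarize({7: 1.0, 42: 0.5, 123: 1.0, 456: 0.75, 789: 0.75})
print(f"{summary['mean']:.2f} ± {summary['std']:.2f}")
```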

	## Testing

	Run the full test suite (205 tests):

	```bash
	pytest tests/ -v
	```

	### Multi-Seed Evaluation Demo

	See how multi-seed evaluation works without requiring an API key:

	```bash
	python3 example_multiseed.py
	```

	This runs a random policy across multiple seeds and shows score variance.
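
A seeded random policy of the kind described above can be sketched in a few lines. The contents of `example_multiseed.py` are not shown in this README, so `make_random_policy` is a hypothetical illustration rather than the script's actual code.

```python
import random
from typing import Callable, List

# Action names from the Core Mechanics section.
ACTIONS: List[str] = [
    "turn_left", "turn_right", "move_forward",
    "scan_bin", "pick_item", "pack_item",
    "recharge", "rest", "wait",
]


def make_random_policy(seed: int) -> Callable[[object], str]:
    """Return a policy that ignores the observation and samples uniformly."""
    rng = random.Random(seed)  # private generator, so runs are reproducible

    def policy(_observation: object = None) -> str:
        return rng.choice(ACTIONS)

    return policy


policy = make_random_policy(seed=7)
first_actions = [policy(None) for _ in range(5)]
```

Giving each policy its own `random.Random` instance keeps the action sequence reproducible per seed without touching the global random state.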

	## Server Deployment

	The environment includes a FastAPI server for remote access:

	```bash
	# Local development
	uvicorn grid_env.Server.app:app --reload

	# Docker deployment
	docker build -t warehouse-env .
	docker run -p 8000:8000 warehouse-env
	```

	API endpoints:
	- `GET /health` — Server status
	- `GET /tasks` — List all tasks
	- `POST /reset` — Reset environment
	- `POST /step` — Execute action
	- `GET /state` — Get current state
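
A client for these endpoints can be sketched with the standard library alone. The request bodies are assumptions: the field names `task_id`, `seed`, and `action` mirror the Python interface above, but the server's actual schema is defined in `grid_env/Server/app.py` and may differ. The sketch only builds the requests; sending them requires a running server.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # port from the Docker example above


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request for one endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Payload field names are hypothetical.
reset_req = build_request("POST", "/reset", {"task_id": "easy_single_pick", "seed": 7})
step_req = build_request("POST", "/step", {"action": "move_forward"})
state_req = build_request("GET", "/state")

# With the server running, a request can be sent with:
#   from urllib.request import urlopen
#   with urlopen(reset_req) as response:
#       observation = json.load(response)
```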

	## Validation

	If `openenv` is installed, validate the manifest:

	```bash
	openenv validate grid_env/openv.yaml
	```

	## Project Structure

	```
	grid_env/
	├── env.py # Core environment logic
	├── tasks.py # Task definitions (8 tasks)
	├── graders.py # Rubric-based graders
	├── models.py # Pydantic data models
	├── baseline.py # OpenAI baseline runner
	├── tools.py # Action tool definitions
	├── world.py # World state wrapper
	├── client.py # Client facade
	└── Server/
	    ├── app.py # FastAPI server
	    └── warehouse_env.py # Service wrapper

	tests/ # 205 test cases
	inference.py # LLM inference runner
	```

	## License

	MIT License - Copyright 2026 Soham Bose