Upload env source code

16f1ab9 verified 8 days ago

6.19 kB

	# Vision: SO-ARM 101 Toy-Sorting Pipeline

	## End Goal

	Train a manipulation policy that picks up colored toy objects and drops them
	into matching colored trays, using imitation learning from teleoperated demos
	recorded in Isaac Sim.

	## Pipeline

	```
	┌─────────────────────────────────────────────────────────────────────┐
	│ 1. Simulate │
	│ Isaac Lab ToySortingEnv (Python 3.11 / Isaac Lab 2.3.2) │
	│ • Wooden table + SO-ARM 101 │
	│ • 3 colored trays (red \| green \| blue) │
	│ • 9 colored toys (3 per color, randomized each episode) │
	└─────────────────┬───────────────────────────────────────────────────┘
	│ Phase 2: ZMQ REP :5555
	▼
	┌─────────────────────────────────────────────────────────────────────┐
	│ 2. Collect Demos │
	│ LeRobot / SpaceMouse teleop client (Python 3.12) │
	│ • Sends joint targets to sim via ZMQ │
	│ • Streams obs/actions into LeRobot Dataset v3 format │
	│ • Pushes dataset to HuggingFace Hub │
	└─────────────────┬───────────────────────────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────────────────────────┐
	│ 3. Augment Dataset (optional) │
	│ • Background swap, color jitter, domain randomization │
	│ • Re-label with reward signal for RL fine-tuning │
	└─────────────────┬───────────────────────────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────────────────────────┐
	│ 4. Train Policy │
	│ lerobot train policy=act (or diffusion_policy) │
	│ • Loads dataset from HuggingFace Hub │
	│ • Saves checkpoint to Hub │
	└─────────────────┬───────────────────────────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────────────────────────┐
	│ 5. Evaluate in Sim │
	│ • Roll out policy in ToySortingEnv │
	│ • Log success rate (sort 3 toys correctly in <60 s) │
	│ • Push eval metrics to HuggingFace Hub │
	└─────────────────────────────────────────────────────────────────────┘
	```

	## Container Architecture

	```
	┌────────────────────────────────────────┐ ┌──────────────────────────────────────┐
	│ sim (isaac-lab:2.3.2, Python 3.11) │ │ train (python:3.12-slim + lerobot) │
	│ Isaac Lab ToySortingEnv │◄──►│ LeRobot training / data collection │
	│ ZMQ REP server :5555 (Phase 2) │ZMQ │ IsaacGymClient gymnasium wrapper │
	│ X11 GUI for visualization │ │ HuggingFace dataset push │
	└────────────────────────────────────────┘ └──────────────────────────────────────┘
	```

	## Phases

	\| Phase \| Status \| Description \|
	\|-------\|--------\|-------------\|
	\| 1 \| Done \| Isaac Lab env with real USD assets; X11 visualization \|
	\| 2 \| Scaffolded \| ZMQ bridge + LeRobot demo collection client \|
	\| 3 \| Planned \| Dataset augmentation pipeline \|
	\| 4 \| Planned \| ACT / Diffusion Policy training with LeRobot \|
	\| 5 \| Planned \| Closed-loop evaluation + HF metrics push \|

	## Asset Strategy

	Assets live outside git (large binary files). Two distribution paths:

	1. Developer machine: `python assets/download.py --extract` copies the
	needed USD files from the local `Lightwheel_Xx8T7EPOMd_KitchenRoom/` pack.
	2. Docker / CI: `python assets/download.py --download` fetches the
	pre-extracted subset from HuggingFace Hub (`HF_ASSET_REPO` env var).

	Neither the git repo nor the Docker image contains asset files directly.

	## Success Metric (Phase 5 target)

	> Place all 9 toys into their correct color-matched tray within 60 seconds,
	> measured over 50 random seeds. Target success rate ≥ 80 %.