HW3: model checkpoints

Trained checkpoints for EN.601.495/695 Introduction to Robot Learning, Spring 2026, HW3.

Each subdirectory mirrors starter-code/logs/<algo>/<env>_N/ from github.com/tarcode2004/hw3-rl and contains three files: best_model.zip (saved by the Stable-Baselines3 (SB3) EvalCallback), the evaluations.npz curves, and the run's monitor CSV.
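To inspect a run's per-episode history, the monitor CSV can be read with the standard library alone. The sketch below assumes the file follows the usual SB3 Monitor layout (a `#`-prefixed JSON header line, then columns `r`, `l`, `t` for episode reward, length, and elapsed time). The function name `read_monitor_csv` is a hypothetical helper, and the exact monitor file name in each subdirectory is not stated here, so pass whatever path the run actually contains.

```python
import csv
import json


def read_monitor_csv(path):
    """Return (metadata, episodes) parsed from an SB3 Monitor log.

    Assumes the standard Monitor layout: an optional '#'-prefixed JSON
    header line followed by a CSV table with columns r, l, t.
    """
    with open(path, newline="") as f:
        first = f.readline()
        if first.startswith("#"):
            # Header line looks like: #{"t_start": ..., "env_id": ...}
            metadata = json.loads(first[1:])
        else:
            # No header: rewind so the CSV reader sees the column names.
            metadata = {}
            f.seek(0)
        episodes = []
        for row in csv.DictReader(f):
            episodes.append(
                {
                    "reward": float(row["r"]),
                    "length": int(row["l"]),
                    "time": float(row["t"]),
                }
            )
    return metadata, episodes
```

Any extra columns the run may have logged (for example a success flag) are ignored by this sketch.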

The headline result is TQC + HER + TimeFeatureWrapper on PandaPickAndPlace-v3: 92–98% success over 50 deterministic evaluation episodes. See the standalone repo for that model.
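A deterministic success-rate evaluation of this kind might look roughly like the sketch below. It is not the script that produced the reported numbers. It assumes `gymnasium`, `panda_gym`, and `sb3_contrib` are installed, that the environment reports `is_success` in its `info` dict, and that wrapping the evaluation environment in `TimeFeatureWrapper` matches how the model was trained. The names `success_rate` and `evaluate` are hypothetical.

```python
def success_rate(flags):
    """Fraction of episodes flagged successful (0.0 for an empty list)."""
    flags = list(flags)
    return sum(bool(f) for f in flags) / len(flags) if flags else 0.0


def evaluate(model_path, env_id="PandaPickAndPlace-v3", episodes=50):
    """Roll out a TQC policy deterministically and return its success rate."""
    # Third-party imports stay local so success_rate() works without them.
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda envs)
    from sb3_contrib import TQC
    from sb3_contrib.common.wrappers import TimeFeatureWrapper

    # Assumption: the model was trained with the time feature appended.
    env = TimeFeatureWrapper(gym.make(env_id))
    model = TQC.load(model_path, env=env)

    flags = []
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:
            # deterministic=True disables action sampling noise.
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
        # Assumption: the final info dict carries an is_success flag.
        flags.append(info.get("is_success", False))
    env.close()
    return success_rate(flags)
```

With 50 episodes, each success or failure moves the rate by 2 percentage points, which is one reason a range such as 92–98% is plausible across repeated evaluations.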

Layout

| Path | Algo / wrapper | Env | Notes |
|---|---|---|---|
| zoo3/sac-PandaReach-v3 | zoo3 | sac-PandaReach-v3 | SAC + HER, sparse, converged to 100% by 5k steps |
| zoo3/sac-PandaPush-v3 | zoo3 | sac-PandaPush-v3 | SAC + HER, sparse, killed at 639k by reboot, best_model from peak |
| zoo3/tqc-PandaPickAndPlace-v3 | zoo3 | tqc-PandaPickAndPlace-v3 | TQC + HER + TimeFeatureWrapper, sparse, 92-98% deterministic eval |
| zoo3/ppo-PandaReach-v3 | zoo3 | ppo-PandaReach-v3 | PPO, sparse, converged at ~100K |
| minimal_sac/PandaReachDense-v3 | minimal_sac | PandaReachDense-v3 | vanilla SAC, dense reward, 20K steps |
| minimal_sac/PandaPickAndPlaceDense-v3 | minimal_sac | PandaPickAndPlaceDense-v3 | vanilla SAC, dense reward, 300K steps |
| minimal_sac/PandaPushDense-v3 | minimal_sac | PandaPushDense-v3 | vanilla SAC, dense reward, 300K steps |
| mbrl/PandaReachDense-v3 | mbrl | PandaReachDense-v3 | Basic MBRL (random data + dynamics + SAC on surrogate) |
| mbrl/PandaPickAndPlaceDense-v3 | mbrl | PandaPickAndPlaceDense-v3 | Basic MBRL, partial run (stopped ~220K) |
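Given a local copy of this layout, the checkpoints can be discovered by searching for best_model.zip under the root directory. This is a minimal sketch using only the standard library. `find_checkpoints` is a hypothetical helper, and the root path is whatever directory the files were downloaded into.

```python
from pathlib import Path


def find_checkpoints(root):
    """Map each run directory (relative to root) to its best_model.zip."""
    root = Path(root)
    found = {}
    # rglob walks all subdirectories, so any nesting depth is handled.
    for model in sorted(root.rglob("best_model.zip")):
        run_dir = model.parent.relative_to(root)
        found[run_dir.as_posix()] = model
    return found
```

For the layout listed above, the keys would be entries such as zoo3/sac-PandaReach-v3, provided each directory actually holds a best_model.zip.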