How to use tarmus/hw3-rl-models with stable-baselines3:

```python
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(
    repo_id="tarmus/hw3-rl-models",
    filename="{MODEL FILENAME}.zip",
)
```
HW3: model checkpoints
Trained checkpoints for EN.601.495/695 Introduction to Robot Learning, Spring 2026, HW3.
Each subdirectory mirrors `starter-code/logs/<algo>/<env>_N/` from
github.com/tarcode2004/hw3-rl
and contains `best_model.zip` (saved by SB3's `EvalCallback`),
the `evaluations.npz` curves, and the run's monitor CSV.
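
As a rough illustration of how the `evaluations.npz` curves can be read, the sketch below averages the per-episode rewards and episode lengths for each evaluation round. It assumes the keys that SB3's `EvalCallback` writes (`timesteps`, `results`, `ep_lengths`); the function names `summarize_evaluations` and `load_evaluations` are hypothetical helpers, not part of this repo or of SB3.

```python
def summarize_evaluations(data):
    """Return (timestep, mean_reward, mean_episode_length) per evaluation round.

    `data` is any mapping with the keys EvalCallback writes:
    "timesteps" has one entry per round; "results" and "ep_lengths"
    have one row per round and one column per evaluation episode.
    """
    rows = []
    for step, rewards, lengths in zip(
        data["timesteps"], data["results"], data["ep_lengths"]
    ):
        mean_reward = float(sum(rewards)) / len(rewards)
        mean_length = float(sum(lengths)) / len(lengths)
        rows.append((int(step), mean_reward, mean_length))
    return rows


def load_evaluations(path):
    """Open an evaluations.npz file (hypothetical helper; needs NumPy)."""
    import numpy as np  # imported lazily so the summary above works without it

    return np.load(path)
```

A downloaded file could then be summarized with `summarize_evaluations(load_evaluations("evaluations.npz"))`, where the local path is a placeholder.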
The headline result is TQC + HER + TimeFeatureWrapper on PandaPickAndPlace-v3: 92-98% success on 50 deterministic eval episodes. See the standalone repo for that model.
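
For context, a success rate over 50 episodes moves in steps of 2%, so 92-98% corresponds to 46 to 49 successful episodes. The sketch below shows one way such a deterministic evaluation could be run. It is not necessarily the evaluation script used for these numbers: it assumes a Gymnasium-style environment whose `info` dict carries an `is_success` flag, a model exposing SB3's `predict(obs, deterministic=True)`, and an environment wrapped the same way as during training (for example with TimeFeatureWrapper). The name `success_rate` is hypothetical.

```python
def success_rate(model, env, n_episodes=50):
    """Fraction of episodes whose final step reports info["is_success"]."""
    successes = 0
    for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        while not done:
            # deterministic=True asks the policy for its mode action
            # instead of sampling from the action distribution.
            action, _state = model.predict(obs, deterministic=True)
            obs, _reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
        # Count the episode as solved only if the last step flagged success.
        successes += bool(info.get("is_success", False))
    return successes / n_episodes
```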
Layout
| Path | Algo / wrapper | Env | Notes |
|---|---|---|---|
| zoo3/sac-PandaReach-v3 | zoo3 | sac-PandaReach-v3 | SAC + HER, sparse, converged 100% by 5k steps |
| zoo3/sac-PandaPush-v3 | zoo3 | sac-PandaPush-v3 | SAC + HER, sparse, killed at 639k by reboot, best_model from peak |
| zoo3/tqc-PandaPickAndPlace-v3 | zoo3 | tqc-PandaPickAndPlace-v3 | TQC + HER + TimeFeatureWrapper, sparse, 92-98% deterministic eval |
| zoo3/ppo-PandaReach-v3 | zoo3 | ppo-PandaReach-v3 | PPO sparse, converged ~100K |
| minimal_sac/PandaReachDense-v3 | minimal_sac | PandaReachDense-v3 | vanilla SAC, dense reward, 20K steps |
| minimal_sac/PandaPickAndPlaceDense-v3 | minimal_sac | PandaPickAndPlaceDense-v3 | vanilla SAC, dense reward, 300K steps |
| minimal_sac/PandaPushDense-v3 | minimal_sac | PandaPushDense-v3 | vanilla SAC, dense reward, 300K steps |
| mbrl/PandaReachDense-v3 | mbrl | PandaReachDense-v3 | Basic MBRL (random data + dynamics + SAC on surrogate) |
| mbrl/PandaPickAndPlaceDense-v3 | mbrl | PandaPickAndPlaceDense-v3 | Basic MBRL, partial run (stopped ~220K) |
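
The paths in the table can be turned into `filename` arguments for `load_from_hub`. The sketch below assumes that each run's `best_model.zip` sits directly under the listed path; the actual repo may nest it in a run-numbered subdirectory (the `_N` suffix mentioned above), so check the file listing first. `checkpoint_filename` and `download_checkpoint` are hypothetical helper names.

```python
def checkpoint_filename(run_path, artifact="best_model.zip"):
    """Join a Layout path and an artifact name into a hub filename.

    Assumption: the artifact sits directly under `run_path` in the repo.
    """
    return f"{run_path.strip('/')}/{artifact}"


def download_checkpoint(run_path, repo_id="tarmus/hw3-rl-models"):
    """Download one checkpoint and return its local file path."""
    # Imported lazily so the path helper works without huggingface_sb3.
    from huggingface_sb3 import load_from_hub

    return load_from_hub(repo_id=repo_id, filename=checkpoint_filename(run_path))
```

For example, `download_checkpoint("zoo3/sac-PandaReach-v3")` would request `zoo3/sac-PandaReach-v3/best_model.zip` under that assumption.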