tarmus committed on
Commit
cd4d30d
·
verified ·
1 Parent(s): 7f0ae21

Upload README.md with huggingface_hub

  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ license: mit
+ tags:
+ - reinforcement-learning
+ - panda-gym
+ - stable-baselines3
+ - sb3-contrib
+ - hw3
+ ---
+
+ # HW3: model checkpoints
+
+ Trained checkpoints for *EN.601.495/695 Introduction to Robot Learning, Spring 2026, HW3*.
+
+ Each subdirectory mirrors `starter-code/logs/<algo>/<env>_N/` from
+ [github.com/tarcode2004/hw3-rl](https://github.com/tarcode2004/hw3-rl)
+ and contains `best_model.zip` (saved by SB3's `EvalCallback`),
+ the `evaluations.npz` evaluation curves, and the run's monitor CSV.
+
+ The headline result is **TQC + HER + TimeFeatureWrapper on
+ PandaPickAndPlace-v3**: 92–98% success on 50 deterministic eval
+ episodes. See the [standalone repo](https://huggingface.co/tarmus/tqc-PandaPickAndPlace-v3)
+ for that model.
+
+ ## Layout
+
+ | Path | `<algo>` directory | `<env>` directory | Notes |
+ |------|--------------------|-------------------|-------|
+ | `zoo3/sac-PandaReach-v3` | zoo3 | sac-PandaReach-v3 | SAC + HER, sparse reward, converged to 100% by 5K steps |
+ | `zoo3/sac-PandaPush-v3` | zoo3 | sac-PandaPush-v3 | SAC + HER, sparse reward, run killed at 639K steps by a reboot; `best_model.zip` is from the peak |
+ | `zoo3/tqc-PandaPickAndPlace-v3` | zoo3 | tqc-PandaPickAndPlace-v3 | TQC + HER + TimeFeatureWrapper, sparse reward, 92–98% success on deterministic eval |
+ | `zoo3/ppo-PandaReach-v3` | zoo3 | ppo-PandaReach-v3 | PPO, sparse reward, converged at ~100K steps |
+ | `minimal_sac/PandaReachDense-v3` | minimal_sac | PandaReachDense-v3 | vanilla SAC, dense reward, 20K steps |
+ | `minimal_sac/PandaPickAndPlaceDense-v3` | minimal_sac | PandaPickAndPlaceDense-v3 | vanilla SAC, dense reward, 300K steps |
+ | `minimal_sac/PandaPushDense-v3` | minimal_sac | PandaPushDense-v3 | vanilla SAC, dense reward, 300K steps |
+ | `mbrl/PandaReachDense-v3` | mbrl | PandaReachDense-v3 | Basic MBRL (random data + dynamics + SAC on surrogate) |
+ | `mbrl/PandaPickAndPlaceDense-v3` | mbrl | PandaPickAndPlaceDense-v3 | Basic MBRL, partial run (stopped at ~220K steps) |
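
Each run directory listed above includes a monitor CSV. The sketch below shows one way to read such a file with the standard library only. It assumes the default layout written by SB3's `Monitor` wrapper: a first line starting with `#` that holds JSON metadata, then a header row with `r` (episode reward), `l` (episode length), and `t` (wall-clock time). An `is_success` column is present only if the wrapper was configured to log it, which is not confirmed for these runs. The names `read_monitor_csv`, `summarize`, and `SAMPLE` are hypothetical, and the sample rows are invented for illustration; they are not results from these checkpoints.

```python
import csv
import json

# Invented sample in the assumed SB3 Monitor layout (not real run data).
SAMPLE = """#{"t_start": 0.0, "env_id": "PandaReach-v3"}
r,l,t,is_success
-50.0,50,1.2,False
-3.0,4,1.5,True
-2.0,3,1.7,True
-1.0,2,1.9,True
"""


def _to_number(value):
    """Convert a CSV cell to float, mapping 'True'/'False' to 1.0/0.0."""
    if value in ("True", "False"):
        return 1.0 if value == "True" else 0.0
    return float(value)


def read_monitor_csv(text):
    """Parse monitor CSV text into (metadata dict, list of episode dicts)."""
    lines = text.splitlines()
    metadata = {}
    # The optional first line is '#' followed by a JSON object.
    if lines and lines[0].startswith("#"):
        metadata = json.loads(lines[0][1:])
        lines = lines[1:]
    episodes = [
        {key: _to_number(value) for key, value in row.items()}
        for row in csv.DictReader(lines)
    ]
    return metadata, episodes


def summarize(episodes, last_n=50):
    """Average reward, length, and (if logged) success over the last episodes."""
    window = episodes[-last_n:]
    if not window:
        raise ValueError("no episodes to summarize")
    summary = {
        "episodes": len(window),
        "mean_reward": sum(e["r"] for e in window) / len(window),
        "mean_length": sum(e["l"] for e in window) / len(window),
    }
    # Only report a success rate when every episode carries the column.
    if all("is_success" in e for e in window):
        summary["success_rate"] = sum(e["is_success"] for e in window) / len(window)
    return summary


if __name__ == "__main__":
    meta, eps = read_monitor_csv(SAMPLE)
    print(meta.get("env_id"), summarize(eps, last_n=4))
```

For a real run, pass the contents of the downloaded CSV file to `read_monitor_csv`. The monitor file records training episodes, so its numbers need not match the deterministic evaluation figures stored in `evaluations.npz`.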