DavidH2802 committed
Commit a8629b2 · verified · 1 Parent(s): 2931280

Update README.md

  1. README.md +148 -3
README.md CHANGED
@@ -1,3 +1,148 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - reinforcement-learning
5
+ - sac
6
+ - pytorch
7
+ - isaac-lab
8
+ - robotics
9
+ - locomotion
10
+ library_name: pytorch
11
+ model-index:
12
+ - name: SAC-Ant
13
+ results: []
14
+ ---
15
+
16
+ # SAC-Ant
17
+
18
+ A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.
19
+
20
+ **GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
21
+
22
+ <p align="center">
23
+ <img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
24
+ </p>
25
+
26
+ ## Model Description
27
+
28
+ The model is a squashed Gaussian policy (the Actor) that drives the locomotion of a multi-legged Ant robot. The policy outputs continuous joint-level actions, which tanh squashes into the range (-1, 1).
29
+
30
+ ### Architecture
31
+
32
+ - **Actor:** MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
33
+ - **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
34
+
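The actor's two heads (mean and log-std) and the tanh squashing described above can be illustrated with a minimal pure-Python sketch. It is not the repository's `Actor` class: the function names, the `1e-6` stabilizer, and the use of plain lists instead of tensors are assumptions made for illustration. It shows how a deterministic action is the tanh of the mean, and how a stochastic sample's log-probability picks up a change-of-variables correction for the tanh.

```python
import math
import random


def deterministic_action(mean):
    """Evaluation-time action: tanh of the Gaussian mean, one value per joint."""
    return [math.tanh(m) for m in mean]


def sample_action(mean, log_std, rng=random):
    """Draw a stochastic action and its log-probability under a squashed Gaussian.

    `mean` and `log_std` stand in for the two output heads of the actor
    (hypothetical plain-list inputs, one entry per action dimension).
    """
    action, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)
        eps = rng.gauss(0.0, 1.0)
        u = m + std * eps            # pre-squash Gaussian sample
        a = math.tanh(u)             # squash into (-1, 1)
        # Log-density of u under N(m, std^2), written in terms of eps.
        log_prob += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
        # Change-of-variables correction for tanh; 1e-6 is an assumed stabilizer.
        log_prob -= math.log(1.0 - a * a + 1e-6)
        action.append(a)
    return action, log_prob
```

At evaluation time only `deterministic_action` is relevant; the sampled variant and its log-probability matter during training, where the entropy term of SAC depends on them.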
35
+ ## Training Details
36
+
37
+ ### Hyperparameters
38
+
39
+ | Parameter | Value |
40
+ |---|---|
41
+ | Task | Isaac-Ant-Direct-v0 |
42
+ | Parallel Envs | 4096 |
43
+ | Actor LR | 3e-4 |
44
+ | Critic LR | 3e-4 |
45
+ | Alpha LR | 3e-4 |
46
+ | Discount (γ) | 0.99 |
47
+ | Polyak (τ) | 0.005 |
48
+ | Initial Alpha | 1.0 |
49
+ | Batch Size | 2048 |
50
+ | Buffer Capacity | 1,000,000 |
51
+ | Warmup Steps | 200 |
52
+ | Total Steps | 50,000 |
53
+ | Total Transitions | ~205M |
54
+ | Training Time | ~45 minutes |
55
+
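A few of the table's numbers can be cross-checked with simple arithmetic. The sketch below assumes that one "step" advances all 4096 environments once and that the replay buffer stores one transition per environment per step; both are assumptions consistent with the ~205M figure but not stated explicitly. The `polyak_update` helper is a hypothetical illustration of the soft target update that τ controls, not the repository's code.

```python
NUM_ENVS = 4096
TOTAL_STEPS = 50_000
BUFFER_CAPACITY = 1_000_000
TRAINING_MINUTES = 45      # approximate value from the table
TAU = 0.005

# One step across all parallel environments yields NUM_ENVS transitions.
total_transitions = NUM_ENVS * TOTAL_STEPS

# Average wall-clock throughput, including simulation and gradient updates.
transitions_per_second = total_transitions / (TRAINING_MINUTES * 60)

# Number of most recent environment steps the buffer can hold at once.
steps_held_in_buffer = BUFFER_CAPACITY // NUM_ENVS


def polyak_update(target, online, tau=TAU):
    """Soft target update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]
```

Under these assumptions the total comes to 204,800,000 transitions, which matches the ~205M entry, and the buffer covers only the most recent few hundred steps of experience.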
56
+ ### Hardware
57
+
58
+ - **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
59
+ - **CPU:** Intel Xeon E5-2686 v4
60
+ - **Cloud:** vast.ai
61
+
62
+ ### Observation Normalization
63
+
64
+ The checkpoint includes the running mean and variance statistics used for observation normalization. These **must** be restored at inference time; without them, the policy receives unnormalized inputs and will not perform correctly.
65
+
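As a rough illustration of what restoring these statistics achieves, the sketch below standardizes an observation with a stored mean and variance. The exact formula used by the repository's `RunningMeanStd.normalize` is not shown in this README, so the function name, the `eps` value, and the absence of clipping here are assumptions.

```python
import math


def normalize(obs, mean, var, eps=1e-8):
    """Standardize each observation dimension with stored running statistics.

    `eps` is an assumed small constant that guards against division by zero.
    """
    return [(o - m) / math.sqrt(v + eps) for o, m, v in zip(obs, mean, var)]
```

For example, an observation value of 3.0 with a stored mean of 1.0 and variance of 4.0 maps to roughly 1.0, which is the scale the policy saw during training.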
66
+ ## How to Use
67
+
68
+ ### Download
69
+
70
+ ```python
71
+ from huggingface_hub import hf_hub_download
72
+
73
+ checkpoint_path = hf_hub_download(
74
+     repo_id="DavidH2802/SAC-Ant",
75
+     filename="final_policy.pt",
76
+ )
77
+ ```
78
+
79
+ ### Inference
80
+
81
+ Clone the full project for the model and environment code:
82
+
83
+ ```bash
84
+ git clone https://github.com/DavidH2802/SAC-from-scratch.git
85
+ cd SAC-from-scratch
86
+ ```
87
+
88
+ Then load and run the policy:
89
+
90
+ ```python
91
+ import torch
92
+ from src.model import Actor
93
+ from src.utils.normalization import RunningMeanStd
94
+
95
+ checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)  # or pass the checkpoint_path returned by hf_hub_download
96
+
97
+ # Restore actor (obs_dim and act_dim are the environment's observation and action dimensions)
98
+ actor = Actor(obs_dim, act_dim).to("cuda")
99
+ actor.load_state_dict(checkpoint["actor"])
100
+ actor.eval()
101
+
102
+ # Restore observation normalization (required)
103
+ obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
104
+ obs_rms.mean = checkpoint["obs_rms_mean"]
105
+ obs_rms.var = checkpoint["obs_rms_var"]
106
+
107
+ # Run policy
108
+ obs_norm = obs_rms.normalize(obs) # obs from env
109
+ with torch.no_grad():
110
+     action = actor.get_deterministic_action(obs_norm) # deterministic (mean action)
111
+ ```
112
+
113
+ ### Full Evaluation with Isaac Lab
114
+
115
+ See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions, including Isaac Lab installation and the `eval.py` script for video recording.
116
+
117
+ ## Checkpoint Contents
118
+
119
+ The `final_policy.pt` file contains:
120
+
121
+ | Key | Description |
122
+ |---|---|
123
+ | `actor` | Actor network state dict |
124
+ | `obs_rms_mean` | Running mean for observation normalization |
125
+ | `obs_rms_var` | Running variance for observation normalization |
126
+
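Before building the actor, it can help to confirm that a loaded checkpoint actually carries the keys listed above. The helper below is a hypothetical convenience, not part of the repository; it works on any dict-like object, such as the one returned by `torch.load`, and ignores extra keys (for example critic weights, if present).

```python
EXPECTED_KEYS = {"actor", "obs_rms_mean", "obs_rms_var"}


def missing_checkpoint_keys(checkpoint):
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return sorted(EXPECTED_KEYS - set(checkpoint))
```

An empty list means all three entries are available; anything else names what still has to be supplied before inference.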
127
+ ## Framework
128
+
129
+ - **Algorithm:** SAC (from scratch, no RL library dependencies)
130
+ - **Deep Learning:** PyTorch
131
+ - **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
132
+ - **Environment:** Isaac-Ant-Direct-v0
133
+
134
+ ## Citation
135
+
136
+ ```bibtex
137
+ @misc{habinski2026sac,
138
+   author = {David Habinski},
139
+   title = {SAC from Scratch in PyTorch with Isaac Lab},
140
+   year = {2026},
141
+   publisher = {GitHub},
142
+   url = {https://github.com/DavidH2802/SAC-from-scratch}
143
+ }
144
+ ```
145
+
146
+ ## License
147
+
148
+ MIT