---
license: mit
tags:
  - reinforcement-learning
  - sac
  - pytorch
  - isaac-lab
  - robotics
  - locomotion
library_name: pytorch
model-index:
  - name: SAC-Ant
    results: []
---

# SAC-Ant

A Soft Actor-Critic (SAC) policy, implemented from scratch in PyTorch and trained on the `Isaac-Ant-Direct-v0` task in NVIDIA Isaac Lab with 4096 GPU-parallel environments.

**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)

<p align="center">
  <img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
</p>

## Model Description

The model is a squashed Gaussian policy (the actor) that controls the four-legged Ant robot for locomotion. It outputs continuous joint-level actions, squashed through tanh into the range [-1, 1].
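
For readers new to squashed Gaussian policies, the sketch below shows the standard SAC recipe: sample from a Gaussian with the predicted mean and standard deviation, apply tanh, and correct the log-probability for the change of variables. It is a generic illustration built on `torch.distributions.Normal`, not code from the repository; the function name `sample_squashed_gaussian` and the `eps` constant are assumptions.

```python
import torch
from torch.distributions import Normal


def sample_squashed_gaussian(mean: torch.Tensor, log_std: torch.Tensor, eps: float = 1e-6):
    """Hypothetical helper: sample a tanh-squashed Gaussian action and its log-probability.

    mean, log_std: tensors of shape (batch, act_dim) from the actor's two heads.
    Returns (action in (-1, 1), log_prob of shape (batch, 1)).
    """
    dist = Normal(mean, log_std.exp())
    u = dist.rsample()          # reparameterized sample, so gradients flow to mean/log_std
    action = torch.tanh(u)      # squash into (-1, 1)
    # Change of variables: log pi(a) = log N(u) - sum_i log(1 - tanh(u_i)^2)
    log_prob = dist.log_prob(u) - torch.log(1.0 - action.pow(2) + eps)
    return action, log_prob.sum(dim=-1, keepdim=True)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Illustrative sizes only: batch of 4, action dimension 3
    a, lp = sample_squashed_gaussian(torch.zeros(4, 3), torch.zeros(4, 3))
    print(a.shape, lp.shape)  # torch.Size([4, 3]) torch.Size([4, 1])
```

The tanh correction term is what keeps the entropy estimate used by SAC consistent with the bounded actions actually sent to the robot.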

### Architecture

- **Actor:** MLP (obs → 256 → 256) with ReLU activations and two output heads, one for the mean and one for the state-dependent log-std. Actions are squashed through tanh.
- **Q-Networks (×2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations. The critics are used only during training and are not needed for inference.
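
The layer sizes above map onto PyTorch modules roughly as follows. This is a hypothetical sketch, not the repository's `src/model.py`: the class names `ActorSketch` and `QNetworkSketch`, the log-std clamp range [-20, 2], and the placement of LayerNorm after each hidden linear layer are assumptions.

```python
import torch
import torch.nn as nn


class ActorSketch(nn.Module):
    """Hypothetical actor: obs -> 256 -> 256 trunk with separate mean and log-std heads."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)  # state-dependent log-std

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        # Assumed clamp range, a common choice to keep the std numerically stable
        log_std = self.log_std_head(h).clamp(-20.0, 2.0)
        return self.mean_head(h), log_std

    @torch.no_grad()
    def deterministic_action(self, obs: torch.Tensor) -> torch.Tensor:
        mean, _ = self.forward(obs)
        return torch.tanh(mean)  # squashed mean action


class QNetworkSketch(nn.Module):
    """Hypothetical critic: (obs, action) -> 256 -> 256 -> 1 with LayerNorm + ReLU."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action], dim=-1))
```

Standard SAC trains two such critics (plus slowly updated target copies) and uses the minimum of their estimates to curb overestimation; only the actor is required to run the trained policy.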

## Training Details

### Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 (vectorized env steps) |
| Total Transitions | ~205M (50,000 × 4096) |
| Training Time | ~45 minutes |
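
The discount, Polyak, and alpha entries in the table enter the SAC update roughly as sketched below. This is a generic illustration of standard SAC, not the repository's training loop; the function names are hypothetical, parameterizing the temperature as `log_alpha` is an assumption, and the target entropy of `-act_dim` is a common heuristic rather than a documented setting of this run.

```python
import torch
import torch.nn as nn

GAMMA = 0.99  # Discount (γ) from the table
TAU = 0.005   # Polyak (τ) from the table


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, alpha):
    """Soft Bellman target: r + γ (1 - done) (min(Q1', Q2') - α log π(a'|s'))."""
    next_v = torch.min(next_q1, next_q2) - alpha * next_log_prob
    return reward + GAMMA * (1.0 - done) * next_v


@torch.no_grad()
def polyak_update(target: nn.Module, online: nn.Module, tau: float = TAU) -> None:
    """In place: θ_target <- (1 - τ) θ_target + τ θ_online."""
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.mul_(1.0 - tau).add_(o_param, alpha=tau)


def alpha_loss(log_alpha: torch.Tensor, log_prob: torch.Tensor, target_entropy: float) -> torch.Tensor:
    """Temperature loss that drives the policy entropy toward target_entropy."""
    return -(log_alpha * (log_prob + target_entropy).detach()).mean()


if __name__ == "__main__":
    act_dim = 3  # illustrative
    log_alpha = torch.zeros(1, requires_grad=True)      # exp(0) = 1.0, i.e. Initial Alpha
    alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)  # Alpha LR from the table
    loss = alpha_loss(log_alpha, torch.randn(2048, 1), target_entropy=-float(act_dim))
    alpha_opt.zero_grad()
    loss.backward()
    alpha_opt.step()
```

A small τ such as 0.005 makes the target critics trail the online critics slowly, which stabilizes the bootstrapped targets at the cost of slower propagation of value updates.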

### Hardware

- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai

### Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time: without them, the policy receives unnormalized inputs and will not perform correctly.
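
Statistics of this kind are typically maintained with a batched (parallel) mean/variance merge over each batch of observations from the parallel environments. The class below is a hypothetical stand-in for the repository's `RunningMeanStd`; its name, the prior count `eps`, the `1e-8` variance floor, and the ±10 clip are assumptions, so use the repository's class when loading the actual checkpoint.

```python
import torch


class RunningMeanStdSketch:
    """Hypothetical tracker of per-feature mean and variance over observation batches."""

    def __init__(self, shape, eps: float = 1e-4, device: str = "cpu"):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = eps  # small prior count avoids division by zero

    def update(self, batch: torch.Tensor) -> None:
        """Merge the statistics of a (num_envs, obs_dim) batch into the running estimate."""
        batch_mean = batch.mean(dim=0)
        batch_var = batch.var(dim=0, unbiased=False)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        # Combine sums of squared deviations from both parts, then renormalize
        m2 = (self.var * self.count + batch_var * batch_count
              + delta.pow(2) * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs: torch.Tensor, clip: float = 10.0) -> torch.Tensor:
        """Standardize observations with the current statistics and clip outliers."""
        return ((obs - self.mean) / torch.sqrt(self.var + 1e-8)).clamp(-clip, clip)
```

At inference time only `normalize` is called and `update` is skipped, so the statistics stay exactly as saved; this is why the checkpoint has to carry `obs_rms_mean` and `obs_rms_var`.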

## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```

### Inference

Clone the repository to get the model and environment code:

```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```

Then, from the repository root (so that `src` is importable), load and run the policy:

```python
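import torch
from src.model import Actor
from src.utils.normalization import RunningMeanStd

device = "cuda"
# Load the checkpoint (or pass checkpoint_path from the download step)
checkpoint = torch.load("final_policy.pt", map_location=device, weights_only=True)

# Dimensions must match Isaac-Ant-Direct-v0; obs_dim can be read from the saved stats
obs_dim = checkpoint["obs_rms_mean"].shape[0]
act_dim = env.action_space.shape[-1]  # env: the Isaac Lab Ant environment (see eval.py)

# Restore actor
actor = Actor(obs_dim, act_dim).to(device)
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device=device)
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy: obs is a (num_envs, obs_dim) observation tensor from the env
obs_norm = obs_rms.normalize(obs)
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)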
```

### Full Evaluation with Isaac Lab

See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions, including Isaac Lab installation, and for the `eval.py` script used to record evaluation videos.

## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
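
A checkpoint with exactly these keys can be written and validated with plain `torch.save` and `torch.load`. The helper names `save_policy_checkpoint` and `load_and_check` are hypothetical and the tensor shapes are illustrative; the sketch only shows that a dict of a state dict and plain tensors round-trips with `weights_only=True`, the loading mode used in the inference snippet.

```python
import io

import torch
import torch.nn as nn

REQUIRED_KEYS = {"actor", "obs_rms_mean", "obs_rms_var"}


def save_policy_checkpoint(f, actor: nn.Module, obs_mean: torch.Tensor, obs_var: torch.Tensor) -> None:
    """Hypothetical helper: save only what inference needs (actor weights + normalization stats)."""
    torch.save(
        {"actor": actor.state_dict(), "obs_rms_mean": obs_mean.cpu(), "obs_rms_var": obs_var.cpu()},
        f,
    )


def load_and_check(f, map_location="cpu") -> dict:
    """Hypothetical helper: load with weights_only=True and verify the expected keys and shapes."""
    ckpt = torch.load(f, map_location=map_location, weights_only=True)
    missing = REQUIRED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    if ckpt["obs_rms_mean"].shape != ckpt["obs_rms_var"].shape:
        raise ValueError("obs_rms_mean and obs_rms_var have different shapes")
    return ckpt


if __name__ == "__main__":
    buf = io.BytesIO()  # in-memory file; a path such as "final_policy.pt" works the same way
    save_policy_checkpoint(buf, nn.Linear(4, 2), torch.zeros(4), torch.ones(4))
    buf.seek(0)
    print(sorted(load_and_check(buf)))  # ['actor', 'obs_rms_mean', 'obs_rms_var']
```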

## Framework

- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0

## Citation

```bibtex
@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}
```

## License

MIT