---
license: mit
tags:
- reinforcement-learning
- sac
- pytorch
- isaac-lab
- robotics
- locomotion
library_name: pytorch
model-index:
- name: SAC-Ant
  results: []
---
# SAC-Ant
A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.
**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
<p align="center">
<img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
</p>
## Model Description
The model is a squashed Gaussian policy (the actor) that controls the locomotion of a multi-legged Ant robot. It outputs continuous joint-level actions squashed through tanh.
### Architecture
- **Actor:** MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
- **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
## Training Details
### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
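
To illustrate how the Polyak coefficient from the table is typically applied, here is a minimal sketch of a soft target-network update. The function name `soft_update` and the toy networks are hypothetical; the repository's training loop may organize this differently.

```python
import copy

import torch
import torch.nn as nn

TAU = 0.005  # Polyak coefficient from the hyperparameter table


@torch.no_grad()
def soft_update(online: nn.Module, target: nn.Module, tau: float = TAU) -> None:
    """In-place update: target <- (1 - tau) * target + tau * online."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(1.0 - tau).add_(p_online, alpha=tau)


# Toy usage: the target network starts as a copy and slowly tracks the online one.
q_net = nn.Linear(4, 1)
q_target = copy.deepcopy(q_net)
soft_update(q_net, q_target)
```

With τ = 0.005, each update moves the target parameters 0.5% of the way toward the online parameters, which keeps the bootstrapped Q-targets slowly varying.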
### Hardware
- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai
### Observation Normalization
The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time. Without them, the policy receives unnormalized inputs and will not perform correctly.
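
For intuition, running normalization statistics of this kind can be maintained as sketched below. This is not the repository's `RunningMeanStd`: the class name `SketchRunningMeanStd`, the batch-merge update, and the clip value are assumptions made for illustration.

```python
import torch


class SketchRunningMeanStd:
    """Hypothetical running mean/variance tracker using a parallel-variance merge."""

    def __init__(self, shape, device="cpu", eps=1e-4):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = eps  # small prior count avoids division by zero

    def update(self, batch: torch.Tensor) -> None:
        """Merge the statistics of a (N, *shape) batch into the running estimate."""
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        new_mean = self.mean + delta * b_count / total
        m2 = (
            self.var * self.count
            + b_var * b_count
            + delta.pow(2) * self.count * b_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs: torch.Tensor, clip: float = 10.0) -> torch.Tensor:
        # The clip range is an assumption, not taken from the repository.
        z = (obs - self.mean) / torch.sqrt(self.var + 1e-8)
        return z.clamp(-clip, clip)
```

At inference time only `mean` and `var` matter: they are loaded from the checkpoint and no further `update` calls are made, so the policy sees inputs on the same scale as during training.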
## How to Use
### Download
```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```
### Inference
Clone the full project for the model and environment code:
```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```
Then load and run the policy:
```python
import torch

from src.model import Actor
from src.utils.normalization import RunningMeanStd

# obs_dim and act_dim must be set to the task's observation and action sizes
# (read them from the Isaac Lab environment) before running this snippet.

# Path to the downloaded checkpoint (e.g. checkpoint_path from the download step)
checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy
obs_norm = obs_rms.normalize(obs)  # obs from env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
```
### Full Evaluation with Isaac Lab
See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions including Isaac Lab installation and the `eval.py` script for video recording.
## Checkpoint Contents
The `final_policy.pt` file contains:
| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
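
Since the normalization statistics are required at inference time, it can help to check a loaded checkpoint against the keys listed above before building the policy. The helper below is a hypothetical sketch (the name `missing_checkpoint_keys` is not part of the repository) and works on any dictionary returned by `torch.load`.

```python
# Keys taken from the checkpoint contents table above.
REQUIRED_KEYS = ("actor", "obs_rms_mean", "obs_rms_var")


def missing_checkpoint_keys(checkpoint: dict) -> list:
    """Return the required keys that are absent from a loaded checkpoint dict."""
    return [key for key in REQUIRED_KEYS if key not in checkpoint]


# Toy usage with a stand-in dictionary instead of a real checkpoint file.
example = {"actor": {}, "obs_rms_mean": None}
missing = missing_checkpoint_keys(example)
if missing:
    print(f"Checkpoint is missing keys: {missing}")
```

Failing early on a missing key gives a clearer error than a `KeyError` raised in the middle of policy construction.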
## Framework
- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0
## Citation
```bibtex
@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}
```
## License
MIT