---
license: mit
tags:
- reinforcement-learning
- sac
- pytorch
- isaac-lab
- robotics
- locomotion
library_name: pytorch
model-index:
- name: SAC-Ant
results: []
---
# SAC-Ant
A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.
**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
## Model Description
The model is a squashed Gaussian policy (actor) that learns locomotion for a multi-legged Ant robot. The policy outputs continuous joint-level actions, squashed through tanh into [-1, 1].
### Architecture
- **Actor:** MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
- **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (used during training; not needed for inference, and `final_policy.pt` stores only the actor and normalization statistics).
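The actor architecture above can be sketched as follows. This is a minimal illustration, not the repository's exact `src.model.Actor`; the log-std clamp range is an assumed convention, and the state-dependent log-std head follows the description above:

```python
import torch
import torch.nn as nn

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed clamp range, a common SAC convention


class Actor(nn.Module):
    """Squashed Gaussian policy: obs -> 256 -> 256 -> (mean, log_std) heads."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mean_head = nn.Linear(256, act_dim)
        self.log_std_head = nn.Linear(256, act_dim)  # state-dependent log-std

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(LOG_STD_MIN, LOG_STD_MAX)
        std = log_std.exp()
        # Reparameterized sample, squashed through tanh into [-1, 1]
        return torch.tanh(mean + std * torch.randn_like(mean))

    def get_deterministic_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Mean of the Gaussian, squashed through tanh (used at evaluation time)
        return torch.tanh(self.mean_head(self.trunk(obs)))
```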
## Training Details
### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
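The Polyak (τ) entry refers to the soft target-network update used by SAC. A minimal sketch of that update, using the table's τ = 0.005 as the default (illustrative; the repository's training loop may differ in detail):

```python
import torch
import torch.nn as nn


def polyak_update(target_net: nn.Module, online_net: nn.Module, tau: float = 0.005) -> None:
    """Soft-update target parameters: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    with torch.no_grad():
        for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p_online)
```

With τ = 0.005, the target networks trail the online Q-networks slowly, which stabilizes the bootstrapped targets.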
### Hardware
- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai
### Observation Normalization
The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time — without them, the policy receives unnormalized inputs and will not perform correctly.
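A sketch of what such running statistics typically look like (illustrative only; the actual class lives in `src/utils/normalization.py` and may differ, e.g. in how it aggregates batches):

```python
import torch


class RunningMeanStd:
    """Tracks a running mean and variance of observations for input normalization."""

    def __init__(self, shape, device: str = "cpu", eps: float = 1e-8):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch: torch.Tensor) -> None:
        # Merge batch statistics into the running estimate (parallel variance formula)
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m2 = self.var * self.count + b_var * b_count + delta.pow(2) * self.count * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / torch.sqrt(self.var + 1e-8)
```

At inference time only `mean` and `var` are needed; they are restored directly from the checkpoint rather than re-estimated.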
## How to Use
### Download
```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```
### Inference
Clone the full project for the model and environment code:
```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```
Then load and run the policy:
```python
import torch

from src.model import Actor
from src.utils.normalization import RunningMeanStd

checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor (obs_dim and act_dim come from the environment's spaces)
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run the policy on a normalized observation
obs_norm = obs_rms.normalize(obs)  # obs comes from the env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean) action
```
### Full Evaluation with Isaac Lab
See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions including Isaac Lab installation and the `eval.py` script for video recording.
## Checkpoint Contents
The `final_policy.pt` file contains:
| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
## Framework
- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0
## Citation
```bibtex
@misc{habinski2026sac,
  author    = {David Habinski},
  title     = {SAC from Scratch in PyTorch with Isaac Lab},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/DavidH2802/SAC-from-scratch}
}
```
## License
MIT