# SAC-Ant

A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the Isaac-Ant-Direct-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
## Model Description

The model is a squashed Gaussian policy (Actor) that controls the locomotion of a multi-legged Ant robot. The policy outputs continuous joint-level actions squashed through tanh.
### Architecture

- Actor: MLP (obs → 256 → 256) with ReLU activations and two output heads for the mean and a state-dependent log-std; actions are squashed through tanh (sketched below).
- Q-Networks (x2): MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
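A minimal PyTorch sketch of the actor described above, using the layer sizes from this card (the class name, the log-std clamp bounds, and everything except `get_deterministic_action` are illustrative assumptions, not taken from the repository):

```python
import torch
import torch.nn as nn

class ActorSketch(nn.Module):
    """Squashed Gaussian policy: obs -> 256 -> 256 (ReLU), two output heads."""

    LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # common SAC clamp bounds (assumed)

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mean_head = nn.Linear(256, act_dim)     # head 1: Gaussian mean
        self.log_std_head = nn.Linear(256, act_dim)  # head 2: state-dependent log-std

    def get_deterministic_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Deterministic control: squash the Gaussian mean through tanh.
        return torch.tanh(self.mean_head(self.trunk(obs)))
```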
## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
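For reference, the transition count is just the rollout arithmetic: 4096 parallel environments × 50,000 steps = 204.8M ≈ 205M transitions. The Polyak row is the coefficient of SAC's soft target-network update; a minimal sketch with τ = 0.005 (function and variable names are illustrative, not taken from the repository):

```python
import torch

@torch.no_grad()
def polyak_update(target_net: torch.nn.Module, online_net: torch.nn.Module,
                  tau: float = 0.005) -> None:
    # Soft update: theta_target <- (1 - tau) * theta_target + tau * theta_online
    for p_targ, p in zip(target_net.parameters(), online_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(p, alpha=tau)
```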
### Hardware
- GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
- CPU: Intel Xeon E5-2686 v4
- Cloud: vast.ai
## Observation Normalization
The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time — without them, the policy receives unnormalized inputs and will not perform correctly.
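Concretely, the applied transform is the standard running-statistics normalization; a one-line sketch (the epsilon value is an assumption, not taken from the repository):

```python
# Sketch: standardize observations with the stored running statistics.
obs_norm = (obs - obs_rms.mean) / torch.sqrt(obs_rms.var + 1e-8)  # eps assumed
```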
## How to Use

### Download
```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```
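The returned `checkpoint_path` is the local path to the cached file; you can pass it to `torch.load` in place of the literal `final_policy.pt` used below.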
### Inference
Clone the full project for the model and environment code:
```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```
Then load and run the policy:
```python
import torch

from src.model import Actor
from src.utils.normalization import RunningMeanStd

checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor. obs_dim and act_dim must match the Isaac-Ant-Direct-v0
# observation/action sizes (take them from the environment spec).
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy
obs_norm = obs_rms.normalize(obs)  # obs from env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
```
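For a quick sanity check outside the full Isaac Lab setup, a hypothetical rollout loop could look like this (a Gymnasium-style `reset`/`step` API is assumed here; the actual Isaac Lab wrappers differ, see `eval.py` in the repository):

```python
# Hypothetical evaluation loop; `env` and its API are illustrative.
obs, _ = env.reset()
done = False
while not done:
    obs_t = torch.as_tensor(obs, dtype=torch.float32, device="cuda")
    with torch.no_grad():
        action = actor.get_deterministic_action(obs_rms.normalize(obs_t))
    obs, reward, terminated, truncated, _ = env.step(action.cpu().numpy())
    done = terminated or truncated
```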
### Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions, including Isaac Lab installation and the `eval.py` script for video recording.
## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
## Framework
- Algorithm: SAC (from scratch, no RL library dependencies)
- Deep Learning: PyTorch
- Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- Environment: Isaac-Ant-Direct-v0
## Citation

```bibtex
@misc{habinski2026sac,
  author    = {David Habinski},
  title     = {SAC from Scratch in PyTorch with Isaac Lab},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/DavidH2802/SAC-from-scratch}
}
```
## License
MIT