# SAC-Ant

A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the Isaac-Ant-Direct-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
## Model Description

The model is a squashed Gaussian policy (Actor) that controls the locomotion of a multi-legged Ant robot. The policy outputs continuous joint-level actions squashed through tanh.
### Architecture

- Actor: MLP (obs → 256 → 256) with ReLU activations and two output heads for the mean and a state-dependent log-std; actions are squashed through tanh (sketched below).
- Q-Networks (x2): MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
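A minimal PyTorch sketch of the actor described above, using the layer sizes from this card (the class name, the log-std clamp bounds, and everything except `get_deterministic_action` are illustrative assumptions, not taken from the repository):

```python
import torch
import torch.nn as nn

class ActorSketch(nn.Module):
    """Squashed Gaussian policy: obs -> 256 -> 256 (ReLU), two output heads."""

    LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # common SAC clamp bounds (assumed)

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mean_head = nn.Linear(256, act_dim)     # head 1: Gaussian mean
        self.log_std_head = nn.Linear(256, act_dim)  # head 2: state-dependent log-std

    def get_deterministic_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Deterministic control: squash the Gaussian mean through tanh.
        return torch.tanh(self.mean_head(self.trunk(obs)))
```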
## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
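For reference, the transition count is just the rollout arithmetic: 4096 parallel environments × 50,000 steps = 204.8M ≈ 205M transitions. The Polyak row is the coefficient of SAC's soft target-network update; a minimal sketch with τ = 0.005 (function and variable names are illustrative, not taken from the repository):

```python
import torch

@torch.no_grad()
def polyak_update(target_net: torch.nn.Module, online_net: torch.nn.Module,
                  tau: float = 0.005) -> None:
    # Soft update: theta_target <- (1 - tau) * theta_target + tau * theta_online
    for p_targ, p in zip(target_net.parameters(), online_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(p, alpha=tau)
```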
### Hardware
- GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
- CPU: Intel Xeon E5-2686 v4
- Cloud: vast.ai
## Observation Normalization
The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time — without them, the policy receives unnormalized inputs and will not perform correctly.
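Concretely, the applied transform is the standard running-statistics normalization; a one-line sketch (the epsilon value is an assumption, not taken from the repository):

```python
# Sketch: standardize observations with the stored running statistics.
obs_norm = (obs - obs_rms.mean) / torch.sqrt(obs_rms.var + 1e-8)  # eps assumed
```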
## How to Use

### Download
```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```
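The returned `checkpoint_path` is the local path to the cached file; you can pass it to `torch.load` in place of the literal `final_policy.pt` used below.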
### Inference
Clone the full project for the model and environment code:
```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```
Then load and run the policy:
```python
import torch

from src.model import Actor
from src.utils.normalization import RunningMeanStd

checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor. obs_dim and act_dim must match the Isaac-Ant-Direct-v0
# observation/action sizes (take them from the environment spec).
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy
obs_norm = obs_rms.normalize(obs)  # obs from env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
```
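For a quick sanity check outside the full Isaac Lab setup, a hypothetical rollout loop could look like this (a Gymnasium-style `reset`/`step` API is assumed here; the actual Isaac Lab wrappers differ, see `eval.py` in the repository):

```python
# Hypothetical evaluation loop; `env` and its API are illustrative.
obs, _ = env.reset()
done = False
while not done:
    obs_t = torch.as_tensor(obs, dtype=torch.float32, device="cuda")
    with torch.no_grad():
        action = actor.get_deterministic_action(obs_rms.normalize(obs_t))
    obs, reward, terminated, truncated, _ = env.step(action.cpu().numpy())
    done = terminated or truncated
```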
### Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions, including Isaac Lab installation and the `eval.py` script for video recording.
## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
## Framework
- Algorithm: SAC (from scratch, no RL library dependencies)
- Deep Learning: PyTorch
- Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- Environment: Isaac-Ant-Direct-v0
## Citation

```bibtex
@misc{habinski2026sac,
  author    = {David Habinski},
  title     = {SAC from Scratch in PyTorch with Isaac Lab},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/DavidH2802/SAC-from-scratch}
}
```
## License
MIT