SAC-Ant

A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the Isaac-Ant-Direct-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: DavidH2802/SAC-from-scratch

Ant Locomotion Policy

Model Description

The model is a squashed Gaussian policy (Actor) that controls a multi-legged Ant robot for locomotion. The policy outputs continuous joint-level actions squashed through tanh.

Architecture

  • Actor: MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
  • Q-Networks (x2): MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
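
For reference, here is a minimal sketch of an actor with this shape. It is illustrative only (the class name SquashedGaussianActor is made up here); the authoritative implementation is the Actor class in src/model.py.

import torch
import torch.nn as nn

class SquashedGaussianActor(nn.Module):
    """Illustrative actor: obs -> 256 -> 256, heads for mean and state-dependent log-std, tanh squash."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mean_head = nn.Linear(256, act_dim)
        self.log_std_head = nn.Linear(256, act_dim)  # state-dependent log-std

    def forward(self, obs):
        h = self.trunk(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)  # common clamp range in SAC implementations
        std = log_std.exp()
        # Reparameterized sample, squashed to (-1, 1)
        pre_tanh = mean + std * torch.randn_like(mean)
        action = torch.tanh(pre_tanh)
        # Log-prob with the tanh change-of-variables correction
        log_prob = torch.distributions.Normal(mean, std).log_prob(pre_tanh)
        log_prob = log_prob - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1, keepdim=True)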

Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M (4096 envs × 50,000 steps) |
| Training Time | ~45 minutes |
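
The discount and Polyak coefficients enter training in the standard SAC way. Below is a minimal sketch of the soft target update, included for context only; it is not the repository's exact code.

import torch

@torch.no_grad()
def polyak_update(target_net, online_net, tau=0.005):
    """Soft target update: target <- (1 - tau) * target + tau * online."""
    for p_targ, p in zip(target_net.parameters(), online_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(tau * p)

# For context, the critic regression target in SAC has the form
#   y = r + gamma * (1 - done) * (min_i Q_target_i(s', a') - alpha * log_pi(a' | s'))
# with gamma = 0.99 and tau = 0.005 as in the table above.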

Hardware

  • GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
  • CPU: Intel Xeon E5-2686 v4
  • Cloud: vast.ai

Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time — without them, the policy receives unnormalized inputs and will not perform correctly.
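
For intuition, a minimal sketch of what such running statistics look like is shown below; the authoritative version is RunningMeanStd in src/utils/normalization.py, and the class name and update rule here are illustrative assumptions.

import torch

class RunningMeanStdSketch:
    """Tracks per-dimension mean/variance so observations can be normalized to roughly zero mean, unit variance."""
    def __init__(self, shape, device="cpu", eps=1e-4):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = eps

    def update(self, batch):
        # Parallel-variance merge of batch statistics into the running statistics
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        self.var = (self.var * self.count + b_var * b_count
                    + delta.pow(2) * self.count * b_count / total) / total
        self.count = total

    def normalize(self, x, eps=1e-8):
        return (x - self.mean) / torch.sqrt(self.var + eps)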

How to Use

Download

from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)

Inference

Clone the full project for the model and environment code:

git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch

Then load and run the policy:

import torch
from src.model import Actor
from src.utils.normalization import RunningMeanStd

checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor (obs_dim and act_dim come from the Isaac-Ant-Direct-v0 environment spaces)
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy
obs_norm = obs_rms.normalize(obs)  # obs from env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
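
As a usage example, here is a minimal rollout loop. The env object and its reset()/step() signature are hypothetical stand-ins for an environment wrapper that returns observation tensors on the actor's device; the actual Isaac Lab evaluation script is eval.py in the repository.

# Hypothetical gym-style rollout; `env` is a stand-in, not an Isaac Lab API
obs, _ = env.reset()
done = False
while not done:
    obs_norm = obs_rms.normalize(obs)
    with torch.no_grad():
        action = actor.get_deterministic_action(obs_norm)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated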

Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions including Isaac Lab installation and the eval.py script for video recording.

Checkpoint Contents

The final_policy.pt file contains:

| Key | Description |
|---|---|
| actor | Actor network state dict |
| obs_rms_mean | Running mean for observation normalization |
| obs_rms_var | Running variance for observation normalization |
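
To sanity-check a downloaded file against this layout, something along these lines is enough (a sketch; the checkpoint may also contain training-only entries such as the Q-networks mentioned above):

import torch

checkpoint = torch.load("final_policy.pt", map_location="cpu", weights_only=True)
print(sorted(checkpoint.keys()))          # should include at least 'actor', 'obs_rms_mean', 'obs_rms_var'
print(checkpoint["obs_rms_mean"].shape)   # one entry per observation dimension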

Framework

  • Algorithm: SAC (from scratch, no RL library dependencies)
  • Deep Learning: PyTorch
  • Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
  • Environment: Isaac-Ant-Direct-v0

Citation

@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}

License

MIT
