---
license: mit
tags:
  - reinforcement-learning
  - sac
  - pytorch
  - isaac-lab
  - robotics
  - locomotion
library_name: pytorch
model-index:
  - name: SAC-Ant
    results: []
---

# SAC-Ant

A Soft Actor-Critic (SAC) policy, implemented from scratch in PyTorch and trained on the `Isaac-Ant-Direct-v0` task in NVIDIA Isaac Lab with 4096 GPU-parallel environments.

**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)

<p align="center">
  <img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
</p>

## Model Description

The model is a squashed Gaussian policy (the actor) that drives the locomotion of a multi-legged Ant robot. It outputs continuous joint-level actions, each passed through tanh so that it stays within [-1, 1].

### Architecture

- **Actor:** MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
- **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
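
The actor described above can be sketched roughly as follows. This is a minimal illustration, not the repository's `src.model.Actor`: the class name `SquashedGaussianActor`, the method names `sample` and `deterministic_action`, and the log-std clamp range are assumptions made for the example.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed clamp range for log-std; the repository may use other values.
LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0


class SquashedGaussianActor(nn.Module):
    """Illustrative actor: obs -> 256 -> 256 -> (mean, log_std), tanh-squashed."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(LOG_STD_MIN, LOG_STD_MAX)
        return mean, log_std

    def sample(self, obs: torch.Tensor):
        """Return a stochastic action and its log-probability."""
        mean, log_std = self(obs)
        std = log_std.exp()
        u = mean + std * torch.randn_like(mean)  # reparameterized pre-squash sample
        action = torch.tanh(u)
        # Gaussian log-density of u, summed over action dimensions
        log_prob = (-0.5 * ((u - mean) / std) ** 2 - log_std
                    - 0.5 * math.log(2 * math.pi)).sum(-1)
        # tanh change of variables: log(1 - tanh(u)^2), in a numerically stable form
        log_prob = log_prob - (2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))).sum(-1)
        return action, log_prob

    def deterministic_action(self, obs: torch.Tensor) -> torch.Tensor:
        """Return tanh(mean), the action typically used at evaluation time."""
        mean, _ = self(obs)
        return torch.tanh(mean)
```

During training, SAC uses the stochastic `sample` path together with the log-probability for its entropy term. At evaluation time only the squashed mean is needed.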

## Training Details

### Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
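
The transition count in the table follows from the step count and the number of parallel environments, assuming every step collects one transition from each environment. A quick check of that arithmetic, with the throughput implied by the approximate training time:

```python
num_envs = 4096
total_steps = 50_000
training_minutes = 45  # approximate figure from the table

# Assumption: each step yields one transition per parallel environment.
total_transitions = num_envs * total_steps
throughput = total_transitions / (training_minutes * 60)

print(f"{total_transitions:,} transitions")  # 204,800,000 transitions
print(f"{throughput:,.0f} transitions/s")    # 75,852 transitions/s
```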

### Hardware

- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai

### Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time. Without them, the policy receives unnormalized inputs that do not match the distribution it was trained on, and it will not perform correctly.
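
To make the role of these statistics concrete, here is a framework-free sketch of running observation normalization. The class `RunningStats` is a hypothetical stand-in for the repository's `RunningMeanStd`, and the `eps` value and batch-merge formula are assumptions about a typical implementation, not a copy of the actual one.

```python
import math


class RunningStats:
    """Hypothetical per-dimension running mean/variance tracker."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.count = 0
        self.eps = eps  # assumed stabilizer; the repository's value may differ

    def update(self, batch: list[list[float]]) -> None:
        """Merge a batch of observations into the running statistics."""
        n = len(batch)
        total = self.count + n
        for d in range(len(self.mean)):
            col = [row[d] for row in batch]
            b_mean = sum(col) / n
            b_var = sum((x - b_mean) ** 2 for x in col) / n
            delta = b_mean - self.mean[d]
            # Parallel-variance merge of the old statistics with the batch
            m2 = (self.var[d] * self.count + b_var * n
                  + delta ** 2 * self.count * n / total)
            self.mean[d] += delta * n / total
            self.var[d] = m2 / total
        self.count = total

    def normalize(self, obs: list[float]) -> list[float]:
        """Shift and scale one observation with the stored statistics."""
        return [(x - m) / math.sqrt(v + self.eps)
                for x, m, v in zip(obs, self.mean, self.var)]
```

At inference time only `normalize` matters: the stored mean and variance are loaded from the checkpoint and no further updates are made.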

## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```

### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```

Then load and run the policy:

```python
import torch
from src.model import Actor
from src.utils.normalization import RunningMeanStd

device = "cuda"

# Use the path returned by hf_hub_download, or a local copy of the file
checkpoint = torch.load("final_policy.pt", map_location=device, weights_only=True)

# obs_dim and act_dim are the environment's observation and action sizes.
# They are not defined in this snippet; read them from the Isaac Lab env you create.

# Restore actor
actor = Actor(obs_dim, act_dim).to(device)
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device=device)
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy; obs is the observation tensor returned by the env, on the same device
obs_norm = obs_rms.normalize(obs)
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
```

### Full Evaluation with Isaac Lab

See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions including Isaac Lab installation and the `eval.py` script for video recording.

## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
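
Before building the actor, it can help to confirm that a loaded checkpoint has the keys listed above. A small sketch, where `missing_checkpoint_keys` is a hypothetical helper that is not part of the repository:

```python
EXPECTED_KEYS = {"actor", "obs_rms_mean", "obs_rms_var"}


def missing_checkpoint_keys(checkpoint: dict) -> set[str]:
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return EXPECTED_KEYS - set(checkpoint)
```

Calling it on the dictionary returned by `torch.load` yields an empty set when all three entries are present. Extra keys are ignored.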

## Framework

- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0

## Citation

```bibtex
@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}
```

## License

MIT