---
license: mit
tags:
- reinforcement-learning
- sac
- pytorch
- isaac-lab
- robotics
- locomotion
library_name: pytorch
model-index:
- name: SAC-Ant
  results: []
---

# SAC-Ant

A Soft Actor-Critic (SAC) policy for the `Isaac-Ant-Direct-v0` task, trained with a from-scratch PyTorch implementation in NVIDIA Isaac Lab using 4096 GPU-parallel environments.

**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)

<p align="center">
<img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
</p>

## Model Description

The model is a squashed Gaussian policy (the actor) that drives the locomotion of a multi-legged Ant robot. It outputs continuous joint-level actions, squashed through tanh.

### Architecture

- **Actor:** MLP (obs → 256 → 256) with ReLU activations and two output heads, one for the mean and one for the state-dependent log-std. Actions are squashed through tanh.
- **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
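
To make the tanh squashing concrete, here is a minimal pure-Python sketch of how a squashed Gaussian turns a mean and log-std into a bounded action. It is not the repository's `Actor` class: the function names `deterministic_action` and `sample_action` are hypothetical, and the `1e-6` stabilizer inside the logarithm is an assumed value.

```python
import math
import random


def deterministic_action(mean):
    """Evaluation-time action: squash the Gaussian mean through tanh."""
    return [math.tanh(m) for m in mean]


def sample_action(mean, log_std, rng):
    """Sample u ~ N(mean, std^2), squash with tanh, return (action, log_prob)."""
    action, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)
        u = rng.gauss(m, std)  # pre-squash Gaussian sample
        a = math.tanh(u)       # bounded action in [-1, 1]
        # Gaussian log-density of the pre-squash sample u.
        log_prob += -0.5 * ((u - m) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # Change-of-variables correction for a = tanh(u); 1e-6 is an assumed stabilizer.
        log_prob -= math.log(1.0 - a * a + 1e-6)
        action.append(a)
    return action, log_prob


rng = random.Random(0)
action, log_prob = sample_action([0.0, 0.5], [-1.0, -1.0], rng)
```

The log-probability correction matters only during training, where SAC uses it in the entropy term. At inference the deterministic path (tanh of the mean) is enough.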

## Training Details

### Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |
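
The table can be mirrored as a small config object, which also shows where the ~205M transition count comes from. This is a sketch: `SACConfig` and its field names are hypothetical and not taken from the repository. It assumes each vectorized step yields one transition per environment and that warmup is counted in vectorized steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SACConfig:
    """Hypothetical container mirroring the hyperparameter table."""

    task: str = "Isaac-Ant-Direct-v0"
    num_envs: int = 4096
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    alpha_lr: float = 3e-4
    gamma: float = 0.99
    tau: float = 0.005
    init_alpha: float = 1.0
    batch_size: int = 2048
    buffer_capacity: int = 1_000_000
    warmup_steps: int = 200
    total_steps: int = 50_000

    @property
    def total_transitions(self) -> int:
        # Assumption: one transition per environment per vectorized step.
        return self.total_steps * self.num_envs

    @property
    def warmup_transitions(self) -> int:
        # Assumption: warmup is counted in vectorized steps.
        return self.warmup_steps * self.num_envs


cfg = SACConfig()
print(f"{cfg.total_transitions:,}")   # 204,800,000
print(f"{cfg.warmup_transitions:,}")  # 819,200
```

Under these assumptions, 50,000 steps × 4096 environments gives 204.8M transitions, which matches the rounded figure in the table.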

### Hardware

- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai

### Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time; without them, the policy receives unnormalized inputs and will not perform correctly.
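
For readers unfamiliar with running statistics, the sketch below shows one common way to maintain a per-dimension mean and variance over batches and apply them to an observation. It is a pure-Python illustration, not the repository's `RunningMeanStd`: the class name `RunningStats`, the batched merge formula, and the `eps` value are all assumptions.

```python
import math


class RunningStats:
    """Hypothetical per-dimension running mean and (population) variance."""

    def __init__(self, dim, eps=1e-8):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.count = 0
        self.eps = eps  # assumed stabilizer for the division

    def update(self, batch):
        """Merge a batch of observations (list of rows) into the statistics."""
        n = len(batch)
        total = self.count + n
        for j in range(len(self.mean)):
            col = [row[j] for row in batch]
            b_mean = sum(col) / n
            b_var = sum((x - b_mean) ** 2 for x in col) / n
            delta = b_mean - self.mean[j]
            # Combine sums of squared deviations from the old data and the batch.
            m2 = (self.var[j] * self.count + b_var * n
                  + delta ** 2 * self.count * n / total)
            self.mean[j] += delta * n / total
            self.var[j] = m2 / total
        self.count = total

    def normalize(self, obs):
        """Return (obs - mean) / sqrt(var + eps), element by element."""
        return [(x - m) / math.sqrt(v + self.eps)
                for x, m, v in zip(obs, self.mean, self.var)]


stats = RunningStats(dim=2)
stats.update([[1.0, 2.0], [3.0, 4.0]])
stats.update([[5.0, 6.0]])
normalized = stats.normalize([3.0, 4.0])
```

At inference time no further `update` calls are needed. The stored mean and variance are loaded once and only `normalize` is applied, which is why the statistics must ship with the checkpoint.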

## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```

### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```

Then load and run the policy:

```python
import torch
from src.model import Actor
from src.utils.normalization import RunningMeanStd

# Use the local file, or the checkpoint_path returned by hf_hub_download.
# weights_only=True restricts unpickling to tensors and basic containers.
checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore actor (obs_dim and act_dim must match the environment's
# observation and action sizes)
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run policy
obs_norm = obs_rms.normalize(obs)  # obs from env
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
```

### Full Evaluation with Isaac Lab

See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions, including Isaac Lab installation and the `eval.py` script for video recording.

## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
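
Because a missing normalization entry silently degrades the policy, it can help to check the loaded dictionary against the keys listed above before building the actor. The helper below is a hypothetical sketch and not part of the repository; it works on any dict-like object.

```python
# Keys listed in the table above.
EXPECTED_KEYS = ("actor", "obs_rms_mean", "obs_rms_var")


def missing_checkpoint_keys(checkpoint):
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return [key for key in EXPECTED_KEYS if key not in checkpoint]


# Stand-in dict for illustration; in practice pass the result of torch.load(...).
example = {"actor": {}, "obs_rms_mean": [0.0]}
print(missing_checkpoint_keys(example))  # ['obs_rms_var']
```

An empty list means all three entries are present and the inference steps can proceed.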

## Framework

- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0

## Citation

```bibtex
@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}
```

## License

MIT