---
license: mit
tags:
- reinforcement-learning
- sac
- pytorch
- isaac-lab
- robotics
- locomotion
library_name: pytorch
model-index:
- name: SAC-Ant
  results: []
---

# SAC-Ant

A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)

*Ant Locomotion Policy*

## Model Description

The model is a squashed Gaussian policy (Actor) that controls a multi-legged Ant robot to locomote. The policy outputs continuous joint-level actions squashed through tanh.

### Architecture

- **Actor:** MLP (obs → 256 → 256) with ReLU activations and two output heads for the mean and a state-dependent log-std. Actions are squashed through tanh.
- **Q-Networks (×2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in the checkpoint but not needed for inference).

## Training Details

### Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Ant-Direct-v0 |
| Parallel Envs | 4096 |
| Actor LR | 3e-4 |
| Critic LR | 3e-4 |
| Alpha LR | 3e-4 |
| Discount (γ) | 0.99 |
| Polyak (τ) | 0.005 |
| Initial Alpha | 1.0 |
| Batch Size | 2048 |
| Buffer Capacity | 1,000,000 |
| Warmup Steps | 200 |
| Total Steps | 50,000 |
| Total Transitions | ~205M |
| Training Time | ~45 minutes |

### Hardware

- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2686 v4
- **Cloud:** vast.ai

### Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time; without them, the policy receives unnormalized inputs and will not perform correctly.
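For reference, the normalization itself is the standard running mean/variance scheme: observations are standardized with the stored statistics before being passed to the actor. The sketch below shows a minimal version of this; the class body, the `eps` constant, and the exact interface are assumptions made for illustration, not the repository's actual `RunningMeanStd` implementation (only the checkpoint keys `obs_rms_mean` and `obs_rms_var` are confirmed by this card).

```python
import torch


class RunningMeanStd:
    """Minimal running mean/variance normalizer (illustrative sketch)."""

    def __init__(self, shape, device="cpu", eps=1e-8):
        # Statistics are overwritten from the checkpoint at inference time.
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.eps = eps

    def normalize(self, obs: torch.Tensor) -> torch.Tensor:
        # Standardize observations with the stored running statistics.
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)
```

Because only the statistics (not the full class state) live in the checkpoint, any implementation that applies `(obs - mean) / sqrt(var)` with those tensors will reproduce the training-time input distribution.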
## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/SAC-Ant",
    filename="final_policy.pt",
)
```

### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/SAC-from-scratch.git
cd SAC-from-scratch
```

Then load and run the policy:

```python
import torch

from src.model import Actor
from src.utils.normalization import RunningMeanStd

checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

# Restore the actor. obs_dim and act_dim must match the
# Isaac-Ant-Direct-v0 observation and action spaces.
actor = Actor(obs_dim, act_dim).to("cuda")
actor.load_state_dict(checkpoint["actor"])
actor.eval()

# Restore observation normalization (required)
obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
obs_rms.mean = checkpoint["obs_rms_mean"]
obs_rms.var = checkpoint["obs_rms_var"]

# Run the policy on a normalized observation
obs_norm = obs_rms.normalize(obs)  # obs from the environment
with torch.no_grad():
    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean) action
```

### Full Evaluation with Isaac Lab

See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions, including Isaac Lab installation and the `eval.py` script for video recording.

## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |

## Framework

- **Algorithm:** SAC (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Ant-Direct-v0

## Citation

```bibtex
@misc{habinski2026sac,
  author = {David Habinski},
  title = {SAC from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/SAC-from-scratch}
}
```

## License

MIT