+---
+license: mit
+tags:
+  - reinforcement-learning
+  - sac
+  - pytorch
+  - isaac-lab
+  - robotics
+  - locomotion
+library_name: pytorch
+model-index:
+  - name: SAC-Ant
+    results: []
+---
+# SAC-Ant
+A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.
+**GitHub Repository:** [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)
+<p align="center">
+  <img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
+</p>
+## Model Description
+The model is a squashed Gaussian policy (the actor) that drives the locomotion of a multi-legged Ant robot. It outputs continuous joint-level actions, squashed through tanh so that every component lies in (-1, 1).
+### Architecture
+- **Actor:** MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
+- **Q-Networks (x2):** MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations (included in checkpoint but not needed for inference).
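The actor described above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's `Actor` class: the name `SketchActor`, its method names, the log-std clamp range, and the `1e-6` stabilizer are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class SketchActor(nn.Module):
    """Illustrative squashed Gaussian actor (hypothetical, not the repo's Actor)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Shared trunk: obs -> 256 -> 256 with ReLU activations.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Two output heads: mean and state-dependent log-std.
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        mean = self.mean_head(h)
        # Clamp range is an assumption (a common SAC choice), not from the model card.
        log_std = self.log_std_head(h).clamp(-20.0, 2.0)
        return mean, log_std

    def sample(self, obs: torch.Tensor):
        """Stochastic action and its log-probability, as used during training."""
        mean, log_std = self(obs)
        std = log_std.exp()
        u = mean + std * torch.randn_like(mean)  # reparameterized pre-tanh sample
        action = torch.tanh(u)
        # Gaussian log-density minus the tanh change-of-variables correction.
        log_prob = torch.distributions.Normal(mean, std).log_prob(u)
        log_prob = log_prob - torch.log(1.0 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

    def deterministic(self, obs: torch.Tensor):
        """Mean action squashed through tanh, as used for evaluation."""
        mean, _ = self(obs)
        return torch.tanh(mean)
```

Calling `SketchActor(obs_dim, act_dim).sample(obs)` on a batch of shape `(N, obs_dim)` returns actions of shape `(N, act_dim)` and log-probabilities of shape `(N,)`.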
+## Training Details
+### Hyperparameters
+| Parameter | Value |
+|---|---|
+| Task | Isaac-Ant-Direct-v0 |
+| Parallel Envs | 4096 |
+| Actor LR | 3e-4 |
+| Critic LR | 3e-4 |
+| Alpha LR | 3e-4 |
+| Discount (γ) | 0.99 |
+| Polyak (τ) | 0.005 |
+| Initial Alpha | 1.0 |
+| Batch Size | 2048 |
+| Buffer Capacity | 1,000,000 |
+| Warmup Steps | 200 |
+| Total Steps | 50,000 |
+| Total Transitions | ~205M |
+| Training Time | ~45 minutes |
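A few quantities follow directly from the table. The sketch below recomputes the total transition count, estimates how many vectorized steps fit in the replay buffer, and shows the Polyak (soft) target update in its usual form. It assumes that "Total Steps" counts vectorized environment steps and that the buffer overwrites its oldest entries once full; `polyak_update` is a hypothetical helper operating on plain lists, not a function from the repository.

```python
# Values copied from the hyperparameter table.
NUM_ENVS = 4096
TOTAL_STEPS = 50_000
BUFFER_CAPACITY = 1_000_000
TAU = 0.005

# Each vectorized step yields one transition per environment.
total_transitions = NUM_ENVS * TOTAL_STEPS

# Full vectorized steps the buffer can hold (assumes a ring buffer).
buffer_steps = BUFFER_CAPACITY // NUM_ENVS


def polyak_update(target, online, tau=TAU):
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


print(f"total transitions: {total_transitions:,}")
print(f"buffer holds {buffer_steps} vectorized steps")
```

The product is 204,800,000, which matches the ~205M reported in the table. Under the stated assumptions, the buffer only covers the most recent 244 vectorized steps.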
+### Hardware
+- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
+- **CPU:** Intel Xeon E5-2686 v4
+- **Cloud:** vast.ai
+### Observation Normalization
+The checkpoint includes the running mean and variance statistics used for observation normalization. These **must** be restored at inference time: without them, the policy receives unnormalized inputs and will not perform correctly.
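To illustrate what such running statistics do, here is a minimal pure-Python sketch of a per-dimension running mean and variance with batched updates. It is a hypothetical stand-in for the repository's `RunningMeanStd`, whose actual implementation may differ. The class name, the `eps` value, and the use of population variance are assumptions.

```python
import math


class SketchRunningMeanStd:
    """Per-dimension running mean/variance (hypothetical stand-in)."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim  # placeholder until the first update
        self.count = 0
        self.eps = eps

    def update(self, batch):
        """Merge a batch of observations (list of equal-length lists)."""
        n = len(batch)
        if n == 0:
            return
        total = self.count + n
        for j in range(len(self.mean)):
            col = [row[j] for row in batch]
            b_mean = sum(col) / n
            b_var = sum((x - b_mean) ** 2 for x in col) / n
            delta = b_mean - self.mean[j]
            # Parallel-variance merge: combine sums of squared deviations.
            m2 = (self.var[j] * self.count + b_var * n
                  + delta ** 2 * self.count * n / total)
            self.mean[j] += delta * n / total
            self.var[j] = m2 / total
        self.count = total

    def normalize(self, obs):
        """Standardize one observation with the stored statistics."""
        return [(x - m) / math.sqrt(v + self.eps)
                for x, m, v in zip(obs, self.mean, self.var)]
```

At inference time only `normalize` would be called, with `mean` and `var` loaded from the checkpoint, so the policy sees inputs on the same scale as during training.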
+## How to Use
+### Download
+```python
+from huggingface_hub import hf_hub_download
+checkpoint_path = hf_hub_download(
+    repo_id="DavidH2802/SAC-Ant",
+    filename="final_policy.pt",
+)
+```
+### Inference
+Clone the full project for the model and environment code:
+```bash
+git clone https://github.com/DavidH2802/SAC-from-scratch.git
+cd SAC-from-scratch
+```
+Then load and run the policy:
+```python
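+import torch
+from src.model import Actor
+from src.utils.normalization import RunningMeanStd
+device = "cuda"
+# Use a local copy of the file, or the path returned by hf_hub_download
+checkpoint = torch.load("final_policy.pt", map_location=device, weights_only=True)
+# obs_dim and act_dim are the environment's observation and action sizes.
+# Assumption: the stored mean has shape (obs_dim,), so obs_dim can be read from it;
+# act_dim must be set to match the environment's action dimension.
+obs_dim = checkpoint["obs_rms_mean"].shape[-1]
+# Restore actor
+actor = Actor(obs_dim, act_dim).to(device)
+actor.load_state_dict(checkpoint["actor"])
+actor.eval()
+# Restore observation normalization (required)
+obs_rms = RunningMeanStd(shape=(obs_dim,), device=device)
+obs_rms.mean = checkpoint["obs_rms_mean"]
+obs_rms.var = checkpoint["obs_rms_var"]
+# Run policy (obs is a batch of raw observations from the env, on the same device)
+obs_norm = obs_rms.normalize(obs)
+with torch.no_grad():
+    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)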
+```
+### Full Evaluation with Isaac Lab
+See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions including Isaac Lab installation and the `eval.py` script for video recording.
+## Checkpoint Contents
+The `final_policy.pt` file contains the following entries used at inference:
+| Key | Description |
+|---|---|
+| `actor` | Actor network state dict |
+| `obs_rms_mean` | Running mean for observation normalization |
+| `obs_rms_var` | Running variance for observation normalization |
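Before building the policy, it can help to confirm that a loaded checkpoint has the keys listed above. The helper below is a small illustrative check on a plain dictionary; `missing_checkpoint_keys` is a hypothetical name, not part of the repository.

```python
REQUIRED_KEYS = ("actor", "obs_rms_mean", "obs_rms_var")


def missing_checkpoint_keys(checkpoint: dict) -> list:
    """Return the required keys that are absent from a loaded checkpoint dict."""
    return [key for key in REQUIRED_KEYS if key not in checkpoint]


# Example with a stand-in dictionary that lacks the normalization statistics.
incomplete = {"actor": {}}
print(missing_checkpoint_keys(incomplete))
```

With a real checkpoint, a non-empty result would indicate that the file is not a complete policy export and that normalization cannot be restored from it.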
+## Framework
+- **Algorithm:** SAC (from scratch, no RL library dependencies)
+- **Deep Learning:** PyTorch
+- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
+- **Environment:** Isaac-Ant-Direct-v0
+## Citation
+```bibtex
+@misc{habinski2026sac,
+  author = {David Habinski},
+  title = {SAC from Scratch in PyTorch with Isaac Lab},
+  year = {2026},
+  publisher = {GitHub},
+  url = {https://github.com/DavidH2802/SAC-from-scratch}
+}
+```
+## License
+MIT