	---
	license: mit
	tags:
	- reinforcement-learning
	- sac
	- pytorch
	- isaac-lab
	- robotics
	- locomotion
	library_name: pytorch
	model-index:
	- name: SAC-Ant
	  results: []
	---

	# SAC-Ant

	A Soft Actor-Critic (SAC) policy trained from scratch in PyTorch on the `Isaac-Ant-Direct-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

	GitHub Repository: [DavidH2802/SAC-from-scratch](https://github.com/DavidH2802/SAC-from-scratch)

	<p align="center">
	<img src="ant.gif" alt="Ant Locomotion Policy" width="480"/>
	</p>

	## Model Description

	The model is a squashed Gaussian policy (the actor) that drives the locomotion of a multi-legged Ant robot. The policy outputs continuous joint-level actions that are squashed through tanh, so each action component lies in (-1, 1).
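
The squashing step can be sketched as follows. This is a minimal illustration of sampling from a tanh-squashed Gaussian and correcting the log-probability for the change of variables. It is not the repository's code: the function name `sample_squashed` is hypothetical, and the numerically stable form of the correction term is one common choice among several.

```python
import math

import torch
import torch.nn.functional as F


def sample_squashed(mean: torch.Tensor, log_std: torch.Tensor):
    """Sample a tanh-squashed Gaussian action and its log-probability.

    Hypothetical helper for illustration; not taken from the repository.
    """
    std = log_std.exp()
    # Reparameterized sample from the unsquashed Gaussian.
    u = mean + std * torch.randn_like(mean)
    action = torch.tanh(u)

    # Gaussian log-density of u, summed over action dimensions.
    log_prob = (
        -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
    ).sum(-1)
    # Change of variables: subtract log(1 - tanh(u)^2), written in a stable form.
    log_prob = log_prob - (2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))).sum(-1)
    return action, log_prob
```

At inference the mean action `tanh(mean)` would typically be used instead of a sample, which matches the deterministic action described in the usage section.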

	### Architecture

	- Actor: MLP (obs → 256 → 256) with ReLU activations, two output heads for mean and state-dependent log-std. Actions squashed through tanh.
	- Q-Networks (x2): MLP ((obs, action) → 256 → 256 → 1) with LayerNorm and ReLU activations. They are used only during training and are not needed for inference; the Checkpoint Contents table lists no critic weights for `final_policy.pt`.
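
The two networks described above might look roughly like this in PyTorch. This is a sketch under stated assumptions, not the repository's `src.model` code: the class names `SketchActor` and `SketchQNetwork` are hypothetical, the log-std clamp range is an assumed value, and placing LayerNorm before each ReLU is a guess at the layer order.

```python
import torch
import torch.nn as nn


class SketchActor(nn.Module):
    """Two-hidden-layer MLP with mean and log-std heads (illustrative only)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        # Clamp range is an assumption; it keeps the standard deviation bounded.
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)
        return self.mean_head(h), log_std


class SketchQNetwork(nn.Module):
    """Q(obs, action) MLP with LayerNorm before each ReLU (placement assumed)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # The critic scores an observation-action pair, so the two are concatenated.
        return self.net(torch.cat([obs, action], dim=-1))
```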

	## Training Details

	### Hyperparameters

	| Parameter | Value |
	|---|---|
	| Task | Isaac-Ant-Direct-v0 |
	| Parallel Envs | 4096 |
	| Actor LR | 3e-4 |
	| Critic LR | 3e-4 |
	| Alpha LR | 3e-4 |
	| Discount (γ) | 0.99 |
	| Polyak (τ) | 0.005 |
	| Initial Alpha | 1.0 |
	| Batch Size | 2048 |
	| Buffer Capacity | 1,000,000 |
	| Warmup Steps | 200 |
	| Total Steps | 50,000 |
	| Total Transitions | ~205M |
	| Training Time | ~45 minutes |
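
A few quantities follow directly from the table, and the short sketch below works them out. The `SacConfig` class is hypothetical and exists only to hold the table values. Two assumptions are made: warmup steps are counted in vectorized steps (the same unit as Total Steps), and the buffer capacity counts individual transitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SacConfig:
    """Values copied from the hyperparameter table; the class is hypothetical."""

    num_envs: int = 4096
    total_steps: int = 50_000
    warmup_steps: int = 200
    buffer_capacity: int = 1_000_000
    gamma: float = 0.99

    @property
    def total_transitions(self) -> int:
        # One transition per environment per vectorized step.
        return self.num_envs * self.total_steps

    @property
    def warmup_transitions(self) -> int:
        # Assumes warmup is measured in vectorized steps.
        return self.num_envs * self.warmup_steps

    @property
    def buffer_steps(self) -> float:
        # Vectorized steps of history held, assuming capacity counts transitions.
        return self.buffer_capacity / self.num_envs

    @property
    def effective_horizon(self) -> float:
        # 1 / (1 - gamma): rough number of steps over which rewards matter.
        return 1.0 / (1.0 - self.gamma)


cfg = SacConfig()
print(cfg.total_transitions)    # 204800000, i.e. ~205M
print(cfg.warmup_transitions)   # 819200
print(round(cfg.buffer_steps))  # 244
```

Under these assumptions the replay buffer holds only about 244 vectorized steps at a time, so sampled batches come from a fairly recent window of experience.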

	### Hardware

	- GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
	- CPU: Intel Xeon E5-2686 v4
	- Cloud: vast.ai

	### Observation Normalization

	The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time. Without them, the policy receives unnormalized inputs, which differ from the inputs it saw during training, and it will not perform correctly.
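
For readers unfamiliar with running statistics, here is a minimal sketch of how such a normalizer can be maintained. It is not the repository's `RunningMeanStd`: the class name `SketchRunningMeanStd`, the small initial count, and the epsilon inside the square root are assumptions for illustration. It uses the standard parallel update for combining batch moments.

```python
import torch


class SketchRunningMeanStd:
    """Running mean and variance via the parallel moments update (illustrative)."""

    def __init__(self, shape, eps: float = 1e-4):
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)
        self.count = eps  # small prior count avoids division by zero

    def update(self, batch: torch.Tensor) -> None:
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]

        delta = b_mean - self.mean
        total = self.count + b_count
        # Combine the sums of squared deviations of the old and new data.
        m2 = (
            self.var * self.count
            + b_var * b_count
            + delta**2 * self.count * b_count / total
        )
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, obs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        # Shift and scale so each observation dimension has roughly unit variance.
        return (obs - self.mean) / torch.sqrt(self.var + eps)
```

At inference the statistics are frozen: only `normalize` is called, using the mean and variance stored in the checkpoint.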

	## How to Use

	### Download

	```python
	from huggingface_hub import hf_hub_download

	checkpoint_path = hf_hub_download(
	    repo_id="DavidH2802/SAC-Ant",
	    filename="final_policy.pt",
	)
	```

	### Inference

	Clone the full project for the model and environment code:

	```bash
	git clone https://github.com/DavidH2802/SAC-from-scratch.git
	cd SAC-from-scratch
	```

	Then load and run the policy:

	```python
	import torch
	from src.model import Actor
	from src.utils.normalization import RunningMeanStd

	# Use the local file, or the checkpoint_path returned by hf_hub_download
	checkpoint = torch.load("final_policy.pt", map_location="cuda", weights_only=True)

	# obs_dim and act_dim must be defined first and must match the environment's
	# observation and action sizes.

	# Restore actor
	actor = Actor(obs_dim, act_dim).to("cuda")
	actor.load_state_dict(checkpoint["actor"])
	actor.eval()

	# Restore observation normalization (required)
	obs_rms = RunningMeanStd(shape=(obs_dim,), device="cuda")
	obs_rms.mean = checkpoint["obs_rms_mean"]
	obs_rms.var = checkpoint["obs_rms_var"]

	# Run policy
	obs_norm = obs_rms.normalize(obs)  # obs: observation tensor from the env
	with torch.no_grad():
	    action = actor.get_deterministic_action(obs_norm)  # deterministic (mean action)
	```

	### Full Evaluation with Isaac Lab

	See the [GitHub repository](https://github.com/DavidH2802/SAC-from-scratch) for complete setup instructions including Isaac Lab installation and the `eval.py` script for video recording.

	## Checkpoint Contents

	The `final_policy.pt` file contains:

	| Key | Description |
	|---|---|
	| `actor` | Actor network state dict |
	| `obs_rms_mean` | Running mean for observation normalization |
	| `obs_rms_var` | Running variance for observation normalization |
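
A quick sanity check on a loaded checkpoint can catch a wrong or partial file before the policy runs. The helpers below are a hypothetical sketch, not part of the repository. `infer_obs_dim` assumes the stored mean is a 1-D tensor whose length equals the observation size.

```python
EXPECTED_KEYS = ("actor", "obs_rms_mean", "obs_rms_var")


def missing_keys(checkpoint: dict) -> list:
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return [key for key in EXPECTED_KEYS if key not in checkpoint]


def infer_obs_dim(checkpoint: dict) -> int:
    """Read the observation size from the stored running mean.

    Assumes the mean is a 1-D tensor or array of length obs_dim.
    """
    return int(checkpoint["obs_rms_mean"].shape[-1])
```

After `torch.load`, `missing_keys(checkpoint)` should return an empty list for a complete `final_policy.pt`.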

	## Framework

	- Algorithm: SAC (from scratch, no RL library dependencies)
	- Deep Learning: PyTorch
	- Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
	- Environment: Isaac-Ant-Direct-v0

	## Citation

	```bibtex
	@misc{habinski2026sac,
	author = {David Habinski},
	title = {SAC from Scratch in PyTorch with Isaac Lab},
	year = {2026},
	publisher = {GitHub},
	url = {https://github.com/DavidH2802/SAC-from-scratch}
	}
	```

	## License

	MIT