# Reward Rush: Walker2d SAC (Stripped Model)
This repository contains a stripped Soft Actor-Critic (SAC) actor trained on the Walker2d-v4 environment.
The checkpoint has been cleaned to ensure:
- CPU-only compatibility
- Safe loading with `weights_only=True` (see the loading sketch below)
- No CUDA tensors
- No pickle or NumPy object issues
- Easy inference without training code
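A minimal loading sketch (repo id and filename as in the test script further down):

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Nharen/Reward_Rush_SAC_Walker",
    filename="Walker.pth",
)
# weights_only=True restricts unpickling to tensors and plain Python
# containers, so no arbitrary code can run while loading.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)
print(list(ckpt.keys()))  # includes obs_dim, act_dim, actor_state_dict
```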
## Model Architecture
The model is a Gaussian SAC actor evaluated deterministically using the mean action.
Observation space:

- Dimension stored in `obs_dim` inside the checkpoint

Action space:

- Dimension stored in `act_dim`
- Continuous actions in the range [-1, 1] (cross-checked against the environment below)
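As an optional cross-check of those dimensions against the environment (a sketch; for Walker2d-v4 the observation is 17-dimensional and the action 6-dimensional):

```python
import gymnasium as gym

# The checkpoint's obs_dim/act_dim should match the environment's spaces.
env = gym.make("Walker2d-v4")
print(env.observation_space.shape)  # (17,)
print(env.action_space.shape)       # (6,)
env.close()
```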
Network structure:
Input โ Linear(obs_dim, hidden_dim) โ ReLU
โ Linear(hidden_dim, hidden_dim) โ ReLU
โ Linear(hidden_dim, hidden_dim) โ ReLU
โ Mean head: Linear(hidden_dim, act_dim)
โ Log-Std head: Linear(hidden_dim, act_dim)
Only the mean head is used during evaluation.
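For context, SAC trains a stochastic tanh-Gaussian policy but is commonly evaluated with the deterministic mean action. The sketch below contrasts the two; the module names match the test script, and the sampling branch is illustrative only and must not be used with this checkpoint:

```python
import torch

def act(actor, obs, deterministic=True):
    x = actor.net(obs)
    if deterministic:
        # Evaluation: tanh-squashed mean action in [-1, 1].
        return torch.tanh(actor.mean(x))
    # Training-time SAC samples from the tanh-squashed Gaussian instead
    # (illustrative only; do not use log_std with this checkpoint).
    std = actor.log_std(x).exp()
    return torch.tanh(actor.mean(x) + std * torch.randn_like(std))
```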
## Common Mistakes to Avoid

**Incorrect number of layers.** The checkpoint contains `net.0`, `net.2`, and `net.4`, which means the actor has three hidden Linear layers. Using fewer layers will cause unexpected-key errors when loading; a sanity check after this list shows how to verify the keys.

**Renaming layers.** The model must use `self.net = nn.Sequential(...)`. Renaming layers to `fc1`, `fc2`, etc. will break loading.

**Hardcoding dimensions.** Do not hardcode observation or action sizes. Always read `obs_dim` and `act_dim` from the checkpoint.

**Using `weights_only=False`.** The model is already stripped. Always load with `weights_only=True`.

**Sampling actions.** This model is evaluated deterministically. Do not sample from a distribution or use `log_std`.
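A quick sanity check (a sketch; it assumes the checkpoint layout described above and reuses the repo id and filename from the test script):

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Nharen/Reward_Rush_SAC_Walker",
    filename="Walker.pth",
)
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)["actor_state_dict"]

# nn.Sequential numbers its children 0..5; ReLU layers carry no parameters,
# so only the Linear layers (indices 0, 2, 4) appear in the state dict.
expected = {"net.0.weight", "net.2.weight", "net.4.weight",
            "mean.weight", "log_std.weight"}
missing = expected - set(state)
assert not missing, f"Unexpected checkpoint structure, missing: {missing}"
```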
## Download and Test Code

The script below downloads the model from the Hugging Face Hub and evaluates it for 100 episodes.
```python
import torch
import gymnasium as gym
import numpy as np
from huggingface_hub import hf_hub_download

# Download the stripped checkpoint from the Hugging Face Hub.
ckpt_path = hf_hub_download(
    repo_id="Nharen/Reward_Rush_SAC_Walker",
    filename="Walker.pth",
)
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)

# Read the network dimensions from the checkpoint instead of hardcoding them.
obs_dim = ckpt["obs_dim"]
act_dim = ckpt["act_dim"]
hidden_dim = ckpt.get("hidden_dim", 256)


class SACActor(torch.nn.Module):
    """Gaussian SAC actor, evaluated deterministically via the mean head."""

    def __init__(self, obs_dim, act_dim, hidden_dim=256):
        super().__init__()
        # Three hidden Linear layers -> state-dict keys net.0, net.2, net.4.
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.ReLU(),
        )
        self.mean = torch.nn.Linear(hidden_dim, act_dim)
        # Present in the checkpoint, but unused during deterministic evaluation.
        self.log_std = torch.nn.Linear(hidden_dim, act_dim)

    def forward(self, obs):
        x = self.net(obs)
        # Deterministic action: tanh-squashed mean, in [-1, 1].
        return torch.tanh(self.mean(x))


actor = SACActor(obs_dim, act_dim, hidden_dim)
actor.load_state_dict(ckpt["actor_state_dict"])
actor.eval()

env = gym.make("Walker2d-v4")
num_episodes = 100
episode_rewards = []

for ep in range(num_episodes):
    obs, _ = env.reset()
    done = False
    ep_reward = 0.0
    while not done:
        with torch.no_grad():
            obs_t = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = actor(obs_t).squeeze(0).cpu().numpy()
        obs, reward, terminated, truncated, _ = env.step(action)
        ep_reward += reward
        done = terminated or truncated
    episode_rewards.append(ep_reward)
    print(f"Episode {ep + 1:3d} | Reward: {ep_reward:.2f}")

env.close()

episode_rewards = np.array(episode_rewards)
print("Episodes:   ", num_episodes)
print("Mean reward:", episode_rewards.mean())
print("Std reward: ", episode_rewards.std())
print("Min reward: ", episode_rewards.min())
print("Max reward: ", episode_rewards.max())
```
## Evaluation Results

All values are self-reported on Walker2d-v3:

- Mean reward: 4283.036
- Std reward: 128.536
- Mean episode length: 997.180