gym_xarm_4tasks | Goal-Conditioned SAC Policy

This repository contains a Stable-Baselines3 SAC model trained on
gym_xarm/XarmLift-v0. A single goal-conditioned policy performs all 4 directional pushing tasks.

The policy receives the desired task as part of the observation (goal-conditioning).

Tasks / Goals

The model supports 4 pushing directions, encoded as a one-hot vector:

Goal ID   Name      Direction (world frame)
0         forward   +X
1         left      +Y
2         right     −Y
3         back      −X (towards robot base)

The goal is appended to the observation as a 4D one-hot vector.
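The goal table can be expressed in code. This is a sketch only: the names `GOAL_DIRECTIONS` and `goal_name` are hypothetical helpers, not part of the repository; the IDs and directions follow the table above.

```python
import numpy as np

# Hypothetical mapping from goal ID to task name and world-frame direction,
# transcribed from the goal table above (not shipped with the repository).
GOAL_DIRECTIONS = {
    0: ("forward", np.array([1.0, 0.0, 0.0])),   # +X
    1: ("left",    np.array([0.0, 1.0, 0.0])),   # +Y
    2: ("right",   np.array([0.0, -1.0, 0.0])),  # -Y
    3: ("back",    np.array([-1.0, 0.0, 0.0])),  # -X, towards robot base
}

def goal_name(goal_id: int) -> str:
    """Return the task name for a goal ID (0-3)."""
    return GOAL_DIRECTIONS[goal_id][0]
```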

Observation Space

  • Original environment observation: 28 floats
  • Goal one-hot vector: 4 floats
  • Total observation size: 32

obs = [env_obs(28), goal_one_hot(4)]
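The layout above can be built with a small helper. A minimal sketch, assuming the goal is a plain one-hot append as documented; `build_observation` is a hypothetical name, not a function from the repository.

```python
import numpy as np

def build_observation(env_obs: np.ndarray, goal_id: int) -> np.ndarray:
    """Append a 4D one-hot goal vector to the 28D env observation (-> 32D)."""
    one_hot = np.zeros(4, dtype=np.float32)
    one_hot[goal_id] = 1.0
    return np.concatenate([env_obs.astype(np.float32), one_hot])
```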

Action Space

Continuous action space, as defined by gym_xarm/XarmLift-v0:

Box(4,) in range [-1, 1]
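Actions outside these bounds should be clipped before stepping the environment. A trivial sketch (`clip_action` is a hypothetical helper, not part of the repository):

```python
import numpy as np

def clip_action(action: np.ndarray) -> np.ndarray:
    """Clip a 4D continuous action to the Box bounds [-1, 1]."""
    return np.clip(action, -1.0, 1.0)
```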

Files in this Repository

  • best_model.zip — Trained SAC policy (Stable-Baselines3)
  • vecnormalize.pkl — Observation normalization statistics (VecNormalize)

⚠️ Important: vecnormalize.pkl must be loaded for correct inference.

How to Load the Model

import gymnasium as gym
import gym_xarm
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Create env (must match the wrappers used during training!)
#
# NOTE: the goal-conditioning wrapper used during training is not included
# in this repository. The class below is a sketch that reimplements the
# documented behaviour: append a 4D one-hot goal vector (28 -> 32 obs).
class GoalConditionWrapper(gym.ObservationWrapper):
    def __init__(self, env, goal_id):
        super().__init__(env)
        self.goal_one_hot = np.eye(4, dtype=np.float32)[goal_id]
        low = np.concatenate([env.observation_space.low, np.zeros(4, dtype=np.float32)])
        high = np.concatenate([env.observation_space.high, np.ones(4, dtype=np.float32)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.concatenate([obs.astype(np.float32), self.goal_one_hot])

def make_env(goal_id=0):  # 0 = forward; see the goal table above
    env = gym.make("gym_xarm/XarmLift-v0")
    return GoalConditionWrapper(env, goal_id)

venv = DummyVecEnv([make_env])

# Load VecNormalize stats
venv = VecNormalize.load("vecnormalize.pkl", venv)
venv.training = False
venv.norm_reward = False

# Load model
model = SAC.load("best_model.zip")

# Example inference loop
obs = venv.reset()
for _ in range(300):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = venv.step(action)

Training Details

  • Algorithm: Soft Actor-Critic (SAC)
  • Framework: Stable-Baselines3
  • Multi-task learning via goal-conditioned observations
  • Reward shaping based on directional displacement of the object
  • Evaluated independently on all 4 tasks
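The reward-shaping bullet above can be illustrated with a minimal sketch: reward as the object's displacement projected onto the commanded direction. The exact shaping used in training is not published; `directional_reward` is a hypothetical function showing the idea only.

```python
import numpy as np

def directional_reward(prev_pos: np.ndarray, curr_pos: np.ndarray,
                       direction: np.ndarray) -> float:
    """Project the object's displacement onto the goal direction.

    `direction` is a unit vector in the world frame (e.g. +X for 'forward');
    moving with the goal direction yields positive reward, against it negative.
    """
    displacement = curr_pos - prev_pos
    return float(np.dot(displacement, direction))
```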

Notes

  • The model performs well on forward / left / right
  • The backward task is more challenging due to environment geometry
  • If you retrain or fine-tune, ensure goal directions match the physical layout