gym_xarm_4tasks | Goal-Conditioned SAC Policy

This repository contains a Stable-Baselines3 SAC model trained on
gym_xarm/XarmLift-v0. A single goal-conditioned policy performs all 4 directional pushing tasks.

The policy receives the desired task as part of the observation (goal-conditioning).

Tasks / Goals

The model supports 4 pushing directions, encoded as a one-hot vector:

Goal ID   Name      Direction (world frame)
0         forward   +X
1         left      +Y
2         right     −Y
3         back      −X (towards robot base)

The goal is appended to the observation as a 4D one-hot vector.
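The goal table can be expressed in code. This is a sketch only: the names `GOAL_DIRECTIONS` and `goal_name` are hypothetical helpers, not part of the repository; the IDs and directions follow the table above.

```python
import numpy as np

# Hypothetical mapping from goal ID to task name and world-frame direction,
# transcribed from the goal table above (not shipped with the repository).
GOAL_DIRECTIONS = {
    0: ("forward", np.array([1.0, 0.0, 0.0])),   # +X
    1: ("left",    np.array([0.0, 1.0, 0.0])),   # +Y
    2: ("right",   np.array([0.0, -1.0, 0.0])),  # -Y
    3: ("back",    np.array([-1.0, 0.0, 0.0])),  # -X, towards robot base
}

def goal_name(goal_id: int) -> str:
    """Return the task name for a goal ID (0-3)."""
    return GOAL_DIRECTIONS[goal_id][0]
```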

Observation Space

  • Original environment observation: 28 floats
  • Goal one-hot vector: 4 floats
  • Total observation size: 32

obs = [env_obs(28), goal_one_hot(4)]
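The layout above can be built with a small helper. A minimal sketch, assuming the goal is a plain one-hot append as documented; `build_observation` is a hypothetical name, not a function from the repository.

```python
import numpy as np

def build_observation(env_obs: np.ndarray, goal_id: int) -> np.ndarray:
    """Append a 4D one-hot goal vector to the 28D env observation (-> 32D)."""
    one_hot = np.zeros(4, dtype=np.float32)
    one_hot[goal_id] = 1.0
    return np.concatenate([env_obs.astype(np.float32), one_hot])
```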

Action Space

Continuous action space, as defined by gym_xarm/XarmLift-v0:

Box(4,) in range [-1, 1]
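Actions outside these bounds should be clipped before stepping the environment. A trivial sketch (`clip_action` is a hypothetical helper, not part of the repository):

```python
import numpy as np

def clip_action(action: np.ndarray) -> np.ndarray:
    """Clip a 4D continuous action to the Box bounds [-1, 1]."""
    return np.clip(action, -1.0, 1.0)
```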

Files in this Repository

  • best_model.zip — Trained SAC policy (Stable-Baselines3)
  • vecnormalize.pkl — Observation normalization statistics (VecNormalize)

⚠️ Important: vecnormalize.pkl must be loaded for correct inference.

How to Load the Model

import gymnasium as gym
import gym_xarm
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Create env (must match the wrappers used during training!)
#
# NOTE: the goal-conditioning wrapper used during training is not included
# in this repository. The class below is a sketch that reimplements the
# documented behaviour: append a 4D one-hot goal vector (28 -> 32 obs).
class GoalConditionWrapper(gym.ObservationWrapper):
    def __init__(self, env, goal_id):
        super().__init__(env)
        self.goal_one_hot = np.eye(4, dtype=np.float32)[goal_id]
        low = np.concatenate([env.observation_space.low, np.zeros(4, dtype=np.float32)])
        high = np.concatenate([env.observation_space.high, np.ones(4, dtype=np.float32)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.concatenate([obs.astype(np.float32), self.goal_one_hot])

def make_env(goal_id=0):  # 0 = forward; see the goal table above
    env = gym.make("gym_xarm/XarmLift-v0")
    return GoalConditionWrapper(env, goal_id)

venv = DummyVecEnv([make_env])

# Load VecNormalize stats
venv = VecNormalize.load("vecnormalize.pkl", venv)
venv.training = False
venv.norm_reward = False

# Load model
model = SAC.load("best_model.zip")

# Example inference loop
obs = venv.reset()
for _ in range(300):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = venv.step(action)

Training Details

  • Algorithm: Soft Actor-Critic (SAC)
  • Framework: Stable-Baselines3
  • Multi-task learning via goal-conditioned observations
  • Reward shaping based on directional displacement of the object
  • Evaluated independently on all 4 tasks
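The reward-shaping bullet above can be illustrated with a minimal sketch: reward as the object's displacement projected onto the commanded direction. The exact shaping used in training is not published; `directional_reward` is a hypothetical function showing the idea only.

```python
import numpy as np

def directional_reward(prev_pos: np.ndarray, curr_pos: np.ndarray,
                       direction: np.ndarray) -> float:
    """Project the object's displacement onto the goal direction.

    `direction` is a unit vector in the world frame (e.g. +X for 'forward');
    moving with the goal direction yields positive reward, against it negative.
    """
    displacement = curr_pos - prev_pos
    return float(np.dot(displacement, direction))
```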

Notes

  • The model performs well on forward / left / right
  • The backward task is more challenging due to environment geometry
  • If you retrain or fine-tune, ensure goal directions match the physical layout