# gym_xarm_4tasks | Goal-Conditioned SAC Policy

This repository contains a Stable-Baselines3 SAC model trained on `gym_xarm/XarmLift-v0`, conditioned to perform 4 directional pushing tasks with a single policy.

The policy receives the desired task as part of the observation (goal-conditioning).
## Tasks / Goals
The model supports 4 pushing directions, encoded as a one-hot vector:
| Goal ID | Name | Direction (world frame) |
|---|---|---|
| 0 | forward | +X |
| 1 | left | +Y |
| 2 | right | −Y |
| 3 | back | −X (towards robot base) |
The goal is appended to the observation as a 4D one-hot vector.
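Encoding a goal as a one-hot vector is a one-liner with NumPy; a minimal sketch (the `GOALS` mapping name is an assumption, but the ID-to-task mapping follows the table above):

```python
import numpy as np

# Goal IDs from the table above (mapping name is illustrative).
GOALS = {"forward": 0, "left": 1, "right": 2, "back": 3}

# Row of the 4x4 identity matrix = one-hot vector for that goal.
goal_one_hot = np.eye(4, dtype=np.float32)[GOALS["left"]]
# goal_one_hot -> [0., 1., 0., 0.]
```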
## Observation Space

- Original environment observation: 28 floats
- Goal one-hot vector: 4 floats
- Total observation size: 32

```
obs = [env_obs(28), goal_one_hot(4)]
```
## Action Space

Continuous action space, as defined by `gym_xarm/XarmLift-v0`:

```
Box(4,) in range [-1, 1]
```
## Files in this Repository

- `best_model.zip` — Trained SAC policy (Stable-Baselines3)
- `vecnormalize.pkl` — Observation normalization statistics (VecNormalize)

⚠️ Important: `vecnormalize.pkl` must be loaded for correct inference.
## How to Load the Model

```python
import gymnasium as gym
import gym_xarm
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Create the env (must match the training wrappers!)
def make_env():
    env = gym.make("gym_xarm/XarmLift-v0")
    # Add the same goal-conditioning wrapper used during training
    # (28 -> 32 obs).
    return env

venv = DummyVecEnv([make_env])

# Load VecNormalize stats
venv = VecNormalize.load("vecnormalize.pkl", venv)
venv.training = False
venv.norm_reward = False

# Load the model
model = SAC.load("best_model.zip")

# Example inference loop
obs = venv.reset()
for _ in range(300):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = venv.step(action)
```
## Training Details
- Algorithm: Soft Actor-Critic (SAC)
- Framework: Stable-Baselines3
- Multi-task learning via goal-conditioned observations
- Reward shaping based on directional displacement of the object
- Evaluated independently on all 4 tasks
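The exact shaping function is not published here; a hedged sketch of what "reward based on directional displacement" could look like, assuming the shaped term is the dot product of the object's XY displacement with the goal's unit direction (function and constant names are illustrative):

```python
import numpy as np

# Unit XY direction for each goal ID in the world frame,
# following the goal table above (assumed convention).
DIRECTIONS = {
    0: np.array([1.0, 0.0]),   # forward: +X
    1: np.array([0.0, 1.0]),   # left:    +Y
    2: np.array([0.0, -1.0]),  # right:   -Y
    3: np.array([-1.0, 0.0]),  # back:    -X
}

def directional_reward(obj_pos_prev, obj_pos_curr, goal_id, scale=10.0):
    """Reward proportional to the object's displacement along the goal direction.

    Positive when the object moves toward the goal direction,
    negative when it moves away. The scale factor is a guess.
    """
    disp = np.asarray(obj_pos_curr)[:2] - np.asarray(obj_pos_prev)[:2]
    return scale * float(np.dot(disp, DIRECTIONS[goal_id]))
```

With such a signed term, pushing the object the wrong way is actively penalized rather than merely unrewarded, which typically speeds up learning of the directional tasks.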
## Notes
- The model performs well on forward / left / right
- The backward task is more challenging due to environment geometry
- If you retrain or fine-tune, ensure goal directions match the physical layout