TD3+BC - Fetch Robot Pick-and-Place

TD3 with Behavior Cloning regularization for offline RL on Fetch robot pick-and-place.

Model Description

This model was trained with offline reinforcement learning on a static dataset of 540 demonstration episodes (26,538 transitions) generated by trajectory optimization for the Fetch robot in Gazebo simulation.

Task

  • Robot: Fetch mobile manipulator (7 arm joints + 2 gripper finger joints = 9 DOF)
  • Task: Pick-and-place (lift a cracker box by at least 10 cm)
  • State space: 9-D joint positions
  • Action space: 9-D target joint positions

Dataset

  • Source: Trajectory optimization with quality-tiered rewards
  • Episodes: 540 (304 both_pass, 194 lift_only, 42 fail)
  • Transitions: 26,538
  • Reward structure: Sparse terminal (both_pass=1.0, lift_only=0.5, fail=0.0)
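The card states only that rewards are sparse and terminal. One plausible way to store them per transition, assumed here, is to put the tier reward on the last transition of each episode and zero everywhere else. The hypothetical helpers below do that and show how the discount of 0.99 (listed under Training Hyperparameters) shrinks a terminal reward seen from the first step of an average-length episode (26,538 / 540 ≈ 49 transitions).

```python
# Hypothetical helpers (not from training_code.py): sparse terminal rewards.
TIER_REWARD = {"both_pass": 1.0, "lift_only": 0.5, "fail": 0.0}


def sparse_terminal_rewards(episode_len, tier):
    """Return (rewards, dones) for an episode; reward only on the last transition."""
    rewards = [0.0] * episode_len
    dones = [False] * episode_len
    rewards[-1] = TIER_REWARD[tier]  # assumed placement of the tier reward
    dones[-1] = True
    return rewards, dones


def discounted_return(rewards, discount=0.99):
    """Discounted return from the first transition: sum_t discount**t * r_t."""
    return sum(discount ** t * r for t, r in enumerate(rewards))


# Tier counts from the dataset summary above.
counts = {"both_pass": 304, "lift_only": 194, "fail": 42}
n_episodes = sum(counts.values())
mean_episode_reward = sum(TIER_REWARD[k] * n for k, n in counts.items()) / n_episodes
avg_len = 26_538 / n_episodes

rewards, dones = sparse_terminal_rewards(49, "both_pass")
print(round(mean_episode_reward, 4), round(avg_len, 1), round(discounted_return(rewards), 3))
```

For a 49-transition both_pass episode, the return from the first step is 0.99^48 ≈ 0.62, and the mean terminal reward across all 540 episodes is 401 / 540 ≈ 0.74.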

Training Hyperparameters

  • algorithm: TD3+BC
  • alpha: 2.5
  • lr: 0.0003
  • batch_size: 256
  • discount: 0.99
  • tau: 0.005
  • policy_delay: 2
  • num_iterations: 100000
  • hidden_dims: [256, 256]
  • state_normalization: zero_mean_unit_var
  • action_normalization: [-1, 1] via joint limits
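The TD3+BC actor objective (Fujimoto and Gu, 2021) adds a behavior-cloning term to the TD3 actor loss and rescales the Q term by λ = alpha / mean|Q|, which is where alpha = 2.5 enters. The sketch below writes that objective, the tau-weighted target update, and the policy_delay schedule with plain Python lists for readability. It follows the published formulation and is not taken from training_code.py, which presumably uses PyTorch tensors; check that file for details such as whether an epsilon guards the division by mean|Q|.

```python
# Sketch of TD3+BC pieces on plain lists; function names are illustrative.


def td3bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """q_values: Q(s, pi(s)) per batch sample; actions: per-sample vectors in [-1, 1]."""
    n = len(q_values)
    # lambda normalizes the Q term by the mean |Q| of the batch (no epsilon here).
    lam = alpha / (sum(abs(q) for q in q_values) / n)
    q_term = -lam * sum(q_values) / n
    # BC term: squared error averaged over every element (batch x action dims).
    sq_errs = [
        (p - a) ** 2
        for pi, act in zip(policy_actions, dataset_actions)
        for p, a in zip(pi, act)
    ]
    bc_term = sum(sq_errs) / len(sq_errs)
    return q_term + bc_term


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_params, online_params)]


def should_update_actor(iteration, policy_delay=2):
    """TD3 updates the actor and target networks once every `policy_delay` critic steps."""
    return iteration % policy_delay == 0
```

With alpha = 2.5, a larger alpha weights the Q term more heavily relative to the BC term, so the policy is allowed to deviate further from the dataset actions.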

Evaluation Results

  • action_mse: 0.374415
  • gazebo_success_rate: 0/5 (0%)
  • gazebo_avg_lift: 0.0002
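A minimal sketch of how the two Gazebo metrics could be computed from per-episode lift heights. It assumes lift is measured in metres (the card does not state the unit of gazebo_avg_lift) and that success means a lift of at least 0.10 m, per the Task section. summarize_lifts and its inputs are hypothetical and are not read from eval_gazebo.csv.

```python
# Hypothetical evaluation helper; lift heights assumed to be in metres.
LIFT_THRESHOLD_M = 0.10  # ">= 10 cm" success criterion from the Task section


def summarize_lifts(lifts, threshold=LIFT_THRESHOLD_M):
    """Return (successes, episodes, success_rate, avg_lift) for per-episode lifts."""
    n = len(lifts)
    successes = sum(1 for h in lifts if h >= threshold)
    return successes, n, successes / n, sum(lifts) / n


# Illustrative values only, not the actual evaluation data.
print(summarize_lifts([0.0, 0.12, 0.10, 0.05]))
```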

Offline Policy Evaluation

Action MSE measures how closely the policy reproduces the demonstration actions (lower is better). It measures imitation fidelity only and does not estimate task return:

  • TD3+BC: MSE = 0.374 (poor action matching)
  • IQL (tau=0.7): MSE = 0.0027 (good)
  • IQL (tau=0.9): MSE = 0.0012 (best)
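The metric can be sketched as the squared difference between policy and dataset actions, averaged over all transitions and all 9 action dimensions. The card does not say whether the averaging is over normalized [-1, 1] actions or raw joint targets, nor how the batch and dimension axes are reduced; this hypothetical helper assumes a flat mean over every element.

```python
def action_mse(policy_actions, dataset_actions):
    """Mean over all transitions and action dimensions of (pi(s) - a)^2."""
    sq = [
        (p - a) ** 2
        for pi, act in zip(policy_actions, dataset_actions)
        for p, a in zip(pi, act)
    ]
    return sum(sq) / len(sq)
```

If the MSE is computed in the normalized [-1, 1] action space (an assumption), its square root gives a rough per-element error: about 0.61 for TD3+BC versus about 0.035 for IQL (tau=0.9), on a range of width 2.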

Gazebo Evaluation (5 episodes)

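All three models (TD3+BC and both IQL variants) achieved a 0% success rate in this initial pilot evaluation, so the much lower action MSE of IQL did not translate into task success here. This is expected for a first iteration: the models need further refinement (e.g., longer training, reward shaping, or residual RL integration).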

Files

  • checkpoint.pt - Model weights (PyTorch)
  • training_code.py - Training implementation
  • training_log.csv - Training metrics over time
  • eval_gazebo.csv - Gazebo evaluation results
  • dataset_stats.json - Dataset normalization statistics
  • config.json - Model configuration

Usage

```python
import json

import torch

# Load the checkpoint on CPU; weights_only=True restricts unpickling to tensors and primitive types
ckpt = torch.load("checkpoint.pt", map_location="cpu", weights_only=True)

# Load dataset statistics used for zero-mean, unit-variance state normalization
with open("dataset_stats.json") as f:
    stats = json.load(f)
state_mean = torch.tensor(stats["state_mean"])
state_std = torch.tensor(stats["state_std"])
```

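Before and after calling the actor, states are standardized and outputs are mapped from [-1, 1] back to joint targets. The card says only "[-1, 1] via joint limits", so the linear mapping, the clipping step, and the eps term below are assumptions; the joint limits themselves must come from the Fetch URDF or config.json, and both helpers are hypothetical.

```python
# Hypothetical pre/post-processing helpers for the 9-D state and action.


def normalize_state(q, mean, std, eps=1e-8):
    """Zero-mean, unit-variance scaling of a 9-D joint-position state (eps is assumed)."""
    return [(x - m) / (s + eps) for x, m, s in zip(q, mean, std)]


def action_to_joint_targets(a_norm, low, high):
    """Linearly map policy outputs in [-1, 1] to target joint positions in [low, high]."""
    out = []
    for a, lo, hi in zip(a_norm, low, high):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range first
        out.append(lo + (a + 1.0) * 0.5 * (hi - lo))
    return out
```

Clipping before the mapping keeps every target inside the joint limits even if the actor output slightly overshoots [-1, 1].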
Joint Names

```python
JOINTS = [
    'shoulder_pan_joint',      # idx 0
    'shoulder_lift_joint',     # idx 1
    'upperarm_roll_joint',     # idx 2
    'elbow_flex_joint',        # idx 3
    'forearm_roll_joint',      # idx 4
    'wrist_flex_joint',        # idx 5
    'wrist_roll_joint',        # idx 6
    'l_gripper_finger_joint',  # idx 7
    'r_gripper_finger_joint',  # idx 8
]
```
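If the 9-D action has to be split by joint group, for example when the arm and the gripper fingers are commanded separately (an assumption about the deployment setup, not something the card describes), the index layout above (0 to 6 arm, 7 and 8 gripper fingers) can be applied as in this hypothetical helper:

```python
def split_command(action, joint_names, n_arm=7):
    """Pair a 9-D action with joint names and split it into arm and gripper parts."""
    if len(action) != len(joint_names):
        raise ValueError(f"expected {len(joint_names)} values, got {len(action)}")
    named = dict(zip(joint_names, action))
    arm = {name: named[name] for name in joint_names[:n_arm]}
    gripper = {name: named[name] for name in joint_names[n_arm:]}
    return arm, gripper
```

Usage: `arm, gripper = split_command(targets, JOINTS)` with `targets` being the 9 de-normalized joint positions.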

Citation

```bibtex
@misc{fetch_offline_rl_pilot,
  title={Offline RL Pilot Study for Fetch Robot Pick-and-Place},
  year={2026},
}
```