# TD3+BC - Fetch Robot Pick-and-Place
TD3 with Behavior Cloning regularization for offline RL on Fetch robot pick-and-place.
## Model Description
This model was trained using offline reinforcement learning on a static dataset of 540 demonstration episodes (26,538 transitions) collected from trajectory optimization on the Fetch robot in Gazebo simulation.
## Task
- Robot: Fetch Mobile Manipulator (7 arm + 2 gripper = 9 DOF)
- Task: Pick-and-place (lift the cracker box >= 10 cm)
- State space: 9D joint positions
- Action space: 9D target joint positions
## Dataset
- Source: Trajectory optimization with quality-tiered rewards
- Episodes: 540 (304 both_pass, 194 lift_only, 42 fail)
- Transitions: 26,538
- Reward structure: Sparse terminal (both_pass=1.0, lift_only=0.5, fail=0.0)
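The sparse terminal reward scheme above can be sketched as follows. The tier names come from the dataset description; the function name and structure are illustrative, not the actual data-collection code:

```python
# Sparse terminal reward: every transition gets 0.0 except the last,
# which is scored by the episode's quality tier.
TIER_REWARD = {"both_pass": 1.0, "lift_only": 0.5, "fail": 0.0}

def terminal_rewards(num_transitions, tier):
    rewards = [0.0] * num_transitions
    rewards[-1] = TIER_REWARD[tier]
    return rewards
```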
## Training Hyperparameters
| Parameter | Value |
|---|---|
| algorithm | TD3+BC |
| alpha | 2.5 |
| lr | 0.0003 |
| batch_size | 256 |
| discount | 0.99 |
| tau | 0.005 |
| policy_delay | 2 |
| num_iterations | 100000 |
| hidden_dims | [256, 256] |
| state_normalization | zero_mean_unit_var |
| action_normalization | [-1, 1] via joint limits |
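For reference, the TD3+BC actor objective combines the critic's Q-value with a behavior-cloning MSE term, with the Q term adaptively scaled by the `alpha = 2.5` from the table. A minimal sketch, assuming `actor` and `critic` are callables returning actions and per-sample Q-values:

```python
import torch
import torch.nn.functional as F

def td3bc_actor_loss(actor, critic, states, demo_actions, alpha=2.5):
    pi = actor(states)                      # policy actions
    q = critic(states, pi)                  # Q(s, pi(s))
    lam = alpha / q.abs().mean().detach()   # adaptive scaling of the Q term
    bc = F.mse_loss(pi, demo_actions)       # behavior-cloning regularizer
    return -lam * q.mean() + bc
```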
## Evaluation Results
| Metric | Value |
|---|---|
| action_mse | 0.374415 |
| gazebo_success_rate | 0/5 (0%) |
| gazebo_avg_lift | 0.0002 |
### Offline Policy Evaluation
Action MSE measures how closely the policy reproduces the demonstration actions:
- TD3+BC: MSE = 0.374 (poor action matching)
- IQL (tau=0.7): MSE = 0.0027 (good)
- IQL (tau=0.9): MSE = 0.0012 (best)
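The action-MSE metric above is simply the mean squared error between policy outputs and demonstration actions over held-out transitions; a minimal sketch:

```python
import numpy as np

def action_mse(policy_actions, demo_actions):
    """Mean squared error between policy and demonstration actions."""
    policy_actions = np.asarray(policy_actions, dtype=np.float64)
    demo_actions = np.asarray(demo_actions, dtype=np.float64)
    return float(np.mean((policy_actions - demo_actions) ** 2))
```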
### Gazebo Evaluation (5 episodes)
All models achieved a 0% success rate in the initial pilot evaluation. This is expected for a first iteration; the models need further refinement (e.g., longer training, reward shaping, or residual RL integration).
## Files
- `checkpoint.pt` - Model weights (PyTorch)
- `training_code.py` - Training implementation
- `training_log.csv` - Training metrics over time
- `eval_gazebo.csv` - Gazebo evaluation results
- `dataset_stats.json` - Dataset normalization statistics
- `config.json` - Model configuration
## Usage
```python
import json

import torch

# Load checkpoint
ckpt = torch.load("checkpoint.pt", map_location="cpu", weights_only=True)

# Load dataset stats for state normalization
with open("dataset_stats.json") as f:
    stats = json.load(f)
state_mean = torch.tensor(stats["state_mean"])
state_std = torch.tensor(stats["state_std"])
```
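States should be normalized with these dataset statistics before being fed to the policy, matching the `zero_mean_unit_var` scheme from the hyperparameter table. A sketch; the epsilon guard is an assumption, not taken from the training code:

```python
import torch

def normalize_state(state, mean, std, eps=1e-6):
    # zero-mean / unit-variance normalization, guarded against zero std
    return (state - mean) / (std + eps)
```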
## Joint Names
```python
JOINTS = [
    'shoulder_pan_joint',      # idx 0
    'shoulder_lift_joint',     # idx 1
    'upperarm_roll_joint',     # idx 2
    'elbow_flex_joint',        # idx 3
    'forearm_roll_joint',      # idx 4
    'wrist_flex_joint',        # idx 5
    'wrist_roll_joint',        # idx 6
    'l_gripper_finger_joint',  # idx 7
    'r_gripper_finger_joint',  # idx 8
]
```
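Policy outputs live in [-1, 1] (see the action-normalization entry in the hyperparameter table) and must be mapped back to joint-space targets using each joint's limits. A sketch, where `low` and `high` are placeholder limit arrays ordered as in `JOINTS`, not the actual Fetch joint specifications:

```python
import numpy as np

def denormalize_action(action, low, high):
    # clip to the trained range, then linearly map [-1, 1] -> [low, high]
    action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)
```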
## Citation
```bibtex
@misc{fetch_offline_rl_pilot,
  title={Offline RL Pilot Study for Fetch Robot Pick-and-Place},
  year={2026},
}
```