Upload trained policies for Approach, Grasp, and Transport-1
Files changed:
- README.md (+61 lines)
- agent_config.yaml (+96 lines)
- env_config.py (+264 lines)
- policy_approach.pth (LFS pointer, +3 lines)
- policy_grasp.pth (LFS pointer, +3 lines)
- policy_transport_gear_1.pth (LFS pointer, +3 lines)
README.md
ADDED
---
license: mit
tags:
- reinforcement-learning
- robotics
- isaac-lab
- rtx-5090
- industrial-assembly
datasets:
- simulation
---

# Galaxea Gearbox Assembly R1 Policies

This repository contains the trained Reinforcement Learning (RL) policies for a high-precision gearbox assembly task using the Galaxea R1 robot. The models were trained with **NVIDIA Isaac Lab** on a single **NVIDIA RTX 5090**, reaching roughly 8,200 FPS of simulation throughput with stable convergence.

## Model Description

The policies control the dual-arm Galaxea R1 robot (two 6-DoF arms, each with a 1-DoF gripper) to assemble a planetary gearbox. The task is decomposed into sequential sub-tasks, `Approach` -> `Grasp` -> `Transport`, repeated for each gear; a sketch of this sequencing follows the list below.

- **Algorithm**: PPO (Proximal Policy Optimization) via `rl_games`
- **Observation Space**: 69-dim (joint positions/velocities, end-effector poses, relative gear targets)
- **Action Space**: 14-dim (6 joint position targets per arm + 1 gripper command per arm)
- **Training Framework**: Isaac Lab (`DirectRLEnv` workflow)
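At deployment the sub-policies are chained: the active stage's policy acts until a rule-based completion check fires, then control hands over to the next stage. A minimal, illustrative sketch of that pattern (the `policies` mapping and `stage_done` predicate are hypothetical stand-ins; the real transition rules and thresholds live in `env_config.py`):

```python
# Illustrative only: chaining per-stage policies with rule-based transitions.
STAGES = ["approach", "grasp", "transport_gear_1"]

def step_stage_machine(policies, stage_done, stage_idx, obs):
    """Advance to the next stage once the current one completes, then act."""
    if stage_idx < len(STAGES) - 1 and stage_done(STAGES[stage_idx], obs):
        stage_idx += 1  # hand control to the next sub-policy
    action = policies[STAGES[stage_idx]](obs)
    return action, stage_idx
```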
## Performance Metrics

Training sustained a throughput of **~8,200 FPS** (simulation frames per second) using full GPU vectorization.

| Policy | Stage | Avg Reward | Critic Loss | Entropy | Status |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Approach** | 1 (Foundation) | ~241.4 | 3.8e-5 | 2.58 | **Converged** |
| **Grasp** | 2 (Manipulation) | ~240.9 | 3.3e-5 | -0.92 | **Converged** |
| **Transport 1** | 3 (Assembly) | ~282.6 | 1.7e-4 | 11.2 | **Robust** |

## Included Files

- `policy_approach.pth`: PyTorch checkpoint for the Approach phase.
- `policy_grasp.pth`: PyTorch checkpoint for the Grasp phase.
- `policy_transport_gear_1.pth`: PyTorch checkpoint for transporting the first sun gear.
- `env_config.py`: the environment configuration used for training (PhysX settings, rewards, thresholds).
- `agent_config.yaml`: the PPO hyperparameters for `rl_games`.
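The `.pth` files are standard `torch.save` artifacts written by `rl_games`. A quick way to inspect one before wiring it into an environment (a sketch; the exact dictionary keys depend on the `rl_games` version used for training):

```python
# Sketch: peek inside a checkpoint without building the full agent.
import torch

ckpt = torch.load("policy_approach.pth", map_location="cpu")
if isinstance(ckpt, dict):
    # rl_games checkpoints typically bundle the model state dict with
    # training metadata such as the epoch counter and optimizer state.
    print(list(ckpt.keys()))
```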
## Usage

These policies are designed to be loaded into the Isaac Lab environment through `rl_games`. Note that `Runner.load()` consumes the parsed YAML config; the checkpoint path is passed to `Runner.run()`:

```python
# Sketch: load the agent config, then run a checkpoint in inference ("play") mode.
import yaml
from rl_games.torch_runner import Runner

with open("agent_config.yaml") as f:
    cfg = yaml.safe_load(f)

runner = Runner()
runner.load(cfg)
runner.run({"train": False, "play": True, "checkpoint": "policy_approach.pth", "sigma": None})
```
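Because `env_name: rlgpu` in `agent_config.yaml` is only a registry key, the environment must be registered with `rl_games` before `runner.run(...)` is called. A sketch of that step following Isaac Lab's rl_games workflow (the `isaaclab_rl` import path and wrapper names are assumptions based on recent Isaac Lab releases, and the task id must already be registered with gymnasium by the task package, which is not included here):

```python
# Sketch: register the Isaac Lab env under the 'rlgpu' key the config expects.
import gymnasium as gym
from rl_games.common import env_configurations, vecenv
from isaaclab_rl.rl_games import RlGamesGpuEnv, RlGamesVecEnvWrapper  # assumed path

env = gym.make("Galaxea-LongTrajectoryAssembly-Direct-v0")
# Clip values mirror the env section of agent_config.yaml.
env = RlGamesVecEnvWrapper(env, "cuda:0", clip_obs=5.0, clip_actions=1.0)

vecenv.register(
    "IsaacRlgWrapper",
    lambda config_name, num_actors, **kwargs: RlGamesGpuEnv(config_name, num_actors, **kwargs),
)
env_configurations.register("rlgpu", {"vecenv_type": "IsaacRlgWrapper", "env_creator": lambda **kwargs: env})
```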
## Hardware Specification

- **GPU**: NVIDIA GeForce RTX 5090 (32 GB)
- **Training Time**: ~3 hours per policy (down from an estimated 50+ days before GPU vectorization)
- **Parallel Envs**: 8,192
agent_config.yaml
ADDED
# RL-Games PPO Configuration for Long Trajectory Assembly
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.

params:
  seed: 42

  # Environment wrapper clipping
  env:
    clip_observations: 5.0
    clip_actions: 1.0

  algo:
    name: a2c_continuous

  model:
    name: continuous_a2c_logstd

  network:
    name: actor_critic
    separate: False

    space:
      continuous:
        mu_activation: None
        sigma_activation: None
        mu_init:
          name: default
        sigma_init:
          name: const_initializer
          val: 0
        fixed_sigma: True

    mlp:
      units: [512, 256, 128]
      activation: elu
      d2rl: False

      initializer:
        name: default
      regularizer:
        name: None

  load_checkpoint: False
  load_path: ''

  config:
    name: Galaxea-LongTrajectoryAssembly-Direct-v0
    full_experiment_name: LongTrajectoryAssembly

    env_name: rlgpu
    device: 'cuda:0'
    device_name: 'cuda:0'
    multi_gpu: False
    ppo: True
    mixed_precision: False
    normalize_input: True
    normalize_value: True
    # value_bootstrap: True  # commented out to match Isaac Lab examples
    num_actors: -1  # will be set by num_envs
    reward_shaper:
      scale_value: 1.0
    normalize_advantage: True

    gamma: 0.99
    tau: 0.95
    learning_rate: 3e-4
    lr_schedule: adaptive
    kl_threshold: 0.008

    score_to_win: 100000
    max_epochs: 5000
    save_best_after: 100
    save_frequency: 100
    print_stats: True

    grad_norm: 1.0
    entropy_coef: 0.001
    truncate_grads: True

    e_clip: 0.2
    clip_value: True

    # PPO specific
    horizon_length: 32
    minibatch_size: 16384
    mini_epochs: 8
    critic_coef: 2
    bounds_loss_coef: 0.0001  # bounds loss coefficient (prevents the b_loss error when unset)

    # Training
    games_to_track: 100
    player:
      deterministic: True
      games_num: 1000000
      print_stats: True
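One detail worth checking in this config: with the 8,192 parallel environments listed in the README, `horizon_length: 32` produces a rollout batch of 262,144 transitions per update, which `minibatch_size: 16384` splits into exactly 16 minibatches, satisfying rl_games' requirement that the minibatch size divide the batch evenly. A quick arithmetic check (the 8,192-env count is an assumption taken from the README; `num_actors: -1` above is a placeholder filled in at launch):

```python
# Sanity-check the PPO batch geometry implied by agent_config.yaml.
horizon_length = 32     # steps collected per env per update
num_actors = 8192       # parallel envs (from the README); replaces num_actors: -1
minibatch_size = 16384

batch_size = horizon_length * num_actors  # 262,144 transitions per PPO update
assert batch_size % minibatch_size == 0, "rl_games needs an even minibatch split"
print(batch_size // minibatch_size)  # -> 16 minibatches per mini-epoch
```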
env_config.py
ADDED
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""Configuration for the Long Trajectory Gear Assembly environment."""

from isaaclab.assets import ArticulationCfg, RigidObjectCfg
from isaaclab.envs import DirectRLEnvCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import CameraCfg
from isaaclab.sim import SimulationCfg, PhysxCfg
from isaaclab.utils import configclass

from Galaxea_Lab_External.robots import (
    GALAXEA_R1_CHALLENGE_CFG,
    GALAXEA_HEAD_CAMERA_CFG,
    GALAXEA_HAND_CAMERA_CFG,
    TABLE_CFG,
    RING_GEAR_CFG,
    SUN_PLANETARY_GEAR_CFG,
    PLANETARY_CARRIER_CFG,
    PLANETARY_REDUCER_CFG,
)


@configclass
class LongTrajectoryAssemblyEnvCfg(DirectRLEnvCfg):
    """Configuration for the Long Trajectory Gear Assembly environment.

    This environment supports multi-stage assembly tasks with 8 policies:
    - Policy_Approach (shared across all gears)
    - Policy_Grasp (shared across all gears)
    - Policy_Transport for Gear 1-4, Carrier, and Reducer (6 object-specific policies)

    Stage transitions are handled by rule-based checks inside the environment.
    """

    # Data recording settings
    record_data = False
    record_freq = 5

    # Camera settings (disabled by default for RL training)
    enable_cameras = False

    # Environment settings
    sim_dt = 0.01
    decimation = 5
    episode_length_s = 120.0  # long trajectory: 120 seconds max

    # Number of re-renders on reset (for camera sensors)
    num_rerenders_on_reset = 5

    # Action and observation spaces
    # Action: left arm (6) + right arm (6) + left gripper (1) + right gripper (1) = 14
    action_space = 14
    # Observation space:
    # - joint pos: 6 + 6 + 1 + 1 = 14
    # - joint vel: 6 + 6 + 1 + 1 = 14
    # - EE poses: 3 + 4 + 3 + 4 = 14
    # - gear obs: 3 + 4 + 3 + 3 + 4 + 1 = 18
    # - encodings: 3 + 6 = 9
    # Total = 69
    observation_space = 69
    state_space = 0

    # Simulation configuration.
    # Increase the GPU collision stack size to handle many environments (default is 2**26).
    sim: SimulationCfg = SimulationCfg(
        dt=sim_dt,
        render_interval=decimation,
        physx=PhysxCfg(
            gpu_collision_stack_size=2**31,  # increased for 8192+ envs (~2.1 billion)
        ),
    )

    # Robot configuration
    robot_cfg: ArticulationCfg = GALAXEA_R1_CHALLENGE_CFG.replace(
        prim_path="/World/envs/env_.*/Robot"
    )

    # Table configuration
    table_cfg: RigidObjectCfg = TABLE_CFG.replace(
        prim_path="/World/envs/env_.*/Table"
    )

    # Gear configurations with default initial positions
    ring_gear_cfg: RigidObjectCfg = RING_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/ring_gear",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.45, 0.0, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    sun_planetary_gear_1_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_1",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.4, -0.2, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    sun_planetary_gear_2_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_2",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.5, -0.25, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    sun_planetary_gear_3_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_3",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.45, -0.15, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    sun_planetary_gear_4_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_4",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.55, -0.3, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    planetary_carrier_cfg: RigidObjectCfg = PLANETARY_CARRIER_CFG.replace(
        prim_path="/World/envs/env_.*/planetary_carrier",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.5, 0.25, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    planetary_reducer_cfg: RigidObjectCfg = PLANETARY_REDUCER_CFG.replace(
        prim_path="/World/envs/env_.*/planetary_reducer",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.3, 0.1, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        ),
    )

    # Physics material coefficients
    table_friction_coefficient = 0.4
    gears_friction_coefficient = 0.01
    gripper_friction_coefficient = 2.0

    # Camera configurations
    head_camera_cfg: CameraCfg = GALAXEA_HEAD_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/zed_link/head_cam/head_cam"
    )
    left_hand_camera_cfg: CameraCfg = GALAXEA_HAND_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/left_realsense_link/left_hand_cam/left_hand_cam"
    )
    right_hand_camera_cfg: CameraCfg = GALAXEA_HAND_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/right_realsense_link/right_hand_cam/right_hand_cam"
    )

    # Scene configuration
    scene: InteractiveSceneCfg = InteractiveSceneCfg(
        num_envs=1,
        env_spacing=4.0,
        replicate_physics=True,
    )

    # Joint names for robot control
    left_arm_joint_dof_name = "left_arm_joint.*"
    right_arm_joint_dof_name = "right_arm_joint.*"
    left_gripper_dof_name = "left_gripper_axis1"
    right_gripper_dof_name = "right_gripper_axis1"
    torso_joint_dof_name = "torso_joint[1-3]"
    torso_joint1_dof_name = "torso_joint1"
    torso_joint2_dof_name = "torso_joint2"
    torso_joint3_dof_name = "torso_joint3"
    torso_joint4_dof_name = "torso_joint4"

    # Initial torso joint positions
    initial_torso_joint1_pos = 0.5
    initial_torso_joint2_pos = -0.8
    initial_torso_joint3_pos = 0.5

    # Table offset
    x_offset = 0.2

    # Assembly precision (1 cm, as specified in the task requirements)
    assembly_precision = 0.01  # 1 cm

    # Stage timeout configuration (seconds per sub-task)
    stage_timeout_approach = 10.0
    stage_timeout_grasp = 5.0
    stage_timeout_transport = 15.0

    # Reward weights
    reward_approach_distance_weight = 0.1  # horizontal distance to be above the gear
    reward_approach_height_weight = 0.1  # correct pre-grasp height
    reward_approach_orientation_weight = 0.1  # gripper pointing downward
    reward_approach_gripper_open_weight = 0.05  # gripper is open
    reward_approach_complete_bonus = 1.0
    reward_grasp_gripper_weight = 0.1
    reward_grasp_contact_weight = 0.1
    reward_grasp_lift_weight = 0.1
    reward_grasp_complete_bonus = 2.0
    reward_transport_distance_weight = 0.2  # horizontal alignment reward
    reward_transport_height_weight = 0.2  # height alignment reward
    reward_transport_orientation_weight = 0.1  # orientation alignment reward
    reward_transport_stability_weight = 0.1  # low-velocity reward
    reward_transport_complete_bonus = 10.0  # bonus for meeting evaluate_score criteria
    reward_transition_bonus = 5.0
    reward_time_penalty = 0.001

    # Approach completion thresholds
    approach_distance_threshold = 0.05  # 5 cm to gear center (deprecated; use the thresholds below)
    approach_horizontal_threshold = 0.03  # 3 cm; EE must be directly above the gear
    approach_height_threshold = 0.02  # 2 cm tolerance for pre-grasp height
    approach_orientation_threshold = 0.3  # radians (deprecated)
    approach_orientation_dot_threshold = 0.95  # quaternion dot-product threshold (close to 1 = aligned)
    gripper_open_threshold = 0.03  # gripper must be at least this open (rad)
    pre_grasp_height_offset = 0.05  # 5 cm above the gear for the pre-grasp position

    # Grasp completion thresholds
    grasp_gripper_closed_threshold = 0.8  # normalized gripper position
    grasp_contact_force_threshold = 2.0  # Newtons
    grasp_lift_height = 0.1  # 10 cm above the table

    # Transport completion thresholds
    transport_position_threshold = 0.01  # 1 cm precision
    transport_orientation_threshold = 0.1  # radians
    transport_stability_velocity_threshold = 0.01  # m/s

    # Gear assembly sequence
    gear_sequence = [
        "gear_1",   # sun planetary gear 1
        "gear_2",   # sun planetary gear 2
        "gear_3",   # sun planetary gear 3
        "gear_4",   # sun planetary gear 4 (center)
        "carrier",  # planetary carrier onto the ring gear
        "reducer",  # planetary reducer onto gear 4
    ]

    # Pin local positions relative to the planetary carrier
    pin_0_local_pos = (0.0, -0.054, 0.0)
    pin_1_local_pos = (0.0471, 0.0268, 0.0)
    pin_2_local_pos = (-0.0471, 0.0268, 0.0)

    # TCP (Tool Center Point) offsets
    tcp_offset_x = 0.0079  # 0.3864 - 0.3785
    tcp_offset_z = 0.0909  # 1.1475 - 1.05661

    # Table and grasping heights
    table_height = 0.9
    grasping_height = -0.003
    lifting_height = 0.2

    # Sub-task types for training-mode selection:
    #   "full" - train the entire sequence
    #   "approach" - train only the approach sub-task
    #   "grasp" - train only the grasp sub-task
    #   "transport_gear_1" through "transport_reducer" - train a specific transport
    training_subtask = "full"

    # Starting gear index for curriculum learning (0-5)
    curriculum_start_gear_idx = 0
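For reference, a short sketch of how a config like this is typically customized for a single-stage run; the attribute names come from `env_config.py` above, but the environment class that consumes them is not part of this repository:

```python
# Sketch: override the config for a single-stage, vectorized training run.
from env_config import LongTrajectoryAssemblyEnvCfg

cfg = LongTrajectoryAssemblyEnvCfg()
cfg.scene.num_envs = 8192          # scale up from the default of 1
cfg.training_subtask = "approach"  # train only the Approach sub-task
cfg.enable_cameras = False         # pure state-based RL; skip camera rendering

# Effective control rate: physics at 1/sim_dt = 100 Hz, one action every
# `decimation` physics steps.
control_dt = cfg.sim_dt * cfg.decimation  # 0.05 s -> 20 Hz policy rate
```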
policy_approach.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fd7a5dd8c95da78f3e09020569ba534d216e66320269777a7328eae358179221
size 2443653
policy_grasp.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:2e1d879be6f34282d76421daac59455279304a6fe7c407399305c783708af5b6
size 2443653
policy_transport_gear_1.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:5eecd4be84743813ea59f91938abc1b849b40d8bd069f13d84e84bbad12a36e9
size 2443653