yjsm1203 committed
Commit d7f019f · verified · 1 parent: a6a889b

Upload trained policies for Approach, Grasp, and Transport-1

README.md ADDED
@@ -0,0 +1,61 @@
---
license: mit
tags:
- reinforcement-learning
- robotics
- isaac-lab
- rtx-5090
- industrial-assembly
datasets:
- simulation
---

# Galaxea Gearbox Assembly R1 Policies

This repository contains the trained Reinforcement Learning (RL) policies for the high-precision gearbox assembly task using the Galaxea R1 robot. The models were trained with **NVIDIA Isaac Lab** on a single **NVIDIA RTX 5090**, achieving high simulation throughput (~8,200 FPS) and stable convergence.

## Model Description

The policies are trained to control the dual-arm Galaxea R1 (two 6-DoF arms, each with a gripper) to assemble a planetary gearbox. The task is decomposed into sequential sub-tasks, `Approach` -> `Grasp` -> `Transport`, repeated for each gear.

- **Algorithm**: PPO (Proximal Policy Optimization) via `rl_games`
- **Observation Space**: 69-dim (joint positions/velocities, end-effector poses, relative gear targets)
- **Action Space**: 14-dim (arm joint position targets + grippers)
- **Training Framework**: Isaac Lab (DirectRL workflow)

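To sanity-check these interfaces against a downloaded checkpoint, the parameter tensors can be listed directly. The snippet below is a minimal sketch that assumes the `.pth` files follow the usual `rl_games` checkpoint layout (a dictionary containing a `model` state dict); that layout is an assumption, not something this repository guarantees.

```python
# Minimal sketch: print parameter shapes of a checkpoint to confirm the
# 69-dim observation / 14-dim action interfaces described above.
# Assumes a standard rl_games checkpoint ({'model': state_dict, ...}).
# (On recent PyTorch versions you may need torch.load(..., weights_only=False).)
import torch

ckpt = torch.load("policy_approach.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to a plain state dict
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```
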
## Performance Metrics

The models were trained at a throughput of roughly **8,200 FPS** (environment frames per second) using full GPU vectorization.

| Policy | Stage | Avg Reward | Critic Loss | Entropy | Status |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Approach** | 1 (Foundation) | ~241.4 | 3.8e-5 | 2.58 | **Converged** |
| **Grasp** | 2 (Manipulation) | ~240.9 | 3.3e-5 | -0.92 | **Converged** |
| **Transport 1** | 3 (Assembly) | ~282.6 | 1.7e-4 | 11.2 | **Robust** |

## Included Files

- `policy_approach.pth`: PyTorch checkpoint for the Approach phase.
- `policy_grasp.pth`: PyTorch checkpoint for the Grasp phase.
- `policy_transport_gear_1.pth`: PyTorch checkpoint for transporting the first sun gear.
- `env_config.py`: The environment configuration used for training (PhysX settings, rewards); see the sketch after this list for overriding its fields.
- `agent_config.yaml`: The PPO hyperparameters.

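The environment configuration is an Isaac Lab `configclass`, so individual fields can be overridden in code. A minimal sketch, assuming `env_config.py` is importable and that the `Galaxea_Lab_External` assets it references are installed:

```python
# Sketch: instantiate the shipped configuration and override a few fields.
# The field names come from env_config.py; the import path is an assumption.
from env_config import LongTrajectoryAssemblyEnvCfg

cfg = LongTrajectoryAssemblyEnvCfg()
cfg.scene.num_envs = 16            # the file defaults to 1
cfg.training_subtask = "approach"  # train/evaluate only the Approach sub-task
cfg.enable_cameras = False         # cameras stay disabled for RL training
```
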
## Usage

These policies are designed to be loaded into the Isaac Lab environment, e.g. through the `rl_games` runner:

```python
# Pseudo-code for loading a policy with the rl_games runner.
# Runner.load() expects the agent config (agent_config.yaml); the .pth
# checkpoint is then restored through the player. Assumes this runs inside
# an Isaac Lab workflow where the `rlgpu` environment from the config
# is registered.
import yaml
from rl_games.torch_runner import Runner

runner = Runner()
runner.load(yaml.safe_load(open("agent_config.yaml")))
player = runner.create_player()
player.restore("policy_approach.pth")
# ... run inference ...
```
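
After restoring a checkpoint, actions are queried from the player step by step. The loop below is a hypothetical sketch: `env` is assumed to be an `rl_games`-compatible wrapper around the Isaac Lab task (not included here), and the exact `get_action` signature can differ slightly between `rl_games` versions.

```python
# Hypothetical inference loop; `env` and its wrapper are assumptions,
# not part of this repository.
obs = env.reset()
for _ in range(1000):
    action = player.get_action(obs, True)  # second argument: deterministic
    obs, reward, done, info = env.step(action)
```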

## Hardware Specification

- **GPU**: NVIDIA GeForce RTX 5090 (32 GB)
- **Training Time**: ~3 hours per policy (down from 50+ days before optimization)
- **Parallel Environments**: 8,192
agent_config.yaml ADDED
@@ -0,0 +1,96 @@
# RL-Games PPO Configuration for Long Trajectory Assembly
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.

params:
  seed: 42

  # Environment wrapper clipping
  env:
    clip_observations: 5.0
    clip_actions: 1.0

  algo:
    name: a2c_continuous

  model:
    name: continuous_a2c_logstd

  network:
    name: actor_critic
    separate: False

    space:
      continuous:
        mu_activation: None
        sigma_activation: None
        mu_init:
          name: default
        sigma_init:
          name: const_initializer
          val: 0
        fixed_sigma: True

    mlp:
      units: [512, 256, 128]
      activation: elu
      d2rl: False

      initializer:
        name: default
      regularizer:
        name: None

  load_checkpoint: False
  load_path: ''

  config:
    name: Galaxea-LongTrajectoryAssembly-Direct-v0
    full_experiment_name: LongTrajectoryAssembly

    env_name: rlgpu
    device: 'cuda:0'
    device_name: 'cuda:0'
    multi_gpu: False
    ppo: True
    mixed_precision: False
    normalize_input: True
    normalize_value: True
    # value_bootstrap: True  # Commented out to match Isaac Lab examples
    num_actors: -1  # Will be set by num_envs
    reward_shaper:
      scale_value: 1.0
    normalize_advantage: True

    gamma: 0.99
    tau: 0.95
    learning_rate: 3e-4
    lr_schedule: adaptive
    kl_threshold: 0.008

    score_to_win: 100000
    max_epochs: 5000
    save_best_after: 100
    save_frequency: 100
    print_stats: True

    grad_norm: 1.0
    entropy_coef: 0.001
    truncate_grads: True

    e_clip: 0.2
    clip_value: True

    # PPO specific
    horizon_length: 32
    minibatch_size: 16384
    mini_epochs: 8
    critic_coef: 2
    bounds_loss_coef: 0.0001  # Add bounds loss coefficient to prevent b_loss error

    # Training
    games_to_track: 100
    player:
      deterministic: True
      games_num: 1000000
      print_stats: True

env_config.py ADDED
@@ -0,0 +1,264 @@
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""Configuration for Long Trajectory Gear Assembly Environment."""

from isaaclab.assets import ArticulationCfg, RigidObjectCfg
from isaaclab.envs import DirectRLEnvCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sim import SimulationCfg, PhysxCfg
from isaaclab.utils import configclass
from isaaclab.sensors import CameraCfg

from Galaxea_Lab_External.robots import (
    GALAXEA_R1_CHALLENGE_CFG,
    GALAXEA_HEAD_CAMERA_CFG,
    GALAXEA_HAND_CAMERA_CFG,
    TABLE_CFG,
    RING_GEAR_CFG,
    SUN_PLANETARY_GEAR_CFG,
    PLANETARY_CARRIER_CFG,
    PLANETARY_REDUCER_CFG,
)


@configclass
class LongTrajectoryAssemblyEnvCfg(DirectRLEnvCfg):
    """Configuration for Long Trajectory Gear Assembly Environment.

    This environment supports multi-stage assembly tasks with 8 policies:
    - Policy_Approach (shared across all gears)
    - Policy_Grasp (shared across all gears)
    - Policy_Transport_Gear1~4, Carrier, Reducer (6 gear-specific policies)

    Transitions between sub-tasks are handled by rule-based checks in the environment.
    """

    # Record data settings
    record_data = False
    record_freq = 5

    # Camera settings (disabled by default for RL training)
    enable_cameras = False

    # Environment settings
    sim_dt = 0.01
    decimation = 5
    episode_length_s = 120.0  # Long trajectory: 120 seconds max

    # Number of re-renders on reset (for camera sensors)
    num_rerenders_on_reset = 5

    # Action and observation spaces
    # Action: Left arm(6) + Right arm(6) + Left gripper(1) + Right gripper(1) = 14
    action_space = 14
    # Observation space:
    # - Joint pos: 6+6+1+1 = 14
    # - Joint vel: 6+6+1+1 = 14
    # - EE poses: 3+4+3+4 = 14
    # - Gear obs: 3+4+3+3+4+1 = 18
    # - Encodings: 3+6 = 9
    # Total = 69
    observation_space = 69
    state_space = 0

    # Simulation configuration
    # Increase GPU collision stack size to handle many environments (default 2**26)
    sim: SimulationCfg = SimulationCfg(
        dt=sim_dt,
        render_interval=decimation,
        physx=PhysxCfg(
            gpu_collision_stack_size=2**31,  # Increased for 8192+ envs (approx 2.1 billion)
        )
    )

    # Robot configuration
    robot_cfg: ArticulationCfg = GALAXEA_R1_CHALLENGE_CFG.replace(
        prim_path="/World/envs/env_.*/Robot"
    )

    # Table configuration
    table_cfg: RigidObjectCfg = TABLE_CFG.replace(
        prim_path="/World/envs/env_.*/Table"
    )

    # Gear configurations with default initial positions
    ring_gear_cfg: RigidObjectCfg = RING_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/ring_gear",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.45, 0.0, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    sun_planetary_gear_1_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_1",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.4, -0.2, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    sun_planetary_gear_2_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_2",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.5, -0.25, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    sun_planetary_gear_3_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_3",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.45, -0.15, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    sun_planetary_gear_4_cfg: RigidObjectCfg = SUN_PLANETARY_GEAR_CFG.replace(
        prim_path="/World/envs/env_.*/sun_planetary_gear_4",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.55, -0.3, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    planetary_carrier_cfg: RigidObjectCfg = PLANETARY_CARRIER_CFG.replace(
        prim_path="/World/envs/env_.*/planetary_carrier",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.5, 0.25, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    planetary_reducer_cfg: RigidObjectCfg = PLANETARY_REDUCER_CFG.replace(
        prim_path="/World/envs/env_.*/planetary_reducer",
        init_state=RigidObjectCfg.InitialStateCfg(
            pos=(0.3, 0.1, 1.0),
            rot=(1.0, 0.0, 0.0, 0.0),
        )
    )

    # Physics material coefficients
    table_friction_coefficient = 0.4
    gears_friction_coefficient = 0.01
    gripper_friction_coefficient = 2.0

    # Camera configurations
    head_camera_cfg: CameraCfg = GALAXEA_HEAD_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/zed_link/head_cam/head_cam"
    )
    left_hand_camera_cfg: CameraCfg = GALAXEA_HAND_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/left_realsense_link/left_hand_cam/left_hand_cam"
    )
    right_hand_camera_cfg: CameraCfg = GALAXEA_HAND_CAMERA_CFG.replace(
        prim_path="/World/envs/env_.*/Robot/right_realsense_link/right_hand_cam/right_hand_cam"
    )

    # Scene configuration
    scene: InteractiveSceneCfg = InteractiveSceneCfg(
        num_envs=1,
        env_spacing=4.0,
        replicate_physics=True
    )

    # Joint names for robot control
    left_arm_joint_dof_name = "left_arm_joint.*"
    right_arm_joint_dof_name = "right_arm_joint.*"
    left_gripper_dof_name = "left_gripper_axis1"
    right_gripper_dof_name = "right_gripper_axis1"
    torso_joint_dof_name = "torso_joint[1-3]"
    torso_joint1_dof_name = "torso_joint1"
    torso_joint2_dof_name = "torso_joint2"
    torso_joint3_dof_name = "torso_joint3"
    torso_joint4_dof_name = "torso_joint4"

    # Initial torso joint positions
    initial_torso_joint1_pos = 0.5
    initial_torso_joint2_pos = -0.8
    initial_torso_joint3_pos = 0.5

    # Table offset
    x_offset = 0.2

    # Assembly precision (1cm as specified in requirements)
    assembly_precision = 0.01  # 1cm

    # Stage timeout configuration (seconds per sub-task)
    stage_timeout_approach = 10.0
    stage_timeout_grasp = 5.0
    stage_timeout_transport = 15.0

    # Reward weights
    reward_approach_distance_weight = 0.1  # Horizontal distance to be above gear
    reward_approach_height_weight = 0.1  # Correct pre-grasp height
    reward_approach_orientation_weight = 0.1  # Gripper pointing downward
    reward_approach_gripper_open_weight = 0.05  # Gripper is open
    reward_approach_complete_bonus = 1.0
    reward_grasp_gripper_weight = 0.1
    reward_grasp_contact_weight = 0.1
    reward_grasp_lift_weight = 0.1
    reward_grasp_complete_bonus = 2.0
    reward_transport_distance_weight = 0.2  # Horizontal alignment reward
    reward_transport_height_weight = 0.2  # Height alignment reward
    reward_transport_orientation_weight = 0.1  # Orientation alignment reward
    reward_transport_stability_weight = 0.1  # Low velocity reward
    reward_transport_complete_bonus = 10.0  # Bonus for meeting evaluate_score criteria
    reward_transition_bonus = 5.0
    reward_time_penalty = 0.001

    # Approach completion thresholds
    approach_distance_threshold = 0.05  # 5cm to gear center (deprecated, use below)
    approach_horizontal_threshold = 0.03  # 3cm - EE must be directly above gear
    approach_height_threshold = 0.02  # 2cm tolerance for pre-grasp height
    approach_orientation_threshold = 0.3  # radians (deprecated)
    approach_orientation_dot_threshold = 0.95  # quaternion dot product threshold (close to 1 = aligned)
    gripper_open_threshold = 0.03  # gripper must be at least this open (rad)
    pre_grasp_height_offset = 0.05  # 5cm above gear for pre-grasp position

    # Grasp completion thresholds
    grasp_gripper_closed_threshold = 0.8  # normalized gripper position
    grasp_contact_force_threshold = 2.0  # Newtons
    grasp_lift_height = 0.1  # 10cm above table

    # Transport completion thresholds
    transport_position_threshold = 0.01  # 1cm precision
    transport_orientation_threshold = 0.1  # radians
    transport_stability_velocity_threshold = 0.01  # m/s

    # Gear assembly sequence
    gear_sequence = [
        "gear_1",   # Sun planetary gear 1
        "gear_2",   # Sun planetary gear 2
        "gear_3",   # Sun planetary gear 3
        "gear_4",   # Sun planetary gear 4 (center)
        "carrier",  # Planetary carrier onto ring gear
        "reducer",  # Planetary reducer onto gear 4
    ]

    # Pin local positions relative to planetary carrier
    pin_0_local_pos = (0.0, -0.054, 0.0)
    pin_1_local_pos = (0.0471, 0.0268, 0.0)
    pin_2_local_pos = (-0.0471, 0.0268, 0.0)

    # TCP (Tool Center Point) offsets
    tcp_offset_x = 0.0079  # 0.3864 - 0.3785
    tcp_offset_z = 0.0909  # 1.1475 - 1.05661

    # Table and grasping heights
    table_height = 0.9
    grasping_height = -0.003
    lifting_height = 0.2

    # Sub-task types for training mode selection
    # "full" - train entire sequence
    # "approach" - train only approach sub-task
    # "grasp" - train only grasp sub-task
    # "transport_gear_1" through "transport_reducer" - train specific transport
    training_subtask = "full"

    # Starting gear index for curriculum learning (0-5)
    curriculum_start_gear_idx = 0

policy_approach.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd7a5dd8c95da78f3e09020569ba534d216e66320269777a7328eae358179221
size 2443653
policy_grasp.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2e1d879be6f34282d76421daac59455279304a6fe7c407399305c783708af5b6
size 2443653
policy_transport_gear_1.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5eecd4be84743813ea59f91938abc1b849b40d8bd069f13d84e84bbad12a36e9
size 2443653