Factory Feed-Forward PPO+OSC Teachers
This repository contains three non-recurrent RL-Games PPO+OSC teacher checkpoints for Isaac Lab Factory assembly tasks:
- Isaac-Factory-PegInsert-Direct-v0
- Isaac-Factory-GearMesh-Direct-v0
- Isaac-Factory-NutThread-Direct-v0
The teachers were trained from the stock Isaac Lab Factory RL-Games PPO+OSC configuration with both actor and central-value LSTM blocks removed. They are intended as clean teacher policies for behavior cloning or DAgger-style data collection, where each action should be a function of the current policy observation only.
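Because the teachers carry no recurrent state, relabeling works one observation at a time. A DAgger-style loop can roll out a student and then ask the teacher for an action at each visited observation, in any order. The sketch below illustrates that step under stated assumptions. `toy_teacher` is a hypothetical stand-in for a loaded checkpoint (any pure function from a 19D observation to a 6D action), and `relabel_with_teacher` is an illustrative helper, not part of Isaac Lab or RL-Games.

```python
import random
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, ...]
Action = Tuple[float, ...]


def relabel_with_teacher(
    teacher: Callable[[Obs], Action],
    visited_obs: Sequence[Obs],
) -> List[Tuple[Obs, Action]]:
    """DAgger-style relabeling: query the teacher on every observation the
    student visited. A feed-forward teacher has no hidden state, so each
    label depends only on its own observation."""
    return [(obs, teacher(obs)) for obs in visited_obs]


def toy_teacher(obs: Obs) -> Action:
    # Hypothetical stand-in for the 19D -> 6D policy: any pure function of obs.
    s = sum(obs)
    return tuple(s * (i + 1) for i in range(6))


if __name__ == "__main__":
    rng = random.Random(0)
    visited = [tuple(rng.uniform(-1.0, 1.0) for _ in range(19)) for _ in range(8)]
    dataset = relabel_with_teacher(toy_teacher, visited)
    # Relabel the same observations in shuffled order: labels are unchanged,
    # which would not hold for a recurrent teacher with carried-over state.
    shuffled = visited[:]
    rng.shuffle(shuffled)
    relabeled = dict(relabel_with_teacher(toy_teacher, shuffled))
    assert all(relabeled[obs] == act for obs, act in dataset)
    print(len(dataset), "labeled pairs")
```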
Architecture
- Actor: 19D policy observation -> MLP[512, 128, 64], ELU -> Gaussian 6D OSC action.
- Central critic: 43D privileged state -> MLP[512, 128, 64], ELU -> scalar value.
- RL-Games `seq_length=1`.
- Actor `rnn=null`.
- Central-value `network.rnn=null`.
The action space is the Factory 6D operational-space-control action. These
are not FORGE checkpoints and do not use the FORGE success-prediction action
slot.
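As a rough size check, the layer widths listed above imply the parameter counts below. This sketch assumes plain linear layers with biases, a linear mean head plus a state-independent 6D log-std vector for the Gaussian actor, and a one-unit linear value head for the critic. The actual checkpoints may hold additional tensors (for example observation-normalization statistics), so these numbers need not match the checkpoint `state_dict` sizes exactly.

```python
from typing import List, Sequence


def mlp_params(in_dim: int, hidden: Sequence[int]) -> int:
    """Count weights + biases of a stack of fully connected layers."""
    total, prev = 0, in_dim
    for width in hidden:
        total += prev * width + width  # weight matrix + bias vector
        prev = width
    return total


HIDDEN: List[int] = [512, 128, 64]

# Actor: 19D obs trunk + linear 6D mean head + 6D log-std vector (assumed).
actor = mlp_params(19, HIDDEN) + (64 * 6 + 6) + 6
# Critic: 43D privileged-state trunk + linear scalar value head.
critic = mlp_params(43, HIDDEN) + (64 * 1 + 1)

print("actor params:", actor)    # 84556
print("critic params:", critic)  # 96513
```

Under these assumptions the two networks are small (under 100k parameters each), so the dominant cost is the first 512-unit layer and the 512x128 layer.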
Checkpoints
| Task | File | Stop Epoch | Training Seed |
|---|---|---|---|
| PegInsert | checkpoints/peginsert_ff_ppo_osc_ep100.pth | 100 | 0 |
| GearMesh | checkpoints/gearmesh_ff_ppo_osc_ep150.pth | 150 | 0 |
| NutThread | checkpoints/nutthread_ff_ppo_osc_ep100.pth | 100 | 0 |
These files are raw PyTorch/RL-Games .pth checkpoints. Loading a .pth file
with `torch.load` can unpickle arbitrary Python objects, so load them only in
a trusted environment.
Evaluation
Evaluation used held-out reset seeds 100, 101, 102 for the gate runs and
100, 101, 102, 103, 104, 105 for the six-seed confirmation runs.
| Task | Gate Horizon | Gate Success | Six-Seed Success | Mean Return | Mean TTS |
|---|---|---|---|---|---|
| PegInsert | 150 | 376/384 (97.92%) | 750/768 (97.66%) | 350.12 | 35.72 |
| GearMesh | 300 | 378/384 (98.44%) | 750/768 (97.66%) | 719.53 | 75.02 |
| NutThread | 450 | 378/384 (98.44%) | 755/768 (98.31%) | 834.76 | 298.12 |
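The percentages in the evaluation table follow from the raw counts. The sketch below recomputes them and makes one implied quantity explicit: if every reset seed contributes the same number of episodes (an assumption; the summaries may record the exact split), both the 384-episode gate and the 768-episode confirmation correspond to 128 episodes per seed.

```python
from typing import List, Tuple

# (task, gate successes, six-seed successes), copied from the evaluation table.
RESULTS: List[Tuple[str, int, int]] = [
    ("PegInsert", 376, 750),
    ("GearMesh", 378, 750),
    ("NutThread", 378, 755),
]
GATE_SEEDS = [100, 101, 102]
CONFIRM_SEEDS = [100, 101, 102, 103, 104, 105]
GATE_EPISODES = 384
CONFIRM_EPISODES = 768


def pct(successes: int, episodes: int) -> str:
    """Success rate formatted to two decimals, as in the table."""
    return f"{100 * successes / episodes:.2f}%"


# Assuming an equal number of episodes per reset seed.
EPISODES_PER_SEED = GATE_EPISODES // len(GATE_SEEDS)
assert EPISODES_PER_SEED == CONFIRM_EPISODES // len(CONFIRM_SEEDS)

for task, gate, confirm in RESULTS:
    print(task, pct(gate, GATE_EPISODES), pct(confirm, CONFIRM_EPISODES))
```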
All gate and confirmation summaries record agent_runtime.is_rnn=false,
actor rnn=null, central-value network.rnn=null, and seq_length=1.
The JSON summaries are included under:
inspect/eval/videos/*_summary.json
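To re-check the feed-forward flags programmatically, a short script can scan the summaries. This sketch assumes that the dotted name `agent_runtime.is_rnn` corresponds to a nested `{"agent_runtime": {"is_rnn": ...}}` object in each JSON file; no other part of the summary schema is assumed, and `non_feedforward_summaries` is a hypothetical helper.

```python
import glob
import json
from pathlib import Path
from typing import List


def non_feedforward_summaries(
    pattern: str = "inspect/eval/videos/*_summary.json",
) -> List[str]:
    """Return paths of summaries whose agent_runtime.is_rnn is not exactly False.

    A missing key also counts as a failure, so incomplete summaries are flagged.
    """
    bad = []
    for path in sorted(glob.glob(pattern)):
        summary = json.loads(Path(path).read_text())
        # Assumes the dotted name maps to nested keys (see lead-in).
        if summary.get("agent_runtime", {}).get("is_rnn") is not False:
            bad.append(path)
    return bad


if __name__ == "__main__":
    flagged = non_feedforward_summaries()
    print("flagged summaries:", flagged or "none")
```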
Rendered Success Videos
Each video is a one-environment rollout on seed 100 using real Factory
sensor cameras. The external and wrist camera blank-frame counts were zero.
PegInsert
- Success: true
- First success step: 53
- Frames: 150
- Blank frames: 0/150 external, 0/150 wrist
GearMesh
- Success: true
- First success step: 47
- Frames: 300
- Blank frames: 0/300 external, 0/300 wrist
NutThread
- Success: true
- First success step: 333
- Frames: 450
- Blank frames: 0/450 external, 0/450 wrist
Loading Notes
These checkpoints must be loaded with the same feed-forward RL-Games config used for training. In Hydra override form, remove both recurrent blocks and force sequence length to one:
'~agent.params.network.rnn' \
'~agent.params.config.central_value_config.network.rnn' \
agent.params.config.seq_length=1
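Conceptually, the three overrides delete two keys and set one value in the agent config. The sketch below mimics that on a plain nested dict rooted at the `agent` node. `apply_feedforward_overrides` and the example config contents (the `{"name": "lstm"}` blocks and the original `seq_length` value) are hypothetical placeholders, not the stock Factory values, and this sketch silently ignores an already-missing key.

```python
import copy
from typing import Any, Dict


def apply_feedforward_overrides(agent_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic, on a plain dict rooted at `agent`:
    '~agent.params.network.rnn'
    '~agent.params.config.central_value_config.network.rnn'
    agent.params.config.seq_length=1
    The input is left untouched; a modified deep copy is returned."""
    cfg = copy.deepcopy(agent_cfg)
    params = cfg["params"]
    params["network"].pop("rnn", None)  # drop the actor recurrent block
    params["config"]["central_value_config"]["network"].pop("rnn", None)  # critic block
    params["config"]["seq_length"] = 1  # no sequence batching without an RNN
    return cfg


# Hypothetical recurrent config; values are illustrative placeholders only.
RECURRENT_EXAMPLE: Dict[str, Any] = {
    "params": {
        "network": {"mlp": {"units": [512, 128, 64]}, "rnn": {"name": "lstm"}},
        "config": {
            "seq_length": 4,
            "central_value_config": {
                "network": {"mlp": {"units": [512, 128, 64]}, "rnn": {"name": "lstm"}}
            },
        },
    }
}

if __name__ == "__main__":
    print(apply_feedforward_overrides(RECURRENT_EXAMPLE)["params"]["config"])
```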
In the source project, the runner switch was:
PHASE5_ASSEMBRAIN_DISABLE_RNN=1
Example task/checkpoint pairing:
Isaac-Factory-PegInsert-Direct-v0 -> checkpoints/peginsert_ff_ppo_osc_ep100.pth
Isaac-Factory-GearMesh-Direct-v0 -> checkpoints/gearmesh_ff_ppo_osc_ep150.pth
Isaac-Factory-NutThread-Direct-v0 -> checkpoints/nutthread_ff_ppo_osc_ep100.pth
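For scripted evaluation, the task/checkpoint pairing and the feed-forward overrides can be kept in one place. This is a sketch: `play_args` is a hypothetical helper, and the `--task`/`--checkpoint` flag names assume a runner shaped like Isaac Lab's RL-Games play script. Adjust them to the entry point you actually use.

```python
from typing import Dict, List

CHECKPOINTS: Dict[str, str] = {
    "Isaac-Factory-PegInsert-Direct-v0": "checkpoints/peginsert_ff_ppo_osc_ep100.pth",
    "Isaac-Factory-GearMesh-Direct-v0": "checkpoints/gearmesh_ff_ppo_osc_ep150.pth",
    "Isaac-Factory-NutThread-Direct-v0": "checkpoints/nutthread_ff_ppo_osc_ep100.pth",
}

# Hydra overrides that strip both recurrent blocks and force seq_length=1.
FEEDFORWARD_OVERRIDES: List[str] = [
    "~agent.params.network.rnn",
    "~agent.params.config.central_value_config.network.rnn",
    "agent.params.config.seq_length=1",
]


def play_args(task: str) -> List[str]:
    """Build an argument list pairing `task` with its feed-forward checkpoint.

    The flag names are an assumption about the runner (see lead-in)."""
    if task not in CHECKPOINTS:
        raise KeyError(f"no feed-forward checkpoint for {task!r}")
    return ["--task", task, "--checkpoint", CHECKPOINTS[task], *FEEDFORWARD_OVERRIDES]
```

Prepend the Python interpreter and your play script to the returned list and pass it to `subprocess.run` without a shell. With no shell involved, the `~`-prefixed overrides need no quoting; the single quotes in the shell form only keep the shell from treating them as tilde-prefixed words.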
Scope
These models are simulation teachers. They have not been validated on real hardware. They were trained for Factory PPO+OSC policy rollouts and later imitation-learning data generation, not for direct deployment.