---
license: mit
tags:
  - reinforcement-learning
  - ppo
  - pytorch
  - isaac-lab
  - robotics
  - franka
library_name: pytorch
model-index:
  - name: PPO-Franka-Reach
    results: []
---

# PPO-Franka-Reach

A Proximal Policy Optimization (PPO) policy trained from scratch in PyTorch on the `Isaac-Reach-Franka-v0` task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

**GitHub Repository:** [DavidH2802/PPO-from-scratch](https://github.com/DavidH2802/PPO-from-scratch)

<p align="center">
  <img src="franka_reach.gif" alt="Franka Reach Policy" width="480"/>
</p>

## Model Description

The model is a diagonal Gaussian policy (Actor) that controls a 7-DOF Franka Emika robot arm to reach a randomly spawned target position in 3D space. The policy outputs continuous joint-level actions.

### Architecture

- **Actor:** MLP (obs → 256 → 256 → act_dim) with Tanh activations, orthogonal initialization, and a learnable log-std parameter
- **Critic:** MLP (obs → 256 → 256 → 1) with Tanh activations and orthogonal initialization (included in checkpoint but not needed for inference)
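
The two networks listed above could be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repository's code: the names `mlp`, `Actor`, and `Critic`, the orthogonal gains (√2 for hidden layers, 0.01 for the policy head, 1.0 for the value head), and the zero-initialized log-std are assumptions borrowed from common PPO practice. The default dimensions (32 observations, 7 actions) come from this card.

```python
import math

import torch
import torch.nn as nn
from torch.distributions import Normal


def mlp(in_dim: int, out_dim: int, out_gain: float, hidden: int = 256) -> nn.Sequential:
    """Build in_dim -> 256 -> 256 -> out_dim with Tanh and orthogonal init."""
    linears = [
        nn.Linear(in_dim, hidden),
        nn.Linear(hidden, hidden),
        nn.Linear(hidden, out_dim),
    ]
    for i, lin in enumerate(linears):
        # Assumed gains: sqrt(2) on hidden layers, out_gain on the output layer.
        gain = out_gain if i == len(linears) - 1 else math.sqrt(2.0)
        nn.init.orthogonal_(lin.weight, gain=gain)
        nn.init.zeros_(lin.bias)
    return nn.Sequential(linears[0], nn.Tanh(), linears[1], nn.Tanh(), linears[2])


class Actor(nn.Module):
    """Diagonal Gaussian policy with a state-independent, learnable log-std."""

    def __init__(self, obs_dim: int = 32, act_dim: int = 7):
        super().__init__()
        self.mean_net = mlp(obs_dim, act_dim, out_gain=0.01)
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # assumed initial value

    def forward(self, obs: torch.Tensor) -> Normal:
        # log_std has shape (act_dim,) and broadcasts over the batch dimension.
        return Normal(self.mean_net(obs), self.log_std.exp())


class Critic(nn.Module):
    """State-value estimator; not needed at inference time."""

    def __init__(self, obs_dim: int = 32):
        super().__init__()
        self.value_net = mlp(obs_dim, 1, out_gain=1.0)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.value_net(obs).squeeze(-1)
```

For deterministic evaluation, one would typically take the distribution's `mean` rather than sampling from it.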

### Observation and Action Space

- **Observations:** 32-dimensional vector (joint positions, joint velocities, end-effector position, target position)
- **Actions:** 7-dimensional continuous (joint position targets)

## Training Details

### Hyperparameters

| Parameter | Value |
|---|---|
| Task | Isaac-Reach-Franka-v0 |
| Parallel Envs | 4096 |
| Learning Rate | 3e-4 |
| Discount (γ) | 0.99 |
| GAE (λ) | 0.95 |
| Clip (ε) | 0.2 |
| Epochs per Update | 4 |
| Minibatch Size | 2048 |
| Horizon | 32 |
| Total Iterations | 500 |
| Total Env Steps | 65.5M |
| Training Time | ~48 minutes |
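
The step count in the table follows from the other hyperparameters. The arithmetic below is a small sanity check, assuming each iteration collects `horizon` steps from every environment and each epoch makes one full pass over the batch in non-overlapping minibatches. That is the usual PPO setup, but this card does not state it explicitly, and the variable names are illustrative only.

```python
# Values taken from the hyperparameter table.
num_envs = 4096
horizon = 32
iterations = 500
epochs = 4
minibatch_size = 2048

# Transitions collected per iteration across all parallel environments.
samples_per_iteration = num_envs * horizon  # 131072

# Total environment steps over training (reported as 65.5M).
total_env_steps = samples_per_iteration * iterations  # 65536000

# Assumed: each epoch splits the batch into non-overlapping minibatches.
minibatches_per_epoch = samples_per_iteration // minibatch_size  # 64
gradient_steps_per_iteration = minibatches_per_epoch * epochs  # 256

print(total_env_steps, gradient_steps_per_iteration)
```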

### Hardware

- **GPU:** NVIDIA RTX 4070 SUPER (12 GB VRAM)
- **CPU:** Intel Xeon E5-2673 v4
- **Cloud:** vast.ai

### Training Curves

#### Reward

The agent starts with negative reward (arm far from the target) and converges to a small positive reward (roughly 0.03 to 0.05) as it learns to reach the target.

### Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These **must** be restored at inference time: without them, the policy receives unnormalized inputs and will not perform correctly.
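
Applying the stored statistics usually amounts to standardizing each observation before it reaches the actor. The function below is a minimal sketch of that step. The name `normalize_obs`, the `eps` term, and the clipping range are assumptions, so check the repository for the exact formula used during training.

```python
import torch


def normalize_obs(
    obs: torch.Tensor,
    mean: torch.Tensor,
    var: torch.Tensor,
    eps: float = 1e-8,   # assumed numerical-stability constant
    clip: float = 10.0,  # assumed clipping range; may differ in the repository
) -> torch.Tensor:
    """Standardize observations with running statistics, then clip outliers."""
    return torch.clamp((obs - mean) / torch.sqrt(var + eps), -clip, clip)
```

The same `mean` and `var` used at the end of training have to be passed here; recomputing them from evaluation data would change the input distribution the policy sees.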

## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/PPO-from-scratch",
    filename="final_policy.pt",
)
```

### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/PPO-from-scratch.git
cd PPO-from-scratch
```

### Full Evaluation with Isaac Lab

See the [GitHub repository](https://github.com/DavidH2802/PPO-from-scratch) for complete setup instructions, including Isaac Lab installation, and for the `eval.py` script used to record videos.

## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `critic` | Critic network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
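
Given the keys above, reading the checkpoint might look like the sketch below. The helper name `load_policy_state` is hypothetical, and the code assumes the file was written with `torch.save` as a plain dictionary of state dicts and tensors or arrays. If the statistics are stored as NumPy arrays, recent PyTorch versions may require `weights_only=False` in `torch.load`.

```python
import torch


def load_policy_state(path: str):
    """Return (actor_state_dict, obs_mean, obs_var) from a checkpoint file.

    Hypothetical helper: assumes the four keys documented in this card.
    """
    ckpt = torch.load(path, map_location="cpu")

    expected = {"actor", "critic", "obs_rms_mean", "obs_rms_var"}
    missing = expected - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")

    # as_tensor accepts both tensors and array-like values.
    obs_mean = torch.as_tensor(ckpt["obs_rms_mean"], dtype=torch.float32)
    obs_var = torch.as_tensor(ckpt["obs_rms_var"], dtype=torch.float32)
    return ckpt["actor"], obs_mean, obs_var
```

The returned state dict would then be passed to `load_state_dict` on the actor class from the repository, and the two statistics to whatever normalization routine that code uses.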

## Framework

- **Algorithm:** PPO (from scratch, no RL library dependencies)
- **Deep Learning:** PyTorch
- **Simulation:** NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- **Environment:** Isaac-Reach-Franka-v0

## Citation

```bibtex
@misc{habinski2026ppo,
  author = {David Habinski},
  title = {PPO from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/PPO-from-scratch}
}
```

## License

MIT