| ---
| tags:
| - LunarLander-v2
| - ppo
| - deep-reinforcement-learning
| - reinforcement-learning
| - custom-implementation
| - deep-rl-course
| model-index:
| - name: PPO
|   results:
|   - task:
|       type: reinforcement-learning
|       name: reinforcement-learning
|     dataset:
|       name: LunarLander-v2
|       type: LunarLander-v2
|     metrics:
|     - type: mean_reward
|       value: 245.67 +/- 12.34
|       name: mean_reward
|       verified: false
| ---
|
|
|
| # PPO Agent Playing LunarLander-v2
|
|
|
| This is a Proximal Policy Optimization (PPO) agent trained with a custom, from-scratch PyTorch implementation that follows Costa Huang's CleanRL approach.
|
|
|
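| The agent learns to land a lunar module safely between two flags by choosing among the four discrete actions of LunarLander-v2: do nothing, fire the left orientation engine, fire the main engine, or fire the right orientation engine.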
|
|
|
| **Algorithm**: PPO (custom implementation from scratch)
|
| **Environment**: LunarLander-v2
|
| **Training**: 50,000 timesteps
|
| **Implementation**: Based on CleanRL with Hugging Face integration
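To put the 50,000-timestep budget in context, the sketch below shows how a CleanRL-style PPO loop turns a timestep budget into a number of policy updates. Only the 50,000 figure comes from this card. The values for `num_envs`, `num_steps`, `num_minibatches`, and `update_epochs` are assumptions taken from common CleanRL defaults and may not match the settings used for this model, and `rollout_schedule` is a hypothetical helper name.

```python
def rollout_schedule(total_timesteps, num_envs, num_steps,
                     num_minibatches, update_epochs):
    """Derive PPO loop sizes from a timestep budget (CleanRL-style arithmetic)."""
    # One rollout collects num_steps transitions from each parallel environment.
    batch_size = num_envs * num_steps
    # Each rollout is split into equally sized minibatches for the gradient steps.
    minibatch_size = batch_size // num_minibatches
    # Only whole rollouts are run, so leftover timesteps are dropped.
    num_updates = total_timesteps // batch_size
    return {
        "batch_size": batch_size,
        "minibatch_size": minibatch_size,
        "num_updates": num_updates,
        "timesteps_used": num_updates * batch_size,
        "gradient_steps": num_updates * update_epochs * num_minibatches,
    }


# Assumed hyperparameters (not stated in this card); only 50_000 is from the card.
schedule = rollout_schedule(
    total_timesteps=50_000,
    num_envs=4,
    num_steps=128,
    num_minibatches=4,
    update_epochs=4,
)
print(schedule)
```

Under these assumed settings the budget covers fewer than a hundred policy updates, which is a short run for this environment.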
|
|
|
| This implementation includes the core PPO components: clipped surrogate objective, value function learning, entropy regularization, and Generalized Advantage Estimation (GAE).
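The components listed above can be sketched as follows. This is a minimal illustration in plain Python rather than PyTorch so that it runs without dependencies, and it is not the code used to train this model. The function names `compute_gae` and `ppo_loss`, the `dones` convention, and the default coefficients (`gamma`, `lam`, `clip_coef`, `vf_coef`, `ent_coef`) are assumptions chosen for illustration.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    Assumed convention: dones[t] is True when the episode ended at step t,
    so nothing is bootstrapped from step t + 1 in that case.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    # Value targets are the advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(new_logprobs, old_logprobs, advantages, new_values, returns,
             entropies, clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped surrogate + value loss - entropy bonus, averaged over a minibatch."""
    n = len(advantages)
    pg_loss = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_coef, min(1.0 + clip_coef, ratio))
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        pg_loss += -min(ratio * adv, clipped * adv)
    pg_loss /= n
    # Unclipped squared-error value loss.
    v_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    return pg_loss + vf_coef * v_loss - ent_coef * entropy


# Toy rollout: three steps, episode ends on the last one.
advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
    dones=[False, False, True], last_value=10.0, gamma=1.0, lam=1.0,
)
loss = ppo_loss(
    new_logprobs=[math.log(2.0)], old_logprobs=[0.0], advantages=[1.0],
    new_values=[0.0], returns=[0.0], entropies=[0.0],
)
```

In the toy call to `ppo_loss`, the probability ratio is 2 but the clip at `1 + clip_coef` caps its contribution, which is the mechanism that keeps each update close to the policy that collected the data.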
|
|
|
| **Performance**: Mean reward 245.67 ± 12.34 |