---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
- deep-rl-course
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 245.67 +/- 12.34
      name: mean_reward
      verified: false
---

# PPO Agent Playing LunarLander-v2

This is a custom implementation of Proximal Policy Optimization (PPO), written in PyTorch and trained from scratch, following Costa Huang's CleanRL methodology.

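The agent learns to land a lunar module safely on the pad between two flags. LunarLander-v2 has a discrete action space with four actions: do nothing, fire the left orientation engine, fire the main engine, or fire the right orientation engine.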
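
For reference, the action and observation layout documented for LunarLander-v2 can be written down in plain Python. This is a minimal sketch: the names `LanderAction`, `OBSERVATION_FIELDS`, and `label_observation` are hypothetical helpers for illustration and are not part of Gym or of this repository.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Discrete action indices of LunarLander-v2 (helper names are hypothetical)."""

    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


# The environment returns an 8-dimensional observation in this order.
OBSERVATION_FIELDS = (
    "x",
    "y",
    "x_velocity",
    "y_velocity",
    "angle",
    "angular_velocity",
    "left_leg_contact",
    "right_leg_contact",
)


def label_observation(observation):
    """Pair each observation entry with its field name for easier debugging."""
    if len(observation) != len(OBSERVATION_FIELDS):
        raise ValueError("expected an 8-dimensional LunarLander-v2 observation")
    return dict(zip(OBSERVATION_FIELDS, observation))
```

Because the action space is discrete, the policy network only needs four output logits, one per `LanderAction` member.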

- Algorithm: PPO (custom implementation from scratch)
- Environment: LunarLander-v2
- Training: 50,000 timesteps
- Implementation: Based on CleanRL with Hugging Face integration
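
To give a feel for what a 50,000-timestep budget means in a CleanRL-style training loop, the sketch below derives the rollout batch size and the number of policy updates. Only `total_timesteps` comes from this card; `num_envs` and `num_steps` are illustrative placeholders, and `RolloutConfig` is a hypothetical name, so the actual run may have used different values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical config holder; only total_timesteps is taken from the card."""

    total_timesteps: int = 50_000  # stated in the model card
    num_envs: int = 4              # assumption: number of parallel environments
    num_steps: int = 128           # assumption: steps per environment per rollout

    @property
    def batch_size(self) -> int:
        # Transitions collected before each policy update.
        return self.num_envs * self.num_steps

    @property
    def num_updates(self) -> int:
        # Whole rollouts that fit in the timestep budget.
        return self.total_timesteps // self.batch_size


config = RolloutConfig()
print(config.batch_size, config.num_updates)
```

Under these assumed values the budget covers fewer than a hundred policy updates, so the result is sensitive to how the rollout size is chosen.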

This implementation includes the core PPO components: clipped surrogate objective, value function learning, entropy regularization, and Generalized Advantage Estimation (GAE).
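
The components listed above can be sketched in dependency-free Python to show how they fit together. This is not the code from this repository (which uses PyTorch tensors); the function names, the `dones` convention, and the default coefficients (`gamma`, `lam`, `clip_coef`, `vf_coef`, `ent_coef`) are assumptions chosen for illustration.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one rollout of a single environment.

    Convention assumed here: dones[t] is True when the episode ended after
    step t, so nothing is bootstrapped across that boundary.
    """
    advantages = [0.0] * len(rewards)
    next_advantage = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, then the exponentially weighted sum of TD errors.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    # Value targets are the advantages added back onto the value estimates.
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns


def clipped_policy_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean clipped surrogate loss (negated, because optimizers minimize)."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_coef, min(1.0 + clip_coef, ratio))
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)


def total_loss(policy_loss, value_loss, entropy, vf_coef=0.5, ent_coef=0.01):
    """Combine the policy loss, the value loss, and the entropy bonus."""
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

The entropy term is subtracted so that minimizing the total loss rewards a more exploratory policy, while the clipping keeps each update close to the policy that collected the data.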

Performance: Mean reward 245.67 ± 12.34
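
A figure of this form is typically produced by running several evaluation episodes and summarizing their returns. The sketch below assumes the number after ± is the standard deviation across episodes, which this card does not state explicitly; the helper names and the sample returns are hypothetical and are not the data behind the reported result.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns, using the population std."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def format_mean_reward(mean, std):
    """Format the summary in the same style as the card's metadata."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns from four evaluation episodes.
sample_returns = [230.0, 250.0, 260.0, 240.0]
mean, std = summarize_returns(sample_returns)
print(format_mean_reward(mean, std))
```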