sam522/ppo-lunarlander-v3

	---
	tags:
	- LunarLander-v3
	- ppo
	- deep-reinforcement-learning
	- reinforcement-learning
	- stable-baselines3
	library_name: stable-baselines3
	---

	# PPO Agent playing LunarLander-v3

	This is a PPO agent trained on the LunarLander-v3 environment.

	## Usage

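	```python
	import gymnasium as gym
	import torch

	# Load the checkpoint and restore the weights.
	# `Network` and `config` are not shipped with this snippet: define the same
	# actor-critic class and config that were used for training.
	checkpoint = torch.load("model.pth", map_location="cpu")
	network = Network(config)
	network.load_state_dict(checkpoint["model_state_dict"])
	network.eval()

	# Run one evaluation episode
	env = gym.make("LunarLander-v3")
	state, _ = env.reset()
	done = False
	total_reward = 0.0

	while not done:
	    # Assumption: the network takes a float32 tensor and returns the action
	    # as a tensor; adapt these two conversions if your Network differs.
	    state_tensor = torch.as_tensor(state, dtype=torch.float32)
	    with torch.no_grad():
	        action, _, _, _ = network.get_action_and_value(state_tensor)
	    state, reward, terminated, truncated, _ = env.step(action.item())
	    total_reward += reward
	    done = terminated or truncated

	env.close()
	print(f"Total reward: {total_reward}")
	```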
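
The usage snippet relies on a `Network` class that is not included in this card. The following is only a minimal sketch of what such a class might look like, assuming an actor-critic with a shared feature trunk, the default discrete LunarLander-v3 spaces (8 observation values, 4 actions), and a `get_action_and_value` method that returns `(action, log_prob, entropy, value)`. The layer names, hidden size, activation, and constructor arguments are all hypothetical; they must match the trained checkpoint exactly for `load_state_dict` to succeed.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (not the author's code)."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        # Shared feature extractor used by both heads (sizes are assumptions)
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def get_action_and_value(self, x, action=None):
        features = self.shared(x)
        dist = Categorical(logits=self.actor(features))
        if action is None:
            # Sample from the policy when no action is supplied
            action = dist.sample()
        value = self.critic(features).squeeze(-1)
        return action, dist.log_prob(action), dist.entropy(), value
```

This sketch takes explicit sizes instead of the `config` object used in the snippet, because the fields of `config` are not documented here.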

	## Training Results

	- Environment: LunarLander-v3
	- Training Episodes: 3000
	- Final Performance: 212.4 ± 113.1 (episode reward)
	- Best Episode: 332.43 (episode reward)
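
The card does not say how the "±" figure was computed. One common convention, assumed here, is the mean and standard deviation of per-episode rewards over a set of evaluation episodes. The sketch below shows that calculation; the reward values are made up for illustration and are not results from this model, and `summarize_returns` is a hypothetical helper name.

```python
from statistics import mean, stdev


def summarize_returns(episode_returns):
    """Return (mean, sample standard deviation) of per-episode rewards."""
    if len(episode_returns) < 2:
        raise ValueError("need at least two episodes to estimate a spread")
    return mean(episode_returns), stdev(episode_returns)


# Illustrative rewards from five hypothetical evaluation episodes
returns = [200.0, 100.0, 300.0, 250.0, 150.0]
avg, spread = summarize_returns(returns)
print(f"{avg:.1f} ± {spread:.1f}")
```

A spread that is large relative to the mean usually indicates that some episodes end in crashes or timeouts while others land cleanly, so reporting both numbers is more informative than the mean alone.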

	## Algorithm Details

	- Algorithm: Proximal Policy Optimization (PPO)
	- Network Architecture: Actor-Critic with shared features
	- Learning Rate: 0.0003
	- Clip Epsilon: 0.2
	- Training Episodes: 3000
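
To illustrate what the clip epsilon of 0.2 controls, here is a minimal per-sample sketch of the standard PPO clipped surrogate objective. It is a generic illustration of the technique, not the training code behind this model; `clipped_surrogate` is a hypothetical helper name, `ratio` is the new-to-old policy probability ratio, and `advantage` is the advantage estimate.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for one sample (a quantity to maximize)."""
    # Restrict the probability ratio to [1 - clip_eps, 1 + clip_eps]
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the pessimistic (smaller) of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# With a positive advantage, gains from pushing the ratio past 1.2 are cut off
print(clipped_surrogate(ratio=1.5, advantage=1.0))
# With a negative advantage, a ratio below 0.8 is still penalized at 0.8
print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

In a full training loop this value is averaged over a batch and negated to form the policy loss, typically alongside a value loss and an entropy bonus.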