1) model = PPO(policy="MlpPolicy",
               env=env,
               n_steps=1024,
               batch_size=64,
               n_epochs=4,
               gamma=0.999,
               gae_lambda=0.98,
               ent_coef=0.01,
               verbose=1)
   model.learn(total_timesteps=500000)
   mean_reward=193.60 +/- 21.33
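Neither the construction of env nor the evaluation call appears in this log. Below is a minimal sketch of the assumed setup, using a Gymnasium LunarLander-v2 environment and Stable-Baselines3's evaluate_policy helper; the environment id and n_eval_episodes value are assumptions, not taken from the log:

# Sketch of the assumed setup for run 1; the env id and the
# evaluation settings are assumptions, not shown in the log.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("LunarLander-v2")  # assumed environment

model = PPO(policy="MlpPolicy", env=env, n_steps=1024, batch_size=64,
            n_epochs=4, gamma=0.999, gae_lambda=0.98, ent_coef=0.01,
            verbose=1)
model.learn(total_timesteps=500000)

# Likely source of the mean/std pair reported above (assumption):
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

evaluate_policy runs the policy deterministically by default and returns a (mean_reward, std_reward) pair, which matches the format of the numbers logged here.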
2) model = PPO(policy="MlpPolicy",
               env=env,
               n_steps=1024,
               batch_size=64,
               n_epochs=8,
               gamma=0.999,
               gae_lambda=0.98,
               ent_coef=0.01,
               verbose=1)
   model.learn(total_timesteps=500000)
   mean_reward=235.09 +/- 21.88
3) model = PPO(policy="MlpPolicy",
               env=env,
               n_steps=1024,
               batch_size=64,
               n_epochs=8,
               gamma=0.999,
               gae_lambda=0.98,
               ent_coef=0.01,
               verbose=1)
   model.learn(total_timesteps=1000000)
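No evaluation score is recorded for run 3. Assuming the same evaluate_policy routine as in the sketch above, the missing number could be produced and the model saved as follows; the filename is illustrative, not from the log:

# Hypothetical follow-up for run 3; evaluation settings and
# filename are assumptions, not shown in the log.
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
model.save("ppo-lunarlander-run3")  # illustrative filename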