TD3 Agent for LunarLander-v3
This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLander-v3 environment.
Model Details
- Algorithm: Twin Delayed Deep Deterministic Policy Gradient (TD3)
- Environment: LunarLander-v3
- Framework: PyTorch
- Training Device: CUDA (GPU)
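TD3 extends DDPG with three changes: clipped double-Q learning (the critic target uses the smaller of two target critics), target policy smoothing (clipped Gaussian noise is added to the target action), and delayed actor updates. The following is a minimal NumPy sketch of the critic target and the update schedule. The networks are stand-in callables, and the defaults (`gamma=0.99`, `policy_noise=0.2`, `noise_clip=0.5`, `policy_delay=2`) are the values from the original TD3 paper. They are not confirmed settings for this model, and the function names are hypothetical.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Compute the TD3 critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~).

    actor_target, critic1_target and critic2_target are hypothetical callables standing
    in for the target networks: actor_target(s) -> actions, criticN_target(s, a) -> values.
    """
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q: take the smaller of the two target critic estimates
    target_q = np.minimum(critic1_target(next_state, next_action),
                          critic2_target(next_state, next_action))
    # Terminal transitions do not bootstrap from the next state
    return reward + gamma * (1.0 - done) * target_q


def should_update_actor(step, policy_delay=2):
    """Delayed updates: actor and target networks update once every policy_delay critic steps."""
    return step % policy_delay == 0
```

The critics are trained toward this target at every step. The actor and the soft (Polyak) target-network updates run only when `should_update_actor` returns `True`.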
Training Information
- Total Timesteps: 1,000,000
- Replay Buffer Size: 10,000 transitions
- Batch Size: 256
- Learning Rates:
  - Actor: 1e-4
  - Critic: 1e-3
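With 1,000,000 training steps and a 10,000-transition buffer, a first-in, first-out buffer holds only the most recent 10,000 transitions at any time. The following is a minimal sketch of such a buffer with uniform minibatch sampling of 256. It assumes a standard FIFO ring buffer. The card does not describe the actual buffer implementation, so the `ReplayBuffer` class here is hypothetical.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO replay buffer; once full, the oldest transitions are overwritten."""

    def __init__(self, obs_dim, action_dim, capacity=10_000, seed=None):
        self.capacity = capacity
        self.ptr = 0    # index of the next slot to write
        self.size = 0   # number of valid transitions stored
        self.states = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros((capacity, 1), dtype=np.float32)
        self.next_states = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.dones = np.zeros((capacity, 1), dtype=np.float32)
        self.rng = np.random.default_rng(seed)

    def add(self, state, action, reward, next_state, done):
        self.states[self.ptr] = state
        self.actions[self.ptr] = action
        self.rewards[self.ptr] = reward
        self.next_states[self.ptr] = next_state
        self.dones[self.ptr] = float(done)
        # Wrap around so new transitions replace the oldest ones
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size=256):
        # Uniform sampling (with replacement) over the transitions currently stored
        idx = self.rng.integers(0, self.size, size=batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```

For example, `ReplayBuffer(obs_dim=8, action_dim=2)` matches the 8-dimensional LunarLander observation. Set `action_dim` to whatever the agent's actor outputs.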
Evaluation Results
- Average Episode Reward: 284.91 ± 16.67
- Min Episode Reward: 245.55
- Max Episode Reward: 323.18
- Evaluation Episodes: 100
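The figures above summarize total rewards over 100 evaluation episodes. The following is a minimal sketch of how such statistics can be produced. It assumes a Gymnasium-style environment and that "±" denotes the standard deviation. The card does not say which estimator was used, and this sketch uses NumPy's default population std (`ddof=0`). The `evaluate` and `summarize` helpers are hypothetical.

```python
import numpy as np


def evaluate(env, policy, n_episodes=100, seed=None):
    """Run n_episodes with a deterministic policy and return each episode's total reward.

    env follows the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    returns = []
    for ep in range(n_episodes):
        # Seed only the first reset; later resets continue the env's RNG stream
        obs, _ = env.reset(seed=seed if ep == 0 else None)
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns


def summarize(returns):
    """Mean, population standard deviation (ddof=0), min and max of episode returns."""
    r = np.asarray(returns, dtype=np.float64)
    return {"mean": r.mean(), "std": r.std(), "min": r.min(), "max": r.max(),
            "episodes": len(r)}
```

With the `env` and `agent` created in the How to Use section, a call such as `summarize(evaluate(env, lambda s: agent.select_action(s, eval_mode=True)))` would produce statistics in this format. Exact values depend on seeds.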
How to Use
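```python
import torch
import gymnasium as gym
from td3_agent import TD3Agent  # requires td3_agent.py on the Python path

# Load the checkpoint onto the GPU if available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.load("pytorch_model.pth", map_location=device)

# Create the environment and the agent, then restore the trained actor weights
env = gym.make("LunarLander-v3")
agent = TD3Agent(obs_dim=8, action_dim=4, max_action=1.0, hyperparameters={}, device=device)
agent.actor.load_state_dict(model['actor_state_dict'])

# Run one test episode with the deterministic (evaluation-mode) policy
state, _ = env.reset()
episode_reward = 0.0
while True:
    action = agent.select_action(state, eval_mode=True)
    state, reward, terminated, truncated, _ = env.step(action)
    episode_reward += reward
    if terminated or truncated:
        break
env.close()
print(f"Episode reward: {episode_reward:.2f}")
```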
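The card does not show how `TD3Agent.select_action` handles `eval_mode`. The following is a hypothetical sketch of the usual TD3 convention. During training, Gaussian exploration noise (std `0.1 * max_action` in the TD3 paper) is added to the actor's output and clipped to the action bounds. In evaluation mode, the actor's output is returned unchanged. This is not the author's implementation, and `actor` here is any stand-in callable.

```python
import numpy as np


def select_action(actor, state, max_action=1.0, eval_mode=False,
                  expl_noise=0.1, rng=None):
    """Hypothetical TD3-style action selection (not the author's TD3Agent.select_action).

    actor is any callable mapping a state array to an action array in [-max_action, max_action].
    """
    action = np.asarray(actor(np.asarray(state, dtype=np.float32)), dtype=np.float64)
    if eval_mode:
        # Evaluation: use the deterministic policy output as-is
        return action
    rng = np.random.default_rng() if rng is None else rng
    # Training: add Gaussian exploration noise scaled by max_action, then clip to bounds
    noise = rng.normal(0.0, expl_noise * max_action, size=action.shape)
    return np.clip(action + noise, -max_action, max_action)
```

Keeping evaluation noise-free is what makes repeated test episodes reflect the learned policy rather than exploration.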
Credits
Trained by Basem Elgalfy as part of Assignment 4 in Reinforcement Learning.