TD3 Agent for LunarLander-v3

This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLander-v3 environment.

Model Details

  • Algorithm: Twin Delayed Deep Deterministic Policy Gradient (TD3)
  • Environment: LunarLander-v3
  • Framework: PyTorch
  • Device: cuda
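
For readers unfamiliar with the algorithm, the three ideas in the TD3 name can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training code behind this model: the function names are hypothetical, and the defaults (discount 0.99, noise std 0.2, noise clip 0.5, policy delay 2) are the commonly used values from the original TD3 paper, assumed here because this card does not list them.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the valid action range."""
    smoothed = []
    for a in target_action:
        noise = rng.gauss(0.0, noise_std)
        noise = max(-noise_clip, min(noise_clip, noise))
        smoothed.append(max(-max_action, min(max_action, a + noise)))
    return smoothed


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates, and do not bootstrap past a terminal state."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor (and target networks) are updated
    only once every `policy_delay` critic updates."""
    return step % policy_delay == 0
```

Taking the minimum of the two critics counters the overestimation bias of a single critic, which is the main difference from plain DDPG.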

Training Information

  • Total Timesteps: 1,000,000
  • Buffer Size: 10,000
  • Batch Size: 256
  • Learning Rates:
    • Actor: 1e-4
    • Critic: 1e-3
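
The settings above can be collected in a single configuration object, sketched below. The class name and field names are hypothetical; the keys that `TD3Agent` actually expects in its `hyperparameters` argument are not documented on this card, so treat this as one possible layout rather than the project's own.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TD3TrainingConfig:
    """Training settings listed on this card (field names are assumptions)."""
    total_timesteps: int = 1_000_000
    buffer_size: int = 10_000
    batch_size: int = 256
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3


config = TD3TrainingConfig()
hyperparameters = asdict(config)  # plain dict, e.g. for logging or for passing to an agent

# Share of all training transitions that fits in the replay buffer at one time
buffer_fraction = config.buffer_size / config.total_timesteps
```

With these values the replay buffer holds 1% of the transitions seen during training, so the agent learns almost entirely from recent experience.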

Evaluation Results

  • Average Reward: 284.91 ± 16.67
  • Min Reward: 245.55
  • Max Reward: 323.18
  • Evaluation Episodes: 100
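
Summary statistics like these could be computed with a loop along the following lines. This is a minimal sketch, not the evaluation script used for this model: `run_episode` and `evaluate` are hypothetical helpers, `policy` is any callable mapping a state to an action, and the use of the population standard deviation is an assumption, since the card does not say which estimator produced the ± value.

```python
import statistics


def run_episode(env, policy):
    """Play one episode with a Gymnasium-style env and return its total reward."""
    state, _ = env.reset()
    total = 0.0
    while True:
        action = policy(state)
        state, reward, terminated, truncated, _ = env.step(action)
        total += float(reward)
        if terminated or truncated:
            return total


def evaluate(env, policy, episodes=100):
    """Run several episodes and summarize the per-episode returns."""
    returns = [run_episode(env, policy) for _ in range(episodes)]
    return {
        "mean": statistics.fmean(returns),
        "std": statistics.pstdev(returns),  # population std (assumption)
        "min": min(returns),
        "max": max(returns),
        "episodes": episodes,
    }
```

To evaluate this agent, the policy could be `lambda state: agent.select_action(state, eval_mode=True)`, using the agent constructed in the How to Use section.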

How to Use

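import gymnasium as gym
import torch

from td3_agent import TD3Agent

# Load the checkpoint (a dict of state dicts) onto the available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load("pytorch_model.pth", map_location=device)

# Create the environment and the agent.
# obs_dim and action_dim must match the values used in training,
# otherwise load_state_dict fails with a shape mismatch.
env = gym.make("LunarLander-v3")
agent = TD3Agent(obs_dim=8, action_dim=4, max_action=1.0, hyperparameters={}, device=device)
agent.actor.load_state_dict(checkpoint["actor_state_dict"])

# Run one evaluation episode and report its total reward
state, _ = env.reset()
episode_reward = 0.0
while True:
    action = agent.select_action(state, eval_mode=True)
    state, reward, terminated, truncated, _ = env.step(action)
    episode_reward += reward
    if terminated or truncated:
        break

env.close()
print(f"Episode reward: {episode_reward:.2f}")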

Credits

Trained by Basem Elgalfy as part of Assignment 4 in Reinforcement Learning.
