# Reward Rush: Lunar Lander DQN
This repository contains a Deep Q-Network (DQN) agent trained on the Gymnasium LunarLander-v3 environment.
## Model Architecture
The model uses a simple multi-layer perceptron with the following specifications (a parameter-count sanity check follows the list):

- Input: 8 state observations
- Output: 4 discrete actions
- Architecture:
  - Linear(8, 32) -> ReLU
  - Linear(32, 32) -> ReLU
  - Linear(32, 4)
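As a quick sanity check on these dimensions, here is a minimal sketch (not part of the training code) that builds the three layers under the same names the checkpoint expects and counts parameters:

```python
import torch.nn as nn

# Same layer names and sizes the checkpoint uses (fc1, fc2, fc3).
layers = nn.ModuleDict({
    "fc1": nn.Linear(8, 32),
    "fc2": nn.Linear(32, 32),
    "fc3": nn.Linear(32, 4),
})
total = sum(p.numel() for p in layers.parameters())
print(total)  # (8*32 + 32) + (32*32 + 32) + (32*4 + 4) = 1476
```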
## Common Implementation Mistakes to Avoid
- Layer Names: The weights are saved under specific keys (`fc1`, `fc2`, `fc3`). Wrapping the layers in `nn.Sequential` causes a key mismatch error, because the model then expects names like `fc.0` and `fc.2` (see the key-inspection sketch after this list).
- Hidden Dimensions: This specific model was trained with 32 neurons per hidden layer. Using 64 or 128 will produce a size mismatch error.
- Checkpoint Dictionary: The `.pth` file contains a dictionary, not a bare state dict. The weights must be accessed via the `"policy_net_state_dict"` key.
- Inference Output: Running the model returns a tensor of Q-values, not an action. Use `argmax(dim=1)` to extract the action index before passing it to `env.step()`.
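If loading fails, it helps to print what the checkpoint actually contains before debugging the model class. The sketch below downloads the file and lists the stored keys and tensor shapes; the expected shapes in the comments follow from the architecture above:

```python
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Nharen/Reward_Rush_DQN_Lunar_Lander",
    filename="lunar_lander_dqn.pth",
)
checkpoint = torch.load(path, map_location="cpu", weights_only=True)
print(list(checkpoint.keys()))  # should include "policy_net_state_dict"

state_dict = checkpoint["policy_net_state_dict"]
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# Expected:
# fc1.weight (32, 8)
# fc1.bias (32,)
# fc2.weight (32, 32)
# fc2.bias (32,)
# fc3.weight (4, 32)
# fc3.bias (4,)
```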
## Import and Test Code
```python
import torch
import torch.nn as nn
import gymnasium as gym
import numpy as np
from huggingface_hub import hf_hub_download


class LunarNet(nn.Module):
    """MLP matching the saved checkpoint: 8 -> 32 -> 32 -> 4."""

    def __init__(self):
        super().__init__()
        # Layer names must be fc1/fc2/fc3 to match the checkpoint keys.
        self.fc1 = nn.Linear(8, 32)
        self.fc2 = nn.Linear(32, 32)
        self.fc3 = nn.Linear(32, 4)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, one per action


def run_evaluation():
    repo_id = "Nharen/Reward_Rush_DQN_Lunar_Lander"
    filename = "lunar_lander_dqn.pth"
    path = hf_hub_download(repo_id=repo_id, filename=filename)

    model = LunarNet()
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    # The .pth file stores a dictionary; the weights live under
    # "policy_net_state_dict" (fall back to a bare state dict just in case).
    if isinstance(checkpoint, dict) and "policy_net_state_dict" in checkpoint:
        state_dict = checkpoint["policy_net_state_dict"]
    else:
        state_dict = checkpoint
    model.load_state_dict(state_dict)
    model.eval()

    env = gym.make("LunarLander-v3")
    total_rewards = []
    for _ in range(100):
        state, _ = env.reset()
        episode_reward = 0.0
        done = False
        while not done:
            # Batch dimension of 1 so argmax(dim=1) works as expected.
            state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
            with torch.no_grad():
                action = model(state_t).argmax(dim=1).item()
            state, reward, terminated, truncated, _ = env.step(action)
            episode_reward += reward
            done = terminated or truncated
        total_rewards.append(episode_reward)

    print(f"Average Reward: {np.mean(total_rewards):.2f}")
    env.close()


if __name__ == "__main__":
    run_evaluation()
```
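To watch the agent fly rather than just average its rewards, the same policy can drive a rendered environment. A brief sketch, assuming `LunarNet` from the script above is in scope, a local display is available, and the `gymnasium[box2d]` extra is installed:

```python
import torch
import gymnasium as gym
from huggingface_hub import hf_hub_download

# LunarNet is the class defined in the evaluation script above.
model = LunarNet()
ckpt = torch.load(
    hf_hub_download("Nharen/Reward_Rush_DQN_Lunar_Lander", "lunar_lander_dqn.pth"),
    map_location="cpu",
    weights_only=True,
)
model.load_state_dict(ckpt["policy_net_state_dict"])
model.eval()

env = gym.make("LunarLander-v3", render_mode="human")
state, _ = env.reset()
done = False
while not done:
    state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = model(state_t).argmax(dim=1).item()
    state, _, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```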
## Evaluation results

- mean_reward on LunarLander-v3: 260.0 (self-reported)
- n_evaluation_episodes on LunarLander-v3: 100 (self-reported)