Reinforce Agent playing CartPole-v1

This is a trained Reinforce agent playing CartPole-v1. The agent was trained with a custom PyTorch implementation of the Policy Gradient (Reinforce) algorithm.

🎮 Environment

  • Environment: CartPole-v1
  • Goal: Balance the pole by pushing the cart left or right.
  • State Space: 4 (Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity)
  • Action Space: 2 (Push Left, Push Right)
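At every step the policy maps the 4-dimensional state to a probability distribution over the two actions and samples one. A minimal stdlib sketch of that sampling step (the probabilities below are illustrative placeholders, not the trained policy's output):

```python
import random

ACTIONS = [0, 1]  # 0 = Push Left, 1 = Push Right

def sample_action(probs, rng=random):
    """Sample an action index from a 2-element probability distribution."""
    return rng.choices(ACTIONS, weights=probs, k=1)[0]

# A degenerate distribution always yields the same action.
print(sample_action([1.0, 0.0]))  # 0
```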

📊 Evaluation Results

| Metric              | Value           |
|---------------------|-----------------|
| Mean Reward         | 500.00 +/- 0.00 |
| Evaluation Episodes | 10              |

⚙️ Hyperparameters

The agent was trained using the following hyperparameters:

  • H_size (Hidden Neurons): 16
  • Total Training Episodes: 1000
  • Max Steps per Episode: 1000
  • Learning Rate: 1e-2
  • Gamma (Discount Factor): 1.0
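With Gamma = 1.0, the return at each timestep is simply the sum of all subsequent rewards; since CartPole pays a reward of 1 per step, the return at step t equals the number of remaining steps. A small sketch of the backward return computation Reinforce typically uses (a hypothetical helper, not taken from the training script):

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute G_t = r_t + gamma * G_{t+1} by scanning the episode backward."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# With gamma=1.0, each return counts the remaining steps of the episode.
print(discounted_returns([1.0, 1.0, 1.0]))  # [3.0, 2.0, 1.0]
```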

🧠 Model Architecture

The policy is a simple Neural Network with one hidden layer:

  • Input: 4 (State size)
  • Hidden: 16 (ReLU activation)
  • Output: 2 (Softmax activation)
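These layer sizes imply a very small network. A quick parameter-count sanity check, using the sizes listed above (each linear layer has in×out weights plus out biases):

```python
s_size, h_size, a_size = 4, 16, 2  # input, hidden, output sizes from above

fc1_params = s_size * h_size + h_size  # weights + biases of the hidden layer
fc2_params = h_size * a_size + a_size  # weights + biases of the output layer
total = fc1_params + fc2_params
print(total)  # 114
```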

🐍 Usage

To use this model, you need gym, torch, and huggingface_hub installed.

โš ๏ธ Note: When loading the model with torch.load, you might get a warning about "suspicious pickle files" or "weights_only=False". This is standard for PyTorch models that save the full model structure. Please ignore the warning and trust the source if you are running this locally.

```python
import gym
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present to unpickle the model)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-CartPole-v1"  # Replace with your repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because the file pickles the full model
# structure, not just a state_dict. Only do this for sources you trust.
model = torch.load(model_path, map_location=torch.device("cpu"), weights_only=False)
model.eval()

# 4. Evaluate (gym >= 0.26 API; older gym returns only the observation from
# reset() and a 4-tuple with a single `done` flag from step())
env = gym.make("CartPole-v1", render_mode="human")
state, _ = env.reset()
done = False
total_reward = 0

print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward

print(f"Game Over! Total Reward: {total_reward}")
env.close()
```