# Reinforce Agent playing CartPole-v1
This is a trained Reinforce agent playing CartPole-v1. The agent was trained with a custom PyTorch implementation of the Reinforce (Monte Carlo policy gradient) algorithm.
## 🎮 Environment

- Environment: CartPole-v1
- Goal: Balance the pole by pushing the cart left or right.
- State Space: 4 (Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity)
- Action Space: 2 (Push Left, Push Right)
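For reference, the four observation components arrive in a fixed order; the following is an illustrative helper (`label_state` and `STATE_NAMES` are not part of the trained model) that pairs each index with its meaning:

```python
# Illustrative helper (not part of the model): name the components of a
# CartPole-v1 observation vector, in the order the environment returns them.
STATE_NAMES = ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"]

def label_state(state):
    """Pair each of the 4 observation values with its name."""
    return dict(zip(STATE_NAMES, state))

print(label_state([0.01, -0.02, 0.03, 0.04]))
```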
## 📊 Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 500.00 +/- 0.00 |
| Evaluation Episodes | 10 |
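The table reports a mean and standard deviation over the 10 evaluation episodes; a minimal sketch of that aggregation (the episode returns below are illustrative — a perfect CartPole-v1 agent hits the 500-step cap every episode):

```python
import statistics

# Hypothetical returns from 10 evaluation episodes; CartPole-v1 caps
# episodes at 500 steps, so a perfect agent scores 500.0 each time.
episode_rewards = [500.0] * 10

mean_reward = statistics.mean(episode_rewards)
std_reward = statistics.pstdev(episode_rewards)  # population std dev

print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")  # 500.00 +/- 0.00
```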
## ⚙️ Hyperparameters
The agent was trained using the following hyperparameters:
- H_size (Hidden Neurons): 16
- Total Training Episodes: 1000
- Max Steps per Episode: 1000
- Learning Rate: 1e-2
- Gamma (Discount Factor): 1.0
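At the core of Reinforce, each action's log-probability is weighted by the discounted return that followed it; a standalone sketch of that return computation using the Gamma above (the reward list is illustrative):

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, iterating backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# With gamma = 1.0 (as used in training), G_t is simply the sum of future rewards.
print(discounted_returns([1.0, 1.0, 1.0]))  # [3.0, 2.0, 1.0]
```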
## 🧠 Model Architecture

The policy is a simple neural network with one hidden layer:
- Input: 4 (State size)
- Hidden: 16 (ReLU activation)
- Output: 2 (Softmax activation)
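As a quick sanity check on the size of this architecture: each linear layer contributes `in_features × out_features` weights plus `out_features` biases.

```python
# Parameter count for the 4 -> 16 -> 2 policy network described above.
s_size, h_size, a_size = 4, 16, 2

fc1_params = s_size * h_size + h_size  # weights + biases: 64 + 16 = 80
fc2_params = h_size * a_size + a_size  # weights + biases: 32 + 2 = 34

total = fc1_params + fc2_params
print(total)  # 114 trainable parameters
```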
## 🚀 Usage

To use this model, you need the `gym`, `torch`, and `huggingface_hub` packages installed.

> ⚠️ Note: When loading the model with `torch.load`, you may see a warning about "suspicious pickle files" or `weights_only=False`. This is expected for PyTorch checkpoints that pickle the full model object rather than just the weights. Only pass `weights_only=False` when you trust the source of the file.
```python
import gym
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present to unpickle the model)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-CartPole-v1"  # Replace with your Repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because the checkpoint pickles the full model object
model = torch.load(model_path, map_location=torch.device("cpu"), weights_only=False)
model.eval()

# 4. Evaluate (gym >= 0.26 API: reset returns (obs, info), step returns 5 values)
env = gym.make("CartPole-v1", render_mode="human")
state, _ = env.reset()
done = False
total_reward = 0
print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward
print(f"Game Over! Total Reward: {total_reward}")
env.close()
```