# Reinforce Agent playing CartPole-v1
This is a trained Reinforce agent playing CartPole-v1. The agent was trained with a custom PyTorch implementation of the Reinforce (Monte Carlo policy gradient) algorithm.
## 🎮 Environment

- Environment: CartPole-v1
- Goal: Balance the pole by pushing the cart left or right.
- State Space: 4 (Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity)
- Action Space: 2 (Push Left, Push Right)
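For reference, the four observation components arrive in a fixed order; the following is an illustrative helper (`label_state` and `STATE_NAMES` are not part of the trained model) that pairs each index with its meaning:

```python
# Illustrative helper (not part of the model): name the components of a
# CartPole-v1 observation vector, in the order the environment returns them.
STATE_NAMES = ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"]

def label_state(state):
    """Pair each of the 4 observation values with its name."""
    return dict(zip(STATE_NAMES, state))

print(label_state([0.01, -0.02, 0.03, 0.04]))
```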
## 📊 Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 500.00 +/- 0.00 |
| Evaluation Episodes | 10 |
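The table reports a mean and standard deviation over the 10 evaluation episodes; a minimal sketch of that aggregation (the episode returns below are illustrative — a perfect CartPole-v1 agent hits the 500-step cap every episode):

```python
import statistics

# Hypothetical returns from 10 evaluation episodes; CartPole-v1 caps
# episodes at 500 steps, so a perfect agent scores 500.0 each time.
episode_rewards = [500.0] * 10

mean_reward = statistics.mean(episode_rewards)
std_reward = statistics.pstdev(episode_rewards)  # population std dev

print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")  # 500.00 +/- 0.00
```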
## ⚙️ Hyperparameters
The agent was trained using the following hyperparameters:
- H_size (Hidden Neurons): 16
- Total Training Episodes: 1000
- Max Steps per Episode: 1000
- Learning Rate: 1e-2
- Gamma (Discount Factor): 1.0
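At the core of Reinforce, each action's log-probability is weighted by the discounted return that followed it; a standalone sketch of that return computation using the Gamma above (the reward list is illustrative):

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, iterating backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# With gamma = 1.0 (as used in training), G_t is simply the sum of future rewards.
print(discounted_returns([1.0, 1.0, 1.0]))  # [3.0, 2.0, 1.0]
```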
## 🧠 Model Architecture

The policy is a simple neural network with one hidden layer:
- Input: 4 (State size)
- Hidden: 16 (ReLU activation)
- Output: 2 (Softmax activation)
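As a quick sanity check on the size of this architecture: each linear layer contributes `in_features × out_features` weights plus `out_features` biases.

```python
# Parameter count for the 4 -> 16 -> 2 policy network described above.
s_size, h_size, a_size = 4, 16, 2

fc1_params = s_size * h_size + h_size  # weights + biases: 64 + 16 = 80
fc2_params = h_size * a_size + a_size  # weights + biases: 32 + 2 = 34

total = fc1_params + fc2_params
print(total)  # 114 trainable parameters
```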
## 🚀 Usage

To use this model, you need the `gym`, `torch`, and `huggingface_hub` packages installed.

> ⚠️ Note: When loading the model with `torch.load`, you may see a warning about "suspicious pickle files" or `weights_only=False`. This is expected for PyTorch checkpoints that pickle the full model object rather than just the weights. Only pass `weights_only=False` when you trust the source of the file.
```python
import gym
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present to unpickle the model)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-CartPole-v1"  # Replace with your Repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because the checkpoint pickles the full model object
model = torch.load(model_path, map_location=torch.device("cpu"), weights_only=False)
model.eval()

# 4. Evaluate (gym >= 0.26 API: reset returns (obs, info), step returns 5 values)
env = gym.make("CartPole-v1", render_mode="human")
state, _ = env.reset()
done = False
total_reward = 0
print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward
print(f"Game Over! Total Reward: {total_reward}")
env.close()
```