# PPO CartPole Agent

This is a PPO agent trained on the CartPole-v1 environment using Stable Baselines3.

## Performance

The agent achieved a mean reward of 500.00 ± 0.00 over 10 evaluation episodes. This is the maximum return CartPole-v1 allows, since episodes are truncated at 500 steps with a reward of 1 per step.
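
The figure above can be reproduced along the following lines. This is a minimal sketch, not the author's evaluation script: it assumes the score came from Stable Baselines3's `evaluate_policy` with deterministic actions, and the names `summarize_returns`, `evaluate_saved_agent`, and `model_path` are hypothetical.

```python
import statistics


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean ± std' using the population std."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} ± {std:.2f}"


def evaluate_saved_agent(model_path, n_eval_episodes=10):
    """Roll out a saved PPO checkpoint and summarize its episode returns.

    Assumption: model_path points to a local SB3 checkpoint.
    """
    # Imported lazily so the summary helper works without the RL stack installed.
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("CartPole-v1")
    model = PPO.load(model_path)
    # return_episode_rewards=True yields per-episode returns and lengths
    # instead of the aggregated (mean, std) pair.
    episode_returns, _episode_lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_eval_episodes,
        deterministic=True,
        return_episode_rewards=True,
    )
    env.close()
    return summarize_returns(episode_returns)
```

Calling `evaluate_saved_agent("path/to/checkpoint")` with a locally saved model returns a string in the same `mean ± std` format as the reported score.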

## Training Details

- Algorithm: PPO
- Environment: CartPole-v1
- Training Steps: 25,000
- Framework: Stable Baselines3
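
A training run matching the settings listed above might look like the sketch below. The policy type (`"MlpPolicy"`), the seed, the save path, and the default PPO hyperparameters are assumptions, since the card does not state them; `TRAIN_CONFIG` and `train_agent` are hypothetical names.

```python
# Environment and step count come from the training details;
# "MlpPolicy" and the seed are assumptions for illustration.
TRAIN_CONFIG = {
    "env_id": "CartPole-v1",
    "total_timesteps": 25_000,
    "policy": "MlpPolicy",
    "seed": 0,
}


def train_agent(config=None, save_path="cartpole-ppo"):
    """Train a PPO agent with SB3 defaults and save the checkpoint."""
    # Imported lazily so the config can be inspected without the RL stack installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    config = config or TRAIN_CONFIG
    env = gym.make(config["env_id"])
    model = PPO(config["policy"], env, seed=config["seed"], verbose=1)
    model.learn(total_timesteps=config["total_timesteps"])
    model.save(save_path)  # SB3 appends ".zip" to the path
    env.close()
    return model
```

Calling `train_agent()` would write `cartpole-ppo.zip` to the working directory. Results may differ from the reported score depending on the seed and library versions.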

## Usage

```python
from stable_baselines3 import PPO
import gymnasium as gym

# Load the model. PPO.load expects a local path, so this assumes the
# checkpoint is available at drap/cartpole-ppo (SB3 appends .zip if missing).
model = PPO.load("drap/cartpole-ppo")

# Create the environment
env = gym.make("CartPole-v1")

# Run one episode with deterministic actions
obs, _ = env.reset()
total_reward = 0.0
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break

print(f"Episode return: {total_reward}")
env.close()
```