---
library_name: stable-baselines3
tags:
- CartPole-v1
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
- A2C
model-index:
- name: A2C CartPole
  results:
  - task:
      type: reinforcement-learning
      name: Cart-Pole Balance
    dataset:
      name: CartPole-v1
      type: gymnasium
    metrics:
    - type: mean_reward
      value: REPLACE_WITH_ACTUAL_MEAN_REWARD
      name: mean_reward
    - type: success_rate
      value: REPLACE_WITH_SUCCESS_RATE
      name: success_rate
---

# A2C CartPole Model

This is an A2C (Advantage Actor-Critic) model trained to balance a pole on a moving cart. The model was trained with Stable-Baselines3.

## Task Description

The CartPole task involves balancing a pole attached by an unactuated joint to a cart that moves along a frictionless track. The goal is to prevent the pole from falling over by applying forces to the cart. The episode ends when:

- The pole angle exceeds ±12 degrees from vertical
- The cart position is more than ±2.4 units from the center
- The episode length reaches 500 steps

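The three stopping conditions above can be sketched as a plain predicate. The threshold constants mirror the CartPole-v1 limits listed above; the helper name `episode_over` is illustrative, not part of the Gymnasium API:

```python
import math

# CartPole-v1 episode-end limits (from the list above)
THETA_LIMIT_RAD = 12 * math.pi / 180   # ±12 degrees, ~0.2095 rad
X_LIMIT = 2.4                          # ±2.4 units from the center
MAX_EPISODE_STEPS = 500                # truncation horizon

def episode_over(cart_position: float, pole_angle_rad: float, step: int) -> bool:
    """Return True once any of the three end conditions is met."""
    terminated = abs(pole_angle_rad) > THETA_LIMIT_RAD or abs(cart_position) > X_LIMIT
    truncated = step >= MAX_EPISODE_STEPS
    return terminated or truncated

print(episode_over(0.0, 0.0, 10))    # upright, centered, early episode: False
print(episode_over(0.0, 0.25, 10))   # ~14 degrees of tilt: True
print(episode_over(2.5, 0.0, 10))    # cart off the track: True
```

The first two conditions correspond to Gymnasium's `terminated` flag, the 500-step cap to `truncated`.
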
## Training Details

- Environment: CartPole-v1
- Algorithm: A2C (Advantage Actor-Critic)
- Training Steps: 50,000
- Policy: MlpPolicy
- Learning Rate: 0.001
- N_steps: 5
- Gamma: 0.99
- Training Framework: Stable-Baselines3

## Usage

```python
import gymnasium as gym
from stable_baselines3 import A2C

# Create the environment
env = gym.make("CartPole-v1", render_mode="human")

# Load the trained model from a local checkpoint
# (download the repository's .zip first, e.g. with huggingface_sb3.load_from_hub)
model = A2C.load("StevanLS/a2c-cartpole-v1")

# Run the trained policy
obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```

## Author

- StevanLS

## Citations

```bibtex
@misc{gymnasium2023,
  author = {Farama Foundation},
  title = {Gymnasium},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/Farama-Foundation/Gymnasium}
}

@article{raffin2021stable,
  title = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
  author = {Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
  journal = {Journal of Machine Learning Research},
  volume = {22},
  number = {268},
  pages = {1--8},
  year = {2021}
}
```