This model was trained for the Hugging Face Deep Reinforcement Learning course using a CleanRL-style PPO implementation in PyTorch.
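For readers unfamiliar with PPO, the clipped policy loss at the core of the algorithm can be sketched as below. This is a minimal illustration in plain Python, not the training code behind this model: the function name `ppo_clipped_loss`, its arguments, and the `clip_coef=0.2` default are assumptions chosen for the example, and a PyTorch implementation would compute the same quantities on tensors.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean PPO clipped policy loss over a batch given as plain Python lists.

    Hypothetical helper for illustration only; clip_coef=0.2 is an assumed value.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # Negated surrogate objective; taking the larger loss is the
        # pessimistic choice between the clipped and unclipped terms.
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)
```

When the new and old log-probabilities are equal, the ratio is 1 and the loss reduces to the negated mean advantage. Clipping only takes effect once the updated policy moves the ratio outside the allowed band in the direction the advantage favors.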