---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 58.13 +/- 55.17
      name: mean_reward
      verified: false
---

# 🚁 Reinforce Agent — Pixelcopter-PLE-v0

A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging side-scrolling game with continuous observations and discrete actions, built on the PyGame Learning Environment (PLE).

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46,000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |

---

## 🧠 Algorithm — REINFORCE (Monte Carlo Policy Gradient)

REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:

1. Rolling out full episodes using the current policy
2. Computing the discounted return **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**

The policy network is a simple feedforward neural network:

- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax

---

## ⚙️ Hyperparameters

| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |

---

## 🎮 About the Environment

**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions — throttle up or do nothing
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling

---

## 🚀 How to Use

The snippet below assumes `model.pt` stores the full serialized policy module, and that `env` is the custom Gymnasium wrapper around PLE used during training (the wrapper class ships with the training code, not with this card).

```python
from ple.games.pixelcopter import Pixelcopter
from ple import PLE
import torch

# Load the serialized policy (the Policy class must be importable)
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# `env` must be constructed from the custom Gymnasium wrapper
# around PLE's Pixelcopter before running this step.
state, _ = env.reset()

# Sample an action from the policy for the current state
action, _ = model.act(state)
```

---

## 📚 Training Details

- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability
- **Environment API:** PyGame Learning Environment (PLE) via a custom Gymnasium wrapper

---

## 👤 Author

Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).
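---

## 🔍 Algorithm Sketch

The core REINFORCE machinery described above — discounted returns (step 2), per-episode standardization (see Training Details), and gradient ascent on E[log π_θ(a|s) · Gₜ] (step 3) — can be sketched framework-agnostically. This is an illustrative NumPy sketch, not the card's actual PyTorch implementation: `LinearSoftmaxPolicy` and every name below are hypothetical, and it uses a single linear layer where the real model has a ReLU hidden layer trained with Adam via autograd.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Step 2: G_t = r_{t+1} + gamma * r_{t+2} + ... via one backward pass."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

def standardize(returns, eps=1e-8):
    """Per-episode standardization, as noted under Training Details."""
    return (returns - returns.mean()) / (returns.std() + eps)

def softmax(z):
    z = z - z.max()  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

class LinearSoftmaxPolicy:
    """Toy linear-softmax policy (hypothetical stand-in for the real network)."""

    def __init__(self, n_obs, n_actions, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_actions, n_obs))
        self.lr = lr

    def act(self, state, rng):
        probs = softmax(self.W @ np.asarray(state))
        return rng.choice(len(probs), p=probs), probs

    def update(self, episode, gamma=0.99):
        """Step 3: gradient ascent on E[log pi_theta(a|s) * G_t]."""
        states, actions, rewards = zip(*episode)
        G = standardize(discounted_returns(list(rewards), gamma))
        for s, a, g in zip(states, actions, G):
            s = np.asarray(s)
            probs = softmax(self.W @ s)
            # For a linear-softmax policy, grad of log pi(a|s) w.r.t. W
            # is (onehot(a) - probs) s^T.
            grad = (np.eye(len(probs))[a] - probs)[:, None] * s[None, :]
            self.W += self.lr * g * grad
```

In the actual agent the hand-derived gradient is replaced by PyTorch autograd over the hidden-layer network, optimized with Adam using the hyperparameters listed above.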