---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---

# 🚁 **Reinforce Agent on Pixelcopter-PLE-v0**

This repository contains a trained **Reinforce (Policy Gradient)** agent for the **Pixelcopter-PLE-v0** environment.

---

## 📊 Model Card

**Model Name:** `Reinforce-Pixelcopter-PLE-v0`

**Environment:** `Pixelcopter-PLE-v0`

**Algorithm:** Reinforce (Monte Carlo Policy Gradient)

**Performance Metric:**
- Mean reward: 12.03, as self-reported in the metadata above (`verified: false`)
- The agent stays airborne and avoids obstacles across evaluation runs

---

## 🚀 Usage

```python
import pickle

import gym
import gym_pygame  # noqa: F401  (registers the PLE environments with gym)
from huggingface_hub import hf_hub_download

# Download the serialized policy; hf_hub_download returns a local file path
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)

# Assumption: reinforce.pkl is a pickle holding a dict with an "env_id" key.
# Only unpickle files from sources you trust.
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize environment
env = gym.make(model["env_id"])
```

---

## 🧠 Notes

- The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns.
- The environment is **Pixelcopter-PLE-v0**, a PyGame Learning Environment (PLE) game where the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.

---

## 📂 Repository Structure

- `reinforce.pkl` → Trained policy weights
- `README.md` → Documentation and usage guide

---

## ✅ Results

- The agent learns to maintain altitude and avoid collisions with obstacles.
- Training converges to a stable policy using **policy gradient methods**.

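
A mean-reward figure like the one in this card is typically an average of total episode rewards over several evaluation episodes. The sketch below shows one way to compute it. It is not necessarily the procedure used for this repository: the `evaluate` helper, the `policy` callable, and the defaults for `n_episodes` and `max_steps` are hypothetical, and the loop assumes the classic Gym API (`reset()` returns an observation, `step()` returns a 4-tuple).

```python
from statistics import mean, pstdev


def evaluate(env, policy, n_episodes=10, max_steps=1000):
    """Run the policy for several episodes and summarize total rewards.

    Assumes the classic Gym API: reset() -> obs, step() -> (obs, reward, done, info).
    `policy` is any callable mapping an observation to an action.
    """
    episode_rewards = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = policy(state)
            state, reward, done, _info = env.step(action)
            total += reward
            if done:
                break
        episode_rewards.append(total)
    # Mean and population standard deviation of the per-episode totals
    return mean(episode_rewards), pstdev(episode_rewards)
```

Reporting the standard deviation alongside the mean helps show how much the agent's performance varies between episodes.
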

---

## 🔎 Environment Overview

- **Observation Space:** 7-dimensional state vector (player height and velocity, distances to floor and ceiling, and position of the next block) rather than raw pixels
- **Action Space:** Discrete, 2 actions (accelerate upward or do nothing)
- **Objective:** Keep the helicopter flying while avoiding obstacles
- **Reward:** Positive reward for each block passed, negative reward when the episode ends in a collision

---

## 📚 Learning Highlights

- **Algorithm:** Reinforce (Policy Gradient)
- **Update Rule:** Policy parameters are updated using returns from sampled episodes
- **Strengths:** Effective for environments with discrete actions and episodic rewards
- **Limitations:** High variance in gradient estimates, partly mitigated by training over many episodes
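
The update rule above can be sketched in plain Python. This is a minimal illustration of the Reinforce objective, not the training code behind this repository: the function names, the discount factor `gamma=0.99`, and the return standardization step (a common variance-reduction trick) are all assumptions. A real implementation would compute the loss with an autodiff library so that gradients flow back through the log-probabilities.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def standardize(returns, eps=1e-8):
    """Rescale returns to zero mean and unit variance (optional variance reduction)."""
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(g - mu) / (sigma + eps) for g in returns]


def reinforce_loss(log_probs, returns):
    """Reinforce objective as a loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Example: three steps with reward 1 each, discounted with gamma = 0.5
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
# returns == [1.75, 1.5, 1.0]
```

Minimizing this loss raises the probability of actions that were followed by high returns, which is why the quality of the update depends on sampling enough episodes.
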