| --- |
| tags: |
| - Pixelcopter-PLE-v0 |
| - reinforce |
| - reinforcement-learning |
| - custom-implementation |
| - deep-rl-class |
| model-index: |
| - name: Reinforce-PixelCopter |
|   results: |
|   - task: |
|       type: reinforcement-learning |
|       name: reinforcement-learning |
|     dataset: |
|       name: Pixelcopter-PLE-v0 |
|       type: Pixelcopter-PLE-v0 |
|     metrics: |
|     - type: mean_reward |
|       value: 12.03 |
|       name: mean_reward |
|       verified: false |
| --- |
| |
|
|
| # **Reinforce Agent on Pixelcopter-PLE-v0** |
|
|
| This repository contains a **Reinforce (Policy Gradient)** agent trained to play the **Pixelcopter-PLE-v0** environment. |
|
|
| --- |
|
|
| ## Model Card |
|
|
| **Model Name:** `Reinforce-Pixelcopter-PLE-v0` |
| **Environment:** `Pixelcopter-PLE-v0` |
| **Algorithm:** Reinforce (Monte Carlo Policy Gradient) |
| **Performance Metric:** |
| - Mean reward of 12.03 (self-reported in the metadata above, not verified) |
| - The agent sustains flight and avoids obstacles across evaluation runs |
|
|
| --- |
|
|
| ## Usage |
|
|
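| ```python |
| import pickle |
| |
| import gym |
| # Assumption: the environment comes from the gym-games package, |
| # whose gym_pygame import registers Pixelcopter-PLE-v0 with gym. |
| import gym_pygame  # noqa: F401 |
| from huggingface_hub import hf_hub_download |
| |
| # Download the serialized Reinforce model from the Hub and unpickle it |
| model_path = hf_hub_download( |
|     repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0", |
|     filename="reinforce.pkl", |
| ) |
| with open(model_path, "rb") as f: |
|     model = pickle.load(f) |
| |
| # Initialize the environment named in the loaded model dictionary |
| env = gym.make(model["env_id"]) |
| ``` |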
|
|
| --- |
|
|
| ## Notes |
| - The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns. |
| - The environment is **Pixelcopter-PLE-v0**, a side-scrolling PyGame Learning Environment (PLE) game in which the agent must keep the helicopter flying while avoiding obstacles. |
| - The serialized policy is stored in `reinforce.pkl`. |
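
As a minimal sketch of what "episodic returns" means here: once an episode ends, each step's return is the discounted sum of the rewards that follow it. The function name `discounted_returns`, the discount factor, and the reward sequence are illustrative assumptions, not values taken from this repository's training run.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical reward sequence for one short episode.
episode_rewards = [0.0, 1.0, 0.0, 1.0, -1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
# prints [0.5625, 1.125, 0.25, 0.5, -1.0]
```

Computing the returns backwards keeps the cost linear in the episode length, since each value is built from the one that follows it.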
|
|
| --- |
|
|
| ## Repository Structure |
| - `reinforce.pkl`: Trained policy weights |
| - `README.md`: Documentation and usage guide |
|
|
| --- |
|
|
| ## Results |
| - The agent learns to maintain altitude and avoid collisions with obstacles. |
| - Training converges to a stable policy using **policy gradient methods**. |
|
|
| --- |
|
|
| ## Environment Overview |
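| - **Observation Space:** 7-dimensional vector of game-state features (player position and velocity, distances to the floor and ceiling, and the next block's distance and top/bottom edges), not raw pixels |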
| - **Action Space:** Discrete, 2 actions (accelerate upward or do nothing) |
| - **Objective:** Keep the helicopter flying while avoiding obstacles |
| - **Reward:** Per the PLE documentation, +1 for each block passed and -1 when a terminal state (collision) is reached |
|
|
| --- |
|
|
| ## Learning Highlights |
| - **Algorithm:** Reinforce (Policy Gradient) |
| - **Update Rule:** Policy parameters updated using returns from sampled episodes |
| - **Strengths:** Effective for environments with discrete actions and episodic rewards |
| - **Limitations:** High-variance gradient estimates, which take many training episodes to average out |
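
To make the update rule concrete, here is a small sketch of one Reinforce step. It assumes a state-independent softmax policy over raw logits so the gradient can be written by hand; the policy in this repository presumably conditions on the game state, and the names `softmax` and `reinforce_step`, the learning rate, and the toy data are all hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(
    logits: List[float],
    actions: List[int],
    returns: List[float],
    lr: float = 0.1,
) -> List[float]:
    """One gradient-ascent step on sum_t G_t * log pi(a_t).

    For a softmax policy over logits, d log pi(a) / d logit_k
    equals (1 if k == a else 0) - pi(k).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, returns):
        for k in range(len(logits)):
            grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
    # Ascent, not descent: move the logits toward higher expected return.
    return [w + lr * dw for w, dw in zip(logits, grad)]


# Hypothetical two-action example: action 1 was sampled once and earned return 1.0.
logits = [0.0, 0.0]
updated = reinforce_step(logits, actions=[1], returns=[1.0], lr=0.1)
print(softmax(logits)[1], "->", softmax(updated)[1])
```

After the step, the probability of the rewarded action rises above 0.5. Because each gradient is scaled by a single sampled return, the estimate is noisy, which is the variance issue listed under Limitations.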
| |