REINFORCE Agent for Pixelcopter-PLE-v0

Model Description

This repository contains a REINFORCE (Monte Carlo policy gradient) agent trained to play Pixelcopter-PLE-v0, a challenging helicopter navigation game from the PyGame Learning Environment (PLE). The agent learns its flight-control policy by trial and error: it adjusts the policy's parameters directly so that actions which led to higher returns become more likely.
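For illustration, the core REINFORCE update can be sketched in plain Python. After an episode, the parameters move in the direction of the policy-gradient estimate, the sum over time steps of G_t · ∇ log π(a_t | s_t), where G_t is the return from step t onward. The card does not specify the agent's network architecture, so a linear logistic policy over a feature vector stands in for it here. The helpers `prob_thrust`, `sample_action`, and `reinforce_update`, the 2-feature example state, and the learning rate are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical policy: pi(thrust | s) = sigmoid(w . s + b).
# Action 1 = thrust up, action 0 = do nothing.


def prob_thrust(w: Sequence[float], b: float, s: Sequence[float]) -> float:
    """Probability of choosing action 1 (thrust) in state s."""
    z = sum(wi * si for wi, si in zip(w, s)) + b
    # Numerically stable sigmoid (avoids overflow in math.exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_action(w: Sequence[float], b: float, s: Sequence[float],
                  rng: random.Random) -> int:
    """Sample an action from the stochastic policy (this is where exploration comes from)."""
    return 1 if rng.random() < prob_thrust(w, b, s) else 0


def reinforce_update(w: List[float], b: float,
                     states: Sequence[Sequence[float]],
                     actions: Sequence[int],
                     returns: Sequence[float],
                     lr: float = 0.01) -> Tuple[List[float], float]:
    """One REINFORCE step: gradient ascent on sum_t G_t * log pi(a_t | s_t)."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for s, a, g in zip(states, actions, returns):
        p = prob_thrust(w, b, s)
        # For a sigmoid policy, d log pi(a | s) / dz = a - p.
        coeff = (a - p) * g
        for i, si in enumerate(s):
            grad_w[i] += coeff * si
        grad_b += coeff
    new_w = [wi + lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, b + lr * grad_b


if __name__ == "__main__":
    rng = random.Random(0)
    w, b = [0.0, 0.0], 0.0
    s = [1.0, 0.5]  # hypothetical 2-feature state
    a = sample_action(w, b, s, rng)
    w2, b2 = reinforce_update(w, b, [s], [a], [1.0], lr=0.1)
    # A positive return makes the sampled action more likely.
    before = prob_thrust(w, b, s) if a == 1 else 1 - prob_thrust(w, b, s)
    after = prob_thrust(w2, b2, s) if a == 1 else 1 - prob_thrust(w2, b2, s)
    print(a, round(before, 3), after > before)
```

A positive return raises the probability of the action that was taken and a negative return lowers it; weighting every step by its full return is what makes the estimator unbiased but noisy.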

Model Details

Algorithm: REINFORCE (Monte Carlo Policy Gradient)
Environment: Pixelcopter-PLE-v0 (PyGame Learning Environment)
Implementation: Custom, following the Deep RL Course (Unit 4) guidelines
Task Type: Discrete Control (Binary Actions)
Action Space: Discrete (2 actions: do nothing or thrust up)
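Observation Space: Feature-based game state rather than raw pixels (in the Deep RL Course setup, a 7-dimensional vector: player y position, player velocity, distances to the floor and ceiling, the next block's horizontal distance, and the next block's top and bottom y positions)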
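The "Monte Carlo" in the algorithm name refers to how REINFORCE scores actions: it waits until an episode ends, then computes each step's discounted return G_t = r_t + γ·G_{t+1} from the rewards actually received. A minimal sketch follows; the discount factor (γ = 0.99 by default) and the optional return standardization are illustrative assumptions, since the card does not list the agent's hyperparameters.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99,
                       normalize: bool = False, eps: float = 1e-8) -> List[float]:
    """Compute Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode.

    gamma=0.99 is a hypothetical default, not this agent's documented setting.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    if normalize and len(returns) > 1:
        # Optional trick: standardize returns to reduce gradient variance.
        mean = sum(returns) / len(returns)
        std = (sum((g - mean) ** 2 for g in returns) / len(returns)) ** 0.5
        returns = [(g - mean) / (std + eps) for g in returns]
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 1.0], gamma=0.5))  # [1.25, 0.5, 1.0]
```

Because G_t is only known once the episode terminates, REINFORCE updates the policy once per episode rather than once per step.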

Environment Overview

Pixelcopter-PLE-v0 is a classic helicopter control game with the following characteristics:

Objective: Navigate a helicopter through obstacles without crashing
Challenge: Requires precise timing and control to avoid the ceiling, the floor, and obstacles
Physics: Gravity constantly pulls the helicopter down; the player must apply thrust to maintain altitude
Scoring: Points are awarded for surviving longer and successfully navigating through gaps
Difficulty: Thrust changes the helicopter's velocity rather than its position directly, so an action's effect only shows up several steps later; the agent must learn these temporal dependencies and time its actions precisely
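To make the gravity-versus-thrust trade-off concrete, here is a deliberately simplified altitude model. It is not the actual Pixelcopter-PLE-v0 implementation: the `ToyCopter` class, the `steps_until_crash` helper, all constants, and the absence of obstacles are hypothetical choices made for illustration. It shows that neither action alone is safe; survival depends on when the agent thrusts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyCopter:
    """Hypothetical 1-D altitude model, NOT the real Pixelcopter-PLE-v0 physics."""
    y: float = 50.0         # altitude above the floor (hypothetical units)
    vy: float = 0.0         # vertical velocity, positive = upward
    gravity: float = 1.0    # downward acceleration applied every step
    thrust: float = 2.0     # upward acceleration added when action == 1
    ceiling: float = 100.0  # altitude of the ceiling

    def step(self, action: int) -> bool:
        """Advance one tick (0 = do nothing, 1 = thrust up); return True on a crash."""
        self.vy += -self.gravity + (self.thrust if action == 1 else 0.0)
        self.y += self.vy
        return self.y <= 0.0 or self.y >= self.ceiling


def steps_until_crash(policy: Callable[[ToyCopter], int],
                      max_steps: int = 1000) -> Optional[int]:
    """Run a policy from the default start state; None means it never crashed."""
    copter = ToyCopter()
    for t in range(1, max_steps + 1):
        if copter.step(policy(copter)):
            return t
    return None


if __name__ == "__main__":
    print(steps_until_crash(lambda c: 0))                     # never thrust: 10
    print(steps_until_crash(lambda c: 1))                     # always thrust: 10
    print(steps_until_crash(lambda c: 1 if c.y < 50 else 0))  # timed thrust: None
```

The timed rule here is hand-written; a REINFORCE agent has to discover comparable timing from rewards alone.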

Performance

The trained REINFORCE agent achieves the following performance metrics:

Mean Reward: 13.10 ± 6.89
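Performance Analysis: A positive mean return shows that the agent has learned to keep the helicopter flying and navigate obstacles rather than crashing immediately, a reasonable result for this challenging environment
Consistency: The standard deviation (6.89, roughly half the mean) indicates substantial episode-to-episode variability. This is common for policy gradient agents: they typically act by sampling from a stochastic policy, so even a fixed trained agent behaves differently across episodes, and a single mistimed action can end an episode early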
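A figure such as 13.10 ± 6.89 is typically the mean and standard deviation of total reward over a set of evaluation episodes. The card does not state how many episodes were used or which standard deviation was computed, so the sketch below assumes the population standard deviation (the default of NumPy's `np.std`). The `summarize_returns` helper is hypothetical, and the episode returns it is called with are made-up placeholders, not this agent's data.

```python
import statistics
from typing import Sequence, Tuple


def summarize_returns(episode_returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-episode returns."""
    mean = statistics.fmean(episode_returns)  # fmean requires Python 3.8+
    std = statistics.pstdev(episode_returns)  # population std, as assumed above
    return mean, std


if __name__ == "__main__":
    # Hypothetical returns for illustration only; not this model's evaluation data.
    mean, std = summarize_returns([10.0, 20.0, 30.0])
    print(f"mean_reward: {mean:.2f} +/- {std:.2f}")  # mean_reward: 20.00 +/- 8.16
```

`statistics.pstdev` is used rather than `statistics.stdev` to match that assumption; with only a handful of evaluation episodes, the population and sample estimates differ noticeably.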

Educational Resources

This model was developed following the Deep Reinforcement Learning Course Unit 4:

Course Link: https://huggingface.co/deep-rl-course/unit4/introduction
Topic: Policy Gradient Methods and REINFORCE
Learning Objectives: Understanding policy-based RL algorithms

For comprehensive learning about REINFORCE and policy gradient methods, refer to the complete course materials.


Evaluation results

mean_reward on Pixelcopter-PLE-v0 (self-reported): 13.10 ± 6.89