KraTUZen committed on
Commit
bca8553
·
verified ·
1 Parent(s): b0c5799

Update README.md

  1. README.md +69 -4
README.md CHANGED
@@ -21,7 +21,72 @@ model-index:
21
  verified: false
22
  ---
23
 
24
- # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
25
- This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
26
- To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
27
-
24
+
25
+ # 🚁 **Reinforce Agent on Pixelcopter-PLE-v0**
26
+
27
+ This repository contains a trained **Reinforce (Policy Gradient)** agent that successfully plays the **Pixelcopter-PLE-v0** environment.
28
+
29
+ ---
30
+
31
+ ## 📊 Model Card
32
+
33
+ **Model Name:** `Reinforce-Pixelcopter-PLE-v0`
34
+ **Environment:** `Pixelcopter-PLE-v0`
35
+ **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
36
+ **Performance Metric:**
37
+ - Achieves stable flight and obstacle avoidance across evaluation runs
38
+ - Mean reward demonstrates convergence to an effective policy
39
+
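The "mean reward across evaluation runs" mentioned above is usually obtained by rolling out the policy for several episodes and averaging the episode returns. The following is a minimal sketch of that procedure, not the author's evaluation script: `StubEnv`, `evaluate`, and the random policy are hypothetical stand-ins, and the classic `gym` step signature `(obs, reward, done, info)` is assumed.

```python
import random
import statistics


class StubEnv:
    """Hypothetical stand-in for the real environment (classic gym step API)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        reward = -1.0 if done else 1.0  # illustrative rewards only
        return 0.0, reward, done, {}


def evaluate(env, policy, n_episodes=10):
    """Return (mean, std) of the undiscounted return over n_episodes."""
    totals = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        totals.append(total)
    return statistics.mean(totals), statistics.pstdev(totals)


# Random policy over two discrete actions, used only to exercise the loop.
mean_reward, std_reward = evaluate(StubEnv(), policy=lambda obs: random.choice([0, 1]))
```

Swapping `StubEnv()` for the real environment and the lambda for the trained policy's action function would give the reported metric, assuming both follow the same interface.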
40
+ ---
41
+
42
+ ## 🚀 Usage
43
+
44
+ ```python
45
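+ import pickle
46
+
47
+ import gym
48
+ from huggingface_hub import hf_hub_download
49
+
50
+ # Download the serialized policy; hf_hub_download returns a local file path
51
+ path = hf_hub_download(repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0", filename="reinforce.pkl")
52
+ with open(path, "rb") as f:
53
+     model = pickle.load(f)  # only unpickle files you trust
54
+
55
+ env = gym.make(model["env_id"])  # assumes the pickle is a dict with an "env_id" key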
56
+ ```
57
+
58
+ ---
59
+
60
+ ## 🧠 Notes
61
+ - The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns.
62
+ - The environment is **Pixelcopter-PLE-v0**, a pixel-based game where the agent must keep the helicopter flying while avoiding obstacles.
63
+ - The serialized policy is stored in `reinforce.pkl`.
64
+
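For reference, the update "based on episodic returns" described in the notes above is commonly written as follows. This is the textbook form of REINFORCE, not a formula taken from this repository; the symbols $\theta$ (policy parameters), $\alpha$ (learning rate), $\gamma$ (discount factor), and $T$ (episode length) are notation assumed here.

```latex
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1},
\qquad
\theta \leftarrow \theta + \alpha \sum_{t=0}^{T-1} G_t \,\nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

Each action's log-probability is pushed up or down in proportion to the discounted return $G_t$ that followed it, so the parameters can only be updated once a full episode has been sampled.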
65
+ ---
66
+
67
+ ## 📂 Repository Structure
68
+ - `reinforce.pkl` → Trained policy weights
69
+ - `README.md` → Documentation and usage guide
70
+
71
+ ---
72
+
73
+ ## ✅ Results
74
+ - The agent learns to maintain altitude and avoid collisions with obstacles.
75
+ - Demonstrates convergence to a stable policy using **policy gradient methods**.
76
+
77
+ ---
78
+
79
+ ## 🔎 Environment Overview
80
+ - **Observation Space:** 7-dimensional state vector (player height and velocity, distances to floor and ceiling, position of the next block)
81
+ - **Action Space:** Discrete, 2 actions (accelerate upward or do nothing)
82
+ - **Objective:** Keep the helicopter flying while avoiding obstacles
83
+ - **Reward:** +1 for each block passed, -1 when a terminal state (collision) is reached
84
+
85
+ ---
86
+
87
+ ## 📚 Learning Highlights
88
+ - **Algorithm:** Reinforce (Policy Gradient)
89
+ - **Update Rule:** Policy parameters updated using returns from sampled episodes
90
+ - **Strengths:** Effective for environments with discrete actions and episodic rewards
91
+ - **Limitations:** High-variance gradient estimates, because each update relies on Monte Carlo returns from a few sampled episodes; training therefore needs many episodes to converge
92
+
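The update rule and the variance limitation listed above can be made concrete with a small sketch of how per-step returns are typically computed from one sampled episode. This is an illustration under assumptions, not the training code behind this model: the function names and the default `gamma` are hypothetical, and standardizing the returns is a common variance-reduction trick that this repository may or may not use.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def standardize(values, eps=1e-8):
    """Rescale values to zero mean and (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / (std + eps) for v in values]


# Example episode: two surviving steps followed by a crash.
episode_returns = discounted_returns([1.0, 1.0, -1.0], gamma=0.5)
weights = standardize(episode_returns)
```

In a policy-gradient loss, each entry of `weights` would multiply the negative log-probability of the action taken at that step.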