Mahmoud103 committed 0fda40e (verified; parent: a913dbb)

Upload README.md with huggingface_hub

Files changed (1): README.md, added (+73 -0)

---
tags:
- SpaceInvadersNoFrameskip-v4
- deep-reinforcement-learning
- reinforcement-learning
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: SpaceInvadersNoFrameskip-v4
      type: SpaceInvadersNoFrameskip-v4
    metrics:
    - type: mean_reward
      value: 900.0
      name: mean_reward
      verified: false
---

# PPO Agent playing SpaceInvadersNoFrameskip-v4

This is a PPO (Proximal Policy Optimization) agent trained with CleanRL to play SpaceInvadersNoFrameskip-v4.

## Metrics
- **Mean reward**: 900.0 (averaged over 100 evaluation episodes)

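## Usage

The snippet assumes that `PPO_atari.py` is importable and defines a CleanRL-style `Agent`, which reads `envs.single_action_space`. It also assumes that `model.pth` contains the saved `state_dict`. The network only receives valid inputs if observations are preprocessed the same way as during training. The environment below therefore mirrors the preprocessing in CleanRL's `ppo_atari.py`; adjust it if your training script differed.

```python
import gymnasium as gym
import torch
from stable_baselines3.common.atari_wrappers import MaxAndSkipEnv, NoopResetEnv

from PPO_atari import Agent  # Agent class defined in the training script


def make_env():
    # Atari preprocessing assumed to match training (gymnasium 0.29 wrapper names):
    # random no-op starts, 4-frame skip, 84x84 grayscale, 4 stacked frames (no reward clipping).
    env = gym.make("SpaceInvadersNoFrameskip-v4")
    env = NoopResetEnv(env, noop_max=30)
    env = MaxAndSkipEnv(env, skip=4)
    env = gym.wrappers.ResizeObservation(env, (84, 84))
    env = gym.wrappers.GrayScaleObservation(env)
    env = gym.wrappers.FrameStack(env, 4)
    return env


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# The Agent expects a vector env (it reads `single_action_space`), so wrap a single copy
envs = gym.vector.SyncVectorEnv([make_env])

# Load the model
agent = Agent(envs).to(device)
agent.load_state_dict(torch.load("model.pth", map_location=device))
agent.eval()

# Run one evaluation episode
obs, _ = envs.reset()
episode_return = 0.0
done = False
while not done:
    with torch.no_grad():
        action, _, _, _ = agent.get_action_and_value(torch.Tensor(obs).to(device))
    obs, reward, terminated, truncated, _ = envs.step(action.cpu().numpy())
    episode_return += float(reward[0])
    done = bool(terminated[0] or truncated[0])
print(f"Episode return: {episode_return}")
```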
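
The usage loop plays a single episode, while the reported mean reward is averaged over 100 evaluation episodes. The following is a minimal sketch of that aggregation. It assumes a hypothetical `run_episode()` callable that plays one full episode (for example, the loop above with its rewards summed) and returns the total reward. The README does not say how the original evaluation was run (seeds, wrappers, sampled vs. greedy actions), so treat this as illustrative only.

```python
import statistics
from typing import Callable, Tuple


def evaluate(run_episode: Callable[[], float], n_episodes: int = 100) -> Tuple[float, float]:
    """Play `n_episodes` episodes and return the mean and population std of their returns.

    `run_episode` is a hypothetical callable that plays one full episode
    and returns the sum of its rewards.
    """
    if n_episodes < 1:
        raise ValueError("n_episodes must be at least 1")
    returns = [run_episode() for _ in range(n_episodes)]
    return statistics.fmean(returns), statistics.pstdev(returns)


if __name__ == "__main__":
    # Stand-in episode runner with fixed returns, only to show the call shape.
    fake_returns = iter([850.0, 950.0, 900.0, 900.0])
    mean_reward, std_reward = evaluate(lambda: next(fake_returns), n_episodes=4)
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

Reporting the standard deviation alongside the mean helps indicate how much individual Space Invaders episodes vary around the 900.0 figure.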

## Training Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Environment**: SpaceInvadersNoFrameskip-v4
- **Total timesteps**: 10,000,000
- **Framework**: CleanRL
- **Number of parallel environments**: 8
- **Learning rate**: 2.5e-4
- **Evaluation episodes**: 100
- **Mean reward**: 900.00
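
The timestep budget and environment count together determine how much data each PPO update sees. The sketch below shows that arithmetic, using the 4 minibatches and 4 epochs listed under Hyperparameters. It assumes 128 rollout steps per environment, which is the default in CleanRL's `ppo_atari.py`; this README does not state the value. The `PPOSchedule` class is a hypothetical helper written for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOSchedule:
    total_timesteps: int = 10_000_000
    num_envs: int = 8
    num_steps: int = 128  # ASSUMPTION: CleanRL ppo_atari default; not stated in this README
    num_minibatches: int = 4
    update_epochs: int = 4

    @property
    def batch_size(self) -> int:
        # Transitions collected per rollout across all parallel environments
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def num_updates(self) -> int:
        # Rollout/update iterations; leftover timesteps that do not fill a batch are not collected
        return self.total_timesteps // self.batch_size

    @property
    def gradient_steps(self) -> int:
        # One optimizer step per minibatch, per epoch, per update
        return self.num_updates * self.update_epochs * self.num_minibatches


if __name__ == "__main__":
    s = PPOSchedule()
    print(s.batch_size, s.minibatch_size, s.num_updates, s.gradient_steps)
    # -> 1024 256 9765 156240
```

Under that assumption, training runs 9,765 rollout/update cycles of 1,024 transitions each. A different `num_steps` would change every derived number except the total timestep budget.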

## Hyperparameters

- Learning rate: 2.5e-4
- Gamma: 0.99
- GAE Lambda: 0.95
- Clip coefficient: 0.1
- Value function coefficient: 0.5
- Entropy coefficient: 0.01
- Number of epochs: 4
- Minibatches: 4
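
Gamma and GAE lambda control how advantages are estimated. The clip, value-function, and entropy coefficients weight the terms of the PPO loss. The following pure-Python sketch shows both pieces for a single environment's rollout, using the values listed above as defaults. It follows the standard clipped-surrogate PPO and GAE formulation, but it is an illustration on plain lists, not the training code. It omits batching and the advantage normalization or value-loss clipping that a full implementation may use. `compute_gae` and `ppo_loss` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

GAMMA = 0.99
GAE_LAMBDA = 0.95
CLIP_COEF = 0.1
VF_COEF = 0.5
ENT_COEF = 0.01


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = GAMMA,
    lam: float = GAE_LAMBDA,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout, iterated backwards.

    dones[t] is True if the episode ended at step t, so nothing is
    bootstrapped from the following state. Returns (advantages, returns).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode boundaries
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    new_values: Sequence[float],
    returns: Sequence[float],
    entropies: Sequence[float],
    clip_coef: float = CLIP_COEF,
    vf_coef: float = VF_COEF,
    ent_coef: float = ENT_COEF,
) -> float:
    """Clipped-surrogate PPO loss averaged over a minibatch (to be minimized)."""
    n = len(advantages)
    pg_loss = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Pessimistic bound: keep the worse of the unclipped and clipped objectives
        pg_loss += max(-adv * ratio, -adv * clipped)
    pg_loss /= n
    v_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    # Entropy is subtracted so that higher entropy (more exploration) lowers the loss
    return pg_loss + vf_coef * v_loss - ent_coef * entropy
```

With a clip coefficient of 0.1, the probability ratio can move only within [0.9, 1.1] before the policy term stops rewarding further change. This is what keeps each of the 4 update epochs close to the policy that collected the rollout.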