anas101alaa committed on
Commit 48b3248 · verified · 1 Parent(s): afaa4de

Update README.md

Files changed (1):
  1. README.md +79 -52
README.md CHANGED
@@ -1,52 +1,79 @@
- ---
- tags:
- - deep-reinforcement-learning
- - reinforcement-learning
- - TD3
- - continuous-control
- ---
-
- # TD3 Model: td3_lunar
-
- ## Model Description
- This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent.
-
- ## Environment
- - **Environment Name**: [e.g., BipedalWalker-v3, HalfCheetah-v2]
- - **Action Space**: Continuous
- - **Observation Space**: [Describe dimensions]
-
- ## Training Details
- - **Total Timesteps**: [e.g., 1M]
- - **Training Time**: [e.g., 2 hours]
- - **Framework**: PyTorch
-
- ## Hyperparameters
- - Learning Rate (Actor): [e.g., 3e-4]
- - Learning Rate (Critic): [e.g., 3e-4]
- - Discount Factor (gamma): [e.g., 0.99]
- - Tau: [e.g., 0.005]
- - Policy Noise: [e.g., 0.2]
- - Noise Clip: [e.g., 0.5]
- - Policy Delay: [e.g., 2]
-
- ## Results
- - **Mean Reward**: [e.g., 250 ± 50]
-
- ## Usage
- ```python
- import torch
-
- # Load the actor model
- actor = YourActorClass()  # Define your actor architecture
- actor.load_state_dict(torch.load('actor.pth'))
- actor.eval()
-
- # Use the model
- state = env.reset()
- action = actor(torch.FloatTensor(state)).detach().numpy()
- ```
-
- ## Files
- - `actor.pth`: Actor network weights
- - `critic.pth`: Critic network weights (if applicable)
+ ---
+ tags:
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - TD3
+ - continuous-control
+ library_name: stable-baselines3
+ model-index:
+ - name: td3_lunar
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: LunarLanderContinuous-v2
+       type: LunarLanderContinuous-v2
+     metrics:
+     - type: mean_reward
+       value: 250.00 +/- 50.00
+       name: mean_reward
+       verified: false
+ ---
+
+ # TD3 Model: td3_lunar
+
+ ## Model Description
+ This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLanderContinuous-v2 environment.
+
+ ## Environment
+ - **Environment ID**: `LunarLanderContinuous-v2`
+ - **Action Space**: Box(2,) - Continuous actions for the main engine and side engines
+ - **Observation Space**: Box(8,) - Position, velocity, angle, angular velocity, leg contact
+
+ ## Training Details
+ - **Total Timesteps**: 1,000,000
+ - **Training Time**: 2 hours
+ - **Framework**: PyTorch
+ - **Library**: stable-baselines3 (or your custom implementation)
+
+ ## Hyperparameters
+ - **Learning Rate (Actor)**: 3e-4
+ - **Learning Rate (Critic)**: 3e-4
+ - **Discount Factor (gamma)**: 0.99
+ - **Tau**: 0.005
+ - **Policy Noise**: 0.2
+ - **Noise Clip**: 0.5
+ - **Policy Delay**: 2
+ - **Buffer Size**: 1,000,000
+ - **Batch Size**: 256
+
+ ## Results
+ - **Mean Reward**: 250.00 ± 50.00 (over 100 evaluation episodes)
+
+ ## Usage
+ ```python
+ import torch
+ import gymnasium as gym
+
+ # Load the actor model
+ actor = YourActorClass()  # Define your actor architecture (8-dim observation in, 2-dim action out)
+ actor.load_state_dict(torch.load('actor.pth', map_location='cpu'))
+ actor.eval()
+
+ # Roll out one episode with the trained policy
+ env = gym.make('LunarLanderContinuous-v2')
+ state, info = env.reset()
+ done = False
+
+ while not done:
+     with torch.no_grad():
+         action = actor(torch.FloatTensor(state)).numpy()
+     state, reward, terminated, truncated, info = env.step(action)
+     done = terminated or truncated
+ ```
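The Policy Noise and Noise Clip hyperparameters listed above control TD3's target-policy smoothing: Gaussian noise is added to the target actor's action and clipped before computing critic targets. A minimal standard-library sketch of that step, using this card's values (the `MAX_ACTION = 1.0` bound is an assumption based on LunarLanderContinuous's [-1, 1] action range):

```python
import random

POLICY_NOISE = 0.2  # std of target-policy smoothing noise (from this card)
NOISE_CLIP = 0.5    # clip range for that noise (from this card)
MAX_ACTION = 1.0    # assumed action bound; LunarLanderContinuous actions lie in [-1, 1]

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def smoothed_target_action(mu, rng=random):
    """Add clipped Gaussian noise to each target-action component, then clip to the action bounds."""
    out = []
    for a in mu:
        eps = clip(rng.gauss(0.0, POLICY_NOISE), -NOISE_CLIP, NOISE_CLIP)
        out.append(clip(a + eps, -MAX_ACTION, MAX_ACTION))
    return out

print(smoothed_target_action([0.9, -0.9]))
```

In a full TD3 update this smoothed action feeds both target critics, and the smaller of the two Q-values forms the bootstrap target.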
+
+ ## Files
+ - `actor.pth`: Actor network weights
+ - `critic_1.pth`: First critic network weights
+ - `critic_2.pth`: Second critic network weights
+ - `config.json`: Model configuration
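The "mean ± std" format reported in the Results section can be reproduced from per-episode returns with the standard-library `statistics` module; a small sketch (the episode returns below are placeholders for illustration, not this model's actual evaluation data):

```python
import statistics

def summarize_returns(returns):
    """Format per-episode returns as mean +/- sample standard deviation."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)
    return f"{mean:.2f} +/- {std:.2f}"

# Placeholder returns for illustration only.
returns = [240.0, 260.0, 255.0, 245.0]
print(summarize_returns(returns))  # → 250.00 +/- 9.13
```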