Update README.md

---
tags:
- deep-reinforcement-learning
- reinforcement-learning
- TD3
- continuous-control
library_name: stable-baselines3
model-index:
- name: td3_lunar
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLanderContinuous-v2
      type: LunarLanderContinuous-v2
    metrics:
    - type: mean_reward
      value: 250.00 +/- 50.00
      name: mean_reward
      verified: false
---

# TD3 Model: td3_lunar

## Model Description

This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLanderContinuous-v2 environment. TD3 improves on DDPG with twin critics and clipped double-Q learning, delayed policy updates, and target policy smoothing; all three mechanisms appear in the hyperparameters below.

## Environment

- **Environment ID**: `LunarLanderContinuous-v2`
- **Action Space**: `Box(2,)` - continuous thrust for the main engine and the side engines, each bounded in [-1, 1]
- **Observation Space**: `Box(8,)` - lander position, velocity, angle, angular velocity, and two leg-contact flags
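
A quick sanity check of these spaces (assumes `gymnasium` installed with the Box2D extra, e.g. `pip install "gymnasium[box2d]"`):

```python
import gymnasium as gym

env = gym.make('LunarLanderContinuous-v2')
print(env.action_space)       # Box(-1.0, 1.0, (2,), float32)
print(env.observation_space)  # Box with shape (8,)
env.close()
```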

## Training Details

- **Total Timesteps**: 1,000,000
- **Training Time**: 2 hours
- **Framework**: PyTorch
- **Library**: stable-baselines3 (or your custom implementation)

## Hyperparameters

- **Learning Rate (Actor)**: 3e-4
- **Learning Rate (Critic)**: 3e-4
- **Discount Factor (gamma)**: 0.99
- **Tau (target network update rate)**: 0.005
- **Policy Noise**: 0.2
- **Noise Clip**: 0.5
- **Policy Delay**: 2
- **Buffer Size**: 1,000,000
- **Batch Size**: 256
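
The Files section below suggests a custom PyTorch training loop, but if stable-baselines3 was used, as the `library_name` tag indicates, the hyperparameters above map directly onto the `TD3` constructor. A minimal sketch, not the exact training script (the exploration noise scale is an assumption; only the target policy noise of 0.2 is documented above):

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make('LunarLanderContinuous-v2')

# Gaussian exploration noise added to actions during training
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = TD3(
    'MlpPolicy',
    env,
    learning_rate=3e-4,       # SB3 shares one rate between actor and critic; both are 3e-4 here
    buffer_size=1_000_000,
    batch_size=256,
    tau=0.005,
    gamma=0.99,
    policy_delay=2,
    target_policy_noise=0.2,  # "Policy Noise" above
    target_noise_clip=0.5,    # "Noise Clip" above
    action_noise=action_noise,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save('td3_lunar')
```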

## Results

- **Mean Reward**: 250.00 ± 50.00 (over 100 evaluation episodes)
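
One way to reproduce this number, assuming an SB3 checkpoint like the `td3_lunar` save from the sketch above (with the raw `.pth` files, use a rollout loop like the one in Usage instead):

```python
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make('LunarLanderContinuous-v2')
model = TD3.load('td3_lunar', env=env)

# Deterministic rollouts over 100 episodes, matching the reported protocol
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100, deterministic=True)
print(f'mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}')
```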

## Usage

```python
import torch
import gymnasium as gym

# Load the actor model (see the sketch below for a plausible architecture)
actor = YourActorClass()  # must match the architecture used in training
actor.load_state_dict(torch.load('actor.pth'))
actor.eval()

# Roll out one episode with the deterministic policy
env = gym.make('LunarLanderContinuous-v2')
state, info = env.reset()
done = False

while not done:
    with torch.no_grad():
        action = actor(torch.FloatTensor(state)).numpy()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
```
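
The snippet above leaves `YourActorClass` undefined. A plausible TD3 actor for this environment is an MLP with a tanh-squashed output; the layer names and sizes below are assumptions and must match the saved state dict (check `config.json`):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: 8-dim observation -> 2-dim action in [-1, 1]."""

    def __init__(self, state_dim=8, action_dim=2, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # actions in LunarLanderContinuous-v2 are bounded in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)
```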

## Files

- `actor.pth`: Actor network weights
- `critic_1.pth`: First critic network weights
- `critic_2.pth`: Second critic network weights
- `config.json`: Model configuration
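
A sketch of reading `config.json` to recover the architecture before loading weights, reusing the `Actor` class from the sketch above (the key names here are hypothetical; inspect the file for the actual schema):

```python
import json
import torch

with open('config.json') as f:
    config = json.load(f)

# Hypothetical keys - adapt to the actual contents of config.json
actor = Actor(
    state_dim=config.get('state_dim', 8),
    action_dim=config.get('action_dim', 2),
    hidden_dim=config.get('hidden_dim', 256),
)
actor.load_state_dict(torch.load('actor.pth'))
actor.eval()
```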