Reinforcement Learning
stable-baselines3
deep-reinforcement-learning
TD3
continuous-control
TD3 Model: td3_lunar
Model Description
This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLanderContinuous-v2 environment.
Environment
- Environment ID: LunarLanderContinuous-v2
- Action Space: Box(2,), continuous actions for the main engine and the side engines
- Observation Space: Box(8,) - Position, velocity, angle, angular velocity, leg contact
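To make the eight observation components concrete, the sketch below unpacks an observation vector into named fields. The field order (x, y, vx, vy, angle, angular velocity, left leg, right leg) is assumed from the Gymnasium LunarLander documentation. The `LanderObs` and `unpack_obs` names and the sample values are hypothetical and not part of this repository.

```python
from typing import NamedTuple, Sequence


class LanderObs(NamedTuple):
    """Named view of the 8-dimensional observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def unpack_obs(obs: Sequence[float]) -> LanderObs:
    """Split a raw observation into named fields, assuming the order above."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, ang_vel, left, right = obs
    return LanderObs(
        float(x), float(y), float(vx), float(vy),
        float(angle), float(ang_vel),
        left > 0.5,   # leg contact flags are reported as 0.0 or 1.0
        right > 0.5,
    )


sample = [0.0, 1.4, 0.1, -0.3, 0.05, 0.0, 0.0, 1.0]  # made-up values
obs = unpack_obs(sample)
print(obs.y, obs.right_leg_contact)
```

Named fields make it easier to log or debug individual quantities (for example, vertical speed at touchdown) without remembering array indices.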
Training Details
- Total Timesteps: 1,000,000
- Training Time: 2 hours
- Framework: PyTorch
- Library: stable-baselines3 (or your custom implementation)
Hyperparameters
- Learning Rate (Actor): 3e-4
- Learning Rate (Critic): 3e-4
- Discount Factor (gamma): 0.99
- Tau: 0.005
- Policy Noise: 0.2
- Noise Clip: 0.5
- Policy Delay: 2
- Buffer Size: 1,000,000
- Batch Size: 256
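As a rough illustration of where these hyperparameters enter the TD3 update, the following pure-Python sketch shows target policy smoothing, the clipped double-Q target, the soft (Polyak) target update, and the delayed actor update. It is a minimal sketch with scalar stand-ins for network outputs, not the training code behind this model. The function names and the action bounds of [-1, 1] are assumptions made for the example.

```python
import random
from typing import List, Sequence

GAMMA = 0.99         # discount factor
TAU = 0.005          # soft-update coefficient
POLICY_NOISE = 0.2   # std of the target smoothing noise
NOISE_CLIP = 0.5     # bound on the smoothing noise
POLICY_DELAY = 2     # critic updates per actor update


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def smoothed_target_action(target_action: Sequence[float],
                           rng: random.Random) -> List[float]:
    """Target policy smoothing: add clipped Gaussian noise, then clip
    to the action bounds (assumed here to be [-1, 1])."""
    smoothed = []
    for a in target_action:
        noise = clip(rng.gauss(0.0, POLICY_NOISE), -NOISE_CLIP, NOISE_CLIP)
        smoothed.append(clip(a + noise, -1.0, 1.0))
    return smoothed


def td3_target(reward: float, terminated: bool,
               q1_next: float, q2_next: float) -> float:
    """Clipped double-Q target: bootstrap from the smaller target critic."""
    bootstrap = 0.0 if terminated else min(q1_next, q2_next)
    return reward + GAMMA * bootstrap


def soft_update(target_params: Sequence[float],
                online_params: Sequence[float]) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [TAU * w + (1.0 - TAU) * t
            for t, w in zip(target_params, online_params)]


def should_update_actor(step: int) -> bool:
    """The actor and target networks are updated every POLICY_DELAY steps."""
    return step % POLICY_DELAY == 0


rng = random.Random(0)
next_action = smoothed_target_action([0.9, -0.2], rng)
y = td3_target(reward=1.0, terminated=False, q1_next=10.0, q2_next=8.0)
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
print(next_action, y, new_target)
```

Taking the minimum of the two critics counteracts Q-value overestimation, and the small tau keeps the target networks changing slowly relative to the online networks.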
Results
- Mean Reward: 250.00 ± 50.00 (over 100 evaluation episodes)
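A figure in this "mean ± spread" form is typically computed from the per-episode returns. The sketch below assumes the spread is the population standard deviation (NumPy's default for `std`); the card does not state which estimator was used. The `summarize_returns` helper and the sample returns are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarize_returns(episode_returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-episode returns."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


returns = [200.0, 250.0, 300.0]  # made-up values, not this model's results
mean, std = summarize_returns(returns)
print(f"{mean:.2f} +/- {std:.2f}")
```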
Usage
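import torch
import gymnasium as gym

# Load the actor model
actor = YourActorClass()  # Placeholder: define the same actor architecture used in training
actor.load_state_dict(torch.load('actor.pth', map_location='cpu'))
actor.eval()

# Run one episode with the deterministic policy.
# Requires the Box2D extra; on Gymnasium 1.0+ the ID may need to be 'LunarLanderContinuous-v3'.
env = gym.make('LunarLanderContinuous-v2')
state, info = env.reset()
done = False
while not done:
    # No gradients are needed at inference time
    with torch.no_grad():
        action = actor(torch.as_tensor(state, dtype=torch.float32)).numpy()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()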
Files
- actor.pth: Actor network weights
- critic_1.pth: First critic network weights
- critic_2.pth: Second critic network weights
- config.json: Model configuration
Evaluation results
- mean_reward on LunarLanderContinuous-v2 (self-reported): 250.00 +/- 50.00