metadata
library_name: stable-baselines3
tags:
  - PandaReachDense-v3
  - deep-reinforcement-learning
  - reinforcement-learning
  - robotics
  - stable-baselines3
  - gymnasium
  - panda-gym
model-index:
  - name: A2C
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: PandaReachDense-v3
          type: PandaReachDense-v3
        metrics:
          - type: mean_reward
            value: '-17.94 +/- 6.03'
            name: mean_reward
            verified: false

A2C Agent for PandaReachDense-v3

This repository contains a trained Advantage Actor-Critic (A2C) agent for the PandaReachDense-v3 robotics environment from Panda-Gym.

The agent was trained using:

  • Stable-Baselines3
  • Gymnasium
  • Panda-Gym

Environment

The agent controls a Franka Panda robotic arm and must move its end effector to a target position in 3D space.

Environment:

  • PandaReachDense-v3

Training Details

Algorithm:

  • A2C (Advantage Actor-Critic)

Observation Space:

  • Continuous, provided as a dictionary with observation, achieved_goal, and desired_goal entries

Action Space:

  • Continuous robotic control

Reward Type:

  • Dense reward
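As a rough illustration of what "dense" means here, the sketch below assumes the reward is the negative Euclidean distance between the end effector and the target, so the agent gets a graded signal on every step rather than only on success. The function name `dense_reach_reward` and the coordinates are hypothetical and are not taken from Panda-Gym's source.

```python
import math


def dense_reach_reward(end_effector_pos, target_pos):
    """Return the negative Euclidean distance between two 3D points."""
    return -math.dist(end_effector_pos, target_pos)


# Hypothetical positions: the reward moves toward 0 as the arm nears the target.
far = dense_reach_reward((0.0, 0.0, 0.0), (0.3, 0.4, 0.0))
near = dense_reach_reward((0.3, 0.4, 0.0), (0.3, 0.4, 0.0))
```

Under this assumption the best possible per-step reward is 0, which is why episode returns for this task are negative.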

Evaluation Reward:

  • Mean Reward: -17.94 +/- 6.03
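A figure in this "mean +/- spread" form is usually the mean and standard deviation of per-episode returns. The sketch below assumes that interpretation and uses the population standard deviation; the README does not state how many evaluation episodes were run, and `summarize_returns` and the example values are hypothetical.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean +/- std' with two decimals."""
    mean = fmean(episode_returns)
    std = pstdev(episode_returns)  # population standard deviation
    return f"{mean:.2f} +/- {std:.2f}"


# Made-up returns for illustration only, not this model's results.
example_returns = [-10.0, -20.0, -30.0]
summary = summarize_returns(example_returns)
```

A large spread relative to the mean suggests that performance varies noticeably from episode to episode.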

Usage

Install dependencies:

pip install stable-baselines3 gymnasium panda-gym huggingface_sb3

Load the model:

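import gymnasium as gym
import panda_gym  # noqa: F401  (importing registers the Panda environments with Gymnasium)
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

repo_id = "nirmanpatel/a2c-PandaReachDense-v3"
filename = "a2c-PandaReachDense-v3.zip"

# Download the checkpoint from the Hugging Face Hub and get its local path
checkpoint = load_from_hub(
    repo_id=repo_id,
    filename=filename,
)

env = gym.make("PandaReachDense-v3")

model = A2C.load(checkpoint)

obs, info = env.reset()

for _ in range(1000):
    # Pick the policy's deterministic action for the current observation
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one ends
    if terminated or truncated:
        obs, info = env.reset()

env.close()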
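To measure performance rather than just step the environment, the rollout can accumulate the reward of each episode. The following is a minimal sketch that assumes only the Gymnasium `reset`/`step` interface; `collect_episode_returns` and `CountdownEnv` are hypothetical names, and the stand-in environment exists only so the snippet runs without a simulator.

```python
def collect_episode_returns(env, policy, n_episodes):
    """Run n_episodes and return the summed reward of each one."""
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        total = 0.0
        done = False
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns


class CountdownEnv:
    """Hypothetical stand-in env: pays -1 per step and truncates after 3 steps."""

    def reset(self):
        self.steps = 0
        return 0, {}

    def step(self, action):
        self.steps += 1
        return self.steps, -1.0, False, self.steps >= 3, {}


demo_returns = collect_episode_returns(CountdownEnv(), lambda obs: 0, n_episodes=2)
```

With the loaded agent, the policy argument could be `lambda obs: model.predict(obs, deterministic=True)[0]` and the environment the one created by `gym.make`.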

Notes

This project demonstrates:

  • Reinforcement Learning for robotics
  • Continuous control using A2C
  • Gymnasium-compatible RL pipelines
  • Hugging Face model deployment

Author

Created by Nirman Patel