---
license: apache-2.0
tags:
- robotics
- manipulation
- grasp
- lerobot
- clip
---

# A2 Pretrained Policy

Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.

## Model Description

This model uses CLIP-based cross-attention for selecting grasp and place poses from candidates generated by GraspNet/PlaceNet.
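
The selection step described above can be illustrated with a minimal sketch. It is not the ViLGP3D fusion network: it assumes a single language feature vector acts as the query and each candidate pose has a feature vector of the same size acting as a key, and it scores them with single-head scaled dot-product attention. The function names `softmax` and `select_candidate`, the feature values, and the 4-dimensional size are hypothetical, chosen only so the example runs in plain Python without PyTorch.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidate(query, candidates):
    """Return (best_index, attention_weights) for one query vector.

    query:      list of d floats (hypothetical stand-in for a language feature)
    candidates: list of n pose features, each a list of d floats
    """
    scale = math.sqrt(len(query))
    # Scaled dot product between the query and every candidate feature.
    logits = [sum(q * c for q, c in zip(query, cand)) / scale
              for cand in candidates]
    weights = softmax(logits)
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return best_index, weights


# Toy features (assumed values): candidate 1 is the most aligned with the query.
query = [1.0, 0.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
best_index, weights = select_candidate(query, candidates)
print(best_index)  # 1
```

In this toy setup the attention weights form a distribution over the candidates, and the pose with the largest weight is the one that would be executed.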

## Files

- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation

## Usage

### With lerobot_policy_a2

```python
from lerobot_policy_a2 import A2Policy

# Load pretrained model
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# rgb_image, depth_image, and point_cloud are placeholders for your own
# observations; they are not defined in this snippet.

# Use for grasp prediction
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)
```

## Training Details

- **Architecture**: ViLGP3D with CLIP ViT-B/32 backbone
- **Hidden dim**: 768
- **Attention heads**: 8
- **Position encoding**: Rotary Position Encoding (RoPE)
- **Training data**: Tabletop manipulation demonstrations
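
To make the attention settings above concrete, here is a minimal sketch of rotary position encoding in plain Python. It assumes the hidden dimension is split evenly across the heads (768 / 8 = 96 features per head, as in standard multi-head attention) and uses the common 1-D RoPE formulation with base 10000. How ViLGP3D actually applies RoPE to 3D positions is not stated here, so the `rope` helper, the scalar positions, and the toy vectors are all illustrative assumptions.

```python
import math

HIDDEN_DIM = 768                    # from the list above
NUM_HEADS = 8                       # from the list above
HEAD_DIM = HIDDEN_DIM // NUM_HEADS  # 96, assuming an even split across heads


def rope(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy per-head query and key vectors (assumed values).
q = [0.1 * (i % 7) for i in range(HEAD_DIM)]
k = [0.05 * (i % 5) - 0.1 for i in range(HEAD_DIM)]

# Both pairs of positions are 2 apart, so the attention scores match.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 12), rope(k, 10))
print(abs(score_a - score_b) < 1e-9)  # True
```

The point of the rotation is that the query-key score depends only on the offset between the two positions, not on their absolute values, which is why the two scores agree.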

## Related Resources

- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets

## Citation

```bibtex
@misc{a2_policy,
  author = {Denis Grachev},
  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/dgrachev/a2_pretrained}
}
```