---
license: apache-2.0
tags:
- robotics
- manipulation
- grasp
- lerobot
- clip
---
# A2 Pretrained Policy
Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.
## Model Description
The model uses CLIP-based cross-attention to select grasp and place poses, conditioned on a natural-language goal, from candidates generated by GraspNet and PlaceNet.
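To make the selection step concrete, here is a minimal sketch of scoring candidate poses with scaled dot-product attention. It is a single-head simplification written in NumPy, not the ViLGP3D network itself: the function `select_candidate`, the random projection matrices, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def select_candidate(query, candidate_feats, w_q, w_k):
    """Score N candidate poses against one goal embedding (hypothetical helper).

    query:           (d,)   stand-in for a CLIP-derived goal embedding
    candidate_feats: (N, d) stand-in for per-candidate pose features
    w_q, w_k:        (d, d) projections (random here, learned in a real model)
    """
    q = query @ w_q                         # projected query, shape (d,)
    k = candidate_feats @ w_k               # projected keys, shape (N, d)
    logits = k @ q / np.sqrt(q.shape[0])    # scaled dot products, shape (N,)
    probs = softmax(logits)                 # attention weights over candidates
    return int(np.argmax(probs)), probs


# Random stand-in data; real features would come from CLIP and the candidate generator.
rng = np.random.default_rng(0)
d, n = 768, 16
query = rng.standard_normal(d)
candidates = rng.standard_normal((n, d))
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)

best, probs = select_candidate(query, candidates, w_q, w_k)
```

The index `best` identifies the highest-weighted candidate, and `probs` is a distribution over all candidates.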
## Files
- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
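If you want the raw checkpoint files rather than the loaded policy, the following sketch fetches them with `hf_hub_download` from the `huggingface_hub` package. The repo id is the one used elsewhere in this README. The helper `download_checkpoints` and the role names `policy` and `graspnet` are hypothetical, introduced only for this example.

```python
# Repo id as used elsewhere in this README; file names from the list above.
REPO_ID = "dgrachev/a2_pretrained"
CHECKPOINT_FILES = {
    "policy": "sl_checkpoint_199.pth",   # ViLGP3D fusion network weights
    "graspnet": "checkpoint-rs.tar",     # GraspNet candidate generator
}


def download_checkpoints(repo_id=REPO_ID, files=CHECKPOINT_FILES):
    """Download each file and return {role: local_path} (hypothetical helper)."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        role: hf_hub_download(repo_id=repo_id, filename=name)
        for role, name in files.items()
    }
```

Calling `download_checkpoints()` requires network access and returns the cached local paths of both files.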
## Usage
### With lerobot_policy_a2
```python
from lerobot_policy_a2 import A2Policy

# Load the pretrained policy from the Hugging Face Hub
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# rgb_image, depth_image, and point_cloud are observations supplied by the caller
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)
```
## Training Details
- **Architecture**: ViLGP3D with CLIP ViT-B/32 backbone
- **Hidden dim**: 768
- **Attention heads**: 8
- **Position encoding**: Rotary Position Encoding (RoPE)
- **Training data**: Tabletop manipulation demonstrations
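As an illustration of the position encoding listed above, here is a minimal NumPy sketch of rotary position encoding using the stated hidden dimension and head count. It assumes scalar (1D) positions and the common base of 10000. How ViLGP3D maps 3D positions to rotation angles is not documented here, so treat the function `rope` and its defaults as assumptions rather than the model's implementation.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply rotary position encoding to x of shape (seq, head_dim)."""
    seq, dim = x.shape
    assert dim % 2 == 0, "head_dim must be even to form rotation pairs"
    # One rotation frequency per pair of channels.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)    # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]     # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Rotate each (even, odd) channel pair by its position-dependent angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


hidden_dim, num_heads = 768, 8
head_dim = hidden_dim // num_heads  # per-head width

rng = np.random.default_rng(0)
q = rng.standard_normal((1, head_dim))
k = rng.standard_normal((1, head_dim))

# The query-key dot product depends only on the position offset (2 in both cases).
score_a = rope(q, np.array([3.0])) @ rope(k, np.array([1.0])).T
score_b = rope(q, np.array([7.0])) @ rope(k, np.array([5.0])).T
```

Because each channel pair is rotated rather than shifted, vector norms are preserved and attention scores depend on relative rather than absolute position.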
## Related Resources
- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets
## Citation
```bibtex
@misc{a2_policy,
author = {Denis Grachev},
title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/dgrachev/a2_pretrained}
}
```