---
license: apache-2.0
tags:
- robotics
- manipulation
- grasp
- lerobot
- clip
---

# A2 Pretrained Policy

Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.

## Model Description

The model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet/PlaceNet.

## Files

- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation

## Usage

### With lerobot_policy_a2

```python
from lerobot_policy_a2 import A2Policy

# Load pretrained model
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# rgb_image, depth_image, and point_cloud are placeholders for
# observations you supply from your own camera or simulator.

# Use for grasp prediction
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object"
)
```

## Training Details

- **Architecture**: ViLGP3D with CLIP ViT-B/32 backbone
- **Hidden dim**: 768
- **Attention heads**: 8
- **Position encoding**: Rotary Position Encoding (RoPE)
- **Training data**: Tabletop manipulation demonstrations

## Related Resources

- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets

## Citation

```bibtex
@misc{a2_policy,
  author = {Denis Grachev},
  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/dgrachev/a2_pretrained}
}
```
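
## Example: Candidate Selection Sketch

The Model Description says the policy uses cross-attention to pick one pose from a set of generated candidates. The toy sketch below illustrates that general idea with a single scaled dot-product attention step: a language embedding acts as the query, each candidate pose embedding acts as a key, and the candidate with the highest attention weight is selected. It is not the ViLGP3D implementation. It omits learned projections, multiple heads, and RoPE, and the names `softmax` and `select_candidate`, along with the 4-dimensional embeddings, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidate(query, candidate_keys):
    """Score each candidate key against the query with scaled dot-product
    attention and return (best_index, attention_weights)."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in candidate_keys
    ]
    weights = softmax(scores)
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return best_index, weights


# Toy 4-dimensional embeddings (hypothetical values, for illustration only).
lang_embedding = [1.0, 0.0, 1.0, 0.0]
pose_embeddings = [
    [0.0, 1.0, 0.0, 1.0],  # candidate 0: orthogonal to the query
    [1.0, 0.0, 1.0, 0.0],  # candidate 1: aligned with the query
    [0.5, 0.5, 0.5, 0.5],  # candidate 2: partially aligned
]

best, weights = select_candidate(lang_embedding, pose_embeddings)
print(best, [round(w, 3) for w in weights])
```

In this toy setup, candidate 1 receives the largest weight because its embedding is most aligned with the query. In the real model, the embeddings would presumably come from the CLIP backbone and the GraspNet/PlaceNet candidates rather than hand-written vectors.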