---
license: apache-2.0
tags:
  - robotics
  - manipulation
  - grasp
  - lerobot
  - clip
---

# A2 Pretrained Policy

A pretrained ViLGP3D policy for 6-DOF grasp-and-place tasks in tabletop manipulation.

## Model Description

The model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet and PlaceNet, respectively.
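
As a rough illustration of selecting one pose from a set of candidates with attention, the sketch below scores toy candidate features against a query vector using scaled dot-product attention and picks the highest-weighted one. It is not the ViLGP3D implementation: the function names, the 4-dimensional features, and the single-query setup are all assumptions made for illustration, and real features would come from CLIP and the candidate generators.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidate(query, candidate_keys):
    """Return (best_index, weights) for one query attending over candidates.

    query: feature vector (hypothetical stand-in for a language-conditioned embedding).
    candidate_keys: one feature vector per candidate pose (also hypothetical).
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in candidate_keys
    ]
    weights = softmax(scores)
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return best_index, weights


# Toy features only; these values are made up for the example.
query = [1.0, 0.0, 1.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [1.0, 0.0, 1.0, 0.0],  # aligned with the query
    [0.5, 0.5, 0.5, 0.5],  # partially aligned
]
best_index, weights = select_candidate(query, candidates)
```

Here the aligned candidate receives the largest attention weight, so it is the one selected.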

## Files

- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
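
If only the raw checkpoint files are needed, one option is to fetch them individually with `hf_hub_download` from the `huggingface_hub` package. The sketch below assumes the files live in the `dgrachev/a2_pretrained` repository referenced elsewhere in this card; the `fetch_checkpoints` helper, the `download` parameter, and the `"policy"`/`"graspnet"` keys are hypothetical names introduced here, not part of `lerobot_policy_a2`.

```python
# File names as listed above; the dictionary keys are hypothetical labels.
CHECKPOINT_FILES = {
    "policy": "sl_checkpoint_199.pth",
    "graspnet": "checkpoint-rs.tar",
}


def fetch_checkpoints(repo_id="dgrachev/a2_pretrained", download=None):
    """Return a {label: local_path} mapping for each checkpoint file.

    `download` is any callable accepting `repo_id` and `filename` keyword
    arguments; it defaults to `huggingface_hub.hf_hub_download`.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without the package.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {
        label: download(repo_id=repo_id, filename=filename)
        for label, filename in CHECKPOINT_FILES.items()
    }
```

Calling `fetch_checkpoints()` with no arguments downloads both files into the local Hugging Face cache and returns their paths.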

## Usage

### With `lerobot_policy_a2`

```python
from lerobot_policy_a2 import A2Policy

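# Load the pretrained policy from the Hugging Face Hub
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# rgb_image, depth_image, and point_cloud are observations supplied by the caller
# (for example, from a camera or simulator); they are not defined in this snippet.
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)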
```


## Training Details

- **Architecture**: ViLGP3D with CLIP ViT-B/32 backbone
- **Hidden dim**: 768
- **Attention heads**: 8
- **Position encoding**: Rotary Position Embedding (RoPE)
- **Training data**: Tabletop manipulation demonstrations
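
To make the numbers above concrete: a hidden dimension of 768 split across 8 heads gives 96 channels per head. The sketch below applies the standard one-dimensional RoPE formulation to a single head-sized vector and checks its defining property, that the attention score between a rotated query and key depends only on their position offset. How ViLGP3D actually derives positions (for example, from 3D coordinates) is not stated here, so the integer positions, the `base` value, and the `rope` helper are assumptions for illustration only.

```python
import math

HIDDEN_DIM = 768
NUM_HEADS = 8
HEAD_DIM = HIDDEN_DIM // NUM_HEADS  # channels per attention head


def rope(vector, position, base=10000.0):
    """Rotate consecutive channel pairs by position-dependent angles."""
    dim = len(vector)
    rotated = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vector[i], vector[i + 1]
        rotated.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return rotated


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary toy query and key vectors of one head's width.
q = [0.1 * (i % 7) for i in range(HEAD_DIM)]
k = [0.05 * (i % 5) + 0.2 for i in range(HEAD_DIM)]

# Both pairs of positions differ by 3, so the two scores should match.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
```

Because each channel pair is only rotated, the vector norm is unchanged and only relative position enters the score.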

## Related Resources

- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets

## Citation

```bibtex
@misc{a2_policy,
  author = {Denis Grachev},
  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/dgrachev/a2_pretrained}
}
```