	---
	license: apache-2.0
	tags:
	- robotics
	- manipulation
	- grasp
	- lerobot
	- clip
	---

	# A2 Pretrained Policy

	Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.

	## Model Description

	The policy uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet and PlaceNet, respectively.
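
As a rough illustration of the selection step described above, the sketch below scores candidate poses against a language embedding with one scaled dot-product attention step and picks the candidate with the highest weight. It is not the ViLGP3D implementation: the function names (`attention_weights`, `select_candidate`), the toy 3-dimensional features, and the single-query, single-head setup are assumptions made purely for illustration.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidate(lang_embedding, candidate_features):
    """Return (index, weights) for the candidate with the highest weight."""
    weights = attention_weights(lang_embedding, candidate_features)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


# Hypothetical toy data: one language embedding, three candidate-pose features.
lang_embedding = [1.0, 0.0, 0.5]
candidate_features = [
    [0.1, 0.9, 0.0],  # poorly aligned with the instruction
    [0.9, 0.1, 0.6],  # well aligned with the instruction
    [0.4, 0.4, 0.4],
]
best_index, weights = select_candidate(lang_embedding, candidate_features)
```

In this toy setup the second candidate receives the largest weight because its features have the largest dot product with the language embedding. The real model presumably uses learned projections and multiple heads rather than raw dot products.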

	## Files

	- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
	- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
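
If you need the raw checkpoint files rather than a loaded policy, one option is to fetch them with `hf_hub_download` from the `huggingface_hub` package. The sketch below assumes both files sit at the root of this repository; the `CHECKPOINT_FILES` mapping and the `download_checkpoints` helper are hypothetical names introduced for this example.

```python
REPO_ID = "dgrachev/a2_pretrained"

# File names as listed in this model card (assumed to be at the repo root).
CHECKPOINT_FILES = {
    "policy": "sl_checkpoint_199.pth",  # ViLGP3D fusion network weights
    "graspnet": "checkpoint-rs.tar",    # GraspNet candidate generator
}


def download_checkpoints(repo_id=REPO_ID, files=CHECKPOINT_FILES):
    """Download each checkpoint and return a {role: local_path} mapping."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        role: hf_hub_download(repo_id=repo_id, filename=filename)
        for role, filename in files.items()
    }
```

Calling `download_checkpoints()` requires network access and returns paths inside the local Hugging Face cache, so repeated calls should reuse the files that were already downloaded.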

	## Usage

	### With lerobot_policy_a2

	```python
	from lerobot_policy_a2 import A2Policy

	# Load pretrained model
	policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

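	# Predict a grasp for a language goal.
	# rgb_image, depth_image, and point_cloud are placeholders that the caller
	# must supply; they are not defined in this snippet.
	action, info = policy.predict_grasp(
	    color_images={"front": rgb_image},
	    depth_images={"front": depth_image},
	    point_cloud=point_cloud,
	    lang_goal="grasp a round object",
	)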
	```


	## Training Details

	- Architecture: ViLGP3D with CLIP ViT-B/32 backbone
	- Hidden dim: 768
	- Attention heads: 8
	- Position encoding: Rotary Position Encoding (RoPE)
	- Training data: Tabletop manipulation demonstrations
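
To make the numbers above concrete: a hidden dimension of 768 split across 8 heads gives 96 dimensions per head. The sketch below applies a standard rotary position encoding to vectors of that size and shows the property that motivates RoPE, namely that the dot product of two rotated vectors depends only on their relative offset. It assumes scalar integer positions and the common base of 10000; how the model actually derives positions (for example from 3D coordinates) is not specified in this card, and the `rope` and `dot` helpers are illustrative names.

```python
import math

HIDDEN_DIM = 768  # from the list above
NUM_HEADS = 8     # from the list above
HEAD_DIM = HIDDEN_DIM // NUM_HEADS  # 96 dimensions per head


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, dim, 2):
        # The pair starting at index i rotates with frequency base^(-i / dim).
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy per-head query and key vectors (values are arbitrary).
query = [math.sin(0.1 * j) for j in range(HEAD_DIM)]
key = [math.cos(0.2 * j) for j in range(HEAD_DIM)]

# Both position pairs have the same offset (2), so the scores should match.
score_a = dot(rope(query, 3), rope(key, 1))
score_b = dot(rope(query, 7), rope(key, 5))
```

Because each pair is only rotated, the encoding also leaves vector norms unchanged, which keeps attention scores on the same scale as without position encoding.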

	## Related Resources

	- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
	- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
	- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets

	## Citation

	```bibtex
	@misc{a2_policy,
	  author = {Denis Grachev},
	  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
	  year = {2025},
	  publisher = {HuggingFace},
	  url = {https://huggingface.co/dgrachev/a2_pretrained}
	}
	```