dgrachev/a2_pretrained

@@ -1,30 +1,77 @@
-# A2 Pretrained Model
-Pretrained ViLGP3D model for 6-DOF grasp pose selection in tabletop manipulation.
-## Model Architecture
-- **Network**: CLIPAction (CLIP-based action selection with cross-attention)
-- **Width**: 768
-- **Layers**: 1
-- **Heads**: 8
-- **Action Dim**: 7 (xyz + quaternion)
-- **Features**: RoPE (Rotary Position Encoding)
 ## Usage
 ```python
-from lerobot_policy_a2 import A2Policy, A2Config
 # Load pretrained model
 policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")
 ```
-## Training Data
-Trained on simulated tabletop grasping with UR5e robot and Robotiq gripper.
-## Related
-- Environment: Install with `pip install lerobot[a2]`
-- Assets: [dgrachev/a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets)

+---
+license: apache-2.0
+tags:
+  - robotics
+  - manipulation
+  - grasp
+  - lerobot
+  - clip
+---
+# A2 Pretrained Policy
+Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.
+## Model Description
+This model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet and PlaceNet.
+## Files
+- `sl_checkpoint_199.pth`: Trained policy weights (ViLGP3D fusion network)
+- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
 ## Usage
+### With lerobot_policy_a2
 ```python
+from lerobot_policy_a2 import A2Policy
 # Load pretrained model
 policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")
+# Use for grasp prediction
+action, info = policy.predict_grasp(
+    color_images={"front": rgb_image},
+    depth_images={"front": depth_image},
+    point_cloud=point_cloud,
+    lang_goal="grasp a round object"
+)
+```
+### With LeRobot A2 Environment
+```bash
+# Data collection
+A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_collect \
+    --policy a2 \
+    --hf_repo dgrachev/a2_pretrained \
+    --task grasp \
+    --num_episodes 100
+# Benchmark evaluation
+A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_benchmark \
+    --task grasp \
+    --policy a2 \
+    --hf_repo dgrachev/a2_pretrained
 ```
+## Training Details
+- **Architecture**: ViLGP3D with CLIP ViT-B/32 backbone
+- **Hidden dim**: 768
+- **Attention heads**: 8
+- **Position encoding**: Rotary Position Encoding (RoPE)
+- **Training data**: Tabletop manipulation demonstrations
+## Related Resources
+- [lerobot_policy_a2](https://github.com/dgrachev/lerobot_policy_a2) - Policy package
+- [lerobot_grach0v](https://github.com/grach0v/lerobot) - LeRobot fork with A2 environment
+- [a2_assets](https://huggingface.co/datasets/dgrachev/a2_assets) - Environment assets
+## Citation
+```bibtex
+@misc{a2_policy,
+  author = {Denis Grachev},
+  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/dgrachev/a2_pretrained}
+}
+```
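Note on the Model Description: the README says the policy selects a pose from externally generated candidates using attention, but does not show what that selection looks like. The sketch below is only an illustration of that idea, not the ViLGP3D network. It assumes a single-head scaled dot-product score in place of the 8-head cross-attention, and the names `select_candidate`, `query`, `candidate_feats`, and `candidate_poses` are hypothetical. The 768-wide features and the 7-dimensional pose (xyz + quaternion) follow the values listed in the old and new README text.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - x.max()
    e = np.exp(shifted)
    return e / e.sum()


def select_candidate(query, candidate_feats, candidate_poses):
    """Pick the candidate pose whose features best match the query.

    query:           (d,)   hypothetical fused language/vision embedding
    candidate_feats: (n, d) hypothetical per-candidate features
    candidate_poses: (n, 7) xyz + quaternion per candidate
    """
    d = query.shape[-1]
    # Scaled dot-product score between the query and every candidate.
    logits = candidate_feats @ query / np.sqrt(d)
    probs = softmax(logits)
    best = int(np.argmax(probs))
    return candidate_poses[best], probs


# Toy data: 5 candidates with 768-dim features (assumed width).
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 768))
poses = rng.normal(size=(5, 7))

# Build a query that is nearly identical to candidate 3's features.
query = feats[3] + 0.01 * rng.normal(size=768)

pose, probs = select_candidate(query, feats, poses)
```

In this toy setup the query is built from candidate 3, so that candidate receives almost all of the probability mass. A real policy would compute the query from CLIP image and text features and would take its candidates from the grasp or place generator.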