Improve model card with metadata, abstract, and usage example
This PR improves the model card for the `michaelyuanqwq/roboengine-sam` model by:
- Adding `pipeline_tag: image-segmentation` to improve discoverability on the Hugging Face Hub.
- Adding `library_name: transformers` to indicate compatibility with the 🤗 Transformers library, enabling the "Use in Transformers" badge.
- Adding relevant tags (`segmentation`, `robotics`, `computer-vision`) for better categorization.
- Including the full paper abstract for comprehensive information.
- Adding a direct link to the Hugging Face paper page (`https://huggingface.co/papers/2503.18738`) and the GitHub repository.
- Providing a basic Python code example for direct inference using the `transformers` library.
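Taken together, the metadata changes above correspond to the following YAML front matter at the top of the updated `README.md` (field values as introduced in the diff; shown here only as a consolidated sketch):

```yaml
---
license: mit
datasets:
- michaelyuanqwq/roboseg
pipeline_tag: image-segmentation
library_name: transformers
tags:
- segmentation
- robotics
- computer-vision
---
```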
README.md
CHANGED

````diff
@@ -1,21 +1,98 @@
 ---
-license: mit
 datasets:
 - michaelyuanqwq/roboseg
+license: mit
+pipeline_tag: image-segmentation
+library_name: transformers
+tags:
+- segmentation
+- robotics
+- computer-vision
 ---
-
+
+# RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
 
 **[Chengbo Yuan*](https://michaelyuancb.github.io/), [Suraj Joshi*](https://x.com/nonlinearjunkie), [Shaoting Zhu*](https://zst1406217.github.io/), [Hang Su](https://scholar.google.com/citations?user=dxN1_X0AAAAJ&hl=en), [Hang Zhao](https://hangzhaomit.github.io/), [Yang Gao](https://yang-gao.weebly.com/).**
 
-**[[Project Website](https://roboengine.github.io/)] [[
-
-
+**[[Project Website](https://roboengine.github.io/)] [[Hugging Face Paper](https://huggingface.co/papers/2503.18738)] [[arXiv](https://arxiv.org/abs/2503.18738)] [[GitHub Code](https://github.com/michaelyuancb/roboengine)] [[BibTex](#jump)]**
+
+This repository contains the Robo-SAM checkpoints from the paper "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation". RoboEngine introduces the first plug-and-play visual robot data augmentation toolkit, enabling users to effortlessly generate physics- and task-aware robot scenes with just a few lines of code. It significantly enhances the visual robustness of imitation learning by addressing limitations of existing methods.
+
+## Abstract
+
+Visual augmentation has become a crucial technique for enhancing the visual robustness of imitation learning. However, existing methods are often limited by prerequisites such as camera calibration or the need for controlled environments (e.g., green screen setups). In this work, we introduce RoboEngine, the first plug-and-play visual robot data augmentation toolkit. For the first time, users can effortlessly generate physics- and task-aware robot scenes with just a few lines of code. To achieve this, we present a novel robot scene segmentation dataset, a generalizable high-quality robot segmentation model, and a fine-tuned background generation model, which together form the core components of the out-of-the-box toolkit. Using RoboEngine, we demonstrate the ability to generalize robot manipulation tasks across six entirely new scenes, based solely on demonstrations collected from a single scene, achieving a more than 200% performance improvement compared to the no-augmentation baseline. All datasets, model weights, and the toolkit are publicly released.
+
+## Usage
+
+This model is a Robo-SAM checkpoint and can be loaded with the Hugging Face `transformers` library using `trust_remote_code=True`. It can be used for semantic robot segmentation.
 
+```python
+from transformers import AutoProcessor, AutoModel
+from PIL import Image
+import torch
+import numpy as np
+
+# Load model and processor.
+# Make sure `transformers` and `torch` are installed
+# (if you encounter errors, try `pip install torch transformers`).
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = AutoModel.from_pretrained("michaelyuanqwq/roboengine-sam", trust_remote_code=True).to(device)
+processor = AutoProcessor.from_pretrained("michaelyuanqwq/roboengine-sam", trust_remote_code=True)
+
+# Example image input: replace 'your_robot_image.png' with the actual path to your image.
+# You can find example images in the original GitHub repository:
+# https://github.com/michaelyuancb/roboengine/tree/main/assets
+try:
+    raw_image = Image.open("your_robot_image.png").convert("RGB")
+except FileNotFoundError:
+    print("Sample image 'your_robot_image.png' not found. Creating a dummy white image for demonstration.")
+    raw_image = Image.new("RGB", (512, 512), color="white")
+
+# Prepare inputs for semantic robot segmentation.
+# The model expects input points or bounding boxes; a central (x, y) point is a
+# common default prompt for the main object (the robot) in the image.
+input_points = [[[raw_image.width // 2, raw_image.height // 2]]]
+
+inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
+
+# Perform inference
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Post-process masks: `outputs.pred_masks` holds the predicted masks, and
+# `post_process_masks` resizes them to the original image dimensions.
+masks = processor.post_process_masks(
+    outputs.pred_masks.cpu(),
+    inputs["original_sizes"].cpu(),
+    inputs["reshaped_input_sizes"].cpu(),
+)[0]  # boolean tensor of shape (num_prompts, num_masks, H, W) for the first image
+
+# Keep the candidate mask with the highest predicted IoU for the single point prompt.
+best_idx = outputs.iou_scores[0, 0].argmax()
+robot_mask_array = masks[0, best_idx].numpy()
+
+# Save the mask as an image (black where not robot, white where robot).
+Image.fromarray(robot_mask_array.astype(np.uint8) * 255).save("robot_segmented_mask.png")
+print("Robot segmentation mask saved as robot_segmented_mask.png")
+```
+
+For a more comprehensive understanding and usage of RoboEngine as a full toolkit for robot data augmentation, please refer to the [official GitHub repository](https://github.com/michaelyuancb/roboengine).
+
-### BibTex
-```
+## BibTex
+```bibtex
 @article{yuan2025roboengine,
 title={RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation},
 author={Yuan, Chengbo and Joshi, Suraj and Zhu, Shaoting and Su, Hang and Zhao, Hang and Gao, Yang},
````