nielsr (HF Staff) committed
Commit 3ba18ee · verified · 1 Parent(s): 21a0cfd

Improve model card with metadata, abstract, and usage example


This PR improves the model card for the `michaelyuanqwq/roboengine-sam` model by:

- Adding `pipeline_tag: image-segmentation` to improve discoverability on the Hugging Face Hub.
- Adding `library_name: transformers` to indicate compatibility with the 🤗 Transformers library, enabling the "Use in Transformers" badge.
- Adding relevant tags (`segmentation`, `robotics`, `computer-vision`) for better categorization.
- Including the full paper abstract for comprehensive information.
- Adding a direct link to the Hugging Face paper page (`https://huggingface.co/papers/2503.18738`) and the GitHub repository.
- Providing a basic Python code example for direct inference using the `transformers` library.
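
Taken together, the metadata bullets above amount to a YAML front-matter block at the top of `README.md` along these lines (a sketch assembled from the changes this PR describes, not the verbatim file):

```yaml
# README.md front matter after this PR (sketch)
datasets:
- michaelyuanqwq/roboseg
license: mit
pipeline_tag: image-segmentation   # enables the Hub's task filter
library_name: transformers         # enables the "Use in Transformers" badge
tags:
- segmentation
- robotics
- computer-vision
```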

Files changed (1)
  1. README.md +83 -6
README.md CHANGED
@@ -1,21 +1,98 @@
  ---
- license: mit
  datasets:
  - michaelyuanqwq/roboseg
+ license: mit
+ pipeline_tag: image-segmentation
+ library_name: transformers
+ tags:
+ - segmentation
+ - robotics
+ - computer-vision
  ---
- <h1> RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation </h1>
+
+ # RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

  **[Chengbo Yuan*](https://michaelyuancb.github.io/), [Suraj Joshi*](https://x.com/nonlinearjunkie), [Shaoting Zhu*](https://zst1406217.github.io/), [Hang Su](https://scholar.google.com/citations?user=dxN1_X0AAAAJ&hl=en), [Hang Zhao](https://hangzhaomit.github.io/), [Yang Gao](https://yang-gao.weebly.com/).**

- **[[Project Website](https://roboengine.github.io/)] [[Arxiv](https://arxiv.org/abs/2503.18738)] [[BibTex](#jump)]**
-
- The Robo-SAM checkpoints of "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation".
-
- Please checkout https://github.com/michaelyuancb/roboengine for more details.
-
- ### BibTex
+ **[[Project Website](https://roboengine.github.io/)] [[Hugging Face Paper](https://huggingface.co/papers/2503.18738)] [[arXiv](https://arxiv.org/abs/2503.18738)] [[GitHub Code](https://github.com/michaelyuancb/roboengine)] [[BibTex](#jump)]**
+
+ This repository contains the Robo-SAM checkpoints from the paper "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation". RoboEngine introduces the first plug-and-play visual robot data augmentation toolkit, enabling users to effortlessly generate physics- and task-aware robot scenes with just a few lines of code. It significantly improves the visual robustness of imitation learning by addressing the limitations of existing methods.
+
+ ## Abstract
+ Visual augmentation has become a crucial technique for enhancing the visual robustness of imitation learning. However, existing methods are often limited by prerequisites such as camera calibration or the need for controlled environments (e.g., green screen setups). In this work, we introduce RoboEngine, the first plug-and-play visual robot data augmentation toolkit. For the first time, users can effortlessly generate physics- and task-aware robot scenes with just a few lines of code. To achieve this, we present a novel robot scene segmentation dataset, a generalizable high-quality robot segmentation model, and a fine-tuned background generation model, which together form the core components of the out-of-the-box toolkit. Using RoboEngine, we demonstrate the ability to generalize robot manipulation tasks across six entirely new scenes, based solely on demonstrations collected from a single scene, achieving a more than 200% performance improvement compared to the no-augmentation baseline. All datasets, model weights, and the toolkit are released at this https URL.
+
+ ## Usage
+
+ This model is a Robo-SAM checkpoint and can be loaded with the Hugging Face `transformers` library using `trust_remote_code=True`. It can be used for semantic robot segmentation.
+
+ ```python
+ from transformers import AutoProcessor, AutoModel
+ from PIL import Image
+ import torch
+ import numpy as np
+
+ # Load the model and processor (requires `torch` and `transformers` to be installed)
+ model = AutoModel.from_pretrained("michaelyuanqwq/roboengine-sam", trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained("michaelyuanqwq/roboengine-sam", trust_remote_code=True)
+
+ # Move the model to GPU if one is available
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device)
+
+ # Example image input: replace 'your_robot_image.png' with the actual path to your image.
+ # Example images are available in the original GitHub repository:
+ # https://github.com/michaelyuancb/roboengine/tree/main/assets
+ try:
+     raw_image = Image.open("your_robot_image.png").convert("RGB")
+ except FileNotFoundError:
+     print("Sample image 'your_robot_image.png' not found. Creating a dummy white image for demonstration.")
+     raw_image = Image.new("RGB", (512, 512), color="white")
+
+ # SAM-style models are prompted with input points in (x, y) order.
+ # A central point is a common default to prompt for the main object (the robot).
+ input_points = [[[raw_image.width / 2, raw_image.height / 2]]]
+
+ inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Post-process masks: `outputs.pred_masks` holds the predicted masks, and
+ # `post_process_masks` rescales them to the original image dimensions.
+ masks = processor.post_process_masks(
+     outputs.pred_masks.cpu(),
+     inputs["original_sizes"].cpu(),
+     inputs["reshaped_input_sizes"].cpu(),
+ )[0]  # boolean masks for the first image, shape (num_points, num_masks, H, W)
+
+ # Take the first mask for the first prompt point as the robot segmentation
+ # and save it as a black-and-white image (white where the robot is).
+ robot_mask_array = masks[0][0].numpy()
+ Image.fromarray(robot_mask_array.astype(np.uint8) * 255).save("robot_segmented_mask.png")
+ print("Robot segmentation mask saved as robot_segmented_mask.png")
+
  ```
+ For a more comprehensive understanding and usage of RoboEngine as a full toolkit for robot data augmentation, please refer to the [official GitHub repository](https://github.com/michaelyuancb/roboengine).
+
+ ## BibTex
+ ```bibtex
  @article{yuan2025roboengine,
  title={RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation},
  author={Yuan, Chengbo and Joshi, Suraj and Zhu, Shaoting and Su, Hang and Zhao, Hang and Gao, Yang},