---
license: other
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
language:
  - en
pipeline_tag: mask-generation
---

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor [SAM 2](https://github.com/facebookresearch/sam2), SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase or by exemplars. Unlike prior work, SAM 3 can handle a vastly larger set of open-vocabulary prompts. It achieves 75-80% of human performance on our new [SA-Co benchmark](https://github.com/facebookresearch/sam3#sa-co-dataset), which contains 270K unique concepts, over 50 times more than existing benchmarks.

### Basic Usage

```python
import torch
#################################### For Image ####################################
from PIL import Image
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor

# Load the model
model = build_sam3_image_model()
processor = Sam3Processor(model)

# Load an image
image = Image.open("<YOUR_IMAGE_PATH.jpg>")
inference_state = processor.set_image(image)

# Prompt the model with text
output = processor.set_text_prompt(state=inference_state, prompt="<YOUR_TEXT_PROMPT>")

# Get the masks, bounding boxes, and scores
masks, boxes, scores = output["masks"], output["boxes"], output["scores"]

#################################### For Video ####################################

from sam3.model_builder import build_sam3_video_predictor

video_predictor = build_sam3_video_predictor()
video_path = "<YOUR_VIDEO_PATH>"  # a JPEG folder or an MP4 video file

# Start a session
response = video_predictor.handle_request(
    request=dict(
        type="start_session",
        resource_path=video_path,
    )
)

# Prompt the model with text on a chosen frame
response = video_predictor.handle_request(
    request=dict(
        type="add_prompt",
        session_id=response["session_id"],
        frame_index=0,  # Arbitrary frame index
        text="<YOUR_TEXT_PROMPT>",
    )
)
output = response["outputs"]
```
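The image output above bundles per-instance masks, boxes, and scores, so a common next step is to keep only high-confidence detections. Below is a minimal sketch of such score-based filtering using NumPy, with dummy arrays standing in for real model outputs; the exact array shapes and score scale are assumptions for illustration, not taken from the official SAM 3 API.

```python
import numpy as np

def filter_by_score(masks, boxes, scores, threshold=0.5):
    """Keep only the instances whose confidence exceeds `threshold`."""
    keep = scores > threshold  # boolean mask over instances
    return masks[keep], boxes[keep], scores[keep]

# Dummy stand-ins for model outputs: 3 instances on a 4x4 image
# (assumed shapes: masks [N, H, W], boxes [N, 4] in xyxy, scores [N]).
masks = np.zeros((3, 4, 4), dtype=bool)
boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [0, 2, 2, 4]], dtype=float)
scores = np.array([0.9, 0.3, 0.7])

masks, boxes, scores = filter_by_score(masks, boxes, scores, threshold=0.5)
print(scores)  # only the instances scoring above 0.5 remain
```

The same boolean-indexing pattern works for torch tensors, since `tensor[keep]` with a boolean mask selects along the first dimension just as in NumPy.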

The official code is publicly released in the [sam3 repo](https://github.com/facebookresearch/sam3).