---
license: other
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
language:
  - en
pipeline_tag: mask-generation
---

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor [SAM 2](https://github.com/facebookresearch/sam2), SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase or by exemplars. Unlike prior work, SAM 3 can handle a vastly larger set of open-vocabulary prompts. It achieves 75-80% of human performance on our new [SA-Co benchmark](https://github.com/facebookresearch/sam3#sa-co-dataset), which contains 270K unique concepts, over 50 times more than existing benchmarks.

### Basic Usage

```python
import torch
#################################### For Image ####################################
from PIL import Image
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor

# Load the model
model = build_sam3_image_model()
processor = Sam3Processor(model)

# Load an image
image = Image.open("<YOUR_IMAGE_PATH.jpg>")
inference_state = processor.set_image(image)

# Prompt the model with text
output = processor.set_text_prompt(state=inference_state, prompt="<YOUR_TEXT_PROMPT>")

# Get the masks, bounding boxes, and scores
masks, boxes, scores = output["masks"], output["boxes"], output["scores"]

#################################### For Video ####################################

from sam3.model_builder import build_sam3_video_predictor

video_predictor = build_sam3_video_predictor()
video_path = "<YOUR_VIDEO_PATH>"  # a JPEG folder or an MP4 video file

# Start a session
response = video_predictor.handle_request(
    request=dict(
        type="start_session",
        resource_path=video_path,
    )
)

# Prompt the model with text on a chosen frame
response = video_predictor.handle_request(
    request=dict(
        type="add_prompt",
        session_id=response["session_id"],
        frame_index=0,  # Arbitrary frame index
        text="<YOUR_TEXT_PROMPT>",
    )
)
output = response["outputs"]
```
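The image output above bundles per-instance masks, boxes, and scores, so a common next step is to keep only high-confidence detections. Below is a minimal sketch of such score-based filtering using NumPy, with dummy arrays standing in for real model outputs; the exact array shapes and score scale are assumptions for illustration, not taken from the official SAM 3 API.

```python
import numpy as np

def filter_by_score(masks, boxes, scores, threshold=0.5):
    """Keep only the instances whose confidence exceeds `threshold`."""
    keep = scores > threshold  # boolean mask over instances
    return masks[keep], boxes[keep], scores[keep]

# Dummy stand-ins for model outputs: 3 instances on a 4x4 image
# (assumed shapes: masks [N, H, W], boxes [N, 4] in xyxy, scores [N]).
masks = np.zeros((3, 4, 4), dtype=bool)
boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [0, 2, 2, 4]], dtype=float)
scores = np.array([0.9, 0.3, 0.7])

masks, boxes, scores = filter_by_score(masks, boxes, scores, threshold=0.5)
print(scores)  # only the instances scoring above 0.5 remain
```

The same boolean-indexing pattern works for torch tensors, since `tensor[keep]` with a boolean mask selects along the first dimension just as in NumPy.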

The official code is publicly released in the [sam3 repo](https://github.com/facebookresearch/sam3).