Instructions to use facebook/sam3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/sam3 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("mask-generation", model="facebook/sam3")# Load model directly from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("facebook/sam3") model = AutoModel.from_pretrained("facebook/sam3") - Notebooks
- Google Colab
- Kaggle
Changed Checkpoints?
SAM3 text-prompted segmentation broken after recent checkpoint update
Last week I was testing SAM3's feasibility for a object detection task on drone imagery. Using the Transformers integration with a text prompt of "cars", the model was returning detections (see image 1). This week, running the exact same code on the exact same images returns 0 detections (see image 2). No changes were made to my code or environment.
After investigating, I noticed the model load report shows a key naming mismatch in the text encoder:
The checkpoint stores text encoder weights under detector_model.text_encoder.*
The Sam3Model architecture expects them at text_encoder.text_model.*
This means the text encoder is loading with randomly initialized weights instead of the pretrained ones, which explains why the model is no longer responding to text prompts at all. I suspect the checkpoint was updated recently and the key names changed.
To reproduce:
from transformers import Sam3Processor, Sam3Model
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Sam3Model.from_pretrained("facebook/sam3").to(device)
processor = Sam3Processor.from_pretrained("facebook/sam3")
from PIL import Image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(images=image, text="cars", return_tensors="pt").to(device)
with torch.no_grad():
outputs = model(**inputs)
results = processor.post_process_instance_segmentation(
outputs, target_sizes=[image.size[::-1]]
)[0]
print(f"Found {len(results['masks'])} instance(s)")
The load report on model initialization shows all 24 text encoder layers as UNEXPECTED (from checkpoint) and MISSING (in model), confirming the text encoder is not loading correctly.
Note: the image 1 result I attached from last week also now appears to have been wrong in hindsight β it was detecting construction materials as "cars", suggesting the text encoder may have already been partially broken then, just failing differently. The current behavior (0 detections on images with clearly visible cars) is a complete failure.
Has the checkpoint been updated recently? Is there a pinnable revision that is known to work?



