SAM3 Robot Segmentation: Training Code
This repository contains the training and inference code used to fine-tune SAM3 (Segment Anything Model 3 from Meta) for robot arm and gripper segmentation.
Trained Models
| Model | HuggingFace | Description |
|---|---|---|
| sam-cosmos-gripper | 🤗 sazirarrwth99/sam-cosmos-gripper | Gripper-only fine-tune |
| sam3-robot-gripper-combined | 🤗 sazirarrwth99/sam3-robot-gripper-combined | Robot arm + gripper |
Repository Contents
train_sam3_frozen.py Main training script (frozen vision backbone)
infer_on_datasets.py Batch inference across 4 datasets
run_inference.py Single-image / directory inference
upload_to_hf.py Upload checkpoints + model cards to HF Hub
upload_samples_and_code_to_hf.py Upload samples + this code repo
prepare_gripper_dataset.py Prepare DROID gripper annotations
prepare_combined_dataset.py Merge RoboSeg + gripper into one COCO dataset
convert_roboseg_to_coco.py Convert RoboSeg to COCO format
train_gripper_docker.sh Docker launcher: gripper model
train_combined_docker.sh Docker launcher: combined model
run_dataset_inference_docker.sh Docker launcher: run inference on 4 datasets
run_hf_upload_docker.sh Docker launcher: HF upload
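To illustrate what merging two COCO datasets involves, here is a minimal sketch. It is not the code in prepare_combined_dataset.py: the `merge_coco` helper, the file names, and the category names are hypothetical, and only the standard COCO fields (`images`, `annotations`, `categories`) are assumed. The key point is that image, annotation, and category ids must be reassigned so they stay unique after the merge.

```python
def merge_coco(datasets):
    """Merge COCO-style dicts, reassigning ids so they stay unique."""
    merged = {"images": [], "annotations": [], "categories": []}
    cat_id_by_name = {}
    next_image_id = 1
    next_ann_id = 1
    for ds in datasets:
        # Map this dataset's category ids onto shared ids keyed by name.
        cat_map = {}
        for cat in ds["categories"]:
            name = cat["name"]
            if name not in cat_id_by_name:
                cat_id_by_name[name] = len(cat_id_by_name) + 1
                merged["categories"].append(
                    {"id": cat_id_by_name[name], "name": name}
                )
            cat_map[cat["id"]] = cat_id_by_name[name]
        # Give every image a fresh id and remember the old -> new mapping.
        image_map = {}
        for img in ds["images"]:
            image_map[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1
        # Rewrite each annotation to point at the new image and category ids.
        for ann in ds["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": next_ann_id,
                "image_id": image_map[ann["image_id"]],
                "category_id": cat_map[ann["category_id"]],
            })
            next_ann_id += 1
    return merged


# Hypothetical single-image datasets; bbox is [x, y, width, height].
roboseg = {
    "images": [{"id": 1, "file_name": "roboseg_0001.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0,
        "segmentation": [[10, 20, 110, 20, 110, 70, 10, 70]],
    }],
    "categories": [{"id": 1, "name": "robot arm"}],
}
gripper = {
    "images": [{"id": 1, "file_name": "droid_0001.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [200, 150, 40, 30], "area": 1200, "iscrowd": 0,
        "segmentation": [[200, 150, 240, 150, 240, 180, 200, 180]],
    }],
    "categories": [{"id": 1, "name": "gripper"}],
}

merged = merge_coco([roboseg, gripper])
```

Both inputs use id 1 for their image, annotation, and category, so a naive concatenation would produce collisions; the sketch avoids that by renumbering and by unifying categories on their names.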
Quick Start: Inference
# Via Docker (recommended)
docker run --gpus all --rm \
-v /path/to/images:/images \
-v /path/to/model.pt:/model/model.pt \
-v /out:/outputs \
cosmos-predict2.5-roboseg:latest \
python development/Training/robot_segmentation/run_inference.py \
--model_path /model/model.pt \
--input_dir /images \
--output_dir /outputs \
--prompt gripper
Quick Start: Training
# 1. Prepare dataset
python prepare_gripper_dataset.py
# 2. Launch training (frozen vision backbone, transfer from robot checkpoint)
bash train_gripper_docker.sh
# 3. Monitor on WandB
# Project: sam3-roboseg | Run: sam_cosmos_gripper
Training Details
Both models are trained with the train_sam3_frozen.py script, which:
- Freezes the SAM3 vision encoder (ViT-H based)
- Trains only the prompt encoder, mask decoder, and text cross-attention layers
- Uses a cosine LR schedule with a 5-epoch linear warm-up
- Saves the best checkpoint by validation loss
- Logs to WandB automatically when WANDB_API_KEY is set
- Uploads the final checkpoint to HuggingFace when HF_TOKEN is set
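Two of the points above, the frozen backbone and the warm-up plus cosine schedule, can be sketched in plain Python. This is an illustration under stated assumptions, not the code in train_sam3_frozen.py: the `vision_encoder.` prefix, the function names, and the `min_lr` parameter are hypothetical, and the real SAM3 module names and schedule details may differ.

```python
import math

# Hypothetical parameter-name prefix; the real SAM3 module names may differ.
FROZEN_PREFIXES = ("vision_encoder.",)


def is_trainable(param_name: str) -> bool:
    """Return False for parameters under a frozen prefix, True otherwise."""
    return not param_name.startswith(FROZEN_PREFIXES)


def lr_at_epoch(epoch: int, total_epochs: int, base_lr: float,
                warmup_epochs: int = 5, min_lr: float = 0.0) -> float:
    """Linear warm-up for `warmup_epochs`, then cosine decay towards `min_lr`.

    `epoch` is 0-indexed.
    """
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase completed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example: learning rate for each epoch of a hypothetical 50-epoch run.
schedule = [lr_at_epoch(e, total_epochs=50, base_lr=1e-4) for e in range(50)]
```

In a PyTorch training loop, a rule like `is_trainable` would typically be applied by setting `param.requires_grad = is_trainable(name)` for each `(name, param)` pair from `model.named_parameters()`, so that the optimizer only updates the unfrozen modules.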