SAM3 Robot Segmentation — Training Code

This repository contains the training and inference code used to fine-tune SAM3 (Segment Anything Model 3 from Meta) for robot arm and gripper segmentation.

Trained Models

| Model | Hugging Face repo | Description |
|---|---|---|
| `sam-cosmos-gripper` | 🤗 sazirarrwth99/sam-cosmos-gripper | Gripper-only fine-tune |
| `sam3-robot-gripper-combined` | 🤗 sazirarrwth99/sam3-robot-gripper-combined | Robot arm + gripper |
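Both checkpoints are hosted on the Hugging Face Hub. The following is a minimal download sketch. It assumes the `huggingface_hub` package is installed and, for gated repos, that you have accepted the access conditions and exported an `HF_TOKEN`. The checkpoint file name inside each repo is not given here, so the sketch fetches the whole snapshot. `MODEL_REPOS`, its short keys, and `download_model` are hypothetical names.

```python
import os
from typing import Optional

# Hub repo IDs copied from the model table; the short keys are hypothetical labels.
MODEL_REPOS = {
    "gripper": "sazirarrwth99/sam-cosmos-gripper",
    "combined": "sazirarrwth99/sam3-robot-gripper-combined",
}


def download_model(variant: str, local_dir: Optional[str] = None) -> str:
    """Fetch a full snapshot of one model repo and return its local directory."""
    repo_id = MODEL_REPOS[variant]  # raises KeyError for an unknown variant

    # Imported lazily so the mapping above is usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        # Gated repos need an access token; None falls back to a cached login.
        token=os.environ.get("HF_TOKEN"),
    )
```

For example, `download_model("combined")` returns the local directory of the `sam3-robot-gripper-combined` snapshot.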

Repository Contents

```
train_sam3_frozen.py               Main training script (frozen vision backbone)
infer_on_datasets.py               Batch inference across 4 datasets
run_inference.py                   Single-image / directory inference
upload_to_hf.py                    Upload checkpoints + model cards to HF Hub
upload_samples_and_code_to_hf.py   Upload samples + this code repo

prepare_gripper_dataset.py         Prepare DROID gripper annotations
prepare_combined_dataset.py        Merge RoboSeg + gripper into one COCO dataset
convert_roboseg_to_coco.py         Convert RoboSeg → COCO format

train_gripper_docker.sh            Docker launcher – gripper model
train_combined_docker.sh           Docker launcher – combined model
run_dataset_inference_docker.sh    Docker launcher – run inference on 4 datasets
run_hf_upload_docker.sh            Docker launcher – HF upload
```
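`prepare_combined_dataset.py` merges RoboSeg and the gripper data into a single COCO dataset. Its exact logic is not shown here. The sketch below illustrates one common way to do such a merge, assuming both inputs are COCO-style dicts with `images`, `annotations`, and `categories`. It re-numbers image and annotation IDs so the sources cannot collide, and it unifies categories by name. `merge_coco` and the category names in the usage lines are hypothetical.

```python
import copy
from typing import Any, Dict, List

Coco = Dict[str, Any]


def merge_coco(datasets: List[Coco]) -> Coco:
    """Merge COCO-style dicts, re-numbering IDs and unifying categories by name."""
    merged: Coco = {"images": [], "annotations": [], "categories": []}
    cat_id_by_name: Dict[str, int] = {}
    next_image_id, next_ann_id = 1, 1

    for ds in datasets:
        # Map this dataset's category IDs onto the shared, name-keyed category list.
        cat_remap: Dict[int, int] = {}
        for cat in ds["categories"]:
            name = cat["name"]
            if name not in cat_id_by_name:
                cat_id_by_name[name] = len(cat_id_by_name) + 1
                merged["categories"].append({"id": cat_id_by_name[name], "name": name})
            cat_remap[cat["id"]] = cat_id_by_name[name]

        # Give every image a fresh global ID so IDs never collide across sources.
        img_remap: Dict[int, int] = {}
        for img in ds["images"]:
            new_img = copy.deepcopy(img)
            img_remap[img["id"]] = new_img["id"] = next_image_id
            merged["images"].append(new_img)
            next_image_id += 1

        # Point each annotation at the re-numbered image and category.
        for ann in ds["annotations"]:
            new_ann = copy.deepcopy(ann)
            new_ann["id"] = next_ann_id
            new_ann["image_id"] = img_remap[ann["image_id"]]
            new_ann["category_id"] = cat_remap[ann["category_id"]]
            merged["annotations"].append(new_ann)
            next_ann_id += 1

    return merged


# Usage with two tiny, made-up datasets whose IDs overlap.
robo = {"images": [{"id": 1, "file_name": "a.png"}],
        "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
        "categories": [{"id": 1, "name": "robot"}]}
grip = {"images": [{"id": 1, "file_name": "b.png"}],
        "annotations": [{"id": 1, "image_id": 1, "category_id": 7}],
        "categories": [{"id": 7, "name": "gripper"}]}
merged = merge_coco([robo, grip])
print([c["name"] for c in merged["categories"]])  # ['robot', 'gripper']
```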

Quick Start — Inference

```bash
# Via Docker (recommended)
docker run --gpus all --rm \
  -v /path/to/images:/images \
  -v /path/to/model.pt:/model/model.pt \
  -v /out:/outputs \
  cosmos-predict2.5-roboseg:latest \
  python development/Training/robot_segmentation/run_inference.py \
    --model_path /model/model.pt \
    --input_dir  /images \
    --output_dir /outputs \
    --prompt     gripper
```
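If you launch inference from Python, for example inside a batch job, the same `docker run` invocation can be assembled as an argument list. The image tag, script path, and flags are copied from the command above. `build_inference_cmd` and `run_inference` are hypothetical helper names, and prompt strings other than `gripper` are an assumption.

```python
import os
import subprocess
from typing import List

# Copied from the Docker command in Quick Start; adjust if your image tag differs.
IMAGE = "cosmos-predict2.5-roboseg:latest"
SCRIPT = "development/Training/robot_segmentation/run_inference.py"


def build_inference_cmd(images_dir: str, model_file: str, out_dir: str,
                        prompt: str = "gripper") -> List[str]:
    """Assemble the argv for the Docker inference command (nothing is executed)."""
    # Bind-mount sources should be absolute host paths.
    images_dir, model_file, out_dir = map(os.path.abspath, (images_dir, model_file, out_dir))
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "-v", f"{images_dir}:/images",
        "-v", f"{model_file}:/model/model.pt",
        "-v", f"{out_dir}:/outputs",
        IMAGE,
        "python", SCRIPT,
        "--model_path", "/model/model.pt",
        "--input_dir", "/images",
        "--output_dir", "/outputs",
        "--prompt", prompt,
    ]


def run_inference(images_dir: str, model_file: str, out_dir: str,
                  prompt: str = "gripper") -> None:
    """Run the container; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_inference_cmd(images_dir, model_file, out_dir, prompt), check=True)
```

Passing a list (rather than a shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces.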

Quick Start — Training

```bash
# 1. Prepare the dataset
python prepare_gripper_dataset.py

# 2. Launch training (frozen vision backbone, weights transferred from the robot checkpoint)
bash train_gripper_docker.sh

# 3. Monitor on WandB
#    Project: sam3-roboseg  |  Run: sam_cosmos_gripper
```
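The preparation scripts produce COCO annotations. As a hedged illustration of that step, the following sketch converts one object mask into a COCO annotation with uncompressed run-length encoding (RLE). It assumes each mask is available as a 2-D NumPy array. `mask_to_coco_annotation` is a hypothetical name, and the real scripts may store polygons or compressed RLE (for example via `pycocotools.mask.encode`) instead.

```python
from typing import Any, Dict

import numpy as np


def mask_to_coco_annotation(mask: np.ndarray, ann_id: int, image_id: int,
                            category_id: int) -> Dict[str, Any]:
    """Turn one binary HxW mask into a COCO annotation with uncompressed RLE."""
    binary = (np.asarray(mask) > 0).astype(np.uint8)
    h, w = binary.shape

    # COCO RLE scans pixels column by column (Fortran order).
    flat = binary.flatten(order="F")
    # Positions where a pixel differs from its predecessor; the implicit leading 0
    # makes the first run count background pixels, as the format requires.
    changes = np.flatnonzero(np.diff(np.concatenate(([0], flat))))
    counts = np.diff(np.concatenate(([0], changes, [flat.size]))).tolist()

    ys, xs = np.nonzero(binary)
    if xs.size:
        x0, y0 = int(xs.min()), int(ys.min())
        bbox = [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1]
    else:
        bbox = [0, 0, 0, 0]  # empty mask

    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": {"size": [h, w], "counts": counts},
        "area": int(binary.sum()),
        "bbox": bbox,  # COCO convention: [x, y, width, height]
        "iscrowd": 0,  # assumption: each mask is a single, non-crowd instance
    }
```

Because runs alternate background and foreground starting with background, a mask whose top-left pixel is foreground yields counts that begin with `0`.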

Training Details

Both models are trained with `train_sam3_frozen.py`, which:

- Freezes the SAM3 vision encoder (ViT-H based)
- Trains only the prompt encoder, mask decoder, and text cross-attention layers
- Uses a cosine learning-rate schedule with a 5-epoch linear warm-up
- Saves the checkpoint with the lowest validation loss
- Logs to Weights & Biases (WandB) automatically when `WANDB_API_KEY` is set
- Uploads the final checkpoint to the Hugging Face Hub when `HF_TOKEN` is set
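The behaviour listed above maps onto a few small pieces of training-loop logic. The following framework-agnostic sketch shows one way to express them. The parameter-name prefixes, `min_lr`, and all function and class names are hypothetical. The source also does not say whether `train_sam3_frozen.py` steps its schedule per epoch or per iteration.

```python
import math
import os
from typing import Dict

# Hypothetical parameter-name prefixes; SAM3's real module names may differ.
TRAINABLE_PREFIXES = ("prompt_encoder.", "mask_decoder.", "text_cross_attn.")


def is_trainable(param_name: str) -> bool:
    """Only prompt-encoder, mask-decoder and text cross-attention weights train."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def lr_at_epoch(epoch: int, base_lr: float, total_epochs: int,
                warmup_epochs: int = 5, min_lr: float = 0.0) -> float:
    """Linear warm-up over `warmup_epochs`, then cosine decay from base_lr to min_lr."""
    if epoch < warmup_epochs:
        # Ramps base_lr/warmup_epochs, 2*base_lr/warmup_epochs, ..., base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    decay_epochs = max(1, total_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class BestCheckpoint:
    """Tracks the lowest validation loss; update() returns True when to save."""

    def __init__(self) -> None:
        self.best = math.inf

    def update(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            return True
        return False


def enabled_integrations() -> Dict[str, bool]:
    """WandB logging and Hub upload switch on only when their credentials are set."""
    return {
        "wandb": bool(os.environ.get("WANDB_API_KEY")),
        "hf_upload": bool(os.environ.get("HF_TOKEN")),
    }
```

In PyTorch, the freeze would typically be applied with `for name, p in model.named_parameters(): p.requires_grad_(is_trainable(name))`, and the optimizer built only from parameters that still have `requires_grad` set.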
