metadata
language:
  - en
license: apache-2.0
tags:
  - segmentation
  - robotics
  - sam3
  - training-code
  - robot-arm
  - gripper

SAM3 Robot Segmentation: Training Code

This repository contains the training and inference code used to fine-tune SAM3 (Segment Anything Model 3 from Meta) for robot arm and gripper segmentation.

Trained Models

Model                         HuggingFace                                      Description
sam-cosmos-gripper            🤗 sazirarrwth99/sam-cosmos-gripper               Gripper-only fine-tune
sam3-robot-gripper-combined   🤗 sazirarrwth99/sam3-robot-gripper-combined      Robot arm + gripper

Repository Contents

train_sam3_frozen.py          Main training script (frozen vision backbone)
infer_on_datasets.py          Batch inference across 4 datasets
run_inference.py              Single-image / directory inference
upload_to_hf.py               Upload checkpoints + model cards to HF Hub
upload_samples_and_code_to_hf.py  Upload samples + this code repo

prepare_gripper_dataset.py    Prepare DROID gripper annotations
prepare_combined_dataset.py   Merge RoboSeg + gripper into one COCO dataset
convert_roboseg_to_coco.py    Convert RoboSeg → COCO format

train_gripper_docker.sh       Docker launcher – gripper model
train_combined_docker.sh      Docker launcher – combined model
run_dataset_inference_docker.sh  Docker launcher – run inference on 4 datasets
run_hf_upload_docker.sh       Docker launcher – HF upload
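
The dataset preparation scripts above target COCO format. As a rough illustration of what merging two COCO-style annotation sets involves, here is a minimal sketch. It is not the code in prepare_combined_dataset.py: the function name merge_coco, the category names, and the file names are all hypothetical, and only the standard images / annotations / categories keys are assumed.

```python
from copy import deepcopy


def merge_coco(datasets):
    """Merge COCO-style dicts into one (hypothetical helper, for illustration).

    Image and annotation ids are renumbered so they stay unique, and
    categories are unified by name.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    cat_id_by_name = {}
    next_image_id = 1
    next_ann_id = 1

    for ds in datasets:
        # Map this dataset's category ids onto shared ids, matched by name.
        cat_map = {}
        for cat in ds.get("categories", []):
            name = cat["name"]
            if name not in cat_id_by_name:
                cat_id_by_name[name] = len(cat_id_by_name) + 1
                merged["categories"].append(
                    {"id": cat_id_by_name[name], "name": name}
                )
            cat_map[cat["id"]] = cat_id_by_name[name]

        # Renumber images and remember old id -> new id for the annotations.
        image_map = {}
        for img in ds.get("images", []):
            new_img = deepcopy(img)
            image_map[img["id"]] = next_image_id
            new_img["id"] = next_image_id
            next_image_id += 1
            merged["images"].append(new_img)

        # Renumber annotations and point them at the new image/category ids.
        for ann in ds.get("annotations", []):
            new_ann = deepcopy(ann)
            new_ann["id"] = next_ann_id
            new_ann["image_id"] = image_map[ann["image_id"]]
            new_ann["category_id"] = cat_map[ann["category_id"]]
            next_ann_id += 1
            merged["annotations"].append(new_ann)

    return merged


# Toy inputs; real annotations would also carry segmentation, bbox, area, etc.
roboseg = {
    "categories": [{"id": 1, "name": "robot arm"}],
    "images": [{"id": 1, "file_name": "roboseg_0001.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
gripper = {
    "categories": [{"id": 1, "name": "gripper"}],
    "images": [{"id": 1, "file_name": "droid_0001.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
combined = merge_coco([roboseg, gripper])
```

Both toy inputs use id 1 for their image, annotation, and category, which is the collision the renumbering is meant to avoid.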

Quick Start: Inference

# Via Docker (recommended)
docker run --gpus all --rm \
  -v /path/to/images:/images \
  -v /path/to/model.pt:/model/model.pt \
  -v /out:/outputs \
  cosmos-predict2.5-roboseg:latest \
  python development/Training/robot_segmentation/run_inference.py \
    --model_path /model/model.pt \
    --input_dir  /images \
    --output_dir /outputs \
    --prompt     gripper

Quick Start: Training

# 1. Prepare dataset
python prepare_gripper_dataset.py

# 2. Launch training (frozen vision backbone, transfer from robot checkpoint)
bash train_gripper_docker.sh

# 3. Monitor on WandB
#    Project: sam3-roboseg  |  Run: sam_cosmos_gripper

Training Details

Both models are trained with the train_sam3_frozen.py script, which:

  • Freezes the SAM3 vision encoder (ViT-H based)
  • Trains only the prompt encoder, mask decoder, and text cross-attention layers
  • Uses a cosine LR schedule with a 5-epoch linear warm-up
  • Saves the best checkpoint by validation loss
  • Logs to WandB automatically when WANDB_API_KEY is set
  • Uploads the final checkpoint to HuggingFace when HF_TOKEN is set
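
For reference, the learning-rate schedule described above (linear warm-up followed by cosine decay) can be sketched in plain Python. This is an illustration, not the code in train_sam3_frozen.py: the function name lr_at_epoch, the base learning rate, the total epoch count, and the per-epoch granularity are assumptions. Only the 5-epoch warm-up comes from the list above.

```python
import math


def lr_at_epoch(epoch, base_lr, total_epochs, warmup_epochs=5, min_lr=0.0):
    """Learning rate for a 0-indexed epoch (hypothetical helper).

    Linear warm-up over the first `warmup_epochs`, then cosine decay from
    `base_lr` towards `min_lr` over the remaining epochs.
    """
    if epoch < warmup_epochs:
        # Ramp linearly so the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase already completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Assumed values for illustration only: base LR 1e-4, 50 epochs.
schedule = [lr_at_epoch(e, base_lr=1e-4, total_epochs=50) for e in range(50)]
```

With these assumed values the rate rises from 2e-5 in the first epoch to 1e-4 at the end of warm-up, then decays smoothly towards zero.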