---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---

# autolabel: OWLv2 + SAM2 labeling pipeline

Auto-label images using OWLv2 (open-vocabulary object detection) and optionally SAM2 (instance segmentation), then export a COCO dataset ready for model fine-tuning.


## Quickstart

```bash
# 1. Install
uv sync

# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env

# 3. Launch
make app
```

Models download automatically on first use and are cached in `~/.cache/huggingface`. Nothing else is written to the project directory.

| Model | Size | Purpose |
|---|---|---|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |

## How the app works

### Mode selector

Both tabs have a **Detection / Segmentation** radio button:

| Mode | What runs | COCO output |
|---|---|---|
| Detection | OWLv2 only | bbox + empty `segmentation: []` |
| Segmentation | OWLv2 → SAM2 | bbox + segmentation polygon list |
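The difference shows up directly in each COCO annotation object. A minimal sketch of what the two modes emit per detection (ids and coordinates here are illustrative, not real output):

```python
# Illustrative COCO annotation objects for the two modes.
# All numeric values are made up for the example.
detection_ann = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "bbox": [40.0, 60.0, 120.0, 80.0],  # COCO convention: [x, y, width, height]
    "area": 120.0 * 80.0,
    "segmentation": [],                 # Detection mode: no masks
    "iscrowd": 0,
}

segmentation_ann = {
    **detection_ann,
    # Segmentation mode: a list of polygons, each a flat [x1, y1, x2, y2, ...] list
    "segmentation": [[40.0, 60.0, 160.0, 60.0, 160.0, 140.0, 40.0, 140.0]],
}
```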

### How Detection and Segmentation work

Detection uses OWLv2, an open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns bounding boxes with confidence scores. No fixed class list, no retraining needed.
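Roughly what a single OWLv2 call looks like via the Hugging Face `transformers` API. This is a sketch, not the project's own `detect.py`: the function name and threshold are illustrative, and the post-processing helper names vary between `transformers` versions.

```python
# Hedged sketch of one OWLv2 detection pass with transformers.
# The model id matches the table above; everything else is illustrative.
def detect(image, prompts, threshold=0.1):
    import torch
    from transformers import Owlv2Processor, Owlv2ForObjectDetection

    model_id = "google/owlv2-large-patch14-finetuned"
    processor = Owlv2Processor.from_pretrained(model_id)
    model = Owlv2ForObjectDetection.from_pretrained(model_id)

    # One list of text queries per image, e.g. prompts = ["cup", "bottle"]
    inputs = processor(text=[prompts], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes is (height, width) per image; PIL's .size is (width, height)
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(
        outputs=outputs, target_sizes=target_sizes, threshold=threshold
    )[0]

    # Boxes come back in xyxy pixel coordinates; labels index into `prompts`
    return [
        (prompts[label], float(score), box.tolist())
        for score, label, box in zip(
            results["scores"], results["labels"], results["boxes"]
        )
    ]
```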

Segmentation uses the Grounded SAM2 pattern: two models chained together:

```text
Text prompts ("cup, bottle")
        │
        ▼
     OWLv2          ← understands text, produces bounding boxes
        │
        ▼
  Bounding boxes
        │
        ▼
     SAM2           ← understands spatial prompts, produces pixel masks
        │
        ▼
  Masks + COCO polygons
```

SAM2 (`sam2-hiera-tiny`) is a prompt-based segmenter: it accepts box, point, or mask prompts but has no concept of text or class names. It can't answer "find me a cup"; it can only answer "segment the object inside this box." OWLv2 is the grounding step that translates your words into coordinates SAM2 can act on.
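The grounding handoff can be sketched as follows, using the `sam2` package's `SAM2ImagePredictor`. This is an illustrative sketch, not the project's `segment.py`; the function name and return shape are assumptions.

```python
# Hedged sketch: OWLv2 boxes become SAM2 box prompts.
def segment_boxes(image_rgb, boxes_xyxy):
    import numpy as np
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-tiny")
    predictor.set_image(image_rgb)  # HxWx3 uint8 array

    masks = []
    for box in boxes_xyxy:
        # SAM2 never sees a class name; the box itself is the prompt
        mask, score, _ = predictor.predict(
            box=np.array(box), multimask_output=False
        )
        masks.append(mask[0].astype(bool))
    return masks
```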

Both models run in Segmentation mode. Detection mode skips SAM2 entirely.

### 🧪 Test tab

Upload a single image, pick a mode, and type comma-separated object prompts. Hit **Detect** to see an annotated preview alongside a results table (label, confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn on top of the bounding boxes. Use this tab to dial in prompts and threshold before a batch run; nothing is saved to disk.

### 📂 Batch tab

Upload multiple images and run the chosen mode on all of them at once. You get:

- An annotated gallery showing every image
- A **Download ZIP** button containing:
  - `coco_export.json`: COCO-format annotations ready for fine-tuning
  - `images/`: all images resized to your chosen training size

The size dropdown offers common YOLOX training resolutions (416 → 1024) plus **As is** to keep the original dimensions. Coordinates in the COCO file match the resized images exactly.
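Keeping coordinates consistent with a resize is a linear rescale of each box by the width and height ratios. A minimal sketch (helper name is ours, not the project's):

```python
def scale_bbox(bbox, orig_size, new_size):
    """Rescale a COCO [x, y, w, h] bbox when the image is resized.

    orig_size and new_size are (width, height) tuples.
    """
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]

# e.g. halving a 1280x960 image to 640x480 halves every coordinate:
scaled = scale_bbox([128, 96, 256, 192], (1280, 960), (640, 480))
# scaled == [64.0, 48.0, 128.0, 96.0]
```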

All artifacts live in a system temp directory; nothing is written to the project.


## Project layout

```text
autolabel/
├── config.py       # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py       # OWLv2 inference; infer() shared by app + CLI
├── segment.py      # SAM2 integration: box prompts → masks + COCO polygons
├── export.py       # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py     # Fine-tuning loop (future use)
└── utils.py        # Shared helpers
scripts/
├── run_detection.py   # CLI: batch detect → data/detections/
├── export_coco.py     # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py  # CLI: fine-tune OWLv2 (future use)
app.py              # Gradio web UI
```

## CLI workflow

Detection and export can be driven from the command line without the UI:

```bash
# Detect all images in data/raw/ → data/detections/
make detect

# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"

# Force re-run on already-processed images
uv run python scripts/run_detection.py --force

# Build COCO JSON from data/labeled/
make export
```

## Fine-tuning (future)

The fine-tuning infrastructure is already in place. Once you have a coco_export.json from a Batch run:

```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
  --coco-json data/labeled/coco_export.json \
  --image-dir data/raw \
  --epochs 10
```

### Key hyperparameters

| Parameter | Default | Notes |
|---|---|---|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder; needs more data |
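Gradient accumulation trades compute time for memory: gradients from several micro-batches are summed before one optimizer step, so the effective batch size is micro-batch size × accumulation steps. A minimal sketch with a toy model (the real loop in `finetune.py` will differ):

```python
import torch
from torch import nn

accum_steps = 4  # matches the default above
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# 8 toy micro-batches of 8 samples: effective batch size is 8 * 4 = 32
data = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(8)]

optimizer.zero_grad()
steps_taken = 0
for i, (x, y) in enumerate(data):
    # Scale the loss so accumulated gradients average rather than sum
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # one update per accum_steps micro-batches
        optimizer.zero_grad()
        steps_taken += 1
```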

### Tips

- Start with 50–100 annotated images per class minimum; 200–500 is better.
- Fine-tuned models are more confident; raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.

## Prerequisites

| Tool | Version | Notes |
|---|---|---|
| Python | 3.11.x | Managed by uv |
| uv | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |

**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.

**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a specific CUDA build:

```bash
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
uv sync
```

## Makefile targets

| Target | Description |
|---|---|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |

## License

MIT