---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---
# autolabel: OWLv2 + SAM2 labeling pipeline
Auto-label images using OWLv2 (open-vocabulary object detection) and optionally SAM2 (instance segmentation), then export a COCO dataset ready for model fine-tuning.
## Quickstart
```bash
# 1. Install
uv sync

# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env

# 3. Launch
make app
```
Models download automatically on first use and are cached in `~/.cache/huggingface`. Nothing else is written to the project directory.
| Model | Size | Purpose |
|---|---|---|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |
## How the app works
### Mode selector
Both tabs have a Detection / Segmentation radio button:
| Mode | What runs | COCO output |
|---|---|---|
| Detection | OWLv2 only | `bbox` + empty `segmentation: []` |
| Segmentation | OWLv2 → SAM2 | `bbox` + `segmentation` polygon list |
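The two modes differ only in the `segmentation` field of each COCO annotation. A minimal sketch of what the exporter might emit (the helper name is illustrative; the field names follow the COCO spec):

```python
def make_annotation(ann_id, image_id, category_id, bbox_xywh, polygons=None):
    """Build one COCO annotation dict.

    Detection mode passes polygons=None, so segmentation stays [].
    Segmentation mode passes a list of flat [x1, y1, x2, y2, ...] polygons.
    """
    x, y, w, h = bbox_xywh
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],           # COCO boxes are [x, y, width, height]
        "area": w * h,                  # box area; polygon area would be tighter in seg mode
        "segmentation": polygons or [],
        "iscrowd": 0,
    }

# Detection mode: empty segmentation list
det = make_annotation(1, 1, 1, [10, 20, 30, 40])
# Segmentation mode: one triangle polygon
seg = make_annotation(2, 1, 1, [10, 20, 30, 40], polygons=[[10, 20, 40, 20, 10, 60]])
```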
### How Detection and Segmentation work
Detection uses OWLv2, an open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns bounding boxes with confidence scores. No fixed class list, no retraining needed.
Segmentation uses the Grounded SAM2 pattern: two models chained together.

```
Text prompts ("cup, bottle")
        │
        ▼
OWLv2 – understands text, produces bounding boxes
        │
        ▼
Bounding boxes
        │
        ▼
SAM2 – understands spatial prompts, produces pixel masks
        │
        ▼
Masks + COCO polygons
```
SAM2 (`sam2-hiera-tiny`) is a prompt-based segmenter: it accepts box, point, or mask prompts but has no concept of text or class names. It can't answer "find me a cup"; it can only answer "segment the object inside this box." OWLv2 is the grounding step that translates your words into coordinates SAM2 can act on.
Both models run in Segmentation mode. Detection mode skips SAM2 entirely.
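The glue between the stages is mostly coordinate plumbing: OWLv2 emits corner-format boxes (x0, y0, x1, y1), SAM2 accepts that same format as a box prompt, but the COCO file needs (x, y, width, height). A conversion helper of the kind such a pipeline plausibly uses (illustrative, not the app's actual code), including the clamping needed when a detector overshoots the image edge:

```python
def xyxy_to_coco(box, img_w, img_h):
    """Convert a corner-format (x0, y0, x1, y1) box to COCO [x, y, w, h],
    clipped to the image bounds."""
    x0, y0, x1, y1 = box
    # Clamp: detectors can overshoot the frame by a few pixels.
    x0, y0 = max(0.0, x0), max(0.0, y0)
    x1, y1 = min(float(img_w), x1), min(float(img_h), y1)
    return [x0, y0, max(0.0, x1 - x0), max(0.0, y1 - y0)]

print(xyxy_to_coco((-5.0, 10.0, 120.0, 90.0), img_w=100, img_h=100))
# [0.0, 10.0, 100.0, 80.0]
```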
## 🧪 Test tab
Upload a single image, pick a mode, and type comma-separated object prompts. Hit Detect to see an annotated preview alongside a results table (label, confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn on top of the bounding boxes. Use this tab to dial in prompts and the confidence threshold before a batch run; nothing is saved to disk.
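The prompt box and threshold slider amount to two small preprocessing steps; a hedged sketch of how the tab plausibly handles them (`parse_prompts` and `filter_detections` are hypothetical names):

```python
def parse_prompts(text):
    """Split 'cup, bottle' into ['cup', 'bottle'], dropping blank entries."""
    return [p.strip() for p in text.split(",") if p.strip()]

def filter_detections(dets, threshold):
    """Keep only (label, score, box) tuples at or above the confidence threshold."""
    return [d for d in dets if d[1] >= threshold]

prompts = parse_prompts("cup, bottle, ")
dets = [("cup", 0.42, (1, 2, 3, 4)), ("bottle", 0.08, (5, 6, 7, 8))]
kept = filter_detections(dets, threshold=0.1)
```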
## 📦 Batch tab
Upload multiple images and run the chosen mode on all of them at once. You get:
- An annotated gallery showing every image
- A **Download ZIP** button containing:
  - `coco_export.json`: COCO-format annotations ready for fine-tuning
  - `images/`: all images resized to your chosen training size
The size dropdown offers common YOLOX training resolutions (416 to 1024) plus **As is** to keep the original dimensions. Coordinates in the COCO file match the resized images exactly.
All artifacts live in a system temp directory; nothing is written to the project.
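"Coordinates match the resized images" implies every box is rescaled by the same factors as the pixels. A sketch of that bookkeeping (the function name is illustrative):

```python
def scale_bbox(bbox_xywh, orig_size, new_size):
    """Rescale a COCO [x, y, w, h] box from an orig (w, h) image to a new (w, h) image."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x, y, w, h = bbox_xywh
    return [x * sx, y * sy, w * sx, h * sy]

# A box on a 1280x960 original, exported at 640x640:
# x and width halve (640/1280); y and height scale by 640/960.
box = scale_bbox([100, 50, 200, 300], orig_size=(1280, 960), new_size=(640, 640))
```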
## Project layout
```
autolabel/
├── config.py          # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py          # OWLv2 inference; infer() shared by app + CLI
├── segment.py         # SAM2 integration; box prompts → masks + COCO polygons
├── export.py          # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py        # Fine-tuning loop (future use)
└── utils.py           # Shared helpers
scripts/
├── run_detection.py   # CLI: batch detect → data/detections/
├── export_coco.py     # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py  # CLI: fine-tune OWLv2 (future use)
app.py                 # Gradio web UI
```
## CLI workflow
Detection and export can be driven from the command line without the UI:
```bash
# Detect all images in data/raw/ → data/detections/
make detect

# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"

# Force re-run on already-processed images
uv run python scripts/run_detection.py --force

# Build COCO JSON from data/labeled/
make export
```
## Fine-tuning (future)
The fine-tuning infrastructure is already in place. Once you have a `coco_export.json` from a Batch run:
```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
  --coco-json data/labeled/coco_export.json \
  --image-dir data/raw \
  --epochs 10
```
### Key hyperparameters
| Parameter | Default | Notes |
|---|---|---|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch-size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder; needs more data |
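Gradient accumulation multiplies the effective batch size without extra memory: gradients from N micro-batches are averaged before each optimizer step. A framework-free numeric check that accumulating per-sample gradients of a mean-squared-error loss reproduces the full-batch gradient (toy data, illustrative only):

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over a batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 7.0]

# One step on the full batch of 4:
full = grad_mse(w, xs, ys)

# Same data as 4 micro-batches of 1, gradients averaged before the "step":
accum = sum(grad_mse(w, [x], [y]) for x, y in zip(xs, ys)) / 4
```

The two gradients agree, which is why accumulation over 4 steps behaves like a 4x larger batch.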
### Tips
- Start with 50β100 annotated images per class minimum; 200β500 is better.
- Fine-tuned models are more confident; raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.
## Prerequisites
| Tool | Version | Notes |
|---|---|---|
| Python | 3.11.x | Managed by uv |
| uv | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |
**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.

**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a specific CUDA build:
```bash
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
uv sync
```
## Makefile targets
| Target | Description |
|---|---|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |
## License
MIT