---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---
# autolabel — OWLv2 + SAM2 labeling pipeline
Auto-label images using **OWLv2** (open-vocabulary object detection) and
optionally **SAM2** (instance segmentation), then export a COCO dataset ready
for model fine-tuning.
---
## Quickstart
```bash
# 1. Install
uv sync
# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env
# 3. Launch
make app
```
Models download automatically on first use and are cached in
`~/.cache/huggingface`. Nothing else is written to the project directory.
| Model | Size | Purpose |
|-------|------|---------|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |
---
## How the app works
### Mode selector
Both tabs have a **Detection / Segmentation** radio button:
| Mode | What runs | COCO output |
|------|-----------|-------------|
| **Detection** | OWLv2 only | `bbox` + empty `segmentation: []` |
| **Segmentation** | OWLv2 → SAM2 | `bbox` + `segmentation` polygon list |
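The two rows above differ only in the `segmentation` field of each annotation record. As a minimal sketch (field names follow the COCO spec; the IDs and the `make_annotation` helper are illustrative, not the app's real code):

```python
# Sketch of the per-mode COCO annotation shape.  Field names follow the
# COCO spec; make_annotation and the IDs below are illustrative only.

def make_annotation(ann_id, image_id, category_id, bbox, polygons=None):
    """bbox is COCO-style [x, y, width, height]; polygons is a list of
    flattened [x1, y1, x2, y2, ...] rings, or None in Detection mode."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # box area; a real mask area in Segmentation mode
        "iscrowd": 0,
        # Detection mode: empty list.  Segmentation mode: polygon rings.
        "segmentation": polygons if polygons is not None else [],
    }

det = make_annotation(1, 1, 1, [10.0, 20.0, 30.0, 40.0])
seg = make_annotation(2, 1, 1, [10.0, 20.0, 30.0, 40.0],
                      polygons=[[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]])
```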
### How Detection and Segmentation work
**Detection** uses [OWLv2](https://huggingface.co/google/owlv2-large-patch14-finetuned) — an
open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns
bounding boxes with confidence scores. No fixed class list, no retraining needed.
**Segmentation** uses the **Grounded SAM2** pattern — two models chained together:
```
Text prompts ("cup, bottle")
        │
        ▼
     OWLv2      ← understands text, produces bounding boxes
        │
        ▼
Bounding boxes
        │
        ▼
      SAM2      ← understands spatial prompts, produces pixel masks
        │
        ▼
Masks + COCO polygons
```
SAM2 (`sam2-hiera-tiny`) is a *prompt-based* segmenter — it accepts box, point, or mask
prompts but has no concept of text or class names. It can't answer "find me a cup"; it
can only answer "segment the object inside this box." OWLv2 is the **grounding** step
that translates your words into coordinates SAM2 can act on.
Both models run in Segmentation mode. Detection mode skips SAM2 entirely.
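The hand-off above can be sketched with toy stand-ins. To be clear, `detect()` and `segment()` below are *not* the real OWLv2/SAM2 calls — they only mirror the shape of the data flow: text in, scored boxes out, then boxes in, binary masks out.

```python
# Toy stand-ins that mirror the Grounded SAM2 data flow.  detect() and
# segment() are NOT the real OWLv2 / SAM2 APIs -- just the hand-off shape.

def detect(image, prompts, threshold=0.1):
    """Pretend OWLv2: text prompts -> scored boxes [x, y, w, h]."""
    fake = {"cup": (0.92, [40, 50, 60, 80]), "bottle": (0.30, [120, 30, 40, 100])}
    return [
        {"label": p, "score": fake[p][0], "box": fake[p][1]}
        for p in prompts if p in fake and fake[p][0] >= threshold
    ]

def segment(image, box):
    """Pretend SAM2: a box prompt -> a binary mask (here, just the box)."""
    h, w = image["height"], image["width"]
    x, y, bw, bh = box
    return [[x <= c < x + bw and y <= r < y + bh for c in range(w)]
            for r in range(h)]

image = {"height": 200, "width": 200}
detections = detect(image, ["cup", "bottle"])          # grounding: words -> boxes
masks = [segment(image, d["box"]) for d in detections]  # boxes -> pixel masks
```

Detection mode stops after `detect()`; Segmentation mode runs both steps.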
### 🧪 Test tab
Upload a single image, pick a mode, and type comma-separated object prompts.
Hit **Detect** to see an annotated preview alongside a results table (label,
confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn
on top of the bounding boxes. Use this tab to dial in prompts and threshold
before a batch run — nothing is saved to disk.
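The prompt box and threshold slider behave as you would expect; a hedged sketch of that handling (`parse_prompts` and `filter_detections` are hypothetical names, not the app's actual functions):

```python
# Hypothetical helpers mirroring the Test tab's input handling: split a
# comma-separated prompt string, then keep detections above the threshold.

def parse_prompts(text):
    """'cup, bottle, ' -> ['cup', 'bottle'] (whitespace and empties dropped)."""
    return [p.strip() for p in text.split(",") if p.strip()]

def filter_detections(detections, threshold):
    """Keep only detections whose confidence clears the threshold."""
    return [d for d in detections if d["score"] >= threshold]

prompts = parse_prompts("cup, bottle, ")
dets = [{"label": "cup", "score": 0.85}, {"label": "bottle", "score": 0.08}]
kept = filter_detections(dets, 0.1)   # the 0.08 bottle is dropped
```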
### 📂 Batch tab
Upload multiple images and run the chosen mode on all of them at once. You get:
- An annotated **gallery** showing every image
- A **Download ZIP** button containing:
  - `coco_export.json` — COCO-format annotations ready for fine-tuning
- `images/` β€” all images resized to your chosen training size
The size dropdown offers common YOLOX training resolutions (416 → 1024) plus
**As is** to keep the original dimensions. Coordinates in the COCO file match
the resized images exactly.
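The coordinate bookkeeping behind that guarantee is simple scaling: every bbox must be multiplied by the same factors as the image, or the annotations drift off the pixels. A sketch (`scale_bbox` is a hypothetical helper, not the app's real function):

```python
# When an image is resized for training, every COCO bbox must be scaled
# by the same x/y factors so annotations stay aligned with the pixels.
# scale_bbox is a hypothetical helper for illustration.

def scale_bbox(bbox, orig_size, new_size):
    """bbox = [x, y, w, h]; sizes = (width, height)."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]

# A 1280x960 original resized to a 640x640 training size:
scaled = scale_bbox([320.0, 240.0, 160.0, 120.0], (1280, 960), (640, 640))
# -> [160.0, 160.0, 80.0, 80.0]
```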
All artifacts live in a system temp directory — nothing is written to the project.
---
## Project layout
```
autolabel/
├── config.py        # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py        # OWLv2 inference — infer() shared by app + CLI
├── segment.py       # SAM2 integration — box prompts → masks + COCO polygons
├── export.py        # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py      # Fine-tuning loop (future use)
└── utils.py         # Shared helpers
scripts/
├── run_detection.py   # CLI: batch detect → data/detections/
├── export_coco.py     # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py # CLI: fine-tune OWLv2 (future use)
app.py # Gradio web UI
```
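The device fallback chain that `config.py`'s comment describes can be sketched as follows. This is a minimal illustration, not the real Pydantic-based config; the guarded import lets it degrade to `"cpu"` when torch is absent.

```python
# Sketch of the CUDA -> MPS -> CPU fallback described in config.py's
# comment.  Illustrative only; the real config uses Pydantic settings.

def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"          # no torch installed: fall back to CPU
    if torch.cuda.is_available():
        return "cuda"         # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"          # Apple Silicon
    return "cpu"

device = pick_device()
```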
---
## CLI workflow
Detection and export can be driven from the command line without the UI:
```bash
# Detect all images in data/raw/ → data/detections/
make detect
# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"
# Force re-run on already-processed images
uv run python scripts/run_detection.py --force
# Build COCO JSON from data/labeled/
make export
```
---
## Fine-tuning (future)
The fine-tuning infrastructure is already in place. Once you have a
`coco_export.json` from a Batch run:
```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
--coco-json data/labeled/coco_export.json \
--image-dir data/raw \
--epochs 10
```
### Key hyperparameters
| Parameter | Default | Notes |
|-----------|---------|-------|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder — needs more data |
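Why accumulation acts as a batch-size multiplier: averaging the loss over 4 equal micro-batches before stepping matches the loss over one 4×-larger batch. A plain-Python numeric check (the loss values are made up for illustration):

```python
# Gradient accumulation = 4: average over 4 equal micro-batches equals
# the average over one big batch.  Numbers stand in for per-example losses.

micro_batches = [[0.9, 1.1], [0.8, 1.2], [1.0, 1.0], [0.7, 1.3]]  # 4 x batch 2

# What accumulation computes: mean of the 4 micro-batch means.
accumulated = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

# What a single batch of 8 would compute.
big_batch = [x for mb in micro_batches for x in mb]
full = sum(big_batch) / len(big_batch)
```

So with a per-step batch of `b`, the default accumulation of 4 gives an effective batch of `4 * b` without the memory cost.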
### Tips
- Start with **50–100 annotated images per class** minimum; 200–500 is better.
- Fine-tuned models are more confident — raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.
---
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Python | **3.11.x** | Managed by uv |
| [uv](https://docs.astral.sh/uv/) | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |
**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.
**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a
specific CUDA build:
```powershell
# Sync first, then override torch with the CUDA build
# (running uv sync afterwards would restore the locked CPU wheel).
uv sync
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```
---
## Makefile targets
| Target | Description |
|--------|-------------|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |
---
## License
MIT