---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---
# autolabel – OWLv2 + SAM2 labeling pipeline
Auto-label images using **OWLv2** (open-vocabulary object detection) and
optionally **SAM2** (instance segmentation), then export a COCO dataset ready
for model fine-tuning.
---
## Quickstart
```bash
# 1. Install
uv sync
# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env
# 3. Launch
make app
```
Models download automatically on first use and are cached in
`~/.cache/huggingface`. Nothing else is written to the project directory.
| Model | Size | Purpose |
|-------|------|---------|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |
---
## How the app works
### Mode selector
Both tabs have a **Detection / Segmentation** radio button:
| Mode | What runs | COCO output |
|------|-----------|-------------|
| **Detection** | OWLv2 only | `bbox` + empty `segmentation: []` |
| **Segmentation** | OWLv2 → SAM2 | `bbox` + `segmentation` polygon list |
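The two outputs differ only in the `segmentation` field. A minimal sketch of one COCO annotation entry per mode (the field values are illustrative, not taken from a real export):

```python
# Illustrative COCO annotation entries for the two modes.
# Values are made up; only the structure matters.

detection_ann = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,                    # e.g. "cup"
    "bbox": [34.0, 50.0, 120.0, 80.0],   # COCO convention: [x, y, width, height]
    "area": 120.0 * 80.0,
    "iscrowd": 0,
    "segmentation": [],                  # Detection mode: empty list
}

segmentation_ann = {
    **detection_ann,
    "id": 2,
    # Segmentation mode: a list of polygons, each a flat [x1, y1, x2, y2, ...] ring
    "segmentation": [[34.0, 50.0, 154.0, 50.0, 154.0, 130.0, 34.0, 130.0]],
}
```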
### How Detection and Segmentation work
**Detection** uses [OWLv2](https://huggingface.co/google/owlv2-large-patch14-finetuned), an
open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns
bounding boxes with confidence scores. No fixed class list, no retraining needed.

**Segmentation** uses the **Grounded SAM2** pattern – two models chained together:
```
Text prompts ("cup, bottle")
        │
        ▼
OWLv2 – understands text, produces bounding boxes
        │
        ▼
Bounding boxes
        │
        ▼
SAM2 – understands spatial prompts, produces pixel masks
        │
        ▼
Masks + COCO polygons
```
SAM2 (`sam2-hiera-tiny`) is a *prompt-based* segmenter – it accepts box, point, or mask
prompts but has no concept of text or class names. It can't answer "find me a cup"; it
can only answer "segment the object inside this box." OWLv2 is the **grounding** step
that translates your words into coordinates SAM2 can act on.
Both models run in Segmentation mode. Detection mode skips SAM2 entirely.
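The chain above can be sketched as a small function that wires any text-grounded detector into any box-prompted segmenter. `detect_fn` and `segment_fn` are hypothetical stand-ins for the actual OWLv2 and SAM2 calls in `detect.py` and `segment.py`:

```python
from typing import Callable

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)

def grounded_segment(
    image,
    prompts: list[str],
    detect_fn: Callable,   # (image, prompts) -> list of (label, score, box)
    segment_fn: Callable,  # (image, box) -> mask
    threshold: float = 0.1,
) -> list[dict]:
    """Grounded SAM2 pattern: text -> boxes (detector) -> masks (segmenter)."""
    results = []
    for label, score, box in detect_fn(image, prompts):
        if score < threshold:
            continue  # drop low-confidence detections before segmenting
        mask = segment_fn(image, box)  # the segmenter only ever sees coordinates
        results.append({"label": label, "score": score, "bbox": box, "mask": mask})
    return results
```

The design point is that the segmenter never sees the prompts: swap either model and the wiring stays the same.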
### π§ͺ Test tab
Upload a single image, pick a mode, and type comma-separated object prompts.
Hit **Detect** to see an annotated preview alongside a results table (label,
confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn
on top of the bounding boxes. Use this tab to dial in prompts and threshold
before a batch run; nothing is saved to disk.
### π Batch tab
Upload multiple images and run the chosen mode on all of them at once. You get:
- An annotated **gallery** showing every image
- A **Download ZIP** button containing:
  - `coco_export.json` – COCO-format annotations ready for fine-tuning
  - `images/` – all images resized to your chosen training size
The size dropdown offers common YOLOX training resolutions (416–1024) plus
**As is** to keep the original dimensions. Coordinates in the COCO file match
the resized images exactly.
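Keeping COCO coordinates aligned with resized images comes down to scaling each box by the same factors as the pixels. A minimal sketch (the helper name is hypothetical, not the app's actual code):

```python
def scale_bbox(bbox, orig_size, new_size):
    """Scale a COCO [x, y, w, h] box from orig (W, H) to new (W, H) pixels."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]

# A 100x50 box in a 1280x960 image, after resizing to 640x480:
scale_bbox([200, 100, 100, 50], (1280, 960), (640, 480))  # -> [100.0, 50.0, 50.0, 25.0]
```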
All artifacts live in a system temp directory; nothing is written to the project.
---
## Project layout
```
autolabel/
├── config.py      # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py      # OWLv2 inference – infer() shared by app + CLI
├── segment.py     # SAM2 integration – box prompts → masks + COCO polygons
├── export.py      # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py    # Fine-tuning loop (future use)
└── utils.py       # Shared helpers
scripts/
├── run_detection.py    # CLI: batch detect → data/detections/
├── export_coco.py      # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py   # CLI: fine-tune OWLv2 (future use)
app.py                  # Gradio web UI
```
---
## CLI workflow
Detection and export can be driven from the command line without the UI:
```bash
# Detect all images in data/raw/ → data/detections/
make detect
# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"
# Force re-run on already-processed images
uv run python scripts/run_detection.py --force
# Build COCO JSON from data/labeled/
make export
```
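The `--force` flag implies the CLI normally skips images that already have detection output. That idempotency check can be sketched as follows (the file layout and helper name are assumptions, not the script's actual code):

```python
from pathlib import Path

def images_to_process(raw_dir: Path, out_dir: Path, force: bool = False) -> list[Path]:
    """Return images in raw_dir that still need a detection JSON in out_dir."""
    images = sorted(raw_dir.glob("*.jpg")) + sorted(raw_dir.glob("*.png"))
    if force:
        return images  # --force: re-run everything
    # Skip any image that already has a matching <stem>.json in out_dir
    return [p for p in images if not (out_dir / f"{p.stem}.json").exists()]
```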
---
## Fine-tuning (future)
The fine-tuning infrastructure is already in place. Once you have a
`coco_export.json` from a Batch run:
```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
--coco-json data/labeled/coco_export.json \
--image-dir data/raw \
--epochs 10
```
### Key hyperparameters
| Parameter | Default | Notes |
|-----------|---------|-------|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder – needs more data |
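Gradient accumulation trades memory for effective batch size: losses from N micro-batches are backpropagated (gradients sum) before a single optimizer step. A framework-free sketch of the bookkeeping (the real loop in `scripts/finetune_owlv2.py` may differ):

```python
def train_steps(num_batches: int, accumulation_steps: int = 4):
    """Yield (batch_index, do_step): step the optimizer every N micro-batches."""
    for i in range(num_batches):
        # Per batch: forward pass, loss / accumulation_steps, backward (grads sum)
        do_step = (i + 1) % accumulation_steps == 0
        yield i, do_step

# With per-device batch size 2 and accumulation 4, each optimizer step
# sees an effective batch of 2 * 4 = 8 samples.
optimizer_steps = sum(do for _, do in train_steps(12, accumulation_steps=4))
```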
### Tips
- Start with **50–100 annotated images per class** minimum; 200–500 is better.
- Fine-tuned models are more confident, so raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.
---
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Python | **3.11.x** | Managed by uv |
| [uv](https://docs.astral.sh/uv/) | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |
**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.
**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a
specific CUDA build:
```powershell
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
uv sync
```
---
## Makefile targets
| Target | Description |
|--------|-------------|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |
---
## License
MIT