---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---
# autolabel: OWLv2 + SAM2 labeling pipeline

Auto-label images using **OWLv2** (open-vocabulary object detection) and
optionally **SAM2** (instance segmentation), then export a COCO dataset ready
for model fine-tuning.

---
## Quickstart

```bash
# 1. Install
uv sync
# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env
# 3. Launch
make app
```

Models download automatically on first use and are cached in
`~/.cache/huggingface`. Nothing else is written to the project directory.
| Model | Size | Purpose |
|-------|------|---------|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |

---
## How the app works

### Mode selector

Both tabs have a **Detection / Segmentation** radio button:

| Mode | What runs | COCO output |
|------|-----------|-------------|
| **Detection** | OWLv2 only | `bbox` + empty `segmentation: []` |
| **Segmentation** | OWLv2 → SAM2 | `bbox` + `segmentation` polygon list |
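The two output shapes in this table can be sketched in a few lines. OWLv2 emits corner boxes `(x1, y1, x2, y2)` while COCO stores `[x, y, width, height]`; the helper below is only illustrative (the function and field names are assumptions, not the app's actual `export.py` API):

```python
def to_coco_annotation(ann_id, image_id, category_id, box_xyxy, polygon=None):
    """Build one COCO annotation from an xyxy corner box.

    `polygon`, if given, is a flat [x1, y1, x2, y2, ...] vertex list
    as produced in Segmentation mode; Detection mode leaves it None.
    """
    x1, y1, x2, y2 = box_xyxy
    w, h = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, w, h],   # COCO convention: top-left + size
        "area": w * h,
        "iscrowd": 0,
        # Detection mode: empty list; Segmentation mode: list of polygons
        "segmentation": [polygon] if polygon else [],
    }

det = to_coco_annotation(1, 1, 1, (10, 20, 110, 70))
print(det["bbox"])          # [10, 20, 100, 50]
print(det["segmentation"])  # []
```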
### How Detection and Segmentation work

**Detection** uses [OWLv2](https://huggingface.co/google/owlv2-large-patch14-finetuned), an
open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns
bounding boxes with confidence scores. No fixed class list, no retraining needed.

**Segmentation** uses the **Grounded SAM2** pattern – two models chained together:

```
Text prompts ("cup, bottle")
        │
        ▼
OWLv2 – understands text, produces bounding boxes
        │
        ▼
Bounding boxes
        │
        ▼
SAM2 – understands spatial prompts, produces pixel masks
        │
        ▼
Masks + COCO polygons
```

SAM2 (`sam2-hiera-tiny`) is a *prompt-based* segmenter: it accepts box, point, or mask
prompts but has no concept of text or class names. It can't answer "find me a cup"; it
can only answer "segment the object inside this box." OWLv2 is the **grounding** step
that translates your words into coordinates SAM2 can act on.

Both models run in Segmentation mode. Detection mode skips SAM2 entirely.
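The last step of the chain, turning mask outlines into COCO polygons, comes down to flattening vertex pairs and computing an enclosed area. A minimal sketch (the function name is hypothetical; the app's `segment.py` may derive polygons differently, e.g. from mask contours):

```python
def polygon_to_coco(points):
    """Flatten (x, y) vertex pairs into COCO's flat segmentation list
    and compute the enclosed area with the shoelace formula."""
    flat = [coord for point in points for coord in point]
    n = len(points)
    area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return flat, abs(area) / 2.0

# A 100x50 axis-aligned rectangle
seg, area = polygon_to_coco([(10, 20), (110, 20), (110, 70), (10, 70)])
print(seg)   # [10, 20, 110, 20, 110, 70, 10, 70]
print(area)  # 5000.0
```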
### 🧪 Test tab

Upload a single image, pick a mode, and type comma-separated object prompts.
Hit **Detect** to see an annotated preview alongside a results table (label,
confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn
on top of the bounding boxes. Use this tab to dial in prompts and threshold
before a batch run; nothing is saved to disk.
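Under the hood, prompt parsing and threshold filtering are simple list operations. This sketch shows the idea; the function names, the detection dict fields, and the 0.1 default are assumptions for illustration:

```python
def parse_prompts(text):
    """Split a comma-separated prompt string into clean labels."""
    return [p.strip() for p in text.split(",") if p.strip()]

def filter_detections(detections, threshold=0.1):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["score"] >= threshold]

print(parse_prompts("cup, bottle, "))  # ['cup', 'bottle']
dets = [{"label": "cup", "score": 0.34}, {"label": "bottle", "score": 0.05}]
print(filter_detections(dets, 0.1))    # [{'label': 'cup', 'score': 0.34}]
```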
### Batch tab

Upload multiple images and run the chosen mode on all of them at once. You get:

- An annotated **gallery** showing every image
- A **Download ZIP** button containing:
  - `coco_export.json` – COCO-format annotations ready for fine-tuning
  - `images/` – all images resized to your chosen training size

The size dropdown offers common YOLOX training resolutions (416–1024) plus
**As is** to keep the original dimensions. Coordinates in the COCO file match
the resized images exactly.

All artifacts live in a system temp directory; nothing is written to the project.
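Keeping COCO coordinates in sync with resized images comes down to scaling by the width and height ratios. A sketch under assumed names (the app may implement this differently):

```python
def scale_for_export(box_xyxy, orig_size, target_size=None):
    """Scale a box from original image coordinates to the export size.

    target_size=None corresponds to the "As is" option: coordinates
    pass through unchanged. Sizes are (width, height) tuples.
    """
    if target_size is None:
        return list(box_xyxy)
    ow, oh = orig_size
    tw, th = target_size
    sx, sy = tw / ow, th / oh
    x1, y1, x2, y2 = box_xyxy
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]

# A 1280x960 source exported at 640x480 halves every coordinate
print(scale_for_export((100, 200, 300, 400), (1280, 960), (640, 480)))
# [50.0, 100.0, 150.0, 200.0]
```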
---

## Project layout

```
autolabel/
├── config.py        # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py        # OWLv2 inference – infer() shared by app + CLI
├── segment.py       # SAM2 integration – box prompts → masks + COCO polygons
├── export.py        # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py      # Fine-tuning loop (future use)
└── utils.py         # Shared helpers
scripts/
├── run_detection.py   # CLI: batch detect → data/detections/
├── export_coco.py     # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py  # CLI: fine-tune OWLv2 (future use)
app.py               # Gradio web UI
```
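`export.py` is described above as a COCO JSON builder with no pycocotools dependency. The dataset-level structure that implies can be sketched with plain dicts and the standard `json` module; this is a minimal, hypothetical skeleton, not the actual module:

```python
import json

def build_coco(images, annotations, categories):
    """Assemble a minimal COCO dataset dict from pre-built records."""
    return {
        "images": images,            # [{"id", "file_name", "width", "height"}]
        "annotations": annotations,  # [{"id", "image_id", "category_id", "bbox", ...}]
        "categories": categories,    # [{"id", "name"}]
    }

coco = build_coco(
    images=[{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 640}],
    annotations=[{"id": 1, "image_id": 1, "category_id": 1,
                  "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0,
                  "segmentation": []}],
    categories=[{"id": 1, "name": "cup"}],
)
# The whole structure serializes directly with the stdlib
print(len(json.dumps(coco)) > 0)  # True
```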
---

## CLI workflow

Detection and export can be driven from the command line without the UI:

```bash
# Detect all images in data/raw/ → data/detections/
make detect
# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"
# Force re-run on already-processed images
uv run python scripts/run_detection.py --force
# Build COCO JSON from data/labeled/
make export
```
---

## Fine-tuning (future)

The fine-tuning infrastructure is already in place. Once you have a
`coco_export.json` from a Batch run:

```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
  --coco-json data/labeled/coco_export.json \
  --image-dir data/raw \
  --epochs 10
```
### Key hyperparameters

| Parameter | Default | Notes |
|-----------|---------|-------|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder – needs more data |

### Tips

- Start with **50–100 annotated images per class** minimum; 200–500 is better.
- Fine-tuned models are more confident – raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.
---

## Prerequisites

| Tool | Version | Notes |
|------|---------|-------|
| Python | **3.11.x** | Managed by uv |
| [uv](https://docs.astral.sh/uv/) | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |

**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.

**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a
specific CUDA build:

```powershell
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
uv sync
```
---

## Makefile targets

| Target | Description |
|--------|-------------|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |
---

## License

MIT