---
title: LabelPlayground
app_file: app.py
sdk: gradio
sdk_version: 6.8.0
---
# autolabel — OWLv2 + SAM2 labeling pipeline
Auto-label images using **OWLv2** (open-vocabulary object detection) and
optionally **SAM2** (instance segmentation), then export a COCO dataset ready
for model fine-tuning.
---
## Quickstart
```bash
# 1. Install
uv sync
# 2. Copy env file (sets PYTORCH_ENABLE_MPS_FALLBACK=1 for Apple Silicon)
cp .env.example .env
# 3. Launch
make app
```
Models download automatically on first use and are cached in
`~/.cache/huggingface`. Nothing else is written to the project directory.
| Model | Size | Purpose |
|-------|------|---------|
| `owlv2-large-patch14-finetuned` | ~700 MB | Text → bounding boxes |
| `sam2-hiera-tiny` | ~160 MB | Box prompts → pixel masks |
---
## How the app works
### Mode selector
Both tabs have a **Detection / Segmentation** radio button:
| Mode | What runs | COCO output |
|------|-----------|-------------|
| **Detection** | OWLv2 only | `bbox` + empty `segmentation: []` |
| **Segmentation** | OWLv2 → SAM2 | `bbox` + `segmentation` polygon list |
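The two rows above differ only in the `segmentation` field of each annotation record. As a minimal sketch (field names follow the COCO spec; the IDs and the `make_annotation` helper are illustrative, not the app's real code):

```python
# Sketch of the per-mode COCO annotation shape.  Field names follow the
# COCO spec; make_annotation and the IDs below are illustrative only.

def make_annotation(ann_id, image_id, category_id, bbox, polygons=None):
    """bbox is COCO-style [x, y, width, height]; polygons is a list of
    flattened [x1, y1, x2, y2, ...] rings, or None in Detection mode."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # box area; a real mask area in Segmentation mode
        "iscrowd": 0,
        # Detection mode: empty list.  Segmentation mode: polygon rings.
        "segmentation": polygons if polygons is not None else [],
    }

det = make_annotation(1, 1, 1, [10.0, 20.0, 30.0, 40.0])
seg = make_annotation(2, 1, 1, [10.0, 20.0, 30.0, 40.0],
                      polygons=[[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]])
```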
### How Detection and Segmentation work
**Detection** uses [OWLv2](https://huggingface.co/google/owlv2-large-patch14-finetuned) — an
open-vocabulary object detector. You give it a text prompt ("cup, bottle") and it returns
bounding boxes with confidence scores. No fixed class list, no retraining needed.
**Segmentation** uses the **Grounded SAM2** pattern — two models chained together:
```
Text prompts ("cup, bottle")
        │
        ▼
     OWLv2      ← understands text, produces bounding boxes
        │
        ▼
Bounding boxes
        │
        ▼
      SAM2      ← understands spatial prompts, produces pixel masks
        │
        ▼
Masks + COCO polygons
```
SAM2 (`sam2-hiera-tiny`) is a *prompt-based* segmenter — it accepts box, point, or mask
prompts but has no concept of text or class names. It can't answer "find me a cup"; it
can only answer "segment the object inside this box." OWLv2 is the **grounding** step
that translates your words into coordinates SAM2 can act on.
Both models run in Segmentation mode. Detection mode skips SAM2 entirely.
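The hand-off above can be sketched with toy stand-ins. To be clear, `detect()` and `segment()` below are *not* the real OWLv2/SAM2 calls — they only mirror the shape of the data flow: text in, scored boxes out, then boxes in, binary masks out.

```python
# Toy stand-ins that mirror the Grounded SAM2 data flow.  detect() and
# segment() are NOT the real OWLv2 / SAM2 APIs -- just the hand-off shape.

def detect(image, prompts, threshold=0.1):
    """Pretend OWLv2: text prompts -> scored boxes [x, y, w, h]."""
    fake = {"cup": (0.92, [40, 50, 60, 80]), "bottle": (0.30, [120, 30, 40, 100])}
    return [
        {"label": p, "score": fake[p][0], "box": fake[p][1]}
        for p in prompts if p in fake and fake[p][0] >= threshold
    ]

def segment(image, box):
    """Pretend SAM2: a box prompt -> a binary mask (here, just the box)."""
    h, w = image["height"], image["width"]
    x, y, bw, bh = box
    return [[x <= c < x + bw and y <= r < y + bh for c in range(w)]
            for r in range(h)]

image = {"height": 200, "width": 200}
detections = detect(image, ["cup", "bottle"])          # grounding: words -> boxes
masks = [segment(image, d["box"]) for d in detections]  # boxes -> pixel masks
```

Detection mode stops after `detect()`; Segmentation mode runs both steps.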
### 🧪 Test tab
Upload a single image, pick a mode, and type comma-separated object prompts.
Hit **Detect** to see an annotated preview alongside a results table (label,
confidence, bounding box). In Segmentation mode, pixel mask overlays are drawn
on top of the bounding boxes. Use this tab to dial in prompts and threshold
before a batch run — nothing is saved to disk.
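The prompt box and threshold slider behave as you would expect; a hedged sketch of that handling (`parse_prompts` and `filter_detections` are hypothetical names, not the app's actual functions):

```python
# Hypothetical helpers mirroring the Test tab's input handling: split a
# comma-separated prompt string, then keep detections above the threshold.

def parse_prompts(text):
    """'cup, bottle, ' -> ['cup', 'bottle'] (whitespace and empties dropped)."""
    return [p.strip() for p in text.split(",") if p.strip()]

def filter_detections(detections, threshold):
    """Keep only detections whose confidence clears the threshold."""
    return [d for d in detections if d["score"] >= threshold]

prompts = parse_prompts("cup, bottle, ")
dets = [{"label": "cup", "score": 0.85}, {"label": "bottle", "score": 0.08}]
kept = filter_detections(dets, 0.1)   # the 0.08 bottle is dropped
```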
### 📂 Batch tab
Upload multiple images and run the chosen mode on all of them at once. You get:
- An annotated **gallery** showing every image
- A **Download ZIP** button containing:
  - `coco_export.json` — COCO-format annotations ready for fine-tuning
- `images/` β€” all images resized to your chosen training size
The size dropdown offers common YOLOX training resolutions (416 → 1024) plus
**As is** to keep the original dimensions. Coordinates in the COCO file match
the resized images exactly.
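The coordinate bookkeeping behind that guarantee is simple scaling: every bbox must be multiplied by the same factors as the image, or the annotations drift off the pixels. A sketch (`scale_bbox` is a hypothetical helper, not the app's real function):

```python
# When an image is resized for training, every COCO bbox must be scaled
# by the same x/y factors so annotations stay aligned with the pixels.
# scale_bbox is a hypothetical helper for illustration.

def scale_bbox(bbox, orig_size, new_size):
    """bbox = [x, y, w, h]; sizes = (width, height)."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]

# A 1280x960 original resized to a 640x640 training size:
scaled = scale_bbox([320.0, 240.0, 160.0, 120.0], (1280, 960), (640, 640))
# -> [160.0, 160.0, 80.0, 80.0]
```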
All artifacts live in a system temp directory — nothing is written to the project.
---
## Project layout
```
autolabel/
├── config.py        # Pydantic settings, auto device detection (CUDA → MPS → CPU)
├── detect.py        # OWLv2 inference — infer() shared by app + CLI
├── segment.py       # SAM2 integration — box prompts → masks + COCO polygons
├── export.py        # COCO JSON builder (no pycocotools); bbox + segmentation
├── finetune.py      # Fine-tuning loop (future use)
└── utils.py         # Shared helpers
scripts/
├── run_detection.py   # CLI: batch detect → data/detections/
├── export_coco.py     # CLI: build coco_export.json from data/labeled/
└── finetune_owlv2.py # CLI: fine-tune OWLv2 (future use)
app.py # Gradio web UI
```
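The device fallback chain that `config.py`'s comment describes can be sketched as follows. This is a minimal illustration, not the real Pydantic-based config; the guarded import lets it degrade to `"cpu"` when torch is absent.

```python
# Sketch of the CUDA -> MPS -> CPU fallback described in config.py's
# comment.  Illustrative only; the real config uses Pydantic settings.

def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"          # no torch installed: fall back to CPU
    if torch.cuda.is_available():
        return "cuda"         # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"          # Apple Silicon
    return "cpu"

device = pick_device()
```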
---
## CLI workflow
Detection and export can be driven from the command line without the UI:
```bash
# Detect all images in data/raw/ → data/detections/
make detect
# Custom prompts
uv run python scripts/run_detection.py --prompts "cup,mug,bottle"
# Force re-run on already-processed images
uv run python scripts/run_detection.py --force
# Build COCO JSON from data/labeled/
make export
```
---
## Fine-tuning (future)
The fine-tuning infrastructure is already in place. Once you have a
`coco_export.json` from a Batch run:
```bash
make finetune
# or:
uv run python scripts/finetune_owlv2.py \
--coco-json data/labeled/coco_export.json \
--image-dir data/raw \
--epochs 10
```
### Key hyperparameters
| Parameter | Default | Notes |
|-----------|---------|-------|
| Epochs | 10 | More epochs → higher overfit risk on small datasets |
| Learning rate | 1e-4 | Applied to the detection head |
| Gradient accumulation | 4 | Effective batch size multiplier |
| Unfreeze backbone | off | Also trains the vision encoder — needs more data |
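Why accumulation acts as a batch-size multiplier: averaging the loss over 4 equal micro-batches before stepping matches the loss over one 4×-larger batch. A plain-Python numeric check (the loss values are made up for illustration):

```python
# Gradient accumulation = 4: average over 4 equal micro-batches equals
# the average over one big batch.  Numbers stand in for per-example losses.

micro_batches = [[0.9, 1.1], [0.8, 1.2], [1.0, 1.0], [0.7, 1.3]]  # 4 x batch 2

# What accumulation computes: mean of the 4 micro-batch means.
accumulated = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

# What a single batch of 8 would compute.
big_batch = [x for mb in micro_batches for x in mb]
full = sum(big_batch) / len(big_batch)
```

So with a per-step batch of `b`, the default accumulation of 4 gives an effective batch of `4 * b` without the memory cost.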
### Tips
- Start with **50–100 annotated images per class** minimum; 200–500 is better.
- Fine-tuned models are more confident — raise the threshold to 0.2–0.4.
- Leave the backbone frozen unless you have 500+ images per class.
---
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Python | **3.11.x** | Managed by uv |
| [uv](https://docs.astral.sh/uv/) | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| CUDA toolkit | 11.8+ | Windows/Linux GPU users only |
**Apple Silicon:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is pre-set in `.env.example`.
**Windows/CUDA:** remove `PYTORCH_ENABLE_MPS_FALLBACK` from `.env`. For a
specific CUDA build:
```powershell
# Sync first, then override torch with the CUDA build
# (running uv sync afterwards would restore the locked CPU wheel).
uv sync
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```
---
## Makefile targets
| Target | Description |
|--------|-------------|
| `make setup` | Install dependencies, copy `.env.example` |
| `make app` | Launch the Gradio UI |
| `make detect` | Batch detect via CLI → `data/detections/` |
| `make export` | Build COCO JSON via CLI |
| `make finetune` | Fine-tune OWLv2 via CLI |
| `make clean` | Delete generated JSONs (raw images untouched) |
---
## License
MIT