# API Documentation

This document describes the Gradio API endpoints exposed by the ROI-VAE image and video compression application. The API allows programmatic access to segmentation, compression, detection, and full pipeline processing for both images and videos.

**Live Demo:** https://biaslab2025-contextual-communication-demo.hf.space
## Table of Contents

- [Quick Start](#quick-start)
- [Important Notes](#important-notes)
- [Image API Endpoints](#image-api-endpoints)
  - [/segment](#1-segment---generate-roi-mask)
  - [/compress](#2-compress---compress-image)
  - [/detect](#3-detect---object-detection)
  - [/detect_overlay](#31-detect_overlay---detection-with-visualization)
  - [/process](#4-process---full-image-pipeline)
- [Video API Endpoints](#video-api-endpoints)
  - [/segment_video](#1-segment_video---segment-video)
  - [/compress_video](#2-compress_video---compress-video)
  - [/detect_video](#3-detect_video---video-detection)
  - [/process_video](#4-process_video---full-video-pipeline)
- [Streaming Video API Endpoints](#streaming-video-api-endpoints)
  - [/stream_process_video](#1-stream_process_video---full-streaming-pipeline)
  - [/stream_compress_video](#2-stream_compress_video---simplified-streaming-compression)
- [Class Reference](#class-reference)
- [Error Handling](#error-handling)
- [GPU Quota Handling](#handling-gpu-quota-on-hf-spaces)
- [cURL Examples](#using-with-curl)
- [Performance Guide](#performance-guide)
- [Example Scripts](#example-scripts)
- [Additional Resources](#additional-resources)
---

## Quick Start

### Installation

```bash
pip install gradio_client
```

### Image Processing

```python
from gradio_client import Client, handle_file

# Connect to the API
client = Client("https://biaslab2025-contextual-communication-demo.hf.space")
# Or local: client = Client("http://localhost:7860")

# Full pipeline: segment → compress → detect
compressed, mask, bpp, ratio, coverage, detections_json = client.predict(
    handle_file("path/to/image.jpg"),
    "car, person",  # segmentation prompt
    "sam3",         # segmentation method
    4,              # quality level (1-5)
    0.3,            # sigma (background compression)
    True,           # run detection
    "yolo",         # detection method
    "",             # detection classes (empty for closed-vocab)
    api_name="/process"
)
print(f"Compression: {bpp:.4f} bpp ({ratio:.2f}x)")
```

### Video Processing

```python
from gradio_client import Client, handle_file
import json

client = Client("http://localhost:7860")

# Full pipeline with static settings
output_video, stats_json = client.predict(
    handle_file("path/to/video.mp4"),
    "person, car",  # segmentation classes
    "sam3",         # segmentation method
    "static",       # mode: "static" or "dynamic"
    4,              # quality level (1-5)
    0.3,            # sigma
    15.0,           # output FPS
    500,            # bandwidth (dynamic mode)
    5,              # min_fps (dynamic mode)
    30,             # max_fps (dynamic mode)
    False,          # run detection
    "yolo",         # detection method
    None,           # mask_file_path (optional)
    api_name="/process_video"
)
stats = json.loads(stats_json)
print(f"Compressed video: {output_video}")
print(f"Total frames: {stats['total_frames']}")
```
---

## Important Notes

### File Handling

Always wrap file paths with `handle_file()` when using `gradio_client`:

```python
from gradio_client import handle_file

# ✅ Correct
client.predict(handle_file("image.jpg"), ...)

# ❌ Incorrect - will fail with validation error
client.predict("image.jpg", ...)
```

### Detection Output Format

All detection endpoints return JSON strings with this structure:

```python
import json

detections = json.loads(detections_json)
# Each detection has:
# - label: str (class name)
# - score: float (confidence 0-1)
# - bbox_xyxy: list[float] (bounding box [x1, y1, x2, y2])
```
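As a quick local sanity check, a payload in this format can be parsed and filtered without calling the API. The helpers below (`parse_detections`, `bbox_area`) are illustrative sketches, not part of the API:

```python
import json

def parse_detections(detections_json, min_score=0.0):
    """Parse a detection JSON string, keeping entries at or above min_score."""
    return [d for d in json.loads(detections_json) if d["score"] >= min_score]

def bbox_area(det):
    """Area of a detection's [x1, y1, x2, y2] box in pixels."""
    x1, y1, x2, y2 = det["bbox_xyxy"]
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

# Sample payload in the documented format
sample = json.dumps([
    {"label": "car", "score": 0.91, "bbox_xyxy": [10.0, 20.0, 110.0, 70.0]},
    {"label": "person", "score": 0.18, "bbox_xyxy": [0.0, 0.0, 5.0, 5.0]},
])
kept = parse_detections(sample, min_score=0.25)
print(kept[0]["label"], bbox_area(kept[0]))  # car 5000.0
```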
### Open-Vocabulary Detectors

The following detectors require a `classes` parameter:

- `yolo_world` - YOLO-World
- `grounding_dino` - Grounding DINO

Closed-vocabulary detectors (`yolo`, `detr`, `faster_rcnn`, etc.) use pretrained COCO classes and ignore the `classes` parameter.
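A small client-side guard can catch the empty-classes mistake before a request is sent. This is a convenience sketch of ours, not part of the API:

```python
OPEN_VOCAB_DETECTORS = {"yolo_world", "grounding_dino"}

def check_detector_args(method, classes):
    """Raise early if an open-vocabulary detector is given no classes."""
    if method in OPEN_VOCAB_DETECTORS and not classes.strip():
        raise ValueError(
            f"Detector '{method}' is open-vocabulary and needs a "
            "comma-separated 'classes' string, e.g. 'hat, backpack'."
        )

check_detector_args("yolo", "")           # fine: closed-vocab ignores classes
check_detector_args("yolo_world", "hat")  # fine: classes provided
```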
---

## Image API Endpoints

### 1. `/segment` - Generate ROI Mask

Segments an image to create a Region of Interest (ROI) mask.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | Image | required | Input image file |
| `prompt` | str | `"object"` | Comma-separated classes or natural language prompt |
| `method` | str | `"sam3"` | Segmentation method (see [methods](#segmentation-methods)) |
| `return_overlay` | bool | `False` | If `True`, returns image with ROI highlighted instead of mask |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `result_image` | Image | Grayscale mask OR image with ROI overlay (if `return_overlay=True`) |
| `roi_coverage` | float | Fraction of image covered by ROI (0.0-1.0) |
| `classes_used` | str | JSON list of classes/prompts used |

**Example:**

```python
# Get binary mask (default)
mask, coverage, classes = client.predict(
    handle_file("car_scene.jpg"),
    "car, road",
    "sam3",
    False,  # return_overlay
    api_name="/segment"
)
print(f"ROI covers {coverage*100:.2f}% of image")

# Get image with ROI highlighted
highlighted, coverage, classes = client.predict(
    handle_file("car_scene.jpg"),
    "car, road",
    "sam3",
    True,  # return_overlay=True
    api_name="/segment"
)
```

---

### 2. `/compress` - Compress Image

Compresses an image using TIC VAE, optionally with an ROI mask for variable quality.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | Image | required | Input image file |
| `mask_image` | Image | `None` | ROI mask (white=ROI, black=background) |
| `quality` | int | `4` | Quality level 1-5 |
| `sigma` | float | `0.3` | Background preservation (0.01-1.0) |

**Quality Levels:**

| Level | Lambda | Description |
|-------|--------|-------------|
| 1 | 0.0035 | Smallest file |
| 2 | 0.013 | Smaller file |
| 3 | 0.025 | Balanced |
| 4 | 0.0483 | Higher quality (default) |
| 5 | 0.0932 | Best quality |
**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `compressed_image` | Image | Compressed output image |
| `bpp` | float | Bits per pixel |
| `compression_ratio` | float | Compression ratio (24/bpp) |
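Since uncompressed RGB is 24 bits per pixel, the reported ratio can be reproduced from the bpp value alone. A one-line sketch (local arithmetic, not an endpoint):

```python
def compression_ratio(bpp, source_bpp=24.0):
    """Compression ratio relative to uncompressed 24-bit RGB."""
    return source_bpp / bpp

print(compression_ratio(0.3))   # 80.0
print(compression_ratio(24.0))  # 1.0 (no compression)
```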
**Example:**

```python
# Compress without mask (uniform quality)
compressed, bpp, ratio = client.predict(
    handle_file("image.jpg"),
    None,  # no mask
    4,     # quality
    0.3,   # sigma (ignored without mask)
    api_name="/compress"
)

# Compress with ROI mask
mask, _, _ = client.predict(handle_file("image.jpg"), "person", "yolo", False, api_name="/segment")
compressed, bpp, ratio = client.predict(
    handle_file("image.jpg"),
    handle_file(mask),
    4,
    0.2,  # aggressive background compression
    api_name="/compress"
)
```

---

### 3. `/detect` - Object Detection

Runs object detection on an image and returns detection results as JSON.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | Image | required | Input image file |
| `method` | str | `"yolo"` | Detection method (see [methods](#detection-methods)) |
| `classes` | str | `""` | Comma-separated classes (required for open-vocab detectors) |
| `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `detections_json` | str | JSON string of detection results |

**Example - Closed-Vocabulary:**

```python
import json

# YOLO detection (COCO classes)
dets_json = client.predict(
    handle_file("street_scene.jpg"),
    "yolo",
    "",    # no classes needed
    0.25,
    api_name="/detect"
)
detections = json.loads(dets_json)
for det in detections:
    print(f"{det['label']}: {det['score']:.2f}")
```

**Example - Open-Vocabulary:**

```python
# YOLO-World with custom classes
dets_json = client.predict(
    handle_file("image.jpg"),
    "yolo_world",
    "hat, backpack, umbrella",  # custom classes required
    0.25,
    api_name="/detect"
)
```

---

### 3.1. `/detect_overlay` - Detection with Visualization

Runs object detection and returns the image with bounding boxes drawn.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | Image | required | Input image file |
| `method` | str | `"yolo"` | Detection method (see [methods](#detection-methods)) |
| `classes` | str | `""` | Comma-separated classes (required for open-vocab detectors) |
| `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `result_image` | Image | Image with detection bounding boxes |
| `detections_json` | str | JSON string of detection results |

**Example:**

```python
import json

# Get image with detection boxes
result_img, dets_json = client.predict(
    handle_file("street_scene.jpg"),
    "yolo",
    "",
    0.25,
    api_name="/detect_overlay"
)
# result_img is a file path to the image with boxes drawn
print(f"Image with boxes: {result_img}")
detections = json.loads(dets_json)
```

---

### 4. `/process` - Full Image Pipeline

Runs the complete pipeline: segmentation → compression → optional detection.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | Image | required | Input image file |
| `prompt` | str | `"object"` | Segmentation prompt/classes |
| `segmentation_method` | str | `"sam3"` | ROI segmentation method |
| `quality` | int | `4` | Compression quality (1-5) |
| `sigma` | float | `0.3` | Background preservation (0.01-1.0) |
| `run_detection` | bool | `False` | Whether to run detection on output |
| `detection_method` | str | `"yolo"` | Detector to use |
| `detection_classes` | str | `""` | Classes for open-vocab detectors |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `compressed_image` | Image | Compressed output image |
| `mask_image` | Image | Generated ROI mask |
| `bpp` | float | Bits per pixel |
| `compression_ratio` | float | Compression ratio |
| `roi_coverage` | float | ROI coverage fraction (0.0-1.0) |
| `detections_json` | str | JSON detections (empty list if `run_detection=False`) |

**Example:**

```python
import json

compressed, mask, bpp, ratio, coverage, dets_json = client.predict(
    handle_file("street.jpg"),
    "car, person, road",
    "sam3",
    4,
    0.3,
    True,  # run detection
    "yolo",
    "",
    api_name="/process"
)
print(f"ROI Coverage: {coverage*100:.2f}%")
print(f"Compression: {bpp:.4f} bpp ({ratio:.2f}x)")
print(f"Detections: {len(json.loads(dets_json))}")
```

---

## Video API Endpoints

### 1. `/segment_video` - Segment Video

Segments a video to find ROI regions, returning either a mask file or an overlay video.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_path` | Video | required | Input video file |
| `prompt` | str | `"object"` | Comma-separated classes or natural language prompt |
| `method` | str | `"sam3"` | Segmentation method |
| `return_overlay` | bool | `False` | If `True`, returns video with ROI highlighted |
| `output_fps` | float | `15.0` | Output framerate (max 30) |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `result_path` | File/Video | Mask file (NPZ) OR video with ROI overlay |
| `stats_json` | str | JSON with frame count, coverage, and classes |

**Example:**

```python
import json

# Get mask file for reuse in compression
mask_file, stats_json = client.predict(
    handle_file("video.mp4"),
    "person, car",
    "sam3",
    False,  # return masks file
    15.0,   # fps
    api_name="/segment_video"
)
stats = json.loads(stats_json)
print(f"Processed {stats['total_frames']} frames")
print(f"Avg ROI coverage: {stats['avg_roi_coverage']*100:.2f}%")

# Get video with ROI overlay for visualization
overlay_video, _ = client.predict(
    handle_file("video.mp4"),
    "person, car",
    "sam3",
    True,   # return overlay video
    15.0,
    api_name="/segment_video"
)
```
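The mask file returned when `return_overlay=False` is an NPZ archive, but its internal key names are not specified here. Before reusing it outside `/compress_video`, a quick inspection with NumPy is the safest first step (this assumes only that the file is a standard `.npz` archive):

```python
import numpy as np

def inspect_mask_file(path):
    """List array names and shapes stored in an NPZ mask archive."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}

# e.g. shapes = inspect_mask_file(mask_file)
```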
---

### 2. `/compress_video` - Compress Video

Compresses a video with optional ROI mask preservation.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_path` | Video | required | Input video file |
| `mask_file_path` | str | `None` | Path to pre-computed masks (from `/segment_video`) |
| `quality` | int | `4` | Quality level (1-5) |
| `sigma` | float | `0.3` | Background preservation (0.01-1.0) |
| `output_fps` | float | `15.0` | Target output framerate |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `compressed_video` | Video | Compressed output video |
| `stats_json` | str | JSON with compression statistics |

**Example:**

```python
import json

# First, segment to get masks
mask_file, _ = client.predict(
    handle_file("video.mp4"), "person", "sam3", False, 15.0,
    api_name="/segment_video"
)

# Then compress with cached masks (3-5x faster!)
compressed, stats_json = client.predict(
    handle_file("video.mp4"),
    mask_file,  # reuse masks
    4,          # quality
    0.3,        # sigma
    15.0,       # fps
    api_name="/compress_video"
)
stats = json.loads(stats_json)
print(f"Compression ratio: {stats['compression_ratio']}x")
print(f"Total size: {stats['total_size_kb']} KB")
```

---

### 3. `/detect_video` - Video Detection

Runs object detection on each frame of a video.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_path` | Video | required | Input video file |
| `method` | str | `"yolo"` | Detection method |
| `classes` | str | `""` | Comma-separated classes (required for open-vocab) |
| `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) |
| `return_overlay` | bool | `False` | If `True`, returns video with detection boxes |
| `output_fps` | float | `15.0` | Output framerate (max 30) |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `result_video` | Video | Video with detection boxes (if `return_overlay=True`), None otherwise |
| `detections_json` | str | JSON with per-frame detections |

**Example:**

```python
import json

# Get per-frame detections JSON
_, dets_json = client.predict(
    handle_file("video.mp4"),
    "yolo",
    "",
    0.25,
    False,  # return JSON only
    15.0,
    api_name="/detect_video"
)
data = json.loads(dets_json)
print(f"Total detections: {data['total_detections']}")
print(f"Avg per frame: {data['avg_detections_per_frame']}")

# Get video with detection overlays
det_video, _ = client.predict(
    handle_file("video.mp4"),
    "yolo",
    "",
    0.25,
    True,   # return overlay video
    15.0,
    api_name="/detect_video"
)
```

---

### 4. `/process_video` - Full Video Pipeline

Processes a video with ROI-based compression (segment → compress), with optional detection.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_path` | Video | required | Input video file |
| `prompt` | str | `"object"` | Segmentation prompt/classes |
| `segmentation_method` | str | `"sam3"` | ROI segmentation method |
| `mode` | str | `"static"` | `"static"` or `"dynamic"` |
| `quality` | int | `4` | Quality level 1-5 (static mode) |
| `sigma` | float | `0.3` | Background preservation (static mode) |
| `output_fps` | float | `15.0` | Target framerate (static mode) |
| `bandwidth_kbps` | float | `500.0` | Target bandwidth (dynamic mode) |
| `min_fps` | float | `5.0` | Minimum framerate (dynamic mode) |
| `max_fps` | float | `30.0` | Maximum framerate (dynamic mode) |
| `aggressiveness` | float | `0.5` | Bandwidth savings strategy (dynamic mode): `0.0` = use full bandwidth (high FPS always), `0.5` = moderate savings, `1.0` = maximum savings (aggressive FPS reduction for low motion) |
| `run_detection` | bool | `False` | Whether to run detection/tracking |
| `detection_method` | str | `"yolo"` | Detector to use |
| `mask_file_path` | str | `None` | Path to pre-computed masks (skips segmentation) |

**Returns:**

| Output | Type | Description |
|--------|------|-------------|
| `output_video` | Video | Compressed video |
| `stats_json` | str | JSON with detailed statistics |

**Example - Static Mode:**

```python
import json

output, stats_json = client.predict(
    handle_file("video.mp4"),
    "person, car",
    "sam3",
    "static",
    4, 0.3, 15.0,  # static: quality, sigma, fps
    500, 5, 30,    # dynamic: bandwidth, min_fps, max_fps (ignored)
    False, "yolo", None,
    api_name="/process_video"
)
stats = json.loads(stats_json)
print(f"Processed {stats['total_frames']} frames")
```

**Example - Dynamic Mode:**

```python
output, stats_json = client.predict(
    handle_file("video.mp4"),
    "person",
    "yolo",
    "dynamic",
    4, 0.3, 15.0,  # static settings (ignored)
    750,           # target bandwidth 750 kbps
    8,             # min FPS
    30,            # max FPS
    True, "yolo", None,
    api_name="/process_video"
)
```
### Segmentation Methods

**Pixel-Perfect Segmentation:**

| Method | Description | Classes |
|--------|-------------|---------|
| `sam3` | Prompt-based (natural language) | Any text prompt |
| `yolo` | YOLO instance segmentation | 80 COCO classes |
| `segformer` | Cityscapes semantic segmentation | 19 classes |
| `mask2former` | Swin-based panoptic/semantic | 133 COCO / 150 ADE20K |
| `maskrcnn` | ResNet50-FPN instance segmentation | 80 COCO classes |

**Fast Bbox-Based Segmentation:**

| Method | Description | Classes |
|--------|-------------|---------|
| `fake_yolo` | Fast bbox-based (YOLO + ByteTrack) | 80 COCO classes |
| `fake_yolo_botsort` | Fast bbox-based (YOLO + BoTSORT) | 80 COCO classes |
| `fake_detr` | Fast bbox-based (DETR + ByteTrack) | 80 COCO classes |
| `fake_fasterrcnn` | Fast bbox-based (Faster R-CNN + ByteTrack) | 80 COCO classes |
| `fake_retinanet` | Fast bbox-based (RetinaNet + ByteTrack) | 80 COCO classes |
| `fake_fcos` | Fast bbox-based (FCOS + ByteTrack) | 80 COCO classes |
| `fake_deformable_detr` | Fast bbox-based (Deformable DETR + ByteTrack) | 80 COCO classes |
| `fake_grounding_dino` | Fast bbox-based (Grounding DINO + ByteTrack) | Requires prompt |

**Note:** `fake_*` methods create rectangular masks from detection bounding boxes with object tracking. They are faster than pixel-perfect segmentation and suitable for video when precise boundaries aren't critical.

### Detection Methods

**Closed-Vocabulary (COCO pretrained):**

| Method | Description |
|--------|-------------|
| `yolo` | Ultralytics YOLO |
| `detr` | Facebook DETR |
| `faster_rcnn` | Faster R-CNN |
| `retinanet` | RetinaNet |
| `fcos` | FCOS |
| `ssd` | SSD300 |
| `deformable_detr` | Deformable DETR |

**Open-Vocabulary (requires `classes` parameter):**

| Method | Description |
|--------|-------------|
| `yolo_world` | YOLO-World |
| `grounding_dino` | Grounding DINO |

### COCO Classes (80)

```
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat,
traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat,
dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella,
handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite,
baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle,
wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange,
broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant,
bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone,
microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors,
teddy bear, hair drier, toothbrush
```
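When targeting a closed-vocabulary method, it can help to check a prompt against this list locally before calling the API. The sketch below uses a deliberately abbreviated class set for brevity; in practice the full 80-class list above would be pasted in:

```python
# Abbreviated set for illustration; the full 80-class list is shown above.
COCO_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "dog", "cat",
}

def unsupported_classes(prompt):
    """Return comma-separated prompt entries not found in the COCO set."""
    requested = [c.strip().lower() for c in prompt.split(",") if c.strip()]
    return [c for c in requested if c not in COCO_CLASSES]

print(unsupported_classes("car, person, spaceship"))  # ['spaceship']
```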
### Cityscapes Classes (19)

```
road, sidewalk, building, wall, fence, pole, traffic light, traffic sign,
vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle,
bicycle
```

---

## Error Handling

```python
try:
    result = client.predict(
        handle_file("image.jpg"),
        ...,
        api_name="/endpoint"
    )
except Exception as e:
    print(f"API Error: {e}")
```

**Common Errors:**

| Error | Cause | Solution |
|-------|-------|----------|
| Validation error for ImageData | Missing `handle_file()` | Wrap file paths with `handle_file()` |
| File does not exist | Invalid path | Check that the file path is correct |
| Empty detection classes | Open-vocab detector without classes | Provide classes for `yolo_world`, `grounding_dino` |
| GPU quota exceeded | HF Spaces limit | Wait and retry (see below) |

---

## Handling GPU Quota on HF Spaces

When using Hugging Face Spaces with ZeroGPU, you may encounter quota limits:

```
You have exceeded your GPU quota (60s requested vs. 0s left). Try again in 0:05:30
```

### Automatic Retry with Backoff

```python
import time
import re

def extract_wait_time(error_msg):
    """Extract the wait time from a GPU quota error message."""
    match = re.search(r'Try again in (\d+):(\d+)(?::(\d+))?', error_msg)
    if match:
        if match.group(3):  # HH:MM:SS
            return int(match.group(1)) * 3600 + int(match.group(2)) * 60 + int(match.group(3))
        else:  # MM:SS
            return int(match.group(1)) * 60 + int(match.group(2))
    return 60

def call_with_retry(client, *args, api_name, max_retries=5):
    """Call the API with exponential backoff on quota errors."""
    delay = 10
    for attempt in range(max_retries):
        try:
            return client.predict(*args, api_name=api_name)
        except Exception as e:
            error_msg = str(e)
            if "exceeded your GPU quota" in error_msg:
                wait_time = extract_wait_time(error_msg)
                actual_delay = max(delay, wait_time + 5)
                print(f"⏳ GPU quota exhausted. Waiting {actual_delay}s... (attempt {attempt + 1})")
                time.sleep(actual_delay)
                delay *= 2
            else:
                raise
    raise Exception("Max retries reached")

# Usage
result = call_with_retry(
    client,
    handle_file("image.jpg"),
    "car", "sam3", 4, 0.3, False, "yolo", "",
    api_name="/process"
)
```

---

## Using with cURL

### Upload File First

```bash
# Upload image
FILE_URL=$(curl -s -X POST http://localhost:7860/upload \
  -F "files=@image.jpg" | \
  python3 -c "import sys, json; print(json.load(sys.stdin)[0])")
```

### Call Endpoints

```bash
# Segment
curl -X POST http://localhost:7860/api/segment \
  -H "Content-Type: application/json" \
  -d "{\"data\": [\"$FILE_URL\", \"car, person\", \"sam3\", false]}"

# Compress (no mask)
curl -X POST http://localhost:7860/api/compress \
  -H "Content-Type: application/json" \
  -d "{\"data\": [\"$FILE_URL\", null, 4, 0.3]}"

# Detect
curl -X POST http://localhost:7860/api/detect \
  -H "Content-Type: application/json" \
  -d "{\"data\": [\"$FILE_URL\", \"yolo\", \"\", 0.25]}"
# Full pipeline
curl -X POST http://localhost:7860/api/process \
  -H "Content-Type: application/json" \
  -d "{\"data\": [\"$FILE_URL\", \"car, person\", \"sam3\", 4, 0.3, true, \"yolo\", \"\"]}"
```

---

## Performance Guide

### Choosing Segmentation Methods

**Use Pixel-Perfect Segmentation when:**

- You need precise object boundaries
- You are working with single images or short videos
- Quality matters more than speed
- Compute time/power is not constrained

**Use Fast Segmentation (`fake_*`) when:**

- You are processing long videos or real-time streams
- Speed is critical (2-3x faster)
- Rectangular masks are acceptable
- You need temporal consistency (tracking maintains object IDs)
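The guidance above can be folded into a simple default chooser. The heuristic below is ours, not part of the API; it only picks among method names documented in the Class Reference:

```python
def pick_segmentation_method(is_video, need_precise_boundaries, freeform_prompt=False):
    """Heuristic default segmentation method based on the guidance above."""
    if freeform_prompt:
        return "sam3"        # only listed method accepting arbitrary text prompts
    if is_video and not need_precise_boundaries:
        return "fake_yolo"   # fast bbox masks with ByteTrack tracking
    return "yolo"            # pixel-perfect instance segmentation

print(pick_segmentation_method(is_video=True, need_precise_boundaries=False))  # fake_yolo
```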
### Performance Benchmarks

**Video Processing (480p, 30 frames):**

| Method | Speed | Use Case |
|--------|-------|----------|
| `fake_yolo` | ~70 fps | Real-time video, fastest |
| `fake_yolo_botsort` | ~65 fps | Real-time with robust tracking |
| `fake_detr` | ~40 fps | Good speed + accuracy balance |
| `fake_fasterrcnn` | ~30 fps | Accurate detection |
| `yolo` (pixel-perfect) | ~30 fps | Instance segmentation |
| `sam3` | ~15 fps | Prompt-based, highest flexibility |
| `mask2former` | ~20 fps | Panoptic segmentation |

**Detection Performance (with batch support):**

| Detector | Single-Frame | Batch (30 frames) | Speedup |
|----------|--------------|-------------------|---------|
| YOLO26x | ~40 fps | ~70 fps | 1.75x |
| DETR | ~15 fps | ~40 fps | 2.67x |
| Faster R-CNN | ~12 fps | ~30 fps | 2.50x |
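The Speedup column is just batch throughput divided by single-frame throughput; the table values can be reproduced directly:

```python
# (single-frame fps, batch fps) pairs from the table above
benchmarks = {
    "YOLO26x": (40, 70),
    "DETR": (15, 40),
    "Faster R-CNN": (12, 30),
}
speedups = {name: round(batch / single, 2) for name, (single, batch) in benchmarks.items()}
print(speedups)  # {'YOLO26x': 1.75, 'DETR': 2.67, 'Faster R-CNN': 2.5}
```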
### Example: Fast Video Processing

```python
from gradio_client import Client, handle_file
import json
import time

client = Client("http://localhost:7860")

# Method 1: Fast fake segmentation (recommended for video)
start = time.time()
output1, stats1 = client.predict(
    handle_file("long_video.mp4"),
    "person, car",
    "fake_yolo",  # Fast detection + tracking
    "static",
    4,
    0.3,
    15.0,
    500, 5, 30, False, "yolo", None,
    api_name="/process_video"
)
fast_time = time.time() - start

# Method 2: Pixel-perfect segmentation
start = time.time()
output2, stats2 = client.predict(
    handle_file("long_video.mp4"),
    "person, car",
    "yolo",  # Pixel-perfect YOLO26x-seg
    "static",
    4,
    0.3,
    15.0,
    500, 5, 30, False, "yolo", None,
    api_name="/process_video"
)
perfect_time = time.time() - start

stats1_data = json.loads(stats1)
stats2_data = json.loads(stats2)
print(f"Fast segmentation: {fast_time:.2f}s")
print(f"Pixel-perfect: {perfect_time:.2f}s")
print(f"Speedup: {perfect_time/fast_time:.2f}x faster")
print(f"Compression ratio (fast): {stats1_data['compression_ratio']:.2f}x")
print(f"Compression ratio (perfect): {stats2_data['compression_ratio']:.2f}x")
```

### Example: Tracker Comparison

```python
# Test different trackers with the same detector
trackers = {
    "ByteTrack (default)": "fake_yolo",
    "BoTSORT": "fake_yolo_botsort",
}
for name, method in trackers.items():
    output, stats = client.predict(
        handle_file("test_video.mp4"),
        "person",
        method,
        "static",
        4, 0.3, 15.0,
        500, 5, 30, False, "yolo", None,
        api_name="/process_video"
    )
    stats_data = json.loads(stats)
    print(f"{name}: {stats_data['avg_roi_coverage']*100:.2f}% avg coverage")
```

---

## Example Scripts

### Batch Image Processing
```python
from gradio_client import Client, handle_file
from pathlib import Path
import shutil

client = Client("http://localhost:7860")

output_dir = Path("compressed_output")
output_dir.mkdir(exist_ok=True)

for img_path in Path("images").glob("*.jpg"):
    print(f"Processing {img_path.name}...")
    compressed, mask, bpp, ratio, coverage, _ = client.predict(
        handle_file(str(img_path)),
        "car, person",
        "sam3",
        4, 0.3,
        False, "yolo", "",
        api_name="/process"
    )
    # Save compressed image
    output_path = output_dir / f"compressed_{img_path.name}"
    shutil.copyfile(compressed, output_path)
    print(f"  BPP: {bpp:.4f}, Ratio: {ratio:.2f}x, ROI: {coverage*100:.2f}%")
```
### Video Processing with Mask Caching

```python
from gradio_client import Client, handle_file
import json

client = Client("http://localhost:7860")
video_path = "input_video.mp4"

# Step 1: Segment video (one-time cost)
mask_file, seg_stats = client.predict(
    handle_file(video_path),
    "person, car",
    "sam3",
    False,  # return mask file
    15.0,
    api_name="/segment_video"
)
print(f"Segmented video, masks saved to: {mask_file}")

# Step 2: Compress with different settings, reusing masks
for quality in [3, 4, 5]:
    compressed, comp_stats = client.predict(
        handle_file(video_path),
        mask_file,  # reuse cached masks
        quality,
        0.3,
        15.0,
        api_name="/compress_video"
    )
    stats = json.loads(comp_stats)
    print(f"Quality {quality}: {stats['compression_ratio']}x compression")
```

### Detection Comparison (Original vs Compressed)

```python
from gradio_client import Client, handle_file
import json

client = Client("http://localhost:7860")
image = "street_scene.jpg"

# Detect on original
dets_orig = client.predict(
    handle_file(image), "yolo", "", 0.25,
    api_name="/detect"
)
orig_count = len(json.loads(dets_orig))
print(f"Original: {orig_count} detections")

# Compress and detect
compressed, _, bpp, ratio, _, dets_comp = client.predict(
    handle_file(image),
    "car, person, road",
    "sam3",
    4, 0.3,
    True, "yolo", "",
    api_name="/process"
)
comp_count = len(json.loads(dets_comp))
retention = comp_count / orig_count * 100 if orig_count else 0
print(f"Compressed ({ratio:.2f}x): {comp_count} detections")
print(f"Detection retention: {retention:.1f}%")
```

---

## Additional Resources

- **Web UI**: Visit `http://localhost:7860` for the interactive interface
- **GitHub**: See the repository for source code and examples
- **Model Checkpoints**: Available in the `checkpoints/` directory
- **Test Images**: Sample images in the `data/images/` directory