Raheeb Hassan
| # API Documentation | |
| This document describes the Gradio API endpoints exposed by the ROI-VAE image and video compression application. The API allows programmatic access to segmentation, compression, detection, and full pipeline processing for both images and videos. | |
| **Live Demo:** https://biaslab2025-contextual-communication-demo.hf.space | |
| ## Table of Contents | |
| - [Quick Start](#quick-start) | |
| - [Important Notes](#important-notes) | |
| - [Image API Endpoints](#image-api-endpoints) | |
| - [/segment](#1-segment---generate-roi-mask) | |
| - [/compress](#2-compress---compress-image) | |
| - [/detect](#3-detect---object-detection) | |
| - [/detect_overlay](#31-detect_overlay---detection-with-visualization) | |
| - [/process](#4-process---full-image-pipeline) | |
| - [Video API Endpoints](#video-api-endpoints) | |
| - [/segment_video](#1-segment_video---segment-video) | |
| - [/compress_video](#2-compress_video---compress-video) | |
| - [/detect_video](#3-detect_video---video-detection) | |
| - [/process_video](#4-process_video---full-video-pipeline) | |
| - [Streaming Video API Endpoints](#streaming-video-api-endpoints) | |
| - [/stream_process_video](#1-stream_process_video---full-streaming-pipeline) | |
| - [/stream_compress_video](#2-stream_compress_video---simplified-streaming-compression) | |
| - [Class Reference](#class-reference) | |
| - [Error Handling](#error-handling) | |
| - [GPU Quota Handling](#handling-gpu-quota-on-hf-spaces) | |
| - [cURL Examples](#using-with-curl) | |
| - [Example Scripts](#example-scripts) | |
| --- | |
| ## Quick Start | |
| ### Installation | |
| ```bash | |
| pip install gradio_client | |
| ``` | |
| ### Image Processing | |
| ```python | |
| from gradio_client import Client, handle_file | |
| # Connect to the API | |
| client = Client("https://biaslab2025-contextual-communication-demo.hf.space") | |
| # Or local: client = Client("http://localhost:7860") | |
| # Full pipeline: segment → compress → detect | |
| compressed, mask, bpp, ratio, coverage, detections_json = client.predict( | |
| handle_file("path/to/image.jpg"), | |
| "car, person", # segmentation prompt | |
| "sam3", # segmentation method | |
| 4, # quality level (1-5) | |
| 0.3, # sigma (background compression) | |
| True, # run detection | |
| "yolo", # detection method | |
| "", # detection classes (empty for closed-vocab) | |
| api_name="/process" | |
| ) | |
| print(f"Compression: {bpp:.4f} bpp ({ratio:.2f}x)") | |
| ``` | |
| ### Video Processing | |
| ```python | |
| from gradio_client import Client, handle_file | |
| import json | |
| client = Client("http://localhost:7860") | |
| # Full pipeline with static settings | |
| output_video, stats_json = client.predict( | |
| handle_file("path/to/video.mp4"), | |
| "person, car", # segmentation classes | |
| "sam3", # segmentation method | |
| "static", # mode: "static" or "dynamic" | |
| 4, # quality level (1-5) | |
| 0.3, # sigma | |
| 15.0, # output FPS | |
| 500, # bandwidth (dynamic mode) | |
| 5, # min_fps (dynamic mode) | |
| 30, # max_fps (dynamic mode) | |
| 0.5, # aggressiveness (dynamic mode) | |
| False, # run detection | |
| "yolo", # detection method | |
| None, # mask_file_path (optional) | |
| api_name="/process_video" | |
| ) | |
| stats = json.loads(stats_json) | |
| print(f"Compressed video: {output_video}") | |
| print(f"Total frames: {stats['total_frames']}") | |
| ``` | |
| --- | |
| ## Important Notes | |
| ### File Handling | |
| Always wrap file paths with `handle_file()` when using `gradio_client`: | |
| ```python | |
| from gradio_client import handle_file | |
| # ✅ Correct | |
| client.predict(handle_file("image.jpg"), ...) | |
| # ❌ Incorrect - will fail with validation error | |
| client.predict("image.jpg", ...) | |
| ``` | |
| ### Detection Output Format | |
| All detection endpoints return JSON strings with this structure: | |
| ```python | |
| import json | |
| detections = json.loads(detections_json) | |
| # Each detection has: | |
| # - label: str (class name) | |
| # - score: float (confidence 0-1) | |
| # - bbox_xyxy: list[float] (bounding box [x1, y1, x2, y2]) | |
| ``` | |
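As a minimal sketch of working with this structure, the helper below keeps detections at or above a confidence threshold and computes each box area from `bbox_xyxy`. The name `filter_detections` and the sample payload are illustrative assumptions, not part of the API.

```python
import json


def filter_detections(detections_json, min_score=0.5):
    """Return (label, score, area) tuples for detections with score >= min_score."""
    results = []
    for det in json.loads(detections_json):
        if det["score"] < min_score:
            continue
        x1, y1, x2, y2 = det["bbox_xyxy"]
        # Clamp to zero so a degenerate box never yields a negative area
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        results.append((det["label"], det["score"], area))
    return results


# Hypothetical payload in the documented format
sample = json.dumps([
    {"label": "car", "score": 0.91, "bbox_xyxy": [10.0, 20.0, 110.0, 70.0]},
    {"label": "person", "score": 0.32, "bbox_xyxy": [5.0, 5.0, 25.0, 65.0]},
])
print(filter_detections(sample))  # [('car', 0.91, 5000.0)]
```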
| ### Open-Vocabulary Detectors | |
| The following detectors require a `classes` parameter: | |
| - `yolo_world` - YOLO-World | |
| - `grounding_dino` - Grounding DINO | |
| Closed-vocabulary detectors (`yolo`, `detr`, `faster_rcnn`, etc.) use pretrained COCO classes and ignore the `classes` parameter. | |
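One way to catch a missing `classes` value before calling the API is a small client-side check. This is a sketch only: `resolve_detection_classes` is a hypothetical helper, and the set of open-vocabulary detectors is taken from the list above.

```python
OPEN_VOCAB_DETECTORS = {"yolo_world", "grounding_dino"}


def resolve_detection_classes(method, classes=""):
    """Return the classes string to send, or raise if an open-vocab detector has none."""
    # Normalize "hat ,backpack,, " into "hat, backpack"
    cleaned = ", ".join(c.strip() for c in classes.split(",") if c.strip())
    if method in OPEN_VOCAB_DETECTORS:
        if not cleaned:
            raise ValueError(f"{method} requires a non-empty classes string")
        return cleaned
    # Closed-vocabulary detectors ignore classes, so send an empty string
    return ""


print(resolve_detection_classes("yolo_world", " hat ,backpack,, "))  # hat, backpack
print(repr(resolve_detection_classes("yolo", "hat")))  # ''
```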
| --- | |
| ## Image API Endpoints | |
| ### 1. `/segment` - Generate ROI Mask | |
| Segments an image to create a Region of Interest (ROI) mask. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `image` | Image | required | Input image file | | |
| | `prompt` | str | `"object"` | Comma-separated classes or natural language prompt | | |
| | `method` | str | `"sam3"` | Segmentation method (see [methods](#segmentation-methods)) | | |
| | `return_overlay` | bool | `False` | If `True`, returns image with ROI highlighted instead of mask | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `result_image` | Image | Grayscale mask OR image with ROI overlay (if `return_overlay=True`) | | |
| | `roi_coverage` | float | Fraction of image covered by ROI (0.0-1.0) | | |
| | `classes_used` | str | JSON list of classes/prompts used | | |
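To make `roi_coverage` concrete, the sketch below computes the fraction of ROI pixels in a small grayscale mask. It assumes pixels above a mid-gray threshold count as ROI; the server's actual thresholding is not documented here, and `roi_coverage_from_mask` is a hypothetical name.

```python
def roi_coverage_from_mask(mask_rows, threshold=127):
    """Fraction of pixels whose grayscale value exceeds threshold (assumed ROI rule)."""
    total = sum(len(row) for row in mask_rows)
    if total == 0:
        return 0.0
    roi = sum(1 for row in mask_rows for px in row if px > threshold)
    return roi / total


# Tiny 3x4 mask: white (255) marks the ROI, black (0) the background
mask = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]
print(f"ROI covers {roi_coverage_from_mask(mask) * 100:.2f}% of image")  # 33.33%
```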
| **Example:** | |
| ```python | |
| # Get binary mask (default) | |
| mask, coverage, classes = client.predict( | |
| handle_file("car_scene.jpg"), | |
| "car, road", | |
| "sam3", | |
| False, # return_overlay | |
| api_name="/segment" | |
| ) | |
| print(f"ROI covers {coverage*100:.2f}% of image") | |
| # Get image with ROI highlighted | |
| highlighted, coverage, classes = client.predict( | |
| handle_file("car_scene.jpg"), | |
| "car, road", | |
| "sam3", | |
| True, # return_overlay=True | |
| api_name="/segment" | |
| ) | |
| ``` | |
| --- | |
| ### 2. `/compress` - Compress Image | |
| Compresses an image using TIC VAE, optionally with an ROI mask for variable quality. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `image` | Image | required | Input image file | | |
| | `mask_image` | Image | `None` | ROI mask (white=ROI, black=background) | | |
| | `quality` | int | `4` | Quality level 1-5 | | |
| | `sigma` | float | `0.3` | Background preservation (0.01-1.0) | | |
| **Quality Levels:** | |
| | Level | Lambda | Description | | |
| |-------|--------|-------------| | |
| | 1 | 0.0035 | Smallest file | | |
| | 2 | 0.013 | Smaller file | | |
| | 3 | 0.025 | Balanced | | |
| | 4 | 0.0483 | Higher quality (default) | | |
| | 5 | 0.0932 | Best quality | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `compressed_image` | Image | Compressed output image | | |
| | `bpp` | float | Bits per pixel | | |
| | `compression_ratio` | float | Compression ratio (24/bpp) | | |
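The relationship between `bpp` and `compression_ratio` can be checked by hand. The sketch below applies the documented `24/bpp` formula and, as an added assumption, estimates the payload size as `bpp * width * height / 8` bytes; both function names are illustrative.

```python
def compression_ratio(bpp, raw_bits_per_pixel=24):
    """Ratio of uncompressed RGB (24 bits per pixel) to the compressed bit rate."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return raw_bits_per_pixel / bpp


def estimated_size_bytes(bpp, width, height):
    """Approximate compressed payload size for an image of the given dimensions."""
    return bpp * width * height / 8


print(compression_ratio(0.25))                 # 96.0
print(estimated_size_bytes(0.25, 1920, 1080))  # 64800.0
```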
| **Example:** | |
| ```python | |
| # Compress without mask (uniform quality) | |
| compressed, bpp, ratio = client.predict( | |
| handle_file("image.jpg"), | |
| None, # no mask | |
| 4, # quality | |
| 0.3, # sigma (ignored without mask) | |
| api_name="/compress" | |
| ) | |
| # Compress with ROI mask | |
| mask, _, _ = client.predict(handle_file("image.jpg"), "person", "yolo", False, api_name="/segment") | |
| compressed, bpp, ratio = client.predict( | |
| handle_file("image.jpg"), | |
| handle_file(mask), | |
| 4, | |
| 0.2, # aggressive background compression | |
| api_name="/compress" | |
| ) | |
| ``` | |
| --- | |
| ### 3. `/detect` - Object Detection | |
| Runs object detection on an image and returns detection results as JSON. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `image` | Image | required | Input image file | | |
| | `method` | str | `"yolo"` | Detection method (see [methods](#detection-methods)) | | |
| | `classes` | str | `""` | Comma-separated classes (required for open-vocab detectors) | | |
| | `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `detections_json` | str | JSON string of detection results | | |
| **Example - Closed-Vocabulary:** | |
| ```python | |
| import json | |
| # YOLO detection (COCO classes) | |
| dets_json = client.predict( | |
| handle_file("street_scene.jpg"), | |
| "yolo", | |
| "", # no classes needed | |
| 0.25, | |
| api_name="/detect" | |
| ) | |
| detections = json.loads(dets_json) | |
| for det in detections: | |
|     print(f"{det['label']}: {det['score']:.2f}") | |
| ``` | |
| **Example - Open-Vocabulary:** | |
| ```python | |
| # YOLO-World with custom classes | |
| dets_json = client.predict( | |
| handle_file("image.jpg"), | |
| "yolo_world", | |
| "hat, backpack, umbrella", # custom classes required | |
| 0.25, | |
| api_name="/detect" | |
| ) | |
| ``` | |
| --- | |
| ### 3.1. `/detect_overlay` - Detection with Visualization | |
| Runs object detection and returns the image with bounding boxes drawn. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `image` | Image | required | Input image file | | |
| | `method` | str | `"yolo"` | Detection method (see [methods](#detection-methods)) | | |
| | `classes` | str | `""` | Comma-separated classes (required for open-vocab detectors) | | |
| | `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `result_image` | Image | Image with detection bounding boxes | | |
| | `detections_json` | str | JSON string of detection results | | |
| **Example:** | |
| ```python | |
| import json | |
| # Get image with detection boxes | |
| result_img, dets_json = client.predict( | |
| handle_file("street_scene.jpg"), | |
| "yolo", | |
| "", | |
| 0.25, | |
| api_name="/detect_overlay" | |
| ) | |
| # result_img is a file path to the image with boxes drawn | |
| print(f"Image with boxes: {result_img}") | |
| detections = json.loads(dets_json) | |
| ``` | |
| --- | |
| ### 4. `/process` - Full Image Pipeline | |
| Runs the complete pipeline: segmentation → compression → optional detection. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `image` | Image | required | Input image file | | |
| | `prompt` | str | `"object"` | Segmentation prompt/classes | | |
| | `segmentation_method` | str | `"sam3"` | ROI segmentation method | | |
| | `quality` | int | `4` | Compression quality (1-5) | | |
| | `sigma` | float | `0.3` | Background preservation (0.01-1.0) | | |
| | `run_detection` | bool | `False` | Whether to run detection on output | | |
| | `detection_method` | str | `"yolo"` | Detector to use | | |
| | `detection_classes` | str | `""` | Classes for open-vocab detectors | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `compressed_image` | Image | Compressed output image | | |
| | `mask_image` | Image | Generated ROI mask | | |
| | `bpp` | float | Bits per pixel | | |
| | `compression_ratio` | float | Compression ratio | | |
| | `roi_coverage` | float | Fraction of image covered by ROI (0.0-1.0) | |
| | `detections_json` | str | JSON detections (empty list if `run_detection=False`) | | |
| **Example:** | |
| ```python | |
| import json | |
| compressed, mask, bpp, ratio, coverage, dets_json = client.predict( | |
| handle_file("street.jpg"), | |
| "car, person, road", | |
| "sam3", | |
| 4, | |
| 0.3, | |
| True, # run detection | |
| "yolo", | |
| "", | |
| api_name="/process" | |
| ) | |
| print(f"ROI Coverage: {coverage*100:.2f}%") | |
| print(f"Compression: {bpp:.4f} bpp ({ratio:.2f}x)") | |
| print(f"Detections: {len(json.loads(dets_json))}") | |
| ``` | |
| --- | |
| ## Video API Endpoints | |
| ### 1. `/segment_video` - Segment Video | |
| Segments a video to find ROI regions, returning either a mask file or overlay video. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `video_path` | Video | required | Input video file | | |
| | `prompt` | str | `"object"` | Comma-separated classes or natural language prompt | | |
| | `method` | str | `"sam3"` | Segmentation method | | |
| | `return_overlay` | bool | `False` | If `True`, returns video with ROI highlighted | | |
| | `output_fps` | float | `15.0` | Output framerate (max 30) | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `result_path` | File/Video | Mask file (NPZ) OR video with ROI overlay | | |
| | `stats_json` | str | JSON with frame count, coverage, and classes | | |
| **Example:** | |
| ```python | |
| import json | |
| # Get mask file for reuse in compression | |
| mask_file, stats_json = client.predict( | |
| handle_file("video.mp4"), | |
| "person, car", | |
| "sam3", | |
| False, # return masks file | |
| 15.0, # fps | |
| api_name="/segment_video" | |
| ) | |
| stats = json.loads(stats_json) | |
| print(f"Processed {stats['total_frames']} frames") | |
| print(f"Avg ROI coverage: {stats['avg_roi_coverage']*100:.2f}%") | |
| # Get video with ROI overlay for visualization | |
| overlay_video, _ = client.predict( | |
| handle_file("video.mp4"), | |
| "person, car", | |
| "sam3", | |
| True, # return overlay video | |
| 15.0, | |
| api_name="/segment_video" | |
| ) | |
| ``` | |
| --- | |
| ### 2. `/compress_video` - Compress Video | |
| Compresses a video with optional ROI mask preservation. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `video_path` | Video | required | Input video file | | |
| | `mask_file_path` | str | `None` | Path to pre-computed masks (from `/segment_video`) | | |
| | `quality` | int | `4` | Quality level (1-5) | | |
| | `sigma` | float | `0.3` | Background preservation (0.01-1.0) | | |
| | `output_fps` | float | `15.0` | Target output framerate | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `compressed_video` | Video | Compressed output video | | |
| | `stats_json` | str | JSON with compression statistics | | |
| **Example:** | |
| ```python | |
| import json | |
| # First, segment to get masks | |
| mask_file, _ = client.predict( | |
| handle_file("video.mp4"), "person", "sam3", False, 15.0, | |
| api_name="/segment_video" | |
| ) | |
| # Then compress with cached masks (3-5x faster!) | |
| compressed, stats_json = client.predict( | |
| handle_file("video.mp4"), | |
| mask_file, # reuse masks | |
| 4, # quality | |
| 0.3, # sigma | |
| 15.0, # fps | |
| api_name="/compress_video" | |
| ) | |
| stats = json.loads(stats_json) | |
| print(f"Compression ratio: {stats['compression_ratio']}x") | |
| print(f"Total size: {stats['total_size_kb']} KB") | |
| ``` | |
| --- | |
| ### 3. `/detect_video` - Video Detection | |
| Runs object detection on each frame of a video. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `video_path` | Video | required | Input video file | | |
| | `method` | str | `"yolo"` | Detection method | | |
| | `classes` | str | `""` | Comma-separated classes (required for open-vocab) | | |
| | `confidence` | float | `0.25` | Confidence threshold (0.0-1.0) | | |
| | `return_overlay` | bool | `False` | If `True`, returns video with detection boxes | | |
| | `output_fps` | float | `15.0` | Output framerate (max 30) | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `result_video` | Video | Video with detection boxes (if `return_overlay=True`), None otherwise | | |
| | `detections_json` | str | JSON with per-frame detections | | |
| **Example:** | |
| ```python | |
| import json | |
| # Get per-frame detections JSON | |
| _, dets_json = client.predict( | |
| handle_file("video.mp4"), | |
| "yolo", | |
| "", | |
| 0.25, | |
| False, # return JSON only | |
| 15.0, | |
| api_name="/detect_video" | |
| ) | |
| data = json.loads(dets_json) | |
| print(f"Total detections: {data['total_detections']}") | |
| print(f"Avg per frame: {data['avg_detections_per_frame']}") | |
| # Get video with detection overlays | |
| det_video, _ = client.predict( | |
| handle_file("video.mp4"), | |
| "yolo", | |
| "", | |
| 0.25, | |
| True, # return overlay video | |
| 15.0, | |
| api_name="/detect_video" | |
| ) | |
| ``` | |
| --- | |
| ### 4. `/process_video` - Full Video Pipeline | |
| Processes a video with ROI-based compression (segment → compress), with optional detection. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `video_path` | Video | required | Input video file | | |
| | `prompt` | str | `"object"` | Segmentation prompt/classes | | |
| | `segmentation_method` | str | `"sam3"` | ROI segmentation method | | |
| | `mode` | str | `"static"` | `"static"` or `"dynamic"` | | |
| | `quality` | int | `4` | Quality level 1-5 (static mode) | | |
| | `sigma` | float | `0.3` | Background preservation (static mode) | | |
| | `output_fps` | float | `15.0` | Target framerate (static mode) | | |
| | `bandwidth_kbps` | float | `500.0` | Target bandwidth (dynamic mode) | | |
| | `min_fps` | float | `5.0` | Minimum framerate (dynamic mode) | | |
| | `max_fps` | float | `30.0` | Maximum framerate (dynamic mode) | | |
| | `aggressiveness` | float | `0.5` | Bandwidth savings strategy (dynamic mode): `0.0` = use full bandwidth (high FPS always), `0.5` = moderate savings, `1.0` = maximum savings (aggressive FPS reduction for low motion) | | |
| | `run_detection` | bool | `False` | Whether to run detection/tracking | | |
| | `detection_method` | str | `"yolo"` | Detector to use | | |
| | `mask_file_path` | str | `None` | Path to pre-computed masks (skips segmentation) | | |
| **Returns:** | |
| | Output | Type | Description | | |
| |--------|------|-------------| | |
| | `output_video` | Video | Compressed video | | |
| | `stats_json` | str | JSON with detailed statistics | | |
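Because the endpoint takes many positional arguments, it can help to build them from named fields. The sketch below mirrors the parameter table above; `ProcessVideoArgs` is a hypothetical helper, and it assumes the endpoint accepts arguments in exactly the order the table lists them.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class ProcessVideoArgs:
    """Named arguments for /process_video, in the order of the parameter table."""
    prompt: str = "object"
    segmentation_method: str = "sam3"
    mode: str = "static"
    quality: int = 4
    sigma: float = 0.3
    output_fps: float = 15.0
    bandwidth_kbps: float = 500.0
    min_fps: float = 5.0
    max_fps: float = 30.0
    aggressiveness: float = 0.5
    run_detection: bool = False
    detection_method: str = "yolo"
    mask_file_path: Optional[str] = None

    def validate(self):
        """Check the documented ranges before sending anything to the server."""
        if self.mode not in ("static", "dynamic"):
            raise ValueError("mode must be 'static' or 'dynamic'")
        if not 1 <= self.quality <= 5:
            raise ValueError("quality must be between 1 and 5")
        if not 0.01 <= self.sigma <= 1.0:
            raise ValueError("sigma must be between 0.01 and 1.0")
        if not 0.0 <= self.aggressiveness <= 1.0:
            raise ValueError("aggressiveness must be between 0.0 and 1.0")
        if self.min_fps > self.max_fps:
            raise ValueError("min_fps must not exceed max_fps")

    def as_positional(self):
        """Return the values as a tuple, excluding the video file itself."""
        self.validate()
        return astuple(self)


args = ProcessVideoArgs(prompt="person", mode="dynamic", bandwidth_kbps=750, min_fps=8)
print(args.as_positional())
# Intended use (not executed here):
# client.predict(handle_file("video.mp4"), *args.as_positional(), api_name="/process_video")
```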
| **Example - Static Mode:** | |
| ```python | |
| import json | |
| output, stats_json = client.predict( | |
| handle_file("video.mp4"), | |
| "person, car", | |
| "sam3", | |
| "static", | |
| 4, 0.3, 15.0, # static: quality, sigma, fps | |
| 500, 5, 30, 0.5, # dynamic: bandwidth, min_fps, max_fps, aggressiveness (ignored) | |
| False, "yolo", None, | |
| api_name="/process_video" | |
| ) | |
| stats = json.loads(stats_json) | |
| print(f"Processed {stats['total_frames']} frames") | |
| ``` | |
| **Example - Dynamic Mode:** | |
| ```python | |
| output, stats_json = client.predict( | |
| handle_file("video.mp4"), | |
| "person", | |
| "yolo", | |
| "dynamic", | |
| 4, 0.3, 15.0, # static settings (ignored) | |
| 750, # target bandwidth 750 kbps | |
| 8, # min FPS | |
| 30, # max FPS | |
| 0.5, # aggressiveness (moderate savings) | |
| True, "yolo", None, | |
| api_name="/process_video" | |
| ) | |
| ``` | |
| --- | |
| ## Streaming Video API Endpoints | |
| The streaming API provides HLS-style chunk-by-chunk delivery for real-time video processing. Unlike the buffered endpoints above, these endpoints **yield chunks progressively** as they're produced, enabling: | |
| - Real-time streaming to frontend | |
| - Lower latency (the first chunk is available as soon as it has been compressed) | |
| - Memory efficient (no buffering entire video) | |
| - Backwards compatible (existing endpoints remain unchanged) | |
| ### ⚡ Real-Time Behavior | |
| **Chunks are streamed as they are produced**, not after the whole video has been processed: | |
| 1. Video frames are extracted and accumulated into ~1 second chunks (15-30 frames) | |
| 2. Each chunk is segmented and compressed using batch processing | |
| 3. **Chunk is yielded immediately** - no waiting for subsequent chunks | |
| 4. Frontend receives and can display frames right away | |
| **First chunk latency:** ~1.5-4 seconds (depending on models) | |
| **Subsequent chunks:** Streamed continuously as they're ready | |
| Chunks, rather than single frames, are the unit of delivery for efficiency: batch processing 15-30 frames at once is much faster than processing each frame individually. | |
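The chunking step described above can be sketched as a generator that groups frames into batches of roughly one second each. This illustrates the idea only and is not the server's implementation; `chunk_frames` and its parameters are hypothetical.

```python
def chunk_frames(frames, fps, chunk_seconds=1.0):
    """Yield lists of frames, each covering roughly chunk_seconds of video."""
    size = max(1, round(fps * chunk_seconds))
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == size:
            # Hand the batch over as soon as it is full, without waiting for later frames
            yield batch
            batch = []
    if batch:
        # Final partial chunk
        yield batch


# 40 stand-in frames at 15 FPS
print([len(c) for c in chunk_frames(range(40), fps=15.0)])  # [15, 15, 10]
```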
| ### 1. `/stream_process_video` - Full Streaming Pipeline | |
| Streams compressed video chunks with segmentation and optional detection. | |
| **Parameters:** | |
| - Same as `/process_video`, plus: | |
| - `frame_format` (str, default: "jpeg"): Frame encoding format ("jpeg" or "png") | |
| - `frame_quality` (int, default: 85): JPEG quality 1-95 (ignored for PNG) | |
| - `max_resolution` (int, default: 720): Maximum height in pixels (e.g., 360, 480, 720, 1080). Video is resized before processing for faster performance. Lower values = faster processing. | |
| **Note:** The `aggressiveness` parameter (0.0-1.0) controls bandwidth savings strategy in dynamic mode - higher values aggressively reduce FPS during low-motion scenes for maximum bandwidth efficiency, while lower values maintain high FPS to use available bandwidth. | |
| **Yields:** | |
| JSON strings, each containing one chunk: | |
| ```json | |
| { | |
| "chunk_index": 0, | |
| "frames": ["base64_encoded_jpeg_1", "base64_encoded_jpeg_2", ...], | |
| "timestamps": [0.0, 0.067, 0.133, ...], | |
| "fps": 15.0, | |
| "stats": { | |
| "avg_bpp": 0.256, | |
| "estimated_bytes": 32768, | |
| "quality_level": 4, | |
| "sigma": 0.3 | |
| } | |
| } | |
| ``` | |
| Final message: | |
| ```json | |
| {"status": "complete"} | |
| ``` | |
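A client has to tell three kinds of message apart: a chunk, the final completion message, and an error. A minimal sketch follows, assuming errors arrive as a JSON object with an `error` key (as the Python example in this section checks for); `parse_stream_message` is a hypothetical helper.

```python
import json


def parse_stream_message(message_json):
    """Return (kind, payload) where kind is 'chunk', 'complete', or 'error'."""
    msg = json.loads(message_json)
    if "error" in msg:
        return "error", msg["error"]
    if msg.get("status") == "complete":
        return "complete", None
    return "chunk", msg


print(parse_stream_message('{"status": "complete"}'))  # ('complete', None)
print(parse_stream_message('{"error": "bad input"}'))  # ('error', 'bad input')
kind, chunk = parse_stream_message('{"chunk_index": 0, "frames": [], "fps": 15.0}')
print(kind, chunk["chunk_index"])  # chunk 0
```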
| **Example (Python):** | |
| ```python | |
| from gradio_client import Client, handle_file | |
| import json | |
| import base64 | |
| from PIL import Image | |
| from io import BytesIO | |
| client = Client("http://localhost:7860") | |
| # Get generator of chunks | |
| chunk_stream = client.submit( | |
| handle_file("video.mp4"), | |
| "person, car", # prompt | |
| "sam3", # segmentation_method | |
| "static", # mode | |
| 4, # quality | |
| 0.3, # sigma | |
| 15.0, # output_fps | |
| 500.0, # bandwidth_kbps (dynamic mode) | |
| 5.0, # min_fps | |
| 30.0, # max_fps | |
| None, # mask_file_path | |
| "jpeg", # frame_format | |
| 85, # frame_quality | |
| 360, # max_resolution (360p for speed) | |
| api_name="/stream_process_video" | |
| ) | |
| # Process chunks as they arrive | |
| all_frames = [] | |
| for chunk_json in chunk_stream: | |
|     chunk = json.loads(chunk_json) | |
|     if "status" in chunk and chunk["status"] == "complete": | |
|         print("Streaming complete!") | |
|         break | |
|     if "error" in chunk: | |
|         print(f"Error: {chunk['error']}") | |
|         break | |
|     # Decode frames from base64 | |
|     for frame_b64 in chunk["frames"]: | |
|         frame_bytes = base64.b64decode(frame_b64) | |
|         frame = Image.open(BytesIO(frame_bytes)) | |
|         all_frames.append(frame) | |
|     # Print progress | |
|     print(f"Chunk {chunk['chunk_index']}: " | |
|           f"{len(chunk['frames'])} frames @ {chunk['fps']} FPS, " | |
|           f"BPP: {chunk['stats']['avg_bpp']:.3f}") | |
| print(f"Total frames received: {len(all_frames)}") | |
| ``` | |
| **Example (JavaScript/TypeScript):** | |
| ```typescript | |
| import { Client } from "@gradio/client"; | |
| // Shape of a decoded chunk; fields mirror the JSON payload shown above | |
| interface VideoChunk { | |
|   index: number; | |
|   frames: string[]; // object URLs for the decoded frames | |
|   timestamps: number[]; | |
|   fps: number; | |
|   stats: Record<string, unknown>; | |
| } | |
| // displayFrame is an app-specific rendering helper (not shown here) | |
| declare function displayFrame(url: string): void; | |
| async function streamVideo(videoFile: File) { | |
|   const client = await Client.connect("http://localhost:7860"); | |
|   const chunks: VideoChunk[] = []; | |
|   // Start streaming | |
|   const stream = client.submit("/stream_process_video", [ | |
|     videoFile, | |
|     "person, car", // prompt | |
|     "sam3", // method | |
|     "static", // mode | |
|     4, // quality | |
|     0.3, // sigma | |
|     15.0, // fps | |
|     500, 5, 30, // dynamic settings | |
|     null, // mask_file | |
|     "jpeg", // format | |
|     85, // quality | |
|     360 // max_resolution (360p for speed) | |
|   ]); | |
|   // Process chunks as they arrive | |
|   for await (const msg of stream) { | |
|     // The client emits both status and data events; chunks arrive in data events | |
|     if (msg.type !== "data") continue; | |
|     const chunk = JSON.parse(msg.data[0] as string); | |
|     if (chunk.status === "complete") { | |
|       console.log("✅ Stream complete"); | |
|       break; | |
|     } | |
|     if (chunk.error) { | |
|       console.error("❌ Error:", chunk.error); | |
|       break; | |
|     } | |
|     // Decode frames | |
|     const frames = chunk.frames.map((b64: string) => { | |
|       const blob = base64ToBlob(b64, "image/jpeg"); | |
|       return URL.createObjectURL(blob); | |
|     }); | |
|     chunks.push({ | |
|       index: chunk.chunk_index, | |
|       frames: frames, | |
|       timestamps: chunk.timestamps, | |
|       fps: chunk.fps, | |
|       stats: chunk.stats | |
|     }); | |
|     console.log(`📦 Chunk ${chunk.chunk_index}: ${frames.length} frames`); | |
|     // Display first frame of chunk immediately | |
|     displayFrame(frames[0]); | |
|   } | |
|   return chunks; | |
| } | |
| function base64ToBlob(base64: string, mimeType: string): Blob { | |
|   const byteString = atob(base64); | |
|   const arrayBuffer = new ArrayBuffer(byteString.length); | |
|   const uint8Array = new Uint8Array(arrayBuffer); | |
|   for (let i = 0; i < byteString.length; i++) { | |
|     uint8Array[i] = byteString.charCodeAt(i); | |
|   } | |
|   return new Blob([uint8Array], { type: mimeType }); | |
| } | |
| ``` | |
| ### 2. `/stream_compress_video` - Simplified Streaming Compression | |
| Simpler streaming endpoint without segmentation configuration (use with pre-computed masks). | |
| **Parameters:** | |
| - `video_path` (str): Input video file | |
| - `mask_file_path` (str, optional): Pre-computed mask file from `/segment_video` | |
| - `quality` (int, default: 4): Quality level 1-5 | |
| - `sigma` (float, default: 0.3): Background preservation 0.01-1.0 | |
| - `output_fps` (float, default: 15.0): Target framerate | |
| - `frame_format` (str, default: "jpeg"): Frame encoding | |
| - `frame_quality` (int, default: 85): JPEG quality | |
| **Yields:** | |
| Same format as `/stream_process_video` | |
| **Example:** | |
| ```python | |
| from gradio_client import Client, handle_file | |
| import json | |
| client = Client("http://localhost:7860") | |
| # Pre-segment video once | |
| mask_file, _ = client.predict( | |
| handle_file("video.mp4"), | |
| "person, car", | |
| "sam3", | |
| False, # return mask file | |
| 15.0, | |
| api_name="/segment_video" | |
| ) | |
| # Stream compression with cached masks | |
| chunk_stream = client.submit( | |
| handle_file("video.mp4"), | |
| mask_file, # reuse masks | |
| 4, # quality | |
| 0.3, # sigma | |
| 15.0, # fps | |
| "jpeg", # format | |
| 85, # quality | |
| api_name="/stream_compress_video" | |
| ) | |
| for chunk_json in chunk_stream: | |
|     chunk = json.loads(chunk_json) | |
|     if "status" in chunk: | |
|         break | |
|     print(f"Chunk {chunk['chunk_index']}: {len(chunk['frames'])} frames") | |
| ``` | |
| ### Benefits of Streaming API | |
| 1. **Lower Latency**: First chunk available after ~1.5-4 seconds, depending on models (vs buffering the entire video) | |
| 2. **Memory Efficient**: Process frames incrementally, no need to buffer | |
| 3. **Real-time Display**: Show frames to user as they're compressed | |
| 4. **Progress Updates**: Monitor compression progress chunk-by-chunk | |
| 5. **Bandwidth Adaptive**: Works with dynamic mode for adaptive streaming | |
| ### Chunk Structure | |
| Each chunk contains: | |
| - **chunk_index**: Sequential number (0, 1, 2, ...) | |
| - **frames**: List of base64-encoded images (typically 15-30 frames per chunk) | |
| - **timestamps**: Frame timestamps in seconds since video start | |
| - **fps**: Effective framerate for this chunk | |
| - **stats**: Compression statistics | |
| - `avg_bpp`: Average bits per pixel | |
| - `estimated_bytes`: Chunk size estimate | |
| - `quality_level`: TIC model quality (1-5) | |
| - `sigma`: Background compression factor | |
| - `motion` (dynamic mode only): Motion analysis metrics | |
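These fields are enough to estimate the bitrate of a chunk, which is useful when comparing against a target bandwidth in dynamic mode. The sketch below assumes a chunk's duration is `len(frames) / fps` and that `estimated_bytes` covers the whole chunk; `chunk_bitrate_kbps` is a hypothetical helper.

```python
def chunk_bitrate_kbps(chunk):
    """Estimate a chunk's bitrate in kilobits per second from its reported stats."""
    frame_count = len(chunk["frames"])
    fps = chunk["fps"]
    if frame_count == 0 or fps <= 0:
        return 0.0
    duration_s = frame_count / fps
    return chunk["stats"]["estimated_bytes"] * 8 / duration_s / 1000


# Hypothetical chunk: 15 frames at 15 FPS, i.e. one second of video
example_chunk = {
    "chunk_index": 0,
    "frames": ["..."] * 15,
    "fps": 15.0,
    "stats": {"avg_bpp": 0.256, "estimated_bytes": 32768},
}
print(f"{chunk_bitrate_kbps(example_chunk):.3f} kbps")  # 262.144 kbps
```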
| ### Backwards Compatibility | |
| All existing API endpoints (`/process_video`, `/compress_video`, etc.) remain unchanged and continue to work as before. The streaming endpoints are **additive** - they don't modify existing behavior. | |
| --- | |
| ## Class Reference | |
| ### Segmentation Methods | |
| | Method | Description | Classes | | |
| |--------|-------------|---------| | |
| | `sam3` | Prompt-based (natural language) | Any text prompt | | |
| | `yolo` | YOLO instance segmentation | 80 COCO classes | | |
| | `segformer` | Cityscapes semantic segmentation | 19 classes | | |
| | `mask2former` | Swin-based panoptic/semantic | 133 COCO / 150 ADE20K | | |
| | `maskrcnn` | ResNet50-FPN instance segmentation | 80 COCO classes | | |
| ### Detection Methods | |
| **Closed-Vocabulary (COCO pretrained):** | |
| | Method | Description | | |
| |--------|-------------| | |
| | `yolo` | Ultralytics YOLO | | |
| | `detr` | Facebook DETR | | |
| | `faster_rcnn` | Faster R-CNN | | |
| | `retinanet` | RetinaNet | | |
| | `fcos` | FCOS | | |
| | `ssd` | SSD300 | | |
| **Open-Vocabulary (requires `classes` parameter):** | |
| | Method | Description | | |
| |--------|-------------| | |
| | `yolo_world` | YOLO-World | | |
| | `grounding_dino` | Grounding DINO | | |
| ### COCO Classes (80) | |
| ``` | |
| person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, | |
| traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, | |
| dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, | |
| handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, | |
| baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, | |
| wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, | |
| broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, | |
| bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, | |
| microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, | |
| teddy bear, hair drier, toothbrush | |
| ``` | |
| ### Cityscapes Classes (19) | |
| ``` | |
| road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, | |
| vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, | |
| bicycle | |
| ``` | |
| --- | |
| ## Error Handling | |
| ```python | |
| try: | |
|     result = client.predict( | |
|         handle_file("image.jpg"), | |
|         ..., | |
|         api_name="/endpoint" | |
|     ) | |
| except Exception as e: | |
|     print(f"API Error: {e}") | |
| ``` | |
| **Common Errors:** | |
| | Error | Cause | Solution | | |
| |-------|-------|----------| | |
| | Validation error for ImageData | Missing `handle_file()` | Wrap file paths with `handle_file()` | | |
| | File does not exist | Invalid path | Check file path is correct | | |
| | Empty detection classes | Open-vocab detector without classes | Provide classes for `yolo_world`, `grounding_dino` | | |
| | GPU quota exceeded | HF Spaces limit | Wait and retry (see below) | | |
| --- | |
| ## Handling GPU Quota on HF Spaces | |
| When using Hugging Face Spaces with ZeroGPU, you may encounter quota limits: | |
| ``` | |
| You have exceeded your GPU quota (60s requested vs. 0s left). Try again in 0:05:30 | |
| ``` | |
| ### Automatic Retry with Backoff | |
| ```python | |
| import time | |
| import re | |
| def extract_wait_time(error_msg): | |
|     """Extract wait time from GPU quota error message.""" | |
|     match = re.search(r'Try again in (\d+):(\d+)(?::(\d+))?', error_msg) | |
|     if match: | |
|         if match.group(3):  # HH:MM:SS | |
|             return int(match.group(1)) * 3600 + int(match.group(2)) * 60 + int(match.group(3)) | |
|         else:  # MM:SS | |
|             return int(match.group(1)) * 60 + int(match.group(2)) | |
|     return 60 | |
| def call_with_retry(client, *args, api_name, max_retries=5): | |
|     """Call API with exponential backoff retry.""" | |
|     delay = 10 | |
|     for attempt in range(max_retries): | |
|         try: | |
|             return client.predict(*args, api_name=api_name) | |
|         except Exception as e: | |
|             error_msg = str(e) | |
|             if "exceeded your GPU quota" in error_msg: | |
|                 wait_time = extract_wait_time(error_msg) | |
|                 actual_delay = max(delay, wait_time + 5) | |
|                 print(f"⏳ GPU quota exhausted. Waiting {actual_delay}s... (attempt {attempt + 1})") | |
|                 time.sleep(actual_delay) | |
|                 delay *= 2 | |
|             else: | |
|                 raise | |
|     raise Exception("Max retries reached") | |
| # Usage | |
| result = call_with_retry( | |
|     client, | |
|     handle_file("image.jpg"), | |
|     "car", "sam3", 4, 0.3, False, "yolo", "",  # same argument order as the other /process examples | |
|     api_name="/process" | |
| ) | |
| ``` | |
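To see how long the retry loop above might sleep, the delay arithmetic can be isolated into a pure function. This is a sketch only: `backoff_delays` is a hypothetical helper that mirrors the `max(delay, wait_time + 5)` rule and the doubling step of `call_with_retry`, without calling the API or sleeping.

```python
def backoff_delays(wait_times, initial_delay=10, margin=5):
    """Return the sleep durations (seconds) the retry loop would use.

    wait_times holds the server-reported wait for each failed attempt,
    as parsed by extract_wait_time.
    """
    delays = []
    delay = initial_delay
    for wait in wait_times:
        # Sleep at least the reported wait plus a margin, and never less
        # than the current backoff floor.
        delays.append(max(delay, wait + margin))
        delay *= 2  # the floor doubles after every failed attempt
    return delays


schedule = backoff_delays([330, 20, 0, 0])
print(schedule)       # [335, 25, 40, 80]
print(sum(schedule))  # 480
```

While the reported wait is long, it dominates the delay. Once it drops, the doubling floor takes over, so later attempts do not hit the Space in quick succession.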
| --- | |
| ## Using with cURL | |
| ### Upload File First | |
| ```bash | |
| # Upload image | |
| FILE_URL=$(curl -s -X POST http://localhost:7860/upload \ | |
| -F "files=@image.jpg" | \ | |
| python3 -c "import sys, json; print(json.load(sys.stdin)[0])") | |
| ``` | |
| ### Call Endpoints | |
| ```bash | |
| # Segment | |
| curl -X POST http://localhost:7860/api/segment \ | |
| -H "Content-Type: application/json" \ | |
| -d "{\"data\": [\"$FILE_URL\", \"car, person\", \"sam3\", false]}" | |
| # Compress (no mask) | |
| curl -X POST http://localhost:7860/api/compress \ | |
| -H "Content-Type: application/json" \ | |
| -d "{\"data\": [\"$FILE_URL\", null, 4, 0.3]}" | |
| # Detect | |
| curl -X POST http://localhost:7860/api/detect \ | |
| -H "Content-Type: application/json" \ | |
| -d "{\"data\": [\"$FILE_URL\", \"yolo\", \"\", 0.25, false]}" | |
| # Full pipeline | |
| curl -X POST http://localhost:7860/api/process \ | |
| -H "Content-Type: application/json" \ | |
| -d "{\"data\": [\"$FILE_URL\", \"car, person\", \"sam3\", 4, 0.3, true, \"yolo\", \"\"]}" | |
| ``` | |
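Escaping quotes by hand in the `-d` argument is error-prone. A minimal sketch of generating the same request bodies with Python's `json` module follows. `build_payload` is a hypothetical helper, the `{"data": [...]}` shape is taken from the cURL examples above, and the file path is a placeholder for whatever the upload step returned.

```python
import json


def build_payload(*args):
    """Serialize positional endpoint arguments as a {"data": [...]} JSON body."""
    return json.dumps({"data": list(args)})


# Placeholder: use the value returned by the upload step instead.
file_url = "uploaded/image.jpg"

segment_body = build_payload(file_url, "car, person", "sam3", False)
compress_body = build_payload(file_url, None, 4, 0.3)

print(segment_body)
print(compress_body)
```

Python's `False` and `None` are serialized as JSON `false` and `null`, which matches the literals in the hand-written cURL bodies.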
| --- | |
| ## Example Scripts | |
| ### Batch Image Processing | |
| ```python | |
| from gradio_client import Client, handle_file | |
| from pathlib import Path | |
| import shutil | |
| client = Client("http://localhost:7860") | |
| output_dir = Path("compressed_output") | |
| output_dir.mkdir(exist_ok=True) | |
| for img_path in Path("images").glob("*.jpg"): | |
|     print(f"Processing {img_path.name}...") | |
|     compressed, mask, bpp, ratio, coverage, _ = client.predict( | |
|         handle_file(str(img_path)), | |
|         "car, person", | |
|         "sam3", | |
|         4, 0.3, | |
|         False, "", "", | |
|         api_name="/process" | |
|     ) | |
|     # Copy the compressed image from the client's download path to the output directory | |
|     output_path = output_dir / f"compressed_{img_path.name}" | |
|     shutil.copy(compressed, output_path) | |
|     print(f" BPP: {bpp:.4f}, Ratio: {ratio:.2f}x, ROI: {coverage*100:.2f}%") | |
| ``` | |
| ### Video Processing with Mask Caching | |
| ```python | |
| from gradio_client import Client, handle_file | |
| import json | |
| client = Client("http://localhost:7860") | |
| video_path = "input_video.mp4" | |
| # Step 1: Segment video (one-time cost) | |
| mask_file, seg_stats = client.predict( | |
|     handle_file(video_path), | |
|     "person, car", | |
|     "sam3", | |
|     False,  # return mask file | |
|     15.0, | |
|     api_name="/segment_video" | |
| ) | |
| print(f"Segmented video, masks saved to: {mask_file}") | |
| # Step 2: Compress with different settings, reusing masks | |
| for quality in [3, 4, 5]: | |
|     compressed, comp_stats = client.predict( | |
|         handle_file(video_path), | |
|         mask_file,  # reuse cached masks | |
|         quality, | |
|         0.3, | |
|         15.0, | |
|         api_name="/compress_video" | |
|     ) | |
|     stats = json.loads(comp_stats) | |
|     print(f"Quality {quality}: {stats['compression_ratio']}x compression") | |
| ``` | |
| ### Detection Comparison (Original vs Compressed) | |
| ```python | |
| from gradio_client import Client, handle_file | |
| import json | |
| client = Client("http://localhost:7860") | |
| image = "street_scene.jpg" | |
| # Detect on original | |
| _, dets_orig = client.predict( | |
|     handle_file(image), "yolo", "", 0.25, False, | |
|     api_name="/detect" | |
| ) | |
| orig_count = len(json.loads(dets_orig)) | |
| print(f"Original: {orig_count} detections") | |
| # Compress and detect | |
| compressed, _, bpp, ratio, _, dets_comp = client.predict( | |
|     handle_file(image), | |
|     "car, person, road", | |
|     "sam3", | |
|     4, 0.3, | |
|     True, "yolo", "", | |
|     api_name="/process" | |
| ) | |
| comp_count = len(json.loads(dets_comp)) | |
| retention = comp_count / orig_count * 100 if orig_count else 0 | |
| print(f"Compressed ({ratio:.2f}x): {comp_count} detections") | |
| print(f"Detection retention: {retention:.1f}%") | |
| ``` | |
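The overall retention figure above can hide class-specific losses. A hedged sketch of a per-class breakdown follows. `retention_by_class` is a hypothetical helper that takes two plain lists of class-name strings. How to extract those names from the detections JSON depends on the API's detection schema, which is not assumed here.

```python
from collections import Counter


def retention_by_class(orig_labels, comp_labels):
    """Percentage of original detections per class that also appear after compression.

    Counts are compared per class and capped at 100%, so extra detections
    on the compressed image do not inflate the score. This compares counts
    only and does not match individual boxes.
    """
    orig = Counter(orig_labels)
    comp = Counter(comp_labels)
    return {
        label: 100.0 * min(comp[label], count) / count
        for label, count in orig.items()
    }


# Illustrative label lists, not real API output.
orig_labels = ["car", "car", "person", "bus"]
comp_labels = ["car", "person", "person"]
per_class = retention_by_class(orig_labels, comp_labels)
print(per_class)  # {'car': 50.0, 'person': 100.0, 'bus': 0.0}
```

For a stricter comparison, individual boxes could be matched by overlap rather than by count. That requires box coordinates and is left out of this sketch.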
| --- | |
| ## Additional Resources | |
| - **Web UI**: Visit `http://localhost:7860` for the interactive interface | |
| - **GitHub**: See the repository for source code and examples | |
| - **Model Checkpoints**: Available in the `checkpoints/` directory | |
| - **Test Images**: Sample images are in the `data/images/` directory | |