---
title: Contextual Communication Demo
emoji: 📡
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---
# Contextual Communication Demo
An interactive demo for contextual communication in bandwidth-degraded environments (e.g., ISR collection from drones). The core idea is context-aware compression: transmit an extremely compact latent representation while ensuring the decoded output remains useful for downstream decision-making (e.g., object detection).
This repository implements contextual spatial compression for EO/IR-style imagery using an ROI-aware learned image compression model (TIC-style VAE) guided by segmentation masks.
## Features

- Contextual (ROI) compression: preserves fidelity in mission-relevant regions while aggressively compressing non-relevant background.
- Mission-driven context extraction: map a mission prompt to ROI masks via multiple segmentation strategies:
  - Class-based segmentation (SegFormer / YOLO / Mask2Former / Mask R-CNN)
  - Prompt/referring segmentation (SAM3)
- Optional object detection overlays to visualize task retention on the decoded image.
- Two operator knobs for bandwidth adaptation (see the sketch after this list):
  - Background preservation ($\sigma$, 0.01–1.0): lower = more background degradation
  - Transmission quality (checkpoint/lambda selection): higher = larger payload / better reconstruction
- Visualization: compare input vs. decoded output and optionally highlight context regions.
- CLI tools for segmentation, ROI compression, and before/after detection retention.
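A simplified way to picture the background-preservation knob (an illustrative sketch only, not the model's actual rate-allocation mechanism): the ROI mask and $\sigma$ combine into a spatial quality map that keeps mission-relevant pixels at full weight and scales the background down.

```python
import numpy as np

def quality_map(mask: np.ndarray, sigma: float) -> np.ndarray:
    """Illustrative sketch: blend a binary ROI mask (1 = mission-relevant)
    with a background weight sigma in [0.01, 1.0]. ROI pixels keep weight
    1.0; background pixels get weight sigma, so lower sigma means heavier
    background degradation."""
    mask = mask.astype(np.float32)
    return mask + sigma * (1.0 - mask)

# Example: a 4x4 image whose left half is the ROI
mask = np.zeros((4, 4))
mask[:, :2] = 1
print(quality_map(mask, sigma=0.3))  # 1.0 over the ROI, 0.3 elsewhere
```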
## Setup

```bash
pip install -r requirements.txt
```

Checkpoints are expected under `checkpoints/` (e.g., `checkpoints/tic_lambda_0.0483.pth.tar`).

By default, model weights/caches downloaded by detection/segmentation backends are also stored under `checkpoints/`:

- Hugging Face models under `checkpoints/hf/`
- Torch/torchvision weights under `checkpoints/torch/`
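A minimal sketch of how that routing can work (the real logic lives in `model_cache.py`; `HF_HOME` and `TORCH_HOME` are the standard Hugging Face and PyTorch cache-location variables, but their use here is an assumption, not confirmed from the source):

```python
import os
from pathlib import Path

def route_caches(root: str = "checkpoints") -> None:
    """Point Hugging Face and PyTorch download caches under `root`.

    Whether model_cache.py uses exactly these variables is an assumption.
    """
    hf_dir = Path(root) / "hf"
    torch_dir = Path(root) / "torch"
    hf_dir.mkdir(parents=True, exist_ok=True)
    torch_dir.mkdir(parents=True, exist_ok=True)
    os.environ.setdefault("HF_HOME", str(hf_dir))
    os.environ.setdefault("TORCH_HOME", str(torch_dir))

route_caches()  # call before importing the transformers/torchvision backends
```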
## Usage

### Interactive Demo (Hugging Face Spaces / Local)

This repo includes a Gradio app intended for Hugging Face Spaces (`app_file: app.py`). To run locally:

```bash
python app.py
```
In the UI:
- Enter a Mission and choose a Context Extraction Method (ROI).
- Tune the two knobs to match bandwidth constraints:
  - Transmission quality (checkpoint selection)
  - Background preservation ($\sigma$)
- Optionally enable object detection overlays.
Note: the app includes a Video tab placeholder (inactive).
### CLI: Contextual Spatial Compression (Images)

```bash
python roi_compressor.py \
  --input data/images/car/0016cf15fa4d4e16.jpg \
  --output results/compressed.jpg \
  --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
  --sigma 0.3 \
  --seg-method yolo \
  --seg-classes car \
  --highlight
```
Key arguments:

- `--input`: path to the input image.
- `--output`: path for the compressed output image.
- `--checkpoint`: path to the model checkpoint.
- `--sigma`: background quality factor (lower = more compression). Default: 0.3.
- `--lambda`: rate-distortion tradeoff parameter. Default: 0.0483.
- `--seg-method`: segmentation method (`segformer`, `yolo`, `mask2former`, `maskrcnn`). Default: `segformer`.
- `--seg-classes`: classes to treat as ROI (e.g., `car`, `person`).
- `--load-mask`: bypass segmentation using a precomputed mask.
- `--highlight`: save a comparison grid with the ROI highlighted.
### CLI: Segmentation Only

```bash
python roi_segmenter.py \
  --input data/images/car/0016cf15fa4d4e16.jpg \
  --output results/mask.png \
  --method segformer \
  --classes car \
  --visualize
```
Prompt-based segmentation (SAM3):

```bash
python roi_segmenter.py \
  --input data/images/car/0016cf15fa4d4e16.jpg \
  --output results/mask.png \
  --method sam3 \
  --prompt "a car" \
  --visualize
```
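A generated mask can be reused by the compressor via `--load-mask`, skipping segmentation on subsequent runs. A plausible pairing, assuming `--load-mask` takes the mask path:

```bash
# 1) Produce the ROI mask once
python roi_segmenter.py \
  --input data/images/car/0016cf15fa4d4e16.jpg \
  --output results/mask.png \
  --method segformer --classes car

# 2) Compress with the precomputed mask, bypassing segmentation
python roi_compressor.py \
  --input data/images/car/0016cf15fa4d4e16.jpg \
  --output results/compressed.jpg \
  --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
  --sigma 0.3 \
  --load-mask results/mask.png
```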
## Project Structure

```text
.
├── app.py                          # Gradio demo (Hugging Face Spaces)
├── README.md
├── requirements.txt
├── model_cache.py                  # Cache routing to `checkpoints/`
├── roi_compressor.py               # CLI: contextual (ROI) image compression
├── roi_segmenter.py                # CLI: ROI mask generation
├── roi_detection_eval.py           # CLI: before/after detection retention
├── segmentation/                   # Segmenters + factory
├── detection/                      # Detectors + factory
├── vae/                            # ROI-aware TIC model + compression utils
├── checkpoints/                    # Compression checkpoints + model caches
├── data/images/                    # Sample images
├── examples.sh                     # Example CLI commands
└── _segmentation_comparison.ipynb
```
## Modular API

Segmentation:

```python
from segmentation import create_segmenter

segmenter = create_segmenter("yolo", device="cuda", conf_threshold=0.3)
mask = segmenter(image, target_classes=["car", "person"])
```
Compression:

```python
from vae import load_checkpoint, compress_image

model = load_checkpoint("checkpoints/tic_lambda_0.0483.pth.tar", device="cuda")
out = compress_image(image, mask, model, sigma=0.3, device="cuda")
compressed = out["compressed"]
bpp = out["bpp"]
```
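Putting the two together end to end (a sketch; it assumes the image is loaded with PIL and that `out["compressed"]` is a PIL image with a `.save()` method, which the source does not spell out):

```python
from pathlib import Path

from PIL import Image

from segmentation import create_segmenter
from vae import load_checkpoint, compress_image

image = Image.open("data/images/car/0016cf15fa4d4e16.jpg")

# ROI mask covering the mission-relevant class
segmenter = create_segmenter("yolo", device="cuda", conf_threshold=0.3)
mask = segmenter(image, target_classes=["car"])

# ROI-aware compression: keep the car sharp, degrade the background
model = load_checkpoint("checkpoints/tic_lambda_0.0483.pth.tar", device="cuda")
out = compress_image(image, mask, model, sigma=0.3, device="cuda")

Path("results").mkdir(exist_ok=True)  # the API may not create it for you
print(f"bpp: {out['bpp']:.4f}")
out["compressed"].save("results/compressed.jpg")  # assumes a PIL image
```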
## Notes

- OpenCV is included via `opencv-python-headless` (recommended for server/Spaces environments).
- Some backends download weights on first use; caches are routed under `checkpoints/`.
- Output directories like `results/` are created at runtime by the CLIs.
## Object Detection (New)

An extensible object detection module is available in `detection/` with multiple backends implemented:
- YOLO (Ultralytics)
- YOLO-World (Ultralytics, open-vocabulary)
- Faster R-CNN (torchvision)
- RetinaNet (torchvision)
- SSD (torchvision)
- FCOS (torchvision)
- DETR (transformers)
- Deformable DETR (transformers, if supported by your installed version)
- EfficientDet (optional, requires `effdet`)
- Grounding DINO (transformers, open-vocabulary)
Open-vocabulary detectors (YOLO-World / Grounding DINO) require text prompts/classes at runtime.
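The repository ships a detector factory under `detection/`; by analogy with `create_segmenter`, usage plausibly looks like the sketch below. The `create_detector` name, its arguments, and the return format are hypothetical, not confirmed API:

```python
# Hypothetical sketch: mirrors the segmentation factory's pattern.
# `create_detector` and its arguments are assumed, not confirmed.
from detection import create_detector

detector = create_detector("yolo", device="cuda", conf_threshold=0.3)
detections = detector(image)  # e.g., boxes, labels, scores

# Open-vocabulary backends need text classes at runtime:
ov_detector = create_detector("yolo_world", device="cuda")
ov_detections = ov_detector(image, classes=["person", "car"])
```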
### Evaluate Detection Before/After ROI Compression
Compare an original image vs an already-compressed image:
```bash
python roi_detection_eval.py \
  --before data/images/car/0016cf15fa4d4e16.jpg \
  --after results/compressed.jpg \
  --detectors yolo fasterrcnn detr \
  --viz-dir results/det_viz
```
Or generate the "after" image via ROI compression and then evaluate:
```bash
python roi_detection_eval.py \
  --before data/images/car/0016cf15fa4d4e16.jpg \
  --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
  --sigma 0.3 \
  --seg-method yolo --seg-classes car \
  --detectors yolo fasterrcnn \
  --save-after results/after.jpg \
  --viz-dir results/det_viz
```
Open-vocabulary example (YOLO-World):
```bash
python roi_detection_eval.py \
  --before data/images/person/kodim04.png \
  --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
  --sigma 0.3 \
  --seg-method yolo --seg-classes person \
  --detectors yolo_world \
  --open-vocab-classes "person,car" \
  --viz-dir results/det_viz
```
Open-vocabulary example (Grounding DINO):
```bash
python roi_detection_eval.py \
  --before data/images/car/0016cf15fa4d4e16.jpg \
  --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
  --sigma 0.3 \
  --seg-method yolo --seg-classes car \
  --detectors grounding_dino \
  --open-vocab-classes "car,person" \
  --viz-dir results/det_viz
```
## Programmatic API

The application exposes a Gradio API for programmatic access to all features:

### Image API

- `/segment` - Segment image → mask or overlay
- `/compress` - Compress image with optional ROI mask
- `/detect` - Run object detection → JSON or overlay
- `/process` - Full pipeline: segment → compress → detect

### Video API (Buffered)

- `/segment_video` - Segment video → mask file or overlay video
- `/compress_video` - Compress video with optional cached masks
- `/detect_video` - Run detection on video → JSON or overlay video
- `/process_video` - Full pipeline with static/dynamic modes

### Video API (Streaming)

- `/stream_process_video` - Stream compressed chunks progressively (HLS-style)
- `/stream_compress_video` - Stream chunks with pre-computed masks

Key difference: streaming endpoints yield chunks as they are produced (low latency, roughly one second to the first chunk) instead of buffering the entire video, which suits real-time streaming applications.

See `API.md` for complete documentation with examples, and `STREAMING_API.md` for the streaming API guide and comparison.
### Quick Example

```python
import json

from gradio_client import Client, handle_file

client = Client("http://localhost:7860")

# Image: segment → compress → detect
compressed, mask, bpp, ratio, coverage, detections = client.predict(
    handle_file("image.jpg"),
    "car, person",   # mission prompt
    "sam3",          # ROI method
    4,               # quality level (1-5)
    0.3,             # sigma (background preservation)
    True,            # run detection
    "yolo",          # detection method
    "",              # detection classes
    api_name="/process",
)

# Video: streaming compression (chunk-by-chunk)
chunk_stream = client.submit(
    handle_file("video.mp4"),
    "person, car",
    "sam3", "static",
    4, 0.3, 15.0,
    api_name="/stream_process_video",
)
for chunk_json in chunk_stream:
    chunk = json.loads(chunk_json)
    if chunk.get("status") == "complete":
        break
    print(f"Chunk {chunk['chunk_index']}: {len(chunk['frames'])} frames")
```
### JavaScript/Frontend Integration

Streaming also works from JavaScript: the `@gradio/client` package supports async iterators for consuming streamed output:

```js
import { Client } from "@gradio/client";

const client = await Client.connect("http://localhost:7860");

const stream = client.submit("/stream_process_video", {
  video_path: videoFile,
  prompt: "person, car",
  segmentation_method: "sam3",
  mode: "static",
  quality: 4,
  sigma: 0.3,
  output_fps: 15.0,
  frame_format: "jpeg",
  frame_quality: 85
});

for await (const msg of stream) {
  const chunk = JSON.parse(msg.data);
  if (chunk.status === "complete") break;
  // Display frames immediately
  displayFrame(`data:image/jpeg;base64,${chunk.frames[0]}`);
}
```
Complete examples available:

- `examples/streaming_demo.html` - standalone HTML demo
- `examples/streaming_client.ts` - React/TypeScript/vanilla JS examples

See `STREAMING_API.md` for a detailed streaming guide and comparison.