
---
title: Contextual Communication Demo
emoji: πŸ“‘
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---

Contextual Communication Demo

An interactive demo for contextual communication in bandwidth-degraded environments (e.g., ISR collection from drones). The core idea is context-aware compression: transmit an extremely compact latent representation while ensuring the decoded output remains useful for downstream decision-making (e.g., object detection).

This repository implements contextual spatial compression for EO/IR-style imagery using an ROI-aware learned image compression model (TIC-style VAE) guided by segmentation masks.

Features

  • Contextual (ROI) compression: preserves fidelity in mission-relevant regions while aggressively compressing non-relevant background.
  • Mission-driven context extraction: map a mission prompt to ROI masks via multiple segmentation strategies:
    • Class-based segmentation (SegFormer / YOLO / Mask2Former / Mask R-CNN)
    • Prompt/referring segmentation (SAM3)
    • Optional object detection overlays to visualize task retention on the decoded image
  • Two operator knobs for bandwidth adaptation:
    • Background preservation ($\sigma$, 0.01–1.0): lower = more background degradation
    • Transmission quality (checkpoint/lambda selection): higher = larger payload / better reconstruction
  • Visualization: compare input vs decoded output and optionally highlight context regions.
  • CLI tools for segmentation, ROI compression, and before/after detection retention.

Setup

pip install -r requirements.txt

Checkpoints are expected under checkpoints/ (e.g., checkpoints/tic_lambda_0.0483.pth.tar).

By default, model weights/caches downloaded by detection/segmentation backends are also stored under checkpoints/:

  • Hugging Face models under checkpoints/hf/
  • Torch/torchvision weights under checkpoints/torch/
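The routing itself is typically achieved by pointing the frameworks' standard cache environment variables at checkpoints/ before any backend is imported. A minimal sketch of the idea (illustrative only; the actual model_cache.py may differ in detail):

```python
import os
from pathlib import Path

# Illustrative cache routing (model_cache.py may differ): point the standard
# cache environment variables at checkpoints/, usually before importing any
# detection/segmentation backend.
CACHE_ROOT = Path("checkpoints")
os.environ["HF_HOME"] = str(CACHE_ROOT / "hf")        # Hugging Face models
os.environ["TORCH_HOME"] = str(CACHE_ROOT / "torch")  # torch/torchvision weights

# Create the cache directories up front so first-use downloads have a home.
for sub in ("hf", "torch"):
    (CACHE_ROOT / sub).mkdir(parents=True, exist_ok=True)
```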

Usage

Interactive Demo (Hugging Face Spaces / Local)

This repo includes a Gradio app intended for Hugging Face Spaces (app_file: app.py). To run locally:

python app.py

In the UI:

  • Enter a Mission and choose a Context Extraction Method (ROI).
  • Tune the two knobs to match bandwidth constraints:
    • Transmission quality (checkpoint selection)
    • Background preservation ($\sigma$)
  • Optionally enable object detection overlays.

Note: the app's Video tab is currently a placeholder (inactive in the UI); video processing is available through the programmatic API.

CLI: Contextual Spatial Compression (Images)

python roi_compressor.py \
    --input data/images/car/0016cf15fa4d4e16.jpg \
    --output results/compressed.jpg \
    --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
    --sigma 0.3 \
    --seg-method yolo \
    --seg-classes car \
    --highlight

Key arguments:

  • --input: path to the input image
  • --output: path for the compressed output
  • --checkpoint: path to the model checkpoint
  • --sigma: background quality factor (lower = more compression; default 0.3)
  • --lambda: rate-distortion tradeoff parameter (default 0.0483)
  • --seg-method: segformer, yolo, mask2former, or maskrcnn (default segformer)
  • --seg-classes: classes to treat as ROI (e.g., car, person)
  • --highlight: save a comparison grid with the ROI highlighted
  • --load-mask: bypass segmentation using a precomputed mask
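One way to build intuition for --sigma: imagine a per-pixel quality map that is 1.0 inside the ROI mask and sigma outside. The model's actual conditioning is learned, so this is purely illustrative:

```python
def quality_map(mask, sigma):
    """Illustrative per-pixel quality weights: 1.0 in the ROI, sigma elsewhere.

    mask  -- 2D list of 0/1 ROI values
    sigma -- background quality in (0.01, 1.0]; lower = harsher background
    """
    return [[1.0 if m else sigma for m in row] for row in mask]

roi = [[0, 1, 1],
       [0, 1, 0]]
print(quality_map(roi, 0.3))
# [[0.3, 1.0, 1.0], [0.3, 1.0, 0.3]]
```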

CLI: Segmentation Only

python roi_segmenter.py \
    --input data/images/car/0016cf15fa4d4e16.jpg \
    --output results/mask.png \
    --method segformer \
    --classes car \
    --visualize

Prompt-based segmentation (SAM3):

python roi_segmenter.py \
    --input data/images/car/0016cf15fa4d4e16.jpg \
    --output results/mask.png \
    --method sam3 \
    --prompt "a car" \
    --visualize

CLI: Detection Retention (Before vs After)

Compare original vs already-compressed:

python roi_detection_eval.py \
    --before data/images/car/0016cf15fa4d4e16.jpg \
    --after results/compressed.jpg \
    --detectors yolo fasterrcnn detr \
    --viz-dir results/det_viz

Or generate the "after" image via ROI compression and then evaluate:

python roi_detection_eval.py \
    --before data/images/car/0016cf15fa4d4e16.jpg \
    --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
    --sigma 0.3 \
    --seg-method yolo --seg-classes car \
    --detectors yolo fasterrcnn \
    --save-after results/after.jpg \
    --viz-dir results/det_viz

Open-vocabulary example (YOLO-World):

python roi_detection_eval.py \
    --before data/images/person/kodim04.png \
    --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
    --sigma 0.3 \
    --seg-method yolo --seg-classes person \
    --detectors yolo_world \
    --open-vocab-classes "person,car" \
    --viz-dir results/det_viz
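Conceptually, the retention score asks how many boxes detected on the original image are still found on the decoded one. A minimal sketch of the idea via greedy IoU matching (not necessarily how roi_detection_eval.py computes its metrics):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def retention(before, after, thr=0.5):
    """Fraction of 'before' boxes greedily matched to an 'after' box at IoU >= thr."""
    unmatched = list(after)
    kept = 0
    for box in before:
        hit = next((c for c in unmatched if iou(box, c) >= thr), None)
        if hit is not None:
            unmatched.remove(hit)  # each 'after' box can match at most once
            kept += 1
    return kept / len(before) if before else 1.0

# e.g., one of two original detections survives compression:
# retention([(0, 0, 10, 10), (20, 20, 30, 30)], [(1, 1, 10, 10)]) -> 0.5
```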

Project Structure

.
β”œβ”€β”€ app.py                    # Gradio demo (Hugging Face Spaces)
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ model_cache.py            # Cache routing to `checkpoints/`
β”œβ”€β”€ roi_compressor.py         # CLI: contextual (ROI) image compression
β”œβ”€β”€ roi_segmenter.py          # CLI: ROI mask generation
β”œβ”€β”€ roi_detection_eval.py     # CLI: before/after detection retention
β”œβ”€β”€ segmentation/             # Segmenters + factory
β”œβ”€β”€ detection/                # Detectors + factory
β”œβ”€β”€ vae/                      # ROI-aware TIC model + compression utils
β”œβ”€β”€ checkpoints/              # Compression checkpoints + model caches
β”œβ”€β”€ data/images/              # Sample images
β”œβ”€β”€ examples.sh               # Example CLI commands
└── _segmentation_comparison.ipynb

Modular API

Segmentation:

from PIL import Image

from segmentation import create_segmenter

image = Image.open("data/images/car/0016cf15fa4d4e16.jpg")

segmenter = create_segmenter("yolo", device="cuda", conf_threshold=0.3)
mask = segmenter(image, target_classes=["car", "person"])

Compression:

from vae import load_checkpoint, compress_image

model = load_checkpoint("checkpoints/tic_lambda_0.0483.pth.tar", device="cuda")
out = compress_image(image, mask, model, sigma=0.3, device="cuda")
compressed = out["compressed"]
bpp = out["bpp"]

Notes

  • OpenCV is included via opencv-python-headless (recommended for server/Spaces environments).
  • Some backends download weights on first use; caches are routed under checkpoints/.
  • Output directories like results/ are created at runtime by the CLIs.


Object Detection (New)

An extensible object detection module is available in detection/, with multiple backends implemented:

  • YOLO (Ultralytics)
  • YOLO-World (Ultralytics, open-vocabulary)
  • Faster R-CNN (torchvision)
  • RetinaNet (torchvision)
  • SSD (torchvision)
  • FCOS (torchvision)
  • DETR (transformers)
  • Deformable DETR (transformers, if supported by your installed version)
  • EfficientDet (optional, requires effdet)
  • Grounding DINO (transformers, open-vocabulary)

Open-vocabulary detectors (YOLO-World / Grounding DINO) require text prompts/classes at runtime.

Open-vocabulary example (Grounding DINO):

python roi_detection_eval.py \
    --before data/images/car/0016cf15fa4d4e16.jpg \
    --checkpoint checkpoints/tic_lambda_0.0483.pth.tar \
    --sigma 0.3 \
    --seg-method yolo --seg-classes car \
    --detectors grounding_dino \
    --open-vocab-classes "car,person" \
    --viz-dir results/det_viz

Programmatic API

The application exposes a Gradio API for programmatic access to all features:

Image API

  • /segment - Segment image β†’ mask or overlay
  • /compress - Compress image with optional ROI mask
  • /detect - Run object detection β†’ JSON or overlay
  • /process - Full pipeline: segment β†’ compress β†’ detect

Video API (Buffered)

  • /segment_video - Segment video β†’ mask file or overlay video
  • /compress_video - Compress video with optional cached masks
  • /detect_video - Run detection on video β†’ JSON or overlay video
  • /process_video - Full pipeline with static/dynamic modes

Video API (Streaming, New)

  • /stream_process_video - Stream compressed chunks progressively (HLS-style)
  • /stream_compress_video - Stream chunks with pre-computed masks

Key difference: streaming endpoints yield chunks as they are produced (low latency, roughly one second to the first chunk) instead of buffering the entire video, which makes them well suited to real-time streaming applications.
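The difference between the buffered and streaming endpoints is the standard generator pattern: rather than returning one result after the whole video is processed, a streaming endpoint yields each chunk as soon as it is encoded. A toy sketch of that distinction (encode is a hypothetical stand-in for the real per-chunk compression):

```python
def encode(chunk):
    # Hypothetical stand-in for ROI-compressing a chunk of frames.
    return {"frames": len(chunk)}

def process_buffered(frames, chunk_size=8):
    """Buffered: nothing is returned until every chunk is done."""
    chunks = []
    for i in range(0, len(frames), chunk_size):
        chunks.append(encode(frames[i:i + chunk_size]))
    return chunks  # caller waits for the whole video

def process_streaming(frames, chunk_size=8):
    """Streaming: each chunk is yielded as soon as it is encoded."""
    for i in range(0, len(frames), chunk_size):
        yield encode(frames[i:i + chunk_size])  # caller sees ~one-chunk latency
```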

See API.md for complete documentation with examples.
See STREAMING_API.md for streaming API guide and comparison.

Quick Example

import json

from gradio_client import Client, handle_file

client = Client("http://localhost:7860")

# Image: segment β†’ compress β†’ detect
compressed, mask, bpp, ratio, coverage, detections = client.predict(
    handle_file("image.jpg"),
    "car, person",  # mission prompt
    "sam3",         # ROI method
    4,              # quality level (1-5)
    0.3,            # sigma (background preservation)
    True,           # run detection
    "yolo",         # detection method
    "",             # detection classes
    api_name="/process"
)

# Video: streaming compression (chunk-by-chunk)
chunk_stream = client.submit(
    handle_file("video.mp4"),
    "person, car",
    "sam3", "static",
    4, 0.3, 15.0,
    api_name="/stream_process_video"
)

for chunk_json in chunk_stream:
    chunk = json.loads(chunk_json)
    if chunk.get("status") == "complete":
        break
    print(f"Chunk {chunk['chunk_index']}: {len(chunk['frames'])} frames")

JavaScript/Frontend Integration

Streaming also works from JavaScript: the @gradio/client package supports async iterators for consuming streamed output:

import { Client } from "@gradio/client";

const client = await Client.connect("http://localhost:7860");
const stream = client.submit("/stream_process_video", {
  video_path: videoFile,
  prompt: "person, car",
  segmentation_method: "sam3",
  mode: "static",
  quality: 4,
  sigma: 0.3,
  output_fps: 15.0,
  frame_format: "jpeg",
  frame_quality: 85
});

for await (const msg of stream) {
  const chunk = JSON.parse(msg.data);
  if (chunk.status === "complete") break;
  
  // Display frames immediately
  displayFrame(`data:image/jpeg;base64,${chunk.frames[0]}`);
}

Complete examples are available in STREAMING_API.md, along with the detailed streaming guide.