metadata
title: Vision Edge
emoji: 👁
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.9.1
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
tags:
  - object-detection
  - computer-vision
  - mobilenetv3
  - faster-rcnn
  - edge-deployment
short_description: Object detection with MobileNetV3 Faster R-CNN

Vision Edge: Object Detection

Object detection using torchvision's Faster R-CNN with a MobileNetV3-Large FPN backbone, pre-trained on COCO.

What This Demonstrates

  • Edge-friendly architecture: MobileNetV3 is designed for mobile and edge inference; the MobileNetV3-Large backbone has roughly 5× fewer parameters than ResNet-50 (about 5.5M vs. 25.6M in torchvision)
  • Pre-trained on COCO: 91 category IDs in torchvision's label map (80 of them actual object classes), covering people, vehicles, animals, furniture, food, and sports equipment
  • CPU-only inference: runs on HF's free tier without any GPU
  • Production export pipeline: the full source repo supports TFLite, ONNX, INT8 quantization, and Edge TPU deployment

How to Use

  1. Upload an image or pick an example
  2. Adjust the confidence threshold (default 0.5)
  3. Click "Run Detection"
  4. See annotated output with bounding boxes and per-detection confidence

Inference latency on HF's CPU tier: ~0.5–2 seconds per image.
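
As a rough illustration of step 2, the confidence threshold can be thought of as a filter over the model's raw detections. The sketch below uses plain Python; the `Detection` tuple and the `filter_by_confidence` and `format_detection` helpers are hypothetical names chosen for this example, not code taken from app.py.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One detection (hypothetical structure for illustration)."""
    label: str
    score: float
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def filter_by_confidence(detections, threshold=0.5):
    """Keep detections scoring at or above the threshold, highest score first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


def format_detection(d):
    """Render one line of the detections list, e.g. 'person: 92%'."""
    return f"{d.label}: {d.score:.0%}"


raw = [
    Detection("person", 0.92, (10.0, 20.0, 110.0, 220.0)),
    Detection("dog", 0.47, (150.0, 90.0, 260.0, 200.0)),
    Detection("bicycle", 0.61, (40.0, 120.0, 180.0, 230.0)),
]
for det in filter_by_confidence(raw, threshold=0.5):
    print(format_detection(det))
```

Raising the threshold trades recall for precision: fewer boxes are drawn, but those that remain are the ones the model is most sure about.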

Architecture

┌─────────────────────────────────────┐
│      Image Upload (PIL)             │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  torchvision Transform              │
│  (resize, normalize)                │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  MobileNetV3-Large FPN Backbone     │
│  (feature extraction)               │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  Faster R-CNN Detection Head        │
│  (region proposals + classifier)    │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  Annotated Image + Detections List  │
└─────────────────────────────────────┘

Edge Deployment Path

The full vision-edge pipeline in the source repo additionally supports:

  • TFLite export for Android / iOS
  • INT8 quantization with post-training calibration
  • Edge TPU compilation for Google Coral boards
  • ONNX export for any ML runtime
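
To make the INT8 step more concrete, the sketch below shows the affine quantization arithmetic that post-training calibration typically relies on: a scale and zero-point are derived from the value range seen in calibration data, then used to map floats to 8-bit integers. This is a generic pure-Python illustration, not the source repo's exporter; `calibrate`, `quantize`, and `dequantize` are hypothetical helper names, and real converters usually apply this per tensor or per channel with more careful range estimation.

```python
def calibrate(samples, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the min/max of calibration samples."""
    # The range is widened to include 0.0 so that zero maps to an exact integer.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all samples are 0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 value, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value from its int8 representation."""
    return (q - zero_point) * scale


# Assumed calibration values, standing in for activations observed on sample images.
calibration_values = [-1.0, 0.0, 0.5, 3.0]
scale, zero_point = calibrate(calibration_values)
for value in calibration_values:
    q = quantize(value, scale, zero_point)
    print(value, "->", q, "->", round(dequantize(q, scale, zero_point), 4))
```

Values inside the calibrated range round-trip with an error of at most half a quantization step (`scale / 2`); values outside it are clamped, which is why the calibration data should resemble real inputs.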

License

MIT