Faster R-CNN ResNet-50 FPN for LiteRT

This repository contains an inference-only LiteRT packaging of TorchVision's `fasterrcnn_resnet50_fpn` detector.

The detector is split into three TFLite files:

| File | Role |
| --- | --- |
| `fasterrcnn_resnet50_fpn_backbone_body_dynamic_hw.tflite` | ResNet backbone body: transformed image tensor to C2-C5 feature maps |
| `fasterrcnn_resnet50_fpn_rpn_head_dynamic_hw.tflite` | RPN head: one FPN level to objectness logits and box deltas |
| `fasterrcnn_resnet50_fpn_roi_box_dynamic_n.tflite` | ROI box head and predictor: pooled ROI features to class logits and box deltas |

Everything else stays in TorchVision host code, which orchestrates the three LiteRT submodels: preprocessing, the FPN, anchor and proposal decoding, NMS, ROIAlign, and final postprocessing.
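Both the RPN head and the ROI predictor emit box deltas rather than boxes, so the host has to decode them against reference boxes (anchors or proposals). The following is a minimal NumPy sketch of that decode step, written to mirror the standard Faster R-CNN parameterization. The function name `decode_boxes` is hypothetical and the sample script may do this through TorchVision's own modules instead. The weights `(1, 1, 1, 1)` and `(10, 10, 5, 5)` are TorchVision's defaults for the RPN and the ROI heads respectively.

```python
import math

import numpy as np

# Upper bound on dw/dh before exp(), the same clamp value TorchVision uses.
BBOX_XFORM_CLIP = math.log(1000.0 / 16)


def decode_boxes(deltas, boxes, weights=(1.0, 1.0, 1.0, 1.0)):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) reference boxes.

    deltas and boxes are both (N, 4). Returns decoded (N, 4) boxes.
    """
    deltas = np.asarray(deltas, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    wx, wy, ww, wh = weights

    # Reference boxes as center + size.
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    # Undo the regression weights and clamp the log-scale terms.
    dx = deltas[:, 0] / wx
    dy = deltas[:, 1] / wy
    dw = np.minimum(deltas[:, 2] / ww, BBOX_XFORM_CLIP)
    dh = np.minimum(deltas[:, 3] / wh, BBOX_XFORM_CLIP)

    # Shift the center, scale the size.
    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    # Back to corner format.
    return np.stack(
        [
            pred_ctr_x - 0.5 * pred_w,
            pred_ctr_y - 0.5 * pred_h,
            pred_ctr_x + 0.5 * pred_w,
            pred_ctr_y + 0.5 * pred_h,
        ],
        axis=1,
    )
```

For the ROI stage, the per-class deltas would first be reshaped so that each class has its own 4-vector, then decoded against the same proposal with `weights=(10.0, 10.0, 5.0, 5.0)`.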

Source model documentation: https://docs.pytorch.org/vision/main/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn.html

Usage

Install runtime dependencies:

pip install ai-edge-litert torch torchvision pillow numpy

Run the sample from this repository directory:

python sample_torchvision_fasterrcnn_litert_cpu.py

The script defaults to the three .tflite files in the same directory as the sample. You can override them explicitly:

python sample_torchvision_fasterrcnn_litert_cpu.py \
  --backbone-model fasterrcnn_resnet50_fpn_backbone_body_dynamic_hw.tflite \
  --rpn-head-model fasterrcnn_resnet50_fpn_rpn_head_dynamic_hw.tflite \
  --roi-model fasterrcnn_resnet50_fpn_roi_box_dynamic_n.tflite \
  --image https://github.com/pytorch/hub/raw/master/images/dog.jpg

Example output:

input image: original=(1213, 1546) transformed=(1, 3, 800, 1024)
host proposal decode/NMS: 1000 proposals
ROI LiteRT outputs: logits=(1000, 91) box_regression=(1000, 364)
detections above 0.50: 1
  01: dog score=0.9669 box=[137.84, 67.8, 1386.9, 1172.82]
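
The `transformed` shape in the output above follows from TorchVision's default detection preprocessing: the shorter side is scaled to 800 pixels (capped so the longer side stays within 1333), and the result is padded up to a multiple of 32. The sketch below reproduces that arithmetic only. The helper name `transformed_hw` is hypothetical, and it uses exact rational arithmetic as a simplification where TorchVision works in floating point, so edge cases could differ by a pixel.

```python
from fractions import Fraction
import math


def transformed_hw(height, width, min_size=800, max_size=1333, size_divisible=32):
    """Estimate the padded (H, W) that the detector body receives."""
    # Scale so the short side reaches min_size, unless that would push
    # the long side past max_size.
    scale = min(
        Fraction(min_size, min(height, width)),
        Fraction(max_size, max(height, width)),
    )
    resized_h = math.floor(height * scale)
    resized_w = math.floor(width * scale)

    # Pad each side up to the next multiple of size_divisible.
    padded_h = -(-resized_h // size_divisible) * size_divisible
    padded_w = -(-resized_w // size_divisible) * size_divisible
    return padded_h, padded_w


print(transformed_hw(1213, 1546))  # (800, 1024)
```

The ROI output shapes are consistent in the same way: 91 class logits per proposal (the COCO label space including background) and 91 × 4 = 364 box deltas.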

Notes

This is not a single-file object detector. The sample script is part of the runtime contract and uses TorchVision host modules for the portions that are not represented by the three LiteRT submodels.

The TFLite files use dynamic image height/width where the current CPU LiteRT runtime path supports it. The sample runs with HardwareAccelerator.CPU.
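
As an illustration of what dynamic height/width means on the host side, here is a hedged sketch that resizes a model's first input to match an incoming array before invoking it. It assumes the classic `Interpreter` interface from `ai_edge_litert.interpreter` (the same method names as `tf.lite.Interpreter`), not the `HardwareAccelerator.CPU` path the sample script uses, and the helper name `run_with_dynamic_input` is hypothetical.

```python
def run_with_dynamic_input(interpreter, array):
    """Resize the first input to array.shape, run once, return all outputs.

    `interpreter` is expected to expose the tf.lite-style Interpreter
    methods; `array` is expected to already have the dtype and layout
    the model wants.
    """
    input_index = interpreter.get_input_details()[0]["index"]

    # Tell the runtime the concrete shape, then (re)allocate buffers.
    interpreter.resize_tensor_input(input_index, list(array.shape))
    interpreter.allocate_tensors()

    interpreter.set_tensor(input_index, array)
    interpreter.invoke()

    # Output details are re-read after invoke so the shapes are current.
    return [
        interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    ]


# Possible usage (assumption, not taken from the sample script):
#   from ai_edge_litert.interpreter import Interpreter
#   interpreter = Interpreter(
#       model_path="fasterrcnn_resnet50_fpn_backbone_body_dynamic_hw.tflite")
#   features = run_with_dynamic_input(interpreter, transformed_image)
```

Reallocating on every call keeps the sketch simple; a real host loop would likely skip the resize when the shape has not changed.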

BibTeX entry and citation info

@article{DBLP:journals/corr/RenHG015,
  author       = {Shaoqing Ren and
                  Kaiming He and
                  Ross B. Girshick and
                  Jian Sun},
  title        = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
                  Networks},
  journal      = {CoRR},
  volume       = {abs/1506.01497},
  year         = {2015},
  url          = {http://arxiv.org/abs/1506.01497},
  eprinttype   = {arXiv},
  eprint       = {1506.01497},
  timestamp    = {Mon, 13 Aug 2018 16:46:02 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/RenHG015.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
