---
license: apache-2.0
library_name: mmdetection
tags:
- object-detection
- vision-transformer
- mmdetection
- pytorch
- faster-rcnn
datasets:
- coco
metrics:
- map
---

# Faster R-CNN with RoPE-ViT Backbone for Object Detection

This is a Faster R-CNN object detector with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.

## Model Description

- **Architecture:** Faster R-CNN
- **Backbone:** RoPE-ViT Tiny
- **Dataset:** COCO
- **Task:** Object Detection
- **Framework:** MMDetection

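For intuition on the backbone: rotary position embeddings encode position by rotating each consecutive pair of query/key channels through a position-dependent angle, so attention scores depend only on relative positions. A minimal NumPy sketch of the idea (illustrative only, not the exact RoPE-ViT implementation, which applies 2D rotations over image patch coordinates):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embeddings to vectors x of shape (seq, dim).

    Each channel pair (2i, 2i+1) is rotated by pos * theta_i, with
    theta_i = base**(-2i/dim). Rotations preserve norms, and the dot
    product of a rotated query and key depends only on their relative
    position.
    """
    seq, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)   # (half,) frequencies
    angles = pos[:, None] * theta[None, :]           # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated rather than translated, shifting a query and key by the same offset leaves their dot product unchanged, which is the relative-position property RoPE is built around.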
## Training Results

| Metric | Value |
|--------|-------|
| bbox_mAP | 0.0680 |
| bbox_mAP_50 | 0.1510 |
| bbox_mAP_75 | 0.0530 |
| bbox_mAP_s (small) | 0.0360 |
| bbox_mAP_m (medium) | 0.1260 |
| bbox_mAP_l (large) | 0.0640 |

## Usage

```python
from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on an image
result = inference_detector(model, 'demo.jpg')
```
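A common next step is filtering raw detections by a confidence threshold. A minimal sketch on plain tuples (the exact return type of `inference_detector` depends on the MMDetection version — arrays per class in 2.x, a `DetDataSample` with `pred_instances` in 3.x — so this deliberately uses a version-neutral format):

```python
def filter_detections(dets, score_thr=0.3):
    """Keep detections whose confidence meets the threshold.

    `dets` is a list of (x1, y1, x2, y2, score, label) tuples; adapt the
    unpacking to whatever structure your MMDetection version returns.
    """
    return [d for d in dets if d[4] >= score_thr]

dets = [(10, 10, 50, 60, 0.91, 0), (5, 5, 20, 20, 0.12, 2)]
print(filter_detections(dets))  # keeps only the 0.91 detection
```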

## Training Configuration

The model was trained with the following configuration:
- Input size: 512x512
- Training epochs: 12
- Optimizer: SGD with momentum
- Learning rate scheduler: Step decay

## Citation

If you use this model, please cite:

```bibtex
@misc{rope-vit-detection,
  author = {VLG IITR},
  title = {Faster R-CNN with RoPE-ViT for Object Detection},
  year = {2026},
  publisher = {Hugging Face},
}
```

## License

This model is released under the Apache 2.0 license.