---
license: apache-2.0
library_name: mmdetection
tags:
- object-detection
- vision-transformer
- mmdetection
- pytorch
- faster-rcnn
datasets:
- coco
metrics:
- map
---

# Faster R-CNN with RoPE-ViT Backbone for Object Detection

This is a Faster R-CNN object detector with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.

## Model Description

- **Architecture:** Faster R-CNN
- **Backbone:** RoPE-ViT Tiny
- **Dataset:** COCO
- **Task:** Object Detection
- **Framework:** MMDetection

## Training Results

| Metric | Value |
|--------|-------|
| bbox_mAP | 0.0680 |
| bbox_mAP_50 | 0.1510 |
| bbox_mAP_75 | 0.0530 |
| bbox_mAP_s (small) | 0.0360 |
| bbox_mAP_m (medium) | 0.1260 |
| bbox_mAP_l (large) | 0.0640 |

## Usage

```python
from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on an image
result = inference_detector(model, 'demo.jpg')
```

## Training Configuration

The model was trained with the following configuration:

- Input size: 512x512
- Training epochs: 12
- Optimizer: SGD with momentum
- Learning rate scheduler: Step decay

## Citation

If you use this model, please cite:

```bibtex
@misc{rope-vit-detection,
  author = {VLG IITR},
  title = {Faster R-CNN with RoPE-ViT for Object Detection},
  year = {2026},
  publisher = {Hugging Face},
}
```

## License

This model is released under the Apache 2.0 license.
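
The training configuration listed above could look roughly like the following fragment in MMDetection 3.x config style. This is a sketch, not the shipped config: the learning rate, momentum, weight decay, and step milestones are assumptions, so refer to `faster_rcnn_rope_vit_tiny_coco.py` for the actual values.

```python
# Hypothetical MMDetection 3.x config fragment matching the schedule above.
# lr, momentum, weight_decay, and milestones are assumed values.

# 12 training epochs, validating every epoch
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)

# SGD with momentum (exact hyperparameters are assumptions)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))

# Step decay of the learning rate (assumed milestones)
param_scheduler = [
    dict(type='MultiStepLR', by_epoch=True, milestones=[8, 11], gamma=0.1)
]
```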