---
license: apache-2.0
library_name: mmdetection
tags:
- object-detection
- vision-transformer
- mmdetection
- pytorch
- faster-rcnn
datasets:
- coco
metrics:
- map
---
# Faster R-CNN with RoPE-ViT Backbone for Object Detection
This is a Faster R-CNN object detector with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.
## Model Description
- **Architecture:** Faster R-CNN
- **Backbone:** RoPE-ViT Tiny
- **Dataset:** COCO
- **Task:** Object Detection
- **Framework:** MMDetection
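The distinguishing feature of the backbone is its rotary position embedding, which encodes a token's position by rotating pairs of channels in the query/key vectors rather than adding a learned embedding. As a rough illustration (not the model's actual implementation, which lives in the MMDetection config), a minimal 1-D RoPE sketch in NumPy:

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply 1-D rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs (2i, 2i+1) are rotated by an angle proportional to the
    token position, so the dot product between two rotated vectors depends
    on their relative position. `base` matches the common RoPE convention.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE requires an even channel dimension"
    freqs = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,) per-pair frequencies
    angles = np.asarray(positions)[:, None] * freqs[None, :]  # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin          # rotate each channel pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Because the operation is a pure rotation, it preserves vector norms, and position 0 leaves the input unchanged. RoPE-ViT extends this idea to 2-D patch grids.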
## Training Results
| Metric | Value |
|--------|-------|
| bbox_mAP | 0.0680 |
| bbox_mAP_50 | 0.1510 |
| bbox_mAP_75 | 0.0530 |
| bbox_mAP_s (small) | 0.0360 |
| bbox_mAP_m (medium) | 0.1260 |
| bbox_mAP_l (large) | 0.0640 |
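These are COCO-style metrics: `bbox_mAP` averages precision over IoU thresholds 0.50 to 0.95 in steps of 0.05, while `bbox_mAP_50` and `bbox_mAP_75` fix the IoU threshold at 0.50 and 0.75. A minimal IoU function (for illustration only; MMDetection's evaluator computes this internally) shows what the thresholds measure:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box counts as a true positive at `mAP_50` if its IoU with a ground-truth box is at least 0.5; the stricter 0.75 threshold explains why `bbox_mAP_75` is lower than `bbox_mAP_50`.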
## Usage
```python
from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model (use device='cpu' if no GPU is available)
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on an image
result = inference_detector(model, 'demo.jpg')
```
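In MMDetection 3.x, `inference_detector` returns a `DetDataSample` whose `pred_instances` field holds parallel `bboxes`, `scores`, and `labels` arrays. A common post-processing step is to keep only confident detections; a framework-free sketch of that filter (the names and threshold here are illustrative, not part of the MMDetection API):

```python
def filter_detections(bboxes, scores, labels, score_thr=0.3):
    """Keep detections whose confidence score meets score_thr.

    bboxes: sequence of (x1, y1, x2, y2) tuples; scores and labels are
    parallel sequences. Mirrors the score filtering typically applied
    before visualizing detector output.
    """
    keep = [i for i, s in enumerate(scores) if s >= score_thr]
    return ([bboxes[i] for i in keep],
            [scores[i] for i in keep],
            [labels[i] for i in keep])
```

For example, with `scores=[0.9, 0.1]` and the default threshold, only the first detection survives.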
## Training Configuration
The model was trained with the following configuration:
- Input size: 512x512
- Training epochs: 12
- Optimizer: SGD with momentum
- Learning rate scheduler: Step decay
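In MMDetection 3.x these settings are expressed as Python config fragments. The sketch below is a hypothetical fragment matching the list above; the learning rate, milestones, and weight decay are assumed placeholder values (the authoritative values are in `faster_rcnn_rope_vit_tiny_coco.py`):

```python
# Hypothetical MMDetection-style config fragment; lr, milestones, and
# weight_decay are assumptions, not the model's verified hyperparameters.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=1e-4),
)
param_scheduler = [
    dict(type='MultiStepLR', begin=0, end=12, by_epoch=True,
         milestones=[8, 11], gamma=0.1),  # step decay of the learning rate
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)
```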
## Citation
If you use this model, please cite:
```bibtex
@misc{rope-vit-detection,
author = {VLG IITR},
title = {Faster R-CNN with RoPE-ViT for Object Detection},
year = {2026},
publisher = {Hugging Face},
}
```
## License
This model is released under the Apache 2.0 license.