---
license: apache-2.0
library_name: mmdetection
tags:
  - object-detection
  - vision-transformer
  - mmdetection
  - pytorch
  - faster-rcnn
datasets:
  - coco
metrics:
  - map
---

# Faster R-CNN with RoPE-ViT Backbone for Object Detection

This model is a Faster R-CNN object detection model with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.

## Model Description

- **Architecture:** Faster R-CNN
- **Backbone:** RoPE-ViT Tiny
- **Dataset:** COCO
- **Task:** Object Detection
- **Framework:** MMDetection

## Training Results

| Metric | Value |
|--------|-------|
| bbox_mAP | 0.0680 |
| bbox_mAP_50 | 0.1510 |
| bbox_mAP_75 | 0.0530 |
| bbox_mAP_s (small) | 0.0360 |
| bbox_mAP_m (medium) | 0.1260 |
| bbox_mAP_l (large) | 0.0640 |

## Usage

```python
from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model (use device='cpu' if no GPU is available)
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on an image
result = inference_detector(model, 'demo.jpg')

# In MMDetection 3.x, the result is a DetDataSample; predicted boxes,
# scores, and class labels live in result.pred_instances
print(result.pred_instances.bboxes)
print(result.pred_instances.scores)
print(result.pred_instances.labels)
```

## Training Configuration

The model was trained with the following configuration:
- Input size: 512x512
- Training epochs: 12
- Optimizer: SGD with momentum
- Learning rate scheduler: Step decay
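The settings above might look roughly like the following MMDetection 3.x config fragment. This is an illustrative sketch, not the exact training config: the concrete learning rate, momentum, weight decay, and step milestones are assumptions, since only the optimizer family, scheduler type, and epoch count are stated here.

```python
# Illustrative MMDetection 3.x config fragment. The hyperparameter values
# (lr, momentum, weight_decay, milestones) are assumptions, not the exact
# settings used to train this checkpoint.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))

# Step decay: drop the learning rate by 10x at assumed milestone epochs
param_scheduler = [
    dict(type='MultiStepLR', begin=0, end=12, by_epoch=True,
         milestones=[8, 11], gamma=0.1)
]

# 12 training epochs, validating once per epoch
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)
```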

## Citation

If you use this model, please cite:

```bibtex
@misc{rope-vit-detection,
  author = {VLG IITR},
  title = {Faster R-CNN with RoPE-ViT for Object Detection},
  year = {2026},
  publisher = {Hugging Face},
}
```

## License

This model is released under the Apache 2.0 license.