---
license: apache-2.0
library_name: mmdetection
tags:
- object-detection
- vision-transformer
- mmdetection
- pytorch
- faster-rcnn
datasets:
- coco
metrics:
- map
---

# Faster R-CNN with RoPE-ViT Backbone for Object Detection

This is a Faster R-CNN object detector with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.

## Model Description

- **Architecture:** Faster R-CNN
- **Backbone:** RoPE-ViT Tiny
- **Dataset:** COCO
- **Task:** Object Detection
- **Framework:** MMDetection

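For intuition on the backbone: rotary position embeddings encode position by rotating each consecutive pair of query/key channels through a position-dependent angle, so attention scores depend only on relative positions. A minimal NumPy sketch of the idea (illustrative only, not the exact RoPE-ViT implementation, which applies 2D rotations over image patch coordinates):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embeddings to vectors x of shape (seq, dim).

    Each channel pair (2i, 2i+1) is rotated by pos * theta_i, with
    theta_i = base**(-2i/dim). Rotations preserve norms, and the dot
    product of a rotated query and key depends only on their relative
    position.
    """
    seq, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)   # (half,) frequencies
    angles = pos[:, None] * theta[None, :]           # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated rather than translated, shifting a query and key by the same offset leaves their dot product unchanged, which is the relative-position property RoPE is built around.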
## Training Results

| Metric | Value |
|--------|-------|
| bbox_mAP | 0.0680 |
| bbox_mAP_50 | 0.1510 |
| bbox_mAP_75 | 0.0530 |
| bbox_mAP_s (small) | 0.0360 |
| bbox_mAP_m (medium) | 0.1260 |
| bbox_mAP_l (large) | 0.0640 |

## Usage

```python
from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on an image
result = inference_detector(model, 'demo.jpg')
```
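A common next step is filtering raw detections by a confidence threshold. A minimal sketch on plain tuples (the exact return type of `inference_detector` depends on the MMDetection version — arrays per class in 2.x, a `DetDataSample` with `pred_instances` in 3.x — so this deliberately uses a version-neutral format):

```python
def filter_detections(dets, score_thr=0.3):
    """Keep detections whose confidence meets the threshold.

    `dets` is a list of (x1, y1, x2, y2, score, label) tuples; adapt the
    unpacking to whatever structure your MMDetection version returns.
    """
    return [d for d in dets if d[4] >= score_thr]

dets = [(10, 10, 50, 60, 0.91, 0), (5, 5, 20, 20, 0.12, 2)]
print(filter_detections(dets))  # keeps only the 0.91 detection
```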

## Training Configuration

The model was trained with the following configuration:
- Input size: 512x512
- Training epochs: 12
- Optimizer: SGD with momentum
- Learning rate scheduler: Step decay

## Citation

If you use this model, please cite:

```bibtex
@misc{rope-vit-detection,
  author = {VLG IITR},
  title = {Faster R-CNN with RoPE-ViT for Object Detection},
  year = {2026},
  publisher = {Hugging Face},
}
```

## License

This model is released under the Apache 2.0 license.