---
license: apache-2.0
tags:
- object-detection
- vision-transformer
- coco
- faster-rcnn
- positional-embeddings
- simple-vit
datasets:
- coco
library_name: mmdetection
---
# Simple ViT - Object Detection on COCO

Faster R-CNN detector with a Simple ViT-Tiny backbone that uses learned positional embeddings, trained on COCO 2017.

## Model Details

- **Architecture**: Faster R-CNN with a ViT-Tiny backbone
- **Backbone**: Simple ViT (192-dim embeddings, 12 layers, 3 attention heads)
- **Positional Embedding**: `SIMPLE` (learned positional embeddings)
- **Training Resolution**: 512x512
- **Dataset**: COCO 2017
- **Framework**: MMDetection
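
For orientation, the standard ViT patch-embedding step with the values above can be written as follows. This is a sketch that assumes the usual formulation with non-overlapping patches and no class token; the card does not state either detail. Each 16x16 RGB patch is flattened, projected to 192 dimensions by $\mathbf{E}$, and the learned positional table $\mathbf{E}_{\text{pos}}$ is added:

$$
N = \left(\frac{512}{16}\right)^2 = 32^2 = 1024
$$

$$
\mathbf{z}_0 = \left[\mathbf{x}_p^{1}\mathbf{E};\ \mathbf{x}_p^{2}\mathbf{E};\ \dots;\ \mathbf{x}_p^{N}\mathbf{E}\right] + \mathbf{E}_{\text{pos}},
\qquad
\mathbf{x}_p^{i} \in \mathbb{R}^{16 \cdot 16 \cdot 3},\quad
\mathbf{E} \in \mathbb{R}^{768 \times 192},\quad
\mathbf{E}_{\text{pos}} \in \mathbb{R}^{1024 \times 192}
$$

Each of the 3 heads then attends over the $N = 1024$ tokens with head dimension $192 / 3 = 64$.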

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Image Size | 512x512 |
| Patch Size | 16x16 |
| Hidden Dim | 192 |
| Layers | 12 |
| Heads | 3 |
| MLP Dim | 768 |
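
The table fixes the backbone's size, so its parameter budget and attention cost can be estimated by hand. The sketch below does that arithmetic under stated assumptions that the card does not confirm: a plain ViT encoder with no class token, biased linear layers, two LayerNorms per block, no final norm, and a learned positional table with one row per patch. `vit_backbone_stats` is a hypothetical helper, not part of MMDetection, and it ignores the Faster R-CNN neck and heads.

```python
def vit_backbone_stats(image_size=512, patch_size=16, dim=192, depth=12,
                       heads=3, mlp_dim=768, in_chans=3):
    """Rough shape and parameter arithmetic for a plain ViT encoder.

    Assumed (not stated in the model card): non-overlapping square patches,
    no class token, biased linear layers, two LayerNorms per block, a learned
    positional table with one row per patch, and no final norm. The Faster
    R-CNN neck and heads are not counted.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    if dim % heads:
        raise ValueError("dim must be divisible by heads")
    grid = image_size // patch_size            # patches per side
    num_tokens = grid * grid                   # sequence length per image
    # One transformer block: QKV + output projections, 2-layer MLP, 2 LayerNorms.
    attn_params = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    mlp_params = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    norm_params = 2 * (2 * dim)                # weight + bias for each LayerNorm
    block_params = attn_params + mlp_params + norm_params
    # Patch embedding as a conv/linear map from 16*16*3 pixels to `dim`, with bias.
    patch_embed_params = in_chans * patch_size * patch_size * dim + dim
    pos_embed_params = num_tokens * dim        # learned positional table
    return {
        "grid": grid,
        "num_tokens": num_tokens,
        "head_dim": dim // heads,
        "mlp_ratio": mlp_dim / dim,
        "attn_matrix_entries_per_head": num_tokens * num_tokens,
        "backbone_params": depth * block_params + patch_embed_params + pos_embed_params,
    }


if __name__ == "__main__":
    for key, value in vit_backbone_stats().items():
        print(f"{key}: {value}")
```

With the table's values this gives a 32x32 patch grid, an MLP ratio of 4, a 1024x1024 attention matrix per head, and about 5.68M backbone parameters. The checkpoint also stores the Faster R-CNN neck and heads, so its file size is not directly comparable to this figure.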
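
## Checkpoint Info

- **Repository**: `aadex/simple-vit-detection-coco`
- **Filename**: `best_coco_bbox_mAP_epoch_12.pth`
- **Size**: 114.9 MB
- **Epoch**: 12 (per the filename, the checkpoint with the best validation COCO bbox mAP)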
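
## Usage

The config path below points into the training codebase rather than MMDetection itself, so that file must be available locally alongside the checkpoint.

```python
from mmdet.apis import init_detector, inference_detector
# Project config (not bundled with MMDetection); assumed to register the Simple ViT backbone.
config_file = 'detection/configs/faster_rcnn_simple_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'
# Build the detector and load the weights (pass device='cpu' if no GPU is available).
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# In MMDetection 3.x this returns a DetDataSample; predictions are in result.pred_instances.
result = inference_detector(model, 'test.jpg')
print(result.pred_instances.bboxes, result.pred_instances.scores, result.pred_instances.labels)
```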
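
A common next step is to keep only confident boxes for display or downstream use. The sketch below assumes MMDetection 3.x output, where `result.pred_instances` holds `bboxes` (x1, y1, x2, y2 in pixels), `scores`, and `labels` tensors that can be turned into lists with `.tolist()`. `filter_detections` is a hypothetical helper and the 0.3 threshold is an arbitrary choice; label indices can typically be mapped to names through `model.dataset_meta['classes']`.

```python
from typing import List, Sequence, Tuple

# (box [x1, y1, x2, y2], score, label index)
Detection = Tuple[List[float], float, int]


def filter_detections(bboxes: Sequence[Sequence[float]],
                      scores: Sequence[float],
                      labels: Sequence[int],
                      score_thr: float = 0.3) -> List[Detection]:
    """Keep detections scoring at least `score_thr`, sorted by descending score.

    Hypothetical helper, not part of MMDetection. Inputs are plain lists, e.g.
    result.pred_instances.bboxes.tolist() and the matching scores/labels.
    """
    if not (len(bboxes) == len(scores) == len(labels)):
        raise ValueError("bboxes, scores and labels must have the same length")
    kept = [
        (list(box), float(score), int(label))
        for box, score, label in zip(bboxes, scores, labels)
        if score >= score_thr
    ]
    kept.sort(key=lambda det: det[1], reverse=True)  # most confident first
    return kept


if __name__ == "__main__":
    # Made-up values for illustration only, not model output.
    demo = filter_detections(
        bboxes=[[10, 20, 110, 220], [5, 5, 50, 40], [300, 120, 400, 260]],
        scores=[0.92, 0.12, 0.55],
        labels=[0, 2, 17],
    )
    for box, score, label in demo:
        print(label, round(score, 2), box)
```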

## Citation

If you use this model, please cite:

```bibtex
@misc{vit_detection_coco,
  title={Vision Transformer Object Detection with Simple ViT},
  year={2026},
  publisher={Hugging Face},
}
```

## License

Apache 2.0