---
license: apache-2.0
tags:
- object-detection
- vision-transformer
- coco
- faster-rcnn
- positional-embeddings
- simple-vit
datasets:
- coco
library_name: mmdetection
---
# Simple ViT - Object Detection on COCO
Faster R-CNN with a Simple ViT-Tiny backbone (learned positional embeddings), trained for object detection on COCO 2017.
## Model Details
- **Architecture**: Faster R-CNN with ViT-Tiny backbone
- **Backbone**: Simple ViT (192-dim, 12 layers, 3 heads)
- **Positional Embedding**: SIMPLE (learned positional embeddings)
- **Training Resolution**: 512x512
- **Dataset**: COCO 2017
- **Framework**: MMDetection
## Training Configuration
| Parameter | Value |
|-----------|-------|
| Image Size | 512x512 |
| Patch Size | 16x16 |
| Hidden Dim | 192 |
| Layers | 12 |
| Heads | 3 |
| MLP Dim | 768 |
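
The values in the table determine a few derived quantities that are useful when checking a config against this card. The sketch below only does arithmetic on the numbers listed above; the variable names are illustrative and are not taken from the training config, and it does not account for any extra tokens (such as a class token) that the backbone may or may not use.

```python
# Values copied from the Training Configuration table.
image_size = 512
patch_size = 16
hidden_dim = 192
num_heads = 3
mlp_dim = 768

# Patches along one side of the image, and total patch tokens per image.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Per-head attention width and MLP expansion ratio.
head_dim = hidden_dim // num_heads
mlp_ratio = mlp_dim / hidden_dim

print(f"patch grid: {patches_per_side}x{patches_per_side} ({num_patches} tokens)")
print(f"head dim: {head_dim}, MLP ratio: {mlp_ratio}")
```

At the training resolution this gives a 32x32 patch grid (1024 patch tokens), 64 dimensions per attention head, and an MLP expansion ratio of 4.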
## Checkpoint Info
- **Filename**: `best_coco_bbox_mAP_epoch_12.pth`
- **Size**: 114.9 MB
- **Epoch**: 12
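
A quick way to catch a truncated download is to compare the local file size against the size listed above. The following is a minimal sketch using only the standard library; the helper names and the tolerance are hypothetical, and because the card does not say whether "MB" means 10^6 or 2^20 bytes, the check accepts either interpretation.

```python
import os

def file_size_mb(path):
    """Return the file size as (decimal MB, binary MiB)."""
    size_bytes = os.path.getsize(path)
    return size_bytes / 1e6, size_bytes / 2**20

def looks_complete(path, expected_mb=114.9, tolerance_mb=0.1):
    """True if the size matches expected_mb under either unit convention."""
    decimal_mb, binary_mb = file_size_mb(path)
    return (abs(decimal_mb - expected_mb) <= tolerance_mb
            or abs(binary_mb - expected_mb) <= tolerance_mb)
```

For example, `looks_complete('best_coco_bbox_mAP_epoch_12.pth')` should return `True` for a complete download, assuming the file sits in the current directory. A size match is only a coarse check and does not verify file integrity.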
## Usage
```python
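from mmdet.apis import init_detector, inference_detector

# Paths to the model config and the checkpoint listed above
config_file = 'detection/configs/faster_rcnn_simple_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Build the detector and load the weights (use device='cpu' if no GPU is available)
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on a single image
result = inference_detector(model, 'test.jpg')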
```
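
Detections are usually filtered by confidence before being displayed or evaluated. The sketch below shows one way to do that on plain Python sequences; `filter_detections` and the 0.5 threshold are hypothetical and not part of this repository. It assumes the boxes, scores, and labels have already been extracted from `result`. In MMDetection 3.x they are typically available as `result.pred_instances.bboxes`, `.scores`, and `.labels`, while the result format differs in 2.x, and this card does not state which version was used.

```python
def filter_detections(bboxes, scores, labels, score_thr=0.5):
    """Keep detections whose score is at least score_thr.

    bboxes: sequence of [x1, y1, x2, y2] boxes
    scores: sequence of confidence scores, same length as bboxes
    labels: sequence of integer class indices, same length as bboxes
    """
    kept = []
    for box, score, label in zip(bboxes, scores, labels):
        if score >= score_thr:
            kept.append({
                "bbox": [float(v) for v in box],
                "score": float(score),
                "label": int(label),
            })
    return kept

# Toy values for illustration only; these are not real model outputs.
example = filter_detections(
    bboxes=[[10, 20, 110, 220], [5, 5, 50, 50]],
    scores=[0.92, 0.30],
    labels=[0, 17],
    score_thr=0.5,
)
print(example)
```

With tensors from MMDetection, convert them first (for instance with `.cpu().tolist()`) before passing them to a helper like this one.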
## Citation
If you use this model, please cite:
```bibtex
@misc{vit_detection_coco,
title={Vision Transformer Object Detection with Simple ViT},
year={2026},
publisher={Hugging Face},
}
```
## License
Apache 2.0