simple-vit-detection-coco / README.md

aadex


metadata

license: apache-2.0
tags:
  - object-detection
  - vision-transformer
  - coco
  - faster-rcnn
  - positional-embeddings
  - simple-vit
datasets:
  - coco
library_name: mmdetection

Simple ViT - Object Detection on COCO

Faster R-CNN with Simple ViT-Tiny backbone (learned positional embeddings)

Model Details

Architecture: Faster R-CNN with ViT-Tiny backbone
Backbone: Simple ViT (192-dim, 12 layers, 3 heads)
Positional Embedding: SIMPLE (learned positional embeddings)
Training Resolution: 512x512
Dataset: COCO 2017
Framework: MMDetection

Training Configuration

Parameter	Value
Image Size	512x512
Patch Size	16x16
Hidden Dim	192
Layers	12
Heads	3
MLP Dim	768
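
The values in the table imply a few derived quantities that are useful when sanity-checking a config. A minimal sketch of the arithmetic, assuming a standard ViT layout with non-overlapping patches and evenly split attention heads (the variable names are illustrative and are not taken from the actual config file):

```python
# Values copied from the Training Configuration table.
image_size = 512
patch_size = 16
hidden_dim = 192
num_layers = 12
num_heads = 3
mlp_dim = 768

# Non-overlapping patches: (512 / 16) ** 2 tokens per image.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Each head attends over an equal slice of the hidden dimension.
head_dim = hidden_dim // num_heads

# Ratio of MLP width to hidden width.
mlp_ratio = mlp_dim // hidden_dim

print(f"patch grid: {patches_per_side}x{patches_per_side} ({num_patches} tokens)")
print(f"head dim: {head_dim}, MLP ratio: {mlp_ratio}")
```

Under these assumptions the backbone sees a 32x32 grid (1024 patch tokens) per 512x512 image, with 64 dimensions per head and an MLP ratio of 4.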

Checkpoint Info

Filename: best_coco_bbox_mAP_epoch_12.pth
Size: 114.9 MB
Epoch: 12
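
One way to fetch the checkpoint is `hf_hub_download` from `huggingface_hub`. This is a sketch, not a documented workflow for this model: the repository id `aadex/simple-vit-detection-coco` is inferred from the uploader and repository name and is an assumption, as is the file sitting at the repository root.

```python
REPO_ID = "aadex/simple-vit-detection-coco"  # assumed repository id, not confirmed by the card
CHECKPOINT_NAME = "best_coco_bbox_mAP_epoch_12.pth"  # filename from Checkpoint Info


def download_checkpoint(repo_id: str = REPO_ID, filename: str = CHECKPOINT_NAME) -> str:
    """Download the checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Calling `download_checkpoint()` returns a local path that can be passed as `checkpoint_file` to `init_detector`.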

Usage

from mmdet.apis import init_detector, inference_detector

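# Config that defines the Faster R-CNN + Simple ViT-Tiny model, and the trained weights.
config_file = 'detection/configs/faster_rcnn_simple_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Build the detector and load the checkpoint onto the first GPU, then run inference on one image.
model = init_detector(config_file, checkpoint_file, device='cuda:0')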
result = inference_detector(model, 'test.jpg')
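
How `result` is read depends on the MMDetection version. The sketch below assumes MMDetection 3.x, where `inference_detector` returns a `DetDataSample` whose `pred_instances` holds `bboxes`, `scores`, and `labels` tensors. The helper `filter_detections` and the 0.5 threshold are illustrative and are not part of this repository.

```python
from typing import List, Sequence, Tuple

# (box as [x1, y1, x2, y2], score, class index)
Detection = Tuple[List[float], float, int]


def filter_detections(
    bboxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    labels: Sequence[int],
    score_thr: float = 0.5,
) -> List[Detection]:
    """Keep detections whose score is at least score_thr, highest score first."""
    kept = [
        (list(box), float(score), int(label))
        for box, score, label in zip(bboxes, scores, labels)
        if score >= score_thr
    ]
    return sorted(kept, key=lambda det: det[1], reverse=True)


# Assumed MMDetection 3.x usage (not executed here):
#   pred = result.pred_instances
#   detections = filter_detections(
#       pred.bboxes.tolist(), pred.scores.tolist(), pred.labels.tolist()
#   )
```

On MMDetection 2.x the return value is instead a per-class list of arrays, so the conversion step would need to change.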

Citation

If you use this model, please cite:

@misc{vit_detection_coco,
  title={Vision Transformer Object Detection with Simple ViT},
  year={2026},
  publisher={Hugging Face},
}

License

Apache 2.0