FastPose-ViT: Pretrained Weights
Pretrained weights for FastPose-ViT, a Vision Transformer pipeline for real-time 6D spacecraft pose estimation.
Available Weights
All models are trained on the SPEED dataset.
| File | Model | Task | Input Resolution |
|---|---|---|---|
| vit_b_16_384.pth | ViT-B/16-384 | Pose estimation (6D) | 384x384 |
| vit_b_16.pth | ViT-B/16 | Pose estimation (6D) | 224x224 |
| small.pth | LW-DETR Small | Object detection (bbox) | 512x512 |
Usage
- Clone the repository:
git clone https://github.com/PierreAncey/FastPose-ViT.git
cd FastPose-ViT
- Download the weights and place them in a weights/ directory.
- Run evaluation on the SPEED dataset:
DATASET=SPEED_FIXED && \
python3 src/evaluate.py \
--model_weights weights/vit_b_16_384.pth \
--rotation_format matrix \
--num_hidden_layers 0 \
--hidden_layer_dim 0 \
--nb_class_tokens 1 \
--batch_size 8 \
--vit_model vit_b_16_384 \
--dataset SPEED \
--dataset_root_dir $DATASET \
--num_workers 8 \
--merge_outputs \
--no_mlp
- Run the object detector:
DATASET=SPEED_FIXED && \
python3 object_detector/evaluate.py \
--dataset_root_dir $DATASET \
--model_variant small \
--model_weights weights/small.pth
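Both commands above read their checkpoints from the weights/ directory, so it can help to confirm the files are in place before launching an evaluation. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and the file names are taken from the table of available weights.

```python
from pathlib import Path

# File names as listed in the "Available Weights" table.
EXPECTED_WEIGHTS = ("vit_b_16_384.pth", "vit_b_16.pth", "small.pth")


def missing_weights(weights_dir="weights", expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are absent from weights_dir.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All weight files found.")
```

Run it from the repository root so that the relative weights/ path resolves to the same directory the evaluation commands use.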
Model Details
- Pose estimator: ViT backbone with direct 6D pose regression (rotation matrix + translation vector). The rotation uses the 6D continuous rotation representation, which is converted to a rotation matrix with Gram-Schmidt orthogonalization.
- Object detector: LW-DETR (Lightweight DETR) fine-tuned from COCO-pretrained weights for single-class spacecraft detection. Provides bounding boxes as preprocessing for the pose estimator.
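To illustrate the Gram-Schmidt step mentioned for the pose estimator, here is a minimal NumPy sketch of the general technique for turning a 6D rotation vector into a rotation matrix. It is not taken from the repository: the function name `rotation_6d_to_matrix`, the split of the 6D vector into two 3-vectors, and the choice to store the resulting basis vectors as matrix columns are all assumptions, and the actual implementation may differ.

```python
import numpy as np


def rotation_6d_to_matrix(r6):
    """Map a 6D rotation vector to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption: the first three entries and the last three entries are two
    (generally non-orthogonal) 3-vectors, and the orthonormal basis built
    from them forms the columns of the result.
    """
    r6 = np.asarray(r6, dtype=float)
    a1, a2 = r6[:3], r6[3:]

    # First basis vector: normalize a1.
    b1 = a1 / np.linalg.norm(a1)

    # Second basis vector: remove the component of a2 along b1, then normalize.
    a2_orth = a2 - np.dot(b1, a2) * b1
    b2 = a2_orth / np.linalg.norm(a2_orth)

    # Third basis vector: the cross product completes a right-handed frame.
    b3 = np.cross(b1, b2)

    return np.stack([b1, b2, b3], axis=-1)


if __name__ == "__main__":
    R = rotation_6d_to_matrix([1.0, 2.0, 3.0, -1.0, 0.5, 2.0])
    # A valid rotation matrix is orthogonal with determinant +1.
    print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

Because any pair of linearly independent 3-vectors maps to a valid rotation this way, a network can regress the six numbers freely without an explicit orthogonality constraint on its output.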
Citation
@InProceedings{Ancey_2026_WACV,
author = {Ancey, Pierre and Price, Andrew and Javed, Saqib and Salzmann, Mathieu},
title = {FastPose-ViT: A Vision Transformer for Real-Time Spacecraft Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2026},
pages = {7873-7882}
}
License
MIT License. See the repository for details.