license: mit
tags:
  - pose-estimation
  - 6d-pose
  - vision-transformer
  - spacecraft
  - space
  - vit
datasets:
  - SPEED
pipeline_tag: image-classification

FastPose-ViT: Pretrained Weights

Pretrained weights for FastPose-ViT, a Vision Transformer pipeline for real-time 6D spacecraft pose estimation.

Available Weights

All models are trained on the SPEED dataset.

| File | Model | Task | Input Resolution |
|---|---|---|---|
| vit_b_16_384.pth | ViT-B/16-384 | Pose estimation (6D) | 384x384 |
| vit_b_16.pth | ViT-B/16 | Pose estimation (6D) | 224x224 |
| small.pth | LW-DETR Small | Object detection (bbox) | 512x512 |

Usage

  1. Clone the repository:
git clone https://github.com/PierreAncey/FastPose-ViT.git
cd FastPose-ViT
  2. Download the weights and place them in a weights/ directory.

  3. Run evaluation on the SPEED dataset:

DATASET=SPEED_FIXED && \
python3 src/evaluate.py \
  --model_weights weights/vit_b_16_384.pth \
  --rotation_format matrix \
  --num_hidden_layers 0 \
  --hidden_layer_dim 0 \
  --nb_class_tokens 1 \
  --batch_size 8 \
  --vit_model vit_b_16_384 \
  --dataset SPEED \
  --dataset_root_dir $DATASET \
  --num_workers 8 \
  --merge_outputs \
  --no_mlp
  4. Run the object detector:
DATASET=SPEED_FIXED && \
python3 object_detector/evaluate.py \
  --dataset_root_dir $DATASET \
  --model_variant small \
  --model_weights weights/small.pth
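
Both commands above expect the checkpoint files to already be in weights/. The following is a minimal sketch of a pre-flight check, not part of the repository: the helper name `missing_weights` is hypothetical, and the file names are simply those listed under Available Weights.

```python
from pathlib import Path

# File names taken from the "Available Weights" table in this README.
EXPECTED_WEIGHTS = ("vit_b_16_384.pth", "vit_b_16.pth", "small.pth")


def missing_weights(weights_dir="weights"):
    """Return the expected checkpoint names not found in weights_dir.

    Hypothetical helper for illustration; the repository may ship its own checks.
    """
    root = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints are present.")
```

Run it from the repository root before starting an evaluation; only the checkpoint used by a given command actually needs to be present.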

Model Details

  • Pose estimator: ViT backbone that directly regresses the 6-DoF pose (rotation matrix and translation vector). Rotations use the 6D continuous representation, which is converted to a valid rotation matrix with Gram-Schmidt orthogonalization.
  • Object detector: LW-DETR (Lightweight DETR) fine-tuned from COCO-pretrained weights for single-class spacecraft detection. Its bounding boxes are used as a preprocessing step for the pose estimator.
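
To illustrate the Gram-Schmidt step mentioned for the pose estimator, here is a minimal pure-Python sketch of the standard 6D-to-rotation-matrix mapping. It is not the repository's code: the function name `rotation_from_6d` is hypothetical, and the choice to store the orthonormal vectors as matrix columns is an assumption (some libraries store them as rows).

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(d6):
    """Map a 6D rotation representation to a 3x3 rotation matrix.

    d6 holds two 3-vectors (a1, a2). Gram-Schmidt makes them orthonormal,
    and the third axis is their cross product.
    Assumption: the resulting vectors are the matrix columns.
    """
    a1, a2 = list(d6[:3]), list(d6[3:6])
    b1 = _normalize(a1)
    proj = _dot(b1, a2)  # component of a2 along b1
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)  # completes a right-handed basis
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


if __name__ == "__main__":
    for row in rotation_from_6d([0.3, -1.2, 0.5, 2.0, 0.1, -0.7]):
        print(row)
```

Because the output is orthonormal by construction, any unconstrained 6-vector from the network (with non-parallel halves) yields a valid rotation, which is the main appeal of this representation for regression.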

Citation

@InProceedings{Ancey_2026_WACV,
    author    = {Ancey, Pierre and Price, Andrew and Javed, Saqib and Salzmann, Mathieu},
    title     = {FastPose-ViT: A Vision Transformer for Real-Time Spacecraft Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {7873-7882}
}

License

MIT License. See the repository for details.