---
license: mit
pipeline_tag: other
tags:
  - computer-vision
  - point-tracking
  - event-camera
  - multimodal-fusion
---

# TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

This repository contains the pretrained weights for TAPFormer, presented at CVPR 2026.

Paper (arXiv) | Project Page | GitHub Repository

## Introduction

Tracking any point (TAP) is a fundamental yet challenging computer vision task, requiring both high localization precision and long-term motion reasoning. TAPFormer is a transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency arbitrary point tracking.

Our key innovation is a Transient Asynchronous Fusion (TAF) mechanism, which explicitly models the temporal evolution between discrete frames through continuous event updates, bridging the gap between low-rate frames and high-rate events. In addition, a Cross-modal Locally Weighted Fusion (CLWF) module adaptively adjusts spatial attention according to modality reliability, yielding stable and discriminative features even under blur or low light.
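To make the two ideas above concrete, here is a minimal toy sketch (not the paper's implementation; all names, shapes, and update rules are illustrative assumptions): a frame feature is evolved between two frames by small asynchronous event-driven increments (the TAF idea), and the two modalities are then fused with per-location softmax reliability weights (the CLWF idea).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps around one tracked point: C channels over an H x W patch.
C, H, W = 8, 5, 5
frame_feat = rng.normal(size=(C, H, W))  # feature from the frame branch
event_feat = rng.normal(size=(C, H, W))  # feature from the event branch

# TAF-style idea: evolve the frame feature between two discrete frames with
# small event-driven increments, yielding a high-rate state from low-rate frames.
n_event_slices = 4
state = frame_feat.copy()
for _ in range(n_event_slices):
    event_increment = 0.1 * rng.normal(size=(C, H, W))  # stand-in for an event update
    state = state + event_increment

# CLWF-style idea: per-location reliability scores decide how much each
# modality contributes (softmax over two scalar score maps).
frame_score = rng.normal(size=(H, W))  # e.g. low under motion blur
event_score = rng.normal(size=(H, W))  # e.g. low in static, well-lit scenes
scores = np.stack([frame_score, event_score])          # (2, H, W)
weights = np.exp(scores) / np.exp(scores).sum(axis=0)  # softmax over modalities

fused = weights[0] * state + weights[1] * event_feat   # broadcasts over channels
print(fused.shape)  # (8, 5, 5)
```

The weights sum to 1 at every spatial location, so the fused feature stays on the scale of its inputs while leaning on whichever modality is locally more reliable.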

## Key Features

- **Asynchronous Fusion**: The first framework to explicitly model temporal continuity between frames and events via TAF.
- **Modality Reliability**: CLWF adaptively handles challenging conditions such as motion blur and low illumination.
- **SOTA Performance**: Achieves a 28.2% improvement in average pixel error on real-world benchmarks.

## Installation and Usage

For detailed instructions on environment setup, data preparation, and running evaluation scripts, please refer to the official GitHub repository.

## Citation

If you find this work useful in your research, please cite:

```bibtex
@article{liu2026tapformer,
  title={TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events},
  author={Liu, Jiaxiong and Tan, Zhen and Zhang, Jinpu and Zhou, Yi and Shen, Hui and Chen, Xieyuanli and Hu, Dewen},
  journal={arXiv preprint arXiv:2603.04989},
  year={2026}
}

@inproceedings{liu2025tracking,
  title={Tracking any point with frame-event fusion network at high frame rate},
  author={Liu, Jiaxiong and Wang, Bo and Tan, Zhen and Zhang, Jinpu and Shen, Hui and Hu, Dewen},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={18834--18840},
  year={2025},
  organization={IEEE}
}
```