Improve model card for TAPFormer #1
by nielsr (HF Staff) - opened

README.md CHANGED
---
license: mit
pipeline_tag: other
tags:
- computer-vision
- point-tracking
- event-camera
- multimodal-fusion
---

# TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

This repository contains the weights for **TAPFormer**, presented at CVPR 2026.

[**Paper (arXiv)**](https://arxiv.org/abs/2603.04989) | [**Project Page**](https://tapformer.github.io/) | [**GitHub Repository**](https://github.com/ljx1002/TAPFormer)
|
| 16 |
+
|
| 17 |
+
## Introduction
|
| 18 |
+
|
| 19 |
+
Tracking any point (TAP) is a fundamental yet challenging task in computer vision, requiring high precision and long-term motion reasoning. **TAPFormer** is a transformer-based framework that performs asynchronous temporal-consistent fusion of frames and events for robust and high-frequency arbitrary point tracking.
|
| 20 |
+
|
| 21 |
+
Our key innovation is a **Transient Asynchronous Fusion (TAF)** mechanism, which explicitly models the temporal evolution between discrete frames through continuous event updates, bridging the gap between low-rate frames and high-rate events. In addition, a **Cross-modal Locally Weighted Fusion (CLWF)** module adaptively adjusts spatial attention according to modality reliability, yielding stable and discriminative features even under blur or low light.
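
As a rough, self-contained illustration of the TAF idea, here is a minimal sketch of event-driven feature updating between two frames. It is only an illustration of the mechanism described above, not the paper's implementation: the GRU-style update, the module name, and all shapes are assumptions (the real code lives in the GitHub repository).

```python
# Conceptual sketch of transient asynchronous fusion: point features anchored
# at the last frame are advanced through time by a stream of event-feature
# updates until the next frame arrives. Everything here is illustrative.
import torch
import torch.nn as nn


class TransientAsyncFusionSketch(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # A GRU cell stands in for the continuous event-driven update.
        self.event_update = nn.GRUCell(input_size=dim, hidden_size=dim)
        # Encodes the time elapsed since the last frame.
        self.time_embed = nn.Linear(1, dim)

    def forward(self, frame_feat, event_feats, event_dts):
        # frame_feat: (N, dim) point features from the latest frame.
        # event_feats: list of (N, dim) features, one per event slice, in time order.
        # event_dts: elapsed time of each slice since the frame, in seconds.
        state = frame_feat
        for feat, dt in zip(event_feats, event_dts):
            dt_emb = self.time_embed(torch.full((state.shape[0], 1), float(dt)))
            # Asynchronous step: refine the point state with event evidence
            # plus an encoding of how far we have drifted from the frame.
            state = self.event_update(feat + dt_emb, state)
        return state  # high-rate point features between two frames


taf = TransientAsyncFusionSketch(dim=64)
frame = torch.randn(8, 64)                       # 8 tracked points
events = [torch.randn(8, 64) for _ in range(4)]  # 4 event slices between frames
out = taf(frame, events, event_dts=[0.002, 0.004, 0.006, 0.008])
print(out.shape)  # torch.Size([8, 64])
```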

## Key Features

- **Asynchronous Fusion**: The first framework to explicitly model temporal continuity between frames and events, via the TAF mechanism.
- **Modality Reliability**: CLWF adaptively handles challenging conditions such as motion blur and low illumination; a toy sketch of this reliability weighting follows the list.
- **SOTA Performance**: Achieves a 28.2% reduction in average pixel error on real-world benchmarks.
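
To make the reliability-weighting idea concrete, the toy sketch below gates two spatially aligned feature maps by predicted per-location modality weights. This is a sketch in the spirit of CLWF under assumed shapes, not the official module; `LocallyWeightedFusionSketch` and its gating network are inventions for exposition.

```python
# Toy sketch of reliability-weighted cross-modal fusion: a small gating network
# predicts per-location weights so event features can dominate where the frame
# is blurred, and frame features can dominate where the event stream is sparse.
import torch
import torch.nn as nn


class LocallyWeightedFusionSketch(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, 2, kernel_size=1),  # one reliability logit per modality
        )

    def forward(self, frame_feat, event_feat):
        # frame_feat, event_feat: (B, dim, H, W), spatially aligned feature maps.
        w = torch.softmax(self.gate(torch.cat([frame_feat, event_feat], dim=1)), dim=1)
        # Convex per-pixel combination of the two modalities.
        return w[:, 0:1] * frame_feat + w[:, 1:2] * event_feat


clwf = LocallyWeightedFusionSketch(dim=64)
fused = clwf(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```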

## Installation and Usage

For detailed instructions on environment setup, data preparation, and running evaluation scripts, please refer to the [official GitHub repository](https://github.com/ljx1002/TAPFormer).
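
If you just want to pull the checkpoint hosted here, something along the lines of the snippet below should work. Note that the repo id, checkpoint filename, and the `TAPFormer` class are placeholders and assumptions; consult the GitHub repository for the actual entry points.

```python
# Hypothetical download-and-load sketch; the filename and model class are
# assumptions, so check the official repository for the real entry points.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="ljx1002/TAPFormer",  # placeholder: substitute this model repo's id
    filename="tapformer.pth",     # placeholder: the actual checkpoint name may differ
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# The TAPFormer class is provided by the official GitHub code, e.g.:
# from tapformer import TAPFormer
# model = TAPFormer()
# model.load_state_dict(state_dict)
# model.eval()
```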

## Citation

If you find this work useful in your research, please cite:

```bibtex
@article{liu2026tapformer,
  title={TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events},
  author={Liu, Jiaxiong and Tan, Zhen and Zhang, Jinpu and Zhou, Yi and Shen, Hui and Chen, Xieyuanli and Hu, Dewen},
  journal={arXiv preprint arXiv:2603.04989},
  year={2026}
}

@inproceedings{liu2025tracking,
  title={Tracking any point with frame-event fusion network at high frame rate},
  author={Liu, Jiaxiong and Wang, Bo and Tan, Zhen and Zhang, Jinpu and Shen, Hui and Hu, Dewen},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={18834--18840},
  year={2025},
  organization={IEEE}
}
```