Improve model card for TAPFormer #1
by nielsr (HF Staff) - opened

README.md CHANGED
---
license: mit
pipeline_tag: other
tags:
- computer-vision
- point-tracking
- event-camera
- multimodal-fusion
---

# TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

This repository contains the weights for **TAPFormer**, presented at CVPR 2026.

[**Paper (arXiv)**](https://arxiv.org/abs/2603.04989) | [**Project Page**](https://tapformer.github.io/) | [**GitHub Repository**](https://github.com/ljx1002/TAPFormer)
|
| 16 |
+
|
| 17 |
+
## Introduction
|
| 18 |
+
|
| 19 |
+
Tracking any point (TAP) is a fundamental yet challenging task in computer vision, requiring high precision and long-term motion reasoning. **TAPFormer** is a transformer-based framework that performs asynchronous temporal-consistent fusion of frames and events for robust and high-frequency arbitrary point tracking.
|
| 20 |
+
|
| 21 |
+
Our key innovation is a **Transient Asynchronous Fusion (TAF)** mechanism, which explicitly models the temporal evolution between discrete frames through continuous event updates, bridging the gap between low-rate frames and high-rate events. In addition, a **Cross-modal Locally Weighted Fusion (CLWF)** module adaptively adjusts spatial attention according to modality reliability, yielding stable and discriminative features even under blur or low light.
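
As a rough, self-contained illustration of the TAF idea, here is a minimal sketch of event-driven feature updating between two frames. It is only an illustration of the mechanism described above, not the paper's implementation: the GRU-style update, the module name, and all shapes are assumptions (the real code lives in the GitHub repository).

```python
# Conceptual sketch of transient asynchronous fusion: point features anchored
# at the last frame are advanced through time by a stream of event-feature
# updates until the next frame arrives. Everything here is illustrative.
import torch
import torch.nn as nn


class TransientAsyncFusionSketch(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # A GRU cell stands in for the continuous event-driven update.
        self.event_update = nn.GRUCell(input_size=dim, hidden_size=dim)
        # Encodes the time elapsed since the last frame.
        self.time_embed = nn.Linear(1, dim)

    def forward(self, frame_feat, event_feats, event_dts):
        # frame_feat: (N, dim) point features from the latest frame.
        # event_feats: list of (N, dim) features, one per event slice, in time order.
        # event_dts: elapsed time of each slice since the frame, in seconds.
        state = frame_feat
        for feat, dt in zip(event_feats, event_dts):
            dt_emb = self.time_embed(torch.full((state.shape[0], 1), float(dt)))
            # Asynchronous step: refine the point state with event evidence
            # plus an encoding of how far we have drifted from the frame.
            state = self.event_update(feat + dt_emb, state)
        return state  # high-rate point features between two frames


taf = TransientAsyncFusionSketch(dim=64)
frame = torch.randn(8, 64)                       # 8 tracked points
events = [torch.randn(8, 64) for _ in range(4)]  # 4 event slices between frames
out = taf(frame, events, event_dts=[0.002, 0.004, 0.006, 0.008])
print(out.shape)  # torch.Size([8, 64])
```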

## Key Features

- **Asynchronous Fusion**: The first framework to explicitly model temporal continuity between frames and events, via the TAF mechanism.
- **Modality Reliability**: CLWF adaptively handles challenging conditions such as motion blur and low illumination; a toy sketch of this reliability weighting follows the list.
- **SOTA Performance**: Achieves a 28.2% reduction in average pixel error on real-world benchmarks.
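
To make the reliability-weighting idea concrete, the toy sketch below gates two spatially aligned feature maps by predicted per-location modality weights. This is a sketch in the spirit of CLWF under assumed shapes, not the official module; `LocallyWeightedFusionSketch` and its gating network are inventions for exposition.

```python
# Toy sketch of reliability-weighted cross-modal fusion: a small gating network
# predicts per-location weights so event features can dominate where the frame
# is blurred, and frame features can dominate where the event stream is sparse.
import torch
import torch.nn as nn


class LocallyWeightedFusionSketch(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, 2, kernel_size=1),  # one reliability logit per modality
        )

    def forward(self, frame_feat, event_feat):
        # frame_feat, event_feat: (B, dim, H, W), spatially aligned feature maps.
        w = torch.softmax(self.gate(torch.cat([frame_feat, event_feat], dim=1)), dim=1)
        # Convex per-pixel combination of the two modalities.
        return w[:, 0:1] * frame_feat + w[:, 1:2] * event_feat


clwf = LocallyWeightedFusionSketch(dim=64)
fused = clwf(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```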

## Installation and Usage

For detailed instructions on environment setup, data preparation, and running evaluation scripts, please refer to the [official GitHub repository](https://github.com/ljx1002/TAPFormer).
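
If you just want to pull the checkpoint hosted here, something along the lines of the snippet below should work. Note that the repo id, checkpoint filename, and the `TAPFormer` class are placeholders and assumptions; consult the GitHub repository for the actual entry points.

```python
# Hypothetical download-and-load sketch; the filename and model class are
# assumptions, so check the official repository for the real entry points.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="ljx1002/TAPFormer",  # placeholder: substitute this model repo's id
    filename="tapformer.pth",     # placeholder: the actual checkpoint name may differ
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# The TAPFormer class is provided by the official GitHub code, e.g.:
# from tapformer import TAPFormer
# model = TAPFormer()
# model.load_state_dict(state_dict)
# model.eval()
```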

## Citation

If you find this work useful in your research, please cite:

```bibtex
@article{liu2026tapformer,
  title={TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events},
  author={Liu, Jiaxiong and Tan, Zhen and Zhang, Jinpu and Zhou, Yi and Shen, Hui and Chen, Xieyuanli and Hu, Dewen},
  journal={arXiv preprint arXiv:2603.04989},
  year={2026}
}

@inproceedings{liu2025tracking,
  title={Tracking any point with frame-event fusion network at high frame rate},
  author={Liu, Jiaxiong and Wang, Bo and Tan, Zhen and Zhang, Jinpu and Shen, Hui and Hu, Dewen},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={18834--18840},
  year={2025},
  organization={IEEE}
}
```