+---
+pipeline_tag: object-detection
+tags:
+- vision-language-tracking
+- diffusion-models
+- visual-tracking
+---
+# GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates
+This repository contains the weights for **GLAD**, a vision-language tracking model introduced in the paper [GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates](https://huggingface.co/papers/2602.00570).
+## Overview
+GLAD (Generative Language-AssisteD tracking) uses diffusion models for generative multi-modal fusion of text descriptions and template images.
+Current vision-language trackers often struggle with "low-semantic" images, such as heavily blurred or low-resolution ones, because traditional discriminative fusion has limited ability to bridge the gap between text and degraded visual features. GLAD instead draws on the reconstruction capability of generative models to improve compatibility between language and images, enriching the semantic information of the template for more robust tracking.
+## Resources
+- **Paper:** [GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates](https://huggingface.co/papers/2602.00570)
+- **GitHub Repository:** [https://github.com/Confetti-lxy/GLAD](https://github.com/Confetti-lxy/GLAD)
+## Citation
+If you find this work useful in your research, please cite:
+```bibtex
+@article{luo2026glad,
+  title={GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates},
+  author={Luo, Xingyu and Cai, Yidong and Liu, Jie and Tang, Jie and Wu, Gangshan and Wang, Limin},
+  journal={arXiv preprint arXiv:2602.00570},
+  year={2026}
+}
+```