---
license: apache-2.0
pipeline_tag: image-to-text
library_name: transformers
---
# 🌷 TULIP: Token-length Upgraded CLIP

TULIP (Token-length Upgraded CLIP) addresses the challenge of representing long captions in vision-language models. It upgrades CLIP-like models with relative position encodings, enabling effective processing of captions longer than CLIP's default 77-token limit.
"TULIP: Token-length Upgraded CLIP" (accepted to ICLR 2025)
Ivona Najdenkoska٭, Mohammad M. Derakshani٭, Yuki M. Asano, Nanne van Noord, Marcel Worring, Cees G. M. Snoek
٭ Equal core contributions
Code: https://github.com/ivonajdenkoska/tulip
## Highlights
- Improves performance on long caption understanding tasks.
- Uses relative positional encodings to handle long image captions.
- Works with CLIP-like models.
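To illustrate why relative position encodings generalize beyond a fixed caption length, the sketch below applies a rotary-style relative encoding to a toy sequence. This is an illustrative assumption, not TULIP's exact formulation (see the paper for that): with such encodings, attention scores depend only on the *offset* between two tokens, not their absolute positions, so nothing special happens at token 77.

```python
import numpy as np

def rotary_encode(x, base=10000.0):
    """Apply a rotary-style relative positional encoding.

    x: array of shape (seq_len, dim), with dim even.
    Each pair of channels is rotated by a position-dependent angle,
    so dot products between encoded vectors depend only on the
    relative offset between positions.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair.
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.outer(np.arange(seq_len), freqs)        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1, x2) channel pair.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Key property: the score between positions 0 and 5 equals the score
# between positions 10 and 15 (same offset), far past any fixed window.
q = rotary_encode(np.ones((100, 8)))
k = rotary_encode(np.ones((100, 8)))
print(np.allclose(q[0] @ k[5], q[10] @ k[15]))  # True
```

Because the encoding is a function of relative offset, the same weights can score token pairs at any distance, which is what lets a CLIP-like text encoder handle captions longer than its training-time position table would otherwise allow.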
## How to use

Please refer to the [original repository](https://github.com/ivonajdenkoska/tulip) for detailed instructions on using and training the model.