---
library_name: dualtowervlm
license: mit
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - dual-tower
  - research
---

DualTowerVLM is a dual-tower Vision-Language Model (VLM) architecture that processes images and text through separate towers before combining their representations.
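To make the "separate towers, then combine" idea concrete, here is a toy sketch in plain Python. It is not the DualTowerVLM implementation. The function names (`vision_tower`, `text_tower`, `dual_tower_forward`) are hypothetical. Each tower here is just a linear projection followed by mean pooling. Concatenation as the fusion step is also an assumption, because this card does not say how the two representations are combined.

```python
from typing import List

Vector = List[float]


def linear(x: Vector, weights: List[Vector]) -> Vector:
    """Project x with a weight matrix given as one row per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a sequence of vectors element-wise into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def vision_tower(patches: List[Vector], w_img: List[Vector]) -> Vector:
    """Toy image tower (hypothetical): project each patch feature, then mean-pool."""
    return mean_pool([linear(p, w_img) for p in patches])


def text_tower(tokens: List[Vector], w_txt: List[Vector]) -> Vector:
    """Toy text tower (hypothetical): project each token embedding, then mean-pool."""
    return mean_pool([linear(t, w_txt) for t in tokens])


def dual_tower_forward(patches: List[Vector], tokens: List[Vector],
                       w_img: List[Vector], w_txt: List[Vector]) -> Vector:
    """Run both towers independently, then combine their outputs."""
    image_repr = vision_tower(patches, w_img)  # image path never sees text
    text_repr = text_tower(tokens, w_txt)      # text path never sees the image
    # Assumed fusion step: plain concatenation of the two representations.
    return image_repr + text_repr


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    fused = dual_tower_forward(
        patches=[[1.0, 0.0], [3.0, 2.0]],
        tokens=[[0.0, 4.0]],
        w_img=identity,
        w_txt=identity,
    )
    print(fused)  # [2.0, 1.0, 0.0, 4.0]
```

The key property the sketch illustrates is that neither tower reads the other's input. The two modalities only meet at the fusion step, which a real model would typically implement with learned layers rather than bare concatenation.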

For more information, check out the repository.

Usage:

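```python
# Assumes the project's `models` package is importable (e.g. run from the repository root).
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

# Default configuration; it is not passed to from_pretrained in this snippet.
cfg = VLMConfig()
# Download and load the pretrained checkpoint from the Hugging Face Hub.
model = DualTowerVLM.from_pretrained("patrickamadeus/dt-cococaps")
```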