---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
library_name: dualtowervlm
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- dual-tower
- research
---

**DualTowerVLM** is a dual-tower Vision-Language Model (VLM) architecture that encodes images and text through separate towers before combining their representations. For more information, check out the repository.

**Usage:**

```python
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

cfg = VLMConfig()
model = DualTowerVLM.from_pretrained("patrickamadeus/dt-cococaps")
```
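To illustrate the dual-tower idea, here is a minimal, framework-free sketch of the fusion step: each tower maps its modality to a fixed-size feature vector, and the two vectors are combined afterwards. The tower internals, dimensions, and helper names below (`image_tower`, `text_tower`, `combine`) are hypothetical stand-ins, not the actual DualTowerVLM implementation; concatenation is just one common fusion choice.

```python
import numpy as np

EMBED_DIM = 8  # hypothetical shared embedding size


def image_tower(pixels: np.ndarray) -> np.ndarray:
    # Stand-in for a vision encoder: a fixed random projection of the
    # flattened pixels to EMBED_DIM (real towers are learned networks).
    rng = np.random.default_rng(0)
    w = rng.standard_normal((pixels.size, EMBED_DIM))
    return pixels.reshape(-1) @ w


def text_tower(token_ids: list[int]) -> np.ndarray:
    # Stand-in for a text encoder: mean-pool rows of a fixed embedding table.
    rng = np.random.default_rng(1)
    table = rng.standard_normal((1000, EMBED_DIM))
    return table[token_ids].mean(axis=0)


def combine(img_feat: np.ndarray, txt_feat: np.ndarray) -> np.ndarray:
    # Fuse the two towers by concatenation; the fused vector would feed a
    # downstream head (e.g., a caption decoder) in a full model.
    return np.concatenate([img_feat, txt_feat])


img = np.zeros((4, 4))               # dummy 4x4 "image"
fused = combine(image_tower(img), text_tower([1, 2, 3]))
print(fused.shape)  # (2 * EMBED_DIM,)
```

The key property this sketch preserves is that the two modalities stay independent until the `combine` step, which is what distinguishes a dual-tower design from early-fusion architectures.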