---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
library_name: dualtowervlm
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- dual-tower
- research
---
**DualTowerVLM** is a dual-tower Vision-Language Model (VLM) architecture that processes images and text through separate towers before combining their representations.
For more information, check out the repository.
**Usage:**
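```python
# These imports refer to the `models` package from the DualTowerVLM source
# repository, so that code is assumed to be available on the Python path.
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

cfg = VLMConfig()  # default configuration; not passed to from_pretrained below
# Download the pretrained weights from the Hugging Face Hub and build the model.
model = DualTowerVLM.from_pretrained("patrickamadeus/dt-cococaps")
```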
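**Conceptual sketch:**
To illustrate what "separate towers before combining their representations" means, here is a toy, dependency-free example. It is not the DualTowerVLM implementation: the class `ToyDualTower`, its helper functions, the layer sizes, and the choice to fuse by concatenation followed by a linear layer are all assumptions made for illustration. See the repository for the actual architecture.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Build a random weight matrix with out_dim rows and in_dim columns."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-1.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector (no bias term)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ToyDualTower:
    """Hypothetical two-tower model: one encoder per modality, then fusion."""

    def __init__(self, image_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the toy weights reproducible
        self.image_tower = make_linear(image_dim, hidden_dim, rng)
        self.text_tower = make_linear(text_dim, hidden_dim, rng)
        # Assumed fusion: concatenate both tower outputs, then project.
        self.fusion = make_linear(2 * hidden_dim, hidden_dim, rng)

    def forward(self, image_features, text_features):
        # Each modality passes through its own tower independently.
        img = [math.tanh(v) for v in apply_linear(self.image_tower, image_features)]
        txt = [math.tanh(v) for v in apply_linear(self.text_tower, text_features)]
        # The two representations are combined only after both towers have run.
        return apply_linear(self.fusion, img + txt)


model = ToyDualTower(image_dim=8, text_dim=4, hidden_dim=3)
fused = model.forward([0.5] * 8, [1.0] * 4)
print(len(fused))  # 3
```

The point of the sketch is the data flow: the image and text inputs never interact until the fusion step, which is the defining property of a dual-tower design.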