---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
library_name: dualtowervlm
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- dual-tower
- research
---
**DualTowerVLM** is a dual-tower Vision-Language Model (VLM) architecture that encodes images and text through separate towers before combining their representations.

For more information, check out the repository.
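The dual-tower idea described above can be sketched in a few lines. This is a hypothetical, simplified illustration, not the actual DualTowerVLM implementation: each modality passes through its own encoder ("tower"), and the two embeddings are then fused, here by simple concatenation.

```python
# Hypothetical sketch of a dual-tower forward pass (not the real model):
# each modality has its own encoder, and the outputs are fused late.

def encode_image(pixels):
    # Stand-in image tower: pool pixel values into a 2-d embedding.
    mean = sum(pixels) / len(pixels)
    return [mean, max(pixels)]

def encode_text(token_ids):
    # Stand-in text tower: pool token ids into a 2-d embedding.
    mean = sum(token_ids) / len(token_ids)
    return [mean, min(token_ids)]

def fuse(img_emb, txt_emb):
    # Late fusion: concatenate the two tower outputs into one joint vector.
    return img_emb + txt_emb

joint = fuse(encode_image([0.1, 0.5, 0.9]), encode_text([3, 7, 2]))
```

In the real model the towers are learned neural encoders and fusion is more sophisticated, but the separation of modalities until a late combination step is the defining property of the architecture.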
**Usage:**

```python
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

# Default configuration; adjust its fields before building the model if needed.
cfg = VLMConfig()

# Load pretrained weights from the Hugging Face Hub.
model = DualTowerVLM.from_pretrained("patrickamadeus/dt-cococaps")
```