Multimodal Model📖👁️

teohyc's Collections

updated Apr 9

Experimental vision-language models integrating visual encoders with large language models, exploring multimodal alignment and reasoning capabilities.

teohyc/QwigLip-VQA

Visual Question Answering • Updated Apr 9

teohyc/QwigLip-VLM

Image-Text-to-Text • Updated Apr 9