ucsahin/TraVisionLM-base

Image-Text-to-Text

text-generation


ucsahin committed on Aug 8, 2024

Commit 5aadfef · verified · 1 parent: 4d34205

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -39,7 +39,7 @@ Are you ready to experience the Turkish vision-language model? Let's get started
 ## English
 This model is a multimodal large language model that combines [SigLIP](https://huggingface.co/docs/transformers/en/model_doc/siglip) as its vision encoder with [GPT2-large](https://huggingface.co/docs/transformers/en/model_doc/gpt2) as its language model. The vision projector connects the two modalities together.
-Its architecture closely resembles [PaliGemma](https://arxiv.org/pdf/2407.07726), with some refined adjustments to the vision projector and the causal language modeling.
+Its architecture closely resembles [PaliGemma](https://huggingface.co/docs/transformers/v4.44.0/model_doc/paligemma), with some refined adjustments to the vision projector and the causal language modeling.
 Here's the summary of the development process: