Thank you so much. If I apply your model in my research, how can I inform you or cite you? Would you be willing to hear more from me? I would like to connect with you. My email is mahbub.hassan@ieee.org. Thank you.

replied to prithivMLmods's post 4 months ago

Can we apply it to traffic engineering, for example to detect vehicles, understand the road scenario, and estimate the level of traffic congestion?

reacted to prithivMLmods's post with 👀🤗❤️ 4 months ago

Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B. The release also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with support for T4 GPUs using 4-bit quantization: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a

To learn more, visit the model card of the respective model!

4 replies

MAHBUB HASSAN

mahbubchula's activity

TripAI

Mahbub