# CLIP2MT5 CrossAttention VQA Model

This is a Vision-Language model combining **CLIP-ViT** and **mT5** using a custom cross-attention bridge. It supports Visual Question Answering (VQA) in Turkish.
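For illustration, here is a minimal sketch of what such a cross-attention bridge might look like in PyTorch: text hidden states attend over projected image patch features. The class name, dimensions, and wiring below are assumptions for exposition, not the layers actually used in this repository.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Hypothetical bridge: text tokens (queries) attend over image patches."""

    def __init__(self, text_dim=512, vision_dim=768, num_heads=8):
        super().__init__()
        # Project vision features (e.g., CLIP-ViT patches) into the text hidden size
        self.vision_proj = nn.Linear(vision_dim, text_dim)
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_states, vision_states):
        vis = self.vision_proj(vision_states)            # (B, patches, text_dim)
        attended, _ = self.cross_attn(text_states, vis, vis)
        return self.norm(text_states + attended)         # residual + layer norm

# Example shapes: batch of 2, 20 text tokens, 50 image patches
bridge = CrossAttentionBridge()
fused = bridge(torch.randn(2, 20, 512), torch.randn(2, 50, 768))
print(fused.shape)  # torch.Size([2, 20, 512])
```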
## Usage
```python
from PIL import Image

from hf_clip2mt5 import load_for_inference, predict

# Load the checkpoint from the Hugging Face Hub and prepare it for inference
repo_id = "MUERIS/TurkishVLMTAMGA"
model, tokenizer, device = load_for_inference(repo_id)

# Ask a question about an image ("How many people are in the image?")
image = Image.open("example.jpg")
question = "Görselde kaç kişi var?"

answer = predict(model, tokenizer, device, image, question)
print(answer)
```
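Note that `hf_clip2mt5` is assumed to be the inference helper module distributed with the repository files; make sure it is on your Python path (for example, by downloading the repository files alongside the checkpoint) before running the snippet.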