# CLIP2MT5 CrossAttention VQA Model
This is a vision-language model that combines **CLIP-ViT** (image encoder) and **mT5** (text encoder-decoder) through a custom cross-attention bridge.
It performs Visual Question Answering (VQA) in Turkish.
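To illustrate the idea of a cross-attention bridge, here is a minimal PyTorch sketch in which text hidden states attend over projected image features. The class name, dimensions, and layer layout are illustrative assumptions, not the actual architecture or sizes of this checkpoint.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Hypothetical sketch of a vision-to-text cross-attention bridge.

    Dimensions are illustrative, not this checkpoint's actual sizes.
    """
    def __init__(self, vision_dim=768, text_dim=512, num_heads=8):
        super().__init__()
        # Project CLIP vision features into the text model's hidden size.
        self.proj = nn.Linear(vision_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_states, vision_states):
        vis = self.proj(vision_states)
        # Text tokens act as queries; image patches are keys/values.
        attended, _ = self.attn(text_states, vis, vis)
        # Residual connection plus layer norm, as in standard Transformers.
        return self.norm(text_states + attended)

bridge = CrossAttentionBridge()
text = torch.randn(1, 16, 512)    # (batch, text_tokens, text_dim)
vision = torch.randn(1, 50, 768)  # (batch, image_patches, vision_dim)
out = bridge(text, vision)
print(out.shape)  # torch.Size([1, 16, 512])
```

The fused text states keep the text model's hidden size, so they can be fed onward into the mT5 decoder stack unchanged.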
## Usage
```python
from PIL import Image
from hf_clip2mt5 import load_for_inference, predict  # helper module shipped with this repo
repo_id = "MUERIS/TurkishVLMTAMGA"
model, tokenizer, device = load_for_inference(repo_id)
image = Image.open("example.jpg")
question = "Görselde kaç kişi var?"  # "How many people are in the image?"
answer = predict(model, tokenizer, device, image, question)
print(answer)
```