# 🏛️ RAG Image Captioning with Landmark Location
This model generates captions for monument/landmark images using a retrieval-augmented generation approach.
## How it works
- Uses CLIP to extract image embeddings.
- Retrieves top-k similar captions via FAISS.
- Generates a detailed caption with name and location using T5.
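
The retrieval step above can be illustrated with a minimal sketch. It is not this model's actual code: short hand-written vectors stand in for CLIP embeddings, a brute-force cosine-similarity search stands in for the FAISS index, and the prompt wording passed to the generator is an assumption. The names `top_k_captions` and `build_prompt` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_captions(query_embedding, caption_embeddings, captions, k=2):
    """Return the k captions whose embeddings are most similar to the query.

    Brute-force stand-in for a FAISS inner-product search over normalized vectors.
    """
    q = normalize(query_embedding)
    scored = []
    for emb, caption in zip(caption_embeddings, captions):
        score = sum(a * b for a, b in zip(q, normalize(emb)))
        scored.append((score, caption))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


def build_prompt(retrieved):
    """Join retrieved captions into a text prompt for the generator (wording is assumed)."""
    context = " ".join(retrieved)
    return f"Describe the landmark with its name and location. Context: {context}"


# Toy caption store; in the real pipeline these vectors would come from CLIP.
captions = [
    "A white marble mausoleum in Agra, India.",
    "An iron lattice tower in Paris, France.",
    "A domed tomb beside a river in India.",
]
caption_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.7, 0.3, 0.1],
]

# Toy image embedding standing in for the CLIP embedding of the input image.
query_embedding = [1.0, 0.1, 0.0]

retrieved = top_k_captions(query_embedding, caption_embeddings, captions, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In a full pipeline, the resulting prompt would be passed to T5 to produce the final caption.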
## Example
Input: 🏰 Image of the Taj Mahal
Output: _"The Taj Mahal is a white marble mausoleum located in Agra, India."_