RAG Image Captioning with Landmark Location
This model generates captions for monument/landmark images using a retrieval-augmented generation approach.
How it works:
- Uses CLIP to extract an embedding from the input image.
- Retrieves the top-k most similar captions from a FAISS index.
- Uses T5 to generate a detailed caption that includes the landmark's name and location.
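
The retrieve-then-prompt part of these steps can be sketched in plain Python. This is an illustrative sketch only, not this model's actual code: a brute-force cosine-similarity search stands in for FAISS, the 3-dimensional vectors are toy stand-ins for CLIP embeddings, and the function names and prompt wording (`top_k_captions`, `build_prompt`) are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_captions(query, corpus, k=2):
    """Return the k captions whose embeddings are most similar to the query.

    Brute-force stand-in for a FAISS inner-product search (assumption).
    """
    q = normalize(query)
    scored = []
    for embedding, caption in corpus:
        e = normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, caption))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


def build_prompt(retrieved):
    """Join retrieved captions into a text prompt for the generator.

    The prompt wording is hypothetical, not taken from the model.
    """
    context = " ".join(retrieved)
    return f"Describe the landmark with its name and location. Context: {context}"


# Toy embeddings paired with captions; a real store would hold CLIP vectors.
corpus = [
    ([0.9, 0.1, 0.0], "White marble mausoleum in Agra, India."),
    ([0.8, 0.2, 0.1], "Mughal-era tomb on the Yamuna river."),
    ([0.0, 0.1, 0.9], "Wrought-iron lattice tower in Paris, France."),
]
query_embedding = [1.0, 0.1, 0.0]  # pretend this came from the image encoder

retrieved = top_k_captions(query_embedding, corpus, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In a full pipeline, the resulting prompt would be passed to T5 to produce the final caption. Normalizing both vectors makes the dot product a cosine similarity, which is a common choice for CLIP embeddings.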
Example
Input: Image of the Taj Mahal
Output: "Taj Mahal is a white marble mausoleum located in Agra, India."