
πŸ›οΈ RAG Image Captioning with Landmark Location

This model generates captions for monument/landmark images using a retrieval-augmented generation approach.

How it works:

  • Uses CLIP to extract image embeddings.
  • Retrieves top-k similar captions via FAISS.
  • Generates a detailed caption with name and location using T5.
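The retrieval and prompt-building steps above can be sketched roughly as follows. This is a minimal illustration and not the model's actual code: the 3-dimensional vectors are toy stand-ins for CLIP embeddings, the brute-force cosine search stands in for a flat FAISS inner-product index, and `retrieve_top_k`, `build_prompt`, and the prompt wording are hypothetical names chosen for this example. The resulting prompt is what would be passed to T5 for generation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_top_k(query, caption_embeddings, captions, k=2):
    """Brute-force cosine search; a stand-in for a flat FAISS inner-product index."""
    q = normalize(query)
    scored = []
    for emb, caption in zip(caption_embeddings, captions):
        score = sum(a * b for a, b in zip(q, normalize(emb)))
        scored.append((score, caption))
    # Highest similarity first, then keep the k best captions.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


def build_prompt(retrieved):
    """Join retrieved captions into one text prompt for a seq2seq generator such as T5."""
    context = " ".join(retrieved)
    return f"Describe the landmark with its name and location. Context: {context}"


# Toy data (hypothetical): 3-d vectors standing in for CLIP embeddings.
captions = [
    "A white marble mausoleum in Agra, India.",
    "An iron lattice tower in Paris, France.",
    "A domed marble tomb beside a river.",
]
caption_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
image_embedding = [1.0, 0.1, 0.0]  # stands in for the CLIP embedding of the query image

retrieved = retrieve_top_k(image_embedding, caption_embeddings, captions, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In a real setup, the embeddings would presumably come from a CLIP image encoder and the search from a FAISS index built over the caption embeddings; normalizing both sides first makes an inner-product search equivalent to cosine similarity.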

Example

Input: 🏰 Image of the Taj Mahal
Output: "The Taj Mahal is a white marble mausoleum located in Agra, India."
