# 🏛️ RAG Image Captioning with Landmark Location

This model generates captions for monument and landmark images using a retrieval-augmented generation (RAG) approach.

## How it works:

- Uses CLIP to extract image embeddings.
- Retrieves the top-k most similar captions via FAISS.
- Generates a detailed caption with the landmark's name and location using T5.

## Example

Input: 🏰 Image of the Taj Mahal

Output: _"The Taj Mahal is a white marble mausoleum located in Agra, India."_
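
The retrieve-then-generate flow listed under "How it works" can be illustrated with a minimal, dependency-free sketch. It is not this model's actual code: a brute-force cosine-similarity scan stands in for the FAISS search, hand-written 3-dimensional vectors stand in for CLIP embeddings, and the prompt format passed to the generator is an assumption. The names `normalize`, `top_k_captions`, `build_prompt`, and `CAPTION_INDEX` are hypothetical and exist only for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_captions(query_vec, caption_index, k=2):
    """Return the k captions whose embeddings are most similar to query_vec.

    caption_index is a list of (embedding, caption) pairs. This brute-force
    scan is a stand-in for the FAISS nearest-neighbour search.
    """
    q = normalize(query_vec)
    scored = []
    for emb, caption in caption_index:
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, caption))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


def build_prompt(retrieved):
    """Join retrieved captions into a text prompt (assumed format) for the generator."""
    context = " ".join(retrieved)
    return "Describe the landmark with its name and location. Context: " + context


# Toy embeddings and captions, invented for illustration only.
CAPTION_INDEX = [
    ([0.9, 0.1, 0.0], "A white marble mausoleum in Agra, India."),
    ([0.8, 0.2, 0.1], "The Taj Mahal seen from its gardens."),
    ([0.0, 0.1, 0.9], "An iron lattice tower in Paris, France."),
]

query = [1.0, 0.1, 0.0]  # stands in for the CLIP embedding of the input image
retrieved = top_k_captions(query, CAPTION_INDEX, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In a full pipeline, the prompt built here would be passed to T5 to produce the final caption; swapping the scan for a FAISS index changes only how `top_k_captions` finds its neighbours, not the overall flow.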