--- license: mit tags: - rag - multimodal - faiss - sentence-transformers - clip - mistral - information-retrieval --- # Multimodal RAG System This repository contains a complete Multimodal Retrieval-Augmented Generation (RAG) system that combines text and image search with LLM-based answer generation. ## System Components - **Text Embeddings**: Sentence-BERT (all-MiniLM-L6-v2) - 384 dimensions - **Image Embeddings**: CLIP (ViT-B/32) - 512 dimensions - **Vector Database**: FAISS indices for efficient similarity search - **LLM**: Mistral-7B-Instruct (4-bit quantized) - **Total Vectors**: 446 (161 text + 285 images) ## Files - `text_index.faiss`: FAISS index for text embeddings - `image_index.faiss`: FAISS index for image embeddings - `text_metadata.pkl`: Metadata for text chunks (source, page, content) - `image_metadata.pkl`: Metadata for images (source, page, image_id) - `config.json`: System configuration - `image_summary.json`: Reference summary of images ## Usage See the load cells in the notebook for loading and using this RAG system. ## Features - Semantic text search - Cross-modal image search (text query → image results) - Multiple prompting strategies (Standard, Chain-of-Thought, Few-shot, Zero-shot) - Source attribution and traceability - Real-time answer generation