File size: 1,308 Bytes
7cd0d9a |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
---
license: mit
tags:
- rag
- multimodal
- faiss
- sentence-transformers
- clip
- mistral
- information-retrieval
---
# Multimodal RAG System
This repository contains a complete Multimodal Retrieval-Augmented Generation (RAG) system that combines text and image search with LLM-based answer generation.
## System Components
- **Text Embeddings**: Sentence-BERT (all-MiniLM-L6-v2) - 384 dimensions
- **Image Embeddings**: CLIP (ViT-B/32) - 512 dimensions
- **Vector Database**: FAISS indices for efficient similarity search
- **LLM**: Mistral-7B-Instruct (4-bit quantized)
- **Total Vectors**: 446 (161 text + 285 images)
## Files
- `text_index.faiss`: FAISS index for text embeddings
- `image_index.faiss`: FAISS index for image embeddings
- `text_metadata.pkl`: Metadata for text chunks (source, page, content)
- `image_metadata.pkl`: Metadata for images (source, page, image_id)
- `config.json`: System configuration
- `image_summary.json`: Reference summary of images
## Usage
See the load cells in the notebook for loading and using this RAG system.
## Features
- Semantic text search
- Cross-modal image search (text query → image results)
- Multiple prompting strategies (Standard, Chain-of-Thought, Few-shot, Zero-shot)
- Source attribution and traceability
- Real-time answer generation
|