	---
	license: mit
	tags:
	- rag
	- multimodal
	- faiss
	- sentence-transformers
	- clip
	- mistral
	- information-retrieval
	---

	# Multimodal RAG System

	This repository contains a complete Multimodal Retrieval-Augmented Generation (RAG) system that combines text and image search with LLM-based answer generation.

	## System Components

	- Text Embeddings: Sentence-BERT (all-MiniLM-L6-v2) - 384 dimensions
	- Image Embeddings: CLIP (ViT-B/32) - 512 dimensions
	- Vector Database: FAISS indices for efficient similarity search
	- LLM: Mistral-7B-Instruct (4-bit quantized)
	- Total Vectors: 446 (161 text + 285 images)
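
As a rough illustration of how the components above fit together, the sketch below embeds text chunks with `all-MiniLM-L6-v2` and adds them to a FAISS index. The index type (`IndexFlatIP` over normalized vectors) and the function name `build_text_index` are assumptions for illustration; this README does not state how the published indices were built. Text and image vectors need separate indices because their dimensions differ (384 vs. 512).

```python
# Dimensions and counts as listed in the component list above.
TEXT_DIM = 384    # all-MiniLM-L6-v2
IMAGE_DIM = 512   # CLIP ViT-B/32
TEXT_VECTORS, IMAGE_VECTORS = 161, 285


def build_text_index(chunks):
    """Embed text chunks and add them to a flat inner-product FAISS index.

    Assumption: IndexFlatIP over L2-normalized vectors (cosine similarity).
    The index type actually used in this repository is not documented.
    """
    # Heavy dependencies are imported lazily, only when the function runs.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(chunks, normalize_embeddings=True, convert_to_numpy=True)
    vectors = vectors.astype("float32")  # FAISS expects float32 input
    if vectors.shape[1] != TEXT_DIM:
        raise ValueError(f"expected {TEXT_DIM}-dim vectors, got {vectors.shape[1]}")

    index = faiss.IndexFlatIP(TEXT_DIM)
    index.add(vectors)
    return index
```

An image index could be built the same way with a 512-dimensional index and CLIP image embeddings.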

	## Files

	- `text_index.faiss`: FAISS index for text embeddings
	- `image_index.faiss`: FAISS index for image embeddings
	- `text_metadata.pkl`: Metadata for text chunks (source, page, content)
	- `image_metadata.pkl`: Metadata for images (source, page, image_id)
	- `config.json`: System configuration
	- `image_summary.json`: Reference summary of images
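
A minimal sketch of fetching and opening these files, assuming the repository ID shown on the Hub page (`Hamza66628/multimodal-rag-system`). The helper names (`fetch`, `load_metadata`, `load_config`, `load_index`) are hypothetical, and the internal layout of the pickle and JSON files is not documented here, so inspect the loaded objects before relying on their structure.

```python
import json
import pickle
from pathlib import Path

REPO_ID = "Hamza66628/multimodal-rag-system"  # assumed from the Hub page


def fetch(filename, repo_id=REPO_ID):
    """Download one file from the Hub and return its local cache path."""
    from huggingface_hub import hf_hub_download  # lazy import
    return Path(hf_hub_download(repo_id=repo_id, filename=filename))


def load_metadata(path):
    """Load a metadata pickle such as text_metadata.pkl."""
    # Only unpickle files from a source you trust: pickle can run code on load.
    with open(path, "rb") as fh:
        return pickle.load(fh)


def load_config(path):
    """Load a JSON file such as config.json or image_summary.json."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def load_index(path):
    """Load a FAISS index such as text_index.faiss."""
    import faiss  # lazy import
    return faiss.read_index(str(path))


# Example (requires network access, huggingface_hub, and faiss):
# text_index = load_index(fetch("text_index.faiss"))
# text_metadata = load_metadata(fetch("text_metadata.pkl"))
```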

	## Usage

	The load cells in the accompanying notebook show how to load and query this RAG system.
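
For readers without the notebook, querying might look roughly like the sketch below. It rests on several assumptions that this README does not confirm: each metadata file is a list of dicts in the same order as the vectors in its index, the image index was built with a CLIP ViT-B/32 model compatible with the `clip-ViT-B-32` checkpoint from sentence-transformers, and `normalize` matches how the index was built. The names `embed_query` and `search` are hypothetical.

```python
from functools import lru_cache

# Model per index: text queries against the image index must use CLIP's
# text encoder so that the query lands in the 512-dim image space.
MODEL_NAMES = {"text": "all-MiniLM-L6-v2", "image": "clip-ViT-B-32"}


@lru_cache(maxsize=None)
def _model(name):
    from sentence_transformers import SentenceTransformer  # lazy import
    return SentenceTransformer(name)


def embed_query(query, modality="text", normalize=False):
    """Return a (1, dim) float32 array for a text query."""
    vec = _model(MODEL_NAMES[modality]).encode(
        [query], normalize_embeddings=normalize, convert_to_numpy=True
    )
    return vec.astype("float32")


def search(index, metadata, query_vectors, k=3):
    """Run a top-k search and attach the metadata entry for each hit."""
    scores, ids = index.search(query_vectors, k)
    hits = []
    for score, idx in zip(scores[0], ids[0]):
        if idx < 0:  # FAISS pads with -1 when fewer than k results exist
            continue
        hits.append({"score": float(score), **metadata[int(idx)]})
    return hits


# Example, given an index and metadata already loaded:
# text_hits = search(text_index, text_metadata, embed_query("your question"))
# image_hits = search(image_index, image_metadata,
#                     embed_query("your question", modality="image"))
```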

	## Features

	- Semantic text search
	- Cross-modal image search (text query → image results)
	- Multiple prompting strategies (Standard, Chain-of-Thought, Few-shot, Zero-shot)
	- Source attribution and traceability
	- Real-time answer generation
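
The prompting strategies and source attribution listed above could be sketched as follows. This README does not define the four strategies, so the instruction wording, the few-shot example, and the names `format_context` and `build_prompt` are all illustrative assumptions. The metadata field names (`source`, `page`, `content`, `image_id`) follow the file descriptions in the Files section.

```python
# Hypothetical instruction text for each strategy named in the feature list.
INSTRUCTIONS = {
    "standard": "Answer the question using only the context below. Cite sources as [n].",
    "zero_shot": "Answer the question from the context.",
    "chain_of_thought": (
        "Using only the context below, reason step by step, "
        "then give the final answer. Cite sources as [n]."
    ),
    "few_shot": (
        "Follow the style of the example. Answer using only the context "
        "below and cite sources as [n]."
    ),
}

# Made-up demonstration used only by the few-shot strategy.
FEW_SHOT_EXAMPLE = (
    "Example\n"
    "Context:\n[1] (manual.pdf, p. 2) The device ships with a 12 V adapter.\n"
    "Question: What voltage does the adapter supply?\n"
    "Answer: 12 V [1]"
)


def format_context(hits):
    """Number each retrieved item so the answer can cite it as [n]."""
    lines = []
    for i, hit in enumerate(hits, start=1):
        body = hit.get("content") or f"image {hit.get('image_id')}"
        lines.append(f"[{i}] ({hit['source']}, p. {hit['page']}) {body}")
    return "\n".join(lines)


def build_prompt(question, hits, strategy="standard"):
    """Assemble instruction, optional example, context, and question."""
    if strategy not in INSTRUCTIONS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    parts = [INSTRUCTIONS[strategy]]
    if strategy == "few_shot":
        parts.append(FEW_SHOT_EXAMPLE)
    parts.append("Context:\n" + format_context(hits))
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Numbering the context entries keeps each citation in the generated answer traceable to a source file and page.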