---
title: Multimodal RAG Kaggle Based
emoji: 👁
colorFrom: red
colorTo: pink
sdk: gradio
sdk_version: 5.25.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Multimodal RAG to augment English recipe searches
---
# Multimodal Retrieval System with FAISS

This repository contains a prototype system for multimodal information retrieval using FAISS, capable of searching across text and images using vector similarity.
## Structure

- `notebook/` (or `.ipynb`): Contains the logic to generate the vector indexes for both text and images.
- `app.py`: Gradio-based interface for interacting with the system.
- `search_ocean.py`: Core logic for performing FAISS-based similarity search using precomputed indexes.
- `text_index.faiss`, `image_index.faiss`: The FAISS index files generated by the notebook (already included in the app).
- `metadata_text.json`, `metadata_image.json`: Associated metadata for mapping index results back to source information.
## What it does

- Loads precomputed FAISS indexes (for text and image).
- Performs retrieval based on a text or image query.
- Returns the top matching results using cosine similarity.
## What it doesn't (yet) do

- No generation step (e.g., using LLMs) is implemented in this app.
- While the code for image retrieval is ready, image indexes must be built in the notebook beforehand.
- There is **no context overlap** implemented when chunking the data for indexing. Each chunk is indexed independently, which may reduce retrieval quality when relevant information spans a chunk boundary.
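For reference, a character-level chunker with overlap could look like the sketch below; the function name and the size/overlap values are illustrative and not taken from the notebook.

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share
    `overlap` characters, so content cut at one boundary still appears
    whole in the neighboring chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```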
## Dependencies

- `faiss-cpu`
- `sentence-transformers`
- `openai-clip`
- `torch`
- `torchvision`
- `gradio`
- `Pillow`
## Notes

- The app is designed to separate concerns between indexing (offline, in the notebook) and retrieval (live, in the Gradio app).
- You can easily extend this to include LLM generation or contextual QA once relevant results are retrieved.
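As a sketch of that extension, retrieved chunks could be stitched into a grounding prompt before calling an LLM. `search_fn` and `llm_fn` below are placeholders for your retrieval function and model client; neither exists in this repository.

```python
def answer_with_context(question: str, search_fn, llm_fn, k: int = 3) -> str:
    """Retrieve the top-k chunks and pass them to an LLM as context.

    search_fn(question, k) -> [(chunk_text, score), ...]  (placeholder)
    llm_fn(prompt)         -> str                         (placeholder)
    """
    hits = search_fn(question, k)
    context = "\n\n".join(text for text, _score in hits)
    prompt = (
        "Answer the question using only the recipe excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_fn(prompt)
```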