---
title: VisualRAG
emoji: 🔍
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 4.40.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# 🔍 VisualRAG — Multi-Modal AI System

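A **Retrieval-Augmented Generation (RAG)** system that combines computer vision and natural language understanding: uploaded images pass through YOLOv8 object detection and CLIP embedding into a FAISS index, and text questions are answered by Zephyr-7B from the retrieved matches.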

## 🧠 Pipeline

**Index:** `Image → YOLOv8 detection → CLIP ViT-B/32 embedding → FAISS vector store`  
**Query:** `Text → CLIP text embedding → cosine k-NN → Zephyr-7B answer generation`
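
The `cosine k-NN` step relies on a property worth spelling out: an inner-product index such as FAISS `IndexFlatIP` returns cosine similarity only when the stored vectors and the query are L2-normalized. The sketch below shows that idea in plain Python, with toy 2-d vectors standing in for 512-d CLIP embeddings. It assumes normalization happens before indexing, which this README does not state; `l2_normalize` and `top_k` are hypothetical helper names, not functions from this app or from FAISS.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k):
    """Brute-force inner-product search over already-normalized vectors.

    Returns up to k (score, row) pairs, best match first. This mirrors what a
    flat inner-product index does; it is an illustration, not the FAISS API.
    """
    q = l2_normalize(query)
    scores = [(sum(a * b for a, b in zip(q, v)), row) for row, v in enumerate(index)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]


# Toy "image embeddings" (assumption: real ones would come from CLIP).
index = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]]
hits = top_k([2.0, 0.1], index, k=2)
```

Because every vector has unit length, the magnitude of the raw embeddings (for example `[0.0, 2.0]` versus `[3.0, 3.0]`) does not affect the ranking; only direction does.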

## 🛠 Stack

| Component | Technology |
|---|---|
| Object detection | YOLOv8n (Ultralytics) |
| Visual embedding | CLIP ViT-B/32 (OpenAI) |
| Vector index | FAISS IndexFlatIP |
| LLM | Zephyr-7B-β (HF Serverless API) |
| UI | Gradio 4.40.0 |

## 🚀 How to use

1. **Detect & Index**: upload images; YOLOv8 detects objects, CLIP computes 512-d embeddings, and FAISS stores them
2. **Query (RAG)**: ask a question; CLIP embeds the text, FAISS retrieves the most similar images, and Zephyr-7B generates the answer
3. **How it works**: a full architecture overview
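
As a rough illustration of the hand-off in step 2, the retrieved images and their detected objects have to be turned into text before the LLM can use them. The sketch below shows one way that could look. `RetrievedImage`, `build_prompt`, and the prompt wording are hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass


@dataclass
class RetrievedImage:
    """Hypothetical record for one retrieval hit."""
    filename: str
    labels: list[str]   # object classes reported by the detector
    score: float        # similarity between query and image embedding


def build_prompt(question, hits):
    """Format retrieval hits as numbered context lines followed by the question."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        labels = ", ".join(hit.labels) if hit.labels else "no objects detected"
        lines.append(f"{rank}. {hit.filename} (similarity {hit.score:.2f}): {labels}")
    context = "\n".join(lines) if lines else "(no images retrieved)"
    return (
        "Answer the question using only the retrieved images below.\n\n"
        f"Retrieved images:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "Where is the dog?",
    [RetrievedImage("park.jpg", ["dog", "person"], 0.312)],
)
```

Keeping the context as short numbered lines makes it easy to see which image the answer drew on, and the explicit fallback text avoids sending an empty context when nothing has been indexed yet.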

## 🔑 Optional: HF token

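To authenticate requests with your own Hugging Face access token, add it as a Space secret:

Settings → Variables and secrets → New secret → Name: `HF_TOKEN`, Value: your access token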
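
Space secrets are exposed to the running app as environment variables, so the token can be read at startup and skipped when absent. A minimal sketch, assuming the app sends it as a bearer token; `auth_headers` is a hypothetical helper, not a function from `app.py`.

```python
import os


def auth_headers(env=os.environ):
    """Return an Authorization header if HF_TOKEN is set, else an empty dict."""
    token = env.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}


# Passing a plain dict instead of os.environ makes the helper easy to try out.
headers = auth_headers({"HF_TOKEN": "hf_example"})
```

Returning an empty dict when the secret is missing keeps the token optional: the same request code runs either way.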