---
title: VisualRAG
emoji: 🔍
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 4.40.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# 🔍 VisualRAG: Multi-Modal AI System

A production-grade **Retrieval-Augmented Generation (RAG)** system that combines computer vision (object detection and image embeddings) with natural language understanding (text embeddings and LLM answer generation).

## 🧠 Pipeline

**Index:** `Image → YOLOv8 detection → CLIP ViT-B/32 embedding → FAISS vector store`

**Query:** `Text → CLIP text embedding → cosine k-NN → Zephyr-7B answer generation`

CLIP maps images and text into the same embedding space, so a text query can be compared directly against stored image embeddings. FAISS `IndexFlatIP` performs exact inner-product search, which equals cosine similarity when the embeddings are L2-normalized.

## 🛠 Stack

| Component | Technology |
|---|---|
| Object detection | YOLOv8n (Ultralytics) |
| Visual embedding | CLIP ViT-B/32 (OpenAI) |
| Vector index | FAISS `IndexFlatIP` (exact inner-product search) |
| LLM | Zephyr-7B-β (HF Serverless Inference API) |
| UI | Gradio 4.40.0 |

## 🚀 How to use

1. **Detect & Index**: upload images; YOLOv8 detects objects, and CLIP encodes each image as a 512-d embedding that is stored in the FAISS index.
2. **Query (RAG)**: ask a question; CLIP retrieves the most relevant images, and Zephyr-7B generates an answer from them.
3. **How it works**: a full overview of the architecture.

## 🔑 Optional: HF token

In the Space, go to **Settings → Variables and secrets → New secret** and create a secret named `HF_TOKEN`.
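The retrieval step of the Query pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's code. `FlatIPIndex` is a hypothetical stand-in for FAISS `IndexFlatIP`, and the 3-d vectors and file names are made-up placeholders for 512-d CLIP embeddings. The sketch shows why an exhaustive inner-product search returns cosine nearest neighbours once every stored vector and the query are L2-normalized.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


class FlatIPIndex:
    """Hypothetical pure-Python stand-in for faiss.IndexFlatIP:
    stores vectors as-is and scores every one of them against the query."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vec):
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim}-d vector, got {len(vec)}-d")
        self.vectors.append(list(vec))

    def search(self, query, k):
        """Return (scores, ids) of the k highest inner products, best first."""
        scored = [
            (sum(q * v for q, v in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        # Sort by descending score; break ties by insertion order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        top = scored[:k]
        return [s for s, _ in top], [i for _, i in top]


# Toy "image embeddings" (placeholders for 512-d CLIP image vectors).
images = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.95],
    "dog.jpg": [0.8, 0.3, 0.1],
}

index = FlatIPIndex(dim=3)
names = []  # maps FAISS-style integer ids back to image names
for name, emb in images.items():
    index.add(l2_normalize(emb))
    names.append(name)

# Toy "text embedding" for a query such as "a pet animal".
query = l2_normalize([1.0, 0.2, 0.0])
scores, ids = index.search(query, k=2)
print([names[i] for i in ids])  # → ['cat.jpg', 'dog.jpg']
```

Skipping the normalization would make the index rank by raw inner product, which favours vectors with large norms rather than similar directions. In the actual app, the retrieved images (and presumably their YOLOv8 detections) would then be passed to Zephyr-7B as context; that prompt format is not shown here.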