A newer version of the Gradio SDK is available: 6.19.0
metadata
title: VisualRAG
emoji: π
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 4.40.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
π VisualRAG β Multi-Modal AI System
A production-grade Retrieval-Augmented Generation (RAG) system combining computer vision and natural language understanding.
π§ Pipeline
Index: Image β YOLOv8 detection β CLIP ViT-B/32 embedding β FAISS vector store
Query: Text β CLIP text embedding β cosine k-NN β Zephyr-7B answer generation
π Stack
| Component | Technology |
|---|---|
| Object detection | YOLOv8n (Ultralytics) |
| Visual embedding | CLIP ViT-B/32 (OpenAI) |
| Vector index | FAISS IndexFlatIP |
| LLM | Zephyr-7B-Ξ² (HF Serverless API) |
| UI | Gradio 4.40.0 |
π How to use
- Detect & Index β upload images; YOLOv8 detects objects, CLIP stores 512-d embeddings in FAISS
- Query (RAG) β ask a question; CLIP retrieves relevant images, Zephyr-7B answers
- How it works β full architecture overview
π Optional: HF token
Settings β Variables and secrets β New secret β Name: HF_TOKEN