---
title: VisualRAG
emoji: π
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 4.40.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---

# VisualRAG: Multi-Modal AI System

A production-grade **Retrieval-Augmented Generation (RAG)** system that combines computer vision with natural language understanding.

## Pipeline

- **Index:** `Image → YOLOv8 detection → CLIP ViT-B/32 embedding → FAISS vector store`
- **Query:** `Text → CLIP text embedding → cosine k-NN → Zephyr-7B answer generation`
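
The retrieval step of the query path can be illustrated with a small, dependency-free sketch. This is not the app's code: `ToyFlatIndex` is a hypothetical stand-in for the FAISS index, the 3-d vectors stand in for 512-d CLIP embeddings, and the file names are made up. It relies on one fact: for L2-normalized vectors the inner product equals cosine similarity, which is why a flat inner-product index can serve cosine k-NN.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class ToyFlatIndex:
    """Brute-force inner-product index (hypothetical stand-in for FAISS)."""

    def __init__(self):
        self.vectors = []   # unit-length embeddings
        self.payloads = []  # one payload (e.g. a file name) per embedding

    def add(self, vec, payload):
        # Normalize on insert so that inner product == cosine similarity.
        self.vectors.append(l2_normalize(vec))
        self.payloads.append(payload)

    def search(self, query, k):
        # Score every stored vector against the query, then keep the top k.
        q = l2_normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), payload)
            for v, payload in zip(self.vectors, self.payloads)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy "image embeddings"; a real system would get these from CLIP.
index = ToyFlatIndex()
index.add([0.9, 0.1, 0.0], "dog.jpg")
index.add([0.0, 1.0, 0.2], "car.jpg")
index.add([0.8, 0.3, 0.1], "puppy.jpg")

# Toy "text embedding" for the query.
results = index.search([1.0, 0.2, 0.0], k=2)
top_files = [name for _, name in results]
print(top_files)
```

An exhaustive scan like this is exact but linear in the number of indexed images, which is the same trade-off a flat index makes.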

## Stack

| Component | Technology |
|---|---|
| Object detection | YOLOv8n (Ultralytics) |
| Image and text embedding | CLIP ViT-B/32 (OpenAI) |
| Vector index | FAISS IndexFlatIP |
| LLM | Zephyr-7B-β (HF Serverless API) |
| UI | Gradio 4.40.0 |

## How to use

1. **Detect & Index**: upload images. YOLOv8 detects objects, and CLIP encodes each image into a 512-d embedding that is stored in FAISS.
2. **Query (RAG)**: ask a question. CLIP retrieves the most relevant images, and Zephyr-7B generates the answer.
3. **How it works**: a full architecture overview.
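
Between retrieval and generation in step 2, the retrieved images have to be turned into text the LLM can read. The sketch below shows one plausible way to do that, assuming each hit carries its file name, similarity score, and detected object labels. `RetrievedImage`, `build_prompt`, and the prompt wording are hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass


@dataclass
class RetrievedImage:
    """One retrieval hit: file name, similarity score, detected labels."""
    filename: str
    score: float
    labels: list[str]


def build_prompt(question: str, hits: list[RetrievedImage]) -> str:
    """Format retrieved image metadata as context, followed by the question."""
    lines = ["Context (retrieved images):"]
    for rank, hit in enumerate(hits, start=1):
        objects = ", ".join(hit.labels) if hit.labels else "no objects detected"
        lines.append(f"{rank}. {hit.filename} (score {hit.score:.2f}): {objects}")
    lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Answer using only the context above.")
    return "\n".join(lines)


# Made-up hits; a real run would take these from the vector search.
hits = [
    RetrievedImage("park.jpg", 0.31, ["dog", "person"]),
    RetrievedImage("street.jpg", 0.27, ["car"]),
]
prompt = build_prompt("Which image shows a dog?", hits)
print(prompt)
```

Keeping the context to file names and labels keeps the prompt short. Whether the scores belong in the prompt at all is a design choice to evaluate against the answers the model produces.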

## Optional: HF token

To supply a Hugging Face token, add it as a Space secret: Settings → Variables and secrets → New secret → Name: `HF_TOKEN`.
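
Space secrets are exposed to the running app as environment variables, so the token can be read at startup. The following is a minimal sketch of handling the optional token. The helper names `get_hf_token` and `auth_headers` are hypothetical, and how `app.py` actually consumes the token is not shown in this README.

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def auth_headers(token):
    """Build HTTP headers; omit Authorization when no token is configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}


token = get_hf_token()
headers = auth_headers(token)
print("HF token configured:", token is not None)
```

Treating a missing token as `None` rather than raising keeps the token optional, as the heading above indicates.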