VisualRAG / README.md
Faraz618's picture
Update README.md
66b81c5 verified
|
Raw
History Blame Contribute Delete
1.16 kB
---
title: VisualRAG
emoji: πŸ”
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 4.40.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
# πŸ” VisualRAG β€” Multi-Modal AI System
A production-grade **Retrieval-Augmented Generation (RAG)** system combining computer vision and natural language understanding.
## 🧠 Pipeline
**Index:** `Image β†’ YOLOv8 detection β†’ CLIP ViT-B/32 embedding β†’ FAISS vector store`
**Query:** `Text β†’ CLIP text embedding β†’ cosine k-NN β†’ Zephyr-7B answer generation`
## πŸ›  Stack
| Component | Technology |
|---|---|
| Object detection | YOLOv8n (Ultralytics) |
| Visual embedding | CLIP ViT-B/32 (OpenAI) |
| Vector index | FAISS IndexFlatIP |
| LLM | Zephyr-7B-Ξ² (HF Serverless API) |
| UI | Gradio 4.40.0 |
## πŸš€ How to use
1. **Detect & Index** β€” upload images; YOLOv8 detects objects, CLIP stores 512-d embeddings in FAISS
2. **Query (RAG)** β€” ask a question; CLIP retrieves relevant images, Zephyr-7B answers
3. **How it works** β€” full architecture overview
## πŸ”‘ Optional: HF token
Settings β†’ Variables and secrets β†’ New secret β†’ Name: `HF_TOKEN`