---
title: RAG
emoji: ⚡
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# RAG Document Assistant
A complete Retrieval-Augmented Generation (RAG) Python project using:
- HuggingFace Transformers
- sentence-transformers (for embeddings)
- ChromaDB (vector store)
- Gradio (UI)
## Project Structure
```text
project/
├── app.py
├── rag_pipeline.py
├── generator.py
├── utils.py
├── requirements.txt
├── README.md
├── data/
└── db/
```
## Installation
1. Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # on Windows use venv\Scripts\activate
```

2. Install the requirements:

```bash
pip install -r requirements.txt
```

3. Run the app:

```bash
python app.py
```

Then open the Gradio URL shown in the console (by default http://127.0.0.1:7860).
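For reference, a plausible `requirements.txt` for the stack listed above might look like the following (package names taken from this README; exact pins are not specified by the project):

```text
transformers
sentence-transformers
chromadb
gradio
textract
```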
## How RAG Works (short)
- Documents are uploaded and their text is extracted.
- Text is chunked into overlapping passages.
- Each chunk is embedded using a pretrained sentence-transformer.
- Chunks and embeddings are stored in a vector database (ChromaDB).
- At query time, the user's question is embedded and used to retrieve the most relevant chunks.
- Retrieved chunks are passed to a generator LLM which composes a grounded answer.
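The chunking and retrieval steps above can be sketched in miniature. This is illustrative only: the chunker is plain Python, and the similarity search runs over toy vectors in place of real sentence-transformers embeddings and ChromaDB; function names such as `chunk_text` and `top_k` are hypothetical, not from the project code.

```python
import math

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=2):
    """Indices of the k stored vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In the real pipeline, `cosine`/`top_k` would be replaced by a ChromaDB collection query, and the vectors would come from a pretrained sentence-transformer rather than being hand-built.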
## Notes & Troubleshooting
- Textract may need system-level dependencies for PDF/DOCX parsing on some platforms.
- Large models may require GPUs. For local CPU usage, prefer small models such as flan-t5-small.
- If you see memory errors, reduce the model size or run on a machine with more RAM.
## Roadmap / Improvements
- Add user authentication and per-user collections
- Support incremental indexing and deletion
- Add streaming generation for long answers
- Add API endpoints via FastAPI