metadata
title: Rag Chatbot
emoji: π
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: 'Retrieval-Augmented Generation (RAG) chatbot '
license: mit
PDF-Based RAG Chatbot
A simple, 100% free Retrieval-Augmented Generation (RAG) chatbot that answers questions from PDF documents. No API keys required!
Links
- Live Demo: [Deploy to get your link]
- GitHub: [Your repository link]
After deployment, update these links with your actual URLs!
Features
- Upload any two PDF documents
- Ask questions about the content
- 100% Free - No API keys needed
- Privacy-friendly - All models run inside the app itself, with no calls to external APIs
- Uses open-source Hugging Face models
- Fast vector search with FAISS
How to Use
Online (Hugging Face Spaces)
- Visit the deployed app
- Upload two PDF files
- Click "Process PDFs" (takes ~30 seconds first time)
- Ask questions about the documents!
Local Setup
- Clone this repository:
git clone <your-repo-url>
cd <repo-name>
- Install dependencies:
python setup.py
Or if you prefer:
pip install -r requirements.txt
- Run the app:
streamlit run app.py
- Open your browser to http://localhost:8501
Note: If you encounter dependency errors, see INSTALLATION.md for troubleshooting.
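Before digging into INSTALLATION.md, a quick check of which packages are missing can narrow things down. The sketch below is only an illustration: the module names are inferred from the Tech Stack section and are assumptions, since the authoritative list is whatever requirements.txt pins.

```python
from importlib.util import find_spec

# Import names inferred from the Tech Stack section (an assumption);
# requirements.txt remains the authoritative dependency list.
ASSUMED_MODULES = [
    "streamlit",
    "PyPDF2",
    "sentence_transformers",
    "faiss",
    "transformers",
]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed dependencies are importable.")
```

Any module reported as missing is a candidate for reinstalling with pip before running the app again.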
How It Works
- PDF Reading: Extract text from PDFs using PyPDF2
- Text Chunking: Split documents into 1000-character chunks, with a 200-character overlap between consecutive chunks
- Embeddings: Convert chunks to vectors using Sentence Transformers
- Vector Search: Store in FAISS index for fast similarity search
- Question Answering:
  - Your question is converted to a vector
  - Top 3 most similar chunks are retrieved
  - FLAN-T5 generates an answer from the context
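The chunking step can be sketched in a few lines of plain Python. This is a minimal illustration of fixed-size chunking with overlap, using the 1000/200 figures from the list above; the splitter that app.py actually uses is not shown here, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in one of the two neighbours.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # [1000, 1000, 900]
```

The overlap trades some duplicated storage for better recall: text near a boundary is retrievable from either neighbouring chunk.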
Tech Stack
- Streamlit: Simple, clean web interface
- PyPDF2: PDF text extraction
- Sentence Transformers: Text embeddings (all-MiniLM-L6-v2)
- FAISS: Fast vector similarity search
- FLAN-T5: Answer generation (google/flan-t5-base)
All models are free and open-source from Hugging Face!
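To make "vector similarity search" concrete, here is a pure-Python stand-in for what a flat (brute-force) index does: score every stored vector against the query and keep the best few. It is not FAISS code, and cosine similarity is an assumption; the app's index may rank by L2 distance instead. The toy vectors have 3 dimensions, whereas all-MiniLM-L6-v2 produces 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k most similar chunk vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for chunk embeddings.
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
best = top_k([1.0, 0.0, 0.0], chunk_vecs, k=2)
print([index for index, _ in best])  # [0, 2]
```

A library such as FAISS performs the same ranking with optimized native code, which matters once the number of chunks grows large.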
Deployment to Hugging Face Spaces
- Create a new Space on huggingface.co/spaces
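- Choose "Streamlit" as the SDK (note that the metadata above declares sdk: docker with app_port: 8501 instead; a Docker Space also needs a Dockerfile)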
- Upload these files:
  - app.py
  - requirements.txt
  - README.md
- The Space will automatically build and deploy!
Example Questions
- What are the main topics in these documents?
- Summarize the key findings
- What does the document say about [specific topic]?
- List the important points mentioned
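Questions like these are combined with the retrieved chunks before they reach FLAN-T5. The sketch below shows one plausible way to assemble that input; the template wording, the function name `build_prompt`, and the `max_chars` cap are all assumptions for illustration, not the prompt that app.py actually uses.

```python
def build_prompt(question, chunks, max_chars=1500):
    """Join retrieved chunks into a context block and append the question.

    `max_chars` is an arbitrary cap (an assumption) that crudely guards
    against passing the model more context than it can usefully read.
    """
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    context = context[:max_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = ["FAISS is a library for similarity search. ", " It indexes dense vectors."]
print(build_prompt("What does the document say about FAISS?", retrieved))
```

Ending the prompt with "Answer:" nudges an instruction-tuned model to respond with the answer alone; broad questions such as "Summarize the key findings" depend heavily on which chunks were retrieved.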
Why This Stack?
- Streamlit: Much simpler than Gradio, easy to understand
- PyPDF2: Straightforward PDF reading
- No API Keys: Everything runs locally, completely free
- Fast: FAISS provides instant search results
- Open Source: All models from Hugging Face
License
MIT