---
title: Rag Chatbot
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: 'Retrieval-Augmented Generation (RAG) chatbot'
license: mit
---

PDF-Based RAG Chatbot

A simple, 100% free Retrieval-Augmented Generation (RAG) chatbot that answers questions from PDF documents. No API keys required!

🔗 Links

  • Live Demo: [Deploy to get your link]
  • GitHub: [Your repository link]

After deployment, update these links with your actual URLs!

✨ Features

  • ✅ Upload any two PDF documents
  • ✅ Ask questions about the content
  • ✅ 100% Free - No API keys needed
  • ✅ Privacy-friendly - Everything runs locally
  • ✅ Uses open-source Hugging Face models
  • ✅ Fast vector search with FAISS

🚀 How to Use

Online (Hugging Face Spaces)

  1. Visit the deployed app
  2. Upload two PDF files
  3. Click "Process PDFs" (takes about 30 seconds the first time)
  4. Ask questions about the documents!

Local Setup

  1. Clone this repository:

       git clone <your-repo-url>
       cd <repo-name>

  2. Install dependencies:

       python setup.py

     Or, if you prefer:

       pip install -r requirements.txt

  3. Run the app:

       streamlit run app.py

  4. Open your browser to http://localhost:8501

Note: If you encounter dependency errors, see INSTALLATION.md for troubleshooting.

🛠️ How It Works

  1. PDF Reading: Extract text from PDFs using PyPDF2
  2. Text Chunking: Split documents into 1000-character chunks with a 200-character overlap
  3. Embeddings: Convert chunks to vectors using Sentence Transformers
  4. Vector Search: Store in FAISS index for fast similarity search
  5. Question Answering:
    • Your question is converted to a vector
    • Top 3 most similar chunks are retrieved
    • FLAN-T5 generates an answer from the context
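
The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `chunk_text`, `cosine`, and `top_k` are hypothetical helper names, the brute-force cosine ranking is a stand-in for the FAISS index, and the hand-made 2-D vectors stand in for Sentence Transformers embeddings.

```python
import math


def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # distance between chunk starts (800 with the defaults)
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy usage: a 2500-character document and hand-made 2-D "embeddings".
chunks = chunk_text("x" * 2500)
chunk_lengths = [len(c) for c in chunks]
best = top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.9, 0.5], [-1.0, 0.0]], k=3)
```

With the default sizes, consecutive chunks share 200 characters, so a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.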

💻 Tech Stack

  • Streamlit: Simple, clean web interface
  • PyPDF2: PDF text extraction
  • Sentence Transformers: Text embeddings (all-MiniLM-L6-v2)
  • FAISS: Fast vector similarity search
  • FLAN-T5: Answer generation (google/flan-t5-base)

All models are free and open-source from Hugging Face!
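
As a rough sketch of how these libraries might be wired together (not the code in app.py), the snippet below reads a PDF, embeds chunks, searches a FAISS index, and asks FLAN-T5 for an answer. Several things here are assumptions: the function names `read_pdf`, `build_prompt`, and `answer_question` are hypothetical, the prompt wording is made up for illustration, `IndexFlatL2` is only one possible FAISS index type, and `PdfReader` assumes PyPDF2 2.x or later.

```python
def read_pdf(path):
    """Extract the text of every page in a PDF (assumes PyPDF2 2.x or later)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return None or "" for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one prompt (wording is an assumption)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(question, chunks, k=3):
    """Embed the chunks, retrieve the k nearest to the question, and generate an answer.

    Third-party imports are kept inside the function so this file can be
    imported even when the heavy dependencies are not installed.
    """
    import faiss
    from sentence_transformers import SentenceTransformer
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    if not chunks:
        return ""

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_vecs = embedder.encode(chunks, convert_to_numpy=True).astype("float32")

    # Exact L2 search; the index type is an assumption, not taken from the app.
    index = faiss.IndexFlatL2(chunk_vecs.shape[1])
    index.add(chunk_vecs)

    query_vec = embedder.encode([question], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, min(k, len(chunks)))
    retrieved = [chunks[i] for i in ids[0]]

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
    inputs = tokenizer(
        build_prompt(question, retrieved), return_tensors="pt", truncation=True
    )
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

This version reloads both models on every call, which keeps the sketch short but is slow. A Streamlit app would more likely load them once and reuse them, for example through `st.cache_resource`.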

📦 Deployment to Hugging Face Spaces

  1. Create a new Space on huggingface.co/spaces
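  2. Choose "Streamlit" as the SDK (note: this Space's own metadata declares sdk: docker with app_port: 8501; a Docker Space additionally needs a Dockerfile that launches Streamlit on that port)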
  3. Upload these files:
    • app.py
    • requirements.txt
    • README.md
  4. The Space will automatically build and deploy!

💡 Example Questions

  • What are the main topics in these documents?
  • Summarize the key findings
  • What does the document say about [specific topic]?
  • List the important points mentioned

🎯 Why This Stack?

  • Streamlit: Much simpler than Gradio, easy to understand
  • PyPDF2: Straightforward PDF reading
  • No API Keys: Everything runs locally, completely free
  • Fast: FAISS makes similarity search over the chunks near-instant
  • Open Source: All models from Hugging Face

License

MIT