title: DocuMind-AI
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

📄 DocuMind-AI

An intelligent document assistant powered by RAG, LangChain, Groq LLaMA 3, and FAISS. Upload any PDF and chat with it using state-of-the-art AI, built for real-world enterprise use.


📄 DocuMind-AI: Intelligent Document Assistant

An enterprise-grade RAG (Retrieval-Augmented Generation) chatbot that lets users upload a PDF and query it in natural language. Built with LangChain, Groq LLaMA 3, FAISS, and Streamlit.

🔗 Live Demo: huggingface.co/spaces/Yugadharshini/DocuMind-AI

🔗 GitHub: github.com/skrYugadharshini/DocuMind-AI


🚀 Features

  • Upload any PDF document
  • Ask questions in natural language
  • Semantic search using FAISS vector store
  • Fast and accurate answers powered by Groq LLaMA 3
  • Chat history support
  • Clean and intuitive Streamlit UI
  • Deployed on Hugging Face Spaces using Docker

🛠️ Tech Stack

| Technology | Purpose |
|---|---|
| Python | Core programming language |
| LangChain | LLM orchestration framework |
| Groq LLaMA 3 | Large language model for answer generation |
| FAISS | Vector store for semantic search |
| HuggingFace Sentence Transformers | Text embeddings (all-MiniLM-L6-v2) |
| Streamlit | Frontend UI |
| PyPDF | PDF loading and parsing |
| Docker | Containerization for deployment |
| Hugging Face Spaces | Cloud deployment |

⚙️ Full Technical Process

Step 1: PDF Loading

  • The user uploads a PDF through the Streamlit UI
  • PyPDF loads the file and extracts the text from every page

Step 2: Text Chunking

  • The document is split into chunks of up to 500 characters
  • Consecutive chunks overlap by 50 characters to preserve context across chunk boundaries
  • Splitting uses LangChain's RecursiveCharacterTextSplitter
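The 500/50 numbers above can be illustrated with a plain sliding window. This is a simplified stand-in written for this README, not the RecursiveCharacterTextSplitter algorithm itself (which also tries to break on paragraph, line, and word boundaries); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters.

    Simplified illustration only: it cuts at exact character offsets and
    ignores paragraph or word boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 450 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Example: a 1200-character string yields windows starting at 0, 450, and 900.
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole in at least one chunk when it is short enough.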

Step 3: Vector Embeddings

  • Each chunk is converted to a 384-dimensional vector
  • Embeddings come from the HuggingFace sentence-transformers model all-MiniLM-L6-v2
  • The vectors capture the semantic meaning of the text, so chunks with similar meaning land close together

Step 4: Vector Store

  • All chunk vectors are stored in a FAISS index
  • The index enables fast similarity search across all chunks
  • For any question, it returns the most relevant chunks
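To make "similarity search" concrete, here is a brute-force nearest-neighbour sketch in pure Python. It is not FAISS and not the app's code: FAISS does the same kind of lookup with optimized index structures. The distance metric the app uses is not stated in this README; squared Euclidean (L2) distance is assumed here, and the names `squared_l2` and `top_k` are hypothetical.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: list[float], vectors: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k stored vectors closest to the query.

    Brute force: compares the query against every vector, which is what a
    flat (exhaustive) index does conceptually.
    """
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


# Toy 2-dimensional "embeddings"; real ones would have 384 dimensions.
stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.1]]
nearest = top_k([0.0, 0.0], stored, k=2)
```

Each returned index maps back to a text chunk, which is how the retrieved vectors become context for the model.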

Step 5: RAG Chain

  • The user asks a question
  • The question is converted to a vector with the same embedding model
  • FAISS retrieves the 4 most relevant chunks
  • The chunks and the question are sent to Groq LLaMA 3
  • LLaMA 3 generates an answer grounded in the retrieved context
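The "chunks + question" step amounts to assembling a single prompt. The sketch below shows one way that could look; the template wording and the function name `build_prompt` are assumptions made for illustration, and the actual prompt used by the app's LangChain chain is not shown in this README.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    Hypothetical template: numbers each chunk so the model (and a reader
    debugging the output) can tell the sources apart.
    """
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the refund window?",
    ["  Refunds are accepted within 30 days. ", "Shipping is free."],
)
```

Instructing the model to rely only on the supplied context is a common way to keep answers tied to the uploaded document rather than to the model's general knowledge.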

Step 6: Response

  • The answer is displayed in the Streamlit chat UI
  • Chat history is maintained for the duration of the session
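Session-scoped chat history is typically just a list of messages kept in per-session state. The sketch below uses a plain dict as a stand-in for Streamlit's `st.session_state`, which supports the same key-based access; the `messages` key and the `append_turn` helper are hypothetical names, not taken from the app.

```python
def append_turn(state, role: str, content: str) -> list[dict]:
    """Append one chat message to the history stored under state["messages"].

    `state` can be any dict-like object; in a Streamlit app it would be
    st.session_state (assumption: the app stores history this way).
    """
    if "messages" not in state:
        state["messages"] = []  # first message of the session
    state["messages"].append({"role": role, "content": content})
    return state["messages"]


state = {}  # stand-in for st.session_state
append_turn(state, "user", "What is this document about?")
append_turn(state, "assistant", "It describes the onboarding process.")
```

Because the list lives in session state, it survives Streamlit's script reruns but is lost when the browser session ends.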

🛠️ Built With

  • LangChain
  • Groq (LLaMA 3)
  • FAISS Vector Store
  • HuggingFace Embeddings
  • Streamlit

⚙️ How to Run

  1. Clone the repo
  2. Install dependencies: pip install -r requirements.txt
  3. Add your Groq API key to the .env file
  4. Run: streamlit run app.py
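For step 3, a .env file is a list of `KEY=value` lines. The sketch below parses that format with the standard library only, to show what the file is expected to contain. The variable name `GROQ_API_KEY` is an assumption (it is the name the Groq Python client reads by default); check app.py for the name the app actually uses. The helper `parse_env_text` is hypothetical, and real projects usually rely on a library such as python-dotenv instead.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments.

    Minimal illustration: handles optional surrounding quotes but not
    escapes, multi-line values, or `export` prefixes.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example .env content with a placeholder key (not a real credential).
example = '# Groq credentials\nGROQ_API_KEY="your-key-here"\n'
settings = parse_env_text(example)
```

Keep the .env file out of version control so the key is not published with the repository.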

🎯 Future Improvements

  • Persist chat history across sessions
  • Support multiple PDF uploads
  • Highlight source chunks in the PDF
  • Add support for other document types (DOCX, TXT)
  • Tune chunk size and overlap for better retrieval accuracy
  • Add user authentication

👩‍💻 Author

Yugadharshini