---
title: DocuMind-AI
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# 📄 DocuMind-AI — Intelligent Document Assistant

An enterprise-grade RAG (Retrieval-Augmented Generation) chatbot that lets users upload any PDF and query it in natural language. Built with LangChain, Groq LLaMA 3, FAISS, and Streamlit.

🔗 Live Demo: [huggingface.co/spaces/Yugadharshini/DocuMind-AI](https://huggingface.co/spaces/Yugadharshini/DocuMind-AI)

🔗 GitHub: [github.com/skrYugadharshini/DocuMind-AI](https://github.com/skrYugadharshini/DocuMind-AI)

---

## 🚀 Features

- Upload any PDF document
- Ask questions in natural language
- Semantic search using a FAISS vector store
- Answers generated by LLaMA 3 on Groq, grounded in the retrieved document chunks
- Chat history within a session
- Clean, intuitive Streamlit UI
- Deployed on Hugging Face Spaces using Docker

---

## 🛠️ Tech Stack

| Technology | Purpose |
|---|---|
| Python | Core programming language |
| LangChain | LLM orchestration framework |
| Groq LLaMA 3 | Large language model for answer generation |
| FAISS | Vector store for semantic search |
| HuggingFace Sentence Transformers | Text embeddings (all-MiniLM-L6-v2) |
| Streamlit | Frontend UI |
| PyPDF | PDF loading and parsing |
| Docker | Containerization for deployment |
| Hugging Face Spaces | Cloud deployment |

---

## ⚙️ Full Technical Process

### Step 1 — PDF Loading

- The user uploads a PDF through the Streamlit UI
- PyPDF extracts the text from every page

### Step 2 — Text Chunking

- The document is split into chunks of up to 500 characters
- 50-character overlap
  between chunks to preserve context
- Uses LangChain's RecursiveCharacterTextSplitter

### Step 3 — Vector Embeddings

- Each chunk is converted to a 384-dimensional vector
- Uses the HuggingFace sentence-transformers model all-MiniLM-L6-v2
- Chunks with similar meaning map to nearby vectors

### Step 4 — Vector Store

- All vectors are stored in a FAISS index
- Enables fast similarity search across all chunks
- Finds the chunks most relevant to a given question

### Step 5 — RAG Chain

- The user asks a question
- The question is embedded with the same model as the chunks
- FAISS retrieves the 4 most relevant chunks
- The retrieved chunks and the question are sent to Groq LLaMA 3
- LLaMA 3 generates an answer grounded in that context

### Step 6 — Response

- The answer is displayed in the Streamlit chat UI
- Chat history is kept for the duration of the session

---

## ⚙️ How to Run

1. Clone the repo: `git clone https://github.com/skrYugadharshini/DocuMind-AI`
2. Install dependencies: `pip install -r requirements.txt`
3. Add your Groq API key to a `.env` file
4. Run: `streamlit run app.py`

## 🎯 Future Improvements

- Persist chat history across sessions
- Support multiple PDF uploads
- Highlight source chunks in the PDF
- Support other document types (DOCX, TXT)
- Tune chunk size for better retrieval accuracy
- Add user authentication

---

## 👩‍💻 Author

**Yugadharshini**

- GitHub: [@skrYugadharshini](https://github.com/skrYugadharshini)
- Hugging Face: [@Yugadharshini](https://huggingface.co/Yugadharshini)
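Step 2 (Text Chunking) uses a chunk size of 500 characters with a 50-character overlap. The sketch below shows what those two numbers mean using a plain fixed-window splitter. It is a simplified stand-in, not LangChain's RecursiveCharacterTextSplitter, which also tries to break on paragraph, line, and word boundaries before falling back to raw characters. The function name `split_text` is hypothetical.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified, hypothetical stand-in for LangChain's RecursiveCharacterTextSplitter:
    it cuts at exact character offsets instead of preferring paragraph,
    line, and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new chunk starts 450 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    doc = "".join(chr(ord("a") + i % 26) for i in range(1200))
    chunks = split_text(doc)
    print([len(c) for c in chunks])
    print(chunks[0][-50:] == chunks[1][:50])  # the overlap is shared text
```

With these defaults each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next. A sentence that straddles a boundary is therefore less likely to be cut off from its context.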
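Steps 3 to 5 embed each chunk, store the vectors, and return the 4 closest chunks for a question. The following sketch imitates that flow with an exhaustive (flat) search over unit-length vectors. `toy_embed` is a hypothetical hashed bag-of-words function standing in for all-MiniLM-L6-v2 so the example runs offline; it matches shared words rather than meaning, unlike the real model. `FlatIndex` is likewise an illustrative class, not the FAISS API.

```python
import math
import zlib


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for all-MiniLM-L6-v2 (which outputs 384 dims).

    Hashes each lowercase word into one of `dim` buckets and L2-normalizes
    the counts, so texts that share words get similar vectors.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


class FlatIndex:
    """Illustrative exhaustive nearest-neighbour search, like a flat FAISS index."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, texts: list[str]) -> None:
        for t in texts:
            self.texts.append(t)
            self.vectors.append(toy_embed(t))

    def search(self, query: str, k: int = 4) -> list[str]:
        q = toy_embed(query)  # the query must use the same embedding as the chunks
        # Vectors are unit length, so the dot product equals cosine similarity.
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        # sorted() is stable, so ties keep insertion order.
        best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [self.texts[i] for i in best]


if __name__ == "__main__":
    index = FlatIndex()
    index.add([
        "invoices are due within 30 days",
        "the office is closed on public holidays",
        "late invoices incur a 2 percent fee",
        "parking is available behind the building",
        "employees get 20 days of annual leave",
    ])
    print(index.search("when are invoices due", k=2))
```

Since the non-empty vectors are all unit length, ranking by dot product gives the same order as ranking by Euclidean distance, so an L2-based flat index would return the same chunks. Exhaustive search is simple and exact but compares the query against every chunk; for a single uploaded PDF that cost is usually small.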
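In Step 5 the retrieved chunks and the question are sent to LLaMA 3 together. One plausible way to combine them is a prompt that lists the chunks as numbered context and asks the model to answer only from that context. The template and the `build_rag_prompt` name below are assumptions for illustration, not the app's actual prompt; the resulting string would then be passed to the Groq-hosted model through LangChain.

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 4) -> str:
    """Assemble a grounded prompt from retrieved chunks (hypothetical template).

    Keeps at most `max_chunks` chunks, matching the top-4 retrieval in Step 5.
    """
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_rag_prompt(
        "When are invoices due?",
        ["Invoices are due within 30 days.", "Late invoices incur a 2 percent fee."],
    ))
```

Telling the model to admit when the context lacks an answer is a common way to reduce responses that the document does not support.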