Spaces:
Sleeping
Sleeping
metadata
title: DocuMind-AI
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
π DocuMind-AI
An intelligent document assistant powered by RAG, LangChain, Groq LLaMA 3, and FAISS. Upload any PDF and chat with it using state-of-the-art AI β built for real-world enterprise use.
π Features
- Upload any PDF
- Ask questions in natural language
- Get accurate answers powered by LLaMA 3
- Chat history support
title: DocuMind-AI emoji: π colorFrom: blue colorTo: purple sdk: docker pinned: false
π DocuMind-AI β Intelligent Document Assistant
An enterprise-grade RAG (Retrieval Augmented Generation) chatbot that allows users to upload any PDF and interact with it using natural language. Built with LangChain, Groq LLaMA 3, FAISS, and Streamlit.
π Live Demo: huggingface.co/spaces/Yugadharshini/DocuMind-AI
π GitHub: github.com/skrYugadharshini/DocuMind-AI
π Features
- Upload any PDF document
- Ask questions in natural language
- Semantic search using FAISS vector store
- Fast and accurate answers powered by Groq LLaMA 3
- Chat history support
- Clean and intuitive Streamlit UI
- Deployed on Hugging Face Spaces using Docker
π οΈ Tech Stack
| Technology | Purpose |
|---|---|
| Python | Core programming language |
| LangChain | LLM orchestration framework |
| Groq LLaMA 3 | Large Language Model for answer generation |
| FAISS | Vector store for semantic search |
| HuggingFace Sentence Transformers | Text embeddings (all-MiniLM-L6-v2) |
| Streamlit | Frontend UI |
| PyPDF | PDF loading and parsing |
| Docker | Containerization for deployment |
| Hugging Face Spaces | Cloud deployment |
βοΈ Full Technical Process
Step 1 β PDF Loading
- User uploads any PDF through the Streamlit UI
- PyPDF loads and extracts text from all pages
Step 2 β Text Chunking
- Document split into chunks of 500 characters
- 50 character overlap between chunks to preserve context
- Uses LangChain RecursiveCharacterTextSplitter
Step 3 β Vector Embeddings
- Each chunk converted to a 384-dimensional vector
- Uses HuggingFace sentence-transformers (all-MiniLM-L6-v2)
- Captures semantic meaning of text
Step 4 β Vector Store
- All vectors stored in FAISS index
- Enables fast similarity search across all chunks
- Finds most relevant chunks for any question
Step 5 β RAG Chain
- User asks a question
- Question converted to vector
- FAISS retrieves top 4 most relevant chunks
- Chunks + question sent to Groq LLaMA 3
- LLaMA 3 generates accurate answer based on context
Step 6 β Response
- Answer displayed in Streamlit chat UI
- Chat history maintained during session
π οΈ Built With
- LangChain
- Groq (LLaMA 3)
- FAISS Vector Store
- HuggingFace Embeddings
- Streamlit
βοΈ How to Run
- Clone the repo
- Install dependencies:
pip install -r requirements.txt - Add your Groq API key in
.env - Run:
streamlit run app.py
π― Future Improvements
- Add chat history memory across sessions
- Support multiple PDF uploads
- Highlight source chunks in the PDF
- Add support for other document types (DOCX, TXT)
- Fine-tune chunk size for better accuracy
- Add user authentication
π©βπ» Author
Yugadharshini
- GitHub: @skrYugadharshini
- Hugging Face: @Yugadharshini