Spaces:

Yugadharshini
/

DocuMind-AI

Sleeping

App Files Files Community

DocuMind-AI / README.md

skrYugadharshini

Update README with full technical documentation

6c50f63 about 1 month ago

preview code

Raw

History Blame Contribute Delete

3.53 kB

metadata

title: DocuMind-AI
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

📄 DocuMind-AI

An intelligent document assistant powered by RAG, LangChain, Groq LLaMA 3, and FAISS. Upload any PDF and chat with it using state-of-the-art AI — built for real-world enterprise use.

🚀 Features

Upload any PDF
Ask questions in natural language
Get accurate answers powered by LLaMA 3
Chat history support

title: DocuMind-AI emoji: 📄 colorFrom: blue colorTo: purple sdk: docker pinned: false

📄 DocuMind-AI — Intelligent Document Assistant

An enterprise-grade RAG (Retrieval Augmented Generation) chatbot that allows users to upload any PDF and interact with it using natural language. Built with LangChain, Groq LLaMA 3, FAISS, and Streamlit.

🔗 Live Demo: huggingface.co/spaces/Yugadharshini/DocuMind-AI

🔗 GitHub: github.com/skrYugadharshini/DocuMind-AI

🚀 Features

Upload any PDF document
Ask questions in natural language
Semantic search using FAISS vector store
Fast and accurate answers powered by Groq LLaMA 3
Chat history support
Clean and intuitive Streamlit UI
Deployed on Hugging Face Spaces using Docker

🛠️ Tech Stack

Technology	Purpose
Python	Core programming language
LangChain	LLM orchestration framework
Groq LLaMA 3	Large Language Model for answer generation
FAISS	Vector store for semantic search
HuggingFace Sentence Transformers	Text embeddings (all-MiniLM-L6-v2)
Streamlit	Frontend UI
PyPDF	PDF loading and parsing
Docker	Containerization for deployment
Hugging Face Spaces	Cloud deployment

⚙️ Full Technical Process

Step 1 — PDF Loading

User uploads any PDF through the Streamlit UI
PyPDF loads and extracts text from all pages

Step 2 — Text Chunking

Document split into chunks of 500 characters
50 character overlap between chunks to preserve context
Uses LangChain RecursiveCharacterTextSplitter

Step 3 — Vector Embeddings

Each chunk converted to a 384-dimensional vector
Uses HuggingFace sentence-transformers (all-MiniLM-L6-v2)
Captures semantic meaning of text

Step 4 — Vector Store

All vectors stored in FAISS index
Enables fast similarity search across all chunks
Finds most relevant chunks for any question

Step 5 — RAG Chain

User asks a question
Question converted to vector
FAISS retrieves top 4 most relevant chunks
Chunks + question sent to Groq LLaMA 3
LLaMA 3 generates accurate answer based on context

Step 6 — Response

Answer displayed in Streamlit chat UI
Chat history maintained during session

🛠️ Built With

LangChain
Groq (LLaMA 3)
FAISS Vector Store
HuggingFace Embeddings
Streamlit

⚙️ How to Run

Clone the repo
Install dependencies: pip install -r requirements.txt
Add your Groq API key in .env
Run: streamlit run app.py

🎯 Future Improvements

Add chat history memory across sessions
Support multiple PDF uploads
Highlight source chunks in the PDF
Add support for other document types (DOCX, TXT)
Fine-tune chunk size for better accuracy
Add user authentication

👩‍💻 Author

Yugadharshini

GitHub: @skrYugadharshini
Hugging Face: @Yugadharshini