Bijay13 committed on
Commit 0cf7776 · 0 Parent(s):

Initial commit: PDF RAG chatbot with LangChain and Groq

Files changed (4)
  1. .gitignore +51 -0
  2. README.md +98 -0
  3. app.py +219 -0
  4. requirements.txt +9 -0
.gitignore ADDED
@@ -0,0 +1,51 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Gradio
+ gradio_queue.db
+ flagged/
+
+ # Vector stores
+ faiss_index/
+ chroma_db/
+
+ # Temporary files
+ *.tmp
+ *.log
+ .DS_Store
+
+ # Environment variables
+ .env
+ .env.local
README.md ADDED
@@ -0,0 +1,98 @@
+ # 📄 PDF RAG Chatbot
+
+ A Retrieval-Augmented Generation (RAG) chatbot that lets you upload a PDF document and chat with an LLM about its content.
+
+ ## Features
+
+ - 📤 **PDF Upload**: Upload any PDF document
+ - 🤖 **AI-Powered Chat**: Ask questions about your PDF content
+ - 🔍 **Semantic Search**: Uses vector embeddings to find relevant information
+ - 💬 **Conversation Memory**: Maintains context throughout the conversation
+ - 🚀 **Fast Processing**: Powered by Groq's LLM API
+
+ ## Tech Stack
+
+ - **UI**: Gradio
+ - **LLM**: Groq (Llama 3.3 70B)
+ - **Framework**: LangChain
+ - **Embeddings**: HuggingFace (all-MiniLM-L6-v2)
+ - **Vector Store**: FAISS
+ - **PDF Processing**: PyPDF
+
+ ## Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone <your-repo-url>
+ cd pdf-rag-chatbot
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Set up your Groq API key:
+ ```bash
+ export GROQ_API_KEY="your-api-key-here"
+ ```
+
+ Get a free API key from the [Groq Console](https://console.groq.com).
+
+ ## Usage
+
+ 1. Run the application:
+ ```bash
+ python app.py
+ ```
+
+ 2. Open your browser to the displayed URL (usually `http://localhost:7860`)
+
+ 3. Upload a PDF file
+
+ 4. Click "Process PDF" and wait for processing to complete
+
+ 5. Start asking questions about the PDF content!
+
+ ## How It Works
+
+ 1. **PDF Processing**: The uploaded PDF is split into chunks of up to 1,000 characters with a 200-character overlap
+ 2. **Embedding**: Each chunk is converted into a vector embedding
+ 3. **Vector Storage**: Embeddings are stored in FAISS for fast retrieval
+ 4. **Query**: When you ask a question, the system retrieves the 3 most similar chunks
+ 5. **Response**: The LLM generates an answer based on the retrieved context
+
+ ## Environment Variables
+
+ - `GROQ_API_KEY`: Your Groq API key (required; if unset, the app asks for the key in the UI)
+
+ ## Deployment
+
+ ### Hugging Face Spaces
+
+ 1. Create a new Space on Hugging Face
+ 2. Upload all files
+ 3. Add `GROQ_API_KEY` to Space secrets
+ 4. Your app will be live!
+
+ ### Local Development
+
+ ```bash
+ python app.py
+ ```
+
+ ## Example Questions
+
+ After uploading a PDF, you can ask questions like:
+ - "What is the main topic of this document?"
+ - "Summarize the key points"
+ - "What does the document say about [specific topic]?"
+ - "Can you explain [concept] from the document?"
+
+ ## License
+
+ MIT License
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
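Review note: the embedding and query steps listed under "How It Works" can be illustrated without any of the project's dependencies. The sketch below is not the code in `app.py`. It assumes a bag-of-words counter as a stand-in for the all-MiniLM-L6-v2 embedding model and a linear scan as a stand-in for the FAISS index, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question (linear scan, no index)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "FAISS stores vectors for fast similarity search.",
    "Gradio builds the web interface.",
    "The chatbot answers questions about the uploaded PDF.",
]
best = top_k("How are vectors stored for search?", chunks, k=1)
print(best[0])
```

In the committed app, the retrieved chunks are then passed to the LLM as context. A dense sentence embedding would also match paraphrases that share no words with the question, which this word-count stand-in cannot do.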
app.py ADDED
@@ -0,0 +1,219 @@
+ import os
+ import gradio as gr
+ from langchain_groq import ChatGroq
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain_community.vectorstores import FAISS
+ from langchain_community.embeddings import HuggingFaceEmbeddings
+ from langchain.chains import ConversationalRetrievalChain
+ from langchain.memory import ConversationBufferMemory
+ from langchain_community.document_loaders import PyPDFLoader
+ import tempfile
+ import shutil
+
+ MODEL_NAME = "llama-3.3-70b-versatile"
+ DEFAULT_API_KEY = os.getenv("GROQ_API_KEY", "")
+
+ # Global variables
+ vectorstore = None
+ conversation_chain = None
+ chat_history = []
+
+ def process_pdf(pdf_file, api_key):
+     """Process uploaded PDF and create vector store"""
+     global vectorstore, conversation_chain, chat_history
+
+     if not api_key:
+         return "Please provide a Groq API key first.", None
+
+     if pdf_file is None:
+         return "Please upload a PDF file.", None
+
+     temp_dir = tempfile.mkdtemp()
+     try:
+         # Save uploaded file temporarily
+         temp_pdf_path = os.path.join(temp_dir, "uploaded.pdf")
+         shutil.copy(pdf_file.name, temp_pdf_path)
+
+         # Load PDF
+         loader = PyPDFLoader(temp_pdf_path)
+         documents = loader.load()
+
+         # Split documents into chunks
+         text_splitter = RecursiveCharacterTextSplitter(
+             chunk_size=1000,
+             chunk_overlap=200,
+             length_function=len
+         )
+         chunks = text_splitter.split_documents(documents)
+
+         # Create embeddings and vector store
+         embeddings = HuggingFaceEmbeddings(
+             model_name="sentence-transformers/all-MiniLM-L6-v2"
+         )
+         vectorstore = FAISS.from_documents(chunks, embeddings)
+
+         # Initialize LLM
+         llm = ChatGroq(
+             groq_api_key=api_key,
+             model_name=MODEL_NAME,
+             temperature=0.7,
+             max_tokens=1024
+         )
+
+         # Create conversation chain
+         memory = ConversationBufferMemory(
+             memory_key="chat_history",
+             return_messages=True,
+             output_key="answer"
+         )
+
+         conversation_chain = ConversationalRetrievalChain.from_llm(
+             llm=llm,
+             retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
+             memory=memory,
+             return_source_documents=True
+         )
+
+         # Reset chat history
+         chat_history = []
+
+         return f"✅ PDF processed successfully! Found {len(chunks)} text chunks. You can now ask questions about the document.", []
+
+     except Exception as e:
+         return f"Error processing PDF: {str(e)}", None
+     finally:
+         # Always remove the temporary copy, even if processing failed
+         shutil.rmtree(temp_dir, ignore_errors=True)
+
+ def chat_with_pdf(message, chat_history_ui, api_key):
+     """Handle chat interactions with the PDF content"""
+     global conversation_chain, chat_history
+
+     if not message.strip():
+         return chat_history_ui, ""
+
+     if conversation_chain is None:
+         chat_history_ui.append({
+             "role": "user",
+             "content": message
+         })
+         chat_history_ui.append({
+             "role": "assistant",
+             "content": "Please upload a PDF file first before asking questions."
+         })
+         return chat_history_ui, ""
+
+     try:
+         # Add user message
+         chat_history_ui.append({
+             "role": "user",
+             "content": message
+         })
+
+         # Get response from RAG chain
+         response = conversation_chain.invoke({"question": message})
+         answer = response["answer"]
+
+         # Add assistant response
+         chat_history_ui.append({
+             "role": "assistant",
+             "content": answer
+         })
+
+         return chat_history_ui, ""
+
+     except Exception as e:
+         chat_history_ui.append({
+             "role": "assistant",
+             "content": f"Error: {str(e)}"
+         })
+         return chat_history_ui, ""
+
+ def reset_chat():
+     """Reset the conversation"""
+     global conversation_chain, vectorstore, chat_history
+     conversation_chain = None
+     vectorstore = None
+     chat_history = []
+     return [], "Ready to upload a new PDF."
+
+ # Build Gradio Interface
+ with gr.Blocks(title="PDF RAG Chatbot") as demo:
+     gr.Markdown("# 📄 PDF RAG Chatbot")
+     gr.Markdown("Upload a PDF and chat with its content using AI")
+     gr.Markdown(f"**Model:** `{MODEL_NAME}`")
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             if not DEFAULT_API_KEY:
+                 api_key_input = gr.Textbox(
+                     label="Groq API Key",
+                     placeholder="Enter your Groq API key here...",
+                     type="password"
+                 )
+             else:
+                 api_key_input = gr.Textbox(
+                     type="password",
+                     value=DEFAULT_API_KEY,
+                     visible=False
+                 )
+
+             pdf_upload = gr.File(
+                 label="Upload PDF",
+                 file_types=[".pdf"],
+                 type="filepath"
+             )
+
+             process_btn = gr.Button("Process PDF", variant="primary")
+             status_text = gr.Textbox(
+                 label="Status",
+                 value="Upload a PDF to get started.",
+                 interactive=False
+             )
+
+             clear_btn = gr.Button("Reset Chat", variant="stop")
+
+         with gr.Column(scale=2):
+             chatbot = gr.Chatbot(height=500, type="messages")
+
+             with gr.Row():
+                 msg = gr.Textbox(
+                     label="Message",
+                     placeholder="Ask a question about the PDF...",
+                     scale=4
+                 )
+                 submit_btn = gr.Button("Send", scale=1)
+
+     if not DEFAULT_API_KEY:
+         gr.Markdown("### Instructions:")
+         gr.Markdown("1. Get a free API key from [Groq Console](https://console.groq.com)")
+         gr.Markdown("2. Enter your API key above")
+         gr.Markdown("3. Upload a PDF file")
+         gr.Markdown("4. Ask questions about the content!")
+
+     # Event handlers
+     process_btn.click(
+         process_pdf,
+         inputs=[pdf_upload, api_key_input],
+         outputs=[status_text, chatbot]
+     )
+
+     msg.submit(
+         chat_with_pdf,
+         inputs=[msg, chatbot, api_key_input],
+         outputs=[chatbot, msg]
+     )
+
+     submit_btn.click(
+         chat_with_pdf,
+         inputs=[msg, chatbot, api_key_input],
+         outputs=[chatbot, msg]
+     )
+
+     clear_btn.click(
+         reset_chat,
+         outputs=[chatbot, status_text]
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
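Review note: the `chunk_size=1000` and `chunk_overlap=200` settings passed to `RecursiveCharacterTextSplitter` can be illustrated with a plain sliding window. This is a simplified sketch under stated assumptions, not the splitter's actual algorithm: the real splitter prefers to break on separators such as paragraph and line breaks, whereas the hypothetical `sliding_chunks` helper below cuts at fixed character offsets.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 2,500 characters of sample text (digits repeated) to show the window sizes.
text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(text)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.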
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ gradio==4.44.0
+ langchain==0.3.7
+ langchain-groq==0.2.1
+ langchain-community==0.3.5
+ pypdf==5.1.0
+ sentence-transformers==3.3.1
+ faiss-cpu==1.9.0
+ transformers==4.46.3
+ torch==2.5.1