daniel-simeone committed
Commit 66c4741 · 1 Parent(s): b2be33e

improve quality

Files changed (7)
  1. DEPLOYMENT.md +63 -20
  2. README.md +75 -20
  3. app.py +84 -64
  4. example_usage.py +1 -1
  5. ingest_documents.py +5 -4
  6. ingestion.py +5 -5
  7. requirements.txt +1 -0
DEPLOYMENT.md CHANGED
@@ -6,6 +6,8 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
 
 1. **Hugging Face Account**: Sign up at https://huggingface.co/join
 2. **Access Token**: Get your token from https://huggingface.co/settings/tokens
+   - Create a token with "Read" permissions
+   - This token is required for the Inference API to work
 
 ## Step-by-Step Deployment
 
@@ -34,13 +36,21 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
    - `pdfs/` folder (if you want to include sample PDFs)
    - `ingest_documents.py` (optional, for manual ingestion)
 
-3. **Important Notes**:
-   - **Vector Store**: The `vector_store/` folder is in `.gitignore` and won't be uploaded. You have two options:
+3. **Set Up HF_TOKEN Secret (REQUIRED)**:
+   - Go to your Space **Settings** → **Secrets**
+   - Click **New secret**
+   - Name: `HF_TOKEN`
+   - Value: Paste your Hugging Face access token
+   - Click **Add secret**
+   - **Important**: Without this token, the chatbot will not work, as it needs the Inference API
+
+4. **Important Notes**:
+   - **Vector Store**: The `data/vector_store/` folder is in `.gitignore` and won't be uploaded. You have two options:
      - **Option A**: Run `ingest_documents.py` on the Space after deployment (via the Space's terminal)
     - **Option B**: Upload the vector store files manually if they're not too large
    - **PDFs**: If your PDFs are large (>50MB), consider hosting them elsewhere or using Hugging Face Datasets
 
-4. **Wait for Build**: Hugging Face will automatically:
+5. **Wait for Build**: Hugging Face will automatically:
    - Install dependencies from `requirements.txt`
    - Start your Gradio app
    - Your Space will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
@@ -90,6 +100,22 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
 
 ## Post-Deployment Setup
 
+### Setting Up HF_TOKEN Secret
+
+**This is REQUIRED for the chatbot to work!**
+
+1. Go to your Space on Hugging Face
+2. Click on **Settings** (gear icon)
+3. Scroll down to the **Secrets** section
+4. Click **New secret**
+5. Enter:
+   - **Name**: `HF_TOKEN`
+   - **Value**: Your Hugging Face access token (from https://huggingface.co/settings/tokens)
+6. Click **Add secret**
+7. The token will be automatically available to your app via `os.environ.get("HF_TOKEN")`
+
+**Note**: The token must have "Read" permissions. The app uses it to access the Inference API for Mistral-7B-Instruct.
+
 ### Setting Up the Vector Store on Hugging Face Spaces
 
 Since the vector store isn't included in the repository, you need to create it on the Space:
@@ -101,9 +127,9 @@ Since the vector store isn't included in the repository, you need to create it o
 
 2. **Option B: Upload Vector Store Files**:
    - If your vector store files are small enough:
-     - Upload `vector_store/index.faiss`
-     - Upload `vector_store/documents.pkl`
-     - Upload `vector_store/embeddings.pkl`
+     - Upload `data/vector_store/index.faiss`
+     - Upload `data/vector_store/documents.pkl`
+     - Upload `data/vector_store/embeddings.pkl`
    - The app will automatically load them on startup
 
 3. **Option C: Pre-build in a Script**:
@@ -118,25 +144,38 @@ Your Space needs these files:
 - ✅ `ingestion.py` - Document ingestion module
 - ✅ `requirements.txt` - Python dependencies
 - ✅ `README.md` - Documentation (optional but recommended)
-- ⚠️ `vector_store/` - Will be created on the Space
+- ⚠️ `data/vector_store/` - Will be created on the Space
 - ⚠️ `pdfs/` - Optional, include if you want sample PDFs
 
 ## Configuration for Hugging Face Spaces
 
-### Update app.py for Spaces
+### Model Configuration
 
-The current `app.py` should work, but you might want to adjust:
+The chatbot now uses **Mistral-7B-Instruct-v0.2** via the Hugging Face Inference API. This means:
+- **No local model loading**: Faster startup, no need for a GPU
+- **Hosted inference**: The model runs on Hugging Face's infrastructure
+- **Requires HF_TOKEN**: Must be set in Space secrets (see above)
 
-1. **Port**: Hugging Face Spaces uses port 7860 automatically
-2. **Server name**: Use `0.0.0.0` (already set)
-3. **Share**: Set to `False` (already set)
+The model is configured in `app.py` and can be changed if needed:
+```python
+chatbot = RAGChatbot(model_name="mistralai/Mistral-7B-Instruct-v0.2")
+```
+
+### App Configuration
+
+The current `app.py` is configured for Spaces:
+
+1. **Port**: Uses `os.environ.get("PORT", 7860)` - Spaces sets this automatically
+2. **Server name**: Uses `0.0.0.0` (required for Spaces)
+3. **Share**: Set to `False` (Spaces provides its own sharing)
 
-Your current launch code is fine:
+The launch code is already configured correctly:
 ```python
+port = int(os.environ.get("PORT", 7860))
 app.launch(
     share=False,
     server_name="0.0.0.0",
-    server_port=7861,  # Note: Spaces uses 7860, but this should auto-adjust
+    server_port=port,
     theme=MinimalistTheme()
 )
 ```
@@ -170,21 +209,25 @@ pinned: false
 
 - Verify the vector store files are in the correct location
 - Check file permissions
-- Ensure the path in `app.py` is correct (should be `./vector_store`)
+- Ensure the path in `app.py` is correct (should be `data/vector_store`)
 
-### Model Loading Issues
+### Inference API Issues
 
-- Large models may take time to download on first run
-- Consider using smaller models for faster startup
-- Check available disk space in your Space
+- **"HF_TOKEN not set" error**: Make sure you've added the `HF_TOKEN` secret in Space settings
+- **API rate limits**: The free tier has rate limits; upgrade if you need more requests
+- **Model access errors**: Verify your token has "Read" permissions
+- **Connection errors**: Check that the Inference API is accessible from your Space
 
 ### Memory Issues
 
 - If you get out-of-memory errors, consider:
-  - Using a smaller embedding model
+  - Using a smaller embedding model (e.g., `all-MiniLM-L6-v2` instead of `all-mpnet-base-v2`)
   - Reducing chunk size in `ingestion.py`
+  - Processing fewer documents at once
   - Upgrading to a Space with more memory
 
+**Note**: Since the model runs via the Inference API, memory issues are less likely than with local model loading.
+
 ## Updating Your Space
 
 After making changes locally:
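The environment handling that DEPLOYMENT.md now relies on (the `HF_TOKEN` secret and the Spaces-provided `PORT`) can be sketched with stdlib-only code. `get_hf_token` is a hypothetical helper name used for illustration, not a function in the repository:

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    # On Spaces, the HF_TOKEN secret is injected as an environment variable;
    # locally it comes from the shell (export HF_TOKEN=...) instead.
    token = os.environ.get("HF_TOKEN")
    if not token:
        print("Warning: HF_TOKEN not set. Inference API calls may fail.")
    return token


# Spaces sets PORT; fall back to 7860 for local runs.
port = int(os.environ.get("PORT", 7860))
```

This mirrors the pattern the diff adds to `app.py`: read the secret once at startup, warn loudly if it is missing, and let the platform choose the port.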
README.md CHANGED
@@ -129,28 +129,65 @@ pinned: false
 
 ## Configuration
 
-### Changing the Chatbot Model
+### Setting Up Hugging Face Token (Required)
 
-Edit `app.py` and change the `model_name` parameter in `RAGChatbot`:
+The chatbot uses the Hugging Face Inference API to access high-quality models. You need to set up an API token:
+
+1. **Get your token:**
+   - Go to https://huggingface.co/settings/tokens
+   - Create a new token with "Read" permissions
+   - Copy the token
+
+2. **For local development:**
+   - Set an environment variable: `export HF_TOKEN=your_token_here` (Linux/Mac)
+   - Or: `set HF_TOKEN=your_token_here` (Windows)
+   - Or create a `.env` file with `HF_TOKEN=your_token_here`
+
+3. **For Hugging Face Spaces:**
+   - Go to your Space → Settings → Secrets
+   - Add a new secret: Name = `HF_TOKEN`, Value = your token
+   - The app will automatically use this token
+
+### Chatbot Model
+
+The chatbot uses **Mistral-7B-Instruct-v0.2** via the Hugging Face Inference API. This is an instruction-tuned model that provides high-quality, coherent answers.
+
+To change the model, edit `app.py`:
 
 ```python
-chatbot = RAGChatbot(model_name="your-preferred-model")
+chatbot = RAGChatbot(model_name="mistralai/Mistral-7B-Instruct-v0.2")
 ```
 
-Popular options:
-- `microsoft/DialoGPT-medium` (default)
-- `gpt2`
-- `facebook/blenderbot-400M-distill`
-- `microsoft/DialoGPT-large`
+Other recommended instruction-tuned models:
+- `HuggingFaceH4/zephyr-7b-beta` - Excellent for chat
+- `meta-llama/Meta-Llama-3-8B-Instruct` - High quality (requires access)
+- `microsoft/phi-2` - Smaller, faster option
 
-### Changing the Embedding Model
+### Embedding Model
 
-Edit the `embedding_model` parameter:
+The default embedding model is **all-mpnet-base-v2**, which provides high-quality embeddings for better retrieval.
+
+To change the embedding model, edit both `app.py` and `ingest_documents.py`:
 
 ```python
-chatbot = RAGChatbot(embedding_model="sentence-transformers/all-mpnet-base-v2")
+# In app.py
+chatbot = RAGChatbot(embedding_model="all-mpnet-base-v2")
+
+# In ingest_documents.py
+ingestion = DocumentIngestion(embedding_model="all-mpnet-base-v2")
 ```
 
+**Note:** If you change the embedding model, you must re-run `ingest_documents.py` to rebuild the vector store.
+
+### Ingestion Parameters
+
+The ingestion system uses optimized parameters:
+- **Chunk size**: 600 characters (for precise retrieval)
+- **Chunk overlap**: 150 characters (to avoid cutting sentences)
+- **Retrieval count**: 5 chunks (for comprehensive context)
+
+These parameters are set in `ingestion.py` and can be adjusted if needed.
+
 ## Project Structure
 
 ```
@@ -162,18 +199,30 @@ chatbot = RAGChatbot(embedding_model="sentence-transformers/all-mpnet-base-v2")
 ├── README.md            # This file
 ├── pdfs/                # Folder for PDF files (add your PDFs here)
 │   └── README.md
-└── vector_store/        # Saved vector store (created after ingestion)
-    ├── index.faiss
-    ├── documents.pkl
-    └── embeddings.pkl
+└── data/
+    └── vector_store/    # Saved vector store (created after ingestion)
+        ├── index.faiss
+        ├── documents.pkl
+        └── embeddings.pkl
 ```
 
+## How It Works
+
+The chatbot uses **Retrieval-Augmented Generation (RAG)**:
+
+1. **Document Ingestion**: PDFs and URLs are processed into chunks and embedded using sentence transformers
+2. **Vector Search**: When you ask a question, the system searches for the most relevant document chunks
+3. **Answer Generation**: The retrieved context is sent to Mistral-7B-Instruct via the Inference API, which synthesizes a coherent answer based on the context
+
+This approach combines the accuracy of document retrieval with the natural language capabilities of a large language model.
+
 ## Limitations
 
 - Vector store is stored locally (not persistent on Hugging Face Spaces by default)
 - Large documents may take time to process
 - Some URLs may be blocked or require authentication
-- GPU recommended for better performance with larger models
+- Requires HF_TOKEN for Inference API access (free tier available)
+- If you change the embedding model or chunk parameters, you must re-run ingestion
 
 ## Troubleshooting
 
@@ -186,10 +235,16 @@ chatbot = RAGChatbot(embedding_model="sentence-transformers/all-mpnet-base-v2")
 - Some websites block automated requests
 - Try different URLs or use PDF uploads instead
 
-### Model Loading Issues
-- Ensure you have sufficient disk space
-- Check your internet connection for model downloads
-- Some models require GPU - check model requirements
+### Inference API Issues
+- Verify your `HF_TOKEN` is set correctly
+- Check that the token has "Read" permissions
+- Ensure you have API access (free tier available)
+- If you get rate limit errors, you may need to upgrade your Hugging Face account
+
+### Ingestion Issues
+- If you changed the embedding model or chunk parameters, re-run `ingest_documents.py`
+- Ensure you have enough disk space for the vector store
+- Large documents may take time to process
 
 ## License
 
app.py CHANGED
@@ -6,9 +6,8 @@ from gradio.themes.base import Base
 from gradio.themes.utils import colors, fonts, sizes
 import os
 from typing import List, Tuple
-from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+from huggingface_hub import InferenceClient
 from ingestion import DocumentIngestion
-import torch
 
 
 # Create a clean minimalist theme
@@ -80,36 +79,38 @@ class RAGChatbot:
 
     def __init__(
         self,
-        model_name: str = "microsoft/DialoGPT-medium",
-        embedding_model: str = "all-MiniLM-L6-v2",
+        model_name: str = "mistralai/Mistral-7B-Instruct-v0.2",
+        embedding_model: str = "all-mpnet-base-v2",
         vector_store_path: str = "data/vector_store"
     ):
         """
         Initialize the RAG chatbot.
 
         Args:
-            model_name: Hugging Face model name for the chatbot
+            model_name: Hugging Face model name for the chatbot (via Inference API)
             embedding_model: Model for document embeddings
             vector_store_path: Path to saved vector store
         """
         self.model_name = model_name
-        self.device = "cuda" if torch.cuda.is_available() else "cpu"
 
-        # Load chatbot model
-        print(f"Loading chatbot model: {model_name}")
+        # Initialize Inference API client
+        hf_token = os.environ.get("HF_TOKEN")
+        if not hf_token:
+            print("Warning: HF_TOKEN not set. Inference API calls may fail.")
+            print("Set HF_TOKEN environment variable or add it to Space secrets.")
+        else:
+            print("HF_TOKEN found. Inference API ready.")
+
+        print(f"Initializing Inference API client for model: {model_name}")
         try:
-            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-            self.model = AutoModelForCausalLM.from_pretrained(model_name)
-            self.tokenizer.pad_token = self.tokenizer.eos_token
-        except Exception as e:
-            print(f"Warning: Could not load {model_name}. Using a simpler pipeline.")
-            self.model = None
-            self.tokenizer = None
-            self.chatbot_pipeline = pipeline(
-                "text-generation",
-                model="gpt2",
-                device=0 if self.device == "cuda" else -1
+            self.inference_client = InferenceClient(
+                model=model_name,
+                token=hf_token
             )
+            print("Inference API client initialized successfully")
+        except Exception as e:
+            print(f"Error initializing Inference API client: {e}")
+            self.inference_client = None
 
         # Initialize document ingestion
         self.ingestion = DocumentIngestion(embedding_model=embedding_model)
@@ -126,9 +127,9 @@ class RAGChatbot:
 
         self.chat_history = []
 
-    def generate_response(self, query: str, use_rag: bool = True, num_results: int = 3) -> str:
+    def generate_response(self, query: str, use_rag: bool = True, num_results: int = 5) -> str:
         """
-        Generate a response to the user query.
+        Generate a response to the user query using RAG and the Inference API.
 
         Args:
             query: User's question
@@ -138,61 +139,80 @@ class RAGChatbot:
         Returns:
             Generated response
         """
-        # If RAG is enabled and we have a vector store, return relevant context
+        if self.inference_client is None:
+            return "Error: Inference API client not initialized. Please check HF_TOKEN configuration."
+
+        # If RAG is enabled and we have a vector store, retrieve context and generate an answer
         if use_rag and self.ingestion.index is not None:
             try:
                 results = self.ingestion.search(query, k=num_results)
                 if results:
-                    # Format the response with relevant context
-                    response_parts = []
-                    response_parts.append(f"Based on the documents, here's what I found regarding your question: '{query}'\n\n")
-
+                    # Build context from retrieved chunks
+                    context_parts = []
                     for i, result in enumerate(results, 1):
-                        source = result['metadata']['source']
-                        text = result['text']
-                        # Clean up the text
-                        text = text.strip()
+                        text = result['text'].strip()
                         if text:
-                            response_parts.append(f"**Relevant information {i}** (from {source}):\n{text}\n")
+                            context_parts.append(f"[Context {i}]\n{text}")
+
+                    context = "\n\n".join(context_parts)
+
+                    # Build instruction-tuned prompt
+                    prompt = f"""You are a helpful assistant. Answer the user's question based ONLY on the provided context. If the context doesn't contain enough information to answer the question, say so clearly.
+
+Context:
+{context}
+
+Question: {query}
+
+Answer:"""
 
-                    response = "\n".join(response_parts)
-                    return response
+                    # Generate response using the Inference API
+                    try:
+                        response = self.inference_client.text_generation(
+                            prompt,
+                            max_new_tokens=512,
+                            temperature=0.7,
+                            top_p=0.9,
+                            return_full_text=False
+                        )
+                        return response.strip()
+                    except Exception as api_error:
+                        print(f"Error calling Inference API: {api_error}")
+                        # Fallback: return formatted chunks with a note
+                        response_parts = []
+                        response_parts.append("I retrieved relevant information, but couldn't generate a synthesized answer. Here are the relevant chunks:\n\n")
+                        for i, result in enumerate(results, 1):
+                            source = result['metadata']['source']
+                            text = result['text'].strip()
+                            if text:
+                                response_parts.append(f"**Relevant information {i}** (from {source}):\n{text}\n")
+                        return "\n".join(response_parts)
+                else:
+                    # No results found
+                    return "I couldn't find any relevant information in the documents to answer your question. Please try rephrasing or check if the documents contain information about this topic."
             except Exception as e:
                 print(f"Error in RAG retrieval: {e}")
                 return f"I encountered an error while searching the documents: {str(e)}"
 
-        # If no RAG or no results, try to generate a response using the model
-        # But DialoGPT isn't great for this, so we'll keep it simple
-        if self.model and self.tokenizer:
-            # Simple generation without complex prompts
-            inputs = self.tokenizer.encode(query, return_tensors="pt")
-            inputs = inputs.to(self.device)
-
-            with torch.no_grad():
-                outputs = self.model.generate(
-                    inputs,
-                    max_new_tokens=100,
-                    num_return_sequences=1,
-                    temperature=0.7,
-                    do_sample=True,
-                    pad_token_id=self.tokenizer.eos_token_id,
-                    eos_token_id=self.tokenizer.eos_token_id,
-                )
-
-            # Decode only new tokens
-            input_length = inputs.shape[1]
-            generated_tokens = outputs[0][input_length:]
-            response = self.tokenizer.decode(generated_tokens, skip_special_tokens=True)
-
-            # Clean up
-            response = response.replace("<|endoftext|>", "").strip()
-
-            if not response or len(response.strip()) < 3:
-                return "I understand your question, but I don't have relevant information in my knowledge base. Please enable RAG to search the documents."
-
-            return response
-        else:
-            return "I understand your question, but I don't have relevant information in my knowledge base. Please enable RAG to search the documents."
+        # If no RAG or no vector store, generate a response without context
+        try:
+            prompt = f"""You are a helpful assistant. Answer the following question concisely.
+
+Question: {query}
+
+Answer:"""
+
+            response = self.inference_client.text_generation(
+                prompt,
+                max_new_tokens=256,
+                temperature=0.7,
+                top_p=0.9,
+                return_full_text=False
+            )
+            return response.strip()
+        except Exception as e:
+            print(f"Error generating response: {e}")
+            return f"I encountered an error while generating a response: {str(e)}. Please check your HF_TOKEN configuration."
 
     def chat(self, message: str, history, use_rag: bool):
         """
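The prompt assembly added to `generate_response` above can be isolated as a small pure function for clarity. `build_rag_prompt` is a hypothetical name for this sketch, not a function in the diff:

```python
def build_rag_prompt(query: str, chunk_texts: list) -> str:
    # Label each retrieved chunk so the model can ground its answer,
    # mirroring the [Context i] markers used in app.py; empty chunks are skipped.
    context_parts = [
        f"[Context {i}]\n{text.strip()}"
        for i, text in enumerate(chunk_texts, 1)
        if text.strip()
    ]
    context = "\n\n".join(context_parts)
    return (
        "You are a helpful assistant. Answer the user's question based ONLY "
        "on the provided context. If the context doesn't contain enough "
        "information to answer the question, say so clearly.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
    )


prompt = build_rag_prompt("What is RAG?", ["RAG combines retrieval with generation.", "   "])
```

Ending the prompt with `Answer:` nudges instruction-tuned models like Mistral-7B-Instruct to complete with the answer itself rather than restating the question.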
example_usage.py CHANGED
@@ -32,7 +32,7 @@ def main():
     ingestion.build_vector_store()
 
     # Save vector store
-    ingestion.save("./vector_store")
+    ingestion.save("data/vector_store")
 
     # Example search
     query = "What is artificial intelligence?"
ingest_documents.py CHANGED
@@ -11,7 +11,8 @@ from ingestion import DocumentIngestion
 PDF_FOLDER = "data/pdfs"  # Folder containing PDF files
 URLS = [
     # Add your URLs here, one per line
-    "https://www.ontario.ca/page/organic-crop-and-livestock-production-ontario"
+    "https://inspection.canada.ca/en/food-labels/organic-products/operating-manual"
+
 ]
 
 
@@ -23,7 +24,7 @@ def main():
 
     # Initialize ingestion system
     print("\nInitializing document ingestion system...")
-    ingestion = DocumentIngestion(embedding_model="all-MiniLM-L6-v2")
+    ingestion = DocumentIngestion(embedding_model="all-mpnet-base-v2")
 
     # Collect PDF files
     pdf_paths = []
@@ -67,13 +68,13 @@ def main():
 
         # Save vector store
         print("\nSaving vector store...")
-        ingestion.save("./vector_store")
+        ingestion.save("data/vector_store")
 
         print("\n" + "=" * 60)
         print("[SUCCESS] Ingestion complete!")
         print("=" * 60)
         print(f"\nTotal document chunks: {len(documents)}")
-        print(f"Vector store saved to: ./vector_store")
+        print(f"Vector store saved to: data/vector_store")
         print("\nYou can now run 'py app.py' to start the chatbot.")
 
     except Exception as e:
ingestion.py CHANGED
@@ -17,7 +17,7 @@ import pickle
 class DocumentIngestion:
     """Handles ingestion of PDFs and URLs into a searchable vector store."""
 
-    def __init__(self, embedding_model: str = "all-MiniLM-L6-v2"):
+    def __init__(self, embedding_model: str = "all-mpnet-base-v2"):
         """
         Initialize the document ingestion system.
 
@@ -26,8 +26,8 @@ class DocumentIngestion:
         """
         self.embedding_model = SentenceTransformer(embedding_model)
         self.text_splitter = RecursiveCharacterTextSplitter(
-            chunk_size=1000,
-            chunk_overlap=200,
+            chunk_size=600,
+            chunk_overlap=150,
             length_function=len,
         )
         self.documents = []
@@ -200,7 +200,7 @@ class DocumentIngestion:
 
         return results
 
-    def save(self, directory: str = "./vector_store"):
+    def save(self, directory: str = "data/vector_store"):
         """Save the vector store to disk."""
         os.makedirs(directory, exist_ok=True)
 
@@ -216,7 +216,7 @@ class DocumentIngestion:
 
         print(f"Vector store saved to {directory}")
 
-    def load(self, directory: str = "./vector_store"):
+    def load(self, directory: str = "data/vector_store"):
         """Load the vector store from disk."""
         # Load index
         self.index = faiss.read_index(os.path.join(directory, "index.faiss"))
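The effect of the new `chunk_size=600` / `chunk_overlap=150` settings can be approximated with a stdlib-only splitter. Note this is a naive fixed-stride sketch; the real code uses LangChain's `RecursiveCharacterTextSplitter`, which also prefers paragraph and sentence boundaries:

```python
def split_with_overlap(text: str, chunk_size: int = 600, chunk_overlap: int = 150) -> list:
    # Each chunk starts chunk_size - chunk_overlap characters after the
    # previous one, so neighbouring chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks


# 1500 characters of varied text split into three overlapping 600-char chunks.
text = "".join(chr(65 + i % 26) for i in range(1500))
chunks = split_with_overlap(text)
```

Smaller chunks make each retrieved hit more precise, while the 150-character overlap reduces the chance that a sentence straddling a chunk boundary is lost to both neighbours.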
requirements.txt CHANGED
@@ -10,3 +10,4 @@ requests>=2.31.0
 faiss-cpu>=1.7.4
 numpy>=1.24.0
 accelerate>=0.25.0
+huggingface_hub>=0.20.0