daniel-simeone committed · Commit 66c4741 · Parent(s): b2be33e

improve quality

Files changed:
- DEPLOYMENT.md (+63 -20)
- README.md (+75 -20)
- app.py (+84 -64)
- example_usage.py (+1 -1)
- ingest_documents.py (+5 -4)
- ingestion.py (+5 -5)
- requirements.txt (+1 -0)
DEPLOYMENT.md
CHANGED

````diff
@@ -6,6 +6,8 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
 
 1. **Hugging Face Account**: Sign up at https://huggingface.co/join
 2. **Access Token**: Get your token from https://huggingface.co/settings/tokens
+   - Create a token with "Read" permissions
+   - This token is required for the Inference API to work
 
 ## Step-by-Step Deployment
 
@@ -34,13 +36,21 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
 - `pdfs/` folder (if you want to include sample PDFs)
 - `ingest_documents.py` (optional, for manual ingestion)
 
+3. **Set Up HF_TOKEN Secret (REQUIRED)**:
+   - Go to your Space → **Settings** → **Secrets**
+   - Click **New secret**
+   - Name: `HF_TOKEN`
+   - Value: Paste your Hugging Face access token
+   - Click **Add secret**
+   - **Important**: Without this token, the chatbot will not work as it needs the Inference API
+
+4. **Important Notes**:
+   - **Vector Store**: The `data/vector_store/` folder is in `.gitignore` and won't be uploaded. You have two options:
     - **Option A**: Run `ingest_documents.py` on the Space after deployment (via the Space's terminal)
     - **Option B**: Upload the vector store files manually if they're not too large
   - **PDFs**: If your PDFs are large (>50MB), consider hosting them elsewhere or using Hugging Face Datasets
 
+5. **Wait for Build**: Hugging Face will automatically:
   - Install dependencies from `requirements.txt`
   - Start your Gradio app
   - Your Space will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
@@ -90,6 +100,22 @@ This guide will walk you through deploying your RAG chatbot to Hugging Face Spac
 
 ## Post-Deployment Setup
 
+### Setting Up HF_TOKEN Secret
+
+**This is REQUIRED for the chatbot to work!**
+
+1. Go to your Space on Hugging Face
+2. Click on **Settings** (gear icon)
+3. Scroll down to **Secrets** section
+4. Click **New secret**
+5. Enter:
+   - **Name**: `HF_TOKEN`
+   - **Value**: Your Hugging Face access token (from https://huggingface.co/settings/tokens)
+6. Click **Add secret**
+7. The token will be automatically available to your app via `os.environ.get("HF_TOKEN")`
+
+**Note**: The token must have "Read" permissions. The app uses it to access the Inference API for Mistral-7B-Instruct.
+
 ### Setting Up the Vector Store on Hugging Face Spaces
 
 Since the vector store isn't included in the repository, you need to create it on the Space:
@@ -101,9 +127,9 @@ Since the vector store isn't included in the repository, you need to create it o
 
 2. **Option B: Upload Vector Store Files**:
    - If your vector store files are small enough:
+     - Upload `data/vector_store/index.faiss`
+     - Upload `data/vector_store/documents.pkl`
+     - Upload `data/vector_store/embeddings.pkl`
    - The app will automatically load them on startup
 
 3. **Option C: Pre-build in a Script**:
@@ -118,25 +144,38 @@ Your Space needs these files:
 - ✅ `ingestion.py` - Document ingestion module
 - ✅ `requirements.txt` - Python dependencies
 - ✅ `README.md` - Documentation (optional but recommended)
-- ⚠️ `vector_store/` - Will be created on the Space
+- ⚠️ `data/vector_store/` - Will be created on the Space
 - ⚠️ `pdfs/` - Optional, include if you want sample PDFs
 
 ## Configuration for Hugging Face Spaces
 
+### Model Configuration
+
+The chatbot now uses **Mistral-7B-Instruct-v0.2** via Hugging Face Inference API. This means:
+- **No local model loading**: Faster startup, no need for GPU
+- **Hosted inference**: The model runs on Hugging Face's infrastructure
+- **Requires HF_TOKEN**: Must be set in Space secrets (see above)
+
+The model is configured in `app.py` and can be changed if needed:
+```python
+chatbot = RAGChatbot(model_name="mistralai/Mistral-7B-Instruct-v0.2")
+```
+
+### App Configuration
+
+The current `app.py` is configured for Spaces:
+
+1. **Port**: Uses `os.environ.get("PORT", 7860)` - Spaces automatically sets this
+2. **Server name**: Uses `0.0.0.0` (required for Spaces)
+3. **Share**: Set to `False` (Spaces provides its own sharing)
+
+The launch code is already configured correctly:
 ```python
+port = int(os.environ.get("PORT", 7860))
 app.launch(
     share=False,
     server_name="0.0.0.0",
+    server_port=port,
     theme=MinimalistTheme()
 )
 ```
@@ -170,21 +209,25 @@ pinned: false
 
 - Verify the vector store files are in the correct location
 - Check file permissions
+- Ensure the path in `app.py` is correct (should be `data/vector_store`)
 
+### Inference API Issues
 
+- **"HF_TOKEN not set" error**: Make sure you've added the `HF_TOKEN` secret in Space settings
+- **API rate limits**: Free tier has rate limits; upgrade if you need more requests
+- **Model access errors**: Verify your token has "Read" permissions
+- **Connection errors**: Check that the Inference API is accessible from your Space
 
 ### Memory Issues
 
 - If you get out-of-memory errors, consider:
-  - Using a smaller embedding model
+  - Using a smaller embedding model (e.g., `all-MiniLM-L6-v2` instead of `all-mpnet-base-v2`)
   - Reducing chunk size in `ingestion.py`
+  - Processing fewer documents at once
   - Upgrading to a Space with more memory
 
+**Note**: Since the model runs via Inference API, memory issues are less likely than with local model loading.
+
 ## Updating Your Space
 
 After making changes locally:
````
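The launch settings the deployment guide walks through reduce to a few lines of logic. A minimal sketch, standard library only; `resolve_launch_kwargs` is a hypothetical helper name (the commit inlines these values directly in `app.py`):

```python
import os

def resolve_launch_kwargs(environ) -> dict:
    """Build the kwargs passed to Gradio's app.launch() on Spaces.

    In app.py this would be called as resolve_launch_kwargs(os.environ);
    plain dicts work too, which makes the logic easy to test.
    """
    port = int(environ.get("PORT", 7860))  # Spaces injects PORT; 7860 is Gradio's default
    return {
        "share": False,            # Spaces serves the app at its own public URL
        "server_name": "0.0.0.0",  # bind all interfaces so the Space proxy can reach the app
        "server_port": port,
    }

print(resolve_launch_kwargs({}))                # falls back to port 7860
print(resolve_launch_kwargs({"PORT": "8080"}))  # the Spaces-provided port wins
```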
README.md
CHANGED

````diff
@@ -129,28 +129,65 @@ pinned: false
 
 ## Configuration
 
+### Setting Up Hugging Face Token (Required)
+
+The chatbot uses Hugging Face Inference API to access high-quality models. You need to set up an API token:
+
+1. **Get your token:**
+   - Go to https://huggingface.co/settings/tokens
+   - Create a new token with "Read" permissions
+   - Copy the token
+
+2. **For local development:**
+   - Set environment variable: `export HF_TOKEN=your_token_here` (Linux/Mac)
+   - Or: `set HF_TOKEN=your_token_here` (Windows)
+   - Or create a `.env` file with `HF_TOKEN=your_token_here`
+
+3. **For Hugging Face Spaces:**
+   - Go to your Space → Settings → Secrets
+   - Add a new secret: Name = `HF_TOKEN`, Value = your token
+   - The app will automatically use this token
+
+### Chatbot Model
+
+The chatbot uses **Mistral-7B-Instruct-v0.2** via Hugging Face Inference API. This is an instruction-tuned model that provides high-quality, coherent answers.
+
+To change the model, edit `app.py`:
 
 ```python
+chatbot = RAGChatbot(model_name="mistralai/Mistral-7B-Instruct-v0.2")
 ```
 
+Other recommended instruction-tuned models:
+- `HuggingFaceH4/zephyr-7b-beta` - Excellent for chat
+- `meta-llama/Meta-Llama-3-8B-Instruct` - High quality (requires access)
+- `microsoft/phi-2` - Smaller, faster option
+
+### Embedding Model
+
+The default embedding model is **all-mpnet-base-v2**, which provides high-quality embeddings for better retrieval.
+
+To change the embedding model, edit both `app.py` and `ingest_documents.py`:
 
 ```python
+# In app.py
+chatbot = RAGChatbot(embedding_model="all-mpnet-base-v2")
+
+# In ingest_documents.py
+ingestion = DocumentIngestion(embedding_model="all-mpnet-base-v2")
 ```
 
+**Note:** If you change the embedding model, you must re-run `ingest_documents.py` to rebuild the vector store.
+
+### Ingestion Parameters
+
+The ingestion system uses optimized parameters:
+- **Chunk size**: 600 characters (for precise retrieval)
+- **Chunk overlap**: 150 characters (to avoid cutting sentences)
+- **Retrieval count**: 5 chunks (for comprehensive context)
+
+These parameters are set in `ingestion.py` and can be adjusted if needed.
+
 ## Project Structure
 
 ```
@@ -162,18 +199,30 @@ chatbot = RAGChatbot(embedding_model="sentence-transformers/all-mpnet-base-v2")
 ├── README.md          # This file
 ├── pdfs/              # Folder for PDF files (add your PDFs here)
 │   └── README.md
+└── data/
+    └── vector_store/  # Saved vector store (created after ingestion)
+        ├── index.faiss
+        ├── documents.pkl
+        └── embeddings.pkl
 ```
 
+## How It Works
+
+The chatbot uses **Retrieval-Augmented Generation (RAG)**:
+
+1. **Document Ingestion**: PDFs and URLs are processed into chunks and embedded using sentence transformers
+2. **Vector Search**: When you ask a question, the system searches for the most relevant document chunks
+3. **Answer Generation**: The retrieved context is sent to Mistral-7B-Instruct via Inference API, which synthesizes a coherent answer based on the context
+
+This approach combines the accuracy of document retrieval with the natural language capabilities of a large language model.
+
 ## Limitations
 
 - Vector store is stored locally (not persistent on Hugging Face Spaces by default)
 - Large documents may take time to process
 - Some URLs may be blocked or require authentication
+- Requires HF_TOKEN for Inference API access (free tier available)
+- If you change embedding model or chunk parameters, you must re-run ingestion
 
 ## Troubleshooting
 
@@ -186,10 +235,16 @@ chatbot = RAGChatbot(embedding_model="sentence-transformers/all-mpnet-base-v2")
 - Some websites block automated requests
 - Try different URLs or use PDF uploads instead
 
+### Inference API Issues
+- Verify your `HF_TOKEN` is set correctly
+- Check that the token has "Read" permissions
+- Ensure you have API access (free tier available)
+- If you get rate limit errors, you may need to upgrade your Hugging Face account
+
+### Ingestion Issues
+- If you changed embedding model or chunk parameters, re-run `ingest_documents.py`
+- Ensure you have enough disk space for the vector store
+- Large documents may take time to process
 
 ## License
````
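The 600/150 chunking described in the README can be illustrated with a plain sliding window. A simplified sketch: the project actually uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries; `chunk_text` here only shows how size and overlap interact:

```python
def chunk_text(text: str, chunk_size: int = 600, chunk_overlap: int = 150):
    """Naive sliding-window chunker using the commit's 600/150 parameters.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("x" * 1000)
print(len(chunks))                           # 2
print(chunks[0][-150:] == chunks[1][:150])   # True: 150 shared characters
```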
app.py
CHANGED

```diff
@@ -6,9 +6,8 @@ from gradio.themes.base import Base
 from gradio.themes.utils import colors, fonts, sizes
 import os
 from typing import List, Tuple
+from huggingface_hub import InferenceClient
 from ingestion import DocumentIngestion
-import torch
 
 
 # Create a clean minimalist theme
@@ -80,36 +79,38 @@ class RAGChatbot:
 
     def __init__(
         self,
+        model_name: str = "mistralai/Mistral-7B-Instruct-v0.2",
+        embedding_model: str = "all-mpnet-base-v2",
         vector_store_path: str = "data/vector_store"
     ):
         """
         Initialize the RAG chatbot.
 
         Args:
+            model_name: Hugging Face model name for the chatbot (via Inference API)
             embedding_model: Model for document embeddings
             vector_store_path: Path to saved vector store
         """
         self.model_name = model_name
-        self.device = "cuda" if torch.cuda.is_available() else "cpu"
 
+        # Initialize Inference API client
+        hf_token = os.environ.get("HF_TOKEN")
+        if not hf_token:
+            print("Warning: HF_TOKEN not set. Inference API calls may fail.")
+            print("Set HF_TOKEN environment variable or add it to Space secrets.")
+        else:
+            print("HF_TOKEN found. Inference API ready.")
+
+        print(f"Initializing Inference API client for model: {model_name}")
         try:
+            self.inference_client = InferenceClient(
+                model=model_name,
+                token=hf_token
             )
-        except Exception as e:
-            print(f"Warning: Could not load {model_name}. Using a simpler pipeline.")
-            self.model = None
-            self.tokenizer = None
-            self.chatbot_pipeline = pipeline(
-                "text-generation",
-                model="gpt2",
-                device=0 if self.device == "cuda" else -1
-            )
+            print("Inference API client initialized successfully")
+        except Exception as e:
+            print(f"Error initializing Inference API client: {e}")
+            self.inference_client = None
 
         # Initialize document ingestion
         self.ingestion = DocumentIngestion(embedding_model=embedding_model)
@@ -126,9 +127,9 @@ class RAGChatbot:
 
         self.chat_history = []
 
+    def generate_response(self, query: str, use_rag: bool = True, num_results: int = 5) -> str:
         """
-        Generate a response to the user query.
+        Generate a response to the user query using RAG and Inference API.
 
         Args:
             query: User's question
@@ -138,61 +139,80 @@
         Returns:
             Generated response
         """
+        if self.inference_client is None:
+            return "Error: Inference API client not initialized. Please check HF_TOKEN configuration."
+
+        # If RAG is enabled and we have a vector store, retrieve context and generate answer
         if use_rag and self.ingestion.index is not None:
             try:
                 results = self.ingestion.search(query, k=num_results)
                 if results:
-                    response_parts.append(f"Based on the documents, here's what I found regarding your question: '{query}'\n\n")
+                    # Build context from retrieved chunks
+                    context_parts = []
                     for i, result in enumerate(results, 1):
-                        text = result['text']
-                        # Clean up the text
-                        text = text.strip()
+                        text = result['text'].strip()
                         if text:
+                            context_parts.append(f"[Context {i}]\n{text}")
+
+                    context = "\n\n".join(context_parts)
+
+                    # Build instruction-tuned prompt
+                    prompt = f"""You are a helpful assistant. Answer the user's question based ONLY on the provided context. If the context doesn't contain enough information to answer the question, say so clearly.
+
+Context:
+{context}
+
+Question: {query}
+
+Answer:"""
+
+                    # Generate response using Inference API
+                    try:
+                        response = self.inference_client.text_generation(
+                            prompt,
+                            max_new_tokens=512,
+                            temperature=0.7,
+                            top_p=0.9,
+                            return_full_text=False
+                        )
+                        return response.strip()
+                    except Exception as api_error:
+                        print(f"Error calling Inference API: {api_error}")
+                        # Fallback: return formatted chunks with note
+                        response_parts = []
+                        response_parts.append("I retrieved relevant information, but couldn't generate a synthesized answer. Here are the relevant chunks:\n\n")
+                        for i, result in enumerate(results, 1):
+                            source = result['metadata']['source']
+                            text = result['text'].strip()
+                            if text:
+                                response_parts.append(f"**Relevant information {i}** (from {source}):\n{text}\n")
+                        return "\n".join(response_parts)
+                else:
+                    # No results found
+                    return "I couldn't find any relevant information in the documents to answer your question. Please try rephrasing or check if the documents contain information about this topic."
            except Exception as e:
                 print(f"Error in RAG retrieval: {e}")
                 return f"I encountered an error while searching the documents: {str(e)}"
 
-            with torch.no_grad():
-                outputs = self.model.generate(
-                    inputs,
-                    max_new_tokens=100,
-                    num_return_sequences=1,
-                    temperature=0.7,
-                    do_sample=True,
-                    pad_token_id=self.tokenizer.eos_token_id,
-                    eos_token_id=self.tokenizer.eos_token_id,
-                )
-
-            # Decode only new tokens
-            input_length = inputs.shape[1]
-            generated_tokens = outputs[0][input_length:]
-            response = self.tokenizer.decode(generated_tokens, skip_special_tokens=True)
-
-            # Clean up
-            response = response.replace("<|endoftext|>", "").strip()
-
-            if not response or len(response.strip()) < 3:
-                return "I understand your question, but I don't have relevant information in my knowledge base. Please enable RAG to search the documents."
+        # If no RAG or no vector store, generate response without context
+        try:
+            prompt = f"""You are a helpful assistant. Answer the following question concisely.
+
+Question: {query}
+
+Answer:"""
+
+            response = self.inference_client.text_generation(
+                prompt,
+                max_new_tokens=256,
+                temperature=0.7,
+                top_p=0.9,
+                return_full_text=False
+            )
+            return response.strip()
+        except Exception as e:
+            print(f"Error generating response: {e}")
+            return f"I encountered an error while generating a response: {str(e)}. Please check your HF_TOKEN configuration."
 
     def chat(self, message: str, history, use_rag: bool):
         """
```
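The context-assembly and prompt format that `generate_response` inlines can be isolated into a pure function for testing. `build_rag_prompt` is a hypothetical helper name (the commit keeps this logic inline); the prompt wording mirrors the diff:

```python
def build_rag_prompt(query: str, results: list) -> str:
    """Assemble the instruction prompt from retrieved chunks.

    `results` is a list of dicts with a 'text' key, matching what
    DocumentIngestion.search() returns; empty chunks are skipped.
    """
    context_parts = []
    for i, result in enumerate(results, 1):
        text = result["text"].strip()
        if text:  # skip whitespace-only chunks
            context_parts.append(f"[Context {i}]\n{text}")
    context = "\n\n".join(context_parts)
    return (
        "You are a helpful assistant. Answer the user's question based ONLY on "
        "the provided context. If the context doesn't contain enough information "
        "to answer the question, say so clearly.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "What is required for organic labelling?",
    [{"text": "Organic claims must be certified."}, {"text": "   "}],
)
print(prompt)
```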
example_usage.py
CHANGED

```diff
@@ -32,7 +32,7 @@ def main():
     ingestion.build_vector_store()
 
     # Save vector store
+    ingestion.save("data/vector_store")
 
     # Example search
     query = "What is artificial intelligence?"
```
ingest_documents.py
CHANGED

```diff
@@ -11,7 +11,8 @@ from ingestion import DocumentIngestion
 PDF_FOLDER = "data/pdfs"  # Folder containing PDF files
 URLS = [
     # Add your URLs here, one per line
+    "https://inspection.canada.ca/en/food-labels/organic-products/operating-manual"
+
 ]
 
 
@@ -23,7 +24,7 @@ def main():
 
     # Initialize ingestion system
     print("\nInitializing document ingestion system...")
+    ingestion = DocumentIngestion(embedding_model="all-mpnet-base-v2")
 
     # Collect PDF files
     pdf_paths = []
@@ -67,13 +68,13 @@ def main():
 
     # Save vector store
     print("\nSaving vector store...")
+    ingestion.save("data/vector_store")
 
     print("\n" + "=" * 60)
     print("[SUCCESS] Ingestion complete!")
     print("=" * 60)
     print(f"\nTotal document chunks: {len(documents)}")
+    print(f"Vector store saved to: data/vector_store")
     print("\nYou can now run 'py app.py' to start the chatbot.")
 
 except Exception as e:
```
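The script's PDF-collection step ("Collect PDF files" / `pdf_paths`) can be sketched as a small function. `collect_pdf_paths` is a hypothetical name and the exact matching logic is an assumption; the diff only shows that paths are gathered from `PDF_FOLDER`:

```python
from pathlib import Path

def collect_pdf_paths(folder: str = "data/pdfs") -> list:
    """Gather *.pdf files from the configured folder, sorted for
    deterministic ingestion order. Returns [] if the folder is missing,
    in which case only URLs would be ingested."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.glob("*.pdf"))
```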
ingestion.py
CHANGED

```diff
@@ -17,7 +17,7 @@ import pickle
 class DocumentIngestion:
     """Handles ingestion of PDFs and URLs into a searchable vector store."""
 
+    def __init__(self, embedding_model: str = "all-mpnet-base-v2"):
         """
         Initialize the document ingestion system.
 
@@ -26,8 +26,8 @@ class DocumentIngestion:
         """
         self.embedding_model = SentenceTransformer(embedding_model)
         self.text_splitter = RecursiveCharacterTextSplitter(
+            chunk_size=600,
+            chunk_overlap=150,
             length_function=len,
         )
         self.documents = []
@@ -200,7 +200,7 @@ class DocumentIngestion:
 
         return results
 
+    def save(self, directory: str = "data/vector_store"):
         """Save the vector store to disk."""
         os.makedirs(directory, exist_ok=True)
@@ -216,7 +216,7 @@ class DocumentIngestion:
 
         print(f"Vector store saved to {directory}")
 
+    def load(self, directory: str = "data/vector_store"):
         """Load the vector store from disk."""
         # Load index
         self.index = faiss.read_index(os.path.join(directory, "index.faiss"))
```
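The `search` method above delegates nearest-neighbour lookup to the FAISS index. As a rough sketch of what that lookup computes, assuming an inner-product index such as `IndexFlatIP` (the index type isn't visible in this diff), here is a brute-force NumPy equivalent:

```python
import numpy as np

def search_dense(query_vec, doc_vecs, k=5):
    """Brute-force version of a FAISS inner-product search: score every
    chunk embedding against the query and return the k best indices."""
    scores = doc_vecs @ query_vec      # one similarity score per chunk
    top = np.argsort(-scores)[:k]      # indices of the k highest scores
    return top.tolist(), scores[top].tolist()

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, scores = search_dense(np.array([1.0, 0.0]), docs, k=2)
print(idx)  # [0, 2]
```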
requirements.txt
CHANGED

```diff
@@ -10,3 +10,4 @@ requests>=2.31.0
 faiss-cpu>=1.7.4
 numpy>=1.24.0
 accelerate>=0.25.0
+huggingface_hub>=0.20.0
```