# Advanced RAG Chatbot - User Guide
## What's New?
### 1. Multiple Images & Texts Support in `/index` API
The `/index` endpoint now supports indexing multiple texts and images in a single request (max 10 each).
**Before:**
```python
# Old: only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}
```
**After:**
```python
import requests

# New: multiple texts and images (max 10 each)
data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
}
# Repeat the 'images' field name once per file
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)
```
**Example with cURL:**
```bash
# Sample texts (Vietnamese): "Music event in Hanoi",
# "Takes place on 20/10/2025", "Venue: National Convention Center"
curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
  -F "texts=Diễn ra vào ngày 20/10/2025" \
  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
  -F "images=@poster1.jpg" \
  -F "images=@poster2.jpg" \
  -F "images=@poster3.jpg"
```
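Because the endpoint rejects requests with more than 10 texts or 10 images, it can help to check the limits on the client before uploading. The following is a minimal sketch; `build_index_fields` and `MAX_ITEMS` are hypothetical helper names that are not part of the API.

```python
from typing import Dict, List, Sequence, Tuple

MAX_ITEMS = 10  # per-request limit stated in this guide for texts and for images


def build_index_fields(
    doc_id: str,
    texts: Sequence[str] = (),
    image_paths: Sequence[str] = (),
) -> Tuple[Dict[str, object], List[str]]:
    """Check the documented limits and return (form data, image paths).

    Hypothetical client-side helper: it only validates counts and never
    opens files or contacts the server.
    """
    if len(texts) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} texts per request, got {len(texts)}")
    if len(image_paths) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} images per request, got {len(image_paths)}")
    data = {"id": doc_id, "texts": list(texts)}
    return data, list(image_paths)
```

The returned `data` can be passed to `requests.post(..., data=data, files=...)`, with one `('images', open(path, 'rb'))` tuple per returned path, as in the Python example above.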
### 2. Advanced RAG Pipeline in `/chat` API
The `/chat` endpoint now runs a multi-stage RAG pipeline to improve response quality:
#### Key Improvements:
1. **Query Expansion**: Automatically expands your question with variations
2. **Multi-Query Retrieval**: Searches with multiple query variants
3. **Reranking**: Re-scores results for better relevance
4. **Contextual Compression**: Keeps only the most relevant parts
5. **Better Prompt Engineering**: Optimized prompts for the LLM
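To make stages 1 to 4 concrete, here is a toy, dependency-free sketch of how they can fit together. It is not the server's implementation: the token-overlap score stands in for embedding similarity, the query variants are supplied by the caller, and every function name is hypothetical. Stage 5 (prompt construction) is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def token_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that also occur in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, corpus: Dict[str, str], top_k: int) -> List[str]:
    """Return up to top_k document ids with a non-zero score, best first."""
    scored = [(token_overlap(query, text), doc_id) for doc_id, text in corpus.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def advanced_rag(
    query: str,
    corpus: Dict[str, str],
    variants: Sequence[str],
    top_k: int = 5,
    score_threshold: float = 0.5,
) -> Tuple[Dict[str, str], Dict[str, object]]:
    # 1-2. Query expansion + multi-query retrieval, de-duplicating document ids.
    candidates: List[str] = []
    for q in [query, *variants]:
        for doc_id in retrieve(q, corpus, top_k):
            if doc_id not in candidates:
                candidates.append(doc_id)

    # 3. Reranking: re-score every candidate against the original query.
    scores = {d: token_overlap(query, corpus[d]) for d in candidates}
    reranked = sorted(candidates, key=lambda d: (-scores[d], d))
    reranked = [d for d in reranked if scores[d] >= score_threshold][:top_k]

    # 4. Contextual compression: keep only sentences sharing a token with the query.
    context: Dict[str, str] = {}
    for d in reranked:
        sentences = [s.strip() for s in corpus[d].split(".") if s.strip()]
        context[d] = ". ".join(s for s in sentences if token_overlap(query, s) > 0)

    stats = {
        "original_query": query,
        "expanded_queries": list(variants),
        "initial_results": len(candidates),
        "after_rerank": len(reranked),
        "after_compression": len(context),
    }
    return context, stats
```

The `stats` dictionary mirrors the field names of the `rag_stats` object returned by `/chat`, which makes it easier to see where documents are added (multi-query retrieval) and where they are dropped (reranking with a score threshold).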
#### How to Use:
**Basic Usage (Auto-enabled):**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Dao có nguy hiểm không?',  # Vietnamese, roughly "Are knives dangerous?"
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})
result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics
```
**Advanced Configuration:**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
    'use_rag': True,
    'use_advanced_rag': True,
    # RAG Pipeline Options
    'use_query_expansion': True,  # Expand query with variations
    'use_reranking': True,        # Rerank results
    'use_compression': True,      # Compress context
    'score_threshold': 0.5,       # Min relevance score (0-1)
    'top_k': 5,                   # Number of documents to retrieve
    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})
```
**Disable Advanced RAG (Use Basic):**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})
```
## API Changes Summary
### `/index` Endpoint
**Old Parameters:**
- `id`: str (required)
- `text`: str (required)
- `image`: UploadFile (optional)
**New Parameters:**
- `id`: str (required)
- `texts`: List[str] (optional, max 10)
- `images`: List[UploadFile] (optional, max 10)
**Response:**
```json
{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}
```
The `message` field is returned in Vietnamese; here it reads "Successfully indexed document doc123 with 3 texts and 2 images".
### `/chat` Endpoint
**New Parameters:**
- `use_advanced_rag`: bool (default: True) - Enable advanced RAG
- `use_query_expansion`: bool (default: True) - Expand query
- `use_reranking`: bool (default: True) - Rerank results
- `use_compression`: bool (default: True) - Compress context
- `score_threshold`: float (default: 0.5) - Min relevance score
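The parameters above can be captured in a small client-side model so that defaults live in one place. This is a sketch only: `ChatRequest` is a hypothetical helper, not a class exposed by the service. The defaults for the five new parameters are the ones listed above, `use_rag=True` simply mirrors the examples in this guide, and fields whose server defaults are not documented here are left as `None` and omitted from the body.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ChatRequest:
    """Hypothetical client-side model of the `/chat` JSON body."""

    message: str
    use_rag: bool = True  # value used in this guide's examples; server default not stated
    use_advanced_rag: bool = True
    use_query_expansion: bool = True
    use_reranking: bool = True
    use_compression: bool = True
    score_threshold: float = 0.5
    top_k: Optional[int] = None  # server default not documented here
    hf_token: Optional[str] = None

    def to_json(self) -> Dict[str, Any]:
        """Return the request body, dropping fields that were left unset."""
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")
        return {key: value for key, value in asdict(self).items() if value is not None}
```

A body built with `ChatRequest(message='Your question', top_k=3).to_json()` can be passed directly as the `json=` argument of `requests.post`.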
**Response (New):**
```json
{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
```
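The `rag_stats` object is easiest to read as a funnel. The sketch below assumes each count is the number of documents remaining after that stage, which the field names suggest but this guide does not state explicitly; `summarize_rag_stats` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_rag_stats(stats: Dict[str, Any]) -> Dict[str, int]:
    """Report how many documents each pipeline stage removed.

    Assumes the counts in `rag_stats` are documents remaining after each stage.
    Missing fields fall back to the previous stage's count.
    """
    initial = stats.get("initial_results", 0)
    reranked = stats.get("after_rerank", initial)
    compressed = stats.get("after_compression", reranked)
    return {
        "query_variants": len(stats.get("expanded_queries", [])),
        "dropped_by_rerank": initial - reranked,
        "dropped_by_compression": reranked - compressed,
        "final_documents": compressed,
    }
```

For the sample response above, this reports two query variants, five documents dropped at the reranking stage, and five documents passed to the LLM.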
## Complete Examples
### Example 1: Index Multiple Social Media Posts
```python
import requests
from contextlib import ExitStack

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
        'Festival âm nhạc quốc tế Hà Nội 2025',  # "Hanoi International Music Festival 2025"
        'Ngày 15-17 tháng 11 năm 2025',          # "15-17 November 2025"
        'Địa điểm: Công viên Thống Nhất',        # "Venue: Thống Nhất Park"
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Giá vé từ 500.000đ - 2.000.000đ'        # "Tickets from 500,000đ to 2,000,000đ"
    ]
}
image_paths = ['poster_festival.jpg', 'lineup.jpg', 'venue_map.jpg']

# ExitStack closes every image file once the upload has finished
with ExitStack() as stack:
    files = [('images', stack.enter_context(open(path, 'rb'))) for path in image_paths]
    response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())
```
### Example 2: Advanced RAG Chat
```python
import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    # "When and where does the Hanoi music festival take place?"
    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})
result = chat_response.json()

print("Answer:", result['response'])
print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
```
## Performance Comparison
| Feature | Basic RAG | Advanced RAG |
|---------|-----------|--------------|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Multi-query vector search |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better, especially on complex or ambiguous queries |
| Response Time | Faster | Slightly slower (extra expansion, reranking, and compression steps) |
## When to Use What?
**Use Basic RAG when:**
- You need fast response time
- Queries are straightforward
- Context is already well-structured
**Use Advanced RAG when:**
- You need higher accuracy
- Queries are complex or ambiguous
- Context documents are long
- You want better relevance
## Troubleshooting
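### Error: "Tối đa 10 texts"
The message means "Maximum 10 texts": the request contains more than 10 `texts` values. Send at most 10 per `/index` request.
### Error: "Tối đa 10 images"
The message means "Maximum 10 images": the request contains more than 10 `images` files. Send at most 10 per `/index` request.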
### RAG stats show 0 results
Your `score_threshold` might be too high. Try lowering it (e.g., 0.3-0.5).
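One way to apply this advice automatically is to retry with progressively lower thresholds until some context comes back. The sketch below is an assumption-laden illustration: `chat_with_threshold_fallback` is a hypothetical helper, the `send` callable is injected so the logic can be exercised without a running server, and an empty `context_used` list is treated as "0 results".

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def chat_with_threshold_fallback(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    message: str,
    thresholds: Sequence[float] = (0.5, 0.4, 0.3),
) -> Tuple[Dict[str, Any], Optional[float]]:
    """Try each threshold in order and stop at the first one that returns context.

    Returns (last response, threshold that worked), or (last response, None)
    if every threshold produced an empty `context_used`.
    """
    result: Dict[str, Any] = {}
    for threshold in thresholds:
        result = send({
            "message": message,
            "use_rag": True,
            "score_threshold": threshold,
        })
        if result.get("context_used"):
            return result, threshold
    return result, None
```

With `requests`, `send` could be `lambda body: requests.post('http://localhost:8000/chat', json=body).json()`. Each retry is a full chat call, so keep the threshold list short.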
## Next Steps
To further improve RAG, consider:
1. **Add BM25 Hybrid Search**: Combine dense + sparse retrieval
2. **Use Cross-Encoder for Reranking**: Better than embedding similarity
3. **Implement Query Decomposition**: Break complex queries into sub-queries
4. **Add Citation/Source Tracking**: Show which document each fact comes from
5. **Integrate RAG-Anything**: For advanced multimodal document processing
For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything
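
For item 1, dense and sparse result lists need to be merged into a single ranking. Reciprocal rank fusion (RRF) is one common way to do this; it is named here as an illustrative option, not as something the current pipeline implements. The function name is hypothetical, and `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1; ties are broken by document id.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF uses only ranks, the vector-search scores and BM25 scores never have to be normalized onto a common scale.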