# Advanced RAG Chatbot - User Guide

## What's New?

### 1. Multiple Images & Texts Support in `/index` API

The `/index` endpoint now supports indexing multiple texts and images in a single request (max 10 each).

**Before:**
```python
# Old: Only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}
```

**After:**
```python
import requests

# New: multiple texts and images (max 10 each)
data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10; sent as repeated `texts` form fields
}
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
try:
    response = requests.post('http://localhost:8000/index', data=data, files=files)
finally:
    for _, f in files:
        f.close()  # Close the uploaded file handles
```

**Example with cURL:**
```bash
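# texts (Vietnamese): "Music event in Hanoi", "Takes place on 20/10/2025",
# "Venue: National Convention Center"
curl -X POST "http://localhost:8000/index" \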
  -F "id=event123" \
  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
  -F "texts=Diễn ra vào ngày 20/10/2025" \
  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
  -F "images=@poster1.jpg" \
  -F "images=@poster2.jpg" \
  -F "images=@poster3.jpg"
```
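Both the `requests` example and the cURL command send a `multipart/form-data` body in which every text and every image is its own part, with the field names `texts` and `images` repeated. The sketch below builds such a body with only the standard library and checks the documented 10-item limits on the client before anything is uploaded. `encode_index_form` is a hypothetical helper, not part of the server or of any library.

```python
import mimetypes
import uuid
from pathlib import Path

MAX_ITEMS = 10  # per-request limit for `texts` and `images`, as documented above


def encode_index_form(doc_id, texts, image_paths, boundary=None):
    """Build a multipart/form-data body for POST /index (hypothetical helper).

    Each text and each image becomes its own part with a repeated field
    name (`texts` / `images`), matching what the cURL `-F` flags send.
    """
    if len(texts) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} texts per request, got {len(texts)}")
    if len(image_paths) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} images per request, got {len(image_paths)}")

    boundary = boundary or uuid.uuid4().hex
    parts = []

    def add_part(headers, payload):
        # One part: boundary line, headers, blank line, raw payload bytes.
        head = f"--{boundary}\r\n" + "".join(f"{h}\r\n" for h in headers) + "\r\n"
        parts.append(head.encode("utf-8") + payload + b"\r\n")

    add_part(['Content-Disposition: form-data; name="id"'], doc_id.encode("utf-8"))
    for text in texts:
        # UTF-8 keeps Vietnamese text intact.
        add_part(['Content-Disposition: form-data; name="texts"'], text.encode("utf-8"))
    for path in map(Path, image_paths):
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        add_part(
            [f'Content-Disposition: form-data; name="images"; filename="{path.name}"',
             f"Content-Type: {ctype}"],
            path.read_bytes(),
        )

    # Closing boundary marks the end of the form.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode("ascii")
    return body, f"multipart/form-data; boundary={boundary}"
```

To send the result, passing `urllib.request.Request("http://localhost:8000/index", data=body, headers={"Content-Type": content_type}, method="POST")` to `urllib.request.urlopen` should work. Checking the limits locally avoids a round trip that would only end in the "Tối đa 10 ..." errors listed under Troubleshooting.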

### 2. Advanced RAG Pipeline in `/chat` API

The `/chat` endpoint now runs a multi-stage retrieval pipeline before calling the LLM, aiming to retrieve more relevant context and improve answer quality:

#### Key Improvements:

1. **Query Expansion**: Automatically generates rephrased variants of your question
2. **Multi-Query Retrieval**: Runs a search for each query variant and combines the results
3. **Reranking**: Re-scores the retrieved documents for better relevance
4. **Contextual Compression**: Keeps only the most relevant parts of each document
5. **Better Prompt Engineering**: Uses optimized prompts for the LLM
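To make the first four stages concrete, here is a toy, self-contained sketch of such a pipeline. It is not the server's implementation and rests on several assumptions: bag-of-words vectors stand in for a real embedding model, a hand-written synonym map stands in for LLM-generated query variants, re-scoring against the original query stands in for a dedicated reranker, and sentence filtering stands in for compression. It also assumes retrieval fetches `2 * top_k` candidates and reranking keeps `top_k`, mirroring the `initial_results: 10` / `after_rerank: 5` pair in the sample response further down. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stopword list so that words like "the" do not dominate similarity.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "at", "of", "where", "when", "what", "how"}


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_query(query, synonyms):
    """1. Query expansion: original query plus one variant per synonym swap."""
    variants = [query]
    for word, alternatives in synonyms.items():
        if word in tokenize(query):
            pattern = rf"\b{re.escape(word)}\b"
            variants += [re.sub(pattern, alt, query, flags=re.IGNORECASE) for alt in alternatives]
    return variants


def multi_query_retrieve(variants, docs, limit, score_threshold):
    """2. Multi-query retrieval: search with every variant, keep each doc's best score."""
    best = {}
    for variant in variants:
        q = embed(variant)
        for doc_id, text in docs.items():
            score = cosine(q, embed(text))
            if score >= score_threshold:
                best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def rerank(query, hits, docs, keep):
    """3. Reranking: re-score candidates against the original query only."""
    q = embed(query)
    rescored = [(doc_id, cosine(q, embed(docs[doc_id]))) for doc_id, _ in hits]
    return sorted(rescored, key=lambda kv: kv[1], reverse=True)[:keep]


def compress(variants, text):
    """4. Contextual compression: keep sentences sharing a term with any variant."""
    terms = set().union(*(tokenize(v) for v in variants))
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s for s in sentences if terms & set(tokenize(s)))


def advanced_rag(query, docs, synonyms, top_k=5, score_threshold=0.1):
    # Toy threshold: these bag-of-words scores are not on the server's scale.
    variants = expand_query(query, synonyms)
    hits = multi_query_retrieve(variants, docs, 2 * top_k, score_threshold)
    ranked = rerank(query, hits, docs, top_k)
    context = [(doc_id, compress(variants, docs[doc_id])) for doc_id, _ in ranked]
    stats = {
        "original_query": query,
        "expanded_queries": variants[1:],
        "initial_results": len(hits),
        "after_rerank": len(ranked),
        "after_compression": sum(1 for _, text in context if text),
    }
    return context, stats
```

For example, with the synonym map `{"festival": ["concert"]}`, the query "Where is the festival?" is also searched as "Where is the concert?", so a document that only mentions "concert" can still be retrieved; reranking against the original wording then decides which candidates are kept.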

#### How to Use:

**Basic Usage (advanced RAG is enabled by default):**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Dao có nguy hiểm không?',  # "Are knives dangerous?"
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})

result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics
```

**Advanced Configuration:**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
    'use_rag': True,
    'use_advanced_rag': True,

    # RAG Pipeline Options
    'use_query_expansion': True,    # Expand query with variations
    'use_reranking': True,          # Rerank results
    'use_compression': True,        # Compress context
    'score_threshold': 0.5,         # Min relevance score (0-1)
    'top_k': 5,                     # Number of documents to retrieve

    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})
```

**Disable Advanced RAG (Use Basic):**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})
```

## API Changes Summary

### `/index` Endpoint

**Old Parameters:**
- `id`: str (required)
- `text`: str (required)
- `image`: UploadFile (optional)

**New Parameters:**
- `id`: str (required)
- `texts`: List[str] (optional, max 10)
- `images`: List[UploadFile] (optional, max 10)

**Response:**
```json
{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}
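```

The `message` value is Vietnamese for "Successfully indexed document doc123 with 3 texts and 2 images".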

### `/chat` Endpoint

**New Parameters:**
- `use_advanced_rag`: bool (default: True) - Enable advanced RAG
- `use_query_expansion`: bool (default: True) - Expand query
- `use_reranking`: bool (default: True) - Rerank results
- `use_compression`: bool (default: True) - Compress context
- `score_threshold`: float (default: 0.5) - Min relevance score
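The parameters above can be captured in a small typed client-side model, so that misspelled field names and out-of-range thresholds are caught before a request is sent. This is a hypothetical helper (`ChatRequest` and `to_payload` are not part of the server). It uses the defaults documented above; fields whose defaults this guide does not document are left as `None` and omitted, so the server applies its own.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChatRequest:
    """JSON body for POST /chat, using the field names from this guide."""
    message: str
    # Every example in this guide sends use_rag=True; the server default is not documented.
    use_rag: bool = True
    # Advanced-RAG switches with their documented defaults.
    use_advanced_rag: bool = True
    use_query_expansion: bool = True
    use_reranking: bool = True
    use_compression: bool = True
    score_threshold: float = 0.5
    # Undocumented defaults: left unset so the server decides (assumption).
    top_k: Optional[int] = None
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    hf_token: Optional[str] = None

    def __post_init__(self):
        # Client-side sanity checks based on the documented 0-1 score range.
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")
        if self.top_k is not None and self.top_k < 1:
            raise ValueError("top_k must be a positive integer")

    def to_payload(self):
        """Dict for requests.post(..., json=...), omitting unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Usage would look like `requests.post('http://localhost:8000/chat', json=ChatRequest('Your question', top_k=3, hf_token='hf_xxxxx').to_payload())`.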

**Response (New):**
```json
{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
```

## Complete Examples

### Example 1: Index Multiple Social Media Posts

```python
import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
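        'Festival âm nhạc quốc tế Hà Nội 2025',            # "Hanoi International Music Festival 2025"
        'Ngày 15-17 tháng 11 năm 2025',                    # "November 15-17, 2025"
        'Địa điểm: Công viên Thống Nhất',                  # "Venue: Thống Nhất Park"
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Giá vé từ 500.000đ - 2.000.000đ'                  # "Tickets from 500,000đ to 2,000,000đ"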
    ]
}

files = [
    ('images', open('poster_festival.jpg', 'rb')),
    ('images', open('lineup.jpg', 'rb')),
    ('images', open('venue_map.jpg', 'rb'))
]

response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())
```

### Example 2: Advanced RAG Chat

```python
import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',  # "When and where does the Hanoi music festival take place?"
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})

result = chat_response.json()
print("Answer:", result['response'])
print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
```

## Performance Comparison

| Feature | Basic RAG | Advanced RAG |
|---------|-----------|--------------|
| Query Understanding | Single query | Original query plus generated variants |
| Retrieval Method | Direct vector search | Vector search per query variant, results combined |
| Result Ranking | Similarity score from the DB | Reranked by semantic similarity |
| Context Quality | Full text | Compressed to the most relevant parts |
| Response Accuracy | Good | Typically better |
| Response Time | Faster | Slower (extra expansion, reranking, and compression steps) |

## When to Use What?

**Use Basic RAG when:**
- You need fast response time
- Queries are straightforward
- Context is already well-structured

**Use Advanced RAG when:**
- You need higher accuracy
- Queries are complex or ambiguous
- Context documents are long
- You want better relevance

## Troubleshooting

### Error: "Tối đa 10 texts" (at most 10 texts)
The request contains more than 10 texts. Send at most 10 per request.

### Error: "Tối đa 10 images" (at most 10 images)
The request contains more than 10 images. Send at most 10 per request.

### `rag_stats` shows 0 results
Your `score_threshold` may be too high for your data. The default is 0.5; try lowering it (e.g., to 0.3).
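If you would rather not tune the threshold by hand, one option is to retry with progressively lower values until retrieval returns something. The sketch below is a hypothetical helper; it only assumes the request fields and the `rag_stats.initial_results` counter shown earlier in this guide, and takes the HTTP call as a `send` callable so it can be tested without a server. Each retry is a full `/chat` call (including LLM generation), so keep the list short and the floor high enough that irrelevant context is not pulled in.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Payload = Dict[str, Any]


def chat_with_threshold_backoff(
    send: Callable[[Payload], Payload],
    message: str,
    thresholds: Sequence[float] = (0.5, 0.4, 0.3),
) -> Tuple[float, Payload]:
    """Call /chat with decreasing score_threshold until documents are retrieved.

    `send` takes the JSON payload and returns the parsed response, for example
    lambda p: requests.post(CHAT_URL, json={**p, "hf_token": TOKEN}).json().
    """
    result: Payload = {}
    for threshold in thresholds:
        result = send({
            "message": message,
            "use_rag": True,
            "use_advanced_rag": True,
            "score_threshold": threshold,
        })
        # 0 initial_results means nothing was retrieved at this threshold.
        if result.get("rag_stats", {}).get("initial_results", 0) > 0:
            return threshold, result
    # Nothing found even at the lowest threshold: return the last response.
    return thresholds[-1], result
```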

## Next Steps

To further improve RAG, consider:

1. **Add BM25 Hybrid Search**: Combine dense + sparse retrieval
2. **Use a Cross-Encoder for Reranking**: Typically more accurate than embedding similarity, at a higher compute cost
3. **Implement Query Decomposition**: Break complex queries into sub-queries
4. **Add Citation/Source Tracking**: Show which document each fact comes from
5. **Integrate RAG-Anything**: For advanced multimodal document processing

For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything
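As a starting point for item 1 in the list above, here is a minimal sketch of the sparse half of hybrid search (Okapi BM25 with the common defaults k1 = 1.5, b = 0.75 and the non-negative `log(1 + ...)` IDF variant used by Lucene), plus reciprocal rank fusion (RRF, k = 60) to merge a BM25 ranking with the dense ranking from the vector store. This is a generic technique sketch, not code from this project; in practice a maintained package such as `rank_bm25`, or the vector database's own hybrid search, may be a better fit.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in `docs` (id -> text) for `query`."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc ids: score(d) = sum of 1 / (k + rank of d)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]
```

Given a dense ranking from the vector search, for example a hypothetical `dense_ranking = ["d1", "d3", "d2"]`, calling `reciprocal_rank_fusion([dense_ranking, sorted(scores, key=scores.get, reverse=True)])` returns one merged order. RRF only uses ranks, so the dense and BM25 scores never have to be put on a common scale.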