# Advanced RAG Chatbot - User Guide

## What's New?

### 1. Multiple Images & Texts Support in `/index` API

The `/index` endpoint now supports indexing multiple texts and images in a single request (max 10 each).

**Before:**
```python
# Old: Only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}
```

**After:**
```python
# New: multiple texts and images (max 10 each)
import requests

data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
}
# Repeat the 'images' field name once per uploaded file
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)
```

**Example with cURL:**
```bash
curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Music event in Hanoi" \
  -F "texts=Takes place on 20/10/2025" \
  -F "texts=Venue: National Convention Center" \
  -F "images=@poster1.jpg" \
  -F "images=@poster2.jpg" \
  -F "images=@poster3.jpg"
```

### 2. Advanced RAG Pipeline in `/chat` API

The chat endpoint now runs a multi-stage RAG pipeline to improve response quality:

#### Key Improvements:

1. **Query Expansion**: Automatically generates variations of your question
2. **Multi-Query Retrieval**: Searches the index with every query variant
3. **Reranking**: Re-scores the retrieved results for relevance
4. **Contextual Compression**: Keeps only the most relevant parts of each result
5. **Better Prompt Engineering**: Uses prompts optimized for the LLM
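
To make the first four stages concrete, here is a toy, self-contained sketch of how such a pipeline can fit together. It is an illustration under loud assumptions, not the service's implementation: every function name is hypothetical, the "embedding" is a plain bag-of-words count, and the query expansion is a pair of hard-coded rewrites where a real system would typically use a model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query):
    """1. Query expansion: hard-coded rewrites for illustration only."""
    return [query, f"information about {query}", f"details: {query}"]


def multi_query_retrieve(queries, docs, top_k):
    """2. Multi-query retrieval: score per variant, keep each doc's best."""
    best = {}
    for q in queries:
        q_vec = embed(q)
        for doc_id, text in docs.items():
            score = cosine(q_vec, embed(text))
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def rerank(query, candidates, docs, score_threshold):
    """3. Reranking: re-score against the original query, drop weak hits."""
    q_vec = embed(query)
    scored = [(d, cosine(q_vec, embed(docs[d]))) for d, _ in candidates]
    kept = [(d, s) for d, s in scored if s >= score_threshold]
    return sorted(kept, key=lambda kv: kv[1], reverse=True)


def compress(query, text, max_sentences=1):
    """4. Contextual compression: keep the sentences closest to the query."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    q_vec = embed(query)
    sentences.sort(key=lambda s: cosine(q_vec, embed(s)), reverse=True)
    return " ".join(sentences[:max_sentences])


def advanced_rag_context(query, docs, top_k=5, score_threshold=0.5):
    """Run stages 1-4 and return (doc_id, compressed_text) pairs."""
    candidates = multi_query_retrieve(expand_query(query), docs, top_k)
    reranked = rerank(query, candidates, docs, score_threshold)
    return [(d, compress(query, docs[d])) for d, _ in reranked]
```

Here `docs` is a plain `{id: text}` dictionary. Bag-of-words scores run low, so a threshold around 0.3 is more practical for experimenting with this toy version than the 0.5 default.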

#### How to Use:

**Basic Usage (Auto-enabled):**
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Are knives dangerous?',
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})

result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics
```

**Advanced Configuration:**
```python
response = requests.post('http://localhost:8000/chat', json={
    'message': 'How do I create a new event?',
    'use_rag': True,
    'use_advanced_rag': True,

    # RAG Pipeline Options
    'use_query_expansion': True,    # Expand query with variations
    'use_reranking': True,          # Rerank results
    'use_compression': True,        # Compress context
    'score_threshold': 0.5,         # Min relevance score (0-1)
    'top_k': 5,                     # Number of documents to retrieve

    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})
```

**Disable Advanced RAG (Use Basic):**
```python
response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})
```

## API Changes Summary

### `/index` Endpoint

**Old Parameters:**
- `id`: str (required)
- `text`: str (required)
- `image`: UploadFile (optional)

**New Parameters:**
- `id`: str (required)
- `texts`: List[str] (optional, max 10)
- `images`: List[UploadFile] (optional, max 10)

**Response:**
```json
{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}
```

The `message` field is returned in Vietnamese. The one shown here reads "Successfully indexed document doc123 with 3 texts and 2 images".

### `/chat` Endpoint

**New Parameters:**
- `use_advanced_rag`: bool (default: True) - Enable advanced RAG
- `use_query_expansion`: bool (default: True) - Expand query
- `use_reranking`: bool (default: True) - Rerank results
- `use_compression`: bool (default: True) - Compress context
- `score_threshold`: float (default: 0.5) - Min relevance score
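
If you build many requests, it can help to keep these flags in one place. The sketch below is a hypothetical client-side helper (`ChatOptions` is not part of the API). It mirrors the defaults listed above and rejects a `score_threshold` outside the 0-1 range. The default for `use_rag` is not documented, so it is set explicitly here, as in the examples.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChatOptions:
    """Hypothetical client-side mirror of the documented /chat flags."""

    use_rag: bool = True  # assumption: no default is documented for this flag
    use_advanced_rag: bool = True
    use_query_expansion: bool = True
    use_reranking: bool = True
    use_compression: bool = True
    score_threshold: float = 0.5

    def __post_init__(self):
        # The guide describes score_threshold as a relevance score in 0-1.
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")

    def to_payload(self, message, **extra):
        """Build the JSON body; `extra` carries fields such as top_k."""
        return {"message": message, **asdict(self), **extra}
```

For example, `ChatOptions(use_compression=False).to_payload('Your question', top_k=3)` produces a dictionary that can be passed as `json=` to `requests.post`.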

**Response (New):**
```json
{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
```
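
The `rag_stats` object shows how many results survive each stage. As a small sketch (the helper name `summarize_rag_stats` is hypothetical, and it assumes the three stage fields are counts, as in the sample response), the numbers can be turned into a readable summary:

```python
def summarize_rag_stats(stats):
    """Return human-readable lines describing each pipeline stage."""
    retrieved = stats["initial_results"]
    reranked = stats["after_rerank"]
    compressed = stats["after_compression"]
    return [
        f"query variants: {len(stats['expanded_queries'])}",
        f"retrieved: {retrieved}",
        f"kept after rerank: {reranked} (dropped {retrieved - reranked})",
        f"kept after compression: {compressed} (dropped {reranked - compressed})",
    ]
```

A large drop at the rerank stage usually points at `score_threshold`, which is the first setting worth adjusting.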

## Complete Examples

### Example 1: Index Multiple Social Media Posts

```python
import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
        'Hanoi International Music Festival 2025',
        'November 15-17, 2025',
        'Venue: Thống Nhất Park',
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Tickets from 500,000 VND to 2,000,000 VND'
    ]
}

files = [
    ('images', open('poster_festival.jpg', 'rb')),
    ('images', open('lineup.jpg', 'rb')),
    ('images', open('venue_map.jpg', 'rb'))
]

response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())
```

### Example 2: Advanced RAG Chat

```python
import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    'message': 'When and where does the Hanoi music festival take place?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})

result = chat_response.json()
print("Answer:", result['response'])
print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
```

## Performance Comparison

| Feature | Basic RAG | Advanced RAG |
|---------|-----------|--------------|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Vector search over every query variant |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better, especially for complex or ambiguous queries |
| Response Time | Faster | Slower, because of the extra expansion, reranking, and compression steps |
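
The table gives no timings, so the response-time difference is best measured on your own data. Below is a rough sketch with hypothetical helper names: `send` is any callable that takes the JSON payload and performs the request, for example `lambda p: requests.post('http://localhost:8000/chat', json=p)`.

```python
import time


def time_call(send, payload, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        send(payload)
        best = min(best, time.perf_counter() - start)
    return best


def compare_modes(send, message, repeats=3):
    """Time the same message with basic and advanced RAG."""
    base = {"message": message, "use_rag": True}
    return {
        "basic": time_call(send, {**base, "use_advanced_rag": False}, repeats),
        "advanced": time_call(send, {**base, "use_advanced_rag": True}, repeats),
    }
```

Taking the best of a few repeats reduces noise from warm-up and network jitter. It is still a coarse measurement, not a benchmark.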

## When to Use What?

**Use Basic RAG when:**
- You need fast response time
- Queries are straightforward
- Context is already well-structured

**Use Advanced RAG when:**
- You need higher accuracy
- Queries are complex or ambiguous
- Context documents are long
- You want better relevance

## Troubleshooting

### Error: "Tối đa 10 texts"
This message means "Maximum 10 texts": the request contains more than 10 texts. Send at most 10 texts per request.

### Error: "Tối đa 10 images"
This message means "Maximum 10 images": the request contains more than 10 images. Send at most 10 images per request.
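
If a document really has more than 10 texts, one workaround is to split them across several requests. The sketch below only plans the payloads. This guide does not say whether repeated requests with the same `id` are merged or overwritten, so the sketch gives each chunk its own id using a hypothetical `_partN` suffix.

```python
def batched(items, size=10):
    """Split a sequence into consecutive chunks of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_index_requests(doc_id, texts, size=10):
    """Return one form payload per chunk, each with its own suffixed id."""
    return [
        {"id": f"{doc_id}_part{n}", "texts": chunk}
        for n, chunk in enumerate(batched(texts, size), start=1)
    ]
```

Each returned dictionary can be sent as the `data=` argument of a separate `/index` request.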

### RAG stats show 0 results
Your `score_threshold` may be too high, so every retrieved document is filtered out. Try lowering it (e.g., 0.3-0.5).
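
Lowering the threshold can also be automated. The following sketch retries the same request with progressively lower thresholds until some context comes back. `chat_with_fallback` is a hypothetical helper, and it assumes that an empty `context_used` list is what "0 results" looks like in the response.

```python
def chat_with_fallback(send, payload, thresholds=(0.5, 0.4, 0.3)):
    """Try each threshold in order until the response carries some context.

    `send` is any callable that takes the JSON payload and returns the
    decoded response dict, e.g.
    lambda p: requests.post('http://localhost:8000/chat', json=p).json()
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    result = None
    for threshold in thresholds:
        result = send({**payload, "score_threshold": threshold})
        if result.get("context_used"):  # non-empty: something was retrieved
            return result, threshold
    # Nothing matched even at the lowest threshold; return the last response.
    return result, thresholds[-1]
```

The function returns both the response and the threshold that was used, so you can log which value finally produced results.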

## Next Steps

To further improve RAG, consider:

1. **Add BM25 Hybrid Search**: Combine dense + sparse retrieval
2. **Use Cross-Encoder for Reranking**: Better than embedding similarity
3. **Implement Query Decomposition**: Break complex queries into sub-queries
4. **Add Citation/Source Tracking**: Show which document each fact comes from
5. **Integrate RAG-Anything**: For advanced multimodal document processing

For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything