# Quick Start: RAG API

A low-latency API endpoint for querying product design documents, with a target response time of under 3 seconds per query.

## Deploy the API

```bash
# Deploy to Modal
modal deploy src/rag/rag_api.py

# Get the API URL
modal app show insurance-rag-api
```

## Use the API

### Python Client

```python
from src.rag.api_client import RAGAPIClient

# Initialize client
client = RAGAPIClient(base_url="https://your-api-url.modal.run")

# Query
result = client.query("What are the three product tiers?")
print(result['answer'])
print(f"Response time: {result['total_time']:.2f}s")
```
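If you can't import `RAGAPIClient` (for example, when calling the API from another project), the sketch below sends the same request using only the standard library. It assumes that `POST /query` accepts a JSON body of the form `{"question": ...}`, as the cURL and JavaScript examples in this section do, and that it returns JSON containing `answer` and `total_time`. The helper names `build_query_request` and `query` are hypothetical and are not part of the repository's client.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST {base_url}/query request with a JSON body {"question": ...}."""
    url = base_url.rstrip("/") + "/query"  # tolerate a trailing slash on the base URL
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(base_url: str, question: str, timeout: float = 10.0) -> dict:
    """Send the question and decode the JSON response (assumed to include 'answer')."""
    req = build_query_request(base_url, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage mirrors the client above: `result = query("https://your-api-url.modal.run", "What are the three product tiers?")`, then read `result["answer"]`.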

### cURL

```bash
curl -X POST https://your-api-url.modal.run/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the three product tiers?"}'
```

### JavaScript

```javascript
const response = await fetch('https://your-api-url.modal.run/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'What are the three product tiers?' })
});

const data = await response.json();
console.log(data.answer);
```

## Test Performance

```bash
# Test with default URL
python tests/test_api.py

# Test with custom URL
python tests/test_api.py --url https://your-api-url.modal.run
```
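The contents of `tests/test_api.py` aren't shown here. As a rough idea of how per-query latency can be measured, the sketch below times any callable that takes a question string, such as `client.query`. The function name `time_queries` and the `repeats` parameter are hypothetical.

```python
import time
from typing import Callable, Iterable, List


def time_queries(query_fn: Callable[[str], object],
                 questions: Iterable[str],
                 repeats: int = 3) -> List[float]:
    """Call query_fn once per question per repeat; return wall-clock latencies in seconds."""
    questions = list(questions)  # materialize so generators can be reused across repeats
    latencies: List[float] = []
    for _ in range(repeats):
        for question in questions:
            start = time.perf_counter()
            query_fn(question)  # the response is ignored; only elapsed time is recorded
            latencies.append(time.perf_counter() - start)
    return latencies
```

For example, `time_queries(client.query, ["What are the three product tiers?"])` returns three latencies. The first call may be slower if the container is cold, so consider discarding it when you compare results against the target.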

## Performance Target

- **Target**: <3 seconds per query
- **Typical**: 1.5-2.5 seconds
- **Optimizations**: Warm containers, a reduced generation token limit, and limited retrieved context
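A single fast query doesn't show that the API meets the target. One way to check a batch of measured latencies against the 3-second target is to compare a high percentile with it. The sketch below uses the nearest-rank 95th percentile. The choice of p95 and the names `LatencySummary` and `summarize_latencies` are illustrative assumptions and aren't taken from this project's tests.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence

TARGET_SECONDS = 3.0  # the per-query target stated above


@dataclass
class LatencySummary:
    median: float
    p95: float
    fraction_within_target: float
    meets_target: bool


def summarize_latencies(latencies: Sequence[float],
                        target: float = TARGET_SECONDS) -> LatencySummary:
    """Summarize latencies (seconds) and test whether the nearest-rank p95 is under target."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) with integer arithmetic, 1-based
    p95 = ordered[rank - 1]
    return LatencySummary(
        median=statistics.median(ordered),
        p95=p95,
        fraction_within_target=sum(1 for x in ordered if x < target) / n,
        meets_target=p95 < target,
    )
```

For latencies of `[1.2, 1.8, 2.0, 2.4, 3.5]`, the median is 2.0 s and 80% of the queries finish under 3 s. The p95 is 3.5 s, however, so the batch does not meet the target.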

## API Endpoints

- `GET /health` - Health check
- `POST /query` - Query the RAG system
- `GET /` - API information
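Before you send queries, you can probe `GET /health`, for example to confirm that a new deployment is reachable. The sketch below assumes that a healthy service answers with HTTP 200. The response body of `/health` isn't described here, so the sketch ignores it. The function name `is_healthy` is hypothetical.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 within the timeout."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx status), refused connections, and timeouts
        # are all OSError subclasses, so any of them counts as "not healthy".
        return False
```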

See `docs/api/RAG_API.md` for full documentation.