Spaces:
Runtime error
Runtime error
A newer version of the Gradio SDK is available:
6.1.0
Quick Start: RAG API
Fast API endpoint for querying product design documents with <3 second response times.
Deploy the API
# Deploy to Modal
modal deploy src/rag/rag_api.py
# Get the API URL
modal app show insurance-rag-api
Use the API
Python Client
from src.rag.api_client import RAGAPIClient
# Initialize client
client = RAGAPIClient(base_url="https://your-api-url.modal.run")
# Query
result = client.query("What are the three product tiers?")
print(result['answer'])
print(f"Response time: {result['total_time']:.2f}s")
cURL
curl -X POST https://your-api-url.modal.run/query \
-H "Content-Type: application/json" \
-d '{"question": "What are the three product tiers?"}'
JavaScript
const response = await fetch('https://your-api-url.modal.run/query', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ question: 'What are the three product tiers?' })
});
const data = await response.json();
console.log(data.answer);
Test Performance
# Test with default URL
python tests/test_api.py
# Test with custom URL
python tests/test_api.py --url https://your-api-url.modal.run
Performance Target
- Target: <3 seconds per query
- Typical: 1.5-2.5 seconds
- Optimizations: Warm containers, reduced tokens, limited context
API Endpoints
GET /health- Health checkPOST /query- Query the RAG systemGET /- API information
See docs/api/RAG_API.md for full documentation.