Quick Start: RAG API

A fast API endpoint for querying product design documents, with a response-time target of under 3 seconds.

Deploy the API

# Deploy to Modal
modal deploy src/rag/rag_api.py

# Get the API URL
modal app show insurance-rag-api

Use the API

Python Client

from src.rag.api_client import RAGAPIClient

# Initialize client
client = RAGAPIClient(base_url="https://your-api-url.modal.run")

# Query
result = client.query("What are the three product tiers?")
print(result['answer'])
print(f"Response time: {result['total_time']:.2f}s")
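
If importing `src.rag.api_client` is not convenient, the same request can be sketched with the Python standard library alone. This is a minimal sketch, not the project's client: it assumes the `/query` endpoint accepts the JSON body shown in the cURL example and returns JSON containing `answer` and `total_time`, as the client example suggests. The helper names `build_query_request` and `query` are hypothetical.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST request for the /query endpoint (hypothetical helper)."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(base_url: str, question: str, timeout: float = 10.0) -> dict:
    """Send the question and return the decoded JSON response.

    Assumes the response body is a JSON object; the exact fields are
    defined by the API, not by this sketch.
    """
    request = build_query_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query("https://your-api-url.modal.run", "What are the three product tiers?")` would then return the parsed response, provided the deployment is reachable at that URL.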

cURL

curl -X POST https://your-api-url.modal.run/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the three product tiers?"}'

JavaScript

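const response = await fetch('https://your-api-url.modal.run/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'What are the three product tiers?' })
});

// fetch resolves even on HTTP error statuses, so check before parsing
if (!response.ok) {
  throw new Error(`Query failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.answer);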

Test Performance

# Test with default URL
python tests/test_api.py

# Test with custom URL
python tests/test_api.py --url https://your-api-url.modal.run

Performance Target

  • Target: <3 seconds per query
  • Typical: 1.5-2.5 seconds
  • Optimizations: Warm containers, reduced tokens, limited context
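
One way to check a deployment against the target above is to time a call on the client side and compare it with the 3-second budget. The following is a minimal sketch that works with any callable performing the query. `timed_call`, `within_target`, and `TARGET_SECONDS` are illustrative names, and client-side wall-clock time includes network latency, so it may differ from the `total_time` value the API reports.

```python
import time
from typing import Any, Callable, Tuple

TARGET_SECONDS = 3.0  # the per-query target stated above


def timed_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def within_target(elapsed: float, target: float = TARGET_SECONDS) -> bool:
    """Return True when the elapsed time is strictly under the target."""
    return elapsed < target
```

For example, `timed_call(lambda: client.query("What are the three product tiers?"))` returns the query result together with the measured duration, which can then be passed to `within_target`.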

API Endpoints

  • GET /health - Health check
  • POST /query - Query the RAG system
  • GET / - API information
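
Before sending queries, a client may want to confirm the service is reachable through `GET /health`. The sketch below assumes only that a healthy service answers with HTTP status 200; the shape of the response body is not specified here, so it is not inspected. `endpoint_url` and `is_healthy` are hypothetical helpers, and the `opener` parameter exists only so the function can be exercised without a network connection.

```python
import urllib.request
from typing import Callable


def endpoint_url(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(
    base_url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed contract)."""
    try:
        with opener(endpoint_url(base_url, "/health"), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError
        return False
```

Treating any connection failure as "not healthy" keeps the caller's logic simple; a caller that needs to distinguish timeouts from HTTP errors would catch the exceptions separately.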

See docs/api/RAG_API.md for full documentation.