QuerySphere - API Documentation
Overview
QuerySphere is an MVP-level RAG (Retrieval-Augmented Generation) platform that lets organizations unlock knowledge from multiple document sources while maintaining data privacy and eliminating LLM API costs: answers are generated by a locally hosted model served by Ollama.
Base URL: http://localhost:8000 (or your deployed domain)
API Version: v1.0.0
Authentication
The API currently requires no authentication, which is acceptable only for local development. For production deployments, consider implementing:
- API Key Authentication
- JWT Tokens
- OAuth2
Rate Limiting
- Default: 100 requests per minute per IP
- File Uploads: 10MB max per file, 50MB total per request
- Chat Endpoints: 30 requests per minute per session
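To stay under these limits from a script, a client can throttle itself. The sketch below is a minimal sliding-window limiter; it is not part of QuerySphere, and the class name SlidingWindowLimiter and its parameters are hypothetical. It simply mirrors the documented numbers (for example, 30 chat requests per minute per session); the server's actual enforcement may differ.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window` of seconds.

    Hypothetical client-side helper; pass in the limits of your deployment
    (e.g. 30 chat requests per 60 s).
    """

    def __init__(self, max_calls: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._calls: Deque[float] = collections.deque()

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window - (now - self._calls[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        if self.wait_time() > 0:
            return False
        self._calls.append(self.clock())
        return True

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

For example, create one `SlidingWindowLimiter(30)` per chat session and call `acquire()` before each POST /api/chat.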
Response Format
All API responses follow this standard format:
{
"success": true,
"data": {...},
"message": "Operation completed successfully",
"timestamp": "2024-01-15T10:30:00Z"
}
Error responses:
{
"success": false,
"error": "Error Type",
"message": "Human-readable error message",
"detail": {...},
"timestamp": "2024-01-15T10:30:00Z"
}
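Not every endpoint in this document wraps its payload in data (the health, upload and analytics responses, for example, return their fields at the top level), so a client helper should handle both shapes. A minimal sketch, assuming only the two envelopes above; unwrap and QuerySphereError are hypothetical names:

```python
from typing import Any, Dict


class QuerySphereError(Exception):
    """Raised when a response body reports "success": false."""

    def __init__(self, error: str, message: str, detail: Any = None) -> None:
        super().__init__(f"{error}: {message}")
        self.error = error      # e.g. "SystemNotReady"
        self.message = message  # human-readable explanation
        self.detail = detail    # optional structured context


def unwrap(body: Dict[str, Any]) -> Any:
    """Return the payload of a response, raising on the error envelope.

    If the body has a "data" key it is returned; otherwise the whole body
    is treated as the payload.
    """
    if body.get("success") is False:
        raise QuerySphereError(body.get("error", "UnknownError"),
                               body.get("message", ""),
                               body.get("detail"))
    return body.get("data", body)
```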
System Management Endpoints
Get System Health
GET /api/health
Check system health and component status.
Response:
{
"status": "healthy",
"timestamp": "2024-01-15T10:30:00Z",
"version": "1.0.0",
"components": {
"vector_store": true,
"llm": true,
"embeddings": true,
"retrieval": true,
"generation": true
},
"details": {
"overall": "healthy",
"vector_store": true,
"llm": true,
"embeddings": true,
"retrieval": true,
"generation": true
}
}
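A readiness check can be derived from the components map. This sketch treats any component flag that is not true as unhealthy; the function names are hypothetical, and status values other than "healthy" are assumed rather than documented.

```python
from typing import Any, Dict, List


def unhealthy_components(health: Dict[str, Any]) -> List[str]:
    """Names of components whose flag is not True, sorted alphabetically."""
    components = health.get("components", {})
    return sorted(name for name, ok in components.items() if ok is not True)


def is_ready_to_query(health: Dict[str, Any]) -> bool:
    """True only if overall status is "healthy" and every component is up."""
    return health.get("status") == "healthy" and not unhealthy_components(health)
```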
Get System Information
GET /api/system-info
Get comprehensive system status and statistics.
Response:
{
"system_state": {
"is_ready": true,
"processing_status": "ready",
"total_documents": 15,
"active_sessions": 3
},
"configuration": {
"inference_model": "mistral:7b",
"embedding_model": "BAAI/bge-small-en-v1.5",
"retrieval_top_k": 10,
"vector_weight": 0.6,
"bm25_weight": 0.4,
"temperature": 0.1,
"enable_reranking": true
},
"llm_provider": {
"provider": "ollama",
"model": "mistral:7b",
"status": "healthy"
},
"system_information": {
"vector_store_status": "Ready (145 chunks)",
"current_model": "mistral:7b",
"embedding_model": "BAAI/bge-small-en-v1.5",
"chunking_strategy": "adaptive",
"system_uptime_seconds": 3600
},
"timestamp": "2024-01-15T10:30:00Z"
}
Document Management Endpoints
Upload Files
POST /api/upload
Upload multiple documents for processing.
Form Data:
files: List of files (PDF, DOCX, TXT, ZIP) - max 2GB total
Supported Formats:
- PDF Documents (.pdf)
- Microsoft Word (.docx, .doc)
- Text Files (.txt, .md)
- ZIP Archives (.zip) - automatic extraction
Response:
{
"success": true,
"message": "Successfully uploaded 3 files",
"files": [
{
"filename": "document_20240115_103000.pdf",
"original_name": "quarterly_report.pdf",
"size": 1542890,
"upload_time": "2024-01-15T10:30:00Z",
"file_path": "/uploads/document_20240115_103000.pdf",
"status": "uploaded"
}
]
}
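Before calling POST /api/upload, a client can filter files by the supported extensions and group them so that each request stays within the limits listed under Rate Limiting (10MB per file, 50MB per request). The Form Data note above quotes a larger 2GB ceiling, so check which limits apply to your deployment. The sketch assumes 1MB = 1,048,576 bytes; plan_upload_batches is a hypothetical helper that only plans batches and sends nothing.

```python
import os
from typing import List, Tuple

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".zip"}
MAX_FILE_BYTES = 10 * 1024 * 1024     # assumed: 10 MB per file
MAX_REQUEST_BYTES = 50 * 1024 * 1024  # assumed: 50 MB per request


def plan_upload_batches(files: List[Tuple[str, int]]) -> Tuple[List[List[str]], List[str]]:
    """Group (path, size_in_bytes) pairs into upload batches.

    Returns (batches, rejected): each batch is the list of paths for one
    POST /api/upload request; rejected files are unsupported or too large.
    """
    batches: List[List[str]] = []
    rejected: List[str] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        ext = os.path.splitext(path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS or size > MAX_FILE_BYTES:
            rejected.append(path)
            continue
        # Close the current batch if this file would push it over the cap.
        if current and current_bytes + size > MAX_REQUEST_BYTES:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, rejected
```

Sizes can come from `os.path.getsize(path)` for each local file.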
Start Processing
POST /api/start-processing
Start processing uploaded documents through the RAG pipeline.
Pipeline Stages:
- Document parsing and text extraction
- Adaptive chunking (fixed/semantic/hierarchical)
- Embedding generation with BGE model
- Vector indexing (FAISS + BM25)
- Knowledge base compilation
Response:
{
"success": true,
"message": "Processing completed successfully",
"status": "ready",
"documents_processed": 3,
"total_chunks": 245,
"chunking_statistics": {
"adaptive": 120,
"semantic": 80,
"hierarchical": 45
},
"index_stats": {
"total_chunks_indexed": 245,
"vector_index_size": 245,
"bm25_indexed": true,
"metadata_stored": true
}
}
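The vector_weight and bm25_weight settings (0.6 and 0.4 in the examples) balance the FAISS and BM25 results produced by the vector indexing stage. This document does not say how the two score lists are fused; the sketch assumes one common approach, min-max normalising each list and taking a weighted sum, purely to illustrate what the weights control. hybrid_rank and min_max are hypothetical names, and the server may use a different fusion method (reciprocal rank fusion, for instance).

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal they map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def hybrid_rank(vector_scores: Dict[str, float], bm25_scores: Dict[str, float],
                vector_weight: float = 0.6, bm25_weight: float = 0.4,
                top_k: int = 10) -> List[Tuple[str, float]]:
    """Fuse two chunk-score maps with a weighted sum of normalised scores."""
    v, b = min_max(vector_scores), min_max(bm25_scores)
    # A chunk found by only one retriever gets 0 from the other.
    fused = {cid: vector_weight * v.get(cid, 0.0) + bm25_weight * b.get(cid, 0.0)
             for cid in v.keys() | b.keys()}
    # Sort by score descending, breaking ties by chunk ID for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

Raising vector_weight favours semantically similar chunks; raising bm25_weight favours exact keyword matches.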
Get Processing Status
GET /api/processing-status
Monitor real-time processing progress.
Response:
{
"status": "processing",
"progress": 65,
"current_step": "Generating embeddings for quarterly_report.pdf...",
"processed": 2,
"total": 3,
"details": {
"chunks_processed": 156,
"embeddings_generated": 156
}
}
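A script can poll this endpoint until processing finishes. The sketch assumes the endpoint reports "ready" once done (as the start-processing response does) and treats "error" or "failed" as failure states; those failure values are guesses, not documented. wait_for_processing and http_get_json are hypothetical helpers, and the sleep and clock parameters exist only so the loop can be tested without waiting.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def http_get_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch a URL and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_processing(fetch_status: Callable[[], Dict[str, Any]],
                        poll_interval: float = 2.0,
                        timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until status is "ready"; raise on an assumed failure state or timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "ready":
            return status
        if state in ("error", "failed"):  # assumed failure values; not documented
            raise RuntimeError(f"Processing failed: {status.get('current_step')}")
        if clock() >= deadline:
            raise TimeoutError(f"Still {state!r} after {timeout} s "
                               f"({status.get('progress')}% complete)")
        sleep(poll_interval)
```

Usage: `wait_for_processing(lambda: http_get_json("http://localhost:8000/api/processing-status"))`.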
Chat & Query Endpoints
Chat with Documents
POST /api/chat
Query your knowledge base with natural language questions. Includes automatic RAGAS evaluation if enabled.
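Request Body (JSON). session_id is optional: omit it on the first message and reuse the value returned in the response to continue the same conversation.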
{
"message": "What were the Q3 revenue trends?",
"session_id": "session_1705314600"
}
Response:
{
"session_id": "session_1705314600",
"response": "Based on the Q3 financial report, revenue increased by 15% quarter-over-quarter, reaching $45 million. The growth was primarily driven by enterprise sales and new market expansion. [1][2]",
"sources": [
{
"rank": 1,
"score": 0.894,
"document_id": "doc_1705300000_abc123",
"chunk_id": "chunk_doc_1705300000_abc123_0",
"text_preview": "Q3 Financial Highlights: Revenue growth of 15% QoQ reaching $45M...",
"page_number": 7,
"section_title": "Financial Performance",
"retrieval_method": "hybrid"
}
],
"metrics": {
"retrieval_time": 245,
"generation_time": 3100,
"total_time": 3345,
"chunks_retrieved": 8,
"chunks_used": 3,
"tokens_used": 487
},
"ragas_metrics": {
"answer_relevancy": 0.89,
"faithfulness": 0.94,
"context_utilization": 0.87,
"context_relevancy": 0.91,
"overall_score": 0.90,
"context_precision": null,
"context_recall": null,
"answer_similarity": null,
"answer_correctness": null
}
}
Note: Timing fields in metrics are in milliseconds. Ground truth metrics (context_precision, context_recall, answer_similarity, answer_correctness) are null unless ground truth is provided and RAGAS_ENABLE_GROUND_TRUTH=True.
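Answers cite their sources with bracketed markers such as [1][2]. Assuming a marker [n] refers to the source whose rank is n (the example is consistent with this, but the mapping is not documented), a client can resolve citations as follows; cited_sources is a hypothetical helper. Note that the example above cites [2] while listing only one source, so unmatched markers are skipped.

```python
import re
from typing import Any, Dict, List

CITATION = re.compile(r"\[(\d+)\]")  # matches markers such as [1] or [12]


def cited_sources(chat_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the sources cited in the answer, in order of first citation.

    Assumption: marker [n] refers to the source whose "rank" is n.
    Markers with no matching source are ignored.
    """
    by_rank = {src["rank"]: src for src in chat_response.get("sources", [])}
    order: List[int] = []
    for match in CITATION.finditer(chat_response.get("response", "")):
        rank = int(match.group(1))
        if rank in by_rank and rank not in order:
            order.append(rank)
    return [by_rank[rank] for rank in order]
```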
Export Chat History
GET /api/export-chat/{session_id}
Export conversation history for analysis or reporting.
Parameters:
session_id: string (required) - Session identifier
format: string (optional) - Export format: json (default) or csv
Response (JSON):
{
"session_id": "session_1705314600",
"export_time": "2024-01-15T11:00:00Z",
"total_messages": 5,
"history": [
{
"query": "What was the Q3 revenue growth?",
"response": "Revenue increased by 15% quarter-over-quarter...",
"sources": [...],
"timestamp": "2024-01-15T10:30:00Z",
"metrics": {
"total_time": 3345
},
"ragas_metrics": {
"answer_relevancy": 0.89,
"faithfulness": 0.94,
"overall_score": 0.90
}
}
]
}
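Assuming format is passed as a query-string parameter (the document lists it as optional but does not say where it goes), the export URL can be built as follows. export_chat_url is a hypothetical helper; it percent-encodes the session ID in case it contains reserved characters.

```python
from urllib.parse import quote, urlencode


def export_chat_url(base_url: str, session_id: str, fmt: str = "json") -> str:
    """Build the GET /api/export-chat/{session_id} URL for the given format."""
    if fmt not in ("json", "csv"):
        raise ValueError("fmt must be 'json' or 'csv'")
    url = f"{base_url.rstrip('/')}/api/export-chat/{quote(session_id, safe='')}"
    # json is the default, so the (assumed) query parameter is only added for csv.
    if fmt != "json":
        url += "?" + urlencode({"format": fmt})
    return url
```

The resulting URL can be downloaded with `urllib.request.urlretrieve(url, "chat.csv")`.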
RAGAS Evaluation Endpoints
Get RAGAS History
GET /api/ragas/history
Get complete RAGAS evaluation history for the current session.
Response:
{
"success": true,
"total_count": 25,
"statistics": {
"total_evaluations": 25,
"avg_answer_relevancy": 0.876,
"avg_faithfulness": 0.912,
"avg_context_utilization": 0.845,
"avg_context_relevancy": 0.889,
"avg_overall_score": 0.881,
"avg_retrieval_time_ms": 235,
"avg_generation_time_ms": 3250,
"avg_total_time_ms": 3485,
"min_score": 0.723,
"max_score": 0.967,
"std_dev": 0.089,
"session_start": "2024-01-15T09:00:00Z",
"last_updated": "2024-01-15T11:00:00Z"
},
"history": [
{
"query": "What were the Q3 revenue trends?",
"answer": "Revenue increased by 15%...",
"contexts": ["Q3 Financial Highlights...", "Revenue breakdown..."],
"timestamp": "2024-01-15T10:30:00Z",
"answer_relevancy": 0.89,
"faithfulness": 0.94,
"context_utilization": 0.87,
"context_relevancy": 0.91,
"overall_score": 0.90,
"retrieval_time_ms": 245,
"generation_time_ms": 3100,
"total_time_ms": 3345,
"chunks_retrieved": 8
}
]
}
Get RAGAS Statistics
GET /api/ragas/statistics
Get aggregate RAGAS statistics for the current session.
Response:
{
"success": true,
"statistics": {
"total_evaluations": 25,
"avg_answer_relevancy": 0.876,
"avg_faithfulness": 0.912,
"avg_context_utilization": 0.845,
"avg_context_relevancy": 0.889,
"avg_overall_score": 0.881,
"avg_retrieval_time_ms": 235,
"avg_generation_time_ms": 3250,
"avg_total_time_ms": 3485,
"min_score": 0.723,
"max_score": 0.967,
"std_dev": 0.089,
"session_start": "2024-01-15T09:00:00Z",
"last_updated": "2024-01-15T11:00:00Z"
}
}
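The aggregate score fields can also be recomputed from the entries returned by GET /api/ragas/history, for example to compare a subset of queries. This sketch assumes std_dev is a population standard deviation and counts only entries that have an overall_score; the server's exact definitions are not documented, and summarize_scores is a hypothetical name.

```python
import statistics
from typing import Any, Dict, List


def summarize_scores(history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Recompute a few aggregate RAGAS fields from history entries.

    Assumptions: entries without an overall_score are skipped, and std_dev
    is the population standard deviation (statistics.pstdev).
    """
    scores = [e["overall_score"] for e in history if e.get("overall_score") is not None]
    if not scores:
        return {"total_evaluations": 0}
    return {
        "total_evaluations": len(scores),
        "avg_overall_score": round(statistics.fmean(scores), 3),
        "min_score": min(scores),
        "max_score": max(scores),
        "std_dev": round(statistics.pstdev(scores), 3),
    }
```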
Clear RAGAS History
POST /api/ragas/clear
Clear all RAGAS evaluation history and start a new session.
Response:
{
"success": true,
"message": "RAGAS evaluation history cleared, new session started"
}
Export RAGAS Data
GET /api/ragas/export
Export all RAGAS evaluation data as JSON.
Response: JSON file download containing:
{
"export_timestamp": "2024-01-15T11:00:00Z",
"total_evaluations": 25,
"statistics": {...},
"evaluations": [...],
"ground_truth_enabled": false
}
Get RAGAS Configuration
GET /api/ragas/config
Get current RAGAS configuration settings.
Response:
{
"enabled": true,
"ground_truth_enabled": false,
"base_metrics": [
"answer_relevancy",
"faithfulness",
"context_utilization",
"context_relevancy"
],
"ground_truth_metrics": [
"context_precision",
"context_recall",
"answer_similarity",
"answer_correctness"
],
"evaluation_timeout": 60,
"batch_size": 10
}
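The example values are consistent with overall_score being the unweighted mean of the four base metrics (0.89, 0.94, 0.87 and 0.91 average to about 0.90), but the formula is not documented. Under that assumption, the score can be recomputed as below; overall_score here is a hypothetical helper that skips null metrics.

```python
from typing import Mapping, Optional, Sequence

BASE_METRICS = ("answer_relevancy", "faithfulness",
                "context_utilization", "context_relevancy")


def overall_score(metrics: Mapping[str, Optional[float]],
                  names: Sequence[str] = BASE_METRICS) -> Optional[float]:
    """Unweighted mean of the named metrics, ignoring missing or null values.

    Assumption: this mirrors how the server derives overall_score.
    """
    values = [metrics[n] for n in names if metrics.get(n) is not None]
    if not values:
        return None
    return sum(values) / len(values)
```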
Analytics Endpoints
Get System Analytics
GET /api/analytics
Get system-wide analytics and performance metrics. Results are cached; cache_info shows whether the response came from the cache and when it will next be refreshed.
Response:
{
"performance_metrics": {
"avg_response_time": 3485,
"min_response_time": 2100,
"max_response_time": 8900,
"total_queries": 127,
"queries_last_hour": 23,
"p95_response_time": 7200
},
"quality_metrics": {
"answer_relevancy": 0.876,
"faithfulness": 0.912,
"context_precision": 0.845,
"context_recall": null,
"overall_score": 0.878,
"avg_sources_per_query": 4.2,
"queries_with_sources": 125,
"confidence": "high",
"metrics_available": true
},
"system_information": {
"vector_store_status": "Ready (245 chunks)",
"current_model": "mistral:7b",
"embedding_model": "BAAI/bge-small-en-v1.5",
"chunking_strategy": "adaptive",
"system_uptime_seconds": 7200,
"last_updated": "2024-01-15T11:00:00Z"
},
"health_status": {
"overall": "healthy",
"llm": true,
"vector_store": true,
"embeddings": true,
"retrieval": true,
"generation": true
},
"chunking_statistics": {
"primary_strategy": "semantic",
"total_chunks": 245,
"strategies_used": {
"fixed": 98,
"semantic": 112,
"hierarchical": 35
}
},
"document_statistics": {
"total_documents": 15,
"total_chunks": 245,
"uploaded_files": 15,
"total_file_size_bytes": 52428800,
"total_file_size_mb": 50.0,
"avg_chunks_per_document": 16.3
},
"session_statistics": {
"total_sessions": 8,
"total_messages": 127,
"avg_messages_per_session": 15.9
},
"index_statistics": {
"total_chunks_indexed": 245,
"vector_index_size": 245,
"bm25_indexed": true
},
"calculated_at": "2024-01-15T11:00:00Z",
"cache_info": {
"from_cache": false,
"next_refresh_in": 30
}
}
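p95_response_time is the response time at or below which at least 95% of queries fall. The document does not state which percentile method is used; this sketch uses the nearest-rank method, one common choice, and the function name is hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Smallest value such that at least `pct` percent of values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank: rank = ceil(pct/100 * n), 1-based; integer-friendly order of ops.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```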
Refresh Analytics Cache
GET /api/analytics/refresh
Force refresh analytics cache and get fresh data.
Response:
{
"success": true,
"message": "Analytics cache refreshed successfully",
"data": {
// Same structure as /api/analytics
}
}
Get Detailed Analytics
GET /api/analytics/detailed
Get detailed analytics including session breakdowns and component performance.
Response:
{
// All fields from /api/analytics, plus:
"detailed_sessions": [
{
"session_id": "session_1705314600",
"message_count": 12,
"first_message": "2024-01-15T09:00:00Z",
"last_message": "2024-01-15T10:45:00Z",
"total_response_time": 38500,
"avg_sources_per_query": 3.8
}
],
"component_performance": {
"retrieval": {
"avg_time_ms": 245,
"cache_hit_rate": 0.23
},
"embeddings": {
"model": "BAAI/bge-small-en-v1.5",
"dimension": 384,
"device": "cpu"
}
}
}
Configuration Endpoints
Get Current Configuration
GET /api/configuration
Retrieve current system configuration.
Response:
{
"configuration": {
"inference_model": "mistral:7b",
"embedding_model": "BAAI/bge-small-en-v1.5",
"vector_weight": 0.6,
"bm25_weight": 0.4,
"temperature": 0.1,
"max_tokens": 1000,
"chunk_size": 512,
"chunk_overlap": 50,
"top_k_retrieve": 10,
"enable_reranking": true,
"is_ready": true,
"llm_healthy": true
},
"health": {
"overall": "healthy",
"llm": true,
"vector_store": true,
"embeddings": true,
"retrieval": true,
"generation": true
}
}
Update Configuration
POST /api/configuration
Update system configuration parameters.
Form Data:
temperature: float (0.0-1.0) - Generation temperature
max_tokens: integer (100-4000) - Maximum response tokens
retrieval_top_k: integer (1-50) - Number of chunks to retrieve
vector_weight: float (0.0-1.0) - Weight for vector search
bm25_weight: float (0.0-1.0) - Weight for keyword search
enable_reranking: boolean - Enable cross-encoder reranking
session_id: string (optional) - Session identifier for overrides
Response:
{
"success": true,
"message": "Configuration updated successfully",
"updates": {
"temperature": 0.2,
"retrieval_top_k": 15
}
}
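Because this endpoint takes form data rather than JSON, a client can validate values against the ranges listed above and URL-encode them before posting. A minimal sketch; build_config_form is hypothetical, and sending booleans as the strings "true"/"false" is an assumption about how the server parses form fields.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# (type, min, max) for each numeric field, as documented for POST /api/configuration.
RANGES = {
    "temperature": (float, 0.0, 1.0),
    "max_tokens": (int, 100, 4000),
    "retrieval_top_k": (int, 1, 50),
    "vector_weight": (float, 0.0, 1.0),
    "bm25_weight": (float, 0.0, 1.0),
}


def build_config_form(**updates: Any) -> str:
    """Validate configuration updates and return a URL-encoded form body."""
    form: Dict[str, str] = {}
    for name, value in updates.items():
        if name == "enable_reranking":
            if not isinstance(value, bool):
                raise TypeError("enable_reranking must be a bool")
            form[name] = "true" if value else "false"  # assumed encoding
        elif name == "session_id":
            form[name] = str(value)
        elif name in RANGES:
            kind, lo, hi = RANGES[name]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be numeric")
            if kind is int and not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} is outside {lo}-{hi}")
            form[name] = str(value)
        else:
            raise KeyError(f"Unknown configuration field: {name}")
    return urlencode(form)
```

The returned string can be sent as the body of a POST with Content-Type application/x-www-form-urlencoded.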
Error Handling
Common HTTP Status Codes
- 200 - Success
- 400 - Bad Request (invalid parameters)
- 404 - Resource Not Found
- 500 - Internal Server Error
- 503 - Service Unavailable (component not ready)
Error Response Examples
RAGAS Evaluation Disabled:
{
"success": false,
"error": "RAGASDisabled",
"message": "RAGAS evaluation is not enabled. Set ENABLE_RAGAS=True in settings.",
"detail": {
"current_setting": "ENABLE_RAGAS=False"
},
"timestamp": "2024-01-15T10:30:00Z"
}
System Not Ready:
{
"success": false,
"error": "SystemNotReady",
"message": "System not ready. Please upload and process documents first.",
"detail": {
"is_ready": false,
"documents_processed": 0
},
"timestamp": "2024-01-15T10:30:00Z"
}
LLM Service Unavailable:
{
"success": false,
"error": "LLMUnavailable",
"message": "LLM service unavailable. Please ensure Ollama is running.",
"detail": {
"llm_healthy": false,
"suggestion": "Run 'ollama serve' in a separate terminal"
},
"timestamp": "2024-01-15T10:30:00Z"
}
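The document does not say which status code each error type uses. Assuming LLMUnavailable and similar component-not-ready errors come back as 503, a client can retry those with exponential backoff. Retrying will not fix SystemNotReady, which requires uploading and processing documents first. call_with_retry is a hypothetical helper; send is any function returning a (status, body) pair.

```python
import time
from typing import Any, Callable, Dict, Tuple

Reply = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)


def call_with_retry(send: Callable[[], Reply],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Reply:
    """Retry `send` with exponential backoff while it returns HTTP 503."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        # Wait base_delay * 1, 2, 4, ... seconds before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```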
Best Practices
1. File Upload
- Use chunked upload for large files (>100MB)
- Compress documents into ZIP archives for multiple files
- Ensure documents are text-extractable (not scanned images without OCR)
2. Query Optimization
- Be specific and contextual in questions
- Use natural language - no special syntax required
- Break complex questions into multiple simpler queries
3. Session Management
- Reuse session_id for conversation continuity
- Sessions automatically expire after 24 hours of inactivity
- Export important conversations for long-term storage
4. RAGAS Evaluation
- Ensure an OpenAI API key is configured; RAGAS evaluation depends on it
- Monitor evaluation metrics to track system quality
- Use analytics endpoints to identify quality trends
- Export evaluation data regularly for offline analysis
5. Performance Monitoring
- Monitor response times and token usage
- Use the analytics endpoint for system health checks
- Set up alerts for quality metric degradation
- Enable caching for frequently accessed embeddings
6. Configuration Management
- Test configuration changes with a few queries first
- Monitor RAGAS metrics after configuration updates
- Use session-based overrides for experimentation
- Document optimal configurations for different use cases
SDK Examples
Python Client
```python
import contextlib

import requests


class KnowledgeBaseClient:
    """Thin wrapper around the QuerySphere REST API."""

    def __init__(self, base_url="http://localhost:8000", timeout=300):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout  # generous default: processing and generation can be slow
        self.session_id = None

    def _request(self, method, path, **kwargs):
        # Raise on HTTP errors instead of silently parsing an error body.
        response = requests.request(method, f"{self.base_url}{path}",
                                    timeout=self.timeout, **kwargs)
        response.raise_for_status()
        return response.json()

    def upload_documents(self, file_paths):
        # ExitStack closes every opened file, even if the request fails.
        with contextlib.ExitStack() as stack:
            files = [("files", stack.enter_context(open(path, "rb")))
                     for path in file_paths]
            return self._request("POST", "/api/upload", files=files)

    def start_processing(self):
        return self._request("POST", "/api/start-processing")

    def query(self, question):
        data = {"message": question}
        if self.session_id:
            data["session_id"] = self.session_id  # continue the same conversation
        result = self._request("POST", "/api/chat", json=data)
        if not self.session_id:
            self.session_id = result.get("session_id")
        return result

    def get_ragas_history(self):
        return self._request("GET", "/api/ragas/history")

    def get_analytics(self):
        return self._request("GET", "/api/analytics")


# Usage
client = KnowledgeBaseClient()

# Upload and process
client.upload_documents(["report.pdf", "contract.docx"])
client.start_processing()

# Query
result = client.query("What are the key findings?")
print(result["response"])
ragas = result.get("ragas_metrics")  # may be missing when RAGAS evaluation is disabled
if ragas:
    print(f"Quality Score: {ragas['overall_score']}")

# Get analytics
analytics = client.get_analytics()
print(f"Avg Response Time: {analytics['performance_metrics']['avg_response_time']}ms")
```
JavaScript Client
```javascript
class KnowledgeBaseClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
    this.sessionId = null;
  }

  // Send a request and parse the JSON body, throwing on non-2xx statuses.
  async _request(path, options = {}) {
    const response = await fetch(`${this.baseUrl}${path}`, options);
    if (!response.ok) {
      throw new Error(`${options.method || 'GET'} ${path} failed: HTTP ${response.status}`);
    }
    return response.json();
  }

  async uploadDocuments(files) {
    // `files` is an array of File/Blob objects, e.g. from an <input type="file">.
    const formData = new FormData();
    files.forEach(file => formData.append('files', file));
    return this._request('/api/upload', { method: 'POST', body: formData });
  }

  async startProcessing() {
    return this._request('/api/start-processing', { method: 'POST' });
  }

  async query(question) {
    const body = { message: question };
    if (this.sessionId) body.session_id = this.sessionId; // continue the conversation
    const result = await this._request('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    });
    if (!this.sessionId) this.sessionId = result.session_id;
    return result;
  }

  async getRagasHistory() {
    return this._request('/api/ragas/history');
  }

  async getAnalytics() {
    return this._request('/api/analytics');
  }
}

// Usage (top-level await requires an ES module; otherwise wrap in an async function)
const client = new KnowledgeBaseClient();

// Query
const result = await client.query('What are the revenue trends?');
console.log(result.response);
if (result.ragas_metrics) {
  // ragas_metrics may be missing when RAGAS evaluation is disabled
  console.log(`Quality: ${result.ragas_metrics.overall_score}`);
}

// Get RAGAS history
const history = await client.getRagasHistory();
console.log(`Total evaluations: ${history.total_count}`);
console.log(`Avg relevancy: ${history.statistics.avg_answer_relevancy}`);
```
Support & Troubleshooting
For API issues:
- Check the system health endpoint (GET /api/health) first
- Verify document processing status
- Review error messages and suggested actions
- Check component readiness flags
For RAGAS issues:
- Ensure an OpenAI API key is configured
- Check that RAGAS is enabled in settings (ENABLE_RAGAS=True)
- Monitor evaluation timeout settings
- Review logs for detailed error messages
For quality issues:
- Monitor RAGAS evaluation metrics
- Adjust retrieval and generation parameters
- Review source citations for context relevance
- Consider document preprocessing improvements
This API provides a complete RAG solution with multi-format document ingestion, intelligent retrieval, local LLM generation, and comprehensive RAGAS-based quality evaluation.