
QuerySphere - API Documentation

Overview

QuerySphere is an MVP-level RAG (Retrieval-Augmented Generation) platform that enables organizations to unlock knowledge from multiple document sources while maintaining complete data privacy and eliminating API costs.

Base URL: http://localhost:8000 (or your deployed domain)

API Version: v1.0.0


Authentication

Currently, the API operates without authentication for local development. For production deployments, consider implementing:

  • API Key Authentication
  • JWT Tokens
  • OAuth2

Rate Limiting

  • Default: 100 requests per minute per IP
  • File Uploads: 10MB max per file, 50MB total per request
  • Chat Endpoints: 30 requests per minute per session
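A minimal client-side guard for these limits, sketched with only the standard library. Assumption: the server answers rate-limited requests with HTTP 429, which this document does not specify, so adjust the code if your deployment signals limits differently.

```python
import time
from urllib.request import urlopen
from urllib.error import HTTPError

def backoff_delays(retries=3, base=2.0):
    """Delays (seconds) to wait after each rate-limited attempt."""
    return [base * (n + 1) for n in range(retries)]

def get_with_retry(url, retries=3, base=2.0):
    """GET with linear backoff when the server reports a limit hit."""
    for delay in backoff_delays(retries, base):
        try:
            return urlopen(url)
        except HTTPError as err:
            if err.code != 429:  # assumption: 429 signals rate limiting
                raise
            time.sleep(delay)
    return urlopen(url)  # final attempt; errors propagate
```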

Response Format

All API responses follow this standard format:

{
  "success": true,
  "data": {...},
  "message": "Operation completed successfully",
  "timestamp": "2024-01-15T10:30:00Z"
}

Error responses:

{
  "success": false,
  "error": "Error Type",
  "message": "Human-readable error message",
  "detail": {...},
  "timestamp": "2024-01-15T10:30:00Z"
}
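Because every endpoint shares this envelope, a small helper can centralize the success check. A sketch (the exception type here is our own choice, not part of the API):

```python
def unwrap(payload: dict):
    """Return "data" from a successful envelope, or raise on failure."""
    if payload.get("success"):
        return payload.get("data")
    raise RuntimeError(f'{payload.get("error")}: {payload.get("message")}')
```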

System Management Endpoints

Get System Health

GET /api/health

Check system health and component status.

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0.0",
  "components": {
    "vector_store": true,
    "llm": true,
    "embeddings": true,
    "retrieval": true,
    "generation": true
  },
  "details": {
    "overall": "healthy",
    "vector_store": true,
    "llm": true,
    "embeddings": true,
    "retrieval": true,
    "generation": true
  }
}
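The components map makes automated readiness checks straightforward. A minimal sketch against the payload shape above:

```python
def unhealthy_components(health: dict) -> list:
    """Names of components reporting False in a /api/health payload."""
    return [name for name, ok in health.get("components", {}).items() if not ok]
```

A deployment probe might treat a non-empty list (or a top-level status other than "healthy") as not ready.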

Get System Information

GET /api/system-info

Get comprehensive system status and statistics.

Response:

{
  "system_state": {
    "is_ready": true,
    "processing_status": "ready",
    "total_documents": 15,
    "active_sessions": 3
  },
  "configuration": {
    "inference_model": "mistral:7b",
    "embedding_model": "BAAI/bge-small-en-v1.5",
    "retrieval_top_k": 10,
    "vector_weight": 0.6,
    "bm25_weight": 0.4,
    "temperature": 0.1,
    "enable_reranking": true
  },
  "llm_provider": {
    "provider": "ollama",
    "model": "mistral:7b",
    "status": "healthy"
  },
  "system_information": {
    "vector_store_status": "Ready (145 chunks)",
    "current_model": "mistral:7b",
    "embedding_model": "BAAI/bge-small-en-v1.5",
    "chunking_strategy": "adaptive",
    "system_uptime_seconds": 3600
  },
  "timestamp": "2024-01-15T10:30:00Z"
}

Document Management Endpoints

Upload Files

POST /api/upload

Upload multiple documents for processing.

Form Data:

  • files: List of files (PDF, DOCX, TXT, ZIP) - max 2GB total

Supported Formats:

  • PDF Documents (.pdf)
  • Microsoft Word (.docx, .doc)
  • Text Files (.txt, .md)
  • ZIP Archives (.zip) - automatic extraction

Response:

{
  "success": true,
  "message": "Successfully uploaded 3 files",
  "files": [
    {
      "filename": "document_20240115_103000.pdf",
      "original_name": "quarterly_report.pdf",
      "size": 1542890,
      "upload_time": "2024-01-15T10:30:00Z",
      "file_path": "/uploads/document_20240115_103000.pdf",
      "status": "uploaded"
    }
  ]
}

Start Processing

POST /api/start-processing

Start processing uploaded documents through the RAG pipeline.

Pipeline Stages:

  1. Document parsing and text extraction
  2. Adaptive chunking (fixed/semantic/hierarchical)
  3. Embedding generation with BGE model
  4. Vector indexing (FAISS + BM25)
  5. Knowledge base compilation

Response:

{
  "success": true,
  "message": "Processing completed successfully",
  "status": "ready",
  "documents_processed": 3,
  "total_chunks": 245,
  "chunking_statistics": {
    "adaptive": 120,
    "semantic": 80,
    "hierarchical": 45
  },
  "index_stats": {
    "total_chunks_indexed": 245,
    "vector_index_size": 245,
    "bm25_indexed": true,
    "metadata_stored": true
  }
}

Get Processing Status

GET /api/processing-status

Monitor real-time processing progress.

Response:

{
  "status": "processing",
  "progress": 65,
  "current_step": "Generating embeddings for quarterly_report.pdf...",
  "processed": 2,
  "total": 3,
  "details": {
    "chunks_processed": 156,
    "embeddings_generated": 156
  }
}
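A simple polling loop over this endpoint, as a sketch: "ready" and "processing" appear in this document as status values, while treating "error" as a terminal state is an assumption.

```python
import json
import time
from urllib.request import urlopen

def is_terminal(status: str) -> bool:
    """True once processing has finished ("error" is assumed terminal)."""
    return status in {"ready", "error"}

def wait_until_processed(base_url="http://localhost:8000",
                         poll_s=2.0, timeout_s=600):
    """Poll /api/processing-status until processing ends or we time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with urlopen(f"{base_url}/api/processing-status") as resp:
            state = json.load(resp)
        print(f'{state.get("progress", 0)}% - {state.get("current_step", "")}')
        if is_terminal(state.get("status", "")):
            return state
        time.sleep(poll_s)
    raise TimeoutError("document processing did not finish in time")
```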

Chat & Query Endpoints

Chat with Documents

POST /api/chat

Query your knowledge base with natural language questions. Includes automatic RAGAS evaluation if enabled.

Request Body (JSON):

{
  "message": "What were the Q3 revenue trends?",
  "session_id": "session_1705314600"
}

Response:

{
  "session_id": "session_1705314600",
  "response": "Based on the Q3 financial report, revenue increased by 15% quarter-over-quarter, reaching $45 million. The growth was primarily driven by enterprise sales and new market expansion. [1][2]",
  "sources": [
    {
      "rank": 1,
      "score": 0.894,
      "document_id": "doc_1705300000_abc123",
      "chunk_id": "chunk_doc_1705300000_abc123_0",
      "text_preview": "Q3 Financial Highlights: Revenue growth of 15% QoQ reaching $45M...",
      "page_number": 7,
      "section_title": "Financial Performance",
      "retrieval_method": "hybrid"
    }
  ],
  "metrics": {
    "retrieval_time": 245,
    "generation_time": 3100,
    "total_time": 3345,
    "chunks_retrieved": 8,
    "chunks_used": 3,
    "tokens_used": 487
  },
  "ragas_metrics": {
    "answer_relevancy": 0.89,
    "faithfulness": 0.94,
    "context_utilization": 0.87,
    "context_relevancy": 0.91,
    "overall_score": 0.90,
    "context_precision": null,
    "context_recall": null,
    "answer_similarity": null,
    "answer_correctness": null
  }
}

Note: Ground truth metrics (context_precision, context_recall, answer_similarity, answer_correctness) are null unless ground truth is provided and RAGAS_ENABLE_GROUND_TRUTH=True.

Export Chat History

GET /api/export-chat/{session_id}

Export conversation history for analysis or reporting.

Parameters:

  • session_id: string (required) - Session identifier
  • format: string (optional) - Export format: json (default) or csv
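A thin wrapper over this endpoint, as a sketch; it assumes the optional `format` parameter above is passed as a query parameter.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def export_url(session_id, fmt="json", base_url="http://localhost:8000"):
    """Build the export URL, adding the optional format query parameter."""
    url = f"{base_url}/api/export-chat/{session_id}"
    return url if fmt == "json" else f"{url}?{urlencode({'format': fmt})}"

def export_chat(session_id, fmt="json", base_url="http://localhost:8000"):
    """Fetch a session's history as raw text (JSON or CSV)."""
    with urlopen(export_url(session_id, fmt, base_url)) as resp:
        return resp.read().decode("utf-8")
```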

Response (JSON):

{
  "session_id": "session_1705314600",
  "export_time": "2024-01-15T11:00:00Z",
  "total_messages": 5,
  "history": [
    {
      "query": "What was the Q3 revenue growth?",
      "response": "Revenue increased by 15% quarter-over-quarter...",
      "sources": [...],
      "timestamp": "2024-01-15T10:30:00Z",
      "metrics": {
        "total_time": 3345
      },
      "ragas_metrics": {
        "answer_relevancy": 0.89,
        "faithfulness": 0.94,
        "overall_score": 0.90
      }
    }
  ]
}

RAGAS Evaluation Endpoints

Get RAGAS History

GET /api/ragas/history

Get complete RAGAS evaluation history for the current session.

Response:

{
  "success": true,
  "total_count": 25,
  "statistics": {
    "total_evaluations": 25,
    "avg_answer_relevancy": 0.876,
    "avg_faithfulness": 0.912,
    "avg_context_utilization": 0.845,
    "avg_context_relevancy": 0.889,
    "avg_overall_score": 0.881,
    "avg_retrieval_time_ms": 235,
    "avg_generation_time_ms": 3250,
    "avg_total_time_ms": 3485,
    "min_score": 0.723,
    "max_score": 0.967,
    "std_dev": 0.089,
    "session_start": "2024-01-15T09:00:00Z",
    "last_updated": "2024-01-15T11:00:00Z"
  },
  "history": [
    {
      "query": "What were the Q3 revenue trends?",
      "answer": "Revenue increased by 15%...",
      "contexts": ["Q3 Financial Highlights...", "Revenue breakdown..."],
      "timestamp": "2024-01-15T10:30:00Z",
      "answer_relevancy": 0.89,
      "faithfulness": 0.94,
      "context_utilization": 0.87,
      "context_relevancy": 0.91,
      "overall_score": 0.90,
      "retrieval_time_ms": 245,
      "generation_time_ms": 3100,
      "total_time_ms": 3345,
      "chunks_retrieved": 8
    }
  ]
}

Get RAGAS Statistics

GET /api/ragas/statistics

Get aggregate RAGAS statistics for the current session.

Response:

{
  "success": true,
  "statistics": {
    "total_evaluations": 25,
    "avg_answer_relevancy": 0.876,
    "avg_faithfulness": 0.912,
    "avg_context_utilization": 0.845,
    "avg_context_relevancy": 0.889,
    "avg_overall_score": 0.881,
    "avg_retrieval_time_ms": 235,
    "avg_generation_time_ms": 3250,
    "avg_total_time_ms": 3485,
    "min_score": 0.723,
    "max_score": 0.967,
    "std_dev": 0.089,
    "session_start": "2024-01-15T09:00:00Z",
    "last_updated": "2024-01-15T11:00:00Z"
  }
}

Clear RAGAS History

POST /api/ragas/clear

Clear all RAGAS evaluation history and start a new session.

Response:

{
  "success": true,
  "message": "RAGAS evaluation history cleared, new session started"
}

Export RAGAS Data

GET /api/ragas/export

Export all RAGAS evaluation data as JSON.

Response: JSON file download containing:

{
  "export_timestamp": "2024-01-15T11:00:00Z",
  "total_evaluations": 25,
  "statistics": {...},
  "evaluations": [...],
  "ground_truth_enabled": false
}

Get RAGAS Configuration

GET /api/ragas/config

Get current RAGAS configuration settings.

Response:

{
  "enabled": true,
  "ground_truth_enabled": false,
  "base_metrics": [
    "answer_relevancy",
    "faithfulness",
    "context_utilization",
    "context_relevancy"
  ],
  "ground_truth_metrics": [
    "context_precision",
    "context_recall",
    "answer_similarity",
    "answer_correctness"
  ],
  "evaluation_timeout": 60,
  "batch_size": 10
}

Analytics Endpoints

Get System Analytics

GET /api/analytics

Get comprehensive system analytics and performance metrics with caching.

Response:

{
  "performance_metrics": {
    "avg_response_time": 3485,
    "min_response_time": 2100,
    "max_response_time": 8900,
    "total_queries": 127,
    "queries_last_hour": 23,
    "p95_response_time": 7200
  },
  "quality_metrics": {
    "answer_relevancy": 0.876,
    "faithfulness": 0.912,
    "context_precision": 0.845,
    "context_recall": null,
    "overall_score": 0.878,
    "avg_sources_per_query": 4.2,
    "queries_with_sources": 125,
    "confidence": "high",
    "metrics_available": true
  },
  "system_information": {
    "vector_store_status": "Ready (245 chunks)",
    "current_model": "mistral:7b",
    "embedding_model": "BAAI/bge-small-en-v1.5",
    "chunking_strategy": "adaptive",
    "system_uptime_seconds": 7200,
    "last_updated": "2024-01-15T11:00:00Z"
  },
  "health_status": {
    "overall": "healthy",
    "llm": true,
    "vector_store": true,
    "embeddings": true,
    "retrieval": true,
    "generation": true
  },
  "chunking_statistics": {
    "primary_strategy": "semantic",
    "total_chunks": 245,
    "strategies_used": {
      "fixed": 98,
      "semantic": 112,
      "hierarchical": 35
    }
  },
  "document_statistics": {
    "total_documents": 15,
    "total_chunks": 245,
    "uploaded_files": 15,
    "total_file_size_bytes": 52428800,
    "total_file_size_mb": 50.0,
    "avg_chunks_per_document": 16.3
  },
  "session_statistics": {
    "total_sessions": 8,
    "total_messages": 127,
    "avg_messages_per_session": 15.9
  },
  "index_statistics": {
    "total_chunks_indexed": 245,
    "vector_index_size": 245,
    "bm25_indexed": true
  },
  "calculated_at": "2024-01-15T11:00:00Z",
  "cache_info": {
    "from_cache": false,
    "next_refresh_in": 30
  }
}

Refresh Analytics Cache

GET /api/analytics/refresh

Force refresh analytics cache and get fresh data.

Response:

{
  "success": true,
  "message": "Analytics cache refreshed successfully",
  "data": {
    // Same structure as /api/analytics
  }
}

Get Detailed Analytics

GET /api/analytics/detailed

Get detailed analytics including session breakdowns and component performance.

Response:

{
  // All fields from /api/analytics, plus:
  "detailed_sessions": [
    {
      "session_id": "session_1705314600",
      "message_count": 12,
      "first_message": "2024-01-15T09:00:00Z",
      "last_message": "2024-01-15T10:45:00Z",
      "total_response_time": 38500,
      "avg_sources_per_query": 3.8
    }
  ],
  "component_performance": {
    "retrieval": {
      "avg_time_ms": 245,
      "cache_hit_rate": 0.23
    },
    "embeddings": {
      "model": "BAAI/bge-small-en-v1.5",
      "dimension": 384,
      "device": "cpu"
    }
  }
}

Configuration Endpoints

Get Current Configuration

GET /api/configuration

Retrieve current system configuration.

Response:

{
  "configuration": {
    "inference_model": "mistral:7b",
    "embedding_model": "BAAI/bge-small-en-v1.5",
    "vector_weight": 0.6,
    "bm25_weight": 0.4,
    "temperature": 0.1,
    "max_tokens": 1000,
    "chunk_size": 512,
    "chunk_overlap": 50,
    "top_k_retrieve": 10,
    "enable_reranking": true,
    "is_ready": true,
    "llm_healthy": true
  },
  "health": {
    "overall": "healthy",
    "llm": true,
    "vector_store": true,
    "embeddings": true,
    "retrieval": true,
    "generation": true
  }
}
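The vector_weight / bm25_weight pair suggests a weighted-sum fusion of the two retrieval scores. The following sketch shows how such weights are commonly applied per chunk; the platform's actual fusion logic may differ (for example, it may normalize scores before combining them).

```python
def hybrid_score(vector_score, bm25_score,
                 vector_weight=0.6, bm25_weight=0.4):
    """Weighted-sum fusion of per-chunk retrieval scores."""
    return vector_weight * vector_score + bm25_weight * bm25_score
```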

Update Configuration

POST /api/configuration

Update system configuration parameters.

Form Data:

  • temperature: float (0.0-1.0) - Generation temperature
  • max_tokens: integer (100-4000) - Maximum response tokens
  • retrieval_top_k: integer (1-50) - Number of chunks to retrieve
  • vector_weight: float (0.0-1.0) - Weight for vector search
  • bm25_weight: float (0.0-1.0) - Weight for keyword search
  • enable_reranking: boolean - Enable cross-encoder reranking
  • session_id: string (optional) - Session identifier for overrides

Response:

{
  "success": true,
  "message": "Configuration updated successfully",
  "updates": {
    "temperature": 0.2,
    "retrieval_top_k": 15
  }
}
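Note that this endpoint takes form fields rather than JSON. A standard-library sketch, with client-side bounds checks mirroring the documented parameter ranges:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RANGES = {  # documented parameter ranges
    "temperature": (0.0, 1.0),
    "max_tokens": (100, 4000),
    "retrieval_top_k": (1, 50),
    "vector_weight": (0.0, 1.0),
    "bm25_weight": (0.0, 1.0),
}

def check_range(name, value):
    """Reject values outside the documented ranges."""
    lo, hi = RANGES[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be in [{lo}, {hi}]")
    return value

def update_configuration(base_url="http://localhost:8000", **params):
    """POST configuration overrides as form-encoded fields."""
    for name, value in params.items():
        if name in RANGES:
            check_range(name, value)
    req = Request(f"{base_url}/api/configuration",
                  data=urlencode(params).encode("utf-8"))
    with urlopen(req) as resp:
        return json.load(resp)
```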

Error Handling

Common HTTP Status Codes

  • 200 - Success
  • 400 - Bad Request (invalid parameters)
  • 404 - Resource Not Found
  • 500 - Internal Server Error
  • 503 - Service Unavailable (component not ready)
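A coarse dispatch over these codes, as a client might implement it. This is a sketch; which errors are worth retrying is ultimately a judgment call.

```python
def classify(status_code: int) -> str:
    """Coarse handling hint for the documented status codes."""
    if status_code == 200:
        return "ok"
    if status_code in (400, 404):
        return "client_error"  # fix the request; retrying will not help
    if status_code in (500, 503):
        return "retryable"     # 503 often clears once components are ready
    return "unknown"
```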

Error Response Examples

RAGAS Evaluation Disabled:

{
  "success": false,
  "error": "RAGASDisabled",
  "message": "RAGAS evaluation is not enabled. Set ENABLE_RAGAS=True in settings.",
  "detail": {
    "current_setting": "ENABLE_RAGAS=False"
  },
  "timestamp": "2024-01-15T10:30:00Z"
}

System Not Ready:

{
  "success": false,
  "error": "SystemNotReady",
  "message": "System not ready. Please upload and process documents first.",
  "detail": {
    "is_ready": false,
    "documents_processed": 0
  },
  "timestamp": "2024-01-15T10:30:00Z"
}

LLM Service Unavailable:

{
  "success": false,
  "error": "LLMUnavailable",
  "message": "LLM service unavailable. Please ensure Ollama is running.",
  "detail": {
    "llm_healthy": false,
    "suggestion": "Run 'ollama serve' in a separate terminal"
  },
  "timestamp": "2024-01-15T10:30:00Z"
}

Best Practices

1. File Upload

  • Use chunked upload for large files (>100MB)
  • Compress documents into ZIP archives for multiple files
  • Ensure documents are text-extractable (not scanned images without OCR)
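The ZIP recommendation above can be sketched with the standard zipfile module. This in-memory variant uses placeholder filenames and bytes in place of real documents:

```python
import io
import zipfile

def bundle_documents(named_blobs: dict) -> io.BytesIO:
    """Pack {filename: bytes} into one in-memory ZIP for a single upload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, blob in named_blobs.items():
            zf.writestr(name, blob)
    buf.seek(0)
    return buf
```

The resulting buffer can be sent as a single `files` entry to /api/upload, where the archive is extracted automatically.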

2. Query Optimization

  • Be specific and contextual in questions
  • Use natural language - no special syntax required
  • Break complex questions into multiple simpler queries

3. Session Management

  • Reuse session_id for conversation continuity
  • Sessions automatically expire after 24 hours of inactivity
  • Export important conversations for long-term storage

4. RAGAS Evaluation

  • Ensure OpenAI API key is configured for RAGAS to work
  • Monitor evaluation metrics to track system quality
  • Use analytics endpoints to identify quality trends
  • Export evaluation data regularly for offline analysis

5. Performance Monitoring

  • Monitor response times and token usage
  • Use analytics endpoint for system health checks
  • Set up alerts for quality metric degradation
  • Enable caching for frequently accessed embeddings

6. Configuration Management

  • Test configuration changes with a few queries first
  • Monitor RAGAS metrics after configuration updates
  • Use session-based overrides for experimentation
  • Document optimal configurations for different use cases

SDK Examples

Python Client

import requests

class KnowledgeBaseClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url
        self.session_id = None
        
    def upload_documents(self, file_paths):
        files = [('files', open(fpath, 'rb')) for fpath in file_paths]
        try:
            response = requests.post(f"{self.base_url}/api/upload", files=files)
        finally:
            for _, handle in files:  # close handles even if the request fails
                handle.close()
        return response.json()
    
    def start_processing(self):
        response = requests.post(f"{self.base_url}/api/start-processing")
        return response.json()
    
    def query(self, question):
        data = {'message': question}
        if self.session_id:
            data['session_id'] = self.session_id
        response = requests.post(f"{self.base_url}/api/chat", json=data)
        result = response.json()
        if not self.session_id:
            self.session_id = result.get('session_id')
        return result
    
    def get_ragas_history(self):
        response = requests.get(f"{self.base_url}/api/ragas/history")
        return response.json()
    
    def get_analytics(self):
        response = requests.get(f"{self.base_url}/api/analytics")
        return response.json()

# Usage
client = KnowledgeBaseClient()

# Upload and process
client.upload_documents(['report.pdf', 'contract.docx'])
client.start_processing()

# Query
result = client.query("What are the key findings?")
print(result['response'])
print(f"Quality Score: {result['ragas_metrics']['overall_score']}")

# Get analytics
analytics = client.get_analytics()
print(f"Avg Response Time: {analytics['performance_metrics']['avg_response_time']}ms")

JavaScript Client

class KnowledgeBaseClient {
    constructor(baseUrl = 'http://localhost:8000') {
        this.baseUrl = baseUrl;
        this.sessionId = null;
    }
    
    async uploadDocuments(files) {
        const formData = new FormData();
        files.forEach(file => formData.append('files', file));
        
        const response = await fetch(`${this.baseUrl}/api/upload`, {
            method: 'POST',
            body: formData
        });
        return await response.json();
    }
    
    async startProcessing() {
        const response = await fetch(`${this.baseUrl}/api/start-processing`, {
            method: 'POST'
        });
        return await response.json();
    }
    
    async query(question) {
        const body = { message: question };
        if (this.sessionId) body.session_id = this.sessionId;
        
        const response = await fetch(`${this.baseUrl}/api/chat`, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(body)
        });
        
        const result = await response.json();
        if (!this.sessionId) this.sessionId = result.session_id;
        return result;
    }
    
    async getRagasHistory() {
        const response = await fetch(`${this.baseUrl}/api/ragas/history`);
        return await response.json();
    }
    
    async getAnalytics() {
        const response = await fetch(`${this.baseUrl}/api/analytics`);
        return await response.json();
    }
}

// Usage
const client = new KnowledgeBaseClient();

// Query
const result = await client.query("What are the revenue trends?");
console.log(result.response);
console.log(`Quality: ${result.ragas_metrics.overall_score}`);

// Get RAGAS history
const history = await client.getRagasHistory();
console.log(`Total evaluations: ${history.total_count}`);
console.log(`Avg relevancy: ${history.statistics.avg_answer_relevancy}`);

Support & Troubleshooting

For API issues:

  • Check system health endpoint first
  • Verify document processing status
  • Review error messages and suggested actions
  • Check component readiness flags

For RAGAS issues:

  • Ensure OpenAI API key is configured
  • Check RAGAS is enabled in settings
  • Monitor evaluation timeout settings
  • Review logs for detailed error messages

For quality issues:

  • Monitor RAGAS evaluation metrics
  • Adjust retrieval and generation parameters
  • Review source citations for context relevance
  • Consider document preprocessing improvements

This API provides a complete RAG solution with multi-format document ingestion, intelligent retrieval, local LLM generation, and comprehensive RAGAS-based quality evaluation.