# RAG API

This document outlines the API endpoints for managing Retrieval-Augmented Generation (RAG) components in PySpur.

## Document Collections

### Create Document Collection

**Description**: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.

**URL**: `/rag/collections/`

**Method**: POST

**Form Data**:

```python
files: List[UploadFile]  # List of files to upload (optional)
metadata: str  # JSON string containing collection configuration
```

Where `metadata` is a JSON string representing:

```python
class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing
```

**Response Schema**:

```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```

### List Document Collections

**Description**: Lists all document collections.

**URL**: `/rag/collections/`

**Method**: GET

**Response Schema**:

```python
List[DocumentCollectionResponseSchema]
```

### Get Document Collection

**Description**: Gets details of a specific document collection.
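Because collections are processed in the background, a client typically fetches the collection repeatedly until `status` becomes `ready` or `failed`. The sketch below shows one way to do that with the standard library. The base URL `http://localhost:8000`, the helper names `fetch_collection` and `wait_for_collection`, and the timing defaults are assumptions for illustration, not part of the PySpur API.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:8000"  # Assumption: adjust to your PySpur deployment


def fetch_collection(collection_id: str) -> Dict[str, Any]:
    """GET /rag/collections/{collection_id}/ and decode the JSON body."""
    url = f"{BASE_URL}/rag/collections/{collection_id}/"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_collection(
    collection_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_collection,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Poll until status is "ready"; raise if it is "failed" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        collection = fetch(collection_id)
        status = collection["status"]
        if status == "ready":
            return collection
        if status == "failed":
            # error_message is Optional, so fall back to a generic message.
            raise RuntimeError(collection.get("error_message") or "processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"collection {collection_id} is still {status!r}")
        time.sleep(poll_interval)
```

Passing `fetch` as a parameter keeps the polling logic independent of the HTTP client, so it can be reused with any client library or a stub in tests.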
**URL**: `/rag/collections/{collection_id}/`

**Method**: GET

**Parameters**:

```python
collection_id: str  # ID of the document collection
```

**Response Schema**:

```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```

### Delete Document Collection

**Description**: Deletes a document collection and its associated data.

**URL**: `/rag/collections/{collection_id}/`

**Method**: DELETE

**Parameters**:

```python
collection_id: str  # ID of the document collection
```

**Response**: 200 OK with a message

### Get Collection Progress

**Description**: Gets the processing progress of a document collection.

**URL**: `/rag/collections/{collection_id}/progress/`

**Method**: GET

**Parameters**:

```python
collection_id: str  # ID of the document collection
```

**Response Schema**:

```python
class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)
```

### Add Documents to Collection

**Description**: Adds documents to an existing collection.
The documents are processed asynchronously in the background.

**URL**: `/rag/collections/{collection_id}/documents/`

**Method**: POST

**Parameters**:

```python
collection_id: str  # ID of the document collection
```

**Form Data**:

```python
files: List[UploadFile]  # List of files to upload
```

**Response Schema**:

```python
DocumentCollectionResponseSchema  # Same schema as in Get Document Collection
```

### Get Collection Documents

**Description**: Gets all documents and their chunks for a collection.

**URL**: `/rag/collections/{collection_id}/documents/`

**Method**: GET

**Parameters**:

```python
collection_id: str  # ID of the document collection
```

**Response Schema**:

```python
List[DocumentWithChunksSchema]
```

Where `DocumentWithChunksSchema` contains:

```python
class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document
```

### Delete Document from Collection

**Description**: Deletes a document from a collection.

**URL**: `/rag/collections/{collection_id}/documents/{document_id}/`

**Method**: DELETE

**Parameters**:

```python
collection_id: str  # ID of the document collection
document_id: str  # ID of the document to delete
```

**Response**: 200 OK with a message

### Preview Chunk

**Description**: Previews how a document would be chunked with a given configuration.

**URL**: `/rag/collections/preview_chunk/`

**Method**: POST

**Form Data**:

```python
file: UploadFile  # File to preview
chunking_config: str  # JSON string containing chunking configuration
```

**Response Schema**:

```python
{
    "chunks": List[Dict[str, Any]],  # Preview of chunks
    "total_chunks": int,  # Total number of chunks
}
```

## Vector Indices

### Create Vector Index

**Description**: Creates a new vector index from a document collection. The index is created asynchronously in the background.
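Creating an index means sending a `VectorIndexCreateSchema` body. The following is a minimal sketch under two assumptions: the fields of `EmbeddingConfigSchema` are not documented here, so the embedding configuration is passed through as an opaque dict; and the check that the source collection is `ready` is a client-side precaution, not a documented server requirement. `build_index_create_payload` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def build_index_create_payload(
    name: str,
    description: str,
    collection: Dict[str, Any],
    embedding: Dict[str, Any],
) -> bytes:
    """Return a JSON body for POST /rag/indices/ shaped like VectorIndexCreateSchema."""
    # Client-side precaution (assumption): only index collections that finished processing.
    if collection.get("status") != "ready":
        raise ValueError(
            f"collection {collection.get('id')} is not ready: {collection.get('status')!r}"
        )
    payload = {
        "name": name,
        "description": description,
        "collection_id": collection["id"],
        # EmbeddingConfigSchema fields are not documented here; passed through unchanged.
        "embedding": embedding,
    }
    return json.dumps(payload).encode("utf-8")
```

Taking the whole collection response (rather than just its ID) lets the helper reuse the `status` field from Get Document Collection for the readiness check.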
**URL**: `/rag/indices/`

**Method**: POST

**Request Payload**:

```python
class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding
```

**Response Schema**:

```python
class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed
```

### List Vector Indices

**Description**: Lists all vector indices.

**URL**: `/rag/indices/`

**Method**: GET

**Response Schema**:

```python
List[VectorIndexResponseSchema]
```

### Get Vector Index

**Description**: Gets details of a specific vector index.

**URL**: `/rag/indices/{index_id}/`

**Method**: GET

**Parameters**:

```python
index_id: str  # ID of the vector index
```

**Response Schema**:

```python
VectorIndexResponseSchema  # Same schema as the Create Vector Index response
```

### Delete Vector Index

**Description**: Deletes a vector index and its associated data.

**URL**: `/rag/indices/{index_id}/`

**Method**: DELETE

**Parameters**:

```python
index_id: str  # ID of the vector index
```

**Response**: 200 OK with a message

### Get Index Progress

**Description**: Gets the processing progress of a vector index.
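This endpoint and Get Collection Progress both return `ProcessingProgressSchema`, in which `current_step` and the file and chunk counters are optional. As a hedged sketch, a client could condense one response into a single status line as follows; `summarize_progress` is a hypothetical helper and the output format is an arbitrary choice.

```python
from typing import Any, Dict, Optional


def _ratio(done: Optional[int], total: Optional[int]) -> Optional[str]:
    # Counters are Optional in ProcessingProgressSchema, so skip them when absent.
    if done is None or total is None:
        return None
    return f"{done}/{total}"


def summarize_progress(progress: Dict[str, Any]) -> str:
    """Format a ProcessingProgressSchema dict as a one-line status message."""
    parts = [f"{progress['status']}: {progress['progress']:.0f}%"]
    if progress.get("current_step"):
        parts.append(f"step={progress['current_step']}")
    files = _ratio(progress.get("processed_files"), progress.get("total_files"))
    if files:
        parts.append(f"files={files}")
    chunks = _ratio(progress.get("processed_chunks"), progress.get("total_chunks"))
    if chunks:
        parts.append(f"chunks={chunks}")
    if progress.get("error_message"):
        parts.append(f"error={progress['error_message']}")
    return ", ".join(parts)
```

Only the fields that are present are printed, so the same helper works for responses from either progress endpoint.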
**URL**: `/rag/indices/{index_id}/progress/`

**Method**: GET

**Parameters**:

```python
index_id: str  # ID of the vector index
```

**Response Schema**:

```python
ProcessingProgressSchema  # Same schema as the Get Collection Progress response
```

### Retrieve from Index

**Description**: Retrieves relevant chunks from a vector index based on a query.

**URL**: `/rag/indices/{index_id}/retrieve/`

**Method**: POST

**Parameters**:

```python
index_id: str  # ID of the vector index
```

**Request Payload**:

```python
class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search
```

**Response Schema**:

```python
class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results
```

Where `RetrievalResultSchema` contains:

```python
class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk
```
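A minimal sketch of building a retrieval request with the standard library, assuming the payload is sent as a JSON body and the base URL depends on your deployment (both assumptions, as is the helper name `build_retrieval_request`). It omits `score_threshold` when unset, matching the schema's `None` default. The schema does not say how `semantic_weight` and `keyword_weight` are combined on the server, so the sketch only passes them through.

```python
import json
import urllib.request
from typing import Optional


def build_retrieval_request(
    base_url: str,
    index_id: str,
    query: str,
    top_k: int = 5,
    score_threshold: Optional[float] = None,
    semantic_weight: float = 1.0,
    keyword_weight: float = 0.0,
) -> urllib.request.Request:
    """Build (but do not send) a POST /rag/indices/{index_id}/retrieve/ request."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    payload = {
        "query": query,
        "top_k": top_k,
        "semantic_weight": semantic_weight,
        "keyword_weight": keyword_weight,
    }
    # score_threshold defaults to None in RetrievalRequestSchema, so send it only when set.
    if score_threshold is not None:
        payload["score_threshold"] = score_threshold
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rag/indices/{index_id}/retrieve/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned request can be sent with `urllib.request.urlopen(req)`; decoding the response with `json.load` should yield a dict shaped like `RetrievalResponseSchema`, whose `results` entries carry `text`, `score`, and `metadata`.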