# RAG API
This document outlines the API endpoints for managing Retrieval-Augmented Generation (RAG) components in PySpur.
## Document Collections
### Create Document Collection
**Description**: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.
**URL**: `/rag/collections/`
**Method**: POST
**Form Data**:
```python
files: List[UploadFile] # List of files to upload (optional)
metadata: str # JSON string containing collection configuration
```
Where `metadata` is a JSON string representing:
```python
class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```
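As a minimal sketch of preparing the `metadata` form field, the helper below serializes the three documented fields into a JSON string. The function name `build_collection_metadata` and the `chunk_token_size` key inside `text_processing` are assumptions for illustration only; this document does not specify the fields of `ChunkingConfigSchema`.

```python
import json
from typing import Any, Dict


def build_collection_metadata(
    name: str, description: str, text_processing: Dict[str, Any]
) -> str:
    """Serialize the documented DocumentCollectionCreateSchema fields to JSON."""
    return json.dumps(
        {
            "name": name,
            "description": description,
            "text_processing": text_processing,
        }
    )


# Hypothetical chunking settings; the real ChunkingConfigSchema keys may differ.
metadata = build_collection_metadata(
    name="handbook",
    description="Internal handbook pages",
    text_processing={"chunk_token_size": 200},
)
```

The resulting string would be sent as the `metadata` form field, alongside any `files`, by whichever HTTP client you use.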
### List Document Collections
**Description**: Lists all document collections.
**URL**: `/rag/collections/`
**Method**: GET
**Response Schema**:
```python
List[DocumentCollectionResponseSchema]
```
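Because the list endpoint returns every collection regardless of state, a client will often want to separate ready collections from those still processing. The sketch below groups an already-parsed response by its `status` field; `names_by_status` is a hypothetical helper and the sample data is hand-written, not real API output.

```python
from collections import defaultdict
from typing import Any, Dict, List


def names_by_status(collections: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group collection names by their `status` field."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for collection in collections:
        grouped[collection["status"]].append(collection["name"])
    return dict(grouped)


# Hand-written sample containing only the fields this helper reads.
sample = [
    {"id": "c1", "name": "handbook", "status": "ready"},
    {"id": "c2", "name": "tickets", "status": "processing"},
    {"id": "c3", "name": "wiki", "status": "ready"},
]
grouped = names_by_status(sample)
```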
### Get Document Collection
**Description**: Gets details of a specific document collection.
**URL**: `/rag/collections/{collection_id}/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```
### Delete Document Collection
**Description**: Deletes a document collection and its associated data.
**URL**: `/rag/collections/{collection_id}/`
**Method**: DELETE
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response**: 200 OK with message
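A sketch of building the DELETE request with the standard library follows. `BASE_URL` is an assumption (replace it with the address of your own deployment), and `delete_collection_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:6080/api"  # assumption: adjust to your deployment


def delete_collection_request(collection_id: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a DELETE request for one collection."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    url = f"{base_url}/rag/collections/{quote(collection_id, safe='')}/"
    return Request(url, method="DELETE")


request = delete_collection_request("abc123")
# Sending is left to the caller, for example urllib.request.urlopen(request).
```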
### Get Collection Progress
**Description**: Gets the processing progress of a document collection.
**URL**: `/rag/collections/{collection_id}/progress/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)
```
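Several fields in the progress schema are optional, so client code should tolerate their absence. The following sketch turns a parsed progress response into a one-line summary; `describe_progress` is a hypothetical helper, and the sample values (including the step name `chunking`) are made up for illustration.

```python
from typing import Any, Dict


def describe_progress(progress: Dict[str, Any]) -> str:
    """Render a progress response as a short human-readable line."""
    parts = [f"{progress['status']}: {progress['progress']:.0f}%"]
    if progress.get("current_step"):
        parts.append(f"step={progress['current_step']}")
    total = progress.get("total_files")
    done = progress.get("processed_files")
    # Only report file counts when both optional fields are present.
    if total is not None and done is not None:
        parts.append(f"files {done}/{total}")
    if progress.get("error_message"):
        parts.append(f"error: {progress['error_message']}")
    return ", ".join(parts)


# Hand-written sample, not real API output.
sample = {
    "id": "c1",
    "status": "processing",
    "progress": 42.0,
    "current_step": "chunking",
    "total_files": 4,
    "processed_files": 1,
    "error_message": None,
}
summary = describe_progress(sample)
```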
### Add Documents to Collection
**Description**: Adds documents to an existing collection. The documents are processed asynchronously in the background.
**URL**: `/rag/collections/{collection_id}/documents/`
**Method**: POST
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Form Data**:
```python
files: List[UploadFile] # List of files to upload
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    ...  # Same fields as the Get Document Collection response
```
### Get Collection Documents
**Description**: Gets all documents and their chunks for a collection.
**URL**: `/rag/collections/{collection_id}/documents/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
List[DocumentWithChunksSchema]
```
Where `DocumentWithChunksSchema` contains:
```python
class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document
```
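To illustrate the nested shape of this response, the sketch below counts chunks per document from an already-parsed list. `chunk_counts` is a hypothetical helper. Since `DocumentChunkSchema` is not specified in this document, the sample uses empty dicts as opaque chunk placeholders.

```python
from typing import Any, Dict, List, Tuple


def chunk_counts(documents: List[Dict[str, Any]]) -> Dict[str, Tuple[str, int]]:
    """Map each document ID to its (title, number of chunks)."""
    return {doc["id"]: (doc["title"], len(doc["chunks"])) for doc in documents}


# Hand-written sample; each {} stands in for one DocumentChunkSchema object.
sample = [
    {"id": "d1", "title": "Intro", "metadata": {}, "chunks": [{}, {}, {}]},
    {"id": "d2", "title": "FAQ", "metadata": {}, "chunks": [{}]},
]
counts = chunk_counts(sample)
total_chunks = sum(n for _, n in counts.values())
```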
### Delete Document from Collection
**Description**: Deletes a document from a collection.
**URL**: `/rag/collections/{collection_id}/documents/{document_id}/`
**Method**: DELETE
**Parameters**:
```python
collection_id: str # ID of the document collection
document_id: str # ID of the document to delete
```
**Response**: 200 OK with message
### Preview Chunk
**Description**: Previews how a document would be chunked with a given configuration.
**URL**: `/rag/collections/preview_chunk/`
**Method**: POST
**Form Data**:
```python
file: UploadFile # File to preview
chunking_config: str # JSON string containing chunking configuration
```
**Response Schema**:
```python
{
"chunks": List[Dict[str, Any]], # Preview of chunks
"total_chunks": int # Total number of chunks
}
```
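This endpoint takes form data rather than a JSON body, so the file and the `chunking_config` string need to be encoded as `multipart/form-data`. Below is a standard-library sketch of that encoding. `encode_multipart` and `BASE_URL` are assumptions introduced here, and the `chunk_token_size` key is hypothetical because the chunking configuration fields are not listed in this document. The request is built but not sent.

```python
import json
import uuid
from typing import Dict, Tuple
from urllib.request import Request

BASE_URL = "http://localhost:6080/api"  # assumption: adjust to your deployment


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]
) -> Tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content) in files.items():
        disposition = f'form-data; name="{name}"; filename="{filename}"'
        lines += [
            f"--{boundary}".encode(),
            f"Content-Disposition: {disposition}".encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # The closing boundary carries a trailing "--".
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


chunking_config = json.dumps({"chunk_token_size": 200})  # hypothetical key
body, content_type = encode_multipart(
    fields={"chunking_config": chunking_config},
    files={"file": ("notes.txt", b"First paragraph.\n\nSecond paragraph.")},
)
request = Request(
    f"{BASE_URL}/rag/collections/preview_chunk/",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
```

In practice a higher-level HTTP client would handle the multipart encoding for you; the manual version is shown only to make the wire format explicit.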
## Vector Indices
### Create Vector Index
**Description**: Creates a new vector index from a document collection. The index is created asynchronously in the background.
**URL**: `/rag/indices/`
**Method**: POST
**Request Payload**:
```python
class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding
```
**Response Schema**:
```python
class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed
```
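A sketch of constructing this request, assuming the payload is sent as a JSON body. `create_index_request` and `BASE_URL` are hypothetical, and the `model` key inside `embedding` is a placeholder, since the fields of `EmbeddingConfigSchema` are not given in this document.

```python
import json
from typing import Any, Dict
from urllib.request import Request

BASE_URL = "http://localhost:6080/api"  # assumption: adjust to your deployment


def create_index_request(
    name: str,
    description: str,
    collection_id: str,
    embedding: Dict[str, Any],
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) the POST request that creates a vector index."""
    payload = {
        "name": name,
        "description": description,
        "collection_id": collection_id,
        "embedding": embedding,
    }
    return Request(
        f"{base_url}/rag/indices/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = create_index_request(
    name="handbook-index",
    description="Index over the handbook collection",
    collection_id="abc123",
    embedding={"model": "example-embedding-model"},  # hypothetical config
)
```

Because index creation runs in the background, the response's `status` would typically start as `processing`, and the progress endpoint can be used to follow it.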
### List Vector Indices
**Description**: Lists all vector indices.
**URL**: `/rag/indices/`
**Method**: GET
**Response Schema**:
```python
List[VectorIndexResponseSchema]
```
### Get Vector Index
**Description**: Gets details of a specific vector index.
**URL**: `/rag/indices/{index_id}/`
**Method**: GET
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response Schema**:
```python
class VectorIndexResponseSchema:
    ...  # Same fields as the Create Vector Index response
```
### Delete Vector Index
**Description**: Deletes a vector index and its associated data.
**URL**: `/rag/indices/{index_id}/`
**Method**: DELETE
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response**: 200 OK with message
### Get Index Progress
**Description**: Gets the processing progress of a vector index.
**URL**: `/rag/indices/{index_id}/progress/`
**Method**: GET
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response Schema**:
```python
class ProcessingProgressSchema:
    ...  # Same fields as the Get Collection Progress response
```
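Since indexing runs in the background, a client usually polls this endpoint until processing finishes. The sketch below shows one way to do that with a timeout. It assumes the progress `status` uses the same values as the index status (`processing`, `ready`, `failed`), which this document does not state explicitly. `wait_for_completion` is a hypothetical helper, and the fetch function is injected so the example runs without a server.

```python
import time
from typing import Any, Callable, Dict


def wait_for_completion(
    fetch_progress: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> Dict[str, Any]:
    """Poll until status is 'ready' or 'failed', or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        progress = fetch_progress()
        if progress["status"] in ("ready", "failed"):
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError("index is still processing")
        time.sleep(interval_seconds)


# Stand-in for a real HTTP call to /rag/indices/{index_id}/progress/.
_responses = iter(
    [
        {"status": "processing", "progress": 10.0},
        {"status": "processing", "progress": 80.0},
        {"status": "ready", "progress": 100.0},
    ]
)
final = wait_for_completion(lambda: next(_responses), interval_seconds=0.0)
```

A `failed` result is returned rather than raised, so the caller can inspect `error_message` before deciding how to react.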
### Retrieve from Index
**Description**: Retrieves relevant chunks from a vector index based on a query.
**URL**: `/rag/indices/{index_id}/retrieve/`
**Method**: POST
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Request Payload**:
```python
class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search
```
**Response Schema**:
```python
class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results
```
Where `RetrievalResultSchema` contains:
```python
class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk
```
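The request and response shapes above can be exercised with a small sketch. `build_retrieval_payload` mirrors the documented defaults, and `top_texts` extracts chunk texts from a parsed response. Both names are hypothetical, the sample response is hand-written, and sorting "best first" assumes that a higher `score` means a more relevant chunk. `ChunkMetadataSchema` is not specified here, so the sample leaves `metadata` empty.

```python
from typing import Any, Dict, List, Optional


def build_retrieval_payload(
    query: str,
    top_k: int = 5,
    score_threshold: Optional[float] = None,
    semantic_weight: float = 1.0,
    keyword_weight: float = 0.0,
) -> Dict[str, Any]:
    """Build a request body using the documented field names and defaults."""
    payload: Dict[str, Any] = {
        "query": query,
        "top_k": top_k,
        "semantic_weight": semantic_weight,
        "keyword_weight": keyword_weight,
    }
    # Omit the threshold entirely when unset, matching its default of None.
    if score_threshold is not None:
        payload["score_threshold"] = score_threshold
    return payload


def top_texts(response: Dict[str, Any], min_score: float = 0.0) -> List[str]:
    """Return chunk texts with score >= min_score, highest score first."""
    hits = [r for r in response["results"] if r["score"] >= min_score]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [r["text"] for r in hits]


# Mixed semantic and keyword weighting; the weights are illustrative values.
payload = build_retrieval_payload(
    "refund policy", top_k=3, semantic_weight=0.7, keyword_weight=0.3
)

# Hand-written sample response, not real API output.
sample_response = {
    "results": [
        {"text": "Shipping takes 3 days.", "score": 0.41, "metadata": {}},
        {"text": "Refunds within 30 days.", "score": 0.92, "metadata": {}},
        {"text": "Contact support for refunds.", "score": 0.77, "metadata": {}},
    ],
    "total_results": 3,
}
texts = top_texts(sample_response, min_score=0.5)
```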