# RAG API
This document outlines the API endpoints for managing Retrieval-Augmented Generation (RAG) components in PySpur.
## Document Collections
### Create Document Collection
**Description**: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.
**URL**: `/rag/collections/`
**Method**: POST
**Form Data**:
```python
files: List[UploadFile] # List of files to upload (optional)
metadata: str # JSON string containing collection configuration
```
Where `metadata` is a JSON string representing:
```python
class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```
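Because `metadata` travels as a JSON string inside the form data, the client has to serialize the collection configuration before uploading. The following is a minimal sketch of that step, not PySpur's own client code. The helper name `build_collection_metadata` is hypothetical, and since the fields of `ChunkingConfigSchema` are not listed in this document, the `chunk_size` key is a placeholder for illustration only.

```python
import json
from typing import Any, Dict


def build_collection_metadata(
    name: str,
    description: str,
    text_processing: Dict[str, Any],
) -> str:
    """Serialize the collection configuration into the `metadata` form field."""
    return json.dumps(
        {
            "name": name,
            "description": description,
            # The fields of ChunkingConfigSchema are not listed in this
            # document, so the caller supplies them as an opaque dict.
            "text_processing": text_processing,
        }
    )


# Hypothetical chunking settings, for illustration only.
metadata = build_collection_metadata(
    name="handbook",
    description="Employee handbook PDFs",
    text_processing={"chunk_size": 1000},
)
print(metadata)
```

The resulting string would be sent as the `metadata` form field alongside the optional `files` parts of the multipart request.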
### List Document Collections
**Description**: Lists all document collections.
**URL**: `/rag/collections/`
**Method**: GET
**Response Schema**:
```python
List[DocumentCollectionResponseSchema]
```
### Get Document Collection
**Description**: Gets details of a specific document collection.
**URL**: `/rag/collections/{collection_id}/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
```
### Delete Document Collection
**Description**: Deletes a document collection and its associated data.
**URL**: `/rag/collections/{collection_id}/`
**Method**: DELETE
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response**: 200 OK with message
### Get Collection Progress
**Description**: Gets the processing progress of a document collection.
**URL**: `/rag/collections/{collection_id}/progress/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)
```
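Since collections are processed in the background, a client will typically poll this endpoint until processing finishes. The sketch below shows one way to structure that loop. It assumes the progress `status` uses the same values as the collection status (`processing`, `ready`, `failed`), which this document does not state explicitly. The function `wait_until_done` and its parameters are hypothetical, and the HTTP call is injected as `fetch_progress` so the loop stays independent of any particular HTTP client.

```python
import time
from typing import Any, Callable, Dict


def wait_until_done(
    fetch_progress: Callable[[], Dict[str, Any]],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
) -> Dict[str, Any]:
    """Poll until the status is "ready" or "failed", or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        progress = fetch_progress()
        # Assumption: terminal states match the collection status values.
        if progress["status"] in ("ready", "failed"):
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError("collection is still processing")
        time.sleep(poll_seconds)


# Stand-in for real GET /rag/collections/{collection_id}/progress/ responses.
_fake_responses = iter(
    [
        {"status": "processing", "progress": 40.0},
        {"status": "ready", "progress": 100.0},
    ]
)
final = wait_until_done(lambda: next(_fake_responses), poll_seconds=0.0)
print(final["status"], final["progress"])
```

In real use, `fetch_progress` would issue the GET request and return the decoded JSON body. A caller would check `error_message` when the returned status is `failed`.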
### Add Documents to Collection
**Description**: Adds documents to an existing collection. The documents are processed asynchronously in the background.
**URL**: `/rag/collections/{collection_id}/documents/`
**Method**: POST
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Form Data**:
```python
files: List[UploadFile] # List of files to upload
```
**Response Schema**:
```python
class DocumentCollectionResponseSchema:
    # Same as the Get Document Collection response
```
### Get Collection Documents
**Description**: Gets all documents and their chunks for a collection.
**URL**: `/rag/collections/{collection_id}/documents/`
**Method**: GET
**Parameters**:
```python
collection_id: str # ID of the document collection
```
**Response Schema**:
```python
List[DocumentWithChunksSchema]
```
Where `DocumentWithChunksSchema` contains:
```python
class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document
```
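As a small illustration of working with this response, the sketch below counts the chunks of each document in a decoded `List[DocumentWithChunksSchema]` body. The helper `chunk_counts` is hypothetical, and because the fields of `DocumentChunkSchema` are not listed in this document, the sample chunks are empty placeholder dicts.

```python
from typing import Any, Dict, List


def chunk_counts(documents: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each document ID to the number of chunks it contains."""
    return {doc["id"]: len(doc["chunks"]) for doc in documents}


# Hand-written sample shaped like the response; chunk contents are placeholders.
sample_documents = [
    {"id": "doc-1", "title": "a.pdf", "metadata": {}, "chunks": [{}, {}]},
    {"id": "doc-2", "title": "b.pdf", "metadata": {}, "chunks": []},
]
counts = chunk_counts(sample_documents)
print(counts)
```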
### Delete Document from Collection
**Description**: Deletes a document from a collection.
**URL**: `/rag/collections/{collection_id}/documents/{document_id}/`
**Method**: DELETE
**Parameters**:
```python
collection_id: str # ID of the document collection
document_id: str # ID of the document to delete
```
**Response**: 200 OK with message
### Preview Chunk
**Description**: Previews how a document would be chunked with a given configuration.
**URL**: `/rag/collections/preview_chunk/`
**Method**: POST
**Form Data**:
```python
file: UploadFile # File to preview
chunking_config: str # JSON string containing chunking configuration
```
**Response Schema**:
```python
{
    "chunks": List[Dict[str, Any]],  # Preview of chunks
    "total_chunks": int  # Total number of chunks
}
```
## Vector Indices
### Create Vector Index
**Description**: Creates a new vector index from a document collection. The index is created asynchronously in the background.
**URL**: `/rag/indices/`
**Method**: POST
**Request Payload**:
```python
class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding
```
**Response Schema**:
```python
class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed
```
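To make the request shape concrete, here is a sketch that assembles (but does not send) a create-index request using only the Python standard library. It assumes the payload is sent as a JSON body, which is typical for this style of API but not stated in this document. The base URL, the helper name `build_create_index_request`, and the sample values are hypothetical, and `embedding` is left as an opaque dict because the fields of `EmbeddingConfigSchema` are not listed here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_index_request(
    base_url: str,
    name: str,
    description: str,
    collection_id: str,
    embedding: Dict[str, Any],
) -> urllib.request.Request:
    """Assemble a POST /rag/indices/ request without sending it."""
    payload = {
        "name": name,
        "description": description,
        "collection_id": collection_id,
        # Fill in according to EmbeddingConfigSchema (not shown in this document).
        "embedding": embedding,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rag/indices/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


request = build_create_index_request(
    base_url="http://localhost:8000",  # hypothetical server address
    name="handbook-index",
    description="Index over the handbook collection",
    collection_id="example-collection-id",  # placeholder ID
    embedding={},
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. Because the index is built in the background, the returned `status` would likely be `processing` at first.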
### List Vector Indices
**Description**: Lists all vector indices.
**URL**: `/rag/indices/`
**Method**: GET
**Response Schema**:
```python
List[VectorIndexResponseSchema]
```
### Get Vector Index
**Description**: Gets details of a specific vector index.
**URL**: `/rag/indices/{index_id}/`
**Method**: GET
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response Schema**:
```python
class VectorIndexResponseSchema:
    # Same as the Create Vector Index response
```
### Delete Vector Index
**Description**: Deletes a vector index and its associated data.
**URL**: `/rag/indices/{index_id}/`
**Method**: DELETE
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response**: 200 OK with message
### Get Index Progress
**Description**: Gets the processing progress of a vector index.
**URL**: `/rag/indices/{index_id}/progress/`
**Method**: GET
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Response Schema**:
```python
class ProcessingProgressSchema:
    # Same as the Get Collection Progress response
```
### Retrieve from Index
**Description**: Retrieves relevant chunks from a vector index based on a query.
**URL**: `/rag/indices/{index_id}/retrieve/`
**Method**: POST
**Parameters**:
```python
index_id: str # ID of the vector index
```
**Request Payload**:
```python
class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search
**Response Schema**:
```python
class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results
```
Where `RetrievalResultSchema` contains:
```python
class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk
```
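The retrieval request and response can be sketched on the client side as follows. The dataclass mirrors the defaults listed in `RetrievalRequestSchema`. The names `RetrievalRequest` and `texts_above` are hypothetical, the sample response is hand-written, and the filter assumes that a higher `score` means a more relevant chunk. The chunk `metadata` is left empty because the fields of `ChunkMetadataSchema` are not listed in this document.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RetrievalRequest:
    """Client-side mirror of RetrievalRequestSchema with the documented defaults."""

    query: str
    top_k: Optional[int] = 5
    score_threshold: Optional[float] = None
    semantic_weight: Optional[float] = 1.0
    keyword_weight: Optional[float] = 0.0

    def to_json(self) -> str:
        """Serialize the request into a JSON body."""
        return json.dumps(asdict(self))


def texts_above(response: Dict[str, Any], min_score: float) -> List[str]:
    """Return chunk texts whose score is at least `min_score`."""
    return [r["text"] for r in response["results"] if r["score"] >= min_score]


body = RetrievalRequest(query="refund policy", top_k=3).to_json()

# Hand-written sample shaped like RetrievalResponseSchema.
sample_response = {
    "results": [
        {"text": "Refunds are issued within 30 days.", "score": 0.91, "metadata": {}},
        {"text": "Office hours are 9 to 5.", "score": 0.42, "metadata": {}},
    ],
    "total_results": 2,
}
relevant = texts_above(sample_response, min_score=0.5)
print(body)
print(relevant)
```

Setting `score_threshold` in the request would let the server apply a minimum score itself, so the client-side filter here is only for illustration.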