Spaces:
Sleeping
Sleeping
| # API Reference | |
| Base URL: `https://vn6295337-rag-document-assistant.hf.space/api` | |
| --- | |
| ## Zero-Storage Endpoints | |
| ### POST /embed-chunks | |
| Generate embeddings for text chunks and store in Pinecone. **Text is discarded immediately after embedding.** | |
| **Request Body:** | |
| ```json | |
| { | |
| "chunks": [ | |
| { | |
| "text": "The actual text content...", | |
| "metadata": { | |
| "filename": "document.pdf", | |
| "filePath": "/Documents/document.pdf", | |
| "fileId": "dropbox_file_id", | |
| "chunkIndex": 0, | |
| "startChar": 0, | |
| "endChar": 1000 | |
| } | |
| } | |
| ] | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "status": "success", | |
| "vectors_upserted": 5, | |
| "error": null | |
| } | |
| ``` | |
| **Privacy Note:** Text is used only for embedding generation, then immediately deleted. Only embeddings and metadata (file paths, positions) are stored. | |
| --- | |
| ### POST /query-secure | |
| Execute a zero-storage query. Re-fetches text from user's Dropbox at query time. | |
| **Request Body:** | |
| ```json | |
| { | |
| "query": "What are the payment terms?", | |
| "access_token": "dropbox_access_token", | |
| "top_k": 3 | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "answer": "According to the contract, payment is due within 30 days...", | |
| "citations": [ | |
| { | |
| "id": "file_id::0", | |
| "score": 0.85, | |
| "snippet": "Payment terms: Net 30 days from invoice date..." | |
| } | |
| ], | |
| "error": null | |
| } | |
| ``` | |
| **Flow:** | |
| 1. Generate query embedding | |
| 2. Search Pinecone for similar chunks | |
| 3. Re-fetch files from user's Dropbox using provided token | |
| 4. Extract chunk text using stored positions | |
| 5. Send to LLM for answer generation | |
| 6. Return answer (text never stored) | |
| --- | |
| ### DELETE /clear-index | |
| Clear all vectors from Pinecone index. | |
| **Response:** | |
| ```json | |
| { | |
| "status": "success", | |
| "message": "Index cleared" | |
| } | |
| ``` | |
| --- | |
| ## Dropbox Integration Endpoints | |
| ### POST /dropbox/token | |
| Exchange Dropbox authorization code for access token. Client secret is kept server-side. | |
| **Request Body:** | |
| ```json | |
| { | |
| "code": "authorization_code_from_oauth", | |
| "redirect_uri": "https://your-app.com/callback" | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "access_token": "sl.xxxxx", | |
| "token_type": "bearer", | |
| "expires_in": 14400 | |
| } | |
| ``` | |
| --- | |
| ### POST /dropbox/folder | |
| List contents of a Dropbox folder (proxy to avoid CORS). | |
| **Request Body:** | |
| ```json | |
| { | |
| "path": "", | |
| "access_token": "dropbox_access_token" | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "entries": [ | |
| { | |
| ".tag": "file", | |
| "name": "document.pdf", | |
| "id": "id:xxxxx", | |
| "path_lower": "/documents/document.pdf", | |
| "size": 12345 | |
| } | |
| ], | |
| "has_more": false | |
| } | |
| ``` | |
| --- | |
| ### POST /dropbox/file | |
| Download file content from Dropbox. Supports text files and PDFs (with text extraction). | |
| **Request Body:** | |
| ```json | |
| { | |
| "path": "/documents/document.pdf", | |
| "access_token": "dropbox_access_token" | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "content": "Extracted text content from the file..." | |
| } | |
| ``` | |
| **Supported File Types:** | |
| - `.txt` - Plain text | |
| - `.md` - Markdown | |
| - `.pdf` - PDF (text extraction via PyPDF2) | |
| --- | |
| ## Utility Endpoints | |
| ### GET /health | |
| Health check endpoint. | |
| **Response:** | |
| ```json | |
| { | |
| "status": "ok" | |
| } | |
| ``` | |
| --- | |
| ### GET /status | |
| Get current index status. | |
| **Query Parameters:** | |
| - `chunks_path` (optional): Path to chunks file (default: `data/chunks.jsonl`) | |
| **Response:** | |
| ```json | |
| { | |
| "exists": true, | |
| "chunks": 44, | |
| "documents": 5, | |
| "path": "data/chunks.jsonl", | |
| "error": null | |
| } | |
| ``` | |
| --- | |
| ## Legacy Endpoints | |
| ### POST /query | |
| Standard RAG query (uses local chunks file, not zero-storage). | |
| **Request Body:** | |
| ```json | |
| { | |
| "query": "What is GDPR?", | |
| "top_k": 3, | |
| "use_hybrid": false, | |
| "use_reranking": false | |
| } | |
| ``` | |
| --- | |
| ### POST /ingest | |
| Ingest documents from a directory (server-side processing). | |
| **Request Body:** | |
| ```json | |
| { | |
| "docs_dir": "sample_docs", | |
| "output_path": "data/chunks.jsonl", | |
| "provider": "sentence-transformers" | |
| } | |
| ``` | |
| --- | |
| ## Error Responses | |
| All endpoints return errors in this format: | |
| ```json | |
| { | |
| "status": "error", | |
| "error": "Description of the error" | |
| } | |
| ``` | |
| Common error codes: | |
| - Missing required parameters | |
| - Invalid access token | |
| - Dropbox API errors | |
| - Pinecone connection errors | |
| - LLM provider failures | |
| --- | |
| ## Rate Limits | |
| - **Embedding**: No explicit limit (Pinecone free tier: 100K operations/month) | |
| - **Queries**: Subject to LLM provider limits: | |
| - Gemini: 15 RPM | |
| - Groq: 30 RPM | |
| - OpenRouter: Varies by model | |
| --- | |
| ## Environment Variables | |
| Required on backend: | |
| - `PINECONE_API_KEY` - Pinecone vector database | |
| - `DROPBOX_APP_KEY` - Dropbox OAuth app key | |
| - `DROPBOX_APP_SECRET` - Dropbox OAuth app secret | |
| - `GEMINI_API_KEY` - Google Gemini API (primary LLM) | |
| - `GROQ_API_KEY` - Groq API (fallback LLM) | |