# External API Documentation

This document explains how to use the Document Parsing API from external applications using API key authentication.

## Table of Contents

1. [Overview](#overview)
2. [Authentication](#authentication)
3. [API Endpoints](#api-endpoints)
4. [Usage Examples](#usage-examples)
5. [Response Format](#response-format)
6. [Error Handling](#error-handling)
7. [Best Practices](#best-practices)
8. [Security Notes](#security-notes)
9. [Support](#support)

## Overview

The Document Parsing API allows external applications to extract text and structured data from PDF and image files. The API supports:

- **File Types**: PDF, PNG, JPEG, TIFF
- **Max File Size**: 4 MB
- **Authentication**: API key (via the `X-API-Key` header) or JWT Bearer token
- **Response Format**: JSON

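
Files that break these limits are rejected with a 400 error (see Error Handling), so it can pay to check them locally first. The following is a minimal client-side pre-check. It assumes "4 MB" means 4 MiB (`4 * 1024 * 1024` bytes), because the exact server cutoff is not documented. It identifies the format from the file's leading "magic" bytes rather than trusting the extension. `sniff_type`, `check_upload`, and `validate_upload` are hypothetical helper names and are not part of the API.

```python
import os
from typing import Optional

# Assumption: "4 MB" means 4 MiB; the exact server-side cutoff is not documented.
MAX_FILE_BYTES = 4 * 1024 * 1024

# Leading "magic" bytes of the four supported formats.
SIGNATURES = {
    "application/pdf": (b"%PDF-",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/tiff": (b"II*\x00", b"MM\x00*"),  # little- and big-endian TIFF
}


def sniff_type(head: bytes) -> Optional[str]:
    """Return the MIME type whose signature starts `head`, or None."""
    for mime, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):  # bytes.startswith accepts a tuple
            return mime
    return None


def check_upload(head: bytes, size: int) -> str:
    """Raise ValueError for files the API would reject; return the detected type."""
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size / (1024 * 1024):.1f} MB; the limit is 4 MB")
    mime = sniff_type(head)
    if mime is None:
        raise ValueError("only PDF, PNG, JPEG, and TIFF files are accepted")
    return mime


def validate_upload(path: str) -> str:
    """Check a file on disk, reading only its first 8 bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    return check_upload(head, os.path.getsize(path))
```

Call `validate_upload("invoice.pdf")` before uploading, and catch `ValueError` to report the problem without spending an API call.
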
## Authentication

### Step 1: Create an Account

First, create an account using one of these methods:

1. **Firebase Authentication** (via the web UI)
2. **OTP Authentication** (via the API)

#### OTP Authentication Flow

```bash
# 1. Request an OTP
curl -X POST https://your-api-url/api/auth/otp/request \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com"
  }'

# Response:
# {
#   "success": true,
#   "message": "OTP sent to your email"
# }

# 2. Verify the OTP and get a JWT token
curl -X POST https://your-api-url/api/auth/otp/verify \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com",
    "otp": "123456"
  }'

# Response:
# {
#   "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
#   "user": { ... }
# }
```

**Note**: Only business email addresses are accepted. Addresses from free email providers such as Gmail and Yahoo are rejected.

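
The same two-step flow can be scripted. The following is a minimal sketch that uses only Python's standard library (`urllib.request`). `https://your-api-url` is the same placeholder as in the cURL commands, and the helper names are illustrative. Requests are built separately from sending so they can be inspected. Only `send` touches the network.

```python
import json
import urllib.request

BASE_URL = "https://your-api-url"  # placeholder, as in the cURL examples


def json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def otp_request(email: str) -> urllib.request.Request:
    """Step 1: ask the API to email a one-time password."""
    return json_post(f"{BASE_URL}/api/auth/otp/request", {"email": email})


def otp_verify(email: str, otp: str) -> urllib.request.Request:
    """Step 2: exchange the emailed OTP for a JWT."""
    return json_post(f"{BASE_URL}/api/auth/otp/verify", {"email": email, "otp": otp})


def token_from_verify_response(body: bytes) -> str:
    """Pull the JWT out of the verify response, failing loudly if it is absent."""
    token = json.loads(body).get("token")
    if not token:
        raise ValueError("verify response contained no token")
    return token


def send(req: urllib.request.Request, timeout: float = 30) -> bytes:
    """Send a request and return the raw response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Typical use is `send(otp_request(email))`. After that, read the OTP from your inbox and call `jwt = token_from_verify_response(send(otp_verify(email, otp)))`.
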
### Step 2: Create an API Key

Once authenticated, create an API key for your external application:

```bash
# Create an API key (requires the JWT token from Step 1)
curl -X POST https://your-api-url/api/auth/api-key/create \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My External App"
  }'

# Response:
# {
#   "success": true,
#   "api_key": "sk_live_abc123...",   # ⚠️ SAVE THIS - shown only once!
#   "key_id": 1,
#   "key_prefix": "sk_live_abc...",
#   "name": "My External App",
#   "created_at": "2024-01-15T10:30:00",
#   "message": "API key created successfully. Store this key securely - it will not be shown again!"
# }
```

**⚠️ IMPORTANT**: The full API key is shown only once, when it is created. Store it securely in your application's environment variables or a secret management system.

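
One way to follow this advice in Python is to read the key from an environment variable at startup and fail fast if it is missing. `DOCPARSE_API_KEY` is a hypothetical variable name. `redact` keeps the first 11 characters for log output. That matches the `key_prefix` format in the example response (`sk_live_abc...`), but the prefix length of real keys is an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use whatever your deployment defines.
ENV_VAR = "DOCPARSE_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and fail fast if it is missing."""
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; create an API key and export it first")
    return key


def redact(key: str, visible: int = 11) -> str:
    """Mask a key for logs, keeping only a short, non-secret prefix."""
    return key[:visible] + "..."
```

Failing at startup surfaces a missing key immediately, instead of as a 401 on the first upload.
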
### Step 3: Use the API Key for Authentication

Send the API key in the `X-API-Key` header on all subsequent API calls:

```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@document.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount"
```

## API Endpoints

### 1. Document Extraction

**Endpoint**: `POST /api/extract`

**Authentication** (either one):

- API key: `X-API-Key: <your-api-key>`
- JWT: `Authorization: Bearer <jwt-token>`

**Parameters** (sent as `multipart/form-data`):

- `file` (required): The document file (PDF, PNG, JPEG, TIFF)
- `key_fields` (optional): Comma-separated list of specific fields to extract

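
Clients that hold both credential types can centralize the header choice. The following is a minimal sketch. `auth_headers` returns exactly one of the two headers listed above. Refusing to send both is a client-side choice, because the API does not document which one would win. `key_fields_param` joins field names into the comma-separated `key_fields` value and rejects names that contain a comma, since those could not be told apart after joining. Both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def auth_headers(api_key: Optional[str] = None, jwt: Optional[str] = None) -> Dict[str, str]:
    """Return exactly one of the two supported authentication headers."""
    if api_key and jwt:
        raise ValueError("pass either api_key or jwt, not both")
    if api_key:
        return {"X-API-Key": api_key}
    if jwt:
        return {"Authorization": f"Bearer {jwt}"}
    raise ValueError("authentication required: pass api_key or jwt")


def key_fields_param(fields: Iterable[str]) -> str:
    """Build the comma-separated key_fields value, dropping blank names."""
    names = [f.strip() for f in fields if f.strip()]
    if any("," in name for name in names):
        raise ValueError("field names must not contain commas")
    return ",".join(names)
```
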
**Example request** (extracting specific fields):

```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@invoice.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount,PO Number"
```

**Example request** (without `key_fields`):

```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@/path/to/document.pdf"
```

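
`curl -F` sends the request as `multipart/form-data`. If you cannot add a dependency such as `requests`, you can build that body with the standard library. The following sketch is illustrative only. The helper names are hypothetical, the file part's `Content-Type` is guessed from the filename, and double quotes in the filename are percent-encoded as `%22`, as browsers do.

```python
import mimetypes
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     content: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex  # random, so a clash with the content is vanishingly unlikely
    safe_name = filename.replace('"', "%22")
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode("utf-8"))
    parts.append(f"--{boundary}\r\n".encode())
    parts.append((f'Content-Disposition: form-data; name="{file_field}"; '
                  f'filename="{safe_name}"\r\n').encode("utf-8"))
    parts.append(f"Content-Type: {content_type}\r\n\r\n".encode())
    parts.append(content)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(base_url: str, api_key: str, filename: str, content: bytes,
                          key_fields: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /api/extract with the file and optional key_fields."""
    fields = {"key_fields": key_fields} if key_fields else {}
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body, content_type = encode_multipart(fields, "file", filename, content, ctype)
    return urllib.request.Request(
        f"{base_url}/api/extract",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Read the file's bytes with `open(path, "rb").read()`, pass them in, and send the result with `urllib.request.urlopen(req, timeout=...)`.
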
### 2. List API Keys

**Endpoint**: `GET /api/auth/api-keys`

**Authentication**: JWT Bearer token (required)

**Example**:

```bash
curl -X GET https://your-api-url/api/auth/api-keys \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

**Response**:

```json
{
  "success": true,
  "api_keys": [
    {
      "id": 1,
      "name": "My External App",
      "key_prefix": "sk_live_abc...",
      "is_active": true,
      "last_used_at": "2024-01-15T14:30:00",
      "created_at": "2024-01-15T10:30:00"
    }
  ]
}
```

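
Each key reports `is_active` and `last_used_at`, so the list response alone is enough to flag keys worth reviewing. The following sketch makes three assumptions. Timestamps are naive ISO 8601 strings in the same time zone as `now`, since the example shows no offset. `last_used_at` is `null` for keys that were never used. The 90-day idle threshold is an arbitrary choice. `stale_keys` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def stale_keys(list_response: dict, now: datetime, max_idle_days: int = 90) -> List[dict]:
    """Return active keys that were never used or have been idle too long."""
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for key in list_response.get("api_keys", []):
        if not key.get("is_active"):
            continue  # already deactivated; nothing to do
        last_used = key.get("last_used_at")
        if last_used is None or datetime.fromisoformat(last_used) < cutoff:
            flagged.append(key)
    return flagged
```
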
### 3. Delete API Key

**Endpoint**: `DELETE /api/auth/api-key/{key_id}`

**Authentication**: JWT Bearer token (required)

`key_id` is the key's `id` as returned by the list endpoint. Deleting a key deactivates it (a soft delete) rather than removing it permanently; see Security Notes.

**Example**:

```bash
curl -X DELETE https://your-api-url/api/auth/api-key/1 \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

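
Here is the same call built with Python's standard library, as a sketch. The function name is hypothetical. The positive-integer check on `key_id` is an assumption based on the example IDs above. When rotating a key, create and deploy the replacement before deleting the old one, so that no request goes out with a deactivated key.

```python
import urllib.request


def build_delete_key_request(base_url: str, jwt: str, key_id: int) -> urllib.request.Request:
    """Build (but do not send) DELETE /api/auth/api-key/{key_id}."""
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(key_id, int) or isinstance(key_id, bool) or key_id <= 0:
        raise ValueError("key_id must be a positive integer (the `id` from the list endpoint)")
    return urllib.request.Request(
        f"{base_url}/api/auth/api-key/{key_id}",
        headers={"Authorization": f"Bearer {jwt}"},
        method="DELETE",
    )
```
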
## Usage Examples

### Python Example

```python
import requests

# API configuration
API_BASE_URL = "https://your-api-url"
API_KEY = "sk_live_abc123..."  # Your API key (load it from the environment in production)


def extract_document(file_path, key_fields=None):
    """Upload a document to /api/extract and return the parsed JSON response."""
    url = f"{API_BASE_URL}/api/extract"
    headers = {
        "X-API-Key": API_KEY
    }
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {}
        if key_fields:
            data['key_fields'] = key_fields
        # Without a timeout, requests can wait forever; 120 s is an arbitrary ceiling
        response = requests.post(url, headers=headers, files=files, data=data, timeout=120)
    response.raise_for_status()
    return response.json()


# Usage
result = extract_document("invoice.pdf", key_fields="Invoice Number,Invoice Date,Total Amount")
print(result)
```

### JavaScript/Node.js Example

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// API configuration
const API_BASE_URL = 'https://your-api-url';
const API_KEY = 'sk_live_abc123...'; // Your API key

// Extract document
async function extractDocument(filePath, keyFields = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  if (keyFields) {
    form.append('key_fields', keyFields);
  }

  try {
    const response = await axios.post(`${API_BASE_URL}/api/extract`, form, {
      headers: {
        'X-API-Key': API_KEY,
        ...form.getHeaders()
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
    throw error;
  }
}

// Usage
extractDocument('invoice.pdf', 'Invoice Number,Invoice Date,Total Amount')
  .then(result => console.log(result))
  .catch(error => console.error(error));
```

### PHP Example

```php
<?php
$apiBaseUrl = "https://your-api-url";
$apiKey = "sk_live_abc123..."; // Your API key

function extractDocument($filePath, $keyFields = null) {
    global $apiBaseUrl, $apiKey;
    $url = $apiBaseUrl . "/api/extract";
    $curl = curl_init();

    $postData = [
        'file' => new CURLFile($filePath)
    ];
    if ($keyFields) {
        $postData['key_fields'] = $keyFields;
    }

    curl_setopt_array($curl, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $postData,
        CURLOPT_HTTPHEADER => [
            "X-API-Key: " . $apiKey
        ]
    ]);

    $response = curl_exec($curl);
    if ($response === false) {
        // Transport-level failure (DNS, TLS, timeout): there is no HTTP response to inspect
        $error = curl_error($curl);
        curl_close($curl);
        throw new Exception("cURL error: " . $error);
    }
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($httpCode !== 200) {
        throw new Exception("API request failed (HTTP $httpCode): " . $response);
    }
    return json_decode($response, true);
}

// Usage
try {
    $result = extractDocument("invoice.pdf", "Invoice Number,Invoice Date,Total Amount");
    print_r($result);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
?>
```

## Response Format

### Success Response

```json
{
  "id": 123,
  "fileName": "invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": "2.5 MB",
  "status": "completed",
  "confidence": 92.5,
  "fieldsExtracted": 15,
  "totalTime": 3500,
  "fields": {
    "page_1": {
      "text": "Extracted text from page 1...",
      "table": {
        "row_1": {
          "column_1": "value1",
          "column_2": "value2"
        }
      },
      "footer_notes": ["Note 1", "Note 2"]
    }
  },
  "full_text": "Complete extracted text from all pages...",
  "Fields": {
    "Invoice Number": "INV-001",
    "Invoice Date": "2024-01-15",
    "Total Amount": "$1,234.56"
  },
  "stages": {
    "uploading": {
      "time": 525,
      "status": "completed",
      "variation": "normal"
    },
    "aiAnalysis": {
      "time": 1925,
      "status": "completed",
      "variation": "normal"
    },
    "dataExtraction": {
      "time": 700,
      "status": "completed",
      "variation": "fast"
    },
    "outputRendering": {
      "time": 350,
      "status": "completed",
      "variation": "normal"
    }
  },
  "errorMessage": null
}
```

### Response Fields

- `id`: Extraction record ID
- `fileName`: Original filename
- `fileType`: MIME type of the file
- `fileSize`: Human-readable file size as a string (for example, `"2.5 MB"`)
- `status`: `"completed"` or `"failed"`
- `confidence`: Extraction confidence, from 0 to 100
- `fieldsExtracted`: Number of fields extracted
- `totalTime`: Total processing time in milliseconds
- `fields` (lowercase `f`): Structured page-wise data keyed by page (`page_1`, ...), containing text, tables, and metadata such as footer notes
- `full_text`: Complete extracted text from all pages
- `Fields` (capital `F`): The fields requested via `key_fields`, mapped to their extracted values (populated when the `key_fields` parameter was provided)
- `stages`: Per-stage processing details (`time`, `status`, and `variation` for each stage)
- `errorMessage`: Error message if the extraction failed, otherwise `null`

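
`fields` and `Fields` differ only in case, so a typo silently returns the wrong data. The following sketch pulls the commonly needed parts out of a parsed response; `summarize` is a hypothetical helper. In the example response the stage times add up to `totalTime` (525 + 1925 + 700 + 350 = 3500 ms). The API does not promise that this always holds, so the sketch reports both numbers rather than asserting that they match.

```python
from typing import Any, Dict


def summarize(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the commonly used parts of a parsed /api/extract response."""
    if resp.get("status") != "completed":
        raise RuntimeError(f"extraction {resp.get('id')} failed: {resp.get('errorMessage')}")
    stages = resp.get("stages") or {}
    return {
        "key_fields": resp.get("Fields") or {},  # capital F: values for requested key_fields
        "pages": list(resp.get("fields") or {}),  # lowercase f: page keys, in response order
        "confidence": resp.get("confidence"),
        "total_ms": resp.get("totalTime"),
        "stage_ms": sum(stage.get("time", 0) for stage in stages.values()),
    }
```
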
## Error Handling

### Authentication Errors

**401 Unauthorized** - Invalid API key:

```json
{
  "detail": "Invalid API key"
}
```

**401 Unauthorized** - No authentication provided:

```json
{
  "detail": "Authentication required. Provide either a Bearer token or X-API-Key header."
}
```

### Validation Errors

**400 Bad Request** - File too large:

```json
{
  "detail": "File size exceeds 4 MB limit. Your file is 5.2 MB."
}
```

**400 Bad Request** - Invalid file type:

```json
{
  "detail": "Only PDF, PNG, JPG, and TIFF files are allowed."
}
```

### Processing Errors

**500 Internal Server Error** - Extraction failed:

```json
{
  "id": 123,
  "status": "failed",
  "confidence": 0.0,
  "fieldsExtracted": 0,
  "errorMessage": "OCR processing failed: ..."
}
```

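
The error shapes above can be normalized into exceptions: 400 and 401 responses carry `detail`, while failed extractions carry `errorMessage` plus a full record. The following is a sketch. The exception class names are hypothetical, and status codes other than 400, 401, and 5xx fall through to a generic `ApiError`.

```python
import json
from typing import Any, Dict


class ApiError(Exception):
    """Any non-2xx response; keeps the status code and the parsed body."""

    def __init__(self, status: int, message: str, body: Dict[str, Any]):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.body = body


class AuthError(ApiError):
    """401: invalid or missing credentials."""


class ValidationError(ApiError):
    """400: file too large or of an unsupported type."""


class ExtractionError(ApiError):
    """5xx: processing failed; `body` holds the failed extraction record."""


def parse_response(status: int, raw: bytes) -> Dict[str, Any]:
    """Return the parsed body for 2xx responses; otherwise raise a typed ApiError."""
    try:
        body = json.loads(raw) if raw else {}
    except ValueError:  # non-JSON body, e.g. from a proxy
        body = {"detail": raw.decode("utf-8", "replace")}
    if not isinstance(body, dict):
        body = {"detail": str(body)}
    if 200 <= status < 300:
        return body
    message = body.get("detail") or body.get("errorMessage") or "unknown error"
    if status == 401:
        raise AuthError(status, message, body)
    if status == 400:
        raise ValidationError(status, message, body)
    if status >= 500:
        raise ExtractionError(status, message, body)
    raise ApiError(status, message, body)
```
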
## Best Practices

1. **Store API Keys Securely**: Never commit API keys to version control. Use environment variables or a secret management system.
2. **Handle Errors Gracefully**: Check the HTTP status code first, then the `status` field in the response body. If `status` is `"failed"`, read `errorMessage` for details.
3. **Respect Rate Limits**: The API may enforce rate limits. If you receive a `429 Too Many Requests` response, retry with exponential backoff.
4. **Validate File Types**: Check the file type and size before uploading, so files the API would reject do not cost an API call.
5. **Use Specific Fields**: When you know which fields you need, pass them in the `key_fields` parameter for better accuracy and faster processing.
6. **Monitor API Key Usage**: Periodically list your keys via the `/api/auth/api-keys` endpoint and review each key's `last_used_at` timestamp to spot unused keys or unauthorized access.

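
Here is a minimal retry loop for item 3. It assumes `send` is a callable that performs one request and returns `(status_code, body)`. That way it rebuilds the upload body on every attempt, because a consumed file stream cannot be re-sent. Delays double from 1 s up to a 30 s cap, and "full jitter" picks a random delay within that bound to spread out retries. These numbers are arbitrary defaults. If the server sends a `Retry-After` header, honoring it would be preferable, but the API does not document whether it does. The function names are hypothetical.

```python
import random
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, bytes]], max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    jitter: bool = True) -> Tuple[int, bytes]:
    """Call `send`, retrying on HTTP 429 up to `max_retries` times."""
    attempt = 0
    while True:
        status, body = send()
        if status != 429 or attempt >= max_retries:
            return status, body  # success, another error, or out of retries
        delay = backoff_delay(attempt)
        if jitter:
            delay = random.uniform(0, delay)  # full jitter
        sleep(delay)
        attempt += 1
```

`sleep` is injectable so the loop can be tested without waiting.
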
## Security Notes

- API keys are hashed before they are stored in the database
- Only the key prefix is shown when listing API keys
- API keys can be deactivated (soft deleted) but not permanently deleted
- Each API key is tied to a specific user account
- API key usage is tracked with the `last_used_at` timestamp

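
The notes above describe behavior, not an implementation. Purely as an illustration of how hashed storage and prefix-only display usually fit together, a server might do something like the following. This is not necessarily how this API works. The `sk_live_` prefix and 11-character display prefix are taken from the examples in this document, and all function names are hypothetical. API keys are long random strings, so a fast hash such as SHA-256 is commonly considered adequate for them. Passwords are different and need a deliberately slow hash such as bcrypt.

```python
import hashlib
import hmac
import secrets


def generate_api_key(prefix: str = "sk_live_") -> str:
    """Create a random key with a readable prefix (format assumed from the examples)."""
    return prefix + secrets.token_urlsafe(32)


def hash_key(api_key: str) -> str:
    """What gets stored: a SHA-256 digest, never the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def display_prefix(api_key: str, visible: int = 11) -> str:
    """The non-secret part that a key listing can show, e.g. 'sk_live_abc...'."""
    return api_key[:visible] + "..."


def verify_key(presented: str, stored_hash: str) -> bool:
    """Compare the presented key's hash with the stored one in constant time."""
    return hmac.compare_digest(hash_key(presented), stored_hash)
```
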
## Support

For issues or questions:

1. Check the error message in the API response
2. Verify that your API key is active and correct
3. Ensure your file meets the requirements (type and size)
4. Check the API status endpoint: `GET /ping`

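
For monitoring, you can poll `GET /ping`. The following sketch assumes the endpoint needs no authentication and that any 2xx status means the service is up, because its response body is not documented. `api_is_up` is a hypothetical helper. `opener` is injectable so the check can be tested without network access.

```python
import urllib.error
import urllib.request


def api_is_up(base_url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if GET /ping answers with a 2xx status within `timeout` seconds."""
    try:
        with opener(f"{base_url}/ping", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError/HTTPError, connection errors, and timeouts
        return False
```
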