External API Documentation
This document explains how to use the Document Parsing API from external applications using API key authentication.
Overview
The Document Parsing API allows external applications to extract text and structured data from PDF and image files. The API supports:
- File Types: PDF, PNG, JPEG, TIFF
- Max File Size: 4 MB
- Authentication: API key (via the `X-API-Key` header) or JWT Bearer token
- Response Format: JSON
Authentication
Step 1: Create an Account
First, you need to create an account using one of these methods:
- Firebase Authentication (via web UI)
- OTP Authentication (via API)
OTP Authentication Flow
```bash
# 1. Request OTP
curl -X POST https://your-api-url/api/auth/otp/request \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com"
  }'

# Response:
# {
#   "success": true,
#   "message": "OTP sent to your email"
# }

# 2. Verify OTP and get JWT token
curl -X POST https://your-api-url/api/auth/otp/verify \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com",
    "otp": "123456"
  }'

# Response:
# {
#   "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
#   "user": { ... }
# }
```
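For scripting, the two OTP calls can be wrapped in Python. This is a minimal sketch, assuming the endpoint paths and JSON fields shown in the curl examples; `session` is any object with a `requests`-style `post(url, json=..., timeout=...)` method (for example `requests.Session()`), which keeps the sketch testable without a network. The function names and the 30-second timeout are illustrative choices, not part of the API.

```python
def request_otp(session, base_url, email):
    """Ask the API to email a one-time password to `email`."""
    resp = session.post(
        f"{base_url}/api/auth/otp/request",
        json={"email": email},
        timeout=30,  # seconds; an arbitrary choice for this sketch
    )
    resp.raise_for_status()  # surface HTTP errors (e.g. a rejected email)
    return resp.json()


def verify_otp(session, base_url, email, otp):
    """Exchange the emailed OTP for a JWT and return the token string."""
    resp = session.post(
        f"{base_url}/api/auth/otp/verify",
        json={"email": email, "otp": otp},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["token"]
```

In practice you would call `request_otp`, read the emailed code from the user, then pass it to `verify_otp` to obtain the JWT needed for Step 2.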
Note: Only business email addresses are accepted; free webmail addresses (Gmail, Yahoo, etc.) are not allowed.
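To fail fast before calling the API, a client can pre-screen addresses. The sketch below is hypothetical: the server's actual list of rejected domains is not documented, so `FREE_EMAIL_DOMAINS` is an illustrative assumption and the server remains the final authority.

```python
# Illustrative assumption: the server's real blocklist is not documented.
FREE_EMAIL_DOMAINS = {
    "gmail.com", "googlemail.com", "yahoo.com", "hotmail.com",
    "outlook.com", "live.com", "aol.com", "icloud.com",
}


def looks_like_business_email(email: str) -> bool:
    """Return True if `email` is well-formed and not on the free-mail list."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or "." not in domain:
        return False  # missing "@", empty local part, or no dot in the domain
    return domain.lower() not in FREE_EMAIL_DOMAINS
```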
Step 2: Create an API Key
Once authenticated, create an API key for your external application:
```bash
# Create API key (requires JWT token from Step 1)
curl -X POST https://your-api-url/api/auth/api-key/create \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My External App"
  }'

# Response:
# {
#   "success": true,
#   "api_key": "sk_live_abc123...",   # ⚠️ SAVE THIS - shown only once!
#   "key_id": 1,
#   "key_prefix": "sk_live_abc...",
#   "name": "My External App",
#   "created_at": "2024-01-15T10:30:00",
#   "message": "API key created successfully. Store this key securely - it will not be shown again!"
# }
```
⚠️ IMPORTANT: The full API key is only shown once when created. Store it securely in your application's environment variables or secret management system.
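One way to follow this advice in Python is to read the key from the environment at startup rather than hard-coding it. `DOC_PARSING_API_KEY` is a hypothetical variable name; substitute whatever your deployment or secret manager provides.

```python
import os


def load_api_key(env_var: str = "DOC_PARSING_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail loudly at startup instead of sending unauthenticated requests
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```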
Step 3: Use API Key for Authentication
Use the API key in the X-API-Key header for all subsequent API calls:
```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@document.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount"
```
API Endpoints
1. Document Extraction
Endpoint: POST /api/extract
Authentication (use either):
- API key: `X-API-Key: <your-api-key>`
- JWT: `Authorization: Bearer <jwt-token>`
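When a client may hold either credential, a small helper can build the matching header. This sketch (hypothetical helper name `auth_headers`) requires exactly one credential so it is never ambiguous which one the server will honor; the documentation does not say what happens if both headers are sent.

```python
def auth_headers(api_key=None, jwt_token=None):
    """Build the authentication header for /api/extract from one credential."""
    if (api_key is None) == (jwt_token is None):
        raise ValueError("Provide exactly one of api_key or jwt_token")
    if api_key is not None:
        return {"X-API-Key": api_key}
    return {"Authorization": f"Bearer {jwt_token}"}
```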
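Parameters (sent as `multipart/form-data` fields):
- `file` (required): The document file (PDF, PNG, JPEG, or TIFF; max 4 MB)
- `key_fields` (optional): Comma-separated list of specific fields to extract, e.g. `Invoice Number,Invoice Date,Total Amount`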
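If field names come from a list in your code, they need to be joined into the single comma-separated string that `key_fields` expects. This sketch assumes the server simply splits on commas with no escaping mechanism (none is documented), so a name containing a comma is rejected rather than silently split in two.

```python
def format_key_fields(names):
    """Join field names into a comma-separated `key_fields` value."""
    # Trim whitespace and drop empty entries
    cleaned = [n.strip() for n in names if n and n.strip()]
    for name in cleaned:
        if "," in name:
            raise ValueError(f"Field name contains a comma: {name!r}")
    return ",".join(cleaned)
```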
Example request:
```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@invoice.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount,PO Number"
```
Minimal example (file only, no `key_fields`):
```bash
curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@/path/to/document.pdf"
```
2. List API Keys
Endpoint: GET /api/auth/api-keys
Authentication: JWT Bearer token (required)
Example:
```bash
curl -X GET https://your-api-url/api/auth/api-keys \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```
Response:
```json
{
  "success": true,
  "api_keys": [
    {
      "id": 1,
      "name": "My External App",
      "key_prefix": "sk_live_abc...",
      "is_active": true,
      "last_used_at": "2024-01-15T14:30:00",
      "created_at": "2024-01-15T10:30:00"
    }
  ]
}
```
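The listing can drive a periodic audit of unused keys. A sketch, assuming timestamps are naive ISO-8601 strings as in the sample and that `last_used_at` may be `null` for a key that has never been used (in which case it falls back to `created_at`); the function name and the 90-day threshold are arbitrary examples.

```python
from datetime import datetime, timedelta


def stale_keys(api_keys, now, max_idle=timedelta(days=90)):
    """Return names of active keys not used for longer than `max_idle`."""
    stale = []
    for key in api_keys:
        if not key.get("is_active"):
            continue  # already deactivated; nothing to review
        last = key.get("last_used_at") or key.get("created_at")
        if now - datetime.fromisoformat(last) > max_idle:
            stale.append(key["name"])
    return stale
```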
3. Delete API Key
Endpoint: DELETE /api/auth/api-key/{key_id}
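Authentication: JWT Bearer token (required)
Deleting a key deactivates it (soft delete) rather than permanently removing it; see Security Notes.
Example:
```bash
curl -X DELETE https://your-api-url/api/auth/api-key/1 \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```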
Usage Examples
Python Example
```python
import requests

# API Configuration
API_BASE_URL = "https://your-api-url"
API_KEY = "sk_live_abc123..."  # Your API key (load it from an environment variable in production)


# Extract document
def extract_document(file_path, key_fields=None):
    url = f"{API_BASE_URL}/api/extract"
    headers = {
        "X-API-Key": API_KEY
    }
    data = {}
    if key_fields:
        data["key_fields"] = key_fields  # comma-separated field names
    # The file must stay open while the multipart request is sent
    with open(file_path, "rb") as f:
        files = {"file": f}
        response = requests.post(url, headers=headers, files=files, data=data)
    response.raise_for_status()  # raises on 4xx/5xx, including failed extractions (500)
    return response.json()


# Usage
result = extract_document("invoice.pdf", key_fields="Invoice Number,Invoice Date,Total Amount")
print(result)
```
JavaScript/Node.js Example
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// API Configuration
const API_BASE_URL = 'https://your-api-url';
const API_KEY = 'sk_live_abc123...'; // Your API key

// Extract document
async function extractDocument(filePath, keyFields = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  if (keyFields) {
    form.append('key_fields', keyFields);
  }

  try {
    const response = await axios.post(`${API_BASE_URL}/api/extract`, form, {
      headers: {
        'X-API-Key': API_KEY,
        ...form.getHeaders() // multipart Content-Type with boundary
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
    throw error;
  }
}

// Usage
extractDocument('invoice.pdf', 'Invoice Number,Invoice Date,Total Amount')
  .then(result => console.log(result))
  .catch(error => console.error(error));
```
PHP Example
```php
<?php
$apiBaseUrl = "https://your-api-url";
$apiKey = "sk_live_abc123..."; // Your API key

function extractDocument($filePath, $keyFields = null) {
    global $apiBaseUrl, $apiKey;
    $url = $apiBaseUrl . "/api/extract";
    $curl = curl_init();

    // An array with a CURLFile makes cURL send multipart/form-data
    $postData = [
        'file' => new CURLFile($filePath)
    ];
    if ($keyFields) {
        $postData['key_fields'] = $keyFields;
    }

    curl_setopt_array($curl, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $postData,
        CURLOPT_HTTPHEADER => [
            "X-API-Key: " . $apiKey
        ]
    ]);

    $response = curl_exec($curl);
    $curlError = curl_error($curl); // transport-level failure (DNS, TLS, timeout)
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($response === false) {
        throw new Exception("Request failed: " . $curlError);
    }
    if ($httpCode !== 200) {
        throw new Exception("API request failed (HTTP $httpCode): " . $response);
    }
    return json_decode($response, true);
}

// Usage
try {
    $result = extractDocument("invoice.pdf", "Invoice Number,Invoice Date,Total Amount");
    print_r($result);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
?>
```
Response Format
Success Response
```json
{
  "id": 123,
  "fileName": "invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": "2.5 MB",
  "status": "completed",
  "confidence": 92.5,
  "fieldsExtracted": 15,
  "totalTime": 3500,
  "fields": {
    "page_1": {
      "text": "Extracted text from page 1...",
      "table": {
        "row_1": {
          "column_1": "value1",
          "column_2": "value2"
        }
      },
      "footer_notes": ["Note 1", "Note 2"]
    }
  },
  "full_text": "Complete extracted text from all pages...",
  "Fields": {
    "Invoice Number": "INV-001",
    "Invoice Date": "2024-01-15",
    "Total Amount": "$1,234.56"
  },
  "stages": {
    "uploading": {
      "time": 525,
      "status": "completed",
      "variation": "normal"
    },
    "aiAnalysis": {
      "time": 1925,
      "status": "completed",
      "variation": "normal"
    },
    "dataExtraction": {
      "time": 700,
      "status": "completed",
      "variation": "fast"
    },
    "outputRendering": {
      "time": 350,
      "status": "completed",
      "variation": "normal"
    }
  },
  "errorMessage": null
}
```
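In the sample response the stage times add up exactly to `totalTime` (525 + 1925 + 700 + 350 = 3500 ms), which suggests `stages` is a breakdown of the total. A small hypothetical helper for spotting the slowest stage, assuming every stage object carries a numeric `time` in milliseconds:

```python
def stage_summary(result):
    """Return (slowest_stage_name, sum_of_stage_times_ms) for a result dict."""
    stages = result.get("stages") or {}
    if not stages:
        return None, 0
    slowest = max(stages, key=lambda name: stages[name]["time"])
    total = sum(stage["time"] for stage in stages.values())
    return slowest, total
```

Comparing the returned sum with `totalTime` for your own documents shows whether the breakdown holds beyond the sample.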
Response Fields
- `id`: Extraction record ID
- `fileName`: Original filename
- `fileType`: MIME type of the file
- `fileSize`: File size as a human-readable string (e.g. "2.5 MB")
- `status`: "completed" or "failed"
- `confidence`: Extraction confidence (0-100)
- `fieldsExtracted`: Number of fields extracted
- `totalTime`: Total processing time in milliseconds
- `fields`: Structured, page-wise extraction data (tables, text, metadata)
- `full_text`: Complete extracted text from all pages
- `Fields` (capital F, distinct from `fields`): The user-specified fields, present if the `key_fields` parameter was provided
- `stages`: Per-stage processing timings in milliseconds
- `errorMessage`: Error message if extraction failed, otherwise `null`
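The page-wise `table` object uses numbered keys rather than arrays. A sketch that converts it into a list of rows, assuming the `row_N`/`column_N` naming seen in the sample response (other table layouts are not documented); keys are sorted numerically so that `row_10` follows `row_9` instead of `row_1`.

```python
import re


def _index(key):
    """Numeric suffix of keys like 'row_2' or 'column_10' (0 if absent)."""
    match = re.search(r"(\d+)$", key)
    return int(match.group(1)) if match else 0


def table_rows(page):
    """Convert a page's `table` mapping into a list of rows of cell values."""
    table = page.get("table") or {}
    rows = []
    for row_key in sorted(table, key=_index):
        row = table[row_key]
        rows.append([row[col] for col in sorted(row, key=_index)])
    return rows
```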
Error Handling
Authentication Errors
401 Unauthorized - Invalid API key:
```json
{
  "detail": "Invalid API key"
}
```
401 Unauthorized - No authentication provided:
```json
{
  "detail": "Authentication required. Provide either a Bearer token or X-API-Key header."
}
```
Validation Errors
400 Bad Request - File too large:
```json
{
  "detail": "File size exceeds 4 MB limit. Your file is 5.2 MB."
}
```
400 Bad Request - Invalid file type:
```json
{
  "detail": "Only PDF, PNG, JPG, and TIFF files are allowed."
}
```
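Both errors can be avoided with a client-side check before uploading. This sketch mirrors the limits in the Overview and reads "4 MB" as 4,000,000 bytes, the stricter of the decimal and binary interpretations, so a file that passes here should not trip the server limit. Checking the file extension is an assumption about how to approximate the server's type check, which may inspect file contents instead; `validate_upload` is a hypothetical name.

```python
import os

MAX_FILE_BYTES = 4_000_000  # stricter reading of "4 MB"
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def validate_upload(path):
    """Raise ValueError if `path` is likely to be rejected with a 400."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type {ext or '(none)'}; use PDF, PNG, JPEG, or TIFF")
    size = os.path.getsize(path)  # only checked once the type is acceptable
    if size > MAX_FILE_BYTES:
        raise ValueError(f"File is {size / 1_000_000:.1f} MB; the limit is 4 MB")
```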
Processing Errors
500 Internal Server Error - Extraction failed:
```json
{
  "id": 123,
  "status": "failed",
  "confidence": 0.0,
  "fieldsExtracted": 0,
  "errorMessage": "OCR processing failed: ..."
}
```
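A client can fold the error shapes above into one exception type. A sketch (the names `ApiError` and `check_extraction` are hypothetical) that takes the HTTP status code and the decoded JSON body; it also treats an HTTP 200 whose `status` is not "completed" as a failure, in line with the advice to always check `status`.

```python
class ApiError(Exception):
    """Raised for any unsuccessful Document Parsing API response."""

    def __init__(self, status_code, message, body=None):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.body = body


def check_extraction(status_code, body):
    """Return `body` if the extraction completed, otherwise raise ApiError."""
    body = body or {}  # tolerate an empty body
    if status_code == 200 and body.get("status") == "completed":
        return body
    # 400/401 errors use `detail`; failed extractions use `errorMessage`
    message = body.get("detail") or body.get("errorMessage") or "unknown error"
    raise ApiError(status_code, message, body)
```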
Best Practices
1. Store API Keys Securely: Never commit API keys to version control. Use environment variables or a secret management system.
2. Handle Errors Gracefully: Always check the `status` field in the response. If `status` is "failed", check `errorMessage` for details.
3. Respect Rate Limits: If the API responds with 429 (Too Many Requests), wait and retry with exponential backoff.
4. Validate Files Before Uploading: Check file type and size client-side to avoid requests that will be rejected.
5. Use Specific Fields: When you know which fields you need, pass them in the `key_fields` parameter for better accuracy and faster processing.
6. Monitor API Key Usage: Regularly list your API keys via the `/api/auth/api-keys` endpoint and review `last_used_at` to spot unused keys or unauthorized access.
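The rate-limit advice can be sketched as a retry loop. `send` is a hypothetical zero-argument callable that performs one request and returns an object with a `status_code` (for example, a function wrapping `requests.post`); `sleep` is injectable so the loop can be tested without waiting. The sketch does not read a `Retry-After` header, since the API does not document one.

```python
import random
import time


def send_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    Waits base_delay * 2**attempt seconds plus up to 10% random jitter
    between attempts and returns the last response once it is not 429
    or the retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, 0.1 * delay))
```

If `send` uploads a file, open the file (or rebuild the form data) inside `send`: a file object that an earlier attempt already read to the end would upload an empty body on retry.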
Security Notes
- API keys are hashed before storage in the database
- Only the key prefix is shown when listing API keys
- API keys can be deactivated (soft deleted) but not permanently deleted
- Each API key is tied to a specific user account
- API key usage is tracked with the `last_used_at` timestamp
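How the server hashes keys is not documented. The sketch below only illustrates the general technique these notes describe: the full key is returned once, the server keeps a hash plus a short display prefix, and later requests are verified by hashing the presented key. SHA-256, the `issue_key`/`verify_key` names, and the 11-character prefix (matching `sk_live_abc...` in the examples) are assumptions. A fast hash is generally considered acceptable here because keys are long random strings, unlike user-chosen passwords.

```python
import hashlib
import hmac
import secrets


def issue_key(prefix="sk_live_"):
    """Generate a key; return (full_key, display_prefix, stored_hash)."""
    full_key = prefix + secrets.token_urlsafe(32)  # shown to the user once
    display_prefix = full_key[:11] + "..."         # safe to list later
    stored_hash = hashlib.sha256(full_key.encode()).hexdigest()
    return full_key, display_prefix, stored_hash


def verify_key(presented, stored_hash):
    """Compare a presented key against the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```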
Support
For issues or questions:
- Check the error message in the API response
- Verify your API key is active and correct
- Ensure your file meets the requirements (type, size)
- Check the API status endpoint: `GET /ping`