# External API Documentation
This document explains how to use the Document Parsing API from external applications using API key authentication.
## Table of Contents
1. [Overview](#overview)
2. [Authentication](#authentication)
3. [API Endpoints](#api-endpoints)
4. [Usage Examples](#usage-examples)
5. [Response Format](#response-format)
6. [Error Handling](#error-handling)
7. [Best Practices](#best-practices)
8. [Security Notes](#security-notes)
9. [Support](#support)
## Overview
The Document Parsing API allows external applications to extract text and structured data from PDF and image files. The API supports:
- **File Types**: PDF, PNG, JPEG, TIFF
- **Max File Size**: 4 MB
- **Authentication**: API Key (via `X-API-Key` header) or JWT Bearer token
- **Response Format**: JSON
## Authentication
### Step 1: Create an Account
Create an account using one of these methods:
1. **Firebase Authentication** (via web UI)
2. **OTP Authentication** (via API)
#### OTP Authentication Flow
```bash
# 1. Request OTP
curl -X POST https://your-api-url/api/auth/otp/request \
-H "Content-Type: application/json" \
-d '{
"email": "your-business-email@company.com"
}'
# Response:
# {
# "success": true,
# "message": "OTP sent to your email"
# }
# 2. Verify OTP and get JWT token
curl -X POST https://your-api-url/api/auth/otp/verify \
-H "Content-Type: application/json" \
-d '{
"email": "your-business-email@company.com",
"otp": "123456"
}'
# Response:
# {
# "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
# "user": { ... }
# }
```
**Note**: Only business email addresses are accepted; free providers such as Gmail and Yahoo are not.
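For clients that do not shell out to cURL, the same two OTP calls can be prepared in Python. This is a minimal sketch using only the standard library; `build_json_post` and `send` are hypothetical helper names, and `https://your-api-url` is the placeholder used throughout this document. The requests are built at import time but nothing is sent until `send` is called.

```python
import json
import urllib.request

API_BASE_URL = "https://your-api-url"  # placeholder, replace with your deployment


def build_json_post(path, payload, jwt_token=None):
    """Build (but do not send) a JSON POST request for an API path."""
    headers = {"Content-Type": "application/json"}
    if jwt_token:
        # Endpoints such as /api/auth/api-key/create expect a Bearer token
        headers["Authorization"] = f"Bearer {jwt_token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE_URL + path, data=body, headers=headers, method="POST"
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


email = "your-business-email@company.com"
otp_request = build_json_post("/api/auth/otp/request", {"email": email})
otp_verify = build_json_post("/api/auth/otp/verify", {"email": email, "otp": "123456"})
```

Calling `send(otp_request)` and then `send(otp_verify)["token"]` would mirror the two cURL commands above; the token can then be passed as `jwt_token` when creating an API key.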
### Step 2: Create an API Key
Once authenticated, create an API key for your external application:
```bash
# Create API key (requires JWT token from Step 1)
curl -X POST https://your-api-url/api/auth/api-key/create \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "My External App"
}'
# Response:
# {
# "success": true,
# "api_key": "sk_live_abc123...", # ⚠️ SAVE THIS - shown only once!
# "key_id": 1,
# "key_prefix": "sk_live_abc...",
# "name": "My External App",
# "created_at": "2024-01-15T10:30:00",
# "message": "API key created successfully. Store this key securely - it will not be shown again!"
# }
```
**⚠️ IMPORTANT**: The full API key is only shown once when created. Store it securely in your application's environment variables or secret management system.
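One way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes a variable named `DOC_PARSING_API_KEY`; that name, and the number of characters kept by `mask_key`, are illustrative choices rather than anything the API defines.

```python
import os


def load_api_key(env_var="DOC_PARSING_API_KEY"):
    """Read the API key from the environment and fail fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


def mask_key(key, visible=8):
    """Return a log-safe form of the key that reveals only its first characters."""
    return key[:visible] + "..."
```

Logging `mask_key(load_api_key())` instead of the key itself avoids leaking the secret into log files.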
### Step 3: Use API Key for Authentication
Use the API key in the `X-API-Key` header for all subsequent API calls:
```bash
curl -X POST https://your-api-url/api/extract \
-H "X-API-Key: sk_live_abc123..." \
-F "file=@document.pdf" \
-F "key_fields=Invoice Number,Invoice Date,Total Amount"
```
## API Endpoints
### 1. Document Extraction
**Endpoint**: `POST /api/extract`
**Authentication**:
- API Key: `X-API-Key: <your-api-key>`
- OR JWT: `Authorization: Bearer <jwt-token>`
**Parameters**:
- `file` (required): The document file (PDF, PNG, JPEG, TIFF)
- `key_fields` (optional): Comma-separated list of specific fields to extract
**Example Request**:
```bash
curl -X POST https://your-api-url/api/extract \
-H "X-API-Key: sk_live_abc123..." \
-F "file=@invoice.pdf" \
-F "key_fields=Invoice Number,Invoice Date,Total Amount,PO Number"
```
**Example Request (without `key_fields`)**:
```bash
curl -X POST https://your-api-url/api/extract \
-H "X-API-Key: sk_live_abc123..." \
-F "file=@/path/to/document.pdf"
```
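Because `key_fields` is a single comma-separated string, it can help to build it from a list in client code. The helper below is a hypothetical sketch: it trims whitespace, drops blanks and duplicates, and rejects names containing a comma, since this document does not describe any escaping for commas.

```python
def build_key_fields(field_names):
    """Join field names into the comma-separated form used by `key_fields`."""
    cleaned = []
    for name in field_names:
        name = name.strip()
        if not name:
            continue  # skip blank entries
        if "," in name:
            # Assumption: a comma inside a name cannot be escaped
            raise ValueError(f"field name must not contain a comma: {name!r}")
        if name not in cleaned:
            cleaned.append(name)  # keep first occurrence, preserve order
    return ",".join(cleaned)
```

For example, `build_key_fields(["Invoice Number", "Invoice Date", "Total Amount"])` yields the string used in the request above.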
### 2. List API Keys
**Endpoint**: `GET /api/auth/api-keys`
**Authentication**: JWT Bearer token (required)
**Example**:
```bash
curl -X GET https://your-api-url/api/auth/api-keys \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
```
**Response**:
```json
{
  "success": true,
  "api_keys": [
    {
      "id": 1,
      "name": "My External App",
      "key_prefix": "sk_live_abc...",
      "is_active": true,
      "last_used_at": "2024-01-15T14:30:00",
      "created_at": "2024-01-15T10:30:00"
    }
  ]
}
```
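The `last_used_at` and `is_active` fields make it possible to audit keys from a script. The following sketch flags active keys that have been idle for a while; the 90-day threshold is arbitrary, and it assumes timestamps are ISO 8601 strings as in the example above and that `last_used_at` may be `null` for a key that has never been used.

```python
from datetime import datetime, timedelta


def find_stale_keys(api_keys, now, max_idle_days=90):
    """Return IDs of active keys not used within `max_idle_days`."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for key in api_keys:
        if not key.get("is_active"):
            continue  # already deactivated
        last_used = key.get("last_used_at")
        # Assumption: a missing or null timestamp means the key was never used
        if last_used is None or datetime.fromisoformat(last_used) < cutoff:
            stale.append(key["id"])
    return stale
```

Passing the `api_keys` array from the response together with `datetime.now()` gives a list of candidates for the Delete API Key endpoint.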
### 3. Delete API Key
**Endpoint**: `DELETE /api/auth/api-key/{key_id}`
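**Authentication**: JWT Bearer token (required)
**Note**: As described under Security Notes, API keys are deactivated (soft deleted) rather than permanently removed.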
**Example**:
```bash
curl -X DELETE https://your-api-url/api/auth/api-key/1 \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
```
## Usage Examples
### Python Example
```python
import requests

# API configuration
API_BASE_URL = "https://your-api-url"
API_KEY = "sk_live_abc123..."  # Your API key


def extract_document(file_path, key_fields=None):
    """Upload a document to /api/extract and return the parsed JSON response."""
    url = f"{API_BASE_URL}/api/extract"
    headers = {
        "X-API-Key": API_KEY
    }
    # Keep the file open for the duration of the upload
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {}
        if key_fields:
            # Only send key_fields when specific fields were requested
            data['key_fields'] = key_fields
        response = requests.post(url, headers=headers, files=files, data=data)
    # Raise an exception for 4xx/5xx responses
    response.raise_for_status()
    return response.json()


# Usage
result = extract_document("invoice.pdf", key_fields="Invoice Number,Invoice Date,Total Amount")
print(result)
```
### JavaScript/Node.js Example
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// API Configuration
const API_BASE_URL = 'https://your-api-url';
const API_KEY = 'sk_live_abc123...'; // Your API key

// Extract document
async function extractDocument(filePath, keyFields = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  if (keyFields) {
    form.append('key_fields', keyFields);
  }
  try {
    const response = await axios.post(`${API_BASE_URL}/api/extract`, form, {
      headers: {
        'X-API-Key': API_KEY,
        ...form.getHeaders()
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
    throw error;
  }
}

// Usage
extractDocument('invoice.pdf', 'Invoice Number,Invoice Date,Total Amount')
  .then(result => console.log(result))
  .catch(error => console.error(error));
```
### PHP Example
```php
<?php
$apiBaseUrl = "https://your-api-url";
$apiKey = "sk_live_abc123..."; // Your API key

function extractDocument($filePath, $keyFields = null) {
    global $apiBaseUrl, $apiKey;
    $url = $apiBaseUrl . "/api/extract";
    $curl = curl_init();
    $postData = [
        'file' => new CURLFile($filePath)
    ];
    if ($keyFields) {
        $postData['key_fields'] = $keyFields;
    }
    curl_setopt_array($curl, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $postData,
        CURLOPT_HTTPHEADER => [
            "X-API-Key: " . $apiKey
        ]
    ]);
    $response = curl_exec($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    if ($httpCode !== 200) {
        throw new Exception("API request failed: " . $response);
    }
    return json_decode($response, true);
}

// Usage
try {
    $result = extractDocument("invoice.pdf", "Invoice Number,Invoice Date,Total Amount");
    print_r($result);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
?>
```
## Response Format
### Success Response
```json
{
  "id": 123,
  "fileName": "invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": "2.5 MB",
  "status": "completed",
  "confidence": 92.5,
  "fieldsExtracted": 15,
  "totalTime": 3500,
  "fields": {
    "page_1": {
      "text": "Extracted text from page 1...",
      "table": {
        "row_1": {
          "column_1": "value1",
          "column_2": "value2"
        }
      },
      "footer_notes": ["Note 1", "Note 2"]
    }
  },
  "full_text": "Complete extracted text from all pages...",
  "Fields": {
    "Invoice Number": "INV-001",
    "Invoice Date": "2024-01-15",
    "Total Amount": "$1,234.56"
  },
  "stages": {
    "uploading": {
      "time": 525,
      "status": "completed",
      "variation": "normal"
    },
    "aiAnalysis": {
      "time": 1925,
      "status": "completed",
      "variation": "normal"
    },
    "dataExtraction": {
      "time": 700,
      "status": "completed",
      "variation": "fast"
    },
    "outputRendering": {
      "time": 350,
      "status": "completed",
      "variation": "normal"
    }
  },
  "errorMessage": null
}
```
### Response Fields
- `id`: Extraction record ID
- `fileName`: Original filename
- `fileType`: MIME type of the file
- `fileSize`: File size as a human-readable string (for example, `"2.5 MB"`)
- `status`: "completed" or "failed"
- `confidence`: Extraction confidence (0-100)
- `fieldsExtracted`: Number of fields extracted
- `totalTime`: Total processing time in milliseconds
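- `fields`: Structured page-wise extraction results (tables, text, metadata), keyed by page (for example, `page_1`)
- `full_text`: Complete extracted text from all pages
- `Fields`: User-specified fields extracted (if the `key_fields` parameter was provided). Note the capital `F`: this key is distinct from the lowercase `fields`.
- `stages`: Time, status, and variation for each processing stage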
- `errorMessage`: Error message if extraction failed
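The fields above can be consumed with a small helper once the JSON has been parsed. This is a sketch, not part of the API: `ExtractionFailed`, `read_result`, and the 80.0 review threshold are hypothetical, and it assumes `Fields` may be absent when `key_fields` was not sent.

```python
class ExtractionFailed(Exception):
    """Raised when the API reports status other than "completed"."""


def read_result(result, min_confidence=80.0):
    """Return the requested fields plus a flag for low-confidence results."""
    if result.get("status") != "completed":
        raise ExtractionFailed(result.get("errorMessage") or "extraction failed")
    return {
        # "Fields" (capital F) holds the values requested via key_fields
        "fields": dict(result.get("Fields") or {}),
        # Assumption: results below the threshold deserve a manual check
        "needs_review": result.get("confidence", 0.0) < min_confidence,
    }
```

With the success response shown earlier, `read_result(response_json)["fields"]["Invoice Number"]` would give `"INV-001"`.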
## Error Handling
### Authentication Errors
**401 Unauthorized** - Invalid API key:
```json
{
  "detail": "Invalid API key"
}
```
**401 Unauthorized** - No authentication provided:
```json
{
  "detail": "Authentication required. Provide either a Bearer token or X-API-Key header."
}
```
### Validation Errors
**400 Bad Request** - File too large:
```json
{
  "detail": "File size exceeds 4 MB limit. Your file is 5.2 MB."
}
```
**400 Bad Request** - Invalid file type:
```json
{
  "detail": "Only PDF, PNG, JPG, and TIFF files are allowed."
}
```
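Both validation errors can usually be caught before uploading. The sketch below checks the extension and size locally; it assumes "4 MB" means 4 × 1024 × 1024 bytes and that file type can be judged by extension, neither of which this document confirms, so treat it as a pre-filter and not a replacement for the server's check.

```python
import os

# Assumption: the 4 MB limit is counted in binary megabytes
MAX_FILE_SIZE_BYTES = 4 * 1024 * 1024
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def check_upload(file_name, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(no extension)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.1f} MB, limit is 4 MB")
    return problems


def check_path(path):
    """Convenience wrapper that reads the name and size from disk."""
    return check_upload(os.path.basename(path), os.path.getsize(path))
```

Calling `check_path("invoice.pdf")` before the upload and skipping the request when the list is non-empty saves a round trip.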
### Processing Errors
**500 Internal Server Error** - Extraction failed:
```json
{
  "id": 123,
  "status": "failed",
  "confidence": 0.0,
  "fieldsExtracted": 0,
  "errorMessage": "OCR processing failed: ..."
}
```
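Since authentication and validation errors carry their message in `detail` while failed extractions use `errorMessage`, a client may want one place that normalizes both. The function below is a hypothetical sketch; the category names are made up for illustration, and the 429 entry only applies if rate limiting is implemented.

```python
def describe_error(status_code, body):
    """Return (category, message) for a non-2xx response from the API."""
    categories = {
        400: "validation",
        401: "authentication",
        429: "rate_limit",  # only relevant if rate limiting is implemented
        500: "processing",
    }
    category = categories.get(status_code, "unknown")
    message = None
    if isinstance(body, dict):
        # Auth and validation errors use "detail"; failed extractions use "errorMessage"
        message = body.get("detail") or body.get("errorMessage")
    # Fall back to the status code when the body carries no message
    return category, message or f"HTTP {status_code}"
```

The returned category can drive the next step: fix credentials for `authentication`, fix the file for `validation`, and consider a retry or a support request for `processing`.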
## Best Practices
1. **Store API Keys Securely**: Never commit API keys to version control. Use environment variables or secret management systems.
2. **Handle Errors Gracefully**: Always check the `status` field in the response. If `status` is "failed", check `errorMessage` for details.
3. **Respect Rate Limits**: If rate limiting is implemented, handle 429 responses appropriately with exponential backoff.
4. **Validate File Types**: Check file type and size before uploading to avoid unnecessary API calls.
5. **Use Specific Fields**: When you know what fields to extract, use the `key_fields` parameter for better accuracy and faster processing.
6. **Monitor API Key Usage**: Regularly check your API keys via the `/api/auth/api-keys` endpoint to monitor usage and detect unauthorized access.
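The exponential backoff mentioned in item 3 can be sketched as a wrapper around whatever function performs the request. Everything here is an assumption for illustration: `send` is a caller-supplied function returning `(status_code, body)`, and the attempt count, delays, and jitter are arbitrary defaults, since this document does not specify rate-limit behavior.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns HTTP 429."""
    for attempt in range(max_attempts):
        status_code, body = send()
        # Return on any non-429 response, or when attempts are exhausted
        if status_code != 429 or attempt == max_attempts - 1:
            return status_code, body
        # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 50% random jitter so parallel clients do not retry in lockstep
        sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use it can be left at its default.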
## Security Notes
- API keys are hashed before storage in the database
- Only the key prefix is shown when listing API keys
- API keys can be deactivated (soft deleted) but not permanently deleted
- Each API key is tied to a specific user account
- API key usage is tracked with `last_used_at` timestamp
## Support
For issues or questions:
1. Check the error message in the API response
2. Verify your API key is active and correct
3. Ensure your file meets the requirements (type, size)
4. Check the API status endpoint: `GET /ping`