
External API Documentation

This document explains how to use the Document Parsing API from external applications using API key authentication.

Table of Contents

  1. Overview
  2. Authentication
  3. API Endpoints
  4. Usage Examples
  5. Response Format
  6. Error Handling
  7. Best Practices
  8. Security Notes
  9. Support

Overview

The Document Parsing API allows external applications to extract text and structured data from PDF and image files. The API supports:

  • File Types: PDF, PNG, JPEG, TIFF
  • Max File Size: 4 MB
  • Authentication: API Key (via X-API-Key header) or JWT Bearer token
  • Response Format: JSON

Authentication

Step 1: Create an Account

Create an account using one of these methods:

  1. Firebase Authentication (via web UI)
  2. OTP Authentication (via API)

OTP Authentication Flow

# 1. Request OTP
curl -X POST https://your-api-url/api/auth/otp/request \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com"
  }'

# Response:
# {
#   "success": true,
#   "message": "OTP sent to your email"
# }

# 2. Verify OTP and get JWT token
curl -X POST https://your-api-url/api/auth/otp/verify \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-business-email@company.com",
    "otp": "123456"
  }'

# Response:
# {
#   "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
#   "user": { ... }
# }

Note: Only business email addresses are accepted. Addresses from free providers (Gmail, Yahoo, etc.) are rejected.
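
The same two-step OTP flow can be scripted in Python. The sketch below uses only the standard library and assumes the endpoints return the JSON shapes shown in the cURL comments above. `API_BASE_URL` is the same placeholder used in the cURL examples, and the helper names (`build_json_post`, `send`, `request_otp`, `verify_otp`) are illustrative, not part of the API.

```python
import json
import urllib.request

API_BASE_URL = "https://your-api-url"  # placeholder, as in the cURL examples


def build_json_post(path, payload):
    """Build (but do not send) a JSON POST request for the given API path."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def request_otp(email):
    """Step 1: ask the API to email a one-time password."""
    return send(build_json_post("/api/auth/otp/request", {"email": email}))


def verify_otp(email, otp):
    """Step 2: exchange the OTP for a JWT and return the token string."""
    body = send(build_json_post("/api/auth/otp/verify", {"email": email, "otp": otp}))
    return body["token"]
```

A typical session would call `request_otp(email)`, wait for the user to read the code from their inbox, then call `verify_otp(email, code)` and keep the returned token for Step 2.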

Step 2: Create an API Key

Once authenticated, create an API key for your external application:

# Create API key (requires JWT token from Step 1)
curl -X POST https://your-api-url/api/auth/api-key/create \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My External App"
  }'

# Response:
# {
#   "success": true,
#   "api_key": "sk_live_abc123...",  # ⚠️ SAVE THIS - shown only once!
#   "key_id": 1,
#   "key_prefix": "sk_live_abc...",
#   "name": "My External App",
#   "created_at": "2024-01-15T10:30:00",
#   "message": "API key created successfully. Store this key securely - it will not be shown again!"
# }

⚠️ IMPORTANT: The full API key is only shown once when created. Store it securely in your application's environment variables or secret management system.
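
One way to follow this advice is to read the key from an environment variable at startup rather than embedding it in source code. This is a minimal sketch: the variable name `DOCPARSE_API_KEY` and both helper functions are hypothetical, so choose whatever fits your deployment.

```python
import os

API_KEY_ENV_VAR = "DOCPARSE_API_KEY"  # hypothetical name; pick your own


def load_api_key(environ=os.environ):
    """Return the API key from the environment, failing loudly if unset."""
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before calling the API")
    return key


def auth_headers(environ=os.environ):
    """Build the X-API-Key header used for API-key authentication."""
    return {"X-API-Key": load_api_key(environ)}
```

Failing at startup when the variable is missing tends to be easier to diagnose than a 401 response later on.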

Step 3: Use API Key for Authentication

Use the API key in the X-API-Key header for all subsequent API calls:

curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@document.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount"

API Endpoints

1. Document Extraction

Endpoint: POST /api/extract

Authentication:

  • API Key: X-API-Key: <your-api-key>
  • OR JWT: Authorization: Bearer <jwt-token>

Parameters:

  • file (required): The document file (PDF, PNG, JPEG, TIFF)
  • key_fields (optional): Comma-separated list of specific fields to extract
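
Because key_fields is a single comma-separated string, it can help to build it from a list in code. The helper below is a sketch with a hypothetical name (`build_key_fields`). It rejects names containing a comma, since this document does not say how the API would treat them.

```python
def build_key_fields(field_names):
    """Join field names into the comma-separated key_fields value."""
    cleaned = [name.strip() for name in field_names if name.strip()]
    for name in cleaned:
        if "," in name:
            # The separator is a comma, so a comma inside a name would be
            # read as two fields. How the API treats that is not documented.
            raise ValueError(f"Field name contains a comma: {name!r}")
    return ",".join(cleaned)


print(build_key_fields(["Invoice Number", " Invoice Date", "Total Amount"]))
# prints: Invoice Number,Invoice Date,Total Amount
```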

Example Request:

curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@invoice.pdf" \
  -F "key_fields=Invoice Number,Invoice Date,Total Amount,PO Number"

Example Request (without key_fields):

curl -X POST https://your-api-url/api/extract \
  -H "X-API-Key: sk_live_abc123..." \
  -F "file=@/path/to/document.pdf"

2. List API Keys

Endpoint: GET /api/auth/api-keys

Authentication: JWT Bearer token (required)

Example:

curl -X GET https://your-api-url/api/auth/api-keys \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Response:

{
  "success": true,
  "api_keys": [
    {
      "id": 1,
      "name": "My External App",
      "key_prefix": "sk_live_abc...",
      "is_active": true,
      "last_used_at": "2024-01-15T14:30:00",
      "created_at": "2024-01-15T10:30:00"
    }
  ]
}
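
The last_used_at field makes it possible to spot keys that are still active but no longer in use. The sketch below works on a response shaped like the one above. The function name, the 30-day threshold, and the handling of a null last_used_at for never-used keys are assumptions, and timestamps are treated as naive (no timezone), as in the sample.

```python
from datetime import datetime, timedelta


def find_stale_keys(listing, now, max_idle=timedelta(days=30)):
    """Return names of active keys that are unused or idle past max_idle."""
    stale = []
    for key in listing.get("api_keys", []):
        if not key.get("is_active"):
            continue  # already deactivated
        last_used = key.get("last_used_at")
        # Assumption: a key that was never used reports null here.
        if last_used is None or now - datetime.fromisoformat(last_used) > max_idle:
            stale.append(key["name"])
    return stale
```

Pass the parsed JSON body and the current time, for example `find_stale_keys(listing, datetime.now())`, and review the returned names before removing any keys.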

3. Delete API Key

Endpoint: DELETE /api/auth/api-key/{key_id}

Authentication: JWT Bearer token (required)

Example:

curl -X DELETE https://your-api-url/api/auth/api-key/1 \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Usage Examples

Python Example

import requests

# API Configuration
API_BASE_URL = "https://your-api-url"
API_KEY = "sk_live_abc123..."  # Your API key

# Extract document
def extract_document(file_path, key_fields=None):
    url = f"{API_BASE_URL}/api/extract"
    headers = {
        "X-API-Key": API_KEY
    }
    
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {}
        if key_fields:
            data['key_fields'] = key_fields
        
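        # Set a timeout so a stalled connection cannot hang the caller forever
        response = requests.post(url, headers=headers, files=files, data=data, timeout=60)
        response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx
        return response.json()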

# Usage
result = extract_document("invoice.pdf", key_fields="Invoice Number,Invoice Date,Total Amount")
print(result)

JavaScript/Node.js Example

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// API Configuration
const API_BASE_URL = 'https://your-api-url';
const API_KEY = 'sk_live_abc123...'; // Your API key

// Extract document
async function extractDocument(filePath, keyFields = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  if (keyFields) {
    form.append('key_fields', keyFields);
  }

  try {
    const response = await axios.post(`${API_BASE_URL}/api/extract`, form, {
      headers: {
        'X-API-Key': API_KEY,
        ...form.getHeaders()
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
    throw error;
  }
}

// Usage
extractDocument('invoice.pdf', 'Invoice Number,Invoice Date,Total Amount')
  .then(result => console.log(result))
  .catch(error => console.error(error));

PHP Example

<?php

$apiBaseUrl = "https://your-api-url";
$apiKey = "sk_live_abc123..."; // Your API key

function extractDocument($filePath, $keyFields = null) {
    global $apiBaseUrl, $apiKey;
    
    $url = $apiBaseUrl . "/api/extract";
    
    $curl = curl_init();
    
    $postData = [
        'file' => new CURLFile($filePath)
    ];
    
    if ($keyFields) {
        $postData['key_fields'] = $keyFields;
    }
    
    curl_setopt_array($curl, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $postData,
        CURLOPT_HTTPHEADER => [
            "X-API-Key: " . $apiKey
        ]
    ]);
    
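    $response = curl_exec($curl);
    $curlError = curl_error($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    
    // curl_exec returns false on transport failure (DNS, TLS, timeout)
    if ($response === false) {
        throw new Exception("Request could not be sent: " . $curlError);
    }
    
    if ($httpCode !== 200) {
        throw new Exception("API request failed (HTTP " . $httpCode . "): " . $response);
    }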
    
    return json_decode($response, true);
}

// Usage
try {
    $result = extractDocument("invoice.pdf", "Invoice Number,Invoice Date,Total Amount");
    print_r($result);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
?>

Response Format

Success Response

{
  "id": 123,
  "fileName": "invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": "2.5 MB",
  "status": "completed",
  "confidence": 92.5,
  "fieldsExtracted": 15,
  "totalTime": 3500,
  "fields": {
    "page_1": {
      "text": "Extracted text from page 1...",
      "table": {
        "row_1": {
          "column_1": "value1",
          "column_2": "value2"
        }
      },
      "footer_notes": ["Note 1", "Note 2"]
    }
  },
  "full_text": "Complete extracted text from all pages...",
  "Fields": {
    "Invoice Number": "INV-001",
    "Invoice Date": "2024-01-15",
    "Total Amount": "$1,234.56"
  },
  "stages": {
    "uploading": {
      "time": 525,
      "status": "completed",
      "variation": "normal"
    },
    "aiAnalysis": {
      "time": 1925,
      "status": "completed",
      "variation": "normal"
    },
    "dataExtraction": {
      "time": 700,
      "status": "completed",
      "variation": "fast"
    },
    "outputRendering": {
      "time": 350,
      "status": "completed",
      "variation": "normal"
    }
  },
  "errorMessage": null
}

Response Fields

  • id: Extraction record ID
  • fileName: Original filename
  • fileType: MIME type of the file
  • fileSize: File size as a human-readable string (for example, "2.5 MB")
  • status: "completed" or "failed"
  • confidence: Extraction confidence (0-100)
  • fieldsExtracted: Number of fields extracted
  • totalTime: Total processing time in milliseconds
  • fields (lowercase): Page-by-page structured data (text, tables, and metadata such as footer notes)
  • full_text: Complete extracted text from all pages
  • Fields (capital F): Values for the user-specified fields (returned if the key_fields parameter was provided)
  • stages: Processing stage timings
  • errorMessage: Error message if extraction failed
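
To make the field list concrete, the sketch below condenses a response into the parts most callers need. The function name and the shape of the returned summary are illustrative. It also assumes that "Fields" may be missing when key_fields was not sent, which this document does not state explicitly.

```python
def summarize_result(result):
    """Condense an /api/extract response into a small summary dict."""
    if result.get("status") != "completed":
        raise RuntimeError(result.get("errorMessage") or "Extraction failed")
    stages = result.get("stages", {})
    return {
        "confidence": result.get("confidence"),
        # "Fields" (capital F) holds the key_fields results. Assumption: it
        # may be absent when key_fields was not sent, so default to {}.
        "requested": result.get("Fields") or {},
        # "fields" (lowercase) is keyed by page, e.g. "page_1".
        "pages": list(result.get("fields", {})),
        # Name of the stage with the largest "time" value, if any.
        "slowest_stage": max(stages, key=lambda s: stages[s]["time"], default=None),
    }
```

Applied to the success response shown earlier, the slowest stage would be aiAnalysis, since its time of 1925 ms is the largest of the four.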

Error Handling

Authentication Errors

401 Unauthorized - Invalid API key:

{
  "detail": "Invalid API key"
}

401 Unauthorized - No authentication provided:

{
  "detail": "Authentication required. Provide either a Bearer token or X-API-Key header."
}

Validation Errors

400 Bad Request - File too large:

{
  "detail": "File size exceeds 4 MB limit. Your file is 5.2 MB."
}

400 Bad Request - Invalid file type:

{
  "detail": "Only PDF, PNG, JPG, and TIFF files are allowed."
}
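
Both validation errors above can usually be caught before uploading. The check below is a sketch: it judges the file type by extension only (the server may inspect content instead), and it assumes the 4 MB limit means 4 × 1024 × 1024 bytes. The constant and function names are illustrative.

```python
import os

MAX_FILE_BYTES = 4 * 1024 * 1024  # assumes the 4 MB limit means 4 MiB
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def check_upload(path):
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"Unsupported file type: {extension or '(none)'}")
    size = os.path.getsize(path)  # raises OSError if the file is missing
    if size > MAX_FILE_BYTES:
        problems.append(f"File is {size / (1024 * 1024):.1f} MB; limit is 4 MB")
    return problems
```

Calling `check_upload(path)` before each request and skipping the upload when the list is non-empty saves a round trip for files the API would reject anyway.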

Processing Errors

500 Internal Server Error - Extraction failed:

{
  "id": 123,
  "status": "failed",
  "confidence": 0.0,
  "fieldsExtracted": 0,
  "errorMessage": "OCR processing failed: ..."
}
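
The error shapes above differ: authentication and validation errors carry a "detail" string, while processing errors carry "errorMessage". A client can fold them into one exception type. The sketch below is illustrative: `ExtractionError` and `raise_for_api_error` are hypothetical names, and the caller is assumed to have already parsed the JSON body.

```python
class ExtractionError(Exception):
    """Raised when the API reports a failure; carries the HTTP status."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def raise_for_api_error(status_code, body):
    """Raise ExtractionError for a documented failure, else return body."""
    if status_code in (400, 401):
        # Authentication and validation errors carry a "detail" string.
        raise ExtractionError(status_code, body.get("detail", "Request rejected"))
    if status_code >= 400 or body.get("status") == "failed":
        # Processing errors carry "errorMessage" instead.
        message = body.get("errorMessage") or body.get("detail") or "Unknown error"
        raise ExtractionError(status_code, message)
    return body
```

With the requests library, this could be called as `raise_for_api_error(response.status_code, response.json())` in place of `response.raise_for_status()`, so the server's message is preserved in the exception.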

Best Practices

  1. Store API Keys Securely: Never commit API keys to version control. Use environment variables or secret management systems.

  2. Handle Errors Gracefully: Check the HTTP status code first, then the status field in the response body. If status is "failed", read errorMessage for details.

  3. Respect Rate Limits: If rate limiting is implemented and the API returns 429 Too Many Requests, retry with exponential backoff instead of resending immediately.

  4. Validate File Types: Check file type and size before uploading to avoid unnecessary API calls.

  5. Use Specific Fields: When you know what fields to extract, use the key_fields parameter for better accuracy and faster processing.

  6. Monitor API Key Usage: Regularly check your API keys via the /api/auth/api-keys endpoint to monitor usage and detect unauthorized access.
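
The exponential backoff mentioned in item 3 can be sketched as follows. The function name, attempt count, and delays are assumptions, since this document does not specify rate limits. `send` is any zero-argument callable that performs the request and returns a `(status_code, body)` tuple.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send must return a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status_code, body = send()
        if status_code != 429 or attempt == max_attempts - 1:
            return status_code, body
        # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 50% jitter
        # so that many clients do not retry in lockstep.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the wait can be replaced in tests. In production the default `time.sleep` is used.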

Security Notes

  • API keys are hashed before storage in the database
  • Only the key prefix is shown when listing API keys
  • API keys can be deactivated (soft-deleted) but are never permanently deleted
  • Each API key is tied to a specific user account
  • API key usage is tracked with last_used_at timestamp

Support

For issues or questions:

  1. Check the error message in the API response
  2. Verify your API key is active and correct
  3. Ensure your file meets the requirements (type, size)
  4. Check the API status endpoint: GET /ping