Invoice Information Extractor API

Extract structured information from Indian tractor invoices through an AI-powered REST API.

What It Does

Combines YOLO (signature/stamp detection) + Qwen2.5-VL (text extraction) to extract:

  • Dealer name
  • Model name
  • Horse power
  • Asset cost
  • Signature presence & location
  • Stamp presence & location

Architecture

Production (Hugging Face Deployment)

  • FastAPI server with REST endpoints
  • Models loaded on startup and cached in memory
  • YOLO model stored locally in utils/models/best.pt
  • Qwen2.5-VL downloaded from Hugging Face on first run (not stored locally)
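The load-once, keep-in-memory behaviour described above can be sketched with a small cache. This is a minimal illustration and not the contents of model_manager.py: the `ModelCache` class, its methods, and the stand-in loaders are all hypothetical names, and the real loaders would build the YOLO and Qwen2.5-VL models instead of returning strings.

```python
import threading
from typing import Any, Callable, Dict, List


class ModelCache:
    """Hypothetical cache: each registered model is loaded at most once."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        """Remember how to build a model without building it yet."""
        self._loaders[name] = loader

    def get(self, name: str) -> Any:
        """Return the cached model, calling its loader only on first use."""
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loaders[name]()
            return self._models[name]

    def load_all(self) -> None:
        """Eagerly load every registered model, e.g. at server startup."""
        for name in list(self._loaders):
            self.get(name)

    @property
    def models_loaded(self) -> bool:
        """True once every registered model is in memory."""
        with self._lock:
            return bool(self._loaders) and set(self._models) == set(self._loaders)


# Stand-in loaders that only record when they are called.
load_calls: List[str] = []


def fake_loader(name: str) -> Callable[[], str]:
    def load() -> str:
        load_calls.append(name)
        return f"<{name} model>"
    return load


cache = ModelCache()
cache.register("yolo", fake_loader("yolo"))
cache.register("qwen", fake_loader("qwen"))
cache.load_all()   # startup: both loaders run once
cache.get("yolo")  # later requests reuse the cached object
```

A flag like `models_loaded` is one plausible source for the value reported by the health check, though the actual server may compute it differently.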

Key Components

  • app.py - FastAPI server with endpoints
  • model_manager.py - Handles model loading and caching
  • inference.py - Processing pipeline and validation
  • config.py - Configuration settings
  • executable.py - Legacy CLI interface (deprecated)

Installation

pip install -r requirements.txt

Requirements: Python 3.10+, CUDA GPU (8GB+ VRAM)

Running the Server

Local Development

python app.py

Server runs on http://localhost:7860

Production (Hugging Face Spaces)

uvicorn app:app --host 0.0.0.0 --port 7860

API Endpoints

1. Health Check

GET /health

Response:

{
  "status": "healthy",
  "models_loaded": true
}
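Because Qwen2.5-VL is downloaded on first run, a client may want to wait for the health check before sending invoices. The sketch below polls `/health` using only the standard library. The helper names, retry count, and delay are illustrative assumptions, and it also assumes the server can answer with `"models_loaded": false` while loading, which the example response above does not confirm.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:7860") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def is_ready(payload: dict) -> bool:
    """Ready means the documented healthy response with models loaded."""
    return payload.get("status") == "healthy" and payload.get("models_loaded") is True


def wait_until_ready(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_sec: float = 2.0,
) -> bool:
    """Poll until the server reports ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: keep waiting.
            pass
        time.sleep(delay_sec)
    return False


# Example (requires a running server):
# if wait_until_ready():
#     print("server is ready")
```

Passing `fetch` as a parameter keeps the retry loop testable without a live server.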

2. Extract Single Invoice

POST /extract

Parameters:

  • file (required): Image file (JPG, PNG, JPEG)
  • doc_id (optional): Document identifier

Example (cURL):

curl -X POST "http://localhost:7860/extract" \
  -F "file=@invoice_001.png" \
  -F "doc_id=invoice_001"

Response:

{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "model_name": "Mahindra 575 DI",
    "horse_power": 50,
    "asset_cost": 525000,
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "processing_time_sec": 3.8,
  "cost_estimate_usd": 0.000528,
  "warnings": null
}
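On the client side, the response above can be mapped onto typed objects. The following is a sketch only: the `Detection` and `InvoiceResult` classes are hypothetical, the optional types assume that any extracted field may come back as null, and `warnings` is assumed to be a list when it is not null.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    present: bool
    bbox: Optional[List[int]]  # four integers as returned by the API


@dataclass
class InvoiceResult:
    doc_id: str
    dealer_name: Optional[str]
    model_name: Optional[str]
    horse_power: Optional[float]
    asset_cost: Optional[float]
    signature: Detection
    stamp: Detection
    confidence: float
    processing_time_sec: float
    cost_estimate_usd: float
    warnings: Optional[list]  # assumption: a list when not null


def _detection(raw: Optional[dict]) -> Detection:
    """Treat a missing or null entry as 'not present'."""
    raw = raw or {}
    return Detection(present=bool(raw.get("present", False)), bbox=raw.get("bbox"))


def parse_result(payload: dict) -> InvoiceResult:
    """Convert one /extract response into an InvoiceResult."""
    fields = payload["fields"]
    return InvoiceResult(
        doc_id=payload["doc_id"],
        dealer_name=fields.get("dealer_name"),
        model_name=fields.get("model_name"),
        horse_power=fields.get("horse_power"),
        asset_cost=fields.get("asset_cost"),
        signature=_detection(fields.get("signature")),
        stamp=_detection(fields.get("stamp")),
        confidence=payload["confidence"],
        processing_time_sec=payload["processing_time_sec"],
        cost_estimate_usd=payload["cost_estimate_usd"],
        warnings=payload.get("warnings"),
    )


sample = json.loads("""
{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "model_name": "Mahindra 575 DI",
    "horse_power": 50,
    "asset_cost": 525000,
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "processing_time_sec": 3.8,
  "cost_estimate_usd": 0.000528,
  "warnings": null
}
""")
result = parse_result(sample)
```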

3. Extract Multiple Invoices (Batch)

POST /extract_batch

Parameters:

  • files (required): Array of image files
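Both upload endpoints take multipart form data: `/extract` expects a single `file` part (plus the optional `doc_id`), and `/extract_batch` expects the `files` part repeated once per image. A standard-library sketch of such a client is shown below. The helper names and the timeout are illustrative, and since the batch response shape is not documented here, the decoded JSON is returned as-is.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Dict, Iterable, Optional, Tuple, Union

PathLike = Union[str, Path]


def build_multipart(
    field_name: str,
    paths: Iterable[PathLike],
    form_fields: Optional[Dict[str, str]] = None,
) -> Tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body.

    Returns the body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in (form_fields or {}).items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    for path in map(Path, paths):
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{path.name}"\r\n'
            f'Content-Type: {mime}\r\n\r\n'
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_files(
    url: str,
    field_name: str,
    paths: Iterable[PathLike],
    form_fields: Optional[Dict[str, str]] = None,
) -> Any:
    """POST the files and return the decoded JSON response."""
    body, content_type = build_multipart(field_name, paths, form_fields)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)


# Examples (require a running server and local image files):
# single = post_files("http://localhost:7860/extract", "file",
#                     ["invoice_001.png"], {"doc_id": "invoice_001"})
# batch = post_files("http://localhost:7860/extract_batch", "files",
#                    ["invoice_001.png", "invoice_002.png"])
```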

Output Format

Results are saved to sample_output/result.json:

{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "model_name": "Mahindra 575 DI",
    "horse_power": 50,
    "asset_cost": 525000,
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "processing_time_sec": 3.8,
  "cost_estimate_usd": 0.000528
}

Confidence range: 0.0 to 1.0 (higher is better)

Cost Calculation

Formula:

cost_usd = (0.5 * processing_time_sec) / 3600

Assumes $0.50 per GPU hour

Typical costs:

  • Per invoice: ~$0.001 (at ~8 seconds of processing time)
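As a worked example of the formula, the function below reads the 0.5 factor as the GPU rate in USD per hour (an interpretation of the formula, not a confirmed setting) and reproduces the `cost_estimate_usd` value from the sample response. The function and constant names are illustrative.

```python
GPU_RATE_USD_PER_HOUR = 0.5  # the 0.5 factor from the formula
SECONDS_PER_HOUR = 3600


def estimate_cost_usd(
    processing_time_sec: float,
    rate_usd_per_hour: float = GPU_RATE_USD_PER_HOUR,
) -> float:
    """cost_usd = (rate * processing_time_sec) / 3600"""
    return rate_usd_per_hour * processing_time_sec / SECONDS_PER_HOUR


# 3.8 seconds is the processing time in the sample response.
print(round(estimate_cost_usd(3.8), 6))  # 0.000528
```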

Models

  • YOLO: Signature/stamp detection (best.pt)
  • Qwen2.5-VL-7B: Text extraction (4-bit quantized)

GPU Requirements

  • Minimum: 10 GB VRAM

Project Structure

INVOICE_INFO_EXTRACTOR/
├── app.py                 # FastAPI server (main entry point)
├── model_manager.py       # Model loading and caching
├── inference.py           # Processing pipeline and validation
├── config.py              # Configuration settings
├── requirements.txt
├── README.md
├── executable.py          # Legacy CLI (deprecated)
├── utils/
│   └── models/
│       └── best.pt        # YOLO model (stored locally)
└── sample_output/
    └── result.json        # Sample output

Performance

  • Processing time: ~8 seconds per invoice
  • Cost per invoice: ~$0.001 (GPU time)
  • GPU Memory: 8GB minimum