# Invoice Information Extractor API Extract structured information from Indian tractor invoices using AI-powered REST API. ## What It Does Combines **YOLO** (signature/stamp detection) + **Qwen2.5-VL** (text extraction) to extract: - Dealer name - Model name - Horse power - Asset cost - Signature presence & location - Stamp presence & location ## Architecture ### Production (Hugging Face Deployment) - **FastAPI server** with REST endpoints - **Models loaded on startup** and cached in memory - **YOLO model** stored locally in `utils/models/best.pt` - **Qwen2.5-VL** downloaded from Hugging Face on first run (not stored locally) ### Key Components - `app.py` - FastAPI server with endpoints - `model_manager.py` - Handles model loading and caching - `inference.py` - Processing pipeline and validation - `config.py` - Configuration settings - `executable.py` - Legacy CLI interface (deprecated) ## Installation ```bash pip install -r requirements.txt ``` **Requirements:** Python 3.10+, CUDA GPU (8GB+ VRAM) ## Running the Server ### Local Development ```bash python app.py ``` Server runs on `http://localhost:7860` ### Production (Hugging Face Spaces) ```bash uvicorn app:app --host 0.0.0.0 --port 7860 ``` ## API Endpoints ### 1. Health Check ```bash GET /health ``` **Response:** ```json { "status": "healthy", "models_loaded": true } ``` ### 2. Extract Single Invoice ```bash POST /extract ``` **Parameters:** - `file` (required): Image file (JPG, PNG, JPEG) - `doc_id` (optional): Document identifier **Example (cURL):** ```bash curl -X POST "http://localhost:7860/extract" \ -F "file=@invoice_001.png" \ -F "doc_id=invoice_001" ``` **Response:** ```json { "doc_id": "invoice_001", "fields": { "dealer_name": "ABC Tractors Pvt Ltd", "model_name": "Mahindra 575 DI", "horse_power": 50, "asset_cost": 525000, "signature": {"present": true, "bbox": [100, 200, 300, 250]}, "stamp": {"present": true, "bbox": [400, 500, 500, 550]} }, "confidence": 0.89, "processing_time_sec": 3.8, "cost_estimate_usd": 0.000528, "warnings": null } ``` ### 3. Extract Multiple Invoices (Batch) ```bash POST /extract_batch ``` **Parameters:** - `files` (required): Array of image files ## Output Format Results saved to `sample_output/result.json`: ```json { "doc_id": "invoice_001", "fields": { "dealer_name": "ABC Tractors Pvt Ltd", "model_name": "Mahindra 575 DI", "horse_power": 50, "asset_cost": 525000, "signature": {"present": true, "bbox": [100, 200, 300, 250]}, "stamp": {"present": true, "bbox": [400, 500, 500, 550]} }, "confidence": 0.89, "processing_time_sec": 3.8, "cost_estimate_usd": 0.000528 } ``` Range: 0.0 to 1.0 (higher is better) ## Cost Calculation **Formula:** ``` cost_usd = (0.5 * processing_time_sec) / 3600 ``` Assumes **$0.60 per GPU hour** **Typical costs:** - Per invoice: ~$0.002 ## Models - **YOLO:** Signature/stamp detection (`best.pt`) - **Qwen2.5-VL-7B:** Text extraction (4-bit quantized) ## GPU Requirements - **Minimum:** 10 GB VRAM ## Project Structure ``` INVOICE_INFO_EXTRACTOR/ ├── app.py # FastAPI server (main entry point) ├── model_manager.py # Model loading and caching ├── inference.py # Processing pipeline and validation ├── config.py # Configuration settings ├── requirements.txt ├── README.md ├── executable.py # Legacy CLI (deprecated) ├── utils/ │ └── models/ │ └── best.pt # YOLO model (stored locally) └── sample_output/ └── result.json # Sample output ``` ## Performance - **Processing time:** ~8 seconds per invoice - **Cost per invoice:** ~$0.002 (GPU time) - **GPU Memory:** 8GB minimum