Spaces:

quantumbit
/

invoice_extractor

Paused

App Files Files Community

invoice_extractor / README_git.md

github-actions[bot]

Sync from GitHub: 67c5ee67bf66d7b77be5e2ffbfaa22681c2e0ebf

aeb681f 2 days ago

preview code

raw

history blame contribute delete

3.84 kB

	# Invoice Information Extractor API

	Extract structured information from Indian tractor invoices using AI-powered REST API.

	## What It Does

	Combines YOLO (signature/stamp detection) + Qwen2.5-VL (text extraction) to extract:
	- Dealer name
	- Model name
	- Horse power
	- Asset cost
	- Signature presence & location
	- Stamp presence & location

	## Architecture

	### Production (Hugging Face Deployment)
	- FastAPI server with REST endpoints
	- Models loaded on startup and cached in memory
	- YOLO model stored locally in `utils/models/best.pt`
	- Qwen2.5-VL downloaded from Hugging Face on first run (not stored locally)

	### Key Components
	- `app.py` - FastAPI server with endpoints
	- `model_manager.py` - Handles model loading and caching
	- `inference.py` - Processing pipeline and validation
	- `config.py` - Configuration settings
	- `executable.py` - Legacy CLI interface (deprecated)

	## Installation

	```bash
	pip install -r requirements.txt
	```

	Requirements: Python 3.10+, CUDA GPU (8GB+ VRAM)

	## Running the Server

	### Local Development
	```bash
	python app.py
	```

	Server runs on `http://localhost:7860`

	### Production (Hugging Face Spaces)
	```bash
	uvicorn app:app --host 0.0.0.0 --port 7860
	```

	## API Endpoints

	### 1. Health Check
	```bash
	GET /health
	```

	Response:
	```json
	{
	"status": "healthy",
	"models_loaded": true
	}
	```

	### 2. Extract Single Invoice
	```bash
	POST /extract
	```

	Parameters:
	- `file` (required): Image file (JPG, PNG, JPEG)
	- `doc_id` (optional): Document identifier

	Example (cURL):
	```bash
	curl -X POST "http://localhost:7860/extract" \
	-F "file=@invoice_001.png" \
	-F "doc_id=invoice_001"
	```



	Response:
	```json
	{
	"doc_id": "invoice_001",
	"fields": {
	"dealer_name": "ABC Tractors Pvt Ltd",
	"model_name": "Mahindra 575 DI",
	"horse_power": 50,
	"asset_cost": 525000,
	"signature": {"present": true, "bbox": [100, 200, 300, 250]},
	"stamp": {"present": true, "bbox": [400, 500, 500, 550]}
	},
	"confidence": 0.89,
	"processing_time_sec": 3.8,
	"cost_estimate_usd": 0.000528,
	"warnings": null
	}
	```

	### 3. Extract Multiple Invoices (Batch)
	```bash
	POST /extract_batch
	```

	Parameters:
	- `files` (required): Array of image files

	## Output Format

	Results saved to `sample_output/result.json`:

	```json
	{
	"doc_id": "invoice_001",
	"fields": {
	"dealer_name": "ABC Tractors Pvt Ltd",
	"model_name": "Mahindra 575 DI",
	"horse_power": 50,
	"asset_cost": 525000,
	"signature": {"present": true, "bbox": [100, 200, 300, 250]},
	"stamp": {"present": true, "bbox": [400, 500, 500, 550]}
	},
	"confidence": 0.89,
	"processing_time_sec": 3.8,
	"cost_estimate_usd": 0.000528
	}
	```


	Range: 0.0 to 1.0 (higher is better)

	## Cost Calculation

	Formula:
	```
	cost_usd = (0.5 * processing_time_sec) / 3600
	```

	Assumes $0.60 per GPU hour

	Typical costs:
	- Per invoice: ~$0.002

	## Models

	- YOLO: Signature/stamp detection (`best.pt`)
	- Qwen2.5-VL-7B: Text extraction (4-bit quantized)

	## GPU Requirements

	- Minimum: 10 GB VRAM

	## Project Structure

	```
	INVOICE_INFO_EXTRACTOR/
	├── app.py # FastAPI server (main entry point)
	├── model_manager.py # Model loading and caching
	├── inference.py # Processing pipeline and validation
	├── config.py # Configuration settings
	├── requirements.txt
	├── README.md
	├── executable.py # Legacy CLI (deprecated)
	├── utils/
	│ └── models/
	│ └── best.pt # YOLO model (stored locally)
	└── sample_output/
	└── result.json # Sample output
	```

	## Performance

	- Processing time: ~8 seconds per invoice
	- Cost per invoice: ~$0.002 (GPU time)
	- GPU Memory: 8GB minimum